The aim of this project was to optimize a physics program to run as fast as possible. Originally, the project contained objects that would simply check collision with every other object in the scene to determine if there was a collision. I used Optick to profile the program to identify functions that has a high %self to allow me to target my optimizations.
Through the course of the project, i had incorporated both memory management and collision optimizations, taking frame times at 800 objects from 65ms down to 2.8ms.
For the collision, i had to find a method to reduce the total checks as the object count is increased so it would not be O(n). Doing so, I initially split the scene into a tree along the x and z axis, more than halving frame time. However, this was not optimal when making use of the y axis and a high object count.
Following this, I implemented a basic OctTree, further reducing frame time. While very good, the overhead of tree creation took up quite a large amount of time during the update function. Following this, I modified the program to make use of a fixed tree with Prune and Sweep to further reduce frame time, the removal of the tree creation overhead did reduce frame times. However, it fails to scale to higher object counts. Equally, Prune and Sweep had a minimal impact, with a 0.2ms difference.
Finally, I modified the OctTree to be a fixed tree, allowing the scene to be split up automatically, not manually. Initially this increased frame times. However, I modified the functionality to make use of a "loose list", where on each frame, objects check if they are still colliding with their previous leaf node, if not, they are added to the loose list and then put through the standard tree traversal to find their new leaf node. This slightly increased memory usage, but dropped frame times even further.
Memory management was equally vital to reducing frame times and project initialization.
For safety, I added canaries, using header, footers and trackers to objects created using new, allowing me to traverse the heap and check for errors. Equally, the use of headers enabled me to add objects to memory pools.
My Memory pools were initialized at the start, making use of the size of objects to create a pool of appropriate size. During object initialization, the pool is checked to see if there is a chunk available, if so, the object is created at the chunk, with the header containing if it is from a pool or not, following this, on deleting the object, the pointer to the chunk of memory it inhabited is simply added back to a free chunk list in the memory pool for later reuse.
For Object Initialization on 800 objects, the time taken reduced from 0.982ms (including both the header, footer and memory trackers) to 0.776ms.
I have also implemented Arenas for temporary data, simply being a chunk of memory with an offset, where objects added are placed at the offset and the offset is then incremented by the objects size. At the end of a frame, the offset is set to 0, "resetting" the Arena.
To remove thread creation overhead, I have implemented a thread pool, allowing for thread reuse. This makes use of a thread array and a task queue.
Over the course of the project, I came to understand the importance of profiling code and optimization in ensuring game functionality runs at an appropriate speed alongside other aspects of a game such as graphics.
I found manually managing memory quite helpful.
If i were to return to this project, I would optimize the memory usage of the program, where instead of storing the parent leaf node on the object, i would instead loop through each leaf node and then check if the objects in its list still collide with it. Resulting in the only memory increase being from the use of a loose list and not the extra pointer per object as well.
I would also seek methods to prevent object clipping at leaf bounds as objects can sometimes phase through one another if at leaf boundaries as the leaf node is determined based on an objects center position.