Glare Engine is a multithreaded engine built on the OpenGL rendering API, with a wide variety of features implemented. In terms of user experience, it provides an application class for users to inherit from, overriding the init, update and fixed update methods themselves. For the engine to work, you also need to implement the CreateApplication method, which creates an instance of your inherited application class and returns it.
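The snippet below is a rough sketch of that entry point. The class and method names mirror the description above, but their exact signatures in Glare Engine are assumptions.

// Hypothetical sketch of the user-facing entry point; signatures are assumptions.
#include <memory>

class Application {
 public:
  virtual ~Application() = default;
  virtual void init() {}                  // called once at startup
  virtual void update(float dt) {}        // called every frame
  virtual void fixed_update(float dt) {}  // called at a fixed timestep
};

class SandboxApp : public Application {
 public:
  void init() override { /* create scene, lights, entities... */ }
  void update(float dt) override { /* per-frame game logic */ }
};

// The engine calls this to obtain the user's application instance.
std::unique_ptr<Application> CreateApplication() {
  return std::make_unique<SandboxApp>();
}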
Fig 1.1. Multithreading tracing
In order to synchronize the multiple threads of the engine it uses Display Lists which can be filled with Draw Commands from anywhere in the application. They will be executed on the main thread by order of entry once the logic threads are done.
Glare Engine has a light system that implements Directional Lights, Spotlights and Point Lights. The system supports an unlimited number of Point Lights, plus up to 4 Directional Lights and 4 Spotlights that are able to cast shadows. Point Lights are computed using a light volume, in this case a sphere, instead of a full-screen plane.
In addition, users can create their own scenes through the fully featured editor or through the Lua scripting language, both of which are capable of creating the three supported light types and objects in the world and assigning them different geometries, materials and textures.
Furthermore, the users can save or load their work thanks to the YAML scene serialization feature. Users can also explore their scenes with our controllable camera.
Fig 1.2. Lights and materials editor window
Fig 1.3. Save scene window
Lua scripting also offers users more control over the engine than the editor, such as a wider variety of systems that can be added or the ability to change the skybox.
The engine also implements resource management for textures, geometries and materials. This feature manages the loading and creation of these assets while also providing a centralised point of access to them.
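The sketch below shows what such a centralised access point could look like for textures; the ResourceManager class and its methods are illustrative, not the engine's actual API.

// Illustrative resource cache: assets are loaded once and shared afterwards.
#include <memory>
#include <string>
#include <unordered_map>

struct Texture;  // created elsewhere by the rendering backend

class ResourceManager {
 public:
  // Returns a cached texture, loading it from disk the first time it is requested.
  std::shared_ptr<Texture> GetTexture(const std::string& path) {
    auto it = textures_.find(path);
    if (it != textures_.end()) return it->second;
    std::shared_ptr<Texture> texture = LoadTextureFromDisk(path);
    textures_[path] = texture;
    return texture;
  }

 private:
  std::shared_ptr<Texture> LoadTextureFromDisk(const std::string& path);
  std::unordered_map<std::string, std::shared_ptr<Texture>> textures_;
};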
Glare Engine works via Deferred Rendering, and users can visualise the result of each pass from the editor itself without needing an external debugging program such as RenderDoc. This window can also show the depth map that is generated from a light, if the selected editable entity has the “Cast Shadow” component, as well as the results of other processes such as SSAO.
Fig 1.4. Editor window that renders all of the produced textures via render targets
As mentioned before, the engine supports post-processes like SSAO and Bloom, which can be enabled and edited from within the editor. Any post-process can easily be added thanks to the post-process management system.
In terms of programming paradigms, the engine works with an Entity Component System in pursuit of optimization and saving frame time. This means that the engine has components that contain data; these components can be added to entities so that systems can operate on those entities using the data their components contain.
Another goal of Glare Engine is to give users all the tools they need to create scenes and visualise their performance without needing to open an IDE or a terminal. For this purpose the engine also has its own terminal and performance tab. The logs from the terminal are also dumped to a log.txt file, which helps users debug their applications and provides more in-depth information in case of a crash.
Lastly, Glare Engine also implements PBR (Physically Based Rendering), a computer graphics approach that aims to achieve photorealism.
Fig 1.5. Post-process editor window
Fig 1.6. Performance window
Fig 1.7 Console output
Throughout the development of the engine we have gone through different material APIs, from a very basic one to the current version, which tries to get close to how UE4 works with its materials from C++.
At the beginning we had a material system that worked, but it was hard to maintain and adding new materials was not easy. Because of this, we made a new material class. The new class was coded using C++ templates in order to minimize code repetition and provide a usable API. We had one material class, and each type of material inherited from the parent MaterialSettings class. In theory this was simple: you created a settings class for each different type of material you wanted. But when editing the settings at runtime, things started to get messy. In order to edit something, you first had to get the material id and then get the pointer to the material. After that, you had to take its settings and cast them to your settings class in order to access their attributes. The code was not pretty for the end user, so we had to make some changes.
Embracing the power of templates, we made a new version of the materials. Since we are accustomed to UE4 and its material handling in C++, we tried to implement something that works similarly for the end user.
Fig 2.1.1 Set float parameter on UE4
Fig 2.1.2 Set float parameter on Glare Engine
The new version of the materials works similarly to the 2.0 version; the main change is that the MaterialSettings class no longer holds attributes that force you to cast the class in order to access the settings. Instead, it has a SettingsDataManager attribute that holds and deals with all the types of attributes that a settings class might have. This class was also written using C++ templates in order to keep the code generic.
To add custom parameters to a MaterialSettings class, you just need to declare the parameters in the constructor using the add_parameter method, giving each one a name and a default value. The type of the parameter is inferred from the default value, so if you want a vec4, for example, you must give a vec4 as the default value. Then, if you want to edit or retrieve a parameter value, you can just get or set it by referring to it by its name, so you never need to include or cast to the specific settings classes; the material class handles all of it internally.
Fig 2.1.3 Parameters of a MaterialSettings class
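The sketch below illustrates this parameter API, assuming glm vector types and std::variant for the type-erased storage; the real SettingsDataManager may be implemented differently, but add_parameter and the get/set-by-name calls follow the description above.

// Hedged sketch of a name-based parameter store; types and storage are assumptions.
#include <glm/glm.hpp>
#include <string>
#include <unordered_map>
#include <variant>

class SettingsDataManager {
 public:
  using Value = std::variant<float, glm::vec3, glm::vec4>;

  // The parameter's type is inferred from the type of its default value.
  template <typename T>
  void add_parameter(const std::string& name, const T& default_value) {
    values_[name] = default_value;
  }

  template <typename T>
  void set(const std::string& name, const T& value) { values_.at(name) = value; }

  template <typename T>
  T get(const std::string& name) const { return std::get<T>(values_.at(name)); }

 private:
  std::unordered_map<std::string, Value> values_;
};

// Example: a settings class declaring its parameters in the constructor.
struct ExampleMaterialSettings {
  SettingsDataManager data;
  ExampleMaterialSettings() {
    data.add_parameter("albedo", glm::vec4(1.0f));  // inferred as vec4
    data.add_parameter("roughness", 0.5f);          // inferred as float
  }
};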
After having the Forward Rendering pipeline fully functional, we started converting it into a Deferred Rendering pipeline. In short, instead of computing the lights in the same shader as the geometry pass (essentially computing the lights once per drawn object), we first run a geometry pass that outputs the object positions, normals and colors as textures and then, after every object has been drawn, we compute each light once, based on those textures.
This greatly improves our performance, provided we render a lot of lights at the same time. Deferred Rendering is meant for large quantities of lights, mostly small ones; that is when the performance improvement really starts to be noticed. It is not magic, and it does not come without setbacks: using this type of pipeline limits the engine to one material. This material has to be general enough to support all the different kinds of objects, so if you want to implement it, you have to forget about doing object-specific materials.
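As an illustration of the geometry pass setup, the sketch below creates a minimal G-buffer with position, normal and color attachments; the OpenGL calls are standard, but the texture formats and the loader header are illustrative rather than the engine's exact configuration.

// Hypothetical G-buffer creation for the geometry pass (formats are illustrative).
#include <glad/glad.h>  // or whichever OpenGL loader the project uses

void CreateGBuffer(int width, int height) {
  // The handles would be stored by the engine; kept local here for brevity.
  GLuint gbuffer, g_position, g_normal, g_albedo, depth_rbo;
  glGenFramebuffers(1, &gbuffer);
  glBindFramebuffer(GL_FRAMEBUFFER, gbuffer);

  auto make_attachment = [&](GLuint& tex, GLenum internal_format, GLenum format,
                             GLenum type, GLenum attachment) {
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, internal_format, width, height, 0, format, type, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glFramebufferTexture2D(GL_FRAMEBUFFER, attachment, GL_TEXTURE_2D, tex, 0);
  };

  make_attachment(g_position, GL_RGBA16F, GL_RGBA, GL_FLOAT,         GL_COLOR_ATTACHMENT0);
  make_attachment(g_normal,   GL_RGBA16F, GL_RGBA, GL_FLOAT,         GL_COLOR_ATTACHMENT1);
  make_attachment(g_albedo,   GL_RGBA8,   GL_RGBA, GL_UNSIGNED_BYTE, GL_COLOR_ATTACHMENT2);

  // Render into the three color attachments at once (multiple render targets).
  const GLenum attachments[] = {GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1, GL_COLOR_ATTACHMENT2};
  glDrawBuffers(3, attachments);

  // Depth buffer for the geometry pass.
  glGenRenderbuffers(1, &depth_rbo);
  glBindRenderbuffer(GL_RENDERBUFFER, depth_rbo);
  glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT24, width, height);
  glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_RENDERBUFFER, depth_rbo);
}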
Lights
Making this change alone is not enough to use the full potential of Deferred Rendering: the lights need to be tweaked in order to squeeze every frame out of it. In our first iteration, all of our lights (directional light, spotlight and point light) were computed using a plane geometry, basically the whole screen. This means that the lights, even though they are only computed once each, are evaluated for every pixel of the screen, which, as you can imagine, is not ideal. For directional lights a plane is the way to go, since they are global lights, so every pixel of the screen must be computed. In the case of spotlights and point lights, using the whole screen is a waste of resources, since they usually only light up a small part of it. The best approach here is to use a geometry that suits each of the two lights: a sphere for the point light and a cone for the spotlight. This technique limits the light computation to only the pixels that the volume covers, allowing for a much larger number of lights.
Fig 2.2.1 Forward rendering vs Deferred rendering
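The sketch below illustrates the light-volume idea for point lights: the sphere is scaled to the light's effective radius so that its shader only runs on the pixels the volume covers. The radius-from-attenuation formula and the PointLight fields are assumptions for illustration, not Glare Engine's exact code.

// Hypothetical point-light volume sizing; the attenuation model and cutoff are assumptions.
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <algorithm>
#include <cmath>

struct PointLight {
  glm::vec3 position;
  glm::vec3 color;
  float constant, linear, quadratic;  // attenuation terms
};

// Distance at which the light's contribution drops below ~5/256 and can be cut off.
float LightVolumeRadius(const PointLight& l) {
  float max_channel = std::max({l.color.r, l.color.g, l.color.b});
  float c = l.constant - max_channel * 256.0f / 5.0f;
  return (-l.linear + std::sqrt(l.linear * l.linear - 4.0f * l.quadratic * c)) /
         (2.0f * l.quadratic);
}

// The sphere mesh is translated to the light and scaled to its radius; it is then
// drawn with additive blending (front-face culling keeps it working when the
// camera is inside the volume).
glm::mat4 LightVolumeModelMatrix(const PointLight& l) {
  glm::mat4 model = glm::translate(glm::mat4(1.0f), l.position);
  return glm::scale(model, glm::vec3(LightVolumeRadius(l)));
}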
This is one of the first techniques we implemented in the engine; it allows us to synchronize the logic and rendering threads. A Display List, as the name suggests, is a list of Draw Commands that are executed in order of entry. Each command contains the action that will be performed when the list is executed; these actions encapsulate graphics API functions (in our case OpenGL) that can only be called from the main thread.
After a Display List is filled with its commands, it is uploaded to the window class into a list of Display Lists, which serves as a convenient central place to submit them. The window class contains two linked lists of Display Lists: while one of them is being filled, the other one is being executed, and once the list being executed is empty, the lists swap positions and the frame is ended.
At first we decided that Draw Command had to be a virtual class and that any command inheriting from it had to be implemented in its own .h and .cc file. However, after some time using the technique we noticed that our project was filling up with Command classes that were, at most, 100 lines long, and all of those files were only bloating the project pointlessly. That is why we decided to implement each class' Draw Commands in the same file as the class itself; that way we drastically reduced the number of files created and centralised each class' utilities while keeping the same functionality.
Fig 2.3.1 Display Lists
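The sketch below summarises the scheme: names follow the description above, while the exact container types (std::vector here instead of the engine's linked lists) and the locking are assumptions.

// Hedged sketch of Display Lists and Draw Commands; container and locking details are assumptions.
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

class DrawCommand {
 public:
  virtual ~DrawCommand() = default;
  virtual void execute() = 0;  // wraps OpenGL calls, so it must run on the main thread
};

class DisplayList {
 public:
  void add(std::unique_ptr<DrawCommand> cmd) { commands_.push_back(std::move(cmd)); }
  void execute() {
    for (auto& cmd : commands_) cmd->execute();
    commands_.clear();
  }
 private:
  std::vector<std::unique_ptr<DrawCommand>> commands_;
};

// The window keeps two collections: logic threads fill one while the main
// thread executes the other, then they swap at the end of the frame.
class Window {
 public:
  void submit(DisplayList list) {
    std::lock_guard<std::mutex> lock(mutex_);
    filling_.push_back(std::move(list));
  }
  void execute_frame() {  // called from the main thread
    {
      std::lock_guard<std::mutex> lock(mutex_);
      std::swap(filling_, executing_);
    }
    for (auto& list : executing_) list.execute();
    executing_.clear();
  }
 private:
  std::mutex mutex_;
  std::vector<DisplayList> filling_, executing_;
};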
Bloom is a post-process effect whose goal is to recreate the look of something glowing. It consists of extracting the RGB values of the lit scene's render target that surpass a given threshold and saving those values into their own texture; after that we blur this texture with a Gaussian blur, whose weighting is stronger towards the centre of the kernel, and finally we blend the blurred texture onto the final texture of the scene.
This post-process effect has been implemented in the engine as follows. First, after all the lights have been processed, we store in a texture only the values that surpass a given brightness threshold.
After that we start the Bloom pass, and with it came the first problem: we cannot write to the same texture we are reading from. To solve it we use three different materials, one that blurs horizontally, one that blurs vertically and one that acts as a simple passthrough, ping-ponging the texture blurred along each axis between them. Finally, we blend the resulting blurred texture into the main one.
Fig 2.4.1 Bloom
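The pass structure can be sketched as follows; RenderTarget, Material and DrawFullscreenQuad are placeholders rather than Glare Engine's actual API.

// Hedged sketch of the bloom passes; the types below are illustrative placeholders.
struct RenderTarget { /* colour texture + framebuffer */ };
struct Material     { /* shader + uniforms */ };
void DrawFullscreenQuad(const Material& material, const RenderTarget& input,
                        RenderTarget& output);  // placeholder

void BloomPass(const RenderTarget& lit_scene, RenderTarget& bright, RenderTarget& temp,
               RenderTarget& final_image, Material& threshold_mat,
               Material& blur_horizontal, Material& blur_vertical, Material& blend_mat) {
  // 1) Keep only the pixels brighter than the threshold.
  DrawFullscreenQuad(threshold_mat, lit_scene, bright);

  // 2) Ping-pong: we cannot read and write the same texture, so we blur
  //    horizontally into `temp`, then vertically back into `bright`.
  const int kBlurIterations = 5;  // illustrative
  for (int i = 0; i < kBlurIterations; ++i) {
    DrawFullscreenQuad(blur_horizontal, bright, temp);
    DrawFullscreenQuad(blur_vertical, temp, bright);
  }

  // 3) Blend the blurred brightness texture over the lit scene
  //    (the blend material also samples lit_scene internally).
  DrawFullscreenQuad(blend_mat, bright, final_image);
}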
The Entity Component System, also known as ECS, is a popular method used to avoid the deep inheritance hierarchies of Object Oriented Programming. Instead of having huge class trees, objects are formed by adding components. These components normally have no inheritance between them, and in order for components to have behaviours, systems exist. It is a data-oriented design.
The systems are the connection between the components. They take different components and add behaviour to them. In summary, systems are not created for a specific object; instead, they are created to affect any object that meets certain criteria or passes a filter, defined by its components. For example, if a system has a signature of transform and velocity components, all the entities that have those two components will be affected by that system.
Making a good ECS is hard, and above all it takes time. Two good examples of engines using ECS are Unity and the Karisma Engine from Digital Legends. There are also plenty of well-built ECS libraries out there, such as Entt [9]. A bad implementation, or simply a merely adequate one, can make your engine lose performance instead of gaining it. ECS is meant to be used when a large number of entities is at play.
At Glare Engine we implemented a simple version of ECS which cannot be compared to the ones mentioned above, but it works and is thorough enough for us to grasp an understanding of this programming paradigm that is slowly taking over the graphics programming world.
Fig 2.5.1 ECS
The main difference between OOP and ECS is data management. As you can see in the image above, if an entity consists of transform, mesh and material components, with OOP all of that data is stored in one contiguous block inside the object. That means that if you stored all the entities in an array and wanted to access each entity's transform component, you would have to jump around in memory, since the mesh and material components would be in the way, which is not friendly to the cache. With ECS, since each component type is stored in its own separate array and the entity only holds the id of the components it owns, looping through all the transform components means iterating over a contiguous block of memory, which is cache friendly.
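The layout can be sketched as follows; the component and system names follow the earlier example, while the storage details (dense arrays indexed directly by entity id) are a simplification of what a full ECS does.

// Hedged ECS sketch: plain-data components in contiguous arrays, and a system
// that filters entities by the components they own. Types are illustrative.
#include <glm/glm.hpp>
#include <cstdint>
#include <vector>

using Entity = std::uint32_t;

struct TransformComponent { glm::vec3 position{0.0f}; };
struct VelocityComponent  { glm::vec3 velocity{0.0f}; };

struct Registry {
  // One contiguous array per component type, indexed by entity id for brevity
  // (real implementations use sparse sets or similar indirection).
  std::vector<TransformComponent> transforms;
  std::vector<VelocityComponent>  velocities;
  std::vector<bool> has_transform, has_velocity;
};

// A "movement" system: it affects every entity that owns both components.
void MovementSystem(Registry& reg, float dt) {
  for (Entity e = 0; e < reg.transforms.size(); ++e) {
    if (!reg.has_transform[e] || !reg.has_velocity[e]) continue;
    reg.transforms[e].position += reg.velocities[e].velocity * dt;
  }
}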
Shadow Mapping is a technique used for generating shadows from light sources. It works by rendering the scene from the light's point of view and taking a “picture” in which we capture the depth of each pixel and output it to a texture. Everything that is visible in the depth texture will be lit by the light source, and everything that is not on the texture but is inside the light's frustum will have a shadow cast on it, unless it is being lit by another light source.
After we have generated the depth map, we just have to convert each rendered point to the light's coordinate space; then we index the shadow map to obtain the closest visible depth from the light's perspective. Lastly, we compare both depths to determine whether the rendered pixel's depth is greater than the shadow map's; in that case the point is within shadow and thus will not be lit.
Fig. 2.6.1 Shadow projecting on plane from lightsource
For the Shadow Mapping implementation in our engine we have used ECS.
Each light is an entity that has its own transform and light component (Directional Light Component, Spotlight Component, etc.). Additionally, lights that are able to cast a shadow have the “Cast Shadow” component. This component stores the light's depth texture, its material instance and its light space matrix, so that we can pass them to the light's shader when computing the shadows.
We use a system that runs after all of the geometries' transformations are computed. This system goes through every light with the Cast Shadow component, generates the view and projection matrices for that light, attaches its depth texture to the framebuffer that will “take the picture” and finally renders the scene using the generated light space matrix. This way each light has its own shadow map, generated by rendering the scene as if the light were a camera.
Finally, in the light's fragment shader we compute the shadow factor and use it to determine how much of the light is applied to the pixel; if the light is not able to cast shadows the factor is always 1, so it does not affect the light's result.
To compute the shadow we first perform a perspective divide and transform the projected coordinates to the range of 0 to 1 so that we can use these coordinates to sample the Depth Map.
Fig 2.6.2 Point’s depth comparison
Fig. 2.6.5 PoissonDisk array
In order to achieve softer shadows and reduce aliasing we use the Poisson Sampling technique in combination with PCF (Percentage Closer Filtering). This means that we sample the Shadow Map N times, each time with different UV coordinates. Every time a sample is occluded from the light source we subtract from the final light contribution, giving us an artificial gradient at the shadow's edges and softening them.
The “PoissonDisk” value that you see there is just an array of constant vec2 values used to randomise the sampling.
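The logic can be mirrored in C++/glm as follows (the real version lives in the light's fragment shader, see Fig. 2.6.4); the sample count, spread and Poisson disk values are illustrative, and SampleShadowDepth stands in for sampling the depth map.

// Hedged mirror of the shader-side shadow computation; values are illustrative.
#include <glm/glm.hpp>

float SampleShadowDepth(glm::vec2 uv);  // placeholder for texture(shadow_map, uv).r

float ComputeShadow(glm::vec4 light_space_pos, float bias) {
  // Perspective divide, then remap from [-1, 1] to [0, 1].
  glm::vec3 proj = glm::vec3(light_space_pos) / light_space_pos.w;
  proj = proj * 0.5f + 0.5f;

  // Guard against oversampling: beyond the light's far plane means no shadow.
  if (proj.z > 1.0f) return 1.0f;

  // Poisson-disk PCF: each occluded sample darkens the result a little,
  // which softens the shadow edges.
  const glm::vec2 kPoissonDisk[4] = {
      {-0.942016f, -0.399062f}, {0.945586f, -0.768907f},
      {-0.094184f, -0.929389f}, {0.344959f,  0.293878f}};
  float shadow = 1.0f;
  for (int i = 0; i < 4; ++i) {
    float closest = SampleShadowDepth(glm::vec2(proj) + kPoissonDisk[i] / 700.0f);
    if (proj.z - bias > closest) shadow -= 0.25f;
  }
  return shadow;  // 1.0 = fully lit, 0.0 = fully in shadow
}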
There are some solutions that we have implemented in order to avoid shadow artifacts. The first and most noticeable one is usually shadow acne, which occurs because of the limited resolution of the shadow map: multiple fragments end up sampling the same value. To avoid this we just need to add a bias to the depth comparison in the shader. This bias could be made variable, based on the surface's normal and the light's direction, but during testing we found that a constant value worked perfectly for us, so we could skip these extra calculations without any noticeable problems.
Another shadow artifact that usually occurs is Peter Panning, which shows up as a slight detachment of the shadow from its object. It is a side effect of the previous fix: because we apply a bias and reduce the depth of some of the shadow's points, the ones closest to the object are not sampled at the exact position. Luckily there is an easy fix for this artifact: we just need to cull front faces when rendering the scene in light space.
The last shadow artifact we had to overcome was Over Sampling, where regions outside the light's frustum are considered to be in shadow. This is caused by depth values being sampled from outside the light's frustum. To fix this we must configure, in the case of OpenGL, the texture to clamp to the border and make the border return a maximum depth of 1. Even then some Over Sampling remains; the solution is also rather simple: we just add a guard to the Compute Shadow function that returns whenever the projected vector's z coordinate is larger than 1.0.
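The OpenGL-side part of these fixes can be sketched as follows; the texture handle and the depth-map pass are placeholders.

// Hedged sketch of the shadow-map state fixes; RenderSceneInLightSpace is a placeholder.
#include <glad/glad.h>  // or whichever OpenGL loader the project uses

void RenderSceneInLightSpace();  // placeholder for the engine's depth-map pass

void RenderShadowMap(GLuint shadow_depth_texture) {
  // Over Sampling: clamp to a border whose depth is the maximum (1.0), so
  // coordinates outside the map are treated as "not in shadow".
  glBindTexture(GL_TEXTURE_2D, shadow_depth_texture);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_BORDER);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_BORDER);
  const float border[] = {1.0f, 1.0f, 1.0f, 1.0f};
  glTexParameterfv(GL_TEXTURE_2D, GL_TEXTURE_BORDER_COLOR, border);

  // Peter Panning: cull front faces while rendering the scene from the light.
  glCullFace(GL_FRONT);
  RenderSceneInLightSpace();
  glCullFace(GL_BACK);
}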
These are the general settings of our shadows, but there are still some specifications depending on the type of light that is going to be computed.
Fig. 2.6.4 ComputeShadow Function
Fig. 2.6.6 Compute Shadow Guard
For the directional lights' projection matrix we use an orthographic matrix, because a directional light is not supposed to have a position. However, because we use the inverse of its transform matrix to generate the view matrix, it must have a position, which affects not only the generation of the depth map but also the shadow computation in the fragment shader. To fix this we have implemented a system that keeps the directional light's position at a fixed radius from the camera, with the camera at the centre, while always looking directly at it. This gives the impression that directional lights have no position, because everything the camera is able to see is lit by the directional light.
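A small sketch of that system, assuming glm; the radius value and parameter names are illustrative.

// Hedged sketch: the directional light orbits the camera at a fixed radius and looks at it.
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

glm::mat4 DirectionalLightView(const glm::vec3& camera_position,
                               const glm::vec3& light_direction,
                               float radius = 50.0f) {  // illustrative radius
  // Place the light "behind" its direction, with the camera at the centre.
  glm::vec3 light_position = camera_position - glm::normalize(light_direction) * radius;
  // The view matrix looks from the light towards the camera.
  return glm::lookAt(light_position, camera_position, glm::vec3(0.0f, 1.0f, 0.0f));
}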
The only difference between spotlights and directional lights is that spotlights do in fact have a position. This affects the creation of the projection matrix, which must now be a perspective matrix in order to generate a shadow that looks like the one a flashlight would cast, getting wider the further it is from the light source.
Screen Space Ambient Occlusion is a technique used to simulate a natural lighting phenomenon that happens without us even noticing. As we all know, light in real life bounces around when it hits something. This means that the corners of objects are usually less bright than the rest. This can really be appreciated by looking at wall corners, for example.
If we take a look at the image on the right, we can see that the corners of the mesh are darker than the rest of it. If we use this image together with the light calculations, we can multiply the light output by the amount of ambient occlusion each pixel has to create results similar to real-life ambient occlusion, where edges are brighter and corners are darker; this gives the scene a more realistic lighting look.
The SSAO algorithm was developed by Crytek. The good thing about this algorithm is that it simplifies the ambient occlusion calculation and makes it cheaper. The idea is that you only compute the ambient occlusion once per pixel: you sample random points around that pixel's view space position and check whether these points are inside or outside the geometry (using the position texture). Depending on the number of points that fall inside or outside the geometry, the pixel will be brighter or darker.
More sample points inside the geometry → darker
More sample points outside the geometry → brighter
There are two parameters that can be tweaked from the editor in order to achieve the desired results: bias and sample radius. The sample radius, as you can imagine, is the radius around the pixel's view space position where the sample positions will be checked. The bias is used to avoid acne issues that might appear due to precision problems, and it is added to the sample position's depth when computing the occlusion.
After computing the occlusion value for each pixel, and before passing it to the lighting pass, we make another pass that blurs the result, both to achieve a smoother look instead of really sharp lines and to avoid the odd fragments that might appear, since the points we sample are random and so the results are noisy. This technique is not perfect, but it does the job while making as few calculations as possible. If we wanted a more detailed AO technique we could always implement HBAO or HBAO+.
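The per-pixel occlusion test can be mirrored in C++/glm as follows (the real version runs in a fragment shader); the texture lookups are placeholders and the kernel, radius and bias are illustrative.

// Hedged mirror of the SSAO occlusion test; lookups and values are placeholders.
#include <glm/glm.hpp>
#include <vector>

glm::vec3 SampleViewSpacePosition(glm::vec2 uv);         // placeholder: position texture lookup
glm::vec2 ProjectToScreenUV(const glm::vec3& view_pos);  // placeholder: view -> [0,1] screen UV

float ComputeOcclusion(const glm::vec3& frag_view_pos,
                       const std::vector<glm::vec3>& kernel,  // random points around the pixel
                       float radius, float bias) {
  float occlusion = 0.0f;
  for (const glm::vec3& offset : kernel) {
    // Take a random sample around the pixel's view-space position...
    glm::vec3 sample_pos = frag_view_pos + offset * radius;
    // ...and look up the geometry actually stored at that screen location.
    float stored_depth = SampleViewSpacePosition(ProjectToScreenUV(sample_pos)).z;
    // If the stored geometry is in front of the sample, the sample is "inside".
    if (stored_depth >= sample_pos.z + bias) occlusion += 1.0f;
  }
  // More samples inside -> lower value -> darker pixel.
  return 1.0f - occlusion / static_cast<float>(kernel.size());
}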
Fig 2.6.1 SSAO
If skyboxes didn't exist, worlds would feel empty. Players don't even notice that they are surrounded by 6 still images that move with them wherever they go, giving them the sensation of really being inside a world. But that is what a skybox really is: 6 images distributed on the inside of a cube.
To make this, you first need to create a cubemap texture and render a cube to screen that always follows the camera, with the texture mapped using the cube's positions as texture coordinates.
When using a Deferred Rendering pipeline, things start to change. The skybox has to be the last thing you render, since you don't want to compute the lights for the background or apply any post-processing to it. The problem here is that you need access to the geometry pass's depth texture or depth buffer, because you only want to render the skybox where no object has already been rendered (behind everything, basically). So, in order to do that, you take the geometry pass's framebuffer and copy its depth buffer into the skybox pass's framebuffer.
The texture coordinates used by the fragment shader to sample the cubemap texture are not the same UVs that the cube geometry has. A skybox's texture coordinates are obtained from the vertex positions; in this case, since it is a cubemap texture and not a 2D texture, the UVs are vec3. If you look closely, the z coordinate is flipped; this is done because the cubemap's coordinate system is left-handed.
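The C++ side of this pass can be sketched as follows, using standard OpenGL blitting; the framebuffer handles and DrawSkyboxCube are placeholders.

// Hedged sketch of the deferred skybox pass; handles and the draw call are placeholders.
#include <glad/glad.h>  // or whichever OpenGL loader the project uses

void DrawSkyboxCube();  // placeholder: its vertex shader flips z when building the vec3 cubemap UVs

void SkyboxPass(GLuint geometry_fbo, GLuint skybox_fbo, int width, int height) {
  // Copy the geometry pass's depth buffer so the skybox only fills empty pixels.
  glBindFramebuffer(GL_READ_FRAMEBUFFER, geometry_fbo);
  glBindFramebuffer(GL_DRAW_FRAMEBUFFER, skybox_fbo);
  glBlitFramebuffer(0, 0, width, height, 0, 0, width, height,
                    GL_DEPTH_BUFFER_BIT, GL_NEAREST);

  glBindFramebuffer(GL_FRAMEBUFFER, skybox_fbo);
  // The cube is drawn at maximum depth, so LEQUAL lets it pass only where
  // nothing has been rendered yet.
  glDepthFunc(GL_LEQUAL);
  DrawSkyboxCube();
  glDepthFunc(GL_LESS);
}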
Fig 2.7.1 Lake skybox in Glare Engine
Fig 2.7.2 Skybox vertex shader
When programming the engine, we encountered performance issues that appeared when a lot of objects were being rendered at the same time. To solve this, we started to use instanced (or batched) rendering. This means that objects that share a geometry and a material can be rendered in one single draw call instead of having to make a draw call for each of them. The performance gain starts to be noticeable when large amounts of the same type of object are being rendered; it does not really help with small quantities.
After we had this working manually, to make it scalable we had to implement a way to automatically select the objects that could and would be instanced, since doing it by hand was not really fast. So, before rendering, we analyse and group the objects that have the same geometry and material id, and store each world transform in a list keyed by those two values. After sorting the map, we render the objects that have a single world transform in their list as usual, and the ones that have more than one are rendered using instanced rendering (a sketch of this grouping step is shown after the figures below).
This algorithm can be improved in many ways, since the sorting is slow, and when the entity count is low we probably lose more performance doing it than we would by rendering each object separately.
Fig 2.8.1 Instancing sorting algorithm
Fig 2.8.2 Instancing rendering algorithm
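The grouping step can be sketched as follows; the types and draw helpers are illustrative placeholders rather than the engine's actual functions.

// Hedged sketch of batching by (geometry id, material id); helpers are placeholders.
#include <glm/glm.hpp>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using BatchKey = std::pair<std::uint32_t, std::uint32_t>;  // (geometry id, material id)

void DrawSingle(std::uint32_t geometry, std::uint32_t material, const glm::mat4& transform);
void DrawInstanced(std::uint32_t geometry, std::uint32_t material,
                   const std::vector<glm::mat4>& transforms);

void RenderObjects(const std::vector<std::uint32_t>& geometry_ids,
                   const std::vector<std::uint32_t>& material_ids,
                   const std::vector<glm::mat4>& world_transforms) {
  // Bucket every world transform under its (geometry, material) key.
  std::map<BatchKey, std::vector<glm::mat4>> batches;
  for (std::size_t i = 0; i < world_transforms.size(); ++i)
    batches[{geometry_ids[i], material_ids[i]}].push_back(world_transforms[i]);

  for (const auto& [key, transforms] : batches) {
    if (transforms.size() == 1) {
      DrawSingle(key.first, key.second, transforms[0]);  // regular draw call
    } else {
      // One draw call for N instances, e.g. uploading `transforms` to an
      // instance buffer and issuing glDrawElementsInstanced(..., transforms.size()).
      DrawInstanced(key.first, key.second, transforms);
    }
  }
}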
Fig. 2.9.1 Fresnel https://marmoset.co/posts/basic-theory-of-physically-based-rendering/
PBR implementation
Glare Engine has PBR implemented for all three of its light types. Each of them shares the same base code with some variations; the spotlight, for example, has additional code that filters the pixels to produce the cone-shaped cutoff.
The first part of the shader (lines 63 to 67) computes the radiance of the pixel. Using the distance from the light's center to the pixel's world position, we calculate the light's attenuation and then use it to determine the brightness of the pixel.
The Fresnel-Schlick equation is used to approximate the results of the full Fresnel equations, which can be a bit heavy to compute. The Fresnel equation gives us the percentage of light that gets reflected from a surface after a ray of light hits it. F0 in this algorithm represents the surface reflectance at zero incidence.
With the DistributionGGX function we calculate the NDF, or Normal Distribution Function, which, based on the roughness of the surface, computes the quantity of microfacets aligned with the halfway vector.
The GeometrySmith function is another one we need for the Cook-Torrance BRDF to work. The value returned from this function represents how much a microfacet is overshadowed by another on the surface; light reflection decreases as roughness gets higher.
All three of the values above are then used to compute the Cook-Torrance BRDF, which gives us the reflectance, or the amount of light that gets reflected.
In the end we calculate NdotL, the dot product between the surface normal and the light direction, which tells us whether a pixel is facing the light. Finally, we use all the computed data to obtain the pixel's outgoing radiance.
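The computation can be mirrored in C++/glm as follows (the real version is the fragment shader shown in Fig. 2.9.2); it follows the standard Cook-Torrance formulation, and the variable names are illustrative.

// Hedged mirror of the point-light PBR math; names and constants are illustrative.
#include <glm/glm.hpp>
#include <cmath>

const float kPi = 3.14159265359f;

// Fresnel-Schlick: fraction of light reflected, with f0 the reflectance at zero incidence.
glm::vec3 FresnelSchlick(float cos_theta, glm::vec3 f0) {
  return f0 + (glm::vec3(1.0f) - f0) * std::pow(1.0f - cos_theta, 5.0f);
}

// Normal Distribution Function (GGX): how many microfacets align with the halfway vector.
float DistributionGGX(glm::vec3 n, glm::vec3 h, float roughness) {
  float a2 = roughness * roughness * roughness * roughness;
  float n_dot_h = glm::max(glm::dot(n, h), 0.0f);
  float denom = n_dot_h * n_dot_h * (a2 - 1.0f) + 1.0f;
  return a2 / (kPi * denom * denom);
}

// Smith geometry term: microfacet self-shadowing, which grows with roughness.
float GeometrySchlickGGX(float n_dot_v, float roughness) {
  float r = roughness + 1.0f;
  float k = (r * r) / 8.0f;
  return n_dot_v / (n_dot_v * (1.0f - k) + k);
}
float GeometrySmith(glm::vec3 n, glm::vec3 v, glm::vec3 l, float roughness) {
  return GeometrySchlickGGX(glm::max(glm::dot(n, v), 0.0f), roughness) *
         GeometrySchlickGGX(glm::max(glm::dot(n, l), 0.0f), roughness);
}

// Outgoing radiance of one point light for a single pixel; `radiance` is the
// light color already multiplied by its attenuation.
glm::vec3 PointLightRadiance(glm::vec3 n, glm::vec3 v, glm::vec3 l, glm::vec3 radiance,
                             glm::vec3 albedo, float metallic, float roughness) {
  glm::vec3 h = glm::normalize(v + l);
  glm::vec3 f0 = glm::mix(glm::vec3(0.04f), albedo, metallic);

  float ndf   = DistributionGGX(n, h, roughness);
  float g     = GeometrySmith(n, v, l, roughness);
  glm::vec3 f = FresnelSchlick(glm::max(glm::dot(h, v), 0.0f), f0);

  // Cook-Torrance BRDF: the specular reflectance.
  float n_dot_l = glm::max(glm::dot(n, l), 0.0f);
  glm::vec3 specular = (ndf * g * f) /
                       (4.0f * glm::max(glm::dot(n, v), 0.0f) * n_dot_l + 0.0001f);

  // Energy conservation: whatever is not reflected is diffused (none for metals).
  glm::vec3 k_d = (glm::vec3(1.0f) - f) * (1.0f - metallic);
  return (k_d * albedo / kPi + specular) * radiance * n_dot_l;
}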
Fig. 2.9.2 PointLight PBR fragment shader
[1] Bloom technique - LearnOpenGL https://learnopengl.com/Advanced-Lighting/Bloom
[2] YAML Scene Serialization - The Cherno - 2020 https://www.youtube.com/watch?v=IEiOP7Y-Mbc&t=1945s
[3] Render to target - opengl-tutorial http://www.opengl-tutorial.org/intermediate-tutorials/tutorial-14-render-to-texture/
[4] Shadow Mapping - ogldev https://ogldev.org/www/tutorial23/tutorial23.html
[5] Shadow Mapping - LearnOpenGL https://learnopengl.com/Advanced-Lighting/Shadows/Shadow-Mapping
[6] Shadow Mapping - opengl-tutorial http://www.opengl-tutorial.org/intermediate-tutorials/tutorial-16-shadow-mapping/
[7] ECS - Nomad Game Engine https://savas.ca/nomad
[8] ECS - Austin Morlan https://austinmorlan.com/posts/entity_component_system/
[9] ECS - Entt https://github.com/skypjack/entt
[10] ECS - Digital Legends http://www.digital-legends.com/
[11] SSAO - ogldev https://ogldev.org/www/tutorial45/tutorial45.html
[12] SSAO - 3D game shaders for beginners https://lettier.github.io/3d-game-shaders-for-beginners/ssao.html
[13] SSAO - Learn OpenGL https://learnopengl.com/Advanced-Lighting/SSAO
[14] Skybox - ogldev https://ogldev.org/www/tutorial25/tutorial25.html
[15] Skybox - Khronos https://www.khronos.org/opengl/wiki/Cubemap_Texture
[16] Skybox - LearnOpenGL https://learnopengl.com/Advanced-OpenGL/Cubemaps
[17] PBR - Filament https://google.github.io/filament/Filament.md.html
[18] PBR - Sébastien Lagarde https://seblagarde.wordpress.com/2011/08/17/hello-world/
[19] PBR - Theory of PBR https://marmoset.co/posts/basic-theory-of-physically-based-rendering/
[20] PBR - LearnOpenGL https://learnopengl.com/Advanced-Lighting/Advanced-Lighting