George Parrish (Portfolio) - Efficient deferred shading

Extract from SIGGRAPH 2010

Screen space classification for efficient deferred shading

Disney Interactive Studios

I co-developed this technique and implemented the PS3 version.

A detailed explanation is published in the book "Game Engine Gems 2".

Figure 1: (a) Final rendered image, (b) Soft shadow classification, (c) Sky classification, (d) MSAA edge classification

Introduction

Deferred shading is an increasingly popular technique for video game rendering.

In the standard implementation, a geometry pass writes depths, normals and other properties to a geometry buffer (G-buffer) before a lighting pass is applied as a screen space operation. Deferred shading is often combined with deferred shadowing, where the occlusion values due to one or more lights are gathered in a screen space shadow buffer.

Applying the same complex shaders to every pixel on the screen during the shadow and light passes of these techniques can contribute to poor performance.

A more efficient approach would take different shading paths for different parts of the scene.

For example, we would prefer to apply expensive shadow filtering only to known shadow edges. However, this typically involves dynamic branches within shaders, which can lead to poor performance on current video game shading hardware.

We developed a novel technique for decomposing the screen into tiles and defining a number of useful tile classification criteria. This approach can be used to reduce the complexity of the lighting pass in deferred shading.

Our approach

In our approach we divide the screen into tiles, where each tile is 4x4 pixels.

We pre-generate a screen-aligned mesh of quads, each quad covering a single tile.

For each tile we aim to apply lighting shaders that contain the minimum required complexity.

The seven global light properties used on Split/Second are the following:

Sky. These are the fastest pixels because they don't require any lighting calculations at all. The sky color is simply copied directly from the G-buffer.
Sun light. Pixels facing the sun require sun and specular light calculations, unless they are fully in shadow.
Solid Shadow. Pixels fully in shadow don't require any shadow or sun light calculations.
Soft shadow. Pixels at the edge of shadows require expensive eight-tap percentage closer filtering (PCF) unless they face away from the sun.
Shadow fade. Pixels near the end of the dynamic shadow draw distance fade from full shadow to no shadow to avoid pops as the geometry moves out of the shadow range.
Light scattering. All but the nearest pixels have a light scattering calculation applied.
Antialiasing. Pixels at the edges of polygons require lighting calculations for both 2x MSAA fragments.

We calculate which light properties are required for each 4x4 pixel tile and store the result in a 7-bit shaderID.

To achieve this we apply a full screen classification pass which determines which properties of our lighting calculation are required in each screen tile. The classification pass outputs a screen aligned classification texture containing the shaderID for each 4x4 pixel tile.
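
As a rough illustration of what the classification pass computes, the C++ sketch below reduces per-pixel property masks to one shaderID per 4x4 tile. It is a CPU-side stand-in, not the shipped GPU pass. The bit assignment, the names (LightProperty, classifyTiles) and the rule that a tile needs every property required by any of its pixels are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bit assignment for the seven light properties.
enum LightProperty : std::uint8_t {
    kSky             = 1u << 0,
    kSunLight        = 1u << 1,
    kSolidShadow     = 1u << 2,
    kSoftShadow      = 1u << 3,
    kShadowFade      = 1u << 4,
    kLightScattering = 1u << 5,
    kAntialiasing    = 1u << 6,
};

constexpr int kTileSize = 4;  // 4x4 pixel tiles, as in the text

// pixelFlags holds one property mask per pixel (width * height, row-major),
// assumed to have been computed already from the G-buffer and shadow buffer.
// Returns one 7-bit shaderID per tile, row-major.
std::vector<std::uint8_t> classifyTiles(const std::vector<std::uint8_t>& pixelFlags,
                                        int width, int height)
{
    const int tilesX = (width + kTileSize - 1) / kTileSize;
    const int tilesY = (height + kTileSize - 1) / kTileSize;
    std::vector<std::uint8_t> shaderIDs(static_cast<std::size_t>(tilesX) * tilesY, 0);

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            std::uint8_t& id = shaderIDs[(y / kTileSize) * tilesX + (x / kTileSize)];
            // Assumption: OR together the needs of every pixel in the tile.
            id |= pixelFlags[y * width + x] & 0x7F;
        }
    }
    return shaderIDs;
}
```

With this layout, a tile containing one sun-lit pixel and one soft shadow pixel ends up with both bits set, so it is drawn with the shader variant that handles both cases.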

Shaders for each combination are pre-generated - one for each unique shaderID - by preprocessing an uber-shader during shader compilation.

Index buffer generation

Once the shaderID for each tile has been calculated we generate an index buffer with which to submit a single draw call for each shader found in our scene.

The index buffer defines which of the quads in the screen aligned mesh are drawn for each shader.
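
A minimal sketch of this step, assuming the screen-aligned mesh is a shared vertex grid of (tilesX + 1) x (tilesY + 1) vertices and each tile is drawn as two triangles. The vertex layout, the 32-bit indices and the function name buildIndexBuffer are assumptions for the example rather than details from the original implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Collects indices for every tile whose shaderID equals wantedID.
// shaderIDs is the classification texture flattened to tilesX * tilesY entries.
std::vector<std::uint32_t> buildIndexBuffer(const std::vector<std::uint8_t>& shaderIDs,
                                            int tilesX, int tilesY,
                                            std::uint8_t wantedID)
{
    std::vector<std::uint32_t> indices;
    const std::uint32_t stride = static_cast<std::uint32_t>(tilesX) + 1;  // vertices per row

    for (int ty = 0; ty < tilesY; ++ty) {
        for (int tx = 0; tx < tilesX; ++tx) {
            if (shaderIDs[static_cast<std::size_t>(ty) * tilesX + tx] != wantedID)
                continue;

            // Corner vertices of this tile in the shared grid.
            const std::uint32_t v00 = static_cast<std::uint32_t>(ty) * stride
                                    + static_cast<std::uint32_t>(tx);
            const std::uint32_t v10 = v00 + 1;
            const std::uint32_t v01 = v00 + stride;
            const std::uint32_t v11 = v01 + 1;

            // Two triangles covering the tile.
            indices.insert(indices.end(), {v00, v10, v01, v10, v11, v01});
        }
    }
    return indices;
}
```

Calling this once per shaderID present in the frame yields one index range, and therefore one draw call, per shader.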

In the image below we see the index buffer generated for soft shadow edge pixels, highlighted in green.

PS3 Implementation

In the PS3 implementation, index buffer generation is carried out on the SPUs of the Cell processor and synchronised with GPU submission and rendering.

An SPU job is executed for each shaderID. Each job scans the classification texture for tiles with a matching shaderID and builds the corresponding index buffer.

Each SPU writes directly into the GPU command buffer, patching a draw call for each index buffer it generates.

A GPU fence is released when all SPUs are finished.

Tile size

The optimum tile size to trade off average tile complexity against vertex cost can vary according to hardware and scene complexity. For Split/Second we found that a tile size of 4x4 pixels gave a good balance. Vertex processing cost is further reduced by a simple aggregation of adjacent tiles with the same shaderID.
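
The text does not say how the aggregation is done. One simple possibility, sketched below, is to merge each horizontal run of tiles sharing a shaderID into a single wide quad. It assumes a shared vertex grid of (tilesX + 1) x (tilesY + 1) vertices; the function name buildMergedIndexBuffer and the triangle layout are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Emits one quad (two triangles) per horizontal run of matching tiles,
// instead of one quad per tile.
std::vector<std::uint32_t> buildMergedIndexBuffer(const std::vector<std::uint8_t>& shaderIDs,
                                                  int tilesX, int tilesY,
                                                  std::uint8_t wantedID)
{
    std::vector<std::uint32_t> indices;
    const std::uint32_t stride = static_cast<std::uint32_t>(tilesX) + 1;  // vertices per row

    for (int ty = 0; ty < tilesY; ++ty) {
        const std::size_t row = static_cast<std::size_t>(ty) * tilesX;
        int tx = 0;
        while (tx < tilesX) {
            if (shaderIDs[row + tx] != wantedID) {
                ++tx;
                continue;
            }
            // Extend the run while the next tile uses the same shader.
            int runEnd = tx;
            while (runEnd < tilesX && shaderIDs[row + runEnd] == wantedID)
                ++runEnd;

            // The quad spans vertex columns tx .. runEnd on rows ty and ty + 1.
            const std::uint32_t v00 = static_cast<std::uint32_t>(ty) * stride
                                    + static_cast<std::uint32_t>(tx);
            const std::uint32_t v10 = static_cast<std::uint32_t>(ty) * stride
                                    + static_cast<std::uint32_t>(runEnd);
            const std::uint32_t v01 = v00 + stride;
            const std::uint32_t v11 = v10 + stride;
            indices.insert(indices.end(), {v00, v10, v01, v10, v11, v01});

            tx = runEnd;
        }
    }
    return indices;
}
```

A row of three adjacent tiles with the same shaderID then costs 6 indices instead of 18, which is where the vertex processing saving comes from.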

Shader Management

Rather than trying to manually manage 128 separate shaders, we opted for a single uber-shader with all the lighting properties included, and we used conditional compilation to remove the code we didn't need in each case.

This is achieved by prefixing the uber-shader with a fragment that defines just the properties needed for each shader combination. The listing below shows an example for a shader requiring only sunlight and soft shadow.
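
The following is an illustrative sketch rather than the original Split/Second listing. It shows how a build tool might generate the prefix fragment for a given shaderID. The macro names, the bit order and the helper makeShaderPrefix are all hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical macro names; entry i corresponds to bit i of the shaderID.
const std::array<const char*, 7> kPropertyMacros = {
    "SKY", "SUN_LIGHT", "SOLID_SHADOW", "SOFT_SHADOW",
    "SHADOW_FADE", "LIGHT_SCATTERING", "ANTIALIASING",
};

// Builds the text that would be prepended to the uber-shader source so that
// conditional compilation keeps only the code paths this shaderID needs.
std::string makeShaderPrefix(std::uint8_t shaderID)
{
    std::string prefix;
    for (std::size_t bit = 0; bit < kPropertyMacros.size(); ++bit) {
        if (shaderID & (1u << bit)) {
            prefix += "#define ";
            prefix += kPropertyMacros[bit];
            prefix += '\n';
        }
    }
    return prefix;
}
```

Under these assumptions, a shaderID with only the sun light and soft shadow bits set produces a two-line prefix defining SUN_LIGHT and SOFT_SHADOW, and the uber-shader would guard each lighting feature with a matching #ifdef block.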