Ground cover rendering in PURE
Excerpt from SIGGRAPH 2009 presentation
I created a novel technique for rendering fields of alpha blended quads, depth sorted by view direction.
Here I describe the rendering systems in PURE which placed ground cover such as rocks, flowers and grasses in the world.
Grasses and flowers were rendered as alpha blended screen-aligned sprites. To avoid depth sorting artifacts I used predetermined rendering orders that were dependent on view direction.
The playable game area in PURE was a 2 km square. Artists laid down the ground cover using a density map that covers this area at a resolution of one pixel per square meter. A typical density map is shown below.
The density map indexes the 16 distinct ground cover objects available on each level, where the darkest part of the map is index 0 and the lightest is index 15. To give the artist an intuitive way to map the levels, the indexing scheme was designed so that smaller indexes correspond to smaller items of ground cover. Index 0 represents no cover, indexes 1 to 3 represent small, medium and large gravel, and indexes 4 to 15 represent increasing heights of grass and flowers. The gravel ground cover was rendered using hardware-instanced 3D models. All the grasses and flowers were rendered as screen-aligned sprites. Example textures for these are shown below.
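As a rough illustration of the indexing scheme, the sketch below looks up a density map pixel and classifies its index. The `DensityMap` and `CoverKind` types, the one-byte-per-pixel row-major storage, and the function names are assumptions made for this example; they are not taken from the PURE code base.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical categories, derived from the index ranges described above.
enum class CoverKind { None, Gravel, GrassOrFlower };

// Assumed storage: one index (0..15) per byte, row-major, one pixel per square meter.
struct DensityMap {
    int size;                           // side length in meters (and in pixels)
    std::vector<std::uint8_t> indices;  // size * size entries

    std::uint8_t at(int xMeters, int zMeters) const {
        assert(xMeters >= 0 && xMeters < size);
        assert(zMeters >= 0 && zMeters < size);
        return indices[static_cast<std::size_t>(zMeters) * size + xMeters];
    }
};

// Index 0 is empty, 1..3 are gravel sizes, 4..15 are grasses and flowers.
inline CoverKind classify(std::uint8_t index) {
    if (index == 0) return CoverKind::None;
    if (index <= 3) return CoverKind::Gravel;
    return CoverKind::GrassOrFlower;
}
```

With this split, a renderer could route gravel indexes to the instanced model path and the remaining non-zero indexes to the sprite path.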
The grass was rendered in a world-aligned square region fixed around the camera. The region was broken up into 400 tiles (a 20 by 20 grid), each 8 meters square, so the region measured 160 meters on each side.
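The tile arithmetic can be sketched as follows. This is only an illustration: it assumes the region is centered on the tile containing the camera, and the names `tileContaining`, `regionOrigin` and `slotInRegion` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

constexpr int kTileSizeMeters = 8;
constexpr int kTilesPerSide = 20;
static_assert(kTilesPerSide * kTilesPerSide == 400, "400 tiles in the region");

struct TileCoord { int x; int z; };

// Tile containing a world-space position; floor keeps negative coordinates correct.
inline TileCoord tileContaining(float worldX, float worldZ) {
    return { static_cast<int>(std::floor(worldX / kTileSizeMeters)),
             static_cast<int>(std::floor(worldZ / kTileSizeMeters)) };
}

// Minimum-corner tile of the region (assumption: region centered on the camera tile).
inline TileCoord regionOrigin(float camX, float camZ) {
    const TileCoord cam = tileContaining(camX, camZ);
    return { cam.x - kTilesPerSide / 2, cam.z - kTilesPerSide / 2 };
}

// Row-major slot (0..399) of a tile that lies inside the region.
inline int slotInRegion(TileCoord origin, TileCoord tile) {
    const int dx = tile.x - origin.x;
    const int dz = tile.z - origin.z;
    assert(dx >= 0 && dx < kTilesPerSide && dz >= 0 && dz < kTilesPerSide);
    return dz * kTilesPerSide + dx;
}
```

Because the region is world-aligned, the set of tiles only changes when the camera crosses an 8 m tile boundary.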
Each pixel in the density map represented the type of ground cover within a 1 meter square area. Within each 1 meter square I rendered 4 screen-aligned sprites at half-meter intervals, with a small random offset from their grid positions. In total there were 256 sprites on each grass tile (64 square meters with 4 sprites each). Each sprite was aligned to the terrain surface.
The size of a rendered sprite was determined by its index in the density map. However, individual sprites had a small random size offset to give a more natural appearance.
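A minimal sketch of this placement scheme is given below. The jitter and scale ranges, the seeded `std::mt19937` generator, and the `densityAt` callback are all assumptions chosen for illustration; terrain height is left out to keep the example short, and the sketch fills all 256 slots without filtering out non-grass indexes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <random>

struct GrassPoint { float x, z; std::uint8_t type; float sizeScale; };

constexpr int kTileMeters = 8;
constexpr int kPointsPerTile = kTileMeters * kTileMeters * 4;  // 256

// densityAt(xMeters, zMeters) is a hypothetical callable returning the 0..15 index.
template <typename DensityFn>
std::array<GrassPoint, kPointsPerTile> buildTilePoints(int tileX, int tileZ,
                                                       std::uint32_t seed,
                                                       DensityFn densityAt) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> jitter(-0.1f, 0.1f);  // assumed range
    std::uniform_real_distribution<float> scale(0.9f, 1.1f);    // assumed range
    std::array<GrassPoint, kPointsPerTile> points{};
    int n = 0;
    for (int mz = 0; mz < kTileMeters; ++mz) {
        for (int mx = 0; mx < kTileMeters; ++mx) {
            const int wx = tileX * kTileMeters + mx;
            const int wz = tileZ * kTileMeters + mz;
            const std::uint8_t type = densityAt(wx, wz);
            // Four sprites per square meter on a 2 x 2 half-meter grid.
            for (int s = 0; s < 4; ++s) {
                const float gx = wx + 0.25f + 0.5f * (s & 1);
                const float gz = wz + 0.25f + 0.5f * (s >> 1);
                points[n++] = { gx + jitter(rng), gz + jitter(rng), type, scale(rng) };
            }
        }
    }
    assert(n == kPointsPerTile);
    return points;
}
```

Seeding the generator per tile keeps the layout stable when a tile is rebuilt, which matters if tiles are regenerated after leaving a cache.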
Calculating the data for each grass tile was a CPU-intensive operation, so the tile data was cached. The contents of the cache were updated as the camera moved around the world, ejecting tile data that was no longer visible.
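One possible shape for such a cache is sketched below. It is not the PURE implementation: it assumes tiles are keyed by their integer tile coordinates, and that "no longer visible" can be approximated by "outside the square region around the camera". The `TileCache` class and its `update` method are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

using TileKey = std::pair<int, int>;  // (tileX, tileZ)

template <typename TileData>
class TileCache {
public:
    // Keeps exactly the tiles in [originX, originX + side) x [originZ, originZ + side).
    // build(x, z) is called only for tiles not already cached.
    // Returns the number of tiles that had to be built.
    template <typename BuildFn>
    std::size_t update(int originX, int originZ, int side, BuildFn build) {
        assert(side > 0);
        // Eject tiles that have left the region.
        for (auto it = tiles_.begin(); it != tiles_.end();) {
            const int x = it->first.first;
            const int z = it->first.second;
            const bool inside = x >= originX && x < originX + side &&
                                z >= originZ && z < originZ + side;
            if (inside) ++it; else it = tiles_.erase(it);
        }
        // Build any tiles that have entered the region.
        std::size_t built = 0;
        for (int z = originZ; z < originZ + side; ++z) {
            for (int x = originX; x < originX + side; ++x) {
                const TileKey key{x, z};
                if (tiles_.find(key) == tiles_.end()) {
                    tiles_.emplace(key, build(x, z));
                    ++built;
                }
            }
        }
        return built;
    }

    bool contains(int x, int z) const { return tiles_.count(TileKey{x, z}) != 0; }
    std::size_t size() const { return tiles_.size(); }

private:
    std::map<TileKey, TileData> tiles_;
};
```

When the camera crosses one tile boundary, only a single row or column of tiles needs rebuilding, which is where the cache pays for itself.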
The data for each grass sprite was stored as a single float4 vector. The world position of the point was stored in the xyz components, and the grass type and random size offset were combined into the w component. The generation of the render data was further accelerated by prefetching this tile data into the CPU cache.
The vertex stream for each tile was generated once per frame. Although the data for a grass point was stored as a single float4, we copied it into four float4 values to generate the four sprite vertices in the vertex stream.
The w component of each vertex was encoded with the sprite corner number and the randomized scale multiplier. The sprite corner number provides the vertex shader with a lookup into an array of sprite definitions. The array contains the texture coordinates and size offset data for each sprite corner of each grass type. Each sprite has 4 corners, and there are 16 grass types, so the array contains 64 elements.
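The exact bit layout of the w component is not given here, so the following is only a sketch under a stated assumption: the integer part of w carries the index (grass type for a stored point, type * 4 + corner for a vertex) and the fractional part carries the random scale. `Float4` and `expandPoint` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Float4 { float x, y, z, w; };

constexpr int kCornersPerSprite = 4;
constexpr int kGrassTypes = 16;
constexpr int kSpriteTableSize = kCornersPerSprite * kGrassTypes;
static_assert(kSpriteTableSize == 64, "4 corners x 16 grass types");

// Expands one cached grass point into the four vertices of its sprite.
// Assumed encoding: point.w = type + scaleFraction, with scaleFraction in [0, 1).
// Each output vertex gets w = (type * 4 + corner) + scaleFraction.
inline std::array<Float4, 4> expandPoint(const Float4& point) {
    const float typeF = std::floor(point.w);
    const float scaleFraction = point.w - typeF;
    const int type = static_cast<int>(typeF);
    assert(type >= 0 && type < kGrassTypes);

    std::array<Float4, 4> verts{};
    for (int corner = 0; corner < kCornersPerSprite; ++corner) {
        const int tableIndex = type * kCornersPerSprite + corner;
        verts[corner] = { point.x, point.y, point.z,
                          static_cast<float>(tableIndex) + scaleFraction };
    }
    return verts;
}
```

All four vertices share the same position; under this scheme it is the per-corner table entry, looked up in the vertex shader, that moves each vertex out to its corner of the sprite.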
With 256 grass points per tile, each expanded to four 16-byte vertices, the vertex stream is 16 KB for each tile. The vertex information is decoded in the vertex shader using the following code:
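The shader listing itself is not included in this excerpt. As a stand-in, the C++ sketch below mirrors what such a decode step might look like on the CPU, assuming the integer part of w is the index into the 64-entry sprite table and the fractional part is the scale. The `SpriteCorner` layout and `decodeVertexW` are hypothetical and should not be read as the original shader.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Stream size check: 256 points x 4 vertices x one float4 (16 bytes) = 16 KB.
static_assert(256u * 4u * 4u * sizeof(float) == 16u * 1024u, "16 KB per tile");

// Hypothetical table entry: texture coordinate and size offset for one sprite corner.
struct SpriteCorner { float u, v, offsetX, offsetY; };
using SpriteTable = std::array<SpriteCorner, 64>;

struct DecodedVertex {
    const SpriteCorner* corner;  // entry selected by the integer part of w
    float scaleFraction;         // fractional part of w
};

// Splits w into a table lookup and a scale value (assumed encoding, see above).
inline DecodedVertex decodeVertexW(const SpriteTable& table, float w) {
    assert(w >= 0.0f && w < 64.0f);
    const float index = std::floor(w);
    return { &table[static_cast<std::size_t>(index)], w - index };
}
```

In a shader the same split would typically be written with the built-in floor and fractional-part intrinsics, with the sprite table held in constant registers.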
To avoid depth sorting issues with the alpha geometry, the 8 m square grass tiles are rendered in order from furthest to nearest the camera. We also attempt to render the sprites on each individual tile from back to front. To do this we pre-calculate the ideal rendering order for a fixed number of view directions around a standard tile. Then we find the closest match to the current camera direction and use its pre-calculated render order. On PURE we found that 16 pre-calculated views was a good trade-off between storage size and visual quality.
For performance reasons we found that these pre-calculated render orders couldn't be created with per-sprite granularity. Because we copy the tile cache information into the vertex stream once per frame, memory access speed is an important factor in CPU performance. We found that performance is best when tile data is read in cache-line sized chunks. On our target platforms this means processing sprites in groups of 8.
We arranged the memory of each grass tile into cells of 4x2 sprites. Each cell contains one cache line's worth of data, as shown below.
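Both selection steps can be sketched briefly. The example assumes the 16 pre-calculated views are evenly spaced around the vertical axis and that only the horizontal part of the camera direction matters; `closestViewIndex`, `TileCenter` and `sortFarthestFirst` are illustrative names, not the original code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

constexpr int kViewCount = 16;

// Index (0..15) of the pre-calculated view closest to the camera's horizontal direction.
inline int closestViewIndex(float dirX, float dirZ) {
    assert(dirX != 0.0f || dirZ != 0.0f);
    const float kTwoPi = 6.28318530718f;
    float angle = std::atan2(dirZ, dirX);  // range (-pi, pi]
    if (angle < 0.0f) angle += kTwoPi;     // remap to [0, 2*pi)
    const long nearest = std::lround(angle / (kTwoPi / kViewCount));
    return static_cast<int>(nearest % kViewCount);  // 16 wraps back to view 0
}

struct TileCenter { float x, z; };

// Orders tiles so the one furthest from the camera is drawn first.
inline void sortFarthestFirst(std::vector<TileCenter>& tiles, float camX, float camZ) {
    auto dist2 = [=](const TileCenter& t) {
        const float dx = t.x - camX;
        const float dz = t.z - camZ;
        return dx * dx + dz * dz;
    };
    std::sort(tiles.begin(), tiles.end(),
              [&](const TileCenter& a, const TileCenter& b) { return dist2(a) > dist2(b); });
}
```

With 16 views each pre-calculated order covers a 22.5 degree wedge, so the chosen order is at most 11.25 degrees away from the true camera direction.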
Tile cache layout showing the division of a tile into 32 sections, each containing 8 screen-aligned sprites.
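The layout in this figure can be expressed as an index mapping. The sketch below assumes each cell spans 4 sprites along x and 2 along z, and that cells are stored row by row; the orientation and the function names are assumptions. With 16-byte float4 sprites, 8 sprites come to 128 bytes per cell.

```cpp
#include <cassert>
#include <cstddef>

constexpr int kSpritesPerSide = 16;  // 8 m tile at half-meter spacing
constexpr int kCellWidth = 4;        // sprites per cell along x (assumed orientation)
constexpr int kCellHeight = 2;       // sprites per cell along z
constexpr int kSpritesPerCell = kCellWidth * kCellHeight;  // 8
constexpr int kCellsAcross = kSpritesPerSide / kCellWidth;  // 4
constexpr int kCellCount = kCellsAcross * (kSpritesPerSide / kCellHeight);  // 32
constexpr std::size_t kBytesPerCell = kSpritesPerCell * 4 * sizeof(float);  // 128

static_assert(kCellCount == 32, "32 cells per tile");
static_assert(kCellCount * kSpritesPerCell == 256, "256 sprites per tile");

// Memory slot (0..255) of the sprite at grid position (sx, sz) within a tile,
// so that the 8 sprites of a cell are contiguous.
constexpr int spriteSlot(int sx, int sz) {
    return ((sz / kCellHeight) * kCellsAcross + (sx / kCellWidth)) * kSpritesPerCell
         + (sz % kCellHeight) * kCellWidth + (sx % kCellWidth);
}

// Byte offset of a cell from the start of the tile's data.
inline std::size_t cellByteOffset(int cell) {
    assert(cell >= 0 && cell < kCellCount);
    return static_cast<std::size_t>(cell) * kBytesPerCell;
}
```

Keeping each cell contiguous means one cell can be copied into the vertex stream with a single cache-line read, whatever order the cells are visited in.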
Tile cache render order shown for two viewing directions. Darker areas are drawn last.
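A simple way to pre-calculate per-cell render orders like these is sketched below. It assumes 16 evenly spaced horizontal view directions, cells that are 2 m by 1 m (4 by 2 sprites at half-meter spacing), and a depth measure equal to the projection of each cell center onto the view direction. The original ordering criterion is not stated, so this is one plausible choice rather than the method used in PURE.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int kCellsX = 4;   // cells across a tile (each 2 m wide)
constexpr int kCellsZ = 8;   // cells down a tile (each 1 m deep)
constexpr int kCells = kCellsX * kCellsZ;  // 32
constexpr int kViews = 16;

using CellOrder = std::array<std::uint8_t, kCells>;

// For each view direction, lists cell indexes from furthest to nearest.
inline std::array<CellOrder, kViews> buildRenderOrders() {
    const float kTwoPi = 6.28318530718f;
    std::array<CellOrder, kViews> orders{};
    for (int v = 0; v < kViews; ++v) {
        const float angle = kTwoPi * v / kViews;
        const float dirX = std::cos(angle);
        const float dirZ = std::sin(angle);
        // Distance of a cell center along the view direction, in meters.
        auto depth = [=](std::uint8_t cell) {
            const float cx = (cell % kCellsX + 0.5f) * 2.0f;
            const float cz = (cell / kCellsX + 0.5f) * 1.0f;
            return cx * dirX + cz * dirZ;
        };
        CellOrder& order = orders[v];
        for (int i = 0; i < kCells; ++i) order[i] = static_cast<std::uint8_t>(i);
        std::stable_sort(order.begin(), order.end(),
                         [&](std::uint8_t a, std::uint8_t b) { return depth(a) > depth(b); });
    }
    return orders;
}

inline const CellOrder& orderForView(const std::array<CellOrder, kViews>& orders,
                                     int viewIndex) {
    assert(viewIndex >= 0 && viewIndex < kViews);
    return orders[viewIndex];
}
```

The whole table is 16 views of 32 one-byte entries, or 512 bytes, and it is shared by every tile because the order depends only on the view direction.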