Dynamic weather part 2 – occlusion queries.

I talk a bit about how the weather system is implemented in this post about snow cover, but I’ll go over it briefly again.

The weather consists of a series of precipitation and temperature fronts that move across the map. Each “front” is a blob-like texture that is rendered to a render target after being combined with a climate data map. The render target is (currently) 256 x 256 for a 1024 x 1024 world map. The final render target represents the amount of precipitation falling, the temperature, and the wind strength/direction at each spot in the world map. Additionally, for regions that are cold and wet enough, I accumulate snow in a separate low resolution snow cover render target. To give you a better idea, my diagnostic visualization for this data looks something like this:

 

According to some folks, reading back data from a render target after several frames of delay shouldn’t cause a stall, but that is not what I’m experiencing. No matter how many frames I wait, I take a big perf hit when reading the data, even if it’s just one pixel. So I’m currently in the process of removing any texture reads (GPU to CPU) in my game engine. I’ve already redone my HDR algorithm so it no longer needs to read exposure from the GPU, and the weather system is the final step. The cost varies greatly depending on the computer I use, but on my main development machine (GeForce 8500) I take a hit of about 5 ms per frame by making any GetData calls on a render target.

There are two reasons I need to read data back from the GPU with this system:

  1. I need to keep the snow cover map in CPU memory, since the graphics device may be reset at any time. It is also needed for game saves.
  2. I need to know the precipitation and temperature at the current world location so I can, for instance, render the appropriate number of rain drops, or control the cloud cover.

I’ve been investigating occlusion queries as an alternative to retrieving this data, and have come up with an implementation that seems to work for me. To take the problem at its most basic, let’s see what we have to do to get the value of a single pixel. Occlusion queries simply allow you to ask how many pixels a Draw call actually ended up drawing. The data is not immediate (it comes back several frames later), but that is not a concern for us. The important part is that they don’t cause a stall. So how can we get the value of a pixel with this? We can draw n primitives to a 1 x 1 render target, where each primitive has a mask value that is compared to the pixel sample. If the mask value is larger than the pixel sample, we can abort the shader output with the clip intrinsic, thereby keeping that pixel from being counted. It looks something like this:

 

float4 PixelShaderFunctionWeather(VertexShaderOutput input) : COLOR0
{
 // Using (0.5, 0.5) for the texcoord, since I'm sampling from a 1x1 RT.
 float4 value = tex2D(ScreenS, float2(0.5, 0.5));
 // clip discards the pixel when (value - mask) is negative, so a pixel is
 // only output when the value is at least as large as the mask.
 // For instance, if we draw with the following masks:
 // 0.25, 0.50, 0.75, 1.00
 // If the pixel value is 0.7, we'll get 2 of 4 pixels drawn.
 clip(value - input.Mask);
 return float4(1, 0, 0, 1); // We can return whatever we want here.
}

 

For example, if we draw 200 primitives (with ever-increasing mask values), and end up having a pixel count of 50 for the occlusion query, we know that the pixel value is (50 / 200), or 0.25. So we just need to prepare a vertex buffer that contains the necessary primitives. We can use a triangle for each. Our vertex only needs to contain position and a mask value:

 

struct OcclusionQueryVertex : IVertexType
{
 public OcclusionQueryVertex(Vector3 position, Color mask)
 {
  Position = position;
  Mask = mask;
 }
 public Vector3 Position; // Position in projection space.
 public Color Mask; // Isolates the component we're interested in and specifies the mask value.
 public static readonly VertexElement[] VertexElements = new VertexElement[]
 {
  new VertexElement(0, VertexElementFormat.Vector3, VertexElementUsage.Position, 0),
  new VertexElement(sizeof(float) * 3, VertexElementFormat.Color, VertexElementUsage.Color, 0),
 };
 private static VertexDeclaration vertexDeclaration = new VertexDeclaration(VertexElements);
 public VertexDeclaration VertexDeclaration
 {
  get { return vertexDeclaration; }
 }
}

 

And our vertex buffer consists of a series of triangles, each with a different mask value. One triangle may look like:

 

 // These weird numbers attempt to cover the entire screen with one triangle.
 vertices[vIndex + 0] = new OcclusionQueryVertex(new Vector3(-3, -2, 0), colorMask);
 vertices[vIndex + 1] = new OcclusionQueryVertex(new Vector3(0, 3, 0), colorMask);
 vertices[vIndex + 2] = new OcclusionQueryVertex(new Vector3(3, -2, 0), colorMask);

PIX capture of the draw call for the occlusion query. Note the mask values in the red component.
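
To illustrate how the whole triangle list might be filled in, here is a minimal sketch. It is not my actual engine code: SketchVertex is a stand-in for OcclusionQueryVertex so the snippet runs without XNA, TriangleListSketch.Build is a hypothetical helper, and the even spacing of thresholds (ending at 255) is an assumption.

```csharp
using System;

// Stand-in for the article's OcclusionQueryVertex so the sketch runs without XNA.
struct SketchVertex
{
    public float X, Y, Z;   // Position in projection space.
    public byte R, G, B, A; // Mask: threshold in one channel, zero in the others.
}

static class TriangleListSketch
{
    // channel: 0 = R, 1 = G, 2 = B, 3 = A.
    public static SketchVertex[] Build(int n, int channel)
    {
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));
        if (channel < 0 || channel > 3) throw new ArgumentOutOfRangeException(nameof(channel));

        // Same oversized triangle as above; it covers the whole render target.
        float[,] corners = { { -3f, -2f }, { 0f, 3f }, { 3f, -2f } };
        var vertices = new SketchVertex[n * 3];

        for (int i = 0; i < n; i++)
        {
            // Assumed spacing: thresholds rise evenly and the last one is 255 (1.0).
            byte threshold = (byte)((i + 1) * 255 / n);
            for (int c = 0; c < 3; c++)
            {
                var v = new SketchVertex { X = corners[c, 0], Y = corners[c, 1], Z = 0f };
                if (channel == 0) v.R = threshold;
                else if (channel == 1) v.G = threshold;
                else if (channel == 2) v.B = threshold;
                else v.A = threshold;
                vertices[i * 3 + c] = v;
            }
        }
        return vertices;
    }
}
```

In a real XNA project the same loop would write OcclusionQueryVertex values (with a Color mask) into the array that backs the vertex buffer.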

 

The vertex shader should just pass through the position. The value n determines our resolution. If we want to be able to discern all 256 different pixel values, we need 256 triangles, each with a different appropriate mask value. After setting up the input texture and setting the vertex buffer, the draw call looks like:

 

 occlusionQuery.Begin();
 device.DrawPrimitives(PrimitiveType.TriangleList, 0, Resolution); // Resolution = n
 occlusionQuery.End();

 

Then we can read from the occlusionQuery several frames later when it is complete (IsComplete). The data we’ve been trying to retrieve can be obtained like so:

 

 byte value = (byte)(occlusionQuery.PixelCount * 255 / Resolution);
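
To check the arithmetic without a GPU, the counting can be simulated on the CPU. This is only a sketch under an assumed mask layout (n thresholds spaced evenly over (0, 1]); SinglePixelReadbackSketch, MaskFor, SimulatePixelCount and Decode are hypothetical names, not part of XNA or my engine.

```csharp
using System;

static class SinglePixelReadbackSketch
{
    // Assumed mask layout: n thresholds spaced evenly over (0, 1].
    public static float MaskFor(int index, int n)
    {
        return (index + 1) / (float)n;
    }

    // Mirrors clip(value - mask) on a 1 x 1 target: each triangle adds
    // one pixel to the query's count unless value - mask is negative.
    public static int SimulatePixelCount(float value, int n)
    {
        int count = 0;
        for (int i = 0; i < n; i++)
        {
            if (value - MaskFor(i, n) >= 0f)
            {
                count++;
            }
        }
        return count;
    }

    // Turns a pixel count back into a value in [0, 1].
    public static float Decode(int pixelCount, int n)
    {
        return pixelCount / (float)n;
    }
}
```

With 200 triangles and a pixel value of 0.25, SimulatePixelCount(0.25f, 200) returns 50, and Decode(50, 200) gives back 0.25, matching the example from earlier.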

 

Of course, there are additional complexities. We need to manage a queue of occlusion queries, since we’ll have many in flight at the same time (they take a few frames to return). Or, you can decide to only bother requesting a new value when the old one returns.

The fact that I’m using Color for the mask value bears some mentioning. In my case I need to read all four color components (RGBA). An occlusion query will essentially only let you get a single component. Thus, to get each of the 4 components, I need to make 4 separate draw calls (and thus 4 occlusion queries). I could have used a simple float for my mask, but then I would have to add shader logic (or different techniques) to isolate the particular RGBA component. Using a Color value lets me do this automatically. To isolate the green component, for instance, just make sure the other 3 components (RBA) are all zero in the vertex buffer you use for that call; a zero mask always passes the clip test, so those components are effectively ignored in my shader.
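
The queue of in-flight queries could be managed along these lines. This is a rough sketch rather than my engine code: PendingQuery is a stand-in exposing only the IsComplete and PixelCount members that the real OcclusionQuery provides, and QueryQueue is a hypothetical class.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for XNA's OcclusionQuery; only the two members the sketch reads.
sealed class PendingQuery
{
    public bool IsComplete;
    public int PixelCount;
}

sealed class QueryQueue
{
    private readonly Queue<PendingQuery> inFlight = new Queue<PendingQuery>();
    private readonly int resolution; // Number of triangles drawn per query (n).

    public QueryQueue(int resolution)
    {
        if (resolution < 1) throw new ArgumentOutOfRangeException(nameof(resolution));
        this.resolution = resolution;
    }

    // Most recent decoded value in [0, 1], or null until a query has returned.
    public float? LatestValue { get; private set; }

    public int PendingCount
    {
        get { return inFlight.Count; }
    }

    // Call right after issuing the draw call for a new query.
    public void Enqueue(PendingQuery query)
    {
        inFlight.Enqueue(query);
    }

    // Call once per frame. Drains finished queries in issue order and
    // never waits on one that is still pending.
    public void Poll()
    {
        while (inFlight.Count > 0 && inFlight.Peek().IsComplete)
        {
            PendingQuery done = inFlight.Dequeue();
            LatestValue = done.PixelCount / (float)resolution;
        }
    }
}
```

Checking only the head of the queue keeps results in issue order, so an older value never overwrites a newer one. Reading four components would mean four such queues, one per draw call.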

Snow Cover

Snow cover is a different beast altogether. Instead of a single value (or four values), I need (in my current incarnation) 1024 values for my 32 x 32 map. I work around this by only requesting a small subset each frame. I request the values for the areas currently surrounding the camera every frame, but for those outside of view, I only update a handful each frame. Currently I update 5, so it takes around 200 frames to update the whole map (roughly 7 seconds at 30 FPS).
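
The background refresh of the out-of-view cells amounts to a round-robin cursor over the map. Here is a minimal sketch of that idea; SnowCoverScheduler is a hypothetical class, and it deliberately ignores the cells near the camera, which are requested every frame anyway.

```csharp
using System;

sealed class SnowCoverScheduler
{
    private readonly int cellCount;
    private readonly int cellsPerFrame;
    private int cursor;

    public SnowCoverScheduler(int cellCount, int cellsPerFrame)
    {
        if (cellCount < 1) throw new ArgumentOutOfRangeException(nameof(cellCount));
        if (cellsPerFrame < 1) throw new ArgumentOutOfRangeException(nameof(cellsPerFrame));
        this.cellCount = cellCount;
        this.cellsPerFrame = cellsPerFrame;
    }

    // Frames needed to visit every cell once (rounded up).
    public int FramesForFullPass
    {
        get { return (cellCount + cellsPerFrame - 1) / cellsPerFrame; }
    }

    // Returns the cell indices to query this frame and advances the
    // cursor, wrapping around at the end of the map.
    public int[] NextBatch()
    {
        var batch = new int[cellsPerFrame];
        for (int i = 0; i < cellsPerFrame; i++)
        {
            batch[i] = cursor;
            cursor = (cursor + 1) % cellCount;
        }
        return batch;
    }
}
```

For 1024 cells at 5 per frame, FramesForFullPass is 205, which is about 6.8 seconds at 30 FPS. A cell index maps back to the 32 x 32 map as x = index % 32 and y = index / 32.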

Performance considerations

It goes without saying that you need a separate draw call for each query you do. If your performance bottleneck is already the number of draw calls, then using this technique probably isn’t a good choice.

Conclusion

This appears to have addressed the perf issues I was having with reading back from the GPU. If I were designing my weather system from scratch again, however, I might try to make something that doesn’t require the GPU for calculations. This is a fair amount of work to go through just to get single values.

3 comments on “Dynamic weather part 2 – occlusion queries.”

  1. Nice solution. But I think it is a bit overkill.

    I think the GPU stall occurs whenever you try to read back a texture that is currently in use (that is, bound to a sampler and being read in a shader). The number of frames that have passed since you wrote to that render target doesn’t matter.

    Maybe the solution would be to “double buffer”: have more than one snow cover texture and ping-pong between them every frame. That way you will be updating one while reading the previous one without stalls.

    Anyway, this is just an idea. I have not tested it myself.

  2. Well that’s exactly what my previous solution did (actually n buffers, where n was the number of frames I delayed before reading back). I still incurred a big “fixed cost” performance hit when retrieving the data for a texture that was no longer in use. The first texture readback in any draw cycle would result in a several millisecond delay (and any subsequent ones would only be dependent on the amount of data being transferred, which is almost nothing).

    I’m not sure if this is some flaw in the way XNA implements the texture readback on top of DirectX, or a peculiarity with my particular graphics card, or what.

  3. I’ve verified this with a small test project that reads (GetData) one pixel from a render target that was rendered 3 frames earlier. The cost incurred by this varies from 0.1 ms to 4.5 ms per frame, depending on various factors: how much I render in a frame, where I make the GetData call, PC vs Xbox, etc.

    I see generally higher times on the PC.

    Calling GetData must cause some sort of synchronization, and as a result it’s a very “unstable” fixed cost.
