Xbox/XNA performance tuning

As I was finishing up work on changes to Tank Negotiator’s networking session and controller sign-in code, I re-enabled a debug FPS component to check my frame rate (I forget why).

I noticed that while I often maintained 60 FPS, the frame rate occasionally dropped to the upper teens and sometimes stayed in the thirties for sustained periods. So for the past few days I’ve been busy measuring performance and making optimizations.

CPU bound

The first thing I did was look at a frame capture in PIX (on Windows). I was shocked to find that I was making 286 draw calls in a frame with 4 players present (and not doing a whole lot).

Well I guess I wasn’t altogether shocked. This game was mostly written before I had much experience writing rendering code. The code is organized according to standard OOP principles where objects (for instance the player’s tank) are responsible for updating and drawing themselves. I don’t have a separate piece of rendering code that organizes meshes and effects properly in order to minimize draw calls and state switching.

I then started timing the Update and Draw cycles. Sure enough, their combined time frequently exceeded 16.666ms (the per-frame budget at 60 FPS), causing the drop in frame rate.
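
A minimal sketch of how this kind of timing could be done, assuming a hypothetical FrameTimer helper built on System.Diagnostics.Stopwatch (the names here are illustrative, not the game's actual code):

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: records how long the Update and Draw work of one
// frame takes, so the total can be compared with the 60 FPS frame budget.
public sealed class FrameTimer
{
    // 1000 ms / 60 frames, roughly 16.67 ms per frame.
    public const double FrameBudgetMs = 1000.0 / 60.0;

    private readonly Stopwatch watch = new Stopwatch();
    private double updateMs;
    private double drawMs;

    public double UpdateMs { get { return updateMs; } }
    public double DrawMs { get { return drawMs; } }

    // True when the two cycles together no longer fit into one frame.
    public bool OverBudget { get { return updateMs + drawMs > FrameBudgetMs; } }

    public void MeasureUpdate(Action update) { updateMs = Measure(update); }
    public void MeasureDraw(Action draw) { drawMs = Measure(draw); }

    private double Measure(Action section)
    {
        watch.Reset();
        watch.Start();
        section();
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds;
    }
}
```

In a game loop the two Measure calls would wrap the bodies of Update and Draw, and the values could be shown next to the FPS counter.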

Though it’s tempting to believe that all the lighting and particle effects (often taking up large portions of the screen, with lots of overdraw) eat up GPU cycles, I have yet to encounter a scenario where this is true. If, when I experience big frame rate drops, I pause the game (which retains the Draw cycle, but which brings the Update cycle down to nearly 0ms), the frame rate always picks back up to 60.

So it would be a complete waste of time to look for GPU optimizations.

Draw vs Update

Surprisingly, the Draw cycle generally takes more CPU time than Update. I’ve managed to reduce this quite a bit by minimizing my draw calls. I’ve been able to cut the number of draw calls by more than 50% just by re-factoring the code in key places.

Unfortunately there doesn’t seem to be any one big culprit in either the Update or Draw cycles. I’ve been able to make good improvements, but just by gradually nibbling away at various things. As of now, I never drop below 30 FPS for high action frames. But I’m still hoping I can make the bar 60 FPS.

Draw cycle

The Draw cycle has been by far the hardest. I’ll go through some of the optimizations I’ve made – what’s worked and what hasn’t.

I use shadow volumes for my shadows. These require separate vertices/indices for each mesh. However, I was not using vertex buffers – I was just sending the data anew each time (with DrawUserPrimitives). The meshes in my game aren’t big, but this still resulted in several KB of vertex data being sent each frame. Converting to proper vertex buffers resulted in a pretty big improvement.

I’ve also targeted certain objects for instancing. Bullets used to be a separate draw call each (I know, horrible). They are now instanced, along with power ups and a few other meshes.

In one area, I’ve been surprised at the lack of effect my optimizations had. The HUD for 4 players used to result in 48 draw calls. It cost me between 2.5 and 3ms of Draw time (determined by switching them on and off). I kept switching effects and textures when rendering. Through a combination of code re-factoring and texture atlasing, I’ve been able to bring this down to 9 draw calls. Unfortunately, it still takes 2ms of Draw time.

This is frankly one area where I’m confused – 9 SpriteBatch calls, with a total of 48 sprites (in all, not each). 2ms? I’m really not sure what is going on here.

Update cycle

One problematic area I know I have is with lasers. A laser beam is shot and it bounces off walls until it hits a target. On each bounce it needs to check all wall pieces to see if there has been a collision – up to a maximum of 12 bounces. There might be 70 walls that it could bounce off. Multiply that by four players, and give them the highest laser upgrade (which results in a laser that can travel twice the length). That’s a lot of collision detection.

8 laser bounces.

I hadn’t noticed lasers being used a lot during my most serious frame rate drops, but still – a single laser could take up to 2ms of Update time to perform the collision detection. So it’s definitely an area to improve.
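
To illustrate why the cost adds up, here is a rough sketch of a brute-force bounce loop. It assumes walls are plain 2D line segments and ignores targets and beam length; Wall, LaserTracer and Epsilon are made-up names, not the game's code:

```csharp
using System;

// Hypothetical wall: a 2D line segment from (Ax, Ay) to (Bx, By).
public struct Wall
{
    public float Ax, Ay, Bx, By;
    public Wall(float ax, float ay, float bx, float by) { Ax = ax; Ay = ay; Bx = bx; By = by; }
}

public static class LaserTracer
{
    // Ignore hits closer than this so the beam does not re-hit the wall it just left.
    const float Epsilon = 1e-4f;

    // Follows a beam from (ox, oy) in direction (dx, dy) and returns the bounce count.
    // Every bounce tests every wall, so the work is roughly bounces * walls.
    public static int Trace(float ox, float oy, float dx, float dy, Wall[] walls, int maxBounces)
    {
        int bounces = 0;
        while (bounces < maxBounces)
        {
            float bestT = float.MaxValue;
            int hit = -1;
            for (int i = 0; i < walls.Length; i++)
            {
                float t;
                if (RayHitsWall(ox, oy, dx, dy, ref walls[i], out t) && t < bestT)
                {
                    bestT = t;
                    hit = i;
                }
            }
            if (hit < 0) break; // nothing left to bounce off

            // Move to the hit point, then reflect: d' = d - 2 (d . n) n / |n|^2
            ox += dx * bestT;
            oy += dy * bestT;
            float nx = -(walls[hit].By - walls[hit].Ay);
            float ny = walls[hit].Bx - walls[hit].Ax;
            float k = 2f * (dx * nx + dy * ny) / (nx * nx + ny * ny);
            dx -= k * nx;
            dy -= k * ny;
            bounces++;
        }
        return bounces;
    }

    // Solves O + t*d = A + u*e for t (along the ray) and u (along the wall, 0..1).
    static bool RayHitsWall(float ox, float oy, float dx, float dy, ref Wall w, out float t)
    {
        t = 0f;
        float ex = w.Bx - w.Ax, ey = w.By - w.Ay;
        float denom = dx * ey - dy * ex;
        if (denom == 0f) return false; // beam is parallel to the wall
        float px = w.Ax - ox, py = w.Ay - oy;
        t = (px * ey - py * ex) / denom;
        float u = (px * dy - py * dx) / denom;
        return t > Epsilon && u >= 0f && u <= 1f;
    }
}
```

With 70 walls and 12 bounces, one beam already means up to 840 segment tests, before multiplying by the number of players.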

It’s well-known that CPU cache misses are expensive on the Xbox’s Xenon processor. I noticed that the data needed for laser collisions was spread apart in different allocations. I thought I might try putting all that data in a single block of memory in the hopes that this would avoid cache misses and perhaps improve performance. Unfortunately it had very little effect (I really wish there were a way to actually measure if cache misses were affecting performance on the Xbox).

What did work was to micro-optimize all the code involved in the collision detection. This involved the following:

  • manually inline most functions
  • pass all Vector2 parameters by ref
  • replace Vector2 function calls with “manual” calculations. Don’t use the Vector2 + operator, or the Min/Max functions. Instead do the calculations individually on the X and Y components.

These simple but annoying optimizations cut the laser collision detection cost by about 50%. They don’t have any effect on Windows, but they make a big difference on the Xbox, which uses the .NET Compact Framework.
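
As an illustration of the kind of change listed above, here is a small before/after sketch. Vec2 is a stand-in for XNA's Vector2 so the example compiles on its own, and both functions are hypothetical examples rather than the game's collision code:

```csharp
using System;

// Stand-in for XNA's Vector2 (an assumption, to keep the sketch self-contained).
public struct Vec2
{
    public float X, Y;
    public Vec2(float x, float y) { X = x; Y = y; }

    public static Vec2 operator -(Vec2 a, Vec2 b)
    {
        return new Vec2(a.X - b.X, a.Y - b.Y);
    }

    public static float Dot(Vec2 a, Vec2 b)
    {
        return a.X * b.X + a.Y * b.Y;
    }
}

public static class LaserMath
{
    // Straightforward version: structs passed by value, plus an operator
    // call and a helper call on every invocation.
    public static float DistanceSquaredSimple(Vec2 a, Vec2 b)
    {
        Vec2 d = a - b;
        return Vec2.Dot(d, d);
    }

    // Micro-optimized version: parameters passed by ref, no calls,
    // and the math written out per component.
    public static float DistanceSquaredFast(ref Vec2 a, ref Vec2 b)
    {
        float dx = a.X - b.X;
        float dy = a.Y - b.Y;
        return dx * dx + dy * dy;
    }
}
```

The second form is uglier at the call site (the arguments must be local variables or fields passed with ref), which is the "annoying" part.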

I can probably do some algorithmic improvements to reduce the number of walls with which I need to collide lasers. However, I think the next step should be to try to focus more tightly on times where the frame rate drops significantly to see if there are any terribly bad things happening. At least, since I’m CPU bound, I have the option of optimizing either the Update or Draw cycles.

Profiling

After running out of obvious and easy optimizations, I used the SlimTune profiler on Windows to see what the remaining problem code is. Unfortunately there are no big fish to fry. However, I’ve only used it on the entire lifetime of my application. I need to learn how to use it to target specific times when there are frame drops.

In the end, I’m not sure this will be as fruitful as putting in debug switches to turn off various bits of the code and seeing the result live.
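
One possible shape for such switches, as a sketch only (DebugSwitches and the switch names are invented for illustration):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical registry of named on/off switches for parts of the game.
public sealed class DebugSwitches
{
    private readonly Dictionary<string, bool> states = new Dictionary<string, bool>();

    // Anything that has never been toggled counts as enabled.
    public bool IsEnabled(string name)
    {
        bool enabled;
        return !states.TryGetValue(name, out enabled) || enabled;
    }

    // Flip a switch, e.g. from a debug key or controller button handler.
    public void Toggle(string name)
    {
        states[name] = !IsEnabled(name);
    }
}
```

A draw routine would then be wrapped in something like if (switches.IsEnabled("hud")) { ... }, and the change in Draw time read off the on-screen timer while toggling.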

Replays

Though my game doesn’t offer this as a public feature, I do have a way to record games and replay them. This has been essential in ensuring any changes I’ve made haven’t caused any unintentional changes in gameplay. The slightest difference in a collision detection routine, or in the update order of components, could cause things to go awry; if this happens then the replays will no longer function properly. This tends to be very obvious (for example, the replay ends when the recorded replay data runs out, and there is no winner).

Unfortunately, I sometimes encounter situations where an optimization requires a modification that changes the results. One example is a highly-used function that is part of my collision detection. It in turn calls another function whose sole purpose is to return the square of the number passed in.

static float square(float f)
{
    return f * f;
}

Yeah, it’s kind of lame – and it’s also a terrible place to have a function call in the .NET Compact Framework. However, when I tried to replace it by simply multiplying the two terms in the calling function instead, my replays suddenly stopped working!

I vaguely remembered something weird about passing float parameters on the Xbox, and a minute of googling brought me to this page. Insane!

The above function actually converts the float passed in to a double, then back to a float. Then it does the multiplication. Then, it converts the result to a double, and back to a float on return. Some float values must change slightly when they undergo this conversion from double to float and back, and so I started getting different results when I replaced it with the inline version.

[Addendum:] Actually, this isn’t related to round-tripping a float to double and back to a float. As far as I know a double should be able to represent any float perfectly, so this isn’t an issue.

What’s actually going on is that floating point math is performed using double precision, and only converted back to float when necessary. That means:

            float dot = Vector3.Dot(v, s) / lenSq;

can’t be replaced by

            float dot = (v.X * s.X + v.Y * s.Y + v.Z * s.Z) / lenSq;

because the numerator is now kept in double precision when it is divided by the denominator, instead of being rounded to float first, giving a different result. The fix is to cast to float first:

            float dot = (float)(v.X * s.X + v.Y * s.Y + v.Z * s.Z) / lenSq;
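
The effect can be reproduced on any runtime by writing the two rounding orders out explicitly with doubles. This is only a sketch of the behaviour described above, reduced to two components; FloatRounding and its method names are invented for illustration:

```csharp
using System;

public static class FloatRounding
{
    // Inlined form: the numerator stays in double precision through the
    // division and the result is rounded to float only once, at the end.
    public static float DivideWide(float ax, float bx, float ay, float by, float lenSq)
    {
        double numerator = (double)ax * bx + (double)ay * by;
        return (float)(numerator / lenSq);
    }

    // Function-call form (or the inlined form with the explicit cast):
    // the numerator is rounded to float before the division happens.
    public static float DivideRounded(float ax, float bx, float ay, float by, float lenSq)
    {
        float numerator = (float)((double)ax * bx + (double)ay * by);
        return (float)((double)numerator / lenSq);
    }
}
```

For most inputs the two agree, but for some they differ in the last bit of the result, which is enough to make a recorded replay diverge.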

 

 

 

 
