I’ve mainly been working on some visual polish for Tank Negotiator, along with finishing the test automation cases. When I get bored of that, I sometimes work on perf, so this is basically another status report on my performance work.
I’m still missing my 60 FPS goal pretty regularly. I don’t think this is a deal-breaker, but I’d like to inch closer if possible.
In my Update cycle, I’ve managed to cut the cost of tank-vs-wall collision testing to about 25% of what it used to be. It was frequently eating up about 3ms per frame, so this is a good thing. The improvement came solely from how my quad tree is organized: I now test an average of 3 wall pieces per movement instead of 12.
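To make the quad tree idea concrete, here is a minimal sketch of how such a tree narrows a collision query down to nearby wall pieces. It is written in Python for brevity rather than the game's C#, and it is not the game's actual structure: the names `Rect`, `QuadTree`, `MAX_ITEMS` and `MAX_DEPTH`, the thresholds, and the arena layout are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    w: float
    h: float

    def intersects(self, other):
        # Strict overlap test; rectangles that only touch do not count.
        return (self.x < other.x + other.w and other.x < self.x + self.w and
                self.y < other.y + other.h and other.y < self.y + self.h)

    def contains(self, other):
        return (self.x <= other.x and other.x + other.w <= self.x + self.w and
                self.y <= other.y and other.y + other.h <= self.y + self.h)


class QuadTree:
    MAX_ITEMS = 4   # hypothetical split threshold
    MAX_DEPTH = 6   # hypothetical depth limit

    def __init__(self, bounds, depth=0):
        self.bounds = bounds
        self.depth = depth
        self.items = []
        self.children = []

    def insert(self, rect):
        # Push the rect down into a child that fully contains it, if any.
        for child in self.children:
            if child.bounds.contains(rect):
                child.insert(rect)
                return
        self.items.append(rect)
        if (not self.children and len(self.items) > self.MAX_ITEMS
                and self.depth < self.MAX_DEPTH):
            self._split()

    def _split(self):
        b = self.bounds
        hw, hh = b.w / 2, b.h / 2
        self.children = [QuadTree(Rect(b.x + dx, b.y + dy, hw, hh), self.depth + 1)
                         for dx in (0, hw) for dy in (0, hh)]
        old, self.items = self.items, []
        for rect in old:
            self.insert(rect)  # rects that straddle a boundary stay in this node

    def query(self, area, out=None):
        # Collect only the rects stored in nodes whose bounds overlap the area.
        if out is None:
            out = []
        if not self.bounds.intersects(area):
            return out
        out.extend(r for r in self.items if r.intersects(area))
        for child in self.children:
            child.query(area, out)
        return out


# Usage: 16 wall pieces in a 64x64 arena, one small tank near the corner.
walls = [Rect(i * 16 + 2, j * 16 + 2, 4, 4) for i in range(4) for j in range(4)]
tree = QuadTree(Rect(0, 0, 64, 64))
for wall in walls:
    tree.insert(wall)
tank = Rect(3, 3, 2, 2)
hits = tree.query(tank)
```

In this layout the query only looks at the four walls stored in the tank's quadrant instead of all 16. How finely the tree is subdivided, and where pieces that straddle a boundary are stored, decides how many candidates each query has to test.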
I keep wondering if I’m hitting issues with cache misses on the Xbox CPU, given how all my sequentially-accessed data is scattered across memory. But even though I’ve seen expensive pieces of code that have poor memory access patterns, I’ve never realized any improvement by reorganizing the data. Admittedly I’m flying blind here, since there is no way to actually measure cache misses. But I’ve been surprised that, as far as I can tell, this is not a big performance issue.
On the other hand, I have definitely seen performance improvements by manually inlining functions, turning properties into fields and passing structure parameters by ref.
In the Draw cycle, I feel like the number of draw calls has less performance impact than I would expect, while the amount of data I pass has a greater impact than I would expect.
One of the things I draw is bullet holes in the walls. I draw up to 50 of them (50 quads), fading old ones out as new ones are needed. I figured that sending up to 50 quads (200 vertices) to the GPU every frame would be no big deal. But that typically ends up being about 4KB of data, and that single Draw call costs 1.3ms of CPU time! I got a good improvement here by using a DynamicVertexBuffer and only updating it when necessary.
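The "only update when necessary" part is essentially a dirty flag. Here is a minimal sketch of that pattern in Python rather than the game's C#. `FakeVertexBuffer` is a hypothetical stand-in for a GPU-side buffer such as a DynamicVertexBuffer, `BulletHoles` is an invented name, and the 20-byte vertex layout (3 position floats plus 2 texture coordinate floats) is an assumption chosen because 200 such vertices come to about 4KB.

```python
import struct

MAX_HOLES = 50
# Assumed corner layout for one quad: (dx, dy, u, v).
CORNERS = ((-1.0, -1.0, 0.0, 0.0), (1.0, -1.0, 1.0, 0.0),
           (-1.0, 1.0, 0.0, 1.0), (1.0, 1.0, 1.0, 1.0))


class FakeVertexBuffer:
    """Hypothetical stand-in for a GPU vertex buffer; counts uploads."""

    def __init__(self):
        self.upload_count = 0
        self.data = b""

    def set_data(self, data):
        self.upload_count += 1
        self.data = data


class BulletHoles:
    def __init__(self, buffer):
        self.buffer = buffer
        self.holes = []      # (x, y) centres, oldest first
        self.dirty = False

    def add(self, x, y):
        if len(self.holes) == MAX_HOLES:
            self.holes.pop(0)            # recycle the oldest hole
        self.holes.append((x, y))
        self.dirty = True                # vertex data is now stale

    def draw(self):
        if self.dirty:                   # upload only when something changed
            self.buffer.set_data(self._build_vertices())
            self.dirty = False
        # A real renderer would issue its draw call here, reusing the buffer.

    def _build_vertices(self, half=0.5):
        out = bytearray()
        for x, y in self.holes:
            for dx, dy, u, v in CORNERS:
                # 5 little-endian floats = 20 bytes per vertex (assumed layout)
                out += struct.pack("<5f", x + dx * half, y + dy * half, 0.0, u, v)
        return bytes(out)


buf = FakeVertexBuffer()
holes = BulletHoles(buf)
for i in range(3):
    holes.add(float(i), 0.0)
for _ in range(10):                      # ten frames with no new holes
    holes.draw()
```

Ten frames after adding three holes, the buffer has been uploaded once rather than ten times. Any per-frame change to the vertex data, such as a fade animated in the vertices themselves, would also have to set the flag.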
On the other hand, one of the shocking things I found was that I was making separate draw calls for each segment of the lasers (which bounce off walls up to 12 times)! Typically we might be drawing 10 or 20 segments, so that’s 10 or 20 draw calls, and 1 or 2KB of data. Bringing it back down to 1 or 2 draw calls didn’t seem to result in much improvement though.
Another good example of where CPU-to-GPU bandwidth is an issue is a large screen of text I frequently bring up in trial mode. There are about 500 characters in the blurb, and the single SpriteBatch.Draw call that draws them ends up taking 1.5ms. Text is slow! Each glyph quad is 96 bytes of data, so 500 characters end up being nearly 50KB.
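The arithmetic behind that figure is simple enough to check directly. The helper name `payload_bytes` is made up for this sketch, and the per-vertex size assumes four vertices per glyph quad, which the 96-byte figure suggests but the measurements don't confirm.

```python
def payload_bytes(quad_count, bytes_per_quad):
    """Bytes of vertex data needed for quad_count quads."""
    return quad_count * bytes_per_quad


text_bytes = payload_bytes(500, 96)   # 500 glyphs at 96 bytes each
bytes_per_vertex = 96 // 4            # assuming 4 vertices per glyph quad
print(text_bytes, round(text_bytes / 1024, 1), bytes_per_vertex)
```

That works out to 48,000 bytes, or a little under 47 KiB, which is where "nearly 50KB" comes from.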
Another area that wasn’t a huge deal, but definitely showed up on the radar, was a place where I was re-fetching the EffectParameter from the Effect every time I set it. In one place I did this about 100 times per frame, which cost about 0.33ms. Stashing the EffectParameters and re-using them brought it down to about 0.25ms. A minor improvement, but a fairly straightforward change.
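The change amounts to doing the name lookup once and keeping the handle. A minimal Python sketch of the pattern follows; `FakeEffect`, `FakeParameter` and `LaserRenderer` are hypothetical stand-ins invented for illustration, not XNA classes, and the parameter name `"Tint"` is made up.

```python
class FakeParameter:
    """Hypothetical stand-in for a shader parameter handle."""

    def __init__(self, name):
        self.name = name
        self.value = None

    def set_value(self, value):
        self.value = value


class FakeEffect:
    """Hypothetical stand-in for an effect; counts lookups by name."""

    def __init__(self, names):
        self._params = {name: FakeParameter(name) for name in names}
        self.lookups = 0

    def get_parameter(self, name):
        self.lookups += 1
        return self._params[name]


class LaserRenderer:
    def __init__(self, effect):
        # Look the parameter up once and keep the handle around.
        self.tint = effect.get_parameter("Tint")

    def draw(self, colour):
        self.tint.set_value(colour)   # no per-call lookup by name


effect = FakeEffect(["Tint"])
renderer = LaserRenderer(effect)
for i in range(100):                  # roughly 100 sets per frame
    renderer.draw((i, 0, 0))
```

After 100 draws the effect has seen a single lookup instead of 100.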