Well, originally in this post I was just going to tie up some loose ends from the previous one, provide a sample app, etc…
Unfortunately, I more or less abandoned that plan, because I was never satisfied with how robust or performant I managed to get things.
After working on a water flow shader a few weeks ago, however, I thought of another idea for modeling “area sounds”. Instead of a set of discrete sound emitters, we can have a “flow map” of sorts. At each point on the pre-calculated map we store a value that represents the volume and direction of the sound source. At runtime, it is then very simple to turn this into a volume and a position in the stereo field.
The crux, then, is how to generate this map. That’s what this post will discuss.
Generating the audio “flow map”
My previous attempt to model area sounds with a limited set of point emitters was a very rough approximation. What we want ideally is an infinite number of point emitters to describe our sound. The technique I’m about to describe gets us closer to that goal.
So just like we have a height map, we’ll have an emitter map that indicates the intensity of a particular sound (e.g. flowing river sound) at a point. So in the image below, each square in the water would be “painted” with this sound (essentially becoming an emitter), and the land next to it would not.
To generate the flow map for our sound based on the emitter map, for each point on the flow map we sum up the influence of all emitters that affect it. As you can imagine, this involves a lot of calculation, so this must be done offline.
There is a bit of trickery in how to “sum up” the emitters, since we need to define how much each emitter influences the directionality of each point on the flow map.
For an emitter at a point, we’ll calculate the loudness (via some falloff function), and the weighted direction (unit vector towards the emitter, multiplied by the loudness).
We sum these up for all emitters at a point, and then divide the weighted direction by the total loudness accumulated at that point.
Some equations, where aᵢ is the loudness of emitter i at the point and vᵢ is the unit vector toward it:

A = Σ aᵢ (total loudness)
V = (Σ aᵢvᵢ) / A (net direction)
The division by the total loudness makes sense when you think about how an emitter that influences a location loudly but with little directionality should pull that location’s overall direction toward zero.
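To make the summation concrete, here is a brute-force sketch of the idea in Python (the post’s actual code is C#/XNA; the function name, the handling of an emitter sitting on the listener’s own cell, and the clamping of very small distances are my assumptions):

```python
import math

def generate_flow_map(emitter_map, falloff, radius):
    """Brute-force flow map generation: for each map cell, sum the
    influence of every emitter cell within `radius`, then divide the
    weighted direction by the total accumulated loudness."""
    h, w = len(emitter_map), len(emitter_map[0])
    loudness = [[0.0] * w for _ in range(h)]
    direction = [[(0.0, 0.0)] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total_a = 0.0
            vx = vy = 0.0
            for ey in range(h):
                for ex in range(w):
                    if emitter_map[ey][ex] <= 0.0:
                        continue
                    dx, dy = ex - x, ey - y
                    dist = math.hypot(dx, dy)
                    if dist > radius:
                        continue
                    # loudness via the falloff function; distances under
                    # one unit are clamped so 1/R doesn't blow up
                    a = emitter_map[ey][ex] * falloff(max(dist, 1.0))
                    total_a += a
                    if dist > 0.0:
                        # weighted direction: unit vector toward the
                        # emitter, scaled by its loudness
                        vx += a * dx / dist
                        vy += a * dy / dist
            if total_a > 0.0:
                # loud but non-directional contributions pull the
                # normalized direction toward zero
                direction[y][x] = (vx / total_a, vy / total_a)
            loudness[y][x] = total_a
    return loudness, direction
```

With a single emitter, the normalized direction at any other cell comes out as a unit vector pointing straight at it, which matches the intent above.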
Some examples might be useful. Suppose in the following photo, we have a river meandering into a pool. We’ve painted this onto the emitter map (red):
The above shows the flow map (with volume and direction visualized separately). The direction map basically looks like a normal map. You can see the pool in the upper right. Within the pool, the volume is loud, but the directionality is minimal.
A bit of further explanation might be good at this point (or look at the previous post). Just like in my last post, I use the magnitude of the direction vector to determine the speaker balance. So a vector of (0.5, 0.0) is more weighted toward the center of the speaker balance than a vector of (1.0, 0.0), which would be completely in the left ear.
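As an illustration of how a flow-map sample could drive the speaker balance directly, here is a hypothetical linear pan law in Python (the post actually hands positions to XNA’s 3D audio API rather than computing gains by hand; the convention that +x is fully left matches the example above, but the function and its name are mine):

```python
def stereo_gains(direction_x, loudness):
    """Map a flow-map sample to (left, right) gains with a simple
    linear pan law. direction_x in [-1, 1]: +1 is fully left,
    -1 fully right, 0 centered; a shorter direction vector sits
    closer to the center of the stereo field."""
    pan = max(-1.0, min(1.0, direction_x))
    left = loudness * (1.0 + pan) / 2.0
    right = loudness * (1.0 - pan) / 2.0
    return left, right
```

So a direction of (0.5, 0.0) splits the sound 75/25 toward the left, while (1.0, 0.0) puts it entirely in the left ear.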
When these locations are applied directly to the XNA AudioEmitter and AudioListener Position properties, we get the desired effect (assuming our SoundEffect.DistanceScale is set to 1).
Now, imagine I have a world map resolution of 1024 x 1024, and my emitter radius is 50 world units. That ends up being over 10 billion calculations we need to make (1024 x 1024 x 100 x 100)! You can see why we need to do this offline.
We can speed up the calculations greatly by doing them from the point of view of the emitter. For an emitter with a radius of 50, we pre-generate a 101 x 101 influence map that specifies the direction/magnitude at each spot. Then, wherever we have a non-zero value in the emitter map, we just “stamp” this influence map onto the flow map. After all the emitters have been applied, we do one pass over the whole flow map to divide each point’s direction by the loudness accumulated there.
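A Python sketch of the stamping approach (again, the real implementation is C#; the data layout — tuples of accumulated loudness and weighted direction — is my assumption):

```python
import math

def make_influence_map(radius, falloff):
    """Precompute a (2r+1) x (2r+1) stamp of (loudness, wx, wy) for a
    unit-strength emitter at the stamp's center."""
    size = 2 * radius + 1
    stamp = [[(0.0, 0.0, 0.0)] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            dx, dy = radius - x, radius - y  # vector toward the emitter
            dist = math.hypot(dx, dy)
            if dist > radius:
                continue
            a = falloff(max(dist, 1.0))
            if dist > 0.0:
                stamp[y][x] = (a, a * dx / dist, a * dy / dist)
            else:
                stamp[y][x] = (a, 0.0, 0.0)
    return stamp

def apply_emitters(emitter_map, stamp, radius):
    """Stamp the influence map at every non-zero emitter cell, then do
    one final pass dividing direction by accumulated loudness."""
    h, w = len(emitter_map), len(emitter_map[0])
    acc = [[(0.0, 0.0, 0.0)] * w for _ in range(h)]
    for ey in range(h):
        for ex in range(w):
            s = emitter_map[ey][ex]
            if s <= 0.0:
                continue
            for sy in range(2 * radius + 1):
                y = ey + sy - radius
                if not 0 <= y < h:
                    continue
                for sx in range(2 * radius + 1):
                    x = ex + sx - radius
                    if not 0 <= x < w:
                        continue
                    a, wx, wy = stamp[sy][sx]
                    pa, px, py = acc[y][x]
                    acc[y][x] = (pa + s * a, px + s * wx, py + s * wy)
    # normalization pass over the whole flow map
    return [[(a, (wx / a, wy / a)) if a > 0.0 else (0.0, (0.0, 0.0))
             for (a, wx, wy) in row] for row in acc]
```

Each emitter now touches only its own (2r+1)² neighborhood instead of every cell pairing with every emitter, which is what turns the 10-billion-operation brute force into something that runs in seconds.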
Based on my tests, I think most scenarios would take a matter of seconds on a modern computer.
So we have the “science” down. Now we need to talk about the “art” of drawing the emitter maps.
You might be wondering: hey, won’t the sound get louder if I have more emitters? So if I have a river 8 units wide, it will generate twice as much noise as a river 4 units wide?
Well yes, and that actually makes sense, though it can be a little tricky to model correctly. Take a look at the following image:
You can see the difference in volume, which makes sense given what I’ve described. It can make your levels hard to manage, though. Note that where the blue is maxed out, we are essentially “clipping”: we’ve gone past the maximum volume at which we clamp, and have lost information. We might end up hearing full volume well away from the river because of this.
This can be fixed by globally scaling all our values by some amount as we calculate them (though now the skinny river would be even quieter), or by reducing the level of the emitters in places where there are many.
Another option is to employ some sort of “tone mapping” on the volume part of the flow map. Equalize things a bit. I think this could be a nice thing to tweak, since my experimentation suggests that a loud river falls off to lower volume surprisingly quickly with this algorithm. It may actually be accurate in the real world – that far off river may be 1000000 times quieter than the river up close, but you can still hear it because your ears have great dynamic range. That doesn’t work so well in a computer game though.
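One simple candidate for this kind of volume “tone mapping” is a Reinhard-style curve borrowed from HDR rendering — a sketch, with the exposure constant being an arbitrary knob of mine, not something from the post:

```python
def tone_map_volume(v, exposure=4.0):
    """Reinhard-style compression for accumulated flow-map volumes:
    boosts quiet values relative to loud ones and maps any input
    into [0, 1), so summed emitters can never clip."""
    x = v * exposure
    return x / (1.0 + x)
```

This keeps the distant river audible without letting the wide one blow past full volume, at the cost of compressing the dynamic range between them.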
Choosing a fall-off curve
I found that a physically-based fall-off curve works pretty well. This means the volume is inversely proportional to distance. So, choose an arbitrary volume value at a distance of 1, then decrease accordingly with distance (1 / R).
We can use temporal smoothing to remove any discontinuities we encounter on the map. These shouldn’t happen, except maybe near the edge of an emitter’s influence, where the volume drops to near zero (but not quite zero).
Something like the following algorithm (often used in HDR exposure changes) should work, where Tau is an arbitrary value:
```csharp
// Tau is an arbitrary adaptation-rate constant.
float temporalAdaptFactor = 1f - (float)Math.Exp(-secondsElapsed * Tau);
float currentFrameLoudness = previousLoudness
    + (targetLoudness - previousLoudness) * temporalAdaptFactor;
// Do the same with direction, etc...
```
The problem with XACT
XACT really gives me a headache. I had all this working with the SoundEffect API in my test app. Then, I tried to switch it over to XACT, which is what my game engine uses. My audio positioning logic didn’t work at all!
It turns out XACT’s 3D audio works completely differently from the SoundEffect API’s. In fact, it isn’t possible to position things in the xy plane and still control the “speaker balance”. For instance, a sound at (0.5, 0.5, 0) will be positioned (in the audio field) the same as one at (0.707, 0.707, 0).
Since both SoundEffect and XACT ignore the Z coordinate (assuming your listener’s Up vector is (0, 0, 1)), this effectively means that surround sound doesn’t work properly in XACT with my algorithm. I can’t have a sound coming from a particular direction yet balanced slightly toward the opposite side of the audio field, like I can with SoundEffect.
The best you can hope for is a stereo effect (which is fine for me – I don’t have a surround sound system) where the y value controls the L-R balance (ideally it would control the front/back balance).
Another issue I encountered: what is the relationship between SoundEffect.Volume and the volume of an XACT cue? I still haven’t found the exact relationship. Full volume (1) for a SoundEffect appears to land somewhere between 0 and -6 dB in XACT. I did create a volume curve in XACT that approximates how SoundEffect.Volume works (a doubling of the value results in +6 dB), but the absolute volume question remains.
One big drawback of this algorithm is that it assumes static world content wherever it’s used. That river had better not move or dry up! In practice I don’t think this is a big deal. Even if it’s used to model something like waves crashing on an ocean shore, we can still vary the actual sound and volume played based on other factors (the storminess of the sea that day), despite the map being static.
On the other hand, a benefit of the algorithm, at least in my engine, is that region objects don’t need to be loaded. My world is chunk-based, and if an audio emitter object exists in an unloaded chunk, there is no way the player can hear it – even if it would normally be loud enough. When the chunk finally loads, the sound pops into existence. The flow map technique I’ve described here has no such problem.
Here is the Visual Studio project for my test app. It lets you paint onto an emitter map, and then calculate the sound “flow map” from that. You can move around with an Xbox controller to hear how the resulting flow map sounds from different locations (sorry about requiring the controller, but it should be straightforward to add keyboard support).
You can tweak the overall emitter volume scale, and the falloff curve (both these require a recalculation).
I hope this post was useful, and hopefully this technique will let me design nice soundscapes that incorporate rushing rivers, ocean shores, and whatever other audio phenomena are best modeled as “area emitters”.