A problem I’ve been putting off for a while involves figuring out how to implement the sound source for the rushing water of a stream or river. Point sound sources (say from a crackling fire or a ticking clock) are easy and well-supported with 3d positioning by even basic frameworks like XNA.
But sounds that come from areas are a bigger challenge. I toyed briefly with describing these areas as a series of rectangles, but struggled with how to turn that into a sound volume and direction.
The current (very basic) implementation I have consists of a series of point sources (with falloff ranges) which are treated collectively as one. That is, the point sources are separate entities, but the engine recognizes that they all represent the same sound. It chooses the closest/loudest point source and uses that as the loudness/direction of the river sound.
The problem with that is clear from the above picture, where the river curves. On a peninsula on a river bend, the sound suddenly jumps from one location to another as we transfer from one sound source to the other. This could have some temporal smoothing applied to make it less jarring, but it will still sound wrong.
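To make the jump concrete, here is a minimal sketch of the closest-source approach. The coordinates and the helper names (`nearest_source`, `direction_to`) are made up for illustration and are not from my engine.

```python
import math

def nearest_source(listener, sources):
    """Return the source closest to the listener (the one that would be loudest)."""
    return min(sources, key=lambda s: math.dist(listener, s))

def direction_to(listener, source):
    """Unit vector pointing from the listener towards the source."""
    dx, dy = source[0] - listener[0], source[1] - listener[1]
    length = math.hypot(dx, dy)
    return (dx / length, dy / length) if length > 0 else (0.0, 0.0)

# Two river sources on opposite sides of a peninsula (hypothetical positions).
sources = [(-10.0, 0.0), (10.0, 0.0)]

# The listener takes a tiny step across the midline of the peninsula.
left = direction_to((-0.1, 0.0), nearest_source((-0.1, 0.0), sources))
right = direction_to((0.1, 0.0), nearest_source((0.1, 0.0), sources))
```

A step of 0.2 units flips the perceived direction from hard left to hard right, which is the discontinuity described above.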
Another option I considered involves calculating the “average point source”. One problem with that is that we often use more sound sources in areas with more complex geography (e.g. a curvy section of river), and the average would then be more heavily weighted towards that area. Additionally, averaging two sources that are a good distance apart might result in an average position that is within hearing range for the player, despite the fact that the two original sources may actually have been out of hearing range.
In short, there are many challenges with using point sources to model an area emitter.
I decided to code up a small test app to help with sorting these issues out, and to see if I could indeed come up with a workable solution. This also led me to better understand how sound is positioned in XACT or with the XNA SoundEffect APIs. So perhaps a short digression here is useful.
The final sound consists of a volume, and a position in the “sound field”. The latter corresponds to an L-R balance in a stereo system, or a position in a 2d plane in a 5.1 surround-sound system. So in the sound field, you might have the following:
I had never considered exactly how the 3d sound stuff worked – I just supplied it with positions for the AudioEmitter and the AudioListener and hoped for the best. But how do these positions translate into an actual balance of sound within a surround-sound or stereo system?
The SoundEffect API has a global static property, SoundEffect.DistanceScale, that provides this mapping. In XACT this can be controlled per cue. So by setting this value, we can explicitly control where our sound shows up in the sound field. For instance, we could just set DistanceScale to 1, and then use a unit vector (as the delta between the AudioListener and AudioEmitter) to position the sound. Also important is to give your AudioListener and AudioEmitter the same Forward and Up vectors (these essentially define the sound coordinate system).
An important point to note is that the distance between AudioListener and AudioEmitter doesn’t affect the volume of the sound by default – it only affects the perceived direction. If you’re using XACT, you can set up an RPC to map Distance to a Volume.
With this in mind, I set off to figure out how to turn a bunch of river sound sources into a sound of the correct volume explicitly positioned in the sound field.
First, let’s define how volume falls off over distance for one of these point sources. A rigorous look at the science of sound is beyond the scope of the article (and beyond my desired time investment), so we’ll let ourselves be rather liberal with how we do this. As long as it sounds good.
Our point sources are really area sources, so it makes sense to define a zone around the sound source where volume does not fall off at all. Beyond that, sound pressure should fall off as 1/r, where r is the distance from the source.
So if we were to graph this, it might look like the following:
There are two values we need to control here per sound source. d1 represents the region over which sound is at full volume. d2 represents the region over which it then fades to “zero”. Of course, it doesn’t actually reach zero until infinity. Finite sound sources are convenient, so we’ll define a near zero value ε which represents some very quiet volume.
The curve across d2 above can be expressed as:
y = F / (x + a)
where F and a are constants. We want to express this in terms of d1 and d2, however, as that is more convenient. At distance d1, we can say the volume is 1 (max volume), and at distance (d1 + d2) we can say it is ε. Therefore:
1 = F / (d1 + a)
ε = F / (d1 + d2 + a)
Doing the algebra, we get the following equations for F and a:
a = ε d2 / (1 - ε) - d1
F = ε d2 / (1 - ε)
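As a quick check on the algebra, here is a small sketch of the falloff as a function. The names `falloff_constants` and `volume`, the default ε of 0.01, and the clamp to silence beyond d1 + d2 are assumptions for illustration.

```python
def falloff_constants(d1, d2, eps):
    """Solve 1 = F / (d1 + a) and eps = F / (d1 + d2 + a) for F and a."""
    F = eps * d2 / (1.0 - eps)
    a = F - d1
    return F, a

def volume(distance, d1, d2, eps=0.01):
    """Volume in [0, 1]: full inside d1, y = F / (x + a) across d2, silent beyond."""
    if distance <= d1:
        return 1.0
    if distance >= d1 + d2:
        return 0.0  # assumption: treat anything past the finite range as silent
    F, a = falloff_constants(d1, d2, eps)
    return F / (distance + a)
```

With d1 = 5 and d2 = 20, the curve gives exactly 1 at distance 5 and ε at distance 25. The clamp introduces a step of size ε at the outer edge, which should be inaudible if ε is small.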
So now we have a way of describing the volume over distance when given (1) the range of full volume, and (2) the range at which it falls off to “zero”. Visualized as placed across a river, it might look like this:
Then we would place a series of these along the river’s path.
Note that this is just one way to represent volume over distance. You’re free to choose whatever falloff suits your particular scenario.
What about direction?
We talked about sound volume changing over distance. What about direction? How does our distance from the source affect the balance in the sound field? Obviously, when far away, the sound comes from the direction of the source. As we enter into the sound area however, it gradually shifts direction until it is coming from all around us as we reach the center of our sound. Sort of like how gravity would gradually decrease to zero as you approach the center of the earth, since there is more and more mass pulling outward on you as you descend. As you walk into the river, the sound comes from all around you.
Even right at the edge of the solid/full part of our sound area, the sound won’t be quite as directional as when far away (and our source area approximates a sound point). So my solution is to define some value – let’s call it α – that lies somewhere between 0 and 1 and defines how close to the center the balance is. I’m sure there is a rigorous calculation that would let you come up with a correct value, but since these are all just approximations of real world phenomena I have just chosen an arbitrary value that sounds good (0.4).
Beyond this, as we reach the outer limits of the sound distance, the sound becomes more completely directional. So the “balance factor” goes from 1 to 0 (from Center to Full Directional) as you move away from the sound source. For the outer range of the sound, I just use the sound intensity to control the directionality. Our graph thus looks something like this:
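One way to sketch this balance factor in code is shown below. The linear blend inside the full-volume zone is an assumption (the text only fixes the endpoints, 1 at the center and α at the edge), and `balance_factor` is a made-up name. The `volume` argument is whatever the falloff curve returns at that distance.

```python
ALPHA = 0.4  # balance factor at the edge of the full-volume zone

def balance_factor(distance, d1, volume, alpha=ALPHA):
    """1.0 means centered (sound all around us), 0.0 means fully directional.

    Inside the full-volume zone: blend from 1 at the center to alpha at the
    edge (assumption: linear blend). Outside: scale alpha by the sound
    intensity, so the factor fades towards 0 as the volume does.
    """
    if distance <= d1:
        t = distance / d1 if d1 > 0 else 1.0
        return 1.0 + (alpha - 1.0) * t
    return alpha * volume
```

Because the volume is 1 at the edge of the full-volume zone, the two branches meet at α and the factor stays continuous.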
Putting it all together
So once we have all these sound sources, how do we combine them? Remember, together these sound sources are supposed to form one single entity. If we were to add their contributions they would weight more heavily towards regions with a denser collection of sound sources – even though that denser collection doesn’t affect the sound field significantly. Instead, we want to find the maximum contribution of any sound source at a point.
For volume this is fairly straightforward. We just take the maximum intensity level calculated from any of the sound sources. Here is a visualization for a selected point (center of the cross-hairs) surrounded by sound sources:
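A small sketch of max versus sum shows why adding contributions is the wrong choice. The linear falloff and the source positions here are hypothetical and chosen only to keep the numbers easy to follow.

```python
import math

def combined_volume(listener, sources, falloff):
    """Max of the per-source volumes; falloff maps a distance to a volume in [0, 1]."""
    return max((falloff(math.dist(listener, s)) for s in sources), default=0.0)

def summed_volume(listener, sources, falloff):
    """Sum of the per-source volumes, shown only for comparison."""
    return sum(falloff(math.dist(listener, s)) for s in sources)

def linear(distance):
    """Hypothetical falloff: fades linearly to silence over 10 units."""
    return max(0.0, 1.0 - distance / 10.0)

listener = (0.0, 0.0)
sparse = [(5.0, 0.0)]                           # one source marks this stretch
dense = [(5.0, 0.0), (5.0, 0.5), (5.0, -0.5)]   # same stretch, three sources
```

With the max, both layouts give a volume of 0.5. With the sum, the dense layout is nearly three times louder even though it represents the same piece of river.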
When it comes to directionality though, how can we take the max of the contributions from the sound sources?
Remember that if we were just to average all the values, we would end up with scenarios like this:
What other options do we have? One possibility is to divide the 2-d sound field into n arcs (say, 10 arcs of 36 degrees each), choose the max volume source in each arc, and then somehow combine them afterward. This would address the issue above, since the sound source on the far right would be essentially ignored due to the closer one. Then there is the question of how to combine the results from each arc. There are also issues with discontinuities as a sound source moves from one arc to the next and a previously ignored sound source jumps into play (there are ways to mitigate this which I’ll get to later).
But there are more fundamental issues here. Say for instance you have two sound sources in an arc. You need to choose one to represent that arc, but each may have vastly different balances (e.g. one just slightly off center, and one at the extreme end of the audio field) despite having nearly equal strengths (and thus both exerting a strong influence on the final result). So it doesn’t seem this solution is sufficient for our purposes.
This might be a good time to bring up a small point which is essential for understanding some of the remaining diagrams. It is important to highlight the distinction between plotting the position of the sound sources (which is what we have mainly seen so far) and plotting the position/magnitude of the sound source within the audio field. Here is an example of the same data set in both versions:
So back to our question – what other approaches can we use? Let’s look at the data we have to express each point:
- a direction in the sound field, i.e. the balance. This is a 2-d vector. But it is not a unit vector, since the balance can be anywhere (and this is important).
- a magnitude
It should be clear that 2 dimensions is not sufficient to express this. We can’t represent balance properly with a 2-d point and still include magnitude (the diagram above shows balance and magnitude combined, so information is lost in the plot).
So what if we introduce a third dimension? The balance can be represented by the direction of a 3-d vector: take a 3-d unit vector whose xy projection looks like our familiar 2-d balance vector. The angle with the xy plane would represent the relative position in our audio field. The magnitude of the 3-d vector can then represent volume, so we scale the unit vector by it.
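Here is one possible reading of that construction as code. The function name `to_3d` and the `directionality` parameter (0 for centered, 1 for fully directional, i.e. one minus the balance factor) are assumptions for illustration.

```python
import math

def to_3d(direction, directionality, volume):
    """Pack balance and volume into a single 3-d vector.

    direction: 2-d unit vector from the listener towards the source.
    directionality: 0.0 (centered) to 1.0 (fully directional).
    volume: 0.0 to 1.0, becomes the length of the result.
    """
    # 2-d balance vector: points at the source, shorter when more centered.
    bx = direction[0] * directionality
    by = direction[1] * directionality
    # Lift onto the unit sphere: a centered sound points straight up the z axis.
    bz = math.sqrt(max(0.0, 1.0 - bx * bx - by * by))
    # Scale the unit vector so that its magnitude carries the volume.
    return (bx * volume, by * volume, bz * volume)
```

A fully centered sound at half volume becomes (0, 0, 0.5), while a fully directional one lies flat in the xy plane.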
It might look like this:
And a cross section (zx projection) along a 1-d line in our world:
But how does this help us? Well, look what happens when we describe a sphere that just encompasses all points (we’ll use a circle in the cross sectional representation to make it easier to understand):
Note how the “sphere” moves over to the left once the vector for a4 is removed. It turns out we can use this fact to generate our final sound direction and balance. This can sort of be considered a 3d version of taking the min/max of a series of scalar values.
I don’t have any sort of rigorous mathematical proof here, but it does seem to produce a reasonable result. I use the center of the resulting sphere to determine my final direction and balance.
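For anyone who wants to experiment, here is a rough sketch of finding the sphere center. It uses the Bădoiu-Clarkson iteration, which is a simple approximation I am substituting here for illustration and not necessarily what my test app does. Normalizing the center and taking its xy projection as the balance is likewise an assumption about how to read the result.

```python
import math

def enclosing_sphere_center(points, iterations=1000):
    """Approximate center of the smallest sphere enclosing the 3-d points.

    Badoiu-Clarkson iteration: repeatedly step towards the farthest point,
    with a step size that shrinks as 1 / (i + 1).
    """
    center = list(points[0])
    for i in range(1, iterations + 1):
        farthest = max(points, key=lambda p: math.dist(center, p))
        step = 1.0 / (i + 1)
        center = [c + (f - c) * step for c, f in zip(center, farthest)]
    return tuple(center)

def final_balance(points):
    """2-d balance vector: xy projection of the normalized sphere center."""
    cx, cy, cz = enclosing_sphere_center(points)
    length = math.sqrt(cx * cx + cy * cy + cz * cz)
    if length == 0.0:
        return (0.0, 0.0)
    return (cx / length, cy / length)
```

The points passed in are the 3-d vectors built from each source's balance and volume. Points well inside the sphere do not move its center, which is what makes this behave like a max and not an average.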
With this algorithm, we can see that the problems outlined with other techniques go away:
This post has already been way too long, so I’ll prepare another one that has a final summary and a sample project.