Final Project - Acoustic Simulation for Human Echolocation

For my final project I propose to develop a system for simulating impulse responses in virtual 3D spaces.

There has been substantial work on acoustic simulation for architectural applications, virtual reality, and gaming. Most of it focuses on a set of stationary or moving sources with a single listener position that is fixed, on a set path, or interactively controllable by a user.

One factor that makes the echolocation application unique is that the source and the listener are coincident. I'm also interested in exploring some optimizations that I haven't seen in existing literature.

Current Approaches

Simulated acoustics are broadly categorized as either wave-based or geometric. Wave-based approaches focus on finding efficient ways to solve the wave equation, typically with finite element or finite difference methods. This approach is generally regarded as the most rigorous but also too computationally expensive for real-time operation. Ray, beam, and frustum tracing are common geometric approaches, where sound propagation is modeled as a beam that is reflected off, diffracted around, and transmitted through the environment.

Another geometric approach, which is related to (and often implicit in) the others, is the Image Source Method (ISM), where sound sources are mirrored across surfaces in the scene, creating virtual image sources. This process can be repeated to the desired reflection order. While this approach is relatively simple to understand and implement, the number of image sources grows exponentially with the reflection order, so it is usually only used for the first several reflections. This is the approach I'd like to explore in this project.
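To make the growth concrete, here is a minimal sketch of a naive image-source enumeration. It is written in Python purely for illustration (the project itself is planned in Julia), and the names reflect, image_sources, and SHOEBOX are hypothetical. It assumes infinite planes with unit normals and performs no visibility or validity checks, so it overcounts compared with a real implementation.

```python
def reflect(point, plane_point, unit_normal):
    """Mirror a 3D point across the plane through plane_point with the given unit normal."""
    d = sum((p - q) * n for p, q, n in zip(point, plane_point, unit_normal))
    return tuple(p - 2.0 * d * n for p, n in zip(point, unit_normal))


def image_sources(source, planes, max_order):
    """Enumerate image sources up to max_order with no visibility checks.

    Returns a list of (order, position, plane_index_sequence) tuples.
    """
    images = [(0, tuple(source), ())]
    frontier = [(tuple(source), ())]
    for order in range(1, max_order + 1):
        next_frontier = []
        for pos, seq in frontier:
            for i, (plane_point, normal) in enumerate(planes):
                if seq and seq[-1] == i:
                    continue  # mirroring across the same plane twice in a row undoes itself
                entry = (reflect(pos, plane_point, normal), seq + (i,))
                next_frontier.append(entry)
                images.append((order,) + entry)
        frontier = next_frontier
    return images


# Hypothetical 4 m x 3 m x 2.5 m shoebox room: (point on plane, inward unit normal)
SHOEBOX = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    ((4.0, 0.0, 0.0), (-1.0, 0.0, 0.0)),
    ((0.0, 0.0, 0.0), (0.0, 1.0, 0.0)),
    ((0.0, 3.0, 0.0), (0.0, -1.0, 0.0)),
    ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
    ((0.0, 0.0, 2.5), (0.0, 0.0, -1.0)),
]

if __name__ == "__main__":
    images = image_sources((1.0, 1.0, 1.0), SHOEBOX, 3)
    for k in range(4):
        print(k, sum(1 for order, _, _ in images if order == k))
```

With N planes this enumeration produces N(N-1)^(k-1) candidate images at order k, which is the exponential growth the optimizations below are meant to attack.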

Goals and Validation

The exponential growth of the image-source method is the main problem implementations run into. The goal of this project is to try the optimizations described below and examine their effect on runtime and scaling behavior. To validate them, I will start with a naive image-source implementation as a baseline.

Possible Optimizations

Leverage listener/source symmetry

Because the listener and the source are coincident, there may be simplifications that can be made about the reflection geometry.
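One candidate simplification, offered here as an illustration rather than a settled result: because mirror reflections preserve distances, a reflection sequence and its reverse have the same path length when the source and listener sit at the same point. Only one of each pair would then need its delay computed, although the arrival directions still differ. The Python sketch below checks this numerically. The helper names and the three walls are hypothetical, the planes are treated as infinite, and visibility is ignored.

```python
import math


def reflect(point, plane_point, unit_normal):
    """Mirror a 3D point across the plane through plane_point with the given unit normal."""
    d = sum((p - q) * n for p, q, n in zip(point, plane_point, unit_normal))
    return tuple(p - 2.0 * d * n for p, n in zip(point, unit_normal))


def round_trip_length(position, planes, sequence):
    """Path length from position back to itself via the given plane sequence.

    With coincident source and listener, this is the distance from the
    position to its own image after mirroring across each plane in turn.
    """
    image = tuple(position)
    for index in sequence:
        image = reflect(image, *planes[index])
    return math.dist(image, position)


S = math.sqrt(0.5)
# Hypothetical walls: (point on plane, unit normal); the second one is oblique
WALLS = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    ((4.0, 0.0, 0.0), (-S, -S, 0.0)),
    ((0.0, 0.0, 2.5), (0.0, 0.0, -1.0)),
]

if __name__ == "__main__":
    p = (1.0, 2.0, 1.5)
    print(round_trip_length(p, WALLS, (0, 1, 2)))
    print(round_trip_length(p, WALLS, (2, 1, 0)))
```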

Identifying reflection cycles

If we can identify cycles in the reflections, rather than continuing to expand virtual image sources we could substitute a simple IIR filter. I suspect that these sorts of cycles are much more common with a coincident source and listener. There may also be efficiency gains (with accuracy trade-offs) from including near-cycles as well.

It's worth noting that any cycle must go through the source, so with a non-coincident source and listener it's unlikely that a cycle would go directly through the listener.
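As a sketch of why an IIR filter could stand in for a cycle: if a cycle returns to the listener every D samples with a gain g per round trip (|g| < 1), the explicit image sources contribute impulses of height g^k at sample kD. A feedback comb filter y[n] = x[n] + g*y[n-D] produces the same train with one multiply-add per sample. The Python below is illustrative only; the function names are hypothetical and a single frequency-independent gain per round trip is an assumption.

```python
def explicit_echoes(delay, gain, length):
    """Impulse response built one image source at a time: gain**k at sample k*delay."""
    h = [0.0] * length
    k = 0
    while k * delay < length:
        h[k * delay] = gain ** k
        k += 1
    return h


def comb_filter(x, delay, gain):
    """Feedback comb filter y[n] = x[n] + gain * y[n - delay]."""
    y = []
    for n, xn in enumerate(x):
        feedback = gain * y[n - delay] if n >= delay else 0.0
        y.append(xn + feedback)
    return y


if __name__ == "__main__":
    length, delay, gain = 10, 3, 0.5
    impulse = [1.0] + [0.0] * (length - 1)
    print(explicit_echoes(delay, gain, length))
    print(comb_filter(impulse, delay, gain))
```

Both calls give the same sequence for an impulse input. A near-cycle would correspond to a delay or gain that drifts slightly between round trips, which the fixed filter would only approximate.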

Pre-process surface pair visibility

This optimization is from (1). By precomputing a visibility matrix between each pair of faces in the scene, you can substantially reduce the number of virtual images generated, many of which would just be culled in a later visibility check anyway.
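A rough sketch of how such a matrix could prune the enumeration follows, in illustrative Python. It is not the algorithm from (1): the visibility test here is a simplified stand-in that only checks whether two convex faces lie partly in front of each other and ignores occluders, so it never culls a valid pair but may keep some invalid ones. All names and the four-face scene (two room walls plus a two-sided partition) are hypothetical.

```python
def signed_distance(point, face):
    """Signed distance from point to the plane of face = (vertices, unit_normal)."""
    vertices, normal = face
    return sum((p - v) * n for p, v, n in zip(point, vertices[0], normal))


def faces_may_see_each_other(a, b, eps=1e-9):
    """True if part of each face lies in front of the other (occlusion ignored)."""
    a_in_front_of_b = any(signed_distance(v, b) > eps for v in a[0])
    b_in_front_of_a = any(signed_distance(v, a) > eps for v in b[0])
    return a_in_front_of_b and b_in_front_of_a


def visibility_matrix(faces):
    """Boolean matrix: entry [i][j] is True if face j may be reached from face i."""
    n = len(faces)
    return [[i != j and faces_may_see_each_other(faces[i], faces[j])
             for j in range(n)] for i in range(n)]


def count_sequences(vis, max_order):
    """Count reflection sequences up to max_order that respect the matrix."""
    n = len(vis)
    frontier = [(i,) for i in range(n)]
    total = len(frontier)
    for _ in range(max_order - 1):
        frontier = [seq + (j,) for seq in frontier
                    for j in range(n) if vis[seq[-1]][j]]
        total += len(frontier)
    return total


def rect_x(x, y0, y1, normal_x):
    """Rectangle in the plane x = const, 2 m tall, spanning y0..y1."""
    verts = [(x, y0, 0.0), (x, y1, 0.0), (x, y1, 2.0), (x, y0, 2.0)]
    return (verts, (normal_x, 0.0, 0.0))


# Two room walls and both sides of a free-standing partition between them
FACES = [
    rect_x(0.0, 0.0, 3.0, 1.0),   # 0: left wall, facing +x
    rect_x(2.0, 1.0, 2.0, -1.0),  # 1: partition side facing -x
    rect_x(2.0, 1.0, 2.0, 1.0),   # 2: partition side facing +x
    rect_x(4.0, 0.0, 3.0, -1.0),  # 3: right wall, facing -x
]

if __name__ == "__main__":
    vis = visibility_matrix(FACES)
    naive = [[i != j for j in range(len(FACES))] for i in range(len(FACES))]
    print(count_sequences(naive, 3), count_sequences(vis, 3))
```

The matrix only has to be computed once per scene, and every sequence it rules out also removes all of that sequence's higher-order descendants.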

Grouping distant virtual sources

Most of the processing time in image-source techniques is spent calculating the higher-order reflections, as the standard implementation grows exponentially with order. Most implementations opting for speed truncate the ISM at a pre-determined reflection order and approximate the reverb tail either by concatenating a decaying noise signal to the impulse response (2) or applying a parametric reverb.

I'd like to explore whether it's advantageous to combine nearby virtual sources at higher reflection orders. I think it would be important to maintain their relative impulse times and perhaps keep track of the overall size of the combined source, but all of the visibility and reflection calculations would be done as if the sources are a single one, saving compute time. By varying the grouping algorithm it might be possible to simulate the reverb tail without switching regimes entirely.
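One possible grouping scheme, sketched in illustrative Python: bucket the high-order image sources into a uniform grid and represent each occupied cell by its centroid, while keeping the members' individual arrival times and the spatial extent of the group. The fixed grid, the function name group_sources, the dictionary layout, and the 343 m/s speed of sound are all assumptions chosen for brevity; the grouping algorithm itself is exactly what I would want to vary.

```python
import math
from collections import defaultdict

SPEED_OF_SOUND = 343.0  # m/s, assumed constant


def group_sources(sources, listener, cell_size):
    """Merge image sources that fall in the same grid cell.

    Each group keeps its centroid (used for shared visibility and reflection
    work), its radius (overall size), and every member's own arrival delay.
    """
    cells = defaultdict(list)
    for position in sources:
        key = tuple(math.floor(c / cell_size) for c in position)
        cells[key].append(position)

    groups = []
    for members in cells.values():
        centroid = tuple(sum(axis) / len(members) for axis in zip(*members))
        groups.append({
            "centroid": centroid,
            "radius": max(math.dist(m, centroid) for m in members),
            "delays": sorted(math.dist(m, listener) / SPEED_OF_SOUND for m in members),
            "count": len(members),
        })
    return groups


if __name__ == "__main__":
    sources = [(10.0, 10.0, 0.0), (10.5, 10.2, 0.0), (30.0, 0.0, 0.0)]
    for group in group_sources(sources, (0.0, 0.0, 0.0), cell_size=2.0):
        print(group)
```

A cell size that grows with distance from the listener would merge more aggressively at higher orders, which is one way the grouping might shade into a reverb tail.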

Stretch Goal

One approach to creating a binaural audio image from sound rays as they reach the listener is to convolve each one with the Head-Related Impulse Response (HRIR) (or equivalently, multiply by the HRTF in the frequency domain), chosen according to the incidence angle of the incoming sound relative to the listener's head orientation. If the head orientation changes, the left and right ear responses must be recalculated using all of the incoming sound rays. This is a relatively expensive operation.

Another approach is to ignore the diffraction and shadowing effects of the head and shoulders and calculate a response for separate listeners at the ear locations. Head rotations are now represented as translations of the source, which require a full recalculation of the sound paths.

I'd like to explore storing the overall impulse response ambisonically. This has the benefit that it decouples the acoustic simulation from the decoding and playback, so the impulse response could be rendered binaurally for headphones or for a multi-channel speaker setup. Additionally, for a binaural arrangement, head rotations would only require re-decoding the ambisonic signal using the HRTF, which is much cheaper than reprocessing the full set of incoming sound rays.
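To illustrate why rotation becomes cheap, here is a first-order sketch in illustrative Python. Each arrival is encoded into four channels (W, X, Y, Z), and a head turn about the vertical axis then becomes a fixed 2x2 rotation of the X and Y channels, regardless of how many arrivals were encoded. The function names are hypothetical. The conventions are assumptions: azimuth is measured counterclockwise from straight ahead, W carries the unweighted pressure, and the HRTF-based decoding step is not shown.

```python
import math


def encode_first_order(arrivals, length):
    """Encode arrivals into a first-order ambisonic impulse response.

    arrivals: iterable of (delay_samples, gain, azimuth_rad, elevation_rad).
    Returns four channel lists (w, x, y, z), each of the given length.
    """
    w, x, y, z = ([0.0] * length for _ in range(4))
    for delay, gain, azimuth, elevation in arrivals:
        if delay >= length:
            continue  # arrival falls outside the stored response
        w[delay] += gain
        x[delay] += gain * math.cos(azimuth) * math.cos(elevation)
        y[delay] += gain * math.sin(azimuth) * math.cos(elevation)
        z[delay] += gain * math.sin(elevation)
    return w, x, y, z


def rotate_head_yaw(channels, yaw):
    """Channels as heard after the head turns by yaw radians (counterclockwise).

    Every source appears at azimuth - yaw relative to the head, which is a
    rotation of the X and Y channels only; W and Z are unchanged.
    """
    w, x, y, z = channels
    c, s = math.cos(yaw), math.sin(yaw)
    x_rot = [c * xi + s * yi for xi, yi in zip(x, y)]
    y_rot = [-s * xi + c * yi for xi, yi in zip(x, y)]
    return w, x_rot, y_rot, z


if __name__ == "__main__":
    arrivals = [(10, 1.0, 0.3, 0.1), (25, 0.5, -2.0, 0.0)]
    rotated = rotate_head_yaw(encode_first_order(arrivals, 64), 0.7)
    print(rotated[1][10], rotated[2][10])
```

Rotating the stored channels gives the same result as re-encoding every arrival with its azimuth shifted by the head angle, but the cost depends only on the response length and not on the number of sound paths.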

Technology

For this project I will be using Julia (6). While somewhat slower than C, the ability to write efficient iterative code without needing to vectorize everything or drop into a different environment gives it an edge over other high-level options such as MATLAB. Cython is tempting though...

References

1. Lauri Savioja - Modeling Techniques for Virtual Acoustics
2. Lehmann, Johansson - Diffuse Reverberation Model for Efficient Image-Source Simulation of Room Impulse Responses
3. Savioja et al. - Use of GPUs in room acoustic modeling and auralization
4. Taylor et al. - iSound: Interactive GPU-based Sound Auralization in Dynamic Scenes
5. Noisternig et al. - Framework for Real-Time Auralization in Architectural Acoustics
6. http://julialang.org/