Section 09
Glossary of Spatial Audio Terms
22 essential concepts in spatial audio, psychoacoustics, and acoustic engineering.
Ambisonics
A full-sphere surround sound technique using spherical harmonic decomposition. Higher orders encode finer spatial resolution.
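For full-sphere (periphonic) ambisonics, the number of spherical-harmonic components grows quadratically with the order N, which is why higher orders cost more channels:

```latex
K = (N + 1)^2, \qquad K_{N=1} = 4 \; (W, X, Y, Z), \qquad K_{N=3} = 16
```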
Azimuth
The horizontal angle of a sound source relative to the listener, with 0° directly in front, expressed either as 0°–360° or as -180° to +180°.
B-Format
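The standard signal set for first-order ambisonics: W (omnidirectional pressure) plus X, Y, Z (figure-of-eight components along the front-back, left-right, and up-down axes, proportional to particle velocity).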
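As a minimal sketch, the function below encodes one mono sample into first-order B-format for a source at a given azimuth and elevation. It assumes the traditional FuMa convention (W weighted by 1/√2; X front, Y left, Z up; azimuth counterclockwise from the front). Other conventions, such as AmbiX (SN3D weighting, channel order W, Y, Z, X), use different weights and ordering. The name `encode_bformat` is hypothetical.

```python
import math


def encode_bformat(sample: float, azimuth_deg: float,
                   elevation_deg: float) -> tuple[float, float, float, float]:
    """Encode one mono sample as first-order B-format (W, X, Y, Z).

    Assumes FuMa-style weighting: W is scaled by 1/sqrt(2); X points front,
    Y left, Z up; azimuth is measured counterclockwise from the front.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)                # omnidirectional pressure
    x = sample * math.cos(az) * math.cos(el)   # front-back figure-of-eight
    y = sample * math.sin(az) * math.cos(el)   # left-right figure-of-eight
    z = sample * math.sin(el)                  # up-down figure-of-eight
    return w, x, y, z


# A source straight ahead excites only W and X.
print(encode_bformat(1.0, 0.0, 0.0))
```

A real encoder applies the same four gains to every sample of the source signal; the gains depend only on direction.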
Binaural
Two-channel audio processed with HRTFs to simulate 3D spatial hearing over headphones, replicating natural listening cues.
Cocktail Party Effect
The auditory system's ability to selectively attend to one conversation in a noisy multi-talker environment using spatial and spectral segregation.
Cone of Confusion
The set of source positions on a cone around the interaural axis that produce identical ITD and ILD values (in a simple spherical-head model), so listeners must rely on pinna spectral cues to tell them apart.
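One way to see the cone is to reduce a direction to its lateral angle, the angle between the direction and the median plane. In a spherical-head model, ITD and ILD depend only on this angle, so all directions that share it lie on the same cone. This sketch assumes X front, Y left, Z up with azimuth counterclockwise from the front; `lateral_angle_deg` is a hypothetical helper.

```python
import math


def lateral_angle_deg(azimuth_deg: float, elevation_deg: float) -> float:
    """Angle (degrees) between a source direction and the median plane.

    Assumes azimuth is measured counterclockwise (toward the left) from the
    front and elevation upward from the horizontal plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Component of the unit direction vector along the interaural (left) axis.
    y = math.sin(az) * math.cos(el)
    return math.degrees(math.asin(y))


# Front-left, back-left, and an elevated direction that share one cone.
for az, el in [(30, 0), (150, 0), (90, 60)]:
    print(az, el, round(lateral_angle_deg(az, el), 6))
```

The front/back pair (30°, 150°) is the classic front-back confusion. The elevated direction shows that the cone also spans up/down.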
Convolution Reverb
Reverberation created by convolving an audio signal with an impulse response (IR) captured in a real acoustic space.
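The operation itself is plain linear convolution: each input sample launches a scaled, delayed copy of the impulse response, and the copies are summed. The direct time-domain form below is a minimal sketch chosen for clarity. Practical convolution reverbs typically use FFT-based (often partitioned) convolution for speed. The toy IR values are illustrative, not measured.

```python
def convolve(signal: list[float], impulse_response: list[float]) -> list[float]:
    """Direct linear convolution; output has len(signal) + len(ir) - 1 samples."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        # Each input sample adds a copy of the IR, scaled by x and delayed by n.
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


# Toy IR: direct sound plus two decaying reflections (illustrative values).
ir = [1.0, 0.0, 0.5, 0.0, 0.25]
dry = [1.0, -1.0]
wet = convolve(dry, ir)
print(wet)
```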
Doppler Effect
The perceived frequency shift caused by relative motion between source and listener: higher pitch as they approach each other, lower pitch as they move apart.
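For motion along the line joining source and listener, the classical formula is f' = f · (c + v_listener) / (c − v_source), with each speed counted positive when moving toward the other party. The sketch below assumes c = 343 m/s (air at roughly 20 °C) and that sign convention. The function name is hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def doppler_frequency(f_source: float, v_listener: float = 0.0,
                      v_source: float = 0.0, c: float = SPEED_OF_SOUND) -> float:
    """Frequency heard by the listener for motion along the connecting line.

    Assumed sign convention: v_listener > 0 means the listener moves toward
    the source; v_source > 0 means the source moves toward the listener.
    """
    if v_source >= c:
        raise ValueError("source speed must be below the speed of sound")
    return f_source * (c + v_listener) / (c - v_source)


print(doppler_frequency(440.0, v_source=20.0))   # approaching: above 440 Hz
print(doppler_frequency(440.0, v_source=-20.0))  # receding: below 440 Hz
```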
Elevation
The vertical angle of a sound source above or below the listener's horizon, typically -90° (below) to +90° (above).
Externalization
The perceptual quality of a binaural sound appearing outside the head rather than inside it — dependent on accurate HRTF rendering.
Haas Effect
When two identical sounds arrive roughly 1–30 ms apart, the auditory system fuses them into a single sound localized at the first-arriving source (the precedence, or fusion, zone).
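In practice the gap between two arrivals usually comes from a path-length difference, for example a direct sound and a wall reflection. It is simply Δt = Δd / c. The sketch assumes c = 343 m/s and uses the 1–30 ms window quoted above as approximate bounds; real thresholds vary with the signal. Both function names are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed


def arrival_gap_ms(extra_path_m: float, c: float = SPEED_OF_SOUND) -> float:
    """Delay (ms) of a second arrival whose path is extra_path_m longer."""
    return 1000.0 * extra_path_m / c


def in_precedence_window(gap_ms: float, lo_ms: float = 1.0,
                         hi_ms: float = 30.0) -> bool:
    """True if the gap falls inside the approximate 1-30 ms window."""
    return lo_ms <= gap_ms <= hi_ms


for extra in (0.1, 3.43, 17.0):
    gap = arrival_gap_ms(extra)
    print(f"{extra:5.2f} m -> {gap:6.2f} ms, inside window: {in_precedence_window(gap)}")
```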
HRTF
Head-Related Transfer Function: the frequency-domain filter, one per ear, describing how the head, outer ears, and torso transform a sound arriving from a given azimuth and elevation.
ILD
Interaural Level Difference: the difference in sound level between the two ears, caused mainly by head shadowing. It is the primary localization cue at high frequencies (above ~1.5 kHz).
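ILD is normally expressed in decibels as 20·log10 of the ratio of the two ears' RMS amplitudes. The sketch below computes a broadband value. Because ILD varies strongly with frequency, measurements usually repeat this per frequency band. The sign convention (positive means louder at the left ear) and the function names are assumptions.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square amplitude of a block of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def ild_db(rms_left: float, rms_right: float) -> float:
    """Interaural level difference in dB (positive = louder at the left ear)."""
    if rms_left <= 0 or rms_right <= 0:
        raise ValueError("RMS values must be positive")
    return 20.0 * math.log10(rms_left / rms_right)


left = [0.5, -0.5, 0.5, -0.5]
right = [0.25, -0.25, 0.25, -0.25]
print(round(ild_db(rms(left), rms(right)), 2))  # amplitude ratio 2: about 6.02 dB
```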
Impulse Response
The complete acoustic "fingerprint" of a space for a given source and receiver position: its response to an ideal, instantaneous impulse, capturing all reflections and the reverberant decay.
ITD
Interaural Time Difference: the difference in arrival time between the two ears (maximum about 690 μs for a source at 90° azimuth). It is the primary cue for lateralizing low-frequency sounds.
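A common first approximation is Woodworth's spherical-head formula, ITD = (a / c)(θ + sin θ), for a far-field source at azimuth θ in the horizontal plane. The head radius a = 8.75 cm and c = 343 m/s below are assumed typical values. With them, the estimate at 90° is about 656 μs, the same order as the ~690 μs quoted above. The exact maximum depends on head size.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius
SPEED_OF_SOUND = 343.0   # m/s, assumed


def itd_woodworth_us(azimuth_deg: float, a: float = HEAD_RADIUS_M,
                     c: float = SPEED_OF_SOUND) -> float:
    """Woodworth spherical-head ITD estimate in microseconds.

    Valid for far-field sources in the horizontal plane with |azimuth| <= 90
    degrees; the sign indicates which ear the sound reaches first.
    """
    if abs(azimuth_deg) > 90.0:
        raise ValueError("formula assumes |azimuth| <= 90 degrees")
    theta = math.radians(azimuth_deg)
    return 1e6 * (a / c) * (theta + math.sin(theta))


for az in (0, 30, 60, 90):
    print(az, round(itd_woodworth_us(az)))
```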
Localization
The perceptual process of determining the spatial position of a sound source using binaural, monaural, and environmental cues.
Object-Based Audio
An audio format in which sounds are encoded as objects with position metadata rather than fixed channel assignments, enabling adaptive rendering.
Phantom Image
A perceived sound source located between two loudspeakers, typically created by amplitude panning; no physical source exists at that position.
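Amplitude panning for a stereo phantom image is commonly implemented with a constant-power (sine/cosine) pan law. This law keeps the summed power of the two speakers constant as the image moves. The sketch below shows that one law; other pan laws exist. The pan range [-1, 1] and the function name are assumptions.

```python
import math


def constant_power_pan(pan: float) -> tuple[float, float]:
    """Left/right gains for pan in [-1, 1] (-1 = hard left, +1 = hard right).

    Uses the sine/cosine law, so g_left**2 + g_right**2 == 1 at every position.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be in [-1, 1]")
    angle = (pan + 1.0) * math.pi / 4.0   # maps [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)


g_left, g_right = constant_power_pan(0.0)
print(round(g_left, 4), round(g_right, 4))  # centre: both about 0.7071 (-3 dB)
```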
Precedence Effect
The perceptual mechanism by which the first-arriving sound dominates localization, suppressing directional information from later reflections.
RT60
Reverberation time: the time it takes for the sound pressure level to decay by 60 dB after the source stops. A key descriptor of room acoustics.
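A classic way to predict RT60 before measuring is Sabine's formula, RT60 ≈ 0.161 · V / A. Here V is the room volume in m³ and A is the equivalent absorption area in m² (each surface area times its absorption coefficient, summed). Sabine's estimate is most reliable for fairly reverberant rooms with a diffuse sound field. The room dimensions and coefficients below are illustrative, not measured data.

```python
def rt60_sabine(volume_m3: float, surfaces: list[tuple[float, float]]) -> float:
    """Sabine estimate of RT60 in seconds (metric units).

    surfaces: (area in m^2, absorption coefficient 0..1) pairs.
    """
    # Equivalent absorption area in m^2 (metric sabins).
    absorption = sum(area * alpha for area, alpha in surfaces)
    if absorption <= 0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / absorption


# Hypothetical 10 m x 5 m x 4 m room (volume 200 m^3).
room = [(50.0, 0.3), (50.0, 0.1), (120.0, 0.05)]  # floor, ceiling, walls
print(round(rt60_sabine(200.0, room), 2))
```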
Soundfield
The complete 3D distribution of acoustic energy in a space — ambisonics encodes the soundfield as a set of spherical harmonic components.
VBAP
Vector Base Amplitude Panning: a loudspeaker panning method that uses pairs (2D) or triplets (3D) of speakers to position virtual sources anywhere within the area or volume the loudspeaker array covers.
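At the core of 3D VBAP is solving for gains g1, g2, g3 so that the gain-weighted sum of the three speaker direction vectors points at the source. The gains are then normalized to constant power. The minimal sketch below handles a single triplet using Cramer's rule. Choosing the right triplet for a given source is omitted. The axis convention (X front, Y left, Z up, azimuth counterclockwise) and all function names are assumptions.

```python
import math

Vec = tuple[float, float, float]


def unit_vector(azimuth_deg: float, elevation_deg: float) -> Vec:
    """Direction vector; assumes X front, Y left, Z up, azimuth counterclockwise."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(az) * math.cos(el), math.sin(az) * math.cos(el), math.sin(el))


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def vbap_gains(source: Vec, l1: Vec, l2: Vec, l3: Vec) -> tuple[float, float, float]:
    """Solve source = g1*l1 + g2*l2 + g3*l3 by Cramer's rule, then power-normalize.

    A negative gain means the source lies outside this speaker triplet.
    """
    det = dot(l1, cross(l2, l3))          # determinant of [l1 l2 l3]
    if abs(det) < 1e-12:
        raise ValueError("speaker vectors are coplanar")
    g1 = dot(source, cross(l2, l3)) / det
    g2 = dot(l1, cross(source, l3)) / det
    g3 = dot(l1, cross(l2, source)) / det
    norm = math.sqrt(g1 * g1 + g2 * g2 + g3 * g3)
    return g1 / norm, g2 / norm, g3 / norm


speakers = (unit_vector(30, 0), unit_vector(-30, 0), unit_vector(0, 45))
print(vbap_gains(unit_vector(0, 20), *speakers))  # all three gains positive
```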