Sound and Music for Video Games
Technology Overview
Roger Crawfis, Ohio State University

Overview
• Fundamentals of Sound
• Psychoacoustics
• Interactive Audio
• Applications

What is sound?
• Sound is the sensation perceived by the sense of hearing
• Audio is acoustic, mechanical, or electrical frequencies corresponding to normally audible sound waves

Dual Nature of Sound
• Transfer of sound and physical stimulation of the ear
• Physiological and psychological processing in the ear and brain (psychoacoustics)

Transmission of Sound
• Requires a medium with elasticity and inertia (air, water, steel, etc.)
• Movements of air molecules result in the propagation of a sound wave

Longitudinal Motion of Air

Wavefronts and Rays

Reflection of Sound

Absorption of Sound
• Some materials readily absorb the energy of a sound wave
• Examples: carpet, curtains at a movie theater

Refraction of Sound

Diffusion of Sound
• Not analogous to diffusion of light
• Naturally occurring diffusion of sound typically affects only a small subset of audible frequencies
• Nearly full diffusion of sound requires a reflection phase grating (Schroeder diffuser)

The Inverse-Square Law (Attenuation)
• I = W / (4πr²)
  – I is the sound intensity in W/cm²
  – W is the sound power of the source in W
  – r is the distance from the source in cm

The Skull
• Occludes wavelengths "small" relative to the skull
• Causes diffraction around the head (helps amplify sounds)
• Wavelengths much larger than the skull are unaffected (which is why low frequencies are hard to localize)

The Pinna

Ear Canal and Skull
• (A) Dark line – ear canal only
• (B) Dashed line – ear canal and skull diffraction

Auditory Area (20 Hz – 20 kHz)

Spatial Hearing
• Ability to determine the direction and distance of a sound source
• Not a fully understood process
• However, some cues have been identified as useful

The "Duplex" Theory of Localization
• Interaural Intensity Differences (IIDs)
• Interaural Arrival-Time Differences (ITDs)

Interaural Intensity
Difference
• The skull produces a sound shadow
• The intensity difference results from one ear being shadowed and the other not
• The IID does not apply to frequencies below 1000 Hz (wavelengths similar to or larger than the size of the head)
• Sound shadowing can produce drops of up to ~20 dB for frequencies ≥ 6000 Hz
• The inverse-square law can also affect intensity

Head Rotation or Tilt
• Rotation or tilt can alter the interaural spectrum in a predictable manner

Interaural Arrival-Time Difference
• Perception of the phase difference between the ears caused by an arrival-time delay (ITD)
• The ear closest to the sound source hears the sound before the other ear

Digital Sound
• Remember that sound is an analog process (like vision)
• Computers need to deal with digital processes (like digital images)
• Computer imagery and computer sound processing share many similar properties

Class or Semantics
• Sample
• Stream
• Music
• Tracks
• MIDI Sounds

Sound for Games
• Stereo doesn't cut it anymore – you need positional audio
• Positional audio increases immersion
• The Old: vary volume as position changes
• The New: Head-Related Transfer Functions (HRTFs) for 3D positional audio with 2–4 speakers
• Games use:
  – Dolby 5.1: requires lots of speakers
  – Creative's EAX: "environmental audio"
  – Aureal's A3D: good positional audio
  – DirectSound3D: Microsoft's answer
  – OpenAL: open, cross-platform API

Audio Basics
• Sound has two fundamental physical properties:
  – Frequency (the pitch of the wave – oscillations per second, in Hertz)
  – Amplitude (the loudness or strength of the wave, in decibels)

Sampling
• A sound wave is "sampled"
  – measurements of amplitude taken at a "fast" rate
  – results in a stream of numbers
[Figure: sampled waveform, amplitude (−1 to 1) vs. time (0–20 ms)]

Data Rates for Sound
• The human ear can hear frequencies between ?? and ??
• Must sample at twice the highest frequency
– Assume stereo (two channels)
– Assume a 44.1 kHz sampling rate (the CD sampling rate)
– Assume 2 bytes per channel per sample
– How much raw data is required to record 3 minutes of music?

Waveform Sampling: Quantization
• Quantization introduces noise
• Examples:
  – 16, 12, 8, 6, 4 bit music
  – 16, 12, 8, 6, 4 bit speech

Limits of Human Hearing
• Time and Frequency
  – Events longer than 0.03 seconds are resolvable in time; shorter events are perceived as features in frequency
  – 20 Hz < human hearing < 20 kHz (for those under 15 or so)
  – "Pitch" is PERCEPTION related to FREQUENCY
  – Human pitch resolution is about 40 – 4000 Hz

Limits of Human Hearing
• Amplitude or Power?
  – "Loudness" is PERCEPTION related to POWER, not AMPLITUDE
  – Power is proportional to the (integrated) square of the signal
  – The human loudness perception range is about 120 dB, where +10 dB = 10 × power ≈ 3.16 × amplitude (+20 dB = 10 × amplitude)
  – Waveform shape is of little consequence. The energy at each frequency, and how that changes in time, is the most important feature of a sound.

Limits of Human Hearing
• Waveshape or Frequency Content?
  – Here are two waveforms with identical power spectra, which are (nearly) perceptually identical:
[Figure: Wave 1 and Wave 2, with their common magnitude spectrum]

Limits of Human Hearing
• Masking in Amplitude, Time, and Frequency
  – Masking in amplitude: loud sounds "mask" soft ones. Example: quantization noise
  – Masking in time: a soft sound just before a louder sound is more likely to be heard than if it comes just after. Example (and reason): reverb vs. "preverb"
  – Masking in frequency: a loud "neighbor" frequency masks soft spectral components. Low sounds mask higher ones more than high mask low.

Limits of Human Hearing
• Masking in Amplitude
  – Intuitively, a soft sound will not be heard if there is a competing loud sound.
• Reasons:
  – Gain controls in the ear (stapedius reflex and more)
  – Interaction (inhibition) in the cochlea
  – Other mechanisms at higher levels

Limits of Human Hearing
• Masking in Time
  – In the time range of a few milliseconds:
  – A soft event following a louder event tends to be grouped perceptually as part of that louder event
  – If the soft event precedes the louder event, it might be heard as a separate event (become audible)

Limits of Human Hearing
• Masking in Frequency
  – Only one component in this spectrum is audible because of frequency masking

Sampling Rates
• For cheap compression, look at lowering the sampling rate first
• 44.1 kHz, 16 bit = CD quality
• 8 kHz, 8-bit µ-law = phone quality
• Examples:
  – Music: 44.1, 32, 22.05, 16, 11.025 kHz
  – Speech: 44.1, 32, 22.05, 16, 11.025, 8 kHz

Views of Digital Sound
• Two (mainstream) views of sound and their implications for compression
• 1) Sound is perceived
  – The auditory system doesn't hear everything present
  – Bandwidth is limited
  – Time resolution is limited
  – Masking in all domains
• 2) Sound is produced
  – A "perfect" model could provide perfect compression

Production Models
• Build a model of the sound production system, then fit the parameters
  – Example: if the signal is speech, then a well-parameterized vocal model can yield the highest quality and compression ratio
• Benefits: highest possible compression
• Drawbacks: signal source(s) must be assumed, known, or identified

MIDI and Other "Event" Models
• Musical Instrument Digital Interface
  – Represents music as notes and events, and uses a synthesis engine to "render" it
• An Edit Decision List (EDL) is another example
  – A history of source materials, transformations, and processing steps is kept. Operations can be undone or recreated easily. Intermediate nonparametric files are not saved.
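The data-rate question posed on the sampling slides (stereo, CD rate, 2 bytes per channel per sample, 3 minutes) is just a product of the assumed rates. A quick sketch in Python, with variable names of my own choosing:

```python
sample_rate = 44_100        # Hz, the CD sampling rate
channels = 2                # stereo
bytes_per_sample = 2        # 16-bit samples
seconds = 3 * 60            # 3 minutes of music

# Raw (uncompressed) size is simply the product of all four factors.
raw_bytes = sample_rate * channels * bytes_per_sample * seconds
raw_megabytes = raw_bytes / (1024 * 1024)   # about 30 MB for 3 minutes
```

Roughly 30 MB for three minutes of raw audio is exactly why the perceptual and event-based compression schemes discussed in these slides matter.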
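The quantization-noise examples (16 down to 4 bits) follow a well-known rule of thumb: each bit buys about 6 dB of signal-to-noise ratio. This can be checked empirically with a minimal sketch, assuming a full-scale sine test signal and a simple rounding quantizer (both assumptions are mine, not from the slides):

```python
import math

def quantize(x: float, bits: int) -> float:
    """Round a sample in [-1, 1] to the nearest of 2**bits levels."""
    levels = 2 ** (bits - 1)
    return round(x * levels) / levels

def snr_db(bits: int, n: int = 10_000) -> float:
    """Signal-to-quantization-noise ratio, in dB, for a full-scale sine."""
    sig = noise = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * i / n)
        q = quantize(x, bits)
        sig += x * x
        noise += (x - q) ** 2
    return 10 * math.log10(sig / noise)

# snr_db(8) lands near 50 dB and snr_db(12) near 74 dB,
# matching the ~6 dB-per-bit rule.
```

The masking-in-amplitude slides explain why this noise is often inaudible: louder program material masks it.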
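The 8 kHz, 8-bit µ-law "phone quality" format mentioned under Sampling Rates works by companding: quiet samples get proportionally finer resolution than loud ones, which suits the ear's logarithmic loudness perception. A sketch of the continuous µ-law curve (µ = 255, the standard telephony value; function names are mine):

```python
import math

MU = 255.0  # standard mu value for 8-bit telephone audio

def mulaw_encode(x: float) -> float:
    """Compress a linear sample in [-1, 1] with the mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_decode(y: float) -> float:
    """Invert the mu-law curve back to a linear sample."""
    return math.copysign((math.exp(abs(y) * math.log1p(MU)) - 1) / MU, y)

# A quiet sample (0.01) is boosted to ~0.23 before quantization,
# so its quantization error after decoding is correspondingly smaller.
```

Real codecs quantize the companded value to 8 bits; this sketch only shows the curve itself.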
Event-Based Compression
• A musical score is a very compact representation of music
• Benefits:
  – Highest possible compression
• Drawbacks:
  – Cannot guarantee the "performance"
  – Cannot assure the quality of the sounds
  – Cannot make arbitrary sounds

Event-Based Compression
• Enter General MIDI
  – Guarantees a base set of instrument sounds,
  – and a means for addressing them,
  – but doesn't guarantee any quality
• Better yet, Downloadable Sounds
  – Download samples for instruments
  – Benefits: does more to guarantee quality
  – Drawbacks: samples aren't reality

Event-Based Compression
• Downloadable Algorithms
  – Specify the algorithm, the synthesis engine runs it, and we just send parameter changes
  – Part of "Structured Audio" (MPEG-4)
• Benefits:
  – Can upgrade algorithms later
  – Can implement scalable synthesis
• Drawbacks:
  – Different algorithm for each class of sounds (but can always fall back on samples)

Compressed Audio Formats

  Name                  Extension     Ownership
  AIFF (Mac)            .aif, .aiff   Public
  AU (Sun/NeXT)         .au           Public
  CD audio (CDDA)       N/A           Public
  MP3                   .mp3          MPEG Audio Layer-III
  Windows Media Audio   .wma          Proprietary (Microsoft)
  QuickTime             .qt           Proprietary (Apple)
  RealAudio             .ra, .ram     Proprietary (Real Networks)
  WAV                   .wav          Public

To be continued …
• Stop here
• Sound Group Technical Presentations
• Suggested Topics:
  – Compression
  – Controlling the Environment
  – ToolKit I features
  – ToolKit II features
  – Examples and Demos

Environmental Effects
• Obstruction/Occlusion
• Reverberation
• Doppler Shift
• Atmospheric Effects

Obstruction
• Same as sound shadowing
• Generally approximated by a ray test and a low-pass filter
• High frequencies should get shadowed, while low frequencies diffract around the obstacle

Occlusion
• A completely blocked sound path – the sound reaches the listener only by transmission through the obstacle
• Example: a sound heard through a closed door or a wall
• The sound will be muffled (low-pass filter)

Reverberation
• Effects from sound reflection
• Similar to echo
• Static reverberation
• Dynamic reverberation

Static Reverberation
• Relies on the "closed container" assumption
• Parameters are used to specify approximate environmental conditions (decay, room size, etc.)
• Examples: Microsoft DirectSound3D, EAX

Dynamic Reverberation
• Calculation of reflections off of surfaces, taking surface properties into account
• Typically diffusion and diffraction are ignored
• "Wave Tracing"
• Example: Aureal A3D 2.0

Comparison
• Static reverberation: computationally cheap, simple to implement
• Dynamic reverberation: computationally very expensive, difficult to implement, but potentially superior results

Doppler Shift
• Change in frequency due to velocity
• Very susceptible to temporal aliasing
• The faster the update rate, the better
• Requires dedicated hardware

Atmospheric Effects
• Attenuate high frequencies faster than low frequencies
• Moisture in the air increases this effect
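The obstruction and occlusion slides both reduce to "ray test plus low-pass filter." A minimal sketch of the filter half, using a one-pole RC low-pass (my own choice of discretization and names, not any particular audio API):

```python
import math

def lowpass(samples, cutoff_hz, sample_rate=44_100):
    """One-pole low-pass filter: a cheap stand-in for the muffling
    of an obstructed or occluded sound source."""
    # Coefficient from the standard RC-filter discretization.
    dt = 1.0 / sample_rate
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)   # move part-way toward the input
        out.append(y)
    return out
```

In a game, the ray test would select the cutoff: unobstructed paths pass the signal through, obstructed or occluded paths pull the cutoff down so high frequencies are lost while low frequencies survive, matching how diffraction and transmission behave.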
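The Doppler shift described above can be sketched with the textbook moving-source/moving-listener formula; the function name and sign convention (positive speeds mean the distance is closing) are my own:

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def doppler_pitch(source_speed_toward_listener: float,
                  listener_speed_toward_source: float = 0.0) -> float:
    """Pitch multiplier for the classic Doppler formula:
    f' = f * (c + v_listener) / (c - v_source)."""
    return ((SPEED_OF_SOUND + listener_speed_toward_source) /
            (SPEED_OF_SOUND - source_speed_toward_listener))

# A source closing at 10% of the speed of sound is heard
# about 11% higher in pitch; a receding source sounds lower.
shift = doppler_pitch(34.3)
```

The temporal-aliasing warning on the slide applies here: if position updates arrive slowly, the computed speed (and thus the pitch) jumps between frames, so engines smooth the velocity estimate between updates.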
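Distance attenuation interacts with all of the environmental effects above; it follows the inverse-square law from the start of these notes, I = W / (4πr²). A small sketch (names mine) showing the practical consequence: every doubling of distance costs 6 dB:

```python
import math

def sound_intensity(power_w: float, distance_cm: float) -> float:
    """Intensity (W/cm^2) of a point source radiating uniformly,
    per the inverse-square law I = W / (4 * pi * r^2)."""
    return power_w / (4.0 * math.pi * distance_cm ** 2)

# Doubling the distance quarters the intensity, a 6 dB drop.
i_near = sound_intensity(1.0, 100.0)
i_far = sound_intensity(1.0, 200.0)
drop_db = 10.0 * math.log10(i_near / i_far)
```

Game audio APIs typically expose this as a distance model with a rolloff factor rather than raw physics, but the 6-dB-per-doubling behavior is the reference point.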