On the manifolds of spatial hearing
Vikas C. Raykar and Ramani Duraiswami
University of Maryland, College Park
NIPS 2006 workshop on novel applications of dimensionality reduction
December 9, 2006

Human spatial hearing
How are humans able to judge the direction of a sound source?
Why do we have two ears?
Why is the pinna shaped the way it is?

Plan of the talk
• Human spatial hearing
• Perceptual manifolds
• Exploratory studies
• Applications

How do humans localize a sound source?
Primary cues
• Interaural Time Difference (ITD)
• Interaural Level Difference (ILD)
These explain localization only in the horizontal plane. All points on one half of a hyperboloid of revolution have the same ITD and ILD (the cone of confusion).
Other cues
• Pinna shape gives elevation cues at higher frequencies.
• Torso and head give elevation cues at lower frequencies.
[Figure: source, head, left ear, right ear]
An intricate system to be completely modelled.

It’s head, torso, and pinna

Head Related Transfer Function (HRTF)
Spectral filtering caused by the head, torso, and pinna.
HRIR: head-related impulse response.
The HRIR can be measured experimentally for all elevations and azimuths.
Convolve the source signal with the measured HRIR to create virtual audio.

Sample HRIR and HRTF
Source directly in front of your right ear.

CIPIC Database
• Public-domain HRIR database
• HRIRs sampled at 1250 points around the head
• 45 subjects
• Anthropometry measurements
V. Ralph Algazi, Richard O. Duda, Dennis M. Thompson, and Carlos Avendano, "The CIPIC HRTF database," in WASPAA '01 (2001 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics), Mohonk Mountain House, New Paltz, NY, Oct. 2001.

Interaural polar coordinate system
[Figure: azimuth and elevation]

Manifold representation
[Figure labels: 45°, 45°]
If we can unfold this low-dimensional manifold, we have a good perceptual representation of the signal.
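
The unfolding idea on the manifold representation slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: it uses a synthetic helix in a 200-dimensional space as a stand-in for real HRIR data, and a bare-bones Isomap (nearest-neighbour graph, shortest paths, classical MDS) written with NumPy only. The function name isomap_1d, the helix, the neighbourhood size k=3, and the rows-as-points layout are all assumptions made for the example.

```python
import numpy as np

def isomap_1d(X, k=3):
    """Embed the rows of X (n_points x n_dims) on a line with a basic Isomap."""
    n = X.shape[0]
    # Pairwise Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # Neighbourhood graph: keep edges to the k nearest neighbours (symmetrised).
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nearest = np.argsort(D, axis=1)[:, 1:k + 1]   # column 0 is the point itself
    for i in range(n):
        G[i, nearest[i]] = D[i, nearest[i]]
        G[nearest[i], i] = D[i, nearest[i]]
    # Geodesic distances as graph shortest paths (Floyd-Warshall).
    for m in range(n):
        G = np.minimum(G, G[:, [m]] + G[[m], :])
    # Classical MDS on the geodesic distances; keep the leading component.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)                # eigenvalues in ascending order
    return vecs[:, -1] * np.sqrt(vals[-1])

# Synthetic stand-in for an elevation sweep: 50 points on a helix,
# rotated into a 200-dimensional space (hypothetical data, not CIPIC).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)                     # plays the role of elevation
theta = 1.5 * np.pi * t
curve = np.stack([np.cos(theta), np.sin(theta), 0.5 * theta], axis=1)
Q, _ = np.linalg.qr(rng.standard_normal((200, 3)))  # 200 x 3, orthonormal columns
X = curve @ Q.T                                   # 50 points x 200 dimensions

y = isomap_1d(X, k=3)
print("abs. correlation with the curve parameter:", abs(np.corrcoef(t, y)[0, 1]))
```

If the unfolding works, the single embedded coordinate y should track the curve parameter t closely, up to sign and scale. The sketch assumes the neighbourhood graph is connected; with a disconnected graph the shortest-path step leaves infinite distances and the embedding is not meaningful.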

Our data matrix (dimension d)

Elevation manifold
50 points in a d-dimensional space (HRTF: d = 257, HRIR: d = 200)
Data matrix sizes: 200 x 50 (HRIR), 257 x 50 (HRTF)

Dimensionality reduction methods
• We used the following four methods
– Principal Component Analysis (PCA)
– Locally Linear Embedding (LLE)
– Isomap
– Maximum Variance Unfolding (MVU)
• We expect
– The manifold to have an intrinsic dimensionality of 1.
– The first embedded component to be monotonic with elevation.

HRTF elevation manifold PCA
HRTF manifold Isomap (K=3)
HRTF manifold Isomap (K=2)
HRTF manifold LLE (K=3)
HRTF manifold LLE (K=2)
HRTF manifold MVU (two slides)
HRIR elevation manifold: PCA, Isomap, LLE, MVU

Complete manifold
Azimuth -45:5:45
Elevation -45:5:230
• We expect
– The manifold to have an intrinsic dimensionality of 2.
– The first two embedded components to show a grid-like structure.

Complete manifold PCA
Complete manifold LLE (K=4)
Complete manifold LMVU (K=4)
Isomap (K=4)

HRIR manifold
[Figure panels: PCA, Isomap, Isomap]
Data representation -- manifold properties
LLE, MVU: numerical problems

Problem 1: Interpolation
• HRTFs are generally measured on a finite sampling grid of elevation and azimuth.
• For a smooth virtual audio system we need to interpolate HRTFs.
• HRTF measurement is a tedious and time-consuming process.
– Normally takes an hour.
– Subject must be immobile.

Some preliminary results

Problem 2: Distance metric
• How to compare any two given HRTFs?
• Perceptually inspired metric
• Psychoacoustical tests
• Squared log-magnitude error
• It is tough to decide what aspects of a given signal are perceptually relevant.
• Use geodesic distance

Distance on the manifold

Problem 3: Customization
• When an HRTF measured for one person is used for a different person, elevation perception is very poor.
• Each person's ear shape and anatomy are unique.
• Each person’s localization capabilities are tuned to the shape of their ear and anatomy.
• A big bottleneck for commercialization of spatial audio.

Style vs Content

Anthropometric measurements

Problem 4: Microphone calibration

Thank You! Questions?