On the manifolds of spatial hearing

Vikas C. Raykar and Ramani Duraiswami
University of Maryland College Park
NIPS 2006 workshop on novel applications of dimensionality reduction
December 9, 2006
Human spatial hearing
How are humans able to judge the direction of a sound source?
Why do we have two ears?
Why is the pinna shaped the way it is?
Plan of the talk
• Human spatial hearing
• Perceptual manifolds
• Exploratory studies
• Applications
How do humans localize a sound source?
• Primary cues
– Interaural Time Difference (ITD)
– Interaural Level Difference (ILD)
– These explain localization only in the horizontal plane.
– All points on one sheet of the hyperboloid of revolution have the same ITD and ILD [cone of confusion].
• Other cues
– Pinna shape gives elevation cues at higher frequencies.
– Torso and head give elevation cues at lower frequencies.
[Figure: sound source and head, with the left and right ears marked]
The system is too intricate to be modelled completely.
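As a rough illustration of the ITD cue, the sketch below uses Woodworth's spherical-head approximation, ITD ≈ (a/c)(θ + sin θ), for a distant source at azimuth θ. The formula, the head radius of 8.75 cm, and the speed of sound of 343 m/s are assumptions brought in for illustration; none of them appear in the slides.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius (not from the slides)
SPEED_OF_SOUND = 343.0   # assumed speed of sound in air, m/s


def woodworth_itd(azimuth_deg, radius=HEAD_RADIUS_M, c=SPEED_OF_SOUND):
    """Approximate ITD in seconds for a distant source.

    Azimuth is measured from straight ahead and should lie in -90..90 degrees.
    """
    theta = math.radians(azimuth_deg)
    return (radius / c) * (theta + math.sin(theta))


if __name__ == "__main__":
    for az in (0, 30, 60, 90):
        print(f"azimuth {az:3d} deg -> ITD {woodworth_itd(az) * 1e6:6.1f} us")
```

The model depends only on azimuth, so it cannot separate directions that share an ITD, which is the ambiguity the cone of confusion describes.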
It’s head, torso, and pinna
Head Related Transfer Function (HRTF)
• Spectral filtering caused by the head, torso, and pinna.
• HRIR: Head Related Impulse Response, the time-domain counterpart of the HRTF.
• HRIRs can be measured experimentally for all elevations and azimuths.
• Convolve the source signal with the measured HRIR to create virtual audio.
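The convolution step can be sketched in a few lines. This is a minimal illustration using direct convolution; the function names and the toy HRIR values are hypothetical, and a real system would use measured HRIRs and a faster FFT-based method.

```python
def convolve(signal, hrir):
    """Full linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(hrir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(hrir):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono source with a left/right HRIR pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


if __name__ == "__main__":
    # Toy HRIRs (made-up values): the right ear hears the source earlier and louder.
    hrir_right = [1.0, 0.5]
    hrir_left = [0.0, 0.0, 0.6, 0.3]
    left, right = render_binaural([1.0, 0.0, -1.0], hrir_left, hrir_right)
    print(left)
    print(right)
```

Playing the two outputs over headphones is what places the source at the direction for which the HRIR pair was measured.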
Sample HRIR and HRTF
Source directly in front of your right ear.
CIPIC Database
• Public-domain HRIR database
• HRIRs sampled at 1250 points around the head
• 45 subjects
• Anthropometry measurements
V. Ralph Algazi, Richard O. Duda, Dennis M. Thompson, and Carlos Avendano, "The CIPIC HRTF database," in WASSAP '01 (2001 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, Mohonk Mountain House, New Paltz, NY, Oct. 2001).
Interaural polar coordinate system
Azimuth
Elevation
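For readers used to Cartesian coordinates, the conversion below shows how the two interaural-polar angles determine a direction. The axis convention (x toward the right ear, y straight ahead, z up) and the function name are assumptions chosen for this sketch.

```python
import math


def interaural_polar_to_cartesian(azimuth_deg, elevation_deg, r=1.0):
    """Convert interaural-polar angles to Cartesian coordinates.

    Assumed axes: x toward the right ear, y straight ahead, z up.
    Azimuth is the angle away from the median plane (-90..90 degrees);
    elevation rotates about the interaural axis (0 = front, 90 = up, 180 = back).
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = r * math.sin(az)
    y = r * math.cos(az) * math.cos(el)
    z = r * math.cos(az) * math.sin(el)
    return x, y, z


if __name__ == "__main__":
    for az, el in ((0, 0), (0, 90), (0, 180), (45, 0), (90, 0)):
        print((az, el), interaural_polar_to_cartesian(az, el))
```

With this parameterization, all directions sharing one azimuth lie on a cone around the interaural axis, and elevation selects a position on that cone.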
Manifold representation
[Figure: manifold sketch with two 45° angle labels]
If we can unfold this low-dimensional manifold, we have a good perceptual representation of the signal.
Our data matrix
Elevation manifold: 50 points in a d-dimensional space (d = 257 for HRTF, d = 200 for HRIR)
• HRIR data matrix: 200 x 50
• HRTF data matrix: 257 x 50
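The two dimensions are consistent with a 200-sample HRIR zero-padded to a 512-point DFT, which leaves 512/2 + 1 = 257 non-redundant frequency bins. That reading is an assumption, since the slides do not state the transform length. The sketch below computes such a magnitude spectrum with a plain DFT; the function name and the toy impulse response are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(hrir, n_fft=512):
    """Magnitudes of the n_fft // 2 + 1 non-redundant DFT bins of a real signal."""
    if len(hrir) > n_fft:
        raise ValueError("n_fft must be at least len(hrir)")
    spectrum = []
    for k in range(n_fft // 2 + 1):
        # Zero-padded samples contribute nothing, so only the real samples are summed.
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n, x in enumerate(hrir))
        spectrum.append(abs(acc))
    return spectrum


if __name__ == "__main__":
    hrir = [0.0] * 200
    hrir[10] = 1.0  # toy impulse response: a pure 10-sample delay
    hrtf = magnitude_spectrum(hrir)
    print(len(hrir), len(hrtf))
```

Stacking one such vector per elevation as a column would give a 257 x 50 matrix, and stacking the raw HRIRs would give a 200 x 50 matrix.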
Dimensionality Reduction methods
• We used the following four methods
– Principal Component Analysis (PCA)
– Locally Linear Embedding (LLE)
– Isomap
– Maximum Variance Unfolding (MVU)
• We expect
– The manifold to have an intrinsic dimensionality of 1.
– The first embedded component to be monotonic with elevation.
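The monotonicity expectation can be checked mechanically. The sketch below runs PCA (via power iteration rather than a library eigen-solver) on a synthetic, gently curved one-dimensional path and tests whether the first component is monotonic in the path parameter. The data, vectors w and u, and function names are all invented for illustration and are not HRTF measurements.

```python
import math


def first_principal_scores(rows, iterations=100):
    """Project each row (one sample per row) onto the first principal component."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration converges to the dominant eigenvector of the covariance.
    v = [1.0] * d
    for _ in range(iterations):
        v = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return [sum(c[j] * v[j] for j in range(d)) for c in centered]


def is_monotonic(values):
    """True if the sequence is strictly increasing or strictly decreasing."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return all(x > 0 for x in diffs) or all(x < 0 for x in diffs)


if __name__ == "__main__":
    # Synthetic stand-in for 50 elevation samples: a curved 1-D path in 6-D.
    w = [1.0, 0.8, 0.6, 0.4, 0.2, 0.1]
    u = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6]
    ts = [-1.0 + 2.0 * i / 49 for i in range(50)]
    data = [[t * wj + 0.05 * t * t * uj for wj, uj in zip(w, u)] for t in ts]
    print(is_monotonic(first_principal_scores(data)))
```

Note that this sketch stores one sample per row, whereas the data matrices in the slides store one sample per column. The sign of a principal component is arbitrary, so the check accepts either an increasing or a decreasing sequence.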
[Figures: HRTF elevation manifold embedded with PCA, Isomap (K=3), Isomap (K=2), LLE (K=3), LLE (K=2), and MVU]
HRIR elevation manifold
[Figure panels: PCA, Isomap, LLE, MVU]
Complete manifold
• Azimuth: -45:5:45 (-45° to 45° in 5° steps)
• Elevation: -45:5:230 (-45° to 230° in 5° steps)
• We expect
– The manifold to have an intrinsic dimensionality of 2.
– The first two embedded components to show a grid-like structure.
[Figures: complete manifold embedded with PCA, LLE (K=4), LMVU (K=4), and Isomap (K=4)]
HRIR manifold
[Figure panels: PCA, Isomap, Isomap]
• Data representation -- manifold properties
• LLE, MVU: numerical problems
Problem 1: Interpolation
• HRTFs are generally measured on a finite sampling grid of elevation and azimuth.
• For a smooth virtual audio system we need to interpolate HRTFs.
• HRTF measurement is a tedious and time-consuming process.
– Normally takes an hour.
– Subject must be immobile.
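To make the interpolation problem concrete, the sketch below shows a simple baseline: linear interpolation of the magnitude response, in dB, between the two nearest measured elevations. This is not the manifold-based approach the talk is building toward, only a point of comparison; the function name and inputs are hypothetical.

```python
import math


def interpolate_hrtf_db(elev, elev_lo, hrtf_lo, elev_hi, hrtf_hi):
    """Linearly interpolate two magnitude responses in the dB domain.

    hrtf_lo and hrtf_hi are lists of positive magnitudes measured at
    elevations elev_lo < elev_hi; elev must lie between them.
    """
    if elev_lo >= elev_hi or not elev_lo <= elev <= elev_hi:
        raise ValueError("need elev_lo <= elev <= elev_hi with elev_lo < elev_hi")
    weight = (elev - elev_lo) / (elev_hi - elev_lo)
    result = []
    for lo, hi in zip(hrtf_lo, hrtf_hi):
        db = (1 - weight) * 20 * math.log10(lo) + weight * 20 * math.log10(hi)
        result.append(10 ** (db / 20))  # back to linear magnitude
    return result


if __name__ == "__main__":
    print(interpolate_hrtf_db(2.5, 0.0, [1.0, 4.0], 5.0, [1.0, 1.0]))
```

Interpolating in dB between two grid points amounts to a weighted geometric mean of the magnitudes. It ignores phase and the curved structure of the data, which is part of what motivates interpolating along a learned manifold instead.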
Some preliminary results
Problem 2: Distance metric
• How to compare any two given HRTFs?
• Perceptually inspired metric
• Psychoacoustical tests
• Squared log-magnitude error
• It is tough to decide what aspects of a given signal are perceptually relevant.
• Use geodesic distance
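One common form of the squared log-magnitude error is the mean squared difference between two magnitude responses expressed in dB. The normalization (a mean over frequency bins) and the dB scale are assumptions of this sketch, since the slides name the metric without defining it.

```python
import math


def log_magnitude_error(h1, h2):
    """Mean squared difference of two magnitude responses, in dB squared."""
    if len(h1) != len(h2) or not h1:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [20 * math.log10(a) - 20 * math.log10(b) for a, b in zip(h1, h2)]
    return sum(d * d for d in diffs) / len(diffs)


if __name__ == "__main__":
    print(log_magnitude_error([10.0, 1.0], [1.0, 1.0]))
```

This measure weights every frequency bin equally, which is one reason it may not track what listeners actually perceive.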
Distance on the manifold
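A geodesic distance can be approximated the way Isomap does it: build a K-nearest-neighbor graph over the samples and take shortest-path lengths in that graph. The sketch below does this on a synthetic half-circle instead of HRTF data; the point set, K = 2, and the function names are chosen for illustration only.

```python
import heapq
import math


def knn_graph(points, k):
    """Symmetric k-nearest-neighbor graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j)
                       for j in range(n) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_distance(graph, source, target):
    """Shortest-path length between two nodes (Dijkstra's algorithm)."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbor, weight in graph[node].items():
            candidate = dist + weight
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return math.inf  # target not reachable


if __name__ == "__main__":
    # Synthetic example: 50 points on a unit half-circle.
    pts = [(math.cos(math.pi * i / 49), math.sin(math.pi * i / 49))
           for i in range(50)]
    g = knn_graph(pts, k=2)
    print(geodesic_distance(g, 0, 49), math.dist(pts[0], pts[49]))
```

For the two ends of the half-circle, the straight-line distance cuts across the curve, while the graph distance follows it and comes out close to the arc length. The same contrast is the argument for comparing HRTFs along the manifold and not in the ambient space.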
Problem 3: Customization
• An HRTF measured for one person gives very poor elevation perception when used for a different person.
• Each person's ear shape and anatomy are unique.
• Each person's localizing capabilities are tuned to the shape of their own ears and anatomy.
• A big bottleneck for the commercialization of spatial audio.
Style vs Content
Anthropometric measurements
Problem 4: Microphone calibration
Thank you! Questions?