Slide - University of Maryland Institute for Advanced Computer Studies

Extracting the pinna spectral notches
Vikas C. Raykar | Ramani Duraiswami
University of Maryland, College Park
B. Yegnanarayana
Indian Institute of Technology, Chennai, India
Plan of the talk
• Human Spatial Hearing.
• Role of the pinna.
• Extracting the pinna spectral notches.
• A few exploratory studies.
Human spatial hearing
• Primary cues
  • Interaural Time Difference (ITD)
  • Interaural Level Difference (ILD)
  • Can explain localization only in the horizontal plane.
  • All points on one half of a hyperboloid of revolution have the same ITD and ILD [cone of confusion].
• Other cues
  • Pinna shape gives elevation cues for higher frequencies.
  • Torso and head give elevation cues for lower frequencies.
[Figure: sound source and head, with left ear and right ear labelled]
Intricate system to be completely modelled.
Head Related Transfer Function (HRTF)
• The spectral filtering caused by the head, torso and pinna can be described by the HRTF or, in the time domain, by the Head Related Impulse Response (HRIR).
• The HRTF can be measured experimentally for all elevations and azimuths, for both ears and for different persons.
Convolve the source signal with the measured HRIR to create virtual audio
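A minimal sketch of this rendering step, assuming a mono source and one left/right HRIR pair of equal length. The function name `render_binaural` and the toy HRIRs (pure delays with different gains) are hypothetical placeholders, not measured data.

```python
import numpy as np

def render_binaural(source, hrir_left, hrir_right):
    """Convolve a mono source with a left/right HRIR pair.

    Returns an array of shape (len(source) + len(hrir) - 1, 2).
    Both HRIRs are assumed to have the same length.
    """
    left = np.convolve(source, hrir_left)
    right = np.convolve(source, hrir_right)
    return np.stack([left, right], axis=1)

# Toy example: placeholder HRIRs that are pure delays with different gains
# (hypothetical data standing in for a measured HRIR pair).
fs = 44100
source = np.random.default_rng(0).standard_normal(fs // 10)
hrir_left = np.zeros(200)
hrir_left[10] = 1.0
hrir_right = np.zeros(200)
hrir_right[40] = 0.5
binaural = render_binaural(source, hrir_left, hrir_right)
```

With measured data, `hrir_left` and `hrir_right` would be the HRIRs for the desired azimuth and elevation.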
CIPIC Database
• Public-domain HRIR database
• HRIRs sampled at 1250 points around the head
• 45 subjects, including the KEMAR mannequin
• Anthropometric measurements
V. Ralph Algazi, Richard O. Duda, Dennis M. Thompson, Carlos Avendano, "The CIPIC HRTF database," in WASPAA '01 (2001 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics), Mohonk Mountain House, New Paltz, NY, Oct. 2001.
Interaural polar coordinate system
[Figure: definition of azimuth and elevation]
Sample HRTF
HRTF Visualization
Motivation for our work
HRTF Composition
[Figure: HRIR (time in ms vs. elevation in degrees) and HRTF (frequency in kHz vs. elevation in degrees), annotated with the direct pulse, head diffraction, pinna resonances, pinna spectral notches, torso reflection and knee reflection]
HRTF Composition
• Good parametric models exist for the head and torso effects.
• The role of the pinna is yet to be properly understood.
HRTFs for different pinnae
Features due to pinna
• The pinna contributes the sharp spectral notches and peaks.
• Pinna spectral notches are important for elevation perception.
• There is substantial psychoacoustical, behavioral and neurophysiological evidence for this.
• We propose a method to automatically extract the frequencies of the pinna spectral notches.
[Figure: HRIR (time in ms vs. elevation in degrees) and HRTF (frequency in kHz vs. elevation in degrees)]
Extracting the spectral notches
Direct application of pole-zero modelling techniques does not work.
• An all-pole model is very good at picking the dominant spectral peaks. Linear Prediction analysis can be used to fit it.
• However, pole-zero models are highly sensitive to noise. The measured HRIR includes the combined effects of head diffraction and of shoulder, torso and knee reflections. Because of these combined effects it is difficult to isolate the notches due to the pinna alone.
• The order of the model is also not known. We are not guaranteed that perceptually relevant nulls will be captured. Such models fit the envelope of the signal well (provided the order is high), but perceptually relevant features may not be captured.
Pole-Zero Modelling
Determine the initial onset
• The slope of the unwrapped phase spectrum gives the initial delay.
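A minimal sketch of this onset estimate, assuming a least-squares straight-line fit to the unwrapped phase over all frequency bins. The function name `initial_delay_samples`, the FFT length and the test signal (a pure 12-sample delay) are illustrative assumptions. For a measured HRIR the phase is only approximately linear, so the fit gives an average delay.

```python
import numpy as np

def initial_delay_samples(hrir, n_fft=1024):
    """Estimate the onset delay (in samples) from the slope of the unwrapped phase."""
    spectrum = np.fft.rfft(hrir, n_fft)
    omega = 2 * np.pi * np.arange(len(spectrum)) / n_fft  # rad/sample
    phase = np.unwrap(np.angle(spectrum))
    slope = np.polyfit(omega, phase, 1)[0]  # least-squares line fit
    # A delay of d samples contributes a phase of -omega * d.
    return -slope

# Toy signal: a unit impulse delayed by 12 samples (hypothetical data).
hrir = np.zeros(256)
hrir[12] = 1.0
delay = initial_delay_samples(hrir)
```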
[Figure: HRIR (time in ms vs. elevation in degrees) and HRTF (frequency in kHz vs. elevation in degrees)]
Eliminate torso/knee reflections
Window the signal using a Hann window.
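A minimal sketch of the windowing step, assuming the window starts at the detected onset and uses only the decaying half of a Hann window, so that the direct pulse is kept and later reflections are cut off. The half-window shape, the function name `window_after_onset` and the roughly 1 ms length (44 samples at 44.1 kHz) are assumptions made for illustration. The slide only specifies a Hann window.

```python
import numpy as np

def window_after_onset(hrir, onset, length):
    """Keep `length` samples from `onset`, tapered by the decaying half of a Hann window."""
    segment = np.asarray(hrir, dtype=float)[onset:onset + length]
    taper = np.hanning(2 * len(segment))[len(segment):]  # falling half only
    return segment * taper

# Toy HRIR: direct pulse at sample 20 and a later "reflection" at sample 100
# (hypothetical data).
hrir = np.zeros(200)
hrir[20] = 1.0
hrir[100] = 0.3
windowed = window_after_onset(hrir, onset=20, length=44)
taper_only = window_after_onset(np.ones(200), onset=20, length=44)
```

Here the reflection at sample 100 falls outside the 44-sample window and is discarded.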
[Figure: HRIR (time in ms vs. elevation in degrees) and HRTF (frequency in kHz vs. elevation in degrees), with the knee reflection and torso reflection marked]
Effect of Delay
Effect of windowing
Group delay function
The group delay function
• To emphasize the spectral nulls, we compute the group delay function.
• It is the negative of the first derivative of the phase response of the filter with respect to frequency.
• The notch frequencies can be extracted by finding the extrema of the group delay function.
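A minimal sketch of the group delay computation for a finite sequence. It uses the identity tau(w) = Re{FFT(n x[n]) / FFT(x[n])}, which avoids explicit phase unwrapping. The sketch looks for local minima, on the assumption that the zeros lie inside the unit circle so that each notch appears as a negative dip. The function names and the toy 3-tap filter with a notch at 8 kHz are illustrative, not the authors' implementation.

```python
import numpy as np

def group_delay(x, n_fft=1024):
    """Group delay in samples, i.e. -d(phase)/d(omega), on the rfft grid."""
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    X = np.fft.rfft(x, n_fft)
    Y = np.fft.rfft(n * x, n_fft)
    # Re(Y / X) written as Re(Y * conj(X)) / |X|^2, guarded against division by zero.
    return np.real(Y * np.conj(X)) / np.maximum(np.abs(X) ** 2, 1e-12)

def local_minima(values):
    """Indices of strict interior local minima."""
    v = np.asarray(values)
    return np.flatnonzero((v[1:-1] < v[:-2]) & (v[1:-1] < v[2:])) + 1

# Toy filter: conjugate zero pair at 8 kHz, radius 0.95 (hypothetical data).
FS = 44100
r, f0 = 0.95, 8000.0
w0 = 2 * np.pi * f0 / FS
x = np.array([1.0, -2 * r * np.cos(w0), r ** 2])

gd = group_delay(x)
freqs = np.fft.rfftfreq(1024, d=1 / FS)
notch_hz = freqs[local_minima(gd)]
```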
Remove the spectral peaks
The poles can be extracted by Linear Prediction (LP) analysis.
LP analysis fits an all-pole model of order p to the HRIR.
From the given HRIR we can compute the LP residual by passing it through the inverse filter.
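A minimal sketch of LP analysis and inverse filtering, assuming the autocorrelation method and the convention x[n] ≈ sum_k a[k] x[n-k]. The function names, the model order and the synthetic all-pole test signal are illustrative assumptions. The slides do not state which LP variant or order was used.

```python
import numpy as np

def lp_coefficients(x, order):
    """Autocorrelation-method LP: x[n] ~ sum_k a[k] * x[n-k], k = 1..order."""
    x = np.asarray(x, dtype=float)
    # Autocorrelation at lags 0..order.
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    # Toeplitz matrix R[i, j] = r[|i - j|].
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    return np.linalg.solve(r[lags], r[1:order + 1])

def lp_residual(x, order):
    """Pass x through the inverse filter 1 - sum_k a[k] z^-k."""
    a = lp_coefficients(x, order)
    inverse_filter = np.concatenate(([1.0], -a))
    return np.convolve(x, inverse_filter)[:len(x)]

# Synthetic test signal: impulse response of a known 2-pole filter
# (hypothetical data, not an HRIR).
true_a = np.array([1.2, -0.72])
x = np.zeros(300)
for n in range(len(x)):
    x[n] = 1.0 if n == 0 else 0.0
    for k, ak in enumerate(true_a, start=1):
        if n - k >= 0:
            x[n] += ak * x[n - k]

a_est = lp_coefficients(x, order=2)
residual = lp_residual(x, order=2)
```

For this synthetic signal the residual is close to a unit impulse, because the all-pole part has been removed. For an HRIR the residual would retain the zeros, that is, the notches.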
Window the LP residual
[Figure: LP residual and original signal]
Autocorrelation of the windowed LP residual
[Figure: LP residual and its autocorrelation]
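A minimal sketch of these two steps, assuming the residual is tapered with the decaying half of a Hann window before its autocorrelation is computed. The window shape and length, the function name `windowed_autocorrelation` and the random test residual are assumptions made for illustration.

```python
import numpy as np

def windowed_autocorrelation(residual, length):
    """Taper the first `length` residual samples, then return the autocorrelation for lags >= 0."""
    residual = np.asarray(residual, dtype=float)
    length = min(length, len(residual))
    taper = np.hanning(2 * length)[length:]  # decaying half of a Hann window
    segment = residual[:length] * taper
    full = np.correlate(segment, segment, mode="full")
    return full[length - 1:]  # keep lags 0 .. length-1

# Toy residual: white noise (hypothetical data standing in for an LP residual).
residual = np.random.default_rng(1).standard_normal(128)
acf = windowed_autocorrelation(residual, length=64)
```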
The complete algorithm
A particular subject
Different subjects
Pinna is essential for elevation perception
• Psychoacoustical experiments have shown that high frequencies are essential for vertical localization. [Gardner and Gardner, Hebrank and Wright, Musicant and Butler]
• When the pinna cavities are occluded with plastic moulds, localization ability decreases. [Gardner and Gardner, Hoffman et al.]
• There is a consensus that elevation perception is monaural. [Wightman and Kistler, Middlebrooks and Green]
Three questions
• Are the spectral notches perceptually significant?
• How are these spectral notches caused?
• What can we do once we have these spectral notches?
  • Interpolation
  • Customization
  • Anthropometric studies
The case for notches?
• Experiments on cats suggest that single auditory nerve fibers are able to signal, in their discharge rates, the presence of a spectral notch. [Poon and Brugge]
• Interestingly, a moving notch is better [implications for head movement].
• Vertical illusion in cats. [Tollin and Yin]
• Spectral peaks and notches are the dominant cues contributed by the pinna.
• As elevation increases, the notch frequency increases.
• The spectral peaks do not show a definite trend with elevation.
• Spectral notches can be detected and discriminated. [Wright et al. and Moore et al.]
[Figure: HRIR (time in ms vs. elevation in degrees) and HRTF (frequency in kHz vs. elevation in degrees)]
Vertical Illusion in cats
Tollin DJ and Yin TCT (2003). Spectral cues explain illusory elevation effects with stereo sounds in cats. J. Neurophysiol., 90:525-530.
Problem 1: Interpolation
• HRTFs are generally measured on a finite sampling grid of elevation and azimuth.
• To implement a virtual audio system we need to interpolate HRTFs.
• HRTF measurement is a tedious and time-consuming process. It normally takes 2 to 4 hours, and the subject must remain immobile.
Current approaches for interpolation
• Time domain
• Frequency domain
• Principal components domain
• These may not be the right way to interpolate, since they do not consider the perceptual aspects of human hearing.
• Ideally, interpolation should be done in a perceptually important feature space [those features of the HRIR that are important for source localization].
Interpolation in the frequency domain
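A minimal sketch of one possible frequency-domain scheme, assuming linear interpolation of the log-magnitude spectra of two neighbouring responses (phase is ignored). The helper `notch_fir` builds a hypothetical 3-tap filter with a single notch to stand in for a measured HRIR. All names and values are illustrative.

```python
import numpy as np

FS = 44100  # sampling rate in Hz (assumed for this toy example)

def notch_fir(freq_hz, radius=0.98, fs=FS):
    """Hypothetical 3-tap FIR with a conjugate zero pair, standing in for an HRIR."""
    w0 = 2 * np.pi * freq_hz / fs
    return np.array([1.0, -2 * radius * np.cos(w0), radius ** 2])

def magnitude_db(h, n_fft=1024):
    """Magnitude spectrum in dB on the rfft grid."""
    return 20 * np.log10(np.abs(np.fft.rfft(h, n_fft)) + 1e-12)

def interpolate_magnitude_db(h_a, h_b, weight, n_fft=1024):
    """Linear interpolation of log-magnitude spectra (weight 0 -> h_a, 1 -> h_b)."""
    return (1 - weight) * magnitude_db(h_a, n_fft) + weight * magnitude_db(h_b, n_fft)

freqs = np.fft.rfftfreq(1024, d=1 / FS)
h_low, h_high = notch_fir(6000), notch_fir(8000)
midpoint = interpolate_magnitude_db(h_low, h_high, 0.5)
deepest_hz = freqs[np.argmin(midpoint)]
```

In this toy case the deepest dip of the interpolated spectrum stays near one of the original notch frequencies (6 kHz or 8 kHz) rather than moving to 7 kHz. This shows how interpolating spectra point by point differs from interpolating the notch frequency itself.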
Problem 2: HRTF Customization
• Experiments show that when HRTFs measured for one person are used for a different person, elevation perception is very poor.
• Each person's ear shape and anatomy are unique.
• Each person's localization capabilities are tuned to the shape of their own ears and anatomy.
Current approaches for Customization
• Complete Measurement.
• Database Matching.
• Numerical Modelling.
• Frequency Scaling.
Pinna shape and notches
Correlate the pinna spectral notches with elevation, azimuth and anthropometric measurements.
Empirical approaches
The cause for notches?
• Batteau’s reflection model.
• Hebrank and Wright: reflection from the posterior concha wall.
• Lopez-Poveda and Meddis incorporated diffraction into the model.
• Shaw’s model for resonances.
Reflection Model
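A minimal sketch of what a simple reflection model predicts, assuming the direct sound plus a single delayed reflection with a positive reflection coefficient. Cancellation then occurs where the delay equals an odd number of half periods, f_n = (2n + 1) / (2 t_d). The function name and the 0.1 ms delay are illustrative assumptions, not values from the slides.

```python
import numpy as np

def reflection_notch_frequencies(delay_s, f_max):
    """Notch frequencies (Hz) up to f_max for direct sound plus one non-inverted reflection.

    f_n = (2n + 1) / (2 * delay_s), n = 0, 1, 2, ...
    """
    n_max = int(np.floor(f_max * delay_s - 0.5))
    n = np.arange(n_max + 1)  # empty if no notch falls below f_max
    return (2 * n + 1) / (2 * delay_s)

# Illustrative delay of 0.1 ms between the direct and the reflected wave.
notches = reflection_notch_frequencies(delay_s=1e-4, f_max=22050.0)
```

A longer delay moves the first notch to a lower frequency, so a reflection path that shortens with elevation would raise the notch frequency.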
Shaw’s modes
Extracted Pinna Resonances
Compensating for the pinna angles
Coordinate system alignment
Reduction in ISD
Scope of our work
• We proposed a method to automatically extract the frequencies of the pinna spectral notches.
• Pinna spectral notches are perceptually significant for source localization, so instead of convolving with the complete HRIR we can build simplified models based on the extracted features.
• Interpolation can be done in the perceptually important feature domain.
• The pinna spectral notches can be related to the shape and size of the pinna, which makes customization of the HRIR possible.
Probably open questions
• How sensitive are the notch frequencies to the probe position?
• Given an image of the ear, can we hope to get the HRTF?
• What happens behind the ears?
• Are multiple spectral notches redundant?
• What is the role of the crus helicis?
• How are the spectral notches from the right and left ears integrated?
• Is analysing in terms of notches the right approach? Could there be a template-matching approach?
Thank You ! | Questions ?