Extracting the pinna spectral notches
Vikas C. Raykar | Ramani Duraiswami, University of Maryland, College Park
B. Yegnanarayana, Indian Institute of Technology, Chennai, India

Plan of the talk
• Human spatial hearing.
• Role of the pinna.
• Extracting the pinna spectral notches.
• A few exploratory studies.

Human spatial hearing
Primary cues
• Interaural Time Difference (ITD)
• Interaural Level Difference (ILD)
These can explain localization only in the horizontal plane. All points on one half of a hyperboloid of revolution have the same ITD and ILD. [cone of confusion]
Other cues
• Pinna shape gives elevation cues for higher frequencies.
• Torso and head give elevation cues for lower frequencies.
[Figure: source, head, left ear and right ear]
Intricate system to be completely modelled.

Head Related Transfer Function (HRTF)
The spectral filtering caused by the head, torso and the pinna can be described by the HRTF or, in the time domain, by the Head Related Impulse Response (HRIR).
The HRTF can be measured experimentally for all elevations and azimuths, for both ears, and for different persons.
Convolve the source signal with the measured HRIR to create virtual audio.

CIPIC Database
• Public domain HRIR database.
• HRIRs sampled at 1250 points around the head.
• 45 subjects including the KEMAR.
• Anthropometry measurements.
V. Ralph Algazi, Richard O. Duda, Dennis M. Thompson, Carlos Avendano, "The CIPIC HRTF database," in WASPAA '01 (2001 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics), Mohonk Mountain House, New Paltz, NY, Oct. 2001.

Interaural polar coordinate system
[Figure: azimuth and elevation]

Sample HRTF

HRTF Visualization

Motivation for our work

HRTF Composition
[Figure: HRIR (time in ms) and HRTF (frequency in kHz) plotted against elevation in degrees. Labelled features: direct pulse, pinna resonances, pinna spectral notches, head diffraction, torso reflection, knee reflection.]

HRTF Composition
• Good parametric models exist for the head and the torso effects.
• The role of the pinna is yet to be properly understood.

HRTFs for different pinnae

Features due to pinna
• The pinna contributes the sharp spectral notches and peaks.
• Pinna spectral notches are important for elevation perception.
• There is substantial psychoacoustical, behavioral, and neurophysiological evidence.
• We propose a method to automatically extract the frequencies of the pinna spectral notches.
[Figure: HRIR (time in ms) and HRTF (frequency in kHz) plotted against elevation in degrees]

Extracting the spectral notches
• Direct application of pole-zero modelling techniques does not work.
• An all-pole model is very good at picking the dominant spectral peaks, and can be obtained with Linear Prediction analysis.
• However, pole-zero models are highly sensitive to noise.
• The measured HRIR includes the combined effects of head diffraction and of the shoulder, torso, and knee reflections. Because of these combined effects it is difficult to isolate the notches due to the pinna alone.
• The order of the model is also not known.
• We are not guaranteed that perceptually relevant nulls will be captured. Such models fit the envelope of the signal well (provided the order is high), but perceptually relevant features may not be captured.

Pole-Zero Modelling

Determine the initial onset
The slope of the unwrapped phase spectrum gives the initial delay.
[Figure: HRIR (time in ms) and HRTF (frequency in kHz) plotted against elevation in degrees]

Eliminate torso/knee reflections
Window the signal using a Hann window.
[Figure: HRIR (time in ms) and HRTF (frequency in kHz) plotted against elevation in degrees, with the knee reflection and torso reflection marked]

Effect of Delay

Effect of windowing

Group delay function

The group delay function
• In order to emphasize the spectral nulls we compute the group delay function.
• It is the negative first derivative of the phase response of the filter with respect to frequency.
• The notch frequencies can be extracted by finding the extrema in the group delay function.

Remove the spectral peaks
• The poles can be extracted by doing a Linear Prediction (LP) analysis.
• LP analysis basically fits an all-pole model of order p to the HRIR.
• From the given HRIR signal we can compute the LP residual by passing it through the inverse filter.

Window the LP residual
[Figure: original signal and LP residual]

Autocorrelation of the windowed LP residual
[Figure: LP residual and its autocorrelation]

The complete algorithm

A particular subject

Different subjects

Pinna is essential for elevation perception
• Psychoacoustical experiments have shown that high frequencies are essential for vertical localization. [Gardner and Gardner, Hebrank and Wright, Musicant and Butler]
• When pinna cavities are occluded with plastic moulds, localization ability decreases. [Gardner and Gardner, Hoffman et al.]
• There is a consensus that elevation perception is monaural. [Wightman and Kistler, Middlebrooks and Green]

Three questions
• Are the spectral notches perceptually significant?
• How are these spectral notches caused?
• What can we do once we have these spectral notches? Interpolation, customization, anthropometric studies.

The case for notches?
• Spectral peaks and notches are the dominant cues contributed by the pinna.
• As elevation increases the notch frequency increases.
• The spectral peaks do not show a definite trend with elevation.
• Spectral notches can be detected and discriminated. [Wright et al. and Moore et al.]
• Experiments on cats suggest that single auditory nerve fibers are able to signal in their discharge rates the presence of a spectral notch. [Poon and Brugge]
• Interestingly, a moving notch is better. [implications for head movement]
• Vertical illusion in cats. [Tollin and Yin]
[Figure: HRIR (time in ms) and HRTF (frequency in kHz) plotted against elevation in degrees]

Vertical Illusion in cats
Tollin DJ and Yin TCT (2003). Spectral cues explain illusory elevation effects with stereo sounds in cats, J. Neurophysiol., 90:525-530.

Problem 1: Interpolation
• HRTFs are generally measured on a finite sampling grid of elevation and azimuth.
• To implement a virtual audio system we need to interpolate HRTFs.
• HRTF measurement is a tedious and time-consuming process. It normally takes 2 to 4 hours, and the subject must be immobile.

Current approaches for interpolation
• Time domain
• Frequency domain
• Principal components domain
• These may not be the right way to do interpolation, since they do not consider the perceptual aspect of human hearing.
• Ideally, interpolation should be done in a perceptually important feature space. [Those features in the HRIR important for source localization]

Interpolation in the frequency domain

Problem 2: HRTF Customization
• It has been shown experimentally that when an HRTF measured for one person is used for a different person, elevation perception is very poor.
• Each person’s ear shape and anatomy are unique.
• Each person’s localizing capabilities are tuned to the shape of their ear and anatomy.

Current approaches for Customization
• Complete measurement.
• Database matching.
• Numerical modelling.
• Frequency scaling.

Pinna shape and notches
Correlate the pinna spectral notches with elevation, azimuth and anthropometric measurements.
Empirical approaches.

The cause for notches?
• Batteau’s reflection model.
• Hebrank and Wright: reflection from the posterior concha wall.
• Lopez-Poveda and Meddis incorporated diffraction into the model.
• Shaw’s model for resonances.

Reflection Model

Shaw’s modes

Extracted Pinna Resonances

Compensating for the pinna angles

Coordinate system alignment

Reduction in ISD

Scope of our work
Proposed a method to automatically extract the frequencies of the pinna spectral notches.
Pinna spectral notches are perceptually significant for source localization. So instead of convolving with the complete HRIR, we can build simplified models based on the extracted features.
Interpolation can be done in the perceptually important feature domain.
The pinna spectral notches can be related to the shape and size of the pinna, so customization of the HRIR is possible.

Probably open questions
• How sensitive are the notch frequencies to the probe position?
• Given an image of the ear, can we hope to get the HRTF?
• What happens behind the ears?
• Are multiple spectral notches redundant?
• What is the role of crus helias?
• How are the spectral notches from the right and left ear integrated?
• Is analysing in terms of notches the right approach? Could there be a template matching approach?

Thank you! | Questions?
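
Backup: a sketch of the group delay step
The onset, windowing and group delay steps from the talk can be sketched in Python on a synthetic HRIR rather than measured data. This is a minimal illustration, not the authors' implementation. The LP residual and autocorrelation stages are left out, and the onset is found with a simple amplitude threshold instead of the phase-slope method. The function names `half_hann`, `group_delay` and `extract_notches`, the 1 ms decaying half Hann window, the FFT size and the group delay threshold are all assumptions chosen for illustration.

```python
import numpy as np

def half_hann(length):
    """Decaying half of a Hann window: 1 at n = 0, close to 0 at the end."""
    n = np.arange(length)
    return 0.5 * (1.0 + np.cos(np.pi * n / length))

def group_delay(x, nfft):
    """Group delay in samples, using tau = Re{ DFT(n x[n]) / DFT(x[n]) }."""
    n = np.arange(len(x))
    X = np.fft.rfft(x, nfft)
    Y = np.fft.rfft(n * x, nfft)
    # Small constant guards against division by zero at exact spectral nulls.
    return np.real(Y * np.conj(X)) / (np.abs(X) ** 2 + 1e-12)

def extract_notches(hrir, fs, win_ms=1.0, nfft=1024, threshold=-1.0):
    """Return candidate notch frequencies (Hz) of one HRIR.

    Assumed simplifications: threshold-based onset detection and no
    LP residual / autocorrelation stages.
    """
    # Onset: first sample exceeding 10% of the peak magnitude (assumption).
    magnitude = np.abs(hrir)
    onset = int(np.argmax(magnitude > 0.1 * magnitude.max()))
    # Keep about win_ms after the onset to drop torso and knee reflections.
    length = int(win_ms * 1e-3 * fs)
    segment = hrir[onset:onset + length]
    segment = segment * half_hann(len(segment))
    tau = group_delay(segment, nfft)
    # Notch candidates: strict local minima of the group delay below threshold.
    i = np.arange(1, len(tau) - 1)
    is_min = (tau[i] < tau[i - 1]) & (tau[i] < tau[i + 1])
    keep = i[is_min & (tau[i] < threshold)]
    return keep * fs / nfft

# Synthetic HRIR (hypothetical values): direct pulse, one pinna-like
# reflection 4 samples later, and a late "torso" reflection.
fs = 44100
hrir = np.zeros(200)
hrir[20] = 1.0   # direct pulse after a 20-sample initial delay
hrir[24] = 0.7   # reflection delayed by 4 samples
hrir[90] = 0.3   # late reflection, outside the 1 ms window

notches = extract_notches(hrir, fs)
print(notches)
```

For a direct pulse plus a single reflection delayed by d samples, the spectral zeros lie at odd multiples of fs/(2d). With d = 4 and fs = 44.1 kHz the sketch should therefore report notches near 5.5 kHz and 16.5 kHz, while the late tap is removed by the window and does not contribute.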