ELEN E4896 MUSIC SIGNAL PROCESSING
Lecture 3: Perception

1. Ear Physiology
2. Auditory Psychophysics
3. Pitch Perception
4. Music Perception

Dan Ellis
Dept. Electrical Engineering, Columbia University
dpwe@ee.columbia.edu
E4896 Music Signal Processing (Dan Ellis), 2013-02-04
http://www.ee.columbia.edu/~dpwe/e4896/

1. Ear Physiology
• The ear is a very sensitive transducer of air pressure variations into nerve firings
  just above Brownian motion!?
  [Figure: auditory pathway: outer ear, middle ear, inner ear (cochlea), auditory nerve, midbrain, cortex]
• The cochlea is largely understood
  the brain is more difficult

The Ear
• Impedance matching & transduction
  [Figure: pinna, ear canal, eardrum (tympanum), middle ear bones, cochlea (inner ear)]
  pinna acts as horn
  eardrum + bones match impedance
  cochlea transduces to nerve firings

The Cochlea
• Complex resonant structure
  [Figure: travelling wave on the Basilar Membrane (BM), driven at the oval window (from ME bones); resonant frequency axis 16 kHz to 50 Hz, position axis 0 to 35 mm; source: http://www.wadalab.mech.tohoku.ac.jp/FEM_BM-e.html]
  active feedback to maintain near-ringing state
  efferent fibers?
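
As a rough illustration of the place-to-frequency layout on the Cochlea slide, the sketch below interpolates between the two figure endpoints (16 kHz and 50 Hz over 0 to 35 mm). It assumes, purely for illustration, that the 16 kHz end sits at position 0 (the oval window) and that resonant frequency falls exponentially with distance. The slide does not state either assumption, and the function names are hypothetical.

```python
import math

# Endpoints read off the slide's figure axes.
F_BASE_HZ = 16000.0    # assumed to be the oval-window end, position 0 mm
F_APEX_HZ = 50.0       # assumed to be the far end, position 35 mm
BM_LENGTH_MM = 35.0

def resonant_frequency_hz(x_mm):
    """Resonant frequency at position x_mm, ASSUMING an exponential map."""
    if not 0.0 <= x_mm <= BM_LENGTH_MM:
        raise ValueError("position must lie between 0 and 35 mm")
    fraction = x_mm / BM_LENGTH_MM
    return F_BASE_HZ * (F_APEX_HZ / F_BASE_HZ) ** fraction

def position_for_frequency_mm(frequency_hz):
    """Inverse of resonant_frequency_hz under the same assumption."""
    return (BM_LENGTH_MM * math.log(frequency_hz / F_BASE_HZ)
            / math.log(F_APEX_HZ / F_BASE_HZ))

def mm_per_octave():
    """Membrane length devoted to each octave under the exponential assumption."""
    return BM_LENGTH_MM / math.log2(F_BASE_HZ / F_APEX_HZ)

if __name__ == "__main__":
    for x in (0.0, 10.0, 20.0, 35.0):
        print(f"{x:5.1f} mm -> {resonant_frequency_hz(x):8.1f} Hz")
    print(f"{mm_per_octave():.2f} mm per octave")
```

Under this assumed map every octave occupies the same length of membrane, which is one way to see why a log-frequency axis is natural for auditory displays.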

Hair Cells
• Transduce mechanical motion to nerve firings
  [Figure: cochlea cross-section: tectorial membrane, basilar membrane, Inner Hair Cell (IHC), Outer Hair Cell (OHC), auditory nerve]
  3,000 IHCs driving 20,000 nerves
  easily damaged

Auditory Nerve
• IHC fires near maximum displacement
  [Figure: local BM displacement and typical nerve signal (mV) vs. time / ms; firing count vs. cycle angle]
  cannot fire every cycle
  some “noise”

Nerve Responses
• Onset enhancement
  [Figure: spike count vs. time for a 100 ms tone burst]
• Frequency selectivity
  [Figure: dB SPL vs. (log) frequency, 100 Hz to 10 kHz]
• Dynamic range
  [Figure: spikes/sec vs. intensity / dB SPL]
  one fiber: ~25 dB dynamic range
  hearing dynamic range > 100 dB

Auditory Nerve Ensemble
• Ensemble of nerves provides full information
  similar to constant-Q log-intensity spectrogram
  [Figure: freq / 8ve re 100 Hz vs. time / ms; Secker-Walker & Searle ’90]

Auditory Models
• Filterbank + nonlinearity
  varying (but broad) bandwidth
  [Block diagram: sound, outer/middle ear filtering, cochlea filterbank, IHC stage on each channel]
  [Figure: Slaney-Patterson model output, 12 chans/oct from 180 Hz; channel vs. time / s]

2. Auditory Psychophysics
• Extensive study of relationship between physical (Φ) and psychological (Ψ) values
  perception is not “direct”!
• Common across all perceptual modalities
  proprioceptive force, body positioning
  vision
  hearing
• Φ - Ψ distinction is important!
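
One concrete Φ to Ψ mapping is the loudness power law quoted on the Loudness perception slide: L ∝ I^0.3 in the textbook figure, and L ∝ I^0.22 for the classroom fit. The sketch below works through what those exponents imply for level changes in dB. The function names are hypothetical, and the proportionality constant is ignored because only ratios are computed.

```python
import math

def loudness_ratio(level_change_db, exponent=0.3):
    """Loudness ratio for a level change in dB, assuming L ∝ I**exponent."""
    intensity_ratio = 10.0 ** (level_change_db / 10.0)
    return intensity_ratio ** exponent

def db_for_loudness_ratio(ratio, exponent=0.3):
    """Level change in dB needed to scale loudness by `ratio` under the same law."""
    return 10.0 * math.log10(ratio) / exponent

if __name__ == "__main__":
    # A 10 dB increase is a tenfold intensity ratio.
    print(f"+10 dB, exponent 0.30: loudness x{loudness_ratio(10.0):.2f}")
    print(f"+10 dB, exponent 0.22: loudness x{loudness_ratio(10.0, 0.22):.2f}")
    # A tenfold loudness increase needs about 33.3 dB when the exponent is 0.3.
    print(f"x10 loudness: {db_for_loudness_ratio(10.0):.1f} dB")
```

With the exponent 0.3, a 10 dB step comes out close to a doubling of loudness, while the shallower classroom fit gives a noticeably smaller ratio for the same physical change.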

Loudness perception
• Perception of physical parameter
  just noticeable difference - jnd
  magnitude scaling
• Weber’s law: $\Delta I \propto I$
• Loudness: $L \propto I^{0.3}$
  $\log(L) = 0.3 \log(I)$
  $\log_{10}(L) = 0.03\,\mathrm{dB}(I)$
  $\mathrm{dB}(I) = 33.3 \log_{10}(L)$
  [Figure: Hartmann (1993) classroom loudness scaling data, log(loudness rating) vs. sound level; textbook figure: $L \propto I^{0.3}$; power law fit: $L \propto I^{0.22}$]
  Hartmann ’90, tracks 19+20

Equal Loudness
• Fletcher-Munson curves (1937)
  [Figure: equal-loudness contours, intensity / dB SPL (0 to 120) vs. freq / Hz (100 to 10,000)]
  match intensity to specific 1 kHz tone
  loudness growth

Masking
• Limited dynamic range in cochlea
  effect within frequency “critical bands”
  [Figure: intensity / dB vs. log freq: absolute threshold, masking tone, masked threshold]
  basis of MPEG Audio
• Forward/backward
  temporal effects
  [Figure: level / dB vs. time / ms and freq / Bark: masking tone with elevated masking threshold “skirt”]
  tracks 23-25

Limits of Hearing
• Test what listeners can discriminate
  “two-interval forced-choice”: X = A or B?
  [Figure: intervals A, B, X along a time axis]
• Roughly...
  timing: 2 ms difference, 20 ms ordering
  tuning: ~1%
  spectral profile: single components ~ 2 dB
  phase? tones vs. noise...

3. Pitch Perception
• Complex (non-sinusoidal) tones give a single, fused percept
  despite harmonics resolved by cochlea
  [Figure: freq. chan. vs. time / s]
  percept is of a single pitch
  .. but pitch does NOT rely on the fundamental
  track 37

“Place” models of pitch
• Hypothesis: Pitch results from activation pattern
  [Figure: AN excitation vs. frequency channel: resolved harmonics; broader HF channels cannot resolve harmonics]
  correlate with harmonic ‘sieve’: pitch strength (Duifhuis et al. ’82)
  support: low harmonics are important
  but: pitch of noisy signals

“Time” models of pitch
• Autocorrelation neatly unifies pitch phenomena
  [Figure: per-channel autocorrelation (freq vs. time) and summary autocorrelation vs. lag / ms (0 to 30), peak at the common period (pitch); Meddis & Hewitt ’91]
  but: high-frequency modulation evokes weak pitch

Competing Cues
• Perhaps brains use both place & time cues
  common perceptual strategy: opportunistic combination of information
• e.g. Probabilistic combination
  $\arg\max_\theta \Pr(\theta \mid x) = \arg\max_\theta \dfrac{\Pr(\theta \mid x_1)\,\Pr(\theta \mid x_2)}{\Pr(\theta)}$
  if $x_1$, $x_2$ are independent given $\theta$

4. Music Perception
• Hearing music involves
  instruments
  notes
  rhythm
• Can study with subjective experiments
  Grey ’75

Scene Analysis
• Detect separate events
  common onset
  common harmonicity
  [Figure: spectrogram, freq / Hz (0 to 8000) vs. time / s (0 to 9); Pierce ’83]
  instruments & timbre

Consonance
• Musical intervals
  relate to harmonic proximity
• Pitch Helix
  Warren et al. 2003

Rhythm
• Sensitive to periodicity
  speech? breathing? brain?
• Onsets + autocorrelation?
  variations in tapping
  4/4 vs 3/4
  [Figure: multi-panel plots; panel titles and axis labels not recoverable]

Sequences
• Perceptual effects of sequences
  e.g. streaming
  [Figure: tone sequence, frequency vs. time; 1 kHz; TRT: 60-150 ms; Δf: –2 octaves]
  track 36
• Music is built of sequences
  different sensitivities

Summary
• Ear converts air pressure to nerve firings
  spectral analysis
• Brain does a lot with scarce information
  dealing with the real world
• Music is a complex signal
  multiple sources
  harmonic structures
  temporal patterns

References
• H. Duifhuis, L. F. Willems, & R. J. Sluyter, “Measurement of pitch in speech: An implementation of Goldstein’s theory of pitch perception,” J. Acoust. Soc. Am., 71(6):1568-1580, 1982.
• H. Fletcher & W. A. Munson, “Relation between loudness and masking,” J. Acoust. Soc. Am., 9:1-10, 1937.
• B. Gold, N. Morgan, & D. Ellis, Speech and Audio Signal Processing (2nd ed.), Wiley, 2011.
• J. M. Grey, An Exploration of Musical Timbre, Ph.D. Dissertation, Stanford CCRMA, No. STAN-M-2, 1975.
• W. M. Hartmann, “Auditory demonstrations on compact disk for large N,” J. Acoust. Soc. Am., 93(1):1-16, 1993.
• R. Meddis & M. J. Hewitt, “Virtual pitch and phase sensitivity of a computer model of the auditory periphery. I: Pitch identification,” J. Acoust. Soc. Am., 89(6):2866-2882, 1991.
• B. C. J. Moore, An Introduction to the Psychology of Hearing (4th ed.), Academic Press, 1997.
• J. R. Pierce, The Science of Musical Sound, Scientific American Press, 1983.
• H. E. Secker-Walker & C. L. Searle, “Time-domain analysis of auditory-nerve-fiber firing rates,” J. Acoust. Soc. Am., 88(3):1427-1436, 1990.
• J. D. Warren, S. Uppenkamp, R. D. Patterson, & T. D. Griffiths, “Separating pitch chroma and pitch height in the human brain,” PNAS 100(17):10038-42, 2003.
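
Worked example for Section 3 (“Time” models of pitch): the sketch below estimates pitch from the autocorrelation of a tone made only of harmonics 3, 4 and 5 of 200 Hz, so the fundamental itself is absent. It is a single-channel, broadband simplification and not the per-channel summary autocorrelation of Meddis & Hewitt. The function names, the 0.95 peak threshold and the search range are all assumptions chosen for illustration, and the input is assumed to be periodic and non-silent.

```python
import math

def harmonic_tone(f0_hz, harmonics, sample_rate_hz, duration_s):
    """Sum of equal-amplitude sinusoids at the given harmonic numbers of f0_hz."""
    n = int(sample_rate_hz * duration_s)
    return [sum(math.sin(2.0 * math.pi * h * f0_hz * i / sample_rate_hz)
                for h in harmonics)
            for i in range(n)]

def autocorrelation(signal, max_lag):
    """Autocorrelation for lags 0..max_lag over a fixed-length window."""
    window = len(signal) - max_lag
    return [sum(signal[i] * signal[i + lag] for i in range(window))
            for lag in range(max_lag + 1)]

def pitch_from_autocorrelation(signal, sample_rate_hz, min_hz=50.0, max_hz=1000.0):
    """Pitch estimate from the earliest strong autocorrelation peak."""
    min_lag = int(sample_rate_hz / max_hz)
    max_lag = int(sample_rate_hz / min_hz)
    ac = autocorrelation(signal, max_lag)
    peak = max(ac[min_lag:max_lag + 1])
    # Multiples of the period score equally well, so take the first lag that
    # comes close to the peak rather than the global maximum.
    lag = next(k for k in range(min_lag, max_lag + 1) if ac[k] >= 0.95 * peak)
    # Climb to the local maximum in case the threshold was crossed early.
    while lag < max_lag and ac[lag + 1] > ac[lag]:
        lag += 1
    return sample_rate_hz / lag

if __name__ == "__main__":
    # Components at 600, 800 and 1000 Hz only: no energy at 200 Hz.
    tone = harmonic_tone(200.0, (3, 4, 5), 8000.0, 0.1)
    print(pitch_from_autocorrelation(tone, 8000.0))
```

The estimate lands on the common period of the components, which matches the slide's point that pitch does not rely on the fundamental being present.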