Lecture 3: Perception 1. Ear Physiology 2. Auditory Psychophysics

advertisement
ELEN E4896 MUSIC SIGNAL PROCESSING
Lecture 3:
Perception
1.
2.
3.
4.
Ear Physiology
Auditory Psychophysics
Pitch Perception
Music Perception
Dan Ellis
Dept. Electrical Engineering, Columbia University
dpwe@ee.columbia.edu
E4896 Music Signal Processing (Dan Ellis)
http://www.ee.columbia.edu/~dpwe/e4896/
2013-02-04 - 1 /24
1. Ear Physiology
• The ear is a very sensitive transducer
of air pressure variations into nerve firings
just above Brownian motion !?
Middle
ear
Auditory
nerve
Cortex
Outer
ear
Midbrain
Inner ear
(cochlea)
• The cochlea is largely understood
the brain is more difficult
E4896 Music Signal Processing (Dan Ellis)
2013-02-04 - 2 /24
The Ear
• Impedance matching & transduction
Ear canal
Middle ear
bones
Pinna
Cochlea
(inner ear)
Eardrum
(tympanum)
pinna acts as horn
eardrum + bones match impedance
cochlea transduces to nerve firings
E4896 Music Signal Processing (Dan Ellis)
2013-02-04 - 3 /24
The Cochlea
• Complex resonant structure
Oval window
(from ME bones)
Travelling
wave
Basilar Membrane
(BM)
Cochlea
16 kHz
Resonant
frequency
50 Hz
0
Position
35mm
http://www.wadalab.mech.tohoku.ac.jp/FEM_BM-e.html
active feedback to maintain near-ringing state
efferent fibers?
E4896 Music Signal Processing (Dan Ellis)
2013-02-04 -
/24
Hair Cells
• Transduce mechanical motion
to nerve firings
Cochlea
Tectorial
membrane
Inner Hair Cell
(IHC)
Basilar
membrane
Auditory nerve
Outer Hair Cell
(OHC)
3,000 IHCs driving 20,000 nerves
easily damaged
E4896 Music Signal Processing (Dan Ellis)
2013-02-04 - 5 /24
Auditory Nerve
• IHC fires near maximum displacement
Local BM
displacement
50
time / ms
Typical nerve
signal (mV)
cannot fire every cycle
some “noise”
Firing
count
Cycle
angle
E4896 Music Signal Processing (Dan Ellis)
2013-02-04 - 6 /24
Nerve Responses
• Onset enhancement
• Frequency selectivity
• Dynamic range
Spike
count
100
Time
100 ms
Tone burst
One fiber:
~ 25 dB dynamic range
dB SPL
300
Spikes/sec
80
60
40
200
100
20
Intensity / dB SPL
0
100 Hz
1 kHz
(log) frequency
E4896 Music Signal Processing (Dan Ellis)
0
20
40
60
80
100
10 kHz
Hearing dynamic range > 100 dB
2013-02-04 - 7 /24
Auditory Nerve Ensemble
Secker-Walker & Searle ’90
• Ensemble of nerves provide full information
similar to constant-Q log-intensity spectrogram
freq / 8ve re 100 Hz
PatSla rectsmoo on bbctmp2 (2001-02-18)
5
4
3
2
1
0
0
10
E4896 Music Signal Processing (Dan Ellis)
20
30
40
50
60
time / ms
2013-02-04 - 8 /24
Auditory Models
• Filterbank + nonlinearity
varying (but broad) bandwidth
IHC
Sound
Outer/middle
ear
filtering
Cochlea
filterbank
IHC
SlaneyPatterson 12 chans/oct from 180 Hz, BBC1tmp (20010218)
60
channel
50
40
30
20
10
0
0.1
E4896 Music Signal Processing (Dan Ellis)
0.2
0.3
0.4
0.5
time / s
2013-02-04 - 9 /24
2. Auditory Psychophysics
• Extensive study of relationship between
physical (Φ) and psychological (Ψ) values
perception is not “direct”!
• Common across all perceptual modalities
proprioceptive force, body positioning
vision
hearing
• Φ - Ψ distinction is important!
E4896 Music Signal Processing (Dan Ellis)
2013-02-04 - 10/24
Loudness perception
• Perception of physical parameter
just noticeable difference - jnd
magnitude scaling
• Weber’s law:
• Loudness
L
log(L) =
log10 (L) =
dB(I) =
log(L)
I
I 0.3
0.3 log(I)
0.03 dB(I)
33.3 log10 (L)
log(I)
Hartmann(1993) Classroom loudness scaling data
2.6
Log(loudness rating)
I
2.4
Textbook figure:
L α I 0.3
2.2
2.0
Power law fit:
L α I 0.22
1.8
1.6
1.4
-20
-10
0
10 Sound
Hartmann
’90level
tracks 19+20
E4896 Music Signal Processing (Dan Ellis)
2013-02-04 - 11/24
Equal Loudness
• Fletcher-Munson curves (1937)
Intensity / dB SPL
120
80
40
0
100
1000
10,000
freq / Hz
match intensity to specific 1 kHz tone
loudness growth
E4896 Music Signal Processing (Dan Ellis)
2013-02-04 - 12/24
Masking
• Limited dynamic range in cochlea
Intensity / dB
effect within frequency“critical bands”
absolute
threshold
masking
tone
masked
threshold
log freq
basis of MPEG Audio
temporal effects
level / dB
• Forward/backward
Masking tone
20
Elevated masking
threshold “skirt”
18
16
14
12
10
20
8
15
6
4
10
2
5
0
0
50
100
150
time / ms
E4896 Music Signal Processing (Dan Ellis)
200
250
freq / Bark
0
tracks 23-25
2013-02-04 - 13/24
Limits of Hearing
• Test what listeners can discriminate
A
B
“two-interval
forced-choice”:
X
X = A or B?
time
• Roughly...
timing: 2 ms difference, 20 ms ordering
tuning: ~1%
spectral profile: single components ~ 2 dB
phase?
tones vs. noise...
E4896 Music Signal Processing (Dan Ellis)
2013-02-04 - 14/24
3. Pitch Perception
• Complex (non-sinusoidal) tones
give a single, fused percept
despite harmonics resolved by cochlea
freq. chan.
70
60
50
40
30
20
10
0
0.05
0.1
time/s
percept is of a single pitch
.. but pitch does NOT rely on the fundamental
track 37
E4896 Music Signal Processing (Dan Ellis)
2013-02-04 - 15/24
“Place” models of pitch
• Hypothesis:
Pitch results from activation pattern
resolved
harmonics
frequency channel
Correlate with harmonic ‘sieve’:
Pitch strength
Duifhuis et al. ’82
AN excitation
broader HF channels
cannot resolve
harmonics
frequency channel
support: low harmonics are important
but: pitch of noisy signals
E4896 Music Signal Processing (Dan Ellis)
2013-02-04 - 16/24
“Time” models of pitch
• Autocorrelation
freq
neatly unifies pitch phenomena
per-channel
autocorrelation
time
Meddis & Hewitt’ 91
autocorrelation
Summary
autocorrelation
0
10
20
30
common period
(pitch)
lag / ms
but: high-frequency modulation evokes weak pitch
E4896 Music Signal Processing (Dan Ellis)
2013-02-04 - 17/24
Competing Cues
• Perhaps brains use both place & time cues
common perceptual strategy:
opportunistic combination of information
• e.g. Probabilistic combination
arg max P r( |x)
P r( |x1 )P r( |x2 )
arg max
P r( )
if x1, x2 are independent given θ
E4896 Music Signal Processing (Dan Ellis)
2013-02-04 - 18/24
4. Music Perception
• Hearing music involves
• Can study with
subjective
experiments
E4896 Music Signal Processing (Dan Ellis)
Grey ’75
instruments
notes
rhythm
2013-02-04 - 19/24
Scene Analysis
• Detect separate events
freq / Hz
common onset
common harmonicity
8000
6000
Pierce ’83
4000
2000
0
0
1
2
3
4
5
6
7
8
9 time / s
instruments & timbre
E4896 Music Signal Processing (Dan Ellis)
2013-02-04 - 20/24
Consonance
• Musical intervals
• Pitch Helix
E4896 Music Signal Processing (Dan Ellis)
Warren et al. 2003
relate to
harmonic
proximity
2013-02-04 - 21/24
Rhythm
• Sensitive to periodicity
speech? breathing? brain?
• Onsets + autocorrelation?
variations in tapping
4/4 vs 3/4
40
40
30
30
20
20
10
10
0
1
2
3
4
5
6
7
8
9
10
4
4
2
2
0
0
-2
0
1
2
3
4
5
6
7
8
9
10
-2
80
80
60
60
40
40
20
20
0
0
1
2
3
4
5
6
7
8
9
10
0
1
2
3
4
5
6
7
8
9
10
0
1
2
3
4
5
6
7
8
9
10
0
100
200
300
400
500
600
700
800
900
1000
0
0
1
2
3
4
5
6
7
8
9
10
1000
100
500
0
0
0
100
200
300
400
500
600
700
800
E4896 Music Signal Processing (Dan Ellis)
900
1000
-100
2013-02-04 - 22/24
Sequences
• Perceptual effects of sequences
e.g. streaming
frequency
TRT: 60-150 ms
1 kHz
Δf:
–2 octaves
track 36
time
• Music is built of sequences
different sensitivities
E4896 Music Signal Processing (Dan Ellis)
2013-02-04 - 23/24
Summary
• Ear converts air pressure to nerve firings
spectral analysis
• Brain does a lot with scarce information
dealing with the real world
• Music is a complex signal
multiple sources
harmonic structures
temporal patterns
E4896 Music Signal Processing (Dan Ellis)
2013-02-04 - 24/24
References
•
•
•
•
•
•
•
•
•
•
H. Duifhuis, L. F. Willems, & R. J. Sluyter, “Measurement of pitch in speech: An
implementation of Goldstein’s theory of pitch perception,” J. Acoust. Soc. Am.,
71(6):1568-1580, 1982.
H. Fletcher & W. A. Munson, “Relation between loudness and masking,” J.
Acoust. Soc. Am., 9:1-10, 1937.
B. Gold, N. Morgan, & D. Ellis, Speech and Audio Signal Processing (2nd ed.),
Wiley, 2011.
J. M. Grey, An Exploration of Musical Timbre, Ph.D. Dissertation, Stanford
CCRMA, No. STAN-M-2, 1975.
W. M. Hartmann, “Auditory demonstrations on compact disk for large N,” J.
Acoust. Soc. Am., 93(1):1-16, 1993.
R. Meddis & M. J. Hewitt, “Virtual pitch and phase sensitivity of a computer
model of the auditory periphery. I: Pitch identification,” J. Acoust. Soc. Am.,
89(6):2866-2882, 1991.
B. C. J. Moore, An Introduction to the Psychology of Hearing (4th ed.), Academic
Press, 1997.
J. R. Pierce, The Science of Musical Sound, Scientific American Press, 1983.
H. E. Secker-Walker & C. L. Searle, “Time-domain analysis of auditory-nervefiber firing rates,” J. Acoust. Soc. Am., 88(3):1427- 1436, 1990.
J. D. Warren, S. Uppenkamp, R. D. Patterson, & T. D. Griffiths, “Separating pitch
chroma and pitch height in the human brain,” PNAS 100(17):10038-42, 2003.
E4896 Music Signal Processing (Dan Ellis)
2013-02-04 - 25/24
Download