critical bands/auditory filters

Auditory perception classes:
critical bands/auditory filters
David Poeppel
Questions/complaints/concerns:
david.poeppel@nyu.edu
Coronal slice (structural MRI) illustrating localized
activation in superior auditory cortex (upper bank of
superior temporal gyrus) to sinusoidal tones of different
frequencies.
Coronal slice illustrating auditory pathway from ear to
auditory cortex
Auditory system
Kandel 2000
[Figure: levels of the primate auditory pathway, bottom to top, with area labels]
Contralateral cochlear n.: AVCN, PVCN, DCN
Sup. olivary complex: LSO, MSO, MNTB
Lateral lemniscus: DNLL, VNLL
Midbrain: ICc, ICp, ICdc, ICx
Thalamus: MGv, MGd, MGm, Sg/Lim, PM
Core: A1, R, RT
Belt: CL, CM, ML, AL, RM, RTL, RTM
Parabelt: CPB, RPB
Temporal lobe: Tpt, STS, TS1, 2
Frontal lobe: 8a, 46d, 12vl, 10, orb
Kaas & Hackett , PNAS 97, 11793 - 9 (2000)
Hall & Garcia, 2012
[Figure: ascending auditory pathway, shown for both sides: left/right cochlea, cochlear nucleus, superior olivary complex, lateral lemniscus, inferior colliculus, medial geniculate body, auditory cortex]
Chandrasekaran & Kraus 2009
Functional anatomy of speech sound processing
Hickok & Poeppel, 2007, Nat Rev Neurosci
A few reminders about the characteristics
• Frequency range of human auditory system
– 20 Hz to 20,000 Hz (textbook); 50 Hz to 10,000 Hz (really); most
psychophysics is done between 100 - 5,000 Hz (because that is the
range in which one obtains interpretable data).
• Intensity range
– Extends over many orders of magnitude (depending on frequency); at
the ‘sweet spot’ (~1000-3000 Hz) about a 120 dB dynamic range
• Sensitivity
– JNDs for frequency: ~0.2% (e.g., at 1000 Hz base frequency, listeners
can distinguish 1000 Hz from 1002 Hz -- impressive!)
– JNDs for loudness discrimination: ~1 dB
– Sensitivity to timing differences: a few microseconds in spatial hearing
(JNDs for azimuthal localization ~1 deg); 2 milliseconds (gap
thresholds); 25 milliseconds (order threshold). Really impressive …
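The numbers above are easier to appreciate when converted to ratios. The following is a minimal arithmetic sketch; the helper names are illustrative and not part of the lecture material.

```python
def db_to_power_ratio(db):
    """Convert a level difference in dB to a power (intensity) ratio."""
    return 10 ** (db / 10)

def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to an amplitude (pressure) ratio."""
    return 10 ** (db / 20)

# 120 dB dynamic range at the 'sweet spot'
dynamic_range_power = db_to_power_ratio(120)         # 10^12 in power
dynamic_range_pressure = db_to_amplitude_ratio(120)  # 10^6 in pressure

# Frequency JND of ~0.2% at a 1000 Hz base frequency
jnd_hz = 0.002 * 1000                                # 2 Hz

# A 1 dB level JND expressed as a power ratio (about 1.26)
jnd_power_ratio = db_to_power_ratio(1)
```

A 120 dB range therefore spans twelve orders of magnitude in power, while the 1 dB JND corresponds to a power change of roughly 26%.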
Absolute threshold of hearing in quiet
audiograms
Perceptual attributes of sounds
• Spatial location
– Binaural hearing (inter-aural time and intensity differences),
head-related transfer function.
• Loudness
– Signal amplitude (ASA Demos 8-11, compare 6/3/1 dB steps)
• Pitch
– sound frequency, fundamental frequency of complex periodic
signals, or inter-harmonic spacing
• Timbre
– Distribution of energy across frequency, shape of the spectrum
The frequency resolution/processing of the system
underlies the construction of perceptual attributes.
Pure vs. complex tones (all A440) pitch, timbre, phase
[Figure: waveforms plotted against time t; period T = 2.27 ms]
→ pitch is (largely) phase invariant
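One way to see why phase matters little for pitch: changing the starting phases of the harmonics changes the waveform but not the period or the magnitude spectrum. The sketch below assumes a three-harmonic complex tone with arbitrary amplitudes and a sample rate chosen so that one period is exactly 100 samples; none of these values come from the slides.

```python
import math

F0 = 440.0      # A440 fundamental (Hz)
FS = 44000      # assumed sample rate: one period = exactly 100 samples
N = 100 * 10    # ten full periods

def complex_tone(phases, amps=(1.0, 0.5, 0.25)):
    """Sum of harmonics k*F0 with the given amplitudes and starting phases."""
    return [sum(a * math.sin(2 * math.pi * (k + 1) * F0 * n / FS + ph)
                for k, (a, ph) in enumerate(zip(amps, phases)))
            for n in range(N)]

def harmonic_magnitude(x, k):
    """Amplitude of harmonic k (1-based) by projecting onto cosine and sine."""
    w = 2 * math.pi * k * F0 / FS
    re = sum(v * math.cos(w * n) for n, v in enumerate(x))
    im = sum(v * math.sin(w * n) for n, v in enumerate(x))
    return 2 * math.hypot(re, im) / len(x)

period_ms = 1000 / F0   # about 2.27 ms

sine_phase = complex_tone((0.0, 0.0, 0.0))
shifted_phase = complex_tone((0.0, math.pi / 2, math.pi))

mags_a = [harmonic_magnitude(sine_phase, k) for k in (1, 2, 3)]
mags_b = [harmonic_magnitude(shifted_phase, k) for k in (1, 2, 3)]
max_waveform_diff = max(abs(a - b) for a, b in zip(sine_phase, shifted_phase))
```

The two tones have identical harmonic magnitudes (`mags_a` equals `mags_b` up to rounding) even though their waveforms differ sample by sample.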
The auditory periphery
Human cochlea: 3-D reconstruction
Cochlear animation, the Hudspeth version
http://www.rockefeller.edu/labheads/hudspeth/movie06_popup.html
Masking
• The interference one sound causes in the
reception of another sound
– Peripheral component/cause: overlapping excitation pattern
– Central component/cause: uncertainty - “informational masking”
• Masking experiments have been used
extensively to investigate spectral and temporal
aspects of hearing
– Masking to study frequency selectivity: the critical band
– Forward and backward masking (temporal and spectral constraints)
– Comodulation masking release, CMR (‘unmasking’ of a sub-threshold signal by comodulated noise in other frequency regions)
Classic experiment: Fletcher 1940
Stimuli: band-limited noise.
Determine the threshold of a sinusoidal signal in noise. The noise is always centered at the signal frequency.
[Figure: signal and masker spectra; sound level (dB) vs. frequency (Hz)]
Schooneveldt & Moore 1989
• ASA Demos 2-6 -- count tones in noise, as function of bandwidth.
• Increases in noise bandwidth result in more noise passing through a given filter, yielding more masking. However, once the noise bandwidth exceeds the filter bandwidth, the threshold stops changing. The bandwidth beyond which further increases yield no further threshold increase is the critical band.
• Starting with Fletcher, masking studies have been used to evaluate frequency selectivity of auditory
system.
• Interpretation of masking data: the auditory periphery can be described as a bank of bandpass filters with overlapping passbands. These “auditory filters” comprise the first stage in the spectrotemporal analysis of all sounds.
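The band-widening logic can be sketched with an idealized rectangular filter: the threshold grows with noise bandwidth until the noise is wider than the filter, then stays flat. The critical bandwidth (160 Hz), noise spectrum level (40 dB) and ratio K = 1 below are hypothetical values chosen for illustration only.

```python
import math

def masked_threshold_db(noise_bw_hz, cb_hz=160.0, n0_db=40.0, k=1.0):
    """Signal threshold (dB) for noise of width noise_bw_hz centered on the signal.

    Only the noise falling inside a rectangular filter of width cb_hz masks
    the signal; cb_hz, n0_db and k are hypothetical illustration values.
    """
    effective_bw = min(noise_bw_hz, cb_hz)
    return n0_db + 10 * math.log10(effective_bw) + 10 * math.log10(k)

bandwidths = [25, 50, 100, 200, 400, 800]
thresholds = [masked_threshold_db(bw) for bw in bandwidths]
```

In this sketch the threshold rises by about 3 dB per doubling of bandwidth below 160 Hz and is constant for the 200, 400 and 800 Hz maskers.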
Critical bands by loudness comparison
[Figure axes: loudness vs. frequency]
Zwicker & Feldtkeller 1967; Scharf 1970; Rossing 1982
Reference noise band compared to test noise band with increasing bandwidth (constant power).
When the bandwidth of the test noise exceeds the critical bandwidth, the loudness begins to increase.
(ASA Demo 7)
Model of masking: Power spectrum model
1. The (peripheral) auditory system contains an array of linear overlapping bandpass filters.
2. When detecting a signal in noise, the listener makes use of just one filter, centered close to the signal frequency. This filter will pass the signal but remove a great deal of the noise.
3. Only the noise components passing through the filter will mask the signal.
4. The threshold is determined by the amount of noise passing through the filter. The threshold corresponds to some signal-to-noise ratio K at the output of the filter.
• Simplifying assumption made by Fletcher: rectangular filters, ‘flat top’, width of the filter is CB.
• Estimate value of CB indirectly by measuring power of sinusoidal signal Ps required for
detection in broadband white noise of power density N0.
Noise falling within CB is N0 x CB. Following 4 above, Ps/(N0 x CB) = K
CB = Ps/(N0 x K)
By measuring Ps and N0 and estimating K, the value of the critical band can be determined.
(Fletcher estimated K=1; Scharf, 1970, revised that to about 0.4) (Ps/N0 called ‘critical ratio’)
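As a worked example of CB = Ps/(N0 x K), the sketch below converts a measured critical ratio into a critical-band estimate. The threshold (58 dB) and noise spectrum level (40 dB per Hz) are made-up numbers, not data from the lecture; only the two K values come from the text above.

```python
def critical_band_hz(ps_db, n0_db, k=1.0):
    """CB = Ps / (N0 * K), with Ps given in dB and N0 in dB per Hz."""
    critical_ratio = 10 ** ((ps_db - n0_db) / 10)   # Ps / N0, in Hz
    return critical_ratio / k

# Hypothetical measurement: tone threshold 18 dB above the noise spectrum level
cb_fletcher = critical_band_hz(ps_db=58.0, n0_db=40.0, k=1.0)   # K = 1
cb_scharf = critical_band_hz(ps_db=58.0, n0_db=40.0, k=0.4)     # K = 0.4
```

With these illustrative numbers the critical ratio is about 63 Hz, so K = 1 gives a critical band of about 63 Hz while K = 0.4 gives about 158 Hz, a factor of 2.5 wider.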
Estimating the shape of the auditory filter based on power-spectrum model:
$$P_s = K \int_{0}^{\infty} N(f)\, W(f)\, df$$
• Masker is represented by its long-term power spectrum N(f)
• Weighting function, or auditory filter is W(f)
• Ps is power of the signal at threshold
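A quick numerical check shows how this integral reduces to Fletcher's Ps = K x N0 x CB when the masker is white and the filter is rectangular. This is a sketch under assumed values: the noise density, center frequency, bandwidth and integration step are all hypothetical.

```python
def signal_power_at_threshold(noise_psd, weight, k, f_max=20000.0, df=1.0):
    """Midpoint-rule approximation of Ps = K * integral of N(f) W(f) df on [0, f_max]."""
    n_steps = int(f_max / df)
    total = 0.0
    for i in range(n_steps):
        f = (i + 0.5) * df
        total += noise_psd(f) * weight(f) * df
    return k * total

N0 = 1e-4                 # hypothetical flat noise power density (power per Hz)
FC, CB = 1000.0, 160.0    # hypothetical center frequency and critical bandwidth (Hz)

def white_noise(f):
    """Long-term power spectrum N(f) of white noise: constant density."""
    return N0

def rectangular_filter(f):
    """Fletcher's simplifying assumption: flat-topped W(f) of width CB."""
    return 1.0 if abs(f - FC) <= CB / 2 else 0.0

ps = signal_power_at_threshold(white_noise, rectangular_filter, k=0.4)
```

Here `ps` matches `0.4 * N0 * CB`; swapping in a different `weight` function is how a non-rectangular filter shape would enter the prediction.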
Patterson’s ‘notched noise’ method
To assess auditory filter shape, Roy Patterson developed a new masking approach.
The signal is fixed in frequency and the masker is noise with a bandstop, the width of which is varied.
Threshold corresponds to a constant signal-to-masker ratio at the output of the filter.
Patterson, R.D. (1976). J. Acoust. Soc. Am., 59, 640-659.
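The notched-noise prediction can be sketched by integrating a filter shape over the noise on either side of the notch. The rounded-exponential (roex) weighting used here is a shape commonly fitted to notched-noise data; it is brought in as an assumption and does not appear on the slides, and the values of p, the noise level and the integration limits are hypothetical.

```python
import math

def roex(g, p):
    """Rounded-exponential weighting; g = |f - fc| / fc is the normalized deviation."""
    return (1 + p * g) * math.exp(-p * g)

def notched_noise_threshold_db(notch_half_width, p=25.0, fc=1000.0,
                               n0_db=40.0, k_db=0.0, g_max=0.8, steps=4000):
    """Threshold (dB) for a notch of +/- notch_half_width (fraction of fc) around fc."""
    dg = (g_max - notch_half_width) / steps
    # Noise passed on one side of the notch (midpoint rule), doubled by symmetry.
    one_side = sum(roex(notch_half_width + (i + 0.5) * dg, p) * dg
                   for i in range(steps))
    passed_bw_hz = 2 * fc * one_side
    return k_db + n0_db + 10 * math.log10(passed_bw_hz)

notch_widths = [0.0, 0.1, 0.2, 0.3, 0.4]
thresholds = [notched_noise_threshold_db(g) for g in notch_widths]
```

As the notch widens, less noise passes through the filter and the predicted threshold falls; the rate of that fall is what constrains the filter shape in the real experiment.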
Patterson, R.D. (1974). J. Acoust. Soc. Am., 55, 802-809.
Shape of auditory filter from notched noise
Typical values of auditory filter bandwidth, based on the notched-noise approach: 10-15% of center frequency.
The auditory filter, unsurprisingly, is unlike a simple rectangular filter. This filter cannot be
specified with a single number …
However, some sort of summary statistic is useful. Common measure: bandwidth of the filter at
the point at which the power has fallen by a factor of 2 -- i.e. by 3 dB. (Other measure: equivalent
rectangular bandwidth (ERB)).
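For a concrete sense of the ERB, the sketch below uses the widely cited approximation ERB = 24.7 (4.37 F + 1), with F in kHz, from Glasberg & Moore (1990). That formula is not given on the slides and is included here as an outside assumption.

```python
def erb_hz(fc_hz):
    """ERB (Hz) at center frequency fc_hz, Glasberg & Moore (1990) approximation.

    The formula is an assumption brought in for illustration; it is not
    stated in the lecture slides.
    """
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

centers = [250, 500, 1000, 2000, 4000]
# (center frequency, ERB in Hz, ERB as a percentage of center frequency)
erb_table = [(fc, erb_hz(fc), 100 * erb_hz(fc) / fc) for fc in centers]
```

At 1 kHz this gives an ERB of about 133 Hz, roughly 13% of the center frequency; the percentage is larger at low center frequencies.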
Other approaches to characterizing the auditory filter:
• psychophysical tuning curves (e.g. Vogten 1974)
• rippled noise method (Glasberg, Moore, Nimmo-Smith 1984 )
Psychophysical tuning curves
Four approaches to study critical bands
1. Masking with noise on target
2. Critical bands by loudness
3. Notched noise method
4. Psychophysical tuning curves
• A critical band CB is understood as a spectral window over which
energy is integrated for certain tasks.
• Spectral band filters, as observed in psychoacoustics and in
behavioral experiments, are sliding spectral windows.
• CB estimates in humans are ~0.14 - 0.23 octaves
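To relate the octave figures to bandwidths in Hz, a band of a given octave width can be placed symmetrically (on a log axis) around a center frequency. This is a small conversion sketch; the 1 kHz center frequency is an arbitrary example.

```python
def octave_band_to_hz(fc_hz, width_octaves):
    """Width in Hz of a band width_octaves wide, geometrically centered on fc_hz."""
    upper = fc_hz * 2 ** (width_octaves / 2)
    lower = fc_hz * 2 ** (-width_octaves / 2)
    return upper - lower

narrow = octave_band_to_hz(1000.0, 0.14)   # roughly 97 Hz
wide = octave_band_to_hz(1000.0, 0.23)     # roughly 160 Hz
```

At 1 kHz the 0.14 to 0.23 octave range therefore corresponds to roughly 10-16% of the center frequency.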
The width of the critical band (auditory filter)
changes with center frequency
The shape of the critical band (auditory filter)
changes with signal amplitude
Relation between auditory filters and
excitation pattern
Top: 1-kHz sinusoid as ‘represented’ by five auditory filters, centered at different frequencies.
Bottom: calculated excitation pattern of auditory system
Moore & Glasberg 1983
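An excitation pattern can be sketched as the output level of each auditory filter, plotted against the filter's center frequency. The version below is a simplification: it assumes symmetric roex filters whose ERB follows the Glasberg & Moore (1990) approximation, and a 60 dB tone. Neither the filter shape nor the level comes from the slides, and real filters are asymmetric and level dependent.

```python
import math

def erb_hz(fc_hz):
    """ERB approximation (Glasberg & Moore, 1990); an assumption, not from the slides."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def filter_gain_db(fc_hz, tone_hz):
    """Attenuation (dB) of a tone by a symmetric roex filter centered at fc_hz."""
    p = 4.0 * fc_hz / erb_hz(fc_hz)          # roex slope giving the desired ERB
    g = abs(tone_hz - fc_hz) / fc_hz         # normalized frequency deviation
    w = (1 + p * g) * math.exp(-p * g)
    return 10 * math.log10(w)

TONE_HZ, TONE_LEVEL_DB = 1000.0, 60.0        # hypothetical 1 kHz tone at 60 dB
centers = [600, 800, 1000, 1200, 1400]       # five filters, as in the top panel
excitation = [(fc, TONE_LEVEL_DB + filter_gain_db(fc, TONE_HZ)) for fc in centers]
```

The pattern peaks at the filter centered on the tone and falls off for filters centered further away; using many closely spaced center frequencies would trace out a smooth excitation pattern.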
Summary 1: The filter-bank model of hearing
• The basilar membrane 'filters' parts of a sound into the auditory-nerve fibers, so that
the output of the cochlea is like the output of a bank of filters that transmit information in
parallel.
• Each of these 'auditory' filters is centered at a different frequency, and responds to
only a narrow range of frequencies. The centre frequencies of adjacent filters are very
close, so that their frequency ranges overlap considerably.
• There are around 25,000 nerve fibres in the auditory nerve, so the center frequencies
of filters are, effectively, continuously distributed over the ear's frequency range.
Summary 2: critical bands/auditory filters
Human auditory perceptual analysis is quantized into < 30 “critical bands” of perceptually near-identical frequency analysis classes, corresponding to approximately equal-length bands of cochlear tissue (receptor surface).
Hall & Garcia, 2012
[Figure labels: basal end, apical end]
Co-modulation masking release (CMR)
Task: Detection of a pure tone embedded in noise
Stimuli: Target & 2 types of noise
1) Gaussian white noise (UM)
2) Amplitude-modulated noise (CM): Gaussian white noise x sinusoid
Co-modulated: temporal fluctuations across different frequency bands
[Figure axes: amplitude vs. time]
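The two masker types can be generated along these lines. This is a minimal sketch: the sample rate, duration, 10 Hz modulation rate and the use of a raised sinusoid (so the envelope stays between 0 and 1) are assumptions, and band-limiting the noise to a given bandwidth is left out.

```python
import math
import random

def make_maskers(duration_s=1.0, fs=8000, mod_hz=10.0, seed=0):
    """Return (unmodulated, comodulated) noise maskers as lists of samples."""
    rng = random.Random(seed)
    n = int(duration_s * fs)
    um = [rng.gauss(0.0, 1.0) for _ in range(n)]      # Gaussian white noise
    # Assumed envelope: raised sinusoid between 0 and 1.
    envelope = [0.5 * (1 + math.sin(2 * math.pi * mod_hz * i / fs))
                for i in range(n)]
    # Multiplying broadband noise by one envelope imposes the same
    # fluctuation on every frequency band, i.e. the bands are co-modulated.
    cm = [x * e for x, e in zip(um, envelope)]
    return um, cm

um_noise, cm_noise = make_maskers()
```

Because a single envelope multiplies the whole noise, the dips in the CM masker occur at the same moments in all frequency bands.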
Auditory CMR: stimulus presentation timeline & summary
Target = 750 ms; mask = 1000 ms; ISI = 500 ms
x 100 trials per condition per session x 3...
Task: Detection of a pure tone embedded in noise
Stimuli: Target masked by either UM or CM noise
Each type of noise masker is filtered to have different bandwidths, typically 50, 100, 200, 400, 1000, 2000 Hz (6 bandwidths x 2 masker types = 12 blocks), centered at a given frequency (e.g., 1, 2 or 4 kHz)
Measure: thresholds for detection of target in each noise condition
using interleaved staircases (2x 1-up, 2-down), 50 trials per staircase
Trials blocked by condition type
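The adaptive procedure can be sketched as follows. A 1-up, 2-down rule lowers the signal level after two consecutive correct responses and raises it after any error, so it converges near the 70.7%-correct point of the psychometric function. The simulated listener (a logistic function), the step size and the starting level are hypothetical, and only a single staircase is shown rather than two interleaved ones.

```python
import math
import random

def run_staircase(true_threshold_db=50.0, start_db=60.0, step_db=2.0,
                  n_trials=50, slope_db=1.0, seed=1):
    """Simulate a 1-up, 2-down staircase; return (levels, reversal_levels)."""
    rng = random.Random(seed)
    level, correct_in_row, last_direction = start_db, 0, 0
    levels, reversals = [], []
    for _ in range(n_trials):
        levels.append(level)
        # Hypothetical listener: logistic psychometric function of signal level.
        p_correct = 1.0 / (1.0 + math.exp(-(level - true_threshold_db) / slope_db))
        direction = 0
        if rng.random() < p_correct:
            correct_in_row += 1
            if correct_in_row == 2:      # two correct in a row: make it harder
                correct_in_row, direction = 0, -1
        else:                            # one error: make it easier
            correct_in_row, direction = 0, +1
        if direction != 0:
            if last_direction != 0 and direction != last_direction:
                reversals.append(level)  # track direction changes
            last_direction = direction
            level += direction * step_db
    return levels, reversals

levels, reversals = run_staircase()
# Threshold estimate: mean level at the reversals (fallback: last level).
estimate_db = sum(reversals) / len(reversals) if reversals else levels[-1]
```

In an experiment, the simulated listener would be replaced by the participant's responses, and one such staircase would be run per masker type and bandwidth.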
Co-modulation masking release (CMR)
[Figure: results for UM and CM maskers; the critical band width for a 1 kHz center frequency is marked]
First reported in: Hall, J.W., Haggard, M.P. & Fernandez, M.A. (1984). Detection by spectro-temporal pattern analysis. J. Acoust. Soc. Am. 76: 50-56.
Summary
• Target detection is easier when the noise masker is amplitude co-modulated (CM) compared to the reference (UM) white noise condition.
• The CMR effect increases with increasing bandwidth (counterintuitive... adding more noise reduces threshold!)
• Proposed mechanisms (many): e.g., “the dip-listening hypothesis”
Co-modulation masking release (CMR)
• The classical auditory CMR effect (typically run at a 50 Hz modulation frequency) can also be obtained at lower modulation frequencies (extensive piloting done at 4 Hz, 5 Hz and 10 Hz).
These results are also reported in the literature: Bacon et al. (1997). Masking by modulated and unmodulated noise: effects of bandwidth, modulation rate, signal frequency, and masker level.
Spectral resolution is great.
But do you always need it?
Shannon 1995