17-Audition

Audition
(or, how we hear things)
April 7, 2009
Dirty Work
• Final interim course reports to turn in.
• Final project report guidelines to hand out...
• On Thursday, we’ll talk about auditory (exemplar) models
of speech perception.
• Recap: categorical perception homework.
How Do We Hear?
• The ear is the organ of hearing. It converts sound waves
into electrical signals in the brain.
• the process of “audition”
• The ear has three parts:
• The Outer Ear
• sound is represented acoustically (in the air)
• The Middle Ear
• sound is represented mechanically (in solid bone)
• The Inner Ear
• sound is represented in a liquid
The Ear
Outer Ear Fun Facts
• The pinna, or auricle, is a bit more receptive to sounds
from the front than sounds from the back.
• It functions primarily as “an earring holder”.
• Sound travels down the ear canal, or auditory meatus.
• Length ≈ 2-2.5 cm
• Sounds between ≈ 3500-4000 Hz resonate in the ear
canal
• The tragus protects the opening to the ear canal.
• Optionally provides loudness protection.
• The outer ear dead ends at the eardrum, or tympanic
membrane.
The Middle Ear
(Figure: the hammer (malleus), the anvil (incus), the stirrup
(stapes), and the eardrum)
The Middle Ear
• The bones of the middle ear are known as the ossicles.
• They function primarily as an amplifier.
• = increase sound pressure by about 30 dB
• Works by focusing sound vibrations into a smaller area
• area of eardrum = 0.55 cm²
• area of footplate of stapes = 0.032 cm²
• Think of a thumbtack...
Concentration
• Pressure (on any given area) = Force / Area
• Pushing on a cylinder provides
no gain in force at the other end...
• Areas are equal on both sides.
• Pushing on a thumbtack provides
a gain in force equal to A1 / A2.
• For the middle ear,
force gain ≈ 0.55 / 0.032 ≈ 17
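The area-ratio arithmetic above can be checked directly (a minimal sketch using the areas from the slide):

```python
# "Thumbtack" pressure gain of the middle ear:
# Pressure = Force / Area, so focusing the same force onto a
# smaller area multiplies pressure by the ratio of the areas.
eardrum_area = 0.55   # cm², area of the eardrum
stapes_area = 0.032   # cm², area of the stapes footplate

pressure_gain = eardrum_area / stapes_area
print(round(pressure_gain, 1))  # -> 17.2
```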
Leverage
• The middle ear also exerts a lever action on the inner
ear.
• Think of a crowbar...
• Force difference is
proportional to ratio of
handle length to end length.
• For the middle ear:
• malleus length /
stapes length
• ratio ≈ 1.3
Conversions
• Total amplification of middle ear ≈ 17 * 1.3 ≈ 22
• increases sound pressure by 20 - 24 dB
• Note: people who have lost their middle ear bones can
still hear...
• With a 20-24 dB loss in sensitivity.
• (Fluid in inner ear absorbs 99.9% of acoustic energy)
• For loud sounds (> 85-90 dB), a reflex kicks in to
attenuate the vibrations of the middle ear.
• this helps prevent damage to the inner ear.
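Combining the area gain and lever gain, and converting the pressure ratio to decibels with dB = 20·log₁₀(ratio), gives a figure in the same ballpark as the slide's estimate (the exact dB value depends on the round numbers assumed):

```python
import math

area_gain = 0.55 / 0.032   # ~17.2, from area focusing
lever_gain = 1.3           # malleus/stapes lever ratio
total_gain = area_gain * lever_gain   # ~22.3x pressure amplification

# A pressure ratio r corresponds to 20 * log10(r) decibels.
db_gain = 20 * math.log10(total_gain)
```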
The Attenuation Reflex
• Requires 50-100
msec of reaction time.
• Poorly attenuates
sudden loud noises
• Muscles fatigue after
15 minutes or so
• Also triggered by
speaking
(Figure: the tensor tympani and stapedius muscles)
The Inner Ear
• In the inner ear there is a
snail-shaped structure
called the cochlea.
• The cochlea:
• is filled with fluid
• consists of several
different membranes
• terminates in membranes
called the oval window and
the round window.
Cochlea Cross-Section
• The inside of the cochlea is divided into three sections.
• In the middle of them all is the basilar membrane.
Contact
• On top of the
basilar membrane
are rows of hair
cells.
• We have about 3,500 “inner” hair cells...
• and 15,000-20,000 “outer” hair cells.
How does it work?
• On top of each hair cell
is a set of about 100 tiny
hairs (stereocilia).
• Upward motion of the
basilar membrane
pushes these hairs into
the tectorial membrane.
• The deflection of the hairs opens up channels in the hair
cells.
• ...allowing the electrically charged endolymph to flow
into them.
• This sends a neurochemical signal to the brain.
An Auditory Fourier Analysis
• Individual hair cells in
the cochlea respond
best to particular
frequencies.
• General limits:
20 Hz - 20,000 Hz
• Cells at the base
respond to high
frequencies;
tonotopic organization of the
cochlea
• Cells at the apex
respond to low.
How does this work?
• Hermann von Helmholtz (again!) first proposed the place
theory of cochlear organization.
• Original idea: one hair cell for each frequency.
• a.k.a. the “resonance theory”
• But...we can perceive more frequencies than we have
hair cells for.
• The rate theory emerged as an alternative:
• Frequency of cell firing encodes frequencies in the
acoustic signal.
• a.k.a. the “frequency theory”
• Problem: cell firing rate is limited to 1000 Hz...
Synthesis
• The volley theory attempted to salvage the frequency
rate proposal.
• Idea: frequency rates higher than 1000 Hz are “volleyed”
back and forth between individual hair cells.
• There is considerable evidence for this proposal.
Traveling Waves (in the ear!)
• Last but not least, there is the traveling wave theory.
• Idea: waves of different frequencies travel to a different
extent along the cochlea.
• Like wavelength:
• Higher frequency waves are shorter
• Lower frequency waves are longer
The Traveling Upshot
• Lower frequency waves travel the length of the
cochlea...
• but higher frequencies cut off after a short distance.
• All cells respond to lower frequencies (to some extent),
• but fewer cells respond to high frequency waves.
• Individual hair cells thus function like low-pass filters.
Hair Cell Bandwidth
• Each hair cell responds to a range of frequencies,
centered around an optimal characteristic frequency.
Frequency Perception
• In reality, there is (unfortunately?) more than one truth:
• Place-encoding (traveling wave theory) is probably
more important for frequencies above 1000 Hz;
• Rate-encoding (volley theory) is probably more
important for frequencies below 1000 Hz.
• Interestingly, perception of frequencies above 1000 Hz
is much less precise than perception of frequencies below
1000 Hz.
• Match this tone:
• To the tone that is twice the frequency:
Higher Up
• Now try it with this tone:
• Compared to these tones:
• Idea: listeners interpret pitch differences as (absolute)
distances between hair cells in the cochlea.
• Perceived pitch is expressed in units called mels.
• Twice the number of mels = a perceived pitch
twice as high.
• Mels = 1127.01048 * ln (1 + F/700)
• where acoustic frequency (F) is expressed in Hertz.
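The mel formula on the slide is easy to implement; a handy property of its constants is that 1000 Hz maps to (almost exactly) 1000 mels:

```python
import math

def hz_to_mel(f_hz):
    """Convert acoustic frequency (Hz) to perceived pitch (mels),
    using the formula from the slide."""
    return 1127.01048 * math.log(1 + f_hz / 700.0)

print(round(hz_to_mel(1000)))  # -> 1000
```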
The Mel Scale
Equal Loudness Curves
• Perceived loudness also depends on frequency.
Audiograms
• When an audiologist tests your hearing, they determine
your hearing threshold at several different frequencies.
• They then chart how much your hearing threshold differs
from that of a “normal” listener at those frequencies in an
audiogram.
• Noise-induced
hearing loss tends
to affect higher
frequencies first.
• (especially
around 4000 Hz)
Age
• Sensitivity to higher frequencies also diminishes with
age. (“Presbycusis”)
Note: the
“teen buzz”
Otitis Media
• Kids often get ear infections, which are technically
known as otitis media.
• = fluid fills the middle ear
• This leads to a form of conduction deafness, in which
sound is not transmitted as well to the cochlea.
• Auditorily, frequencies from 500 to 1000 Hz tend to drop
out.
Check out a Praat
demo.
Loudness
• The perceived loudness of a sound is measured in units
called sones.
• The sone scale also exhibits a non-linear relationship
with respect to absolute pressure values.
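The slide does not give the formula, but the standard rule behind the sone scale (Stevens' convention, an assumption added here) is that 40 phons = 1 sone and every +10 phons doubles the perceived loudness:

```python
def phon_to_sone(loudness_phons):
    """Stevens' convention: 40 phons = 1 sone;
    each +10 phons doubles loudness in sones."""
    return 2 ** ((loudness_phons - 40) / 10)

print(phon_to_sone(60))  # -> 4.0 (20 phons above reference = 2 doublings)
```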
Masking
• Another scale for measuring auditory frequency
emerged in the 1960s.
• This scale was inspired from the phenomenon of
auditory masking.
• One sound can “mask”, or obscure, the perception of
another.
• Unmasked:
• Masked:
• Q: How narrow can we make the bandwidth of the
noise, before the sinewave becomes perceptible?
• A: Masking bandwidth is narrower at lower frequencies.
Critical Bands
• Using this methodology, researchers eventually
determined that there were 24 critical bands of hearing.
• The auditory system integrates all acoustic energy
within each band.
• ⇒ Two tones within the same critical band of
frequencies sound like one tone
• Ex: critical band #9 ranges from 920-1080 Hz
• ⇒ F1 and F2 for a given vowel might merge together
• Each critical band ≈ 0.9 mm on the basilar membrane.
• ⇒ The auditory system consists of 24 band-pass filters.
• Each filter corresponds to one unit on the Bark scale.
Bark Scale of Frequency
• The Bark scale converts acoustic frequencies into
numbers for each critical band
Bark Table
Band  Center (Hz)  Bandwidth (Hz)
  1       50          20-100
  2      150         100-200
  3      250         200-300
  4      350         300-400
  5      450         400-510
  6      570         510-630
  7      700         630-770
  8      840         770-920
  9     1000         920-1080
 10     1170        1080-1270
 11     1370        1270-1480
 12     1600        1480-1720
 13     1850        1720-2000
 14     2150        2000-2320
 15     2500        2320-2700
 16     2900        2700-3150
 17     3400        3150-3700
 18     4000        3700-4400
 19     4800        4400-5300
 20     5800        5300-6400
 21     7000        6400-7700
 22     8500        7700-9500
 23    10500        9500-12000
 24    13500       12000-15500
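The Bark conversion can be sketched as a simple lookup against the band edges in the table (the function name is my own):

```python
import bisect

# Upper edges of the 24 critical bands (Hz), from the Bark table.
BARK_EDGES = [100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
              1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300,
              6400, 7700, 9500, 12000, 15500]

def hz_to_bark_band(f_hz):
    """Return the number (1-24) of the critical band containing f_hz."""
    return bisect.bisect_left(BARK_EDGES, f_hz) + 1

print(hz_to_bark_band(1000))  # -> 9 (the 920-1080 Hz band)
```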
Spectral Differences
• Acoustic vs. auditory spectra of
F1 and F2
Cochleagrams
• Cochleagrams are spectrogram-like representations
which incorporate auditory transformations for both pitch
and loudness perception
• Acoustic spectrogram vs. auditory cochleagram
representation of Cantonese word
• Check out Peter’s vowels in Praat.
Hearing Aids et al.
• Generally speaking, a hearing aid is simply an
amplifier.
• Old style: amplifies all frequencies
• New style: amplifies specific frequencies, based
on a listener’s particular hearing capabilities.
• More recently, profoundly deaf listeners may regain
some hearing through the use of a cochlear implant
(CI).
• For listeners with nerve deafness.
• However, CIs can only transmit a degraded signal to
the inner ear.
Cochlear Implants
A Cochlear Implant artificially stimulates the nerves
which are connected to the cochlea.
Nuts and Bolts
• The cochlear implant chain of events:
1. Microphone
2. Speech processor
3. Electrical stimulation
• What the CI user hears is entirely determined by the
code in the speech processor
• The number of electrodes stimulating the cochlea ranges
from 8 to 22.
• ⇒ poor frequency resolution
• Also: cochlear implants cannot stimulate the low
frequency regions of the auditory nerve
Noise Vocoding
• The speech processor operates like a series of critical
bands.
• It divides up the frequency scale into 8 (or 22) bands and
stimulates each electrode according to the average
intensity in each band.
• This results in what sounds (to us) like a highly degraded
version of natural speech.
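The band-averaging idea can be sketched as a crude frame-by-frame noise vocoder (a minimal illustration, not a real CI processor; the log-spaced band edges are my own assumption):

```python
import numpy as np

def noise_vocode(signal, sr, n_bands=8, frame=256):
    """Sketch of noise vocoding: in each frame, replace each band's
    spectral content with noise scaled to that band's average level."""
    # Hypothetical log-spaced band edges from 100 Hz up to Nyquist.
    edges = np.logspace(np.log10(100.0), np.log10(sr / 2.0), n_bands + 1)
    freqs = np.fft.rfftfreq(frame, 1.0 / sr)
    out = np.zeros(len(signal))
    for start in range(0, len(signal) - frame + 1, frame):
        spec = np.fft.rfft(signal[start:start + frame])
        noise_spec = np.fft.rfft(np.random.randn(frame))
        shaped = np.zeros_like(spec)
        for lo, hi in zip(edges[:-1], edges[1:]):
            band = (freqs >= lo) & (freqs < hi)
            if band.any():
                # Average magnitude in this band plays the role of
                # one electrode's stimulation level.
                level = np.mean(np.abs(spec[band]))
                unit = noise_spec[band] / (np.abs(noise_spec[band]) + 1e-12)
                shaped[band] = unit * level
        out[start:start + frame] = np.fft.irfft(shaped, frame)
    return out
```

Raising `n_bands` from 8 to 22 narrows each band, which is why 22-channel processing preserves more spectral detail.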
What CIs Sound Like
• Check out some nursery rhymes which have been
processed through a CI simulator:
CI Perception
• One thing that is missing from vocoded speech is F0.
• …It only encodes spectral change.
• Last year, Aaron Byrnes put together an experiment
testing intonation perception in CI-simulated speech for
his honors thesis.
• Tested: discrimination of questions vs. statements
• And identification of most prominent word in a
sentence.
• 8 channels:
• 22 channels:
The Findings
• CI User:
• Excellent identification of the most prominent word.
• At chance (50%) when distinguishing between
statements and questions.
• Normal-hearing listeners (hearing simulated speech):
• Good (90-95%) identification of the prominent word.
• Not too shabby (75%) at distinguishing statements
and questions.
• Conclusion 1: F0 information doesn’t get through the CI.
• Conclusion 2: Noise-vocoded speech might not be a
completely accurate CI simulation.
Mitigating Factors
• Success with cochlear implants is highly
variable.
• Works best for those who had hearing before they
became deaf.
• Depends a lot on the person
• Possibly because of reorganization of the brain
• Works best for (in order):
• Environmental Sounds
• Speech
• Speaking on the telephone (bad)
• Music (really bad)
Critical Period?
• For congenitally deaf users, the cochlear implant
provides an unusual test of the "forbidden experiment".
• The "critical period" is extremely early:
• They perform best the earlier they receive the implant
(12 months old is the lower limit)
• Steady drop-off in performance thereafter
• Difficult to achieve natural levels of fluency in speech.
• Depends on how much they use the implant.
• Partially due to early sensory deprivation.
• Also due to degraded auditory signal.
Practical Considerations
• It is largely unknown how well anyone will perform with a
cochlear implant before they receive it.
• Possible predictors:
• lipreading ability
• rapid cues for place are largely obscured by the
noise vocoding process.
• fMRI scans of brain activity during presentation of
auditory stimuli.
One Last Auditory Thought
• Frequency
coding of
sound is
found all the
way up in
the auditory
cortex.
• Also: some
neurons
only fire
when
sounds
change.
Critical Bands
• In theory, the auditory system divides up the frequency
scale into 24 bands and integrates all the acoustic intensity
within each band.
The bands are narrower and more numerous at the lower
end of the scale.