The incredible sense of hearing 2

EEL 6586
Automatic Speech Processing
Meena Ramani
04/12/06
Topics to be covered
Lecture 1: The incredible sense of hearing 1
Anatomy
Perception of Sound
Lecture 2: The incredible sense of hearing 2
Psychoacoustics
Hearing aids and cochlear implants
The incredible sense of hearing-2
“Behind these unprepossessing flaps lie structures of such
delicacy that they shame the most skillful craftsman”
-Stevens, S.S. [Professor of Psychophysics, Harvard University]
How do we hear?
Threshold of Hearing
Equal loudness curves
The Bass Loss Problem
Rock music
Too lowno bass
Too hightoo much bass
Threshold variation with age
[Figure: Thresholds of hearing for normal & HI listeners. Threshold of hearing (dB SPL, -10 to 90) vs. frequency (Hz, 10^2 to 10^4); curves: normal hearing, hearing impaired]
The Audiogram
[Figure: Audiogram. Hearing Level (HL), dB (-20 to 100) vs. frequency, Hz (0 to 6000); curves: left ear, right ear]
The Audiogram (contd.)
[Figure: Audiogram. Hearing Level (HL), dB (-20 to 100) vs. frequency, Hz (0 to 6000); curves: left ear, right ear]
Pure tone audiogram
[250 500 1K 2K 4K 6K] Hz
<20 dB HL is Normal Hearing
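The 20 dB HL rule can be illustrated with a short check over the pure tone audiogram frequencies. This is only a minimal Python sketch, not a clinical procedure; the function names and the example thresholds are hypothetical.

```python
# Pure tone audiogram test frequencies listed on the slide (Hz).
AUDIOGRAM_FREQS_HZ = [250, 500, 1000, 2000, 4000, 6000]
NORMAL_LIMIT_DB_HL = 20  # slide: <20 dB HL is normal hearing


def elevated_frequencies(thresholds_db_hl):
    """Return the test frequencies whose threshold is at or above the limit."""
    return [f for f, t in zip(AUDIOGRAM_FREQS_HZ, thresholds_db_hl)
            if t >= NORMAL_LIMIT_DB_HL]


def is_normal_hearing(thresholds_db_hl):
    """True when every threshold is below the 20 dB HL limit."""
    return not elevated_frequencies(thresholds_db_hl)


example_ear = [5, 10, 10, 15, 25, 40]  # hypothetical thresholds, dB HL
print(is_normal_hearing(example_ear))
print(elevated_frequencies(example_ear))
```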
Loudness Growth Curve
[Figure: LGOB loudness growth curve at 250 Hz. LGOB loudness rating (0 to 7) vs. input level (dB SPL, 0 to 100); curves: normal hearing, hearing impaired]
Otoacoustic emissions
• The ear produces some sounds!
– OHC: outer hair cell
• Used to test hearing for infants & check if
patient is feigning a loss
Monaural beats
If two tones are presented monaurally with a small frequency
difference, a beating pattern can be heard
500 & 502 Hz
500 & 520 Hz
Interaction of the two tones in the same auditory filter
Waveform: 150 Hz + 170 Hz
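The beating pattern follows from the identity sin(a) + sin(b) = 2 sin((a+b)/2) cos((a-b)/2): the sum of two tones is a tone at the mean frequency whose envelope rises and falls |f1 - f2| times per second. The Python sketch below illustrates this for the tone pairs on the slide; it assumes unit-amplitude sine tones, and the function names are hypothetical.

```python
import math


def beat_frequency(f1_hz, f2_hz):
    """Beat (envelope) rate of two summed tones, in beats per second."""
    return abs(f1_hz - f2_hz)


def two_tone_sample(f1_hz, f2_hz, t_s):
    """Sum of two unit-amplitude sine tones at time t_s (seconds)."""
    return (math.sin(2 * math.pi * f1_hz * t_s)
            + math.sin(2 * math.pi * f2_hz * t_s))


def envelope(f1_hz, f2_hz, t_s):
    """Slowly varying envelope |2 cos(pi (f1 - f2) t)| of the summed tones."""
    return abs(2 * math.cos(math.pi * (f1_hz - f2_hz) * t_s))


for pair in [(500, 502), (500, 520), (150, 170)]:
    print(pair, "beats per second:", beat_frequency(*pair))
```

For 500 & 502 Hz the envelope repeats twice per second and is heard as a slow waxing and waning; for 500 & 520 Hz it repeats 20 times per second.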
Binaural beats
Beating can also be heard when the tones are presented to
different ears!
500 Hz - left
520 Hz - right
binaural
Beating arises from neural interaction
Only perceived if the tones are sufficiently close in frequency
The case of the missing fundamental
Telephone BW: 300-3400 Hz
How do we know the pitch?
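A toy calculation shows why the pitch is still recoverable: the telephone band removes the fundamental of a low voice, but the surviving harmonics are still spaced by the fundamental. The Python sketch below assumes an integer fundamental of 100 Hz (a hypothetical example value) and uses the greatest common divisor of the surviving harmonics as a stand-in for the pitch. It illustrates the arithmetic only and is not a model of how the auditory system extracts pitch.

```python
from functools import reduce
from math import gcd

TELEPHONE_BAND_HZ = (300, 3400)  # from the slide


def harmonics_in_band(f0_hz, band=TELEPHONE_BAND_HZ, max_harmonic=50):
    """Harmonics k*f0 (k = 1..max_harmonic) that fall inside the band."""
    lo, hi = band
    return [k * f0_hz for k in range(1, max_harmonic + 1)
            if lo <= k * f0_hz <= hi]


def implied_fundamental(harmonics_hz):
    """Greatest common divisor of integer harmonic frequencies."""
    if not harmonics_hz:
        raise ValueError("no harmonics inside the band")
    return reduce(gcd, harmonics_hz)


passed = harmonics_in_band(100)      # 100 Hz itself is filtered out
print(passed[:4])                    # lowest surviving harmonics
print(implied_fundamental(passed))   # spacing of the surviving harmonics
```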
Primary Auditory cortex
• Pitch sensitive neurons [Bendor and Wang, Nature 2005]
• Neuron responds to fundamental and harmonics
• What are the inputs to these neurons?
How do spikes represent periodic, temporal and spectral
information?
Auditory-periphery model (Zhang et al. ~2001)
Matlab code available
Feed it a wav file
Spits out PSTH (post-stimulus time histogram)
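For readers unfamiliar with the PSTH, the sketch below shows how one is computed from spike times: spikes from repeated presentations of the same stimulus are counted in fixed time bins and normalized to a firing rate. This is a generic Python illustration with hypothetical names and made-up spike times; it is not the Zhang et al. model or its Matlab code.

```python
def psth(spike_trains_s, duration_s, bin_width_s):
    """Post-stimulus time histogram as firing rate (spikes/s) per bin.

    spike_trains_s: one list of spike times (seconds after stimulus
    onset) per trial. Spikes outside [0, duration_s) are ignored.
    """
    if not spike_trains_s:
        raise ValueError("need at least one trial")
    n_bins = int(round(duration_s / bin_width_s))
    counts = [0] * n_bins
    for train in spike_trains_s:
        for t in train:
            if t < 0:
                continue
            idx = int(t / bin_width_s)
            if idx < n_bins:
                counts[idx] += 1
    n_trials = len(spike_trains_s)
    # Normalize counts by number of trials and bin width to get a rate.
    return [c / (n_trials * bin_width_s) for c in counts]


# Two hypothetical trials, 40 ms analysis window, 10 ms bins.
trials = [[0.001, 0.012], [0.002, 0.035]]
print(psth(trials, duration_s=0.04, bin_width_s=0.01))
```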
Critical bands
Equally loud, close in frequency
• Same IHCs
• Slightly louder
Equally loud, separated in freq.
• Different IHCs
• Twice as loud
Psychoacoustic experiments
Critical Band (cont.)
Center Freq (Hz)    Critical BW (Hz)
100                 90
200                 90
500                 110
1000                150
2000                280
5000                700
10000               1200
• Proposed by Fletcher
• How to measure?
– S/N ratio vs noise BW
• CB ~= 1.5mm spacing on BM
• 24 such band pass filters
• BW of the filters increases with fc
• Logarithmic relationship
– Weber’s law example
• Bark scale
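The growth of critical bandwidth with center frequency, and the mapping onto the Bark scale, can be sketched with the widely used Zwicker and Terhardt closed-form approximations. These formulas are not given on the slides and are brought in here as an assumption. They follow the same trend as the tabulated values but do not reproduce them exactly (for example, roughly 160 Hz at 1 kHz against 150 Hz in the table).

```python
import math


def bark(f_hz):
    """Approximate critical-band rate (Bark) for a frequency in Hz."""
    return (13.0 * math.atan(0.00076 * f_hz)
            + 3.5 * math.atan((f_hz / 7500.0) ** 2))


def critical_bandwidth_hz(fc_hz):
    """Approximate critical bandwidth (Hz) at center frequency fc_hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (fc_hz / 1000.0) ** 2) ** 0.69


# Center frequencies taken from the critical band table.
for fc in [100, 200, 500, 1000, 2000, 5000, 10000]:
    print(fc, round(bark(fc), 1), round(critical_bandwidth_hz(fc)))
```

Under this approximation the audible range up to about 15.5 kHz spans roughly 24 Bark, in line with the 24 band pass filters mentioned above.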
Critical bands for HI
[Figure: 4 kHz tuning curve for normal & HI listeners. Desired tone threshold (dB SPL, 0 to 90) vs. desired tone frequency (Hz, 10^3 to 10^4); masker marked; curves: normal hearing, hearing impaired]
“You know I can't hear you when the water is
running!”
MASKING
Frequency Masking
• Masking occurs because two frequencies lie within a critical band and
the higher amplitude one masks the lower amplitude signal
• Masking can be caused by broadband noise, narrowband noise, pure tones and
complex tones
• Low frequency broadband sounds mask the most
– E.g. truck on road, water flowing
• Masking threshold
– Level (in dB) at which the test tone is just audible in the presence of noise
Temporal Aspects of Masking
• Simultaneous Masking
• Pre-stimulus/Backward/Premasking
– 1st: test tone, 2nd: masker
• Post-stimulus/Forward/Postmasking
– 1st: masker, 2nd: test tone
Temporal Aspects of Masking (contd.)
Simultaneous masking
– Duration > 200 ms: constant test tone threshold
– Assume the hearing system integrates over a period of 200 ms
Postmasking
– Effect of the masker decays over 100 ms
– More dominant
Premasking
– Takes place 20 ms before the masker is on!
– A sensation is not instantaneous; it requires build-up time
• Quick build-up for loud maskers
• Slower build-up for softer maskers
– Less dominant effect
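The three temporal regimes can be summarized as a simple timing rule. The Python sketch below classifies a brief test tone by its onset time relative to the masker, using the 20 ms and 100 ms windows quoted on the slide. Treating the test tone as an instant and the windows as hard cut-offs is a simplifying assumption, and the function name is hypothetical.

```python
PREMASKING_WINDOW_MS = 20.0    # slide: premasking up to 20 ms before masker
POSTMASKING_WINDOW_MS = 100.0  # slide: masker effect decays over 100 ms


def masking_regime(masker_on_ms, masker_off_ms, tone_ms):
    """Classify a brief test tone at tone_ms relative to the masker interval."""
    if masker_on_ms <= tone_ms <= masker_off_ms:
        return "simultaneous"
    if masker_on_ms - PREMASKING_WINDOW_MS <= tone_ms < masker_on_ms:
        return "premasking"
    if masker_off_ms < tone_ms <= masker_off_ms + POSTMASKING_WINDOW_MS:
        return "postmasking"
    return "none"


# Masker on from 100 ms to 300 ms; probe at several times.
for tone in [50, 90, 200, 350, 450]:
    print(tone, masking_regime(100, 300, tone))
```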
Temporal masking for HI
[Figure: Temporal resolution at 4 kHz for normal & HI listeners. Desired tone threshold (dB SPL, 0 to 80) vs. desired-masker tone separation (ms, 0 to 140); curves: normal hearing, hearing impaired]
EEL 6586
Automatic Speech Processing
Meena Ramani
04/14/06
What do the hearing impaired hear?
[Figure: Spectrogram of cell phone speech for normal hearing. Frequency (Hz, 0 to 4000) vs. time (s, 0 to 2)]
[Figure: Spectrogram of cell phone speech for SNHL. Frequency (Hz, 0 to 4000) vs. time (s, 0 to 2)]
Sensorineural Hearing Loss
Mild to Severe Loss
[10 20 30 60 80 90] dB HL
Facts on Hearing Loss in Adults
• One in every ten (28 million) Americans has hearing loss.
• The vast majority of Americans (95% or 26 million) with hearing loss
can have their hearing loss treated with hearing aids.
• Only 6 million use HAs
• Millions of Americans with hearing loss could benefit from
hearing aids but avoid them because of the stigma.
Types of Hearing aids
Behind the Ear
In the Canal
In the Ear
Completely in the Canal
Anatomy of a Hearing Aid
• Microphone
• Tone hook
• Volume control
• On/off switch
• Battery compartment
Hearing Aid Fitting
Ear Mold Measurements
Acclimatization effect
Auditory cortex brain plasticity
Time for the HI to reuse the HF information:
Acclimatization effect
How does this affect HA fitting?
– Multiple fitting sessions
– Initial fitting should be the optimum one
“So doc, what is the fitting methodology employed by the hearing aid
company to compensate for my hearing loss?”
Not-so-average Joe (PhD EE/Speech person)
“So, do you want your HA to:
1) Always be comfortably loud
2) Equalize loudness across frequencies
3) Normalize loudness
…?”
Which fitting methodology is the best?
Existing HL compensation algorithms
Hearing aid fitting algorithms
Threshold-only: HG, NAL-R, POGO
Suprathreshold: Fig 6, IHAFF, DSL
Rationale
Adhoc: Half Gain, POGO
Make speech comfortable: NAL-R
Loudness normalization: IHAFF, Fig 6
Loudness equalization: DSL
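To make the two ad hoc, threshold-only rules concrete, the sketch below computes prescribed gains from an audiogram. The half-gain rule sets the gain at each frequency to half the hearing loss. POGO, as commonly described, uses the same half-gain value reduced by 10 dB at 250 Hz and 5 dB at 500 Hz. Those constants are not stated on the slides and should be treated as an assumption, as should the pairing of the example loss [10 20 30 60 80 90] dB HL with the pure tone audiogram frequencies and the use of POGO without correction at 6 kHz. Function names are hypothetical.

```python
# Assumed pairing: audiogram frequencies with the example SNHL from the slides.
AUDIOGRAM_FREQS_HZ = [250, 500, 1000, 2000, 4000, 6000]
EXAMPLE_LOSS_DB_HL = [10, 20, 30, 60, 80, 90]

# Assumed POGO low-frequency corrections (dB); zero elsewhere.
POGO_LOW_FREQ_CORRECTION_DB = {250: -10.0, 500: -5.0}


def half_gain(loss_db_hl):
    """Half-gain rule: gain is half the hearing loss at each frequency."""
    return [0.5 * h for h in loss_db_hl]


def pogo_gain(freqs_hz, loss_db_hl):
    """Half gain plus the assumed low-frequency reductions."""
    return [0.5 * h + POGO_LOW_FREQ_CORRECTION_DB.get(f, 0.0)
            for f, h in zip(freqs_hz, loss_db_hl)]


print(half_gain(EXAMPLE_LOSS_DB_HL))
print(pogo_gain(AUDIOGRAM_FREQS_HZ, EXAMPLE_LOSS_DB_HL))
```

Both rules use only the thresholds and give the same gain at every input level, which is what separates them from the suprathreshold methods.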
Spectrograms and sound files
[Figure: spectrogram panels with sound files: Normal hearing; Hearing impaired; HI with Linear gain; HI with DSL gain; HI with RBC gain]
Sensorineural hearing loss [10 20 30 60 80 90] dB HL
Speech level = 65 dBA
Section Two
Performance metrics
Speech Intelligibility (SI): The degree to which speech can be understood
Objective Measures: AI, STI
Subjective Measures: HINT
Speech Quality: “Does the speech match your expectations?”
Objective Measures: PESQ
Subjective Measures: MOS
Performance metrics (contd.)
• Objective speech quality measure
– Perceptual Evaluation of Speech Quality (PESQ)
• Subjective speech quality measure
– Mean Opinion Score (MOS)
• Subjective speech intelligibility measure
– Hearing In Noise Test (HINT)
[Figure: block diagram with a reference signal and a comparison signal as inputs and a score as output]
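As a small illustration of the subjective quality measure, a MOS is the arithmetic mean of listener ratings on the five-point scale (5-Excellent, 4-Good, 3-Fair, 2-Poor, 1-Bad). The Python sketch below assumes integer ratings; the function names and the example ratings are hypothetical and are not data from the study.

```python
MOS_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mean_opinion_score(ratings):
    """Arithmetic mean of integer ratings on the 1-5 scale."""
    if not ratings:
        raise ValueError("need at least one rating")
    for r in ratings:
        if r not in MOS_LABELS:
            raise ValueError(f"rating {r!r} is not on the 1-5 scale")
    return sum(ratings) / len(ratings)


def nearest_label(mos):
    """Label of the scale point closest to a MOS value."""
    closest = min(MOS_LABELS, key=lambda k: abs(k - mos))
    return MOS_LABELS[closest]


ratings = [4, 5, 3, 4]  # hypothetical listener ratings
score = mean_opinion_score(ratings)
print(score, nearest_label(score))
```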
Hearing In Noise Test (HINT)
Subjective listening experiments
[Figure: Left ear audiograms of the HI subjects. Threshold of hearing (dB HL, 0 to 120) vs. frequency (Hz, 0 to 8000)]
Location:
Shands speech & hearing clinic
(sound proof booth)
Subjects:
15 HI people
– PTA: 40-70 dB HL
15 normal hearing people
Tools used:
Matlab HINT and MOS GUIs
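The PTA (pure tone average) used to describe the HI subjects is not defined on the slides. A common convention, assumed here, is the mean threshold at 500, 1000 and 2000 Hz. The Python sketch below computes it and checks it against the 40-70 dB HL range quoted above; the function names and the example audiogram are hypothetical.

```python
PTA_FREQS_HZ = (500, 1000, 2000)  # assumed three-frequency convention


def pure_tone_average(audiogram_db_hl):
    """Mean threshold (dB HL) over the PTA frequencies.

    audiogram_db_hl maps frequency in Hz to threshold in dB HL.
    """
    return sum(audiogram_db_hl[f] for f in PTA_FREQS_HZ) / len(PTA_FREQS_HZ)


def in_study_range(pta_db_hl, lo=40.0, hi=70.0):
    """True when the PTA lies in the 40-70 dB HL range of the HI subjects."""
    return lo <= pta_db_hl <= hi


example = {250: 30, 500: 40, 1000: 50, 2000: 60, 4000: 70}  # hypothetical
pta = pure_tone_average(example)
print(pta, in_study_range(pta))
```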
Subjective HINT and MOS scores for RBC:
hearing impaired, cell phone speech
[Figure: Ave. HINT scores of 15 HI subjects (SNR relative to baseline, dB, -20 to 5) and ave. MOSs of 15 HI subjects (1-Bad, 2-Poor, 3-Fair, 4-Good, 5-Excellent), by algorithm: None, HPF, RBC, NALR, POGO, HG, NALRP, DSL]
RBC has a 7 dB improvement in SI when compared to DSL
MOS scores reveal that RBC has a quality rating of ‘Good’
Subjective HINT and MOS scores for RBC:
normal hearing, cell phone speech
RBC has a 12 dB improvement in SI when compared to DSL
MOS scores reveal that RBC has a quality rating of ‘Good’
Cochlear Implants
Definition:
A device that electrically stimulates the
auditory nerve of patients with severe-to-profound
hearing loss to provide them with
sound and speech information
The first fully functional Brain Machine Interface (BMI)
Who is a candidate?
• Severe-to-profound sensorineural hearing loss
• Hearing loss did not reach severe-to-profound level until after acquiring
oral speech and language skills
• Limited benefit from hearing aids
CI statistics
• Worldwide:
– Over 100,000 multi-channel implants
• At Univ of Florida:
– Implanted first patient in 1985
– Currently follow over 400 cochlear patients
Technical and Safety Issues
• Magnetic Resonance Imaging
• Surgical issues
How does the cochlea encode frequencies?
Example: New Freedom
CI characteristics
1. Electrode design
– Number of electrodes, electrode configuration
2. Type of stimulation
– Analog or pulsatile
3. Transmission link
– Transcutaneous or percutaneous
4. Signal processing
– Waveform representation or feature extraction
Signal processing
• Compressed Analog (CA)
• Continuous Interleaved Sampling (CIS)
• Multiple Peak (MPEAK)
• Spectral Maxima Sound Processor (SMSP)
• Spectral Peak (SPEAK)
Compressed Analog (CA) approach
CA activation signals
Continuous Interleaved Sampling (CIS)
CIS activation signals
Multiple Peak (MPEAK)
MPEAK activated electrodes
Spectral Maxima Sound Processor (SMSP)
SMSP activated electrodes
Spectral Peak (SPEAK)
SPEAK activated electrodes
Outcomes for Post-lingual Adults
• Wide range of success
• Most score 90-100% on AV sentence materials
• Majority score > 80% on high context
• Performance more varied on single word tests
Auditory Brainstem Implant
• Approved October 20, 2000
• Uses the Nucleus 24 system processors
• Plate array with 21 electrodes
Review-1
Pinna:
ITDs, IIDs: Horizontal localization
Reflections: Vertical localization
Ear canal:
¼ wave resonance 1-3 kHz
Middle ear:
Amplification by lever action and by area
Stapedius reflex
Cochlea:
IHCs/OHCs: convert mechanical to electrical
Place theory: frequency analysis
Missing fundamental
Review-2
Adaptation:
AN firing sensitive to changes
Otoacoustic emissions:
Produced by movement of OHCs
Beats:
Monaural & binaural
Measurement of hearing:
Audiogram: threshold of hearing
Threshold variation with age
Equal loudness curves
Bass loss problem: discrimination against LFs
Review-3
Critical bands:
used for efficient encoding
Bark scale
Masking:
Frequency: LFs mask more
Temporal: simultaneous, pre and post
Hearing impairment:
Hearing aids: external to cochlea
Cochlear implants: inside cochlea