EEL 6586 Automatic Speech Processing
Meena Ramani
04/12/06

Topics to be covered
Lecture 1: The incredible sense of hearing 1
• Anatomy
• Perception of sound
Lecture 2: The incredible sense of hearing 2
• Psychoacoustics
• Hearing aids and cochlear implants

The incredible sense of hearing 2
“Behind these unprepossessing flaps lie structures of such delicacy that they shame the most skillful craftsman”
- Stevens, S.S. [Professor of Psychophysics, Harvard University]

How do we hear?

Threshold of Hearing

Equal loudness curves

The Bass Loss Problem
Rock music
• Too low → no bass
• Too high → too much bass

Threshold variation with age

Thresholds of hearing for normal & HI listeners
[Figure: threshold of hearing (dB SPL) vs. frequency (Hz, log scale) for normal-hearing and hearing-impaired listeners]

The Audiogram
[Figure: audiogram, hearing level (HL, dB) vs. frequency (Hz), left ear and right ear]

The Audiogram (contd.)
• Pure tone audiogram: [250 500 1K 2K 4K 6K] Hz
• < 20 dB HL is normal hearing

Loudness Growth Curve
[Figure: LGOB loudness growth curve at 250 Hz, LGOB loudness rating vs. input level (dB SPL) for normal-hearing and hearing-impaired listeners]

Otoacoustic emissions
• The ear produces some sounds!
  – OHC: outer hair cell
• Used to test hearing in infants and to check whether a patient is feigning a loss

Monaural beats
• If two tones with a small frequency difference are presented monaurally, a beating pattern can be heard
• Examples: 500 & 502 Hz; 500 & 520 Hz
• Interaction of the two tones in the same auditory filter
[Figure: waveform, 150 Hz + 170 Hz]

Binaural beats
Beating can also be heard when the tones are presented to different ears!
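To make the two presentations concrete, here is a minimal Python sketch (not part of the original slides) that builds a monaural mix, with both tones in one channel, and a two-channel signal with one tone per ear. The function names, the 16 kHz sampling rate and the use of NumPy are assumptions made for illustration. For the monaural mix, the identity sin A + sin B = 2 sin((A+B)/2) cos((A−B)/2) shows that the envelope fluctuates at the difference frequency |f1 − f2|.

```python
import numpy as np

def two_tone_signals(f1_hz, f2_hz, duration_s=1.0, fs_hz=16000):
    """Return (t, monaural, dichotic) for two equal-amplitude tones.

    monaural: both tones summed in a single channel (physical beating).
    dichotic: column 0 = f1 (left ear), column 1 = f2 (right ear); the
              tones never mix acoustically, so any beat is neural.
    """
    t = np.arange(int(duration_s * fs_hz)) / fs_hz
    tone1 = np.sin(2 * np.pi * f1_hz * t)
    tone2 = np.sin(2 * np.pi * f2_hz * t)
    monaural = 0.5 * (tone1 + tone2)            # scaled to stay within [-1, 1]
    dichotic = np.column_stack([tone1, tone2])  # shape (n_samples, 2)
    return t, monaural, dichotic

def beat_rate_hz(f1_hz, f2_hz):
    """Envelope (beat) rate of the monaural mix: the difference frequency."""
    return abs(f1_hz - f2_hz)

if __name__ == "__main__":
    for f1, f2 in [(500, 502), (500, 520)]:
        t, mono, stereo = two_tone_signals(f1, f2)
        print(f"{f1} & {f2} Hz: beat rate {beat_rate_hz(f1, f2)} Hz, "
              f"stereo shape {stereo.shape}")
```

Writing either array to a wav file and listening over headphones is one way to compare the two cases; the sketch itself only constructs the signals.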
• Example: 500 Hz to the left ear, 520 Hz to the right ear (binaural)
• Beating arises from neural interaction
• Only perceived if the tones are sufficiently close in frequency

The case of the missing fundamental
• Telephone BW: 300-3400 Hz
• How do we know the pitch?

Primary auditory cortex
• Pitch-sensitive neurons [Bendor and Wang, Nature 2005]
• Neuron responds to fundamental and harmonics
• What are the inputs to these neurons?

How do spikes represent periodic, temporal and spectral information?
• Auditory-periphery model (Zhang et al., ~2001)
• Matlab code available
• Feed it a wav file
• Outputs a PSTH (post-stimulus time histogram)

Critical bands
Equally loud, close in frequency
• Same IHCs
• Slightly louder
Equally loud, separated in frequency
• Different IHCs
• Twice as loud
Psychoacoustic experiments

Critical Band (contd.)
Center freq (Hz)    Critical BW (Hz)
100                 90
200                 90
500                 110
1000                150
2000                280
5000                700
10000               1200
• Proposed by Fletcher
• How to measure?
  – S/N ratio vs. noise BW
• CB ≈ 1.5 mm spacing on the basilar membrane (BM)
• 24 such band-pass filters
• BW of the filters increases with fc
• Logarithmic relationship
  – Weber’s law example
• Bark scale

Critical bands for HI
[Figure: 4 kHz tuning curve for normal & HI listeners, desired tone threshold (dB SPL) vs. desired tone frequency (Hz), with the masker marked]

MASKING
“You know I can't hear you when the water is running!”

Frequency Masking
• Masking occurs when two frequencies lie within a critical band and the higher-amplitude one masks the lower-amplitude one
• Maskers can be broadband noise, narrowband noise, pure tones or complex tones
• Low-frequency broadband sounds mask the most
  – E.g. truck on the road, water flowing
• Masking threshold
  – Level (dB) at which a test tone is just audible in the presence of noise

Temporal Aspects of Masking
• Simultaneous masking
• Pre-stimulus/backward/premasking
  – Test tone first, then masker
• Post-stimulus/forward/postmasking
  – Masker first, then test tone

Temporal Aspects of Masking (contd.)
• Simultaneous masking
  – For durations > 200 ms the test tone threshold is constant
  – Assume the hearing system integrates over a period of 200 ms
• Postmasking
  – Effect of the masker decays over 100 ms
  – More dominant
• Premasking
  – Takes place 20 ms before the masker is on!
  – A sensation is not instantaneous; it requires build-up time
    • Quick build-up for loud maskers
    • Slower build-up for softer maskers
  – Less dominant effect

Temporal masking for HI
[Figure: temporal resolution at 4 kHz for normal & HI listeners, desired tone threshold (dB SPL) vs. desired-masker tone separation (ms)]

EEL 6586 Automatic Speech Processing
Meena Ramani
04/14/06

What do the hearing impaired hear?
[Figure: two spectrograms, frequency (Hz) vs. time (s). Left: cell phone speech for normal hearing. Right: cell phone speech for SNHL (sensorineural hearing loss)]
Mild to severe loss: [10 20 30 60 80 90] dB HL

Facts on Hearing Loss in Adults
• One in every ten Americans (28 million) has hearing loss.
• The vast majority of Americans with hearing loss (95%, or 26 million) can have it treated with hearing aids.
• Only 6 million use hearing aids (HAs).
• Millions of Americans with hearing loss could benefit from hearing aids but avoid them because of the stigma.

Types of Hearing Aids
• Behind the ear
• In the canal
• In the ear
• Completely in the canal

Anatomy of a Hearing Aid
• Microphone
• Tone hook
• Volume control
• On/off switch
• Battery compartment

Hearing Aid Fitting
• Ear mold
• Measurements

Acclimatization effect
• Auditory cortex: brain plasticity
• The HI need time to reuse the HF information: acclimatization effect
• How does this affect HA fitting?
  – Multiple fitting sessions
  – Initial fitting should be the optimum one

“So doc, what is the fitting methodology employed by the hearing aid company to compensate for my hearing loss?”
- Not-so-average Joe (PhD EE/Speech person)

So, do you want your HA to:
1) Always be comfortably loud
2) Equalize loudness across frequencies
3) Normalize loudness
…?

Which fitting methodology is the best?

Existing HL compensation algorithms
Hearing aid fitting algorithms
• Threshold-only: NAL-R, POGO, HG
• Suprathreshold: Fig 6, IHAFF, DSL
Rationale
• Ad hoc: Half Gain (HG), POGO
• Make speech comfortable: NAL-R
• Loudness normalization: IHAFF, Fig 6
• Loudness equalization: DSL

Spectrograms and sound files
• Normal hearing
• Hearing impaired
• HI with linear gain
• HI with DSL gain
• HI with RBC gain
Sensorineural hearing loss: [10 20 30 60 80 90] dB HL
Speech level = 65 dBA

Section Two

Performance metrics
Speech Intelligibility (SI): the degree to which speech can be understood
• Objective measures: AI, STI
• Subjective measures: HINT
Speech Quality: “Does the speech match your expectations?”
• Objective measures: PESQ
• Subjective measures: MOS

Performance metrics (contd.)
• Objective speech quality measure
  – Perceptual Evaluation of Speech Quality (PESQ)
• Subjective speech quality measure
  – Mean Opinion Score (MOS)
• Subjective speech intelligibility measure
  – Hearing In Noise Test (HINT)
[Diagram labels: reference signal, comparison signal, score]

Hearing In Noise Test (HINT)

Subjective listening experiments
[Figure: left ear audiograms of the HI subjects, threshold of hearing (dB HL) vs. frequency (Hz)]
• Location: Shands speech & hearing clinic (soundproof booth)
• Subjects: 15 HI people (PTA: 40-70 dB HL) and 15 normal-hearing people
• Tools used: Matlab HINT and MOS GUIs

Subjective HINT and MOS scores for RBC: hearing impaired, cell phone speech
[Figure: two bar charts over the algorithms None, HPF, RBC, NALR, POGO, HG, NALRP, DSL. Left: average HINT scores of 15 HI subjects, SNR relative to baseline (dB). Right: average MOSs of 15 HI subjects on the scale 5-Excellent, 4-Good, 3-Fair, 2-Poor, 1-Bad]
• RBC has a 7 dB improvement in SI compared to DSL
• MOS scores show that RBC has a quality rating of ‘Good’

Subjective HINT and MOS scores for RBC: normal hearing, cell phone speech
• RBC has a 12 dB improvement in SI compared to DSL
• MOS scores show that RBC has a quality rating of ‘Good’

Cochlear Implants
Definition: A device that electrically stimulates the auditory nerve of patients with severe-to-profound hearing loss to provide them with sound and speech information
The first fully functional Brain Machine Interface (BMI)

Who is a candidate?
• Severe-to-profound sensorineural hearing loss
• Hearing loss did not reach severe-to-profound level until after acquiring oral speech and language skills
• Limited benefit from hearing aids

CI statistics
• Worldwide:
  – Over 100,000 multi-channel implants
• At Univ of Florida:
  – Implanted first patient in 1985
  – Currently follow over 400 cochlear implant patients

Technical and Safety Issues
• Magnetic Resonance Imaging
• Surgical issues

How does the cochlea encode frequencies?

Example: New Freedom

CI characteristics
1. Electrode design
   – Number of electrodes, electrode configuration
2. Type of stimulation
   – Analog or pulsatile
3. Transmission link
   – Transcutaneous or percutaneous
4. Signal processing
   – Waveform representation or feature extraction

Signal processing
• Compressed Analog (CA)
• Continuous Interleaved Sampling (CIS)
• Multiple Peak (MPEAK)
• Spectral Maxima Sound Processor (SMSP)
• Spectral Peak (SPEAK)

Compressed Analog (CA) approach

CA activation signals

Continuous Interleaved Sampling (CIS)

CIS activation signals

Multiple Peak (MPEAK)

MPEAK activated electrodes

Spectral Maxima Sound Processor (SMSP)

SMSP activated electrodes

Spectral Peak (SPEAK)

SPEAK activated electrodes

Outcomes for Post-lingual Adults
• Wide range of success
• Most score 90-100% on AV sentence materials
• Majority score > 80% on high-context materials
• Performance more varied on single-word tests

Auditory Brainstem Implant
• Approved October 20, 2000
• Uses the Nucleus 24 system processors
• Plate array with 21 electrodes

Review-1
• Pinna:
  – ITDs, IIDs: horizontal localization
  – Reflections: vertical localization
• Ear canal: ¼ wave resonance, 1-3 kHz
• Middle ear:
  – Amplification by lever action and by area
  – Stapedius reflex
• Cochlea:
  – IHCs/OHCs: convert mechanical to electrical
  – Place theory: frequency analysis
  – Missing fundamental

Review-2
• Adaptation: AN firing sensitive to changes
• Otoacoustic emissions: produced by movement of OHCs
• Beats: monaural & binaural
• Measurement of hearing:
  – Audiogram: threshold of hearing
  – Threshold variation with age
  – Equal loudness curves
  – Bass loss problem: discrimination against LFs

Review-3
• Critical bands: used for efficient encoding
  – Bark scale
• Masking:
  – Frequency: LFs mask more
  – Temporal: simultaneous, pre and post
• Hearing impairment:
  – Hearing aids: external to cochlea
  – Cochlear implants: inside cochlea
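
As a numerical companion to the critical-band and Bark-scale items reviewed above, the following Python sketch evaluates a commonly used analytic approximation (the Zwicker and Terhardt fit) for the critical-band rate in Bark and for the critical bandwidth. The slides give only tabulated values and no formula, so the choice of this particular fit, and the function names, are assumptions made for illustration.

```python
import math

def hz_to_bark(f_hz):
    """Approximate critical-band rate (Bark) for a frequency in Hz.

    Assumption: Zwicker & Terhardt analytic fit; not taken from the slides.
    """
    return (13.0 * math.atan(0.00076 * f_hz)
            + 3.5 * math.atan((f_hz / 7500.0) ** 2))

def critical_bandwidth_hz(fc_hz):
    """Approximate critical bandwidth (Hz) at center frequency fc_hz.

    Assumption: same family of analytic fits; roughly constant near 100 Hz
    at low frequencies and growing with fc above about 500 Hz.
    """
    return 25.0 + 75.0 * (1.0 + 1.4 * (fc_hz / 1000.0) ** 2) ** 0.69

if __name__ == "__main__":
    # Center frequencies taken from the critical-band table in the lecture.
    for fc in (100, 200, 500, 1000, 2000, 5000, 10000):
        print(f"{fc:>6} Hz  {hz_to_bark(fc):5.2f} Bark  "
              f"CB ~ {critical_bandwidth_hz(fc):7.1f} Hz")
```

The fit will not reproduce the tabulated bandwidths exactly, but it shows the same trend: bandwidth increases with center frequency, and the audible range spans roughly 24 Bark.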