presentation_nitya_March2013

advertisement
Real-time Implementation of
Multi-band Frequency Compression
for Listeners with Moderate
Sensorineural Impairment
Prem C. Pandey, Pandurangarao. N. Kulkarni, Nitya Tiwari
{pcpandey, nitya} @ ee.iitb.ac.in, pnk_bewoor @ yahoo.com
EE Dept., IIT Bombay
March 2013
Outline
1. Introduction
2. Signal processing
3. Real-time implementation
4. Results
5. Conclusion
nitya@ee.iitb.ac.in
1
2
3
4
5
2/24
1. Introduction
Speech production mechanism
Excitation source & filter model
• Excitation: voiced/unvoiced glottal, frication
• Filtering: vocal tract filter
nitya@ee.iitb.ac.in
1
2
3
4
5
3/24
Speech segments
• Words • Syllables • Phonemes • Sub-phonemic segments
Phonemes: basic speech units
• Vowels: Pure vowels, Diphthongs
• Consonants: Semivowels, Stops, Fricatives, Affricates, Nasals
/aba/
/ada/
/apa/
/aga/
nitya@ee.iitb.ac.in
1
2
3
4
5
4/24
Phonemic features
• Modes of excitation
• Glottal
Unvoiced (aspiration, constriction at the glottis)
Voiced (vibration of vocal chords)
• Frication
Unvoiced (constriction in vocal tract)
Voiced (constriction in vocal tract & glottal vibration)
• Movement of articulators
• Continuant (steady-state vocal tract configuration): vowels, nasal stops, fricatives
• Non-continuant (changing vocal tract): diphthongs, semivowels, oral stops (plosives)
• Place of articulation (place of maximum constriction in vocal tract)
Bilabial, Labio-dental, Linguo-dental, Alveolar, Palatal, Velar, Gluttoral
• Changes in voicing frequency (Fo)
Supra-segmental features
• Intonation
nitya@ee.iitb.ac.in
• Rhythm
1
2
3
4
5
5/24
Hearing mechanism
Peripheral auditory system
• External ear (sound collection)
• Pinna
• Auditory canal
• Middle ear (impedance matching)
• Ear drum
• Middle ear bones
• Inner ear (analysis and transduction): cochlea
• Auditory nerve (transmission of neural impulses)
Central auditory system (information interpretation)
nitya@ee.iitb.ac.in
1
2
3
4
5
6/24
Auditory system
nitya@ee.iitb.ac.in
1
2
3
4
Tonotopic map of cochlea
5
7/24
Hearing impairment
Types of hearing losses
• Conductive loss
• Central loss
• Sensorineural loss
• Functional loss
Sensorineural hearing loss
• Elevated hearing thresholds
Reduced intelligibility as speech components are inaudible
• Reduced dynamic range & loudness recruitment (abnormal loudness growth)
Distortion of loudness relationship among speech components
• Increased temporal masking
Poor detection of acoustic events
• Increased spectral masking (due to widening of auditory filters)
• Reduced frequency selectivity
• Reduced ability to sense spectral shapes of speech sounds
>> Poor intelligibility and degraded perception of speech
nitya@ee.iitb.ac.in
1
2
3
4
5
8/24
Signal processing in hearing aids
Currently available
• Frequency selective amplification
Improves audibility but may not improve intelligibility in presence of noise
• Automatic volume control
• Multichannel dynamic range compression (settable attack time, release time,
and compression ratios)
Compresses the natural dynamic range into the reduced dynamic range
Under Investigation
• Improvement of consonant-to-vowel ratio (CVR): for reducing the effects of
increased temporal masking
• Techniques for reducing the effects of increased spectral masking: Binaural
dichotic presentation, Spectral contrast enhancement, Multi-band frequency
compression
• Noise suppression
nitya@ee.iitb.ac.in
1
2
3
4
5
9/24
Signal processing for reducing the effects of increased intra-speech
spectral masking
• Binaural hearing (moderate bilateral loss)
– Binaural dichotic presentation (Lunner et al. 1993, Kulkarni et al. 2012)
• Monaural hearing
– Spectral contrast enhancement (Yang et al. 2003)
Errors in identifying peaks, increases dynamic range >> degraded speech perception
– Multi-band frequency compression (Arai et al. 2004, Kulkarni et al. 2012)
Multi-band frequency compression
Presentation of speech energy in relatively narrow bands to avoid masking by
adjacent spectral components, and without causing perceptible distortions.
nitya@ee.iitb.ac.in
1
2
3
4
5
10/24
Research objective
Implementation of multi-band frequency compression for real-time
operation for use in hearing aids
• Reduction in computational requirement for implementing on a DSP
chip without speech quality degradation
• Low algorithmic and computational delay for real-time
implementation on a DSP board with16-bit fixed-point processor
nitya@ee.iitb.ac.in
1
2
3
4
5
11/24
2. Signal Processing
Signal processing for multi-band frequency compression
Processing steps
• Segmentation and spectral analysis
• Resynthesis
●
Spectral modification
Multi-band frequency compression (Arai et al. 2004)
• Processing using auditory critical bandwidth
• Compressed magnitude & original phase spectrum >> Complex spectrum
Multi-band frequency compression (Kulkarni et al. 2009, 2012)
• Investigations using different bandwidth, segmentation & frequency mapping
• Processing using complex spectrum
• Reduced computation and processing artifacts
nitya@ee.iitb.ac.in
1
2
3
4
5
12/24
Multi-band frequency compression (Kulkarni et al. 2009, 2012)
Signal processing
• Segmentation using analysis window with 50 % overlap
– fixed-frame (FF) segmentation : window length L = 20 ms
– pitch-synchronous (PS) segmentation: L = 2 pitch periods / prev. L (voiced / unvoiced)
• Zero padding >> N-point frame
• Complex spectra by N-point DFT
• Analysis bands for compression
– fixed bandwidth
– one-third octave
– auditory critical band (ACB)
• Mappings for spectral modification
– sample-to-sample
– spectral sample superposition
– spectral segment
• Resynthesis by overlap-add method
Frequency mapping with compression factor α = 0.6
as shown by Kulkarni et al. 2012
nitya@ee.iitb.ac.in
1
2
3
4
5
13/24
Details of signal processing
Analysis band for compression
• 18 bands based on ACB
Mapping for spectral modification
• Spectral segment mapping
No irregular variations in spectrum
Spectral Segment Mapping as shown by Kulkarni et
al. 2012
a  k ic  [( k ic  ( k   0 . 5 )) /  ]
b  a  1/
n1
Y (k ' )  (m  a ) X (m ) 
 X ( j )  (b  a ) X (n )
/i/
BB noise
j m 1
Resynthesis
• Resynthesis by N-point IDFT
• Reconstruction by overlap-add method
(a) Unprocessed
(b) Processed
Unprocessed and processed (α =0.6) spectra of
vowel /i/ and broad band noise
nitya@ee.iitb.ac.in
1
2
3
4
5
14/24
Evaluation and results
• Modified Rhyme Test (MRT)
Best results with PS segmentation, ACB, spectral segment mapping & compression
factor of 0.6
• Normal hearing subjects with loss simulated by additive noise
– 17 % improvement in recognition scores
– 0.88 s decrease in response time
• Subjects with moderate sensorineural loss
– 16.5 % improvement in recognition scores
– 0.89 s decrease in response time
nitya@ee.iitb.ac.in
1
2
3
4
5
15/24
Real-time implementation requirements
Comparison of processing schemes with different segmentation
• FF processing: Fixed-frame segmentation
• Discontinuities masked by 50 % overlap-add
• Perceptible distortions in form of another superimposed pitch related to shift duration
• PS Processing: Pitch-synchronous segmentation
• Less perceptible distortions
• Pitch estimation (large computational requirement)
• Not suitable for music
Modified segmentation scheme for real-time implementation
• Lower perceptual distortions
• Lower computational requirement
nitya@ee.iitb.ac.in
1
2
3
4
5
16/24
Modified multi-band frequency compression
(LSEE processing)
•
•
•
•
Griffin and Lim’s method of signal estimation from modified STFT
Minimize mean square error between estimated and modified STFT
Multiplication by analysis window before overlap-add
Window requirement

 m   w
2
( mS  n )  1
• For partial overlap only a few windows meet the requirement
• Fixed window length L and shift S, with S = L/4 (75% overlap)
• Modified Hamming window

w (n)  1 /

4p
2
 2q
2 

 p  q cos  2 π  n  0 . 5  / L 
where p = 0.54 and q = -0.46
nitya@ee.iitb.ac.in
1
2
3
4
5
17/24
Investigations using offline implementation
a)
• Comparison of analysis-synthesis techniques for
multi-band frequency compression
• Pitch-synchronous implementation
• LSEE method based implementation
• Evaluation
• Informal listening
• Objective evaluation using PESQ measure (0 – 4.5)
• Test material
Sentence “ Where were you a year ago?” from a male
speaker, vowels /a/-/i/-/u/, music
• Results
PESQ score (compression factor range 1 ― 0.6):
4.3 ― 3.7 (sentence/ vowels)
b)
c)
d)
Comparison of analysis-synthesis methods:
(a) unprocessed, (b) FF, (c) PS, and (d) LSEE.
Processing with spectral segment mapping,
auditory critical bandwidth, α = 0.6.
nitya@ee.iitb.ac.in
1
2
3
4
5
18/24
Processed outputs from off-line implementation
Fixed-frame MFC
Input Signal
α=1
α
= 0.8
Pitch-synchronous
α
= 0.6
α=1
α
= 0.8
α
= 0.6
Fixed-frame LSEE
α=1
α
= 0.8
α
= 0.6
Vowels
/a i u /
“Where
were you a
year ago?”
Music clip
• Conclusion
LSEE processing suited for non-speech and audio applications
nitya@ee.iitb.ac.in
1
2
3
4
5
19/24
3. Real-time Implementation
16-bit fixed point DSP: TI/TMS320C5515
• 16 MB memory space : 320 KB on-chip RAM with 64 KB DARAM, 128 KB on-chip ROM
• Three 32-bit programmable timers, 4 DMA controllers each with 4 channels
• FFT hardware accelerator (8 to 1024-point FFT)
Max. clock speed: 120 MHz
DSP Board: eZdsp
• 4 MB on-board NOR flash for user program
• Codec TLV320AIC3204: stereo ADC & DAC, 16/20/24/32-bit quantization , 8 – 192 kHz
sampling
• Development environment for C: TI's 'CCStudio, ver. 4.0‘
nitya@ee.iitb.ac.in
1
2
3
4
5
20/24
Signal acquisition & processing
Implementation details
•
•
•
•
•
● Sampling rate:10 kHz
ADC & DAC quantization: 16-bit
FFT length N:1024 / 512
Analysis-synthesis : LSEE-based fixed frame, 260-sample (26 ms) window, 75% overlap
Mapping for spectral modification: spectral segment mapping with auditory critical bands
Input samples, spectral values & processed output: 32-bits each with 16-bit real and
imaginary parts
nitya@ee.iitb.ac.in
1
2
3
4
5
21/24
Data transfer and buffering operations (S = L/4)
•
DMA cyclic buffers
• 5 block input buffer
• 2 block output buffer
(each of S samples)
•
Four pointers
• current input block
• current output block
• just-filled input block
• write-to output block
(Pointers incremented cyclically
on DMA interrupt)
Efficient realization of 75 % overlap
& zero padding
•
•
Total delay
• Algorithmic delay: L = 4S samples (26 ms)
• Computational delay: L/4 = S samples (6.5 ms)
nitya@ee.iitb.ac.in
1
2
3
4
5
22/24
4. Results
Comparison of proc. output from offline-implementation & DSP board
• Test material
Sentence “ Where were you a year ago?”, vowels /a/-/i/-/u/, music
• Evaluation methods
● PESQ measure (0 – 4.5)
• Informal listening
• Results
• Lowest clock for satisfactory operation: 20 MHz / 12 MHz (N = 1024/512)
• No perceptible change in output with N = 512, processing capacity used ≈ 1/10
of the capacity with highest clock (120 MHz)
• Informal listening: Processed output from DSP similar to corresponding output from
offline implementation
• PESQ scores (compression factor range 0.6 ― 1)
For sentence 2.5 ― 3.4 & for vowels 3.0 ― 3.4
• Total processing delay: 35 ms
nitya@ee.iitb.ac.in
1
2
3
4
5
23/24
Processed outputs from DSP board
Fixed-frame LSEE MFC
Input Signal
α=1
α = 0.8
α = 0.6
Vowels /a/-/i/-/u/
“Where were you
a year ago?”
Music clip
nitya@ee.iitb.ac.in
1
2
3
4
5
24/24
5. Conclusion
• Multi-band frequency compression using LSEE processing
-- Processed speech output perceptually similar to that from earlier established pitchsynchronous processing.
-- Suitable for speech as well as non-speech audio signals.
-- Well-suited for real-time processing due to use of fixed-frame segmentation.
• Implementation for real-time operation using 16-bit fixed-point processor
TI/TMS320C5515: used-up processing capacity ≈ 1/10, delay = 35 ms
• Further work
Combining multi-band frequency compression with other types of processing for
use in hearing aids
• Frequency-selective gain
• Multi-band dynamic range compression (settable gain and compression ratios)
• Noise reduction
• CVR modification
nitya@ee.iitb.ac.in
1
2
3
4
5
25/24
THANK YOU
nitya@ee.iitb.ac.in
1
2
3
4
5
Abstract
Widening of auditory filters in persons with sensorineural hearing impairment leads
to increased spectral masking and degraded speech perception. Multi-band
frequency compression of the complex spectral samples using pitch-synchronous
processing has been reported to increase speech perception by persons with
moderate sensorineural loss. It is shown that implementation of multi-band
frequency compression using fixed-frame processing along with least-squares error
based signal estimation reduces the processing delay and the speech output is
perceptually similar to that from pitch-synchronous processing. The processing is
implemented on a DSP board based on the 16-bit fixed-point processor
TMS320C5515, and real-time operation is achieved using about one-tenth of its
computing capacity.
nitya@ee.iitb.ac.in
1
2
3
4
5
27/
References
[1]
H. Levitt, J. M. Pickett, and R. A. Houde (eds.), Senosry Aids for the Hearing Impaired. New York: IEEE Press, 1980.
[2]
B. C. J. Moore, An Introduction to the Psychology of Hearing, London, UK: Academic, 1997, pp 66–107.
[3]
J. M. Pickett, The Acoustics of Speech Communication: Fundamentals, Speech Perception Theory, and Technology. Boston,
Mass.: Allyn Bacon, 1999, pp. 289–323.
[4]
H. Dillon, Hearing Aids. New York: Thieme Medical, 2001.
[5]
T. Lunner, S. Arlinger, and J. Hellgren, “8-channel digital filter bank for hearing aid use: preliminary results in
monaural, diotic, and dichotic modes,” Scand. Audiol. Suppl., vol. 38, pp. 75–81, 1993.
[6]
P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, “Binaural dichotic presentation to reduce the effects of spectral
masking in moderate bilateral sensorineural hearing loss”, Int. J. Audiol., vol. 51, no. 4, pp. 334–344, 2012.
[7]
T. Baer, B. C. J. Moore, and S. Gatehouse, “Spectral contrast enhancement of speech in noise for listeners with
sensorineural hearing impairment: effects on intelligibility, quality, and response times”, Int. J. Rehab. Res., vol. 30, no. 1,
pp. 49–72, 1993.
[8]
J. Yang, F. Luo, and A. Nehorai, “Spectral contrast enhancement: Algorithms and comparisons,” Speech Commun., vol.
39, no. 1–2, pp. 33–46, 2003.
[9]
T. Arai, K. Yasu, and N. Hodoshima, “Effective speech processing for various impaired listeners,” in Proc. 18th Int. Cong.
Acoust., 2004, Kyoto, Japan, pp. 1389–1392.
[10]
K. Yasu, M. Hishitani, T. Arai, and Y. Murahara, “Critical-band based frequency compression for digital hearing aids,”
Acoustical Science and Technology, vol. 25, no. 1, pp. 61-63, 2004.
[11]
P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, “Multi-band frequency compression for reducing the effects of
spectral masking,” Int. J. Speech Tech., vol. 10, no. 4, pp. 219–227, 2009.
[12]
P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, “Multi-band frequency compression for improving speech
perception by listeners with moderate sensorineural hearing loss,” Speech Commun., vol. 54, no. 3 pp. 341–350, 2012.
[13]
E. Zwicker, “Subdivision of the audible frequency range into critical bands (Freqenzgruppen),” J. Acoust. Soc. Am., vol.
33, no. 2, pp. 248, 1961.
[14]
D. G. Childers and H. T. Hu, “Speech synthesis by glottal excited linear prediction”, J. Acoust. Soc. Am., vol. 96, no. 4,
pp. 2026–2036, 1994.
nitya@ee.iitb.ac.in
1
2
3
4
5
28/
[15]
D. W. Griffin and J. S. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Trans. Acoustics,
Speech, Signal Proc., vol. 32, no. 2, pp. 236-243, 1984.
[16]
X. Zhu, G. T. Beauregard, and L. L. Wyse, “Real-time signal estimation from modified short-time Fourier transform
magnitude spectra”, IEEE Trans. Audio, Speech, Language Process., vol. 15, no. 5, pp. 1645–1653, 2007.
[17]
Texas Instruments Inc., TMS320C5515 Fixed-Point
http://focus.ti.com/ lit/ds/symlink/tms320c5515.pdf.
[18]
Texas Instruments Inc., TLV320AIC3204 Ultra Low Power Stereo Audio Codec. 2008, [online] Available: http://focus.
ti.com/lit/ds/symlink/tlv320aic3204.pdf.
nitya@ee.iitb.ac.in
1
2
3
4
5
Digital
Signal
Processor.
2011,
[online]
Available:
29/
Download