Progress Report

Noise Reduction in Speech Recognition
Professor: Jian-Jiun Ding
Student: Yung Chang
2011/05/06

Outline
- Mel Frequency Cepstral Coefficients (MFCC)
- Mismatch in speech recognition
- Feature-based approaches: CMS, CMVN, HEQ
- Feature-based approaches: RASTA, data-driven temporal filtering
- Speech enhancement: spectral subtraction, Wiener filtering
- Conclusions and applications

Mel Frequency Cepstral Coefficients (MFCC)
- 39-dimensional feature vector
- The most commonly used feature in speech recognition
- Advantages: high accuracy and low complexity

The framework of feature extraction:
speech signal x(n) → pre-emphasis → x'(n) → window → x_t(n) → DFT → A_t(k) → mel filter bank → Y_t(m) → log(|·|²) → Y'_t(m) → IDFT → MFCC y_t(j)
The frame energy e_t is computed alongside, and derivatives of both are appended, giving the feature vector $[\,y_t(j),\ e_t,\ \Delta y_t(j),\ \Delta e_t,\ \Delta^2 y_t(j),\ \Delta^2 e_t\,]$.

Pre-emphasis
- Emphasizes the higher-frequency part of the spectrum: x[n] → pre-emphasis → x'[n]

End-point Detection (Voice Activity Detection)
[Figure: waveform with the noise (silence) segments and the speech segments marked]

Windowing
[Figure: rectangular window vs. Hamming window]

Mel Filter Bank
- After the DFT we obtain the spectrum (amplitude vs. frequency).
- The mel filter bank applies triangular filters that overlap in frequency: uniformly spaced below 1 kHz and on a logarithmic scale above 1 kHz.

Delta Coefficients
- 1st- and 2nd-order differences of the 13 static coefficients are appended, taking the feature from 13 to 39 dimensions.

Mismatch in Statistical Speech Recognition
[Figure: the original speech x[n] is distorted by convolutional noise h[n] and additive noise n1(t), giving y[n]; the feature vectors O = o1 o2 … oT are passed to a search over acoustic models (trained on a speech corpus), a lexicon, and a language model, which outputs the sentence W = w1 w2 … wR]
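The front-end steps of the MFCC framework above (pre-emphasis, cutting the signal into frames x_t(n), and windowing) can be sketched in NumPy. This is a minimal illustration, not the presenter's implementation. The helper names are hypothetical. The pre-emphasis coefficient 0.97, the 25 ms frame length and the 10 ms hop are assumed typical values that the slides do not specify.

```python
import numpy as np


def pre_emphasis(x: np.ndarray, alpha: float = 0.97) -> np.ndarray:
    """x'[n] = x[n] - alpha * x[n-1]; the first sample is passed through unchanged."""
    return np.append(x[0], x[1:] - alpha * x[:-1])


def frame_signal(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Cut x into overlapping frames x_t(n), one per row; a trailing partial frame is dropped."""
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx]


def window_frames(frames: np.ndarray, kind: str = "hamming") -> np.ndarray:
    """Multiply every frame by a rectangular or Hamming window."""
    n = frames.shape[1]
    if kind == "rectangular":
        w = np.ones(n)
    elif kind == "hamming":
        w = np.hamming(n)  # 0.54 - 0.46 cos(2*pi*k / (n - 1)): tapers the frame edges
    else:
        raise ValueError(f"unknown window: {kind}")
    return frames * w


# Usage with assumed settings: 16 kHz audio, 25 ms frames (400 samples), 10 ms hop (160 samples)
fs = 16000
signal = np.random.default_rng(0).standard_normal(fs)  # 1 s of stand-in audio
frames = window_frames(frame_signal(pre_emphasis(signal), 400, 160))
print(frames.shape)  # (98, 400)
```

The Hamming window is the usual choice over the rectangular one because its tapered edges reduce spectral leakage in the DFT that follows.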
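The mel filter bank, log and cepstral steps of the same MFCC framework can be sketched as below. Several assumptions are worth flagging:

- The mel mapping 2595·log10(1 + f/700) is a commonly used analytic approximation of the scale described on the slide (roughly linear below 1 kHz, logarithmic above).
- 26 filters and a 512-point DFT are illustrative choices.
- The final transform is a DCT-II, common practice for real-valued log filter-bank energies, where the slide's diagram says IDFT.

The function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    """Common analytic mel scale: roughly linear below 1 kHz, logarithmic above."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: float) -> np.ndarray:
    """(n_filters, n_fft//2 + 1) matrix of overlapping triangular filters equally spaced in mel."""
    bin_hz = np.linspace(0.0, sample_rate / 2, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2), n_filters + 2))
    fb = np.zeros((n_filters, bin_hz.size))
    for m in range(n_filters):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        rising = (bin_hz - left) / (center - left)
        falling = (right - bin_hz) / (right - center)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)  # triangle, zero outside
    return fb


def mfcc_from_frames(frames: np.ndarray, fb: np.ndarray, n_ceps: int = 13) -> np.ndarray:
    """Windowed frames -> |A_t(k)|^2 -> mel energies Y_t(m) -> log -> DCT-II -> cepstra y_t(j)."""
    n_fft = 2 * (fb.shape[1] - 1)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    log_mel = np.log(power @ fb.T + 1e-10)  # small constant avoids log(0)
    n_filters = fb.shape[0]
    j = np.arange(n_ceps)[:, None]
    m = np.arange(n_filters)[None, :]
    dct_basis = np.cos(np.pi * j * (m + 0.5) / n_filters)  # unnormalized DCT-II
    return log_mel @ dct_basis.T


fb = mel_filterbank(n_filters=26, n_fft=512, sample_rate=16000)
frames = np.random.default_rng(0).standard_normal((98, 400)) * np.hamming(400)
ceps = mfcc_from_frames(frames, fb)
print(fb.shape, ceps.shape)  # (26, 257) (98, 13)
```

Evaluating each triangle in Hz rather than rounding its edges to DFT bins keeps the narrow low-frequency filters from coming out empty.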
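The delta step, which turns 13 static coefficients into a 39-dimensional vector, can be sketched as follows. The slide only says "1st- and 2nd-order differences". This sketch assumes the common regression form $\Delta c_t = \sum_{n=1}^{N} n\,(c_{t+n} - c_{t-n}) \big/ \left(2\sum_{n=1}^{N} n^2\right)$ with N = 2, with the first and last frames repeated at the edges. The helper names are hypothetical.

```python
import numpy as np


def deltas(feats: np.ndarray, N: int = 2) -> np.ndarray:
    """Regression deltas of a (T, D) feature matrix over +/- N frames, with edge frames repeated."""
    T = feats.shape[0]
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(feats, dtype=float)
    for n in range(1, N + 1):
        out += n * (padded[N + n : N + n + T] - padded[N - n : N - n + T])
    return out / denom


def stack_39(static13: np.ndarray) -> np.ndarray:
    """[static, 1st-order deltas, 2nd-order deltas]: 13 -> 39 dimensions."""
    d1 = deltas(static13)
    d2 = deltas(d1)
    return np.hstack([static13, d1, d2])


static = np.random.default_rng(0).standard_normal((98, 13))  # stand-in for cepstra + energy
print(stack_39(static).shape)  # (98, 39)
```

On a linear ramp the interior deltas equal the slope exactly, which is a quick sanity check for the formula.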
Possible Approaches for Acoustic Environment Mismatch
[Figure: in training, x[n] → feature extraction → model training → acoustic models; in recognition, the input signal y[n] → feature extraction → search and recognition against the acoustic models. Distortion sources marked: additive noise n2(t), acoustic reception, microphone distortion, phone/wireless channel. The mismatch can be attacked at three points: speech enhancement (on the signal), feature-based approaches (on the features), and model-based approaches (on the acoustic models).]

Feature-based Approach: Cepstral Moment Normalization (CMS, CMVN)
Cepstral Mean Subtraction (CMS) targets convolutional noise.
- Convolutional noise in the time domain becomes additive in the cepstral domain. From $y[n] = x[n] * h[n]$ we get $Y(\omega) = X(\omega)H(\omega)$. Taking the log magnitude turns the product into a sum, so $y = x + h$, with $x$, $y$, $h$ in the cepstral domain.
- Most convolutional noise changes only very slightly over a reasonable time interval, so $h$ can be treated as constant and $x = y - h$.
- Assuming $E[x] = 0$, we get $E[y] = h$, hence $x_{CMS} = y - E[y]$.
[Figure: distributions P(x) and P(y) before and after CMS]

CMVN: the variance is normalized as well.
$x_{CMVN} = x_{CMS} / [\mathrm{Var}(x_{CMS})]^{1/2}$
[Figure: P(x) and P(y) after CMS and after CMVN]

Feature-based Approach: HEQ (Histogram Equalization)
- The whole distribution is equalized, not only its mean and variance: $y = \mathrm{CDF}_y^{-1}[\mathrm{CDF}_x(x)]$
- In the figure, the value x = 3, where $\mathrm{CDF}_x(3) = 0.2$, is mapped to y = 3.5, where $\mathrm{CDF}_y$ also equals 0.2.

Feature-based Approach: RASTA (Relative Spectral Temporal Filtering)
[Figure: amplitude vs. frequency for successive frames; the value of each frequency band over time forms a signal]
- Filtering is performed on these signals over time (temporal filtering). Their frequency axis is the modulation frequency.
- Assumption: the rate of change of the noise often lies outside the typical rate of change of the vocal tract shape.
- A specially designed temporal filter emphasizes the modulation frequencies of speech:
$B(z) = z^{4}\,\dfrac{a_0 + a_1 z^{-1} + a_3 z^{-3} + a_4 z^{-4}}{1 - b_1 z^{-1}}$
[Figure: response of B(z) vs. modulation frequency (Hz), emphasizing the speech range]

Data-driven Temporal Filtering
- PCA (Principal Component Analysis)
[Figure: PCA illustration with axes x and y and principal direction e]
- We should not guess our filter, but derive it from data.
[Figure: the original feature stream y_t is convolved with filters B1(z), B2(z), …, Bn(z); a window of L frames along the frame index produces the outputs z_k(1), z_k(2), z_k(3)]

Speech Enhancement: Spectral Subtraction (SS)
- Goal: produce a better signal by removing the noise, for listening or for recognition purposes.
- The noise n[n] changes quickly and unpredictably in the time domain but relatively slowly in the frequency domain, so its spectrum N(ω) can be estimated and subtracted.
[Figure: amplitude vs. time and amplitude vs. frequency for speech and noise]

Conclusions
- We gave a general framework for extracting speech features.
- We introduced the mainstream robustness techniques.
- Numerous other noise reduction methods exist (see the references).

References

Q&A
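The CMS and CMVN formulas in the cepstral moment normalization section translate directly into per-utterance NumPy operations. In this minimal sketch (helper names hypothetical), E[y] and Var(x_CMS) are estimated by averaging over all frames of one utterance. A small eps, an assumed safeguard, avoids division by zero. The usage lines check the key property: a constant cepstral offset h, i.e. a fixed convolutional channel, disappears after CMS.

```python
import numpy as np


def cms(feats: np.ndarray) -> np.ndarray:
    """Cepstral mean subtraction: x_CMS = y - E[y], with E[y] the per-dimension mean over frames."""
    return feats - feats.mean(axis=0, keepdims=True)


def cmvn(feats: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """CMS followed by division by the per-dimension standard deviation."""
    centered = cms(feats)
    return centered / np.sqrt(centered.var(axis=0, keepdims=True) + eps)


rng = np.random.default_rng(0)
clean = rng.standard_normal((300, 13))      # stand-in cepstral features, roughly zero mean
channel = rng.normal(2.0, 1.0, size=13)     # fixed channel offset h in the cepstral domain
noisy = clean + channel                     # y = x + h
print(np.allclose(cms(noisy), cms(clean)))  # True: the constant h is removed
```

Computing the statistics over a whole utterance is the simplest choice. A running or windowed mean would also follow the slide's "reasonable time interval" assumption.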
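Histogram equalization, $y = \mathrm{CDF}_y^{-1}[\mathrm{CDF}_x(x)]$, can be sketched per feature dimension with empirical CDFs. Assumptions:

- CDF_x is estimated from the ranks of the test utterance's own frames.
- The reference CDF_y comes from a set of (hypothetical) training features and is inverted with `numpy.quantile`.

Other choices, such as a standard Gaussian reference, are also common. The function name is illustrative.

```python
import numpy as np


def heq(test: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Per-dimension histogram equalization of (T, D) test features onto a reference distribution."""
    T, D = test.shape
    # Empirical CDF value of each test frame; (rank + 0.5) / T keeps probabilities inside (0, 1).
    ranks = np.argsort(np.argsort(test, axis=0), axis=0)
    probs = (ranks + 0.5) / T
    out = np.empty((T, D), dtype=float)
    for d in range(D):
        out[:, d] = np.quantile(reference[:, d], probs[:, d])  # CDF_y^{-1}
    return out


rng = np.random.default_rng(0)
train_feats = rng.standard_normal((2000, 13))                     # hypothetical reference features
test_feats = np.exp(0.5 * rng.standard_normal((300, 13))) + 1.0   # skewed, shifted test features
equalized = heq(test_feats, train_feats)
```

Because only ranks enter the mapping, any monotonic distortion of a feature dimension is undone, which is more than CMS or CMVN can correct.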
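The RASTA filter B(z) can be run along the time axis of each feature trajectory as below. The coefficient values are an assumption, not read from the slide. They follow the commonly cited RASTA filter $0.1\,z^{4}(2 + z^{-1} - z^{-3} - 2z^{-4})/(1 - 0.98z^{-1})$, i.e. a0 = 0.2, a1 = 0.1, a3 = -0.1, a4 = -0.2, b1 = 0.98.

Because the numerator coefficients sum to zero, the filter has zero gain at 0 Hz modulation frequency, so a constant offset (such as a fixed channel) decays away. The z^4 factor is handled by advancing the output four frames, with the end of the trajectory padded by its last frame. Names are hypothetical.

```python
import numpy as np

# Assumed coefficients (commonly cited RASTA filter), not taken from the slide:
# B(z) = z^4 * 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1)
NUM_TAPS = {0: 0.2, 1: 0.1, 3: -0.1, 4: -0.2}  # a_k multiplies x[t - k]
POLE = 0.98                                     # b_1
LEAD = 4                                        # the z^4 factor: output advanced by 4 frames


def rasta_filter(traj: np.ndarray, taps=NUM_TAPS, b1: float = POLE) -> np.ndarray:
    """Filter each column of a (T, D) trajectory along time; returns an array of the same shape."""
    T, D = traj.shape
    # Pad the end with the last frame so the 4-frame advance still yields T outputs.
    x = np.concatenate([traj, np.repeat(traj[-1:], LEAD, axis=0)]).astype(float)
    y = np.zeros_like(x)
    for t in range(x.shape[0]):
        acc = b1 * y[t - 1] if t > 0 else np.zeros(D)  # recursive (pole) part
        for k, a_k in taps.items():                      # FIR (numerator) part
            if t >= k:
                acc = acc + a_k * x[t - k]
        y[t] = acc
    return y[LEAD:LEAD + T]


rng = np.random.default_rng(0)
log_energies = rng.standard_normal((500, 20))  # stand-in for log band energies over 500 frames
filtered = rasta_filter(log_energies + 3.0)    # a constant offset of 3.0 acts like a fixed channel
```

The pole at 0.98 makes the transient long (tens of frames), so the first frames of an utterance are only partly normalized.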
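The data-driven idea is to derive the temporal filters B1(z), …, Bn(z) from data instead of designing them by hand. One way to sketch it is PCA on sliding L-frame windows of a single feature trajectory:

- The leading eigenvectors of the window covariance serve as FIR filters.
- Projecting each window onto them gives the outputs z_k(1), z_k(2), z_k(3).

This is a minimal illustration, not the exact procedure behind the slide. It assumes a window length L = 11, three filters, a random-walk stand-in trajectory and hypothetical helper names. It also needs NumPy 1.20 or newer for `sliding_window_view`.

```python
import numpy as np


def pca_temporal_filters(traj: np.ndarray, L: int, n_filters: int = 3) -> np.ndarray:
    """Return (n_filters, L) FIR filters: leading eigenvectors of the covariance of L-frame windows."""
    windows = np.lib.stride_tricks.sliding_window_view(traj, L)  # (T - L + 1, L)
    cov = np.cov(windows, rowvar=False)                          # (L, L)
    eigvals, eigvecs = np.linalg.eigh(cov)                       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_filters]
    return eigvecs[:, order].T


def apply_temporal_filters(traj: np.ndarray, filters: np.ndarray) -> np.ndarray:
    """z_k(i) = sum_l filters[i, l] * traj[k + l]: one output stream per filter."""
    windows = np.lib.stride_tricks.sliding_window_view(traj, filters.shape[1])
    return windows @ filters.T  # (T - L + 1, n_filters)


rng = np.random.default_rng(0)
trajectory = 0.1 * np.cumsum(rng.standard_normal(1000))  # stand-in for one cepstral coefficient
filters = pca_temporal_filters(trajectory, L=11)
z = apply_temporal_filters(trajectory, filters)
```

In practice the filters would be estimated on clean training data and then fixed for recognition. Each filter's frequency response shows which modulation frequencies the data says are important.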
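Spectral subtraction relies on the noise spectrum N(ω) varying slowly, so it can be estimated once and subtracted from every frame. The sketch below is a basic magnitude-domain version under these assumptions:

- The first 10 frames are taken to be noise only; in practice an end-point detector would choose them.
- 512-sample frames use a periodic Hann window at 50% overlap, so overlap-add reconstructs the signal.
- Negative results are clamped to a spectral floor of 0.02·|N(ω)|.
- The noisy phase is reused.

The parameter values and function name are hypothetical.

```python
import numpy as np


def spectral_subtraction(noisy, frame_len=512, hop=256, noise_frames=10, floor=0.02):
    """Magnitude spectral subtraction with overlap-add resynthesis (noise estimated from leading frames)."""
    n = np.arange(frame_len)
    win = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len)  # periodic Hann: sums to 1 at 50% overlap
    n_frames = 1 + (len(noisy) - frame_len) // hop
    frames = np.stack([noisy[i * hop : i * hop + frame_len] * win for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    noise_mag = mag[:noise_frames].mean(axis=0)                # estimate of |N(w)|
    clean_mag = np.maximum(mag - noise_mag, floor * noise_mag)  # subtract, then apply spectral floor
    enhanced = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    out = np.zeros(len(noisy))
    for i in range(n_frames):
        out[i * hop : i * hop + frame_len] += enhanced[i]  # overlap-add
    return out


fs = 16000
rng = np.random.default_rng(0)
t = np.arange(2 * fs) / fs
speech_like = np.where(t > 0.5, np.sin(2 * np.pi * 440 * t), 0.0)  # silent first 0.5 s
noisy = speech_like + 0.3 * rng.standard_normal(t.size)
enhanced = spectral_subtraction(noisy)
```

The spectral floor limits, but does not remove, the "musical noise" that appears where the noisy magnitude happens to exceed the average noise estimate.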
