AcousticProcessing1

Signal Processing Front End

[Diagram: the sampled signal $s_k$ enters the Signal Processing Front End, which outputs the observation sequence O = o(1) o(2) ... o(T). Front-end options shown: Spectrum, Filterbank, MFCC, Linear Prediction Analysis.]

Signal

A signal is a quantity that changes with time.
– Continuous signal
– Discrete-time signal
– Digital signal
– Digital data: 00010101

Discrete-Time Signal from a Continuous Signal

The speech signal is a continuous signal. To process it in a computer, we need to digitize it.

[Diagram: digitization stages 1 to 5: continuous signal $s(t)$, sampling to $s_k$, zero-order hold, quantizer, digital signal $s_d$, store.]

Digitization

Nyquist: $f_s \ge 2 f_w$ and $T_s = 1/f_s$, so $T_s \le 1/(2 f_w)$, where $f_w$ is the highest frequency in the signal.

SNR due to quantization with n bits: $\mathrm{SNR}_{dB} = 6.02\,n + 1.76$

Spectrum

The spectrum is the representation of a time signal in the frequency domain. Tools:
– Fourier Transform (continuous signals)
– Fourier Series (periodic continuous signals)
– Discrete-Time Fourier Transform (discrete-time signals, periodic and non-periodic)
– Discrete Fourier Transform (discrete-time signals)
– Fast Fourier Transform (discrete-time signals of length $2^x$)

Spectral Analysis

Discrete-Time Fourier Transform (DTFT)
– Short-Time Fourier Transform (STFT)
– Windowing effects
Discrete Fourier Transform (DFT)
– Short-Time Fourier Transform (STFT)

Discrete-Time Fourier Transform

DTFT analysis definition (forward transform):

$$S(e^{j\omega T}) = \sum_{n=-\infty}^{\infty} s_n e^{-j\omega nT}$$

If $s_n$ is a periodic waveform with period N:

$$S(e^{j\omega T}) = \sum_{n=0}^{N-1} s_n e^{-j\omega nT} \qquad (1)$$

DTFT synthesis definition (inverse transform):

$$s_n = \frac{T}{2\pi} \int_0^{2\pi/T} S(e^{j\omega T}) e^{j\omega nT} \, d\omega$$

Short-Time Fourier Transform

Since the speech signal changes with time, we perform a short-time analysis. The Short-Time Fourier Transform (STFT) is defined as:

$$S'(e^{j\omega T}) = \sum_{n=0}^{N-1} w_n s_n e^{-j\omega nT}$$

That is, the speech signal is multiplied in time by a window $w_n$.
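The windowed short-time transform above can be evaluated directly from its definition. The following is a minimal sketch in plain Python, assuming a Hamming window, an 8 kHz sampling rate, a 200-sample frame, and a 1 kHz test tone; none of these values come from the slides, and the function names `hamming` and `stft_frame` are hypothetical.

```python
import cmath
import math

def hamming(N):
    """Symmetric Hamming window of length N (an assumed window choice)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def stft_frame(s, w, omega, T):
    """Evaluate S'(e^{j omega T}) = sum_n w_n s_n e^{-j omega n T} for one frame."""
    return sum(wn * sn * cmath.exp(-1j * omega * n * T)
               for n, (wn, sn) in enumerate(zip(w, s)))

fs = 8000.0          # assumed sampling rate in Hz
T = 1.0 / fs         # sampling period T_s = 1 / f_s
N = 200              # assumed frame length (25 ms at 8 kHz)
f0 = 1000.0          # assumed test-tone frequency in Hz

frame = [math.cos(2 * math.pi * f0 * n * T) for n in range(N)]
w = hamming(N)

# Magnitude of the short-time spectrum at the tone and away from it.
mag_at_tone = abs(stft_frame(frame, w, 2 * math.pi * f0, T))
mag_off_tone = abs(stft_frame(frame, w, 2 * math.pi * 3000.0, T))
print(mag_at_tone, mag_off_tone)
```

At the tone frequency the magnitude is close to half the sum of the window samples, because a real cosine splits its energy between a positive-frequency and a negative-frequency component; away from the tone only the leakage through the window remains.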
Windowing Effects

$S'(e^{j\omega T})$ is only an approximation to $S(e^{j\omega T})$. Substituting the inverse DTFT expression for $s_n$:

$$S'(e^{j\omega T}) = \sum_{n=0}^{N-1} w_n \left[ \frac{T}{2\pi} \int_0^{2\pi/T} S(e^{j\theta T}) e^{j\theta nT} \, d\theta \right] e^{-j\omega nT}$$

$$S'(e^{j\omega T}) = \frac{T}{2\pi} \int_0^{2\pi/T} S(e^{j\theta T}) \, W(e^{j(\omega-\theta)T}) \, d\theta$$

Effect: the window smooths the spectrum.

[Figure: window spectrum showing the main lobe and side lobes; 101-point FIR filters.]

[Figure: narrow-bandwidth analysis and wide-bandwidth analysis.]

Discrete Fourier Transform (DFT)

The spectrum is sampled as follows:

$$\omega_p = \frac{2\pi p}{TN}, \quad p = 0, 1, 2, \ldots, N-1$$

Substituting this into (1), the DFT is defined as:

$$S_p = S(e^{j\omega_p T}) = \sum_{n=0}^{N-1} s_n e^{-j\omega_p nT} = \sum_{n=0}^{N-1} s_n e^{-j 2\pi p n / N}, \quad 0 \le p \le N-1$$

The inverse (synthesis) DFT is:

$$s_n = \frac{1}{N} \sum_{p=0}^{N-1} S_p e^{j\omega_p nT}, \quad 0 \le n \le N-1$$

Time resolution depends on N: N samples in time correspond to N samples in frequency.

STDFT

[Diagram: each successive frame of the signal passes through a DFT to give the observations o(1), o(2), o(3), o(4).]

Time-Frequency Representation: Sonogram

Can we use the whole sampled spectrum as our feature observations for the Automatic Speech Recognition (ASR) task?

Reducing the Information

– Filterbank energies as parameters
– Formant frequencies as parameters
– Cepstral coefficients

Filterbank

– Uniform filterbank
– Non-uniform filterbank
– Filterbank front end

[Figures: uniform filterbank; non-uniform filterbank.]

Filter-Bank Front End

[Diagram: each successive frame passes through the filterbank (FB) to give the observations o(1), o(2), o(3), o(4).]

Can we find better parameters?

Introduction

$$S(\omega) = E(\omega) H(\omega)$$

Since excitation information is not needed for ASR (in English), it is desirable to separate the excitation information from the vocal tract information.

[Figure: the vocal tract spectrum $H(\omega)$ and the excitation spectrum $E(\omega)$.]

If we think of the speech spectrum as a signal, we can observe that it is the product of a slowly varying signal, $H(\omega)$, and a rapidly varying signal, $E(\omega)$:

$$S(\omega) = E(\omega) H(\omega)$$

We could filter the spectrum with a linear filter. However, the excitation and vocal tract spectra are multiplied, and linear filters are only useful for separating signals that are added. On the other hand, linear filtering is convenient because it is simple to implement.
Therefore, we turn this non-linear (multiplicative) relationship into a linear (additive) one with a log transformation:

$$\log S(\omega) = \log E(\omega) + \log H(\omega)$$

In the log domain we can filter out the excitation signal and keep the vocal tract information. Transforming the log spectrum gives the cepstrum; if we take into account only the magnitude of the spectrum in the log transform, we obtain the real cepstrum. Since the vocal tract information was in the slowly varying part of the spectrum, we keep the low-order cepstral coefficients.

Cepstral Coefficients

[Diagram: DFT -> |.|^2 -> Log() -> IDFT -> cepstral coefficients (CC). Each successive frame gives the CC observations o(1), o(2), o(3), o(4).]

MFCC

If:
– a cosine transform is used instead of the IDFT to obtain the cepstrum, and
– the energies at the output of a non-uniform filterbank are fed to the cosine transform instead of the energy at each frequency,
we obtain the MFCC (mel-frequency cepstral coefficients).

[Diagram: DFT -> Non-Uniform Filterbank -> |.|^2 -> Log() -> Cosine Transform -> MFCC. Each successive frame gives the MFCC observations o(1), o(2), o(3), o(4).]

Sphinx:
– $base_dir/preprocessing/adc2mfcc.8k.csh
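The MFCC chain described above (DFT, power, non-uniform filterbank, log, cosine transform) can be sketched for a single frame as follows. This is a minimal illustration in plain Python, not the Sphinx implementation. The mel-scale formula, the triangular filter shape, the Hamming window, applying the filterbank to the power spectrum, and all names and parameter values (`mfcc_frame`, 20 filters, 13 coefficients, 8 kHz sampling rate, the two-tone test frame) are assumptions chosen for the example.

```python
import cmath
import math

def hz_to_mel(f):
    """A common mel-scale approximation (assumed here)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """|DFT|^2 for bins 0..N/2, using a naive O(N^2) DFT (fine for a demo)."""
    N = len(frame)
    return [abs(sum(s * cmath.exp(-2j * math.pi * p * n / N)
                    for n, s in enumerate(frame))) ** 2
            for p in range(N // 2 + 1)]

def mel_filterbank(num_filters, N, fs):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    mel_max = hz_to_mel(fs / 2.0)
    # num_filters + 2 edge frequencies, expressed as fractional DFT bin positions
    edges = [mel_to_hz(mel_max * i / (num_filters + 1)) * N / fs
             for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for p in range(N // 2 + 1):
            if lo < p <= mid:
                row.append((p - lo) / (mid - lo))    # rising slope
            elif mid < p < hi:
                row.append((hi - p) / (hi - mid))    # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct2(x, num_coeffs):
    """Unnormalized type-II cosine transform, first num_coeffs outputs."""
    M = len(x)
    return [sum(x[m] * math.cos(math.pi * k * (m + 0.5) / M) for m in range(M))
            for k in range(num_coeffs)]

def mfcc_frame(frame, fs, num_filters=20, num_coeffs=13):
    N = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]
    power = power_spectrum([w * s for w, s in zip(window, frame)])
    bank = mel_filterbank(num_filters, N, fs)
    energies = [sum(h * p for h, p in zip(row, power)) for row in bank]
    log_energies = [math.log(e + 1e-10) for e in energies]  # small floor avoids log(0)
    return dct2(log_energies, num_coeffs)

fs = 8000.0   # assumed sampling rate in Hz
N = 200       # assumed frame length (25 ms at 8 kHz)
frame = [math.sin(2 * math.pi * 500.0 * n / fs)
         + 0.5 * math.sin(2 * math.pi * 1500.0 * n / fs) for n in range(N)]
coeffs = mfcc_frame(frame, fs)
print(len(coeffs))
```

With this unnormalized cosine transform, scaling the input frame by a constant adds the same offset to every log filterbank energy, so it shifts only coefficient 0 and leaves the higher coefficients essentially unchanged. This is one reason the log followed by a cosine transform is convenient for separating overall level from spectral shape.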
