ASR Front End Processing Implemented on Texas Instruments OMAP-L137
Jacob Zurasky – 12/12/11

Project Goals
- Create a front end for embedded ASR
- Extract feature vectors from speech data
- Allow for many different feature-extraction specifications
- Extract features in real time, while leaving enough CPU time for analysis

Hardware Platform
- Texas Instruments OMAP-L137
- Dual core: TMS320C6747 DSP and ARM9
- AIC3106 audio codec
- 64MB SDRAM

Signal Flow Block Diagram
- Audio -> Framing -> Pre-Emphasis -> Window -> FFT -> Mel Filter -> Log -> DCT -> Deltas
- Output per frame: 13 MFCCs, 13 deltas, 13 delta-deltas
- Each processing step is illustrated with a short C sketch in the Code Sketches section at the end

Data Streams
- Streams are a way to transfer blocks of data efficiently
- Streams use enhanced direct memory access (EDMA)
- A block of data can be received with SIO_reclaim(...)
- A block of data can be sent with SIO_issue(...)
- [Diagram: audio codec -> input stream -> DSP -> output stream -> audio codec]

Stream Example
- After SIO_reclaim, pIn points to input data and pOut points to output data
- After SIO_issue, those buffers are reused by the audio codec
- (See the stream-loop sketch in the Code Sketches section)

Pre-Emphasis
- y[n] = x[n] - a*x[n-1]
- First-order high-pass filter
- Compensates for the roll-off of higher frequencies in human speech production

Windowing Function
- Rectangular, Hann, Hamming, Cosine, Gaussian, ...
- [Plot: Hamming window]

FFT
- Magnitude of the frequency spectrum
- Uses Texas Instruments' DSPLIB for the C67x

Mel Filter
- Triangular bandpass filters spaced along the mel frequency scale
- Mimics the logarithmic nature of human hearing

Discrete Cosine Transform (DCT)
- Transforms back from the frequency domain
- Typically the first 12 values are used as the mel-frequency cepstral coefficients
- Implemented with a look-up table for efficiency

Deltas
- 13 MFCCs are produced per frame
- 13 more from the first derivative
- 13 additional from the second derivative
- A 39-dimensional vector represents the current frame

Observations: Pre-Emphasis and Windowing an Input Frame
- [Plots: input frame; pre-emphasized and windowed frame]

Observations: FFT, Log, and Mel Filter
- [Plots: magnitude of frequency spectrum; log mel-filtered spectrum]

Observations: DCT to Produce MFCCs
- [Plots: mel-frequency cepstral coefficients; full feature vector for one frame]

Observations: Timing
- Frame size = 256 samples at a 16 kHz sampling rate, so 1 frame = 16 ms
- Feature extraction time: 1.55 ms (Debug build), 0.25 ms (Release build)
- Real-time feature extraction: 0.25 ms / 16 ms = 1.56% CPU usage

Future Goals
- Complete training code for the DSP:
  - Load training data to SDRAM
  - DSP calculates all feature vectors associated with a given phone
  - DSP calculates a Gaussian mixture model
  - Save the acoustic model off-chip
- Evaluate the acoustic model (digit recognition)
- Complete embedded ASR on a limited vocabulary
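
Code Sketches

The sketches below illustrate the processing steps from the slides; they are minimal examples under stated assumptions, not the project's actual source. Names such as FRAMESIZE, processFrame, hInStream, and hOutStream are placeholders.

The first sketch shows the stream loop described on the Data Streams and Stream Example slides, assuming DSP/BIOS SIO streams to and from the AIC3106 codec have already been created elsewhere with SIO_create.

    #include <std.h>
    #include <sio.h>

    #define FRAMESIZE 256                      /* samples per frame (assumed)       */

    extern SIO_Handle hInStream, hOutStream;   /* created elsewhere with SIO_create */
    extern void processFrame(short *in, short *out, int n);  /* placeholder for the
                                                  feature-extraction routine        */

    void audioTask(void)
    {
        short *pIn, *pOut;
        Int   nbytes;

        for (;;) {
            /* Block until the EDMA has filled an input buffer and freed an
             * output buffer, then take ownership of both. */
            nbytes = SIO_reclaim(hInStream,  (Ptr *)&pIn,  NULL);
            SIO_reclaim(hOutStream, (Ptr *)&pOut, NULL);

            /* Extract features from pIn (and optionally write audio to pOut). */
            processFrame(pIn, pOut, FRAMESIZE);

            /* Hand the buffers back so the audio codec can reuse them. */
            SIO_issue(hInStream,  pIn,  nbytes, 0);
            SIO_issue(hOutStream, pOut, nbytes, 0);
        }
    }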
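
A minimal sketch of the pre-emphasis filter y[n] = x[n] - a*x[n-1]; the coefficient value of about 0.95 and the in-place, per-frame interface are assumptions, not taken from the project code.

    /* First-order high-pass pre-emphasis, y[n] = x[n] - a*x[n-1], applied
     * in place to one frame.  prevSample carries x[-1] across frames so
     * consecutive frames filter seamlessly.  a is typically around 0.95. */
    void preEmphasis(float *x, int n, float a, float *prevSample)
    {
        int   i;
        float prev = *prevSample;

        for (i = 0; i < n; i++) {
            float cur = x[i];
            x[i] = cur - a * prev;
            prev = cur;
        }
        *prevSample = prev;   /* remember the last input sample for the next frame */
    }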
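
The Hamming window shown on the Windowing Function slide can be precomputed once and applied per frame; the table length of 256 matches the frame size from the timing slide.

    #include <math.h>

    #define FRAMESIZE 256

    static float hamming[FRAMESIZE];

    /* Build the window once at start-up: w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1)). */
    void initHamming(void)
    {
        int n;
        for (n = 0; n < FRAMESIZE; n++) {
            hamming[n] = 0.54f - 0.46f * cosf(2.0f * 3.14159265f * n / (FRAMESIZE - 1));
        }
    }

    /* Apply the window to one frame in place. */
    void applyWindow(float *frame)
    {
        int n;
        for (n = 0; n < FRAMESIZE; n++) {
            frame[n] *= hamming[n];
        }
    }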
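
The FFT itself comes from TI's DSPLIB for the C67x (for example DSPF_sp_cfftr2_dit, with twiddle-factor and bit-reversal setup as described in the library documentation); the sketch below only shows the magnitude step that follows, assuming the FFT output is stored as interleaved complex values.

    #include <math.h>

    #define FRAMESIZE 256
    #define NBINS     (FRAMESIZE / 2 + 1)   /* non-redundant half of the spectrum */

    /* X holds the FFT result as interleaved complex values
     * (X[2k] = real part, X[2k+1] = imaginary part of bin k);
     * keep only the magnitudes of the lower half of the spectrum. */
    void spectrumMagnitude(const float *X, float *mag)
    {
        int k;
        for (k = 0; k < NBINS; k++) {
            float re = X[2 * k];
            float im = X[2 * k + 1];
            mag[k] = sqrtf(re * re + im * im);
        }
    }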
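
A sketch of the triangular mel filter bank followed by the log, assuming the filter weights have been precomputed into a table from the mel scale mel(f) = 2595*log10(1 + f/700); the filter count of 26 is an assumption, since the slides do not state how many filters were used.

    #include <math.h>

    #define NBINS    129   /* FFT magnitude bins for a 256-point FFT      */
    #define NFILTERS 26    /* assumed number of triangular mel filters    */

    /* melBank[f][k] holds the triangular weight of FFT bin k in filter f;
     * the table is built once at start-up (not shown here). */
    extern float melBank[NFILTERS][NBINS];

    /* Weighted sum of the magnitude spectrum under each triangle, then log. */
    void melFilterLog(const float *mag, float *logMel)
    {
        int f, k;
        for (f = 0; f < NFILTERS; f++) {
            float sum = 0.0f;
            for (k = 0; k < NBINS; k++) {
                sum += melBank[f][k] * mag[k];
            }
            logMel[f] = logf(sum + 1e-10f);   /* small floor avoids log(0) */
        }
    }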
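
With the look-up table mentioned on the DCT slide, the per-frame DCT reduces to a set of dot products against a precomputed cosine table; keeping 13 coefficients matches the Deltas slide, and scaling factors are omitted for brevity.

    #define NFILTERS 26   /* log mel-filter outputs (assumed count) */
    #define NCEPS    13   /* MFCCs kept per frame                   */

    /* dctTable[c][f] = cos(pi*c*(f + 0.5)/NFILTERS), precomputed once. */
    extern float dctTable[NCEPS][NFILTERS];

    /* DCT-II of the log mel energies: each MFCC is one dot product. */
    void dct(const float *logMel, float *mfcc)
    {
        int c, f;
        for (c = 0; c < NCEPS; c++) {
            float sum = 0.0f;
            for (f = 0; f < NFILTERS; f++) {
                sum += logMel[f] * dctTable[c][f];
            }
            mfcc[c] = sum;
        }
    }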
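
One common, simple way to form the delta features is a first difference over neighbouring frames; the slides do not specify the exact delta formula used, so this is only an illustrative choice. Applying the same operation to the deltas yields the delta-deltas, giving the 39-dimensional vector (13 MFCCs + 13 deltas + 13 delta-deltas).

    #define NCEPS 13

    /* Simple first-difference deltas over a three-frame window:
     * delta[t][i] = (c[t+1][i] - c[t-1][i]) / 2. */
    void computeDeltas(const float *prev, const float *next, float *delta)
    {
        int i;
        for (i = 0; i < NCEPS; i++) {
            delta[i] = 0.5f * (next[i] - prev[i]);
        }
    }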