Speech Coding

Speech Coding Using LPC

What is Speech Coding
• Speech coding is the procedure of transforming a speech signal into a more compact form to suit:
  • transmission
  • the available bandwidth
  • encryption

Uncompressed Speech Signal
• Analog speech is a band-passed signal between 200 and 3400 Hz.
• Uncompressed digital speech is a bit stream at 64 kbit/s.
• Transmission technology must carry the signal from point A to point B:
  • with minimum degradation
  • using minimum bandwidth

Speech Coding
• By coding we mean an efficient representation of the signal, i.e. compression.
• The main approaches:
  • smart quantizers: waveform coding, transform coding
  • parametric / hybrid coding

How each of these works:
• Waveform coders: try to find an efficient representation of the waveform, directly.
• Transform coders: try to find an efficient representation in the frequency domain (FFT, etc.).
• Parametric coders: try to find a small set of parameters that efficiently represents the signal.

[Diagram: excitation → H(ω) → speech]

Comparison of Speech Coders
[Figure: comparison of speech coders]

LPC (Linear Predictive Coding)
• LPC is a model of signal production: it is based on the assumption that the speech signal is produced by a very specific model.

Speech Production in Humans
• The speech signal is created by:
  • a pressure source (lungs), exciting ...
  • a filter (vocal tract: pharynx, mouth [soft palate, tongue], nasal cavity)

For the DSP Engineer
• An excitation source
• A time-varying filter

[Diagram: excitation → H(t, ω) → speech]

The Model and Its Representation
• The LPC model looks at speech as:
  • Excitation:
    • periodic (voiced), originating in the larynx
    • noise (unvoiced), fricative, produced in the mouth
  • An all-pole filter representing the vocal tract
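The excitation-plus-filter view of speech can be made concrete with a small sketch. The function names (`make_excitation`, `all_pole_filter`), the pitch period of 80 samples, and the filter coefficients are all assumptions chosen for illustration; none of them come from the slides. The all-pole filter is written as a recursion on past output samples.

```python
import numpy as np

def make_excitation(num_samples, voiced, pitch_period=80, seed=0):
    """Return an impulse train (voiced) or white noise (unvoiced).

    pitch_period is in samples and is an illustrative value only.
    """
    if voiced:
        e = np.zeros(num_samples)
        e[::pitch_period] = 1.0  # one pulse per pitch period
        return e
    return np.random.default_rng(seed).standard_normal(num_samples)

def all_pole_filter(excitation, a, gain=1.0):
    """Compute s[n] = sum_i a[i-1] * s[n-i] + gain * excitation[n]."""
    p = len(a)
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        # Feedback from up to p previous output samples.
        for i in range(1, min(p, n) + 1):
            s[n] += a[i - 1] * s[n - i]
        s[n] += gain * excitation[n]
    return s

# Made-up coefficients; the poles lie inside the unit circle, so the filter is stable.
a_example = [1.3, -0.6]
voiced_speech = all_pole_filter(make_excitation(400, voiced=True), a_example)
unvoiced_speech = all_pole_filter(make_excitation(400, voiced=False), a_example, gain=0.1)
```

Switching `voiced` changes only the source. The filter stays the same, which is the separation the model relies on.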
all pole filter: H() Block Diagram Why the name “Linear Predictive Coding”  It is assumed that the new sample is the weighted linear combination of previous samples p s (n)   a s (n  i )  Ge(n) i i 1 Z-Plane Representation  In the z-plane we can write the model as a transfer function: H(z)  G p 1   ai z  i i 1 • Clearly this transfer function has only poles which is why it represents an all pole filter. Mathematical analysis  Reminder: our problem is to find the LPC parameters, for a given speech signal. This is called the Inverse Problem.  How do we find the set of parameters that gives the best match to the signal? What are these Parameters  The Coefficients of the All Pole Filter  Pitch of the speech  How do we find the Coefficients:  least squares  Formulation:    Given a signal s(n); Defining an error as: Find the set of square error: ai e(n)  s(n)   ai s(n  i) i 1 that will minize the mean E   e2 ( n) n p Solution:  Simply equate the derivative of E to zero: E  0, i  1... p ai • Which gives us the Normal Equations: p  a  s(n  k )s(n  i)   s(n)s(n  i), i  1...p k 1 k n n • These are no more than p linear equations in p unknowns... Or in matricial form:   s(n  1) s(n  1)  n   s(n  2) s(n  1) n     s(n  p) s(n  1) n  s(n 1)s(n  2)  s(n  2)s(n  2)  n n  n    s(n  p)s(n  2)  n  s(n 1)s(n  p)   a    s(n 1)s(n)   s(n  2)s(n  p)   a    s(n  2)s(n)  n 1 2      a  n s(n  p)s(n  p)  p  n  n      s ( n  p ) s ( n )   n  What is each element of the form-  s(n  k )s(n  i)? n  A correlation; in other words:  take the signal, multiply it by a shifted version, and sum. 
• Since the signal is long and time-varying, the correlations were computed on short windows.
• Two variants:
  • autocorrelation method
  • covariance method

Solving the Matrix
• The coefficients $a_i$ were found using the Levinson-Durbin recursion.

Second Parameter
• The pitch was found by computing the correlation of each signal window with itself (its autocorrelation).
• These parameters were then transmitted.

Parameter                  Bits per frame
Predictor coefficients     18 * 8 = 144
Gain                       5
Pitch period               6
Voiced/unvoiced switch     1
Total                      156

Overall bit rate: 50 frames/second * 156 bits = 7800 bits/second

Bit rate for plain LPC vocoder

Parameter                  Bits per frame
Predictor coefficients     18 * 8 = 144
Gain                       5
DCT coefficients           40 * 4 = 160
Total                      309

Overall bit rate: 50 frames/second * 309 bits = 15450 bits/second

Bit rate for voice-excited LPC vocoder with DCT

Conclusion
• Sound produced by the LPC method is not identical to the original, but it remains intelligible.
• LPC can be used in speech recognition systems.
• LPC was widely used in military communications because of its low transmission bit rate.
• There are many variants of the basic scheme: LPC-10, CELP, MELP, RELP, VSELP, ASELP, LD-CELP...
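The two analysis steps named under "Solving the Matrix" and "Second Parameter" can be sketched as follows. This is an illustrative sketch under stated assumptions, not the presenters' implementation: the function names are hypothetical, the Levinson-Durbin recursion is shown for the autocorrelation method, and picking the largest autocorrelation peak inside a lag range is one simple pitch estimator assumed here.

```python
import numpy as np

def autocorrelation(frame, max_lag):
    """r[k] = sum_n frame[n] * frame[n + k] for k = 0..max_lag."""
    frame = np.asarray(frame, dtype=float)
    return np.array([np.dot(frame[:len(frame) - k], frame[k:])
                     for k in range(max_lag + 1)])

def levinson_durbin(r, p):
    """Solve sum_k a_k r[|i-k|] = r[i], i = 1..p, in O(p^2) operations.

    Returns (a, err): a[i-1] is a_i, err is the final prediction error energy.
    """
    a = np.zeros(p)
    err = r[0]
    for i in range(1, p + 1):
        # Reflection coefficient for order i.
        k = (r[i] - np.dot(a[:i - 1], r[i - 1:0:-1])) / err
        a_prev = a[:i - 1].copy()
        a[:i - 1] = a_prev - k * a_prev[::-1]  # update lower-order coefficients
        a[i - 1] = k
        err *= 1.0 - k * k
    return a, err

def estimate_pitch_period(frame, min_lag, max_lag):
    """Lag (in samples) of the largest autocorrelation value in [min_lag, max_lag]."""
    r = autocorrelation(frame, max_lag)
    return min_lag + int(np.argmax(r[min_lag:max_lag + 1]))

# Illustrative use on a synthetic window (the order p = 4 is arbitrary).
window = np.random.default_rng(0).standard_normal(240)
coeffs, residual_energy = levinson_durbin(autocorrelation(window, 4), 4)
```

The recursion exploits the Toeplitz structure of the autocorrelation matrix, so it avoids a general-purpose matrix solve. The lag bounds passed to `estimate_pitch_period` would in practice be chosen from the expected pitch range and the sampling rate, neither of which is stated in the slides.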
