Speech Coding (Part I): Waveform Coding
虞台文

Content
– Overview
– Linear PCM (Pulse-Code Modulation)
– Nonlinear PCM
– Max-Lloyd Algorithm
– Differential PCM (DPCM)
– Adaptive PCM (ADPCM)
– Delta Modulation (DM)

Speech Coding (Part I) Waveform Coding: Overview

Classification of coding schemes
– Waveform coding
– Vocoding
– Hybrid coding
We also compare the quality versus bitrate of speech codecs.

Waveform coding
Encodes the waveform itself in an efficient way. It is signal independent and offers good-quality speech at bitrates of 16 kbps or more.
Time-domain techniques:
– Linear PCM (Pulse-Code Modulation)
– Nonlinear PCM: μ-law, A-law
– Differential coding: DM, DPCM, ADPCM
Frequency-domain techniques:
– SBC (Sub-band Coding), ATC (Adaptive Transform Coding)
– Wavelet techniques

Vocoding
'Voice' + 'coding': encoding information about how the speech signal was produced by the human vocal system. These techniques can produce intelligible communication at very low bit rates, usually below 4.8 kbps, but the reproduced speech often sounds quite synthetic and the speaker is often not recognisable. Example: the LPC-10 codec, a 2400 bps American military standard.

Hybrid coding
Combines waveform and source coding methods in order to improve the speech quality and reduce the bitrate. Typical bandwidth requirements lie between 4.8 and 16 kbps. The main technique is analysis-by-synthesis:
– RELP (Residual Excited Linear Prediction)
– CELP (Codebook Excited Linear Prediction)
– MPLP (Multipulse Excited Linear Prediction)
– RPE (Regular Pulse Excitation)

[Figure: quality versus bitrate of speech codecs.]

Speech Coding (Part I) Waveform Coding: Linear PCM (Pulse-Code Modulation)

Pulse-Code Modulation (PCM)
A method for quantizing an analog signal for the purpose of transmitting or storing the signal in digital form. In linear (uniform) PCM the quantization levels are equally spaced.

Quantization error/noise
Two kinds of quantization error arise: overload noise, when the input exceeds the quantizer range, and granular noise, the rounding error inside the range. With b bits covering the range [−X_max, X_max], the quantization step size is

$$\Delta = \frac{2X_{\max}}{2^{b}}.$$

[Figure: an unquantized sine wave, its 3-bit quantized waveform, and the corresponding 3-bit and 8-bit quantization errors.]

The model of quantization noise
Quantization is modeled as adding a noise term e(n) to the signal:

$$\hat{x}(n) = x(n) + e(n), \qquad -\tfrac{\Delta}{2} < e(n) \le \tfrac{\Delta}{2}.$$

Signal-to-Quantization-Noise Ratio (SQNR)
A measurement of the effect of the quantization errors introduced by analog-to-digital conversion at the ADC:

$$\mathrm{SQNR_{dB}} = 10\log_{10}\frac{\sigma^{2}_{\text{signal}}}{\sigma^{2}_{\text{q-noise}}} = 20\log_{10}\frac{\sigma_{\text{signal}}}{\sigma_{\text{q-noise}}}.$$

Assume the error is uniformly distributed, $e(n) \sim U(-\Delta/2, \Delta/2)$. Then

$$\sigma_e^{2} = \frac{\Delta^{2}}{12} = \frac{X_{\max}^{2}}{3\cdot 2^{2b}},$$

so that

$$\mathrm{SQNR_{dB}} = 10\log_{10}\frac{\sigma_x^{2}}{\sigma_e^{2}}
= 10\log_{10}3 + 20\,b\log_{10}2 - 20\log_{10}\frac{X_{\max}}{\sigma_x}
= 4.77 + 6.02\,b - 20\log_{10}\frac{X_{\max}}{\sigma_x}.$$

Interpretation:
– The term 4.77 + 6.02b is a constant determined by the A/D converter: each code bit contributes about 6 dB.
– The term X_max/σ_x tells how big a signal can be accurately represented. It depends on the distribution of the signal, which in turn depends on users and time.
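The 6 dB-per-bit rule is easy to check numerically. Below is a minimal sketch, not part of the original slides, assuming NumPy, a Gaussian test signal, and 4σ_x loading; the midrise rounding rule is one of several equivalent implementation choices.

```python
import numpy as np

def uniform_quantize(x, b, x_max):
    """Uniform midrise quantizer with 2**b levels on [-x_max, x_max]."""
    delta = 2 * x_max / 2**b                 # step size
    x = np.clip(x, -x_max, x_max - 1e-12)    # the overload region is clipped
    return (np.floor(x / delta) + 0.5) * delta

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 100_000)            # sigma_x = 1
x_max = 4.0                                  # 4-sigma loading (illustrative)

for b in (4, 6, 8):
    e = uniform_quantize(x, b, x_max) - x
    measured = 10 * np.log10(np.var(x) / np.mean(e**2))
    formula = 4.77 + 6.02 * b - 20 * np.log10(x_max / np.std(x))
    print(f"b={b}: measured {measured:5.2f} dB, formula {formula:5.2f} dB")
```

Each extra bit should raise the measured SQNR by about 6 dB, in line with the formula.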
Overload distortion
The range [−X_max, X_max] can be realized with a midtread or a midrise quantizer characteristic. Samples falling outside the range are clipped, which causes overload distortion.

Probability of distortion
Assume $x \sim N(0, \sigma_x^{2})$. With the loading $X_{\max} = 3\sigma_x$,

$$P(\text{overload}) = P(|x| > X_{\max}) \approx 0.0026.$$

[Figure: overload and quantization noise versus X_max/σ_x for a Gaussian input pdf and b = 4.]

Uniform quantizer performance
[Figure: SQNR (dB) versus X_max/σ_x (dB) for a uniform input pdf and for a Gaussian input pdf, following SQNR_dB = 4.77 + 6.02b − 20 log₁₀(X_max/σ_x).]

More on uniform quantization
– Conceptually and implementationally simple.
– Imposes no restrictions on the signal's statistics.
– Maintains a constant maximum error across its total dynamic range.
However, σ_x varies greatly (on the order of 40 dB) across sounds, speakers, and input conditions. We need a quantizing system whose SQNR is independent of the signal's dynamic range, i.e., a near-constant SQNR across the dynamic range.

Speech Coding (Part I) Waveform Coding: Nonlinear PCM

Probability density functions of speech signals
Counting the number of samples falling in each amplitude interval provides a histogram estimate of the pdf of the signal (a small numerical sketch follows at the end of this subsection). A good approximation to the measured pdf is a gamma distribution of the form

$$p(x) = \left(\frac{\sqrt{3}}{8\pi\sigma_x\,|x|}\right)^{1/2} e^{-\sqrt{3}\,|x|/(2\sigma_x)}, \qquad p(0) = \infty.$$

A simpler approximation is a Laplacian density of the form

$$p(x) = \frac{1}{\sqrt{2}\,\sigma_x}\, e^{-\sqrt{2}\,|x|/\sigma_x}, \qquad p(0) = \frac{1}{\sqrt{2}\,\sigma_x}.$$

[Figure: measured speech pdf with gamma and Laplacian fits, normalized so that μ_x = 0 and σ_x = 1.]
– The gamma density approximates the measured distribution of speech more closely than the Laplacian.
– The Laplacian is still a good model in analytical studies.
– Small amplitudes are much more likely than large amplitudes, by roughly a 100:1 ratio.

Companding
The dynamic range of the signal is compressed before transmission and expanded back to the original value at the receiver ("compressing" + "expanding"). This allows signals with a large dynamic range to be transmitted over facilities that have a smaller dynamic range capability, and it reduces the noise and crosstalk levels at the receiver.
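As promised above, here is a small sketch of the histogram pdf estimate, assuming NumPy. A synthetic Laplacian source stands in for recorded speech, so the histogram and the model agree by construction; fed with real speech samples, the same code would show how well the Laplacian fits.

```python
import numpy as np

# Histogram estimate of an amplitude pdf, compared with the Laplacian
# model p(x) = exp(-sqrt(2)|x|/sigma) / (sqrt(2)*sigma).
rng = np.random.default_rng(0)
x = rng.laplace(scale=1 / np.sqrt(2), size=200_000)   # sigma_x = 1
sigma = x.std()

counts, edges = np.histogram(x, bins=101, range=(-5, 5), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
model = np.exp(-np.sqrt(2) * np.abs(centers) / sigma) / (np.sqrt(2) * sigma)

for c in (0.0, 1.0, 3.0):                             # spot-check a few amplitudes
    i = int(np.argmin(np.abs(centers - c)))
    print(f"x={c:+.1f}: histogram {counts[i]:.4f}, Laplacian {model[i]:.4f}")
```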
Companding structure
The compander consists of a compressor C(x), a uniform quantizer, and an expander C⁻¹(x):

x → compressor y = C(x) → uniform quantizer ŷ → expander x̂ = C⁻¹(ŷ)

After compression, y = C(x) is nearly uniformly distributed, so the uniform quantizer in the middle is used efficiently.

The quantization-error variance of a nonuniform quantizer
Following Jayant and Noll, for a smooth compressor C(x),

$$\sigma_e^{2} = \frac{\Delta^{2}}{12}\int_{-X_{\max}}^{X_{\max}} \frac{p(x)}{[C'(x)]^{2}}\,dx.$$

The optimal C(x)
If the signal's pdf is known, the minimum quantization-error variance (maximum SQNR) is achieved by letting

$$C(x) = X_{\max}\,\frac{\int_{0}^{x} \sqrt[3]{p(u)}\,du}{\int_{0}^{X_{\max}} \sqrt[3]{p(u)}\,du}, \qquad x \ge 0,$$

extended as an odd function to x < 0.

PDF-independent nonuniform quantization
Assuming the quantizer is overload free,

$$\mathrm{SQNR} = \frac{\sigma_x^{2}}{\sigma_e^{2}}
= \frac{\displaystyle\int_{-X_{\max}}^{X_{\max}} x^{2}\, p(x)\,dx}
       {\displaystyle\frac{\Delta^{2}}{12}\int_{-X_{\max}}^{X_{\max}} \frac{p(x)}{[C'(x)]^{2}}\,dx}.$$

We require that the SQNR be independent of p(x), so the two integrands must be proportional:

$$\frac{1}{[C'(x)]^{2}} = \frac{x^{2}}{k^{2}} \;\Rightarrow\; C'(x) = \frac{k}{x} \;\Rightarrow\; C(x) = k\,\ln x + A,$$

i.e., logarithmic companding.

μ-law and A-law companding
– μ-law (μ = 255 in the U.S. and Canada): a North American PCM standard, used in North America and Japan.
– A-law (A = 87.56 in Europe): an ITU PCM standard, used in Europe.

μ-law:

$$y = C_{\mu}(x) = X_{\max}\,\frac{\ln\!\left(1+\mu\,\dfrac{|x|}{X_{\max}}\right)}{\ln(1+\mu)}\;\mathrm{sign}(x).$$

A-law (with |x| normalized to [0, 1]):

$$y = C_{A}(x) =
\begin{cases}
\dfrac{A\,|x|}{1+\ln A}\,\mathrm{sign}(x), & 0 \le |x| \le \dfrac{1}{A},\\[2mm]
\dfrac{1+\ln(A\,|x|)}{1+\ln A}\,\mathrm{sign}(x), & \dfrac{1}{A} \le |x| \le 1.
\end{cases}$$

[Figure: μ-law and A-law compressor characteristics y = C(x).]

μ-law behaviour
Using $\ln(1+z) \approx z$ for $z \ll 1$ and $\ln(1+z) \approx \ln z$ for $z \gg 1$:

$$C_{\mu}(x) \approx
\begin{cases}
\dfrac{\mu\,|x|}{\ln(1+\mu)}\,\mathrm{sign}(x), & \mu\,\dfrac{|x|}{X_{\max}} \ll 1 \quad \text{(linear for small amplitudes)},\\[2mm]
\dfrac{X_{\max}}{\ln(1+\mu)}\,\ln\!\left(\mu\,\dfrac{|x|}{X_{\max}}\right)\mathrm{sign}(x), & \mu\,\dfrac{|x|}{X_{\max}} \gg 1 \quad \text{(logarithmic for large amplitudes)}.
\end{cases}$$

[Figure: histograms of x(n) and of the companded signal y(n); after μ-law compression the histogram is much closer to uniform.]
[Figure: distribution of the quantization levels of a 3-bit μ-law quantizer; the levels are dense near zero and sparse at large amplitudes.]
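Here is a sketch of the compressor, uniform quantizer, expander chain using the μ-law formula above, assuming NumPy. The midrise quantizer, the Laplacian test signal, and the quiet-talker scale of 0.01 are illustrative assumptions; the point is that companded PCM keeps a usable SQNR where linear PCM collapses.

```python
import numpy as np

MU = 255.0

def mu_compress(x, x_max=1.0, mu=MU):
    """mu-law compressor y = C_mu(x)."""
    return x_max * np.log1p(mu * np.abs(x) / x_max) / np.log1p(mu) * np.sign(x)

def mu_expand(y, x_max=1.0, mu=MU):
    """Expander, the inverse characteristic C_mu^{-1}(y)."""
    return x_max / mu * np.expm1(np.abs(y) / x_max * np.log1p(mu)) * np.sign(y)

def uniform_quantize(y, b, x_max=1.0):
    delta = 2 * x_max / 2**b
    y = np.clip(y, -x_max, x_max - 1e-12)
    return (np.floor(y / delta) + 0.5) * delta

# A quiet talker: Laplacian samples far below full scale.
rng = np.random.default_rng(1)
x = 0.01 * rng.laplace(scale=1 / np.sqrt(2), size=100_000)

for name, xq in [
    ("8-bit linear PCM", uniform_quantize(x, 8)),
    ("8-bit mu-law PCM", mu_expand(uniform_quantize(mu_compress(x), 8))),
]:
    e = xq - x
    print(f"{name}: SQNR = {10*np.log10(np.var(x)/np.mean(e**2)):5.1f} dB")
```

The μ-law chain should report an SQNR some 20 dB higher than linear PCM for this low-level input, consistent with the SQNR formulas that follow.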
SQNR of the μ-law quantizer

$$\mathrm{SQNR_{dB}} = 6.02\,b + 4.77 - 20\log_{10}\ln(1+\mu)
- 10\log_{10}\!\left[1 + \left(\frac{X_{\max}}{\mu\,\sigma_x}\right)^{2} + \sqrt{2}\;\frac{X_{\max}}{\mu\,\sigma_x}\right].$$

– The 6.02b dependence on b remains (good).
– There is much less dependence on X_max/σ_x (good).
– For large μ, the SQNR is less sensitive to changes in X_max/σ_x (good).

Compare the linear quantizer: $\mathrm{SQNR_{dB}} = 6.02\,b + 4.77 - 20\log_{10}(X_{\max}/\sigma_x)$.
[Figure: comparison of linear and μ-law quantizers, SQNR versus X_max/σ_x.]

SQNR of A-law companding

$$\mathrm{SQNR_{dB}} = 6.02\,b + 4.77 - 20\log_{10}(1+\ln A).$$

Demonstration: PCM demo.

Speech Coding (Part I) Waveform Coding: Max-Lloyd Algorithm

How to design a nonuniform quantizer?
The quantizer Q(x) maps every input x in the decision interval [x_k, x_{k+1}) to the quantization (reconstruction) level q_k, transmitted as the codeword c_k.
[Figure: staircase quantizer characteristic Q(x) with decision thresholds x_k and reconstruction levels q_k.]
Major tasks:
1. Determine the decision thresholds x_k.
2. Determine the reconstruction levels q_k.
Related task:
3. Determine the codewords c_k.

Optimal nonuniform quantization
An optimal quantizer is one that minimizes the quantization-error variance

$$\sigma_e^{2} = E\!\left[\left(X - Q(X)\right)^{2}\right] = \sum_{k=1}^{N}\int_{x_k}^{x_{k+1}} (x - q_k)^{2}\, p(x)\,dx,$$

$$(x_1^{*},\ldots,x_N^{*},\, q_1^{*},\ldots,q_N^{*}) = \arg\min_{\substack{x_1,\ldots,x_N\\ q_1,\ldots,q_N}} \;\sum_{k=1}^{N}\int_{x_k}^{x_{k+1}} (x - q_k)^{2}\, p(x)\,dx.$$

Necessary conditions for an optimum
Setting $\partial\sigma_e^{2}/\partial q_k = 0$ leads to the "centroid" condition:

$$q_k = \frac{\int_{x_k}^{x_{k+1}} x\, p(x)\,dx}{\int_{x_k}^{x_{k+1}} p(x)\,dx}, \qquad k = 1,\ldots,N.$$

Setting $\partial\sigma_e^{2}/\partial x_k = 0$ leads to the "nearest neighbour" condition:

$$x_k = \frac{q_{k-1} + q_k}{2}, \qquad k = 1,\ldots,N.$$

The Max-Lloyd algorithm
1. Initialize a set of decision levels {x_k} and set $\hat\sigma_e^{2} = \infty$.
2. Calculate the reconstruction levels $q_k = \dfrac{\int_{x_k}^{x_{k+1}} x\, p(x)\,dx}{\int_{x_k}^{x_{k+1}} p(x)\,dx}$.
3. Calculate the mean squared error $\sigma_e^{2} = \sum_k \int_{x_k}^{x_{k+1}} (x - q_k)^{2}\, p(x)\,dx$.
4. If $\hat\sigma_e^{2} - \sigma_e^{2} < \varepsilon$, exit.
5. Set $\hat\sigma_e^{2} = \sigma_e^{2}$ and adjust the decision levels by $x_k = \dfrac{q_{k-1} + q_k}{2}$.
6. Go to step 2.
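Below is a sample-based sketch of this iteration, assuming NumPy; it is the "practical version" referred to next, in which the integrals over p(x) become averages over training samples. The uniform initialization and the empty-cell handling are implementation choices, not part of the algorithm statement above.

```python
import numpy as np

def lloyd_max(samples, n_levels, tol=1e-7, max_iter=200):
    """Sample-based Lloyd-Max quantizer design.

    Step 2 (centroid)   -> mean of the samples in each decision interval.
    Step 5 (neighbour)  -> thresholds at midpoints between adjacent levels.
    """
    # Step 1: initialize reconstruction levels with uniform spacing.
    lo, hi = samples.min(), samples.max()
    q = lo + (np.arange(n_levels) + 0.5) * (hi - lo) / n_levels
    prev_mse = np.inf
    for _ in range(max_iter):
        x = 0.5 * (q[:-1] + q[1:])              # interior decision thresholds
        idx = np.searchsorted(x, samples)       # interval index of each sample
        # Step 2: centroid condition (keep the old level if a cell is empty).
        for k in range(n_levels):
            cell = samples[idx == k]
            if cell.size:
                q[k] = cell.mean()
        # Steps 3-4: mean squared error and convergence test.
        mse = np.mean((samples - q[idx]) ** 2)
        if prev_mse - mse < tol:
            break
        prev_mse = mse                          # step 5; step 6 loops back
    return np.sort(q), mse

# Example: a 3-bit (8-level) quantizer for Laplacian-distributed samples.
rng = np.random.default_rng(0)
s = rng.laplace(scale=1 / np.sqrt(2), size=200_000)
levels, mse = lloyd_max(s, 8)
print("levels:", np.round(levels, 3))
print(f"SQNR = {10*np.log10(s.var()/mse):.2f} dB")
```

For a Laplacian input the optimal levels come out dense near zero and sparse in the tails, matching the companding discussion above.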
The Max-Lloyd algorithm (practical version)
In practice p(x) is unknown; the integrals are replaced by sums over a set of training samples, exactly as in the sketch above.

Exercise.

Speech Coding (Part I) Waveform Coding: Differential PCM (DPCM)

Typical audio signals
[Figure: a 2500-sample segment of an audio signal and a zoomed-in view of samples 1250 to 1750.]
Do you find any correlation and/or redundancy among the samples?

The basic idea of DPCM
Adjacent samples exhibit a high degree of correlation. By removing this adjacent redundancy before encoding, a more efficiently coded signal results. How?
– Accompany the coder with prediction (e.g., linear prediction).
– Encode the prediction error only.

Linear prediction
A linear predictor of order p forms

$$\hat{s}(n) = \sum_{k=1}^{p} a_k\, s(n-k),$$

so the prediction error is

$$e(n) = s(n) - \hat{s}(n) = s(n) - \sum_{k=1}^{p} a_k\, s(n-k).$$

The coefficients $\mathbf{a} = (a_1, \ldots, a_p)$ are chosen to minimize the total squared error:

$$E_p = \sum_{n=1}^{N} e^{2}(n), \qquad \mathbf{a}^{*} = \arg\min_{\mathbf{a}} E_p.$$

DPCM codec
Encoder: the prediction error $e(n) = s(n) - \hat{s}(n)$ is quantized to $\tilde{e}(n)$ and sent over the channel. The predictor runs on the reconstructed signal $\tilde{s}(n) = \hat{s}(n) + \tilde{e}(n)$, so that the encoder and the decoder stay synchronized.
Decoder: forms the same prediction and reconstructs $\tilde{s}(n) = \hat{s}(n) + \tilde{e}(n)$.
The dynamic range of the prediction error is much smaller than the signal's, so fewer quantization levels are needed (see the sketch below).

Performance of DPCM
Using a logarithmic compressor and a 4-bit quantizer for the error sequence e(n), DPCM yields high-quality speech at a rate of 32,000 bps, a factor of two lower than logarithmic PCM.
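A sketch of the DPCM loop, assuming NumPy. The AR(2) test signal, the least-squares predictor fit, and the residual loading of four first-difference standard deviations are illustrative assumptions; the essential point is that the predictor runs on reconstructed samples on both sides of the channel.

```python
import numpy as np

def lp_coeffs(s, p):
    """Least-squares linear-prediction coefficients a_1..a_p
    (solve the normal equations that minimize sum e^2(n))."""
    A = np.column_stack([s[p - k:len(s) - k] for k in range(1, p + 1)])
    return np.linalg.lstsq(A, s[p:], rcond=None)[0]

def dpcm_codec(s, a, b, e_max):
    """DPCM encode + decode. The predictor uses reconstructed samples,
    which keeps the encoder and decoder synchronized."""
    p = len(a)
    delta = 2 * e_max / 2**b
    rec = np.zeros(len(s) + p)                   # reconstruction, zero history
    for n in range(len(s)):
        pred = a @ rec[n + p - 1::-1][:p]        # s_hat(n) from past rec. samples
        e = np.clip(s[n] - pred, -e_max, e_max - 1e-12)
        eq = (np.floor(e / delta) + 0.5) * delta  # quantized residual
        rec[n + p] = pred + eq
    return rec[p:]

# Demo on a correlated AR(2) signal standing in for speech.
rng = np.random.default_rng(0)
s = np.zeros(50_000)
for n in range(2, len(s)):
    s[n] = 1.6 * s[n - 1] - 0.7 * s[n - 2] + 0.1 * rng.normal()

a = lp_coeffs(s, p=2)
s_rec = dpcm_codec(s, a, b=4, e_max=4 * np.std(s - np.roll(s, 1)))
print("predictor:", np.round(a, 3))
print(f"SNR = {10*np.log10(s.var()/np.mean((s - s_rec)**2)):.1f} dB")
```

Because the residual occupies a much smaller range than the signal, the 4-bit quantizer here achieves a far better SNR than 4-bit PCM on s(n) itself would.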
Speech Coding (Part I) Waveform Coding: Adaptive PCM (ADPCM)

Basic concept
The power level of a speech signal varies slowly with time. Let the quantization step size Δ(n) adapt dynamically to this slowly time-varying power level.

Adaptive quantization schemes
– Feed-forward-adaptive quantizers estimate σ(n) from x(n) itself; the step size must be transmitted.
– Feedback-adaptive quantizers adapt the step size on the basis of the quantized signal x̂(n); the step size need not be transmitted.

Feed-forward adaptation
The encoder is a quantizer together with a step-size adaptation system that derives Δ(n) from x(n). The source signal is not available at the receiver, so the receiver cannot evaluate Δ(n) by itself: Δ(n) has to be transmitted along with the codewords c(n).
[Figure: x(n), the quantization error e(n) = x̂(n) − x(n), and the adapted step size Δ(n).]

The step-size adaptation system
Estimate the signal's short-time energy σ²(n) and make Δ(n) ∝ σ(n).

Low-pass filter approach:

$$\sigma^{2}(n) = \sum_{m=-\infty}^{n} x^{2}(m)\, h(n-m), \qquad h(n) = \alpha^{\,n},\; n \ge 0,\; 0 < \alpha < 1,$$

which admits the recursion

$$\sigma^{2}(n) = \alpha\,\sigma^{2}(n-1) + x^{2}(n).$$

[Figure: step-size estimates for α = 0.9 and α = 0.99; the larger α gives a smoother but slower estimate.]

Moving average approach:

$$\sigma^{2}(n) = \frac{1}{M}\sum_{m=n-M+1}^{n} x^{2}(m), \qquad h(n) = \frac{1}{M},\; 0 \le n \le M-1.$$

Feed-forward quantizer
– Δ(n) is evaluated every M samples; use M = 128 or 1024 for the estimates (a window that is too long tracks the power level poorly).
– Δ_min and Δ_max must be chosen suitably.

Feedback adaptation
The same as feed-forward adaptation except that the input of the step-size adaptation system is the quantized signal x̂(n). Since x̂(n) can be evaluated at both sides using the same algorithm, Δ(n) need not be transmitted.

Alternative approach to adaptation

$$\Delta(n) = P(n)\,\Delta(n-1),$$

where $P(n) \in \{P_1, P_2, \ldots, P_8\}$ depends on the previous codeword c(n−1). Limits must be imposed:

$$\Delta_{\min} \le \Delta(n) \le \Delta_{\max},$$

and the ratio Δ_max/Δ_min controls the dynamic range of the quantizer.

Speech Coding (Part I) Waveform Coding: Delta Modulation (DM)

Delta modulation
The simplest form of DPCM:
– The prediction of the next sample is simply the current (reconstructed) sample; the predictor is a unit delay z⁻¹.
– The sampling rate is chosen to be many times (e.g., 5×) the Nyquist rate, so adjacent samples are highly correlated, i.e., s(n) ≈ s(n−1).
– A 1-bit (2-level) quantizer is used, so the bit rate equals the sampling rate.

DM codec
Recall the DPCM codec: the DM codec is obtained by replacing the predictor with a unit delay z⁻¹ and the quantizer with a 1-bit quantizer. Each transmitted bit tells the decoder to step up or down by Δ.

Distortions of DM
– Slope overload: where the input slope exceeds Δ/T, the staircase cannot keep up; this calls for a large step size.
– Granular noise: in flat regions the output hunts around the signal by ±Δ; this calls for a small step size.
[Figure: staircase DM output showing the slope-overload and granular-noise regions, together with the code words c(n).]

Choosing the step size
A fixed Δ is a compromise: steep segments need a large step size while flat segments need a small one. This motivates an adaptive step size.

Adaptive DM (ADM)

$$\Delta(n) = \begin{cases} K\,\Delta(n-1), & e(n) = e(n-1),\\[1mm] \Delta(n-1)/K, & e(n) \ne e(n-1), \end{cases}$$

where e(n) denotes the transmitted ±1 symbol and K ≥ 1 is a gain (e.g., K = 2): repeated bits indicate slope overload, so the step size grows; alternating bits indicate granular hunting, so the step size shrinks.
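Here is a sketch of delta modulation with the ADM step-size rule above, assuming NumPy; the initial step size, the Δ_min/Δ_max limits, and the oversampled test tone are illustrative choices. Setting K = 1 reduces it to fixed-step DM, so the demo prints both.

```python
import numpy as np

def adm(s, delta0, K=2.0, d_min=1e-4, d_max=1.0):
    """Delta modulation with the ADM step-size rule.
    K = 1 gives plain (fixed-step) DM."""
    rec = np.zeros(len(s))                      # reconstructed staircase
    bits = np.zeros(len(s), dtype=int)
    prev, delta, prev_bit = 0.0, delta0, 0
    for n in range(len(s)):
        bits[n] = 1 if s[n] >= prev else 0      # 1-bit quantizer
        if n > 0:                               # adapt from bits only, so the
            delta = delta * K if bits[n] == prev_bit else delta / K
            delta = min(max(delta, d_min), d_max)   # decoder can do the same
        prev += delta if bits[n] else -delta    # predictor = unit delay
        rec[n] = prev
        prev_bit = bits[n]
    return rec, bits

# Oversampled sine: many samples per period, as DM requires.
t = np.arange(4000)
s = np.sin(2 * np.pi * t / 200.0)

for K, label in [(1.0, "fixed DM"), (2.0, "ADM K=2 ")]:
    rec, _ = adm(s, delta0=0.05, K=K)
    print(label, f"SNR = {10*np.log10(s.var()/np.mean((s-rec)**2)):5.1f} dB")
```

Since the adaptation depends only on the bit stream, the decoder can reproduce Δ(n) without side information, which is exactly the feedback-adaptation property discussed above.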