Part 6

Speech Coding
– Waveform Coding
– Vocoders
– Mid-Term Evaluation

Waveform Coding
– In the time domain
  – PCM
  – Delta PCM (DPCM)
  – Adaptive DPCM
– In the frequency domain
  – Filterbank spectrum analyser
  – Subband coding
  – Adaptive transform coding
  – Vector waveform quantisation

Pulse Code Modulation (PCM)
[Figure: PCM encoder producing the bit stream 11010010....]
– Uniform quantiser
– Nonuniform quantiser

Uniform Quantiser
[Figure: signal s(n) sampled with period T = 1/Fs and quantised to the uniformly spaced levels 00, 01, 10, 11; spectrum S(w) band-limited to Wc, with Fs > 2Wc; encoded output 11010010....]
– Rate = R·Fs
– Each sample of the signal is quantised to one of 2^R amplitude values.

Nonuniform Quantiser
[Figure: sampled signal quantised to the nonuniformly spaced levels 00, 01, 10, 11; encoded output 11010011....]
– A-law
– μ-law
– Rate = R·Fs

Delta PCM
– Successive speech samples are highly correlated, so the average amplitude change between successive samples is very small.
– Therefore, encoding the differences between successive samples requires fewer bits.

DPCM coder-decoder
– Goal: decorrelate the speech signal.
– Therefore, a simple long-term LP predictor is enough.
[Figure: block diagram. Encoder: s(n), summing node, e(n), Quantiser, e~(n), Predictor (LP Analyser), ^s~(n). Channel. Decoder: summing node, s~(n), Predictor (LP Analyser), ^s~(n).]

DPCM coder-decoder (this version ensures that the error in s~(n) is only the quantisation error)
[Figure: block diagram. Encoder: s(n), summing node, e(n), Quantiser, e~(n), second summing node forming s~(n), Predictor (LP Analyser), ^s~(n). Channel. Decoder: summing node, s~(n), Predictor (LP Analyser), ^s~(n).]
– s~(n) is the sampled signal modified by the quantisation process.

DPCM coder-decoder (improved version)
[Figure: block diagram. Encoder: s(n), summing node, e(n), Quantiser, e~(n), All-zero Linear Filter, Predictor (LP Analyser), ^s~(n), s~(n). Channel. Decoder: summing nodes, Predictor (LP Analyser) blocks, ^s~(n), s~(n).]

Adaptive PCM and DPCM
– PCM and DPCM assume the speech signal is stationary.
– The coding process can be improved by assuming the speech is quasi-stationary.
– One improvement is to use an adaptive quantiser.
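
The DPCM version whose only error in s~(n) is the quantisation error can be sketched as below. This is a minimal illustration, not the scheme from the slides: the first-order predictor with fixed coefficient a, the uniform quantiser, the step size, and all function names are assumptions. The point it shows is that the encoder predicts from the reconstructed sample s~(n-1), exactly as the decoder does, so the two predictors stay in step.

```python
def quantise(x, step):
    """Uniform quantiser: round x to the nearest multiple of step."""
    return step * round(x / step)


def dpcm_encode(samples, step=0.1, a=0.9):
    """Return the quantised prediction errors e~(n) for the input samples s(n)."""
    errors = []
    s_rec = 0.0  # previous reconstructed sample s~(n-1), assumed to start at 0
    for s in samples:
        s_pred = a * s_rec                # prediction ^s~(n) from reconstructed data
        e_q = quantise(s - s_pred, step)  # quantised error e~(n)
        errors.append(e_q)
        s_rec = s_pred + e_q              # s~(n), identical to what the decoder builds
    return errors


def dpcm_decode(errors, a=0.9):
    """Rebuild s~(n) from the received quantised errors."""
    out = []
    s_rec = 0.0
    for e_q in errors:
        s_rec = a * s_rec + e_q
        out.append(s_rec)
    return out


if __name__ == "__main__":
    speech = [0.0, 0.3, 0.55, 0.7, 0.72, 0.6]
    decoded = dpcm_decode(dpcm_encode(speech))
    print([round(d, 3) for d in decoded])
```

Because the quantiser sits inside the prediction loop, each reconstructed sample differs from the input by at most half a quantiser step, and the error does not accumulate over time.
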

Adaptive DPCM
[Figure: block diagram. Encoder: s(n), summing node, e(n), Quantiser with Step-size Adaptation, e~(n), All-zero Linear Filter, Predictor (LP Analyser), Predictor Adaptor, ^s~(n), s~(n). Channel. Decoder: summing nodes, Predictor (LP Analyser) blocks, ^s~(n), s~(n).]

Vocoders
– Channel vocoder
– Cepstral vocoder
– Phase vocoder
– Formant vocoder
– Linear prediction coder

Cepstral Vocoder
[Figure: block diagram. Encoder: s(n), stDFT, Log |.|, IDFT, "low-time" lifter w(n), Pitch Estimator. Channel. Decoder: DFT, Exp(.), IDFT, Pitch Pulse Generator, Noise Generator, V/U switch, Convolution.]

Linear Prediction in Speech Coding
– Introduction
– Generalities
– Methods

Introduction
– These speech coders are called vocoders (voice coders).
– Basic idea:
  – Estimate parameters
  – Encode parameters
  – Transmit parameters
  – Decode parameters
  – Synthesise speech
– They usually provide more bandwidth compression than is possible with waveform coding (2400-9600 bps).

Generalities
– LP Model
– Parameter Estimation
– Typical Memory Requirements

LP Model
[Figure: block diagram. Pitch Period, Voiced Impulse Generator, White Noise Unvoiced Generator, Voiced/Unvoiced Switch, Gain, All-pole filter (Glottal filter, Vocal tract filter, Lip Radiation filter), Speech Signal.]

Parameter Estimation
– Therefore, for each frame:
  – Estimate the LP coefficients (the ai's).
  – Estimate the gain.
  – Estimate the type of excitation (voiced or unvoiced).
  – Estimate the pitch.

Typical Memory Requirements
– Pitch coefficient (6 bits)
– Gain (5 bits)
– Model parameters:
  – LP coefficients (8-10 bits): small changes in the LP coefficients result in large changes in the pole positions.
  – Reflection coefficients (6 bits): if |rk| is near 1, the distortion is large.
  – Log-Area Ratio: a nonlinear transformation of the reflection coefficients that expands the scale near |rk| = 1.

Methods
– Introduction
– LPC-10
– Analysis-by-Synthesis

Introduction
– The main difference among the LP vocoders is how the source of excitation is calculated.
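
The Log-Area Ratio transformation listed under Typical Memory Requirements can be illustrated with a short sketch. The slides do not give the formula; the form g = log((1 + r) / (1 - r)) used here is one common convention (some texts use the opposite sign), and the function names are hypothetical.

```python
import math


def reflection_to_lar(r):
    """Map a reflection coefficient r (|r| < 1) to a log-area ratio.

    Assumed convention: g = log((1 + r) / (1 - r)).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("reflection coefficient must satisfy |r| < 1")
    return math.log((1.0 + r) / (1.0 - r))


def lar_to_reflection(g):
    """Inverse mapping: r = tanh(g / 2) under the same convention."""
    return math.tanh(g / 2.0)


if __name__ == "__main__":
    # Equal steps in r become much larger steps in the LAR domain near |r| = 1.
    for r in (0.10, 0.11, 0.98, 0.99):
        print(r, reflection_to_lar(r))
```

A step of 0.01 in r near 0.1 moves the log-area ratio only slightly, while the same step near 0.99 moves it far more, which is why a uniform quantiser applied to log-area ratios spends its resolution where the reflection coefficients are most sensitive.
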

LPC-10
[Figure: block diagram. Encoder: Speech Signal, ADC (8 kHz), Sampled Speech, Window (180 samples), AMDF and Zero Crossing, LP Analysis (Covariance Method), Reflection Coefficients, Non-linear warping, LAR coefficients. Channel: Pitch Period (7 bits), Gain (5 bits), Voiced/Unvoiced Switch (1 bit), LAR coefficients (4 bits and 5 bits). Decoder: Pitch Frequency (7 bits), Gain (5 bits), Voiced/Unvoiced Switch (1 bit), 10 Reflection Coefficients (5 bits for one and 4 bits for the others), Impulse Generator, White Noise Generator, 1/A(z), Synthesized Speech Signal.]

Analysis-by-Synthesis Methods
– Introduction
– Multipulse LPC Vocoder
– Regular Excited Linear Prediction (RELP)
– Code Excited Linear Prediction (CELP) Vocoder

Introduction
[Figure: encoder block diagram. Sampled Speech, Buffer and LP Analysis, LP Synthesis Filter, Multipulse excitation generator, Perceptual weighting Filter, Error minimisation.]

Multipulse LPC Vocoder
– Multipulse excitation consists of a short sequence of pulses chosen to minimise the energy of the perceptual error.
– For simplicity, the amplitudes and locations of the impulses are obtained sequentially, minimising the energy for one pulse at a time.
– In practice, 4-8 pulses are calculated every 5 ms.

Multipulse LPC
[Figure: block diagram. Encoder: Sampled Speech, Buffer and LP Analysis, Pitch Synthesis Filter, LP Synthesis Filter, Multipulse excitation generator, Perceptual weighting Filter, Error minimisation. Channel: Pulses' locations (4 bits), Pulses' amplitude (4 bits), Scale factor (6 bits), Pitch filter parameters (6 bits), 10-12 Reflection Coefficients (5 bits). Decoder: Excitation Generator E(z), 1/A(z), Synthesized Speech Signal.]

Memory Requirements
– Updated every 5 ms:
  – Scale factor (largest amplitude), log quantised: 6 bits
  – Pulse amplitude (relative to the largest one), linearly quantised: 4 bits
– Updated every 20 ms:
  – Vocal tract parameters (reflection coefficients): 6 bits
  – Pitch period: 6 bits
– Effective for good-quality speech at 9600 bps.
– They have been used for airborne mobile satellite telephone service.

Variations
– Every time a new impulse location and amplitude is obtained, one can go back and reoptimise the amplitudes of the previous impulses.
– Joint optimisation of all the amplitudes, after all locations have been determined.

Code Excited Linear Prediction (CELP) Vocoder
– The excitation signal is selected from a codebook of zero-mean Gaussian sequences.
– LP coefficients are calculated roughly every 20 ms.

CELP
[Figure: block diagram. Encoder: Sampled Speech, Buffer and LP Analysis, Gaussian Excitation Codebook, Pitch Synthesis Filter, LP Synthesis Filter, Perceptual weighting Filter, Error minimisation. Channel: Index of the excitation sequence (4 bits), Gain factor (6 bits), Pitch filter parameters (6 bits), 10-12 Reflection Coefficients (5 bits). Decoder: Pulses' amplitude (4 bits), Gain factor (6 bits), Pitch filter parameters (6 bits), 10-12 Reflection Coefficients (5 bits), Excitation Generator E(z), 1/A(z), Synthesized Speech Signal.]
– With a codebook of 1024 sequences, toll-quality speech can be obtained.
– Rate around 4.8 kbps.

Variations
– Low-Delay CELP
– VSELP

Topics to Evaluate
– Vocal Tract Physiological Model.
– Linear Prediction (LP).
– Relationship between the Vocal Tract Physiological Model and LP.
– Filterbank and Signal Processing.
– HMM
  – Basics
  – Applied to Speech Recognition.
  – Parameter re-estimation.
