ch5.3 (Vocoders).ppt

Vocoders

The Channel Vocoder (analyzer):
• The channel vocoder employs a bank of bandpass filters, each with a bandwidth between 100 Hz and 300 Hz.
• Typically, 16-20 linear-phase FIR filters are used.
• The output of each filter is rectified and lowpass filtered.
• The bandwidth of the lowpass filter is selected to match the time variations in the characteristics of the vocal tract.
• In addition to the spectral magnitude measurements, a voicing detector and a pitch estimator are included in the speech analysis.

The Channel Vocoder (analyzer block diagram):
[Block diagram: s(n) feeds a bank of parallel branches, each consisting of a bandpass filter, a rectifier, a lowpass filter, and an A/D converter. The branch outputs, together with the outputs of a voicing detector and a pitch detector, go to the encoder and then to the channel.]

The Channel Vocoder (analyzer parameters):
• 16-20 linear-phase FIR filters covering 0-4 kHz
• Each filter has a bandwidth between 100 and 300 Hz
• 20-ms frames, i.e. the spectral magnitudes change at a 50-Hz rate
• LPF bandwidth: 20-25 Hz
• Sampling rate of the filter outputs: 50 Hz

The Channel Vocoder (bit rate):
• 1 bit for the voicing detector
• 6 bits for the pitch period
• For 16 channels, each coded with 3-4 bits and updated 50 times per second, the bit rate for the spectral magnitudes is 2400-3200 bps (16 × 3 × 50 = 2400 and 16 × 4 × 50 = 3200).
• Further reductions to 1200 bps can be achieved by exploiting frequency correlations of the spectral magnitudes.

The Channel Vocoder (synthesizer):
• At the receiver, the signal samples are passed through D/A converters.
• The outputs of the D/A converters are multiplied by the voiced or unvoiced signal source.
• The resulting signals are passed through bandpass filters.
• The outputs of the bandpass filters are summed to form the synthesized speech signal.

The Channel Vocoder (synthesizer block diagram):
[Block diagram: from the channel, the decoder drives one D/A converter per band. Each D/A output is multiplied by the excitation and passed through a bandpass filter, and the bandpass outputs are summed (∑) to give the output speech. A switch controlled by the voicing information selects the excitation: a pulse generator driven by the pitch period, or a random noise generator.]

The Phase Vocoder:
• The phase vocoder is similar to the channel vocoder.
• However, instead of estimating the pitch, the phase vocoder estimates the phase derivative at the output of each filter.
• By coding and transmitting the phase derivative, this vocoder destroys the phase information.

The Phase Vocoder (analyzer block diagram):
[Block diagram, kth channel: s(n) is multiplied by cos ω_k n and by sin ω_k n, and each product is lowpass filtered, giving a_k(n) and b_k(n). These signals and the outputs of two differentiators are used to compute the short-term magnitude and the short-term phase derivative, which are each decimated and passed to the encoder and then to the channel.]

The Phase Vocoder (synthesizer block diagram, kth channel):
[Block diagram. Blocks shown: decoder (from channel); stages labeled Decimate and Interpolator for the short-term amplitude and the short-term phase derivative; an integrator; cos and sin blocks; multipliers by cos ω_k n and sin ω_k n; and a summer (∑).]

The Phase Vocoder (parameters):
• LPF bandwidth: 50 Hz
• Demodulation separation: 100 Hz
• Number of filters: 25-30
• Sampling rate of the spectral magnitude and phase derivative: 50-60 samples per second
• The spectral magnitude is coded using PCM or DPCM.
• The phase derivative is coded linearly using 2-3 bits.
• The resulting bit rate is 7200 bps.

The Formant Vocoder:
• The formant vocoder can be viewed as a type of channel vocoder that estimates the first three or four formants in a segment of speech.
• It is this information, plus the pitch period, that is encoded and transmitted to the receiver.

The Formant Vocoder (example of formants):
[Figure (a): spectrogram of the utterance “day one” showing the pitch and the harmonic structure of speech.]
[Figure (b): zoomed spectrogram of the fundamental and the second harmonic.]

The Formant Vocoder (analyzer block diagram):
[Block diagram: the input speech is analyzed to give the formant frequencies F1, F2, F3 and bandwidths B1, B2, B3; a block labeled “Pitch and V/U Decoder” supplies V/U and F0.]
F_k: the frequency of the kth formant
B_k: the bandwidth of the kth formant

The Formant Vocoder (synthesizer block diagram):
[Block diagram: an excitation signal, controlled by V/U and F0, drives three formant filters F1, F2, F3 (set by F1/B1, F2/B2, F3/B3), whose outputs are summed (∑).]

Linear Predictive Coding:
• The objective of LP analysis is to estimate the parameters of an all-pole model of the vocal tract.
• Several methods have been devised for generating the excitation sequence for speech synthesis.
• LPC-type speech analysis and synthesis methods differ primarily in the type of excitation signal that is generated for speech synthesis.

Synthesis Lattice Structure:
[Block diagram: a switch selects between a periodic impulse generator (voiced) and a white noise generator (unvoiced). The selected excitation is scaled by the gain estimate Θ̂_0 and drives a lattice filter with coefficients k(M;m), …, k(2;m), k(1;m) and unit delays z^{-1}, producing the synthesized speech S´(m).]

LPC-10:
• This method is called LPC-10 because 10 coefficients are typically employed.
• LPC-10 partitions the speech into 180-sample frames.
• Pitch and voicing decisions are determined using the AMDF and zero-crossing measures.

A General Discrete-Time Model For Speech Production:
[Figure]

Linear Prediction (determining the prediction order):
• [Figures: plots labeled voiced and unvoiced.]
• Prediction gain:
  $PG = 10 \log_{10} \left( \frac{\sum_{n=m-M+1}^{m} s^{2}[n]}{\sum_{n=m-M+1}^{m} e^{2}[n]} \right)$

Linear Prediction (examples):
• [Figure: results for M = 4 and M = 10.]
• [Figure: results for M = 2, M = 10, and M = 54.]

Linear Prediction (the idea of long-term linear prediction):
[Figure: results for M = 10 and M = 50.]

Linear Prediction (long-term linear prediction):
[Figure]

LPC10 Vocoder (general specifications):
• It is known as LPC10 because 10 linear prediction coefficients are transmitted.
• The bit rate is 2400 bits per second.
• Each frame contains 180 samples.
• 54 bits are transmitted per frame.
• The analog input signal is sampled at 8000 Hz and quantized with 16 bits.

LPC10 Vocoder (encoder block diagram):
[Block diagram. Blocks shown: input PCM signal; framing; pre-emphasis filter; voicing detector; pitch period estimation; determination of the prediction coefficients; prediction-error filter; gain computation; LPC decoding; encoding of the LPC coefficients, the pitch period, and the gain. The LPC-coefficient index, pitch-period index, and gain index go to the bit encoder, which outputs the transmitted bit stream.]

Pitch period detection:
• Autocorrelation method: $R[l,m] = \sum_{n=m-N+1}^{m} s[n]\, s[n-l]$
• Magnitude difference function (MDF) method: $MDF[l,m] = \sum_{n=m-N+1}^{m} \left| s[n] - s[n-l] \right|$
• YMC method: $s[n] = b\, s[n-N] + e[n], \quad n = m-N+1, \ldots, m$

LPC10 Vocoder (encoder):
• Voicing detector:
  1. Compute the energy (low band).
  2. Compute the zero-crossing rate.
  3. Compute the prediction gain.
• Pitch period estimation:
  - Compute the MDF.
  - Transmit one of the values T = 20, 21, …, 39, 40, 42, …, 80, 84, …, 154.
• Quantization of the LPC coefficients:
  1. Solve the normal equations with the Levinson-Durbin method.
  2. Compute the reflection coefficients (RC).

LPC10 Vocoder (speech synthesis):
• Encoder section, operating on the original signal:
  - Decide whether the frame is voiced or unvoiced.
  - Determine the pitch period, for voiced frames only.
  - Compute the signal gain.
• [Block diagram of the decoder: a source model, switched by V/U between an impulse train whose period equals the pitch period and random noise, is scaled by the gain G to produce the synthesized speech.]

LPC10 Vocoder (limitations):
1. Frames are divided into only two classes, voiced and unvoiced.
2. Random noise and a periodic impulse train are used for excitation (an impulse train alone cannot produce all voiced sounds).
3. The phase of the original signal is not preserved.
4. Using an impulse train is a violation of the AR model.

Residual Excited LP Vocoder:
• Speech quality can be improved, at the expense of a higher bit rate, by computing and transmitting a residual error, as is done in DPCM.
• In one method, the LPC model and excitation parameters are estimated from a frame of speech.
• The speech is synthesized at the transmitter and subtracted from the original speech signal to form the residual error.
• The residual error is quantized, coded, and transmitted to the receiver.
• At the receiver, the signal is synthesized by adding the residual error to the signal generated from the model.
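The residual-error scheme just described (subtract the model output from the original frame, quantize the difference, and add it back at the receiver) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the uniform quantizer, the `step` parameter, and the function names are introduced here and do not come from the slides, and the model-generated signal is simply passed in as a list of samples.

```python
def quantize(x, step):
    """Uniform round-to-nearest quantizer (an assumption; the slides name none)."""
    return step * round(x / step)


def relp_encode(speech, model_output, step):
    """Transmitter: residual = original - model output, then quantize it."""
    return [quantize(s - m, step) for s, m in zip(speech, model_output)]


def relp_decode(model_output, quantized_residual):
    """Receiver: add the received residual to the model-generated signal."""
    return [m + e for m, e in zip(model_output, quantized_residual)]


if __name__ == "__main__":
    speech = [0.10, 0.52, 0.93, 0.41, -0.37]        # toy frame (made-up values)
    model_output = [0.00, 0.50, 1.00, 0.50, -0.50]  # toy model-generated signal
    step = 0.05
    residual_q = relp_encode(speech, model_output, step)
    rebuilt = relp_decode(model_output, residual_q)
    worst = max(abs(s - r) for s, r in zip(speech, rebuilt))
    # With a round-to-nearest quantizer the error stays within half a step.
    print(worst <= step / 2 + 1e-12)
```

The point of the sketch is the trade-off stated above: a smaller `step` lowers the reconstruction error but needs more bits for the residual.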
Residual Excited LP Vocoder:
• The residual signal is lowpass filtered at 1000 Hz in the analyzer to reduce the bit rate.
• In the synthesizer, it is rectified and spectrum flattened (using a highpass filter); the lowpass and highpass signals are summed, and the resulting residual error signal is used to excite the LPC model.
• The RELP vocoder provides communication-quality speech at about 9600 bps.

RELP Analyzer (type 1):
[Block diagram: s(n) is buffered and windowed to give the frame f(n; m). The stLP analysis produces the LP parameters {â(i; m)}, the gain estimate Θ̂_0, the V/U decision, and the pitch estimate P̂, which drive an LP synthesis model. Its output is subtracted from f(n; m) in a summer (∑) to give the residual error e(n; m). The residual error and the excitation and LP parameters are encoded and sent to the channel.]

RELP Analyzer (type 2):
[Block diagram: s(n) is buffered and windowed to give f(n; m). The stLP analysis yields the LP parameters {â(i; m)}, which set the inverse filter Â(z; m). The inverse-filter output, the prediction residual, is lowpass filtered, decimated, and transformed by a DFT, then encoded together with the LP parameters and sent to the channel.]

Synthesizer for a RELP vocoder:
[Block diagram: from the channel, the decoder feeds a buffer and controller. The residual is interpolated, then rectified and highpass filtered; the highpass branch is summed (∑) with the interpolated residual to form the excitation of the LP synthesizer, whose LP model parameters are updated by the controller.]

Multipulse LPC Vocoder:
• RELP must regenerate the high-frequency components at the decoder, which gives only a crude approximation of the high frequencies.
• Multipulse LPC is a time-domain analysis-by-synthesis method that results in a better excitation signal for the LPC vocal-system filter.
• The information concerning the excitation sequence includes:
  - the locations of the pulses
  - an overall scale factor corresponding to the largest pulse amplitude
  - the pulse amplitudes relative to the overall scale factor
• The scale factor is logarithmically quantized into 6 bits.
• The amplitudes are linearly quantized into 4 bits.
• The pulse locations are encoded using a differential coding scheme.
• The excitation parameters are updated every 5 ms.
• The LPC vocal-tract parameters and the pitch period are updated every 20 ms.
• The bit rate is 9600 bps.
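The three pieces of excitation information listed above (scale factor, relative amplitudes, differentially coded locations) can be illustrated with a small Python sketch. Only the bit widths (6 and 4) come from the slides. The full-scale value `MAX_SCALE`, the signed mapping of relative amplitudes to integer codes, the simple first-difference location code, and all function names are assumptions made for illustration, so this is not the coder's actual quantizer design. The sketch also assumes at least one nonzero pulse.

```python
import math

SCALE_BITS = 6        # from the slides: scale factor, logarithmic, 6 bits
AMP_BITS = 4          # from the slides: relative amplitudes, linear, 4 bits
MAX_SCALE = 32768.0   # assumed full-scale value (16-bit samples); not from the slides


def encode_pulses(locations, amplitudes):
    """Return (scale_index, amplitude_codes, location_deltas)."""
    scale = max(abs(a) for a in amplitudes)          # largest pulse amplitude
    levels = 2 ** SCALE_BITS - 1
    # Logarithmic quantization of the scale over the assumed range [1, MAX_SCALE].
    ratio = math.log2(max(scale, 1.0)) / math.log2(MAX_SCALE)
    scale_index = min(levels, max(0, round(ratio * levels)))
    # Linear quantization of amplitudes relative to the scale: codes in [-7, 7].
    half = 2 ** (AMP_BITS - 1) - 1
    amp_codes = [round(a / scale * half) for a in amplitudes]
    # Differential coding: first location, then gaps between successive pulses.
    deltas = [locations[0]] + [b - a for a, b in zip(locations, locations[1:])]
    return scale_index, amp_codes, deltas


def decode_pulses(scale_index, amp_codes, deltas):
    """Invert encode_pulses (amplitudes are recovered only approximately)."""
    levels = 2 ** SCALE_BITS - 1
    scale = MAX_SCALE ** (scale_index / levels)
    half = 2 ** (AMP_BITS - 1) - 1
    amplitudes = [c / half * scale for c in amp_codes]
    locations, position = [], 0
    for d in deltas:                                  # running sum of the gaps
        position += d
        locations.append(position)
    return locations, amplitudes


if __name__ == "__main__":
    idx, codes, deltas = encode_pulses([3, 11, 17, 30], [100.0, -40.0, 25.0, 100.0])
    print(idx, codes, deltas)
    print(decode_pulses(idx, codes, deltas))
```

Pulse locations survive the round trip exactly, while amplitudes are recovered only to within the resolution of the two quantizers.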
Analysis-by-synthesis coder:
• A stored sequence from a Gaussian excitation codebook is scaled and used to excite the cascade of a pitch synthesis filter and the LPC synthesis filter.
• The synthetic speech is compared with the original speech.
• The residual error signal is weighted perceptually by the filter
  $W(z) = \frac{\hat{\Theta}(z/c)}{\hat{\Theta}(z)} = \frac{\hat{A}(z)}{\hat{A}(z/c)}$

Obtaining the multipulse excitation (analysis-by-synthesis method):
[Block diagram: the input speech s(n) is buffered and analyzed (LP analysis, pitch estimate P̂). A multipulse excitation generator drives the pitch synthesis filter Θ_p(z) followed by the LP synthesis filter, giving f̂(n; m). This is subtracted from f(n; m) in a summer (∑), the error passes through the perceptual weighting filter W(z), and an error-minimization block adjusts the excitation generator.]

Code Excited LP:
• CELP is an analysis-by-synthesis method in which the excitation sequence is selected from a codebook of zero-mean Gaussian sequences.
• The bit rate of CELP is 4800 bps.

CELP (analysis-by-synthesis coder):
[Block diagram: the speech samples are buffered and analyzed (LP analysis) to give the side-information parameters. A sequence from the Gaussian excitation codebook is scaled by a gain and passed through the pitch synthesis filter and the spectral envelope (LP) synthesis filter. The result is subtracted from the speech in a summer (∑), the error is filtered by the perceptual weighting filter W(z), and its energy is computed (square and sum) to select the index of the excitation sequence.]

Analysis-by-synthesis coder:
• This weighted error is squared and summed over a subframe block to give the error energy.
• By performing an exhaustive search through the codebook, we find the excitation sequence that minimizes the error energy.
• The gain factor for scaling the excitation sequence is determined for each codeword in the codebook by minimizing the error energy for the block of samples.

CELP (synthesizer):
[Block diagram: from the channel, the decoder feeds a buffer and controller. The selected sequence from the Gaussian excitation codebook passes through the pitch synthesis filter and the LP synthesis filter; the controller supplies the LP parameter, gain, and pitch estimate updates.]

CELP synthesizer:
• The synthesizer is a cascade of two all-pole filters with coefficients that are updated periodically.
• The first filter is a long-delay pitch filter used to generate the pitch periodicity in voiced speech. It has the form
  $\Theta_p(z) = \frac{1}{1 - b z^{-p}}$
• The parameters of this filter can be determined by minimizing the prediction error energy, after pitch estimation, over a frame duration of 5 ms.
• The second filter is a short-delay all-pole (vocal-tract) filter with 10-12 coefficients that are determined every 10-20 ms.

Example:
• The sampling frequency is 8 kHz.
• The subframe block duration for pitch estimation and excitation sequence selection is 5 ms.
• This gives 40 samples per 5-ms block, so the excitation sequence consists of 40 samples.
• A codebook of 1024 sequences gives good-quality speech.
• A codebook of this size requires 10 bits to send the codebook index.
• Hence the excitation index costs 10 bits per 40 samples, a factor of 4 below one bit per sample.
• Transmitting the pitch predictor parameters and the spectral predictor parameters brings the bit rate to about 4800 bps.

Low-delay CELP coder:
• CELP has been used to achieve toll-quality speech at 16000 bps with low delay.
• Although other types of vocoders produce high-quality speech at 16000 bps, these vocoders buffer 10-20 ms of speech samples, so the one-way delay is of the order of 20-40 ms.
• With a modification of CELP, it is possible to reduce the one-way delay to about 2 ms.
• Low-delay CELP is achieved by using a backward-adaptive predictor with a gain parameter and an excitation vector size as small as 5 samples.

Low-delay CELP coder (block diagram):
[Block diagram: the input speech s(n) is buffered and windowed to give f(n; m). A vector from the excitation vector quantizer codebook is scaled by a gain (set by a gain-adaptation block) and drives the high-order LP synthesis filter (set by a predictor-adaptation block), giving f̂(n; m). The difference formed in the summer (∑) passes through the perceptual weighting filter W(z), and an error-minimization block selects the codebook entry.]

Low-delay CELP coder (continued):
• The pitch predictor used in the conventional forward-adaptive coder is eliminated.
• To compensate for the loss of pitch information, the LPC predictor order is increased significantly, to an order of 50.
• The LPC coefficients are updated more frequently, every 2.5 ms.
• A 5-sample excitation vector corresponds to an excitation block duration of 0.625 ms at an 8-kHz sampling rate.
• The logarithm of the excitation gain is adapted every subframe excitation block by employing a 10th-order adaptive linear predictor in the logarithmic domain.
• The coefficients of the logarithmic-gain predictor are updated every four blocks by performing an LPC analysis of previously quantized excitation signal blocks.
• The perceptual weighting filter is also 10th order and is updated once every four blocks by employing an LPC analysis on frames of the input speech signal of duration 2.5 ms.
• The excitation codebook in low-delay CELP is also modified compared to conventional CELP: a 10-bit excitation codebook is employed.

Vector Sum Excited LP:
• The VSELP coder and decoder differ from conventional CELP mainly in the method by which the excitation sequence is formed.
• In the VSELP decoder block diagram that follows, there are three excitation sources.
• One excitation is obtained from the pitch period state.
• The other two excitation sources are obtained from two codebooks.

VSELP Decoder (block diagram):
[Block diagram: three excitation sources (the long-term filter state, codebook 1, and codebook 2), each scaled by its own gain, are summed (∑) and passed through the pitch synthesis filter, the spectral envelope (LP) synthesis filter, and the spectral post filter to produce the synthetic speech.]

VSELP Decoder:
• The LPC synthesis filter is implemented as a 10-pole filter, and its coefficients are coded and transmitted every 20 ms.
• The coefficients are updated in each 5-ms frame by interpolation.
• The excitation parameters are also updated every 5 ms.
• There are 128 codewords in each of the two codebooks.
• The codewords are constructed from two sets of seven basis codewords by forming linear combinations of the seven basis codewords.
• The long-term filter state is also a codebook with 128 codeword sequences.
• In each 5-ms frame, the codewords from this codebook are filtered through the speech system filter Θ̂(z) and correlated with the input speech sequence.
• The filtered codeword is used to update the history, and the lag is transmitted to the decoder.
• Thus the update occurs by appending the best filtered codeword to the history codebook.
• The oldest sample in the history array is discarded.
• The result is that the long-term state becomes an adaptive codebook.
• The three excitation sequences are selected sequentially from each of the three codebooks.
• Each codebook search attempts to find the codeword that minimizes the total energy of the perceptually weighted error.
• Once the codewords have been selected, the three gain parameters are optimized.
• Joint gain optimization is accomplished sequentially by orthogonalizing each weighted codeword vector prior to the codebook search.
• These parameters are vector quantized to one of 256 eight-bit vectors and transmitted in every 5-ms frame.

Vector Sum Excited LP:
• The bit rate of the VSELP is about 8000 bps.
• Bit allocations for 8000-bps VSELP:

  Parameters                                       Bits/5-ms frame   Bits/20 ms
  10 LPC coefficients                              -                 38
  Average speech energy                            -                 5
  Excitation codewords from two VSELP codebooks    14                56
  Gain parameters                                  8                 32
  Lag of pitch filter                              7                 28
  Total                                            29                159

VSELP Decoder:
• Finally, an adaptive spectral post filter is employed in VSELP following the LPC synthesis filter. This post filter is a pole-zero filter of the form
  $W(z) = \frac{\hat{\Theta}(z/c)}{\hat{\Theta}(z)} = \frac{\hat{A}(z)}{\hat{A}(z/c)}$

DEMO:
Audio samples for a male speaker, a female speaker, and music, for each of the following speech codecs:
• Original speech/music (16-bit, sampled at 8 kHz)
• FS-1015 (LPC-10e, 2.4 kb/s)
• FS-1016 (CELP, 4.8 kb/s)
• IS-54 (VSELP, 7.95 kb/s)
• G.721 (32 kb/s ADPCM)
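As a quick arithmetic check of the VSELP bit-allocation table, the per-frame totals can be turned into a bit rate with a short Python sketch. The bit counts are copied from the table and the 20-ms frame (50 frames per second) comes from the slides; the dictionary keys and function names are illustrative choices, not terms from the source.

```python
# Bits per 20-ms VSELP frame, copied from the bit-allocation table.
BITS_PER_20MS = {
    "lpc_coefficients": 38,
    "average_speech_energy": 5,
    "excitation_codewords": 56,   # 14 bits per 5-ms subframe, 4 subframes
    "gain_parameters": 32,        # 8 bits per 5-ms subframe, 4 subframes
    "pitch_filter_lag": 28,       # 7 bits per 5-ms subframe, 4 subframes
}
FRAMES_PER_SECOND = 50            # one frame every 20 ms


def total_bits(allocation):
    """Total number of bits sent per 20-ms frame."""
    return sum(allocation.values())


def bit_rate_bps(allocation, frames_per_second=FRAMES_PER_SECOND):
    """Bit rate in bits per second for the given per-frame allocation."""
    return total_bits(allocation) * frames_per_second


if __name__ == "__main__":
    print(total_bits(BITS_PER_20MS))    # 159
    print(bit_rate_bps(BITS_PER_20MS))  # 7950
```

The result, 159 bits per frame and 7950 bps, agrees with the table total and with the 7.95 kb/s figure quoted for IS-54 in the demo list, which is the "about 8000 bps" stated for VSELP.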
