Speech & Audio Processing
Speech & Audio Coding Examples
Veton Këpuska, 15 April 2020

A Simple Speech Coder: LPC Based Analysis Structure
[Block diagram. Blocks: Audio Input, Preemphasis, Windowing Analysis, Linear Prediction Analysis (Autocorrelation, Levinson-Durbin), Analysis Filter, Quantization. Signals: Residual, Filter Coeffs.]

Windowing Analysis Stage
- N: length of the analysis window, 10-30 msec

Some Analysis Windows
[Figure: plots of analysis windows]

MATLAB Useful Functions
- wintool: use "doc wintool" for more information
- window: use "doc window" for the list of supported windows
- Define your own window if needed, e.g. the sine window and the Vorbis window. In the forms below a full window spans n = 0, ..., 2N-1:

$$w_n = \sin\left(\frac{\pi (n + 0.5)}{2N}\right) \quad \text{(sine window)}$$

$$w_n = \sin\left(\frac{\pi}{2} \sin^2\left(\frac{\pi (n + 0.5)}{2N}\right)\right) \quad \text{(Vorbis window)}$$

LPC Analysis Stage
- LPC method described in: Ch5-Analysis_&_Synthesis_of_PoleZero_Speech_Models.ppt
- Summary:
  - Perform autocorrelation
  - Solve the system of equations with the Levinson-Durbin method
- MATLAB help: doc lpc, etc.

Example of MATLAB Code

    function myLPCCodec(wavfile, N)
    %
    % wavfile - input MS wav file
    % N       - LPC filter order
    %
    % audioread replaces wavread, which was removed from recent MATLAB releases
    [x, fs] = audioread(wavfile);
    x = x(:, 1);                  % use the first channel only
    % plot(x);

    % Playing original signal
    soundsc(x, fs);

    % Performing LPC analysis using the MATLAB lpc function;
    % a = [1 a(2) ... a(N+1)] are the coefficients of A(z)
    a = lpc(x, N);

    % Filtering with the estimated coefficients to produce predicted samples
    est_x = filter([0 -a(2:end)], 1, x);

    % Error (residual) signal; it plays the role of g*e[n] in the model below
    e = x - est_x;

    % Testing the quality of predicted samples
    soundsc(est_x, fs);

    % Synthesis stage with zero loss of information: the all-pole filter
    % H(z) = 1/A(z), driven by the residual, reconstructs x
    syn_x = filter(1, a, e);
    soundsc(syn_x, fs);

[Diagram: g e[n] -> H(z) = 1/A(z) -> ŝ[n]]

$$\hat{s}[n] = \sum_{k=1}^{p} \alpha_k \hat{s}[n-k] + g\,e[n]$$

Analysis of Quantization Errors
Use MATLAB functions to research the effects of quantization errors introduced by the precision of the arithmetic operations and by the representation of the filter and error signal:
- Double (float64) representation (software emulation)
- Float (float32) representation (software emulation)
- Int (int32) representation (hardware emulation)
- Short (int16) representation (hardware emulation)
Useful MATLAB functions: fix, floor, round, ceil
Example (truncation of sig to B bits):

    sig_hat = fix(sig*2^(B-1))/2^(B-1);

Quantization of Error Signal & Filter Coefficients
- ADPCM can be applied to the error signal.
- Filter coefficients in the direct filter form are found to be sensitive to quantization errors:
  - A small quantization error can have a large effect on the filter characteristics.
  - The issue is that the polynomial coefficients map nonlinearly to the poles of the filter (i.e., the roots of the polynomial).
- Alternate representations are possible that have significantly better tolerance to quantization error.

LPC Filter Representations
As noted previously, when the Levinson-Durbin algorithm was introduced, one alternate representation of the filter coefficients was also mentioned: PARCOR coefficients.

LPC to PARCOR:

$$a_j^{(p)} = \alpha_j, \quad 1 \le j \le p$$

for i = p, p-1, ..., 1:

$$k_i = a_i^{(i)}$$

$$a_j^{(i-1)} = \frac{a_j^{(i)} + a_i^{(i)}\, a_{i-j}^{(i)}}{1 - k_i^2}, \quad 1 \le j \le i-1$$

PARCOR Filter Representation
PARCOR to LPC:

for i = 1, 2, ..., p:

$$a_i^{(i)} = k_i$$

$$a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \quad 1 \le j \le i-1$$

$$\alpha_j = a_j^{(p)}, \quad 1 \le j \le p$$

Line Spectral Frequency Representation
It turns out that the PARCOR representation can be recast as line spectral frequencies (LSF), which have significantly better quantization properties.
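As a minimal sketch of the two recursions on the LPC to PARCOR and PARCOR to LPC slides, the script below converts a set of predictor coefficients to PARCOR coefficients and back. It assumes the sign convention A(z) = 1 - sum_k alpha_k z^-k implied by the synthesis equation; the coefficient values and the variable names (alpha, k, alpha_rec) are hypothetical and chosen only for illustration.

```matlab
% Round trip between predictor coefficients and PARCOR coefficients.
% Assumed convention: A(z) = 1 - sum_k alpha(k) z^-k.
alpha = [1.2 -0.5 0.1];          % hypothetical alpha_1 ... alpha_p
p = numel(alpha);

% LPC to PARCOR (step-down recursion)
a = alpha;                       % a holds a_j^(i), starting at i = p
k = zeros(1, p);
for i = p:-1:1
    k(i) = a(i);                 % k_i = a_i^(i)
    if abs(k(i)) >= 1
        error('Step-down recursion fails: |k(%d)| >= 1', i);
    end
    j = 1:i-1;
    % a_j^(i-1) = (a_j^(i) + a_i^(i) * a_(i-j)^(i)) / (1 - k_i^2)
    a = (a(j) + k(i) * a(i - j)) / (1 - k(i)^2);
end

% PARCOR to LPC (step-up recursion)
b = zeros(1, 0);                 % b holds a_j^(i), empty at i = 0
for i = 1:p
    j = 1:i-1;
    % a_j^(i) = a_j^(i-1) - k_i * a_(i-j)^(i-1), then a_i^(i) = k_i
    b = [b(j) - k(i) * b(i - j), k(i)];
end
alpha_rec = b;

fprintf('max round-trip error: %g\n', max(abs(alpha - alpha_rec)));
```

The step-down division by 1 - k_i^2 breaks down when some |k_i| reaches 1, which is why the sketch stops with an error in that case; |k_i| < 1 for all i is also the condition for a stable synthesis filter.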
Note that:

$$H(z) = \frac{1}{A(z)}$$

The PARCOR lattice structure of the LPC synthesis filter above:
[Lattice diagram: Input and Output; stages A_p, A_{p-1}, ..., A_0 and B_p, B_{p-1}, ..., B_0 with coefficients k_p, k_{p-1}, ... and unit delays z^{-1}; an extra stage with k_{p+1} = ∓1]

Line Spectral Frequency Representation
From the previous slide the following holds:

$$A_{p-1}(z) = A_p(z) + k_p B_{p-1}(z)$$

$$B_p(z) = z^{-1}\left[B_{p-1}(z) - k_p A_{p-1}(z)\right]$$

$$A_0(z) = 1, \quad B_0(z) = z^{-1}, \quad B_p(z) = z^{-(p+1)} A_p(z^{-1})$$

From this realization of the filter the LSP (line spectral pair) representation is derived:

LSF Representation

$$k_{p+1} = -1: \quad P_{p+1}(z) = A_p(z) + B_p(z)$$

$$k_{p+1} = +1: \quad Q_{p+1}(z) = A_p(z) - B_p(z)$$

$$A_p(z) = \frac{1}{2}\left[P_{p+1}(z) + Q_{p+1}(z)\right]$$

LPC Synthesis Filter with LSF

$$H(z) = \frac{1}{A(z)} = \frac{1}{1 + \left[A(z) - 1\right]} = \frac{1}{1 + \frac{1}{2}\left\{\left[P_{p+1}(z) - 1\right] + \left[Q_{p+1}(z) - 1\right]\right\}}$$

A Simple Speech Coder: LPC Based Synthesis Structure
[Block diagram. Blocks: Decoding, Synthesis Filter, Deemphasis, Audio Output. Signals: Residual Signal, Residual, Filter Coeffs.]

Audio Coding

Audio Coding
- Most of the audio coding standards use principles of psychoacoustics.
- Example of the basic structure of an MP3 encoder:
[Block diagram. Blocks: Audio Input, Filterbank & Transform, Quantization, Bit-stream, Psychoacoustic Model.]

Basic Structure of Audio Coders
- Filterbank Processing
- Psychoacoustic Model
- Quantization

Filter Bank Analysis Synthesis

Filterbank Processing
- Splitting the full-band signal into several subbands:
  - Uniform sub-bands (FFT)
  - Critical bands (FFT followed by a non-linear transformation), which reflect the human auditory apparatus
- Mel-scale and Bark-scale transformations (f in Hz):

$$\text{Mel} = 1127.01048 \,\ln\left(1 + \frac{f}{700}\right)$$

$$\text{Bark} = 13 \arctan(0.00076 f) + 3.5 \arctan\left(\left(\frac{f}{7500}\right)^2\right)$$

Mel-Scale
[Figure: Mel scale]

Bark-Scale
[Figure: Bark scale]

Analysis Structure of Filterbank
- h_k[n]: impulse response of the kth quadrature mirror filter
- N: number of channels, typically 32
- ↓: down-sampling
- MDCT: Modified Discrete Cosine Transform
[Block diagram: Audio Input feeds N branches; each branch applies h_k[n] (k = 1, ..., N), down-sampling, and MDCT; the branches feed Quantization, which outputs the Bit Stream.]

Synthesis Structure of Filterbank
- g_k[n]: impulse response of the kth inverse quadrature mirror filter
- N: number of channels, typically 32
- ↑: up-sampling
- IMDCT: Inverse Modified Discrete Cosine Transform
[Block diagram: Bit Stream feeds Decoding; each of the N branches applies IMDCT, up-sampling, and g_k[n] (k = 1, ..., N); the branches combine into the Audio Output.]

Psycho-Acoustic Modeling

Psychoacoustic Model
- Computes the masking threshold according to human auditory perception.
- The masking threshold is used to quantize the Discrete Cosine Transform coefficients.
- The analysis is done in the frequency domain, represented by the DFT and computed by the FFT.

Threshold of Hearing
- Absolute threshold of audibly perceptible events in quiet conditions (no other sounds).
- Any signal below the threshold can be removed without effect on perception.

Threshold of Hearing
[Figure: threshold of hearing curve]

Frequency Masking
Schröder spreading function. Bark scale function:

$$z(f) = 13 \arctan(0.00076 f) + 3.5 \arctan\left(\left(\frac{f}{7500}\right)^2\right)$$

$$\Delta z = z(f_{\text{maskee}}) - z(f_{\text{masker}})$$

$$10 \log_{10} F(\Delta z) = 15.81 + 7.5\,(\Delta z + 0.474) - 17.5\left[1 + (\Delta z + 0.474)^2\right]^{1/2}$$

Masking Curve
[Figure]

Primary Tone 1kHz
[Figure]

Masked Tone 900 Hz
[Figure]

Combined Sound 1kHz + 0.9kHz
[Figure]

Combined 1kHz + 0.9kHz (-10dB)
[Figure]

Combined 1kHz + 5kHz (-10dB)
[Figure]
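The Mel and Bark mappings and the Schröder spreading function from the slides above can be evaluated directly. The sketch below is illustrative only: the function handle names (mel, bark, spread_dB) are hypothetical, and the masker and maskee frequencies are taken from the tone examples on the last slides (1 kHz primary tone, 900 Hz and 5 kHz second tones).

```matlab
% Frequency-scale mappings, f in Hz
mel  = @(f) 1127.01048 * log(1 + f / 700);
bark = @(f) 13 * atan(0.00076 * f) + 3.5 * atan((f / 7500).^2);

% Schroeder spreading function in dB; dz = z(f_maskee) - z(f_masker) in Bark
spread_dB = @(dz) 15.81 + 7.5 * (dz + 0.474) ...
                  - 17.5 * sqrt(1 + (dz + 0.474).^2);

f_masker = 1000;                 % primary tone (Hz)
f_maskee = [900 5000];           % second tones from the slide examples (Hz)

dz  = bark(f_maskee) - bark(f_masker);
att = spread_dB(dz);             % masker level spread to each maskee, in dB

for m = 1:numel(f_maskee)
    fprintf('%5d Hz: dz = %+6.2f Bark, spreading = %7.1f dB\n', ...
            f_maskee(m), dz(m), att(m));
end
```

With these formulas the spreading function is close to 0 dB at the masker frequency and falls off much faster for the distant 5 kHz tone than for the neighboring 900 Hz tone, which is the contrast the combined-tone slides illustrate.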