ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 6: Linear Prediction (LPC) 1. 2. 3. 4. Resonance & The Source-Filter Model Linear Prediction (LPC) LP Representations LP Synthesis & Modification Dan Ellis Dept. Electrical Engineering, Columbia University dpwe@ee.columbia.edu E4896 Music Signal Processing (Dan Ellis) http://www.ee.columbia.edu/~dpwe/e4896/ 2013-02-25 - 1 /16 1. Resonance • Resonance is ubiquitous in physical systems e.g. plucked/struck string, drum head room “coloration” vocal tract A B C D Poles easily implemented in LTI filters E4896 Music Signal Processing (Dan Ellis) 0 Position Time • Resonances 0 Velocity 2013-02-25 - 2 /16 Singe Pole-Pair Resonance • imag Simple resonance = Second-order IIR Band Pass Filter 1 0.5 + y[n] z-1 2rcosωc z-1 -r2 r ωc 0 x[n] -0.5 -1 z-plane |H(ejw)| -1 0 ωc 0 -0.5 0 0.5 1 real ω Q= c B B ≈ 2(1-r) 0.1 0.2 E4896 Music Signal Processing (Dan Ellis) 0.3 ω/π impulse_bpf.pd 2013-02-25 - 3 /16 Resonances in Speech • Vocal Tract (throat + tongue + lips) acts as variable resonator freq / Hz resonances = “formants” 4000 3000 2000 1000 hε z e 0 0.5 w has a E4896 Music Signal Processing (Dan Ellis) tcl ^ 0 1 c θ watch 1.5 n time I z/ s I thin as a I d ay m dime 2013-02-25 - 4 /16 Source-Filter Model • Separation of: Source: fine structure in time/frequency Filter: subsequent shaping by physical resonances Formants Pitch t Glottal pulse train Voiced/ unvoiced Vocal tract resonances + Radiation characteristic Speech f Frication noise t Source • Filter Advantages Good match to real signals Salient pieces E4896 Music Signal Processing (Dan Ellis) 2013-02-25 - 5 /16 2. Linear Prediction (LPC) • LPC = Linear Predictive Coding remove redundancy in signal try to predict next point as linear combination of previous values p s[n] = ak s[n k] + e[n] k=1 {ak } are pth order linear predictor coefficients e[n] is residual “innovation” a/k/a prediction error • Transfer function S(z) = E(z) 1 1 p k=1 ak z k 1 = A(z) all-pole “autoregressive” (AR) modeling E4896 Music Signal Processing (Dan Ellis) 2013-02-25 - 6 /16 Voice Modeling & LPC • Direct expression of source-filter model p s[n] = ak s[n k] + e[n] k=1 Pulse/noise excitation e[n] H(z) Vocal tract H(z) = 1/A(z) s[n] |H(ejω)| z-plane f acoustic tube model of vocal tract is all-pole vocal tract resonances change slowly ~ 10-20ms but: nasals E4896 Music Signal Processing (Dan Ellis) 2013-02-25 - 7 /16 • You can “see” resonances in a spectral slice: energy / dB Estimating LPC Models -40 -60 -80 -100 0 1000 2000 • We can find LPC coefficients {ae[n]} to minimize energy of residual k n s[n] n freq / Hz : 2 p e2 [n] = 3000 ak s[n k] k=1 differentiate w.r.t. ak & solve end up with p linear equations involving s[n j]s[n autocorrelations rss (|j k|) = k] n E4896 Music Signal Processing (Dan Ellis) 2013-02-25 - 8 /16 LPC Illustration 0.1 0 windowed original -0.1 -0.2 LPC residual -0.3 0 dB 50 100 150 200 250 300 350 400 time / samp original spectrum 0 LPC spectrum -20 -40 -60 residual spectrum 0 1000 2000 3000 4000 5000 6000 7000 freq / Hz • Actual poles: z-plane E4896 Music Signal Processing (Dan Ellis) E4896_L06_lpc_diary 2013-02-25 - 9 /16 Short-Time LP Analysis freq / kHz • Solve LPC for each ~20 ms frame 8 6 4 2 0 20 Imaginary Part 1 10 5 12 0 0 -0.5 -1 freq / kHz 15 0.5 8 -5 -10 -1 0 Real Part 1 -15 0 0.2 0.4 0.6 0.8 1 6 4 2 0 0.5 1 E4896 Music Signal Processing (Dan Ellis) 1.5 2 2.5 3 time / s 2013-02-25 - 10/16 3. LP Representations • Can interpret LPC filter fit many ways: • Picking out resonances if signal was source + resonances, should find them approximation • Low-ordere spectral [n] |E(e )| j 2 minimizing 2 also minimizes different from e.g. Fourier approximation... • Finding & removing smooth spectrum 1 A(z) is smooth approximation of S(z) S(z) 1 = E(z) A(z) E(z) = S(z)A(z) is “unsmoothed” S(z) • Signal whitening removing linear dependence makes residual like white noise (iid, flat spectrum) E4896 Music Signal Processing (Dan Ellis) 2013-02-25 - 11/16 • Alternative Forms Many formulations for pth order all-pole IIR: predictor coefficients {ak } / polynomial A(z) roots { i = ri ej i } of A(z) reflection coefficients (for lattice filter structure) Line Spectral Frequencies (LSF) • Choice depends on: mathematical convenience numerical stability statistical properties (e.g. for coding) opportunities for modification E4896 Music Signal Processing (Dan Ellis) 2013-02-25 - 12/16 4. LPC Synthesis • LP analysis on ~20ms frames gives prediction filter A(z) and residual e[n] recombining them should yield perfect s[n] coding applications further compress e[n] Encoder Decoder Filter coefficients {ai} |1/A(ej!)| Input s[n] Represent & encode f LPC analysis Excitation generator Represent & encode Residual e[n] ^ e[n] All-pole filter H(z) = t Output ^ s[n] 1 1 - "aiz-i e.g. simple pitch tracker ➝ “buzz-hiss” encoding Pitch period values 16 ms frame boundaries 100 50 0 -50 1.3 1.35 E4896 Music Signal Processing (Dan Ellis) 1.4 1.45 1.5 1.55 1.6 1.65 1.7 1.75 time / s 2013-02-25 - 13/16 LPC Warping • Replacing delays with allpass elements warps frequencies but not magnitudes z-1 z+ z+1 Original W Frequency 8000 P 0.8 A = 0.6 6000 4000 2000 0 0.6 0.5 1 1.5 2 Time Warped LPC resynth, 0.4 2.5 3 2.5 3 = -0.2 A = -0.6 0 0 0.2 0.4 0.6 0.8 ^ W P Frequency 8000 0.2 6000 4000 2000 0 0.5 1 1.5 2 Time http://www.ee.columbia.edu/~dpwe/resources/matlab/polewarp/ E4896 Music Signal Processing (Dan Ellis) 2013-02-25 - 14/16 Cross-Synthesis • Mix residual (source) of one signal with resonances (filter) of another or: just use white noise as excitation formants carry phonemes ➝ vocoder E4896 Music Signal Processing (Dan Ellis) 2013-02-25 - 15/16 Summary • Resonances (poles) color sound • Source + Filter model decouples excitation and resonances • Linear Prediction is a simple way to model and implement resonances (filter) • Many interpretations, representations, modifications E4896 Music Signal Processing (Dan Ellis) 2013-02-25 - 16/16