ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 5: Sinusoidal Modeling 1. 2. 3. 4. Sinusoidal Modeling Sinusoidal Analysis Sinusoidal Synthesis & Modification Noise Residual Dan Ellis Dept. Electrical Engineering, Columbia University dpwe@ee.columbia.edu E4896 Music Signal Processing (Dan Ellis) http://www.ee.columbia.edu/~dpwe/e4896/ 2013-02-18 - 1 /16 1. Sinusoidal Modeling • Periodic sounds each ridge is a sinusoidal harmonic .. with smoothly-varying parameters Violin.arco.ff.A4 ➔ ridges in spectrogram .. an efficient & flexible description? E4896 Music Signal Processing (Dan Ellis) 2013-02-18 - 2 /16 Sinusoid Modeling • Analogous to Fourier series model harmonics explicitly? e.g. x[n] = ak [n]cos( k [n]) k ... for pitched signal with fundamental k [n] = k · 0 [n] · n 0 [n] • Additional constraints harmonicity smoothness of ak [n] • Arbitrarily accurate given enough sinusoids E4896 Music Signal Processing (Dan Ellis) 2013-02-18 - /16 Examples • Using Michael Klingbeil’s SPEAR http://www.klingbeil.com/spear/ E4896 Music Signal Processing (Dan Ellis) 2013-02-18 - 4 /16 Envelope Limitations • Extracted envelope reflects analysis window 2000 0.3 1500 Frequency 0.4 0.2 0.1 0 1000 500 0 10 20 30 40 50 60 0 70 2000 0.3 1500 Frequency 0.4 0.2 0.1 0 0 0.5 1 Time 1.5 2 0 0.5 1 Time 1.5 2 1000 500 0 10 20 30 40 50 60 70 0 • Sharp window violates assumptions E4896 Music Signal Processing (Dan Ellis) 2013-02-18 - 5 /16 2. Sinusoidal Analysis • Sinusoids = peaks in spectrogram slices N 1 = DFT frames X[k, m] = x[n + mL] w[n]e j 2 Nkn n=0 • DFT length N window determines frequency resolution: X(ej ) long enough to “see harmonics” e.g. 2-3x longest pitch cycle typically 50-100 ms but: too long → blurs amplitude envelope ak [n] W (ej ) • Hop advance L choose N/2 or N/4 .. denser for simpler interpolation along time E4896 Music Signal Processing (Dan Ellis) 2013-02-18 - 6 /16 Sinusoidal Peak Picking • Local maxima in DFT frames freq / Hz 8000 6000 4000 2000 level / dB 0 0 20 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 time / s 0 -20 -40 -60 0 1000 2000 3000 4000 5000 6000 7000 freq / Hz 20 y 10 ab2/4 0 -10 0 y = ax(x-b) x b/2 -20 400 600 E4896 Music Signal Processing (Dan Ellis) phase / rad level / dB • Quadratic fit for sub-bin resolution 800 freq / Hz -5 -10 400 600 800 freq / Hz 2013-02-18 - 7 /16 Peak Selection • Don’t want every peak just “true” sinusoids threshold? level / dB 20 0 -20 -40 -60 0 1000 2000 3000 local shape - fits 4000 ( • Look for stability 5000 0) 6000 7000 freq / Hz W (ej ) of frequency & amplitude in successive time frames phase derivative in time/freq E4896 Music Signal Processing (Dan Ellis) 2013-02-18 - 8 /16 Track Formation • Connect peaks in adjacent frames to form sinusoids freq can be ambiguous if large frequency changes birth existing tracks death new peaks time • Unclaimed peak → create new track • No continuation of track → termination hysteresis E4896 Music Signal Processing (Dan Ellis) 2013-02-18 - 9 /16 Pitch Tracking • Extracted sinusoids could be anywhere 6000 freq / Hz freq / Hz but often expect them to be in harmonic series 4000 2000 0 0 0.05 0.1 0.15 0.2 time / s 700 650 600 550 0 0.05 0.1 0.15 0.2 factor • Find pitch by searching for common [n] = k · [n] can then “regularize” pitch E4896 Music Signal Processing (Dan Ellis) k time / s 0 2013-02-18 - 10/16 3. Sinusoidal Synthesis • Each sinusoid track {a [n], level drives an oscillator k [n]} k 3 3 2 2 a k[n] 1 a k[n]·cos(W k[n]·t) 0 0 700 Hz / q fre 1 -1 600 500 0 Wk[n] 0.05 0.1 0.15 0.2 n -2 time / s -3 0 0.05 0.1 0.15 time / s 0.2 can interpolate amplitude, frequency samples • Faster method synthesizes DFT frames then overlap-add trickier to achieve frequency modulation E4896 Music Signal Processing (Dan Ellis) 2013-02-18 - 11/16 Sinusoidal Modification • Sinusoidal description very easy to modify e.g. changing time base of sample points freq / Hz 5000 4000 3000 2000 1000 0 0 0.05 0.1 0.15 0.2 0.25 0.3 • Frequency stretch 0.35 0.4 0.45 0.5 time / s 40 level / dB level / dB preserve formant envelope? 30 20 10 0 0 1000 2000 3000 4000 freq / Hz E4896 Music Signal Processing (Dan Ellis) 40 30 20 10 0 0 1000 2000 3000 4000 freq / Hz 2013-02-18 - 12/16 4. Noise Residual • Some energy is not well fit with sinusoids e.g. noisy energy • Can just keep it as residual or model it some other way • Leads to “sinusoidal + noise” model x[n] = ak [n]cos( k [n]n) + e[n] k mag / dB 20 sinusoids 0 original -20 -40 -60 -80 residual LPC 0 1000 2000 E4896 Music Signal Processing (Dan Ellis) 3000 4000 5000 6000 7000 freq / Hz 2013-02-18 - 13/16 Sinusoids + Noise Decomposition • Removing sines reveals noise & transients Guitar - original 4000 Frequency 3000 2000 1000 0 0 0.2 0.4 0.6 0.8 1 Time 1.2 1.4 1.6 1.8 2 1.4 1.6 1.8 2 1.4 1.6 1.8 2 Guitar - sinusoid reconstruction 4000 Frequency 3000 2000 1000 0 0 0.2 0.4 0.6 0.8 1 Time 1.2 Guitar - residual (original - sines) 4000 Frequency 3000 2000 1000 0 0 0.2 0.4 0.6 0.8 1 Time 1.2 • Different representation approaches... E4896 Music Signal Processing (Dan Ellis) 2013-02-18 - 14/16 5. Limitations • The spectrogram (mag STFT) is not linear freq / Hz superpositions suffer from phase effects abs(stft(s1)) + abs(stft(s2)) 1400 1300 1300 1200 1200 1100 1100 1000 1000 900 900 800 0 0.5 1 abs(stft(s1+s2)) 1400 800 250 200 150 100 50 0 0.5 time / sec 1 0 • Separating sources is generally hard... parameters tracking E4896 Music Signal Processing (Dan Ellis) 2013-02-18 - 15/16 Summary • Spectrogram shows sinusoid harmonics in many sounds • Peak picking in spectrogram can effectively extract them • Sinusoidal domain extremely flexible for modification • Noise residual can add even more realism E4896 Music Signal Processing (Dan Ellis) 2013-02-18 - 16/16