ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 1: Course Introduction 1. Course Structure 2. DSP: The Short-Time Fourier Transform Dan Ellis Dept. Electrical Engineering, Columbia University dpwe@ee.columbia.edu E4896 Music Signal Processing (Dan Ellis) http://www.ee.columbia.edu/~dpwe/e4896/ 2016-01-20 - 1 /23 freq / kHz Signal Processing and Music 4 3 2 freq / kHz 1 0 4 3 2 freq / kHz 1 0 4 3 2 freq / kHz 1 0 4 3 2 1 0 0 2 4 6 8 10 12 14 16 18 20 time / sec • Music is a very rich signal • Signal Processing can expose some of it • We want to get inside it E4896 Music Signal Processing (Dan Ellis) 2016-01-20 - 2 /23 1. Course Goals • Survey of applications of signal processing to music audio music synthesis music/audio processing (modification) music audio analysis • Connect basic DSP theory to sound phenomena and effects • Hands-on, live investigations of audio processing algorithms using Matlab, Pd, ... E4896 Music Signal Processing (Dan Ellis) 2016-01-20 - 3 /23 Course Structure • Two weekly sessions Monday: presentations, practical exposition, discussion Wednesday: presentations, practical, sharing • Grade structure 20% 10% 30% 40% practicals participation one presentation three mini-project assignments one final project E4896 Music Signal Processing (Dan Ellis) 2016-01-20 - 4 /23 Flipped Classroom • Video Lectures are required viewing before Monday session 30-50 mins each recorded in 2013 bring one question (at least) to class pop quizzes E4896 Music Signal Processing (Dan Ellis) 2016-01-20 - 5 /23 Hands-on Practicals • Music Signal Processing involves connecting algorithms with perceptions • Our goal is to be able to ‘play’ with algorithms to feel how they work In class, on laptops, in small groups • We will use several platforms: PureData (Pd) E4896 Music Signal Processing (Dan Ellis) Matlab Python 2016-01-20 - 6 /23 Projects • There will be 3 “mini projects” assigned approx. week 3, week 5, week 7 provide basic pieces, but open-ended involve implementation & experimentation • Also, Final project on a topic of your choice implement some kind of music signal processing due at end of semester • Good projects ... ... start with clear goals ... apply ideas from class ... evaluate performance & modify accordingly • Could continue into summer... E4896 Music Signal Processing (Dan Ellis) 2016-01-20 - 7 /23 Project Examples Fig. 4. Comparison of effectiveness of different combination of features. • Mood the most effective combination of features. The result is shown in Figure 4 Although there are some features especially useful, I also observe that the number of features enhance the overall classification accuracy. Therefore, the best combination is to use all the features. Classification • Beat Tracking Fig. 6. C two classes happy ar investiga shown in mistaken the same and scary envision This phe would li the diffe combinat to find o result is timber fe found tha are melo effective while the sad song Figure 1: Screenshot of the PD patch implementing the Scheirer algorithm. Much of the logic is hidden inside the “combfilterbank” object at the bottom. 2.2 Tempo Selection Each of the six onset pulse signals (one for each band) is fed into its own bank of 150 comb filters – you can think of these filters as mapping to 150 possible tempos that we want to detect. A comb filter is essentially just a delay implemented a circular buffer. The delay time on each filters corresponds to the tempo that that filter is meant to detect – so for example, the filter that is detecting a tempo of 90 bpm has a delay time of 90 60 = 1.5 Hz, which yields a buffer length of 44100 = 29400 samples. The tempo selector then, for each 1.5 bpm, sums the energy across all 6 subbands at that tempo. The tempo with the highest total energy is chosen as the tempo of the piece. Because of the sheer number of filters (150 6 = 900), it was impossible to implement this using standard PD objects. Therefore I created a PD External written in C [3] and used this within my PD patch. The external has six inputs, one for each subband, and has two float outlets – one indicating the detected tempo, and the other indicating the phase. I added a signal outlet as well for E4896 Music Signal Processing (Dan Ellis) Fig. 5. Composition of hits in each class. • Music VI. E VALUATION AND R ESULTS In the evaluation part, I will use 100 test samples (20 samples for each mood class) to evaluate the effectiveness of my classifier. Essentially, there are two tasks I would like to investigate in the evaluation. First, I use the best combination of feature found in the feature selection part, which is to use all of the features. The average hit rate is 0.64. However, I noticed that the accuracy of each mood class are dramatically different. The hit rates for angry and Transcription In this classifier ness of works, w when cla a key ro number o further an classes a features t found. Fo songs are unexplain In the fu 2016-01-20 - 8 /23 Presentations • Everyone will make at least one in-class presentation e.g. presentation of a relevant paper ... or your latest mini-project ... or just something you’re curious about • We will assign time slots first-come-first served slots each class check course calendar to choose a time email TA • Encourage discussion informal presentations of practical findings E4896 Music Signal Processing (Dan Ellis) 2016-01-20 - 9 /23 Web Site • Information distributed via: http://www.ee.columbia.edu/~dpwe/e4896/ E4896 Music Signal Processing (Dan Ellis) 2016-01-20 - 10/23 Course Outline Jan 20: Intro Mar 07: Pitch tracking Jan 25: Acoustics Mar 21: Time & pitch scaling Feb 01: Perception Mar 28: Beat tracking Feb 08: Analog synthesis Apr 04: Chroma & Chords Feb 15: Sinusoidal models Apr 11: Audio alignment Feb 22: LPC models Apr 18: Music fingerprinting Feb 29: Filtering & Reverb Apr 25: Source separation • Anything missing? E4896 Music Signal Processing (Dan Ellis) 2016-01-20 - 11/23 2. Digital Signals Discrete-time sampling limits bandwidth xd[n] = Q( xc(nT ) ) Discrete-level quantization limits dynamic range time E T • Sampling interval T • Sampling frequency • Quantizer Q(x) = E4896 Music Signal Processing (Dan Ellis) = 2 /T · round(x/ ) 0 2016-01-20 - 12/23 Fourier Series • Observation: Any periodic signal can be constructed from sinusoids at integer multiples of the fundamental frequency M x(t) = x(t + T ) x(t) • E.g., square wave k ( 1) 0 = 0; ak = k 2 1 k=0 1 k k) k = 1, 3, 5, . . . otherwise x(t) 1.0 |ak| 0.5 0 0.5 1 1.5 2 k ak cos( t+ T 1 0.5 0 E4896 Music Signal Processing (Dan Ellis) 0.5 1 t 1.5 1 2 3 4 5 6 7 k 2016-01-20 - 13/23 Fourier Transform • Complex form of Fourier Series + analysis M x(t) j 2T k t ck e k= M 1 ck = T T /2 x(t)e j 2T k t dt T /2 • Let T Harmonics become infinitely close: 1 x(t) = 2 X(j ) = 0.02 X(j )ej t d 0 -0.01 0 x(t)e j t dt x(t) 0.01 0.002 0.004 level / dB -20 0.006 0.008 time / sec |X(jΩ)| -40 -60 -80 0 2000 x(t), X(j ) are equivalent, dual descriptions E4896 Music Signal Processing (Dan Ellis) 4000 6000 8000 freq / Hz 2016-01-20 - 14/23 Spectrum of Periodic Sounds • If sound is (nearly) periodic (i.e., pitched), Fourier Transform approaches Fourier Series x(t + T ) 2 c ( k k k T ) 0.1 level / dB amplitude x(t) X(j ) 0 -0.1 1.52 1.54 1.56 • n.b.: |X(j 1.58 time / s -40 -60 -80 -100 0 1000 2000 3000 4000 freq / Hz )| plotted in log units: dB(x) = 20 log10 (x) E4896 Music Signal Processing (Dan Ellis) 2016-01-20 - 15/23 Discrete-Time Fourier Transf. (DTFT) • Sampled x[n] = x(nT ) has same FT form 1 x[n] = 2 X(ej )ej X(ej ) = x[n]e n d j n n= • ... but results in spectrum with period 2π: |X(ejω)| π 3 arg{X(ejω)} 2 ω π 1 0 π 2π 3π E4896 Music Signal Processing (Dan Ellis) 4π ω 5π 2π 3π 4π 5π −π 2016-01-20 - 16/23 Discrete Fourier Transform (DFT) • N-point finite-length sampled signal the kind you can have on a computer! N 1 1 x[n] = N X[k]WNnk k=0 WN = e N 1 X[k] = j 2N x[n]WN nk n=0 • Just a matrix-multiply between vectors X[0] X[1] X[2] .. . ⇧ ⇧ ⇧ ⇧ ⇧ ⇤ X[N ⇥ 1 ⌃ ⇧ 1 ⌃ ⇧ ⌃ ⇧ 1 ⌃=⇧ ⇧ ⌃ ⇧. ⌅ ⇤ .. 1] 1 WN1 WN2 .. . (N 1) 1 WN E4896 Music Signal Processing (Dan Ellis) 1 WN2 WN4 .. . ··· ··· ··· .. . 2(N 1) ··· WN 1 ⇥ (N 1) ⌃ WN ⌃⇧ 2(N 1) ⌃ ⇧ WN ⌃⇧ ⌃⇧ .. . (N WN x[0] x[1] x[2] .. . ⌃⇧ ⌅⇤ 1)2 x[N ⇥ ⌃ ⌃ ⌃ ⌃ ⌃ ⌅ 1] 2016-01-20 - 17/23 Short-Time Fourier Transform • To localize energy in time and frequency... chop signal into N-point windows every L points take DFT of each one 0.1 L 2L 3L 0 -0.1 short-time 2.35 2.4 2.45 2.5 2.55 2.6 window time / s DFT freq / Hz k→ 4000 N 1 3000 X[k, m] = 2000 n=0 1000 0 x[n + mL] w[n]e j 2 Nkn m= 0 m=1 m=2 m=3 E4896 Music Signal Processing (Dan Ellis) 2016-01-20 - 18/23 The Spectrogram • Plot STFT |X[k, m]| as a grayscale image 0.1 4000 10 0 3000 -10 2000 -20 -30 1000 intensity / dB freq / Hz 0 -40 0 2.35 2.4 2.45 2.5 2.55 2.6 -50 time / s freq / Hz 4000 3000 2000 1000 0 0 0.5 E4896 Music Signal Processing (Dan Ellis) 1 1.5 2 2.5 time / s 2016-01-20 - 19/23 Time-Frequency Tradeoff • Shorter window ➝ better resolution of time detail ➝ worse resolution of frequency detail 0.2 freq / Hz 4000 3000 2000 10 0 1000 -10 0 freq / Hz Window = 48 pt Wideband Window = 256 pt Narrowband 0 -20 4000 -30 -40 3000 -50 level / dB 2000 1000 0 1.4 1.6 E4896 Music Signal Processing (Dan Ellis) 1.8 2 2.2 2.4 2.6 time / s 2016-01-20 - 20/23 Spectrogram in Matlab - • https://www.ee.columbia.edu/~dpwe/e4810/matlab/M14-fft.diary E4896 Music Signal Processing (Dan Ellis) 2016-01-20 - 21/23 Live Matlab Spectrogram • http://www.wakayama-u.ac.jp/~kawahara/MatlabRealtimeSpeechTools/ E4896 Music Signal Processing (Dan Ellis) 2016-01-20 - 22/23 Summary + Assignment • Music Signal Processing synthesis, modification, analysis hands-on investigation • Course participation, practicals, projects, presentations • Digital Signal Processing signals on computers Fourier analysis & spectrogram • Assignment Watch Acoustics video & prepare question Do Pd tutorial http://en.flossmanuals.net/pure-data/ through to “Amplitude Modulation” E4896 Music Signal Processing (Dan Ellis) 2016-01-20 - 23/23