Lecture 1: Course Introduction 1. Course Structure

ELEN E4896 MUSIC SIGNAL PROCESSING
Lecture 1:
Course Introduction
1. Course Structure
2. DSP: The Short-Time Fourier Transform
Dan Ellis
Dept. Electrical Engineering, Columbia University
dpwe@ee.columbia.edu
E4896 Music Signal Processing (Dan Ellis)
http://www.ee.columbia.edu/~dpwe/e4896/
2016-01-20 - 1 /23
Signal Processing and Music
[Figure: four stacked spectrogram panels; freq / kHz (0 to 4) vs. time / sec (0 to 20)]
• Music is a very rich signal
• Signal Processing can expose some of it
• We want to get inside it
1. Course Goals
• Survey of applications of signal processing to music audio
music synthesis
music/audio processing (modification)
music audio analysis
• Connect basic DSP theory to sound phenomena and effects
• Hands-on, live investigations of audio processing algorithms
using Matlab, Pd, ...
Course Structure
• Two weekly sessions
Monday: presentations, practical exposition, discussion
Wednesday: presentations, practical, sharing
• Grade structure
20% practicals participation
10% one presentation
30% three mini-project assignments
40% one final project
Flipped Classroom
• Video Lectures are required viewing before Monday session
30-50 mins each
recorded in 2013
bring one question (at least) to class
pop quizzes
Hands-on Practicals
• Music Signal Processing involves connecting algorithms with perceptions
• Our goal is to be able to ‘play’ with
algorithms to feel how they work
In class, on laptops, in small groups
• We will use several platforms:
PureData (Pd)
Matlab
Python
Projects
• There will be 3 “mini projects” assigned
approx. week 3, week 5, week 7
provide basic pieces, but open-ended
involve implementation & experimentation
• Also, Final project
on a topic of your choice
implement some kind of music signal processing
due at end of semester
• Good projects ...
... start with clear goals
... apply ideas from class
... evaluate performance & modify accordingly
• Could continue into summer...
Project Examples
• Mood Classification
Fig. 4. Comparison of effectiveness of different combinations of features.
... the most effective combination of features. The result is shown in Figure 4. Although some features are especially useful, I also observe that a larger number of features enhances the overall classification accuracy. Therefore, the best combination is to use all the features.
• Beat Tracking
Figure 1: Screenshot of the PD patch implementing the Scheirer algorithm. Much of the logic is hidden inside the “combfilterbank” object at the bottom.
2.2 Tempo Selection
Each of the six onset pulse signals (one for each band) is fed into its own bank of 150 comb filters; you can think of these filters as mapping to 150 possible tempos that we want to detect. A comb filter is essentially just a delay implemented as a circular buffer. The delay time of each filter corresponds to the tempo that the filter is meant to detect. For example, the filter that detects a tempo of 90 bpm corresponds to a beat rate of $90/60 = 1.5$ Hz, which yields a buffer length of $44100/1.5 = 29400$ samples. The tempo selector then, for each bpm, sums the energy across all 6 subbands at that tempo. The tempo with the highest total energy is chosen as the tempo of the piece.
Because of the sheer number of filters ($150 \times 6 = 900$), it was impossible to implement this using standard PD objects. Therefore I created a PD External written in C [3] and used this within my PD patch. The external has six inputs, one for each subband, and has two float outlets: one indicating the detected tempo, and the other indicating the phase. I added a signal outlet as well for ...
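The tempo-to-delay arithmetic in the excerpt above can be sketched in a few lines of Python. This is only an illustration, not the student's C external: the filter form y[n] = (1-a)x[n] + a·y[n-D], the feedback value 0.9, the single band, the four candidate tempos, and the 1 kHz toy sample rate are all assumptions chosen to keep the example small.

```python
def delay_in_samples(bpm, sample_rate):
    """Samples between beats at the given tempo (the comb-filter delay)."""
    beats_per_second = bpm / 60.0
    return round(sample_rate / beats_per_second)


def comb_filter_energy(signal, delay, feedback=0.9):
    """Run y[n] = (1-a)*x[n] + a*y[n-delay] and return the output energy."""
    buffer = [0.0] * delay              # circular buffer of the last `delay` outputs
    energy = 0.0
    for n, x in enumerate(signal):
        idx = n % delay                 # slot written `delay` samples ago
        y = (1.0 - feedback) * x + feedback * buffer[idx]
        buffer[idx] = y                 # overwrite the oldest output
        energy += y * y
    return energy


def pick_tempo(onsets, sample_rate, candidate_bpms):
    """Return the candidate tempo whose comb filter resonates most strongly."""
    return max(
        candidate_bpms,
        key=lambda bpm: comb_filter_energy(onsets, delay_in_samples(bpm, sample_rate)),
    )


# The 90 bpm example from the text: 44100 / 1.5 samples.
buffer_length = delay_in_samples(90, 44100)

# Toy onset signal: one impulse per beat at 90 bpm, 10 seconds at 1 kHz (assumed rate).
toy_rate = 1000
period = delay_in_samples(90, toy_rate)
onsets = [1.0 if n % period == 0 else 0.0 for n in range(10 * toy_rate)]
detected = pick_tempo(onsets, toy_rate, [60, 90, 120, 150])
```

Only the filter whose delay matches the impulse spacing builds up energy over successive beats; the mismatched filters just produce decaying echoes.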
Fig. 5. Composition of hits in each class.
VI. EVALUATION AND RESULTS
In the evaluation part, I will use 100 test samples (20 samples for each mood class) to evaluate the effectiveness of my classifier. Essentially, there are two tasks I would like to investigate in the evaluation. First, I use the best combination of features found in the feature selection part, which is to use all of the features. The average hit rate is 0.64. However, I noticed that the accuracy differs dramatically between mood classes. The hit rates for angry and ...
• Music Transcription
Presentations
• Everyone will make at least one in-class presentation
e.g. presentation of a relevant paper
... or your latest mini-project
... or just something you’re curious about
• We will assign time slots first-come, first-served
slots each class
check course calendar to choose a time
email TA
• Encourage discussion
informal presentations of practical findings
Web Site
• Information distributed via:
http://www.ee.columbia.edu/~dpwe/e4896/
Course Outline
Jan 20: Intro
Jan 25: Acoustics
Feb 01: Perception
Feb 08: Analog synthesis
Feb 15: Sinusoidal models
Feb 22: LPC models
Feb 29: Filtering & Reverb
Mar 07: Pitch tracking
Mar 21: Time & pitch scaling
Mar 28: Beat tracking
Apr 04: Chroma & Chords
Apr 11: Audio alignment
Apr 18: Music fingerprinting
Apr 25: Source separation
• Anything missing?
2. Digital Signals
Discrete-time sampling limits bandwidth
Discrete-level quantization limits dynamic range
$$x_d[n] = Q(x_c(nT))$$
[Figure: sampled and quantized waveform vs. time, with sampling interval $T$ and quantization step $\epsilon$ marked]
• Sampling interval $T$
• Sampling frequency $\Omega_T = 2\pi/T$
• Quantizer $Q(x) = \epsilon \cdot \mathrm{round}(x/\epsilon)$
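As a minimal sketch of sampling followed by uniform quantization, the snippet below samples a cosine and rounds each sample to a multiple of the step size. The 440 Hz tone, the 8 kHz sampling rate, and the step size of 1/8 are illustrative assumptions, not values from the slides.

```python
import math


def sample(x_c, T, num_samples):
    """Sample the continuous-time function x_c at t = n*T."""
    return [x_c(n * T) for n in range(num_samples)]


def quantize(x, eps):
    """Uniform quantizer Q(x) = eps * round(x / eps).

    Python's round() sends exact ties to the nearest even integer.
    """
    return eps * round(x / eps)


# Assumed example values: 440 Hz cosine, 8 kHz sampling, step size 1/8.
T = 1.0 / 8000.0
eps = 0.125


def x_c(t):
    return math.cos(2 * math.pi * 440.0 * t)


x_sampled = sample(x_c, T, 32)
x_d = [quantize(v, eps) for v in x_sampled]

# The quantization error never exceeds half a step.
max_error = max(abs(q - v) for q, v in zip(x_d, x_sampled))
```

A smaller step size shrinks the worst-case error, which is the sense in which quantization limits dynamic range.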
Fourier Series
• Observation:
Any periodic signal can be constructed from sinusoids at integer multiples of the fundamental frequency
$$x(t) = x(t + T) \;\Rightarrow\; x(t) \approx \sum_{k=0}^{M} a_k \cos\!\left(\frac{2\pi k}{T}\, t + \phi_k\right)$$
• E.g., square wave
$$\phi_k = 0; \quad a_k = \begin{cases} (-1)^{\frac{k-1}{2}}\, \dfrac{1}{k} & k = 1, 3, 5, \ldots \\ 0 & \text{otherwise} \end{cases}$$
[Figure: square wave $x(t)$ vs. $t$, and coefficient magnitudes $|a_k|$ vs. $k$ for $k = 1 \ldots 7$]
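The square-wave series can be checked numerically. The sketch below evaluates the partial sum with the coefficients given above and all phases set to zero; the period T = 1 and the choices of M are arbitrary. With this scaling the sum settles at +π/4 in the middle of the positive half-cycle and -π/4 in the middle of the negative one.

```python
import math


def a_k(k):
    """Square-wave coefficient: (-1)^((k-1)/2) / k for odd k, 0 otherwise."""
    if k % 2 == 1:
        return (-1) ** ((k - 1) // 2) / k
    return 0.0


def fourier_partial_sum(t, T, M):
    """Evaluate sum_{k=0}^{M} a_k * cos(2*pi*k*t/T), with every phi_k = 0."""
    return sum(a_k(k) * math.cos(2 * math.pi * k * t / T) for k in range(M + 1))


T = 1.0
for M in (1, 7, 2001):
    # More harmonics bring the value at t = 0 closer to pi/4.
    print(M, fourier_partial_sum(0.0, T, M), math.pi / 4)
```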
Fourier Transform
• Complex form of Fourier Series + analysis
$$x(t) \approx \sum_{k=-M}^{M} c_k\, e^{j \frac{2\pi k}{T} t}$$
$$c_k = \frac{1}{T} \int_{-T/2}^{T/2} x(t)\, e^{-j \frac{2\pi k}{T} t}\, dt$$
• Let $T \to \infty$
Harmonics become infinitely close:
$$x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(j\Omega)\, e^{j\Omega t}\, d\Omega$$
$$X(j\Omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\Omega t}\, dt$$
[Figure: waveform $x(t)$ vs. time / sec, and spectrum $|X(j\Omega)|$, level / dB vs. freq / Hz (0 to 8000)]
• $x(t)$, $X(j\Omega)$ are equivalent, dual descriptions
Spectrum of Periodic Sounds
• If sound is (nearly) periodic (i.e., pitched), Fourier Transform approaches Fourier Series
$$x(t) \approx x(t + T) \;\Rightarrow\; X(j\Omega) \approx \sum_k c_k\, \delta\!\left(\Omega - \frac{2\pi k}{T}\right)$$
[Figure: waveform $x(t)$, amplitude vs. time / s, and spectrum $|X(j\Omega)|$, level / dB vs. freq / Hz (0 to 4000)]
• n.b.: $|X(j\Omega)|$ plotted in log units:
$$\mathrm{dB}(x) = 20 \log_{10}(x)$$
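The decibel conversion is a one-liner; a small sketch follows. The `floor` argument is an added safeguard against taking the log of zero and is not part of the slide's definition, and the example amplitudes are illustrative.

```python
import math


def db(x, floor=1e-10):
    """Level in decibels, dB(x) = 20*log10(x); `floor` guards against log(0)."""
    return 20.0 * math.log10(max(x, floor))


# Harmonic amplitudes 1, 1/3, 1/5 (as in a square wave) expressed in dB.
levels = [db(1.0 / k) for k in (1, 3, 5)]
```

Halving an amplitude lowers its level by about 6 dB, and a factor of ten corresponds to 20 dB.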
Discrete-Time Fourier Transform (DTFT)
• Sampled $x[n] = x(nT)$ has same FT form
$$x[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega$$
$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n}$$
• ... but results in spectrum with period 2π:
[Figure: $|X(e^{j\omega})|$ and $\arg\{X(e^{j\omega})\}$ vs. $\omega$, from $-\pi$ to $5\pi$]
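The 2π periodicity is easy to confirm numerically for a finite sequence. In this sketch the sequence and the test frequency are arbitrary example values.

```python
import cmath
import math


def dtft(x, omega):
    """Evaluate X(e^{j*omega}) = sum_n x[n] * e^{-j*omega*n} for a finite sequence x."""
    return sum(xn * cmath.exp(-1j * omega * n) for n, xn in enumerate(x))


x = [1.0, 0.5, -0.25, 0.125]    # arbitrary example sequence
omega = 0.7                     # arbitrary frequency in radians/sample

# Shifting omega by 2*pi leaves every term e^{-j*omega*n} unchanged for integer n.
difference = abs(dtft(x, omega) - dtft(x, omega + 2 * math.pi))
```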
Discrete Fourier Transform (DFT)
• N-point finite-length sampled signal
the kind you can have on a computer!
$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, W_N^{nk} \qquad W_N = e^{j \frac{2\pi}{N}}$$
$$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{-nk}$$
• Just a matrix-multiply between vectors
$$
\begin{bmatrix} X[0] \\ X[1] \\ X[2] \\ \vdots \\ X[N-1] \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & W_N^{-1} & W_N^{-2} & \cdots & W_N^{-(N-1)} \\
1 & W_N^{-2} & W_N^{-4} & \cdots & W_N^{-2(N-1)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & W_N^{-(N-1)} & W_N^{-2(N-1)} & \cdots & W_N^{-(N-1)^2}
\end{bmatrix}
\begin{bmatrix} x[0] \\ x[1] \\ x[2] \\ \vdots \\ x[N-1] \end{bmatrix}
$$
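To make the matrix view concrete, here is a minimal sketch that builds the N x N DFT matrix explicitly and applies it as a matrix-vector product. It assumes the convention W_N = e^{j2π/N} with X[k] = Σ x[n] W_N^{-nk}; defining W_N with the opposite sign and using positive exponents gives the same transform. It uses plain Python lists to stay self-contained and costs O(N²) operations; in practice an FFT routine would be used instead.

```python
import cmath
import math


def dft_matrix(N):
    """N x N matrix whose (k, n) entry is W_N^{-nk} = e^{-j*2*pi*n*k/N}."""
    return [[cmath.exp(-2j * math.pi * n * k / N) for n in range(N)]
            for k in range(N)]


def dft(x):
    """X[k] = sum_n x[n] * W_N^{-nk}, computed as a matrix-vector product."""
    return [sum(w * xn for w, xn in zip(row, x)) for row in dft_matrix(len(x))]


def idft(X):
    """x[n] = (1/N) * sum_k X[k] * W_N^{nk}: conjugate matrix, scaled by 1/N."""
    N = len(X)
    return [sum(w.conjugate() * Xk for w, Xk in zip(row, X)) / N
            for row in dft_matrix(N)]


x = [1.0, 2.0, 3.0, 4.0]        # arbitrary example signal
X = dft(x)
x_back = idft(X)                # should reproduce x up to rounding error
```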
Short-Time Fourier Transform
• To localize energy in time and frequency...
chop signal into N-point windows every L points
take DFT of each one
$$X[k, m] = \sum_{n=0}^{N-1} x[n + mL]\, w[n]\, e^{-j \frac{2\pi k n}{N}}$$
[Figure: waveform (time / s, 2.35 to 2.6) with short-time windows starting at 0, L, 2L, 3L; the DFT of each window forms one column m = 0, 1, 2, 3 of a time-frequency grid, with freq / Hz (0 to 4000) along index k]
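The windowing-and-DFT recipe can be sketched directly from the definition of X[k, m]. The Hann window, the sizes N = 32 and L = 16, and the test tone are assumptions for illustration; the result is stored as frames[m][k] rather than X[k, m]. This direct form is slow and is meant only to mirror the formula.

```python
import cmath
import math


def hann(N):
    """Periodic Hann window w[n] = 0.5 - 0.5*cos(2*pi*n/N)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]


def stft(x, N, L):
    """Return frames[m][k] = sum_{n=0}^{N-1} x[n + m*L] * w[n] * e^{-j*2*pi*k*n/N}."""
    w = hann(N)
    num_frames = 1 + (len(x) - N) // L      # only windows that fit entirely in x
    frames = []
    for m in range(num_frames):
        segment = [x[n + m * L] * w[n] for n in range(N)]
        frames.append([
            sum(s * cmath.exp(-2j * math.pi * k * n / N)
                for n, s in enumerate(segment))
            for k in range(N)
        ])
    return frames


# Test tone with exactly 4 cycles per 32-sample window, so its energy sits at k = 4.
N, L = 32, 16
x = [math.cos(2 * math.pi * 4 * n / N) for n in range(128)]
X = stft(x, N, L)

# Strongest bin in the lower half of the spectrum, frame by frame.
peak_bins = [max(range(N // 2 + 1), key=lambda k: abs(frame[k])) for frame in X]
```

Taking the magnitude of each entry and plotting frames against time gives the spectrogram.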
The Spectrogram
• Plot STFT |X[k, m]| as a grayscale image
[Figure: waveform and its spectrogram, freq / Hz (0 to 4000) vs. time / s (2.35 to 2.6), intensity / dB from 10 to -50; below, a spectrogram of a longer excerpt, freq / Hz (0 to 4000) vs. time / s (0 to 2.5)]
Time-Frequency Tradeoff
• Shorter window
➝ better resolution of time detail
➝ worse resolution of frequency detail
[Figure: waveform and two spectrograms of the same excerpt, freq / Hz (0 to 4000) vs. time / s (1.4 to 2.6), level / dB from 10 to -50. Window = 48 pt: wideband. Window = 256 pt: narrowband.]
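A quick calculation makes the tradeoff concrete for the two window lengths shown. The 8 kHz sampling rate is an assumption (the plots only show a 0 to 4000 Hz axis), and DFT bin spacing is used as a rough proxy for frequency resolution; the actual resolution also depends on the window shape.

```python
def window_resolution(N, sample_rate):
    """Return (window duration in seconds, DFT bin spacing in Hz) for an N-point window."""
    duration = N / sample_rate
    bin_spacing = sample_rate / N
    return duration, bin_spacing


sample_rate = 8000.0    # assumed; not stated on the slide
for N in (48, 256):
    duration, spacing = window_resolution(N, sample_rate)
    print(f"N={N}: {duration * 1000:.1f} ms window, {spacing:.2f} Hz between bins")
```

Under that assumption the 48-point window spans 6 ms with bins about 167 Hz apart, while the 256-point window spans 32 ms with bins 31.25 Hz apart. The product of the two numbers is always 1, so improving one necessarily worsens the other.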
Spectrogram in Matlab
• https://www.ee.columbia.edu/~dpwe/e4810/matlab/M14-fft.diary
Live Matlab Spectrogram
• http://www.wakayama-u.ac.jp/~kawahara/MatlabRealtimeSpeechTools/
Summary + Assignment
• Music Signal Processing
synthesis, modification, analysis
hands-on investigation
• Course
participation, practicals, projects, presentations
• Digital Signal Processing
signals on computers
Fourier analysis & spectrogram
• Assignment
Watch Acoustics video & prepare question
Do Pd tutorial http://en.flossmanuals.net/pure-data/
through to “Amplitude Modulation”