Perceptually-Inspired Music Audio Analysis
Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Eng., Columbia Univ., NY USA
dpwe@ee.columbia.edu
http://labrosa.ee.columbia.edu/

1. Perceptually-Inspired Analysis
2. The Acoustic Structure of Music
3. Music Scene Analysis
4. Large Music Collections
5. Open Issues
Music Audio Analysis - Dan Ellis, 2012-01-07
1. Perceptually-Inspired Analysis
• Machine Listening: extracting useful information from sound
  a grid of tasks against domains (environmental sound, speech, music):
  detect: VAD, speech/music discrimination
  classify: ASR, music transcription, environment awareness
  describe: automatic narration, emotion, music recommendation, “sound intelligence”
Listening to Mixtures
Bregman ’90
• The world is cluttered; sound is transparent, so mixtures are inevitable
• Useful information is structured by ‘sources’
  a specific definition of a ‘source’: intentional independence
Scene Analysis
• Detect separate events via common onset and common harmonicity (Pierce ’83)
  [spectrogram, 0-8000 Hz over 9 s, showing grouped harmonic events]
  instruments & timbre
2. The Acoustic Structure of Music
  [four spectrograms, 0-4 kHz over 20 s, illustrating the structures below]
• Pitches, Voices, Rhythm
Pitch, Harmony, Consonance
• Musical intervals relate to harmonic proximity
• Pitch Helix
Timbre
Grey ’75
• The property that distinguishes instruments
spectrum, noise, onset, dynamics, ...
Rhythm
• Periodic “events” perceived as a structure (McKinney & Moelants ’06)
  [tapping-tempo histogram and spectrogram of the Harnoncourt excerpt, 0-30 s]
  hierarchy, swing
Sequences & Streaming
• Perceptual effects of sequences, e.g. streaming:
  alternating tones around 1 kHz (TRT 60-150 ms, Δf up to 2 octaves)
  split into separate perceptual streams
• Music is built of sequences at many different levels
Music and Scene Analysis
• Music appears designed to “defeat” auditory scene analysis:
  harmonic relations → overlapped harmonics
  rhythmic playing → synchronized onsets
  co-ordinated ensembles → mutual dependence of sources
  [spectrograms, 0-8 kHz over 10 s: natural scene (city22.wav)
  vs. musical scene (brodsky.wav)]
• Maybe that’s why we like it!
3. Music Scene Analysis
• Interesting music audio analysis tasks are perceptually defined:
  what do listeners hear?
  [analysis of “Let it Be” (final verse), 162-174 s: signal spectrogram;
  melody and piano transcriptions (C2-C5); onsets & beats; per-frame
  chroma; per-beat normalized chroma]
Note Transcription
• Goal: Recover the score (notes, timing, voices)
  [spectrogram, 0-4000 Hz over 5 s, with aligned note events]
  musicians can (be trained to) do it
• Framework: find the best-matching synthesis parameters?
  note events {tk, pk, ik} → synthesis → compare to observations X[k,n]
  (e.g. 2-D convolution of a note template)
Note Transcription Problems
• “Oh, I’m just about to lose my mind...”
  [spectrogram, 0-4 kHz over 3.5 s, with spectral slices and waveform close-ups]
  noise / multiple f0s
  note segmentation
  voice activity detection
  unclear f0
Pitch Templates
• Harmonic series appear as fixed patterns on log-frequency spectrograms
  matched filters can enhance the fundamental; look for the largest peak?
  [log-frequency spectrogram (250-8000 Hz, 0-3.5 s) convolved with a
  harmonic-comb template]
Sinusoid Tracks (Maher & Beauchamp ’94)
• Notes generate multiple harmonics in sinusoid analysis
  find pitches by grouping them?
  [sinusoid tracks, 0-3000 Hz over 4 s]
• Problems
  when to “break tracks”
  how to group (harmonicity, onset)
Iterative Removal
Klapuri ’01, ’06
• At each frame:
  estimate the dominant f0 by checking for harmonics
  cancel it from the spectrum
  repeat until no f0 is prominent
  [processing chain: audio frame → harmonics enhancement → predominant f0
  estimation → f0 spectral smoothing → subtract & iterate;
  stop when no more prominent f0s]
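The estimate/cancel/repeat loop above can be sketched compactly. Below is a minimal Python/NumPy sketch (the deck's own code examples are Matlab); the function name `iterative_f0`, the 1 Hz candidate grid, the 5-harmonic sum, and the 10% prominence threshold are illustrative assumptions, not Klapuri's actual parameters.

```python
import numpy as np

def iterative_f0(mag, sr, n_fft, fmin=80.0, fmax=500.0, n_harm=5, thresh=0.1):
    """Iterative-removal multipitch sketch: pick the candidate f0 whose
    summed harmonic magnitudes dominate, zero those harmonics out of the
    magnitude spectrum `mag`, and repeat until nothing is prominent."""
    mag = mag.copy()
    total = mag.sum() + 1e-12
    cands = np.arange(fmin, fmax, 1.0)          # assumed 1 Hz f0 grid
    f0s = []
    for _ in range(10):                          # cap at 10 voices
        scores = []
        for f0 in cands:
            bins = np.round(f0 * np.arange(1, n_harm + 1)
                            * n_fft / sr).astype(int)
            scores.append(mag[bins[bins < len(mag)]].sum())
        best = int(np.argmax(scores))
        if scores[best] < thresh * total:        # no prominent f0 left
            break
        f0s.append(float(cands[best]))
        bins = np.round(cands[best] * np.arange(1, n_harm + 1)
                        * n_fft / sr).astype(int)
        mag[bins[bins < len(mag)]] = 0.0         # cancel its harmonics
    return f0s
```

On a toy spectrum containing two harmonic series, the loop recovers both fundamentals and then stops.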
Probabilistic Model
Goto ’97, ’00
• Generative model: the observed spectrum is a weighted combination
  of tone models at specific f0s

  p(x(f)) = Σ_m ∫ w(F, m) p(x(f) | F, m) dF

  [tone models m = 1, 2: harmonic weights c(t)(h | F, m) at F, F+1200,
  F+1902, F+2400 cents]
  ‘knowledge’ lives in the tone models & prior distributions for f0
• Is it perceptually relevant?
Trained Pitch Classifier
• Exchange signal models for data (Poliner & Ellis ’05, ’06, ’07):
  transcription as a pure classification problem
  Training data and features: MIDI, multi-track recordings, playback piano, &
  resampled audio (less than 28 mins of train audio); normalized magnitude STFT
  Classification: N binary SVMs (one for each note); independent frame-level
  classification on a 10 ms grid; distance to the class boundary as posterior
  Temporal smoothing: two-state (on/off) independent HMM for each note, with
  parameters learned from training data; find the Viterbi sequence for each note
Instrument Modeling
• Use NMF to model the spectrum as templates + activations (Grindlay & Ellis ’09, ’11):
  X = W · H
• Eigeninstrument bases constrain the instrument spectra
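The X = W · H factorization above can be computed with standard multiplicative updates. A minimal Python/NumPy sketch (the deck's code is Matlab) of plain Euclidean-cost NMF, without the eigeninstrument constraint the slide describes; the function name and iteration count are illustrative.

```python
import numpy as np

def nmf(X, r, n_iter=500, seed=0):
    """Multiplicative-update NMF with Euclidean cost (Lee & Seung):
    factor a nonnegative spectrogram X (freq x time) into templates
    W (freq x r) and activations H (r x time), so X ~ W @ H."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    W = rng.random((F, r)) + 0.1
    H = rng.random((r, T)) + 0.1
    for _ in range(n_iter):
        # these updates keep W, H nonnegative and do not increase ||X - WH||^2
        H *= (W.T @ X) / (W.T @ W @ H + 1e-9)
        W *= (X @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H
```

For an exactly low-rank nonnegative input, the reconstruction error drops close to zero within a few hundred updates.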
Rhythm Tracking
• Rhythm/beat tracking has 2 main components:
  front end: extract ‘events’ from audio
  back end: find a plausible beat sequence to match
  Audio → onset event detection → beat marking (+ musical knowledge) → beat times etc.
• Other outputs: tempo, time signature, metrical level(s)
Onset Detection
• Simplest thing is the energy envelope

  e(n0) = Σ_{n=-W/2}^{W/2} w[n] |x(n + n0)|²

  emphasis on high frequencies? e.g. weight the spectrogram as f · |X(f, t)|
  [envelopes for the Harnoncourt and Maracatu excerpts, 0-10 s]
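The windowed-energy envelope above, followed by a half-wave-rectified frame-to-frame difference, is enough to expose onsets. A minimal pure-Python sketch (the deck's code is Matlab); the Hann window, 256/128 frame sizes, and dB differencing are assumed defaults, not values from the slides.

```python
import math

def onset_envelope(x, win=256, hop=128):
    """Windowed energy e(n0) = sum_n w[n] |x(n+n0)|^2 per frame, returned
    as half-wave-rectified dB differences (a crude onset function)."""
    env = []
    for start in range(0, len(x) - win, hop):
        w_sum = 0.0
        for n in range(win):
            w = 0.5 - 0.5 * math.cos(2 * math.pi * n / win)  # Hann window
            w_sum += w * x[start + n] ** 2
        env.append(10 * math.log10(w_sum + 1e-12))
    # rectified first difference emphasizes energy increases (onsets)
    return [max(0.0, env[i] - env[i - 1]) for i in range(1, len(env))]
```

On a signal that is silent and then a steady tone, the function peaks at the frame where the tone begins.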
Multiband Derivatives
• Sometimes energy just “shifts” between bands (Puckette et al. ’98)
  calculate & sum onset strength in multiple bands
  use a ratio instead of a difference, to normalize energy:

  o(t) = Σ_f W(f) max(0, |X(f, t)| / |X(f, t-1)| - 1)

  (as in the bonk~ object)
  [spectrogram and multiband onset function, 0-10 s]
Phase Deviation
Bello et al. ’05
• When amplitudes don’t change much, a phase discontinuity may signal a new note
  [waveform close-up, 8.96-9.06 s]
• Can detect by comparing the actual phase with an extrapolation from the past:

  X̂(f, t_{n+1}) = X(f, t_n) · X(f, t_n) / X(f, t_{n-1})

  the complex deviation ΔΘ between prediction X̂(f, t_{n+1}) and actual
  X(f, t_{n+1}) combines phase with amplitude
Rhythm Tracking
• Earliest systems were rule based (Desain & Honing 1999)
  based on musicology (Longuet-Higgins and Lee, 1982)
  inspired by linguistic grammars (Chomsky)
  input: event sequence (MIDI); output: quarter notes, downbeats
Tempo Estimation
• Perception of beat comes from regular spacing
  .. the kind of thing we detect with autocorrelation
• Pick the peak in the onset-envelope autocorrelation after applying a
  “human preference” window; check for a sub-beat
  [onset strength envelope (8-12 s); raw and windowed autocorrelation
  (0-4 s lag) with primary and secondary tempo periods marked]
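The pick-the-windowed-autocorrelation-peak step can be written directly. A minimal pure-Python sketch (the deck's code is Matlab); the 120 BPM centre and unit log-Gaussian width for the "human preference" window, and the 30-300 BPM search range, are assumed defaults rather than values taken from the slides.

```python
import math

def tempo_estimate(onset_env, sr_env, bpm_center=120.0, width=1.0):
    """Return the tempo (BPM) whose autocorrelation lag scores highest,
    weighted by a log-Gaussian preference window around bpm_center.
    `onset_env` is an onset-strength envelope sampled at sr_env Hz."""
    n = len(onset_env)
    best_bpm, best_score = None, -1e9
    for lag in range(int(sr_env * 60 / 300), int(sr_env * 60 / 30)):
        # raw autocorrelation at this lag
        r = sum(onset_env[i] * onset_env[i - lag] for i in range(lag, n))
        bpm = 60.0 * sr_env / lag
        # "human preference" window, log-Gaussian in tempo
        w = math.exp(-0.5 * (math.log2(bpm / bpm_center) / width) ** 2)
        if w * r > best_score:
            best_bpm, best_score = bpm, w * r
    return best_bpm
```

An impulse train every 0.5 s in a 100 Hz envelope yields an estimate of 120 BPM; the window demotes the half-tempo peak at lag 1 s.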
Resonators for Beat Tracking
• How to address the build-up of rhythmic evidence, and “ghost events”?
• Reminiscent of a comb filter: a resonant filterbank of

  y(t) = α y(t - T) + (1 - α) x(t)

  for all possible delays T (x is the audio input)
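The resonant filterbank above can be simulated by running the one-line recurrence for every candidate period and keeping the resonator with the most output energy. A minimal pure-Python sketch (the deck's code is Matlab); the function name and α = 0.9 are illustrative assumptions.

```python
def comb_tempo(onsets, periods, alpha=0.9):
    """Bank of comb-filter resonators (after Scheirer '98): for each
    candidate period T run y[t] = alpha*y[t-T] + (1-alpha)*x[t] over the
    onset signal and return the T whose output carries the most energy."""
    best_T, best_E = None, -1.0
    for T in periods:
        y = [0.0] * len(onsets)
        for t, x in enumerate(onsets):
            y[t] = (alpha * y[t - T] if t >= T else 0.0) + (1 - alpha) * x
        E = sum(v * v for v in y)
        if E > best_E:
            best_T, best_E = T, E
    return best_T
```

Only the resonator matched to the inter-onset period reinforces itself; mismatched delays leave each impulse isolated, which is how the filterbank accumulates rhythmic evidence.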
Multi-Hypothesis Systems
• Beat is ambiguous → develop several alternatives
  (Goto & Muraoka 1994; Goto 2001; Dixon 2001)
  [agent architecture: A/D conversion → frequency analysis → onset-time
  finders & vectorizers, drum-sound finder, higher-level checkers → agents
  making beat predictions → manager → beat information transmission]
  inputs: music audio
  outputs: beat times, downbeats, BD/SD patterns...
Objective Function Optimization
Ellis 2007
• Re-cast beat tracking as optimization: find beat times {ti} to maximize

  C({ti}) = Σ_{i=1}^{N} O(ti) + α Σ_{i=2}^{N} F(ti - ti-1, τp)

  O(t) is the onset strength function
  F(Δt, τ) is a tempo consistency score, e.g. F(Δt, τ) = -(log(Δt/τ))²
  (needs a target tempo period τp)
• Looks like an exponential search over all {ti}
  ... but Dynamic Programming saves us
Beat Tracking by DP
• To optimize C({ti}) = Σ_{i=1}^{N} O(ti) + α Σ_{i=2}^{N} F(ti - ti-1, τp):
  define C*(t) as the best score up to time t
  then build up recursively (with traceback P(t)):

  C*(t) = O(t) + max_τ { αF(t - τ, τp) + C*(τ) }
  P(t) = argmax_τ { αF(t - τ, τp) + C*(τ) }

  the final beat sequence {ti} is the best C* plus back-trace
• Beat tracking in 15 lines of Matlab: beatsimple

function beats = beatsimple(localscore, period, alpha)
% beats = beatsimple(localscore, period, alpha)
%   Core of the DP-based beat tracker
%   <localscore> is the onset strength envelope
%   <period> is the target tempo period (in samples)
%   <alpha> is weight applied to transition cost
%   <beats> returns the chosen beat sample times.
% 2007-06-19 Dan Ellis dpwe@ee.columbia.edu

% backlink(time) is best predecessor for this point
% cumscore(time) is total cumulated score to this point
backlink = -ones(1,length(localscore));
cumscore = localscore;
% Search range for previous beat
prange = round(-2*period):-round(period/2);
% Log-gaussian window over that range
txcost = (-alpha*abs((log(prange/-period)).^2));
for i = max(-prange + 1):length(localscore)
  timerange = i + prange;
  % Search over all possible predecessors
  % and apply transition weighting
  scorecands = txcost + cumscore(timerange);
  % Find best predecessor beat
  [vv,xx] = max(scorecands);
  % Add on local score
  cumscore(i) = vv + localscore(i);
  % Store backtrace
  backlink(i) = timerange(xx);
end
% Start backtrace from best cumulated score
[vv,beats] = max(cumscore);
% .. then find all its predecessors
while backlink(beats(1)) > 0
  beats = [backlink(beats(1)),beats];
end
Results
• Verify against human tapping data
  [tempo estimate vs. target BPM and beat accuracy for two excerpts,
  bonus6 (Jazz) and bonus1 (Choral), while varying the tempo estimate τ
  and the tradeoff weight α; accuracy also shown per inter-beat interval]
Chord Recognition
• Do people hear simultaneous notes, or do they learn the sound of chords
  instead of notes?
  music limits the likely combinations
  chords have a definite “color”
• Recognize chords: labeled data is available; analogous to speech recognition
  [chromagram of Beatles “Eight Days a Week”, 16.27-24.84 s, with true,
  aligned, and recognized chord labels: E, G, D, Bm, Am, Em7 ...]
Chord Features: Chroma
Fujishima 1999
• Idea: project all energy onto 12 semitones, regardless of octave
  maintains the main “musical” distinction
  invariant to octave equivalence
  no need to worry about harmonics?
  [spectrogram → chromagram of the same excerpt]

  C(b) = Σ_{k=0}^{NM} B(12 log2(k/k0) - b) W(k) |X[k]|

  W(k) is a weighting; B(·) selects the bins whose pitch class matches b (mod 12)
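The projection C(b) above amounts to folding each FFT bin onto its nearest semitone, mod 12. A minimal Python/NumPy single-frame sketch (the deck's code is Matlab); the Hann window, 50-2000 Hz band limit, and 440 Hz reference are assumed defaults, and bin 0 is taken to be pitch class A.

```python
import numpy as np

def chroma(x, sr, n_fft=4096, tuning_hz=440.0):
    """Fold one frame's spectral energy onto 12 pitch classes, octave-
    invariant (after Fujishima's Pitch Class Profile). Bin 0 = A."""
    mag = np.abs(np.fft.rfft(x[:n_fft] * np.hanning(n_fft)))
    freqs = np.arange(len(mag)) * sr / n_fft
    c = np.zeros(12)
    for k in range(1, len(mag)):                 # skip DC
        if freqs[k] < 50 or freqs[k] > 2000:
            continue                             # crude band limit
        b = int(round(12 * np.log2(freqs[k] / tuning_hz))) % 12
        c[b] += mag[k]
    return c / (c.max() + 1e-12)
```

A pure A4 (440 Hz) peaks in bin 0; middle C peaks in bin 3, three semitones above A mod 12.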
Chroma Resynthesis
• Chroma describes the notes in an octave ... but not the octave
• Can resynthesize by presenting all octaves ... with a smooth envelope
  “Shepard tones”: the octave is ambiguous

  y_b(t) = Σ_{o=1}^{M} W(o + b/12) cos(2π · 2^{o + b/12} ω0 t)

  [the 12 Shepard tone spectra; spectrogram of a Shepard-tone resynthesis]
  endless sequence illusion
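The resynthesis formula above sums the same pitch class in every octave under a smooth weight W. A minimal pure-Python sketch (the deck's code is Matlab); the 27.5 Hz base, 7 octaves, and raised-cosine weight are assumed choices, not the slides' values.

```python
import math

def shepard_tone(pitch_class, dur=0.5, sr=8000, f_base=27.5, n_oct=7):
    """Synthesize one chroma bin as a Shepard tone: the pitch class
    repeated in every octave under a bell-shaped spectral envelope,
    so the octave is ambiguous."""
    f0 = f_base * 2 ** (pitch_class / 12.0)
    out = [0.0] * int(dur * sr)
    for o in range(n_oct):
        f = f0 * 2 ** o
        if f >= sr / 2:
            break                       # stay below Nyquist
        # raised-cosine weight, small at the spectral extremes
        w = 0.5 - 0.5 * math.cos(2 * math.pi * (o + pitch_class / 12.0) / n_oct)
        for t in range(len(out)):
            out[t] += w * math.sin(2 * math.pi * f * t / sr)
    return out
```

Stepping `pitch_class` 0..11 and looping back to 0 produces the endless-sequence illusion mentioned above.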
Chroma Resynthesis
• Resynthesis illustrates what has been captured
  can combine with MFCC features for the coarse spectrum
  [Let It Be, 0-25 s: log-freq spectrogram (LIB-1); chroma features;
  Shepard tone resynthesis of chroma (LIB-3); MFCC-filtered Shepard
  tones (LIB-4)]
Beat-Synchronous Chroma
• Store just one chroma frame per beat (Bartsch & Wakefield ’01)
  a compact representation of musical content
  [Let It Be, 0-25 s: log-freq spectrogram; onset envelope + beat times;
  beat-synchronous chroma; beat-synchronous chroma + Shepard resynthesis (LIB-6)]
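Reducing to one frame per beat is just averaging the per-frame vectors that fall inside each beat interval. A minimal pure-Python sketch (the deck's code is Matlab); the function name and the mean-pooling choice are illustrative assumptions.

```python
def beat_sync(frames, frame_times, beat_times):
    """Average per-frame feature vectors within each beat interval,
    giving one vector per beat. `frames` are feature lists, `frame_times`
    their timestamps, `beat_times` the beat boundaries (seconds)."""
    out = []
    for b0, b1 in zip(beat_times[:-1], beat_times[1:]):
        seg = [f for f, t in zip(frames, frame_times) if b0 <= t < b1]
        if not seg:
            out.append([0.0] * len(frames[0]))   # empty beat: zero vector
            continue
        out.append([sum(col) / len(seg) for col in zip(*seg)])
    return out
```

Ten 100 ms frames pooled over two half-second beats collapse to two beat-level vectors.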
Chord Recognition System
• Analogous to speech recognition (Sheh & Ellis ’03)
  Gaussian models of features for each chord
  Hidden Markov Models for chord transitions
  [system diagram: audio → beat track → 100-1600 Hz BPF chroma (test) and
  25-400 Hz BPF chroma (train) → beat-synchronous chroma features;
  root-normalized Gaussian models for 24 chords (C maj ... c min);
  training labels → count transitions → 24x24 transition matrix → HMM Viterbi]
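The HMM Viterbi stage in the diagram above can be sketched independently of the Gaussian front end. A minimal pure-Python sketch (the deck's code is Matlab): states are chord labels, per-frame log-likelihoods come in, and a single self-transition probability replaces the learned 24x24 matrix, which is a simplifying assumption.

```python
import math

def viterbi_chords(scores, self_prob=0.9):
    """HMM smoothing of per-frame chord scores: a high self-transition
    probability penalizes rapid label changes. `scores` holds one list of
    per-chord log-likelihoods per frame; returns the Viterbi state path."""
    n = len(scores[0])
    log_stay = math.log(self_prob)
    log_switch = math.log((1.0 - self_prob) / (n - 1))
    delta = list(scores[0])          # best log-score ending in each state
    back = []                        # best-predecessor pointers per frame
    for frame in scores[1:]:
        ptrs, new = [], []
        for i in range(n):
            cands = [delta[j] + (log_stay if j == i else log_switch)
                     for j in range(n)]
            j = max(range(n), key=cands.__getitem__)
            ptrs.append(j)
            new.append(cands[j] + frame[i])
        delta = new
        back.append(ptrs)
    path = [max(range(n), key=delta.__getitem__)]
    for ptrs in reversed(back):      # trace the best path backwards
        path.append(ptrs[path[-1]])
    return path[::-1]
```

A single noisy frame that weakly favors another chord is smoothed over, since two label switches cost more than the small emission gain.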
Key Normalization
• Chord transitions depend on the key of the piece
  dominant, relative minor, etc...
  probabilities should be key-relative:
  estimate the main key of the piece, rotate all chroma features, learn models
• Chord transition matrices
  [per-song matrices for Taxman, Eleanor Rigby, I’m Only Sleeping, Love You
  To, She Said She Said, Good Day Sunshine, And Your Bird Can Sing, Yellow
  Submarine, combined into an aligned global model over aligned chroma]
Chord Recognition
• Often works:
  [Let It Be, 0-18 s: audio spectrogram; beat-synchronous chroma;
  recognized chords (C, G, F, A:min, ...) against ground-truth labels
  (C, G, F:maj7, A:min/b7, F:maj6, ...)]
• But not always:
Eigenrhythms: Drum Pattern Space
• Pop songs are built on a repeating “drum loop” (Ellis & Arroyo ’04)
  variations on a few bass, snare, hi-hat patterns
• Eigen-analysis (or ...) to capture variations?
  by analyzing lots of (MIDI) data, or from audio
• Applications: music categorization, “beat box” synthesis, insight
Aligning the Data
• Need to align patterns prior to modeling...
  tempo (stretch): by inferring BPM & normalizing
  downbeat (shift): correlate against a ‘mean’ template
Eigenrhythms (PCA)
• Need 20+ eigenvectors for good coverage of 100 training patterns (1200 dims)
• Eigenrhythms both add and subtract
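The eigen-analysis above is ordinary PCA over the aligned pattern vectors. A minimal Python/NumPy sketch (the deck's code is Matlab): subtract the mean pattern and take the top principal components via SVD; the function name and interface are illustrative.

```python
import numpy as np

def eigenrhythms(patterns, k=3):
    """PCA over aligned drum-pattern vectors: returns the mean pattern and
    the top-k principal components ('eigenrhythms'), which add to and
    subtract from the mean when reconstructing a pattern."""
    X = np.asarray(patterns, dtype=float)
    mean = X.mean(axis=0)
    # rows of Vt are unit-norm principal directions, strongest first
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]
```

If the training patterns vary along a single direction, the first eigenrhythm recovers that direction exactly.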
Posirhythms (NMF)
  [posirhythms 1-6: HH/SN/BD activation patterns over 400 samples
  (@ 200 Hz), i.e. 4 beats @ 120 BPM]
• Nonnegative: only adds beat-weight
• Capturing some structure...
Eigenrhythms for Classification
• Projections in eigenspace / LDA space
  [PCA(1,2) projection (16% corr) and LDA(1,2) projection (33% corr) for
  10 genres: blues, country, disco, hiphop, house, newwave, pop, punk,
  rnb, rock]
• 10-way genre classification (nearest neighbor):
  PCA3: 20% correct
  LDA4: 36% correct
Eigenrhythm BeatBox
• Resynthesize rhythms from eigen-space
4. Large Music Audio Datasets
• Music Information Retrieval (MIR) is a vibrant new field (Bertin-Mahieux et al. ’11)
  many commercial opportunities
• But: music audio is hard to share
  copyright owners have been burned
  researchers use personal collections...
• Idea: Million Song Dataset (MSD)
  commercial scale, available to all, many different “facets”
  http://labrosa.ee.columbia.edu/millionsong
MSD Facets
• Features, Lyrics, Tags, Covers, Listeners ...
MSD Audio Features
• Use Echo Nest “Analyze” features
  segment audio into variable-length “events”
  represent each by 12 chroma + 12 “timbre” dimensions
  supports a crude resynthesis:
  [TRKUYPW128F92E1FC0, Tori Amos, “Smells Like Teen Spirit”, 0-14 s:
  original spectrogram, EN features, and resynthesized spectrogram]
MSD Metadata
EN Metadata:
  artist: 'Tori Amos'; release: 'LIVE AT MONTREUX'; title: 'Smells Like
  Teen Spirit'; id: 'TRKUYPW128F92E1FC0'; key: 5; mode: 0;
  loudness: -16.6780; tempo: 87.2330; time_signature: 4;
  duration: 216.4502; sample_rate: 22050; audio_md5: '8';
  7digitalid: 5764727; familiarity: 0.8500; year: 1992
Last.fm Tags:
  100.0 cover; 57.0 covers; 43.0 female vocalists; 42.0 piano;
  34.0 alternative; 14.0 singer-songwriter; 11.0 acoustic; 8.0 tori amos;
  7.0 beautiful; 6.0 rock; 6.0 pop; 6.0 Nirvana; 6.0 female vocalist;
  6.0 90s; 5.0 out of genre covers; 5.0 cover songs; 4.0 soft rock;
  4.0 nirvana cover; 4.0 Mellow; 4.0 alternative rock; 3.0 chick rock;
  3.0 Ballad; 3.0 Awesome Covers; 2.0 melancholic; 2.0 k00l ch1x;
  2.0 indie; 2.0 female vocalistist; 2.0 female; 2.0 cover song;
  2.0 american
SHS Covers (%5489,4468, Smells Like Teen Spirit):
  TRTUOVJ128E078EE10 Nirvana; TRFZJOZ128F4263BE3 Weird Al Yankovic;
  TRJHCKN12903CDD274 Pleasure Beach; TRELTOJ128F42748B7 The Flying Pickets;
  TRJKBXL128F92F994D Rhythms Del Mundo feat. Shanade;
  TRIHLAW128F429BBF8 The Bad Plus; TRKUYPW128F92E1FC0 Tori Amos
MxM Lyric Bag-of-Words:
  12 hello; 11 i; 10 a; 9 and; 7 it; 6 are; 6 we; 6 now; 6 here; 6 us;
  6 entertain; 4 the; 4 feel; 4 yeah; 3 to; 3 my; 3 is; 3 with; 3 oh;
  3 out; 3 an; 3 light; 3 less; 3 danger
Melodic-Harmonic Mining
• What can you find in a million songs? what characterizes the content?
  (Bertin-Mahieux et al. ’10)
  pipeline: music audio → beat tracking → chroma features → key
  normalization → landmark identification → locality sensitive hash table
• Frequent patterns: clusters of 12 x 8 binarized event-chroma
  [top 80 clusters ranked by count, from #1 (3491 members) down to #80 (440)]
Results - Beatles
• Over 86 Beatles tracks
• All beat offsets = 41,705 patches
  LSH takes 300 sec - approx N log N in patches?
• High-pass filter along time to avoid sustained notes
• Song filter: remove hits in the same track
  [matching 12-chroma x 20-beat patches: “I Should Have Known Better”
  92.4-97.7 s, “Here There And Everywhere” 12.1-20.5 s, “Martha My Dear”
  90.9-98.6 s, “Piggies” 22.0-29.6 s]
5. Outstanding Issues
• Perceptually inspired? Music perception is complex:
  structure, expectation, memory, enjoyment
• Many problems still to solve
  structure
  metrical hierarchy
  music similarity & preference
Summary
• Machine Listening:
Getting useful information from sound
• Musical sound
... constructed to confound scene analysis?
• Transcription tasks
... recover notes, beats, chords etc.
• Million Song Dataset for research
... large-scale, multiple facets
References
• M. A. Bartsch & G. H. Wakefield, “To catch a chorus: Using chroma-based representations for audio thumbnailing,” Proc. IEEE WASPAA, 15-18, Mohonk, 2001.
• J. P. Bello, L. Daudet, S. Abdallah, C. Duxbury, M. Davies, M. B. Sandler, “A Tutorial on Onset Detection in Music Signals,” IEEE Tr. Speech and Audio Proc., 13(5):1035-1047, 2005.
• T. Bertin-Mahieux, D. Ellis, B. Whitman, & P. Lamere, “The Million Song Dataset,” Int. Symp. Music Inf. Retrieval ISMIR, Miami, 2011.
• A. Bregman, Auditory Scene Analysis, MIT Press, 1990.
• P. Desain & H. Honing, “Computational models of beat induction: The rule-based approach,” J. New Music Research, 28(1):29-42, 1999.
• J. M. Grey, An Exploration of Musical Timbre, Ph.D. Dissertation, Stanford CCRMA, No. STAN-M-2, 1975.
• S. Dixon, “Automatic extraction of tempo and beat from expressive performances,” J. New Music Research, 30(1):39-58, 2001.
• D. Ellis, “Beat Tracking by Dynamic Programming,” J. New Music Research, 36(1):51-60, 2007.
• D. Ellis & J. Arroyo, “Eigenrhythms: Drum pattern basis sets for classification and generation,” Int. Symp. Music Inf. Retrieval ISMIR, 101-106, Barcelona, 2004.
• D. Ellis & G. Poliner, “Identifying Cover Songs With Chroma Features and Dynamic Programming Beat Tracking,” Proc. ICASSP, IV-1429-1432, Hawai'i, 2007.
• T. Fujishima, “Realtime chord recognition of musical sound: A system using common lisp music,” Proc. Int. Comp. Music Conf., 464-467, Beijing, 1999.
• M. Goto, “A Robust Predominant-F0 Estimation Method for Real-time Detection of Melody and Bass Lines in CD Recordings,” Proc. ICASSP, II-757-760, 2000.
• M. Goto, “An Audio-based Real-time Beat Tracking System for Music With or Without Drum-sounds,” J. New Music Research, 30(2):159-171, 2001.
References
• G. Grindlay & D. Ellis, “Transcribing Multi-instrument Polyphonic Music with Hierarchical Eigeninstruments,” IEEE J. Sel. Topics in Sig. Process., 5(6):1159-1169, 2011.
• A. Klapuri, “Multiple Fundamental Frequency Estimation by Summing Harmonic Amplitudes,” Proc. Int. Symp. Music IR, 216-221, 2006.
• R. Maher & J. Beauchamp, “Fundamental frequency estimation of musical signals using a two-way mismatch procedure,” J. Acoust. Soc. Am., 95(4):2254-2263, 1994.
• M. F. McKinney & D. Moelants, “Ambiguity in tempo perception: What draws listeners to different metrical levels?” Music Perception, 24(2):155-166, 2006.
• J. R. Pierce, The Science of Musical Sound, Scientific American Press, 1983.
• G. Poliner & D. Ellis, “A Discriminative Model for Polyphonic Piano Transcription,” EURASIP J. Advances in Signal Processing, article id 48317, 2007.
• M. Puckette, T. Apel, D. Zicarelli, “Real-time audio analysis tools for Pd and MSP,” Proc. Int. Comp. Music Conf., 109-112, Ann Arbor, 1998.
• E. D. Scheirer, “Tempo and beat analysis of acoustic musical signals,” J. Acoust. Soc. Am., 103:588-601, 1998.
• A. Sheh & D. Ellis, “Chord Segmentation and Recognition using EM-Trained Hidden Markov Models,” Int. Symp. Music Inf. Retrieval ISMIR, 185-191, Baltimore, 2003.