Chord Recognition and Segmentation using EM-trained Hidden Markov Models

• Chord Recognition
• EM-trained HMMs
• Experiments & Results
Alex Sheh & Dan Ellis
{asheh79,dpwe}@ee.columbia.edu
LabROSA, Columbia University
Chord Transcription
• Basic problem:
Recover chord sequence labels from audio
Easier than note transcription?
More relevant to listener perception?
Difficulties with Chord Transcription
• Enharmonicity:
Chord labels can be ambiguous
C# vs Db
• Many different chord classes
major, minor, 6th, 9th, ...
fold into 7 ‘main’ classes:
maj, min, maj7, min7, dom7, aug, dim
• Acoustic variability
chords are the same regardless of instrumentation
Approaches to Chord Transcription
• Note transcription, then note→chord rules
like labeling chords in MIDI transcripts
• Spectrum→chord rules
i.e. find harmonic peaks, use knowledge of likely
notes in each chord
• Trained classifier
don’t use any “expert knowledge”
instead, learn patterns from labeled examples
Statistical-Pattern-Recognition Chord Recognizer
• Use labeled training examples to estimate
p(x|ωi) for features x and chord class ωi
[Diagram: labeled training examples {xn, ωxn} are sorted according to class, and a conditional pdf p(x|ωi) is estimated for each class ωi]
• Use Bayes' Rule to get posterior probabilities for each class given features:
p(ωi | x) = p(x|ωi) · p(ωi) / Σj p(x|ωj) · p(ωj)
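To make the Bayes-rule step concrete, here is a minimal numpy sketch of this kind of classifier with diagonal-Gaussian class-conditional models p(x|ωi); the function and variable names are illustrative, not part of the actual system:

```python
import numpy as np

def class_posteriors(x, means, variances, priors):
    """Posterior p(w_i | x) for diagonal-Gaussian class-conditional models."""
    # log p(x | w_i) for each class i (means, variances: (n_classes, dim))
    diff = x[None, :] - means
    loglik = -0.5 * np.sum(diff ** 2 / variances
                           + np.log(2 * np.pi * variances), axis=1)
    log_post = loglik + np.log(priors)        # numerator of Bayes' rule (in log)
    log_post -= log_post.max()                # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()                  # divide by sum_j p(x|w_j) p(w_j)
```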
Training Data Sources
• We need (lots of) examples of audio segments and the appropriate chord labels
Not widely available!
Even when you can get it, there is very little
• We could hand-mark a training set
painfully time-consuming!
• Can we generate it automatically?
we could, if we already had the chord transcription system working...
Speech Recognition Analogy
• Chord recognition is like word recognition
we are trying to recover a sequence of ‘exclusive’
labels associated with an audio stream
we have a lot of potential training audio,
but no time labels
• Can we do what they do in ASR?
.. i.e. iterative re-estimation of models and labels
using Expectation-Maximization
Need only label sequence, not timings
(i.e. words or chords in order, no times)
EM HMM Re-Estimation
• Estimate ‘soft’ labels using current models
• Update model parameters from new labels
• Repeat until convergence to local maximum
[Diagram: a model inventory of phone models (ae1, ae2, ae3, dh1, dh2, ...), labelled training data ("dh ax k ae t", "s ae t aa n"), uniform initialization of the alignments and parameters Θinit, then repeat until convergence:]
E-step: estimate probabilities of the unknowns, p(q_n^i | X_1^N, Θold)
M-step: maximize over parameters, Θ : max E[log p(X,Q | Θ)]
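As a concrete (if simplified) illustration of this loop, here is a numpy sketch of one EM iteration for a single training sequence with Gaussian-emission states, using scaled forward-backward for the E-step; the real system used standard ASR-style training, so treat the names and details below as assumptions:

```python
import numpy as np

def gaussian_loglik(X, means, variances):
    """log N(x_t | mean_k, diag(var_k)) for every frame t and state k."""
    diff = X[:, None, :] - means[None, :, :]             # (T, K, D)
    return -0.5 * np.sum(diff ** 2 / variances
                         + np.log(2 * np.pi * variances), axis=2)

def em_step(X, pi, A, means, variances):
    """One EM iteration: E-step = scaled forward-backward, M-step = re-estimate."""
    T, K = X.shape[0], len(pi)
    logB = gaussian_loglik(X, means, variances)
    B = np.exp(logB - logB.max(axis=1, keepdims=True))   # per-frame rescaling only

    # E-step: state posteriors gamma and expected transition counts xi
    alpha, beta, c = np.zeros((T, K)), np.zeros((T, K)), np.zeros(T)
    alpha[0] = pi * B[0]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[t + 1] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta                                  # p(q_t = k | X, Theta_old)
    xi = np.zeros((K, K))
    for t in range(T - 1):
        xi += alpha[t][:, None] * A * (B[t + 1] * beta[t + 1])[None, :] / c[t + 1]

    # M-step: maximize E[log p(X, Q | Theta)] with respect to the parameters
    occ = gamma.sum(axis=0)[:, None] + 1e-12
    new_means = (gamma.T @ X) / occ
    new_vars = (gamma.T @ X ** 2) / occ - new_means ** 2 + 1e-6
    return (gamma[0] / gamma[0].sum(),                    # new initial probabilities
            xi / (xi.sum(axis=1, keepdims=True) + 1e-12), # new transition matrix
            new_means, new_vars)
```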
Chord Sequence Data Sources
• All we need are the chord sequences for our training examples
• OLGA Tab archives (http://www.olga.net/)?
multiple authors, unreliable quality
• Hal Leonard “Paperback Song Series”
many Beatles songs, consistent detail
manually retyped for 20 songs:
“Beatles for Sale”, “Help”, “Hard Day’s Night”
• Issues:
repeats, intros, weird bits in the middle
Experiments
• Preliminary investigations to see if this works
small database to get started
compare different feature sets
different ways to evaluate results
can we reintroduce a little high-level knowledge?
Training/Test Conditions
• Two training sets:
Train on 18 songs, test on 2 held-out songs
(fair test)
Train on all 20 songs, test on 2 songs from training
set (cheating, rewards over-fitting, upper bound)
• Two evaluation measures
Recognition: transcribe unknown chords
Forced alignment: find time boundaries given
correct chord sequence
Score frame-level accuracy against hand-markings
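For reference, a small Viterbi decoding sketch (our own illustration, not the paper's toolkit): recognition decodes over the full inventory of chord models, while forced alignment simply restricts the state space and transitions to the song's known chord sequence:

```python
import numpy as np

def viterbi(loglik, logA, logpi):
    """Most likely state path for per-frame log-likelihoods loglik (T, K),
    log transition matrix logA (K, K), and log initial probabilities logpi (K,).
    Recognition: K indexes all chord models.  Forced alignment: the states are
    just the song's known chord sequence, with left-to-right transitions."""
    T, K = loglik.shape
    delta = np.full((T, K), -np.inf)
    back = np.zeros((T, K), dtype=int)
    delta[0] = logpi + loglik[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA   # scores[i, j]: best path ending i -> j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + loglik[t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):              # trace back the best path
        path[t] = back[t + 1, path[t + 1]]
    return path
```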
Features
• Can try any features with EM training...
• Use MFCCs as baseline
“the” feature in ASR, also useful in music IR
also try deltas, double deltas as in ASR
capture ‘formants’, not pitch - straw man
• “Pitch Class Profile” features (Fujishima’99)
Spectrum-to-chroma mapping: collapse FFT bin energies into (24) chroma bins
a/k/a Chroma Spectrum (Bartsch'01) ...
[Figure: spectrum-to-chroma mapping weights, chroma bin (quarter tones) vs. FFT bin (up to 2000 Hz)]
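A minimal sketch of this fold, assuming a 440 Hz reference pitch, a 100 Hz lower cutoff, and 24 quarter-tone bins (all of these are illustrative assumptions); unlike the exact Fujishima'99 weighting, this simplified version just assigns each FFT bin's energy to its nearest chroma bin:

```python
import numpy as np

def pcp(frame, sr, n_fft=4096, n_bins=24, fmin=100.0, fmax=2000.0, f_ref=440.0):
    """Pitch Class Profile: fold FFT bin energies into 24 quarter-tone chroma bins."""
    spectrum = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    chroma = np.zeros(n_bins)
    for f, energy in zip(freqs, spectrum):
        if fmin <= f <= fmax:
            # distance from the reference in quarter-tones, folded to one octave
            b = int(round(n_bins * np.log2(f / f_ref))) % n_bins
            chroma[b] += energy
    return chroma / (chroma.sum() + 1e-12)      # normalize each frame's profile
```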
Averaging Rotated PCP Models
• Statistical system learns separate models for each of (7 chord types) x (21 roots)
models are means, variances of feature vectors
only 32 actually appear in our training set
even so, many have few training instances
• Expect similarity between e.g. Amaj & Bmaj
same chord, just shifted in frequency
shift = rotation of 24-bin chroma space
align & average all transpositions of
• Can
same chord after each training iteration
then rotate back to starting chroma, continue...
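A sketch of the rotate-average-rotate-back idea for the mean vectors of one chord type, assuming 24 chroma bins (2 per semitone) and folding enharmonic root spellings down to 12 semitones; the actual system applies this to the model parameters after each training iteration, and the names here are illustrative:

```python
import numpy as np

def tie_chord_type_means(means_by_root, bins_per_semitone=2):
    """Average the PCP mean vectors of one chord type across all roots.

    means_by_root: {root_semitone (0..11): mean PCP vector (24 bins)}.
    Each mean is rotated down to a common C root, averaged, and the shared
    template is then rotated back to every root's starting chroma."""
    aligned = [np.roll(mean, -root * bins_per_semitone)
               for root, mean in means_by_root.items()]
    template = np.mean(aligned, axis=0)          # one template per chord type
    return {root: np.roll(template, root * bins_per_semitone)
            for root in means_by_root}
```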
Results: Recognition
• Models are not adequately discriminant:
Percent Frame Accuracy: Recognition
[Table: frame accuracy (%) for MFCC, MFCC_D, PCP, and PCP_ROT features, under train18 (held-out test) and train20 (test within training set) conditions, on "Eight Days a Week" and "Every Little Thing"; accuracies range from about 1.5% to 26% (random ~3%)]
MFCCs are poor (can overtrain)
PCPs a little better (ROT helps generalization)
Results: Alignment
• Some glimmers of hope!
Percent Frame Accuracy: Forced Alignment
[Table: frame accuracy (%) for MFCC, MFCC_D, PCP, and PCP_ROT features, under train18 and train20 conditions, on "Eight Days a Week" and "Every Little Thing"; accuracies range from about 13% up to about 84%]
PCP_ROT gives the best accuracy & best generalization
Chord Confusions
"E M AJOR" CONFUSION M ATRIX
Eight Days a We e k
Maj
Min
Maj7
Min7
Dom7
Aug
Dim
E/Fb
15 8
115
.
9
.
.
.
E#/F
9
.
.
.
.
.
.
F#/Gb
.
.
.
11
.
.
.
G
3
.
.
.
.
.
.
G#/Ab .
.
.
.
.
.
.
A
9
.
.
.
1
.
.
A#/Bb .
.
.
.
.
.
.
B/Cb
8
.
.
.
,
.
.
B#/C
.
.
.
.
.
.
.
C#/Db .
20
.
.
.
.
.
D
.
14
.
.
.
.
.
D#/Eb
.
.
.
.
.
.
.
• Major/minor confusions
• C# minor is the relative minor of E major (shared notes)
What did the models learn?
• Chord model centers (means) indicate chord 'templates':
[Figure: PCP_ROT family model means (train18) for the DIM, DOM7, MAJ, MIN, and MIN7 families, plotted over the 24 chroma bins (C through B), for C-root chords]
Recognition/Alignment Example
• A flavor of the features & results:
[Figure: chromagram (4096-pt frames) of Beatles - Beatles For Sale - "Eight Days a Week", pitch class vs. time (16.27-24.84 s), with the true, force-aligned, and recognized chord sequences (E, G, D, Bm, Am, Em7, ...) shown underneath]
Conclusions & Future Work
• ASR-style EM is a viable approach for learning chord models
more training data needed
• Better features
capture ‘global’ properties of chords
robust to fine tuning issues?
• Better representation for training labels
i.e. allow for extra repeats?
• More ways to reintroduce music knowledge