ELEN E4896 MUSIC SIGNAL PROCESSING
Lecture 15: Research at LabROSA
1. Sources, Mixtures, & Perception
2. Spatial Filtering
3. Time-Frequency Masking
4. Model-Based Separation
Dan Ellis
Dept. Electrical Engineering, Columbia University
dpwe@ee.columbia.edu
E4896 Music Signal Processing (Dan Ellis)
http://www.ee.columbia.edu/~dpwe/e4896/
2014-05-05 - 1 /19
Sparse + Low-Rank + NMF
Zhuo Chen
• Optimization to decompose spectrogram:
$$\text{minimize} \quad \|S\|_1 + \|L\|_* + D_{KL}(Y - S - L \,\|\, H \cdot W) \quad \text{s.t.} \quad Y = S + L + H \cdot W$$
[Figure panels: Y, H·W, L, S]
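As a rough illustration of the cost being minimized, the sketch below only evaluates the three terms for given matrices; it does not perform the optimization. It assumes the low-rank penalty is the nuclear norm, that the KL term is the generalized KL divergence between Y − S − L and H·W, and that negative residuals are clipped to zero. The function names are hypothetical.

```python
import numpy as np

def generalized_kl(A, B, eps=1e-12):
    """Generalized KL divergence D(A || B), summed over all matrix entries."""
    A = np.maximum(A, eps)
    B = np.maximum(B, eps)
    return float(np.sum(A * np.log(A / B) - A + B))

def objective(Y, S, L, H, W):
    """Sparse (L1) + low-rank (nuclear norm) + NMF (KL) cost for given factors."""
    sparse_term = np.abs(S).sum()
    low_rank_term = np.linalg.norm(L, ord="nuc")
    # Assumption: clip the residual at zero so the KL term stays defined.
    residual = np.maximum(Y - S - L, 0.0)
    return float(sparse_term + low_rank_term + generalized_kl(residual, H @ W))
```

When S and L are zero and H·W reproduces Y exactly, all three terms vanish; any energy moved into S or L is paid for through the L1 or nuclear-norm term.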
Beta Process NMF
Liang, Hoffman
• Automatically choose how many components to use
[...] case bold letters (e.g. x, d, s, and z) denote vectors. $f \in \{1, 2, \cdots, F\}$ is used to index frequency. $t \in \{1, 2, \cdots, T\}$ is used to index time. $k \in \{1, 2, \cdots, K\}$ is used to index dictionary components. BP-NMF is formulated as:
$$X = D(S \odot Z) + E \qquad (1)$$
(a) The selected components learned from single-track instrument. For each instrument, the components are sorted by approximated fundamental frequency. The dictionary is cut off above 5512.5 Hz for visualization purposes.
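To make the role of the binary mask concrete, here is a minimal sketch of the noise-free part of the BP-NMF model, X = D(S ⊙ Z) + E, with ⊙ the element-wise product. It assumes D is F×K and S, Z are K×T; it only shows how a given mask Z switches dictionary components on and off, not how the model infers Z. The function names are hypothetical.

```python
import numpy as np

def bpnmf_reconstruct(D, S, Z):
    """Noise-free part of X = D(S ⊙ Z) + E; ⊙ is the element-wise product."""
    return D @ (S * Z)

def active_components(Z):
    """Indices k whose mask row Z[k, :] is switched on in at least one frame."""
    return np.flatnonzero(Z.any(axis=1))
```

Components whose mask row is all zeros contribute nothing to the reconstruction, which is how the number of components in use can end up smaller than K.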
Music Complexity
Colin Raffel
• How can we capture musical patterns in the
Million Song Dataset?
• Network analysis of quantized simultaneities
after Serrà et al. 2012
from Serrà, Corral, Boguña, Haro, & Arcos, 2012
Large-Scale Cover Recognition 1
Thierry Bertin-Mahieux
• How can we find covers in 1M songs?
@ 1 sec / comparison, one search = 11.5 CPU-days
full N² mining = 16,000 CPU-years
• Need a hashing technique
landmark-based description of chroma patches
Euclidean space projection?
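The cost figures above can be checked with a few lines of arithmetic. The sketch assumes 1,000,000 songs, 1 second per comparison, and that the full mining counts each unordered pair of songs once; the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

def search_cost_days(n_songs, sec_per_comparison=1.0):
    """CPU-days to compare one query against every song in the collection."""
    return n_songs * sec_per_comparison / SECONDS_PER_DAY

def all_pairs_cost_years(n_songs, sec_per_comparison=1.0):
    """CPU-years to compare every unordered pair of songs once."""
    n_pairs = n_songs * (n_songs - 1) // 2
    return n_pairs * sec_per_comparison / SECONDS_PER_YEAR
```

With `n_songs = 1_000_000` the first function gives roughly 11.6 days and the second roughly 15,800 years, in line with the rounded figures on the slide.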
Large-Scale Cover Recognition 2
Thierry Bertin-Mahieux
• 2D Fourier Transform Magnitude (2DFTM)
fixed-size feature to capture “essence” of chromagram
• First results on finding covers in 1M songs
Method          Average rank   meanAP
random          500,000        0.000
jumpcodes 2     308,369        0.002
2DFTM (50 PC)   137,117        0.020
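A minimal sketch of the 2DFTM idea: taking the magnitude of the 2-D Fourier transform of a chroma patch discards phase, so circular shifts along the pitch-class axis (transposition) and along the time axis leave the feature unchanged. The patch size used in the check is an arbitrary assumption, and the subsequent reduction to 50 principal components is not shown.

```python
import numpy as np

def two_d_ftm(chroma_patch):
    """Magnitude of the 2-D Fourier transform of a (12 x n_beats) chroma patch."""
    return np.abs(np.fft.fft2(chroma_patch))
```

The invariance is exact only for circular shifts within the patch; a time offset within a real song is not circular, so there the invariance is approximate.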
Jazz Discography Project
• How can MIR help organize jazz collections?
our tools are quite genre-specific
e.g. beat tracker is fine for pop, useless for jazz
Local Tagging
• MFCC-statistics classifiers on 5 sec windows
trained from MajorMiner data
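As a sketch of the feature side of this setup, the function below summarizes a precomputed frame-level feature matrix (for example MFCCs) over consecutive 5-second windows. Using the per-window mean and standard deviation is an assumption, since the slide does not say which statistics are used; the MFCC extraction and the classifiers themselves are not shown, and the function name is hypothetical.

```python
import numpy as np

def window_statistics(features, frame_rate, window_sec=5.0):
    """Mean and std of each feature dimension over consecutive windows.

    features: (n_frames, n_dims) array, e.g. MFCCs computed elsewhere.
    Returns (n_windows, 2 * n_dims); a trailing partial window is dropped.
    """
    frames_per_window = int(round(window_sec * frame_rate))
    n_windows = features.shape[0] // frames_per_window
    trimmed = features[: n_windows * frames_per_window]
    blocks = trimmed.reshape(n_windows, frames_per_window, -1)
    return np.hstack([blocks.mean(axis=1), blocks.std(axis=1)])
```

Each output row could then be fed to one classifier per tag, giving a tag activation for every 5-second window.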
[Figure: “01 Soul Eyes”: spectrogram (freq / Hz vs. time / s) with per-window classifier outputs for 50 tags (_90s, club, trance, ..., male, guitar, drum)]
Onset Correlation
Brian McFee
• “Ahead of” or “behind” the beat?
[Figure panels: Tony Williams, Elvin Jones]
Structural Similarity
Diego Silva
Helene Papadopoulos
• Self-similarity shows repeating structure in music
• Can we find similar pieces by finding similar structures?
from Bello 2011
[Figure: “Fig. 5. Comparison of recurrence plots for two performances of W. A. Mozart’s Symphony # 40, movement 3. The figures illustrate how beat-tracking [...]” (caption truncated)]
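One simple way to see repeating structure is a self-similarity matrix over frame-level (or beat-level) features. The sketch below uses cosine similarity, which is an assumption: Bello 2011 works with recurrence plots, which are not reproduced here. The function name is hypothetical.

```python
import numpy as np

def self_similarity(features):
    """Cosine similarity between every pair of frames; features is (n_frames, n_dims)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    return unit @ unit.T
```

A section that repeats later in the piece shows up as a stripe parallel to the main diagonal, offset by the time between the two occurrences.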
Ordinal LDA Segmentation
McFee
• Low-rank decomposition of skewed self-similarity to identify repeats
• Learned weighting of multiple factors to segment
Linear Discriminant Analysis between adjacent segments
[Figure panels: Self-similarity (Beat vs. Beat); Skewed self-sim. (Lag vs. Beat); Filtered self-sim. (Lag vs. Beat); Latent repetition (Factor vs. Beat)]
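The skewing step can be sketched as a re-indexing of the self-similarity matrix from (beat, beat) to (lag, beat), so that diagonal stripes caused by repeats become horizontal lines. This is a minimal version under the assumption that lags are not wrapped circularly and out-of-range entries are left at zero; the filtering and low-rank decomposition stages are not shown, and the function name is hypothetical.

```python
import numpy as np

def skew_self_similarity(R):
    """Re-index an (n x n) self-similarity matrix by lag.

    Returns a (2n - 1, n) array with skewed[lag + n - 1, t] = R[t, t + lag]
    for every lag that keeps t + lag inside the matrix, and 0 elsewhere.
    """
    n = R.shape[0]
    skewed = np.zeros((2 * n - 1, n))
    for t in range(n):
        # Column t holds row t of R, shifted so that lag 0 sits at row n - 1.
        skewed[n - 1 - t : 2 * n - 1 - t, t] = R[t, :]
    return skewed
```

A repeat after P beats then appears as a run of high values along the single row for lag P, which is a structure that a low-rank decomposition can capture compactly.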
Lyric Recognition
Matt McVicar
• Speech Recognition for Songs
lots of interference
atypical speech
[Figure panels: Polyphonic Audio, Acapella Audio, Natural Speech, Synthesized Speech; Frequency (kHz) vs. Time (seconds)]
Figure 1: Comparison of vocal types used in this paper, example clip ‘This Love’, Levine-Carmichael. Top row: full polyphonic audio (including vocals, two electric guitars, bass guitar, piano and drums), Acapella audio (voice only). Bottom row: Natural speech performed by the authors, synthesized speech using the ‘say’ command in Mac OS X.
Singing ASR
McVicar
• Speech recognition adapted to singing
needs aligned data
• Align scraped “acapellas” and full mix
including jumps!
“Remixavier”
Raffel
• Optimal align-and-cancel of mix and acapella
timing and channel may differ
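A heavily simplified sketch of align-and-cancel: it assumes the timing difference is a single integer delay and the channel difference is a single scalar gain, both much cruder than what a real mix would need. It picks the delay that maximizes the cross-correlation, fits the gain by least squares, and subtracts. The function names are hypothetical and this is not the Remixavier implementation.

```python
import numpy as np

def delay(x, lag):
    """Delay x by `lag` samples (advance if negative), zero-filling the gap."""
    y = np.zeros_like(x)
    if lag >= 0:
        y[lag:] = x[: len(x) - lag]
    else:
        y[:lag] = x[-lag:]
    return y

def align_and_cancel(mix, acapella, max_lag):
    """Estimate integer delay and scalar gain of `acapella` in `mix`, then subtract."""
    lags = range(-max_lag, max_lag + 1)
    best_lag = max(lags, key=lambda lag: abs(np.dot(mix, delay(acapella, lag))))
    aligned = delay(acapella, best_lag)
    gain = np.dot(mix, aligned) / np.dot(aligned, aligned)
    return best_lag, gain, mix - gain * aligned
```

The returned residual approximates the accompaniment. Handling a drifting time offset or a frequency-dependent channel would need a time-varying alignment and a filter in place of the scalar gain.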
Million Song Dataset
Bertin-Mahieux
McFee
• Many Facets
Echo Nest audio features + metadata
Echo Nest “taste profile” user-song-listen count
Second Hand Song covers
musiXmatch lyric BoW
last.fm tags
• Now with audio?
resolving artist / album / track / duration against what.cd
MIDI-to-MSD
Raffel
Shi
• Aligned MIDI to Audio is a nice transcription
De-DTMF
• Problem: Stationary tones confuse speech detector
Adaptively filter sinusoids with steady amplitude
Input audio → Framing → LPC fit → Find roots → Transform radii → Map to zeros → Add poles → Filter audio frames → Overlap-add → Output audio
[Figure panels for tcp_d1_02_counting_cia_irdial: Spectrum and LPC fit (Gain / dB vs. Freq / Hz); LPC poles and LPC poles detail (Imaginary Part vs. Real Part); Mapped radius; Transformed filter and Transformed filter detail (Imaginary Part vs. Real Part); Filter response & spectrum (Gain / dB vs. Freq / Hz); Input audio, Filtered signal, and Output audio (Frequency vs. Time)]
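The per-frame filter design can be sketched roughly as follows, under several assumptions that the slide does not spell out: the LPC fit uses a least-squares (covariance-style) predictor, the radius transform is a hard threshold that keeps only poles close to the unit circle and moves them onto it, and the added poles sit at the same angles with a fixed smaller radius. The threshold values and function names are hypothetical. Framing, applying the filter to each frame, and overlap-add are left out.

```python
import numpy as np

def lpc_poles(frame, order):
    """Least-squares linear prediction on one frame; returns the model poles."""
    # Row for sample n holds x[n-1], x[n-2], ..., x[n-order].
    rows = [frame[n - order : n][::-1] for n in range(order, len(frame))]
    coeffs, *_ = np.linalg.lstsq(np.array(rows), frame[order:], rcond=None)
    # Prediction-error polynomial: z^p - c0 z^(p-1) - ... - c(p-1)
    return np.roots(np.concatenate(([1.0], -coeffs)))

def notch_from_poles(poles, radius_threshold=0.98, pole_radius=0.95):
    """Turn near-unit-circle LPC poles into notch-filter (b, a) coefficients."""
    tonal = poles[np.abs(poles) >= radius_threshold]
    zeros = np.exp(1j * np.angle(tonal))  # assumed radius transform: onto the unit circle
    b = np.atleast_1d(np.real(np.poly(zeros)))                # zeros cancel steady tones
    a = np.atleast_1d(np.real(np.poly(pole_radius * zeros)))  # poles keep notches narrow
    return b, a

def gain_at(b, a, freq_hz, sample_rate):
    """Magnitude response of the filter b/a at one frequency."""
    z = np.exp(2j * np.pi * freq_hz / sample_rate)
    return abs(np.polyval(b, z) / np.polyval(a, z))
```

For a frame containing a steady tone, the resulting response has a deep notch at the tone frequency and stays close to unity gain away from it, so speech energy elsewhere in the spectrum is largely preserved.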
Pitch-based Filtering
• Resample to flatten pitch, then filter
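A minimal sketch of the resampling step, assuming a pitch track with one f0 value per sample is already available (pitch tracking itself is not shown). The signal is time-warped with linear interpolation so that every pitch period takes the same number of samples. The function name is hypothetical.

```python
import numpy as np

def flatten_pitch(x, f0, sample_rate, target_f0):
    """Time-warp x so that the pitch track f0 (Hz, per sample) becomes target_f0."""
    t = np.arange(len(x)) / sample_rate
    # Pitch periods elapsed before each sample (running integral of f0).
    cycles = np.concatenate(([0.0], np.cumsum(f0[:-1]))) / sample_rate
    n_out = int(cycles[-1] / target_f0 * sample_rate)
    uniform_cycles = np.arange(n_out) * target_f0 / sample_rate
    # Time at which each uniformly spaced cycle count is reached.
    warped_t = np.interp(uniform_cycles, cycles, t)
    return np.interp(warped_t, t, x)
```

After warping, the harmonics sit at fixed multiples of `target_f0`, so a fixed filter can select or remove them; the inverse warp would then restore the original timing.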
Summary
• Signal Separation
NMF, RPCA, cancellation, filtering
• Music Information
Beat tracking, segmentation
Large datasets
Indexing & retrieval
• Speech
Lyric recognition
Speech detection & enhancement
References
[Bello 2011] J P Bello, “Measuring structural similarity in
music”, IEEE Tr. Audio, Speech, & Lang., 19(7): 2013-2025,
2011.
[Serrà et al. 2012] J Serrà, A Corral, M Boguña, M Haro, & J
Arcos, “Measuring the evolution of contemporary western
popular music”, Scientific Reports, 2:521, 2012.