Sound Analysis Research at Lab ROSA 1.

advertisement
Sound Analysis Research
at LabROSA
Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Eng., Columbia Univ., NY USA
dpwe@ee.columbia.edu
http://labrosa.ee.columbia.edu/
1. Speech
2. Music
3. Environmental Sound
LabROSA Overview - Dan Ellis
2005-04-01 - 1 /17
LabROSA Overview
Information
Extraction
Music Eigenrhythms
Machine Meeting
Learning turns
Environment
Personal
audio
FDLP Signal
Processing
Speech
LabROSA Overview - Dan Ellis
2005-04-01 - 2 /17
1. Speech Analysis / Recognition
with Sambarta Bhattacharjee
recognizers work for read speech
• Speech
poorly for spontaneous
e.g. 5% errors ➝ 30%
• Transform spontaneous speech to read?
Spont speech pole freq
Read
20
15
10
5
50
100
150
200
250
300
350
400
450
500
550
Spontaneous
20
15
10
5
50
100
150
200
250
300
350
400
450
LabROSA Overview - Dan Ellis
500
550
slope < 1
➝ reduction
Read speech pole freq
2005-04-01 - 3 /17
Meeting Recordings
with Jerry Liu and ICSI
• Multi-mic recordings for speaker turns
every voice reaches every mic... (?)
... but with differing coupling filters (delays, gains)
• Find turns with minimal assumptions
e.g. ad-hoc sensor setups (multiple PDAs)
differences to remove effect of source signal
- no spectral models, < 1xRT
LabROSA Overview - Dan Ellis
2005-04-01 - 4 /17
Speaker Turns from Timing Diffs
• Find best timing skew between mic pairs
• Find clusters in high-confidence points
Gaussians to each cluster,
• Fit
assign that class to all frames within radius
ICSI0: good points
All pts: nearest class
All pts: closest dimension
0
0
0
-20
-20
-20
-40
-40
-40
-60
-60
-60
-80
-80
-80
-100
-100
-50
0
-100
-100
LabROSA Overview - Dan Ellis
-50
0
-100
-100
-50
0
2005-04-01 - 5 /17
2. Music Signal Analysis
• A lot of music data available
e.g. 60G of MP3
≈ 1000 hr of audio/15k tracks
• What can we do with it?
implicit definition of ‘music’
• Quality vs. quantity
Speech recognition lesson:
10x data, 1/10th annotation, twice as useful
• Motivating Applications
music similarity / classification
computer (assisted) music generation
insight into music
LabROSA Overview - Dan Ellis
2005-04-01 - 6 /17
Transcription as Classification
with Graham Poliner
• Signal models typically used for transcription
harmonic spectrum, superposition
• But ... trade domain knowledge for data
transcription as pure classification problem:
Audio
Trained
classifier
p("C0"|Audio)
p("C#0"|Audio)
p("D0"|Audio)
p("D#0"|Audio)
p("E0"|Audio)
p("F0"|Audio)
single N-way discrimination for “melody”
per-note classifiers for polyphonic transcription
LabROSA Overview - Dan Ellis
2005-04-01 - 7 /17
Classifier Transcription Results
• Trained on MIDI syntheses (32 songs)
SMO SVM (Weka)
• Tested on ISMIR MIREX 2003 set
foreground/background separation
Frame-level pitch concordance
system
“jazz3”
overall
fg+bg
71.5%
44.3%
just fg
56.1%
45.4%
LabROSA Overview - Dan Ellis
2005-04-01 - 8 /17
Eigenrhythms: Drum Pattern Space
with John Arroyo
• Pop songs built on repeating “drum loop”
bass drum, snare, hi-hat
small variations on a few basic patterns
• Eigen-analysis (PCA) to capture variations?
by analyzing lots of (MIDI) data
• Applications
music categorization
“beat box” synthesis
LabROSA Overview - Dan Ellis
2005-04-01 - 9 /17
Eigenrhythms
20+ Eigenvectors for good coverage
• Need
of 100 training patterns (1200 dims)
• Top patterns:
LabROSA Overview - Dan Ellis
2005-04-01 - 10/17
Eigenrhythms for Classification
• Projections in Eigenspace / LDA space
PCA(1,2) projection (16% corr)
LDA(1,2) projection (33% corr)
10
6
blues
4
country
disco
hiphop
2
house
newwave
rock 0
pop
punk
-2
rnb
5
0
-5
-10
-20
-10
0
10
-4
-8
-6
-4
-2
• 10-way Genre classification (nearest nbr):
PCA3: 20% correct
LDA4: 36% correct
LabROSA Overview - Dan Ellis
2005-04-01 - 11/17
0
2
3. Other Sounds: Clap Detection
clapping may
• Rhythmic
help neural development
with Nathan Lesser
sensori-motor planning
focus and attention
metronome”
• “Interactive
devices
give feedback on synchrony
sensor-based
• Classroom deployment?
acoustic-based?
for multiple simultaneous users??
LabROSA Overview - Dan Ellis
from interactivemetronome.com
2005-04-01 - 12/17
burst for
• Initial
near-field
“direct sound”
Far-field (327MUDD ff50:4)
0.25
0
-0.25
-0.5
energy (4ms) / dB
reverberation
(RT60 ~ 900ms)
Near-field (327MUDD nf50:4)
0.5
0
-20
-40
-60
-80
freq / kHx
level
• Absolute
varies
slopes
• Decay
~ same
amplitude
Clap Range Discrimination
10
8
6
4
2
0
0
LabROSA Overview - Dan Ellis
0.1
0.2
0.3
0.4
0.5
0.6
0.7
time / s
0
0.1
0.2
0.3
0.4
2005-04-01 - 13/17
0.5
0.6
0.7
time / s
“Personal Audio”
• Easy to record everything you hear
~100GB / year @ 64 kbps
• Very hard to find anything
with Keansub Lee
how to scan?
how to visualize?
how to index?
• Starting point: Collect data
~ 60 hours (8 days, ~7.5 hr/day)
hand-mark 139 segments (26 min/seg avg.)
assign to 16 classes (8 have multiple instances)
LabROSA Overview - Dan Ellis
2005-04-01 - 14/17
Features for Long Recordings
• Feature frames = 1 min (not 25 ms!)
• Characterize variation within each frame...
Normalized Energy Deviation
Average Linear Energy
120
15
100
10
80
15
40
10
20
5
5
dB
Average Log Energy
60
dB
Log Energy Deviation
120
15
100
10
80
20
freq / bark
20
freq / bark
60
20
freq / bark
freq / bark
20
5
15
15
10
10
5
5
60
dB
dB
Spectral Entropy Deviation
Average Spectral Entropy
0.9
0.8
15
0.7
10
5
•
0.6
0.5
bits
20
freq / bark
freq / bark
20
0.5
15
0.4
10
0.3
0.2
5
0.1
50
100
150
200
250
300
350
400
450
time / min
and structure within coarse auditory bands
LabROSA Overview - Dan Ellis
2005-04-01 - 15/17
bits
Personal Audio Applications
/
• Visualization
browsing /
diary inference
link in other
information sources
- diary
- email
• NoteTaker interface:
“what was I hearing?”
LabROSA Overview - Dan Ellis
2005-04-01 - 16/17
LabROSA Summary
• LabROSA
signal processing
+ machine learning
+ information extraction
• Applications
Speech: Recognition, Organization
Music: Transcription, Recommendation
Environment: Detection, Description
• Also...
signal separation, compression, dolphins...
LabROSA Overview - Dan Ellis
2005-04-01 - 17/17
Download