Sound Analysis Research at LabROSA

Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Eng., Columbia Univ., NY USA
dpwe@ee.columbia.edu
http://labrosa.ee.columbia.edu/
LabROSA Overview - Dan Ellis - 2005-04-01

1. Speech
2. Music
3. Environmental Sound

LabROSA Overview
[Diagram labels: Information Extraction, Machine Learning, Signal Processing; Music, Environment, Speech; Eigenrhythms, Meeting turns, Personal audio, FDLP]

1. Speech Analysis / Recognition
with Sambarta Bhattacharjee
• Speech recognizers work for read speech, poorly for spontaneous
  e.g. 5% errors ➝ 30%
• Transform spontaneous speech to read?
[Figure: pole-frequency tracks for "Read" and "Spontaneous" speech; scatter of spont speech pole freq against read speech pole freq, annotated "slope < 1 ➝ reduction"]

Meeting Recordings
with Jerry Liu and ICSI
• Multi-mic recordings for speaker turns
  every voice reaches every mic... (?)
  ... but with differing coupling filters (delays, gains)
• Find turns with minimal assumptions
  e.g. ad-hoc sensor setups (multiple PDAs)
  differences to remove effect of source signal
  - no spectral models, < 1xRT

Speaker Turns from Timing Diffs
• Find best timing skew between mic pairs
• Find clusters in high-confidence points
• Fit Gaussians to each cluster, assign that class to all frames within radius
[Figure: three scatter panels, "ICSI0: good points", "All pts: nearest class", "All pts: closest dimension"; both axes run from -100 to 0]

2. Music Signal Analysis
• A lot of music data available
  e.g. 60G of MP3 ≈ 1000 hr of audio/15k tracks
• What can we do with it?
  implicit definition of ‘music’
• Quality vs. quantity
  Speech recognition lesson: 10x data, 1/10th annotation, twice as useful
• Motivating Applications
  music similarity / classification
  computer (assisted) music generation
  insight into music

Transcription as Classification
with Graham Poliner
• Signal models typically used for transcription
  harmonic spectrum, superposition
• But ... trade domain knowledge for data
  transcription as pure classification problem:
  Audio ➝ Trained classifier ➝ p("C0"|Audio), p("C#0"|Audio), p("D0"|Audio), p("D#0"|Audio), p("E0"|Audio), p("F0"|Audio)
  single N-way discrimination for “melody”
  per-note classifiers for polyphonic transcription

Classifier Transcription Results
• Trained on MIDI syntheses (32 songs)
  SMO SVM (Weka)
• Tested on ISMIR MIREX 2003 set
  foreground/background separation

  Frame-level pitch concordance
    system     “jazz3”    overall
    fg+bg      71.5%      44.3%
    just fg    56.1%      45.4%

Eigenrhythms: Drum Pattern Space
with John Arroyo
• Pop songs built on repeating “drum loop”
  bass drum, snare, hi-hat
  small variations on a few basic patterns
• Eigen-analysis (PCA) to capture variations?
  by analyzing lots of (MIDI) data
• Applications
  music categorization
  “beat box” synthesis

Eigenrhythms
• Need 20+ Eigenvectors for good coverage of 100 training patterns (1200 dims)
• Top patterns:
[Figure: top eigenrhythm patterns]

Eigenrhythms for Classification
• Projections in Eigenspace / LDA space
[Figure: two scatter panels, "PCA(1,2) projection (16% corr)" and "LDA(1,2) projection (33% corr)"; legend: blues, country, disco, hiphop, house, newwave, rock, pop, punk, rnb]
• 10-way Genre classification (nearest nbr):
  PCA3: 20% correct
  LDA4: 36% correct

3. Other Sounds: Clap Detection
with Nathan Lesser
• Rhythmic clapping may help neural development
  sensori-motor planning
  focus and attention
• “Interactive metronome” devices give feedback on synchrony
  sensor-based
• Classroom deployment?
  acoustic-based?
  for multiple simultaneous users??
[Image from interactivemetronome.com]

Clap Range Discrimination
• Initial burst for near-field “direct sound”
• Absolute level varies
• Decay slopes ~ same
[Figure: "Far-field (327MUDD ff50:4)" and "Near-field (327MUDD nf50:4)" panels showing amplitude, energy (4ms) / dB, and freq / kHz against time / s; annotation: reverberation (RT60 ~ 900ms)]

“Personal Audio”
with Keansub Lee
• Easy to record everything you hear
  ~100GB / year @ 64 kbps
• Very hard to find anything
  how to scan?
  how to visualize?
  how to index?
• Starting point: Collect data
  ~ 60 hours (8 days, ~7.5 hr/day)
  hand-mark 139 segments (26 min/seg avg.)
  assign to 16 classes (8 have multiple instances)

Features for Long Recordings
• Feature frames = 1 min (not 25 ms!)
• Characterize variation within each frame...
• and structure within coarse auditory bands
[Figure: six panels of freq / bark against time / min: Average Linear Energy, Normalized Energy Deviation, Average Log Energy (dB), Log Energy Deviation (dB), Average Spectral Entropy (bits), Spectral Entropy Deviation (bits)]

Personal Audio Applications
• Visualization / browsing / diary inference
  link in other information sources
  - diary
  - email
• NoteTaker interface: “what was I hearing?”

LabROSA Summary
• LabROSA
  signal processing + machine learning + information extraction
• Applications
  Speech: Recognition, Organization
  Music: Transcription, Recommendation
  Environment: Detection, Description
• Also... signal separation, compression, dolphins...
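Sketch for "Speaker Turns from Timing Diffs": the first step on that slide, finding the best timing skew between a pair of mics, could look roughly like the following. This is a minimal illustration and not the method used in the project. The function name `best_skew`, the brute-force lag search, and the use of the normalized correlation peak as a confidence score are all assumptions made here.

```python
import random


def best_skew(x, y, max_lag):
    """Return (lag, score) for the lag, in samples, that maximizes the
    normalized cross-correlation between x[n] and y[n + lag].

    A positive lag means y is a delayed copy of x. The score (at most 1)
    is used here as a hypothetical confidence value.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Pair x[n] with y[n + lag] over the overlapping region only.
        pairs = [(x[n], y[n + lag]) for n in range(len(x))
                 if 0 <= n + lag < len(y)]
        if not pairs:
            continue
        num = sum(a * b for a, b in pairs)
        den = (sum(a * a for a, _ in pairs)
               * sum(b * b for _, b in pairs)) ** 0.5
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


# Toy example: mic 2 hears the same source 3 samples later at half the gain.
rng = random.Random(0)
mic1 = [rng.gauss(0.0, 1.0) for _ in range(200)]
mic2 = [0.0] * 3 + [0.5 * v for v in mic1[:-3]]
lag, score = best_skew(mic1, mic2, max_lag=10)
print(lag, round(score, 3))
```

Because the score is normalized, the gain difference between the mics does not affect it; only the delay does. Frames with a high score would be candidates for the "high-confidence points" that the slide clusters next.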
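Sketch for "Features for Long Recordings": the panel titles on that slide suggest per-band statistics computed over each 1-minute frame. The slides do not give the exact definitions, so the code below is one plausible reading. The function names `band_stats` and `spectral_entropy_bits`, the use of population standard deviation, and the definition of normalized deviation as standard deviation over mean are assumptions.

```python
import math
from statistics import mean, pstdev


def band_stats(energies):
    """Summarize one band's short-time linear energies within a long frame.

    `energies` must be strictly positive (log of zero is undefined).
    """
    avg = mean(energies)
    log_e = [10.0 * math.log10(e) for e in energies]
    return {
        "avg_linear_energy": avg,
        # Assumed definition: standard deviation divided by the mean.
        "norm_energy_dev": pstdev(energies) / avg,
        "avg_log_energy_db": mean(log_e),
        "log_energy_dev_db": pstdev(log_e),
    }


def spectral_entropy_bits(magnitudes):
    """Entropy, in bits, of the magnitudes in one band treated as a
    probability distribution (zero-valued bins contribute nothing)."""
    total = sum(magnitudes)
    probs = [m / total for m in magnitudes if m > 0]
    return -sum(p * math.log2(p) for p in probs)


# Toy example: a steady band and a bursty band with the same mean energy.
steady = band_stats([1.0, 1.0, 1.0, 1.0])
bursty = band_stats([0.1, 0.1, 0.1, 3.7])
print(steady["norm_energy_dev"], bursty["norm_energy_dev"])
print(spectral_entropy_bits([1.0, 1.0, 1.0, 1.0]))
```

Under this reading, "Average Spectral Entropy" and "Spectral Entropy Deviation" would be the mean and standard deviation of `spectral_entropy_bits` across the short-time spectra that fall within each 1-minute frame.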
