Audio Lifelogging Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA ! dpwe@ee.columbia.edu 1. 2. 3. 4. Audio Lifelogging - Dan Ellis http://labrosa.ee.columbia.edu/ Audio Lifelogging Speech Environmental Sound Analysis Medical Applications 2014-01-29 1 /6 Personal Audio Lifelogs ! ! • Easy to record everything you hear ~250GB / year @ 64 kbps ! ! • Very hard to find anything how to scan? how to visualize? how to index? Audio Lifelogging - Dan Ellis 2014-01-29 2 /6 Datasets and tasks Speech Recognition E’ noise backgrounds Transcripts of discussions would be useful… • noise backgrounds recorded in a family home (living room). distant-mic / noisy ASR still quite limited sources, well-defined application domain with a learnable CHiME (Comp. Hearing in Track Multisource 2 resultsEnv.s): Results cabulary’ and ‘grammar’. 80 70 Word error rate (%) 60 Results Track 2 results 80 70 Word error rate (%) 60 50 40 40 30 20 10 30 20 10 A A TU TU FB M 50 0 −6 ASR Baseline (reverb) ASR Baseline (noisy) TU Tampere & KU Leuven TU Munich, TUT, KUL & BMW FBK−Irst & INESC−ID Mitsubishi Electric −3 0 3 SNR (dB) 6 9 14 h of audio in 0.5 to 1.5 h sessions over Barker several et al.Best ’13 weeks. 0 system: spatial enhancement, MLLT, SAT, LD −6 −3 0 3 6 9 SNR (dB) http://spandh.dcs.shef.ac.uk/chime_workshop/slides/CHiME13_overview.pdf augmentation, bMMIfeature noise-adaptive training, DLM Best system: spatial enhancement, SAT, LDA, f-bMMI, The 2nd ‘CHiME’ Challenge ( MLLT, 01/06/2013 6 / 23 Audio Lifelogging - Dan Ellis The2014-01-29 2nd ‘CHiME’ Challenge3 /6 augmentation, bMMI noise-adaptive training, DLM and MBR decoding Environmental Sound Classification Ellis, Zheng, McDermott ’11 • Classify soundtracks with “texture” features ! ! Sound \x\ Automatic gain control \x\ mel filterbank (18 chans) ! ! • Mixed results… Audio Lifelogging - Dan Ellis FFT \x\ \x\ \x\ \x\ Envelope correlation Histogram Octave bins 0.5,1,2,4,8,16 Hz Modulation energy (18 x 6) mean, var, skew, kurt (18 x 4) Cross-band correlations (318 samples) 2014-01-29 4 /6 Segmentation & Clustering • ! Features from 1 min windows ! • Segmentation by local statistics ! • Cluster across • Lee & Ellis ’04 09:00 09:30 10:00 10:30 11:00 11:30 preschool cafe Ron lecture 12:30 office outdoor group L2 cafe office outdoor lecture outdoor DSP03 compmtg meeting2 13:30 lab 14:00 cafe meeting2 Manuel outdoor office cafe office Mike Arroyo? outdoor Sambarta? 14:30 whole set ! 16:00 Audio Lifelogging - Dan Ellis cafe office 13:00 2004-09-14 preschool 12:00 15:00 Manual labels of each class 2004-09-13 15:30 office office office postlec office Lesser 16:30 17:00 17:30 18:00 outdoor lab cafe 2014-01-29 5 /6 Medical Applications • Tracking behavior & symptoms coughing sleep disturbances • Tracking hospital interactions correlate recordings Audio Lifelogging - Dan Ellis 2014-01-29 6 /6 Privacy freq / kHz scramble audio over 200ms windows freq / kHz • Privacy-preserving features 4 Original (dan+kean-ex.wav) 2 20 0 0 -20 4 Scrambled (200ms wins over 1s) -40 -60 level / dB 2 0 0 2 4 6 8 10 12 14 time / s • Self-audio only augmented-reality earphones/mics Hearium Audio Lifelogging - Dan Ellis 2014-01-29 7 /6 Summary • Personal Audio Lifelogs Too good to waste! ! • Speech & Nonspeech Content Both useful in different ways ! • Applications Personal information, behavior measurement Audio Lifelogging - Dan Ellis 2014-01-29 8 /6