Statistical automatic identification of microchiroptera from echolocation calls

Lessons learned from human automatic speech recognition

Mark D. Skowronski and John G. Harris
Computational Neuro-Engineering Lab
Electrical and Computer Engineering
University of Florida
Gainesville, FL, USA
November 19, 2004

Overview
• Motivations for bat acoustic research
• Review bat call classification methods
• Contrast with 1970s human ASR
• Experiments
• Conclusions

Bat research motivations
• Bats are among:
  – the most diverse,
  – the most endangered,
  – and the least studied mammals.
• 1000 species, ~25% of all mammal species
• Close relationship with insects, agricultural impact, disease vectors
• Acoustical research non-invasive, significant domain (echolocation)
• Simplified biological acoustic communication system (compared to human speech)

Bat echolocation
• Ultrasonic, brief chirps
• Determine range, velocity of nearby objects (clutter, prey, conspecifics)
• Tailored for task, environment
(Image: Tadarida brasiliensis, Mexican free-tailed bat)
(Audio: 10x time-expanded search calls)

Echolocation calls
• Two characteristics
  – Frequency modulated -- range
  – Constant frequency -- velocity
• Features (holistic)
  – Freq. extrema
  – Duration
  – Shape
  – # harmonics
  – Call interval
(Figure: Mexican free-tailed calls, concatenated)

Current classification methods
• Expert sonogram readers
  – Manual or automatic feature extraction
  – Comparison with exemplar sonograms
• Automatic classification
  – Decision trees
  – Discriminant function analysis
  – Artificial neural networks
  – Spectrogram correlation
Parallels the knowledge-based approach to human ASR from the 1970s (acoustic phonetics, expert systems, cognitive approach).

Acoustic phonetics
(Figure: utterance labeled with phonemes DH AH F UH T B AO L G EY EM IH Z OW V ER)
• Bottom up paradigm
  – Frames, boundaries, groups, phonemes, words
• Manual or automatic feature extraction
  – Formants, voicing, duration, intensity, transitions
• Classification
  – Decision tree, discriminant functions, neural network, Gaussian mixture model, Viterbi path

Acoustic phonetics limitations
• Variability of conversational speech
  – Complex rules, difficult to train
• Boundaries difficult to define
  – Coarticulation
• Feature estimates brittle
  – Variable noise robustness
• Hard decisions, errors accumulate
Shifted to information theoretic paradigm of human ASR, better able to account for variability of speech, noise.

Information theoretic ASR
• Data-driven models from computer science
  – Non-parametric: dynamic time warp (DTW)
  – Parametric: hidden Markov model (HMM)
• Frame-based
  – Expert information in feature extraction
  – Models account for feature, temporal variability
Information theoretic ASR dominates state-of-the-art speech understanding systems.
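
The non-parametric DTW idea above can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: it assumes each call is a 1-D track of per-frame frequency estimates, uses absolute difference as the local cost, and omits the endpoint relaxation mentioned later in the deck. The function names, class labels, and frequency values are hypothetical.

```python
from math import inf


def dtw_distance(x, y):
    """Return the cumulative DTW alignment cost between two 1-D sequences."""
    n, m = len(x), len(y)
    # cost[i][j] holds the best cumulative cost of aligning x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])  # assumed local cost: |difference|
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in x only
                cost[i][j - 1],      # advance in y only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def classify(call, templates):
    """Label a call with the class of its nearest template under DTW."""
    return min(templates, key=lambda item: dtw_distance(call, item[1]))[0]


# Hypothetical frequency tracks in kHz, one value per frame.
templates = [
    ("down_sweep", [50.0, 45.0, 40.0, 35.0, 30.0]),
    ("flat", [30.0, 30.0, 30.0, 30.0, 30.0]),
]
query = [50.0, 48.0, 45.0, 41.0, 38.0, 35.0, 31.0, 30.0]
label = classify(query, templates)
```

Because every test call is compared against every stored template, there is no training step but classification cost grows with the number of templates, which matches the "fast training, slow classification" trade-off noted in the conclusions.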

Data collection
• UF Bat House, home to 60,000 bats
  – Mexican free-tailed bat (vast majority)
  – Evening bat
  – Southeastern myotis
• Continuous recording
  – 90 minutes around sunset
  – ~20,000 calls
• Equipment:
  – B&K mic (4939), 100 kHz
  – B&K preamp (2670)
  – Custom amp/AA filter
  – NI 6036E 200 kS/s A/D card
  – Laptop, Matlab

Experiment design
• Designs and assumptions
  – All recorded bats are Mexican free-tailed
  – Calls divided into different intraspecies calls
  – All calls are search phase
  – Hand-labeled call detection is complete (no discarded calls)
• Hand labels
  – Narrowband spectrogram
  – Endpoints, class label
  – 436 calls in 261 0.5-sec sequences (2% of data)
  – Four classes, a priori: 34, 40, 20, 6%
  – All experiments on hand-labeled data only
• Baseline Experiments
  – Features: Fmin, Fmax, Fmax_energy, and duration, from zero crossings and MUSIC
  – Classifier: Discriminant function analysis, quadratic boundaries
• DTW and HMM
  – Frame-based features: fundamental frequency (MUSIC super-resolution estimate), log energy, temporal derivatives (HMM only)
  – DTW: MUSIC frequencies, 10% endpoint range
  – HMM: 5 states/model, 4 Gaussian mixtures/state, diagonal covariances
• Tests
  – Leave one out
  – 75% train, 25% test, 1000 trials
  – Test on train (HMM only)

Results
• Baseline, zero crossing
  – Leave one out: 72.5% correct
  – Repeated trials: 72.5 ± 4% (mean ± std)
• Baseline, MUSIC
  – Leave one out: 79.1%
  – Repeated trials: 77.5 ± 4%
• DTW, MUSIC
  – Leave one out: 74.5%
  – Repeated trials: 74.1 ± 4%
• HMM, MUSIC
  – Test on train: 85.3%

Confusion matrices

Baseline, zero crossing
             1     2     3     4   Correct
  1        107    38     1     2    72.3%
  2         21   134    16     4    76.6%
  3          2    29    57     0    64.8%
  4          4     3     0    18    72.0%
  Overall                           72.5%

Baseline, MUSIC
             1     2     3     4   Correct
  1        110    36     1     1    74.3%
  2         12   149    12     2    85.1%
  3          4    18    66     0    75.0%
  4          3     2     0    20    80.0%
  Overall                           79.1%

DTW, MUSIC
             1     2     3     4   Correct
  1        115    29     0     4    77.7%
  2         32   131    11     1    74.9%
  3          5    20    63     0    71.6%
  4          5     4     0    16    64.0%
  Overall                           74.5%

HMM, MUSIC
             1     2     3     4   Correct
  1        118    25     0     5    79.7%
  2         10   154     5     6    88.0%
  3          1    12    75     0    85.2%
  4          0     0     0    25    100%
  Overall                           85.3%

Conclusions
• Human ASR algorithms applicable to bat echolocation calls
• Experiments
  – Weakness: accuracy of class labels
  – No labeled calls excluded
  – HMM most accurate, undertrained
  – MUSIC frequency estimate robust, slow
• Machine learning
  – DTW: fast training, slow classification
  – HMM: slow training, fast classification

Future work
• Find robust features of bat echolocation calls that match assumptions of machine learning algorithms
  – Noise robust
  – Distribution modeled by Gaussian mixtures
• Use hand-labeled subset of data to create call detection algorithm
• Explore unsupervised learning
  – Self-organized maps
  – Clustering
• Real-time portable detection/classification system on laptop PC

Further information
• http://www.cnel.ufl.edu/~markskow
• markskow@cnel.ufl.edu
• DTW reference:
  – L. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, NJ, 1993.
• HMM reference:
  – L. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” in Readings in Speech Recognition, A. Waibel and K.F. Lee, Eds., pp. 267–296. Kaufmann, San Mateo, CA, 1990.
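
Worked check of the Results percentages. The sketch below recomputes the per-class and overall accuracies from the "Baseline, zero crossing" confusion matrix. It assumes each row holds the calls of one hand-labeled class and that the percentage beside a row is the diagonal count divided by the row total; the deck does not state this, but the printed numbers are consistent with it. The helper names are illustrative.

```python
# Baseline, zero crossing confusion matrix, copied from the Results slide.
# Assumption: row i holds all calls hand-labeled as class i + 1.
matrix = [
    [107, 38, 1, 2],
    [21, 134, 16, 4],
    [2, 29, 57, 0],
    [4, 3, 0, 18],
]


def per_class_accuracy(m):
    """Percent of each row that falls on the diagonal."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(m)]


def overall_accuracy(m):
    """Percent of all calls that fall on the diagonal."""
    total = sum(sum(row) for row in m)
    correct = sum(m[i][i] for i in range(len(m)))
    return 100.0 * correct / total


def class_priors(m):
    """Percent of all calls that belong to each row's class."""
    total = sum(sum(row) for row in m)
    return [100.0 * sum(row) / total for row in m]


per_class = [round(v, 1) for v in per_class_accuracy(matrix)]
overall = round(overall_accuracy(matrix), 1)
priors = [round(v) for v in class_priors(matrix)]
```

The row totals sum to the 436 hand-labeled calls, and the rounded class proportions match the a priori split of 34, 40, 20, 6% given in the experiment design.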
