Automatic detection of microchiroptera echolocation calls from field recordings using machine learning algorithms

Mark D. Skowronski and John G. Harris
Computational Neuro-Engineering Lab
Electrical and Computer Engineering
University of Florida, Gainesville, FL, USA
May 19, 2005

Overview
• Motivations for acoustic bat detection
• Machine learning paradigm
• Detection experiments
• Conclusions

Bat detection motivations
• Bats are among the most diverse yet least studied mammals (~25% of all mammal species are bats).
• Bats affect agriculture and carry diseases (directly or through parasites).
• The acoustic domain is significant for echolocating bats, and acoustic monitoring is non-invasive.
• Recorded data can be voluminous, so automated algorithms are desired for objective and repeatable detection & classification.

Conventional methods
• Conventional bat detection/classification parallels the acoustic-phonetic paradigm of automatic speech recognition (ASR) from the 1970s.
• Characteristics of acoustic phonetics:
– Originally mimicked human expert methods
– First, boundaries between regions are determined
– Second, features for each region are extracted
– Third, features are compared with decision trees or discriminant function analysis (DFA)
• Limitations:
– Boundaries are ill-defined and sensitive to noise
– Many feature extraction algorithms exist, with varying degrees of noise robustness

Machine learning
• Acoustic phonetics gave way to machine learning for ASR in the 1980s.
• Advantages:
– Decisions based on more information
– Mature statistical foundation for the algorithms
– Frame-based features, informed by expert knowledge
– Improved noise robustness
• For bats: increased detection range

Detection experiments
• Database of bat calls
– 7 different recording sites, 8 species
– 1265 hand-labeled calls (from spectrogram readings)
• Detection experiment design
– Discrete events: 20-ms bins
– Discrete outcomes: Yes or No: does a bin contain any part of a bat call?
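The 20-ms binning used in the experiment design can be sketched as follows. This is a minimal illustration only; the interval format, the function name, and the bin width default are assumptions, not taken from the talk:

```python
import numpy as np

def label_bins(call_intervals, duration_s, bin_s=0.020):
    """Label each 20-ms bin Yes/No: does it contain any part of a bat call?

    call_intervals: list of (start_s, end_s) hand-labeled call times
    (hypothetical format). Returns a boolean array, one entry per bin.
    """
    n_bins = int(np.ceil(duration_s / bin_s))
    labels = np.zeros(n_bins, dtype=bool)
    for start, end in call_intervals:
        first = int(start // bin_s)          # first bin the call touches
        last = int(end // bin_s)             # last bin the call touches
        labels[first:last + 1] = True
    return labels
```

With this framing, a detector's per-bin Yes/No outputs can be scored directly against the hand labels.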
Detectors
• Baseline
– Threshold on frame energy
• Gaussian mixture model (GMM)
– Models the probability distribution of call features
– Threshold on model output probability
• Hidden Markov model (HMM)
– Similar to the GMM, but adds temporal constraints through piecewise-stationary states
– Threshold on model output probability along the Viterbi path

Feature extraction
• Baseline
– Normalization: session noise floor set to 0 dB
– Feature: frame power
• Machine learning
– Blackman window, zero-padded FFT
– Normalization: log-amplitude mean subtraction
• From ASR: analogous to cepstral mean subtraction
• Removes the transfer function of the recording environment
• Mean taken across time for each FFT bin
– Features:
• Maximum FFT amplitude, dB
• Frequency at maximum amplitude, Hz
• First and second temporal derivatives (slope, concavity)

Feature extraction examples
Six features: Power, Frequency, ΔP, ΔF, ΔΔP, ΔΔF

Detection example

Experiment results

Conclusions
• Machine learning algorithms improve detection when specificity is high (>0.6).
• The HMM is slightly superior to the GMM and uses more temporal information, but is slower to train and test.
• Hand labels were determined from spectrograms, biasing them toward high-power calls.
• The machine learning models are applicable to other species.

Bioacoustic applications
• To apply machine learning to other species:
– Determine ground-truth training data through expert hand labels
– Extract relevant frame-based features, considering domain-specific noise sources (echoes, propeller noise, other biological sources)
– Train models of the features from the hand-labeled data
– Consider training "silence" models for discriminant detection/classification

Further information
• http://www.cnel.ufl.edu/~markskow
• markskow@cnel.ufl.edu

Acknowledgements
Bat data kindly provided by:
Brock Fenton, U. of Western Ontario, Canada
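The machine-learning feature recipe (Blackman window, zero-padded FFT, log-amplitude mean subtraction, then peak amplitude and peak frequency with temporal derivatives) can be sketched in Python. This is an illustrative sketch, not the talk's implementation; the zero-padding factor, function names, and array layout are assumptions:

```python
import numpy as np

def frame_features(frames, fs):
    """Per-frame spectral features: peak log amplitude (dB, relative after
    mean subtraction) and frequency at the peak (Hz).

    frames: 2-D array, one analysis frame per row; fs: sample rate in Hz.
    """
    n_frames, frame_len = frames.shape
    nfft = 4 * frame_len                        # zero-padded FFT (factor assumed)
    win = np.blackman(frame_len)                # Blackman window
    spec = np.fft.rfft(frames * win, n=nfft, axis=1)
    log_mag = 20.0 * np.log10(np.abs(spec) + 1e-12)
    # Log-amplitude mean subtraction: mean across time for each FFT bin,
    # removing the (stationary) transfer function of the recording chain.
    log_mag -= log_mag.mean(axis=0)
    peak_db = log_mag.max(axis=1)               # maximum FFT amplitude, dB
    peak_hz = log_mag.argmax(axis=1) * fs / nfft  # frequency at the maximum, Hz
    return peak_db, peak_hz

def deltas(x):
    """First temporal derivative (slope); apply twice for concavity."""
    return np.gradient(x)
```

The six features per frame are then peak_db, peak_hz, their first derivatives, and their second derivatives (deltas applied twice).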
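The likelihood-threshold idea behind the GMM detector can be illustrated with a single diagonal-covariance Gaussian standing in for the mixture. Everything here is a sketch under assumptions: the feature values, threshold, and function names are illustrative, not the talk's parameters:

```python
import numpy as np

def fit_gaussian(X):
    """Fit a diagonal-covariance Gaussian to training features (one row per frame)."""
    return X.mean(axis=0), X.var(axis=0)

def log_likelihood(X, mean, var):
    """Per-frame log-likelihood under the diagonal Gaussian call model."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mean) ** 2 / var, axis=1)

def detect(X, mean, var, threshold):
    """Yes/No per frame: does the model's log-likelihood exceed the threshold?"""
    return log_likelihood(X, mean, var) > threshold
```

Sweeping the threshold traces out the sensitivity/specificity trade-off reported in the experiments; the HMM variant thresholds the same kind of score accumulated along the Viterbi path.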