Automatic detection of microchiroptera
echolocation calls from field recordings
using machine learning algorithms
Mark D. Skowronski and John G. Harris
Computational Neuro-Engineering Lab
Electrical and Computer Engineering
University of Florida, Gainesville, FL, USA
May 19, 2005
Overview
• Motivations for acoustic bat detection
• Machine learning paradigm
• Detection experiments
• Conclusions
Bat detection motivations
• Bats are among the most diverse yet least
studied mammals (~25% of all mammal
species are bats).
• Bats affect agriculture and carry diseases
(directly or through parasites).
• Acoustical domain is significant for
echolocating bats and is non-invasive.
• Recorded data can be voluminous →
automated algorithms for objective and
repeatable detection & classification are desired.
Conventional methods
• Conventional bat detection/classification parallels
the acoustic-phonetic paradigm of automatic
speech recognition from the 1970s.
• Characteristics of acoustic phonetics:
– Originally mimicked human expert methods
– First, boundaries between call regions were determined
– Second, features for each region were extracted
– Third, features were compared using decision trees or
discriminant function analysis (DFA)
• Limitations:
– Boundaries ill-defined, sensitive to noise
– Many feature extraction algorithms with varying
degrees of noise robustness
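As a toy illustration of that three-step pipeline (segment, extract per-call features, classify), not a reconstruction of any particular conventional detector; the feature values and labels below are invented for the example:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical per-call features: [duration_ms, min_freq_kHz, max_freq_kHz],
# extracted after call boundaries have already been determined.
calls = np.array([[3.0, 45.0, 95.0],
                  [15.0, 25.0, 55.0],
                  [4.0, 40.0, 90.0],
                  [12.0, 22.0, 50.0]])
labels = ["species_A", "species_B", "species_A", "species_B"]

# Step three of the pipeline: compare features with a decision tree.
tree = DecisionTreeClassifier().fit(calls, labels)
print(tree.predict([[5.0, 42.0, 88.0]]))   # expected: species_A
```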
Machine learning
• Acoustic phonetics gave way to machine
learning for ASR in the 1980s.
• Advantages:
– Decisions based on more information
– Mature statistical foundation for algorithms
– Frame-based features, derived from expert knowledge
– Improved noise robustness
• For bats: increased detection range
Detection experiments
• Database of bat calls
– 7 different recording sites, 8 species
– 1265 hand-labeled calls (from spectrogram
readings)
• Detection experiment design
– Discrete events: 20-ms bins
– Discrete outcomes: Yes or No, does a bin
contain any part of a bat call? (see the sketch below)
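A minimal sketch of how hand-labeled call intervals could be mapped to Yes/No labels per bin; the function and the (start, end) interval format are assumptions, only the 20-ms bin width comes from the slide:

```python
import numpy as np

def bin_labels(call_intervals, duration_s, bin_s=0.02):
    """Mark each 20-ms bin 1 if it contains any part of a call, else 0.

    call_intervals: hand-labeled (start_s, end_s) pairs, in seconds.
    duration_s: total length of the recording, in seconds.
    """
    n_bins = int(np.ceil(duration_s / bin_s))
    labels = np.zeros(n_bins, dtype=int)
    for start, end in call_intervals:
        first = int(start / bin_s)
        last = min(int(end / bin_s), n_bins - 1)
        labels[first:last + 1] = 1   # any overlap counts as "Yes"
    return labels

# Example: two hand-labeled calls in a 1-second recording
labels = bin_labels([(0.105, 0.112), (0.530, 0.541)], duration_s=1.0)
print(labels.nonzero()[0])           # indices of bins flagged "Yes"
```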
Detectors
• Baseline
– Threshold for frame energy
• Gaussian mixture model (GMM)
– Model of probability distribution of call features
– Threshold on model output probability (see the sketch below)
• Hidden Markov model (HMM)
– Similar to GMM, but includes temporal constraints
through piecewise-stationary states
– Threshold for model output probability along Viterbi
path
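A minimal sketch of the GMM detector idea, assuming the six frame features from the next slide have already been computed and saved; the file names, component count, covariance type, and threshold are placeholders rather than settings from the talk. The HMM detector is analogous but scores a frame sequence along its Viterbi path.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical feature arrays: rows are frames, columns are the six
# features described on the feature-extraction slide.
call_frames = np.load("call_features.npy")      # frames hand-labeled as calls
test_frames = np.load("session_features.npy")   # frames from a new recording

# Fit a GMM to call-frame features; the component count is an assumption.
gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
gmm.fit(call_frames)

# Detection: threshold the per-frame log-likelihood. Sweeping the threshold
# traces out the sensitivity/specificity trade-off shown in the results.
log_like = gmm.score_samples(test_frames)
threshold = -20.0                                # assumed operating point
detected = log_like > threshold                  # True where a call frame is declared
```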
Feature extraction
• Baseline
– Normalization: session noise floor at 0 dB
– Feature: frame power
• Machine learning
– Blackman window, zero-padded FFT
– Normalization: log amplitude mean subtraction
• Analogous to cepstral mean subtraction in ASR
• Removes transfer function of recording environment
• Mean across time for each FFT bin
– Features (per frame; see the sketch below):
• Maximum FFT amplitude, dB
• Frequency at maximum amplitude, Hz
• First and second temporal derivatives (slope, concavity)
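A rough sketch of this feature path, assuming the recording has already been cut into frames; the FFT size and the use of np.gradient for the temporal derivatives are assumptions, not the authors' exact settings:

```python
import numpy as np

def frame_features(frames, fs, nfft=1024):
    """Six per-frame features: peak log amplitude, peak frequency,
    and their first and second temporal derivatives.
    frames: array of shape (n_frames, frame_len); fs: sample rate in Hz."""
    window = np.blackman(frames.shape[1])
    spectra = np.fft.rfft(frames * window, n=nfft)        # zero-padded FFT
    log_mag = 20.0 * np.log10(np.abs(spectra) + 1e-12)    # log amplitude, dB
    log_mag -= log_mag.mean(axis=0)                       # mean subtraction per FFT bin, across time
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)

    peak_db = log_mag.max(axis=1)                         # maximum FFT amplitude, dB
    peak_hz = freqs[log_mag.argmax(axis=1)]               # frequency at maximum amplitude, Hz

    d_db, d_hz = np.gradient(peak_db), np.gradient(peak_hz)    # slope
    dd_db, dd_hz = np.gradient(d_db), np.gradient(d_hz)        # concavity
    return np.column_stack([peak_db, peak_hz, d_db, d_hz, dd_db, dd_hz])
```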
Feature extraction examples
Six features: Power, Frequency, ΔP, ΔF, ΔΔP, ΔΔF
Detection example
Experiment results
Conclusions
• Machine learning algorithms improve detection
when specificity is high (> 0.6).
• HMM is slightly superior to GMM and uses more
temporal information, but is slower to train/test.
• Hand labels were determined from spectrograms and are
therefore biased towards high-power calls.
• Machine learning models applicable to other
species.
Bioacoustic applications
• To apply machine learning to other species:
– Determine ground truth training data through
expert hand labels
– Extract relevant frame-based features, considering
domain-specific noise sources (echoes, propeller
noise, other biological sources)
– Train models of features from hand-labeled data
– Consider training “silence” models for discriminant
detection/classification (see the sketch below)
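As one possible realization of the last point, a sketch of discriminant detection with separate call and "silence" GMMs compared by log-likelihood ratio; the file names, component counts, and decision margin are assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical frame-feature files: frames hand-labeled as calls and
# frames labeled as background/"silence" (names are placeholders).
call_frames = np.load("call_features.npy")
silence_frames = np.load("silence_features.npy")

call_model = GaussianMixture(n_components=8, random_state=0).fit(call_frames)
silence_model = GaussianMixture(n_components=8, random_state=0).fit(silence_frames)

def detect(frames, margin=0.0):
    """Flag frames whose call-vs-silence log-likelihood ratio exceeds margin."""
    llr = call_model.score_samples(frames) - silence_model.score_samples(frames)
    return llr > margin
```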
Further information
• http://www.cnel.ufl.edu/~markskow
• markskow@cnel.ufl.edu
Acknowledgements
Bat data kindly provided by:
Brock Fenton, U. of Western Ontario, Canada