SonoBat 3 presentation

Automated species classification with SonoBat 3
SonoBat uses a decision engine based on the
quantitative analysis of approximately 10,000 species-known recordings from across North America.
SonoBat automatically
recognizes and sorts calls, then
processes the calls to extract six
dozen parameters that describe
the time-frequency and time-amplitude trends of a call.
SonoBat’s intelligent call
trending algorithm can
recognize the end of calls
buried in echo and noise.
SonoBat can also successfully establish trends through noise
and from low power signals.
SonoBat generates high
resolution continuous trends
of time-frequency and time-amplitude content that
enable robust parameter
extraction. Inclusion of
amplitude parameters
increases classification
performance above that
achieved by using time-frequency parameters alone.
Example classifier discriminant
functions for Myotis leibii vs. M.
septentrionalis showing the
relative dependence of frequency
and amplitude parameters
selected as important for
discrimination.
SonoBat will classify individual
calls processed as high resolution
standard views.
However, classifying an entire sequence will typically provide
more reliable results as this method benefits from the combined
information within the sequence.
Press this button to classify a currently opened sequence.
Knowing when not to
make a
classification
decision prevents
misclassifications.
SonoBat assesses a number
of signal quality and reliability
indicators, and if any fall below
their acceptable level, the
classification indicator displays
as grayed out to indicate an
unreliable result.
In this example the quality falls below the
default minimum acceptable value of 0.80
(may be changed in the Pref Panel).
SonoBat rejected this example because it
has an overloaded signal (a.k.a. saturated
or clipped). Overloaded signals provide
unreliable amplitude information that could
lead to misclassification.
Rejected calls do not contribute
to sequence decisions.
SonoBat rejected this example
because it has both an overloaded
signal and falls below the default
minimum acceptable quality of 0.80.
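The two rejection conditions shown in these examples can be sketched as a simple gate. This is only an illustration under stated assumptions: the `Call` record, its field names, and `rejection_reasons` are hypothetical, and SonoBat assesses further reliability indicators that are not modeled here. Only the 0.80 default comes from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass

MIN_QUALITY = 0.80  # default minimum acceptable quality stated in the slides


@dataclass
class Call:
    """Hypothetical record for one processed call (not SonoBat's data model)."""
    quality: float    # assumed quality rating between 0 and 1
    overloaded: bool  # True if the signal is saturated (clipped)


def rejection_reasons(call: Call, min_quality: float = MIN_QUALITY) -> list[str]:
    """Return the reasons a call would be rejected; an empty list means accepted."""
    reasons = []
    if call.quality < min_quality:
        reasons.append("quality below minimum")
    if call.overloaded:
        reasons.append("overloaded signal")
    return reasons


# A clipped, low-quality call fails both checks, like the example above.
print(rejection_reasons(Call(quality=0.72, overloaded=True)))
```

Returning the list of reasons, rather than a bare accept/reject flag, mirrors the way the slides explain why each example was rejected.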
SonoBat classifies calls using an
ensemble consensus of hierarchical
decision algorithms and reports a
single species decision if the result
exceeds the discriminant probability
threshold set in the Pref Panel.
If a classification decision
does not exceed the threshold,
then SonoBat displays the
species or hierarchical groups
that sum to the threshold, as
shown in this example.
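One way to picture this reporting rule is the sketch below. It assumes that each candidate species or group carries a discriminant probability and that candidates are accumulated from most to least probable until their sum reaches the threshold. The function name `report_decision`, the input format, and the accumulation order are assumptions for illustration, not SonoBat's documented internals.

```python
from __future__ import annotations


def report_decision(probs: dict[str, float], threshold: float = 0.90) -> list[str]:
    """Return one label if it exceeds the threshold, otherwise the smallest
    set of top-ranked labels whose probabilities sum to the threshold."""
    if not probs:
        return []
    # Rank candidates from most to least probable.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    top_label, top_prob = ranked[0]
    if top_prob > threshold:
        return [top_label]
    reported, total = [], 0.0
    for label, prob in ranked:
        reported.append(label)
        total += prob
        if total >= threshold:
            break
    return reported


# No single species exceeds 0.90, so two candidates are reported together.
print(report_decision({"Mylu": 0.60, "Labo": 0.35, "Epfu": 0.05}))
```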
If the processed data from a
call falls outside of known or
confident data space,
SonoBat will display a
classification result of “none.”
It may also indicate, as in this
example, a call having a trend
that reaches either end of its
std view window. SonoBat
treats this as an incomplete
and thereby unreliable set of
call data and does not venture
a classification.
Should this happen when
manually selecting a call for
std view display and
classification, simply select a
larger std view window and
the call will reprocess.
Note: A more generous
spacing tolerance reduces
truncated call trends during
sequence classifications and
automated batch runs.
In this example, one of the hierarchical
group decisions did not exceed the
discriminant probability threshold required
to advance to the next decision level.
If the decision does not reach the species
level, then SonoBat reports the last level
that did not satisfy the threshold.
Detectors can record only
the strongest portions of
calls from bats partially
out of range, resulting in
call fragments.
(Grayed-out result indicates unreliable result.)
Some call fragments mimic the data
characteristics of other species, which can
result in misclassifications unless recognized
as such. E.g., out-of-range little brown bats will
leave fragments from the body of the call
having a simple curved sweep that mimic red
bat calls.
SonoBat applies secondary logic tests to
classification decisions and rejects calls that do
not meet minimum classification criteria.
Another call later in the same
sequence as the call fragment in
the previous slide.
This call had sufficient criteria for
this species to output a confident
species decision.
Sequence classification
performed on the same
file showing the decision
based on all accepted
calls within the
sequence.
Classifying an entire sequence (i.e., bat pass) typically provides
more confident results than individual call classification as this
method benefits from the combined information within the sequence.
For a sequence classification, SonoBat first ranks the calls in a
sequence based on apparent time-frequency coverage and amplitude
and then classifies the individual calls in descending order of rank up
to the number designated in the preferences for "max # of calls to
consider per file." If any of these calls result in a rejected
classification, SonoBat will move on to the next call in the ranked
order until reaching the "max # of calls to consider per file" or the
end of the available acceptable calls in the file.
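The selection loop described above can be sketched as follows. This is one reading of the text, in which only accepted calls count toward the maximum. The `score` and `is_accepted` callables are hypothetical stand-ins for SonoBat's ranking and rejection logic, and the default of 8 is taken from the preference default given later in the slides.

```python
from typing import Callable, Iterable, List, TypeVar

C = TypeVar("C")


def select_calls(
    calls: Iterable[C],
    score: Callable[[C], float],
    is_accepted: Callable[[C], bool],
    max_calls: int = 8,
) -> List[C]:
    """Walk calls in descending rank, skipping rejected ones, until
    max_calls accepted calls are collected or the calls run out."""
    accepted: List[C] = []
    for call in sorted(calls, key=score, reverse=True):
        if len(accepted) >= max_calls:
            break
        if is_accepted(call):
            accepted.append(call)
    return accepted


# Each hypothetical call is a (rank score, accepted?) pair.
calls = [(0.9, True), (0.8, False), (0.7, True), (0.6, True)]
print(select_calls(calls, score=lambda c: c[0], is_accepted=lambda c: c[1], max_calls=2))
```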
SonoBat then considers the accepted calls by hierarchical group and
species and processes them to generate a mean sequence decision.
SonoBat also determines a
sequence vote decision
based on the individual call
decisions. A majority vote requires a minimum of two calls for the
majority species (except for species like L. noctivagans or T. brasiliensis that
can have few calls/sec) and requires the majority species to have
at least twice as many calls as the sum of the
second and third most prevalent species (if classified).
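The vote rule can be expressed compactly. The sketch below assumes each accepted call has already been reduced to a four-letter species code. The codes "Lano" and "Tabr" for the two low-call-rate species, the function name `vote_decision`, and the handling of empty input are assumptions for illustration.

```python
from __future__ import annotations

from collections import Counter

# Assumed codes for L. noctivagans and T. brasiliensis, which the slides
# exempt from the two-call minimum.
LOW_CALL_RATE_SPECIES = {"Lano", "Tabr"}


def vote_decision(call_labels: list[str], min_calls: int = 2) -> str | None:
    """Return the majority species, or None if the vote conditions fail."""
    counts = Counter(call_labels).most_common()
    if not counts:
        return None
    top, top_n = counts[0]
    # Require at least two calls unless the species is exempt.
    if top_n < min_calls and top not in LOW_CALL_RATE_SPECIES:
        return None
    # The majority must have at least twice as many calls as the
    # second and third most prevalent species combined.
    runners_up = sum(n for _, n in counts[1:3])
    if top_n < 2 * runners_up:
        return None
    return top


print(vote_decision(["Mylu", "Mylu", "Mylu", "Mylu", "Labo"]))
```

With four calls for one species and one for another, the majority holds because 4 is at least twice 1. With three against two, it would not.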
Consensus of sequence vote
and the mean sequence
decision provides the most
confident results. For example, for 1,444 known Northeastern US
recordings, the consensus decision correctly classified 98.5% of
recordings, the mean sequence decision 98.9%, and the agreement of
consensus and mean decisions correctly classified 99.1% of the
recordings. (Acceptance rates were 58.8%, 47.4%, and 47.3%, respectively,
using a 0.90 threshold setting.)
Note: Automated batch runs will only consider
the first default segment of files larger than your
max segment to process setting.
Current file
length.
Max segment to process.
The quality and accuracy of call trending and classification
depend upon the quality of the recorded signals.
Recording from the ground, near flat
surfaces, or through tubes will render
distorted signals such as these.
A std view from the sequence in
the previous slide reveals the
extent of the signal distortion and
how it inhibits call trending and the
recognition of call parameters.
In summary:
garbage in, garbage out.
Although distorted and
noisy calls typically prevent
confident and reliable
classification, SonoBat still attempts to distinguish bat calls
from noise and tallies sequences
with calls during batch
processing to facilitate counting
total bat passes*.
* To best tally passes, run the SonoBatch
utility without first scrubbing files as you
would for manual processing.
SonoBat 3 classification preference settings
Acceptable discriminant
probability threshold for
classification decisions.
Recommended and
default value: 0.90.
In general, this term provides a measure of how closely the data
approach the centroid of the multivariate data space of the group
or species decision. Note the term “probability” in the name. As a
result of species call variability, noise, and recording distortions,
species with similar call morphologies can still occasionally
generate calls and provide recordings that intrude on the centroid
of another species’ data space and result in a high discriminant
probability rating despite an incorrect absolute classification.
SonoBat 3 classification preference settings
Recommended
and default
value: 0.80.
SonoBat will reject calls with a quality rating below this value and not
consider them for sequence classification or for parameterization output.
To decide where to set the acceptable quality for your recordings, you
can observe the quality ratings of calls manually selected for std view
display with analysis enabled.
SonoBat synthesizes the quality rating based on a combination of signal
strength and noise. The rating works for most recording situations;
however, some out-of-range calls in very quiet recording environments
(i.e., bats at a distance from the microphone that will only yield call
fragments) may nevertheless score disproportionately high quality ratings.
SonoBat 3 classification preference settings
Six or more calls will generally
produce more reliable sequence
classifications. Recommended
and default value: 8.
When enabled, SonoBat
will immediately classify
individual calls when
rendered as std views.
Batch processing of files
for classification
decisions and call
parameter extraction.
Press this button to open the SonoBatch panel.
SonoBat will process designated batches of
individual files or folders of files to extract call
parameters or to classify to species.
Set to classify.
Set to
parameterize.
Drop files or folders here
to assemble a batch.
Or push this button
to navigate to files
and folders.
If processing stereo files,
select channel to process.
Dropping a folder will
go one directory level
deep, i.e., a folder of
folders will load all the
files within the
embedded folders.
(Except for files in folders
named “Deleted Files” or
“Scrubbed Files.”)
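The folder-loading behavior can be approximated with a short sketch. It assumes that files directly inside the dropped folder are loaded along with those one level down, and that only `.wav` files are of interest. Both are assumptions, and `gather_batch` is a hypothetical name, not part of SonoBatch.

```python
from __future__ import annotations

from pathlib import Path

# Folder names the slides say are skipped.
EXCLUDED_FOLDERS = {"Deleted Files", "Scrubbed Files"}


def gather_batch(folder: Path, suffix: str = ".wav") -> list[Path]:
    """Collect files in `folder` and in its immediate subfolders,
    skipping excluded folders and anything nested more deeply."""
    files: list[Path] = []
    for entry in sorted(folder.iterdir()):
        if entry.is_file() and entry.suffix.lower() == suffix:
            files.append(entry)
        elif entry.is_dir() and entry.name not in EXCLUDED_FOLDERS:
            # Go exactly one directory level deep.
            files.extend(
                sorted(
                    p for p in entry.iterdir()
                    if p.is_file() and p.suffix.lower() == suffix
                )
            )
    return files
```

Calling `gather_batch(Path("recordings"))` on a folder of nightly subfolders would return the recordings in each subfolder, ignoring any "Deleted Files" or "Scrubbed Files" folders.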
Click for a
do-over.
Clicking on the Directory listing tab will
display the folders designated for processing
and the number of files in each folder.
Total no. of files to process.
Toggles option to append
classification decisions to
the filename when a
decision exceeds the
discriminant probability
threshold. This can
facilitate vetting
classification decisions
after a SonoBatch run.
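The renaming convention amounts to inserting the species code before the file extension. A minimal sketch, with `append_species` as a hypothetical helper name and the filename taken from the slides' own example:

```python
from pathlib import Path


def append_species(filename: str, species_code: str) -> str:
    """Insert '-<species_code>' between the file stem and its extension."""
    path = Path(filename)
    return str(path.with_name(f"{path.stem}-{species_code}{path.suffix}"))


print(append_species("BatSpring-29Jul09-2031,12.wav", "Epfu"))
# prints BatSpring-29Jul09-2031,12-Epfu.wav
```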
E.g., if SonoBat classified the file
"BatSpring-29Jul09-2031,12.wav"
as an Epfu, the SonoBatch processor would
change the filename to
"BatSpring-29Jul09-2031,12-Epfu.wav"
Files already processed.
Batch process in
progress.
Batch progress status.
Total no. of
files to
process.
Example spreadsheet output from SonoBatch run.
Consensus of vote
and mean decision.
Classification
by vote.
Rudimentary bat recognition for
tallying passes.
Mean sequence decision entered
in cell if discriminant probability
exceeds threshold.
Empty cell if decision did
not reach species level.
Example spreadsheet section from SonoBatch run.
The last columns of the output
spreadsheet display in
descending prevalence all calls
in the sequence classified to
species with a DP of at least
0.75.
Individual calls classified to a
species of interest will display
here, even if the sequence did
not meet the conditions for a
sequence decision, or these
may be from a second, less
dominant species in the
sequence. This enables
querying the spreadsheet for
particular species to perform
manual inspection for species
of interest.
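Querying the output for a species of interest could look like the sketch below. The layout is assumed: a tab-delimited table with a filename column followed by decision and per-call species columns. The column names in the sample are invented for illustration and should be replaced with those in an actual SonoBatch output file.

```python
from __future__ import annotations

import csv
import io


def files_with_species(
    table_text: str,
    species: str,
    filename_col: str = "Filename",  # assumed column name
    delimiter: str = "\t",           # assumed delimiter
) -> list[str]:
    """Return filenames of rows in which any other column holds `species`."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter=delimiter)
    hits = []
    for row in reader:
        other_cells = [value for key, value in row.items() if key != filename_col]
        if species in other_cells:
            hits.append(row[filename_col])
    return hits


# Hypothetical three-row table; B.wav has no sequence decision but
# still lists an individual call classified as Epfu.
sample = (
    "Filename\tConsensus\t1st\t2nd\n"
    "A.wav\tMylu\tMylu\tLabo\n"
    "B.wav\t\tEpfu\t\n"
    "C.wav\t\tLabo\tMylu\n"
)
print(files_with_species(sample, "Labo"))
```

Searching every column except the filename catches files where the species appears only as a less dominant entry, which is the use case the slide describes.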