Dynamic Time Warping
Applications and Derivation
Charles Tappert
Seidenberg School of CSIS, Pace University
Dynamic Time Warping (DTW)
non-linear/elastic matching, Viterbi algorithm

Many Applications




Speech recognition
Speech sound alignment
Speech sound generation
Online handwriting recognition



Derivation of a DTW algorithm variant (speech recognition)
A speech utterance is represented as a time sequence of feature vectors
[Figure: example of an utterance as a sequence of feature vectors]


Consider a finite-state-machine model of a speech utterance prototype in which the observable output of each transition between states is an acoustic feature vector, a probabilistic function of the state from which the transition originates
Note: some transitions stretch, and others compress, the sequence of feature vectors produced
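The model figure is not reproduced here, so the following is only a minimal Python sketch under an assumed transition set (one common choice in elastic matching): each new frame of the unknown either stays in the same prototype state (stretching), advances one state, or skips one state (compression). The function `classify_steps` and the step labels are hypothetical names used for illustration.

```python
# Assumed transition set (hypothetical, for illustration): from prototype
# state j, the next frame of the unknown is produced by state
#   j     -> "stretch"  (the prototype is stretched),
#   j + 1 -> "advance"  (one-to-one),
#   j + 2 -> "compress" (one prototype state is skipped).
STEP_NAMES = {0: "stretch", 1: "advance", 2: "compress"}


def classify_steps(states):
    """Label each transition in the sequence of prototype states visited by
    successive frames of the unknown; raise ValueError on an illegal step."""
    labels = []
    for prev, cur in zip(states, states[1:]):
        step = cur - prev
        if step not in STEP_NAMES:
            raise ValueError(f"illegal transition {prev} -> {cur}")
        labels.append(STEP_NAMES[step])
    return labels


# Seven frames of the unknown aligned to a six-state prototype (states 0..5).
print(classify_steps([0, 0, 1, 3, 4, 4, 5]))
# → ['stretch', 'advance', 'compress', 'advance', 'stretch', 'advance']
```

Seven frames cover six states here because state 0 and state 4 are each stretched over two frames while state 2 is skipped.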


Background information
Univariate (one-dimensional) normal density function
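For reference, the standard univariate normal density with mean μ and variance σ² (conventional symbols, not taken from the slide) is:

```latex
p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}
       \exp\!\left[-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right]
```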


Background information
Multivariate normal density function
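For reference, the standard n-dimensional normal density with mean vector μ and covariance matrix Σ (conventional symbols, not taken from the slide) is:

```latex
p(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\,\lvert\Sigma\rvert^{1/2}}
  \exp\!\left[-\tfrac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu})^{\mathsf{T}}\,
  \Sigma^{-1}\,(\mathbf{x}-\boldsymbol{\mu})\right]
```

When Σ is diagonal, the determinant is the product of the variances and the quadratic form becomes a sum of per-dimension terms (x_l − μ_l)²/σ_l².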



In traversing each arc of this model, a feature vector is produced with an assumed underlying normal distribution
where i indexes the unknown, j indexes the prototype, V_i are the feature vectors of the unknown, M_j are the mean feature vectors, and Σ (sigma) is the covariance matrix of the prototype
This statistical characterization of the prototypes would require multiple training repetitions of the vocabulary to be recognized
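To make the statistical characterization concrete, here is a minimal Python sketch of estimating a prototype state's mean M_j and variances from several training repetitions, and of evaluating the observation log-probability of a frame V_i. It assumes a diagonal covariance matrix (anticipating the zero-covariance assumption made later) and assumes the repetitions have already been aligned to prototype states; `estimate_state_stats` and `log_prob` are hypothetical names.

```python
import math


def estimate_state_stats(frames_for_state):
    """Per-dimension mean (M_j) and maximum-likelihood variance of the feature
    vectors that several training repetitions assigned to one prototype
    state j. How those frames were aligned to the state is assumed, not shown."""
    n = len(frames_for_state)
    dims = len(frames_for_state[0])
    mean = [sum(v[d] for v in frames_for_state) / n for d in range(dims)]
    var = [sum((v[d] - mean[d]) ** 2 for v in frames_for_state) / n
           for d in range(dims)]
    return mean, var


def log_prob(v, mean, var):
    """ln N(v; mean, diag(var)): log-probability of feature vector V_i being
    emitted by state j, assuming zero covariance between dimensions."""
    total = 0.0
    for x, m, s2 in zip(v, mean, var):
        total += -0.5 * math.log(2 * math.pi * s2) - (x - m) ** 2 / (2 * s2)
    return total


# Three repetitions contributed these 2-D vectors to one prototype state.
mean, var = estimate_state_stats([[1.0, 2.0], [3.0, 2.5], [2.0, 1.5]])
print(mean, var)
print(log_prob([2.0, 2.0], mean, var))
```

With a single repetition every variance comes out as 0 and `log_prob` would divide by zero, which is one concrete reason the statistical approach needs multiple repetitions of each vocabulary item.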


To find the optimal overall probability of the model (prototype) generating the candidate, we take the maximum of the cumulative probability over all possible paths through the model
Assuming the feature vectors are statistically independent, the best path to any point (i, j) and its probability P(i, j) can be computed, starting with P(0, 0) = Prob(0, 0) and P(i, j) = 0 elsewhere, using the recursion relation
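The recursion relation itself is not reproduced in this text. As a hedged sketch, assume three transitions into point (i, j): from (i-1, j) (stretching), from (i-1, j-1), and from (i-1, j-2) (compression), with hypothetical transition probabilities t_0, t_1, t_2. The recursion is then P(i, j) = Prob(i, j) · max over s of P(i-1, j-s) · t_s, a Viterbi-style best-path computation. The function name `viterbi_score` and the `trans` parameter are illustrative, and Prob(i, j) is passed in as a precomputed table.

```python
def viterbi_score(prob, trans=(1 / 3, 1 / 3, 1 / 3)):
    """Best-path probability P(I-1, J-1) under an assumed transition set.

    prob[i][j] -- Prob(i, j): probability that prototype state j emits frame i
                  of the unknown (precomputed, e.g. from a Gaussian density).
    trans[s]   -- hypothetical transition probability for moving s states per
                  frame: s = 0 stretches, s = 1 advances, s = 2 compresses.
    """
    n_frames, n_states = len(prob), len(prob[0])
    # Initialization: P(0, 0) = Prob(0, 0) and P(i, j) = 0 elsewhere.
    P = [[0.0] * n_states for _ in range(n_frames)]
    P[0][0] = prob[0][0]
    for i in range(1, n_frames):
        for j in range(n_states):
            # Best predecessor among (i-1, j), (i-1, j-1), (i-1, j-2).
            best = max(P[i - 1][j - s] * trans[s] for s in range(3) if j - s >= 0)
            P[i][j] = prob[i][j] * best
    return P[n_frames - 1][n_states - 1]


# Three frames of the unknown against a two-state prototype.
prob = [[0.9, 0.1],
        [0.5, 0.5],
        [0.2, 0.8]]
print(viterbi_score(prob))
```

Products of many probabilities underflow quickly for real utterances, which is one practical reason for moving to logarithms.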


Taking the log of the terms in the previous equation, dropping constant terms, multiplying by -2, and assuming zero covariance terms (a diagonal covariance matrix) yields the recursion relation
where D(i, j) can be regarded as a cumulative distance measure
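Under the same assumed three-transition recursion P(i, j) = Prob(i, j) · max_s [P(i-1, j-s) · t_s], the steps named above can be worked out as follows. Here D(i, j) = -2 ln P(i, j) (so P = 0 becomes D = +∞), l indexes the n feature dimensions, and σ_jl² are the assumed diagonal covariance entries of state j. This is a sketch under that assumption; the slide's exact form may differ.

```latex
% Assumed recursion: P(i,j) = \mathrm{Prob}(i,j)\,\max_{s\in\{0,1,2\}} [P(i-1,j-s)\,t_s]
% -2\ln is decreasing, so the max over predecessors becomes a min:
D(i,j) \equiv -2\ln P(i,j)
  = -2\ln \mathrm{Prob}(i,j) + \min_{s\in\{0,1,2\}} \big[D(i-1,j-s) - 2\ln t_s\big]

% Zero covariance terms: \Sigma_j = \mathrm{diag}(\sigma_{j1}^2,\dots,\sigma_{jn}^2)
-2\ln \mathrm{Prob}(i,j)
  = \sum_{l=1}^{n}\left[\frac{(V_{il}-M_{jl})^{2}}{\sigma_{jl}^{2}} + \ln\sigma_{jl}^{2}\right] + n\ln 2\pi

% Dropping the constant n\ln 2\pi:
D(i,j) = \sum_{l=1}^{n}\left[\frac{(V_{il}-M_{jl})^{2}}{\sigma_{jl}^{2}} + \ln\sigma_{jl}^{2}\right]
       + \min_{s\in\{0,1,2\}} \big[D(i-1,j-s) - 2\ln t_s\big]
```

Dropping n ln 2π is safe because every path takes exactly one step per frame of the unknown, so the constant adds the same total to every path.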



Further, assume equal variances and equal transition probabilities, and include an index k identifying the prototype
d(i, j; k) is the distance between feature vector i of the unknown and feature vector j of prototype k
Note: because the log function is monotonically increasing and changing the sign converts a maximization into a minimization, this distance recursion leads to the same decisions as the probability recursion, apart from the effects of the simplifying assumptions
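With equal variances, the Gaussian term reduces (up to a constant scale factor and an additive constant) to the squared Euclidean distance. Equal transition probabilities add the same constant at every step, and every path takes one step per frame of the unknown, so none of these constants changes which prototype wins. Under the three-transition assumption the recursion becomes D(i, j; k) = d(i, j; k) + min{D(i-1, j; k), D(i-1, j-1; k), D(i-1, j-2; k)}. The minimal Python sketch below implements that recursion and picks the prototype with the smallest cumulative distance. `dtw_distance`, `classify`, and the toy prototypes are hypothetical.

```python
import math


def sq_euclidean(u, v):
    """d(i, j; k): squared Euclidean distance between two feature vectors,
    which is what the Gaussian term reduces to under equal variances."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def dtw_distance(unknown, prototype):
    """Cumulative distance D(I-1, J-1; k) using the assumed predecessors
    (i-1, j) stretch, (i-1, j-1) advance, (i-1, j-2) compress.
    Returns math.inf if the prototype's end cannot be reached."""
    n_i, n_j = len(unknown), len(prototype)
    D = [[math.inf] * n_j for _ in range(n_i)]
    D[0][0] = sq_euclidean(unknown[0], prototype[0])
    for i in range(1, n_i):
        for j in range(n_j):
            best = min(D[i - 1][j - s] for s in range(3) if j - s >= 0)
            D[i][j] = sq_euclidean(unknown[i], prototype[j]) + best
    return D[n_i - 1][n_j - 1]


def classify(unknown, prototypes):
    """Return the label k whose prototype gives the smallest D."""
    return min(prototypes, key=lambda k: dtw_distance(unknown, prototypes[k]))


prototypes = {"up": [[0.0], [1.0], [2.0], [3.0]],
              "down": [[3.0], [2.0], [1.0], [0.0]]}
unknown = [[0.0], [0.0], [1.0], [3.0]]
print(dtw_distance(unknown, prototypes["up"]))    # → 0.0
print(dtw_distance(unknown, prototypes["down"]))  # → 19.0
print(classify(unknown, prototypes))              # → up
```

The unknown matches "up" with zero distance by stretching prototype state 0 over two frames and skipping state 2.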



This derivation shows the simplifying assumptions made in going from a probabilistic model to a much simpler distance model
Currently, most commercial and research string-matching systems use probabilistic models, and the Hidden Markov Model (HMM) is probably the dominant one
As computing power has increased over the years, more complex, primarily probabilistic models requiring large training corpora have come into use

In his research at IBM’s T.J. Watson Research Center, your instructor worked in both the speech recognition and the pen computing/handwriting recognition groups
founding member of the speech group (once over 50 workers)
spearheaded development of the ThinkWrite handwriting recognizer in IBM’s pen-enabled ThinkPad product in the early 1990s
The data in both the speech and online handwriting
problems are time sequences


Speech is recorded as a time waveform and usually transformed
via frequency analysis into a sequence of spectral time samples
Online handwriting is captured as a time sequence of x-y
coordinates describing the trajectory of the handwriting
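Both kinds of data end up as sequences of feature vectors. The Python sketch below is illustrative only. It uses a naive, unwindowed DFT magnitude per frame as a stand-in for the frequency analysis (practical front ends do considerably more processing), and it turns pen samples into (x, y, dx, dy) vectors. The function names, frame sizes, and feature choice are assumptions.

```python
import cmath
import math


def spectral_frames(samples, frame_len, hop):
    """Speech: cut the waveform into frames of frame_len samples, hop samples
    apart, and return each frame's DFT magnitude spectrum (bins 0..frame_len//2).
    Naive O(N^2) DFT with no windowing, kept simple for illustration."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        x = samples[start:start + frame_len]
        frames.append([
            abs(sum(x[n] * cmath.exp(-2j * math.pi * f * n / frame_len)
                    for n in range(frame_len)))
            for f in range(frame_len // 2 + 1)
        ])
    return frames


def pen_features(points):
    """Online handwriting: turn (x, y) pen samples into (x, y, dx, dy) vectors,
    where (dx, dy) is the displacement from the previous sample."""
    return [[x1, y1, x1 - x0, y1 - y0]
            for (x0, y0), (x1, y1) in zip(points, points[1:])]


# A cosine at 2 cycles per 8-sample frame puts its energy in bin 2.
wave = [math.cos(2 * math.pi * 2 * n / 8) for n in range(16)]
print(len(spectral_frames(wave, frame_len=8, hop=8)))  # → 2
print(pen_features([(0, 0), (1, 2), (3, 3)]))          # → [[1, 2, 1, 2], [3, 3, 2, 1]]
```

Either output is simply a list of equal-length feature vectors, so it can serve as the unknown or the prototype in an elastic matcher.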