Learn to Compress and Restore Sequential Data∗
Yi Wang and Jianhua Feng
Department of Computer Science, Tsinghua University, Beijing, 100084, China.

Shixia Liu
Information Visualization and Interactive Visual Analytics, IBM China Research Lab, Beijing, 100094, China.

∗This work was done during YW's visit to IBM China Research Lab. YW thanks the NSF of China (No. 60573094), the 973 Program of China (No. 2006CB303103), and City University of Hong Kong (CityU 118205 and 1211/04E) for supporting critical preliminary research.

Copyright © 2007, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Introduction
Data compression methods fall into two groups: lossless and lossy. The latter usually achieves a higher compression ratio than the former. However, to develop a lossy compression method, we must know, for a given type of data, what information can be discarded without significantly degrading the data quality.

Such knowledge is usually obtained by experiments. For example, from user studies we know that human eyes are insensitive to certain frequency channels of the light signal. Thus we can compress image data by decomposing them into frequency channels using a DCT transformation and discarding the coefficients of the channels to which human eyes are insensitive. However, it is complex and expensive for human analysts to conduct and study so many experiments.
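As a rough illustration of the DCT-based scheme just described, the sketch below compresses one 8×8 image block by keeping only its low-frequency coefficients. The block data, the keep-mask, and the coefficient budget are illustrative, not taken from this paper.

```python
# A minimal sketch of DCT-based lossy compression on one 8x8 image
# block; block values and the keep-mask are illustrative only.
import numpy as np
from scipy.fft import dctn, idctn

block = np.random.default_rng(0).uniform(0, 255, (8, 8))  # toy image block

coeffs = dctn(block, norm="ortho")           # decompose into frequency channels

# Keep only the low-frequency channels (top-left corner), to which
# human eyes are most sensitive; discard the rest.
mask = np.add.outer(np.arange(8), np.arange(8)) < 4
compressed = coeffs * mask                   # 10 of 64 coefficients survive

restored = idctn(compressed, norm="ortho")   # lossy reconstruction
print("kept %d/64 coefficients, max error %.1f"
      % (mask.sum(), np.abs(restored - block).max()))
```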
Alternatively, we propose to learn such knowledge automatically using machine learning techniques. Under the framework of Bayesian learning, general prior knowledge is expressed by designing the statistical model, and the refined posterior knowledge is learned automatically from the data to be compressed. More specifically, we treat the compression of input data as learning a statistical model from the data, and the restoration of the data as sampling from the learned model. Therefore, only the estimated model parameters are saved as the compressed version.

The key to this idea is to design a statistical model that describes the data accurately (so that the data can be recovered precisely) and that is defined by a compact set of parameters (so as to achieve a high compression ratio).
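To make the compress-as-learning idea concrete, here is a minimal sketch in which a toy first-order Gaussian model stands in for the statistical model: "compression" estimates the model parameters by maximum likelihood, and "restoration" samples a sequence from them. All names and numbers are illustrative.

```python
# A minimal sketch of "compress = learn a model, restore = sample from
# it", using a toy Gaussian random-walk model in place of VLHMM.
import numpy as np

rng = np.random.default_rng(1)
data = np.cumsum(rng.normal(0.5, 0.1, size=1000))  # toy input sequence

# "Compression": maximum-likelihood estimates of the step mean/std.
steps = np.diff(data)
params = (data[0], steps.mean(), steps.std())      # the saved "compressed" form

# "Restoration": sample a sequence from the learned model.
x0, mu, sigma = params
restored = x0 + np.concatenate([[0.0], np.cumsum(rng.normal(mu, sigma, size=999))])

print("stored %d parameters for %d samples" % (len(params), len(data)))
```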
For the general application of compressing sequential data, we designed the Variable-Length Hidden Markov Model (VLHMM), whose learning algorithm automatically learns a minimal set of parameters (by optimizing a Minimum-Entropy criterion) that accurately models the sequential data (by optimizing a Maximum-Likelihood criterion). The self-adaptation ability of the learning algorithm makes VLHMM able to accurately model highly varied sequential data. Moreover, as a hidden Markovian model, VLHMM is generally applicable to all kinds of sequences, whether discrete or continuous, univariate or multivariate.
The VLHMM Approach
VLHMM combines the advantages of the hidden Markov model (HMM), the high-order HMM (n-th-order HMM, or n-HMM), and the variable-length Markov model (VLMM) (Rissanen 1983).

Like the HMM and n-HMM, VLHMM models a hidden Markovian process switching over a specified number, say S, of states. We call the several previous states used to determine the current state a context. At each state transition, the model moves from the current context to a new state, which, together with the current context, forms the new context. For the HMM, the new context is truncated to the fixed length 1; for the n-HMM, all contexts have the fixed length n; for VLHMM, the contexts have variable lengths that are learned from data. Thus the model can be represented by a directed graph whose nodes correspond to the contexts and whose edges represent the context transitions.
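The context-transition rule just described can be sketched in a few lines. The context set below is hypothetical rather than learned, and we assume every single-state context is present so that truncation always terminates.

```python
# A minimal sketch of the context-transition rule: append the new
# state to the current context, then truncate to the longest suffix
# that is itself a learned context. The context set is illustrative.
contexts = {(0,), (1,), (2,), (1, 0), (2, 1, 0)}  # variable-length contexts

def next_context(context, new_state):
    candidate = context + (new_state,)
    # Drop the oldest states until the suffix is a known context;
    # terminates because every single-state context is in the set.
    while candidate not in contexts:
        candidate = candidate[1:]
    return candidate

print(next_context((2, 1), 0))  # -> (2, 1, 0): a long context is kept
print(next_context((1, 0), 2))  # -> (2,): truncated to length 1
```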
Using the graph representation, simulating a VLHMM, like simulating an HMM or an n-HMM, can be viewed as a random walk over the graph. Because an output pdf is defined on each context, and after each context transition an observable is sampled from the output pdf of the destination context, simulating the model generates a sequence of observables. Because the output pdfs can take rather flexible forms, the observables can be discrete or continuous, scalar or vector. This makes VLHMM applicable to modeling various kinds of sequential data.
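Simulation as a random walk can be sketched as follows, with a two-context graph and Gaussian output pdfs; the transition table and pdf parameters are invented for illustration.

```python
# A minimal sketch of simulating such a model: a random walk over a
# tiny context graph, emitting one observable per transition from the
# destination context's output pdf. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(2)
trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
out_pdf = {"A": (0.0, 1.0), "B": (5.0, 0.5)}  # (mean, std) per context

context, sequence = "A", []
for _ in range(10):
    nxt = rng.choice(list(trans[context]), p=list(trans[context].values()))
    mu, sigma = out_pdf[nxt]
    sequence.append(rng.normal(mu, sigma))  # observable from destination pdf
    context = nxt
print(np.round(sequence, 2))
```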
With VLHMM, the idea of “learning to compress” becomes learning a VLHMM and using the Viterbi algorithm to align the training sequence to a path through the learned context graph, namely the path from which the training sequence is most likely to have been generated. Thus only the transition probabilities and the output pdfs along the path need to be saved as the compressed data. We restore the training sequence by simulating the saved path.
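Below is a minimal sketch of the alignment step, using the classic Viterbi recursion on a plain first-order HMM with Gaussian outputs as a stand-in for the learned VLHMM; the observations and parameters are illustrative.

```python
# A minimal Viterbi sketch: align an observed sequence to its most
# likely state path, so only parameters along that path need saving.
import numpy as np
from scipy.stats import norm

obs = np.array([0.1, -0.2, 4.8, 5.1, 0.3])
log_pi = np.log([0.5, 0.5])                      # initial state probs
log_A = np.log([[0.8, 0.2], [0.2, 0.8]])         # transition probs
means, stds = np.array([0.0, 5.0]), np.array([1.0, 1.0])

log_B = norm.logpdf(obs[:, None], means, stds)   # emission log-likelihoods
delta = log_pi + log_B[0]
back = []
for t in range(1, len(obs)):
    scores = delta[:, None] + log_A              # best predecessor per state
    back.append(scores.argmax(axis=0))
    delta = scores.max(axis=0) + log_B[t]

path = [int(delta.argmax())]
for ptr in reversed(back):                       # backtrack the best path
    path.append(int(ptr[path[-1]]))
print(path[::-1])                                # -> [0, 0, 1, 1, 0]
```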
Although the HMM and n-HMM can also be learned and simulated, they are not suitable for compression. The HMM, because of its well-known restriction to single-state contexts, cannot accurately model sequential data that is highly
varied over time. Thus simulating the model may fail to restore the training sequence correctly. The n-HMM with long contexts is accurate, but since there are S^n potential contexts in total, the number of model parameters grows exponentially with n (with S = 60 states, even n = 3 already yields 216,000 potential contexts), and learning them would require an intractable amount of training data and learning time.
In contrast, VLHMM is an accurate high-order model that learns a minimal set of contexts to achieve a high compression ratio. VLHMM exploits the fact that it is not necessary to extend all contexts to a fixed large length n. Instead, it learns longer contexts to model parts of the training sequence that are highly varied over time, and shorter ones for parts with plain dynamics. The effectiveness of VLHMM rests on the observation that the dynamical complexity of most sequential data changes over time. For example, the dynamical complexity of a digital voice sequence varies with changes of topic, mood, and so on.
VLMM also learns a minimal set of contexts with variable lengths. But as an “observable” model without output pdfs, it can only model sequences of univariate, discrete values. Although continuous or multivariate sequences can be discretized before learning a VLMM, discretization that ignores the temporal coherence of the sequences usually introduces significant error that is hard to control (Wang et al. 2006).
VLHMM is a hidden Markov model, and is generally applicable to all kinds of sequential data. As in VLMM, its contexts have variable lengths: each is the shortest context that is still long enough to accurately determine the next state. Shortening the contexts exponentially reduces the total number of contexts and model parameters, which ensures a high compression ratio, while the sufficient lengths ensure modeling accuracy and thus precise data restoration.
However, the variable lengths also make the total number of contexts of a VLHMM unknown prior to learning, even when the number of states S is given. This differs from the HMM and n-HMM; in Bayesian learning it is referred to as the unknown-model-structure problem and is well known to be difficult. Our solution is to derive a structural-EM algorithm, which provably converges to a Maximum-Likelihood estimate (Friedman 1997). Consistent with the proof, our algorithm invokes a context-tree growing/pruning procedure in its M-step to update the estimated contexts by optimizing a Minimum-Entropy criterion.
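The paper does not spell out the criterion's details here; as a hedged sketch of the general flavor of entropy-style context-tree pruning, the following keeps a longer context only if its next-state distribution differs enough from its parent's (measured by KL divergence). The tree, distributions, and threshold are invented for illustration.

```python
# A minimal sketch of context-tree pruning: a child context is pruned
# when its next-state distribution adds too little information over
# its parent's. Distributions and threshold are illustrative only.
import numpy as np

def kl(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

parent_dist = [0.5, 0.5]              # P(next state | short context)
children = {
    (0, 1): [0.52, 0.48],             # barely differs -> prune
    (1, 1): [0.95, 0.05],             # informative    -> keep
}
THRESHOLD = 0.01
kept = {c: d for c, d in children.items() if kl(d, parent_dist) > THRESHOLD}
print(sorted(kept))                   # -> [(1, 1)]
```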
Experiments and Progress
Although VLHMM is a general compressor for various types of sequential data, our primary experiments were conducted on motion capture (MoCap) data. Because MoCap data record the 3D body movements of human performers, they are typical high-dimensional data with complex dynamics (Sigal & Black 2006). With the recent development of MoCap technology, researchers have started to consider the compression of MoCap data (Arikan 2006; Chattopadhyay et al. 2007). Like most lossy compression methods introduced in the Introduction, these two are based on carefully collected expertise: (Arikan 2006) approximates short clips of motion using Bezier curves and clustered principal component analysis, and (Chattopadhyay et al. 2007)
is based on indexing the MoCap data with structural information derived from a skeletal virtual-human model. Both papers claim high compression ratios with low degradation of visual quality.
We compare the VLHMM method with both of these methods, as well as with other potential methods mentioned and tested in the two papers. Experiments show that the VLHMM method achieves a comparable compression ratio with comparatively little degradation (measured by the metrics proposed in (Arikan 2006)). In addition, a notable advantage of VLHMM over these methods is that VLHMM is not restricted to MoCap data.
Figure 1: Trajectories of compressed ballet motion. The red trajectories of the two hands show a short segment of MoCap ballet data. Its two compressed-and-restored versions are shown in green and blue, respectively. The green one was learned with S set to 60 and is compressed to 11.77% of the original motion; its restored trajectories are very close to those of the original. The blue one is compressed to 2.14% with S = 7; at this extreme compression ratio, the restored ending frame differs noticeably from the original, but the restored trajectories still preserve the general structure of the original motion.

Experiment details can be found online at http://dbgroup.cs.tsinghua.edu.cn/wangyi/VLHMM, and further experiments on other types of sequential data, such as digital movies and audio, are ongoing.
References
Arikan, O. 2006. Compression of motion capture databases. In Proc. ACM SIGGRAPH.
Chattopadhyay, S., et al. 2007. Human motion capture data compression by model-based indexing: A power aware approach. IEEE Trans. on Visualization and Computer Graphics 13(1):5–14.
Friedman, N. 1997. Learning Bayesian networks in the presence of missing values and hidden variables. In Proc. Uncertainty in AI.
Rissanen, J. 1983. A universal data compression system. IEEE Trans. on Information Theory 29:656–664.
Sigal, L., and Black, M. J. 2006. HumanEva: Synchronized video and motion capture dataset for evaluation of articulated human motion. Technical report, Department of Computer Science, Brown University.
Wang, Y., et al. 2006. Mining complex time-series data by learning Markovian models. In Proc. IEEE ICDM.