
Outline
 Latent Variable Models
 Probabilistic PCA (PPCA)
 Dual Probabilistic PCA (DPPCA)
 Gaussian Latent Variable Models
 MDS, Kernel PCA, Nonlinear variants
 Unified objective function
 Experiments
 Extensions
 Gaussian Process Dynamical Models
 Hierarchical GPLVM
Motivation
 To learn a low-dimensional representation of high-dimensional data
 Our primary focus will be on visualization of
the data in 2D
 Desiderata:
 Maintains proximity (i.e. points that are close in data space remain close in latent space)
 Probabilistic framework
 Projection from latent space to data space
 A non-linear latent space embedding
Notation
 D = # of observed dimensions
 q = # of latent dimensions
 N = # of data points
 Y = observed data
 X = latent data
 W = linear mapping
Probabilistic PCA

A prior is placed over X, our latent embedding, such that X ~ N(0, I); X is then marginalized out.
Y is observed, X is marginalized, and W is optimized.
Taking the marginal likelihood of Y by integrating over X results in PPCA.
Optimizing the log likelihood w.r.t. W amounts to solving an eigenvalue problem for the sample covariance matrix S: the columns of U are eigenvectors of S, L is a diagonal matrix built from the corresponding eigenvalues, and R is an arbitrary orthogonal matrix.
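For reference, the equations this slide annotates can be written out in the standard Tipping and Bishop formulation (using the notation from the notation slide):
\[
p(\mathbf{x}_n) = \mathcal{N}(\mathbf{x}_n \mid \mathbf{0}, \mathbf{I}), \qquad
p(\mathbf{y}_n \mid \mathbf{x}_n, \mathbf{W}) = \mathcal{N}(\mathbf{y}_n \mid \mathbf{W}\mathbf{x}_n, \sigma^{2}\mathbf{I})
\]
\[
p(\mathbf{Y} \mid \mathbf{W}, \sigma^{2}) = \prod_{n=1}^{N} \mathcal{N}\!\left(\mathbf{y}_n \mid \mathbf{0},\, \mathbf{W}\mathbf{W}^{\top} + \sigma^{2}\mathbf{I}\right), \qquad
\mathbf{W}_{\mathrm{ML}} = \mathbf{U}_q \left(\mathbf{L}_q - \sigma^{2}\mathbf{I}\right)^{1/2} \mathbf{R},
\]
where the columns of U_q are the first q eigenvectors of S and L_q is the diagonal matrix of the corresponding eigenvalues.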
Generative Model of PPCA
Bishop, 2006 (note that in Bishop's notation z is the latent variable and x is the data).
z represents the latent space, which is then mapped to the data space x via W.
The green contours show the marginal density for x.
Dual Probabilistic PCA

A prior is placed over W, our mapping parameter, such that W ~ N(0, I); W is then marginalized out.
Y is observed, W is marginalized, and X is optimized.
Taking the marginal likelihood of Y by integrating over W results in dual PPCA (DPPCA).
The marginal likelihood swaps the roles of W and X, but the maximum likelihood estimate takes a similar form: the columns of U are now eigenvectors of the scaled inner-product matrix rather than of S, L is a diagonal matrix built from the corresponding eigenvalues, and R is an arbitrary orthogonal matrix.
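A sketch of the corresponding dual equations, following Lawrence's formulation (the exact scaling of the inner-product matrix is an assumption here):
\[
p(\mathbf{w}_d) = \mathcal{N}(\mathbf{w}_d \mid \mathbf{0}, \mathbf{I}), \qquad
p(\mathbf{Y} \mid \mathbf{X}, \sigma^{2}) = \prod_{d=1}^{D} \mathcal{N}\!\left(\mathbf{y}_{:,d} \mid \mathbf{0},\, \mathbf{X}\mathbf{X}^{\top} + \sigma^{2}\mathbf{I}\right)
\]
\[
\mathbf{X}_{\mathrm{ML}} = \mathbf{U}'_q \left(\mathbf{L}_q - \sigma^{2}\mathbf{I}\right)^{1/2} \mathbf{R},
\]
where the columns of U'_q are the first q eigenvectors of the inner-product matrix D^{-1} Y Y^T.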
DPPCA = PPCA
 Solution for PPCA:
 Solution for Dual PPCA:
 Lawrence shows these are equivalent through a relation between the two solutions' eigenvectors (sketched below):
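The linear algebra behind this equivalence is the standard relation between the eigenvectors of \(\mathbf{Y}\mathbf{Y}^{\top}\) and \(\mathbf{Y}^{\top}\mathbf{Y}\):
\[
\mathbf{Y}\mathbf{Y}^{\top}\mathbf{u}' = \lambda\,\mathbf{u}'
\;\Longrightarrow\;
\mathbf{Y}^{\top}\mathbf{Y}\,\bigl(\mathbf{Y}^{\top}\mathbf{u}'\bigr) = \lambda\,\bigl(\mathbf{Y}^{\top}\mathbf{u}'\bigr),
\]
so every eigenvector of Y Y^T maps through Y^T to an eigenvector of Y^T Y with the same eigenvalue (up to normalization), which is what ties W_ML and X_ML together.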
Connection to Gaussian Processes
 Recall that a Gaussian process is fully defined by its mean function m(x) and its covariance function k(x, x').
 Assuming a zero mean function and a covariance function K, the marginal likelihood for DPPCA is equivalent to a product over D independent Gaussian processes with a linear covariance function on the latent points (written out below).
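Written out, this marginal likelihood in GP form is
\[
p(\mathbf{Y} \mid \mathbf{X}, \sigma^{2}) = \prod_{d=1}^{D} \mathcal{N}\!\left(\mathbf{y}_{:,d} \mid \mathbf{0}, \mathbf{K}\right),
\qquad
\mathbf{K} = \mathbf{X}\mathbf{X}^{\top} + \sigma^{2}\mathbf{I},
\qquad
k(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^{\top}\mathbf{x}_j + \sigma^{2}\delta_{ij}.
\]
Replacing this linear kernel with a nonlinear one (e.g. an RBF kernel) is exactly what turns dual PPCA into the GPLVM.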
Kernel PCA vs. GPLVM
Kernel PCA maps from data to latent space (Y to X): a nonlinear embedding of the data, K'(Y), is followed by a linear mapping via W to the latent coordinates X.
GPLVM maps from latent to data space (X to Y): a nonlinear embedding of the latent coordinates, K(X), is followed by a linear mapping via W to the data Y.
Unifying Objective
 GPLVM can be connected to classical MDS and kernel
PCA via a unifying objective function.
Kernel PCA: S is non-linear & K is linear
GPLVM: K is non-linear & S is linear
Classical MDS: S is a proximity or similarity matrix
 GPLVM can be optimized via scaled conjugate gradient (SCG); a minimal sketch of this objective and its optimization follows.
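As a concrete illustration (not the paper's implementation), here is a minimal Python sketch of the GPLVM instance of this objective, i.e. the GP marginal likelihood with a nonlinear RBF kernel on the latent points, optimized with L-BFGS in place of SCG; the function names, toy data, PCA initialization, and fixed kernel hyperparameters are illustrative assumptions.

import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X, variance=1.0, lengthscale=1.0, noise=1e-2):
    # Nonlinear (RBF) covariance on latent points, plus a noise/jitter term.
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2.0 * X @ X.T
    return variance * np.exp(-0.5 * sq / lengthscale**2) + noise * np.eye(len(X))

def gplvm_negative_log_likelihood(x_flat, Y, q):
    # Negative log marginal likelihood of D independent GPs sharing kernel K(X):
    #   (D/2) log|K| + (1/2) tr(K^{-1} Y Y^T), dropping additive constants.
    N, D = Y.shape
    X = x_flat.reshape(N, q)
    K = rbf_kernel(X)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))  # K^{-1} Y
    return D * np.sum(np.log(np.diag(L))) + 0.5 * np.sum(Y * alpha)

# Toy data (illustrative); centre it as usual.
rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 5))
Y = Y - Y.mean(axis=0)

# Initialize the latent positions with PCA, as the slides recommend.
q = 2
_, _, Vt = np.linalg.svd(Y, full_matrices=False)
X0 = Y @ Vt[:q].T

# L-BFGS (with numerical gradients) stands in for scaled conjugate gradient here.
result = minimize(gplvm_negative_log_likelihood, X0.ravel(), args=(Y, q), method="L-BFGS-B")
X_latent = result.x.reshape(-1, q)  # the learned 2-D embedding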
Sparsification
 Kernel methods may be sped up through sparsification
 In SCG optimization, each gradient computation requires an inversion of the N x N kernel matrix
 This inversion is cubic in N and thus prohibitive for visualizing large datasets
 We can instead select a sparse representation with the Informative Vector Machine (IVM)
 The model is built on a subset I of d points, known as the active set, with a likelihood restricted to those points (sketched below)
 The cost per iteration is then dominated by selection of the active set
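One plausible form of this active-set likelihood, assuming the GP marginal likelihood is simply restricted to the d active points, is
\[
p(\mathbf{Y}_I \mid \mathbf{X}_I) = \prod_{j=1}^{D} \mathcal{N}\!\left(\mathbf{y}^{I}_{:,j} \mid \mathbf{0}, \mathbf{K}_{I,I}\right),
\]
so each gradient step requires inverting only the d x d matrix K_{I,I} rather than the full N x N kernel matrix.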
(Sparse) Oil Flow Visualizations
Radial Basis Function Kernel
Leads to smooth functions that fall
away to zero in regions with no data.
Multi-layer Perceptron Kernel
Also leads to smooth functions, but regions with no data tend to take the same values.
Oil Flow Visualization
Full (No Sparsification) RBF Kernel
Generative Topographic Mapping
Alternative Noise Models

We are NOT constrained to Gaussian noise models

We visualize handwritten 2’s with two noise models
Figure: a test image of a handwritten 2 with pixels removed (shown in red), reconstructed under a baseline model, the Gaussian noise model, and the Bernoulli noise model.
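For reference, the two per-pixel noise models being compared can be written as follows (f_{nd} is the latent GP function value for pixel d of image n; the logistic-sigmoid link for the Bernoulli model is an assumption about the exact parameterization):
\[
\text{Gaussian: } p(y_{nd} \mid f_{nd}) = \mathcal{N}\!\left(y_{nd} \mid f_{nd}, \sigma^{2}\right),
\qquad
\text{Bernoulli: } p(y_{nd} \mid f_{nd}) = \sigma(f_{nd})^{\,y_{nd}} \bigl(1 - \sigma(f_{nd})\bigr)^{1 - y_{nd}},
\quad
\sigma(z) = \frac{1}{1 + e^{-z}}.
\]
The Bernoulli model is no longer conjugate to the GP prior, so the required integrals must be approximated.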
Hallucinating Faces
These faces were created by taking 64 uniformly spaced and ordered points from the latent space (1D)
and visualizing the mean of their distribution in data space.
Examples from the data-set which are closest to the corresponding fantasy images in latent space.
Pros and Cons of GPLVM
 Pros
 Probabilistic
 Missing data straightforward.
 Can sample from model given X.
 Different noise models can be handled.
 Flexible family of kernels.
 Cons
 Speed of optimisation.
 Optimisation is non-convex using a non-linear kernel.
 Sensitivity to initializations.
Initialization Issues
GPLVM initialized with PCA
Swiss Roll Dataset
GPLVM initialized with Isomap
Gaussian Process Dynamical Models
Nonlinear dynamical system: generative process
GPDM
 GPDM marginalizes
over parameters A and B
Jack M. Wang et al., Gaussian Process Dynamical Models (NIPS 2006)
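A sketch of the generative model being marginalized (the noise terms and basis-function parameterization follow the GPDM paper in spirit; exact details are omitted):
\[
\mathbf{x}_t = f(\mathbf{x}_{t-1}; \mathbf{A}) + \mathbf{n}_{x,t},
\qquad
\mathbf{y}_t = g(\mathbf{x}_t; \mathbf{B}) + \mathbf{n}_{y,t},
\]
where f and g are weighted sums of basis functions with weight matrices A and B; placing Gaussian priors on A and B and integrating them out leaves one GP over the latent dynamics and one over the latent-to-data mapping.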
GPDM on walking data
Training Data of Walking Video
GPDM of walk data learned
with an RBF Kernel +
2nd order dynamics
Missing Data
Training data with frames 51 to 100 missing, out of 158 frames in total.
Missing frames recovered by GPDM
with linear + RBF kernel
Visualization of Latent Space
Figure panels: GPLVM latent space, GPDM latent space, random samples from X, and data space mappings.
MORE GPDM Action
GPDM learned using a 2nd order RBF Kernel for dynamics
Hierarchical GPLVM
We want MAP estimates of the latent variables at every level of this hierarchical model.
Neil D. Lawrence, Hierarchical Gaussian Process Latent Variable Models (ICML 2007)
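A sketch of the model these MAP estimates refer to, written for the two-subject case shown next: each subject's observations have their own latent space, and those latent spaces are in turn generated from a shared parent latent space, with a GPLVM likelihood at every level:
\[
p(\mathbf{Y}_1, \mathbf{Y}_2, \mathbf{X}_1, \mathbf{X}_2 \mid \mathbf{X}_0)
= p(\mathbf{Y}_1 \mid \mathbf{X}_1)\, p(\mathbf{Y}_2 \mid \mathbf{X}_2)\,
  p(\mathbf{X}_1 \mid \mathbf{X}_0)\, p(\mathbf{X}_2 \mid \mathbf{X}_0),
\]
and X_0, X_1 and X_2 are found jointly by MAP estimation.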
High Five!
Provides coordination information between the two subjects
Frames: A: 85, B: 114, C: 127, D: 141, E: 155, F: 170, G: 190, H: 215
A motion capture data set consisting of two interacting subjects. The data,
which was taken from the CMU MOCAP database, consists of two subjects
that approach each other and ‘high five’.
Run, don’t walk!
The skeleton is decomposed as shown in Figure 4. In the plots, crosses are latent positions associated with the run
and circles are associated with the walk. We have mapped three points from each motion through the hierarchy.
Periodic dynamics were used in the latent spaces.
Conclusion