Outline
- Latent Variable Models: Probabilistic PCA (PPCA), Dual Probabilistic PCA (DPPCA)
- Gaussian Process Latent Variable Models: MDS, kernel PCA, nonlinear variants
- Unified objective function
- Experiments
- Extensions: Gaussian Process Dynamical Models, Hierarchical GPLVM

Motivation
- To learn a low-dimensional representation of high-dimensional data.
- Our primary focus will be on visualization of the data in 2D.
- Desiderata:
  - Maintains proximity structure (e.g. what is close in data space stays close in latent space).
  - Probabilistic framework.
  - Projection from latent space to data space.
  - A non-linear latent space embedding.

Notation
- D = number of observed dimensions
- q = number of latent dimensions
- N = number of data points
- Y = observed data
- X = latent data
- W = linear mapping

Probabilistic PCA
- A prior is placed over X, our latent embedding, s.t. X ~ N(0, I). X is then marginalized out.
- Marginalize X; optimize W; Y is observed.
- Columns of U are the eigenvectors of S, L is a diagonal matrix of eigenvalues, and R is an arbitrary orthogonal matrix.
- Taking the marginal likelihood of Y by integrating over X results in PPCA.
- Optimizing the log likelihood w.r.t. W amounts to solving the eigenvalue problem for our sample covariance matrix S.

Generative Model of PPCA (Bishop, 2006)
- In Bishop's notation, z represents the latent variable, which is mapped to the data space x via W.
- The green contours show the marginal density for x.

Dual Probabilistic PCA
- A prior is placed over W, our mapping parameter, s.t. W ~ N(0, I). W is then marginalized out.
- Optimize X; marginalize W; Y is observed.
- Columns of U are the eigenvectors of S, L is a diagonal matrix of eigenvalues, and R is an arbitrary orthogonal matrix.
- Taking the marginal likelihood of Y by integrating over W results in Dual PPCA.
- The marginal likelihood replaces W with X, but the ML estimate takes a similar form…

DPPCA = PPCA
- Solution for PPCA and solution for Dual PPCA (see the reference equations after the kernel slides below).
- Lawrence claims these are "equivalent" through a relation between the two eigenvalue problems.

Connection to Gaussian Processes
- Recall that a Gaussian process is fully defined by its mean function m(x) and covariance function k(x, x').
- Assume a zero mean function and a covariance function K; for DPPCA this is the linear kernel K = XXᵀ + σ²I.
- The marginal likelihood for DPPCA is then equivalent to a product over D independent Gaussian processes with the linear covariance function mentioned above.

Kernel PCA vs. GPLVM
- [Diagram] Kernel PCA: the nonlinearity enters through a kernel K'(Y) on the observed data Y; the embedding X is then obtained by a linear mapping.
- [Diagram] GPLVM: the nonlinearity enters through a kernel K(X) on the latent points X, with the mapping from latent to data space playing the role of W.

Unifying Objective
- GPLVM can be connected to classical MDS and kernel PCA via a unifying objective function.
- Kernel PCA: S is non-linear and K is linear.
- GPLVM: K is non-linear and S is linear.
- Classical MDS: S is a proximity or similarity matrix.
- GPLVM can be optimized via scaled conjugate gradients (SCG); a minimal optimization sketch appears after the kernel visualization slides below.

Sparsification
- Kernel methods may be sped up through sparsification.
- In SCG optimization, each gradient computation requires an inversion of the kernel matrix, which is cubic in N and thus prohibitive for visualizing large datasets.
- We can instead represent the data with the Informative Vector Machine (IVM): the likelihood is built from a subset I of d points known as the active set.
- The cost is then dominated by selection of the active set.

(Sparse) Oil Flow Visualizations
- Radial Basis Function (RBF) kernel: leads to smooth functions that fall away to zero in regions with no data.
- Multi-layer Perceptron (MLP) kernel: also leads to smooth functions, but regions with no data tend to take the same values.
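For reference, the solution equations referred to on the PPCA, Dual PPCA, and unifying-objective slides above take roughly the following standard form. This is a sketch following Tipping & Bishop (1999) and Lawrence (2005); the normalization constants are assumptions, not a transcription of the original slides.

  PPCA:       W_{ML} = U_q (\Lambda_q - \sigma^2 I)^{1/2} R, with U_q, \Lambda_q the top-q eigenvectors/eigenvalues of S = N^{-1} Y^\top Y
  Dual PPCA:  X_{ML} = U'_q (\Lambda'_q - \sigma^2 I)^{1/2} R, with U'_q, \Lambda'_q from the eigendecomposition of D^{-1} Y Y^\top
  Marginal likelihood (DPPCA / GPLVM):
              \log p(Y \mid X) = -\frac{DN}{2}\ln 2\pi - \frac{D}{2}\ln|K| - \frac{1}{2}\mathrm{tr}\!\left(K^{-1} Y Y^\top\right), \quad K = X X^\top + \sigma^2 I
  Equivalence: both eigenvalue problems arise from the SVD Y = U \Sigma V^\top, so the two solutions agree up to scaling and the arbitrary rotation R.
  Unifying objective: maximize  -\frac{D}{2}\left(\ln|K| + \mathrm{tr}(K^{-1} S)\right),  with S = D^{-1} Y Y^\top for GPLVM, a non-linear similarity matrix for kernel PCA, and a proximity matrix for classical MDS.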
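The unifying-objective slide notes that the GPLVM is fit by optimizing this likelihood over the latent positions X with scaled conjugate gradients. Below is a minimal NumPy sketch of that idea under stated assumptions: an RBF kernel with fixed hyperparameters, PCA initialization, and SciPy's L-BFGS (with numerical gradients) standing in for SCG. All names and values are illustrative, not the paper's code.

```python
# Minimal GPLVM sketch: optimize latent positions X under an RBF kernel.
# Assumptions: fixed kernel hyperparameters, PCA initialization, and
# scipy's L-BFGS with numerical gradients standing in for SCG.
import numpy as np
from scipy.optimize import minimize


def rbf_kernel(X, alpha=1.0, gamma=1.0, beta=100.0):
    """K_ij = alpha * exp(-gamma/2 * ||x_i - x_j||^2) + (1/beta) * delta_ij."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(X**2, axis=1)[None, :] - 2.0 * X @ X.T
    return alpha * np.exp(-0.5 * gamma * sq) + np.eye(X.shape[0]) / beta


def neg_log_marginal_likelihood(x_flat, Y, q):
    """-log p(Y | X) = DN/2 log 2pi + D/2 log|K| + 1/2 tr(K^{-1} Y Y^T)."""
    N, D = Y.shape
    X = x_flat.reshape(N, q)
    K = rbf_kernel(X)
    L = np.linalg.cholesky(K)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))            # log |K|
    Kinv_Y = np.linalg.solve(L.T, np.linalg.solve(L, Y))  # K^{-1} Y
    return 0.5 * (N * D * np.log(2 * np.pi) + D * log_det + np.sum(Y * Kinv_Y))


# Toy usage on synthetic data.
rng = np.random.default_rng(0)
Y = rng.standard_normal((50, 12))
Y -= Y.mean(axis=0)                     # centre the data
q = 2                                   # latent dimensionality for visualization

# Initialize the latent points with PCA (principal subspace of Y).
_, _, Vt = np.linalg.svd(Y, full_matrices=False)
X0 = Y @ Vt[:q].T

res = minimize(neg_log_marginal_likelihood, X0.ravel(), args=(Y, q), method="L-BFGS-B")
X_latent = res.x.reshape(-1, q)         # learned 2D embedding
```

With a linear kernel K = XXᵀ + σ²I this objective recovers the dual PPCA solution up to rotation; the RBF kernel is what makes the embedding non-linear.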
Oil Flow Visualization: Full (No Sparsification)
- [Figure] RBF-kernel GPLVM compared with the Generative Topographic Mapping (GTM).

Alternative Noise Models
- We are NOT constrained to Gaussian noise models.
- We visualize handwritten 2's with two noise models: Gaussian noise and Bernoulli noise.
- [Figure] Test pixels removed are shown in red; columns compare the baseline model, the Gaussian model, and the Bernoulli model.

Hallucinating Faces
- These faces were created by taking 64 uniformly spaced, ordered points from the (1D) latent space and visualizing the mean of their distribution in data space.
- Also shown: examples from the data set that are closest to the corresponding fantasy images in latent space.

Pros and Cons of GPLVM
- Pros:
  - Probabilistic.
  - Missing data is straightforward to handle.
  - Can sample from the model given X.
  - Different noise models can be handled.
  - Flexible family of kernels.
- Cons:
  - Speed of optimisation.
  - Optimisation is non-convex when using a non-linear kernel.
  - Sensitivity to initialization.

Initialization Issues (Swiss Roll Dataset)
- [Figure] GPLVM initialized with PCA vs. GPLVM initialized with Isomap.

Gaussian Process Dynamical Models
- A nonlinear dynamical system: the generative process couples latent dynamics with parameters A and an observation mapping with parameters B.
- GPDM marginalizes over the parameters A and B.
- Jack M. Wang et al., Gaussian Process Dynamical Models (NIPS 2006).

GPDM on Walking Data
- Training data: video of walking.
- GPDM of the walk data learned with an RBF kernel + 2nd-order dynamics.

Missing Data
- Training data with frames 51 to 100 missing, out of 158 frames in total.
- Missing frames recovered by GPDM with a linear + RBF kernel.

Visualization of Latent Space
- [Figure] GPLVM vs. GPDM latent spaces; random samples from X and their mappings to data space.

More GPDM Action
- GPDM learned using a 2nd-order RBF kernel for the dynamics.

Hierarchical GPLVM
- We want MAP estimates of this hierarchical model.
- Neil D. Lawrence and Andrew J. Moore, Hierarchical Gaussian Process Latent Variable Models (ICML 2007).

High Five!
- A motion capture data set consisting of two interacting subjects. The data, taken from the CMU MOCAP database, consists of two subjects that approach each other and 'high five'.
- The hierarchy provides coordination information between the two subjects.
- [Figure] Frames A: 85, B: 114, C: 127, D: 141, E: 155, F: 170, G: 190, H: 215.

Run, Don't Walk!
- The skeleton is decomposed as shown in Figure 4 of the paper.
- In the plots, crosses are latent positions associated with the run and circles with the walk.
- We have mapped three points from each motion through the hierarchy.
- Periodic dynamics were used in the latent spaces.

Conclusion