Multi-label Prediction via Sparse Infinite CCA
Piyush Rai and Hal Daumé III, NIPS 2009
Presented by Lingbo Li, ECE, Duke University, July 16th, 2010
Note: all tables and figures are taken from the original paper.

Outline
• Canonical Correlation Analysis
  CCA
  Probabilistic CCA
• Infinite Canonical Correlation Analysis Model
  The Indian Buffet Process
  The Infinite CCA Model
  Inference
• Multitask Learning using Infinite CCA
  Fully supervised setting
  Semi-supervised setting
• Experiments
  Infinite CCA results on synthetic data
  Infinite CCA applied to multi-label prediction
• Conclusion

Canonical Correlation Analysis
• For variables $x \in \mathbb{R}^{D_1}$ and $y \in \mathbb{R}^{D_2}$, CCA seeks linear projections $u_x$ and $u_y$ such that the projected variables $u_x^\top x$ and $u_y^\top y$ are maximally correlated.
• The correlation coefficient between the two variables in the embedded space is
  $\rho = \frac{u_x^\top C_{xy} u_y}{\sqrt{(u_x^\top C_{xx} u_x)\,(u_y^\top C_{yy} u_y)}}$
• CCA can be posed as the constrained optimization problem
  $\max_{u_x, u_y} \; u_x^\top C_{xy} u_y \quad \text{s.t.} \quad u_x^\top C_{xx} u_x = 1, \; u_y^\top C_{yy} u_y = 1$
• Here $C_{xx}$ and $C_{yy}$ denote the covariance matrices of the data samples, and $C_{xy}$ their cross-covariance.

Probabilistic CCA
• Let $z \sim \mathcal{N}(0, I_K)$ and consider the following latent variable model:
  $x \mid z \sim \mathcal{N}(W_x z + \mu_x, \Psi_x)$
  $y \mid z \sim \mathcal{N}(W_y z + \mu_y, \Psi_y)$
• We can also write $x = W_x z + \mu_x + \epsilon_x$ and $y = W_y z + \mu_y + \epsilon_y$, where $\epsilon_x \sim \mathcal{N}(0, \Psi_x)$ and $\epsilon_y \sim \mathcal{N}(0, \Psi_y)$.
• The latent variable $z$ is shared between $x$ and $y$.

Probabilistic CCA
• Gives a probabilistic interpretation of CCA, with maximum likelihood estimation of the parameters.
• Limitations: the number of canonical correlation components is fixed in advance, and the projection matrix is not sparse.
• Remedy: use the IBP as a prior on binary matrices with countably infinite columns.
  Posterior inference determines the subset of latent features responsible for the observations.
  The IBP ensures that the matrices are sparse.

Indian Buffet Process
• Given an $N \times D$ matrix $X$ of observations, each with $K$ latent features, the latent feature model can be expressed as
  $X = Z A + E$, where $Z$ is an $N \times K$ binary matrix with $z_{nk} = 1$ if observation $n$ possesses feature $k$.
• IBP culinary interpretation:
  The first customer tries $\text{Poisson}(\alpha)$ dishes (features).
  The $n$th customer tries:
    previously-tasted dish $k$ with probability $m_k / n$, where $m_k$ is the number of earlier customers who tried dish $k$;
    $\text{Poisson}(\alpha / n)$ completely new dishes.
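To make the IBP generative process concrete, here is a minimal simulation sketch (illustrative code, not from the paper; the function name `sample_ibp` and its parameters are invented for this example):

```python
import numpy as np

def sample_ibp(alpha, n_customers, seed=None):
    """Simulate one draw from the Indian Buffet Process.

    Returns a binary matrix Z with one row per customer and one
    column per dish (latent feature) that was ever sampled.
    """
    rng = np.random.default_rng(seed)
    dish_counts = []              # m_k: how many customers have tried dish k
    chosen = []                   # per-customer sets of dishes
    for n in range(1, n_customers + 1):
        dishes = set()
        # Previously-tasted dish k is chosen with probability m_k / n
        for k, m_k in enumerate(dish_counts):
            if rng.random() < m_k / n:
                dishes.add(k)
        # Poisson(alpha / n) completely new dishes
        for _ in range(rng.poisson(alpha / n)):
            dish_counts.append(0)
            dishes.add(len(dish_counts) - 1)
        for k in dishes:
            dish_counts[k] += 1
        chosen.append(dishes)
    Z = np.zeros((n_customers, len(dish_counts)), dtype=int)
    for n, dishes in enumerate(chosen):
        Z[n, list(dishes)] = 1
    return Z

Z = sample_ibp(alpha=2.0, n_customers=10, seed=0)
print(Z.shape, Z.sum(axis=0))  # number of dishes sampled and their popularity
```

The number of columns of Z grows with the data (about $\alpha \sum_{n=1}^{N} 1/n$ dishes in expectation), which is what lets the infinite CCA model leave the number of correlation components unbounded a priori.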
Infinite CCA Model
• Impose an IBP prior on the binary matrix $B$ so that the dimensionality $K$ of the latent space associated with $z$ can be determined automatically from an unbounded number of candidate columns.
• Represent the projection matrix as $W = B \odot V$, where $B$ is a binary mask, $V$ is a real-valued weight matrix, and $\odot$ denotes the element-wise (Hadamard) product.
• The two random vectors $x$ and $y$ can then be modeled as
  $x = (B_x \odot V_x)\, z + \epsilon_x$, $y = (B_y \odot V_y)\, z + \epsilon_y$
• $z$ is shared between $x$ and $y$; $\epsilon_x$ and $\epsilon_y$ are noise terms.

Infinite CCA Model
• Let $t = [x; y]$ stack the two views, with $B = [B_x; B_y]$ and $V = [V_x; V_y]$. The full model can be written as
  $B \sim \text{IBP}(\alpha)$, $v_{dk} \sim \mathcal{N}(0, \sigma_v^2)$, $z \sim \mathcal{N}(0, I)$, $t \mid B, V, z \sim \mathcal{N}((B \odot V)\, z, \Psi)$
• The graphical model structure is shown in the original paper.

Inference
• Sample B:
  Existing dishes: resample each entry $b_{dk}$ from its conditional posterior,
  $P(b_{dk} = 1 \mid -) \propto m_{-d,k}\; p(\text{data} \mid b_{dk} = 1, -)$,
  where $m_{-d,k}$ is the number of other rows in which feature $k$ is active.
  New dishes: use an M-H step. Propose $K_{\text{new}} \sim \text{Poisson}(\alpha / D)$ new columns and accept the proposal with the usual M-H acceptance probability.
• Sample V: entries are drawn from their closed-form Gaussian conditional posteriors (linear-Gaussian conjugacy).
• Sample Z: the latent variables are likewise drawn from Gaussian conditional posteriors.

Multitask Learning using Infinite CCA
• Each example is associated with multiple labels; predicting each label is one task.
• Motivation: borrow information across tasks.
• Apply the infinite CCA model to capture label correlations.
• Learn better predictive features by projecting the data to a subspace guided by label information:
  the cross-covariance matrix $C_{xy}$: input-output correlation;
  the label covariance matrix $C_{yy}$: label correlation.

Multitask Learning using Infinite CCA
• Fully supervised setting (Model 1): given labeled data $\{(x_n, y_n)\}_{n=1}^N$, learn the task parameters in the $K$-dimensional subspace; predict labels in the original $D$-dimensional space by inflating the parameters back to $D$ dimensions with the projection matrix.
• Semi-supervised setting (Model 2): learn the embeddings for both training and test data, so that training and prediction both take place in the $K$-dimensional subspace.

Experiments (I)
• Generate two synthetic datasets of dimensionalities 25 and 10, each having 100 samples.
• Ground truth: 4 correlation components, with 63% sparsity in the true projection matrix.
• Classical CCA found 8 components with significant correlations, while infinite CCA correctly discovered exactly 4.
• Classical CCA infers a projection matrix with no exact zero entries; setting small values to zero uncovers only about 25% sparsity. Infinite CCA infers a projection matrix with 57% zero entries (62% after thresholding very small values).

Experiments (II)
• Two real-world multi-label datasets (Yeast and Scene) from the UCI repository are used.
• Yeast: 1500 training and 917 test examples, each with 103 features; there are 14 labels per example.
• Scene: 1211 training and 1196 test examples, each with 294 features; there are 6 labels per example.
• The compared models and their results are given in the tables in the original paper.

Conclusion
• Presents a nonparametric Bayesian model for the CCA problem.
• Automatically selects the number of correlation components and captures the sparsity pattern of the projection matrices.
• Naturally handles missing data.
• Addresses multi-label prediction by treating it as multitask learning in the shared latent subspace.
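As a rough, self-contained illustration of the synthetic setup in Experiments (I) — a sketch under an invented random sparse ground truth and assumed noise level, not the authors' code — the following generates two views from a shared latent $z$ through sparse projections $W = B \odot V$ and reports the per-component correlations that classical CCA finds:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
N, Dx, Dy, K_true = 100, 25, 10, 4    # sizes from the slide; everything else is invented

# Sparse ground-truth projections W = B * V: binary masks (~63% zeros) times Gaussian weights
Bx = rng.random((Dx, K_true)) > 0.63
By = rng.random((Dy, K_true)) > 0.63
Wx = Bx * rng.normal(size=(Dx, K_true))
Wy = By * rng.normal(size=(Dy, K_true))

Z = rng.normal(size=(N, K_true))               # shared latent variables
X = Z @ Wx.T + 0.1 * rng.normal(size=(N, Dx))  # view 1 (dim 25)
Y = Z @ Wy.T + 0.1 * rng.normal(size=(N, Dy))  # view 2 (dim 10)

# Classical CCA extracts min(Dx, Dy) = 10 components; only ~K_true of them
# should carry substantial correlation.
cca = CCA(n_components=min(Dx, Dy)).fit(X, Y)
Xc, Yc = cca.transform(X, Y)
corrs = [abs(np.corrcoef(Xc[:, k], Yc[:, k])[0, 1]) for k in range(Xc.shape[1])]
print(np.round(corrs, 2))
```

With only 4 true components, most of the 10 reported correlations should be small; the paper's point is that classical CCA still flags extra components as significant and yields no exactly-zero projection weights, whereas the IBP-based model infers both the component count and the zero pattern directly.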