Learning representations
By Yang Zechao
The Machine Learning course: www.cs.cmu.edu/~tom/10701_sp11/lectures.shtml

Neural Nets for Face Recognition

Learning Lower Dimensional Representations
• Supervised learning of a lower dimensional representation
  – Hidden layers in neural networks
  – Fisher linear discriminant
• Unsupervised learning of a lower dimensional representation
  – Principal Components Analysis (PCA)
  – Independent Components Analysis (ICA)
  – Canonical Correlation Analysis (CCA)
  – Deep Belief Networks (DBN)

Principal Components Analysis
• Idea:
  – Given data points in d-dimensional space, project them into a lower dimensional space while preserving as much information as possible
    • E.g., find the best planar approximation to 3D data
    • E.g., find the best planar approximation to 10^4-D data
  – In particular, choose the projection that minimizes the squared error in reconstructing the original data

Principal Components Analysis
• Like auto-encoding neural networks, PCA learns a re-representation of the input data that can best reconstruct it.

Principal Components Analysis
• The learned encoding is a linear function of the inputs (not logistic)
• No local-minimum problems when training!
• Given d-dimensional data X, PCA learns a d-dimensional representation, where
  – the dimensions are orthogonal
  – the top k dimensions form the k-dimensional linear re-representation that minimizes reconstruction error (sum of squared errors)

Principal Components Analysis
Assume the data is a set of d-dimensional vectors, where the nth vector is $x^n = \langle x_1^n, \dots, x_d^n \rangle$.
We can represent these in terms of any d orthonormal vectors $u_1, \dots, u_d$:
$$x^n = \sum_{i=1}^{d} z_i^n u_i, \qquad u_i^T u_j = \delta_{ij}$$
So, PCA: given M < d, find $\langle u_1, \dots, u_M \rangle$ that minimizes
$$E_M \equiv \sum_{n=1}^{N} \left\| x^n - \hat{x}^n \right\|^2$$
where $\hat{x}^n = \bar{x} + \sum_{i=1}^{M} z_i^n u_i$ and $\bar{x} = \frac{1}{N}\sum_{n=1}^{N} x^n$ is the mean.

Principal Components Analysis
• Note we get zero error if M = d, so all of the error is due to the missing components.
• Therefore,
$$E_M = \sum_{i=M+1}^{d} \sum_{n=1}^{N} \left( u_i^T (x^n - \bar{x}) \right)^2 = \sum_{i=M+1}^{d} u_i^T \Sigma u_i$$
where the covariance matrix is $\Sigma = \sum_{n=1}^{N} (x^n - \bar{x})(x^n - \bar{x})^T$.

Principal Components Analysis
• Minimize $E_M = \sum_{i=M+1}^{d} u_i^T \Sigma u_i$
• Using a Lagrange multiplier for each constraint $u_i^T u_i = 1$, minimize $u_i^T \Sigma u_i + \lambda_i (1 - u_i^T u_i)$
• Setting the derivative to zero gives $\Sigma u_i = \lambda_i u_i$, so each $u_i$ is an eigenvector of $\Sigma$
• Then $E_M = \sum_{i=M+1}^{d} \lambda_i$: the error is the sum of the discarded eigenvalues, so it is minimized by keeping the top-M eigenvectors

PCA algorithm
1. X ← create the N x d data matrix, with one row vector x^n per data point
2. X ← subtract the mean $\bar{x}$ from each row vector x^n in X
3. Σ ← covariance matrix of X
4. Find the eigenvectors and eigenvalues of Σ
5. PCs ← the M eigenvectors with the largest eigenvalues

Very Nice When the Initial Dimension Is Not Too Big
• What if the data is very high dimensional?
  – e.g., images (d ≥ 10^4)
• Problem:
  – the covariance matrix Σ is of size d x d
  – d = 10^4 ⇒ Σ has 10^8 entries
• Singular Value Decomposition (SVD) to the rescue!
  – pretty efficient algorithms available, including Matlab's SVD
  – some implementations find just the top N eigenvectors
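To make the two routes concrete, here is a minimal NumPy sketch (not part of the original slides) of PCA computed both ways: via the eigendecomposition of the d x d covariance matrix, as in the algorithm above, and via an SVD of the zero-centered data matrix, which never forms the covariance matrix and is the practical choice when d is large. The function names `pca_eig` and `pca_svd` and the self-check at the end are illustrative.

```python
# Minimal sketch of the two PCA routes discussed above:
# (1) eigendecomposition of the d x d covariance matrix, and
# (2) SVD of the zero-centered data matrix, which avoids ever forming
#     the covariance matrix when d is large.
import numpy as np

def pca_eig(X, M):
    """PCA via the covariance matrix (steps 1-5 of the algorithm above)."""
    Xc = X - X.mean(axis=0)                   # step 2: subtract the mean
    Sigma = np.cov(Xc, rowvar=False)          # step 3: d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # step 4: eigen-decomposition
    order = np.argsort(eigvals)[::-1]         # sort by decreasing eigenvalue
    return eigvecs[:, order[:M]], eigvals[order[:M]]  # step 5: top-M PCs

def pca_svd(X, M):
    """PCA via SVD of the centered data; never builds the d x d covariance."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # rows of Vt are the principal directions; s_k**2 / (N-1) match the
    # eigenvalues of np.cov, which normalizes by N-1
    return Vt[:M].T, (s[:M] ** 2) / (X.shape[0] - 1)

# Example: the two routes agree up to sign on random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
W1, _ = pca_eig(X, M=3)
W2, _ = pca_svd(X, M=3)
print(np.allclose(np.abs(W1.T @ W2), np.eye(3), atol=1e-6))
```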
SVD
$$X = U S V^T$$
• X: the data, one row per data point
• U gives the coordinates of the rows of X in the space of principal components
• S is diagonal with $s_k > s_{k+1}$; $s_k^2$ is the kth eigenvalue
• The rows of $V^T$ are unit-length eigenvectors of $X^T X$

Singular Value Decomposition
To generate the principal components:
• Subtract the mean $\bar{x} = \frac{1}{N}\sum_{n=1}^{N} x^n$ from each data point, to create zero-centered data
• Create the matrix X with one row vector per (zero-centered) data point
• Solve the SVD: $X = U S V^T$
• Output the principal components: the columns of V (= the rows of $V^T$)
  – the eigenvectors in V are sorted from largest to smallest eigenvalue
  – S is diagonal, with $s_k^2$ giving the eigenvalue for the kth eigenvector

Singular Value Decomposition
• To project a point (column vector x) into PC coordinates: $V^T x$
• If $x_i$ is the ith row of the data matrix X, then the ith row of $XV = US$ gives the PC coordinates of $x_i$
• To project a column vector x onto the M-dimensional principal-components subspace, take just the first M coordinates of $V^T x$

Independent Components Analysis
• PCA seeks orthogonal directions $\langle Y_1, \dots, Y_M \rangle$ in the feature space X that minimize reconstruction error
• ICA seeks directions $\langle Y_1, \dots, Y_M \rangle$ that are most statistically independent, i.e., that minimize I(Y), the mutual information between the $Y_j$:
$$I(Y) = \sum_{j} H(Y_j) - H(Y)$$

Learning Lower Dimensional Representations
• Supervised learning of a lower dimensional representation
  – Hidden layers in neural networks
  – Fisher linear discriminant
• Unsupervised learning of a lower dimensional representation
  – Principal Components Analysis (PCA)
  – Independent Components Analysis (ICA)
  – Canonical Correlation Analysis (CCA)
  – Deep Belief Networks (DBN)

Dimensionality reduction across multiple datasets
• Given data sets A and B, find linear projections of each into a common lower dimensional space!
  – Generalized SVD: minimize the squared reconstruction errors of both
  – Canonical correlation analysis: maximize the correlation of A and B in the projected space
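Before the next slide describes CCA in words, here is a minimal NumPy sketch of the idea, assuming the standard whitening-plus-SVD construction of the canonical directions; the function name `cca` and the small ridge term `eps` are my own additions for numerical safety, not part of the slides.

```python
# Minimal sketch of canonical correlation analysis (CCA): find direction
# pairs (a, b) so that the correlation between X a and Y b is maximized.
import numpy as np

def cca(X, Y, n_components=1, eps=1e-8):
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    N = X.shape[0]
    Cxx = X.T @ X / (N - 1) + eps * np.eye(X.shape[1])   # covariance of X
    Cyy = Y.T @ Y / (N - 1) + eps * np.eye(Y.shape[1])   # covariance of Y
    Cxy = X.T @ Y / (N - 1)                              # cross-covariance

    def inv_sqrt(C):
        # inverse matrix square root of a symmetric positive-definite matrix
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

    # whiten both views, then SVD of the whitened cross-covariance
    K = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(K)
    A = inv_sqrt(Cxx) @ U[:, :n_components]    # projection directions for X
    B = inv_sqrt(Cyy) @ Vt[:n_components].T    # projection directions for Y
    return A, B, s[:n_components]              # s holds the canonical correlations

# Example: two noisy views of a shared 1-D latent signal.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = z @ rng.normal(size=(1, 5)) + 0.1 * rng.normal(size=(500, 5))
Y = z @ rng.normal(size=(1, 4)) + 0.1 * rng.normal(size=(500, 4))
A, B, corr = cca(X, Y)
print(corr)  # the first canonical correlation should be close to 1
```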
Canonical Correlation Analysis
• Measures the linear relationship between two multi-dimensional variables
• Finds two sets of basis vectors such that the correlation between the projections of the variables onto these basis vectors is maximized
• Determines the correlation coefficients

Learning Lower Dimensional Representations
• Supervised learning of a lower dimensional representation
  – Hidden layers in neural networks
  – Fisher linear discriminant
• Unsupervised learning of a lower dimensional representation
  – Principal Components Analysis (PCA)
  – Independent Components Analysis (ICA)
  – Canonical Correlation Analysis (CCA)
  – Deep Belief Networks (DBN)

Deep Belief Networks
• Problem: training networks with many hidden layers doesn't work very well
  – local minima, and very slow training if initialized with zero weights
• Deep belief networks
  – autoencoder networks that learn low dimensional encodings
  – but with more layers, to learn better encodings

Deep Belief Networks
[Figure: image reconstructions. Second row: reconstructed from a 2000-1000-500-30 DBN; third row: reconstructed with linear PCA.]

[Figure: encoding of digit images in two dimensions, comparing a 784-2 linear encoding (PCA) with a 784-1000-500-250-2 DBN.]

Learning Lower Dimensional Representations
• Supervised learning of a lower dimensional representation
  – Hidden layers in neural networks
  – Fisher linear discriminant
• Unsupervised learning of a lower dimensional representation
  – Principal Components Analysis (PCA)
  – Independent Components Analysis (ICA)
  – Canonical Correlation Analysis (CCA)
  – Deep Belief Networks (DBN)

Fisher Linear Discriminant
Objective:
• LDA seeks to reduce dimensionality while preserving as much of the class-discriminatory information as possible
• We seek to obtain a scalar y by projecting the samples x onto a line: $y = w^T x$

Fisher Linear Discriminant
• Of all the possible lines, we would like to select the one that maximizes the separability of the projected scalars
• In order to find a good projection vector, we need to define a measure of separation

Fisher Linear Discriminant
• Define the class means $\mu_i = \frac{1}{N_i} \sum_{x \in \omega_i} x$, with projected means $\tilde{\mu}_i = w^T \mu_i$
• We could choose w to maximize the distance between the projected means, $|\tilde{\mu}_1 - \tilde{\mu}_2| = |w^T(\mu_1 - \mu_2)|$

Fisher Linear Discriminant
• For each class we define the scatter, an equivalent of the variance, as $\tilde{s}_i^2 = \sum_{y \in \omega_i} (y - \tilde{\mu}_i)^2$
• The Fisher linear discriminant chooses the w that maximizes
$$J(w) = \frac{|\tilde{\mu}_1 - \tilde{\mu}_2|^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$$

Fisher Linear Discriminant
• Choose an (n-1)-dimensional projection for an n-class classification problem
• Use the within-class covariances to determine the projection
• Minimizes a different error function (the projected within-class variances)

Thank You!
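As a small appendix to the Fisher linear discriminant slides above, here is a minimal two-class sketch in NumPy, assuming the standard closed-form solution $w \propto S_W^{-1}(\mu_1 - \mu_2)$ that maximizes the criterion J(w); the function name, variable names, and the synthetic example are illustrative only.

```python
# Minimal sketch of the two-class Fisher linear discriminant: choose w
# maximizing J(w) = |mu1~ - mu2~|^2 / (s1~^2 + s2~^2), whose solution is
# w proportional to S_W^{-1} (mu1 - mu2).
import numpy as np

def fisher_lda_direction(X1, X2):
    """X1, X2: arrays of shape (N_i, d) holding the samples of each class."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # within-class scatter matrix S_W = S_1 + S_2
    S1 = (X1 - mu1).T @ (X1 - mu1)
    S2 = (X2 - mu2).T @ (X2 - mu2)
    Sw = S1 + S2
    w = np.linalg.solve(Sw, mu1 - mu2)   # w proportional to S_W^{-1}(mu1 - mu2)
    return w / np.linalg.norm(w)

# Example: project two Gaussian classes onto the discriminant direction.
rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0, 0], scale=[1.0, 3.0], size=(200, 2))
X2 = rng.normal(loc=[3, 1], scale=[1.0, 3.0], size=(200, 2))
w = fisher_lda_direction(X1, X2)
y1, y2 = X1 @ w, X2 @ w                  # projected scalars y = w^T x
print(abs(y1.mean() - y2.mean()) / np.sqrt(y1.var() + y2.var()))
```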