A Generalization of PCA to the Exponential Family
Collins, Dasgupta and Schapire
Presented by Guy Lebanon

Two Viewpoints of PCA

• Algebraic: Given data $D = \{x_1, \ldots, x_n\}$, $x_i \in \mathbb{R}^N$, find a linear transformation $T : \mathbb{R}^N \to \mathbb{R}^K$ such that the sum of squared distances $\sum_i \|x_i - T x_i\|^2$ is minimized (over all linear transformations $\mathbb{R}^N \to \mathbb{R}^K$).

• Statistical: Given data $D = \{x_1, \ldots, x_n\}$, $x_i \in \mathbb{R}^N$, assume that each point $x_i$ is a $N(\theta_i, I)$ random variable. Find the maximum likelihood estimators $\hat{\theta}_i$ under the constraint that the $\{\hat{\theta}_i\}$ lie in a $K$-dimensional subspace and are linearly related to the data.

• The Gaussian assumption may be inappropriate, especially if the data is binary-valued or non-negative, for example.

• Suggestion: replace the Gaussian distribution by any exponential family distribution. Given data $D = \{x_1, \ldots, x_n\}$, $x_i \in \mathbb{R}^N$, such that each point $x_i$ comes from an exponential family distribution $p(x_i \mid \theta_i) = e^{\langle \theta_i, x_i \rangle}\, t(x_i)\, c(\theta_i)$, find the MLE for $\theta_i$ under the assumption that it lies in a low-dimensional subspace.

• The new algorithm finds a linear transformation in the parameter space of the $\theta_i$, but a nonlinear subspace in the original coordinates $x_i$.

• The loss functions may be cast in terms of Bregman distances.

• The loss function is not convex in the general case.

• The authors use the alternating minimization algorithm (Csiszár and Tusnády) to compute the transformation.
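The scheme above can be illustrated for binary data (the Bernoulli case, sometimes called logistic PCA). The sketch below is not the paper's algorithm verbatim: it replaces the exact Csiszár–Tusnády alternating minimization with simple alternating gradient-ascent steps on the two factors of the rank-$K$ natural-parameter matrix, and the function names, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

def bernoulli_loglik(X, Theta):
    """Bernoulli log-likelihood of binary data X at natural parameters Theta,
    up to the carrier term t(x): sum_ij [X_ij * Theta_ij - log(1 + e^Theta_ij)]."""
    return float(np.sum(X * Theta - np.logaddexp(0.0, Theta)))

def logistic_pca(X, K, n_iters=100, lr=0.05, seed=0):
    """Sketch of exponential-family PCA for a binary matrix X (n x N).

    The natural parameters are constrained to a K-dimensional subspace by
    factoring Theta = A @ V with A (n x K) and V (K x N).  We increase the
    Bernoulli log-likelihood by alternating gradient steps on A and V — an
    illustrative stand-in for the paper's alternating minimization.
    """
    n, N = X.shape
    rng = np.random.default_rng(seed)
    A = rng.normal(scale=0.1, size=(n, K))
    V = rng.normal(scale=0.1, size=(K, N))
    for _ in range(n_iters):
        # The gradient of the log-likelihood wrt Theta is X - sigmoid(Theta).
        G = X - 0.5 * (1.0 + np.tanh((A @ V) / 2.0))  # numerically stable sigmoid
        A = A + lr * G @ V.T                          # step on A with V fixed
        G = X - 0.5 * (1.0 + np.tanh((A @ V) / 2.0))
        V = V + lr * A.T @ G                          # step on V with A fixed
    return A, V
```

Note how the subspace is linear in the parameters $\theta_i$ (the rows of `A @ V`), while the fitted means `sigmoid(A @ V)` trace a curved surface in the original data coordinates, matching the bullet above.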