A Generalization of PCA to the Exponential Family
Collins, Dasgupta and Schapire
Presented by Guy Lebanon
Two Viewpoints of PCA
• Algebraic
Given data $D = \{x_1, \ldots, x_n\}$, $x_i \in \mathbb{R}^N$, find a linear projection $T$ of $\mathbb{R}^N$ onto a $K$-dimensional subspace such that the sum of squared distances $\sum_i \| x_i - T x_i \|^2$ is minimized (over all such rank-$K$ linear maps). A small numeric sketch follows this list.
• Statistical
Given data $D = \{x_1, \ldots, x_n\}$, $x_i \in \mathbb{R}^N$, assume that each point $x_i$ is a $N(\theta_i, I)$ random variable. Find the maximum likelihood estimators $\hat{\theta}_i$ under the constraint that the $\{\hat{\theta}_i\}$ lie in a $K$-dimensional subspace and are linearly related to the data, as made precise below.
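The equivalence of the two viewpoints, spelled out (this step is implicit in the slides): for unit-variance Gaussians the log-likelihood is, up to an additive constant, the negative sum of squared distances,

\[
\log \prod_{i} p(x_i \mid \theta_i) \;=\; -\tfrac{1}{2} \sum_{i} \| x_i - \theta_i \|^2 \;+\; \text{const},
\]

so maximizing the constrained likelihood is exactly minimizing the PCA objective with $\theta_i = T x_i$.

As a concrete illustration of the algebraic view (not from the slides; the function name and data are hypothetical), a minimal numpy sketch that recovers the optimal rank-$K$ projection from the SVD:

import numpy as np

def pca_projection(X, K):
    # X: (n, N) data matrix, assumed centered; K: target rank.
    # The top-K right singular vectors span the subspace that
    # minimizes the total squared reconstruction error.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:K].T                 # (N, K) orthonormal basis
    return V @ V.T               # rank-K projection matrix T

# Usage: total squared distance sum_i ||x_i - T x_i||^2
X = np.random.randn(100, 10)
X -= X.mean(axis=0)              # center the data first
T = pca_projection(X, K=3)
err = np.sum((X - X @ T) ** 2)   # T is symmetric, so rows project as X @ T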
• The Gaussian assumption may be inappropriate, for example when the data is binary-valued or non-negative.
• Suggestion: replace the Gaussian distribution by any exponential family distribution.
Given data $D = \{x_1, \ldots, x_n\}$, $x_i \in \mathbb{R}^N$, such that each point $x_i$ comes from an exponential family distribution $p(x \mid \theta_i) = e^{\langle \theta_i, x \rangle + t(x) - c(\theta_i)}$, find the MLE for $\theta_i$ under the assumption that it lies in a low-dimensional subspace.
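To make the generalized objective concrete, here is an illustrative numpy sketch (not the authors' code; the names and shapes are my own choices) of the Bernoulli case, where exponential-family PCA becomes a logistic PCA: the natural parameters are constrained to a rank-$K$ factorization $\Theta = AV$, and the loss is the negative log-likelihood.

import numpy as np

def bernoulli_nll(X, A, V):
    # X: (n, N) binary data; A: (n, K) coefficients; V: (K, N) basis.
    # Each row of Theta = A @ V is a natural parameter theta_i
    # restricted to the K-dimensional row space of V.
    Theta = A @ V
    # For the Bernoulli family, t(x) = 0 and c(theta) = log(1 + e^theta),
    # so -log p(x | theta) = c(theta) - x * theta.
    return np.sum(np.logaddexp(0.0, Theta) - X * Theta)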
• The new algorithm finds a linear subspace in the parameter space of the $\theta_i$, which corresponds to a nonlinear surface in the original coordinates $x$.
• The loss functions may be cast in terms of Bregman distances (the definition is given after this list).
• The loss function is not jointly convex in the general case, although each subproblem is convex when the other block of parameters is held fixed.
• The authors use the alternating minimization algorithm of Csiszár and Tusnády to compute the transformation; a sketch follows below.
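For reference, the standard definition used above: the Bregman distance generated by a differentiable convex function $F$ is

\[
B_F(p \,\|\, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle,
\]

and for an exponential family the negative log-likelihood of $x$ at parameter $\theta$ equals $B_F(x \,\|\, g(\theta))$ up to terms that do not depend on $\theta$, where $g = \nabla c$ and $F$ is the convex conjugate of $c$.

A minimal sketch of the alternating scheme for the Bernoulli loss defined earlier, assuming plain gradient steps in place of the exact coordinate-wise minimizations (step size, iteration count, and initialization scale are arbitrary illustrative choices):

import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def exp_pca_bernoulli(X, K, iters=500, lr=0.1, seed=0):
    # Alternately improve the coefficients A (with V fixed) and the
    # basis V (with A fixed); each subproblem is convex even though
    # the joint problem is not.
    rng = np.random.default_rng(seed)
    n, N = X.shape
    A = 0.01 * rng.standard_normal((n, K))
    V = 0.01 * rng.standard_normal((K, N))
    for _ in range(iters):
        R = sigmoid(A @ V) - X    # gradient of the NLL w.r.t. Theta
        A -= lr * (R @ V.T)       # descend in A, holding V fixed
        R = sigmoid(A @ V) - X
        V -= lr * (A.T @ R)       # descend in V, holding A fixed
    return A, V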