Archetypal Analysis for Machine Learning

Informatics and Mathematical Modelling / Cognitive Systems Group
Morten Mørup
DTU Informatics, Cognitive Systems Group, Technical University of Denmark
Joint work with Lars Kai Hansen
DTU Informatics, Cognitive Systems Group, Technical University of Denmark
MLSP 2010, September 1st
Archetypal Analysis (AA)
$X \approx XCS$
AA is formed by two simplex constraints:
Archetype: $Xc_k$ is formed by a convex combination of the data points
Projection: $s_n$ gives the convex combination of archetypes forming each data point
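The two simplex constraints can be made concrete with a small numerical example. This is a minimal sketch, assuming NumPy and a data matrix X with one data point per column; the helper names `random_simplex_columns` and `aa_reconstruction_error` and the matrix sizes are illustrative and do not come from the slides.

```python
import numpy as np


def random_simplex_columns(rows, cols, rng):
    """Return a non-negative (rows x cols) matrix whose columns each sum to 1."""
    m = rng.random((rows, cols))
    return m / m.sum(axis=0, keepdims=True)


def aa_reconstruction_error(X, C, S):
    """Squared Frobenius error ||X - X C S||_F^2 of the AA model."""
    residual = X - X @ C @ S
    return float(np.sum(residual ** 2))


rng = np.random.default_rng(0)
M, N, K = 5, 40, 3                      # features, data points, archetypes (illustrative sizes)
X = rng.normal(size=(M, N))             # one data point per column
C = random_simplex_columns(N, K, rng)   # column k of C mixes data points into archetype k
S = random_simplex_columns(K, N, rng)   # column n of S mixes archetypes into data point n

archetypes = X @ C                      # shape (M, K): each archetype is a convex combination of data
error = aa_reconstruction_error(X, C, S)
```

Here C and S are random, so `error` only shows how the model is evaluated; fitting AA means choosing C and S on the simplex so that this error is as small as possible.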
The original paper of Cutler and Breiman considered 3 applications:
Swiss Army head shape
Los Angeles Basin air pollution 1976
Tokamak fusion data
Other applications:
Flame dynamics (Stone & Cutler 1996)
End-member extraction of galaxy spectra (Chan et al. 2003)
Data-driven benchmarking (Porzio et al. 2008)
Archetypal analysis extracts the "principal convex hull" (PCH) of the data cloud
Convex hull: Blue lines and light shaded region (dots indicate points in convex set)
Dominant convex hull: green lines and gray shaded region (dots indicate archetypes)
(Dwyer, 1988)
While the convex set can be identified in linear time O(N) (McCallum & Avis 1979), finding C and S is a non-convex (NP-hard) problem.
NB: One might think that AA is highly driven by outliers; however, "outliers" are only relevant if they reflect representative dynamics in the data!
Our (new) mathematical results:
1: The AA/PCH model is in general unique (see Theorem 1).
2: The AA/PCH model can be efficiently initialized by the proposed FurthestSum algorithm.
3: The AA/PCH model parameters can be efficiently optimized by normalization-invariant projected gradient. For details on the derivation of the updates and their computational complexity, see Section 2.3 of the paper.
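A projected gradient step that respects the simplex constraint on S might look as follows. This is a minimal sketch under stated assumptions, not the paper's exact procedure: it assumes NumPy and the squared-error objective ||X - XCS||_F^2, and the function name `update_S`, the step-halving rule, and the default step size are illustrative choices.

```python
import numpy as np


def aa_error(X, C, S):
    """Squared Frobenius error ||X - X C S||_F^2."""
    return float(np.sum((X - X @ C @ S) ** 2))


def update_S(X, C, S, step=1.0, max_halvings=30):
    """One projected gradient step on S that keeps every column on the simplex."""
    A = X @ C                                   # current archetypes, shape (M, K)
    grad = 2.0 * (A.T @ A @ S - A.T @ X)        # gradient of the error with respect to S
    # Chain rule through the normalization s_n / sum(s_n): subtract, per column,
    # the inner product of the gradient with the current column of S.
    grad = grad - np.sum(grad * S, axis=0, keepdims=True)
    old_error = aa_error(X, C, S)
    for _ in range(max_halvings):
        S_new = np.maximum(S - step * grad, 0.0)            # project onto S >= 0
        S_new = S_new / S_new.sum(axis=0, keepdims=True)    # renormalize columns to sum to 1
        if aa_error(X, C, S_new) <= old_error:
            return S_new
        step *= 0.5                                         # step too long: shrink and retry
    return S                                                # no improving step found


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 30))
C = rng.random((30, 3))
C /= C.sum(axis=0, keepdims=True)
S = rng.random((3, 30))
S /= S.sum(axis=0, keepdims=True)

before = aa_error(X, C, S)
for _ in range(20):
    S = update_S(X, C, S)
after = aa_error(X, C, S)
```

An update for C can be written along the same lines, alternating with the S step until the error stops decreasing.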
The proposed FurthestSum algorithm guarantees extraction of points in the convex set (see Theorem 2).
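The idea behind the initialization can be sketched in a few lines. This is a minimal sketch based on our reading of the FurthestSum idea, assuming NumPy, Euclidean distances, and data points stored as columns of X; the function name `furthest_sum` and the toy data are illustrative, and the paper should be consulted for the exact algorithm.

```python
import numpy as np


def furthest_sum(X, n_archetypes, seed_index):
    """Pick column indices of X that are far apart, in the spirit of FurthestSum.

    Starting from a seed point, repeatedly add the point whose summed Euclidean
    distance to all currently selected points is largest. Once enough points are
    chosen, the seed is dropped and one more point is selected in its place.
    """

    def distances_to(j):
        return np.linalg.norm(X - X[:, [j]], axis=0)    # distance from point j to every point

    def pick(dist_sum, selected):
        candidates = dist_sum.copy()
        candidates[np.asarray(selected, dtype=int)] = -np.inf   # never pick a point twice
        return int(np.argmax(candidates))

    selected = [seed_index]
    dist_sum = distances_to(seed_index)
    while len(selected) < n_archetypes:
        j = pick(dist_sum, selected)
        selected.append(j)
        dist_sum += distances_to(j)

    # Replace the arbitrary seed by one more furthest point.
    dist_sum -= distances_to(seed_index)
    selected.pop(0)
    selected.insert(0, pick(dist_sum, selected))
    return selected


corners = np.array([[0.0, 1.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0, 1.0]])
interior = np.array([[0.5, 0.4, 0.6],
                     [0.5, 0.6, 0.3]])
X = np.hstack([corners, interior])        # columns 0-3 are the corners of the unit square
chosen = furthest_sum(X, n_archetypes=3, seed_index=4)   # seed is an interior point
```

In this toy example the seed lies inside the square, yet every index in `chosen` refers to a corner, which illustrates the kind of guarantee stated in Theorem 2.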
Large scale Applications
Our Machine Learning Applications
• Computer Vision
• NeuroImaging
• Text Mining
• Collaborative Filtering
Computer Vision: CBCL face database
Face database: K=361 pixels, N=2429 images; all images belong with probability 1 to the convex set
$X \approx XCS$
SVD/PCA: Low -> high freq. dynamics
NMF: Part Based Representation
AA: Archetypes/Freaks
K-means: Centroids/Prototypes
Archetypal Analysis naturally
bridges clustering methods with
low rank representations
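One way to see the bridge to clustering is that a hard clustering can be written in the same form, X ≈ XCS, with S restricted to one-hot columns and each column of C averaging the members of one cluster; AA relaxes S to the whole simplex. This is a minimal sketch, assuming NumPy, non-empty clusters, and a hypothetical helper `kmeans_as_xcs`; the data and labels are made up for illustration.

```python
import numpy as np


def kmeans_as_xcs(labels, n_clusters):
    """Encode a hard clustering as (C, S) so that X @ C holds the cluster centroids."""
    n_points = labels.shape[0]
    S = np.zeros((n_clusters, n_points))
    S[labels, np.arange(n_points)] = 1.0          # one-hot column per data point
    C = S.T / S.sum(axis=1)                       # column k averages the members of cluster k
    return C, S


X = np.array([[0.0, 2.0, 10.0, 12.0, 14.0],
              [1.0, 3.0, 5.0, 5.0, 5.0]])         # 5 points in 2-D, one per column
labels = np.array([0, 0, 1, 1, 1])                # an illustrative hard assignment
C, S = kmeans_as_xcs(labels, n_clusters=2)

centroids = X @ C                                 # plays the role of the archetypes XC in AA
reconstruction = centroids @ S                    # every point is replaced by its centroid
```

Both C and S satisfy the simplex constraints here, so the clustering solution is a feasible (though generally not optimal) point of the AA model.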
NeuroImaging: Positron Emission Tomography
$X \approx XCS$
An altanserin tracer is injected; the recorded signal is in theory a mixture of three underlying binding profiles (archetypes): low-binding regions, high-binding regions, and arteries/veins. Each voxel has a given concentration fraction of these tissue types.
[Figure: panels XC and S for the three archetypes: Low Binding, High Binding, Artery/Veins]
Text Mining: NIPS term-document (bag of words)
$X \approx XCS$
XC:
Distinct Aspects
Prototypical Aspects
Collaborative filtering: MovieLens
Medium-size and large-size MovieLens data (www.grouplens.org)
Medium size: 1,000,209 ratings of 3,952 movies by 6,040 users
Large size: 10,000,054 ratings of 10,677 movies given by 71,567 users
Extracts features representing distinct user types; each user is represented as a given concentration fraction of the user types. AA appears to have less tendency to overfit.
Conclusion
• Archetypal Analysis is unique in general (Theorem 1).
• Archetypal Analysis can be efficiently initialized by the proposed FurthestSum algorithm (Theorem 2) and optimized through normalization-invariant projected gradient.
• Archetypal Analysis naturally bridges clustering with low-rank approximations.
• Archetypal Analysis results in easily interpretable features that are closely related to the actual data.
• Archetypal Analysis is useful for a large variety of machine learning problem domains within unsupervised learning (Computer Vision, NeuroImaging, Text Mining, Collaborative Filtering).
• Archetypal Analysis can be extended to kernel representations, finding the principal convex hull in a (potentially infinite-dimensional) Hilbert space (see Section 2.4 of the paper).
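The kernel extension rests on the fact that the squared error depends on X only through inner products: with the Gram matrix $G = X^\top X$, $\|X - XCS\|_F^2 = \mathrm{tr}(G) - 2\,\mathrm{tr}(GCS) + \mathrm{tr}(S^\top C^\top G C S)$. This is a minimal sketch of that identity, assuming NumPy; the function name `kernel_aa_error`, the symbol G, and the RBF kernel with bandwidth 1.0 are illustrative choices, not taken from the slides.

```python
import numpy as np


def kernel_aa_error(G, C, S):
    """AA squared error computed only from a Gram (kernel) matrix G of shape (N, N)."""
    CS = C @ S
    return float(np.trace(G) - 2.0 * np.trace(G @ CS) + np.trace(CS.T @ G @ CS))


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 25))                 # 25 data points, one per column
C = rng.random((25, 4))
C /= C.sum(axis=0, keepdims=True)            # columns of C on the simplex
S = rng.random((4, 25))
S /= S.sum(axis=0, keepdims=True)            # columns of S on the simplex

G_linear = X.T @ X                           # linear kernel reproduces ordinary AA
direct = float(np.sum((X - X @ C @ S) ** 2))
via_kernel = kernel_aa_error(G_linear, C, S)

# Any positive semidefinite kernel matrix can be used instead; an RBF kernel as an example.
sq_norms = np.sum(X ** 2, axis=0)
sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * G_linear
G_rbf = np.exp(-sq_dists / (2.0 * 1.0 ** 2)) # bandwidth 1.0 is an arbitrary choice
rbf_error = kernel_aa_error(G_rbf, C, S)
```

With the linear kernel, `via_kernel` agrees with `direct` up to rounding; with the RBF kernel, the same function measures the error in the implicit feature space without ever forming feature vectors.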
Open problems and current research directions:
• What is the optimal number of components?
  Cross-validation based on missing value prediction (see also the collaborative filtering example in the paper).
  Bayesian generative models for AA/PCH that automatically penalize model complexity.
• What if 'pure' archetypes cannot be well represented by the data available?
Selected References from the paper
[1] Adele Cutler and Leo Breiman, “Archetypal analysis,” Technometrics, vol. 36, no. 4, pp. 338–347, Nov. 1994.
[2] D. S. Hochbaum and D. B. Shmoys, “A best possible heuristic for the k-center problem,” Mathematics of Operations Research, vol. 10, no. 2, pp. 180–184, 1985.
[7] Emily Stone and Adele Cutler, “Introduction to archetypal analysis of spatiotemporal dynamics,” Phys. D, vol. 96, no. 1-4, pp. 110–131, 1996.
[8] Giovanni C. Porzio, Giancarlo Ragozini, and Domenico Vistocco, “On the use of archetypes as benchmarks,” Appl. Stoch. Model. Bus. Ind., vol. 24, no. 5, pp. 419–437, 2008.
[9] B. H. P. Chan, D. A. Mitchell, and L. E. Cram, “Archetypal analysis of galaxy spectra,” Mon. Not. R. Astron. Soc., vol. 338, p. 790, 2003.
[11] D. McCallum and D. Avis, “A linear algorithm for finding the convex hull of a simple polygon,” Information Processing Letters, vol. 9, pp. 201–206, 1979.
[12] Rex A. Dwyer, “On the convex hull of random points in a polytope,” Journal of Applied Probability, vol. 25, no. 4, pp. 688–699, 1988.