Expectation Maximization (EM) – Basic Notions [Dempster, Laird & Rubin 1977]
• Consider points p = (x1, ..., xd) from a d-dimensional Euclidean vector space
• Clusters are represented by probability distributions
• Typically: a mixture of Gaussian distributions
• Representation of a cluster C:
  - Center point μ_C of all points in the cluster
  - d × d covariance matrix Σ_C for the points in cluster C
• Density function for cluster C:

  P(x \mid C) = \frac{1}{\sqrt{(2\pi)^d \, |\Sigma_C|}} \cdot e^{-\frac{1}{2}(x - \mu_C)^T \Sigma_C^{-1} (x - \mu_C)}
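To make this concrete, here is a minimal NumPy sketch of the cluster density function above; the function name gaussian_density is an illustrative choice, not from the slides.

import numpy as np

def gaussian_density(x, mu, cov):
    """Density P(x | C) of a d-dimensional Gaussian with center mu and covariance cov."""
    d = len(mu)
    diff = x - mu
    # Normalization factor: sqrt((2*pi)^d * |Sigma_C|)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    # Exponent: -1/2 * (x - mu)^T Sigma_C^{-1} (x - mu)
    return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm

As a quick check, gaussian_density(np.zeros(2), np.zeros(2), np.eye(2)) evaluates to 1/(2π) ≈ 0.159, the peak of a 2D standard Gaussian.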
EM: Gaussian Mixture – 2D examples
[Figure: two 2D examples – a single Gaussian density function, and a Gaussian mixture model with k = 3]
EM – Basic Notions
• Density function for clustering M = {C1, ..., Ck}
  - Let W_i denote the fraction of cluster C_i in the entire data set D

  P(x) = \sum_{i=1}^{k} W_i \cdot P(x \mid C_i)

• Assignment of points to clusters
  - A point may belong to several clusters with different probabilities

  P(C_i \mid x) = W_i \cdot \frac{P(x \mid C_i)}{P(x)}

• Maximize E(M), a measure for the quality of a clustering M
  - E(M) indicates the probability that the data D have been generated by following the distribution model M

  E(M) = \sum_{x \in D} \log(P(x))
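A sketch of these three quantities in the same NumPy setting, reusing gaussian_density from the earlier sketch; the names mixture_density, responsibility, and log_likelihood are mine, not from the slides.

import numpy as np

def mixture_density(x, weights, mus, covs):
    """P(x) = sum over i of W_i * P(x | C_i)."""
    return sum(w * gaussian_density(x, mu, cov)
               for w, mu, cov in zip(weights, mus, covs))

def responsibility(i, x, weights, mus, covs):
    """P(C_i | x) = W_i * P(x | C_i) / P(x)."""
    return (weights[i] * gaussian_density(x, mus[i], covs[i])
            / mixture_density(x, weights, mus, covs))

def log_likelihood(D, weights, mus, covs):
    """E(M) = sum over x in D of log P(x)."""
    return sum(np.log(mixture_density(x, weights, mus, covs)) for x in D)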
EM – Algorithm
ClusteringByExpectationMaximization(point set D, int k)
  Generate an initial model M' = (C1', ..., Ck')
  repeat
    // (re-)assign points to clusters
    Compute P(x | Ci), P(x), and P(Ci | x) for each object x from D
    and each cluster (= Gaussian) Ci
    // (re-)compute the models
    Compute a new model M = {C1, ..., Ck} by recomputing W_i,
    μ_i, and Σ_i for each cluster Ci
    Replace M' by M
  until |E(M) - E(M')| < ε
  return M
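Under the same assumptions as the previous sketches, this loop might look as follows in Python; the initialization strategy, the threshold eps, and the split into e_step/m_step helpers (m_step is sketched after the next slide) are illustrative choices, not part of the original pseudocode.

import numpy as np

def em_clustering(D, k, eps=1e-4, seed=0):
    """EM loop following the pseudocode above; D is an (n, d) NumPy array."""
    rng = np.random.default_rng(seed)
    n, d = D.shape
    # Initial model M': k sample points as centers, unit covariances, uniform weights
    mus = D[rng.choice(n, size=k, replace=False)]
    covs = [np.eye(d) for _ in range(k)]
    weights = np.full(k, 1.0 / k)
    old_e = -np.inf
    while True:
        resp, p_x = e_step(D, weights, mus, covs)  # P(C_i | x) and P(x)
        weights, mus, covs = m_step(D, resp)       # recompute W_i, mu_i, Sigma_i
        e = np.log(p_x).sum()                      # E(M)
        if abs(e - old_e) < eps:                   # |E(M) - E(M')| < eps
            return weights, mus, covs
        old_e = e

def e_step(D, weights, mus, covs):
    """Responsibilities P(C_i | x) for every object x and cluster C_i."""
    dens = np.array([[w * gaussian_density(x, mu, cov)
                      for w, mu, cov in zip(weights, mus, covs)] for x in D])
    p_x = dens.sum(axis=1)           # P(x) per object
    return dens / p_x[:, None], p_x  # P(C_i | x) = W_i * P(x | C_i) / P(x)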
EM – Recomputation of Parameters
• Weight W_i of cluster C_i:

  W_i = \frac{1}{n} \sum_{x \in D} P(C_i \mid x)

• Center μ_i of cluster C_i:

  \mu_i = \frac{\sum_{x \in D} x \cdot P(C_i \mid x)}{\sum_{x \in D} P(C_i \mid x)}

• Shape matrix Σ_i of cluster C_i:

  \Sigma_i = \frac{\sum_{x \in D} P(C_i \mid x)(x - \mu_i)(x - \mu_i)^T}{\sum_{x \in D} P(C_i \mid x)}
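The same update step as a NumPy sketch, completing the em_clustering loop above; resp is the n × k matrix of responsibilities P(C_i | x) produced by the E-step.

import numpy as np

def m_step(D, resp):
    """Recompute W_i, mu_i, and Sigma_i from the responsibilities."""
    n, d = D.shape
    totals = resp.sum(axis=0)                # sum over x of P(C_i | x), one per cluster
    weights = totals / n                     # W_i
    mus = (resp.T @ D) / totals[:, None]     # mu_i: responsibility-weighted means
    covs = []
    for i in range(resp.shape[1]):
        diff = D - mus[i]                    # (x - mu_i) for all x
        covs.append((resp[:, i, None] * diff).T @ diff / totals[i])  # Sigma_i
    return weights, mus, covs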
EM – Discussion
• Convergence to a (possibly local) maximum of E(M)
• Computational effort: O(n · |M| · #iterations)
  - #iterations is quite high in many cases
• Both result and runtime strongly depend on ...
  - ... the initial assignment
  - ... a proper choice of the parameter k (= desired number of clusters)
• Modification to obtain a really partitioning variant
  - Objects may belong to several clusters
  - Assign each object to the cluster to which it belongs with the highest probability (see the sketch below)
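A one-line sketch of this hard-assignment step, applied to the responsibility matrix resp from the E-step sketch above:

labels = resp.argmax(axis=1)  # index of the cluster with the highest P(C_i | x) per object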
Clustering (II) – Summary
• Advanced Clustering Topics
  - Incremental Clustering, Generalized DBSCAN, Outlier Detection
• Scaling Up Clustering Algorithms
  - BIRCH, Data Bubbles, Index-based Clustering, GRID Clustering
• Sets of Similarity Queries (Similarity Join)
• Subspace Clustering
• Expectation Maximization: a statistical approach