Cluster analysis

Cluster Analysis
1. Single Link Cluster Analysis
2. Ward’s Minimum Sum of Squares
3. k-Means Cluster Analysis
4. SPSS TwoStep Cluster Analysis
Single-Link Clustering
(most popular method)
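Single Link: Join item to cluster which has the single closest member.
Since B < q, join the star to the Left cluster, even though A > q and C > q.
[Figure: scatter plot with a Left and a Right cluster; axes are Complete Pain Relief (Importance) and Cost (Importance). A, B, and C are the distances from the star to the members of the Left cluster, and q is its distance to the closest member of the Right cluster.]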
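The joining rule above can be sketched in a few lines of Python. The coordinates are hypothetical importance scores chosen only so that one Left member is closer to the star than any Right member; the function names are illustrative and do not come from the slides.

```python
import math

def single_link_distance(item, cluster):
    """Distance from the item to the single closest member of the cluster."""
    return min(math.dist(item, member) for member in cluster)

def nearest_cluster(item, clusters):
    """Name of the cluster whose closest member is nearest to the item."""
    return min(clusters, key=lambda name: single_link_distance(item, clusters[name]))

# Hypothetical (complete pain relief, cost) importance scores.
clusters = {
    "Left": [(1.0, 1.0), (1.5, 4.0), (3.0, 2.5)],
    "Right": [(7.0, 2.0), (8.0, 3.0)],
}
star = (4.5, 2.5)

print(nearest_cluster(star, clusters))  # Left
```

Only one Left member is closer to the star than the nearest Right member; the other two are farther away, yet the star still joins Left because single link looks only at the closest member.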
Single Chain Agglomerative Procedure
Part-Worth Coefficients of “Complete Pain Relief”
Single Link: Join item to cluster which has the single closest member.

First Stage (part-worth value of each therapy):
A=2  B=5  C=9  D=10  E=15

Second Stage (Euclidean distance between each pair):
AB=3  AC=7  AD=8  AE=13
BC=4  BD=5  BE=10
CD=1  CE=6
DE=5
C and D are closest (CD=1), so they are joined first.

Third Stage (distance to cluster CD is the distance to its closest member):
CDA=7  CDB=4  CDE=5
AB=3  AE=13  BE=10
A and B are closest (AB=3), so they are joined.

Fourth Stage:
ABE=10  CDE=5  ABCD=4
Clusters AB and CD are closest (ABCD=4), so they are joined.

Fifth Stage:
ABCDE=5
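The staged procedure can be reproduced with a short Python sketch. It assumes the one-dimensional Euclidean distance is the absolute difference between part-worth values; the function name `single_link` and the naming of merged clusters are illustrative choices, not part of the slides.

```python
from itertools import combinations

def single_link(values):
    """Single-link agglomeration of 1-D items; returns (merged name, distance) per join."""
    clusters = {name: [v] for name, v in values.items()}
    merges = []
    while len(clusters) > 1:
        def dist(pair):
            # Distance between two clusters = distance between their closest members.
            a, b = pair
            return min(abs(x - y) for x in clusters[a] for y in clusters[b])
        a, b = min(combinations(clusters, 2), key=dist)
        d = dist((a, b))
        name = "".join(sorted(a + b))
        clusters[name] = clusters.pop(a) + clusters.pop(b)
        merges.append((name, d))
    return merges

therapies = {"A": 2, "B": 5, "C": 9, "D": 10, "E": 15}
print(single_link(therapies))
# [('CD', 1), ('AB', 3), ('ABCD', 4), ('ABCDE', 5)]
```

The join distances 1, 3, 4, and 5 are the heights at which the branches of the dendrogram meet.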
Single Chain Agglomerative
Clustering Output: Dendrogram
[Figure: dendrogram over items A, B, C, D, and E, with joins at heights 1, 3, 4, and 5.]
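Ward’s Clustering
Ward’s Cluster: Join item to the cluster that gives the smallest error sum of squares (ESS).
In this case, if the star is joined to the Left cluster, ESS = A^2 + B^2 + C^2 + D^2, where A, B, C, and D are the distances from each point to the mean location of the points in the proposed cluster.
[Figure: scatter plot with a Left and a Right cluster; axes are Water Resistance (Importance) and Strength (Importance). A dot marks the mean location of the points in the proposed cluster.]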
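As a rough illustration of the ESS rule, the sketch below computes the ESS of a proposed cluster and picks the cluster whose ESS grows least when the item is added, which is equivalent to minimizing the total ESS over all clusters. The coordinates and the names `ess` and `ward_choice` are hypothetical, not taken from the slides.

```python
def ess(points):
    """Error sum of squares: squared distance from each point to the cluster mean."""
    n = len(points)
    dims = len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(dims)]
    return sum((p[i] - mean[i]) ** 2 for p in points for i in range(dims))

def ward_choice(item, clusters):
    """Name of the cluster whose ESS increases least when the item joins it."""
    def increase(name):
        members = clusters[name]
        return ess(members + [item]) - ess(members)
    return min(clusters, key=increase)

# Hypothetical (water resistance, strength) importance scores.
clusters = {
    "Left": [(1.0, 1.0), (2.0, 3.0), (3.0, 1.0)],
    "Right": [(8.0, 2.0), (9.0, 4.0)],
}
star = (4.0, 2.0)

print(ess(clusters["Left"] + [star]))  # 7.75, the sum of four squared distances
print(ward_choice(star, clusters))     # Left
```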
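Ward’s Minimum Variance
Agglomerative Clustering Procedure

First Stage (part-worth value of each therapy):
A=2  B=5  C=9  D=10  E=15

Second Stage (ESS if each pair is joined):
AB=4.5  AC=24.5  AD=32.0  AE=84.5
BC=8.0  BD=12.5  BE=50.0
CD=0.5  CE=18.0
DE=12.5
Joining C and D gives the smallest ESS (CD=0.5).

Third Stage (total ESS over all clusters, including the 0.5 from CD):
CDA=38.0  CDB=14.0  CDE=20.67
AB=5.0  AE=85.0  BE=50.5
Joining A and B gives the smallest total (AB=5.0).

Fourth Stage:
ABE=93.17  CDE=25.17  ABCD=41.0
Joining E to CD gives the smallest total (CDE=25.17).

Fifth Stage:
ABCDE=98.8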
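The same stages can be checked with a small Python sketch that, at each step, joins the pair of clusters giving the smallest total ESS over all clusters. The function names and the rounding to two decimals are illustrative assumptions.

```python
from itertools import combinations

def ess(vals):
    """Error sum of squares of a 1-D cluster."""
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals)

def ward(values):
    """Ward-style agglomeration; returns (merged name, total ESS) per join."""
    clusters = {name: [v] for name, v in values.items()}
    merges = []
    while len(clusters) > 1:
        def total_ess(pair):
            # ESS of the proposed merged cluster plus ESS of every other cluster.
            a, b = pair
            rest = sum(ess(c) for n, c in clusters.items() if n not in pair)
            return rest + ess(clusters[a] + clusters[b])
        a, b = min(combinations(clusters, 2), key=total_ess)
        total = total_ess((a, b))
        name = "".join(sorted(a + b))
        clusters[name] = clusters.pop(a) + clusters.pop(b)
        merges.append((name, round(total, 2)))
    return merges

therapies = {"A": 2, "B": 5, "C": 9, "D": 10, "E": 15}
print(ward(therapies))
# [('CD', 0.5), ('AB', 5.0), ('CDE', 25.17), ('ABCDE', 98.8)]
```

Unlike single link, which attaches E last, this criterion joins E to CD before the two remaining clusters are merged.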
Ward’s Minimum Variance
Agglomerative Clustering Output
[Figure: dendrogram over items A, B, C, D, and E, with joins at ESS values 0.5, 5, 25.17, and 98.8.]
k-Means Clustering
1. Begin with two starting center points and allocate each item to the nearest cluster center.
2. Recalculate the center of each cluster. Stop if no center has changed.
3. Allocate each item to the nearest cluster center. Go to 2.
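The three steps can be sketched in Python for one-dimensional data. The example reuses the five part-worth values from the earlier slides purely for illustration, and the two starting centers are an arbitrary assumption; the function name `k_means` is not from the slides.

```python
def k_means(points, centers):
    """k-means on 1-D data; returns (final centers, cluster index of each point)."""
    while True:
        # Allocate each item to the nearest cluster center.
        labels = [min(range(len(centers)), key=lambda k: abs(p - centers[k]))
                  for p in points]
        # Recalculate the center of each cluster (an empty cluster keeps its center).
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_centers.append(sum(members) / len(members) if members else old)
        # Stop if no center has changed; otherwise reallocate with the new centers.
        if new_centers == centers:
            return centers, labels
        centers = new_centers

centers, labels = k_means([2, 5, 9, 10, 15], [2.0, 5.0])
print(labels)  # [0, 0, 1, 1, 1]
```

With these starting centers the run settles after three passes with centers 3.5 and about 11.33. A different choice of starting centers can lead to a different final partition.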
[Figure: k-Means Clustering, five numbered panels (1 to 5) showing items labeled A or B according to their cluster.]
SPSS TwoStep Cluster Method
- A scalable cluster analysis algorithm designed to handle very large data sets.
- Can handle both continuous and categorical variables or attributes.
- Automatically selects the number of clusters.
Step 1: Pre-cluster the cases (or records) into many small sub-clusters.
Step 2: Cluster the sub-clusters resulting from the pre-cluster step into the desired number of clusters.
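The two-pass structure can be illustrated with a toy Python sketch. This is not SPSS's algorithm: it handles only one continuous variable, uses a simple distance threshold for pre-clustering, and takes the final number of clusters `k` as an input instead of selecting it automatically. All names, data, and parameter values are hypothetical.

```python
def pre_cluster(values, threshold):
    """Step 1: one pass over the data; a value joins the nearest sub-cluster if
    that sub-cluster's mean is within `threshold`, else it starts a new one."""
    subs = []  # each sub-cluster is stored compactly as [count, sum]
    for v in values:
        best = min(subs, key=lambda s: abs(v - s[1] / s[0]), default=None)
        if best is not None and abs(v - best[1] / best[0]) <= threshold:
            best[0] += 1
            best[1] += v
        else:
            subs.append([1, v])
    return subs

def merge_sub_clusters(subs, k):
    """Step 2: merge the two sub-clusters with the closest means until k remain."""
    subs = [list(s) for s in subs]
    while len(subs) > k:
        pairs = ((i, j) for i in range(len(subs)) for j in range(i + 1, len(subs)))
        i, j = min(pairs, key=lambda ij: abs(subs[ij[0]][1] / subs[ij[0]][0]
                                             - subs[ij[1]][1] / subs[ij[1]][0]))
        subs[i] = [subs[i][0] + subs[j][0], subs[i][1] + subs[j][1]]
        del subs[j]
    return [(n, total / n) for n, total in subs]  # (size, mean) per cluster

data = [1.0, 1.2, 0.9, 5.0, 5.3, 9.8, 10.1, 10.0, 5.1]
subs = pre_cluster(data, threshold=0.5)
print(len(subs))                    # 3 sub-clusters
print(merge_sub_clusters(subs, 2))  # two clusters of sizes 6 and 3
```

Because the first pass keeps only a count and a sum per sub-cluster, the second pass works on a handful of summaries instead of every record, which is the idea behind scaling to very large data sets.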