
Cubic Clustering Criterion (CCC)
SAS Technical Report #108
n = number of observations
n_k = number of observations in cluster k
p = number of variables
q = number of clusters
X = n×p data matrix
M = q×p matrix of cluster means
Z = n×q cluster indicator matrix (z_ik = 1 if observation i is in cluster k, 0 otherwise)
Assume each variable has mean 0.
Z’Z = diag(n_1, ..., n_q), M = (Z’Z)^{-1} Z’X
SS(total) matrix = T = X’X
SS(between clusters) matrix = B = M’ Z’Z M
SS(within clusters) matrix = W = T – B
R^2 = 1 – trace(W)/trace(T)
(trace = sum of diagonal elements)
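
The definitions above can be checked numerically. The following is a minimal sketch using NumPy; the function name cluster_r2 and the toy data are assumptions made for illustration and do not come from the report.

```python
import numpy as np

def cluster_r2(X, labels):
    """R^2 = 1 - trace(W)/trace(T) for a hard clustering of the rows of X."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                 # enforce the mean-0 assumption
    labels = np.asarray(labels)
    clusters = np.unique(labels)           # the q distinct cluster ids
    Z = (labels[:, None] == clusters[None, :]).astype(float)  # n x q indicator
    ZtZ = Z.T @ Z                          # diag(n_1, ..., n_q)
    M = np.linalg.solve(ZtZ, Z.T @ X)      # q x p matrix of cluster means
    T = X.T @ X                            # SS(total)
    B = M.T @ ZtZ @ M                      # SS(between clusters)
    W = T - B                              # SS(within clusters)
    return 1.0 - np.trace(W) / np.trace(T), (Z, M, T, B, W)

# Hypothetical toy data: two tight groups of two points each.
X = np.array([[0.0, 0.1], [0.2, -0.1], [5.0, 5.1], [5.2, 4.9]])
labels = np.array([0, 0, 1, 1])
r2, (Z, M, T, B, W) = cluster_r2(X, labels)
print(round(r2, 4))
```

Because the two groups are far apart relative to their spread, almost all of trace(T) is between-cluster variation and R^2 is close to 1.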
Stack the columns of X into one long column of length n·p.
Regress it (no intercept) on the Kronecker product of the p×p identity matrix with Z, an (n·p)×(p·q) design matrix.
Compute R^2 for this regression – it is the same R^2.
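
The equivalence of the two R^2 values can be verified on a small example. This is a sketch under stated assumptions: the data are randomly generated for illustration, the design matrix is built as kron(I_p, Z) to match column stacking, and the regression has no intercept (each variable already has mean 0).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 12, 3, 4
labels = np.arange(n) % q                       # every cluster is non-empty
X = rng.normal(size=(n, p)) + labels[:, None]   # hypothetical toy data
X = X - X.mean(axis=0)                          # each variable has mean 0

# Trace form of R^2.
Z = np.eye(q)[labels]                           # n x q cluster indicator
M = np.linalg.solve(Z.T @ Z, Z.T @ X)           # q x p cluster means
W = X.T @ X - M.T @ (Z.T @ Z) @ M               # W = T - B
r2_trace = 1.0 - np.trace(W) / np.trace(X.T @ X)

# Regression form: stacked columns of X regressed on kron(I_p, Z).
y = X.reshape(-1, order="F")                    # column stacking, length n*p
D = np.kron(np.eye(p), Z)                       # (n*p) x (p*q) design matrix
beta, *_ = np.linalg.lstsq(D, y, rcond=None)
resid = y - D @ beta
r2_reg = 1.0 - (resid @ resid) / (y @ y)

print(np.isclose(r2_trace, r2_reg))
```

The fitted coefficients are the cluster means stacked column by column, so the residual sum of squares equals trace(W) and the total sum of squares equals trace(T).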
The CCC idea is to compare the R^2 you get for a given set of clusters with the R^2 you
would get by clustering a uniformly distributed set of points in p-dimensional space.
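
The comparison can be illustrated by simulation. The sketch below is not the formula from the technical report: it swaps in a plain Monte Carlo reference, a basic k-means written for this example, and a uniform box matching the observed range of each variable. All names and data here are assumptions made for illustration.

```python
import numpy as np

def r2(X, labels):
    """1 - (within-cluster SS) / (total SS) about the overall mean."""
    X = X - X.mean(axis=0)
    total = (X ** 2).sum()
    within = sum(((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
                 for k in np.unique(labels))
    return 1.0 - within / total

def kmeans_labels(X, q, rng, n_iter=50):
    """Basic Lloyd iterations (a stand-in; any clustering method could be used)."""
    centers = X[rng.choice(len(X), size=q, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(q):
            if np.any(labels == k):          # keep the old center if a cluster empties
                centers[k] = X[labels == k].mean(axis=0)
    return labels

rng = np.random.default_rng(1)
n, p, q = 200, 2, 2
# Hypothetical data: two well-separated Gaussian groups.
X = np.vstack([rng.normal(0.0, 0.3, (n // 2, p)),
               rng.normal(3.0, 0.3, (n // 2, p))])
r2_data = r2(X, kmeans_labels(X, q, rng))

# Reference: average R^2 from clustering uniform points in the bounding box of X.
lo, hi = X.min(axis=0), X.max(axis=0)
r2_unif = np.mean([r2(U, kmeans_labels(U, q, rng))
                   for U in (rng.uniform(lo, hi, size=(n, p)) for _ in range(20))])

print(r2_data > r2_unif)
```

An R^2 well above the uniform reference suggests real cluster structure; an R^2 near or below it suggests the clusters may be no better than an arbitrary partition of structureless data.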