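Cubic Clustering Criterion (CCC)
SAS Technical Report #108

Notation:
$n$ = number of observations
$n_k$ = number of observations in cluster $k$
$p$ = number of variables
$q$ = number of clusters
$X$ = $n \times p$ data matrix
$M$ = $q \times p$ matrix of cluster means
$Z$ = $n \times q$ cluster indicator matrix ($z_{ik} = 1$ if observation $i$ is in cluster $k$, otherwise 0)

Assume each variable has mean 0. Then $Z^\top Z = \operatorname{diag}(n_1, \ldots, n_q)$ and $M = (Z^\top Z)^{-1} Z^\top X$.

SS(total) matrix: $T = X^\top X$
SS(between clusters) matrix: $B = M^\top Z^\top Z M$
SS(within clusters) matrix: $W = T - B$
$R^2 = 1 - \operatorname{tr}(W) / \operatorname{tr}(T)$, where tr (trace) is the sum of the diagonal elements.

The same $R^2$ comes out of a single univariate regression: stack the columns of $X$ into one long $np \times 1$ column $\operatorname{vec}(X)$ and regress it on $I_p \otimes Z$, the Kronecker product of the $p \times p$ identity matrix with $Z$ (equivalently, stack the rows of $X$ and regress on $Z \otimes I_p$). The $R^2$ of this regression equals the $R^2$ above.

The CCC idea is to compare the $R^2$ you get for a given set of clusters with the $R^2$ you would expect from clustering a uniformly distributed set of points in $p$-dimensional space into the same number of clusters.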
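As a numerical check of the identities above, here is a minimal NumPy sketch (an illustration, not SAS's implementation). For a given partition it builds $Z$, $M$, $T$, $B$ and $W$, computes $R^2$ from the traces, and then recomputes it by regressing the stacked data on the Kronecker-product design in both stacking orders. The function name `cluster_r2` is hypothetical. The sketch covers only the observed $R^2$; the expected $R^2$ under the uniform reference distribution, which the CCC compares against, is not computed here.

```python
import numpy as np


def cluster_r2(X, labels):
    """Hypothetical helper: R^2 of a partition, computed three ways.

    Returns (r2_trace, r2_col_stack, r2_row_stack), which should agree.
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                 # notes assume every variable has mean 0
    n, p = X.shape
    labels = np.asarray(labels)
    ks = np.unique(labels)                 # every listed cluster is nonempty
    # Z: n x q indicator matrix, z_ik = 1 if observation i is in cluster k
    Z = (labels[:, None] == ks[None, :]).astype(float)

    ZtZ = Z.T @ Z                          # diag(n_1, ..., n_q)
    M = np.linalg.solve(ZtZ, Z.T @ X)      # q x p matrix of cluster means
    T = X.T @ X                            # SS(total)
    B = M.T @ ZtZ @ M                      # SS(between clusters)
    W = T - B                              # SS(within clusters)
    r2_trace = 1.0 - np.trace(W) / np.trace(T)

    def regression_r2(y, D):
        # Least-squares fit of y on design D. y has mean 0 here, so y @ y
        # is the total sum of squares and equals trace(T).
        coef, *_ = np.linalg.lstsq(D, y, rcond=None)
        resid = y - D @ coef
        return 1.0 - (resid @ resid) / (y @ y)

    # Column stacking vec(X) pairs with the design I_p kron Z
    r2_col = regression_r2(X.reshape(-1, order="F"), np.kron(np.eye(p), Z))
    # Row stacking pairs with the design Z kron I_p
    r2_row = regression_r2(X.reshape(-1, order="C"), np.kron(Z, np.eye(p)))
    return r2_trace, r2_col, r2_row


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 3))
    labels = rng.integers(0, 4, size=30)
    print(cluster_r2(X, labels))           # three (numerically) equal values
```

The two regressions agree because the residual matrix $E = X - ZM$ satisfies $E^\top E = X^\top X - M^\top Z^\top Z M = W$, so the residual sum of squares is $\operatorname{tr}(W)$ in either stacking order. Building the Kronecker design explicitly costs $O(n p \cdot p q)$ memory, so it is meant for checking the identity on small data, not for production use.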