Cubic Clustering Criterion (CCC) Overview

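Cubic Clustering Criterion (CCC), SAS Technical Report #108

Here I'll change the notation from the technical report to show the relationship between CCC and typical regression computations.

$n$ = number of observations
$n_k$ = number of observations in cluster $k$
$p$ = number of variables
$q$ = number of clusters
$Y$ = $n \times p$ data matrix
$M$ = $q \times p$ matrix of cluster means
$X$ = $n \times q$ cluster indicator matrix ($x_{ik} = 1$ if observation $i$ is in cluster $k$, 0 otherwise)

Assume each variable has mean 0 (center the data). Then

$X'X = \operatorname{diag}(n_1, \dots, n_q)$ and $\hat{\beta} = (X'X)^{-1}X'Y$, so $\hat{\beta} = M$, the matrix of cluster means.
SS(total) matrix (uncorrected): $T = Y'Y$
SS(between clusters) matrix (uncorrected): $B = \hat{\beta}'X'X\hat{\beta}$
SS(within clusters) matrix: $W = T - B$
$R^2 = 1 - \operatorname{trace}(W)/\operatorname{trace}(T)$ (trace = sum of diagonal elements)

Equivalently, stack the columns of $Y$ into one long column $\operatorname{vec}(Y)$ of length $np$ and regress it on the Kronecker product of the $p \times p$ identity matrix with $X$, i.e. the block-diagonal design $I_p \otimes X$. The $R^2$ for this regression is the same $R^2$: its residual sum of squares is $\operatorname{trace}(W)$, and because the data are centered its total sum of squares is $\operatorname{trace}(T)$.

The CCC idea is to compare the $R^2$ you get for a given set of clusters with the $R^2$ you would get by clustering a uniformly distributed set of points in $p$-dimensional space.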
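A minimal NumPy sketch of these computations, under stated assumptions: it builds the indicator matrix X, computes T, B, W and R² from the matrix formulas, and checks that regressing the stacked columns of Y on the block-diagonal design I_p ⊗ X (the ordering that matches column stacking) gives the same R². The helper names (`indicator_matrix`, `cluster_r2`, `stacked_regression_r2`, `kmeans_labels`), the three-blob data set, and the use of plain k-means as the clustering method are all hypothetical choices for illustration. The last few lines are only a crude Monte Carlo illustration of the CCC idea: the uniform reference sample is drawn from the data's bounding box (an assumption), and the result is not the CCC statistic defined in the technical report.

```python
import numpy as np

def indicator_matrix(labels):
    """n x q cluster indicator X with x_ik = 1 if obs. i is in cluster k.

    Labels are recoded to 0..q-1 so empty clusters are dropped and
    X'X = diag(n_1, ..., n_q) is invertible.
    """
    _, codes = np.unique(labels, return_inverse=True)
    X = np.zeros((codes.size, codes.max() + 1))
    X[np.arange(codes.size), codes] = 1.0
    return X

def cluster_r2(Y, labels):
    """R^2 = 1 - trace(W)/trace(T), via the matrix formulas."""
    Y = Y - Y.mean(axis=0)                      # center each variable
    X = indicator_matrix(labels)
    XtX = X.T @ X                               # diag(n_1, ..., n_q)
    beta_hat = np.linalg.solve(XtX, X.T @ Y)    # q x p cluster means (= M)
    T = Y.T @ Y                                 # SS(total), uncorrected
    B = beta_hat.T @ XtX @ beta_hat             # SS(between clusters)
    W = T - B                                   # SS(within clusters)
    return 1.0 - np.trace(W) / np.trace(T)

def stacked_regression_r2(Y, labels):
    """Same R^2 from one regression of vec(Y) on I_p kron X."""
    Y = Y - Y.mean(axis=0)
    p = Y.shape[1]
    X = indicator_matrix(labels)
    y = Y.reshape(-1, order="F")                # stack the columns of Y
    Z = np.kron(np.eye(p), X)                   # block-diagonal design, np x pq
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return 1.0 - (resid @ resid) / (y @ y)      # y is centered, so y'y = trace(T)

def kmeans_labels(Y, q, n_iter=50, seed=0):
    """Plain Lloyd's k-means; a hypothetical stand-in clustering method."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=q, replace=False)]
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(q):
            if np.any(labels == k):             # leave empty clusters' centers alone
                centers[k] = Y[labels == k].mean(axis=0)
    return labels

rng = np.random.default_rng(1)
q, size = 3, 100
# Hypothetical data: three well-separated Gaussian blobs in p = 2 dimensions.
blob_means = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
Y = np.vstack([rng.normal(m, 1.0, size=(size, 2)) for m in blob_means])
labels = np.repeat(np.arange(q), size)          # the "given set of clusters"
r2_obs = cluster_r2(Y, labels)

# Reference R^2: cluster uniform points with the same n and p.
# Assumption: uniform over the data's bounding box, chosen only for simplicity.
U = rng.uniform(Y.min(axis=0), Y.max(axis=0), size=Y.shape)
r2_unif = cluster_r2(U, kmeans_labels(U, q))
print(f"observed R^2 = {r2_obs:.3f}, uniform reference R^2 = {r2_unif:.3f}")
```

With well-separated clusters the observed R² should sit well above the uniform reference; the two functions `cluster_r2` and `stacked_regression_r2` should agree to floating-point precision for any labeling, which is the regression equivalence described in the text.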
