Quality of Clusterings
• Two metrics:
  – SSE (sum of squared errors)
  – Dissimilarity Ratio

Computing SSE
• Save clusters. Two new columns are created: Cluster and Distance.
• Create a new formula column. Name it dist-sqr and define it as Distance^2.
• Run Analyze > Distribution for dist-sqr. Take the mean and multiply by N (the number of rows) to obtain the SSE.

Computing Dissimilarity Ratio
• Dissimilarity ratio = (inter-cluster distance / intra-cluster distance)
• Inter-cluster distance is the smallest distance between centroids
• Normalize centroid coordinates:
  – Coordinates are given in the cluster output
  – Find the mean and std dev for each dimension from the histogram (distribution) output
  – Normalize each centroid coordinate:
    • (x - mean) / std dev
  – Compute the distance between each pair of centroids:
    • $d = \sqrt{\sum_i (x_i - y_i)^2}$, where x and y are the normalized coordinates of the two centroids
    • Inter-cluster distance is given by the smallest of the normalized centroid distances

Dissimilarity Ratio – cont.
• Intra-cluster distance is given by the average of the clusters' max distances.
• The max dist of each cluster is found in the cluster output in JMP.
• Compute the dissimilarity ratio (DR) for each clustering.
• The higher the DR, the better the clustering.
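
The two computations above can also be checked outside JMP. The following is a minimal Python sketch, not the JMP procedure itself: the function names and the sample numbers are hypothetical, and it assumes the per-cluster max distances are reported on the same (normalized) scale as the centroid distances.

```python
import math


def sse(distances):
    """SSE: sum of squared point-to-centroid distances.

    Equivalent to mean(dist-sqr) * N from the Distribution output.
    """
    n = len(distances)
    mean_sq = sum(d * d for d in distances) / n
    return mean_sq * n


def normalize(centroid, means, stds):
    """Normalize each centroid coordinate as (x - mean) / std dev."""
    return [(x - m) / s for x, m, s in zip(centroid, means, stds)]


def euclidean(a, b):
    """Euclidean distance d = sqrt(sum_i (x_i - y_i)^2)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dissimilarity_ratio(centroids, means, stds, max_dists):
    """DR = smallest normalized centroid distance / average max dist.

    Assumption: max_dists (one per cluster) are already expressed in
    the same normalized units as the centroid distances.
    """
    normed = [normalize(c, means, stds) for c in centroids]
    inter = min(
        euclidean(normed[i], normed[j])
        for i in range(len(normed))
        for j in range(i + 1, len(normed))
    )
    intra = sum(max_dists) / len(max_dists)
    return inter / intra


# Hypothetical values, for illustration only.
example_sse = sse([1.0, 2.0, 2.0])
example_dr = dissimilarity_ratio(
    centroids=[[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]],
    means=[0.0, 0.0],
    stds=[1.0, 1.0],
    max_dists=[2.0, 2.0, 3.5],
)
```

To compare clusterings (for example, different numbers of clusters), one would compute the SSE and DR for each and prefer the clustering with the higher DR, as stated above.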