Dr. Michael R. Hyman
• Also called classification analysis and numerical taxonomy
• Goal: assign objects to groups so that intra-group similarity and inter-group dissimilarity are maximized
• No distinction between dependent and independent variables
• Find naturally occurring groupings of objects
2
• Benefit segmentation
• Finding market niches
• Finding homogeneous market segments for future study
• Data reduction
3
Clusters Formed by Using Data on Two Characteristics
4
Scatter Plot of Income and Education Data for PC Owners and Non-owners
5
Procedure #1: Divisive (tear down)
• Start with profile data
• Find variable with highest variance
• Split objects above and below mean on this variable
• Find the remaining variable with the highest variance and split each group at its mean
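The divisive steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_on_highest_variance`, the `used` argument, and the four made-up profiles are not from the slides, and variance is computed on raw (unstandardized) values.

```python
from statistics import mean, pvariance

def split_on_highest_variance(profiles, used=()):
    """Split objects at the mean of the unused variable with the highest variance."""
    n_vars = len(profiles[0])
    candidates = [j for j in range(n_vars) if j not in used]
    # Pick the variable whose values vary most across the objects.
    j = max(candidates, key=lambda k: pvariance([p[k] for p in profiles]))
    cut = mean(p[j] for p in profiles)
    above = [p for p in profiles if p[j] > cut]
    below = [p for p in profiles if p[j] <= cut]
    return j, above, below

# Hypothetical profile data: (income in $000, years of education)
profiles = [(20, 12), (25, 16), (80, 12), (90, 16)]

# First split uses the highest-variance variable (income here).
var_index, high, low = split_on_highest_variance(profiles)

# Second split of one subgroup uses the remaining variable (education).
next_index, high_hi, high_lo = split_on_highest_variance(high, used=(var_index,))
```

Because the raw variance depends on the unit of measurement, income dominates the first split here simply because its numbers are larger.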
8
• Select similarity measure
– Distance (Euclidean, city block)
– Correlation
– Similarity
• Search similarity matrix for most similar cluster pair
• Merge that pair into one cluster and update the similarity matrix
• Repeat until only one cluster remains
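A minimal sketch of the build-up loop described above, assuming Euclidean distance as the similarity measure and nearest-neighbor distance between clusters. The function name `agglomerate` and the sample points are illustrative, and the pairwise search is deliberately naive rather than efficient.

```python
from math import dist  # Euclidean distance, Python 3.8+

def agglomerate(points):
    """Repeatedly merge the closest pair of clusters; return the merge history."""
    clusters = [[p] for p in points]  # every object starts as its own cluster
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Assumed rule: distance between the two nearest members.
                d = min(dist(p, q) for p in clusters[a] for q in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return history

# Hypothetical objects measured on two characteristics
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
for left, right, d in agglomerate(points):
    print(left, "+", right, "at distance", round(d, 2))
```

The distances recorded in the merge history are the raw material for a dendrogram: a large jump between successive merge distances suggests that two dissimilar clusters were forced together.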
9
Commonly Used Similarity Coefficients
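The table from this slide is not recoverable here, so the following sketch only illustrates the three measures named earlier in the deck: Euclidean distance, city-block distance, and correlation. The function names and the two sample profiles are assumptions made for illustration.

```python
from math import sqrt

def euclidean(x, y):
    """Straight-line distance between two profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def city_block(x, y):
    """Sum of absolute differences across variables."""
    return sum(abs(a - b) for a, b in zip(x, y))

def correlation(x, y):
    """Pearson correlation between two profiles (undefined for a constant profile)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Two hypothetical profiles with the same shape but different levels
a = (1, 2, 3)
b = (4, 6, 8)
print(euclidean(a, b), city_block(a, b), correlation(a, b))
```

The two profiles are far apart by either distance measure yet perfectly correlated, which shows why the choice of similarity measure can change which objects end up clustered together.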
10
• Theory and practice
• Distance at which clusters combine
• Within/between group variance
• Relative sizes of clusters
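The within/between group variance criterion can be made concrete with sums of squares. The sketch below is one possible formulation, not necessarily the one used in the lecture; the helper names and the sample clusters are hypothetical.

```python
def centroid(points):
    """Mean of each variable across the points."""
    n = len(points)
    return tuple(sum(p[j] for p in points) / n for j in range(len(points[0])))

def sum_sq(points, center):
    """Sum of squared Euclidean distances from the points to a center."""
    return sum(sum((x - c) ** 2 for x, c in zip(p, center)) for p in points)

def within_between(clusters):
    """Return (within-group, between-group) sums of squares for a partition."""
    all_points = [p for c in clusters for p in c]
    total = sum_sq(all_points, centroid(all_points))
    within = sum(sum_sq(c, centroid(c)) for c in clusters)
    return within, total - within

# The same four hypothetical objects as a two-cluster and a one-cluster solution
two = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
one = [[(0, 0), (0, 2), (10, 0), (10, 2)]]
print(within_between(two))
print(within_between(one))
```

Comparing these two quantities across candidate solutions shows how much homogeneity is lost each time the number of clusters is reduced by one.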
11
• Single linkage (nearest neighbor)
– Makes long, thin clusters
• Complete linkage (distance to farthest neighbor)
– Sensitive to outliers
• Average linkage (mean distance between objects in the two clusters)
• Variance methods (minimum within-cluster variance)
• Nodal (begin with two least similar objects as nodes)
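The first three rules in this list differ only in how the distance between two clusters is summarized. A minimal sketch, assuming Euclidean distance between objects; the function names and sample clusters are illustrative, and the variance and nodal methods are not shown.

```python
from math import dist  # Euclidean distance, Python 3.8+
from statistics import mean

def pair_distances(c1, c2):
    """All object-to-object distances between two clusters."""
    return [dist(p, q) for p in c1 for q in c2]

def single_linkage(c1, c2):
    return min(pair_distances(c1, c2))   # nearest neighbor

def complete_linkage(c1, c2):
    return max(pair_distances(c1, c2))   # farthest neighbor

def average_linkage(c1, c2):
    return mean(pair_distances(c1, c2))  # mean over all pairs

# Two hypothetical clusters
c1 = [(0, 0), (0, 2)]
c2 = [(3, 0), (7, 0)]
print(single_linkage(c1, c2), complete_linkage(c1, c2), average_linkage(c1, c2))
```

Single linkage needs only one close pair to join two clusters, which is why it tends to chain objects together, while complete linkage is driven by the single most distant pair and so reacts strongly to an outlier.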
12
Procedure #2: Agglomerative
Reliability and Validity Assessment
• Use different distance measures
• Use different clustering methods
• Split data, run both halves, and compare
• Shuffle cases (objects)
• Solve with subset of profile variables
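Each of these checks ends with a comparison of two cluster solutions for the same objects. The slides do not name an agreement measure, so the sketch below assumes the Rand index (the share of object pairs that both solutions treat the same way) as one reasonable choice; the label lists are made up.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Share of object pairs on which two partitions agree (together or apart)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Hypothetical cluster labels for six objects from two runs,
# e.g. two different distance measures or clustering methods
run_1 = [0, 0, 1, 1, 2, 2]
run_2 = [1, 1, 0, 0, 2, 0]
print(rand_index(run_1, run_2))
```

The measure ignores the label values themselves, so two runs that number the same clusters differently still count as agreeing.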
15
• Early assignments treated as permanent
– Precludes later revision for improved fit
• Number of clusters
– More clusters means greater intra-group homogeneity but less descriptive power
• No good measure of cluster compactness
• Lack of statistical properties makes inference difficult
16
• Coping with inter-correlated profile variables
• Must select profile variables that can discriminate among objects
• Sensitive to unit of measurement and outliers
– Fix: Standardize data and delete outliers
• Subjective interpretation of results (i.e., naming clusters)
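The standardization fix mentioned above is usually done by converting each profile variable to z-scores. A minimal sketch, assuming the population standard deviation and made-up income and education values; the function name `standardize` is illustrative.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Convert every variable (column) to z-scores: mean 0, standard deviation 1."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    sds = [pstdev(c) for c in columns]  # a constant column would give 0 here
    return [
        tuple((x - m) / s for x, m, s in zip(row, means, sds))
        for row in rows
    ]

# Hypothetical profiles: (income in dollars, years of education)
rows = [(20000, 12), (30000, 16), (40000, 14)]
z = standardize(rows)
for row in z:
    print(tuple(round(v, 2) for v in row))
```

Without this step, income in dollars would swamp education in any distance calculation purely because of its unit of measurement.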
17
Steps for Conducting a Cluster Analysis: A Summary
18