Cluster Analysis Dr. Michael R. Hyman

Introduction

• Also called classification analysis and numerical taxonomy

• Goal: assign objects to groups so that intra-group similarity and inter-group dissimilarity are maximized

• No (in)dependent variables

• Find naturally occurring groupings of objects

Uses in Studying Consumers

• Benefit segmentation

• Finding market niches

• Finding homogeneous market segments for future study

• Data reduction

Clusters Formed by Using Data on Two Characteristics

Scatter Plot of Income and Education Data for PC Owners and Non-owners

Procedure #1: Divisive (tear down)

• Start with profile data

• Find variable with highest variance

• Split objects above and below mean on this variable

• Find the remaining variable with the highest variance and split at its mean
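The divisive steps above can be sketched as follows. This is a minimal illustration, not the slide author's implementation: the function name `split_on_highest_variance`, the `used` parameter, and the income/education profile data are all hypothetical, and the sketch assumes the second split is made within each subgroup.

```python
from statistics import mean, pvariance

def split_on_highest_variance(profiles, used=()):
    """Split objects at the mean of the highest-variance variable not yet used."""
    n_vars = len(profiles[0])
    candidates = [k for k in range(n_vars) if k not in used]
    # Pick the unused variable (column index) with the highest variance
    j = max(candidates, key=lambda k: pvariance([row[k] for row in profiles]))
    cut = mean(row[j] for row in profiles)
    below = [row for row in profiles if row[j] <= cut]
    above = [row for row in profiles if row[j] > cut]
    return j, below, above

# Hypothetical profile data: (income in $1000s, years of education)
profiles = [(20, 12), (25, 16), (80, 12), (90, 16)]

# First split: income has the highest variance, so it is used first
j, low, high = split_on_highest_variance(profiles)

# Second split (assumption: applied within a subgroup, on the remaining variable)
j2, low_a, low_b = split_on_highest_variance(low, used=(j,))
```

In practice the variables would usually be standardized first, since raw variances depend on the unit of measurement.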

Procedure #2: Agglomerative (build up)

• Select similarity measure

– Distance (Euclidean, city block)

– Correlation

– Similarity

• Search similarity matrix for most similar cluster pair

• Merge that pair and recompute its similarity to the remaining clusters

• Repeat iteratively until only one cluster remains
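The build-up procedure above can be sketched in a few lines. This is a naive illustration under stated assumptions: the function names and the four sample points are hypothetical, and single linkage (distance between the closest members) is assumed as the rule for comparing clusters, which the slide does not specify at this point.

```python
def euclidean(a, b):
    """Straight-line distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def city_block(a, b):
    """City-block (Manhattan) distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def agglomerate(points, dist=euclidean):
    """Merge the closest pair of clusters until one remains; return the merge history."""
    clusters = [[i] for i in range(len(points))]  # start with one cluster per object
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage (an assumption): closest pair of members
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return history

# Hypothetical data: two tight pairs of objects far from each other
points = [(0, 0), (0, 1), (5, 5), (6, 5)]
history = agglomerate(points)
for left, right, d in history:
    print(left, right, round(d, 2))
```

Passing `dist=city_block` reruns the same procedure with the other distance measure named above. The nested search is cubic or worse in the number of objects, so it is suitable only for small illustrations.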

Commonly Used Similarity Coefficients

Procedure #2: Agglomerative

Stopping Rules

• Theory and practice

• Distance at which clusters combine

• Within/between group variance

• Relative sizes of clusters
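One way to apply the "distance at which clusters combine" rule is to stop just before the largest jump in merge distance. The sketch below assumes that heuristic; the function name `suggest_cluster_count` and the merge distances are hypothetical.

```python
def suggest_cluster_count(merge_distances):
    """Given merge distances in merge order, stop before the largest jump."""
    n_objects = len(merge_distances) + 1  # n objects need n - 1 merges
    jumps = [later - earlier
             for earlier, later in zip(merge_distances, merge_distances[1:])]
    biggest = max(range(len(jumps)), key=jumps.__getitem__)
    # Keep merges 0..biggest; each merge reduces the cluster count by one
    return n_objects - (biggest + 1)

# Hypothetical merge distances from an agglomerative run on five objects
merge_distances = [1.0, 1.2, 1.5, 6.4]
print(suggest_cluster_count(merge_distances))  # → 2
```

The heuristic is only a guide; as the first bullet notes, theory and practical usefulness also bear on the final choice.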

Procedure #2: Agglomerative

Linkage Methods

• Single (nearest neighbor)

– Makes long, thin clusters

• Complete (maximum distance to farthest neighbor)

– Sensitive to outliers

• Average (average distance between objects)

• Variance methods (minimum within-cluster variance)

• Nodal (begin with two least similar objects as nodes)
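The first three linkage rules differ only in how the object-to-object distances between two clusters are summarized. A minimal sketch, using hypothetical one-dimensional values and hypothetical function names:

```python
from itertools import product

def pair_distances(cluster_a, cluster_b):
    """All object-to-object distances between two clusters (1-D values)."""
    return [abs(a - b) for a, b in product(cluster_a, cluster_b)]

def single_link(a, b):
    """Nearest neighbor: smallest pairwise distance."""
    return min(pair_distances(a, b))

def complete_link(a, b):
    """Farthest neighbor: largest pairwise distance."""
    return max(pair_distances(a, b))

def average_link(a, b):
    """Mean of all pairwise distances."""
    d = pair_distances(a, b)
    return sum(d) / len(d)

# Hypothetical clusters; the value 9 acts as an outlier in the second one
a, b = [1, 2], [4, 9]
print(single_link(a, b), complete_link(a, b), average_link(a, b))  # → 2 8 5.0
```

The single outlying value raises the complete-linkage distance from 3 to 8 but leaves the single-linkage distance unchanged, which illustrates the outlier sensitivity noted above.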

Procedure #2: Agglomerative

Reliability and Validity Assessment

• Use different distance measures

• Use different clustering methods

• Split data, run both halves, and compare

• Shuffle cases (objects)

• Solve with subset of profile variables
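Each check above ends with a comparison of two cluster solutions for the same objects. The slides do not name an agreement measure; the sketch below assumes the Rand index (the share of object pairs that both solutions treat the same way), and the label lists are hypothetical.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Share of object pairs on which two cluster solutions agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    # A pair agrees if both solutions put it together, or both put it apart
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)

# Hypothetical cluster labels for five objects from two different runs
run_1 = [0, 0, 1, 1, 2]
run_2 = [1, 1, 0, 0, 0]
print(rand_index(run_1, run_2))  # → 0.8
```

The measure ignores the label names themselves, so two runs that number their clusters differently can still agree perfectly.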

General Problems

• Early assignments treated as permanent

– Precludes later revision for improved fit

• Number of clusters

– More clusters means greater intra-group homogeneity but less descriptive power

• No good measure of cluster compactness

• Lack of statistical properties makes inference difficult

General Problems (cont.)

• Coping with inter-correlated profile variables

• Must select profile variables that can discriminate among objects

• Sensitive to unit of measurement and outliers

– Fix: Standardize data and delete outliers

• Subjective interpretation of results (i.e., naming clusters)
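The standardization fix listed above can be sketched as a z-score transform, so that each profile variable has mean 0 and standard deviation 1 before distances are computed. The function name and data are hypothetical, and the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Convert each column to z-scores (mean 0, standard deviation 1)."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    sds = [pstdev(c) for c in columns]  # a zero-variance column would divide by zero
    return [tuple((x - m) / s for x, m, s in zip(row, means, sds))
            for row in rows]

# Hypothetical profiles: (income in dollars, years of education)
rows = [(20000, 12), (40000, 16), (60000, 14)]
z = standardize(rows)
for row in z:
    print(tuple(round(v, 2) for v in row))
```

Without this step, income differences measured in dollars would swamp education differences measured in years in any distance calculation.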

Steps for Conducting a Cluster Analysis: A Summary
