Amalgamation of Partitions from Multiple Segmentation Bases:
A Comparison of Non-Model-Based and Model-Based Methods
Rick L. Andrews [1], Michael J. Brusco [2], Imran S. Currim [3, 4]
Lerner College of Business and Economics, Newark, Delaware 19716
College of Business, Florida State University, Tallahassee, Florida 32306-1110
Paul Merage School of Business, University of California, Irvine, California 92697-3125
July 15, 2008
[1] Tel: +1 302 831 1190 andrews@lerner.udel.edu
[2] Tel: +1 850 644 6512 mbrusco@cob.fsu.edu
[3] Tel: +1 949 824 8368 Fax: +1 949 725 2826 iscurrim@uci.edu (Corresponding author)
[4] The authors are listed in alphabetical order
Abstract
The segmentation of customers on multiple bases is a pervasive problem in marketing
research. For example, segmentation service providers partition customers using a variety of
demographic and psychographic characteristics, as well as an array of consumption
attributes such as brand loyalty, switching behavior, and product/service satisfaction.
Unfortunately, the partitions obtained from multiple bases are often not in good agreement with
one another, making effective segmentation a difficult managerial task. Therefore, the
construction of segments using multiple independent bases often results in a need to establish a
partition that represents an amalgamation or consensus of the individual partitions. In this paper,
we compare three methods for finding a consensus partition. The first two methods are
deterministic, do not use a statistical model in the development of the consensus partition, and
are representative of methods used in commercial settings, whereas the third method is based on
finite mixture modeling. In a large-scale simulation experiment, the finite mixture model yielded
better average recovery of holdout (validation) partitions than its non-model-based competitors.
This result calls for important changes in the current practice of segmentation service providers
that group customers for a variety of managerial goals related to the design and marketing of
products and services.
Keywords: Marketing; Clustering; Market segmentation; Consensus partition; Finite mixture
models
1. Introduction
Market segmentation has maintained a venerable position in the marketing research
literature since the ground-breaking work of Wendell Smith (1956) more than one-half century
ago. The pragmatic benefits of segmentation have been described by Frank et al. (1972), Wind
(1978), and Punj and Stewart (1983), among others. Advancements in segmentation
methodology have proliferated in the marketing and classification literatures, and an excellent
treatment of these developments is provided by Wedel and Kamakura (2000). Because
segmentation is a challenging task of considerable importance in industry today, a large number
of segmentation service providers, such as Claritas, NPD/Crest, Simmons, MediaMark, and Polk
Automotive, have developed a variety of segmentation based products and services for various
product and service categories such as retail, restaurant, real estate, financial services, health
care, automotive, telecommunications, internet, cable and satellite services, energy, media and ad
agencies, travel, and not-for-profit organizations. Claritas also provides
segmentation services on nearly all marketing databases from leading providers such as
ACNielsen, Gallup, IRI, JD Power, MediaMark, Nielsen Media Research, NFO, NPD, Polk
Automotive, Scarborough and Simmons, and nearly all major direct mail list providers,
consumer marketing surveys and audience measurement systems.
One of the most important distinguishing aspects of state-of-the-art segmentation
procedures is the presence or absence of a statistical model. Traditional clustering procedures,
such as Ward’s (1963) hierarchical method or K-means partitioning algorithms (MacQueen,
1967), which are widely used in commercial settings, take the data “as is” and do not posit any
statistical model. Hereafter, we refer to methods of this type as “non-model-based.”
Contrastingly, some academic literature advocates finite mixture models as a preferred approach
to clustering because of the provision of a formal statistical model (e.g., Banfield and Raftery,
1993; McLachlan and Peel, 2000). Mixture model approaches have received considerable
attention in the marketing research literature (Dillon and Kumar, 1994; Wedel and Kamakura,
2000); however, their relative incorporation into marketing practice is limited at best. It is likely
that some of the hesitancy for practitioners to adopt model-based approaches to clustering may
stem from an insufficient knowledge of their performance relative to their non-model-based
competitors. Extensive comparisons of model-based and non-model-based methods are slowly
emerging in the literature (Andrews et al., 2008; Steinley and Brusco, 2008) and should
ultimately provide marketing research analysts with information as to precise conditions under
which model-based approaches are apt to be preferred. For example, the study by Andrews et al.
(2008) compares model-based and non-model-based procedures for segmenting consumers
jointly and simultaneously on two sets of bases, corresponding to households’ responses to
product and marketing mix strategies and their demographic characteristics. They found that if
the manager’s primary purpose is to forecast responses to product and marketing mix variables
for a new sample of consumers for whom only demographics are available, model-based and
non-model-based procedures perform equally well. On the other hand, if developing an
understanding of the true segmentation structure in a market is important, as is often the case for
design and marketing of products, the model-based procedure is clearly preferred.
Another critical factor for characterizing segmentation problems is the nature of the data
measurements. The most common non-model-based procedures (e.g., K-means), as well as
methods based on mixtures of normal distributions, are designed for data that are at least
interval-scaled. Although clustering based on metric data is certainly important in marketing
research, so are segmentation problems based on categorical data (Green et al., 1988). The
viability of model-based methods for segmentation based on categorical data is also recognized
in the literature on latent class analysis (Dillon and Kumar, 1994). In contrast, despite the
flexibility of non-model-based clustering methods such as p-median algorithms (Brusco and
Köhn, 2008) and K-centroid clustering heuristics (Chaturvedi et al., 1997) for clustering data
measured on nominal or ordinal measurements, their implementation in the marketing literature
is limited.
One especially significant segmentation problem pertaining to categorical data arises
when the same set of consumers is segmented independently on two or more distinct bases.
When different partitions of customers are obtained for different segmentation bases, perhaps
with varying numbers of segments across partitions, a natural problem that arises is the
establishment of a single consensus partition that best reflects an amalgamation of the individual
partitions (Krieger and Green, 1999). [1] The potential for segmenting consumers on different
bases has been identified by a number of authors (Krieger and Green, 1996; Ramaswamy et al.,
1996; Brusco et al., 2002; Andrews and Currim, 2003a; Brusco et al., 2003). Unfortunately,
segment partitions for different bases often do not correlate well with one another. For example,
household responses to product and marketing mix characteristics often do not correlate well
with household characteristics (Wind, 1978; Green and Krieger, 1991; Gupta and Chintagunta,
1994; Brusco et al., 2002), resulting in segmentation that is ineffective.
[1] The segmentation approach of interest here is distinct from the bicriterion clustering approaches studied by
Andrews et al. (2008), in which model-based and non-model-based procedures were compared in terms of their
abilities to form a single partition on the basis of two sets of bases. The current study takes multiple partitions
resulting from the clustering of independent bases as given and compares model-based and non-model-based
procedures for forming a single consensus partition. Thus, the procedures of interest in this study could be viewed
as alternatives to bicriterion clustering methods, but with the added advantage that they can easily accommodate
problems in which different bases produce different numbers of segments.
The segmentation of households based on different bases such as household responses to
the product or marketing mix and household characteristics is pervasive across commercial
applications as well. For example, Claritas employs information on purchase behaviors and
consumer descriptor variables (e.g., demographic, lifestyle) in its PRIZM NE and other
associated segmentation systems to increase profitability through customer acquisition,
development, and retention. NPD also employs consumer behavior (e.g., shopping behavior,
purchases) and descriptor variables (e.g., demographics, lifestyle, attitudes) to reveal groupings
of consumers, help retailers better understand the behaviors and characteristics of the buyers they
have, and determine how best to build market share. Simmons gives client companies insight on
the differences and similarities between consumer segments based on their body characteristics,
similarity and differences in food preferences, and other descriptor variables (e.g., self-image,
and media usage preferences) so that clients are able to better target products and promotions to
groups of people of different body sizes. MediaMark Research employs consumer behavior
(actual and intended product usage) and descriptor variables (e.g., demographics,
psychographics, lifestyle and media) for print media (e.g., newspapers, magazines, etc.) to
support concept and product development, conduct prime prospect analysis to help expand
readership, attract new readers and meet existing reader needs. And, R. L. Polk uses a
combination of automotive preference and purchase data, and demographic and lifestyle data to
develop predictive segmentation models, and help client companies to communicate with and
retain current customers, identify in-market buyers, generate brand awareness, reach, capture
new customers, and target ‘the right people at the right time with the right offer.’ These examples
suggest that segmentation based on multiple bases is pervasive in commercial settings as well.
What options are available to marketing research analysts looking to obtain a consensus
partition from multiple independent partitions? One classic approach has deep historical roots in
the theory of voting and social choice (Borda, 1784; Condorcet, 1785), and is based on the
aggregation of preferences. Régnier (1965) is generally credited with the extension of these
principles to clustering. His approach views the set of categorical variables as a system of binary
equivalence relations and subsequently obtains the equivalence relation that is minimum distance
from the system of relations in a least squares sense. The resulting optimization problem can be
transformed into a binary integer program and is widely known as the clique partitioning
problem (Grötschel and Wakabayashi, 1989). Hereafter, we refer to this approach as the CPP.
Krieger and Green (1999) developed an approach for amalgamating partitions that is
conceptually similar to the CPP, although less mathematically tractable for exact solution
methods. Their approach, which they call SEGWAY, attempts to find a consensus partition that
maximizes a weighted function of agreement indices between the consensus partition and the set
of categorical measurements. Krieger and Green opted for Hubert and Arabie’s (1985) adjusted
Rand index (ARI) as the agreement measure, which was an excellent choice in light of its
reputation in the classification literature. For example, Steinley (2004, p. 394) provides cogent
arguments for the superiority of the ARI over classification rates, including ARI’s correction for
chance and the greater information it provides by capturing the pattern of classified observations.
The CPP and SEGWAY methods for consensus partitioning do not employ a statistical
model. They take the data measurements “as is” and optimize the appropriate objective criterion
using discrete partitioning algorithms. As an alternative to these two non-model-based
approaches, we consider latent class models (Lazarsfeld and Henry, 1968) as a third approach.
Each of the three approaches for amalgamation of partitions has its own inherent
advantages and limitations. For example, the CPP possesses the benefits of historical precedent
and the use of the ubiquitous least-squares approach to data analysis. Another desirable property
of the CPP is that it is one of the few discrete partitioning methods that automatically determines
the number of clusters via the solution process. Relative to the CPP, SEGWAY offers greater
flexibility with respect to differentially weighting the importance of partitions in the objective
function, as well as the incorporation of constraints to secure minimum ARI thresholds for each
partition. The SEGWAY algorithm is, however, more computationally demanding than the CPP
and selection of appropriate weights and thresholds can be a tedious and challenging endeavor.
Whereas the CPP and SEGWAY models are plausible discrete partitioning approaches to
the amalgamation of partitions, the latent class method (Lazarsfeld and Henry, 1968; Dillon and
Kumar, 1994) is the natural probabilistic approach. The latent class method, which falls within
the domain of finite mixture models (FMM), is predicated on the assumption of local
independence among the categorical attributes (McLachlan and Peel, 2000, Chapter 5; Grim,
2006). Although somewhat unrealistic, this assumption, which is sometimes referred to as the
naive Bayes approach, often yields very satisfactory performance when used in practice. Using
the popular EM algorithm, segment probabilities for each class of each partition are iteratively
estimated with the goal of maximizing log likelihood. Upon convergence of the algorithm, each
customer is assigned to the segment for which the membership probability is largest.
Although the CPP, SEGWAY, and FMM approaches for consensus partitioning each
have their own merits, little is known about their relative performances when applied to
segmentation data (Krieger and Green, 1999). Without such information, it is difficult to offer
recommendations to market research practitioners with respect to the best practices for
amalgamating partitions from different bases. Accordingly, our contribution is the comparison
of the consensus partitioning approaches across a broad range of data conditions. The results of
our study revealed that the FMM provided better recovery than its non-model-based competitors.
In the next section, we provide a formal description of the CPP, SEGWAY, and
FMM approaches. This is followed by the description and results of a simulation experiment
comparing the consensus partitioning methods with respect to their fit to holdout (validation)
partitions. The paper concludes with a discussion of the findings and suggestions for future
research.
2. Amalgamation methods
2.1. Binary equivalence relations and the clique partitioning problem (CPP)
Our development of the problem of aggregating binary equivalence relations and the
formulation of the CPP uses the following definitions:
N: the number of customers (or firms) that have been partitioned, indexed $1 \le n \le N$;
P: the number of partitions of the customers to amalgamate, indexed $1 \le p \le P$;
X: an $N \times P$ matrix with elements $x_{np}$ representing the class measurement for customer n
in partition p, for $1 \le n \le N$ and $1 \le p \le P$;
$E^{(p)}$: an $N \times N$ matrix that defines a binary equivalence relation on partition p, for
$1 \le p \le P$. The elements of $E^{(p)}$ are defined as $e_{nj}^{(p)} = 1$ if $x_{np} = x_{jp}$ and
$e_{nj}^{(p)} = 0$ if $x_{np} \ne x_{jp}$, for $1 \le n, j \le N$ and $1 \le p \le P$;
E: an $N \times N$ matrix defining a “median” relation for the N customers, where $e_{nj} = 1$ if
customers n and j are assigned to the same segment, 0 otherwise, for $1 \le n, j \le N$.
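The definitions above can be made concrete with a small sketch. The toy matrix `X`, the candidate labels, and the function name `equivalence_relations` are illustrative assumptions, not part of the original study (which was implemented in MATLAB); the code only shows how each column of X induces a binary equivalence relation.

```python
import numpy as np

def equivalence_relations(X):
    """Return an array of shape (P, N, N) whose slice p is E^(p):
    entry [p, n, j] is 1 if customers n and j share a class in partition p."""
    X = np.asarray(X)
    cols = X.T                                   # shape (P, N): one row per partition
    return (cols[:, :, None] == cols[:, None, :]).astype(int)

# Hypothetical toy data: N = 4 customers, P = 2 partitions.
X = np.array([[1, 1],
              [1, 2],
              [2, 2],
              [2, 2]])
E_all = equivalence_relations(X)

# A candidate consensus partition and its relation matrix E.
candidate = np.array([1, 1, 2, 2])
E = (candidate[:, None] == candidate[None, :]).astype(int)

print(E_all[1])   # relation induced by the second partition
print(E)          # relation induced by the candidate consensus
```

Each slice is reflexive and symmetric by construction, which is why only the upper triangle is needed later in the formulation.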
The goal of the problem posed by Régnier (1965) is to find the median relation (Mirkin,
1974, 1979; Barthélemy and Monjardet, 1995), E, that provides the best aggregation of the P
binary equivalence relations, $E^{(1)}, E^{(2)}, \ldots, E^{(P)}$, as measured by the least squares loss function:
$$\text{Minimize: } \sum_{p=1}^{P} \lVert E^{(p)} - E \rVert^2 = \sum_{p=1}^{P} \sum_{n=1}^{N} \sum_{j=1}^{N} \left( e_{nj}^{(p)} - e_{nj} \right)^2. \tag{1}$$
The loss function is minimized subject to constraints guaranteeing reflexivity, symmetry,
transitivity, and binary properties of the median relation, which are, respectively, enforced as
follows:
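$$e_{nn} = 1 \quad \forall\; 1 \le n \le N, \tag{2}$$
$$e_{nj} = e_{jn} \quad \forall\; 1 \le n < j \le N, \tag{3}$$
$$e_{ni} + e_{ij} - e_{nj} \le 1 \quad \forall\; 1 \le n, i, j \le N, \tag{4}$$
$$e_{nj} \in \{0, 1\} \quad \forall\; 1 \le n, j \le N. \tag{5}$$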
The linearization of (1) is accomplished via some algebra that begins with the expansion
of the quadratic expression and separation of terms that do not include the median relation:
$$\text{Minimize: } \sum_{p=1}^{P} \sum_{n=1}^{N} \sum_{j=1}^{N} e_{nj}^{(p)} + \sum_{p=1}^{P} \sum_{n=1}^{N} \sum_{j=1}^{N} \left( 1 - 2 e_{nj}^{(p)} \right) e_{nj}. \tag{6}$$
The first component of (6) does not depend on the median relation, E, and is, therefore,
removable from the objective function. This elimination, along with the incorporation of the first
summation term of the second component into the parentheses, yields the following objective
function:
$$\text{Minimize: } \sum_{n=1}^{N} \sum_{j=1}^{N} \left( P - 2 \sum_{p=1}^{P} e_{nj}^{(p)} \right) e_{nj}. \tag{7}$$
Some further simplification is possible by observing that the reflexivity property of the
equivalence relation allows the main diagonal of E to be ignored. Similarly, the symmetry
property permits restriction to the upper triangle (or lower triangle) of E. We define the decision
variables, $y_{nj} = 1$ if customers n and j are placed in the same segment of the median relation and 0
otherwise, for $1 \le n < j \le N$. Additionally, we employ the parameters,
$$c_{nj} = P - 2 \sum_{p=1}^{P} e_{nj}^{(p)}, \quad \forall\; 1 \le n < j \le N. \tag{8}$$
Using these definitions, the CPP is formulated as follows (Grötschel and Wakabayashi,
1989; Kochenberger et al., 2005):
N 1
z
Minimize:
N
c
n 1 j  n 1
Subject to:
nj
y nj ,
(9)
yni + yij – ynj  1
 1  n < i < j  N,
(10)
yni – yij + ynj  1
 1  n < i < j  N,
(11)
– yni + yij + ynj  1
 1  n < i < j  N,
(12)
 1  n < j  N.
(13)
ynj  {0, 1}
Constraints (10), (11), and (12), which are sometimes referred to as “triangle” constraints,
are required to guarantee transitivity. The exact solution of the CPP is possible for modestly
sized N using mathematical programming procedures (Grötschel and Wakabayashi, 1989;
Palubeckis, 1997; Mehrotra and Trick, 1998); however, heuristic procedures are more common,
particularly for larger problem instances. We use a heuristic relocation algorithm that is similar
in design to methods proposed by Marcotorchino and Michaud (1981) and Régnier (1965). The
steps of the algorithm are presented in the Appendix.
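To give a feel for what a relocation heuristic does, here is a minimal generic sketch. It is not the algorithm from the Appendix (which is not reproduced here); the function names, the singleton starting partition, and the random visiting order are all assumptions made for illustration. Each customer is moved to whichever segment (or new singleton) lowers objective (9), until no single move helps.

```python
import numpy as np

def cpp_cost(C, labels):
    """Objective (9): sum of c_nj over pairs n < j placed in the same segment."""
    same = labels[:, None] == labels[None, :]
    return C[np.triu(same, k=1)].sum()

def relocation_heuristic(C, seed=0, max_passes=100):
    N = C.shape[0]
    rng = np.random.default_rng(seed)
    labels = np.arange(N)                       # assumed start: all singletons
    for _ in range(max_passes):
        moved = False
        for n in rng.permutation(N):
            mask = np.ones(N, dtype=bool)
            mask[n] = False
            # Contribution of customer n to the objective in each candidate segment.
            contrib = {k: C[n, mask & (labels == k)].sum()
                       for k in np.unique(labels[mask])}
            contrib[labels.max() + 1] = 0       # option: open a new singleton segment
            current = contrib.get(labels[n], 0)
            best = min(contrib, key=contrib.get)
            if contrib[best] < current:         # strict improvement only
                labels[n] = best
                moved = True
        if not moved:
            break
    return np.unique(labels, return_inverse=True)[1]   # relabel as 0..K-1

# Hypothetical data: two groups of three customers, three partitions.
X = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 2],
              [2, 2, 2], [2, 2, 2], [2, 2, 1]])
P = X.shape[1]
E_all = (X.T[:, :, None] == X.T[:, None, :]).astype(int)
C = P - 2 * E_all.sum(axis=0)
consensus = relocation_heuristic(C)
print(consensus, cpp_cost(C, consensus))
```

Note that the number of segments is never specified: it emerges from the moves, which mirrors the property of the CPP noted in the introduction. Like any local search, a sketch of this kind can stop at a local optimum on harder instances.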
2.2. The SEGWAY algorithm
Krieger and Green (1999) presented the SEGWAY algorithm as a method for consensus
partitioning based on the ARI. To highlight the relationship between SEGWAY and the CPP,
we develop the ARI using the notation from the previous subsection. We begin with the
definition of the following measures:
N 1
1( p )  
 e
N
n 1 j  n 1
N 1
 (2p )  
 e
N
n 1 j  n 1
( p)
nj
( p)
nj

: enj( p )  enj  1 ,

: enj( p )  1 and enj  0 ,
8
 1  p  P,
(14)
 1  p  P,
(15)

N 1
( p)
3

 (1  e
N
n 1 j  n 1
N 1
 (4p )  
( p)
nj

) : enj( p )  0 and enj  1 ,
 (1  e
N
n 1 j  n 1
( p)
nj

) : enj( p )  enj  0 ,
 1  p  P,
(16)
 1  p  P,
(17)
The value of $\theta_1^{(p)}$ represents the number of customer pairs that are in the same segment in
the amalgamated partition defined by E, and also in the same segment of the partition
corresponding to $E^{(p)}$. Similarly, $\theta_4^{(p)}$ represents the number of customer pairs that are in
different segments in the amalgamated partition defined by E, and also in different segments of
the partition corresponding to $E^{(p)}$. Contrastingly, the $\theta_2^{(p)}$ and $\theta_3^{(p)}$ values reflect discordance
between the amalgamated partition and partition p. For example, $\theta_2^{(p)}$ is the number of
customer pairs that are in different segments in the amalgamated partition defined by E, but in
the same segment of the partition corresponding to $E^{(p)}$. Effectively, the objective of the
CPP approach is to minimize total discordance and, therefore, (1) could be rewritten as follows:
$$\text{Minimize: } \sum_{p=1}^{P} 2 \left( \theta_2^{(p)} + \theta_3^{(p)} \right). \tag{18}$$
The SEGWAY approach differs from the CPP by using the ARI, rather than simple
discordance between the amalgamated partitions and each of the P individual partitions. The
ARI for each partition is computed as follows:
$$\mathrm{ARI}(p \mid E) = \frac{H \left( \theta_1^{(p)} + \theta_4^{(p)} \right) - \left[ (\theta_1^{(p)} + \theta_2^{(p)})(\theta_1^{(p)} + \theta_3^{(p)}) + (\theta_3^{(p)} + \theta_4^{(p)})(\theta_2^{(p)} + \theta_4^{(p)}) \right]}{H^2 - \left[ (\theta_1^{(p)} + \theta_2^{(p)})(\theta_1^{(p)} + \theta_3^{(p)}) + (\theta_3^{(p)} + \theta_4^{(p)})(\theta_2^{(p)} + \theta_4^{(p)}) \right]}, \tag{19}$$
for $1 \le p \le P$ and where $H = N(N-1)/2$. Krieger and Green (1999) propose maximization of a
weighted function of the ARI(p | E) values, subject to constraints on minimum segment size and
minimum ARI thresholds for each of the individual partitions. To present the model, we define
$w_p$ ($1 \le p \le P$) as the ARI weights, min_size as the minimum size of each segment s ($1 \le s \le S$), and
ARI_threshold(p) ($1 \le p \le P$) as the ARI thresholds. The SEGWAY optimization problem is
posed as follows:
$$\text{Maximize: } \sum_{p=1}^{P} w_p \, \mathrm{ARI}(p \mid E) \tag{20}$$
Subject to:
$$|s| \ge \text{min\_size} \quad \forall\; 1 \le s \le S, \tag{21}$$
$$\mathrm{ARI}(p \mid E) \ge \text{ARI\_threshold}(p) \quad \forall\; 1 \le p \le P, \tag{22}$$
$$\sum_{p=1}^{P} w_p = 1. \tag{23}$$
Although the formulation of the SEGWAY model is presented with considerable flexibility, the
selection of weights and thresholds in these types of scalar optimization problems is especially
problematic (see, for example, Brusco et al., 2003). Accordingly, throughout the remainder of
this paper, ARI threshold and segment size constraints are not considered and equal ARI weights
of $w_p = 1/P$ for $1 \le p \le P$ are assumed.
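The pair counts in (14)-(17), the ARI in (19), and the equal-weight objective in (20) can be sketched as follows. The names `pair_counts`, `ari`, and `segway_objective` are illustrative assumptions, and the counts are returned as `t1` to `t4` in the order of (14)-(17). The threshold and segment-size constraints are omitted, matching the simplification described above.

```python
import numpy as np

def pair_counts(part, consensus):
    """Return the four pair counts of (14)-(17) for one partition."""
    part, consensus = np.asarray(part), np.asarray(consensus)
    iu = np.triu_indices(len(part), k=1)
    same_p = (part[:, None] == part[None, :])[iu]            # same class in partition p
    same_e = (consensus[:, None] == consensus[None, :])[iu]  # same consensus segment
    t1 = int(np.sum(same_p & same_e))
    t2 = int(np.sum(same_p & ~same_e))
    t3 = int(np.sum(~same_p & same_e))
    t4 = int(np.sum(~same_p & ~same_e))
    return t1, t2, t3, t4

def ari(part, consensus):
    """Adjusted Rand index as in (19); undefined (0/0) if both partitions are trivial."""
    t1, t2, t3, t4 = pair_counts(part, consensus)
    H = t1 + t2 + t3 + t4                                    # equals N(N-1)/2
    chance = (t1 + t2) * (t1 + t3) + (t3 + t4) * (t2 + t4)
    return (H * (t1 + t4) - chance) / (H ** 2 - chance)

def segway_objective(X, consensus, weights=None):
    """Weighted sum of ARI values, with equal weights 1/P by default."""
    X = np.asarray(X)
    P = X.shape[1]
    w = np.full(P, 1.0 / P) if weights is None else np.asarray(weights)
    return float(sum(w[p] * ari(X[:, p], consensus) for p in range(P)))

print(ari([0, 0, 1, 1], [0, 0, 1, 1]))   # identical partitions
print(ari([0, 0, 1, 1], [0, 1, 0, 1]))   # maximally crossed partitions
```

A relocation search for SEGWAY would evaluate `segway_objective` in place of the CPP cost; because the ARI is not a sum of pairwise costs, each candidate move is more expensive to score, which is consistent with the longer run times reported later.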
The SEGWAY algorithm proposed by Krieger and Green (1999) is similar in design to
the algorithm for CPP presented in the Appendix. In fact, there are only two salient differences.
The first difference is the obvious modification of the objective function to reflect weighted ARI
values rather than simple discordance. The second difference is that Krieger and Green
developed SEGWAY to be used for a pre-specified number of segments, S. However, there is no
reason that S cannot be permitted to vary during the execution of the algorithm, and our
comparison incorporates this additional flexibility.
2.3. The latent class finite mixture model (FMM)
The latent class model is a finite mixture model developed to explain the structure of a set
of multivariate categorical data (Dillon and Kumar, 1994). To describe the FMM for
amalgamation of segments, we define $\mathbf{x}_n = \{x_{np}\}$ as the vector of discrete random variables
corresponding to class memberships for the nth customer on the P partitions ($1 \le n \le N$ and
$1 \le p \le P$), where $x_{np}$ may take values $g = 1, \ldots, g_p$. The probability of observing $\mathbf{x}_n$, given that
customer n is a member of latent segment s, is
$$f(\mathbf{x}_n \mid s) = \prod_{p=1}^{P} \prod_{g=1}^{g_p} f(x_{np} = g \mid s)^{\delta_{pg}}, \tag{24}$$
where
$$\delta_{pg} = \begin{cases} 1 & \text{if } x_{np} = g \\ 0 & \text{otherwise.} \end{cases} \tag{25}$$
The constraint
$$\sum_{g=1}^{g_p} f(x_{np} = g \mid s) = 1 \tag{26}$$
is needed to satisfy the laws of probability (Dillon and Kumar, 1994), so only $g_p - 1$
probabilities need to be estimated for each p, s.
Additionally, we denote $\pi_s$ as the probability of membership in segment s for each of the
latent segments ($1 \le s \le S$), with the constraint that $\sum_{s=1}^{S} \pi_s = 1$. The latent class model is given by
$$\Pr(\mathbf{x}_n) = \sum_{s=1}^{S} \pi_s f(\mathbf{x}_n \mid s). \tag{27}$$
The goal is to estimate the $\pi_s$ and the probabilities $f(x_{np} = g \mid s)$ to maximize the
following log-likelihood function:
$$L = \frac{1}{N} \sum_{n=1}^{N} \log \left[ \sum_{s=1}^{S} \pi_s f(\mathbf{x}_n \mid s) \right]. \tag{28}$$
The total number of parameters required is $(S - 1) + S \sum_{p=1}^{P} (g_p - 1)$. The estimation process was
completed using the EM algorithm (Dempster et al., 1977). To initialize the algorithm, a random
partition of S segments was created and the $\pi_s$ and $f(x_{np} = g \mid s)$ values were obtained. The
algorithm was run until convergence and the consensus partition was produced by
assigning each customer to the latent segment for which its membership probability was greatest. We
restarted the algorithm 20 times using a different random initial partition for each restart. In
most instances, the same final log-likelihood value was obtained for each restart, indicating that
problems of local optimality are likely not relevant for our application.
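The estimation procedure just described can be sketched as follows. This is a minimal illustration, not the authors' MATLAB implementation: the function name, the convergence tolerance, the iteration cap, and the small smoothing constant that keeps probabilities away from zero are all assumptions. Classes are coded from 0, and the function returns the total log-likelihood summed over customers.

```python
import numpy as np

def fit_latent_class(X, S, n_classes, n_restarts=20, max_iter=500, tol=1e-8, seed=0):
    """EM for a latent class model. X is an (N, P) integer array with classes
    coded 0..g_p-1. Returns (loglik, pi, theta, labels) for the best restart."""
    rng = np.random.default_rng(seed)
    N, P = X.shape
    onehot = [np.eye(g)[X[:, p]] for p, g in enumerate(n_classes)]   # (N, g_p) each
    best = None
    for _ in range(n_restarts):
        # Start from a random hard partition into S segments.
        post = np.eye(S)[rng.integers(0, S, size=N)]
        prev = -np.inf
        for _ in range(max_iter):
            # M-step: segment sizes and class probabilities (smoothing is an assumption).
            pi = (post.sum(axis=0) + 1e-6) / (N + S * 1e-6)
            theta = [post.T @ oh + 1e-6 for oh in onehot]             # (S, g_p) each
            theta = [t / t.sum(axis=1, keepdims=True) for t in theta]
            # E-step: log pi_s + sum over partitions of log f(x_np | s).
            logp = np.log(pi) + sum(oh @ np.log(t).T for oh, t in zip(onehot, theta))
            m = logp.max(axis=1, keepdims=True)
            log_mix = m[:, 0] + np.log(np.exp(logp - m).sum(axis=1))
            post = np.exp(logp - log_mix[:, None])
            loglik = log_mix.sum()
            if loglik - prev < tol:
                break
            prev = loglik
        if best is None or loglik > best[0]:
            best = (loglik, pi, theta, post.argmax(axis=1))
    return best

# Hypothetical data: two perfectly separated groups of 20 customers, P = 3.
X = np.array([[0, 0, 0]] * 20 + [[1, 1, 1]] * 20)
loglik, pi, theta, labels = fit_latent_class(X, S=2, n_classes=[2, 2, 2])
print(round(loglik, 3), labels)
```

Keeping the best of several random restarts is the same safeguard against local optima that the text describes; comparing the final log-likelihoods across restarts indicates how often they agree.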
The EM algorithm for FMM requires a fixed value of S. For this reason, we ran the EM
algorithm for $2 \le S \le 8$ and used a modified version of Akaike’s (1973) information criterion
(AIC) to choose S. In particular, we used the following AIC3 criterion, which has yielded
excellent performance in two segment retention studies (Andrews and Currim 2003b, 2003c):
$$\mathrm{AIC3} = -2L + 3 \left[ (S - 1) + S \sum_{p=1}^{P} (g_p - 1) \right]. \tag{29}$$
The value of S minimizing AIC3 was selected for evaluation.
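The selection rule can be sketched in a few lines. The log-likelihood values below are invented for illustration only, the function names are assumptions, and `loglik` is taken to be the total log-likelihood summed over customers, which is the usual convention for AIC-type criteria and is an assumption here.

```python
def n_parameters(S, n_classes):
    """(S - 1) mixing proportions plus S * sum(g_p - 1) class probabilities."""
    return (S - 1) + S * sum(g - 1 for g in n_classes)

def aic3(loglik, S, n_classes):
    """AIC3: minus twice the log-likelihood plus three times the parameter count."""
    return -2.0 * loglik + 3 * n_parameters(S, n_classes)

def choose_S(logliks, n_classes):
    """logliks maps each candidate S to its maximized log-likelihood."""
    return min(logliks, key=lambda S: aic3(logliks[S], S, n_classes))

# Hypothetical fits for P = 3 partitions with 3 classes each.
n_classes = [3, 3, 3]
logliks = {2: -310.0, 3: -295.0, 4: -292.0}
for S, ll in logliks.items():
    print(S, n_parameters(S, n_classes), aic3(ll, S, n_classes))
print("selected S:", choose_S(logliks, n_classes))
```

In this made-up example the gain in fit from S = 3 to S = 4 is too small to offset the seven extra parameters, so S = 3 is retained.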
3. Simulation design
3.1. Experimental factors
Six experimental factors were manipulated to generate the datasets. These between
dataset factors were:
1. The number of latent segments (levels of S = 2 and 4)
2. The segment membership probabilities (levels of $\pi_s = 1/S$ for $1 \le s \le S$, and $\pi_1 = .6$
with $\pi_s = .4 / (S - 1)$ for $2 \le s \le S$)
3. The number of partitions to amalgamate (levels of P = 3, 4, and 5)
4. The number of classes within each partition (levels of $g_p$ = 2, 3, and 4)
5. The partition concordance parameter (levels of 2, 3, and mixed)
6. The number of customers (levels of N = 100 and 200)
A full-factorial design over these six factors produced $3^3 \times 2^3 = 216$ cells.
Three datasets were generated for each cell, resulting in a total of 648 unique datasets. The
datasets were generated using the procedure described in Appendix B of Krieger and Green’s
(1999) paper, using the concordance parameter to control the strength of association among the
partitions. The greater the value of this parameter, the greater the consistency of the partitions with the
underlying consensus partition. Also, for consistency with Krieger and Green’s study, for each
dataset two validation (holdout) partitions were generated in addition to the P partitions to be
submitted to an amalgamation process. Performance with respect to these validation partitions
provides the basis for comparing the different methods.
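The size of the design follows directly from the factor levels, as the enumeration below shows. The dictionary keys and the helper `membership_probabilities` are illustrative names; the data generation procedure itself (Appendix B of Krieger and Green, 1999) is not reproduced here.

```python
from itertools import product

factors = {
    "S": [2, 4],
    "membership": ["equal", "one_large"],
    "P": [3, 4, 5],
    "g_p": [2, 3, 4],
    "concordance": [2, 3, "mixed"],
    "N": [100, 200],
}
# One dictionary per cell of the full-factorial design.
cells = [dict(zip(factors, levels)) for levels in product(*factors.values())]
replicates = 3
n_datasets = len(cells) * replicates

def membership_probabilities(S, scheme):
    """Equal sizes, or one segment at .6 with the rest sharing .4 equally."""
    if scheme == "equal":
        return [1.0 / S] * S
    return [0.6] + [0.4 / (S - 1)] * (S - 1)

print(len(cells), n_datasets)
print(membership_probabilities(4, "one_large"))
```

With three replicates per cell, the 216 cells give the 648 datasets used in the experiment.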
The levels of the number of latent segments, the number of classes within each partition,
and the number of customers were selected based on levels selected in simulation studies
published on segmentation (e.g., Vriens et al., 1996; Andrews, Ainslie, and Currim 2002;
Andrews, Ansari and Currim 2002; Andrews and Currim 2003c). Segment membership
probabilities were chosen to correspond to segments of equal size, and alternatively, segments
comprising one large segment and one or more equally sized smaller segments. The minimum
size of the segments conformed with Andrews and Currim (2003c). The number of partitions to
be amalgamated (3-5) was chosen to allow for household purchase behavior (e.g., heavy or light
users, or loyal or switcher, etc.), demographic description (e.g., income, or employment status, or
household size, etc.), lifestyle variables (e.g., consumption of the arts or technology, or different
types of retailers, lifestyle, etc.), and other bases. When a larger number of descriptors is
available, data reduction techniques such as factor or principal component analysis are often used
to derive a smaller (e.g., 3-5) number of partitions.
3.2. Algorithms and computer implementation
The method used to amalgamate the partitions can be perceived as a within dataset factor.
For the simulation experiment, the CPP, SEGWAY, and FMM methods were applied to each of
the 648 datasets under the practical assumption that the value of S was unknown. The CPP and
SEGWAY algorithms were permitted to modify S so as to optimize their respective criteria and
the AIC3 criterion was used to determine the appropriate number of segments for FMM. For
each dataset and each method, data were collected with respect to the average agreement
(measured by ARI) between the final partition and the two validation partitions and total
computation time.
The CPP, SEGWAY, and FMM procedures were written as MATLAB (MathWorks, Inc.,
2002) m-files. All computational results were obtained by implementing these programs on a 2.4
GHz Core Duo Processor with 3GB of SDRAM.
4. Simulation results
Table 1 provides, for each level of each factor, the average recovery of the validation
partitions obtained by the CPP, SEGWAY, and FMM methods. An analysis-of-variance of these
results was also performed and the results for significant main effects and two-way interactions
are presented in Table 2.
[Insert Tables 1 and 2 About Here]
The ANOVA results in Table 2 show that all main effects are significant; however, there
is a substantial disparity among the sizes of the effects as measured by partial eta-squared ($\eta^2$).
Among the between dataset factors, the number of segments S had the greatest effect ($\eta^2 = .912$).
As shown in Table 1, all three methods exhibited much greater recovery of the validation
partitions at S = 4 relative to S = 2. However, interpreting the interaction of this effect with the
method (as we do below) probably produces more useful insights since segmentation problems
in general become more difficult as the number of segments increases. [2] The second and third
largest main effects corresponded to the concordance factor ($\eta^2 = .758$) and the number of
classes per partition ($\eta^2 = .676$), respectively. Recovery performances of each of the three
methods improved markedly as the number of classes per partition ($g_p$) increased, [3] and as
expected segment recovery was strongest for a concordance level of 3 and weakest for a level of 2. [4]
Although clearly not as important as S, $g_p$, and the concordance parameter, the number of partitions
in the amalgamation process possessed the fourth largest effect size ($\eta^2 = .298$). The CPP,
SEGWAY, and FMM procedures each realized improved recovery performance as P increased. [5]
The mixing proportions ($\eta^2 = .241$) and the number of customers ($\eta^2 = .009$) had the smallest
between dataset effects on recovery of the validation partitions. As observed from Table 1, for
each of the three amalgamation methods, the difference between the means for equal segment
membership probabilities and 60% probability for the largest segment was quite small. Similarly,
there were only minor differences in the means for 100 versus 200 customers.
[2] The original Rand statistic approaches its upper limit of one as the number of segments increases. The Adjusted
Rand Index (ARI) was created to overcome this limitation (Steinley 2004), but it is not clear whether this tendency
toward one could still be present in the current simulation experiment. Later, we compute the values of ARI
between methods and find the same pattern of results for the number of segments S, which indicates that the statistic
itself increases as the number of segments increases. Krieger and Green (1999) did not vary the number of segments
in the study that introduced SEGWAY.
Table 2 shows that the within dataset factor, amalgamation method, had a statistically
significant effect ($\eta^2 = .138$). The FMM procedure clearly outperformed its non-model-based
competitors. The overall recovery performances for FMM, SEGWAY, and CPP were
.4989, .4716, and .4482, respectively. Moreover, the FMM procedure outperformed its
competitors at nearly all factor levels, lagging slightly behind SEGWAY and CPP only when $g_p = 2$.
Fourteen of the 21 two-way interactions were statistically significant at the α = .05 level.
What is especially interesting is that the second and fourth largest interaction effects involve
the within-dataset factor, the amalgamation method. The second largest interaction effect (η²
= .103) occurred between the amalgamation method and the number of classes within the
partitions (gp). Table 1 is helpful for uncovering the details of this effect. At gp= 2, there is
minimal disparity among the average recovery performances of the three methods. The average
ARI values are .3720, .3747, and .3683 for SEGWAY, CPP, and FMM, respectively. At gp= 3,
the relative performance of FMM improves markedly, whereas CPP’s diminishes. The average
ARI values are .4949, .4659, and .5270 for SEGWAY, CPP, and FMM, respectively. The
separation of the methods is even more pronounced at gp= 4, where the average ARI values are
.5480, .5039, and .6014 for SEGWAY, CPP, and FMM, respectively. What is unequivocally
clear from this interaction effect is that the relative superiority of FMM grows markedly as gp
increases.
³ This main effect could be reflecting a tendency of the ARI to increase as the number of classes within partitions
increases.
⁴ Increases in consistency of partitions result in improved segment recovery.
The fourth largest interaction effect (η² = .066) is between S and the amalgamation
method. Once again, Table 1 provides evidence as to the nature of this effect. At S = 2, FMM’s
average recovery of .3471 exceeds SEGWAY’s (.3080) by .0391 and CPP’s (.2629) by .0842.
In contrast, at S = 4, although FMM still holds the advantage, it does so by a much smaller
margin. The average ARI values are .6353, .6335, and .6508 for SEGWAY,
CPP, and FMM, respectively.
The FMM procedure exhibited the greatest efficiency, requiring an average of 16.30
seconds. The CPP algorithm was also efficient, yielding an average computation time of 20.36
seconds, whereas the SEGWAY algorithm averaged 89.52 seconds.
The results in Tables 1 and 2 provide compelling evidence that there are clear differences
in the recovery performances of the three amalgamation methods. In light of the superior
recovery performance of FMM, a natural question is: Do the differences in recovery
translate into salient differences in the partitions obtained by the three methods? To answer this
question, we analyze two additional outputs of the simulation study: (a) the average number of
segments used by each of the three amalgamation methods, and (b) the average pairwise
agreement (as measured by ARI) between the partitions produced by the amalgamation methods.
Table 3 reports the average number of segments determined by each of the three
amalgamation methods. Consistent with previous research (Andrews and Currim, 2003a,
2003b), FMM is conservative in its use of segments, averaging 2.06 across the test problems.
The SEGWAY and CPP procedures are much more aggressive, averaging 5.56 and 6.92
segments, respectively. In some instances, these non-model-based methods generated 10 or
more segments, some of which contain only a single customer. The explanation for this finding
is that the non-model-based methods can often realize a modest improvement in their objective
functions by peeling off one (or possibly several) customers from a sizable segment and placing
those customers in a newly created segment. Contrastingly, through its use of the AIC3 criterion,
FMM is resistant to opening new segments because of the parameter penalty and, therefore,
avoids the overfitting problems that are inherent in the non-model-based methods.
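To illustrate how the parameter penalty discourages additional segments, the following Python sketch selects S by minimizing AIC3 = -2 ln L + 3k, where k is the number of free parameters. Two things here are assumptions made for illustration and are not taken from the study: the parameter count, which assumes a latent class model with S - 1 free mixing proportions and gp - 1 free class probabilities per partition in each segment, and the log-likelihood values, which are hypothetical numbers rather than output from any fitted model.

```python
def n_free_parameters(n_segments, classes_per_partition):
    """Free parameters of an assumed latent class model:
    (S - 1) mixing proportions plus S * sum_p (g_p - 1) class probabilities."""
    return (n_segments - 1) + n_segments * sum(g - 1 for g in classes_per_partition)


def aic3(log_likelihood, n_params):
    """AIC3 criterion: -2 ln L + 3 k (smaller is better)."""
    return -2.0 * log_likelihood + 3.0 * n_params


def select_n_segments(log_likelihoods, classes_per_partition):
    """Return the S with the smallest AIC3, together with all AIC3 scores."""
    scores = {
        s: aic3(ll, n_free_parameters(s, classes_per_partition))
        for s, ll in log_likelihoods.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical maximized log-likelihoods for S = 1..4 (illustrative values only),
# for P = 3 partitions with g_p = 3 classes each.
hypothetical_loglik = {1: -330.0, 2: -300.0, 3: -295.0, 4: -293.0}
best_s, aic3_scores = select_n_segments(hypothetical_loglik, [3, 3, 3])
print(best_s, aic3_scores)
```

In this illustration the log-likelihood keeps improving as S grows, but beyond S = 2 the gain is smaller than the penalty of 3 per added parameter, so the criterion stops at two segments.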
[Insert Table 3 About Here]
Table 4 presents the average ARIs between pairs of partitions produced by SEGWAY,
CPP, and FMM. The two non-model-based methods exhibit a strong concordance. The average
ARI between the SEGWAY and CPP partitions is .8900. Moreover, the two non-model-based
methods obtain ‘perfect agreement’ (ARI = 1) for 36.88% of the test problems in the experimental
study. The FMM procedure exhibits somewhat less agreement with its non-model-based
competitors, yielding average ARI values of .7906 and .7276 with SEGWAY and CPP,
respectively. Perfect agreement between the FMM and SEGWAY partitions was realized for
only 11.26% of the datasets, and for 2.31% of the datasets the ARI was less than 0.2. Thus, we
conclude that the actual segmentation results produced by model-based and non-model-based
amalgamation methods are quite different and would produce very different segmentation and
marketing strategies if implemented in real-world applications.
⁵ Segment recovery is better when more data rather than less is used.
[Insert Table 4 About Here]
5. Discussion, conclusions, and implications for current practice
With respect to non-model-based procedures for establishing a consensus partition, our
results indicate that CPP and SEGWAY both tend to overfit the data, using more segments than
necessary. This problem seems to be slightly more severe for CPP, which experienced a greater
increase in the selected number of segments and a more pronounced degradation in recovery of
the validation partitions.
The FMM provided better recovery of the validation partitions than either of its
non-model-based competitors. Whereas CPP and SEGWAY are apt to overfit the data, using more
segments than necessary, the FMM showed a remarkable ability to more conservatively choose S
and yield better performance in holdout samples. We believe that part of FMM’s success is
attributable to the use of the AIC3 criterion for model selection, which has performed well in
several previous segmentation studies (Andrews and Currim, 2003b, 2003c).
One of the inherent limitations of any simulation study is the inability to generalize the
results beyond the ranges of the factors tested. We selected the factor levels for S, gp, and N
based on simulation-based studies on segmentation in the published literature. The levels for the
number of customers (N = 100 and N = 200) also enabled the Monte Carlo simulation study to be
conducted within a reasonable time frame. The SEGWAY algorithm and, to a lesser extent, the
CPP procedure can require considerable computation time for N ≥ 300. Although the levels of N
are apt to be below what might be encountered in applications, it is important to observe that this
factor had the smallest effect on cluster recovery in both experiments. Further, while
segmentation service providers usually describe a large number of segments, managers of their
customer firms will rarely consider more than four segments for differentiated product design or
marketing decisions. Consequently, segmentation service providers usually link the larger
number of more homogeneous segments to a smaller number of more basic segments which are
more heterogeneous.
Another decision that was made when conducting our experiment pertained to the
measure of partition recovery. We selected Hubert and Arabie’s (1985) adjusted Rand index as
the measure of agreement between the consensus partition and the validation partitions. This
decision was based on the well-recognized properties of the ARI in the classification literature
(Steinley, 2004), as well as its importance in market segmentation research (Helsen and Green,
1991; Carmone et al., 1999). Simulation studies have shown that ARI is the most desirable
index for measuring segment recovery (Steinley, 2004).
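For readers who wish to reproduce the agreement measure, the Hubert and Arabie (1985) adjusted Rand index can be computed from the contingency table of two label vectors. The following Python sketch is a straightforward illustration of that formula; the function name and the handling of degenerate cases are our own choices, not the code used in the simulation study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two partitions of the same objects."""
    if len(labels_a) != len(labels_b):
        raise ValueError("Both partitions must label the same objects.")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two objects: treated as trivial agreement

    # Pairs placed together in both partitions (cells of the contingency table).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition (row and column margins).
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_rows * sum_cols / total_pairs   # agreement expected by chance
    max_index = 0.5 * (sum_rows + sum_cols)        # maximum attainable agreement
    if max_index == expected:
        return 1.0  # degenerate case, e.g., both partitions have a single class
    return (sum_cells - expected) / (max_index - expected)


# Same grouping with different labels: perfect agreement.
ari_same = adjusted_rand_index([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])
# Two partitions that cut across each other: agreement below chance.
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(ari_same, ari_crossed)
```

The index equals one only when the two partitions are identical up to relabeling, and it can fall below zero when agreement is worse than expected by chance.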
The caveats of our experiment notwithstanding, we believe that our simulation study
offers compelling information for segmentation practice. As reviewed in the introduction section
of this paper, segmentation is pervasive in commercial settings today, as a precursor for a variety
of managerial decisions, across product categories and industries, products and services, and
for-profit and not-for-profit organizations. Today, non-model-based approaches dominate
applications in commercial settings, including segmentation service providers. We recommend
that analysts in commercial settings consider the finite mixture approach as a viable method for
amalgamating a set of partitions established on different bases. Moreover, the mixture model
approach should be used in conjunction with the AIC3 index as the criterion for determining the
number of segments. This combination produced better results than CPP and SEGWAY even
when these latter two methods were implemented using the true number of segments used to
generate the synthetic data. Thus, the benefit of using FMM extends beyond better resolution of
the segment retention problem.
Finally, as marketing research analysts seek new approaches to the problem of
amalgamating partitions obtained from different bases, they should monitor research streams in
other disciplines. Most notable among these is the field of pattern recognition and machine
learning, where a consensus clustering is often referred to as an ensemble (Strehl and Ghosh,
2002; Topchy et al., 2005). Topchy et al., for example, highlight important advantages of
clustering ensembles for data mining, which include robustness, novelty, stability and confidence
estimation, and parallelization and scalability.
Appendix
The iterative relocation algorithm for the CPP requires as input an initial partition of the
customers into S segments. We denote such a partition as {A_1, A_2, ..., A_S}, where A_s is the set
of customer indices assigned to segment s and N_s = |A_s| is the number of customers in segment
s for 1 ≤ s ≤ S. The steps of the relocation algorithm are as follows:
Step 1. Set Δ* = 0 and i = 0.
Step 2. Set i = i + 1, s = 0, and define s′ : i ∈ A_s′.
Step 3. Set s = s + 1. If s > S + 1, then go to Step 6; otherwise go to Step 4.
Step 4. If s = s′, then go to Step 3; otherwise go to Step 5.
Step 5. Compute the change in the objective function, Δ, that occurs from moving customer i
from segment s′ to s. If Δ < Δ*, then set Δ* = Δ, i* = i, s** = s′, and s* = s. Go to Step 3.
Step 6. If i < N, then go to Step 2.
Step 7. If Δ* = 0, then STOP.
Step 8. Set A_s** = A_s** − {i*}, A_s* = A_s* ∪ {i*}, N_s** = N_s** − 1, and N_s* = N_s* + 1.
Step 9. If s* = S + 1, then S = S + 1 (i.e., a new segment has been created).
Step 10. If N_s** = 0, then S = S − 1 and all segment labels greater than s** are reduced by one as
the number of segments has been reduced.
Step 11. Go to Step 1.
On convergence, the resulting partition is guaranteed to be a local optimum with respect
to all possible relocations of a customer index to a different segment; however, a global optimum
is not guaranteed. For this reason, we restart the algorithm 20 times using a different initial
partition for each restart.
We have described the relocation algorithm within the context of the CPP; however,
assuming fixed S, the algorithm is structurally the same as the SEGWAY method developed by
Krieger and Green (1999). The only difference is that SEGWAY has a maximization objective
function, thus necessitating a slight change in Step 5. We note that although Krieger and Green
(1999) presented SEGWAY within the context of fixed S, there is no reason the algorithm cannot
be run while permitting S to vary.
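The relocation steps above can be sketched in Python as follows. The objective used here is an assumption made for illustration: each pair of customers receives a weight equal to the number of input partitions that separate them minus the number that place them together, and the algorithm minimizes the sum of weights over pairs assigned to the same segment. The function names are hypothetical, and the sketch performs a single run from one initial partition; it omits the restarts described above.

```python
from itertools import combinations


def pair_weights(partitions):
    """w[i][j] = (# partitions separating i and j) - (# partitions joining them)."""
    n = len(partitions[0])
    w = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        together = sum(p[i] == p[j] for p in partitions)
        w[i][j] = w[j][i] = (len(partitions) - together) - together
    return w


def relocate(initial_labels, w):
    """Best-improvement relocation; the number of segments may grow or shrink."""
    labels = list(initial_labels)
    n = len(labels)
    while True:
        segments = sorted(set(labels))
        new_label = max(segments) + 1        # candidate new segment (Step 9)
        best_delta, best_move = 0, None      # Step 1
        for i in range(n):                   # Steps 2 and 6: loop over customers
            current = labels[i]
            cost_out = sum(w[i][j] for j in range(n)
                           if j != i and labels[j] == current)
            for s in segments + [new_label]:  # Step 3: loop over target segments
                if s == current:              # Step 4: skip the current segment
                    continue
                cost_in = sum(w[i][j] for j in range(n) if labels[j] == s)
                delta = cost_in - cost_out    # Step 5: change in the objective
                if delta < best_delta:
                    best_delta, best_move = delta, (i, s)
        if best_move is None:                # Step 7: no improving move remains
            return labels
        i, s = best_move
        labels[i] = s                        # Step 8; emptied segments disappear (Step 10)


# Three illustrative partitions of six customers, one per segmentation basis.
bases = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
]
consensus = relocate([0, 1, 0, 1, 0, 1], pair_weights(bases))
print(consensus)
```

Because every accepted move strictly lowers the objective, the loop terminates, but only at a local optimum; in practice the run would be repeated from several initial partitions and the best result retained. For a maximization objective such as SEGWAY's, the comparison in Step 5 would be reversed.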
References
Akaike, H., 1973. Information theory and an extension of the maximum likelihood principle. In
B. N. Petrov and B. F. Csaki (Eds.) Second International Symposium on Information
Theory, Budapest: Academiai Kiado, pp. 267-281.
Andrews, R. L., Ainslie, A., Currim, I. S., 2002. An empirical comparison of logit choice models
with discrete versus continuous representations of heterogeneity. Journal of Marketing
Research, 39 (November), 479-487.
Andrews, R. L., Ansari, A., Currim, I. S., 2002. Hierarchical Bayes versus finite mixture conjoint
analysis models: a comparison of fit, prediction, and partworth recovery. Journal of
Marketing Research, 39 (February), 87-98.
Andrews, R. L., Brusco, M. J., Currim, I. S., 2008. A comparison of methods for bicriterion
clustering problems: Are there benefits from having a statistical model? Working paper,
University of Delaware.
Andrews, R. L., Currim, I. S., 2003a. Recovering and profiling the true segmentation structure in
markets: An empirical investigation. International Journal of Research in Marketing, 20,
177-192.
Andrews, R. L., Currim, I. S., 2003b. Retention of latent segments in regression-based marketing
models. International Journal of Research in Marketing, 20, 315-321.
Andrews, R. L., Currim, I. S., 2003c. A comparison of segment retention criteria for finite
mixture logit models. Journal of Marketing Research, 40 (May), 235-243.
Banfield, J. D., Raftery, A. E., 1993. Model-based Gaussian and non-Gaussian clustering.
Biometrics, 49, 803-821.
Barthélemy, J.-P., Monjardet, B., 1995. The median procedure for partitions. DIMACS Series in
Discrete Mathematics and Theoretical Computer Science, 19, 3-34.
Borda, J. C., 1784. Mémoire sur les élections au scrutin. Histoire de l’Académie royale des
Sciences pour 1781. Paris.
Brusco, M. J., Cradit, J. D., Stahl, S., 2002. A simulated annealing heuristic for a bicriterion
partitioning problem in market segmentation. Journal of Marketing Research, 39
(February), 99-109.
Brusco, M. J., Cradit, J. D., Tashchian, A., 2003. Multicriterion clusterwise regression for joint
segmentation settings: An application to customer value. Journal of Marketing Research,
40 (May), 225-234.
Brusco, M. J., Köhn, H.-F., 2008. Optimal partitioning of a data set based on the p-median
model. Psychometrika, 73 (March), 89-105.
Carmone, F. J., Kara, A., Maxwell, S., 1999. HINoV: A new model to improve market
segmentation by identifying noisy variables. Journal of Marketing Research, 36
(November), 501-509.
Chaturvedi, A., Carroll, J. D., Green, P. E., Rotondo, J. A., 1997. A feature-based approach to
market segmentation via overlapping K-centroids clustering. Journal of Marketing
Research, 34 (August), 370-377.
Condorcet, M. J. A. N. Caritat, marquis de, 1785. Essai sur l’application de l’analyse à la
probabilité des décisions rendues à la pluralité des voix. Paris.
Dempster, A. P., Laird, N. M., Rubin, D. B., 1977. Maximum likelihood from incomplete data
via the EM algorithm (with discussion). Journal of the Royal Statistical Society B, 39, 1-38.
Dillon, W. R., Kumar, A., 1994. Latent structure and other mixture models in marketing: An
integrative survey and overview. In R. P. Bagozzi (Ed.), Advanced Methods of
Marketing Research, Oxford: Blackwell, pp. 295-351.
Frank, R. E., Massy, W. F., Wind, Y., 1972. Market Segmentation. Englewood Cliffs, NJ:
Prentice-Hall.
Green, P. E., Krieger, A. M., 1991. Segmenting markets with conjoint analysis. Journal of
Marketing, 55, 20-31.
Green, P. E., Schaffer, C. M., Patterson, K. M., 1988. A reduced-space approach to the clustering
of categorical data in market segmentation. Journal of the Market Research Society, 30,
267-288.
Grim, J., 2006. EM cluster analysis for categorical data. In D.-Y. Yeung, J. T. Kwok, A. L. N.
Fred, F. Roll, D. de Ridder (Eds.), Structural, Syntactic, and Statistical Pattern
Recognition, Berlin: Springer, pp. 640-648.
Grötschel, M., Wakabayashi, Y., 1989. A cutting plane algorithm for a clustering problem.
Mathematical Programming, 45, 59-96.
Gupta, S., Chintagunta, P. K., 1994. On using demographic variables to determine segment
membership in logit mixture models. Journal of Marketing Research, 31, 128-136.
Helsen, K., Green, P. E., 1991. A computational study of replicated clustering with an
application to market segmentation. Decision Sciences, 22 (5), 1124-1141.
Hubert, L. J., Arabie, P., 1985. Comparing partitions. Journal of Classification, 2 (2), 193-218.
Kochenberger, G., Glover, F., Alidaee, B., Wang, H., 2005. Clustering of microarray data via
clique partitioning. Journal of Combinatorial Optimization, 10, 77-92.
Krieger, A. M., Green, P. E., 1996. Modifying cluster-based segments to enhance agreement
with an exogenous response variable. Journal of Marketing Research, 33 (August), 351-363.
Krieger, A. M., Green, P. E., 1999. A generalized Rand-index method for consensus clustering
of separate partitions of the same data base. Journal of Classification, 16 (1), 63-89.
Lazarsfeld, P. F., Henry, N., 1968. Latent structure analysis. Boston: Houghton-Mifflin.
MacQueen, J. B., 1967. Some methods for classification and analysis of multivariate
observations. In L. M. Le Cam and J. Neyman (Eds.), Proceedings of the fifth Berkeley
symposium on mathematical statistics and probability (Vol. 1), Berkeley, CA: University
of California Press, pp. 281-297.
Marcotorchino, F., Michaud, P., 1981. Heuristic approach of the similarity aggregation problem.
Methods of Operations Research, 43, 395-404.
MathWorks, Inc., 2002. Using MATLAB (version 6). Natick: The MathWorks, Inc.
McLachlan, G., Peel, D., 2000. Finite mixture models. New York: Wiley.
Mehrotra, A., Trick, M., 1998. Cliques and clustering: A combinatorial approach. Operations
Research Letters, 22, 1-12.
Mirkin, B. G., 1974. The problems of approximation in space of relations and qualitative data
analysis. Information and Remote Control, 35, 1424-1431.
Mirkin, B. G., 1979. Group choice. New York: Wiley.
Ramaswamy, V., Chatterjee, R., Cohen, S. H., 1996. Joint segmentation on distinct
interdependent bases with categorical data. Journal of Marketing Research, 33 (August),
335-350.
Palubeckis, G., 1997. A branch-and-bound approach using polyhedral results for a clustering
problem. INFORMS Journal on Computing, 9, 30-42.
Punj, G., Stewart, D. W., 1983. Cluster analysis in marketing research: Review and suggestions
for application. Journal of Marketing Research, 20 (May), 134-148.
Régnier, S., 1965. Sur quelques aspects mathématiques des problèmes de classification
automatique. I.C.C. Bulletin, 4, 175-191.
Smith, W., 1956. Product differentiation and market segmentation as alternative marketing
strategies. Journal of Marketing, 20, 3-8.
Steinley, D., 2004. Properties of the Hubert-Arabie adjusted Rand index. Psychological Methods,
9, 386-396.
Steinley, D., Brusco, M. J., 2008. Selection of variables in cluster analysis: An empirical
comparison of eight procedures. Psychometrika, 73 (1), 125-144.
Strehl, A., Ghosh, J., 2002. Cluster ensembles – A knowledge reuse framework for combining
multiple partitions. Journal of Machine Learning Research, 3, 583-617.
Topchy, A., Jain, A. K., Punch, W., 2005. Clustering ensembles: models of consensus and weak
partitions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27 (12), 116.
Vriens, M., Wedel, M., Wilms, T., 1996. Metric conjoint segmentation methods: a Monte Carlo
comparison. Journal of Marketing Research, 33 (February), 73-85.
Ward, J. H., 1963. Hierarchical grouping to optimize an objective function. Journal of the
American Statistical Association, 58, 236-244.
Wedel, M., Kamakura, W. A., 2000. Market Segmentation: Conceptual and Methodological
Foundations (2nd ed.). Boston, MA: Kluwer.
Wind, Y., 1978. Issues and advances in segmentation research. Journal of Marketing Research,
15, 317-338.
Table 1.
A comparison of average ARI values for the validation partitions obtained by the CPP,
SEGWAY and FMM methods.

Factor                     Factor levels                       SEGWAY    CPP      FMM
Number of Segments         S = 2                               .3080     .2629    .3471
                           S = 4                               .6353     .6335    .6508
Segment Probabilities      Equal: 1/S for each segment         .4440     .4157    .4713
                           Largest .6; others .4/(S-1) each    .4993     .4807    .5266
Number of Partitions       P = 3                               .4272     .4121    .4645
                           P = 4                               .4685     .4359    .4974
                           P = 5                               .5192     .4965    .5349
Number of Classes          gp = 2                              .3720     .3747    .3683
in each Partition          gp = 3                              .4949     .4659    .5270
                           gp = 4                              .5480     .5039    .6014
Concordance factor         2                                   .3648     .3389    .3901
                           3                                   .5862     .5692    .6123
                           2 or 3 (mixed)                      .4639     .4365    .4943
Number of Customers        N = 100                             .4671     .4435    .4932
                           N = 200                             .4762     .4529    .5047
Overall                                                        .4716     .4482    .4989
Table 2.
Analysis of variance of average ARI values for the validation partitions obtained by the CPP,
SEGWAY and FMM methods.

Source                      SS        df      MS        F            Sig(F)   Effect size (η²)
Segments (S)                54.189    1       54.189    19451.455    .000     .912
SegProb                     1.667     1       1.667     598.449      .000     .241
Partitions (P)              2.225     2       1.113     399.378      .000     .298
Classes (gp)                10.948    2       5.474     1964.916     .000     .676
Concordance                 16.419    2       8.209     2946.756     .000     .758
Customers (N)               .049      1       .049      17.529       .000     .009
Method                      .836      2       .418      149.957      .000     .138
S × SegProb                 2.016     1       2.016     723.637      .000     .278
S × P                       .090      2       .045      16.084       .000     .017
S × gp                      .525      2       .262      94.199       .000     .091
S × Concordance             .192      2       .096      34.413       .000     .035
S × N                       .026      1       .026      9.251        .002     .005
S × Method                  .373      2       .187      66.998       .000     .066
SegProb × P                 .041      2       .021      7.429        .001     .008
SegProb × gp                .080      2       .040      14.315       .000     .015
SegProb × Concordance       .034      2       .017      6.178        .002     .007
SegProb × N                 .000      1       .000      .000         .996     .000
SegProb × Method            .010      2       .005      1.844        .158     .002
P × gp                      .123      4       .031      11.074       .000     .023
P × Concordance             .079      4       .020      7.124        .000     .015
P × N                       .017      2       .009      3.133        .044     .003
P × Method                  .048      4       .012      4.281        .002     .009
gp × Concordance            .050      4       .012      4.477        .001     .009
gp × N                      .000      2       .000      .088         .916     .000
gp × Method                 .603      4       .151      54.080       .000     .103
Concordance × N             .002      2       .001      .324         .724     .000
Concordance × Method        .013      4       .003      1.187        .315     .003
N × Method                  .001      2       .000      .090         .914     .000
Error                       5.240     1881    .003
Total                       95.897    1943
Table 3.
A comparison of average number of segments obtained by the CPP, SEGWAY and FMM
methods when assuming S is unknown and determined by the method.

Factor                     Factor levels                       SEGWAY    CPP      FMM
Number of Segments         S = 2                               5.56      8.13     2.00
                           S = 4                               5.56      5.71     2.11
Segment Probabilities      Equal: 1/S for each segment         5.77      6.69     2.07
                           Largest .6; others .4/(S-1) each    5.35      7.16     2.05
Number of Partitions       P = 3                               5.27      6.21     2.03
                           P = 4                               6.06      7.66     2.08
                           P = 5                               5.35      6.89     2.06
Number of Classes          gp = 2                              2.99      2.64     2.02
in each Partition          gp = 3                              5.91      7.79     2.08
                           gp = 4                              7.78      10.33    2.07
Concordance factor         2                                   6.04      8.06     2.06
                           3                                   4.95      5.63     2.06
                           2 or 3 (mixed)                      5.69      7.06     2.06
Number of Customers        N = 100                             5.06      6.32     2.05
                           N = 200                             6.06      7.52     2.06
Overall                                                        5.56      6.92     2.06
Table 4.
A comparison of average ARI values between pairs of partitions obtained by the CPP, SEGWAY
and FMM methods when assuming S is unknown and determined by the method.

Factor                     Factor levels                       SEGWAY     SEGWAY     CPP
                                                               and CPP    and FMM    and FMM
Number of Segments         S = 2                               .8013      .6996      .5823
                           S = 4                               .9787      .8815      .8729
Segment Probabilities      Equal: 1/S for each segment         .8935      .7961      .7445
                           Largest .6; others .4/(S-1) each    .8864      .7850      .7107
Number of Partitions       P = 3                               .8926      .7201      .6789
                           P = 4                               .8634      .7990      .7131
                           P = 5                               .9139      .8526      .7909
Number of Classes          gp = 2                              .9528      .8071      .8128
in each Partition          gp = 3                              .8656      .7894      .7009
                           gp = 4                              .8515      .7752      .6692
Concordance factor         2                                   .8333      .7029      .6196
                           3                                   .9485      .8804      .8449
                           2 or 3 (mixed)                      .8880      .7884      .7184
Number of Customers        N = 100                             .8879      .7733      .7075
                           N = 200                             .8920      .8078      .7477
Overall                                                        .8900      .7906      .7276