Amalgamation of Partitions from Multiple Segmentation Bases: A Comparison of Non-Model-Based and Model-Based Methods

Rick L. Andrews[1], Michael J. Brusco[2], Imran S. Currim[3,4]

[1] Lerner College of Business and Economics, Newark, Delaware 19716. Tel: +1 302 831 1190; andrews@lerner.udel.edu
[2] College of Business, Florida State University, Tallahassee, Florida 32306-1110. Tel: +1 850 644 6512; mbrusco@cob.fsu.edu
[3] Paul Merage School of Business, University of California, Irvine, California 92697-3125. Tel: +1 949 824 8368; Fax: +1 949 725 2826; iscurrim@uci.edu
[4] Corresponding author.

The authors are listed in alphabetical order.

July 15, 2008

Abstract

The segmentation of customers on multiple bases is a pervasive problem in marketing research. For example, segmentation service providers partition customers using a variety of demographic and psychographic characteristics, as well as an array of consumption attributes such as brand loyalty, switching behavior, and product/service satisfaction. Unfortunately, the partitions obtained from multiple bases are often not in good agreement with one another, making effective segmentation a difficult managerial task. Therefore, the construction of segments using multiple independent bases often results in a need to establish a partition that represents an amalgamation, or consensus, of the individual partitions. In this paper, we compare three methods for finding a consensus partition. The first two methods are deterministic, do not use a statistical model in the development of the consensus partition, and are representative of methods used in commercial settings, whereas the third method is based on finite mixture modeling. In a large-scale simulation experiment, the finite mixture model yielded better average recovery of holdout (validation) partitions than its non-model-based competitors.
This result calls for important changes in the current practice of segmentation service providers that group customers for a variety of managerial goals related to the design and marketing of products and services.

Keywords: Marketing; Clustering; Market segmentation; Consensus partition; Finite mixture models

1. Introduction

Market segmentation has maintained a venerable position in the marketing research literature since the ground-breaking work of Wendell Smith (1956) more than one-half century ago. The pragmatic benefits of segmentation have been described by Frank et al. (1972), Wind (1978), and Punj and Stewart (1983), among others. Advancements in segmentation methodology have proliferated in the marketing and classification literatures, and an excellent treatment of these developments is provided by Wedel and Kamakura (2000). Because segmentation is a challenging task of considerable importance in industry today, a large number of segmentation service providers, such as Claritas, NPD/Crest, Simmons, MediaMark, and Polk Automotive, have developed a variety of segmentation-based products and services for product and service categories such as retail, restaurant, real estate, financial services, health care, automotive, telecommunications, internet, cable and satellite services, energy, media and ad agencies, travel, and not-for-profit organizations. In addition, Claritas provides segmentation services on nearly all marketing databases from leading providers such as ACNielsen, Gallup, IRI, JD Power, MediaMark, Nielsen Media Research, NFO, NPD, Polk Automotive, Scarborough, and Simmons, as well as nearly all major direct mail list providers, consumer marketing surveys, and audience measurement systems. One of the most important distinguishing aspects of state-of-the-art segmentation procedures is the presence or absence of a statistical model.
Traditional clustering procedures, such as Ward's (1963) hierarchical method or K-means partitioning algorithms (MacQueen, 1967), which are widely used in commercial settings, take the data "as is" and do not posit any statistical model. Hereafter, we refer to methods of this type as "non-model-based." Contrastingly, some academic literature advocates finite mixture models as a preferred approach to clustering because of the provision of a formal statistical model (e.g., Banfield and Raftery, 1993; McLachlan and Peel, 2000). Mixture model approaches have received considerable attention in the marketing research literature (Dillon and Kumar, 1994; Wedel and Kamakura, 2000); however, their incorporation into marketing practice is limited at best. Some of practitioners' hesitancy to adopt model-based approaches to clustering likely stems from insufficient knowledge of their performance relative to their non-model-based competitors. Extensive comparisons of model-based and non-model-based methods are slowly emerging in the literature (Andrews et al., 2008; Steinley and Brusco, 2008) and should ultimately provide marketing research analysts with information as to the precise conditions under which model-based approaches are apt to be preferred. For example, the study by Andrews et al. (2008) compares model-based and non-model-based procedures for segmenting consumers jointly and simultaneously on two sets of bases, corresponding to households' responses to product and marketing mix strategies and their demographic characteristics. They found that if the manager's primary purpose is to forecast responses to product and marketing mix variables for a new sample of consumers for whom only demographics are available, model-based and non-model-based procedures perform equally well.
On the other hand, if developing an understanding of the true segmentation structure in a market is important, as is often the case for the design and marketing of products, the model-based procedure is clearly preferred. Another critical factor for characterizing segmentation problems is the nature of the data measurements. The most common non-model-based procedures (e.g., K-means), as well as methods based on mixtures of normal distributions, are designed for data that are at least interval-scaled. Although clustering based on metric data is certainly important in marketing research, so are segmentation problems based on categorical data (Green et al., 1988). The viability of model-based methods for segmentation based on categorical data is also recognized in the literature on latent class analysis (Dillon and Kumar, 1994). In contrast, despite the flexibility of non-model-based clustering methods such as p-median algorithms (Brusco and Köhn, 2008) and K-centroid clustering heuristics (Chaturvedi et al., 1997) for clustering data measured on nominal or ordinal scales, their implementation in the marketing literature is limited. One especially significant segmentation problem pertaining to categorical data arises when the same set of consumers is segmented independently on two or more distinct bases. When different partitions of customers are obtained for different segmentation bases, perhaps with varying numbers of segments across partitions, a natural problem that arises is the establishment of a single consensus partition that best reflects an amalgamation of the individual partitions (Krieger and Green, 1999).[1] The potential for segmenting consumers on different bases has been identified by a number of authors (Krieger and Green, 1996; Ramaswamy et al., 1996; Brusco et al., 2002; Andrews and Currim, 2003a; Brusco et al., 2003). Unfortunately, segment partitions for different bases often do not correlate well with one another. For example, household responses to product and marketing mix characteristics often do not correlate well with household characteristics (Wind, 1978; Green and Krieger, 1991; Gupta and Chintagunta, 1994; Brusco et al., 2002), resulting in segmentation that is ineffective. The segmentation of households on different bases, such as household responses to the product or marketing mix and household characteristics, is pervasive across commercial applications as well. For example, Claritas employs information on purchase behaviors and consumer descriptor variables (e.g., demographic, lifestyle) in its PRIZM NE and other associated segmentation systems to increase profitability through customer acquisition, development, and retention. NPD also employs consumer behavior (e.g., shopping behavior, purchases) and descriptor variables (e.g., demographics, lifestyle, attitudes) to reveal groupings of consumers, help retailers better understand the behaviors and characteristics of the buyers they have, and determine how best to build market share. Simmons gives client companies insight into the differences and similarities between consumer segments based on their body characteristics, similarities and differences in food preferences, and other descriptor variables (e.g., self-image and media usage preferences) so that clients are able to better target products and promotions to groups of people of different body sizes. MediaMark Research employs consumer behavior (actual and intended product usage) and descriptor variables (e.g., demographics, psychographics, lifestyle, and media) for print media (e.g., newspapers, magazines) to support concept and product development and to conduct prime prospect analysis that helps expand readership, attract new readers, and meet existing reader needs. And R. L. Polk uses a combination of automotive preference and purchase data and demographic and lifestyle data to develop predictive segmentation models and to help client companies communicate with and retain current customers, identify in-market buyers, generate brand awareness and reach, capture new customers, and target 'the right people at the right time with the right offer.' These examples suggest that segmentation based on multiple bases is pervasive in commercial settings as well.

[1] The segmentation approach of interest here is distinct from the bicriterion clustering approaches studied by Andrews et al. (2008), in which model-based and non-model-based procedures were compared in terms of their abilities to form a single partition on the basis of two sets of bases. The current study takes multiple partitions resulting from the clustering of independent bases as given and compares model-based and non-model-based procedures for forming a single consensus partition. Thus, the procedures of interest in this study could be viewed as alternatives to bicriterion clustering methods, but with the added advantage that they can easily accommodate problems in which different bases produce different numbers of segments.

What options are available to marketing research analysts looking to obtain a consensus partition from multiple independent partitions? One classic approach has deep historical roots in the theory of voting and social choice (Borda, 1784; Condorcet, 1785), and is based on the aggregation of preferences. Règnier (1965) is generally credited with the extension of these principles to clustering. His approach views the set of categorical variables as a system of binary equivalence relations and subsequently obtains the equivalence relation that is at minimum distance from the system of relations in a least-squares sense. The resulting optimization problem can be transformed into a binary integer program and is widely known as the clique partitioning problem (Grötschel and Wakabayashi, 1989). Hereafter, we refer to this approach as the CPP.
Krieger and Green (1999) developed an approach for amalgamating partitions that is conceptually similar to the CPP, although less mathematically tractable for exact solution methods. Their approach, which they call SEGWAY, attempts to find a consensus partition that maximizes a weighted function of agreement indices between the consensus partition and the set of categorical measurements. Krieger and Green opted for Hubert and Arabie’s (1985) adjusted Rand index (ARI) as the agreement measure, which was an excellent choice in light of its reputation in the classification literature. For example, Steinley (2004, p. 394) provides cogent arguments for the superiority of the ARI over classification rates, including ARI’s correction for chance and the greater information it provides by capturing the pattern of classified observations. The CPP and SEGWAY methods for consensus partitioning do not employ a statistical model. They take the data measurements “as is” and optimize the appropriate objective criterion using discrete partitioning algorithms. As an alternative to these two non-model-based approaches, we consider latent class models (Lazarsfeld and Henry, 1968) as a third approach. Each of the three approaches for amalgamation of partitions has its own inherent advantages and limitations. For example, the CPP possesses the benefits of historical precedent and the use of the ubiquitous least-squares approach to data analysis. Another desirable property of the CPP is that it is one of the few discrete partitioning methods that automatically determines the number of clusters via the solution process. Relative to the CPP, SEGWAY offers greater flexibility with respect to differentially weighting the importance of partitions in the objective function, as well as the incorporation of constraints to secure minimum ARI thresholds for each partition. 
The SEGWAY algorithm is, however, more computationally demanding than the CPP, and the selection of appropriate weights and thresholds can be a tedious and challenging endeavor. Whereas the CPP and SEGWAY models are plausible discrete partitioning approaches to the amalgamation of partitions, the latent class method (Lazarsfeld and Henry, 1968; Dillon and Kumar, 1994) is the natural probabilistic approach. The latent class method, which falls within the domain of finite mixture models (FMM), is predicated on the assumption of local independence among the categorical attributes (McLachlan and Peel, 2000, Chapter 5; Grim, 2006). Although somewhat unrealistic, this assumption, which is sometimes referred to as the naive Bayes approach, often yields very satisfactory performance in practice. Using the popular EM algorithm, segment probabilities for each class of each partition are iteratively estimated with the goal of maximizing the log-likelihood. Upon convergence of the algorithm, each customer is assigned to the segment for which the membership probability is largest. Although the CPP, SEGWAY, and FMM approaches for consensus partitioning each have their own merits, little is known about their relative performances when applied to segmentation data (Krieger and Green, 1999). Without such information, it is difficult to offer recommendations to market research practitioners with respect to best practices for amalgamating partitions from different bases. Accordingly, our contribution is the comparison of the consensus partitioning approaches across a broad range of data conditions. The results of our study revealed that the FMM provided better recovery than its non-model-based competitors. In the next section, we provide a precise description of the CPP, SEGWAY, and FMM approaches.
This is followed by the description and results of a simulation experiment comparing the consensus partitioning methods with respect to their fit to holdout (validation) partitions. The paper concludes with a discussion of the findings and suggestions for future research.

2. Amalgamation methods

2.1. Binary equivalence relations and the clique partitioning problem (CPP)

Our development of the problem of aggregating binary equivalence relations and the formulation of the CPP uses the following definitions:

N: the number of customers (or firms) that have been partitioned, indexed 1 \le n \le N;
P: the number of partitions of the customers to amalgamate, indexed 1 \le p \le P;
X: an N \times P matrix with elements x_{np} representing the class measurement for customer n in partition p, for 1 \le n \le N and 1 \le p \le P;
E^{(p)}: an N \times N matrix that defines a binary equivalence relation on partition p, for 1 \le p \le P. The elements of E^{(p)} are defined as e_{nj}^{(p)} = 1 if x_{np} = x_{jp} and e_{nj}^{(p)} = 0 if x_{np} \ne x_{jp}, for 1 \le n, j \le N and 1 \le p \le P;
E: an N \times N matrix defining a "median" relation for the N customers, where e_{nj} = 1 if customers n and j are assigned to the same segment, and 0 otherwise, for 1 \le n, j \le N.

The goal of the problem posed by Règnier (1965) is to find the median relation (Mirkin, 1974, 1979; Barthélemy and Monjardet, 1995), E, that provides the best aggregation of the P binary equivalence relations, E^{(1)}, E^{(2)}, ..., E^{(P)}, as measured by the least-squares loss function:

    Minimize:  \sum_{p=1}^{P} \left\| E^{(p)} - E \right\|^{2} = \sum_{p=1}^{P} \sum_{n=1}^{N} \sum_{j=1}^{N} \left( e_{nj}^{(p)} - e_{nj} \right)^{2}.    (1)

The loss function is minimized subject to constraints guaranteeing reflexivity, symmetry, transitivity, and binary properties of the median relation, which are, respectively, enforced as follows:

    e_{nn} = 1,                               for 1 \le n \le N,        (2)
    e_{nj} = e_{jn},                          for 1 \le n < j \le N,    (3)
    e_{ni} + e_{ij} - e_{nj} \le 1,           for 1 \le n, i, j \le N,  (4)
    e_{nj} \in \{0, 1\},                      for 1 \le n, j \le N.     (5)
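The equivalence-relation construction above is straightforward to operationalize. The following sketch (in Python rather than the MATLAB used in the study, and with an illustrative label vector of our own) builds E^{(p)} from a vector of class labels; reflexivity, symmetry, and transitivity hold by construction:

```python
# Sketch: build the binary equivalence relation E^(p) for one partition,
# with e_nj = 1 iff customers n and j share a segment in that partition.
# The label vector below is illustrative, not data from the study.

def equivalence_relation(labels):
    """Return the N x N binary equivalence relation for a label vector."""
    n = len(labels)
    return [[1 if labels[i] == labels[j] else 0 for j in range(n)]
            for i in range(n)]

E5 = equivalence_relation([0, 0, 1, 1, 2])
# Main diagonal is all ones (reflexivity); the matrix is symmetric.
```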
The linearization of (1) is accomplished via some algebra that begins with the expansion of the quadratic expression and separation of terms that do not include the median relation:

    Minimize:  \sum_{p=1}^{P} \sum_{n=1}^{N} \sum_{j=1}^{N} e_{nj}^{(p)} + \sum_{p=1}^{P} \sum_{n=1}^{N} \sum_{j=1}^{N} \left( 1 - 2 e_{nj}^{(p)} \right) e_{nj}.    (6)

The first component of (6) does not depend on the median relation, E, and is, therefore, removable from the objective function. This elimination, along with the incorporation of the first summation term of the second component into the parentheses, yields the following objective function:

    Minimize:  \sum_{n=1}^{N} \sum_{j=1}^{N} \left( P - 2 \sum_{p=1}^{P} e_{nj}^{(p)} \right) e_{nj}.    (7)

Some further simplification is possible by observing that the reflexivity property of the equivalence relation allows the main diagonal of E to be ignored. Similarly, the symmetry property permits restriction to the upper triangle (or lower triangle) of E. We define the decision variables y_{nj} = 1 if customers n and j are placed in the same segment of the median relation and 0 otherwise, for 1 \le n < j \le N. Additionally, we employ the parameters

    c_{nj} = P - 2 \sum_{p=1}^{P} e_{nj}^{(p)},    for 1 \le n < j \le N.    (8)

Using these definitions, the CPP is formulated as follows (Grötschel and Wakabayashi, 1989; Kochenberger et al., 2005):

    Minimize:  z = \sum_{n=1}^{N-1} \sum_{j=n+1}^{N} c_{nj} y_{nj},    (9)

    Subject to:
    y_{ni} + y_{ij} - y_{nj} \le 1,    for 1 \le n < i < j \le N,    (10)
    y_{ni} - y_{ij} + y_{nj} \le 1,    for 1 \le n < i < j \le N,    (11)
    -y_{ni} + y_{ij} + y_{nj} \le 1,   for 1 \le n < i < j \le N,    (12)
    y_{nj} \in \{0, 1\},               for 1 \le n < j \le N.        (13)

Constraints (10), (11), and (12), which are sometimes referred to as "triangle" constraints, are required to guarantee transitivity. The exact solution of the CPP is possible for modestly sized N using mathematical programming procedures (Grötschel and Wakabayashi, 1989; Palubeckis, 1997; Mehrotra and Trick, 1998); however, heuristic procedures are more common, particularly for larger problem instances. We use a heuristic relocation algorithm that is similar in design to methods proposed by Marcotorchino and Michaud (1981) and Règnier (1965).
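To make the objective (9) concrete, the sketch below (our own illustrative Python, not the Appendix algorithm or the study's MATLAB code) computes the c_nj coefficients of (8) and applies a simple relocation heuristic: each customer may move to whichever existing, or newly opened, segment most reduces z. Note that the number of segments is determined automatically during the search:

```python
# Illustrative relocation heuristic for the CPP. Each pass lets every
# customer move to the cluster (or a fresh singleton) that most reduces
# z = sum of c_nj over within-cluster pairs; passes repeat until no move helps.

def cpp_relocate(X):
    """X[n][p] = class of customer n in partition p. Returns cluster labels."""
    N, P = len(X), len(X[0])
    c = [[0] * N for _ in range(N)]
    for n in range(N):
        for j in range(n + 1, N):
            agree = sum(X[n][p] == X[j][p] for p in range(P))
            c[n][j] = c[j][n] = P - 2 * agree        # coefficients of (8)
    labels = list(range(N))                          # start from all singletons
    improved = True
    while improved:
        improved = False
        for i in range(N):
            def cost(k):  # cost customer i contributes inside cluster k
                return sum(c[i][j] for j in range(N) if j != i and labels[j] == k)
            best_k, best_d = labels[i], cost(labels[i])
            for k in set(labels) | {max(labels) + 1}:  # existing clusters + a new one
                d = cost(k)
                if d < best_d:
                    best_k, best_d = k, d
            if best_k != labels[i]:
                labels[i] = best_k
                improved = True
    return labels

# With three fully agreeing partitions, customers 0-1 and 2-3 are grouped.
demo = cpp_relocate([[0, 0, 0], [0, 0, 0], [1, 1, 1], [1, 1, 1]])
```

Because each accepted move strictly decreases z and the number of partitions of N objects is finite, the loop terminates.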
The steps of the algorithm are presented in the Appendix.

2.2. The SEGWAY algorithm

Krieger and Green (1999) presented the SEGWAY algorithm as a method for consensus partitioning based on the ARI. To highlight the relationship between SEGWAY and the CPP, we develop the ARI using the notation from the previous subsection. We begin with the definition of the following measures:

    \alpha_{1}^{(p)} = \sum_{n=1}^{N-1} \sum_{j=n+1}^{N} e_{nj}^{(p)} e_{nj},                    for 1 \le p \le P,    (14)
    \alpha_{2}^{(p)} = \sum_{n=1}^{N-1} \sum_{j=n+1}^{N} e_{nj}^{(p)} \left( 1 - e_{nj} \right),  for 1 \le p \le P,    (15)
    \alpha_{3}^{(p)} = \sum_{n=1}^{N-1} \sum_{j=n+1}^{N} \left( 1 - e_{nj}^{(p)} \right) e_{nj},  for 1 \le p \le P,    (16)
    \alpha_{4}^{(p)} = \sum_{n=1}^{N-1} \sum_{j=n+1}^{N} \left( 1 - e_{nj}^{(p)} \right) \left( 1 - e_{nj} \right),  for 1 \le p \le P.    (17)

The value of \alpha_{1}^{(p)} represents the number of customer pairs that are in the same segment in the amalgamated partition defined by E and also in the same segment of the partition corresponding to E^{(p)}. Similarly, \alpha_{4}^{(p)} represents the number of customer pairs that are in different segments in the amalgamated partition defined by E and also in different segments of the partition corresponding to E^{(p)}. Contrastingly, the \alpha_{2}^{(p)} and \alpha_{3}^{(p)} values reflect discordance between the amalgamated partition and partition p. For example, \alpha_{2}^{(p)} is the number of customer pairs that are in different segments in the amalgamated partition defined by E, but in the same segment of the partition corresponding to E^{(p)}. Effectively, the objective function of the CPP approach is to minimize total discordance and, therefore, (1) could be re-written as follows:

    Minimize:  \sum_{p=1}^{P} 2 \left( \alpha_{2}^{(p)} + \alpha_{3}^{(p)} \right).    (18)

The SEGWAY approach differs from the CPP by using the ARI, rather than simple discordance, between the amalgamated partition and each of the P individual partitions. The ARI for each partition is computed as follows:

    ARI(p \mid E) = \frac{ H \left( \alpha_{1}^{(p)} + \alpha_{4}^{(p)} \right) - \left[ \left( \alpha_{1}^{(p)} + \alpha_{2}^{(p)} \right) \left( \alpha_{1}^{(p)} + \alpha_{3}^{(p)} \right) + \left( \alpha_{3}^{(p)} + \alpha_{4}^{(p)} \right) \left( \alpha_{2}^{(p)} + \alpha_{4}^{(p)} \right) \right] }{ H^{2} - \left[ \left( \alpha_{1}^{(p)} + \alpha_{2}^{(p)} \right) \left( \alpha_{1}^{(p)} + \alpha_{3}^{(p)} \right) + \left( \alpha_{3}^{(p)} + \alpha_{4}^{(p)} \right) \left( \alpha_{2}^{(p)} + \alpha_{4}^{(p)} \right) \right] },    (19)

for 1 \le p \le P, where H = N(N-1)/2.
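A sketch of (14)-(19): the functions below (our own illustrative Python) count \alpha_1 through \alpha_4 for a pair of label vectors and return the pair-count form of the ARI. The label vectors used are illustrative, not data from the study:

```python
# Sketch: pair counts alpha_1..alpha_4 per (14)-(17) and the ARI of (19).

def pair_counts(part_p, consensus):
    """Count concordant/discordant customer pairs between two partitions."""
    a1 = a2 = a3 = a4 = 0
    n = len(part_p)
    for i in range(n - 1):
        for j in range(i + 1, n):
            same_p = part_p[i] == part_p[j]      # e_ij^(p) = 1
            same_e = consensus[i] == consensus[j]  # e_ij = 1
            if same_p and same_e:
                a1 += 1
            elif same_p:
                a2 += 1
            elif same_e:
                a3 += 1
            else:
                a4 += 1
    return a1, a2, a3, a4

def ari(part_p, consensus):
    """Adjusted Rand index in the pair-count form of (19)."""
    a1, a2, a3, a4 = pair_counts(part_p, consensus)
    H = len(part_p) * (len(part_p) - 1) // 2
    expected = (a1 + a2) * (a1 + a3) + (a3 + a4) * (a2 + a4)
    return (H * (a1 + a4) - expected) / (H * H - expected)
```

Note that the ARI is invariant to relabeling of segments: ari([0, 0, 1, 1], [1, 1, 0, 0]) equals 1, because the two label vectors induce identical equivalence relations.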
Krieger and Green (1999) propose maximization of a weighted function of the ARI(p | E) values, subject to constraints on minimum segment size and minimum ARI thresholds for each of the individual partitions. To present the model, we define w_p (1 \le p \le P) as the ARI weights, min_size as the minimum segment size, and ARI_threshold(p) (1 \le p \le P) as the ARI thresholds. The SEGWAY optimization problem is posed as follows:

    Maximize:  \sum_{p=1}^{P} w_{p} \, ARI(p \mid E),    (20)

    Subject to:
    |s| \ge min\_size,                       for 1 \le s \le S,    (21)
    ARI(p \mid E) \ge ARI\_threshold(p),     for 1 \le p \le P,    (22)
    \sum_{p=1}^{P} w_{p} = 1,                                      (23)

where |s| denotes the number of customers in segment s. Although the formulation of the SEGWAY model is presented with considerable flexibility, the selection of weights and thresholds in these types of scalar optimization problems is especially problematic (see, for example, Brusco et al., 2003). Accordingly, throughout the remainder of this paper, ARI threshold and segment size constraints are not considered, and equal ARI weights of w_p = 1/P for 1 \le p \le P are assumed. The SEGWAY algorithm proposed by Krieger and Green (1999) is similar in design to the algorithm for the CPP presented in the Appendix. In fact, there are only two salient differences. The first difference is the obvious modification of the objective function to reflect weighted ARI values rather than simple discordance. The second difference is that Krieger and Green developed SEGWAY to be used for a pre-specified number of segments, S. However, there is no reason that S cannot be permitted to vary during the execution of the algorithm, and our comparison incorporates this additional flexibility.

2.3. The latent class finite mixture model (FMM)

The latent class model is a finite mixture model developed to explain the structure of a set of multivariate categorical data (Dillon and Kumar, 1994).
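Under the equal-weight simplification adopted here (w_p = 1/P, with no size or threshold constraints), the SEGWAY objective (20) reduces to the mean ARI between the consensus partition and the individual partitions. The following self-contained sketch (our own illustrative Python, not the SEGWAY implementation) evaluates that objective for a candidate consensus:

```python
# Sketch: the equal-weight SEGWAY objective (20), i.e. the mean pair-count
# ARI of (19) between a candidate consensus and each individual partition.

def ari(u, v):
    """Adjusted Rand index between two label vectors (pair-count form)."""
    n = len(u)
    a1 = a2 = a3 = a4 = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            sp, se = u[i] == u[j], v[i] == v[j]
            a1 += sp and se
            a2 += sp and not se
            a3 += se and not sp
            a4 += not sp and not se
    H = n * (n - 1) // 2
    expected = (a1 + a2) * (a1 + a3) + (a3 + a4) * (a2 + a4)
    return (H * (a1 + a4) - expected) / (H * H - expected)

def segway_objective(partitions, consensus):
    """Mean ARI between the consensus and each individual partition (w_p = 1/P)."""
    return sum(ari(p, consensus) for p in partitions) / len(partitions)
```

A relocation search over candidate consensus partitions, analogous to the CPP heuristic but maximizing this objective, yields the variant of SEGWAY used in our comparison.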
To describe the FMM for amalgamation of segments, we define x_n = \{x_{np}\} as the vector of discrete random variables corresponding to class memberships for the nth customer on the P partitions (1 \le n \le N and 1 \le p \le P), where x_{np} may take values g = 1, ..., g_p. The probability of observing x_n, given that customer n is a member of latent segment s, is

    f(x_n \mid s) = \prod_{p=1}^{P} \prod_{g=1}^{g_p} f(x_{np} = g \mid s)^{\delta_{pg}},    (24)

where

    \delta_{pg} = 1 if x_{np} = g, and 0 otherwise.    (25)

The constraint

    \sum_{g=1}^{g_p} f(x_{np} = g \mid s) = 1    (26)

is needed to satisfy the laws of probability (Dillon and Kumar, 1994), so only g_p - 1 probabilities need to be estimated for each pair (p, s). Additionally, we denote \pi_s as the probability of membership in segment s for each of the latent segments (1 \le s \le S), with the constraint that \sum_{s=1}^{S} \pi_s = 1. The latent class model is given by

    Pr(x_n) = \sum_{s=1}^{S} \pi_s f(x_n \mid s).    (27)

The goal is to estimate the \pi_s and the probabilities f(x_{np} = g \mid s) so as to maximize the following log-likelihood function:

    L = \sum_{n=1}^{N} \log \left[ \sum_{s=1}^{S} \pi_s f(x_n \mid s) \right].    (28)

The total number of parameters required is (S - 1) + S \sum_{p=1}^{P} (g_p - 1). The estimation process was completed using the EM algorithm (Dempster et al., 1977). To initialize the algorithm, a random partition of S segments was created and the \pi_s and f(x_{np} = g \mid s) values were obtained. The algorithm was run until convergence, and the consensus partition was produced by assigning each customer to the latent segment for which its membership probability was greatest. We restarted the algorithm 20 times using a different random initial partition for each restart. In most instances, the same final likelihood value was obtained for each restart, indicating that problems of local optimality are likely not relevant for our application. The EM algorithm for the FMM requires a fixed value of S. For this reason, we ran the EM algorithm for 2 \le S \le 8 and used a modified version of Akaike's (1973) information criterion (AIC) to choose S.
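A compact sketch of EM estimation for (24)-(28), together with the AIC3 parameter penalty used for model selection. This is our own teaching-style Python, not the study's MATLAB implementation: it initializes with random responsibilities rather than a random partition, runs a fixed number of iterations rather than testing convergence, and omits restarts:

```python
# Illustrative EM for the latent class FMM. A minimal sketch with a small
# probability floor as the only numerical safeguard; assumptions noted above.
import math
import random

def lc_em(X, S, n_iter=100, seed=0):
    """EM for a latent class model. X[n][p] in {0, ..., g_p - 1}.
    Returns (log-likelihood L, mixing proportions pi, modal segment labels)."""
    rng = random.Random(seed)
    N, P = len(X), len(X[0])
    G = [max(row[p] for row in X) + 1 for p in range(P)]
    # Initialize with random responsibilities (the study uses a random partition).
    z = []
    for _ in range(N):
        w = [rng.random() + 0.1 for _ in range(S)]
        t = sum(w)
        z.append([v / t for v in w])
    for _ in range(n_iter):
        # M-step: mixing proportions pi_s and class probabilities f(x_np = g | s).
        pi = [sum(z[n][s] for n in range(N)) / N for s in range(S)]
        theta = [[[1e-10] * G[p] for p in range(P)] for s in range(S)]
        for s in range(S):
            for p in range(P):
                for n in range(N):
                    theta[s][p][X[n][p]] += z[n][s]
                t = sum(theta[s][p])
                theta[s][p] = [v / t for v in theta[s][p]]
        # E-step: posterior memberships and the log-likelihood of (28).
        ll = 0.0
        for n in range(N):
            joint = [pi[s] * math.prod(theta[s][p][X[n][p]] for p in range(P))
                     for s in range(S)]
            tot = sum(joint)
            ll += math.log(tot)
            z[n] = [v / tot for v in joint]
    labels = [max(range(S), key=lambda s: z[n][s]) for n in range(N)]
    return ll, pi, labels

def aic3(ll, S, G):
    """AIC3 = -2L + 3[(S - 1) + S * sum_p (g_p - 1)]."""
    return -2 * ll + 3 * ((S - 1) + S * sum(g - 1 for g in G))
```

Running lc_em for S = 2, ..., 8 and retaining the S with the smallest aic3 value mimics the model-selection step described above.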
In particular, we used the following AIC3 criterion, which has yielded excellent performance in two segment-retention studies (Andrews and Currim 2003b, 2003c):

    AIC3 = -2L + 3 \left[ (S - 1) + S \sum_{p=1}^{P} (g_p - 1) \right].    (29)

The value of S minimizing AIC3 was selected for evaluation.

3. Simulation design

3.1. Experimental factors

Six experimental factors were manipulated to generate the datasets. These between-dataset factors were:

1. The number of latent segments (levels of S = 2 and 4);
2. The segment membership probabilities (levels of \pi_s = 1/S for 1 \le s \le S, and \pi_1 = .6 with \pi_s = .4 / (S - 1) for 2 \le s \le S);
3. The number of partitions to amalgamate (levels of P = 3, 4, and 5);
4. The number of classes within each partition (levels of g_p = 2, 3, and 4);
5. The partition concordance parameter (levels 2, 3, and mixed);
6. The number of customers (levels of N = 100 and 200).

A full-factorial design on these six factors produced 3^3 \times 2^3 = 216 cells. Three datasets were generated for each cell, resulting in a total of 648 unique datasets. The datasets were generated using the procedure described in Appendix B of Krieger and Green's (1999) paper, which uses a concordance parameter to control the strength of association among the partitions. The greater the value of this parameter, the greater the consistency of the partitions with the underlying consensus partition. Also, for consistency with Krieger and Green's study, for each dataset two validation (holdout) partitions were generated in addition to the P partitions to be submitted to the amalgamation process. Performance with respect to these validation partitions provides the basis for comparing the different methods. The levels of the number of latent segments, the number of classes within each partition, and the number of customers were selected based on levels used in published simulation studies on segmentation (e.g., Vriens et al., 1996; Andrews, Ainslie, and Currim 2002; Andrews, Ansari, and Currim 2002; Andrews and Currim 2003c).
Segment membership probabilities were chosen to correspond to segments of equal size and, alternatively, to one large segment and one or more equally sized smaller segments. The minimum size of the segments conformed with Andrews and Currim (2003c). The number of partitions to be amalgamated (3-5) was chosen to allow for household purchase behavior (e.g., heavy or light users, loyals or switchers, etc.), demographic descriptors (e.g., income, employment status, household size, etc.), lifestyle variables (e.g., consumption of the arts or technology, patronage of different types of retailers, etc.), and other bases. When a larger number of descriptors is available, data reduction techniques such as factor or principal component analysis are often used to derive a smaller (e.g., 3-5) number of partitions.

3.2. Algorithms and computer implementation

The method used to amalgamate the partitions can be perceived as a within-dataset factor. For the simulation experiment, the CPP, SEGWAY, and FMM methods were applied to each of the 648 datasets under the practical assumption that the value of S was unknown. The CPP and SEGWAY algorithms were permitted to modify S so as to optimize their respective criteria, and the AIC3 criterion was used to determine the appropriate number of segments for the FMM. For each dataset and each method, data were collected with respect to the average agreement (measured by ARI) between the final partition and the two validation partitions, as well as total computation time. The CPP, SEGWAY, and FMM procedures were written as MATLAB (MathWorks, Inc., 2002) m-files. All computational results were obtained by implementing these programs on a 2.4 GHz Core Duo processor with 3 GB of SDRAM.

4. Simulation results

Table 1 provides, for each level of each factor, the average recovery of the validation partitions obtained by the CPP, SEGWAY, and FMM methods.
An analysis-of-variance (ANOVA) of these results was also performed, and the results for significant main effects and two-way interactions are presented in Table 2.

[Insert Tables 1 and 2 About Here]

The ANOVA results in Table 2 show that all main effects are significant; however, there is substantial disparity among the sizes of the effects as measured by partial eta-squared (\eta^2). Among the between-dataset factors, the number of segments S had the greatest effect (\eta^2 = .912). As shown in Table 1, all three methods exhibited much greater recovery of the validation partitions at S = 4 relative to S = 2. However, interpreting the interaction of this effect with the method (as we do below) probably produces more useful insights, since segmentation problems in general become more difficult as the number of segments increases.[2]

[2] The original Rand statistic approaches its upper limit of one as the number of segments increases. The adjusted Rand index (ARI) was created to overcome this limitation (Steinley 2004), but it is not clear whether this tendency toward one could still be present in the current simulation experiment. Later, we compute the values of ARI between methods and find the same pattern of results for the number of segments S, which indicates that the statistic itself increases as the number of segments increases. Krieger and Green (1999) did not vary the number of segments in the study that introduced SEGWAY.

The second and third largest main effects corresponded to the concordance factor (\eta^2 = .758) and the number of classes per partition (\eta^2 = .676), respectively. Recovery performances of each of the three methods improved markedly as the number of classes per partition (g_p) increased,[3] and, as expected, segment recovery was strongest at the high concordance level (3) and weakest at the low level (2).[4] Although clearly not as important as S, g_p, and the concordance factor, the number of partitions in the amalgamation process possessed the fourth largest effect size (\eta^2 = .298).
The CPP, SEGWAY, and FMM procedures each realized improved recovery performance as P increased.[5] The mixing proportions (\eta^2 = .241) and the number of customers (\eta^2 = .009) had the smallest between-dataset effects on recovery of the validation partitions. As observed from Table 1, for each of the three amalgamation methods, the differences between the means for equal segment membership probabilities and a 60% probability for the largest segment were quite small. Similarly, there were only minor differences in the means for 100 versus 200 customers. Table 2 shows that the within-dataset factor, the amalgamation method, had a statistically significant effect size (\eta^2 = .138). The FMM procedure clearly outperformed its non-model-based competitors. The overall recovery performances for FMM, SEGWAY, and CPP were .4989, .4716, and .4482, respectively. Moreover, the FMM procedure outperformed its competitors at nearly all factor levels, lagging slightly behind SEGWAY and CPP only when g_p = 2. Fourteen of the 21 two-way interactions were statistically significant at the \alpha = .05 level. What is especially interesting is that the second and fourth largest interaction effects correspond to the within-dataset factor, the amalgamation method. The second largest interaction effect (\eta^2 = .103) occurred between the amalgamation method and the number of classes within the partitions (g_p). Table 1 is helpful for uncovering the details of this effect. At g_p = 2, there is minimal disparity among the average recovery performances of the three methods. The average ARI values are .3720, .3747, and .3683 for SEGWAY, CPP, and FMM, respectively. At g_p = 3, the relative performance of FMM improves markedly, whereas CPP's diminishes. The average ARI values are .4949, .4659, and .5270 for SEGWAY, CPP, and FMM, respectively. The separation of the methods is even more pronounced at g_p = 4, where the average ARI values are .5480, .5039, and .6014 for SEGWAY, CPP, and FMM, respectively.
What is unequivocally clear from this interaction effect is that the relative superiority of FMM grows markedly as gp increases.

[3] This main effect could reflect a tendency of the ARI to increase as the number of classes within partitions increases.
[4] Increases in the consistency of partitions result in improved segment recovery.

The fourth largest interaction effect (η² = .066) is between S and the amalgamation method. Once again, Table 1 provides evidence as to the nature of this effect. At S = 2, FMM's average recovery of .3471 is noticeably stronger than SEGWAY's (.3080) and substantially better than CPP's (.2629). In contrast, at S = 4, although FMM still holds the advantage, it does so by a less substantial margin: the average ARI values are .6353, .6335, and .6508 for SEGWAY, CPP, and FMM, respectively. The FMM procedure also exhibited the greatest computational efficiency, requiring an average of 16.30 seconds. The CPP algorithm was also efficient, yielding an average computation time of 20.36 seconds, whereas the SEGWAY algorithm averaged 89.52 seconds. The results in Tables 1 and 2 provide compelling evidence that there are clear differences in the recovery performances of the three amalgamation methods. In light of FMM's superior recovery performance, a natural question is: Do the differences in recovery translate into salient differences in the partitions obtained by the three methods? To answer this question, we analyze two additional outputs of the simulation study: (a) the average number of segments used by each of the three amalgamation methods, and (b) the average pairwise agreement (as measured by ARI) between the partitions produced by the amalgamation methods. Table 3 reports the average number of segments determined by each of the three amalgamation methods. Consistent with previous research (Andrews and Currim, 2003a, 2003b), FMM is conservative in its use of segments, averaging 2.06 across the test problems.
The SEGWAY and CPP procedures are much more aggressive, averaging 5.56 and 6.92 segments, respectively. In some instances, these non-model-based methods generated 10 or more segments, some of which contained only a single customer. The explanation for this finding is that the non-model-based methods can often realize a modest improvement in their objective functions by peeling off one (or possibly several) customers from a sizable segment and placing those customers in a newly created segment. In contrast, through its use of the AIC3 criterion, FMM is resistant to opening new segments because of the parameter penalty and, therefore, avoids the overfitting problems that are inherent in the non-model-based methods. [Insert Table 3 About Here] Table 4 presents the average ARIs between pairs of partitions produced by SEGWAY, CPP, and FMM. The two non-model-based methods exhibit strong concordance: the average ARI between the SEGWAY and CPP partitions is .8900. Moreover, the two non-model-based methods obtain 'perfect agreement' (ARI = 1) for 36.88% of the test problems in the experimental study. The FMM procedure exhibits somewhat less agreement with its non-model-based competitors, yielding average ARI values of .7906 and .7276 with SEGWAY and CPP, respectively. Perfect agreement between the FMM and SEGWAY partitions was realized for only 11.26% of the datasets, and for 2.31% of the datasets the ARI was less than 0.2. Thus, we conclude that the actual segmentation results produced by model-based and non-model-based amalgamation methods are quite different and would produce very different segmentation and marketing strategies if implemented in real-world applications. [Insert Table 4 About Here]

[5] Segment recovery is better when more data, rather than less, are used.

5.
Discussion, conclusions, and implications for current practice

With respect to non-model-based procedures for establishing a consensus partition, our results indicate that CPP and SEGWAY both tend to overfit the data, using more segments than necessary. This problem seems to be slightly more severe for CPP, which experienced a greater increase in the selected number of segments and a more pronounced degradation in recovery of the validation partitions. The FMM provided better recovery of the validation partitions than either of its non-model-based competitors. Whereas CPP and SEGWAY are apt to overfit the data, using more segments than necessary, the FMM showed a remarkable ability to choose S more conservatively and yield better performance in holdout samples. We believe that part of FMM's success is attributable to the use of the AIC3 criterion for model selection, which has performed well in several previous segmentation studies (Andrews and Currim, 2003b, 2003c). One of the inherent limitations of any simulation study is the inability to generalize the results beyond the ranges of the factors tested. We selected the factor levels for S, gp, and N based on simulation-based segmentation studies in the published literature. The levels for the number of customers (N = 100 and N = 200) also enabled the Monte Carlo simulation study to be conducted within a reasonable time frame; the SEGWAY algorithm and, to a lesser extent, the CPP procedure can require considerable computation time for N ≥ 300. Although the levels of N are apt to be below what might be encountered in applications, it is important to observe that this factor had the smallest effect on cluster recovery in both experiments. Further, while segmentation service providers usually describe a large number of segments, managers of their customer firms will rarely consider more than four segments for differentiated product design or marketing decisions.
Consequently, segmentation service providers usually link the larger number of more homogeneous segments to a smaller number of more basic, more heterogeneous segments. Another decision made when conducting our experiment pertained to the measure of partition recovery. We selected Hubert and Arabie's (1985) adjusted Rand index (ARI) as the measure of agreement between the consensus partition and the validation partitions. This decision was based on the well-recognized properties of the ARI in the classification literature (Steinley, 2004), as well as its importance in market segmentation research (Helsen and Green, 1991; Carmone et al., 1999). Simulation studies have shown that the ARI is the most desirable index for measuring segment recovery (Steinley, 2004). The caveats of our experiment notwithstanding, we believe that our simulation study offers compelling information for segmentation practice. As reviewed in the introduction to this paper, segmentation is pervasive in commercial settings today as a precursor to a variety of managerial decisions, across product categories and industries, products and services, and for-profit and not-for-profit organizations. Today, non-model-based approaches dominate applications in commercial settings, including segmentation service providers. We recommend that analysts in commercial settings consider the finite mixture approach as a viable method for amalgamating a set of partitions established on different bases. Moreover, the mixture model approach should be used in conjunction with the AIC3 index as the criterion for determining the number of segments. This combination produced better results than CPP and SEGWAY even when the latter two methods were implemented using the true number of segments used to generate the synthetic data. Thus, the benefit of using FMM extends beyond better resolution of the segment retention problem.
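For concreteness, the ARI can be computed directly from the contingency table that cross-classifies the labels of two partitions. The following Python function is an illustrative sketch of the Hubert-Arabie index (it is not the code used in our experiments):

```python
from math import comb
from collections import Counter

def adjusted_rand_index(part_a, part_b):
    """Hubert-Arabie (1985) adjusted Rand index between two partitions,
    each given as a sequence of segment labels (one label per customer)."""
    n = len(part_a)
    cells = Counter(zip(part_a, part_b))   # contingency table entries n_ij
    rows = Counter(part_a)                 # row marginals n_i.
    cols = Counter(part_b)                 # column marginals n_.j
    sum_ij = sum(comb(v, 2) for v in cells.values())
    sum_i = sum(comb(v, 2) for v in rows.values())
    sum_j = sum(comb(v, 2) for v in cols.values())
    expected = sum_i * sum_j / comb(n, 2)  # chance-level agreement
    max_index = (sum_i + sum_j) / 2
    if max_index == expected:              # degenerate (trivial) partitions
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

An ARI of 1 indicates identical partitions (up to relabeling of segments), values near 0 indicate chance-level agreement, and negative values indicate less agreement than expected by chance.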
Finally, as marketing research analysts seek new approaches to the problem of amalgamating partitions obtained from different bases, they should monitor research streams in other disciplines. Most notable among these is the field of pattern recognition and machine learning, where a consensus clustering is often referred to as an ensemble (Strehl and Ghosh, 2002; Topchy et al., 2005). Topchy et al., for example, highlight important advantages of clustering ensembles for data mining, which include robustness, novelty, stability and confidence estimation, and parallelization and scalability.

Appendix

The iterative relocation algorithm for the CPP requires as input an initial partition of the customers into S segments. We denote such a partition as πS = {A1, A2, ..., AS}, where As is the set of customer indices assigned to segment s and Ns = |As| is the number of customers in segment s, for 1 ≤ s ≤ S. The steps of the relocation algorithm are as follows:

Step 1. Set Δ* = 0 and i = 0.
Step 2. Set i = i + 1, s = 0, and define s′ such that i ∈ As′.
Step 3. Set s = s + 1. If s > S + 1, then go to Step 6; otherwise go to Step 4.
Step 4. If s = s′, then go to Step 3; otherwise go to Step 5.
Step 5. Compute the change in the objective function, Δ, that occurs from moving customer i from segment s′ to s. If Δ < Δ*, then set Δ* = Δ, i* = i, s** = s′, and s* = s. Go to Step 3.
Step 6. If i < N, then go to Step 2.
Step 7. If Δ* = 0, then STOP.
Step 8. Set As** = As** − {i*}, As* = As* ∪ {i*}, Ns** = Ns** − 1, and Ns* = Ns* + 1.
Step 9. If s* = S + 1, then set S = S + 1 (i.e., a new segment has been created).
Step 10. If Ns** = 0, then set S = S − 1 and reduce all segment labels greater than s** by one, as the number of segments has been reduced.
Step 11. Go to Step 1.

On convergence, the resulting partition is guaranteed to be a local optimum with respect to all possible relocations of a customer index to a different segment; however, a global optimum is not guaranteed.
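The appendix steps can be sketched as a generic best-improvement local search. The Python sketch below is illustrative only: `delta` is a placeholder for the move evaluation implied by the CPP objective (for SEGWAY's maximization objective, the comparison in the Step 5 line would be reversed), segment labels run from 1 to S, and label S + 1 denotes a newly opened segment:

```python
def relocation_heuristic(delta, assignment, n_segments):
    """Best-improvement relocation search following the appendix steps.
    assignment[i] is the segment label (1..S) of customer i;
    delta(i, s_src, s_dst, assignment) returns the change in the
    (minimized) objective from moving customer i from s_src to s_dst,
    where s_dst = S + 1 opens a new segment."""
    S = n_segments
    while True:
        d_star, i_star, s_src, s_dst = 0.0, None, None, None
        for i, s_cur in enumerate(assignment):   # Steps 2-6: scan all moves
            for s in range(1, S + 2):            # s = S + 1 is a new segment
                if s == s_cur:                   # Step 4: skip current segment
                    continue
                d = delta(i, s_cur, s, assignment)
                if d < d_star:                   # Step 5: record best move
                    d_star, i_star, s_src, s_dst = d, i, s_cur, s
        if i_star is None:                       # Step 7: local optimum reached
            return assignment, S
        assignment[i_star] = s_dst               # Step 8: execute the best move
        if s_dst == S + 1:                       # Step 9: a new segment opened
            S += 1
        if s_src not in assignment:              # Step 10: source segment emptied
            assignment = [a - 1 if a > s_src else a for a in assignment]
            S -= 1

# Illustrative use with a toy objective (hypothetical, not the CPP criterion):
# minimize the sum of within-segment pairwise distances of scalar "customers".
values = [0, 0, 10, 10]

def pairwise_cost(a):
    return sum(abs(values[i] - values[j])
               for i in range(len(a)) for j in range(i + 1, len(a))
               if a[i] == a[j])

def toy_delta(i, s_src, s_dst, a):
    b = list(a)
    b[i] = s_dst
    return pairwise_cost(b) - pairwise_cost(a)

partition, n_seg = relocation_heuristic(toy_delta, [1, 1, 1, 1], 1)
```

Each pass scans all (customer, segment) moves, executes the single best improving move, and repeats until Δ* = 0; the opening and closing of segments (Steps 9 and 10) is what allows S to vary during the search.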
For this reason, we restart the algorithm 20 times, using a different initial partition for each restart. We have described the relocation algorithm within the context of the CPP; however, assuming fixed S, the algorithm is structurally the same as the SEGWAY method developed by Krieger and Green (1999). The only difference is that SEGWAY has a maximization objective function, thus necessitating a slight change in Step 5. We note that although Krieger and Green (1999) presented SEGWAY within the context of fixed S, there is no reason the algorithm cannot be run while permitting S to vary.

References

Akaike, H., 1973. Information theory and an extension of the maximum likelihood principle. In B. N. Petrov and B. F. Csaki (Eds.), Second International Symposium on Information Theory, Budapest: Academiai Kiado, pp. 267-281.
Andrews, R. L., Ainslie, A., Currim, I. S., 2002. An empirical comparison of logit choice models with discrete versus continuous representations of heterogeneity. Journal of Marketing Research, 39 (November), 479-487.
Andrews, R. L., Ansari, A., Currim, I. S., 2002. Hierarchical Bayes versus finite mixture conjoint analysis models: A comparison of fit, prediction, and partworth recovery. Journal of Marketing Research, 39 (February), 87-98.
Andrews, R. L., Brusco, M. J., Currim, I. S., 2008. A comparison of methods for bicriterion clustering problems: Are there benefits from having a statistical model? Working paper, University of Delaware.
Andrews, R. L., Currim, I. S., 2003a. Recovering and profiling the true segmentation structure in markets: An empirical investigation. International Journal of Research in Marketing, 20, 177-192.
Andrews, R. L., Currim, I. S., 2003b. Retention of latent segments in regression-based marketing models. International Journal of Research in Marketing, 20, 315-321.
Andrews, R. L., Currim, I. S., 2003c. A comparison of segment retention criteria for finite mixture logit models.
Journal of Marketing Research, 40 (May), 235-243.
Banfield, J. D., Raftery, A. E., 1993. Model-based Gaussian and non-Gaussian clustering. Biometrics, 49, 803-821.
Barthélemy, J.-P., Monjardet, B., 1995. The median procedure for partitions. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 19, 3-34.
Borda, J. C., 1784. Mémoire sur les élections au scrutin. Histoire de l'Académie royale des Sciences pour 1781. Paris.
Brusco, M. J., Cradit, J. D., Stahl, S., 2002. A simulated annealing heuristic for a bicriterion partitioning problem in market segmentation. Journal of Marketing Research, 39 (February), 99-109.
Brusco, M. J., Cradit, J. D., Tashchian, A., 2003. Multicriterion clusterwise regression for joint segmentation settings: An application to customer value. Journal of Marketing Research, 40 (May), 225-234.
Brusco, M. J., Köhn, H.-F., 2008. Optimal partitioning of a data set based on the p-median model. Psychometrika, 73 (March), 89-105.
Carmone, F. J., Kara, A., Maxwell, S., 1999. HINoV: A new model to improve market segmentation by identifying noisy variables. Journal of Marketing Research, 36 (November), 501-509.
Chaturvedi, A., Carroll, J. D., Green, P. E., Rotondo, J. A., 1997. A feature-based approach to market segmentation via overlapping K-centroids clustering. Journal of Marketing Research, 34 (August), 370-377.
Condorcet, M. J. A. N. Caritat, marquis de, 1785. Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix. Paris.
Dempster, A. P., Laird, N. M., Rubin, D. B., 1977. Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society B, 39, 1-38.
Dillon, W. R., Kumar, A., 1994. Latent structure and other mixture models in marketing: An integrative survey and overview. In R. P. Bagozzi (Ed.), Advanced Methods of Marketing Research, Oxford: Blackwell, pp. 295-351.
Frank, R. E., Massy, W. F., Wind, Y., 1972. Market Segmentation.
Englewood Cliffs, NJ: Prentice-Hall.
Green, P. E., Krieger, A. M., 1991. Segmenting markets with conjoint analysis. Journal of Marketing, 55, 20-31.
Green, P. E., Schaffer, C. M., Patterson, K. M., 1988. A reduced-space approach to the clustering of categorical data in market segmentation. Journal of the Market Research Society, 30, 267-288.
Grim, J., 2006. EM cluster analysis for categorical data. In D.-Y. Yeung, J. T. Kwok, A. L. N. Fred, F. Roli, D. de Ridder (Eds.), Structural, Syntactic, and Statistical Pattern Recognition, Berlin: Springer, pp. 640-648.
Grötschel, M., Wakabayashi, Y., 1989. A cutting plane algorithm for a clustering problem. Mathematical Programming, 45, 59-96.
Gupta, S., Chintagunta, P. K., 1994. On using demographic variables to determine segment membership in logit mixture models. Journal of Marketing Research, 31, 128-136.
Helsen, K., Green, P. E., 1991. A computational study of replicated clustering with an application to market segmentation. Decision Sciences, 22 (5), 1124-1141.
Hubert, L. J., Arabie, P., 1985. Comparing partitions. Journal of Classification, 2 (2), 193-218.
Kochenberger, G., Glover, F., Alidaee, B., Wang, H., 2005. Clustering of microarray data via clique partitioning. Journal of Combinatorial Optimization, 10, 77-92.
Krieger, A. M., Green, P. E., 1996. Modifying cluster-based segments to enhance agreement with an exogenous response variable. Journal of Marketing Research, 33 (August), 351-363.
Krieger, A. M., Green, P. E., 1999. A generalized Rand-index method for consensus clustering of separate partitions of the same data base. Journal of Classification, 16 (1), 63-89.
Lazarsfeld, P. F., Henry, N., 1968. Latent Structure Analysis. Boston: Houghton-Mifflin.
MacQueen, J. B., 1967. Some methods for classification and analysis of multivariate observations. In L. M. Le Cam and J. Neyman (Eds.), Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (Vol.
1), Berkeley, CA: University of California Press, pp. 281-297.
Marcotorchino, F., Michaud, P., 1981. Heuristic approach of the similarity aggregation problem. Methods of Operations Research, 43, 395-404.
MathWorks, Inc., 2002. Using MATLAB (Version 6). Natick, MA: The MathWorks, Inc.
McLachlan, G., Peel, D., 2000. Finite Mixture Models. New York: Wiley.
Mehrotra, A., Trick, M., 1998. Cliques and clustering: A combinatorial approach. Operations Research Letters, 22, 1-12.
Mirkin, B. G., 1974. The problems of approximation in space of relations and qualitative data analysis. Information and Remote Control, 35, 1424-1431.
Mirkin, B. G., 1979. Group Choice. New York: Wiley.
Palubeckis, G., 1997. A branch-and-bound approach using polyhedral results for a clustering problem. INFORMS Journal on Computing, 9, 30-42.
Punj, G., Stewart, D. W., 1983. Cluster analysis in marketing research: Review and suggestions for application. Journal of Marketing Research, 20 (May), 134-148.
Ramaswamy, V., Chatterjee, R., Cohen, S. H., 1996. Joint segmentation on distinct interdependent bases with categorical data. Journal of Marketing Research, 33 (August), 335-350.
Régnier, S., 1965. Sur quelques aspects mathématiques des problèmes de classification automatique. I.C.C. Bulletin, 4, 175-191.
Smith, W., 1956. Product differentiation and market segmentation as alternative marketing strategies. Journal of Marketing, 20, 3-8.
Steinley, D., 2004. Properties of the Hubert-Arabie adjusted Rand index. Psychological Methods, 9, 386-396.
Steinley, D., Brusco, M. J., 2008. Selection of variables in cluster analysis: An empirical comparison of eight procedures. Psychometrika, 73 (1), 125-144.
Strehl, A., Ghosh, J., 2002. Cluster ensembles - A knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3, 583-617.
Topchy, A., Jain, A. K., Punch, W., 2005. Clustering ensembles: Models of consensus and weak partitions.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 27 (12), 116.
Vriens, M., Wedel, M., Wilms, T., 1996. Metric conjoint segmentation methods: A Monte Carlo comparison. Journal of Marketing Research, 33 (February), 73-85.
Ward, J. H., 1963. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236-244.
Wedel, M., Kamakura, W. A., 2000. Market Segmentation: Conceptual and Methodological Foundations (2nd ed.). Boston, MA: Kluwer.
Wind, Y., 1978. Issues and advances in segmentation research. Journal of Marketing Research, 15, 317-338.

Table 1. A comparison of average ARI values for the validation partitions obtained by the CPP, SEGWAY, and FMM methods.

Factor                  Factor levels                             SEGWAY   CPP     FMM
Number of Segments      S = 2                                     .3080    .2629   .3471
                        S = 4                                     .6353    .6335   .6508
Segment Probabilities   πs = 1/S for 1 ≤ s ≤ S                    .4440    .4157   .4713
                        π1 = .6, πs = .4/(S-1) for 2 ≤ s ≤ S      .4993    .4807   .5266
Number of Partitions    P = 3                                     .4272    .4121   .4645
                        P = 4                                     .4685    .4359   .4974
                        P = 5                                     .5192    .4965   .5349
Number of Classes       gp = 2                                    .3720    .3747   .3683
in each Partition       gp = 3                                    .4949    .4659   .5270
                        gp = 4                                    .5480    .5039   .6014
Concordance factor      c = 2                                     .3648    .3389   .3901
                        c = 3                                     .5862    .5692   .6123
                        c = 2 or 3 (mixed)                        .4639    .4365   .4943
Number of Customers     N = 100                                   .4671    .4435   .4932
                        N = 200                                   .4762    .4529   .5047
Overall                                                           .4716    .4482   .4989

Table 2. Analysis of variance of average ARI values for the validation partitions obtained by the CPP, SEGWAY, and FMM methods.
Source             SS       df     MS       F           Sig(F)   Effect size (η²)
Segments (S)       54.189   1      54.189   19451.455   .000     .912
SegProb (πs)       1.667    1      1.667    598.449     .000     .241
Partitions (P)     2.225    2      1.113    399.378     .000     .298
Classes (gp)       10.948   2      5.474    1964.916    .000     .676
Concordance (c)    16.419   2      8.209    2946.756    .000     .758
Customers (N)      .049     1      .049     17.529      .000     .009
Method             .836     2      .418     149.957     .000     .138
S × πs             2.016    1      2.016    723.637     .000     .278
S × P              .090     2      .045     16.084      .000     .017
S × gp             .525     2      .262     94.199      .000     .091
S × c              .192     2      .096     34.413      .000     .035
S × N              .026     1      .026     9.251       .002     .005
S × Method         .373     2      .187     66.998      .000     .066
πs × P             .041     2      .021     7.429       .001     .008
πs × gp            .080     2      .040     14.315      .000     .015
πs × c             .034     2      .017     6.178       .002     .007
πs × N             .000     1      .000     .000        .996     .000
πs × Method        .010     2      .005     1.844       .158     .002
P × gp             .123     4      .031     11.074      .000     .023
P × c              .079     4      .020     7.124       .000     .015
P × N              .017     2      .009     3.133       .044     .003
P × Method         .048     4      .012     4.281       .002     .009
gp × c             .050     4      .012     4.477       .001     .009
gp × N             .000     2      .000     .088        .916     .000
gp × Method        .603     4      .151     54.080      .000     .103
c × N              .002     2      .001     .324        .724     .000
c × Method         .013     4      .003     1.187       .315     .003
N × Method         .001     2      .000     .090        .914     .000
Error              5.240    1881   .003
Total              95.897   1943

Table 3. A comparison of the average number of segments obtained by the CPP, SEGWAY, and FMM methods when S is assumed unknown and determined by the method.

Factor                  Factor levels                             SEGWAY   CPP     FMM
Number of Segments      S = 2                                     5.56     8.13    2.00
                        S = 4                                     5.56     5.71    2.11
Segment Probabilities   πs = 1/S for 1 ≤ s ≤ S                    5.77     6.69    2.07
                        π1 = .6, πs = .4/(S-1) for 2 ≤ s ≤ S      5.35     7.16    2.05
Number of Partitions    P = 3                                     5.27     6.21    2.03
                        P = 4                                     6.06     7.66    2.08
                        P = 5                                     5.35     6.89    2.06
Number of Classes       gp = 2                                    2.99     2.64    2.02
in each Partition       gp = 3                                    5.91     7.79    2.08
                        gp = 4                                    7.78     10.33   2.07
Concordance factor      c = 2                                     6.04     8.06    2.06
                        c = 3                                     4.95     5.63    2.06
                        c = 2 or 3 (mixed)                        5.69     7.06    2.06
Number of Customers     N = 100                                   5.06     6.32    2.05
                        N = 200                                   6.06     7.52    2.06
Overall                                                           5.56     6.92    2.06

Table 4.
A comparison of average ARI values between pairs of partitions obtained by the CPP, SEGWAY, and FMM methods when S is assumed unknown and determined by the method.

Factor                  Factor levels                             SEGWAY    SEGWAY    CPP
                                                                  and CPP   and FMM   and FMM
Number of Segments      S = 2                                     .8013     .6996     .5823
                        S = 4                                     .9787     .8815     .8729
Segment Probabilities   πs = 1/S for 1 ≤ s ≤ S                    .8935     .7961     .7445
                        π1 = .6, πs = .4/(S-1) for 2 ≤ s ≤ S      .8864     .7850     .7107
Number of Partitions    P = 3                                     .8926     .7201     .6789
                        P = 4                                     .8634     .7990     .7131
                        P = 5                                     .9139     .8526     .7909
Number of Classes       gp = 2                                    .9528     .8071     .8128
in each Partition       gp = 3                                    .8656     .7894     .7009
                        gp = 4                                    .8515     .7752     .6692
Concordance factor      c = 2                                     .8333     .7029     .6196
                        c = 3                                     .9485     .8804     .8449
                        c = 2 or 3 (mixed)                        .8880     .7884     .7184
Number of Customers     N = 100                                   .8879     .7733     .7075
                        N = 200                                   .8920     .8078     .7477
Overall                                                           .8900     .7906     .7276