Statistical Techniques
MAR 6648: Marketing Research
February 1, 2011

Overview
• We’ll talk about basic statistical tools
  – T-tests, crosstabs, and regression are useful tools
• We’ll talk about what they can and can’t do
• More sophisticated tools can give a deeper view of your customers
  – Conjoint analysis, cluster analysis, and factor analysis can help you understand who your customers are and what they like

A Quick Note on Data Analysis
• Statistics are just one part of an argument
• People are easily persuaded by numbers and statistics
  – The more complicated the analysis, the less likely it is to be challenged
• The strongest challenge to many statistical arguments is not in how the data are analyzed, but in how the data are collected
  – Methodological expertise always trumps data analytic experience
  – Data analytic knowledge allows for more careful consideration of methodology

Really Basic: Comparing Groups
• In marketing we often need to understand differences between groups
  – Segmentation
    • Are two or more segments really different along some dimension of behavior or attitude?
  – Experiments
    • Did the treatment work?
• We need a systematic approach that allows us to say when two (or more) groups of customers, companies, markets, etc. really are different

Most Basic: t-tests
• Do web shoppers pay a different price for cars than dealership shoppers?
• Do a hypothesis test:
  – Null Hypothesis: μ_online = μ_dealer
  – Alternative: μ_online ≠ μ_dealer

T-test Results
• “Customers who bought their new vehicles on the Auto Online website report having paid less for their vehicles than did customers who purchased their vehicles at the dealership (M_online = $11,582 vs. M_dealer = $13,594), t(1398) = -6.14, p < .001.”
  – If the p-value of the test is “small” we reject the null hypothesis
  – Here “small” typically means less than 5% (p < .05)
• Now try answering a different question:
  – Are customers who purchase a car online more likely to buy their next car online as well?
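
As a rough illustration of the t-test comparison of online and dealership prices above, the sketch below computes a two-sample t statistic with a pooled variance. The function name `pooled_t_test` and the price lists are hypothetical, made up for illustration; they are not the Auto Online data. The slide does not say which variant of the test was run, so the pooled (equal-variance) form is an assumption here; the reported t(1398) would be consistent with that form applied to 1,400 customers in total.

```python
import math
from statistics import mean, variance


def pooled_t_test(group_a, group_b):
    """Two-sample t statistic with pooled variance. Returns (t, degrees of freedom)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance (weights are each group's own df).
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    # Standard error of the difference between the two sample means.
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (mean(group_a) - mean(group_b)) / std_err
    return t_stat, df


# Hypothetical prices paid (illustration only, not the data from the slides).
online_prices = [11200, 11900, 10850, 12100, 11600]
dealer_prices = [13400, 13900, 12950, 14100, 13650]

t_stat, df = pooled_t_test(online_prices, dealer_prices)
print(f"t({df}) = {t_stat:.2f}")
```

A negative t statistic here means the first group (online) has the lower mean. Turning the statistic into a p-value requires the t distribution with `df` degrees of freedom, which a statistics package would normally supply.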

Understanding Associations
• One of the most common questions in Marketing Research:
  – Are two (or more) variables associated?
[Figure: Customer type and Subsequent transaction]

Tools for Analyzing Associations
• Cross tabulation
  – Only for two categorical variables
  – Easy to understand
  [Table: first purchase (Online 1st, Dealer 1st) cross-tabulated with second purchase (Online 2nd, Dealer 2nd)]
• Regression
  – Applies to any number of variables
  – Not necessarily categorical variables
  – Slightly harder to understand
  [Figure: Sales plotted against Price]

Χ2-test for Association
• We can do a statistical test here
• The null hypothesis is that there is no association between method of first purchase and method of subsequent purchase
  – This means that the percentage of people buying their next car online is the same regardless of how they purchased their previous car
• Again, if the p-value of the test is less than .05, we reject the null hypothesis

Intuition for Χ2-test
• The Χ2-test is based on comparing the actual cell counts to what we would expect them to be if there were no association
  1. Marginal proportion: = 333/500
  2. Marginal proportion: = 154/500
  3. Expected cell proportion (product of two marginal proportions): = 0.692*0.666
  4. Expected cell count (expected proportion times N): = 0.231*500
• We would expect the table to look like this if there were no association
[Tables: Actual cell counts vs. Expected cell counts]

Conclusion: Based on this data, it looks like customers who purchase a car online are no more likely to buy their next car online than customers who bought their initial car from a dealer.
Χ2 (1, N = 500) = 3.002, p = .08

Crosstabs
• Crosstabs is a quick and easy tool for analyzing the association between two categorical variables
• Caveats:
  – You find associations, not causation
    • An observed association may be driven by a third variable not captured in the analysis
    • In crosstabs we cannot control for other variables; we need regression for this
  – Warning: Be careful when cell counts are low.
    The test does not work well in this case (stats programs should tell you)

Key Points
• T-tests:
  – Good for analyzing data with a continuous dependent variable and a 2-level categorical variable
  – Does not allow for a more complex design
  – Does not allow the analysis to control for the presence of another known variable
• Crosstabs:
  – An easy method for describing categorical data
  – Easily analyzed using simple non-parametric tests (e.g., chi-square)
  – Poorly suited for handling non-categorical data
  – But often unable to isolate causation in data

Regression
• Regression analysis is widely used in Marketing Research
  – It can detect associations between variables
  – It can help make forecasts
  – It can test Marketing Mix models: Impact of marketing mix variables on sales
  – It can analyze results of experiments

Example: Minute Maid Sales
• Imagine that you’ve been hired as a consultant for the Minute Maid Company
• Before going for an important meeting with senior management, you have been asked to analyze the sales data for MM orange juice for the Southern California market
• To assist in your deliberations, some data have become available from one of your key accounts (the largest grocery chain in the market)
• The database was collected from weekly store scanner data that captures information such as sales (# of cartons sold), price, and other promotion information for each product
• Management is particularly interested in understanding how different pricing strategies affect sales

The data
Week   Total OJ Sales (00 cartons)   Minute Maid Sales (00 cartons)   Price-MM
1      1029                          66                               2.99
2      350                           89                               2.99
3      802                           565                              2.59
4      701                           50                               2.99
5      484                           186                              2.99
6      763                           334                              2.39
7      848                           57                               2.99
8      957                           732                              1.99
.      .                             .                                .
.      .                             .                                .
115    1296                          88                               2.53
116    1472                          760                              2.19

[Figure: Weekly Minute Maid Sales and Price]

A Linear Sales Model
• We wish to explain variation of sales as a function of price
• Assume that sales and price are related as: S_t = β0 + β1 P_t + ε_t
• We have now assumed that sales in week t is a linear function of price plus a random component
• We need to find β0 and β1

SPSS Regression output
[Figure: SPSS coefficient table for S_t = β0 + β1 P_t + ε_t, with these annotations]
• b0, b1
• Standard errors of b0 and b1 ≈ uncertainty associated with b0, b1
• t-statistic
• p-value: Test of H0: β1 = 0, Ha: β1 ≠ 0
• What does this mean?

Key Points
• Regression:
  – Generates a specific equation describing the relationship between a specific predictor (e.g., prices) and a specific outcome variable (e.g., sales)
  – The results can offer precise (if imperfect) prescriptions for managers

Example: Minute Maid Sales
• We previously identified a relationship between Minute Maid prices and Minute Maid sales
  – Essentially, Sales = 1093 + (-377 × price)
• This model seems a little simplistic
  – What about accounting for the behavior of competitors?
  – Regression is good at that too
• S_t = β0 + β1 P_mm + β2 P_tp + β3 P_tr + β4 P_sb + ε
  Sales = 289 + (-479 × MMprice) + (131 × TPprice) + (175 × TRprice) + (144 × SBprice)

[Figure: regression output] These are dummy coded variables representing the presence or absence of specific product promotions in the OJ market.
Question: Did our Minute Maid promotions positively influence sales (controlling for the presence of other known variables)?

Multiple Regression
Controlling for everything else, the advertisement was still effective. An ad increased sales by 202 units. (Now, given the cost of advertising, you can make a recommendation about whether advertising is a good idea.)

Multiple Regression
What else can we learn?
Tropicana ads do not influence Minute Maid sales, but Store Brand ads do.
It looks like ads generally decrease price sensitivities.
(We would need to test interactions to learn more about it)

Multiple Regression
• Conceptually, the procedure allows you to track multiple variables at once
  – Track the influence of competition
  – Control for exogenous factors (e.g., weather, seasonality, etc.)
• Every added variable improves (or at least never worsens) the fit of the model to the given data

Multiple Regression
• Pitfalls:
  – A better fit to the given data does not necessarily make the model better at predicting the future. You can “overfit” the data
  – Bad things happen when the predictors are strongly related to each other
  – It intrinsically assumes that a linear model is a pretty good approximation
    • It often is
    • But not always…

Key Points
• Regression not only helps make precise predictions, it can simultaneously account for multiple influences
• In so doing, it gets much closer to causal inferences (and good market researchers are after causal inferences)
• Nevertheless, regression is not a panacea, and should be used as a tool, not the only tool
• Nothing fixes poor research design

Specialized Techniques
• Research for segmentation decisions
  – Segmentation is an essential part of the marketing plan, but how do we actually find the segments?
• Demographics?
  – Sometimes useful, but demographics are often a poor predictor of behaviors and attitudes
• Attitudes
  – Segment customers based on attitudinal info (e.g., “optimists” vs. “pessimists”, “leaders” vs. “followers”)
• Benefits
  – Segment customers based on benefits sought from product/service
• Behavior
  – Segment customers based on similar behavior (e.g., “heavy users”, “light users”)
• Cluster analysis is a technique used to identify groups of ‘similar’ customers in a market (i.e., market segmentation).
• If some customers are very similar to one another but different from other (groups of) customers, cluster analysis can help you identify these (multiple) segments.

Cluster Analysis
[Figure: customers plotted by Brand Loyalty against Price sensitivity]

Cluster Analysis
• What is it actually doing?
• The algorithm measures the “distance” between every pair of points and generates a solution which minimizes distances within a cluster and maximizes distances between clusters
  – Note that this language is very close to how you were taught to think about the attributes of good segmentation
• What, exactly, is “distance”?
  – A rare literal example

Cluster Analysis: Baseball
• Baseball batters attempt to hit balls to parts of the field without any defensive players.
• Baseball coaches have seven players to distribute wherever they want on the field.
• Despite this general flexibility, fielders are almost uniformly distributed in the same locations.
• Is that where batted balls tend to land? Let’s look at clustering of batted balls for a single player.
[Figure: Chase Utley]

Example: Shopping Attitudes
• V1: Shopping is fun
• V2: Shopping is bad for your budget
• V3: I combine shopping with eating out
• V4: I try to get the best buys while shopping
• V5: I don’t care about shopping
• V6: You can save a lot of money by comparing prices

Example: Shopping
• Cluster 1: _______________
• Cluster 2: _______________
• Cluster 3: _______________

Key Points
• Cluster Analysis allows us to simplify across respondents
• When used effectively, it can guide marketing strategy
• Nevertheless, it is by no means pure computational science. Identifying and labeling clusters requires some interpretation
  – This is a strength (in flexibility)
  – And a weakness

Clusters versus Factors
[Figure: data matrix with variables V1, V2, V3, V4, V5, ….. V20; Factor Analysis simplifies across the variables, Cluster Analysis simplifies across the respondents]

Factor Analysis
• Factor Analysis can be used for data reduction (i.e., to reduce the number of variables).
• Factor analysis: Summarize the information contained in a larger number of variables into a smaller number of ‘factors’ without significant loss of information.
  – Data reduction is important when you need to measure “fuzzy” concepts like “love,” “trust,” or “satisfaction”
  – Ask a series of questions that tap into the different components of the concept
  – Too many variables! Factor analysis can help to reduce this dimensionality problem

Factor Analysis: Intuition
• Factor analysis assumes that the correlation between a large number of variables is due to them all being dependent on the same small number of “factors”
• Example: Choice of movies
  – Suppose individuals choose movies based on two main attributes:
    • Plot/story line (A1)
    • Production quality (A2)
  – Each individual has a preference for A1 and A2

Example: Choice of Movies
Statement                                                  A1 Weight   A2 Weight
I can relate to the characters                              0.81        -0.02
The movie is visually pleasing                              0.07         0.92
Set and costume design are an important part of a movie    -0.13         0.85
Movie features major stars                                  0.09         0.16
Movie has first-rate special effects                       -0.08         0.69
Engaging story-line                                         0.76         0.12
I feel “transported” while watching                         0.72        -0.18

Key Points
• Factor Analysis allows us to simplify across measures
• It helps home in on large, difficult concepts that a single item measures poorly
• It has a set of guidelines for interpretation and use (e.g., Eigenvalues > 1, KMO > .6), but it is only slightly less flexible than Cluster Analysis

Key Points
• Market Research data is often extremely bulky and complicated.
  We need tools simply to make it comprehensible
  – Cluster Analysis helps with complexity across consumers, Factor Analysis helps with complexity across measures, Perceptual maps can helpfully present this information
• These analytic tools are well suited to basic strategic concerns
  – Identifying segments and matching them to preferences and brand perceptions
  – In combination they are even better
• Use these tools carefully: Because there is room for interpretation, there is also room for clumsiness (or deceptiveness)

How do individuals form preferences over a large selection of different brands within a product category?
Think of different brands as different combinations of attributes!

Engine Size   HP      Type    #Doors   Brand    Price
2.5L          184     Sedan   4        BMW      $27,800
4.0L          203     SUV     2        Ford     $21,715
6.0L          316     SUV     4        Hummer   $48,455
3.0L          215     Sedan   4        Lexus    $29,435
. . .         . . .   . . .   . . .    . . .    . . .
2.4L          157     Sedan   4        Toyota   $18,970

Attribute based approach
• Think of a product (a certain car) as a bundle of attributes.
• A consumer prefers a certain car, car A, to another, car B, because the attributes of car A are more appealing to the consumer than the attributes of car B.
• Suppose we assume that consumers form preferences over brands implicitly by forming preferences for the attributes of which the brands consist.
• So if we present certain lists of attributes, the consumer can rank these.

Conjoint Analysis
• Conjoint Analysis: A technique that enables a researcher to estimate consumers’ valuations of different attributes
  – Allows us to understand how consumers make trade-offs among attributes/characteristics of products and services
  – How much are consumers willing to pay/give up to get/avoid different attributes?
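
The attribute-based view above can be sketched in a few lines: score each car as a bundle of attribute levels, then rank the bundles. Everything in this sketch is an illustrative assumption rather than material from the slides: the attribute values are invented for a single hypothetical consumer, the names `ATTRIBUTE_VALUES` and `bundle_value` are made up, and the bundle score is assumed to be a simple sum of its attribute-level values.

```python
# Hypothetical attribute-level values for one consumer (invented, not estimated from data).
ATTRIBUTE_VALUES = {
    "type": {"Sedan": 0.5, "SUV": 0.25},
    "doors": {2: 0.0, 4: 0.25},
    "brand": {"BMW": 0.75, "Ford": 0.25, "Toyota": 0.5},
}


def bundle_value(bundle, attribute_values):
    """Score a bundle as the sum of the values of its attribute levels (assumed additive)."""
    return sum(attribute_values[attr][level] for attr, level in bundle.items())


# Three hypothetical cars, each described purely by its attribute levels.
cars = {
    "car A": {"type": "Sedan", "doors": 4, "brand": "BMW"},
    "car B": {"type": "SUV", "doors": 2, "brand": "Ford"},
    "car C": {"type": "Sedan", "doors": 4, "brand": "Toyota"},
}

# Rank the cars from most to least appealing for this consumer.
ranking = sorted(cars, key=lambda name: bundle_value(cars[name], ATTRIBUTE_VALUES), reverse=True)
for name in ranking:
    print(name, bundle_value(cars[name], ATTRIBUTE_VALUES))
```

In practice the direction is reversed: the researcher observes rankings or ratings of bundles and works backward to estimate the attribute-level values.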

Uses of Conjoint Analysis: New Products
• Estimate market share of brands that differ in attribute levels

Uses of Conjoint Analysis: Pricing/Valuation
• Use information about customers’ valuation of attributes to guide pricing strategy for a product line

Uses of Conjoint Analysis: Brand Equity
• Brand name equity
• How much is a brand really worth?

Assumption of Part-Worths
• Total utility = sum of utilities of each attribute
  U(phone 1) = u(motorola) + u(pink) + u($149) + u(flip format) + …
  U(phone 2) = u(motorola) + u(grey) + u($149) + u(flip format) + …
  U(phone 3) = u(nokia) + u(black) + u($129) + u(candy bar format) + …

Example: New Job
• Attributes: Salary ($100K or $150K) and Location

Example: New Job
Attribute   Level           Prospective Employee 1
City        New York        0.0 (w11)
City        San Francisco   0.75 (w12)
Salary      $100,000        0.0 (w21)
Salary      $150,000        0.25 (w22)

Now we can rank jobs for this person:
U(NY,$100K) = 0
U(NY,$150K) = 0.25
U(SF,$100K) = 0.75
U(SF,$150K) = 0.25 + 0.75 = 1.0

Example: New Job
Attribute   Level           Prospective Employee 1   Prospective Employee 2
City        New York        0.0 (w11)                0.0 (w11)
City        San Francisco   0.75 (w12)               0.25 (w12)
Salary      $100,000        0.0 (w21)                0 (w21)
Salary      $150,000        0.25 (w22)               0.75 (w22)

Now we can rank jobs for this person (Employee 1):
U(NY,$100K) = 0
U(NY,$150K) = 0.25
U(SF,$100K) = 0.75
U(SF,$150K) = 0.25 + 0.75 = 1.0
and compare it to this person (Employee 2):
U(NY,$100K) = 0
U(NY,$150K) = 0.75
U(SF,$100K) = 0.25
U(SF,$150K) = 0.25 + 0.75 = 1.0

How do we get the part-worths?
• This is very nice but we don’t know consumers’ valuations of attributes…
• …and consumers probably don’t know their own valuations either!
• A solution: Force consumers to rank different bundles of attributes (i.e., “brands”)
[Figure: bundles A, B, C, D, E, F ranked as 1. C, 2. E, 3. A, 4. F, 5. B, 6. D]

Conjoint Analysis: Approaches
• Traditional Conjoint: Have respondents directly rank or rate a series of product profiles
  – Conjoint ≈ Consider Jointly
• Discrete Choice Models (allows for non-choice)
  – Also called “Choice Based Conjoint”
• (from Sawtooth Software’s web-site: http://www.sawtoothsoftware.com/conjoint-analysissoftware)

Standard Conjoint Analysis: Process
• Develop the set of attributes
• Select the levels of each attribute
• Obtain an evaluation (rating or ranking) of the product profiles from respondents
• Estimate the part-worth values for each level of each attribute
• Compute importance weights for each attribute (normalized range)
• Aggregate results across consumers
• Evaluate the tradeoffs among attributes
• Run market simulations
• Evaluate accuracy of results

From Preference to Choice
• Conjoint model predicts utility, not choice
• Utility is a continuous, relative measure of preference for each alternative. Choice is a discrete outcome.
• Need a rule to translate preferences to choices:
  – First choice rule: Respondent chooses the profile with the highest predicted utility score
  – Share of preference rule: Predictions of choice probabilities sum to 1 over the set of stimuli tested.
• First choice rule is usually more appropriate for sporadic, non-routine purchases.
• However, both rules are ad hoc

Conjoint Pluses and Minuses
• When to use CA:
  – Can the product be seen as a bundle of attributes?
    • Avoid using CA for “image” products
  – Are the respondents familiar with the category?
    • Avoid using CA for new-to-the-world products
  – Must know relevant attributes (exploratory research)
• Warnings
  – CA will not indicate the absence of an important attribute
  – Attributes should be actionable to the firm
  – Interpolation between attribute levels ok – but do not extend beyond the range selected

Key points
• Conjoint is a very popular and frequently useful tool for identifying the underlying utilities of consumers.
• It details the relative value of product attributes and guides product development and competitive pricing.
• Nevertheless, its application is deeply contingent on both the consumer and the product category.

Summary
• There are a number of useful statistical techniques that can help you understand your data
  – T-tests, crosstabs, and regression are basic tools that can make comparisons and show relationships between marketing variables
  – Cluster, factor, and conjoint analysis can help you understand your customers’ traits and preferences
• These tools are only effective if you have good research design to start