Regression
Marketing Analytics
Rajkumar Venkatesan

Conservatism in Major League Baseball
• Batting Average = Hits/(Opportunities – Walks)
• OnBase% = (Hits + Walks)/Opportunities
• OVERUSED: “small ball”
  – Sacrifice bunt
    • Give up an out to advance the runner
  – Stealing bases
    • Risk an out to advance the runner
• UNDERUSED
  – Don’t risk making outs and runs will take care of themselves.

Diagnosing Market Response: Regression Analysis
[Figure: $ spent by a customer (vertical axis) vs. number of promotions (horizontal axis)]

Example: Shopper Card Program
Units purchased = a + b1*price paid + b2*feature ad + b3*display

Data
Customer   Units Purchased   Price Paid
1          2                 1.99
1          1                 1.99
1          2                 1.99
2          1                 2.29
2          1                 2.29
2          5                 1.00
2          1                 2.29
2          1                 2.29
2          2                 2.10
3          2                 1.99
3          2                 2.10
3          3                 1.00
3          1                 1.99

Feature Ad, Display (code: 1 = yes, 0 = no): 0 0 1 1 1 0 0 0 0 1 1 0 0 0 1 1 1 1 1 1 1 0 1 0 0 0

Example: Regression Output From Excel
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.882814911
R Square            0.779362167
Adjusted R Square   0.705816222
Standard Error      0.62024339
Observations        13

ANOVA
             df   SS            MS            F
Regression   3    12.22999092   4.076663641   10.59694
Residual     9    3.46231677    0.384701863
Total        12   15.69230769

             Coefficients   Standard Error   t Stat   P-value
Intercept    6.23           1.19             5.24     0.00
Price Paid   -2.29          0.51             -4.47    0.00
Feature Ad   0.30           0.39             0.78     0.46
Display      -0.28          0.42             -0.67    0.52

Price Elasticity
Price elasticity of demand (PED) is the ratio of the percentage change in quantity demanded (%∆Q) to the percentage change in price (%∆P).
PED = [Change in Sales/Change in Price] × [Price/Sales] = (∆Q/∆P) × (P/Q)

Belvedere Vodka
Year   Sales (units)   Ln(Sales)   Price (dollars)   Ln(Price)   Advertising (dollars)   Ln(Advertising)
2007   410             6.016       215.44            5.373       20486.1                 9.93
2006   381             5.943       211.45            5.354       2923.5                  7.98
2005   365             5.900       207.45            5.335       4826.3                  8.48
2004   369             5.911       240.87            5.484       13726.6                 9.53
2003   339             5.826       241.33            5.486       10330.2                 9.24
2002   306             5.724       247.55            5.512       13473.6                 9.51
2001   273             5.609       240.48            5.483       9264.6                  9.13

Belvedere Price Elasticity
[Figure: Ln(Sales) vs. Ln(Price), with linear trend line]

Regression Statistics
Multiple R          0.67536
R Square            0.45611
Adjusted R Square   0.34733
Observations        7

             Coefficients   Standard Error   t Stat   P-value
Intercept    12.686         3.340            3.798    0.013
Ln(Price)    −1.259         0.615            −2.048   0.096

Belvedere Advertising Elasticity
[Figure: Ln(Sales) vs. Ln(Advertising), with linear trend line]

Regression Statistics
Multiple R          0.06102
R Square            0.00372
Adjusted R Square   −0.19553
Standard Error      0.15252
Observations        7

                  Coefficients   Standard Error   t Stat   P-value
Intercept         5.963          0.850            7.018    0.001
Ln(Advertising)   −0.013         0.093            −0.137   0.897

Customer Retention: Logistic Regression
• Linear regression assumes the dependent variable (DV) is continuous (and normally distributed)
[Figure: Profits axis running from − through 0 to +]
• Often the dependent variable takes only 2 different values
  • Buy (1) vs. no buy (0)
  • Retain (1) vs. lose customer (0)

Customer Retention: Logistic Regression
• With categorical (1/0) dependent variables, linear regression can produce nonsensical estimated probabilities (e.g. probability of retention > 100%)
• A model that keeps predicted probabilities in a valid range is the so-called “logistic regression”

$$\text{Prob(Retention)} = \frac{e^{a + b_1 X}}{1 + e^{a + b_1 X}}$$

  – Predictions are bounded between 0 and 1

[Figure: logistic curve, Prob(retention) plotted against x from −6 to 6]

Logistic Regression: The Connection to Bookies

$$\text{Prob(Retention)} = \frac{e^{a + b_1 X}}{1 + e^{a + b_1 X}}$$

$$\frac{P}{1 - P} = e^{a + b_1 X}$$

where P is the probability of retention. The ratio P/(1 − P) is called the “odds”: the chance of retention relative to the chance of churn.

Super Bowl 2012 Odds
Green Bay Packers      3.45 to 1
New England Patriots   4.4 to 1
New Orleans Saints     8.5 to 1
Baltimore Ravens       9.5 to 1
San Diego Chargers     10.5 to 1
Detroit Lions          13 to 1
Houston Texans         17.5 to 1
Pittsburgh Steelers    20 to 1

What Are Odds?
• If you chose a random day of the week (7 days), the odds that you would choose a Sunday would be:
  – (1/7)/[1 − (1/7)] = 1/6, not 1/7.
• The odds against you choosing Sunday are 6/1 = 6, meaning that it is 6 times more likely that you don't choose Sunday.
• Generally, 'odds' are not quoted to the general public in this format, because they are easily confused with the chance of an event expressed fractionally as a probability.
• A bookmaker may (for his own purposes) use 'odds' of 'one-sixth', but the overwhelming everyday use is odds of the form 6 to 1, 6-1, or 6/1 (all read as 'six-to-one'), where the first figure is the number of ways of failing to achieve the outcome and the second figure is the number of ways of achieving a favorable outcome: these are "odds against".
• An event with m to n "odds against" has probability n/(m + n), while an event with m to n "odds on" has probability m/(m + n).
Source: http://en.wikipedia.org/wiki/Odds

Example: Will a Physician Prescribe a Drug?
[Figure: data]

Model

$$\text{Prob(prescribe)} = \frac{e^{a + b \cdot \text{Sales Calls}}}{1 + e^{a + b \cdot \text{Sales Calls}}}$$

Example: XLStat Output
Summary statistics:
Variable   Categories   Frequencies   %
nrx_ind    0            1128          44.183
           1            1425          55.817

Variable      Observations   Obs. with missing data   Obs. without missing data   Minimum   Maximum   Mean    Std. deviation
sales calls   2553           0                        2553                        0.000     12.000    2.396   2.128

Goodness of fit statistics (Variable nrx_ind):
Statistic            Independent   Full
Observations         2553          2553
Sum of weights       2553.000      2553.000
DF                   2552          2551
-2 Log(Likelihood)   3504.580      3216.666
R²(McFadden)         0.000         0.082
R²(Cox and Snell)    0.000         0.107
R²(Nagelkerke)       0.000         0.000
AIC                  3508.580      3220.666
SBC                  3520.270      3232.356
Iterations           0             6

Logistic Regression: Coefficients
• Key difference: coefficients cannot be interpreted directly as changes in the DV, as they are in linear regression
• Need to calculate the “odds ratio”
  – For example, if the logit regression coefficient b = 2.303, then the odds ratio is $e^{b} = e^{2.303} = 10$
  – When the IV increases by one unit, the odds that the DV = 1 increase by a factor of 10, when other variables are controlled.

Example: XLStat Output
Source        Value    Standard error   Wald Chi-Square   Pr > Chi²
Intercept     -0.575   0.064            79.883            < 0.0001
sales calls   0.361    0.023            235.781           < 0.0001

What is the Odds Ratio for Sales Calls?
– Caution: an odds ratio close to one does NOT by itself mean the coefficient is insignificant; it means a one-unit change in the IV leaves the odds nearly unchanged. (Odds of one, not an odds ratio of one, correspond to a 50/50 chance of the outcome.)

Example: Physicians Prescriptions
Intercept (a)                    -0.575
Coefficient of Sales Calls (b)   0.361
exp(b)                           1.435

                              Sales Calls = 0        Sales Calls = 1
exp(a + bx)                   0.56                   0.81
Probability of prescription   0.360                  0.447
Odds                          0.36/(1-0.36) = 0.56   0.81
Odds ratio                                           1.435
Difference in probability                            0.087

For each additional sales call, the odds of a physician prescribing a drug increase by about 43% (holding everything else constant).
Prob(prescription) when sales calls is zero = exp(-0.575)/[1 + exp(-0.575)] = 0.360
Prob(prescription) when sales calls is one = exp(-0.575 + 0.361)/[1 + exp(-0.575 + 0.361)] = 0.447

Reaction to econometric analysis?

Combined Effect of Age and Online
Average Profit
          Online = 0   Online = 1
Age = 0   91           105
Age = 1   163          189

Diagnosing Customer Profits and Retention: Common Drivers
Goal: To identify key lever(s) that “drive” customer value
Behavioral characteristics
• Purchase volume/quantity
• Frequency of buying
• Length of relationship
• Number of product categories purchased
• Selling costs
• Customer satisfaction
Demographic/firmographic characteristics
• Age, income, gender
• Loyalty program membership
• Firm size
Psychographic characteristics
• Attitudes, values
• Interests
• Activities

Model Building
• Determine properties of the dependent variable
  – Linear, positive (+ve) values, dummy variable
• Select a model that reflects the dependent variable's properties
  – Logistic regression for dummy variables

Model Building
• Include the decision variable of interest among the independent variables
  – Price, advertising, online
• Include common control variables
  – Quality, distribution, demographics, tenure, competition, etc.

Model Building
• Does including a lagged dependent variable lead to a UNIT ROOT?
• If UNIT ROOT, use the difference as the dependent variable
• Are some independent variables correlated at more than 0.8? If so, can we eliminate one of the correlated variables or combine them?

Model Building
• Are some variables Missing at Random (MAR) or are they missing systematically?
• If variables are missing systematically, are there proxies that can replace the missing variables?

Model Building
• Does the model hint at causality or is it a correlational model?
  – Are dependent and independent variables measured at the same time?
  – Are sufficient controls or confounding variables included?
  – Can a reverse causation reasonably exist?
  – Do we need to recommend an experiment?
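
Worked example for the Belvedere Price Elasticity slide. The sketch below fits Ln(Sales) on Ln(Price) by ordinary least squares, so the slope can be read as the price elasticity. The sales and price figures are the ones in the Belvedere Vodka table; the helper name `ols_line` is made up for this illustration, and the result should land close to the slide's −1.259 (small differences come from rounding).

```python
import math

# (year, sales in units, price in dollars) from the Belvedere Vodka table
belvedere = [
    (2007, 410, 215.44),
    (2006, 381, 211.45),
    (2005, 365, 207.45),
    (2004, 369, 240.87),
    (2003, 339, 241.33),
    (2002, 306, 247.55),
    (2001, 273, 240.48),
]

def ols_line(x, y):
    """One-variable least squares; returns (intercept, slope). Illustrative helper."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

ln_price = [math.log(price) for _, _, price in belvedere]
ln_sales = [math.log(sales) for _, sales, _ in belvedere]

# In a log-log model the slope is the elasticity: % change in sales per 1% change in price
intercept, elasticity = ols_line(ln_price, ln_sales)
print(f"intercept = {intercept:.3f}, price elasticity = {elasticity:.3f}")
```

An elasticity below −1 means a 1% price increase is associated with a more than 1% drop in unit sales, although with 7 observations and a p-value of 0.096 the estimate is imprecise.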
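
Worked example for the Physicians Prescriptions slide. This is a minimal sketch that plugs the XLStat estimates (a = -0.575, b = 0.361) into the logistic formula and recomputes the probabilities, odds, and odds ratio shown in the table. The function names are illustrative and are not part of XLStat.

```python
import math

A = -0.575  # intercept from the XLStat output
B = 0.361   # coefficient on sales calls from the XLStat output

def prob_prescribe(sales_calls, a=A, b=B):
    """Logistic probability: exp(a + b*x) / (1 + exp(a + b*x))."""
    z = a + b * sales_calls
    return math.exp(z) / (1.0 + math.exp(z))

def odds(p):
    """Odds in favor: chance of the event relative to chance of no event."""
    return p / (1.0 - p)

p0 = prob_prescribe(0)
p1 = prob_prescribe(1)
odds_ratio = odds(p1) / odds(p0)  # algebraically equal to exp(B)

print(f"P(prescribe | 0 calls) = {p0:.3f}")
print(f"P(prescribe | 1 call)  = {p1:.3f}")
print(f"odds ratio = {odds_ratio:.3f}, difference in probability = {p1 - p0:.3f}")
```

The odds ratio is the same for every one-call increase, while the change in probability depends on the starting number of calls, which is why the slide reports both.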
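
Worked example for the odds slides. The sketch below converts between bookmaker-style "m to n" odds and probabilities using the rules quoted from the odds definition (n/(m + n) for odds against, m/(m + n) for odds on). The function names are hypothetical, and reading the Super Bowl figures as "odds against" is an assumption based on how such odds are usually quoted.

```python
def prob_from_odds_against(m, n=1.0):
    """m to n odds against an event -> probability n / (m + n)."""
    return n / (m + n)

def prob_from_odds_on(m, n=1.0):
    """m to n odds on an event -> probability m / (m + n)."""
    return m / (m + n)

def odds_in_favor(p):
    """Probability -> odds in favor, P / (1 - P)."""
    return p / (1.0 - p)

# Day-of-week example: 6 to 1 against choosing Sunday
p_sunday = prob_from_odds_against(6, 1)
print(f"P(Sunday) = {p_sunday:.4f}, odds in favor = {odds_in_favor(p_sunday):.4f}")

# Green Bay Packers at 3.45 to 1, read as odds against (assumption)
p_packers = prob_from_odds_against(3.45, 1)
print(f"Implied P(Packers win) = {p_packers:.4f}")
```

The same P/(1 − P) quantity is what the logistic model sets equal to exp(a + b1 X), which is the link between bookmaker odds and the regression.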