Regression Basics

Regression
Marketing Analytics
Rajkumar Venkatesan
Conservatism in Major League Baseball
• Batting Average = Hits/(Opportunities − Walks)
• OnBase% = (Hits + Walks)/Opportunities
• OVERUSED: “small ball”
– Sacrifice bunt
• Give up an out to advance the runner
– Stealing bases
• Risk an out to advance the runner
• UNDERUSED
– Don’t risk making outs, and runs
will take care of themselves.
Diagnosing Market Response:
Regression Analysis
[Figure: $ spent by a customer vs. number of promotions]
Example: Shopper Card Program
Units purchased = a+b1*price paid + b2*feature ad + b3*display
Customer  Units Purchased  Price Paid
1         2                1.99
1         1                1.99
1         2                1.99
2         1                2.29
2         1                2.29
2         5                1.00
2         1                2.29
2         1                2.29
2         2                2.10
3         2                1.99
3         2                2.10
3         3                1.00
3         1                1.99
Feature Ad Display
0
0
1
1
1
0
0
0
0
1
1
0
0
0
1
1
1
1
1
1
1
0
1
0
0
0
CODE: 1 = YES, 0 = NO
Example: Regression Output From Excel
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.882814911
R Square            0.779362167
Adjusted R Square   0.705816222
Standard Error      0.62024339
Observations        13

ANOVA
             df   SS            MS            F
Regression   3    12.22999092   4.076663641   10.59694
Residual     9    3.46231677    0.384701863
Total        12   15.69230769

             Coefficients   Standard Error   t Stat   P-value
Intercept    6.23           1.19             5.24     0.00
Price Paid   -2.29          0.51             -4.47    0.00
Feature Ad   0.30           0.39             0.78     0.46
Display      -0.28          0.42             -0.67    0.52
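The summary statistics above are tied to the ANOVA table by standard formulas, which makes the output easy to sanity-check. The Python sketch below is our own illustration, not how Excel computes them (`regression_summary` is a hypothetical helper name). It recomputes Multiple R, R Square, Adjusted R Square, Standard Error and F from the sums of squares, and t Stat as coefficient divided by standard error, assuming n = 13 observations and k = 3 predictors:

```python
import math

# ANOVA figures copied from the Excel SUMMARY OUTPUT
ss_regression = 12.22999092
ss_residual = 3.46231677
n_obs = 13         # Observations
k_predictors = 3   # price paid, feature ad, display

def regression_summary(ss_reg, ss_res, n, k):
    """Recompute Excel-style summary statistics from the ANOVA sums of squares."""
    ss_total = ss_reg + ss_res
    df_reg, df_res = k, n - k - 1
    r_square = ss_reg / ss_total
    adj_r_square = 1 - (1 - r_square) * (n - 1) / df_res
    ms_reg = ss_reg / df_reg
    ms_res = ss_res / df_res
    return {
        "Multiple R": math.sqrt(r_square),
        "R Square": r_square,
        "Adjusted R Square": adj_r_square,
        "Standard Error": math.sqrt(ms_res),  # root of the residual mean square
        "F": ms_reg / ms_res,
    }

summary = regression_summary(ss_regression, ss_residual, n_obs, k_predictors)
for name, value in summary.items():
    print(f"{name:18s} {value:.4f}")

# t Stat = coefficient / standard error (rounded values from the output table)
coefficients = {
    "Intercept": (6.23, 1.19),
    "Price Paid": (-2.29, 0.51),
    "Feature Ad": (0.30, 0.39),
}
t_stats = {name: b / se for name, (b, se) in coefficients.items()}
print(t_stats)
```

Because the table's coefficients and standard errors are rounded to two decimals, the recomputed t Stats match Excel's only to about the second decimal.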
Price Elasticity
Price elasticity is the ratio of the percentage change in quantity demanded (%∆Q) to the percentage change in price (%∆P).
PED = [Change in Sales/Change in Price] × [Price/Sales] = (∆Q/∆P) × (P/Q)
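Here is a worked example of the formula as a minimal Python sketch. The $10 to $11 price change and the 100 to 95 unit sales are made-up numbers. The second part applies the same formula to the shopper card data. It assumes that in the linear model ∆Q/∆P is the Price Paid coefficient (−2.29, holding feature ad and display fixed) and chooses to evaluate the elasticity at the sample means (`point_elasticity` is our own helper name):

```python
def point_elasticity(dq_dp, price, quantity):
    """Price elasticity of demand: (dQ/dP) * (P/Q)."""
    return dq_dp * price / quantity

# Hypothetical numbers: price rises from $10 to $11, sales fall from 100 to 95 units
p0, p1 = 10.0, 11.0
q0, q1 = 100.0, 95.0

pct_change_q = (q1 - q0) / q0   # -5%
pct_change_p = (p1 - p0) / p0   # +10%
ped_from_percentages = pct_change_q / pct_change_p
ped_from_slope = point_elasticity((q1 - q0) / (p1 - p0), p0, q0)
print(ped_from_percentages, ped_from_slope)  # both about -0.5

# Shopper card data: in a linear model dQ/dP is the price coefficient (-2.29),
# so the elasticity depends on where it is evaluated; here, at the sample means.
units = [2, 1, 2, 1, 1, 5, 1, 1, 2, 2, 2, 3, 1]
prices = [1.99, 1.99, 1.99, 2.29, 2.29, 1.00, 2.29, 2.29, 2.10, 1.99, 2.10, 1.00, 1.99]
mean_price = sum(prices) / len(prices)
mean_units = sum(units) / len(units)
ped_at_means = point_elasticity(-2.29, mean_price, mean_units)
print(round(ped_at_means, 2))
```

In a linear model the elasticity changes along the demand curve, so the at-the-means value (about −2.4 here) is only one summary. A log-log model, as in the Belvedere example, yields a constant elasticity instead.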
Belvedere Vodka
Year  Sales (units)  Ln(Sales)  Price (dollars)  Ln(Price)  Advertising (dollars)  Ln(Advertising)
2007  410            6.016      215.44           5.373      20486.1                9.93
2006  381            5.943      211.45           5.354      2923.5                 7.98
2005  365            5.900      207.45           5.335      4826.3                 8.48
2004  369            5.911      240.87           5.484      13726.6                9.53
2003  339            5.826      241.33           5.486      10330.2                9.24
2002  306            5.724      247.55           5.512      13473.6                9.51
2001  273            5.609      240.48           5.483      9264.6                 9.13
Belvedere Price Elasticity
Regression Statistics
Multiple R          0.67536
R Square            0.45611
Adjusted R Square   0.34733
Observations        7

             Coefficients   Standard Error   t Stat   P-value
Intercept    12.686         3.340            3.798    0.013
Ln (Price)   −1.259         0.615            −2.048   0.096

[Figure: Ln (Sales) plotted against Ln (Price), with a linear trend line]
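Why the Ln (Price) coefficient can be read directly as a price elasticity: in a log-log model the slope is the derivative of ln Q with respect to ln P, which is exactly the PED formula. A short derivation, writing Q for unit sales and P for price:

```latex
\ln Q = a + b \ln P
\quad\Rightarrow\quad
b = \frac{d \ln Q}{d \ln P} = \frac{dQ / Q}{dP / P} = \frac{dQ}{dP} \times \frac{P}{Q} = \text{PED}
```

With b = −1.259, a 1% price increase is associated with approximately a 1.26% decrease in unit sales, holding everything else constant.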
Belvedere Advertising Elasticity
Regression Statistics
Multiple R          0.06102
R Square            0.00372
Adjusted R Square   −0.19553
Standard Error      0.15252
Observations        7

                   Coefficients   Standard Error   t Stat   P-value
Intercept          5.963          0.850            7.018    0.001
Ln (Advertising)   −0.013         0.093            −0.137   0.897

[Figure: Ln (Sales) plotted against Ln (Advertising), with a linear trend line]
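Both Belvedere regressions can be reproduced with ordinary least squares on the logged columns of the data table. Below is a minimal Python sketch using only the standard library (`simple_ols` is our own helper, not a library function). It takes logs of the raw columns rather than the rounded 3-decimal values. It should recover coefficients close to the Excel output: about −1.259 for Ln (Price) and about −0.013 for Ln (Advertising).

```python
import math

# Belvedere Vodka data from the table (2007 back to 2001)
sales = [410, 381, 365, 369, 339, 306, 273]
price = [215.44, 211.45, 207.45, 240.87, 241.33, 247.55, 240.48]
advertising = [20486.1, 2923.5, 4826.3, 13726.6, 10330.2, 13473.6, 9264.6]

def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope, r_square)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    r_square = s_xy ** 2 / (s_xx * s_yy)
    return intercept, slope, r_square

ln_sales = [math.log(q) for q in sales]
# In a log-log model the slope is the (constant) elasticity
price_fit = simple_ols([math.log(p) for p in price], ln_sales)
adv_fit = simple_ols([math.log(a) for a in advertising], ln_sales)
print("price:       intercept %.3f, elasticity %.3f, R^2 %.5f" % price_fit)
print("advertising: intercept %.3f, elasticity %.3f, R^2 %.5f" % adv_fit)
```

With only seven annual observations, neither elasticity is estimated precisely, which is consistent with the large P-values in the output tables.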
Customer Retention: Logistic Regression
• Linear regression assumes the dependent variable (DV) is
continuous (with normally distributed errors)
[Figure: a continuous DV such as profits, ranging from negative to positive]
• Often the DV takes only 2 different
values
[Figure: a DV taking the values 0 and 1]
• Buy (1) vs. no buy (0)
• Retain (1) vs. lose customer (0)
• With categorical (1/0) dependent variables, linear regression
can produce nonsensical estimated probabilities (e.g.,
probability of retention > 100%)
• A model that keeps the estimated probabilities within [0, 1] is the so-called “logistic
regression”:
$$\text{Prob(Retention)} = \frac{e^{a + b_1 X}}{1 + e^{a + b_1 X}}$$
– Predictions are bounded within [0, 1]
[Figure: logistic Prob(retention) plotted against x, for x from −6 to 6]
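To make the contrast concrete, here is a small Python sketch comparing an unbounded straight-line “probability” with the logistic formula for x from −6 to 6. Both parameter sets are hypothetical and chosen only for illustration: a = 0, b1 = 1 for the logistic curve, and intercept 0.5, slope 0.1 for the line.

```python
import math

def logistic_prob(a, b1, x):
    """Prob = e^(a + b1*x) / (1 + e^(a + b1*x)); always strictly between 0 and 1."""
    z = a + b1 * x
    return math.exp(z) / (1.0 + math.exp(z))

def linear_prob(a, b1, x):
    """A straight-line 'probability', which has no bounds."""
    return a + b1 * x

xs = range(-6, 7)
# Hypothetical parameters, for illustration only
logistic_curve = [logistic_prob(0.0, 1.0, x) for x in xs]
linear_line = [linear_prob(0.5, 0.1, x) for x in xs]
for x, p_log, p_lin in zip(xs, logistic_curve, linear_line):
    print(f"x={x:+d}  logistic={p_log:.3f}  linear={p_lin:.2f}")
```

The linear column drops below 0 at x = −6 and exceeds 1 at x = 6, while the logistic column flattens out near 0 and 1.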
Logistic Regression:
The Connection to Bookies
$$\text{Prob(Retention)} = \frac{e^{a + b_1 X}}{1 + e^{a + b_1 X}} \quad\Rightarrow\quad \frac{P}{1 - P} = e^{a + b_1 X}$$
where P = the probability of retention.
P/(1 − P), the chance of retention relative to the chance of churn, is called the “odds”.
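The odds expression follows in two lines from the logistic model. Write z = a + b_1 X:

```latex
P = \frac{e^{z}}{1 + e^{z}}, \qquad 1 - P = \frac{1}{1 + e^{z}}
\quad\Rightarrow\quad
\frac{P}{1 - P} = e^{z} = e^{a + b_1 X}
\quad\Rightarrow\quad
\ln\!\left(\frac{P}{1 - P}\right) = a + b_1 X
```

So the logistic model is linear in the log of the odds (the “logit”).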
Super Bowl 2012 Odds
Green Bay Packers      3.45 to 1
New England Patriots   4.4 to 1
New Orleans Saints     8.5 to 1
Baltimore Ravens       9.5 to 1
San Diego Chargers     10.5 to 1
Detroit Lions          13 to 1
Houston Texans         17.5 to 1
Pittsburgh Steelers    20 to 1
What Are Odds?
• If you chose a random day of the week (7 days), the odds of choosing a Sunday would be:
– (1/7)/[1 − (1/7)] = 1/6, not 1/7 (1/7 is the probability).
• The odds against choosing Sunday are 6/1 = 6, meaning that it is 6 times more likely that you don't choose Sunday.
• Generally, 'odds' are not quoted to the general public in this format because of the natural confusion with the chance of an event occurring being expressed fractionally as a probability.
• A bookmaker may (for his own purposes) use 'odds' of 'one-sixth', but the overwhelming everyday use by most people is odds of the form 6 to 1, 6-1, or 6/1 (all read as 'six-to-one'), where the first figure represents the number of ways of failing to achieve the outcome and the second figure is the number of ways of achieving a favorable outcome: thus these are "odds against".
• An event with m to n "odds against" would have probability n/(m + n), while an event with m to n "odds on" would have probability m/(m + n).
Source: http://en.wikipedia.org/wiki/Odds
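The conversions on this slide can be sketched in Python as follows. `odds_for` and `prob_from_odds_against` are our own helper names. The Super Bowl figures are read as “m to 1 odds against”, and the implied probabilities ignore any margin the bookmaker builds into quoted odds.

```python
from fractions import Fraction

def odds_for(p):
    """Odds in favour of an event with probability p: p / (1 - p)."""
    return p / (1 - p)

def prob_from_odds_against(m, n):
    """'m to n' odds against an event imply probability n / (m + n)."""
    return n / (m + n)

# Day-of-week example, using exact fractions
p_sunday = Fraction(1, 7)
print(odds_for(p_sunday))            # 1/6
print(prob_from_odds_against(6, 1))  # 6-to-1 against: probability 1/7

# Some Super Bowl 2012 odds, read as "m to 1" odds against
super_bowl_odds = {
    "Green Bay Packers": 3.45,
    "New England Patriots": 4.4,
    "Pittsburgh Steelers": 20,
}
implied = {team: prob_from_odds_against(m, 1) for team, m in super_bowl_odds.items()}
for team, p in implied.items():
    print(f"{team:22s} {p:.3f}")
```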
Example:
Will a Physician Prescribe a Drug?
Data
Model
$$\text{Prob(prescribe)} = \frac{e^{a + b \cdot \text{Sales Calls}}}{1 + e^{a + b \cdot \text{Sales Calls}}}$$
Example: XLStat Output
Summary statistics:

Variable   Categories   Frequencies   %
nrx_ind    0            1128          44.183
           1            1425          55.817

Variable                    sales calls
Observations                2553
Obs. with missing data      0
Obs. without missing data   2553
Minimum                     0.000
Maximum                     12.000
Mean                        2.396
Std. deviation              2.128

Goodness of fit statistics (Variable nrx_ind):
Statistic            Independent   Full
Observations         2553          2553
Sum of weights       2553.000      2553.000
DF                   2552          2551
-2 Log(Likelihood)   3504.580      3216.666
R²(McFadden)         0.000         0.082
R²(Cox and Snell)    0.000         0.107
R²(Nagelkerke)       0.000         0.000
AIC                  3508.580      3220.666
SBC                  3520.270      3232.356
Iterations           0             6
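Several of the fit statistics can be checked by hand from the category frequencies and −2 Log(Likelihood). The Python sketch below uses the textbook definitions: an intercept-only model whose fitted probability is the sample share of 1s, McFadden R² = 1 − LL_full/LL_null, and Cox and Snell R² = 1 − exp(−LR/n). XLStat's internal computations may differ in detail.

```python
import math

n = 2553
n_prescribe, n_no = 1425, 1128   # nrx_ind = 1 and nrx_ind = 0
minus2ll_full = 3216.666         # -2 Log(Likelihood), Full model

# Intercept-only ("Independent") model: fitted probability = share of 1s
p_hat = n_prescribe / n
minus2ll_null = -2 * (n_prescribe * math.log(p_hat) + n_no * math.log(1 - p_hat))

# Likelihood-ratio statistic for adding sales calls (1 degree of freedom)
lr_chi2 = minus2ll_null - minus2ll_full

# Pseudo R-squared measures
mcfadden = 1 - minus2ll_full / minus2ll_null
cox_snell = 1 - math.exp(-lr_chi2 / n)

print(f"-2LL (intercept only) = {minus2ll_null:.3f}")
print(f"LR chi-square = {lr_chi2:.3f}")
print(f"McFadden R2 = {mcfadden:.3f}, Cox and Snell R2 = {cox_snell:.3f}")
```

The likelihood-ratio chi-square of roughly 288 on 1 degree of freedom indicates that adding sales calls improves the fit substantially, even though the pseudo R² values are modest.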
Logistic Regression: Coefficients
• Key difference: the coefficients are not interpreted directly as changes in the probability
• Need to calculate the “odds ratio”
– For example, if the logit regression coefficient b = 2.303,
then the odds ratio is e^b = e^2.303 ≈ 10
– ⇒ when the IV increases by one unit, the odds that DV =
1 increase by a factor of 10, holding the other variables constant.
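Why e^b is the odds ratio: with odds(X) = e^{a + bX}, compare the odds at X + 1 with the odds at X:

```latex
\frac{\text{odds}(X + 1)}{\text{odds}(X)} = \frac{e^{a + b(X + 1)}}{e^{a + bX}} = e^{b},
\qquad e^{2.303} \approx 10
```

The ratio does not depend on X, which is why a single odds ratio summarizes the coefficient. The change in probability, by contrast, does depend on X.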
Example: XLStat Output
Source        Value    Standard error   Wald Chi-Square   Pr > Chi²
Intercept     -0.575   0.064            79.883            < 0.0001
sales calls   0.361    0.023            235.781           < 0.0001
What is the Odds Ratio for Sales Calls?
– Caution: an odds ratio close to one does NOT by itself mean that the coefficient is
insignificant (significance depends on the standard error); it means a one-unit change in the IV
changes the odds only slightly. Odds of one, not an odds ratio of one, correspond to a 50/50 chance of the outcome.
Example: Physicians Prescriptions
Intercept (a)                    -0.575
Coefficient of Sales Calls (b)   0.361
exp(b)                           1.435

                              Sales Calls = 0            Sales Calls = 1
exp(a + bx)                   0.56                       0.81
Probability of prescription   0.360                      0.447
Odds                          0.56 = 0.36/(1 − 0.36)     0.81
Odds ratio                    1.435
Difference in probability     0.087

For each additional sales call, the odds of a physician prescribing the drug
increase by about 43% (holding everything else constant).
Prob(prescription) when sales calls = 0
= exp(−0.575)/[1 + exp(−0.575)]
Prob(prescription) when sales calls = 1
= exp(−0.575 + 0.361)/[1 + exp(−0.575 + 0.361)]
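The calculations on this slide can be reproduced with a few lines of Python. This sketch uses only the two XLStat estimates (a = −0.575, b = 0.361); `prob_prescribe` and `odds` are our own helper names.

```python
import math

a, b = -0.575, 0.361  # Intercept and sales-calls coefficient from the XLStat output

def prob_prescribe(calls):
    """Logistic model: Prob(prescribe) = e^(a + b*calls) / (1 + e^(a + b*calls))."""
    z = a + b * calls
    return math.exp(z) / (1 + math.exp(z))

def odds(p):
    """Odds = p / (1 - p)."""
    return p / (1 - p)

p0, p1 = prob_prescribe(0), prob_prescribe(1)
odds_ratio = odds(p1) / odds(p0)  # equals exp(b) for any starting number of calls

print(f"P(prescribe | 0 calls) = {p0:.3f}")
print(f"P(prescribe | 1 call)  = {p1:.3f}")
print(f"difference in probability = {p1 - p0:.3f}")
print(f"odds ratio = {odds_ratio:.3f} (exp(b) = {math.exp(b):.3f})")
```

The odds ratio is the same for any starting number of calls, but the change in probability is not. Moving from 5 to 6 calls, for example, changes the predicted probability by a different amount than moving from 0 to 1.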
Reaction to econometric analysis?
Combined Effect of Age and Online
[Figure: average profit by Online (0/1) and Age (0/1); values shown: 91, 105, 163 and 189]
Diagnosing Customer Profits and
Retention: Common Drivers
Behavioral characteristics
• Purchase volume/quantity
• Frequency of buying
• Length of relationship
• Number of product categories purchased
• Selling costs
• Customer satisfaction
Goal: To identify key lever(s) that “drive” customer value
Demographic/firmographic characteristics
• Age, income, gender
• Loyalty program membership
• Firm size
Psychographic characteristics
• Attitudes, values
• Interests
• Activities
Model Building
• Determine properties of dependent variable
– Continuous (linear), positive values only, dummy (0/1) variable
• Select a model that reflects the dependent
variable's properties
– Logistic regression for dummy variables
Model Building
• Include the decision variable of interest
among the independent variable set
– Price, advertising, online
• Include common control variables
– Quality, Distribution, Demographics, Tenure,
Competition etc.
Model Building
• Does including a lagged dependent variable
lead to a UNIT ROOT?
• If there is a UNIT ROOT, use the first difference as the
dependent variable
• Are some independent variables correlated
at more than 0.8? If so, can we eliminate one
of the correlated variables or combine them?
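The correlation screen and the differencing step can be sketched in Python. The 0.8 threshold comes from the slide; the predictor names and values are hypothetical. This sketch does not test for a unit root itself. A formal test such as the augmented Dickey-Fuller test (for example `statsmodels.tsa.stattools.adfuller`) would be needed for that.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def highly_correlated_pairs(columns, threshold=0.8):
    """Return (name1, name2, r) for every pair of predictors with |r| > threshold."""
    flagged = []
    for (n1, x1), (n2, x2) in combinations(columns.items(), 2):
        r = pearson(x1, x2)
        if abs(r) > threshold:
            flagged.append((n1, n2, r))
    return flagged

def first_difference(series):
    """y_t - y_(t-1): the dependent variable to use if the level series has a unit root."""
    return [curr - prev for prev, curr in zip(series, series[1:])]

# Hypothetical predictors: 'tv_spend' and 'radio_spend' move almost together
predictors = {
    "price": [2.0, 1.9, 2.1, 2.0, 2.2, 1.9],
    "tv_spend": [10, 12, 15, 11, 14, 18],
    "radio_spend": [5, 6, 8, 5, 7, 9],
}
print(highly_correlated_pairs(predictors))
print(first_difference([100, 104, 103, 110]))  # [4, -1, 7]
```

Only the tv_spend/radio_spend pair is flagged here, so one of the two could be dropped or they could be combined into a single media-spend variable.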
Model Building
• Are some variables Missing at Random
(MAR) or are they missing systematically?
• If variables are missing systematically, are
there proxies that can replace the missing
variables?
Model Building
• Does the model hint at causality, or is it a
correlational model?
– Are the dependent and independent variables
measured at the same time?
– Are sufficient controls for confounding
variables included?
– Can reverse causation reasonably exist?
– Do we need to recommend an experiment?