A Fixed Effect Ridge Regression Model with Interaction for

Ridge Regression using PROC REG
A Fixed Effect Model for Determining the Mixture of Acquisition/Subscription Cost
Steven Matthew Anderson
Century Link
Anderson.Research.co.llc@gmail.com
Outline
• A Case Study to Introduce Ridge Regression
– Description of the Business Problem
– Regression Model
– Problems with the Model
• Ridge Regression Model
– Description of the Method
– How Does it Work
• SAS’s PROC REG
– Code
– Output
• Simulation of the Model
• Summary
• Future work
A Case Study to Introduce Ridge Regression
• Terminology
– Fixed Cost
– Variable Cost
– Acquisition Expense
– Subscription Expense
– Mixtures of Acquisition and Subscription Expense
– Side Note: Some Examples of Analysis Using this Cost Structure
• The Business Problem
• The Regression Model
• Problems with the Model
Fixed Cost
• Fixed costs are business expenses that do not change in proportion to the activity of the business (within a relevant time period)
• Discretionary fixed costs
– Arise from annual decisions by management to spend on certain fixed cost items
• Committed fixed costs
– Costs that do not change significantly over time
• Staff Salaries
• Network Management
• Data/IP Strategy
• Sales Force Management
• Most Overhead expense
[Figure: Fixed Cost vs Time. x-axis: Time; y-axis: Expense; annotation: Adjustment]
Variable Cost
• Variable costs are expenses that change in proportion to the activities of the business.
• Semi-variable costs are fixed costs that are adjusted periodically to accommodate changes in business activity.
– Looks like a step function over time
• Semi-variable costs are considered in this study to be variable costs.
• Costs of goods sold
• Commissions
• Sales Headcount (minus commissions)
• Call Center Staffing
• Bad Debt
[Figure: Variable Cost vs Time. x-axis: Time; y-axis: Expense; annotation: Adjustment]
Acquisition Expense
• Can be interpreted as expenses incurred to “Make the Sale.”
• Positively correlated with acquisition activities
– # Sales units (Gross Inwards)
– # Call Center employees
• Marketing incentives
• Sales Headcount
• Installation of Service
• Design Services (WAN)
[Figure: Acquisition Cost vs Sales Units. x-axis: Sales Units (AGI); y-axis: Expense]
Subscription Expense
• Can be interpreted as expenses incurred to “Keep the Customer.”
• Positively correlated with Monthly Subscription Activity
– Monthly Revenue
– # of Revenue Generating Units (RGU)
• Repair of services
• Collections
• Network Monitoring
[Figure: Subscription Cost vs Revenue. x-axis: Revenue; y-axis: Expense]
Mixed Acquisition/Subscription Expense
• Expenses that are positively correlated with both Subscription and Acquisition Activity
• Fleet
• Construction
• Hosting Operations
Financial Analysis Examples using this Cost Structure
• Break Even Analysis
– Used to analyze the potential profitability of an expenditure in a sales-based business
– Need to find the break-even point (point where revenue is equal to expense)
$$\text{BEP} = \frac{\text{Fixed Cost}}{\text{Selling Price} - \text{Variable Cost}}$$
[Figure: break-even chart. Picture stolen from Wikipedia]
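The break-even formula can be checked with a few lines of arithmetic. The sketch below assumes that Selling Price and Variable Cost are per-unit amounts, so that BEP comes out in units sold; the function name and the dollar figures are hypothetical and chosen only for illustration.

```python
def break_even_units(fixed_cost, selling_price, variable_cost):
    """Units that must be sold before revenue equals expense."""
    margin = selling_price - variable_cost  # contribution of each unit sold
    if margin <= 0:
        raise ValueError("selling price must exceed the variable cost per unit")
    return fixed_cost / margin


# Hypothetical figures: 10,000 of fixed cost, 50 price, 30 variable cost per unit.
bep = break_even_units(fixed_cost=10_000.0, selling_price=50.0, variable_cost=30.0)
print(bep)  # 500.0
```

Every unit sold past that point adds the per-unit margin to profit, which is why the split of a cost pool into fixed and variable parts matters for this analysis.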
Financial Analysis Examples using this Cost Structure
• Customer Lifetime Value
– Used in Marketing to determine how much each customer is “worth” over time
– Calculated by:
$$\text{CLV}_k = \sum_{t=0}^{T} \frac{R_t^k - E_t^k}{(1+i)^t} = \left(R_0^k - E_0^k\right) + \sum_{t=1}^{T} \frac{R_t^k - E_t^k}{(1+i)^t} = \text{AcquisitionMargin}_k + \sum_{t=1}^{T} \frac{\text{SubscriptionMargin}_k}{(1+i)^t}$$
– R = Revenue
– E = Expense
Description of the Business Problem
• Given a particular cost pool (i.e. bucket)
– What percentage of the cost pool can be
classified as fixed or variable cost?
– What percentage of the cost pool can be
classified as acquisition or subscription cost?
Regression Model
• $\text{Expense} = \beta_0 + \beta_1 A + \beta_2 S + \beta_3 (AS)$
• Expense = Total expense in cost pool
• A = Acquisition Activity (AGI)
• S = Subscription Activity (RGU)
• (AS) = Cross Product Interaction Term
Regression Model
[Figure: two panels, “100% Subscription Expense” and “100% Acquisition Expense”; axis label on both: Subscription Activity]
Regression Model
Answering the Fixed/Variable Expense Question
Let A = Acquisition Expense and S = Subscription Expense, so that
$$\text{Total Expense} = \beta_0 + \beta_1 A + \beta_2 S + \beta_3 (AS)$$
$$\beta_0 = \text{Average Fixed Expense}$$
$$\text{Variable Expense} = \text{Total Expense} - \beta_0$$
$$\text{Percentage of Variable Expense} = \frac{\text{Variable Expense}}{\text{Total Expense}}$$
$$\text{Percentage of Fixed Expense} = \frac{\text{Fixed Expense}}{\text{Total Expense}} = \frac{\beta_0}{\text{Total Expense}} = 1 - \frac{\text{Variable Expense}}{\text{Total Expense}}$$
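The fixed/variable split follows directly from the intercept. The sketch below treats the fitted intercept as the average fixed expense, as the slide does; the function name and the numbers are hypothetical.

```python
def fixed_variable_split(beta0, total_expense):
    """Return (fixed share, variable share) of a cost pool."""
    fixed_share = beta0 / total_expense
    variable_share = (total_expense - beta0) / total_expense
    return fixed_share, variable_share


# Hypothetical cost pool: intercept of 1,000 against a total expense of 4,000.
fixed, variable = fixed_variable_split(beta0=1000.0, total_expense=4000.0)
print(fixed, variable)  # 0.25 0.75
```

The two shares always sum to 1, so only the intercept and the total expense are needed to answer the question.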
Regression Model
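Answering the Acquisition/Subscription Question
[Figure: triangle relating E (Total Expense) to A (Acquisition) and S (Subscription)]
With E = Total Expense:
$$\frac{A^2}{E^2} + \frac{S^2}{E^2} = 1$$
$$\text{Let } \beta_1 = \frac{A}{E} \Rightarrow A = \beta_1 E, \qquad \beta_2 = \frac{S}{E} \Rightarrow S = \beta_2 E,$$
$$\text{and } E^2 = A^2 + S^2 = (\beta_1 E)^2 + (\beta_2 E)^2$$
$$\Rightarrow E^2 = \beta_1^2 E^2 + \beta_2^2 E^2 \Rightarrow 1 = \beta_1^2 + \beta_2^2$$
$$\Rightarrow 1 = \frac{\beta_1^2}{\beta_1^2 + \beta_2^2} + \frac{\beta_2^2}{\beta_1^2 + \beta_2^2}$$
$$\text{Percentage of Acquisition Cost} = \frac{\beta_1^2}{\beta_1^2 + \beta_2^2}, \qquad \text{Percentage of Subscription Cost} = \frac{\beta_2^2}{\beta_1^2 + \beta_2^2}$$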
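A small sketch of the squared-coefficient split derived on this slide. The function name and the coefficient values are hypothetical; they are picked so that the arithmetic is easy to follow.

```python
def acquisition_subscription_split(beta1, beta2):
    """Shares of acquisition and subscription cost from squared coefficients."""
    total = beta1 ** 2 + beta2 ** 2
    return beta1 ** 2 / total, beta2 ** 2 / total


# Hypothetical coefficients: 3 on acquisition, 4 on subscription.
acq, sub = acquisition_subscription_split(beta1=3.0, beta2=4.0)
print(acq, sub)  # 0.36 0.64
```

Because the coefficients are squared, a negative estimate still produces a share between 0 and 1, which can hide a wrong sign in the fitted model.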
The Results from My Brilliant Model
• Variance Inflation Factors are HUGE!
• None of the parameter estimates are
significant
• When parameter estimates were
significant:
– the confidence intervals around them made
the results useless!
– The signs were often wrong with respect to
reality
The Problem Reading the Log
• Extreme Cases
– SAS Note: Model is not full rank. Least-squares solutions for the
parameters are not unique. Some statistics will be misleading. A
reported DF of 0 or B means that the estimate is biased.
– SAS Note: The following parameters have been set to 0, since
the variables are a linear combination of other variables as
shown.
interaction =-105.877 * Intercept + 13.0209 * ln_agi + 8.13133 * ln_rgu
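An Example
/* OLS fit with collinearity diagnostics */
ods graphics on;
proc reg data=sim_data outvif outest=bob;
   /* TOL and VIF print tolerance and variance inflation; COLLIN prints the collinearity diagnostics */
   model total_expense = A S Interaction / tol vif collin;
run;
quit;   /* PROC REG is interactive; QUIT ends the step */
/* Print the estimates saved by OUTEST= */
proc print data=bob;
run;
ods graphics off;

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3         43231154      14410385     74.77   <.0001
Error             46          8865802        192735
Corrected Total   49         52096956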
Parameter Estimates
Variable      DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Tolerance   Variance Inflation
Intercept      1                14672            20592      0.71     0.4798           .                    0
A              1             -4.55289          8.23521     -0.55     0.5830     0.00192            521.23743
S              1             -2.08466          4.09754     -0.51     0.6134     0.00330            302.85512
interaction    1              0.00176          0.00164      1.07     0.2898     0.00128            784.02140
Collinearity Diagnostics
                                          Proportion of Variation
Number   Eigenvalue   Condition Index   Intercept     A             S             interaction
1        3.99240            1.00000     5.692765E-7   5.758341E-7   5.725649E-7   5.794308E-7
2        0.00482           28.76909     0.00039475    0.00055150    0.00055094    0.00040402
3        0.00277           37.95309     0.00094557    0.00070160    0.00068843    0.00097339
4        0.00000230      1318.75978     0.99866       0.99875       0.99876       0.99862
So What Happened?
$$(XB)^T (Y - XB^*) = 0$$
$$B^T X^T (Y - XB^*) = 0 \quad \because (AB)^T = B^T A^T$$
$$B^T (X^T Y - X^T X B^*) = 0 \quad \because \text{distribution}$$
$$(X^T Y - X^T X B^*) \cdot B = 0 \quad \because A^T B = B \cdot A$$
$$\therefore (X^T X) B = X^T Y$$
If $X^T X$ is invertible, then B has a unique solution B = B*.
$$\therefore B^* = (X^T X)^{-1} X^T Y$$
Basically, for $X^T X$ to be invertible each column must be a pivot column. If design matrix X has one or more variables that are linear combinations of the other variables, then when you row reduce $X^T X$ you are going to get at least one row that has a bunch of zeros in it, and at least one of your columns isn’t going to be a pivot column. Ergo, you do not have a unique solution!
Near multicollinearity means that at least one column is approximately a linear combination of some or all of the others, making $X^T X$ near singular.
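The pivot-column argument can be seen on a toy design matrix. The sketch below uses two hypothetical columns with no intercept, which is an assumption made to keep the matrix 2 by 2; when one column is an exact multiple of the other the determinant of X^T X is zero, and when it is only almost a multiple the determinant is small compared with the entries.

```python
def gram(columns):
    """Return X^T X for a design matrix given as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]


def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


A = [1, 2, 3, 4]
S_exact = [2 * a for a in A]  # exact linear combination of A
S_near = [2, 4, 6, 9]         # almost, but not exactly, 2 * A

print(det2(gram([A, S_exact])))  # 0  -> X^T X is singular, no unique solution
print(det2(gram([A, S_near])))   # 14 -> invertible, but tiny next to entries like 137
```

A determinant that small relative to the matrix entries is what drives the huge variance inflation factors in the PROC REG output.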
(Enter stage left) Ridge Regression
• Modify Least Squares Regression to allow biased estimators of the regression coefficients.
• Bias versus precision trade off
[Figure: bias versus precision; labels: E(b), E(bR), Bias of bR]
$$B = (X^T X)^{-1} X^T Y$$
is modified to move $X^T X$ away from near singularity and closer to the state of orthogonality among the columns
$$B_R = (X^T X + k I_{m+1})^{-1} X^T Y$$
Where k ≥ 0 and is known as the biasing or shrinkage parameter
We introduce bias by uniformly increasing the diagonal elements and leave the off-diagonal elements invariant
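A minimal sketch of the ridge estimator on a toy problem, assuming two perfectly collinear columns and no intercept so that the algebra stays 2 by 2. It applies the formula to the raw columns and makes no claim about how PROC REG scales the data internally; the function names and data are hypothetical.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def ridge_two_columns(xtx, xty, k):
    """Solve (X^T X + k I) b = X^T Y for a design with two columns."""
    a, b = xtx[0][0] + k, xtx[0][1]
    c, d = xtx[1][0], xtx[1][1] + k
    det = a * d - b * c
    if det == 0:
        raise ZeroDivisionError("X^T X + kI is singular")
    return [(d * xty[0] - b * xty[1]) / det, (a * xty[1] - c * xty[0]) / det]


A = [1, 2, 3, 4]
S = [2, 4, 6, 8]    # S = 2 * A, so X^T X is singular
Y = [3, 6, 9, 12]   # Y = A + S

xtx = [[dot(A, A), dot(A, S)], [dot(S, A), dot(S, S)]]
xty = [dot(A, Y), dot(S, Y)]

try:
    ridge_two_columns(xtx, xty, k=0.0)  # ordinary least squares
except ZeroDivisionError as err:
    print("k = 0:", err)

b_ridge = ridge_two_columns(xtx, xty, k=1.0)
print("k = 1:", b_ridge)  # a unique solution, [90/151, 180/151]
```

With k = 1 the fitted values are about 2.98 times A instead of exactly 3 times A: the estimate is slightly biased, but it exists and is unique.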
Methods for Picking a Likely Value of k
• Graphically using the Ridge Trace Graph: a plot of the parameters against k, estimating where the coefficients become “stable”
• Getting the VIFs as close to 1 as possible
• Staring at the errors and figuring out where the RMSE levels off
• Using the formula by Hoerl, Kennard, and Baldwin
$$k = \frac{(m+1) S^2}{\hat{\beta}_{OLS}^T \hat{\beta}_{OLS}}$$
Simulation
50 observations
Intercept=N(1000,50)
Acquisition → N(2500,50)
Subscription = 0.7*Acquisition
Interaction = acquisition*subscription
$$1 = \frac{\beta_1^2}{\beta_1^2 + \beta_2^2} + \frac{\beta_2^2}{\beta_1^2 + \beta_2^2}$$
So “in theory” we should end up with 57% Acquisition and 43% Subscription
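The simulated data set can be sketched as follows. Two things are assumptions here: that the second argument of N(1000,50) and N(2500,50) is a standard deviation rather than a variance, and that total expense is the sum Intercept + Acquisition + Subscription + Interaction, as in the model line on the Simulation Results slide. The function name and the seed are hypothetical, and the author's actual simulation code is not shown in the deck.

```python
import random


def simulate(n=50, seed=1):
    """Generate n observations following the simulation description."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        intercept = rng.gauss(1000.0, 50.0)  # assumption: 50 is a standard deviation
        a = rng.gauss(2500.0, 50.0)          # acquisition activity
        s = 0.7 * a                          # subscription is tied to acquisition
        interaction = a * s
        rows.append({
            "A": a,
            "S": s,
            "Interaction": interaction,
            "total_expense": intercept + a + s + interaction,
        })
    return rows


rows = simulate()
print(len(rows), rows[0])
```

Because S is an exact multiple of A and the interaction is a multiple of A squared over a narrow range of A, the three regressors are almost interchangeable, which is the multicollinearity the ridge fit is meant to handle.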
$$k = \frac{(m+1) S^2}{\hat{\beta}_{OLS}^T \hat{\beta}_{OLS}} = \frac{(4)(57651)}{18943761.8} = 0.01217$$
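The value of k above can be reproduced from the OLS row (ridge value 0) of the SAS output dataset. In this sketch the multiplier is taken to be the number of estimated coefficients including the intercept, which is the reading that gives the 4 in the slide's numerator; the function name is hypothetical.

```python
def hkb_k(coefficients, rmse):
    """Hoerl-Kennard-Baldwin style k: p * S^2 / (b^T b), with p = len(coefficients)."""
    p = len(coefficients)
    s2 = rmse ** 2  # S^2 is the squared root mean squared error
    return p * s2 / sum(b * b for b in coefficients)


# OLS estimates (Intercept, A, S, interaction) and RMSE from the k = 0 row.
ols = [4352.4418, 1.4511, -3.1776, 1.28e-03]
k = hkb_k(ols, rmse=240.1072)
print(round(k, 5))  # 0.01217
```

The intercept dominates the sum of squared coefficients, so this value of k depends heavily on the scale of the expense data.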
SAS’s PROC REG
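ods graphics on;
/* Ridge regression: refit the model for k = 0, 0.001, ..., 0.03 */
proc reg data=sim_data
      outvif                     /* add the ridge VIFs to the OUTEST= data set */
      outest=rb                  /* save the estimates for every k */
      ridge=0 to 0.03 by .001;   /* grid of ridge constants k */
   title 'Ridge Regression with PROC REG';
   model total_expense = A S Interaction / tol vif collin;
run;
quit;
ods graphics off;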
SAS Ridge Plots
SAS Diagnostics
SAS Diagnostics II
SAS Output Dataset
Type of statistics   Ridge regression control value   Root mean squared error   Intercept   A        S         interaction   difference in rmse
PARMS                                                 240.1072                  4352.4418   1.4511   -3.1776   1.28E-03
RIDGE                0                                240.1072                  4352.4418   1.4511   -3.1776   1.28E-03
RIDGE                0.001                            240.4279                  2518.0393   1.8645   -1.7268   8.74E-04      13.3446
RIDGE                0.009                            242.0831                   616.1069   1.6862    0.5524   4.71E-04       4.6013
RIDGE                0.01                             242.1817                   565.9577   1.6599    0.6410   4.61E-04       4.0718
RIDGE                0.011                            242.2697                   524.0733   1.6362    0.7175   4.52E-04       3.6324
RIDGE                0.012                            242.3488                   488.6401   1.6147    0.7842   4.45E-04       3.2640
RIDGE                0.013                            242.4203                   458.3412   1.5953    0.8428   4.38E-04       2.9523
RIDGE                0.014                            242.4855                   432.1970   1.5776    0.8948   4.33E-04       2.6867
RIDGE                0.015                            242.5451                   409.4631   1.5615    0.9412   4.28E-04       2.4585
RIDGE                0.028                            243.0417                   268.0123   1.4331    1.2765   3.94E-04       1.1248
RIDGE                0.029                            243.0680                   263.2177   1.4269    1.2911   3.92E-04       1.0824
RIDGE                0.03                             243.0934                   258.8830   1.4211    1.3048   3.91E-04       1.0441
SAS Output Dataset
Ridge regression control value   Type of statistics   A          S          interaction
0                                RIDGEVIF             244.8223   228.4689   530.7665
0.001                            RIDGEVIF             113.7915   110.8910   164.5080
0.009                            RIDGEVIF              14.5425    14.9128     8.0670
0.01                             RIDGEVIF              12.5768    12.9119     6.7163
0.011                            RIDGEVIF              10.9903    11.2939     5.6825
0.012                            RIDGEVIF               9.6907     9.9662     4.8737
0.013                            RIDGEVIF               8.6122     8.8629     4.2289
0.014                            RIDGEVIF               7.7071     7.9359     3.7067
0.015                            RIDGEVIF               6.9398     7.1492     3.2779
0.028                            RIDGEVIF               2.5530     2.6368     1.0876
0.029                            RIDGEVIF               2.4088     2.4880     1.0239
0.03                             RIDGEVIF               2.2770     2.3519     0.9663
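One way to read the RIDGEVIF rows is to look for the smallest k at which every VIF drops below a cutoff. The cutoff of 10 used below is a common rule of thumb and an assumption here, not something stated in the deck; the function name is hypothetical, and only the rows shown in the table are used.

```python
# VIFs for (A, S, interaction) at each ridge value k, copied from the RIDGEVIF rows.
ridge_vif = {
    0.0:   (244.8223, 228.4689, 530.7665),
    0.001: (113.7915, 110.8910, 164.5080),
    0.009: (14.5425, 14.9128, 8.0670),
    0.01:  (12.5768, 12.9119, 6.7163),
    0.011: (10.9903, 11.2939, 5.6825),
    0.012: (9.6907, 9.9662, 4.8737),
    0.013: (8.6122, 8.8629, 4.2289),
    0.014: (7.7071, 7.9359, 3.7067),
    0.015: (6.9398, 7.1492, 3.2779),
    0.028: (2.5530, 2.6368, 1.0876),
    0.029: (2.4088, 2.4880, 1.0239),
    0.03:  (2.2770, 2.3519, 0.9663),
}


def smallest_k_below(vifs_by_k, threshold):
    """Smallest k whose largest VIF is under the threshold, or None."""
    for k in sorted(vifs_by_k):
        if max(vifs_by_k[k]) < threshold:
            return k
    return None


print(smallest_k_below(ridge_vif, 10.0))  # 0.012
```

Under that cutoff the VIF criterion lands on k = 0.012, close to the 0.01217 given by the Hoerl, Kennard, and Baldwin formula.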
Simulation Results
Model: (57% Acquisition, 43% Subscription)
Expense = 1,000 + (Acquisition) + (Subscription) + (Interaction)
OLS: (184.1% Subscription, -84.1% Acquisition)
Expense = 4352.442 + 1.4511(Acquisition) – 3.1776(Subscription) + (1.28E-03)(Interaction)
SAS Ridge: (67.3% Acquisition, 32.7% Subscription)
Expense = 488.64 + 1.61(Acquisition) + 0.784(Subscription) + (4.45E-04)(Interaction)
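The percentages reported for OLS and SAS Ridge can be reproduced, to within rounding, from the A and S coefficients in the SAS output dataset if each share is taken as a coefficient divided by the sum of the two coefficients. That reading is an assumption inferred from the numbers; it is a linear share, not the squared form in the derivation. The function name is hypothetical.

```python
def linear_shares(beta_a, beta_s):
    """Share of each coefficient in the sum of the A and S coefficients."""
    total = beta_a + beta_s
    return beta_a / total, beta_s / total


# Coefficients on A and S from the k = 0 (OLS) and k = 0.012 rows.
ols_a, ols_s = linear_shares(1.4511, -3.1776)
ridge_a, ridge_s = linear_shares(1.6147, 0.7842)

print(f"OLS:   A share {ols_a:.1%}, S share {ols_s:.1%}")
print(f"Ridge: A share {ridge_a:.1%}, S share {ridge_s:.1%}")
```

The OLS shares fall outside the 0 to 100% range because the two coefficients have opposite signs, which is the “wrong sign” problem in numeric form; the ridge coefficients are both positive, so the shares are interpretable.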
Summary
• Ridge Regression corrects for multicollinearity problems
by modifying the method of least squares to allow more
precise biased estimators.
• Allows me to perform Customer Lifetime Value and
Breakeven Analysis with existing correlated regressors
• Not perfect but better than OLS Estimation
• SAS needs some additional functionality
– Confidence intervals for the $\beta_i$’s
– Confidence intervals for k
Next Steps
• Implementing other methodology for choosing
shrinkage parameter
• Dorugade and Kashid (2009)
• Mardikyan and Cetin (2008)
• Lawless and Wang (kLW) (1976)
• Add to SAS
– Confidence Intervals
• Firinguetti & Bobadilla’s Asymptotic Confidence Intervals
• Crivelli, Firinguetti & Montano’s Bootstrap Confidence Intervals
• Feig’s Monte Carlo method for Evaluating Confidence Intervals