Quantile Regression - LPGA Earnings and Performance Statistics 2009-2010

Quantile Regression
Prize Winnings – LPGA 2009/2010 Seasons
www.lpga.com
Kahane, L.H. (2010). “Returns to Skill in Professional Golf: A Quantile Regression Approach,” International Journal of Sport Finance, Vol. 5, pp. 167-170.
Cameron, A.C., and P.K. Trivedi (2010). Microeconometrics Using Stata, Revised Edition, Stata Press, College Station, TX.
Data Description
• Ladies Professional Golf Association (LPGA) participants during the 2009 and 2010 seasons
• Response Variable: Earnings per Event entered ($1000s) (earnevent)
• Predictor Variables:
  • Average Driving Distance (drive)
  • Percent of Fairways reached on Drives (fairway)
  • Percent of Greens reached in Regulation (green)
  • Putts per Hole on Greens reached in Regulation (girputtshole)
  • Percent of Sand Saves (2 shots to hole) (sandsvpct)
Quantile Regression
• Linear Regression relates the Conditional Mean of the response to the predictors.
• Quantile Regression relates specific quantiles of the response to the predictors. It is particularly useful with non-normal data.
• It uses a different loss function than Ordinary Least Squares, and the coefficients are estimated by linear programming.
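Cumulative Distribution Function (CDF): $F(y) = \Pr(Y \le y)$
$q$th Quantile: $F(y_q) = q \;\Rightarrow\; y_q = F^{-1}(q)$
Loss Function to be minimized for the $q$th Quantile:
$$Q(\boldsymbol{\beta}_q) = \sum_{i:\, y_i \ge \mathbf{x}_i'\boldsymbol{\beta}_q}^{N} q\,\lvert y_i - \mathbf{x}_i'\boldsymbol{\beta}_q \rvert \;+\; \sum_{i:\, y_i < \mathbf{x}_i'\boldsymbol{\beta}_q}^{N} (1-q)\,\lvert y_i - \mathbf{x}_i'\boldsymbol{\beta}_q \rvert$$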
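As a minimal sketch of the loss function above, the following plain-Python snippet evaluates it for an intercept-only model and searches the observed values for the minimizer. The helper names `check_loss` and `best_constant` and the data values are illustrative assumptions, not part of the original analysis. For this made-up sample the minimizer at q = 0.50 is the sample median, and at q = 0.75 it is the seventh of the nine ordered values.

```python
def check_loss(q, y, fitted):
    """Quantile loss: q*|r| when r >= 0, (1 - q)*|r| when r < 0, with r = y - fitted."""
    total = 0.0
    for yi, fi in zip(y, fitted):
        r = yi - fi
        total += q * r if r >= 0 else (1 - q) * (-r)
    return total


def best_constant(q, y):
    """Intercept-only fit: return the observed value that minimizes the loss."""
    return min(sorted(y), key=lambda c: check_loss(q, y, [c] * len(y)))


# Made-up, right-skewed values (NOT the LPGA data).
y = [0.2, 0.6, 1.2, 3.0, 6.7, 15.5, 34.8, 54.9, 99.1]

median_fit = best_constant(0.50, y)  # minimizer of the loss at q = 0.50
upper_fit = best_constant(0.75, y)   # minimizer of the loss at q = 0.75
print(median_fit, upper_fit)         # 6.7 34.8
```

With q = 0.5 both sides of the fit are weighted equally, which is why the median appears; raising q penalizes under-prediction more heavily and pushes the fitted value up the distribution.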
Summary Data for Earnings/Event
                          earnevent
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .1654375              0
 5%     .5611538              0
10%     1.200667       .1654375       Obs                 289
25%     2.991929       .2545882       Sum of Wgt.         289

50%     6.653733                      Mean           13.44039
                        Largest       Std. Dev.      17.40237
75%     15.45304       81.35504
90%       34.815       81.95658       Variance       302.8425
95%     54.85067       82.81731       Skewness       2.325991
99%     81.95658       99.06261       Kurtosis       8.461489

Note: The data are highly skewed:
• Mean > 2*Median
• Std. Dev. > Mean
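The two skewness indicators in the note can be checked directly from the reported summary statistics. The short sketch below only redoes that arithmetic; the variable names are illustrative.

```python
# Values copied from the summary output above.
mean, median = 13.44039, 6.653733
std_dev, variance = 17.40237, 302.8425

mean_to_median = mean / median   # a ratio above 2 supports "Mean > 2*Median"
sd_to_mean = std_dev / mean      # a ratio above 1 supports "Std. Dev. > Mean"
variance_check = std_dev ** 2    # should reproduce the reported variance

print(round(mean_to_median, 3), round(sd_to_mean, 3), round(variance_check, 3))
```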
Plots of Earnings per Event – Showing Skew
Multiple Linear Regression
      Source |       SS       df       MS              Number of obs =     289
-------------+------------------------------           F(  5,   283) =   68.95
       Model |  47899.6433     5  9579.92865           Prob > F      =  0.0000
    Residual |  39318.9937   283  138.936374           R-squared     =  0.5492
-------------+------------------------------           Adj R-squared =  0.5412
       Total |   87218.637   288   302.84249           Root MSE      =  11.787

------------------------------------------------------------------------------
   earnevent |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       drive |   .0749854    .112027     0.67   0.504    -.1455266    .2954974
     fairway |   .0765432   .1433219     0.53   0.594    -.2055689    .3586554
       green |   1.515417   .2260856     6.70   0.000     1.070394     1.96044
girputtshole |  -155.7758   19.96318    -7.80   0.000    -195.0709   -116.4806
   sandsvpct |   .4146478   .0919212     4.51   0.000     .2337117    .5955839
       _cons |   160.2711   49.32258     3.25   0.001      63.1854    257.3567
------------------------------------------------------------------------------
The model explains approximately 55% of the variation in earnings per event
Important Factors: Greens in Regulation (+), Putts per hole (-), Sand Save Percent (+)
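The fit statistics in the regression header follow from the sums of squares and degrees of freedom in the ANOVA table. The sketch below recomputes them from the reported values as a consistency check; the variable names are illustrative.

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA table above.
ss_model, df_model = 47899.6433, 5
ss_resid, df_resid = 39318.9937, 283
ss_total, df_total = 87218.637, 288

ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid

r_squared = ss_model / ss_total                          # reported: 0.5492
adj_r_squared = 1 - ms_resid / (ss_total / df_total)     # reported: 0.5412
f_stat = ms_model / ms_resid                             # reported: 68.95
root_mse = math.sqrt(ms_resid)                           # reported: 11.787

print(round(r_squared, 4), round(adj_r_squared, 4), round(f_stat, 2), round(root_mse, 3))
```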
Influential Observations with respect to the βs
Influential Observations: $\lvert \mathrm{DFBETAS}_{j(i)} \rvert > \dfrac{2}{\sqrt{n}} = \dfrac{2}{\sqrt{289}} = 0.1176$

      drive            fairway            green         girputtshole       sandsvpct
Obsnum _dfbeta_1  Obsnum _dfbeta_2  Obsnum _dfbeta_3  Obsnum _dfbeta_4  Obsnum _dfbeta_5
    30    0.3509      30    0.3433      30   -0.3225      30    0.6184      30    0.3295
   166   -0.2432     125    0.2899     104   -0.2795     106   -0.2525     104   -0.3027
   249   -0.3423     249   -0.4798     166    0.2661     125   -0.2833     163    0.5885
   268   -0.3883     238   -0.2572     177    0.2556     211   -0.2842     249   -0.2418
   211   -0.4088     268    0.7242     256    0.4677     238   -0.5183     268   -0.2856
These cases are extremely influential (higher than twice the “rule of thumb”).
Golfers: 30 (Michelle Ellis, 2009), 104 (Liselotte Neuman, 2009), 125 (Jiyai Shin, 2009),
166 (Paula Creamer, 2010), 211 (Cristie Kerr, 2010), 249 (Angela Park, 2010), and 268
(Jiyai Shin, 2010) appear to have high influence on several regression coefficients
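The rule-of-thumb cutoff and the "twice the rule of thumb" screen can be reproduced in a few lines. The sketch below uses the DFBETAS values reported for the drive coefficient; the list layout and variable names are illustrative assumptions.

```python
import math

n = 289
cutoff = 2 / math.sqrt(n)  # rule of thumb: |DFBETAS| > 2 / sqrt(n)

# (observation number, DFBETAS for drive), copied from the values listed above.
drive_dfbetas = [(30, 0.3509), (166, -0.2432), (249, -0.3423),
                 (268, -0.3883), (211, -0.4088)]

# Cases beyond the rule of thumb, and cases beyond twice the rule of thumb.
flagged = [obs for obs, d in drive_dfbetas if abs(d) > cutoff]
extreme = [obs for obs, d in drive_dfbetas if abs(d) > 2 * cutoff]

print(round(cutoff, 4), flagged, extreme)
```

All five listed cases exceed 2 × 0.1176 = 0.2353 in absolute value, consistent with the statement that they are beyond twice the rule of thumb.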
Quantile Regression
Models the relation between the predictors and selected quantiles of the
response variable: Earnings per Event
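$$Y_i = \mathbf{x}_i'\boldsymbol{\beta}_q + \varepsilon_{iq}$$
$$\mathbf{x}_i' = \begin{bmatrix} 1 & \text{drive}_i & \text{fairway}_i & \text{green}_i & \text{putts}_i & \text{sandsav}_i \end{bmatrix}$$
Loss Function to be minimized for the $q$th Quantile:
$$Q(\boldsymbol{\beta}_q) = \sum_{i:\, y_i \ge \mathbf{x}_i'\boldsymbol{\beta}_q}^{N} q\,\lvert y_i - \mathbf{x}_i'\boldsymbol{\beta}_q \rvert \;+\; \sum_{i:\, y_i < \mathbf{x}_i'\boldsymbol{\beta}_q}^{N} (1-q)\,\lvert y_i - \mathbf{x}_i'\boldsymbol{\beta}_q \rvert$$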
Standard errors of the regression coefficients are estimated from 400 bootstrap samples
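To illustrate the idea of a bootstrap standard error for a quantile regression coefficient, here is a small sketch with one predictor and synthetic data. It is not the Stata procedure behind the output in this deck: instead of a linear programming solver it enumerates lines through pairs of data points (for one predictor, some optimal line interpolates two observations), and the function names, seeds, and data are all assumptions made for illustration.

```python
import random
from itertools import combinations


def check_loss(q, residuals):
    """Sum of q*|r| over r >= 0 plus (1 - q)*|r| over r < 0."""
    return sum(q * r if r >= 0 else (q - 1) * r for r in residuals)


def fit_quantile_line(x, y, q):
    """Exact one-predictor fit: try every line through two points with distinct x."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = check_loss(q, [b - intercept - slope * a for a, b in zip(x, y)])
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


def bootstrap_slope_se(x, y, q, reps=400, seed=1):
    """Paired bootstrap: resample (x, y) rows, refit, return the SD of the slopes."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < reps:
        idx = [rng.randrange(n) for _ in range(n)]
        xb, yb = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(xb)) < 2:  # degenerate resample: no line can be fitted
            continue
        slopes.append(fit_quantile_line(xb, yb, q)[1])
    mean = sum(slopes) / reps
    return (sum((s - mean) ** 2 for s in slopes) / (reps - 1)) ** 0.5


# Synthetic right-skewed data (NOT the LPGA data): y = 2 + 0.5*x + exponential noise.
data_rng = random.Random(0)
x = [float(k) for k in range(1, 16)]
y = [2 + 0.5 * xk + data_rng.expovariate(1.0) for xk in x]

intercept, slope = fit_quantile_line(x, y, 0.5)
slope_se = bootstrap_slope_se(x, y, 0.5)  # 400 resamples, as in the slides
print(round(slope, 3), round(slope_se, 3))
```

The pair enumeration is only practical for tiny examples; with five predictors and 289 observations a linear programming routine, as mentioned earlier in the deck, is the realistic choice.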
Quantile Regression Output (STATA)
q25                Coef.   Std. Err.         t     P>|t|      [95% Conf. Interval]
drive            -0.0254      0.0539   -0.4700    0.6380     -0.1314      0.0806
fairway          -0.0073      0.0665   -0.1100    0.9120     -0.1382      0.1235
green             0.8419      0.1492    5.6400    0.0000      0.5483      1.1355
girputtshole    -72.3041      9.3716   -7.7200    0.0000    -90.7510    -53.8571
sandsvpct         0.1215      0.0435    2.8000    0.0060      0.0360      0.2071
_cons            86.0426     18.9736    4.5300    0.0000     48.6952    123.3899

q50                Coef.   Std. Err.         t     P>|t|      [95% Conf. Interval]
drive             0.0528      0.0786    0.6700    0.5020     -0.1018      0.2074
fairway           0.0476      0.0973    0.4900    0.6250     -0.1440      0.2392
green             0.9611      0.1832    5.2500    0.0000      0.6006      1.3217
girputtshole    -95.3871     17.1245   -5.5700    0.0000   -129.0947    -61.6795
sandsvpct         0.1432      0.0777    1.8400    0.0660     -0.0098      0.2962
_cons            99.7847     43.1192    2.3100    0.0210     14.9096    184.6598

q75                Coef.   Std. Err.         t     P>|t|      [95% Conf. Interval]
drive             0.3131      0.1399    2.2400    0.0260      0.0377      0.5886
fairway           0.0857      0.1692    0.5100    0.6130     -0.2475      0.4188
green             1.1226      0.3229    3.4800    0.0010      0.4869      1.7583
girputtshole   -127.4508     37.0845   -3.4400    0.0010   -200.4471    -54.4544
sandsvpct         0.4549      0.1288    3.5300    0.0000      0.2014      0.7084
_cons            76.0492     86.7001    0.8800    0.3810    -94.6097    246.7080
Note:
1) Driving distance is significant only among golfers at the 75th percentile
2) The putting ability effect increases in magnitude across skill levels
3) The greens in regulation effect is fairly equal across skill levels
4) Fairway accuracy is not significant at any skill level
5) Sand saves are more important for golfers at the 75th percentile
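The t statistics and confidence limits in the quantile regression output are tied to the coefficients and bootstrap standard errors by t = Coef. / Std. Err. and Coef. ± t* × Std. Err. The sketch below checks this for three q75 rows. The critical value 1.9684 is an assumption (an approximate two-sided 5% value for 283 degrees of freedom, the value implied by the intervals in the linear regression output), and small differences from the reported limits are expected because the inputs are rounded to four decimals.

```python
# (coefficient, bootstrap standard error) copied from the q75 output above.
q75 = {
    "drive": (0.3131, 0.1399),
    "green": (1.1226, 0.3229),
    "sandsvpct": (0.4549, 0.1288),
}

T_CRIT = 1.9684  # assumed critical value of t with 283 df (approximate)

checks = {}
for name, (coef, se) in q75.items():
    t_stat = coef / se
    lower = coef - T_CRIT * se
    upper = coef + T_CRIT * se
    checks[name] = (round(t_stat, 2), round(lower, 4), round(upper, 4))

for name, values in checks.items():
    print(name, values)
```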
Tests of Equality of Coefficients Across Quantiles
. test [q25=q50=q75]: drive
 ( 1)  [q25]drive - [q50]drive = 0
 ( 2)  [q25]drive - [q75]drive = 0
       F(  2,   283) =    3.25
            Prob > F =    0.0403

. test [q25=q50=q75]: fairway
 ( 1)  [q25]fairway - [q50]fairway = 0
 ( 2)  [q25]fairway - [q75]fairway = 0
       F(  2,   283) =    0.27
            Prob > F =    0.7603

. test [q25=q50=q75]: green
 ( 1)  [q25]green - [q50]green = 0
 ( 2)  [q25]green - [q75]green = 0
       F(  2,   283) =    0.58
            Prob > F =    0.5600

. test [q25=q50=q75]: girputtshole
 ( 1)  [q25]girputtshole - [q50]girputtshole = 0
 ( 2)  [q25]girputtshole - [q75]girputtshole = 0
       F(  2,   283) =    1.62
            Prob > F =    0.1989

. test [q25=q50=q75]: sandsvpct
 ( 1)  [q25]sandsvpct - [q50]sandsvpct = 0
 ( 2)  [q25]sandsvpct - [q75]sandsvpct = 0
       F(  2,   283) =    5.35
            Prob > F =    0.0053
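One way to read the five tests together is to compare each p-value with a chosen significance level. The sketch below uses 0.05, which is an assumption since the slides do not state a level; the dictionary simply restates the reported Prob > F values.

```python
# p-values (Prob > F) copied from the five tests above.
p_values = {
    "drive": 0.0403,
    "fairway": 0.7603,
    "green": 0.5600,
    "girputtshole": 0.1989,
    "sandsvpct": 0.0053,
}

ALPHA = 0.05  # assumed significance level; not stated in the source

# Predictors whose coefficients differ across the 25th, 50th and 75th percentiles.
differs = [name for name, p in p_values.items() if p < ALPHA]
print(differs)  # ['drive', 'sandsvpct']
```

At this level, equality across the three quantiles is rejected only for driving distance and sand save percentage.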
Plots of Regression Coefficients by Quantile