Quantile Regression: Prize Winnings – LPGA 2009/2010 Seasons

Data source: www.lpga.com

Kahane, L.H. (2010). "Returns to Skill in Professional Golf: A Quantile Regression Approach," International Journal of Sport Finance, Vol. 5, pp. 167-170.
Cameron, A.C., and P.K. Trivedi (2010). Microeconometrics Using Stata, Revised Edition. Stata Press, College Station, TX.

Data Description
• Ladies Professional Golf Association (LPGA) participants during the 2009 and 2010 seasons
• Response variable: earnings per event entered ($1000s) (earnevent)
• Predictor variables:
  – Average driving distance (drive)
  – Percent of fairways hit on drives (fairway)
  – Percent of greens reached in regulation (green)
  – Putts per hole on greens reached in regulation (girputtshole)
  – Percent of sand saves, i.e. 2 shots to hole out from a bunker (sandsvpct)

Quantile Regression
• Linear regression relates the conditional mean of the response to the predictors.
• Quantile regression relates specific conditional quantiles of the response to the predictors. It is particularly useful with skewed or otherwise non-normal data.
• It minimizes a different loss function than ordinary least squares (OLS); the estimates are obtained by linear programming.

Cumulative distribution function (CDF): $F(y) = \Pr(Y \le y)$

q-th quantile: $F(y_q) = q \;\Rightarrow\; y_q = F^{-1}(q)$

Loss function to be minimized for the q-th quantile:
$Q_N(\beta_q) = \sum_{i:\, y_i \ge x_i'\beta_q}^{N} q\,|y_i - x_i'\beta_q| + \sum_{i:\, y_i < x_i'\beta_q}^{N} (1-q)\,|y_i - x_i'\beta_q|$

Summary Data for Earnings/Event

                          earnevent
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .1654375              0
 5%     .5611538              0
10%     1.200667       .1654375       Obs                 289
25%     2.991929       .2545882       Sum of Wgt.         289

50%     6.653733                      Mean           13.44039
                        Largest       Std. Dev.      17.40237
75%     15.45304       81.35504
90%       34.815       81.95658       Variance       302.8425
95%     54.85067       82.81731       Skewness       2.325991
99%     81.95658       99.06261       Kurtosis       8.461489

Note: The data are highly skewed:
• Mean > 2 × median (13.44 > 2 × 6.65 = 13.31)
• Std. dev. > mean (17.40 > 13.44)

[Figure: Plots of earnings per event, showing the skew]

Multiple Linear Regression

      Source |       SS       df       MS              Number of obs =     289
-------------+------------------------------           F(  5,   283) =   68.95
       Model |  47899.6433     5  9579.92865           Prob > F      =  0.0000
    Residual |  39318.9937   283  138.936374           R-squared     =  0.5492
-------------+------------------------------           Adj R-squared =  0.5412
       Total |   87218.637   288   302.84249           Root MSE      =  11.787

------------------------------------------------------------------------------
   earnevent |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       drive |   .0749854    .112027     0.67   0.504    -.1455266    .2954974
     fairway |   .0765432   .1433219     0.53   0.594    -.2055689    .3586554
       green |   1.515417   .2260856     6.70   0.000     1.070394     1.96044
girputtshole |  -155.7758   19.96318    -7.80   0.000    -195.0709   -116.4806
   sandsvpct |   .4146478   .0919212     4.51   0.000     .2337117    .5955839
       _cons |   160.2711   49.32258     3.25   0.001      63.1854    257.3567
------------------------------------------------------------------------------

• The model explains approximately 55% of the variation in earnings per event (R-squared = 0.5492).
• Important factors: greens in regulation (+), putts per hole (-), sand save percent (+).

Influential Observations with respect to the β's

Rule of thumb for an influential observation: $|DFBETAS_{j(i)}| > 2/\sqrt{n} = 2/\sqrt{289} = 0.1176$

      drive               fairway             green               girputtshole        sandsvpct
Obsnum  _dfbeta_1   Obsnum  _dfbeta_2   Obsnum  _dfbeta_3   Obsnum  _dfbeta_4   Obsnum  _dfbeta_5
    30     0.3509       30     0.3433       30    -0.3225       30     0.6184       30     0.3295
   166    -0.2432      125     0.2899      104    -0.2795      106    -0.2525      104    -0.3027
   249    -0.3423      249    -0.4798      166     0.2661      125    -0.2833      163     0.5885
   268    -0.3883      238    -0.2572      177     0.2556      211    -0.2842      249    -0.2418
   211    -0.4088      268     0.7242      256     0.4677      238    -0.5183      268    -0.2856

Every listed value exceeds twice the rule-of-thumb cutoff (2 × 0.1176 = 0.2353), so these cases are extremely influential.
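The DFBETAS screen above can be reproduced outside Stata. The following is a minimal Python/NumPy sketch that assumes the standard Belsley, Kuh and Welsch scaling: the deleted-observation residual standard error s_(i) and the full-data (X'X)^{-1}. It computes the statistic two ways, by refitting without each observation and by the leverage-based closed form, and then flags rows above a multiple of 2/sqrt(n). The function names are hypothetical, and the usage section runs on simulated data because the LPGA file is not part of this document.

```python
import numpy as np


def dfbetas_bruteforce(X, y):
    """DFBETAS by refitting OLS with each observation deleted.

    DFBETAS_j(i) = (b_j - b_j(i)) / (s_(i) * sqrt([(X'X)^-1]_jj)),
    where b(i) and s_(i) come from the fit without observation i.
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    scale = np.sqrt(np.diag(XtX_inv))
    out = np.empty((n, p))
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        bi, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
        resid = yi - Xi @ bi
        s_i = np.sqrt(resid @ resid / (n - 1 - p))  # deleted residual std. error
        out[i] = (b - bi) / (s_i * scale)
    return out


def dfbetas_closed_form(X, y):
    """Same quantity without refitting, using the leverages h_ii."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # diagonal of the hat matrix
    s2 = e @ e / (n - p)
    s2_i = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
    # b - b(i) = (X'X)^-1 x_i e_i / (1 - h_ii), one row per observation
    delta = (X @ XtX_inv) * (e / (1 - h))[:, None]
    return delta / (np.sqrt(s2_i)[:, None] * np.sqrt(np.diag(XtX_inv)))


def flag_influential(dfb, multiple=2.0):
    """Return (cutoff, row indices) where any |DFBETAS| > multiple * 2/sqrt(n)."""
    n = dfb.shape[0]
    cutoff = multiple * 2 / np.sqrt(n)
    rows = np.where(np.abs(dfb).max(axis=1) > cutoff)[0]
    return cutoff, rows


if __name__ == "__main__":
    # Simulated stand-in data with heavy-tailed errors (not the LPGA data).
    rng = np.random.default_rng(0)
    n = 289
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 5))])
    y = X @ rng.normal(size=6) + rng.standard_t(df=3, size=n)
    dfb = dfbetas_closed_form(X, y)
    cutoff, rows = flag_influential(dfb)
    print(round(2 / np.sqrt(n), 4), round(cutoff, 4))  # 0.1176 0.2353
    print("observations above twice the cutoff:", rows)
```

The closed form avoids n refits, which matters little at n = 289 but is the usual way such diagnostics are computed. Agreement between the two functions is a useful self-check.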
Golfers 30 (Michelle Ellis, 2009), 104 (Liselotte Neumann, 2009), 125 (Jiyai Shin, 2009), 166 (Paula Creamer, 2010), 211 (Cristie Kerr, 2010), 249 (Angela Park, 2010), and 268 (Jiyai Shin, 2010) appear to have high influence on several regression coefficients.

Quantile Regression

Models the relation between the predictors and the response variable (earnings per event) at various quantiles:

$Y_i = x_i'\beta_q + \varepsilon_{iq}, \qquad x_i' = [\,1 \;\; drive_i \;\; fairway_i \;\; green_i \;\; putts_i \;\; sandsav_i\,]$

Loss function to be minimized for the q-th quantile:
$Q_N(\beta_q) = \sum_{i:\, y_i \ge x_i'\beta_q}^{N} q\,|y_i - x_i'\beta_q| + \sum_{i:\, y_i < x_i'\beta_q}^{N} (1-q)\,|y_i - x_i'\beta_q|$

Standard errors of the regression coefficients are estimated from 400 bootstrap samples.

Quantile Regression Output (Stata)

q25               Coef. Std. Err.         t     P>|t|  [95% Conf. Interval]
-----------------------------------------------------------------------------
drive           -0.0254    0.0539   -0.4700    0.6380    -0.1314     0.0806
fairway         -0.0073    0.0665   -0.1100    0.9120    -0.1382     0.1235
green            0.8419    0.1492    5.6400    0.0000     0.5483     1.1355
girputtshole   -72.3041    9.3716   -7.7200    0.0000   -90.7510   -53.8571
sandsvpct        0.1215    0.0435    2.8000    0.0060     0.0360     0.2071
_cons           86.0426   18.9736    4.5300    0.0000    48.6952   123.3899

q50               Coef. Std. Err.         t     P>|t|  [95% Conf. Interval]
-----------------------------------------------------------------------------
drive            0.0528    0.0786    0.6700    0.5020    -0.1018     0.2074
fairway          0.0476    0.0973    0.4900    0.6250    -0.1440     0.2392
green            0.9611    0.1832    5.2500    0.0000     0.6006     1.3217
girputtshole   -95.3871   17.1245   -5.5700    0.0000  -129.0947   -61.6795
sandsvpct        0.1432    0.0777    1.8400    0.0660    -0.0098     0.2962
_cons           99.7847   43.1192    2.3100    0.0210    14.9096   184.6598

q75               Coef. Std. Err.         t     P>|t|  [95% Conf. Interval]
-----------------------------------------------------------------------------
drive            0.3131    0.1399    2.2400    0.0260     0.0377     0.5886
fairway          0.0857    0.1692    0.5100    0.6130    -0.2475     0.4188
green            1.1226    0.3229    3.4800    0.0010     0.4869     1.7583
girputtshole  -127.4508   37.0845   -3.4400    0.0010  -200.4471   -54.4544
sandsvpct        0.4549    0.1288    3.5300    0.0000     0.2014     0.7084
_cons           76.0492   86.7001    0.8800    0.3810   -94.6097   246.7080

Note:
1) Driving distance is significant only for golfers at the 75th percentile (p = 0.026).
2) The putting effect grows in magnitude across quantiles (-72.3, -95.4, -127.5 at the 25th, 50th, and 75th percentiles), although the equality test below does not reject equal coefficients (p = 0.199).
3) The greens-in-regulation effect is fairly similar across quantiles (0.84, 0.96, 1.12).
4) Fairway accuracy is not significant at any quantile.
5) Sand saves matter more for golfers at the 75th percentile (0.45, versus 0.12 and 0.14 at the lower quantiles).

Tests of Equality of Coefficients Across Quantiles

. test [q25=q50=q75]: drive
 ( 1)  [q25]drive - [q50]drive = 0
 ( 2)  [q25]drive - [q75]drive = 0
       F(  2,   283) =    3.25
            Prob > F =    0.0403

. test [q25=q50=q75]: fairway
 ( 1)  [q25]fairway - [q50]fairway = 0
 ( 2)  [q25]fairway - [q75]fairway = 0
       F(  2,   283) =    0.27
            Prob > F =    0.7603

. test [q25=q50=q75]: green
 ( 1)  [q25]green - [q50]green = 0
 ( 2)  [q25]green - [q75]green = 0
       F(  2,   283) =    0.58
            Prob > F =    0.5600

. test [q25=q50=q75]: girputtshole
 ( 1)  [q25]girputtshole - [q50]girputtshole = 0
 ( 2)  [q25]girputtshole - [q75]girputtshole = 0
       F(  2,   283) =    1.62
            Prob > F =    0.1989

. test [q25=q50=q75]: sandsvpct
 ( 1)  [q25]sandsvpct - [q50]sandsvpct = 0
 ( 2)  [q25]sandsvpct - [q75]sandsvpct = 0
       F(  2,   283) =    5.35
            Prob > F =    0.0053

[Figure: Plots of regression coefficients by quantile]
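The slides state that the quantile coefficients come from linear programming and that the standard errors come from 400 bootstrap samples. The Stata commands themselves are not shown. The following is a minimal Python sketch of both ideas, using SciPy's `linprog` (HiGHS solver) on the check-loss formulation. The loss is written as an LP by splitting β and the residuals into non-negative parts. The sketch also assumes a pairs bootstrap in which each replicate refits all quantiles on the same resample, which yields the cross-quantile covariances that a joint equality test needs. All function names are hypothetical, and the Wald statistic divided by the number of restrictions is only an F-style analogue of the tests above, not a reproduction of Stata's computation.

```python
import numpy as np
from scipy.optimize import linprog


def check_loss(u, q):
    """Check function rho_q(u): weight q on u >= 0, weight (1 - q) on u < 0."""
    u = np.asarray(u, dtype=float)
    return np.where(u >= 0, q * u, (q - 1) * u)


def quantreg_lp(X, y, q):
    """Minimize sum_i rho_q(y_i - x_i'b) as a linear program.

    Write b = b_pos - b_neg and y - Xb = u_pos - u_neg with all parts >= 0;
    the objective is q * sum(u_pos) + (1 - q) * sum(u_neg).
    """
    n, p = X.shape
    c = np.concatenate([np.zeros(2 * p), np.full(n, q), np.full(n, 1 - q)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:2 * p]


def bootstrap_quantreg(X, y, quantiles, reps=400, seed=0):
    """Pairs bootstrap; every replicate refits all quantiles on one resample.

    Returns point estimates (k x p), bootstrap standard errors (k x p),
    and the replicate draws (reps x k x p).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    est = np.array([quantreg_lp(X, y, q) for q in quantiles])
    draws = np.empty((reps,) + est.shape)
    for r in range(reps):
        idx = rng.integers(0, n, size=n)  # resample (x_i, y_i) pairs
        draws[r] = [quantreg_lp(X[idx], y[idx], q) for q in quantiles]
    return est, draws.std(axis=0, ddof=1), draws


def wald_equal_across_quantiles(est, draws, j):
    """Wald statistic / #restrictions for H0: coefficient j equal at all quantiles.

    Restrictions mirror the tests above: b_q1 - b_qk = 0 for k = 2..K,
    with the covariance of the estimates taken from the bootstrap draws.
    """
    k = est.shape[0]
    R = np.zeros((k - 1, k))
    R[:, 0] = 1.0
    R[np.arange(k - 1), np.arange(1, k)] = -1.0
    diff = R @ est[:, j]
    V = R @ np.cov(draws[:, :, j], rowvar=False) @ R.T
    return diff @ np.linalg.solve(V, diff) / (k - 1)
```

With the LPGA data loaded as `X` (a column of ones followed by drive, fairway, green, girputtshole and sandsvpct) and `y` (earnevent), `bootstrap_quantreg(X, y, [0.25, 0.5, 0.75], reps=400)` would follow the setup described. The standard errors will not match the Stata output exactly, because bootstrap draws are random.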