Instrumental Variables: 2-Stage and 3-Stage Least Squares Regression of a Linear System of Equations

2009 LPGA Performance Statistics and Prize Winnings
www.lpga.com
S.J. Callan and J.M. Thomas (2007). “Modeling the Determinants of a Professional Golfer’s Tournament Earnings,” Journal of Sports Economics, Vol. 8, No. 4, pp. 394-411

Data Description
• Prize Winnings and Performance Statistics for n = 146 professional women (LPGA) golfers for the 2009 season
• Exogenous Performance Variables:
  - Average Driving Distance
  - Percentage of Fairways Reached on Drive
  - Percentage of Greens Reached in Regulation
  - Percentage of Sand Saves (in hole in 2 shots from close traps)
  - Average Putts per Hole on Greens Reached in Regulation
  - Numbers of Events, Events Completed, Rounds
• Endogenous Result (Dependent & Independent) Variables:
  - Average Score per Round
  - Average Rank (Percentile in Tournaments)
  - Log(Prize Winnings)

Variables in Systems of Equations
• Endogenous Variables: Jointly dependent (response) variables that are determined within the system. They can also appear as predictor variables in other equations
• Exogenous Variables: Independent variables that do not depend on the endogenous variables
• Predetermined Variables: Exogenous and lagged endogenous variables
• Instrumental Variables: Predetermined variables used to predict endogenous variables in first-stage regressions, with the predicted values being used in place of the endogenous predictors in the system of equations

System of Equations (Callan and Thomas, 2007)
1. Average Score (per 18 holes) is related to the golfer’s skills and experience (number of rounds played)
2. Average Rank (transformed to percentile) in tournaments is related to average score and the number of events she competed in
3. Season Earnings is related to average rank and the number of tournaments she completed

$$
\begin{aligned}
\text{SCORE}_i &= \alpha_0 + \alpha_D D_i + \alpha_F F_i + \alpha_G G_i + \alpha_S S_i + \alpha_P P_i + \alpha_R R_i + \varepsilon_{1i} \\
\text{Rank}_i &= \beta_0 + \beta_{SCORE}\,\text{SCORE}_i + \beta_E E_i + \varepsilon_{2i} \\
\ln(\text{Prize}_i) &= \gamma_0 + \gamma_{RANK}\,\text{Rank}_i + \gamma_C C_i + \varepsilon_{3i}
\end{aligned}
$$

where D = driving distance, F = fairway percentage, G = greens in regulation percentage, S = sand save percentage, P = putts per hole on greens in regulation, R = rounds, E = events, and C = events completed.

Potential Problems with Endogenous Predictors
• When endogenous variables are included as predictors, they can be correlated with the error term of that equation, particularly when omitted variables are related to the outcome. This causes the Ordinary Least Squares estimates to be biased and inconsistent.
• In equation 2, SCORE may be correlated with the error term because no variable measures average course difficulty (Callan and Thomas, p. 402).
• In equation 3, Rank may be correlated with the error term because no variable measures the golfer’s human capital investment, such as diet and concentration level (Callan and Thomas, p. 402).

Model Building Process
1. Regress all endogenous variables (Score, Rank, and ln(Prize)) on all exogenous variables
2. Obtain the predicted values for each endogenous variable, based on the regressions from 1.
3. In the system of equations, replace any “right hand side” endogenous predictors with their fitted values from 2.
4. Note that software (e.g. SAS and Stata) fits a first-stage regression for every endogenous variable in 1., even one that never appears as a predictor (ln(Prize) in this example).
5. This method provides correct coefficient estimates, but the ANOVA table and standard errors are incorrect, because the second-stage residuals are computed from the fitted rather than the actual values of the endogenous predictors

First Stage Regressions for Score and Rank

SUMMARY OUTPUT, Dep. Var. = SCORE

Regression Statistics
Multiple R           0.969534
R Square             0.939996
Adjusted R Square    0.936492
Standard Error       0.288069
Observations         146

ANOVA
              df     SS          MS          F
Regression    8      178.0979    22.26223    268.2725
Residual      137    11.36876    0.082984
Total         145    189.4666

                Coefficients   Standard Error   t Stat      P-value
Intercept       63.22496       1.969431         32.10316    1.3E-65
Drive           -0.00496       0.004252         -1.16683    0.245304
Fairway         -0.01622       0.005784         -2.80373    0.005786
Green           -0.11232       0.010222         -10.9876    1.57E-20
sandsv          -0.01448       0.003307         -4.3771     2.37E-05
GIRPuttsHole    10.8415        0.893289         12.13661    1.8E-23
Rounds          -0.02758       0.010894         -2.5313     0.012494
Events          0.096164       0.027269         3.526557    0.000573
Completed       -0.02314       0.018163         -1.27401    0.204818

SUMMARY OUTPUT, Dep. Var. = RANK

Regression Statistics
Multiple R           0.971418
R Square             0.943652
Adjusted R Square    0.940362
Standard Error       3.699136
Observations         146

ANOVA
              df     SS          MS          F
Regression    8      31394.75    3924.344    286.7916
Residual      137    1874.655    13.68361
Total         145    33269.41

                Coefficients   Standard Error   t Stat      P-value
Intercept       127.4395       25.28977         5.039173    1.45E-06
Drive           0.006845       0.054606         0.125355    0.900426
Fairway         0.051758       0.074276         0.696836    0.487086
Green           1.11684        0.131266         8.508233    2.74E-14
sandsv          0.118137       0.042471         2.781584    0.006172
GIRPuttsHole    -88.1679       11.47086         -7.68625    2.63E-12
Rounds          0.475649       0.139888         3.400216    0.000882
Events          -1.84119       0.35016          -5.25814    5.46E-07
Completed       0.894404       0.233229         3.834869    0.000191

The fitted (predicted) values for SCORE will be used in equation 2 in place of SCORE, and the fitted values for RANK will be used in equation 3 in place of RANK.
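The two-stage idea can be illustrated on a toy problem. The following is a minimal sketch on simulated data (not the LPGA data), assuming a single endogenous predictor `x`, a single instrument `z`, and an omitted variable `u`; the helper names `ols_line`, `simulate`, and `two_stage` are hypothetical. It fits the first stage, substitutes the fitted values into the second stage, and compares the slope with ordinary least squares.

```python
import random


def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate(n=5000, seed=1):
    """Simulated data (an assumption for illustration, not the LPGA data).

    x is endogenous: the omitted variable u drives both x and y.
    z is a valid instrument: it affects y only through x.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]  # exogenous instrument
    u = [rng.gauss(0, 1) for _ in range(n)]  # omitted variable
    x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [1.0 + 2.0 * xi + 3.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    return z, x, y


def two_stage(z, x, y):
    """Stage 1: regress x on z. Stage 2: regress y on the fitted x."""
    a0, a1 = ols_line(z, x)
    x_hat = [a0 + a1 * zi for zi in z]
    return ols_line(x_hat, y)


z, x, y = simulate()
b_ols = ols_line(x, y)[1]
b_2sls = two_stage(z, x, y)[1]
print(f"OLS slope:  {b_ols:.3f}")
print(f"2SLS slope: {b_2sls:.3f}  (true slope in the simulation is 2.0)")
```

In this simulation the OLS slope absorbs the effect of the omitted variable, while the two-stage slope should land near the true value. As in the slides, the second-stage standard errors from this manual procedure would not be valid without the correction described in the matrix approach.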
Equation 1 has no right hand side endogenous variables, so it is fit directly.

Equation 1) SCORE is related to SKILLS and experience

SUMMARY OUTPUT, Model 1

Regression Statistics
Multiple R           0.963573
R Square             0.928473
Adjusted R Square    0.925386
Standard Error       0.312244
Observations         146

ANOVA
              df     SS          MS          F
Regression    6      175.9147    29.31911    300.7211
Residual      139    13.55195    0.097496
Total         145    189.4666

                Coefficients   Standard Error   t Stat      P-value
Intercept       61.2801        2.093701         29.26879    2.71E-61
Drive           -0.00526       0.004609         -1.14194    0.255442
Fairway         -0.01897       0.006242         -3.03902    0.002836
Green           -0.13583       0.009903         -13.7162    1.3E-27
sandsv          -0.01557       0.003563         -4.37003    2.41E-05
GIRPuttsHole    13.15201       0.835199         15.74715    1.05E-32
Rounds          -0.00837       0.002169         -3.86102    0.000172

All variables except average driving distance are significant. All else equal:
• Average SCORE decreases by 0.19 with a 10-percentage-point increase in Fairways hit
• Average SCORE decreases by 1.36 with a 10-percentage-point increase in Greens in Regulation
• Average SCORE decreases by 0.16 with a 10-percentage-point increase in Sand Saves
• Average SCORE increases by 1.32 with a 0.1 increase in putts per Green in Regulation hole
• Average SCORE decreases by 0.08 for a 10-round increase in Rounds played

Equation 2) Rank is related to SCORE and Events

SUMMARY OUTPUT, Model 2

Regression Statistics
Multiple R           0.963379
R Square             0.928099
Adjusted R Square    0.927093
Standard Error       4.089985
Observations         146

ANOVA
              df     SS          MS          F
Regression    2      30877.31    15438.65    922.9243
Residual      143    2392.1      16.72798
Total         145    33269.41

                Coefficients   Standard Error   t Stat      P-value
Intercept       956.9965       29.14945         32.83069    1.82E-68
Score-hat       -12.5125       0.384185         -32.5689    5E-68
Events          0.281134       0.103653         2.712259    0.007503

Rank (as Percentile, with 100 meaning the golfer won every tournament she played in) is:
• Negatively associated with predicted SCORE (decreases by 12.5 with a unit increase in average SCORE)
• Positively associated with number of Events (increases by 0.28 with a unit increase in the number of Events played)
Note: The estimated coefficients are correct, but the standard errors, t-tests, and Analysis of Variance are incorrect (see Matrix Approach: Models w/ Endogenous Predictors)

Equation 3) ln(Prize) is related to Rank and Completed Events

SUMMARY OUTPUT, Model 3

Regression Statistics
Multiple R           0.932864
R Square             0.870235
Adjusted R Square    0.86842
Standard Error       0.507583
Observations         146

ANOVA
              df     SS          MS          F
Regression    2      247.075     123.5375    479.4961
Residual      143    36.84256    0.25764
Total         145    283.9176

                Coefficients   Standard Error   t Stat      P-value
Intercept       7.881995       0.228644         34.47279    3.69E-71
Rank-hat        0.055812       0.007667         7.279667    2.05E-11
Completed       0.079741       0.017742         4.494427    1.43E-05

Prize Winnings (in log form):
• Increase with (Predicted) Rank. A 10-point increase in Rank (percentile) increases ln(Prize) by 0.56
• Increase with Completed Events. For each tournament completed, ln(Prize) increases by 0.080
Note: The estimated coefficients are correct, but the standard errors, t-tests, and Analysis of Variance are incorrect (see Matrix Approach: Models w/ Endogenous Predictors)

Matrix Approach: Models w/ Endogenous Predictors
• Z: Matrix of Instrumental Variables (Intercept and 8 Exogenous variables): Intercept, Drive, Fairway, Greens, SandSave, Putts, Rounds, Events, Completed
• X: Matrix of Predictors for Model:
  - Model 2: Intercept, Score (Actual, not predicted), Events
  - Model 3: Intercept, Rank, Completed
• Y: Vector of Responses:
  - Model 2: Rank
  - Model 3: ln(Prize)

2-Stage Least Squares Estimator and Estimated Variance-Covariance Matrix:

$$
\hat{\beta}_{2SLS} = \left[ X'Z (Z'Z)^{-1} Z'X \right]^{-1} X'Z (Z'Z)^{-1} Z'Y = (X'P_Z X)^{-1} X'P_Z Y, \qquad P_Z = Z (Z'Z)^{-1} Z'
$$

$$
\hat{V}(\hat{\beta}_{2SLS}) = s^2 (X'P_Z X)^{-1}, \qquad s^2 = \frac{SSE}{n - \mathrm{rank}(X)}, \qquad SSE = (Y - X\hat{\beta}_{2SLS})'(Y - X\hat{\beta}_{2SLS})
$$

$$
SSR = Y' \left[ P_Z X (X'P_Z X)^{-1} X'P_Z - \frac{1}{n} J_n \right] Y, \qquad R^2 = \frac{SSR}{SSR + SSE}
$$

Model 2: Rank = f(Score, Events)

Z (first and last three rows)
Intercept   Drive    Fairway   Green    sandsv   Putts   Rounds   Events   Completed
1           251      73.80     64.70    36.50    1.79    61       20       13
1           256.7    73.30     69.60    30.20    1.78    65       20       14
1           250.1    65.90     64.10    32.70    1.78    56       18       12
...         ...      ...       ...      ...      ...     ...      ...      ...
1           249.8    70.10     67.60    26.30    1.83    67       22       14
1           239.8    77.70     62.30    30.60    1.88    40       17       4
1           256.1    74.50     72.40    31.40    1.8     89       25       23

X and Y (first and last three rows)
Intercept   AveStrokes   Events   |   Y = Rank(Pctile)
1           72.492       20       |   55.17757
1           71.477       20       |   66.57407
1           72.25        18       |   55.50107
...         ...          ...      |   ...
1           72.657       22       |   52.17118
1           74.225       17       |   31.27489
1           71.157       25       |   75.77619

X'P_ZX
146         10610.59    2767
10610.59    771305.2    200694.4
2767        200694.4    54887

X'P_ZY
7734.555    559770.4    152254.3

INV(X'P_ZX)
50.79456    -0.66845    -0.1165
-0.66845    0.008823    0.001436
-0.1165     0.001436    0.000642

Beta_2SLS
956.9965    -12.5125    0.281134

SSE = 1806.234    s^2 = 12.63101

V^(Beta_2SLS)
641.5866    -8.4432     -1.47152
-8.4432     0.111449    0.018132
-1.47152    0.018132    0.008113

SE(B_2SLS)
25.32956    0.333839    0.09007

SSModel = 440626.2    ybar = 52.9764    SSReg = 30877.31    R^2 = 0.944736

Model 3: ln(Prize) = f(Rank, Completed)

Z: same matrix of instrumental variables as for Model 2.

X and Y (first and last three rows)
Intercept   Rank(Pctile)   Completed   |   Y = lnPrize
1           55.17757       13          |   12.22
1           66.57407       14          |   12.86
1           55.50107       12          |   11.74
...         ...            ...         |   ...
1           52.17118       14          |   12.66
1           31.27489       4           |   9.36
1           75.77619       23          |   13.33

X'P_ZX
146         7734.555    1736
7734.555    441143.7    104550.7
1736        104550.7    26504

X'P_ZY
1720.879    93921.62    21631.74

INV(X'P_ZX)
0.202911    -0.00626    0.011416
-0.00626    0.000228    -0.00049
0.011416    -0.00049    0.001222

Beta_2SLS
7.881995    0.055812    0.079741

SSE = 36.66316    s^2 = 0.256386

V^(Beta_2SLS)
0.052024    -0.00161    0.002927
-0.00161    5.85E-05    -0.00013
0.002927    -0.00013    0.000313

SE(B_2SLS)
0.228087    0.007648    0.017699

Robust Estimate of Variance of 2SLS Estimator

$$
V(\varepsilon_{2i}) = \sigma_{2i}^2, \qquad \Sigma = V(\varepsilon_2) = \begin{bmatrix} \sigma_{21}^2 & 0 & \cdots & 0 \\ 0 & \sigma_{22}^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{2n}^2 \end{bmatrix}
$$

$$
V(\hat{\beta}_{2SLS}) = V\left[ (X'P_Z X)^{-1} X'P_Z Y \right] = (X'P_Z X)^{-1} X'P_Z \Sigma P_Z X (X'P_Z X)^{-1} = (X'P_Z X)^{-1} X'Z (Z'Z)^{-1} Z'\Sigma Z (Z'Z)^{-1} Z'X (X'P_Z X)^{-1}
$$

Replacing Z'ΣZ with its estimator:

$$
\hat{S} = Z' \begin{bmatrix} e_{21}^2 & 0 & \cdots & 0 \\ 0 & e_{22}^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & e_{2n}^2 \end{bmatrix} Z = \sum_{i=1}^{n} e_{2i}^2 z_i z_i', \qquad e_{2i} = Y_{2i} - x_i' \hat{\beta}_{2SLS}
$$

$$
\hat{V}(\hat{\beta}_{2SLS}) = (X'P_Z X)^{-1} X'Z (Z'Z)^{-1} \hat{S} (Z'Z)^{-1} Z'X (X'P_Z X)^{-1}
$$

$$
Z = \begin{bmatrix} z_1' \\ z_2' \\ \vdots \\ z_n' \end{bmatrix}, \qquad X = \begin{bmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{bmatrix}
$$

Exactly the same method applies to equation 3.

Results for Model 2: Rank = f(Score, Events)

S-hat (as displayed, scaled by 1/n: the (1,1) element equals SSE/n = 1806.234/146)
12.37147    3102.712    850.8146    795.9452    462.1251    22.63899    638.5061    210.1805    125.0767
3102.712    778916.5    213337.1    199840.8    115696.1    5676.833    160482.6    52740.96    31513.16
850.8146    213337.1    58857.24    54843.46    31745.12    1556.302    44235.96    14519.52    8719.568
795.9452    199840.8    54843.46    51415.23    29684.13    1455.89     41635.56    13637.83    8236.469
462.1251    115696.1    31745.12    29684.13    18204.05    844.6313    24297.06    7958.194    4765.153
22.63899    5676.833    1556.302    1455.89     844.6313    41.4413     1164.82     384.0072    227.7275
638.5061    160482.6    44235.96    41635.56    24297.06    1164.82     36888.23    11763.14    7644.445
210.1805    52740.96    14519.52    13637.83    7958.194    384.0072    11763.14    3807.094    2383.913
125.0767    31513.16    8719.568    8236.469    4765.153    227.7275    7644.445    2383.913    1659.86

                          Homoskedastic Errors       Heteroskedastic Errors
             Beta_2SLS    SE(B_2SLS)    t            SE(B_2SLS)    t
Intercept    956.9965     25.3296       37.7818      23.4808       40.7566
Score        -12.5125     0.3338        -37.4806     0.3046        -41.0801
Events       0.2811       0.0901        3.1213       0.1053        2.6707

V(B_2SLS), Homoskedastic Errors
641.5866    -8.4432    -1.4715
-8.4432     0.1114     0.0181
-1.4715     0.0181     0.0081

V(B_2SLS), Heteroskedastic Errors
551.3468    -7.1334    -1.6873
-7.1334     0.0928     0.0202
-1.6873     0.0202     0.0111

Results for Model 3: ln(Prize) = f(Rank, Completed)

S-hat (as displayed, scaled by 1/n: the (1,1) element equals SSE/n = 36.66316/146)
0.251118    62.19775    17.16093    16.13331    9.791545    0.461502    13.36902    4.551543    2.369981
62.19775    15434.73    4247.927    4004.667    2420.563    114.2658    3342.212    1132.968    597.983
17.16093    4247.927    1179.906    1103.003    668.8473    31.54885    911.7178    310.2781    161.2776
16.13331    4004.667    1103.003    1042.178    627.3954    29.63322    874.5716    294.8918    158.2978
9.791545    2420.563    668.8473    627.3954    397.5982    17.99088    521.5487    177.865     91.50151
0.461502    114.2658    31.54885    29.63322    17.99088    0.848513    24.44991    8.342034    4.315347
13.36902    3342.212    911.7178    874.5716    521.5487    24.44991    812.2522    263.6976    157.1673
4.551543    1132.968    310.2781    294.8918    177.865     8.342034    263.6976    87.66156    49.09644
2.369981    597.983     161.2776    158.2978    91.50151    4.315347    157.1673    49.09644    34.27717

                          Homoskedastic Errors       Heteroskedastic Errors
             Beta_2SLS    SE(B_2SLS)    t            SE(B_2SLS)    t
Intercept    7.8820       0.2281        34.5570      0.2544        30.9870
Rank         0.0558       0.0076        7.2975       0.0086        6.5169
Completed    0.0797       0.0177        4.5054       0.0205        3.8845

V(B_2SLS), Homoskedastic Errors
0.05202     -0.00161    0.00293
-0.00161    0.00006     -0.00013
0.00293     -0.00013    0.00031

V(B_2SLS), Heteroskedastic Errors
0.06470     -0.00196    0.00359
-0.00196    0.00007     -0.00016
0.00359     -0.00016    0.00042

3-Stage Least Squares
• Extension of 2-Stage Least Squares that allows for a covariance structure among the errors of the system of equations
• Residuals from 2SLS are obtained and used to estimate the within-individual (golfer) variance-covariance structure among the equations
• The response vector is stacked, with the n responses from model 1 stacked over the n responses from model 2, which are stacked over the n responses from model 3.
• The X matrices are “blocked” out diagonally, with 0 matrices off the blocked diagonal

Model Description - I

$$
\begin{aligned}
\text{Model 1:} \quad Y_{1i} = \text{SCORE}_i &= \alpha_0 + \alpha_D D_i + \alpha_F F_i + \alpha_G G_i + \alpha_S S_i + \alpha_P P_i + \alpha_R R_i + \varepsilon_{1i} \\
\text{Model 2:} \quad Y_{2i} = \text{Rank}_i &= \beta_0 + \beta_{SCORE}\,\text{SCORE}_i + \beta_E E_i + \varepsilon_{2i} \\
\text{Model 3:} \quad Y_{3i} = \ln(\text{Prize}_i) &= \gamma_0 + \gamma_{RANK}\,\text{Rank}_i + \gamma_C C_i + \varepsilon_{3i}
\end{aligned}
$$

$$
Y = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix}, \qquad Y_k = \begin{bmatrix} Y_{k1} \\ Y_{k2} \\ \vdots \\ Y_{k,146} \end{bmatrix}, \quad k = 1, 2, 3
$$

$$
X_1 = \begin{bmatrix} 1 & D_1 & F_1 & G_1 & S_1 & P_1 & R_1 \\ 1 & D_2 & F_2 & G_2 & S_2 & P_2 & R_2 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & D_{146} & F_{146} & G_{146} & S_{146} & P_{146} & R_{146} \end{bmatrix}, \quad
X_2 = \begin{bmatrix} 1 & SC_1 & E_1 \\ 1 & SC_2 & E_2 \\ \vdots & \vdots & \vdots \\ 1 & SC_{146} & E_{146} \end{bmatrix}, \quad
X_3 = \begin{bmatrix} 1 & RA_1 & C_1 \\ 1 & RA_2 & C_2 \\ \vdots & \vdots & \vdots \\ 1 & RA_{146} & C_{146} \end{bmatrix}
$$

$$
X = \begin{bmatrix} X_1 & 0 & 0 \\ 0 & X_2 & 0 \\ 0 & 0 & X_3 \end{bmatrix}
$$

The residuals from the 2-Stage Least Squares regressions are

$$
e_{ki} = Y_{ki} - \hat{Y}_{ki}, \quad k = 1, 2, 3
$$

$$
S = \begin{bmatrix} S_{11} & S_{12} & S_{13} \\ S_{21} & S_{22} & S_{23} \\ S_{31} & S_{32} & S_{33} \end{bmatrix}, \qquad
S_{11} = \frac{1}{146 - 7} \sum_{i=1}^{146} e_{1i}^2, \qquad
S_{12} = \frac{1}{146 - (7 + 3)/2} \sum_{i=1}^{146} e_{1i} e_{2i}
$$

and so on for S_13, S_22, S_23, S_33.

$$
W = S^{-1} \otimes Z (Z'Z)^{-1} Z' = S^{-1} \otimes P_Z
$$

Model Description - II

$$
\hat{\beta}_{3SLS} = (X'WX)^{-1} X'WY = \left[ X' \left( S^{-1} \otimes Z (Z'Z)^{-1} Z' \right) X \right]^{-1} X' \left( S^{-1} \otimes Z (Z'Z)^{-1} Z' \right) Y
$$

$$
\hat{V}(\hat{\beta}_{3SLS}) = (X'WX)^{-1} = \left[ X' \left( S^{-1} \otimes Z (Z'Z)^{-1} Z' \right) X \right]^{-1}
$$

where:

$$
S^{-1} = \begin{bmatrix} S^{11} & S^{12} & S^{13} \\ S^{21} & S^{22} & S^{23} \\ S^{31} & S^{32} & S^{33} \end{bmatrix}, \qquad
W = \begin{bmatrix} S^{11} P_Z & S^{12} P_Z & S^{13} P_Z \\ S^{21} P_Z & S^{22} P_Z & S^{23} P_Z \\ S^{31} P_Z & S^{32} P_Z & S^{33} P_Z \end{bmatrix}
$$

$$
X'WX = \begin{bmatrix} S^{11} X_1' P_Z X_1 & S^{12} X_1' P_Z X_2 & S^{13} X_1' P_Z X_3 \\ S^{21} X_2' P_Z X_1 & S^{22} X_2' P_Z X_2 & S^{23} X_2' P_Z X_3 \\ S^{31} X_3' P_Z X_1 & S^{32} X_3' P_Z X_2 & S^{33} X_3' P_Z X_3 \end{bmatrix}
$$

$$
X'WY = \begin{bmatrix} S^{11} X_1' P_Z Y_1 + S^{12} X_1' P_Z Y_2 + S^{13} X_1' P_Z Y_3 \\ S^{21} X_2' P_Z Y_1 + S^{22} X_2' P_Z Y_2 + S^{23} X_2' P_Z Y_3 \\ S^{31} X_3' P_Z Y_1 + S^{32} X_3' P_Z Y_2 + S^{33} X_3' P_Z Y_3 \end{bmatrix}
$$

Estimation Results

Rows and columns are ordered EQ1 (7 coefficients), EQ2 (3 coefficients), EQ3 (3 coefficients).

X'WX
1554.094   389489     108250.1  101857    58810.75  2837.925   90861.28  -25.0786   -1822.6    -475.292   -152.235   -8064.85  -1810.13
389488.97  97734135   27100830  25559022  14728936  711073.22  22874738  -6285.237  -456663.2  -119415.2  -38153.28  -2030746  -457357.3
108250.1   27100830   7588745   7104946   4096401   197673.2   6343826   -1746.85   -126916    -33121.7   -10603.9   -564282   -126816
101857     25559022   7104946   6705381   3853879   185907.4   6033962   -1643.68   -119354    -31362.4   -9977.64   -536619   -121572
58810.75   14728936   4096401   3853879   2319387   107296.4   3488898   -949.037   -68922.7   -18155.7   -5760.94   -308831   -69983.9
2837.925   711073.2   197673.2  185907.4  107296.4  5184.691   165259.8  -45.796    -3329.07   -866.152   -277.995   -14662.7  -3281.32
90861.28   22874738   6343826   6033962   3488898   165259.8   5838242   -1466.24   -106149    -29567.5   -8900.52   -506396   -122817
-25.0786   -6285.24   -1746.85  -1643.68  -949.037  -45.796    -1466.24  13.9931    1016.952   265.1981   39.31672   2082.859  467.492
-1822.6    -456663    -126916   -119354   -68922.7  -3329.07   -106149   1016.952   73924.34   19235.18   2857.353   150742    33733.38
-475.292   -119415    -33121.7  -31362.4  -18155.7  -866.152   -29567.5  265.1981   19235.18   5260.544   745.1327   41000.96  9711.5
-152.235   -38153.3   -10603.9  -9977.64  -5760.94  -277.995   -8900.52  39.31672   2857.353   745.1327   684.3542   36254.62  8137.252
-8064.85   -2030746   -564282   -536619   -308831   -14662.7   -506396   2082.859   150742     41000.96   36254.62   2067798   490066.5
-1810.13   -457357    -126816   -121572   -69983.9  -3281.32   -122817   467.492    33733.38   9711.5     8137.252   490066.5  124233.7

V(Beta_3SLS)
4.234876   -0.00573   -0.00541   0.004066   -0.00191   -1.39939   -0.00103   1.641475  -0.02114   -0.00549   -0.0096    0.000355   -0.00077
-0.00573   2.047E-05  1.756E-05  -2.52E-05  2.809E-06  0.0005153  -3.54E-07  0.000709  -9.32E-06  -1.67E-06  1.378E-06  -4.38E-08  7.95E-08
-0.00541   1.76E-05   3.76E-05   -3.2E-05   1.76E-06   0.000212   3.51E-07   0.002615  -3.4E-05   -7.2E-06   -3.5E-07   1.73E-09   2.15E-08
0.004066   -2.5E-05   -3.2E-05   9.56E-05   2.35E-06   -0.00073   -9.6E-06   0.014586  -0.00019   -4E-05     -0.00012   4.31E-06   -8.9E-06
-0.00191   2.81E-06   1.76E-06   2.35E-06   1.22E-05   0.00031    -1.7E-06   0.001482  -2E-05     -3E-06     -1.4E-05   5.59E-07   -1.3E-06
-1.39939   0.000515   0.000212   -0.00073   0.00031    0.680887   0.000829   -1.53791  0.020033   0.004326   0.00906    -0.00032   0.000649
-0.00103   -3.5E-07   3.51E-07   -9.6E-06   -1.7E-06   0.000829   4.68E-06   -0.00349  4.24E-05   2.17E-05   2.43E-05   -1.2E-06   3.26E-06
1.641475   0.000709   0.002615   0.014586   0.001482   -1.53791   -0.00349   631.5903  -8.32114   -1.41211   2.067496   -0.06317   0.107152
-0.02114   -9.3E-06   -3.4E-05   -0.00019   -2E-05     0.020033   4.24E-05   -8.32114  0.109957   0.017411   -0.02734   0.000817   -0.00134
-0.00549   -1.7E-06   -7.2E-06   -4E-05     -3E-06     0.004326   2.17E-05   -1.41211  0.017411   0.007743   -0.00451   0.0002     -0.00051
-0.0096    1.38E-06   -3.5E-07   -0.00012   -1.4E-05   0.00906    2.43E-05   2.067496  -0.02734   -0.00451   0.051242   -0.00157   0.002827
0.000355   -4.4E-08   1.73E-09   4.31E-06   5.59E-07   -0.00032   -1.2E-06   -0.06317  0.000817   0.0002     -0.00157   5.67E-05   -0.00012
-0.00077   7.95E-08   2.15E-08   -8.9E-06   -1.3E-06   0.000649   3.26E-06   0.107152  -0.00134   -0.00051   0.002827   -0.00012   0.0003

X'WY
109821.2  27513834  7646744  7189554  4151912  200612.5  6386177  -617.871  -45213.8  -10931.8  -914.47  -24634  -1065.64

Equation   Predictor    Beta_3SLS    StdErr
EQ1        Intercept    60.66021     2.057881
EQ1        Drive        -0.00305     0.004524
EQ1        Fairway      -0.01449     0.006129
EQ1        Green        -0.1377      0.009775
EQ1        sandsv       -0.01484     0.003499
EQ1        Putts        13.09106     0.825158
EQ1        Rounds       -0.00905     0.002163
EQ2        Intercept    954.3673     25.13146
EQ2        Score        -12.4821     0.331598
EQ2        Events       0.303414     0.087992
EQ3        Intercept    7.96384      0.226367
EQ3        Rank         0.050763     0.007529
EQ3        Completed    0.095351     0.017317

SAS Program

    /* Read the data and compute log prize winnings */
    data lpga2009;
      infile 'lpga2009.dat';
      input golfer drive fairway green putts sandsv prize lnprize events
            girputts complete aveposrank rounds strokes;
      lnprize1=log(prize);
    run;

    /* 2-Stage Least Squares for the three-equation system */
    proc syslin 2sls data=lpga2009 out=regout;
      instruments drive fairway green girputts sandsv rounds events complete;
      strokes: model strokes = drive fairway green girputts sandsv rounds;
        output residual=e1;
      rank: model aveposrank = strokes events;
        output residual=e2;
      prize: model lnprize1 = aveposrank complete;
        output residual=e3;
    run;

    /* 3-Stage Least Squares; xpx prints the cross-product matrices */
    proc syslin 3sls data=lpga2009 itprint out=regout3;
      instruments drive fairway green girputts sandsv rounds events complete;
      strokes: model strokes = drive fairway green girputts sandsv rounds / xpx;
        output residual=e1;
      rank: model aveposrank = strokes events / xpx;
        output residual=e2;
      prize: model lnprize1 = aveposrank complete / xpx;
        output residual=e3;
    run;

STATA Program

    * Read the data and compute log prize winnings
    insheet using lpga_2009_meq.csv
    generate lnprize=ln(prize)

    * 2-Stage Least Squares
    reg3 (avestrokes=drive fairway green sandsvpct girputtshole rounds) ///
         (averagepospct=avestrokes events) (lnprize=averagepospct completed), ///
         2sls

    * 3-Stage Least Squares
    reg3 (avestrokes=drive fairway green sandsvpct girputtshole rounds) ///
         (averagepospct=avestrokes events) (lnprize=averagepospct completed), ///
         3sls
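
As a quick arithmetic check on the Model 2 matrix results reported earlier, the sketch below recomputes s^2, the homoskedastic standard errors, and the t statistics from the printed SSE and the printed diagonal of INV(X'P_ZX). It also rescales the naive second-stage standard error of the intercept (29.14945, from the regression on Score-hat) by the ratio of the two error variance estimates. The variable names are hypothetical, and because the inputs are the rounded values shown in the slides, agreement is only to about three or four significant digits.

```python
import math

# Values copied from the Model 2 slides (rounded as printed there).
n, k = 146, 3                                   # golfers; columns of X
sse = 1806.234                                  # SSE using actual Score
inv_xpzx_diag = [50.79456, 0.008823, 0.000642]  # diagonal of INV(X'P_ZX)
beta = [956.9965, -12.5125, 0.281134]           # Beta_2SLS

# Correct error variance and homoskedastic standard errors.
s2 = sse / (n - k)
se = [math.sqrt(s2 * d) for d in inv_xpzx_diag]
t = [b / s for b, s in zip(beta, se)]

# The naive second-stage fit uses residuals from Score-hat instead of Score.
naive_sse = 2392.1                              # residual SS, Model 2 ANOVA
naive_s2 = naive_sse / (n - k)
naive_se_intercept = 29.14945
rescaled_se_intercept = naive_se_intercept * math.sqrt(s2 / naive_s2)

print(f"s^2 = {s2:.5f} (naive: {naive_s2:.5f})")
for name, b, s, tv in zip(["Intercept", "Score", "Events"], beta, se, t):
    print(f"{name:10s} beta = {b:10.4f}  SE = {s:8.4f}  t = {tv:9.4f}")
print(f"rescaled naive SE(intercept) = {rescaled_se_intercept:.5f}")
```

The rescaling works because the second-stage cross-product matrix of the fitted predictors equals X'P_ZX, so the naive and corrected variance estimates differ only through the estimate of the error variance.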