3SLS

3SLS is the combination of 2SLS and SUR. It is used in a system of equations which is endogenous, i.e. in each equation there are endogenous variables on both the left and right hand sides. THAT IS THE 2SLS PART. But the error terms in the different equations are also correlated, and efficient estimation requires that we take account of this. THAT IS THE SUR (SEEMINGLY UNRELATED REGRESSIONS) PART. Hence in the regression for the ith equation there are endogenous (Y) variables on the rhs AND the error term is correlated with the error terms in the other equations.

log using "g:summ1.log"

If you type the above then a log is created on drive g (on my computer this is the flash drive; on yours you may need to specify another drive). The name summ1 can be anything, but the suffix must be .log. At the end you can close the log by typing:

log close

So open a log now and you will have a record of this session.

Load Data

clear
use http://www.ats.ucla.edu/stat/stata/examples/greene/TBL16-2

That link no longer works, but the following does:

webuse klein

In order to get the rest to work:

rename consump c
rename capital1 k1
rename invest i
rename profits p
rename govt g
rename wagegovt wg
rename taxnetx t
rename wagepriv wp

*generate variables
generate x=totinc
generate w = wg+wp
generate k = k1+i
generate yr=year-1931
generate p1 = p[_n-1]
generate x1 = x[_n-1]

OLS Regression

regress c p p1 w

This regresses c on p, p1 and w (what this equation means is not so important). The usual output:

      Source |       SS       df       MS              Number of obs =     21
-------------+------------------------------           F(  3,    17) = 292.71
       Model |  923.549937     3  307.849979           Prob > F      = 0.0000
    Residual |  17.8794524    17  1.05173249           R-squared     = 0.9810
-------------+------------------------------           Adj R-squared = 0.9777
       Total |  941.429389    20  47.0714695           Root MSE      = 1.0255

           c |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |   .1929343   .0912102     2.12   0.049     .0004977     .385371
          p1 |   .0898847   .0906479     0.99   0.335    -.1013658    .2811351
           w |   .7962188   .0399439    19.93   0.000     .7119444    .8804931
       _cons |    16.2366   1.302698    12.46   0.000     13.48815    18.98506

reg3

With the command reg3, Stata estimates a system of structural equations in which some equations contain endogenous variables among the explanatory variables. Estimation is via three-stage least squares (3SLS). Typically, the endogenous regressors are dependent variables from other equations in the system. In addition, reg3 can also estimate systems of equations by seemingly unrelated regression (SURE), multivariate regression (MVREG), and equation-by-equation ordinary least squares (OLS) or two-stage least squares (2SLS).

2SLS Regression

reg3 (c p p1 w), 2sls inst(t wg g yr p1 x1 k1)

This regresses c on p, p1 and w. The instruments (i.e. the predetermined or exogenous variables in this equation and in the rest of the system) are t wg g yr p1 x1 k1. This means that p and w (which are not included in the instruments) are endogenous. The output is as before, but it confirms what the exogenous and endogenous variables are.

Two-stage least-squares regression
----------------------------------------------------------------------
Equation             Obs  Parms        RMSE    "R-sq"     F-Stat      P
----------------------------------------------------------------------
c                     21      3    1.135659    0.9767     225.93  0.0000
----------------------------------------------------------------------

             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
c            |
           p |   .0173022   .1312046     0.13   0.897    -.2595153    .2941197
          p1 |   .2162338   .1192217     1.81   0.087    -.0353019    .4677696
           w |   .8101827   .0447351    18.11   0.000        .7158    .9045654
       _cons |   16.55476   1.467979    11.28   0.000     13.45759    19.65192

Endogenous variables:  c p w
Exogenous variables:   t wg g yr p1 x1 k1
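To see what the "two stages" of 2SLS actually are, we can do them by hand. This is only a sketch for illustration (the fitted-value names phat and what are invented here): each endogenous regressor is first regressed on the full instrument set, and the fitted values then replace the endogenous regressors in the second stage. The second-stage coefficients should match the reg3 estimates above, although the reported standard errors will not, because they lack the correction that the canned command applies.

* first stage: regress each endogenous regressor on all the instruments
regress p t wg g yr p1 x1 k1
predict phat                      // fitted values of p
regress w t wg g yr p1 x1 k1
predict what                      // fitted values of w
* second stage: replace p and w by their fitted values
regress c phat p1 what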
2SLS Regression

ivreg c p1 (p w = t wg g yr p1 x1 k1)

This is an alternative command to do the same thing (in more recent versions of Stata, ivreg has been superseded by ivregress). Note that the endogenous variables on the right hand side of the equation are specified in (p w = ...), and the instruments follow the = sign. The results are identical.

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     21
-------------+------------------------------           F(  3,    17) = 225.93
       Model |  919.504138     3  306.501379           Prob > F      = 0.0000
    Residual |  21.9252518    17  1.28972069           R-squared     = 0.9767
-------------+------------------------------           Adj R-squared = 0.9726
       Total |  941.429389    20  47.0714695           Root MSE      = 1.1357

           c |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |   .0173022   .1312046     0.13   0.897    -.2595153    .2941197
           w |   .8101827   .0447351    18.11   0.000        .7158    .9045654
          p1 |   .2162338   .1192217     1.81   0.087    -.0353019    .4677696
       _cons |   16.55476   1.467979    11.28   0.000     13.45759    19.65192

Instrumented:  p w
Instruments:   p1 t wg g yr x1 k1

3SLS Regression

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)

This format does two new things. First, it specifies all three equations in the system. Note that it has to do this because reg3 needs to calculate the covariances between the error terms, and for this it needs to know what the equations, and hence the errors, are. Secondly, it says 3sls, not 2sls. All three equations are printed out. This tells us what these equations look like.

Three-stage least-squares regression
----------------------------------------------------------------------
Equation             Obs  Parms        RMSE    "R-sq"       chi2      P
----------------------------------------------------------------------
c                     21      3    .9443305    0.9801     864.59  0.0000
i                     21      3    1.446736    0.8258     162.98  0.0000
wp                    21      3    .7211282    0.9863    1594.75  0.0000
----------------------------------------------------------------------

             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
c            |
           p |   .1248904   .1081291     1.16   0.248    -.0870387    .3368194
          p1 |   .1631439   .1004382     1.62   0.104    -.0337113    .3599992
           w |    .790081   .0379379    20.83   0.000      .715724    .8644379
       _cons |   16.44079   1.304549    12.60   0.000     13.88392    18.99766
-------------+----------------------------------------------------------------
i            |
           p |  -.0130791   .1618962    -0.08   0.936    -.3303898    .3042316
          p1 |   .7557238   .1529331     4.94   0.000     .4559805    1.055467
          k1 |  -.1948482   .0325307    -5.99   0.000    -.2586072   -.1310893
       _cons |   28.17785   6.793768     4.15   0.000     14.86231    41.49339
-------------+----------------------------------------------------------------
wp           |
           x |   .4004919   .0318134    12.59   0.000     .3381388     .462845
          x1 |    .181291   .0341588     5.31   0.000     .1143411    .2482409
          yr |    .149674   .0279352     5.36   0.000      .094922    .2044261
       _cons |   1.797216   1.115854     1.61   0.107    -.3898181    3.984251

Endogenous variables:  c p w i wp x
Exogenous variables:   t wg g yr p1 x1 k1

Let's compare the three different sets of estimates. Look at the coefficient on p. In OLS it is significant (t = 2.12); in 2SLS it is close to zero and clearly insignificant (t = 0.13); in 3SLS the estimate moves back towards the OLS value, though it remains insignificant (t = 1.16). That is odd: I would expect that if 2SLS differs from OLS because of bias, then so should 3SLS. As it stands it suggests that OLS is closer to 3SLS than 2SLS is to 3SLS, which does not make an awful lot of sense. But we do not have many observations; perhaps that is partly why.

                 3SLS              2SLS              OLS
              coef    t stat    coef    t stat    coef    t stat
p            0.125     1.16    0.017     0.13    0.193     2.12
p1           0.163     1.62    0.216     1.81    0.090     0.99
w            0.790    20.83    0.810    18.11    0.796    19.93
_cons       16.441    12.60   16.555    11.28   16.237    12.46
R2            0.98              0.977             0.981

3SLS Regression

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)

The matrix command stores the variances and covariances between the error terms in a matrix I call sig. You have used generate to generate variables and scalar to generate scalars; similarly, matrix produces a matrix. e(Sigma) holds the variance-covariance matrix of the errors from the previous regression.
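As an aside, rather than displaying the elements one at a time, as is done below, the whole of sig can be printed in one go with the matrix list command:

matrix list sig      // prints the full 3 x 3 Sigma-hat matrix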
The individual elements can be displayed row by row:

. display sig[1,1], sig[1,2], sig[1,3]
1.0440596 .43784767 -.3852272

. display sig[2,1], sig[2,2], sig[2,3]
.43784767 1.3831832 .19260612

. display sig[3,1], sig[3,2], sig[3,3]
-.3852272 .19260612 .47642626

Written out as a matrix:

 1.04406   0.437848  -0.38523
 0.437848  1.383183   0.192606
-0.38523   0.192606   0.476426

The [1,1] element (1.04406) is the variance of the first error term, and the [2,3] element (0.192606) is the covariance of the error terms from equations 2 and 3. This relates to the variance-covariance matrix in the lecture: 0.437848 relates to σ12 and of course σ21. This matrix is the estimate of Σ.

3SLS Regression

display sig[1,2]/( sig[1,1]^0.5* sig[2,2]^0.5)

This should give the correlation between the error terms from equations 1 and 2, using the formula Correlation(x, y) = σxy/(σx σy). When we do this we get:

. display sig[1,2]/( sig[1,1]^0.5* sig[2,2]^0.5)
.36435149

Let's check

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)
matrix cy= e(b)
generate rc=c-(cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4])
generate ri=i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8])
correlate ri rc

matrix cy= e(b) stores the coefficients from the regression in a row vector we call cy. cy[1,1] is the first coefficient (on p) in the first equation; cy[1,4] is the fourth coefficient in the first equation (the constant term); cy[1,5] is the first coefficient (on p) in the second equation. Note this is cy[1,5], NOT cy[2,1].

Thus cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4] is the predicted value from the first equation, and i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8]) is the actual minus the predicted value, i.e. the residual from the second equation. correlate ri rc prints out the correlation between the two sets of residuals.

. correlate ri rc
(obs=21)

             |       ri       rc
-------------+------------------
          ri |   1.0000
          rc |   0.3011   1.0000

The correlation is 0.30, close to what we had before, but not the same. Now the main purpose of this class is to illustrate commands, so this is not too important. I think it could be because Stata is not calculating the e(Sigma) matrix by dividing by n-k, but just by n.
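Two quick checks can be run at this point, sketched on the assumption that sig and rc from above are still in memory (the matrix name CC and the variable rc2 are invented for the sketch). First, the corr() matrix function converts a covariance matrix into a correlation matrix, so its [2,1] element should reproduce the .36435149 computed above. Second, note that a divisor common to all elements (n or n-k) cancels when a correlation is formed, so the divisor alone cannot explain the gap; comparing sig[1,1] with the raw mean of the squared residuals shows whether e(Sigma) matches the final 3SLS residuals at all, or was instead computed from the residuals of an earlier estimation step.

* correlation matrix implied by sig
matrix CC = corr(sig)
matrix list CC
* raw mean of the squared equation-1 residuals (divisor n)
generate rc2 = rc^2
quietly summarize rc2
display "e'e/n         = " r(mean)
display "e(Sigma)[1,1] = " sig[1,1]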
Let's check in the documentation. Click on Help (on the toolbar at the top of the screen, to the right), click on 'Stata command', and in the dialogue box type reg3. Move down towards the end of the file and you get the following.

Saved results

reg3 saves the following in e():

Scalars
  e(N)          number of observations
  e(k)          number of parameters
  e(k_eq)       number of equations
  e(mss_#)      model sum of squares for equation #
  e(df_m#)      model degrees of freedom for equation #
  e(rss_#)      residual sum of squares for equation #
  e(df_r)       residual degrees of freedom (small)
  e(r2_#)       R-squared for equation #
  e(F_#)        F statistic for equation # (small)
  e(rmse_#)     root mean squared error for equation #
  e(dfk2_adj)   divisor used with VCE when dfk2 specified
  e(ll)         log likelihood
  e(chi2_#)     chi-squared for equation #
  e(p_#)        significance for equation #
  e(ic)         number of iterations
  e(cons_#)     1 when equation # has a constant; 0 otherwise

Some important retrievables:

  e(mss_#)      model sum of squares for equation #
  e(rss_#)      residual sum of squares for equation #
  e(r2_#)       R-squared for equation #
  e(F_#)        F statistic for equation # (small)
  e(rmse_#)     root mean squared error for equation #
  e(ll)         log likelihood

where # is a number, e.g. 2 means equation 2.

Matrices
  e(b)          coefficient vector
  e(Sigma)      Sigma hat matrix
  e(V)          variance-covariance matrix of the estimators

The Hausman Test Again

We looked at this with respect to panel data, but it is a general test that allows us to compare an equation estimated by two different techniques. Here we apply it to comparing OLS with 3SLS. Below we run the three regressions specifying ols and store the results as EQNols; then we run them specifying 3sls and store the results as EQN3sls; then we do the Hausman test.

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), ols
est store EQNols
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
est store EQN3sls
hausman EQNols EQN3sls

The Results

. hausman EQNols EQN3sls

                 ---- Coefficients ----
             |     (b)          (B)          (b-B)     sqrt(diag(V_b-V_B))
             |   EQNols      EQN3sls      Difference         S.E.
-------------+-------------------------------------------------------------
           p |  .1929343     .1248904      .068044            .
          p1 |  .0898847     .1631439     -.0732592           .
           w |  .7962188      .790081      .0061378        .0124993
-----------------------------------------------------------------------------
     b = consistent under Ho and Ha; obtained from reg3
     B = inconsistent under Ha, efficient under Ho; obtained from reg3

Test:  Ho:  difference in coefficients not systematic

          chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                  =  0.06
        Prob>chi2 =  0.9963
        (V_b-V_B is not positive definite)

The table prints out the two sets of coefficients and their difference. The Hausman test statistic is 0.06 and its significance level is 0.9963. This is clearly very far from being significant at the 10% level. Hence it would appear that the coefficients from the two regressions are not significantly different. If OLS were giving biased estimates which 3SLS corrects, they would be different. Hence we would conclude that there is no endogeneity here which requires simultaneous equations techniques. But because the error terms do appear to be correlated, SUR is probably the appropriate technique, as it exploits that correlation to give more efficient estimates.
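The hausman command is simply implementing the formula in the output, chi2 = (b-B)'[(V_b-V_B)^(-1)](b-B). A hand-rolled sketch for the three equation-1 slope coefficients follows (the matrix names b_ols, bd, vd, H etc. are invented; and because V_b-V_B is not positive definite here, the generalized inverse may give a slightly different number from the canned command). It assumes EQNols and EQN3sls were stored as above.

est restore EQNols
matrix b_ols = e(b)
matrix V_ols = e(V)
est restore EQN3sls
matrix b_3sls = e(b)
matrix V_3sls = e(V)
* difference in the coefficients on p, p1 and w in the first equation
matrix bd = b_ols[1,1..3] - b_3sls[1,1..3]
matrix vd = V_ols[1..3,1..3] - V_3sls[1..3,1..3]
matrix H = bd*invsym(vd)*bd'
display "Hausman chi2(3) = " H[1,1]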
Tasks

1. Using the display command, e.g.

display e(mss_2)

print on the screen some of the retrievables from each regression (the example above prints the model sum of squares for the second equation).

2. Let's look at the display command. Type:

display "The model sum of squares =" e(mss_2)

and then try the following variants, which add a second retrievable and control the layout with _column() and _skip():

display "The model sum of squares =" e(mss_2), "and the R2 =" e(r2_2)
display _column(20) "The model sum of squares =" e(mss_2), _column(50) "and the R2 =" e(r2_2)
display _column(20) "The model sum of squares =" e(mss_2), _column(60) "and the R2 =" e(r2_2)
display _column(20) "The model sum of squares =" e(mss_2), _column(60) "and the R2 =" _skip(5) e(r2_2)
display _column(20) "The model sum of squares =" e(mss_2), _column(60) "and the R2 =" _skip(10) e(r2_2)

Finally, close the log:

log close

and have a look at it in Word.

For reference, the commands used in this session were:

webuse klein
rename consump c
rename capital1 k1
rename invest i
rename profits p
rename govt g
rename wagegovt wg
rename taxnetx t
rename wagepriv wp
generate x=totinc
generate w = wg+wp
generate k = k1+i
generate yr=year-1931
generate p1 = p[_n-1]
generate x1 = x[_n-1]
reg3 (c p p1 w), 2sls inst(t wg g yr p1 x1 k1)
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
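Given the conclusion from the Hausman test (no evidence of endogeneity, but error terms that are correlated across equations), a natural follow-up is to estimate the system by SUR. A minimal sketch, assuming the same three equations (note that no instrument list is needed, since nothing is now treated as endogenous):

* SUR via the dedicated command
sureg (c p p1 w) (i p p1 k1) (wp x x1 yr)
* or equivalently through reg3's sure option
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), sure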