
3SLS
3SLS combines 2SLS and SUR.
It is used in a system of simultaneous equations, i.e. in each
equation there are endogenous variables on both the left and right hand
sides of the equation. THAT IS THE 2SLS PART.
But the error terms in the equations are also correlated. Efficient
estimation requires that we take account of this. THAT IS THE SUR
(SEEMINGLY UNRELATED REGRESSIONS) PART.
Hence in the regression for the ith equation there are endogenous (Y)
variables on the rhs AND the error term is correlated with the error terms in
the other equations.
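Before turning to the Stata commands, the instrumental-variables idea behind the 2SLS part can be sketched in a few lines of code. This is a hedged pure-Python illustration with invented toy data (the function name and numbers are made up for this sketch; it is not the Klein model used below):

```python
# Minimal sketch of the instrumental-variables idea behind 2SLS.
# With one endogenous regressor x and one instrument z (and no constant),
# the simple IV estimator is beta = sum(z*y) / sum(z*x).

def iv_beta(z, x, y):
    """Simple IV estimator for y = beta*x + u, instrumenting x with z."""
    return sum(zi * yi for zi, yi in zip(z, y)) / sum(zi * xi for zi, xi in zip(z, x))

# Toy data built so the true slope is 3: x = 2*z, y = 3*x.
z = [1.0, 2.0, 3.0, 4.0]
x = [2.0 * zi for zi in z]
y = [3.0 * xi for xi in x]

print(iv_beta(z, x, y))  # 3.0
```

With real data x would be correlated with the error term, and the point of instrumenting is that z is not; the estimator above still recovers the slope in that case.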
3SLS
log using "g:summ1.log"
If you type the above then a log is created on drive g
(on my computer this is the flash drive; on yours you
may need to specify another drive).
The name summ1 can be anything. But the suffix
must be log.
At the end you can close the log by typing:
log close
So open a log now and you will have a record of this
session.
3SLS Load Data
clear
use http://www.ats.ucla.edu/stat/stata/examples/greene/TBL16-2
THAT link no longer works. But the following does
webuse klein
In order to get the rest to work
rename consump c
rename capital1 k1
rename invest i
rename profits p
rename govt g
rename wagegovt wg
rename taxnetx t
rename wagepriv wp
generate x=totinc
*generate variables
generate w = wg+wp
generate k = k1+i
generate yr=year-1931
generate p1 = p[_n-1]
generate x1 = x[_n-1]
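The last two generate lines create one-period lags (p1 is last year's p); in Stata the first observation of a lag is missing. A small Python sketch of the same operation, with hypothetical values:

```python
def lag(series):
    """One-period lag: each value moves down one place; the first
    observation has no predecessor, so it becomes None (Stata shows '.')."""
    return [None] + series[:-1]

p = [12.7, 12.4, 16.9, 18.4]  # hypothetical values of p
p1 = lag(p)
print(p1)  # [None, 12.7, 12.4, 16.9]
```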
OLS Regression
regress c p p1 w
Regresses c on p, p1 and w (what this equation means is not so
important).
Usual output
      Source |       SS           df       MS      Number of obs   =        21
-------------+----------------------------------   F(3, 17)        =    292.71
       Model |  923.549937         3  307.849979   Prob > F        =    0.0000
    Residual |  17.8794524        17  1.05173249   R-squared       =    0.9810
-------------+----------------------------------   Adj R-squared   =    0.9777
       Total |  941.429389        20  47.0714695   Root MSE        =    1.0255

           c |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |   .1929343   .0912102     2.12   0.049     .0004977     .385371
          p1 |   .0898847   .0906479     0.99   0.335    -.1013658    .2811351
           w |   .7962188   .0399439    19.93   0.000     .7119444    .8804931
       _cons |    16.2366   1.302698    12.46   0.000     13.48815    18.98506
reg3
With the reg3 command, Stata estimates a system of structural
equations, where some equations contain endogenous variables
among the explanatory variables. Estimation is via three-stage
least squares (3SLS). Typically, the endogenous regressors are
dependent variables from other equations in the system.
In addition, reg3 can estimate systems of equations by
seemingly unrelated regression (SURE), multivariate regression
(MVREG), and equation-by-equation ordinary least squares
(OLS) or two-stage least squares (2SLS).
2SLS Regression
reg3 (c p p1 w), 2sls inst(t wg g yr p1 x1 k1)
Regresses c on p, p1 and w. The instruments (i.e. the predetermined
or exogenous variables in this equation and the rest of the system) are
t wg g yr p1 x1 k1.
This means that p and w (which are not included in the instruments)
are endogenous.
The output is as before, but it confirms what the exogenous and
endogenous variables are.

Two-stage least-squares regression
----------------------------------------------------------------------
Equation          Obs  Parms        RMSE    "R-sq"     F-Stat        P
----------------------------------------------------------------------
c                  21      3    1.135659    0.9767     225.93   0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
c            |
           p |   .0173022   .1312046     0.13   0.897    -.2595153    .2941197
          p1 |   .2162338   .1192217     1.81   0.087    -.0353019    .4677696
           w |   .8101827   .0447351    18.11   0.000        .7158    .9045654
       _cons |   16.55476   1.467979    11.28   0.000     13.45759    19.65192
------------------------------------------------------------------------------
Endogenous variables:  c p w
Exogenous variables:   t wg g yr p1 x1 k1
2SLS Regression
ivreg c p1 (p w = t wg g yr p1 x1 k1)
This is an alternative command to do the same thing. Note that the
endogenous variables on the right hand side of the equation are
specified in the brackets before the = sign, and the instruments
follow the = sign.
The results are identical.
Instrumental variables (2SLS) regression
      Source |       SS           df       MS      Number of obs   =        21
-------------+----------------------------------   F(3, 17)        =    225.93
       Model |  919.504138         3  306.501379   Prob > F        =    0.0000
    Residual |  21.9252518        17  1.28972069   R-squared       =    0.9767
-------------+----------------------------------   Adj R-squared   =    0.9726
       Total |  941.429389        20  47.0714695   Root MSE        =    1.1357

           c |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |   .0173022   .1312046     0.13   0.897    -.2595153    .2941197
           w |   .8101827   .0447351    18.11   0.000        .7158    .9045654
          p1 |   .2162338   .1192217     1.81   0.087    -.0353019    .4677696
       _cons |   16.55476   1.467979    11.28   0.000     13.45759    19.65192
-------------+----------------------------------------------------------------
Instrumented:  p w
Instruments:   p1 t wg g yr x1 k1
3SLS Regression
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
This format does two new things. First it specifies all three
equations in the system. It has to do this because it needs to
calculate the covariances between the error terms, and for this it needs
to know what the equations – and hence the errors – are.
Secondly it says 3sls, not 2sls.
All three equations are printed out. This tells us
what these equations look like.
Three-stage least-squares regression
----------------------------------------------------------------------
Equation          Obs  Parms        RMSE    "R-sq"       chi2        P
----------------------------------------------------------------------
c                  21      3    .9443305    0.9801     864.59   0.0000
i                  21      3    1.446736    0.8258     162.98   0.0000
wp                 21      3    .7211282    0.9863    1594.75   0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
c            |
           p |   .1248904   .1081291     1.16   0.248    -.0870387    .3368194
          p1 |   .1631439   .1004382     1.62   0.104    -.0337113    .3599992
           w |    .790081   .0379379    20.83   0.000      .715724    .8644379
       _cons |   16.44079   1.304549    12.60   0.000     13.88392    18.99766
-------------+----------------------------------------------------------------
i            |
           p |  -.0130791   .1618962    -0.08   0.936    -.3303898    .3042316
          p1 |   .7557238   .1529331     4.94   0.000     .4559805    1.055467
          k1 |  -.1948482   .0325307    -5.99   0.000    -.2586072   -.1310893
       _cons |   28.17785   6.793768     4.15   0.000     14.86231    41.49339
-------------+----------------------------------------------------------------
wp           |
           x |   .4004919   .0318134    12.59   0.000     .3381388     .462845
          x1 |    .181291   .0341588     5.31   0.000     .1143411    .2482409
          yr |    .149674   .0279352     5.36   0.000      .094922    .2044261
       _cons |   1.797216   1.115854     1.61   0.107    -.3898181    3.984251
------------------------------------------------------------------------------
Endogenous variables:  c p w i wp x
Exogenous variables:   t wg g yr p1 x1 k1
Let's compare the three different sets of estimates. Look at the coefficient on
p: in OLS it is significant, in 2SLS it is essentially zero and insignificant,
but in 3SLS it is back close to the OLS value. That is odd.
Now I would expect that if 2SLS is different because of bias then so should
3SLS be. As it stands it suggests that OLS is closer to 3SLS than 2SLS is to
3SLS, which does not make an awful lot of sense.
But we do not have many observations. Perhaps that is partly why.
                3SLS                2SLS                OLS
          coefficient  t stat  coefficient  t stat  coefficient  t stat
p            0.125      1.16      0.017      0.13      0.193      2.12
p1           0.163      1.62      0.216      1.81      0.090      0.99
w            0.790     20.83      0.810     18.11      0.796     19.93
_cons       16.441     12.6      16.555     11.28     16.237     12.46
R2           0.98                 0.977               0.981
3SLS Regression
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)
This command stores the variances and covariances of the
error terms in a matrix I call sig.
You have used generate to generate variables and scalar to generate
scalars; similarly, matrix produces a matrix.
e(Sigma) stores this variance-covariance matrix from the previous
regression.
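What e(Sigma) contains can be mimicked by hand. A sketch in Python with toy residual series (not the Klein-model residuals); dividing by n rather than n-k is an assumption about what Stata does, a point the notes return to below:

```python
def sigma_hat(residuals):
    """Cross-moment matrix sum(e_i * e_j) / n for a list of residual series.
    Dividing by n (not n - k) is an assumption about Stata's e(Sigma)."""
    n = len(residuals[0])
    return [[sum(a * b for a, b in zip(ei, ej)) / n for ej in residuals]
            for ei in residuals]

# Toy residual series from two equations.
e1 = [1.0, -1.0, 2.0, -2.0]
e2 = [0.5, -0.5, 1.0, -1.0]
print(sigma_hat([e1, e2]))  # [[2.5, 1.25], [1.25, 0.625]]
```

The diagonal entries are the error variances, the off-diagonal entries the covariances, exactly the layout displayed from sig below.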
3SLS Regression
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)
. display sig[1,1], sig[1,2], sig[1,3]
1.0440596 .43784767 -.3852272

. display sig[2,1], sig[2,2], sig[2,3]
.43784767 1.3831832 .19260612

. display sig[3,1], sig[3,2], sig[3,3]
-.3852272 .19260612 .47642626

1.04406 is the variance of the 1st error term; .19260612 is the
covariance of the error terms from equations 2 and 3.
3SLS Regression
The full matrix is:

 1.04406   0.437848  -0.38523
 0.437848  1.383183   0.192606
-0.38523   0.192606   0.476426

This relates to the variance-covariance matrix in the lecture.
Hence 0.437848 relates to σ12 and, of course, σ21.
This matrix is Σ.
3SLS Regression
display sig[1,2]/(sig[1,1]^0.5 * sig[2,2]^0.5)
This should give the correlation between the error terms from
equations 1 and 2.
It is the formula Correlation(x, y) = σxy /(σx σy). When we do this we get:
. display sig[1,2]/(sig[1,1]^0.5 * sig[2,2]^0.5)
.36435149
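The same arithmetic in Python, using the entries of Σ printed earlier, reproduces Stata's figure:

```python
# Correlation of the equation-1 and equation-2 error terms, computed from
# the e(Sigma) values displayed above: corr = sigma12 / (sigma1 * sigma2).
s11, s12, s22 = 1.0440596, 0.43784767, 1.3831832
corr = s12 / (s11 ** 0.5 * s22 ** 0.5)
print(corr)  # approx 0.36435149, matching the Stata output
```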
Let's check
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)
matrix cy= e(b)
generate rc=c-(cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4])
generate ri=i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8])
correlate ri rc
matrix cy= e(b) stores the coefficients from the regression in a
row vector we call cy.
cy[1,1] is the first coefficient, on p, in the first equation.
cy[1,4] is the fourth coefficient in the first equation (the constant term).
cy[1,5] is the first coefficient, on p, in the second equation.
Note this is cy[1,5], NOT cy[2,1].
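This flat, one-row layout of e(b) can be sketched as follows; the numbers and the helper function are hypothetical, only the indexing rule matters:

```python
# e(b) from reg3 is a single 1 x k row vector: equation 1's coefficients are
# followed immediately by equation 2's, and so on.  So the first coefficient
# of equation 2 is element 5 of the row -- cy[1,5] in Stata, NOT cy[2,1].
cy = [0.12, 0.16, 0.79, 16.4,    # equation 1: p, p1, w, _cons (hypothetical)
      -0.01, 0.76, -0.19, 28.2]  # equation 2: p, p1, k1, _cons (hypothetical)

PARAMS_PER_EQ = 4

def coef(eq, j):
    """Coefficient j (1-based) of equation eq (1-based) in the flat vector."""
    return cy[(eq - 1) * PARAMS_PER_EQ + (j - 1)]

print(coef(1, 4))  # 16.4, the constant of equation 1 (cy[1,4])
print(coef(2, 1))  # -0.01, the coefficient on p in equation 2 (cy[1,5])
```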
Let's check
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)
matrix cy= e(b)
generate rc=c-(cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4])
generate ri=i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8])
correlate ri rc
Thus cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4] is the predicted value
from the first equation, so rc is its actual minus predicted value, and
i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8])
is the actual minus the predicted value, i.e. the error term from the 2nd
equation.
correlate ri rc prints out the correlation between the two error terms.
Let's check
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)
matrix cy= e(b)
generate rc=c-(cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4])
generate ri=i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8])
correlate ri rc
. correlate ri rc
(obs=21)

             |       ri       rc
-------------+------------------
          ri |   1.0000
          rc |   0.3011   1.0000
The correlation is 0.30, close to what we had before, but not the same.
Now the main purpose of this class is to illustrate commands, so it is not
too important. I think it could be because Stata is not calculating the
e(Sigma) matrix by dividing by n-k, but just by n.
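That guess is easy to probe in Python. A sketch with hypothetical cross-moments (k = 4 parameters per equation is an assumption) shows that a divisor common to every entry of Σ cancels in the correlation, so the divisor alone cannot explain the gap:

```python
def corr_from_sigma(s11, s12, s22):
    """Correlation implied by a 2 x 2 covariance block."""
    return s12 / (s11 ** 0.5 * s22 ** 0.5)

# Hypothetical raw cross-moments sum(e_i * e_j), NOT the Klein-model values.
m11, m12, m22 = 10.0, 4.0, 8.0
n, k = 21, 4  # k = 4 per equation is an assumption

by_n = corr_from_sigma(m11 / n, m12 / n, m22 / n)
by_n_minus_k = corr_from_sigma(m11 / (n - k), m12 / (n - k), m22 / (n - k))
print(abs(by_n - by_n_minus_k) < 1e-12)  # True: the common divisor cancels
```

So the discrepancy is more likely to come from something else, for example whether the residuals are demeaned before the cross-products are taken; the sketch only rules out the divisor.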
Let's check
Click on help (on the toolbar at the top of the screen, to the right).
Click on 'Stata command'.
In the dialogue box type reg3.
Move down towards the end of the file and you get the following:
Saved results
reg3 saves the following in e():

Scalars
  e(N)          number of observations
  e(k)          number of parameters
  e(k_eq)       number of equations
  e(mss_#)      model sum of squares for equation #
  e(df_m#)      model degrees of freedom for equation #
  e(rss_#)      residual sum of squares for equation #
  e(df_r)       residual degrees of freedom (small)
  e(r2_#)       R-squared for equation #
  e(F_#)        F statistic for equation # (small)
  e(rmse_#)     root mean squared error for equation #
  e(dfk2_adj)   divisor used with VCE when dfk2 specified
  e(ll)         log likelihood
  e(chi2_#)     chi-squared for equation #
  e(p_#)        significance for equation #
  e(ic)         number of iterations
  e(cons_#)     1 when equation # has a constant; 0 otherwise
Some important retrievables
  e(mss_#)    model sum of squares for equation #
  e(rss_#)    residual sum of squares for equation #
  e(r2_#)     R-squared for equation #
  e(F_#)      F statistic for equation # (small)
  e(rmse_#)   root mean squared error for equation #
  e(ll)       log likelihood
Where # is a number; e.g. 2 means equation 2.
And

Matrices
  e(b)        coefficient vector
  e(Sigma)    Sigma hat matrix
  e(V)        variance-covariance matrix of the estimators
The Hausman Test Again
We looked at this with respect to panel data. But it is a general test that
allows us to compare an equation which has been estimated by two
different techniques. Here we apply the technique to comparing OLS
with 3SLS.
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr),ols
est store EQNols
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
est store EQN3sls
hausman EQNols EQN3sls
The Hausman Test Again
Below we run the three regressions specifying ols and store
the results as EQNols.
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr),ols
est store EQNols
Then we run the three regressions specifying 3sls and store
the results as EQN3sls.
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
est store EQN3sls
Then we do the Hausman test
hausman EQNols EQN3sls
The Results
. hausman EQNols EQN3sls

                   ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |    EQNols       EQN3sls       Difference          S.E.
-------------+----------------------------------------------------------------
           p |    .1929343     .1248904        .068044            .
          p1 |    .0898847     .1631439      -.0732592            .
           w |    .7962188      .790081       .0061378       .0124993
------------------------------------------------------------------------------
    b = consistent under Ho and Ha; obtained from reg3
    B = inconsistent under Ha, efficient under Ho; obtained from reg3

Test: Ho: difference in coefficients not systematic

        chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                =     0.06
        Prob>chi2 =  0.9963
        (V_b-V_B is not positive definite)
The table prints out the two sets of coefficients and their difference.
The Hausman test statistic is 0.06
The significance level is 0.9963
This is clearly very far from being significant at the 10% level.
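The form of the statistic can be checked by hand. A pure-Python sketch for the scalar (one-coefficient) case, with hypothetical numbers; Stata itself uses the full vector version with a generalized inverse, as the note about V_b-V_B not being positive definite indicates:

```python
def hausman_scalar(b, B, var_b, var_B):
    """Hausman statistic for a single coefficient:
    H = (b - B)^2 / (Var(b) - Var(B)), chi-squared(1) under H0,
    where b is the consistent and B the efficient estimate."""
    diff = b - B
    return diff * diff / (var_b - var_B)

# Hypothetical values, not taken from the tables above.
H = hausman_scalar(0.80, 0.79, 0.05 ** 2, 0.04 ** 2)
print(H)  # about 0.111, far below the chi-squared(1) 5% critical value 3.84
```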
The Hausman Test Again
Hence it would appear that the coefficients from the two
regressions are not significantly different.
If OLS was giving biased estimates that 3SLS corrects, they
would be different.
Hence we would conclude that there is no endogeneity which
requires these techniques.
But because the error terms do appear correlated, SUR is
probably the appropriate technique, as it produces better
results.
Tasks
1. Using the display command, e.g.
display e(mss_2)
print on the screen some of the retrievables from each regression (the
above displays the model sum of squares for the second equation).
2. Let's look at the display command.
Type:
display "The model sum of squares =" e(mss_2)
Tasks
display "The model sum of squares =" e(mss_2), "and the R2 =" e(r2_2)
display _column(20) "The model sum of squares =" e(mss_2), _column(50) "and the R2 =" e(r2_2)
display _column(20) "The model sum of squares =" e(mss_2), _column(60) "and the R2 =" e(r2_2)
display _column(20) "The model sum of squares =" e(mss_2), _column(60) "and the R2 =" _skip(5) e(r2_2)
display _column(20) "The model sum of squares =" e(mss_2), _column(60) "and the R2 =" _skip(10) e(r2_2)
Tasks
Close the log:
log close
and have a look at it in Word.
webuse klein
In order to get the rest to work
rename consump c
rename capital1 k1
rename invest i
rename profits p
rename govt g
rename wagegovt wg
rename taxnetx t
rename wagepriv wp
generate x=totinc
generate w = wg+wp
generate k = k1+i
generate yr=year-1931
generate p1 = p[_n-1]
generate x1 = x[_n-1]
reg3 (c p p1 w), 2sls inst(t wg g yr p1 x1 k1)
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)