
Revised Chapter 4 in Specifying and Diagnostically Testing Econometric Models (Edition 3)
© by Houston H. Stokes 10 March 2012 All rights reserved. Preliminary Draft
Chapter 4
Simultaneous Equations Systems
4.0 Introduction
4.1 Estimation of Structural Models
Table 4.1 Matlab Program to obtain Constrained Reduced Form
Table 4.2 Edited output from running Matlab Program in Table 4.1
4.2 Estimation of OLS, LIML, LS2, LS3, and ILS3
4.3 Examples
Table 4.3 B34S, Rats, SAS & Stata setups for ols, liml, ls2, ls3, and ils3 commands
Table 4.4 Kmenta (1971, 582) OLS, 2SLS, LIML, 3SLS, I3SLS, FIML Test Problem Answers
Table 4.5 Kmenta (1986, 712) OLS, 2SLS, LIML, 3SLS, I3SLS, FIML Test Problem Answers
4.4 Exactly identified systems
Table 4.6 Exactly Identified Kmenta Problem
4.5 Analysis of OLS, 2SLS and 3SLS using Matrix Command
Table 4.7 Matrix Command Implementation of OLS, 2SLS, 3SLS and FIML
4.6 LS2 and GMM Models and Specification tests
Table 4.8 LS2 and General Method of Moments estimation routines
Table 4.9 Estimation of LS2 and GMM Models using B34S, Stata and Rats
4.7 Potential problems of IV Models
Table 4.10 Overview of IV Tests
Table 4.11 Subroutine to Perform Hausman Tests
Table 4.12 Various Hausman Tests
4.8 Conclusion
Simultaneous Equations Systems
4.0 Introduction
In section 4.1, after first discussing the basic simultaneous equations model, the constrained reduced form, the unconstrained reduced form and the final form are introduced. The MATLAB symbolic capability is used to illustrate how the constrained reduced form relates to the structural parameters of the model. In section 4.2 the theory behind the QR approach to simultaneous equations modeling developed by Jennings (1980) is discussed in some detail. The simeq command performs estimation of systems of equations by the methods of ordinary least squares (OLS), limited information maximum likelihood (LIML), two-stage least squares (2SLS), three-stage least squares (3SLS), iterative three-stage least squares (I3SLS), seemingly unrelated regression (SUR) and full information maximum likelihood (FIML), using code developed by Les Jennings (1973, 1980). The Jennings code is unique in that it implements the QR approach to estimate systems of equations, which results in both substantial savings in time and increased accuracy.1 The estimation methods are well known and covered in detail in such books as Johnston (1963, 1972, 1984), Kmenta (1971, 1986), and Pindyck and Rubinfeld (1976, 1981, 1990) and will only be sketched here. What will be discussed are the contributions of Jennings and others. The discussion of these techniques follows closely the material in Jennings (1980) and Strang (1976).

Section 4.3 illustrates estimation of variants of the Kmenta model using RATS, B34S and SAS, while section 4.4 illustrates an exactly identified model. Section 4.5 shows how OLS, 2SLS, 3SLS and FIML can be estimated using the matrix command. The code there is for illustration and benchmarking purposes, not production use. Section 4.6 discusses the matrix command subroutines LS2 and GAMEST, which perform single-equation 2SLS and GMM estimation, respectively. That code is production quality.
4.1 Estimation of Structural Models
Assume a system of G equations with K exogenous variables2

b_{11} y_{1i} + \cdots + b_{1G} y_{Gi} + \gamma_{11} x_{1i} + \cdots + \gamma_{1K} x_{Ki} = u_{1i}
b_{21} y_{1i} + \cdots + b_{2G} y_{Gi} + \gamma_{21} x_{1i} + \cdots + \gamma_{2K} x_{Ki} = u_{2i}
\cdots
b_{G1} y_{1i} + \cdots + b_{GG} y_{Gi} + \gamma_{G1} x_{1i} + \cdots + \gamma_{GK} x_{Ki} = u_{Gi}        (4.1-1)

where x_{ki} is the kth exogenous variable for the ith period, y_{ji} is the jth endogenous variable for the ith period, and u_{ji} is the jth equation error term for the ith period. If we define
B = \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1G} \\ b_{21} & b_{22} & \cdots & b_{2G} \\ \vdots & & & \vdots \\ b_{G1} & b_{G2} & \cdots & b_{GG} \end{bmatrix}, \qquad
\Gamma = \begin{bmatrix} \gamma_{11} & \gamma_{12} & \cdots & \gamma_{1K} \\ \gamma_{21} & \gamma_{22} & \cdots & \gamma_{2K} \\ \vdots & & & \vdots \\ \gamma_{G1} & \gamma_{G2} & \cdots & \gamma_{GK} \end{bmatrix},

y_i = [y_{1i}, y_{2i}, \ldots, y_{Gi}]', \qquad x_i = [x_{1i}, x_{2i}, \ldots, x_{Ki}]', \qquad u_i = [u_{1i}, u_{2i}, \ldots, u_{Gi}]',

equation (4.1-1) can be written as
1 The B34S qr command is designed to provide up to 16 digits of accuracy. This command,
which also allows estimation of the principal component (PC) regression, uses LINPACK code
and is documented in Chapter 10. The qr command is distinct from the code in the simeq
command. The matrix command contains extensive and programmable QR capability. For
further examples see Chapters 10 and 16 and sections of Chapter 2.
2 For further discussion see Pindyck and Rubinfeld (1981, 339-349).
B y_i + \Gamma x_i = u_i        (4.1-2)
If all observations in y_i, x_i and u_i are included, then

Y = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1N} \\ y_{21} & y_{22} & \cdots & y_{2N} \\ \vdots & & & \vdots \\ y_{G1} & y_{G2} & \cdots & y_{GN} \end{bmatrix}, \qquad
X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1N} \\ x_{21} & x_{22} & \cdots & x_{2N} \\ \vdots & & & \vdots \\ x_{K1} & x_{K2} & \cdots & x_{KN} \end{bmatrix}, \qquad
U = \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1N} \\ u_{21} & u_{22} & \cdots & u_{2N} \\ \vdots & & & \vdots \\ u_{G1} & u_{G2} & \cdots & u_{GN} \end{bmatrix}

and equation (4.1-2) can be written as

BY + \Gamma X = U        (4.1-3)
From equation (4.1-3), the constrained reduced form can be calculated as

Y = -B^{-1}\Gamma X + B^{-1}U = \Pi X + V        (4.1-4)

If \Pi is estimated directly with OLS, it is called the unconstrained reduced form. The B34S simeq command estimates B and \Gamma using either OLS, 2SLS, LIML, 3SLS, I3SLS, or FIML. For each set of estimated structural coefficients B and \Gamma, the associated reduced form coefficient matrix \Pi can be optionally calculated.3 If B and \Gamma are estimated by OLS, the coefficients will be biased, since the key OLS assumption that the right-hand-side variables are orthogonal to the error term is violated. Model (4.1-3) can be normalized such that the coefficients b_{ij} = 1 for i = j. The necessary condition for identification of each equation is that the number of included endogenous variables minus 1 be less than or equal to the number of excluded exogenous variables. The reason for this restriction is that otherwise it would not be possible to solve for the elements of B and \Gamma uniquely in terms of the reduced form parameters \Pi. A short example from Greene (2003), self-documented using MATLAB, illustrates this problem.
Table 4.1 Matlab Program to obtain Constrained Reduced Form

% Greene (2003) Chapter 15 Problem # 1
%
%  y1= g1*y2 + b11*x1 + b21*x2 + b31*x3
%  y2= g2*y1 + b12*x1 + b22*x2 + b32*x3
%
% We know BY+GX=E
syms g1 g2 b11 b21 b31 b12 b22 b32
B =[  1, -g1;
    -g2,   1]
G =[-b11,-b21,-b31;
    -b12,-b22,-b32]
a= -1*inv(B)*G
p11=a(1,1)
p12=a(1,2)
p13=a(1,3)
p21=a(2,1)
p22=a(2,2)
p23=a(2,3)
% Hopeless. Have 6 equations BUT more than 6 variables
' Now impose restrictions'
' b21=0  b32=0'
G =[-b11,   0,-b31;
    -b12,-b22,   0]
B,G
a= -1*inv(B)*G
' Here 6 equations and six unknowns g1 g2 b11 b31 b12 b22 '
p11=a(1,1)
p12=a(1,2)
p13=a(1,3)
p21=a(2,1)
p22=a(2,2)
p23=a(2,3)

3 If the model is exactly identified, the constrained reduced form \Pi can be directly estimated by OLS or obtained using (4.1-4) from LIML, 2SLS or 3SLS estimates. This is shown empirically in section 4.5.
Table 4.2 Edited output from running Matlab Program in Table 4.1

p11 = -1/(-1+g1*g2)*b11+g1/(-1+g1*g2)*b12
p12 = -1/(-1+g1*g2)*b21+g1/(-1+g1*g2)*b22
p13 = -1/(-1+g1*g2)*b31+g1/(-1+g1*g2)*b32
p21 = -g2/(-1+g1*g2)*b11+1/(-1+g1*g2)*b12
p22 = -g2/(-1+g1*g2)*b21+1/(-1+g1*g2)*b22
p23 = -g2/(-1+g1*g2)*b31+1/(-1+g1*g2)*b32

Here 6 equations and six unknowns g1 g2 b11 b31 b12 b22

p11 = -1/(-1+g1*g2)*b11+g1/(-1+g1*g2)*b12
p12 = -g1/(-1+g1*g2)*b22
p13 = -1/(-1+g1*g2)*b31
p21 = -g2/(-1+g1*g2)*b11+1/(-1+g1*g2)*b12
p22 = -1/(-1+g1*g2)*b22
p23 = -g2/(-1+g1*g2)*b31
If the excluded exogenous variables of the ith equation are not significant in any other equation, then the ith equation will not be identified, even if it is correctly specified. We note that E(u_i | x_i) = 0 and E(u_i u_i') = \Sigma, where u_i = [u_{1i}, \ldots, u_{Gi}]' and x_i = [x_{1i}, \ldots, x_{Ki}]'. The reduced form disturbance is not correlated with the exogenous variables, or

E(v_i | x_i) = B^{-1} \cdot 0 = 0
E(v_i v_i' | x_i) = E[B^{-1} u_i u_i' (B')^{-1}] = B^{-1} \Sigma (B')^{-1} = \Omega,

from which we deduce that

\Sigma = B \Omega B'        (4.1-5)

In summary, \Gamma is the G by K exogenous variable coefficient matrix, B is the G by G nonsingular endogenous variable coefficient matrix, \Sigma is the G by G symmetric positive definite structural covariance matrix, \Pi is the G by K constrained reduced form coefficient matrix and \Omega is the G by G reduced form covariance matrix. The importance of this is that since \Pi and \Omega can be estimated consistently by OLS, following Greene (2003, 387), if B were known we could obtain \Gamma = -B\Pi from (4.1-4) and \Sigma from (4.1-5). If there are no endogenous variables on the right, yet a number of equations are estimated where there is covariance in the error term across equations, the seemingly unrelated regression (SUR) model can be estimated as

\hat\beta = (X'\Omega^{-1}X)^{-1}(X'\Omega^{-1}Y).        (4.1-6)

Elements \hat\sigma_{ij} of \hat\Sigma can be estimated if OLS is used on each of the G equations and

\hat\sigma_{ii} = \hat u_i' \hat u_i / (T - K_i)
\hat\sigma_{ij} = \hat u_i' \hat u_j / \sqrt{(T - K_i)(T - K_j)}        (4.1-7)
For more detail see Greene (2003) or other advanced econometrics books. Pindyck and Rubinfeld (1976, 1981, 1990) provide a particularly good treatment that is consistent with the notation in this chapter.

From (4.1-4), Theil (1971, 463-468) suggests calculating the final form. First partition the ith observation of the exogenous variables into lagged endogenous, current exogenous and lagged exogenous variables, where identities are used to express lags > 1:

\Pi = [d_0, D_1, D_2, D_3]
x_i^* = [1, y_{i-1}', x_i', x_{i-1}']'
y_i = d_0 + D_1 y_{i-1} + D_2 x_i + D_3 x_{i-1} + \epsilon_i^*        (4.1-8)

Theil (1971) shows that (4.1-8) can be expressed as

y_i = (I - D_1)^{-1} d_0 + D_2 x_i + \sum_{t=1}^{\infty} D_1^{t-1}(D_1 D_2 + D_3) x_{i-t} + \sum_{t=0}^{\infty} D_1^{t} \epsilon_{i-t}^*        (4.1-9)

where D_2 is the impact multiplier. If there are no lagged endogenous variables in the system, D_1 = 0 and the constrained reduced form and the final form are the same. In this case \Pi = [D_2, D_3]. The interim multipliers are D_2, (D_1 D_2 + D_3), D_1(D_1 D_2 + D_3), \ldots, D_1^{s}(D_1 D_2 + D_3), which, when summed, form the total multiplier G^*:

G^* = D_2 + (I + D_1 + D_1^2 + \cdots)(D_1 D_2 + D_3)
    = D_2 + (I - D_1)^{-1}(D_1 D_2 + D_3)
    = (I - D_1)^{-1}[(I - D_1) D_2 + D_1 D_2 + D_3]
    = (I - D_1)^{-1}(D_2 + D_3)        (4.1-10)

Goldberger (1959) and Kmenta (1971, 592) provide added detail. The importance of the total multiplier (4.1-10) is that it shows the effect on all endogenous variables of a change in any exogenous variable after all effects have had a chance to work themselves out in the system.
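The total multiplier algebra can be checked numerically. The following Python/NumPy sketch uses hypothetical D1, D2, D3 values (assumptions for illustration, not from any model in this chapter) and compares the closed form (I - D1)^{-1}(D2 + D3) of (4.1-10) with the accumulated partial sums of the interim multipliers:

```python
import numpy as np

# Hypothetical multiplier matrices for a 2-equation system with one
# exogenous variable: D1 (lagged endogenous), D2 (current exogenous),
# D3 (lagged exogenous).  Values are illustrative assumptions only.
D1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])
D2 = np.array([[1.0],
               [0.5]])
D3 = np.array([[0.2],
               [0.1]])

I = np.eye(2)

# Closed-form total multiplier from (4.1-10): G* = (I - D1)^{-1} (D2 + D3)
G_star = np.linalg.solve(I - D1, D2 + D3)

# Check against the infinite-sum form: D2 + sum_t D1^t (D1 D2 + D3)
total = D2.copy()
term = D1 @ D2 + D3          # first interim multiplier
for _ in range(200):         # D1 here is stable, so the sum converges
    total += term
    term = D1 @ term

print(np.allclose(G_star, total))   # the two forms agree
```

The loop mirrors the interim-multiplier sequence directly, while `G_star` uses the geometric-series identity; agreement of the two confirms the derivation above.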
There are several common mistakes made in setting up simultaneous equations systems
that include the following:
- Not fully checking for multicollinearity in the equations system.
- Attempting to interpret the estimated B and Γ coefficients as partial derivatives, rather
than looking at the reduced form G by K matrix π.
- Not effectively testing whether excluded exogenous variables are significant in at least
one other equation in the system.
- Not building into the solution procedure provisions for taking into account the number
of significant digits in the data.
The simeq code has unique design characteristics that allow solutions for some of these
problems. In the next sections, we will briefly outline some of these features.
Assume for a moment that X is a T by K matrix of observations on the exogenous variables, Y is a T by 1 vector of observations on the endogenous variable, and \beta is a K-element vector of OLS coefficients. The OLS solution for the estimated \beta from equation (2.1-8) is (X'X)^{-1}X'Y. The problem with this approach is that some accuracy is lost by forming the matrix X'X. The QR approach4 proceeds by operating directly on the matrix X to express it in terms of the upper triangular K by K matrix R and the T by T orthogonal matrix Q. X is factored as

X = Q \begin{bmatrix} R \\ 0 \end{bmatrix} = [Q_1 | Q_2] \begin{bmatrix} R \\ 0 \end{bmatrix} = Q_1 R        (4.1-11)

Since Q'Q = I,

(X'X)^{-1}X'Y = (R'Q_1'Q_1R)^{-1}R'Q_1'Y = (R'R)^{-1}R'Q_1'Y = R^{-1}Q_1'Y        (4.1-12)
4 A good discussion of the QR factorization is contained in Strang (1976). Other references
include Jennings (1980) and Dongarra, Bunch, Moler, and Stewart (1979).
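A minimal NumPy illustration of (4.1-11)-(4.1-12), on simulated data (the data and parameter values are assumptions, and this is not the B34S code): the QR solution reproduces the normal-equations solution without ever forming X'X.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: T observations on K exogenous variables
T, K = 100, 3
X = rng.normal(size=(T, K))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=T)

# Textbook normal-equations solution: (X'X)^{-1} X'y
beta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# QR solution from (4.1-12): factor X = Q1 R, then beta = R^{-1} Q1'y.
# X'X is never formed, which avoids squaring the condition number.
Q1, R = np.linalg.qr(X)          # "reduced" factorization: Q1 is T x K
beta_qr = np.linalg.solve(R, Q1.T @ y)

print(np.allclose(beta_ne, beta_qr))
```

On well-conditioned data the two solutions coincide; the advantage of the QR route appears as X becomes nearly collinear.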
Following Jennings (1980), we define the condition number of the matrix X, C(X), as the ratio of the square root of the largest eigenvalue of X'X, E_{max}(X'X), to the square root of the smallest eigenvalue of X'X, E_{min}(X'X):

C(X) = \sqrt{E_{max}(X'X) / E_{min}(X'X)}        (4.1-13)

If ||X|| = \sqrt{E_{max}(X'X)} and X is square and nonsingular, then

C(X) = ||X|| \, ||X^{-1}||        (4.1-14)

Throughout B34S, 1/C(X) is checked to test for rank problems. Jennings (1980) notes that C(X) can also be used as a measure of relative error. If \mu is a measure of round-off error, then \mu[C(X)]^2 is the bound for the relative error of the calculated solution. On an IBM 370 running double precision, \mu is approximately .1E-16. If C(X) > .1E+8 (i.e., 1/C(X) < .1E-8), then \mu[C(X)]^2 \ge 1, meaning that no digits in the reported solution are significant. Jennings (1980) looks at the problem from another perspective. If the matrix X has a round-off error of \tau X such that the actual X used is X + \tau X, then ||\tau X|| / ||X|| must be less than 1/C(X) for a solution to exist. If

||\tau X|| / ||X|| = 1/C(X)        (4.1-15)

then there exists a \tau X such that X + \tau X is singular.5 The user can inspect the estimate of the condition number and determine the degree of multicollinearity. Most programs only report problems when the matrix is singular; inspection of C(X) gives warning of the degree of the problem. The simeq command contains the IPR parameter option with which the user can inform the program of the number of significant digits in X. This information is used to terminate the iterative three-stage (I3SLS) iterations when the relative change in the solution is within what would be expected, given the number of significant digits in the data.
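The condition-number diagnostic of (4.1-13) can be sketched as follows (simulated data; this is a sketch of the diagnostic, not the B34S implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

# A well-conditioned design matrix
X = rng.normal(size=(50, 3))

# C(X) per (4.1-13): the square root of the ratio of extreme eigenvalues
# of X'X, which equals the ratio of extreme singular values of X
eig = np.linalg.eigvalsh(X.T @ X)
c_eig = np.sqrt(eig.max() / eig.min())
c_svd = np.linalg.cond(X)            # same quantity via singular values
print(np.allclose(c_eig, c_svd))

# Near-collinear design: third column almost a copy of the first
X2 = X.copy()
X2[:, 2] = X2[:, 0] + 1e-9 * rng.normal(size=50)
print(np.linalg.cond(X2) > 1e8)      # warns of the rank problem before
                                     # any inversion is attempted
```

Note that most library routines would still "succeed" in inverting X2'X2; only the condition number reveals that essentially no digits of such a solution would be trustworthy.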
Jennings (1980) notes that the relative error of the QR solution to the OLS problem given in equation (4.1-12) has the form

n_1 C(X) + n_2 C(X)^2 (||\hat e|| / ||\hat\beta||)        (4.1-16)

where n_1 and n_2 are of the order of machine precision and ||\hat e|| and ||\hat\beta|| are the lengths of the estimated residual and estimated coefficient vectors, respectively. (The length or L2NORM of a vector e is defined as \sqrt{\sum_i e_i^2}.) Equation (4.1-16) indicates that the closer the model fits, the smaller the relative error of the computed solution. An estimate of this relative error is made for the OLS, LIML and 2SLS estimators reported by simeq.

5 For more detail on techniques used in simeq to avoid numerical error in the calculations arising from differences in the means of the data, see Jennings (1980).
4.2 Estimation of OLS, LIML, LS2, LS3, and ILS3
For OLS estimation of a system of equations, simeq uses the QR approach discussed earlier. If the reduced option is used, once the structural coefficients B and \Gamma in equation (4.1-3) are known, the constrained reduced form coefficients \Pi from equation (4.1-4) are displayed. If B and \Gamma are estimated using OLS, and all structural equations are exactly identified, then the constraints on \Pi imposed by the structural coefficients B and \Gamma are not binding, and \Pi could be estimated directly with OLS or indirectly via (4.1-4). However, if one or more of the equations in the structural equations system (4.1-2) are overidentified, \Pi must be estimated as -B^{-1}\Gamma.

Although the reduced-form coefficients \Pi exist and may be calculated from any set of structural estimates B and \Gamma, in practice it is not desirable to report those derived from OLS estimation, because in the presence of endogenous variables on the right-hand side of an equation the OLS assumption that the error term is orthogonal to the explanatory variables is violated. Since OLS imposes this assumption as part of the estimation process, the resulting estimated B and \Gamma are biased.

OLS is often used as a benchmark because, among the class of all linear estimators, it produces minimum variance. The loss in predictive power of LIML and 2SLS has to be weighed against the fact that OLS produces biased estimates. If reduced-form coefficients are desired, the identities in the system must be entered. The number of identities plus the number of estimated equations must equal the number of endogenous variables in the model. The simeq command requires that the number of model sentences and identity sentences equal the number of variables listed in the endogenous sentence.
The 2SLS estimator first estimates all endogenous variables as a function of all exogenous variables. This is equivalent to estimating an unconstrained form of the reduced-form equation (4.1-4). Next, in stage 2, the estimated values of the endogenous variables on the right in the jth equation, \hat Y_j, are used in place of the actual values Y_j to estimate equation (4.1-2). Since the estimated values of the endogenous variables on the right are a function of exogenous variables only, theory suggests they can be assumed orthogonal to the population error, and OLS can be safely used for the second stage. In terms of our prior notation, the two-stage estimator for the first equation is

\begin{bmatrix} \hat b_{11} \\ \vdots \\ \hat b_{1g} \\ \hat\gamma_{11} \\ \vdots \\ \hat\gamma_{1g} \end{bmatrix}
= \{(\hat Y_1\ X_1)'(\hat Y_1\ X_1)\}^{-1} (\hat Y_1\ X_1)' y_1
= \begin{bmatrix} \hat Y_1'\hat Y_1 & \hat Y_1' X_1 \\ X_1'\hat Y_1 & X_1' X_1 \end{bmatrix}^{-1} \begin{bmatrix} \hat Y_1' y_1 \\ X_1' y_1 \end{bmatrix}        (4.2-1)

where \hat Y_1 is the matrix of predicted endogenous variables in the first equation and X_1 is the matrix of exogenous variables in the first equation. For further details on this traditional estimation approach, see Pindyck and Rubinfeld (1981, 345-347).
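The two stages can be sketched in NumPy on simulated data. The setup below is an assumed toy model (not Kmenta's); it shows the fitted-value substitution of (4.2-1) and why OLS drifts from the true coefficient while 2SLS does not:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 5000

# Illustrative setup: y2 is endogenous in the equation for y1;
# x1, x2 are exogenous.  Parameter values are assumptions.
x1 = rng.normal(size=T)
x2 = rng.normal(size=T)
e1 = rng.normal(size=T)
e2 = e1 + rng.normal(size=T)          # correlated errors => simultaneity
y2 = 0.5 * x2 + e2
y1 = 1.0 * y2 + 2.0 * x1 + e1         # structural equation of interest

# Stage 1: regress the right-hand-side endogenous variable on ALL
# exogenous variables and keep the fitted values
X_all = np.column_stack([x1, x2])
y2_hat = X_all @ np.linalg.lstsq(X_all, y2, rcond=None)[0]

# Stage 2: OLS with y2 replaced by its fitted values
Z = np.column_stack([y2_hat, x1])
b_2sls = np.linalg.lstsq(Z, y1, rcond=None)[0]

# OLS on the raw data, for comparison
Z_ols = np.column_stack([y2, x1])
b_ols = np.linalg.lstsq(Z_ols, y1, rcond=None)[0]

# Does 2SLS land closer to the true coefficient of 1.0 than OLS?
print(abs(b_2sls[0] - 1.0), abs(b_ols[0] - 1.0))
```

Because cov(y2, e1) > 0 by construction, the OLS coefficient on y2 is pushed away from 1.0, while the fitted values in stage 2 purge that correlation.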
The QR approach used by Jennings (1980) involves estimating equation (4.2-1) as the solution of

Z_j'(XX^+) Z_j \delta_j = Z_j'(XX^+) y_j        (4.2-2)

for \delta_j, where \delta_j' = \{(b_{11}, \ldots, b_{1g})', (\gamma_{11}, \ldots, \gamma_{1k})'\}, Z_j = [X_j | Y_j], and X^+ is the pseudoinverse6 of X. Z_j consists of the X and Y variables in the jth equation. XX^+ is not calculated directly but is expressed in terms of the QR factorization of X. By working directly on X, and not forming X'X, substantial accuracy is gained. Jennings proceeds by writing

XX^+ = Q \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} Q'        (4.2-3)

where I_r is the r by r identity matrix and r is the rank of X. Using equation (4.2-3), equation (4.2-2) becomes

\hat Z_j' \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \hat Z_j \delta_j = \hat Z_j' \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \hat y_j        (4.2-4)

where \hat Z_j = Q'Z_j and \hat y_j = Q'y_j.

6 If we define X^+ as the pseudoinverse of the T by K matrix X, then it can be shown (Strang 1976, 138, exercise 3.4.5) that the following four conditions hold: 1. XX^+X = X; 2. X^+XX^+ = X^+; 3. (XX^+)' = XX^+; and 4. (X^+X)' = X^+X. The pseudoinverse can be obtained from the singular value decomposition or the QR factorization of X.
The 2SLS covariance matrix can be estimated as

(||e_j||^2 / d_f)(Z_j' XX^+ Z_j)^{-1}        (4.2-5)

where d_f is the degrees of freedom and ||e_j||^2 is the residual sum of squares (the square of the L2NORM of the residual). There is substantial controversy in the literature about the appropriate value for d_f. Since the SEs of the estimated 2SLS coefficients are known only asymptotically, Theil (1971) suggests that d_f be set equal to T, the number of observations used to estimate the model. Others suggest that d_f be set to T - K, similar to what is used in OLS. If Theil's suggestion is used, the estimated SEs of the coefficients are smaller; the T - K option is more conservative. The simeq command produces both estimates of the coefficient standard errors to facilitate comparison with other programs and researcher preferences.
Two-stage least squares estimation of an equation with endogenous variables on the right,
in contrast with OLS estimation, in theory produces unbiased coefficients at the cost of some loss
of efficiency. If a large system is estimated, it is often impossible to use all exogenous variables
in the system because of loss of degrees of freedom. The usual practice is to select a subset of the
exogenous variables. The greater the number of exogenous variables relative to the degrees of
freedom, the closer the predicted Y variables on the right are to the raw Y variables on the right.
In this situation, the 2SLS estimator sum of squares of residuals will approach the OLS estimator
sum of squares of residuals. Such an estimator will lose the unbiased property of the 2SLS
estimator. Usual econometric practice is to use OLS and 2SLS and compare the results to see
how sensitive the OLS results are to simultaneity problems.
While 2SLS results are sensitive to the variable used to normalize the system, limited information maximum likelihood (LIML) estimation, which can be used in place of 2SLS, is not. Kmenta (1971, 568-570) has a clear discussion, which is summarized below. The LIML estimator,7 which is hard to explain in simple terms, involves selecting values of b and \delta for each equation such that L is minimized, where L = SSE1 / SSE. We define SSE1 as the residual variance from regressing a weighted average of the y variables in the equation on all exogenous variables in the equation, while SSE is the residual variance from regressing the same weighted average of the y variables on all the exogenous variables in the system. Since SSE \le SSE1, L is bounded below by 1. The difficulty in LIML estimation is selecting the weights for combining the y variables in the equation. Assume equation 1 of (4.1-1):

b_{11} y_{1i} + \cdots + b_{1G} y_{Gi} + \gamma_{11} x_{1i} + \cdots + \gamma_{1K} x_{Ki} = u_{1i}        (4.2-6)

7 Kmenta (1971, 565-572) has one of the clearest descriptions. The discussion here complements that material.
Ignoring time subscripts, we can define

y_1^* = y_1 + [b_{12} y_2 + \cdots + b_{1G} y_G]        (4.2-7)

If we define Y_1^* as the matrix of observations on the endogenous variables in the equation, and we knew the vector B_1^{*'} = [1, b_{12}, \ldots, b_{1G}], we would know y_1^* since y_1^* = Y_1^* B_1^*; we could then regress y_1^* on all the x variables on the right in that equation and call the residual variance SSE1, and next regress y_1^* on all the x variables in the system and call the residual variance SSE. If we define X_1 as the matrix consisting of the columns of the x variables on the right in the equation, and we knew B_1^*, then we could estimate \Gamma_1 = [\gamma_{11}, \ldots, \gamma_{1K}] as

\Gamma_1 = (X_1'X_1)^{-1} X_1' y_1^* = (X_1'X_1)^{-1} X_1' Y_1^* B_1^*        (4.2-8)

However, we do not know B_1^*. If we define

W_1^* = Y_1^{*'} Y_1^* - (Y_1^{*'} X_1)(X_1'X_1)^{-1} X_1' Y_1^*        (4.2-9)
W_1 = Y_1^{*'} Y_1^* - (Y_1^{*'} X)(X'X)^{-1} X' Y_1^*        (4.2-10)

where X is the matrix of all x variables in the system, then L can be written as

L = [B_1^{*'} W_1^* B_1^*] / [B_1^{*'} W_1 B_1^*]        (4.2-11)

Minimizing L implies that

(W_1^* - L W_1) B_1^* = 0        (4.2-12)

The LIML estimator uses eigenvalue analysis to select the vector B_1^* such that L is minimized. This calculation involves solving the system

det(W_1^* - L W_1) = 0        (4.2-13)

for the smallest root L, which we will call \lambda. This root can be substituted back into equation (4.2-12) to get B_1^* and into equation (4.2-8) to get \Gamma_1. Jennings shows that equation (4.2-13) can be rewritten as

det | Y_1^{*'} \{(I - X_1 X_1^+) - \lambda (I - XX^+)\} Y_1^* | = 0.        (4.2-14)

Further factorizations lead to improvements in accuracy and speed over the traditional methods of solution outlined in Johnston (1984), Kmenta (1971), and other books. Jennings (1973, 1980) briefly discusses tests made for computational accuracy, given the number of significant digits in the data, and various tests for nonunique solutions. One of the main objectives of the simeq code was to be able to inform the user if there were problems in identification, in theory and in practice. Since the LIML standard errors are known only asymptotically and are, in fact, equal to the 2SLS estimated standard errors, these are used for both the 2SLS and LIML estimators.
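The eigenvalue calculation in (4.2-9)-(4.2-13) can be sketched numerically. The toy system below is an assumption for illustration (it is not the simeq algorithm): it forms W1* and W1, extracts the smallest root of det(W1* - L W1) = 0, and recovers B1* from the associated eigenvector.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 500

# Assumed toy system: equation 1 contains endogenous y1, y2 and the
# exogenous variables in X1; the system also contains two further
# exogenous variables excluded from equation 1 (overidentified case).
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])   # included x's
X = np.column_stack([X1, rng.normal(size=(T, 2))])       # all x's
e = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=T)
y2 = X @ np.array([1.0, 0.5, 1.0, -1.0]) + e[:, 1]
y1 = 0.8 * y2 + 2.0 * X1[:, 1] + e[:, 0]
Y1 = np.column_stack([y1, y2])

def resid_cross(Y, Xmat):
    # Y'Y - Y'X (X'X)^{-1} X'Y: cross products of the residuals from
    # regressing each column of Y on Xmat, as in (4.2-9)/(4.2-10)
    C = np.linalg.solve(Xmat.T @ Xmat, Xmat.T @ Y)
    return Y.T @ Y - (Y.T @ Xmat) @ C

W1_star = resid_cross(Y1, X1)   # regression on included x's only
W1 = resid_cross(Y1, X)         # regression on all x's in the system

# Smallest root of det(W1* - L W1) = 0 via the generalized eigenproblem
vals, vecs = np.linalg.eig(np.linalg.solve(W1, W1_star))
lam = vals.real.min()           # L = SSE1/SSE, bounded below by 1

# The eigenvector for the smallest root gives B1*; normalizing its
# first element to 1 recovers the coefficient on y2
b = vecs[:, vals.real.argmin()].real
b = b / b[0]
print(lam, -b[1])               # -b[1] should be close to the true 0.8
```

The smallest root sits just above 1 when the equation is correctly specified, which is exactly the bound on L noted above.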
In the first stage of 2SLS, the unconstrained reduced form

Y = \Pi X + V        (4.2-15)

is estimated to obtain the predicted variables \hat Y. 2SLS, OLS, and LIML are all special cases of the Theil (1971) k-class estimators. The general formula for the k-class estimator for the first equation (Kmenta 1971, 565) is

\begin{bmatrix} \hat B_1(k) \\ \hat\Gamma_1(k) \end{bmatrix}
= \begin{bmatrix} Y_1'Y_1 - k\hat V_1'\hat V_1 & Y_1'X_1 \\ X_1'Y_1 & X_1'X_1 \end{bmatrix}^{-1}
\begin{bmatrix} Y_1'y_1 - k\hat V_1'y_1 \\ X_1'y_1 \end{bmatrix}        (4.2-16)

where \hat V_1 is the matrix of predicted residuals from estimating all but the first y variable in equation (4.2-15), \hat Y_1 = Y_1 - \hat V_1, and X_1 contains the X variables on the right-hand side of the first equation. (4.2-16) follows directly from (4.2-1). If k = 0, equation (4.2-16) is the formula for OLS estimation of the first equation. If k = 1, equation (4.2-16) is the formula for 2SLS estimation of the first equation and can be transformed to equation (4.2-2). If k = \lambda, the minimum root of equation (4.2-13), equation (4.2-16) is the formula for the LIML estimator (Theil 1971, 504). Hence, OLS, 2SLS, and LIML are all members of the k class of estimators.
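A compact numerical check of (4.2-16) on assumed simulated data: the k-class formula with k = 0 reproduces OLS, and with k = 1 reproduces 2SLS.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 1000

# Assumed one-equation example: y2 endogenous on the right, x1 included
# exogenous, x2 excluded from the equation
x1 = rng.normal(size=T)
x2 = rng.normal(size=T)
e = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=T)
y2 = x2 + 0.3 * x1 + e[:, 1]
y1 = 0.7 * y2 - 1.0 * x1 + e[:, 0]

X = np.column_stack([x1, x2])    # all exogenous variables in the system
X1 = x1[:, None]                 # exogenous variables in this equation
Y1 = y2[:, None]                 # RHS endogenous variable

# V1_hat: residual of Y1 on all exogenous variables, as in (4.2-15)
V1 = Y1 - X @ np.linalg.lstsq(X, Y1, rcond=None)[0]

def k_class(k):
    # delta(k) per (4.2-16)
    A = np.block([[Y1.T @ Y1 - k * V1.T @ V1, Y1.T @ X1],
                  [X1.T @ Y1,                 X1.T @ X1]])
    c = np.concatenate([Y1.T @ y1 - k * V1.T @ y1, X1.T @ y1])
    return np.linalg.solve(A, c)

# k = 0 reduces to OLS on the raw regressors
Z = np.column_stack([Y1, X1])
b_ols = np.linalg.lstsq(Z, y1, rcond=None)[0]
print(np.allclose(k_class(0.0), b_ols))

# k = 1 reduces to 2SLS with the fitted values Y1 - V1
Zhat = np.column_stack([Y1 - V1, X1])
b_2sls = np.linalg.lstsq(Zhat, y1, rcond=None)[0]
print(np.allclose(k_class(1.0), b_2sls))
```

Setting k to the smallest root of (4.2-13) instead would give the LIML estimate, so one routine covers all three estimators.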
Three-stage least squares utilizes the covariance of the residuals across equations from the estimated 2SLS model to improve the estimated coefficients B and \Gamma. If the model has only exogenous variables on the right-hand side (B = I), the OLS estimates can be used to calculate the covariance of the residuals across equations; the resulting estimator is the seemingly unrelated regression (SUR) estimator. In this discussion, we will look at the 3SLS model only, since the SUR model is a special case. From (4.2-2) we rewrite the 2SLS estimator for the ith equation as

\delta_i = [Z_i'X(X'X)^{-1}X'Z_i]^{-1} Z_i'X(X'X)^{-1}X'y_i,        (4.2-17)

which estimates the ith 2SLS equation

y_i = Z_i\delta_i + u_i.        (4.2-18)

If we define8 (X'X)^{-1} = PP' and multiply equation (4.2-18) by P'X', we obtain

P'X'y_i = P'X'Z_i\delta_i + P'X'u_i        (4.2-19)

which can be written as

w_i = W_i\delta_i + \epsilon_i        (4.2-20)

where w_i = P'X'y_i, W_i = P'X'Z_i, and \epsilon_i = P'X'u_i. If all G 2SLS equations are written as

\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_G \end{bmatrix} =
\begin{bmatrix} W_1 & 0 & \cdots & 0 \\ 0 & W_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & W_G \end{bmatrix}
\begin{bmatrix} \delta_1 \\ \delta_2 \\ \vdots \\ \delta_G \end{bmatrix} +
\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_G \end{bmatrix}        (4.2-21)

then the system can be written as

w = W\alpha + \epsilon,        (4.2-22)

where \alpha stacks the \delta_i. For equations i and j,

E[\epsilon_i \epsilon_j'] = E[P'X'u_i u_j'XP] = \sigma_{ij} P'X'XP = \sigma_{ij} I        (4.2-23)

while the covariance of the error term for the system becomes

E(\epsilon\epsilon') = \begin{bmatrix} \sigma_{11}I & \sigma_{12}I & \cdots & \sigma_{1G}I \\ \sigma_{21}I & \sigma_{22}I & \cdots & \sigma_{2G}I \\ \vdots & & & \vdots \\ \sigma_{G1}I & \sigma_{G2}I & \cdots & \sigma_{GG}I \end{bmatrix} = \Sigma \otimes I        (4.2-24)

Equation (4.2-24) indicates that for each equation there is no heteroskedasticity, but that there is contemporaneous correlation of the residuals across equations. Equation (4.2-24) can be estimated from the 2SLS residuals of each equation for 3SLS, or from the OLS residuals of each equation for SUR models. Let

\hat V = \hat\Sigma \otimes I        (4.2-25)

be such an estimate. The 3SLS estimator of the system \delta, where \delta' = [B\ \Gamma], becomes

\delta = (W'\hat V^{-1}W)^{-1} W'\hat V^{-1}w        (4.2-26)
Jennings (1980) uses two alternative approaches to solve (4.2-26), depending on whether the covariance matrix of the 3SLS estimator

Var(\delta) = (\hat W'\hat V^{-1}\hat W)^{-1}        (4.2-27)

is required or not. In the former case, an orthogonal factorization method is used. In the latter case, to save space, the conjugate gradient iterative algorithm (Lanczos reduction) suggested by Paige and Saunders (1973) is used. This latter approach may or may not converge. For added detail see Jennings (1980). If the switch kcov=diag is used, there will not be convergence issues, since the QR approach will be used. Since many software systems use inversion methods, slight differences in the estimated coefficients will be observed; the QR approach is in theory more accurate. Implementation of the "textbook" approach is illustrated using the matrix command in section 4.5.
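The "textbook" 3SLS estimator of (4.2-25)-(4.2-26) can be sketched with a Kronecker-product weighting on a small simulated two-equation system. The data, parameter values and the inversion-based method are assumptions for illustration (this is not Kmenta's data, nor the QR route taken by simeq):

```python
import numpy as np

rng = np.random.default_rng(5)
T = 200

# Assumed two-equation system: equation 1 is overidentified,
# equation 2 exactly identified
x1, x2, x3 = rng.normal(size=(3, T))
u = rng.multivariate_normal([0, 0], [[1.0, 0.7], [0.7, 1.0]], size=T).T

# Structural system: y1 = 0.5*y2 + 1.0*x1 + u1
#                    y2 = -0.4*y1 + 1.0*x2 + 0.8*x3 + u2
B = np.array([[1.0, -0.5], [0.4, 1.0]])
G = np.array([[-1.0, 0.0, 0.0], [0.0, -1.0, -0.8]])
X = np.column_stack([x1, x2, x3])
Y = np.linalg.solve(B, -G @ X.T + u)     # reduced form, as in (4.1-4)
y1, y2 = Y

Z1 = np.column_stack([y2, x1])           # RHS of equation 1
Z2 = np.column_stack([y1, x2, x3])       # RHS of equation 2
P = X @ np.linalg.solve(X.T @ X, X.T)    # projection on all exogenous x's

def two_sls(Z, y):
    # single-equation 2SLS, as in (4.2-17)
    return np.linalg.solve(Z.T @ P @ Z, Z.T @ P @ y)

d1, d2 = two_sls(Z1, y1), two_sls(Z2, y2)
E = np.column_stack([y1 - Z1 @ d1, y2 - Z2 @ d2])
S = E.T @ E / T                          # 2SLS residual covariance

# Stack the system and apply the 3SLS formula (4.2-26)
ZB = np.block([[Z1, np.zeros((T, 3))], [np.zeros((T, 2)), Z2]])
yS = np.concatenate([y1, y2])
Wt = np.kron(np.linalg.inv(S), P)        # Sigma^{-1} (x) P_X weighting
d_3sls = np.linalg.solve(ZB.T @ Wt @ ZB, ZB.T @ Wt @ yS)
print(np.round(d_3sls, 2))               # approx [0.5, 1.0, -0.4, 1.0, 0.8]
```

Consistent with the discussion below, re-running with uncorrelated errors (or with both equations exactly identified) would leave the 3SLS estimates essentially at their 2SLS values.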
In a model with G equations, if the equation of interest is the jth equation, then assuming the exogenous variables in the system are selected correctly and the jth equation is specified correctly, the 2SLS estimates are invariant to the specification of any other equation. 3SLS estimation of the jth equation, in contrast, is sensitive to the specification of the other equations in the system, since changes in other equation specifications will alter the estimate of V and thus the 3SLS estimator of \delta from equation (4.2-26). Because of this fact, it is imperative that users first inspect the 2SLS estimates closely. The constrained reduced form estimates, \Pi, should be calculated from the OLS and 2SLS models and compared. The differences show the effects of correcting for simultaneity. Next, 3SLS should be performed. A study of the resulting changes in \delta and \Pi will show the gain from moving to a system-wide estimation procedure. Since changes in the functional form of one equation i can possibly impact the estimates of another equation j, sensitivity analysis should be attempted at this step of model building. In a multiequation system, the movement from 2SLS to 3SLS often produces changes in the estimate of \delta_i for one equation but not for another. In a model in which all equations are overidentified, in general the 3SLS estimators will differ from the 2SLS estimators. If all equations are exactly identified, then V is a diagonal matrix (Theil 1971, 511) and there is no gain for any equation from using 3SLS. In the test problem from Kmenta (1971, 565), which is discussed in the next section, one equation is overidentified and one equation is exactly identified. In this case, only the exactly identified equation will be changed by 3SLS: the exactly identified equation gains from information in the overidentified equation, but the reverse is not true.

In SUR models, if all equations contain the same variables, there is no gain over OLS from going to SUR, since V is again a diagonal matrix. Just as the LIML method of estimation is an alternative to 2SLS, FIML is a more costly alternative to 3SLS and I3SLS.
FIML9 is a generalization of LIML to systems of equations. Like LIML, it is invariant to the variable used to normalize the model. FIML, in contrast with 3SLS, is highly nonlinear and, as a consequence, much more costly to estimate. Because FIML is asymptotically equivalent to 3SLS (Theil 1971, 525) and the simeq FIML code offers no major advantages over other programs, the discussion of FIML is left to Theil (1971), Kmenta (1971) and Johnston (1984), except for an annotated FIML example using the matrix command. In the next section, an annotated output is presented.
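Although the FIML details are deferred, it helps to record the objective function involved. For the structural system written as $Y\Gamma = XB + E$, a standard textbook form of the log likelihood (not the specific simeq implementation), after concentrating out the disturbance covariance $\Sigma = E'E/T$, is

```latex
\ln L_c(\Gamma, B) = -\frac{Tn}{2}\left(1 + \ln 2\pi\right)
                     + T \ln \lvert \det \Gamma \rvert
                     - \frac{T}{2} \ln \det\!\left(\frac{E'E}{T}\right),
\qquad E = Y\Gamma - XB,
```

where $T$ is the number of observations and $n$ the number of endogenous variables. The Jacobian term $T\ln\lvert\det\Gamma\rvert$ is what makes the maximization highly nonlinear in comparison with 3SLS.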
Iterative 3SLS is an alternative final step in which the estimate of V is updated using information from the 3SLS estimates. The question then becomes when to stop iterating on the estimate of V. The simeq command uses the information on the number of significant digits in the raw data (see the ipr parameter) and equation (4.1-8) to terminate the I3SLS iterations once the relative change is within what would be expected, given the number of significant digits in the raw data. If ipr is not set, the simeq command assumes ten digits.
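The idea behind the stopping rule can be sketched as follows. The function name and the exact tolerance form below are illustrative assumptions, not the simeq code itself (which applies equation (4.1-8)); the sketch only shows how a significant-digit count translates into a relative-change test:

```python
import numpy as np

def i3sls_converged(delta_old, delta_new, ipr=10):
    """Illustrative stopping rule: stop iterating on V when the largest
    relative coefficient change falls below 10**(-ipr), i.e. within the
    precision of raw data carrying ipr significant digits."""
    rel = np.max(np.abs(delta_new - delta_old) /
                 np.maximum(np.abs(delta_old), 1e-300))
    return rel < 10.0 ** (-ipr)

# With six significant digits in the data (ipr=6), a change in the
# eighth digit counts as converged, a change in the third does not:
old = np.array([52.55269, 0.2244964])
assert i3sls_converged(old, old * (1.0 + 1e-8), ipr=6)
assert not i3sls_converged(old, old * 1.001, ipr=6)
```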
9 The fiml section of the simeq command is the weakest link. In addition to a probable scalar error in the fiml standard errors, there often are convergence problems that appear to be data related. In view of this, and the fact that 3SLS is an inexpensive substitute, users are encouraged to employ 3SLS and I3SLS in place of FIML. Future releases of B34S will endeavor to improve the FIML code or disable the option. The matrix command implementation of FIML, shown later in section 4.5, provides a look at how such a model might be implemented.
4.3 Examples
Using data on supply and demand from Kmenta (1971, 565), Table 4.3 shows b34s code to estimate models by OLS, LIML, 2SLS, and 3SLS. The reduced-form estimates for each model are calculated. Not all output is shown, to save space. The results are the same, digit for digit, as those reported in Kmenta (1971, 582), which are shown in Table 4.4. Kmenta (1986, 712) reported results for the same problem for all models with the same coefficients except I3SLS for the supply equation; these answers are shown in Table 4.5. Note the use of the keywords ls2 for 2SLS and ls3 for 3SLS, since the b34s parser will not recognize 2SLS and 3SLS as keywords. The b34s setup in Table 4.3 also shows Rats, SAS and Stata commands with which each software system can be benchmarked. These results will be discussed in turn.
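The 2SLS demand-equation estimates reported below can also be reproduced outside of any package. The following Python sketch (not part of the b34s setup; the data are transcribed from the Table 4.3 datacards) applies textbook 2SLS, projecting the regressors on all exogenous variables and regressing q on the projection:

```python
import numpy as np

# Kmenta (1971, 565) data: q, p, d, f, a (20 annual observations)
data = np.array([
    [ 98.485, 100.323,  87.4,  98.0,  1], [ 99.187, 104.264,  97.6,  99.1,  2],
    [102.163, 103.435,  96.7,  99.1,  3], [101.504, 104.506,  98.2,  98.1,  4],
    [104.240,  98.001,  99.8, 110.8,  5], [103.243,  99.456, 100.5, 108.2,  6],
    [103.993, 101.066, 103.2, 105.6,  7], [ 99.900, 104.763, 107.8, 109.8,  8],
    [100.350,  96.446,  96.6, 108.7,  9], [102.820,  91.228,  88.9, 100.6, 10],
    [ 95.435,  93.085,  75.1,  81.0, 11], [ 92.424,  98.801,  76.9,  68.6, 12],
    [ 94.535, 102.908,  84.6,  70.9, 13], [ 98.757,  98.756,  90.6,  81.4, 14],
    [105.797,  95.119, 103.1, 102.3, 15], [100.225,  98.451, 105.1, 105.0, 16],
    [103.522,  86.498,  96.4, 110.5, 17], [ 99.929, 104.016, 104.4,  92.5, 18],
    [105.223, 105.769, 110.7,  89.3, 19], [106.232, 113.490, 127.1,  93.0, 20]])
q, p, d, f, a = data.T
one = np.ones(20)

Z = np.column_stack([one, d, f, a])   # instruments: all exogenous variables
X = np.column_stack([one, p, d])      # demand-equation regressors

# Stage 1: fitted values of X from the instruments; Stage 2: OLS of q on them
Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta = np.linalg.lstsq(Xhat, q, rcond=None)[0]
print(beta)   # approximately [94.6333, -0.2436, 0.3140], the 2SLS demand results
```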
Table 4.3 B34S, Rats, SAS & Stata setups for ols, liml, ls2, ls3, and ils3 commands
==KMENTA1
%b34slet runsimeq=1;
%b34slet runsas =1;
%b34slet runrats =1;
%b34slet runstata=1;
B34sexec data nohead corr$
Input q p d f a $
Label q = 'Food consumption per head'$
Label p = 'Ratio of food prices to consumer prices'$
Label d = 'Disposable income in constant prices'$
Label f = 'Ratio of t 1 years price to general p'$
Label a = 'Time'$
Comment=('Kmenta (1971) page 565 answers page 582')$
Datacards$
 98.485 100.323  87.4  98.0   1
 99.187 104.264  97.6  99.1   2
102.163 103.435  96.7  99.1   3
101.504 104.506  98.2  98.1   4
104.240  98.001  99.8 110.8   5
103.243  99.456 100.5 108.2   6
103.993 101.066 103.2 105.6   7
 99.900 104.763 107.8 109.8   8
100.350  96.446  96.6 108.7   9
102.820  91.228  88.9 100.6  10
 95.435  93.085  75.1  81.0  11
 92.424  98.801  76.9  68.6  12
 94.535 102.908  84.6  70.9  13
 98.757  98.756  90.6  81.4  14
105.797  95.119 103.1 102.3  15
100.225  98.451 105.1 105.0  16
103.522  86.498  96.4 110.5  17
 99.929 104.016 104.4  92.5  18
105.223 105.769 110.7  89.3  19
106.232 113.490 127.1  93.0  20
B34sreturn$
B34seend$
%b34sif(&runsimeq.ne.0)%then;
B34sexec simeq printsys reduced ols liml ls2 ls3 ils3 kcov=diag
ipr=6$
Heading=('Test Case from Kmenta (1971) Pages 565
582 ' ) $
Exogenous constant d f a $
Endogenous p q $
Model lvar=q rvar=(constant p d)
Name=('Demand Equation')$
Model lvar=q rvar=(constant p f a) name=('Supply Equation')$
B34seend$
%b34sendif;
%b34sif(&runsas.ne.0)%then;
B34SEXEC OPTIONS OPEN('testsas.sas') UNIT(29) DISP=UNKNOWN$ B34SRUN$
B34SEXEC OPTIONS CLEAN(29) $ B34SEEND$
B34SEXEC PGMCALL IDATA=29 ICNTRL=29$
SAS $
PGMCARDS$
proc means; run;
proc syslin 3sls reduced;
instruments d f a constant;
endogenous p q;
demand: model q = p d;
supply: model q = p f a;
run;
proc syslin it3sls reduced;
instruments d f a constant;
endogenous p q;
demand: model q = p d;
supply: model q = p f a;
run;
B34SRETURN$
B34SRUN $
B34SEXEC OPTIONS CLOSE(29)$ B34SRUN$
/$ The next card has to be modified to point to the SAS location
/$ Be sure to wait until SAS is done before letting B34S resume
B34SEXEC OPTIONS dodos('start /w /r sas testsas')
dounix('sas testsas')$
B34SRUN$
B34SEXEC OPTIONS NPAGEOUT NOHEADER
WRITEOUT(' ','Output from SAS',' ',' ')
WRITELOG(' ','Output from SAS',' ',' ')
COPYFOUT('testsas.lst')
COPYFLOG('testsas.log')
dodos('erase testsas.sas','erase testsas.lst','erase testsas.log')
dounix('rm testsas.sas','rm testsas.lst','rm testsas.log')$
B34SRUN$
%b34sendif;
%b34sif(&runrats.ne.0)%then;
B34SEXEC OPTIONS HEADER$ B34SRUN$
b34sexec options open('rats.dat') unit(28) disp=unknown$ b34srun$
b34sexec options open('rats.in') unit(29) disp=unknown$ b34srun$
b34sexec options clean(28)$ b34srun$
b34sexec options clean(29)$ b34srun$
b34sexec pgmcall$
rats passasts pcomments('* ',
'* Data passed from B34S(r) system to RATS',
'* ',
"display @1 %dateandtime() @33 ' Rats Version ' %ratsversion()",
'* ') $
PGMCARDS$
*
* heading=('test case from kmenta 1971 page 565 - 582 ' ) $
* exogenous constant d f a $
* endogenous p q $
* model lvar=q rvar=(constant p d)    name=('demand eq.') $
* model lvar=q rvar=(constant p f a) name=('supply eq.') $
linreg q
# constant p d
linreg q
# constant p f a
instruments constant d f a
linreg(inst) q
# constant p d
linreg(inst) q
# constant p f a
source d:\r\liml.src
@liml q
# constant p d
@liml q
# constant p f a
equation demand q
# constant p d
equation supply q
# constant p f a
* Supply does not match known answers!!
sur(inst,iterations=200) 2
# demand resid1
# supply resid2
nonlin(parmset=structural) c0 c1 c2 d0 d1 d2 d3
compute c0 = .1
compute c1 = .1
compute c2 = .1
compute d0 = .1
compute d1 = .1
compute d2 = .1
compute d3 = .1
frml d_eq q = c0 + c1*p + c2*d
frml s_eq q = d0 + d1*p + d2*f + d3*a
nlsystem(inst,parmset=structural,outsigma=v) * * d_eq s_eq
b34sreturn$
b34srun $
b34sexec options close(28)$ b34srun$
b34sexec options close(29)$ b34srun$
b34sexec options
/$ dodos(' rats386 rats.in rats.out ')
dodos('start /w /r rats32s rats.in /run')
dounix('rats rats.in rats.out')$
B34SRUN$
b34sexec options npageout
WRITEOUT('Output from RATS',' ',' ')
COPYFOUT('rats.out')
dodos('ERASE rats.in','ERASE rats.out','ERASE rats.dat')
dounix('rm rats.in','rm rats.out','rm rats.dat')$
B34SRUN$
%b34sendif;
%b34sif(&runstata.ne.0)%then;
/$ This name is required unless the filename option is used
b34sexec options open('statdata.do') unit(28) disp=unknown$
b34srun$
b34sexec options clean(28)$ b34srun$
b34sexec options open('stata.do') unit(29) disp=unknown$ b34srun$
b34sexec options clean(29)$ b34srun$
b34sexec pgmcall idata=28 icntrl=29$
stata$
pgmcards$
// uncomment if you do not use /e
// log using stata.log, text
// version info
about
describe
summarize
reg3 (q p d) (q p f a), 2sls endog(p)
reg3 (q p d) (q p f a), 3sls endog(p)
reg3 (q p d) (q p f a), ireg3 endog(p)
b34sreturn$
b34seend$
b34sexec options close(28); b34srun;
b34sexec options close(29); b34srun;
b34sexec options
dodos('stata /e do stata.do');
b34srun;
b34sexec options npageout
writeout('output from stata',' ',' ')
copyfout('stata.log')
dodos('erase stata.do','erase stata.log','erase statdata.do') $
b34srun$
%b34sendif;
==
The OLS results from b34s match Kmenta to every digit and are shown next:
Test Case from Kmenta (1971) Pages 565 - 582

Summary of Input Parameters and Model

Number of systems to be estimated - - -      2
Number of identities  - - - - - - - - -      0
Number of exogenous variables - - - - -      4
Number of endogenous variables  - - - -      2
Number of data points in time - - - - -     20
Maximum number of unknowns per system        4
Print Parameter - - - - - - - - - - - -      2
Solutions wanted 0 => no, 1 => yes           1
Reduced form coefficients - - - - - - -      1
Ordinary Least Squares  - - - - - - - -      1
LIMLE Solution  - - - - - - - - - - - -      1
Two Stage Least Squares - - - - - - - -      1
Three Stage Least Squares - - - - - - -      1
Three Stage Covariance Matrix - - - - -      1
Iterated Three Stage Least Squares           1
Covariance Matrix for I3SLSQ  - - - - -      1
Maximum number of iterations  - - - - -     25
Functional Minimization 3SLSQ - - - - -      0
Covariance Matrix for Functional Min.        0

Systems described by the following columns of data

Name of the System     LHS    No. Y (Variables)    No. X (Variables)
Demand Equation         Q     1 P                  1 CONSTANT  2 D
Supply Equation         Q     1 P                  1 CONSTANT  3 F  4 A

B34S 8.10R   (D:M:Y) 11/ 4/04 (H:M:S) 11:13:19   SIMEQ STEP

Least Squares Solution for System Number   1    Demand Equation

Condition Number of Matrix is greater than   21.04911571706159
Relative Numerical Error in the Solution     1.301987681166638E-11

LHS Endogenous Variable No.   2  Q

Exogenous Variables (Predetermined)
                                     Std. Error         t
  1  CONSTANT      99.89542          7.519362           13.28509
  2  D            0.3346356          0.4542183E-01      7.367285

Endogenous Variables (Jointly Dependent)
  3  P           -0.3162988          0.9067741E-01     -3.488177

Residual Variance for Structural Disturbances   3.725391173733892
Ratio of Norm Residual to Norm LHS              1.762488253954560E-02

Covariance Matrix of Estimated Parameters

                     CONSTANT          D                 P
                         1             2                 3
CONSTANT   1         56.54
D          2         0.3216E-01       0.2063E-02
P          3        -0.5948          -0.2333E-02         0.8222E-02

Correlation Matrix of Estimated Parameters

                     CONSTANT          D                 P
                         1             2                 3
CONSTANT   1         1.000
D          2         0.9417E-01       1.000
P          3        -0.8724          -0.5665             1.000

Test Case from Kmenta (1971) Pages 565 - 582

Least Squares Solution for System Number   2    Supply Equation

Condition Number of Matrix is greater than   17.67594711864223
Relative Numerical Error in the Solution     1.318741471618151E-11

LHS Endogenous Variable No.   2  Q

Exogenous Variables (Predetermined)
                                     Std. Error         t
  1  CONSTANT      58.27543          11.46291           5.083825
  2  F            0.2481333          0.4618785E-01      5.372263
  3  A            0.2483023          0.9751777E-01      2.546227

Endogenous Variables (Jointly Dependent)
  4  P            0.1603666          0.9488394E-01      1.690134

Residual Variance for Structural Disturbances   5.784441135907554
Ratio of Norm Residual to Norm LHS              2.130622575072544E-02

Covariance Matrix of Estimated Parameters

                     CONSTANT          F                 A                P
                         1             2                 3                4
CONSTANT   1         131.4
F          2        -0.3044           0.2133E-02
A          3        -0.2792           0.1316E-02        0.9510E-02
P          4        -0.9875           0.8440E-03        0.5220E-03       0.9003E-02

Correlation Matrix of Estimated Parameters

                     CONSTANT          F                 A                P
                         1             2                 3                4
CONSTANT   1         1.000
F          2        -0.5749           1.000
A          3        -0.2498           0.2921            1.000
P          4        -0.9079           0.1926            0.5642E-01       1.000

Test Case from Kmenta (1971) Pages 565 - 582

Contemporaneous Covariance of Residuals (Structural Disturbances)
For Least Squares Solution.
Condition Number of residual columns,   2.664758

                    Demand E      Supply E
Demand E   1        3.167
Supply E   2        3.411         4.628

Correlation Matrix of Residuals

                    Demand E      Supply E
Demand E   1        1.000
Supply E   2        0.8912        1.000

Test Case from Kmenta (1971) Pages 565 - 582

Coefficients of the Reduced Form Equations.
Least Squares Solution.
Condition number of matrix used to find the reduced form coefficients is no smaller than   4.195815340351579

                       P             Q
                       1             2
CONSTANT   1         87.31         72.28
D          2         0.7020        0.1126
F          3        -0.5206        0.1647
A          4        -0.5209        0.1648

Mean sum of squares of residuals for the reduced form equations.

  1  P   0.42748D+01
  2  Q   0.39192D+01

Condition Number of columns of exogenous variables,   11.845
For each estimated equation, the condition number of the matrix, equation (4.1-7), and the relative numerical error in the solution, equation (4.1-8), are given. The relative numerical errors for the demand and supply equations were .1302E-10 and .13187E-10, respectively. The estimated coefficients agree with Kmenta (1971, 582). From the estimated B and Γ coefficients, the constrained reduced form π coefficients are calculated. The condition number of the exogenous columns, .11845E+2, shows little multicollinearity among the exogenous variables.
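The constrained reduced form calculation π = B Γ⁻¹ can be checked directly from the structural estimates. A minimal Python sketch (illustrative only, not the simeq code) using the 2SLS coefficients reported later in this section:

```python
import numpy as np

# 2SLS structural estimates:
#   demand: q = 94.6333 - 0.2435565 p + 0.3139918 d
#   supply: q = 49.5324 + 0.2400758 p + 0.2556057 f + 0.2529242 a
# Written as Y*Gamma = X*B with Y = [p q] and X = [1 d f a]:
Gamma = np.array([[0.2435565, -0.2400758],    # row p
                  [1.0,        1.0      ]])   # row q
B = np.array([[94.63330,   49.53244 ],        # constant
              [ 0.3139918,  0.0     ],        # d
              [ 0.0,        0.2556057],       # f
              [ 0.0,        0.2529242]])      # a

Pi = B @ np.linalg.inv(Gamma)   # constrained reduced form, pi = B * inv(Gamma)
print(Pi)   # column 1 is the P equation, column 2 the Q equation
```

The columns of Pi reproduce the printed 2SLS reduced form (93.25, .6492, -.5285, -.5230 for P and 71.92, .1559, .1287, .1274 for Q).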
The next outputs show the corresponding estimates for LIML, 2SLS, and 3SLS. As was discussed earlier, since the asymptotic SEs for LIML are the same as for 2SLS, the simeq command does not print these values. Kmenta, however, reports standard errors on the LIML estimates. Note that b34s reports both the large and small sample standard errors.
Test Case from Kmenta (1971) Pages 565 - 582

Limited Information - Maximum Likelihood Solution for System   1   Demand Equation

Rank and Condition Number of Exogenous Columns                         2   8.5174634
Rank and Condition Number of Endogenous Variables orthogonal to X(K)   2   6.5593694
Rank and Condition Number of Endogenous Variables orthogonal to X      2   2.3005812
Value of LIML Parameter is   1.173867141559841

Condition Number of Matrix is greater than   8.517463415017575
Relative Numerical Error in the Solution     4.487883690647531E-12

LHS Endogenous Variable No.   2  Q

Standard Deviation Equals 2SLSQ Standard Deviation.

Exogenous Variables (Predetermined)
  1  CONSTANT      93.61922
  2  D            0.3100134

Endogenous Variables (Jointly Dependent)
  3  P           -0.2295381

Residual Variance for Structural Disturbances   3.926009688207962
Ratio of Norm Residual to Norm LHS              1.809322459330604E-02

Test Case from Kmenta (1971) Pages 565 - 582

Limited Information - Maximum Likelihood Solution for System   2   Supply Equation

Rank and Condition Number of Exogenous Columns                         3   8.2098363
Rank and Condition Number of Endogenous Variables orthogonal to X(K)   1   1.0000000
Rank and Condition Number of Endogenous Variables orthogonal to X      2   1.0000000
Value of LIML Parameter is   1.000000000000000

Condition Number of Matrix is greater than   8.209836250820180
Relative Numerical Error in the Solution     4.943047984855735E-12

LHS Endogenous Variable No.   2  Q

Standard Deviation Equals 2SLSQ Standard Deviation.

Exogenous Variables (Predetermined)
  1  CONSTANT      49.53244
  2  F            0.2556057
  3  A            0.2529242

Endogenous Variables (Jointly Dependent)
  4  P            0.2400758

Residual Variance for Structural Disturbances   6.039577731391617
Ratio of Norm Residual to Norm LHS              2.177103664979223E-02

Test Case from Kmenta (1971) Pages 565 - 582

Contemporaneous Covariance of Residuals (Structural Disturbances)
For LIMLE Solution.
Condition Number of residual columns,   2.811594

                    Demand E      Supply E
Demand E   1        3.337
Supply E   2        3.629         4.832

Correlation Matrix of Residuals

                    Demand E      Supply E
Demand E   1        1.000
Supply E   2        0.9038        1.000

Test Case from Kmenta (1971) Pages 565 - 582

Coefficients of the Reduced Form Equations.
LIMLE Solution.
Condition number of matrix used to find the reduced form coefficients is no smaller than   4.258817996669486

                       P             Q
CONSTANT   1         93.88         72.07
D          2         0.6601        0.1585
F          3        -0.5443        0.1249
A          4        -0.5386        0.1236

Mean sum of squares of residuals for the reduced form equations.

  1  P   0.41286D+01
  2  Q   0.38401D+01
Test Case from Kmenta (1971) Pages 565 - 582

Two Stage Least Squares Solution for System Number   1   Demand Equation

Condition Number of Matrix is greater than   21.98482284147018
Relative Numerical Error in the Solution     1.411421448020441E-11

LHS Endogenous Variable No.   2  Q

Exogenous Variables (Predetermined)
                                 Std. Error         t           Theil SE         Theil t
  1  CONSTANT     94.63330       7.920838           11.94738    7.302652         12.95876
  2  D           0.3139918       0.4694366E-01      6.688695    0.4327991E-01    7.254908

Endogenous Variables (Jointly Dependent)
  3  P          -0.2435565       0.9648429E-01     -2.524313    0.8895412E-01   -2.738002

Residual Variance for Structural Disturbances   3.866416929101937
Ratio of Norm Residual to Norm LHS              1.795538131264630E-02

Covariance Matrix of Estimated Parameters

                     CONSTANT          D                 P
CONSTANT   1         62.74
D          2         0.4930E-01       0.2204E-02
P          3        -0.6734          -0.2642E-02         0.9309E-02

Correlation Matrix of Estimated Parameters

                     CONSTANT          D                 P
CONSTANT   1         1.000
D          2         0.1326           1.000
P          3        -0.8812          -0.5833             1.000

Test Case from Kmenta (1971) Pages 565 - 582

Two Stage Least Squares Solution for System Number   2   Supply Equation

Condition Number of Matrix is greater than   18.21923089332271
Relative Numerical Error in the Solution     1.431397195953368E-11

LHS Endogenous Variable No.   2  Q

Exogenous Variables (Predetermined)
                                 Std. Error         t           Theil SE         Theil t
  1  CONSTANT     49.53244       12.01053           4.124086    10.74254         4.610868
  2  F           0.2556057       0.4725007E-01      5.409637    0.4226175E-01    6.048158
  3  A           0.2529242       0.9965509E-01      2.537996    0.8913422E-01    2.837565

Endogenous Variables (Jointly Dependent)
  4  P           0.2400758       0.9993385E-01      2.402347    0.8938355E-01    2.685905

Residual Variance for Structural Disturbances   6.039577731391617
Ratio of Norm Residual to Norm LHS              2.177103664979223E-02

Covariance Matrix of Estimated Parameters

                     CONSTANT          F                 A                P
CONSTANT   1         144.3
F          2        -0.3238           0.2233E-02
A          3        -0.2952           0.1377E-02        0.9931E-02
P          4        -1.095            0.9362E-03        0.5791E-03       0.9987E-02

Correlation Matrix of Estimated Parameters

                     CONSTANT          F                 A                P
CONSTANT   1         1.000
F          2        -0.5706           1.000
A          3        -0.2467           0.2924            1.000
P          4        -0.9126           0.1983            0.5815E-01       1.000

Test Case from Kmenta (1971) Pages 565 - 582

Contemporaneous Covariance of Residuals (Structural Disturbances)
For Two Stage Least Squares Solution.
Condition Number of residual columns,   2.804709

                    Demand E      Supply E
Demand E   1        3.286
Supply E   2        3.593         4.832

Correlation Matrix of Residuals

                    Demand E      Supply E
Demand E   1        1.000
Supply E   2        0.9017        1.000

Test Case from Kmenta (1971) Pages 565 - 582

Coefficients of the Reduced Form Equations.
Two Stage Least Squares Solution
Condition number of matrix used to find the reduced form coefficients is no smaller than   4.135372945327849

                       P             Q
CONSTANT   1         93.25         71.92
D          2         0.6492        0.1559
F          3        -0.5285        0.1287
A          4        -0.5230        0.1274

Mean sum of squares of residuals for the reduced form equations.

  1  P   0.39831D+01
  2  Q   0.38317D+01

Condition number of the large matrix in Three Stage Least Squares   60.70221
Test Case from Kmenta (1971) Pages 565 - 582

Three Stage Least Squares Solution for System Number   1   Demand Equation

LHS Endogenous Variable No.   2  Q

Exogenous Variables (Predetermined)
                                 Std. Error         t           Theil SE         Theil t
  1  CONSTANT     94.63330       7.920838           11.94738    7.302652         12.95876
  2  D           0.3139918       0.4694366E-01      6.688695    0.4327991E-01    7.254908

Endogenous Variables (Jointly Dependent)
  3  P          -0.2435565       0.9648429E-01     -2.524313    0.8895412E-01   -2.738002

Residual Variance (For Structural Disturbances)   3.286454

Three Stage Least Squares Covariance for System   Demand Equation

                     CONSTANT          D                 P
CONSTANT   1         62.74
D          2         0.4930E-01       0.2204E-02
P          3        -0.6734          -0.2642E-02         0.9309E-02

Three Stage Least Squares Solution for System Number   2   Supply Equation

LHS Endogenous Variable No.   2  Q

Exogenous Variables (Predetermined)
                                 Std. Error         t           Theil SE         Theil t
  1  CONSTANT     52.11764       11.89337           4.382074    10.63776         4.899308
  2  F           0.2289775       0.4399381E-01      5.204767    0.3934926E-01    5.819106
  3  A           0.3579074       0.7288940E-01      4.910281    0.6519426E-01    5.489861

Endogenous Variables (Jointly Dependent)
  4  P           0.2289322       0.9967317E-01      2.296828    0.8915039E-01    2.567932

Residual Variance (For Structural Disturbances)   5.360809

Three Stage Least Squares Covariance for System   Supply Equation

                     CONSTANT          F                 A                P
CONSTANT   1         141.5
F          2        -0.2950           0.1935E-02
A          3        -0.4090           0.2548E-02        0.5313E-02
P          4        -1.083            0.8119E-03        0.1069E-02       0.9935E-02

Test Case from Kmenta (1971) Pages 565 - 582

Contemporaneous Covariance of Residuals (Structural Disturbances)
For Three Stage Least Squares Solution.
Condition Number of residual columns,   6.321462

                    Demand E      Supply E
Demand E   1        3.286
Supply E   2        4.111         5.361

Correlation Matrix of Residuals

                    Demand E      Supply E
Demand E   1        1.000
Supply E   2        0.9794        1.000

Test Case from Kmenta (1971) Pages 565 - 582

Coefficients of the Reduced Form Equations.
Three Stage Least Squares Solution using Orthogonal Factorization.
Condition number of matrix used to find the reduced form coefficients is no smaller than   4.232905401139098

                       P             Q
CONSTANT   1         89.98         72.72
D          2         0.6645        0.1521
F          3        -0.4846        0.1180
A          4        -0.7575        0.1845

Mean sum of squares of residuals for the reduced form equations.

  1  P   0.19065D+01
  2  Q   0.42494D+01

Iterated Three Stage Least Squares Results are given next.
Iteration begins for Iterated 3SLSQ.
Condition number of the large matrix in Three Stage Least Squares   147.2220

Test Case from Kmenta (1971) Pages 565 - 582

Iterated Three Stage Least Squares Solution for System No.   1   Demand Equation

LHS Endogenous Variable No.   2  Q

Exogenous Variables (Predetermined)
                                 Std. Error         t           Theil SE         Theil t
  1  CONSTANT     94.63330       7.920838           11.94738    7.302652         12.95876
  2  D           0.3139918       0.4694366E-01      6.688695    0.4327991E-01    7.254908

Endogenous Variables (Jointly Dependent)
  3  P          -0.2435565       0.9648429E-01     -2.524313    0.8895412E-01   -2.738002

Residual Variance (For Structural Disturbances)   3.286454

Iterated Three Stage Least Squares Covariance for System   Demand Equation

                     CONSTANT          D                 P
CONSTANT   1         62.74
D          2         0.4930E-01       0.2204E-02
P          3        -0.6734          -0.2642E-02         0.9309E-02

Iterated Three Stage Least Squares Solution for System No.   2   Supply Equation

LHS Endogenous Variable No.   2  Q

Exogenous Variables (Predetermined)
                                 Std. Error         t           Theil SE         Theil t
  1  CONSTANT     52.55269       12.74080           4.124755    11.39572         4.611616
  2  F           0.2244964       0.4653972E-01      4.823758    0.4162639E-01    5.393126
  3  A           0.3755747       0.7166061E-01      5.241020    0.6409520E-01    5.859638

Endogenous Variables (Jointly Dependent)
  4  P           0.2270569       0.1069194          2.123627    0.9563159E-01    2.374287

Residual Variance (For Structural Disturbances)   5.565111

Iterated Three Stage Least Squares Covariance for System   Supply Equation

                     CONSTANT          F                 A                P
CONSTANT   1         162.3
F          2        -0.3336           0.2166E-02
A          3        -0.4953           0.3185E-02        0.5135E-02
P          4        -1.245            0.9086E-03        0.1336E-02       0.1143E-01

Test Case from Kmenta (1971) Pages 565 - 582

Contemporaneous Covariance of Residuals (Structural Disturbances)
For Iterated Three Stage Least Squares Solution.
Condition Number of residual columns,   6.814796

                    Demand E      Supply E
Demand E   1        3.286
Supply E   2        4.198         5.565

Correlation Matrix of Residuals

                    Demand E      Supply E
Demand E   1        1.000
Supply E   2        0.9816        1.000

Test Case from Kmenta (1971) Pages 565 - 582

Coefficients of the Reduced Form Equations.
Iterated Three Stage Least Squares Solution.
Condition number of matrix used to find the reduced form coefficients is no smaller than   4.249772824974006

                       P             Q
CONSTANT   1         89.42         72.86
D          2         0.6672        0.1515
F          3        -0.4770        0.1162
A          4        -0.7981        0.1944

Mean sum of squares of residuals for the reduced form equations.

  1  P   0.20576D+01
  2  Q   0.43519D+01
In Table 4.4 Kmenta (1971, 582) reports the 3SLS and iterated three stage least squares coefficients for the supply equation as

3SLS    52.1972 (11.8934), .2286 (.0997)
I3SLS   55.5527 (12.7408), .2271 (.1069), .2245 (.0465) and .3756 (.0717)

B34s gets

52.55269 (12.7408), .2270569 (.1069194), .2244964 (.04653972) and .3755747 (.07166061)

The coefficient 55.5527 reported by Kmenta, underlined in Table 4.4, appears to be in error. In Table 4.5 Kmenta (1986, 712) changes the estimated coefficients for iterated three stage least squares. The new numbers are

52.6618 (12.8051), .2266 (.1075), .2234 (.0468) and .3800 (.0720).

These numbers are quite different from the prior ones and bear some investigation. In the Kmenta test problem, one equation (demand) is overidentified and one equation (supply) is exactly identified. As was mentioned earlier, the 2SLS and 3SLS results for the overidentified equation are the same because the other equation is exactly identified. However, the 3SLS results for the exactly identified equation (supply) differ from the 2SLS results because the other equation (demand) is overidentified. Close inspection of the 3SLS results for the demand equation shows that they are the same as those of Kmenta (1971, 582) and Kmenta (1986, 712). As noted, the iterated three stage least squares supply-equation results are the same as those of Kmenta (1971) but differ slightly from those of Kmenta (1986), which appear to be in error.10 To facilitate testing, SAS and RATS setups are shown in Table 4.3 and their output discussed in some detail.

10 The file example.mac contains an extension of the above test case that calls RATS, SAS and a B34S matrix implementation. For the supply equation SAS gets the Kmenta (1986) results, which are 52.1972 (11.8934), .2286 (.0997), .2282 (.0440), (.3611). What RATS calls 3SLS produces what B34S calls I3SLS. Readers are encouraged to use the code in Tables 4.4 and 4.5 to investigate this issue further. A major difficulty for the researcher is being able to tell exactly what is being estimated by a software system. For this reason, attempting the model on multiple software systems is strongly advised.
Table 4.4 Kmenta (1971, 582) OLS, 2SLS, LIML, 3SLS, I3SLS, FIML Test Problem Answers
Table 4.5 Kmenta (1986, 712) OLS, 2SLS, LIML, 3SLS, I3SLS, FIML Test Problem Answers
As noted earlier, the 2SLS and 3SLS results for the overidentified equation (demand) are the same. However, the printout shows that the residual variance for the 2SLS result is 3.8664, while the residual variance for the 3SLS result is 3.2865. The reason for this apparent discrepancy is that the 2SLS residual variance equals the sum of squared residuals divided by T-K, while the 3SLS calculation uses T; hence 3.8664 = 3.2865 * (20/17).
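The rescaling is easy to confirm from the printed values:

```python
# The 2SLS printout divides the residual sum of squares by T-K while the
# 3SLS printout divides by T, so for the same coefficient vector the two
# reported residual variances differ by the factor T/(T-K).
T, K = 20, 3             # 20 observations, 3 demand-equation parameters
var_3sls = 3.286454      # 3SLS residual variance from the printout
var_2sls = var_3sls * T / (T - K)
print(round(var_2sls, 4))   # 3.8664, the 2SLS printout value
```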
To investigate the differences in the supply equation between Kmenta (1971) and Kmenta (1986), edited and annotated SAS, RATS and Stata output is shown next. The SAS 3SLS and I3SLS output agrees with Kmenta (1986) for both the demand and supply equations. Note that these numbers do not agree with Kmenta (1971)!
The SYSLIN Procedure
Three-Stage Least Squares Estimation

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1     94.63330             7.920838        11.95     <.0001
P            1     -0.24356             0.096484        -2.52     0.0218
D            1     0.313992             0.046944         6.69     <.0001

Model                SUPPLY
Dependent Variable   Q

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1     52.19720            11.89337          4.39     0.0005
P            1     0.228589             0.099673         2.29     0.0357
F            1     0.228158             0.043994         5.19     <.0001
A            1     0.361138             0.072889         4.95     0.0001

Endogenous Variables

             P           Q
DEMAND       0.243557    1
SUPPLY      -0.22859     1

Exogenous Variables

             Intercept   D           F           A
DEMAND       94.6333     0.313992    0           0
SUPPLY       52.1972     0           0.228158    0.361138

Inverse Endogenous Variables

             P           Q
DEMAND       2.11799     0.48415
SUPPLY      -2.11799     0.51585

The SYSLIN Procedure
Three-Stage Least Squares Estimation

Reduced Form

             Intercept   D           F           A
P            89.87924    0.665032   -0.48324    -0.76489
Q            72.74263    0.152019    0.117695    0.186293

The SYSLIN Procedure
Iterative Three-Stage Least Squares Estimation

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1     94.63330             7.920838        11.95     <.0001
P            1     -0.24356             0.096484        -2.52     0.0218
D            1     0.313992             0.046944         6.69     <.0001

Model                SUPPLY
Dependent Variable   Q

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1     52.66182            12.80511          4.11     0.0008
P            1     0.226586             0.107459         2.11     0.0511
F            1     0.223372             0.046774         4.78     0.0002
A            1     0.380006             0.072010         5.28     <.0001

Endogenous Variables

             P           Q
DEMAND       0.243557    1
SUPPLY      -0.22659     1

Exogenous Variables

             Intercept   D           F           A
DEMAND       94.6333     0.313992    0           0
SUPPLY       52.66182    0           0.223372    0.380006

Inverse Endogenous Variables

             P           Q
DEMAND       2.127012    0.481952
SUPPLY      -2.12701     0.518048

The SYSLIN Procedure
Iterative Three-Stage Least Squares Estimation

Reduced Form

             Intercept   D           F           A
P            89.27387    0.667864   -0.47512    -0.80828
Q            72.89007    0.151329    0.115718    0.196861
RATS output is shown next for OLS, 2SLS, LIML, and 3SLS computed two ways. Note that what RATS labels 3SLS agrees exactly with what b34s and Kmenta obtain for I3SLS, not 3SLS. RATS uses the large sample SE. RATS output for the same problem using the nonlin procedure gives the same answers. The RATS Pro version 8.1 was used to make the calculations.
Output from RATS
*
* Data passed from B34S(r) system to RATS
*
display @1 %dateandtime() @33 ' Rats Version ' %ratsversion()
03/10/2012 15:05                 Rats Version    8.10000
*
CALENDAR(IRREGULAR)
ALLOCATE       20
OPEN DATA rats.dat
DATA(FORMAT=FREE,ORG=OBS, $
MISSING= 0.1000000000000000E+32 ) / $
Q $
P $
D $
F $
A $
CONSTANT
SET TREND = T
TABLE
Series    Obs           Mean      Std Error        Minimum        Maximum
Q          20  100.898200000    3.756498224   92.424000000  106.232000000
P          20  100.019050000    5.926086394   86.498000000  113.490000000
D          20   97.535000000   11.830481371   75.100000000  127.100000000
F          20   96.625000000   12.708798237   68.600000000  110.800000000
A          20   10.500000000    5.916079783    1.000000000   20.000000000
TREND      20   10.500000000    5.916079783    1.000000000   20.000000000
*
* heading=('test case from kmenta 1971 page 565 - 582 ' ) $
* exogenous constant d f a $
* endogenous p q $
* model lvar=q rvar=(constant p d)    name=('demand eq.') $
* model lvar=q rvar=(constant p f a) name=('supply eq.') $
linreg q
# constant p d
Linear Regression - Estimation by Least Squares
Dependent Variable Q
Usable Observations                        20
Degrees of Freedom                         17
Centered R^2                        0.7637886
R-Bar^2                             0.7359990
Uncentered R^2                      0.9996894
Mean of Dependent Variable       100.89820000
Std Error of Dependent Variable    3.75649822
Standard Error of Estimate         1.93012724
Sum of Squared Residuals         63.331649953
Regression F(2,17)                    27.4847
Significance Level of F             0.0000047
Log Likelihood                       -39.9053
Durbin-Watson Statistic                1.7442

    Variable         Coeff        Std Error     T-Stat     Signif
************************************************************************************
1.  Constant      99.89542291    7.51936214   13.28509  0.00000000
2.  P             -0.31629880    0.09067741   -3.48818  0.00281529
3.  D              0.33463560    0.04542183    7.36729  0.00000110
linreg q
# constant p f a
Linear Regression - Estimation by Least Squares
Dependent Variable Q
Usable Observations                        20
Degrees of Freedom                         16
Centered R^2                        0.6548075
R-Bar^2                             0.5900838
Uncentered R^2                      0.9995460
Mean of Dependent Variable       100.89820000
Std Error of Dependent Variable    3.75649822
Standard Error of Estimate         2.40508651
Sum of Squared Residuals         92.551058175
Regression F(3,16)                    10.1170
Significance Level of F             0.0005602
Log Likelihood                       -43.6991
Durbin-Watson Statistic                2.1097

    Variable         Coeff        Std Error     T-Stat     Signif
************************************************************************************
1.  Constant      58.275431202  11.462909888   5.08383  0.00011056
2.  P              0.160366596   0.094883937   1.69013  0.11038810
3.  F              0.248133295   0.046187854   5.37226  0.00006227
4.  A              0.248302347   0.097517767   2.54623  0.02156713
instruments constant d f a
linreg(inst) q
# constant p d
Linear Regression - Estimation by Instrumental Variables
Dependent Variable Q
Usable Observations                        20
Degrees of Freedom                         17
Mean of Dependent Variable       100.89820000
Std Error of Dependent Variable    3.75649822
Standard Error of Estimate         1.96632066
Sum of Squared Residuals         65.729087795
J-Specification(1)                     2.5357
Significance Level of J             0.1113010
Durbin-Watson Statistic                2.0092

    Variable         Coeff        Std Error     T-Stat     Signif
************************************************************************************
1.  Constant      94.63330387    7.92083831   11.94738  0.00000000
2.  P             -0.24355654    0.09648429   -2.52431  0.02183240
3.  D              0.31399179    0.04694366    6.68869  0.00000381
linreg(inst) q
# constant p f a
Linear Regression - Estimation by Instrumental Variables
Dependent Variable Q
Usable Observations                        20
Degrees of Freedom                         16
Mean of Dependent Variable       100.89820000
Std Error of Dependent Variable    3.75649822
Standard Error of Estimate         2.45755523
Sum of Squared Residuals         96.633243702
Durbin-Watson Statistic                2.3846

    Variable         Coeff        Std Error     T-Stat     Signif
************************************************************************************
1.  Constant      49.532441699  12.010526407   4.12409  0.00079536
2.  P              0.240075779   0.099933852   2.40235  0.02878451
3.  F              0.255605724   0.047250071   5.40964  0.00005785
4.  A              0.252924175   0.099655087   2.53800  0.02192877
source d:\r\liml.src
@liml q
# constant p d
Linear Regression - Estimation by LIML
Dependent Variable Q
Usable Observations                        20
Degrees of Freedom                         17
Centered R^2                        0.7510682
R-Bar^2                             0.7217821
Uncentered R^2                      0.9996726
Mean of Dependent Variable       100.89820000
Std Error of Dependent Variable    3.75649822
Standard Error of Estimate         1.98141608
Sum of Squared Residuals         66.742164700
Regression F(2,17)                    25.6459
Significance Level of F             0.0000074
Log Likelihood                       -40.4298
Durbin-Watson Statistic                2.0517

    Variable         Coeff        Std Error     T-Stat     Signif
************************************************************************************
1.  Constant      93.61922028    8.03124312   11.65688  0.00000000
2.  P             -0.22953809    0.09800238   -2.34217  0.03160318
3.  D              0.31001345    0.04743306    6.53581  0.00000509

LIML Specification Test
Chi-Squared(1)=     3.477343 with Significance Level 0.06221456
@liml q
# constant p f a
Linear Regression - Estimation by LIML
Dependent Variable Q
Usable Observations 20
Degrees of Freedom 16
Centered R^2 0.6395819
R-Bar^2 0.5720035
Uncentered R^2 0.9995260
Mean of Dependent Variable 100.89820000
Std Error of Dependent Variable 3.75649822
Standard Error of Estimate 2.45755523
Sum of Squared Residuals 96.633243702
Regression F(3,16) 9.4643
Significance Level of F 0.0007834
Log Likelihood -44.1307
Durbin-Watson Statistic 2.3846

Variable        Coeff           Std Error       T-Stat    Signif
************************************************************************************
1. Constant     49.532441699    12.010526407    4.12409   0.00079536
2. P             0.240075779     0.099933852    2.40235   0.02878451
3. F             0.255605724     0.047250071    5.40964   0.00005785
4. A             0.252924175     0.099655087    2.53800   0.02192877

LIML Specification Test
Chi-Squared(0)= -4.440892e-015 with Significance Level NA

equation demand q
# constant p d
equation supply q
# constant p f a
* Supply does not match known answers!!
sur(inst,iterations=200) 2
# demand resid1
# supply resid2
Linear Systems - Estimation by System Instrumental Variables
Iterations Taken 6
Usable Observations 20
J-Specification(1) 2.9831
Significance Level of J 0.0841370
Dependent Variable Q
Mean of Dependent Variable 100.89820000
Std Error of Dependent Variable 3.75649822
Standard Error of Estimate 1.81285807
Sum of Squared Residuals 65.729087794
Durbin-Watson Statistic 2.0092

Variable        Coeff          Std Error     T-Stat     Signif
************************************************************************************
1. Constant     94.63330387    7.30265210    12.95876   0.00000000
2. P            -0.24355654    0.08895412    -2.73800   0.00618138
3. D             0.31399179    0.04327991     7.25491   0.00000000
Dependent Variable Q
Mean of Dependent Variable 100.89820000
Std Error of Dependent Variable 3.75649822
Standard Error of Estimate 2.35904587
Sum of Squared Residuals 111.30194805
Durbin-Watson Statistic 2.0945

Variable        Coeff           Std Error       T-Stat    Signif
************************************************************************************
4. Constant     52.552667564    11.395623960    4.61165   0.00000399
5. P             0.227056969     0.095630772    2.37431   0.01758185
6. F             0.224496638     0.041626039    5.39318   0.00000007
7. A             0.375573566     0.064094682    5.85967   0.00000000

Covariance\Correlation Matrix of Coefficients
                Q              Q
Q    3.2864543897   0.98159966
Q    4.1979241683   5.5650974026
nonlin(parmset=structural) c0 c1 c2 d0 d1 d2 d3
compute c0 = .1
compute c1 = .1
compute c2 = .1
compute d0 = .1
compute d1 = .1
compute d2 = .1
compute d3 = .1
frml d_eq q = c0 + c1*p + c2*d
frml s_eq q = d0 + d1*p + d2*f + d3*a
nlsystem(inst,parmset=structural,outsigma=v) * * d_eq s_eq
GMM-Factored Weight Matrix
Convergence in 6 Iterations. Final criterion was 0.0000065 <= 0.0000100
Usable Observations 20
Function Value 2.98311941
J-Specification(1) 2.9831
Significance Level of J 0.0841370

Dependent Variable Q
Mean of Dependent Variable 100.89820000
Std Error of Dependent Variable 3.75649822
Standard Error of Estimate 1.81285807
Sum of Squared Residuals 65.729087792
Durbin-Watson Statistic 2.0092

Dependent Variable Q
Mean of Dependent Variable 100.89820000
Std Error of Dependent Variable 3.75649822
Standard Error of Estimate 2.35904587
Sum of Squared Residuals 111.30194805
Durbin-Watson Statistic 2.0945
Variable     Coeff          Std Error      T-Stat     Signif
***************************************************************************************
1. C0        94.63330387     7.30265212    12.95876   0.00000000
2. C1        -0.24355654     0.08895412    -2.73800   0.00618138
3. C2         0.31399179     0.04327991     7.25491   0.00000000
4. D0        52.55266756    11.39562399     4.61165   0.00000399
5. D1         0.22705697     0.09563077     2.37431   0.01758185
6. D2         0.22449664     0.04162604     5.39318   0.00000007
7. D3         0.37557357     0.06409468     5.85967   0.00000000
The Stata results are shown next. Note that the 3SLS and I3SLS results agree 100% with the B34S
simeq answers and with what was reported in Kmenta (1971).
output from stata
Stata 12.1   Statistics/Data Analysis
Copyright 1985-2011 StataCorp LP
StataCorp, 4905 Lakeway Drive, College Station, Texas 77845 USA
800-STATA-PC   http://www.stata.com
979-696-4600   stata@stata.com   979-696-4601 (fax)
Single-user Stata perpetual license: Serial number: 3012042652
Licensed to: Houston H. Stokes, U of Illinois
Notes:
1. Stata running in batch mode

. do stata.do
. * File built by B34S on 7/ 3/12 at 10:32:34
. run statdata.do
. about
Stata/IC 12.1 for Windows (32-bit)
Revision 06 Feb 2012
Copyright 1985-2011 StataCorp LP
Total physical memory: 2097151 KB
Available physical memory: 2097151 KB
Single-user Stata perpetual license: Serial number: 3012042652
Licensed to: Houston H. Stokes, U of Illinois
. describe
Contains data
  obs:    20
  vars:    6
  size:  960
----------------------------------------------------------------------------
              storage  display    value
variable name   type   format     label   variable label
----------------------------------------------------------------------------
q              double  %10.0g             Food consumption per head
p              double  %10.0g             Ratio of food prices to consumer prices
d              double  %10.0g             Disposable income in constant prices
f              double  %10.0g             Ratio of t-1 years price to general p
a              double  %10.0g             Time
constant       double  %10.0g
----------------------------------------------------------------------------
Sorted by:
Note: dataset has changed since last saved
. summarize

    Variable |    Obs        Mean    Std. Dev.       Min        Max
-------------+------------------------------------------------------
           q |     20    100.8982    3.756498     92.424    106.232
           p |     20    100.0191    5.926086     86.498     113.49
           d |     20      97.535    11.83048       75.1      127.1
           f |     20      96.625     12.7088       68.6      110.8
           a |     20        10.5     5.91608          1         20
-------------+------------------------------------------------------
    constant |     20           1           0          1          1
. reg3 (q p d) (q p f a), 2sls endog(p)
Two-stage least-squares regression
----------------------------------------------------------------------
Equation          Obs  Parms        RMSE    "R-sq"      F-Stat       P
----------------------------------------------------------------------
q                  20      2    1.966321    0.7548       23.81  0.0000
2q                 20      3    2.457555    0.6396       10.70  0.0000
----------------------------------------------------------------------
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
q            |
           p |  -.2435565   .0964843    -2.52   0.017    -.4398553   -.0472578
           d |   .3139918   .0469437     6.69   0.000     .2184842    .4094994
       _cons |    94.6333   7.920838    11.95   0.000     78.51824    110.7484
-------------+----------------------------------------------------------------
2q           |
           p |   .2400758   .0999339     2.40   0.022     .0367588    .4433927
           f |   .2556057   .0472501     5.41   0.000     .1594747    .3517367
           a |   .2529242   .0996551     2.54   0.016     .0501744     .455674
       _cons |   49.53244   12.01053     4.12   0.000     25.09684    73.96804
------------------------------------------------------------------------------
Endogenous variables: q p
Exogenous variables:  d f a
------------------------------------------------------------------------------
. reg3 (q p d) (q p f a), 3sls endog(p)
Three-stage least-squares regression
----------------------------------------------------------------------
Equation          Obs  Parms        RMSE    "R-sq"        chi2       P
----------------------------------------------------------------------
q                  20      2    1.812858    0.7548       56.02  0.0000
2q                 20      3    2.315342    0.6001       38.20  0.0000
----------------------------------------------------------------------
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
q            |
           p |  -.2435565   .0889541    -2.74   0.006    -.4179034   -.0692097
           d |   .3139918   .0432799     7.25   0.000     .2291647    .3988189
       _cons |    94.6333   7.302652    12.96   0.000     80.32037    108.9462
-------------+----------------------------------------------------------------
2q           |
           p |   .2289322   .0891504     2.57   0.010     .0542006    .4036637
           f |   .2289775   .0393493     5.82   0.000     .1518544    .3061006
           a |   .3579074   .0651943     5.49   0.000      .230129    .4856858
       _cons |   52.11764   10.63776     4.90   0.000     31.26802    72.96726
------------------------------------------------------------------------------
Endogenous variables: q p
Exogenous variables:  d f a
------------------------------------------------------------------------------
-----------------------------------------------------------------------------. reg3 (q p d) (q p f a), ireg3 endog(p)
Iteration 1:  tolerance = .08379059
Iteration 2:  tolerance = .01113651
Iteration 3:  tolerance = .00158649
Iteration 4:  tolerance = .00022817
Iteration 5:  tolerance = .00003286
Iteration 6:  tolerance = 4.733e-06
Iteration 7:  tolerance = 6.818e-07
Three-stage least-squares regression, iterated
----------------------------------------------------------------------
Equation          Obs  Parms        RMSE    "R-sq"        chi2       P
----------------------------------------------------------------------
q                  20      2    1.812858    0.7548       56.02  0.0000
2q                 20      3    2.359048    0.5849       36.80  0.0000
----------------------------------------------------------------------
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
q            |
           p |  -.2435565   .0889541    -2.74   0.006    -.4179034   -.0692097
           d |   .3139918   .0432799     7.25   0.000     .2291647    .3988189
       _cons |    94.6333   7.302652    12.96   0.000     80.32037    108.9462
-------------+----------------------------------------------------------------
2q           |
           p |   .2270569   .0956315     2.37   0.018     .0396226    .4144911
           f |   .2244964   .0416263     5.39   0.000     .1429103    .3060825
           a |   .3755745   .0640951     5.86   0.000     .2499504    .5011986
       _cons |   52.55269   11.39571     4.61   0.000     30.21751    74.88787
------------------------------------------------------------------------------
Endogenous variables: q p
Exogenous variables:  d f a
------------------------------------------------------------------------------
.
end of do-file
What is to be made of this mystery? It is strange that b34s, Kmenta (1971) and Stata agree 100%
for 3SLS and I3SLS, while SAS version 9.2 on the same problem supports Kmenta (1986). The
Rats output suggests it is doing 3SLS when in fact it is calculating I3SLS with the NONLIN
command. Close inspection of the output shows that these numbers support Kmenta (1971), Stata
and b34s and not SAS or Kmenta (1986). It is to be noted that Jennings (1980), who developed
the b34s simeq fortran code, used the Kmenta (1971) problem as a test case but did not report
numbers.

Section 4.5 below attempts to solve this mystery by using "text book" formulas to obtain 2SLS,
3SLS and FIML answers. Since the exact b34s MATRIX commands are given, the calculation is
100% documented provided that the MATRIX command is working properly. All coefficients
agree 100% with Kmenta (1971)! For FIML the results are calculated using the CMAXF2
command, which uses a zero finder routine from IMSL. The SEs for the coefficients were
calculated as sqrt(|diag(H^-1)|), where H is the Hessian.
4.4 Exactly identified systems

Table 4.6 shows the Kmenta supply and demand model modified to be exactly identified. In this
form of the model the exogenous variable a was removed from the demand equation. In this case
the reduced form Pi can be directly estimated with OLS and does not have to be calculated as
Gamma B^-1 using (4.1-4). It will be shown below that the LIML, 2SLS and 3SLS results are all
the same. If Pi is calculated from the biased OLS model of the over-identified system, it will,
however, not be the same.
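The order condition behind this discussion (an equation is exactly identified when the number of exogenous variables excluded from it equals the number of its right-hand-side endogenous variables) can be sketched as a small check; a minimal Python sketch using the Kmenta variable names:

```python
def order_condition(all_exog, rhs_exog, n_rhs_endog):
    """Classify an equation by the order condition for identification."""
    excluded = len(set(all_exog) - set(rhs_exog))
    if excluded < n_rhs_endog:
        return "under-identified"
    if excluded == n_rhs_endog:
        return "exactly identified"
    return "over-identified"

# Original Kmenta system: exogenous = constant d f a
print(order_condition(["constant", "d", "f", "a"], ["constant", "d"], 1))        # demand
print(order_condition(["constant", "d", "f", "a"], ["constant", "f", "a"], 1))   # supply
# Modified system of Table 4.6: a dropped, exogenous = constant d f
print(order_condition(["constant", "d", "f"], ["constant", "d"], 1))             # demand
print(order_condition(["constant", "d", "f"], ["constant", "f"], 1))             # supply
```

In the modified system each equation excludes exactly one exogenous variable and contains one right-hand-side endogenous variable (p), so both equations are exactly identified.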
Table 4.6 Exactly Identified Kmenta Problem
/; Modified PROBLEM FROM KMENTA (1971) PAGE 565 - 582
b34sexec options ginclude('b34sdata.mac') member(kmenta);
b34srun;
b34sexec simeq printsys reduced ols liml ls2 ls3 ils3 icov ipr=6
itmax=2000 kcov=diag ;
heading=('Modified test case from kmenta 1971 pp 565-582' ) ;
* the variable a has been removed from demand equation ;
exogenous constant d f ;
endogenous p q ;
model lvar=q rvar=(constant p d)
name=('demand eq.') ;
model lvar=q rvar=(constant p f)
name=('supply eq.') ;
b34seend ;
b34sexec matrix;
call loaddata;
call olsq(q d f :print);
call olsq(p d f :print);
b34srun;
Edited output from running the code in Table 4.6 is shown below and will show alternative ways
to calculate the constrained reduced form:
Q = 71.7276 + .18278 D + .11739 F          (4.4-1)
    (15.93)   (3.86)     (2.67)

P = 85.1843 + .4346 D - .28520 F           (4.4-2)
    (10.19)   (4.95)    (-3.49)

which was estimated in (4.4-1) and (4.4-2) with OLS.
Modified test case from kmenta 1971 pp 565-582
Least Squares Solution for System Number 1   demand eq.
Condition Number of Matrix is greater than 21.04911571706159
Relative Numerical Error in the Solution 1.301987681166638E-11
LHS Endogenous Variable No. 2   Q

Exogenous Variables (Predetermined)        Coefficient    Std. Error       t
 1 CONSTANT                                  99.89542     7.519362         13.28509
 2 D                                        0.3346356     0.4542183E-01    7.367285
Endogenous Variables (Jointly Dependent)
 3 P                                       -0.3162988     0.9067741E-01   -3.488177

Residual Variance for Structural Disturbances 3.725391173733892
Ratio of Norm Residual to Norm LHS 1.762488253954560E-02
Modified test case from kmenta 1971 pp 565-582
Least Squares Solution for System Number 2   supply eq.
Condition Number of Matrix is greater than 17.64779394899586
Relative Numerical Error in the Solution 1.349763156429639E-11
LHS Endogenous Variable No. 2   Q

Exogenous Variables (Predetermined)        Coefficient    Std. Error       t
 1 CONSTANT                                  65.56501     12.76481         5.136387
 2 F                                        0.2137827     0.5080064E-01    4.208269
Endogenous Variables (Jointly Dependent)
 3 P                                        0.1467363     0.1089446        1.346889

Residual Variance for Structural Disturbances 7.650185613573186
Ratio of Norm Residual to Norm LHS 2.525668087747731E-02
Modified test case from kmenta 1971 pp 565-582
Coefficients of the Reduced Form Equations. Least Squares Solution.
Condition number of matrix used to find the reduced form coefficients is no smaller than 4.319326581036200

         CONSTANT      D         F
1  P       74.14     0.7227   -0.4617
2  Q       76.44     0.1060    0.1460

Mean sum of squares of residuals for the reduced form equations.
1  P   0.21861D+02
2  Q   0.44308D+01

Condition Number of columns of exogenous variables, 9.7857
Modified test case from kmenta 1971 pp 565-582
Limited Information - Maximum Likelihood Solution for System Number 1   demand eq.
Rank and Condition Number of Exogenous Columns 2   8.5174634
Rank and Condition Number of Endogenous Variables orthogonal to X(K) 1   1.0000000
Rank and Condition Number of Endogenous Variables orthogonal to X 2   1.0000000
Value of LIML Parameter is 1.000000000000000
Condition Number of Matrix is greater than 8.517463415017575
Relative Numerical Error in the Solution 4.390231825107355E-12
LHS Endogenous Variable No. 2   Q
Standard Deviation Equals 2SLSQ Standard Deviation.

Exogenous Variables (Predetermined)        Coefficient
 1 CONSTANT                                  106.7894
 2 D                                        0.3616812
Endogenous Variables (Jointly Dependent)
 3 P                                       -0.4115989

Residual Variance for Structural Disturbances 3.967444759365652
Ratio of Norm Residual to Norm LHS 1.818845186115603E-02

Modified test case from kmenta 1971 pp 565-582
Limited Information - Maximum Likelihood Solution for System Number 2   supply eq.
Rank and Condition Number of Exogenous Columns 2   7.8643511
Rank and Condition Number of Endogenous Variables orthogonal to X(K) 1   1.0000000
Rank and Condition Number of Endogenous Variables orthogonal to X 2   1.0000000
Value of LIML Parameter is 1.000000000000000
Condition Number of Matrix is greater than 7.864351104449048
Relative Numerical Error in the Solution 5.058888259015094E-12
LHS Endogenous Variable No. 2   Q
Standard Deviation Equals 2SLSQ Standard Deviation.

Exogenous Variables (Predetermined)        Coefficient
 1 CONSTANT                                  35.90387
 2 F                                        0.2373297
Endogenous Variables (Jointly Dependent)
 3 P                                        0.4205434

Residual Variance for Structural Disturbances 10.49268888645498
Ratio of Norm Residual to Norm LHS 2.957901371051407E-02
Modified test case from kmenta 1971 pp 565-582
Coefficients of the Reduced Form Equations. LIML Solution.
Condition number of matrix used to find the reduced form coefficients is no smaller than 2.403435013906650

         CONSTANT      D         F
1  P       85.18     0.4346   -0.2852
2  Q       71.73     0.1828    0.1174
Modified test case from kmenta 1971 pp 565-582
Two Stage Least Squares Solution for System Number 1   demand eq.
Condition Number of Matrix is greater than 32.58122209700925
Relative Numerical Error in the Solution 2.267663108215286E-11
LHS Endogenous Variable No. 2   Q

Exogenous Variables (Predetermined)   Coefficient   Std. Error      t          Theil SE        Theil t
 1 CONSTANT                             106.7894    11.14355        9.583069   10.27384        10.39430
 2 D                                   0.3616812    0.5640608E-01   6.412096   0.5200383E-01   6.954895
Endogenous Variables (Jointly Dependent)
 3 P                                  -0.4115989    0.1448445      -2.841660   0.1335401      -3.082213

Residual Variance for Structural Disturbances 3.967444759365655
Ratio of Norm Residual to Norm LHS 1.818845186115604E-02
Modified test case from kmenta 1971 pp 565-582
Two Stage Least Squares Solution for System Number 2   supply eq.
Condition Number of Matrix is greater than 22.96654225297699
Relative Numerical Error in the Solution 2.323008755765498E-11
LHS Endogenous Variable No. 2   Q

Exogenous Variables (Predetermined)   Coefficient   Std. Error      t          Theil SE        Theil t
 1 CONSTANT                             35.90387    18.86754        1.902944   17.39501        2.064032
 2 F                                   0.2373297    0.6019217E-01   3.942866   0.5549444E-01   4.276639
Endogenous Variables (Jointly Dependent)
 3 P                                   0.4205434    0.1660421       2.532751   0.1530833       2.747154

Residual Variance for Structural Disturbances 10.49268888645498
Ratio of Norm Residual to Norm LHS 2.957901371051407E-02
Modified test case from kmenta 1971 pp 565-582
Coefficients of the Reduced Form Equations. Two Stage Least Squares Solution.
Condition number of matrix used to find the reduced form coefficients is no smaller than 2.403435013906650

         CONSTANT      D         F
1  P       85.18     0.4346   -0.2852
2  Q       71.73     0.1828    0.1174
Modified test case from kmenta 1971 pp 565-582
Three Stage Least Squares Solution for System Number 1   demand eq.
LHS Endogenous Variable No. 2   Q

Exogenous Variables (Predetermined)   Coefficient   Std. Error      t          Theil SE        Theil t
 1 CONSTANT                             106.7894    11.14355        9.583069   10.27384        10.39430
 2 D                                   0.3616812    0.5640608E-01   6.412096   0.5200383E-01   6.954895
Endogenous Variables (Jointly Dependent)
 3 P                                  -0.4115989    0.1448445      -2.841660   0.1335401      -3.082213

Residual Variance (For Structural Disturbances) 3.372328

Three Stage Least Squares Solution for System Number 2   supply eq.
LHS Endogenous Variable No. 2   Q

Exogenous Variables (Predetermined)   Coefficient   Std. Error      t          Theil SE        Theil t
 1 CONSTANT                             35.90387    18.86754        1.902944   17.39501        2.064032
 2 F                                   0.2373297    0.6019217E-01   3.942866   0.5549444E-01   4.276639
Endogenous Variables (Jointly Dependent)
 3 P                                   0.4205434    0.1660421       2.532751   0.1530833       2.747154

Residual Variance (For Structural Disturbances) 8.918786
Coefficients of the Reduced Form Equations. Three Stage Least Squares Solution using Orthogonal Factorization.
Condition number of matrix used to find the reduced form coefficients is no smaller than 2.403435013906646

         CONSTANT      D         F
1  P       85.18     0.4346   -0.2852
2  Q       71.73     0.1828    0.1174
Note that the following OLS regressions successfully replicate the constrained reduced form
values calculated by LIML, 2SLS and 3SLS models. In such exactly identified models it is
possible to proceed from the reduced form to the coefficients of the estimated simultaneous
structural model as shown in Table 4.1 for the theoretical model.
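That indirect least squares step can be sketched numerically in Python, using the reduced-form OLS coefficients reported in the output below. Since F enters the Q equation only through P, the demand slope on P is the ratio of the F coefficients; likewise the supply slope is the ratio of the D coefficients, and the remaining structural coefficients follow by substitution:

```python
# Reduced-form OLS coefficients (exactly identified Kmenta model):
#   P = 85.184338 + 0.43463860 D - 0.28520325 F
#   Q = 71.727578 + 0.18278440 D + 0.11738935 F
p0, p_d, p_f = 85.184338, 0.43463860, -0.28520325
q0, q_d, q_f = 71.727578, 0.18278440, 0.11738935

# Demand: Q = c0 + beta*P + c_d*D  (F excluded)
beta_demand = q_f / p_f
c_d = q_d - beta_demand * p_d
c0 = q0 - beta_demand * p0

# Supply: Q = b0 + gamma*P + b_f*F  (D excluded)
gamma_supply = q_d / p_d
b_f = q_f - gamma_supply * p_f
b0 = q0 - gamma_supply * p0

print(round(beta_demand, 4), round(c_d, 4), round(c0, 2))
print(round(gamma_supply, 4), round(b_f, 4), round(b0, 2))
```

The recovered values match the LIML/2SLS/3SLS structural estimates shown earlier (for example, the demand slope on P of about -0.4116 and the supply slope of about 0.4205).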
B34S(r) Matrix Command. d/m/y 13/ 5/08. h:m:s 8: 9:49.

=> CALL LOADDATA$
=> CALL OLSQ(Q D F :PRINT)$

Ordinary Least Squares Estimation
Dependent variable Q
Centered R**2 0.7142164973143195
Adjusted R**2 0.6805949087630629
Residual Sum of Squares 76.62264354549249
Residual Variance 4.507214326205441
Standard Error 2.123020095572682
Total Sum of Squares 268.1142991999998
Log Likelihood -41.81037433562074
Mean of the Dependent Variable 100.8982000000000
Std. Error of Dependent Variable 3.756498223780113
Sum Absolute Residuals 32.24420684107891
F( 2, 17) 21.24279452844673
F Significance 0.9999762143066244
1/Condition XPX 5.775396842473943E-07
Maximum Absolute Residual 4.421086526017319
Number of Observations 20

Variable   Lag   Coefficient    SE                t
D           0    0.18278440     0.47299583E-01    3.8643977
F           0    0.11738935     0.44030665E-01    2.6660816
CONSTANT    0    71.727578      4.5035392        15.926935

=> CALL OLSQ(P D F :PRINT)$

Ordinary Least Squares Estimation
Dependent variable P
Centered R**2 0.6043888119424351
Adjusted R**2 0.5578463192297805
Residual Sum of Squares 263.9721582328006
Residual Variance 15.52777401369415
Standard Error 3.940529661567611
Total Sum of Squares 667.2514989500000
Log Likelihood -54.17988429200851
Mean of the Dependent Variable 100.0190500000000
Std. Error of Dependent Variable 5.926086393627488
Sum Absolute Residuals 56.50496104816216
F( 2, 17) 12.98574220495295
F Significance 0.9996226165906434
1/Condition XPX 5.775396842473943E-07
Maximum Absolute Residual 9.070540097816391
Number of Observations 20

Variable   Lag   Coefficient    SE                t
D           0    0.43463860     0.87792579E-01    4.9507442
F           0   -0.28520325     0.81725152E-01   -3.4897855
CONSTANT    0    85.184338      8.3590023        10.190730
4.5 Analysis of OLS, 2SLS and 3SLS using Matrix Command
The matrix command, documented in Chapter 16, provides a means by which to
illustrate the estimation of OLS, 2SLS and 3SLS models using “classic textbook” formulas.
Table 4.7 shows code that implements OLS, 2SLS, 3SLS and FIML estimation using these
formulas:
Table 4.7 Matrix Command Implementation of OLS, 2SLS, 3SLS and FIML
/$
/$ Estimates Kmenta Problem with Matrix command.
/$ Purpose is to illustrate OLS/2SLS/3SLS/FIML both with
/$ SIMEQ and with Matrix Commands.
/$
/$ FIML SE same as 3SLS asymptotically (See Greene 5e page 408)
/$
/$ Problem Discussed in "Specifying and Diagnostically Testing
/$ Econometric Models" Chapter 4 Third Edition
/$
%b34slet verbose=0;   /$ set =1 to "test" matrix setup. Usually set=0
%b34slet dosimeq=1;   /$ set =1 to run the SIMEQ command as well as matrix
B34SEXEC DATA NOHEAD CORR$
INPUT Q P D F A $
LABEL Q = 'Food consumption per head'$
LABEL P = 'Ratio of Food Prices to consumer prices'$
LABEL D = 'Disposable Income in constant prices'$
LABEL F = 'Ratio of T-1 years price to general P'$
LABEL A = 'Time'$
COMMENT=('KMENTA(1971) PAGE 565 ANSWERS PAGE 582')$
DATACARDS$
 98.485 100.323  87.4  98.0  1   99.187 104.264  97.6  99.1  2
102.163 103.435  96.7  99.1  3  101.504 104.506  98.2  98.1  4
104.240  98.001  99.8 110.8  5  103.243  99.456 100.5 108.2  6
103.993 101.066 103.2 105.6  7   99.900 104.763 107.8 109.8  8
100.350  96.446  96.6 108.7  9  102.820  91.228  88.9 100.6 10
 95.435  93.085  75.1  81.0 11   92.424  98.801  76.9  68.6 12
 94.535 102.908  84.6  70.9 13   98.757  98.756  90.6  81.4 14
105.797  95.119 103.1 102.3 15  100.225  98.451 105.1 105.0 16
103.522  86.498  96.4 110.5 17   99.929 104.016 104.4  92.5 18
105.223 105.769 110.7  89.3 19  106.232 113.490 127.1  93.0 20
B34SRETURN$
B34SEEND$
%b34sif(&dosimeq.eq.1)%then;
B34SEXEC SIMEQ PRINTSYS REDUCED OLS LIML LS2 LS3 FIML FIMLC
KCOV=DIAG IPR=6$
HEADING=('Test Case from Kmenta (1971) Pages 565 - 582 ' ) $
EXOGENOUS CONSTANT D F A $
ENDOGENOUS P Q $
MODEL LVAR=Q RVAR=(CONSTANT P D)
NAME=('Demand Equation')$
MODEL LVAR=Q RVAR=(CONSTANT P F A) NAME=('Supply Equation')$
B34SEEND$
%b34sendif;
b34sexec matrix;
call loaddata;
verbose=0;
%b34sif(&verbose.ne.0)%then;
verbose=1;
%b34sendif;
x_1=mfam(catcol(constant p d));
x_2=mfam(catcol(constant p f a));
x_1px_1=transpose(x_1)*x_1;
x_2px_2=transpose(x_2)*x_2;
x_1py_1=transpose(x_1)*vfam(q);
x_2py_2=transpose(x_2)*vfam(q);
d1=inv(x_1px_1)*x_1py_1;
d2=inv(x_2px_2)*x_2py_2;
call print('OLS eq 1 ',d1 );
call print('OLS eq 2 ',d2 );
* 2SLS ;
* z_i is right hand side of equation i ;
x       = mfam(catcol(constant d f a));
xpx     = transpose(x)*x;
z_1     = mfam(catcol(constant p d) );
z_2     = mfam(catcol(constant p f a));
xpz_1   = transpose(x)*z_1;
xpz_2   = transpose(x)*z_2;
xpy_1   = transpose(x)*vfam(q);
xpy_2   = transpose(x)*vfam(q);
y_1py_1 = vfam(q)*vfam(q);
y_2py_2 = vfam(q)*vfam(q);
y_1py_2 = vfam(q)*vfam(q);
ls2eq1=inv(transpose(xpz_1)*inv(xpx)*xpz_1)*
(transpose(xpz_1)*inv(xpx)*xpy_1);
call print('Two stage estimates Equation 1',ls2eq1);
fit1=vfam(q)-z_1*ls2eq1;
sigma11=(y_1py_1 - (2.*vfam(q)*z_1*ls2eq1) +
ls2eq1*transpose(z_1)*z_1*ls2eq1)/17.;
if(verbose.ne.0)then;
call print('sigma11 ',sigma11:);
call print('Residual Variance 1',sigma11*sigma11:);
call print('Test 1 ',(fit1*fit1)/ 17.:);
call print('Large sample ',(fit1*fit1)/ 20.:);
endif;
varcoef1=sigma11*inv(transpose(z_1)*x*inv(xpx)*transpose(x)*z_1);
call print('Asymptotic Covariance Matrix eq 1 ',varcoef1);
ls2eq2=inv(transpose(xpz_2)*inv(xpx)*xpz_2)*
(transpose(xpz_2)*inv(xpx)*xpy_2);
call print('Two stage estimates Equation 2',ls2eq2);
fit2=vfam(q)-z_2*ls2eq2;
sigma22=(y_2py_2 - (2.*vfam(q)*z_2*ls2eq2) +
ls2eq2*transpose(z_2)*z_2*ls2eq2)/16.;
if(verbose.ne.0)then;
call print('sigma22 ',sigma22:);
call print('Residual Variance 2',sigma22*sigma22:);
call print('Test 2 ',(fit2*fit2)/ 16.:);
call print('Large Sample ',(fit2*fit2)/ 20.:);
endif;
sigma12=(y_1py_2 - (vfam(q)*z_1*ls2eq1) - (vfam(q)*z_2*ls2eq2) +
ls2eq1*transpose(z_1)*z_2*ls2eq2)/20.;
if(verbose.ne.0)call print('test sigma12 ',sigma12);
varcoef2=sigma22*inv(transpose(z_2)*x*inv(xpx)*transpose(x)*z_2);
call print('Asymptotic Covariance Matrix eq 2 ',varcoef2);
* Get sigma(i,j) from fits ;
s=mfam(catcol(fit1,fit2));
sigma=(transpose(s)*s)/20.;
call print('Large Sample sigma (Jennings) ',sigma);
covar1=sigma(1,1)*inv(transpose(xpz_1)*inv(xpx)*xpz_1);
covar2=sigma(2,2)*inv(transpose(xpz_2)*inv(xpx)*xpz_2);
call print('Estimated Covariance Matrix - Large Sample':);
call print(covar1,covar2);
ls2se=dsqrt(array(:covar1(1,1),covar1(2,2),covar1(3,3),
covar2(1,1),covar2(2,2),covar2(3,3),covar2(4,4)));
call print('SE of LS2 Model Equations - Large Sample',ls2se);
sssigma(1,1)=sigma(1,1)*(20./17.);
sssigma(1,2)=sigma(1,2)*(20./dsqrt(17.*16.));
sssigma(2,1)=sigma(2,1)*(20./dsqrt(17.*16.));
sssigma(2,2)=sigma(2,2)*(20./16.);
call print('Kmenta (Small Sample Sigma ',sssigma);
covar1=sssigma(1,1)*inv(transpose(xpz_1)*inv(xpx)*xpz_1);
covar2=sssigma(2,2)*inv(transpose(xpz_2)*inv(xpx)*xpz_2);
call print('Estimated Covariance Matrix - Small Sample':);
call print(covar1,covar2);
ls2se=dsqrt(array(:diag(covar1),diag(covar2)));
call print('SE of LS2 Model Equations - Small Sample',ls2se);
* LS3 calculation ;
xpxinv=inv(xpx);
/$ sigma=inv(sssigma);
sigma=inv(sigma);
term11= sigma(1,1)*(transpose(xpz_1)*xpxinv*xpz_1);
term12= sigma(1,2)*(transpose(xpz_1)*xpxinv*xpz_2);
term21= sigma(2,1)*(transpose(xpz_2)*xpxinv*xpz_1);
term22= sigma(2,2)*(transpose(xpz_2)*xpxinv*xpz_2);
left1 =catcol(term11 term12);
left2 =catcol(term21 term22);
left =catrow(left1 left2);
if(verbose.ne.0)
call print(term11 term12 term21 term22 left1 left2 left);
right1=(sigma(1,1)*(transpose(xpz_1)*xpxinv*xpy_1)) +
(sigma(1,2)*(transpose(xpz_1)*xpxinv*xpy_2));
right2=(sigma(2,1)*(transpose(xpz_2)*xpxinv*xpy_1)) +
(sigma(2,2)*(transpose(xpz_2)*xpxinv*xpy_2));
right=catrow(right1 right2);
call print(right1 right2 right,inv(left));
ls3=inv(left)*right;
call print('Three Stage Least Squares ',ls3);
ls3se = dsqrt(diag(inv(left)));
t3sls=array(norows(ls3):ls3(,1))/afam(ls3se);
call print('Three Stage Least Squares SE',ls3se);
call print('Three Stage Least Squares t ',t3sls);
* FIML following Kmenta (1971) pages 578 - 581 ;
* q = f(constant P D )                          ;
* q = g(constant p F A)                         ;
* q = a1 + a2*p + a3*d         + u1             ;
* q = b1 + b2*p + b3*f + b4*a  + u2             ;
y = transpose(mfam(catcol(q p)));
x = transpose(mfam(catcol(constant d f a)));
gt= 2.* dfloat(norows(y));
t =dfloat(norows(y));
call print('Using 3sls starting values ',ls3);
/$
/$
/$
/$
/$
/$
/$
a1=sfam(ls3(1));
a2=sfam(ls3(2));
a3=sfam(ls3(3));
b1=sfam(ls3(4));
b2=sfam(ls3(5));
b3=sfam(ls3(6));
b4=sfam(ls3(7));
program model;
bigb     = matrix(2,2:  1.0, -1.0*a2,
                        1.0, -1.0*b2);
biggamma = matrix(2,4: -1.0*a1, -1.0*a3,  0.0,     0.0,
                       -1.0*b1,  0.0,    -1.0*b3, -1.0*b4);
u1u2 = bigb*y + biggamma*x;
phi  = u1u2*transpose(u1u2);
/$ General purpose FIML setup if there are no identities
/$ For a discussion of Formulas see Kmenta (1971) page 578-581
func=(-1.0*(gt*pi())/2.0)
- ((t/2.0)*dlog(dmax1(dabs(det(phi)) ,.1d-30) ))
+ ( t
*dlog(dmax1(dabs(det(bigb)),.1d-30) ))
- (.5*sum(transpose(u1u2)*inv(phi)*u1u2));
call outstring(3, 3,'Function');
call outdouble(36,3,func);
call outdouble(4, 4, a1);
call outdouble(36,4, a2);
call outdouble(55,4, a3);
call outdouble(4 ,5, b1);
call outdouble(36,5, b2);
call outdouble(55,5, b3);
call outdouble(4, 6, b4);
return;
end;
call print(model);
rvec =vector(7:ls3);
ll   =vector(7:) -1.d+2;
uu   =vector(7:) +1.d+3;
call echooff;
call cmaxf2(func :name model
     :parms a1 a2 a3 b1 b2 b3 b4
     :ivalue rvec
     :lower ll
     :upper UU
     :maxit 10000
     :maxfun 10000
     :maxg 10000
     :print);
b34srun;
The matrices X_1 and X_2 are built with the catcol command, and the OLS estimates for
equations 1 and 2 are respectively D1 and D2. Edited results show:
OLS eq 1
D1      = Vector of 3 elements
  99.8954      -0.316299      0.334636

=> CALL PRINT('OLS eq 2 ',D2 )$

OLS eq 2
D2      = Vector of 4 elements
  58.2754       0.160367      0.248133      0.248302
which are consistent with what was obtained with the simeq command. Next, using the
"textbook" 2SLS formula

delta_hat_1 = [Z_1'X(X'X)^-1 X'Z_1]^-1 [Z_1'X(X'X)^-1 X'y_1]
delta_hat_2 = [Z_2'X(X'X)^-1 X'Z_2]^-1 [Z_2'X(X'X)^-1 X'y_2]          (4.5-1)
sigma_hat_ij = [e_hat_1, e_hat_2]'[e_hat_1, e_hat_2] / T

we obtain the 2SLS estimates and the error covariance matrix sigma_hat_ij, which is needed for
the 3SLS estimates. Edited results match what was found earlier with simeq. Note that call
echooff; has been turned off to illustrate the steps of the calculation.
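As a cross-check on the 2SLS formula, the demand-equation estimates can be reproduced in pure Python via the equivalent two-stage route: regress P on all exogenous variables, then regress Q on the constant, fitted P and D. This is a sketch using the Kmenta observations listed in the data step of Table 4.7; the `ols` helper is a minimal normal-equations solver written for this illustration:

```python
# Kmenta data: q, p, d, f, a (20 annual observations)
data = [
    (98.485, 100.323, 87.4, 98.0, 1), (99.187, 104.264, 97.6, 99.1, 2),
    (102.163, 103.435, 96.7, 99.1, 3), (101.504, 104.506, 98.2, 98.1, 4),
    (104.240, 98.001, 99.8, 110.8, 5), (103.243, 99.456, 100.5, 108.2, 6),
    (103.993, 101.066, 103.2, 105.6, 7), (99.900, 104.763, 107.8, 109.8, 8),
    (100.350, 96.446, 96.6, 108.7, 9), (102.820, 91.228, 88.9, 100.6, 10),
    (95.435, 93.085, 75.1, 81.0, 11), (92.424, 98.801, 76.9, 68.6, 12),
    (94.535, 102.908, 84.6, 70.9, 13), (98.757, 98.756, 90.6, 81.4, 14),
    (105.797, 95.119, 103.1, 102.3, 15), (100.225, 98.451, 105.1, 105.0, 16),
    (103.522, 86.498, 96.4, 110.5, 17), (99.929, 104.016, 104.4, 92.5, 18),
    (105.223, 105.769, 110.7, 89.3, 19), (106.232, 113.490, 127.1, 93.0, 20),
]
q = [r[0] for r in data]; p = [r[1] for r in data]
d = [r[2] for r in data]; f = [r[3] for r in data]
a = [float(r[4]) for r in data]

def ols(y, cols):
    """OLS coefficients via the normal equations (Gaussian elimination)."""
    k = len(cols)
    A = [[sum(ci * cj for ci, cj in zip(cols[i], cols[j])) for j in range(k)]
         for i in range(k)]
    b = [sum(ci * yi for ci, yi in zip(cols[i], y)) for i in range(k)]
    for i in range(k):                      # forward elimination with pivoting
        piv = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]; b[i], b[piv] = b[piv], b[i]
        for r in range(i + 1, k):
            m = A[r][i] / A[i][i]
            A[r] = [arj - m * aij for arj, aij in zip(A[r], A[i])]
            b[r] -= m * b[i]
    x = [0.0] * k
    for i in reversed(range(k)):            # back substitution
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, k))) / A[i][i]
    return x

one = [1.0] * 20
# First stage: fitted P from all exogenous variables.
g = ols(p, [one, d, f, a])
phat = [g[0] + g[1]*di + g[2]*fi + g[3]*ai for di, fi, ai in zip(d, f, a)]
# Second stage: demand equation Q on constant, fitted P, D.
beta = ols(q, [one, phat, d])
print([round(v, 4) for v in beta])   # roughly [94.6333, -0.2436, 0.3140]
```

The two-stage route gives the same coefficients as the IV formula in (4.5-1), matching the 94.6333, -0.2436, 0.3140 demand estimates reported throughout this chapter.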
Two stage estimates Equation 1
LS2EQ1  = Vector of 3 elements
  94.6333      -0.243557      0.313992

=> FIT1=VFAM(Q)-Z_1*LS2EQ1$
=> SIGMA11=(Y_1PY_1 - (2.*VFAM(Q)*Z_1*LS2EQ1) +
   LS2EQ1*TRANSPOSE(Z_1)*Z_1*LS2EQ1)/17.$
=> IF(VERBOSE.NE.0)THEN$
=> CALL PRINT('sigma11 ',SIGMA11:)$
=> CALL PRINT('Residual Variance 1',SIGMA11*SIGMA11:)$
=> CALL PRINT('Test 1 ',(FIT1*FIT1)/ 17.:)$
=> CALL PRINT('Large sample ',(FIT1*FIT1)/ 20.:)$
=> ENDIF$
=> VARCOEF1=SIGMA11*INV(TRANSPOSE(Z_1)*X*INV(XPX)*TRANSPOSE(X)*Z_1)$
=> CALL PRINT('Asymptotic Covariance Matrix eq 1 ',VARCOEF1)$

Asymptotic Covariance Matrix eq 1
VARCOEF1= Matrix of 3 by 3 elements
            1              2              3
1   62.7397       -0.673422       0.493016E-01
2  -0.673422       0.930922E-02  -0.264190E-02
3   0.493016E-01  -0.264190E-02   0.220371E-02
=> LS2EQ2=INV(TRANSPOSE(XPZ_2)*INV(XPX)*XPZ_2)*
   (TRANSPOSE(XPZ_2)*INV(XPX)*XPY_2)$
=> CALL PRINT('Two stage estimates Equation 2',LS2EQ2)$

Two stage estimates Equation 2
LS2EQ2  = Vector of 4 elements
  49.5324       0.240076      0.255606      0.252924
=> FIT2=VFAM(Q)-Z_2*LS2EQ2$
=> SIGMA22=(Y_2PY_2 - (2.*VFAM(Q)*Z_2*LS2EQ2) +
   LS2EQ2*TRANSPOSE(Z_2)*Z_2*LS2EQ2)/16.$
=> IF(VERBOSE.NE.0)THEN$
=> CALL PRINT('sigma22 ',SIGMA22:)$
=> CALL PRINT('Residual Variance 2',SIGMA22*SIGMA22:)$
=> CALL PRINT('Test 2 ',(FIT2*FIT2)/ 16.:)$
=> CALL PRINT('Large Sample ',(FIT2*FIT2)/ 20.:)$
=> ENDIF$
=> SIGMA12=(Y_1PY_2 - (VFAM(Q)*Z_1*LS2EQ1) - (VFAM(Q)*Z_2*LS2EQ2) +
   LS2EQ1*TRANSPOSE(Z_1)*Z_2*LS2EQ2)/20.$
=> IF(VERBOSE.NE.0)CALL PRINT('test sigma12 ',SIGMA12)$
=> VARCOEF2=SIGMA22*INV(TRANSPOSE(Z_2)*X*INV(XPX)*TRANSPOSE(X)*Z_2)$
=> CALL PRINT('Asymptotic Covariance Matrix eq 2 ',VARCOEF2)$

Asymptotic Covariance Matrix eq 2
VARCOEF2= Matrix of 4 by 4 elements
            1              2              3              4
1   144.253       -1.09541       -0.323818      -0.295229
2  -1.09541        0.998677E-02   0.936222E-03   0.579069E-03
3  -0.323818       0.936222E-03   0.223257E-02   0.137681E-02
4  -0.295229       0.579069E-03   0.137681E-02   0.993114E-02
=>
* GET SIGMA(I,J) FROM FITS $
=>
S=MFAM(CATCOL(FIT1,FIT2))$
=>
SIGMA=(TRANSPOSE(S)*S)/20.$
=>
CALL PRINT('Large Sample sigma (Jennings) ',SIGMA)$
Large Sample sigma (Jennings)
SIGMA
1
2
= Matrix of
1
3.28645
3.59324
2
by
2
elements
2
3.59324
4.83166
=> COVAR1=SIGMA(1,1)*INV(TRANSPOSE(XPZ_1)*INV(XPX)*XPZ_1)$
=> COVAR2=SIGMA(2,2)*INV(TRANSPOSE(XPZ_2)*INV(XPX)*XPZ_2)$
=> CALL PRINT('Estimated Covariance Matrix - Large Sample':)$

Estimated Covariance Matrix - Large Sample

=> CALL PRINT(COVAR1,COVAR2)$

COVAR1  = Matrix of 3 by 3 elements

                1              2              3
1         53.3287       -0.572408       0.419064E-01
2         -0.572408      0.791284E-02  -0.224561E-02
3          0.419064E-01 -0.224561E-02   0.187315E-02

COVAR2  = Matrix of 4 by 4 elements

                1              2              3              4
1        115.402        -0.876328      -0.259055      -0.236183
2         -0.876328      0.798942E-02   0.748977E-03   0.463256E-03
3         -0.259055      0.748977E-03   0.178606E-02   0.110144E-02
4         -0.236183      0.463256E-03   0.110144E-02   0.794491E-02

=> LS2SE=DSQRT(ARRAY(:DIAG(COVAR1),DIAG(COVAR2)))$
=> CALL PRINT('SE of LS2 Model Equations - Large Sample',LS2SE)$

SE of LS2 Model Equations - Large Sample

LS2SE   = Array of 7 elements

     7.30265       0.889541E-01   0.432799E-01  10.7425
     0.893836E-01  0.422617E-01   0.891342E-01
=> SSSIGMA(1,1)=SIGMA(1,1)*(20./17.)$
=> SSSIGMA(1,2)=SIGMA(1,2)*(20./DSQRT(17.*16.))$
=> SSSIGMA(2,1)=SIGMA(2,1)*(20./DSQRT(17.*16.))$
=> SSSIGMA(2,2)=SIGMA(2,2)*(20./16.)$
=> CALL PRINT('Kmenta (Small Sample Sigma ',SSSIGMA)$

Kmenta (Small Sample Sigma

SSSIGMA = Matrix of 2 by 2 elements

               1             2
1        3.86642       4.35744
2        4.35744       6.03958

=> COVAR1=SSSIGMA(1,1)*INV(TRANSPOSE(XPZ_1)*INV(XPX)*XPZ_1)$
=> COVAR2=SSSIGMA(2,2)*INV(TRANSPOSE(XPZ_2)*INV(XPX)*XPZ_2)$
=> CALL PRINT('Estimated Covariance Matrix - Small Sample':)$

Estimated Covariance Matrix - Small Sample

=> CALL PRINT(COVAR1,COVAR2)$

COVAR1  = Matrix of 3 by 3 elements

                1              2              3
1         62.7397       -0.673422       0.493016E-01
2         -0.673422      0.930922E-02  -0.264190E-02
3          0.493016E-01 -0.264190E-02   0.220371E-02

COVAR2  = Matrix of 4 by 4 elements

                1              2              3              4
1        144.253        -1.09541       -0.323818      -0.295229
2         -1.09541       0.998677E-02   0.936222E-03   0.579069E-03
3         -0.323818      0.936222E-03   0.223257E-02   0.137681E-02
4         -0.295229      0.579069E-03   0.137681E-02   0.993114E-02

=> LS2SE=DSQRT(ARRAY(:COVAR1(1,1),COVAR1(2,2),COVAR1(3,3)
   COVAR2(1,1),COVAR2(2,2),COVAR2(3,3) COVAR2(4,4)))$
=> CALL PRINT('SE of LS2 Model Equations - Small Sample',LS2SE)$

SE of LS2 Model Equations - Small Sample

LS2SE   = Array of 7 elements

     7.92084       0.964843E-01   0.469437E-01  12.0105
     0.999339E-01  0.472501E-01   0.996551E-01
Note that the estimated asymptotic covariance matrix for each equation was calculated as

\hat{\sigma}_{11}[Z_1'X(X'X)^{-1}X'Z_1]^{-1}
\hat{\sigma}_{22}[Z_2'X(X'X)^{-1}X'Z_2]^{-1}            (4.5-2)

The SE for each coefficient is the square root of the corresponding diagonal element of the estimated covariance matrix. The 3SLS model is estimated using the "textbook" equation as
\begin{bmatrix} \hat{\delta}_1 \\ \hat{\delta}_2 \end{bmatrix} =
\begin{bmatrix}
\hat{\sigma}^{1,1} Z_1'X(X'X)^{-1}X'Z_1 & \hat{\sigma}^{1,2} Z_1'X(X'X)^{-1}X'Z_2 \\
\hat{\sigma}^{2,1} Z_2'X(X'X)^{-1}X'Z_1 & \hat{\sigma}^{2,2} Z_2'X(X'X)^{-1}X'Z_2
\end{bmatrix}^{-1}
\begin{bmatrix}
\hat{\sigma}^{1,1} Z_1'X(X'X)^{-1}X'y_1 + \hat{\sigma}^{1,2} Z_1'X(X'X)^{-1}X'y_2 \\
\hat{\sigma}^{2,1} Z_2'X(X'X)^{-1}X'y_1 + \hat{\sigma}^{2,2} Z_2'X(X'X)^{-1}X'y_2
\end{bmatrix}            (4.5-3)

where \hat{\sigma}^{i,j} is the (i,j) element of [\hat{\sigma}_{ij}]^{-1}. Equation (4.5-3) comes directly from Kmenta (1971, 577) and is consistent with Theil (1971, 510). The edited output verifies the simeq 3SLS command. In the matrix program each term in (4.5-3) is output and put together into the left and right parts of (4.5-3), which at first looks formidable.
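The block assembly in (4.5-3) can be sketched compactly outside B34S as well. The numpy fragment below is an illustration (all names are invented here, not from the matrix program); it checks itself with a known special case: if \hat{\sigma} is diagonal the system decouples and 3SLS must reproduce equation-by-equation 2SLS.

```python
import numpy as np

def ls3(ys, Zs, X, sigma):
    """Stacked 3SLS of (4.5-3). ys/Zs are lists of per-equation y vectors
    and RHS matrices, X is the full predetermined set, sigma the 2SLS
    residual covariance; si[i,j] plays the role of sigma^{i,j}."""
    W = X @ np.linalg.inv(X.T @ X) @ X.T          # projection on X
    si = np.linalg.inv(sigma)
    m = len(ys)
    left = np.block([[si[i, j] * Zs[i].T @ W @ Zs[j] for j in range(m)]
                     for i in range(m)])
    right = np.concatenate([sum(si[i, j] * Zs[i].T @ W @ ys[j]
                                for j in range(m)) for i in range(m)])
    return np.linalg.solve(left, right)

# With a diagonal sigma the off-diagonal blocks vanish, so 3SLS must
# equal equation-by-equation 2SLS.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 3))])
W = X @ np.linalg.inv(X.T @ X) @ X.T
Z1, Z2 = X[:, :2], X[:, :3]
y1, y2 = rng.normal(size=30), rng.normal(size=30)
d3 = ls3([y1, y2], [Z1, Z2], X, np.eye(2))
d1 = np.linalg.solve(Z1.T @ W @ Z1, Z1.T @ W @ y1)
d2 = np.linalg.solve(Z2.T @ W @ Z2, Z2.T @ W @ y2)
print(np.allclose(d3, np.concatenate([d1, d2])))  # True
```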
=> * LS3 CALCULATION $
=> XPXINV=INV(XPX)$
=> SIGMA=INV(SIGMA)$
=> TERM11= SIGMA(1,1)*(TRANSPOSE(XPZ_1)*XPXINV*XPZ_1)$
=> TERM12= SIGMA(1,2)*(TRANSPOSE(XPZ_1)*XPXINV*XPZ_2)$
=> TERM21= SIGMA(2,1)*(TRANSPOSE(XPZ_2)*XPXINV*XPZ_1)$
=> TERM22= SIGMA(2,2)*(TRANSPOSE(XPZ_2)*XPXINV*XPZ_2)$
=> LEFT1 =CATCOL(TERM11 TERM12)$
=> LEFT2 =CATCOL(TERM21 TERM22)$
=> LEFT  =CATROW(LEFT1 LEFT2)$
=> IF(VERBOSE.NE.0)THEN$
=> CALL PRINT(TERM11 TERM12 TERM21 TERM22 LEFT1 LEFT2 LEFT)$
=> ENDIF$
=> RIGHT1=(SIGMA(1,1)*(TRANSPOSE(XPZ_1)*XPXINV*XPY_1)) +
   (SIGMA(1,2)*(TRANSPOSE(XPZ_1)*XPXINV*XPY_2))$
=> RIGHT2=(SIGMA(2,1)*(TRANSPOSE(XPZ_2)*XPXINV*XPY_1)) +
   (SIGMA(2,2)*(TRANSPOSE(XPZ_2)*XPXINV*XPY_2))$
=> RIGHT=CATROW(RIGHT1 RIGHT2)$
=> CALL PRINT(RIGHT1 RIGHT2 RIGHT,INV(LEFT))$

RIGHT1  = Vector of 3 elements

     842.104       84261.3       82406.3

RIGHT2  = Vector of 4 elements

    -208.606      -20873.2      -20220.4      -2196.91

RIGHT   = Matrix of 7 by 1 elements

     842.104       84261.3       82406.3      -208.606      -20873.2
   -20220.4       -2196.91

INV(LEFT) = Matrix of 7 by 7 elements

Columns 1 - 4
1         53.3287       -0.572408       0.419064E-01  52.0707
2         -0.572408      0.791284E-02  -0.224561E-02 -0.291667
3          0.419064E-01 -0.224561E-02   0.187315E-02 -0.232929
4         52.0707       -0.291667      -0.232929     113.162
5         -0.556756      0.494945E-02   0.632767E-03 -0.866671
6          0.337445E-01 -0.180825E-02   0.150833E-02 -0.235979
7          0.509185E-01 -0.272854E-02   0.227598E-02 -0.327163

Columns 5 - 7
1         -0.556756      0.337445E-01   0.509185E-01
2          0.494945E-02 -0.180825E-02  -0.272854E-02
3          0.632767E-03  0.150833E-02   0.227598E-02
4         -0.866671     -0.235979      -0.327163
5          0.794779E-02  0.649506E-03   0.855426E-03
6          0.649506E-03  0.154836E-02   0.203856E-02
7          0.855426E-03  0.203856E-02   0.425029E-02

=> LS3=INV(LEFT)*RIGHT$
=> CALL PRINT('Three Stage Least Squares ',LS3)$

Three Stage Least Squares

LS3     = Matrix of 7 by 1 elements

     94.6333      -0.243557      0.313992      52.1176      0.228932
      0.228978     0.357907

=> LS3SE = DSQRT(DIAG(INV(LEFT)))$
=> CALL PRINT('Three Stage Least Squares SE',LS3SE)$

Three Stage Least Squares SE

LS3SE   = Vector of 7 elements

     7.30265      0.889541E-01   0.432799E-01  10.6378      0.891504E-01
     0.393493E-01  0.651943E-01
The estimated standard errors are those suggested by Theil. The FIML estimation method requires a maximization procedure. Kmenta (1971) shows that for a model without constraints FIML maximizes

L = -\frac{GT}{2}\log(2\pi) - \frac{T}{2}\log|\Sigma| + T\log|B| - \frac{1}{2}\sum_{t=1}^{T}(By_t + \Gamma x_t)'\Sigma^{-1}(By_t + \Gamma x_t)            (4.5-4)

where G = M, the number of equations in the model. The Kmenta test problem can be written

q = \alpha_1 + \alpha_2 P + \alpha_3 D + u_1            (Demand)
q = \beta_1 + \beta_2 P + \beta_3 F + \beta_4 A + u_2   (Supply)            (4.5-5)
For this problem

B = \begin{bmatrix} 1 & -\alpha_2 \\ 1 & -\beta_2 \end{bmatrix}, \quad
\Gamma = \begin{bmatrix} -\alpha_1 & -\alpha_3 & 0 & 0 \\ -\beta_1 & 0 & -\beta_3 & -\beta_4 \end{bmatrix}, \quad
\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}

and |B| and |\Sigma| refer to the Jacobian, or absolute value of the determinant, of B and \Sigma respectively. Using the matrix command it is fairly easy to implement this estimator. Problems can arise if there are local maxima in the problem. The edited FIML results are given next.
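To make the objective (4.5-4) concrete before looking at the output, a small numpy sketch of the unconcentrated log likelihood is shown below (the function name and the check are illustrations, not part of the B34S program). A known property supplies the test: at fixed B and \Gamma, L is maximized over \Sigma by \hat{\Sigma} = \sum_t u_t u_t'/T, so any other \Sigma must give a lower value.

```python
import numpy as np

def fiml_loglik(B, Gamma, Sigma, Y, X):
    """Unconcentrated FIML log likelihood of (4.5-4).
    Y is T x G endogenous data, X is T x K predetermined data, and the
    structural disturbances are u_t = B y_t + Gamma x_t."""
    T, G = Y.shape
    U = Y @ B.T + X @ Gamma.T
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.trace(U @ np.linalg.inv(Sigma) @ U.T)
    return (-G * T / 2.0 * np.log(2.0 * np.pi) - T / 2.0 * logdet
            + T * np.log(abs(np.linalg.det(B))) - 0.5 * quad)

# At fixed B and Gamma the likelihood is maximized over Sigma by U'U/T,
# so scaling that Sigma must lower L.
rng = np.random.default_rng(2)
T_, G_, K_ = 25, 2, 4
Y, X = rng.normal(size=(T_, G_)), rng.normal(size=(T_, K_))
B = np.eye(G_) + 0.1 * rng.normal(size=(G_, G_))
Gamma = rng.normal(size=(G_, K_))
U = Y @ B.T + X @ Gamma.T
S_hat = U.T @ U / T_
print(fiml_loglik(B, Gamma, S_hat, Y, X)
      > fiml_loglik(B, Gamma, 2.0 * S_hat, Y, X))  # True
```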
=> PROGRAM MODEL$
=> CALL PRINT(MODEL)$

MODEL   = Program

PROGRAM MODEL$
BIGB     = MATRIX(2,2: 1.0, -1.0*A2,
                       1.0, -1.0*B2)$
BIGGAMMA = MATRIX(2,4: -1.0*A1, -1.0*A3,     0.0,     0.0,
                       -1.0*B1,     0.0, -1.0*B3, -1.0*B4)$
U1U2=BIGB*Y+BIGGAMMA*X$
PHI = U1U2*TRANSPOSE(U1U2)$
FUNC=(-1.0*(GT*PI())/2.0)
    - ((T/2.0)*DLOG(DMAX1(DABS(DET(PHI)) ,.1D-30) ))
    + ( T     *DLOG(DMAX1(DABS(DET(BIGB)),.1D-30) ))
    - (.5*SUM(TRANSPOSE(U1U2)*INV(PHI)*U1U2))$
CALL OUTSTRING(3, 3,'Function')$
CALL OUTDOUBLE(36,3,FUNC)$
CALL OUTDOUBLE(4, 4, A1)$
CALL OUTDOUBLE(36,4, A2)$
CALL OUTDOUBLE(55,4, A3)$
CALL OUTDOUBLE(4 ,5, B1)$
CALL OUTDOUBLE(36,5, B2)$
CALL OUTDOUBLE(55,5, B3)$
CALL OUTDOUBLE(4, 6, B4)$
RETURN$
END$

=> RVEC =VECTOR(7:LS3)$
=> LL   =VECTOR(7:) -1.D+2$
=> UU   =VECTOR(7:) +1.D+3$
=> CALL ECHOOFF$
Constrained Maximum Likelihood Estimation using CMAXF2 Command

Final Functional Value        -13.37570521223952
# of parameters                7
# of good digits in function   15
# of iterations                28
# of function evaluations      55
# of gradient evaluations      30
Scaled Gradient Tolerance      6.055454452393343E-06
Scaled Step Tolerance          3.666852862501036E-11
Relative Function Tolerance    3.666852862501036E-11
False Convergence Tolerance    2.220446049250313E-14
Maximum allowable step size    108037.5007234256
Size of Initial Trust region  -1.000000000000000
1 / Cond. of Hessian Matrix    2.229180241990960E-09
#   Name    Coefficient     Standard Error     T Value
1   A1       93.619219       3.4191227          27.381064
2   A2      -0.22953804      0.60544227E-01     -3.7912458
3   A3       0.31001341      0.34296485E-01      9.0392183
4   B1       51.944511       7.3541629           7.0632799
5   B2       0.23730613      0.45456398E-01      5.2205221
6   B3       0.22081875      0.28752980E-01      7.6798560
7   B4       0.36970888      0.14370566E-01     25.726814
SE calculated as sqrt |diagonal(inv(%hessian))|

Hessian Matrix

Columns 1 - 4
1       230.516         23089.2         22524.9         -174.328
2     23086.3        0.231266E+07    0.225660E+07     -17458.5
3     22522.1        0.225634E+07    0.220289E+07     -17032.0
4      -174.305        -17456.3        -17029.9           135.877
5    -17457.4       -0.174875E+07   -0.170618E+07      13607.8
6    -16823.6       -0.168477E+07   -0.164463E+07      13115.4
7     -1834.45        -183704.        -179499.           1430.03

Columns 5 - 7
1    -17459.8          -16825.9         -1834.71
2    -0.174897E+07    -0.168498E+07   -183728.
3    -0.170639E+07    -0.164483E+07   -179522.
4     13609.6           13117.1          1430.22
5     0.136313E+07     0.131360E+07   143221.
6     0.131342E+07     0.126732E+07   137918.
7    143201.           137898.         15323.9
Gradient Vector
   -0.568518E-06  -0.557801E-04  -0.544320E-04   0.447704E-06
    0.438995E-04   0.419615E-04   0.528029E-05

Lower vector
   -100.000  -100.000  -100.000  -100.000  -100.000  -100.000  -100.000

Upper vector
    1000.00   1000.00   1000.00   1000.00   1000.00   1000.00   1000.00
B34S Matrix Command Ending. Last Command reached.

Space available in allocator     7873665, peak space used        8277
Number variables used                130, peak number used        135
Number temp variables used         36882, # user temp clean         0
and replicate the Kmenta (1971) test values for coefficients. The simeq FIML results are:
Test Case from Kmenta (1971) Pages 565 - 582

Functional Minimization Solution for System No.  1    Demand Equation
LHS Endogenous Variable No.  2   Q

Exogenous Variables (Predetermined)
               Coefficient     Std. Error      t           Theil SE        Theil t
1 CONSTANT      93.61922       6.152863        15.21555    5.672659        16.50359
2 D              0.3100134     0.3633922E-01    8.531097   0.3350311E-01    9.253274

Endogenous Variables (Jointly Dependent)
3 P             -0.2295381     0.7508118E-01   -3.057199   0.6922143E-01   -3.315998

Residual Variance (For Structural Disturbances)    3.337108

Functional Minimization 3SLS Covariance for System  1   Demand Equation
              CONSTANT       D              P
1 CONSTANT     37.86
2 D             0.3121E-01    0.1321E-02
3 P            -0.4078       -0.1600E-02    0.5637E-02

Functional Minimization Solution for System No.  2    Supply Equation
LHS Endogenous Variable No.  2   Q

Exogenous Variables (Predetermined)
               Coefficient     Std. Error      t           Theil SE        Theil t
1 CONSTANT      51.94451       9.739647        5.333305    8.711405        5.962816
2 F              0.2208188     0.3489965E-01   6.327249    0.3121520E-01   7.074080
3 A              0.3697089     0.5846143E-01   6.323981    0.5228949E-01   7.070425

Endogenous Variables (Jointly Dependent)
4 P              0.2373061     0.8237774E-01   2.880707    0.7368089E-01   3.220728

Residual Variance (For Structural Disturbances)    5.620947

Functional Minimization 3SLS Covariance for System  2   Supply Equation
              CONSTANT       F              A              P
1 CONSTANT     94.86
2 F            -0.1858        0.1218E-02
3 A            -0.3119        0.1943E-02     0.3418E-02
4 P            -0.7341        0.4772E-03     0.8825E-03     0.6786E-02

Test Case from Kmenta (1971) Pages 565 - 582

Contemporaneous Covariance of Residuals (Structural Disturbances)
For Functional Minimization 3SLSQ Solution.
Condition Number of residual columns,    6.942988

              Demand E      Supply E
Demand E  1    3.337
Supply E  2    4.255         5.621

Correlation Matrix of Residuals

              Demand E      Supply E
Demand E  1    1.000
Supply E  2    0.9824        1.000

Test Case from Kmenta (1971) Pages 565 - 582

Coefficients of the Reduced Form Equations.
Condition number of matrix used to find the reduced form coefficients is no smaller than 4.284084281338983

           CONSTANT      D             F             A
P   1       89.27        0.6641       -0.4730       -0.7919
Q   2       73.13        0.1576        0.1086        0.1818

Mean sum of squares of residuals for the reduced form equations.

         P               Q
    0.20588D+01     0.43479D+01
and give identical coefficients but different SE's due to the algorithm used. Greene (2003, page 408) notes that "asymptotically the covariance matrix for the FIML estimator is the same as that for the 3SLS estimator."
     The purpose of this exercise has been to illustrate how "textbook" formulas can be used with a programming language, such as the matrix command, to produce 2SLS, 3SLS and FIML estimates fairly easily, where the alternative would be to build a C or Fortran program to perform the calculation. Since "textbook" formulas are used for the matrix example, the accuracy of these calculations is inferior to the QR approach of Jennings (1980), which is the basis for the simeq command. Inspection of the matrix program that implements these estimators may give the reader confidence to tackle other calculations that have not been implemented in commercial software.11 The matrix examples shown have been coded for teaching purposes (clarity of the code), not research purposes. Many components of the calculation that appear in a number of places in a formula such as (4.5-3) have not been calculated once and saved.
4.6 LS2 and GMM Models and Specification tests

The Generalized Method of Moments (GMM) estimation technique is a generalization of 2SLS that allows for various assumptions on the error distribution. Assume there are l instruments in Z. The basic idea of GMM is to select coefficients \hat{\beta}_{GMM} such that

\bar{g}(\hat{\beta}_{GMM}) = 0            (4.6-1)

where

\bar{g}(\hat{\beta}) = \frac{1}{N}\sum_{i=1}^{N} g_i(\beta) = \frac{1}{N}\sum_{i=1}^{N} z_i'(y_i - x_i\beta) = \frac{1}{N} Z'u            (4.6-2)

It can be shown that the efficient GMM estimator is

\hat{\beta}_{EGMM} = (X'ZS^{-1}Z'X)^{-1}X'ZS^{-1}Z'y            (4.6-3)

where

S = E[Z'uu'Z] = E[Z'\Omega Z]            (4.6-4)

Using the 2SLS residuals, a heteroskedasticity-consistent estimator of S can be obtained as
\hat{S} = \frac{1}{N}\sum_{i=1}^{N} \hat{u}_i^2 z_i' z_i            (4.6-5)

which has been characterized as a standard sandwich approach to robust covariance estimation. For more details see Davidson and MacKinnon (1993, 607-610) and Baum (2006, 194-197).

11 The modern pace of research is so fast that if one waits until a new procedure is implemented in commercial software, often it is too late.
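A compact numpy sketch of the feasible two-step estimator (4.6-3) with the robust weight (4.6-5) follows; it is an illustration with invented names, not the gmmest routine of Table 4.8. In an exactly identified system the weight matrix drops out, GMM reduces to the simple IV estimator, and the overidentification statistic is zero, which provides the check.

```python
import numpy as np

def gmm_2step(y, X, Z):
    """Feasible efficient GMM: first step is 2SLS (weight inv(Z'Z)),
    second step uses S-hat of (4.6-5) built from the 2SLS residuals.
    Returns the estimate and the Hansen J statistic."""
    N = len(y)
    W1 = np.linalg.inv(Z.T @ Z)
    b1 = np.linalg.solve(X.T @ Z @ W1 @ Z.T @ X, X.T @ Z @ W1 @ Z.T @ y)
    u = y - X @ b1
    S = (Z * (u ** 2)[:, None]).T @ Z / N          # (1/N) sum u_i^2 z_i z_i'
    Si = np.linalg.inv(S)
    b = np.linalg.solve(X.T @ Z @ Si @ Z.T @ X, X.T @ Z @ Si @ Z.T @ y)
    ug = y - X @ b
    J = (ug @ Z) @ Si @ (Z.T @ ug) / N             # N * gbar' S^-1 gbar
    return b, J

# Exactly identified case: GMM equals the IV estimator (Z'X)^-1 Z'y
# and the moment conditions hold exactly, so J is (numerically) zero.
rng = np.random.default_rng(3)
Z = rng.normal(size=(60, 3))
X = Z + 0.3 * rng.normal(size=(60, 3))
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=60)
b, J = gmm_2step(y, X, Z)
print(np.allclose(b, np.linalg.solve(Z.T @ X, Z.T @ y)), J < 1e-8)
```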
Hall, Rudebusch and Wilcox (1996) proposed a likelihood ratio test of the relevance of the instrumental variables Z that is based on the canonical correlations r_i between X and Z. The ordered canonical correlation vector can be calculated as the square root of the eigenvalues of

(X'X)^{-1}(X'Z)(Z'Z)^{-1}(Z'X)            (4.6-6)

with associated eigenvectors \phi_i, or the square root of the eigenvalues of

(Z'Z)^{-1}(Z'X)(X'X)^{-1}(X'Z)            (4.6-7)

with associated eigenvectors \omega_i. The vectors \phi_1 and \omega_1 maximize the correlation between X\phi_1 and Z\omega_1, which equals r_1. As noted by Hall-Rudebusch-Wilcox (1996, 287) "\phi_j and \omega_j are the vectors which yield the j th highest correlation r_j subject to the constraints that X\phi_j and Z\omega_j are orthogonal." The proposed Anderson statistic

LR = -T \sum_{i=j+1}^{n} \log(1 - r_i^2)            (4.6-8)

is distributed as Chi-squared with (l-k+1) degrees of freedom, where l is the rank of Z and k is the rank of X, and can be applied to both 2SLS and GMM models. A significant statistic is consistent with appropriate instruments. A disadvantage of the Anderson test is that it assumes that the regressors are distributed multivariate normal. Further information on the Anderson test is in Baum (2006, 208). The Anderson statistic can also be displayed in LM form as

N \min(r_i^2)            (4.6-9)

or in the Cragg-Donald (1993) form as

(N \min(r_i^2)) / (1 - \min(r_i^2))            (4.6-10)

If these statistics are not significant, the instruments selected are weak.
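The eigenvalue calculation behind (4.6-6)-(4.6-10) is short enough to sketch directly. The numpy fragment below (invented names; it works with the squared canonical correlations, as the routines of Table 4.8 do) is an illustration only; the checks rest on two facts: squared canonical correlations lie in [0, 1], and the Cragg-Donald form always weakly exceeds the LM form.

```python
import numpy as np

def canon_stats(X, Z):
    """Squared canonical correlations r_i^2 from (4.6-6), plus the
    Anderson LM form (4.6-9) and the Cragg-Donald form (4.6-10)."""
    N = X.shape[0]
    M = (np.linalg.inv(X.T @ X) @ (X.T @ Z)
         @ np.linalg.inv(Z.T @ Z) @ (Z.T @ X))
    r2 = np.sort(np.real(np.linalg.eigvals(M)))   # ordered r_i^2
    lm = N * r2.min()
    cd = lm / (1.0 - r2.min())
    return r2, lm, cd

# Sanity checks: r_i^2 in [0, 1] and cd >= lm.
rng = np.random.default_rng(4)
X = rng.normal(size=(80, 2))
Z = 0.7 * X @ rng.normal(size=(2, 4)) + rng.normal(size=(80, 4))
r2, lm, cd = canon_stats(X, Z)
print((r2 >= -1e-10).all(), (r2 <= 1 + 1e-10).all(), cd >= lm)
```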
For GMM estimation, the Hansen (1982) J statistic, which tests for overidentifying restrictions, is usually used. The Hansen test, which is also called the Sargan (1958) test, is the value of the efficient GMM objective function

u'ZS^{-1}Z'u            (4.6-11)

and is distributed as chi-square with degrees of freedom l-k, the number of overidentifying restrictions. A significant value indicates the selected instruments are not suitable. For 2SLS the J statistic is

NR^2            (4.6-12)

which is also distributed as \chi^2(l-k).
The Basmann (1960) overidentification test is

(N-l)\left[\frac{u_{LS2}'u_{LS2} - u_Z'u_Z}{u_Z'u_Z}\right]            (4.6-13)

where u_{LS2} is the residual from the LS2 equation and u_Z is the residual from a model that predicts u_{LS2} as a function of Z. The Basmann test is distributed as chi-square with degrees of freedom l-k. If the instruments Z have no predictive power, or in other words are orthogonal to the LS2 residuals, then u_{LS2}'u_{LS2} \approx u_Z'u_Z and the chi-square value will not be significant. A significant chi-square value, however, indicates that the instruments are not suitable since they are not exogenous.
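Both overidentification statistics can be produced from the single auxiliary regression of the 2SLS residuals on Z. The numpy sketch below is an illustration (invented names; an uncentered R-squared is assumed for the Sargan form, matching a no-intercept auxiliary regression); when Z is exactly orthogonal to the residuals both statistics vanish, which supplies the check.

```python
import numpy as np

def overid_tests(u_ls2, Z):
    """Sargan N*R^2 of (4.6-12) and Basmann of (4.6-13), both computed
    from regressing the 2SLS residuals on the full instrument set Z."""
    N, l = Z.shape
    u_z = u_ls2 - Z @ np.linalg.lstsq(Z, u_ls2, rcond=None)[0]
    rss, tss = u_z @ u_z, u_ls2 @ u_ls2
    sargan = N * (1.0 - rss / tss)                # uncentered R^2
    basmann = (N - l) * (tss - rss) / rss
    return sargan, basmann

# If Z is exactly orthogonal to the residuals, both statistics are zero.
rng = np.random.default_rng(5)
u = rng.normal(size=50)
Z = rng.normal(size=(50, 4))
Z_perp = Z - np.outer(u, u @ Z) / (u @ u)         # force Z'u = 0
s, b = overid_tests(u, Z_perp)
print(abs(s) < 1e-8, abs(b) < 1e-8)
```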
The Hausman (1978) test is discussed in some detail in Cameron and Trivedi (2005, 271-276). The basic test is

H = (\hat{\beta} - \tilde{\beta})'(\hat{V}[\tilde{\beta}] - \hat{V}[\hat{\beta}])^{-1}(\hat{\beta} - \tilde{\beta})            (4.6-14)

where \hat{\beta} is the OLS estimator and \tilde{\beta} is the instrumental variable estimator. H is distributed as \chi^2(k) where k is the number of endogenous variables tested. A significant value suggests that OLS should not be used.
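A direct numpy transcription of the contrast is below (an illustration only; a pseudoinverse is used because the difference of estimated covariance matrices need not be positive definite in finite samples, an issue the hausman subroutine of Table 4.8 must also confront). With identical estimates the statistic is exactly zero.

```python
import numpy as np

def hausman(b_ols, V_ols, b_iv, V_iv):
    """Hausman (1978) contrast of (4.6-14); H ~ chi-square(k) under the
    null that OLS is consistent."""
    d = b_iv - b_ols
    return d @ np.linalg.pinv(V_iv - V_ols) @ d

# Toy example: the IV covariance exceeds the OLS covariance, so the
# contrast matrix is positive definite and H > 0 for differing estimates.
b_ols = np.array([1.0, 2.0]); V_ols = np.array([[0.2, 0.0], [0.0, 0.1]])
b_iv  = np.array([1.3, 1.8]); V_iv  = np.array([[0.5, 0.1], [0.1, 0.4]])
H = hausman(b_ols, V_ols, b_iv, V_iv)
print(H > 0.0)  # True
```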
Table 4.8 lists subroutines LS2 and GMMEST that estimate 2SLS and GMM models
respectively. For an exactly identified system, LS2 and GMM will be the same. For an
overidentified system, GMM is more efficient.
Table 4.8 LS2 and General Method of Moments estimation routines
subroutine ls2(y1,x1,z1,var_name,yvar,iprint);
/;
/; y1       => left hand side. Usually set as %y from OLS
/; x1       => right hand side. Usually set as %x from OLS step
/; z1       => instrumental Variables
/; var_name => Names from OLS step. Usually set as %names
/; yvar     => usually set from call olsq as %yvar
/; iprint   => =1 print coef, =2 print covariance in addition
/;
/; if # of obs for z1 < x1 then x1 will be truncated
/;
/; Automatic variables created
/; %olscoef => OLS Coefficients
/; %ols_se  => OLS SE
/; %ols_t   => OLS t
/; %ls2coef => LS2 Coefficients
/; %ls2_sel => Large Sample LS2 SE
/; %ls2_ses => Small Sample LS2 SE
/; %ls2_t_l => Large Sample LS2 t
/; %ls2_t_s => Small Sample LS2 t
/; %rss_ols => e'e for OLS
/; %rss_ls2 => e'e for LS2
/; %yhatols => yhat for OLS
/; %yhatls2 => yhat for LS2
/; %resols  => OLS Residual
/; %resls2  => LS2 Residual
/; %covar_l => Large Sample covariance
/; %covar_s => Small Sample covariance
/; %sigma_l => Large Sample sigma
/; %sigma_s => Small Sample Sigma
/; %z       => Instruments
/; %varcov1 => From OLS
/; %info    => Model is ok if = 0
/; For conditional Heteroskedasticity Sargan(1958)=Hansen(1982) J test
/; %sargan  => Sargan(1958) test
/; %basmann => Basmann(1960)
/;
/; Example Job:
/;
/; b34sexec options ginclude('b34sdata.mac') member(kmenta);
/; b34srun;
/;
/; b34sexec matrix;
/; call loaddata;
/; call echooff;
/; call print('OLS for Equation # 1':);
/; call olsq(q p d :savex :print);
/; call ls2(%y,%x,catcol(d,f,a,constant),%names,%yvar,1);
/;
/; call print('OLS for Equation # 2':);
/; call olsq(q p f a :savex :print);
/; call ls2(%y,%x,catcol(d,f,a,constant),%names,%yvar,1);
/; b34srun;
/;
/; Command built 26 April 2010, Mods 26 May 2010 2 August 2010
/;
y =vfam(y1);
%z=mfam(z1);
x =mfam(x1);
n1=norows(%z);
n2=norows(x);
if(n2.lt.n1)call deleterow(%z,1,(n1-n2));
if(n1.lt.n2)then;
call epprint('ERROR: # obs for instruments < # obs for equation');
go to done;
endif;
/; This saves the OLS Results
call olsq(y x :noint);
%olscoef=%coef;
%ols_se=%se;
%ols_t =%t;
n_k=%nob-%k;
%rss_ols=%rss;
%yhatols=%yhat;
%resols =%res;
%varcov1=%resvar*%xpxinv;
* 2SLS ;
zpz = transpose(%z)*%z;
zpx = transpose(%z)*x;
zpy = transpose(%z)*y;
ypy = y*y;
irank=rank(zpx);
iorder=rank(zpz);
/;
if(iorder.lt.irank)then;
call epprint('ERROR: Model Underidentified.':);
go to done;
endif;
/;
%ls2coef =inv(transpose(zpx)*inv(zpz)*zpx)*
(transpose(zpx)*inv(zpz)*zpy);
/;
/; Error trap turned off
/;
/; call gminv((transpose(zpx)*inv(zpz)*zpx),%ls2coef,%info,rrcond);
/; if(%info.ne.0)then;
/; go to done;
/; endif;
%yhatls2=x*%ls2coef;
%resls2 =y-%yhatls2;
sigma_w=(ypy - (2.*y*x*%ls2coef) +
%ls2coef*transpose(x)*x*%ls2coef)/dfloat(n_k);
%covar_s=sigma_w*inv(transpose(x)*%z*inv(zpz)*transpose(%z)*x);
%ls2_ses=dsqrt(diag(%covar_s));
* Get sigma(i,j) from fits ;
%rss_ls2=sumsq(%resls2);
%sigma_l=%rss_ls2/dfloat(%nob);
%sigma_s=%rss_ls2/dfloat(n_k);
Simultaneous Equations Systems
%covar_l=%sigma_l*inv(transpose(zpx)*inv(zpz)*zpx);
%ls2_sel=dsqrt(diag(%covar_l));
%ls2_t_s=afam(%ls2coef)/afam(%ls2_ses);
%ls2_t_l=afam(%ls2coef)/afam(%ls2_sel);
/;
/; squared canonical correlations
/;
if(iprint.ne.0)then;
can_corr=real(eig(inv(transpose(x)*x)*(transpose(x)*%z)*inv(zpz)*zpx));
call print(can_corr);
anderson=-1.*dfloat(norows(%z))
*dlog(sum(kindas(%z,1.0)-afam(can_corr)));
anderlm = dfloat(norows(%z))*min(can_corr);
cragg_d = anderlm/(1.0 - min(can_corr));
endif;
/;
/; %sargan & %basmann
/;
call olsq(%resls2 %z :noint);
%basmann=(dfloat( norows(%z)-nocols(%z))*(sumsq(%resls2)-%rss))/%rss;
%sargan = dfloat(norows(%z))*%rsq;
/;
if(iprint.ne.0)then;
call print(' ':);
call print('OLS and LS2 Estimation':);
call print(' ':);
gg='Dependent Variable                   ';
gg2=c1array(8:yvar);
ff=catrow(gg,gg2);
call print(ff:);
call print('OLS Sum of squared Residuals         ',%rss_ols:);
call print('LS2 Sum of squared Residuals         ',%rss_ls2:);
call print('Large Sample ls2 sigma               ',%sigma_l:);
call print('Small Sample ls2 sigma               ',%sigma_s:);
call print('Rank of Equation                     ',irank:);
call print('Order of Equation                    ',iorder:);
if(irank.lt.iorder)call print('Equation is overidentified':);
if(irank.eq.iorder)call print('Equation is exactly identified':);
/;
call print('Anderson LR ident./IV Relevance test ',anderson:);
/;
if(iorder.ge.irank.and.anderson.gt.0.0)then;
aprob=chisqprob(anderson,dfloat(iorder+1-irank));
call print('Significance of Anderson LR Statistic',aprob:);
endif;
/;
call print('Anderson Canon Correlation LM test   ',anderlm:);
/;
if(iorder.ge.irank.and.anderlm.gt.0.0)then;
aprob=chisqprob(anderlm,dfloat(iorder+1-irank));
call print('Significance of Anderson LM Statistic',aprob:);
endif;
/;
call print('Cragg-Donald Chi-Square Weak ID Test ',cragg_d:);
/;
if(iorder.ge.irank.and.cragg_d.gt.0.0)then;
aprob=chisqprob(cragg_d,dfloat(iorder+1-irank));
call print('Significance of Cragg-Donald test    ',aprob:);
endif;
/;
call print('Basmann                              ',%basmann:);
/;
if(iorder.gt.irank.and.%basmann.gt.0.0)then;
bprob=chisqprob(%basmann,dfloat(iorder-irank));
call print('Significance of Basmann Statistic    ',bprob:);
endif;
/;
call print('Sargan N*R-sq / J-Test Test          ',%sargan:);
/;
if(iorder.gt.irank.and.%sargan.gt.0.0)then;
sprob=chisqprob(%sargan,dfloat(iorder-irank));
call print('Significance of Sargan Statistic     ',sprob:);
endif;
/;
call print(' ':);
call print('Hausman (1978) test - Sig. => need LS2':);
call hausman('All coef. tested with Full (small) Covar. Matrix',
%olscoef,%varcov1,%ls2coef,%covar_s,
hausmant,h_sig,iprint);
call hausman('All coef. tested with Full (large) Covar. Matrix',
%olscoef,%varcov1,%ls2coef,%covar_l,
hausmant,h_sig,iprint);
call hausman('All coef. tested with diag (small) Covar. Matrix',
%olscoef,diagmat(diag(%varcov1)),
%ls2coef,diagmat(diag(%covar_s)),
hausmant,h_sig,iprint);
call hausman('All coef. tested with diag (large) Covar. Matrix',
%olscoef,diagmat(diag(%varcov1)),
%ls2coef,diagmat(diag(%covar_l)),
hausmant,h_sig,iprint);
/;
call tabulate(var_name,%olscoef,%ols_se,%ols_t,%ls2coef,
%ls2_ses,%ls2_sel,
%ls2_t_s,%ls2_t_l
:title
'+++++++++++++++++++++++++++++++++++++++++++++++++++++');
call print(' ':);
if(iprint.eq.2)then;
call print('Estimated Covariance Matrix - Large Sample',%covar_l);
endif;
/;
call makeglobal(%olscoef);
call makeglobal(%ols_se);
call makeglobal(%ols_t);
call makeglobal(%ls2coef);
call makeglobal(%ls2_sel);
call makeglobal(%ls2_ses);
call makeglobal(%ls2_t_l);
call makeglobal(%ls2_t_s);
call makeglobal(%rss_ols);
call makeglobal(%rss_ls2);
call makeglobal(%yhatols);
call makeglobal(%yhatls2);
call makeglobal(%resols);
call makeglobal(%resls2);
call makeglobal(%covar_l);
call makeglobal(%covar_s);
call makeglobal(%sigma_l);
call makeglobal(%sigma_s);
call makeglobal(%z);
call makeglobal(%sargan);
call makeglobal(%basmann);
call makeglobal(%varcov1);
/; call makeglobal(%info);
/;
done continue;
return;
end;
subroutine gmmest(y,x,z,names,yvar,j_stat,sigma,iprint);
/;
/; GMM Model - Built 12 May 2010
/;
/; Must call ls2 prior to this call to produce global variable %z
/;
/; The following global variables are created:
/; %resgmm  => GMM Residuals
/; %segmm   => GMM SE
/; %tgmm    => GMM t
/; %coefgmm => GMM Coef
/; %yhatgmm => GMM Y hat
/; %covar_g => Variance Covariance
/;
/; The Anderson Test is discussed in Baum
/; "An introduction to Modern Econometrics Using Stata" (2006) p. 208
/; Both the LR and LM forms of the test are given.
/;
/; Generates feasible two-step GMM Estimator. Results are the same as
/; produced by the RATS "optimalweights" option.
/;
/; Note: When running bootstraps inv(s) can fail to invert if dummy
/;       variables are in the dataset.
/;
/; See Baum (2006) page 196
/;
xpz = transpose(x)*z;
xpy = transpose(x)*vfam(y);
ypy = vfam(y)*vfam(y);
/;
/; GMM Coefficients
/;
irank =rank(xpz);
iorder=rank(transpose(z)*z);
/;
if(iorder.lt.irank)then;
call epprint('ERROR: Model Underidentified.':);
go to done;
endif;
/;
adj=kindas(z,1.0)/dfloat(norows(z));
s=hc_sigma(adj,z,%resls2);
inv_s=inv(s);
%coefgmm=inv(xpz*inv_s*transpose(xpz)) *
(xpz*inv_s*transpose(z)*vfam(y));
%resgmm =vfam(y)-x*%coefgmm;
%yhatgmm=x*%coefgmm;
sigma=hc_sigma(kindas(z,1.),z,%resls2);
/;
/; Logic from Rats User's Guide Version 7 page 245
/;
j_stat=%resgmm*z*inv(sigma)*transpose(z)*%resgmm;
/;
/; Stock Watson 2007 page 734
/;
%covar_g=inv(xpz*inv(sigma)*transpose(xpz));
%segmm=dsqrt(diag(%covar_g));
%tgmm=afam(%coefgmm)/afam(%segmm);
/;
/;
/; squared canonical correlations
/;
can_corr = real(eig(inv(transpose(x)*x)*(transpose(x)*z)
*inv(transpose(z)*z)* transpose(xpz)));
/;
if(iprint.gt.1)call print(can_corr);
anderson=-1.*dfloat(norows(z))
*dlog(sum(kindas(z,1.0)-afam(can_corr)));
anderlm = dfloat(norows(z))*min(can_corr);
cragg_d = anderlm/(1.0 - min(can_corr));
/;
if(iprint.ne.0)then;
call print(' ':);
call print('GMM Estimates':);
call print(' ':);
gg='Dependent Variable                   ';
gg2=c1array(8:yvar);
ff=catrow(gg,gg2);
call print(ff:);
call print('OLS sum of squares                   ',sumsq(%resols):);
call print('LS2 sum of squares                   ',sumsq(%resls2):);
call print('GMM sum of squares                   ',sumsq(%resgmm):);
call print('Rank of Equation                     ',irank:);
call print('Order of Equation                    ',iorder:);
if(irank.lt.iorder)call print('Equation is overidentified':);
if(irank.eq.iorder)call print('Equation is exactly identified':);
call print('Anderson ident./IV Relevance test    ',anderson:);
/;
if(iorder.ge.irank.and.anderson.gt.0.0)then;
aprob=chisqprob(anderson,dfloat(iorder+1-irank));
call print('Significance of Anderson Statistic ',aprob:);
endif;
/;
call print('Anderson Canon Correlation LM test   ',anderlm:);
/;
if(iorder.ge.irank.and.anderlm.gt.0.0)then;
aprob=chisqprob(anderlm,dfloat(iorder+1-irank));
call print('Significance of Anderson LM Statistic',aprob:);
endif;
/;
call print('Cragg-Donald Chi-Square Weak ID Test ',cragg_d:);
/;
if(iorder.ge.irank.and.cragg_d.gt.0.0)then;
aprob=chisqprob(cragg_d,dfloat(iorder+1-irank));
call print('Significance of Cragg-Donald test    ',aprob:);
endif;
/;
call print('Hansen J_stat Ident. of instruments  ',j_stat:);
/;
if(iorder.gt.irank.and.j_stat.gt.0.0)then;
jprob=chisqprob(j_stat,dfloat(iorder-irank));
call print('Significance of Hansen J_stat        ',jprob:);
/;
call print(' ':);
call hausman('Hausman (1978) test - Sig. => Need GMM',
%olscoef,%varcov1,%coefgmm,%covar_g,
hausmant,h_sig,iprint);
endif;
/;
call tabulate(names,%coefgmm,%segmm,%tgmm
:title '+++++++++++++++++++++++++++++++++++++++++++++++++++++');
call print(' ':);
endif;
call makeglobal(%resgmm);
call makeglobal(%segmm);
call makeglobal(%tgmm);
call makeglobal(%coefgmm);
call makeglobal(%yhatgmm);
call makeglobal(%covar_g);
done continue;
return;
end;
Table 4.9 shows the setup to estimate and test LS2 and GMM models for the Griliches
(1976) wage data used as a test case in Baum (2006). The Griliches model regresses the log
wage on education, experience, tenure, age, a number of control variables and various year
dummy variables. Stata and Rats results are shown for comparison. In addition Baum (2006)
can be inspected for replication purposes.
68
Chapter 4
Table 4.9 Estimation of LS2 and GMM Models using B34S, Stata and Rats
%b34slet dob34s1=0;
%b34slet dob34s2=1;
%b34slet dostata=1;
%b34slet dorats =1;
b34sexec options ginclude('micro.mac') member(griliches76); b34srun;
%b34sif(&dob34s1.ne.0)%then;
b34sexec matrix;
call loaddata;
call echooff;
call olsq(iq s expr tenure rns smsa
iyear_67 iyear_68 iyear_69 iyear_70 iyear_71 iyear_73
med kww age mrt :print);
iqyhat=%yhat;
call olsq(lw iqyhat s expr tenure rns smsa
iyear_67
iyear_68
iyear_69
iyear_70
iyear_71
iyear_73 :print);
call olsq(lw iq s expr tenure rns smsa
iyear_67
iyear_68
iyear_69
iyear_70
iyear_71
iyear_73 :print);
call gamfit(lw iq s expr tenure rns[factor,1] smsa[factor,1]
iyear_67[factor,1]
iyear_68[factor,1]
iyear_69[factor,1]
iyear_70[factor,1]
iyear_71[factor,1]
iyear_73[factor,1] :print);
call marspline(lw iq s expr tenure rns smsa
iyear_67 iyear_68 iyear_69 iyear_70 iyear_71 iyear_73
:print :nk 40 :mi 2);
call gamfit(lw80 iq s expr tenure rns[factor,1] smsa[factor,1]
iyear_67[factor,1]
iyear_68[factor,1]
iyear_69[factor,1]
iyear_70[factor,1]
iyear_71[factor,1]
iyear_73[factor,1] :print);
call marspline(lw80 iq s expr tenure rns smsa
iyear_67 iyear_68 iyear_69 iyear_70 iyear_71 iyear_73
:print :nk 40 :mi 2);
b34srun;
%b34sendif;
%b34sif(&dob34s2.ne.0)%then;
b34sexec matrix;
call loaddata;
call load(ls2);
call echooff;
call character(lhs,'lw');
call character(endvar,'iq');
call character(rhs,'iq s expr tenure rns smsa iyear_67 iyear_68
iyear_69 iyear_70 iyear_71 iyear_73 constant');
call character(ivar,'s expr tenure rns smsa iyear_67 iyear_68
iyear_69 iyear_70 iyear_71 iyear_73 constant med kww age mrt');
call olsq(argument(lhs) argument(rhs) :noint :print :savex);
call ls2(%y,%x,catcol(argument(ivar)),%names,%yvar,1);
call print(lhs,rhs,ivar,endvar);
call gmmest(%y,%x,%z,%names,%yvar,j_stat,sigma,1);
call graph(%y %yhatols %yhatls2,%yhatgmm :nocontact
:pgborder :nolabel);
b34srun;
%b34sendif;
%b34sif(&dostata.ne.0)%then;
b34sexec options open('statdata.do') unit(28) disp=unknown$ b34srun$
b34sexec options clean(28)$ b34srun$
b34sexec options open('stata.do') unit(29) disp=unknown$ b34srun$
b34sexec options clean(29)$ b34srun$
b34sexec pgmcall idata=28 icntrl=29$
stata$
* for detail on stata commands see Baum page 205 ;
pgmcards$
* uncomment if do not use /e
* log using stata.log, text
global xlist s expr tenure rns smsa iyear_67 ///
   iyear_68 iyear_69 iyear_70 iyear_71 iyear_73
ivregress 2sls lw $xlist (iq=med kww age mrt)
ivregress liml lw $xlist (iq=med kww age mrt)
ivregress gmm  lw $xlist (iq=med kww age mrt)
ivreg   lw $xlist (iq=med kww age mrt)
ivreg2  lw $xlist (iq=med kww age mrt)
ivreg2  lw $xlist (iq=med kww age mrt), gmm2s robust
overid, all
* orthog(age mrt)
gmm (lw-{xb:$xlist iq} +{b0}), ///
   instruments ($xlist med kww age mrt) onestep nolog
exit,clear
b34sreturn$
b34seend$
b34sexec options close(28); b34srun;
b34sexec options close(29); b34srun;
b34sexec options
dounix('stata -b do stata.do ')
dodos('stata /e stata.do');
b34srun;
b34sexec options npageout
writeout('output from stata',' ',' ')
copyfout('stata.log')
dodos('erase stata.do',
/; 'erase stata.log',
'erase statdata.do') $
b34srun$
%b34sendif;
%b34sif(&dorats.ne.0)%then;
b34sexec options open('rats.dat') unit(28) disp=unknown$ b34srun$
b34sexec options open('rats.in') unit(29) disp=unknown$ b34srun$
b34sexec options clean(28)$ b34srun$
b34sexec options clean(29)$ b34srun$
b34sexec pgmcall$
rats passasts
pcomments('* ',
'* Data passed from B34S(r) system to RATS',
'*                                        ',
"display @1 %dateandtime() @33 ' Rats Version ' %ratsversion()"
'* ') $
PGMCARDS$
*
instruments s expr tenure rns smsa iyear_67 $
  iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 $
  med kww age mrt constant
* OLS
linreg lw
# constant s expr tenure rns smsa iyear_67
$
iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq
* 2SLS
linreg(inst) lw
# constant s expr tenure rns smsa iyear_67
$
iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq
* GMM
linreg(inst,optimalweights) lw
# constant s expr tenure rns smsa iyear_67
$
iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq
b34sreturn$
b34srun $
b34sexec options close(28)$ b34srun$
b34sexec options close(29)$ b34srun$
b34sexec options
/$ dodos(' rats386 rats.in rats.out ')
dodos('start /w /r rats32s rats.in /run')
dounix('rats rats.in rats.out')$ B34SRUN$
b34sexec options npageout
WRITEOUT('Output from RATS',' ',' ')
COPYFOUT('rats.out')
dodos('ERASE rats.in','ERASE rats.out','ERASE rats.dat')
dounix('rm rats.in','rm rats.out','rm rats.dat')
$
B34SRUN$
%b34sendif;
Edited and annotated results are shown next.
Number of observations in data file    758
Current missing variable code          1.000000000000000E+31

 #  Variable  Label                                      # Cases  Mean       Std. Dev.  Variance    Maximum   Minimum
 1  RNS       residency in South                             758  0.269129   0.443800   0.196959    1.00000   0.00000
 2  RNS80     residency in South in 1980                     758  0.292876   0.455383   0.207373    1.00000   0.00000
 3  MRT       marital status = 1 if married                  758  0.514512   0.500119   0.250119    1.00000   0.00000
 4  MRT80     marital status = 1 if married in 1980          758  0.898417   0.302299   0.0913845   1.00000   0.00000
 5  SMSA      reside metro area = 1 if urban                 758  0.704485   0.456575   0.208461    1.00000   0.00000
 6  SMSA80    reside metro area = 1 if urban in 1980         758  0.712401   0.452942   0.205156    1.00000   0.00000
 7  MED       mother's education, years                      758  10.9103    2.74112    7.51374     18.0000   0.00000
 8  IQ        iq score                                       758  103.856    13.6187    185.468     145.000   54.0000
 9  KWW       score on knowledge in world of work test       758  36.5739    7.30225    53.3228     56.0000   12.0000
10  YEAR      Year                                           758  69.0317    2.63179    6.92634     73.0000   66.0000
11  AGE       Age                                            758  21.8351    2.98176    8.89087     30.0000   16.0000
12  AGE80     Age in 1980                                    758  33.0119    3.08550    9.52033     38.0000   28.0000
13  S         completed years of schooling                   758  13.4050    2.23183    4.98106     18.0000   9.00000
14  S80       completed years of schooling in 1980           758  13.7071    2.21469    4.90486     18.0000   9.00000
15  EXPR      experience, years                              758  1.73543    2.10554    4.43331     11.4440   0.00000
16  EXPR80    experience, years in 1980                      758  11.3943    4.21075    17.7304     22.0450   0.692000
17  TENURE    tenure, years                                  758  1.83113    1.67363    2.80104     10.0000   0.00000
18  TENURE80  tenure, years in 1980                          758  7.36280    5.05024    25.5049     22.0000   0.00000
19  LW        log wage                                       758  5.68674    0.428949   0.183998    7.05100   4.60500
20  LW80      log wage in 1980                               758  6.82656    0.409927   0.168040    8.03200   4.74900
21  IYEAR_67                                                 758  0.0831135  0.276236   0.0763063   1.00000   0.00000
22  IYEAR_68                                                 758  0.104222   0.305750   0.0934828   1.00000   0.00000
23  IYEAR_69                                                 758  0.112137   0.315744   0.0996940   1.00000   0.00000
24  IYEAR_70                                                 758  0.0844327  0.278219   0.0774060   1.00000   0.00000
25  IYEAR_71                                                 758  0.121372   0.326775   0.106782    1.00000   0.00000
26  IYEAR_73                                                 758  0.208443   0.406464   0.165213    1.00000   0.00000
27  CONSTANT                                                 758  1.00000    0.00000    0.00000     1.00000   1.00000
Ordinary Least Squares Estimation
Dependent variable                LW
Centered R**2                     0.4301415547786606
Adjusted R**2                     0.4209626268019410
Residual Sum of Squares           79.37338878983863
Residual Variance                 0.1065414614628706
Standard Error                    0.3264068955504320
Total Sum of Squares              139.2861498420176
Log Likelihood                    -220.3342420049200
Mean of the Dependent Variable    5.686738782319042
Std. Error of Dependent Variable  0.4289493629019316
Sum Absolute Residuals            194.5217111479906
F(12,  745)                       46.86185095575703
F Significance                    1.000000000000000
1/Condition XPX                   1.486105464518127E-06
Maximum Absolute Residual         1.186094775249485
Number of Observations            758

Variable  Lag  Coefficient       SE               t
IQ         0    0.27121199E-02   0.10314110E-02    2.6295239
S          0    0.61954782E-01   0.72785810E-02    8.5119313
EXPR       0    0.30839472E-01   0.65100828E-02    4.7371858
TENURE     0    0.42163060E-01   0.74812112E-02    5.6358601
RNS        0   -0.96293467E-01   0.27546700E-01   -3.4956444
SMSA       0    0.13289929       0.26575835E-01    5.0007567
IYEAR_67   0   -0.54209478E-01   0.47852181E-01   -1.1328528
IYEAR_68   0    0.80580850E-01   0.44895091E-01    1.7948700
IYEAR_69   0    0.20759151       0.43860470E-01    4.7329979
IYEAR_70   0    0.22822373       0.48799418E-01    4.6767716
IYEAR_71   0    0.22269148       0.43095233E-01    5.1674272
IYEAR_73   0    0.32287469       0.40657433E-01    7.9413448
CONSTANT   0    4.2353569        0.11334886       37.365677
The edited output listed below replicates Baum (2006, 193-194). The Basmann and Sargan tests
of 97.0249 and 87.655, respectively, are highly significant, rejecting the null hypothesis that
there is no correlation between the residuals of the LS2 model and the instruments. This finding
suggests serious problems, since endogeneity present in the OLS model will not be removed by
LS2 estimation. Note that Stata replicates the Sargan test value. The Anderson value of 54.33,
which tests the relevance of the instruments, matches the value reported in Baum (2006, 204)
but does not match the value reported by Stata in the printed output, which uses the revised
ivreg2 Stata command that reports the LM form of the test with a value of 52.436. The B34S
output includes both statistics. Since the null was rejected, the instruments appear relevant in
that they are related to the endogenous variables. This is confirmed by the Cragg-Donald (1993)
statistic of 56.333. In addition to various LS2 and GMM results, both Stata bootstrap and Stata
robust standard error results are shown. The bootstrap results do not make assumptions about
the distribution of the regressors.
The Rats coefficient results for LS2 and GMM match B34S and Stata. Note that Rats
uses the small sample SE formula while Stata reports the large sample SE. B34S LS2 results
report both. The exact formulas for all LS2 and GMM calculations in B34S are contained in the
two subroutines listed in Table 4.8.
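The 2SLS and Sargan N*R-squared mechanics discussed above can be sketched in a few lines. The following is a minimal illustration on simulated data; the data-generating process, coefficient values, and seed are assumptions for illustration only and are not the Griliches data. The 2SLS slope is obtained by projecting onto the instrument space, and the overidentification statistic is n times the R-squared from regressing the 2SLS residuals on all instruments.

```python
import numpy as np

# Hypothetical simulated data: x plays the role of the endogenous regressor,
# z holds four excluded instruments, u is the structural error.
rng = np.random.default_rng(0)
n = 758
z = rng.normal(size=(n, 4))
u = rng.normal(size=n)
x = z @ np.array([1.0, 0.6, 0.4, 0.2]) + 0.5 * u + rng.normal(size=n)
y = 4.0 + 0.1 * x + u                      # x is endogenous: corr(x, u) > 0

Z = np.column_stack([np.ones(n), z])       # all instruments (with constant)
X = np.column_stack([np.ones(n), x])       # regressors (with constant)

# 2SLS: b = (X'Pz X)^(-1) X'Pz y, where Pz projects onto the instrument space.
Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
b_2sls = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)
e = y - X @ b_2sls                         # 2SLS residuals

# Sargan statistic: n * R-squared of the 2SLS residuals regressed on all
# instruments; chi-square with (excluded instruments - endogenous
# regressors) = 3 degrees of freedom under the null of valid instruments.
sargan = n * (e @ Pz @ e) / (e @ e)
print(b_2sls[1], sargan)
```

Here the instruments are valid by construction, so the statistic should sit near its chi-square(3) mean; values like the 87.655 reported above are far out in the tail and reject instrument validity.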
OLS and LS2 Estimation
Dependent Variable                       LW
OLS Sum of squared Residuals             79.37338878983863
LS2 Sum of squared Residuals             80.01823370030675
Large Sample ls2 sigma                   0.1055649521112226
Small Sample ls2 sigma                   0.1074070251010829
Rank of Equation                         13
Order of Equation                        16
Equation is overidentified
Anderson LR ident./IV Relevance test     54.33777011513529
Significance of Anderson LR Statistic    0.9999999999552830
Anderson Canon Correlation LM test       52.43586586757428
Significance of Anderson LM Statistic    0.9999999998881718
Cragg-Donald Chi-Square Weak ID Test     56.33277600836977
Significance of Cragg-Donald test        0.9999999999829244
Basmann                                  97.02497131695870
Significance of Basmann Statistic        1.000000000000000
Sargan N*R-sq / J-Test Test              87.65523169449482
Significance of Sargan Statistic         1.000000000000000
+++++++++++++++++++++++++++++++++++++++++++++++++++++

Obs NAMES      %OLSCOEF    %OLS_SE     %OLS_T   %LS2COEF    %LS2_SES    %LS2_SEL    %LS2_T_S    %LS2_T_L
  1 IQ         0.2712E-02  0.1031E-02   2.630   0.1747E-03  0.3937E-02  0.3903E-02  0.4436E-01  0.4474E-01
  2 S          0.6195E-01  0.7279E-02   8.512   0.6918E-01  0.1305E-01  0.1294E-01   5.301       5.347
  3 EXPR       0.3084E-01  0.6510E-02   4.737   0.2987E-01  0.6697E-02  0.6639E-02   4.460       4.498
  4 TENURE     0.4216E-01  0.7481E-02   5.636   0.4327E-01  0.7693E-02  0.7627E-02   5.625       5.674
  5 RNS       -0.9629E-01  0.2755E-01  -3.496  -0.1036      0.2974E-01  0.2948E-01  -3.484      -3.514
  6 SMSA       0.1329      0.2658E-01   5.001   0.1351      0.2689E-01  0.2666E-01   5.025       5.069
  7 IYEAR_67  -0.5421E-01  0.4785E-01  -1.133  -0.5260E-01  0.4811E-01  0.4769E-01  -1.093      -1.103
  8 IYEAR_68   0.8058E-01  0.4490E-01   1.795   0.7947E-01  0.4511E-01  0.4472E-01   1.762       1.777
  9 IYEAR_69   0.2076      0.4386E-01   4.733   0.2109      0.4432E-01  0.4393E-01   4.759       4.800
 10 IYEAR_70   0.2282      0.4880E-01   4.677   0.2386      0.5142E-01  0.5097E-01   4.641       4.682
 11 IYEAR_71   0.2227      0.4310E-01   5.167   0.2285      0.4412E-01  0.4374E-01   5.178       5.223
 12 IYEAR_73   0.3229      0.4066E-01   7.941   0.3259      0.4107E-01  0.4072E-01   7.935       8.004
 13 CONSTANT   4.235       0.1133      37.37    4.400       0.2709      0.2685      16.24       16.38

LHS    = LW
RHS    = IQ S EXPR TENURE RNS SMSA IYEAR_67 IYEAR_68 IYEAR_69 IYEAR_70 IYEAR_71 IYEAR_73 CONSTANT
IVAR   = S EXPR TENURE RNS SMSA IYEAR_67 IYEAR_68 IYEAR_69 IYEAR_70 IYEAR_71 IYEAR_73 CONSTANT MED KWW AGE MRT
ENDVAR = iq
GMM Estimates
Dependent Variable                       LW
OLS sum of squares                       79.37338878983863
LS2 sum of squares                       80.01823370030675
GMM sum of squares                       81.26217887229201
Rank of Equation                         13
Order of Equation                        16
Equation is overidentified
Anderson ident./IV Relevance test        54.33777011513529
Significance of Anderson Statistic       0.9999999999552830
Anderson Canon Correlation LM test       52.43586586757428
Significance of Anderson LM Statistic    0.9999999998881718
Cragg-Donald Chi-Square Weak ID Test     56.33277600836977
Significance of Cragg-Donald test        0.9999999999829244
Hansen J_stat Ident. of instruments      74.16487762432548
Significance of Hansen j_stat            0.9999999999999994

+++++++++++++++++++++++++++++++++++++++++++++++++++++

Obs NAMES      %COEFGMM    %SEGMM      %TGMM
  1 IQ        -0.1401E-02  0.4113E-02  -0.3407
  2 S          0.7684E-01  0.1319E-01   5.827
  3 EXPR       0.3123E-01  0.6693E-02   4.667
  4 TENURE     0.4900E-01  0.7344E-02   6.672
  5 RNS       -0.1007      0.2959E-01  -3.403
  6 SMSA       0.1336      0.2632E-01   5.075
  7 IYEAR_67  -0.2101E-01  0.4554E-01  -0.4614
  8 IYEAR_68   0.8910E-01  0.4270E-01   2.087
  9 IYEAR_69   0.2072      0.4080E-01   5.080
 10 IYEAR_70   0.2338      0.5285E-01   4.424
 11 IYEAR_71   0.2346      0.4257E-01   5.510
 12 IYEAR_73   0.3360      0.4041E-01   8.315
 13 CONSTANT   4.437       0.2900      15.30
B34S Matrix Command Ending. Last Command reached.
output from stata
  ___  ____  ____  ____  ____ (R)
 /__    /   ____/   /   ____/
___/   /   /___/   /   /___/    11.1
  Statistics/Data Analysis            Copyright 2009 StataCorp LP
                                      StataCorp
                                      4905 Lakeway Drive
                                      College Station, Texas 77845 USA
                                      800-STATA-PC        http://www.stata.com
                                      979-696-4600        stata@stata.com
                                      979-696-4601 (fax)

Single-user Stata perpetual license:
       Serial number:  30110535901
         Licensed to:  Houston H. Stokes
                       University of Illinois at Chicago

Notes:
      1.  (/m# option or -set memory-) 120.00 MB allocated to data
      2.  Stata running in batch mode
. do stata.do

. * File built by B34S on 17/10/10 at 12:29:31
. run statdata.do

. * uncomment if do not use /e
. * log using stata.log, text
. global xlist s expr tenure rns smsa iyear_67 ///
>        iyear_68 iyear_69 iyear_70 iyear_71 iyear_73
. bootstrap _b _se, reps(50): ///
>        ivregress 2sls lw $xlist (iq=med kww age mrt)
(running ivregress on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................

Bootstrap results                               Number of obs      =       758
                                                Replications       =        50

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
b            |
          iq |   .0001747   .0074584     0.02   0.981    -.0144435    .0147928
           s |   .0691759   .0217356     3.18   0.001     .0265749    .1117769
        expr |    .029866   .0079507     3.76   0.000      .014283    .0454491
      tenure |   .0432738   .0086468     5.00   0.000     .0263264    .0602211
         rns |  -.1035897   .0406823    -2.55   0.011    -.1833256   -.0238538
        smsa |   .1351148   .0258812     5.22   0.000     .0843886    .1858411
    iyear_67 |   -.052598   .0422675    -1.24   0.213    -.1354408    .0302448
    iyear_68 |   .0794686   .0459301     1.73   0.084    -.0105528      .16949
    iyear_69 |   .2108962   .0456788     4.62   0.000     .1213673     .300425
    iyear_70 |   .2386338   .0592127     4.03   0.000      .122579    .3546886
    iyear_71 |   .2284609   .0513617     4.45   0.000     .1277939    .3291279
    iyear_73 |   .3258944   .0432171     7.54   0.000     .2411904    .4105984
       _cons |    4.39955   .4995474     8.81   0.000     3.420455    5.378645
-------------+----------------------------------------------------------------
se           |
          iq |   .0039035   .0012226     3.19   0.001     .0015073    .0062996
           s |   .0129366   .0034772     3.72   0.000     .0061214    .0197518
        expr |   .0066393   .0007373     9.00   0.000     .0051941    .0080845
      tenure |   .0076271   .0011929     6.39   0.000      .005289    .0099652
         rns |    .029481   .0052416     5.62   0.000     .0192077    .0397544
        smsa |   .0266573    .002741     9.73   0.000      .021285    .0320297
    iyear_67 |   .0476924   .0051268     9.30   0.000     .0376441    .0577407
    iyear_68 |   .0447194    .004026    11.11   0.000     .0368285    .0526102
    iyear_69 |   .0439336   .0055467     7.92   0.000     .0330623     .054805
    iyear_70 |   .0509733   .0052485     9.71   0.000     .0406864    .0612601
    iyear_71 |   .0437436   .0041483    10.54   0.000      .035613    .0518741
    iyear_73 |   .0407181   .0041193     9.88   0.000     .0326444    .0487917
       _cons |   .2685443   .0796381     3.37   0.001     .1124564    .4246321
------------------------------------------------------------------------------

. * Durbin-Wu-Hausman exogenous test robust errors
. ivregress 2sls lw $xlist (iq=med kww age mrt), vce(robust)

Instrumental variables (2SLS) regression          Number of obs   =       758
                                                  Wald chi2(12)   =    573.14
                                                  Prob > chi2     =    0.0000
                                                  R-squared       =    0.4255
                                                  Root MSE        =    .32491

------------------------------------------------------------------------------
             |               Robust
          lw |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          iq |   .0001747   .0041241     0.04   0.966    -.0079085    .0082578
           s |   .0691759   .0132907     5.20   0.000     .0431266    .0952253
        expr |    .029866   .0066974     4.46   0.000     .0167394    .0429926
      tenure |   .0432738   .0073857     5.86   0.000     .0287981    .0577494
         rns |  -.1035897    .029748    -3.48   0.000    -.1618947   -.0452847
        smsa |   .1351148    .026333     5.13   0.000     .0835032    .1867265
    iyear_67 |   -.052598   .0457261    -1.15   0.250    -.1422195    .0370235
    iyear_68 |   .0794686   .0428231     1.86   0.063    -.0044631    .1634003
    iyear_69 |   .2108962   .0408774     5.16   0.000     .1307779    .2910144
    iyear_70 |   .2386338   .0529825     4.50   0.000     .1347901    .3424776
    iyear_71 |   .2284609   .0426054     5.36   0.000     .1449558     .311966
    iyear_73 |   .3258944   .0405569     8.04   0.000     .2464044    .4053844
       _cons |    4.39955    .290085    15.17   0.000     3.830994    4.968106
------------------------------------------------------------------------------
Instrumented:  iq
Instruments:   s expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70
               iyear_71 iyear_73 med kww age mrt
. ivreg2 lw $xlist (iq=med kww age mrt)

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics consistent for homoskedasticity only

                                                      Number of obs =      758
                                                      F( 12,   745) =    45.91
                                                      Prob > F      =   0.0000
Total (centered) SS     =  139.2861498                Centered R2   =   0.4255
Total (uncentered) SS   =  24652.24662                Uncentered R2 =   0.9968
Residual SS             =   80.0182337                Root MSE      =    .3249

------------------------------------------------------------------------------
          lw |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          iq |   .0001747   .0039035     0.04   0.964     -.007476    .0078253
           s |   .0691759   .0129366     5.35   0.000     .0438206    .0945312
        expr |    .029866   .0066393     4.50   0.000     .0168533    .0428788
      tenure |   .0432738   .0076271     5.67   0.000     .0283249    .0582226
         rns |  -.1035897    .029481    -3.51   0.000    -.1613715   -.0458079
        smsa |   .1351148   .0266573     5.07   0.000     .0828674    .1873623
    iyear_67 |   -.052598   .0476924    -1.10   0.270    -.1460734    .0408774
    iyear_68 |   .0794686   .0447194     1.78   0.076    -.0081797    .1671169
    iyear_69 |   .2108962   .0439336     4.80   0.000     .1247878    .2970045
    iyear_70 |   .2386338   .0509733     4.68   0.000     .1387281    .3385396
    iyear_71 |   .2284609   .0437436     5.22   0.000     .1427251    .3141967
    iyear_73 |   .3258944   .0407181     8.00   0.000     .2460884    .4057004
       _cons |    4.39955   .2685443    16.38   0.000     3.873213    4.925887
------------------------------------------------------------------------------
Underidentification test (Anderson canon. corr. LM statistic):          52.436
                                                   Chi-sq(4) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):               13.786
Stock-Yogo weak ID test critical values:  5% maximal IV relative bias    16.85
                                         10% maximal IV relative bias    10.27
                                         20% maximal IV relative bias     6.71
                                         30% maximal IV relative bias     5.34
                                         10% maximal IV size              24.58
                                         15% maximal IV size              13.96
                                         20% maximal IV size              10.26
                                         25% maximal IV size               8.31
Source: Stock-Yogo (2005). Reproduced by permission.
------------------------------------------------------------------------------
Sargan statistic (overidentification test of all instruments):          87.655
                                                   Chi-sq(3) P-val =    0.0000
------------------------------------------------------------------------------
Instrumented:         iq
Included instruments: s expr tenure rns smsa iyear_67 iyear_68 iyear_69
                      iyear_70 iyear_71 iyear_73
Excluded instruments: med kww age mrt
------------------------------------------------------------------------------
. ivreg2 lw $xlist (iq=med kww age mrt), gmm2s robust

2-Step GMM estimation
---------------------

Estimates efficient for arbitrary heteroskedasticity
Statistics robust to heteroskedasticity

                                                      Number of obs =      758
                                                      F( 12,   745) =    49.67
                                                      Prob > F      =   0.0000
Total (centered) SS     =  139.2861498                Centered R2   =   0.4166
Total (uncentered) SS   =  24652.24662                Uncentered R2 =   0.9967
Residual SS             =  81.26217887                Root MSE      =    .3274

------------------------------------------------------------------------------
             |               Robust
          lw |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          iq |  -.0014014   .0041131    -0.34   0.733     -.009463    .0066602
           s |   .0768355   .0131859     5.83   0.000     .0509915    .1026794
        expr |   .0312339   .0066931     4.67   0.000     .0181157    .0443522
      tenure |   .0489998   .0073437     6.67   0.000     .0346064    .0633931
         rns |  -.1006811   .0295887    -3.40   0.001    -.1586738   -.0426884
        smsa |   .1335973   .0263245     5.08   0.000     .0820021    .1851925
    iyear_67 |  -.0210135   .0455433    -0.46   0.645    -.1102768    .0682498
    iyear_68 |   .0890993    .042702     2.09   0.037     .0054049    .1727937
    iyear_69 |   .2072484   .0407995     5.08   0.000     .1272828     .287214
    iyear_70 |   .2338308   .0528512     4.42   0.000     .1302445    .3374172
    iyear_71 |   .2345525   .0425661     5.51   0.000     .1511244    .3179805
    iyear_73 |   .3360267   .0404103     8.32   0.000     .2568239    .4152295
       _cons |   4.436784   .2899504    15.30   0.000     3.868492    5.005077
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):             41.537
                                                   Chi-sq(4) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):               13.786
                         (Kleibergen-Paap rk Wald F statistic):         12.167
Stock-Yogo weak ID test critical values:  5% maximal IV relative bias    16.85
                                         10% maximal IV relative bias    10.27
                                         20% maximal IV relative bias     6.71
                                         30% maximal IV relative bias     5.34
                                         10% maximal IV size              24.58
                                         15% maximal IV size              13.96
                                         20% maximal IV size              10.26
Output from RATS
*
* Data passed from B34S(r) system to RATS
*
display @1 %dateandtime() @33 ' Rats Version ' %ratsversion()
10/17/2010 12:29                 Rats Version      7.30000
*
CALENDAR(IRREGULAR)
ALLOCATE     758
OPEN DATA rats.dat
DATA(FORMAT=FREE,ORG=OBS, $
  MISSING=  0.1000000000000000E+32 ) / $
  RNS RNS80 MRT MRT80 SMSA SMSA80 MED IQ KWW YEAR AGE AGE80 $
  S S80 EXPR EXPR80 TENURE TENURE80 LW LW80 IYEAR_67 IYEAR_68 $
  IYEAR_69 IYEAR_70 IYEAR_71 IYEAR_73 CONSTANT
SET TREND = T
TABLE
Series      Obs        Mean        Std Error       Minimum        Maximum
RNS         758    0.269129288    0.443800128    0.000000000    1.000000000
RNS80       758    0.292875989    0.455382503    0.000000000    1.000000000
MRT         758    0.514511873    0.500119364    0.000000000    1.000000000
MRT80       758    0.898416887    0.302298767    0.000000000    1.000000000
SMSA        758    0.704485488    0.456574966    0.000000000    1.000000000
SMSA80      758    0.712401055    0.452941990    0.000000000    1.000000000
MED         758   10.910290237    2.741119861    0.000000000   18.000000000
IQ          758  103.856200528   13.618666082   54.000000000  145.000000000
KWW         758   36.573878628    7.302246519   12.000000000   56.000000000
YEAR        758   69.031662269    2.631794247   66.000000000   73.000000000
AGE         758   21.835092348    2.981755741   16.000000000   30.000000000
AGE80       758   33.011873351    3.085503913   28.000000000   38.000000000
S           758   13.405013193    2.231828411    9.000000000   18.000000000
S80         758   13.707124011    2.214692601    9.000000000   18.000000000
EXPR        758    1.735428758    2.105542485    0.000000000   11.444000244
EXPR80      758   11.394261214    4.210745167    0.691999972   22.045000076
TENURE      758    1.831134565    1.673629972    0.000000000   10.000000000
TENURE80    758    7.362796834    5.050240439    0.000000000   22.000000000
LW          758    5.686738782    0.428949363    4.605000019    7.051000118
LW80        758    6.826555411    0.409926757    4.749000072    8.031999588
IYEAR_67    758    0.083113456    0.276235910    0.000000000    1.000000000
IYEAR_68    758    0.104221636    0.305749595    0.000000000    1.000000000
IYEAR_69    758    0.112137203    0.315743524    0.000000000    1.000000000
IYEAR_70    758    0.084432718    0.278219253    0.000000000    1.000000000
IYEAR_71    758    0.121372032    0.326774746    0.000000000    1.000000000
IYEAR_73    758    0.208443272    0.406463569    0.000000000    1.000000000
TREND       758  379.500000000  218.960042017    1.000000000  758.000000000
*
instruments s expr tenure rns smsa iyear_67 $
  iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 $
  med kww age mrt constant
* OLS
linreg lw
# constant s expr tenure rns smsa iyear_67 $
  iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq
Linear Regression - Estimation by Least Squares
Dependent Variable LW
Usable Observations                      758
Degrees of Freedom                       745
Centered R**2                       0.430142
R Bar **2                           0.420963
Uncentered R**2                     0.996780
T x R**2                             755.559
Mean of Dependent Variable      5.6867387823
Std Error of Dependent Variable 0.4289493629
Standard Error of Estimate      0.3264068956
Sum of Squared Residuals        79.373388790
Regression F(12,745)                 46.8619
Significance Level of F           0.00000000
Log Likelihood                    -220.33424
Durbin-Watson Statistic             1.726206

   Variable        Coeff          Std Error      T-Stat      Signif
********************************************************************************
 1. Constant       4.235356890    0.113348861    37.36568    0.00000000
 2. S              0.061954782    0.007278581     8.51193    0.00000000
 3. EXPR           0.030839472    0.006510083     4.73719    0.00000260
 4. TENURE         0.042163060    0.007481211     5.63586    0.00000002
 5. RNS           -0.096293467    0.027546700    -3.49564    0.00050091
 6. SMSA           0.132899286    0.026575835     5.00076    0.00000071
 7. IYEAR_67      -0.054209478    0.047852181    -1.13285    0.25764051
 8. IYEAR_68       0.080580850    0.044895091     1.79487    0.07307967
 9. IYEAR_69       0.207591515    0.043860470     4.73300    0.00000265
10. IYEAR_70       0.228223732    0.048799418     4.67677    0.00000346
11. IYEAR_71       0.222691481    0.043095233     5.16743    0.00000031
12. IYEAR_73       0.322874689    0.040657433     7.94134    0.00000000
13. IQ             0.002712120    0.001031411     2.62952    0.00872684
* 2SLS
linreg(inst) lw
# constant s expr tenure rns smsa iyear_67 $
  iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq

Linear Regression - Estimation by Instrumental Variables
Dependent Variable LW
Usable Observations                      758
Degrees of Freedom                       745
Mean of Dependent Variable      5.6867387823
Std Error of Dependent Variable 0.4289493629
Standard Error of Estimate      0.3277301102
Sum of Squared Residuals        80.018233699
J-Specification(3)                 86.151910
Significance Level of J           0.00000000
Durbin-Watson Statistic             1.723148

   Variable        Coeff          Std Error      T-Stat      Signif
********************************************************************************
 1. Constant       4.399550073    0.270877148    16.24187    0.00000000
 2. S              0.069175917    0.013048998     5.30124    0.00000015
 3. EXPR           0.029866018    0.006696962     4.45964    0.00000948
 4. TENURE         0.043273756    0.007693380     5.62480    0.00000003
 5. RNS           -0.103589698    0.029737133    -3.48351    0.00052378
 6. SMSA           0.135114831    0.026888925     5.02492    0.00000063
 7. IYEAR_67      -0.052598010    0.048106697    -1.09336    0.27458852
 8. IYEAR_68       0.079468615    0.045107833     1.76175    0.07852207
 9. IYEAR_69       0.210896152    0.044315294     4.75899    0.00000234
10. IYEAR_70       0.238633821    0.051416062     4.64123    0.00000409
11. IYEAR_71       0.228460915    0.044123572     5.17775    0.00000029
12. IYEAR_73       0.325894418    0.041071810     7.93475    0.00000000
13. IQ             0.000174655    0.003937397     0.04436    0.96463097
* GMM
linreg(inst,optimalweights) lw
# constant s expr tenure rns smsa iyear_67 $
  iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq

Linear Regression - Estimation by GMM
Dependent Variable LW
Usable Observations                      758
Degrees of Freedom                       745
Mean of Dependent Variable      5.6867387823
Std Error of Dependent Variable 0.4289493629
Standard Error of Estimate      0.3302676947
Sum of Squared Residuals        81.262178869
J-Specification(3)                 74.164878
Significance Level of J           0.00000000
Durbin-Watson Statistic             1.720776

   Variable        Coeff          Std Error      T-Stat      Signif
********************************************************************************
 1. Constant       4.436784487    0.289950376    15.30188    0.00000000
 2. S              0.076835453    0.013185922     5.82708    0.00000001
 3. EXPR           0.031233937    0.006693110     4.66658    0.00000306
 4. TENURE         0.048999780    0.007343684     6.67237    0.00000000
 5. RNS           -0.100681114    0.029588671    -3.40269    0.00066726
 6. SMSA           0.133597299    0.026324546     5.07501    0.00000039
 7. IYEAR_67      -0.021013483    0.045543337    -0.46140    0.64451500
 8. IYEAR_68       0.089099315    0.042701995     2.08654    0.03692996
 9. IYEAR_69       0.207248405    0.040799543     5.07967    0.00000038
10. IYEAR_70       0.233830843    0.052851170     4.42433    0.00000967
11. IYEAR_71       0.234552477    0.042566121     5.51031    0.00000004
12. IYEAR_73       0.336026675    0.040410335     8.31536    0.00000000
13. IQ            -0.001401434    0.004113144    -0.34072    0.73331372
4.7 Potential problems of IV Models
Instrumental variable estimation methods, while necessary and useful for models with
endogenous variables on the right, have a number of features that can be serious drawbacks.12 In
the first place, such estimators are never unbiased when endogenous variables are on the right.
Citing Kinal (1980), Wooldridge (2010, 207) notes "when all endogenous variables have
homoskedastic normal distributions with expectations linear in the exogenous variables, the
number of moments of the 2SLS estimator that exist is one fewer than the number of
overidentifying restrictions. This finding implies that when the number of instruments equals the
number of explanatory variables, the IV estimator does not have an expected value." Even in
large samples there will be problems if there are weak instruments. Assume a single
endogenous variable x on the right, or

$y = \beta_0 + \beta_1 x + u$                                                      (4.7-1)

where z is the instrumental variable. It can be shown that

$\mathrm{plim}\,\hat{\beta}_1 = \beta_1 + \frac{\mathrm{cov}(z,u)}{\mathrm{cov}(z,x)}
   = \beta_1 + \frac{\sigma_u}{\sigma_x}\,\frac{\mathrm{corr}(z,u)}{\mathrm{corr}(z,x)}$          (4.7-2)

____________________
12 Wooldridge (2010), especially pages 107-114, forms the basis for this section.
The greater $\mathrm{corr}(z,u)$, the correlation between the instrument and the population
error $u$, the greater the bias. The smaller $\mathrm{corr}(z,x)$, i.e. the weaker the
instrument since it is less correlated with the endogenous variable, the greater the bias.

Note that the bias in the OLS estimator is

$\mathrm{plim}\,\tilde{\beta}_1 = \beta_1 + \frac{\sigma_u}{\sigma_x}\,\mathrm{corr}(x,u)$        (4.7-3)

and can be less than the bias in the IV estimator if

$|\mathrm{corr}(x,u)| < \left| \frac{\mathrm{corr}(z,u)}{\mathrm{corr}(z,x)} \right|$             (4.7-4)
The more significant the Anderson test, the larger $|\mathrm{corr}(z,x)|$, everything else
equal, and the less the bias in the IV estimator. The more significant the Basmann (1960)
test, the larger $|\mathrm{corr}(z,u)|$ and the more the bias in the IV estimator. An
insignificant Anderson test combined with a significant Basmann test is consistent with one
or more instruments being endogenous. Table 4.10 provides an overview of the instrumental
variable tests.
Table 4.10 Overview of IV Tests

Test            Usage
Sargan (1958)   For 2SLS uses (4.6-12). A significant value casts doubt on the
                suitability of the instruments. The Hansen J statistic (4.6-11)
                is the GMM equivalent of the Sargan test.
Anderson        The more significant the test, the larger the correlation
                |corr(z,x)| between the instruments and the endogenous
                variables, which implies less bias in the IV estimator. There
                are three variants: equation (4.6-8) is the LR form, (4.6-9)
                is the LM form and (4.6-10) is the Cragg-Donald (1993) form.
Basmann (1960)  The more significant the Basmann test statistic (4.6-12), the
                larger is |corr(z,u)|, which is associated with more bias in
                the IV estimator.
Hausman (1978)  Tests if IV estimation is needed. If the test is significant,
                then OLS should not be used. For detail see (4.6-12).
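A small Monte Carlo sketch makes the bias comparison above concrete. The parameter values, seed, and data-generating process below are assumptions for illustration, not part of the text: with corr(x,u) > 0 the OLS slope converges to the biased limit given by the OLS bias formula, while IV with a valid instrument centers on the true coefficient.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 2000, 200
beta1 = 1.0
ols, iv = [], []
for _ in range(reps):
    u = rng.normal(size=n)
    z = rng.normal(size=n)                  # valid instrument: corr(z, u) = 0
    x = 0.5 * z + u + rng.normal(size=n)    # endogenous regressor: corr(x, u) > 0
    y = 1.0 + beta1 * x + u
    # OLS slope: cov(x,y)/var(x); IV slope for one instrument: cov(z,y)/cov(z,x).
    ols.append(np.cov(x, y)[0, 1] / np.var(x, ddof=1))
    iv.append(np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1])

# plim of the OLS slope here is beta1 + cov(x,u)/var(x) = 1 + 1/2.25,
# while the IV slope centers on the true beta1 = 1.
print(round(np.mean(ols), 2), round(np.mean(iv), 2))
```

Weakening the instrument (shrinking the 0.5 first-stage coefficient) or contaminating it (adding a z term to u) moves the IV estimates away from the truth, exactly the trade-off between corr(z,x) and corr(z,u) described above.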
The subroutine hausman shown in Table 4.11 can be used to perform a number of different
types of Hausman (1978) tests, including tests using the large- and small-sample IV covariance
estimators, using only the diagonal of the covariance estimators, and using subsets of
coefficients. For example, if there are one or two endogenous variables in the model, one often
wants to test only those coefficients to see whether they changed significantly when estimated
with an IV technique, the assumption being that the coefficients of the exogenous variables on
the right are asymptotically unbiased.
Table 4.11 Subroutine to Perform Hausman Tests
subroutine hausman(title,olscoef,varcov1,ivcoef,ivcovar,
                   hausmant,h_sig,iprint);
/;
/; Hausman (1978) Test if IV Estimation is needed
/;
/; title    => Supply a title
/; olscoef  => Usually set as %olscoef from ls2 routine
/; varcov1  => Usually set as %varcov1 from ls2 routine
/; ivcoef   => Usually set as %ls2coef from ls2 routine
/;             or %coefgmm from gmmest routine
/; ivcovar  => Usually set as %covar_L or %covar_S from ls2
/;             or %covar_g from gmmest
/; hausmant => Hausman test
/; h_sig    => Significance of Hausman test
/; iprint   => NE 0 => print, =2 print internal steps
/;
/; Logic of test is
/; "Cameron-Trivedi Microeconometrics: Methods and Applications"
/; Cambridge (2005, 272) equation 8.37
/;
/; Very Preliminary Version 5 August 2011
/; ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
/; Not to be used until this message is removed
d        = vfam(ivcoef-olscoef);
workm    = (mfam(ivcovar)-mfam(varcov1));
n_end    = rank(workm);
invdif   = pinv(workm);
hausmant = d*invdif*d;
h_sig    = chisqprob(dabs(hausmant),dfloat(n_end));
if(iprint.ne.0)then;
call print(' ':);
call print(title:);
call print('Hausman (1978) M test statistic   ',hausmant:);
call print('Rank of (ivcoef-varcov1)          ',n_end:);
call print('Significance of Hausman Test      ',h_sig:);
if(iprint.gt.1)then;
call print('Coefficient Difference Vector',d);
call print('OLS Var_Covar                ',varcov1);
call print('IV Var-Covar                 ',ivcovar);
call print('Generalized Inverse of difference',invdif);
r_cond = rcond(invdif);
call print('rcond                        ',r_cond:);
endif;
call print(' ':);
endif;
return;
end;
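The contrast the hausman subroutine computes, H = d'(V_IV - V_OLS)⁺d with d the coefficient difference, a generalized inverse, and degrees of freedom equal to the rank of the covariance difference (Cameron-Trivedi 2005, eq. 8.37), can be sketched in Python as follows. This is a hypothetical translation, not B34S code, and the toy numbers are invented for illustration:

```python
import numpy as np

def hausman_test(b_ols, v_ols, b_iv, v_iv):
    # d = difference of coefficient vectors; diff = difference of covariances.
    d = np.asarray(b_iv) - np.asarray(b_ols)
    diff = np.asarray(v_iv) - np.asarray(v_ols)
    # Degrees of freedom = rank of the covariance difference; the generalized
    # (Moore-Penrose) inverse handles a singular difference matrix.
    df = np.linalg.matrix_rank(diff)
    stat = d @ np.linalg.pinv(diff) @ d
    return stat, df

# Toy numbers: only the first coefficient moves between OLS and IV.
b_ols = np.array([1.0, 2.0])
b_iv  = np.array([1.1, 2.0])
v_ols = 0.01 * np.eye(2)
v_iv  = 0.02 * np.eye(2)
stat, df = hausman_test(b_ols, v_ols, b_iv, v_iv)
print(stat, df)   # H = 0.1**2 / 0.01 = 1.0 with 2 degrees of freedom
```

A significant statistic (compared against a chi-square with df degrees of freedom) indicates the OLS and IV coefficients differ by more than sampling error, so IV estimation is needed.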
Table 4.12 shows the setup for various Hausman tests performed on the Griliches data.
Table 4.12 Various Hausman Tests
%b34slet dob34s =1;
%b34slet dosas  =0;
%b34slet dostata=1;
b34sexec options noheader; b34srun;
b34sexec options ginclude('micro.mac') member(griliches76); b34srun;
%b34sif(&dob34s.ne.0)%then;
b34sexec matrix;
call loaddata;
call load(ls2);
call echooff;
call character(lhs,'lw');
call character(endvar, 'iq');
call character(endvar2,'iq s');
call character(rhs,'iq s expr tenure rns smsa iyear_67 iyear_68
iyear_69 iyear_70 iyear_71 iyear_73 constant');
call character(ivar,'s expr tenure rns smsa iyear_67 iyear_68
iyear_69 iyear_70 iyear_71 iyear_73 constant med kww age mrt');
call character(ivar2,'expr tenure rns smsa iyear_67 iyear_68
iyear_69 iyear_70 iyear_71 iyear_73 constant med kww age mrt');
call olsq(argument(lhs) argument(rhs) :noint :print :savex);
call print(' ':);
call print('Baum (2006) page 193':);
call print(' ':);
call print(lhs,rhs,ivar,endvar);
call ls2(%y,%x,catcol(argument(ivar)),%names,%yvar,1);
* Hausman test ;
call hausman('2SLS Model large sample covar - Testing coef 1',
%olscoef(1),submatrix(%varcov1,1,1,1,1),
%ls2coef(1),submatrix(%covar_l,1,1,1,1),h,sig_h,1);
call hausman('2SLS Model small sample covar - Testing coef 1',
%olscoef(1),submatrix(%varcov1,1,1,1,1),
%ls2coef(1),submatrix(%covar_s,1,1,1,1),h,sig_h,1);
call print('Baum (2006) page 198':);
call gmmest(%y,%x,%z,%names,%yvar,j_stat,sigma,1);
* Do C test to see if S is a good instrument;
* s is removed from ivar to ivar2 ;
call olsq(argument(lhs) argument(rhs) :noint :print :savex);
call print(' ':);
call print('Now there are 2 endogenous on the right':);
call print(lhs,rhs,ivar2,endvar2);
call ls2(%y,%x,catcol(argument(ivar2)),%names,%yvar,1);
jj=integers(1,2);
call hausman('2SLS Model large sample covar - Testing coef 1-2',
%olscoef(jj),submatrix(%varcov1,1,2,1,2),
%ls2coef(jj),submatrix(%covar_l,1,2,1,2),h,sig_h,2);
jj=integers(1,2);
call hausman('2SLS Model small sample covar - Testing coef 1-2',
%olscoef(jj),submatrix(%varcov1,1,2,1,2),
%ls2coef(jj),submatrix(%covar_s,1,2,1,2),h,sig_h,2);
call gmmest(%y,%x,%z,%names,%yvar,j_stat,sigma,1);
b34srun;
%b34sendif;
%b34sif(&dostata.ne.0)%then;
b34sexec options open('statdata.do') unit(28) disp=unknown$ b34srun$
b34sexec options clean(28)$ b34srun$
b34sexec options open('stata.do') unit(29) disp=unknown$ b34srun$
b34sexec options clean(29)$ b34srun$
b34sexec pgmcall idata=28 icntrl=29$
stata$
* for detail on stata commands see Baum page 205 ;
pgmcards$
* uncomment if do not use /e
* log using stata.log, text
global xlist s expr tenure rns smsa iyear_67 ///
iyear_68 iyear_69 iyear_70 iyear_71 iyear_73
global xlist2 expr tenure rns smsa iyear_67 ///
iyear_68 iyear_69 iyear_70 iyear_71 iyear_73
ivregress 2sls lw $xlist (iq=med kww age mrt)
estat endogenous
estat overid
ivreg2 lw $xlist (iq=med kww age mrt), gmm2 robust
* s is now endogenous
ivregress 2sls lw $xlist2 (s iq=med kww age mrt)
estat endogenous
estat overid
ivreg2 lw $xlist2 (s iq=med kww age mrt), gmm2 robust
b34sreturn$
b34seend$
b34sexec options close(28); b34srun;
b34sexec options close(29); b34srun;
b34sexec options
dounix('stata -b do stata.do ')
dodos('stata /e do stata.do');
b34srun;
b34sexec options npageout
writeout('output from stata',' ',' ')
copyfout('stata.log')
dodos('erase stata.do',
'erase stata.log',
'erase statdata.do') $
b34srun$
%b34sendif;
%b34sif(&dosas.ne.0)%then;
B34SEXEC OPTIONS OPEN('testsas.sas') UNIT(29) DISP=UNKNOWN$ B34SRUN$
B34SEXEC OPTIONS CLEAN(29) $ B34SEEND$
B34SEXEC PGMCALL IDATA=29 ICNTRL=29$
SAS$
PGMCARDS$
proc means; run;
proc model ;
endogenous iq;
lw=
ciq*iq + cs*s + cexpr*expr+ ctenture*tenure+ crns*rns
+ csmsa*smsa+ ciyear_67*iyear_67+ ciyear_68*iyear_68
+ ciyear_69*iyear_69 + ciyear_70*iyear_70+
+ ciyear_71*iyear_71 + ciyear_73*iyear_73 + interc;
fit lw / ols 2sls hausman;
instruments s expr tenure rns smsa iyear_67
iyear_68 iyear_69 iyear_70 iyear_71 iyear_73
med kww age mrt;
run;
proc model ;
endogenous iq s;
lw=
ciq*iq + cs*s + cexpr*expr+ ctenture*tenure+ crns*rns
+ csmsa*smsa+ ciyear_67*iyear_67+ ciyear_68*iyear_68
+ ciyear_69*iyear_69 + ciyear_70*iyear_70+
+ ciyear_71*iyear_71 + ciyear_73*iyear_73 + interc;
fit lw / ols 2sls hausman;
instruments expr tenure rns smsa iyear_67
iyear_68 iyear_69 iyear_70 iyear_71 iyear_73
med kww age mrt;
run;
B34SRETURN$
B34SRUN $
B34SEXEC OPTIONS CLOSE(29)$ B34SRUN$
/$ The next card has to be modified to point to SAS location
/$ Be sure and wait until SAS gets done before letting B34S resume
B34SEXEC OPTIONS dodos('start /w /r sas testsas')
dounix('sas testsas')$
B34SRUN$
B34SEXEC OPTIONS NPAGEOUT NOHEADER
   WRITEOUT(' ','Output from SAS',' ',' ')
   WRITELOG(' ','Output from SAS',' ',' ')
   COPYFOUT('testsas.lst')
   COPYFLOG('testsas.log')
   dodos('erase testsas.sas','erase testsas.lst','erase testsas.log')
   dounix('rm testsas.sas','rm testsas.lst','rm testsas.log')$
B34SRUN$
%b34sendif;
Edited output from running the code in Table 4.12 is shown below.
Baum (2006) page 193

LHS      = LW
RHS      = IQ S EXPR TENURE RNS SMSA IYEAR_67 IYEAR_68 IYEAR_69 IYEAR_70
           IYEAR_71 IYEAR_73 CONSTANT
IVAR     = s expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70
           iyear_71 iyear_73 constant med kww age mrt
ENDVAR   = iq

CAN_CORR = Vector of 13 elements
 0.691766E-01   1.00000   1.00000   1.00000   1.00000   1.00000   1.00000
  1.00000   1.00000   1.00000   1.00000   1.00000   1.00000
OLS and LS2 Estimation
Dependent Variable                        LW
OLS Sum of squared Residuals              79.3733887898386
LS2 Sum of squared Residuals              80.0182337003248
Large Sample ls2 sigma                    0.105564952111246
Small Sample ls2 sigma                    0.107407025101107
Rank of Equation                          13
Order of Equation                         16
Equation is overidentified
Anderson LR ident./IV Relevance test      54.3377701150767
Significance of Anderson LR Statistic     0.999999999955283
Anderson Canon Correlation LM test        52.4358658675750
Significance of Anderson LM Statistic     0.999999999888172
Cragg-Donald Chi-Square Weak ID Test      56.3327760083706
Significance of Cragg-Donald test         0.999999999982924
Basmann                                   97.0249713169360
Significance of Basmann Statistic         1.00000000000000
Sargan N*R-sq / J-Test Test               87.6552316944768
Significance of Sargan Statistic          1.00000000000000

Hausman (1978) test - Sig. => need LS2
All coef. tested with Full (small) Covar. Matrix
Hausman (1978) M test statistic           0.445917496047955
Rank of (ivcoef-varcov1)                  13
Significance of Hausman Test              2.555970727478295E-008
All coef. tested with Full (large) Covar. Matrix
Hausman (1978) M test statistic           0.454282631283137
Rank of (ivcoef-varcov1)                  13
Significance of Hausman Test              2.873775182323094E-008
All coef. tested with diag (small) Covar. Matrix
Hausman (1978) M test statistic           4.31737789543287
Rank of (ivcoef-varcov1)                  13
Significance of Hausman Test              1.267697953382511E-002
All coef. tested with diag (large) Covar. Matrix
Hausman (1978) M test statistic           8.48598394209793
Rank of (ivcoef-varcov1)                  13
Significance of Hausman Test              0.189437061692008
+++++++++++++++++++++++++++++++++++++++++++++++++++++
Obs  VAR_NAME      %OLSCOEF      %OLS_SE      %OLS_T
  1  IQ          0.2712E-02   0.1031E-02       2.630
  2  S           0.6195E-01   0.7279E-02       8.512
  3  EXPR        0.3084E-01   0.6510E-02       4.737
  4  TENURE      0.4216E-01   0.7481E-02       5.636
  5  RNS        -0.9629E-01   0.2755E-01      -3.496
  6  SMSA            0.1329   0.2658E-01       5.001
  7  IYEAR_67   -0.5421E-01   0.4785E-01      -1.133
  8  IYEAR_68    0.8058E-01   0.4490E-01       1.795
  9  IYEAR_69        0.2076   0.4386E-01       4.733
 10  IYEAR_70        0.2282   0.4880E-01       4.677
 11  IYEAR_71        0.2227   0.4310E-01       5.167
 12  IYEAR_73        0.3229   0.4066E-01       7.941
 13  CONSTANT         4.235       0.1133       37.37

Obs  VAR_NAME      %LS2COEF     %LS2_SES     %LS2_SEL    %LS2_T_S    %LS2_T_L
  1  IQ          0.1747E-03   0.3937E-02   0.3903E-02  0.4436E-01  0.4474E-01
  2  S           0.6918E-01   0.1305E-01   0.1294E-01       5.301       5.347
  3  EXPR        0.2987E-01   0.6697E-02   0.6639E-02       4.460       4.498
  4  TENURE      0.4327E-01   0.7693E-02   0.7627E-02       5.625       5.674
  5  RNS            -0.1036   0.2974E-01   0.2948E-01      -3.484      -3.514
  6  SMSA            0.1351   0.2689E-01   0.2666E-01       5.025       5.069
  7  IYEAR_67   -0.5260E-01   0.4811E-01   0.4769E-01      -1.093      -1.103
  8  IYEAR_68    0.7947E-01   0.4511E-01   0.4472E-01       1.762       1.777
  9  IYEAR_69        0.2109   0.4432E-01   0.4393E-01       4.759       4.800
 10  IYEAR_70        0.2386   0.5142E-01   0.5097E-01       4.641       4.682
 11  IYEAR_71        0.2285   0.4412E-01   0.4374E-01       5.178       5.223
 12  IYEAR_73        0.3259   0.4107E-01   0.4072E-01       7.935       8.004
 13  CONSTANT         4.400       0.2709       0.2685       16.24       16.38

2SLS Model large sample covar - Testing coef 1
Hausman (1978) M test statistic           0.454282631270989
Rank of (ivcoef-varcov1)                  1
Significance of Hausman Test              0.499691813837660

2SLS Model small sample covar - Testing coef 1
Hausman (1978) M test statistic           0.445917496059109
Rank of (ivcoef-varcov1)                  1
Significance of Hausman Test              0.495719926530397
Baum (2006) page 198
GMM Estimates
Dependent Variable                        LW
OLS sum of squares                        79.3733887898386
LS2 sum of squares                        80.0182337003248
GMM sum of squares                        81.2621788722545
Rank of Equation                          13
Order of Equation                         16
Equation is overidentified
Anderson ident./IV Relevance test         54.3377701150767
Significance of Anderson Statistic        0.999999999955283
Anderson Canon Correlation LM test        52.4358658675750
Significance of Anderson LM Statistic     0.999999999888172
Cragg-Donald Chi-Square Weak ID Test      56.3327760083706
Significance of Cragg-Donald test         0.999999999982924
Hansen J_stat Ident. of instruments       74.1648776242674
Significance of Hansen J_stat             0.999999999999999

Hausman (1978) test - Sig. => Need GMM
Hausman (1978) M test statistic           15.0455541849668
Rank of (ivcoef-varcov1)                  13
Significance of Hausman Test              0.695482435419703
+++++++++++++++++++++++++++++++++++++++++++++++++++++
Obs  NAMES         %COEFGMM       %SEGMM       %TGMM
  1  IQ         -0.1401E-02   0.4113E-02     -0.3407
  2  S           0.7684E-01   0.1319E-01       5.827
  3  EXPR        0.3123E-01   0.6693E-02       4.667
  4  TENURE      0.4900E-01   0.7344E-02       6.672
  5  RNS            -0.1007   0.2959E-01      -3.403
  6  SMSA            0.1336   0.2632E-01       5.075
  7  IYEAR_67   -0.2101E-01   0.4554E-01     -0.4614
  8  IYEAR_68    0.8910E-01   0.4270E-01       2.087
  9  IYEAR_69        0.2072   0.4080E-01       5.080
 10  IYEAR_70        0.2338   0.5285E-01       4.424
 11  IYEAR_71        0.2346   0.4257E-01       5.510
 12  IYEAR_73        0.3360   0.4041E-01       8.315
 13  CONSTANT         4.437       0.2900       15.30
Note that the results from testing all coefficients are very similar to those from testing only the
IQ coefficient. Although there was a substantial change in this coefficient from the OLS value of
.002712 with t = 2.630 to the LS2 value of .0001747 with a small-sample t of .04436, due to the
high covariance of the coefficients the change was not significant. If the covariance of the
coefficients is assumed to be 0.0, the Hausman statistic rises to 8.49 with significance of .999.
The Hausman tests reported by Stata for this problem are Durbin = .457658 with p = .4987 and
Wu-Hausman = .449477 with p = .5028. In the next problem both S and IQ are assumed to be
endogenous.
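The single-coefficient contrast just described can be verified by hand. The sketch below uses only the rounded coefficient and standard-error values quoted above; it is a check on the arithmetic, not a re-estimation from the data.

```python
# Values as printed in the output above (not recomputed from the data):
b_ols, se_ols = 0.002712, 0.001031     # OLS coefficient and SE for IQ
b_iv,  se_iv  = 0.0001747, 0.003937    # 2SLS coefficient and small-sample SE

# For a single coefficient the Hausman contrast reduces to a scalar:
# under the null of exogeneity Var(b_iv - b_ols) = Var(b_iv) - Var(b_ols).
m = (b_iv - b_ols) ** 2 / (se_iv ** 2 - se_ols ** 2)
print(round(m, 3))   # 0.446, matching the reported 0.445917 (chi-square, 1 df)
```

The statistic is small because the large drop in the IQ coefficient is dominated by the much larger 2SLS variance.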
Now there are 2 endogenous on the right

LHS      = LW
RHS      = IQ S EXPR TENURE RNS SMSA IYEAR_67 IYEAR_68 IYEAR_69 IYEAR_70
           IYEAR_71 IYEAR_73 CONSTANT
IVAR2    = expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70
           iyear_71 iyear_73 constant med kww age mrt
ENDVAR2  = iq s

CAN_CORR = Vector of 13 elements
 0.632956E-01  0.363635   1.00000   1.00000   1.00000   1.00000   1.00000
  1.00000   1.00000   1.00000   1.00000   1.00000   1.00000
OLS and LS2 Estimation
Dependent Variable                        LW
OLS Sum of squared Residuals              79.3733887898386
LS2 Sum of squared Residuals              107.531341127675
Large Sample ls2 sigma                    0.141861927609070
Small Sample ls2 sigma                    0.144337370641175
Rank of Equation                          13
Order of Equation                         15
Equation is overidentified
Anderson LR ident./IV Relevance test      -343.395953003074
Anderson Canon Correlation LM test        47.9780438223675
Significance of Anderson LM Statistic     0.999999999784662
Cragg-Donald Chi-Square Weak ID Test      51.2200459449682
Significance of Cragg-Donald test         0.999999999956070
Basmann                                   13.2374795188268
Significance of Basmann Statistic         0.998664887551261
Sargan N*R-sq / J-Test Test               13.2683313734400
Significance of Sargan Statistic          0.998685324861342

Hausman (1978) test - Sig. => need LS2
All coef. tested with Full (small) Covar. Matrix
Hausman (1978) M test statistic           45.6187185063307
Rank of (ivcoef-varcov1)                  13
Significance of Hausman Test              0.999983506422730
All coef. tested with Full (large) Covar. Matrix
Hausman (1978) M test statistic           46.6688127514076
Rank of (ivcoef-varcov1)                  13
Significance of Hausman Test              0.999989011736172
All coef. tested with diag (small) Covar. Matrix
Hausman (1978) M test statistic           106.728898645404
Rank of (ivcoef-varcov1)                  13
Significance of Hausman Test              1.00000000000000
All coef. tested with diag (large) Covar. Matrix
Hausman (1978) M test statistic           110.450214212224
Rank of (ivcoef-varcov1)                  13
Significance of Hausman Test              1.00000000000000
+++++++++++++++++++++++++++++++++++++++++++++++++++++
Obs  VAR_NAME      %OLSCOEF      %OLS_SE      %OLS_T
  1  IQ          0.2712E-02   0.1031E-02       2.630
  2  S           0.6195E-01   0.7279E-02       8.512
  3  EXPR        0.3084E-01   0.6510E-02       4.737
  4  TENURE      0.4216E-01   0.7481E-02       5.636
  5  RNS        -0.9629E-01   0.2755E-01      -3.496
  6  SMSA            0.1329   0.2658E-01       5.001
  7  IYEAR_67   -0.5421E-01   0.4785E-01      -1.133
  8  IYEAR_68    0.8058E-01   0.4490E-01       1.795
  9  IYEAR_69        0.2076   0.4386E-01       4.733
 10  IYEAR_70        0.2282   0.4880E-01       4.677
 11  IYEAR_71        0.2227   0.4310E-01       5.167
 12  IYEAR_73        0.3229   0.4066E-01       7.941
 13  CONSTANT         4.235       0.1133       37.37

Obs  VAR_NAME      %LS2COEF     %LS2_SES     %LS2_SEL    %LS2_T_S    %LS2_T_L
  1  IQ         -0.9099E-02   0.4745E-02   0.4704E-02      -1.917      -1.934
  2  S               0.1724   0.2092E-01   0.2074E-01       8.243       8.314
  3  EXPR        0.4929E-01   0.8225E-02   0.8155E-02       5.992       6.044
  4  TENURE      0.4222E-01   0.8920E-02   0.8843E-02       4.733       4.774
  5  RNS            -0.1018   0.3447E-01   0.3418E-01      -2.953      -2.978
  6  SMSA            0.1261   0.3120E-01   0.3093E-01       4.043       4.078
  7  IYEAR_67   -0.5962E-01   0.5578E-01   0.5530E-01      -1.069      -1.078
  8  IYEAR_68    0.4868E-01   0.5247E-01   0.5202E-01      0.9278      0.9359
  9  IYEAR_69        0.1528   0.5201E-01   0.5156E-01       2.938       2.964
 10  IYEAR_70        0.1744   0.6028E-01   0.5976E-01       2.894       2.919
 11  IYEAR_71    0.9167E-01   0.5461E-01   0.5414E-01       1.678       1.693
 12  IYEAR_73    0.9324E-01   0.5768E-01   0.5718E-01       1.617       1.631
 13  CONSTANT         4.034       0.3182       0.3154       12.68       12.79

2SLS Model large sample covar - Testing coef 1-2
Hausman (1978) M test statistic           46.6688127514087
Rank of (ivcoef-varcov1)                  2
Significance of Hausman Test              0.999999999926549

Coefficient Difference Vector
D        = Vector of 2 elements
  -0.118110E-01    0.110471

OLS Var_Covar
VARCOV1  = Matrix of 2 by 2 elements
   0.106381E-05   -0.302739E-05
  -0.302739E-05    0.529777E-04

IV Var-Covar
IVCOVAR  = Matrix of 2 by 2 elements
   0.221314E-04   -0.766991E-04
  -0.766991E-04    0.430068E-03

Generalized Inverse of difference
INVDIF   = Matrix of 2 by 2 elements
   149826.       29271.3
   29271.3       8370.60
rcond    1.435620745007835E-002

2SLS Model small sample covar - Testing coef 1-2
Hausman (1978) M test statistic           45.6187185063286
Rank of (ivcoef-varcov1)                  2
Significance of Hausman Test              0.999999999875829

Coefficient Difference Vector
D        = Vector of 2 elements
  -0.118110E-01    0.110471

OLS Var_Covar
VARCOV1  = Matrix of 2 by 2 elements
   0.106381E-05   -0.302739E-05
  -0.302739E-05    0.529777E-04

IV Var-Covar
IVCOVAR  = Matrix of 2 by 2 elements
   0.225176E-04   -0.780375E-04
  -0.780375E-04    0.437572E-03

Generalized Inverse of difference
INVDIF   = Matrix of 2 by 2 elements
   146541.       28580.8
   28580.8       8174.45
rcond    1.439786477313241E-002
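The two-coefficient M statistic can be reproduced from the intermediate quantities printed in the output: the difference vector D, the OLS covariance block VARCOV1, and the 2SLS covariance block IVCOVAR. The sketch below uses the small-sample values as rounded in the listing, so the result matches the reported 45.6187 to within rounding.

```python
import numpy as np

# Values as printed above for the small-sample case (coef 1-2: IQ and S):
d = np.array([-0.0118110, 0.110471])                # 2SLS minus OLS coefficients
varcov1 = np.array([[0.106381e-5, -0.302739e-5],
                    [-0.302739e-5, 0.529777e-4]])   # OLS var-covar block
ivcovar = np.array([[0.225176e-4, -0.780375e-4],
                    [-0.780375e-4, 0.437572e-3]])   # 2SLS small-sample block

# Generalized inverse of the covariance difference, then M = D' (V_iv - V_ols)^- D
invdif = np.linalg.pinv(ivcovar - varcov1)
m = d @ invdif @ d
print(round(m, 1))   # ~45.6, matching the reported 45.6187
```

Replacing IVCOVAR with the large-sample block (0.221314E-04, -0.766991E-04, 0.430068E-03) reproduces the 46.6688 value in the same way.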
GMM Estimates
Dependent Variable                        LW
OLS sum of squares                        79.3733887898386
LS2 sum of squares                        107.531341127675
GMM sum of squares                        109.084651127200
Rank of Equation                          13
Order of Equation                         15
Equation is overidentified
Anderson ident./IV Relevance test         -343.395953003074
Anderson Canon Correlation LM test        47.9780438223675
Significance of Anderson LM Statistic     0.999999999784662
Cragg-Donald Chi-Square Weak ID Test      51.2200459449682
Significance of Cragg-Donald test         0.999999999956070
Hansen J_stat Ident. of instruments       11.6014813674647
Significance of Hansen J_stat             0.996974686884900

Hausman (1978) test - Sig. => Need GMM
Hausman (1978) M test statistic           53.4818275385920
Rank of (ivcoef-varcov1)                  13
Significance of Hausman Test              0.999999255129040
+++++++++++++++++++++++++++++++++++++++++++++++++++++
Obs  NAMES         %COEFGMM       %SEGMM       %TGMM
  1  IQ         -0.9286E-02   0.4882E-02      -1.902
  2  S               0.1758   0.2068E-01       8.502
  3  EXPR        0.5028E-01   0.8044E-02       6.251
  4  TENURE      0.4252E-01   0.9455E-02       4.497
  5  RNS            -0.1041   0.3352E-01      -3.105
  6  SMSA            0.1248   0.3077E-01       4.054
  7  IYEAR_67   -0.5304E-01   0.5146E-01      -1.031
  8  IYEAR_68    0.4595E-01   0.4957E-01      0.9270
  9  IYEAR_69        0.1555   0.4763E-01       3.264
 10  IYEAR_70        0.1670   0.6100E-01       2.737
 11  IYEAR_71    0.8465E-01   0.5540E-01       1.528
 12  IYEAR_73    0.9961E-01   0.6070E-01       1.641
 13  CONSTANT         4.004       0.3348       11.96
Tests are made on all coefficients, on all coefficients using a diagonal covariance matrix, and on
just the two endogenous variables. The significant Hausman test statistics indicate that IV
methods are needed. Internal calculations for the two-coefficient case are displayed. A
significant finding for the Hausman test is consistent with what Stata finds, but the exact
statistics do not match, apparently due to the variant of the test implemented; Stata appears to be
testing all coefficients. Note that the LS2 and GMM coefficients match 100%. To further
illustrate the various Hausman test values, SAS was employed.
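The two-step GMM estimator used above can be sketched in a few lines. The example below is a minimal illustration on simulated data (the data-generating process and variable names are invented for the sketch, not taken from the Griliches dataset): step one is the IV/2SLS fit, step two reweights the moment conditions with a heteroskedasticity-robust weight matrix built from the step-one residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)                        # excluded instrument
e = rng.normal(size=n)                        # structural error
x = 0.8 * z + 0.5 * e + rng.normal(size=n)    # endogenous regressor (correlated with e)
y = 1.0 + 2.0 * x + e                         # structural equation
X = np.column_stack([np.ones(n), x])          # regressors (k = 2)
Z = np.column_stack([np.ones(n), z])          # instruments (L = 2: exactly identified)

# Step 1: the IV/2SLS estimator b1 = (Z'X)^{-1} Z'y (valid when L = k)
b1 = np.linalg.solve(Z.T @ X, Z.T @ y)
u = y - X @ b1

# Step 2: efficient weight W = (Z' diag(u^2) Z)^{-1} from step-1 residuals,
# then b2 = (X'Z W Z'X)^{-1} X'Z W Z'y
W = np.linalg.inv((Z * (u ** 2)[:, None]).T @ Z)
b2 = np.linalg.solve(X.T @ Z @ W @ Z.T @ X, X.T @ Z @ W @ Z.T @ y)

# In the exactly identified case the weight matrix drops out, so b1 == b2;
# with an overidentified Z the two steps generally differ, as in the output above.
print(np.allclose(b1, b2))   # True
```

This also explains why 2SLS and GMM agree exactly only in the just-identified case, while in the overidentified Griliches problem the coefficient vectors are close but not identical.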
output from stata

Stata 12.0   Statistics/Data Analysis
Copyright 1985-2011 StataCorp LP
StataCorp, 4905 Lakeway Drive, College Station, Texas 77845 USA
Single-user Stata perpetual license, serial number 3012042652
Licensed to: Houston H. Stokes, U of Illinois
Notes: 1. Stata running in batch mode

. do stata.do
. * File built by B34S on 5/ 8/11 at 19:55:28
. run statdata.do
. * uncomment if do not use /e
. * log using stata.log, text
. global xlist s expr tenure rns smsa iyear_67 ///
>
iyear_68 iyear_69 iyear_70 iyear_71 iyear_73
. global xlist2 expr tenure rns smsa iyear_67 ///
>
iyear_68 iyear_69 iyear_70 iyear_71 iyear_73
. ivregress 2sls lw $xlist (iq=med kww age mrt)

Instrumental variables (2SLS) regression          Number of obs   =       758
                                                  Wald chi2(12)   =    560.57
                                                  Prob > chi2     =    0.0000
                                                  R-squared       =    0.4255
                                                  Root MSE        =    .32491

------------------------------------------------------------------------------
          lw |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          iq |   .0001747   .0039035     0.04   0.964     -.007476    .0078253
           s |   .0691759   .0129366     5.35   0.000     .0438206    .0945312
        expr |    .029866   .0066393     4.50   0.000     .0168533    .0428788
      tenure |   .0432738   .0076271     5.67   0.000     .0283249    .0582226
         rns |  -.1035897    .029481    -3.51   0.000    -.1613715   -.0458079
        smsa |   .1351148   .0266573     5.07   0.000     .0828674    .1873623
    iyear_67 |   -.052598   .0476924    -1.10   0.270    -.1460734    .0408774
    iyear_68 |   .0794686   .0447194     1.78   0.076    -.0081797    .1671169
    iyear_69 |   .2108962   .0439336     4.80   0.000     .1247878    .2970045
    iyear_70 |   .2386338   .0509733     4.68   0.000     .1387281    .3385396
    iyear_71 |   .2284609   .0437436     5.22   0.000     .1427251    .3141967
    iyear_73 |   .3258944   .0407181     8.00   0.000     .2460884    .4057004
       _cons |    4.39955   .2685443    16.38   0.000     3.873213    4.925887
------------------------------------------------------------------------------
Instrumented:  iq
Instruments:   s expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70
               iyear_71 iyear_73 med kww age mrt
. estat endogenous

  Tests of endogeneity
  Ho: variables are exogenous

  Durbin (score) chi2(1)          =  .457658  (p = 0.4987)
  Wu-Hausman F(1,744)             =  .449477  (p = 0.5028)

. estat overid

  Tests of overidentifying restrictions:

  Sargan (score) chi2(3) =  87.6552  (p = 0.0000)
  Basmann chi2(3)        =   97.025  (p = 0.0000)

. ivreg2 lw $xlist (iq=med kww age mrt), gmm2 robust
2-Step GMM estimation
---------------------

Estimates efficient for arbitrary heteroskedasticity
Statistics robust to heteroskedasticity

Total (centered) SS     =  139.2861498            Number of obs   =       758
Total (uncentered) SS   =  24652.24662            F( 12,   745)   =     49.67
Residual SS             =  81.26217887            Prob > F        =    0.0000
                                                  Centered R2     =    0.4166
                                                  Uncentered R2   =    0.9967
                                                  Root MSE        =     .3274

------------------------------------------------------------------------------
             |               Robust
          lw |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          iq |  -.0014014   .0041131    -0.34   0.733     -.009463    .0066602
           s |   .0768355   .0131859     5.83   0.000     .0509915    .1026794
        expr |   .0312339   .0066931     4.67   0.000     .0181157    .0443522
      tenure |   .0489998   .0073437     6.67   0.000     .0346064    .0633931
         rns |  -.1006811   .0295887    -3.40   0.001    -.1586738   -.0426884
        smsa |   .1335973   .0263245     5.08   0.000     .0820021    .1851925
    iyear_67 |  -.0210135   .0455433    -0.46   0.645    -.1102768    .0682498
    iyear_68 |   .0890993    .042702     2.09   0.037     .0054049    .1727937
    iyear_69 |   .2072484   .0407995     5.08   0.000     .1272828     .287214
    iyear_70 |   .2338308   .0528512     4.42   0.000     .1302445    .3374172
    iyear_71 |   .2345525   .0425661     5.51   0.000     .1511244    .3179805
    iyear_73 |   .3360267   .0404103     8.32   0.000     .2568239    .4152295
       _cons |   4.436784   .2899504    15.30   0.000     3.868492    5.005077
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):           41.537
                                                   Chi-sq(4) P-val =  0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):             13.786
                         (Kleibergen-Paap rk Wald F statistic):       12.167
Stock-Yogo weak ID test critical values:  5% maximal IV relative bias   16.85
                                         10% maximal IV relative bias   10.27
                                         20% maximal IV relative bias    6.71
                                         30% maximal IV relative bias    5.34
                                         10% maximal IV size             24.58
                                         15% maximal IV size             13.96
                                         20% maximal IV size             10.26
                                         25% maximal IV size              8.31
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):      74.165
                                                   Chi-sq(3) P-val =  0.0000
------------------------------------------------------------------------------
Instrumented:         iq
Included instruments: s expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70
                      iyear_71 iyear_73
Excluded instruments: med kww age mrt
------------------------------------------------------------------------------

. * s is now endogenous
. ivregress 2sls lw $xlist2 (s iq=med kww age mrt)

Instrumental variables (2SLS) regression          Number of obs   =       758
                                                  Wald chi2(12)   =    459.55
                                                  Prob > chi2     =    0.0000
                                                  R-squared       =    0.2280
                                                  Root MSE        =    .37665

------------------------------------------------------------------------------
          lw |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           s |   .1724253   .0207381     8.31   0.000     .1317794    .2130712
          iq |  -.0090988   .0047044    -1.93   0.053    -.0183193    .0001216
        expr |   .0492895   .0081546     6.04   0.000     .0333068    .0652722
      tenure |   .0422171   .0088429     4.77   0.000     .0248854    .0595488
         rns |  -.1017935   .0341765    -2.98   0.003    -.1687781   -.0348088
        smsa |   .1261109   .0309275     4.08   0.000     .0654942    .1867277
    iyear_67 |  -.0596171   .0552955    -1.08   0.281    -.1679942      .04876
    iyear_68 |   .0486796   .0520161     0.94   0.349    -.0532701    .1506292
    iyear_69 |   .1528176    .051563     2.96   0.003      .051756    .2538792
    iyear_70 |   .1744361   .0597576     2.92   0.004     .0573133    .2915588
    iyear_71 |    .091666    .054144     1.69   0.090    -.0144543    .1977863
    iyear_73 |   .0932398   .0571819     1.63   0.103    -.0188347    .2053142
       _cons |    4.03351   .3154215    12.79   0.000     3.415295    4.651725
------------------------------------------------------------------------------
Instrumented:  s iq
Instruments:   expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70
               iyear_71 iyear_73 med kww age mrt
. estat endogenous

  Tests of endogeneity
  Ho: variables are exogenous

  Durbin (score) chi2(2)          =  70.8497  (p = 0.0000)
  Wu-Hausman F(2,743)             =  38.3041  (p = 0.0000)

. estat overid

  Tests of overidentifying restrictions:

  Sargan (score) chi2(2) =  13.2683  (p = 0.0013)
  Basmann chi2(2)        =  13.2375  (p = 0.0013)
. ivreg2 lw $xlist2 (s iq=med kww age mrt), gmm2 robust

2-Step GMM estimation
---------------------

Estimates efficient for arbitrary heteroskedasticity
Statistics robust to heteroskedasticity

Total (centered) SS     =  139.2861498            Number of obs   =       758
Total (uncentered) SS   =  24652.24662            F( 12,   745)   =     41.98
Residual SS             =  109.0846511            Prob > F        =    0.0000
                                                  Centered R2     =    0.2168
                                                  Uncentered R2   =    0.9956
                                                  Root MSE        =     .3794

------------------------------------------------------------------------------
             |               Robust
          lw |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           s |   .1757958   .0206766     8.50   0.000     .1352703    .2163212
          iq |  -.0092862   .0048824    -1.90   0.057    -.0188555    .0002832
        expr |   .0502828   .0080438     6.25   0.000     .0345171    .0660484
      tenure |   .0425214   .0094549     4.50   0.000     .0239901    .0610526
         rns |  -.1040931   .0335239    -3.11   0.002    -.1697986   -.0383875
        smsa |   .1247512   .0307747     4.05   0.000     .0644338    .1850686
    iyear_67 |  -.0530432   .0514609    -1.03   0.303    -.1539047    .0478184
    iyear_68 |   .0459546   .0495735     0.93   0.354    -.0512077    .1431169
    iyear_69 |   .1554801   .0476311     3.26   0.001     .0621249    .2488352
    iyear_70 |   .1669875   .0610006     2.74   0.006     .0474285    .2865464
    iyear_71 |   .0846485   .0554035     1.53   0.127    -.0239404    .1932373
    iyear_73 |   .0996068   .0607034     1.64   0.101    -.0193696    .2185833
       _cons |   4.003924   .3348423    11.96   0.000     3.347645    4.660203
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):           40.927
                                                   Chi-sq(3) P-val =  0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):             12.552
                         (Kleibergen-Paap rk Wald F statistic):       11.461
Stock-Yogo weak ID test critical values:  5% maximal IV relative bias   11.04
                                         10% maximal IV relative bias    7.56
                                         20% maximal IV relative bias    5.57
                                         30% maximal IV relative bias    4.73
                                         10% maximal IV size             16.87
                                         15% maximal IV size              9.93
                                         20% maximal IV size              7.54
                                         25% maximal IV size              6.28
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):      11.601
                                                   Chi-sq(2) P-val =  0.0030
------------------------------------------------------------------------------
Instrumented:         s iq
Included instruments: expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70
                      iyear_71 iyear_73
Excluded instruments: med kww age mrt
------------------------------------------------------------------------------
.

end of do-file
Output from SAS
These results 100% match B34S and Stata. The Hausman value of .45 also matches. Note that
Stata and SAS use the convention that 1.00 implies no significance.
The MODEL Procedure

Nonlinear 2SLS Summary of Residual Errors

               DF     DF                                            Adj
Equation    Model  Error      SSE     MSE  Root MSE  R-Square     R-Sq  Label
lw             13    745  80.0182  0.1074    0.3277    0.4255   0.4163  log wage

Nonlinear 2SLS Parameter Estimates

                           Approx               Approx
Parameter     Estimate    Std Err   t Value   Pr > |t|
ciq           0.000175    0.00394      0.04     0.9646
cs            0.069176     0.0130      5.30     <.0001
cexpr         0.029866    0.00670      4.46     <.0001
ctenture      0.043274    0.00769      5.62     <.0001
crns          -0.10359     0.0297     -3.48     0.0005
csmsa         0.135115     0.0269      5.02     <.0001
ciyear_67      -0.0526     0.0481     -1.09     0.2746
ciyear_68     0.079469     0.0451      1.76     0.0785
ciyear_69     0.210896     0.0443      4.76     <.0001
ciyear_70     0.238634     0.0514      4.64     <.0001
ciyear_71     0.228461     0.0441      5.18     <.0001
ciyear_73     0.325894     0.0411      7.93     <.0001
interc         4.39955     0.2709     16.24     <.0001

Number of Observations     Statistics for System
Used             758       Objective       0.0122
Missing            0       Objective*N     9.2533

Hausman's Specification Test Results

Efficient    Consistent
 under H0      under H1    DF    Statistic    Pr > ChiSq
      OLS          2SLS    13         0.45        1.0000
The SAS results that have both IQ and S endogenous match Stata and B34S. The Hausman test
of 45.62 matches the B34S value of 45.6187 for the “All coef. tested with Full (small) Covar.
Matrix” case but does not match the Stata value of 70.8497, which appears to be for a test on the
two endogenous coefficients. Note that the B34S Hausman M statistic values for testing the two
endogenous coefficients were 46.6688 and 45.6187, depending on whether the large-sample or
small-sample covariance matrix is used. The exact intermediate values calculated by B34S are
listed in the output to facilitate validation of the calculations.
Nonlinear 2SLS Parameter Estimates

                           Approx               Approx
Parameter     Estimate    Std Err   t Value   Pr > |t|
ciq            -0.0091    0.00475     -1.92     0.0556
cs            0.172425     0.0209      8.24     <.0001
cexpr         0.049289    0.00823      5.99     <.0001
ctenture      0.042217    0.00892      4.73     <.0001
crns          -0.10179     0.0345     -2.95     0.0032
csmsa         0.126111     0.0312      4.04     <.0001
ciyear_67     -0.05962     0.0558     -1.07     0.2855
ciyear_68      0.04868     0.0525      0.93     0.3538
ciyear_69     0.152818     0.0520      2.94     0.0034
ciyear_70     0.174436     0.0603      2.89     0.0039
ciyear_71     0.091666     0.0546      1.68     0.0937
ciyear_73      0.09324     0.0577      1.62     0.1064
interc         4.03351     0.3182     12.68     <.0001

Number of Observations     Statistics for System
Used             758       Objective     0.002483
Missing            0       Objective*N     1.8823

Hausman's Specification Test Results

Efficient    Consistent
 under H0      under H1    DF    Statistic    Pr > ChiSq
      OLS          2SLS    13        45.62        <.0001

B34S normal exit on Date (D:M:Y) 5/ 8/11 at Time (H:M:S) 19:55:40
It is very important to use the Hausman test, as well as other IV tests, to determine whether IV
techniques should be used in place of OLS. The Hausman test should not be used without other
tests. If the instruments are poor, the endogenous variable coefficients will most likely differ
from their OLS values and the Hausman test may falsely indicate that IV is appropriate. If the
Sargan test, or its GMM equivalent the Hansen J test, is significant, this casts doubt on the
suitability of the IV model. A significant Anderson test suggests correlation between the
instruments and the endogenous variables, which, everything else equal, supports using an IV
technique. A significant Basmann test, on the other hand, indicates correlation between the
instruments and the error term; in other words, the instruments are themselves not exogenous.
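The Sargan N*R² statistic discussed above is simple enough to sketch directly: fit 2SLS, regress the 2SLS residuals on the full instrument set, and multiply the resulting R² by the sample size. The example below runs the sketch on simulated data with valid instruments (the data-generating process is invented for illustration); with one endogenous regressor and two excluded instruments the test has one degree of freedom.

```python
import numpy as np

def sargan(y, X, Z):
    """Sargan N*R^2 sketch: regress the 2SLS residuals on the full
    instrument set; under valid instruments the statistic is
    asymptotically chi-square with L - k degrees of freedom."""
    n = len(y)
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first-stage fitted values
    b = np.linalg.lstsq(Xhat, y, rcond=None)[0]       # 2SLS coefficients
    u = y - X @ b                                     # 2SLS residuals
    uhat = Z @ np.linalg.lstsq(Z, u, rcond=None)[0]   # residuals projected on Z
    r2 = 1.0 - ((u - uhat) ** 2).sum() / ((u - u.mean()) ** 2).sum()
    return n * r2, Z.shape[1] - X.shape[1]

# Overidentified example: one endogenous regressor, two valid excluded instruments.
rng = np.random.default_rng(1)
n = 1000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
e = rng.normal(size=n)
x = z1 + 0.5 * z2 + 0.5 * e + rng.normal(size=n)
y = 1.0 + 2.0 * x + e
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z1, z2])
stat, df = sargan(y, X, Z)   # small statistic expected: the instruments are valid
```

A large statistic relative to the chi-square critical value, as in the Griliches output above, is the signal that at least one instrument is correlated with the error term.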
4.8 Conclusion
The simeq command should be used either when there are endogenous variables on the
right-hand side of a regression model or when the seemingly unrelated regression model is
desired. In the former case, if OLS is attempted, the resulting estimates will be biased. Jennings
(1973, 1980), the original developer of the simeq code, made a major contribution in developing
fast and accurate code that was designed to alert the user to problems in the structure of the
model. These include rank tests on all the key matrices as well as rank tests on the matrix of
exogenous variables in the system. The matrix command was used to illustrate calculation of
OLS, LIML, 2SLS, 3SLS and FIML models using more traditional equations than those used by
Jennings. SAS and Rats code was shown and the results compared to the B34S program output.
Using the matrix command, LS2 (same as 2SLS) and GMM routines, together with a number of
diagnostic tests, were illustrated and the results compared to Stata and Rats using an important
dataset studied by Griliches (1975) and Baum (2006).