ch4 - Houston H. Stokes Page - University of Illinois at Chicago

Revised Chapter 4 in Specifying and Diagnostically Testing Econometric Models (Edition 3)
© by Houston H. Stokes 17 October 2010 All rights reserved. Preliminary Draft
Chapter 4
Simultaneous Equations Systems
4.0 Introduction
4.1 Estimation of Structural Models
Table 4.1 Matlab Program to obtain Constrained Reduced Form
Table 4.2 Edited output from running Matlab Program in Table 4.1
4.2 Estimation of OLS, LIML, LS2, LS3, and ILS3
4.3 Examples
Table 4.3 Setup for ols, liml, ls2, ls3, and ils3 commands
Table 4.4 SAS Implementation of the Kmenta Model
Table 4.5 RATS Implementation of the Kmenta Model
4.4 Exactly identified systems
Table 4.6 Exactly Identified Kmenta Problem
4.5 Analysis of OLS, 2SLS and 3SLS using Matrix Command
Table 4.7 Matrix Command Implementation of OLS, 2SLS, 3SLS and FIML
4.6 LS2 and GMM Models and Specification tests
Table 4.8 LS2 and General Method of Moments estimation routines
Table 4.9 Estimation of LS2 and GMM Models using B34S, Stata and Rats
4.8 Conclusion
Simultaneous Equations Systems
4.0 Introduction
In section 4.1, after first discussing the basic simultaneous equations model, the
constrained reduced form, the unconstrained reduced form and the final form are introduced.
The MATLAB symbolic capability is used to illustrate how the constrained reduced form relates
to the structural parameters of the model. In section 4.2 the theory behind the QR approach to
simultaneous equations modeling, as developed by Jennings (1980), is discussed in some detail.
The simeq command estimates systems of equations by OLS,
limited information maximum likelihood (LIML), two-stage least squares (2SLS), three-stage
least squares (3SLS), iterative three-stage least squares (I3SLS), seemingly unrelated regression
(SUR) and full information maximum likelihood (FIML), using code developed by Les Jennings
(1973, 1980). The Jennings code is unique in that it implements the QR approach to estimate
systems of equations, which results in both substantial savings in time and increased accuracy.[1]
The estimation methods are well known and covered in detail in such books as Johnston (1963,
1972, 1984), Kmenta (1971, 1986), and Pindyck and Rubinfeld (1976, 1981, 1990) and will only
be sketched here. What will be discussed are the contributions of Jennings and others. The
discussion of these techniques closely follows material in Jennings (1980) and Strang (1976).
[1] The B34S qr command is designed to provide up to 16 digits of accuracy. This command, which
also allows estimation of the principal component (PC) regression, uses LINPACK code and is
documented in Chapter 10. The qr command is distinct from the code in the simeq command. The
matrix command contains extensive and programmable QR capability. For further examples see
Chapters 10 and 16 and sections of Chapter 2.
Section 4.3 illustrates estimation of variants of the Kmenta model using RATS, B34S and
SAS, while section 4.4 illustrates an exactly identified model. Section 4.5 shows how OLS,
LIML, 3SLS and FIML can be estimated using the matrix command. The code there is intended
for illustration and benchmarking, not for production use. Section 4.6 shows the matrix command
subroutines LS2 and GAMEST, which estimate single equation 2SLS and GMM models, respectively.
This code is production quality.
4.1 Estimation of Structural Models
Assume a system of G equations with K exogenous variables:[2]

$$
\begin{aligned}
b_{11} y_{1i} + \dots + b_{1G} y_{Gi} + \gamma_{11} x_{1i} + \dots + \gamma_{1K} x_{Ki} &= u_{1i} \\
b_{21} y_{1i} + \dots + b_{2G} y_{Gi} + \gamma_{21} x_{1i} + \dots + \gamma_{2K} x_{Ki} &= u_{2i} \\
&\;\;\vdots \\
b_{G1} y_{1i} + \dots + b_{GG} y_{Gi} + \gamma_{G1} x_{1i} + \dots + \gamma_{GK} x_{Ki} &= u_{Gi}
\end{aligned}
\tag{4.1-1}
$$

where $x_{ki}$ is the kth exogenous variable for the ith period, $y_{ji}$ is the jth endogenous variable
for the ith period, and $u_{ji}$ is the jth equation error term for the ith period. If we define

$$
B = \begin{pmatrix} b_{11} & b_{12} & \dots & b_{1G} \\ b_{21} & b_{22} & \dots & b_{2G} \\ \vdots & & & \vdots \\ b_{G1} & b_{G2} & \dots & b_{GG} \end{pmatrix},
\quad
\Gamma = \begin{pmatrix} \gamma_{11} & \gamma_{12} & \dots & \gamma_{1K} \\ \gamma_{21} & \gamma_{22} & \dots & \gamma_{2K} \\ \vdots & & & \vdots \\ \gamma_{G1} & \gamma_{G2} & \dots & \gamma_{GK} \end{pmatrix},
\quad
y_i = \begin{pmatrix} y_{1i} \\ y_{2i} \\ \vdots \\ y_{Gi} \end{pmatrix},
\quad
x_i = \begin{pmatrix} x_{1i} \\ x_{2i} \\ \vdots \\ x_{Ki} \end{pmatrix},
\quad
u_i = \begin{pmatrix} u_{1i} \\ u_{2i} \\ \vdots \\ u_{Gi} \end{pmatrix}
$$

equation (4.1-1) can be written as

$$ B y_i + \Gamma x_i = u_i \tag{4.1-2} $$

If all N observations in $y_i$, $x_i$ and $u_i$ are included, then

$$
Y = \begin{pmatrix} y_{11} & y_{12} & \dots & y_{1N} \\ y_{21} & y_{22} & \dots & y_{2N} \\ \vdots & & & \vdots \\ y_{G1} & y_{G2} & \dots & y_{GN} \end{pmatrix},
\quad
X = \begin{pmatrix} x_{11} & x_{12} & \dots & x_{1N} \\ x_{21} & x_{22} & \dots & x_{2N} \\ \vdots & & & \vdots \\ x_{K1} & x_{K2} & \dots & x_{KN} \end{pmatrix},
\quad
U = \begin{pmatrix} u_{11} & u_{12} & \dots & u_{1N} \\ u_{21} & u_{22} & \dots & u_{2N} \\ \vdots & & & \vdots \\ u_{G1} & u_{G2} & \dots & u_{GN} \end{pmatrix}
$$

and equation (4.1-2) can be written as

$$ B Y + \Gamma X = U \tag{4.1-3} $$

From equation (4.1-3), the constrained reduced form can be calculated as

$$ Y = -B^{-1}\Gamma X + B^{-1} U = \pi X + V \tag{4.1-4} $$

[2] For further discussion see Pindyck and Rubinfeld (1981, 339-349).
If π is estimated directly with OLS, then it is called the unconstrained reduced form. The B34S
simeq command estimates B and Γ using either OLS, 2SLS, LIML, 3SLS, I3SLS, or FIML. For each
set of estimated structural coefficients, the associated reduced form coefficient matrix π can be
optionally calculated.[3] If B and Γ are estimated by OLS, the coefficients will be biased since the
key OLS assumption that the right-hand-side variables are orthogonal to the error term is violated.
Model (4.1-3) can be normalized such that the coefficients $b_{ij} = 1$ for $i = j$. The necessary
(order) condition for identification of each equation is that the number of endogenous variables in
the equation minus one be less than or equal to the number of exogenous variables excluded from the
equation. The reason for this restriction is that otherwise it would not be possible to solve
uniquely for the structural parameters in terms of the elements of π. A short, self-documenting
MATLAB example based on Greene (2003) illustrates this problem.
Table 4.1 Matlab Program to obtain Constrained Reduced Form
% Greene (2003) Chapter 15 Problem # 1
%
% y1= g1*y2 + b11*x1 + b21*x2 + b31*x3
% y2= g2*y1 + b12*x1 + b22*x2 + b32*x3
% We know BY+GX=E
syms g1 g2 b11 b21 b31 b12 b22 b32
B =[ 1, -g1;
   -g2,   1]
G =[-b11,-b21,-b31;
    -b12,-b22,-b32]
a= -1*inv(B)*G
p11=a(1,1)
p12=a(1,2)
p13=a(1,3)
p21=a(2,1)
p22=a(2,2)
p23=a(2,3)
% Hopeless. Have 6 equations BUT more than 6 variables
' Now impose restrictions'
' b21=0 b32=0'
G =[-b11,   0, -b31;
    -b12,-b22,    0]
B,G
a= -1*inv(B)*G
' Here 6 equations and six unknowns g1 g2 b11 b31 b12 b22 '
p11=a(1,1)
p12=a(1,2)
p13=a(1,3)
p21=a(2,1)
p22=a(2,2)
p23=a(2,3)
[3] If the model is exactly identified, the constrained reduced form π can be
directly estimated by OLS or using (4.1-4) from LIML, 2SLS or 3SLS. This is
shown empirically in section 4.5.
Table 4.2 Edited output from running Matlab Program in Table 4.1
p11 = -1/(-1+g1*g2)*b11-g1/(-1+g1*g2)*b12
p12 = -1/(-1+g1*g2)*b21-g1/(-1+g1*g2)*b22
p13 = -1/(-1+g1*g2)*b31-g1/(-1+g1*g2)*b32
p21 = -g2/(-1+g1*g2)*b11-1/(-1+g1*g2)*b12
p22 = -g2/(-1+g1*g2)*b21-1/(-1+g1*g2)*b22
p23 = -g2/(-1+g1*g2)*b31-1/(-1+g1*g2)*b32
Here 6 equations and six unknowns g1 g2 b11 b31 b12 b22
p11 = -1/(-1+g1*g2)*b11-g1/(-1+g1*g2)*b12
p12 = -g1/(-1+g1*g2)*b22
p13 = -1/(-1+g1*g2)*b31
p21 = -g2/(-1+g1*g2)*b11-1/(-1+g1*g2)*b12
p22 = -1/(-1+g1*g2)*b22
p23 = -g2/(-1+g1*g2)*b31
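The identification argument behind Tables 4.1 and 4.2 can also be checked numerically. The sketch below is not part of the B34S or MATLAB code; it uses Python with NumPy, and the structural values `g1`, `g2`, `b11`, `b31`, `b12`, `b22` are hypothetical numbers chosen only for illustration. It computes the constrained reduced form as minus the inverse of B times G and then shows that, once the restrictions b21 = 0 and b32 = 0 are imposed, the six reduced-form coefficients are enough to recover the six structural parameters.

```python
import numpy as np

def reduced_form(B, G):
    """Constrained reduced form pi = -inv(B) @ G for the system B y + G x = e."""
    return -np.linalg.solve(B, G)

# Hypothetical structural values, chosen only for illustration.
g1, g2 = 0.5, -0.4
b11, b31, b12, b22 = 1.0, 2.0, -1.5, 0.8   # restrictions b21 = b32 = 0 imposed

B = np.array([[1.0, -g1],
              [-g2, 1.0]])
G = np.array([[-b11, 0.0, -b31],
              [-b12, -b22, 0.0]])
pi = reduced_form(B, G)

# With the restrictions imposed, two ratios of reduced-form coefficients
# give g1 and g2; the remaining parameters then follow from G = -B pi.
g1_hat = pi[0, 1] / pi[1, 1]          # p12 / p22
g2_hat = pi[1, 2] / pi[0, 2]          # p23 / p13
B_hat = np.array([[1.0, -g1_hat],
                  [-g2_hat, 1.0]])
G_hat = -B_hat @ pi

print("pi =\n", pi)
print("recovered g1, g2:", g1_hat, g2_hat)
print("recovered G =\n", G_hat)
```

Without the two zero restrictions the same six reduced-form coefficients would have to determine eight structural parameters, which is the "hopeless" case flagged in the MATLAB comments.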
If the excluded exogenous variables of the ith equation are not significant in any other equation,
then the ith equation will not be identified, even if it is correctly specified. We note that
$E(u_i \mid x_i) = 0$ and $E(u_i u_i') = \Sigma$, where $u_i = [u_{1i}, \dots, u_{Gi}]'$ and
$x_i = [x_{1i}, \dots, x_{Ki}]'$. The reduced form disturbance is not correlated with the exogenous
variables, or $E(v_i \mid x_i) = B^{-1} 0 = 0$. In addition,
$E(v_i v_i' \mid x_i) = E[B^{-1} u_i u_i' (B')^{-1}] = B^{-1} \Sigma (B')^{-1} = \Omega$, from which we deduce that

$$ \Sigma = B \Omega B' \tag{4.1-5} $$

In summary, Γ = G by K exogenous variable coefficient matrix, B = G by G nonsingular
endogenous variable coefficient matrix, Σ = G by G symmetric positive definite structural
covariance matrix, π = G by K constrained reduced form coefficient matrix and Ω = G by G
reduced form covariance matrix. The importance of this is that since π and Ω can be estimated
consistently by OLS, following Greene (2003, 387), if B were known we could obtain $\Gamma = -B\pi$
from (4.1-4) and Σ from (4.1-5). If there are no endogenous variables on the right, yet a number
of equations are estimated where there is covariance in the error term across equations, the
seemingly unrelated regression (SUR) model can be estimated as

$$ \hat{\beta} = (X' \Sigma^{-1} X)^{-1} (X' \Sigma^{-1} Y) \tag{4.1-6} $$

Elements of $\hat{\Sigma}$ ($\hat{\sigma}_{ij}$) can be estimated if OLS is used on each of the G equations and

$$
\hat{\sigma}_{ii} = \hat{u}_i' \hat{u}_i / (T - K_i), \qquad
\hat{\sigma}_{ij} = \hat{u}_i' \hat{u}_j / \sqrt{(T - K_i)(T - K_j)}
\tag{4.1-7}
$$
For more detail see Greene (2003) or other advanced econometric books. Pindyck and
Rubinfeld (1976, 1981, 1990) provide a particularly good treatment that is consistent with the
notation in this chapter.
From (4.1-4), Theil (1971, 463-468) suggests calculating the final form. First partition the
ith observation of the exogenous variables into lagged endogenous, current exogenous and lagged
exogenous variables, where identities are used to express lags > 1:

$$
\pi = [d_0, D_1, D_2, D_3], \qquad
x_i^* = \begin{pmatrix} y_{i-1} \\ x_i \\ x_{i-1} \end{pmatrix}
$$

$$ y_i = d_0 + D_1 y_{i-1} + D_2 x_i + D_3 x_{i-1} + \varepsilon_i^* \tag{4.1-8} $$

Theil (1971) shows that (4.1-8) can be expressed as

$$
y_i = (I - D_1)^{-1} d_0 + D_2 x_i + \sum_{t=1}^{\infty} D_1^{t-1} (D_1 D_2 + D_3) x_{i-t} + \sum_{t=0}^{\infty} D_1^{t} \varepsilon_{i-t}^*
\tag{4.1-9}
$$

where $D_2$ is the impact multiplier. If there are no lagged endogenous variables in the system,
$D_1 = 0$ and the constrained reduced form and the final form are the same. In this case
$\pi = [D_2, D_3]$. The interim multipliers are $D_2, (D_1 D_2 + D_3), D_1 (D_1 D_2 + D_3), \dots, D_1^{t-1} (D_1 D_2 + D_3), \dots$
which, when summed, form the total multiplier $G^*$:

$$
\begin{aligned}
G^* &= D_2 + (I + D_1 + D_1^2 + \dots)(D_1 D_2 + D_3) \\
    &= D_2 + (I - D_1)^{-1} (D_1 D_2 + D_3) \\
    &= (I - D_1)^{-1} [(I - D_1) D_2 + D_1 D_2 + D_3] \\
    &= (I - D_1)^{-1} (D_2 + D_3)
\end{aligned}
\tag{4.1-10}
$$
Goldberger (1959) and Kmenta (1971, 592) provide added detail. The importance of the total
multiplier in (4.1-10) is that it shows the effect on all endogenous variables of a change in any
exogenous variable after all effects have had a chance to work themselves out in the system.
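As a numerical illustration of the total multiplier, the following Python/NumPy sketch compares the closed form, the inverse of (I - D1) times (D2 + D3), with a direct summation of the impact and interim multipliers. The matrices `D1`, `D2` and `D3` are hypothetical values picked so that the eigenvalues of `D1` lie inside the unit circle (otherwise the sum does not converge); the function names are illustrative and are not B34S routines.

```python
import numpy as np

def total_multiplier(D1, D2, D3):
    """Closed form total multiplier: solve (I - D1) G* = D2 + D3."""
    I = np.eye(D1.shape[0])
    return np.linalg.solve(I - D1, D2 + D3)

def summed_multipliers(D1, D2, D3, n_terms=200):
    """Impact multiplier D2 plus the first n_terms interim multipliers."""
    total = D2.copy()
    term = D1 @ D2 + D3            # first interim multiplier
    for _ in range(n_terms):
        total += term
        term = D1 @ term           # next interim multiplier
    return total

# Hypothetical coefficient blocks: 2 endogenous and 3 exogenous variables.
D1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])
D2 = np.array([[1.0, 0.0, 0.5],
               [0.2, -0.3, 0.0]])
D3 = np.array([[0.1, 0.2, 0.0],
               [0.0, 0.4, -0.1]])

spectral_radius = np.max(np.abs(np.linalg.eigvals(D1)))
G_star = total_multiplier(D1, D2, D3)
G_sum = summed_multipliers(D1, D2, D3)

print("spectral radius of D1:", spectral_radius)
print("closed form:\n", G_star)
print("summed series:\n", G_sum)
```

The closed form solves a linear system rather than forming the inverse explicitly, which is a design choice of this sketch and not something the text prescribes.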
There are several common mistakes made in setting up simultaneous equations systems.
These include the following:
- Not fully checking for multicollinearity in the equations system.
- Attempting to interpret the estimated B and Γ coefficients as partial derivatives, rather
than looking at the reduced form G by K matrix π.
- Not effectively testing whether excluded exogenous variables are significant in at least
one other equation in the system.
- Not building into the solution procedure provisions for taking into account the number
of significant digits in the data.
The simeq code has unique design characteristics that allow solutions for some of these
problems. In the next sections, we will briefly outline some of these features.
Assume for a moment that X is a T by K matrix of observations of the exogenous
variables, Y is a T by 1 vector of observations of the endogenous variable, and β is a K element
array of OLS coefficients. Then the OLS solution for the estimated β from equation (2.1-8) is
$(X'X)^{-1} X'Y$. The problem with this approach is that some accuracy is lost by forming the
matrix $X'X$. The QR approach[4] proceeds by operating directly on the matrix X to express it in
terms of the upper triangular K by K matrix R and the T by T orthogonal matrix Q. X is factored
as

$$
X = Q \begin{pmatrix} R \\ 0 \end{pmatrix} = [Q_1 \mid Q_2] \begin{pmatrix} R \\ 0 \end{pmatrix} = Q_1 R
\tag{4.1-11}
$$

Since Q'Q = I, then

$$
(X'X)^{-1} X'Y = (R' Q_1' Q_1 R)^{-1} R' Q_1' Y = (R'R)^{-1} R' Q_1' Y = R^{-1} Q_1' Y
\tag{4.1-12}
$$

[4] A good discussion of the QR factorization is contained in Strang (1976). Other references
include Jennings (1980) and Dongarra, Bunch, Moler, and Stewart (1979).
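The two routes to the OLS coefficients can be compared in a few lines of Python/NumPy. This is only a sketch of the idea, not the Jennings code: `ols_qr` applies the inverse of R to Q1'Y from the reduced QR factorization, while `ols_normal_equations` forms X'X explicitly. The polynomial design matrix and the coefficient vector `beta_true` are assumptions made for the illustration; the response is generated without noise so that the exact answer is known.

```python
import numpy as np

def ols_qr(X, y):
    """OLS via the reduced QR factorization X = Q1 R, solving R beta = Q1'y."""
    Q1, R = np.linalg.qr(X)          # Q1 is T x K, R is K x K upper triangular
    return np.linalg.solve(R, Q1.T @ y)

def ols_normal_equations(X, y):
    """OLS by forming X'X explicitly (the textbook formula)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Polynomial regressors on [0, 1] are collinear enough to show a difference.
t = np.linspace(0.0, 1.0, 50)
X = np.vander(t, 6, increasing=True)          # columns 1, t, t^2, ..., t^5
beta_true = np.array([1.0, -2.0, 3.0, -4.0, 5.0, -6.0])
y = X @ beta_true                             # noise-free response

beta_qr = ols_qr(X, y)
beta_ne = ols_normal_equations(X, y)

print("max abs error, QR route:          ", np.max(np.abs(beta_qr - beta_true)))
print("max abs error, normal equations:  ", np.max(np.abs(beta_ne - beta_true)))
```

On collinear data such as this the QR route would typically be expected to show the smaller error, because the normal equations square the condition number of the problem.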
Following Jennings (1980), we define the condition number of matrix X, C(X), as the
square root of the ratio of the largest eigenvalue of $X'X$, $E_{max}(X'X)$, to the smallest
eigenvalue of $X'X$, $E_{min}(X'X)$:

$$ C(X) = \sqrt{E_{max}(X'X) / E_{min}(X'X)} \tag{4.1-13} $$

If $\|X\| = \sqrt{E_{max}(X'X)}$, and X is square and nonsingular, then

$$ C(X) = \|X\| \, \|X^{-1}\| \tag{4.1-14} $$
Throughout B34S, 1/C(X) is checked to test for rank problems. Jennings (1980) notes that C(X)
can also be used as a measure of relative error. If μ is a measure of round-off error, then
$\mu [C(X)]^2$ is the bound for the relative error of the calculated solution. On an IBM 370 running
double precision, μ is approximately .1E-16. If C(X) is > .1E+8 (1/C(X) is < .1E-8), then
$\mu [C(X)]^2 \geq 1$, meaning that no digits in the reported solution are significant. Jennings (1980)
looks at the problem from another perspective. If matrix X has a round-off error of τX such that
the actual X used is X + τX, then $\|\tau X\| / \|X\|$ must be less than 1/C(X) for a solution to exist. If

$$ \|\tau X\| / \|X\| = 1 / C(X) \tag{4.1-15} $$

then there exists a τX such that X + τX is singular.[5] The user can inspect the estimate of the
condition number and determine the degree of multicollinearity. Most programs only report problems
when the matrix is singular. Inspection of C(X) gives warning of the degree of the problem. The
simeq command contains the IPR parameter option with which the user can inform the program
of the number of significant digits in X. This information is used to terminate the iterative
three-stage (ILS3) iterations when the relative change in the solution is within what would be
expected, given the number of significant digits in the data.
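The condition number and the singular-perturbation result in (4.1-15) can be illustrated with a short Python/NumPy sketch. It is not how simeq computes its estimate of the condition; here C(X) is obtained from the singular values of X, which are the square roots of the eigenvalues of X'X. The nearly collinear design matrix and the value of `mu` (the spacing of IEEE double precision numbers near 1, not the IBM 370 figure quoted in the text) are assumptions of the example.

```python
import numpy as np

def condition_number(X):
    """C(X) = sqrt(Emax(X'X) / Emin(X'X)), computed from singular values of X."""
    s = np.linalg.svd(X, compute_uv=False)    # returned in descending order
    return s[0] / s[-1]

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
# Third column is almost a copy of the second: strong multicollinearity.
X = np.column_stack([np.ones(100), x1, x1 + 1e-4 * rng.normal(size=100)])

C = condition_number(X)
mu = 2.0 ** -52                    # assumed round-off measure (IEEE double)
print("C(X) =", C, "  1/C(X) =", 1.0 / C)
print("mu * C(X)^2 =", mu * C ** 2)

# A perturbation of relative size exactly 1/C(X) that makes X singular:
# remove the component along the smallest singular direction.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
tau_X = -s[-1] * np.outer(U[:, -1], Vt[-1, :])
rel_size = np.linalg.norm(tau_X, 2) / np.linalg.norm(X, 2)
s_perturbed = np.linalg.svd(X + tau_X, compute_uv=False)

# For a square nonsingular matrix, C(A) = ||A|| ||inv(A)||.
A = rng.normal(size=(4, 4))
C_A = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)

print("relative size of perturbation:", rel_size)
print("smallest singular value after perturbation:", s_perturbed[-1])
```

The perturbation `tau_X` is the smallest one (in the 2-norm) that reduces the rank of X, which is why its relative size equals 1/C(X).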
Jennings (1980) notes that the relative error of the QR solution to the OLS problem given
in equation (4.1-12) has the form

$$ n_1 C(X) + n_2 C(X)^2 (\|\hat{e}\| / \|\hat{\beta}\|) \tag{4.1-16} $$

where $n_1$ and $n_2$ are of the order of machine precision and $\|\hat{e}\|$ and $\|\hat{\beta}\|$ are the lengths
of the estimated residual and estimated coefficients, respectively. (The length or L2NORM of a vector
$e$ is defined as $\sqrt{\sum_i e_i^2}$.) Equation (4.1-16) indicates that the relative error of the
computed solution improves the more closely the model fits. An estimate of this relative error is
made for the OLS, LIML and 2SLS estimators reported by simeq.
[5] For more detail on techniques used in simeq to avoid numerical error in the calculations arising
from differences in the means of the data, see Jennings (1980).
4.2 Estimation of OLS, LIML, LS2, LS3, and ILS3
For OLS estimation of a system of equations, simeq uses the QR approach discussed
earlier. If the reduced option is used, once the structural coefficients B and Γ in equation (4.1-3)
are known, the constrained reduced form coefficients π from equation (4.1-4) are displayed. If B
and Γ are estimated using OLS, and all structural equations are exactly identified, then the
constraints on π imposed by the structural coefficients B and Γ are not binding and π could be
estimated directly with OLS or indirectly via (4.1-4). However, if one or more of the equations
in the structural equations system (4.1-2) are overidentified, π must be estimated as $-B^{-1}\Gamma$.
Although the reduced-form coefficients π exist and may be calculated from any set of
structural estimates B and Γ, in practice it is not desirable to report those derived from OLS
estimation because in the presence of endogenous variables on the right-hand side of an
equation, the OLS assumption that the error term is orthogonal to the explanatory variables is
violated. Since OLS imposes this constraint as a part of the estimation process, the resulting
estimated B and Γ are biased.
OLS is often used as a benchmark because, from among the class of all
linear estimators, OLS produces minimum variance. The loss in predictive power of LIML and
2SLS has to be weighed against the fact that OLS produces biased estimates. If reduced-form
coefficients are desired, identities in the system must be entered. The number of identities plus
the number of estimated equations must equal the number of endogenous variables in the model.
The simeq command requires that the number of model sentences plus the number of identity
sentences equal the number of variables listed in the endogenous sentence.
The 2SLS estimator first estimates all endogenous variables as a function of all
exogenous variables. This is equivalent to estimating an unconstrained form of the reduced-form
equation (4.1-4). Next, in stage 2 the estimated values of the endogenous variables on the right in
the jth equation Yj* are used in place of the actual values of the endogenous variables Yj on the
right to estimate equation (4.1-2). Since the estimated values of the endogenous variables on the
right are only a function of exogenous variables, the theory suggests they can be assumed to be
orthogonal with the population error, and OLS can be safely used for the second stage. In terms
of our prior notation, the two-stage estimator for the first equation is

$$
\begin{pmatrix} b_{11} \\ \vdots \\ b_{1g} \\ \gamma_{11} \\ \vdots \\ \gamma_{1k} \end{pmatrix}
= \{(\hat{Y}_1 \; X_1)'(\hat{Y}_1 \; X_1)\}^{-1} (\hat{Y}_1 \; X_1)' y_1
= \begin{pmatrix} \hat{Y}_1'\hat{Y}_1 & \hat{Y}_1' X_1 \\ X_1'\hat{Y}_1 & X_1' X_1 \end{pmatrix}^{-1}
  \begin{pmatrix} \hat{Y}_1' y_1 \\ X_1' y_1 \end{pmatrix}
\tag{4.2-1}
$$

where $\hat{Y}_1$ is the matrix of predicted endogenous variables in the first equation and $X_1$ is the
matrix of exogenous variables in the first equation. For further details on this traditional
estimation approach, see Pindyck and Rubinfeld (1981, 345-347).
The QR approach used by Jennings (1980) involves estimating equation (4.2-1) as the
solution of

$$ Z_j' (X X^+) Z_j \delta_j = Z_j' (X X^+) y_j \tag{4.2-2} $$

for $\delta_j$, where $\delta_j' = \{(b_{j1}, \dots, b_{jg}), (\gamma_{j1}, \dots, \gamma_{jk})\}$, $Z_j = [X_j \mid Y_j]$ and
$X^+$ is the pseudoinverse[6] of X. $Z_j$ consists of the X and Y variables in the jth equation. $XX^+$ is
not calculated directly but is expressed in terms of the QR factorization of X. By working directly
on X, and not forming X'X, substantial accuracy is obtained. Jennings proceeds by writing

$$ X X^+ = Q \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} Q' \tag{4.2-3} $$

where $I_r$ is the r by r identity matrix and r is the rank of X. Using equation (4.2-3), equation
(4.2-2) becomes

$$
\hat{Z}_j' \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \hat{Z}_j \delta_j
= \hat{Z}_j' \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \hat{y}_j
\tag{4.2-4}
$$

where $\hat{Z}_j = Q' Z_j$ and $\hat{y}_j = Q' y_j$.
The 2SLS covariance matrix can be estimated as

$$ (\|e_j\|^2 / df)\,(Z_j' X X^+ Z_j)^{-1} \tag{4.2-5} $$

where $df$ is the degrees of freedom and $\|e_j\|^2$ is the residual sum of squares (or the square of
the L2NORM of the residual). There is a substantial controversy in the literature about the
appropriate value for $df$. Since the SEs of the estimated 2SLS coefficients are known only
asymptotically, Theil (1971) suggests that $df$ be set equal to T, the number of observations used
to estimate the model. Others suggest that $df$ be set to T-K, similar to what is used in
OLS. If Theil's suggestion is used, the estimated SEs of the coefficients are smaller. The T-K
option is more conservative. The simeq command produces both estimates of the coefficient
standard errors to facilitate comparison with other programs and researcher preferences.
[6] If we define $X^+$ as the pseudoinverse of the T by K matrix X, then it can be shown (Strang 1976,
138, exercise 3.4.5) that the following four conditions hold: 1. $XX^+X = X$; 2. $X^+XX^+ = X^+$;
3. $(XX^+)' = XX^+$; and 4. $(X^+X)' = X^+X$. The pseudoinverse can be obtained from the singular
value decomposition or the QR factorization of X.
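The projection form of 2SLS described above can be sketched in Python/NumPy as follows. This is an illustration under stated assumptions, not the simeq implementation: X is assumed to have full column rank, so that the projection onto its columns is Q1 times its transpose for the reduced QR factor `Q1`, and the simulated data, the function name `tsls_qr` and all parameter values are hypothetical. The function returns the coefficients together with standard errors for both degrees-of-freedom choices, T and T-K, where K is taken here to be the number of coefficients in the equation.

```python
import numpy as np

def tsls_qr(y, Z, X):
    """2SLS for one equation via the QR factorization of X (full column rank assumed)."""
    T, k = Z.shape
    Q1, _ = np.linalg.qr(X)                    # columns of Q1 span the columns of X
    Z_hat, y_hat = Q1.T @ Z, Q1.T @ y          # nonzero blocks of Q'Z and Q'y
    delta = np.linalg.lstsq(Z_hat, y_hat, rcond=None)[0]
    resid = y - Z @ delta                      # residuals use the actual Z
    sse = resid @ resid
    cov_base = np.linalg.inv(Z_hat.T @ Z_hat)  # inverse of Z' X X+ Z
    se_T = np.sqrt(sse / T * np.diag(cov_base))
    se_TK = np.sqrt(sse / (T - k) * np.diag(cov_base))
    return delta, se_T, se_TK

# Simulated data (hypothetical): y2 is endogenous because its error is
# correlated with the error of the y1 equation.
rng = np.random.default_rng(1)
T = 200
x = rng.normal(size=(T, 3))
X = np.column_stack([np.ones(T), x])           # all exogenous variables
u = rng.normal(size=(T, 2))
u[:, 1] += 0.8 * u[:, 0]
y2 = 1.0 + x @ np.array([1.0, -1.0, 0.5]) + u[:, 1]
y1 = 2.0 + 0.5 * y2 + 1.0 * x[:, 0] + u[:, 0]
Z = np.column_stack([y2, np.ones(T), x[:, 0]]) # right-hand side of equation 1

delta, se_T, se_TK = tsls_qr(y1, Z, X)

# Textbook formula for comparison: inv(Z'PZ) Z'Py with P = X inv(X'X) X'.
PZ = X @ np.linalg.solve(X.T @ X, X.T @ Z)
delta_textbook = np.linalg.solve(Z.T @ PZ, PZ.T @ y1)

print("2SLS (QR route):  ", delta)
print("2SLS (textbook):  ", delta_textbook)
print("SE with df = T:   ", se_T)
print("SE with df = T-K: ", se_TK)
```

Only the first r rows of Q'Z enter the calculation, which is why the reduced factor `Q1` is sufficient; the two sets of standard errors differ only by the scalar factor implied by the choice of divisor.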
Two-stage least squares estimation of an equation with endogenous variables on the right,
in contrast with OLS estimation, in theory produces consistent (asymptotically unbiased)
coefficients at the cost of some loss of efficiency. If a large system is estimated, it is often
impossible to use all exogenous variables in the system because of loss of degrees of freedom.
The usual practice is to select a subset of the exogenous variables. The greater the number of
exogenous variables relative to the degrees of freedom, the closer the predicted Y variables on
the right are to the raw Y variables on the right. In this situation, the 2SLS estimator sum of
squares of residuals will approach the OLS estimator sum of squares of residuals. Such an
estimator will lose this property of the 2SLS estimator. Usual econometric practice is to use OLS
and 2SLS and compare the results to see how sensitive the OLS results are to simultaneity problems.
While 2SLS results are sensitive to the variable that is used to normalize the system,
limited information maximum likelihood (LIML) estimation, which can be used in place of
2SLS, is not so sensitive. Kmenta (1971, 568-570) has a clear discussion which is summarized
below. The LIML estimator,[7] which is hard to explain in simple terms, involves selecting values
for b and δ for each equation such that L is minimized, where L = SSE1 / SSE. We define SSE1
as the residual variance from regressing a weighted average of the y variables in the equation on
all exogenous variables in the equation, while SSE is the residual variance from regressing a
weighted average of the y variables on all the exogenous variables in the system. Since
SSE ≤ SSE1, L is bounded below by 1. The difficulty in LIML estimation is selecting the weights for
combining the y variables in the equation. Assume equation 1 of (4.1-1)

$$ b_{11} y_{1i} + \dots + b_{1G} y_{Gi} + \gamma_{11} x_{1i} + \dots + \gamma_{1K} x_{Ki} = u_{1i} \tag{4.2-6} $$

Ignoring time subscripts, we can define

$$ y_1^* = y_1 - [b_{12} y_2 + \dots + b_{1G} y_G] \tag{4.2-7} $$

Here the signs follow Kmenta's convention, in which the normalized equation is written with $y_1$
on the left and all other terms on the right. If we define $Y_1^* = [y_1, \dots, y_G]$ and we knew the
vector $B_1^{*\prime} = [1, -b_{12}, \dots, -b_{1G}]$, we would know $y_1^*$ since $y_1^* = Y_1^* B_1^*$. We could
then regress $y_1^*$ on all x variables on the right in that equation and call the residual variance
SSE1, and next regress $y_1^*$ on all x variables in the system and call the residual variance SSE. If
we define $X_1$ as a matrix consisting of the columns of the x variables on the right,
$X_1 = [x_{11}, \dots, x_{1K}]$, and we knew $B_1^*$, then we could estimate $\Gamma_1 = [\gamma_{11}, \dots, \gamma_{1K}]$ as

$$ \Gamma_1 = [X_1' X_1]^{-1} X_1' y_1^* = (X_1' X_1)^{-1} X_1' Y_1^* B_1^* \tag{4.2-8} $$

However, we do not know $B_1^*$. If we define

$$ W_1^* = Y_1^{*\prime} Y_1^* - (Y_1^{*\prime} X_1)(X_1' X_1)^{-1} X_1' Y_1^* \tag{4.2-9} $$

$$ W_1 = Y_1^{*\prime} Y_1^* - (Y_1^{*\prime} X)(X'X)^{-1} X' Y_1^* \tag{4.2-10} $$

where X is the matrix of all X variables in the system, then L can be written as

$$ L = [B_1^{*\prime} W_1^* B_1^*] / [B_1^{*\prime} W_1 B_1^*] \tag{4.2-11} $$

Minimizing L implies that

$$ (W_1^* - L W_1) B_1^* = 0 \tag{4.2-12} $$

The LIML estimator uses eigenvalue analysis to select the vector $B_1^*$ such that L is minimized.
This calculation involves solving the system

$$ \det(W_1^* - L W_1) = 0 \tag{4.2-13} $$

for the smallest root L, which we will call λ. This root can be substituted back into equation
(4.2-12) to get $B_1^*$ and into equation (4.2-8) to get $\Gamma_1$. Jennings shows that equation (4.2-13)
can be rewritten as

$$ \det \left| Y_1^{*\prime} \{ (I - X_1 X_1^+) - \lambda (I - X X^+) \} Y_1^* \right| = 0 \tag{4.2-14} $$

[7] Kmenta (1971, 565-572) has one of the clearest descriptions. The discussion here complements
that material.
Further factorizations lead to improvements in accuracy and speed over the traditional methods of
solution outlined in Johnston (1984), Kmenta (1971), and other books. Jennings (1973, 1980)
briefly discusses tests made for computational accuracy, given the number of significant digits in
the data and various tests for nonunique solutions. One of the main objectives of the simeq code
was to be able to inform the user if there were problems in identification in theory and in
practice. Since the LIML standard errors are known only asymptotically and are, in fact, equal
to the 2SLS estimated standard errors, these are used for both the 2SLS and LIML estimators.
In the first stage of 2SLS, the unconstrained reduced form

$$ Y = \pi X + V \tag{4.2-15} $$

is estimated to obtain the predicted variables $\hat{Y}$. 2SLS, OLS, and LIML are all special cases
of the Theil (1971) k class estimators. The general formula for the k class estimator for the first
equation (Kmenta 1971, 565) is

$$
\begin{pmatrix} \hat{B}_1(k) \\ \hat{\Gamma}_1(k) \end{pmatrix}
= \begin{pmatrix} Y_1' Y_1 - k \hat{V}_1' \hat{V}_1 & Y_1' X_1 \\ X_1' Y_1 & X_1' X_1 \end{pmatrix}^{-1}
  \begin{pmatrix} Y_1' y_1 - k \hat{V}_1' y_1 \\ X_1' y_1 \end{pmatrix}
\tag{4.2-16}
$$

where $\hat{V}_1$ is the matrix of residuals from estimating equation (4.2-15) for all but the first y
variable, $\hat{Y}_1 = Y_1 - \hat{V}_1$, and $X_1$ contains the X variables on the right-hand side of the first
equation. (4.2-16) follows directly from (4.2-1). If k=0, equation (4.2-16) is the formula for OLS
estimation of the first equation. If k=1, equation (4.2-16) is the formula for 2SLS estimation of
the first equation and can be transformed to equation (4.2-1). If k = λ, the minimum root of
equation (4.2-13), equation (4.2-16) is the formula for the LIML estimator (Theil 1971, 504).
Hence, OLS, 2SLS, and LIML are all members of the k class of estimators.
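The k class formula and the LIML root can be tried out with a short Python/NumPy sketch. It implements the textbook expressions directly (not the Jennings factorizations used in simeq); the function names `k_class` and `liml_root` and the simulated data are hypothetical. Setting k to 0 or 1 should reproduce OLS and 2SLS, and the smallest root of the determinantal equation is obtained here from the eigenvalues of the inverse of W1 times W1*.

```python
import numpy as np

def k_class(y1, Y1, X1, X, k):
    """k class estimator for one equation; returns [B1(k), Gamma1(k)] stacked."""
    # V1: residuals of the right-hand endogenous variables regressed on all X
    V1 = Y1 - X @ np.linalg.lstsq(X, Y1, rcond=None)[0]
    A = np.block([[Y1.T @ Y1 - k * V1.T @ V1, Y1.T @ X1],
                  [X1.T @ Y1, X1.T @ X1]])
    b = np.concatenate([Y1.T @ y1 - k * V1.T @ y1, X1.T @ y1])
    return np.linalg.solve(A, b)

def liml_root(y1, Y1, X1, X):
    """Smallest root L of det(W1* - L W1) = 0."""
    Ystar = np.column_stack([y1, Y1])          # all endogenous variables in the equation
    def residual_cross_product(Xm):
        R = Ystar - Xm @ np.linalg.lstsq(Xm, Ystar, rcond=None)[0]
        return R.T @ R
    W1_star = residual_cross_product(X1)       # exogenous variables in the equation
    W1 = residual_cross_product(X)             # exogenous variables in the system
    roots = np.linalg.eigvals(np.linalg.solve(W1, W1_star))
    return roots.real.min()

# Simulated overidentified equation (hypothetical values).
rng = np.random.default_rng(2)
T = 200
x = rng.normal(size=(T, 3))
X = np.column_stack([np.ones(T), x])
u = rng.normal(size=(T, 2))
u[:, 1] += 0.8 * u[:, 0]
y2 = 1.0 + x @ np.array([1.0, -1.0, 0.5]) + u[:, 1]
y1 = 2.0 + 0.5 * y2 + 1.0 * x[:, 0] + u[:, 0]
Y1 = y2[:, None]                               # right-hand endogenous variable
X1 = np.column_stack([np.ones(T), x[:, 0]])    # included exogenous variables

lam = liml_root(y1, Y1, X1, X)
est_ols = k_class(y1, Y1, X1, X, 0.0)
est_2sls = k_class(y1, Y1, X1, X, 1.0)
est_liml = k_class(y1, Y1, X1, X, lam)

print("smallest root:", lam)
print("k = 0   (OLS): ", est_ols)
print("k = 1   (2SLS):", est_2sls)
print("k = root (LIML):", est_liml)
```

Because the smallest root is bounded below by 1, the LIML estimate can be read as a k class estimator with k slightly above the 2SLS value.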
Three-stage least squares utilizes the covariance of the residuals across equations from
the estimated 2SLS model to improve the estimated coefficients B and Γ. If the model has only
exogenous variables on the right-hand side (B = 0), the OLS estimates can be used to calculate
the covariance of the residuals across equations. The resulting estimator is the seemingly
unrelated regression model (SUR). In this discussion, we will look at the 3SLS model only,
since the SUR model is a special case. From (4.2-2) we rewrite the 2SLS estimator for the ith
equation as

$$ \hat{\delta}_i = [Z_i' X (X'X)^{-1} X' Z_i]^{-1} Z_i' X (X'X)^{-1} X' y_i \tag{4.2-17} $$

which estimates the ith 2SLS equation

$$ y_i = Z_i \delta_i + u_i \tag{4.2-18} $$

If we define[8] $(X'X)^{-1} = PP'$ and multiply equation (4.2-18) by $P'X'$, we obtain

$$ P'X' y_i = P'X' Z_i \delta_i + P'X' u_i \tag{4.2-19} $$

which can be written

$$ w_i = W_i \delta_i + \varepsilon_i \tag{4.2-20} $$

where $w_i = P'X' y_i$, $W_i = P'X' Z_i$, and $\varepsilon_i = P'X' u_i$. If all G 2SLS equations are written as

$$
\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_G \end{pmatrix}
= \begin{pmatrix} W_1 & 0 & \dots & 0 \\ 0 & W_2 & \dots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \dots & W_G \end{pmatrix}
  \begin{pmatrix} \delta_1 \\ \delta_2 \\ \vdots \\ \delta_G \end{pmatrix}
+ \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_G \end{pmatrix}
\tag{4.2-21}
$$

then the system can be written as

$$ w = W \delta + \varepsilon \tag{4.2-22} $$

For any pair of equations i and j,

$$ E[\varepsilon_i \varepsilon_j'] = E[P'X' u_i u_j' X P] = \sigma_{ij} P'X'XP = \sigma_{ij} I \tag{4.2-23} $$

while the covariance of the error term for the system becomes

$$
\begin{pmatrix} \sigma_{11} I & \sigma_{12} I & \dots & \sigma_{1G} I \\ \sigma_{21} I & \sigma_{22} I & \dots & \sigma_{2G} I \\ \vdots & & & \vdots \\ \sigma_{G1} I & \sigma_{G2} I & \dots & \sigma_{GG} I \end{pmatrix}
\tag{4.2-24}
$$

Equation (4.2-24) indicates that for each equation there is no heteroskedasticity, but that there is
contemporaneous correlation of the residuals across equations. Equation (4.2-24) can be
estimated from the 2SLS estimates of the residuals of each equation for 3SLS or the OLS
estimates of the residuals of each equation for SUR models. Let

$$ \hat{V} = \hat{\Sigma} \otimes I \tag{4.2-25} $$

be such an estimate. The 3SLS estimator of the system δ, where $\delta' = [B \; \Gamma]$, becomes

$$ \hat{\delta} = (W' \hat{V}^{-1} W)^{-1} W' \hat{V}^{-1} w \tag{4.2-26} $$

[8] This discussion is based on material contained in Johnston (1984, 486).
Jennings (1980) uses two alternative approaches to solve (4.2-26) depending on whether the
covariance of the 3SLS estimator

$$ \mathrm{Var}(\hat{\delta}) = (W' \hat{V}^{-1} W)^{-1} \tag{4.2-27} $$

is required or not. In the former case, an orthogonal factorization method is used. In the latter
case, to save space, the conjugate gradient iterative algorithm (Lanczos reduction) suggested
by Paige and Saunders (1973) is used. This latter approach may or may not converge. For added
detail see Jennings (1980). If the switch kcov=diag is used there will not be convergence
issues, since the QR approach will be used. Since many software systems use inversion methods,
slight differences in the estimated coefficients will be observed since the QR approach is in
theory more accurate. Implementation of the "textbook" approach is illustrated using the matrix
command in section 4.5.
In a model with G equations, if the equation of interest is the jth equation, then assuming
the exogenous variables in the system are selected correctly and the jth equation is specified
correctly, 2SLS estimates are invariant to any other equation. 3SLS estimation of the jth equation,
in contrast, is sensitive to the specification of other equations in the system since changes in
other equation specifications will alter the estimate of V and thus the 3SLS estimator of δ from
equation (4.2-26). Because of this fact, it is imperative that users first inspect the 2SLS estimates
closely. The constrained reduced form estimates, π, should be calculated from the OLS and 2SLS
models and compared. The differences show the effects of correcting for simultaneity. Next,
3SLS should be performed. A study of the resulting changes in δ and π will show the gain of
moving to a system-wide estimation procedure. Since changes in the functional form of one
equation i can possibly impact the estimates of another equation j, in this step of model building,
sensitivity analysis should be attempted. In a multiequation system, the movement from 2SLS to
3SLS often produces changes in the estimate of δi for one equation but not for another equation.
In a model in which all equations are overidentified, in general the 3SLS estimators will differ
from the 2SLS estimators. If all equations are exactly identified, or if V is a diagonal matrix
(Theil 1971, 511), there is no gain for any equation from using 3SLS. In the test problem
from Kmenta (1971, 565), which is discussed in the next section, one equation is overidentified
and one equation is exactly identified. In this case, only the exactly identified equation will be
changed by 3SLS. This is because the exactly identified equation gains from information in the
overidentified equation but the reverse is not true. The overidentified equation does not gain
from information from the exactly identified equation.
In SUR models, if all equations contain the same variables, or if V is a diagonal matrix, there is
no gain over OLS from going to SUR. Just as the LIML method of estimation is
an alternative to 2SLS, the FIML is a more costly alternative to 3SLS and I3SLS.
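The transformation in (4.2-19) to (4.2-26) and the behavior of the exactly identified equation can be illustrated with the Kmenta data listed in Table 4.3. The Python/NumPy sketch below is a "textbook" implementation under stated assumptions and is not the simeq or matrix command code: P is taken to be the inverse of the R factor of X, so that P'X' equals the transpose of the reduced QR factor `Q1`, and the residual covariance is computed with divisor T (other divisors are possible and would change the 3SLS supply estimates). No published coefficient values are asserted here; the sketch only checks the structural property described above.

```python
import numpy as np

# Kmenta data as listed in Table 4.3: columns q, p, d, f, a
data = np.array([
    [ 98.485, 100.323,  87.4,  98.0,  1], [ 99.187, 104.264,  97.6,  99.1,  2],
    [102.163, 103.435,  96.7,  99.1,  3], [101.504, 104.506,  98.2,  98.1,  4],
    [104.240,  98.001,  99.8, 110.8,  5], [103.243,  99.456, 100.5, 108.2,  6],
    [103.993, 101.066, 103.2, 105.6,  7], [ 99.900, 104.763, 107.8, 109.8,  8],
    [100.350,  96.446,  96.6, 108.7,  9], [102.820,  91.228,  88.9, 100.6, 10],
    [ 95.435,  93.085,  75.1,  81.0, 11], [ 92.424,  98.801,  76.9,  68.6, 12],
    [ 94.535, 102.908,  84.6,  70.9, 13], [ 98.757,  98.756,  90.6,  81.4, 14],
    [105.797,  95.119, 103.1, 102.3, 15], [100.225,  98.451, 105.1, 105.0, 16],
    [103.522,  86.498,  96.4, 110.5, 17], [ 99.929, 104.016, 104.4,  92.5, 18],
    [105.223, 105.769, 110.7,  89.3, 19], [106.232, 113.490, 127.1,  93.0, 20],
])
q, p, d, f, a = data.T
T = len(q)
const = np.ones(T)

X = np.column_stack([const, d, f, a])            # all exogenous variables
Z = [np.column_stack([const, p, d]),             # demand: overidentified
     np.column_stack([const, p, f, a])]          # supply: exactly identified

# With X = Q1 R and P = inv(R): inv(X'X) = P P' and P'X' = Q1'.
Q1, _ = np.linalg.qr(X)
w = Q1.T @ q                                     # same left-hand variable in both
W = [Q1.T @ Zi for Zi in Z]

# 2SLS equation by equation: least squares on the transformed equations.
delta_2sls = [np.linalg.lstsq(Wi, w, rcond=None)[0] for Wi in W]
U = np.column_stack([q - Zi @ di for Zi, di in zip(Z, delta_2sls)])
Sigma_hat = U.T @ U / T                          # assumption: divisor T

# Stack the system and apply GLS with V = Sigma_hat kron I.
K = X.shape[1]
k1, k2 = W[0].shape[1], W[1].shape[1]
W_sys = np.zeros((2 * K, k1 + k2))
W_sys[:K, :k1] = W[0]
W_sys[K:, k1:] = W[1]
w_sys = np.concatenate([w, w])
V_inv = np.kron(np.linalg.inv(Sigma_hat), np.eye(K))
delta_3sls = np.linalg.solve(W_sys.T @ V_inv @ W_sys, W_sys.T @ V_inv @ w_sys)

print("2SLS demand:", delta_2sls[0])
print("3SLS demand:", delta_3sls[:k1])
print("2SLS supply:", delta_2sls[1])
print("3SLS supply:", delta_3sls[k1:])
```

In this setup the demand (overidentified) coefficients should come out the same under 2SLS and 3SLS, while the supply (exactly identified) coefficients move, in line with the discussion of the Kmenta test problem.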
FIML[9] is a generalization of LIML for systems of models. Like LIML, it is invariant to
the variable used to normalize the model. FIML, in contrast with 3SLS, is highly nonlinear and,
as a consequence, much more costly to estimate. Because FIML is asymptotically equivalent to
3SLS (Theil 1971, 525) and the simeq code does not contain any major advantages over other
programs, the discussion of FIML is left to Theil (1971), Kmenta (1971) and Johnston (1984),
except for an annotated FIML example using the matrix command. In the next section, an
annotated output is presented.
[9] The fiml section of the simeq command is the weakest link. In addition to what is probably a
scaling error in the fiml standard errors, there often are convergence problems that appear to be
data related. In view of this and the fact that 3SLS is an inexpensive substitute, users are
encouraged to employ 3SLS and I3SLS in place of FIML. Future releases of B34S will endeavor to
improve the FIML code or disable the option. The matrix command implementation of FIML, shown
later in section 4.5, provides a look into how such a model might be implemented.
Iterative 3SLS (I3SLS) is an alternative final step in which the estimate of V is updated
using the 3SLS estimates. The problem then becomes when to stop iterating on the estimates
of V. The simeq command uses the number of significant digits in the raw data (see the ipr
parameter) and equation (4.1-8) to terminate the I3SLS iterations once the relative change is
within what would be expected, given the number of significant digits in the raw data. If ipr
is not set, the simeq command assumes ten digits.
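The stopping idea can be illustrated with a minimal Python sketch. It is not the simeq code: equation (4.1-8) is not reproduced here, and the mapping from ipr digits to a tolerance of 10**(-ipr), the function name iterate_until_stable and the toy update rule are all assumptions made for illustration only.

```python
def iterate_until_stable(update, start, ipr=10, itmax=25):
    """Repeat x <- update(x) until the largest relative change is <= 10**(-ipr).

    ipr mimics 'number of significant digits in the raw data' (assumed mapping);
    itmax caps the number of iterations.
    """
    tol = 10.0 ** (-ipr)
    current = list(start)
    for iteration in range(1, itmax + 1):
        new = list(update(current))
        # Largest relative change across all elements of the estimate.
        rel_change = max(
            abs(n - c) / max(abs(n), 1e-300) for n, c in zip(new, current)
        )
        current = new
        if rel_change <= tol:
            return current, iteration, True
    return current, itmax, False


# Toy update with fixed point 2.0; it stands in for "re-estimate V, then re-estimate
# the coefficients" and is purely hypothetical.
values, n_iter, converged = iterate_until_stable(
    lambda v: [0.5 * x + 1.0 for x in v], [0.0], ipr=6
)
print(values, n_iter, converged)
```

With data that carry only six significant digits, iterating far beyond a relative change of about 1e-6 would chase noise, which is the motivation for tying the tolerance to ipr.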
4.3 Examples
Using data on supply and demand from Kmenta (1971, 565), Table 4.3 shows the simeq
setup to estimate the model by OLS, LIML, 2SLS, 3SLS, and I3SLS. The reduced-form estimates
for each estimator are calculated. To save space, not all output is shown. The results are the
same, digit for digit, as those reported in Kmenta (1971, 582). Note the use of the keywords ls2
for 2SLS, ls3 for 3SLS and ils3 for I3SLS, since the parser will not recognize 2SLS and 3SLS as keywords.
Table 4.3 Setup for ols, liml, ls2, ls3, and ils3 commands
==KMENTA1
B34sexec data nohead corr$
Input q p d f a $
Label q = 'Food consumption per head'$
Label p = 'Ratio of food prices to consumer prices'$
Label d = 'Disposable income in constant prices'$
Label f = 'Ratio of t-1 years price to general p'$
Label a = 'Time'$
Comment=('Kmenta (1971) page 565 answers page 582')$
Datacards$
 98.485 100.323  87.4  98.0  1    99.187 104.264  97.6  99.1  2
102.163 103.435  96.7  99.1  3   101.504 104.506  98.2  98.1  4
104.240  98.001  99.8 110.8  5   103.243  99.456 100.5 108.2  6
103.993 101.066 103.2 105.6  7    99.900 104.763 107.8 109.8  8
100.350  96.446  96.6 108.7  9   102.820  91.228  88.9 100.6 10
 95.435  93.085  75.1  81.0 11    92.424  98.801  76.9  68.6 12
 94.535 102.908  84.6  70.9 13    98.757  98.756  90.6  81.4 14
105.797  95.119 103.1 102.3 15   100.225  98.451 105.1 105.0 16
103.522  86.498  96.4 110.5 17    99.929 104.016 104.4  92.5 18
105.223 105.769 110.7  89.3 19   106.232 113.490 127.1  93.0 20
B34sreturn$
B34seend$
B34sexec simeq printsys reduced ols liml ls2 ls3 ils3 kcov=diag
ipr=6$
Heading=('Test Case from Kmenta (1971) Pages 565 - 582 ' ) $
Exogenous constant d f a $
Endogenous p q $
Model lvar=q rvar=(constant p d)
Name=('Demand Equation')$
Model lvar=q rvar=(constant p f a) name=('Supply Equation')$
B34seend$
==
The OLS results are as follows:
Test Case from Kmenta (1971) Pages 565 - 582
Summary of Input Parameters and Model
Number of systems to be estimated - - -      2
Number of identities - - - - - - - - - -     0
Number of exogenous variables - - - - -      4
Number of endogenous variables - - - - -     2
Number of data points in time - - - - -     20
Maximum number of unknowns per system -      4
Print Parameter - - - - - - - - - - - -      2
Solutions wanted 0 => no, 1 => yes
Reduced form coefficients - - - - - - -      1
Ordinary Least Squares - - - - - - - - -     1
LIMLE Solution - - - - - - - - - - - - -     1
Two Stage Least Squares - - - - - - - -      1
Three Stage Least Squares - - - - - - -      1
Three Stage Covariance Matrix - - - - -      1
Iterated Three Stage Least Squares - - -     1
Covariance Matrix for I3SLSQ - - - - - -     1
Maximum number of iterations - - - - - -    25
Functional Minimization 3SLSQ - - - - -      0
Covariance Matrix for Functional Min. -      0
Systems described by the following columns of data
Name of the System     LHS     NO. Y   No. X   (Variables)
Demand Equation        2 Q     1       2       1 P   1 CONSTANT   2 D
Supply Equation        2 Q     1       3       1 P   1 CONSTANT   3 F   4 A
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
B34S 8.10R   (D:M:Y) 11/ 4/04   (H:M:S) 11:13:19   SIMEQ STEP   PAGE 4
Test Case from Kmenta (1971) Pages 565 - 582
Least Squares Solution for System Number   1   Demand Equation
Condition Number of Matrix is greater than    21.04911571706159
Relative Numerical Error in the Solution      1.301987681166638E-11
LHS Endogenous Variable No.   2   Q
Exogenous Variables (Predetermined)          Std. Error        t
  1  CONSTANT     99.89542       7.519362          13.28509
  2  D            0.3346356      0.4542183E-01     7.367285
Endogenous Variables (Jointly Dependent)     Std. Error        t
  3  P           -0.3162988      0.9067741E-01    -3.488177
Residual Variance for Structural Disturbances    3.725391173733892
Ratio of Norm Residual to Norm LHS               1.762488253954560E-02
Covariance Matrix of Estimated Parameters
                   CONSTANT       D              P
                      1           2              3
CONSTANT   1        56.54
D          2        0.3216E-01    0.2063E-02
P          3       -0.5948       -0.2333E-02     0.8222E-02
Correlation Matrix of Estimated Parameters
                   CONSTANT       D              P
                      1           2              3
CONSTANT   1        1.000
D          2        0.9417E-01    1.000
P          3       -0.8724       -0.5665         1.000
Test Case from Kmenta (1971) Pages 565 - 582
Least Squares Solution for System Number   2   Supply Equation
Condition Number of Matrix is greater than    17.67594711864223
Relative Numerical Error in the Solution      1.318741471618151E-11
LHS Endogenous Variable No.   2   Q
Exogenous Variables (Predetermined)          Std. Error        t
  1  CONSTANT     58.27543       11.46291          5.083825
  2  F            0.2481333      0.4618785E-01     5.372263
  3  A            0.2483023      0.9751777E-01     2.546227
Endogenous Variables (Jointly Dependent)     Std. Error        t
  4  P            0.1603666      0.9488394E-01     1.690134
Residual Variance for Structural Disturbances    5.784441135907554
Ratio of Norm Residual to Norm LHS               2.130622575072544E-02
Covariance Matrix of Estimated Parameters
                   CONSTANT       F              A              P
                      1           2              3              4
CONSTANT   1        131.4
F          2       -0.3044        0.2133E-02
A          3       -0.2792        0.1316E-02     0.9510E-02
P          4       -0.9875        0.8440E-03     0.5220E-03     0.9003E-02
Correlation Matrix of Estimated Parameters
                   CONSTANT       F              A              P
                      1           2              3              4
CONSTANT   1        1.000
F          2       -0.5749        1.000
A          3       -0.2498        0.2921         1.000
P          4       -0.9079        0.1926         0.5642E-01     1.000
Test Case from Kmenta (1971) Pages 565 - 582
Contemporaneous Covariance of Residuals (Structural Disturbances)
For Least Squares Solution.
Condition Number of residual columns,   2.664758
                   Demand E     Supply E
                      1            2
Demand E   1        3.167
Supply E   2        3.411        4.628
Correlation Matrix of Residuals
                   Demand E     Supply E
                      1            2
Demand E   1        1.000
Supply E   2        0.8912       1.000
Test Case from Kmenta (1971) Pages 565 - 582
Coefficients of the Reduced Form Equations.
Least Squares Solution.
Condition number of matrix used to find the reduced form coefficients is no smaller than   4.195815340351579
              CONSTANT      D            F            A
                 1          2            3            4
P     1        87.31       0.7020      -0.5206      -0.5209
Q     2        72.28       0.1126       0.1647       0.1648
Mean sum of squares of residuals for the reduced form equations.
  1  P    0.42748D+01
  2  Q    0.39192D+01
Condition Number of columns of exogenous variables,   11.845
For each estimated equation, the condition number of the matrix, equation (4.1-7), and the
relative numerical error in the solution, equation (4.1-8), are given. The relative numerical
errors for the demand and supply equations were .1302E-10 and .13187E-10, respectively.
Estimated coefficients agree with Kmenta (1971, 582). From the estimated B and Γ coefficients,
the constrained reduced form π coefficients are calculated. The condition number of the
exogenous columns, .11845E+2, shows little multicollinearity among the exogenous variables.
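As an illustration of how the constrained reduced form follows from the structural estimates, the following Python sketch solves the two OLS structural equations printed above for P and Q in terms of the exogenous variables. The function name and dictionary layout are assumptions for illustration; the simeq command itself works with the B and Γ matrices.

```python
EXOG = ["constant", "d", "f", "a"]


def constrained_reduced_form(demand, supply):
    """Solve q = demand(p, x) and q = supply(p, x) for p and q given x.

    Each equation is a dict of coefficients keyed by variable name; a missing
    exogenous variable has coefficient zero.
    """
    slope_gap = supply["p"] - demand["p"]  # must be nonzero for a solution
    pi_p = {x: (demand.get(x, 0.0) - supply.get(x, 0.0)) / slope_gap for x in EXOG}
    # Substitute the solved price into the supply equation to get quantity.
    pi_q = {x: supply.get(x, 0.0) + supply["p"] * pi_p[x] for x in EXOG}
    return pi_p, pi_q


# OLS structural estimates copied from the printed output.
ols_demand = {"constant": 99.89542, "p": -0.3162988, "d": 0.3346356}
ols_supply = {"constant": 58.27543, "p": 0.1603666, "f": 0.2481333, "a": 0.2483023}

pi_p, pi_q = constrained_reduced_form(ols_demand, ols_supply)
for name in EXOG:
    print(f"{name:9s} P: {pi_p[name]:9.4f}   Q: {pi_q[name]:9.4f}")
```

The values agree, to the digits printed, with the OLS reduced form coefficients reported above (87.31, 0.7020, -0.5206, -0.5209 for P and 72.28, 0.1126, 0.1647, 0.1648 for Q).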
The next outputs show the corresponding estimates for LIML, 2SLS, and 3SLS. As was
discussed earlier, since the asymptotic SEs for LIML are the same as for 2SLS, the simeq
command does not print these values.
Test Case from Kmenta (1971) Pages 565 - 582
Limited Information - Maximum Likelihood Solution f   1   Demand Equation
Rank and Condition Number of Exogenous Columns                          2   8.5174634
Rank and Condition Number of Endogenous Variables orthogonal to X(K)    2   6.5593694
Rank and Condition Number of Endogenous Variables orthogonal to X       2   2.3005812
Value of LIML Parameter is   1.173867141559841
Condition Number of Matrix is greater than    8.517463415017575
Relative Numerical Error in the Solution      4.487883690647531E-12
LHS Endogenous Variable No.   2   Q
Standard Deviation Equals 2SLSQ Standard Deviation.
Exogenous Variables (Predetermined)
  1  CONSTANT     93.61922
  2  D            0.3100134
Endogenous Variables (Jointly Dependent)
  3  P           -0.2295381
Residual Variance for Structural Disturbances    3.926009688207962
Ratio of Norm Residual to Norm LHS               1.809322459330604E-02
Test Case from Kmenta (1971) Pages 565 - 582
Limited Information - Maximum Likelihood Solution f   2   Supply Equation
Rank and Condition Number of Exogenous Columns                          3   8.2098363
Rank and Condition Number of Endogenous Variables orthogonal to X(K)    1   1.0000000
Rank and Condition Number of Endogenous Variables orthogonal to X       2   1.0000000
Value of LIML Parameter is   1.000000000000000
Condition Number of Matrix is greater than    8.209836250820180
Relative Numerical Error in the Solution      4.943047984855735E-12
LHS Endogenous Variable No.   2   Q
Standard Deviation Equals 2SLSQ Standard Deviation.
Exogenous Variables (Predetermined)
  1  CONSTANT     49.53244
  2  F            0.2556057
  3  A            0.2529242
Endogenous Variables (Jointly Dependent)
  4  P            0.2400758
Residual Variance for Structural Disturbances    6.039577731391617
Ratio of Norm Residual to Norm LHS               2.177103664979223E-02
Test Case from Kmenta (1971) Pages 565 - 582
Contemporaneous Covariance of Residuals (Structural Disturbances)
For LIMLE Solution.
Condition Number of residual columns,   2.811594
                   Demand E     Supply E
                      1            2
Demand E   1        3.337
Supply E   2        3.629        4.832
Correlation Matrix of Residuals
                   Demand E     Supply E
                      1            2
Demand E   1        1.000
Supply E   2        0.9038       1.000
Test Case from Kmenta (1971) Pages 565 - 582
Coefficients of the Reduced Form Equations.
LIMLE Solution.
Condition number of matrix used to find the reduced form coefficients is no smaller than   4.258817996669486
              CONSTANT      D            F            A
                 1          2            3            4
P     1        93.88       0.6601      -0.5443      -0.5386
Q     2        72.07       0.1585       0.1249       0.1236
Mean sum of squares of residuals for the reduced form equations.
  1  P    0.41286D+01
  2  Q    0.38401D+01
Test Case from Kmenta (1971) Pages 565 - 582
Two Stage Least Squares Solution for System Number   1   Demand Equation
Condition Number of Matrix is greater than    21.98482284147018
Relative Numerical Error in the Solution      1.411421448020441E-11
LHS Endogenous Variable No.   2   Q
Exogenous Variables (Predetermined)          Std. Error        t            Theil SE         Theil t
  1  CONSTANT     94.63330       7.920838          11.94738     7.302652         12.95876
  2  D            0.3139918      0.4694366E-01     6.688695     0.4327991E-01    7.254908
Endogenous Variables (Jointly Dependent)     Std. Error        t            Theil SE         Theil t
  3  P           -0.2435565      0.9648429E-01    -2.524313     0.8895412E-01   -2.738002
Residual Variance for Structural Disturbances    3.866416929101937
Ratio of Norm Residual to Norm LHS               1.795538131264630E-02
Covariance Matrix of Estimated Parameters
                   CONSTANT       D              P
                      1           2              3
CONSTANT   1        62.74
D          2        0.4930E-01    0.2204E-02
P          3       -0.6734       -0.2642E-02     0.9309E-02
Correlation Matrix of Estimated Parameters
                   CONSTANT       D              P
                      1           2              3
CONSTANT   1        1.000
D          2        0.1326        1.000
P          3       -0.8812       -0.5833         1.000
Test Case from Kmenta (1971) Pages 565 - 582
Two Stage Least Squares Solution for System Number   2   Supply Equation
Condition Number of Matrix is greater than    18.21923089332271
Relative Numerical Error in the Solution      1.431397195953368E-11
LHS Endogenous Variable No.   2   Q
Exogenous Variables (Predetermined)          Std. Error        t            Theil SE         Theil t
  1  CONSTANT     49.53244       12.01053          4.124086     10.74254         4.610868
  2  F            0.2556057      0.4725007E-01     5.409637     0.4226175E-01    6.048158
  3  A            0.2529242      0.9965509E-01     2.537996     0.8913422E-01    2.837565
Endogenous Variables (Jointly Dependent)     Std. Error        t            Theil SE         Theil t
  4  P            0.2400758      0.9993385E-01     2.402347     0.8938355E-01    2.685905
Residual Variance for Structural Disturbances    6.039577731391617
Ratio of Norm Residual to Norm LHS               2.177103664979223E-02
Covariance Matrix of Estimated Parameters
                   CONSTANT       F              A              P
                      1           2              3              4
CONSTANT   1        144.3
F          2       -0.3238        0.2233E-02
A          3       -0.2952        0.1377E-02     0.9931E-02
P          4       -1.095         0.9362E-03     0.5791E-03     0.9987E-02
Correlation Matrix of Estimated Parameters
                   CONSTANT       F              A              P
                      1           2              3              4
CONSTANT   1        1.000
F          2       -0.5706        1.000
A          3       -0.2467        0.2924         1.000
P          4       -0.9126        0.1983         0.5815E-01     1.000
Test Case from Kmenta (1971) Pages 565 - 582
Contemporaneous Covariance of Residuals (Structural Disturbances)
For Two Stage Least Squares Solution.
Condition Number of residual columns,   2.804709
                   Demand E     Supply E
                      1            2
Demand E   1        3.286
Supply E   2        3.593        4.832
Correlation Matrix of Residuals
                   Demand E     Supply E
                      1            2
Demand E   1        1.000
Supply E   2        0.9017       1.000
Test Case from Kmenta (1971) Pages 565 - 582
Coefficients of the Reduced Form Equations.
Two Stage Least Squares Solution
Condition number of matrix used to find the reduced form coefficients is no smaller than   4.135372945327849
              CONSTANT      D            F            A
                 1          2            3            4
P     1        93.25       0.6492      -0.5285      -0.5230
Q     2        71.92       0.1559       0.1287       0.1274
Mean sum of squares of residuals for the reduced form equations.
  1  P    0.39831D+01
  2  Q    0.38317D+01
Condition number of the large matrix in Three Stage Least Squares   60.70221
Test Case from Kmenta (1971) Pages 565 - 582
Three Stage Least Squares Solution for System Number   1   Demand Equation
LHS Endogenous Variable No.   2   Q
Exogenous Variables (Predetermined)          Std. Error        t            Theil SE         Theil t
  1  CONSTANT     94.63330       7.920838          11.94738     7.302652         12.95876
  2  D            0.3139918      0.4694366E-01     6.688695     0.4327991E-01    7.254908
Endogenous Variables (Jointly Dependent)     Std. Error        t            Theil SE         Theil t
  3  P           -0.2435565      0.9648429E-01    -2.524313     0.8895412E-01   -2.738002
Residual Variance (For Structural Disturbances)   3.286454
Three Stage Least Squares Covariance for System   Demand Equation
                   CONSTANT       D              P
                      1           2              3
CONSTANT   1        62.74
D          2        0.4930E-01    0.2204E-02
P          3       -0.6734       -0.2642E-02     0.9309E-02
Three Stage Least Squares Solution for System Number   2   Supply Equation
LHS Endogenous Variable No.   2   Q
Exogenous Variables (Predetermined)          Std. Error        t            Theil SE         Theil t
  1  CONSTANT     52.11764       11.89337          4.382074     10.63776         4.899308
  2  F            0.2289775      0.4399381E-01     5.204767     0.3934926E-01    5.819106
  3  A            0.3579074      0.7288940E-01     4.910281     0.6519426E-01    5.489861
Endogenous Variables (Jointly Dependent)     Std. Error        t            Theil SE         Theil t
  4  P            0.2289322      0.9967317E-01     2.296828     0.8915039E-01    2.567932
Residual Variance (For Structural Disturbances)   5.360809
Three Stage Least Squares Covariance for System   Supply Equation
                   CONSTANT       F              A              P
                      1           2              3              4
CONSTANT   1        141.5
F          2       -0.2950        0.1935E-02
A          3       -0.4090        0.2548E-02     0.5313E-02
P          4       -1.083         0.8119E-03     0.1069E-02     0.9935E-02
Test Case from Kmenta (1971) Pages 565 - 582
Contemporaneous Covariance of Residuals (Structural Disturbances)
For Three Stage Least Squares Solution.
Condition Number of residual columns,   6.321462
                   Demand E     Supply E
                      1            2
Demand E   1        3.286
Supply E   2        4.111        5.361
Correlation Matrix of Residuals
                   Demand E     Supply E
                      1            2
Demand E   1        1.000
Supply E   2        0.9794       1.000
Test Case from Kmenta (1971) Pages 565 - 582
Coefficients of the Reduced Form Equations.
Three Stage Least Squares Solution using Orthogonal Factorization.
Condition number of matrix used to find the reduced form coefficients is no smaller than   4.232905401139098
              CONSTANT      D            F            A
                 1          2            3            4
P     1        89.98       0.6645      -0.4846      -0.7575
Q     2        72.72       0.1521       0.1180       0.1845
Mean sum of squares of residuals for the reduced form equations.
  1  P    0.19065D+01
  2  Q    0.42494D+01
Iterated Three Stage Least Squares Results are given next.
Iteration begins for Iterated 3SLSQ.
Condition number of the large matrix in Three Stage Least Squares   147.2220
Test Case from Kmenta (1971) Pages 565 - 582
Iterated Three Stage Least Squares Solution for System No.   1   Demand Equation
LHS Endogenous Variable No.   2   Q
Exogenous Variables (Predetermined)          Std. Error        t            Theil SE         Theil t
  1  CONSTANT     94.63330       7.920838          11.94738     7.302652         12.95876
  2  D            0.3139918      0.4694366E-01     6.688695     0.4327991E-01    7.254908
Endogenous Variables (Jointly Dependent)     Std. Error        t            Theil SE         Theil t
  3  P           -0.2435565      0.9648429E-01    -2.524313     0.8895412E-01   -2.738002
Residual Variance (For Structural Disturbances)   3.286454
Iterated Three Stage Least Squares Covariance for System   Demand Equation
                   CONSTANT       D              P
                      1           2              3
CONSTANT   1        62.74
D          2        0.4930E-01    0.2204E-02
P          3       -0.6734       -0.2642E-02     0.9309E-02
Iterated Three Stage Least Squares Solution for System No.   2   Supply Equation
LHS Endogenous Variable No.   2   Q
Exogenous Variables (Predetermined)          Std. Error        t            Theil SE         Theil t
  1  CONSTANT     52.55269       12.74080          4.124755     11.39572         4.611616
  2  F            0.2244964      0.4653972E-01     4.823758     0.4162639E-01    5.393126
  3  A            0.3755747      0.7166061E-01     5.241020     0.6409520E-01    5.859638
Endogenous Variables (Jointly Dependent)     Std. Error        t            Theil SE         Theil t
  4  P            0.2270569      0.1069194         2.123627     0.9563159E-01    2.374287
Residual Variance (For Structural Disturbances)   5.565111
Iterated Three Stage Least Squares Covariance for System   Supply Equation
                   CONSTANT       F              A              P
                      1           2              3              4
CONSTANT   1        162.3
F          2       -0.3336        0.2166E-02
A          3       -0.4953        0.3185E-02     0.5135E-02
P          4       -1.245         0.9086E-03     0.1336E-02     0.1143E-01
Test Case from Kmenta (1971) Pages 565 - 582
Contemporaneous Covariance of Residuals (Structural Disturbances)
For Iterated Three Stage Least Squares Solution.
Condition Number of residual columns,   6.814796
                   Demand E     Supply E
                      1            2
Demand E   1        3.286
Supply E   2        4.198        5.565
Correlation Matrix of Residuals
                   Demand E     Supply E
                      1            2
Demand E   1        1.000
Supply E   2        0.9816       1.000
Test Case from Kmenta (1971) Pages 565 - 582
Coefficients of the Reduced Form Equations.
Iterated Three Stage Least Squares Solution.
Condition number of matrix used to find the reduced form coefficients is no smaller than   4.249772824974006
              CONSTANT      D            F            A
                 1          2            3            4
P     1        89.42       0.6672      -0.4770      -0.7981
Q     2        72.86       0.1515       0.1162       0.1944
Mean sum of squares of residuals for the reduced form equations.
  1  P    0.20576D+01
  2  Q    0.43519D+01
In the Kmenta test problem, one equation (demand) was overidentified and one equation (supply)
was exactly identified. As was mentioned earlier, the 2SLS and 3SLS results for the
overidentified equation are the same because the other equation was exactly identified.
However, the 3SLS results for the exactly identified equation (supply) differ from the 2SLS
results because the other equation (demand) is overidentified. Close inspection of the 3SLS
results for the demand equation shows that they are the same as those of Kmenta (1971, 582) and
Kmenta (1986, 712). The supply-equation results are the same as those of Kmenta (1971) but
differ slightly from those of Kmenta (1986), which appear to be in error.10 To facilitate testing,
SAS and RATS setups are shown in Tables 4.4 and 4.5 and their output is discussed in some detail.
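The identification status of each equation can be checked by counting, as in the following Python sketch of the order condition. The function name is hypothetical, and the order condition is only necessary, not sufficient, so this is a quick screen rather than a full rank test.

```python
EXOGENOUS = {"constant", "d", "f", "a"}
ENDOGENOUS = {"p", "q"}


def identification_status(rhs):
    """Classify an equation by the order condition.

    rhs lists the right-hand-side variables. The equation is compared on the
    number of exogenous variables it excludes versus the number of endogenous
    variables on its right-hand side.
    """
    excluded_exog = EXOGENOUS - set(rhs)
    rhs_endog = ENDOGENOUS & set(rhs)
    if len(excluded_exog) > len(rhs_endog):
        return "overidentified"
    if len(excluded_exog) == len(rhs_endog):
        return "exactly identified"
    return "underidentified"


demand_status = identification_status(("constant", "p", "d"))
supply_status = identification_status(("constant", "p", "f", "a"))
print("demand:", demand_status)
print("supply:", supply_status)
```

Demand excludes f and a while containing one right-hand-side endogenous variable, so it is overidentified; supply excludes only d, so it is exactly identified.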
Table 4.4 SAS Implementation of the Kmenta Model
B34SEXEC OPTIONS OPEN('testsas.sas') UNIT(29) DISP=UNKNOWN$ B34SRUN$
B34SEXEC OPTIONS CLEAN(29) $ B34SEEND$
B34SEXEC PGMCALL IDATA=29 ICNTRL=29$
SAS $
PGMCARDS$
proc means; run;
proc syslin 3sls reduced;
instruments d f a constant;
endogenous p q;
demand: model q = p d;
supply: model q = p f a;
run;
proc syslin it3sls reduced;
instruments d f a constant;
endogenous p q;
demand: model q = p d;
supply: model q = p f a;
run;
B34SRETURN$
B34SRUN $
B34SEXEC OPTIONS CLOSE(29)$ B34SRUN$
/$ The next card has to be modified to point to SAS location
/$ Be sure and wait until SAS gets done before letting B34S resume
B34SEXEC OPTIONS dodos('start /w /r sas testsas')
dounix('sas testsas')$
B34SRUN$
B34SEXEC OPTIONS NPAGEOUT NOHEADER
WRITEOUT(' ','Output from SAS',' ',' ')
WRITELOG(' ','Output from SAS',' ',' ')
COPYFOUT('testsas.lst')
COPYFLOG('testsas.log')
dodos('erase testsas.sas','erase testsas.lst','erase testsas.log')
dounix('rm testsas.sas','rm testsas.lst','rm testsas.log')$
B34SRUN$
10 The file example.mac contains an extension of the above test case that calls RATS, SAS and a B34S
matrix implementation. For the supply equation SAS gets the Kmenta (1986) results, which are 52.1972 (11.8934),
.2286 (.0997), .2282 (.0440), .3611 (.0729). What RATS calls 3SLS produces what B34S calls I3SLS. Readers are
encouraged to use the code in Tables 4.4 and 4.5 to further investigate this issue. A major difficulty is for the
researcher to be able to tell exactly what is being estimated by a software system. For this reason, attempting the
model on multiple software systems is strongly advised.
Table 4.5 RATS Implementation of the Kmenta Model
B34SEXEC OPTIONS HEADER$ B34SRUN$
b34sexec options open('rats.dat') unit(28) disp=unknown$ b34srun$
b34sexec options open('rats.in') unit(29) disp=unknown$ b34srun$
b34sexec options clean(28)$ b34srun$
b34sexec options clean(29)$ b34srun$
b34sexec pgmcall$
rats passasts
pcomments('* ',
'* Data passed from B34S(r) system to RATS',
'* ',
"display @1 %dateandtime() @33 ' Rats Version ' %ratsversion()"
'* ') $
PGMCARDS$
*
* heading=('test case from kmenta 1971 page 565 - 582 ' ) $
* exogenous constant d f a $
* endogenous p q $
* model lvar=q rvar=(constant p d) name=('demand eq.') $
* model lvar=q rvar=(constant p f a) name=('supply eq.') $
linreg q
# constant p d
linreg q
# constant p f a
instruments constant d f a
linreg(inst) q
# constant p d
linreg(inst) q
# constant p f a
source d:\r\liml.src
@liml q
# constant p d
@liml q
# constant p f a
equation demand q
# constant p d
equation supply q
# constant p f a
* Supply does not match known answers!!
sur(inst,iterations=200) 2
# demand resid1
# supply resid2
nonlin(parmset=structural) c0 c1 c2 d0 d1 d2 d3
compute c0 = .1
compute c1 = .1
compute c2 = .1
compute d0 = .1
compute d1 = .1
compute d2 = .1
compute d3 = .1
frml d_eq q = c0 + c1*p + c2*d
frml s_eq q = d0 + d1*p + d2*f + d3*a
nlsystem(inst,parmset=structural,outsigma=v) * * d_eq s_eq
b34sreturn$
b34srun $
b34sexec options close(28)$ b34srun$
b34sexec options close(29)$ b34srun$
b34sexec options
/$ dodos(' rats386 rats.in rats.out ')
dodos('start /w /r rats32s rats.in /run')
dounix('rats rats.in rats.out')$ B34SRUN$
b34sexec options npageout
WRITEOUT('Output from RATS',' ',' ')
COPYFOUT('rats.out')
dodos('ERASE rats.in','ERASE rats.out','ERASE rats.dat')
dounix('rm rats.in','rm rats.out','rm rats.dat')
$
B34SRUN$
As noted earlier, the 2SLS and 3SLS results for the overidentified equation (demand)
are the same. However, the printout shows that the residual variance for the 2SLS result is
3.8664, while the residual variance for the 3SLS result is 3.2865. The reason for this apparent
discrepancy is that the 2SLS residual variance equals the sum of squared residuals divided by
T-K, while the 3SLS calculation divides by T; hence, 3.8664 = 3.2865 * (20/17).
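The arithmetic can be verified with a few lines of Python. The sum of squared residuals is the value the RATS instrumental-variables run reports for the demand equation; the last check, that the printed Theil SE is the conventional SE rescaled by the square root of (T-K)/T, is an observation from the printed numbers rather than a statement about the simeq source code.

```python
import math

# Demand equation: T observations, K estimated coefficients.
T, K = 20, 3
ssr = 65.729087795  # sum of squared residuals from the instrumental-variables run

s2_small_sample = ssr / (T - K)  # divisor used for the 2SLS residual variance
s2_large_sample = ssr / T        # divisor used for the 3SLS residual variance

print(f"SSR/(T-K) = {s2_small_sample:.6f}")
print(f"SSR/T     = {s2_large_sample:.6f}")
print(f"ratio     = {s2_small_sample / s2_large_sample:.6f} (T/(T-K) = {T / (T - K):.6f})")

# The same rescaling appears to link the two standard errors of the constant.
se_constant = 7.920838
theil_se_constant = se_constant * math.sqrt((T - K) / T)
print(f"Theil SE of constant = {theil_se_constant:.6f}")
```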
To investigate the differences in the supply equation that occur in Kmenta (1971) and (1986),
edited and annotated SAS and RATS output is shown next. SAS 3SLS and I3SLS output is
shown to agree with Kmenta (1986) for both demand and supply equations.
The SYSLIN Procedure
Three-Stage Least Squares Estimation
Parameter Estimates
                     Parameter    Standard
Variable      DF      Estimate       Error    t Value    Pr > |t|
Intercept      1      94.63330    7.920838      11.95      <.0001
P              1      -0.24356    0.096484      -2.52      0.0218
D              1      0.313992    0.046944       6.69      <.0001
Model                 SUPPLY
Dependent Variable    Q
Parameter Estimates
                     Parameter    Standard
Variable      DF      Estimate       Error    t Value    Pr > |t|
Intercept      1      52.19720    11.89337       4.39      0.0005
P              1      0.228589    0.099673       2.29      0.0357
F              1      0.228158    0.043994       5.19      <.0001
A              1      0.361138    0.072889       4.95      0.0001
Endogenous Variables
                     P           Q
DEMAND        0.243557           1
SUPPLY        -0.22859           1
Exogenous Variables
              Intercept           D           F           A
DEMAND          94.6333    0.313992           0           0
SUPPLY          52.1972           0    0.228158    0.361138
Inverse Endogenous Variables
                 DEMAND      SUPPLY
P               2.11799    -2.11799
Q               0.48415     0.51585
The SYSLIN Procedure
Three-Stage Least Squares Estimation
Reduced Form
              Intercept           D           F           A
P              89.87924    0.665032    -0.48324    -0.76489
Q              72.74263    0.152019    0.117695    0.186293
The SYSLIN Procedure
Iterative Three-Stage Least Squares Estimation
Parameter Estimates
                     Parameter    Standard
Variable      DF      Estimate       Error    t Value    Pr > |t|
Intercept      1      94.63330    7.920838      11.95      <.0001
P              1      -0.24356    0.096484      -2.52      0.0218
D              1      0.313992    0.046944       6.69      <.0001
Model                 SUPPLY
Dependent Variable    Q
Parameter Estimates
                     Parameter    Standard
Variable      DF      Estimate       Error    t Value    Pr > |t|
Intercept      1      52.66182    12.80511       4.11      0.0008
P              1      0.226586    0.107459       2.11      0.0511
F              1      0.223372    0.046774       4.78      0.0002
A              1      0.380006    0.072010       5.28      <.0001
Endogenous Variables
                     P           Q
DEMAND        0.243557           1
SUPPLY        -0.22659           1
Exogenous Variables
              Intercept           D           F           A
DEMAND          94.6333    0.313992           0           0
SUPPLY         52.66182           0    0.223372    0.380006
Inverse Endogenous Variables
                 DEMAND      SUPPLY
P              2.127012    -2.12701
Q              0.481952    0.518048
The SYSLIN Procedure
Iterative Three-Stage Least Squares Estimation
Reduced Form
              Intercept           D           F           A
P              89.27387    0.667864    -0.47512    -0.80828
Q              72.89007    0.151329    0.115718    0.196861
RATS output is shown next for OLS, 2SLS, LIML, and 3SLS calculated two ways. Note that for the
supply equation the estimated coefficients, SEs, t's and probabilities were:
Constant    52.552667563   11.395623960    4.61165   0.00000399
P            0.227056969    0.095630772    2.37431   0.01758185
F            0.224496638    0.041626039    5.39318   0.00000007
A            0.375573566    0.064094682    5.85967   0.00000000
These are very close to the B34S I3SLS results, which are duplicated below:
Exogenous Variables (Predetermined)          Std. Error        t            Theil SE         Theil t
  1  CONSTANT     52.55269       12.74080          4.124755     11.39572         4.611616
  2  F            0.2244964      0.4653972E-01     4.823758     0.4162639E-01    5.393126
  3  A            0.3755747      0.7166061E-01     5.241020     0.6409520E-01    5.859638
Endogenous Variables (Jointly Dependent)     Std. Error        t            Theil SE         Theil t
  4  P            0.2270569      0.1069194         2.123627     0.9563159E-01    2.374287
These results differ from the SAS supply equation 3SLS results of
Model                 SUPPLY
Dependent Variable    Q
Parameter Estimates
                     Parameter    Standard
Variable      DF      Estimate       Error    t Value    Pr > |t|
Intercept      1      52.19720    11.89337       4.39      0.0005
P              1      0.228589    0.099673       2.29      0.0357
F              1      0.228158    0.043994       5.19      <.0001
A              1      0.361138    0.072889       4.95      0.0001
and the SAS I3SLS results of:
Parameter Estimates
                     Parameter    Standard
Variable      DF      Estimate       Error    t Value    Pr > |t|
Intercept      1      52.66182    12.80511       4.11      0.0008
P              1      0.226586    0.107459       2.11      0.0511
F              1      0.223372    0.046774       4.78      0.0002
A              1      0.380006    0.072010       5.28      <.0001
which agree with Kmenta (1986) but not with Kmenta (1971). The full RATS output, which
calculates 3SLS two different ways, is shown below.
linreg q
# constant p d
Linear Regression - Estimation by Least Squares
Dependent Variable Q
Usable Observations           20      Degrees of Freedom       17
Centered R**2           0.763789      R Bar **2          0.735999
Uncentered R**2         0.999689      T x R**2             19.994
Mean of Dependent Variable         100.89820000
Std Error of Dependent Variable      3.75649822
Standard Error of Estimate           1.93012724
Sum of Squared Residuals           63.331649953
Regression F(2,17)                      27.4847
Significance Level of F              0.00000471
Log Likelihood                        -39.90530
Durbin-Watson Statistic                1.744203
   Variable          Coeff         Std Error       T-Stat      Signif
*******************************************************************************
1. Constant      99.89542291      7.51936214     13.28509   0.00000000
2. P             -0.31629880      0.09067741     -3.48818   0.00281529
3. D              0.33463560      0.04542183      7.36729   0.00000110
linreg q
# constant p f a
Linear Regression - Estimation by Least Squares
Dependent Variable Q
Usable Observations           20      Degrees of Freedom       16
Centered R**2           0.654807      R Bar **2          0.590084
Uncentered R**2         0.999546      T x R**2             19.991
Mean of Dependent Variable         100.89820000
Std Error of Dependent Variable      3.75649822
Standard Error of Estimate           2.40508651
Sum of Squared Residuals           92.551058175
Regression F(3,16)                      10.1170
Significance Level of F              0.00056018
Log Likelihood                        -43.69905
Durbin-Watson Statistic                2.109731
   Variable          Coeff         Std Error       T-Stat      Signif
*******************************************************************************
1. Constant     58.275431202    11.462909888      5.08383   0.00011056
2. P             0.160366596     0.094883937      1.69013   0.11038810
3. F             0.248133295     0.046187854      5.37226   0.00006227
4. A             0.248302347     0.097517767      2.54623   0.02156713
instruments constant d f a
linreg(inst) q
# constant p d
Linear Regression - Estimation by Instrumental Variables
Dependent Variable Q
Usable Observations           20      Degrees of Freedom       17
Mean of Dependent Variable         100.89820000
Std Error of Dependent Variable      3.75649822
Standard Error of Estimate           1.96632066
Sum of Squared Residuals           65.729087795
J-Specification(1)                     2.535651
Significance Level of J              0.11130095
Durbin-Watson Statistic                2.009220
   Variable          Coeff         Std Error       T-Stat      Signif
*******************************************************************************
1. Constant      94.63330387      7.92083831     11.94738   0.00000000
2. P             -0.24355654      0.09648429     -2.52431   0.02183240
3. D              0.31399179      0.04694366      6.68869   0.00000381
linreg(inst) q
# constant p f a
Linear Regression - Estimation by Instrumental Variables
Dependent Variable Q
Usable Observations           20      Degrees of Freedom       16
Mean of Dependent Variable         100.89820000
Std Error of Dependent Variable      3.75649822
Standard Error of Estimate           2.45755523
Sum of Squared Residuals           96.633243702
Durbin-Watson Statistic                2.384645
   Variable          Coeff         Std Error       T-Stat      Signif
*******************************************************************************
1. Constant     49.532441699    12.010526407      4.12409   0.00079536
2. P             0.240075779     0.099933852      2.40235   0.02878451
3. F             0.255605724     0.047250071      5.40964   0.00005785
4. A             0.252924175     0.099655087      2.53800   0.02192877
source d:\r\liml.src
@liml q
# constant p d
Linear Regression - Estimation by LIML
Dependent Variable Q
Usable Observations           20      Degrees of Freedom       17
Centered R**2           0.751068      R Bar **2          0.721782
Uncentered R**2         0.999673      T x R**2             19.993
Mean of Dependent Variable         100.89820000
Std Error of Dependent Variable      3.75649822
Standard Error of Estimate           1.98141608
Sum of Squared Residuals           66.742164700
Regression F(2,17)                      25.6459
Significance Level of F              0.00000736
Log Likelihood                        -40.42982
Durbin-Watson Statistic                2.051725
   Variable          Coeff         Std Error       T-Stat      Signif
*******************************************************************************
1. Constant      93.61922028      8.03124312     11.65688   0.00000000
2. P             -0.22953809      0.09800238     -2.34217   0.03160318
3. D              0.31001345      0.04743306      6.53581   0.00000509
LIML Specification Test
Chi-Squared(1)=      3.477343 with Significance Level 0.06221456
@liml q
# constant p f a
Linear Regression - Estimation by LIML
Dependent Variable Q
Usable Observations           20      Degrees of Freedom       16
Centered R**2           0.639582      R Bar **2          0.572004
Uncentered R**2         0.999526      T x R**2             19.991
Mean of Dependent Variable         100.89820000
Std Error of Dependent Variable      3.75649822
Standard Error of Estimate           2.45755523
Sum of Squared Residuals           96.633243702
Regression F(3,16)                       9.4643
Significance Level of F              0.00078341
Log Likelihood                        -44.13068
Durbin-Watson Statistic                2.384645
   Variable          Coeff         Std Error       T-Stat      Signif
*******************************************************************************
1. Constant     49.532441699    12.010526407      4.12409   0.00079536
2. P             0.240075779     0.099933852      2.40235   0.02878451
3. F             0.255605724     0.047250071      5.40964   0.00005785
4. A             0.252924175     0.099655087      2.53800   0.02192877
LIML Specification Test
Chi-Squared(0)=      0.000000 with Significance Level NA
equation demand q
# constant p d
equation supply q
# constant p f a
* Supply does not match known answers!!
sur(inst,iterations=200) 2
# demand resid1
# supply resid2
Linear Systems - Estimation by System Instrumental Variables
Iterations Taken 6
Usable Observations           20
Dependent Variable Q
Mean of Dependent Variable         100.89820000
Std Error of Dependent Variable      3.75649822
Standard Error of Estimate           1.65490543
Sum of Squared Residuals           65.729087795
Durbin-Watson Statistic                2.009220
   Variable          Coeff         Std Error       T-Stat      Signif
*******************************************************************************
1. Constant      94.63330387      7.30265210     12.95876   0.00000000
2. P             -0.24355654      0.08895412     -2.73800   0.00618138
3. D              0.31399179      0.04327991      7.25491   0.00000000
Dependent Variable Q
Mean of Dependent Variable         100.89820000
Std Error of Dependent Variable      3.75649822
Standard Error of Estimate           2.19982161
Sum of Squared Residuals           111.30194805
Durbin-Watson Statistic                2.094475
   Variable          Coeff         Std Error       T-Stat      Signif
*******************************************************************************
4. Constant     52.552667563    11.395623960      4.61165   0.00000399
5. P             0.227056969     0.095630772      2.37431   0.01758185
6. F             0.224496638     0.041626039      5.39318   0.00000007
7. A             0.375573566     0.064094682      5.85967   0.00000000
Covariance\Correlation Matrix of Residuals
              Q                  Q
Q     3.286454389751     0.9815996605
Q     4.197924168364     5.565097402593
nonlin(parmset=structural) c0 c1 c2 d0 d1 d2 d3
compute c0 = .1
compute c1 = .1
compute c2 = .1
compute d0 = .1
compute d1 = .1
compute d2 = .1
compute d3 = .1
frml d_eq q = c0 + c1*p + c2*d
frml s_eq q = d0 + d1*p + d2*f + d3*a
nlsystem(inst,parmset=structural,outsigma=v) * * d_eq s_eq
GMM-No ZU Dependence
Convergence in     6 Iterations. Final criterion was  0.0000065 <  0.0000100
Usable Observations           20
Function Value                 2.98311941
J-Specification(1)               2.983119
Significance Level of J        0.08413697
Dependent Variable Q
Mean of Dependent Variable         100.89820000
Std Error of Dependent Variable      3.75649822
Standard Error of Estimate           1.81285807
Sum of Squared Residuals           65.729087790
Durbin-Watson Statistic                2.009220
Dependent Variable Q
Mean of Dependent Variable         100.89820000
Std Error of Dependent Variable      3.75649822
Standard Error of Estimate           2.35905449
Sum of Squared Residuals           111.30276210
Durbin-Watson Statistic                2.094461
   Variable          Coeff         Std Error       T-Stat      Signif
*******************************************************************************
1. C0            94.63330387      7.30265212     12.95876   0.00000000
2. C1            -0.24355654      0.08895412     -2.73800   0.00618138
3. C2             0.31399179      0.04327991      7.25491   0.00000000
4. D0            52.55266757     11.39562399      4.61165   0.00000399
5. D1             0.22705697      0.09563077      2.37431   0.01758185
6. D2             0.22449664      0.04162604      5.39318   0.00000007
7. D3             0.37557357      0.06409468      5.85967   0.00000000
4.4 Exactly identified systems
Table 4.6 shows the Kmenta supply and demand model modified to be exactly identified.
In this form of the model the exogenous variable a was removed from the supply equation. In
this case $\Pi$ can be directly estimated with OLS and does not have to be calculated from
$\Gamma$ and $B^{-1}$ using (4.1-4). It will be shown below that the LIML, 2SLS and 3SLS results
are all the same. If $\Pi$ is instead calculated from the biased OLS estimates of the structural
equations, it will, however, not be the same.
Table 4.6 Exactly Identified Kmenta Problem
/; Modified PROBLEM FROM KMENTA (1971) PAGE 565 - 582
b34sexec options ginclude('b34sdata.mac') member(kmenta);
b34srun;
b34sexec simeq printsys reduced ols liml ls2 ls3 ils3 icov ipr=6
itmax=2000 kcov=diag ;
heading=('Modified test case from kmenta 1971 pp 565-582' ) ;
* the variable a has been removed from supply equation ;
exogenous constant d f ;
endogenous p q ;
model lvar=q rvar=(constant p d)
name=('demand eq.') ;
model lvar=q rvar=(constant p f)
name=('supply eq.') ;
b34seend ;
b34sexec matrix;
call loaddata;
call olsq(q d f :print);
call olsq(p d f :print);
b34srun;
Edited output from running the code in Table 4.6 is shown below and will show alternative ways
to calculate the constrained reduced form (t statistics in parentheses):

Q = 71.7276 + .18278 D + .11739 F          (4.4-1)
    (15.93)   (3.86)     (2.67)

P = 85.1843 + .4346 D  - .28520 F          (4.4-2)
    (10.19)   (4.95)     (-3.49)

which was estimated in (4.4-1) and (4.4-2) with OLS.
Modified test case from kmenta 1971 pp 565-582
Least Squares Solution for System Number
Condition Number of Matrix is greater than
Relative Numerical Error in the Solution
LHS Endogenous Variable No.
2
Exogenous Variables (Predetermined)
1
CONSTANT
2
D
1
demand eq.
21.04911571706159
1.301987681166638E-11
Q
99.89542
0.3346356
Std. Error
7.519362
0.4542183E-01
t
13.28509
7.367285
Endogenous Variables (Jointly Dependent)
3
P
-0.3162988
Std. Error
0.9067741E-01
t
-3.488177
Residual Variance for Structural Disturbances
Ratio of Norm Residual to Norm LHS
3.725391173733892
1.762488253954560E-02
Modified test case from kmenta 1971 pp 565-582
Least Squares Solution for System Number
Condition Number of Matrix is greater than
Relative Numerical Error in the Solution
LHS Endogenous Variable No.
2
Exogenous Variables (Predetermined)
1
CONSTANT
2
F
2
supply eq.
17.64779394899586
1.349763156429639E-11
Q
65.56501
0.2137827
Endogenous Variables (Jointly Dependent)
3
P
0.1467363
Residual Variance for Structural Disturbances
Ratio of Norm Residual to Norm LHS
Std. Error
12.76481
0.5080064E-01
t
5.136387
4.208269
Std. Error
0.1089446
t
1.346889
7.650185613573186
2.525668087747731E-02
Modified test case from kmenta 1971 pp 565-582
Coefficients of the Reduced Form Equations.
Least Squares Solution.
Condition number of matrix used to find the reduced form coefficients is no smaller than
P
CONSTANT
D
F
1
2
3
4.319326581036200
Q
1
74.14
0.7227
-0.4617
2
76.44
0.1060
0.1460
Mean sum of squares of residuals for the reduced form equations.
1
2
P
Q
0.21861D+02
0.44308D+01
Condition Number of columns of exogenous variables,
9.7857
Modified test case from kmenta 1971 pp 565-582
Limited Information - Maximum Likelihood Solution f
1
demand eq.
Rank and Condition Number of Exogenous Columns
Rank and Condition Number of Endogenous Variables orthogonal to X(K)
Rank and Condition Number of Endogenous Variables orthogonal to X
Value of LIML
Parameter is
1.000000000000000
Condition Number of Matrix is greater than
Relative Numerical Error in the Solution
LHS Endogenous Variable No.
2
8.517463415017575
4.390231825107355E-12
Q
Standard Deviation Equals 2SLSQ Standard Deviation.
Exogenous Variables (Predetermined)
1
CONSTANT
2
D
106.7894
0.3616812
Endogenous Variables (Jointly Dependent)
2
1
2
8.5174634
1.0000000
1.0000000
3
P
-0.4115989
Residual Variance for Structural Disturbances
Ratio of Norm Residual to Norm LHS
3.967444759365652
1.818845186115603E-02
Modified test case from kmenta 1971 pp 565-582
Limited Information - Maximum Likelihood Solution f
2
supply eq.
Rank and Condition Number of Exogenous Columns
Rank and Condition Number of Endogenous Variables orthogonal to X(K)
Rank and Condition Number of Endogenous Variables orthogonal to X
Value of LIML
Parameter is
2
1
2
7.8643511
1.0000000
1.0000000
1.000000000000000
Condition Number of Matrix is greater than
Relative Numerical Error in the Solution
LHS Endogenous Variable No.
2
7.864351104449048
5.058888259015094E-12
Q
Standard Deviation Equals 2SLSQ Standard Deviation.
Exogenous Variables (Predetermined)
1
CONSTANT
2
F
35.90387
0.2373297
Endogenous Variables (Jointly Dependent)
3
P
0.4205434
Residual Variance for Structural Disturbances
Ratio of Norm Residual to Norm LHS
10.49268888645498
2.957901371051407E-02
Modified test case from kmenta 1971 pp 565-582
Modified test case from kmenta 1971 pp 565-582
Coefficients of the Reduced Form Equations.
LIMLE Solution.
Condition number of matrix used to find the reduced form coefficients is no smaller than
P
CONSTANT
D
F
1
2
3
2.403435013906650
Q
1
85.18
0.4346
-0.2852
2
71.73
0.1828
0.1174
Modified test case from kmenta 1971 pp 565-582
Two Stage Least Squares Solution for System Number
Condition Number of Matrix is greater than
Relative Numerical Error in the Solution
LHS Endogenous Variable No.
2
Exogenous Variables (Predetermined)
1
CONSTANT
2
D
1
demand eq.
32.58122209700925
2.267663108215286E-11
Q
106.7894
0.3616812
Endogenous Variables (Jointly Dependent)
3
P
-0.4115989
Residual Variance for Structural Disturbances
Ratio of Norm Residual to Norm LHS
Std. Error
11.14355
0.5640608E-01
t
9.583069
6.412096
Std. Error
0.1448445
t
-2.841660
Theil SE
10.27384
0.5200383E-01
Theil SE
0.1335401
Theil t
10.39430
6.954895
Theil t
-3.082213
3.967444759365655
1.818845186115604E-02
Modified test case from kmenta 1971 pp 565-582
Two Stage Least Squares Solution for System Number
Condition Number of Matrix is greater than
Relative Numerical Error in the Solution
LHS Endogenous Variable No.
2
Exogenous Variables (Predetermined)
1
CONSTANT
2
F
2
supply eq.
22.96654225297699
2.323008755765498E-11
Q
35.90387
0.2373297
Endogenous Variables (Jointly Dependent)
3
P
0.4205434
Residual Variance for Structural Disturbances
Ratio of Norm Residual to Norm LHS
Std. Error
18.86754
0.6019217E-01
t
1.902944
3.942866
Theil SE
17.39501
0.5549444E-01
Theil t
2.064032
4.276639
Std. Error
0.1660421
t
2.532751
Theil SE
0.1530833
Theil t
2.747154
10.49268888645498
2.957901371051407E-02
Modified test case from kmenta 1971 pp 565-582
Coefficients of the Reduced Form Equations.
Two Stage Least Squares Solution
Condition number of matrix used to find the reduced form coefficients is no smaller than
P
CONSTANT
D
F
1
2
3
2.403435013906650
Q
1
85.18
0.4346
-0.2852
2
71.73
0.1828
0.1174
Modified test case from kmenta 1971 pp 565-582
Three Stage Least Squares Solution for System Number
LHS Endogenous Variable No.
2
Exogenous Variables (Predetermined)
1
CONSTANT
2
D
1
106.7894
0.3616812
Endogenous Variables (Jointly Dependent)
3
P
-0.4115989
Std. Error
11.14355
0.5640608E-01
t
9.583069
6.412096
Std. Error
0.1448445
t
-2.841660
Residual Variance (For Structural Disturbances)
2
Exogenous Variables (Predetermined)
1
CONSTANT
2
F
Theil SE
10.27384
0.5200383E-01
Theil SE
0.1335401
Theil t
10.39430
6.954895
Theil t
-3.082213
3.372328
Three Stage Least Squares Solution for System Number
LHS Endogenous Variable No.
demand eq.
Q
2
supply eq.
Q
35.90387
0.2373297
Endogenous Variables (Jointly Dependent)
3
P
0.4205434
Std. Error
18.86754
0.6019217E-01
t
1.902944
3.942866
Theil SE
17.39501
0.5549444E-01
Theil t
2.064032
4.276639
Std. Error
0.1660421
t
2.532751
Theil SE
0.1530833
Theil t
2.747154
Residual Variance (For Structural Disturbances)
8.918786
Coefficients of the Reduced Form Equations.
Three Stage Least Squares Solution using Orthogonal Factorization.
Condition number of matrix used to find the reduced form coefficients is no smaller than
P
CONSTANT
D
F
1
2
3
2.403435013906646
Q
1
85.18
0.4346
-0.2852
2
71.73
0.1828
0.1174
Note that the following OLS regressions successfully replicate the constrained reduced form
values calculated by LIML, 2SLS and 3SLS models. In such exactly identified models it is
possible to proceed from the reduced form to the coefficients of the estimated simultaneous
structural model as shown in Table 4.1 for the theoretical model.
B34S(r) Matrix Command. d/m/y 13/ 5/08. h:m:s  8: 9:49.
=> CALL LOADDATA$
=> CALL OLSQ(Q D F :PRINT)$
Ordinary Least Squares Estimation
Dependent variable                  Q
Centered R**2                       0.7142164973143195
Adjusted R**2                       0.6805949087630629
Residual Sum of Squares             76.62264354549249
Residual Variance                   4.507214326205441
Standard Error                      2.123020095572682
Total Sum of Squares                268.1142991999998
Log Likelihood                      -41.81037433562074
Mean of the Dependent Variable      100.8982000000000
Std. Error of Dependent Variable    3.756498223780113
Sum Absolute Residuals              32.24420684107891
F( 2, 17)                           21.24279452844673
F Significance                      0.9999762143066244
1/Condition XPX                     5.775396842473943E-07
Maximum Absolute Residual           4.421086526017319
Number of Observations              20

Variable   Lag   Coefficient    SE               t
D            0   0.18278440     0.47299583E-01   3.8643977
F            0   0.11738935     0.44030665E-01   2.6660816
CONSTANT     0   71.727578      4.5035392        15.926935
=> CALL OLSQ(P D F :PRINT)$
Ordinary Least Squares Estimation
Dependent variable                  P
Centered R**2                       0.6043888119424351
Adjusted R**2                       0.5578463192297805
Residual Sum of Squares             263.9721582328006
Residual Variance                   15.52777401369415
Standard Error                      3.940529661567611
Total Sum of Squares                667.2514989500000
Log Likelihood                      -54.17988429200851
Mean of the Dependent Variable      100.0190500000000
Std. Error of Dependent Variable    5.926086393627488
Sum Absolute Residuals              56.50496104816216
F( 2, 17)                           12.98574220495295
F Significance                      0.9996226165906434
1/Condition XPX                     5.775396842473943E-07
Maximum Absolute Residual           9.070540097816391
Number of Observations              20

Variable   Lag   Coefficient    SE               t
D            0   0.43463860     0.87792579E-01   4.9507442
F            0   -0.28520325    0.81725152E-01   -3.4897855
CONSTANT     0   85.184338      8.3590023        10.190730
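The step from the reduced form back to the structural coefficients can be checked by hand in this exactly identified case. The sketch below is written in Python rather than B34S and is only an illustration: the variable names are hypothetical, and the inputs are the OLS reduced-form coefficients copied from the two OLSQ listings. Substituting the reduced-form equation for P into each structural equation and matching coefficients gives one equation per unknown, so each structural coefficient is a simple ratio or difference of reduced-form coefficients.

```python
# Reduced-form OLS coefficients copied from the two OLSQ listings.
pi_q = {"const": 71.727578, "d": 0.18278440, "f": 0.11738935}   # Q on D, F
pi_p = {"const": 85.184338, "d": 0.43463860, "f": -0.28520325}  # P on D, F

# Demand: Q = a0 + a1*P + a2*D. F is excluded, so F identifies a1.
a1 = pi_q["f"] / pi_p["f"]
a2 = pi_q["d"] - a1 * pi_p["d"]
a0 = pi_q["const"] - a1 * pi_p["const"]

# Supply: Q = b0 + b1*P + b2*F. D is excluded, so D identifies b1.
b1 = pi_q["d"] / pi_p["d"]
b2 = pi_q["f"] - b1 * pi_p["f"]
b0 = pi_q["const"] - b1 * pi_p["const"]

print("demand:", a0, a1, a2)
print("supply:", b0, b1, b2)
```

The printed values can be compared with the LIML, 2SLS and 3SLS coefficients listed earlier in this section for the demand and supply equations.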
4.5 Analysis of OLS, 2SLS and 3SLS using Matrix Command
The matrix command, documented in Chapter 16, provides a means by which to
illustrate the estimation of OLS, 2SLS and 3SLS models using “classic textbook” formulas.
Table 4.7 shows code that implements OLS, 2SLS, 3SLS and FIML estimation using
these formulas:
Table 4.7 Matrix Command Implementation of OLS, 2SLS, 3SLS and FIML
/$
/$ Estimates Kmenta Problem with Matrix command.
/$ Purpose is to illustrate OLS/2SLS/3SLS/FIML both with
/$ SIMEQ and with Matrix Commands.
/$
/$ FIML SE same as 3SLS asymptotically (See Greene 5e page 408)
/$
/$ Problem Discussed in "Specifying and Diagnostically Testing
/$ Econometric Models" Chapter 4 Third Edition
/$
%b34slet verbose=0;   /$ set =1 to "test" matrix setup. Usually set=0
%b34slet dosimeq=1;   /$ set =1 to run the SIMEQ command as well as matrix
B34SEXEC DATA NOHEAD CORR$
INPUT Q P D F A $
LABEL Q = 'Food consumption per head'$
LABEL P = 'Ratio of Food Prices to consumer prices'$
LABEL D = 'Disposable Income in constant prices'$
LABEL F = 'Ratio of T-1 years price to general P'$
LABEL A = 'Time'$
COMMENT=('KMENTA(1971) PAGE 565 ANSWERS PAGE 582')$
DATACARDS$
 98.485 100.323  87.4  98.0  1   99.187 104.264  97.6  99.1  2
102.163 103.435  96.7  99.1  3  101.504 104.506  98.2  98.1  4
104.240  98.001  99.8 110.8  5  103.243  99.456 100.5 108.2  6
103.993 101.066 103.2 105.6  7   99.900 104.763 107.8 109.8  8
100.350  96.446  96.6 108.7  9  102.820  91.228  88.9 100.6 10
 95.435  93.085  75.1  81.0 11   92.424  98.801  76.9  68.6 12
 94.535 102.908  84.6  70.9 13   98.757  98.756  90.6  81.4 14
105.797  95.119 103.1 102.3 15  100.225  98.451 105.1 105.0 16
103.522  86.498  96.4 110.5 17   99.929 104.016 104.4  92.5 18
105.223 105.769 110.7  89.3 19  106.232 113.490 127.1  93.0 20
B34SRETURN$
B34SEEND$
%b34sif(&dosimeq.eq.1)%then;
B34SEXEC SIMEQ PRINTSYS REDUCED OLS LIML LS2 LS3 FIML FIMLC
KCOV=DIAG IPR=6$
HEADING=('Test Case from Kmenta (1971) Pages 565 - 582 ' ) $
EXOGENOUS CONSTANT D F A $
ENDOGENOUS P Q $
MODEL LVAR=Q RVAR=(CONSTANT P D)
NAME=('Demand Equation')$
MODEL LVAR=Q RVAR=(CONSTANT P F A) NAME=('Supply Equation')$
B34SEEND$
%b34sendif;
b34sexec matrix;
call loaddata;
verbose=0;
%b34sif(&verbose.ne.0)%then;
verbose=1;
%b34sendif;
x_1=mfam(catcol(constant p d));
x_2=mfam(catcol(constant p f a));
x_1px_1=transpose(x_1)*x_1;
x_2px_2=transpose(x_2)*x_2;
x_1py_1=transpose(x_1)*vfam(q);
x_2py_2=transpose(x_2)*vfam(q);
d1=inv(x_1px_1)*x_1py_1;
d2=inv(x_2px_2)*x_2py_2;
call print('OLS eq 1 ',d1 );
call print('OLS eq 2 ',d2 );
* 2SLS ;
* z_i is right hand side of equation i ;
x       = mfam(catcol(constant d f a));
xpx     = transpose(x)*x;
z_1     = mfam(catcol(constant p d) );
z_2     = mfam(catcol(constant p f a));
xpz_1   = transpose(x)*z_1;
xpz_2   = transpose(x)*z_2;
xpy_1   = transpose(x)*vfam(q);
xpy_2   = transpose(x)*vfam(q);
y_1py_1 = vfam(q)*vfam(q);
y_2py_2 = vfam(q)*vfam(q);
y_1py_2 = vfam(q)*vfam(q);
ls2eq1=inv(transpose(xpz_1)*inv(xpx)*xpz_1)*
(transpose(xpz_1)*inv(xpx)*xpy_1);
call print('Two stage estimates Equation 1',ls2eq1);
fit1=vfam(q)-z_1*ls2eq1;
sigma11=(y_1py_1 - (2.*vfam(q)*z_1*ls2eq1) +
ls2eq1*transpose(z_1)*z_1*ls2eq1)/17.;
if(verbose.ne.0)then;
call print('sigma11            ',sigma11:);
call print('Residual Variance 1',sigma11*sigma11:);
call print('Test 1             ',(fit1*fit1)/ 17.:);
call print('Large sample ',(fit1*fit1)/ 20.:);
endif;
varcoef1=sigma11*inv(transpose(z_1)*x*inv(xpx)*transpose(x)*z_1);
call print('Asymptotic Covariance Matrix eq 1 ',varcoef1);
ls2eq2=inv(transpose(xpz_2)*inv(xpx)*xpz_2)*
(transpose(xpz_2)*inv(xpx)*xpy_2);
call print('Two stage estimates Equation 2',ls2eq2);
fit2=vfam(q)-z_2*ls2eq2;
sigma22=(y_2py_2 - (2.*vfam(q)*z_2*ls2eq2) +
ls2eq2*transpose(z_2)*z_2*ls2eq2)/16.;
if(verbose.ne.0)then;
call print('sigma22            ',sigma22:);
call print('Residual Variance 2',sigma22*sigma22:);
call print('Test 2             ',(fit2*fit2)/ 16.:);
call print('Large Sample ',(fit2*fit2)/ 20.:);
endif;
sigma12=(y_1py_2 - (vfam(q)*z_1*ls2eq1) - (vfam(q)*z_2*ls2eq2) +
ls2eq1*transpose(z_1)*z_2*ls2eq2)/20.;
if(verbose.ne.0)call print('test sigma12 ',sigma12);
varcoef2=sigma22*inv(transpose(z_2)*x*inv(xpx)*transpose(x)*z_2);
call print('Asymptotic Covariance Matrix eq 2 ',varcoef2);
* Get sigma(i,j) from fits ;
s=mfam(catcol(fit1,fit2));
sigma=(transpose(s)*s)/20.;
call print('Large Sample sigma (Jennings) ',sigma);
covar1=sigma(1,1)*inv(transpose(xpz_1)*inv(xpx)*xpz_1);
covar2=sigma(2,2)*inv(transpose(xpz_2)*inv(xpx)*xpz_2);
call print('Estimated Covariance Matrix - Large Sample':);
call print(covar1,covar2);
ls2se=dsqrt(array(:covar1(1,1),covar1(2,2),covar1(3,3)
covar2(1,1),covar2(2,2),covar2(3,3) covar2(4,4)));
call print('SE of LS2 Model Equations - Large Sample',ls2se);
sssigma(1,1)=sigma(1,1)*(20./17.);
sssigma(1,2)=sigma(1,2)*(20./dsqrt(17.*16.));
sssigma(2,1)=sigma(2,1)*(20./dsqrt(17.*16.));
sssigma(2,2)=sigma(2,2)*(20./16.);
call print('Kmenta (Small Sample Sigma ',sssigma);
covar1=sssigma(1,1)*inv(transpose(xpz_1)*inv(xpx)*xpz_1);
covar2=sssigma(2,2)*inv(transpose(xpz_2)*inv(xpx)*xpz_2);
call print('Estimated Covariance Matrix - Small Sample':);
call print(covar1,covar2);
ls2se=dsqrt(array(:diag(covar1),diag(covar2)));
call print('SE of LS2 Model Equations - Small Sample',ls2se);
* LS3 calculation ;
xpxinv=inv(xpx);
/$ sigma=inv(sssigma);
sigma=inv(sigma);
term11= sigma(1,1)*(transpose(xpz_1)*xpxinv*xpz_1);
term12= sigma(1,2)*(transpose(xpz_1)*xpxinv*xpz_2);
term21= sigma(2,1)*(transpose(xpz_2)*xpxinv*xpz_1);
term22= sigma(2,2)*(transpose(xpz_2)*xpxinv*xpz_2);
left1 =catcol(term11 term12);
left2 =catcol(term21 term22);
left =catrow(left1 left2);
if(verbose.ne.0)
call print(term11 term12 term21 term22 left1 left2 left);
right1=(sigma(1,1)*(transpose(xpz_1)*xpxinv*xpy_1)) +
(sigma(1,2)*(transpose(xpz_1)*xpxinv*xpy_2));
right2=(sigma(2,1)*(transpose(xpz_2)*xpxinv*xpy_1)) +
(sigma(2,2)*(transpose(xpz_2)*xpxinv*xpy_2));
right=catrow(right1 right2);
call print(right1 right2 right,inv(left));
ls3=inv(left)*right;
call print('Three Stage Least Squares ',ls3);
ls3se = dsqrt(diag(inv(left)));
t3sls=array(norows(ls3):ls3(,1))/afam(ls3se);
call print('Three Stage Least Squares SE',ls3se);
call print('Three Stage Least Squares t ',t3sls);
* FIML following Kmenta (1971) pages 578 - 581 ;
* q = f(constant P D )                        ;
* q = g(constant p F A)                       ;
* q = a1 + a2*p + a3*d          + u1 ;
* q = b1 + b2*p + b3*f + b4*a + u2;
y = transpose(mfam(catcol(q p)));
x = transpose(mfam(catcol(constant d f a)));
gt= 2.* dfloat(norows(y));
t =dfloat(norows(y));
call print('Using 3sls starting values ',ls3);
/$
/$
/$
/$
/$
/$
/$
a1=sfam(ls3(1));
a2=sfam(ls3(2));
a3=sfam(ls3(3));
b1=sfam(ls3(4));
b2=sfam(ls3(5));
b3=sfam(ls3(6));
b4=sfam(ls3(7));
program model;
bigb     = matrix(2,2: 1.0, -1.0*a2,
                       1.0, -1.0*b2);
biggamma = matrix(2,4:-1.0*a1, -1.0*a3,  0.0,     0.0,
                      -1.0*b1,  0.0,    -1.0*b3, -1.0*b4);
u1u2=bigb*y+biggamma*x;
phi      = u1u2*transpose(u1u2);
/$ General purpose FIML setup if there are no identities
/$ For a discussion of Formulas see Kmenta (1971) page 578-581
func=(-1.0*(gt*pi())/2.0)
- ((t/2.0)*dlog(dmax1(dabs(det(phi)) ,.1d-30) ))
+ ( t     *dlog(dmax1(dabs(det(bigb)),.1d-30) ))
- (.5*sum(transpose(u1u2)*inv(phi)*u1u2));
call outstring(3, 3,'Function');
call outdouble(36,3,func);
call outdouble(4, 4, a1);
call outdouble(36,4, a2);
call outdouble(55,4, a3);
call outdouble(4 ,5, b1);
call outdouble(36,5, b2);
call outdouble(55,5, b3);
call outdouble(4, 6, b4);
return;
end;
call print(model);
rvec =vector(7:ls3);
ll   =vector(7:) -1.d+2;
uu   =vector(7:) +1.d+3;
call echooff;
call cmaxf2(func :name model
:parms a1 a2 a3 b1 b2 b3 b4
:ivalue rvec
:lower ll
:upper UU
:maxit 10000
:maxfun 10000
:maxg   10000
:print);
b34srun;
The matrices X_1 and X_2 are built with the catcol command and the OLS estimates for
equations 1 and 2 are respectively D1 and D2. Edited results show.
OLS eq 1
D1      = Vector of      3  elements
     99.8954     -0.316299      0.334636
=>  CALL PRINT('OLS eq 2 ',D2 )$
OLS eq 2
D2      = Vector of      4  elements
     58.2754      0.160367      0.248133      0.248302
which are consistent with what was obtained with the simeq command. Next using the
"textbook" 2SLS formula

\begin{align}
\hat{\delta}_1 &= [Z_1'X(X'X)^{-1}X'Z_1]^{-1}\,[Z_1'X(X'X)^{-1}X'y_1] \\
\hat{\delta}_2 &= [Z_2'X(X'X)^{-1}X'Z_2]^{-1}\,[Z_2'X(X'X)^{-1}X'y_2] \\
\hat{\sigma}_{ij} &= [\hat{e}_1,\hat{e}_2]'[\hat{e}_1,\hat{e}_2]/T \tag{4.5-1}
\end{align}

we obtain the 2SLS estimates and the error covariance matrix $\hat{\sigma}_{ij}$ which is needed
for the 3SLS estimates. Edited results match what was found earlier with simeq. Note that call
echooff; has been turned off to illustrate the steps of the calculation.
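As an independent cross-check of the 2SLS formula in (4.5-1), the following sketch recomputes both equations in plain Python. It is not part of the B34S setup: the helper names (transpose, matmul, solve, two_sls) are hypothetical, the linear systems are solved by simple Gauss-Jordan elimination rather than the QR approach used by simeq, and the data are the twenty Kmenta observations from the DATACARDS section of Table 4.7.

```python
# Kmenta (1971) data from Table 4.7: Q, P, D, F; A is the time trend 1..20.
q = [98.485, 99.187, 102.163, 101.504, 104.240, 103.243, 103.993, 99.900,
     100.350, 102.820, 95.435, 92.424, 94.535, 98.757, 105.797, 100.225,
     103.522, 99.929, 105.223, 106.232]
p = [100.323, 104.264, 103.435, 104.506, 98.001, 99.456, 101.066, 104.763,
     96.446, 91.228, 93.085, 98.801, 102.908, 98.756, 95.119, 98.451,
     86.498, 104.016, 105.769, 113.490]
d = [87.4, 97.6, 96.7, 98.2, 99.8, 100.5, 103.2, 107.8, 96.6, 88.9,
     75.1, 76.9, 84.6, 90.6, 103.1, 105.1, 96.4, 104.4, 110.7, 127.1]
f = [98.0, 99.1, 99.1, 98.1, 110.8, 108.2, 105.6, 109.8, 108.7, 100.6,
     81.0, 68.6, 70.9, 81.4, 102.3, 105.0, 110.5, 92.5, 89.3, 93.0]
a = [float(t) for t in range(1, 21)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(m1, m2):
    cols = transpose(m2)
    return [[sum(u * v for u, v in zip(row, col)) for col in cols] for row in m1]


def solve(lhs, rhs):
    """Solve lhs * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(lhs)
    aug = [list(r1) + list(r2) for r1, r2 in zip(lhs, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def two_sls(z, x, y):
    """delta = [Z'X (X'X)^-1 X'Z]^-1 [Z'X (X'X)^-1 X'y], as in (4.5-1)."""
    xt = transpose(x)
    xpx = matmul(xt, x)
    xpz = matmul(xt, z)
    xpy = matmul(xt, [[v] for v in y])
    zpx = transpose(xpz)
    left = matmul(zpx, solve(xpx, xpz))
    right = matmul(zpx, solve(xpx, xpy))
    return [row[0] for row in solve(left, right)]


x = [[1.0, d[i], f[i], a[i]] for i in range(20)]    # all exogenous variables
z1 = [[1.0, p[i], d[i]] for i in range(20)]         # demand right hand side
z2 = [[1.0, p[i], f[i], a[i]] for i in range(20)]   # supply right hand side

demand = two_sls(z1, x, q)
supply = two_sls(z2, x, q)
print("2SLS demand:", demand)
print("2SLS supply:", supply)
```

The two printed vectors can be compared with LS2EQ1 and LS2EQ2 in the matrix command output. Because the sketch inverts cross-product matrices directly, it shares the accuracy limitations of the "textbook" formulas.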
Two stage estimates Equation 1
LS2EQ1
= Vector of
94.6333
=>
3
-0.243557
FIT1=VFAM(Q)-Z_1*LS2EQ1$
elements
0.313992
=>
=>
SIGMA11=(Y_1PY_1 - (2.*VFAM(Q)*Z_1*LS2EQ1) +
LS2EQ1*TRANSPOSE(Z_1)*Z_1*LS2EQ1)/17.$
=>
IF(VERBOSE.NE.0)THEN$
=>
CALL PRINT('sigma11
=>
CALL PRINT('Residual Variance
=>
CALL PRINT('Test 1
=>
CALL PRINT('Large sample ',(FIT1*FIT1)/ 20.:)$
=>
ENDIF$
=>
VARCOEF1=SIGMA11*INV(TRANSPOSE(Z_1)*X*INV(XPX)*TRANSPOSE(X)*Z_1)$
=>
CALL PRINT('Asymptotic Covariance Matrix eq 1 ',VARCOEF1)$
',SIGMA11:)$
1',SIGMA11*SIGMA11:)$
',(FIT1*FIT1)/ 17.:)$
Asymptotic Covariance Matrix eq 1
VARCOEF1= Matrix of
1
2
3
3
1
62.7397
-0.673422
0.493016E-01
by
3
2
-0.673422
0.930922E-02
-0.264190E-02
elements
3
0.493016E-01
-0.264190E-02
0.220371E-02
=>
=>
LS2EQ2=INV(TRANSPOSE(XPZ_2)*INV(XPX)*XPZ_2)*
(TRANSPOSE(XPZ_2)*INV(XPX)*XPY_2)$
=>
CALL PRINT('Two stage estimates Equation 2',LS2EQ2)$
Two stage estimates Equation 2
LS2EQ2
= Vector of
49.5324
4
0.240076
elements
0.255606
0.252924
=>
FIT2=VFAM(Q)-Z_2*LS2EQ2$
=>
=>
SIGMA22=(Y_2PY_2 - (2.*VFAM(Q)*Z_2*LS2EQ2) +
LS2EQ2*TRANSPOSE(Z_2)*Z_2*LS2EQ2)/16.$
=>
IF(VERBOSE.NE.0)THEN$
=>
CALL PRINT('sigma22
=>
CALL PRINT('Residual Variance 2',SIGMA22*SIGMA22:)$
=>
CALL PRINT('Test 2
=>
CALL PRINT('Large Sample ',(FIT2*FIT2)/ 20.:)$
=>
ENDIF$
',SIGMA22:)$
',(FIT2*FIT2)/ 16.:)$
=>
=>
SIGMA12=(Y_1PY_2 - (VFAM(Q)*Z_1*LS2EQ1) - (VFAM(Q)*Z_2*LS2EQ2) +
LS2EQ1*TRANSPOSE(Z_1)*Z_2*LS2EQ2)/20.$
=>
IF(VERBOSE.NE.0)CALL PRINT('test sigma12 ',SIGMA12)$
=>
VARCOEF2=SIGMA22*INV(TRANSPOSE(Z_2)*X*INV(XPX)*TRANSPOSE(X)*Z_2)$
=>
CALL PRINT('Asymptotic Covariance Matrix eq 2 ',VARCOEF2)$
Asymptotic Covariance Matrix eq 2
VARCOEF2= Matrix of
1
2
3
4
1
144.253
-1.09541
-0.323818
-0.295229
4
by
2
-1.09541
0.998677E-02
0.936222E-03
0.579069E-03
4
elements
3
-0.323818
0.936222E-03
0.223257E-02
0.137681E-02
4
-0.295229
0.579069E-03
0.137681E-02
0.993114E-02
=>
* GET SIGMA(I,J) FROM FITS $
=>
S=MFAM(CATCOL(FIT1,FIT2))$
=>
SIGMA=(TRANSPOSE(S)*S)/20.$
=>
CALL PRINT('Large Sample sigma (Jennings) ',SIGMA)$
Large Sample sigma (Jennings)
SIGMA
1
2
= Matrix of
1
3.28645
3.59324
2
by
2
elements
2
3.59324
4.83166
=>
COVAR1=SIGMA(1,1)*INV(TRANSPOSE(XPZ_1)*INV(XPX)*XPZ_1)$
=>
COVAR2=SIGMA(2,2)*INV(TRANSPOSE(XPZ_2)*INV(XPX)*XPZ_2)$
=>
CALL PRINT('Estimated Covariance Matrix - Large Sample':)$
Estimated Covariance Matrix - Large Sample
=>
CALL PRINT(COVAR1,COVAR2)$
COVAR1
1
2
3
COVAR2
1
2
3
4
=>
= Matrix of
1
53.3287
-0.572408
0.419064E-01
= Matrix of
1
115.402
-0.876328
-0.259055
-0.236183
3
by
2
-0.572408
0.791284E-02
-0.224561E-02
4
by
2
-0.876328
0.798942E-02
0.748977E-03
0.463256E-03
3
elements
3
0.419064E-01
-0.224561E-02
0.187315E-02
4
elements
3
-0.259055
0.748977E-03
0.178606E-02
0.110144E-02
LS2SE=DSQRT(ARRAY(:DIAG(COVAR1),DIAG(COVAR2)))$
4
-0.236183
0.463256E-03
0.110144E-02
0.794491E-02
=>
CALL PRINT('SE of LS2 Model Equations - Large Sample',LS2SE)$
SE of LS2 Model Equations - Large Sample
LS2SE
= Array
of
7.30265
7
elements
0.889541E-01
0.432799E-01
=>
SSSIGMA(1,1)=SIGMA(1,1)*(20./17.)$
=>
SSSIGMA(1,2)=SIGMA(1,2)*(20./DSQRT(17.*16.))$
=>
SSSIGMA(2,1)=SIGMA(2,1)*(20./DSQRT(17.*16.))$
=>
SSSIGMA(2,2)=SIGMA(2,2)*(20./16.)$
=>
CALL PRINT('Kmenta (Small Sample Sigma
10.7425
0.893836E-01
0.422617E-01
0.891342E-01
0.472501E-01
0.996551E-01
',SSSIGMA)$
Kmenta (Small Sample Sigma
SSSIGMA = Matrix of
1
2
1
3.86642
4.35744
2
by
2
elements
2
4.35744
6.03958
=>
COVAR1=SSSIGMA(1,1)*INV(TRANSPOSE(XPZ_1)*INV(XPX)*XPZ_1)$
=>
COVAR2=SSSIGMA(2,2)*INV(TRANSPOSE(XPZ_2)*INV(XPX)*XPZ_2)$
=>
CALL PRINT('Estimated Covariance Matrix - Small Sample':)$
Estimated Covariance Matrix - Small Sample
=>
CALL PRINT(COVAR1,COVAR2)$
COVAR1
1
2
3
COVAR2
1
2
3
4
= Matrix of
1
62.7397
-0.673422
0.493016E-01
= Matrix of
1
144.253
-1.09541
-0.323818
-0.295229
3
by
3
2
-0.673422
0.930922E-02
-0.264190E-02
4
by
3
0.493016E-01
-0.264190E-02
0.220371E-02
4
2
-1.09541
0.998677E-02
0.936222E-03
0.579069E-03
elements
elements
3
-0.323818
0.936222E-03
0.223257E-02
0.137681E-02
4
-0.295229
0.579069E-03
0.137681E-02
0.993114E-02
=>
=>
LS2SE=DSQRT(ARRAY(:COVAR1(1,1),COVAR1(2,2),COVAR1(3,3)
COVAR2(1,1),COVAR2(2,2),COVAR2(3,3) COVAR2(4,4)))$
=>
CALL PRINT('SE of LS2 Model Equations - Small Sample',LS2SE)$
SE of LS2 Model Equations - Small Sample
LS2SE
= Array
7.92084
of
7
elements
0.964843E-01
0.469437E-01
12.0105
0.999339E-01
Note that the estimated asymptotic covariance matrix for each equation was calculated as

\begin{align}
&\hat{\sigma}_{11}\,[Z_1'X(X'X)^{-1}X'Z_1]^{-1} \\
&\hat{\sigma}_{22}\,[Z_2'X(X'X)^{-1}X'Z_2]^{-1} \tag{4.5-2}
\end{align}
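The large-sample and small-sample versions of (4.5-2) differ only in how the residual covariance is scaled: the first divides by T = 20, the second by the degrees of freedom of each equation (17 for demand, 16 for supply). The short Python sketch below, with hypothetical variable names and with inputs copied from the printed matrix command output, reproduces that rescaling as an arithmetic check.

```python
import math

T = 20            # number of observations
k1, k2 = 3, 4     # coefficients in the demand and supply equations

# Large-sample residual covariance (divisor T), copied from the printed SIGMA.
sigma = [[3.28645, 3.59324],
         [3.59324, 4.83166]]

# Degrees-of-freedom rescaling that gives the small-sample SSSIGMA.
ss11 = sigma[0][0] * T / (T - k1)
ss12 = sigma[0][1] * T / math.sqrt((T - k1) * (T - k2))
ss22 = sigma[1][1] * T / (T - k2)

# The same factor carries the large-sample variance of the demand constant
# (53.3287, element (1,1) of COVAR1) to its small-sample value.
var_const_small = 53.3287 * T / (T - k1)
se_const_small = math.sqrt(var_const_small)

print(ss11, ss12, ss22)
print(var_const_small, se_const_small)
```

The results can be compared with the printed SSSIGMA matrix and with the first element of the small-sample COVAR1 and LS2SE listings.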
The SE for each coefficient is the square root of the corresponding diagonal element of the
estimated covariance matrix. The 3SLS model is estimated using the "textbook" equation

\begin{equation}
\begin{bmatrix} \hat{\delta}_1 \\ \hat{\delta}_2 \end{bmatrix}
=
\begin{bmatrix}
\hat{\sigma}^{11} Z_1'X(X'X)^{-1}X'Z_1 & \hat{\sigma}^{12} Z_1'X(X'X)^{-1}X'Z_2 \\
\hat{\sigma}^{21} Z_2'X(X'X)^{-1}X'Z_1 & \hat{\sigma}^{22} Z_2'X(X'X)^{-1}X'Z_2
\end{bmatrix}^{-1}
\begin{bmatrix}
\hat{\sigma}^{11} Z_1'X(X'X)^{-1}X'y_1 + \hat{\sigma}^{12} Z_1'X(X'X)^{-1}X'y_2 \\
\hat{\sigma}^{21} Z_2'X(X'X)^{-1}X'y_1 + \hat{\sigma}^{22} Z_2'X(X'X)^{-1}X'y_2
\end{bmatrix}
\tag{4.5-3}
\end{equation}

where $[\hat{\sigma}^{ij}] = [\hat{\sigma}_{ij}]^{-1}$. Equation (4.5-3) comes directly from Kmenta
(1971, 577) and is consistent with Theil (1971, 510). The estimated output verifies the
simeq 3SLS command. In the matrix program output each term in (4.5-3) is broken out and put
together into the left and right parts of (4.5-3), which at first looks formidable.
=>
* LS3 CALCULATION $
=>
XPXINV=INV(XPX)$
=>
SIGMA=INV(SIGMA)$
=>
TERM11= SIGMA(1,1)*(TRANSPOSE(XPZ_1)*XPXINV*XPZ_1)$
=>
TERM12= SIGMA(1,2)*(TRANSPOSE(XPZ_1)*XPXINV*XPZ_2)$
=>
TERM21= SIGMA(2,1)*(TRANSPOSE(XPZ_2)*XPXINV*XPZ_1)$
=>
TERM22= SIGMA(2,2)*(TRANSPOSE(XPZ_2)*XPXINV*XPZ_2)$
=>
LEFT1 =CATCOL(TERM11 TERM12)$
=>
LEFT2 =CATCOL(TERM21 TERM22)$
=>
LEFT
=>
IF(VERBOSE.NE.0)THEN$
=>
CALL PRINT(TERM11 TERM12 TERM21 TERM22 LEFT1 LEFT2 LEFT)$
=>
ENDIF$
=>
=>
RIGHT1=(SIGMA(1,1)*(TRANSPOSE(XPZ_1)*XPXINV*XPY_1)) +
(SIGMA(1,2)*(TRANSPOSE(XPZ_1)*XPXINV*XPY_2))$
=>
=>
RIGHT2=(SIGMA(2,1)*(TRANSPOSE(XPZ_2)*XPXINV*XPY_1)) +
(SIGMA(2,2)*(TRANSPOSE(XPZ_2)*XPXINV*XPY_2))$
=>
RIGHT=CATROW(RIGHT1 RIGHT2)$
=CATROW(LEFT1 LEFT2)$
=>
CALL PRINT(RIGHT1 RIGHT2 RIGHT,INV(LEFT))$
RIGHT1
= Vector of
3
842.104
RIGHT2
RIGHT
84261.3
= Vector of
-208.606
elements
82406.3
4
elements
-20873.2
= Matrix of
-20220.4
7
by
1
elements
7
by
7
elements
-2196.91
1
842.104
84261.3
82406.3
-208.606
-20873.2
-20220.4
-2196.91
1
2
3
4
5
6
7
Matrix of
1
53.3287
-0.572408
0.419064E-01
52.0707
-0.556756
0.337445E-01
0.509185E-01
1
2
3
4
5
6
7
2
-0.572408
0.791284E-02
-0.224561E-02
-0.291667
0.494945E-02
-0.180825E-02
-0.272854E-02
3
0.419064E-01
-0.224561E-02
0.187315E-02
-0.232929
0.632767E-03
0.150833E-02
0.227598E-02
=>
LS3=INV(LEFT)*RIGHT$
=>
CALL PRINT('Three State Least Squares ',LS3)$
4
52.0707
-0.291667
-0.232929
113.162
-0.866671
-0.235979
-0.327163
5
-0.556756
0.494945E-02
0.632767E-03
-0.866671
0.794779E-02
0.649506E-03
0.855426E-03
6
0.337445E-01
-0.180825E-02
0.150833E-02
-0.235979
0.649506E-03
0.154836E-02
0.203856E-02
7
0.509185E-01
-0.272854E-02
0.227598E-02
-0.327163
0.855426E-03
0.203856E-02
0.425029E-02
Three Stage Least Squares
LS3
= Matrix of
7
by
1
elements
1
94.6333
-0.243557
0.313992
52.1176
0.228932
0.228978
0.357907
1
2
3
4
5
6
7
=>
LS3SE = DSQRT(DIAG(INV(LEFT)))$
=>
CALL PRINT('Three State Least Squares SE',LS3SE)$
Three State Least Squares SE
LS3SE
= Vector of
7.30265
7
0.889541E-01
elements
0.432799E-01
10.6378
0.891504E-01
0.393493E-01
0.651943E-01
The estimated standard errors are those suggested by Theil. The FIML estimation method
requires a maximization procedure. Kmenta (1971) shows that for a model without constraints
FIML maximizes

\begin{equation}
L = -\frac{GT}{2}\log(2\pi) - \frac{T}{2}\log|\Sigma| + T\log|B|
    - \frac{1}{2}\sum_{t=1}^{T}(By_t+\Gamma x_t)'\,\Sigma^{-1}(By_t+\Gamma x_t)
\tag{4.5-4}
\end{equation}

where $G = M$ is the number of equations in the model. The Kmenta test problem can be written

\begin{align}
q &= \alpha_1 + \alpha_2 P + \alpha_3 D + u_1 && \text{Demand} \\
q &= \beta_1 + \beta_2 P + \beta_3 F + \beta_4 A + u_2 && \text{Supply} \tag{4.5-5}
\end{align}

For this problem, with $y_t = (q_t, P_t)'$ and $x_t = (1, D_t, F_t, A_t)'$,

\begin{equation*}
B = \begin{bmatrix} 1 & -\alpha_2 \\ 1 & -\beta_2 \end{bmatrix}, \quad
\Gamma = \begin{bmatrix} -\alpha_1 & -\alpha_3 & 0 & 0 \\ -\beta_1 & 0 & -\beta_3 & -\beta_4 \end{bmatrix}, \quad
\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}
\end{equation*}

and $|B|$ and $|\Sigma|$ refer to the Jacobian or absolute value of the determinant of $B$ and
$\Sigma$ respectively. Using the matrix command it is fairly easy to implement this estimator.
Problems can arise if there are local maxima in the problem. The edited FIML results are given next.
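A compact way to see what the maximizer is doing is to evaluate the likelihood (4.5-4) with $\Sigma$ replaced by the residual covariance implied by a given set of coefficients (the concentrated likelihood, in which the quadratic term collapses to the constant GT/2). The Python sketch below is an illustration under that assumption and is not the B34S program: the function name is hypothetical, the data are the Kmenta observations from Table 4.7, and the two coefficient vectors are copied from the printed 3SLS and FIML results.

```python
import math

# Kmenta (1971) data from Table 4.7: Q, P, D, F; A is the time trend 1..20.
q = [98.485, 99.187, 102.163, 101.504, 104.240, 103.243, 103.993, 99.900,
     100.350, 102.820, 95.435, 92.424, 94.535, 98.757, 105.797, 100.225,
     103.522, 99.929, 105.223, 106.232]
p = [100.323, 104.264, 103.435, 104.506, 98.001, 99.456, 101.066, 104.763,
     96.446, 91.228, 93.085, 98.801, 102.908, 98.756, 95.119, 98.451,
     86.498, 104.016, 105.769, 113.490]
d = [87.4, 97.6, 96.7, 98.2, 99.8, 100.5, 103.2, 107.8, 96.6, 88.9,
     75.1, 76.9, 84.6, 90.6, 103.1, 105.1, 96.4, 104.4, 110.7, 127.1]
f = [98.0, 99.1, 99.1, 98.1, 110.8, 108.2, 105.6, 109.8, 108.7, 100.6,
     81.0, 68.6, 70.9, 81.4, 102.3, 105.0, 110.5, 92.5, 89.3, 93.0]
a = [float(t) for t in range(1, 21)]


def concentrated_loglik(params):
    """Evaluate (4.5-4) with Sigma set to the residual covariance U U'/T."""
    a1, a2, a3, b1, b2, b3, b4 = params
    T, G = len(q), 2
    u1 = [q[i] - a1 - a2 * p[i] - a3 * d[i] for i in range(T)]
    u2 = [q[i] - b1 - b2 * p[i] - b3 * f[i] - b4 * a[i] for i in range(T)]
    s11 = sum(u * u for u in u1) / T
    s22 = sum(u * u for u in u2) / T
    s12 = sum(u * v for u, v in zip(u1, u2)) / T
    det_sigma = s11 * s22 - s12 * s12
    det_b = abs(a2 - b2)   # |det [[1, -a2], [1, -b2]]|
    return (-(G * T / 2.0) * (math.log(2.0 * math.pi) + 1.0)
            - (T / 2.0) * math.log(det_sigma)
            + T * math.log(det_b))


# Coefficient vectors copied from the printed 3SLS and FIML results.
three_sls = [94.6333, -0.243557, 0.313992,
             52.1176, 0.228932, 0.228978, 0.357907]
fiml = [93.619219, -0.22953804, 0.31001341,
        51.944511, 0.23730613, 0.22081875, 0.36970888]

ll_3sls = concentrated_loglik(three_sls)
ll_fiml = concentrated_loglik(fiml)
print("log likelihood at 3SLS start:", ll_3sls)
print("log likelihood at FIML      :", ll_fiml)
```

Since the 3SLS estimates are used only as starting values, the likelihood at the FIML coefficients should be at least as large as at the 3SLS coefficients. The function values are on a different scale from the FUNC value reported by CMAXF2 because the constants are treated differently.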
=>
PROGRAM MODEL$
=>
CALL PRINT(MODEL)$
MODEL
= Program
PROGRAM MODEL$
BIGB
= MATRIX(2,2:
1.0, -1.0*A2,
1.0, -1.0*B2)$
BIGGAMMA = MATRIX(2,4:-1.0*A1, -1.0*A3, 0.0,
0.0,
-1.0*B1, 0.0,
-1.0*B3, -1.0*B4)$
U1U2=BIGB*Y+BIGGAMMA*X$
PHI
= U1U2*TRANSPOSE(U1U2)$
FUNC=(-1.0*(GT*PI())/2.0)
- ((T/2.0)*DLOG(DMAX1(DABS(DET(PHI)) ,.1D-30) ))
+ ( T
*DLOG(DMAX1(DABS(DET(BIGB)),.1D-30) ))
- (.5*SUM(TRANSPOSE(U1U2)*INV(PHI)*U1U2))$
CALL OUTSTRING(3, 3,'Function')$
CALL OUTDOUBLE(36,3,FUNC)$
CALL OUTDOUBLE(4, 4, A1)$
CALL OUTDOUBLE(36,4, A2)$
CALL OUTDOUBLE(55,4, A3)$
CALL OUTDOUBLE(4 ,5, B1)$
CALL OUTDOUBLE(36,5, B2)$
CALL OUTDOUBLE(55,5, B3)$
CALL OUTDOUBLE(4, 6, B4)$
RETURN$
END$
=>
RVEC =VECTOR(7:LS3)$
=>
LL
=VECTOR(7:) -1.D+2$
=>
UU
=VECTOR(7:)
=>
CALL ECHOOFF$
+1.D+3$
Constrained Maximum Likelihood Estimation using CMAXF2 Command
Final Functional Value
-13.37570521223952
# of parameters
7
# of good digits in function 15
# of iterations
28
# of function evaluations
55
# of gradiant evaluations
30
Scaled Gradient Tolerance
6.055454452393343E-06
Scaled Step Tolerance
Relative Function Tolerance
False Convergence Tolerance
Maximum allowable step size
Size of Initial Trust region
1 / Cond. of Hessian Matrix
#
1
2
3
4
5
6
7
Name
A1
A2
A3
B1
B2
B3
B4
3.666852862501036E-11
3.666852862501036E-11
2.220446049250313E-14
108037.5007234256
-1.000000000000000
2.229180241990960E-09
Coefficient
93.619219
-0.22953804
0.31001341
51.944511
0.23730613
0.22081875
0.36970888
Standard Error
3.4191227
0.60544227E-01
0.34296485E-01
7.3541629
0.45456398E-01
0.28752980E-01
0.14370566E-01
T Value
27.381064
-3.7912458
9.0392183
7.0632799
5.2205221
7.6798560
25.726814
SE calculated as sqrt |diagonal(inv(%hessian))|
Hessian Matrix
1
230.516
23086.3
22522.1
-174.305
-17457.4
-16823.6
-1834.45
1
2
3
4
5
6
7
2
23089.2
0.231266E+07
0.225634E+07
-17456.3
-0.174875E+07
-0.168477E+07
-183704.
3
22524.9
0.225660E+07
0.220289E+07
-17029.9
-0.170618E+07
-0.164463E+07
-179499.
4
-174.328
-17458.5
-17032.0
135.877
13607.8
13115.4
1430.03
5
-17459.8
-0.174897E+07
-0.170639E+07
13609.6
0.136313E+07
0.131342E+07
143201.
6
-16825.9
-0.168498E+07
-0.164483E+07
13117.1
0.131360E+07
0.126732E+07
137898.
7
-1834.71
-183728.
-179522.
1430.22
143221.
137918.
15323.9
Gradiant Vector
-0.568518E-06
-0.557801E-04
-0.544320E-04
0.447704E-06
0.438995E-04
0.419615E-04
0.528029E-05
Lower vector
-100.000
-100.000
-100.000
-100.000
-100.000
-100.000
-100.000
1000.00
1000.00
1000.00
1000.00
1000.00
1000.00
Upper vector
1000.00
B34S Matrix Command Ending. Last Command reached.
Space available in allocator
Number variables used
Number temp variables used
7873665, peak space used
130, peak number used
36882, # user temp clean
8277
135
0
and replicate the Kmenta test values for coefficients. The simeq FIML results are:
Test Case from Kmenta (1971) Pages 565 - 582
Functional Minimization Solution for System No.
LHS Endogenous Variable No.
2
1
Demand Equation
Q
Exogenous Variables (Predetermined)
1
CONSTANT
2
D
93.61922
0.3100134
Std. Error
6.152863
0.3633922E-01
t
15.21555
8.531097
Theil SE
5.672659
0.3350311E-01
Theil t
16.50359
9.253274
Endogenous Variables (Jointly Dependent)
3
P
-0.2295381
Std. Error
0.7508118E-01
t
-3.057199
Theil SE
0.6922143E-01
Theil t
-3.315998
Residual Variance (For Structural Disturbances)
3.337108
Functional Minimization 3SLS Covariance for System
CONSTANT
D
P
1
2
3
CONSTANT
D
1
2
37.86
0.3121E-01 0.1321E-02
-0.4078
-0.1600E-02
Demand Equation
P
3
0.5637E-02
Functional Minimization Solution for System No.
2
Supply Equation
LHS Endogenous Variable No.
2
Q
Exogenous Variables (Predetermined)
1
CONSTANT
2
F
3
A
51.94451
0.2208188
0.3697089
Std. Error
9.739647
0.3489965E-01
0.5846143E-01
t
5.333305
6.327249
6.323981
Theil SE
8.711405
0.3121520E-01
0.5228949E-01
Theil t
5.962816
7.074080
7.070425
Endogenous Variables (Jointly Dependent)
4
P
0.2373061
Std. Error
0.8237774E-01
t
2.880707
Theil SE
0.7368089E-01
Theil t
3.220728
Residual Variance (For Structural Disturbances)
5.620947
Functional Minimization 3SLS Covariance for System
CONSTANT
F
A
P
CONSTANT
1
94.86
-0.1858
-0.3119
-0.7341
1
2
3
4
F
A
Supply Equation
P
2
3
4
0.1218E-02
0.1943E-02
0.4772E-03
0.3418E-02
0.8825E-03
0.6786E-02
Test Case from Kmenta (1971) Pages 565 - 582
Contemporaneous Covariance of Residuals (Structural Disturbances)
For Functional Minimization 3SLSQ Solution.
Condition Number of residual columns,
Demand E
Supply E
Demand E
1
3.337
4.255
1
2
6.942988
Supply E
2
5.621
Correlation Matrix of Residuals
Demand E
1
1
1.000
2
0.9824
Demand E
Supply E
Supply E
2
1.000
Test Case from Kmenta (1971) Pages 565 - 582
Coefficients of the Reduced Form Equations.
Condition number of matrix used to find the reduced form coefficients is no smaller than
P
CONSTANT
D
F
A
1
2
3
4
1
89.27
0.6641
-0.4730
-0.7919
4.284084281338983
Q
2
73.13
0.1576
0.1086
0.1818
Mean sum of squares of residuals for the reduced form equations.
1
2
P
Q
0.20588D+01
0.43479D+01
and give identical coefficients but different standard errors because of the algorithm used. Greene
(2003, page 408) notes that "asymptotically the covariance matrix for the FIML estimator is the
same as that for the 3SLS estimator."
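The reduced form coefficients printed by simeq can be checked by hand from the structural estimates. The sketch below is only an illustration, not part of the B34S code: it assumes the demand equation is Q = a0 + a1*P + a2*D and the supply equation is Q = b0 + b1*P + b2*F + b3*A, with the coefficient values copied from the FIML output above. The names a0, b0 and reduced_form are hypothetical.

```python
# Structural FIML estimates copied from the simeq output above.
# Demand: Q = a0 + a1*P + a2*D      Supply: Q = b0 + b1*P + b2*F + b3*A
a0, a1, a2 = 93.61922, -0.2295381, 0.3100134
b0, b1, b2, b3 = 51.94451, 0.2373061, 0.2208188, 0.3697089


def reduced_form():
    """Solve the two structural equations for P and Q as functions of
    CONSTANT, D, F and A. Returns (p_row, q_row) in that column order."""
    denom = b1 - a1                  # obtained by equating demand and supply
    p_row = [(a0 - b0) / denom,      # CONSTANT
             a2 / denom,             # D
             -b2 / denom,            # F
             -b3 / denom]            # A
    # Substitute the solution for P back into the demand equation.
    q_row = [a0 + a1 * p_row[0],
             a2 + a1 * p_row[1],
             a1 * p_row[2],
             a1 * p_row[3]]
    return p_row, q_row


p_row, q_row = reduced_form()
print("P:", [round(v, 4) for v in p_row])
print("Q:", [round(v, 4) for v in q_row])
```

Up to the four significant digits printed by simeq, the two rows should agree with the "Coefficients of the Reduced Form Equations" table.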
The purpose of this exercise has been to illustrate how "textbook" formulas can be used
with a programming language, such as the matrix command, to produce 2SLS, 3SLS and FIML
estimates fairly easily, where the alternative would be to build a C or Fortran program to perform
the calculation. Since "textbook" formulas are used for the matrix example, the accuracy of
these calculations is inferior to the QR approach of Jennings (1980), which is the basis for the
simeq command. Inspection of the matrix program that implements these estimators may give
the reader confidence to tackle other calculations that have not been implemented in commercial
software.11 The matrix examples shown have been coded for teaching purposes (clarity of the
code), not research purposes. Many components of the calculation that appear in a number of places
in a formula such as (4.4-3) have not been calculated once and saved.

11 The modern pace of research is so fast that if one waits until a new procedure is implemented in commercial software, often it is too late.

4.6 LS2 and GMM Models and Specification tests
The Generalized Method of Moments estimation technique is a generalization of 2SLS
that allows for various assumptions on the error distribution. Assume there are l instruments in
Z. The basic idea of GMM is to select coefficients $\hat{\beta}_{GMM}$ such that

\[ g(\hat{\beta}_{GMM}) = 0 \tag{4.6-1} \]

where

\[ g(\hat{\beta}) = \frac{1}{N}\sum_{i=1}^{N} g_i(\hat{\beta}) = \frac{1}{N}\sum_{i=1}^{N} z_i'(y_i - x_i\hat{\beta}) = \frac{1}{N} Z'u \tag{4.6-2} \]

It can be shown that the efficient GMM estimator is

\[ \hat{\beta}_{EGMM} = (X'ZS^{-1}Z'X)^{-1} X'ZS^{-1}Z'y \tag{4.6-3} \]

where

\[ S = E[Z'uu'Z] = E[Z'\Omega Z] \tag{4.6-4} \]

Using the 2SLS residuals, a heteroskedasticity-consistent estimator of S can be obtained as

\[ \hat{S} = \frac{1}{N}\sum_{i=1}^{N} \hat{u}_i^2 Z_i'Z_i \tag{4.6-5} \]

which has been characterized as a standard sandwich approach to robust covariance estimation.
For more details see Davidson and MacKinnon (1993, 607-610) and Baum (2006, 194-197).
Hall-Rudebusch-Wilcox (1996) proposed a likelihood ratio test of the relevance of
instrumental variables Z that is based on the canonical correlations $r_i$ between X and Z. The
ordered canonical correlation vector can be calculated as the square root of the eigenvalues of

\[ (X'X)^{-1}(X'Z)(Z'Z)^{-1}(Z'X) \tag{4.6-6} \]
with associated eigenvectors $\alpha_i$, or as the square root of the eigenvalues of

\[ (Z'Z)^{-1}(Z'X)(X'X)^{-1}(X'Z) \tag{4.6-7} \]

with associated eigenvectors $\beta_i$. The vectors $\alpha_1$ and $\beta_1$ maximize the correlation between $X\alpha$
and $Z\beta$, which equals $r_1$. As noted by Hall-Rudebusch-Wilcox (1996, 287), "$\alpha_j$ and $\beta_j$ are the
vectors which yield the $j$th highest correlation $r_j$ subject to the constraints that $X\alpha_j$ and $Z\beta_j$
are orthogonal." The proposed Anderson statistic

\[ LR = -T \sum_{i=j+1}^{n} \log(1 - r_i^2) \tag{4.6-8} \]

is distributed as chi-squared with (l-k+1) degrees of freedom, where l is the rank of Z and k is
the rank of X, and can be applied to both 2SLS and GMM models. A significant statistic is
consistent with appropriate instruments. A disadvantage of the Anderson test is that it assumes
that the regressors are distributed multivariate normal. Further information on the Anderson test
is in Baum (2006, 208). The Anderson statistic can also be displayed in LM form as $N \min(r_i^2)$
or in the Cragg-Donald (1993) form as $N \min(r_i^2) / (1 - \min(r_i^2))$. If these statistics are not
significant, the instruments selected are weak.
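As a hedged numerical illustration of how the three relevance statistics are linked, the sketch below computes the LR, LM and Cragg-Donald forms from a single smallest squared canonical correlation. It assumes one endogenous regressor, so that only the smallest squared canonical correlation enters the LR sum. The function name relevance_stats is hypothetical, and the input is backed out from the LM value 52.43586586757428 and N = 758 reported for the Griliches example later in this section.

```python
import math


def relevance_stats(n_obs, r2_min):
    """Anderson LR, Anderson LM and Cragg-Donald statistics computed from
    the smallest squared canonical correlation r2_min between X and Z."""
    lr = -n_obs * math.log(1.0 - r2_min)   # LR form of (4.6-8), one term only
    lm = n_obs * r2_min                    # LM form
    cd = lm / (1.0 - r2_min)               # Cragg-Donald form
    return lr, lm, cd


n_obs = 758
r2_min = 52.43586586757428 / n_obs         # backed out from the reported LM value
lr, lm, cd = relevance_stats(n_obs, r2_min)
print(lr, lm, cd)
```

Since log(1 - r) is below -r for 0 < r < 1, the LR value always exceeds the LM value, and the Cragg-Donald value exceeds both.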
For GMM estimation the Hansen (1982) J statistic, which tests the overidentifying
restrictions, is usually used. The Hansen test, which is also called the Sargan (1958) test, is the
value of the efficient GMM objective function

\[ u'ZS^{-1}Z'u \tag{4.6-9} \]

and is distributed as chi-square with l-k degrees of freedom. A significant value indicates the
selected instruments are not suitable.
The Basmann (1960) overidentification test is

\[ (N - l)\,\frac{u_{LS2}'u_{LS2} - u_Z'u_Z}{u_Z'u_Z} \tag{4.6-10} \]

where $u_{LS2}$ is the residual from the LS2 equation and $u_Z$ is the residual from a model that
predicts $u_{LS2}$ as a function of Z. The Basmann test is distributed as chi-square with l-k degrees
of freedom. If the instruments Z have no predictive power, or in other words are orthogonal to
the LS2 residuals, then $u_{LS2}'u_{LS2} = u_Z'u_Z$ and the chi-square value will not be significant. A
significant chi-square value, however, indicates that the instruments are not suitable since they
are not exogenous.
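The Basmann statistic in (4.6-10) and the Sargan N*R-squared statistic used in the ls2 subroutine of Table 4.8 are built from the same two residual sums of squares. The sketch below is an illustration under stated assumptions: R-squared is taken as 1 - u_Z'u_Z / u_LS2'u_LS2, the function name is hypothetical, and the auxiliary residual sum of squares rss_aux is not printed by B34S but is backed out here from the Sargan value reported for the Griliches example later in this section.

```python
def sargan_and_basmann(n_obs, n_instr, rss_ls2, rss_aux):
    """Overidentification statistics from the LS2 residual sum of squares
    (rss_ls2) and the residual sum of squares from regressing the LS2
    residuals on the instruments Z (rss_aux)."""
    r2 = 1.0 - rss_aux / rss_ls2                  # assumed definition of R-squared
    sargan = n_obs * r2                           # Sargan N*R-sq form
    basmann = (n_obs - n_instr) * (rss_ls2 - rss_aux) / rss_aux   # (4.6-10)
    return sargan, basmann


n_obs, n_instr = 758, 16                          # observations, columns of Z
rss_ls2 = 80.01823370030675                       # LS2 sum of squared residuals
# Hypothetical input: implied by the reported Sargan value 87.65523169449482.
rss_aux = rss_ls2 * (1.0 - 87.65523169449482 / n_obs)

sargan, basmann = sargan_and_basmann(n_obs, n_instr, rss_ls2, rss_aux)
print(sargan, basmann)
```

Written this way, Basmann equals (N - l) R-squared / (1 - R-squared), so the two statistics always move together.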
Table 4.8 lists subroutines LS2 and GMMEST that estimate 2SLS and GMM models
respectively. For an exactly identified system, LS2 and GMM will be the same. For an
overidentified system, GMM is more efficient.
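The claim that LS2 and GMM coincide in the exactly identified case can be seen in the simplest setting of one regressor and one instrument, where every cross product in (4.6-3) is a scalar and the weight S cancels. The sketch below is an illustration only: the data are made up and the function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def gmm_scalar(y, x, z, s):
    """(x'z s^-1 z'x)^-1 x'z s^-1 z'y for one regressor x and one
    instrument z, with a scalar weight s."""
    xz = dot(x, z)
    zy = dot(z, y)
    return (xz * zy / s) / (xz * xz / s)


def iv_scalar(y, x, z):
    """Simple instrumental variables estimator (z'x)^-1 z'y."""
    return dot(z, y) / dot(z, x)


# Made-up illustrative data (not from the text).
y = [1.0, 2.1, 2.9, 4.2, 5.1]
x = [0.9, 2.0, 3.2, 3.9, 5.0]
z = [1.0, 2.0, 3.0, 4.0, 5.0]

b_iv = iv_scalar(y, x, z)
b_2sls = gmm_scalar(y, x, z, dot(z, z))   # 2SLS corresponds to the weight s = z'z
b_gmm = gmm_scalar(y, x, z, 7.5)          # any other positive weight
print(b_iv, b_2sls, b_gmm)
```

With more instruments than regressors the weight no longer cancels, which is where the efficiency gain of GMM comes from.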
Table 4.8 LS2 and General Method of Moments estimation routines
/;
/; Loads LS2 and GMMEST
/;
subroutine ls2(y1,x1,z1,names,yvar,iprint);
/;
/; y1     => left hand side.  Usually set as %y from OLS
/; x1     => right hand side. Usually set as %x from OLS step
/; z1     => instrumental Variables
/; names  => Names from OLS step. Usually set as %names
/; yvar   => usually set from call olsq as %yvar
/; iprint => =1 print coef, =2 print covariance in addition
/;
/; if # of obs for x1 < z1 then z1 will be truncated
/;
/; Automatic variables created
/; %olscoef => OLS Coefficients
/; %ols_se  => OLS SE
/; %ols_t   => OLS t
/; %ls2coef => LS2 Coefficients
/; %ls2_sel => Large Sample LS2 SE
/; %ls2_ses => Small Sample LS2 SE
/; %ls2_t_l => Large Sample LS2 t
/; %ls2_t_s => Small Sample LS2 t
/; %rss_ols => e'e for OLS
/; %rss_ls2 => e'e for LS2
/; %yhatols => yhat for OLS
/; %yhatls2 => yhat for LS2
/; %resols  => OLS Residual
/; %resls2  => LS2 Residual
/; %covar_1 => Large Sample covariance
/; %sigma_l => Large Sample sigma
/; %sigma_s => Small Sample Sigma
/; %z
/; %info    => Model is ok if = 0
/; Under conditional homoskedasticity Sargan(1958)=Hansen(1982) j test
/; %sargan  => Sargan(1958) test
/; %basmann => Basmann(1960)
/;
/; Example Job:
/;
/; b34sexec options ginclude('b34sdata.mac') member(kmenta);
/;          b34srun;
/;
/; b34sexec matrix;
/; call loaddata;
/; call echooff;
/; call print('OLS for Equation # 1':);
/; call olsq(q p d :savex :print);
/; call ls2a(%y,%x,catcol(d,f,a,constant),%names,%yvar,1);
/;
/; call print('OLS for Equation # 2':);
/; call olsq(q p a f: a :savex :print);
/; call ls2a(%y,%x,catcol(d,f,a,constant),%names,%yvar,1);
/; b34srun;
/;
/; Command built 26 April 2010, Mods 26 May 2010 2 August 2010
/;
y =vfam(y1);
%z=mfam(z1);
x =mfam(x1);
n1=norows(%z);
n2=norows(x);
if(n2.lt.n1)call deleterow(%z,1,(n1-n2));
if(n1.lt.n2)then;
call epprint('ERROR: # obs for instruments < # obs for equation');
go to done;
endif;
/; This saves the OLS Results
call olsq(y x :noint);
%olscoef=%coef;
%ols_se=%se;
%ols_t =%t;
n_k=%nob-%k;
%rss_ols=%rss;
%yhatols=%yhat;
%resols =%res;
* 2SLS ;
zpz = transpose(%z)*%z;
zpx = transpose(%z)*x;
zpy = transpose(%z)*y;
ypy = y*y;
irank=rank(zpx);
iorder=rank(zpz);
/;
if(iorder.lt.irank)then;
call epprint('ERROR: Model Underidentified.':);
go to done;
endif;
/;
%ls2coef =inv(transpose(zpx)*inv(zpz)*zpx)*
(transpose(zpx)*inv(zpz)*zpy);
/;
/; Error trap turned off
/;
/; call gminv((transpose(zpx)*inv(zpz)*zpx),%ls2coef,%info,rrcond);
/; if(%info.ne.0)then;
/; go to done;
/; endif;
/; %ls2coef=%ls2coef*(transpose(zpx)*inv(zpz)*zpy);
%yhatls2=x*%ls2coef;
%resls2 =y-%yhatls2;
sigma_w=(ypy - (2.*y*x*%ls2coef) +
%ls2coef*transpose(x)*x*%ls2coef)/dfloat(n_k);
varcoef=sigma_w*inv(transpose(x)*%z*inv(zpz)*transpose(%z)*x);
%ls2_ses=dsqrt(diag(varcoef));
* Get sigma(i,j) from fits ;
%rss_ls2=sumsq(%resls2);
%sigma_l=%rss_ls2/dfloat(%nob);
%sigma_s=%rss_ls2/dfloat(n_k);
%covar_1=%sigma_l*inv(transpose(zpx)*inv(zpz)*zpx);
%ls2_sel=dsqrt(diag(%covar_1));
%ls2_t_s=afam(%ls2coef)/afam(%ls2_ses);
%ls2_t_l=afam(%ls2coef)/afam(%ls2_sel);
/;
/; squared canonical correlations
/;
if(iprint.ne.0)then;
can_corr=real(eig(inv(transpose(x)*x)*(transpose(x)*%z)*inv(zpz)*zpx));
call print(can_corr);
anderson=-1.*dfloat(norows(%z))
*dlog(sum(kindas(%z,1.0)-afam(can_corr)));
anderlm = dfloat(norows(%z))*min(can_corr);
cragg_d = anderlm/(1.0 - min(can_corr));
endif;
/;
/; %sargan & %basmann
/;
call olsq(%resls2 %z :noint);
%basmann=(dfloat( norows(%z)-nocols(%z))*(sumsq(%resls2)-%rss))/%rss;
%sargan = dfloat(norows(%z))*%rsq;
/;
if(iprint.ne.0)then;
call print(' ':);
call print('OLS and LS2 Estimation':);
call print(' ':);
gg= 'Dependent Variable                   ';
gg2=c1array(8:yvar);
ff=catrow(gg,gg2);
call print(ff:);
call print('OLS Sum of squared Residuals         ',%rss_ols:);
call print('LS2 Sum of squared Residuals         ',%rss_ls2:);
call print('Large Sample ls2 sigma               ',%sigma_l:);
call print('Small Sample ls2 sigma               ',%sigma_s:);
call print('Rank of Equation                     ',irank:);
call print('Order of Equation                    ',iorder:);
if(irank.lt.iorder)call print('Equation is overidentified':);
if(irank.eq.iorder)call print('Equation is exactly identified':);
/;
call print('Anderson LR ident./IV Relevance test ',anderson:);
/;
if(iorder.ge.irank.and.anderson.gt.0.0)then;
aprob=chisqprob(anderson,dfloat(iorder+1-irank));
call print('Significance of Anderson LR Statistic',aprob:);
endif;
/;
call print('Anderson Canon Correlation LM test   ',anderlm:);
/;
if(iorder.ge.irank.and.anderlm.gt.0.0)then;
aprob=chisqprob(anderlm,dfloat(iorder+1-irank));
call print('Significance of Anderson LM Statistic',aprob:);
endif;
/;
call print('Cragg-Donald Chi-Square Weak ID Test ',cragg_d:);
/;
if(iorder.ge.irank.and.cragg_d.gt.0.0)then;
aprob=chisqprob(cragg_d,dfloat(iorder+1-irank));
call print('Significance of Cragg-Donald test    ',aprob:);
endif;
/;
call print('Basmann                              ',%basmann:);
/;
if(iorder.gt.irank.and.%basmann.gt.0.0)then;
bprob=chisqprob(%basmann,dfloat(iorder-irank));
call print('Significance of Basmann Statistic    ',bprob:);
endif;
/;
call print('Sargan N*R-sq / J-Test Test          ',%sargan:);
/;
if(iorder.gt.irank.and.%sargan.gt.0.0)then;
sprob=chisqprob(%sargan,dfloat(iorder-irank));
call print('Significance of Sargan Statistic     ',sprob:);
endif;
/;
call tabulate(names,%olscoef,%ols_se,%ols_t,%ls2coef,
              %ls2_ses,%ls2_sel,
              %ls2_t_s,%ls2_t_l
     :title
     '+++++++++++++++++++++++++++++++++++++++++++++++++++++');
call print(' ':);
if(iprint.eq.2)
call print('Estimated Covariance Matrix - Large Sample',%covar_1);
endif;
/;
call makeglobal(%olscoef);
call makeglobal(%ols_se);
call makeglobal(%ols_t);
call makeglobal(%ls2coef);
call makeglobal(%ls2_sel);
call makeglobal(%ls2_ses);
call makeglobal(%ls2_t_l);
call makeglobal(%ls2_t_s);
call makeglobal(%rss_ols);
call makeglobal(%rss_ls2);
call makeglobal(%yhatols);
call makeglobal(%yhatls2);
call makeglobal(%resols);
call makeglobal(%resls2);
call makeglobal(%covar_1);
call makeglobal(%sigma_l);
call makeglobal(%sigma_s);
call makeglobal(%z);
call makeglobal(%sargan);
call makeglobal(%basmann);
/; call makeglobal(%info);
/;
done continue;
return;
end;
subroutine gmmest(y,x,z,names,yvar,j_stat,sigma,iprint);
/;
/; GMM Model - Built 12 May 2010
/;
/; Must call ls2 prior to this call to produce global variable
/; %z
/;
/; The following global variables are created:
/; %resgmm  => GMM Residuals
/; %segmm   => GMM SE
/; %tgmm    => GMM t
/; %coefgmm => GMM Coef
/; %yhatgmm => GMM Y hat
/;
/; The Anderson Test is discussed in Baum
/; "An introduction to Modern Econometrics Using Stata" (2006) p. 208
/; Both the IV and LM forms of the test are given.
/;
/; Generates feasible two-step GMM Estimator. Results are the same as
/; produced by the RATS "optimalweights" option.
/;
/; Note: When running bootstraps inv(s) can fail to invert if dummy
/;       variables are in the dataset.
/;
/; See Baum (2006) page 196
/;
xpz = transpose(x)*z;
xpy = transpose(x)*vfam(y);
ypy = vfam(y)*vfam(y);
/;
/; GMM Coefficients
/;
irank =rank(xpz);
iorder=rank(transpose(z)*z);
/;
if(iorder.lt.irank)then;
call epprint('ERROR: Model Underidentified.':);
go to done;
endif;
/;
adj=kindas(z,1.0)/dfloat(norows(z));
s=hc_sigma(adj,z,%resls2);
inv_s=inv(s);
%coefgmm=inv(xpz*inv_s*transpose(xpz)) *
(xpz*inv_s*transpose(z)*vfam(y));
%resgmm =vfam(y)-x*%coefgmm;
%yhatgmm=x*%coefgmm;
sigma=hc_sigma(kindas(z,1.),z,%resls2);
/;
/; Logic from Rats User's Guide Version 7 page 245
/;
j_stat=%resgmm*z*inv(sigma)*transpose(z)*%resgmm;
/;
/; Stock-Watson (2007) page 734
/;
%segmm=dsqrt(diag(inv(xpz*inv(sigma)*transpose(xpz))));
%tgmm=afam(%coefgmm)/afam(%segmm);
/;
/;
/; squared canonical correlations
/;
can_corr = real(eig(inv(transpose(x)*x)*(transpose(x)*z)
           *inv(transpose(z)*z)* transpose(xpz)));
/;
if(iprint.gt.1)call print(can_corr);
anderson=-1.*dfloat(norows(z))
*dlog(sum(kindas(z,1.0)-afam(can_corr)));
anderlm = dfloat(norows(z))*min(can_corr);
cragg_d = anderlm/(1.0 - min(can_corr));
/;
if(iprint.ne.0)then;
call print(' ':);
call print('GMM Estimates':);
call print(' ':);
gg= 'Dependent Variable                   ';
gg2=c1array(8:yvar);
ff=catrow(gg,gg2);
call print(ff:);
call print('OLS sum of squares                   ',sumsq(%resols):);
call print('LS2 sum of squares                   ',sumsq(%resls2):);
call print('GMM sum of squares                   ',sumsq(%resgmm):);
call print('Rank of Equation                     ',irank:);
call print('Order of Equation                    ',iorder:);
if(irank.lt.iorder)call print('Equation is overidentified':);
if(irank.eq.iorder)call print('Equation is exactly identified':);
call print('Anderson ident./IV Relevance test    ',anderson:);
/;
if(iorder.ge.irank.and.anderson.gt.0.0)then;
aprob=chisqprob(anderson,dfloat(iorder+1-irank));
call print('Significance of Anderson Statistic ',aprob:);
endif;
/;
call print('Anderson Canon Correlation LM test   ',anderlm:);
/;
if(iorder.ge.irank.and.anderlm.gt.0.0)then;
aprob=chisqprob(anderlm,dfloat(iorder+1-irank));
call print('Significance of Anderson LM Statistic',aprob:);
endif;
/;
call print('Cragg-Donald Chi-Square Weak ID Test ',cragg_d:);
/;
if(iorder.ge.irank.and.cragg_d.gt.0.0)then;
aprob=chisqprob(cragg_d,dfloat(iorder+1-irank));
call print('Significance of Cragg-Donald test    ',aprob:);
endif;
/;
call print('Hansen J_stat Ident. of instruments',j_stat:);
/;
if(iorder.gt.irank.and.j_stat.gt.0.0)then;
jprob=chisqprob(j_stat,dfloat(iorder-irank));
call print('Significance of Hansen j_stat        ',jprob:);
endif;
/;
call tabulate(names,%coefgmm,%segmm,%tgmm
     :title '+++++++++++++++++++++++++++++++++++++++++++++++++++++');
call print(' ':);
endif;
call makeglobal(%resgmm);
call makeglobal(%segmm);
call makeglobal(%tgmm);
call makeglobal(%coefgmm);
call makeglobal(%yhatgmm);
done continue;
return;
end;
Table 4.9 shows the setup to estimate and test LS2 and GMM models for the Griliches
(1976) wage data used as a test case in Baum (2006). The Griliches model regresses the log
wage on education, experience, tenure, age, a number of control variables and various year
dummy variables. Stata and Rats results are shown for comparison. In addition Baum (2006)
can be inspected for replication purposes.
Table 4.9 Estimation of LS2 and GMM Models using B34S, Stata and Rats
%b34slet dob34s1=0;
%b34slet dob34s2=1;
%b34slet dostata=1;
%b34slet dorats =1;
b34sexec options ginclude('micro.mac')
         member(griliches76); b34srun;
%b34sif(&dob34s1.ne.0)%then;
b34sexec matrix;
call loaddata;
call echooff;
call olsq(iq s expr tenure rns smsa
iyear_67 iyear_68 iyear_69 iyear_70 iyear_71 iyear_73
med kww age mrt :print);
iqyhat=%yhat;
call olsq(lw iqyhat s expr tenure rns smsa
iyear_67
iyear_68
iyear_69
iyear_70
iyear_71
iyear_73 :print);
call olsq(lw iq s expr tenure rns smsa
iyear_67
iyear_68
iyear_69
iyear_70
iyear_71
iyear_73 :print);
call gamfit(lw iq s expr tenure rns[factor,1] smsa[factor,1]
iyear_67[factor,1]
iyear_68[factor,1]
iyear_69[factor,1]
iyear_70[factor,1]
iyear_71[factor,1]
iyear_73[factor,1] :print);
call marspline(lw iq s expr tenure rns smsa
iyear_67 iyear_68 iyear_69 iyear_70 iyear_71 iyear_73
:print :nk 40 :mi 2);
call gamfit(lw80 iq s expr tenure rns[factor,1] smsa[factor,1]
iyear_67[factor,1]
iyear_68[factor,1]
iyear_69[factor,1]
iyear_70[factor,1]
iyear_71[factor,1]
iyear_73[factor,1] :print);
call marspline(lw80 iq s expr tenure rns smsa
iyear_67 iyear_68 iyear_69 iyear_70 iyear_71 iyear_73
:print :nk 40 :mi 2);
b34srun;
%b34sendif;
%b34sif(&dob34s2.ne.0)%then;
b34sexec matrix;
call loaddata;
call load(ls2);
call echooff;
call character(lhs,'lw');
call character(endvar,'iq');
call character(rhs,'iq s expr tenure rns smsa iyear_67 iyear_68
iyear_69 iyear_70 iyear_71 iyear_73 constant');
call character(ivar,'s expr tenure rns smsa iyear_67 iyear_68
iyear_69 iyear_70 iyear_71 iyear_73 constant med kww age mrt');
call olsq(argument(lhs) argument(rhs) :noint :print :savex);
call ls2(%y,%x,catcol(argument(ivar)),%names,%yvar,1);
call print(lhs,rhs,ivar,endvar);
call gmmest(%y,%x,%z,%names,%yvar,j_stat,sigma,1);
call graph(%y %yhatols %yhatls2,%yhatgmm :nocontact
:pgborder :nolabel);
b34srun;
%b34sendif;
%b34sif(&dostata.ne.0)%then;
b34sexec options open('statdata.do') unit(28) disp=unknown$ b34srun$
b34sexec options clean(28)$ b34srun$
b34sexec options open('stata.do') unit(29) disp=unknown$ b34srun$
b34sexec options clean(29)$ b34srun$
b34sexec pgmcall idata=28 icntrl=29$
stata$
* for detail on stata commands see Baum page 205 ;
pgmcards$
* uncomment if do not use /e
* log using stata.log, text
global xlist s expr tenure rns smsa iyear_67 ///
       iyear_68 iyear_69 iyear_70 iyear_71 iyear_73
ivregress 2sls lw $xlist (iq=med kww age mrt)
ivregress liml lw $xlist (iq=med kww age mrt)
ivregress gmm  lw $xlist (iq=med kww age mrt)
ivreg  lw $xlist (iq=med kww age mrt)
ivreg2 lw $xlist (iq=med kww age mrt)
ivreg2 lw $xlist (iq=med kww age mrt), gmm2s robust
overid, all
* orthog(age mrt)
gmm (lw-{xb:$xlist iq} +{b0}), ///
instruments ($xlist med kww age mrt) onestep nolog
exit,clear
b34sreturn$
b34seend$
b34sexec options close(28); b34srun;
b34sexec options close(29); b34srun;
b34sexec options
dounix('stata -b do stata.do ')
dodos('stata /e stata.do');
b34srun;
b34sexec options npageout
writeout('output from stata',' ',' ')
copyfout('stata.log')
dodos('erase stata.do',
/; 'erase stata.log',
'erase statdata.do') $
b34srun$
%b34sendif;
%b34sif(&dorats.ne.0)%then;
b34sexec options open('rats.dat') unit(28) disp=unknown$ b34srun$
b34sexec options open('rats.in') unit(29) disp=unknown$ b34srun$
b34sexec options clean(28)$ b34srun$
b34sexec options clean(29)$ b34srun$
b34sexec pgmcall$
rats passasts
pcomments('* ',
'* Data passed from B34S(r) system to RATS',
'*    ',
"display @1 %dateandtime() @33 ' Rats Version ' %ratsversion()"
'* ') $
PGMCARDS$
*
instruments s expr tenure rns smsa iyear_67 $
   iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 $
   med kww age mrt constant
* OLS
linreg lw
# constant s expr tenure rns smsa iyear_67 $
  iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq
* 2SLS
linreg(inst) lw
# constant s expr tenure rns smsa iyear_67 $
  iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq
* GMM
linreg(inst,optimalweights) lw
# constant s expr tenure rns smsa iyear_67 $
  iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq
b34sreturn$
b34srun $
b34sexec options close(28)$ b34srun$
b34sexec options close(29)$ b34srun$
b34sexec options
/$ dodos(' rats386 rats.in rats.out ')
   dodos('start /w /r rats32s rats.in /run')
   dounix('rats rats.in rats.out')$ B34SRUN$
b34sexec options npageout
   WRITEOUT('Output from RATS',' ',' ')
   COPYFOUT('rats.out')
   dodos('ERASE rats.in','ERASE rats.out','ERASE rats.dat')
   dounix('rm rats.in','rm rats.out','rm rats.dat')
   $
   B34SRUN$
%b34sendif;
Edited and annotated results are shown next.
Variable   #   Label                                      # Cases  Mean          Std. Dev.     Variance      Maximum   Minimum
RNS        1   residency in South                             758  0.269129      0.443800      0.196959      1.00000   0.00000
RNS80      2   residency in South in 1980                     758  0.292876      0.455383      0.207373      1.00000   0.00000
MRT        3   marital status = 1 if married                  758  0.514512      0.500119      0.250119      1.00000   0.00000
MRT80      4   marital status = 1 if married in 1980          758  0.898417      0.302299      0.913845E-01  1.00000   0.00000
SMSA       5   reside metro area = 1 if urban                 758  0.704485      0.456575      0.208461      1.00000   0.00000
SMSA80     6   reside metro area = 1 if urban in 1980         758  0.712401      0.452942      0.205156      1.00000   0.00000
MED        7   mother's education, years                      758  10.9103       2.74112       7.51374       18.0000   0.00000
IQ         8   iq score                                       758  103.856       13.6187       185.468       145.000   54.0000
KWW        9   score on knowledge in world of work test       758  36.5739       7.30225       53.3228       56.0000   12.0000
YEAR      10   Year                                           758  69.0317       2.63179       6.92634       73.0000   66.0000
AGE       11   Age                                            758  21.8351       2.98176       8.89087       30.0000   16.0000
AGE80     12   Age in 1980                                    758  33.0119       3.08550       9.52033       38.0000   28.0000
S         13   completed years of schooling                   758  13.4050       2.23183       4.98106       18.0000   9.00000
S80       14   completed years of schooling in 1980           758  13.7071       2.21469       4.90486       18.0000   9.00000
EXPR      15   experience, years                              758  1.73543       2.10554       4.43331       11.4440   0.00000
EXPR80    16   experience, years in 1980                      758  11.3943       4.21075       17.7304       22.0450   0.692000
TENURE    17   tenure, years                                  758  1.83113       1.67363       2.80104       10.0000   0.00000
TENURE80  18   tenure, years in 1980                          758  7.36280       5.05024       25.5049       22.0000   0.00000
LW        19   log wage                                       758  5.68674       0.428949      0.183998      7.05100   4.60500
LW80      20   log wage in 1980                               758  6.82656       0.409927      0.168040      8.03200   4.74900
IYEAR_67  21                                                  758  0.831135E-01  0.276236      0.763063E-01  1.00000   0.00000
IYEAR_68  22                                                  758  0.104222      0.305750      0.934828E-01  1.00000   0.00000
IYEAR_69  23                                                  758  0.112137      0.315744      0.996940E-01  1.00000   0.00000
IYEAR_70  24                                                  758  0.844327E-01  0.278219      0.774060E-01  1.00000   0.00000
IYEAR_71  25                                                  758  0.121372      0.326775      0.106782      1.00000   0.00000
IYEAR_73  26                                                  758  0.208443      0.406464      0.165213      1.00000   0.00000
CONSTANT  27                                                  758  1.00000       0.00000       0.00000       1.00000   1.00000

Number of observations in data file   758
Current missing variable code         1.000000000000000E+31
Ordinary Least Squares Estimation
Dependent variable                   LW
Centered R**2                        0.4301415547786606
Adjusted R**2                        0.4209626268019410
Residual Sum of Squares              79.37338878983863
Residual Variance                    0.1065414614628706
Standard Error                       0.3264068955504320
Total Sum of Squares                 139.2861498420176
Log Likelihood                       -220.3342420049200
Mean of the Dependent Variable       5.686738782319042
Std. Error of Dependent Variable     0.4289493629019316
Sum Absolute Residuals               194.5217111479906
F(12, 745)                           46.86185095575703
F Significance                       1.000000000000000
1/Condition XPX                      1.486105464518127E-06
Maximum Absolute Residual            1.186094775249485
Number of Observations               758

Variable   Lag   Coefficient       SE                t
IQ         0     0.27121199E-02    0.10314110E-02    2.6295239
S          0     0.61954782E-01    0.72785810E-02    8.5119313
EXPR       0     0.30839472E-01    0.65100828E-02    4.7371858
TENURE     0     0.42163060E-01    0.74812112E-02    5.6358601
RNS        0    -0.96293467E-01    0.27546700E-01   -3.4956444
SMSA       0     0.13289929        0.26575835E-01    5.0007567
IYEAR_67   0    -0.54209478E-01    0.47852181E-01   -1.1328528
IYEAR_68   0     0.80580850E-01    0.44895091E-01    1.7948700
IYEAR_69   0     0.20759151        0.43860470E-01    4.7329979
IYEAR_70   0     0.22822373        0.48799418E-01    4.6767716
IYEAR_71   0     0.22269148        0.43095233E-01    5.1674272
IYEAR_73   0     0.32287469        0.40657433E-01    7.9413448
CONSTANT   0     4.2353569         0.11334886        37.365677

The edited output listed below replicates Baum (2006, 193-194). The Basmann and Sargan tests
of 97.0249 and 87.655, respectively, are highly significant, which rejects the null hypothesis
that there is no correlation between the residuals of the LS2 model and the instruments. This
finding suggests serious problems since endogeneity present in the OLS model will not be
removed by LS2 estimation. Note that Stata replicates the Sargan test value. The Anderson
value of 54.33, which tests for the relevance of the instruments, matches the value reported in Baum
(2006, 204) but does not match the value in the printed Stata output, because the revised ivreg2
Stata command reports the LM form of the test, 52.436. The B34S
output includes both statistics. Since the null was rejected, the instruments appear relevant in that
they are related to the endogenous variables. This is confirmed by the Cragg-Donald (1993)
statistic of 56.333. In addition to various LS2 and GMM results, both Stata bootstrap and Stata
robust error results are shown. The bootstrap results do not make assumptions
about the distribution of the regressors.
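When reading the B34S listings, note that the "Significance" lines come from chisqprob and take values close to one, which suggests they report the chi-square distribution function rather than the upper-tail p-value. The sketch below checks this reading for the Anderson statistics, which use iorder+1-irank = 16+1-13 = 4 degrees of freedom. It uses the closed form of the chi-square distribution function that holds only for even degrees of freedom. The function name is hypothetical and the interpretation of chisqprob is an assumption inferred from the printed values.

```python
import math


def chi2_cdf_even(x, dof):
    """Chi-square distribution function for a positive even dof, using
    1 - exp(-x/2) * sum_{k=0}^{dof/2-1} (x/2)^k / k!."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form needs a positive even dof")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, dof // 2):
        term *= half / k          # builds (x/2)^k / k! step by step
        total += term
    return 1.0 - math.exp(-half) * total


dof = 16 + 1 - 13                 # iorder + 1 - irank in the ls2 subroutine
sig_lr = chi2_cdf_even(54.33777011513529, dof)   # Anderson LR statistic
sig_lm = chi2_cdf_even(52.43586586757428, dof)   # Anderson LM statistic
print(sig_lr, sig_lm)
print("upper-tail p-value for LR:", 1.0 - sig_lr)
```

Under this reading, a "Significance" value near one corresponds to an upper-tail p-value near zero, which is consistent with the rejection of the null described above.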
The Rats coefficient results for LS2 and GMM match B34S and Stata. Note that Rats
uses the small sample SE formula while Stata reports the large sample SE. B34S LS2 results
report both. The exact formulas for all LS2 and GMM calculations in B34S are contained in the
two subroutines listed in Table 4.8.
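The difference between the small sample and large sample LS2 standard errors is only a degrees-of-freedom scaling, as the definitions of %sigma_l and %sigma_s in Table 4.8 show. The sketch below is an illustration using the LS2 sum of squared residuals, N = 758 and k = 13 from the Griliches example. The function name is hypothetical.

```python
import math


def ls2_sigmas(rss, n_obs, n_coef):
    """Large sample (rss/N) and small sample (rss/(N-k)) residual variances,
    plus the implied ratio of small sample to large sample standard errors."""
    sigma_large = rss / n_obs
    sigma_small = rss / (n_obs - n_coef)
    se_ratio = math.sqrt(n_obs / (n_obs - n_coef))
    return sigma_large, sigma_small, se_ratio


sigma_l, sigma_s, se_ratio = ls2_sigmas(80.01823370030675, 758, 13)
print(sigma_l, sigma_s, se_ratio)

# Scaling the large sample SE reported by Stata for iq (.0039035) by the
# ratio should give approximately the small sample value 0.3937E-02.
print(0.0039035 * se_ratio)
```

With 758 observations and 13 coefficients the ratio is below one percent, so the choice between the two conventions has little effect on inference here.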
OLS and LS2 Estimation

Dependent Variable                      LW
OLS Sum of squared Residuals            79.37338878983863
LS2 Sum of squared Residuals            80.01823370030675
Large Sample ls2 sigma                  0.1055649521112226
Small Sample ls2 sigma                  0.1074070251010829
Rank of Equation                        13
Order of Equation                       16
Equation is overidentified
Anderson LR ident./IV Relevance test    54.33777011513529
Significance of Anderson LR Statistic   0.9999999999552830
Anderson Canon Correlation LM test      52.43586586757428
Significance of Anderson LM Statistic   0.9999999998881718
Cragg-Donald Chi-Square Weak ID Test    56.33277600836977
Significance of Cragg-Donald test       0.9999999999829244
Basmann                                 97.02497131695870
Significance of Basmann Statistic       1.000000000000000
Sargan N*R-sq / J-Test Test             87.65523169449482
Significance of Sargan Statistic        1.000000000000000

+++++++++++++++++++++++++++++++++++++++++++++++++++++

Obs  NAMES      %OLSCOEF     %OLS_SE      %OLS_T
  1  IQ          0.2712E-02  0.1031E-02    2.630
  2  S           0.6195E-01  0.7279E-02    8.512
  3  EXPR        0.3084E-01  0.6510E-02    4.737
  4  TENURE      0.4216E-01  0.7481E-02    5.636
  5  RNS        -0.9629E-01  0.2755E-01   -3.496
  6  SMSA        0.1329      0.2658E-01    5.001
  7  IYEAR_67   -0.5421E-01  0.4785E-01   -1.133
  8  IYEAR_68    0.8058E-01  0.4490E-01    1.795
  9  IYEAR_69    0.2076      0.4386E-01    4.733
 10  IYEAR_70    0.2282      0.4880E-01    4.677
 11  IYEAR_71    0.2227      0.4310E-01    5.167
 12  IYEAR_73    0.3229      0.4066E-01    7.941
 13  CONSTANT    4.235       0.1133        37.37

Obs  NAMES      %LS2COEF     %LS2_SES     %LS2_SEL     %LS2_T_S     %LS2_T_L
  1  IQ          0.1747E-03  0.3937E-02   0.3903E-02   0.4436E-01   0.4474E-01
  2  S           0.6918E-01  0.1305E-01   0.1294E-01    5.301        5.347
  3  EXPR        0.2987E-01  0.6697E-02   0.6639E-02    4.460        4.498
  4  TENURE      0.4327E-01  0.7693E-02   0.7627E-02    5.625        5.674
  5  RNS        -0.1036      0.2974E-01   0.2948E-01   -3.484       -3.514
  6  SMSA        0.1351      0.2689E-01   0.2666E-01    5.025        5.069
  7  IYEAR_67   -0.5260E-01  0.4811E-01   0.4769E-01   -1.093       -1.103
  8  IYEAR_68    0.7947E-01  0.4511E-01   0.4472E-01    1.762        1.777
  9  IYEAR_69    0.2109      0.4432E-01   0.4393E-01    4.759        4.800
 10  IYEAR_70    0.2386      0.5142E-01   0.5097E-01    4.641        4.682
 11  IYEAR_71    0.2285      0.4412E-01   0.4374E-01    5.178        5.223
 12  IYEAR_73    0.3259      0.4107E-01   0.4072E-01    7.935        8.004
 13  CONSTANT    4.400       0.2709       0.2685        16.24        16.38

LHS     = LW
RHS     = IQ S EXPR TENURE RNS SMSA IYEAR_67 IYEAR_68 IYEAR_69 IYEAR_70 IYEAR_71 IYEAR_73 CONSTANT
IVAR    = S EXPR TENURE RNS SMSA IYEAR_67 IYEAR_68 IYEAR_69 IYEAR_70 IYEAR_71 IYEAR_73 CONSTANT MED KWW AGE MRT
ENDVAR  = iq

GMM Estimates

Dependent Variable                      LW
OLS sum of squares                      79.37338878983863
LS2 sum of squares                      80.01823370030675
GMM sum of squares                      81.26217887229201
Rank of Equation                        13
Order of Equation                       16
Equation is overidentified
Anderson ident./IV Relevance test       54.33777011513529
Significance of Anderson Statistic      0.9999999999552830
Anderson Canon Correlation LM test      52.43586586757428
Significance of Anderson LM Statistic   0.9999999998881718
Cragg-Donald Chi-Square Weak ID Test    56.33277600836977
Significance of Cragg-Donald test       0.9999999999829244
Hansen J_stat Ident. of instruments     74.16487762432548
Significance of Hansen j_stat           0.9999999999999994

+++++++++++++++++++++++++++++++++++++++++++++++++++++

Obs  NAMES      %COEFGMM     %SEGMM       %TGMM
  1  IQ         -0.1401E-02  0.4113E-02   -0.3407
  2  S           0.7684E-01  0.1319E-01    5.827
  3  EXPR        0.3123E-01  0.6693E-02    4.667
  4  TENURE      0.4900E-01  0.7344E-02    6.672
  5  RNS        -0.1007      0.2959E-01   -3.403
  6  SMSA        0.1336      0.2632E-01    5.075
  7  IYEAR_67   -0.2101E-01  0.4554E-01   -0.4614
  8  IYEAR_68    0.8910E-01  0.4270E-01    2.087
  9  IYEAR_69    0.2072      0.4080E-01    5.080
 10  IYEAR_70    0.2338      0.5285E-01    4.424
 11  IYEAR_71    0.2346      0.4257E-01    5.510
 12  IYEAR_73    0.3360      0.4041E-01    8.315
 13  CONSTANT    4.437       0.2900        15.30

B34S Matrix Command Ending. Last Command reached.

output from stata

  ___  ____  ____  ____  ____ (R)
 /__    /   ____/   /   ____/
___/   /   /___/   /   /___/   11.1   Copyright 2009 StataCorp LP
  Statistics/Data Analysis            StataCorp
                                      4905 Lakeway Drive
                                      College Station, Texas 77845 USA
                                      800-STATA-PC        http://www.stata.com
                                      979-696-4600        stata@stata.com
                                      979-696-4601 (fax)

Single-user Stata perpetual license:
       Serial number:  30110535901
         Licensed to:  Houston H. Stokes
                       University of Illinois at Chicago

Notes:
      1.  (/m# option or -set memory-) 120.00 MB allocated to data
      2.  Stata running in batch mode

. do stata.do

. * File built by B34S on 17/10/10 at 12:29:31
. run statdata.do

. * uncomment if do not use /e
. * log using stata.log, text
. global xlist s expr tenure rns smsa iyear_67 ///
>        iyear_68 iyear_69 iyear_70 iyear_71 iyear_73

. bootstrap _b _se, reps(50): ///
>        ivregress 2sls lw $xlist (iq=med kww age mrt)
(running ivregress on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Bootstrap results                               Number of obs      =       758
                                                Replications       =        50

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
b            |
          iq |   .0001747   .0074584     0.02   0.981    -.0144435    .0147928
           s |   .0691759   .0217356     3.18   0.001     .0265749    .1117769
        expr |    .029866   .0079507     3.76   0.000      .014283    .0454491
      tenure |   .0432738   .0086468     5.00   0.000     .0263264    .0602211
         rns |  -.1035897   .0406823    -2.55   0.011    -.1833256   -.0238538
        smsa |   .1351148   .0258812     5.22   0.000     .0843886    .1858411
    iyear_67 |   -.052598   .0422675    -1.24   0.213    -.1354408    .0302448
    iyear_68 |   .0794686   .0459301     1.73   0.084    -.0105528      .16949
    iyear_69 |   .2108962   .0456788     4.62   0.000     .1213673     .300425
    iyear_70 |   .2386338   .0592127     4.03   0.000      .122579    .3546886
    iyear_71 |   .2284609   .0513617     4.45   0.000     .1277939    .3291279
    iyear_73 |   .3258944   .0432171     7.54   0.000     .2411904    .4105984
       _cons |    4.39955   .4995474     8.81   0.000     3.420455    5.378645
-------------+----------------------------------------------------------------
se           |
          iq |   .0039035   .0012226     3.19   0.001     .0015073    .0062996
           s |   .0129366   .0034772     3.72   0.000     .0061214    .0197518
        expr |   .0066393   .0007373     9.00   0.000     .0051941    .0080845
      tenure |   .0076271   .0011929     6.39   0.000      .005289    .0099652
         rns |    .029481   .0052416     5.62   0.000     .0192077    .0397544
        smsa |   .0266573    .002741     9.73   0.000      .021285    .0320297
    iyear_67 |   .0476924   .0051268     9.30   0.000     .0376441    .0577407
    iyear_68 |   .0447194    .004026    11.11   0.000     .0368285    .0526102
    iyear_69 |   .0439336   .0055467     7.92   0.000     .0330623     .054805
    iyear_70 |   .0509733   .0052485     9.71   0.000     .0406864    .0612601
    iyear_71 |   .0437436   .0041483    10.54   0.000      .035613    .0518741
    iyear_73 |   .0407181   .0041193     9.88   0.000     .0326444    .0487917
       _cons |   .2685443   .0796381     3.37   0.001     .1124564    .4246321
------------------------------------------------------------------------------

. * Durbin-Wu-Hausman exogenous test robust errors
. ivregress 2sls lw $xlist (iq=med kww age mrt), vce(robust)
Simultaneous Equations Systems
Instrumental variables (2SLS) regression
Number of obs
Wald chi2(12)
Prob > chi2
R-squared
Root MSE
=
=
=
=
=
758
573.14
0.0000
0.4255
.32491
-----------------------------------------------------------------------------|
Robust
lw |
Coef.
Std. Err.
z
P>|z|
[95% Conf. Interval]
-------------+---------------------------------------------------------------iq |
.0001747
.0041241
0.04
0.966
-.0079085
.0082578
s |
.0691759
.0132907
5.20
0.000
.0431266
.0952253
expr |
.029866
.0066974
4.46
0.000
.0167394
.0429926
tenure |
.0432738
.0073857
5.86
0.000
.0287981
.0577494
rns | -.1035897
.029748
-3.48
0.000
-.1618947
-.0452847
smsa |
.1351148
.026333
5.13
0.000
.0835032
.1867265
iyear_67 |
-.052598
.0457261
-1.15
0.250
-.1422195
.0370235
iyear_68 |
.0794686
.0428231
1.86
0.063
-.0044631
.1634003
iyear_69 |
.2108962
.0408774
5.16
0.000
.1307779
.2910144
iyear_70 |
.2386338
.0529825
4.50
0.000
.1347901
.3424776
iyear_71 |
.2284609
.0426054
5.36
0.000
.1449558
.311966
iyear_73 |
.3258944
.0405569
8.04
0.000
.2464044
.4053844
_cons |
4.39955
.290085
15.17
0.000
3.830994
4.968106
-----------------------------------------------------------------------------Instrumented: iq
Instruments:
s expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70
iyear_71 iyear_73 med kww age mrt
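
The z, P>|z| and interval columns in the table above follow from the coefficient and its standard error under a standard normal reference distribution. The sketch below is only an illustrative check, not part of the Stata run: the function name normal_based_summary is hypothetical, and the two inputs are the rounded iq values copied from the robust table, so the results agree with the printed figures only up to rounding.

```python
from statistics import NormalDist


def normal_based_summary(coef, se, level=0.95):
    """Return (z, two-sided p-value, lower, upper) for a normal-based interval."""
    std_normal = NormalDist()                      # mean 0, standard deviation 1
    z = coef / se
    p = 2.0 * (1.0 - std_normal.cdf(abs(z)))       # two-sided p-value
    crit = std_normal.inv_cdf(0.5 + level / 2.0)   # about 1.96 for a 95% interval
    return z, p, coef - crit * se, coef + crit * se


# Rounded values copied from the iq row of the robust ivregress table.
z, p, lo, hi = normal_based_summary(0.0001747, 0.0041241)
print(f"z = {z:.2f}  P>|z| = {p:.3f}  interval = [{lo:.7f}, {hi:.7f}]")
```

The same function applied to any other row should reproduce that row's last four columns, again up to rounding of the printed inputs.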
. ivreg2 lw $xlist (iq=med kww age mrt)

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics consistent for homoskedasticity only

                                                      Number of obs =      758
                                                      F( 12,   745) =    45.91
                                                      Prob > F      =   0.0000
Total (centered) SS     =  139.2861498                Centered R2   =   0.4255
Total (uncentered) SS   =  24652.24662                Uncentered R2 =   0.9968
Residual SS             =   80.0182337                Root MSE      =    .3249

------------------------------------------------------------------------------
          lw |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          iq |   .0001747   .0039035     0.04   0.964     -.007476    .0078253
           s |   .0691759   .0129366     5.35   0.000     .0438206    .0945312
        expr |    .029866   .0066393     4.50   0.000     .0168533    .0428788
      tenure |   .0432738   .0076271     5.67   0.000     .0283249    .0582226
         rns |  -.1035897    .029481    -3.51   0.000    -.1613715   -.0458079
        smsa |   .1351148   .0266573     5.07   0.000     .0828674    .1873623
    iyear_67 |   -.052598   .0476924    -1.10   0.270    -.1460734    .0408774
    iyear_68 |   .0794686   .0447194     1.78   0.076    -.0081797    .1671169
    iyear_69 |   .2108962   .0439336     4.80   0.000     .1247878    .2970045
    iyear_70 |   .2386338   .0509733     4.68   0.000     .1387281    .3385396
    iyear_71 |   .2284609   .0437436     5.22   0.000     .1427251    .3141967
    iyear_73 |   .3258944   .0407181     8.00   0.000     .2460884    .4057004
       _cons |    4.39955   .2685443    16.38   0.000     3.873213    4.925887
------------------------------------------------------------------------------
Underidentification test (Anderson canon. corr. LM statistic):          52.436
                                                   Chi-sq(4) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):               13.786
Stock-Yogo weak ID test critical values:  5% maximal IV relative bias    16.85
                                         10% maximal IV relative bias    10.27
                                         20% maximal IV relative bias     6.71
                                         30% maximal IV relative bias     5.34
                                         10% maximal IV size             24.58
                                         15% maximal IV size             13.96
                                         20% maximal IV size             10.26
                                         25% maximal IV size              8.31
Source: Stock-Yogo (2005).  Reproduced by permission.
------------------------------------------------------------------------------
Sargan statistic (overidentification test of all instruments):          87.655
                                                   Chi-sq(3) P-val =    0.0000
------------------------------------------------------------------------------
Instrumented:         iq
Included instruments: s expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70
                      iyear_71 iyear_73
Excluded instruments: med kww age mrt
------------------------------------------------------------------------------
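
One way to read the diagnostic block above is to compare the Cragg-Donald statistic with each Stock-Yogo critical value and to convert the Sargan statistic into a p-value. The sketch below does only that arithmetic, using numbers copied from the ivreg2 listing; the names STOCK_YOGO, thresholds_passed and chi2_sf_3df are hypothetical helpers, and the closed-form tail probability is valid only for 3 degrees of freedom.

```python
import math


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variate with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


# Critical values as printed in the ivreg2 output above.
STOCK_YOGO = {
    "5% maximal IV relative bias": 16.85,
    "10% maximal IV relative bias": 10.27,
    "20% maximal IV relative bias": 6.71,
    "30% maximal IV relative bias": 5.34,
    "10% maximal IV size": 24.58,
    "15% maximal IV size": 13.96,
    "20% maximal IV size": 10.26,
    "25% maximal IV size": 8.31,
}


def thresholds_passed(f_stat, table=STOCK_YOGO):
    """Labels whose critical value the weak-identification statistic exceeds."""
    return [label for label, crit in table.items() if f_stat > crit]


cragg_donald = 13.786   # Cragg-Donald Wald F statistic from the listing
sargan = 87.655         # Sargan statistic, chi-square with 3 degrees of freedom

passed = thresholds_passed(cragg_donald)
sargan_p = chi2_sf_3df(sargan)
for label in passed:
    print("exceeds critical value for", label)
print("Sargan p-value below 1e-10:", sargan_p < 1e-10)
```

On these numbers the statistic clears the 10% relative-bias threshold but not the 5% one, and the Sargan p-value is effectively zero, which matches the 0.0000 printed by ivreg2.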
. ivreg2 lw $xlist (iq=med kww age mrt), gmm2s robust

2-Step GMM estimation
---------------------

Estimates efficient for arbitrary heteroskedasticity
Statistics robust to heteroskedasticity

                                                      Number of obs =      758
                                                      F( 12,   745) =    49.67
                                                      Prob > F      =   0.0000
Total (centered) SS     =  139.2861498                Centered R2   =   0.4166
Total (uncentered) SS   =  24652.24662                Uncentered R2 =   0.9967
Residual SS             =  81.26217887                Root MSE      =    .3274

------------------------------------------------------------------------------
             |               Robust
          lw |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          iq |  -.0014014   .0041131    -0.34   0.733     -.009463    .0066602
           s |   .0768355   .0131859     5.83   0.000     .0509915    .1026794
        expr |   .0312339   .0066931     4.67   0.000     .0181157    .0443522
      tenure |   .0489998   .0073437     6.67   0.000     .0346064    .0633931
         rns |  -.1006811   .0295887    -3.40   0.001    -.1586738   -.0426884
        smsa |   .1335973   .0263245     5.08   0.000     .0820021    .1851925
    iyear_67 |  -.0210135   .0455433    -0.46   0.645    -.1102768    .0682498
    iyear_68 |   .0890993    .042702     2.09   0.037     .0054049    .1727937
    iyear_69 |   .2072484   .0407995     5.08   0.000     .1272828     .287214
    iyear_70 |   .2338308   .0528512     4.42   0.000     .1302445    .3374172
    iyear_71 |   .2345525   .0425661     5.51   0.000     .1511244    .3179805
    iyear_73 |   .3360267   .0404103     8.32   0.000     .2568239    .4152295
       _cons |   4.436784   .2899504    15.30   0.000     3.868492    5.005077
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):             41.537
                                                   Chi-sq(4) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):               13.786
                         (Kleibergen-Paap rk Wald F statistic):         12.167
Stock-Yogo weak ID test critical values:  5% maximal IV relative bias    16.85
                                         10% maximal IV relative bias    10.27
                                         20% maximal IV relative bias     6.71
                                         30% maximal IV relative bias     5.34
                                         10% maximal IV size             24.58
                                         15% maximal IV size             13.96
                                         20% maximal IV size             10.26
Output from RATS
*
* Data passed from B34S(r) system to RATS
*
display @1 %dateandtime() @33 ' Rats Version ' %ratsversion()
10/17/2010 12:29                 Rats Version       7.30000
*
CALENDAR(IRREGULAR)
ALLOCATE 758
OPEN DATA rats.dat
DATA(FORMAT=FREE,ORG=OBS, $
MISSING= 0.1000000000000000E+32 ) / $
RNS $
RNS80 $
MRT $
MRT80 $
SMSA $
SMSA80 $
MED $
IQ $
KWW $
YEAR $
AGE $
AGE80 $
S $
S80 $
EXPR $
EXPR80 $
TENURE $
TENURE80 $
LW $
LW80 $
IYEAR_67 $
IYEAR_68 $
IYEAR_69 $
IYEAR_70 $
IYEAR_71 $
IYEAR_73 $
CONSTANT
SET TREND = T
TABLE
Series      Obs            Mean       Std Error         Minimum         Maximum
RNS         758     0.269129288     0.443800128     0.000000000     1.000000000
RNS80       758     0.292875989     0.455382503     0.000000000     1.000000000
MRT         758     0.514511873     0.500119364     0.000000000     1.000000000
MRT80       758     0.898416887     0.302298767     0.000000000     1.000000000
SMSA        758     0.704485488     0.456574966     0.000000000     1.000000000
SMSA80      758     0.712401055     0.452941990     0.000000000     1.000000000
MED         758    10.910290237     2.741119861     0.000000000    18.000000000
IQ          758   103.856200528    13.618666082    54.000000000   145.000000000
KWW         758    36.573878628     7.302246519    12.000000000    56.000000000
YEAR        758    69.031662269     2.631794247    66.000000000    73.000000000
AGE         758    21.835092348     2.981755741    16.000000000    30.000000000
AGE80       758    33.011873351     3.085503913    28.000000000    38.000000000
S           758    13.405013193     2.231828411     9.000000000    18.000000000
S80         758    13.707124011     2.214692601     9.000000000    18.000000000
EXPR        758     1.735428758     2.105542485     0.000000000    11.444000244
EXPR80      758    11.394261214     4.210745167     0.691999972    22.045000076
TENURE      758     1.831134565     1.673629972     0.000000000    10.000000000
TENURE80    758     7.362796834     5.050240439     0.000000000    22.000000000
LW          758     5.686738782     0.428949363     4.605000019     7.051000118
LW80        758     6.826555411     0.409926757     4.749000072     8.031999588
IYEAR_67    758     0.083113456     0.276235910     0.000000000     1.000000000
IYEAR_68    758     0.104221636     0.305749595     0.000000000     1.000000000
IYEAR_69    758     0.112137203     0.315743524     0.000000000     1.000000000
IYEAR_70    758     0.084432718     0.278219253     0.000000000     1.000000000
IYEAR_71    758     0.121372032     0.326774746     0.000000000     1.000000000
IYEAR_73    758     0.208443272     0.406463569     0.000000000     1.000000000
TREND       758   379.500000000   218.960042017     1.000000000   758.000000000
*
instruments s expr tenure rns smsa iyear_67 $
   iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 $
   med kww age mrt constant
* OLS
linreg lw
# constant s expr tenure rns smsa iyear_67 $
   iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq
Linear Regression - Estimation by Least Squares
Dependent Variable LW
Usable Observations        758      Degrees of Freedom       745
Centered R**2         0.430142      R Bar **2           0.420963
Uncentered R**2       0.996780      T x R**2             755.559
Mean of Dependent Variable       5.6867387823
Std Error of Dependent Variable  0.4289493629
Standard Error of Estimate       0.3264068956
Sum of Squared Residuals         79.373388790
Regression F(12,745)                  46.8619
Significance Level of F            0.00000000
Log Likelihood                     -220.33424
Durbin-Watson Statistic              1.726206

    Variable            Coeff      Std Error      T-Stat      Signif
********************************************************************************
 1. Constant        4.235356890   0.113348861    37.36568   0.00000000
 2. S               0.061954782   0.007278581     8.51193   0.00000000
 3. EXPR            0.030839472   0.006510083     4.73719   0.00000260
 4. TENURE          0.042163060   0.007481211     5.63586   0.00000002
 5. RNS            -0.096293467   0.027546700    -3.49564   0.00050091
 6. SMSA            0.132899286   0.026575835     5.00076   0.00000071
 7. IYEAR_67       -0.054209478   0.047852181    -1.13285   0.25764051
 8. IYEAR_68        0.080580850   0.044895091     1.79487   0.07307967
 9. IYEAR_69        0.207591515   0.043860470     4.73300   0.00000265
10. IYEAR_70        0.228223732   0.048799418     4.67677   0.00000346
11. IYEAR_71        0.222691481   0.043095233     5.16743   0.00000031
12. IYEAR_73        0.322874689   0.040657433     7.94134   0.00000000
13. IQ              0.002712120   0.001031411     2.62952   0.00872684
* 2SLS
linreg(inst) lw
# constant s expr tenure rns smsa iyear_67 $
   iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq

Linear Regression - Estimation by Instrumental Variables
Dependent Variable LW
Usable Observations        758      Degrees of Freedom       745
Mean of Dependent Variable       5.6867387823
Std Error of Dependent Variable  0.4289493629
Standard Error of Estimate       0.3277301102
Sum of Squared Residuals         80.018233699
J-Specification(3)                  86.151910
Significance Level of J            0.00000000
Durbin-Watson Statistic              1.723148

    Variable            Coeff      Std Error      T-Stat      Signif
********************************************************************************
 1. Constant        4.399550073   0.270877148    16.24187   0.00000000
 2. S               0.069175917   0.013048998     5.30124   0.00000015
 3. EXPR            0.029866018   0.006696962     4.45964   0.00000948
 4. TENURE          0.043273756   0.007693380     5.62480   0.00000003
 5. RNS            -0.103589698   0.029737133    -3.48351   0.00052378
 6. SMSA            0.135114831   0.026888925     5.02492   0.00000063
 7. IYEAR_67       -0.052598010   0.048106697    -1.09336   0.27458852
 8. IYEAR_68        0.079468615   0.045107833     1.76175   0.07852207
 9. IYEAR_69        0.210896152   0.044315294     4.75899   0.00000234
10. IYEAR_70        0.238633821   0.051416062     4.64123   0.00000409
11. IYEAR_71        0.228460915   0.044123572     5.17775   0.00000029
12. IYEAR_73        0.325894418   0.041071810     7.93475   0.00000000
13. IQ              0.000174655   0.003937397     0.04436   0.96463097
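
The RATS 2SLS coefficients match the Stata ivreg2 values, while the standard errors, standard error of estimate and J statistic differ slightly. The arithmetic sketch below suggests, as an interpretation rather than something stated in either program's output, that the gap is consistent with RATS dividing the residual sum of squares by the 745 degrees of freedom it reports while ivreg2 divides by the 758 observations. All inputs are copied from the listings above; the variable names are illustrative.

```python
import math

n_obs = 758            # usable observations
n_coef = 13            # constant plus 12 right-hand-side variables
rss = 80.018233699     # 2SLS sum of squared residuals from the RATS listing

rmse_n = math.sqrt(rss / n_obs)               # divisor n
rmse_df = math.sqrt(rss / (n_obs - n_coef))   # divisor n - k
scale = math.sqrt(n_obs / (n_obs - n_coef))   # implied ratio of standard errors

se_iq_ivreg2 = 0.0039035                      # ivreg2 standard error on iq
se_iq_rescaled = se_iq_ivreg2 * scale         # compare with RATS 0.003937397

sargan_ivreg2 = 87.655                        # ivreg2 Sargan statistic
sargan_rescaled = sargan_ivreg2 * (n_obs - n_coef) / n_obs   # compare with RATS J

print(f"root MSE with divisor n     : {rmse_n:.5f}")
print(f"root MSE with divisor n - k : {rmse_df:.7f}")
print(f"rescaled iq standard error  : {se_iq_rescaled:.7f}")
print(f"rescaled Sargan statistic   : {sargan_rescaled:.3f}")
```

If this reading is right, the two programs agree once a common variance divisor is used, so the differences in the printed test statistics are a matter of convention and not of estimation method.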
* GMM
linreg(inst,optimalweights) lw
# constant s expr tenure rns smsa iyear_67 $
   iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq

Linear Regression - Estimation by GMM
Dependent Variable LW
Usable Observations        758      Degrees of Freedom       745
Mean of Dependent Variable       5.6867387823
Std Error of Dependent Variable  0.4289493629
Standard Error of Estimate       0.3302676947
Sum of Squared Residuals         81.262178869
J-Specification(3)                  74.164878
Significance Level of J            0.00000000
Durbin-Watson Statistic              1.720776

    Variable            Coeff      Std Error      T-Stat      Signif
********************************************************************************
 1. Constant        4.436784487   0.289950376    15.30188   0.00000000
 2. S               0.076835453   0.013185922     5.82708   0.00000001
 3. EXPR            0.031233937   0.006693110     4.66658   0.00000306
 4. TENURE          0.048999780   0.007343684     6.67237   0.00000000
 5. RNS            -0.100681114   0.029588671    -3.40269   0.00066726
 6. SMSA            0.133597299   0.026324546     5.07501   0.00000039
 7. IYEAR_67       -0.021013483   0.045543337    -0.46140   0.64451500
 8. IYEAR_68        0.089099315   0.042701995     2.08654   0.03692996
 9. IYEAR_69        0.207248405   0.040799543     5.07967   0.00000038
10. IYEAR_70        0.233830843   0.052851170     4.42433   0.00000967
11. IYEAR_71        0.234552477   0.042566121     5.51031   0.00000004
12. IYEAR_73        0.336026675   0.040410335     8.31536   0.00000000
13. IQ             -0.001401434   0.004113144    -0.34072   0.73331372
4.7 Potential problems of IV Models
Instrumental variable estimation methods, while necessary and useful for models with
endogenous variables on the right-hand side, have a number of features that can be serious
drawbacks. 12 In the first place, such estimators are never unbiased when endogenous variables
are on the right. Citing Kinal (1980), Wooldridge (2010, 207) notes "when all endogenous variables have
homoskedastic normal distributions with expectations linear in the exogenous variables, the
number of moments of the 2SLS estimator that exist is one fewer than the number of
overidentifying restrictions. This finding implies that when the number of instruments equals the
number of explanatory variables, the IV estimator does not have the expected value." Even in
large samples there will be problems if the instruments are weak. Assume a model with a single
endogenous variable x on the right,
\[ y = \beta_0 + \beta_1 x + u \]
(4.7-1)
where z is the instrumental variable for x. It can be shown that
\[ \operatorname{plim} \hat{\beta}_1 = \beta_1 + \frac{\operatorname{cov}(z,u)}{\operatorname{cov}(z,x)} = \beta_1 + \frac{\sigma_u}{\sigma_x} \frac{\operatorname{corr}(z,u)}{\operatorname{corr}(z,x)} \]
(4.7-2)
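
The two equalities in (4.7-2) can be traced in a few steps. The block below is a sketch of the standard argument, assuming the simple IV estimator is the ratio of the sample covariances of z with y and of z with x; the symbol \sigma_z for the standard deviation of z is introduced here only for the derivation and cancels in the last line.

```latex
\begin{align*}
\hat{\beta}_1
  &= \frac{\widehat{\operatorname{cov}}(z,y)}{\widehat{\operatorname{cov}}(z,x)}
  && \text{(simple IV estimator)} \\
\operatorname{plim}\hat{\beta}_1
  &= \frac{\operatorname{cov}(z,\ \beta_0 + \beta_1 x + u)}{\operatorname{cov}(z,x)}
   = \beta_1 + \frac{\operatorname{cov}(z,u)}{\operatorname{cov}(z,x)}
  && \text{(substitute (4.7-1))} \\
\operatorname{cov}(z,u) &= \operatorname{corr}(z,u)\,\sigma_z\,\sigma_u ,
\qquad
\operatorname{cov}(z,x) = \operatorname{corr}(z,x)\,\sigma_z\,\sigma_x \\
\frac{\operatorname{cov}(z,u)}{\operatorname{cov}(z,x)}
  &= \frac{\sigma_u}{\sigma_x}\,
     \frac{\operatorname{corr}(z,u)}{\operatorname{corr}(z,x)}
  && \text{($\sigma_z$ cancels)}
\end{align*}
```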
The greater the correlation between the instrument and the population error u, the greater the
asymptotic bias. The weaker the instrument, the smaller corr(z, x) and the greater the bias. The
asymptotic bias in the OLS estimator is
\[ \operatorname{plim} \tilde{\beta}_1 = \beta_1 + \frac{\sigma_u}{\sigma_x} \operatorname{corr}(x,u) \]
(4.7-3)
and can be less than the bias in the IV estimator if
\[ |\operatorname{corr}(x,u)| < \left| \frac{\operatorname{corr}(z,u)}{\operatorname{corr}(z,x)} \right| \]
(4.7-4)
since the common factor \sigma_u / \sigma_x cancels. The more significant the Anderson test, the
larger |corr(z, x)|, everything else equal, and the smaller the bias in the IV estimator. The more
significant the Basmann (1960) test, the larger |corr(z, u)| and the greater the bias in the IV
estimator.

12 Wooldridge (2010), especially pages 107-114, forms the basis for this section.
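
A small numerical illustration of (4.7-2) through (4.7-4) may help. The sketch below simply evaluates the two asymptotic bias expressions; the function names and every correlation value are hypothetical choices made for the example, not estimates from the Griliches data.

```python
def iv_asymptotic_bias(sigma_u, sigma_x, corr_zu, corr_zx):
    """Second term of (4.7-2): (sigma_u / sigma_x) * corr(z,u) / corr(z,x)."""
    if corr_zx == 0:
        raise ValueError("corr(z, x) = 0: the instrument is irrelevant")
    return (sigma_u / sigma_x) * (corr_zu / corr_zx)


def ols_asymptotic_bias(sigma_u, sigma_x, corr_xu):
    """Second term of (4.7-3): (sigma_u / sigma_x) * corr(x,u)."""
    return (sigma_u / sigma_x) * corr_xu


def ols_has_smaller_bias(corr_xu, corr_zu, corr_zx):
    """Condition (4.7-4): |corr(x,u)| < |corr(z,u) / corr(z,x)|."""
    return abs(corr_xu) < abs(corr_zu / corr_zx)


# Hypothetical values: a mildly invalid instrument, corr(z,u) = 0.05,
# and an endogenous regressor with corr(x,u) = 0.30.
ols_bias = ols_asymptotic_bias(1.0, 1.0, 0.30)
weak_iv_bias = iv_asymptotic_bias(1.0, 1.0, 0.05, 0.10)     # weak: corr(z,x) = 0.10
strong_iv_bias = iv_asymptotic_bias(1.0, 1.0, 0.05, 0.50)   # strong: corr(z,x) = 0.50

print("OLS bias      :", round(ols_bias, 4))
print("weak IV bias  :", round(weak_iv_bias, 4))
print("strong IV bias:", round(strong_iv_bias, 4))
print("OLS better with weak instrument  :", ols_has_smaller_bias(0.30, 0.05, 0.10))
print("OLS better with strong instrument:", ols_has_smaller_bias(0.30, 0.05, 0.50))
```

With the same small correlation between instrument and error, the weak instrument leaves IV with a larger asymptotic bias than OLS, while the strong instrument reverses the ranking.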
4.8 Conclusion
The simeq command should be used either when there are endogenous variables on the
right-hand side of a regression model or when a seemingly unrelated regression model is
desired. In the former case, OLS estimates will be biased and inconsistent. Jennings
(1973, 1980), the original developer of the simeq code, made a major contribution in developing
fast and accurate code that was designed to alert the user to problems in the structure of the
model. These checks include rank tests on all the key matrices as well as rank tests on the matrix
of exogenous variables in the system. The matrix command was used to illustrate the calculation of
OLS, LIML, 2SLS, 3SLS and FIML models using more traditional equations than those used by
Jennings. SAS and RATS code was shown and the results were compared to the B34S program output.
Finally, LS2 (the same as 2SLS) and GMM routines written with the matrix command, together with a
number of diagnostic tests, were shown and the results compared to Stata and RATS using an
important dataset studied by Griliches (1975) and Baum (2006).