Revised Chapter 4 in Specifying and Diagnostically Testing Econometric Models (Edition 3)
© by Houston H. Stokes, 10 March 2012. All rights reserved. Preliminary Draft

Chapter 4 Simultaneous Equations Systems
4.0 Introduction
4.1 Estimation of Structural Models
  Table 4.1 Matlab Program to obtain Constrained Reduced Form
  Table 4.2 Edited output from running Matlab Program in Table 4.1
4.2 Estimation of OLS, LIML, LS2, LS3, and ILS3
4.3 Examples
  Table 4.3 B34S, Rats, SAS & Stata setups for ols, liml, ls2, ls3, and ils3 commands
  Table 4.4 Kmenta (1971, 582) OLS, 2SLS, LIML, 3SLS, I3SLS, FIML Test Problem Answers
  Table 4.5 Kmenta (1986, 712) OLS, 2SLS, LIML, 3SLS, I3SLS, FIML Test Problem Answers
4.4 Exactly identified systems
  Table 4.6 Exactly Identified Kmenta Problem
4.5 Analysis of OLS, 2SLS and 3SLS using Matrix Command
  Table 4.7 Matrix Command Implementation of OLS, 2SLS, 3SLS and FIML
4.6 LS2 and GMM Models and Specification tests
  Table 4.8 LS2 and General Method of Moments estimation routines
  Table 4.9 Estimation of LS2 and GMM Models using B34S, Stata and Rats
4.7 Potential problems of IV Models
  Table 4.10 Overview of IV Tests
  Table 4.11 Subroutine to Perform Hausman Tests
  Table 4.12 Various Hausman Tests
4.8 Conclusion

Simultaneous Equations Systems

4.0 Introduction

Section 4.1 first discusses the basic simultaneous equations model. It then introduces the constrained reduced form, the unconstrained reduced form and the final form. The MATLAB symbolic capability is used to illustrate how the constrained reduced form relates to the structural parameters of the model. Section 4.2 discusses in some detail the theory behind the QR approach to simultaneous equations modeling developed by Jennings (1980).

The simeq command estimates systems of equations by the following methods:
- ordinary least squares (OLS),
- limited information maximum likelihood (LIML),
- two-stage least squares (2SLS),
- three-stage least squares (3SLS),
- iterative three-stage least squares (I3SLS),
- seemingly unrelated regression (SUR), and
- full information maximum likelihood (FIML).

It uses code developed by Les Jennings (1973, 1980).
The Jennings code is unique in that it implements the QR approach to estimating systems of equations. This gives both substantial savings in time and increased accuracy.[1]

The estimation methods are well known and will only be sketched here. They are covered in detail in such books as Johnston (1963, 1972, 1984), Kmenta (1971, 1986), and Pindyck and Rubinfeld (1976, 1981, 1990). The emphasis is instead on the contributions of Jennings and others, and the discussion follows material in Jennings (1980) and Strang (1976) closely.

The remaining sections are organized as follows:
- Section 4.3 illustrates estimation of variants of the Kmenta model using B34S, RATS, SAS and Stata.
- Section 4.4 illustrates an exactly identified model.
- Section 4.5 shows how OLS, 2SLS, 3SLS and FIML can be estimated with the matrix command. That code is intended for illustration and benchmarking, not for production use.
- Section 4.6 presents the matrix command subroutines LS2 and GAMEST, which estimate single-equation 2SLS and GMM models, respectively. This is production code.

4.1 Estimation of Structural Models

Assume a system of G equations with K exogenous variables:[2]

$$\begin{aligned}
b_{11}y_{1i}+\cdots+b_{1G}y_{Gi}+\gamma_{11}x_{1i}+\cdots+\gamma_{1K}x_{Ki}&=u_{1i}\\
b_{21}y_{1i}+\cdots+b_{2G}y_{Gi}+\gamma_{21}x_{1i}+\cdots+\gamma_{2K}x_{Ki}&=u_{2i}\\
&\;\;\vdots\\
b_{G1}y_{1i}+\cdots+b_{GG}y_{Gi}+\gamma_{G1}x_{1i}+\cdots+\gamma_{GK}x_{Ki}&=u_{Gi}
\end{aligned}\tag{4.1-1}$$

Here $x_{ki}$ is the kth exogenous variable for the ith period, $y_{ji}$ is the jth endogenous variable for the ith period, and $u_{ji}$ is the jth equation error term for the ith period. Define

$$B=\begin{bmatrix}b_{11}&b_{12}&\cdots&b_{1G}\\ b_{21}&b_{22}&\cdots&b_{2G}\\ \vdots&&&\vdots\\ b_{G1}&b_{G2}&\cdots&b_{GG}\end{bmatrix},\quad
\Gamma=\begin{bmatrix}\gamma_{11}&\gamma_{12}&\cdots&\gamma_{1K}\\ \gamma_{21}&\gamma_{22}&\cdots&\gamma_{2K}\\ \vdots&&&\vdots\\ \gamma_{G1}&\gamma_{G2}&\cdots&\gamma_{GK}\end{bmatrix},\quad
y_i=\begin{bmatrix}y_{1i}\\ y_{2i}\\ \vdots\\ y_{Gi}\end{bmatrix},\quad
x_i=\begin{bmatrix}x_{1i}\\ x_{2i}\\ \vdots\\ x_{Ki}\end{bmatrix},\quad
u_i=\begin{bmatrix}u_{1i}\\ u_{2i}\\ \vdots\\ u_{Gi}\end{bmatrix}$$

With these definitions, equation (4.1-1) can be written as

$$By_i+\Gamma x_i=u_i \tag{4.1-2}$$

If all N observations of $y_i$, $x_i$ and $u_i$ are included, then

$$X=\begin{bmatrix}x_{11}&x_{12}&\cdots&x_{1N}\\ x_{21}&x_{22}&\cdots&x_{2N}\\ \vdots&&&\vdots\\ x_{K1}&x_{K2}&\cdots&x_{KN}\end{bmatrix},\quad
Y=\begin{bmatrix}y_{11}&y_{12}&\cdots&y_{1N}\\ y_{21}&y_{22}&\cdots&y_{2N}\\ \vdots&&&\vdots\\ y_{G1}&y_{G2}&\cdots&y_{GN}\end{bmatrix},\quad
U=\begin{bmatrix}u_{11}&u_{12}&\cdots&u_{1N}\\ u_{21}&u_{22}&\cdots&u_{2N}\\ \vdots&&&\vdots\\ u_{G1}&u_{G2}&\cdots&u_{GN}\end{bmatrix}$$

and equation (4.1-2) can be written as

$$BY+\Gamma X=U \tag{4.1-3}$$

From equation (4.1-3), the constrained reduced form can be calculated as

$$Y=-B^{-1}\Gamma X+B^{-1}U=\pi X+V \tag{4.1-4}$$

If π is estimated directly with OLS, it is called the unconstrained reduced form. The B34S simeq command estimates B and Γ using OLS, 2SLS, LIML, 3SLS, I3SLS or FIML. For each set of estimates, the associated reduced-form coefficient matrix π can optionally be calculated.[3]

If B is estimated by OLS, the coefficients will be biased. The key OLS assumption, that the right-hand-side variables are orthogonal to the error term, is violated.

Model (4.1-3) can be normalized so that $b_{ij}=1$ for $i=j$. The necessary (order) condition for identifying an equation is:
- (number of endogenous variables in the equation) minus one
- must be less than or equal to the number of exogenous variables excluded from it.

If this condition fails, it is not possible to solve uniquely for the structural coefficients in B and Γ in terms of the reduced-form coefficients π. A short, self-documented MATLAB example from Greene (2003) illustrates this problem.

[1] The B34S qr command is designed to provide up to 16 digits of accuracy. This command also allows estimation of principal component (PC) regression. It uses LINPACK code and is documented in Chapter 10. The qr command is distinct from the code in the simeq command. The matrix command contains extensive and programmable QR capability. For further examples see Chapters 10 and 16 and sections of Chapter 2.

[2] For further discussion see Pindyck and Rubinfeld (1981, 339-349).
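The identification argument can also be checked numerically. The Python/NumPy sketch below is an illustration under stated assumptions; it is not B34S or MATLAB code. The parameter values are hypothetical, and `reduced_form` and `order_condition` are names introduced here. The structure is Greene's two-equation model with the restrictions b21 = b32 = 0.

The sketch does two things:
- it computes $\pi=-B^{-1}\Gamma$;
- it solves the six reduced-form coefficients back for the six structural unknowns.

```python
import numpy as np

def reduced_form(B, Gamma):
    """Constrained reduced form pi = -B^{-1} Gamma of B y + Gamma x = u (eq. 4.1-4)."""
    # Solving B pi = -Gamma avoids forming B^{-1} explicitly.
    return -np.linalg.solve(B, Gamma)

def order_condition(n_rhs_endog, n_excluded_exog):
    """Order condition: (included endogenous - 1) <= excluded exogenous."""
    return n_rhs_endog <= n_excluded_exog

# Hypothetical values for the Greene (2003) model
#   y1 = g1*y2 + b11*x1 + b21*x2 + b31*x3
#   y2 = g2*y1 + b12*x1 + b22*x2 + b32*x3
# with the exclusion restrictions b21 = 0 and b32 = 0.
g1, g2 = 0.5, -0.3
b11, b31, b12, b22 = 1.0, 2.0, -1.5, 0.8
B = np.array([[1.0, -g1],
              [-g2, 1.0]])
Gamma = np.array([[-b11, 0.0, -b31],
                  [-b12, -b22, 0.0]])

pi = reduced_form(B, Gamma)

# Six reduced-form coefficients and six unknowns: solve back for the structure.
g1_hat = pi[0, 1] / pi[1, 1]          # x2 reaches y1 only through y2
g2_hat = pi[1, 2] / pi[0, 2]          # x3 reaches y2 only through y1
b11_hat = pi[0, 0] - g1_hat * pi[1, 0]
b31_hat = pi[0, 2] - g1_hat * pi[1, 2]
b12_hat = pi[1, 0] - g2_hat * pi[0, 0]
b22_hat = pi[1, 1] - g2_hat * pi[0, 1]
recovered = np.array([g1_hat, g2_hat, b11_hat, b31_hat, b12_hat, b22_hat])

print(recovered)
print(order_condition(1, 1), order_condition(1, 0))   # True False
```

If b21 and b32 are left free, the same back-substitution faces eight unknowns and only six equations, so no unique solution exists. The call `order_condition(1, 0)` flags that case.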
Table 4.1 Matlab Program to obtain Constrained Reduced Form

    % Greene (2003) Chapter 15 Problem # 1
    % y1= g1*y2 + b11*x1 + b21*x2 + b31*x3
    % y2= g2*y1 + b12*x1 + b22*x2 + b32*x3
    % We know BY+GX=E
    %
    syms g1 g2 b11 b21 b31 b12 b22 b32
    B =[ 1, -g1; -g2, 1]
    G =[-b11,-b21,-b31; -b12,-b22,-b32]
    a= -1*inv(B)*G
    p11=a(1,1)
    p12=a(1,2)
    p13=a(1,3)
    p21=a(2,1)
    p22=a(2,2)
    p23=a(2,3)
    % Hopeless. Have 6 equations BUT more than 6 variables
    ' Now impose restrictions'
    ' b21=0 b32=0'
    G =[-b11, 0, -b31; -b12,-b22, 0 ]
    B,G
    a= -1*inv(B)*G
    ' Here 6 equations and six unknowns g1 g2 b11 b31 b12 b22 '
    p11=a(1,1)
    p12=a(1,2)
    p13=a(1,3)
    p21=a(2,1)
    p22=a(2,2)
    p23=a(2,3)

[3] If the model is exactly identified, the constrained reduced form can be estimated directly by OLS, or obtained from (4.1-4) using LIML, 2SLS or 3SLS estimates. This is shown empirically in section 4.5.

Table 4.2 Edited output from running Matlab Program in Table 4.1

    p11 = -1/(-1+g1*g2)*b11+g1/(-1+g1*g2)*b12
    p12 = -1/(-1+g1*g2)*b21+g1/(-1+g1*g2)*b22
    p13 = -1/(-1+g1*g2)*b31+g1/(-1+g1*g2)*b32
    p21 = -g2/(-1+g1*g2)*b11+1/(-1+g1*g2)*b12
    p22 = -g2/(-1+g1*g2)*b21+1/(-1+g1*g2)*b22
    p23 = -g2/(-1+g1*g2)*b31+1/(-1+g1*g2)*b32

    Here 6 equations and six unknowns g1 g2 b11 b31 b12 b22

    p11 = -1/(-1+g1*g2)*b11+g1/(-1+g1*g2)*b12
    p12 = -g1/(-1+g1*g2)*b22
    p13 = -1/(-1+g1*g2)*b31
    p21 = -g2/(-1+g1*g2)*b11+1/(-1+g1*g2)*b12
    p22 = -1/(-1+g1*g2)*b22
    p23 = -g2/(-1+g1*g2)*b31

If the excluded exogenous variables of the ith equation are not significant in any other equation, the ith equation will not be identified, even if it is correctly specified.

We note that $E(u_i\mid x_i)=0$ and $E(u_iu_i')=\Sigma$, where $u_i=[u_{1i},\dots,u_{Gi}]'$ and $x_i=[x_{1i},\dots,x_{Ki}]'$. The reduced-form disturbance $v_i=B^{-1}u_i$ is not correlated with the exogenous variables, since $E(v_i\mid x_i)=B^{-1}0=0$.
$$E(v_iv_i'\mid x_i)=E[B^{-1}u_iu_i'(B')^{-1}]=B^{-1}\Sigma(B')^{-1}=\Omega$$

from which we deduce that

$$\Sigma=B\Omega B' \tag{4.1-5}$$

In summary, the matrices are:
- Γ is the G by K exogenous-variable coefficient matrix.
- B is the G by G nonsingular endogenous-variable coefficient matrix.
- Σ is the G by G symmetric positive definite structural covariance matrix.
- π is the G by K constrained reduced-form coefficient matrix.
- Ω is the G by G reduced-form covariance matrix.

This matters because π and Ω can be estimated consistently by OLS. Hence, if B were known, we could obtain Γ from (4.1-4) and Σ from (4.1-5) (Greene 2003, 387).

Suppose there are no endogenous variables on the right, but several equations are estimated whose error terms are correlated across equations. Then the seemingly unrelated regression (SUR) model can be estimated as

$$\hat\beta=[X'(\hat\Sigma^{-1}\otimes I)X]^{-1}X'(\hat\Sigma^{-1}\otimes I)Y \tag{4.1-6}$$

where X (block diagonal) and Y stack the data of the G equations. The elements $\hat\sigma_{ij}$ of $\hat\Sigma$ can be estimated by applying OLS to each of the G equations:

$$\hat\sigma_{ii}=\hat u_i'\hat u_i/(T-K_i),\qquad \hat\sigma_{ij}=\hat u_i'\hat u_j\big/\sqrt{(T-K_i)(T-K_j)} \tag{4.1-7}$$

Here $\hat u_i$ is the OLS residual vector of equation i and $K_i$ is the number of its right-hand-side variables. For more detail see Greene (2003) or other advanced econometrics books. Pindyck and Rubinfeld (1976, 1981, 1990) provide a particularly good treatment that is consistent with the notation in this chapter.

Starting from (4.1-4), Theil (1971, 463-468) suggests calculating the final form. First partition the ith observation of the exogenous variables into three groups, using identities to express lags greater than one:
- lagged endogenous variables,
- current exogenous variables, and
- lagged exogenous variables.

$$\pi=[d_0,D_1,D_2,D_3],\qquad x_i^*=\begin{bmatrix}1\\ y_{i-1}\\ x_i\\ x_{i-1}\end{bmatrix},\qquad y_i=d_0+D_1y_{i-1}+D_2x_i+D_3x_{i-1}+v_i^* \tag{4.1-8}$$

Theil (1971) shows that (4.1-8) can be expressed as

$$y_i=(I-D_1)^{-1}d_0+D_2x_i+\sum_{t=1}^{\infty}D_1^{t-1}(D_1D_2+D_3)x_{i-t}+\sum_{t=0}^{\infty}D_1^{t}v_{i-t}^* \tag{4.1-9}$$

where $D_2$ is the impact multiplier. If there are no lagged endogenous variables in the system, $D_1=0$ and the constrained reduced form and the final form are the same. In this case $\pi=[d_0,D_2,D_3]$.
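The multipliers implied by the final form (4.1-9) can be computed directly. The sketch below uses Python/NumPy; the D matrices are hypothetical and `final_form_multipliers` is a name introduced here. It assumes that the powers of $D_1$ die out, that is, all eigenvalues of $D_1$ lie inside the unit circle.

The sketch works in three steps:
1. It accumulates the coefficient $D_2$ on current x and the coefficients $D_1^{t-1}(D_1D_2+D_3)$ on lagged x.
2. It compares their sum with $(I-D_1)^{-1}(D_2+D_3)$.
3. As a cross-check, it iterates the recursion in (4.1-8) after a permanent unit change in x.

```python
import numpy as np

def final_form_multipliers(D1, D2, D3, horizon):
    """Impact and interim multipliers of the final form (4.1-9), plus the total multiplier."""
    interim = [D2]                        # coefficient on x_i: the impact multiplier
    step = D1 @ D2 + D3                   # coefficient on x_{i-1}
    for _ in range(horizon):
        interim.append(step)
        step = D1 @ step                  # coefficient on x_{i-t} is D1^{t-1}(D1 D2 + D3)
    G = D1.shape[0]
    total = np.linalg.solve(np.eye(G) - D1, D2 + D3)   # (I - D1)^{-1}(D2 + D3)
    return interim, total

# Hypothetical system: two endogenous variables, one exogenous variable
D1 = np.array([[0.5, 0.1],
               [0.0, 0.3]])               # eigenvalues 0.5 and 0.3, so the series converges
D2 = np.array([[1.0], [0.5]])
D3 = np.array([[0.2], [-0.1]])

interim, total = final_form_multipliers(D1, D2, D3, horizon=200)
cumulative = sum(interim)                 # response to a sustained unit change in x

# Independent check: iterate y_i = D1 y_{i-1} + D2 x_i + D3 x_{i-1} after x steps from 0 to 1
y, x_prev = np.zeros((2, 1)), 0.0
for _ in range(200):
    y = D1 @ y + D2 * 1.0 + D3 * x_prev
    x_prev = 1.0

print(total.ravel(), cumulative.ravel(), y.ravel())
```

The three printed vectors should agree. The iteration settles at the fixed point $y=D_1y+D_2+D_3$, which is the same total multiplier.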
The interim multipliers are $D_2,\ (D_1D_2+D_3),\ D_1(D_1D_2+D_3),\ \dots,\ D_1^{\tau}(D_1D_2+D_3)$. When summed, they form the total multiplier $G^*$. This assumes the powers of $D_1$ converge to zero, so that the geometric series converges:

$$\begin{aligned}
G^*&=D_2+(I+D_1+D_1^2+\cdots)(D_1D_2+D_3)\\
&=D_2+(I-D_1)^{-1}(D_1D_2+D_3)\\
&=(I-D_1)^{-1}[(I-D_1)D_2+D_1D_2+D_3]\\
&=(I-D_1)^{-1}(D_2+D_3)
\end{aligned}\tag{4.1-10}$$

Goldberger (1959) and Kmenta (1971, 592) provide added detail. Equation (4.1-10) is important because it shows how a change in any exogenous variable affects all endogenous variables once every effect has worked itself through the system.

There are several common mistakes made in setting up simultaneous equations systems, including the following:

- Not fully checking for multicollinearity in the equations system.
- Attempting to interpret the estimated B and Γ coefficients as partial derivatives, rather than looking at the G by K reduced-form matrix π.
- Not effectively testing whether excluded exogenous variables are significant in at least one other equation in the system.
- Not building into the solution procedure provisions that take into account the number of significant digits in the data.

The simeq code has unique design characteristics that address some of these problems. The next sections briefly outline some of these features.

Assume for a moment the following:
- X is a T by K matrix of observations of the exogenous variables.
- Y is a T by 1 vector of observations of the endogenous variable.
- β is a K element vector of OLS coefficients.

Then the OLS solution from equation (2.1-8) is $\hat\beta=(X'X)^{-1}X'Y$. The problem with this approach is that some accuracy is lost by forming the matrix $X'X$. The QR approach[4] instead operates directly on X. It expresses X in terms of the K by K upper triangular matrix R and the T by T orthogonal matrix Q.
X is factored as

$$X=Q\begin{bmatrix}R\\ 0\end{bmatrix}=[Q_1\mid Q_2]\begin{bmatrix}R\\ 0\end{bmatrix}=Q_1R \tag{4.1-11}$$

Since $Q'Q=I$, it follows that

$$\hat\beta=(X'X)^{-1}X'Y=(R'Q_1'Q_1R)^{-1}R'Q_1'Y=(R'R)^{-1}R'Q_1'Y=R^{-1}Q_1'Y \tag{4.1-12}$$

Following Jennings (1980), define the condition number of the matrix X, C(X). It is the square root of the ratio of the largest eigenvalue of $X'X$, $E_{max}(X'X)$, to the smallest eigenvalue, $E_{min}(X'X)$:

$$C(X)=\sqrt{E_{max}(X'X)/E_{min}(X'X)} \tag{4.1-13}$$

If $\|X\|=\sqrt{E_{max}(X'X)}$ and X is square and nonsingular, then

$$C(X)=\|X\|\,\|X^{-1}\| \tag{4.1-14}$$

Throughout B34S, 1/C(X) is checked to test for rank problems.

Jennings (1980) notes that C(X) can also be used as a measure of relative error. If μ is a measure of round-off error, then $\mu[C(X)]^2$ bounds the relative error of the calculated solution. On an IBM 370 running double precision, μ is approximately .1E-16. If C(X) is greater than .1E+8 (1/C(X) less than .1E-8), then $\mu[C(X)]^2$ approaches 1. This means that no digits in the reported solution are significant.

Jennings (1980) also looks at the problem from another perspective. Suppose the matrix X has a round-off error τX, so that the X actually used is X + τX. Then $\|\tau X\|/\|X\|$ must be less than 1/C(X) for a solution to exist. If

$$\|\tau X\|/\|X\|=1/C(X) \tag{4.1-15}$$

then there exists a τX such that X + τX is singular.[5]

The user can inspect the estimate of the condition number to judge the degree of multicollinearity. Most programs report problems only when the matrix is singular, whereas C(X) warns of the degree of the problem.

The simeq command contains the IPR parameter option, with which the user can tell the program the number of significant digits in X. This information is used to terminate the iterative three-stage least squares (I3SLS) iterations. Iteration stops when the relative change in the solution is within what would be expected, given the number of significant digits in the data.

[4] A good discussion of the QR factorization is contained in Strang (1976). Other references include Jennings (1980) and Dongarra, Bunch, Moler, and Stewart (1979).
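The following is a minimal sketch of why the QR route matters. It uses NumPy's `numpy.linalg.qr` as a stand-in for the LINPACK-based code in simeq. The same nearly collinear regression, with an exact fit, is solved two ways: by (4.1-12) and by the normal equations. C(X) from (4.1-13) is computed from the singular values of X, which avoids forming $X'X$.

The design and the names `ols_qr`, `ols_normal_equations` and `condition_number` are illustrative assumptions. μ is taken to be IEEE double-precision machine epsilon rather than the IBM 370 value quoted above.

```python
import numpy as np

def ols_qr(X, y):
    """OLS via the thin QR factorization X = Q1 R (eqs. 4.1-11 and 4.1-12)."""
    Q1, R = np.linalg.qr(X)               # reduced mode: Q1 is T x K, R is K x K upper triangular
    return np.linalg.solve(R, Q1.T @ y)   # beta = R^{-1} Q1' y; X'X is never formed

def ols_normal_equations(X, y):
    """Textbook OLS that forms X'X explicitly, squaring the condition number."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def condition_number(X):
    """C(X) = sqrt(Emax(X'X)/Emin(X'X)) (eq. 4.1-13), via the singular values of X."""
    s = np.linalg.svd(X, compute_uv=False)   # singular values = sqrt of eigenvalues of X'X
    return s[0] / s[-1]                      # svd returns them in descending order

# Hypothetical, nearly collinear design: x2 differs from x1 by a tiny perturbation
rng = np.random.default_rng(0)
T = 100
x1 = rng.normal(size=T)
x2 = x1 + 1e-6 * rng.normal(size=T)
X = np.column_stack([np.ones(T), x1, x2])
beta_true = np.array([1.0, 2.0, 3.0])
y = X @ beta_true                         # exact fit, so all error is numerical

C = condition_number(X)
mu = np.finfo(float).eps                  # machine epsilon for IEEE doubles, about 2.2e-16
err_qr = np.max(np.abs(ols_qr(X, y) - beta_true))
err_ne = np.max(np.abs(ols_normal_equations(X, y) - beta_true))
print(f"C(X) = {C:.3e}, mu*C^2 = {mu * C**2:.3e}")
print(f"max error: QR {err_qr:.1e}, normal equations {err_ne:.1e}")
```

On data like this, the QR error should be several orders of magnitude smaller than the normal-equations error. This matches the C(X) versus $[C(X)]^2$ behaviour described above. The exact digits depend on the platform's LAPACK build.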
Jennings (1980) notes that the relative error of the QR solution to the OLS problem in equation (4.1-12) has the form

$$n_1C(X)+n_2C(X)^2\left(\|\hat e\|/\|\hat\beta\|\right) \tag{4.1-16}$$

Here $n_1$ and $n_2$ are of the order of machine precision. $\|\hat e\|$ and $\|\hat\beta\|$ are the lengths of the estimated residual and estimated coefficient vectors, respectively. (The length, or L2NORM, of a vector e is defined as $\sqrt{\sum_i e_i^2}$.)

Equation (4.1-16) shows that the closer the model fits (the smaller $\|\hat e\|$ is relative to $\|\hat\beta\|$), the smaller the relative error of the computed solution. simeq reports an estimate of this relative error for the OLS, LIML and 2SLS estimators.

[5] For more detail on techniques used in simeq to avoid numerical error in the calculations arising from differences in the means of the data, see Jennings (1980).

4.2 Estimation of OLS, LIML, LS2, LS3, and ILS3

For OLS estimation of a system of equations, simeq uses the QR approach discussed earlier. If the reduced option is used, the constrained reduced-form coefficients π from equation (4.1-4) are displayed once the structural coefficients B and Γ in equation (4.1-3) are known.

Suppose B and Γ are estimated using OLS and all structural equations are exactly identified. Then the constraints that B and Γ impose on π are not binding, and π could be estimated directly with OLS or indirectly via (4.1-4). However, if one or more equations in the structural system (4.1-2) are overidentified, π must be estimated as $-B^{-1}\Gamma$.

The reduced-form coefficients π exist and can be calculated from any set of structural estimates B and Γ. In practice, however, it is not desirable to report those derived from OLS estimation. When endogenous variables appear on the right-hand side of an equation, the OLS assumption that the error term is orthogonal to the explanatory variables is violated.
Since OLS imposes this orthogonality as part of the estimation process, the resulting estimates of B and Γ are biased. OLS is nevertheless often used as a benchmark because it minimizes the residual variance. The loss in predictive power (in-sample fit) of LIML and 2SLS has to be weighed against the fact that OLS produces biased estimates.

If reduced-form coefficients are desired, the identities in the system must be entered. The number of identities plus the number of estimated equations must equal the number of endogenous variables in the model. Accordingly, the simeq command requires that the number of model sentences plus identity sentences equal the number of variables listed in the endogenous sentence.

The 2SLS estimator proceeds in two stages:
1. It estimates all endogenous variables as a function of all exogenous variables. This is equivalent to estimating an unconstrained form of the reduced-form equation (4.1-4).
2. The estimated values $\hat Y_j$ of the right-hand-side endogenous variables in the jth equation replace their actual values $Y_j$, and equation (4.1-2) is estimated.

Since these estimated values are functions of the exogenous variables only, the theory suggests they can be assumed to be orthogonal to the population error. OLS can therefore be safely used in the second stage. In terms of our prior notation, the two-stage estimator for the first equation is

$$\begin{bmatrix}\hat b_1\\ \hat\gamma_1\end{bmatrix}=\begin{bmatrix}\hat Y_1'\hat Y_1 & \hat Y_1'X_1\\ X_1'\hat Y_1 & X_1'X_1\end{bmatrix}^{-1}\begin{bmatrix}\hat Y_1'y_1\\ X_1'y_1\end{bmatrix}=\{(\hat Y_1\;X_1)'(\hat Y_1\;X_1)\}^{-1}(\hat Y_1\;X_1)'y_1 \tag{4.2-1}$$

The notation is:
- $\hat b_1=(\hat b_{12},\dots,\hat b_{1g})'$ and $\hat\gamma_1=(\hat\gamma_{11},\dots,\hat\gamma_{1k})'$.
- $\hat Y_1$ is the matrix of predicted right-hand-side endogenous variables in the first equation.
- $X_1$ is the matrix of exogenous variables in the first equation.

For further details on this traditional estimation approach, see Pindyck and Rubinfeld (1981, 345-347).
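The two stages behind (4.2-1) can be written out in a few lines. This Python/NumPy sketch uses a simulated two-equation system with hypothetical coefficients. Equation 1 is overidentified because it excludes x2 and x3. `tsls_two_stage` is a name introduced here, not a B34S routine.

OLS on the same equation is shown for contrast: the correlated errors make y2 correlated with u1.

```python
import numpy as np

def tsls_two_stage(y1, Y1, X1, X):
    """Single-equation 2SLS by explicit first and second stages (eq. 4.2-1).

    y1: (T,) normalized endogenous variable      Y1: (T, g) right-hand-side endogenous
    X1: (T, k) included exogenous variables      X:  (T, K) all exogenous variables
    Returns the coefficients on [Y1, X1].
    """
    # Stage 1: unconstrained reduced form, regress each column of Y1 on all of X
    P, *_ = np.linalg.lstsq(X, Y1, rcond=None)
    Y1_hat = X @ P
    # Stage 2: OLS of y1 on the predicted endogenous variables and the included exogenous ones
    Z_hat = np.column_stack([Y1_hat, X1])
    coef, *_ = np.linalg.lstsq(Z_hat, y1, rcond=None)
    return coef

# Hypothetical system (coefficients chosen for illustration):
#   y1 = 0.5*y2 + 1.0*x1 + u1                 (x2, x3 excluded: overidentified)
#   y2 = -0.4*y1 + 2.0*x2 + 0.5*x3 + u2
rng = np.random.default_rng(42)
T = 20_000
x1, x2, x3 = rng.normal(size=(3, T))
u = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=T)
A = np.array([[1.0, -0.5],
              [0.4, 1.0]])                 # y terms moved to the left-hand side
rhs = np.column_stack([1.0 * x1 + u[:, 0], 2.0 * x2 + 0.5 * x3 + u[:, 1]])
y1, y2 = np.linalg.solve(A, rhs.T)         # reduced form: rows are y1 and y2

const = np.ones(T)
X = np.column_stack([const, x1, x2, x3])   # all exogenous variables (instruments)
X1 = np.column_stack([const, x1])          # exogenous variables included in equation 1
coef_2sls = tsls_two_stage(y1, y2[:, None], X1, X)
coef_ols, *_ = np.linalg.lstsq(np.column_stack([y2, X1]), y1, rcond=None)
print("2SLS [y2, const, x1]:", coef_2sls)
print("OLS  [y2, const, x1]:", coef_ols)
```

With a large simulated sample, the 2SLS coefficient on y2 should lie near 0.5. OLS is pulled upward by the positive covariance between y2 and u1.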
The QR approach used by Jennings (1980) obtains the estimator in equation (4.2-1) as the solution of

$$Z_j'(XX^+)Z_j\,\delta_j=Z_j'(XX^+)y_j \tag{4.2-2}$$

for $\delta_j$. The terms are:
- $\delta_j=(b_j',\gamma_j')'$ stacks the coefficients of the right-hand-side endogenous and included exogenous variables of the jth equation.
- $Z_j=[Y_j\mid X_j]$ consists of the Y and X variables in the jth equation.
- $X^+$ is the pseudoinverse[6] of X.

$XX^+$ is not calculated directly but is expressed in terms of the QR factorization of X. Working directly on X, rather than forming X'X, gives substantially more accuracy. Jennings proceeds by writing

$$XX^+=Q\begin{bmatrix}I_r&0\\ 0&0\end{bmatrix}Q' \tag{4.2-3}$$

where $I_r$ is the r by r identity matrix and r is the rank of X. Using equation (4.2-3), equation (4.2-2) becomes

$$\hat Z_j'\begin{bmatrix}I_r&0\\ 0&0\end{bmatrix}\hat Z_j\,\delta_j=\hat Z_j'\begin{bmatrix}I_r&0\\ 0&0\end{bmatrix}\hat y_j \tag{4.2-4}$$

where $\hat Z_j=Q'Z_j$ and $\hat y_j=Q'y_j$. The 2SLS covariance matrix can be estimated as

$$(\|e_j\|^2/d_f)\,(Z_j'XX^+Z_j)^{-1} \tag{4.2-5}$$

where $d_f$ is the degrees of freedom and $\|e_j\|^2$ is the residual sum of squares (the square of the L2NORM of the residual).

There is substantial controversy in the literature about the appropriate value for $d_f$. The standard errors (SEs) of the estimated 2SLS coefficients are known only asymptotically. Two conventions are in use:
- Theil (1971) suggests setting $d_f$ equal to T, the number of observations used to estimate the model.
- Others suggest setting $d_f$ to T - K, as in OLS.

Theil's suggestion produces smaller estimated SEs, so the T - K option is more conservative. The simeq command reports both sets of coefficient standard errors. This facilitates comparison with other programs and accommodates researcher preferences.

[6] Define $X^+$ as the pseudoinverse of the T by K matrix X. It can be shown (Strang 1976, 138, exercise 3.4.5) that the following four conditions hold:
1. $XX^+X=X$
2. $X^+XX^+=X^+$
3. $(XX^+)'=XX^+$
4. $(X^+X)'=X^+X$

The pseudoinverse can be obtained from the singular value decomposition or the QR factorization of X.
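The projection form (4.2-2)-(4.2-5) can be sketched with a thin QR factorization of X. The sketch assumes X has full column rank, so r = K and $XX^+=Q_1Q_1'$. The code, including the name `tsls_qr`, is an illustration rather than Jennings' implementation.

It does three things:
- checks the QR result against the textbook form, which builds the T by T projection matrix explicitly;
- returns standard errors under both $d_f$ conventions;
- interprets the T - K convention as T minus the number m of right-hand-side coefficients in the equation. This interpretation is an assumption on our part.

```python
import numpy as np

def tsls_qr(y, Z, X):
    """2SLS via the QR factorization of X (eqs. 4.2-2 to 4.2-5).

    y: (T,) left-hand variable; Z: (T, m) right-hand variables [Y_j | X_j];
    X: (T, K) all exogenous variables, assumed to have full column rank (r = K).
    """
    T, m = Z.shape
    Q1, _ = np.linalg.qr(X)                 # thin QR: X X^+ = Q1 Q1' when rank(X) = K
    Z_hat, y_hat = Q1.T @ Z, Q1.T @ y       # the r rows that [I_r 0; 0 0] keeps in (4.2-4)
    delta, *_ = np.linalg.lstsq(Z_hat, y_hat, rcond=None)  # solves the (4.2-4) normal equations
    e = y - Z @ delta                       # structural residual uses the actual Z
    ssr = e @ e                             # ||e_j||^2
    ZPZ_inv = np.linalg.inv(Z_hat.T @ Z_hat)                # (Z' X X^+ Z)^{-1}
    se_T = np.sqrt(np.diag(ssr / T * ZPZ_inv))              # Theil: d_f = T
    se_Tm = np.sqrt(np.diag(ssr / (T - m) * ZPZ_inv))       # OLS-style: d_f = T - m
    return delta, se_T, se_Tm

# Hypothetical simulated system: y1 = 0.5*y2 + x1 + u1, y2 = -0.4*y1 + 2*x2 + 0.5*x3 + u2
rng = np.random.default_rng(7)
T = 500
x1, x2, x3 = rng.normal(size=(3, T))
u = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=T)
A = np.array([[1.0, -0.5], [0.4, 1.0]])
rhs = np.column_stack([x1 + u[:, 0], 2.0 * x2 + 0.5 * x3 + u[:, 1]])
y1, y2 = np.linalg.solve(A, rhs.T)
const = np.ones(T)
X = np.column_stack([const, x1, x2, x3])
Z = np.column_stack([y2, const, x1])        # equation 1: y1 on [y2 | const, x1]

delta, se_T, se_Tm = tsls_qr(y1, Z, X)

# Textbook check: form the T x T projection X (X'X)^{-1} X' explicitly
P_X = X @ np.linalg.solve(X.T @ X, X.T)
delta_text = np.linalg.solve(Z.T @ P_X @ Z, Z.T @ P_X @ y1)
print(delta, delta_text)
print("SE (d_f = T):    ", se_T)
print("SE (d_f = T - m):", se_Tm)
```

The two coefficient vectors should agree to rounding error. The $d_f = T$ standard errors are uniformly the smaller pair.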
When an equation has endogenous variables on the right, 2SLS in theory produces consistent coefficient estimates, unlike OLS, at the cost of some loss of efficiency.

In a large system it is often impossible to use all the exogenous variables because of the loss of degrees of freedom, so the usual practice is to select a subset of them. The greater the number of exogenous variables relative to the number of observations, the closer the predicted right-hand-side Y variables are to the actual ones. In that situation:
- the 2SLS residual sum of squares approaches the OLS residual sum of squares, and
- the estimator loses the bias-reduction advantage of 2SLS.

Usual econometric practice is to estimate by both OLS and 2SLS and compare the results. This shows how sensitive the OLS results are to simultaneity.

2SLS results are sensitive to the variable used to normalize the equation. Limited information maximum likelihood (LIML) estimation, which can be used in place of 2SLS, is not sensitive to the normalization. Kmenta (1971, 568-570) has a clear discussion, summarized below.

The LIML estimator[7] is hard to explain in simple terms. It selects values for b and γ in each equation to minimize $L=SSE_1/SSE$, where:
- $SSE_1$ is the residual variance from regressing a weighted combination of the y variables in the equation on all exogenous variables in the equation;
- SSE is the residual variance from regressing the same weighted combination on all exogenous variables in the system.

Since $SSE\le SSE_1$, L is bounded below by 1. The difficulty in LIML estimation is selecting the weights for combining the y variables in the equation. Assume equation 1 of (4.1-1):

$$b_{11}y_{1i}+\cdots+b_{1G}y_{Gi}+\gamma_{11}x_{1i}+\cdots+\gamma_{1K}x_{Ki}=u_{1i} \tag{4.2-6}$$

[7] Kmenta (1971, 565-572) has one of the clearest descriptions. The discussion here complements that material.
Ignoring time subscripts, define

$$y_1^*=y_1+[b_{12}y_2+\cdots+b_{1G}y_G] \tag{4.2-7}$$

Also define $Y_1^*=[y_1,\dots,y_G]$ and $B_1^*=[1,b_{12},\dots,b_{1G}]'$. If we knew $B_1^*$, we would know $y_1^*$, since $y_1^*=Y_1^*B_1^*$. We could then:
- regress $y_1^*$ on the x variables on the right-hand side of that equation and call the residual variance $SSE_1$;
- regress $y_1^*$ on all x variables in the system and call the residual variance SSE.

Let $X_1$ be the matrix of the x variables on the right-hand side of equation 1. If we knew $B_1^*$, we could estimate $\Gamma_1=[\gamma_{11},\dots,\gamma_{1K}]'$ as

$$\Gamma_1=-(X_1'X_1)^{-1}X_1'y_1^*=-(X_1'X_1)^{-1}X_1'Y_1^*B_1^* \tag{4.2-8}$$

However, we do not know $B_1^*$. Define

$$W_1^*=Y_1^{*\prime}Y_1^*-(Y_1^{*\prime}X_1)(X_1'X_1)^{-1}X_1'Y_1^* \tag{4.2-9}$$

$$W_1=Y_1^{*\prime}Y_1^*-(Y_1^{*\prime}X)(X'X)^{-1}X'Y_1^* \tag{4.2-10}$$

where X is the matrix of all x variables in the system. Then L can be written as

$$L=\frac{B_1^{*\prime}W_1^*B_1^*}{B_1^{*\prime}W_1B_1^*} \tag{4.2-11}$$

Minimizing L implies that

$$(W_1^*-LW_1)B_1^*=0 \tag{4.2-12}$$

The LIML estimator uses eigenvalue analysis to select the vector $B_1^*$ that minimizes L. This involves solving

$$\det(W_1^*-LW_1)=0 \tag{4.2-13}$$

for the smallest root L, which we call λ. Substituting this root into equation (4.2-12) gives $B_1^*$, and then equation (4.2-8) gives $\Gamma_1$. Jennings shows that equation (4.2-13) can be rewritten as

$$\det\left[Y_1^{*\prime}\{(I-X_1X_1^+)-\lambda(I-XX^+)\}Y_1^*\right]=0 \tag{4.2-14}$$

Further factorizations improve accuracy and speed over the traditional methods of solution outlined in Johnston (1984), Kmenta (1971), and other books. Jennings (1973, 1980) briefly discusses two kinds of checks: tests for computational accuracy, given the number of significant digits in the data, and tests for nonunique solutions. One of the main objectives of the simeq code was to inform the user of identification problems, both in theory and in practice.
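Below is a compact sketch of the LIML computation in (4.2-8)-(4.2-13). It is written in Python/NumPy for a simulated system with hypothetical coefficients, and `liml` is a name introduced here. Instead of Jennings' further factorizations, it uses a Cholesky factor $W_1=CC'$ to turn $\det(W_1^*-LW_1)=0$ into a symmetric eigenproblem. The smallest eigenvalue is λ. Its eigenvector, rescaled so that the first element is 1, is $B_1^*$.

```python
import numpy as np

def liml(Ystar, X1, X):
    """LIML for one equation via the variance ratio L (eqs. 4.2-8 to 4.2-13).

    Ystar: (T, G1) endogenous variables in the equation, normalized variable first.
    X1: (T, k) included exogenous variables; X: (T, K) all exogenous variables.
    Returns the smallest root lam, B1* (first element 1) and Gamma1.
    """
    def resid(A, M):
        coef, *_ = np.linalg.lstsq(A, M, rcond=None)
        return M - A @ coef
    R1 = resid(X1, Ystar)                 # (I - X1 X1^+) Y1*
    R = resid(X, Ystar)                   # (I - X X^+) Y1*
    W1s, W1 = R1.T @ R1, R.T @ R          # W1* (eq. 4.2-9) and W1 (eq. 4.2-10)
    # det(W1* - L W1) = 0 becomes a symmetric eigenproblem once W1 = C C'
    C = np.linalg.cholesky(W1)
    Ci = np.linalg.inv(C)
    vals, vecs = np.linalg.eigh(Ci @ W1s @ Ci.T)
    lam = vals[0]                         # eigh returns eigenvalues in ascending order
    Bstar = Ci.T @ vecs[:, 0]             # map the eigenvector back: B1* = C'^{-1} v
    Bstar = Bstar / Bstar[0]              # normalization b11 = 1
    Gamma1 = -np.linalg.lstsq(X1, Ystar @ Bstar, rcond=None)[0]   # eq. 4.2-8
    return lam, Bstar, Gamma1

# Hypothetical simulated system: y1 = 0.5*y2 + x1 + u1, y2 = -0.4*y1 + 2*x2 + 0.5*x3 + u2
rng = np.random.default_rng(1)
T = 20_000
x1, x2, x3 = rng.normal(size=(3, T))
u = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=T)
A = np.array([[1.0, -0.5], [0.4, 1.0]])
rhs = np.column_stack([x1 + u[:, 0], 2.0 * x2 + 0.5 * x3 + u[:, 1]])
y1, y2 = np.linalg.solve(A, rhs.T)
const = np.ones(T)
X = np.column_stack([const, x1, x2, x3])
X1 = np.column_stack([const, x1])

lam, Bstar, Gamma1 = liml(np.column_stack([y1, y2]), X1, X)
print("lambda:", lam, "B1*:", Bstar, "Gamma1:", Gamma1)
```

The equation is written as $y_1+b_{12}y_2+\gamma'x=u$. The coefficient on y2 in the form y1 = βy2 + ... is therefore $-b_{12}$, which here should be close to 0.5. λ is at least 1, as the bound $SSE\le SSE_1$ requires.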
The LIML standard errors are known only asymptotically, and asymptotically they equal the 2SLS standard errors. The 2SLS standard errors are therefore reported for both the 2SLS and LIML estimators.

In the first stage of 2SLS, π is the unconstrained reduced form

$$Y=\pi X+V \tag{4.2-15}$$

which is estimated to obtain the predicted variables $\hat Y$.

2SLS, OLS, and LIML are all special cases of the Theil (1971) k-class estimators. The general formula for the k-class estimator of the first equation (Kmenta 1971, 565) is

$$\begin{bmatrix}\hat B_1(k)\\ \hat\Gamma_1(k)\end{bmatrix}=\begin{bmatrix}Y_1'Y_1-k\hat V_1'\hat V_1 & Y_1'X_1\\ X_1'Y_1 & X_1'X_1\end{bmatrix}^{-1}\begin{bmatrix}(Y_1-k\hat V_1)'y_1\\ X_1'y_1\end{bmatrix} \tag{4.2-16}$$

The terms are:
- $\hat V_1$ is the residual from estimating (4.2-15) for all but the first y variable in the equation, that is, for the right-hand-side endogenous variables $Y_1$.
- $\hat Y_1=Y_1-\hat V_1$.
- $X_1$ contains the x variables on the right-hand side of the first equation.

Equation (4.2-16) follows directly from (4.2-1). Particular values of k give the familiar estimators:
- k = 0: the formula for OLS estimation of the first equation.
- k = 1: the formula for 2SLS estimation of the first equation; it can be transformed to equation (4.2-1).
- k = λ, the minimum root of equation (4.2-13): the formula for the LIML estimator (Theil 1971, 504).

Hence OLS, 2SLS, and LIML are all members of the k-class of estimators.

Three-stage least squares uses the covariance of the residuals across equations from the estimated 2SLS model to improve the estimated coefficients B and Γ. If the model has only exogenous variables on the right-hand side (B = I), the OLS estimates can be used to calculate this covariance, and the resulting estimator is the seemingly unrelated regression (SUR) model. Since the SUR model is a special case, this discussion looks only at 3SLS.

From (4.2-2), we rewrite the 2SLS estimator for the ith equation as

$$\delta_i=[Z_i'X(X'X)^{-1}X'Z_i]^{-1}Z_i'X(X'X)^{-1}X'y_i \tag{4.2-17}$$

which estimates the ith 2SLS equation

$$y_i=Z_i\delta_i+u_i \tag{4.2-18}$$
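The k-class nesting in (4.2-16) can be verified numerically. The sketch below uses Python/NumPy with simulated data and hypothetical coefficients; `k_class` and `liml_root` are names introduced here. Three values of k are checked:
- k = 0 reproduces OLS.
- k = 1 reproduces 2SLS computed by explicit stages.
- k = λ reproduces the LIML coefficient obtained from the eigenvector of $W_1^{-1}W_1^*$.

```python
import numpy as np

def k_class(y1, Y1, X1, X, k):
    """Theil k-class estimator for one equation (eq. 4.2-16); returns coefs on [Y1, X1]."""
    Pi, *_ = np.linalg.lstsq(X, Y1, rcond=None)   # unconstrained reduced form (4.2-15)
    V1 = Y1 - X @ Pi                              # reduced-form residuals V1-hat
    top = np.hstack([Y1.T @ Y1 - k * (V1.T @ V1), Y1.T @ X1])
    bottom = np.hstack([X1.T @ Y1, X1.T @ X1])
    rhs = np.concatenate([(Y1 - k * V1).T @ y1, X1.T @ y1])
    return np.linalg.solve(np.vstack([top, bottom]), rhs)

def liml_root(Ystar, X1, X):
    """Smallest root lambda of det(W1* - L W1) = 0 (eq. 4.2-13) and its vector B1*."""
    M1 = Ystar - X1 @ np.linalg.lstsq(X1, Ystar, rcond=None)[0]
    M = Ystar - X @ np.linalg.lstsq(X, Ystar, rcond=None)[0]
    vals, vecs = np.linalg.eig(np.linalg.solve(M.T @ M, M1.T @ M1))   # W1^{-1} W1*
    i = np.argmin(vals.real)
    b = vecs[:, i].real
    return vals[i].real, b / b[0]

# Hypothetical simulated system: y1 = 0.5*y2 + x1 + u1, y2 = -0.4*y1 + 2*x2 + 0.5*x3 + u2
rng = np.random.default_rng(3)
T = 5_000
x1, x2, x3 = rng.normal(size=(3, T))
u = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=T)
A = np.array([[1.0, -0.5], [0.4, 1.0]])
rhs = np.column_stack([x1 + u[:, 0], 2.0 * x2 + 0.5 * x3 + u[:, 1]])
y1, y2 = np.linalg.solve(A, rhs.T)
const = np.ones(T)
X = np.column_stack([const, x1, x2, x3])
X1 = np.column_stack([const, x1])
Y1 = y2[:, None]

ols, *_ = np.linalg.lstsq(np.column_stack([Y1, X1]), y1, rcond=None)
Y1_hat = X @ np.linalg.lstsq(X, Y1, rcond=None)[0]
tsls, *_ = np.linalg.lstsq(np.column_stack([Y1_hat, X1]), y1, rcond=None)
lam, Bstar = liml_root(np.column_stack([y1, y2]), X1, X)

print(np.allclose(k_class(y1, Y1, X1, X, 0.0), ols))           # k = 0 reproduces OLS
print(np.allclose(k_class(y1, Y1, X1, X, 1.0), tsls))          # k = 1 reproduces 2SLS
print(np.isclose(k_class(y1, Y1, X1, X, lam)[0], -Bstar[1]))   # k = lambda gives LIML
```

The k = λ check rests on an algebraic identity, not on large-sample behaviour. The k-class normal equations at k = λ are the rows of $(W_1^*-\lambda W_1)B_1^*=0$ that correspond to the right-hand-side endogenous variables.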
If we define[8] $(X'X)^{-1}=PP'$ and premultiply equation (4.2-18) by $P'X'$, we obtain

$$P'X'y_i=P'X'Z_i\delta_i+P'X'u_i \tag{4.2-19}$$

which can be written

$$w_i=W_i\delta_i+\varepsilon_i \tag{4.2-20}$$

where $w_i=P'X'y_i$, $W_i=P'X'Z_i$, and $\varepsilon_i=P'X'u_i$. If all G 2SLS equations are written as

$$\begin{bmatrix}w_1\\ w_2\\ \vdots\\ w_G\end{bmatrix}=\begin{bmatrix}W_1&0&\cdots&0\\ 0&W_2&\cdots&0\\ \vdots&&\ddots&\vdots\\ 0&0&\cdots&W_G\end{bmatrix}\begin{bmatrix}\delta_1\\ \delta_2\\ \vdots\\ \delta_G\end{bmatrix}+\begin{bmatrix}\varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_G\end{bmatrix} \tag{4.2-21}$$

then the system can be written as

$$w=W\delta+\varepsilon \tag{4.2-22}$$

For any pair of equations i and j,

$$E[\varepsilon_i\varepsilon_j']=E[P'X'u_iu_j'XP]=\sigma_{ij}P'X'XP=\sigma_{ij}I \tag{4.2-23}$$

while the covariance of the error term for the system becomes

$$\begin{bmatrix}\sigma_{11}I&\sigma_{12}I&\cdots&\sigma_{1G}I\\ \sigma_{21}I&\sigma_{22}I&\cdots&\sigma_{2G}I\\ \vdots&&&\vdots\\ \sigma_{G1}I&\sigma_{G2}I&\cdots&\sigma_{GG}I\end{bmatrix}=\Sigma\otimes I \tag{4.2-24}$$

Equation (4.2-24) shows that there is no heteroskedasticity within each equation but that there is contemporaneous correlation of the residuals across equations. It can be estimated from each equation's residuals: the 2SLS residuals for 3SLS, or the OLS residuals for SUR models. Let

$$\hat V=\hat\Sigma\otimes I \tag{4.2-25}$$

be such an estimate. The 3SLS estimator of the system coefficient vector δ, which stacks the unknown elements of B and Γ, becomes

$$\hat\delta=(W'\hat V^{-1}W)^{-1}W'\hat V^{-1}w \tag{4.2-26}$$

Jennings (1980) uses one of two approaches to solve (4.2-26), depending on whether the covariance of the 3SLS estimator

$$\mathrm{Var}(\hat\delta)=(W'\hat V^{-1}W)^{-1} \tag{4.2-27}$$

is required:
- If it is, an orthogonal factorization method is used.
- If it is not, the conjugate gradient iterative algorithm (Lanczos reduction) suggested by Paige and Saunders (1973) is used to save space. This approach may or may not converge; for added detail see Jennings (1980).

If the switch kcov=diag is used, the QR approach is applied and there are no convergence issues. Many software systems use inversion methods, so slight differences in the estimated coefficients will be observed; in theory, the QR approach is more accurate.

[8] This discussion is based on material contained in Johnston (1984, 486).
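Here is a sketch of (4.2-19)-(4.2-27) for a two-equation system, in Python/NumPy with simulated data and hypothetical coefficients. `three_sls` is a name introduced here.

The sketch makes two implementation choices:
- It takes $P=R^{-1}$ from the QR factorization $X=Q_1R$. This satisfies $(X'X)^{-1}=PP'$ and gives $P'X'=Q_1'$.
- It solves (4.2-26) by forming $W'\hat V^{-1}W$ directly. That is the inversion-style approach, not Jennings' orthogonal factorization or Lanczos iterations.

Equation 1 is overidentified and equation 2 is exactly identified, mirroring the structure of the Kmenta test problem.

```python
import numpy as np

def three_sls(ys, Zs, X):
    """2SLS and 3SLS for a system via eqs. 4.2-19 to 4.2-27.

    ys: list of (T,) left-hand variables; Zs: list of (T, m_i) right-hand matrices;
    X: (T, K) all exogenous variables (full column rank assumed).
    With X = Q1 R, P = R^{-1} satisfies (X'X)^{-1} = P P' and P'X' = Q1'.
    """
    T, K = X.shape
    Q1, _ = np.linalg.qr(X)
    ws = [Q1.T @ y for y in ys]                       # w_i = P'X'y_i
    Ws = [Q1.T @ Z for Z in Zs]                       # W_i = P'X'Z_i
    d2 = [np.linalg.lstsq(W, w, rcond=None)[0] for W, w in zip(Ws, ws)]   # 2SLS (4.2-17)
    E = np.column_stack([y - Z @ d for y, Z, d in zip(ys, Zs, d2)])       # 2SLS residuals
    Sigma = E.T @ E / T                               # Sigma-hat in V = Sigma (x) I (4.2-25)
    G = len(ys)
    sizes = [W.shape[1] for W in Ws]
    Wbig = np.zeros((G * K, sum(sizes)))              # block-diagonal W of (4.2-21)
    col = 0
    for g, W in enumerate(Ws):
        Wbig[g * K:(g + 1) * K, col:col + sizes[g]] = W
        col += sizes[g]
    w = np.concatenate(ws)
    Vinv = np.kron(np.linalg.inv(Sigma), np.eye(K))
    A = Wbig.T @ Vinv @ Wbig
    d3 = np.linalg.solve(A, Wbig.T @ Vinv @ w)        # 3SLS estimator (4.2-26)
    cov3 = np.linalg.inv(A)                           # Var(delta-hat) (4.2-27)
    return d2, d3, cov3

# Hypothetical simulated system: y1 = 0.5*y2 + x1 + u1, y2 = -0.4*y1 + 2*x2 + 0.5*x3 + u2
rng = np.random.default_rng(5)
T = 20_000
x1, x2, x3 = rng.normal(size=(3, T))
u = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=T)
A = np.array([[1.0, -0.5], [0.4, 1.0]])
rhs = np.column_stack([x1 + u[:, 0], 2.0 * x2 + 0.5 * x3 + u[:, 1]])
y1, y2 = np.linalg.solve(A, rhs.T)
const = np.ones(T)
X = np.column_stack([const, x1, x2, x3])
Zs = [np.column_stack([y2, const, x1]),               # equation 1: overidentified
      np.column_stack([y1, const, x2, x3])]           # equation 2: exactly identified

d2, d3, cov3 = three_sls([y1, y2], Zs, X)
print("eq 1  2SLS:", d2[0], " 3SLS:", d3[:3])
print("eq 2  2SLS:", d2[1], " 3SLS:", d3[3:])
```

In runs like this, the overidentified equation's 3SLS coefficients coincide with its 2SLS coefficients up to rounding, while those of the exactly identified equation move. This matches the result, noted in the discussion of the Kmenta test problem, that only the exactly identified equation is changed by 3SLS.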
Section 4.5 illustrates the "textbook" approach using the matrix command.

Consider a model with G equations in which the jth equation is of interest. Assume the exogenous variables in the system are selected correctly and the jth equation is specified correctly. Then the 2SLS estimates of that equation do not depend on the specification of any other equation.

3SLS estimates of the jth equation, in contrast, are sensitive to the specification of the other equations in the system. Changes in other equations alter the estimate of V, and thus the 3SLS estimator of δ from equation (4.2-26).

Because of this, the following sequence is recommended:
1. Inspect the 2SLS estimates closely first.
2. Calculate the constrained reduced-form estimates π from the OLS and 2SLS models and compare them; the differences show the effects of correcting for simultaneity.
3. Perform 3SLS. The resulting changes in δ and π show the gain from moving to a system-wide estimation procedure.
4. Carry out sensitivity analysis, since changes in the functional form of one equation i can affect the estimates of another equation j.

In a multiequation system, moving from 2SLS to 3SLS often changes the estimate of $\delta_i$ for one equation but not for another:
- If all equations are overidentified, the 3SLS estimators will in general differ from the 2SLS estimators.
- If all equations are exactly identified, 3SLS reduces to 2SLS (Theil 1971, 511) and no equation gains from using 3SLS.

In the test problem from Kmenta (1971, 565), discussed in the next section, one equation is overidentified and one is exactly identified. In this case, only the exactly identified equation will be changed by 3SLS. The exactly identified equation gains from information in the overidentified equation, but the reverse is not true.
The over identified equation does not gain from information from the exactly identified equation. In SUR models, if all equations contain the same variables, there is no gain over OLS from going to SUR, since V is again a diagonal matrix. Just as the LIML method of estimation is an alternative to 2SLS, the FIML is a more costly alternative to 3SLS and I3SLS. 4-16 Chapter 4 FIML9 is a generalization of LIML for systems of models. Like LIML, it is invariant to the variable used to normalize the model. FIML, in contrast with 3SLS, is highly nonlinear and, as a consequence, much more costly to estimate. Because FIML is asymptotically equivalent to 3SLS (Theil 1971, 525) and the simeq code does not contain any major advantages over other programs, the discussion of FIML is left to Theil (1971), Kmenta (1971) and Johnston (1984) except for an annotated FIML example using the matrix command. In the next section, an annotated output is presented. Iterative 3SLS is an alternative final step in which the estimate of V is updated from the information from the 3SLS estimates. The problem now becomes where do you stop iterating on the estimates of V? The simeq command uses the information on the number of significant digits (see ipr parameter) in the raw data and equation (4.1-8) to terminate the I3SLS iterations if the relative change is within what would be expected, given the number of significant digits in the raw data. If ipr is not set, the simeq command assumes ten digits. 9 The fiml section of the simeq command is the weakest link. In addition to a probably a scaler error in the fiml standard errors, there often are convergence problems that appear to be data related. In view of this and the fact that 3SLS is an inexpensive substitute, users are encouraged to employ 3SLS and I3SLS in place of FIML. Future releases of B34S will endeavor to improve the FIML code or disable the option. 
The matrix command implementation of FIML, shown later in section 4.5, provides a look into how such a model might be implemented.

4.3 Examples

Table 4.3 shows b34s code that uses the supply and demand data from Kmenta (1971, 565) to estimate the model by OLS, LIML, 2SLS, 3SLS and I3SLS. The reduced-form estimates for each model are calculated. To save space, not all output is shown.

The results are the same, digit for digit, as those reported in Kmenta (1971, 582), which are shown in Table 4.4. Kmenta (1986, 712) reported results for the same problem with the same coefficients for all models except I3SLS for the supply equation. These answers are shown in Table 4.5.

Note the use of the keyword ls2 for 2SLS and ls3 for 3SLS, since the b34s parser will not recognize 2SLS and 3SLS as keywords. The b34s setup in Table 4.3 also contains Rats, SAS and Stata commands with which each software system can be benchmarked. These results will be discussed in turn.

Table 4.3 B34S, Rats, SAS & Stata setups for ols, liml, ls2, ls3, and ils3 commands

==KMENTA1
%b34slet runsimeq=1;
%b34slet runsas  =1;
%b34slet runrats =1;
%b34slet runstata=1;
B34sexec data nohead corr$
Input q p d f a $
Label q = 'Food consumption per head'$
Label p = 'Ratio of food prices to consumer prices'$
Label d = 'Disposable income in constant prices'$
Label f = 'Ratio of t 1 years price to general p'$
Label a = 'Time'$
Comment=('Kmenta (1971) page 565 answers page 582')$
Datacards$
 98.485 100.323  87.4  98.0  1
 99.187 104.264  97.6  99.1  2
102.163 103.435  96.7  99.1  3
101.504 104.506  98.2  98.1  4
104.240  98.001  99.8 110.8  5
103.243  99.456 100.5 108.2  6
103.993 101.066 103.2 105.6  7
 99.900 104.763 107.8 109.8  8
100.350  96.446  96.6 108.7  9
102.820  91.228  88.9 100.6 10
 95.435  93.085  75.1  81.0 11
 92.424  98.801  76.9  68.6 12
 94.535 102.908  84.6  70.9 13
 98.757  98.756  90.6  81.4 14
105.797  95.119 103.1 102.3 15
100.225  98.451 105.1 105.0 16
103.522  86.498  96.4 110.5 17
 99.929 104.016 104.4  92.5 18
105.223 105.769 110.7  89.3 19
106.232 113.490 127.1  93.0 20
B34sreturn$
B34seend$

%b34sif(&runsimeq.ne.0)%then;
B34sexec simeq printsys reduced ols liml ls2 ls3 ils3 kcov=diag ipr=6$
Heading=('Test Case from Kmenta (1971) Pages 565 - 582 ' ) $
Exogenous constant d f a $
Endogenous p q $
Model lvar=q rvar=(constant p d)   Name=('Demand Equation')$
Model lvar=q rvar=(constant p f a) name=('Supply Equation')$
B34seend$
%b34sendif;

%b34sif(&runsas.ne.0)%then;
B34SEXEC OPTIONS OPEN('testsas.sas') UNIT(29) DISP=UNKNOWN$ B34SRUN$
B34SEXEC OPTIONS CLEAN(29) $ B34SEEND$
B34SEXEC PGMCALL IDATA=29 ICNTRL=29$
SAS $
PGMCARDS$
proc means; run;
proc syslin 3sls reduced;
 instruments d f a constant;
 endogenous p q;
 demand: model q = p d;
 supply: model q = p f a;
run;
proc syslin it3sls reduced;
 instruments d f a constant;
 endogenous p q;
 demand: model q = p d;
 supply: model q = p f a;
run;
B34SRETURN$
B34SRUN $
B34SEXEC OPTIONS CLOSE(29)$ B34SRUN$
/$ The next card has to be modified to point to SAS location
/$ Be sure and wait until SAS gets done before letting B34S resume
B34SEXEC OPTIONS dodos('start /w /r sas testsas')
 dounix('sas testsas')$ B34SRUN$
B34SEXEC OPTIONS NPAGEOUT NOHEADER
 WRITEOUT(' ','Output from SAS',' ',' ')
 WRITELOG(' ','Output from SAS',' ',' ')
 COPYFOUT('testsas.lst')
 COPYFLOG('testsas.log')
 dodos('erase testsas.sas','erase testsas.lst','erase testsas.log')
 dounix('rm testsas.sas','rm testsas.lst','rm testsas.log')$
B34SRUN$
%b34sendif;

%b34sif(&runrats.ne.0)%then;
B34SEXEC OPTIONS HEADER$ B34SRUN$
b34sexec options open('rats.dat') unit(28) disp=unknown$ b34srun$
b34sexec options open('rats.in')  unit(29) disp=unknown$ b34srun$
b34sexec options clean(28)$ b34srun$
b34sexec options clean(29)$ b34srun$
b34sexec pgmcall$
 rats passasts
 pcomments('* ',
 '* Data passed from B34S(r) system to RATS',
 '* ',
 "display @1 %dateandtime() @33 ' Rats Version ' %ratsversion()"
 '* ') $
PGMCARDS$
*
* heading=('test case from kmenta 1971 page 565 - 582 ' ) $
* exogenous constant d f a $
* endogenous p q $
* model lvar=q rvar=(constant p d) name=('demand eq.') $
* model lvar=q rvar=(constant p f a) name=('supply eq.') $
linreg q
# constant p d
linreg q
# constant p f a
instruments constant d f a
linreg(inst) q
# constant p d
linreg(inst) q
# constant p f a
source d:\r\liml.src
@liml q
# constant p d
@liml q
# constant p f a
equation demand q
# constant p d
equation supply q
# constant p f a
* Supply does not match known answers!!
sur(inst,iterations=200) 2
# demand resid1
# supply resid2
nonlin(parmset=structural) c0 c1 c2 d0 d1 d2 d3
compute c0 = .1
compute c1 = .1
compute c2 = .1
compute d0 = .1
compute d1 = .1
compute d2 = .1
compute d3 = .1
frml d_eq q = c0 + c1*p + c2*d
frml s_eq q = d0 + d1*p + d2*f + d3*a
nlsystem(inst,parmset=structural,outsigma=v) * * d_eq s_eq
b34sreturn$
b34srun $
b34sexec options close(28)$ b34srun$
b34sexec options close(29)$ b34srun$
b34sexec options
/$ dodos(' rats386 rats.in rats.out ')
 dodos('start /w /r rats32s rats.in /run')
 dounix('rats rats.in rats.out')$
B34SRUN$
b34sexec options npageout
 WRITEOUT('Output from RATS',' ',' ')
 COPYFOUT('rats.out')
 dodos('ERASE rats.in','ERASE rats.out','ERASE rats.dat')
 dounix('rm rats.in','rm rats.out','rm rats.dat')
 $
B34SRUN$
%b34sendif;

%b34sif(&runstata.ne.0)%then;
/$ This name is required unless filename option used
b34sexec options open('statdata.do') unit(28) disp=unknown$ b34srun$
b34sexec options clean(28)$ b34srun$
b34sexec options open('stata.do') unit(29) disp=unknown$ b34srun$
b34sexec options clean(29)$ b34srun$
b34sexec pgmcall idata=28 icntrl=29$
stata$
pgmcards$
// uncomment if do not use /e
// log using stata.log, text
// version info
about
describe
summarize
reg3 (q p d) (q p f a), 2sls endog(p)
reg3 (q p d) (q p f a), 3sls endog(p)
reg3 (q p d) (q p f a), ireg3 endog(p)
b34sreturn$
b34seend$
b34sexec options close(28); b34srun;
b34sexec options close(29); b34srun;
b34sexec options dodos('stata /e do stata.do'); b34srun;
b34sexec options npageout
 writeout('output from stata',' ',' ')
 copyfout('stata.log')
 dodos('erase stata.do','erase stata.log','erase statdata.do')
$ b34srun$
%b34sendif;
==

The OLS results from b34s match Kmenta to every digit and are shown next:

Test Case from Kmenta (1971) Pages 565 - 582

Summary of Input Parameters and Model
Number of systems to be estimated - - - -    2
Number of identities  - - - - - - - - - -    0
Number of exogenous variables - - - - - -    4
Number of endogenous variables  - - - - -    2
Number of data points in time - - - - - -   20
Maximum number of unknowns per system - -    4
Print Parameter - - - - - - - - - - - - -    2
Solutions wanted 0 => no, 1 => yes
Reduced form coefficients - - - - - - - -    1
Ordinary Least Squares  - - - - - - - - -    1
LIMLE Solution  - - - - - - - - - - - - -    1
Two Stage Least Squares - - - - - - - - -    1
Three Stage Least Squares - - - - - - - -    1
Three Stage Covariance Matrix - - - - - -    1
Iterated Three Stage Least Squares  - - -    1
Covariance Matrix for I3SLSQ  - - - - - -    1
Maximum number of iterations  - - - - - -   25
Functional Minimization 3SLSQ - - - - - -    0
Covariance Matrix for Functional Min. - -    0

Systems described by the following columns of data
Name of the System    LHS   Variables
Demand Equation       Q     CONSTANT, P, D
Supply Equation       Q     CONSTANT, P, F, A

B34S 8.10R   (D:M:Y) 11/ 4/04   (H:M:S) 11:13:19   SIMEQ STEP

Least Squares Solution for System Number 1   Demand Equation
Condition Number of Matrix is greater than   21.04911571706159
Relative Numerical Error in the Solution     1.301987681166638E-11
LHS Endogenous Variable No. 2   Q

Exogenous Variables (Predetermined)
                             Std. Error       t
 1 CONSTANT    99.89542      7.519362         13.28509
 2 D            0.3346356    0.4542183E-01     7.367285
Endogenous Variables (Jointly Dependent)
 3 P           -0.3162988    0.9067741E-01    -3.488177

Residual Variance for Structural Disturbances   3.725391173733892
Ratio of Norm Residual to Norm LHS              1.762488253954560E-02

Covariance Matrix of Estimated Parameters
               CONSTANT      D             P
                  1          2             3
CONSTANT  1    56.54
D         2    0.3216E-01    0.2063E-02
P         3   -0.5948       -0.2333E-02    0.8222E-02

Correlation Matrix of Estimated Parameters
               CONSTANT      D             P
                  1          2             3
CONSTANT  1    1.000
D         2    0.9417E-01    1.000
P         3   -0.8724       -0.5665        1.000

Test Case from Kmenta (1971) Pages 565 - 582
Least Squares Solution for System Number 2   Supply Equation
Condition Number of Matrix is greater than   17.67594711864223
Relative Numerical Error in the Solution     1.318741471618151E-11
LHS Endogenous Variable No. 2   Q

Exogenous Variables (Predetermined)
                             Std. Error       t
 1 CONSTANT    58.27543      11.46291          5.083825
 2 F            0.2481333    0.4618785E-01     5.372263
 3 A            0.2483023    0.9751777E-01     2.546227
Endogenous Variables (Jointly Dependent)
 4 P            0.1603666    0.9488394E-01     1.690134

Residual Variance for Structural Disturbances   5.784441135907554
Ratio of Norm Residual to Norm LHS              2.130622575072544E-02

Covariance Matrix of Estimated Parameters
               CONSTANT      F             A             P
                  1          2             3             4
CONSTANT  1    131.4
F         2   -0.3044        0.2133E-02
A         3   -0.2792        0.1316E-02    0.9510E-02
P         4   -0.9875        0.8440E-03    0.5220E-03    0.9003E-02

Correlation Matrix of Estimated Parameters
               CONSTANT      F             A             P
                  1          2             3             4
CONSTANT  1    1.000
F         2   -0.5749        1.000
A         3   -0.2498        0.2921        1.000
P         4   -0.9079        0.1926        0.5642E-01    1.000

Test Case from Kmenta (1971) Pages 565 - 582
Contemporaneous Covariance of Residuals (Structural Disturbances)
For Least Squares Solution.
Condition Number of residual columns,   2.664758
               Demand E     Supply E
                  1            2
Demand E  1    3.167
Supply E  2    3.411        4.628

Correlation Matrix of Residuals
               Demand E     Supply E
                  1            2
Demand E  1    1.000
Supply E  2    0.8912       1.000

Test Case from Kmenta (1971) Pages 565 - 582
Coefficients of the Reduced Form Equations.
Least Squares Solution.
Condition number of matrix used to find the reduced form coefficients is no smaller than   4.195815340351579
         CONSTANT     D          F          A
            1         2          3          4
P  1      87.31      0.7020    -0.5206    -0.5209
Q  2      72.28      0.1126     0.1647     0.1648

Mean sum of squares of residuals for the reduced form equations.
      1               2
      P               Q
  0.42748D+01     0.39192D+01

Condition Number of columns of exogenous variables,   11.845

For each estimated equation, two quantities are given: the condition number of the matrix, equation (4.1-7), and the relative numerical error in the solution, equation (4.1-8). The relative numerical errors for the demand and supply equations were .1302E-10 and .13187E-10, respectively. The estimated coefficients agree with Kmenta (1971, 582). The constrained reduced form π coefficients are calculated from the estimated B and Γ coefficients. The condition number of the exogenous columns, .11845E+2, shows little multicollinearity among the exogenous variables.

The next outputs show the corresponding estimates for LIML, 2SLS, and 3SLS. As was discussed earlier, the asymptotic SEs for LIML are the same as for 2SLS, so the simeq command does not print these values. Kmenta, however, reports standard errors for the LIML estimates. Note that for 2SLS and 3SLS, b34s reports both the small-sample standard errors (Std. Error) and the large-sample standard errors (Theil SE).
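The two divisors used by b34s can be checked directly from the printed numbers. The sketch below assumes that the b34s "Theil SE" is the small-sample standard error rescaled by sqrt((T-K)/T), i.e. computed with divisor T instead of T-K. The printed values are consistent with this, but it is an inference, not a statement of the simeq code. The helper names residual_variances and large_sample_se are hypothetical. The example uses the 2SLS demand equation (T = 20, K = 3) and the sum of squared residuals 65.729087795 that RATS prints for that equation.

```python
import math


def residual_variances(ssr, t_obs, k_params):
    """Return (small-sample, large-sample) estimates of sigma^2: SSR/(T-K), SSR/T."""
    return ssr / (t_obs - k_params), ssr / t_obs


def large_sample_se(small_sample_se, t_obs, k_params):
    """Rescale a (T-K)-based standard error to the T-based (Theil) version."""
    return small_sample_se * math.sqrt((t_obs - k_params) / t_obs)


T = 20
# 2SLS demand equation: K = 3 (CONSTANT, D, P); SSR as printed by RATS.
s2_small, s2_large = residual_variances(65.729087795, T, 3)
# Constant term of the 2SLS demand equation: Std. Error 7.920838.
theil_se_const = large_sample_se(7.920838, T, 3)

if __name__ == "__main__":
    print(f"SSR/(T-K) = {s2_small:.6f}   SSR/T = {s2_large:.6f}")
    print(f"ratio = {s2_small / s2_large:.6f}   T/(T-K) = {T / (T - 3):.6f}")
    print(f"std. error of estimate, T-K divisor: {math.sqrt(s2_small):.8f}")
    print(f"std. error of estimate, T divisor:   {math.sqrt(s2_large):.8f}")
    print(f"Theil SE of constant: {theil_se_const:.6f}")
```

The first variance matches the 2SLS residual variance (3.866417) and the second the 3SLS value (3.286454). Their square roots match the RATS "Standard Error of Estimate" for linreg(inst) (1.96632066) and for the system estimate (1.81285807). The rescaled constant SE matches the printed Theil SE of 7.302652.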
Test Case from Kmenta (1971) Pages 565 - 582
Limited Information - Maximum Likelihood Solution   1   Demand Equation
Rank and Condition Number of Exogenous Columns                         2   8.5174634
Rank and Condition Number of Endogenous Variables orthogonal to X(K)   2   6.5593694
Rank and Condition Number of Endogenous Variables orthogonal to X      2   2.3005812
Value of LIML Parameter is   1.173867141559841
Condition Number of Matrix is greater than   8.517463415017575
Relative Numerical Error in the Solution     4.487883690647531E-12
LHS Endogenous Variable No. 2   Q
Standard Deviation Equals 2SLSQ Standard Deviation.

Exogenous Variables (Predetermined)
 1 CONSTANT    93.61922
 2 D            0.3100134
Endogenous Variables (Jointly Dependent)
 3 P           -0.2295381

Residual Variance for Structural Disturbances   3.926009688207962
Ratio of Norm Residual to Norm LHS              1.809322459330604E-02

Test Case from Kmenta (1971) Pages 565 - 582
Limited Information - Maximum Likelihood Solution   2   Supply Equation
Rank and Condition Number of Exogenous Columns                         3   8.2098363
Rank and Condition Number of Endogenous Variables orthogonal to X(K)   1   1.0000000
Rank and Condition Number of Endogenous Variables orthogonal to X      2   1.0000000
Value of LIML Parameter is   1.000000000000000
Condition Number of Matrix is greater than   8.209836250820180
Relative Numerical Error in the Solution     4.943047984855735E-12
LHS Endogenous Variable No. 2   Q
Standard Deviation Equals 2SLSQ Standard Deviation.

Exogenous Variables (Predetermined)
 1 CONSTANT    49.53244
 2 F            0.2556057
 3 A            0.2529242
Endogenous Variables (Jointly Dependent)
 4 P            0.2400758

Residual Variance for Structural Disturbances   6.039577731391617
Ratio of Norm Residual to Norm LHS              2.177103664979223E-02

Test Case from Kmenta (1971) Pages 565 - 582
Contemporaneous Covariance of Residuals (Structural Disturbances)
For LIMLE Solution.
Condition Number of residual columns,   2.811594
               Demand E     Supply E
                  1            2
Demand E  1    3.337
Supply E  2    3.629        4.832

Correlation Matrix of Residuals
               Demand E     Supply E
                  1            2
Demand E  1    1.000
Supply E  2    0.9038       1.000

Test Case from Kmenta (1971) Pages 565 - 582
Coefficients of the Reduced Form Equations.
LIMLE Solution.
Condition number of matrix used to find the reduced form coefficients is no smaller than   4.258817996669486
         CONSTANT     D          F          A
            1         2          3          4
P  1      93.88      0.6601    -0.5443    -0.5386
Q  2      72.07      0.1585     0.1249     0.1236

Mean sum of squares of residuals for the reduced form equations.
      1               2
      P               Q
  0.41286D+01     0.38401D+01

Test Case from Kmenta (1971) Pages 565 - 582
Two Stage Least Squares Solution for System Number 1   Demand Equation
Condition Number of Matrix is greater than   21.98482284147018
Relative Numerical Error in the Solution     1.411421448020441E-11
LHS Endogenous Variable No. 2   Q

Exogenous Variables (Predetermined)
                             Std. Error       t            Theil SE         Theil t
 1 CONSTANT    94.63330      7.920838         11.94738     7.302652         12.95876
 2 D            0.3139918    0.4694366E-01     6.688695    0.4327991E-01     7.254908
Endogenous Variables (Jointly Dependent)
 3 P           -0.2435565    0.9648429E-01    -2.524313    0.8895412E-01    -2.738002

Residual Variance for Structural Disturbances   3.866416929101937
Ratio of Norm Residual to Norm LHS              1.795538131264630E-02

Covariance Matrix of Estimated Parameters
               CONSTANT      D             P
                  1          2             3
CONSTANT  1    62.74
D         2    0.4930E-01    0.2204E-02
P         3   -0.6734       -0.2642E-02    0.9309E-02

Correlation Matrix of Estimated Parameters
               CONSTANT      D             P
                  1          2             3
CONSTANT  1    1.000
D         2    0.1326        1.000
P         3   -0.8812       -0.5833        1.000

Test Case from Kmenta (1971) Pages 565 - 582
Two Stage Least Squares Solution for System Number 2   Supply Equation
Condition Number of Matrix is greater than   18.21923089332271
Relative Numerical Error in the Solution     1.431397195953368E-11
LHS Endogenous Variable No. 2   Q

Exogenous Variables (Predetermined)
                             Std. Error       t            Theil SE         Theil t
 1 CONSTANT    49.53244      12.01053          4.124086    10.74254          4.610868
 2 F            0.2556057    0.4725007E-01     5.409637    0.4226175E-01     6.048158
 3 A            0.2529242    0.9965509E-01     2.537996    0.8913422E-01     2.837565
Endogenous Variables (Jointly Dependent)
 4 P            0.2400758    0.9993385E-01     2.402347    0.8938355E-01     2.685905

Residual Variance for Structural Disturbances   6.039577731391617
Ratio of Norm Residual to Norm LHS              2.177103664979223E-02

Covariance Matrix of Estimated Parameters
               CONSTANT      F             A             P
                  1          2             3             4
CONSTANT  1    144.3
F         2   -0.3238        0.2233E-02
A         3   -0.2952        0.1377E-02    0.9931E-02
P         4   -1.095         0.9362E-03    0.5791E-03    0.9987E-02

Correlation Matrix of Estimated Parameters
               CONSTANT      F             A             P
                  1          2             3             4
CONSTANT  1    1.000
F         2   -0.5706        1.000
A         3   -0.2467        0.2924        1.000
P         4   -0.9126        0.1983        0.5815E-01    1.000

Test Case from Kmenta (1971) Pages 565 - 582
Contemporaneous Covariance of Residuals (Structural Disturbances)
For Two Stage Least Squares Solution.
Condition Number of residual columns,   2.804709
               Demand E     Supply E
                  1            2
Demand E  1    3.286
Supply E  2    3.593        4.832

Correlation Matrix of Residuals
               Demand E     Supply E
                  1            2
Demand E  1    1.000
Supply E  2    0.9017       1.000

Test Case from Kmenta (1971) Pages 565 - 582
Coefficients of the Reduced Form Equations.
Two Stage Least Squares Solution
Condition number of matrix used to find the reduced form coefficients is no smaller than   4.135372945327849
         CONSTANT     D          F          A
            1         2          3          4
P  1      93.25      0.6492    -0.5285    -0.5230
Q  2      71.92      0.1559     0.1287     0.1274

Mean sum of squares of residuals for the reduced form equations.
      1               2
      P               Q
  0.39831D+01     0.38317D+01

Condition number of the large matrix in Three Stage Least Squares   60.70221

Test Case from Kmenta (1971) Pages 565 - 582
Three Stage Least Squares Solution for System Number 1   Demand Equation
LHS Endogenous Variable No. 2   Q

Exogenous Variables (Predetermined)
                             Std. Error       t            Theil SE         Theil t
 1 CONSTANT    94.63330      7.920838         11.94738     7.302652         12.95876
 2 D            0.3139918    0.4694366E-01     6.688695    0.4327991E-01     7.254908
Endogenous Variables (Jointly Dependent)
 3 P           -0.2435565    0.9648429E-01    -2.524313    0.8895412E-01    -2.738002

Residual Variance (For Structural Disturbances)   3.286454

Three Stage Least Squares Covariance for System   Demand Equation
               CONSTANT      D             P
                  1          2             3
CONSTANT  1    62.74
D         2    0.4930E-01    0.2204E-02
P         3   -0.6734       -0.2642E-02    0.9309E-02

Three Stage Least Squares Solution for System Number 2   Supply Equation
LHS Endogenous Variable No. 2   Q

Exogenous Variables (Predetermined)
                             Std. Error       t            Theil SE         Theil t
 1 CONSTANT    52.11764      11.89337          4.382074    10.63776          4.899308
 2 F            0.2289775    0.4399381E-01     5.204767    0.3934926E-01     5.819106
 3 A            0.3579074    0.7288940E-01     4.910281    0.6519426E-01     5.489861
Endogenous Variables (Jointly Dependent)
 4 P            0.2289322    0.9967317E-01     2.296828    0.8915039E-01     2.567932

Residual Variance (For Structural Disturbances)   5.360809

Three Stage Least Squares Covariance for System   Supply Equation
               CONSTANT      F             A             P
                  1          2             3             4
CONSTANT  1    141.5
F         2   -0.2950        0.1935E-02
A         3   -0.4090        0.2548E-02    0.5313E-02
P         4   -1.083         0.8119E-03    0.1069E-02    0.9935E-02

Test Case from Kmenta (1971) Pages 565 - 582
Contemporaneous Covariance of Residuals (Structural Disturbances)
For Three Stage Least Squares Solution.
Condition Number of residual columns,   6.321462
               Demand E     Supply E
                  1            2
Demand E  1    3.286
Supply E  2    4.111        5.361

Correlation Matrix of Residuals
               Demand E     Supply E
                  1            2
Demand E  1    1.000
Supply E  2    0.9794       1.000

Test Case from Kmenta (1971) Pages 565 - 582
Coefficients of the Reduced Form Equations.
Three Stage Least Squares Solution using Orthogonal Factorization.
Condition number of matrix used to find the reduced form coefficients is no smaller than   4.232905401139098
         CONSTANT     D          F          A
            1         2          3          4
P  1      89.98      0.6645    -0.4846    -0.7575
Q  2      72.72      0.1521     0.1180     0.1845

Mean sum of squares of residuals for the reduced form equations.
      1               2
      P               Q
  0.19065D+01     0.42494D+01

Iterated Three Stage Least Squares Results are given next.

Iteration begins for Iterated 3SLSQ.
Condition number of the large matrix in Three Stage Least Squares   147.2220

Test Case from Kmenta (1971) Pages 565 - 582
Iterated Three Stage Least Squares Solution for System No. 1   Demand Equation
LHS Endogenous Variable No. 2   Q

Exogenous Variables (Predetermined)
                             Std. Error       t            Theil SE         Theil t
 1 CONSTANT    94.63330      7.920838         11.94738     7.302652         12.95876
 2 D            0.3139918    0.4694366E-01     6.688695    0.4327991E-01     7.254908
Endogenous Variables (Jointly Dependent)
 3 P           -0.2435565    0.9648429E-01    -2.524313    0.8895412E-01    -2.738002

Residual Variance (For Structural Disturbances)   3.286454

Iterated Three Stage Least Squares Covariance for System   Demand Equation
               CONSTANT      D             P
                  1          2             3
CONSTANT  1    62.74
D         2    0.4930E-01    0.2204E-02
P         3   -0.6734       -0.2642E-02    0.9309E-02

Iterated Three Stage Least Squares Solution for System No. 2   Supply Equation
LHS Endogenous Variable No. 2   Q

Exogenous Variables (Predetermined)
                             Std. Error       t            Theil SE         Theil t
 1 CONSTANT    52.55269      12.74080          4.124755    11.39572          4.611616
 2 F            0.2244964    0.4653972E-01     4.823758    0.4162639E-01     5.393126
 3 A            0.3755747    0.7166061E-01     5.241020    0.6409520E-01     5.859638
Endogenous Variables (Jointly Dependent)
 4 P            0.2270569    0.1069194         2.123627    0.9563159E-01     2.374287

Residual Variance (For Structural Disturbances)   5.565111

Iterated Three Stage Least Squares Covariance for System   Supply Equation
               CONSTANT      F             A             P
                  1          2             3             4
CONSTANT  1    162.3
F         2   -0.3336        0.2166E-02
A         3   -0.4953        0.3185E-02    0.5135E-02
P         4   -1.245         0.9086E-03    0.1336E-02    0.1143E-01

Test Case from Kmenta (1971) Pages 565 - 582
Contemporaneous Covariance of Residuals (Structural Disturbances)
For Iterated Three Stage Least Squares Solution.
Condition Number of residual columns,   6.814796
               Demand E     Supply E
                  1            2
Demand E  1    3.286
Supply E  2    4.198        5.565

Correlation Matrix of Residuals
               Demand E     Supply E
                  1            2
Demand E  1    1.000
Supply E  2    0.9816       1.000

Test Case from Kmenta (1971) Pages 565 - 582
Coefficients of the Reduced Form Equations.
Iterated Three Stage Least Squares Solution.
Condition number of matrix used to find the reduced form coefficients is no smaller than   4.249772824974006
         CONSTANT     D          F          A
            1         2          3          4
P  1      89.42      0.6672    -0.4770    -0.7981
Q  2      72.86      0.1515     0.1162     0.1944

Mean sum of squares of residuals for the reduced form equations.
      1               2
      P               Q
  0.20576D+01     0.43519D+01

Kmenta (1971, 582), reproduced in Table 4.4, reports the following 3SLS and iterative 3SLS coefficients for the supply equation (standard errors in parentheses):

3SLS    52.1972 (11.8934), .2286 (.0997)
I3SLS   55.5527 (12.7408), .2271 (.1069), .2245 (.0465) and .3756 (.0717)

B34S gets

I3SLS   52.55269 (12.7408), .2270569 (.1069194), .2244964 (.04653972) and .3755747 (.07166061)

The I3SLS constant 55.5527 reported by Kmenta and underlined in Table 4.4 appears to be in error. It is most likely a misprint for 52.5527, which is the B34S value 52.55269 rounded to four decimals. In Table 4.5, Kmenta (1986, 712) changes the estimated I3SLS coefficients. The new numbers are 52.6618 (12.8051), .2266 (.1075), .2234 (.0468) and .3800 (.0720). These numbers are quite different from the earlier ones and warrant some investigation.

In the Kmenta test problem, one equation (demand) is overidentified and one equation (supply) is exactly identified. As was mentioned earlier, the 2SLS and 3SLS results for the overidentified equation are the same because the other equation is exactly identified. However, the 3SLS results for the exactly identified equation (supply) differ from the 2SLS results because the other equation (demand) is overidentified. Close inspection of the 3SLS results for the demand equation shows that they are the same as those of Kmenta (1971, 582) and Kmenta (1986, 712).
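The identification status quoted above follows from the order condition. An equation is exactly identified when the number of system exogenous variables it excludes equals the number of right-hand-side endogenous variables it includes, and overidentified when it excludes more. A minimal sketch of that count for the Kmenta model follows. The function names are hypothetical, and the order condition is only necessary, so in general the rank condition must still be checked.

```python
def overidentification_degree(system_exogenous, included_exogenous, rhs_endogenous):
    """Order-condition count: excluded exogenous minus included RHS endogenous."""
    excluded = [x for x in system_exogenous if x not in included_exogenous]
    return len(excluded) - len(rhs_endogenous)


def order_status(degree):
    """Classify an equation from its order-condition count."""
    if degree < 0:
        return "underidentified"
    return "exactly identified" if degree == 0 else "overidentified"


exogenous = ["constant", "d", "f", "a"]  # from the Exogenous card of the simeq setup
demand = overidentification_degree(exogenous, ["constant", "d"], ["p"])
supply = overidentification_degree(exogenous, ["constant", "f", "a"], ["p"])

if __name__ == "__main__":
    for name, deg in (("Demand", demand), ("Supply", supply)):
        print(f"{name}: degree {deg} -> {order_status(deg)}")
```

The degree of overidentification is 1 for demand and 0 for supply. These are also the degrees of freedom of the J-Specification(1) statistic that RATS prints for the demand equation and of the RATS LIML specification tests, Chi-Squared(1) for demand and Chi-Squared(0) for supply.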
As noted, the I3SLS supply-equation results are the same as those of Kmenta (1971), apart from the constant discussed above. They differ slightly from those of Kmenta (1986), which appear to be in error.¹⁰ To facilitate testing, SAS, RATS and Stata setups are shown in Table 4.3, and their output is discussed in some detail below.

10 The file example.mac contains an extension of the above test case that calls RATS, SAS and a B34S matrix implementation. For the supply equation, SAS 3SLS gets the Kmenta (1986) results, which are 52.1972 (11.8934), .2286 (.0997), .2282 (.0440) and .3611. What RATS calls 3SLS produces what B34S calls I3SLS. Readers are encouraged to use the code in Table 4.3 to investigate this issue further. A major difficulty for the researcher is being able to tell exactly what a software system is estimating. For this reason, estimating the model on multiple software systems is strongly advised.

Table 4.4 Kmenta (1971, 582) OLS, 2SLS, LIML, 3SLS, I3SLS, FIML Test Problem Answers

Table 4.5 Kmenta (1986, 712) OLS, 2SLS, LIML, 3SLS, I3SLS, FIML Test Problem Answers

As noted earlier, the 2SLS and 3SLS results for the overidentified equation (demand) are the same. However, the printout shows that the residual variance for the 2SLS result is 3.8664, while the residual variance for the 3SLS result is 3.2865. The reason for this apparent discrepancy is that the 2SLS residual variance equals the sum of squared residuals divided by T-K, while the 3SLS calculation divides by T. Hence, with T = 20 and K = 3, 3.8664 = 3.2865*(20/17).

To investigate the differences in the supply equation between Kmenta (1971) and Kmenta (1986), edited and annotated SAS, RATS and Stata output is shown next. The SAS 3SLS and I3SLS output agrees with Kmenta (1986) for both the demand and supply equations. Note that these numbers do not agree with Kmenta (1971)!
The SYSLIN Procedure
Three-Stage Least Squares Estimation

Parameter Estimates
                 Parameter    Standard
Variable    DF    Estimate       Error    t Value    Pr > |t|
Intercept    1    94.63330    7.920838      11.95      <.0001
P            1    -0.24356    0.096484      -2.52      0.0218
D            1    0.313992    0.046944       6.69      <.0001

Model                 SUPPLY
Dependent Variable    Q

Parameter Estimates
                 Parameter    Standard
Variable    DF    Estimate       Error    t Value    Pr > |t|
Intercept    1    52.19720    11.89337       4.39      0.0005
P            1    0.228589    0.099673       2.29      0.0357
F            1    0.228158    0.043994       5.19      <.0001
A            1    0.361138    0.072889       4.95      0.0001

Endogenous Variables
               P           Q
DEMAND     0.243557        1
SUPPLY     -0.22859        1

Exogenous Variables
            Intercept        D           F           A
DEMAND      94.6333      0.313992        0           0
SUPPLY      52.1972          0       0.228158    0.361138

Inverse Endogenous Variables
            DEMAND       SUPPLY
P           2.11799     -2.11799
Q           0.48415      0.51585

The SYSLIN Procedure
Three-Stage Least Squares Estimation

Reduced Form
            Intercept        D           F           A
P           89.87924     0.665032    -0.48324    -0.76489
Q           72.74263     0.152019    0.117695    0.186293

The SYSLIN Procedure
Iterative Three-Stage Least Squares Estimation

Parameter Estimates
                 Parameter    Standard
Variable    DF    Estimate       Error    t Value    Pr > |t|
Intercept    1    94.63330    7.920838      11.95      <.0001
P            1    -0.24356    0.096484      -2.52      0.0218
D            1    0.313992    0.046944       6.69      <.0001

Model                 SUPPLY
Dependent Variable    Q

Parameter Estimates
                 Parameter    Standard
Variable    DF    Estimate       Error    t Value    Pr > |t|
Intercept    1    52.66182    12.80511       4.11      0.0008
P            1    0.226586    0.107459       2.11      0.0511
F            1    0.223372    0.046774       4.78      0.0002
A            1    0.380006    0.072010       5.28      <.0001

Endogenous Variables
               P           Q
DEMAND     0.243557        1
SUPPLY     -0.22659        1

Exogenous Variables
            Intercept        D           F           A
DEMAND      94.6333      0.313992        0           0
SUPPLY      52.66182         0       0.223372    0.380006

Inverse Endogenous Variables
            DEMAND       SUPPLY
P          2.127012     -2.12701
Q          0.481952     0.518048

The SYSLIN Procedure
Iterative Three-Stage Least Squares Estimation

Reduced Form
            Intercept        D           F           A
P           89.27387     0.667864    -0.47512    -0.80828
Q           72.89007     0.151329    0.115718    0.196861

RATS output is shown next for OLS, 2SLS and LIML, and for 3SLS computed two ways.
Note that the RATS 3SLS results agree exactly with what b34s and Kmenta get for I3SLS, not 3SLS. RATS reports the large-sample SE. RATS output for the same problem using the nonlin and nlsystem commands gives the same answers. RATS Pro version 8.1 was used to make the calculations.

Output from RATS
*
* Data passed from B34S(r) system to RATS
*
display @1 %dateandtime() @33 ' Rats Version ' %ratsversion()
03/10/2012 15:05                Rats Version 8.10000
*
CALENDAR(IRREGULAR)
ALLOCATE 20
OPEN DATA rats.dat
DATA(FORMAT=FREE,ORG=OBS, $
 MISSING= 0.1000000000000000E+32 ) / $
 Q $
 P $
 D $
 F $
 A $
 CONSTANT
SET TREND = T
TABLE
Series   Obs   Mean             Std Error       Minimum          Maximum
Q         20   100.898200000     3.756498224    92.424000000     106.232000000
P         20   100.019050000     5.926086394    86.498000000     113.490000000
D         20    97.535000000    11.830481371    75.100000000     127.100000000
F         20    96.625000000    12.708798237    68.600000000     110.800000000
A         20    10.500000000     5.916079783     1.000000000      20.000000000
TREND     20    10.500000000     5.916079783     1.000000000      20.000000000

*
* heading=('test case from kmenta 1971 page 565 - 582 ' ) $
* exogenous constant d f a $
* endogenous p q $
* model lvar=q rvar=(constant p d) name=('demand eq.') $
* model lvar=q rvar=(constant p f a) name=('supply eq.') $
linreg q
# constant p d

Linear Regression - Estimation by Least Squares
Dependent Variable Q
Usable Observations                     20
Degrees of Freedom                      17
Centered R^2                     0.7637886
R-Bar^2                          0.7359990
Uncentered R^2                   0.9996894
Mean of Dependent Variable       100.89820000
Std Error of Dependent Variable  3.75649822
Standard Error of Estimate       1.93012724
Sum of Squared Residuals         63.331649953
Regression F(2,17)               27.4847
Significance Level of F          0.0000047
Log Likelihood                   -39.9053
Durbin-Watson Statistic          1.7442

    Variable        Coeff            Std Error        T-Stat      Signif
**************************************************************************
1.  Constant      99.89542291       7.51936214     13.28509   0.00000000
2.  P             -0.31629880       0.09067741     -3.48818   0.00281529
3.  D              0.33463560       0.04542183      7.36729   0.00000110

linreg q
# constant p f a

Linear Regression - Estimation by Least Squares
Dependent Variable Q
Usable Observations                     20
Degrees of Freedom                      16
Centered R^2                     0.6548075
R-Bar^2                          0.5900838
Uncentered R^2                   0.9995460
Mean of Dependent Variable       100.89820000
Std Error of Dependent Variable  3.75649822
Standard Error of Estimate       2.40508651
Sum of Squared Residuals         92.551058175
Regression F(3,16)               10.1170
Significance Level of F          0.0005602
Log Likelihood                   -43.6991
Durbin-Watson Statistic          2.1097

    Variable        Coeff            Std Error        T-Stat      Signif
**************************************************************************
1.  Constant      58.275431202     11.462909888     5.08383   0.00011056
2.  P              0.160366596      0.094883937     1.69013   0.11038810
3.  F              0.248133295      0.046187854     5.37226   0.00006227
4.  A              0.248302347      0.097517767     2.54623   0.02156713

instruments constant d f a
linreg(inst) q
# constant p d

Linear Regression - Estimation by Instrumental Variables
Dependent Variable Q
Usable Observations                     20
Degrees of Freedom                      17
Mean of Dependent Variable       100.89820000
Std Error of Dependent Variable  3.75649822
Standard Error of Estimate       1.96632066
Sum of Squared Residuals         65.729087795
J-Specification(1)               2.5357
Significance Level of J          0.1113010
Durbin-Watson Statistic          2.0092

    Variable        Coeff            Std Error        T-Stat      Signif
**************************************************************************
1.  Constant      94.63330387       7.92083831     11.94738   0.00000000
2.  P             -0.24355654       0.09648429     -2.52431   0.02183240
3.  D              0.31399179       0.04694366      6.68869   0.00000381

linreg(inst) q
# constant p f a

Linear Regression - Estimation by Instrumental Variables
Dependent Variable Q
Usable Observations                     20
Degrees of Freedom                      16
Mean of Dependent Variable       100.89820000
Std Error of Dependent Variable  3.75649822
Standard Error of Estimate       2.45755523
Sum of Squared Residuals         96.633243702
Durbin-Watson Statistic          2.3846

    Variable        Coeff            Std Error        T-Stat      Signif
**************************************************************************
1.  Constant      49.532441699     12.010526407     4.12409   0.00079536
2.  P              0.240075779      0.099933852     2.40235   0.02878451
3.  F              0.255605724      0.047250071     5.40964   0.00005785
4.  A              0.252924175      0.099655087     2.53800   0.02192877

source d:\r\liml.src
@liml q
# constant p d

Linear Regression - Estimation by LIML
Dependent Variable Q
Usable Observations                     20
Degrees of Freedom                      17
Centered R^2                     0.7510682
R-Bar^2                          0.7217821
Uncentered R^2                   0.9996726
Mean of Dependent Variable       100.89820000
Std Error of Dependent Variable  3.75649822
Standard Error of Estimate       1.98141608
Sum of Squared Residuals         66.742164700
Regression F(2,17)               25.6459
Significance Level of F          0.0000074
Log Likelihood                   -40.4298
Durbin-Watson Statistic          2.0517

    Variable        Coeff            Std Error        T-Stat      Signif
**************************************************************************
1.  Constant      93.61922028       8.03124312     11.65688   0.00000000
2.  P             -0.22953809       0.09800238     -2.34217   0.03160318
3.  D              0.31001345       0.04743306      6.53581   0.00000509

LIML Specification Test
Chi-Squared(1)= 3.477343 with Significance Level 0.06221456

@liml q
# constant p f a

Linear Regression - Estimation by LIML
Dependent Variable Q
Usable Observations                     20
Degrees of Freedom                      16
Centered R^2                     0.6395819
R-Bar^2                          0.5720035
Uncentered R^2                   0.9995260
Mean of Dependent Variable       100.89820000
Std Error of Dependent Variable  3.75649822
Standard Error of Estimate       2.45755523
Sum of Squared Residuals         96.633243702
Regression F(3,16)               9.4643
Significance Level of F          0.0007834
Log Likelihood                   -44.1307
Durbin-Watson Statistic          2.3846

    Variable        Coeff            Std Error        T-Stat      Signif
**************************************************************************
1.  Constant      49.532441699     12.010526407     4.12409   0.00079536
2.  P              0.240075779      0.099933852     2.40235   0.02878451
3.  F              0.255605724      0.047250071     5.40964   0.00005785
4.  A              0.252924175      0.099655087     2.53800   0.02192877

LIML Specification Test
Chi-Squared(0)=-4.440892e-015 with Significance Level NA

equation demand q
# constant p d
equation supply q
# constant p f a
* Supply does not match known answers!!
sur(inst,iterations=200) 2
# demand resid1
# supply resid2

Linear Systems - Estimation by System Instrumental Variables
Iterations Taken                 6
Usable Observations              20
J-Specification(1)               2.9831
Significance Level of J          0.0841370

Dependent Variable Q
Mean of Dependent Variable       100.89820000
Std Error of Dependent Variable  3.75649822
Standard Error of Estimate       1.81285807
Sum of Squared Residuals         65.729087794
Durbin-Watson Statistic          2.0092

    Variable        Coeff            Std Error        T-Stat      Signif
**************************************************************************
1.  Constant      94.63330387       7.30265210     12.95876   0.00000000
2.  P             -0.24355654       0.08895412     -2.73800   0.00618138
3.  D              0.31399179       0.04327991      7.25491   0.00000000

Dependent Variable Q
Mean of Dependent Variable       100.89820000
Std Error of Dependent Variable  3.75649822
Standard Error of Estimate       2.35904587
Sum of Squared Residuals         111.30194805
Durbin-Watson Statistic          2.0945

    Variable        Coeff            Std Error        T-Stat      Signif
**************************************************************************
4.  Constant      52.552667564     11.395623960     4.61165   0.00000399
5.  P              0.227056969      0.095630772     2.37431   0.01758185
6.  F              0.224496638      0.041626039     5.39318   0.00000007
7.  A              0.375573566      0.064094682     5.85967   0.00000000

Covariance\Correlation Matrix of Coefficients
               Q               Q
Q        3.2864543897     0.98159966
Q        4.1979241683     5.5650974026

nonlin(parmset=structural) c0 c1 c2 d0 d1 d2 d3
compute c0 = .1
compute c1 = .1
compute c2 = .1
compute d0 = .1
compute d1 = .1
compute d2 = .1
compute d3 = .1
frml d_eq q = c0 + c1*p + c2*d
frml s_eq q = d0 + d1*p + d2*f + d3*a
nlsystem(inst,parmset=structural,outsigma=v) * * d_eq s_eq

GMM-Factored Weight Matrix
Convergence in 6 Iterations. Final criterion was 0.0000065 <= 0.0000100
Usable Observations              20
Function Value                   2.98311941
J-Specification(1)               2.9831
Significance Level of J          0.0841370

Dependent Variable Q
Mean of Dependent Variable       100.89820000
Std Error of Dependent Variable  3.75649822
Standard Error of Estimate       1.81285807
Sum of Squared Residuals         65.729087792
Durbin-Watson Statistic          2.0092

Dependent Variable Q
Mean of Dependent Variable       100.89820000
Std Error of Dependent Variable  3.75649822
Standard Error of Estimate       2.35904587
Sum of Squared Residuals         111.30194805
Durbin-Watson Statistic          2.0945

    Variable        Coeff            Std Error        T-Stat      Signif
**************************************************************************
1.  C0            94.63330387       7.30265212     12.95876   0.00000000
2.  C1            -0.24355654       0.08895412     -2.73800   0.00618138
3.  C2             0.31399179       0.04327991      7.25491   0.00000000
4.  D0            52.55266756      11.39562399      4.61165   0.00000399
5.  D1             0.22705697       0.09563077      2.37431   0.01758185
6.  D2             0.22449664       0.04162604      5.39318   0.00000007
7.  D3             0.37557357       0.06409468      5.85967   0.00000000

The Stata results are shown next. Note that the 3SLS and I3SLS results agree exactly with the B34S simeq answers and with what was reported in Kmenta (1971).

output from stata

Stata 12.1   Statistics/Data Analysis
Copyright 1985-2011 StataCorp LP
StataCorp
4905 Lakeway Drive
College Station, Texas 77845 USA
800-STATA-PC          http://www.stata.com
979-696-4600          stata@stata.com
979-696-4601 (fax)

Single-user Stata perpetual license:
      Serial number:  3012042652
        Licensed to:  Houston H. Stokes
                      U of Illinois

Notes:
      1.  Stata running in batch mode

. do stata.do

. * File built by B34S on 7/ 3/12 at 10:32:34

. run statdata.do

. about

Stata/IC 12.1 for Windows (32-bit)
Revision 06 Feb 2012
Copyright 1985-2011 StataCorp LP

Total physical memory:     2097151 KB
Available physical memory: 2097151 KB

Single-user Stata perpetual license:
      Serial number:  3012042652
        Licensed to:  Houston H. Stokes
                      U of Illinois

. describe

Contains data
  obs:            20
 vars:             6
 size:           960
------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
q               double %10.0g                 Food consumption per head
p               double %10.0g                 Ratio of food prices to consumer prices
d               double %10.0g                 Disposable income in constant prices
f               double %10.0g                 Ratio of t 1 years price to general p
a               double %10.0g                 Time
constant        double %10.0g
------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

. summarize

    Variable |       Obs        Mean    Std. Dev.
Min Max -------------+-------------------------------------------------------q | 20 100.8982 3.756498 92.424 106.232 p | 20 100.0191 5.926086 86.498 113.49 d | 20 97.535 11.83048 75.1 127.1 f | 20 96.625 12.7088 68.6 110.8 a | 20 10.5 5.91608 1 20 -------------+-------------------------------------------------------constant | 20 1 0 1 1 . reg3 (q p d) (q p f a), 2sls endog(p) Two-stage least-squares regression ---------------------------------------------------------------------Equation Obs Parms RMSE "R-sq" F-Stat P ---------------------------------------------------------------------q 20 2 1.966321 0.7548 23.81 0.0000 2q 20 3 2.457555 0.6396 10.70 0.0000 ---------------------------------------------------------------------- Simultaneous Equations Systems -----------------------------------------------------------------------------| Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------q | p | -.2435565 .0964843 -2.52 0.017 -.4398553 -.0472578 d | .3139918 .0469437 6.69 0.000 .2184842 .4094994 _cons | 94.6333 7.920838 11.95 0.000 78.51824 110.7484 -------------+---------------------------------------------------------------2q | p | .2400758 .0999339 2.40 0.022 .0367588 .4433927 f | .2556057 .0472501 5.41 0.000 .1594747 .3517367 a | .2529242 .0996551 2.54 0.016 .0501744 .455674 _cons | 49.53244 12.01053 4.12 0.000 25.09684 73.96804 -----------------------------------------------------------------------------Endogenous variables: q p Exogenous variables: d f a -----------------------------------------------------------------------------. 
reg3 (q p d) (q p f a), 3sls endog(p) Three-stage least-squares regression ---------------------------------------------------------------------Equation Obs Parms RMSE "R-sq" chi2 P ---------------------------------------------------------------------q 20 2 1.812858 0.7548 56.02 0.0000 2q 20 3 2.315342 0.6001 38.20 0.0000 ---------------------------------------------------------------------- 39 40 Chapter 4 B34S 8.11F 31 (D:M:Y) 7/ 3/12 (H:M:S) 10:32:35 PGMCALL STEP PAGE -----------------------------------------------------------------------------| Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------q | p | -.2435565 .0889541 -2.74 0.006 -.4179034 -.0692097 d | .3139918 .0432799 7.25 0.000 .2291647 .3988189 _cons | 94.6333 7.302652 12.96 0.000 80.32037 108.9462 -------------+---------------------------------------------------------------2q | p | .2289322 .0891504 2.57 0.010 .0542006 .4036637 f | .2289775 .0393493 5.82 0.000 .1518544 .3061006 a | .3579074 .0651943 5.49 0.000 .230129 .4856858 _cons | 52.11764 10.63776 4.90 0.000 31.26802 72.96726 -----------------------------------------------------------------------------Endogenous variables: q p Exogenous variables: d f a -----------------------------------------------------------------------------. 
reg3 (q p d) (q p f a), ireg3 endog(p) Iteration Iteration Iteration Iteration Iteration Iteration Iteration 1: 2: 3: 4: 5: 6: 7: tolerance tolerance tolerance tolerance tolerance tolerance tolerance = = = = = = = .08379059 .01113651 .00158649 .00022817 .00003286 4.733e-06 6.818e-07 Three-stage least-squares regression, iterated ---------------------------------------------------------------------Equation Obs Parms RMSE "R-sq" chi2 P ---------------------------------------------------------------------q 20 2 1.812858 0.7548 56.02 0.0000 2q 20 3 2.359048 0.5849 36.80 0.0000 --------------------------------------------------------------------------------------------------------------------------------------------------| Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------q | p | -.2435565 .0889541 -2.74 0.006 -.4179034 -.0692097 d | .3139918 .0432799 7.25 0.000 .2291647 .3988189 _cons | 94.6333 7.302652 12.96 0.000 80.32037 108.9462 -------------+---------------------------------------------------------------2q | p | .2270569 .0956315 2.37 0.018 .0396226 .4144911 f | .2244964 .0416263 5.39 0.000 .1429103 .3060825 a | .3755745 .0640951 5.86 0.000 .2499504 .5011986 _cons | 52.55269 11.39571 4.61 0.000 30.21751 74.88787 -----------------------------------------------------------------------------Endogenous variables: q p Exogenous variables: d f a -----------------------------------------------------------------------------. end of do-file What is to be made of this mystery? It is strange that b34s Kmenta (1971) and Stata agree 100% fopr 3SLS and I3SLS but that SAS version 9.2 on the same problem supports Kmenta (1986). Rats output suggests it is doing 3SLS when in fact what is going on is that it is calculating I3SLS with the NONLIN command. Close inspection of the output shows that these numbers support Kmenta (1971), Stata and b34s and not SAS or Kmenta (1986). 
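The distinction at the heart of this mystery is between a single 3SLS step and iterating the 3SLS weighting until it converges. That distinction can be reproduced outside any of the packages. The following is a minimal NumPy sketch written for this discussion and not taken from simeq, Stata or RATS. It makes these assumptions:

* The data are the Kmenta (1971) observations as transcribed from the DATACARDS in Table 4.7.
* The residual covariance matrix uses divisor T = 20.
* One-step 3SLS weights with the covariance of the 2SLS residuals.
* The iterated version recomputes that covariance from the latest 3SLS residuals until the coefficients stop changing.

The helpers `three_sls` and `sigma_from` are hypothetical names.

```python
import numpy as np

# Kmenta (1971) data transcribed from Table 4.7: columns Q, P, D, F, A
KMENTA = np.array([
    [98.485, 100.323, 87.4, 98.0, 1], [99.187, 104.264, 97.6, 99.1, 2],
    [102.163, 103.435, 96.7, 99.1, 3], [101.504, 104.506, 98.2, 98.1, 4],
    [104.240, 98.001, 99.8, 110.8, 5], [103.243, 99.456, 100.5, 108.2, 6],
    [103.993, 101.066, 103.2, 105.6, 7], [99.900, 104.763, 107.8, 109.8, 8],
    [100.350, 96.446, 96.6, 108.7, 9], [102.820, 91.228, 88.9, 100.6, 10],
    [95.435, 93.085, 75.1, 81.0, 11], [92.424, 98.801, 76.9, 68.6, 12],
    [94.535, 102.908, 84.6, 70.9, 13], [98.757, 98.756, 90.6, 81.4, 14],
    [105.797, 95.119, 103.1, 102.3, 15], [100.225, 98.451, 105.1, 105.0, 16],
    [103.522, 86.498, 96.4, 110.5, 17], [99.929, 104.016, 104.4, 92.5, 18],
    [105.223, 105.769, 110.7, 89.3, 19], [106.232, 113.490, 127.1, 93.0, 20],
])
q, p, d, f, a = KMENTA.T
one = np.ones_like(q)
T = len(q)

X = np.column_stack([one, d, f, a])            # all exogenous variables
Z = [np.column_stack([one, p, d]),             # demand: q = c0 + c1*p + c2*d
     np.column_stack([one, p, f, a])]          # supply: q = d0 + d1*p + d2*f + d3*a
y = [q, q]
W = X @ np.linalg.inv(X.T @ X) @ X.T           # projection X(X'X)^-1 X'


def three_sls(sigma):
    """One 3SLS step for a given 2x2 residual covariance matrix."""
    s_inv = np.linalg.inv(sigma)
    left = np.block([[s_inv[i, j] * (Z[i].T @ W @ Z[j]) for j in range(2)]
                     for i in range(2)])
    right = np.concatenate([sum(s_inv[i, j] * (Z[i].T @ W @ y[j]) for j in range(2))
                            for i in range(2)])
    return np.linalg.solve(left, right)


def sigma_from(delta):
    """Residual covariance (divisor T) implied by stacked coefficients."""
    parts = np.split(delta, [Z[0].shape[1]])
    E = np.column_stack([y[i] - Z[i] @ parts[i] for i in range(2)])
    return E.T @ E / T


# With an identity sigma the system decouples into equation-by-equation 2SLS
delta_2sls = three_sls(np.eye(2))
delta_3sls = three_sls(sigma_from(delta_2sls))  # one step: "3SLS"

# Iterate: re-estimate sigma from the latest 3SLS residuals until convergence
current = delta_3sls
for _ in range(500):
    updated = three_sls(sigma_from(current))
    change = np.max(np.abs(updated - current))
    current = updated
    if change < 1e-9:
        break
delta_i3sls = current                           # iterated: "I3SLS"

print("3SLS  supply:", np.round(delta_3sls[3:], 6))
print("I3SLS supply:", np.round(delta_i3sls[3:], 6))
```

Under these assumptions, the one-step supply coefficients should come out close to the Stata `3sls` values. The iterated ones should approach the Stata `ireg3` and RATS values. The demand coefficients stay at their 2SLS values in both cases, which is consistent with the supply equation being exactly identified.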
Note that Jennings (1980), who developed the B34S simeq Fortran code, used the Kmenta (1971) problem as a test case but did not report numbers. Section 4.5 below attempts to solve this mystery by using “textbook” formulas to obtain the 2SLS, 3SLS and FIML answers. Since the exact B34S MATRIX commands are given, the calculation is fully documented, provided that the MATRIX command is working properly. All coefficients agree exactly with Kmenta (1971)! For FIML the results are calculated using the CMAXF2 command, which uses a zero finder routine from IMSL. The SEs of the coefficients were calculated as $\sqrt{|\mathrm{diag}(H^{-1})|}$, where $H$ is the Hessian.

4.4 Exactly identified systems

Table 4.6 shows the Kmenta supply and demand model modified to be exactly identified. In this form of the model the exogenous variable a has been removed from the supply equation and from the list of exogenous variables. In this case $\Pi$ can be estimated directly with OLS and does not have to be calculated as $-B^{-1}\Gamma$ using (4.1-4). It will be shown below that the LIML, 2SLS and 3SLS results are all the same. If $\Pi$ is instead calculated from the biased OLS estimates of the structural equations, or from an over-identified system, it will not be the same.
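Because every equation of the modified model is exactly identified, the structural coefficients can be recovered from the OLS reduced form by indirect least squares. To do this, substitute the reduced form into each structural equation and match the coefficients of the exogenous variable that the equation excludes. The following is a minimal NumPy sketch of that calculation, not the simeq code. The data are the Kmenta (1971) observations transcribed from Table 4.7, with the time trend a dropped as in Table 4.6. All variable names are illustrative.

```python
import numpy as np

# Kmenta (1971) data transcribed from Table 4.7: columns Q, P, D, F, A
KMENTA = np.array([
    [98.485, 100.323, 87.4, 98.0, 1], [99.187, 104.264, 97.6, 99.1, 2],
    [102.163, 103.435, 96.7, 99.1, 3], [101.504, 104.506, 98.2, 98.1, 4],
    [104.240, 98.001, 99.8, 110.8, 5], [103.243, 99.456, 100.5, 108.2, 6],
    [103.993, 101.066, 103.2, 105.6, 7], [99.900, 104.763, 107.8, 109.8, 8],
    [100.350, 96.446, 96.6, 108.7, 9], [102.820, 91.228, 88.9, 100.6, 10],
    [95.435, 93.085, 75.1, 81.0, 11], [92.424, 98.801, 76.9, 68.6, 12],
    [94.535, 102.908, 84.6, 70.9, 13], [98.757, 98.756, 90.6, 81.4, 14],
    [105.797, 95.119, 103.1, 102.3, 15], [100.225, 98.451, 105.1, 105.0, 16],
    [103.522, 86.498, 96.4, 110.5, 17], [99.929, 104.016, 104.4, 92.5, 18],
    [105.223, 105.769, 110.7, 89.3, 19], [106.232, 113.490, 127.1, 93.0, 20],
])
q, p, d, f, _ = KMENTA.T                       # the time trend a is not in this model
X = np.column_stack([np.ones_like(q), d, f])   # constant, d, f

# Step 1: reduced form by OLS, as in (4.4-1) and (4.4-2)
pi_q = np.linalg.lstsq(X, q, rcond=None)[0]    # q = pi_q[0] + pi_q[1]*d + pi_q[2]*f
pi_p = np.linalg.lstsq(X, p, rcond=None)[0]    # p = pi_p[0] + pi_p[1]*d + pi_p[2]*f

# Step 2: demand q = a1 + a2*p + a3*d excludes f. Substituting the reduced
# form and matching coefficients gives pi_q[2] = a2*pi_p[2], etc.
a2 = pi_q[2] / pi_p[2]
a1 = pi_q[0] - a2 * pi_p[0]
a3 = pi_q[1] - a2 * pi_p[1]

# Step 3: supply q = b1 + b2*p + b3*f excludes d, so pi_q[1] = b2*pi_p[1]
b2 = pi_q[1] / pi_p[1]
b1 = pi_q[0] - b2 * pi_p[0]
b3 = pi_q[2] - b2 * pi_p[2]

demand = np.array([a1, a2, a3])                # constant, p, d
supply = np.array([b1, b2, b3])                # constant, p, f
print("demand:", np.round(demand, 6))
print("supply:", np.round(supply, 6))
```

Under these assumptions, the recovered coefficients should agree with the LIML, 2SLS and 3SLS estimates in the simeq output for Table 4.6. This is the sense in which all three estimators coincide when every equation is exactly identified.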
Table 4.6 Exactly Identified Kmenta Problem

/; Modified PROBLEM FROM KMENTA (1971) PAGE 565 - 582
b34sexec options ginclude('b34sdata.mac') member(kmenta);
b34srun;
b34sexec simeq printsys reduced ols liml ls2 ls3 ils3 icov
               ipr=6 itmax=2000 kcov=diag ;
heading=('Modified test case from kmenta 1971 pp 565-582' ) ;
* the variable a has been removed from the supply equation ;
exogenous constant d f ;
endogenous p q ;
model lvar=q rvar=(constant p d) name=('demand eq.') ;
model lvar=q rvar=(constant p f) name=('supply eq.') ;
b34seend ;
b34sexec matrix;
call loaddata;
call olsq(q d f :print);
call olsq(p d f :print);
b34srun;

Edited output from running the code in Table 4.6 is shown below. It shows alternative ways to calculate the constrained reduced form. In this exactly identified case the constrained reduced form can also be estimated directly by OLS, giving (4.4-1) and (4.4-2) (t-statistics in parentheses):

$$Q = \underset{(15.93)}{71.7276} + \underset{(3.86)}{.18278}\,D + \underset{(2.67)}{.11739}\,F \qquad \text{(4.4-1)}$$

$$P = \underset{(10.19)}{85.1843} + \underset{(4.95)}{.4346}\,D - \underset{(-3.49)}{.28520}\,F \qquad \text{(4.4-2)}$$

Modified test case from kmenta 1971 pp 565-582

Least Squares Solution for System Number      1   demand eq.
Condition Number of Matrix is greater than    21.04911571706159
Relative Numerical Error in the Solution      1.301987681166638E-11
LHS Endogenous Variable No.   2   Q

                                    Std. Error        t
Exogenous Variables (Predetermined)
  1  CONSTANT      99.89542        7.519362        13.28509
  2  D             0.3346356       0.4542183E-01    7.367285
Endogenous Variables (Jointly Dependent)
  3  P            -0.3162988       0.9067741E-01   -3.488177

Residual Variance for Structural Disturbances   3.725391173733892
Ratio of Norm Residual to Norm LHS              1.762488253954560E-02

Modified test case from kmenta 1971 pp 565-582

Least Squares Solution for System Number      2   supply eq.
Condition Number of Matrix is greater than    17.64779394899586
Relative Numerical Error in the Solution      1.349763156429639E-11
LHS Endogenous Variable No.   2   Q

                                    Std. Error        t
Exogenous Variables (Predetermined)
  1  CONSTANT      65.56501        12.76481         5.136387
  2  F             0.2137827       0.5080064E-01    4.208269
Endogenous Variables (Jointly Dependent)
  3  P             0.1467363       0.1089446        1.346889

Residual Variance for Structural Disturbances   7.650185613573186
Ratio of Norm Residual to Norm LHS              2.525668087747731E-02

Modified test case from kmenta 1971 pp 565-582

Coefficients of the Reduced Form Equations. Least Squares Solution.
Condition number of matrix used to find the reduced form
coefficients is no smaller than   4.319326581036200

                 CONSTANT          D           F
                    1              2           3
   1  P           74.14         0.7227     -0.4617
   2  Q           76.44         0.1060      0.1460

Mean sum of squares of residuals for the reduced form equations.
   1  P     0.21861D+02
   2  Q     0.44308D+01

Condition Number of columns of exogenous variables,   9.7857

Modified test case from kmenta 1971 pp 565-582

Limited Information - Maximum Likelihood Solution for System Number   1   demand eq.
Rank and Condition Number of Exogenous Columns                          2   8.5174634
Rank and Condition Number of Endogenous Variables orthogonal to X(K)    1   1.0000000
Rank and Condition Number of Endogenous Variables orthogonal to X       2   1.0000000
Value of LIML Parameter is                    1.000000000000000
Condition Number of Matrix is greater than    8.517463415017575
Relative Numerical Error in the Solution      4.390231825107355E-12
LHS Endogenous Variable No.   2   Q
Standard Deviation Equals 2SLSQ Standard Deviation.

Exogenous Variables (Predetermined)
  1  CONSTANT     106.7894
  2  D            0.3616812
Endogenous Variables (Jointly Dependent)
  3  P           -0.4115989

Residual Variance for Structural Disturbances   3.967444759365652
Ratio of Norm Residual to Norm LHS              1.818845186115603E-02

Modified test case from kmenta 1971 pp 565-582

Limited Information - Maximum Likelihood Solution for System Number   2   supply eq.
Rank and Condition Number of Exogenous Columns                          2   7.8643511
Rank and Condition Number of Endogenous Variables orthogonal to X(K)    1   1.0000000
Rank and Condition Number of Endogenous Variables orthogonal to X       2   1.0000000
Value of LIML Parameter is                    1.000000000000000
Condition Number of Matrix is greater than    7.864351104449048
Relative Numerical Error in the Solution      5.058888259015094E-12
LHS Endogenous Variable No.   2   Q
Standard Deviation Equals 2SLSQ Standard Deviation.

Exogenous Variables (Predetermined)
  1  CONSTANT     35.90387
  2  F            0.2373297
Endogenous Variables (Jointly Dependent)
  3  P            0.4205434

Residual Variance for Structural Disturbances   10.49268888645498
Ratio of Norm Residual to Norm LHS              2.957901371051407E-02

Modified test case from kmenta 1971 pp 565-582

Coefficients of the Reduced Form Equations. LIMLE Solution.
Condition number of matrix used to find the reduced form
coefficients is no smaller than   2.403435013906650

                 CONSTANT          D           F
                    1              2           3
   1  P           85.18         0.4346     -0.2852
   2  Q           71.73         0.1828      0.1174

Modified test case from kmenta 1971 pp 565-582

Two Stage Least Squares Solution for System Number   1   demand eq.
Condition Number of Matrix is greater than    32.58122209700925
Relative Numerical Error in the Solution      2.267663108215286E-11
LHS Endogenous Variable No.   2   Q

                             Std. Error       t          Theil SE        Theil t
Exogenous Variables (Predetermined)
  1  CONSTANT   106.7894     11.14355        9.583069   10.27384        10.39430
  2  D          0.3616812    0.5640608E-01   6.412096   0.5200383E-01    6.954895
Endogenous Variables (Jointly Dependent)
  3  P         -0.4115989    0.1448445      -2.841660   0.1335401       -3.082213

Residual Variance for Structural Disturbances   3.967444759365655
Ratio of Norm Residual to Norm LHS              1.818845186115604E-02

Modified test case from kmenta 1971 pp 565-582

Two Stage Least Squares Solution for System Number   2   supply eq.
Condition Number of Matrix is greater than    22.96654225297699
Relative Numerical Error in the Solution      2.323008755765498E-11
LHS Endogenous Variable No.   2   Q

                             Std. Error       t          Theil SE        Theil t
Exogenous Variables (Predetermined)
  1  CONSTANT   35.90387     18.86754        1.902944   17.39501         2.064032
  2  F          0.2373297    0.6019217E-01   3.942866   0.5549444E-01    4.276639
Endogenous Variables (Jointly Dependent)
  3  P          0.4205434    0.1660421       2.532751   0.1530833        2.747154

Residual Variance for Structural Disturbances   10.49268888645498
Ratio of Norm Residual to Norm LHS              2.957901371051407E-02

Modified test case from kmenta 1971 pp 565-582

Coefficients of the Reduced Form Equations. Two Stage Least Squares Solution
Condition number of matrix used to find the reduced form
coefficients is no smaller than   2.403435013906650

                 CONSTANT          D           F
                    1              2           3
   1  P           85.18         0.4346     -0.2852
   2  Q           71.73         0.1828      0.1174

Modified test case from kmenta 1971 pp 565-582

Three Stage Least Squares Solution for System Number   1   demand eq.
LHS Endogenous Variable No.   2   Q

                             Std. Error       t          Theil SE        Theil t
Exogenous Variables (Predetermined)
  1  CONSTANT   106.7894     11.14355        9.583069   10.27384        10.39430
  2  D          0.3616812    0.5640608E-01   6.412096   0.5200383E-01    6.954895
Endogenous Variables (Jointly Dependent)
  3  P         -0.4115989    0.1448445      -2.841660   0.1335401       -3.082213

Residual Variance (For Structural Disturbances)   3.372328

Three Stage Least Squares Solution for System Number   2   supply eq.
LHS Endogenous Variable No.   2   Q

                             Std. Error       t          Theil SE        Theil t
Exogenous Variables (Predetermined)
  1  CONSTANT   35.90387     18.86754        1.902944   17.39501         2.064032
  2  F          0.2373297    0.6019217E-01   3.942866   0.5549444E-01    4.276639
Endogenous Variables (Jointly Dependent)
  3  P          0.4205434    0.1660421       2.532751   0.1530833        2.747154

Residual Variance (For Structural Disturbances)   8.918786

Coefficients of the Reduced Form Equations.
Three Stage Least Squares Solution using Orthogonal Factorization.
Condition number of matrix used to find the reduced form
coefficients is no smaller than   2.403435013906646

                 CONSTANT          D           F
                    1              2           3
   1  P           85.18         0.4346     -0.2852
   2  Q           71.73         0.1828      0.1174

Note that the following OLS regressions successfully replicate the constrained reduced form values calculated by the LIML, 2SLS and 3SLS models. In such exactly identified models it is possible to proceed from the reduced form to the coefficients of the estimated simultaneous structural model, as shown in Table 4.1 for the theoretical model.

B34S(r) Matrix Command. d/m/y 13/ 5/08. h:m:s  8: 9:49.

=> CALL LOADDATA$

=> CALL OLSQ(Q D F :PRINT)$

Ordinary Least Squares Estimation
Dependent variable                   Q
Centered R**2                        0.7142164973143195
Adjusted R**2                        0.6805949087630629
Residual Sum of Squares              76.62264354549249
Residual Variance                    4.507214326205441
Standard Error                       2.123020095572682
Total Sum of Squares                 268.1142991999998
Log Likelihood                      -41.81037433562074
Mean of the Dependent Variable       100.8982000000000
Std. Error of Dependent Variable     3.756498223780113
Sum Absolute Residuals               32.24420684107891
F( 2,    17)                         21.24279452844673
F Significance                       0.9999762143066244
1/Condition XPX                      5.775396842473943E-07
Maximum Absolute Residual            4.421086526017319
Number of Observations               20

Variable      Lag     Coefficient      SE               t
D               0     0.18278440      0.47299583E-01    3.8643977
F               0     0.11738935      0.44030665E-01    2.6660816
CONSTANT        0     71.727578       4.5035392        15.926935

=> CALL OLSQ(P D F :PRINT)$

Ordinary Least Squares Estimation
Dependent variable                   P
Centered R**2                        0.6043888119424351
Adjusted R**2                        0.5578463192297805
Residual Sum of Squares              263.9721582328006
Residual Variance                    15.52777401369415
Standard Error                       3.940529661567611
Total Sum of Squares                 667.2514989500000
Log Likelihood                      -54.17988429200851
Mean of the Dependent Variable       100.0190500000000
Std. Error of Dependent Variable     5.926086393627488
Sum Absolute Residuals               56.50496104816216
F( 2,    17)                         12.98574220495295
F Significance                       0.9996226165906434
1/Condition XPX                      5.775396842473943E-07
Maximum Absolute Residual            9.070540097816391
Number of Observations               20

Variable      Lag     Coefficient      SE               t
D               0     0.43463860      0.87792579E-01    4.9507442
F               0    -0.28520325      0.81725152E-01   -3.4897855
CONSTANT        0     85.184338       8.3590023        10.190730

4.5 Analysis of OLS, 2SLS and 3SLS using Matrix Command

The matrix command, documented in Chapter 16, provides a means to illustrate the estimation of OLS, 2SLS and 3SLS models using “classic textbook” formulas. Table 4.7 shows code that implements OLS, 2SLS, 3SLS and FIML estimation using these formulas:

Table 4.7 Matrix Command Implementation of OLS, 2SLS, 3SLS and FIML

/$
/$ Estimates Kmenta Problem with Matrix command.
/$ Purpose is to illustrate OLS/2SLS/3SLS/FIML both with
/$ SIMEQ and with Matrix Commands.
/$
/$ FIML SE same as 3SLS asymptotically (See Greene 5e page 408)
/$
/$ Problem Discussed in "Specifying and Diagnostically Testing
/$ Econometric Models" Chapter 4 Third Edition
/$
%b34slet verbose=0;   /$ set =1 to "test" matrix setup. Usually set=0
%b34slet dosimeq=1;   /$ set =1 to run the SIMEQ command as well as matrix
B34SEXEC DATA NOHEAD CORR$
INPUT Q P D F A $
LABEL Q = 'Food consumption per head'$
LABEL P = 'Ratio of Food Prices to consumer prices'$
LABEL D = 'Disposable Income in constant prices'$
LABEL F = 'Ratio of T-1 years price to general P'$
LABEL A = 'Time'$
COMMENT=('KMENTA(1971) PAGE 565 ANSWERS PAGE 582')$
DATACARDS$
 98.485 100.323  87.4  98.0  1
 99.187 104.264  97.6  99.1  2
102.163 103.435  96.7  99.1  3
101.504 104.506  98.2  98.1  4
104.240  98.001  99.8 110.8  5
103.243  99.456 100.5 108.2  6
103.993 101.066 103.2 105.6  7
 99.900 104.763 107.8 109.8  8
100.350  96.446  96.6 108.7  9
102.820  91.228  88.9 100.6 10
 95.435  93.085  75.1  81.0 11
 92.424  98.801  76.9  68.6 12
 94.535 102.908  84.6  70.9 13
 98.757  98.756  90.6  81.4 14
105.797  95.119 103.1 102.3 15
100.225  98.451 105.1 105.0 16
103.522  86.498  96.4 110.5 17
 99.929 104.016 104.4  92.5 18
105.223 105.769 110.7  89.3 19
106.232 113.490 127.1  93.0 20
B34SRETURN$
B34SEEND$

%b34sif(&dosimeq.eq.1)%then;
B34SEXEC SIMEQ PRINTSYS REDUCED OLS LIML LS2 LS3 FIML FIMLC
               KCOV=DIAG IPR=6$
HEADING=('Test Case from Kmenta (1971) Pages 565 - 582 ' ) $
EXOGENOUS CONSTANT D F A $
ENDOGENOUS P Q $
MODEL LVAR=Q RVAR=(CONSTANT P D)   NAME=('Demand Equation')$
MODEL LVAR=Q RVAR=(CONSTANT P F A) NAME=('Supply Equation')$
B34SEEND$
%b34sendif;

b34sexec matrix;
call loaddata;
verbose=0;
%b34sif(&verbose.ne.0)%then;
verbose=1;
%b34sendif;

x_1=mfam(catcol(constant p d));
x_2=mfam(catcol(constant p f a));
x_1px_1=transpose(x_1)*x_1;
x_2px_2=transpose(x_2)*x_2;
x_1py_1=transpose(x_1)*vfam(q);
x_2py_2=transpose(x_2)*vfam(q);
d1=inv(x_1px_1)*x_1py_1;
d2=inv(x_2px_2)*x_2py_2;
call print('OLS eq 1 ',d1 );
call print('OLS eq 2 ',d2 );

* 2SLS ;
* z_i is right hand side of equation i ;
x     = mfam(catcol(constant d f a));
xpx   = transpose(x)*x;
z_1   = mfam(catcol(constant p d) );
z_2   = mfam(catcol(constant p f a));
xpz_1 = transpose(x)*z_1;
xpz_2 = transpose(x)*z_2;
xpy_1 = transpose(x)*vfam(q);
xpy_2 = transpose(x)*vfam(q);
y_1py_1 = vfam(q)*vfam(q);
y_2py_2 = vfam(q)*vfam(q);
y_1py_2 = vfam(q)*vfam(q);

ls2eq1=inv(transpose(xpz_1)*inv(xpx)*xpz_1)*
       (transpose(xpz_1)*inv(xpx)*xpy_1);
call print('Two stage estimates Equation 1',ls2eq1);
fit1=vfam(q)-z_1*ls2eq1;
sigma11=(y_1py_1 - (2.*vfam(q)*z_1*ls2eq1) +
         ls2eq1*transpose(z_1)*z_1*ls2eq1)/17.;
if(verbose.ne.0)then;
call print('sigma11 ',sigma11:);
call print('Residual Variance 1',sigma11*sigma11:);
call print('Test 1 ',(fit1*fit1)/ 17.:);
call print('Large sample ',(fit1*fit1)/ 20.:);
endif;
varcoef1=sigma11*inv(transpose(z_1)*x*inv(xpx)*transpose(x)*z_1);
call print('Asymptotic Covariance Matrix eq 1 ',varcoef1);

ls2eq2=inv(transpose(xpz_2)*inv(xpx)*xpz_2)*
       (transpose(xpz_2)*inv(xpx)*xpy_2);
call print('Two stage estimates Equation 2',ls2eq2);
fit2=vfam(q)-z_2*ls2eq2;
sigma22=(y_2py_2 - (2.*vfam(q)*z_2*ls2eq2) +
         ls2eq2*transpose(z_2)*z_2*ls2eq2)/16.;
if(verbose.ne.0)then;
call print('sigma22 ',sigma22:);
call print('Residual Variance 2',sigma22*sigma22:);
call print('Test 2 ',(fit2*fit2)/ 16.:);
call print('Large Sample ',(fit2*fit2)/ 20.:);
endif;
sigma12=(y_1py_2 - (vfam(q)*z_1*ls2eq1) - (vfam(q)*z_2*ls2eq2) +
         ls2eq1*transpose(z_1)*z_2*ls2eq2)/20.;
if(verbose.ne.0)call print('test sigma12 ',sigma12);
varcoef2=sigma22*inv(transpose(z_2)*x*inv(xpx)*transpose(x)*z_2);
call print('Asymptotic Covariance Matrix eq 2 ',varcoef2);

* Get sigma(i,j) from fits ;
s=mfam(catcol(fit1,fit2));
sigma=(transpose(s)*s)/20.;
call print('Large Sample sigma (Jennings) ',sigma);
covar1=sigma(1,1)*inv(transpose(xpz_1)*inv(xpx)*xpz_1);
covar2=sigma(2,2)*inv(transpose(xpz_2)*inv(xpx)*xpz_2);
call print('Estimated Covariance Matrix - Large Sample':);
call print(covar1,covar2);
ls2se=dsqrt(array(:covar1(1,1),covar1(2,2),covar1(3,3)
                   covar2(1,1),covar2(2,2),covar2(3,3) covar2(4,4)));
call print('SE of LS2 Model Equations - Large Sample',ls2se);

sssigma(1,1)=sigma(1,1)*(20./17.);
sssigma(1,2)=sigma(1,2)*(20./dsqrt(17.*16.));
sssigma(2,1)=sigma(2,1)*(20./dsqrt(17.*16.));
sssigma(2,2)=sigma(2,2)*(20./16.);
call print('Kmenta (Small Sample Sigma ',sssigma);
covar1=sssigma(1,1)*inv(transpose(xpz_1)*inv(xpx)*xpz_1);
covar2=sssigma(2,2)*inv(transpose(xpz_2)*inv(xpx)*xpz_2);
call print('Estimated Covariance Matrix - Small Sample':);
call print(covar1,covar2);
ls2se=dsqrt(array(:diag(covar1),diag(covar2)));
call print('SE of LS2 Model Equations - Small Sample',ls2se);

* LS3 calculation ;
xpxinv=inv(xpx);
/$ sigma=inv(sssigma);
sigma=inv(sigma);
term11= sigma(1,1)*(transpose(xpz_1)*xpxinv*xpz_1);
term12= sigma(1,2)*(transpose(xpz_1)*xpxinv*xpz_2);
term21= sigma(2,1)*(transpose(xpz_2)*xpxinv*xpz_1);
term22= sigma(2,2)*(transpose(xpz_2)*xpxinv*xpz_2);
left1 =catcol(term11 term12);
left2 =catcol(term21 term22);
left  =catrow(left1 left2);
if(verbose.ne.0) call print(term11 term12 term21 term22 left1 left2 left);
right1=(sigma(1,1)*(transpose(xpz_1)*xpxinv*xpy_1)) +
       (sigma(1,2)*(transpose(xpz_1)*xpxinv*xpy_2));
right2=(sigma(2,1)*(transpose(xpz_2)*xpxinv*xpy_1)) +
       (sigma(2,2)*(transpose(xpz_2)*xpxinv*xpy_2));
right=catrow(right1 right2);
call print(right1 right2 right,inv(left));
ls3=inv(left)*right;
call print('Three Stage Least Squares ',ls3);
ls3se = dsqrt(diag(inv(left)));
t3sls=array(norows(ls3):ls3(,1))/afam(ls3se);
call print('Three Stage Least Squares SE',ls3se);
call print('Three Stage Least Squares t ',t3sls);

* FIML following Kmenta (1971) pages 578 - 581 ;
* q = f(constant P D )                ;
* q = g(constant p F A)               ;
* q = a1 + a2*p + a3*d        + u1    ;
* q = b1 + b2*p + b3*f + b4*a + u2;
y  = transpose(mfam(catcol(q p)));
x  = transpose(mfam(catcol(constant d f a)));
gt = 2.* dfloat(norows(y));
t  = dfloat(norows(y));
call print('Using 3sls starting values ',ls3);
/$ a1=sfam(ls3(1));
/$ a2=sfam(ls3(2));
/$ a3=sfam(ls3(3));
/$ b1=sfam(ls3(4));
/$ b2=sfam(ls3(5));
/$ b3=sfam(ls3(6));
/$ b4=sfam(ls3(7));

program model;
bigb     = matrix(2,2: 1.0, -1.0*a2,
                       1.0, -1.0*b2);
biggamma = matrix(2,4:-1.0*a1, -1.0*a3,  0.0,      0.0,
                      -1.0*b1,  0.0,    -1.0*b3, -1.0*b4);
u1u2=bigb*y+biggamma*x;
phi = u1u2*transpose(u1u2);
/$ General purpose FIML setup if there are no identities
/$ For a discussion of Formulas see Kmenta (1971) page 578-581
func=(-1.0*(gt*pi())/2.0)
     - ((t/2.0)*dlog(dmax1(dabs(det(phi)) ,.1d-30) ))
     + ( t *dlog(dmax1(dabs(det(bigb)),.1d-30) ))
     - (.5*sum(transpose(u1u2)*inv(phi)*u1u2));
call outstring(3, 3,'Function');
call outdouble(36,3,func);
call outdouble(4, 4, a1);
call outdouble(36,4, a2);
call outdouble(55,4, a3);
call outdouble(4 ,5, b1);
call outdouble(36,5, b2);
call outdouble(55,5, b3);
call outdouble(4, 6, b4);
return;
end;

call print(model);
rvec =vector(7:ls3);
ll   =vector(7:) -1.d+2;
uu   =vector(7:) +1.d+3;
call echooff;
call cmaxf2(func :name model
        :parms a1 a2 a3 b1 b2 b3 b4
        :ivalue rvec
        :lower ll
        :upper UU
        :maxit 10000 :maxfun 10000 :maxg 10000 :print);
b34srun;

The matrices X_1 and X_2 are built with the catcol command, and the OLS estimates for equations 1 and 2 are D1 and D2, respectively. The edited results are:

OLS eq 1
D1       = Vector of     3 elements
    99.8954      -0.316299       0.334636

=> CALL PRINT('OLS eq 2 ',D2 )$

OLS eq 2
D2       = Vector of     4 elements
    58.2754       0.160367       0.248133       0.248302

These results are consistent with what was obtained with the simeq command. Next, we use the “textbook” 2SLS formulas

$$\begin{aligned}
\hat\delta_1 &= [Z_1'X(X'X)^{-1}X'Z_1]^{-1}[Z_1'X(X'X)^{-1}X'y_1] \\
\hat\delta_2 &= [Z_2'X(X'X)^{-1}X'Z_2]^{-1}[Z_2'X(X'X)^{-1}X'y_2] \\
[\hat\sigma_{ij}] &= [\hat e_1, \hat e_2]'[\hat e_1, \hat e_2]/T
\end{aligned} \qquad \text{(4.5-1)}$$

to obtain the 2SLS estimates and the error covariance matrix $[\hat\sigma_{ij}]$, which is needed for the 3SLS estimates. The edited results match what was found earlier with simeq. Note that command echoing has not yet been turned off with call echooff; so that the steps of the calculation are shown.
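The same 2SLS arithmetic can be written in a few lines of NumPy. The following sketch is an illustration under stated assumptions, not the B34S code:

* `two_sls` is a hypothetical helper that applies the 2SLS formula one equation at a time.
* The residual covariance matrix uses divisor T, like the "Large Sample sigma (Jennings)" matrix.
* The small-sample standard errors divide each equation's residual sum of squares by T minus the number of coefficients in that equation, as sigma11 (divisor 17) and sigma22 (divisor 16) do in Table 4.7.

The data are transcribed from the DATACARDS in Table 4.7.

```python
import numpy as np

# Kmenta (1971) data transcribed from Table 4.7: columns Q, P, D, F, A
KMENTA = np.array([
    [98.485, 100.323, 87.4, 98.0, 1], [99.187, 104.264, 97.6, 99.1, 2],
    [102.163, 103.435, 96.7, 99.1, 3], [101.504, 104.506, 98.2, 98.1, 4],
    [104.240, 98.001, 99.8, 110.8, 5], [103.243, 99.456, 100.5, 108.2, 6],
    [103.993, 101.066, 103.2, 105.6, 7], [99.900, 104.763, 107.8, 109.8, 8],
    [100.350, 96.446, 96.6, 108.7, 9], [102.820, 91.228, 88.9, 100.6, 10],
    [95.435, 93.085, 75.1, 81.0, 11], [92.424, 98.801, 76.9, 68.6, 12],
    [94.535, 102.908, 84.6, 70.9, 13], [98.757, 98.756, 90.6, 81.4, 14],
    [105.797, 95.119, 103.1, 102.3, 15], [100.225, 98.451, 105.1, 105.0, 16],
    [103.522, 86.498, 96.4, 110.5, 17], [99.929, 104.016, 104.4, 92.5, 18],
    [105.223, 105.769, 110.7, 89.3, 19], [106.232, 113.490, 127.1, 93.0, 20],
])
q, p, d, f, a = KMENTA.T
one = np.ones_like(q)
T = len(q)

X = np.column_stack([one, d, f, a])            # exogenous variables (instruments)
Z = [np.column_stack([one, p, d]),             # right-hand side of equation 1 (demand)
     np.column_stack([one, p, f, a])]          # right-hand side of equation 2 (supply)
XtX_inv = np.linalg.inv(X.T @ X)


def two_sls(Zi, yi):
    """Return 2SLS coefficients and [Z'X(X'X)^-1 X'Z]^-1 for one equation."""
    ZX = Zi.T @ X
    A_inv = np.linalg.inv(ZX @ XtX_inv @ ZX.T)
    return A_inv @ (ZX @ XtX_inv @ (X.T @ yi)), A_inv


fits = [two_sls(Zi, q) for Zi in Z]
delta = [coef for coef, _ in fits]
E = np.column_stack([q - Zi @ di for Zi, di in zip(Z, delta)])
sigma = E.T @ E / T                            # large-sample [sigma_ij], divisor T

# Small-sample SEs: sigma_ii uses divisor T - k_i, as sigma11/sigma22 do
se_small = [np.sqrt(E[:, i] @ E[:, i] / (T - Z[i].shape[1]) * np.diag(fits[i][1]))
            for i in range(2)]

for i in range(2):
    print(f"equation {i + 1}: coef {np.round(delta[i], 6)}  SE {np.round(se_small[i], 6)}")
print("sigma:\n", np.round(sigma, 5))
```

Under these assumptions, the coefficients should match LS2EQ1 and LS2EQ2 and `sigma` should match the large-sample matrix, while `se_small` should reproduce the small-sample LS2SE array and the Stata 2SLS standard errors.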
Two stage estimates Equation 1 LS2EQ1 = Vector of 94.6333 3 elements -0.243557 0.313992 => FIT1=VFAM(Q)-Z_1*LS2EQ1$ => => SIGMA11=(Y_1PY_1 - (2.*VFAM(Q)*Z_1*LS2EQ1) + LS2EQ1*TRANSPOSE(Z_1)*Z_1*LS2EQ1)/17.$ => IF(VERBOSE.NE.0)THEN$ => CALL PRINT('sigma11 => CALL PRINT('Residual Variance => CALL PRINT('Test 1 => CALL PRINT('Large sample ',(FIT1*FIT1)/ 20.:)$ => ENDIF$ => VARCOEF1=SIGMA11*INV(TRANSPOSE(Z_1)*X*INV(XPX)*TRANSPOSE(X)*Z_1)$ => CALL PRINT('Asymptotic Covariance Matrix eq 1 ',VARCOEF1)$ ',SIGMA11:)$ 1',SIGMA11*SIGMA11:)$ ',(FIT1*FIT1)/ 17.:)$ Asymptotic Covariance Matrix eq 1 VARCOEF1= Matrix of 1 2 3 1 62.7397 -0.673422 0.493016E-01 3 by 3 2 -0.673422 0.930922E-02 -0.264190E-02 elements 3 0.493016E-01 -0.264190E-02 0.220371E-02 => => LS2EQ2=INV(TRANSPOSE(XPZ_2)*INV(XPX)*XPZ_2)* (TRANSPOSE(XPZ_2)*INV(XPX)*XPY_2)$ => CALL PRINT('Two stage estimates Equation 2',LS2EQ2)$ Two stage estimates Equation 2 LS2EQ2 = Vector of 49.5324 4 0.240076 elements 0.255606 0.252924 Simultaneous Equations Systems => FIT2=VFAM(Q)-Z_2*LS2EQ2$ => => SIGMA22=(Y_2PY_2 - (2.*VFAM(Q)*Z_2*LS2EQ2) + LS2EQ2*TRANSPOSE(Z_2)*Z_2*LS2EQ2)/16.$ => IF(VERBOSE.NE.0)THEN$ => CALL PRINT('sigma22 => CALL PRINT('Residual Variance 2',SIGMA22*SIGMA22:)$ => CALL PRINT('Test 2 => CALL PRINT('Large Sample ',(FIT2*FIT2)/ 20.:)$ => ENDIF$ => => SIGMA12=(Y_1PY_2 - (VFAM(Q)*Z_1*LS2EQ1) - (VFAM(Q)*Z_2*LS2EQ2) + LS2EQ1*TRANSPOSE(Z_1)*Z_2*LS2EQ2)/20.$ => IF(VERBOSE.NE.0)CALL PRINT('test sigma12 ',SIGMA12)$ => VARCOEF2=SIGMA22*INV(TRANSPOSE(Z_2)*X*INV(XPX)*TRANSPOSE(X)*Z_2)$ => CALL PRINT('Asymptotic Covariance Matrix eq 2 ',VARCOEF2)$ ',SIGMA22:)$ ',(FIT2*FIT2)/ 16.:)$ Asymptotic Covariance Matrix eq 2 VARCOEF2= Matrix of 1 2 3 4 1 144.253 -1.09541 -0.323818 -0.295229 4 by 2 -1.09541 0.998677E-02 0.936222E-03 0.579069E-03 4 elements 3 -0.323818 0.936222E-03 0.223257E-02 0.137681E-02 4 -0.295229 0.579069E-03 0.137681E-02 0.993114E-02 => * GET SIGMA(I,J) FROM FITS $ => S=MFAM(CATCOL(FIT1,FIT2))$ => 
=> SIGMA=(TRANSPOSE(S)*S)/20.$
=> CALL PRINT('Large Sample sigma (Jennings) ',SIGMA)$

Large Sample sigma (Jennings)

SIGMA   = Matrix of     2 by     2 elements

              1               2
 1      3.28645         3.59324
 2      3.59324         4.83166

=> COVAR1=SIGMA(1,1)*INV(TRANSPOSE(XPZ_1)*INV(XPX)*XPZ_1)$
=> COVAR2=SIGMA(2,2)*INV(TRANSPOSE(XPZ_2)*INV(XPX)*XPZ_2)$
=> CALL PRINT('Estimated Covariance Matrix - Large Sample':)$

Estimated Covariance Matrix - Large Sample

=> CALL PRINT(COVAR1,COVAR2)$

COVAR1  = Matrix of     3 by     3 elements

              1               2               3
 1      53.3287       -0.572408        0.419064E-01
 2    -0.572408        0.791284E-02   -0.224561E-02
 3     0.419064E-01   -0.224561E-02    0.187315E-02

COVAR2  = Matrix of     4 by     4 elements

              1               2               3               4
 1      115.402       -0.876328       -0.259055       -0.236183
 2    -0.876328        0.798942E-02    0.748977E-03    0.463256E-03
 3    -0.259055        0.748977E-03    0.178606E-02    0.110144E-02
 4    -0.236183        0.463256E-03    0.110144E-02    0.794491E-02

=> LS2SE=DSQRT(ARRAY(:DIAG(COVAR1),DIAG(COVAR2)))$
=> CALL PRINT('SE of LS2 Model Equations - Large Sample',LS2SE)$

SE of LS2 Model Equations - Large Sample

LS2SE   = Array   of     7 elements

  7.30265       0.889541E-01   0.432799E-01   10.7425        0.893836E-01   0.422617E-01   0.891342E-01

=> SSSIGMA(1,1)=SIGMA(1,1)*(20./17.)$
=> SSSIGMA(1,2)=SIGMA(1,2)*(20./DSQRT(17.*16.))$
=> SSSIGMA(2,1)=SIGMA(2,1)*(20./DSQRT(17.*16.))$
=> SSSIGMA(2,2)=SIGMA(2,2)*(20./16.)$
=> CALL PRINT('Kmenta (Small Sample Sigma ',SSSIGMA)$

Kmenta (Small Sample Sigma

SSSIGMA = Matrix of     2 by     2 elements

              1               2
 1      3.86642         4.35744
 2      4.35744         6.03958

=> COVAR1=SSSIGMA(1,1)*INV(TRANSPOSE(XPZ_1)*INV(XPX)*XPZ_1)$
=> COVAR2=SSSIGMA(2,2)*INV(TRANSPOSE(XPZ_2)*INV(XPX)*XPZ_2)$
=> CALL PRINT('Estimated Covariance Matrix - Small Sample':)$

Estimated Covariance Matrix - Small Sample

=> CALL PRINT(COVAR1,COVAR2)$

COVAR1  = Matrix of     3 by     3 elements

              1               2               3
 1      62.7397       -0.673422        0.493016E-01
 2    -0.673422        0.930922E-02   -0.264190E-02
 3     0.493016E-01   -0.264190E-02    0.220371E-02

COVAR2  = Matrix of     4 by     4 elements

              1               2               3               4
 1      144.253        -1.09541       -0.323818       -0.295229
 2     -1.09541         0.998677E-02   0.936222E-03    0.579069E-03
 3    -0.323818         0.936222E-03   0.223257E-02    0.137681E-02
 4    -0.295229         0.579069E-03   0.137681E-02    0.993114E-02

=>
=> LS2SE=DSQRT(ARRAY(:COVAR1(1,1),COVAR1(2,2),COVAR1(3,3)
                     COVAR2(1,1),COVAR2(2,2),COVAR2(3,3)
                     COVAR2(4,4)))$
=> CALL PRINT('SE of LS2 Model Equations - Small Sample',LS2SE)$

SE of LS2 Model Equations - Small Sample

LS2SE   = Array   of     7 elements

  7.92084       0.964843E-01   0.469437E-01   12.0105        0.999339E-01   0.472501E-01   0.996551E-01

Note that the estimated asymptotic covariance matrix for each equation was calculated as

$$\hat\sigma_{11}\left[Z_1'X(X'X)^{-1}X'Z_1\right]^{-1}, \qquad \hat\sigma_{22}\left[Z_2'X(X'X)^{-1}X'Z_2\right]^{-1} \tag{4.5-2}$$

The SE for each coefficient is the square root of the corresponding diagonal element of the estimated covariance matrix. The 3SLS model is estimated using the "textbook" equation

$$\begin{bmatrix}\hat\delta_1\\ \hat\delta_2\end{bmatrix} = \begin{bmatrix} \hat\sigma^{11}Z_1'X(X'X)^{-1}X'Z_1 & \hat\sigma^{12}Z_1'X(X'X)^{-1}X'Z_2 \\ \hat\sigma^{21}Z_2'X(X'X)^{-1}X'Z_1 & \hat\sigma^{22}Z_2'X(X'X)^{-1}X'Z_2 \end{bmatrix}^{-1} \begin{bmatrix} \hat\sigma^{11}Z_1'X(X'X)^{-1}X'y_1 + \hat\sigma^{12}Z_1'X(X'X)^{-1}X'y_2 \\ \hat\sigma^{21}Z_2'X(X'X)^{-1}X'y_1 + \hat\sigma^{22}Z_2'X(X'X)^{-1}X'y_2 \end{bmatrix} \tag{4.5-3}$$

where $\hat\delta_1$ and $\hat\delta_2$ are the coefficient vectors of equations 1 and 2 and $[\hat\sigma^{ij}] = [\hat\sigma_{ij}]^{-1}$. Equation (4.5-3) comes directly from Kmenta (1971, 577) and is consistent with Theil (1971, 510). The results agree with those of the simeq 3SLS command. In the matrix program each term in (4.5-3) is computed and printed, and the terms are then assembled into the left and right parts of (4.5-3), which at first look formidable.
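The textbook formulas (4.5-2) and (4.5-3) translate almost line for line into array code. The following is a minimal NumPy sketch under stated assumptions, not the matrix command itself: the function names `proj_cross`, `two_sls`, `three_sls` and `small_sample_sigma` are hypothetical, `X` is assumed to hold all predetermined variables (including the constant), each `Z_i` holds the right-hand-side variables of equation i, and `sigma` is a contemporaneous covariance estimate from the 2SLS residuals. The data are synthetic and only shaped like the Kmenta problem (equation 1 overidentified, equation 2 exactly identified). Like the matrix program, the sketch forms explicit inverses for clarity rather than accuracy.

```python
import numpy as np


def proj_cross(A, B, X):
    """Return A'X (X'X)^{-1} X'B, the building block of (4.5-2) and (4.5-3)."""
    return (A.T @ X) @ np.linalg.inv(X.T @ X) @ (X.T @ B)


def two_sls(y, Z, X):
    """2SLS coefficients for one equation y = Z d + u with instruments X."""
    return np.linalg.solve(proj_cross(Z, Z, X), proj_cross(Z, y, X))


def three_sls(ys, Zs, X, sigma):
    """Textbook 3SLS (4.5-3); sigma^{ij} are the elements of inv(sigma)."""
    s_inv = np.linalg.inv(sigma)
    m = len(ys)
    # LEFT: block (i, j) is sigma^{ij} Z_i'X(X'X)^{-1}X'Z_j
    left = np.block([[s_inv[i, j] * proj_cross(Zs[i], Zs[j], X)
                      for j in range(m)] for i in range(m)])
    # RIGHT: block i is sum_j sigma^{ij} Z_i'X(X'X)^{-1}X'y_j
    right = np.concatenate([
        sum(s_inv[i, j] * proj_cross(Zs[i], ys[j], X) for j in range(m))
        for i in range(m)])
    cov = np.linalg.inv(left)  # plays the role of INV(LEFT)
    return cov @ right, cov


def small_sample_sigma(sigma_large, n, ks):
    """Kmenta rescaling: sigma_ij * n / sqrt((n - k_i) * (n - k_j))."""
    d = np.sqrt(n - np.asarray(ks, dtype=float))
    return np.asarray(sigma_large) * n / np.outer(d, d)


# Synthetic two-equation system; p is built to be correlated with the errors.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=(n, 3))
X = np.column_stack([np.ones(n), x])
u = rng.normal(size=(n, 2)) @ np.array([[1.0, 0.0], [0.8, 0.6]])
p = 1.0 + x @ np.array([1.0, -1.0, 0.5]) + u[:, 0] - u[:, 1]
y1 = 2.0 + 0.5 * p + x[:, 0] + u[:, 0]
y2 = 1.0 - 0.3 * p + x[:, 1] + x[:, 2] + u[:, 1]
Z1 = np.column_stack([np.ones(n), p, x[:, 0]])         # overidentified
Z2 = np.column_stack([np.ones(n), p, x[:, 1], x[:, 2]])  # exactly identified

d1, d2 = two_sls(y1, Z1, X), two_sls(y2, Z2, X)
resid = np.column_stack([y1 - Z1 @ d1, y2 - Z2 @ d2])
sigma_hat = resid.T @ resid / n  # large-sample sigma, divisor n as in SIGMA
d3, cov3 = three_sls([y1, y2], [Z1, Z2], X, sigma_hat)
```

Two properties are worth checking with such a sketch. With a diagonal `sigma`, the stacked system is block diagonal and 3SLS collapses to equation-by-equation 2SLS. And because the second equation is exactly identified, the 3SLS estimates of the first equation equal its 2SLS estimates; the same pattern appears in the Kmenta results, where LS3 and LS2EQ1 agree for the demand equation. The rescaling helper reproduces SSSIGMA from the printed SIGMA.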
=> * LS3 CALCULATION $
=> XPXINV=INV(XPX)$
=> SIGMA=INV(SIGMA)$
=> TERM11= SIGMA(1,1)*(TRANSPOSE(XPZ_1)*XPXINV*XPZ_1)$
=> TERM12= SIGMA(1,2)*(TRANSPOSE(XPZ_1)*XPXINV*XPZ_2)$
=> TERM21= SIGMA(2,1)*(TRANSPOSE(XPZ_2)*XPXINV*XPZ_1)$
=> TERM22= SIGMA(2,2)*(TRANSPOSE(XPZ_2)*XPXINV*XPZ_2)$
=> LEFT1 =CATCOL(TERM11 TERM12)$
=> LEFT2 =CATCOL(TERM21 TERM22)$
=> LEFT  =CATROW(LEFT1 LEFT2)$
=> IF(VERBOSE.NE.0)THEN$
=>   CALL PRINT(TERM11 TERM12 TERM21 TERM22 LEFT1 LEFT2 LEFT)$
=> ENDIF$
=>
=> RIGHT1=(SIGMA(1,1)*(TRANSPOSE(XPZ_1)*XPXINV*XPY_1)) +
          (SIGMA(1,2)*(TRANSPOSE(XPZ_1)*XPXINV*XPY_2))$
=>
=> RIGHT2=(SIGMA(2,1)*(TRANSPOSE(XPZ_2)*XPXINV*XPY_1)) +
          (SIGMA(2,2)*(TRANSPOSE(XPZ_2)*XPXINV*XPY_2))$
=> RIGHT=CATROW(RIGHT1 RIGHT2)$
=> CALL PRINT(RIGHT1 RIGHT2 RIGHT,INV(LEFT))$

RIGHT1  = Vector of     3 elements

  842.104       84261.3       82406.3

RIGHT2  = Vector of     4 elements

 -208.606      -20873.2      -20220.4      -2196.91

RIGHT   = Matrix of     7 by     1 elements

              1
 1      842.104
 2      84261.3
 3      82406.3
 4     -208.606
 5     -20873.2
 6     -20220.4
 7     -2196.91

Matrix of     7 by     7 elements

              1               2               3               4               5               6               7
 1      53.3287       -0.572408        0.419064E-01    52.0707        -0.556756        0.337445E-01    0.509185E-01
 2    -0.572408        0.791284E-02   -0.224561E-02   -0.291667        0.494945E-02   -0.180825E-02   -0.272854E-02
 3     0.419064E-01   -0.224561E-02    0.187315E-02   -0.232929        0.632767E-03    0.150833E-02    0.227598E-02
 4      52.0707       -0.291667       -0.232929        113.162        -0.866671       -0.235979       -0.327163
 5    -0.556756        0.494945E-02    0.632767E-03   -0.866671        0.794779E-02    0.649506E-03    0.855426E-03
 6     0.337445E-01   -0.180825E-02    0.150833E-02   -0.235979        0.649506E-03    0.154836E-02    0.203856E-02
 7     0.509185E-01   -0.272854E-02    0.227598E-02   -0.327163        0.855426E-03    0.203856E-02    0.425029E-02

=> LS3=INV(LEFT)*RIGHT$
=> CALL PRINT('Three Stage Least Squares ',LS3)$

Three Stage Least Squares

LS3     = Matrix of     7 by     1 elements

              1
 1      94.6333
 2    -0.243557
 3     0.313992
 4      52.1176
 5     0.228932
 6     0.228978
 7     0.357907

=> LS3SE = DSQRT(DIAG(INV(LEFT)))$
=> CALL PRINT('Three Stage Least Squares SE',LS3SE)$

Three Stage Least Squares SE

LS3SE   = Vector of     7 elements

  7.30265       0.889541E-01   0.432799E-01   10.6378        0.891504E-01   0.393493E-01   0.651943E-01

The estimated standard errors are those suggested by Theil. FIML estimation requires a maximization procedure. Kmenta (1971) shows that, for a model without constraints, FIML maximizes

$$L = -\frac{GT}{2}\log(2\pi) - \frac{T}{2}\log|\Sigma| + T\log|B| - \frac{1}{2}\sum_{t=1}^{T}(By_t + \Gamma x_t)'\,\Sigma^{-1}(By_t + \Gamma x_t) \tag{4.5-4}$$

where $G = M$, the number of equations in the model. The Kmenta test problem can be written

$$\begin{aligned} q &= \alpha_1 + \alpha_2 P + \alpha_3 D + u_1 && \text{Demand}\\ q &= \beta_1 + \beta_2 P + \beta_3 F + \beta_4 A + u_2 && \text{Supply} \end{aligned} \tag{4.5-5}$$

For this problem, with $y_t = (q_t, P_t)'$ and $x_t = (1, D_t, F_t, A_t)'$,

$$B = \begin{bmatrix} 1 & -\alpha_2 \\ 1 & -\beta_2 \end{bmatrix}, \qquad \Gamma = \begin{bmatrix} -\alpha_1 & -\alpha_3 & 0 & 0 \\ -\beta_1 & 0 & -\beta_3 & -\beta_4 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}$$

and $|B|$ and $|\Sigma|$ refer to the absolute values of the determinants of $B$ and $\Sigma$, respectively; $|B|$ is the Jacobian of the transformation. Using the matrix command it is fairly easy to implement this estimator, although problems can arise if there are local maxima. The edited FIML results are given next.

=> PROGRAM MODEL$
=> CALL PRINT(MODEL)$

MODEL   = Program

PROGRAM MODEL$
BIGB     = MATRIX(2,2: 1.0, -1.0*A2,
                       1.0, -1.0*B2)$
BIGGAMMA = MATRIX(2,4:-1.0*A1, -1.0*A3,  0.0,     0.0,
                      -1.0*B1,  0.0,    -1.0*B3, -1.0*B4)$
U1U2=BIGB*Y+BIGGAMMA*X$
PHI = U1U2*TRANSPOSE(U1U2)$
FUNC=(-1.0*(GT*PI())/2.0) -
     ((T/2.0)*DLOG(DMAX1(DABS(DET(PHI)) ,.1D-30) )) +
     ( T *DLOG(DMAX1(DABS(DET(BIGB)),.1D-30) )) -
     (.5*SUM(TRANSPOSE(U1U2)*INV(PHI)*U1U2))$
CALL OUTSTRING(3, 3,'Function')$
CALL OUTDOUBLE(36,3,FUNC)$
CALL OUTDOUBLE(4, 4, A1)$
CALL OUTDOUBLE(36,4, A2)$
CALL OUTDOUBLE(55,4, A3)$
CALL OUTDOUBLE(4 ,5, B1)$
CALL OUTDOUBLE(36,5, B2)$
CALL OUTDOUBLE(55,5, B3)$
CALL OUTDOUBLE(4, 6, B4)$
RETURN$
END$

=> RVEC =VECTOR(7:LS3)$
=> LL   =VECTOR(7:) -1.D+2$
=> UU   =VECTOR(7:) +1.D+3$
=> CALL ECHOOFF$

Constrained Maximum Likelihood Estimation using CMAXF2 Command

Final Functional Value              -13.37570521223952
# of parameters                      7
# of good digits in function        15
# of iterations                     28
# of function evaluations           55
# of gradiant evaluations           30
Scaled Gradient Tolerance            6.055454452393343E-06
Scaled Step Tolerance                3.666852862501036E-11
Relative Function Tolerance          3.666852862501036E-11
False Convergence Tolerance          2.220446049250313E-14
Maximum allowable step size          108037.5007234256
Size of Initial Trust region        -1.000000000000000
1 / Cond. of Hessian Matrix          2.229180241990960E-09

 #   Name   Coefficient       Standard Error       T Value
 1   A1      93.619219         3.4191227           27.381064
 2   A2     -0.22953804        0.60544227E-01      -3.7912458
 3   A3      0.31001341        0.34296485E-01       9.0392183
 4   B1      51.944511         7.3541629            7.0632799
 5   B2      0.23730613        0.45456398E-01       5.2205221
 6   B3      0.22081875        0.28752980E-01       7.6798560
 7   B4      0.36970888        0.14370566E-01      25.726814

SE calculated as sqrt |diagonal(inv(%hessian))|

Hessian Matrix

              1              2              3              4              5              6              7
 1      230.516        23086.3        22522.1       -174.305       -17457.4       -16823.6       -1834.45
 2      23089.2    0.231266E+07   0.225634E+07     -17456.3   -0.174875E+07  -0.168477E+07     -183704.
 3      22524.9    0.225660E+07   0.220289E+07     -17029.9   -0.170618E+07  -0.164463E+07     -179499.
 4     -174.328       -17458.5       -17032.0        135.877        13607.8        13115.4        1430.03
 5     -17459.8   -0.174897E+07  -0.170639E+07      13609.6    0.136313E+07   0.131342E+07      143201.
 6     -16825.9   -0.168498E+07  -0.164483E+07      13117.1    0.131360E+07   0.126732E+07      137898.
 7     -1834.71       -183728.       -179522.        1430.22        143221.        137918.        15323.9

Gradiant Vector

 -0.568518E-06  -0.557801E-04  -0.544320E-04   0.447704E-06   0.438995E-04   0.419615E-04   0.528029E-05

Lower vector

 -100.000       -100.000       -100.000       -100.000       -100.000       -100.000       -100.000

Upper vector

  1000.00        1000.00        1000.00        1000.00        1000.00        1000.00        1000.00

B34S Matrix Command Ending. Last Command reached.

These results replicate the Kmenta (1971) test values for the coefficients. The simeq FIML results are:

Test Case from Kmenta (1971) Pages 565 - 582

Functional Minimization Solution for System No. 1   Demand Equation
LHS Endogenous Variable No. 2   Q

Exogenous Variables (Predetermined)
 1 CONSTANT   93.61922     Std. Error 6.152863        t 15.21555    Theil SE 5.672659        Theil t 16.50359
 2 D          0.3100134    Std. Error 0.3633922E-01   t 8.531097    Theil SE 0.3350311E-01   Theil t 9.253274

Endogenous Variables (Jointly Dependent)
 3 P         -0.2295381    Std. Error 0.7508118E-01   t -3.057199   Theil SE 0.6922143E-01   Theil t -3.315998

Residual Variance (For Structural Disturbances)    3.337108

Functional Minimization 3SLS Covariance for System   Demand Equation

                  CONSTANT       D              P
                     1           2              3
 CONSTANT   1     37.86
 D          2     0.3121E-01     0.1321E-02
 P          3    -0.4078        -0.1600E-02     0.5637E-02

Functional Minimization Solution for System No. 2   Supply Equation
LHS Endogenous Variable No. 2   Q

Exogenous Variables (Predetermined)
 1 CONSTANT   51.94451     Std. Error 9.739647        t 5.333305    Theil SE 8.711405        Theil t 5.962816
 2 F          0.2208188    Std. Error 0.3489965E-01   t 6.327249    Theil SE 0.3121520E-01   Theil t 7.074080
 3 A          0.3697089    Std. Error 0.5846143E-01   t 6.323981    Theil SE 0.5228949E-01   Theil t 7.070425

Endogenous Variables (Jointly Dependent)
 4 P          0.2373061    Std. Error 0.8237774E-01   t 2.880707    Theil SE 0.7368089E-01   Theil t 3.220728

Residual Variance (For Structural Disturbances)    5.620947

Functional Minimization 3SLS Covariance for System   Supply Equation

                  CONSTANT       F              A              P
                     1           2              3              4
 CONSTANT   1     94.86
 F          2    -0.1858         0.1218E-02
 A          3    -0.3119         0.1943E-02     0.3418E-02
 P          4    -0.7341         0.4772E-03     0.8825E-03     0.6786E-02

Test Case from Kmenta (1971) Pages 565 - 582

Contemporaneous Covariance of Residuals (Structural Disturbances)
For Functional Minimization 3SLSQ Solution.
Condition Number of residual columns,    6.942988

                  Demand E       Supply E
                     1              2
 Demand E   1     3.337
 Supply E   2     4.255          5.621

Correlation Matrix of Residuals

                  Demand E       Supply E
                     1              2
 Demand E   1     1.000
 Supply E   2     0.9824         1.000

Test Case from Kmenta (1971) Pages 565 - 582

Coefficients of the Reduced Form Equations.
Condition number of matrix used to find the reduced form coefficients is no smaller than   4.284084281338983

                  CONSTANT       D              F              A
                     1           2              3              4
 P          1     89.27          0.6641        -0.4730        -0.7919
 Q          2     73.13          0.1576         0.1086         0.1818

Mean sum of squares of residuals for the reduced form equations.

                  P              Q
                  1              2
             0.20588D+01    0.43479D+01

The simeq FIML results give coefficients identical to the CMAXF2 results but different SEs, because of the algorithm used. Greene (2003, page 408) notes that "asymptotically the covariance matrix for the FIML estimator is the same as that for the 3SLS estimator."

The purpose of this exercise has been to illustrate how "textbook" formulas can be used with a programming language, such as the matrix command, to produce 2SLS, 3SLS and FIML estimates fairly easily, where the alternative would be to build a C or Fortran program to perform the calculation. Since "textbook" formulas are used in the matrix example, the accuracy of these calculations is inferior to that of the QR approach of Jennings (1980), which is the basis for the simeq command. Inspection of the matrix program that implements these estimators may give the reader confidence to tackle other calculations that have not been implemented in commercial software.11 The matrix examples shown have been coded for teaching purposes (clarity of the code), not research purposes. Many components of the calculation that appear in a number of places in a formula such as (4.4-3) have not been calculated once and saved.

11 The modern pace of research is so fast that if one waits until a new procedure is implemented in commercial software, often it is too late.

4.6 LS2 and GMM Models and Specification tests

The Generalized Method of Moments (GMM) estimation technique is a generalization of 2SLS that allows for various assumptions on the error distribution. Assume there are $l$ instruments in $Z$. The basic idea of GMM is to select coefficients $\hat\beta_{GMM}$ such that

$$g(\hat\beta_{GMM}) = 0 \tag{4.6-1}$$

where

$$g(\hat\beta) = \frac{1}{N}\sum_{i=1}^{N} g_i(\hat\beta) = \frac{1}{N}\sum_{i=1}^{N} z_i'(y_i - x_i\hat\beta) = \frac{1}{N}Z'\hat u \tag{4.6-2}$$

When there are more instruments than coefficients ($l > k$), (4.6-1) cannot in general be satisfied exactly, and GMM instead minimizes a weighted quadratic form in $g(\hat\beta)$. It can be shown that the efficient GMM estimator, which uses the weight matrix $S^{-1}$, is

$$\hat\beta_{EGMM} = (X'ZS^{-1}Z'X)^{-1}X'ZS^{-1}Z'y \tag{4.6-3}$$

where

$$S = E[Z'uu'Z] = E[Z'\Omega Z] \tag{4.6-4}$$

Using the 2SLS residuals $\hat u_i$, a heteroskedasticity-consistent estimator of $S$ can be obtained as

$$\hat S = \frac{1}{N}\sum_{i=1}^{N} \hat u_i^2 Z_i'Z_i \tag{4.6-5}$$

which has been characterized as a standard sandwich approach to robust covariance estimation. For more details see Davidson and MacKinnon (1993, 607-610) and Baum (2006, 194-197).

Hall, Rudebusch and Wilcox (1996) proposed a likelihood ratio test of the relevance of the instrumental variables $Z$ that is based on the canonical correlations $r_i$ between $X$ and $Z$. The ordered canonical correlations can be calculated as the square roots of the eigenvalues of

$$(X'X)^{-1}(X'Z)(Z'Z)^{-1}(Z'X) \tag{4.6-6}$$

with associated eigenvectors $\lambda_i$, or as the square roots of the eigenvalues of

$$(Z'Z)^{-1}(Z'X)(X'X)^{-1}(X'Z) \tag{4.6-7}$$

with associated eigenvectors $\mu_i$. The vectors $\lambda_1$ and $\mu_1$ maximize the correlation between $X\lambda_1$ and $Z\mu_1$, which equals $r_1$. As noted by Hall, Rudebusch and Wilcox (1996, 287), "$\lambda_j$ and $\mu_j$ are the vectors which yield the $j$th highest correlation $r_j$ subject to the constraints that $X\lambda_j$ and $Z\mu_j$ are orthogonal." The proposed Anderson statistic

$$LR = -N\sum_{i=j+1}^{n}\log(1 - r_i^2) \tag{4.6-8}$$

is distributed as chi-square with $(l-k+1)$ degrees of freedom, where $l$ is the rank of $Z$ and $k$ is the rank of $X$, and can be applied to both 2SLS and GMM models. A significant statistic is consistent with relevant instruments. A disadvantage of the Anderson test is that it assumes that the regressors are distributed multivariate normal. Further information on the Anderson test is in Baum (2006, 208). The Anderson statistic can also be displayed in LM form as

$$N\min(r_i^2) \tag{4.6-9}$$

or in the Cragg-Donald (1993) form as

$$\frac{N\min(r_i^2)}{1 - \min(r_i^2)} \tag{4.6-10}$$

If these statistics are not significant, the instruments selected are weak.

For GMM estimation, the Hansen (1982) J statistic, which tests the overidentifying restrictions, is usually used.
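Before turning to the J statistic, the canonical-correlation statistics (4.6-6) and (4.6-8) through (4.6-10) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the B34S code: `squared_canonical_correlations`, `identification_stats` and `chi2_sf_even` are hypothetical helper names, the LR form uses only the smallest squared canonical correlation (the case of one endogenous regressor), and the p-value helper is valid only for an even number of degrees of freedom. The LS2 subroutine in Table 4.8 computes the LR statistic as -N log(sum(1 - r_i^2)); when only one regressor is endogenous, every exogenous regressor that also appears in Z has r_i^2 = 1, so that expression reduces to -N log(1 - min r_i^2).

```python
import math

import numpy as np


def squared_canonical_correlations(X, Z):
    """Eigenvalues of (X'X)^{-1}(X'Z)(Z'Z)^{-1}(Z'X), i.e. r_i^2 from (4.6-6),
    sorted from smallest to largest."""
    a = np.linalg.solve(X.T @ X, X.T @ Z)  # (X'X)^{-1} X'Z
    b = np.linalg.solve(Z.T @ Z, Z.T @ X)  # (Z'Z)^{-1} Z'X
    return np.sort(np.linalg.eigvals(a @ b).real)


def identification_stats(r2_min, n):
    """Anderson LR (smallest-correlation form), Anderson LM (4.6-9) and
    Cragg-Donald (4.6-10) statistics from the smallest squared correlation."""
    lr = -n * math.log(1.0 - r2_min)
    lm = n * r2_min
    cd = lm / (1.0 - r2_min)
    return lr, lm, cd


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability, even df only:
    exp(-x/2) * sum_{j=0}^{df/2-1} (x/2)^j / j!."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


# Synthetic check: two regressors (constant, x1) are also instruments,
# so two squared canonical correlations equal 1 and only p's is below 1.
rng = np.random.default_rng(2)
n = 400
x1, z1, z2, e = rng.normal(size=(4, n))
p = 0.6 * z1 + 0.4 * z2 + e
X = np.column_stack([np.ones(n), x1, p])
Z = np.column_stack([np.ones(n), x1, z1, z2])
r2 = squared_canonical_correlations(X, Z)
lr, lm, cd = identification_stats(r2[0], n)
```

As a cross-check against the Griliches output shown later in this section, feeding the printed LM value back in (r2_min = 52.43586586757428 / 758) reproduces the printed LR (54.3378) and Cragg-Donald (56.3328) values. With l - k + 1 = 16 - 13 + 1 = 4 degrees of freedom, the upper-tail probability of the LR value is about 4.5E-11, which is one minus the printed "Significance" value; that value is therefore a distribution-function value, not a p-value.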
The Hansen test, which reduces to the Sargan (1958) test under conditional homoskedasticity, is the value of the efficient GMM objective function

$$J = N\,g(\hat\beta)'\hat S^{-1}g(\hat\beta) = \frac{1}{N}\hat u'Z\hat S^{-1}Z'\hat u \tag{4.6-11}$$

and is distributed as chi-square with $l-k$ degrees of freedom, the number of overidentifying restrictions. A significant value indicates that the selected instruments are not suitable. For 2SLS the J statistic is

$$NR^2 \tag{4.6-12}$$

where $R^2$ comes from a regression of the 2SLS residuals on $Z$; it is also distributed as $\chi^2(l-k)$. The Basmann (1960) overidentification test is

$$\frac{(u_{LS2}'u_{LS2} - u_Z'u_Z)(N-l)}{u_Z'u_Z} \tag{4.6-11}$$

where $u_{LS2}$ is the residual from the LS2 equation and $u_Z$ is the residual from a model that predicts $u_{LS2}$ as a function of $Z$. The Basmann test is distributed as chi-square with $l-k$ degrees of freedom. If the instruments $Z$ have no predictive power, or in other words are orthogonal to the LS2 residuals, then $u_{LS2}'u_{LS2} = u_Z'u_Z$ and the chi-square value will not be significant. A significant chi-square value, however, indicates that the instruments are not suitable since they are not exogenous.

The Hausman (1978) test is discussed in some detail in Cameron and Trivedi (2005, 271-276). The basic test is

$$H = (\hat\beta - \tilde\beta)'\left(\hat V[\tilde\beta] - \hat V[\hat\beta]\right)^{-1}(\hat\beta - \tilde\beta) \tag{4.6-12}$$

where $\hat\beta$ is the OLS estimator and $\tilde\beta$ is the instrumental variable estimator. $H$ is distributed as $\chi^2(k)$, where $k$ is the number of endogenous variables tested. A significant value suggests that OLS should not be used.

Table 4.8 lists subroutines LS2 and GMMEST, which estimate 2SLS and GMM models, respectively. For an exactly identified system, LS2 and GMM give the same estimates. For an overidentified system, GMM is more efficient in the presence of heteroskedasticity.

Table 4.8 LS2 and General Method of Moments estimation routines

subroutine ls2(y1,x1,z1,var_name,yvar,iprint);
/;
/; y1       => left hand side. Usually set as %y from OLS
/; x1       => right hand side. Usually set as %x from OLS step
/; z1       => instrumental variables
/; var_name => Names from OLS step. Usually set as %names
/; yvar     => usually set from call olsq as %yvar
/; iprint   => =1 print coef, =2 print covariance in addition
/;
/; if # of obs for x1 < z1 then z1 will be truncated
/;
/; Automatic variables created
/; %olscoef => OLS Coefficients
/; %ols_se  => OLS SE
/; %ols_t   => OLS t
/; %ls2coef => LS2 Coefficients
/; %ls2_sel => Large Sample LS2 SE
/; %ls2_ses => Small Sample LS2 SE
/; %ls2_t_l => Large Sample LS2 t
/; %ls2_t_s => Small Sample LS2 t
/; %rss_ols => e'e for OLS
/; %rss_ls2 => e'e for LS2
/; %yhatols => yhat for OLS
/; %yhatls2 => yhat for LS2
/; %resols  => OLS Residual
/; %resls2  => LS2 Residual
/; %covar_l => Large Sample covariance
/; %covar_s => Small Sample covariance
/; %sigma_l => Large Sample sigma
/; %sigma_s => Small Sample Sigma
/; %z
/; %varcov1 => From OLS
/; %info    => Model is ok if = 0
/; For conditional homoskedasticity Sargan(1958)=Hansen(1982) J test
/; %sargan  => Sargan(1958) test
/; %basmann => Basmann(1960)
/;
/; Example Job:
/;
/; b34sexec options ginclude('b34sdata.mac') member(kmenta);
/; b34srun;
/;
/; b34sexec matrix;
/; call loaddata;
/; call echooff;
/; call print('OLS for Equation # 1':);
/; call olsq(q p d :savex :print);
/; call ls2(%y,%x,catcol(d,f,a,constant),%names,%yvar,1);
/;
/; call print('OLS for Equation # 2':);
/; call olsq(q p f a :savex :print);
/; call ls2(%y,%x,catcol(d,f,a,constant),%names,%yvar,1);
/; b34srun;
/;
/; Command built 26 April 2010, Mods 26 May 2010 2 August 2010
/;
y  =vfam(y1);
%z =mfam(z1);
x  =mfam(x1);
n1=norows(%z);
n2=norows(x);
if(n2.lt.n1)call deleterow(%z,1,(n1-n2));
if(n1.lt.n2)then;
   call epprint('ERROR: # obs for instruments < # obs for equation');
   go to done;
endif;
/; This saves the OLS Results
call olsq(y x :noint);
%olscoef=%coef;
%ols_se=%se;
%ols_t =%t;
n_k=%nob-%k;
%rss_ols=%rss;
%yhatols=%yhat;
%resols =%res;
%varcov1=%resvar*%xpxinv;
* 2SLS ;
zpz = transpose(%z)*%z;
zpx = transpose(%z)*x;
zpy = transpose(%z)*y;
ypy = y*y;
irank=rank(zpx);
iorder=rank(zpz);
/;
if(iorder.lt.irank)then;
   call epprint('ERROR: Model Underidentified.':);
   go to done;
endif;
/;
%ls2coef =inv(transpose(zpx)*inv(zpz)*zpx)*
          (transpose(zpx)*inv(zpz)*zpy);
/;
/; Error trap turned off
/;
/; call gminv((transpose(zpx)*inv(zpz)*zpx),%ls2coef,%info,rrcond);
/; if(%info.ne.0)then;
/;    go to done;
/; endif;
%yhatls2=x*%ls2coef;
%resls2 =y-%yhatls2;
sigma_w=(ypy - (2.*y*x*%ls2coef) + %ls2coef*transpose(x)*x*%ls2coef)/dfloat(n_k);
%covar_s=sigma_w*inv(transpose(x)*%z*inv(zpz)*transpose(%z)*x);
%ls2_ses=dsqrt(diag(%covar_s));
* Get sigma(i,j) from fits ;
%rss_ls2=sumsq(%resls2);
%sigma_l=%rss_ls2/dfloat(%nob);
%sigma_s=%rss_ls2/dfloat(n_k);
%covar_l=%sigma_l*inv(transpose(zpx)*inv(zpz)*zpx);
%ls2_sel=dsqrt(diag(%covar_l));
%ls2_t_s=afam(%ls2coef)/afam(%ls2_ses);
%ls2_t_l=afam(%ls2coef)/afam(%ls2_sel);
/;
/; squared canonical correlations
/;
if(iprint.ne.0)then;
   can_corr=real(eig(inv(transpose(x)*x)*(transpose(x)*%z)*inv(zpz)*zpx));
   call print(can_corr);
   anderson=-1.*dfloat(norows(%z))*dlog(sum(kindas(%z,1.0)-afam(can_corr)));
   anderlm = dfloat(norows(%z))*min(can_corr);
   cragg_d = anderlm/(1.0 - min(can_corr));
endif;
/;
/; %sargan & %basmann
/;
call olsq(%resls2 %z :noint);
%basmann=(dfloat( norows(%z)-nocols(%z))*(sumsq(%resls2)-%rss))/%rss;
%sargan = dfloat(norows(%z))*%rsq;
/;
if(iprint.ne.0)then;
   call print(' ':);
   call print('OLS and LS2 Estimation':);
   call print(' ':);
   gg= 'Dependent Variable ';
   gg2=c1array(8:yvar);
   ff=catrow(gg,gg2);
   call print(ff:);
   call print('OLS Sum of squared Residuals ',%rss_ols:);
   call print('LS2 Sum of squared Residuals ',%rss_ls2:);
   call print('Large Sample ls2 sigma ',%sigma_l:);
   call print('Small Sample ls2 sigma ',%sigma_s:);
   call print('Rank of Equation ',irank:);
   call print('Order of Equation ',iorder:);
   if(irank.lt.iorder)call print('Equation is overidentified':);
   if(irank.eq.iorder)call print('Equation is exactly identified':);
/;
   call print('Anderson LR ident./IV Relevance test ',anderson:);
/;
   if(iorder.ge.irank.and.anderson.gt.0.0)then;
      aprob=chisqprob(anderson,dfloat(iorder+1-irank));
      call print('Significance of Anderson LR Statistic',aprob:);
   endif;
/;
   call print('Anderson Canon Correlation LM test ',anderlm:);
/;
   if(iorder.ge.irank.and.anderlm.gt.0.0)then;
      aprob=chisqprob(anderlm,dfloat(iorder+1-irank));
      call print('Significance of Anderson LM Statistic',aprob:);
   endif;
/;
   call print('Cragg-Donald Chi-Square Weak ID Test ',cragg_d:);
/;
   if(iorder.ge.irank.and.cragg_d.gt.0.0)then;
      aprob=chisqprob(cragg_d,dfloat(iorder+1-irank));
      call print('Significance of Cragg-Donald test ',aprob:);
   endif;
/;
   call print('Basmann ',%basmann:);
/;
   if(iorder.gt.irank.and.%basmann.gt.0.0)then;
      bprob=chisqprob(%basmann,dfloat(iorder-irank));
      call print('Significance of Basmann Statistic ',bprob:);
   endif;
/;
   call print('Sargan N*R-sq / J-Test Test ',%sargan:);
/;
   if(iorder.gt.irank.and.%sargan.gt.0.0)then;
      sprob=chisqprob(%sargan,dfloat(iorder-irank));
      call print('Significance of Sargan Statistic ',sprob:);
   endif;
/;
   call print(' ':);
   call print('Hausman (1978) test - Sig. => need LS2':);
   call hausman('All coef. tested with Full (small) Covar. Matrix',
                %olscoef,%varcov1,%ls2coef,%covar_s,
                hausmant,h_sig,iprint);
   call hausman('All coef. tested with Full (large) Covar. Matrix',
                %olscoef,%varcov1,%ls2coef,%covar_l,
                hausmant,h_sig,iprint);
   call hausman('All coef. tested with diag (small) Covar. Matrix',
                %olscoef,diagmat(diag(%varcov1)),
                %ls2coef,diagmat(diag(%covar_s)),
                hausmant,h_sig,iprint);
   call hausman('All coef. tested with diag (large) Covar. Matrix',
                %olscoef,diagmat(diag(%varcov1)),
                %ls2coef,diagmat(diag(%covar_l)),
                hausmant,h_sig,iprint);
/;
   call tabulate(var_name,%olscoef,%ols_se,%ols_t,%ls2coef,
                 %ls2_ses,%ls2_sel,
                 %ls2_t_s,%ls2_t_l
                 :title '+++++++++++++++++++++++++++++++++++++++++++++++++++++');
   call print(' ':);
   if(iprint.eq.2) call print('Estimated Covariance Matrix - Large Sample',%covar_l);
endif;
/;
call makeglobal(%olscoef);
call makeglobal(%ols_se);
call makeglobal(%ols_t);
call makeglobal(%ls2coef);
call makeglobal(%ls2_sel);
call makeglobal(%ls2_ses);
call makeglobal(%ls2_t_l);
call makeglobal(%ls2_t_s);
call makeglobal(%rss_ols);
call makeglobal(%rss_ls2);
call makeglobal(%yhatols);
call makeglobal(%yhatls2);
call makeglobal(%resols);
call makeglobal(%resls2);
call makeglobal(%covar_l);
call makeglobal(%covar_s);
call makeglobal(%sigma_l);
call makeglobal(%sigma_s);
call makeglobal(%z);
call makeglobal(%sargan);
call makeglobal(%basmann);
call makeglobal(%varcov1);
/; call makeglobal(%info);
/;
done continue;
return;
end;

subroutine gmmest(y,x,z,names,yvar,j_stat,sigma,iprint);
/;
/; GMM Model - Built 12 May 2010
/;
/; Must call ls2 prior to this call to produce global variable
/; %z
/;
/; The following global variables are created:
/; %resgmm  => GMM Residuals
/; %segmm   => GMM SE
/; %tgmm    => GMM t
/; %coefgmm => GMM Coef
/; %yhatgmm => GMM Y hat
/; %covar_g => Variance Covariance
/;
/; The Anderson Test is discussed in Baum
/; "An Introduction to Modern Econometrics Using Stata" (2006) p. 208
/; Both the LR and LM forms of the test are given.
/;
/; Generates feasible two-step GMM Estimator. Results are the same as
/; produced by the RATS "optimalweights" option.
/;
/; Note: When running bootstraps inv(s) can fail to invert if dummy
/; variables are in the dataset.
/;
/; See Baum (2006) page 196
/;
xpz = transpose(x)*z;
xpy = transpose(x)*vfam(y);
ypy = vfam(y)*vfam(y);
/;
/; GMM Coefficients
/;
irank =rank(xpz);
iorder=rank(transpose(z)*z);
/;
if(iorder.lt.irank)then;
   call epprint('ERROR: Model Underidentified.':);
   go to done;
endif;
/;
adj=kindas(z,1.0)/dfloat(norows(z));
s=hc_sigma(adj,z,%resls2);
inv_s=inv(s);
%coefgmm=inv(xpz*inv_s*transpose(xpz)) *
         (xpz*inv_s*transpose(z)*vfam(y));
%resgmm =vfam(y)-x*%coefgmm;
%yhatgmm=x*%coefgmm;
sigma=hc_sigma(kindas(z,1.),z,%resls2);
/;
/; Logic from Rats User's Guide Version 7 page 245
/;
j_stat=%resgmm*z*inv(sigma)*transpose(z)*%resgmm;
/;
/; Stock Watson 2007 page 734
/;
%covar_g=inv(xpz*inv(sigma)*transpose(xpz));
%segmm=dsqrt(diag(%covar_g));
%tgmm=afam(%coefgmm)/afam(%segmm);
/;
/;
/; squared canonical correlations
/;
can_corr = real(eig(inv(transpose(x)*x)*(transpose(x)*z)*
           inv(transpose(z)*z)*transpose(xpz)));
/;
if(iprint.gt.1)call print(can_corr);
anderson=-1.*dfloat(norows(z))*dlog(sum(kindas(z,1.0)-afam(can_corr)));
anderlm = dfloat(norows(z))*min(can_corr);
cragg_d = anderlm/(1.0 - min(can_corr));
/;
if(iprint.ne.0)then;
   call print(' ':);
   call print('GMM Estimates':);
   call print(' ':);
   gg= 'Dependent Variable ';
   gg2=c1array(8:yvar);
   ff=catrow(gg,gg2);
   call print(ff:);
   call print('OLS sum of squares ',sumsq(%resols):);
   call print('LS2 sum of squares ',sumsq(%resls2):);
   call print('GMM sum of squares ',sumsq(%resgmm):);
   call print('Rank of Equation ',irank:);
   call print('Order of Equation ',iorder:);
   if(irank.lt.iorder)call print('Equation is overidentified':);
   if(irank.eq.iorder)call print('Equation is exactly identified':);
   call print('Anderson ident./IV Relevance test ',anderson:);
/;
   if(iorder.ge.irank.and.anderson.gt.0.0)then;
      aprob=chisqprob(anderson,dfloat(iorder+1-irank));
      call print('Significance of Anderson Statistic ',aprob:);
   endif;
/;
   call print('Anderson Canon Correlation LM test ',anderlm:);
/;
   if(iorder.ge.irank.and.anderlm.gt.0.0)then;
      aprob=chisqprob(anderlm,dfloat(iorder+1-irank));
      call print('Significance of Anderson LM Statistic',aprob:);
   endif;
/;
   call print('Cragg-Donald Chi-Square Weak ID Test ',cragg_d:);
/;
   if(iorder.ge.irank.and.cragg_d.gt.0.0)then;
      aprob=chisqprob(cragg_d,dfloat(iorder+1-irank));
      call print('Significance of Cragg-Donald test ',aprob:);
   endif;
/;
   call print('Hansen J_stat Ident. of instruments',j_stat:);
/;
   if(iorder.gt.irank.and.j_stat.gt.0.0)then;
      jprob=chisqprob(j_stat,dfloat(iorder-irank));
      call print('Significance of Hansen J_stat ',jprob:);
/;
      call print(' ':);
      call hausman('Hausman (1978) test - Sig. => Need GMM',
                   %olscoef,%varcov1,%coefgmm,%covar_g,
                   hausmant,h_sig,iprint);
   endif;
/;
   call tabulate(names,%coefgmm,%segmm,%tgmm
                 :title '+++++++++++++++++++++++++++++++++++++++++++++++++++++');
   call print(' ':);
endif;
call makeglobal(%resgmm);
call makeglobal(%segmm);
call makeglobal(%tgmm);
call makeglobal(%coefgmm);
call makeglobal(%yhatgmm);
call makeglobal(%covar_g);
done continue;
return;
end;

Table 4.9 shows the setup used to estimate and test LS2 and GMM models for the Griliches (1976) wage data, which Baum (2006) uses as a test case. The Griliches model regresses the log wage on IQ (treated as endogenous), schooling, experience, tenure, a number of control variables and various year dummy variables; mother's education, the KWW score, age and marital status serve as additional instruments. Stata and Rats results are shown for comparison. In addition, Baum (2006) can be consulted for replication purposes.
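The core of the LS2 and GMMEST subroutines can be mirrored in a short NumPy sketch, which may help when checking the formulas (4.6-3), (4.6-5), (4.6-11) and (4.6-12) independently of B34S. It is illustrative only: the names `ls2`, `gmm_two_step` and `sargan_basmann` are hypothetical, the data are synthetic rather than the Griliches data, and the sketch uses the 1/N scaling of (4.6-5), which cancels out of the coefficient formula. It assumes a constant is included in both X and Z, so the 2SLS residuals have mean zero and the centered and uncentered R-squared of the Sargan regression coincide.

```python
import numpy as np


def ls2(y, X, Z):
    """2SLS: (X'Z(Z'Z)^{-1}Z'X)^{-1} X'Z(Z'Z)^{-1}Z'y, plus residuals."""
    w = X.T @ Z @ np.linalg.inv(Z.T @ Z)
    beta = np.linalg.solve(w @ Z.T @ X, w @ Z.T @ y)
    return beta, y - X @ beta


def gmm_two_step(y, X, Z, u_ls2):
    """Feasible efficient GMM (4.6-3) with S-hat from (4.6-5) on 2SLS residuals."""
    n = len(y)
    s_hat = (Z * u_ls2[:, None] ** 2).T @ Z / n  # (1/N) sum u_i^2 z_i'z_i
    s_inv = np.linalg.inv(s_hat)
    xz = X.T @ Z
    beta = np.linalg.solve(xz @ s_inv @ xz.T, xz @ s_inv @ (Z.T @ y))
    g = Z.T @ (y - X @ beta) / n                 # sample moments (4.6-2)
    j_stat = n * g @ s_inv @ g                   # Hansen J, chi2(l - k)
    # equals inv(X'Z [sum u_i^2 z_i'z_i]^{-1} Z'X)
    cov = n * np.linalg.inv(xz @ s_inv @ xz.T)
    return beta, j_stat, cov


def sargan_basmann(u_ls2, Z):
    """Sargan N*R^2 and Basmann statistics from regressing u_ls2 on Z."""
    n, l = Z.shape
    coef, *_ = np.linalg.lstsq(Z, u_ls2, rcond=None)
    rss_z = np.sum((u_ls2 - Z @ coef) ** 2)      # u_Z'u_Z
    rss = u_ls2 @ u_ls2                          # u_LS2'u_LS2
    sargan = n * (1.0 - rss_z / rss)
    basmann = (n - l) * (rss - rss_z) / rss_z
    return sargan, basmann


# Synthetic data: p is endogenous, errors are heteroskedastic.
rng = np.random.default_rng(1)
n = 500
x1, z1, z2, z3 = rng.normal(size=(4, n))
v = rng.normal(size=n)
p = 0.8 * z1 + 0.5 * z2 + 0.3 * z3 + v
u = (0.7 * v + rng.normal(size=n)) * (1.0 + 0.5 * np.abs(x1))
y = 1.0 + 0.5 * x1 - 0.4 * p + u
X = np.column_stack([np.ones(n), x1, p])
Z_over = np.column_stack([np.ones(n), x1, z1, z2, z3])  # l - k = 2
Z_exact = np.column_stack([np.ones(n), x1, z1])          # l = k

b2, u2 = ls2(y, X, Z_over)
bg, j_stat, cov_g = gmm_two_step(y, X, Z_over, u2)
sargan, basmann = sargan_basmann(u2, Z_over)
```

In the exactly identified case (`Z_exact`) the sketch behaves as the text describes: LS2 and GMM coincide with the simple IV estimator, and the J, Sargan and Basmann statistics are zero because the residuals are orthogonal to every instrument.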
Table 4.9 Estimation of LS2 and GMM Models using B34S, Stata and Rats

%b34slet dob34s1=0;
%b34slet dob34s2=1;
%b34slet dostata=1;
%b34slet dorats =1;

b34sexec options ginclude('micro.mac') member(griliches76);
b34srun;

%b34sif(&dob34s1.ne.0)%then;
b34sexec matrix;
call loaddata;
call echooff;
call olsq(iq s expr tenure rns smsa iyear_67 iyear_68 iyear_69
          iyear_70 iyear_71 iyear_73 med kww age mrt :print);
iqyhat=%yhat;
call olsq(lw iqyhat s expr tenure rns smsa iyear_67 iyear_68 iyear_69
          iyear_70 iyear_71 iyear_73 :print);
call olsq(lw iq s expr tenure rns smsa iyear_67 iyear_68 iyear_69
          iyear_70 iyear_71 iyear_73 :print);
call gamfit(lw iq s expr tenure rns[factor,1] smsa[factor,1]
            iyear_67[factor,1] iyear_68[factor,1] iyear_69[factor,1]
            iyear_70[factor,1] iyear_71[factor,1] iyear_73[factor,1] :print);
call marspline(lw iq s expr tenure rns smsa iyear_67 iyear_68 iyear_69
               iyear_70 iyear_71 iyear_73 :print :nk 40 :mi 2);
call gamfit(lw80 iq s expr tenure rns[factor,1] smsa[factor,1]
            iyear_67[factor,1] iyear_68[factor,1] iyear_69[factor,1]
            iyear_70[factor,1] iyear_71[factor,1] iyear_73[factor,1] :print);
call marspline(lw80 iq s expr tenure rns smsa iyear_67 iyear_68 iyear_69
               iyear_70 iyear_71 iyear_73 :print :nk 40 :mi 2);
b34srun;
%b34sendif;

%b34sif(&dob34s2.ne.0)%then;
b34sexec matrix;
call loaddata;
call load(ls2);
call echooff;
call character(lhs,'lw');
call character(endvar,'iq');
call character(rhs,'iq s expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 constant');
call character(ivar,'s expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 constant med kww age mrt');
call olsq(argument(lhs) argument(rhs) :noint :print :savex);
call ls2(%y,%x,catcol(argument(ivar)),%names,%yvar,1);
call print(lhs,rhs,ivar,endvar);
call gmmest(%y,%x,%z,%names,%yvar,j_stat,sigma,1);
call graph(%y %yhatols %yhatls2,%yhatgmm :nocontact :pgborder :nolabel);
b34srun;
%b34sendif;
%b34sif(&dostata.ne.0)%then;
b34sexec options open('statdata.do') unit(28) disp=unknown$ b34srun$
b34sexec options clean(28)$ b34srun$
b34sexec options open('stata.do') unit(29) disp=unknown$ b34srun$
b34sexec options clean(29)$ b34srun$
b34sexec pgmcall idata=28 icntrl=29$
  stata$
* for detail on stata commands see Baum page 205 ;
pgmcards$
* uncomment if do not use /e
* log using stata.log, text

global xlist s expr tenure rns smsa iyear_67 ///
             iyear_68 iyear_69 iyear_70 iyear_71 iyear_73

ivregress 2sls lw $xlist (iq=med kww age mrt)
ivregress liml lw $xlist (iq=med kww age mrt)
ivregress gmm  lw $xlist (iq=med kww age mrt)
ivreg          lw $xlist (iq=med kww age mrt)
ivreg2         lw $xlist (iq=med kww age mrt)
ivreg2         lw $xlist (iq=med kww age mrt), gmm2s robust
overid, all
* orthog(age mrt)
gmm (lw-{xb:$xlist iq} +{b0}), ///
    instruments ($xlist med kww age mrt) onestep nolog
exit,clear
b34sreturn$
b34seend$

b34sexec options close(28); b34srun;
b34sexec options close(29); b34srun;
b34sexec options
   dounix('stata -b do stata.do ')
   dodos('stata /e stata.do');
b34srun;
b34sexec options npageout
   writeout('output from stata',' ',' ')
   copyfout('stata.log')
   dodos('erase stata.do',
/;       'erase stata.log',
         'erase statdata.do') $
b34srun$
%b34sendif;

%b34sif(&dorats.ne.0)%then;
b34sexec options open('rats.dat') unit(28) disp=unknown$ b34srun$
b34sexec options open('rats.in') unit(29) disp=unknown$ b34srun$
b34sexec options clean(28)$ b34srun$
b34sexec options clean(29)$ b34srun$
b34sexec pgmcall$
   rats passasts
   pcomments('* ',
             '* Data passed from B34S(r) system to RATS',
             '* ',
             "display @1 %dateandtime() @33 ' Rats Version ' %ratsversion()"
             '* ') $
PGMCARDS$
*
instruments s expr tenure rns smsa iyear_67 $
  iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 med kww age mrt constant
* OLS
linreg lw
# constant s expr tenure rns smsa iyear_67 $
  iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq
* 2SLS
linreg(inst) lw
# constant s expr tenure rns smsa iyear_67 $
  iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq
* GMM
linreg(inst,optimalweights) lw
# constant s expr tenure rns smsa iyear_67 $
  iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq
b34sreturn$
b34srun $
b34sexec options close(28)$ b34srun$
b34sexec options close(29)$ b34srun$
b34sexec options
/$ dodos(' rats386 rats.in rats.out ')
   dodos('start /w /r rats32s rats.in /run')
   dounix('rats rats.in rats.out')$
B34SRUN$
b34sexec options npageout
   WRITEOUT('Output from RATS',' ',' ')
   COPYFOUT('rats.out')
   dodos('ERASE rats.in','ERASE rats.out','ERASE rats.dat')
   dounix('rm rats.in','rm rats.out','rm rats.dat')
   $
B34SRUN$
%b34sendif;

Edited and annotated results are shown next.

Variable   #  Label                                      # Cases  Mean          Std. Dev.  Variance      Maximum  Minimum
RNS        1  residency in South                         758      0.269129      0.443800   0.196959      1.00000  0.00000
RNS80      2  residency in South in 1980                 758      0.292876      0.455383   0.207373      1.00000  0.00000
MRT        3  marital status = 1 if married              758      0.514512      0.500119   0.250119      1.00000  0.00000
MRT80      4  marital status = 1 if married in 1980      758      0.898417      0.302299   0.913845E-01  1.00000  0.00000
SMSA       5  reside metro area = 1 if urban             758      0.704485      0.456575   0.208461      1.00000  0.00000
SMSA80     6  reside metro area = 1 if urban in 1980     758      0.712401      0.452942   0.205156      1.00000  0.00000
MED        7  mother's education, years                  758      10.9103       2.74112    7.51374       18.0000  0.00000
IQ         8  iq score                                   758      103.856       13.6187    185.468       145.000  54.0000
KWW        9  score on knowledge in world of work test   758      36.5739       7.30225    53.3228       56.0000  12.0000
YEAR      10  Year                                       758      69.0317       2.63179    6.92634       73.0000  66.0000
AGE       11  Age                                        758      21.8351       2.98176    8.89087       30.0000  16.0000
AGE80     12  Age in 1980                                758      33.0119       3.08550    9.52033       38.0000  28.0000
S         13  completed years of schooling               758      13.4050       2.23183    4.98106       18.0000  9.00000
S80       14  completed years of schooling in 1980       758      13.7071       2.21469    4.90486       18.0000  9.00000
EXPR      15  experience, years                          758      1.73543       2.10554    4.43331       11.4440  0.00000
EXPR80    16  experience, years in 1980                  758      11.3943       4.21075    17.7304       22.0450  0.692000
TENURE    17  tenure, years                              758      1.83113       1.67363    2.80104       10.0000  0.00000
TENURE80  18  tenure, years in 1980                      758      7.36280       5.05024    25.5049       22.0000  0.00000
LW        19  log wage                                   758      5.68674       0.428949   0.183998      7.05100  4.60500
LW80      20  log wage in 1980                           758      6.82656       0.409927   0.168040      8.03200  4.74900
IYEAR_67  21                                             758      0.831135E-01  0.276236   0.763063E-01  1.00000  0.00000
IYEAR_68  22                                             758      0.104222      0.305750   0.934828E-01  1.00000  0.00000
IYEAR_69  23                                             758      0.112137      0.315744   0.996940E-01  1.00000  0.00000
IYEAR_70  24                                             758      0.844327E-01  0.278219   0.774060E-01  1.00000  0.00000
IYEAR_71  25                                             758      0.121372      0.326775   0.106782      1.00000  0.00000
IYEAR_73  26                                             758      0.208443      0.406464   0.165213      1.00000  0.00000
CONSTANT  27                                             758      1.00000       0.00000    0.00000       1.00000  1.00000

Number of observations in data file     758
Current missing variable code           1.000000000000000E+31

Ordinary Least Squares Estimation
Dependent variable                      LW
Centered R**2                           0.4301415547786606
Adjusted R**2                           0.4209626268019410
Residual Sum of Squares                 79.37338878983863
Residual Variance                       0.1065414614628706
Standard Error                          0.3264068955504320
Total Sum of Squares                    139.2861498420176
Log Likelihood                         -220.3342420049200
Mean of the Dependent Variable          5.686738782319042
Std. Error of Dependent Variable        0.4289493629019316
Sum Absolute Residuals                  194.5217111479906
F(12, 745)                              46.86185095575703
F Significance                          1.000000000000000
1/Condition XPX                         1.486105464518127E-06
Maximum Absolute Residual               1.186094775249485
Number of Observations                  758

Variable    Lag   Coefficient         SE                  t
IQ           0     0.27121199E-02     0.10314110E-02      2.6295239
S            0     0.61954782E-01     0.72785810E-02      8.5119313
EXPR         0     0.30839472E-01     0.65100828E-02      4.7371858
TENURE       0     0.42163060E-01     0.74812112E-02      5.6358601
RNS          0    -0.96293467E-01     0.27546700E-01     -3.4956444
SMSA         0     0.13289929         0.26575835E-01      5.0007567
IYEAR_67     0    -0.54209478E-01     0.47852181E-01     -1.1328528
IYEAR_68     0     0.80580850E-01     0.44895091E-01      1.7948700
IYEAR_69     0     0.20759151         0.43860470E-01      4.7329979
IYEAR_70     0     0.22822373         0.48799418E-01      4.6767716
IYEAR_71     0     0.22269148         0.43095233E-01      5.1674272
IYEAR_73     0     0.32287469         0.40657433E-01      7.9413448
CONSTANT     0     4.2353569          0.11334886         37.365677

The edited output listed below replicates Baum (2006, 193-194). The Basmann and Sargan tests, 97.0249 and 87.655 respectively, are highly significant and reject the null hypothesis that the residuals of the LS2 model are uncorrelated with the instruments. (The B34S "Significance" values are chi-square distribution function values, so a value near 1.0 corresponds to a very small p-value.) This finding suggests serious problems, since endogeneity present in the OLS model will not be removed by LS2 estimation. Note that Stata replicates the Sargan test value. The Anderson LR value of 54.33, which tests the relevance of the instruments, matches the value reported in Baum (2006, 204) but does not match the value of 52.436 printed by the revised Stata ivreg2 command, which reports the LM form of the test. The B34S output includes both statistics. Since the null is rejected, the instruments appear relevant in that they are related to the endogenous variables.
This is confirmed by the Cragg-Donald (1993) statistic of 56.333. Several sets of results are shown: the LS2 and GMM estimates, Stata bootstrap results, and Stata robust standard error results. The bootstrap results do not make assumptions about the distribution of the regressors. The Rats coefficient results for LS2 and GMM match B34S and Stata. Note that Rats uses the small-sample SE formula while Stata reports the large-sample SE; the B34S LS2 results report both. The exact formulas for all LS2 and GMM calculations in B34S are contained in the two subroutines listed in Table 4.8.

OLS and LS2 Estimation
Dependent Variable                            LW
OLS Sum of squared Residuals                  79.37338878983863
LS2 Sum of squared Residuals                  80.01823370030675
Large Sample ls2 sigma                        0.1055649521112226
Small Sample ls2 sigma                        0.1074070251010829
Rank of Equation                              13
Order of Equation                             16
Equation is overidentified
Anderson LR ident./IV Relevance test          54.33777011513529
Significance of Anderson LR Statistic         0.9999999999552830
Anderson Canon Correlation LM test            52.43586586757428
Significance of Anderson LM Statistic         0.9999999998881718
Cragg-Donald Chi-Square Weak ID Test          56.33277600836977
Significance of Cragg-Donald test             0.9999999999829244
Basmann                                       97.02497131695870
Significance of Basmann Statistic             1.000000000000000
Sargan N*R-sq / J-Test Test                   87.65523169449482
Significance of Sargan Statistic              1.000000000000000

+++++++++++++++++++++++++++++++++++++++++++++++++++++
Obs  NAMES     %OLSCOEF     %OLS_SE     %OLS_T  %LS2COEF     %LS2_SES    %LS2_SEL    %LS2_T_S    %LS2_T_L
  1  IQ        0.2712E-02   0.1031E-02   2.630  0.1747E-03   0.3937E-02  0.3903E-02  0.4436E-01  0.4474E-01
  2  S         0.6195E-01   0.7279E-02   8.512  0.6918E-01   0.1305E-01  0.1294E-01   5.301       5.347
  3  EXPR      0.3084E-01   0.6510E-02   4.737  0.2987E-01   0.6697E-02  0.6639E-02   4.460       4.498
  4  TENURE    0.4216E-01   0.7481E-02   5.636  0.4327E-01   0.7693E-02  0.7627E-02   5.625       5.674
  5  RNS      -0.9629E-01   0.2755E-01  -3.496  -0.1036      0.2974E-01  0.2948E-01  -3.484      -3.514
  6  SMSA      0.1329       0.2658E-01   5.001  0.1351       0.2689E-01  0.2666E-01   5.025       5.069
  7  IYEAR_67 -0.5421E-01   0.4785E-01  -1.133  -0.5260E-01  0.4811E-01  0.4769E-01  -1.093      -1.103
  8  IYEAR_68  0.8058E-01   0.4490E-01   1.795  0.7947E-01   0.4511E-01  0.4472E-01   1.762       1.777
  9  IYEAR_69  0.2076       0.4386E-01   4.733  0.2109       0.4432E-01  0.4393E-01   4.759       4.800
 10  IYEAR_70  0.2282       0.4880E-01   4.677  0.2386       0.5142E-01  0.5097E-01   4.641       4.682
 11  IYEAR_71  0.2227       0.4310E-01   5.167  0.2285       0.4412E-01  0.4374E-01   5.178       5.223
 12  IYEAR_73  0.3229       0.4066E-01   7.941  0.3259       0.4107E-01  0.4072E-01   7.935       8.004
 13  CONSTANT  4.235        0.1133      37.37   4.400        0.2709      0.2685      16.24       16.38

LHS    = LW
RHS    = IQ S EXPR TENURE RNS SMSA IYEAR_67 IYEAR_68 IYEAR_69 IYEAR_70 IYEAR_71 IYEAR_73 CONSTANT
IVAR   = S EXPR TENURE RNS SMSA IYEAR_67 IYEAR_68 IYEAR_69 IYEAR_70 IYEAR_71 IYEAR_73 CONSTANT MED KWW AGE MRT
ENDVAR = iq

GMM Estimates
Dependent Variable                            LW
OLS sum of squares                            79.37338878983863
LS2 sum of squares                            80.01823370030675
GMM sum of squares                            81.26217887229201
Rank of Equation                              13
Order of Equation                             16
Equation is overidentified
Anderson ident./IV Relevance test             54.33777011513529
Significance of Anderson Statistic            0.9999999999552830
Anderson Canon Correlation LM test            52.43586586757428
Significance of Anderson LM Statistic         0.9999999998881718
Cragg-Donald Chi-Square Weak ID Test          56.33277600836977
Significance of Cragg-Donald test             0.9999999999829244
Hansen J_stat Ident. of instruments           74.16487762432548
Significance of Hansen j_stat                 0.9999999999999994

+++++++++++++++++++++++++++++++++++++++++++++++++++++
Obs  NAMES     %COEFGMM     %SEGMM       %TGMM
  1  IQ        -0.1401E-02  0.4113E-02  -0.3407
  2  S         0.7684E-01   0.1319E-01   5.827
  3  EXPR      0.3123E-01   0.6693E-02   4.667
  4  TENURE    0.4900E-01   0.7344E-02   6.672
  5  RNS       -0.1007      0.2959E-01  -3.403
  6  SMSA      0.1336       0.2632E-01   5.075
  7  IYEAR_67  -0.2101E-01  0.4554E-01  -0.4614
  8  IYEAR_68  0.8910E-01   0.4270E-01   2.087
  9  IYEAR_69  0.2072       0.4080E-01   5.080
 10  IYEAR_70  0.2338       0.5285E-01   4.424
 11  IYEAR_71  0.2346       0.4257E-01   5.510
 12  IYEAR_73  0.3360       0.4041E-01   8.315
 13  CONSTANT  4.437        0.2900      15.30

B34S Matrix Command Ending. Last Command reached.

output from stata

Stata 11.1 Statistics/Data Analysis, Copyright 2009 StataCorp LP
Stata running in batch mode

. do stata.do

. * File built by B34S on 17/10/10 at 12:29:31

. run statdata.do

. * uncomment if do not use /e
. * log using stata.log, text
. global xlist s expr tenure rns smsa iyear_67 ///
> iyear_68 iyear_69 iyear_70 iyear_71 iyear_73

. bootstrap _b _se, reps(50): ///
> ivregress 2sls lw $xlist (iq=med kww age mrt)
(running ivregress on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................
Bootstrap results                               Number of obs      =       758
                                                Replications       =        50

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
b            |
          iq |   .0001747   .0074584     0.02   0.981    -.0144435    .0147928
           s |   .0691759   .0217356     3.18   0.001     .0265749    .1117769
        expr |    .029866   .0079507     3.76   0.000      .014283    .0454491
      tenure |   .0432738   .0086468     5.00   0.000     .0263264    .0602211
         rns |  -.1035897   .0406823    -2.55   0.011    -.1833256   -.0238538
        smsa |   .1351148   .0258812     5.22   0.000     .0843886    .1858411
    iyear_67 |   -.052598   .0422675    -1.24   0.213    -.1354408    .0302448
    iyear_68 |   .0794686   .0459301     1.73   0.084    -.0105528      .16949
    iyear_69 |   .2108962   .0456788     4.62   0.000     .1213673     .300425
    iyear_70 |   .2386338   .0592127     4.03   0.000      .122579    .3546886
    iyear_71 |   .2284609   .0513617     4.45   0.000     .1277939    .3291279
    iyear_73 |   .3258944   .0432171     7.54   0.000     .2411904    .4105984
       _cons |    4.39955   .4995474     8.81   0.000     3.420455    5.378645
-------------+----------------------------------------------------------------
se           |
          iq |   .0039035   .0012226     3.19   0.001     .0015073    .0062996
           s |   .0129366   .0034772     3.72   0.000     .0061214    .0197518
        expr |   .0066393   .0007373     9.00   0.000     .0051941    .0080845
      tenure |   .0076271   .0011929     6.39   0.000      .005289    .0099652
         rns |    .029481   .0052416     5.62   0.000     .0192077    .0397544
        smsa |   .0266573    .002741     9.73   0.000      .021285    .0320297
    iyear_67 |   .0476924   .0051268     9.30   0.000     .0376441    .0577407
    iyear_68 |   .0447194    .004026    11.11   0.000     .0368285    .0526102
    iyear_69 |   .0439336   .0055467     7.92   0.000     .0330623     .054805
    iyear_70 |   .0509733   .0052485     9.71   0.000     .0406864    .0612601
    iyear_71 |   .0437436   .0041483    10.54   0.000      .035613    .0518741
    iyear_73 |   .0407181   .0041193     9.88   0.000     .0326444    .0487917
       _cons |   .2685443   .0796381     3.37   0.001     .1124564    .4246321
------------------------------------------------------------------------------

. * Durbin-Wu-Hausman exogenous test robust errors
. ivregress 2sls lw $xlist (iq=med kww age mrt), vce(robust)

Instrumental variables (2SLS) regression               Number of obs =     758
                                                       Wald chi2(12) =  573.14
                                                       Prob > chi2   =  0.0000
                                                       R-squared     =  0.4255
                                                       Root MSE      =  .32491

------------------------------------------------------------------------------
             |               Robust
          lw |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          iq |   .0001747   .0041241     0.04   0.966    -.0079085    .0082578
           s |   .0691759   .0132907     5.20   0.000     .0431266    .0952253
        expr |    .029866   .0066974     4.46   0.000     .0167394    .0429926
      tenure |   .0432738   .0073857     5.86   0.000     .0287981    .0577494
         rns |  -.1035897    .029748    -3.48   0.000    -.1618947   -.0452847
        smsa |   .1351148    .026333     5.13   0.000     .0835032    .1867265
    iyear_67 |   -.052598   .0457261    -1.15   0.250    -.1422195    .0370235
    iyear_68 |   .0794686   .0428231     1.86   0.063    -.0044631    .1634003
    iyear_69 |   .2108962   .0408774     5.16   0.000     .1307779    .2910144
    iyear_70 |   .2386338   .0529825     4.50   0.000     .1347901    .3424776
    iyear_71 |   .2284609   .0426054     5.36   0.000     .1449558     .311966
    iyear_73 |   .3258944   .0405569     8.04   0.000     .2464044    .4053844
       _cons |    4.39955    .290085    15.17   0.000     3.830994    4.968106
------------------------------------------------------------------------------
Instrumented:  iq
Instruments:   s expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70
               iyear_71 iyear_73 med kww age mrt

. ivreg2 lw $xlist (iq=med kww age mrt)

IV (2SLS) estimation
--------------------
Estimates efficient for homoskedasticity only
Statistics consistent for homoskedasticity only

                                                      Number of obs =      758
                                                      F( 12,   745) =    45.91
                                                      Prob > F      =   0.0000
Total (centered) SS     =  139.2861498                Centered R2   =   0.4255
Total (uncentered) SS   =  24652.24662                Uncentered R2 =   0.9968
Residual SS             =   80.0182337                Root MSE      =    .3249

------------------------------------------------------------------------------
          lw |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          iq |   .0001747   .0039035     0.04   0.964     -.007476    .0078253
           s |   .0691759   .0129366     5.35   0.000     .0438206    .0945312
        expr |    .029866   .0066393     4.50   0.000     .0168533    .0428788
      tenure |   .0432738   .0076271     5.67   0.000     .0283249    .0582226
         rns |  -.1035897    .029481    -3.51   0.000    -.1613715   -.0458079
        smsa |   .1351148   .0266573     5.07   0.000     .0828674    .1873623
    iyear_67 |   -.052598   .0476924    -1.10   0.270    -.1460734    .0408774
    iyear_68 |   .0794686   .0447194     1.78   0.076    -.0081797    .1671169
    iyear_69 |   .2108962   .0439336     4.80   0.000     .1247878    .2970045
    iyear_70 |   .2386338   .0509733     4.68   0.000     .1387281    .3385396
    iyear_71 |   .2284609   .0437436     5.22   0.000     .1427251    .3141967
    iyear_73 |   .3258944   .0407181     8.00   0.000     .2460884    .4057004
       _cons |    4.39955   .2685443    16.38   0.000     3.873213    4.925887
------------------------------------------------------------------------------
Underidentification test (Anderson canon. corr. LM statistic):          52.436
                                                   Chi-sq(4) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):               13.786
Stock-Yogo weak ID test critical values:  5% maximal IV relative bias    16.85
                                         10% maximal IV relative bias    10.27
                                         20% maximal IV relative bias     6.71
                                         30% maximal IV relative bias     5.34
                                         10% maximal IV size             24.58
                                         15% maximal IV size             13.96
                                         20% maximal IV size             10.26
                                         25% maximal IV size              8.31
Source: Stock-Yogo (2005).  Reproduced by permission.
------------------------------------------------------------------------------
Sargan statistic (overidentification test of all instruments):          87.655
                                                   Chi-sq(3) P-val =    0.0000
------------------------------------------------------------------------------
Instrumented:         iq
Included instruments: s expr tenure rns smsa iyear_67 iyear_68 iyear_69
                      iyear_70 iyear_71 iyear_73
Excluded instruments: med kww age mrt
------------------------------------------------------------------------------

. ivreg2 lw $xlist (iq=med kww age mrt), gmm2s robust

2-Step GMM estimation
---------------------
Estimates efficient for arbitrary heteroskedasticity
Statistics robust to heteroskedasticity

                                                      Number of obs =      758
                                                      F( 12,   745) =    49.67
                                                      Prob > F      =   0.0000
Total (centered) SS     =  139.2861498                Centered R2   =   0.4166
Total (uncentered) SS   =  24652.24662                Uncentered R2 =   0.9967
Residual SS             =  81.26217887                Root MSE      =    .3274

------------------------------------------------------------------------------
             |               Robust
          lw |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          iq |  -.0014014   .0041131    -0.34   0.733     -.009463    .0066602
           s |   .0768355   .0131859     5.83   0.000     .0509915    .1026794
        expr |   .0312339   .0066931     4.67   0.000     .0181157    .0443522
      tenure |   .0489998   .0073437     6.67   0.000     .0346064    .0633931
         rns |  -.1006811   .0295887    -3.40   0.001    -.1586738   -.0426884
        smsa |   .1335973   .0263245     5.08   0.000     .0820021    .1851925
    iyear_67 |  -.0210135   .0455433    -0.46   0.645    -.1102768    .0682498
    iyear_68 |   .0890993    .042702     2.09   0.037     .0054049    .1727937
    iyear_69 |   .2072484   .0407995     5.08   0.000     .1272828     .287214
    iyear_70 |   .2338308   .0528512     4.42   0.000     .1302445    .3374172
    iyear_71 |   .2345525   .0425661     5.51   0.000     .1511244    .3179805
    iyear_73 |   .3360267   .0404103     8.32   0.000     .2568239    .4152295
       _cons |   4.436784   .2899504    15.30   0.000     3.868492    5.005077
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):             41.537
                                                   Chi-sq(4) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):               13.786
                         (Kleibergen-Paap rk Wald F statistic):         12.167
Stock-Yogo weak ID test critical values:  5% maximal IV relative bias    16.85
                                         10% maximal IV relative bias    10.27
                                         20% maximal IV relative bias     6.71
                                         30% maximal IV relative bias     5.34
                                         10% maximal IV size             24.58
                                         15% maximal IV size             13.96
                                         20% maximal IV size             10.26
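The identification and overidentification statistics printed above are linked by simple arithmetic, and the Python sketch below checks those links. It is not B34S, Stata or RATS code. The function names are hypothetical, and the closed forms are assumptions chosen because they reproduce the printed numbers; they were not read from any program's source. The sketch assumes:

- The Anderson LR, Anderson LM and Cragg-Donald chi-square statistics are -N ln(1-r2), N r2 and N r2/(1-r2). Here r2 = 0.0691766 is the smallest element of the CAN_CORR vector that B34S prints with the Table 4.12 output.
- The Stata Cragg-Donald Wald F rescales that chi-square by (N-L)/(N L2), where L2 is the number of excluded instruments.
- The Basmann statistic equals (N-L)S/(N-S), with S the Sargan statistic.
- The Rats J-Specification(3) value equals S(N-K)/N.
- The large-sample (Stata) SE equals the small-sample (Rats) SE times sqrt((N-K)/N).

The sketch also treats each B34S "Significance" entry as a chi-square CDF value, that is, one minus the p-value.

```python
import math

# Inputs copied from the printed output for the Griliches log-wage equation.
N, K, L, L2 = 758, 13, 16, 4      # obs, rank, order (all instruments), excluded instruments
N_ENDOG = 1                       # iq is the only endogenous regressor
R2_MIN = 0.0691766                # smallest CAN_CORR element (assumed squared canon. corr.)
SARGAN = 87.65523169449482        # B34S Sargan N*R-sq statistic
SE_SMALL_IQ = 0.003937397         # Rats / B34S small-sample SE of the iq coefficient


def chi2_sf(x, k):
    """Upper-tail probability P(chi2(k) > x) for integer k >= 1.

    Starts from Q(1/2, y) = erfc(sqrt(y)) (odd k) or Q(1, y) = exp(-y) (even k),
    y = x/2, then applies Q(a+1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a+1).
    """
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if k % 2:
        q, a = math.erfc(math.sqrt(y)), 0.5
    else:
        q, a = math.exp(-y), 1.0
    while a < k / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def anderson_family(r2, n, n_inst, n_excl):
    """Hypothetical helper: Anderson LR, Anderson LM, Cragg-Donald chi2 and Wald F."""
    ratio = r2 / (1.0 - r2)
    lr = -n * math.log(1.0 - r2)
    lm = n * r2
    cd_chi2 = n * ratio
    cd_f = (n - n_inst) / n_excl * ratio
    return lr, lm, cd_chi2, cd_f


def basmann_from_sargan(s, n, n_inst):
    """Hypothetical helper: Basmann statistic expressed through the Sargan value s."""
    return (n - n_inst) * s / (n - s)


def rats_j_from_sargan(s, n, k):
    """Hypothetical helper: Sargan value rescaled by (n - k) / n."""
    return s * (n - k) / n


def large_se_from_small(se_small, n, k):
    """Convert a standard error based on n - k to one based on n."""
    return se_small * math.sqrt((n - k) / n)


if __name__ == "__main__":
    df_id = L2 - N_ENDOG + 1      # 4, as in Stata's Chi-sq(4)
    df_overid = L - K             # 3, as in Stata's Chi-sq(3)
    lr, lm, cd_chi2, cd_f = anderson_family(R2_MIN, N, L, L2)
    print(f"Anderson LR   {lr:10.4f}  significance {1 - chi2_sf(lr, df_id):.13f}")
    print(f"Anderson LM   {lm:10.4f}  significance {1 - chi2_sf(lm, df_id):.13f}")
    print(f"Cragg-Donald  {cd_chi2:10.4f}  Wald F {cd_f:.3f}")
    print(f"Basmann       {basmann_from_sargan(SARGAN, N, L):10.4f}")
    print(f"Rats J(3)     {rats_j_from_sargan(SARGAN, N, K):10.4f}"
          f"  p-value {chi2_sf(rats_j_from_sargan(SARGAN, N, K), df_overid):.2e}")
    print(f"Large-sample SE of iq {large_se_from_small(SE_SMALL_IQ, N, K):.7f}")
```

Under these assumptions, the sketch reproduces several printed values: Anderson LR 54.338, Anderson LM 52.436 and Cragg-Donald 56.333. It also reproduces the Stata Wald F of 13.786, the Basmann value of 97.025, the Rats J-Specification(3) of 86.152, and the Stata SE of .0039035 for iq.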
Output from RATS

*
* Data passed from B34S(r) system to RATS
*
display @1 %dateandtime() @33 ' Rats Version ' %ratsversion()
10/17/2010 12:29                Rats Version 7.30000
*
CALENDAR(IRREGULAR)
ALLOCATE 758
OPEN DATA rats.dat
DATA(FORMAT=FREE,ORG=OBS, $
  MISSING= 0.1000000000000000E+32 ) / $
  RNS RNS80 MRT MRT80 SMSA SMSA80 MED IQ KWW YEAR AGE AGE80 S S80 $
  EXPR EXPR80 TENURE TENURE80 LW LW80 $
  IYEAR_67 IYEAR_68 IYEAR_69 IYEAR_70 IYEAR_71 IYEAR_73 CONSTANT
SET TREND = T
TABLE

Series      Obs  Mean            Std Error       Minimum         Maximum
RNS         758  0.269129288     0.443800128     0.000000000     1.000000000
RNS80       758  0.292875989     0.455382503     0.000000000     1.000000000
MRT         758  0.514511873     0.500119364     0.000000000     1.000000000
MRT80       758  0.898416887     0.302298767     0.000000000     1.000000000
SMSA        758  0.704485488     0.456574966     0.000000000     1.000000000
SMSA80      758  0.712401055     0.452941990     0.000000000     1.000000000
MED         758  10.910290237    2.741119861     0.000000000     18.000000000
IQ          758  103.856200528   13.618666082    54.000000000    145.000000000
KWW         758  36.573878628    7.302246519     12.000000000    56.000000000
YEAR        758  69.031662269    2.631794247     66.000000000    73.000000000
AGE         758  21.835092348    2.981755741     16.000000000    30.000000000
AGE80       758  33.011873351    3.085503913     28.000000000    38.000000000
S           758  13.405013193    2.231828411     9.000000000     18.000000000
S80         758  13.707124011    2.214692601     9.000000000     18.000000000
EXPR        758  1.735428758     2.105542485     0.000000000     11.444000244
EXPR80      758  11.394261214    4.210745167     0.691999972     22.045000076
TENURE      758  1.831134565     1.673629972     0.000000000     10.000000000
TENURE80    758  7.362796834     5.050240439     0.000000000     22.000000000
LW          758  5.686738782     0.428949363     4.605000019     7.051000118
LW80        758  6.826555411     0.409926757     4.749000072     8.031999588
IYEAR_67    758  0.083113456     0.276235910     0.000000000     1.000000000
IYEAR_68    758  0.104221636     0.305749595     0.000000000     1.000000000
IYEAR_69    758  0.112137203     0.315743524     0.000000000     1.000000000
IYEAR_70    758  0.084432718     0.278219253     0.000000000     1.000000000
IYEAR_71    758  0.121372032     0.326774746     0.000000000     1.000000000
IYEAR_73    758  0.208443272     0.406463569     0.000000000     1.000000000
TREND       758  379.500000000   218.960042017   1.000000000     758.000000000

*
instruments s expr tenure rns smsa iyear_67 $
  iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 med kww age mrt constant
* OLS
linreg lw
# constant s expr tenure rns smsa iyear_67 $
  iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq

Linear Regression - Estimation by Least Squares
Dependent Variable LW
Usable Observations                758      Degrees of Freedom   745
Centered R**2                 0.430142      R Bar **2       0.420963
Uncentered R**2               0.996780      T x R**2         755.559
Mean of Dependent Variable        5.6867387823
Std Error of Dependent Variable   0.4289493629
Standard Error of Estimate        0.3264068956
Sum of Squared Residuals          79.373388790
Regression F(12,745)                   46.8619
Significance Level of F             0.00000000
Log Likelihood                      -220.33424
Durbin-Watson Statistic               1.726206

    Variable                     Coeff      Std Error      T-Stat     Signif
********************************************************************************
1.  Constant                 4.235356890  0.113348861     37.36568  0.00000000
2.  S                        0.061954782  0.007278581      8.51193  0.00000000
3.  EXPR                     0.030839472  0.006510083      4.73719  0.00000260
4.  TENURE                   0.042163060  0.007481211      5.63586  0.00000002
5.  RNS                     -0.096293467  0.027546700     -3.49564  0.00050091
6.  SMSA                     0.132899286  0.026575835      5.00076  0.00000071
7.  IYEAR_67                -0.054209478  0.047852181     -1.13285  0.25764051
8.  IYEAR_68                 0.080580850  0.044895091      1.79487  0.07307967
9.  IYEAR_69                 0.207591515  0.043860470      4.73300  0.00000265
10. IYEAR_70                 0.228223732  0.048799418      4.67677  0.00000346
11. IYEAR_71                 0.222691481  0.043095233      5.16743  0.00000031
12. IYEAR_73                 0.322874689  0.040657433      7.94134  0.00000000
13. IQ                       0.002712120  0.001031411      2.62952  0.00872684

* 2SLS
linreg(inst) lw
# constant s expr tenure rns smsa iyear_67 $
  iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq

Linear Regression - Estimation by Instrumental Variables
Dependent Variable LW
Usable Observations                758      Degrees of Freedom   745
Mean of Dependent Variable        5.6867387823
Std Error of Dependent Variable   0.4289493629
Standard Error of Estimate        0.3277301102
Sum of Squared Residuals          80.018233699
J-Specification(3)                   86.151910
Significance Level of J             0.00000000
Durbin-Watson Statistic               1.723148

    Variable                     Coeff      Std Error      T-Stat     Signif
********************************************************************************
1.  Constant                 4.399550073  0.270877148     16.24187  0.00000000
2.  S                        0.069175917  0.013048998      5.30124  0.00000015
3.  EXPR                     0.029866018  0.006696962      4.45964  0.00000948
4.  TENURE                   0.043273756  0.007693380      5.62480  0.00000003
5.  RNS                     -0.103589698  0.029737133     -3.48351  0.00052378
6.  SMSA                     0.135114831  0.026888925      5.02492  0.00000063
7.  IYEAR_67                -0.052598010  0.048106697     -1.09336  0.27458852
8.  IYEAR_68                 0.079468615  0.045107833      1.76175  0.07852207
9.  IYEAR_69                 0.210896152  0.044315294      4.75899  0.00000234
10. IYEAR_70                 0.238633821  0.051416062      4.64123  0.00000409
11. IYEAR_71                 0.228460915  0.044123572      5.17775  0.00000029
12. IYEAR_73                 0.325894418  0.041071810      7.93475  0.00000000
13. IQ                       0.000174655  0.003937397      0.04436  0.96463097

* GMM
linreg(inst,optimalweights) lw
# constant s expr tenure rns smsa iyear_67 $
  iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq

Linear Regression - Estimation by GMM
Dependent Variable LW
Usable Observations                758      Degrees of Freedom   745
Mean of Dependent Variable        5.6867387823
Std Error of Dependent Variable   0.4289493629
Standard Error of Estimate        0.3302676947
Sum of Squared Residuals          81.262178869
J-Specification(3)                   74.164878
Significance Level of J             0.00000000
Durbin-Watson Statistic               1.720776

    Variable                     Coeff      Std Error      T-Stat     Signif
********************************************************************************
1.  Constant                 4.436784487  0.289950376     15.30188  0.00000000
2.  S                        0.076835453  0.013185922      5.82708  0.00000001
3.  EXPR                     0.031233937  0.006693110      4.66658  0.00000306
4.  TENURE                   0.048999780  0.007343684      6.67237  0.00000000
5.  RNS                     -0.100681114  0.029588671     -3.40269  0.00066726
6.  SMSA                     0.133597299  0.026324546      5.07501  0.00000039
7.  IYEAR_67                -0.021013483  0.045543337     -0.46140  0.64451500
8.  IYEAR_68                 0.089099315  0.042701995      2.08654  0.03692996
9.  IYEAR_69                 0.207248405  0.040799543      5.07967  0.00000038
10. IYEAR_70                 0.233830843  0.052851170      4.42433  0.00000967
11. IYEAR_71                 0.234552477  0.042566121      5.51031  0.00000004
12. IYEAR_73                 0.336026675  0.040410335      8.31536  0.00000000
13. IQ                      -0.001401434  0.004113144     -0.34072  0.73331372

4.7 Potential problems of IV Models

Instrumental variable estimation methods are necessary and useful for models with endogenous variables on the right-hand side. They nevertheless have a number of features that can be serious drawbacks.[12] First, such estimators are never unbiased when endogenous variables appear on the right-hand side. Citing Kinal (1980), Wooldridge (2010, 207) notes "when all endogenous variables have homoskedastic normal distributions with expectations linear in the exogenous variables, the number of moments of the 2SLS estimator that exist is one fewer than the number of overidentifying restrictions.
This finding implies that when the number of instruments equals the number of explanatory variables, the IV estimator does not have the expected value."

Even in large samples, weak instruments cause problems. Assume a single endogenous variable x on the right-hand side,

$$ y = \beta_0 + \beta_1 x + u \qquad (4.7\text{-}1) $$

where z is the instrumental variable. It can be shown that

$$ \operatorname{plim}\hat{\beta}_1 = \beta_1 + \frac{\operatorname{cov}(z,u)}{\operatorname{cov}(z,x)} = \beta_1 + \frac{\operatorname{corr}(z,u)}{\operatorname{corr}(z,x)}\,\frac{\sigma_u}{\sigma_x} \qquad (4.7\text{-}2) $$

The greater |corr(z,u)|, the correlation between the instrument and the population error u, the greater the bias. The smaller |corr(z,x)|, the greater the bias; a small |corr(z,x)| means the instrument is weak because it is only loosely correlated with the endogenous variable. By comparison, the OLS estimator \tilde{\beta}_1 satisfies

$$ \operatorname{plim}\tilde{\beta}_1 = \beta_1 + \operatorname{corr}(x,u)\,\frac{\sigma_u}{\sigma_x} \qquad (4.7\text{-}3) $$

so the bias of OLS can be less than the bias of the IV estimator if

$$ |\operatorname{corr}(x,u)| < \frac{|\operatorname{corr}(z,u)|}{|\operatorname{corr}(z,x)|} \qquad (4.7\text{-}4) $$

[12] Wooldridge (2010), especially pages 107-114, forms the basis for this section.

Everything else equal, the more significant the Anderson test, the larger |corr(z,x)| and the smaller the bias in the IV estimator. The more significant the Basmann (1960) test, the larger |corr(z,u)| and the greater the bias in the IV estimator. An insignificant Anderson test combined with a significant Basmann test is consistent with one or more instruments being endogenous. Table 4.10 provides an overview of the instrumental variable tests.

Table 4.10 Overview of IV Tests

Test             Usage
Sargan (1958)    For 2SLS uses (4.6-12). A significant value casts doubt on the
                 suitability of the instruments. The Hansen J statistic (4.6-11)
                 is the GMM equivalent of the Sargan test.
Anderson         The more significant the test, the larger the correlation
                 between the instruments and the endogenous variables,
                 |corr(z,x)|. This implies less bias in the IV estimator. There
                 are three variants: (4.6-8) is the LR form, (4.6-9) is the LM
                 form and (4.6-10) is the Cragg-Donald (1993) form.
Basmann (1960)   The more significant the Basmann test statistic (4.6-12), the
                 larger |corr(z,u)|, which is associated with more bias in the
                 IV estimator.
Hausman (1978)   Tests whether IV estimation is needed. If the test is
                 significant, OLS should not be used. For detail see (4.6-12).

The subroutine hausman shown in Table 4.11 performs several variants of the Hausman (1978) test. The covariance matrix can be the large-sample or the small-sample IV estimator. The test can use only the diagonal of the covariance matrices, and it can be restricted to a subset of the coefficients. For example, if there are only one or two endogenous variables in the model, one often wants to test only their coefficients to see whether they changed significantly when estimated with an IV technique. The underlying assumption is that the OLS coefficients of the exogenous right-hand-side variables are asymptotically unbiased.

Table 4.11 Subroutine to Perform Hausman Tests

subroutine hausman(title,olscoef,varcov1,ivcoef,ivcovar,
                   hausmant,h_sig,iprint);
/;
/; Hausman (1978) Test if IV Estimation is needed
/;
/; title    => Supply a title
/; olscoef  => Usually set as %olscoef from ls2 routine
/; varcov1  => Usually set as %varcov1 from ls2 routine
/; ivcoef   => Usually set as %ls2coef from ls2 routine
/;             or %coefgmm from gmmest routine
/; ivcovar  => Usually set as %covar_L or %covar_S from ls2
/;             or %covar_g from gmmest
/; hausmant => Hausman test
/; h_sig    => Significance of Hausman test
/; iprint   => NE 0 => print, =2 print internal steps
/;
/; Logic of test is
/; "Cameron-Trivedi Microeconometrics: Methods and Applications"
/; Cambridge (2005, 272) equation 8.37
/;
/; Very Preliminary Version 5 August 2011
/; ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
/; Not to be used until this message is removed
d        = vfam(ivcoef-olscoef);
workm    = (mfam(ivcovar)-mfam(varcov1));
n_end    = rank(workm);
invdif   = pinv(workm);
hausmant = d*invdif*d;
h_sig    = chisqprob(dabs(hausmant),dfloat(n_end));
if(iprint.ne.0)then;
   call print(' ':);
   call print(title:);
   call print('Hausman (1978) M test statistic ',hausmant:);
   call print('Rank of (ivcoef-varcov1)        ',n_end:);
   call print('Significance of Hausman Test    ',h_sig:);
   if(iprint.gt.1)then;
      call print('Coefficient Difference Vector',d);
      call print('OLS Var_Covar ',varcov1);
      call print('IV Var-Covar  ',ivcovar);
      call print('Generalized Inverse of difference',invdif);
      r_cond = rcond(invdif);
      call print('rcond ',r_cond:);
   endif;
   call print(' ':);
endif;
return;
end;

Table 4.12 shows the setup for various Hausman tests performed on the Griliches data.

Table 4.12 Various Hausman Tests

%b34slet dob34s =1;
%b34slet dosas  =0;
%b34slet dostata=1;
b34sexec options noheader; b34srun;
b34sexec options ginclude('micro.mac') member(griliches76); b34srun;
%b34sif(&dob34s.ne.0)%then;
b34sexec matrix;
call loaddata;
call load(ls2);
call echooff;
call character(lhs,'lw');
call character(endvar, 'iq');
call character(endvar2,'iq s');
call character(rhs,'iq s expr tenure rns smsa iyear_67 iyear_68
                    iyear_69 iyear_70 iyear_71 iyear_73 constant');
call character(ivar,'s expr tenure rns smsa iyear_67 iyear_68
                     iyear_69 iyear_70 iyear_71 iyear_73 constant
                     med kww age mrt');
call character(ivar2,'expr tenure rns smsa iyear_67 iyear_68
                      iyear_69 iyear_70 iyear_71 iyear_73 constant
                      med kww age mrt');
call olsq(argument(lhs) argument(rhs) :noint :print :savex);
call print(' ':);
call print('Baum (2006) page 193':);
call print(' ':);
call print(lhs,rhs,ivar,endvar);
call ls2(%y,%x,catcol(argument(ivar)),%names,%yvar,1);
* Hausman test ;
call hausman('2SLS Model large sample covar - Testing coef 1',
   %olscoef(1),submatrix(%varcov1,1,1,1,1),
   %ls2coef(1),submatrix(%covar_l,1,1,1,1),h,sig_h,1);
call hausman('2SLS Model small sample covar - Testing coef 1',
   %olscoef(1),submatrix(%varcov1,1,1,1,1),
   %ls2coef(1),submatrix(%covar_s,1,1,1,1),h,sig_h,1);
call print('Baum (2006) page 198':);
call gmmest(%y,%x,%z,%names,%yvar,j_stat,sigma,1);
* Do C test to see if S is a good instrument;
* s is removed from ivar to ivar2 ;
call olsq(argument(lhs) argument(rhs) :noint :print :savex);
call print(' ':);
call print('Now there are 2 endogenous on the right':);
call print(lhs,rhs,ivar2,endvar2);
call ls2(%y,%x,catcol(argument(ivar2)),%names,%yvar,1);
jj=integers(1,2);
call hausman('2SLS Model large sample covar - Testing coef 1-2',
   %olscoef(jj),submatrix(%varcov1,1,2,1,2),
   %ls2coef(jj),submatrix(%covar_l,1,2,1,2),h,sig_h,2);
jj=integers(1,2);
call hausman('2SLS Model small sample covar - Testing coef 1-2',
   %olscoef(jj),submatrix(%varcov1,1,2,1,2),
   %ls2coef(jj),submatrix(%covar_s,1,2,1,2),h,sig_h,2);
call gmmest(%y,%x,%z,%names,%yvar,j_stat,sigma,1);
b34srun;
%b34sendif;

%b34sif(&dostata.ne.0)%then;
b34sexec options open('statdata.do') unit(28) disp=unknown$ b34srun$
b34sexec options clean(28)$ b34srun$
b34sexec options open('stata.do') unit(29) disp=unknown$ b34srun$
b34sexec options clean(29)$ b34srun$
b34sexec pgmcall idata=28 icntrl=29$
stata$
* for detail on stata commands see Baum page 205 ;
pgmcards$
* uncomment if do not use /e
* log using stata.log, text
global xlist s expr tenure rns smsa iyear_67 ///
       iyear_68 iyear_69 iyear_70 iyear_71 iyear_73
global xlist2 expr tenure rns smsa iyear_67 ///
       iyear_68 iyear_69 iyear_70 iyear_71 iyear_73
ivregress 2sls lw $xlist (iq=med kww age mrt)
estat endogenous
estat overid
ivreg2 lw $xlist (iq=med kww age mrt), gmm2 robust
* s is now endogenous
ivregress 2sls lw $xlist2 (s iq=med kww age mrt)
estat endogenous
estat overid
ivreg2 lw $xlist2 (s iq=med kww age mrt), gmm2 robust
b34sreturn$
b34seend$
b34sexec options close(28); b34srun;
b34sexec options close(29); b34srun;
b34sexec options dounix('stata -b do stata.do ')
                 dodos('stata /e do stata.do'); b34srun;
b34sexec options npageout
   writeout('output from stata',' ',' ')
   copyfout('stata.log')
   dodos('erase stata.do', 'erase stata.log', 'erase statdata.do') $
b34srun$
%b34sendif;

%b34sif(&dosas.ne.0)%then;
B34SEXEC OPTIONS OPEN('testsas.sas') UNIT(29) DISP=UNKNOWN$ B34SRUN$
B34SEXEC OPTIONS CLEAN(29) $ B34SEEND$
B34SEXEC PGMCALL IDATA=29 ICNTRL=29$
SAS $
PGMCARDS$
proc means; run;
proc model ;
   endogenous iq;
   lw= ciq*iq + cs*s + cexpr*expr+ ctenture*tenure+ crns*rns + csmsa*smsa+
       ciyear_67*iyear_67+ ciyear_68*iyear_68 + ciyear_69*iyear_69 +
       ciyear_70*iyear_70 + ciyear_71*iyear_71 + ciyear_73*iyear_73 + interc;
   fit lw / ols 2sls hausman;
   instruments s expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70
               iyear_71 iyear_73 med kww age mrt;
run;
proc model ;
   endogenous iq s;
   lw= ciq*iq + cs*s + cexpr*expr+ ctenture*tenure+ crns*rns + csmsa*smsa+
       ciyear_67*iyear_67+ ciyear_68*iyear_68 + ciyear_69*iyear_69 +
       ciyear_70*iyear_70 + ciyear_71*iyear_71 + ciyear_73*iyear_73 + interc;
   fit lw / ols 2sls hausman;
   instruments expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70
               iyear_71 iyear_73 med kww age mrt;
run;
B34SRETURN$
B34SRUN $
B34SEXEC OPTIONS CLOSE(29)$ B34SRUN$
/$ The next card has to be modified to point to SAS location
/$ Be sure and wait until SAS gets done before letting B34S resume
B34SEXEC OPTIONS dodos('start /w /r sas testsas')
                 dounix('sas testsas')$ B34SRUN$
B34SEXEC OPTIONS NPAGEOUT NOHEADER
   WRITEOUT(' ','Output from SAS',' ',' ')
   WRITELOG(' ','Output from SAS',' ',' ')
   COPYFOUT('testsas.lst')
   COPYFLOG('testsas.log')
   dodos('erase testsas.sas','erase testsas.lst','erase testsas.log')
   dounix('rm testsas.sas','rm testsas.lst','rm testsas.log')$
B34SRUN$
%b34sendif;

Edited output from running the code in Table 4.12 is shown below.
Baum (2006) page 193

LHS      = LW
RHS      = IQ S EXPR TENURE RNS SMSA IYEAR_67 IYEAR_68 IYEAR_69 IYEAR_70
           IYEAR_71 IYEAR_73 CONSTANT
IVAR     = s expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70
           iyear_71 iyear_73 constant med kww age mrt
ENDVAR   = iq

CAN_CORR = Vector of    13 elements
   0.691766E-01   1.00000   1.00000   1.00000   1.00000   1.00000   1.00000
   1.00000        1.00000   1.00000   1.00000   1.00000   1.00000

OLS and LS2 Estimation
Dependent Variable                              LW
OLS Sum of squared Residuals                    79.3733887898386
LS2 Sum of squared Residuals                    80.0182337003248
Large Sample ls2 sigma                          0.105564952111246
Small Sample ls2 sigma                          0.107407025101107
Rank of Equation                                13
Order of Equation                               16
Equation is overidentified
Anderson LR ident./IV Relevance test            54.3377701150767
Significance of Anderson LR Statistic           0.999999999955283
Anderson Canon Correlation LM test              52.4358658675750
Significance of Anderson LM Statistic           0.999999999888172
Cragg-Donald Chi-Square Weak ID Test            56.3327760083706
Significance of Cragg-Donald test               0.999999999982924
Basmann                                         97.0249713169360
Significance of Basmann Statistic               1.00000000000000
Sargan N*R-sq / J-Test Test                     87.6552316944768
Significance of Sargan Statistic                1.00000000000000

Hausman (1978) test - Sig. => need LS2

All coef. tested with Full (small) Covar. Matrix
Hausman (1978) M test statistic                 0.445917496047955
Rank of (ivcoef-varcov1)                        13
Significance of Hausman Test                    2.555970727478295E-008

All coef. tested with Full (large) Covar. Matrix
Hausman (1978) M test statistic                 0.454282631283137
Rank of (ivcoef-varcov1)                        13
Significance of Hausman Test                    2.873775182323094E-008

All coef. tested with diag (small) Covar. Matrix
Hausman (1978) M test statistic                 4.31737789543287
Rank of (ivcoef-varcov1)                        13
Significance of Hausman Test                    1.267697953382511E-002

All coef. tested with diag (large) Covar. Matrix
Hausman (1978) M test statistic                 8.48598394209793
Rank of (ivcoef-varcov1)                        13
Significance of Hausman Test                    0.189437061692008
+++++++++++++++++++++++++++++++++++++++++++++++++++++

Obs  VAR_NAME    %OLSCOEF     %OLS_SE      %OLS_T
  1  IQ          0.2712E-02   0.1031E-02    2.630
  2  S           0.6195E-01   0.7279E-02    8.512
  3  EXPR        0.3084E-01   0.6510E-02    4.737
  4  TENURE      0.4216E-01   0.7481E-02    5.636
  5  RNS        -0.9629E-01   0.2755E-01   -3.496
  6  SMSA        0.1329       0.2658E-01    5.001
  7  IYEAR_67   -0.5421E-01   0.4785E-01   -1.133
  8  IYEAR_68    0.8058E-01   0.4490E-01    1.795
  9  IYEAR_69    0.2076       0.4386E-01    4.733
 10  IYEAR_70    0.2282       0.4880E-01    4.677
 11  IYEAR_71    0.2227       0.4310E-01    5.167
 12  IYEAR_73    0.3229       0.4066E-01    7.941
 13  CONSTANT    4.235        0.1133       37.37

Obs  VAR_NAME    %LS2COEF     %LS2_SES     %LS2_SEL     %LS2_T_S     %LS2_T_L
  1  IQ          0.1747E-03   0.3937E-02   0.3903E-02   0.4436E-01   0.4474E-01
  2  S           0.6918E-01   0.1305E-01   0.1294E-01   5.301        5.347
  3  EXPR        0.2987E-01   0.6697E-02   0.6639E-02   4.460        4.498
  4  TENURE      0.4327E-01   0.7693E-02   0.7627E-02   5.625        5.674
  5  RNS        -0.1036       0.2974E-01   0.2948E-01  -3.484       -3.514
  6  SMSA        0.1351       0.2689E-01   0.2666E-01   5.025        5.069
  7  IYEAR_67   -0.5260E-01   0.4811E-01   0.4769E-01  -1.093       -1.103
  8  IYEAR_68    0.7947E-01   0.4511E-01   0.4472E-01   1.762        1.777
  9  IYEAR_69    0.2109       0.4432E-01   0.4393E-01   4.759        4.800
 10  IYEAR_70    0.2386       0.5142E-01   0.5097E-01   4.641        4.682
 11  IYEAR_71    0.2285       0.4412E-01   0.4374E-01   5.178        5.223
 12  IYEAR_73    0.3259       0.4107E-01   0.4072E-01   7.935        8.004
 13  CONSTANT    4.400        0.2709       0.2685      16.24        16.38

2SLS Model large sample covar - Testing coef 1
Hausman (1978) M test statistic                 0.454282631270989
Rank of (ivcoef-varcov1)                        1
Significance of Hausman Test                    0.499691813837660

2SLS Model small sample covar - Testing coef 1
Hausman (1978) M test statistic                 0.445917496059109
Rank of (ivcoef-varcov1)                        1
Significance of Hausman Test                    0.495719926530397

Baum (2006) page 198

GMM Estimates
Dependent Variable                              LW
OLS sum of squares                              79.3733887898386
LS2 sum of squares                              80.0182337003248
GMM sum of squares                              81.2621788722545
Rank of Equation                                13
Order of Equation                               16
Equation is overidentified
Anderson ident./IV Relevance test               54.3377701150767
Significance of Anderson Statistic              0.999999999955283
Anderson Canon Correlation LM test              52.4358658675750
Significance of Anderson LM Statistic           0.999999999888172
Cragg-Donald Chi-Square Weak ID Test            56.3327760083706
Significance of Cragg-Donald test               0.999999999982924
Hansen J_stat Ident. of instruments             74.1648776242674
Significance of Hansen J_stat                   0.999999999999999

Hausman (1978) test - Sig. => Need GMM
Hausman (1978) M test statistic                 15.0455541849668
Rank of (ivcoef-varcov1)                        13
Significance of Hausman Test                    0.695482435419703
+++++++++++++++++++++++++++++++++++++++++++++++++++++

Obs  NAMES       %COEFGMM     %SEGMM       %TGMM
  1  IQ         -0.1401E-02   0.4113E-02   -0.3407
  2  S           0.7684E-01   0.1319E-01    5.827
  3  EXPR        0.3123E-01   0.6693E-02    4.667
  4  TENURE      0.4900E-01   0.7344E-02    6.672
  5  RNS        -0.1007       0.2959E-01   -3.403
  6  SMSA        0.1336       0.2632E-01    5.075
  7  IYEAR_67   -0.2101E-01   0.4554E-01   -0.4614
  8  IYEAR_68    0.8910E-01   0.4270E-01    2.087
  9  IYEAR_69    0.2072       0.4080E-01    5.080
 10  IYEAR_70    0.2338       0.5285E-01    4.424
 11  IYEAR_71    0.2346       0.4257E-01    5.510
 12  IYEAR_73    0.3360       0.4041E-01    8.315
 13  CONSTANT    4.437        0.2900       15.30

Note that the results for testing all coefficients and for testing only the IQ coefficient are very similar. This is because the main change between OLS and LS2 was in the IQ coefficient, which fell from the OLS value of .002712 (t = 2.630) to the LS2 value of .0001747 (small-sample t = .04436). Because of the high covariance of the coefficients, this change was not significant. If the covariance of the coefficients is assumed to be 0.0 (the diagonal covariance matrix tests), the Hausman statistic rises to 4.32 (small sample) or 8.49 (large sample), but with 13 degrees of freedom the significance of the larger value is still only .189. The Hausman tests reported by Stata for this problem are Durbin = .457658 (p = .4987) and Wu-Hausman = .449477 (p = .5028), both close to the B34S values for the IQ coefficient alone. In the next problem both S and IQ are assumed to be endogenous.
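The IQ-only Hausman statistics above can be checked by hand. For a single coefficient the Hausman (1978) contrast reduces to M = (b_LS2 - b_OLS)^2 / (Var(b_LS2) - Var(b_OLS)), which is chi-square with one degree of freedom. The following is a minimal Python sketch, not B34S code: the function names are hypothetical, the inputs are the rounded IQ coefficients and standard errors printed in the OLS and LS2 tables, and the B34S "Significance" is assumed to be the chi-square CDF value (1 minus the p-value), which is what the printed numbers imply.

```python
import math


def hausman_scalar(b_iv, se_iv, b_ols, se_ols):
    """Hausman (1978) contrast for one coefficient: d^2 / (Var_IV - Var_OLS)."""
    d = b_iv - b_ols
    var_diff = se_iv ** 2 - se_ols ** 2
    if var_diff <= 0:
        # The contrast is undefined unless the IV variance exceeds the OLS variance.
        raise ValueError("Var_IV - Var_OLS must be positive")
    return d * d / var_diff


def chi2_cdf_1df(x):
    # Chi-square CDF with 1 degree of freedom: P(X <= x) = erf(sqrt(x / 2)).
    return math.erf(math.sqrt(x / 2.0))


# Rounded IQ estimates taken from the OLS and LS2 tables.
B_OLS, SE_OLS = 0.2712e-02, 0.1031e-02
B_LS2 = 0.1747e-03
SE_LS2 = {"large": 0.3903e-02, "small": 0.3937e-02}

for label, se in SE_LS2.items():
    m = hausman_scalar(B_LS2, se, B_OLS, SE_OLS)
    cdf = chi2_cdf_1df(m)
    # B34S prints the CDF value; Stata and SAS print the p-value, 1 - CDF.
    print(f"{label} sample: M = {m:.4f}, B34S significance = {cdf:.4f}, "
          f"p-value = {1.0 - cdf:.4f}")
```

With these rounded inputs the sketch gives M of roughly 0.454 (large sample) and 0.446 (small sample) with a CDF value near 0.50, in line with the B34S values 0.4543 and 0.4459.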
Now there are 2 endogenous on the right

LHS      = LW
RHS      = IQ S EXPR TENURE RNS SMSA IYEAR_67 IYEAR_68 IYEAR_69 IYEAR_70
           IYEAR_71 IYEAR_73 CONSTANT
IVAR2    = expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70
           iyear_71 iyear_73 constant med kww age mrt
ENDVAR2  = iq s

CAN_CORR = Vector of    13 elements
   0.632956E-01   0.363635   1.00000   1.00000   1.00000   1.00000   1.00000
   1.00000        1.00000    1.00000   1.00000   1.00000   1.00000

OLS and LS2 Estimation
Dependent Variable                              LW
OLS Sum of squared Residuals                    79.3733887898386
LS2 Sum of squared Residuals                    107.531341127675
Large Sample ls2 sigma                          0.141861927609070
Small Sample ls2 sigma                          0.144337370641175
Rank of Equation                                13
Order of Equation                               15
Equation is overidentified
Anderson LR ident./IV Relevance test            -343.395953003074
Anderson Canon Correlation LM test              47.9780438223675
Significance of Anderson LM Statistic           0.999999999784662
Cragg-Donald Chi-Square Weak ID Test            51.2200459449682
Significance of Cragg-Donald test               0.999999999956070
Basmann                                         13.2374795188268
Significance of Basmann Statistic               0.998664887551261
Sargan N*R-sq / J-Test Test                     13.2683313734400
Significance of Sargan Statistic                0.998685324861342

Hausman (1978) test - Sig. => need LS2

All coef. tested with Full (small) Covar. Matrix
Hausman (1978) M test statistic                 45.6187185063307
Rank of (ivcoef-varcov1)                        13
Significance of Hausman Test                    0.999983506422730

All coef. tested with Full (large) Covar. Matrix
Hausman (1978) M test statistic                 46.6688127514076
Rank of (ivcoef-varcov1)                        13
Significance of Hausman Test                    0.999989011736172

All coef. tested with diag (small) Covar. Matrix
Hausman (1978) M test statistic                 106.728898645404
Rank of (ivcoef-varcov1)                        13
Significance of Hausman Test                    1.00000000000000

All coef. tested with diag (large) Covar. Matrix
Hausman (1978) M test statistic                 110.450214212224
Rank of (ivcoef-varcov1)                        13
Significance of Hausman Test                    1.00000000000000
+++++++++++++++++++++++++++++++++++++++++++++++++++++

Obs  VAR_NAME    %OLSCOEF     %OLS_SE      %OLS_T
  1  IQ          0.2712E-02   0.1031E-02    2.630
  2  S           0.6195E-01   0.7279E-02    8.512
  3  EXPR        0.3084E-01   0.6510E-02    4.737
  4  TENURE      0.4216E-01   0.7481E-02    5.636
  5  RNS        -0.9629E-01   0.2755E-01   -3.496
  6  SMSA        0.1329       0.2658E-01    5.001
  7  IYEAR_67   -0.5421E-01   0.4785E-01   -1.133
  8  IYEAR_68    0.8058E-01   0.4490E-01    1.795
  9  IYEAR_69    0.2076       0.4386E-01    4.733
 10  IYEAR_70    0.2282       0.4880E-01    4.677
 11  IYEAR_71    0.2227       0.4310E-01    5.167
 12  IYEAR_73    0.3229       0.4066E-01    7.941
 13  CONSTANT    4.235        0.1133       37.37

Obs  VAR_NAME    %LS2COEF     %LS2_SES     %LS2_SEL     %LS2_T_S     %LS2_T_L
  1  IQ         -0.9099E-02   0.4745E-02   0.4704E-02  -1.917       -1.934
  2  S           0.1724       0.2092E-01   0.2074E-01   8.243        8.314
  3  EXPR        0.4929E-01   0.8225E-02   0.8155E-02   5.992        6.044
  4  TENURE      0.4222E-01   0.8920E-02   0.8843E-02   4.733        4.774
  5  RNS        -0.1018       0.3447E-01   0.3418E-01  -2.953       -2.978
  6  SMSA        0.1261       0.3120E-01   0.3093E-01   4.043        4.078
  7  IYEAR_67   -0.5962E-01   0.5578E-01   0.5530E-01  -1.069       -1.078
  8  IYEAR_68    0.4868E-01   0.5247E-01   0.5202E-01   0.9278       0.9359
  9  IYEAR_69    0.1528       0.5201E-01   0.5156E-01   2.938        2.964
 10  IYEAR_70    0.1744       0.6028E-01   0.5976E-01   2.894        2.919
 11  IYEAR_71    0.9167E-01   0.5461E-01   0.5414E-01   1.678        1.693
 12  IYEAR_73    0.9324E-01   0.5768E-01   0.5718E-01   1.617        1.631
 13  CONSTANT    4.034        0.3182       0.3154      12.68        12.79

2SLS Model large sample covar - Testing coef 1-2
Hausman (1978) M test statistic                 46.6688127514087
Rank of (ivcoef-varcov1)                        2
Significance of Hausman Test                    0.999999999926549

Coefficient Difference Vector
D        = Vector of     2 elements
  -0.118110E-01   0.110471

OLS Var_Covar
VARCOV1  = Matrix of     2 by     2 elements
                  1                2
   1      0.106381E-05    -0.302739E-05
   2     -0.302739E-05     0.529777E-04

IV Var-Covar
IVCOVAR  = Matrix of     2 by     2 elements
                  1                2
   1      0.221314E-04    -0.766991E-04
   2     -0.766991E-04     0.430068E-03

Generalized Inverse of difference
INVDIF   = Matrix of     2 by     2 elements
                  1                2
   1      149826.          29271.3
   2      29271.3          8370.60

rcond     1.435620745007835E-002

2SLS Model small sample covar - Testing coef 1-2
Hausman (1978) M test statistic                 45.6187185063286
Rank of (ivcoef-varcov1)                        2
Significance of Hausman Test                    0.999999999875829

Coefficient Difference Vector
D        = Vector of     2 elements
  -0.118110E-01   0.110471

OLS Var_Covar
VARCOV1  = Matrix of     2 by     2 elements
                  1                2
   1      0.106381E-05    -0.302739E-05
   2     -0.302739E-05     0.529777E-04

IV Var-Covar
IVCOVAR  = Matrix of     2 by     2 elements
                  1                2
   1      0.225176E-04    -0.780375E-04
   2     -0.780375E-04     0.437572E-03

Generalized Inverse of difference
INVDIF   = Matrix of     2 by     2 elements
                  1                2
   1      146541.          28580.8
   2      28580.8          8174.45

rcond     1.439786477313241E-002

GMM Estimates
Dependent Variable                              LW
OLS sum of squares                              79.3733887898386
LS2 sum of squares                              107.531341127675
GMM sum of squares                              109.084651127200
Rank of Equation                                13
Order of Equation                               15
Equation is overidentified
Anderson ident./IV Relevance test               -343.395953003074
Anderson Canon Correlation LM test              47.9780438223675
Significance of Anderson LM Statistic           0.999999999784662
Cragg-Donald Chi-Square Weak ID Test            51.2200459449682
Significance of Cragg-Donald test               0.999999999956070
Hansen J_stat Ident. of instruments             11.6014813674647
Significance of Hansen J_stat                   0.996974686884900

Hausman (1978) test - Sig. => Need GMM
Hausman (1978) M test statistic                 53.4818275385920
Rank of (ivcoef-varcov1)                        13
Significance of Hausman Test                    0.999999255129040
+++++++++++++++++++++++++++++++++++++++++++++++++++++

Obs  NAMES       %COEFGMM     %SEGMM       %TGMM
  1  IQ         -0.9286E-02   0.4882E-02   -1.902
  2  S           0.1758       0.2068E-01    8.502
  3  EXPR        0.5028E-01   0.8044E-02    6.251
  4  TENURE      0.4252E-01   0.9455E-02    4.497
  5  RNS        -0.1041       0.3352E-01   -3.105
  6  SMSA        0.1248       0.3077E-01    4.054
  7  IYEAR_67   -0.5304E-01   0.5146E-01   -1.031
  8  IYEAR_68    0.4595E-01   0.4957E-01    0.9270
  9  IYEAR_69    0.1555       0.4763E-01    3.264
 10  IYEAR_70    0.1670       0.6100E-01    2.737
 11  IYEAR_71    0.8465E-01   0.5540E-01    1.528
 12  IYEAR_73    0.9961E-01   0.6070E-01    1.641
 13  CONSTANT    4.004        0.3348       11.96

Tests are made on all coefficients, using both the full and the diagonal covariance matrices, and on just the two endogenous variables. The significant Hausman test statistics indicate that IV methods are needed. Internal calculations for the two-coefficient case are displayed. A significant Hausman test is consistent with what Stata finds, but the exact statistics do not match, apparently because a different variant of the test is implemented: the Stata Durbin and Wu-Hausman statistics are reported with 2 degrees of freedom, one for each endogenous regressor. Note that the B34S LS2 and GMM coefficients match the Stata estimates exactly. To further illustrate the various Hausman test values, SAS was also employed.

Output from Stata

  12.0   Statistics/Data Analysis

Notes:
      1.  Stata running in batch mode

. do stata.do

. * File built by B34S  on  5/ 8/11 at 19:55:28
. run statdata.do
. * uncomment if do not use /e
. * log using stata.log, text
. global xlist s expr tenure rns smsa iyear_67 ///
>        iyear_68 iyear_69 iyear_70 iyear_71 iyear_73
. global xlist2 expr tenure rns smsa iyear_67 ///
>        iyear_68 iyear_69 iyear_70 iyear_71 iyear_73
. ivregress 2sls lw $xlist (iq=med kww age mrt)

Instrumental variables (2SLS) regression          Number of obs   =        758
                                                  Wald chi2(12)   =     560.57
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.4255
                                                  Root MSE        =     .32491

------------------------------------------------------------------------------
          lw |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          iq |   .0001747   .0039035     0.04   0.964     -.007476    .0078253
           s |   .0691759   .0129366     5.35   0.000     .0438206    .0945312
        expr |    .029866   .0066393     4.50   0.000     .0168533    .0428788
      tenure |   .0432738   .0076271     5.67   0.000     .0283249    .0582226
         rns |  -.1035897    .029481    -3.51   0.000    -.1613715   -.0458079
        smsa |   .1351148   .0266573     5.07   0.000     .0828674    .1873623
    iyear_67 |   -.052598   .0476924    -1.10   0.270    -.1460734    .0408774
    iyear_68 |   .0794686   .0447194     1.78   0.076    -.0081797    .1671169
    iyear_69 |   .2108962   .0439336     4.80   0.000     .1247878    .2970045
    iyear_70 |   .2386338   .0509733     4.68   0.000     .1387281    .3385396
    iyear_71 |   .2284609   .0437436     5.22   0.000     .1427251    .3141967
    iyear_73 |   .3258944   .0407181     8.00   0.000     .2460884    .4057004
       _cons |    4.39955   .2685443    16.38   0.000     3.873213    4.925887
------------------------------------------------------------------------------
Instrumented:  iq
Instruments:   s expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70
               iyear_71 iyear_73 med kww age mrt

. estat endogenous

  Tests of endogeneity
  Ho: variables are exogenous

  Durbin (score) chi2(1)          =  .457658  (p = 0.4987)
  Wu-Hausman F(1,744)             =  .449477  (p = 0.5028)

. estat overid

  Tests of overidentifying restrictions:

  Sargan (score) chi2(3) =  87.6552  (p = 0.0000)
  Basmann chi2(3)        =   97.025  (p = 0.0000)

. ivreg2 lw $xlist (iq=med kww age mrt), gmm2 robust

2-Step GMM estimation
---------------------

Estimates efficient for arbitrary heteroskedasticity
Statistics robust to heteroskedasticity

                                                      Number of obs =      758
                                                      F( 12,   745) =    49.67
                                                      Prob > F      =   0.0000
Total (centered) SS     =  139.2861498                Centered R2   =   0.4166
Total (uncentered) SS   =  24652.24662                Uncentered R2 =   0.9967
Residual SS             =  81.26217887                Root MSE      =    .3274

------------------------------------------------------------------------------
             |               Robust
          lw |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          iq |  -.0014014   .0041131    -0.34   0.733     -.009463    .0066602
           s |   .0768355   .0131859     5.83   0.000     .0509915    .1026794
        expr |   .0312339   .0066931     4.67   0.000     .0181157    .0443522
      tenure |   .0489998   .0073437     6.67   0.000     .0346064    .0633931
         rns |  -.1006811   .0295887    -3.40   0.001    -.1586738   -.0426884
        smsa |   .1335973   .0263245     5.08   0.000     .0820021    .1851925
    iyear_67 |  -.0210135   .0455433    -0.46   0.645    -.1102768    .0682498
    iyear_68 |   .0890993    .042702     2.09   0.037     .0054049    .1727937
    iyear_69 |   .2072484   .0407995     5.08   0.000     .1272828     .287214
    iyear_70 |   .2338308   .0528512     4.42   0.000     .1302445    .3374172
    iyear_71 |   .2345525   .0425661     5.51   0.000     .1511244    .3179805
    iyear_73 |   .3360267   .0404103     8.32   0.000     .2568239    .4152295
       _cons |   4.436784   .2899504    15.30   0.000     3.868492    5.005077
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):             41.537
                                                   Chi-sq(4) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):               13.786
                         (Kleibergen-Paap rk Wald F statistic):         12.167
Stock-Yogo weak ID test critical values:  5% maximal IV relative bias    16.85
                                         10% maximal IV relative bias    10.27
                                         20% maximal IV relative bias     6.71
                                         30% maximal IV relative bias     5.34
                                         10% maximal IV size             24.58
                                         15% maximal IV size             13.96
                                         20% maximal IV size             10.26
                                         25% maximal IV size              8.31
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):        74.165
                                                   Chi-sq(3) P-val =    0.0000
------------------------------------------------------------------------------
Instrumented:         iq
Included instruments: s expr tenure rns smsa iyear_67 iyear_68 iyear_69
                      iyear_70 iyear_71 iyear_73
Excluded instruments: med kww age mrt
------------------------------------------------------------------------------

. * s is now endogenous
. ivregress 2sls lw $xlist2 (s iq=med kww age mrt)

Instrumental variables (2SLS) regression          Number of obs   =        758
                                                  Wald chi2(12)   =     459.55
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.2280
                                                  Root MSE        =     .37665

------------------------------------------------------------------------------
          lw |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           s |   .1724253   .0207381     8.31   0.000     .1317794    .2130712
          iq |  -.0090988   .0047044    -1.93   0.053    -.0183193    .0001216
        expr |   .0492895   .0081546     6.04   0.000     .0333068    .0652722
      tenure |   .0422171   .0088429     4.77   0.000     .0248854    .0595488
         rns |  -.1017935   .0341765    -2.98   0.003    -.1687781   -.0348088
        smsa |   .1261109   .0309275     4.08   0.000     .0654942    .1867277
    iyear_67 |  -.0596171   .0552955    -1.08   0.281    -.1679942      .04876
    iyear_68 |   .0486796   .0520161     0.94   0.349    -.0532701    .1506292
    iyear_69 |   .1528176    .051563     2.96   0.003      .051756    .2538792
    iyear_70 |   .1744361   .0597576     2.92   0.004     .0573133    .2915588
    iyear_71 |    .091666    .054144     1.69   0.090    -.0144543    .1977863
    iyear_73 |   .0932398   .0571819     1.63   0.103    -.0188347    .2053142
       _cons |    4.03351   .3154215    12.79   0.000     3.415295    4.651725
------------------------------------------------------------------------------
Instrumented:  s iq
Instruments:   expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70
               iyear_71 iyear_73 med kww age mrt
. estat endogenous

  Tests of endogeneity
  Ho: variables are exogenous

  Durbin (score) chi2(2)          =  70.8497  (p = 0.0000)
  Wu-Hausman F(2,743)             =  38.3041  (p = 0.0000)

. estat overid

  Tests of overidentifying restrictions:

  Sargan (score) chi2(2) =  13.2683  (p = 0.0013)
  Basmann chi2(2)        =  13.2375  (p = 0.0013)

. ivreg2 lw $xlist2 (s iq=med kww age mrt), gmm2 robust

2-Step GMM estimation
---------------------

Estimates efficient for arbitrary heteroskedasticity
Statistics robust to heteroskedasticity

                                                      Number of obs =      758
                                                      F( 12,   745) =    41.98
                                                      Prob > F      =   0.0000
Total (centered) SS     =  139.2861498                Centered R2   =   0.2168
Total (uncentered) SS   =  24652.24662                Uncentered R2 =   0.9956
Residual SS             =  109.0846511                Root MSE      =    .3794

------------------------------------------------------------------------------
             |               Robust
          lw |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           s |   .1757958   .0206766     8.50   0.000     .1352703    .2163212
          iq |  -.0092862   .0048824    -1.90   0.057    -.0188555    .0002832
        expr |   .0502828   .0080438     6.25   0.000     .0345171    .0660484
      tenure |   .0425214   .0094549     4.50   0.000     .0239901    .0610526
         rns |  -.1040931   .0335239    -3.11   0.002    -.1697986   -.0383875
        smsa |   .1247512   .0307747     4.05   0.000     .0644338    .1850686
    iyear_67 |  -.0530432   .0514609    -1.03   0.303    -.1539047    .0478184
    iyear_68 |   .0459546   .0495735     0.93   0.354    -.0512077    .1431169
    iyear_69 |   .1554801   .0476311     3.26   0.001     .0621249    .2488352
    iyear_70 |   .1669875   .0610006     2.74   0.006     .0474285    .2865464
    iyear_71 |   .0846485   .0554035     1.53   0.127    -.0239404    .1932373
    iyear_73 |   .0996068   .0607034     1.64   0.101    -.0193696    .2185833
       _cons |   4.003924   .3348423    11.96   0.000     3.347645    4.660203
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):             40.927
                                                   Chi-sq(3) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):               12.552
                         (Kleibergen-Paap rk Wald F statistic):         11.461
Stock-Yogo weak ID test critical values:  5% maximal IV relative bias    11.04
                                         10% maximal IV relative bias     7.56
                                         20% maximal IV relative bias     5.57
                                         30% maximal IV relative bias     4.73
                                         10% maximal IV size             16.87
                                         15% maximal IV size              9.93
                                         20% maximal IV size              7.54
                                         25% maximal IV size              6.28
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):        11.601
                                                   Chi-sq(2) P-val =    0.0030
------------------------------------------------------------------------------
Instrumented:         s iq
Included instruments: expr tenure rns smsa iyear_67 iyear_68 iyear_69
                      iyear_70 iyear_71 iyear_73
Excluded instruments: med kww age mrt
------------------------------------------------------------------------------

.
end of do-file

For the model with only IQ endogenous, the SAS results match B34S and Stata exactly, and the SAS Hausman value of .45 also matches the B34S value. Note that Stata and SAS report p-values, so a value near 1.00 implies no significance; B34S instead reports the significance as 1 minus the p-value, so in B34S a value near 1.00 implies significance.

Output from SAS
The MODEL Procedure

Nonlinear 2SLS Summary of Residual Errors

              DF      DF
Equation   Model   Error       SSE      MSE   Root MSE   R-Square   Adj R-Sq   Label
lw            13     745   80.0182   0.1074     0.3277     0.4255     0.4163   log wage

Nonlinear 2SLS Parameter Estimates

                              Approx                  Approx
Parameter      Estimate      Std Err    t Value     Pr > |t|
ciq            0.000175      0.00394       0.04       0.9646
cs             0.069176       0.0130       5.30       <.0001
cexpr          0.029866      0.00670       4.46       <.0001
ctenture       0.043274      0.00769       5.62       <.0001
crns           -0.10359       0.0297      -3.48       0.0005
csmsa          0.135115       0.0269       5.02       <.0001
ciyear_67       -0.0526       0.0481      -1.09       0.2746
ciyear_68      0.079469       0.0451       1.76       0.0785
ciyear_69      0.210896       0.0443       4.76       <.0001
ciyear_70      0.238634       0.0514       4.64       <.0001
ciyear_71      0.228461       0.0441       5.18       <.0001
ciyear_73      0.325894       0.0411       7.93       <.0001
interc          4.39955       0.2709      16.24       <.0001

Number of Observations        Statistics for System
Used            758           Objective        0.0122
Missing           0           Objective*N      9.2533

              Hausman's Specification Test Results
Efficient     Consistent
under H0      under H1        DF     Statistic     Pr > ChiSq
OLS           2SLS            13          0.45         1.0000

The SAS results for the model in which both IQ and S are endogenous also match Stata and B34S. The SAS Hausman statistic of 45.62 matches the B34S value of 45.6187 for “All coef. tested with Full (small) Covar. Matrix” but does not match the Stata Durbin value of 70.8497, which appears to be a test on the two endogenous coefficients. Note that the B34S Hausman M statistic values for testing the two endogenous coefficients were 46.6688 and 45.6187, depending on whether the large-sample or small-sample covariance matrix is used.
The exact intermediate values calculated by B34S are listed in the B34S output above to facilitate validation of the calculation.

Nonlinear 2SLS Parameter Estimates

                              Approx                  Approx
Parameter      Estimate      Std Err    t Value     Pr > |t|
ciq             -0.0091      0.00475      -1.92       0.0556
cs             0.172425       0.0209       8.24       <.0001
cexpr          0.049289      0.00823       5.99       <.0001
ctenture       0.042217      0.00892       4.73       <.0001
crns           -0.10179       0.0345      -2.95       0.0032
csmsa          0.126111       0.0312       4.04       <.0001
ciyear_67      -0.05962       0.0558      -1.07       0.2855
ciyear_68       0.04868       0.0525       0.93       0.3538
ciyear_69      0.152818       0.0520       2.94       0.0034
ciyear_70      0.174436       0.0603       2.89       0.0039
ciyear_71      0.091666       0.0546       1.68       0.0937
ciyear_73       0.09324       0.0577       1.62       0.1064
interc          4.03351       0.3182      12.68       <.0001

Number of Observations        Statistics for System
Used            758           Objective      0.002483
Missing           0           Objective*N      1.8823

              Hausman's Specification Test Results
Efficient     Consistent
under H0      under H1        DF     Statistic     Pr > ChiSq
OLS           2SLS            13         45.62         <.0001

B34S normal exit on Date (D:M:Y)  5/ 8/11 at Time (H:M:S) 19:55:40

It is very important to use the Hausman test together with the other IV tests to determine whether IV techniques should be used in place of OLS; the Hausman test should not be used on its own. If the instruments are poor, the endogenous variable coefficients will most likely differ from their OLS values, and the Hausman test may falsely indicate that IV is appropriate. If the Sargan test, or its GMM equivalent the Hansen J test, is significant, this casts doubt on the suitability of the IV model. A significant Anderson test suggests correlation between the instruments and the endogenous variables, which, other things equal, supports using an IV technique. A significant Basmann test, on the other hand, indicates correlation between the instruments and the error term; in other words, the instruments are themselves not exogenous.
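Returning to the two-coefficient (IQ and S) Hausman test, the intermediate values B34S prints (D, VARCOV1 and IVCOVAR) are enough to recompute the statistic as M = D'(IVCOVAR - VARCOV1)^-1 D. The following is a minimal Python sketch, not the B34S implementation: the function names are hypothetical, the inputs are the rounded printed values, and an ordinary 2 by 2 inverse stands in for the generalized inverse B34S uses, which gives the same answer when the covariance difference is nonsingular (the printed rcond is about 0.014). As before, the B34S "Significance" is taken to be the chi-square CDF value.

```python
import math


def hausman_m(d, v_iv, v_ols):
    """Hausman (1978) contrast M = d' (V_IV - V_OLS)^-1 d for two coefficients.

    d     -- coefficient differences (IV minus OLS)
    v_iv  -- 2x2 IV covariance matrix
    v_ols -- 2x2 OLS covariance matrix
    Returns (M, inverse of the covariance difference).
    """
    # Elements of the symmetric difference matrix [[a, b], [b, c]].
    a = v_iv[0][0] - v_ols[0][0]
    b = v_iv[0][1] - v_ols[0][1]
    c = v_iv[1][1] - v_ols[1][1]
    det = a * c - b * b
    if a <= 0 or det <= 0:
        # Not positive definite: a generalized inverse would be needed instead.
        raise ValueError("V_IV - V_OLS is not positive definite")
    inv = [[c / det, -b / det], [-b / det, a / det]]
    m = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    return m, inv


def chi2_cdf_2df(x):
    # The chi-square CDF with 2 degrees of freedom is 1 - exp(-x / 2).
    return 1.0 - math.exp(-x / 2.0)


# Rounded values as printed by B34S for the IQ and S coefficients.
D = [-0.118110e-01, 0.110471]
VARCOV1 = [[0.106381e-05, -0.302739e-05],
           [-0.302739e-05, 0.529777e-04]]
IVCOVAR_LARGE = [[0.221314e-04, -0.766991e-04],
                 [-0.766991e-04, 0.430068e-03]]
IVCOVAR_SMALL = [[0.225176e-04, -0.780375e-04],
                 [-0.780375e-04, 0.437572e-03]]

for label, v_iv in (("large", IVCOVAR_LARGE), ("small", IVCOVAR_SMALL)):
    m, inv = hausman_m(D, v_iv, VARCOV1)
    print(f"{label} sample: M = {m:.4f}, INVDIF[0][0] = {inv[0][0]:.0f}, "
          f"CDF = {chi2_cdf_2df(m):.10f}")
```

Because the determinant of the covariance difference is a small difference of similar numbers, the result is sensitive to rounding; even so, the printed six-digit inputs give M of approximately 46.67 (large sample) and 45.62 (small sample), matching the B34S values 46.6688 and 45.6187, and the first element of the inverse comes out close to the printed 149826.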
4.8 Conclusion

The simeq command should be used either when there are endogenous variables on the right-hand side of a regression model or when the seemingly unrelated regression model is desired. In the former case, OLS estimates will be biased. Jennings (1973, 1980), the original developer of the simeq code, made a major contribution by developing fast and accurate code designed to alert the user to problems in the structure of the model. These checks include rank tests on all the key matrices as well as rank tests on the matrix of exogenous variables in the system. The matrix command was used to illustrate the calculation of OLS, LIML, 2SLS, 3SLS and FIML models using more traditional equations than those used by Jennings. SAS and Rats code was shown and the results were compared to the B34S program output. Using the matrix command, LS2 (same as 2SLS) and GMM routines, together with a number of diagnostic tests, were shown, and the results were compared to Stata and Rats using an important dataset studied by Griliches (1975) and Baum (2006).