Revised Chapter 4 in Specifying and Diagnostically Testing Econometric Models (Edition 3) © by Houston H. Stokes, 17 October 2010. All rights reserved. Preliminary Draft.

Chapter 4 Simultaneous Equations Systems ................................. 1
4.0 Introduction ......................................................... 1
4.1 Estimation of Structural Models ...................................... 2
Table 4.1 Matlab Program to obtain Constrained Reduced Form .............. 3
Table 4.2 Edited output from running Matlab Program in Table 4.1 ......... 5
4.2 Estimation of OLS, LIML, LS2, LS3, and ILS3 .......................... 9
4.3 Examples ............................................................ 16
Table 4.3 Setup for ols, liml, ls2, ls3, and ils3 commands .............. 17
Table 4.4 SAS Implementation of the Kmenta Model ........................ 25
Table 4.5 RATS Implementation of the Kmenta Model ....................... 27
4.4 Exactly identified systems .......................................... 34
Table 4.6 Exactly Identified Kmenta Problem ............................. 34
4.5 Analysis of OLS, 2SLS and 3SLS using Matrix Command ................. 38
Table 4.7 Matrix Command Implementation of OLS, 2SLS, 3SLS and FIML ..... 39
4.6 LS2 and GMM Models and Specification tests .......................... 52
Table 4.8 LS2 and General Method of Moments estimation routines ......... 54
Table 4.9 Estimation of LS2 and GMM Models using B34S, Stata and Rats ... 60
4.8 Conclusion .......................................................... 72
Simultaneous Equations Systems

4.0 Introduction

In section 4.1, after first discussing the basic simultaneous equations model, the constrained reduced form, the unconstrained reduced form and the final form are introduced. The MATLAB symbolic capability is used to illustrate how the constrained reduced form relates to the structural parameters of the model. In section 4.2 the theory behind the QR approach to simultaneous equations modeling developed by Jennings (1980) is discussed in some detail. The simeq command performs estimation of systems of equations by the methods of OLS, limited information maximum likelihood (LIML), two-stage least squares (2SLS), three-stage least squares (3SLS), iterative three-stage least squares (I3SLS), seemingly unrelated regression (SUR) and full information maximum likelihood (FIML), using code developed by Les Jennings (1973, 1980). The Jennings code is unique in that it implements the QR approach to estimate systems of equations, which results in both substantial savings in time and increased accuracy.1 The estimation methods are well known and covered in detail in such books as Johnston (1963, 1972, 1984), Kmenta (1971, 1986), and Pindyck and Rubinfeld (1976, 1981, 1990) and will only be sketched here. What will be discussed are the contributions of Jennings and others. The discussion of these techniques follows closely material in Jennings (1980) and Strang (1976).

1 The B34S qr command is designed to provide up to 16 digits of accuracy. This command, which also allows estimation of the principal component (PC) regression, uses LINPACK code and is documented in Chapter 10. The qr command is distinct from the code in the simeq command. The matrix command contains extensive and programmable QR capability. For further examples see Chapters 10 and 16 and sections of Chapter 2.
Section 4.3 illustrates estimation of variants of the Kmenta model using RATS, B34S and SAS, while section 4.4 illustrates an exactly identified model. Section 4.5 shows how OLS, LIML, 3SLS and FIML can be estimated using the matrix command. The code there is for illustration and benchmarking purposes, not production use. Section 4.6 shows the matrix command subroutines LS2 and GAMEST that respectively estimate single-equation 2SLS and GMM models. This code is 100% production.

4.1 Estimation of Structural Models

Assume a system of G equations with K exogenous variables2

b_{11}y_{1i} + \cdots + b_{1G}y_{Gi} + \gamma_{11}x_{1i} + \cdots + \gamma_{1K}x_{Ki} = u_{1i}
b_{21}y_{1i} + \cdots + b_{2G}y_{Gi} + \gamma_{21}x_{1i} + \cdots + \gamma_{2K}x_{Ki} = u_{2i}
\cdots
b_{G1}y_{1i} + \cdots + b_{GG}y_{Gi} + \gamma_{G1}x_{1i} + \cdots + \gamma_{GK}x_{Ki} = u_{Gi}      (4.1-1)

where x_{ki} is the kth exogenous variable for the ith period, y_{ji} is the jth endogenous variable for the ith period, and u_{ji} is the jth equation error term for the ith period. If we define

B = \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1G} \\ b_{21} & b_{22} & \cdots & b_{2G} \\ \cdots & & & \\ b_{G1} & b_{G2} & \cdots & b_{GG} \end{bmatrix}, \quad
\Gamma = \begin{bmatrix} \gamma_{11} & \gamma_{12} & \cdots & \gamma_{1K} \\ \gamma_{21} & \gamma_{22} & \cdots & \gamma_{2K} \\ \cdots & & & \\ \gamma_{G1} & \gamma_{G2} & \cdots & \gamma_{GK} \end{bmatrix},

y_i = [y_{1i}, \ldots, y_{Gi}]', \quad x_i = [x_{1i}, \ldots, x_{Ki}]', \quad u_i = [u_{1i}, \ldots, u_{Gi}]',

equation (4.1-1) can be written as

By_i + \Gamma x_i = u_i      (4.1-2)

If all observations on y_i, x_i and u_i are included, then

Y = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1N} \\ \cdots & & & \\ y_{G1} & y_{G2} & \cdots & y_{GN} \end{bmatrix}, \quad
X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1N} \\ \cdots & & & \\ x_{K1} & x_{K2} & \cdots & x_{KN} \end{bmatrix}, \quad
U = \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1N} \\ \cdots & & & \\ u_{G1} & u_{G2} & \cdots & u_{GN} \end{bmatrix}

2 For further discussion see Pindyck and Rubinfeld (1981, 339-349).

and equation (4.1-2) can be written as

BY + \Gamma X = U      (4.1-3)

From equation (4.1-3), the constrained reduced form can be calculated as

Y = -B^{-1}\Gamma X + B^{-1}U = \Pi X + V      (4.1-4)

If \Pi is estimated directly with OLS, it is called the unconstrained reduced form. The B34S simeq command estimates B, \Gamma using either OLS, 2SLS, LIML, 3SLS, I3SLS, or FIML.
For each estimated pair B, \Gamma, the associated reduced form coefficient matrix \Pi can be optionally calculated.3 If B and \Gamma are estimated by OLS, the coefficients will be biased since the key OLS assumption that the right-hand-side variables are orthogonal to the error term is violated. Model (4.1-3) can be normalized such that the coefficients b_{ij} = 1 for i = j. The necessary (order) condition for identification of each equation is that the number of included endogenous variables minus 1 be less than or equal to the number of excluded exogenous variables. The reason for this restriction is that otherwise it would not be possible to solve for the structural coefficients uniquely in terms of the other parameters of the model. A short, self-documented MATLAB example based on Greene (2003) illustrates this problem.

Table 4.1 Matlab Program to obtain Constrained Reduced Form

% Greene (2003) Chapter 15 Problem # 1
% y1= g1*y2 + b11*x1 + b21*x2 + b31*x3
% y2= g2*y1 + b12*x1 + b22*x2 + b32*x3
% We know BY+GX=E
syms g1 g2 b11 b21 b31 b12 b22 b32
B =[  1, -g1;
    -g2,   1]
G =[-b11,-b21,-b31;
    -b12,-b22,-b32]
a= -1*inv(B)*G
p11=a(1,1)
p12=a(1,2)
p13=a(1,3)
p21=a(2,1)
p22=a(2,2)
p23=a(2,3)
% Hopeless. Have 6 equations BUT more than 6 variables
' Now impose restrictions'

3 If the model is exactly identified, the constrained reduced form can be directly estimated by OLS or obtained using (4.1-4) from LIML, 2SLS or 3SLS estimates. This is shown empirically in section 4.5.
' b21=0 b32=0'
G =[-b11,   0,-b31;
    -b12,-b22,   0]
B,G
a= -1*inv(B)*G
' Here 6 equations and six unknowns g1 g2 b11 b31 b12 b22 '
p11=a(1,1)
p12=a(1,2)
p13=a(1,3)
p21=a(2,1)
p22=a(2,2)
p23=a(2,3)

Table 4.2 Edited output from running Matlab Program in Table 4.1

p11 = -1/(-1+g1*g2)*b11+g1/(-1+g1*g2)*b12
p12 = -1/(-1+g1*g2)*b21+g1/(-1+g1*g2)*b22
p13 = -1/(-1+g1*g2)*b31+g1/(-1+g1*g2)*b32
p21 = -g2/(-1+g1*g2)*b11+1/(-1+g1*g2)*b12
p22 = -g2/(-1+g1*g2)*b21+1/(-1+g1*g2)*b22
p23 = -g2/(-1+g1*g2)*b31+1/(-1+g1*g2)*b32

Here 6 equations and six unknowns g1 g2 b11 b31 b12 b22

p11 = -1/(-1+g1*g2)*b11+g1/(-1+g1*g2)*b12
p12 = -g1/(-1+g1*g2)*b22
p13 = -1/(-1+g1*g2)*b31
p21 = -g2/(-1+g1*g2)*b11+1/(-1+g1*g2)*b12
p22 = -1/(-1+g1*g2)*b22
p23 = -g2/(-1+g1*g2)*b31

If the excluded exogenous variables of the ith equation are not significant in any other equation, then the ith equation will not be identified in practice, even if it is correctly specified.

We note that E(u_i | x_i) = 0 and E(u_i u_i') = \Sigma, where u_i = [u_{1i}, \ldots, u_{Gi}]' and x_i = [x_{1i}, \ldots, x_{Ki}]'. The reduced form disturbance v_i = B^{-1}u_i is not correlated with the exogenous variables, or E(v_i | x_i) = B^{-1} \cdot 0 = 0, and

\Omega = E(v_i v_i' | x_i) = E[B^{-1}u_i u_i'(B')^{-1}] = B^{-1}\Sigma(B')^{-1}

from which we deduce that

\Sigma = B \Omega B'      (4.1-5)

In summary, \Gamma = G by K exogenous variable coefficient matrix, B = G by G nonsingular endogenous variable coefficient matrix, \Sigma = G by G symmetric positive definite structural covariance matrix, \Pi = G by K constrained reduced form coefficient matrix and \Omega = G by G reduced form covariance matrix. The importance of this is that since \Pi and \Omega can be estimated consistently by OLS, following Greene (2003, 387), if B were known, we could obtain \Gamma from (4.1-4) and \Sigma from (4.1-5). If there are no endogenous variables on the right, yet a number of equations are estimated where there is covariance in the error term across equations, the seemingly unrelated regression model (SUR) can be estimated as

\hat{\beta} = (X'\hat{\Sigma}^{-1}X)^{-1}(X'\hat{\Sigma}^{-1}Y)
      (4.1-6)

Elements \hat{\sigma}_{ij} of \hat{\Sigma} can be estimated if OLS is used on each of the G equations and

\hat{\sigma}_{ii} = \hat{u}_i'\hat{u}_i/(T - K_i), \qquad \hat{\sigma}_{ij} = \hat{u}_i'\hat{u}_j / \sqrt{(T - K_i)(T - K_j)}      (4.1-7)

For more detail see Greene (2003) or other advanced econometrics books. Pindyck and Rubinfeld (1976, 1981, 1990) provide a particularly good treatment that is consistent with the notation in this chapter. From (4.1-4), Theil (1971, 463-468) suggests calculating the final form. First partition \Pi so that the predetermined variables separate into an intercept, lagged endogenous, current exogenous and lagged exogenous variables, where identities are used to express lags > 1:

\Pi = [d_0, D_1, D_2, D_3], \qquad y_i = d_0 + D_1 y_{i-1} + D_2 x_i + D_3 x_{i-1} + v_i^*      (4.1-8)

Theil (1971) shows that (4.1-8) can be expressed as

y_i = (I - D_1)^{-1} d_0 + D_2 x_i + \sum_{t=1}^{\infty} D_1^{t-1}(D_1 D_2 + D_3)x_{i-t} + \sum_{t=0}^{\infty} D_1^{t} v_{i-t}^*      (4.1-9)

where D_2 is the impact multiplier. If there are no lagged endogenous variables in the system, D_1 = 0 and the constrained reduced form and the final form are the same; in this case \Pi = [d_0, D_2, D_3]. The interim multipliers are D_2, (D_1D_2 + D_3), D_1(D_1D_2 + D_3), \ldots, D_1^{t}(D_1D_2 + D_3), which, when summed, form the total multiplier G^*:

G^* = D_2 + (I + D_1 + D_1^2 + \cdots)(D_1D_2 + D_3)
    = D_2 + (I - D_1)^{-1}(D_1D_2 + D_3)
    = (I - D_1)^{-1}[(I - D_1)D_2 + D_1D_2 + D_3]
    = (I - D_1)^{-1}(D_2 + D_3)      (4.1-10)

Goldberger (1959) and Kmenta (1971, 592) provide added detail. The importance of (4.1-10) is that it shows the effect on all endogenous variables of a change in any exogenous variable after all effects have had a chance to work themselves out in the system.

There are several common mistakes made in setting up simultaneous equations systems. These include the following:

- Not fully checking for multicollinearity in the equation system.
- Attempting to interpret the estimated B and \Gamma coefficients as partial derivatives, rather than looking at the reduced form G by K matrix \Pi.
- Not effectively testing whether excluded exogenous variables are significant in at least one other equation in the system.
- Not building into the solution procedure provisions for taking into account the number of significant digits in the data.

The simeq code has unique design characteristics that address some of these problems. In the next sections, we briefly outline some of these features. Assume for a moment that X is a T by K matrix of observations on the exogenous variables, Y is a T by 1 vector of observations on the endogenous variable, and \beta is a K element vector of OLS coefficients; then the OLS solution for the estimated \beta from equation (2.1-8) is \beta = (X'X)^{-1}X'Y. The problem with this approach is that some accuracy is lost by forming the matrix X'X. The QR approach4 proceeds by operating directly on the matrix X to express it in terms of the upper triangular K by K matrix R and the T by T orthogonal matrix Q. X is factored as

X = Q \begin{bmatrix} R \\ 0 \end{bmatrix} = [Q_1 | Q_2] \begin{bmatrix} R \\ 0 \end{bmatrix} = Q_1 R      (4.1-11)

Since Q'Q = I, then

(X'X)^{-1}X'Y = (R'Q_1'Q_1R)^{-1}R'Q_1'Y = (R'R)^{-1}R'Q_1'Y = R^{-1}Q_1'Y      (4.1-12)

4 A good discussion of the QR factorization is contained in Strang (1976). Other references include Jennings (1980) and Dongarra, Bunch, Moler, and Stewart (1979).

Following Jennings (1980), we define the condition number of matrix X, C(X), as the square root of the ratio of the largest eigenvalue of X'X, E_{max}(X'X), to the smallest eigenvalue of X'X, E_{min}(X'X):

C(X) = \sqrt{E_{max}(X'X)/E_{min}(X'X)}      (4.1-13)

If ||X|| = \sqrt{E_{max}(X'X)}, and X is square and nonsingular, then

C(X) = ||X|| \; ||X^{-1}||      (4.1-14)

Throughout B34S, 1/C(X) is checked to test for rank problems. Jennings (1980) notes that C(X) can also be used as a measure of relative error. If \mu is a measure of round-off error, then \mu[C(X)]^2 is the bound for the relative error of the calculated solution. In an IBM 370 running double precision, \mu is approximately .1E-16.
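The accuracy argument above can be illustrated numerically. The short sketch below (not simeq code; the data and names are invented for illustration) solves a small least squares problem both through the normal equations and through the QR route of (4.1-12), and computes C(X) as defined in (4.1-13):

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 20, 3
X = rng.normal(size=(T, K))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.01 * rng.normal(size=T)

# Normal-equations solution: (X'X)^{-1} X'y -- forming X'X squares C(X)
b_ne = np.linalg.solve(X.T @ X, X.T @ y)

# QR solution: X = Q1 R, beta = R^{-1} Q1'y -- operates on X directly
Q1, R = np.linalg.qr(X)                  # "economy" QR: Q1 is T x K
b_qr = np.linalg.solve(R, Q1.T @ y)

# Condition number C(X) = sqrt(Emax(X'X) / Emin(X'X)), eq. (4.1-13)
ev = np.linalg.eigvalsh(X.T @ X)
c_x = float(np.sqrt(ev.max() / ev.min()))
```

On a well-conditioned X the two routes agree to many digits; as C(X) grows, the normal-equations route loses roughly twice as many digits as the QR route, which is the point of the Jennings design.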
If C(X) is > .1E+8 (i.e., 1/C(X) is < .1E-8), then \mu[C(X)]^2 \approx 1, meaning that no digits in the reported solution are significant. Jennings (1980) looks at the problem from another perspective. If matrix X has a round-off error of \tau X such that the actual X used is X + \tau X, then ||\tau X|| / ||X|| must be less than 1/C(X) for a solution to exist. If

||\tau X|| / ||X|| = 1/C(X)      (4.1-15)

then there exists a \tau X such that X + \tau X is singular.5 The user can inspect the estimate of the condition number and determine the degree of multicollinearity. Most programs only report problems when the matrix is singular; inspection of C(X) gives warning of the degree of the problem. The simeq command contains the IPR parameter option with which the user can inform the program of the number of significant digits in X. This information is used to terminate the iterative three-stage (I3SLS) iterations when the relative change in the solution is within what would be expected, given the number of significant digits in the data. Jennings (1980) notes that the relative error of the QR solution to the OLS problem given in equation (4.1-12) has the form

n_1 C(X) + n_2 [C(X)]^2 (||\hat{e}|| / ||\hat{\beta}||)      (4.1-16)

where n_1 and n_2 are of the order of machine precision and ||\hat{e}|| and ||\hat{\beta}|| are the lengths of the estimated residual and estimated coefficient vectors, respectively. (The length or L2NORM of a vector e_i is defined as \sqrt{\sum_i e_i^2}.) Equation (4.1-16) indicates that the closer the model fits (the smaller ||\hat{e}||), the smaller the relative error of the computed solution. An estimate of this relative error is made for the OLS, LIML and 2SLS estimators reported by simeq.

5 For more detail on techniques used in simeq to avoid numerical error in the calculations arising from differences in the means of the data, see Jennings (1980).

4.2 Estimation of OLS, LIML, LS2, LS3, and ILS3

For OLS estimation of a system of equations, simeq uses the QR approach discussed earlier.
If the reduced option is used, once the structural coefficients B and \Gamma in equation (4.1-3) are known, the constrained reduced form coefficients \Pi from equation (4.1-4) are displayed. If B and \Gamma are estimated using OLS, and all structural equations are exactly identified, then the constraints on \Pi imposed by the structural coefficients B and \Gamma are not binding, and \Pi could be estimated directly with OLS or indirectly via (4.1-4). However, if one or more of the equations in the structural equation system (4.1-2) is overidentified, \Pi must be estimated as -B^{-1}\Gamma. Although the reduced-form coefficients \Pi exist and may be calculated from any set of structural estimates B and \Gamma, in practice it is not desirable to report those derived from OLS estimation because, in the presence of endogenous variables on the right-hand side of an equation, the OLS assumption that the error term is orthogonal to the explanatory variables is violated. Since OLS imposes this constraint as a part of the estimation process, the resulting estimated B and \Gamma are biased. The reason that OLS is often used as a benchmark is that, from among the class of all linear estimators, OLS produces minimum variance. The loss in predictive power of LIML and 2SLS has to be weighed against the fact that OLS produces biased estimates. If reduced-form coefficients are desired, identities in the system must be entered. The number of identities plus the number of estimated equations must equal the number of endogenous variables in the model; the simeq command requires that the number of model sentences and identity sentences equal the number of variables listed in the endogenous sentence.

The 2SLS estimator first estimates all endogenous variables as a function of all exogenous variables. This is equivalent to estimating an unconstrained form of the reduced-form equation (4.1-4).
Next, in stage 2, the estimated values \hat{Y}_j of the endogenous variables on the right in the jth equation are used in place of the actual values Y_j to estimate equation (4.1-2). Since the estimated values of the endogenous variables on the right are only a function of exogenous variables, the theory suggests they can be assumed to be orthogonal to the population error, and OLS can be safely used for the second stage. In terms of our prior notation, the two-stage estimator for the first equation is

\begin{bmatrix} \hat{b}_1 \\ \hat{\gamma}_1 \end{bmatrix}
= \begin{bmatrix} \hat{Y}_1'\hat{Y}_1 & \hat{Y}_1'X_1 \\ X_1'\hat{Y}_1 & X_1'X_1 \end{bmatrix}^{-1}
  \begin{bmatrix} \hat{Y}_1'y_1 \\ X_1'y_1 \end{bmatrix}
= \{(\hat{Y}_1\,X_1)'(\hat{Y}_1\,X_1)\}^{-1}(\hat{Y}_1\,X_1)'y_1      (4.2-1)

where \hat{b}_1 = (\hat{b}_{11}, \ldots, \hat{b}_{1g})', \hat{\gamma}_1 = (\hat{\gamma}_{11}, \ldots, \hat{\gamma}_{1k})', \hat{Y}_1 is the matrix of predicted endogenous variables on the right in the first equation, and X_1 is the matrix of exogenous variables in the first equation. For further details on this traditional estimation approach, see Pindyck and Rubinfeld (1981, 345-347). The QR approach used by Jennings (1980) involves estimating equation (4.2-1) as the solution of

Z_j'(XX^+)Z_j \delta_j = Z_j'(XX^+)y_j      (4.2-2)

for each j, where \delta_j' = \{(b_{11}, \ldots, b_{1g})', (\gamma_{11}, \ldots, \gamma_{1k})'\}, Z_j = [X_j | Y_j] and X^+ is the pseudoinverse6 of X. Z_j consists of the X and Y variables in the jth equation. XX^+ is not calculated directly but is expressed in terms of the QR factorization of X. By working directly on X, and not forming X'X, substantial accuracy is obtained. Jennings proceeds by writing

XX^+ = Q \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} Q'      (4.2-3)

where I_r is the r by r identity matrix and r is the rank of X. Using equation (4.2-3), equation (4.2-2) becomes

\hat{Z}_j' \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \hat{Z}_j \delta_j
= \hat{Z}_j' \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \hat{y}_j      (4.2-4)

where \hat{Z}_j = Q'Z_j and \hat{y}_j = Q'y_j. The 2SLS covariance matrix can be estimated as

6 If we define X^+ as the pseudoinverse of the T by K matrix X, then it can be shown (Strang 1976, 138, exercise 3.4.5) that the following four conditions hold: 1. XX^+X = X; 2. X^+XX^+ = X^+; 3. (XX^+)' = XX^+; and 4. (X^+X)' = X^+X.
The pseudoinverse can be obtained from the singular value decomposition or the QR factorization of X.

(||e_j||^2 / d_f)(Z_j'XX^+Z_j)^{-1}      (4.2-5)

where d_f is the degrees of freedom and ||e_j||^2 is the residual sum of squares (or the square of the L2NORM of the residual). There is substantial controversy in the literature about the appropriate value for d_f. Since the SEs of the estimated 2SLS coefficients are known only asymptotically, Theil (1971) suggests that d_f be set equal to T, the number of observations used to estimate the model. Others suggest that d_f be set to T - K, similar to what is used in OLS. If Theil's suggestion is used, the estimated SEs of the coefficients are smaller; the T-K option is more conservative. The simeq command produces both estimates of the coefficient standard errors to facilitate comparison with other programs and researcher preferences.

Two-stage least squares estimation of an equation with endogenous variables on the right, in contrast with OLS estimation, in theory produces consistent coefficient estimates at the cost of some loss of efficiency. If a large system is estimated, it is often impossible to use all exogenous variables in the system because of the loss of degrees of freedom. The usual practice is to select a subset of the exogenous variables. The greater the number of exogenous variables relative to the degrees of freedom, the closer the predicted Y variables on the right are to the raw Y variables on the right. In this situation, the 2SLS estimator's sum of squared residuals will approach the OLS estimator's sum of squared residuals, and the estimator will lose the large-sample advantages of 2SLS. Usual econometric practice is to use OLS and 2SLS and compare the results to see how sensitive the OLS results are to simultaneity problems.
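A compact numerical sketch of (4.2-2) may help fix ideas. The fragment below simulates one equation with a single endogenous regressor, forms the projection XX^+ explicitly for clarity (simeq instead works through the QR factorization of X, which is more accurate), and computes the 2SLS coefficients together with both degrees-of-freedom conventions for the residual variance discussed above. All data and variable names here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
# Hypothetical data: two exogenous instruments x1, x2; y2 is endogenous
x = rng.normal(size=(T, 2))
u = rng.normal(size=T)
y2 = x @ np.array([1.0, 1.0]) + 0.8 * u      # correlated with the error u
y1 = 0.5 * y2 + 2.0 + u                      # structural equation of interest

X = np.column_stack([np.ones(T), x])         # all exogenous variables in the system
Z = np.column_stack([np.ones(T), y2])        # regressors of the equation: [X1 | Y1]

# Projection on col(X): XX+ (simeq never forms this matrix explicitly)
P = X @ np.linalg.solve(X.T @ X, X.T)

# Solve Z'(XX+)Z delta = Z'(XX+)y, cf. eq. (4.2-2)
delta = np.linalg.solve(Z.T @ P @ Z, Z.T @ P @ y1)
resid = y1 - Z @ delta

# The two df conventions for the covariance scale in eq. (4.2-5)
s2_T  = resid @ resid / T                    # Theil's suggestion
s2_TK = resid @ resid / (T - Z.shape[1])     # the more conservative T-K choice
```

With a strong instrument set the estimate of the coefficient on y2 is close to the true 0.5, while OLS of y1 on y2 would be pulled away from it by the correlation between y2 and u.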
While 2SLS results are sensitive to the variable that is used to normalize the system, limited information maximum likelihood (LIML) estimation, which can be used in place of 2SLS, is not so sensitive. Kmenta (1971, 568-570) has a clear discussion, which is summarized below. The LIML estimator,7 which is hard to explain in simple terms, involves selecting values for b and \delta for each equation such that L is minimized, where L = SSE1 / SSE. We define SSE1 as the residual variance from regressing a weighted average of the y variables in the equation on all exogenous variables in the equation, while SSE is the residual variance from regressing the same weighted average of the y variables on all the exogenous variables in the system. Since SSE \le SSE1, L is bounded below by 1. The difficulty in LIML estimation is selecting the weights for combining the y variables in the equation. Assume equation 1 of (4.1-1) is

b_{11}y_{1i} + \cdots + b_{1G}y_{Gi} + \gamma_{11}x_{1i} + \cdots + \gamma_{1K}x_{Ki} = u_{1i}      (4.2-6)

Ignoring time subscripts, we can define

y_1^* = y_1 + [b_{12}y_2 + \cdots + b_{1G}y_G]      (4.2-7)

If we define Y_1^* = [y_1, \ldots, y_G] as the matrix of endogenous variables in equation 1 and we knew the vector B_1^{*\prime} = [1, b_{12}, \ldots, b_{1G}], we would know y_1^*, since y_1^* = Y_1^*B_1^*, and could regress y_1^* on all x variables on the right in that equation and call the residual variance SSE1, and next regress y_1^* on all x variables in the system and call the residual variance SSE. If we define X_1 as the matrix consisting of the columns of the x variables on the right in equation 1 and we knew B_1^*, then we could estimate \Gamma_1 = [\gamma_{11}, \ldots, \gamma_{1K}] as

\hat{\Gamma}_1 = -(X_1'X_1)^{-1}X_1'y_1^* = -(X_1'X_1)^{-1}X_1'Y_1^*B_1^*      (4.2-8)

However, we do not know B_1^*.

7 Kmenta (1971, 565-572) has one of the clearest descriptions. The discussion here complements that material.
If we define

W_1^* = Y_1^{*\prime}Y_1^* - (Y_1^{*\prime}X_1)(X_1'X_1)^{-1}X_1'Y_1^*      (4.2-9)

W_1 = Y_1^{*\prime}Y_1^* - (Y_1^{*\prime}X)(X'X)^{-1}X'Y_1^*      (4.2-10)

where X is the matrix of all x variables in the system, then L can be written as

L = [B_1^{*\prime}W_1^*B_1^*] / [B_1^{*\prime}W_1B_1^*]      (4.2-11)

Minimizing L implies that

(W_1^* - LW_1)B_1^* = 0      (4.2-12)

The LIML estimator uses eigenvalue analysis to select the vector B_1^* such that L is minimized. This calculation involves solving the system

det(W_1^* - LW_1) = 0      (4.2-13)

for the smallest root L, which we will call \lambda. This root can be substituted back into equation (4.2-12) to get B_1^* and into equation (4.2-8) to get \Gamma_1. Jennings shows that equation (4.2-13) can be rewritten as

det | Y_1^{*\prime}\{(I - X_1X_1^+) - L(I - XX^+)\}Y_1^* | = 0      (4.2-14)

Further factorizations lead to accuracy improvements and speed gains over the traditional methods of solution outlined in Johnston (1984), Kmenta (1971), and other books. Jennings (1973, 1980) briefly discusses tests made for computational accuracy, given the number of significant digits in the data, and various tests for nonunique solutions. One of the main objectives of the simeq code was to be able to inform the user if there were problems with identification, in theory and in practice. Since the LIML standard errors are known only asymptotically and are, in fact, equal to the 2SLS estimated standard errors, these are used for both the 2SLS and LIML estimators.

In the first stage of 2SLS, the unconstrained reduced form

Y = \Pi X + V      (4.2-15)

is estimated to obtain the \hat{Y} predicted variables. 2SLS, OLS, and LIML are all special cases of the Theil (1971) k class estimators.
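The mechanics of (4.2-9) through (4.2-13) can be sketched directly. The fragment below builds W_1^* and W_1 from simulated data (all names and data are invented for illustration) and finds the smallest root L = \lambda of det(W_1^* - LW_1) = 0 as the smallest eigenvalue of W_1^{-1}W_1^*. Because the columns of X_1 are a subset of the columns of X, SSE \le SSE1 and the root is bounded below by 1:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 100
X = np.column_stack([np.ones(T), rng.normal(size=(T, 3))])  # all exogenous variables
X1 = X[:, :2]                  # exogenous variables included in equation 1
Y1s = rng.normal(size=(T, 2))  # stand-in for the endogenous variables Y1*

def annihilator(A):
    """I - A A^+ : residual maker for the column space of A."""
    return np.eye(A.shape[0]) - A @ np.linalg.pinv(A)

W1s = Y1s.T @ annihilator(X1) @ Y1s   # cf. eq. (4.2-9)
W1  = Y1s.T @ annihilator(X)  @ Y1s   # cf. eq. (4.2-10)

# Smallest root of det(W1* - L W1) = 0 via the eigenvalues of inv(W1) W1*
roots = np.linalg.eigvals(np.linalg.solve(W1, W1s))
lam = float(np.min(roots.real))
```

The eigenvector of W_1^{-1}W_1^* attached to \lambda (normalized to a leading 1) plays the role of B_1^* in (4.2-12); production code would of course work through the factorizations Jennings describes rather than forming explicit inverses.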
The general formula for the k class estimator for the first equation (Kmenta 1971, 565) is

\begin{bmatrix} \hat{B}_1(k) \\ \hat{\Gamma}_1(k) \end{bmatrix}
= \begin{bmatrix} Y_1'Y_1 - k\hat{V}_1'\hat{V}_1 & Y_1'X_1 \\ X_1'Y_1 & X_1'X_1 \end{bmatrix}^{-1}
  \begin{bmatrix} Y_1'y_1 - k\hat{V}_1'y_1 \\ X_1'y_1 \end{bmatrix}      (4.2-16)

where \hat{V}_1 is the matrix of predicted residuals from estimating all but the 1st y variable in equation (4.2-15), \hat{Y}_1 = Y_1 - \hat{V}_1, and X_1 contains the X variables on the right-hand side of the first equation. (4.2-16) follows directly from (4.2-1). If k = 0, equation (4.2-16) is the formula for OLS estimation of the first equation. If k = 1, equation (4.2-16) is the formula for 2SLS estimation of the first equation and can be transformed to equation (4.2-1). If k = \lambda, the minimum root of equation (4.2-13), equation (4.2-16) is the formula for the LIML estimator (Theil 1971, 504). Hence, OLS, 2SLS, and LIML are all members of the k class of estimators.

Three-stage least squares utilizes the covariance of the residuals across equations from the estimated 2SLS model to improve the estimated coefficients B and \Gamma. If the model has only exogenous variables on the right-hand side, the OLS estimates can be used to calculate the covariance of the residuals across equations; the resulting estimator is the seemingly unrelated regression model (SUR). In this discussion, we will look at the 3SLS model only, since the SUR model is a special case. From (4.2-2) we rewrite the 2SLS estimator for the ith equation as

\delta_i = [Z_i'X(X'X)^{-1}X'Z_i]^{-1}Z_i'X(X'X)^{-1}X'y_i      (4.2-17)

which estimates the ith 2SLS equation

y_i = Z_i\delta_i + u_i      (4.2-18)

If we define8 (X'X)^{-1} = PP' and multiply equation (4.2-18) by P'X', we obtain

P'X'y_i = P'X'Z_i\delta_i + P'X'u_i      (4.2-19)

which can be written as

w_i = W_i\delta_i + \varepsilon_i      (4.2-20)

where w_i = P'X'y_i, W_i = P'X'Z_i, and \varepsilon_i = P'X'u_i. If all G 2SLS equations are written as

\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_G \end{bmatrix}
= \begin{bmatrix} W_1 & 0 & \cdots & 0 \\ 0 & W_2 & \cdots & 0 \\ & & \cdots & \\ 0 & 0 & \cdots & W_G \end{bmatrix}
  \begin{bmatrix} \delta_1 \\ \delta_2 \\ \vdots \\ \delta_G \end{bmatrix}
+ \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_G \end{bmatrix}      (4.2-21)

then the system can be written as w = W\delta + \varepsilon.

8 This discussion is based on material contained in Johnston (1984, 486).
      (4.2-22)

For each pair of equations i and j,

E[\varepsilon_i\varepsilon_j'] = E[P'X'u_iu_j'XP] = \sigma_{ij}P'X'XP = \sigma_{ij}I      (4.2-23)

while the covariance of the error term for the system becomes

\begin{bmatrix} \sigma_{11}I & \sigma_{12}I & \cdots & \sigma_{1G}I \\ \sigma_{21}I & \sigma_{22}I & \cdots & \sigma_{2G}I \\ & & \cdots & \\ \sigma_{G1}I & \sigma_{G2}I & \cdots & \sigma_{GG}I \end{bmatrix}      (4.2-24)

Equation (4.2-24) indicates that for each equation there is no heteroskedasticity, but that there is contemporaneous correlation of the residuals across equations. Equation (4.2-24) can be estimated from the 2SLS residuals of each equation for 3SLS, or from the OLS residuals of each equation for SUR models. Let

\hat{V} = \hat{\Sigma} \otimes I      (4.2-25)

be such an estimate. The 3SLS estimator of the system \delta, where \delta' = [B\;\Gamma], becomes

\hat{\delta} = (W'\hat{V}^{-1}W)^{-1}W'\hat{V}^{-1}w      (4.2-26)

Jennings (1980) uses two alternative approaches to solve (4.2-26), depending on whether the covariance of the 3SLS estimator

Var(\hat{\delta}) = (W'\hat{V}^{-1}W)^{-1}      (4.2-27)

is required or not. In the former case, an orthogonal factorization method is used. In the latter case, to save space, the conjugate gradient iterative algorithm (Lanczos reduction) suggested by Paige and Saunders (1973) is used. This latter approach may or may not converge. For added detail see Jennings (1980). If the switch kcov=diag is used there will not be convergence issues, since the QR approach will be used. Since many software systems use inversion methods, slight differences in the estimated coefficients will be observed; the QR approach is in theory more accurate. Implementation of the "textbook" approach is illustrated using the matrix command in section 4.5.

In a model with G equations, if the equation of interest is the jth equation, then, assuming the exogenous variables in the system are selected correctly and the jth equation is specified correctly, the 2SLS estimates are invariant to the specification of any other equation.
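The stacked-system algebra in (4.2-19) through (4.2-26) can be sketched as follows. For brevity the two simulated equations contain only exogenous regressors, so the sketch is really the SUR special case mentioned above, but the mechanics are those of (4.2-26): transform each equation by P'X', stack the equations, estimate \hat{\Sigma} from first-round residuals, and apply GLS. All data and names are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 150
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])   # all exogenous variables

# Two hypothetical equations sharing the instrument set X
Z1 = X[:, :2]              # regressors of equation 1
Z2 = X[:, [0, 2]]          # regressors of equation 2
d1_true, d2_true = np.array([1.0, 2.0]), np.array([-1.0, 0.5])
E = rng.multivariate_normal([0, 0], [[1.0, 0.7], [0.7, 1.0]], size=T)
y1, y2 = Z1 @ d1_true + E[:, 0], Z2 @ d2_true + E[:, 1]

# Transform each equation by P'X' with (X'X)^{-1} = PP', eqs. (4.2-19)-(4.2-20)
P = np.linalg.cholesky(np.linalg.inv(X.T @ X))
w = np.concatenate([P.T @ X.T @ y1, P.T @ X.T @ y2])
W = np.block([[P.T @ X.T @ Z1, np.zeros((3, 2))],
              [np.zeros((3, 2)), P.T @ X.T @ Z2]])

# First-round (equation-by-equation) estimates give Sigma-hat; eq. (4.2-25)
d0 = np.linalg.lstsq(W, w, rcond=None)[0]
r1, r2 = y1 - Z1 @ d0[:2], y2 - Z2 @ d0[2:]
S = np.cov(np.vstack([r1, r2]))
Vinv = np.kron(np.linalg.inv(S), np.eye(3))

# GLS on the stacked system, eq. (4.2-26)
delta = np.linalg.solve(W.T @ Vinv @ W, W.T @ Vinv @ w)
```

Note that P'X'XP = I by construction, which is what makes the transformed errors in (4.2-23) homoskedastic within each equation; a production implementation would solve (4.2-26) through an orthogonal factorization rather than the explicit inverses used here.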
3SLS estimation of the jth equation, in contrast, is sensitive to the specification of the other equations in the system, since changes in other equation specifications will alter the estimate of \hat{V} and thus the 3SLS estimator of \delta from equation (4.2-26). Because of this fact, it is imperative that users first inspect the 2SLS estimates closely. The constrained reduced form estimates, \Pi, should be calculated from the OLS and 2SLS models and compared. The differences show the effects of correcting for simultaneity. Next, 3SLS should be performed. A study of the resulting changes in \delta and \Pi will show the gain from moving to a system-wide estimation procedure. Since changes in the functional form of one equation i can possibly impact the estimates of another equation j, sensitivity analysis should be attempted at this step of model building. In a multiequation system, the movement from 2SLS to 3SLS often produces changes in the estimate of \delta_i for one equation but not for another. In a model in which all equations are overidentified, in general the 3SLS estimators will differ from the 2SLS estimators. If all equations are exactly identified, then \hat{V} is a diagonal matrix (Theil 1971, 511) and there is no gain for any equation from using 3SLS. In the test problem from Kmenta (1971, 565), which is discussed in the next section, one equation is overidentified and one equation is exactly identified. In this case, only the exactly identified equation will be changed by 3SLS. This is because the exactly identified equation gains from information in the overidentified equation, but the reverse is not true: the overidentified equation does not gain from information in the exactly identified equation. In SUR models, if all equations contain the same variables, there is no gain over OLS from going to SUR, since \hat{V} is again a diagonal matrix. Just as the LIML method of estimation is an alternative to 2SLS, FIML is a more costly alternative to 3SLS and I3SLS.
FIML9 is a generalization of LIML for systems of models. Like LIML, it is invariant to the variable used to normalize the model. FIML, in contrast with 3SLS, is highly nonlinear and, as a consequence, much more costly to estimate. Because FIML is asymptotically equivalent to 3SLS (Theil 1971, 525) and the simeq code does not contain any major advantages over other programs, the discussion of FIML is left to Theil (1971), Kmenta (1971) and Johnston (1984), except for an annotated FIML example using the matrix command. In the next section, an annotated output is presented.

Iterative 3SLS is an alternative final step in which the estimate of \hat{V} is updated with information from the 3SLS estimates. The problem now becomes where to stop iterating on the estimates of \hat{V}. The simeq command uses the information on the number of significant digits in the raw data (see the ipr parameter) and equation (4.1-16) to terminate the I3SLS iterations when the relative change is within what would be expected, given the number of significant digits in the raw data. If ipr is not set, the simeq command assumes ten digits.

9 The fiml section of the simeq command is the weakest link. In addition to a probable scalar error in the fiml standard errors, there often are convergence problems that appear to be data related. In view of this, and the fact that 3SLS is an inexpensive substitute, users are encouraged to employ 3SLS and I3SLS in place of FIML. Future releases of B34S will endeavor to improve the FIML code or disable the option. The matrix command implementation of FIML, shown later in section 4.5, provides a look into how such a model might be implemented.

4.3 Examples

Using data on supply and demand from Kmenta (1971, 565), Table 4.3 shows the setup to estimate models with OLS, LIML, 2SLS, and 3SLS. The reduced-form estimates for each model are calculated. Not all output is shown, to save space.
The results are the same, digit for digit, as those reported in Kmenta (1971, 582). Note the use of the keywords ls2 for 2SLS and ls3 for 3SLS, since the parser will not recognize 2SLS and 3SLS as keywords.

Table 4.3 Setup for ols, liml, ls2, ls3, and ils3 commands

==KMENTA1
B34sexec data nohead corr$
Input q p d f a $
Label q = 'Food consumption per head'$
Label p = 'Ratio of food prices to consumer prices'$
Label d = 'Disposable income in constant prices'$
Label f = 'Ratio of t-1 years price to general p'$
Label a = 'Time'$
Comment=('Kmenta (1971) page 565 answers page 582')$
Datacards$
 98.485 100.323  87.4  98.0  1
 99.187 104.264  97.6  99.1  2
102.163 103.435  96.7  99.1  3
101.504 104.506  98.2  98.1  4
104.240  98.001  99.8 110.8  5
103.243  99.456 100.5 108.2  6
103.993 101.066 103.2 105.6  7
 99.900 104.763 107.8 109.8  8
100.350  96.446  96.6 108.7  9
102.820  91.228  88.9 100.6 10
 95.435  93.085  75.1  81.0 11
 92.424  98.801  76.9  68.6 12
 94.535 102.908  84.6  70.9 13
 98.757  98.756  90.6  81.4 14
105.797  95.119 103.1 102.3 15
100.225  98.451 105.1 105.0 16
103.522  86.498  96.4 110.5 17
 99.929 104.016 104.4  92.5 18
105.223 105.769 110.7  89.3 19
106.232 113.490 127.1  93.0 20
B34sreturn$
B34seend$
B34sexec simeq printsys reduced ols liml ls2 ls3 ils3 kcov=diag ipr=6$
Heading=('Test Case from Kmenta (1971) Pages 565 - 582 ')$
Exogenous constant d f a $
Endogenous p q $
Model lvar=q rvar=(constant p d) Name=('Demand Equation')$
Model lvar=q rvar=(constant p f a) name=('Supply Equation')$
B34seend$
==

The OLS results are as follows:

Test Case from Kmenta (1971) Pages 565 - 582

Summary of Input Parameters and Model

Number of systems to be estimated - - - -      2
Number of identities  - - - - - - - - - -      0
Number of exogenous variables - - - - - -      4
Number of endogenous variables  - - - - -      2
Number of data points in time - - - - - -     20
Maximum number of unknowns per system - -      4
Print Parameter - - - - - - - - - - - - -      2
Solutions wanted 0 => no, 1 => yes
Reduced form coefficients - - - - - - - -      1
Ordinary Least Squares  - - - - - - - - -      1
LIMLE Solution  - - - - - - - - - - - - -      1
Two Stage Least Squares - - - - - - - - -      1
Three Stage Least Squares - - - - - - - -      1
Three Stage Covariance Matrix - - - - - -      1
Iterated Three Stage Least Squares  - - -      1
Covariance Matrix for I3SLSQ  - - - - - -      1
Maximum number of iterations  - - - - - -     25
Functional Minimization 3SLSQ - - - - - -      0
Covariance Matrix for Functional Min. - -      0

Systems described by the following columns of data

Name of the System   LHS    No. X  X Variables             No. Y  Y Variables
Demand Equation      2 Q      2    1 CONSTANT  2 D           1    1 P
Supply Equation      2 Q      3    1 CONSTANT  3 F  4 A      1    1 P

B34S 8.10R  (D:M:Y) 11/ 4/04  (H:M:S) 11:13:19  SIMEQ STEP

Test Case from Kmenta (1971) Pages 565 - 582

Least Squares Solution for System Number 1    Demand Equation

Condition Number of Matrix is greater than    21.04911571706159
Relative Numerical Error in the Solution      1.301987681166638E-11
LHS Endogenous Variable No. 2  Q

Exogenous Variables (Predetermined)        Coefficient      Std. Error       t
 1 CONSTANT                                 99.89542        7.519362         13.28509
 2 D                                       0.3346356        0.4542183E-01    7.367285

Endogenous Variables (Jointly Dependent)
 3 P                                      -0.3162988        0.9067741E-01   -3.488177

Residual Variance for Structural Disturbances   3.725391173733892
Ratio of Norm Residual to Norm LHS              1.762488253954560E-02

Covariance Matrix of Estimated Parameters

              CONSTANT        D               P
                  1            2               3
CONSTANT  1    56.54
D         2   0.3216E-01    0.2063E-02
P         3  -0.5948       -0.2333E-02     0.8222E-02

Correlation Matrix of Estimated Parameters
- - - - - - - - Two Stage Least Squares - - - - - - Three Stage Least Squares - - - - - Three Stage Covariance Matrix - - - Iterated Three Stage Least Squares Covariance Matrix for I3SLSQ - - - Maximum number of iterations - - - Functional Minimization 3SLSQ - - - Covariance Matrix for Functional Min. - 2 0 4 2 20 4 2 1 1 1 1 1 1 1 1 25 0 0 Systems described by the following columns of data Name of the System LHS Demand Equation B34S 8.10R 4 Q 2 Q No. X 3 1 1 CONSTANT 1 P 3 F 4 A * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * (D:M:Y) 11/ 4/04 (H:M:S) 11:13:19 SIMEQ STEP PAGE Test Case from Kmenta (1971) Pages 565 - 582 Least Squares Solution for System Number 1 Condition Number of Matrix is greater than Relative Numerical Error in the Solution LHS Endogenous Variable No. 2 Exogenous Variables (Predetermined) 1 CONSTANT 2 D Demand Equation 21.04911571706159 1.301987681166638E-11 Q 99.89542 0.3346356 Std. Error 7.519362 0.4542183E-01 t 13.28509 7.367285 Endogenous Variables (Jointly Dependent) 3 P -0.3162988 Std. Error 0.9067741E-01 t -3.488177 Residual Variance for Structural Disturbances Ratio of Norm Residual to Norm LHS 3.725391173733892 1.762488253954560E-02 Covariance Matrix of Estimated Parameters CONSTANT D P 1 2 3 CONSTANT D 1 2 56.54 0.3216E-01 0.2063E-02 -0.5948 -0.2333E-02 P 3 0.8222E-02 Correlation Matrix of Estimated Parameters CONSTANT D P 1 2 3 NO. Y 2 1 1 CONSTANT 1 P 2 D * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * Supply Equation 2 (Variables) CONSTANT D 1 2 1.000 0.9417E-01 1.000 -0.8724 -0.5665 P 3 1.000 Test Case from Kmenta (1971) Pages 565 - 582 Least Squares Solution for System Number Condition Number of Matrix is greater than Relative Numerical Error in the Solution LHS Endogenous Variable No. 2 Exogenous Variables (Predetermined) 2 Supply Equation 17.67594711864223 1.318741471618151E-11 Q Std. 
Error        t
 1 CONSTANT    58.27543      11.46291        5.083825
 2 F          0.2481333      0.4618785E-01   5.372263
 3 A          0.2483023      0.9751777E-01   2.546227

Endogenous Variables (Jointly Dependent)     Std. Error        t
 4 P          0.1603666      0.9488394E-01   1.690134

Residual Variance for Structural Disturbances   5.784441135907554
Ratio of Norm Residual to Norm LHS              2.130622575072544E-02

Covariance Matrix of Estimated Parameters
              CONSTANT       F              A              P
                 1            2              3              4
CONSTANT  1    131.4
F         2   -0.3044        0.2133E-02
A         3   -0.2792        0.1316E-02     0.9510E-02
P         4   -0.9875        0.8440E-03     0.5220E-03     0.9003E-02

Correlation Matrix of Estimated Parameters
              CONSTANT       F              A              P
                 1            2              3              4
CONSTANT  1    1.000
F         2   -0.5749        1.000
A         3   -0.2498        0.2921         1.000
P         4   -0.9079        0.1926         0.5642E-01     1.000

Test Case from Kmenta (1971) Pages 565 - 582

Contemporaneous Covariance of Residuals (Structural Disturbances) For Least Squares Solution.
Condition Number of residual columns,   2.664758
              Demand E      Supply E
                 1            2
Demand E  1    3.167
Supply E  2    3.411         4.628

Correlation Matrix of Residuals
              Demand E      Supply E
                 1            2
Demand E  1    1.000
Supply E  2    0.8912        1.000

Test Case from Kmenta (1971) Pages 565 - 582

Coefficients of the Reduced Form Equations. Least Squares Solution.
Condition number of matrix used to find the reduced form coefficients is no smaller than   4.195815340351579

                 P             Q
                 1             2
CONSTANT  1    87.31         72.28
D         2    0.7020        0.1126
F         3   -0.5206        0.1647
A         4   -0.5209        0.1648

Mean sum of squares of residuals for the reduced form equations.
 1 P   0.42748D+01
 2 Q   0.39192D+01

Condition Number of columns of exogenous variables,   11.845

For each estimated equation, the condition number of the matrix, equation (4.1-7), and the relative numerical errors in the solution, equation (4.1-8), are given. The relative numerical errors for the demand and supply equations were .1302E-10 and .13187E-10, respectively. Estimated coefficients agree with Kmenta (1971, 582). From the estimated B and Γ coefficients, the constrained reduced form π coefficients are calculated.
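The link between the structural estimates and the constrained reduced form can be verified by hand. The sketch below (an illustration assuming numpy is available, not the QR-based simeq code) writes the two OLS structural equations as B y_t = Γ x_t + u_t, with y = (p, q) and x = (constant, d, f, a), and solves for Π = B⁻¹Γ, reproducing the printed reduced form coefficients.

```python
import numpy as np

# Structural OLS estimates printed above, arranged as B y = G x + u,
# with y = (p, q) and x = (constant, d, f, a).
B = np.array([[ 0.3162988, 1.0],     # demand: q + 0.3163*p = 99.895 + 0.3346*d
              [-0.1603666, 1.0]])    # supply: q - 0.1604*p = 58.275 + 0.2481*f + 0.2483*a
G = np.array([[99.89542, 0.3346356, 0.0,       0.0      ],
              [58.27543, 0.0,       0.2481333, 0.2483023]])

# Constrained reduced form y = Pi x, where Pi = inv(B) G
Pi = np.linalg.solve(B, G)

print(Pi[0])   # p equation:  87.31  0.7020 -0.5206 -0.5209
print(Pi[1])   # q equation:  72.28  0.1126  0.1647  0.1648
```

The two rows of Pi match, to the digits printed, the reduced form coefficients reported in the OLS output above.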
The condition number of the exogenous columns, .11845E+2, shows little multicollinearity among the exogenous variables. The next outputs show the corresponding estimates for LIML, 2SLS, and 3SLS. As was discussed earlier, since the asymptotic SEs for LIML are the same as for 2SLS, the simeq command does not print these values.

Test Case from Kmenta (1971) Pages 565 - 582

Limited Information - Maximum Likelihood Solution f 1   Demand Equation

Rank and Condition Number of Exogenous Columns                         2   8.5174634
Rank and Condition Number of Endogenous Variables orthogonal to X(K)   2   6.5593694
Rank and Condition Number of Endogenous Variables orthogonal to X      2   2.3005812
Value of LIML Parameter is   1.173867141559841

Condition Number of Matrix is greater than   8.517463415017575
Relative Numerical Error in the Solution     4.487883690647531E-12

LHS Endogenous Variable No. 2  Q

Standard Deviation Equals 2SLSQ Standard Deviation.

Exogenous Variables (Predetermined)
 1 CONSTANT    93.61922
 2 D          0.3100134

Endogenous Variables (Jointly Dependent)
 3 P         -0.2295381

Residual Variance for Structural Disturbances   3.926009688207962
Ratio of Norm Residual to Norm LHS              1.809322459330604E-02

Test Case from Kmenta (1971) Pages 565 - 582

Limited Information - Maximum Likelihood Solution f 2   Supply Equation

Rank and Condition Number of Exogenous Columns                         3   8.2098363
Rank and Condition Number of Endogenous Variables orthogonal to X(K)   1   1.0000000
Rank and Condition Number of Endogenous Variables orthogonal to X      2   1.0000000
Value of LIML Parameter is   1.000000000000000

Condition Number of Matrix is greater than   8.209836250820180
Relative Numerical Error in the Solution     4.943047984855735E-12

LHS Endogenous Variable No. 2  Q

Standard Deviation Equals 2SLSQ Standard Deviation.
Exogenous Variables (Predetermined) 1 CONSTANT 2 F 3 A 49.53244 0.2556057 0.2529242 Endogenous Variables (Jointly Dependent) 4 P 0.2400758 Residual Variance for Structural Disturbances Ratio of Norm Residual to Norm LHS 6.039577731391617 2.177103664979223E-02 Test Case from Kmenta (1971) Pages 565 - 582 Contemporaneous Covariance of Residuals (Structural Disturbances) For LIMLE Solution. Condition Number of residual columns, Demand E Supply E Demand E 1 3.337 3.629 1 2 2.811594 Supply E 2 4.832 Correlation Matrix of Residuals Demand E 1 1 1.000 2 0.9038 Demand E Supply E Supply E 2 1.000 Test Case from Kmenta (1971) Pages 565 - 582 Coefficients of the Reduced Form Equations. LIMLE Solution. Condition number of matrix used to find the reduced form coefficients is no smaller than P CONSTANT D F A 1 2 3 4 1 93.88 0.6601 -0.5443 -0.5386 Q 2 72.07 0.1585 0.1249 0.1236 Mean sum of squares of residuals for the reduced form equations. 1 P 0.41286D+01 4.258817996669486 Simultaneous Equations Systems 2 Q 4-21 0.38401D+01 Test Case from Kmenta (1971) Pages 565 - 582 Two Stage Least Squares Solution for System Number Condition Number of Matrix is greater than Relative Numerical Error in the Solution LHS Endogenous Variable No. 2 Exogenous Variables (Predetermined) 1 CONSTANT 2 D 1 Demand Equation 21.98482284147018 1.411421448020441E-11 Q 94.63330 0.3139918 Std. Error 7.920838 0.4694366E-01 t 11.94738 6.688695 Theil SE 7.302652 0.4327991E-01 Theil t 12.95876 7.254908 Endogenous Variables (Jointly Dependent) 3 P -0.2435565 Std. 
Error 0.9648429E-01 t -2.524313 Theil SE 0.8895412E-01 Theil t -2.738002 Residual Variance for Structural Disturbances Ratio of Norm Residual to Norm LHS 3.866416929101937 1.795538131264630E-02 Covariance Matrix of Estimated Parameters CONSTANT D P 1 2 3 CONSTANT D 1 2 62.74 0.4930E-01 0.2204E-02 -0.6734 -0.2642E-02 P 3 0.9309E-02 Correlation Matrix of Estimated Parameters CONSTANT D P 1 2 3 CONSTANT 1 1.000 0.1326 -0.8812 D P 2 3 1.000 -0.5833 1.000 Test Case from Kmenta (1971) Pages 565 - 582 Two Stage Least Squares Solution for System Number Condition Number of Matrix is greater than Relative Numerical Error in the Solution LHS Endogenous Variable No. 2 Exogenous Variables (Predetermined) 1 CONSTANT 2 F 3 A 2 Supply Equation 18.21923089332271 1.431397195953368E-11 Q 49.53244 0.2556057 0.2529242 Std. Error 12.01053 0.4725007E-01 0.9965509E-01 t 4.124086 5.409637 2.537996 Theil SE 10.74254 0.4226175E-01 0.8913422E-01 Theil t 4.610868 6.048158 2.837565 Endogenous Variables (Jointly Dependent) 4 P 0.2400758 Std. Error 0.9993385E-01 t 2.402347 Theil SE 0.8938355E-01 Theil t 2.685905 Residual Variance for Structural Disturbances Ratio of Norm Residual to Norm LHS 6.039577731391617 2.177103664979223E-02 Covariance Matrix of Estimated Parameters CONSTANT F A P 1 2 3 4 CONSTANT 1 144.3 -0.3238 -0.2952 -1.095 F A P 2 3 4 0.2233E-02 0.1377E-02 0.9362E-03 0.9931E-02 0.5791E-03 0.9987E-02 Correlation Matrix of Estimated Parameters CONSTANT F A P 1 2 3 4 CONSTANT 1 1.000 -0.5706 -0.2467 -0.9126 F A 2 1.000 0.2924 0.1983 P 3 1.000 0.5815E-01 4 1.000 Test Case from Kmenta (1971) Pages 565 - 582 Contemporaneous Covariance of Residuals (Structural Disturbances) For Two Stage Least Squares Solution. 
Condition Number of residual columns, Demand E Supply E 1 2 Demand E 1 3.286 3.593 Supply E 2 4.832 Correlation Matrix of Residuals Demand E 1 Supply E 2 2.804709 4-22 Chapter 4 Demand E Supply E 1 2 1.000 0.9017 1.000 Test Case from Kmenta (1971) Pages 565 - 582 Coefficients of the Reduced Form Equations. Two Stage Least Squares Solution Condition number of matrix used to find the reduced form coefficients is no smaller than P CONSTANT D F A 1 2 3 4 1 93.25 0.6492 -0.5285 -0.5230 Q 2 71.92 0.1559 0.1287 0.1274 Mean sum of squares of residuals for the reduced form equations. 1 2 P Q 0.39831D+01 0.38317D+01 Condition number of the large matrix in Three Stage Least Squares 60.70221 4.135372945327849 Simultaneous Equations Systems 4-23 Test Case from Kmenta (1971) Pages 565 - 582 Three Stage Least Squares Solution for System Number LHS Endogenous Variable No. 2 1 Demand Equation Q Exogenous Variables (Predetermined) 1 CONSTANT 2 D 94.63330 0.3139918 Std. Error 7.920838 0.4694366E-01 t 11.94738 6.688695 Theil SE 7.302652 0.4327991E-01 Theil t 12.95876 7.254908 Endogenous Variables (Jointly Dependent) 3 P -0.2435565 Std. Error 0.9648429E-01 t -2.524313 Theil SE 0.8895412E-01 Theil t -2.738002 Residual Variance (For Structural Disturbances) 3.286454 Three Stage Least Squares Covariance for System CONSTANT D P CONSTANT D 1 2 62.74 0.4930E-01 0.2204E-02 -0.6734 -0.2642E-02 1 2 3 Demand Equation P 3 0.9309E-02 Three Stage Least Squares Solution for System Number LHS Endogenous Variable No. 2 2 Supply Equation Q Exogenous Variables (Predetermined) 1 CONSTANT 2 F 3 A 52.11764 0.2289775 0.3579074 Std. Error 11.89337 0.4399381E-01 0.7288940E-01 t 4.382074 5.204767 4.910281 Theil SE 10.63776 0.3934926E-01 0.6519426E-01 Theil t 4.899308 5.819106 5.489861 Endogenous Variables (Jointly Dependent) 4 P 0.2289322 Std. 
Error 0.9967317E-01 t 2.296828 Theil SE 0.8915039E-01 Theil t 2.567932 Residual Variance (For Structural Disturbances) 5.360809 Three Stage Least Squares Covariance for System CONSTANT F A P CONSTANT 1 141.5 -0.2950 -0.4090 -1.083 1 2 3 4 F A Supply Equation P 2 3 4 0.1935E-02 0.2548E-02 0.8119E-03 0.5313E-02 0.1069E-02 0.9935E-02 Test Case from Kmenta (1971) Pages 565 - 582 Contemporaneous Covariance of Residuals (Structural Disturbances) For Three Stage Least Squares Solution. Condition Number of residual columns, Demand E Supply E 1 2 Demand E 1 3.286 4.111 6.321462 Supply E 2 5.361 Correlation Matrix of Residuals Demand E 1 1 1.000 2 0.9794 Demand E Supply E Supply E 2 1.000 Test Case from Kmenta (1971) Pages 565 - 582 Coefficients of the Reduced Form Equations. Three Stage Least Squares Solution using Orthogonal Factorization. Condition number of matrix used to find the reduced form coefficients is no smaller than P CONSTANT D F A 1 2 3 4 1 89.98 0.6645 -0.4846 -0.7575 Q 2 72.72 0.1521 0.1180 0.1845 Mean sum of squares of residuals for the reduced form equations. 1 2 P Q 0.19065D+01 0.42494D+01 Iterated Three Stage Least Squares Results are given next. 4.232905401139098 4-24 Chapter 4 Iteration begins for Iterated 3SLSQ. Condition number of the large matrix in Three Stage Least Squares 147.2220 Test Case from Kmenta (1971) Pages 565 - 582 Iterated Three Stage Least Squares Solution for System No. LHS Endogenous Variable No. 2 1 Demand Equation Q Exogenous Variables (Predetermined) 1 CONSTANT 2 D 94.63330 0.3139918 Std. Error 7.920838 0.4694366E-01 t 11.94738 6.688695 Theil SE 7.302652 0.4327991E-01 Theil t 12.95876 7.254908 Endogenous Variables (Jointly Dependent) 3 P -0.2435565 Std. 
Error 0.9648429E-01 t -2.524313 Theil SE 0.8895412E-01 Theil t -2.738002 Residual Variance (For Structural Disturbances) 3.286454 Iterated Three Stage Least Squares Covariance for System Demand Equation CONSTANT D P CONSTANT D 1 2 62.74 0.4930E-01 0.2204E-02 -0.6734 -0.2642E-02 1 2 3 P 3 0.9309E-02 Iterated Three Stage Least Squares Solution for System No. LHS Endogenous Variable No. 2 2 Supply Equation Q Exogenous Variables (Predetermined) 1 CONSTANT 2 F 3 A 52.55269 0.2244964 0.3755747 Endogenous Variables (Jointly Dependent) 4 P 0.2270569 Std. Error 12.74080 0.4653972E-01 0.7166061E-01 t 4.124755 4.823758 5.241020 Theil SE 11.39572 0.4162639E-01 0.6409520E-01 Theil t 4.611616 5.393126 5.859638 Std. Error 0.1069194 t 2.123627 Theil SE 0.9563159E-01 Theil t 2.374287 Residual Variance (For Structural Disturbances) 5.565111 Iterated Three Stage Least Squares Covariance for System Supply Equation CONSTANT F A P CONSTANT 1 162.3 -0.3336 -0.4953 -1.245 1 2 3 4 F A P 2 3 4 0.2166E-02 0.3185E-02 0.9086E-03 0.5135E-02 0.1336E-02 0.1143E-01 Test Case from Kmenta (1971) Pages 565 - 582 Contemporaneous Covariance of Residuals (Structural Disturbances) For Iterated Three Stage Least Squares Solution. Condition Number of residual columns, Demand E Supply E Demand E 1 3.286 4.198 1 2 6.814796 Supply E 2 5.565 Correlation Matrix of Residuals Demand E Supply E 1 2 Demand E 1 1.000 0.9816 Supply E 2 1.000 Test Case from Kmenta (1971) Pages 565 - 582 Coefficients of the Reduced Form Equations. Iterated Three Stage Least Squares Solution. Condition number of matrix used to find the reduced form coefficients is no smaller than P CONSTANT D F A 1 2 3 4 1 89.42 0.6672 -0.4770 -0.7981 Q 2 72.86 0.1515 0.1162 0.1944 Mean sum of squares of residuals for the reduced form equations. 1 P 0.20576D+01 4.249772824974006 Simultaneous Equations Systems 2 Q 4-25 0.43519D+01 In the Kmenta test problem, one equation (demand) was overidentified and one equation (supply) was exactly identified. 
As was mentioned earlier, the 2SLS and 3SLS results for the overidentified equation (demand) are the same because the other equation was exactly identified. However, the 3SLS results for the exactly identified equation (supply) differ from the 2SLS results because the other equation (demand) is overidentified. Close inspection of the 3SLS results for the demand equation shows that they are the same as those of Kmenta (1971, 582) and Kmenta (1986, 712). The supply-equation results are the same as those of Kmenta (1971) but differ slightly from those of Kmenta (1986), which appear to be in error.10 To facilitate testing, SAS and RATS setups are shown in Tables 4.4 and 4.5 and their output discussed in some detail.

Table 4.4 SAS Implementation of the Kmenta Model

B34SEXEC OPTIONS OPEN('testsas.sas') UNIT(29) DISP=UNKNOWN$ B34SRUN$
B34SEXEC OPTIONS CLEAN(29) $ B34SEEND$
B34SEXEC PGMCALL IDATA=29 ICNTRL=29$
SAS $
PGMCARDS$
proc means; run;
proc syslin 3sls reduced;
   instruments d f a constant;
   endogenous p q;
   demand: model q = p d;
   supply: model q = p f a;
run;
proc syslin it3sls reduced;
   instruments d f a constant;
   endogenous p q;
   demand: model q = p d;
   supply: model q = p f a;
run;
B34SRETURN$ B34SRUN $
B34SEXEC OPTIONS CLOSE(29)$ B34SRUN$
/$ The next card has to be modified to point to SAS location
/$ Be sure and wait until SAS gets done before letting B34S resume
B34SEXEC OPTIONS dodos('start /w /r sas testsas') dounix('sas testsas')$ B34SRUN$
B34SEXEC OPTIONS NPAGEOUT NOHEADER
   WRITEOUT(' ','Output from SAS',' ',' ')
   WRITELOG(' ','Output from SAS',' ',' ')
   COPYFOUT('testsas.lst')
   COPYFLOG('testsas.log')
   dodos('erase testsas.sas','erase testsas.lst','erase testsas.log')

10 The file example.mac contains an extension of the above test case that calls RATS, SAS and a B34S matrix implementation. For the supply equation SAS gets the Kmenta (1986) results which are 52.1972 (11.8934), .2286 (.0997), .2282 (.0440), .3611 (.0729). What RATS calls 3SLS produces what B34S calls I3SLS.
   dounix('rm testsas.sas','rm testsas.lst','rm testsas.log')$
B34SRUN$

Readers are encouraged to use the code in Tables 4.4 and 4.5 to further investigate this issue. A major difficulty is that it is often hard for the researcher to tell exactly what is being estimated by a software system. For this reason estimating the model with multiple software systems is strongly advised.

Table 4.5 RATS Implementation of the Kmenta Model

B34SEXEC OPTIONS HEADER$ B34SRUN$
b34sexec options open('rats.dat') unit(28) disp=unknown$ b34srun$
b34sexec options open('rats.in')  unit(29) disp=unknown$ b34srun$
b34sexec options clean(28)$ b34srun$
b34sexec options clean(29)$ b34srun$
b34sexec pgmcall$
  rats passasts
  pcomments('* ',
            '* Data passed from B34S(r) system to RATS',
            '* ',
            "display @1 %dateandtime() @33 ' Rats Version ' %ratsversion()"
            '* ') $
PGMCARDS$
*
* heading=('test case from kmenta 1971 page 565 - 582 ' ) $
* exogenous constant d f a $
* endogenous p q $
* model lvar=q rvar=(constant p d) name=('demand eq.') $
* model lvar=q rvar=(constant p f a) name=('supply eq.') $
linreg q
# constant p d
linreg q
# constant p f a
instruments constant d f a
linreg(inst) q
# constant p d
linreg(inst) q
# constant p f a
source d:\r\liml.src
@liml q
# constant p d
@liml q
# constant p f a
equation demand q
# constant p d
equation supply q
# constant p f a
* Supply does not match known answers!!
sur(inst,iterations=200) 2
# demand resid1
# supply resid2
nonlin(parmset=structural) c0 c1 c2 d0 d1 d2 d3
compute c0 = .1
compute c1 = .1
compute c2 = .1
compute d0 = .1
compute d1 = .1
compute d2 = .1
compute d3 = .1
frml d_eq q = c0 + c1*p + c2*d
frml s_eq q = d0 + d1*p + d2*f + d3*a
*
*
nlsystem(inst,parmset=structural,outsigma=v) d_eq s_eq
b34sreturn$ b34srun $
b34sexec options close(28)$ b34srun$
b34sexec options close(29)$ b34srun$
b34sexec options
/$ dodos(' rats386 rats.in rats.out ')
   dodos('start /w /r rats32s rats.in /run')
   dounix('rats rats.in rats.out')$ B34SRUN$
b34sexec options npageout
   WRITEOUT('Output from RATS',' ',' ')
   COPYFOUT('rats.out')
   dodos('ERASE rats.in','ERASE rats.out','ERASE rats.dat')
   dounix('rm rats.in','rm rats.out','rm rats.dat')
$ B34SRUN$

As noted earlier, the 2SLS and 3SLS results for the overidentified equation (demand) are the same. However, the printout shows that the residual variance for the 2SLS result is 3.8664, while the residual variance for the 3SLS result is 3.2865. The reason for this apparent discrepancy is that the 2SLS residual variance equals the sum of squared residuals divided by T-K, while the 3SLS calculation uses T; hence, 3.8664 = 3.2865*(20/17). To investigate the differences in the supply equation between Kmenta (1971) and Kmenta (1986), edited and annotated SAS and RATS output is shown next. The SAS 3SLS and I3SLS output is shown to agree with Kmenta (1986) for both the demand and supply equations.
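Both the 2SLS estimates and the T versus T-K scaling are easy to reproduce outside a simultaneous-equations package. The following sketch (an illustration assuming numpy, using the textbook two-stage calculation rather than the QR approach of simeq) projects p on all the exogenous variables of the Table 4.3 data, runs OLS of q on (constant, phat, d), and forms the residuals with the actual p.

```python
import numpy as np

# Kmenta (1971) data from Table 4.3: columns q, p, d, f, a
data = np.array([
    [ 98.485, 100.323,  87.4,  98.0,  1], [ 99.187, 104.264,  97.6,  99.1,  2],
    [102.163, 103.435,  96.7,  99.1,  3], [101.504, 104.506,  98.2,  98.1,  4],
    [104.240,  98.001,  99.8, 110.8,  5], [103.243,  99.456, 100.5, 108.2,  6],
    [103.993, 101.066, 103.2, 105.6,  7], [ 99.900, 104.763, 107.8, 109.8,  8],
    [100.350,  96.446,  96.6, 108.7,  9], [102.820,  91.228,  88.9, 100.6, 10],
    [ 95.435,  93.085,  75.1,  81.0, 11], [ 92.424,  98.801,  76.9,  68.6, 12],
    [ 94.535, 102.908,  84.6,  70.9, 13], [ 98.757,  98.756,  90.6,  81.4, 14],
    [105.797,  95.119, 103.1, 102.3, 15], [100.225,  98.451, 105.1, 105.0, 16],
    [103.522,  86.498,  96.4, 110.5, 17], [ 99.929, 104.016, 104.4,  92.5, 18],
    [105.223, 105.769, 110.7,  89.3, 19], [106.232, 113.490, 127.1,  93.0, 20]])
q, p, d, f, a = data.T
T = len(q)

X = np.column_stack([np.ones(T), d, f, a])        # all exogenous variables
phat = X @ np.linalg.lstsq(X, p, rcond=None)[0]   # first stage: project p on X

Z = np.column_stack([np.ones(T), phat, d])        # second-stage regressors
b = np.linalg.lstsq(Z, q, rcond=None)[0]          # 2SLS: 94.6333 -0.24356 0.31399

u = q - np.column_stack([np.ones(T), p, d]) @ b   # 2SLS residuals use the actual p
ssr = u @ u                                       # 65.7291
print(ssr / (T - 3))                              # 2SLS scaling (T-K): 3.8664
print(ssr / T)                                    # 3SLS scaling (T):   3.2865
```

The coefficients and sum of squared residuals agree with the B34S, SAS and RATS demand-equation results, and the two scalings reproduce the 3.8664 versus 3.2865 residual variances discussed above.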
The SYSLIN Procedure Three-Stage Least Squares Estimation Parameter Estimates Variable Intercept P D DF Parameter Estimate Standard Error t Value Pr > |t| 1 1 1 94.63330 -0.24356 0.313992 7.920838 0.096484 0.046944 11.95 -2.52 6.69 <.0001 0.0218 <.0001 Model Dependent Variable SUPPLY Q Parameter Estimates Variable Intercept P F A DF Parameter Estimate Standard Error t Value Pr > |t| 1 1 1 1 52.19720 0.228589 0.228158 0.361138 11.89337 0.099673 0.043994 0.072889 4.39 2.29 5.19 4.95 0.0005 0.0357 <.0001 0.0001 Endogenous Variables DEMAND SUPPLY P Q 0.243557 -0.22859 1 1 Exogenous Variables DEMAND SUPPLY Intercept D F A 94.6333 52.1972 0.313992 0 0 0.228158 0 0.361138 Simultaneous Equations Systems Inverse Endogenous Variables P Q DEMAND SUPPLY 2.11799 0.48415 -2.11799 0.51585 29 30 Chapter 4 The SYSLIN Procedure Three-Stage Least Squares Estimation Reduced Form P Q Intercept D F A 89.87924 72.74263 0.665032 0.152019 -0.48324 0.117695 -0.76489 0.186293 The SYSLIN Procedure Iterative Three-Stage Least Squares Estimation Parameter Estimates Variable Intercept P D DF Parameter Estimate Standard Error t Value Pr > |t| 1 1 1 94.63330 -0.24356 0.313992 7.920838 0.096484 0.046944 11.95 -2.52 6.69 <.0001 0.0218 <.0001 Model Dependent Variable SUPPLY Q Parameter Estimates Variable Intercept P F A DF Parameter Estimate Standard Error t Value Pr > |t| 1 1 1 1 52.66182 0.226586 0.223372 0.380006 12.80511 0.107459 0.046774 0.072010 4.11 2.11 4.78 5.28 0.0008 0.0511 0.0002 <.0001 Endogenous Variables DEMAND SUPPLY P Q 0.243557 -0.22659 1 1 Exogenous Variables DEMAND SUPPLY Intercept D F A 94.6333 52.66182 0.313992 0 0 0.223372 0 0.380006 Inverse Endogenous Variables P Q DEMAND SUPPLY 2.127012 0.481952 -2.12701 0.518048 The SYSLIN Procedure Iterative Three-Stage Least Squares Estimation Reduced Form P Q Intercept D F A 89.27387 72.89007 0.667864 0.151329 -0.47512 0.115718 -0.80828 0.196861 RATS output is shown next for OLS, 2SLS, LIML, and 3SLS two ways. 
Note that for the supply equation the estimated coefficients, SE’s, t’s and probabilities were: Constant P F A 52.552667563 11.395623960 0.227056969 0.095630772 0.224496638 0.041626039 0.375573566 0.064094682 4.61165 2.37431 5.39318 5.85967 0.00000399 0.01758185 0.00000007 0.00000000 Which are very close to the B34S I3SLS results which are duplicated below Exogenous Variables (Predetermined) 1 CONSTANT 2 F 52.55269 0.2244964 Std. Error 12.74080 0.4653972E-01 t 4.124755 4.823758 Theil SE 11.39572 0.4162639E-01 Theil t 4.611616 5.393126 Simultaneous Equations Systems 3 A 0.3755747 Endogenous Variables (Jointly Dependent) 4 P 0.2270569 31 0.7166061E-01 5.241020 0.6409520E-01 5.859638 Std. Error 0.1069194 t 2.123627 Theil SE 0.9563159E-01 Theil t 2.374287 These results are not at all like the SAS supply equation 3SLS results of Model SUPPLY Dependent Variable Q Parameter Estimates Variable Intercept P F A DF Parameter Estimate Standard Error t Value Pr > |t| 1 1 1 1 52.19720 0.228589 0.228158 0.361138 11.89337 0.099673 0.043994 0.072889 4.39 2.29 5.19 4.95 0.0005 0.0357 <.0001 0.0001 And I3SLS results of: Parameter Estimates Variable Intercept P F A DF Parameter Estimate Standard Error t Value Pr > |t| 1 1 1 1 52.66182 0.226586 0.223372 0.380006 12.80511 0.107459 0.046774 0.072010 4.11 2.11 4.78 5.28 0.0008 0.0511 0.0002 <.0001 that agree with Kmenta (1986) but not with Kmenta (1971). The full RATS output is shown below calculating 3SLS two different ways. 
linreg q # constant p d Linear Regression - Estimation by Least Squares Dependent Variable Q Usable Observations 20 Degrees of Freedom Centered R**2 0.763789 R Bar **2 0.735999 Uncentered R**2 0.999689 T x R**2 19.994 Mean of Dependent Variable 100.89820000 Std Error of Dependent Variable 3.75649822 Standard Error of Estimate 1.93012724 Sum of Squared Residuals 63.331649953 Regression F(2,17) 27.4847 Significance Level of F 0.00000471 Log Likelihood -39.90530 Durbin-Watson Statistic 1.744203 17 Variable Coeff Std Error T-Stat Signif ******************************************************************************* 1. Constant 99.89542291 7.51936214 13.28509 0.00000000 2. P -0.31629880 0.09067741 -3.48818 0.00281529 3. D 0.33463560 0.04542183 7.36729 0.00000110 linreg q # constant p f a Linear Regression - Estimation by Least Squares Dependent Variable Q Usable Observations 20 Degrees of Freedom Centered R**2 0.654807 R Bar **2 0.590084 Uncentered R**2 0.999546 T x R**2 19.991 Mean of Dependent Variable 100.89820000 Std Error of Dependent Variable 3.75649822 Standard Error of Estimate 2.40508651 Sum of Squared Residuals 92.551058175 Regression F(3,16) 10.1170 Significance Level of F 0.00056018 Log Likelihood -43.69905 Durbin-Watson Statistic 2.109731 16 Variable Coeff Std Error T-Stat Signif ******************************************************************************* 1. Constant 58.275431202 11.462909888 5.08383 0.00011056 2. P 0.160366596 0.094883937 1.69013 0.11038810 32 3. 4. 
Chapter 4 F A 0.248133295 0.248302347 0.046187854 0.097517767 5.37226 2.54623 0.00006227 0.02156713 instruments constant d f a linreg(inst) q # constant p d Linear Regression - Estimation by Instrumental Variables Dependent Variable Q Usable Observations 20 Degrees of Freedom 17 Mean of Dependent Variable 100.89820000 Std Error of Dependent Variable 3.75649822 Standard Error of Estimate 1.96632066 Sum of Squared Residuals 65.729087795 J-Specification(1) 2.535651 Significance Level of J 0.11130095 Durbin-Watson Statistic 2.009220 Variable Coeff Std Error T-Stat Signif ******************************************************************************* 1. Constant 94.63330387 7.92083831 11.94738 0.00000000 2. P -0.24355654 0.09648429 -2.52431 0.02183240 3. D 0.31399179 0.04694366 6.68869 0.00000381 linreg(inst) q # constant p f a Linear Regression - Estimation by Instrumental Variables Dependent Variable Q Usable Observations 20 Degrees of Freedom 16 Mean of Dependent Variable 100.89820000 Std Error of Dependent Variable 3.75649822 Standard Error of Estimate 2.45755523 Sum of Squared Residuals 96.633243702 Durbin-Watson Statistic 2.384645 Variable Coeff Std Error T-Stat Signif ******************************************************************************* 1. Constant 49.532441699 12.010526407 4.12409 0.00079536 2. P 0.240075779 0.099933852 2.40235 0.02878451 3. F 0.255605724 0.047250071 5.40964 0.00005785 4. 
A 0.252924175 0.099655087 2.53800 0.02192877 source d:\r\liml.src @liml q # constant p d Linear Regression - Estimation by LIML Dependent Variable Q Usable Observations 20 Degrees of Freedom Centered R**2 0.751068 R Bar **2 0.721782 Uncentered R**2 0.999673 T x R**2 19.993 Mean of Dependent Variable 100.89820000 Std Error of Dependent Variable 3.75649822 Standard Error of Estimate 1.98141608 Sum of Squared Residuals 66.742164700 Regression F(2,17) 25.6459 Significance Level of F 0.00000736 Log Likelihood -40.42982 Durbin-Watson Statistic 2.051725 17 Variable Coeff Std Error T-Stat Signif ******************************************************************************* 1. Constant 93.61922028 8.03124312 11.65688 0.00000000 2. P -0.22953809 0.09800238 -2.34217 0.03160318 3. D 0.31001345 0.04743306 6.53581 0.00000509 LIML Specification Test Chi-Squared(1)= 3.477343 with Significance Level 0.06221456 @liml q # constant p f a Linear Regression - Estimation by LIML Dependent Variable Q Usable Observations 20 Degrees of Freedom Centered R**2 0.639582 R Bar **2 0.572004 Uncentered R**2 0.999526 T x R**2 19.991 Mean of Dependent Variable 100.89820000 Std Error of Dependent Variable 3.75649822 Standard Error of Estimate 2.45755523 Sum of Squared Residuals 96.633243702 Regression F(3,16) 9.4643 Significance Level of F 0.00078341 Log Likelihood -44.13068 Durbin-Watson Statistic 2.384645 16 Variable Coeff Std Error T-Stat Signif ******************************************************************************* 1. Constant 49.532441699 12.010526407 4.12409 0.00079536 2. P 0.240075779 0.099933852 2.40235 0.02878451 Simultaneous Equations Systems 3. 4. F A 0.255605724 0.252924175 0.047250071 0.099655087 LIML Specification Test Chi-Squared(0)= 0.000000 with Significance Level equation demand q # constant p d equation supply q # constant p f a * Supply does not match known answers!! 
sur(inst,iterations=200) 2 # demand resid1 # supply resid2 5.40964 2.53800 0.00005785 0.02192877 NA Linear Systems - Estimation by System Instrumental Variables Iterations Taken 6 Usable Observations 20 Dependent Variable Q Mean of Dependent Variable 100.89820000 Std Error of Dependent Variable 3.75649822 Standard Error of Estimate 1.65490543 Sum of Squared Residuals 65.729087795 Durbin-Watson Statistic 2.009220 Variable Coeff Std Error T-Stat Signif ******************************************************************************* 1. Constant 94.63330387 7.30265210 12.95876 0.00000000 2. P -0.24355654 0.08895412 -2.73800 0.00618138 3. D 0.31399179 0.04327991 7.25491 0.00000000 Dependent Variable Q Mean of Dependent Variable 100.89820000 Std Error of Dependent Variable 3.75649822 Standard Error of Estimate 2.19982161 Sum of Squared Residuals 111.30194805 Durbin-Watson Statistic 2.094475 Variable Coeff Std Error T-Stat Signif ******************************************************************************* 4. Constant 52.552667563 11.395623960 4.61165 0.00000399 5. P 0.227056969 0.095630772 2.37431 0.01758185 6. F 0.224496638 0.041626039 5.39318 0.00000007 7. A 0.375573566 0.064094682 5.85967 0.00000000 Covariance\Correlation Matrix of Residuals Q Q Q 3.286454389751 0.9815996605 Q 4.197924168364 5.565097402593 nonlin(parmset=structural) c0 c1 c2 d0 d1 d2 d3 compute c0 = .1 compute c1 = .1 compute c2 = .1 compute d0 = .1 compute d1 = .1 compute d2 = .1 compute d3 = .1 frml d_eq q = c0 + c1*p + c2*d frml s_eq q = d0 + d1*p + d2*f + d3*a nlsystem(inst,parmset=structural,outsigma=v) * * d_eq s_eq GMM-No ZU Dependence Convergence in 6 Iterations. 
Final criterion was  0.0000065 <  0.0000100
Usable Observations          20
Function Value       2.98311941
J-Specification(1)     2.983119
Significance Level of J      0.08413697

Dependent Variable Q
Mean of Dependent Variable     100.89820000
Std Error of Dependent Variable  3.75649822
Standard Error of Estimate       1.81285807
Sum of Squared Residuals       65.729087790
Durbin-Watson Statistic            2.009220

Dependent Variable Q
Mean of Dependent Variable     100.89820000
Std Error of Dependent Variable  3.75649822
Standard Error of Estimate       2.35905449
Sum of Squared Residuals      111.30276210
Durbin-Watson Statistic            2.094461

Variable      Coeff          Std Error      T-Stat    Signif
*******************************************************************************
1.  C0       94.63330387     7.30265212    12.95876  0.00000000
2.  C1       -0.24355654     0.08895412    -2.73800  0.00618138
3.  C2        0.31399179     0.04327991     7.25491  0.00000000
4.  D0       52.55266757    11.39562399     4.61165  0.00000399
5.  D1        0.22705697     0.09563077     2.37431  0.01758185
6.  D2        0.22449664     0.04162604     5.39318  0.00000007
7.  D3        0.37557357     0.06409468     5.85967  0.00000000

4.4 Exactly identified systems

Table 4.6 shows the Kmenta supply and demand model modified to be exactly identified. In this form of the model the exogenous variable a was removed from the supply equation and from the exogenous list, which makes both equations exactly identified. In this case the reduced form π can be estimated directly with OLS and does not have to be calculated from the structural coefficients using (4.1-4). It will be shown below that the LIML, 2SLS and 3SLS results are all the same. If π is calculated from the biased OLS estimates of the structural equations of an overidentified system, it will, however, not be the same.
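Because every equation of the modified system is exactly identified, the unrestricted reduced form obtained by OLS of each endogenous variable on all the exogenous variables coincides with the constrained reduced form. A sketch (assuming numpy) using the Table 4.3 data:

```python
import numpy as np

# Kmenta (1971) data from Table 4.3: columns q, p, d, f, a
data = np.array([
    [ 98.485, 100.323,  87.4,  98.0,  1], [ 99.187, 104.264,  97.6,  99.1,  2],
    [102.163, 103.435,  96.7,  99.1,  3], [101.504, 104.506,  98.2,  98.1,  4],
    [104.240,  98.001,  99.8, 110.8,  5], [103.243,  99.456, 100.5, 108.2,  6],
    [103.993, 101.066, 103.2, 105.6,  7], [ 99.900, 104.763, 107.8, 109.8,  8],
    [100.350,  96.446,  96.6, 108.7,  9], [102.820,  91.228,  88.9, 100.6, 10],
    [ 95.435,  93.085,  75.1,  81.0, 11], [ 92.424,  98.801,  76.9,  68.6, 12],
    [ 94.535, 102.908,  84.6,  70.9, 13], [ 98.757,  98.756,  90.6,  81.4, 14],
    [105.797,  95.119, 103.1, 102.3, 15], [100.225,  98.451, 105.1, 105.0, 16],
    [103.522,  86.498,  96.4, 110.5, 17], [ 99.929, 104.016, 104.4,  92.5, 18],
    [105.223, 105.769, 110.7,  89.3, 19], [106.232, 113.490, 127.1,  93.0, 20]])
q, p, d, f = data[:, 0], data[:, 1], data[:, 2], data[:, 3]

# Unrestricted reduced form: OLS of each endogenous variable on (constant, d, f)
X = np.column_stack([np.ones(len(q)), d, f])
bq = np.linalg.lstsq(X, q, rcond=None)[0]
bp = np.linalg.lstsq(X, p, rcond=None)[0]
print(bq)   # ~ 71.73  0.1828  0.1174
print(bp)   # ~ 85.18  0.4346 -0.2852
```

The q-equation values match the OLS result reported below, and both equations match the constrained reduced form printed in the LIMLE output, illustrating the exact-identification equivalence.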
Table 4.6 Exactly Identified Kmenta Problem

/; Modified PROBLEM FROM KMENTA (1971) PAGE 565 - 582
b34sexec options ginclude('b34sdata.mac') member(kmenta); b34srun;
b34sexec simeq printsys reduced ols liml ls2 ls3 ils3 icov ipr=6
               itmax=2000 kcov=diag ;
heading=('Modified test case from kmenta 1971 pp 565-582' ) ;
* the variable a has been removed from the supply equation and the exogenous list ;
exogenous constant d f ;
endogenous p q ;
model lvar=q rvar=(constant p d) name=('demand eq.') ;
model lvar=q rvar=(constant p f) name=('supply eq.') ;
b34seend ;

b34sexec matrix;
call loaddata;
call olsq(q d f :print);
call olsq(p d f :print);
b34srun;

Edited output from running the code in Table 4.6 is shown below and illustrates alternative ways to calculate the constrained reduced form:

Q = 71.7276 + .18278 D + .11739 F                (4.4-1)
    (15.93)   (3.86)     (2.67)

P = 85.1843 + .4346 D - .28520 F                 (4.4-2)
    (10.19)   (4.95)    (-3.49)

Equations (4.4-1) and (4.4-2) were estimated with OLS; t-ratios are shown in parentheses.

Modified test case from kmenta 1971 pp 565-582

Least Squares Solution for System Number 1   demand eq.

Condition Number of Matrix is greater than   21.04911571706159
Relative Numerical Error in the Solution     1.301987681166638E-11

LHS Endogenous Variable No. 2  Q

Exogenous Variables (Predetermined)          Std. Error        t
 1 CONSTANT    99.89542      7.519362        13.28509
 2 D          0.3346356      0.4542183E-01    7.367285

Endogenous Variables (Jointly Dependent)     Std. Error        t
 3 P         -0.3162988      0.9067741E-01   -3.488177

Residual Variance for Structural Disturbances   3.725391173733892
Ratio of Norm Residual to Norm LHS              1.762488253954560E-02

Modified test case from kmenta 1971 pp 565-582

Least Squares Solution for System Number 2   supply eq.
Condition Number of Matrix is greater than
Relative Numerical Error in the Solution
LHS Endogenous Variable No. 2
Exogenous Variables (Predetermined)
 1 CONSTANT
 2 F
17.64779394899586 1.349763156429639E-11 Q 65.56501 0.2137827 Endogenous Variables (Jointly Dependent) 3 P 0.1467363 Residual Variance for Structural Disturbances Ratio of Norm Residual to Norm LHS Std. Error 12.76481 0.5080064E-01 t 5.136387 4.208269 Std. Error 0.1089446 t 1.346889 7.650185613573186 2.525668087747731E-02 Modified test case from kmenta 1971 pp 565-582 Coefficients of the Reduced Form Equations. Least Squares Solution. Condition number of matrix used to find the reduced form coefficients is no smaller than P CONSTANT D F 1 2 3 4.319326581036200 Q 1 74.14 0.7227 -0.4617 2 76.44 0.1060 0.1460 Mean sum of squares of residuals for the reduced form equations. 1 2 P Q 0.21861D+02 0.44308D+01 Condition Number of columns of exogenous variables, 9.7857 Modified test case from kmenta 1971 pp 565-582 Limited Information - Maximum Likelihood Solution f 1 demand eq. Rank and Condition Number of Exogenous Columns Rank and Condition Number of Endogenous Variables orthogonal to X(K) Rank and Condition Number of Endogenous Variables orthogonal to X Value of LIML Parameter is 1.000000000000000 Condition Number of Matrix is greater than Relative Numerical Error in the Solution LHS Endogenous Variable No. 2 8.517463415017575 4.390231825107355E-12 Q Standard Deviation Equals 2SLSQ Standard Deviation. Exogenous Variables (Predetermined) 1 CONSTANT 2 D 106.7894 0.3616812 Endogenous Variables (Jointly Dependent) 2 1 2 8.5174634 1.0000000 1.0000000 36 Chapter 4 3 P -0.4115989 Residual Variance for Structural Disturbances Ratio of Norm Residual to Norm LHS 3.967444759365652 1.818845186115603E-02 Modified test case from kmenta 1971 pp 565-582 Limited Information - Maximum Likelihood Solution f 2 supply eq. 
Rank and Condition Number of Exogenous Columns Rank and Condition Number of Endogenous Variables orthogonal to X(K) Rank and Condition Number of Endogenous Variables orthogonal to X Value of LIML Parameter is 2 1 2 7.8643511 1.0000000 1.0000000 1.000000000000000 Condition Number of Matrix is greater than Relative Numerical Error in the Solution LHS Endogenous Variable No. 2 7.864351104449048 5.058888259015094E-12 Q Standard Deviation Equals 2SLSQ Standard Deviation. Exogenous Variables (Predetermined) 1 CONSTANT 2 F 35.90387 0.2373297 Endogenous Variables (Jointly Dependent) 3 P 0.4205434 Residual Variance for Structural Disturbances Ratio of Norm Residual to Norm LHS 10.49268888645498 2.957901371051407E-02 Modified test case from kmenta 1971 pp 565-582 Modified test case from kmenta 1971 pp 565-582 Coefficients of the Reduced Form Equations. LIMLE Solution. Condition number of matrix used to find the reduced form coefficients is no smaller than P CONSTANT D F 1 2 3 2.403435013906650 Q 1 85.18 0.4346 -0.2852 2 71.73 0.1828 0.1174 Modified test case from kmenta 1971 pp 565-582 Two Stage Least Squares Solution for System Number Condition Number of Matrix is greater than Relative Numerical Error in the Solution LHS Endogenous Variable No. 2 Exogenous Variables (Predetermined) 1 CONSTANT 2 D 1 demand eq. 32.58122209700925 2.267663108215286E-11 Q 106.7894 0.3616812 Endogenous Variables (Jointly Dependent) 3 P -0.4115989 Residual Variance for Structural Disturbances Ratio of Norm Residual to Norm LHS Std. Error 11.14355 0.5640608E-01 t 9.583069 6.412096 Std. Error 0.1448445 t -2.841660 Theil SE 10.27384 0.5200383E-01 Theil SE 0.1335401 Theil t 10.39430 6.954895 Theil t -3.082213 3.967444759365655 1.818845186115604E-02 Modified test case from kmenta 1971 pp 565-582 Two Stage Least Squares Solution for System Number Condition Number of Matrix is greater than Relative Numerical Error in the Solution LHS Endogenous Variable No. 
2 Exogenous Variables (Predetermined) 1 CONSTANT 2 F 2 supply eq. 22.96654225297699 2.323008755765498E-11 Q 35.90387 0.2373297 Endogenous Variables (Jointly Dependent) 3 P 0.4205434 Residual Variance for Structural Disturbances Ratio of Norm Residual to Norm LHS Std. Error 18.86754 0.6019217E-01 t 1.902944 3.942866 Theil SE 17.39501 0.5549444E-01 Theil t 2.064032 4.276639 Std. Error 0.1660421 t 2.532751 Theil SE 0.1530833 Theil t 2.747154 10.49268888645498 2.957901371051407E-02 Modified test case from kmenta 1971 pp 565-582 Coefficients of the Reduced Form Equations. Two Stage Least Squares Solution Simultaneous Equations Systems Condition number of matrix used to find the reduced form coefficients is no smaller than P CONSTANT D F 1 2 3 37 2.403435013906650 Q 1 85.18 0.4346 -0.2852 2 71.73 0.1828 0.1174 Modified test case from kmenta 1971 pp 565-582 Three Stage Least Squares Solution for System Number LHS Endogenous Variable No. 2 Exogenous Variables (Predetermined) 1 CONSTANT 2 D 1 106.7894 0.3616812 Endogenous Variables (Jointly Dependent) 3 P -0.4115989 Std. Error 11.14355 0.5640608E-01 t 9.583069 6.412096 Std. Error 0.1448445 t -2.841660 Residual Variance (For Structural Disturbances) 2 Exogenous Variables (Predetermined) 1 CONSTANT 2 F Theil SE 10.27384 0.5200383E-01 Theil SE 0.1335401 Theil t 10.39430 6.954895 Theil t -3.082213 3.372328 Three Stage Least Squares Solution for System Number LHS Endogenous Variable No. demand eq. Q 2 supply eq. Q 35.90387 0.2373297 Endogenous Variables (Jointly Dependent) 3 P 0.4205434 Std. Error 18.86754 0.6019217E-01 t 1.902944 3.942866 Theil SE 17.39501 0.5549444E-01 Theil t 2.064032 4.276639 Std. Error 0.1660421 t 2.532751 Theil SE 0.1530833 Theil t 2.747154 Residual Variance (For Structural Disturbances) 8.918786 Coefficients of the Reduced Form Equations. Three Stage Least Squares Solution using Orthogonal Factorization. 
Condition number of matrix used to find the reduced form coefficients is no smaller than P CONSTANT D F 1 2 3 2.403435013906646 Q 1 85.18 0.4346 -0.2852 2 71.73 0.1828 0.1174 Note that the following OLS regressions successfully replicate the constrained reduced form values calculated by LIML, 2SLS and 3SLS models. In such exactly identified models it is possible to proceed from the reduced form to the coefficients of the estimated simultaneous structural model as shown in Table 4.1 for the theoretical model. B34S(r) Matrix Command. d/m/y 13/ 5/08. h:m:s => CALL LOADDATA$ => CALL OLSQ(Q D F :PRINT)$ Ordinary Least Squares Estimation Dependent variable Centered R**2 Adjusted R**2 Residual Sum of Squares Residual Variance Standard Error Total Sum of Squares Log Likelihood Mean of the Dependent Variable Std. Error of Dependent Variable Sum Absolute Residuals F( 2, 17) F Significance 1/Condition XPX Maximum Absolute Residual Number of Observations Variable D F CONSTANT Lag 0 0 0 Coefficient 0.18278440 0.11738935 71.727578 8: 9:49. Q 0.7142164973143195 0.6805949087630629 76.62264354549249 4.507214326205441 2.123020095572682 268.1142991999998 -41.81037433562074 100.8982000000000 3.756498223780113 32.24420684107891 21.24279452844673 0.9999762143066244 5.775396842473943E-07 4.421086526017319 20 SE 0.47299583E-01 0.44030665E-01 4.5035392 t 3.8643977 2.6660816 15.926935 38 => Chapter 4 CALL OLSQ(P D F :PRINT)$ Ordinary Least Squares Estimation Dependent variable Centered R**2 Adjusted R**2 Residual Sum of Squares Residual Variance Standard Error Total Sum of Squares Log Likelihood Mean of the Dependent Variable Std. 
Error of Dependent Variable Sum Absolute Residuals F( 2, 17) F Significance 1/Condition XPX Maximum Absolute Residual Number of Observations Variable D F CONSTANT Lag 0 0 0 Coefficient 0.43463860 -0.28520325 85.184338 P 0.6043888119424351 0.5578463192297805 263.9721582328006 15.52777401369415 3.940529661567611 667.2514989500000 -54.17988429200851 100.0190500000000 5.926086393627488 56.50496104816216 12.98574220495295 0.9996226165906434 5.775396842473943E-07 9.070540097816391 20 SE 0.87792579E-01 0.81725152E-01 8.3590023 t 4.9507442 -3.4897855 10.190730 4.5 Analysis of OLS, 2SLS and 3SLS using Matrix Command The matrix command, documented in Chapter 16, provides a means by which to illustrate the estimation of OLS, 2SLS and 3SLS models using “classic textbook” formulas. Simultaneous Equations Systems 39 Table 4.7 shows code that implements OLS, 2SLS, 3SLS and FIML estimation using these formulas: Table 4.7 Matrix Command Implementation of OLS, 2SLS, 3SLS and FIML /$ /$ Estimates Kmenta Problem with Matrix command. /$ Purpose is to illustrate OLS/2SLS/3SLS/FIML both with /$ SIMEQ and with Matrix Commands. /$ /$ FIML SE same as 3SLS asymptotically (See Greene 5e page 408) /$ /$ Problem Discussed in "Specifying and Diagnostically Testing /$ Econometric Models" Chapter 4 Third Edition /$ %b34slet verbose=0; /$ set =1 to "test" matrix setup. 
Usually set=0 %b34slet dosimeq=1; /$ set =1 to run the SIMEQ command as well as matrix B34SEXEC DATA NOHEAD CORR$ INPUT Q P D F A $ LABEL Q = 'Food consumption per head'$ LABEL P = 'Ratio of Food Prices to consumer prices'$ LABEL D = 'Disposable Income in constant prices'$ LABEL F = 'Ratio of T-1 years price to general P'$ LABEL A = 'Time'$ COMMENT=('KMENTA(1971) PAGE 565 ANSWERS PAGE 582')$ DATACARDS$ 98.485 100.323 87.4 98.0 1 99.187 104.264 97.6 102.163 103.435 96.7 99.1 3 101.504 104.506 98.2 104.240 98.001 99.8 110.8 5 103.243 99.456 100.5 103.993 101.066 103.2 105.6 7 99.900 104.763 107.8 100.350 96.446 96.6 108.7 9 102.820 91.228 88.9 95.435 93.085 75.1 81.0 11 92.424 98.801 76.9 94.535 102.908 84.6 70.9 13 98.757 98.756 90.6 105.797 95.119 103.1 102.3 15 100.225 98.451 105.1 103.522 86.498 96.4 110.5 17 99.929 104.016 104.4 105.223 105.769 110.7 89.3 19 106.232 113.490 127.1 B34SRETURN$ B34SEEND$ 99.1 98.1 108.2 109.8 100.6 68.6 81.4 105.0 92.5 93.0 %b34sif(&dosimeq.eq.1)%then; B34SEXEC SIMEQ PRINTSYS REDUCED OLS LIML LS2 LS3 FIML FIMLC KCOV=DIAG IPR=6$ HEADING=('Test Case from Kmenta (1971) Pages 565 - 582 ' ) $ EXOGENOUS CONSTANT D F A $ ENDOGENOUS P Q $ MODEL LVAR=Q RVAR=(CONSTANT P D) NAME=('Demand Equation')$ MODEL LVAR=Q RVAR=(CONSTANT P F A) NAME=('Supply Equation')$ B34SEEND$ %b34sendif; b34sexec matrix; call loaddata; verbose=0; %b34sif(&verbose.ne.0)%then; verbose=1; 2 4 6 8 10 12 14 16 18 20 40 Chapter 4 %b34sendif; x_1=mfam(catcol(constant p d)); x_2=mfam(catcol(constant p f a)); x_1px_1=transpose(x_1)*x_1; x_2px_2=transpose(x_2)*x_2; x_1py_1=transpose(x_1)*vfam(q); x_2py_2=transpose(x_2)*vfam(q); d1=inv(x_1px_1)*x_1py_1; d2=inv(x_2px_2)*x_2py_2; call print('OLS eq 1 ',d1 ); call print('OLS eq 2 ',d2 ); * 2SLS ; * z_i is right hand side of equation i ; x = mfam(catcol(constant d f a)); xpx = transpose(x)*x; z_1 = mfam(catcol(constant p d) ); z_2 = mfam(catcol(constant p f a)); xpz_1 = transpose(x)*z_1; xpz_2 = transpose(x)*z_2; xpy_1 = 
transpose(x)*vfam(q); xpy_2 = transpose(x)*vfam(q); y_1py_1 = vfam(q)*vfam(q); y_2py_2 = vfam(q)*vfam(q); y_1py_2 = vfam(q)*vfam(q); ls2eq1=inv(transpose(xpz_1)*inv(xpx)*xpz_1)* (transpose(xpz_1)*inv(xpx)*xpy_1); call print('Two stage estimates Equation 1',ls2eq1); fit1=vfam(q)-z_1*ls2eq1; sigma11=(y_1py_1 - (2.*vfam(q)*z_1*ls2eq1) + ls2eq1*transpose(z_1)*z_1*ls2eq1)/17.; if(verbose.ne.0)then; call print('sigma11 ',sigma11:); call print('Residual Variance 1',sigma11*sigma11:); call print('Test 1 ',(fit1*fit1)/ 17.:); call print('Large sample ',(fit1*fit1)/ 20.:); endif; varcoef1=sigma11*inv(transpose(z_1)*x*inv(xpx)*transpose(x)*z_1); call print('Asymptotic Covariance Matrix eq 1 ',varcoef1); ls2eq2=inv(transpose(xpz_2)*inv(xpx)*xpz_2)* (transpose(xpz_2)*inv(xpx)*xpy_2); call print('Two stage estimates Equation 2',ls2eq2); fit2=vfam(q)-z_2*ls2eq2; sigma22=(y_2py_2 - (2.*vfam(q)*z_2*ls2eq2) + ls2eq2*transpose(z_2)*z_2*ls2eq2)/16.; if(verbose.ne.0)then; call print('sigma22 ',sigma22:); call print('Residual Variance 2',sigma22*sigma22:); call print('Test 2 ',(fit2*fit2)/ 16.:); call print('Large Sample ',(fit2*fit2)/ 20.:); endif; Simultaneous Equations Systems sigma12=(y_1py_2 - (vfam(q)*z_1*ls2eq1) - (vfam(q)*z_2*ls2eq2) + ls2eq1*transpose(z_1)*z_2*ls2eq2)/20.; if(verbose.ne.0)call print('test sigma12 ',sigma12); varcoef2=sigma22*inv(transpose(z_2)*x*inv(xpx)*transpose(x)*z_2); call print('Asymptotic Covariance Matrix eq 2 ',varcoef2); * Get sigma(i,j) from fits ; s=mfam(catcol(fit1,fit2)); sigma=(transpose(s)*s)/20.; call print('Large Sample sigma (Jennings) ',sigma); covar1=sigma(1,1)*inv(transpose(xpz_1)*inv(xpx)*xpz_1); covar2=sigma(2,2)*inv(transpose(xpz_2)*inv(xpx)*xpz_2); call print('Estimated Covariance Matrix - Large Sample':); call print(covar1,covar2); ls2se=dsqrt(array(:covar1(1,1),covar1(2,2),covar1(3,3) covar2(1,1),covar2(2,2),covar2(3,3) covar2(4,4))); call print('SE of LS2 Model Equations - Large Sample',ls2se); sssigma(1,1)=sigma(1,1)*(20./17.); 
sssigma(1,2)=sigma(1,2)*(20./dsqrt(17.*16.)); sssigma(2,1)=sigma(2,1)*(20./dsqrt(17.*16.)); sssigma(2,2)=sigma(2,2)*(20./16.); call print('Kmenta (Small Sample Sigma ',sssigma); covar1=sssigma(1,1)*inv(transpose(xpz_1)*inv(xpx)*xpz_1); covar2=sssigma(2,2)*inv(transpose(xpz_2)*inv(xpx)*xpz_2); call print('Estimated Covariance Matrix - Small Sample':); call print(covar1,covar2); ls2se=dsqrt(array(:diag(covar1),diag(covar2))); call print('SE of LS2 Model Equations - Small Sample',ls2se); * LS3 calculation ; xpxinv=inv(xpx); /$ sigma=inv(sssigma); sigma=inv(sigma); term11= sigma(1,1)*(transpose(xpz_1)*xpxinv*xpz_1); term12= sigma(1,2)*(transpose(xpz_1)*xpxinv*xpz_2); term21= sigma(2,1)*(transpose(xpz_2)*xpxinv*xpz_1); term22= sigma(2,2)*(transpose(xpz_2)*xpxinv*xpz_2); left1 =catcol(term11 term12); left2 =catcol(term21 term22); left =catrow(left1 left2); if(verbose.ne.0) call print(term11 term12 term21 term22 left1 left2 left); right1=(sigma(1,1)*(transpose(xpz_1)*xpxinv*xpy_1)) + (sigma(1,2)*(transpose(xpz_1)*xpxinv*xpy_2)); right2=(sigma(2,1)*(transpose(xpz_2)*xpxinv*xpy_1)) + (sigma(2,2)*(transpose(xpz_2)*xpxinv*xpy_2)); right=catrow(right1 right2); 41 42 Chapter 4 call print(right1 right2 right,inv(left)); ls3=inv(left)*right; call print('Three Stage Least Squares ',ls3); ls3se = dsqrt(diag(inv(left))); t3sls=array(norows(ls3):ls3(,1))/afam(ls3se); call print('Three Stage Least Squares SE',ls3se); call print('Three Stage Least Squares t ',t3sls); * FIML following Kmenta (1971) pages 578 - 581 ; * q = f(constant P D ) * q = g(constant p F A) * q = a1 + a2*p + a3*d * q = b1 + b2*p + b3*f ; ; + u1 ; + b4*a + u2; y = transpose(mfam(catcol(q p))); x = transpose(mfam(catcol(constant d f a))); gt= 2.* dfloat(norows(y)); t =dfloat(norows(y)); call print('Using 3sls starting values ',ls3); /$ /$ /$ /$ /$ /$ /$ a1=sfam(ls3(1)); a2=sfam(ls3(2)); a3=sfam(ls3(3)); b1=sfam(ls3(4)); b2=sfam(ls3(5)); b3=sfam(ls3(6)); b4=sfam(ls3(7)); program model; bigb = matrix(2,2: 1.0, -1.0*a2, 
              1.0, -1.0*b2);
biggamma = matrix(2,4:-1.0*a1, -1.0*a3,  0.0,    0.0,
                      -1.0*b1,  0.0,    -1.0*b3, -1.0*b4);
u1u2=bigb*y+biggamma*x;
phi = u1u2*transpose(u1u2);
/$ General purpose FIML setup if there are no identities
/$ For a discussion of Formulas see Kmenta (1971) page 578-581
func=(-1.0*(gt*pi())/2.0)
    - ((t/2.0)*dlog(dmax1(dabs(det(phi)) ,.1d-30) ))
    + ( t     *dlog(dmax1(dabs(det(bigb)),.1d-30) ))
    - (.5*sum(transpose(u1u2)*inv(phi)*u1u2));
call outstring(3, 3,'Function');
call outdouble(36,3,func);
call outdouble(4, 4, a1);
call outdouble(36,4, a2);
call outdouble(55,4, a3);
call outdouble(4 ,5, b1);
call outdouble(36,5, b2);
call outdouble(55,5, b3);
call outdouble(4, 6, b4);
return;
end;
call print(model);
rvec =vector(7:ls3);
ll   =vector(7:) -1.d+2;
uu   =vector(7:) +1.d+3;
call echooff;
call cmaxf2(func :name model :parms a1 a2 a3 b1 b2 b3 b4
            :ivalue rvec :lower ll :upper UU
            :maxit 10000 :maxfun 10000 :maxg 10000 :print);
b34srun;

The matrices X_1 and X_2 are built with the catcol command and the OLS estimates for equations 1 and 2 are respectively D1 and D2. Edited results show:

OLS eq 1
D1 = Vector of 3 elements
     99.8954      -0.316299       0.334636

=> CALL PRINT('OLS eq 2 ',D2 )$

OLS eq 2
D2 = Vector of 4 elements
     58.2754       0.160367       0.248133       0.248302

which are consistent with what was obtained with the simeq command. Next, using the "textbook" 2SLS formulas

δ̂₁ = [Z₁'X(X'X)⁻¹X'Z₁]⁻¹ [Z₁'X(X'X)⁻¹X'y₁]
δ̂₂ = [Z₂'X(X'X)⁻¹X'Z₂]⁻¹ [Z₂'X(X'X)⁻¹X'y₂]          (4.5-1)
σ̂ᵢⱼ = {[ê₁, ê₂]'[ê₁, ê₂]/T}ᵢⱼ

we obtain the 2SLS estimates and the error covariance matrix Σ̂ = {σ̂ᵢⱼ}, which is needed for the 3SLS estimates. Edited results match what was found earlier with simeq. Note that the call echooff; statement has been turned off so that the individual steps of the calculation are echoed.
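The 2SLS algebra of (4.5-1) can also be cross-checked outside of B34S. The Python/numpy sketch below (simulated data; all names are hypothetical, not B34S objects) applies the direct textbook formula and verifies that it agrees with the equivalent two-step form, OLS of y on the right-hand-side variables projected onto the instrument space:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Instruments X = (constant, d, f); the structural error u makes p endogenous
const = np.ones(n)
d = rng.normal(size=n)
f = rng.normal(size=n)
u = rng.normal(size=n)
p = 1.0 + 0.5 * d - 0.8 * f + 0.7 * u + rng.normal(size=n)
y = 2.0 - 0.4 * p + 0.3 * d + u          # "demand" equation

X = np.column_stack([const, d, f])        # instrument set
Z1 = np.column_stack([const, p, d])       # right-hand side of equation 1

# Direct formula: [Z1'X(X'X)^{-1}X'Z1]^{-1} [Z1'X(X'X)^{-1}X'y1]
XtXinv = np.linalg.inv(X.T @ X)
A = Z1.T @ X @ XtXinv @ X.T
delta_2sls = np.linalg.solve(A @ Z1, A @ y)

# Equivalent two-step version: OLS of y on the projected Z1
Z1_hat = X @ XtXinv @ X.T @ Z1
delta_twostep = np.linalg.lstsq(Z1_hat, y, rcond=None)[0]

e1 = y - Z1 @ delta_2sls                  # structural residuals
sigma11 = e1 @ e1 / n                     # large-sample variance estimate
print(delta_2sls, sigma11)
```

The two versions agree to machine precision because the projection matrix X(X'X)⁻¹X' is symmetric and idempotent, so Ẑ₁'Z₁ = Ẑ₁'Ẑ₁.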
Two stage estimates Equation 1 LS2EQ1 = Vector of 94.6333 => 3 -0.243557 FIT1=VFAM(Q)-Z_1*LS2EQ1$ elements 0.313992 44 Chapter 4 => => SIGMA11=(Y_1PY_1 - (2.*VFAM(Q)*Z_1*LS2EQ1) + LS2EQ1*TRANSPOSE(Z_1)*Z_1*LS2EQ1)/17.$ => IF(VERBOSE.NE.0)THEN$ => CALL PRINT('sigma11 => CALL PRINT('Residual Variance => CALL PRINT('Test 1 => CALL PRINT('Large sample ',(FIT1*FIT1)/ 20.:)$ => ENDIF$ => VARCOEF1=SIGMA11*INV(TRANSPOSE(Z_1)*X*INV(XPX)*TRANSPOSE(X)*Z_1)$ => CALL PRINT('Asymptotic Covariance Matrix eq 1 ',VARCOEF1)$ ',SIGMA11:)$ 1',SIGMA11*SIGMA11:)$ ',(FIT1*FIT1)/ 17.:)$ Asymptotic Covariance Matrix eq 1 VARCOEF1= Matrix of 1 2 3 3 1 62.7397 -0.673422 0.493016E-01 by 3 2 -0.673422 0.930922E-02 -0.264190E-02 elements 3 0.493016E-01 -0.264190E-02 0.220371E-02 => => LS2EQ2=INV(TRANSPOSE(XPZ_2)*INV(XPX)*XPZ_2)* (TRANSPOSE(XPZ_2)*INV(XPX)*XPY_2)$ => CALL PRINT('Two stage estimates Equation 2',LS2EQ2)$ Two stage estimates Equation 2 LS2EQ2 = Vector of 49.5324 4 0.240076 elements 0.255606 0.252924 => FIT2=VFAM(Q)-Z_2*LS2EQ2$ => => SIGMA22=(Y_2PY_2 - (2.*VFAM(Q)*Z_2*LS2EQ2) + LS2EQ2*TRANSPOSE(Z_2)*Z_2*LS2EQ2)/16.$ => IF(VERBOSE.NE.0)THEN$ => CALL PRINT('sigma22 => CALL PRINT('Residual Variance 2',SIGMA22*SIGMA22:)$ => CALL PRINT('Test 2 => CALL PRINT('Large Sample ',(FIT2*FIT2)/ 20.:)$ => ENDIF$ ',SIGMA22:)$ ',(FIT2*FIT2)/ 16.:)$ Simultaneous Equations Systems => => SIGMA12=(Y_1PY_2 - (VFAM(Q)*Z_1*LS2EQ1) - (VFAM(Q)*Z_2*LS2EQ2) + LS2EQ1*TRANSPOSE(Z_1)*Z_2*LS2EQ2)/20.$ => IF(VERBOSE.NE.0)CALL PRINT('test sigma12 ',SIGMA12)$ => VARCOEF2=SIGMA22*INV(TRANSPOSE(Z_2)*X*INV(XPX)*TRANSPOSE(X)*Z_2)$ => CALL PRINT('Asymptotic Covariance Matrix eq 2 ',VARCOEF2)$ Asymptotic Covariance Matrix eq 2 VARCOEF2= Matrix of 1 2 3 4 1 144.253 -1.09541 -0.323818 -0.295229 4 by 2 -1.09541 0.998677E-02 0.936222E-03 0.579069E-03 4 elements 3 -0.323818 0.936222E-03 0.223257E-02 0.137681E-02 4 -0.295229 0.579069E-03 0.137681E-02 0.993114E-02 => * GET SIGMA(I,J) FROM FITS $ => S=MFAM(CATCOL(FIT1,FIT2))$ => 
SIGMA=(TRANSPOSE(S)*S)/20.$ => CALL PRINT('Large Sample sigma (Jennings) ',SIGMA)$ Large Sample sigma (Jennings) SIGMA 1 2 = Matrix of 1 3.28645 3.59324 2 by 2 elements 2 3.59324 4.83166 => COVAR1=SIGMA(1,1)*INV(TRANSPOSE(XPZ_1)*INV(XPX)*XPZ_1)$ => COVAR2=SIGMA(2,2)*INV(TRANSPOSE(XPZ_2)*INV(XPX)*XPZ_2)$ => CALL PRINT('Estimated Covariance Matrix - Large Sample':)$ Estimated Covariance Matrix - Large Sample => CALL PRINT(COVAR1,COVAR2)$ COVAR1 1 2 3 COVAR2 1 2 3 4 => = Matrix of 1 53.3287 -0.572408 0.419064E-01 = Matrix of 1 115.402 -0.876328 -0.259055 -0.236183 3 by 2 -0.572408 0.791284E-02 -0.224561E-02 4 by 2 -0.876328 0.798942E-02 0.748977E-03 0.463256E-03 3 elements 3 0.419064E-01 -0.224561E-02 0.187315E-02 4 elements 3 -0.259055 0.748977E-03 0.178606E-02 0.110144E-02 LS2SE=DSQRT(ARRAY(:DIAG(COVAR1),DIAG(COVAR2)))$ 4 -0.236183 0.463256E-03 0.110144E-02 0.794491E-02 45 46 => Chapter 4 CALL PRINT('SE of LS2 Model Equations - Large Sample',LS2SE)$ SE of LS2 Model Equations - Large Sample LS2SE = Array of 7.30265 7 elements 0.889541E-01 0.432799E-01 => SSSIGMA(1,1)=SIGMA(1,1)*(20./17.)$ => SSSIGMA(1,2)=SIGMA(1,2)*(20./DSQRT(17.*16.))$ => SSSIGMA(2,1)=SIGMA(2,1)*(20./DSQRT(17.*16.))$ => SSSIGMA(2,2)=SIGMA(2,2)*(20./16.)$ => CALL PRINT('Kmenta (Small Sample Sigma 10.7425 0.893836E-01 0.422617E-01 0.891342E-01 0.472501E-01 0.996551E-01 ',SSSIGMA)$ Kmenta (Small Sample Sigma SSSIGMA = Matrix of 1 2 1 3.86642 4.35744 2 by 2 elements 2 4.35744 6.03958 => COVAR1=SSSIGMA(1,1)*INV(TRANSPOSE(XPZ_1)*INV(XPX)*XPZ_1)$ => COVAR2=SSSIGMA(2,2)*INV(TRANSPOSE(XPZ_2)*INV(XPX)*XPZ_2)$ => CALL PRINT('Estimated Covariance Matrix - Small Sample':)$ Estimated Covariance Matrix - Small Sample => CALL PRINT(COVAR1,COVAR2)$ COVAR1 1 2 3 COVAR2 1 2 3 4 = Matrix of 1 62.7397 -0.673422 0.493016E-01 = Matrix of 1 144.253 -1.09541 -0.323818 -0.295229 3 by 3 2 -0.673422 0.930922E-02 -0.264190E-02 4 by 3 0.493016E-01 -0.264190E-02 0.220371E-02 4 2 -1.09541 0.998677E-02 0.936222E-03 0.579069E-03 
 3   -0.323818      0.936222E-03   0.223257E-02   0.137681E-02
 4   -0.295229      0.579069E-03   0.137681E-02   0.993114E-02

=> LS2SE=DSQRT(ARRAY(:COVAR1(1,1),COVAR1(2,2),COVAR1(3,3)
      COVAR2(1,1),COVAR2(2,2),COVAR2(3,3) COVAR2(4,4)))$
=> CALL PRINT('SE of LS2 Model Equations - Small Sample',LS2SE)$

SE of LS2 Model Equations - Small Sample

LS2SE = Array of 7 elements
     7.92084  0.964843E-01  0.469437E-01  12.0105  0.999339E-01  0.472501E-01  0.996551E-01

Note that the estimated asymptotic covariance matrix for each equation was calculated as

σ̂₁₁[Z₁'X(X'X)⁻¹X'Z₁]⁻¹        σ̂₂₂[Z₂'X(X'X)⁻¹X'Z₂]⁻¹          (4.5-2)

The SE for each coefficient is the square root of the diagonal elements of the estimated covariance matrix. The 3SLS model is estimated using the "textbook" equation

[δ̂₁]   [ σ̂¹¹Z₁'X(X'X)⁻¹X'Z₁   σ̂¹²Z₁'X(X'X)⁻¹X'Z₂ ]⁻¹ [ σ̂¹¹Z₁'X(X'X)⁻¹X'y₁ + σ̂¹²Z₁'X(X'X)⁻¹X'y₂ ]
[  ] = [                                            ]  [                                          ]   (4.5-3)
[δ̂₂]   [ σ̂²¹Z₂'X(X'X)⁻¹X'Z₁   σ̂²²Z₂'X(X'X)⁻¹X'Z₂ ]   [ σ̂²¹Z₂'X(X'X)⁻¹X'y₁ + σ̂²²Z₂'X(X'X)⁻¹X'y₂ ]

where σ̂ⁱʲ is the (i,j)th element of [Σ̂]⁻¹. Equation (4.5-3) comes directly from Kmenta (1971, 577) and is consistent with Theil (1971, 510). It is to be noted that most modern texts present (4.5-3) in compact stacked (Kronecker product) notation. The estimated output verifies the simeq 3SLS command. In the matrix program output each term in (4.5-3) is broken out and put together into the left and right parts of (4.5-3), which at first looks formidable.
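The assembly of the left and right parts of (4.5-3) can likewise be sketched in Python/numpy on simulated data (hypothetical names, not B34S objects). A useful check, exploited below, is that when Σ̂⁻¹ is replaced by the identity matrix the stacked system becomes block diagonal and 3SLS collapses to equation-by-equation 2SLS:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400

# Instruments X = (constant, d, f, a); two equations sharing endogenous p
const = np.ones(n)
d, f, a = rng.normal(size=(3, n))
u1, u2 = rng.normal(size=(2, n))
p = 1.0 + 0.4*d - 0.6*f + 0.3*a + 0.5*(u1 + u2)
y1 = 2.0 - 0.3*p + 0.2*d + u1
y2 = 1.0 + 0.2*p + 0.1*f + 0.3*a + u2

X  = np.column_stack([const, d, f, a])
Z1 = np.column_stack([const, p, d])
Z2 = np.column_stack([const, p, f, a])

XtXinv = np.linalg.inv(X.T @ X)
def M(Za, Zb):                 # Za'X(X'X)^{-1}X'Zb
    return Za.T @ X @ XtXinv @ X.T @ Zb
def m(Za, yb):                 # Za'X(X'X)^{-1}X'yb
    return Za.T @ X @ XtXinv @ X.T @ yb

def ls3(Sinv):                 # stacked system of (4.5-3)
    left = np.block([[Sinv[0, 0]*M(Z1, Z1), Sinv[0, 1]*M(Z1, Z2)],
                     [Sinv[1, 0]*M(Z2, Z1), Sinv[1, 1]*M(Z2, Z2)]])
    right = np.concatenate([Sinv[0, 0]*m(Z1, y1) + Sinv[0, 1]*m(Z1, y2),
                            Sinv[1, 0]*m(Z2, y1) + Sinv[1, 1]*m(Z2, y2)])
    return np.linalg.solve(left, right)

# 2SLS per equation, used for the residuals and for the special-case check
d1 = np.linalg.solve(M(Z1, Z1), m(Z1, y1))
d2 = np.linalg.solve(M(Z2, Z2), m(Z2, y2))

# Sigma-hat from the 2SLS residuals, then 3SLS with its inverse
E = np.column_stack([y1 - Z1 @ d1, y2 - Z2 @ d2])
Sigma = E.T @ E / n
delta_3sls = ls3(np.linalg.inv(Sigma))
print(delta_3sls)
```

As in the matrix program, each Zᵢ'X(X'X)⁻¹X'Zⱼ term is computed once and then weighted by the elements of Σ̂⁻¹ before being stacked.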
=> * LS3 CALCULATION $ => XPXINV=INV(XPX)$ => SIGMA=INV(SIGMA)$ => TERM11= SIGMA(1,1)*(TRANSPOSE(XPZ_1)*XPXINV*XPZ_1)$ => TERM12= SIGMA(1,2)*(TRANSPOSE(XPZ_1)*XPXINV*XPZ_2)$ => TERM21= SIGMA(2,1)*(TRANSPOSE(XPZ_2)*XPXINV*XPZ_1)$ => TERM22= SIGMA(2,2)*(TRANSPOSE(XPZ_2)*XPXINV*XPZ_2)$ => LEFT1 =CATCOL(TERM11 TERM12)$ => LEFT2 =CATCOL(TERM21 TERM22)$ => LEFT => IF(VERBOSE.NE.0)THEN$ => CALL PRINT(TERM11 TERM12 TERM21 TERM22 LEFT1 LEFT2 LEFT)$ => ENDIF$ => => RIGHT1=(SIGMA(1,1)*(TRANSPOSE(XPZ_1)*XPXINV*XPY_1)) + (SIGMA(1,2)*(TRANSPOSE(XPZ_1)*XPXINV*XPY_2))$ => => RIGHT2=(SIGMA(2,1)*(TRANSPOSE(XPZ_2)*XPXINV*XPY_1)) + (SIGMA(2,2)*(TRANSPOSE(XPZ_2)*XPXINV*XPY_2))$ => RIGHT=CATROW(RIGHT1 RIGHT2)$ =CATROW(LEFT1 LEFT2)$ 48 => Chapter 4 CALL PRINT(RIGHT1 RIGHT2 RIGHT,INV(LEFT))$ RIGHT1 = Vector of 3 842.104 RIGHT2 RIGHT 84261.3 = Vector of -208.606 elements 82406.3 4 elements -20873.2 = Matrix of -20220.4 7 by 1 elements 7 by 7 elements -2196.91 1 842.104 84261.3 82406.3 -208.606 -20873.2 -20220.4 -2196.91 1 2 3 4 5 6 7 Matrix of 1 53.3287 -0.572408 0.419064E-01 52.0707 -0.556756 0.337445E-01 0.509185E-01 1 2 3 4 5 6 7 2 -0.572408 0.791284E-02 -0.224561E-02 -0.291667 0.494945E-02 -0.180825E-02 -0.272854E-02 3 0.419064E-01 -0.224561E-02 0.187315E-02 -0.232929 0.632767E-03 0.150833E-02 0.227598E-02 => LS3=INV(LEFT)*RIGHT$ => CALL PRINT('Three State Least Squares ',LS3)$ 4 52.0707 -0.291667 -0.232929 113.162 -0.866671 -0.235979 -0.327163 5 -0.556756 0.494945E-02 0.632767E-03 -0.866671 0.794779E-02 0.649506E-03 0.855426E-03 6 0.337445E-01 -0.180825E-02 0.150833E-02 -0.235979 0.649506E-03 0.154836E-02 0.203856E-02 7 0.509185E-01 -0.272854E-02 0.227598E-02 -0.327163 0.855426E-03 0.203856E-02 0.425029E-02 Three Stage Least Squares LS3 = Matrix of 7 by 1 elements 1 94.6333 -0.243557 0.313992 52.1176 0.228932 0.228978 0.357907 1 2 3 4 5 6 7 => LS3SE = DSQRT(DIAG(INV(LEFT)))$ => CALL PRINT('Three State Least Squares SE',LS3SE)$ Three State Least Squares SE LS3SE = Vector of 7.30265 7 
0.889541E-01 elements 0.432799E-01 10.6378 0.891504E-01 0.393493E-01 0.651943E-01

The estimated standard errors are those suggested by Theil. The FIML estimation method requires a maximization procedure. Kmenta (1971) shows that for a model without constraints FIML maximizes

L = -(GT/2) log(2π) - (T/2) log|Σ| + T log|B| - (1/2) Σₜ₌₁ᵀ (Byₜ + Γxₜ)' Σ⁻¹ (Byₜ + Γxₜ)      (4.5-4)

where G = M, the number of equations in the model. The Kmenta test problem can be written

q = α₁ + α₂P + α₃D + u₁              Demand
q = β₁ + β₂P + β₃F + β₄A + u₂        Supply        (4.5-5)

For this problem

B = | 1  -α₂ |     Γ = | -α₁  -α₃   0    0   |     Σ = | σ₁₁  σ₁₂ |
    | 1  -β₂ |         | -β₁   0   -β₃  -β₄ |         | σ₁₂  σ₂₂ |

and |B| and |Σ| refer to the Jacobian, or absolute value of the determinant, of B and Σ respectively. Using the matrix command it is fairly easy to implement this estimator. Problems can arise, however, if there are local maxima. The edited FIML results are given next.

=> PROGRAM MODEL$
=> CALL PRINT(MODEL)$

MODEL = Program

PROGRAM MODEL$
BIGB = MATRIX(2,2: 1.0, -1.0*A2,
              1.0, -1.0*B2)$
BIGGAMMA = MATRIX(2,4:-1.0*A1, -1.0*A3, 0.0, 0.0,
                      -1.0*B1, 0.0, -1.0*B3, -1.0*B4)$
U1U2=BIGB*Y+BIGGAMMA*X$
PHI = U1U2*TRANSPOSE(U1U2)$
FUNC=(-1.0*(GT*PI())/2.0)
    - ((T/2.0)*DLOG(DMAX1(DABS(DET(PHI)) ,.1D-30) ))
    + ( T     *DLOG(DMAX1(DABS(DET(BIGB)),.1D-30) ))
    - (.5*SUM(TRANSPOSE(U1U2)*INV(PHI)*U1U2))$
CALL OUTSTRING(3, 3,'Function')$
CALL OUTDOUBLE(36,3,FUNC)$
CALL OUTDOUBLE(4, 4, A1)$
CALL OUTDOUBLE(36,4, A2)$
CALL OUTDOUBLE(55,4, A3)$
CALL OUTDOUBLE(4 ,5, B1)$
CALL OUTDOUBLE(36,5, B2)$
CALL OUTDOUBLE(55,5, B3)$
CALL OUTDOUBLE(4, 6, B4)$
RETURN$
END$

=> RVEC =VECTOR(7:LS3)$
=> LL   =VECTOR(7:) -1.D+2$
=> UU   =VECTOR(7:) +1.D+3$
=> CALL ECHOOFF$

Constrained Maximum Likelihood Estimation using CMAXF2 Command

Final Functional Value           -13.37570521223952
# of parameters                    7
# of good digits in function      15
# of iterations                   28
# of function evaluations         55
# of gradiant evaluations         30
Scaled Gradient Tolerance          6.055454452393343E-06
Scaled Step Tolerance Relative Function Tolerance False Convergence
Tolerance Maximum allowable step size Size of Initial Trust region 1 / Cond. of Hessian Matrix # 1 2 3 4 5 6 7 Name A1 A2 A3 B1 B2 B3 B4 3.666852862501036E-11 3.666852862501036E-11 2.220446049250313E-14 108037.5007234256 -1.000000000000000 2.229180241990960E-09 Coefficient 93.619219 -0.22953804 0.31001341 51.944511 0.23730613 0.22081875 0.36970888 Standard Error 3.4191227 0.60544227E-01 0.34296485E-01 7.3541629 0.45456398E-01 0.28752980E-01 0.14370566E-01 T Value 27.381064 -3.7912458 9.0392183 7.0632799 5.2205221 7.6798560 25.726814 SE calculated as sqrt |diagonal(inv(%hessian))| Hessian Matrix 1 230.516 23086.3 22522.1 -174.305 -17457.4 -16823.6 -1834.45 1 2 3 4 5 6 7 2 23089.2 0.231266E+07 0.225634E+07 -17456.3 -0.174875E+07 -0.168477E+07 -183704. 3 22524.9 0.225660E+07 0.220289E+07 -17029.9 -0.170618E+07 -0.164463E+07 -179499. 4 -174.328 -17458.5 -17032.0 135.877 13607.8 13115.4 1430.03 5 -17459.8 -0.174897E+07 -0.170639E+07 13609.6 0.136313E+07 0.131342E+07 143201. 6 -16825.9 -0.168498E+07 -0.164483E+07 13117.1 0.131360E+07 0.126732E+07 137898. 7 -1834.71 -183728. -179522. 1430.22 143221. 137918. 15323.9 Gradiant Vector -0.568518E-06 -0.557801E-04 -0.544320E-04 0.447704E-06 0.438995E-04 0.419615E-04 0.528029E-05 Lower vector -100.000 -100.000 -100.000 -100.000 -100.000 -100.000 -100.000 1000.00 1000.00 1000.00 1000.00 1000.00 1000.00 Upper vector 1000.00 B34S Matrix Command Ending. Last Command reached. Space available in allocator Number variables used Number temp variables used 7873665, peak space used 130, peak number used 36882, # user temp clean 8277 135 0 and replicate the Kmenta test values for coefficients. The simeq FIML results are: Test Case from Kmenta (1971) Pages 565 - 582 Functional Minimization Solution for System No. LHS Endogenous Variable No. 2 1 Demand Equation Q Exogenous Variables (Predetermined) 1 CONSTANT 2 D 93.61922 0.3100134 Std. 
Error 6.152863 0.3633922E-01 t 15.21555 8.531097 Theil SE 5.672659 0.3350311E-01 Theil t 16.50359 9.253274 Endogenous Variables (Jointly Dependent) 3 P -0.2295381 Std. Error 0.7508118E-01 t -3.057199 Theil SE 0.6922143E-01 Theil t -3.315998 Residual Variance (For Structural Disturbances) 3.337108 Functional Minimization 3SLS Covariance for System CONSTANT D P 1 2 3 CONSTANT D 1 2 37.86 0.3121E-01 0.1321E-02 -0.4078 -0.1600E-02 Demand Equation P 3 0.5637E-02 Functional Minimization Solution for System No. 2 Supply Equation Simultaneous Equations Systems LHS Endogenous Variable No. 2 51 Q Exogenous Variables (Predetermined) 1 CONSTANT 2 F 3 A 51.94451 0.2208188 0.3697089 Std. Error 9.739647 0.3489965E-01 0.5846143E-01 t 5.333305 6.327249 6.323981 Theil SE 8.711405 0.3121520E-01 0.5228949E-01 Theil t 5.962816 7.074080 7.070425 Endogenous Variables (Jointly Dependent) 4 P 0.2373061 Std. Error 0.8237774E-01 t 2.880707 Theil SE 0.7368089E-01 Theil t 3.220728 Residual Variance (For Structural Disturbances) 5.620947 Functional Minimization 3SLS Covariance for System CONSTANT F A P CONSTANT 1 94.86 -0.1858 -0.3119 -0.7341 1 2 3 4 F A Supply Equation P 2 3 4 0.1218E-02 0.1943E-02 0.4772E-03 0.3418E-02 0.8825E-03 0.6786E-02 Test Case from Kmenta (1971) Pages 565 - 582 Contemporaneous Covariance of Residuals (Structural Disturbances) For Functional Minimization 3SLSQ Solution. Condition Number of residual columns, Demand E Supply E Demand E 1 3.337 4.255 1 2 6.942988 Supply E 2 5.621 Correlation Matrix of Residuals Demand E 1 1 1.000 2 0.9824 Demand E Supply E Supply E 2 1.000 Test Case from Kmenta (1971) Pages 565 - 582 Coefficients of the Reduced Form Equations. Condition number of matrix used to find the reduced form coefficients is no smaller than P CONSTANT D F A 1 2 3 4 1 89.27 0.6641 -0.4730 -0.7919 4.284084281338983 Q 2 73.13 0.1576 0.1086 0.1818 Mean sum of squares of residuals for the reduced form equations. 
1 2 P Q 0.20588D+01 0.43479D+01

and give identical coefficients but different SEs due to the algorithm used. Greene (2003, page 408) notes that "asymptotically the covariance matrix for the FIML estimator is the same as that for the 3SLS estimator." The purpose of this exercise has been to illustrate how "textbook" formulas can be used with a programming language, such as the matrix command, to produce 2SLS, 3SLS and FIML estimates fairly easily, where the alternative would be to build a C or Fortran program to perform the calculation. Since "textbook" formulas are used for the matrix example, the accuracy of these calculations is inferior to the QR approach of Jennings (1980), which is the basis for the simeq command. Inspection of the matrix program that implements these estimators may give the reader confidence to tackle other calculations that have not been implemented in commercial software.11 The matrix examples shown have been coded for teaching purposes (clarity of the code), not research purposes. Many components of the calculation that appear in a number of places in a formula such as (4.5-3) have not been calculated once and saved.

11 The modern pace of research is so fast that if one waits until a new procedure is implemented in commercial software, often it is too late.

4.6 LS2 and GMM Models and Specification tests

The Generalized Method of Moments estimation technique is a generalization of 2SLS that allows for various assumptions on the error distribution. Assume there are l instruments in Z.
The basic idea of GMM is to select coefficients β̂_GMM such that

g(β̂_GMM) = 0      (4.6-1)

where

g(β̂) = (1/N) Σᵢ₌₁ᴺ gᵢ(β) = (1/N) Σᵢ₌₁ᴺ zᵢ'(yᵢ - xᵢβ) = (1/N) Z'u      (4.6-2)

It can be shown that the efficient GMM estimator is

β̂_EGMM = (X'Z S⁻¹ Z'X)⁻¹ X'Z S⁻¹ Z'y      (4.6-3)

where

S = E[Z'uu'Z] = E[Z'ΩZ]      (4.6-4)

Using the 2SLS residuals, a heteroskedasticity-consistent estimator of S can be obtained as

Ŝ = (1/N) Σᵢ₌₁ᴺ ûᵢ² zᵢ'zᵢ      (4.6-5)

which has been characterized as a standard sandwich approach to robust covariance estimation. For more details see Davidson and MacKinnon (1993, 607-610) and Baum (2006, 194-197).

Hall-Rudebusch-Wilcox (1996) proposed a likelihood ratio test of the relevance of the instrumental variables Z that is based on the canonical correlations rᵢ between X and Z. The ordered canonical correlation vector can be calculated as the square roots of the eigenvalues of

(X'X)⁻¹(X'Z)(Z'Z)⁻¹(Z'X)      (4.6-6)

with associated eigenvectors φᵢ, or as the square roots of the eigenvalues of

(Z'Z)⁻¹(Z'X)(X'X)⁻¹(X'Z)      (4.6-7)

with associated eigenvectors δᵢ. The vectors φ₁ and δ₁ maximize the correlation between Xφ₁ and Zδ₁, which equals r₁. As noted by Hall-Rudebusch-Wilcox (1996, 287), "φⱼ and δⱼ are the vectors which yield the jth highest correlation rⱼ subject to the constraints that Xφⱼ and Zδⱼ are orthogonal." The proposed Anderson statistic

LR = -T Σᵢ₌ⱼ₊₁ⁿ log(1 - rᵢ²)      (4.6-8)

is distributed as chi-squared with (l - k + 1) degrees of freedom, where l is the rank of Z and k is the rank of X, and can be applied to both 2SLS and GMM models. A significant statistic is consistent with appropriate instruments. A disadvantage of the Anderson test is that it assumes that the regressors are distributed multivariate normal. Further information on the Anderson test is in Baum (2006, 208).
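A minimal Python/numpy sketch of the two-step efficient GMM estimator (4.6-3) with the heteroskedasticity-robust weight matrix (4.6-5) and the Hansen (1982) J statistic discussed below follows; the data and names are simulated for illustration only. Note that in this section Z holds the instruments and X the regressors, the reverse of the 2SLS notation used earlier in the chapter. The test exploits the fact that when the number of instruments equals the number of regressors, the GMM estimator reduces to simple IV regardless of the choice of S:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# One endogenous regressor x2; instruments z1, z2 (overidentified: l=3, k=2)
z1, z2, v = rng.normal(size=(3, n))
u = rng.normal(size=n) * (1.0 + 0.5 * np.abs(z1))     # heteroskedastic error
x2 = 0.8*z1 - 0.5*z2 + 0.6*u + v
y = 1.0 + 0.5*x2 + u

X = np.column_stack([np.ones(n), x2])                 # regressors
Z = np.column_stack([np.ones(n), z1, z2])             # instruments

def gmm(X, Z, y, Sinv):
    # (X'Z Sinv Z'X)^{-1} X'Z Sinv Z'y, as in (4.6-3)
    A = X.T @ Z @ Sinv
    return np.linalg.solve(A @ Z.T @ X, A @ Z.T @ y)

# Step 1: 2SLS (GMM with S proportional to Z'Z), then robust S-hat (4.6-5)
b1 = gmm(X, Z, y, np.linalg.inv(Z.T @ Z))
u1 = y - X @ b1
S = (Z * (u1**2)[:, None]).T @ Z / n

# Step 2: efficient GMM, then the Hansen J statistic (chi-square, l - k df)
b_egmm = gmm(X, Z, y, np.linalg.inv(S))
u2 = y - X @ b_egmm
gbar = Z.T @ u2 / n
J = n * gbar @ np.linalg.inv(S) @ gbar
print(b_egmm, J)
```

Here J is written in the normalized form N·ḡ'Ŝ⁻¹ḡ, which is the value of the efficient GMM objective function of (4.6-9) given the 1/N scaling used in (4.6-5).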
The Anderson statistic can also be displayed in LM form as N \min(r_i^2), or in the Cragg-Donald (1993) form as (N \min(r_i^2)) / (1 - \min(r_i^2)). If these statistics are not significant, the instruments selected are weak. For GMM estimation the Hansen (1982) J statistic, which tests for overidentifying restrictions, is usually used. The Hansen test, which is also called the Sargan (1958) test, is the value of the efficient GMM objective function

u'Z S^{-1} Z'u                                                                   (4.6-9)

and is distributed as chi-square with l-k degrees of freedom. A significant value indicates the selected instruments are not suitable. The Basmann (1960) overidentification test is

\frac{(u_{LS2}' u_{LS2} - u_Z' u_Z)}{u_Z' u_Z} (N - l)                           (4.6-10)

where u_{LS2} is the residual from the LS2 equation and u_Z is the residual from a model that predicts u_{LS2} as a function of Z. The Basmann test is distributed as chi-square with l-k degrees of freedom. If the instruments Z have no predictive power, or in other words are orthogonal to the LS2 residuals, then u_{LS2}' u_{LS2} \approx u_Z' u_Z and the chi-square value will not be significant. A significant chi-square value, however, indicates that the instruments are not suitable since they are not exogenous.

Table 4.8 lists subroutines LS2 and GMMEST that estimate 2SLS and GMM models respectively. For an exactly identified system, the LS2 and GMM estimates will be the same. For an overidentified system, GMM is more efficient.

Table 4.8 LS2 and General Method of Moments estimation routines

/;
/; Loads LS2 and GMMEST
/;
subroutine ls2(y1,x1,z1,names,yvar,iprint);
/;
/; y1     => left hand side. Usually set as %y from OLS
/; x1     => right hand side. Usually set as %x from OLS step
/; z1     => instrumental Variables
/; names  => Names from OLS step.
/;           Usually set as %names
/; yvar   => usually set from call olsq as %yvar
/; iprint => =1 print coef, =2 print covariance in addition
/;
/; if # of obs for z1 < x1 then x1 will be truncated
/;
/; Automatic variables created
/; %olscoef => OLS Coefficients
/; %ols_se  => OLS SE
/; %ols_t   => OLS t
/; %ls2coef => LS2 Coefficients
/; %ls2_sel => Large Sample LS2 SE
/; %ls2_ses => Small Sample LS2 SE
/; %ls2_t_l => Large Sample LS2 t
/; %ls2_t_s => Small Sample LS2 t
/; %rss_ols => e'e for OLS
/; %rss_ls2 => e'e for LS2
/; %yhatols => yhat for OLS
/; %yhatls2 => yhat for LS2
/; %resols  => OLS Residual
/; %resls2  => LS2 Residual
/; %covar1  => Large Sample covariance
/; %sigma_l => Large Sample sigma
/; %sigma_s => Small Sample Sigma
/; %z
/; %info    => Model is ok if = 0
/; For conditional Heteroskedasticity Sargan(1958)=Hansen(1982) j test
/; %sargan  => Sargan(1958) test
/; %basmann => Basmann(1960)
/;
/; Example Job:
/;
/; b34sexec options ginclude('b34sdata.mac') member(kmenta);
/; b34srun;
/;
/; b34sexec matrix;
/; call loaddata;
/; call echooff;
/; call print('OLS for Equation # 1':);
/; call olsq(q p d :savex :print);
/; call ls2a(%y,%x,catcol(d,f,a,constant),%names,%yvar,1);
/;
/; call print('OLS for Equation # 2':);
/; call olsq(q p a f: a :savex :print);
/; call ls2a(%y,%x,catcol(d,f,a,constant),%names,%yvar,1);
/; b34srun;
/;
/; Command built 26 April 2010, Mods 26 May 2010 2 August 2010
/;
y  =vfam(y1);
%z =mfam(z1);
x  =mfam(x1);
n1=norows(%z);
n2=norows(x);
if(n2.lt.n1)call deleterow(%z,1,(n1-n2));
if(n1.lt.n2)then;
call epprint('ERROR: # obs for instruments < # obs for equation');
go to done;
endif;
/; This saves the OLS Results
call olsq(y x :noint);
%olscoef=%coef;
%ols_se=%se;
%ols_t =%t;
n_k=%nob-%k;
%rss_ols=%rss;
%yhatols=%yhat;
%resols =%res;
* 2SLS ;
zpz = transpose(%z)*%z;
zpx = transpose(%z)*x;
zpy = transpose(%z)*y;
ypy = y*y;
irank=rank(zpx);
iorder=rank(zpz);
/;
if(iorder.lt.irank)then;
call epprint('ERROR: Model
Underidentified.':);
go to done;
endif;
/;
%ls2coef =inv(transpose(zpx)*inv(zpz)*zpx)*
          (transpose(zpx)*inv(zpz)*zpy);
/;
/; Error trap turned off
/;
/; call gminv((transpose(zpx)*inv(zpz)*zpx),%ls2coef,%info,rrcond);
/; if(%info.ne.0)then;
/; go to done;
/; endif;
/; %ls2coef=%ls2coef*(transpose(zpx)*inv(zpz)*zpy);
%yhatls2=x*%ls2coef;
%resls2 =y-%yhatls2;
sigma_w=(ypy - (2.*y*x*%ls2coef) + %ls2coef*transpose(x)*x*%ls2coef)/dfloat(n_k);
varcoef=sigma_w*inv(transpose(x)*%z*inv(zpz)*transpose(%z)*x);
%ls2_ses=dsqrt(diag(varcoef));
* Get sigma(i,j) from fits ;
%rss_ls2=sumsq(%resls2);
%sigma_l=%rss_ls2/dfloat(%nob);
%sigma_s=%rss_ls2/dfloat(n_k);
%covar_1=%sigma_l*inv(transpose(zpx)*inv(zpz)*zpx);
%ls2_sel=dsqrt(diag(%covar_1));
%ls2_t_s=afam(%ls2coef)/afam(%ls2_ses);
%ls2_t_l=afam(%ls2coef)/afam(%ls2_sel);
/;
/; squared canonical correlations
/;
if(iprint.ne.0)then;
can_corr=real(eig(inv(transpose(x)*x)*(transpose(x)*%z)*inv(zpz)*zpx));
call print(can_corr);
anderson=-1.*dfloat(norows(%z))
         *dlog(sum(kindas(%z,1.0)-afam(can_corr)));
anderlm = dfloat(norows(%z))*min(can_corr);
cragg_d = anderlm/(1.0 - min(can_corr));
endif;
/;
/; %sargan & %basmann
/;
call olsq(%resls2 %z :noint);
%basmann=(dfloat( norows(%z)-nocols(%z))*(sumsq(%resls2)-%rss))/%rss;
%sargan = dfloat(norows(%z))*%rsq;
/;
if(iprint.ne.0)then;
call print(' ':);
call print('OLS and LS2 Estimation':);
call print(' ':);
gg= 'Dependent Variable ';
gg2=c1array(8:yvar);
ff=catrow(gg,gg2);
call print(ff:);
call print('OLS Sum of squared Residuals ',%rss_ols:);
call print('LS2 Sum of squared Residuals ',%rss_ls2:);
call print('Large Sample ls2 sigma ',%sigma_l:);
call print('Small Sample ls2 sigma ',%sigma_s:);
call print('Rank of Equation ',irank:);
call print('Order of Equation ',iorder:);
if(irank.lt.iorder)call print('Equation is overidentified':);
if(irank.eq.iorder)call print('Equation is exactly identified':);
/;
call print('Anderson LR ident./IV Relevance test ',anderson:);
/;
if(iorder.ge.irank.and.anderson.gt.0.0)then;
aprob=chisqprob(anderson,dfloat(iorder+1-irank));
call print('Significance of Anderson LR Statistic',aprob:);
endif;
/;
call print('Anderson Canon Correlation LM test ',anderlm:);
/;
if(iorder.ge.irank.and.anderlm.gt.0.0)then;
aprob=chisqprob(anderlm,dfloat(iorder+1-irank));
call print('Significance of Anderson LM Statistic',aprob:);
endif;
/;
call print('Cragg-Donald Chi-Square Weak ID Test ',cragg_d:);
/;
if(iorder.ge.irank.and.cragg_d.gt.0.0)then;
aprob=chisqprob(cragg_d,dfloat(iorder+1-irank));
call print('Significance of Cragg-Donald test ',aprob:);
endif;
/;
call print('Basmann ',%basmann:);
/;
if(iorder.gt.irank.and.%basmann.gt.0.0)then;
bprob=chisqprob(%basmann,dfloat(iorder-irank));
call print('Significance of Basmann Statistic ',bprob:);
endif;
/;
call print('Sargan N*R-sq / J-Test Test ',%sargan:);
/;
if(iorder.gt.irank.and.%sargan.gt.0.0)then;
sprob=chisqprob(%sargan,dfloat(iorder-irank));
call print('Significance of Sargan Statistic ',sprob:);
endif;
/;
call tabulate(names,%olscoef,%ols_se,%ols_t,%ls2coef,
     %ls2_ses,%ls2_sel,%ls2_t_s,%ls2_t_l
     :title '+++++++++++++++++++++++++++++++++++++++++++++++++++++');
call print(' ':);
if(iprint.eq.2)
call print('Estimated Covariance Matrix - Large Sample',%covar_1);
endif;
/;
call makeglobal(%olscoef);
call makeglobal(%ols_se);
call makeglobal(%ols_t);
call makeglobal(%ls2coef);
call makeglobal(%ls2_sel);
call makeglobal(%ls2_ses);
call makeglobal(%ls2_t_l);
call makeglobal(%ls2_t_s);
call makeglobal(%rss_ols);
call makeglobal(%rss_ls2);
call makeglobal(%yhatols);
call makeglobal(%yhatls2);
call makeglobal(%resols);
call makeglobal(%resls2);
call makeglobal(%covar_1);
call makeglobal(%sigma_l);
call makeglobal(%sigma_s);
call makeglobal(%z);
call makeglobal(%sargan);
call makeglobal(%basmann);
/; call makeglobal(%info);
/;
done continue;
return;
end;

subroutine gmmest(y,x,z,names,yvar,j_stat,sigma,iprint);
/;
/; GMM Model
/;          - Built 12 May 2010
/;
/; Must call ls2 prior to this call to produce global variable
/; %z
/;
/; The following global variables are created:
/; %resgmm  => GMM Residuals
/; %segmm   => GMM SE
/; %tgmm    => GMM t
/; %coefgmm => GMM Coef
/; %yhatgmm => GMM Y hat
/;
/; The Anderson Test is discussed in Baum
/; "An Introduction to Modern Econometrics Using Stata" (2006) p. 208
/; Both the LR and LM forms of the test are given.
/;
/; Generates feasible two-step GMM Estimator. Results are the same as
/; produced by the RATS "optimalweights" option.
/;
/; Note: When running bootstraps inv(s) can fail to invert if dummy
/;       variables are in the dataset.
/;
/; See Baum (2006) page 196
/;
xpz = transpose(x)*z;
xpy = transpose(x)*vfam(y);
ypy = vfam(y)*vfam(y);
/;
/; GMM Coefficients
/;
irank =rank(xpz);
iorder=rank(transpose(z)*z);
/;
if(iorder.lt.irank)then;
call epprint('ERROR: Model Underidentified.':);
go to done;
endif;
/;
adj=kindas(z,1.0)/dfloat(norows(z));
s=hc_sigma(adj,z,%resls2);
inv_s=inv(s);
%coefgmm=inv(xpz*inv_s*transpose(xpz)) *
         (xpz*inv_s*transpose(z)*vfam(y));
%resgmm =vfam(y)-x*%coefgmm;
%yhatgmm=x*%coefgmm;
sigma=hc_sigma(kindas(z,1.),z,%resls2);
/;
/; Logic from Rats User's Guide Version 7 page 245
/;
j_stat=%resgmm*z*inv(sigma)*transpose(z)*%resgmm;
/;
/; Stock Watson 2007 page 734
/;
%segmm=dsqrt(diag(inv(xpz*inv(sigma)*transpose(xpz))));
%tgmm=afam(%coefgmm)/afam(%segmm);
/;
/; squared canonical correlations
/;
can_corr = real(eig(inv(transpose(x)*x)*(transpose(x)*z)
           *inv(transpose(z)*z)* transpose(xpz)));
/;
if(iprint.gt.1)call print(can_corr);
anderson=-1.*dfloat(norows(z))
         *dlog(sum(kindas(z,1.0)-afam(can_corr)));
anderlm = dfloat(norows(z))*min(can_corr);
cragg_d = anderlm/(1.0 - min(can_corr));
/;
if(iprint.ne.0)then;
call print(' ':);
call print('GMM Estimates':);
call print(' ':);
gg= 'Dependent Variable ';
gg2=c1array(8:yvar);
ff=catrow(gg,gg2);
call print(ff:);
call print('OLS sum of squares ',sumsq(%resols):);
call print('LS2 sum of squares ',sumsq(%resls2):);
call print('GMM sum of squares ',sumsq(%resgmm):);
call print('Rank of Equation ',irank:);
call print('Order of Equation ',iorder:);
if(irank.lt.iorder)call print('Equation is overidentified':);
if(irank.eq.iorder)call print('Equation is exactly identified':);
call print('Anderson ident./IV Relevance test ',anderson:);
/;
if(iorder.ge.irank.and.anderson.gt.0.0)then;
aprob=chisqprob(anderson,dfloat(iorder+1-irank));
call print('Significance of Anderson Statistic ',aprob:);
endif;
/;
call print('Anderson Canon Correlation LM test ',anderlm:);
/;
if(iorder.ge.irank.and.anderlm.gt.0.0)then;
aprob=chisqprob(anderlm,dfloat(iorder+1-irank));
call print('Significance of Anderson LM Statistic',aprob:);
endif;
/;
call print('Cragg-Donald Chi-Square Weak ID Test ',cragg_d:);
/;
if(iorder.ge.irank.and.cragg_d.gt.0.0)then;
aprob=chisqprob(cragg_d,dfloat(iorder+1-irank));
call print('Significance of Cragg-Donald test ',aprob:);
endif;
/;
call print('Hansen J_stat Ident. of instruments',j_stat:);
/;
if(iorder.gt.irank.and.j_stat.gt.0.0)then;
jprob=chisqprob(j_stat,dfloat(iorder-irank));
call print('Significance of Hansen j_stat ',jprob:);
endif;
/;
call tabulate(names,%coefgmm,%segmm,%tgmm
     :title '+++++++++++++++++++++++++++++++++++++++++++++++++++++');
call print(' ':);
endif;
call makeglobal(%resgmm);
call makeglobal(%segmm);
call makeglobal(%tgmm);
call makeglobal(%coefgmm);
call makeglobal(%yhatgmm);
done continue;
return;
end;

Table 4.9 shows the setup to estimate and test LS2 and GMM models for the Griliches (1976) wage data used as a test case in Baum (2006). The Griliches model regresses the log wage on education, experience, tenure, age, a number of control variables and various year dummy variables. Stata and Rats results are shown for comparison. In addition, Baum (2006) can be inspected for replication purposes.
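Before turning to the Griliches data, the Sargan N·R² and Basmann overidentification statistics that subroutine ls2 computes from a regression of the 2SLS residuals on the instruments can be sketched in numpy. The simulated data below, with instruments that are valid by construction, is purely an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 400

# Illustrative data: l = 3 valid instruments, k = 1 endogenous regressor.
Z = rng.normal(size=(N, 3))
v = rng.normal(size=N)
X = (Z.sum(axis=1) + v)[:, None]          # regressor correlated with the error
y = 2.0 * X[:, 0] + 0.5 * v + rng.normal(size=N)

# 2SLS residuals
P = Z @ np.linalg.solve(Z.T @ Z, Z.T)     # projection onto the instruments
Xhat = P @ X
b = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)
e = y - X @ b

# Regress the 2SLS residuals on all instruments
ez = e - P @ e                            # residuals of that regression
r2 = 1.0 - (ez @ ez) / (e @ e)            # uncentered R-squared
sargan = N * r2                           # chi-square with l - k d.f.

# Basmann form, cf. (4.6-10)
l, k = Z.shape[1], X.shape[1]
basmann = (e @ e - ez @ ez) * (N - l) / (ez @ ez)
```

Since these instruments are uncorrelated with the error by construction, both statistics should be small relative to a chi-square with l - k = 2 degrees of freedom; with invalid instruments they would blow up, as they do for the Griliches data below.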
Table 4.9 Estimation of LS2 and GMM Models using B34S, Stata and Rats

%b34slet dob34s1=0;
%b34slet dob34s2=1;
%b34slet dostata=1;
%b34slet dorats =1;
b34sexec options ginclude('micro.mac') member(griliches76);
b34srun;

%b34sif(&dob34s1.ne.0)%then;
b34sexec matrix;
call loaddata;
call echooff;
call olsq(iq s expr tenure rns smsa
     iyear_67 iyear_68 iyear_69 iyear_70 iyear_71 iyear_73
     med kww age mrt :print);
iqyhat=%yhat;
call olsq(lw iqyhat s expr tenure rns smsa
     iyear_67 iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 :print);
call olsq(lw iq s expr tenure rns smsa
     iyear_67 iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 :print);
call gamfit(lw iq s expr tenure rns[factor,1] smsa[factor,1]
     iyear_67[factor,1] iyear_68[factor,1] iyear_69[factor,1]
     iyear_70[factor,1] iyear_71[factor,1] iyear_73[factor,1] :print);
call marspline(lw iq s expr tenure rns smsa
     iyear_67 iyear_68 iyear_69 iyear_70 iyear_71 iyear_73
     :print :nk 40 :mi 2);
call gamfit(lw80 iq s expr tenure rns[factor,1] smsa[factor,1]
     iyear_67[factor,1] iyear_68[factor,1] iyear_69[factor,1]
     iyear_70[factor,1] iyear_71[factor,1] iyear_73[factor,1] :print);
call marspline(lw80 iq s expr tenure rns smsa
     iyear_67 iyear_68 iyear_69 iyear_70 iyear_71 iyear_73
     :print :nk 40 :mi 2);
b34srun;
%b34sendif;

%b34sif(&dob34s2.ne.0)%then;
b34sexec matrix;
call loaddata;
call load(ls2);
call echooff;
call character(lhs,'lw');
call character(endvar,'iq');
call character(rhs,'iq s expr tenure rns smsa iyear_67 iyear_68
     iyear_69 iyear_70 iyear_71 iyear_73 constant');
call character(ivar,'s expr tenure rns smsa iyear_67 iyear_68
     iyear_69 iyear_70 iyear_71 iyear_73 constant med kww age mrt');
call olsq(argument(lhs) argument(rhs) :noint :print :savex);
call ls2(%y,%x,catcol(argument(ivar)),%names,%yvar,1);
call print(lhs,rhs,ivar,endvar);
call gmmest(%y,%x,%z,%names,%yvar,j_stat,sigma,1);
call graph(%y %yhatols %yhatls2,%yhatgmm :nocontact :pgborder :nolabel);
b34srun;
%b34sendif;
%b34sif(&dostata.ne.0)%then;
b34sexec options open('statdata.do') unit(28) disp=unknown$ b34srun$
b34sexec options clean(28)$ b34srun$
b34sexec options open('stata.do') unit(29) disp=unknown$ b34srun$
b34sexec options clean(29)$ b34srun$
b34sexec pgmcall idata=28 icntrl=29$
stata$
* for detail on stata commands see Baum page 205 ;
pgmcards$
* uncomment if do not use /e
* log using stata.log, text
global xlist s expr tenure rns smsa iyear_67 ///
       iyear_68 iyear_69 iyear_70 iyear_71 iyear_73
ivregress 2sls lw $xlist (iq=med kww age mrt)
ivregress liml lw $xlist (iq=med kww age mrt)
ivregress gmm  lw $xlist (iq=med kww age mrt)
ivreg  lw $xlist (iq=med kww age mrt)
ivreg2 lw $xlist (iq=med kww age mrt)
ivreg2 lw $xlist (iq=med kww age mrt), gmm2s robust
overid, all
* orthog(age mrt)
gmm (lw-{xb:$xlist iq} +{b0}), ///
    instruments ($xlist med kww age mrt) onestep nolog
exit,clear
b34sreturn$
b34seend$
b34sexec options close(28); b34srun;
b34sexec options close(29); b34srun;
b34sexec options dounix('stata -b do stata.do ')
                 dodos('stata /e stata.do'); b34srun;
b34sexec options npageout
     writeout('output from stata',' ',' ')
     copyfout('stata.log')
     dodos('erase stata.do',
/;         'erase stata.log',
           'erase statdata.do') $
b34srun$
%b34sendif;

%b34sif(&dorats.ne.0)%then;
b34sexec options open('rats.dat') unit(28) disp=unknown$ b34srun$
b34sexec options open('rats.in')  unit(29) disp=unknown$ b34srun$
b34sexec options clean(28)$ b34srun$
b34sexec options clean(29)$ b34srun$
b34sexec pgmcall$
rats passasts
pcomments('* ',
'* Data passed from B34S(r) system to RATS',
'* ',
"display @1 %dateandtime() @33 ' Rats Version ' %ratsversion()"
'* ') $
PGMCARDS$
*
instruments s expr tenure rns smsa iyear_67 $
   iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 med kww age mrt constant
* OLS
linreg lw
# constant s expr tenure rns smsa iyear_67 $
  iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq
* 2SLS
linreg(inst) lw
# constant s expr tenure rns smsa iyear_67 $
  iyear_68 iyear_69 iyear_70
iyear_71 iyear_73 iq * GMM $ Simultaneous Equations Systems 63 linreg(inst,optimalweights) lw # constant s expr tenure rns smsa iyear_67 $ iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq b34sreturn$ b34srun $ b34sexec options close(28)$ b34srun$ b34sexec options close(29)$ b34srun$ b34sexec options /$ dodos(' rats386 rats.in rats.out ') dodos('start /w /r rats32s rats.in /run') dounix('rats rats.in rats.out')$ B34SRUN$ b34sexec options npageout WRITEOUT('Output from RATS',' ',' ') COPYFOUT('rats.out') dodos('ERASE rats.in','ERASE rats.out','ERASE dounix('rm rats.in','rm rats.out','rm $ B34SRUN$ %b34sendif; rats.dat') rats.dat') Edited and annotated results are shown next. Variable RNS RNS80 MRT MRT80 SMSA SMSA80 MED IQ KWW YEAR AGE AGE80 S S80 EXPR EXPR80 TENURE TENURE80 LW LW80 IYEAR_67 IYEAR_68 IYEAR_69 IYEAR_70 IYEAR_71 IYEAR_73 CONSTANT Label 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 # Cases residency in South residency in South in 1980 marital status = 1 if married marital status = 1 if married in 1980 reside metro area = 1 if urban reside metro area = 1 if urban in 1980 mother s education, years iq score score on knowledge in world of work test Year Age Age in 1980 completed years of schooling completed years of schooling in 1980 experience, years experience, yearsin 1980 tenure, years tenure, years in 1980 log wage log wage in 1980 Number of observations in data file Current missing variable code 758 758 758 758 758 758 758 758 758 758 758 758 758 758 758 758 758 758 758 758 758 758 758 758 758 758 758 758 1.000000000000000E+31 Mean 0.269129 0.292876 0.514512 0.898417 0.704485 0.712401 10.9103 103.856 36.5739 69.0317 21.8351 33.0119 13.4050 13.7071 1.73543 11.3943 1.83113 7.36280 5.68674 6.82656 0.831135E-01 0.104222 0.112137 0.844327E-01 0.121372 0.208443 1.00000 Std. Dev. 
0.443800 0.455383 0.500119 0.302299 0.456575 0.452942 2.74112 13.6187 7.30225 2.63179 2.98176 3.08550 2.23183 2.21469 2.10554 4.21075 1.67363 5.05024 0.428949 0.409927 0.276236 0.305750 0.315744 0.278219 0.326775 0.406464 0.00000 Variance 0.196959 0.207373 0.250119 0.913845E-01 0.208461 0.205156 7.51374 185.468 53.3228 6.92634 8.89087 9.52033 4.98106 4.90486 4.43331 17.7304 2.80104 25.5049 0.183998 0.168040 0.763063E-01 0.934828E-01 0.996940E-01 0.774060E-01 0.106782 0.165213 0.00000 Maximum 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 18.0000 145.000 56.0000 73.0000 30.0000 38.0000 18.0000 18.0000 11.4440 22.0450 10.0000 22.0000 7.05100 8.03200 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 Minimum 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 54.0000 12.0000 66.0000 16.0000 28.0000 9.00000 9.00000 0.00000 0.692000 0.00000 0.00000 4.60500 4.74900 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 1.00000

Ordinary Least Squares Estimation

Dependent variable                 LW
Centered R**2                      0.4301415547786606
Adjusted R**2                      0.4209626268019410
Residual Sum of Squares            79.37338878983863
Residual Variance                  0.1065414614628706
Standard Error                     0.3264068955504320
Total Sum of Squares               139.2861498420176
Log Likelihood                     -220.3342420049200
Mean of the Dependent Variable     5.686738782319042
Std. Error of Dependent Variable   0.4289493629019316
Sum Absolute Residuals             194.5217111479906
F(12, 745)                         46.86185095575703
F Significance                     1.000000000000000
1/Condition XPX                    1.486105464518127E-06
Maximum Absolute Residual          1.186094775249485
Number of Observations             758

Variable   Lag   Coefficient       SE                t
IQ         0     0.27121199E-02    0.10314110E-02     2.6295239
S          0     0.61954782E-01    0.72785810E-02     8.5119313
EXPR       0     0.30839472E-01    0.65100828E-02     4.7371858
TENURE     0     0.42163060E-01    0.74812112E-02     5.6358601
RNS        0    -0.96293467E-01    0.27546700E-01    -3.4956444
SMSA       0     0.13289929        0.26575835E-01     5.0007567
IYEAR_67   0    -0.54209478E-01    0.47852181E-01    -1.1328528
IYEAR_68   0     0.80580850E-01    0.44895091E-01     1.7948700
IYEAR_69   0     0.20759151        0.43860470E-01     4.7329979
IYEAR_70   0     0.22822373        0.48799418E-01     4.6767716
IYEAR_71   0     0.22269148        0.43095233E-01     5.1674272
IYEAR_73   0     0.32287469        0.40657433E-01     7.9413448
CONSTANT   0     4.2353569         0.11334886        37.365677

The edited output listed below replicates Baum (2006, 193-194). The Basmann and Sargan tests of 97.0249 and 87.655, respectively, show high significance, which rejects the null hypothesis that there is no correlation between the residuals of the LS2 model and the instruments. This finding suggests serious problems since endogeneity present in the OLS model will not be removed by LS2 estimation. Note that Stata replicates the Sargan test value. The Anderson value of 54.33 that tests for the relevance of the instruments matches the value reported in Baum (2006, 204) but does not match the value reported by Stata in the printed output, where the revised ivreg2 Stata command reports the LM form of the test with a value of 52.436. The B34S output includes both statistics. Since the null was rejected, the instruments appear relevant in that they are related to the endogenous variables.
This is confirmed with the Cragg-Donald (1993) statistic of 56.333. In addition to various LS2 and GMM results, both Stata bootstrap and Stata robust errors results are shown. The bootstrap results do not make assumptions about the distribution of the regressors. The Rats coefficient results for LS2 and GMM match B34S and Stata. Note that Rats uses the small sample SE formula while Stata reports the large sample SE. B34S LS2 results report both. The exact formulas for all LS2 and GMM calculations in B34S are contained in the two subroutines listed in Table 4.8.

OLS and LS2 Estimation

Dependent Variable                       LW
OLS Sum of squared Residuals             79.37338878983863
LS2 Sum of squared Residuals             80.01823370030675
Large Sample ls2 sigma                   0.1055649521112226
Small Sample ls2 sigma                   0.1074070251010829
Rank of Equation                         13
Order of Equation                        16
Equation is overidentified
Anderson LR ident./IV Relevance test     54.33777011513529
Significance of Anderson LR Statistic    0.9999999999552830
Anderson Canon Correlation LM test       52.43586586757428
Significance of Anderson LM Statistic    0.9999999998881718
Cragg-Donald Chi-Square Weak ID Test     56.33277600836977
Significance of Cragg-Donald test        0.9999999999829244
Basmann                                  97.02497131695870
Significance of Basmann Statistic        1.000000000000000
Sargan N*R-sq / J-Test Test              87.65523169449482
Significance of Sargan Statistic         1.000000000000000

+++++++++++++++++++++++++++++++++++++++++++++++++++++
Obs  NAMES      %OLSCOEF     %OLS_SE      %OLS_T
  1  IQ          0.2712E-02   0.1031E-02    2.630
  2  S           0.6195E-01   0.7279E-02    8.512
  3  EXPR        0.3084E-01   0.6510E-02    4.737
  4  TENURE      0.4216E-01   0.7481E-02    5.636
  5  RNS        -0.9629E-01   0.2755E-01   -3.496
  6  SMSA        0.1329       0.2658E-01    5.001
  7  IYEAR_67   -0.5421E-01   0.4785E-01   -1.133
  8  IYEAR_68    0.8058E-01   0.4490E-01    1.795
  9  IYEAR_69    0.2076       0.4386E-01    4.733
 10  IYEAR_70    0.2282       0.4880E-01    4.677
 11  IYEAR_71    0.2227       0.4310E-01    5.167
 12  IYEAR_73    0.3229       0.4066E-01    7.941
 13  CONSTANT    4.235        0.1133       37.37
LHS
%LS2COEF %LS2_SES %LS2_SEL %LS2_T_S
%LS2_T_L 0.1747E-03 0.3937E-02 0.3903E-02 0.4436E-01 0.4474E-01 0.6918E-01 0.1305E-01 0.1294E-01 5.301 5.347 0.2987E-01 0.6697E-02 0.6639E-02 4.460 4.498 0.4327E-01 0.7693E-02 0.7627E-02 5.625 5.674 -0.1036 0.2974E-01 0.2948E-01 -3.484 -3.514 0.1351 0.2689E-01 0.2666E-01 5.025 5.069 -0.5260E-01 0.4811E-01 0.4769E-01 -1.093 -1.103 0.7947E-01 0.4511E-01 0.4472E-01 1.762 1.777 0.2109 0.4432E-01 0.4393E-01 4.759 4.800 0.2386 0.5142E-01 0.5097E-01 4.641 4.682 0.2285 0.4412E-01 0.4374E-01 5.178 5.223 0.3259 0.4107E-01 0.4072E-01 7.935 8.004 4.400 0.2709 0.2685 16.24 16.38 = LW RHS = IQ S EXPR TENURE RNS SMSA IYEAR_67 IYEAR_68 70 IYEAR_71 IYEAR_73 CONSTANT IVAR = S EXPR TENURE RNS SMSA IYEAR_67 IYEAR_68 IYEAR_71 IYEAR_73 CONSTANT MED KWW AGE MRT ENDVAR IYEAR_69 IYEAR_ IYEAR_69 IYEAR_70 = iq GMM Estimates Dependent Variable OLS sum of squares LS2 sum of squares GMM sum of squares Rank of Equation Order of Equation Equation is overidentified Anderson ident./IV Relevance test Significance of Anderson Statistic Anderson Canon Correlation LM test Significance of Anderson LM Statistic Cragg-Donald Chi-Square Weak ID Test Significance of Cragg-Donald test Hansen J_stat Ident. 
of instruments Significance of Hansen j_stat LW 79.37338878983863 80.01823370030675 81.26217887229201 13 16 54.33777011513529 0.9999999999552830 52.43586586757428 0.9999999998881718 56.33277600836977 0.9999999999829244 74.16487762432548 0.9999999999999994 +++++++++++++++++++++++++++++++++++++++++++++++++++++ Obs 1 2 3 4 5 6 7 8 9 10 11 12 13 NAMES IQ S EXPR TENURE RNS SMSA IYEAR_67 IYEAR_68 IYEAR_69 IYEAR_70 IYEAR_71 IYEAR_73 CONSTANT %COEFGMM %SEGMM %TGMM -0.1401E-02 0.4113E-02 -0.3407 0.7684E-01 0.1319E-01 5.827 0.3123E-01 0.6693E-02 4.667 0.4900E-01 0.7344E-02 6.672 -0.1007 0.2959E-01 -3.403 0.1336 0.2632E-01 5.075 -0.2101E-01 0.4554E-01 -0.4614 0.8910E-01 0.4270E-01 2.087 0.2072 0.4080E-01 5.080 0.2338 0.5285E-01 4.424 0.2346 0.4257E-01 5.510 0.3360 0.4041E-01 8.315 4.437 0.2900 15.30 B34S Matrix Command Ending. Last Command reached. 65 66 Chapter 4 output from stata ___ ____ ____ ____ ____ (R) /__ / ____/ / ____/ ___/ / /___/ / /___/ 11.1 Statistics/Data Analysis Copyright 2009 StataCorp LP StataCorp 4905 Lakeway Drive College Station, Texas 77845 USA 800-STATA-PC http://www.stata.com 979-696-4600 stata@stata.com 979-696-4601 (fax) Single-user Stata perpetual license: Serial number: 30110535901 Licensed to: Houston H. Stokes University of Illinois at Chicago Notes: 1. 2. (/m# option or -set memory-) 120.00 MB allocated to data Stata running in batch mode . do stata.do . * File built by B34S . run statdata.do on 17/10/10 at 12:29:31 . * uncomment if do not use /e . * log using stata.log, text . global xlist s expr tenure rns smsa iyear_67 /// > iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 . bootstrap _b _se, reps(50): /// > ivregress 2sls lw $xlist (iq=med kww age mrt) (running ivregress on estimation sample) Bootstrap replications (50) ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 .................................................. 
Bootstrap results 50 Number of obs Replications = = 758 50 -----------------------------------------------------------------------------| Observed Bootstrap Normal-based | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------b | iq | .0001747 .0074584 0.02 0.981 -.0144435 .0147928 s | .0691759 .0217356 3.18 0.001 .0265749 .1117769 expr | .029866 .0079507 3.76 0.000 .014283 .0454491 tenure | .0432738 .0086468 5.00 0.000 .0263264 .0602211 rns | -.1035897 .0406823 -2.55 0.011 -.1833256 -.0238538 smsa | .1351148 .0258812 5.22 0.000 .0843886 .1858411 iyear_67 | -.052598 .0422675 -1.24 0.213 -.1354408 .0302448 iyear_68 | .0794686 .0459301 1.73 0.084 -.0105528 .16949 iyear_69 | .2108962 .0456788 4.62 0.000 .1213673 .300425 iyear_70 | .2386338 .0592127 4.03 0.000 .122579 .3546886 iyear_71 | .2284609 .0513617 4.45 0.000 .1277939 .3291279 iyear_73 | .3258944 .0432171 7.54 0.000 .2411904 .4105984 _cons | 4.39955 .4995474 8.81 0.000 3.420455 5.378645 -------------+---------------------------------------------------------------se | iq | .0039035 .0012226 3.19 0.001 .0015073 .0062996 s | .0129366 .0034772 3.72 0.000 .0061214 .0197518 expr | .0066393 .0007373 9.00 0.000 .0051941 .0080845 tenure | .0076271 .0011929 6.39 0.000 .005289 .0099652 rns | .029481 .0052416 5.62 0.000 .0192077 .0397544 smsa | .0266573 .002741 9.73 0.000 .021285 .0320297 iyear_67 | .0476924 .0051268 9.30 0.000 .0376441 .0577407 iyear_68 | .0447194 .004026 11.11 0.000 .0368285 .0526102 iyear_69 | .0439336 .0055467 7.92 0.000 .0330623 .054805 iyear_70 | .0509733 .0052485 9.71 0.000 .0406864 .0612601 iyear_71 | .0437436 .0041483 10.54 0.000 .035613 .0518741 iyear_73 | .0407181 .0041193 9.88 0.000 .0326444 .0487917 _cons | .2685443 .0796381 3.37 0.001 .1124564 .4246321 -----------------------------------------------------------------------------. * Durbin-Wu-Hausman exogenous test robust errors . 
ivregress 2sls lw $xlist (iq=med kww age mrt), vce(robust) Simultaneous Equations Systems Instrumental variables (2SLS) regression Number of obs Wald chi2(12) Prob > chi2 R-squared Root MSE = = = = = 758 573.14 0.0000 0.4255 .32491 -----------------------------------------------------------------------------| Robust lw | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------iq | .0001747 .0041241 0.04 0.966 -.0079085 .0082578 s | .0691759 .0132907 5.20 0.000 .0431266 .0952253 expr | .029866 .0066974 4.46 0.000 .0167394 .0429926 tenure | .0432738 .0073857 5.86 0.000 .0287981 .0577494 rns | -.1035897 .029748 -3.48 0.000 -.1618947 -.0452847 smsa | .1351148 .026333 5.13 0.000 .0835032 .1867265 iyear_67 | -.052598 .0457261 -1.15 0.250 -.1422195 .0370235 iyear_68 | .0794686 .0428231 1.86 0.063 -.0044631 .1634003 iyear_69 | .2108962 .0408774 5.16 0.000 .1307779 .2910144 iyear_70 | .2386338 .0529825 4.50 0.000 .1347901 .3424776 iyear_71 | .2284609 .0426054 5.36 0.000 .1449558 .311966 iyear_73 | .3258944 .0405569 8.04 0.000 .2464044 .4053844 _cons | 4.39955 .290085 15.17 0.000 3.830994 4.968106 -----------------------------------------------------------------------------Instrumented: iq Instruments: s expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 med kww age mrt . ivreg2 lw $xlist (iq=med kww age mrt) IV (2SLS) estimation -------------------Estimates efficient for homoskedasticity only Statistics consistent for homoskedasticity only Total (centered) SS Total (uncentered) SS Residual SS = = = 139.2861498 24652.24662 80.0182337 Number of obs F( 12, 745) Prob > F Centered R2 Uncentered R2 Root MSE = = = = = = 758 45.91 0.0000 0.4255 0.9968 .3249 -----------------------------------------------------------------------------lw | Coef. Std. Err. z P>|z| [95% Conf. 
Interval] -------------+---------------------------------------------------------------iq | .0001747 .0039035 0.04 0.964 -.007476 .0078253 s | .0691759 .0129366 5.35 0.000 .0438206 .0945312 expr | .029866 .0066393 4.50 0.000 .0168533 .0428788 tenure | .0432738 .0076271 5.67 0.000 .0283249 .0582226 rns | -.1035897 .029481 -3.51 0.000 -.1613715 -.0458079 smsa | .1351148 .0266573 5.07 0.000 .0828674 .1873623 iyear_67 | -.052598 .0476924 -1.10 0.270 -.1460734 .0408774 iyear_68 | .0794686 .0447194 1.78 0.076 -.0081797 .1671169 iyear_69 | .2108962 .0439336 4.80 0.000 .1247878 .2970045 iyear_70 | .2386338 .0509733 4.68 0.000 .1387281 .3385396 iyear_71 | .2284609 .0437436 5.22 0.000 .1427251 .3141967 iyear_73 | .3258944 .0407181 8.00 0.000 .2460884 .4057004 _cons | 4.39955 .2685443 16.38 0.000 3.873213 4.925887 -----------------------------------------------------------------------------Underidentification test (Anderson canon. corr. LM statistic): 52.436 Chi-sq(4) P-val = 0.0000 -----------------------------------------------------------------------------Weak identification test (Cragg-Donald Wald F statistic): 13.786 Stock-Yogo weak ID test critical values: 5% maximal IV relative bias 16.85 10% maximal IV relative bias 10.27 20% maximal IV relative bias 6.71 30% maximal IV relative bias 5.34 10% maximal IV size 24.58 15% maximal IV size 13.96 20% maximal IV size 10.26 25% maximal IV size 8.31 Source: Stock-Yogo (2005). Reproduced by permission. -----------------------------------------------------------------------------Sargan statistic (overidentification test of all instruments): 87.655 Chi-sq(3) P-val = 0.0000 -----------------------------------------------------------------------------Instrumented: iq Included instruments: s expr tenure rns smsa iyear_67 iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 Excluded instruments: med kww age mrt ------------------------------------------------------------------------------ 67 68 . 
ivreg2 lw Chapter 4 $xlist (iq=med kww age mrt), gmm2s robust 2-Step GMM estimation --------------------Estimates efficient for arbitrary heteroskedasticity Statistics robust to heteroskedasticity Total (centered) SS Total (uncentered) SS Residual SS = = = 139.2861498 24652.24662 81.26217887 Number of obs F( 12, 745) Prob > F Centered R2 Uncentered R2 Root MSE = = = = = = 758 49.67 0.0000 0.4166 0.9967 .3274 -----------------------------------------------------------------------------| Robust lw | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------iq | -.0014014 .0041131 -0.34 0.733 -.009463 .0066602 s | .0768355 .0131859 5.83 0.000 .0509915 .1026794 expr | .0312339 .0066931 4.67 0.000 .0181157 .0443522 tenure | .0489998 .0073437 6.67 0.000 .0346064 .0633931 rns | -.1006811 .0295887 -3.40 0.001 -.1586738 -.0426884 smsa | .1335973 .0263245 5.08 0.000 .0820021 .1851925 iyear_67 | -.0210135 .0455433 -0.46 0.645 -.1102768 .0682498 iyear_68 | .0890993 .042702 2.09 0.037 .0054049 .1727937 iyear_69 | .2072484 .0407995 5.08 0.000 .1272828 .287214 iyear_70 | .2338308 .0528512 4.42 0.000 .1302445 .3374172 iyear_71 | .2345525 .0425661 5.51 0.000 .1511244 .3179805 iyear_73 | .3360267 .0404103 8.32 0.000 .2568239 .4152295 _cons | 4.436784 .2899504 15.30 0.000 3.868492 5.005077 -----------------------------------------------------------------------------Underidentification test (Kleibergen-Paap rk LM statistic): 41.537 Chi-sq(4) P-val = 0.0000 -----------------------------------------------------------------------------Weak identification test (Cragg-Donald Wald F statistic): 13.786 (Kleibergen-Paap rk Wald F statistic): 12.167 Stock-Yogo weak ID test critical values: 5% maximal IV relative bias 16.85 10% maximal IV relative bias 10.27 20% maximal IV relative bias 6.71 30% maximal IV relative bias 5.34 10% maximal IV size 24.58 15% maximal IV size 13.96 20% maximal IV size 10.26 Simultaneous Equations Systems 
Output from RATS

*
* Data passed from B34S(r) system to RATS
*
display @1 %dateandtime() @33 ' Rats Version ' %ratsversion()
10/17/2010 12:29                 Rats Version  7.30000
*
CALENDAR(IRREGULAR)
ALLOCATE 758
OPEN DATA rats.dat
DATA(FORMAT=FREE,ORG=OBS, $
MISSING=  0.1000000000000000E+32 ) / $
RNS $ RNS80 $ MRT $ MRT80 $ SMSA $ SMSA80 $ MED $ IQ $ KWW $ YEAR $
AGE $ AGE80 $ S $ S80 $ EXPR $ EXPR80 $ TENURE $ TENURE80 $ LW $ LW80 $
IYEAR_67 $ IYEAR_68 $ IYEAR_69 $ IYEAR_70 $ IYEAR_71 $ IYEAR_73 $ CONSTANT
SET TREND = T
TABLE

Series        Obs         Mean      Std Error      Minimum       Maximum
RNS           758   0.269129288   0.443800128   0.000000000    1.000000000
RNS80         758   0.292875989   0.455382503   0.000000000    1.000000000
MRT           758   0.514511873   0.500119364   0.000000000    1.000000000
MRT80         758   0.898416887   0.302298767   0.000000000    1.000000000
SMSA          758   0.704485488   0.456574966   0.000000000    1.000000000
SMSA80        758   0.712401055   0.452941990   0.000000000    1.000000000
MED           758  10.910290237   2.741119861   0.000000000   18.000000000
IQ            758 103.856200528  13.618666082  54.000000000  145.000000000
KWW           758  36.573878628   7.302246519  12.000000000   56.000000000
YEAR          758  69.031662269   2.631794247  66.000000000   73.000000000
AGE           758  21.835092348   2.981755741  16.000000000   30.000000000
AGE80         758  33.011873351   3.085503913  28.000000000   38.000000000
S             758  13.405013193   2.231828411   9.000000000   18.000000000
S80           758  13.707124011   2.214692601   9.000000000   18.000000000
EXPR          758   1.735428758   2.105542485   0.000000000   11.444000244
EXPR80        758  11.394261214   4.210745167   0.691999972   22.045000076
TENURE        758   1.831134565   1.673629972   0.000000000   10.000000000
TENURE80      758   7.362796834   5.050240439   0.000000000   22.000000000
LW            758   5.686738782   0.428949363   4.605000019    7.051000118
LW80          758   6.826555411   0.409926757   4.749000072    8.031999588
IYEAR_67      758   0.083113456   0.276235910   0.000000000    1.000000000
IYEAR_68      758   0.104221636   0.305749595   0.000000000    1.000000000
IYEAR_69      758   0.112137203   0.315743524   0.000000000    1.000000000
IYEAR_70      758   0.084432718   0.278219253   0.000000000    1.000000000
IYEAR_71      758   0.121372032   0.326774746   0.000000000    1.000000000
IYEAR_73      758   0.208443272   0.406463569   0.000000000    1.000000000
TREND         758 379.500000000 218.960042017   1.000000000  758.000000000

*
instruments s expr tenure rns smsa iyear_67 $
  iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 med kww age mrt constant
*
* OLS
linreg lw
# constant s expr tenure rns smsa iyear_67 $
  iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq

Linear Regression - Estimation by Least Squares
Dependent Variable LW
Usable Observations        758      Degrees of Freedom   745
Centered R**2         0.430142      R Bar **2       0.420963
Uncentered R**2       0.996780      T x R**2         755.559
Mean of Dependent Variable      5.6867387823
Std Error of Dependent Variable 0.4289493629
Standard Error of Estimate      0.3264068956
Sum of Squared Residuals        79.373388790
Regression F(12,745)                 46.8619
Significance Level of F           0.00000000
Log Likelihood                    -220.33424
Durbin-Watson Statistic             1.726206

Variable        Coeff        Std Error     T-Stat    Signif
********************************************************************************
1.  Constant    4.235356890  0.113348861   37.36568  0.00000000
2.  S           0.061954782  0.007278581    8.51193  0.00000000
3.  EXPR        0.030839472  0.006510083    4.73719  0.00000260
4.  TENURE      0.042163060  0.007481211    5.63586  0.00000002
5.  RNS        -0.096293467  0.027546700   -3.49564  0.00050091
6.  SMSA        0.132899286  0.026575835    5.00076  0.00000071
7.  IYEAR_67   -0.054209478  0.047852181   -1.13285  0.25764051
8.  IYEAR_68    0.080580850  0.044895091    1.79487  0.07307967
9.  IYEAR_69    0.207591515  0.043860470    4.73300  0.00000265
10. IYEAR_70    0.228223732  0.048799418    4.67677  0.00000346
11. IYEAR_71    0.222691481  0.043095233    5.16743  0.00000031
12. IYEAR_73    0.322874689  0.040657433    7.94134  0.00000000
13.
IQ          0.002712120  0.001031411    2.62952  0.00872684

*
* 2SLS
linreg(inst) lw
# constant s expr tenure rns smsa iyear_67 $
  iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq

Linear Regression - Estimation by Instrumental Variables
Dependent Variable LW
Usable Observations        758      Degrees of Freedom   745
Mean of Dependent Variable      5.6867387823
Std Error of Dependent Variable 0.4289493629
Standard Error of Estimate      0.3277301102
Sum of Squared Residuals        80.018233699
J-Specification(3)                 86.151910
Significance Level of J           0.00000000
Durbin-Watson Statistic             1.723148

Variable        Coeff        Std Error     T-Stat    Signif
********************************************************************************
1.  Constant    4.399550073  0.270877148   16.24187  0.00000000
2.  S           0.069175917  0.013048998    5.30124  0.00000015
3.  EXPR        0.029866018  0.006696962    4.45964  0.00000948
4.  TENURE      0.043273756  0.007693380    5.62480  0.00000003
5.  RNS        -0.103589698  0.029737133   -3.48351  0.00052378
6.  SMSA        0.135114831  0.026888925    5.02492  0.00000063
7.  IYEAR_67   -0.052598010  0.048106697   -1.09336  0.27458852
8.  IYEAR_68    0.079468615  0.045107833    1.76175  0.07852207
9.  IYEAR_69    0.210896152  0.044315294    4.75899  0.00000234
10. IYEAR_70    0.238633821  0.051416062    4.64123  0.00000409
11. IYEAR_71    0.228460915  0.044123572    5.17775  0.00000029
12. IYEAR_73    0.325894418  0.041071810    7.93475  0.00000000
13.
IQ          0.000174655  0.003937397    0.04436  0.96463097

*
* GMM
linreg(inst,optimalweights) lw
# constant s expr tenure rns smsa iyear_67 $
  iyear_68 iyear_69 iyear_70 iyear_71 iyear_73 iq

Linear Regression - Estimation by GMM
Dependent Variable LW
Usable Observations        758      Degrees of Freedom   745
Mean of Dependent Variable      5.6867387823
Std Error of Dependent Variable 0.4289493629
Standard Error of Estimate      0.3302676947
Sum of Squared Residuals        81.262178869
J-Specification(3)                 74.164878
Significance Level of J           0.00000000
Durbin-Watson Statistic             1.720776

Variable        Coeff        Std Error     T-Stat    Signif
********************************************************************************
1.  Constant    4.436784487  0.289950376   15.30188  0.00000000
2.  S           0.076835453  0.013185922    5.82708  0.00000001
3.  EXPR        0.031233937  0.006693110    4.66658  0.00000306
4.  TENURE      0.048999780  0.007343684    6.67237  0.00000000
5.  RNS        -0.100681114  0.029588671   -3.40269  0.00066726
6.  SMSA        0.133597299  0.026324546    5.07501  0.00000039
7.  IYEAR_67   -0.021013483  0.045543337   -0.46140  0.64451500
8.  IYEAR_68    0.089099315  0.042701995    2.08654  0.03692996
9.  IYEAR_69    0.207248405  0.040799543    5.07967  0.00000038
10. IYEAR_70    0.233830843  0.052851170    4.42433  0.00000967
11. IYEAR_71    0.234552477  0.042566121    5.51031  0.00000004
12. IYEAR_73    0.336026675  0.040410335    8.31536  0.00000000
13. IQ         -0.001401434  0.004113144   -0.34072  0.73331372

4.7 Potential problems of IV Models

Instrumental variable estimation methods, while necessary and useful for models with endogenous variables on the right-hand side, have a number of features that can be serious drawbacks.12 In the first place, such estimators are never unbiased when endogenous variables are on the right. Citing Kinal (1980), Wooldridge (2010, 207) notes "when all endogenous variables have homoskedastic normal distributions with expectations linear in the exogenous variables, the number of moments of the 2SLS estimator that exist is one fewer than the number of overidentifying restrictions.
This finding implies that when the number of instruments equals the number of explanatory variables, the IV estimator does not have an expected value." Even for large-sample analysis, there will be problems if there are weak instruments. Assume a single endogenous variable x on the right-hand side, or

  y = \beta_0 + \beta_1 x + u                                        (4.7-1)

where z is the instrumental variable. It can be shown that

  \text{plim}\,\hat{\beta}_1 = \beta_1 + \frac{\text{cov}(z,u)}{\text{cov}(z,x)}
                             = \beta_1 + \frac{\text{corr}(z,u)}{\text{corr}(z,x)}\,\frac{\sigma_u}{\sigma_x}        (4.7-2)

The greater the correlation between the instrument and the population error u, the greater the bias. The weaker the instrument, the smaller corr(z,x) and the greater the bias. The bias in the OLS estimator is

  \text{plim}\,\tilde{\beta}_1 = \beta_1 + \text{corr}(x,u)\,\frac{\sigma_u}{\sigma_x}                               (4.7-3)

and can be less than the bias in the IV estimator if

  |\text{corr}(x,u)| < \left|\frac{\text{corr}(z,u)}{\text{corr}(z,x)}\right|                                        (4.7-4)

The more significant the Anderson test, the larger |corr(z,x)|, everything else equal, and the smaller the bias in the IV estimator. The more significant the Basmann (1960) test, the larger |corr(z,u)| and the greater the bias in the IV estimator.

12 Wooldridge (2010), especially pages 107-114, forms the basis for this section.

4.8 Conclusion

The simeq command should be used either when there are endogenous variables on the right-hand side of a regression model or when a seemingly unrelated regression model is desired. In the former case, if OLS is attempted, the resulting estimates will be biased. Jennings (1973, 1980), the original developer of the simeq code, made a major contribution in developing fast and accurate code designed to alert the user to problems in the structure of the model. These include rank tests on all the key matrices as well as rank tests on the matrix of exogenous variables in the system. The matrix command was used to illustrate calculation of OLS, LIML, 2SLS, 3SLS and FIML models using more traditional equations than those used by Jennings. SAS and RATS code was shown and the results compared to the B34S program output.
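The 2SLS and two-step GMM estimators compared in the listings above have compact matrix forms. The following Python/numpy sketch is illustrative only; the function names and the simulated data are hypothetical and are not part of the B34S, Stata, or RATS code discussed in this chapter.

```python
# Minimal sketch of the 2SLS and two-step (heteroskedasticity-robust) GMM
# estimators. Function names and simulated data are hypothetical.
import numpy as np

def two_sls(y, X, Z):
    """2SLS: beta = (X'Pz X)^(-1) X'Pz y, with Pz = Z (Z'Z)^(-1) Z'."""
    XtPz = X.T @ Z @ np.linalg.inv(Z.T @ Z) @ Z.T
    return np.linalg.solve(XtPz @ X, XtPz @ y)

def gmm_2step(y, X, Z):
    """Two-step GMM: optimal weight built from first-step 2SLS residuals."""
    u = y - X @ two_sls(y, X, Z)                     # first-step residuals
    W = np.linalg.inv((Z * u[:, None] ** 2).T @ Z)   # (sum u_i^2 z_i z_i')^(-1)
    XtZW = X.T @ Z @ W
    return np.linalg.solve(XtZW @ Z.T @ X, XtZW @ Z.T @ y)

# Simulated overidentified model: one endogenous regressor, two instruments.
rng = np.random.default_rng(1)
n = 5000
z1, z2 = rng.standard_normal(n), rng.standard_normal(n)
e = rng.standard_normal(n)
x = 0.7 * z1 + 0.4 * z2 + e                  # endogenous through e
u = 0.5 * e + rng.standard_normal(n)         # correlated with x, not with z1, z2
y = 1.0 + 0.5 * x + u
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z1, z2])

print(two_sls(y, X, Z))      # both estimates close to the true values [1.0, 0.5]
print(gmm_2step(y, X, Z))
```

Because the simulated errors here are homoskedastic, the two-step GMM estimates essentially coincide with 2SLS; with heteroskedasticity the weight matrix would produce the kind of differences seen between the 2SLS and GMM listings above.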
Using the matrix command, LS2 (equivalent to 2SLS) and GMM routines, together with a number of diagnostic tests, were illustrated and the results compared with Stata and RATS using an important dataset studied by Griliches (1975) and Baum (2006).
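The weak-instrument bias expressions of section 4.7 can also be checked numerically. The following simulation is a hypothetical sketch (Python/numpy; all variable names and parameter values are invented, not taken from the text) in which the instrument z is both weak and slightly correlated with the structural error u, so that the IV slope converges to the probability limit of equation (4.7-2) rather than to the true coefficient.

```python
# Hypothetical simulation illustrating equations (4.7-1)-(4.7-3); all names
# and parameter values are invented for this sketch.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
b0, b1 = 1.0, 0.5                        # true structural parameters

common = rng.standard_normal(n)          # shared component causing endogeneity
u = 0.8 * common + 0.6 * rng.standard_normal(n)       # structural error
z = 0.1 * common + rng.standard_normal(n)             # weak, slightly invalid instrument
x = 0.15 * z + 0.9 * common + rng.standard_normal(n)  # endogenous regressor
y = b0 + b1 * x + u

# OLS slope (regression of y on a constant and x)
b_ols = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0][1]

# Simple IV slope with the single instrument z
b_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

# Probability limits implied by (4.7-2) and (4.7-3)
corr = lambda a, b: np.corrcoef(a, b)[0, 1]
ratio = u.std() / x.std()
plim_iv = b1 + corr(z, u) / corr(z, x) * ratio   # equation (4.7-2)
plim_ols = b1 + corr(x, u) * ratio               # equation (4.7-3)

print(b_ols, plim_ols)   # OLS slope matches its plim; biased upward here
print(b_iv, plim_iv)     # IV slope matches its plim; less biased than OLS here
```

In this parameterization |corr(x,u)| exceeds |corr(z,u)/corr(z,x)|, so the IV estimator is the less biased of the two, as condition (4.7-4) predicts; reversing the inequality by making z weaker or more contaminated would make OLS preferable.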