Single and Multiple Equation GMM: What do we do in the linear model when orthogonality no longer holds?

Motivation: In our previous linear model, the most important assumption was the orthogonality between the error term and the regressors (i.e. strict exogeneity or predetermined regressors). Without it, the OLS estimator is not even consistent for the coefficient vector of the model $y_i = x_i'\delta + \varepsilon_i$ → endogeneity bias! Since in economics the orthogonality condition is often not satisfied, we develop methods here to deal with endogenous regressors, called the Generalized Method of Moments (GMM), which includes OLS as a special case.

Example 1: Single Equation GMM: Here we relax the assumptions even further and drop the predetermined-regressor assumption. Instead, the orthogonality conditions come from instruments.

I. Assumptions:

3.1 Linearity: The data we observe come from underlying random variables $\{y_i\ (1\times 1),\ x_i\ (d\times 1)\}$ with
$$y_i = x_i'\delta + \varepsilon_i \qquad (i = 1, 2, \dots, n)$$
(this is the equation we want to estimate).

3.2 Ergodic Stationarity: Let $z_i$ be an M-dimensional vector of instruments, and let $w_i$ be the unique and nonconstant elements of $(y_i, x_i, z_i)$. $\{w_i\}$ is jointly stationary and ergodic.

3.3 Orthogonality Condition: All M variables in $z_i$ are predetermined in the sense that they are all orthogonal to the current error term:
$$E(z_{im}\varepsilon_i) = 0 \ \ \forall i, \forall m \iff E[z_i\,(y_i - x_i'\delta)] = 0 \iff E(g_i) = 0, \quad \text{where } g_i \equiv z_i\,(y_i - x_i'\delta) = z_i\varepsilon_i .$$

Note on Moment Conditions: $E[z_i\,(y_i - x_i'\delta)] = 0$ is an $M\times 1$ vector equation ($z_i$ is $M\times 1$ and $y_i - x_i'\delta$ is a scalar) → these are the M moment conditions.

Note on Instruments vs. Regressors: Denoting regressors by $x_i$ and instruments by $z_i$ does not mean that the two vectors share no variables. Regressors that are predetermined are also instruments; regressors that are not predetermined are endogenous regressors.

Note on 1 as an Instrument: Typically we include 1 as an instrument → $E(\varepsilon_i) = 0$!
3.4 Rank Condition for Identification: Guarantees there is a unique solution to the system of equations. [1]
The $m \times d$ matrix $E(z_i x_i')$ is of full column rank (equivalently, $E(x_i z_i')$ is of full row rank), which requires $m \ge d$ (# of equations ≥ # of unknowns; $m = M$ is the number of instruments). We denote this matrix by $\Sigma_{ZX}$.

3.5 Martingale Difference with Finite Second Moments (assumption for asymptotic normality): $g_i = z_i\varepsilon_i$, the sequence of moment conditions, is a martingale difference sequence with finite second moments.
• $\{g_i\}$ is a martingale difference sequence (so $E(g_i) = 0$, with $E(g_i \mid g_{i-1}, g_{i-2}, \dots, g_1) = 0$ for $i \ge 2$) → no serial correlation in $g_i$.
• The $m \times m$ matrix of cross moments $E(g_i g_i')$ is nonsingular.
→ So, with $\bar g \equiv \frac{1}{n}\sum_{i=1}^n g_i$,
$$\sqrt{n}\,\bar g = \frac{1}{\sqrt n}\sum_{i=1}^n g_i \ \xrightarrow{D}\ N(0, S), \qquad S \equiv \mathrm{Avar}(\bar g) = E(g_i g_i'),$$
by 3.2 (so $g_i$ is ergodic stationary) and the Ergodic Stationary Martingale Differences CLT.
Note: Again, if $z_i$ includes a constant term → $\varepsilon_i$ is an MDS → no autocorrelation in $\varepsilon_i$.
Note: The same 4 comments apply here as in 2.5. [2]

Additional assumptions:
3.6 Finite Fourth Moments for Regressors (for consistent estimation of S): $E[(x_{ik} z_{ij})^2]$ exists and is finite for all $k = 1, \dots, d$ and $j = 1, \dots, m$.
3.7 Conditional Homoskedasticity: $E(\varepsilon_i^2 \mid z_i) = \sigma^2$.

Footnote 1: This is called the rank condition for identification for the following reason (see the proof: the condition guarantees a unique solution). We can rewrite the moment/orthogonality condition as a system of m simultaneous equations, $E[g(w_i; \delta)] = 0_{m\times 1}$, where $g(w_i; \delta) = z_i\,(y_i - x_i'\delta)$, $w_i$ is the unique and nonconstant elements of $(y_i, x_i, z_i)$, and $\delta$ ($d \times 1$) is the coefficient vector. The moment condition means that the "true" value of the coefficient vector is a solution to this system of m simultaneous equations. Assumptions 3.1 – 3.3 guarantee that a solution exists; the coefficient vector (or the equation) is identified if that solution is unique.
A necessary and sufficient condition for a unique solution to the system of simultaneous equations is that $\Sigma_{ZX}$ has full column rank.

Derivation: We want a unique solution to $E[g(w_i; b)] = 0$. Let $\delta$ be a solution. Then $\delta$ is unique iff for all $b \ne \delta$, $E_P[g(w_i; b)] \ne E_P[g(w_i; \delta)] = 0$, where P is the underlying distribution that generated the data
⇔ $E_P[z_i\,(y_i - x_i'b)] \ne E_P[z_i\,(y_i - x_i'\delta)]$
⇔ $E_P[z_i x_i'b] \ne E_P[z_i x_i'\delta]$
⇔ $E_P[z_i x_i']\,(b - \delta) \ne 0$ for all $b \ne \delta$
⇔ $E_P[z_i x_i'] = \Sigma_{ZX}$ ($m \times d$) has full column rank, which requires $m \ge d$.
If not, then there exists a nonzero vector $(b' - \delta)$ in $\mathrm{Ker}(E_P[z_i x_i'])$; i.e. the columns are linearly dependent, so there is a non-trivial linear combination $(b' - \delta)$ such that $E_P[z_i x_i']\,(b' - \delta) = 0$ (the solution is not unique!).

Order Condition for Identification: $m \ge d$
A necessary condition (embedded in the proof above) is that $m \ge d$ (# of equations ≥ # of unknowns) → order condition for identification. We can state it in 3 equivalent ways:
#predetermined variables ≥ #regressors, or
#orthogonality conditions ≥ #parameters, or
#equations ≥ #unknowns.
If the order condition is not satisfied, then the equation (or parameter) is not identified. We say that the equation is:
1. Overidentified if the rank condition is satisfied and $m > d$
2. Exactly identified / just identified if the rank condition is satisfied and $m = d$
3. Underidentified / not identified if the rank condition is not satisfied or $m < d$

Footnote 2: IID is a special case.
If iid, we only need the Lindeberg-Lévy CLT.
If the instruments include a constant, then the error term is a martingale difference sequence (and a fortiori serially uncorrelated).
The assumption is hard to interpret, so we interpret an easier, sufficient condition: $E(\varepsilon_i \mid \varepsilon_{i-1}, \varepsilon_{i-2}, \dots, \varepsilon_1, z_i, z_{i-1}, \dots, z_1) = 0$ → besides being a martingale difference sequence and therefore serially uncorrelated, the error term is orthogonal not only to the current but also to the past instruments.
Since $g_i g_i' = \varepsilon_i^2 z_i z_i'$, $S = E(g_i g_i')$ is a matrix of 4th moments, so consistent estimation of S will require a 4th moment assumption (Assumption 3.6).

II. What do the Assumptions Imply about Properties of Instruments?
1. Orthogonality condition → Instruments are orthogonal to the errors (and uncorrelated with them if 1 is included as an instrument, i.e. $E(\varepsilon_i) = 0$)
2. Rank condition → (non-constant) instruments are correlated with the endogenous variables [3]
3. Instruments uncorrelated with

Note on Orthogonality vs. Covariance: We can always think of orthogonality between error and instrument as zero covariance, provided that one of them is de-meaned: $\mathrm{Cov}(x, e) = \mathrm{Cov}(x, e - E(e)) = E[x\,(e - E(e))]$, and similarly we can rewrite the model as $y = [b_0 + E(e)] + b_1 x + [e - E(e)]$.

III. Using 1 as an Instrument: 2 Assumptions Made
It is common practice to use 1 as an instrument; however, in doing so we are making 2 important assumptions:
1. $E(\varepsilon_i) = 0$ (this comes from the orthogonality condition)
2. $\varepsilon_i$ is an MDS → no serial correlation in $\varepsilon_i$

Footnote 3: Suppose the regressors are $(1, x_i, z_i)$ with $x_i$ endogenous, and the instruments are $(1, z_{2i}, z_i)$. Then the matrix in the rank condition is
$$E\left(\begin{bmatrix} 1 \\ z_{2i} \\ z_i \end{bmatrix}\begin{bmatrix} 1 & x_i & z_i \end{bmatrix}\right) = \begin{pmatrix} 1 & E x_i & E z_i \\ E z_{2i} & E z_{2i} x_i & E z_{2i} z_i \\ E z_i & E z_i x_i & E z_i^2 \end{pmatrix} \equiv A .$$
If $\mathrm{cov}(z_{2i}, x_i) = 0$ (and, for the last row, $\mathrm{cov}(z_i, x_i) = 0$ as well), then the 1st column of A times $E(x_i)$ equals the 2nd column of A, so A is not of full column rank. More generally, A is singular whenever $z_{2i}$ is uncorrelated with $x_i$ once $z_i$ is controlled for.

IV. Generalized Method of Moments Defined: We will show that the IV estimator is a special GMM estimator (i.e. for an exactly identified system)
1.
General Setup: The true parameter of interest $\delta$ is the solution to the moment conditions:
$$E[g(w_i, \delta)] = 0 \iff \delta = \arg\min_b\ E[g(w_i, b)]'\, W\, E[g(w_i, b)] \ \text{ for some p.d. weighting matrix } W \iff \delta = \arg\max_b\ -E[g(w_i, b)]'\, W\, E[g(w_i, b)] .$$
By the Analogy Principle,
$$\hat b_{GMM} = \arg\max_{b \in \Theta}\ -\left[\frac{1}{n}\sum_{i=1}^n g(w_i, b)\right]' W_n \left[\frac{1}{n}\sum_{i=1}^n g(w_i, b)\right],$$
where the sample moment conditions are defined to be $g_n(b) \equiv \frac{1}{n}\sum_{i=1}^n g(w_i, b)$.

2. Applied to the Linear Model: Our model is $y_i = x_i'\delta + \varepsilon_i$ with the moment condition $E[z_i\,(y_i - x_i'\delta)] = 0$.
o Expression for the sample moment condition in the linear model:
$$g_n(b) = \frac{1}{n}\sum_{i=1}^n z_i\,(y_i - x_i'b) = \frac{1}{n}\sum_{i=1}^n z_i y_i - \left(\frac{1}{n}\sum_{i=1}^n z_i x_i'\right) b \equiv s_{ZY} - S_{ZX}\, b,$$
where $z_i$ is $m\times 1$, $x_i'$ is $1\times d$, $s_{ZY}$ is $m\times 1$, $S_{ZX}$ is $m\times d$ and $b$ is $d\times 1$.
o Method of Moments: Suppose the equation is exactly identified ($m = d$) and $\Sigma_{ZX}$ is invertible. Since $S_{ZX} \to_p \Sigma_{ZX}$ by the ergodic theorem, $S_{ZX}$ is invertible with probability approaching 1 for sufficiently large n. So, in large samples the system of simultaneous equations $g_n(b) = 0$ has a unique solution, the MM estimator
$$\hat b_{IV} = S_{ZX}^{-1}\, s_{ZY} = \left(\frac{1}{n}\sum_{i=1}^n z_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^n z_i y_i\right)$$
(the IV estimator with $z_i$ as instruments, defined for $m = d$). [4]
o Generalized Method of Moments: If the equation is overidentified ($m > d$), there is in general no b such that $g_n(b) = 0$ exactly, so we choose b to make $g_n(b)$ as close to zero as possible, i.e. to minimize the quadratic form $g_n(b)'\, W_n\, g_n(b)$ (for some choice of p.d. weighting matrix $W_n$ that converges in probability to some p.d. W): [5]
$$\hat b_{GMM} = \arg\max_{b \in \mathbb{R}^d}\ -\left[\frac{1}{n}\sum_{i=1}^n g(w_i, b)\right]' W_n \left[\frac{1}{n}\sum_{i=1}^n g(w_i, b)\right], \qquad W_n \text{ is } m\times m .$$
Note: If the equation is just identified, then regardless of the weighting matrix, the GMM estimator equals the IV estimator numerically.

3.
GMM Estimator and Sampling Error
o GMM Estimator: $\hat b_{GMM} = (S_{ZX}' W_n S_{ZX})^{-1} S_{ZX}' W_n\, s_{ZY}$ [6]
(By 3.2 and 3.4, $S_{ZX}$ has full column rank for sufficiently large n with probability approaching 1, so $S_{ZX}' W_n S_{ZX}$ is invertible.) [7]
If $m = d$, then $S_{ZX}$ is a square $m \times m$ matrix that is invertible (full column rank → full rank), and the GMM estimator reduces to the IV estimator regardless of $W_n$! [8]
Also note that this means OLS is just an IV estimator (with $z_i = x_i$)!
o Sampling Error: Our model is $y_i = x_i'\delta + \varepsilon_i$, so
$$\hat b_{GMM} - \delta = (S_{ZX}' W_n S_{ZX})^{-1} S_{ZX}' W_n\, s_{Z\varepsilon}, \qquad s_{Z\varepsilon} \equiv \frac{1}{n}\sum_{i=1}^n z_i\varepsilon_i . \ [9]$$

Footnote 4: The IV estimator is defined for the EXACTLY IDENTIFIED case (i.e. the case where there are as many instruments as regressors, $m = d$). If $z_i = x_i$, i.e. all the regressors are predetermined (orthogonal to the contemporaneous error term), then this boils down to the OLS estimator. So OLS is a special case of the MM estimator, and IV and OLS are both special cases of GMM.

Footnote 5: The quadratic form $g_n(b)'\, W_n\, g_n(b)$ gives us a $1\times 1$ real number over which we can define the minimization/maximization problem. Otherwise it would be impossible to minimize over an m-dimensional vector of moment conditions $g_n(b)$.

Footnote 6: Write the problem as
$$\hat b_{GMM} = \arg\max_{b\in\mathbb{R}^d}\ -\tfrac{1}{2}\, g_n(b)'\, W_n\, g_n(b) = \arg\max_{b\in\mathbb{R}^d}\ -\tfrac{1}{2}\,(s_{ZY} - S_{ZX} b)'\, W_n\, (s_{ZY} - S_{ZX} b) .$$
FOC (assuming an interior solution):
$$\frac{\partial}{\partial b}\left[-\tfrac{1}{2}\,(s_{ZY} - S_{ZX}\hat b)'\, W_n\, (s_{ZY} - S_{ZX}\hat b)\right] = 0 \ \Rightarrow\ S_{ZX}' W_n (s_{ZY} - S_{ZX}\hat b) = 0 \ \Rightarrow\ S_{ZX}' W_n\, s_{ZY} = S_{ZX}' W_n S_{ZX}\,\hat b .$$
By Assumptions 3.2 and 3.4, $S_{ZX}$ is of full column rank for sufficiently large n with probability approaching 1, so $S_{ZX}' W_n S_{ZX}$ is invertible ($W_n$ p.d.). Therefore $\hat b_{GMM} = (S_{ZX}' W_n S_{ZX})^{-1} S_{ZX}' W_n\, s_{ZY}$.

Footnote 7: Claim: if $A$ ($m\times n$, $m \ge n$) has full column rank and $W$ ($m\times m$) is p.d., then $A'WA$ is invertible. Proof: Suppose not. Then there exists a nonzero $n\times 1$ vector c such that $A'WAc = 0$, hence $c'A'WAc = 0$. Let $d = Ac$ ($m\times 1$); d is nonzero, since A has full column rank and so no nontrivial linear combination of its columns gives the zero vector. But then $d'Wd = 0$ with $d \ne 0$, contradicting the assumption that W is p.d.!
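As a rough numerical sketch of the closed form above (not part of the original notes), the following simulates one endogenous regressor and evaluates $(S_{ZX}' W_n S_{ZX})^{-1} S_{ZX}' W_n s_{ZY}$. The data-generating process, the helper name `gmm_linear`, the weighting matrices and all parameter values are illustrative assumptions. It also checks that in the just-identified case ($m = d$) two different p.d. weighting matrices give the same answer as the IV formula.

```python
import numpy as np

def gmm_linear(y, X, Z, W):
    """Linear GMM: (S_zx' W S_zx)^{-1} S_zx' W s_zy (hypothetical helper)."""
    n = len(y)
    S_zx = Z.T @ X / n          # m x d sample cross moments of instruments and regressors
    s_zy = Z.T @ y / n          # m-vector of sample cross moments with y
    lhs = S_zx.T @ W @ S_zx     # d x d; invertible if S_zx has full column rank and W is p.d.
    return np.linalg.solve(lhs, S_zx.T @ W @ s_zy)

# Simulated data (assumed DGP): x1 is endogenous because it shares v with the error.
rng = np.random.default_rng(0)
n = 5000
z1, z2, v, u = rng.normal(size=(4, n))
x1 = 0.8 * z1 + 0.5 * z2 + v
eps = 0.6 * v + u
y = 1.0 + 2.0 * x1 + eps        # true delta = (1, 2)

X = np.column_stack([np.ones(n), x1])            # d = 2 regressors
Z_over = np.column_stack([np.ones(n), z1, z2])   # m = 3: overidentified
Z_just = np.column_stack([np.ones(n), z1])       # m = 2: just identified

b_ols = np.linalg.solve(X.T @ X, X.T @ y)        # inconsistent under endogeneity
b_over = gmm_linear(y, X, Z_over, np.eye(3))     # consistent for any p.d. W

# Just-identified case: the weighting matrix drops out and GMM equals IV.
W_other = np.array([[2.0, 0.5], [0.5, 1.0]])     # an arbitrary p.d. matrix
b_iv = np.linalg.solve(Z_just.T @ X, Z_just.T @ y)
b_just_I = gmm_linear(y, X, Z_just, np.eye(2))
b_just_W = gmm_linear(y, X, Z_just, W_other)

print("OLS:", b_ols)
print("GMM (m = 3, W = I):", b_over)
print("IV (m = 2):", b_iv)
```

Solving the linear system with `np.linalg.solve` instead of forming the inverse explicitly is a numerical convenience only; it computes the same estimator.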
Footnote 8: If $m = d$, then
$$\hat b_{GMM} = (S_{ZX}' W_n S_{ZX})^{-1} S_{ZX}' W_n\, s_{ZY} = S_{ZX}^{-1} W_n^{-1} (S_{ZX}')^{-1} S_{ZX}' W_n\, s_{ZY} = S_{ZX}^{-1} s_{ZY} = \hat b_{IV} .$$
Note: $(S_{ZX}' W_n S_{ZX})^{-1} = S_{ZX}^{-1} W_n^{-1} (S_{ZX}')^{-1}$, since $(S_{ZX}' W_n S_{ZX})\, S_{ZX}^{-1} W_n^{-1} (S_{ZX}')^{-1} = I$.

V. Large Sample Properties of GMM and Efficient GMM
(We established these results more generally before, using the multivariate mean value theorem. Here we can be more specific because we impose linearity: the model is linear in the parameters. Also note that we write the estimator as $\hat b_{GMM}(W_n)$ to make its dependence on the weighting matrix explicit.)

1. Consistency [10]: Under Assumptions 3.1 – 3.4, $\hat b_{GMM}(W_n) \to_p \delta$.

2. Asymptotic Normality [11]: Under Assumptions 3.1 – 3.5,
$$\sqrt{n}\,(\hat b_{GMM}(W_n) - \delta) \xrightarrow{D} N(0, V),$$
where
$$V = \mathrm{Avar}(\hat b_{GMM}(W_n)) = (\Sigma_{ZX}' W \Sigma_{ZX})^{-1}\, \Sigma_{ZX}' W\, S\, W \Sigma_{ZX}\, (\Sigma_{ZX}' W \Sigma_{ZX})^{-1},$$
with $\Sigma_{ZX} = E(z_i x_i')$, $S = E(g_i g_i')$ and $W = \mathrm{plim}\, W_n$.

3. Consistent Estimate of $\mathrm{Avar}(\hat b_{GMM}(W_n))$ [12]: Suppose there exists a consistent estimator $\hat S$ of $S = E(g_i g_i')$ ($m\times m$). Then, under 3.2, $\mathrm{Avar}(\hat b_{GMM}(W_n))$ is consistently estimated by
$$\widehat{\mathrm{Avar}}(\hat b_{GMM}(W_n)) \equiv (S_{ZX}' W_n S_{ZX})^{-1}\, S_{ZX}' W_n\, \hat S\, W_n S_{ZX}\, (S_{ZX}' W_n S_{ZX})^{-1} .$$

4. Consistency of $s^2$ (the estimate of the variance of the "true" error is consistent) [13]: For any consistent estimator $\hat b(W_n)$ of $\delta$, define $\hat\varepsilon_i \equiv y_i - x_i'\hat b(W_n)$. Under 3.1 and 3.2, and assuming $E(x_i x_i')$ exists and is finite,
$$\frac{1}{n}\sum_{i=1}^n \hat\varepsilon_i^2 \xrightarrow{P} E(\varepsilon_i^2), \quad \text{provided } E(\varepsilon_i^2) \text{ exists and is finite.}$$

5. Consistent Estimation of S: We have assumed so far that a consistent estimator of S exists. How do we obtain a consistent estimator $\hat S$ of S ($m\times m$) from the sample $(y, X, Z)$?
Suppose the coefficient estimate $\hat b$ used for calculating the residual $\hat\varepsilon_i$ for $\hat S$ is consistent for $\delta$, and suppose $S = E(g_i g_i')$ exists and is finite. Then, under Assumptions 3.1, 3.2, and 3.6,
$$\hat S = \frac{1}{n}\sum_{i=1}^n \hat\varepsilon_i^2\, z_i z_i'$$
is consistent for S. [14]
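The estimators in items 3 and 5 above can be sketched numerically as follows. This is an illustrative sketch only: the heteroskedastic data-generating process, the function name `gmm_with_robust_avar`, and the choice of first-step weighting matrix are assumptions made for the example, not something the notes prescribe.

```python
import numpy as np

def gmm_with_robust_avar(y, X, Z, W):
    """Linear GMM with S_hat = (1/n) sum e_i^2 z_i z_i' and the sandwich Avar estimate."""
    n = len(y)
    S_zx = Z.T @ X / n
    s_zy = Z.T @ y / n
    bread = np.linalg.inv(S_zx.T @ W @ S_zx)        # (S_zx' W S_zx)^{-1}
    b = bread @ S_zx.T @ W @ s_zy                   # GMM estimate
    resid = y - X @ b                               # residuals use the original regressors
    S_hat = (Z * resid[:, None] ** 2).T @ Z / n     # (1/n) sum e_i^2 z_i z_i'
    avar = bread @ (S_zx.T @ W @ S_hat @ W @ S_zx) @ bread
    se = np.sqrt(np.diag(avar) / n)                 # robust standard errors
    return b, S_hat, avar, se

# Assumed DGP: the error variance depends on z1, but E(z_i * eps_i) = 0 still holds.
rng = np.random.default_rng(1)
n = 4000
z1, z2, v, u = rng.normal(size=(4, n))
x1 = 0.8 * z1 + 0.5 * z2 + v
eps = (0.6 * v + u) * np.sqrt(0.5 + 0.5 * z1 ** 2)
y = 1.0 + 2.0 * x1 + eps

X = np.column_stack([np.ones(n), x1])
Z = np.column_stack([np.ones(n), z1, z2])
W = np.linalg.inv(Z.T @ Z / n)                      # one admissible p.d. weighting matrix

b, S_hat, avar, se = gmm_with_robust_avar(y, X, Z, W)
print("estimate:", b)
print("robust s.e.:", se)
```

Dividing the diagonal of the estimated asymptotic variance by n before taking square roots converts the variance of $\sqrt{n}(\hat b - \delta)$ into standard errors for $\hat b$ itself.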
Footnote 9: By 3.1,
$$\hat b_{GMM} = (S_{ZX}' W_n S_{ZX})^{-1} S_{ZX}' W_n\, s_{ZY} = (S_{ZX}' W_n S_{ZX})^{-1} S_{ZX}' W_n \left(\frac{1}{n}\sum_{i=1}^n z_i\,(x_i'\delta + \varepsilon_i)\right)$$
$$= (S_{ZX}' W_n S_{ZX})^{-1} S_{ZX}' W_n S_{ZX}\,\delta + (S_{ZX}' W_n S_{ZX})^{-1} S_{ZX}' W_n \left(\frac{1}{n}\sum_{i=1}^n z_i\varepsilon_i\right) = \delta + (S_{ZX}' W_n S_{ZX})^{-1} S_{ZX}' W_n\, s_{Z\varepsilon} .$$

Footnote 10: From above, $s_{Z\varepsilon} = \frac{1}{n}\sum_{i=1}^n z_i\varepsilon_i = \bar g \xrightarrow{P} E(z_i\varepsilon_i) = E(g_i) = 0$ by the ergodic theorem and 3.3. Since also $S_{ZX} \to_p \Sigma_{ZX}$, $W_n \to_p W$, and $\Sigma_{ZX}' W \Sigma_{ZX}$ is invertible by 3.4, it follows that $\hat b_{GMM} \xrightarrow{P} \delta$.

Footnote 11: Continuing from above,
$$\hat b_{GMM} - \delta = (S_{ZX}' W_n S_{ZX})^{-1} S_{ZX}' W_n\, s_{Z\varepsilon} \ \Rightarrow\ \sqrt{n}\,(\hat b_{GMM} - \delta) = (S_{ZX}' W_n S_{ZX})^{-1} S_{ZX}' W_n\, \sqrt{n}\, s_{Z\varepsilon} .$$
Here $\sqrt{n}\, s_{Z\varepsilon} = \frac{1}{\sqrt n}\sum_{i=1}^n z_i\varepsilon_i \xrightarrow{D} N(0, E(g_i g_i'))$ by the Ergodic Stationary Martingale Differences CLT, $S_{ZX} \xrightarrow{P} \Sigma_{ZX}$ by the Ergodic Theorem, and $W_n \xrightarrow{P} W$ by construction. Therefore, by the CMT and Slutsky's theorem,
$$\sqrt{n}\,(\hat b_{GMM} - \delta) \xrightarrow{D} N\left(0,\ (\Sigma_{ZX}' W \Sigma_{ZX})^{-1}\, \Sigma_{ZX}' W\, E(g_i g_i')\, W \Sigma_{ZX}\, (\Sigma_{ZX}' W \Sigma_{ZX})^{-1}\right).$$

Footnote 12: This follows from above. Standard asymptotic tools.

Footnote 13: This proof is very similar to 3D from the previous notes. Since $\hat\varepsilon_i = y_i - x_i'\hat b = \varepsilon_i - x_i'(\hat b - \delta)$,
$$\hat\varepsilon_i^2 = \varepsilon_i^2 - 2\,\varepsilon_i x_i'(\hat b - \delta) + (\hat b - \delta)' x_i x_i' (\hat b - \delta)$$
$$\Rightarrow\ \frac{1}{n}\sum_{i=1}^n \hat\varepsilon_i^2 = \frac{1}{n}\sum_{i=1}^n \varepsilon_i^2 - 2\,(\hat b - \delta)'\, s_{x\varepsilon} + (\hat b - \delta)'\, S_{XX}\, (\hat b - \delta),$$
where $s_{x\varepsilon} \equiv \frac{1}{n}\sum_i x_i\varepsilon_i$ and $S_{XX} \equiv \frac{1}{n}\sum_i x_i x_i'$.
$2\,(\hat b - \delta)'\, s_{x\varepsilon} \xrightarrow{P} 0$, since $s_{x\varepsilon} \xrightarrow{P}$ some finite vector and $\hat b \xrightarrow{P} \delta$.
$(\hat b - \delta)'\, S_{XX}\, (\hat b - \delta) \xrightarrow{P} 0$, since $\hat b - \delta \xrightarrow{P} 0$ and $S_{XX} \xrightarrow{P} \Sigma_{XX}$, finite by assumption.

Footnote 14: The proof is similar to that of 4: multiply by $z_i z_i'$ on both sides!

VI. Efficient GMM: How do we choose $W_n$ to minimize $\mathrm{Avar}(\hat b_{GMM}(W_n))$? Let $W_n$ converge to $E(g_i g_i')^{-1}$ → the inverse of the variance of the moment conditions.
1.
The efficient weighting matrix is given by $W^* = S^{-1} = E(g_i g_i')^{-1}$. For any weighting matrix W,
$$\mathrm{Avar}(\hat b_{GMM}(W)) \ \ge\ \mathrm{Avar}(\hat b_{GMM}(W^*)) = [\Sigma_{ZX}' W^* \Sigma_{ZX}]^{-1} = [\Sigma_{ZX}' S^{-1} \Sigma_{ZX}]^{-1} = [E(z_i x_i')'\, E(g_i g_i')^{-1}\, E(z_i x_i')]^{-1}$$
(in the sense that the difference is positive semidefinite). The efficient GMM estimator is
$$\hat b^{eff}_{GMM}(\hat S^{-1}) = (S_{ZX}' \hat S^{-1} S_{ZX})^{-1}\, S_{ZX}' \hat S^{-1}\, s_{ZY} .$$

2. Large Sample Properties of the Efficient GMM Estimator
From above, efficient GMM is consistent and asymptotically normal, with the following asymptotic variance and consistent estimate of it:
$$\mathrm{Avar}(\hat b^{eff}_{GMM}(\hat S^{-1})) = (\Sigma_{ZX}' S^{-1} \Sigma_{ZX})^{-1}, \qquad \widehat{\mathrm{Avar}}(\hat b^{eff}_{GMM}(\hat S^{-1})) = (S_{ZX}' \hat S^{-1} S_{ZX})^{-1} .$$

3. Hypothesis Testing: Robust t-Ratio and Wald Statistic
From below, the formulas for the robust t and Wald statistics become:
$$t_l = \frac{\hat b(\hat S^{-1})_l - \beta_l}{\text{Robust SE}^*_l}, \qquad \text{where Robust SE}^*_l = \sqrt{\frac{1}{n}\left[(S_{ZX}' \hat S^{-1} S_{ZX})^{-1}\right]_{ll}},$$
$$W = n\cdot a(\hat b(\hat S^{-1}))' \left\{ A(\hat b(\hat S^{-1}))\, (S_{ZX}' \hat S^{-1} S_{ZX})^{-1}\, A(\hat b(\hat S^{-1}))' \right\}^{-1} a(\hat b(\hat S^{-1})) .$$

4. 2-Step Efficient GMM Procedure: How do we construct an efficient GMM estimator?
A. Pick some arbitrary weighting matrix (e.g. I) that converges in probability to a symmetric p.d. W, and obtain a preliminary consistent GMM estimator $\hat b_{GMM} = (S_{ZX}' W_n S_{ZX})^{-1} S_{ZX}' W_n\, s_{ZY}$, which we will use to construct the optimal weighting matrix.
Note: Usually we set $W_n = S_{ZZ}^{-1}$. Then $\hat b_{GMM}(S_{ZZ}^{-1}) = (S_{ZX}' S_{ZZ}^{-1} S_{ZX})^{-1} S_{ZX}' S_{ZZ}^{-1}\, s_{ZY}$ → this is 2SLS.
Then, using the preliminary estimator $\hat b_{GMM}(S_{ZZ}^{-1})$, we construct $\hat S^{-1}$ with $\hat S = \frac{1}{n}\sum_{i=1}^n \hat\varepsilon_i^2\, z_i z_i'$.
B. In the second step, the efficient GMM estimator is obtained as
$$\hat b^{eff}_{GMM} = \left[S_{ZX}' \hat S^{-1} S_{ZX}\right]^{-1} S_{ZX}' \hat S^{-1}\, s_{ZY}, \quad \text{with } \widehat{\mathrm{Avar}}(\hat b^{eff}_{GMM}) = \left(S_{ZX}' \hat S^{-1} S_{ZX}\right)^{-1} .$$
Or in matrix notation:
$$\hat b(\hat S^{-1}) = \left[X'Z\,(Z'BZ)^{-1} Z'X\right]^{-1} X'Z\,(Z'BZ)^{-1} Z'y, \quad \text{with } B \equiv \mathrm{diag}(\hat\varepsilon_1^2, \dots, \hat\varepsilon_n^2) \quad (\text{in 2SLS, } B = \sigma^2 I).$$
Note: With this notation we can see that the efficient GMM estimator has the form of a GLS estimator!

5. Note on Small Sample Properties
The efficient GMM estimator uses $\hat S^{-1}$, a function of estimated fourth moments, as the weighting matrix.
Generally, it takes a substantially larger sample size to estimate fourth moments reliably (compared to 1st and 2nd moments). Therefore, the efficient GMM estimator tends to have poorer small-sample properties than GMM estimators that do not use fourth moments for $W_n$. The equally weighted GMM estimator with $W_n = I$ can outperform efficient GMM in terms of bias and variance in finite samples!

VII. Hypothesis Testing
Prop: Robust t-Ratio and Wald Statistic (Testing Linear and Nonlinear Restrictions)
Suppose Assumptions 3.1 – 3.5 hold, and suppose a consistent estimate $\hat S$ of S is available. Write $\beta$ for the true coefficient vector ($K\times 1$, with $K = d$) and $\hat W$ for the weighting matrix. Then, by above,
$$\widehat{\mathrm{Avar}}(\hat b) \equiv (S_{ZX}' \hat W S_{ZX})^{-1}\, S_{ZX}' \hat W\, \hat S\, \hat W S_{ZX}\, (S_{ZX}' \hat W S_{ZX})^{-1},$$
and:
(a) Under the null hypothesis $H_0: \beta_k = \bar\beta_k$,
$$t_k \equiv \frac{\sqrt{n}\,(\hat b_k(\hat W) - \bar\beta_k)}{\sqrt{\widehat{\mathrm{Avar}}(\hat b(\hat W))_{kk}}} = \frac{\hat b_k - \bar\beta_k}{\sqrt{\frac{1}{n}\widehat{\mathrm{Avar}}(\hat b(\hat W))_{kk}}} = \frac{\hat b_k - \bar\beta_k}{\text{Robust S.E.}^*(\hat b_k)} \xrightarrow{D} N(0, 1) . \ [15]$$
This t-ratio is called the robust t-ratio because it uses a standard error that is robust to conditional heteroskedasticity of the errors.
(b) Under the null hypothesis $H_0: R\beta = r$, where R is a $\#r \times K$ matrix with full row rank ($\#r \le K$ is the number of restrictions on $\beta$) and r is $\#r \times 1$,
$$W \equiv n\,(R\hat b(\hat W) - r)' \left\{ R\,[\widehat{\mathrm{Avar}}(\hat b(\hat W))]\,R' \right\}^{-1} (R\hat b(\hat W) - r) \xrightarrow{D} \chi^2(\#r) . \ [16]$$
(c) Under the null hypothesis with #a restrictions [17], $H_0: a(\beta) = 0$, for some #a-dimensional vector-valued function $a(\cdot)$ with continuous first derivatives s.t.
$A(\beta) \equiv \dfrac{\partial a(\beta)}{\partial \beta'}$ (the first derivative evaluated at $\beta$) is a $\#a \times K$ matrix of continuous derivatives with full row rank. Then,
$$W \equiv n\cdot a(\hat b(\hat W))' \left\{ A(\hat b(\hat W))\, [\widehat{\mathrm{Avar}}(\hat b(\hat W))]\, A(\hat b(\hat W))' \right\}^{-1} a(\hat b(\hat W)) \xrightarrow{D} \chi^2(\#a) . \ [18]$$

Hypothesis Testing by the Likelihood Ratio Principle

Footnote 15: By V.2 above, $\sqrt{n}\,(\hat b_k - \bar\beta_k) \xrightarrow{D} N(0, \mathrm{Avar}(\hat b)_{kk})$; by V.3 above, $\widehat{\mathrm{Avar}}(\hat b)_{kk} \xrightarrow{P} \mathrm{Avar}(\hat b)_{kk}$. Therefore, by Slutsky, $\sqrt{n}\,(\hat b_k - \bar\beta_k) / \sqrt{\widehat{\mathrm{Avar}}(\hat b)_{kk}} \xrightarrow{D} N(0, 1)$.

Footnote 16: We can rewrite $W = \sqrt{n}\,(R\hat b - r)' \left\{R\,[\widehat{\mathrm{Avar}}(\hat b)]\,R'\right\}^{-1} \sqrt{n}\,(R\hat b - r) = c_n' Q_n^{-1} c_n$.
Under the null, $R\beta = r$, so $c_n = \sqrt{n}\,(R\hat b - r) = R\,\sqrt{n}\,(\hat b - \beta)$.
From V.2 above, $\sqrt{n}\,(\hat b - \beta) \xrightarrow{D} N(0, \mathrm{Avar}(\hat b))$ ⇒ $R\,\sqrt{n}\,(\hat b - \beta) \xrightarrow{D} N(0, R\,\mathrm{Avar}(\hat b)\,R')$.
From V.3 above, $\widehat{\mathrm{Avar}}(\hat b) \xrightarrow{P} \mathrm{Avar}(\hat b)$ ⇒ $Q_n = R\,[\widehat{\mathrm{Avar}}(\hat b)]\,R' \xrightarrow{P} R\,\mathrm{Avar}(\hat b)\,R' \equiv Q$.
Recall: if $x \sim N_m(\mu, \Sigma)$ with $\Sigma$ ($m\times m$) nonsingular, then $(x - \mu)'\,\Sigma^{-1}(x - \mu) \sim \chi^2(m)$.
Here, $c_n \xrightarrow{D} c \sim N(0, R\,\mathrm{Avar}(\hat b)\,R')$, so $c_n' Q_n^{-1} c_n \xrightarrow{D} c' Q^{-1} c \sim \chi^2(\#r)$.

Footnote 17: The full row rank condition is there so that the hypothesis is well-defined. This is the generalization of the requirement for linear restrictions $R\beta = r$ that R is of full row rank.

Footnote 18: Under the null, $a(\beta) = 0$ ⇒ $\sqrt{n}\, a(\hat b) = \sqrt{n}\,(a(\hat b) - a(\beta))$.
By V.2 above, $\sqrt{n}\,(\hat b - \beta) \xrightarrow{D} N(0, \mathrm{Avar}(\hat b))$ ⇒ by the Delta Method, $\sqrt{n}\,(a(\hat b) - a(\beta)) \xrightarrow{D} c$, $c \sim N(0, A(\beta)\,\mathrm{Avar}(\hat b)\,A(\beta)')$.
The rest follows as in the proof of (b) above.

VIII.
1. Test for Overidentifying Restrictions:
If the equation is exactly identified, then it is possible to choose $b^*$ s.t. $g_n(b^*) = 0$, so that the distance $J(b^*, W_n) = n\, g_n(b^*)'\, W_n\, g_n(b^*) = 0$ (we call $b^*$ the IV estimator). If the equation is overidentified, then the distance cannot in general be set to 0 exactly (there are more moment conditions than parameters), though we expect the minimized distance to be close to 0. If we choose an efficient weighting matrix $W^*$ s.t. $\mathrm{plim}\, W^* = S^{-1}$, then the minimized distance is asymptotically chi-squared.
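To make the idea of a minimized distance concrete, here is a minimal sketch that runs the two-step procedure of VI.4 and evaluates $J = n\, g_n(\hat b)'\, \hat S^{-1} g_n(\hat b)$, once with valid instruments and once with an instrument that enters the error term. The data-generating process, the function name `two_step_gmm_J`, and the decision to compare against the 5% critical value of $\chi^2(1)$ are assumptions made for this illustration.

```python
import numpy as np

CHI2_1_CRIT_5PCT = 3.841  # 5% critical value of chi-squared with 1 degree of freedom

def two_step_gmm_J(y, X, Z):
    """Two-step efficient GMM and the J statistic n * g_n' S_hat^{-1} g_n (hypothetical helper)."""
    n = len(y)
    S_zx = Z.T @ X / n
    s_zy = Z.T @ y / n

    def estimate(W):
        return np.linalg.solve(S_zx.T @ W @ S_zx, S_zx.T @ W @ s_zy)

    b1 = estimate(np.linalg.inv(Z.T @ Z / n))       # step 1: W_n = S_zz^{-1}
    e1 = y - X @ b1                                 # first-step residuals
    S_hat = (Z * e1[:, None] ** 2).T @ Z / n        # (1/n) sum e_i^2 z_i z_i'
    W_eff = np.linalg.inv(S_hat)
    b2 = estimate(W_eff)                            # step 2: efficient GMM
    g = s_zy - S_zx @ b2                            # sample moments at the estimate
    J = float(n * (g @ W_eff @ g))
    return b2, J

# Assumed DGP with m = 3 instruments and d = 2 regressors, so m - d = 1.
rng = np.random.default_rng(2)
n = 5000
z1, z2, v, u = rng.normal(size=(4, n))
x1 = 0.8 * z1 + 0.5 * z2 + v
eps = 0.6 * v + u
X = np.column_stack([np.ones(n), x1])
Z = np.column_stack([np.ones(n), z1, z2])

y_ok = 1.0 + 2.0 * x1 + eps                 # all orthogonality conditions hold
y_bad = 1.0 + 2.0 * x1 + eps + 0.5 * z2     # z2 is now correlated with the error

b_ok, J_ok = two_step_gmm_J(y_ok, X, Z)
b_bad, J_bad = two_step_gmm_J(y_bad, X, Z)
print("J (valid instruments):", J_ok)
print("J (invalid instrument):", J_bad, "vs. critical value", CHI2_1_CRIT_5PCT)
```

With valid instruments J behaves like a draw from $\chi^2(1)$, so it can occasionally exceed the critical value; with the invalid instrument it grows with n.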
Hansen's Test of Overidentifying Restrictions: Suppose there is available a consistent estimator $\hat S$ of $S\ (= E(g_i g_i'))$. Under Assumptions 3.1 – 3.5,
$$J\left(\hat b_{GMM}(\hat S^{-1}),\ \hat S^{-1}\right) = n\, g_n\left(\hat b_{GMM}(\hat S^{-1})\right)'\, \hat S^{-1}\, g_n\left(\hat b_{GMM}(\hat S^{-1})\right) \xrightarrow{D} \chi^2(m - d) .$$
Note: This says that the objective function evaluated at the estimator, i.e. the minimized distance, is asymptotically chi-squared. This is a specification test, testing whether all the restrictions of the model (i.e. 3.1 – 3.5) are satisfied. Given a large enough sample, if the J statistic is "surprisingly" large, then either the orthogonality condition (3.3) or the other assumptions (or both) are likely to be false.

2. Testing a Subset of Orthogonality Conditions (Newey; Eichenbaum, Hansen, and Singleton)
Suppose Assumptions 3.1 – 3.5 hold. Let $z_{i1}$ be a subvector of $z_i$, and strengthen Assumption 3.4 by requiring that the rank condition for identification is satisfied for $z_{i1}$ (so $E(z_{i1} x_i')$ is of full column rank). Then, for any consistent estimators $\hat S$ of S and $\hat S_{11}$ of $S_{11}$,
$$C \equiv J - J_1 \xrightarrow{D} \chi^2(m - m_1),$$
where $m = \#z_i$ (dimension of $z_i$), $m_1 = \#z_{i1}$ (dimension of $z_{i1}$), and
$$J = n\, g_n(b^*)'\, \hat S^{-1}\, g_n(b^*), \qquad J_1 = n\, g_{1n}(b^*)'\, \hat S_{11}^{-1}\, g_{1n}(b^*),$$
each evaluated at the efficient GMM estimator $b^*$ for the corresponding set of instruments, with the partitions
$$g_n(\hat b)_{m\times 1} \equiv \begin{bmatrix} g_{1n}(\hat b)_{\,m_1\times 1} \\ g_{2n}(\hat b)_{\,(m-m_1)\times 1} \end{bmatrix}, \qquad S_{m\times m} \equiv \begin{bmatrix} S_{11\,(m_1\times m_1)} & S_{12\,(m_1\times (m-m_1))} \\ S_{21\,((m-m_1)\times m_1)} & S_{22\,((m-m_1)\times (m-m_1))} \end{bmatrix}.$$

IX. Implications of Conditional Homoskedasticity: Assumption 3.7, $E(\varepsilon_i^2 \mid z_i) = \sigma^2$.
A. $S = E(g_i g_i') = E[\varepsilon_i^2 z_i z_i'] = \sigma^2 \Sigma_{ZZ}$, where $\Sigma_{ZZ} = E[z_i z_i']$ [19] (as in chapter 2, this decomposition has several implications)
• S is nonsingular (by 3.5), so the decomposition implies $\sigma^2 \Sigma_{ZZ}$ is nonsingular → $\sigma^2 \ne 0$ and $\Sigma_{ZZ}$ is nonsingular
• A consistent estimator of S is $\hat S = \hat\sigma^2\, \frac{1}{n}\sum_{i=1}^n z_i z_i' = \hat\sigma^2 S_{ZZ}$, where $\hat\sigma^2$ is some consistent estimator of $\sigma^2$
(By ergodic stationarity, $\hat S \to_{a.s.} S$; we do not need the 4th moment assumption!)
B.
Efficient GMM becomes 2SLS: GMM Estimator Under Conditional Homoskedasticity:
Setting $\hat S = \hat\sigma^2\, \frac{1}{n}\sum_{i=1}^n z_i z_i' = \hat\sigma^2 S_{ZZ}$, the GMM estimator becomes
$$\hat b_{GMM}((\hat\sigma^2 S_{ZZ})^{-1}) = (S_{ZX}' (\hat\sigma^2 S_{ZZ})^{-1} S_{ZX})^{-1} S_{ZX}' (\hat\sigma^2 S_{ZZ})^{-1} s_{ZY} = (S_{ZX}' S_{ZZ}^{-1} S_{ZX})^{-1} S_{ZX}' S_{ZZ}^{-1} s_{ZY} = \hat b_{GMM}(S_{ZZ}^{-1}) = \hat b_{2SLS}$$
(it does not depend on $\hat\sigma^2$!). [20]

X. 2SLS: 2SLS is a (special case) GMM estimator, i.e. one with a particular choice of weighting matrix, $(\hat\sigma^2 S_{ZZ})^{-1}$. It is also the efficient GMM estimator under conditional homoskedasticity.

A. Alternative Derivations of 2SLS: 2SLS as an IV estimator and 2SLS as 2 regressions
Let
$$X_{n\times d} = \begin{bmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{bmatrix}, \qquad Z_{n\times m} = \begin{bmatrix} z_1' \\ z_2' \\ \vdots \\ z_n' \end{bmatrix}, \qquad y_{n\times 1} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.$$
Then,
$$\hat b_{2SLS} = (S_{ZX}' S_{ZZ}^{-1} S_{ZX})^{-1} S_{ZX}' S_{ZZ}^{-1} s_{ZY} = (X'Z(Z'Z)^{-1}Z'X)^{-1} X'Z(Z'Z)^{-1}Z'y = (X'PX)^{-1} X'Py, \quad \text{where } P = Z(Z'Z)^{-1}Z',$$
$$\widehat{\mathrm{Avar}}(\hat b_{2SLS}) = n\hat\sigma^2 \left[X'Z(Z'Z)^{-1}Z'X\right]^{-1} = n\hat\sigma^2\, [X'PX]^{-1}, \quad \text{where } \hat\sigma^2 \equiv \frac{\hat\varepsilon'\hat\varepsilon}{n} \text{ and } \hat\varepsilon \equiv y - X\hat b_{2SLS} .$$
t-Statistic:
$$t_l = \frac{\hat b_{2SLS,l} - \delta_l}{\sqrt{\hat\sigma^2\, \left([X'PX]^{-1}\right)_{ll}}} \xrightarrow{D} N(0, 1)$$
Wald Statistic:
$$W = \frac{a(\hat b_{2SLS})' \left\{ A(\hat b_{2SLS}) \left[X'Z(Z'Z)^{-1}Z'X\right]^{-1} A(\hat b_{2SLS})' \right\}^{-1} a(\hat b_{2SLS})}{\hat\sigma^2} \xrightarrow{D} \chi^2(\#a)$$
J-Statistic (Sargan's statistic):
$$J\left(\hat b_{2SLS}, (\hat\sigma^2 S_{ZZ})^{-1}\right) = \frac{(y - X\hat b_{2SLS})'\, P\, (y - X\hat b_{2SLS})}{\hat\sigma^2} = \frac{\hat\varepsilon' P \hat\varepsilon}{\hat\sigma^2} \xrightarrow{D} \chi^2(m - d)$$

Footnote 19: $S = E(g_i g_i') = E[(z_i\varepsilon_i)(z_i\varepsilon_i)'] = E[\varepsilon_i^2 z_i z_i'] = E[E(\varepsilon_i^2 z_i z_i' \mid z_i)] = E[z_i z_i'\, E(\varepsilon_i^2 \mid z_i)] = E[\sigma^2 z_i z_i'] = \sigma^2 \Sigma_{ZZ}$.

Footnote 20: Note: In efficient 2-step GMM estimation, the first step is to obtain a consistent estimator of S. Under conditional homoskedasticity, we do not need to perform the first step, since the assumption immediately gives the consistent estimator $\hat S = \hat\sigma^2\, \frac{1}{n}\sum_{i=1}^n z_i z_i' = \hat\sigma^2 S_{ZZ}$. So the second-step estimator collapses to the GMM estimator with $S_{ZZ}^{-1}$ as the weighting matrix.
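The collapse described above can be checked numerically. The sketch below, under an assumed homoskedastic data-generating process with illustrative parameter values, compares GMM with weighting matrix $S_{ZZ}^{-1}$ to the projection formula $(X'PX)^{-1}X'Py$, and Sargan's statistic $\hat\varepsilon' P \hat\varepsilon / \hat\sigma^2$ to the GMM distance $n\, g_n'(\hat\sigma^2 S_{ZZ})^{-1} g_n$. All variable names are hypothetical.

```python
import numpy as np

# Assumed DGP with conditionally homoskedastic errors.
rng = np.random.default_rng(3)
n = 2000
z1, z2, v, u = rng.normal(size=(4, n))
x1 = 0.8 * z1 + 0.5 * z2 + v
y = 1.0 + 2.0 * x1 + (0.6 * v + u)
X = np.column_stack([np.ones(n), x1])       # n x d, d = 2
Z = np.column_stack([np.ones(n), z1, z2])   # n x m, m = 3

# GMM with W_n = S_zz^{-1}
S_zx, s_zy, S_zz = Z.T @ X / n, Z.T @ y / n, Z.T @ Z / n
W = np.linalg.inv(S_zz)
b_gmm = np.linalg.solve(S_zx.T @ W @ S_zx, S_zx.T @ W @ s_zy)

def project(A):
    """Return P A with P = Z (Z'Z)^{-1} Z', without forming the n x n matrix P."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T @ A)

# Projection form (X'PX)^{-1} X'Py
b_2sls = np.linalg.solve(X.T @ project(X), X.T @ project(y))

# Sargan's statistic from residuals based on the ORIGINAL regressors X
resid = y - X @ b_2sls
sigma2 = resid @ resid / n
sargan = resid @ project(resid) / sigma2

# Same quantity written as the GMM distance n * g_n' (sigma2 * S_zz)^{-1} g_n
g = Z.T @ resid / n
J = n * (g @ np.linalg.inv(sigma2 * S_zz) @ g)

print("GMM(S_zz^-1):", b_gmm, "2SLS:", b_2sls)
print("Sargan:", sargan, "J:", J)
```

The helper `project` avoids building P explicitly, which matters only for memory when n is large; the algebra is unchanged.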
This estimator is called the 2SLS estimator because it can be computed by 2 OLS regressions.

B. 2 Interpretations of 2SLS: $\hat b_{2SLS} = (X'PX)^{-1}X'Py$
(i) 2SLS as an IV Estimator
Let $\hat x_i$ ($D\times 1$) be the vector of D instruments for the D regressors $x_i$ (here $D = d$). These instruments are generated from $z_i$ ($M\times 1$) as follows:
a. The d-th instrument is the fitted value from regressing the d-th regressor, $x_{id}$, on $z_i$. We obtain $\hat x_d = Z(Z'Z)^{-1}Z'x_d$ and
$$\hat X_{n\times D} = \left[Z(Z'Z)^{-1}Z'x_1, \dots, Z(Z'Z)^{-1}Z'x_D\right] = Z(Z'Z)^{-1}Z'X = PX .$$
(Verify that these are indeed instruments: uncorrelated with the errors and correlated with the endogenous variables.)
b. Recall: if $Z_{n\times D}$ is the data matrix of D instruments for the D regressors $X_{n\times D}$, then
$$\hat b_{IV} = S_{ZX}^{-1} s_{ZY} = \left(\frac{1}{n}\sum_{i=1}^n z_i x_i'\right)^{-1} \left(\frac{1}{n}\sum_{i=1}^n z_i y_i\right).$$
Here, with $\hat x_i$ as the instruments, the IV estimator is
$$\hat b_{IV} = \left(\frac{1}{n}\sum_{i=1}^n \hat x_i x_i'\right)^{-1} \left(\frac{1}{n}\sum_{i=1}^n \hat x_i y_i\right) = (\hat X'X)^{-1} \hat X'y = (X'PX)^{-1}X'Py = \hat b_{2SLS} .$$

(ii) 2SLS as 2 Regressions
Since P is symmetric and idempotent, $\hat b_{2SLS} = (X'PX)^{-1}X'Py = (X'P'PX)^{-1}X'P'y = (\hat X'\hat X)^{-1}\hat X'y$. We can obtain this estimator by 2 OLS estimations:
a. Regress X on Z to obtain the fitted values $\hat x_i$, i.e. obtain $\hat X = PX$, where $P = Z(Z'Z)^{-1}Z'$.
• Note: We only need to regress the endogenous variables on the instruments. The part of the vector x that is predetermined should itself be treated as an instrument, so projecting it onto the column space of Z returns it unchanged.
b. Regress y on $\hat X$, i.e. obtain $\hat b_{2SLS} = (\hat X'\hat X)^{-1}\hat X'y$.
Note: OLS packages applied to the second stage return standard errors based on the residual vector $y - \hat X\hat b_{2SLS}$. This is NOT the same as $y - X\hat b_{2SLS}$ (i.e. the estimated residual of interest). Therefore the estimated asymptotic variance from the second stage cannot be used for statistical inference.

C. Asymptotic Properties of 2SLS: These results follow from the fact that 2SLS is a special case of GMM with $W_n = S_{ZZ}^{-1}$.
a. b.
a. Consistency [21]: Under Assumptions 3.1 – 3.4, the 2SLS estimator $\hat b_{2SLS} = (S_{ZX}' S_{ZZ}^{-1} S_{ZX})^{-1} S_{ZX}' S_{ZZ}^{-1} s_{ZY}$ is consistent.

b. Asymptotic Normality: If we add Assumption 3.5 to 3.1 – 3.4, then the 2SLS estimator is asymptotically normal:
$$\sqrt{n}\,(\hat b_{GMM}(S_{ZZ}^{-1}) - \delta) \xrightarrow{D} N(0, V),$$
where
$$V = \mathrm{Avar}(\hat b_{GMM}(S_{ZZ}^{-1})) = (\Sigma_{ZX}' \Sigma_{ZZ}^{-1} \Sigma_{ZX})^{-1}\, \Sigma_{ZX}' \Sigma_{ZZ}^{-1}\, S\, \Sigma_{ZZ}^{-1} \Sigma_{ZX}\, (\Sigma_{ZX}' \Sigma_{ZZ}^{-1} \Sigma_{ZX})^{-1},$$
with $\Sigma_{ZX} = E(z_i x_i')$, $\Sigma_{ZZ} = E(z_i z_i')$ and $S = E(g_i g_i')$, because $S_{ZZ} = \frac{1}{n}\sum_{i=1}^n z_i z_i' \xrightarrow{P} E(z_i z_i')$ by the Ergodic Theorem (since the instruments are ergodic stationary).

Footnote 21: Since the 2SLS estimator is a special case of GMM, consistency follows from the general case.

c. Conditional Homoskedasticity: If we add Assumption 3.7 to 3.1 – 3.5, then the estimator is the efficient GMM estimator, with the asymptotic variance given below.
GMM Estimator Under Conditional Homoskedasticity: Setting $\hat S = \hat\sigma^2\, \frac{1}{n}\sum_{i=1}^n z_i z_i' = \hat\sigma^2 S_{ZZ}$, the GMM estimator becomes
$$\hat b_{GMM}((\hat\sigma^2 S_{ZZ})^{-1}) = (S_{ZX}' (\hat\sigma^2 S_{ZZ})^{-1} S_{ZX})^{-1} S_{ZX}' (\hat\sigma^2 S_{ZZ})^{-1} s_{ZY} = (S_{ZX}' S_{ZZ}^{-1} S_{ZX})^{-1} S_{ZX}' S_{ZZ}^{-1} s_{ZY} = \hat b_{GMM}(S_{ZZ}^{-1}) = \hat b_{2SLS},$$
$$\mathrm{Avar}(\hat b_{2SLS}) = \mathrm{Avar}\left(\hat b^{eff}_{GMM}(S^{-1})\right) = \left\{\Sigma_{ZX}' (\sigma^2 \Sigma_{ZZ})^{-1} \Sigma_{ZX}\right\}^{-1} = \sigma^2\, (\Sigma_{ZX}' \Sigma_{ZZ}^{-1} \Sigma_{ZX})^{-1} \quad \text{(under conditional homoskedasticity).}$$
A natural, consistent estimator of the asymptotic variance is
$$\widehat{\mathrm{Avar}}(\hat b_{2SLS}) = \hat\sigma^2\, (S_{ZX}' S_{ZZ}^{-1} S_{ZX})^{-1}, \quad \text{where } \hat\sigma^2 \equiv \frac{1}{n}\sum_{i=1}^n (y_i - x_i'\hat b_{2SLS})^2 .$$

d. The t-Statistic converges in distribution to a standard normal:
$$T_l = \frac{\hat b_{2SLS,l} - \delta_l}{SE(\hat b_{2SLS})_l} \xrightarrow{D} N(0, 1), \qquad SE_l = \sqrt{\frac{\hat\sigma^2}{n}\left[(S_{ZX}' S_{ZZ}^{-1} S_{ZX})^{-1}\right]_{ll}} = \sqrt{\hat\sigma^2 \left[(X'Z(Z'Z)^{-1}Z'X)^{-1}\right]_{ll}} = \sqrt{\hat\sigma^2 \left[(X'PX)^{-1}\right]_{ll}} .$$

e. The Wald statistic converges in distribution to chi-square(#r), where #r is the dimensionality of the restrictions:
$$W = \frac{a(\hat b_{2SLS})' \left\{ A(\hat b_{2SLS}) \left[X'Z(Z'Z)^{-1}Z'X\right]^{-1} A(\hat b_{2SLS})' \right\}^{-1} a(\hat b_{2SLS})}{\hat\sigma^2} \xrightarrow{D} \chi^2(\#r)$$

f. The Sargan statistic converges in distribution to chi-square(m − d):
$$\text{Sargan's statistic} = \frac{\hat\varepsilon' P \hat\varepsilon}{\hat\sigma^2} \xrightarrow{D} \chi^2(m - d)$$

D. Note on Small Sample Properties of 2SLS
There is research showing that if the $R^2$ in the first-stage regression is low, then we should suspect that the large-sample approximation to the finite-sample distribution of the 2SLS estimator is poor.

E. When Regressors are Predetermined and Errors are Conditionally Homoskedastic: Efficient GMM = OLS!
When all regressors are predetermined and errors are conditionally homoskedastic, the objective function (J statistic) for the efficient GMM estimator/2SLS, as a function of b, is:
$$J\left(b, (\hat\sigma^2 S_{ZZ})^{-1}\right) = n \left[\tfrac{1}{n} Z'(y - Xb)\right]' (\hat\sigma^2 S_{ZZ})^{-1} \left[\tfrac{1}{n} Z'(y - Xb)\right] = \frac{(y - Xb)'\, P\, (y - Xb)}{\hat\sigma^2}$$
$$= \frac{y'Py - b'X'Py - y'PXb + b'X'PXb}{\hat\sigma^2} = \frac{y'Py - 2b'X'Py + b'X'PXb}{\hat\sigma^2} \quad (\text{since } P = P')$$
$$= \frac{y'Py - 2b'X'y + b'X'Xb}{\hat\sigma^2} \quad (\text{since } PX = X \text{ when } x_i \subseteq z_i, \text{ i.e. regressors are instruments})$$
$$= \frac{(y - Xb)'(y - Xb)}{\hat\sigma^2} - \frac{y'y - y'Py}{\hat\sigma^2} = \frac{(y - Xb)'(y - Xb)}{\hat\sigma^2} - \frac{(y - \hat y)'(y - \hat y)}{\hat\sigma^2}, \quad \text{where } \hat y \equiv Py .$$
Since the last term does not depend on b, minimizing J amounts to minimizing $SSR = (y - Xb)'(y - Xb)$.
Implications:
i. The efficient GMM estimator is OLS (this is true as long as $z_i = x_i$, i.e. the regressors are predetermined).
ii. The restricted efficient GMM estimator subject to the constraints of the null hypothesis is the restricted OLS estimator (whose objective function is not J but SSR).
iii. The Wald statistic, which is numerically equal to the LR statistic, can be calculated as the difference in SSR with and without the imposition of the null, normalized by $\hat\sigma^2$.
(This confirms the derivation in 2.6 that the LR principle can derive the Wald statistic.)
(Note: This is why we can fit OLS into the GMM framework, treating the x's as instruments.)

F. Limited Information Maximum Likelihood Estimator (LIML): This is the ML counterpart of 2SLS.
Both are k-class estimators. 2SLS is the k-class estimator with k = 1 (p. 541). When the equation is just identified, LIML's k also equals 1, so LIML = 2SLS numerically.

Multiple Equation GMM

Background: Having seen how to estimate 1 equation via GMM, we can now estimate a system of multiple equations as well. Going from single to multiple equations is easy because the multiple-equation GMM estimator can be expressed as a single-equation GMM estimator by suitably specifying the matrices and vectors comprising the single-equation GMM formula.

Summary: Under conditional homoskedasticity, multiple-equation GMM reduces to the full-information instrumental variables efficient estimator (FIVE), which reduces to 3SLS if the set of instruments is common to all equations. If we further assume that all regressors are predetermined, 3SLS reduces to seemingly unrelated regressions (SUR), which in turn reduces to multivariate regression when all the equations have the same regressors.

I. Assumptions: There are M equations, each of which is a linear equation like the one in single-equation GMM.

4.1 Linearity: There are M linear equations,
$$y_{im} = x_{im}'\delta_m + \varepsilon_{im} \qquad (m = 1, 2, \dots, M;\ i = 1, 2, \dots, n)$$
(this is the system of equations we want to estimate; $x_{im}$ is the $d_m$-dimensional vector of regressors, $\delta_m$ is the coefficient vector, and $\varepsilon_{im}$ is the unobservable error term for the m-th equation).
Note on interequation correlation and cross-equation restrictions: see [22]. Cross-equation restrictions often occur in panel data models, where the same relationship can be estimated for different points in time. [23]

4.2 Ergodic Stationarity: Let $w_i$ be the unique and nonconstant elements of $(y_{i1}, \dots, y_{iM}, x_{i1}, \dots, x_{iM}, z_{i1}, \dots, z_{iM})$. $\{w_i\}$ is jointly stationary and ergodic.
Note: This is stronger than assuming that ergodic stationarity is satisfied for each of the M equations in the system. Even if $\{y_{im}, z_{im}, x_{im}\}$ is stationary and ergodic for each m, this does not imply that the whole system $\{w_i\}$, i.e. the union of the individual processes, is jointly stationary and ergodic. [24]

4.3 Orthogonality Conditions: The conditions for the M-equation system are just the collection of the conditions for each equation.
For each equation m, the $K_m$ variables in $z_{im}$ are predetermined in the sense that they are all orthogonal to the current error term:
$$E(z_{im}\varepsilon_{im}) = 0 \ \ \forall i,\ m = 1, 2, \dots, M \iff E(g_i) \equiv E\begin{bmatrix} z_{i1}\,(y_{i1} - x_{i1}'\delta_1) \\ \vdots \\ z_{iM}\,(y_{iM} - x_{iM}'\delta_M) \end{bmatrix} = E\begin{bmatrix} z_{i1}\,\varepsilon_{i1} \\ \vdots \\ z_{iM}\,\varepsilon_{iM} \end{bmatrix} = 0,$$
where $z_{im}$ is $K_m\times 1$, $x_{im}'$ is $1\times d_m$, $\delta_m$ is $d_m\times 1$, and $g_i$ is $\left(\sum_{m=1}^M K_m\right)\times 1$.
Note on Cross Orthogonalities: The model assumes no "cross" orthogonalities; e.g. $z_{i1}$ and $\varepsilon_{i2}$ do not have to be orthogonal. However, if a variable is included in both $z_{i1}$ and $z_{i2}$ (a shared instrument), then 4.3 implies that the variable is orthogonal to both $\varepsilon_{i1}$ and $\varepsilon_{i2}$.

Footnote 22: The model makes no assumptions about the interequation (or contemporaneous) correlation between the errors $(\varepsilon_{i1}, \dots, \varepsilon_{iM})$. Also, there are no a priori restrictions on the coefficients from different equations: i.e. the model assumes no cross-equation restrictions on the coefficients.
Example: Suppose we want to estimate the wage equation (a la Griliches) and add to it the equation for KWW (score on the "Knowledge of the World of Work" test):
$$\mathrm{LogWage}_i = \phi_1 + \beta_1 S_i + \gamma_1 IQ_i + \pi\, EXPR_i + \varepsilon_{i1}$$
$$KWW_i = \phi_2 + \beta_2 S_i + \gamma_2 IQ_i + \varepsilon_{i2}$$
$$x_{i1} = (1, S_i, IQ_i, EXPR_i)', \quad x_{i2} = (1, S_i, IQ_i)', \quad \delta_1 = (\phi_1, \beta_1, \gamma_1, \pi)', \quad \delta_2 = (\phi_2, \beta_2, \gamma_2)'$$
Here, $\varepsilon_{i1}$ and $\varepsilon_{i2}$ can be correlated. Correlation arises if, for example, there is an unobservable individual characteristic that affects both the wage rate and the test score.
There are no cross-equation restrictions such as β1 = β2.

[23] Panel data example: Suppose we have data for the log-wage exercise (a la Griliches) in 1969 and 1980. We can estimate 2 wage equations:
\[
\begin{aligned}
LW69_i &= \phi_1 + \beta_1 S69_i + \gamma_1 IQ_i + \pi_1 EXPR69_i + \varepsilon_{i1} \\
LW80_i &= \phi_2 + \beta_2 S80_i + \gamma_2 IQ_i + \pi_2 EXPR80_i + \varepsilon_{i2}
\end{aligned}
\]
In this setup, S69i (education in 1969) and S80i (education in 1980) are 2 different variables. One possible set of restrictions is that the coefficients remain unchanged through time (i.e. same effect): φ1 = φ2, β1 = β2, γ1 = γ2, π1 = π2.

[24] Example: Element-wise vs. joint stationarity
Let {ei} (i = 1, 2, …) be a scalar i.i.d. process. Create a 2-dimensional process {zi} from it by defining zi1 = ei and zi2 = e1.
Note: The first component is an i.i.d. sequence. The second is a constant sequence (maximum serial dependence). Both are stationary processes.
Here, the scalar processes {zi1} and {zi2} are each stationary. However, the vector process {zi} is not jointly stationary because the joint distribution of z1 = (e1, e1)' differs from that of z2 = (e2, e1)'.

4.4 Rank Condition for Identification: Guarantees there's a unique solution to the system of equations [25]
For each of the equations m = 1, 2, …, M, the Km x dm matrix E(zim xim') is of full column rank (equivalently, E(xim zim') is of full row rank). This requires Km ≥ dm (# of equations ≥ # of unknowns) for all m.

4.5 gi is a Martingale Difference with Finite Second Moments: Assumption for Asymptotic Normality
{gi} is a joint martingale difference sequence with finite second moments (so E(gi) = 0 with E(gi | gi-1, gi-2, …, g1) = 0 for i ≥ 2, hence no serial correlation in gi).
The matrix of cross moments,
\[
S = E(g_i g_i') =
\begin{bmatrix}
E(\varepsilon_{i1}\varepsilon_{i1} z_{i1} z_{i1}') & \cdots & E(\varepsilon_{i1}\varepsilon_{iM} z_{i1} z_{iM}') \\
\vdots & & \vdots \\
E(\varepsilon_{iM}\varepsilon_{i1} z_{iM} z_{i1}') & \cdots & E(\varepsilon_{iM}\varepsilon_{iM} z_{iM} z_{iM}')
\end{bmatrix}
\quad \Big(\textstyle\sum_m K_m \times \sum_m K_m\Big),
\]
is nonsingular.
Note: This is stronger than assuming gim = zim·εim is an MDS in each equation m.
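As an illustration of the stacked moment vector gi and the block structure of S in 4.5, the following is a minimal NumPy sketch of the sample counterpart (1/n) Σ gi gi'. The function names `stacked_moments` and `s_hat`, the simulated instruments, and the use of simulated errors in place of estimated residuals are all assumptions made for illustration, not part of the notes.

```python
import numpy as np

def stacked_moments(Z_list, resid):
    """Rows are g_i' = (e_i1 * z_i1', ..., e_iM * z_iM').

    Z_list[m] is the (n, K_m) instrument matrix of equation m,
    resid is the (n, M) matrix of errors (or residuals).
    """
    return np.hstack([Z * resid[:, [m]] for m, Z in enumerate(Z_list)])

def s_hat(Z_list, resid):
    """Sample analogue of S = E(g_i g_i'): (1/n) * sum_i g_i g_i'."""
    G = stacked_moments(Z_list, resid)
    return G.T @ G / G.shape[0]

# Hypothetical data: M = 2 equations with K_1 = 3 and K_2 = 2 instruments.
rng = np.random.default_rng(0)
n = 500
Z1 = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Z2 = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 2.0]], size=n)

S = s_hat([Z1, Z2], eps)

# The (1,2) block should equal (1/n) * sum_i e_i1 e_i2 z_i1 z_i2'.
block12 = (Z1 * (eps[:, [0]] * eps[:, [1]])).T @ Z2 / n
```

The resulting matrix is (K_1 + K_2) x (K_1 + K_2), and its off-diagonal blocks pick up the cross-equation error products, which is where joint estimation gets its extra information.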
Add'l:
4.6 Finite Fourth Moments for Regressors: (For consistent estimation of S)
E[(zimk xihj)^2] exists and is finite for all k = 1, …, Km, j = 1, …, dh, and m, h = 1, 2, …, M, where zimk is the k-th element of zim and xihj is the j-th element of xih.

4.7 Conditional Homoskedasticity: Constant Cross Moment
E(εim εih | zim, zih) = σmh for all m, h = 1, 2, …, M, or E(εi εi' | Zi) = Σ

Note on Complete System of Simultaneous Equations: the "complete" system adds more assumptions to our model, assumptions which are unnecessary for the development of ME GMM. They are covered in 8.5.

II. Multiple-Equation GMM Defined: This is the same as Single Equation GMM but with re-defined matrices

1. General Setup
The parameter of interest δ_GMM = (δ1', …, δM')' is defined implicitly as the solution to the moment conditions
\[
E[g_i(w_i;\delta)] \equiv
E\begin{bmatrix} z_{i1}\,(y_{i1} - x_{i1}'\delta_1) \\ \vdots \\ z_{iM}\,(y_{iM} - x_{iM}'\delta_M) \end{bmatrix}
=
\begin{bmatrix} E(z_{i1} y_{i1}) \\ \vdots \\ E(z_{iM} y_{iM}) \end{bmatrix}
-
\begin{bmatrix} E(z_{i1} x_{i1}') & & 0 \\ & \ddots & \\ 0 & & E(z_{iM} x_{iM}') \end{bmatrix}
\begin{bmatrix} \delta_1 \\ \vdots \\ \delta_M \end{bmatrix}
\equiv \sigma_{ZY} - \Sigma_{ZX}\,\delta = 0 ,
\]
where the stacked vectors are (Σm Km) x 1, Σ_ZX is (Σm Km) x (Σm dm), and δ is (Σm dm) x 1.
Here σ_ZY = E(Zi Yi) and Σ_ZX = E(Zi Xi'), with Yi = (yi1, …, yiM)',
\[
Z_i = \begin{bmatrix} z_{i1} & & 0 \\ & \ddots & \\ 0 & & z_{iM} \end{bmatrix}_{(\sum_m K_m)\times M},
\qquad
X_i = \begin{bmatrix} x_{i1} & & 0 \\ & \ddots & \\ 0 & & x_{iM} \end{bmatrix}_{(\sum_m d_m)\times M}.
\]
So, E(Zi Yi) = E(Zi Xi')δ.

Sample Analogue:
\[
g_n(w_i;\delta) \equiv
\begin{bmatrix} \frac1n\sum_{i=1}^n z_{i1} y_{i1} \\ \vdots \\ \frac1n\sum_{i=1}^n z_{iM} y_{iM} \end{bmatrix}
-
\begin{bmatrix} \frac1n\sum_{i=1}^n z_{i1} x_{i1}' & & 0 \\ & \ddots & \\ 0 & & \frac1n\sum_{i=1}^n z_{iM} x_{iM}' \end{bmatrix}
\begin{bmatrix} \delta_1 \\ \vdots \\ \delta_M \end{bmatrix}
\equiv s_{ZY} - S_{ZX}\,\delta = 0
\]

[25] We can uniquely determine all the coefficient vectors δ1, …, δM iff each coefficient vector δm is uniquely determined, which occurs iff Assumption 3.4 holds for each equation. The rank condition is simple here because there are no cross-equation restrictions. When coefficients are assumed to be the same across all equations we will have a different identification condition.

4 Special Features of M.E. GMM: We substitute these into Single Equation GMM and get the same results!
i. sZY is a stacked vector
ii. SZX is a block diagonal matrix
iii. By ii, Wn is a (Σm Km) x (Σm Km) matrix
iv. The sample mean of gi is a stacked vector:
\[
\bar g \equiv \frac1n\sum_{i=1}^n g_i =
\begin{bmatrix} \frac1n\sum_{i=1}^n z_{i1}\varepsilon_{i1} \\ \vdots \\ \frac1n\sum_{i=1}^n z_{iM}\varepsilon_{iM} \end{bmatrix}
= g_n(\delta)
\]

2. Applied to Linear Model: Our model is yi = xi'δ + εi with the moment condition E[(yi – xi'δ)·zi] = 0
o Expression for the sample moment condition in the linear model: gn(wi, b) = sZY − SZX b, with sZY (Σm Km) x 1, SZX (Σm Km) x (Σm dm), and b (Σm dm) x 1.
o Method of Moments: If every equation is exactly identified (Km = dm for all m) and Σ_ZX is invertible, then SZX →p Σ_ZX by the ergodic theorem, so SZX is invertible with probability approaching 1 for sufficiently large n. So, with a large sample the system of simultaneous equations gn(wi, b) = 0 has a unique solution given by the MM estimator (CHECK THIS!!!)
\[
\hat\delta_{IV} = S_{ZX}^{-1}\, s_{ZY}
\]
(IV estimator with zi as instruments, defined for Km = dm) [26]
o Generalized Method of Moments: If the equation is overidentified (Km > dm) and there does not exist a b such that gn(wi, b) = 0 exactly, we choose δ̂(Wn) to make gn(wi, b) as close to 0 as possible, i.e. to minimize the quadratic form gn(wi, b)'Wn gn(wi, b) (for some choice of p.d. weighting matrix that converges in probability to some p.d.
W) [27]
\[
\delta_{GMM} = \arg\max_{b}\; -\big[E\,g_i(w_i,b)\big]'\,W\,\big[E\,g_i(w_i,b)\big]
= \arg\max_{b}\; -\big[E(Z_i(Y_i - X_i'b))\big]'\,W\,\big[E(Z_i(Y_i - X_i'b))\big]
\]
\[
\hat\delta_{GMM} = \arg\max_{b}\; -\Big[\frac1n\sum_{i=1}^n g_i(w_i,b)\Big]'\,W_n\,\Big[\frac1n\sum_{i=1}^n g_i(w_i,b)\Big]
\]
\[
\delta_{GMM} = \big[E(Z_iX_i')'\,W\,E(Z_iX_i')\big]^{-1} E(Z_iX_i')'\,W\,E(Z_iY_i)
\]
\[
\hat\delta_{GMM} = \Big\{\Big[\frac1n\sum_{i=1}^n Z_iX_i'\Big]' W_n \Big[\frac1n\sum_{i=1}^n Z_iX_i'\Big]\Big\}^{-1}
\Big[\frac1n\sum_{i=1}^n Z_iX_i'\Big]' W_n \Big[\frac1n\sum_{i=1}^n Z_iY_i\Big]
\]

[26] The IV estimator is defined for the EXACTLY IDENTIFIED case (i.e. the case where there are as many instruments as regressors, counting predetermined regressors as their own instruments). If zi = xi, i.e. all the regressors are predetermined/orthogonal to the contemporaneous error term, then this boils down to the OLS estimator. So, OLS is a special case of the MM estimator, and IV and OLS are both special cases of GMM.

[27] The quadratic form gn(wi, b)'Wn gn(wi, b) gives us a 1x1 real number over which we can define the minimization/maximization problem. Otherwise it would be impossible to minimize over the vector of moment conditions gn(wi, b).

3. ME GMM Estimator and Sampling Error
o GMM Estimator: δ̂_GMM(Wn) = (SZX'Wn SZX)^-1 SZX'Wn sZY [28]
(By 4.2 and 4.4, SZX has full column rank for sufficiently large n with probability approaching 1, so SZX'Wn SZX is invertible)
Note: If Km = dm for all m, then SZX is a square matrix and the GMM estimator reduces to the IV estimator (SZX)^-1 sZY for any choice of Wn, since (SZX'Wn SZX)^-1 SZX'Wn = SZX^-1 when SZX is square and invertible.
o Sampling Error: δ̂_GMM(Wn) − δ = (SZX'Wn SZX)^-1 SZX'Wn sZε, where sZε = (1/n) Σi Zi εi is the sample mean of gi.

[28] Written out in blocks, with the shorthand $S_{x_m z_m} \equiv \frac1n\sum_i x_{im} z_{im}'$ (dm x Km), $S_{z_m x_m} = S_{x_m z_m}'$, and $s_{z_m y_m} \equiv \frac1n\sum_i z_{im} y_{im}$ (Km x 1):
\[
\hat\delta_{GMM}(\hat W) = \begin{bmatrix} \hat\delta_1(\hat W) \\ \vdots \\ \hat\delta_M(\hat W) \end{bmatrix}
= \big(S_{ZX}'\hat W S_{ZX}\big)^{-1} S_{ZX}'\hat W s_{ZY}
\]
\[
= \begin{bmatrix}
S_{x_1 z_1}\hat W_{11} S_{z_1 x_1} & \cdots & S_{x_1 z_1}\hat W_{1M} S_{z_M x_M} \\
\vdots & & \vdots \\
S_{x_M z_M}\hat W_{M1} S_{z_1 x_1} & \cdots & S_{x_M z_M}\hat W_{MM} S_{z_M x_M}
\end{bmatrix}^{-1}
\begin{bmatrix}
S_{x_1 z_1}\hat W_{11}\, s_{z_1 y_1} + \cdots + S_{x_1 z_1}\hat W_{1M}\, s_{z_M y_M} \\
\vdots \\
S_{x_M z_M}\hat W_{M1}\, s_{z_1 y_1} + \cdots + S_{x_M z_M}\hat W_{MM}\, s_{z_M y_M}
\end{bmatrix}
\]
where Ŵmh (Km x Kh) is the (m,h) block of Wn.
Recall, for block matrices:
\[
\text{1.}\quad
\begin{bmatrix} A_{11}' & & \\ & \ddots & \\ & & A_{MM}' \end{bmatrix}
\begin{bmatrix} B_{11} & \cdots & B_{1M} \\ \vdots & & \vdots \\ B_{M1} & \cdots & B_{MM} \end{bmatrix}
\begin{bmatrix} A_{11} & & \\ & \ddots & \\ & & A_{MM} \end{bmatrix}
=
\begin{bmatrix} A_{11}'B_{11}A_{11} & \cdots & A_{11}'B_{1M}A_{MM} \\ \vdots & & \vdots \\ A_{MM}'B_{M1}A_{11} & \cdots & A_{MM}'B_{MM}A_{MM} \end{bmatrix}
\]
\[
\text{2.}\quad
\begin{bmatrix} A_{11}' & & \\ & \ddots & \\ & & A_{MM}' \end{bmatrix}
\begin{bmatrix} B_{11} & \cdots & B_{1M} \\ \vdots & & \vdots \\ B_{M1} & \cdots & B_{MM} \end{bmatrix}
\begin{bmatrix} c_1 \\ \vdots \\ c_M \end{bmatrix}
=
\begin{bmatrix} A_{11}'B_{11}c_1 + \cdots + A_{11}'B_{1M}c_M \\ \vdots \\ A_{MM}'B_{M1}c_1 + \cdots + A_{MM}'B_{MM}c_M \end{bmatrix}
\]

III. Large Sample Theory: As seen above, all the theory/formulas for multiple-equation GMM are a matter of substituting the newly defined matrices into the single-equation formulas. (See summary)

IV. GMM Summary
Population Orthogonality Conditions: $g(w_i,\delta) \equiv E(Z_iY_i) - E(Z_iX_i')\delta = \sigma_{ZY} - \Sigma_{ZX}\delta = 0$
Sample Analogue of Orthogonality Conditions: $g_n(w_i,\delta) \equiv s_{ZY} - S_{ZX}\delta = 0$
IV Estimator (# moment conditions = # of parameters): $\hat\delta_{IV} \equiv S_{ZX}^{-1} s_{ZY}$, since SZX is square and invertible
GMM Estimator (# moment conditions > # of parameters → can't set gn exactly to 0): $\hat\delta_{GMM}(W_n) = (S_{ZX}'W_nS_{ZX})^{-1}S_{ZX}'W_n s_{ZY}$
GMM Estimator's Sampling Error: $\hat\delta_{GMM}(W_n) - \delta = (S_{ZX}'W_nS_{ZX})^{-1}S_{ZX}'W_n s_{Z\varepsilon}$
GMM Estimator's Asymptotic Variance (Under Asymptotic Normality): $Avar(\hat\delta_{GMM}(W_n)) = (\Sigma_{ZX}'W\Sigma_{ZX})^{-1}\,\Sigma_{ZX}'W S W\Sigma_{ZX}\,(\Sigma_{ZX}'W\Sigma_{ZX})^{-1}$
Consistent Estimator of GMM Estimator's Asymptotic Variance: $\widehat{Avar}(\hat\delta_{GMM}) \equiv (S_{ZX}'W_nS_{ZX})^{-1}\,S_{ZX}'W_n \hat S W_n S_{ZX}\,(S_{ZX}'W_nS_{ZX})^{-1}$
Optimal/Efficient Weighting Matrix: $\hat S^{-1}$ s.t. $\hat S^{-1} \to_p S^{-1} = E(g_ig_i')^{-1}$
Efficient GMM Estimator: $\hat\delta_{GMM}(\hat S^{-1}) = (S_{ZX}'\hat S^{-1}S_{ZX})^{-1}S_{ZX}'\hat S^{-1}s_{ZY}$
Asymptotic (Minimum) Variance of the Efficient/Optimal GMM: $Avar(\hat\delta_{GMM}(\hat S^{-1})) = (\Sigma_{ZX}'S^{-1}\Sigma_{ZX})^{-1} = (\Sigma_{ZX}'E(g_ig_i')^{-1}\Sigma_{ZX})^{-1}$
Consistent Estimator of the Asymptotic (Minimum) Variance of the Efficient/Optimal GMM: $\widehat{Avar}(\hat\delta_{GMM}(\hat S^{-1})) \equiv (S_{ZX}'\hat S^{-1}S_{ZX})^{-1}$
J-Statistic / Objective Function evaluated at Optimal Weighting Matrix: $J(\hat\delta_{GMM}(\hat S^{-1}), \hat S^{-1}) = n\cdot g_n(\hat\delta_{GMM}(\hat S^{-1}))'\,\hat S^{-1}\,g_n(\hat\delta_{GMM}(\hat S^{-1}))$

Definitions and Properties of Single Equation vs. Multiple Equation Estimators:

gi
  Single-Equation GMM: zi·εi (K x 1)
  Multiple-Equation GMM: $g_i = [z_{i1}\varepsilon_{i1}, \dots, z_{iM}\varepsilon_{iM}]'$, $(\sum_m K_m)\times 1$
δ
  Single-Equation GMM: δ (d x 1)
  Multiple-Equation GMM: $\delta = [\delta_1', \dots, \delta_M']'$, $(\sum_m d_m)\times 1$
sZY
  Single-Equation GMM: $\frac1n\sum_{i=1}^n z_iy_i$
  Multiple-Equation GMM: $\big[\frac1n\sum_i z_{i1}y_{i1};\ \dots;\ \frac1n\sum_i z_{iM}y_{iM}\big]$ (stacked)
SZX
  Single-Equation GMM: $\frac1n\sum_{i=1}^n z_ix_i'$
  Multiple-Equation GMM: block diagonal with m-th block $\frac1n\sum_i z_{im}x_{im}'$, size $(\sum_m K_m)\times(\sum_m d_m)$
Size of W
  Single-Equation GMM: K x K
  Multiple-Equation GMM: $(\sum_m K_m)\times(\sum_m K_m)$
ΣZX
  Single-Equation GMM: E(zi xi')
  Multiple-Equation GMM: block diagonal with m-th block $E(z_{im}x_{im}')$, size $(\sum_m K_m)\times(\sum_m d_m)$
S = Avar(ḡ) = E(gi gi')
  Single-Equation GMM: $E(\varepsilon_i^2 z_iz_i')$
  Multiple-Equation GMM: $(\sum_m K_m)\times(\sum_m K_m)$ matrix with (m,h) block $E(\varepsilon_{im}\varepsilon_{ih}z_{im}z_{ih}')$
Ŝ (consistent estimator of S)
  Single-Equation GMM: $\frac1n\sum_{i=1}^n \hat\varepsilon_i^2 z_iz_i'$
  Multiple-Equation GMM: $(\sum_m K_m)\times(\sum_m K_m)$ matrix with (m,h) block $\frac1n\sum_{i=1}^n \hat\varepsilon_{im}\hat\varepsilon_{ih}z_{im}z_{ih}'$
Estimator consistent under assumptions…
  Single-Equation GMM: 3.1 – 3.4
  Multiple-Equation GMM: 4.1 – 4.4
Estimator asymptotically normal under assumptions…
  Single-Equation GMM: 3.1 – 3.4 + 3.5
  Multiple-Equation GMM: 4.1 – 4.4 + 4.5
Ŝ →p S under assumptions…
  Single-Equation GMM: 3.1, 3.2, 3.6, E(gi gi') finite
  Multiple-Equation GMM: 4.1, 4.2, 4.6, E(gi gi') finite
D.F. of J-Statistic
  Single-Equation GMM: K − d
  Multiple-Equation GMM: Σm (Km − dm)

V. Single Equation vs. Multiple Equation Estimation

A. Equation by Equation GMM vs. Joint Estimation: An alternative to joint estimation is to apply single-equation GMM separately to each equation.
• The equation-by-equation estimator is a particular M.E. GMM estimator with a particular choice of Wn
Estimate the M equations via SE GMM s.t.
the weighting matrix for the m-th equation is Ŵmm (Km x Km). Then, if we stack the equation-by-equation GMM estimators, the result can be written as an M.E. GMM estimator with the block diagonal weighting matrix whose m-th diagonal block is Ŵmm:
\[
\hat W = \begin{bmatrix} \hat W_{11} & & \\ & \ddots & \\ & & \hat W_{MM} \end{bmatrix}
\]
(this works because SZX and Ŵ are both block diagonal)
• When are they Equivalent
i. If all equations are just identified, then equation-by-equation GMM and multiple-equation GMM are numerically the same and equal to the IV estimator (regardless of the weighting matrix!)
ii. If at least one equation is overidentified but the equations are "unrelated" in the sense that E(εim εih zim zih') = 0 for all m ≠ h, then the efficient equation-by-equation GMM and the efficient multiple-equation GMM are asymptotically equivalent in that $\sqrt{n}\,\big(\hat\delta(\hat W) - \hat\delta(\hat S^{-1})\big) \to_p 0$.
• Joint Estimation Can Be Hazardous
Except for the above cases, joint estimation is asymptotically more efficient since it takes advantage of cross-equation correlations. Even if you are interested in estimating 1 particular equation, you can generally gain asymptotic efficiency by combining it with some other equations (though there are special cases where joint estimation entails no efficiency gain even if the added equations are not unrelated).
Caveats:
i. Small Sample Properties: Small-sample properties of the estimator for the equation of interest might be better without joint estimation.
ii. Misspecification: The asymptotic result presumes that the model is correctly specified, i.e. that the model assumptions are satisfied. If the model is misspecified, neither single-equation GMM nor multiple-equation GMM is guaranteed to be even consistent. Furthermore, the chances of misspecification increase as you add equations to the system. (see p272 for an example)

VI. Special Cases of Multiple Equation GMM under Conditional Homoskedasticity: FIVE, 3SLS, and SUR

0.
S under conditional homoskedasticity: [29]
\[
S = E(g_ig_i') =
\begin{bmatrix}
E(\varepsilon_{i1}\varepsilon_{i1}z_{i1}z_{i1}') & \cdots & E(\varepsilon_{i1}\varepsilon_{iM}z_{i1}z_{iM}') \\
\vdots & & \vdots \\
E(\varepsilon_{iM}\varepsilon_{i1}z_{iM}z_{i1}') & \cdots & E(\varepsilon_{iM}\varepsilon_{iM}z_{iM}z_{iM}')
\end{bmatrix}
=
\begin{bmatrix}
\sigma_{11}E(z_{i1}z_{i1}') & \cdots & \sigma_{1M}E(z_{i1}z_{iM}') \\
\vdots & & \vdots \\
\sigma_{M1}E(z_{iM}z_{i1}') & \cdots & \sigma_{MM}E(z_{iM}z_{iM}')
\end{bmatrix}
= E(Z_i\,\Sigma\,Z_i')
\]
(with Zi the (Σm Km) x M block diagonal matrix of the zim)

[29] The (m,h) block of E(gi gi') = E(εim εih zim zih') = E(E(εim εih | zim, zih) zim zih') by the Law of Iterated Expectations and linearity of conditional expectations = E(σmh zim zih') by conditional homoskedasticity = σmh E(zim zih').

1. Full-Information Instrumental Variables Efficient (FIVE) Estimator
• Simplification of S: This estimator exploits the above structure of 4th moments by using the following consistent estimator Ŝ of S:
\[
\hat S =
\begin{bmatrix}
\hat\sigma_{11}\frac1n\sum_{i=1}^n z_{i1}z_{i1}' & \cdots & \hat\sigma_{1M}\frac1n\sum_{i=1}^n z_{i1}z_{iM}' \\
\vdots & & \vdots \\
\hat\sigma_{M1}\frac1n\sum_{i=1}^n z_{iM}z_{i1}' & \cdots & \hat\sigma_{MM}\frac1n\sum_{i=1}^n z_{iM}z_{iM}'
\end{bmatrix}
= \frac1n\sum_{i=1}^n Z_i\,\hat\Sigma\,Z_i',
\qquad
\hat\Sigma = \begin{bmatrix} \hat\sigma_{11} & \cdots & \hat\sigma_{1M} \\ \vdots & & \vdots \\ \hat\sigma_{M1} & \cdots & \hat\sigma_{MM} \end{bmatrix}
= \frac1n\sum_{i=1}^n \hat\varepsilon_i\hat\varepsilon_i' \quad [30]
\]
(δ̂m is usually the 2SLS estimator for equation m. So for the cross moments we need 2 2SLS estimators)
• Then, our ME efficient GMM estimator simplifies to:
\[
\hat\delta_{FIVE} \equiv \hat\delta(\hat S^{-1}) = \big(S_{ZX}'\hat S^{-1}S_{ZX}\big)^{-1}S_{ZX}'\hat S^{-1}s_{ZY}
= \big(XZ'(Z\hat\Sigma Z')^{-1}ZX'\big)^{-1}XZ'(Z\hat\Sigma Z')^{-1}ZY
\]
(the 1/n's cancel; here ZX', ZY, and ZΣ̂Z' are shorthand for the sums Σi Zi Xi', Σi Zi Yi, and Σi Zi Σ̂ Zi')

[30] $\hat\sigma_{mh} \equiv \frac1n\sum_{i=1}^n \hat\varepsilon_{im}\hat\varepsilon_{ih}$ (m, h = 1, 2, …, M), where $\hat\varepsilon_{im} \equiv y_{im} - x_{im}'\hat\delta_m$ for some consistent estimator δ̂m of δm. By Assumptions 4.1, 4.2, and the existence of E(zim zih'), σ̂mh →p σmh (Prop 4.1 p.269). By (joint) ergodic stationarity, $\frac1n\sum_{i=1}^n z_{im}z_{ih}' \to_p E(z_{im}z_{ih}')$, which exists and is finite by assumption. Therefore, Ŝ →p S without the finite 4th moment assumption.

Large Sample Properties of FIVE: (these follow from the large sample properties of M.E. GMM estimators)
Suppose Assumptions 4.1 – 4.5 and 4.7 hold. Suppose further that E(zim zih') exists and is finite for all m, h = 1, 2, …, M. Let S and Ŝ be defined as above. Then,
(a) Ŝ →p S
(b) δ̂_FIVE ≡ δ̂(Ŝ^-1) is consistent, asymptotically normal, and efficient with Avar(δ̂_FIVE) = (ΣZX'S^-1 ΣZX)^-1
(c) The estimated asymptotic variance $\widehat{Avar}(\hat\delta_{FIVE}) = (S_{ZX}'\hat S^{-1}S_{ZX})^{-1}$ is consistent for Avar(δ̂_FIVE)
(d) Sargan's Statistic / J Statistic: $J(\hat\delta_{FIVE}, \hat S^{-1}) \equiv n\cdot g_n(\hat\delta_{FIVE})'\,\hat S^{-1}\,g_n(\hat\delta_{FIVE}) \to_d \chi^2\big(\sum_m (K_m - d_m)\big)$

2. Three-Stage Least Squares (3SLS): When the set of instruments is the same across equations, FIVE can be simplified to 3SLS
• Simplification of gi, S, and Ŝ: If zi (= zi1 = zi2 = … = ziM) is the common set of instruments (for all M equations) with dimension K, then gi, S, and Ŝ can be written compactly using the Kronecker product [31] as follows:
\[
g_i = \begin{bmatrix} z_{i1}\varepsilon_{i1} \\ \vdots \\ z_{iM}\varepsilon_{iM} \end{bmatrix}
= \begin{bmatrix} z_i\varepsilon_{i1} \\ \vdots \\ z_i\varepsilon_{iM} \end{bmatrix}
= \varepsilon_i \otimes z_i \quad (MK \times 1) \quad \text{since instruments are common}
\]
\[
S = E(g_ig_i') =
\begin{bmatrix}
\sigma_{11}E(z_iz_i') & \cdots & \sigma_{1M}E(z_iz_i') \\
\vdots & & \vdots \\
\sigma_{M1}E(z_iz_i') & \cdots & \sigma_{MM}E(z_iz_i')
\end{bmatrix}
= \Sigma \otimes E(z_iz_i') \quad (MK \times MK),
\qquad
\Sigma \equiv [\sigma_{mh}] = E(\varepsilon_i\varepsilon_i'),\quad \varepsilon_i = (\varepsilon_{i1}, \dots, \varepsilon_{iM})'
\]
\[
S^{-1} = \Sigma^{-1} \otimes E(z_iz_i')^{-1}
\]
\[
\hat S = \hat\Sigma \otimes \Big(\frac1n\sum_{i=1}^n z_iz_i'\Big) = \hat\Sigma \otimes \frac1n Z'Z = \frac1n\big(\hat\Sigma \otimes Z'Z\big),
\qquad
\hat\Sigma = [\hat\sigma_{mh}] = \frac1n\sum_{i=1}^n \hat\varepsilon_i\hat\varepsilon_i' \quad [32]
\]
\[
\hat S^{-1} = \hat\Sigma^{-1} \otimes \Big(\frac1n\sum_{i=1}^n z_iz_i'\Big)^{-1} = n\big[\hat\Sigma^{-1} \otimes (Z'Z)^{-1}\big]
\]
• With the above simplifications, our M.E. GMM estimator becomes [33]
\[
\hat\delta_{3SLS} = \hat\delta_{FIVE}(\hat S^{-1}) = \big(S_{ZX}'\hat S^{-1}S_{ZX}\big)^{-1}S_{ZX}'\hat S^{-1}s_{ZY}
= \big[X'\big(\hat\Sigma^{-1} \otimes Z(Z'Z)^{-1}Z'\big)X\big]^{-1}X'\big(\hat\Sigma^{-1} \otimes Z(Z'Z)^{-1}Z'\big)Y
= \big[X'(\hat\Sigma^{-1} \otimes P_Z)X\big]^{-1}X'(\hat\Sigma^{-1} \otimes P_Z)Y
\]

[31] Recall: For general matrices A = [amn] (M x N) and B = [bkl] (K x L), and vectors a = (a1, …, aM)' (M x 1) and b = (b1, …, bN)' (N x 1), the Kronecker product is defined as:
\[
A \otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1N}B \\ \vdots & & \vdots \\ a_{M1}B & \cdots & a_{MN}B \end{bmatrix}_{MK \times NL},
\qquad
a \otimes b = \begin{bmatrix} a_1 b \\ \vdots \\ a_M b \end{bmatrix}_{MN \times 1}
\]
Useful Properties:
(A ⊗ B)(C ⊗ D) = AC ⊗ BD (provided that A and C are conformable and B and D are conformable)
(A ⊗ B)' = A' ⊗ B'
(A ⊗ B)^-1 = A^-1 ⊗ B^-1

[32] Just as before, $\hat\sigma_{mh} \equiv \frac1n\sum_{i=1}^n \hat\varepsilon_{im}\hat\varepsilon_{ih}$ (m, h = 1, 2, …, M), where $\hat\varepsilon_{im} \equiv y_{im} - x_{im}'\hat\delta_m$ for some consistent estimator δ̂m of δm. (δ̂m is usually the 2SLS estimator for equation m. So for the cross moments we need 2 2SLS estimators)

[33] This is because, with common instruments,
\[
S_{ZX} = \frac1n\sum_{i=1}^n Z_iX_i' = \frac1n
\begin{bmatrix} \sum_i z_ix_{i1}' & & 0 \\ & \ddots & \\ 0 & & \sum_i z_ix_{iM}' \end{bmatrix}
= \frac1n (I_M \otimes Z)'X,
\qquad
s_{ZY} = \frac1n (I_M \otimes Z)'Y,
\]
where
\[
Z = \begin{bmatrix} z_1' \\ \vdots \\ z_n' \end{bmatrix}_{n \times K},\quad
X = \begin{bmatrix} X_1 & & \\ & \ddots & \\ & & X_M \end{bmatrix}_{nM \times \sum_m d_m},\quad
X_m = \begin{bmatrix} x_{1m}' \\ \vdots \\ x_{nm}' \end{bmatrix}_{n \times d_m},\quad
Y = \begin{bmatrix} y_1 \\ \vdots \\ y_M \end{bmatrix}_{nM \times 1},\quad
y_m = \begin{bmatrix} y_{1m} \\ \vdots \\ y_{nm} \end{bmatrix}_{n \times 1}
\]
(Xm is the data matrix of the m-th equation's regressors and ym is the data vector of the m-th equation's dependent variable). Therefore
\[
\big(S_{ZX}'\hat S^{-1}S_{ZX}\big)^{-1}
= \Big(\tfrac1n X'(I_M \otimes Z)\,\big[n\,\hat\Sigma^{-1} \otimes (Z'Z)^{-1}\big]\,\tfrac1n (I_M \otimes Z)'X\Big)^{-1}
= n\Big(X'\big(\hat\Sigma^{-1} \otimes Z(Z'Z)^{-1}Z'\big)X\Big)^{-1}
\]
\[
S_{ZX}'\hat S^{-1}s_{ZY}
= \tfrac1n X'(I_M \otimes Z)\,\big[n\,\hat\Sigma^{-1} \otimes (Z'Z)^{-1}\big]\,\tfrac1n (I_M \otimes Z)'Y
= \tfrac1n X'\big(\hat\Sigma^{-1} \otimes Z(Z'Z)^{-1}Z'\big)Y
\]

Large Sample Properties of 3SLS: (SEE ANALYTICAL EXERCISES FOR CHAPTER 4)
Suppose Assumptions 4.1 – 4.5 and 4.7 hold, and suppose that there exists a common set of instruments for all equations (zim = zi). Suppose further that E(zi zi') exists and is finite. Let Σ̂ be the MxM matrix of estimated error cross moments calculated using 2SLS residuals. Then,
(a) δ̂_3SLS = δ̂_FIVE(Ŝ^-1) is consistent, asymptotically normal, and efficient with Avar(δ̂_3SLS) = (ΣZX'S^-1 ΣZX)^-1, where S = Σ ⊗ E(zi zi')
(b) The estimated asymptotic variance $\widehat{Avar}(\hat\delta_{3SLS}) = n\big(X'(\hat\Sigma^{-1} \otimes Z(Z'Z)^{-1}Z')X\big)^{-1}$ is consistent for Avar(δ̂_3SLS) (SEE 4.5.17)
(c) Sargan's Statistic / J Statistic: $J(\hat\delta_{3SLS}, \hat S^{-1}) \equiv n\cdot g_n(\hat\delta_{3SLS})'\,\hat S^{-1}\,g_n(\hat\delta_{3SLS}) \to_d \chi^2\big(MK - \sum_m d_m\big)$

3. Seemingly Unrelated Regressions: Suppose in addition to common instruments, zi = union of (xi1, …, xiM) (all regressors are instruments!)
• The SUR "cross orthogonality" condition is equivalent to E(xim·εih) = 0 (m, h = 1, 2, …, M)
Predetermined regressors satisfy cross orthogonalities: not only are the regressors predetermined in their own equation, E(xim·εim) = 0, but they are also predetermined in the other equations (so a regressor in any equation is an instrument for all equations). This simplification produces the SUR estimator → WHAT IS THE IMPORTANCE OF CROSS ORTHOG?
• With the above (S and g still defined the same), we get that (because each Xm lies in the column space of Z, so PZ Xm = Xm)
\[
\hat\delta_{SUR} = \hat\delta_{3SLS} = \hat\delta_{FIVE}(\hat S^{-1}) = \big(S_{ZX}'\hat S^{-1}S_{ZX}\big)^{-1}S_{ZX}'\hat S^{-1}s_{ZY}
= \big[X'(\hat\Sigma^{-1} \otimes I_n)X\big]^{-1}X'(\hat\Sigma^{-1} \otimes I_n)Y \quad [34]
\]

[34] From the 3SLS formulas,
\[
\big(S_{ZX}'\hat S^{-1}S_{ZX}\big)^{-1} = n\Big(X'\big(\hat\Sigma^{-1} \otimes Z(Z'Z)^{-1}Z'\big)X\Big)^{-1} = n\Big(X'(\hat\Sigma^{-1} \otimes P_Z)X\Big)^{-1} .
\]
Writing σ̂^{mh} for the (m,h) element of Σ̂^-1, the (m,h) block of X'(Σ̂^-1 ⊗ PZ)X is
\[
\hat\sigma^{mh}X_m'P_ZX_h
= \hat\sigma^{mh}X_m'P_Z'P_ZX_h \quad \text{(since } P_Z \text{ is symmetric and idempotent)}
= \hat\sigma^{mh}X_m'X_h \quad \text{(since } X_m \text{ is in the column space of } Z\text{: } P_Z \text{ projects onto the space spanned by the union of the x's)}
\]
which is the (m,h) block of X'(Σ̂^-1 ⊗ In)X. Hence X'(Σ̂^-1 ⊗ PZ)X = X'(Σ̂^-1 ⊗ In)X. Similarly,
\[
S_{ZX}'\hat S^{-1}s_{ZY} = \tfrac1n X'\big(\hat\Sigma^{-1} \otimes Z(Z'Z)^{-1}Z'\big)Y = \tfrac1n X'(\hat\Sigma^{-1} \otimes I_n)Y .
\]

Large Sample Properties of SUR: (SEE ANALYTICAL EXERCISES FOR CHAPTER 4)
Suppose Assumptions 4.1 – 4.5 and 4.7 hold, suppose that there exists a common set of instruments for all equations (zim = zi), and suppose zi = union of (xi1, …, xiM). Suppose further that E(zi zi') exists and is finite. Let Σ̂ be the MxM matrix of estimated error cross moments calculated using OLS residuals. Then,
(a) δ̂_SUR = δ̂_3SLS = δ̂_FIVE(Ŝ^-1) is consistent, asymptotically normal, and efficient with Avar(δ̂_SUR) = (ΣZX'S^-1 ΣZX)^-1, where S = Σ ⊗ E(zi zi')
(b) The estimated asymptotic variance $\widehat{Avar}(\hat\delta_{SUR}) = n\big(X'(\hat\Sigma^{-1} \otimes I_n)X\big)^{-1}$ is consistent for Avar(δ̂_SUR) (SEE 4.5.17)
(c) Sargan's Statistic / J Statistic: $J(\hat\delta_{SUR}, \hat S^{-1}) \equiv n\cdot g_n(\hat\delta_{SUR})'\,\hat S^{-1}\,g_n(\hat\delta_{SUR}) \to_d \chi^2\big(MK - \sum_m d_m\big)$

• 3SLS vs. SUR: The SUR estimator is a special case of 3SLS. In 3SLS, the initial (consistent) estimator used to calculate Σ̂ was 2SLS. But we know that 2SLS when the regressors are a subset of the instrument set (i.e. when the regressors are predetermined) is OLS (Ch. 3.8 Review Question 7). So for SUR the initial estimator is the OLS estimator.

• SUR vs. OLS: Since the regressors are predetermined, the system can also be estimated by equation-by-equation OLS. Then why SUR over OLS? The relationship between SUR and equation-by-equation OLS is strictly analogous to the relationship between M.E. GMM and equation-by-equation GMM: Under conditional homoskedasticity, the efficient M.E. GMM is FIVE, which is equal to SUR under the cross orthogonality condition. On the other hand, under conditional homoskedasticity, efficient single-equation GMM is 2SLS, which is OLS under cross orthogonality since it implies that the regressors are predetermined. 2 cases to consider:
(i) Each equation is just identified: Then, since the common instrument set is the union of all regressors, this is possible only if the regressors are the same for all equations, i.e. xim = zi for all m → Multivariate Regression → equation-by-equation OLS!
(ii) At least one equation is overidentified: Then, SUR is more efficient than equation-by-equation OLS, unless the equations are "unrelated" to each other in the sense that E(εim εih zi zi') = σmh E(zi zi') = 0 for all m ≠ h (the first equality is true by conditional homoskedasticity; recall that in this case ME GMM is asymptotically equivalent to SE GMM). Since E(zi zi') ≠ 0 (it is nonsingular whenever S is), σmh E(zi zi') = 0 iff σmh = 0 (cov of error terms = 0). Therefore, SUR is more efficient than OLS if σmh ≠ 0 for some pair (m,h) with m ≠ h, and they are asymptotically equivalent if σmh = 0 for all (m,h) with m ≠ h.

4. Multivariate Regression: Suppose in addition to common instruments, zi = union of (xi1, …, xiM), and all equations are just identified.
• This condition implies xim = zi for all m (same regressors for all equations, which are all exogenous) [35]
• Simplification of X: With the assumption, we get that X = IM ⊗ Z
• With the above (S and g still defined the same), we get that [36]
\[
\hat\delta_{MVR} = \hat\delta_{SUR} = \hat\delta_{3SLS} = \hat\delta_{FIVE}(\hat S^{-1}) = \big(S_{ZX}'\hat S^{-1}S_{ZX}\big)^{-1}S_{ZX}'\hat S^{-1}s_{ZY} = \big[I_M \otimes (Z'Z)^{-1}Z'\big]Y
\]
• Multivariate Regression Estimator = Equation-by-Equation OLS
Recall, when the equations are just identified, the ME GMM and efficient single-equation GMM are numerically equal to the IV estimator, and since the regressors are predetermined, the GMM estimator of the multivariate regression model is just equation-by-equation OLS.
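The collapse of the 3SLS formula to equation-by-equation OLS when every equation has the same regressors can be checked numerically. The sketch below is only an illustration on simulated data: the function name `three_sls`, the dimensions, and the data-generating values are hypothetical, and Y is stacked by equation as (y1', …, yM')' with X = IM ⊗ Z.

```python
import numpy as np

def three_sls(X, Z, Y, Sigma_hat):
    """[X'(Sigma^-1 kron P_Z) X]^-1 X'(Sigma^-1 kron P_Z) Y, stacked by equation."""
    P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)        # projection onto col(Z)
    W = np.kron(np.linalg.inv(Sigma_hat), P_Z)     # (nM, nM) weighting matrix
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)

# Hypothetical system: M = 2 equations sharing the same K = 3 regressors.
rng = np.random.default_rng(1)
n, K, M = 200, 3, 2
Z = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
Delta = rng.normal(size=(K, M))                    # column m holds delta_m
E = rng.multivariate_normal(np.zeros(M), [[1.0, 0.7], [0.7, 2.0]], size=n)
Ymat = Z @ Delta + E                               # (n, M), column m is y_m

# Equation-by-equation OLS, one column per equation.
ols = np.linalg.solve(Z.T @ Z, Z.T @ Ymat)

# Sigma_hat from OLS residuals, as in the SUR case.
resid = Ymat - Z @ ols
Sigma_hat = resid.T @ resid / n

X = np.kron(np.eye(M), Z)                          # X = I_M kron Z
Y = Ymat.T.reshape(-1)                             # (y_1', ..., y_M')'
mvr = three_sls(X, Z, Y, Sigma_hat)

# Stack the OLS coefficients in the same order: delta_1, then delta_2.
ols_stacked = ols.T.reshape(-1)
```

Here `mvr` and `ols_stacked` agree up to floating-point error for any positive definite `Sigma_hat`, which is the numerical counterpart of the statement that Σ̂ drops out of the multivariate regression estimator.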
• Multivariate Regression Interpretation of SUR: We can think of SUR model as a multivariate regression model with a priori exclusion restrictions. (i.e. a system with same regressors but with restrictions on certain coefficients) Multiple Equation GMM with Common Coefficients: We modify the ME GMM model to allow this restriction. We get RE and Pooled OLS. • Background: In many applications (e.g. panel data) we deal with a special case of ME GMM model where the number of regressors is the same across equations with the same coefficients. How do we apply ME GMM while imposing common coefficient restriction Random Effects Estimator and Pooled OLS 35 36 This is true bc zi instruments all m equations. So, if not, xim c zi for some m, then dim(zi) > dim(xim) and the mth equation is overidentified. ( X ' ( Σˆ ) ) ( I ⊗ Z ) = ( I ⊗ Z ')( Σˆ ⊗ I ) ( I ⊗ Z ) = ( Σˆ ⊗ Z ') ( I ⊗ Z ) = (Σˆ ( ⊗ I ) Y = ( I ⊗ Z ')( Σˆ ⊗ I ) Y = ( Σˆ ⊗ Z ' ) Y = ⎡⎢ X ' ( Σˆ ⊗ I ) X ⎤⎥ X ' ( Σˆ ⊗ I ) Y = ⎡⎢ Σˆ ⊗ Z ' Z ⎤⎥ ( Σˆ ⊗ Z ') Y = ( Σˆ ⊗ ( Z ' Z ) )( Σˆ ⊗ Z ') Y = ( I ⎣ ⎦ ⎣ ⎦ X ' Σˆ −1 ⊗ I n X = ( I M ⊗ Z ) ' Σˆ −1 ⊗ I n −1 ∴δˆMVR −1 ' M n −1 −1 n n M −1 M −1 ⊗ Z 'Z ) −1 n −1 −1 ' M M n −1 −1 −1 −1 −1 m ⊗ (Z ' Z ) −1 ) Z' Y Relationship between ME. Estimators Multiple Equation GMM FIVE 3SLS SUR Multivariate Regression Assumptions 4.1 – 4.5, 4.7 E(ximxih’) finite zim = zi for all m xim = zi for all m The Model Assumptions 4.1 – 4.6 Assumptions 4.1 – 4.5, 4.7 E(ximxih’) finite Note: 4.6 (finite 4th moments) not needed under homoskedasticity, 4.7 Assumptions 4.1 – 4.5, 4.7, E(ximxih’) finite zim = zi for all m (same set of instruments across equations) Assumptions 4.1 – 4.5, 4.7 E(ximxih’) finite zim = zi for all m zi = unionm xim (This is the cross-equation orthogonality: E(xim . 
εih) = 0 (m, h = 1,2,…,M) )

S
• General case (no conditional homoskedasticity): S is the block matrix whose (m,h) block is $E(\varepsilon_{im}\varepsilon_{ih} z_{im} z_{ih}')$, m, h = 1,…,M.
• Conditional homoskedasticity, equation-specific instruments: $S = [\sigma_{mh} E(z_{im} z_{ih}')]_{m,h} = E(Z_i' \Sigma Z_i)$.
• Conditional homoskedasticity, common instruments: $S = \Sigma \otimes E(z_i z_i')$, which is MK × MK.
• Equation-by-equation OLS: irrelevant.

Ŝ
• General case: the (m,h) block is $\frac{1}{n}\sum_{i=1}^{n} \hat\varepsilon_{im}\hat\varepsilon_{ih} z_{im} z_{ih}'$.
• Conditional homoskedasticity, equation-specific instruments: the sample analog of $E(Z_i' \Sigma Z_i)$ with $\hat\Sigma$ in place of $\Sigma$.
• Common instruments: $\hat S = \hat\Sigma \otimes \left(\frac{1}{n}\sum_{i=1}^{n} z_i z_i'\right) = \hat\Sigma \otimes \frac{1}{n} Z'Z$, with $\hat\Sigma$ from 2SLS residuals (3SLS) or from OLS residuals (SUR).
• Equation-by-equation OLS: irrelevant.

Estimator $\hat\delta(\hat S^{-1})$
• General formula, with Ŝ defined above: $(S_{ZX}' \hat S^{-1} S_{ZX})^{-1} S_{ZX}' \hat S^{-1} s_{ZY}$.
• Common instruments, $\hat\Sigma$ from 2SLS residuals (3SLS): $[X'(\hat\Sigma^{-1} \otimes P_Z) X]^{-1} X'(\hat\Sigma^{-1} \otimes P_Z) Y$, where $P_Z = Z(Z'Z)^{-1}Z'$.
• Common instruments, $\hat\Sigma$ from OLS residuals (SUR): $[X'(\hat\Sigma^{-1} \otimes I) X]^{-1} X'(\hat\Sigma^{-1} \otimes I) Y$.
• Equation-by-equation OLS: OLS formula.

Avar $\hat\delta(\hat S^{-1})$
• General formula, with S defined above: $(\Sigma_{ZX}' S^{-1} \Sigma_{ZX})^{-1}$.
• 3SLS: $n\,[X'(\Sigma^{-1} \otimes Z(Z'Z)^{-1}Z') X]^{-1}$.
• Equation-by-equation OLS: OLS formula.

Estimated Avar $\hat\delta(\hat S^{-1})$
• General formula, with Ŝ defined above: $(S_{ZX}' \hat S^{-1} S_{ZX})^{-1}$.
• 3SLS: $n\,[X'(\hat\Sigma^{-1} \otimes Z(Z'Z)^{-1}Z') X]^{-1}$.
• SUR: $n\,[X'(\hat\Sigma^{-1} \otimes I_n) X]^{-1}$.
• Equation-by-equation OLS: OLS formula.

Note (notation):
$$Z = \begin{bmatrix} z_1' \\ \vdots \\ z_n' \end{bmatrix}_{n \times K} = \begin{bmatrix} z_{11} & \cdots & z_{1K} \\ \vdots & & \vdots \\ z_{n1} & \cdots & z_{nK} \end{bmatrix}, \qquad X = \begin{bmatrix} X_1 & & \\ & \ddots & \\ & & X_M \end{bmatrix}_{nM \times \sum_m D_m}, \qquad X_m = \begin{bmatrix} x_{1m}' \\ \vdots \\ x_{nm}' \end{bmatrix}_{n \times D_m}$$
$$\delta = \begin{bmatrix} \delta_1 \\ \vdots \\ \delta_M \end{bmatrix}_{\sum_m D_m \times 1}, \quad Y = \begin{bmatrix} y_1 \\ \vdots \\ y_M \end{bmatrix}_{nM \times 1}, \quad y_m = \begin{bmatrix} y_{1m} \\ \vdots \\ y_{nm} \end{bmatrix}_{n \times 1}, \quad \varepsilon_m = \begin{bmatrix} \varepsilon_{1m} \\ \vdots \\ \varepsilon_{nm} \end{bmatrix}_{n \times 1}, \quad \varepsilon = \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_M \end{bmatrix}_{nM \times 1}$$

VII. Simultaneous Equations, FIML (ML Counterpart to 3SLS) and LIML (ML Counterpart to 2SLS)

Background: Because we are going to estimate the simultaneous equations system (with the same instruments across all equations) by maximum likelihood, we assume iid observations and normality. But first, we need to “complete” the system (# of endogenous variables = # of equations).
• Recall, there are M linear equations we want to estimate: yim = xim’δm + εim (m = 1, 2, ..., M; i = 1,2,…,n)
• Common instruments are used across all equations (this is the 3SLS assumption)

1. Full Information Maximum Likelihood (ML estimation of 3SLS)

• Complete System of Simultaneous Equations
Setup: In order for the M equations of the system to form a “complete” system of simultaneous equations, we need:

A. # Endogenous Variables = # Equations: This implies that we can write the M-equation system in structural form
$$\Gamma_0 \, y_t + B_0 \, x_t = \varepsilon_t, \qquad \Gamma_0: M \times M, \; y_t: M \times 1, \; B_0: M \times K, \; x_t: K \times 1, \; \varepsilon_t: M \times 1$$
where yt is the vector of endogenous variables in the system and xt is the vector of exogenous variables in the system. 37
Note: It is not possible to estimate an incomplete system by FIML unless we complete it by adding appropriate equations. This is in contrast to GMM: an incomplete system can be estimated via 3SLS (or M.E. GMM if we don’t assume conditional homoskedasticity), as long as the rank condition for identification is satisfied.
Note: To complete the system, we can add equations involving instruments (endogenous variable and instruments).

B. The square matrix Γ0 (M × M) is nonsingular: This implies that the structural equations can be solved for the endogenous variables:
$$y_t = -\Gamma_0^{-1} B_0 x_t + \Gamma_0^{-1} \varepsilon_t \equiv \Pi_0' x_t + v_t$$

• Reduced Form is a Multivariate Regression Model
Def: The reduced form is $y_t = \Pi_0' x_t + v_t$, with $y_t$ (M × 1), $\Pi_0'$ (M × K), $x_t$ (K × 1).
Def: The coefficients in $\Pi_0'$ are called reduced-form coefficients.
Predetermined condition is satisfied: Since we have put all the predetermined variables in xt, by the assumption that they are predetermined, E(xt vt’) = 0. Therefore, we have a (ME GMM) multivariate regression model (same regressors/instruments for all equations).

• Estimating System of Simultaneous Equations by FIML
2 Assumptions:
A. The structural error vector εt is jointly normal conditional on xt: $\varepsilon_t \mid x_t \sim N_M(0, \Sigma_0)$
B. {yt, xt} is iid (to strengthen Assumptions 4.2 and 4.5)
From here, we obtain:
C. Distribution of the reduced-form error: $v_t \mid x_t = \Gamma_0^{-1} \varepsilon_t \mid x_t \sim N(0, \; \Gamma_0^{-1} \Sigma_0 (\Gamma_0^{-1})')$
Distribution of the endogenous yt: $y_t \mid x_t = -\Gamma_0^{-1} B_0 x_t + v_t \sim N(-\Gamma_0^{-1} B_0 x_t, \; \Gamma_0^{-1} \Sigma_0 (\Gamma_0^{-1})')$
D. Log-likelihood function for the sample (i.e. our objective function) (pg 531 - 532 Hayashi):
$$Q_n(\delta, \Sigma) = -\frac{M}{2}\log(2\pi) + \frac{1}{2}\log(|\Gamma|^2) - \frac{1}{2}\log(|\Sigma|) - \frac{1}{2n}\sum_{t=1}^{n} (\Gamma y_t + B x_t)' \Sigma^{-1} (\Gamma y_t + B x_t)$$
Therefore, the FIML estimate of (δ0, Σ0) (i.e. the coefficients in the B0 and Γ0 matrices and the variance of the errors) is the (δ, Σ) that maximizes the objective function.

37 How does this relate to instruments? If the system is not complete, then we can add equations that involve instruments that we take to be true, which are orthogonal to the error terms (see the LIML example later).

• Properties of FIML
A. Identification of the FIML estimator: The identification condition is equivalent to the rank condition being satisfied for each equation: $E(z_t x_{tm}')$ is of full column rank for all m = 1, 2, ..., M
B. Invariance: Since FIML is an ML estimator, the invariance property holds (see HW2 and HW3 on why this is useful)
C.
Asymptotic Properties of FIML
Consider the M-equation system: yim = xim’δm + εim (m = 1, 2, ..., M; i = 1,2,…,n)
Let δ0 be the stacked vector collecting all the coefficients in the M-equation system. Suppose the following assumptions are satisfied:
A1: The rank condition for identification is satisfied: $E(z_t x_{tm}')$ is of full column rank for all m = 1, 2, ..., M
A2: $E(z_t z_t')$ is nonsingular
A3: The M-equation system can be written as a complete system of simultaneous equations with Γ0 nonsingular
A4: $\varepsilon_t \mid x_t \sim N(0, \Sigma_0)$ for some positive definite Σ0
A5: {yt, xt} is iid
A6: The parameter space for (δ0, Σ0) is compact with the “true” parameter vector in the interior
Then,
(a) The FIML estimator $(\hat\delta, \hat\Sigma)$, which maximizes the objective function, is consistent and asymptotically normal.
(b) The asymptotic variance of $\hat\delta$ is $n\,[X'(\Sigma^{-1} \otimes Z(Z'Z)^{-1}Z') X]^{-1}$ (same as 3SLS). A consistent estimator of the asymptotic variance is $n\,[X'(\hat\Sigma^{-1} \otimes Z(Z'Z)^{-1}Z') X]^{-1}$ (same as 3SLS).
(c) The likelihood ratio statistic for testing overidentifying restrictions is asymptotically $\chi^2\left(KM - \sum_{m=1}^{M} L_m\right)$.
Furthermore, these asymptotic results hold even without the normality assumption.

• FIML vs. 3SLS: They are asymptotically equivalent. However, FIML has the invariance property, which 3SLS (and 2SLS) do not.

2. Limited Information Maximum Likelihood (ML Estimation of 2SLS)
Setup: The difference here is that we are estimating only 1 equation, instead of a whole system, and we have an endogeneity problem. The only “trick” is that we need to “complete” the system by adding 1 more equation relating the endogenous variable to the set of predetermined variables (just take something you know to be “true”).

Example: Completing the System for LIML
Suppose the population model of interest is Y1 = β0 + β1Y2 + ε and we want to estimate (β0, β1), where Y1 and Y2 are endogenous. In order to complete the system, suppose we KNOW the following relationship: Y2 = π0 + π1’Z + u.
Then, we can write the structural form of the system as:
$$\left.\begin{aligned} Y_1 &= \beta_0 + \beta_1 Y_2 + \varepsilon \\ Y_2 &= \pi_0 + \pi_1' Z + u \end{aligned}\right\} \;\Rightarrow\; \begin{bmatrix} 1 & -\beta_1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} \beta_0 & 0' \\ \pi_0 & \pi_1' \end{bmatrix} \begin{bmatrix} 1 \\ Z \end{bmatrix} + \begin{bmatrix} \varepsilon \\ u \end{bmatrix}$$
$$\Rightarrow\; \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} 1 & -\beta_1 \\ 0 & 1 \end{bmatrix}^{-1} \begin{bmatrix} \beta_0 & 0' \\ \pi_0 & \pi_1' \end{bmatrix} \begin{bmatrix} 1 \\ Z \end{bmatrix} + \begin{bmatrix} 1 & -\beta_1 \\ 0 & 1 \end{bmatrix}^{-1} \begin{bmatrix} \varepsilon \\ u \end{bmatrix}$$

LIML vs. 2SLS
They are both k-class estimators. 2SLS is the k-class estimator with k = 1 (p. 541). When the equation is just identified, the LIML value of k is also 1, so LIML = 2SLS numerically.
• LIML and 2SLS have the same asymptotic distribution, so we cannot prefer one over the other on asymptotic grounds
• In finite samples, LIML has the invariance property while 2SLS does not (which makes LIML more desirable)
• Other literature suggests that LIML should be preferred over 2SLS in finite samples

Example of LIML (271A PS2 Empirical Ex #2): Single Equation System, Completed by Equation Given
Suppose the population model of interest is: Y1 = β0 + β1Y2 + ε
We suspect endogeneity, and we complete the system with: Y2 = π0 + π1Z + u
such that
$$E\left(\begin{bmatrix} 1 \\ Z \end{bmatrix} \varepsilon\right) = 0, \qquad \begin{bmatrix} \varepsilon \\ u \end{bmatrix} \Big|\, Z \sim N(0, \Sigma) \;\text{ where }\; \Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix},$$
and we observe an iid sample of (Y1, Y2, Z).

Derive LIML Estimator (Use Invariance Property!):
Y1 = β0 + β1Y2 + ε ⇒ Y1 = β0 + β1(π0 + π1Z + u) + ε = (β0 + β1π0) + β1π1Z + (β1u + ε) = α0 + α1Z + v,
where α0 = β0 + β1π0, α1 = β1π1 and v = β1u + ε.
E(Zv) = E(Z(β1u + ε)) = β1E(Zu) + E(Zε) = 0
So the orthogonality condition holds. E(ZZ’) is assumed to be invertible. We also have iid observations and Gaussian errors, so we can estimate this equation consistently by OLS → and OLS here is the same as the ML estimate (under the Gaussian errors assumption).
Similarly, we can estimate the second equation consistently via OLS to obtain the ML estimate (under Gaussian errors).
We obtain the ML estimates $\hat\alpha_0, \hat\alpha_1, \hat\pi_0, \hat\pi_1$ from OLS.
We also know α1 = β1π1 ⇒ by the invariance property of MLE, $\hat\beta_1 = \hat\alpha_1 / \hat\pi_1$
Similarly, α0 = β0 + β1π0 ⇒ by the invariance property of MLE, $\hat\beta_0 = \hat\alpha_0 - \hat\beta_1 \hat\pi_0$
$(\hat\beta_0, \hat\beta_1)$ are the LIML estimators!

Example of FIML (271A PS3 #5): Multiple (2) Equation System, Completed by Equation Given
The structural model is:
Y1 = γ12Y2 + γ13Y3 + δ13Z3 + δ14Z4 + u1
Y2 = γ22Y1 + δ21Z1 + u2
Y3 = δ31Z1 + δ32Z2 + δ33Z3 + δ34Z4 + u3
where Z1 = 1, E(us) = 0 for s = 1, 2, 3 and E(Zj us) = 0 for j = 1,…,4, s = 1, 2, 3
Assume in addition that δ13 + δ14 = 1

1. 3 Cases of Endogeneity (Simultaneity, Errors in Variables/Measurement Error, Omitted Variables), Examples.

A. Simultaneity Example: (Working’s) Simultaneous Equations Model for Market Equilibrium
Setup: The “true” relationship between demand and supply of coffee is modeled as follows
Demand Equation: qid = α0 + α1 pi + ui (ui represents factors that influence coffee demand other than price)
Supply Equation: qis = β0 + β1 pi + vi (vi represents factors that influence coffee supply other than price)
Market Equilibrium: qid = qis
Note: We assume E(ui) = 0 and E(vi) = 0 (if not, include the nonzero means in the intercepts)

Endogeneity: Here, the regressor pi is endogenous/not predetermined, i.e. not orthogonal to the (contemporaneous) error term, and therefore does not satisfy the orthogonality conditions
E(pi . ui) = 0 ⇔ cov(pi,ui) = 0 and E(pi . vi) = 0 ⇔ cov(pi,vi) = 0
The endogeneity in this example arises from the fact that price is a function of both error terms ui and vi, which is a result of market equilibrium. 38 Therefore, cov(pi,ui) = 0 and cov(pi,vi) = 0 iff Var(ui) = 0 and Var(vi) = 0 respectively. Not possible (except in the extreme case when, for example, there are no other factors that shift demand, so ui = 0)!

Problem with Endogeneity: Endogeneity Bias
When we regress observed quantity on a constant and price, we estimate neither the demand curve nor the supply curve, because price is endogenous in both equations.
Recall that the OLS estimator is consistent for the least squares projection coefficients: in this case, the least squares projection of (true) qi on a constant and (true) pi gives a coefficient on pi of Cov(pi,qi)/Var(pi).
Suppose we observe {qi, pi} and we regress qi on a constant and pi. What is it that we estimate?

OLS estimate of the price coefficient, read as an estimate of α1 (from the demand equation):
$$\hat\alpha_1 \xrightarrow{\;p\;} \frac{Cov(p_i, q_i)}{Var(p_i)} = \frac{Cov(p_i, \alpha_0 + \alpha_1 p_i + u_i)}{Var(p_i)} = \frac{\alpha_1 Var(p_i) + Cov(u_i, p_i)}{Var(p_i)} = \alpha_1 + \frac{Cov(u_i, p_i)}{Var(p_i)}$$
Asymptotic Bias = Cov(ui, pi) / Var(pi)

OLS estimate of the price coefficient, read as an estimate of β1 (from the supply equation):
$$\hat\beta_1 \xrightarrow{\;p\;} \frac{Cov(p_i, q_i)}{Var(p_i)} = \frac{Cov(p_i, \beta_0 + \beta_1 p_i + v_i)}{Var(p_i)} = \frac{\beta_1 Var(p_i) + Cov(v_i, p_i)}{Var(p_i)} = \beta_1 + \frac{Cov(v_i, p_i)}{Var(p_i)}$$
Asymptotic Bias = Cov(vi, pi) / Var(pi)

Since Cov(pi, ui) ≠ 0 and Cov(pi, vi) ≠ 0, endogeneity bias/simultaneous equation bias/simultaneity bias exists (because the regressor and the error term are related to each other through a system of simultaneous equations). So the OLS estimator is not consistent for either α1 or β1.

Solution: Instrumental Variables and 2 Stage Least Squares
The reason neither the demand curve nor the supply curve can be consistently estimated is that we cannot infer from the data whether the observed changes in price and quantity are due to a shift in demand or a shift in supply. Therefore, we might be able to estimate demand/supply if some of the factors that shift the supply/demand curve are observable.
Def: A predetermined variable (predetermined in the system) that is correlated with the endogenous regressor is called an instrumental variable or instrument. Sometimes we call it a valid instrument to emphasize that the correlation with the endogenous regressor is not 0.
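The asymptotic bias expressions above can be checked numerically. The following is a minimal simulation sketch, not part of the original notes: the parameter values, the normal distributions and the helper names `simulate_market` and `ols_slope` are all illustrative assumptions, and the shifters are drawn with Cov(u, v) = 0 as in footnote 38.

```python
import numpy as np

# Illustrative parameter values (assumptions, not taken from the notes).
ALPHA0, ALPHA1 = 10.0, -1.0   # demand intercept and slope
BETA0, BETA1 = 2.0, 1.0       # supply intercept and slope
SD_U, SD_V = 1.0, 2.0         # std. dev. of demand shifter u and supply shifter v


def simulate_market(n, seed=0):
    """Draw equilibrium (p_i, q_i) from the demand/supply system with Cov(u, v) = 0."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, SD_U, n)
    v = rng.normal(0.0, SD_V, n)
    # Equilibrium price from q^d = q^s (see footnote 38).
    p = ((BETA0 - ALPHA0) + (v - u)) / (ALPHA1 - BETA1)
    q = ALPHA0 + ALPHA1 * p + u
    return p, q


def ols_slope(x, y):
    """Slope from regressing y on a constant and x: sample Cov(x, y) / Var(x)."""
    c = np.cov(x, y)
    return c[0, 1] / c[0, 0]


p, q = simulate_market(200_000)
slope = ols_slope(p, q)

# Population quantities implied by the equilibrium price equation.
var_u, var_v = SD_U ** 2, SD_V ** 2
var_p = (var_u + var_v) / (ALPHA1 - BETA1) ** 2
cov_up = -var_u / (ALPHA1 - BETA1)
cov_vp = var_v / (ALPHA1 - BETA1)

plim_demand = ALPHA1 + cov_up / var_p   # alpha1 + asymptotic bias
plim_supply = BETA1 + cov_vp / var_p    # beta1 + asymptotic bias

print("OLS slope:", slope)
print("plim implied by demand-side formula:", plim_demand)
print("plim implied by supply-side formula:", plim_supply)
```

Both bias formulas describe the same probability limit, Cov(pi,qi)/Var(pi), which under these assumptions is a variance-weighted mix of α1 and β1 and equals neither of them.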
Observable Supply Shifters: Given “appropriate” observable supply shifters (instruments), we can estimate demand and supply!
Suppose the supply shifter vi can be divided into an observable factor xi and an unobservable factor ξi with Cov(xi, ξi) = 0. 39

38 To see the endogeneity, treat the 3 equations as a system of simultaneous equations and solve for pi and qi:
$$q_i^d = q_i^s \Rightarrow \alpha_0 + \alpha_1 p_i + u_i = \beta_0 + \beta_1 p_i + v_i \Rightarrow (\alpha_1 - \beta_1) p_i = (\beta_0 - \alpha_0) + (v_i - u_i) \Rightarrow p_i = \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1} + \frac{v_i - u_i}{\alpha_1 - \beta_1}$$
So,
$$Cov(p_i, u_i) = Cov\left(\frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1} + \frac{v_i - u_i}{\alpha_1 - \beta_1}, \; u_i\right) = \frac{1}{\alpha_1 - \beta_1} Cov(v_i - u_i, u_i) = \frac{1}{\alpha_1 - \beta_1}\left(Cov(v_i, u_i) - Var(u_i)\right) = -\frac{Var(u_i)}{\alpha_1 - \beta_1}$$
(since Cov(vi, ui) = 0 by assumption), and
$$Cov(p_i, v_i) = Cov\left(\frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1} + \frac{v_i - u_i}{\alpha_1 - \beta_1}, \; v_i\right) = \frac{1}{\alpha_1 - \beta_1} Cov(v_i - u_i, v_i) = \frac{1}{\alpha_1 - \beta_1}\left(Var(v_i) - Cov(v_i, u_i)\right) = \frac{Var(v_i)}{\alpha_1 - \beta_1}$$
(since Cov(vi, ui) = 0 by assumption).

→ Supply Equation: qis = β0 + β1 pi + β2 xi + ξi
Suppose further that the observed supply shifter xi is predetermined in the demand equation, i.e. uncorrelated with the error term ui (e.g. think of xi as the temperature in coffee growing regions).
If the temperature (xi) is uncorrelated with the unobserved factors that shift demand (ui), i.e. temperature (xi) is an instrument (for the demand equation), it is possible to extract from observed price movements a component that is related to the temperature (i.e. the observed supply shifter) but uncorrelated with the demand shifter. Then, we can estimate the demand curve by examining the relationship between coffee consumption and that component of price.
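To make the idea concrete, here is a small simulation sketch, assuming illustrative parameter values and independent standard normal draws for xi, ξi and ui (none of these choices come from the notes). It compares the OLS slope of qi on pi with the ratio of sample covariances Cov(qi, xi)/Cov(pi, xi), which uses only the price variation that moves with the observed supply shifter.

```python
import numpy as np

# Illustrative parameter values (assumptions, not taken from the notes).
alpha0, alpha1 = 10.0, -1.0              # demand: q = alpha0 + alpha1*p + u
beta0, beta1, beta2 = 2.0, 1.0, 1.5      # supply: q = beta0 + beta1*p + beta2*x + xi

rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(size=n)     # observed supply shifter (e.g. temperature)
xi = rng.normal(size=n)    # unobserved supply shifter, independent of x
u = rng.normal(size=n)     # demand shifter, independent of x

# Equilibrium price and quantity implied by q^d = q^s.
p = ((beta0 - alpha0) + beta2 * x + (xi - u)) / (alpha1 - beta1)
q = alpha0 + alpha1 * p + u


def sample_cov(a, b):
    """Sample covariance between two 1-D arrays."""
    return np.cov(a, b)[0, 1]


alpha1_ols = sample_cov(p, q) / sample_cov(p, p)   # inconsistent for alpha1
alpha1_iv = sample_cov(q, x) / sample_cov(p, x)    # uses x as the instrument

print("true alpha1:", alpha1)
print("OLS slope:  ", alpha1_ols)
print("IV estimate:", alpha1_iv)
```

The instrument only works here because β2 ≠ 0 in the simulated supply equation; with β2 = 0 the denominator Cov(pi, xi) would be zero in the population and the ratio would not be informative.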
IV Estimator for α1: We derive the IV estimator for α1 below.
We can re-express price as:
$$q_i^d = q_i^s \Rightarrow \alpha_0 + \alpha_1 p_i + u_i = \beta_0 + \beta_1 p_i + \beta_2 x_i + \xi_i \Rightarrow (\alpha_1 - \beta_1) p_i = (\beta_0 - \alpha_0) + \beta_2 x_i + (\xi_i - u_i)$$
$$\Rightarrow p_i = \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1} + \frac{\beta_2}{\alpha_1 - \beta_1} x_i + \frac{\xi_i - u_i}{\alpha_1 - \beta_1}$$
$$\therefore\; Cov(p_i, x_i) = \frac{\beta_2}{\alpha_1 - \beta_1} Var(x_i) + \frac{1}{\alpha_1 - \beta_1}\left(Cov(\xi_i, x_i) - Cov(u_i, x_i)\right) = \frac{\beta_2}{\alpha_1 - \beta_1} Var(x_i) \neq 0$$
since Cov(ξi, xi) = 0 by construction and Cov(ui, xi) = 0 by assumption (so xi is a valid instrument, provided β2 ≠ 0).

With a valid instrument, we can estimate the price coefficient α1 of the demand curve consistently:
$$Cov(q_i, x_i) = Cov(\alpha_0 + \alpha_1 p_i + u_i, \; x_i) = \alpha_1 Cov(p_i, x_i) + Cov(u_i, x_i) = \alpha_1 Cov(p_i, x_i)$$
since Cov(ui, xi) = 0 by assumption.
$$\therefore\; \alpha_1 = \frac{Cov(q_i, x_i)}{Cov(p_i, x_i)}$$
If we observe an iid sample (qi, pi, xi), then by the analogy principle, the natural (consistent) estimator is (we say the endogenous regressor pi is instrumented by xi):
$$\hat\alpha_{1,IV} = \frac{\text{sample cov between } x_i \text{ and } q_i}{\text{sample cov between } x_i \text{ and } p_i} = \frac{\sum_i x_i q_i}{\sum_i p_i x_i} = \frac{\sum_i x_i(\alpha_0 + \alpha_1 p_i + u_i)}{\sum_i p_i x_i} = \alpha_0 \frac{\sum_i x_i}{\sum_i p_i x_i} + \alpha_1 \frac{\sum_i p_i x_i}{\sum_i p_i x_i} + \frac{\sum_i u_i x_i}{\sum_i p_i x_i}$$ 40
The first term is zero when xi is measured in deviations from its sample mean (so that Σ xi = 0), the second term is α1, and the last term converges to Cov(ui, xi)/Cov(pi, xi) = 0.
→ The IV estimator is consistent for α1

An instrumental variable is one that is correlated with the independent variable but not with the error term. (In this and the following paragraphs, x denotes the regressor and z the instrument.) The estimator is
$$\hat\beta_{IV} = \frac{\sum_i z_i y_i}{\sum_i z_i x_i} = \beta + \frac{\sum_i z_i \varepsilon_i}{\sum_i z_i x_i}$$
When z and ε are uncorrelated, the final term vanishes in the limit, providing a consistent estimator. Note that when x is uncorrelated with the error term, x is itself an instrument. In that case the OLS estimator is a type of IV estimator.
The approach above generalizes in a straightforward way to a regression with multiple explanatory variables. Suppose X is the T x K matrix of explanatory variables resulting from T observations on K variables. Let Z be a T x K matrix of instruments.
Then,
$$\hat\beta_{IV} = (Z'X)^{-1} Z'y$$

39 This decomposition is always possible by the projection theorem. vi can be expressed as the projection onto the space spanned by xi plus the orthogonal complement (remember, vi includes all factors that affect supply, so by definition it has at least as many dimensions as xi). I.e., if the least squares projection of vi on a constant and xi is E*(vi | 1, xi) = γ0 + β2xi, define ξi = vi – γ0 – β2xi. By definition, ξi is orthogonal to xi and E(ξi) = 0, therefore ξi and xi are uncorrelated. Substituting this into the original supply equation and combining the intercept terms, we get the resulting expression.

40 Recall, the sample covariance can be expressed as
$$\sum_i (x_i - \bar x)(y_i - \bar y) = \sum_i (x_i y_i - \bar x y_i - x_i \bar y + \bar x \bar y) = \sum_i x_i y_i - \bar x \sum_i y_i - \bar y \sum_i x_i + n \bar x \bar y = \sum_i x_i y_i - n \bar x \bar y - n \bar x \bar y + n \bar x \bar y = \sum_i x_i y_i - n \bar x \bar y$$
Here, average of

One computational method often used for implementing the technique is two-stage least squares (2SLS). One advantage of this approach is that it can efficiently combine information from multiple instruments for over-identified regressions, where there are fewer covariates than instruments.
Under the 2SLS approach, in a first stage, each endogenous covariate (predictor variable) is regressed on all valid instruments, including the full set of exogenous covariates in the main regression. Since the instruments are exogenous, these approximations of the endogenous covariates will not be correlated with the error term. So, intuitively, they provide a way to analyze the relationship between the outcome variable and the endogenous covariates.
In the second stage, the regression of interest is estimated as usual, except that in this stage each endogenous covariate is replaced with its approximation estimated in the first stage. The slope estimator thus obtained is consistent. A small correction must be made to the sum of squared residuals in the second-stage fitted model in order that the associated standard errors be computed correctly.
Stage 1: $\hat X = Z(Z'Z)^{-1}Z'X = P_Z X$
Stage 2: $\hat\beta_{2SLS} = (\hat X'\hat X)^{-1} \hat X' y$
Mathematically, this estimator is identical to the single stage estimator presented above when the number of instruments is the same as the number of covariates.

Two-Stage Least Squares (2SLS) Estimator for α1: This is another procedure for consistently estimating α1, so named because the procedure consists of running 2 least squares (OLS) regressions.
• First Stage: The endogenous regressor pi is regressed on a constant and the predetermined variable xi to obtain fitted values $\hat p_i$. (The OLS coefficient on xi is the sample covariance between pi and xi divided by the sample variance of xi.)
• Second Stage: Regress the dependent variable qi on a constant and $\hat p_i$. (The OLS coefficient on $\hat p_i$ is the sample covariance between $\hat p_i$ and qi divided by the sample variance of $\hat p_i$.)
The 2nd stage estimates the equation (the bracketed term is the error):
$$q_i = \alpha_0 + \alpha_1 \hat p_i + [u_i + \alpha_1 (p_i - \hat p_i)]$$
The 2SLS estimator is consistent. Why? The fitted value $\hat p_i$ is a linear function of the instrument xi, which is uncorrelated with ui, and the first-stage residual $(p_i - \hat p_i)$ is orthogonal to $\hat p_i$ by construction of OLS. So the regressor $\hat p_i$ is asymptotically uncorrelated with the bracketed error term.
Here, the IV and 2SLS estimators are numerically the same (this holds whenever there are as many instruments as regressors; see later). Generally, the 2SLS estimator can be written as an IV estimator with an appropriate choice of instruments, and the IV estimator is a particular GMM estimator.

B. Errors-in-Variables/Measurement Error Example: This is the phenomenon that an otherwise predetermined regressor becomes endogenous when measured with error.

C. The point is, if you have endogeneity problems and you can find instruments that are not correlated with the error terms (i.e. they meet the orthogonality condition) but are correlated with the endogenous terms (and correlated with the dependent variable only through the endogenous terms), then we can find a consistent estimator for the underlying parameter of interest.
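The two-stage procedure for α1 described in part A can be sketched in matrix form. This is a minimal illustration under assumed parameter values and simulated data; the function names `two_stage_least_squares` and `iv_estimator` are hypothetical helpers written for this sketch, not routines from any library. It also checks the claim that 2SLS and the single-stage IV estimator coincide when there are as many instruments as regressors.

```python
import numpy as np


def two_stage_least_squares(y, X, Z):
    """2SLS of y on X with instrument matrix Z (each including a constant column)."""
    # Stage 1: regress every column of X on Z and keep the fitted values P_Z X.
    first_stage_coef, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ first_stage_coef
    # Stage 2: regress y on the fitted regressors.
    delta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    # Residuals for standard errors must use the actual X, not X_hat.
    resid = y - X @ delta
    return delta, resid


def iv_estimator(y, X, Z):
    """Single-stage IV estimator (Z'X)^{-1} Z'y; needs as many instruments as regressors."""
    return np.linalg.solve(Z.T @ X, Z.T @ y)


# Simulated coffee market (all parameter values are illustrative assumptions).
alpha0, alpha1 = 10.0, -1.0
beta0, beta1, beta2 = 2.0, 1.0, 1.5
rng = np.random.default_rng(2)
n = 50_000
x = rng.normal(size=n)     # observed supply shifter (instrument)
xi = rng.normal(size=n)    # unobserved supply shifter
u = rng.normal(size=n)     # demand shifter
p = ((beta0 - alpha0) + beta2 * x + (xi - u)) / (alpha1 - beta1)
q = alpha0 + alpha1 * p + u

ones = np.ones(n)
X = np.column_stack([ones, p])   # regressors: constant and endogenous price
Z = np.column_stack([ones, x])   # instruments: constant and supply shifter

delta_2sls, resid = two_stage_least_squares(q, X, Z)
delta_iv = iv_estimator(q, X, Z)

print("2SLS (alpha0, alpha1):", delta_2sls)
print("IV   (alpha0, alpha1):", delta_iv)
print("residual std. dev. using actual p:", resid.std())
```

Computing the residuals from the actual pi rather than from the fitted values is the "small correction" mentioned above: residuals from the second-stage regression itself would also contain α1(pi − p̂i) and would give the wrong error variance.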