On the finite-sample theory of exogeneity tests with possibly non-Gaussian errors and weak identification ∗ Firmin Doko Tchatoka† University of Tasmania Jean-Marie Dufour ‡ McGill University February 2013 ∗ The authors thank Atsushi Inoue, Marine Carrasco, Jan Kiviet and Benoit Perron for several useful comments. This work was supported by the William Dow Chair in Political Economy (McGill University), the Canada Research Chair Program (Chair in Econometrics, Université de Montréal), the Bank of Canada (Research Fellowship), a Guggenheim Fellowship, a Konrad-Adenauer Fellowship (Alexander-von-Humboldt Foundation, Germany), the Institut de finance mathématique de Montréal (IFM2), the Canadian Network of Centres of Excellence [program on Mathematics of Information Technology and Complex Systems (MITACS)], the Natural Sciences and Engineering Research Council of Canada, the Social Sciences and Humanities Research Council of Canada, the Fonds de recherche sur la société et la culture (Québec), and the Fonds de recherche sur la nature et les technologies (Québec), and a Killam Fellowship (Canada Council for the Arts). † School of Economics and Finance, University of Tasmania, Private Bag 85, Hobart TAS 7001, Tel: +613 6226 7226, Fax:+61 3 6226 7587; e-mail: Firmin.dokotchatoka@utas.edu.au. Homepage: http://www.fdokotchatoka.com ‡ William Dow Professor of Economics, McGill University, Centre interuniversitaire de recherche en analyse des organisations (CIRANO), and Centre interuniversitaire de recherche en économie quantitative (CIREQ). Mailing address: Department of Economics, McGill University, Leacock Building, Room 519, 855 Sherbrooke Street West, Montréal, Québec H3A 2T7, Canada. TEL: (1) 514 398 4400 ext. 09156; FAX: (1) 514 398 4800; e-mail: jeanmarie.dufour@mcgill.ca . Web page: http://www.jeanmariedufour.com ABSTRACT We investigate the finite-sample behaviour of the Durbin-Wu-Hausman (DWH) and RevankarHartley (RH) specification tests with or without identification. We consider two setups based on conditioning upon the fixed instruments and parametric assumptions on the distribution of the errors. Both setups are quite general and account for non-Gaussian errors. Except for a couple of Wu (1973) tests and the RH-test, finite-sample distributions are not available for the other statistics [including the most standard Hausman (1978) statistic] even when the errors are Gaussian. In this paper, we propose an analysis of the distributions of the statistics under both the null hypothesis (level) and the alternative hypothesis (power). We provide a general characterization of the distributions of the test statistics, which exhibits useful invariance properties and allows one to build exact tests even for non-Gaussian errors. Provided such finite-sample methods are used, the tests remain valid (level is controlled) whether the instruments are strong or weak. The characterization of the distributions of the statistics under the alternative hypothesis clearly exhibits the factors that determine power. We show that all tests have low power when all instruments are irrelevant (strict non-identification). But power does exist as soon as there is one strong instrument (despite the fact overall identification may fail). We present simulation evidence which confirms our finite-sample theory. Key words: Exogeneity tests; finite-sample; weak instruments; strict exogeneity; Cholesky error family; pivotal; identification-robust; exact Monte Carlo exogeneity tests. JEL classification: C3; C12; C15; C52. i Contents 1. Introduction 1 2. Framework 3 3. Notations and definitions 4 4. Exogeneity test statistics 4.1. Unified presentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2. Regression interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 6 7 5. Strict exogeneity 5.1. Pivotality under strict exogeneity . . . . . . . . . . . . . . . . . . . . . . . . . 5.2. Power under strict exogeneity . . . . . . . . . . . . . . . . . . . . . . . . . . 8 8 9 6. Cholesky error families 10 7. Exact Monte Carlo exogeneity (MCE) tests 15 8. Simulation experiment 8.1. Standard exogeneity tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2. Exact Monte Carlo exogeneity (MCE) tests . . . . . . . . . . . . . . . . . . . . 9. A. Conclusion 28 Notes A.1. Unified formulation of DWH test statistics . . . . . . . . . . . . . . . . . . . . A.2. Regression interpretation of DWH test statistics . . . . . . . . . . . . . . . . . . B. 17 18 24 Proofs 29 29 29 32 List of Tables 1 2 3 4 5 Power of exogeneity tests at nominal level 5%; G = 2, T = 50 . . . . . . . . . . Power of exogeneity tests at nominal level 5% with Cauchy errors; G = 2, T = 50 Power of MCE tests with Gaussian errors . . . . . . . . . . . . . . . . . . . . . Power of MCE tests with Cauchy errors . . . . . . . . . . . . . . . . . . . . . . Power of MCE tests with Student errors . . . . . . . . . . . . . . . . . . . . . . 20 22 25 26 27 List of Assumptions, Propositions and Theorems 5.1 Theorem : Finite-sample distributions of exogeneity tests . . . . . . . . . . . . . . . ii 9 5.2 5.3 6.1 6.2 6.3 6.4 6.5 Theorem : Finite-sample distributions of exogeneity tests Theorem : Finite-sample distributions of exogeneity tests Lemma : Cholesky invariance of exogeneity tests . . . . Theorem : Finite-sample distributions of exogeneity tests Theorem : Finite-sample distributions of exogeneity tests Theorem : Finite-sample distributions of exogeneity tests Theorem : Finite-sample distributions of exogeneity tests Proof of Lemma 6.1 . . . . . . . . . . . . . . . . . Proof of Lemma 6.1 . . . . . . . . . . . . . . . . . Proof of Lemma 6.1 . . . . . . . . . . . . . . . . . Proof of Lemma 6.1 . . . . . . . . . . . . . . . . . Proof of Theorem 6.2 . . . . . . . . . . . . . . . . Proof of Lemma 6.1 . . . . . . . . . . . . . . . . . Proof of Theorem 6.4 . . . . . . . . . . . . . . . . iii . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 10 12 13 13 14 14 32 33 33 33 34 35 36 1. Introduction A basic problem in econometrics consists of estimating a linear relationship where the explanatory variables and the errors may be correlated. In order to detect an endogeneity problem between explanatory variables and disturbances, researchers often apply an exogeneity test, usually by resorting to instrumental variable (IV) methods. Exogeneity tests of the type proposed by Durbin (1954), Wu (1973), Hausman (1978), and Revankar and Hartley (1973)– henceforth DWH-and RH-tests– are often used to decide whether one should apply ordinary least squares (OLS) or instrumental variable methods. One key assumption of DWH and RH-tests however, is the available instruments are strong. Not much is known, at least in finite-sample, about their behaviour when identification is deficient or weak (weak instruments). In the last two decades, literature has emerged that has raised concerns with the quality of inferences based on conventional methods, such as instrumental variables and ordinary least squares settings, when the instruments are only weakly correlated with the endogenous regressors. Many studies have shown that even ex-post conventional large-sample approximations are misleading when instruments are weak. The literature on the “weak instruments” problem is now considerable1 . Several authors have proposed identification-robust procedures that are applicable even when the instruments are weak. However, identification-robust procedures usually do not focus on regressor exogeneity or instrument validity. Hence, there is still a reason to be concerned when testing the exogeneity or orthogonality of a regressor. In Doko Tchatoka and Dufour (2008), we study the impact of instrument endogeneity on Anderson and Rubin (1949, AR-test) and Kleibergen (2002, K-test). We show that both procedures are in general consistent against the presence of invalid instruments (hence invalid for the hypothesis of interest), whether the instruments are strong or weak. However, there are cases where test consistency may not hold and their use may lead to size distortions in large samples. In this paper, we do not focus on the validity of the instruments. Instead, we question whether the standard specification tests are valid in finite samples when: (i) errors have possibly non-Gaussian distribution, and (ii) identification is weak. In the literature, except for Wu (1973, T1 , T2 tests) and the Revankar and Hartley (1973, RH -test), finite-sample distributions are not available for the other specification test statistics [including the most standard Hausman (1978) statistic] even when model errors are Gaussian and identification is strong. This paper fulfills this gap by simultaneously addressing issues related to finite-sample theory and identification. Staiger and Stock (1997) provided a characterization of the asymptotic distribution of Hausman type-tests [namely H1 , H2 , and H3 ] under the local-to-zero weak instruments asymptotic. They showed that when the instruments are asymptotically irrelevant, all three tests are valid (level is controlled) but inconsistent. Furthermore, their result indicates that H1 and H2 are conservative. Staiger and Stock (1997) then observed that the concentration parameter which characterizes instru1 See e.g. Nelson and Startz (1990a, 1990b); Dufour (1997); Bekker (1994); Phillips (1989); Staiger and Stock (1997); Wang and Zivot (1998); Dufour (2003); Stock, Wright and Yogo (2002); Kleibergen (2002); Moreira (2003); Hall, Rudebusch and Wilcox (1996); Hall and Peixe (2003); Donald and Newey (2001); Dufour (2005, 2007). 1 ment quality depends on the sample size in a way that size adjustment is infeasible. In this paper, we argue that this type of conclusion may go far. The local-to-zero weak instruments asymptotic implies that all instruments are asymptotically irrelevant. When the model is partially identified, this setup may lead to misleading conclusions. This raises the following question: how do the alternative standard specification tests behave when at least one instrument is strong? Recently, Hahn, Ham and Moon (2010) proposed a modified Hausman test which can be used for testing the validity of a subset of instruments. Their statistic is pivotal even when the instruments are weak. The problem however, is that the null hypothesis in their study tests the orthogonality of the instruments which are excluded from the structural equation. So, the test proposed by Hahn et al. (2010) can be viewed as an alternative way of assessing the overidentification restrictions hypothesis of the model [Hansen and Singleton (1982); Hansen (1982); Sargan (1983); Cragg and Donald (1993); Hansen, Heaton and Yaron (1996); Stock and Wright (2000); and Kleibergen (2005)]. Clearly, the problem considered by the authors is fundamentally different and less complex than testing the exogeneity of an included instrument in the structural equation, as done by Durbin (1954), Wu (1973), Hausman (1978), and Revankar and Hartley (1973). Guggenberger (2010) investigated the asymptotic size properties of a two-stage test in the linear IV model, when in the first stage a Hausman (1978) specification test is undertaken as a pretest of exogeneity of a regressor. He showed that the asymptotic size is one for empirically relevant choices of the parameter space. This means that the Hausman pre-test does not have sufficient power against correlations that are local to zero when identification is weak, while the OLS-based t-statistic takes on large values for such nonzero correlations. While we do not question the basic result of Guggenberger (2010) in this paper, we observe that cases where this happens include the Staiger and Stock (1997) weak instruments asymptotic which assumes that all instruments are asymptotically irrelevant. This however does not account for situations where at least one instrument is strong. Hence, the conclusions by Guggenberger (2010) may not be applicable when identification is partial. Doko Tchatoka and Dufour (2011) provide a general asymptotic framework which allows one to examine the asymptotic behaviour of DWH-tests including cases where partial identification holds. In this paper, we only focus on finite samples. The behaviour of DWH and RH exogeneity tests is studied under two alternative setups. In the first one, we assume that the structural errors are strictly exogenous, i.e. independent of the regressors and the available instruments. This setup is quite general and does not require additional assumptions on the (supposedly) endogenous regressors and the reduced-form errors. In particular, the endogenous regressors can be arbitrarily generated by any nonlinear function of the instruments and reduced-form parameters. Furthermore, the reduced-form errors may be heteroscedastic. The second one assumes a Cholesky invariance property for both structural and reduced-form errors. A similar assumption in the context of multivariate linear regressions is also made in Dufour and Khalaf (2002); and Dufour, Khalaf and Beaulieu (2010). In both setups, we propose a finite-sample analysis of the distribution of the tests under the null hypothesis (level) and the alternative hypothesis (power), with or without identification. Our analysis provides several new insights and extensions of earlier procedures. The characterization 2 of the finite-sample distributions of the statistics, shows that all tests are typically robust to weak instruments (level is controlled), whether the errors are Gaussian or not. This result is then used to develop exact Monte Carlo exogeneity (MCE) tests which are valid even when conventional asymptotic theory breaks down. In particular, MCE tests remain applicable even if the distribution of the errors does not have moments (Cauchy-type distribution, for example). Hence, size adjustment is feasible and the conclusion by Staiger and Stock (1997) may be misleading. Moreover, the characterization of the power of the tests clearly exhibits the factors that determine power. We show that all tests have no power in the extreme case where all instruments are weak [similar to Staiger and Stock (1997) and Guggenberger (2010)], but do have power as soon as we have one strong instrument. This suggests that DWH and RH exogeneity tests can detect an exogeneity problem even if not all model parameters are identified, provided partial identification holds. We present simulation evidence which confirms our theoretical results. The paper is organized as follows. Section 2 formulates the model studied, and Section 4 describes the statistics. Sections 5 and 6 study the finite-sample properties of the tests with (possibly) weak instruments. Section 7 presents the exact Monte Carlo exogeneity (MCE) test procedures while Section 8 presents a simulation experiment. Conclusions are drawn in Section 9 and proofs are presented in the Appendix. 2. Framework We consider the following standard simultaneous equations model: y = Y β + Z1 γ + u , (2.1) Y = Z1 Π1 + Z2 Π2 +V , (2.2) where y ∈ RT is a vector of observations on a dependent variable, Y ∈ RT ×G is a matrix of observations on (possibly) endogenous explanatory variables (G ≥ 1), Z1 ∈ RT ×k1 is a matrix of observations on exogenous variables included in the structural equation of interest (2.1), Z2 ∈ RT ×k2 is a matrix of observations on the exogenous variables excluded from the structural equation, u = (u1 , . . . , uT )′ ∈ RT and V = [V1 , . . . , VT ]′ ∈ RT ×G are disturbance matrices with mean zero, β ∈ RG and γ ∈ Rk1 are vectors of unknown coefficients, Π1 ∈ Rk1 ×G and Π2 ∈ Rk2 ×G are matrices of unknown coefficients. We suppose that the “instrument matrix” Z = [Z1 : Z2 ] ∈ RT ×k has full-column rank (2.3) where k = k1 + k2 , and T − k1 − k2 > G , k2 ≥ G . The usual necessary and sufficient condition for identification of this model is rank(Π2 ) = G . 3 (2.4) The reduced form for [y, Y ] can be written as: y = Z1 π 1 + Z2 π 2 + v, Y = Z1 Π1 + Z2 Π2 +V , (2.5) where π 1 = γ + Π1 β , π 2 = Π2 β , and v = u +V β = [v1 , . . . , vT ]′ . If any restriction is imposed on γ , we see from π 2 = Π2 β that β is identified if and only rank(Π2 ) = G, which is the usual necessary and sufficient condition for identification of this model. When rank(Π2 ) < G, β is not identified and the instruments Z2 are weak. In this paper, we study the finite-sample properties (size and power) of the standard exogeneity tests of the type proposed by Durbin (1954), Wu (1973), Hausman (1978), and Revankar and Hartley (1973) of the null hypothesis H0 : E(Y ′ u) = 0, including when identification is deficient or weak (weak instruments) and the errors [u, V ] may not have a Gaussian distribution. 3. Notations and definitions Let β̂ = (Y ′ M1Y )−1Y ′ M1 y and β̃ = [Y ′ (M1 − M)Y ]−1Y ′ (M1 − M)y denote the ordinary least squares (OLS) estimator and two-stage least squares (2SLS) estimator of β respectively, where M = M(Z) = I − Z(Z ′ Z)−1 Z ′ , M1 = M(Z1 ) = I − Z1 (Z1′ Z1 )−1 Z1′ , M1 − M = M1 Z2 (Z2′ M1 Z2 )−1 Z2′ M1 . (3.1) (3.2) Let V̂ = MY, X = [X1 : V̂ ], X1 = [Y : Z1 ], X̂ = [X̂1 : V̂ ], X̂1 = [Ŷ : Z1 ], X̄ = [X1 : Z2 ] = [Y : Z], and consider the following regression of u on the columns of V : u = Va + ε , (3.3) where a is a G × 1 vector of unknown coefficients, and ε is independent of V with mean zero and variance σ 2ε . Define θ = (β ′ , γ ′ , a′ )′ , θ ∗ = (β ′ , γ ′ , b′ )′ , θ̄ = (b′ , γ̄ ′ , ā′ )′ , where b = β + a, γ̄ = γ − Π1 a, ā = −Π2 a. We then observe that Y = Ŷ + V̂ , where Ŷ = (I − M)Y = PZ Y, and β = b as soon as a = 0. From the above definitions and notations, the structural equation (2.1) can be written in the following three different ways: y = Y β + Z1 γ + V̂ a + e∗ = X θ + e∗ , (3.4) y = Ŷ β + Z1 γ + V̂ b + e∗ = X̂ θ ∗ + e∗ , (3.5) y = Y b + Z1 γ̄ + Z2 ā + ε = X̄ θ̄ + ε , (3.6) where e∗ = PZ Va + ε . Equations (3.3)-(3.6) clearly illustrate that the endogeneity of Y may be viewed as a problem of omitted variables [see Dufour (1987)]. Let us denote by θ̂ : the OLS estimate of θ in (3.4), θ̂ 0 : the restricted OLS estimate of θ under a = 0, in (3.4), θ̂ ∗ : the OLS estimate of θ ∗ in (3.5), θ̂ ∗0 : the restricted OLS estimate of θ ∗ under 0 β = b in (3.5), θ̂ ∗ : the restricted OLS estimate of θ ∗ under b = 0 in (3.5) or β = −a in (3.4),θ̄ˆ : 4 the OLS estimate of θ̄ in (3.6), θ̄ˆ 0 : the restricted OLS estimate of θ̄ under ā = 0, and define the following sum squared errors: S(ω ) = (y − X ω )′ (y − X ω ), S∗ (ω ) = (y − X̂ ω )′ (y − X̂ ω ), S̄(ω ) = (y − X̄ ω )′ (y − X̄ ω ), ∀ ω ∈ Rk1 +2G . (3.7) Let Σ̃1 = σ̃ 21 ∆ˆ , Σ̃2 = σ̃ 22 ∆ˆ , Σ̃3 = σ̃ 2 ∆ˆ , Σ̃4 = σ̂ 2 ∆ˆ , (3.8) −1 −1 − σ̂ 2 Ω̂LS , Σ̂2 = σ̃ 2 ∆ˆ , Σ̂3 = σ̂ 2 ∆ˆ , Σ̂1 = σ̃ 2 Ω̂IV 1 1 Σ̂R = D Z (Z ′ D Z )−1 Z2′ D1 , D1 = M1 MM1Y , 2 1 2 2 1 2 T σ̂ R 1 ′ 1 ′ −1 −1 − Ω̂LS , Ω̂IV = Y (M1 − M)Y , Ω̂LS = Y M1Y, ∆ˆ = Ω̂IV T T (3.9) (3.10) (3.11) where σ̂ 2 = (y − Y β̂ )′ M1 (y − Y β̂ )/T is the OLS-based estimator of σ 2u , σ̃ 2 = (y − Y β̃ )′ M1 (y − Y β̃ )/T is the usual 2SLS-based estimator of σ 2u (both without correction for degrees of freedom), while σ̃ 21 = (y − Y β̃ )′ (M1 − M)(y − Y β̃ )/T = σ̃ 2 − σ̃ 2e , σ̃ 22 = σ̂ 2 − (β̃ − β̂ )′ ∆ˆ −1 (β̃ − β̂ ) = σ̂ 2 − σ̃ 2 (β̃ − β̂ )′ Σ̂2−1 (β̃ − β̂ ), σ̃ 2e = (y − Y β̃ )′ M(y − Y β̃ )/T, and σ̂ 2R = yMX̄ y′ /T may be interpreted as alternative IV-based scaling factors, κ 1 = (k2 − G)/G, κ 2 = (T − k1 − 2G)/G, κ 3 = κ 4 = T − k1 − G, and κ R = (T − k1 − k2 − G)/k2 . From (3.7) and (3.8)-(3.11), we can see that 0 S(θ̂ ) = S∗ (θ̂ ∗ ), S(θ̂ 0 ) = S∗ (θ̂ ∗0 ), S(θ̂ ) = T σ̃ 22 , S(θ̂ 0 ) = T σ̂ 2 , S∗ (θ̂ ∗ ) = T σ̃ 2 . (3.12) Throughout the paper, we also use the following notations: C0 = (Ā1 − A1 )′ ∆ˆ −1 (Ā1 − A1 ), Ā1 = [Y ′ (M1 − M)Y ]−1Y ′ (M1 − M), 1 1 A1 = (Y ′ M1Y )−1Y ′ M1 , D̄1 = M1 M(M1 −M)Y , D1 = M1 MM1Y , T T −1 −1 − (Va + ε )′ D1 (Va + ε )Ω̂LS , Σ1 = (Va + ε )′ D̄1 (Va + ε )Ω̂IV ΩIV ≡ ΩIV (µ 2 , V̄ ) = (µ 2 + V̄ )′ (M1 − M)(µ 2 + V̄ ) , (3.13) (3.14) (3.15) (3.16) ΩLS ≡ ΩLS (µ 2 , V̄ ) = (µ 2 + V̄ )′ M1 (µ 2 + V̄ ) , (3.17) ω 2IV ≡ ω IV (µ 1 , µ 2 , V̄ , v̄)2 = (µ 1 + v̄)′ D′∗ D∗ (µ 1 + v̄) , (3.18) ω 2LS ≡ ω LS (µ 1 , µ 2 , V̄ , v̄) = (µ 1 + v̄) C∗ (µ 1 + v̄) , (3.19) ′ 2 C∗ = M1 − M1 (µ 2 + V̄ )ΩLS (µ 2 , V̄ ) (µ 2 + V̄ ) M1 (3.20) D∗ = M1 − M1 (µ 2 + V̄ )ΩIV (µ 2 , V̄ )−1 (µ 2 + V̄ )′ (M1 − M) , (3.21) ω 21 ≡ ω 1 (µ 1 , µ 2 , V̄ , v̄)2 = (µ 1 + v̄)′ E(µ 1 + v̄) , (3.22) ω 22 ≡ ω 2 (µ 1 , µ 2 , V̄ , v̄)2 = (µ 1 + v̄)′ [C∗ −C′ ∆ −1C](µ 1 + v̄) , (3.23) −1 ω 2R ′ ≡ ω R (µ 1 , µ 2 , V̄ , v̄) = (µ 1 + v̄) [D1 − PD1 Z2 ](µ 1 + v̄) , ′ 2 (3.24) C = ΩIV (µ 2 , V̄ ) (µ 2 + V̄ ) (M1 − M) − ΩLS (µ 2 , V̄ ) (µ 2 + V̄ ) M1 , −1 ′ 5 −1 ′ (3.25) E = (M1 − M) I − (µ 2 + V̄ )ΩIV (µ 2 , V̄ )−1 (µ 2 + V̄ )′ (M1 − M) , ω 23 ≡ ω 3 (µ 1 , µ 2 , V̄ , v̄)2 = ω 2IV , ω 24 ≡ ω 4 (µ 1 , µ 2 , V̄ , v̄)2 = ω 2LS , 1 −1 −1 −1 Γ1 (µ 1 , µ 2 , V̄ , v̄) = C′ [ω 2IV ΩIV − ω 2LS ΩLS ] C, Γ2 (µ 1 , µ 2 , V̄ , v̄) = 2 C′ ∆ −1C , ω IV 1 ′ −1 1 Γ3 (µ 1 , µ 2 , V̄ , v̄) = C ∆ C, Γ̄l (µ 1 , µ 2 , V̄ , v̄) = 2 C′ ∆ −1C , l = 1, 2, 3, 4 , 2 ω LS ωl 1 PD Z , ΓR (µ 1 , µ 2 , V̄ , v̄) = ω 2R 1 2 (3.26) (3.27) (3.28) (3.29) (3.30) where for any matrix B, PB = B(B′ B)−1 B′ is the projection matrix on the space spanned by the columns of B, and MB = I − PB . Finally, let Cπ = Π2′ Z2′ M1 Z2 Π2 denotes the concentration factor2 . We then have M1 Z2 Π2 a = 0 if and only if Cπ a = 0, i.e. a = (IG −Cπ−Cπ )a∗ , where Cπ− is any generalized inverse of Cπ , and a∗ is an arbitrary G × 1 vector [see Rao and Mitra (1971, Theorem 2.3.1)]. Let N (Cπ ) = {ϖ ∈ RG : Cπ ϖ = 0} , (3.31) denotes the null set of the linear map on RG characterized by the matrix Cπ . Observe that when Z2′ M1 Z2 has full column rank k2 (which is usually the case), we have N (Cπ ) = {ϖ ∈ RG : Π2 ϖ = 0}, so that N (Cπ ) = {0} when identification hold. However, when identification is weak or Z2′ M1 Z2 does not have full column rank, there exist ϖ 0 6= 0 such that ϖ 0 ∈ N (Cπ ). We now presents DWH and RH test statistics studied in this paper. 4. Exogeneity test statistics We consider Durbin-Wu-Hausman test statistics, namely three versions of the Hausman-type statistics [Hi , i = 1, 2, 3], the four statistics proposed by Wu (1973) [Tl , l = 1, 2, 3, 4] and the test statistic proposed by Revankar and Hartley (1973, RH). First, we propose a unified presentation of these statistics that shows the link between Hausman-and Wu-type tests. Second, we provide an alternative derivation of all test statistics (including RH test statistic) from the residuals of the regression of the unconstrained and constrained models. 4.1. Unified presentation This subsection proposes a unified presentation of the DWH and RH test statistics. The proof of this unified representation is attached in Appendix A.1. The four statistics proposed by Wu (1973) can all be written in the form Tl = κ l (β̃ − β̂ )′ Σ̃l−1 (β̃ − β̂ ) , l = 1, 2, 3, 4. −1 − 21 the errors V have a definite positive covariance matrix ΣV , then ΣV 2 Cπ ΣV tion matrix. Hence, we referred here to Cπ as the concentration factor. 2 If 6 (4.1) is often referred to as the concentra- The three versions of Hausman-type statistics are defined as Hi = T (β̃ − β̂ )′ Σ̂i−1 (β̃ − β̂ ) , i = 1, 2, 3. (4.2) And the Revankar and Hartley (1973, RH) statistic is given by: = κ R y′ Σ̂R y. RH (4.3) The corresponding tests reject H0 when the test statistic is “large”. Unlike RH , Hi , i = 1, 2, 3, and Tl , l = 1, 2, 3, 4, compare OLS to 2SLS estimators of β . They only differ through the use of different “covariance matrices”. H1 uses two different estimators of σ 2u , while the others resort to a single scaling factor (or estimator of σ 2u ). The expressions of the Tl , l = 1, 2, 3, 4, in (4.1) are much more interpretable than those in Wu (1973). The link between Wu (1973) notations and ours is established in Appendix A.1. We use the above notations to better see the relation between Hausman-type tests and Wu-type tests. In particular, it is easy to see that Σ̃3 = Σ̂2 and Σ̃4 = Σ̂3 , so T3 = (κ 3 /T )H2 and T4 = (κ 4 /T )H3 . Finite-sample distributions are available for T1 , T2 and RH when the errors are Gaussian. More precisely, if u ∼ N[0, σ 2 IT ] and Z is independent of u, then: T1 ∼ F(G, k2 − G) , T2 ∼ F(G, T − k1 − 2G) , RH ∼ F(k2 , T − k1 − k2 − G) (4.4) under the null hypothesis of exogeneity. If furthermore, rank(Π2 ) = G and the sample size is large, under the exogeneity of Y , we have (with standard regularity conditions): L L Hi → χ 2 (G), i = 1, 2, 3 ; Tl → χ 2 (G), l = 3, 4. (4.5) However, even when identification is strong and the errors Gaussian, the finite-sample distributions of Hi , i = 1, 2, 3 and Tl , l = 3, 4 are not established in the literature. This underscores the importance of this study. 4.2. Regression interpretation We now give the regression interpretation of the above statistics. From Section 3, except for H1 , Hi , i = 2, 3, Tl , l = 1, 2, 3, 4 and RH can be expressed as [see Appendix A.2 for further details]: 0 H2 = T [S(θ̂ 0 ) − S(θ̂ )]/S∗ (θ̂ ∗ ), H3 = T [S(θ̂ 0 ) − S(θ̂ )]/S(θ̂ 0 ) , T1 = T3 = RH 0 κ 1 [S(θ̂ 0 ) − S(θ̂ )]/[S∗ (θ̂ ∗ ) − Se (θ̂ )], T2 = κ 2 [S(θ̂ 0 ) − S(θ̂ )]/S(θ̂ ) , 0 κ 3 [S(θ̂ 0 ) − S(θ̂ )]/S∗ (θ̂ ∗ ), T4 = κ 4 [S(θ̂ 0 ) − S(θ̂ )]/S(θ̂ 0 ) , = κ R [S̄(θ̄ˆ 0 ) − S̄(θ̄ˆ )]/S̄(θ̄ˆ 0 ) , (4.6) (4.7) (4.8) (4.9) where Se (θ̂ ) = T σ̃ 2e . Equations (4.6) - (4.9) are the regression formulation of the DWH and RH statistics. It interesting to observe that DWH statistics test the null hypothesis H0 : a = 0, while RH 7 tests H∗0 : ā = −Π2 a = 0. If rank(Π2 ) = G , a = 0 if and only if ā = 0. However, if rank(Π2 ) < G, ā = 0 does not entail a = 0. So, H0 ⊆ H∗0 but the inverse may not hold. Our analysis of the distribution of the statistics under the null hypothesis (level) and the alternative hypothesis (power), considers two setups. The first setup is the strict exogeneity, i.e. the structural error u is independent of all regressors. The second setup is the Cholesky error family. This setup assumes that the reduced-form errors belong to Cholesky families. 5. Strict exogeneity In this section, we consider the problem of testing the strict exogeneity of Y, i.e. the problem: H0 : u is independent of [Y, Z] (5.1) vs H1 : u = Va + ε , (5.2) where a is a G×1 vector of unknown coefficients, ε is independent of V with mean zero and variance σ 2ε . It is important to observe that equation (5.2) does not impose restrictions on the structure of the errors u and V . This equation is interpreted as the projection of u in the columns of V and holds for any homoscedastic disturbances u and V with mean zero. Thus, the hypothesis H0 can be expressed as H0 : a = 0 . (5.3) Note that (5.1) - (5.2) do not require any assumption concerning the functional form of Y . So, we could assume that Y obeys a general model of the form: Y = g(Z1 , Z2 , V, Π ) , (5.4) where g(.) is a possibly unspecified non-linear function, Π is an unknown parameter matrix and V follows an arbitrary distribution. This setup is quite wide and does allow one to study several situations where neither V nor u follow a Gaussian distribution. This is particularly important in financial models with fat-tailed error distributions, such as the Student-t. Furthermore, the errors u and V may not have moments (Cauchy distribution for example). Section 5.1 studies the distributions of the statistics under the null hypothesis (level). 5.1. Pivotality under strict exogeneity We first characterize the finite-sample distributions of the statistics under H0 , including when identification is weak and the errors are possibly non-Gaussian. Theorem 5.1 establishes the pivotality of all statistics. 8 Theorem 5.1 F INITE - SAMPLE DISTRIBUTIONS OF EXOGENEITY TESTS. Suppose the assumptions (2.1), (2.3) - (2.4) hold. Under H0 , the conditional distributions given [Y : Z] of all statistics defined by (4.1) - (4.3) depend only on the distribution of u/σ u irrespective of whether the instruments are strong or weak. The results of Theorem 5.1 indicate that if the conditional distribution of (u/σ u )|Y, Z does not involve any nuisance parameter, then all exogeneity tests are typically robust to weak instruments (level is controlled) whether the instruments are strong or weak. More interestingly, this holds even if (u/σ u )|Y, Z do not follow a Gaussian distribution. As a result, exact identification-robust procedures can be developed from the standard specification test statistics even when the errors have a non-Gaussian distribution (see Section 7). This is particularly important in financial models with fat-tailed error distributions, such as the Student-t or in models where the errors may not have any moment (Cauchy-type errors, for example). Furthermore, the exact procedures proposed in Section 7 do not require any assumption on the distribution of V and the functional form of Y . More generally, one could assume that Y obeys a general non-linear model as defined in (5.4) and that V1 , . . . , VT are heteroscedastic. 5.2. Power under strict exogeneity We now characterize the distributions of the tests under the general hypothesis (5.2). As before, we cover both weak and strong identification setups. Theorem 5.2 presents the results. Theorem 5.2 F INITE - SAMPLE DISTRIBUTIONS OF EXOGENEITY TESTS. (2.1) - (2.4) hold. If furthermore H1 in (5.2) is satisfied, then we can write Let the assumptions H1 = T (Va + ε )′ (Ā1 − A1 )′ Σ1−1 (Ā1 − A1 )(Va + ε ) , (5.5) H2 = T (Va + ε ) C0 (Va + ε )/(Va + ε ) D̄1 (Va + ε ) , (5.6) H3 = T (Va + ε )′C0 (Va + ε )/(Va + ε )′ D1 (Va + ε ) , (5.7) T1 = κ 1 (Va + ε )′C0 (Va + ε )/(Va + ε )′ (D̄1 − D1 )(Va + ε ) , (5.8) T2 = κ 2 (Va + ε )′C0 (Va + ε )/(Va + ε )′ (D1 −C0 )(Va + ε ) , (5.9) ′ ′ T3 = κ 3 (Va + ε ) C0 (Va + ε )/(Va + ε ) D̄1 (Va + ε ) , (5.10) T4 = κ 4 (Va + ε ) C0 (Va + ε )/(Va + ε ) D1 (Va + ε ) , (5.11) ′ ′ ′ RH ′ = κ R (Va + ε )′ PD1 Z2 (Va + ε )/(Va + ε )′ (D1 − PD1 Z2 )(Va + ε ) , (5.12) where Σ1 , C0 , A1 , D̄1 , D1 , Ω̂IV , Ω̂LS , ∆ˆ κ R , and κ l , l = 1, 2, 3, 4 , are defined in Section 3. We note first that Theorem 5.2 follows from algebraic arguments only. So, [Y : Z] can be random in any arbitrary way. Second, given [Y : Z], the distributions of the statistics only depend on the endogeneity a. We Can then observe that the above characterization clearly exhibits (Ā1 − A1 )Va , C0Va, D1Va, D̄1Va, PD1 Z2 Va as the factors that determine power. As a result, Corollary 5.3 examine the case where all exogeneity tests do not have power. 9 Corollary 5.3 F INITE - SAMPLE DISTRIBUTIONS OF EXOGENEITY TESTS. Under the assumptions of Theorem 5.2, all exogeneity tests do not have power if and on if a ∈ N (Cπ ). More precisely, the following equalities: −1 (Ā1 − A1 )ε , H1 = T ε ′ (Ā1 − A1 )′ Σ1∗ (5.13) H2 = T ε C0 ε /ε D̄1 ε , H3 = T ε C0 ε /ε D1 ε , ′ ′ ′ ′ (5.14) T1 = κ 1 ε C0 ε /ε (D̄1 − D1 )ε , T2 = κ 2 ε C0 ε /ε (D1 −C0 )ε , (5.15) T3 = κ 3 ε ′C0 ε /ε ′ D̄1 ε , T4 = κ 4 ε ′C0 ε /ε ′ D1 ε , (5.16) ′ RH ′ ′ ′ = κ R ε ′ PD1 Z2 ε /ε ′ (D1 − PD1 Z2 )ε (5.17) −1 −1 hold with probability 1 if and only if a ∈ N (Cπ ), where Σ1∗ = ε ′ D̄1 ε Ω̂IV . − ε ′ D1 ε Ω̂LS When a ∈ N (Cπ ), the conditional distributions of the statistics, given [Y : Z], are the same under the null hypothesis and the alternative hypothesis. Therefore, their unconditional distributions are also the same under the null and the alternative hypotheses. This entails that the power of the tests cannot exceed the nominal level. This condition is satisfied when Π2 = 0 (irrelevant instruments), and all exogeneity tests have no power against complete non identification of model parameters. We now analyze the properties of the tests when model errors belong to Cholesky families. 6. Cholesky error families Let U = [u, V ] = [U1 , . . . , UT ]′ , (6.18) W = [v, V ] = [u +V β , V ] = [W1 , W2 , . . . , WT ]′ . (6.19) We assume that the vectors Ut = [ut , Vt′ ]′ , t = 1, . . . , T, have the same nonsingular covariance matrix: ′ E[Ut Ut ] = Σ = " ′ σ 2u δ δ ΣV # > 0, t = 1, . . . , T, (6.20) where ΣV has dimension G. Then the covariance matrix of the reduced-form disturbances Wt = [vt , Vt′ ]′ also have the same covariance matrix, which takes the form: Ω= " σ 2u + β ′ ΣV β + 2β ′ δ ΣV β + δ β ′ ΣV + δ ′ ΣV # (6.21) where Ω is positive definite. In this framework, the exogeneity hypothesis can be expressed as H0 : δ = 0 . 10 (6.22) Under H1 in (5.2), we can see from (6.20) that δ = ΣV a, σ 2u = σ 2ε + a′ ΣV a = σ 2ε + δ ′ ΣV−1 δ . (6.23) So, the null hypothesis in (6.22) can be expressed as Ha : a = 0 . (6.24) Wt = JW̄t , t = 1, . . . , T , (6.25) We will now assume that where the vector W(T ) = vec(W̄1 , . . . , W̄T ) has a known distribution FW̄ and J ∈ R(G+1)×(G+1) is an unknown upper triangular nonsingular matrix [for a similar assumption in the context of multivariate linear regressions, see Dufour and Khalaf (2002) and Dufour et al. (2010)]. When the errors Wt obey (6.25), we say that Wt belongs to the Cholesky error family. If the covariance matrix of W̄t is an identity matrix IG+1 , the covariance matrix of Wt is Ω = E[Wt Wt′ ] = JJ ′ . (6.26) In particular, these conditions are satisfied when i.i.d. W̄t ∼ N[0, IG+1 ] , t = 1, . . . , T . (6.27) Since the J matrix is upper triangular, its inverse J −1 is also upper triangular. Let P = (J −1 )′ . (6.28) Clearly, P is a (G + 1) × (G + 1) lower triangular matrix and it allows one to orthogonalize JJ ′ : ′ P′ JJ ′ P = IG+1 , (JJ ′ )−1 = PP . (6.29) P′ can be interpreted as the Cholesky factor of Ω −1 , so P is the unique lower triangular matrix that satisfies equation (6.29); see Harville (1997, Section 14.5, Theorem 14.5.11). It will be useful to consider the following partition of P : P= " P11 0 P21 P22 # (6.30) where P11 6= 0 is a scalar and P22 is a nonsingular G × G matrix. In particular, if (6.26) holds, we see [using (6.21)] that an appropriate P matrix is obtained by taking: P11 = (σ 2u − δ ′ ΣV−1 δ )−1/2 = σ ε , ′ P22 ΣV P22 = IG , P21 = −(β + ΣV−1 δ )(σ 2u − δ ′ ΣV−1 δ )−1/2 = −(β + a)σ −1 ε . 11 (6.31) (6.32) −1 Further this choice is unique. P22 only depends on ΣV and P11 β + P21 = −(ΣV−1 δ )σ −1 ε = −aσ ε . In particular, if δ = 0, we have P11 = 1/σ u , P21 = −β /σ u and P11 β + P21 = 0. If we postmultiply [y, Y ] by P, we obtain from (2.5): [ȳ, Ȳ ] = [y, Y ] P = [yP11 +Y P21 , Y P22 ] = [Z1 , Z2 ] " γ + Π1 β Π2 β Π1 Π2 # P + W̄ (6.33) where ′ ′ ′ W̄ = UP = [v̄, V̄ ] = [W̄1 , . . . , W̄T ] , W̄t = [v̄t , V̄t ] , v̄ = vP11 +V P21 = [v̄1 , . . . , v̄T ]′ , (6.34) ′ V̄ = V P22 = [V̄1 , . . . , V̄T ] . (6.35) Then, we can rewrite (6.33) as ȳ = Z1 (γ P11 + Π1 ζ ) + Z2 Π2 ζ + v̄ , Ȳ = Z1 Π1 P22 + Z2 Π2 P22 + V̄ , (6.36) (6.37) where ζ = β P11 + P21 = −(ΣV−1 δ )/(σ 2u − δ ′ ΣV−1 δ )1/2 = −aσ −1 ε . (6.38) Since MZ = 0, we have M ȳ = M v̄ , MȲ = MV̄ , (6.39) M1 ȳ = M1 (µ 1 + v̄) , M1Ȳ = M1 (µ 2 + V̄ ) . (6.40) where µ 1 = M1 Z2 Π2 ζ = −σ −1 ε M1 Z2 Π2 a , µ 2 = M1 Z2 Π2 P22 . (6.41) Clearly, µ 2 does not depend on the endogeneity parameter a = ΣV−1 δ . Furthermore, ζ = 0 ⇔ δ = a = 0 and µ 1 = 0. In particular, this condition holds under H0 (δ = a = 0). If Π2 = 0 (complete non-identification of the model parameters), we have µ 1 = 0 and µ 2 = 0, irrespective of the value of δ . In this case, M ȳ = M v̄ , MȲ = MV̄ , M1 ȳ = M1 v̄ , M1Ȳ = M1V̄ . (6.42) We can now show the following Cholesky invariance property of all test statistics. Lemma 6.1 C HOLESKY INVARIANCE OF EXOGENEITY TESTS . R= " R11 0 R21 R22 # Let (6.43) be a lower triangular matrix such that R11 6= 0 is a scalar and R22 is a nonsingular G × G matrix. 12 If we replace y and Y by y∗ = yR11 + Y R21 and Y∗ = Y R22 in (4.1) - (3.11), then the statistics Hi (i = 1, 2, 3), Tl (l = 1, 2, 3, 4) and RH do not change. The above invariance holds irrespective of the choice of lower triangular matrix R. In particular, one can choose R = P as defined in (6.28). We can now prove the following general theorem on the distributions of the test statistics. Theorem 6.2 F INITE - SAMPLE DISTRIBUTIONS OF EXOGENEITY TESTS. Under the assumptions (2.1) - (2.4) and assumption (6.25), the statistics defined in (4.1) - (3.11) have the following representations: Hi = T [µ 1 + v̄]′Γi (µ 1 , µ 2 , v̄, V̄ )[µ 1 + v̄] , i = 1, 2, 3, Tl = κ l [µ 1 + v̄]′Γ̄l (µ 1 , µ 2 , v̄, V̄ )[µ 1 + v̄] , RH l = 1, 2, 3, 4, = κ R [µ 1 + v̄]′ΓR (µ 1 , µ 2 , v̄, V̄ )[µ 1 + v̄] , where [v̄, V̄ ], µ 1 , µ 2 are defined in (6.34) and (6.41), Γi , Γ̄l , and ΓR are defined in Section 3. The above theorem entails that the distributions of the statistics do not depend on either β or γ . Observe that Theorem 6.2 follows from algebraic arguments only, so [Y, Z] and [v̄, V̄ ] can be random in an arbitrary way. If the distributions of Z and [v̄, V̄ ] do not depend on other model parameters, the theorem entails that the distributions of the statistics depend on model parameters only through µ 1 and µ 2 . Since µ 2 does not involve δ , µ 1 is the only factor that determines power. If µ 1 6= 0, the tests have power. This may be the case when at least one instrument is strong (partial identification of model parameters). However, we can observe that when M1 Z2 Π2 a = 0, µ 1 = 0 and exogeneity tests have no power. We now provide a formal characterization of the set of parameters in which exogeneity tests have no power. Corollary 6.3 characterizes the power of the tests when a ∈ N (Cπ ). Corollary 6.3 F INITE - SAMPLE DISTRIBUTIONS OF EXOGENEITY TESTS. Under the assumptions of Theorem 6.2, if a ∈ N (Cπ ), we have µ 1 = 0 and the statistics defined in (4.1) - (3.11) have the following representations: Hi = T v̄′Γi (µ 2 , v̄, V̄ )v̄, i = 1, 2, 3; Tl = κ l v̄′Γ̄l (µ 2 , v̄, V̄ )v̄, l = 1, 2, 3, 4, RH = κ R v̄′ΓR (µ 2 , v̄, V̄ )v̄ irrespective of whether the instruments are weak or strong, where Γi (µ 2 , v̄, V̄ ) ≡ Γi (0, µ 2 , v̄, V̄ ), Γ̄l (µ 2 , v̄, V̄ ) ≡ Γl (0, µ 2 , v̄, V̄ ), ΓR (µ 2 , v̄, V̄ ) = ΓR (0, µ 2 , v̄, V̄ ), ζ = −(ΣV−1 δ )/(σ 2u − δ ′ ΣV−1 δ )1/2 , Γi , Γ̄l , and ΓR are defined in Section 3. First, note that when a ∈ N (Cπ ), i.e. when M1 Z2 Π2 a = 0, the conditional distributions, given Z and V̄ of the exogeneity tests, only depend on µ 2 irrespective of the quality of the instruments. 13 In particular, this condition is satisfied when Π2 = 0 (complete non-identification of the model parameters) or δ = a = 0 (under the null hypothesis). Since µ 2 does not depend on δ or a, all exogeneity test statistics have the same distribution under both the null hypothesis (δ = a = 0) and the alternative (δ 6= 0) when a ∈ N (Cπ ) : the power of these tests cannot exceed the nominal levels. So, the practice of pretesting based on exogeneity tests is unreliable in this case. Theorem 6.4 characterizes the distributions of the statistics in the special case of Gaussian errors. Theorem 6.4 F INITE - SAMPLE DISTRIBUTIONS OF EXOGENEITY TESTS. Let the assumptions of Theorem 6.2 hold. If furthermore the normality assumption (6.27) holds and Z = [Z1 , Z2 ] is fixed, then H1 = T [µ 1 + v̄]′Γ1 (µ 1 , µ 2 , v̄, V̄ )[µ 1 + v̄] , H2 = T [µ 1 + v̄]′Γ2 (µ 1 , µ 2 , v̄, V̄ )[µ 1 + v̄] ∼ T φ 1 (v̄, ν 1 )/φ 2 (v̄, ν 3 ) , H3 | V̄ T1 | V̄ ∗ ∼ T /[1 + κ −1 2 F(T − k1 − 2G, G; υ 2 , ν 1 )] ≤ κ̄ 1 F(G, T − k1 − 2G; ν 1 , υ 2 ) , ∼ F(G, k2 − G; ν 1 , υ 1 ), T2 | V̄ ∼ F(G, T − k1 − 2G; ν 1 , υ 2 ), T3 = κ 2 [µ 1 + v̄]′Γ2 (µ 1 , µ 2 , v̄, V̄ )[µ 1 + v̄] ∼ κ 2 φ 1 (v̄, ν 1 )/φ 2 (v̄, ν 3 ) , T4 | V̄ RH | V̄ ∗ ∼ κ 4 /[1 + κ −1 2 F(T − k1 − 2G, G; υ 2 , ν 1 )] ≤ κ̄ 2 F(G, T − k1 − 2G; ν 1 , υ 2 ) , ∼ F(k2 , T − k − G; ν R , υ R ) , where φ 1 (v̄, ν 1 )| V̄ = [µ 1 + v̄]′C′ ∆ −1C[µ 1 + v̄]| V̄ ∼ χ 2 (G; ν 1 ), φ 2 (v̄, ν 3 )| V̄ = ω 2IV | V̄ ∼ χ 2 (T −k1 − G; ν 3 ), ν 1 = µ ′1C′ ∆ −1C µ 1 , ν 3 = µ ′1 (D′∗ D∗ )µ 1 , υ 1 = µ ′1 E µ 1 , υ 2 = µ ′1 (C∗ − C′ ∆ −1C)µ 1 , ν R = µ ′1 PD1 Z2 µ 1 , υ R = µ ′1 (D1 − PD1 Z2 )µ 1 , κ̄ ∗1 = T G/(T − k1 − 2G), κ̄ ∗2 = (T − k1 − G)G/(T − k1 − 2G). The above theorem entails that given V̄ , the statistics T1 , T2 and RH follow double noncentral F-distributions, while T4 and H3 are bounded by a double noncentral F-type distribution. However, the distributions of T3 , H2 and H1 cannot be characterized by standard distributions. As in Theorem 6.2, µ 1 is the factor that determines power. If µ 1 6= 0, the exogeneity tests have power. However, when µ 1 = 0, all tests have no power as shown in Corollary 6.5. Corollary 6.5 F INITE - SAMPLE DISTRIBUTIONS OF EXOGENEITY TESTS. Under the assumptions of Theorem 6.4, if a ∈ N (Cπ ), we have ν 1 = ν 3 = υ 1 = υ 2 = ν R = υ R = 0 so that H1 = T v̄′Γ1 (µ 2 , v̄, V̄ )v̄, H2 = T v̄′Γ2 (µ 2 , v̄, V̄ )v̄ ∼ T φ 1 (v̄)/φ 2 (v̄) , ∗ H3 ∼ T /(1 + κ −1 2 F(T − k1 − 2G, G)) ≤ κ̄ 1 F(G, T − k1 − 2G) , T1 ∼ F(G, k2 − G) , T2 ∼ F(G, T − k1 − 2G), T3 = κ 2 v̄′Γ2 (µ 2 , v̄, V̄ )v̄ ∼ κ 2 φ 1 (v̄)/φ 2 (v̄) , ∗ T4 ∼ κ 4 /[1 + κ −1 2 F(T − k1 − 2G, G)] ≤ κ̄ 2 F(G, T − k1 − 2G) , RH ∼ F(k2 , T − k − G) , where φ 1 (v̄) ≡ φ 1 (v̄, 0), φ 2 (v̄) ≡ φ 2 (v̄, 0), φ 1 (v̄, ν 1 ), φ 1 (v̄, ν 3 ), Γi (µ 1 , µ 2 , v̄, V̄ ), i = 1, 2 are defined 14 in Theorem 6.4. Observe that when a ∈ N (Cπ ), the non-centrality parameters in the F-distributions vanish. In particular, under the null hypothesis H0 , we have a = 0 ∈ N (Cπ ) and all exogeneity test statistics are pivotal. Furthermore, all exogeneity test statistics have the same distribution under the null hypothesis (δ = a = 0) and the alternative (δ 6= 0): the power of the tests cannot exceed the nominal levels. We now describe the exact procedure for testing exogeneity even with non-Gaussian errors: the Monte Carlo exogeneity tests. 7. Exact Monte Carlo exogeneity (MCE) tests The finite-sample characterization of the distribution of exogeneity test statistics in the previous section shows that the tests are typically robust to weak instruments (level is controlled). However, the distributions of the statistics (under the null hypothesis) are not standard if the errors are non Gaussian. Furthermore, even for Gaussian errors, H1 , H2 , and T3 cannot be characterized by standard distributions. This section develops exact Monte Carlo tests which are identification-robust even if the errors are non-Gaussian. Consider again (2.1) and assume that we test the strict exogeneity of Y , i.e. the hypothesis: H0 : u is independent of [Y, Z] . (7.1) If the distribution under H0 of u/σ u is given, the conditional distributions of the exogeneity test statistics given [Y, Z] do not involve nuisance parameters and so can be simulated [see Theorem 5.1]. Let W ∈ {Hi , Hl , RH , i = 1, 2, 3 ; l = 1, 2, 3, 4 } . (7.2) We shall consider two cases. The first one where the support of W is continuous, and the second one where it is a discrete set. Let us first focus on the case where the statistics have continuous distributions. Let W1 , . . . , WN be a sample of N replications of identically distributed exchangeable random variables with the same distribution as W [for more details on exchangeability, see Dufour (2006)]. Define W (N) = (W1 , . . . , WN )′ and let W0 be the value of W based on observed data. Define p̂N (x) = N ĜN (x) + 1 N +1 (7.3) where ĜN (x) is the survival function given by ĜN (x) ≡ ĜN [x; W (N)] = 15 1 N ∑ 1(Wi ≥ x), N i=1 (7.4) 1(C) = 1 if condition C holds, = 0 otherwise. (7.5) Then, we can show that P[ p̂N (W0 ) ≤ α ] = I[α (N + 1)] N +1 for 0 ≤ α ≤ 1 (7.6) [see Dufour (2006, Proposition 2.2)], where I[x] is the largest integer less than or equal to x. So, p̂N (W0 ) ≤ α is the critical region of the Monte Carlo test with level 1 − α and p̂N (W0 ) is the Monte Carlo test p-value. We will now extend this procedure to the general case where the distribution of the statistic W may be discrete. Assume that W (N) = (W1 , . . . , WN )′ is a sequence of exchangeable random variables which may exhibit ties with positive probability. More precisely P(W j = W j′ ) > 0 for j 6= j′ , j, j′ = 1, . . . , N . (7.7) Let us associate each variable W j , j = 1, . . . , N, with a random variable U j , j = 1, . . . , N such that i.i.d U j , . . . , UN ∼ U (0, 1) , (7.8) U (N) = (U1 , . . . , UN )′ is independent of W (N) = (W1 , . . . , WN )′ where U (0, 1) is the uniform distribution on the interval (0, 1). Then, we consider the pairs Z j = (W j , U j ), j = 1, . . . , N , (7.9) which are ordered according to the lexicographic order: (W j , U j ) ≤ (W j′ , U j′ ) ⇐⇒ {W j < W j′ or (W j = W j′ and U j ≤ U j′ )} . (7.10) Let us define the randomized p-value function as p̃N (x) = N G̃N (x) + 1 , N +1 (7.11) where the tail-area function G̃N is given by G̃N (x) ≡ G̃N [x; U0 , W (N), U (N)] = 1 N N ∑ 1[Z j ≥ (x, U0 )], (7.12) j=1 U0 is a U (0, 1) random variable independent of W (N) and U (N). Then, following Dufour (2006, Proposition 2.4), we have P[ p̃N (W0 ) ≤ α ] = I[α (N + 1)] N +1 16 for 0 ≤ α ≤ 1 (7.13) So, p̃N (W0 ) ≤ α is the critical region of the Monte Carlo test with level 1 − α and p̃N (W0 ) is the MC-test p-value. We now describe the algorithm to compute the Monte Carlo tests p-value when the distributions of the statistics is continuous3 . Before proceeding, it will be useful to recall that under the assumptions of Theorem 5.1, all DWH and RH statistics can be expressed as [see the proof in (B.1)-(B.8)]: H2 = T (u/σ u )′C0 (u/σ u )/(u/σ u )′ D̄1 (u/σ u ) , (7.14) H3 = T (u/σ u )′C0 (u/σ u )/(u/σ u )′ D1 (u/σ u ) , (7.15) T1 = κ 1 (u/σ u ) C0 (u/σ u )/(u/σ u ) (D̄1 − D1 )(u/σ u ), (7.16) T2 = κ 2 (u/σ u )′C0 (u/σ u )/(u/σ u )′ (D1 −C0 )(u/σ u ) , (7.17) T3 = κ 3 (u/σ u )′C0 (u/σ u )/(u/σ u )′ D̄1 (u/σ u ), (7.18) T4 = κ 4 (u/σ u )′C0 (u/σ u )/(u/σ u )′ D1 (u/σ u ) (7.19) ′ ′ where the matrices C0 , D1 , D̄1 are defined in (3.13)-(3.30). Suppose that σ −1 u u | Y, Z ∼ F , where F independent of (Y, Z) under H0 and is completely specified. (7.20) The MC p-values are computed through the following steps: 1. compute the test statistic W0 based on observed data; ( j) ( j) ( j) ( j) = u ′ j = 1, . . . , N , according to 2. generate N i.i.d. variables σ −1 ∗ = [u∗1 , . . . , u∗T ] , u u the specified distribution F , and compute the corresponding test statistics W j using (7.14)(7.19) ; 3. compute the MC p-value as p̂MC (W0 ) = ∑Ni=1 1(Wi ≥ W0 ) + 1 ; N +1 (7.21) 4. reject the null hypothesis H0 at level α if p̂MC (W0 ) ≤ α . We now study the performance of the standard exogeneity tests and the proposed Monte Carlo tests through a Monte Carlo experiment. 8. Simulation experiment In each of the following experiments, the model is described by the following data generating process: y = Y1 β 1 +Y2 β 2 + u, (Y1 ,Y2 ) = (Z2 Π21 , Z2 Π22 ) + (V1 ,V2 ), (8.1) 3 Note that the algorithm can easily be generalized to discrete distributions. 17 i.i.d. where Z2 is a T × k2 matrix of instruments such that Z2t ∼ N(0, Ik2 ) for all t = 1, . . . , T, Π21 and Π22 are k2 -dimensional vectors such that Π21 = η 1C0 , Π22 = η 2C1 , (8.2) where η 1 and η 2 take the value 0 (design of complete non identification), .01 (design of weak identification) or .5 (design of strong identification), [C0 ,C1 ] is a k2 × 2 matrix obtained by taking the first two columns of the identity matrix of order k2 . Observe that (8.2) allows us to consider the partial identification of β = (β 1 , β 2 )′ . In particular, if Π21 = 0 but Π22 6= 0, β 1 is not identified but β 2 is. The true value of β is set at β 0 = (2, 5)′ and the number of instruments k2 varies in {5, 10, 20}. We assume that (8.3) u = Va + ε = V1 a1 +V2 a2 + ε , where a1 and a2 are 2 × 1 vectors and ε is independent with V = (V1 ,V2 ), V1 and V2 are T × 1 vectors. We consider two type framework for model error distributions: 1 0 0 i.i.d (1) (V1t , V2t , ε t )′ ∼ N 0, 0 1 0 0 0 1 for all t = 1, . . . , T (8.4) Vt and ε t are independent such that i.i.d and (2) (V1t , V2t , ε t )′ ∼ standard Cauchy distribution for all , t = 1, . . . , T . (8.5) The sample size is fixed at T = 50 but our results remain valid for alternative choice of the sample size (even less than 50). The endogeneity parameter a is chosen such that a = (a1 , a2 )′ ∈ {(−20, 0)′ , (−5, 5)′ , (0, 0)′ , (.5, .2)′ , (100, 100)′ } . (8.6) From the above notations, the usual exogeneity hypothesis of Y is expressed as H0 : a = (a1 , a2 )′ = (0, 0)′ . (8.7) The nominal level of the tests in each experiment is set at 5%. 8.1. Standard exogeneity tests Table 1 presents the empirical size and power when errors are Gaussian, while Table 2 is for Cauchytype errors. The number of replications is N = 10000 in both cases. The first column of each table reports the statistics, while the second column contains the values of k2 (number of excluded instruments). In the other columns, for each value of endogeneity parameter a and the quality of the instruments η 1 and η 2 , the rejection frequencies are reported. Our main findings can be summarized as follows: 18 1. all DWH and RH tests are valid (level is controlled) whether identification is strong or weak, and the errors are Gaussian or not. This confirms our theoretical results. More precisely, when the errors are Gaussian [framework (8.4)], T1 , T2 , T4 , H3 , and RH have a correct level, while T3 , H1 and H2 are conservative when identification is weak. For Cauchy-type errors [framework (8.5)], in addition to T3 , H1 and H2 , T1 is now conservative with weak instruments. However, T2 , T4 , H3 , and RH still have correct level whether identification is strong or weak; 2. all exogeneity tests exhibit power even if not all parameters are identified, provided partial identification holds. This shows how Staiger and Stock (1997) weak instruments asymptotic may be misleading when identification is partially weak. However, when the instruments are completely irrelevant, i.e. η 1 = η 2 = 0, all DWH and RH tests have no power whether the errors are Gaussian or not [similar to Staiger and Stock (1997) and Guggenberger (2010)]; 3. our results also indicate that in terms of power, H3 dominates H2 and H2 dominates H1 irrespective of whether identification is deficient or not. In the same way, T2 dominates T4 , T4 dominates T1 and T1 dominates T3 . We now analyze the performance of the proposed Monte Carlo exogeneity tests. 19 Table 1. Power of exogeneity tests at nominal level 5%; G = 2, T = 50 k2 (a1 , a2 )′ = (−20, 0)′ η 1 = 0 η 1 = .01 η 1 = .5 η2 = 0 η2 = 0 η2 = 0 (a1 , a2 )′ = (−5, 5)′ η 1 = 0 η 1 = .01 η 1 = .5 η2 = 0 η2 = 0 η2 = 0 (a1 , a2 )′ = (0, 0)′ η 1 = 0 η 1 = .01 η 1 = .5 η2 = 0 η2 = 0 η2 = 0 (a1 , a2 )′ = (.5, .2)′ η 1 = 0 η 1 = .01 η 1 = .5 η2 = 0 η2 = 0 η2 = 0 (a1 , a2 )′ = (100, 100)′ η 1 = 0 η 1 = .01 η 1 = .5 η2 = 0 η2 = 0 η2 = 0 20 T1 T2 T3 T4 H1 H2 H3 RH 5 5 5 5 5 5 5 5 4.98 4.98 0 4.64 0 0.01 5.34 4.84 4.6 24.92 0.19 24.07 0.09 0.25 25.73 45.25 65.81 100 97.93 100 92.53 98.09 100 100 5.26 5.04 0.02 4.67 0.01 0.03 5.33 5.36 4.92 6.77 0.05 6.29 0.02 0.05 7.19 7.83 70.9 100 97.85 100 91.83 98.02 100 100 4.87 4.96 0.02 4.63 0.01 0.02 5.27 5.04 5.06 5.38 0.03 4.91 0.02 0.04 5.72 5.2 5.24 5.26 0.59 4.93 0.26 0.74 5.56 4.9 5.09 4.87 0.03 4.51 0 0.04 5.18 4.88 4.84 4.61 0 4.42 0 0 4.92 4.73 19.85 53.19 29.02 52 17.97 31.42 54.41 41.31 4.94 4.91 0.01 4.62 0 0.02 5.31 5.02 4.18 76.71 5.83 76.25 3.59 6.89 77.11 100 70.09 100 97.93 100 92.48 98.14 100 100 T1 T2 T3 T4 H1 H2 H3 RH 10 10 10 10 10 10 10 10 4.9 5.01 0.35 4.65 0.16 0.46 5.32 5.17 3.95 17.5 1.88 16.77 1.05 2.3 18.11 57.58 98.38 100 100 100 99.31 100 100 100 4.92 5.19 0.38 4.75 0.18 0.48 5.43 4.83 5.34 6.2 0.29 5.73 0.14 0.42 6.56 7.62 98.93 100 100 100 99.22 100 100 100 4.82 5.16 0.3 4.78 0.2 0.38 5.46 4.83 4.81 4.88 0.33 4.55 0.14 0.43 5.18 5.34 5.25 5.07 1.47 4.72 0.49 1.76 5.41 4.97 4.88 4.77 0.36 4.45 0.14 0.46 5.06 4.93 5.22 5.45 0.3 5.02 0.14 0.39 5.75 5.41 34.18 54.24 43.01 52.81 28.92 45.54 55.31 34.5 4.91 4.8 0.22 4.46 0.1 0.33 5.12 4.57 3.28 50.74 14.7 50.05 9.88 16.85 51.25 100 99.23 100 100 100 99.25 100 100 100 T1 T2 T3 T4 H1 H2 H3 RH 20 20 20 20 20 20 20 20 4.93 4.75 1.95 4.43 1.08 2.32 5.15 4.88 2.26 8.97 3.73 8.42 2.43 4.37 9.36 79.08 99.8 100 100 100 99.89 100 100 100 4.94 4.9 1.82 4.51 1.13 2.26 5.25 5.03 4.64 5.54 2.01 5.21 1.08 2.6 5.73 8.36 99.78 100 100 100 99.82 100 100 100 4.9 5.09 2.1 4.74 1.13 2.67 5.4 5.38 5.02 5.32 2.02 5.04 1.2 2.57 5.68 5 5.07 4.99 2.79 4.61 1.03 3.28 5.41 5.21 5.02 4.95 2.01 4.63 1.08 2.46 5.23 5.07 4.93 4.94 1.95 4.57 1.21 2.48 5.18 5.04 39.4 49.34 44.9 47.89 29.88 47.46 50.31 24.88 5.02 4.92 1.94 4.52 1.15 2.33 5.23 5.3 1.5 17.32 9.2 16.45 6.44 10.39 17.76 100 99.96 100 100 100 99.7 100 100 100 Table 1 (continued). Power of exogeneity tests at nominal level 5%; G = 2, T = 50 k2 (a1 , a2 )′ = (−20, 0)′ η 1 = 0 η 1 = .01 η 1 = .5 η 2 = .5 η 2 = .5 η 2 = .5 (a1 , a2 )′ = (−5, 5)′ η 1 = 0 η 1 = .01 η 1 = .5 η 2 = .5 η 2 = .5 η 2 = .5 (a1 , a2 )′ = (0, 0)′ η 1 = 0 η 1 = .01 η 1 = .5 η 2 = .5 η 2 = .5 η 2 = .5 (a1 , a2 )′ = (.5, .2)′ η 1 = 0 η 1 = .01 η 1 = .5 η 2 = .5 η 2 = .5 η 2 = .5 (a1 , a2 )′ = (100, 100)′ η 1 = 0 η 1 = .01 η 1 = .5 η 2 = .5 η 2 = .5 η 2 = .5 21 T1 T2 T3 T4 H1 H2 H3 RH 5 5 5 5 5 5 5 5 4.73 5.1 0.63 4.77 0.27 0.77 5.48 5.13 15.16 37.9 18.25 36.89 10.48 20.16 38.88 28.27 81.58 100 98.68 100 90.44 98.82 100 100 69.69 100 98.15 100 92 98.33 100 100 68.76 100 98.26 100 92.3 98.43 100 100 78.22 100 98.50 100 92.20 98.52 100 100 4.91 5.51 0.75 5.06 0.39 0.87 5.83 4.77 5.26 5.29 0.85 4.98 0.29 0.96 5.64 5.13 5 5.2 0.83 4.78 0.32 0.99 5.41 5.17 8.01 12.95 3.82 12.24 1.93 4.44 13.39 9.81 7.48 12.42 3.47 11.72 1.69 4.08 12.95 10.28 24.2 64.31 42.79 63.06 24.39 45.64 65.44 50.59 63.6 100 97.43 100 92.4 97.59 100 100 65.14 100 97.09 100 91.95 97.31 100 100 78.04 100 98.52 100 92.12 98.64 100 100 T1 T2 T3 T4 H1 H2 H3 RH 10 10 10 10 10 10 10 10 5.18 5.29 1.7 4.96 0.73 2 5.61 5.24 26.81 41.58 31.1 40.35 18.21 33.67 42.64 24.16 99.76 100 99.98 100 98.22 99.98 100 100 98.81 100 99.97 100 99.08 99.98 100 100 99.17 100 99.99 100 98.98 100 100 100 99.56 100 100 100 98.9 100 100 100 5.26 4.92 1.58 4.57 0.55 1.88 5.3 4.92 5.3 5.19 1.6 4.87 0.5 2.03 5.53 5.07 4.86 5.07 1.88 4.67 0.48 2.31 5.38 5.11 11.05 13.49 7.75 12.81 3.34 8.65 14.05 8.55 11.61 14.75 8.29 14 3.88 9.3 15.32 8.94 43.71 66.24 57.52 65.15 32.85 60.4 67.3 43.87 99.12 100 100 100 99.28 100 100 100 99.28 100 100 100 99.26 100 100 100 99.74 100 100 100 98.29 100 100 100 T1 T2 T3 T4 H1 H2 H3 RH 20 20 20 20 20 20 20 20 5.12 5.06 2.97 4.7 1.2 3.59 5.32 5.46 27.67 34.7 30.26 33.32 17.73 32.57 35.69 16.17 99.96 100 100 100 99.24 100 100 100 99.45 100 100 100 99.93 100 100 100 99.48 100 100 100 99.91 100 100 100 99.62 100 100 100 99.93 100 100 100 4.86 4.93 3.2 4.57 1.1 3.65 5.25 5.2 4.91 4.77 2.88 4.45 1.03 3.39 5.06 4.64 4.29 4.3 2.74 3.97 0.72 3.27 4.55 4.82 10.45 11.85 9.14 11.13 4.51 10.24 12.42 7.45 10.95 12.03 9.14 11.34 4.53 10.25 12.55 7.45 41.15 51.76 47.52 50.35 27.81 50.07 52.91 26.62 99.91 100 100 100 99.77 100 100 100 99.9 100 100 100 99.81 100 100 100 99.94 100 100 100 98.75 100 100 100 Table 2. Power of exogeneity tests at nominal level 5% with Cauchy errors; G = 2, T = 50 k2 (a1 , a2 )′ = (−20, 0)′ η 1 = 0 η 1 = .01 η 1 = .5 η2 = 0 η2 = 0 η2 = 0 (a1 , a2 )′ = (−5, 5)′ η 1 = 0 η 1 = .01 η 1 = .5 η2 = 0 η2 = 0 η2 = 0 (a1 , a2 )′ = (0, 0)′ η 1 = 0 η 1 = .01 η 1 = .5 η2 = 0 η2 = 0 η2 = 0 (a1 , a2 )′ = (.5, .2)′ η 1 = 0 η 1 = .01 η 1 = .5 η2 = 0 η2 = 0 η2 = 0 (a1 , a2 )′ = (100, 100)′ η 1 = 0 η 1 = .01 η 1 = .5 η2 = 0 η2 = 0 η2 = 0 22 T1 T2 T3 T4 H1 H2 H3 RH 5 5 5 5 5 5 5 5 4.96 5.08 0.05 4.82 0.04 0.07 5.41 5.13 4.94 8.58 0.08 8.08 0.02 0.11 9.01 12.29 4.9 59.38 4.91 58.8 3.26 5.86 59.84 82.91 4.98 5.48 0.02 5.19 0.01 0.05 5.81 5.61 4.95 6.02 0.03 5.62 0 0.04 6.38 6.79 5.14 24.51 0.65 23.87 0.33 0.81 25.21 40.66 5.19 5.46 0 5.14 0 0 5.67 6.04 5.23 5.38 0.03 5 0.01 0.05 5.7 5.98 4.97 5.29 0.05 4.9 0.02 0.05 5.64 5.93 5.1 5.15 0.02 4.75 0.01 0.02 5.49 4.88 5.34 4.97 0.01 4.63 0 0.03 5.23 4.43 5.23 5.54 0.02 5.14 0.01 0.03 5.77 5.06 5.01 5.32 0.01 5.01 0 0.02 5.57 6.12 4.83 44.68 1.62 44 0.91 2.02 45.34 68.34 6.66 81.16 8.71 80.76 6.2 9.95 81.48 96.73 T1 T2 T3 T4 H1 H2 H3 RH 10 10 10 10 10 10 10 10 5.61 5.42 0.39 5.08 0.17 0.49 5.61 6.09 4.79 6.48 0.44 6.09 0.17 0.65 6.8 11.71 5.07 38.72 10.96 38.06 7.6 12.56 39.32 81.63 4.97 5.53 0.3 5.24 0.11 0.46 5.8 6.41 5.2 5.41 0.3 5 0.13 0.42 5.64 5.77 4.63 9.28 0.8 8.86 0.42 1.11 9.66 22.73 4.83 4.79 0.31 4.45 0.16 0.4 5.01 5.06 5.48 5.17 0.28 4.87 0.08 0.38 5.42 4.67 4.7 4.81 0.32 4.46 0.09 0.45 5.05 4.98 5.04 4.92 0.34 4.61 0.14 0.44 5.19 4.22 5.08 4.94 0.28 4.63 0.17 0.39 5.2 4.63 5.22 5.14 0.19 4.83 0.09 0.33 5.43 4.77 5.09 5.01 0.38 4.69 0.14 0.51 5.33 3.86 2.95 22.57 3.53 21.74 2.06 4.19 23.16 62.53 3.24 54.36 18.51 53.51 13.04 20.96 55.1 96.32 T1 T2 T3 T4 H1 H2 H3 RH 20 20 20 20 20 20 20 20 5.27 5.34 2.03 5.03 1.21 2.54 5.72 6.3 5.02 5.4 2.16 5.13 1.25 2.62 5.69 9.15 3.63 13.09 6.97 12.58 4.78 8.03 13.49 75.83 4.64 4.94 1.8 4.6 1.05 2.27 5.14 4.05 4.63 4.9 1.77 4.57 1.01 2.25 5.12 4.15 4.35 6.76 2.45 6.42 1.61 3.25 7.07 23.42 4.96 4.85 1.95 4.47 1.14 2.3 5.21 6.55 5.27 5.06 2.09 4.68 1.19 2.62 5.4 6.42 5.09 4.98 1.87 4.67 1.06 2.33 5.29 6.83 4.77 4.76 1.91 4.48 0.94 2.35 5.04 5.49 5.16 5.26 2.19 5.01 1.35 2.56 5.5 5.03 4.85 4.98 1.88 4.7 1.09 2.4 5.27 5.27 5.1 4.84 2.06 4.57 1.26 2.43 5.08 5.01 3 8.73 3.74 8.2 2.42 4.38 9.06 54.94 2.55 18.56 11.08 18.04 8.21 12.28 19.12 94.83 Table 2 (continued). Power of exogeneity tests at nominal level 5% with Cauchy errors; G = 2, T = 50 k2 (a1 , a2 )′ = (−20, 0)′ η 1 = 0 η 1 = .01 η 1 = .5 η 2 = .5 η 2 = .5 η 2 = .5 (a1 , a2 )′ = (−5, 5)′ η 1 = 0 η 1 = .01 η 1 = .5 η 2 = .5 η 2 = .5 η 2 = .5 (a1 , a2 )′ = (0, 0)′ η 1 = 0 η 1 = .01 η 1 = .5 η 2 = .5 η 2 = .5 η 2 = .5 (a1 , a2 )′ = (.5, .2)′ η 1 = 0 η 1 = .01 η 1 = .5 η 2 = .5 η 2 = .5 η 2 = .5 (a1 , a2 )′ = (100, 100)′ η 1 = 0 η 1 = .01 η 1 = .5 η 2 = .5 η 2 = .5 η 2 = .5 23 T1 T2 T3 T4 H1 H2 H3 RH 5 5 5 5 5 5 5 5 5.21 4.51 0.01 4.24 0 0.04 4.84 4.36 4.86 7.01 0.01 6.6 0 0.01 7.31 8.62 4.43 52.88 3.51 52.29 2.22 4.22 53.53 77.57 4.53 22.05 0.72 21.41 0.42 0.89 22.61 37.03 5.03 22.32 0.6 21.56 0.21 0.75 22.8 36.83 4.65 33.84 1.47 33.16 0.95 1.74 34.49 53.81 4.88 4.83 0.01 4.42 0 0.02 5.19 5.32 4.42 4.77 0.01 4.5 0 0.03 5.08 5.34 4.88 5.01 0.04 4.7 0.01 0.05 5.42 5.36 5.31 5.02 0.01 4.64 0 0.02 5.41 5.03 5.16 4.91 0.06 4.63 0.02 0.06 5.24 5.29 5.05 5.52 0.02 5.07 0.01 0.03 5.77 5.46 4.64 77.26 8.81 76.83 6.38 10.02 77.6 96.24 4.66 77.11 9.01 76.71 6.37 10.14 77.53 96.42 5.31 78.14 10.27 77.77 7.38 11.51 78.63 97.63 T1 T2 T3 T4 H1 H2 H3 RH 10 10 10 10 10 10 10 10 4.72 4.53 0.23 4.16 0.08 0.34 4.91 4.94 4.97 6.71 0.49 6.3 0.25 0.75 7.18 9.41 4.34 36.17 10.23 35.3 7.01 11.8 36.81 78.79 4.87 13.87 1.6 13.24 0.9 2.03 14.51 34.19 5.41 13.91 1.95 13.31 1.04 2.38 14.37 33.03 5.3 17.44 3.09 16.82 1.86 3.62 18.09 45.3 5.2 4.94 0.34 4.65 0.12 0.44 5.17 5.36 5.3 5.01 0.34 4.68 0.19 0.43 5.45 4.98 5.16 5.11 0.27 4.7 0.15 0.35 5.45 5.44 4.89 5.11 0.27 4.77 0.08 0.42 5.37 5.11 4.93 5.14 0.34 4.73 0.09 0.49 5.44 5.01 4.7 5.15 0.31 4.85 0.11 0.41 5.46 5.46 5.07 49.25 16.39 48.54 12.12 18.12 49.86 95.77 4.59 49.57 15.82 48.81 11.64 17.91 50.25 95.26 4.81 52.89 18.7 52.05 13.86 20.72 53.49 97.22 T1 T2 T3 T4 H1 H2 H3 RH 20 20 20 20 20 20 20 20 4.83 4.61 2.04 4.21 1.12 2.44 4.86 6.64 4.39 4.6 1.85 4.34 1.16 2.2 4.93 9.64 2.6 13.11 6.7 12.41 4.61 7.67 13.56 75.85 4.31 6.41 2.6 6.09 1.59 3.04 6.75 18.22 4.21 6.08 2.54 5.79 1.55 3.12 6.36 18.08 3.47 6.78 3 6.48 1.66 3.52 7.23 33.69 4.85 4.65 1.69 4.27 1.01 2.16 4.93 5.31 5.12 4.85 1.99 4.57 1.07 2.48 5.17 5.11 4.67 4.95 1.9 4.73 1.12 2.39 5.26 5.31 4.66 4.56 1.88 4.23 1.08 2.26 4.85 4.38 4.85 4.7 2 4.4 1.15 2.41 4.97 4.64 5.05 5.13 2.23 4.8 1.35 2.76 5.5 4.93 2.26 18.38 11.17 17.78 8.44 12.37 19.04 94.4 2.19 17.85 10.59 17.22 7.93 11.67 18.46 94.26 1.79 18.44 10.62 17.83 7.45 12.04 18.95 96.09 8.2. Exact Monte Carlo exogeneity (MCE) tests We now evaluate the empirical performance of the Monte Carlo tests described in the previous section. To do that, we use the same DGP introduces in Section 8 in addition with a Student typedistribution for model errors. We generate J = 10000 samples of size T = 50 following this DGP. For each sample r = 1, . . . , J, a replication Wr of the statistic W ∈ {Hi , Hl , RH , i = 1, 2, 3 ; l = 1, 2, 3, 4 } is computed from the simulated sample. In addition, for each sample r, we draw N = 99 realization of the statistic, namely, W j∗ , j = 1, . . . , 99, following the same DGP as above. We then compute the Monte Carlo test p-value as p̂r (Wr ) = ∗ ∑99 j=1 1(W j ≥ Wr ) + 1 100 (8.8) and estimate the rejection probability (RP) of the test as the proportion of the p̂r (Wr ) that are less than α , the nominal level. This yields the following estimate of the RP of the Monte Carlo test: c MC = RP 1 10000 ∑ 1[ p̂r (Wr ) < α ]. 10000 r=1 (8.9) Tables 3- 5 present the results. We note that all Monte Carlo tests now have approximately correct size, even when identification is weak. So, size adjustment of all standard DWH tests is feasible, unlike the conclusion of Staiger and Stock (1997). Furthermore, compared to the standard tests, all tests power has improved slightly even when instruments are weak in both Cauchy and Student distributions setups [see Tables 4-5]. But the Monte Carlo tests still exhibit low power when all instruments are weak. In addition, the Monte Carlo tests seem to have more power when errors have Gaussian and Student distributions than when their distributions are Cauchy-type. Overall, our results clearly suggest that finite-sample improvement of standard exogeneity tests is feasible, whether model errors are Gaussian or not, and identification is strong or weak. Hence, the view that size adjustment is infeasible for some versions of DWH tests [see for example, Staiger and Stock (1997)] is questionable. 24 Table 3 . Power of MCE tests with Gaussian errors k2 (a1 , a2 )′ = (−20, 0)′ η 1 = 0 η 1 = .01 η 1 = .5 η2 = 0 η2 = 0 η2 = 0 (a1 , a2 )′ = (−5, 5)′ η 1 = 0 η 1 = .01 η 1 = .5 η2 = 0 η2 = 0 η2 = 0 (a1 , a2 )′ = (0, 0)′ η 1 = 0 η 1 = .01 η 1 = .5 η2 = 0 η2 = 0 η2 = 0 (a1 , a2 )′ = (.5, .2)′ η 1 = 0 η 1 = .01 η 1 = .5 η 2 = .5 η 2 = .5 η 2 = .5 (a1 , a2 )′ = (100, 100)′ η 1 = 0 η 1 = .01 η 1 = .5 η 2 = .5 η 2 = .5 η 2 = .5 25 T1 T2 T3 T4 H1 H2 H3 RH 5 5 5 5 5 5 5 5 5 5 3 5 4 4 5 5 7 8 9 8 9 9 8 18 30 100 93 100 91 93 100 100 4 5 3 5 3 3 5 5 4 6 4 5 4 4 6 6 41 99 95 99 94 95 99 100 4 5 4 5 4 4 5 5 5 4 3 5 4 5 5 5 5 5 4 5 4 5 5 5 12 13 12 13 12 12 13 13 9 7 8 7 9 8 7 9 13 23 22 23 22 22 23 23 77 100 91 100 91 91 100 100 78 100 94 100 93 94 100 100 81 100 89 100 89 89 100 100 T1 T2 T3 T4 H1 H2 H3 RH 10 10 10 10 10 10 10 10 4 5 4 5 3 4 5 5 7 8 5 8 5 5 8 16 55 99 99 99 99 99 99 100 4 5 4 5 3 4 5 5 3 7 3 7 3 3 7 6 50 99 99 99 98 99 99 100 4 5 3 5 3 4 5 4 5 5 4 5 4 3 5 5 5 5 3 5 3 4 5 4 6 6 5 6 6 5 6 6 5 8 7 8 7 7 8 13 17 23 20 23 21 22 23 24 97 99 99 99 99 99 99 100 95 100 98 100 98 98 100 100 95 98 97 98 96 97 98 100 T1 T2 T3 T4 H1 H2 H3 RH 20 20 20 20 20 20 20 20 5 5 4 5 3 3 4 5 6 7 7 7 6 7 7 6 33 80 82 80 81 82 80 100 4 5 3 5 4 3 5 4 5 7 4 6 4 4 6 7 68 99 99 99 99 99 99 100 4 5 3 4 3 3 5 3 5 4 3 5 3 3 5 5 4 5 3 4 3 3 5 4 6 6 5 6 6 5 6 12 7 8 8 8 7 8 8 13 10 12 10 12 10 11 12 12 88 92 94 92 94 94 92 100 83 84 88 84 89 88 84 100 90 93 93 93 93 93 93 100 Table 4 . Power of MCE tests with Cauchy errors k2 (a1 , a2 )′ = (−20, 0)′ η 1 = 0 η 1 = .01 η 1 = .5 η2 = 0 η2 = 0 η2 = 0 (a1 , a2 )′ = (−5, 5)′ η 1 = 0 η 1 = .01 η 1 = .5 η2 = 0 η2 = 0 η2 = 0 (a1 , a2 )′ = (0, 0)′ η 1 = 0 η 1 = .01 η 1 = .5 η2 = 0 η2 = 0 η2 = 0 (a1 , a2 )′ = (.5, .2)′ η 1 = 0 η 1 = .01 η 1 = .5 η 2 = .5 η 2 = .5 η 2 = .5 (a1 , a2 )′ = (100, 100)′ η 1 = 0 η 1 = .01 η 1 = .5 η 2 = .5 η 2 = .5 η 2 = .5 26 T1 T2 T3 T4 H1 H2 H3 RH 5 5 5 5 5 5 5 5 4 5 4 5 4 4 5 5 7 8 5 8 5 5 8 7 27 62 38 62 38 38 62 82 4 5 5 5 4 4 5 5 9 6 5 6 5 5 6 5 17 32 20 32 20 20 32 47 2 5 5 5 5 5 5 5 4 5 3 5 5 3 5 5 3 4 3 4 3 3 4 5 6 9 7 9 7 7 9 8 5 6 4 6 3 4 6 6 5 5 4 5 4 4 5 6 38 76 50 76 47 50 76 95 41 81 50 81 51 50 81 96 40 78 50 78 49 50 78 98 T1 T2 T3 T4 H1 H2 H3 RH 10 10 10 10 10 10 10 10 3 5 5 5 4 5 5 5 2 3 5 3 4 5 3 7 24 37 34 37 34 34 37 82 5 5 4 5 4 4 5 4 8 6 3 6 3 3 6 5 10 15 12 15 14 12 15 37 3 4 4 4 4 4 4 4 5 3 2 3 2 2 3 4 4 5 4 5 4 4 5 5 4 6 4 6 4 4 6 7 6 9 7 9 7 7 9 10 5 6 5 4 4 5 6 7 34 55 43 55 43 43 55 96 36 54 49 54 52 49 54 96 37 51 44 51 41 44 51 93 T1 T2 T3 T4 H1 H2 H3 RH 20 20 20 20 20 20 20 20 5 5 5 5 5 5 5 4 6 9 8 9 8 8 9 6 11 15 13 15 15 13 15 73 4 5 5 5 4 5 5 5 7 7 8 7 8 8 7 10 6 8 8 8 8 8 8 31 4 5 4 5 3 4 5 5 4 5 5 5 5 5 5 4 4 4 4 4 4 4 4 5 6 6 5 6 4 5 6 5 6 5 5 5 5 5 5 7 4 5 5 5 5 5 5 8 9 14 15 14 14 15 14 98 12 19 17 19 17 17 19 94 10 12 11 12 12 11 12 98 Table 5 . Power of MCE tests with Student errors k2 (a1 , a2 )′ = (−20, 0)′ η 1 = 0 η 1 = .01 η 1 = .5 η2 = 0 η2 = 0 η2 = 0 (a1 , a2 )′ = (−5, 5)′ η 1 = 0 η 1 = .01 η 1 = .5 η2 = 0 η2 = 0 η2 = 0 (a1 , a2 )′ = (0, 0)′ η 1 = 0 η 1 = .01 η 1 = .5 η2 = 0 η2 = 0 η2 = 0 (a1 , a2 )′ = (.5, .2)′ η 1 = 0 η 1 = .01 η 1 = .5 η 2 = .5 η 2 = .5 η 2 = .5 (a1 , a2 )′ = (100, 100)′ η 1 = 0 η 1 = .01 η 1 = .5 η 2 = .5 η 2 = .5 η 2 = .5 27 T1 T2 T3 T4 H1 H2 H3 RH 5 5 5 5 5 5 5 5 3 5 4 5 3 4 5 5 6 7 4 7 4 5 5 7 24 97 86 97 86 86 97 100 6 4 6 4 6 6 4 4 7 5 6 5 6 6 5 8 33 92 72 92 71 72 92 100 5 5 4 5 3 4 5 5 4 4 3 5 3 3 4 5 4 5 3 5 4 3 4 4 7 10 10 10 10 10 10 12 7 10 8 10 8 8 10 12 9 10 6 10 6 6 10 10 75 98 89 98 87 89 98 100 68 100 82 100 82 82 100 100 63 99 90 99 90 90 99 100 T1 T2 T3 T4 H1 H2 H3 RH 10 10 10 10 10 10 10 10 2 4 4 4 3 4 4 3 6 7 5 7 6 6 7 7 18 90 83 90 82 83 90 100 3 5 5 5 5 5 5 5 5 5 2 5 2 2 5 8 29 82 77 82 78 77 82 99 4 5 5 5 4 3 5 5 5 5 4 3 3 3 5 4 4 5 3 5 3 4 5 4 5 6 6 6 6 6 6 6 2 4 3 4 3 3 4 7 4 8 8 8 8 8 8 10 52 90 84 90 82 84 90 100 62 90 86 90 86 86 90 100 59 93 88 93 88 88 93 100 T1 T2 T3 T4 H1 H2 H3 RH 20 20 20 20 20 20 20 20 4 5 3 5 4 3 5 5 6 8 6 8 5 6 8 7 15 65 65 65 65 65 65 100 4 5 4 5 4 4 5 5 10 10 11 10 11 11 10 14 15 44 48 44 48 48 44 98 5 5 4 5 3 4 5 4 4 5 4 5 3 4 5 4 5 5 3 4 2 3 5 5 5 5 5 5 5 5 5 5 4 6 7 6 7 7 6 9 4 6 4 6 4 4 6 5 50 64 64 64 64 64 64 100 51 60 61 60 61 61 60 100 55 73 74 73 74 74 73 100 9. Conclusion This paper develops a finite-sample analysis of the distribution of the standard Durbin-Wu-Hausman and Revankar-Hartley specification tests under both the null hypothesis of exogeneity (level) and the alternative hypothesis of endogeneity (power), with or without identification. Our analysis provides several new insights and extensions of earlier procedures. The characterization of the finite-sample distributions of the statistics under the null hypothesis shows that all tests are typically robust to weak instruments (level is controlled). We provide a characterization of the power of the tests that clearly exhibits the factors that determine power. We show that exogeneity tests have no power in the extreme case where all IVs are weak [similar to Staiger and Stock (1997), and Guggenberger (2010)], but do have power as soon as we have one strong instrument. As a result, exogeneity tests can detect an exogeneity problem even if not all model parameters are identified, provided partial identification holds. Moreover, the finite-sample characterization of the distributions of the tests allows the construction of exact identification-robust exogeneity tests even in cases where conventional asymptotic theory breaks down. In particular, DWH and RH tests are valid even if the distribution of the errors does not have moments (Cauchy-type distribution, for example). We present a Monte Carlo experiment which confirms our finite-sample theory. The large-sample properties of the tests and estimation issues related to pretesting are examined in Doko Tchatoka and Dufour (2011). 28 APPENDIX A. Notes A.1. Unified formulation of DWH test statistics We establish the unified formulation of Durbin-Wu statistics in (4.1) - (3.11), as well as the three versions of Hausman (1978) statistic. From Wu (1973, Eqs. (2.1), (2.18), (3.16), (3.20)), Tl , l = 1, 2, 3, 4 are defined as T1 = κ 1 Q∗ /Q1 , T2 = κ 2 Q∗ /Q2 , T1 = κ 3 Q∗ /Q3 , T2 = κ 4 Q∗ /Q4 , −1 Q∗ = (b1 − b2 )′ (Y ′ A2Y )−1 − (Y ′ A1Y )−1 (b1 − b2 ), ′ (A.1) (A.2) ∗ Q1 = (y −Y b2 ) A2 (y −Y b2 ), Q2 = Q4 − Q , ′ (A.3) ′ Q4 = (y −Y b1 ) A1 (y −Y b1 ), Q3 = (y −Y b2 ) A1 (y −Y b2 ), (A.4) bi = (Y ′ AiY )−1Y ′ Ai y, i = 1, 2, A1 = M1 , A2 = M − M1 , (A.5) where b1 is the ordinary least squares estimator of β , and b2 is the instrumental variables method estimator of β . So, from our notations, b1 ≡ β̂ and b2 ≡ β̃ . So, from (3.9) - (3.11), we have Q∗ = T (β̃ − β̂ )′ ∆ˆ −1 (β̃ − β̂ ) = T σ̃ 2 (β̃ − β̂ )′ Σ̂2−1 (β̃ − β̂ ), Q1 = T σ̃ 21 , Q3 = T σ̃ 2 , (A.6) Q4 = T σ̂ 2 , (A.7) Q2 = Q4 − Q∗ = T σ̂ 2 − T (β̃ − β̂ )′ ∆ˆ −1 (β̃ − β̂ ) = T σ̃ 22 (A.8) so that Tl , can be expressed as: Tl = κ l (β̃ − β̂ )′ Σ̃l−1 (β̃ − β̂ ) , l = 1, 2, 3, 4 , (A.9) where κ l , and Σ̃l are defined in (4.1) - (3.11). The formulation in (A.9) shows clearly the link between Wu (1973) tests and Hausman (1978) test. A.2. Regression interpretation of DWH test statistics Consider equations (3.3) - (3.6). First, we note that H0 and Hb can be written as H0 : Rθ = 0 ⇔ Rb = a , Hb : R∗ θ ∗ = 0 ⇔ R∗ θ ∗ = β − a , where R = ′ h 0 0 IG ′ i and R∗ = h IG 0 −IG i ′ . By definition, we have θ̂ ∗ = [β̃ , γ̃ ′ , b̃′ ]′ and θ̂ ∗0 = [β̂ , γ̂ ′ , β̂ ]′ , where β̃ and γ̃ are the 2SLS estimators of β and γ and β̂ and γ̂ are the OLS 29 estimators of β and γ based on the following model: y = Y β + Z1 γ + u, Ŷ = Z Π̂ , with Π̂ = (Z ′ Z)−1 Z ′Y. So, we can observe that −1 θ̂ ∗0 = θ̂ ∗ + (X̂ ′ X̂)−1 R′∗ R∗ (X̂ ′ X̂)−1 R′∗ (−R∗ θ̂ ∗ ) −1 (R∗ θ̂ ∗ ). S(θ̂ ∗0 ) − S(θ̂ ∗ ) = (θ̂ ∗0 − θ̂ ∗ )′ X̂ ′ X̂(θ̂ ∗0 − θ̂ ∗ ) = (R∗ θ̂ ∗ )′ R∗ (X̂ ′ X̂)−1 R′∗ Furthermore, we have R∗ θ̂ ′ X̂ X̂ (X̂1′ X̂1 )−1 = h IG 0 −IG " i β̃ γ̃ = β̃ − b̃ , b̃ # " (X̂1′ X̂1 ) 0 (X̂1′ X̂1 )−1 0 ′ −1 = , ( X̂ X̂) = 0 (V̂ ′V̂ ) 0 (V̂ ′V̂ )−1 # #−1 " " M11 M12 Ŷ ′Ŷ Ŷ ′ Z1 , = = ′ ′ M21 M22 Z1Ŷ Z1 Z1 # , −1 −1 ′ = [Y ′ (M1 − M)Y ]−1 . So, = Ŷ M1Ŷ where M11 = (Ŷ ′Ŷ ) − Ŷ ′ Z1 (Z1′ Z1 )−1 Z1′ Ŷ (X̂ ′ X̂)−1 R′∗ M11 M12 0 IG M11 = M21 M22 0 M21 0 = ′ −1 ′ −1 0 0 (V̂ V̂ ) −IG −(V̂ V̂ ) R∗ (X̂ ′ X̂)−1 R′∗ = M11 + (V̂ ′V̂ )−1 M11 β̂ − β̃ ′ −1 −1 (b̃ − β̃ ). θ̂ ∗0 − θ̂ ∗ = γ̂ − γ̃ = M21 M11 + (V̂ V̂ ) ′ −1 β̂ − b̃ −(V̂ V̂ ) Hence, we get −1 −1 (b̃ − β̃ ) = M11 M11 + (V̂ ′V̂ )−1 ã , β̂ − β̃ = M11 M11 + (V̂ ′V̂ )−1 where ã = b̃ − β̃ is the OLS estimate of a from (3.4). We see from (??) that ã = b̃ − β̃ So, we have −1 M11 + (V̂ ′V̂ )−1 M11 (β̂ − β̃ ) ′ = [Y (M1 − M)Y ]−1 + (V̂ ′V̂ )−1 [Y ′ (M1 − M)Y ](β̂ − β̃ ) . = −1 (R∗ θ̂ ∗ ) S(θ̂ ∗0 ) − S(θ̂ ∗ ) = (R∗ θ̂ ∗ )′ R∗ (X̂ ′ X̂)−1 R′∗ 30 (A.10) −1 = (b̃ − β̃ )′ [Y ′ (M1 − M)Y ]−1 + (V̂ ′V̂ )−1 (b̃ − β̃ ) ′ ′ ′ −1 = (β̂ − β̃ ) [Y (M1 − M)Y ] [Y (M1 − M)Y ] + (V̂ ′V̂ )−1 × −1 −1 M11 + (Y ′ MY )−1 M11 (β̂ − β̃ ) [Y ′ (M1 − M)Y ](β̂ − β̃ ) = (β̂ − β̃ )′ M11 −1 −1 −1 −1 = (β̂ − β̃ )′ M11 (A.11) M11 + (Y ′ M1Y − M11 ) M11 (β̂ − β̃ ) . Now, we can apply the following lemma which proof is straightforward and then, is omitted. Lemma A.1 Let A and B be two nonsingular r × r matrices. Then A−1 − B−1 = B−1 (B − A)A−1 = A−1 (B − A)B−1 = A−1 (A − AB−1 A)A−1 = B−1 (BA−1 B − B)B−1 . Furthermore, if B − A is nonsingular, then A−1 − B−1 is nonsingular with (A−1 − B−1 )−1 = A(B − A)−1 B = A + A(B − A)−1 A = A[A−1 + (B − A)−1 ]A = B(B − A)−1 A = B(B − A)−1 B − B = B[(B − A)−1 − B−1 ]B = A(A − AB−1 A)−1 A = B(BA−1 B − B)−1 B. −1 By setting A = M11 and B = Y ′ M1Y in (A.11), and applying Lemma A.1, we get −1 −1 −1 −1 M11 + (Y ′ M1Y − M11 ) M11 (β̂ − β̃ ) S(θ̂ ∗0 ) − S(θ̂ ∗ ) = (β̂ − β̃ )′ M11 = (β̂ − β̃ )′ A A−1 + (B − A)−1 A(β̂ − β̃ ) = (β̂ − β̃ )′ (B−1 − A−1 )−1 (β̂ − β̃ ) = (β̂ − β̃ )′ {[Y ′ (M1 − M)Y ]−1 − (Y ′ M1Y )−1 }−1 (β̂ − β̃ ) 1 1 −1 −1 −1 (β̃ − β̂ )′ [Ω̂IV = − Ω̂LS ] (β̃ − β̂ ) = (β̃ − β̂ )′ ∆ˆ −1 (β̃ − β̂ ) , T T (A.12) where Ω̂IV = T1 Y ′ (M1 − M)Y and Ω̂LS = T1 Y ′ M1Y. Note also that S(θ̂ ∗0 ) − S(θ̂ ∗ ) = S(θ̂ 0 ) − S(θ̂ ) = ã′ [V̂ ′ MX V̂ ]ã , (A.13) where MX = I − PX = I − X(X ′ X)−1 X ′ , X = [Y, Z1 , V̂ ]. Moreover, from (3.12), we have 0 S(θ̂ ) = T σ̃ 22 , S(θ̂ 0 ) = T σ̂ 2 , S∗ (θ̂ ∗ ) = T σ̃ 2 . (A.14) Hence, except for H1 , the other statistics can be expressed as: 0 H2 = T [S(θ̂ 0 ) − S(θ̂ )]/S∗ (θ̂ ∗ ), H3 = T [S(θ̂ 0 ) − S(θ̂ )]/S(θ̂ 0 ) , T1 = 0 κ 1 [S(θ̂ 0 ) − S(θ̂ )]/[S∗ (θ̂ ∗ ) − Se (θ̂ )], 31 T2 = κ 2 [S(θ̂ 0 ) − S(θ̂ )]/S(θ̂ ) , (A.15) (A.16) 0 T3 = κ 3 [S(θ̂ 0 ) − S(θ̂ )]/S∗ (θ̂ ∗ ), T4 = κ 4 [S(θ̂ 0 ) − S(θ̂ )]/S(θ̂ 0 ) , RH = κ [S̄(θ̄ˆ ) − S̄(θ̄ˆ )]/S̄(θ̄ˆ ) , R 0 0 (A.17) (A.18) Equations (A.15) - (A.18) are the regression interpretation of DWH and RH statistics. B. Proofs P ROOF OF L EMMA 6.1 β̃ Note first that = β + [Y ′ (M1 − M)Y ]−1Y ′ (M1 − M)u = β + Ā1 u, Ā1 = [Y ′ (M1 − M)Y ]−1Y ′ (M1 − M) , β̂ β̃ − β̂ = β + (Y ′ M1Y )−1Y ′ M1 u = β + A1 u, (B.1) A1 = (Y ′ M1Y )−1Y ′ M1 = (Ā1 − A1 )u, (β̃ − β̂ )′ ∆ˆ −1 (β̃ − β̂ ) = u′C0 u , (B.2) (B.3) with C0 = (Ā1 − A1 )′ ∆ˆ −1 (Ā1 − A1 ). We also have M1 (y −Y β̃ ) = B̄1 u , (B.4) M(y −Y β̃ ) = Mu − MY Ā1 u = Mu − MM1Y Ā1 u = MM(M1 −M)Y u , (B.5) where B̄1 = M1 − P(M1 −M)Y = M1 (I − P(M1 −M)Y ) = M1 M(M1 −M)Y , and σ̃ 2 = 1 ′ u M1 M(M1 −M)Y u = u′ D̄1 u, T σ̃ 21 = σ̃ 2 − σ̂ 2 = u′ (D̄1 − D1 )u = σ̃ 22 = σ̂ 2 = 1 ′ u M1 MM1Y u = u′ D1 u , T 1 ′ u (M1 − M)M(M1 −M)Y u , T 1 ′ u M1 MM1Y u − u′C0 u = u′ (D1 −C0 )u . T (B.6) (B.7) (B.8) Now, from (B.1) - (B.8) and the definitions of the statistics, we get: H2 = Tu′C0 u/u′ D̄1 u = T (u/σ u )′C0 (u/σ u )/(u/σ u )′ D̄1 (u/σ u ) , (B.9) H3 = Tu C0 u/u D1 u = T (u/σ u ) C0 (u/σ u )/(u/σ u ) D1 (u/σ u ) , (B.10) T1 = κ 1 (u/σ u )′C0 (u/σ u )/(u/σ u )′ (D̄1 − D1 )(u/σ u ), (B.11) T2 = κ 2 (u/σ u )′C0 (u/σ u )/(u/σ u )′ (D1 −C0 )(u/σ u ) , (B.12) T3 = κ 3 (u/σ u )′C0 (u/σ u )/(u/σ u )′ D̄1 (u/σ u ), (B.13) T4 = κ 4 (u/σ u )′C0 (u/σ u )/(u/σ u )′ D1 (u/σ u ) . (B.14) ′ ′ ′ ′ Under H0 , Y is independent of u, and if further the instruments Z are exogenous, the conditional distribution, given X̄ of all statistics in (B.9) - (B.14) depend only on the distribution of u/σ u , irrespective of whether identification is strong or weak. The same result holds for H1 . By observing 32 that 1 T (MX1 − MX̄ ) = PD1 Z2 , RH can also be expressed as: RH = κ R (u/σ u )′ PD1 Z2 (u/σ u )/k2 /(u/σ u )′ (D1 − PD1 Z2 )(u/σ u ) . (B.15) Thus, under H0 , the distribution of RH , given X̄, only depends on u/σ u , whether Rank(Π2 ) = G or not. P ROOF OF L EMMA 6.1 Consider the identities expressing Hi , i = 1, 2, 3, Tl , l = 1, 2, 3, 4, and RH in (B.9) - (B.15). Under H1 , we have u = Va + ε and the results of Theorem 5.2 follow. P ROOF OF L EMMA 6.1 Suppose that a ∈ N (Cπ ). Then, we can show that (Ā1 − A1 )Va = 0, C0Va = 0, D̄1Va = 0, D1Va = 0, MX1 Va = D1Va = 0, MX̄ Va = D1Va − PD1 Z2 Va = 0 , (B.16) (B.17) where Ā1 , A1 , C0 , D̄1 , and D1 are defined in (B.1) - (B.8) and (B.15). To simplify, let us prove that (Ā1 − A1 )Va = 0. First, note that V = Y − Z1 Π1 − Z2 Π2 so that (Ā1 − A1 )Va = [Ā1Y − A1Y − (Ā1 − A1 )(Z1 Π1 + Z2 Π2 )]a. Since Ā1Y = IG = A1Y, hence we have (Ā1 − A1 )Va = −[(Ā1 − A1 )(Z1 Π1 + Z2 Π2 )]a = −(Ā1 − A1 )Z2 Π2 a , (B.18) because Ā1 Z1 = A1 Z1 = 0. Now, we observe that (M1 − M)Z2 = M1 Z2 , hence (Ā1 − A1 )Z2 Π2 a = −1 −1 (Ω̂IV )M1 Z2 Π2 a, which equals zero if and only M1 Z2 Π2 a = 0, i.e. Π2′ Z2′ M1 Z2 Π2 a or equiv− Ω̂LS alently, a ∈ N (Cπ ). So, we have a ∈ N (Cπ ) if and only if (Ā1 − A1 )Va = 0. The proof is similar for the other identities in (B.16)-(B.17). Thus by substituting these identities in Theorem 5.2, we get the results of Corollary 5.3. Suppose now that (5.13)-(5.17) hold. It is easy to see from Theorem 5.2 that this equivalent to (Ā1 − A1 )Va = 0, C0Va = 0, D̄1Va = 0, D1Va = 0, PD1 Z2 Va = 0 (B.19) with probability 1. However, we know that (B.19) holds if and only if a ∈ N (Cπ ). Hence the result follows. P ROOF OF L EMMA 6.1 To simplify the proof, let us focus on H3 . We recall that H3 = T (β̃ − β̂ )′ Σ̂3−1 (β̃ − β̂ ) , (B.20) where β̂ = (Y ′ M1Y )−1Y ′ M1 y, β̃ = [Y ′ (M1 − M)Y ]−1Y ′ (M1 − M)y, Σ̂3 = σ̂ 2 [(Y ′ (M1 − M)Y /T )−1 − (Y ′ M1Y /T )−1 ], and σ̂ 2 = (y − Y β̂ )′ M1 (y − Y β̂ )/T . Let us replace y and Y by y∗ = yR11 + Y R21 and Y∗ = Y R22 in (B.20). Then, we get: H3∗ = T (β̃ ∗ − β̂ ∗ )′ Σ̂∗−1 3 (β̃ ∗ − β̂ ∗ ) 33 (B.21) where β̂ ∗ , β̃ ∗ , Σ̂3∗ , and σ̂ 2∗ are also obtained by replacing y by y∗ = yR11 + Y R21 and Y by Y∗ = Y R22 . Now, we have: Y∗′ M1Y∗ = R′22Y ′ M1Y R22 = R′22Y ′ M1Y R22 , Y∗′ M1 y∗ = R′22 (Y ′ M1 yR11 +Y ′ M1Y R21 ) (B.22) so that we get: ′ ′ −1 −1 ′ ′ ′ −1 β̂ ∗ = R−1 22 (Y M1Y ) (R22 ) R22 (Y M1 yR11 +Y M1Y R21 ) = R22 (β̂ R11 + R21 ) , −1 β̃ ∗ = (Y∗′ (M1 − M)Y∗ )−1Y∗′ (M1 − M)y∗ = R−1 22 (β̃ R11 + R21 ), β̃ ∗ − β̂ ∗ = R22 (β̃ − β̂ )R11 . Furthermore, we also have ′ ′ −1 (Y∗′ (M1 − M)Y∗ /T )−1 − (Y∗′ M1Y∗ /T )−1 = R−1 − (Y ′ M1Y /T )−1 (R−1 22 (Y (M1 − M)Y /T ) 22 ) , and, since R11 > 0, we get −1 (β̃ ∗ − β̂ ∗ )′ (Y∗′ (M1 − M)Y∗ /T )−1 − (Y∗′ M1Y∗ /T )−1 (β̃ ∗ − β̂ ∗ ) ′ ′ −1 ′ −1 −1 2 (β̃ − β̂ ) . = R11 (β̃ − β̂ ) (Y (M1 − M)Y /T ) − (Y M1Y /T ) By the same way, we find −1 ′ y∗ − Ȳ β̂ ∗ = yR11 +Y R22 −Y R22 R22Y ′ (M1 − M)Y R22 Y R22 M1 (yR11 +Y R22 ) = yR11 +Y R22 −Y β̂ R11 −Y R22 = (y −Y β̂ )R11 . σ̂ 2∗ = (y∗ − Ȳ β̂ ∗ )′ M1 (y∗ − Ȳ β̂ ∗ )/T = R211 (y −Y β̂ )′ M1 (y − Ȳ β̂ )/T = R211 σ̂ 2 . Hence, from (B.21), we can see that −1 H3∗ = T R211 (β̃ − β̂ )′ (Y∗′ (M1 − M)Y∗ /T )−1 − (Y∗′ M1Y∗ /T )−1 (β̃ − β̂ )/R211 σ̂ 2 h i−1 = T (β̃ − β̂ )′ σ̂ 2 (Y∗′ (M1 − M)Y∗ /T )−1 − σ̂ 2 (Y∗′ M1Y∗ /T )−1 (β̃ − β̂ ) = H3 (B.23) and the same invariance holds for the author statistics so that Lemma 6.1 follows. P ROOF OF T HEOREM 6.2 Lemma 6.1, we can write: Let us replace y by ȳ and Y by Ȳ in the expressions of the statistics. By Hi = T (β̃ ∗ − β̂ ∗ )′ Σ̂i∗−1 (β̃ ∗ − β̂ ∗ ) , i = 1, 2, 3 , Tl = RH = −1 (β̃ ∗ − β̂ ∗ ) , κ l (β̃ ∗ − β̂ ∗ ) Σ̃l∗ ′ κ R ȳ Σ̂∗R ȳ , ′ l = 1, 2, 3, 4 , (B.24) (B.25) (B.26) where β̂ ∗ , β̃ ∗ , Σ̂∗i , Σ̃∗l and Σ̂∗R are the correspondents of β̂ , β̃ , Σ̂i and Σ̃l defined in (4.2)-(3.11). 34 From (6.40) and by observing that and MZ2 = 0, we have M ȳ = M v̄ , MȲ = MV̄ , M1 ȳ = M1 (µ 1 + v̄) , M1Ȳ = M1 (µ 2 + V̄ ) , −1 where µ 1 = M1 Z2 Π2 ζ = µ 2 P22 ζ and µ 2 = M1 Z2 Π2 P22 , where ζ = β P11 + P21 . From (??), we get: Ȳ ′ (M1 − M)ȳ = (µ 2 + V̄ )′ (M1 − M)(µ 1 + v̄), Ȳ ′ M1 ȳ = (µ 2 + V̄ )′ M1 (µ 1 + v̄), Ȳ ′ M1Ȳ Ȳ ′ (M1 − M)Ȳ (B.27) = (µ 2 + V̄ )′ M1 (µ 2 + V̄ ) = ΩLS (µ 2 , V̄ ) , (B.28) = (µ 2 + V̄ )′ (M1 − M)(µ 2 + V̄ ) = ΩIV (µ 2 , V̄ ) , (B.29) so that β̂ ∗ = ΩLS (µ 2 , V̄ )−1 (µ 2 + V̄ )′ M1 (µ 1 + v̄), β̃ ∗ = ΩIV (µ 2 , V̄ )−1 (µ 2 + V̄ )′ (M1 − M)(µ 1 + v̄), and β̃ ∗ − β̂ ∗ = C(µ 1 + v̄), where C = ΩIV (µ 2 , V̄ )−1 (µ 2 + V̄ )′ (M1 − M) − ΩLS (µ 2 , V̄ )−1 (µ 2 + V̄ )′ M1 . Moreover, we have σ̂ 2∗ = T1 (ȳ − Ȳ β̂ ∗ )′ M1 (ȳ − Ȳ β̂ ∗ ) = T1 (µ 1 + v̄)′C∗ (µ 1 + v̄) = 1 2 σ̃ 2∗ = T1 (ȳ − Ȳ β̃ ∗ )′ M1 (ȳ − Ȳ β̃ ∗ ) = T1 (µ 1 + v̄)′ D′∗ D∗ (µ 1 + v̄) = T ω LS ( µ 1 , µ 2 , V̄ , v̄), 1 2 ω IV (µ 1 , µ 2 , V̄ , v̄), with C∗ = I − M1 (µ 2 + V̄ )ΩLS (µ 2 , V̄ )−1 (µ 2 + V̄ )′ M1 and D∗ = T I − M1 (µ 2 + V̄ )ΩIV (µ 2 , V̄ )−1 (µ 2 + V̄ )′ (M1 − M) M1 . Hence, we get Σ̂1∗ = ω 2IV (µ 1 , µ 2 , V̄ , v̄)ΩIV (µ 2 , V̄ )−1 − ω 2LS (µ 1 , µ 2 , V̄ , v̄)ΩLS (µ 2 , V̄ )−1 , 1 2 1 Σ̂2∗ = ω IV (µ 1 , µ 2 , V̄ , v̄)∆ , Σ̂3∗ = ω 2LS (µ 1 , µ 2 , V̄ , v̄)∆ , T T (B.30) where ∆ = C′C = ΩIV (µ 2 , V̄ )−1 − ΩLS (µ 2 , V̄ )−1 . If T − k1 − k2 > G, then ∆ > 0, thus Hi = T [µ 1 + v̄]′Γi (µ 1 , µ 2 , v̄, V̄ )[µ 1 + v̄] , i = 1, 2, 3 . where Γi (µ 1 , µ 2 , v̄, V̄ ), i = 1, 2, 3 are defined in Theorem 6.2. Since T4 = (κ 4 /T )H3 , we find T4 = κ 4 [µ 1 + v̄]′Γ3 (µ 1 , µ 2 , v̄, V̄ )[µ 1 + v̄] . (B.31) In addition, σ̃ 2∗2 = σ̂ 2∗ − σ̃ 2∗ (β̄ ∗ − β̃ ∗ )′ (Γ̄2 )−1 (β̄ ∗ − β̃ ∗ ) and σ̃ 2∗2 = ω 2LS (µ 1 , µ 2 , V̄ , v̄) − (µ 1 + v̄)′C′ ∆ −1C(µ 1 + v̄) = (µ 1 + v̄)′ (C∗ −C′ ∆ −1C)(µ 1 + v̄) = ω 22 (µ 1 , µ 2 , V̄ , v̄) ≡ ω 22 , hence, we find T2 = κ2 [µ + v̄]′C′ ∆ −1C[µ 1 + v̄] . ω 22 1 (B.32) In the same way, we also get: Tl = κl [µ + v̄]′C′ ∆ −1C[µ 1 + v̄] , ω 2l 1 l = 1, 3, RH = κR [µ + v̄]′ PD1Z̄2 [µ 1 + v̄] , ω 2R 1 where ω 2l , l = 1, 3 and ω 2R are defined in Section 3. P ROOF OF L EMMA 6.1 Set Π2 a = 0 in the above proof of Theorem 6.2 and Corollary 6.3 follows. 35 P ROOF OF T HEOREM 6.4 From Theorem 6.2, we have Tl = κ l [µ 1 + v̄]′Γ̄l (µ 1 , µ 2 , v̄, V̄ )[µ 1 + v̄], Hi = T [µ 1 + v̄]′Γi (µ 1 , µ 2 , v̄, V̄ )[µ 1 + v̄], = κ R [µ 1 + v̄]′ΓR (µ 1 , µ 2 , v̄, V̄ )[µ 1 + v̄] , RH for all l = 1, 2, 3, 4 and all i = 1, 2, 3, where Γ̄l (µ 1 , µ 2 , V̄ , v̄), Γ̄i (µ 1 , µ 2 , V̄ , v̄), ΓR (µ 1 , µ 2 , V̄ , v̄), µ 1 , µ 2 , κ l and κ R are defined in Section 3. Assume that Z is fixed. Under the normality assumption (6.27), µ 1 + v̄ is independent of V̄ and µ 1 + v̄|Z ∼ N(µ 1 , 1). Since C′ ∆ −1C, is symmetric idempotent of rank G, C and ∆ are defined in Theorem 6.2, we have (µ 1 + v̄)′C′ ∆ −1C(µ 1 + v̄)| V̄ ∼ χ 2 (G, ν 1 ), where ν 1 = µ ′1C′ ∆ −1C µ 1 . By the same way, the denominator of T1 (without the scaling factor) is (µ 1 + v̄)′ E(µ 1 + v̄)| V̄ ∼ χ 2 (k2 − G, υ 1 ), where E defined in Section 3 is symmetric idempotent of rank k2 − G, and with υ 1 = µ ′1 E µ 1 . Furthermore, we have (C′ ∆ −1C)E = 0, hence T1 | V̄ ∼ F(G, k2 − G; ν 1 , υ 1 ). (B.33) T2 | V̄ ∼ F(G, T − k1 − 2G; ν 1 , υ 2 ), (B.34) By the same way, we get: where υ 2 = µ ′1 (C∗ −C′ ∆ −1C)µ 1 . Now, from the notations in Theorem 6.2, we can write: T4 = κ 4 /(1 + 1 ), κ 2 T2 and since T2 | V̄ ∼ F(G, T − k1 − 2G; ν 1 , υ 2 ), we have T4 | V̄ ∼ κ 4 /[1 + 1 T2 | V̄ (B.35) ∼ F(T − k1 − 2G, G; υ 2 , ν 1 ) so that 1 F(T − k1 − 2G, G; υ 2 , ν 1 )]. κ2 (B.36) Note also that ω 2LS ≥ ω 22 entails that T4 | V̄ ≤ κ4 (µ + v̄)′C′ ∆ −1C(µ 1 + v̄)| V̄ = κ̄ ∗2 T2 | V̄ ∼ κ̄ ∗2 F(G, T − k1 − 2G; ν 1 , υ 2 ), ω 22 1 (B.37) where κ 2 , κ 4 , κ̄ ∗2 are given in Theorem 6.4. For T3 , we note that its numerator and denominator are such that (µ 1 + v̄)′C′ ∆ −1C(µ 1 + v̄)| V̄ ∼ χ 2 (G; ν 1 ), ω 2IV = (µ 1 + v̄)′ D′∗ D∗ (µ 1 + v̄) ∼ χ 2 (T − k1 − G; ν 3 ) , (B.38) where ν 3 = µ ′1 D′∗ D∗ µ 1 . Since D′∗ D∗ (C′ ∆ −1C) 6= 0, T3 does not follow necessary a F-distribution. By the same way, we get the results for H2 , H3 and RH . 36 References Anderson, T. W., Rubin, H., 1949. Estimation of the parameters of a single equation in a complete system of stochastic equations. Annals of Mathematical Statistics 20, 46–63. Bekker, P., 1994. Alternative approximations to the distributions of instrumental variable estimators. Econometrica 62, 657–681. Cragg, J., Donald, S., 1993. Testing identifiability and specification in instrumental variables models. Econometric Theory 9, 222–240. Doko Tchatoka, F. , Dufour, J.-M. , 2008. Instrument endogeneity and identification-robust tests: some analytical results. Journal of Statistical Planning and Inference 138(9), 2649–2661. Doko Tchatoka, F., Dufour, J.-M., 2011. Exogeneity tests and estimation in IV regressions. Technical report, Department of Economics, McGill University, Canada Montréal, Canada. Donald, S. G., Newey, W. K., 2001. Choosing the number of instruments. Econometrica 69, 1161– 1191. Dufour, J.-M., 1987. Linear Wald methods for inference on covariances and weak exogeneity tests in structural equations. In: I. B. MacNeill , G. J. Umphrey, eds, Advances in the Statistical Sciences: Festschrift in Honour of Professor V.M. Joshi’s 70th Birthday. Volume III, Time Series and Econometric Modelling. D. Reidel, Dordrecht, The Netherlands, pp. 317–338. Dufour, J.-M., 1997. Some impossibility theorems in econometrics, with applications to structural and dynamic models. Econometrica 65, 1365–1389. Dufour, J.-M. , 2003. Identification, weak instruments and statistical inference in econometrics. Canadian Journal of Economics 36(4), 767–808. Dufour, J.-M. , 2006. Monte carlo tests with nuisance parameters: A general approach to finitesample inference and nonstandard asymptotics in econometrics. Journal of Econometrics 138, 2649–2661. Dufour, J.-M. , Khalaf, L. , 2002. Simulation based finite and large sample tests in multivariate regressions. Journal of Econometrics 111(2), 303–322. Dufour, J.-M., Khalaf, L., Beaulieu, M.-C., 2010. Multivariate residual-based finite-sample tests for serial dependence and GARCH with applications to asset pricing models. Journal of Applied Econometrics 25, 263–285. Dufour, J.-M., Taamouti, M., 2005. Projection-based statistical inference in linear structural models with possibly weak instruments. Econometrica 73(4), 1351–1365. Dufour, J.-M., Taamouti, M., 2007. Further results on projection-based inference in IV regressions with weak, collinear or missing instruments. Journal of Econometrics 139(1), 133–153. 37 Durbin, J., 1954. Errors in variables. Review of the International Statistical Institute 22, 23–32. Guggenberger, P., 2010. The impact of a Hausman pretest on the size of the hypothesis tests. Econometric Theory 156, 337–343. Hahn, J., Ham, J., Moon, H. R., 2010. The Hausman test and weak instruments. Journal of Econometrics 160, 289–299. Hall, A. R., Peixe, F. P. M., 2003. A consistent method for the selection of relevant instruments. Econometric Reviews 2(3), 269–287. Hall, A. R., Rudebusch, G. D., Wilcox, D. W., 1996. Judging instrument relevance in instrumental variables estimation. International Economic Review 37, 283–298. Hansen, L. P., 1982. Large sample properties of generalized method of moments estimators. Econometrica 50, 1029–1054. Hansen, L. P., Heaton, J. , Yaron, A. , 1996. Finite sample properties of some alternative GMM estimators. Journal of Business and Economic Statistics 14, 262–280. Hansen, L. P. , Singleton, K. , 1982. Generalized instrumental variables estimation of nonlinear rational expectations models. Econometrica 50, 1269–1286. Harville, D. A., 1997. Matrix Algebra from a Statistician’s Perspective. Springer-Verlag, New York. Hausman, J., 1978. Specification tests in econometrics. Econometrica 46, 1251–1272. Kleibergen, F. , 2002. Pivotal statistics for testing structural parameters in instrumental variables regression. Econometrica 70(5), 1781–1803. Kleibergen, F., 2005. Testing parameters in GMM without assuming that they are identified. Econometrica 73, 1103–1124. Moreira, M. J. , 2003. A conditional likelihood ratio test for structural models. Econometrica 71(4), 1027–1048. Nelson, C., Startz, R., 1990a. The distribution of the instrumental variable estimator and its t-ratio when the instrument is a poor one. The Journal of Business 63, 125–140. Nelson, C., Startz, R., 1990b. Some further results on the exact small properties of the instrumental variable estimator. Econometrica 58, 967–976. Phillips, P. C. B., 1989. Partially identified econometric models. Econometric Theory 5, 181–240. Rao, T., Mitra, C., eds, 1971. Generalized Inverse of Matrices and its Applications. John Wiley & Sons, Inc. Hoboken, New Jersey. 38 Revankar, N. S., Hartley, M. J., 1973. An independence test and conditional unbiased predictions in the context of simultaneous equation systems. International Economic Review 14, 625–631. Sargan, J., 1983. Identification and lack of identification. Econometrica 51, 1605–1633. Staiger, D., Stock, J. H., 1997. Instrumental variables regression with weak instruments. Econometrica 65(3), 557–586. Stock, J. H., Wright, J. H., 2000. GMM with weak identification. Econometrica 68, 1055–1096. Stock, J. H., Wright, J. H., Yogo, M., 2002. A survey of weak instruments and weak identification in generalized method of moments. Journal of Business and Economic Statistics 20(4), 518–529. Wang, J., Zivot, E., 1998. Inference on structural parameters in instrumental variables regression with weak instruments. Econometrica 66(6), 1389–1404. Wu, D.-M., 1973. Alternative tests of independence between stochastic regressors and disturbances. Econometrica 41, 733–750. 39