On the finite-sample theory of exogeneity tests with possibly

advertisement
On the finite-sample theory of exogeneity tests with possibly
non-Gaussian errors and weak identification ∗
Firmin Doko Tchatoka†
University of Tasmania
Jean-Marie Dufour ‡
McGill University
February 2013
∗ The authors
thank Atsushi Inoue, Marine Carrasco, Jan Kiviet and Benoit Perron for several useful comments. This
work was supported by the William Dow Chair in Political Economy (McGill University), the Canada Research Chair
Program (Chair in Econometrics, Université de Montréal), the Bank of Canada (Research Fellowship), a Guggenheim
Fellowship, a Konrad-Adenauer Fellowship (Alexander-von-Humboldt Foundation, Germany), the Institut de finance
mathématique de Montréal (IFM2), the Canadian Network of Centres of Excellence [program on Mathematics of Information Technology and Complex Systems (MITACS)], the Natural Sciences and Engineering Research Council of
Canada, the Social Sciences and Humanities Research Council of Canada, the Fonds de recherche sur la société et la culture (Québec), and the Fonds de recherche sur la nature et les technologies (Québec), and a Killam Fellowship (Canada
Council for the Arts).
† School of Economics and Finance, University of Tasmania, Private Bag 85, Hobart TAS 7001, Tel: +613 6226
7226, Fax:+61 3 6226 7587; e-mail: Firmin.dokotchatoka@utas.edu.au. Homepage: http://www.fdokotchatoka.com
‡ William Dow Professor of Economics, McGill University, Centre interuniversitaire de recherche en analyse des
organisations (CIRANO), and Centre interuniversitaire de recherche en économie quantitative (CIREQ). Mailing address: Department of Economics, McGill University, Leacock Building, Room 519, 855 Sherbrooke Street West,
Montréal, Québec H3A 2T7, Canada. TEL: (1) 514 398 4400 ext. 09156; FAX: (1) 514 398 4800; e-mail: jeanmarie.dufour@mcgill.ca . Web page: http://www.jeanmariedufour.com
ABSTRACT
We investigate the finite-sample behaviour of the Durbin-Wu-Hausman (DWH) and RevankarHartley (RH) specification tests with or without identification. We consider two setups based on
conditioning upon the fixed instruments and parametric assumptions on the distribution of the errors. Both setups are quite general and account for non-Gaussian errors. Except for a couple of
Wu (1973) tests and the RH-test, finite-sample distributions are not available for the other statistics
[including the most standard Hausman (1978) statistic] even when the errors are Gaussian. In this
paper, we propose an analysis of the distributions of the statistics under both the null hypothesis
(level) and the alternative hypothesis (power). We provide a general characterization of the distributions of the test statistics, which exhibits useful invariance properties and allows one to build
exact tests even for non-Gaussian errors. Provided such finite-sample methods are used, the tests
remain valid (level is controlled) whether the instruments are strong or weak. The characterization
of the distributions of the statistics under the alternative hypothesis clearly exhibits the factors that
determine power. We show that all tests have low power when all instruments are irrelevant (strict
non-identification). But power does exist as soon as there is one strong instrument (despite the fact
overall identification may fail). We present simulation evidence which confirms our finite-sample
theory.
Key words: Exogeneity tests; finite-sample; weak instruments; strict exogeneity; Cholesky error
family; pivotal; identification-robust; exact Monte Carlo exogeneity tests.
JEL classification: C3; C12; C15; C52.
i
Contents
1.
Introduction
1
2.
Framework
3
3.
Notations and definitions
4
4.
Exogeneity test statistics
4.1. Unified presentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.2. Regression interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
6
6
7
5.
Strict exogeneity
5.1. Pivotality under strict exogeneity . . . . . . . . . . . . . . . . . . . . . . . . .
5.2. Power under strict exogeneity . . . . . . . . . . . . . . . . . . . . . . . . . .
8
8
9
6.
Cholesky error families
10
7.
Exact Monte Carlo exogeneity (MCE) tests
15
8.
Simulation experiment
8.1. Standard exogeneity tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8.2. Exact Monte Carlo exogeneity (MCE) tests . . . . . . . . . . . . . . . . . . . .
9.
A.
Conclusion
28
Notes
A.1. Unified formulation of DWH test statistics . . . . . . . . . . . . . . . . . . . .
A.2. Regression interpretation of DWH test statistics . . . . . . . . . . . . . . . . . .
B.
17
18
24
Proofs
29
29
29
32
List of Tables
1
2
3
4
5
Power of exogeneity tests at nominal level 5%; G = 2, T = 50 . . . . . . . . . .
Power of exogeneity tests at nominal level 5% with Cauchy errors; G = 2, T = 50
Power of MCE tests with Gaussian errors . . . . . . . . . . . . . . . . . . . . .
Power of MCE tests with Cauchy errors . . . . . . . . . . . . . . . . . . . . . .
Power of MCE tests with Student errors . . . . . . . . . . . . . . . . . . . . . .
20
22
25
26
27
List of Assumptions, Propositions and Theorems
5.1 Theorem : Finite-sample distributions of exogeneity tests . . . . . . . . . . . . . . .
ii
9
5.2
5.3
6.1
6.2
6.3
6.4
6.5
Theorem : Finite-sample distributions of exogeneity tests
Theorem : Finite-sample distributions of exogeneity tests
Lemma : Cholesky invariance of exogeneity tests . . . .
Theorem : Finite-sample distributions of exogeneity tests
Theorem : Finite-sample distributions of exogeneity tests
Theorem : Finite-sample distributions of exogeneity tests
Theorem : Finite-sample distributions of exogeneity tests
Proof of Lemma 6.1 . . . . . . . . . . . . . . . . .
Proof of Lemma 6.1 . . . . . . . . . . . . . . . . .
Proof of Lemma 6.1 . . . . . . . . . . . . . . . . .
Proof of Lemma 6.1 . . . . . . . . . . . . . . . . .
Proof of Theorem 6.2 . . . . . . . . . . . . . . . .
Proof of Lemma 6.1 . . . . . . . . . . . . . . . . .
Proof of Theorem 6.4 . . . . . . . . . . . . . . . .
iii
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
9
10
12
13
13
14
14
32
33
33
33
34
35
36
1.
Introduction
A basic problem in econometrics consists of estimating a linear relationship where the explanatory
variables and the errors may be correlated. In order to detect an endogeneity problem between explanatory variables and disturbances, researchers often apply an exogeneity test, usually by resorting
to instrumental variable (IV) methods. Exogeneity tests of the type proposed by Durbin (1954), Wu
(1973), Hausman (1978), and Revankar and Hartley (1973)– henceforth DWH-and RH-tests– are
often used to decide whether one should apply ordinary least squares (OLS) or instrumental variable methods. One key assumption of DWH and RH-tests however, is the available instruments are
strong. Not much is known, at least in finite-sample, about their behaviour when identification is
deficient or weak (weak instruments).
In the last two decades, literature has emerged that has raised concerns with the quality of inferences based on conventional methods, such as instrumental variables and ordinary least squares
settings, when the instruments are only weakly correlated with the endogenous regressors. Many
studies have shown that even ex-post conventional large-sample approximations are misleading
when instruments are weak. The literature on the “weak instruments” problem is now considerable1 . Several authors have proposed identification-robust procedures that are applicable even when
the instruments are weak. However, identification-robust procedures usually do not focus on regressor exogeneity or instrument validity. Hence, there is still a reason to be concerned when testing the
exogeneity or orthogonality of a regressor.
In Doko Tchatoka and Dufour (2008), we study the impact of instrument endogeneity on Anderson and Rubin (1949, AR-test) and Kleibergen (2002, K-test). We show that both procedures are
in general consistent against the presence of invalid instruments (hence invalid for the hypothesis
of interest), whether the instruments are strong or weak. However, there are cases where test consistency may not hold and their use may lead to size distortions in large samples. In this paper, we
do not focus on the validity of the instruments. Instead, we question whether the standard specification tests are valid in finite samples when: (i) errors have possibly non-Gaussian distribution, and
(ii) identification is weak. In the literature, except for Wu (1973, T1 , T2 tests) and the Revankar
and Hartley (1973, RH -test), finite-sample distributions are not available for the other specification test statistics [including the most standard Hausman (1978) statistic] even when model errors
are Gaussian and identification is strong. This paper fulfills this gap by simultaneously addressing
issues related to finite-sample theory and identification.
Staiger and Stock (1997) provided a characterization of the asymptotic distribution of Hausman
type-tests [namely H1 , H2 , and H3 ] under the local-to-zero weak instruments asymptotic. They
showed that when the instruments are asymptotically irrelevant, all three tests are valid (level is
controlled) but inconsistent. Furthermore, their result indicates that H1 and H2 are conservative.
Staiger and Stock (1997) then observed that the concentration parameter which characterizes instru1 See
e.g. Nelson and Startz (1990a, 1990b); Dufour (1997); Bekker (1994); Phillips (1989); Staiger and Stock
(1997); Wang and Zivot (1998); Dufour (2003); Stock, Wright and Yogo (2002); Kleibergen (2002); Moreira (2003);
Hall, Rudebusch and Wilcox (1996); Hall and Peixe (2003); Donald and Newey (2001); Dufour (2005, 2007).
1
ment quality depends on the sample size in a way that size adjustment is infeasible. In this paper,
we argue that this type of conclusion may go far. The local-to-zero weak instruments asymptotic
implies that all instruments are asymptotically irrelevant. When the model is partially identified, this
setup may lead to misleading conclusions. This raises the following question: how do the alternative
standard specification tests behave when at least one instrument is strong?
Recently, Hahn, Ham and Moon (2010) proposed a modified Hausman test which can be used
for testing the validity of a subset of instruments. Their statistic is pivotal even when the instruments are weak. The problem however, is that the null hypothesis in their study tests the orthogonality of the instruments which are excluded from the structural equation. So, the test proposed by
Hahn et al. (2010) can be viewed as an alternative way of assessing the overidentification restrictions hypothesis of the model [Hansen and Singleton (1982); Hansen (1982); Sargan (1983); Cragg
and Donald (1993); Hansen, Heaton and Yaron (1996); Stock and Wright (2000); and Kleibergen
(2005)]. Clearly, the problem considered by the authors is fundamentally different and less complex
than testing the exogeneity of an included instrument in the structural equation, as done by Durbin
(1954), Wu (1973), Hausman (1978), and Revankar and Hartley (1973).
Guggenberger (2010) investigated the asymptotic size properties of a two-stage test in the linear
IV model, when in the first stage a Hausman (1978) specification test is undertaken as a pretest
of exogeneity of a regressor. He showed that the asymptotic size is one for empirically relevant
choices of the parameter space. This means that the Hausman pre-test does not have sufficient power
against correlations that are local to zero when identification is weak, while the OLS-based t-statistic
takes on large values for such nonzero correlations. While we do not question the basic result of
Guggenberger (2010) in this paper, we observe that cases where this happens include the Staiger and
Stock (1997) weak instruments asymptotic which assumes that all instruments are asymptotically
irrelevant. This however does not account for situations where at least one instrument is strong.
Hence, the conclusions by Guggenberger (2010) may not be applicable when identification is partial.
Doko Tchatoka and Dufour (2011) provide a general asymptotic framework which allows one to
examine the asymptotic behaviour of DWH-tests including cases where partial identification holds.
In this paper, we only focus on finite samples. The behaviour of DWH and RH exogeneity
tests is studied under two alternative setups. In the first one, we assume that the structural errors
are strictly exogenous, i.e. independent of the regressors and the available instruments. This setup
is quite general and does not require additional assumptions on the (supposedly) endogenous regressors and the reduced-form errors. In particular, the endogenous regressors can be arbitrarily
generated by any nonlinear function of the instruments and reduced-form parameters. Furthermore,
the reduced-form errors may be heteroscedastic. The second one assumes a Cholesky invariance
property for both structural and reduced-form errors. A similar assumption in the context of multivariate linear regressions is also made in Dufour and Khalaf (2002); and Dufour, Khalaf and
Beaulieu (2010).
In both setups, we propose a finite-sample analysis of the distribution of the tests under the
null hypothesis (level) and the alternative hypothesis (power), with or without identification. Our
analysis provides several new insights and extensions of earlier procedures. The characterization
2
of the finite-sample distributions of the statistics, shows that all tests are typically robust to weak
instruments (level is controlled), whether the errors are Gaussian or not. This result is then used
to develop exact Monte Carlo exogeneity (MCE) tests which are valid even when conventional
asymptotic theory breaks down. In particular, MCE tests remain applicable even if the distribution of
the errors does not have moments (Cauchy-type distribution, for example). Hence, size adjustment
is feasible and the conclusion by Staiger and Stock (1997) may be misleading. Moreover, the
characterization of the power of the tests clearly exhibits the factors that determine power. We show
that all tests have no power in the extreme case where all instruments are weak [similar to Staiger
and Stock (1997) and Guggenberger (2010)], but do have power as soon as we have one strong
instrument. This suggests that DWH and RH exogeneity tests can detect an exogeneity problem
even if not all model parameters are identified, provided partial identification holds. We present
simulation evidence which confirms our theoretical results.
The paper is organized as follows. Section 2 formulates the model studied, and Section 4 describes the statistics. Sections 5 and 6 study the finite-sample properties of the tests with (possibly)
weak instruments. Section 7 presents the exact Monte Carlo exogeneity (MCE) test procedures
while Section 8 presents a simulation experiment. Conclusions are drawn in Section 9 and proofs
are presented in the Appendix.
2.
Framework
We consider the following standard simultaneous equations model:
y = Y β + Z1 γ + u ,
(2.1)
Y = Z1 Π1 + Z2 Π2 +V ,
(2.2)
where y ∈ RT is a vector of observations on a dependent variable, Y ∈ RT ×G is a matrix of observations on (possibly) endogenous explanatory variables (G ≥ 1), Z1 ∈ RT ×k1 is a matrix of observations on exogenous variables included in the structural equation of interest (2.1), Z2 ∈ RT ×k2
is a matrix of observations on the exogenous variables excluded from the structural equation,
u = (u1 , . . . , uT )′ ∈ RT and V = [V1 , . . . , VT ]′ ∈ RT ×G are disturbance matrices with mean zero,
β ∈ RG and γ ∈ Rk1 are vectors of unknown coefficients, Π1 ∈ Rk1 ×G and Π2 ∈ Rk2 ×G are matrices
of unknown coefficients. We suppose that the “instrument matrix”
Z = [Z1 : Z2 ] ∈ RT ×k has full-column rank
(2.3)
where k = k1 + k2 , and
T − k1 − k2 > G ,
k2 ≥ G .
The usual necessary and sufficient condition for identification of this model is rank(Π2 ) = G .
3
(2.4)
The reduced form for [y, Y ] can be written as:
y = Z1 π 1 + Z2 π 2 + v, Y = Z1 Π1 + Z2 Π2 +V ,
(2.5)
where π 1 = γ + Π1 β , π 2 = Π2 β , and v = u +V β = [v1 , . . . , vT ]′ . If any restriction is imposed on γ ,
we see from π 2 = Π2 β that β is identified if and only rank(Π2 ) = G, which is the usual necessary
and sufficient condition for identification of this model. When rank(Π2 ) < G, β is not identified
and the instruments Z2 are weak.
In this paper, we study the finite-sample properties (size and power) of the standard exogeneity
tests of the type proposed by Durbin (1954), Wu (1973), Hausman (1978), and Revankar and Hartley
(1973) of the null hypothesis H0 : E(Y ′ u) = 0, including when identification is deficient or weak
(weak instruments) and the errors [u, V ] may not have a Gaussian distribution.
3.
Notations and definitions
Let β̂ = (Y ′ M1Y )−1Y ′ M1 y and β̃ = [Y ′ (M1 − M)Y ]−1Y ′ (M1 − M)y denote the ordinary least squares
(OLS) estimator and two-stage least squares (2SLS) estimator of β respectively, where
M = M(Z) = I − Z(Z ′ Z)−1 Z ′ ,
M1 = M(Z1 ) = I − Z1 (Z1′ Z1 )−1 Z1′ ,
M1 − M = M1 Z2 (Z2′ M1 Z2 )−1 Z2′ M1 .
(3.1)
(3.2)
Let V̂ = MY, X = [X1 : V̂ ], X1 = [Y : Z1 ], X̂ = [X̂1 : V̂ ], X̂1 = [Ŷ : Z1 ], X̄ = [X1 : Z2 ] = [Y : Z], and
consider the following regression of u on the columns of V :
u = Va + ε ,
(3.3)
where a is a G × 1 vector of unknown coefficients, and ε is independent of V with mean zero
and variance σ 2ε . Define θ = (β ′ , γ ′ , a′ )′ , θ ∗ = (β ′ , γ ′ , b′ )′ , θ̄ = (b′ , γ̄ ′ , ā′ )′ , where b = β + a,
γ̄ = γ − Π1 a, ā = −Π2 a. We then observe that Y = Ŷ + V̂ , where Ŷ = (I − M)Y = PZ Y, and β = b
as soon as a = 0. From the above definitions and notations, the structural equation (2.1) can be
written in the following three different ways:
y = Y β + Z1 γ + V̂ a + e∗ = X θ + e∗ ,
(3.4)
y = Ŷ β + Z1 γ + V̂ b + e∗ = X̂ θ ∗ + e∗ ,
(3.5)
y = Y b + Z1 γ̄ + Z2 ā + ε = X̄ θ̄ + ε ,
(3.6)
where e∗ = PZ Va + ε . Equations (3.3)-(3.6) clearly illustrate that the endogeneity of Y may be
viewed as a problem of omitted variables [see Dufour (1987)].
Let us denote by θ̂ : the OLS estimate of θ in (3.4), θ̂ 0 : the restricted OLS estimate of θ under
a = 0, in (3.4), θ̂ ∗ : the OLS estimate of θ ∗ in (3.5), θ̂ ∗0 : the restricted OLS estimate of θ ∗ under
0
β = b in (3.5), θ̂ ∗ : the restricted OLS estimate of θ ∗ under b = 0 in (3.5) or β = −a in (3.4),θ̄ˆ :
4
the OLS estimate of θ̄ in (3.6), θ̄ˆ 0 : the restricted OLS estimate of θ̄ under ā = 0, and define the
following sum squared errors:
S(ω ) = (y − X ω )′ (y − X ω ), S∗ (ω ) = (y − X̂ ω )′ (y − X̂ ω ),
S̄(ω ) = (y − X̄ ω )′ (y − X̄ ω ),
∀ ω ∈ Rk1 +2G .
(3.7)
Let
Σ̃1 = σ̃ 21 ∆ˆ ,
Σ̃2 = σ̃ 22 ∆ˆ ,
Σ̃3 = σ̃ 2 ∆ˆ ,
Σ̃4 = σ̂ 2 ∆ˆ ,
(3.8)
−1
−1
− σ̂ 2 Ω̂LS
, Σ̂2 = σ̃ 2 ∆ˆ , Σ̂3 = σ̂ 2 ∆ˆ ,
Σ̂1 = σ̃ 2 Ω̂IV
1
1
Σ̂R =
D Z (Z ′ D Z )−1 Z2′ D1 , D1 = M1 MM1Y ,
2 1 2 2 1 2
T
σ̂ R
1 ′
1 ′
−1
−1
− Ω̂LS
,
Ω̂IV =
Y (M1 − M)Y , Ω̂LS = Y M1Y, ∆ˆ = Ω̂IV
T
T
(3.9)
(3.10)
(3.11)
where σ̂ 2 = (y − Y β̂ )′ M1 (y − Y β̂ )/T is the OLS-based estimator of σ 2u , σ̃ 2 = (y − Y β̃ )′ M1 (y −
Y β̃ )/T is the usual 2SLS-based estimator of σ 2u (both without correction for degrees of freedom),
while σ̃ 21 = (y − Y β̃ )′ (M1 − M)(y − Y β̃ )/T = σ̃ 2 − σ̃ 2e , σ̃ 22 = σ̂ 2 − (β̃ − β̂ )′ ∆ˆ −1 (β̃ − β̂ ) = σ̂ 2 −
σ̃ 2 (β̃ − β̂ )′ Σ̂2−1 (β̃ − β̂ ), σ̃ 2e = (y − Y β̃ )′ M(y − Y β̃ )/T, and σ̂ 2R = yMX̄ y′ /T may be interpreted as
alternative IV-based scaling factors, κ 1 = (k2 − G)/G, κ 2 = (T − k1 − 2G)/G, κ 3 = κ 4 = T − k1 −
G, and κ R = (T − k1 − k2 − G)/k2 . From (3.7) and (3.8)-(3.11), we can see that
0
S(θ̂ ) = S∗ (θ̂ ∗ ), S(θ̂ 0 ) = S∗ (θ̂ ∗0 ), S(θ̂ ) = T σ̃ 22 , S(θ̂ 0 ) = T σ̂ 2 , S∗ (θ̂ ∗ ) = T σ̃ 2 .
(3.12)
Throughout the paper, we also use the following notations:
C0 = (Ā1 − A1 )′ ∆ˆ −1 (Ā1 − A1 ), Ā1 = [Y ′ (M1 − M)Y ]−1Y ′ (M1 − M),
1
1
A1 = (Y ′ M1Y )−1Y ′ M1 , D̄1 = M1 M(M1 −M)Y , D1 = M1 MM1Y ,
T
T
−1
−1
− (Va + ε )′ D1 (Va + ε )Ω̂LS
,
Σ1 = (Va + ε )′ D̄1 (Va + ε )Ω̂IV
ΩIV
≡ ΩIV (µ 2 , V̄ ) = (µ 2 + V̄ )′ (M1 − M)(µ 2 + V̄ ) ,
(3.13)
(3.14)
(3.15)
(3.16)
ΩLS ≡ ΩLS (µ 2 , V̄ ) = (µ 2 + V̄ )′ M1 (µ 2 + V̄ ) ,
(3.17)
ω 2IV
≡ ω IV (µ 1 , µ 2 , V̄ , v̄)2 = (µ 1 + v̄)′ D′∗ D∗ (µ 1 + v̄) ,
(3.18)
ω 2LS
≡ ω LS (µ 1 , µ 2 , V̄ , v̄) = (µ 1 + v̄) C∗ (µ 1 + v̄) ,
(3.19)
′
2
C∗ = M1 − M1 (µ 2 + V̄ )ΩLS (µ 2 , V̄ ) (µ 2 + V̄ ) M1
(3.20)
D∗ = M1 − M1 (µ 2 + V̄ )ΩIV (µ 2 , V̄ )−1 (µ 2 + V̄ )′ (M1 − M) ,
(3.21)
ω 21 ≡ ω 1 (µ 1 , µ 2 , V̄ , v̄)2 = (µ 1 + v̄)′ E(µ 1 + v̄) ,
(3.22)
ω 22 ≡ ω 2 (µ 1 , µ 2 , V̄ , v̄)2 = (µ 1 + v̄)′ [C∗ −C′ ∆ −1C](µ 1 + v̄) ,
(3.23)
−1
ω 2R
′
≡ ω R (µ 1 , µ 2 , V̄ , v̄) = (µ 1 + v̄) [D1 − PD1 Z2 ](µ 1 + v̄) ,
′
2
(3.24)
C = ΩIV (µ 2 , V̄ ) (µ 2 + V̄ ) (M1 − M) − ΩLS (µ 2 , V̄ ) (µ 2 + V̄ ) M1 ,
−1
′
5
−1
′
(3.25)
E = (M1 − M) I − (µ 2 + V̄ )ΩIV (µ 2 , V̄ )−1 (µ 2 + V̄ )′ (M1 − M) ,
ω 23 ≡ ω 3 (µ 1 , µ 2 , V̄ , v̄)2 = ω 2IV , ω 24 ≡ ω 4 (µ 1 , µ 2 , V̄ , v̄)2 = ω 2LS ,
1
−1
−1 −1
Γ1 (µ 1 , µ 2 , V̄ , v̄) = C′ [ω 2IV ΩIV
− ω 2LS ΩLS
] C, Γ2 (µ 1 , µ 2 , V̄ , v̄) = 2 C′ ∆ −1C ,
ω IV
1 ′ −1
1
Γ3 (µ 1 , µ 2 , V̄ , v̄) =
C ∆ C, Γ̄l (µ 1 , µ 2 , V̄ , v̄) = 2 C′ ∆ −1C , l = 1, 2, 3, 4 ,
2
ω LS
ωl
1
PD Z ,
ΓR (µ 1 , µ 2 , V̄ , v̄) =
ω 2R 1 2
(3.26)
(3.27)
(3.28)
(3.29)
(3.30)
where for any matrix B, PB = B(B′ B)−1 B′ is the projection matrix on the space spanned by the
columns of B, and MB = I − PB .
Finally, let Cπ = Π2′ Z2′ M1 Z2 Π2 denotes the concentration factor2 . We then have M1 Z2 Π2 a = 0
if and only if Cπ a = 0, i.e. a = (IG −Cπ−Cπ )a∗ , where Cπ− is any generalized inverse of Cπ , and a∗
is an arbitrary G × 1 vector [see Rao and Mitra (1971, Theorem 2.3.1)]. Let
N (Cπ ) = {ϖ ∈ RG : Cπ ϖ = 0} ,
(3.31)
denotes the null set of the linear map on RG characterized by the matrix Cπ . Observe that when
Z2′ M1 Z2 has full column rank k2 (which is usually the case), we have N (Cπ ) = {ϖ ∈ RG : Π2 ϖ =
0}, so that N (Cπ ) = {0} when identification hold. However, when identification is weak or
Z2′ M1 Z2 does not have full column rank, there exist ϖ 0 6= 0 such that ϖ 0 ∈ N (Cπ ).
We now presents DWH and RH test statistics studied in this paper.
4.
Exogeneity test statistics
We consider Durbin-Wu-Hausman test statistics, namely three versions of the Hausman-type statistics [Hi , i = 1, 2, 3], the four statistics proposed by Wu (1973) [Tl , l = 1, 2, 3, 4] and the test statistic
proposed by Revankar and Hartley (1973, RH). First, we propose a unified presentation of these
statistics that shows the link between Hausman-and Wu-type tests. Second, we provide an alternative derivation of all test statistics (including RH test statistic) from the residuals of the regression
of the unconstrained and constrained models.
4.1.
Unified presentation
This subsection proposes a unified presentation of the DWH and RH test statistics. The proof of
this unified representation is attached in Appendix A.1. The four statistics proposed by Wu (1973)
can all be written in the form
Tl = κ l (β̃ − β̂ )′ Σ̃l−1 (β̃ − β̂ ) ,
l = 1, 2, 3, 4.
−1
− 21
the errors V have a definite positive covariance matrix ΣV , then ΣV 2 Cπ ΣV
tion matrix. Hence, we referred here to Cπ as the concentration factor.
2 If
6
(4.1)
is often referred to as the concentra-
The three versions of Hausman-type statistics are defined as
Hi = T (β̃ − β̂ )′ Σ̂i−1 (β̃ − β̂ ) ,
i = 1, 2, 3.
(4.2)
And the Revankar and Hartley (1973, RH) statistic is given by:
= κ R y′ Σ̂R y.
RH
(4.3)
The corresponding tests reject H0 when the test statistic is “large”. Unlike RH , Hi , i = 1, 2, 3,
and Tl , l = 1, 2, 3, 4, compare OLS to 2SLS estimators of β . They only differ through the use of
different “covariance matrices”. H1 uses two different estimators of σ 2u , while the others resort
to a single scaling factor (or estimator of σ 2u ). The expressions of the Tl , l = 1, 2, 3, 4, in (4.1)
are much more interpretable than those in Wu (1973). The link between Wu (1973) notations and
ours is established in Appendix A.1. We use the above notations to better see the relation between
Hausman-type tests and Wu-type tests. In particular, it is easy to see that Σ̃3 = Σ̂2 and Σ̃4 = Σ̂3 , so
T3 = (κ 3 /T )H2 and T4 = (κ 4 /T )H3 .
Finite-sample distributions are available for T1 , T2 and RH when the errors are Gaussian.
More precisely, if u ∼ N[0, σ 2 IT ] and Z is independent of u, then:
T1 ∼ F(G, k2 − G) ,
T2 ∼ F(G, T − k1 − 2G) ,
RH ∼ F(k2 , T − k1 − k2 − G)
(4.4)
under the null hypothesis of exogeneity. If furthermore, rank(Π2 ) = G and the sample size is large,
under the exogeneity of Y , we have (with standard regularity conditions):
L
L
Hi → χ 2 (G), i = 1, 2, 3 ; Tl → χ 2 (G), l = 3, 4.
(4.5)
However, even when identification is strong and the errors Gaussian, the finite-sample distributions
of Hi , i = 1, 2, 3 and Tl , l = 3, 4 are not established in the literature. This underscores the importance of this study.
4.2.
Regression interpretation
We now give the regression interpretation of the above statistics. From Section 3, except for H1 ,
Hi , i = 2, 3, Tl , l = 1, 2, 3, 4 and RH can be expressed as [see Appendix A.2 for further details]:
0
H2 = T [S(θ̂ 0 ) − S(θ̂ )]/S∗ (θ̂ ∗ ), H3 = T [S(θ̂ 0 ) − S(θ̂ )]/S(θ̂ 0 ) ,
T1 =
T3 =
RH
0
κ 1 [S(θ̂ 0 ) − S(θ̂ )]/[S∗ (θ̂ ∗ ) − Se (θ̂ )], T2 = κ 2 [S(θ̂ 0 ) − S(θ̂ )]/S(θ̂ ) ,
0
κ 3 [S(θ̂ 0 ) − S(θ̂ )]/S∗ (θ̂ ∗ ), T4 = κ 4 [S(θ̂ 0 ) − S(θ̂ )]/S(θ̂ 0 ) ,
= κ R [S̄(θ̄ˆ 0 ) − S̄(θ̄ˆ )]/S̄(θ̄ˆ 0 ) ,
(4.6)
(4.7)
(4.8)
(4.9)
where Se (θ̂ ) = T σ̃ 2e . Equations (4.6) - (4.9) are the regression formulation of the DWH and RH
statistics. It interesting to observe that DWH statistics test the null hypothesis H0 : a = 0, while RH
7
tests H∗0 : ā = −Π2 a = 0. If rank(Π2 ) = G , a = 0 if and only if ā = 0. However, if rank(Π2 ) < G,
ā = 0 does not entail a = 0. So, H0 ⊆ H∗0 but the inverse may not hold.
Our analysis of the distribution of the statistics under the null hypothesis (level) and the alternative hypothesis (power), considers two setups. The first setup is the strict exogeneity, i.e. the
structural error u is independent of all regressors. The second setup is the Cholesky error family.
This setup assumes that the reduced-form errors belong to Cholesky families.
5.
Strict exogeneity
In this section, we consider the problem of testing the strict exogeneity of Y, i.e. the problem:
H0 : u is independent of [Y, Z]
(5.1)
vs
H1 : u = Va + ε ,
(5.2)
where a is a G×1 vector of unknown coefficients, ε is independent of V with mean zero and variance
σ 2ε . It is important to observe that equation (5.2) does not impose restrictions on the structure of the
errors u and V . This equation is interpreted as the projection of u in the columns of V and holds for
any homoscedastic disturbances u and V with mean zero. Thus, the hypothesis H0 can be expressed
as
H0 : a = 0 .
(5.3)
Note that (5.1) - (5.2) do not require any assumption concerning the functional form of Y . So, we
could assume that Y obeys a general model of the form:
Y
= g(Z1 , Z2 , V, Π ) ,
(5.4)
where g(.) is a possibly unspecified non-linear function, Π is an unknown parameter matrix and
V follows an arbitrary distribution. This setup is quite wide and does allow one to study several
situations where neither V nor u follow a Gaussian distribution. This is particularly important in
financial models with fat-tailed error distributions, such as the Student-t. Furthermore, the errors u
and V may not have moments (Cauchy distribution for example).
Section 5.1 studies the distributions of the statistics under the null hypothesis (level).
5.1.
Pivotality under strict exogeneity
We first characterize the finite-sample distributions of the statistics under H0 , including when identification is weak and the errors are possibly non-Gaussian. Theorem 5.1 establishes the pivotality
of all statistics.
8
Theorem 5.1 F INITE - SAMPLE DISTRIBUTIONS OF EXOGENEITY TESTS. Suppose the assumptions (2.1), (2.3) - (2.4) hold. Under H0 , the conditional distributions given [Y : Z] of all statistics
defined by (4.1) - (4.3) depend only on the distribution of u/σ u irrespective of whether the instruments are strong or weak.
The results of Theorem 5.1 indicate that if the conditional distribution of (u/σ u )|Y, Z does not
involve any nuisance parameter, then all exogeneity tests are typically robust to weak instruments
(level is controlled) whether the instruments are strong or weak. More interestingly, this holds
even if (u/σ u )|Y, Z do not follow a Gaussian distribution. As a result, exact identification-robust
procedures can be developed from the standard specification test statistics even when the errors
have a non-Gaussian distribution (see Section 7). This is particularly important in financial models
with fat-tailed error distributions, such as the Student-t or in models where the errors may not have
any moment (Cauchy-type errors, for example). Furthermore, the exact procedures proposed in
Section 7 do not require any assumption on the distribution of V and the functional form of Y . More
generally, one could assume that Y obeys a general non-linear model as defined in (5.4) and that
V1 , . . . , VT are heteroscedastic.
5.2.
Power under strict exogeneity
We now characterize the distributions of the tests under the general hypothesis (5.2). As before, we
cover both weak and strong identification setups. Theorem 5.2 presents the results.
Theorem 5.2 F INITE - SAMPLE DISTRIBUTIONS OF EXOGENEITY TESTS.
(2.1) - (2.4) hold. If furthermore H1 in (5.2) is satisfied, then we can write
Let the assumptions
H1 = T (Va + ε )′ (Ā1 − A1 )′ Σ1−1 (Ā1 − A1 )(Va + ε ) ,
(5.5)
H2 = T (Va + ε ) C0 (Va + ε )/(Va + ε ) D̄1 (Va + ε ) ,
(5.6)
H3 = T (Va + ε )′C0 (Va + ε )/(Va + ε )′ D1 (Va + ε ) ,
(5.7)
T1 = κ 1 (Va + ε )′C0 (Va + ε )/(Va + ε )′ (D̄1 − D1 )(Va + ε ) ,
(5.8)
T2 = κ 2 (Va + ε )′C0 (Va + ε )/(Va + ε )′ (D1 −C0 )(Va + ε ) ,
(5.9)
′
′
T3 = κ 3 (Va + ε ) C0 (Va + ε )/(Va + ε ) D̄1 (Va + ε ) ,
(5.10)
T4 = κ 4 (Va + ε ) C0 (Va + ε )/(Va + ε ) D1 (Va + ε ) ,
(5.11)
′
′
′
RH
′
= κ R (Va + ε )′ PD1 Z2 (Va + ε )/(Va + ε )′ (D1 − PD1 Z2 )(Va + ε ) ,
(5.12)
where Σ1 , C0 , A1 , D̄1 , D1 , Ω̂IV , Ω̂LS , ∆ˆ κ R , and κ l , l = 1, 2, 3, 4 , are defined in Section 3.
We note first that Theorem 5.2 follows from algebraic arguments only. So, [Y : Z] can be random
in any arbitrary way. Second, given [Y : Z], the distributions of the statistics only depend on the
endogeneity a. We Can then observe that the above characterization clearly exhibits (Ā1 − A1 )Va ,
C0Va, D1Va, D̄1Va, PD1 Z2 Va as the factors that determine power. As a result, Corollary 5.3 examine
the case where all exogeneity tests do not have power.
9
Corollary 5.3 F INITE - SAMPLE DISTRIBUTIONS OF EXOGENEITY TESTS. Under the assumptions of Theorem 5.2, all exogeneity tests do not have power if and on if a ∈ N (Cπ ). More precisely,
the following equalities:
−1
(Ā1 − A1 )ε ,
H1 = T ε ′ (Ā1 − A1 )′ Σ1∗
(5.13)
H2 = T ε C0 ε /ε D̄1 ε , H3 = T ε C0 ε /ε D1 ε ,
′
′
′
′
(5.14)
T1 = κ 1 ε C0 ε /ε (D̄1 − D1 )ε , T2 = κ 2 ε C0 ε /ε (D1 −C0 )ε ,
(5.15)
T3 = κ 3 ε ′C0 ε /ε ′ D̄1 ε , T4 = κ 4 ε ′C0 ε /ε ′ D1 ε ,
(5.16)
′
RH
′
′
′
= κ R ε ′ PD1 Z2 ε /ε ′ (D1 − PD1 Z2 )ε
(5.17)
−1
−1
hold with probability 1 if and only if a ∈ N (Cπ ), where Σ1∗ = ε ′ D̄1 ε Ω̂IV
.
− ε ′ D1 ε Ω̂LS
When a ∈ N (Cπ ), the conditional distributions of the statistics, given [Y : Z], are the same under
the null hypothesis and the alternative hypothesis. Therefore, their unconditional distributions are
also the same under the null and the alternative hypotheses. This entails that the power of the tests
cannot exceed the nominal level. This condition is satisfied when Π2 = 0 (irrelevant instruments),
and all exogeneity tests have no power against complete non identification of model parameters.
We now analyze the properties of the tests when model errors belong to Cholesky families.
6.
Cholesky error families
Let
U = [u, V ] = [U1 , . . . , UT ]′ ,
(6.18)
W = [v, V ] = [u +V β , V ] = [W1 , W2 , . . . , WT ]′ .
(6.19)
We assume that the vectors Ut = [ut , Vt′ ]′ , t = 1, . . . , T, have the same nonsingular covariance matrix:
′
E[Ut Ut ] = Σ =
"
′
σ 2u δ
δ ΣV
#
> 0,
t = 1, . . . , T,
(6.20)
where ΣV has dimension G. Then the covariance matrix of the reduced-form disturbances Wt =
[vt , Vt′ ]′ also have the same covariance matrix, which takes the form:
Ω=
"
σ 2u + β ′ ΣV β + 2β ′ δ
ΣV β + δ
β ′ ΣV + δ ′
ΣV
#
(6.21)
where Ω is positive definite. In this framework, the exogeneity hypothesis can be expressed as
H0 : δ = 0 .
10
(6.22)
Under H1 in (5.2), we can see from (6.20) that
δ = ΣV a,
σ 2u = σ 2ε + a′ ΣV a = σ 2ε + δ ′ ΣV−1 δ .
(6.23)
So, the null hypothesis in (6.22) can be expressed as
Ha : a = 0 .
(6.24)
Wt = JW̄t , t = 1, . . . , T ,
(6.25)
We will now assume that
where the vector W(T ) = vec(W̄1 , . . . , W̄T ) has a known distribution FW̄ and J ∈ R(G+1)×(G+1) is an
unknown upper triangular nonsingular matrix [for a similar assumption in the context of multivariate
linear regressions, see Dufour and Khalaf (2002) and Dufour et al. (2010)]. When the errors Wt obey
(6.25), we say that Wt belongs to the Cholesky error family.
If the covariance matrix of W̄t is an identity matrix IG+1 , the covariance matrix of Wt is
Ω = E[Wt Wt′ ] = JJ ′ .
(6.26)
In particular, these conditions are satisfied when
i.i.d.
W̄t ∼ N[0, IG+1 ] , t = 1, . . . , T .
(6.27)
Since the J matrix is upper triangular, its inverse J −1 is also upper triangular. Let
P = (J −1 )′ .
(6.28)
Clearly, P is a (G + 1) × (G + 1) lower triangular matrix and it allows one to orthogonalize JJ ′ :
′
P′ JJ ′ P = IG+1 ,
(JJ ′ )−1 = PP .
(6.29)
P′ can be interpreted as the Cholesky factor of Ω −1 , so P is the unique lower triangular matrix that
satisfies equation (6.29); see Harville (1997, Section 14.5, Theorem 14.5.11). It will be useful to
consider the following partition of P :
P=
"
P11 0
P21 P22
#
(6.30)
where P11 6= 0 is a scalar and P22 is a nonsingular G × G matrix. In particular, if (6.26) holds, we
see [using (6.21)] that an appropriate P matrix is obtained by taking:
P11 = (σ 2u − δ ′ ΣV−1 δ )−1/2 = σ ε ,
′
P22
ΣV P22 = IG ,
P21 = −(β + ΣV−1 δ )(σ 2u − δ ′ ΣV−1 δ )−1/2 = −(β + a)σ −1
ε .
11
(6.31)
(6.32)
−1
Further this choice is unique. P22 only depends on ΣV and P11 β + P21 = −(ΣV−1 δ )σ −1
ε = −aσ ε .
In particular, if δ = 0, we have P11 = 1/σ u , P21 = −β /σ u and P11 β + P21 = 0.
If we postmultiply [y, Y ] by P, we obtain from (2.5):
[ȳ, Ȳ ] = [y, Y ] P = [yP11 +Y P21 , Y P22 ] = [Z1 , Z2 ]
"
γ + Π1 β
Π2 β
Π1
Π2
#
P + W̄
(6.33)
where
′
′ ′
W̄ = UP = [v̄, V̄ ] = [W̄1 , . . . , W̄T ] , W̄t = [v̄t , V̄t ] ,
v̄ = vP11 +V P21 = [v̄1 , . . . , v̄T ]′ ,
(6.34)
′
V̄ = V P22 = [V̄1 , . . . , V̄T ] .
(6.35)
Then, we can rewrite (6.33) as
ȳ = Z1 (γ P11 + Π1 ζ ) + Z2 Π2 ζ + v̄ ,
Ȳ
= Z1 Π1 P22 + Z2 Π2 P22 + V̄ ,
(6.36)
(6.37)
where
ζ = β P11 + P21 = −(ΣV−1 δ )/(σ 2u − δ ′ ΣV−1 δ )1/2 = −aσ −1
ε .
(6.38)
Since MZ = 0, we have
M ȳ = M v̄ , MȲ = MV̄ ,
(6.39)
M1 ȳ = M1 (µ 1 + v̄) , M1Ȳ = M1 (µ 2 + V̄ ) .
(6.40)
where
µ 1 = M1 Z2 Π2 ζ = −σ −1
ε M1 Z2 Π2 a ,
µ 2 = M1 Z2 Π2 P22 .
(6.41)
Clearly, µ 2 does not depend on the endogeneity parameter a = ΣV−1 δ . Furthermore, ζ = 0 ⇔ δ =
a = 0 and µ 1 = 0. In particular, this condition holds under H0 (δ = a = 0). If Π2 = 0 (complete
non-identification of the model parameters), we have µ 1 = 0 and µ 2 = 0, irrespective of the value
of δ . In this case,
M ȳ = M v̄ , MȲ = MV̄ , M1 ȳ = M1 v̄ , M1Ȳ = M1V̄ .
(6.42)
We can now show the following Cholesky invariance property of all test statistics.
Lemma 6.1 C HOLESKY
INVARIANCE OF EXOGENEITY TESTS .
R=
"
R11 0
R21 R22
#
Let
(6.43)
be a lower triangular matrix such that R11 6= 0 is a scalar and R22 is a nonsingular G × G matrix.
12
If we replace y and Y by y∗ = yR11 + Y R21 and Y∗ = Y R22 in (4.1) - (3.11), then the statistics Hi
(i = 1, 2, 3), Tl (l = 1, 2, 3, 4) and RH do not change.
The above invariance holds irrespective of the choice of lower triangular matrix R. In particular,
one can choose R = P as defined in (6.28). We can now prove the following general theorem on the
distributions of the test statistics.
Theorem 6.2 F INITE - SAMPLE DISTRIBUTIONS OF EXOGENEITY TESTS. Under the assumptions (2.1) - (2.4) and assumption (6.25), the statistics defined in (4.1) - (3.11) have the following
representations:
Hi = T [µ 1 + v̄]′Γi (µ 1 , µ 2 , v̄, V̄ )[µ 1 + v̄] , i = 1, 2, 3,
Tl = κ l [µ 1 + v̄]′Γ̄l (µ 1 , µ 2 , v̄, V̄ )[µ 1 + v̄] ,
RH
l = 1, 2, 3, 4,
= κ R [µ 1 + v̄]′ΓR (µ 1 , µ 2 , v̄, V̄ )[µ 1 + v̄] ,
where [v̄, V̄ ], µ 1 , µ 2 are defined in (6.34) and (6.41), Γi , Γ̄l , and ΓR are defined in Section 3.
The above theorem entails that the distributions of the statistics do not depend on either β or γ .
Observe that Theorem 6.2 follows from algebraic arguments only, so [Y, Z] and [v̄, V̄ ] can be random
in an arbitrary way. If the distributions of Z and [v̄, V̄ ] do not depend on other model parameters,
the theorem entails that the distributions of the statistics depend on model parameters only through
µ 1 and µ 2 . Since µ 2 does not involve δ , µ 1 is the only factor that determines power. If µ 1 6= 0, the
tests have power. This may be the case when at least one instrument is strong (partial identification
of model parameters). However, we can observe that when M1 Z2 Π2 a = 0, µ 1 = 0 and exogeneity
tests have no power. We now provide a formal characterization of the set of parameters in which
exogeneity tests have no power.
Corollary 6.3 characterizes the power of the tests when a ∈ N (Cπ ).
Corollary 6.3 F INITE - SAMPLE DISTRIBUTIONS OF EXOGENEITY TESTS. Under the assumptions of Theorem 6.2, if a ∈ N (Cπ ), we have µ 1 = 0 and the statistics defined in (4.1) - (3.11) have
the following representations:
Hi = T v̄′Γi (µ 2 , v̄, V̄ )v̄, i = 1, 2, 3; Tl = κ l v̄′Γ̄l (µ 2 , v̄, V̄ )v̄, l = 1, 2, 3, 4,
RH
= κ R v̄′ΓR (µ 2 , v̄, V̄ )v̄
irrespective of whether the instruments are weak or strong, where Γi (µ 2 , v̄, V̄ ) ≡ Γi (0, µ 2 , v̄, V̄ ),
Γ̄l (µ 2 , v̄, V̄ ) ≡ Γl (0, µ 2 , v̄, V̄ ), ΓR (µ 2 , v̄, V̄ ) = ΓR (0, µ 2 , v̄, V̄ ), ζ = −(ΣV−1 δ )/(σ 2u − δ ′ ΣV−1 δ )1/2 , Γi ,
Γ̄l , and ΓR are defined in Section 3.
First, note that when a ∈ N (Cπ ), i.e. when M1 Z2 Π2 a = 0, the conditional distributions, given
Z and V̄ of the exogeneity tests, only depend on µ 2 irrespective of the quality of the instruments.
13
In particular, this condition is satisfied when Π2 = 0 (complete non-identification of the model
parameters) or δ = a = 0 (under the null hypothesis). Since µ 2 does not depend on δ or a, all
exogeneity test statistics have the same distribution under both the null hypothesis (δ = a = 0) and
the alternative (δ 6= 0) when a ∈ N (Cπ ) : the power of these tests cannot exceed the nominal levels.
So, the practice of pretesting based on exogeneity tests is unreliable in this case.
Theorem 6.4 characterizes the distributions of the statistics in the special case of Gaussian errors.
Theorem 6.4 F INITE - SAMPLE DISTRIBUTIONS OF EXOGENEITY TESTS. Let the assumptions of
Theorem 6.2 hold. If furthermore the normality assumption (6.27) holds and Z = [Z1 , Z2 ] is fixed,
then
H1 = T [µ 1 + v̄]′Γ1 (µ 1 , µ 2 , v̄, V̄ )[µ 1 + v̄] ,
H2 = T [µ 1 + v̄]′Γ2 (µ 1 , µ 2 , v̄, V̄ )[µ 1 + v̄] ∼ T φ 1 (v̄, ν 1 )/φ 2 (v̄, ν 3 ) ,
H3 | V̄
T1 | V̄
∗
∼ T /[1 + κ −1
2 F(T − k1 − 2G, G; υ 2 , ν 1 )] ≤ κ̄ 1 F(G, T − k1 − 2G; ν 1 , υ 2 ) ,
∼ F(G, k2 − G; ν 1 , υ 1 ), T2 | V̄ ∼ F(G, T − k1 − 2G; ν 1 , υ 2 ),
T3 = κ 2 [µ 1 + v̄]′Γ2 (µ 1 , µ 2 , v̄, V̄ )[µ 1 + v̄] ∼ κ 2 φ 1 (v̄, ν 1 )/φ 2 (v̄, ν 3 ) ,
T4 | V̄
RH | V̄
∗
∼ κ 4 /[1 + κ −1
2 F(T − k1 − 2G, G; υ 2 , ν 1 )] ≤ κ̄ 2 F(G, T − k1 − 2G; ν 1 , υ 2 ) ,
∼ F(k2 , T − k − G; ν R , υ R ) ,
where φ 1 (v̄, ν 1 )| V̄ = [µ 1 + v̄]′C′ ∆ −1C[µ 1 + v̄]| V̄ ∼ χ 2 (G; ν 1 ), φ 2 (v̄, ν 3 )| V̄ = ω 2IV | V̄ ∼ χ 2 (T −k1 −
G; ν 3 ), ν 1 = µ ′1C′ ∆ −1C µ 1 , ν 3 = µ ′1 (D′∗ D∗ )µ 1 , υ 1 = µ ′1 E µ 1 , υ 2 = µ ′1 (C∗ − C′ ∆ −1C)µ 1 , ν R =
µ ′1 PD1 Z2 µ 1 , υ R = µ ′1 (D1 − PD1 Z2 )µ 1 , κ̄ ∗1 = T G/(T − k1 − 2G), κ̄ ∗2 = (T − k1 − G)G/(T − k1 − 2G).
The above theorem entails that given V̄ , the statistics T1 , T2 and RH follow double noncentral
F-distributions, while T4 and H3 are bounded by a double noncentral F-type distribution. However, the distributions of T3 , H2 and H1 cannot be characterized by standard distributions. As in
Theorem 6.2, µ 1 is the factor that determines power. If µ 1 6= 0, the exogeneity tests have power.
However, when µ 1 = 0, all tests have no power as shown in Corollary 6.5.
Corollary 6.5 F INITE - SAMPLE DISTRIBUTIONS OF EXOGENEITY TESTS. Under the assumptions of Theorem 6.4, if a ∈ N (Cπ ), we have ν 1 = ν 3 = υ 1 = υ 2 = ν R = υ R = 0 so that
H1 = T v̄′Γ1 (µ 2 , v̄, V̄ )v̄, H2 = T v̄′Γ2 (µ 2 , v̄, V̄ )v̄ ∼ T φ 1 (v̄)/φ 2 (v̄) ,
∗
H3 ∼ T /(1 + κ −1
2 F(T − k1 − 2G, G)) ≤ κ̄ 1 F(G, T − k1 − 2G) ,
T1 ∼ F(G, k2 − G) , T2 ∼ F(G, T − k1 − 2G),
T3 = κ 2 v̄′Γ2 (µ 2 , v̄, V̄ )v̄ ∼ κ 2 φ 1 (v̄)/φ 2 (v̄) ,
∗
T4 ∼ κ 4 /[1 + κ −1
2 F(T − k1 − 2G, G)] ≤ κ̄ 2 F(G, T − k1 − 2G) ,
RH
∼ F(k2 , T − k − G) ,
where φ 1 (v̄) ≡ φ 1 (v̄, 0), φ 2 (v̄) ≡ φ 2 (v̄, 0), φ 1 (v̄, ν 1 ), φ 1 (v̄, ν 3 ), Γi (µ 1 , µ 2 , v̄, V̄ ), i = 1, 2 are defined
14
in Theorem 6.4.
Observe that when a ∈ N (Cπ ), the non-centrality parameters in the F-distributions vanish. In
particular, under the null hypothesis H0 , we have a = 0 ∈ N (Cπ ) and all exogeneity test statistics
are pivotal. Furthermore, all exogeneity test statistics have the same distribution under the null
hypothesis (δ = a = 0) and the alternative (δ 6= 0): the power of the tests cannot exceed the nominal
levels.
We now describe the exact procedure for testing exogeneity even with non-Gaussian errors: the
Monte Carlo exogeneity tests.
7.
Exact Monte Carlo exogeneity (MCE) tests
The finite-sample characterization of the distribution of exogeneity test statistics in the previous
section shows that the tests are typically robust to weak instruments (level is controlled). However,
the distributions of the statistics (under the null hypothesis) are not standard if the errors are non
Gaussian. Furthermore, even for Gaussian errors, H1 , H2 , and T3 cannot be characterized by
standard distributions. This section develops exact Monte Carlo tests which are identification-robust
even if the errors are non-Gaussian.
Consider again (2.1) and assume that we test the strict exogeneity of Y , i.e. the hypothesis:
H0 : u is independent of [Y, Z] .
(7.1)
If the distribution under H0 of u/σ u is given, the conditional distributions of the exogeneity test
statistics given [Y, Z] do not involve nuisance parameters and so can be simulated [see Theorem
5.1]. Let
W
∈ {Hi , Hl , RH , i = 1, 2, 3 ; l = 1, 2, 3, 4 } .
(7.2)
We shall consider two cases. The first one where the support of W is continuous, and the second
one where it is a discrete set.
Let us first focus on the case where the statistics have continuous distributions. Let W1 , . . . , WN
be a sample of N replications of identically distributed exchangeable random variables with the
same distribution as W [for more details on exchangeability, see Dufour (2006)]. Define W (N) =
(W1 , . . . , WN )′ and let W0 be the value of W based on observed data. Define
p̂N (x) =
N ĜN (x) + 1
N +1
(7.3)
where ĜN (x) is the survival function given by
ĜN (x) ≡ ĜN [x; W (N)] =
15
1 N
∑ 1(Wi ≥ x),
N i=1
(7.4)
1(C) = 1 if condition C holds,
= 0 otherwise.
(7.5)
Then, we can show that
P[ p̂N (W0 ) ≤ α ] =
I[α (N + 1)]
N +1
for 0 ≤ α ≤ 1
(7.6)
[see Dufour (2006, Proposition 2.2)], where I[x] is the largest integer less than or equal to x. So,
p̂N (W0 ) ≤ α is the critical region of the Monte Carlo test with level 1 − α and p̂N (W0 ) is the Monte
Carlo test p-value.
We will now extend this procedure to the general case where the distribution of the statistic
W may be discrete. Assume that W (N) = (W1 , . . . , WN )′ is a sequence of exchangeable random
variables which may exhibit ties with positive probability. More precisely
P(W j = W j′ ) > 0 for j 6= j′ , j, j′ = 1, . . . , N .
(7.7)
Let us associate each variable W j , j = 1, . . . , N, with a random variable U j , j = 1, . . . , N such that
i.i.d
U j , . . . , UN ∼ U (0, 1) ,
(7.8)
U (N) = (U1 , . . . , UN )′ is independent of W (N) = (W1 , . . . , WN )′ where U (0, 1) is the uniform
distribution on the interval (0, 1). Then, we consider the pairs
Z j = (W j , U j ), j = 1, . . . , N ,
(7.9)
which are ordered according to the lexicographic order:
(W j , U j ) ≤ (W j′ , U j′ ) ⇐⇒ {W j < W j′ or (W j = W j′ and U j ≤ U j′ )} .
(7.10)
Let us define the randomized p-value function as
p̃N (x) =
N G̃N (x) + 1
,
N +1
(7.11)
where the tail-area function G̃N is given by
G̃N (x) ≡ G̃N [x; U0 , W (N), U (N)] =
1
N
N
∑ 1[Z j ≥ (x, U0 )],
(7.12)
j=1
U0 is a U (0, 1) random variable independent of W (N) and U (N). Then, following Dufour (2006,
Proposition 2.4), we have
P[ p̃N (W0 ) ≤ α ] =
I[α (N + 1)]
N +1
16
for 0 ≤ α ≤ 1
(7.13)
So, p̃N (W0 ) ≤ α is the critical region of the Monte Carlo test with level 1 − α and p̃N (W0 ) is the
MC-test p-value.
We now describe the algorithm to compute the Monte Carlo tests p-value when the distributions
of the statistics is continuous3 . Before proceeding, it will be useful to recall that under the assumptions of Theorem 5.1, all DWH and RH statistics can be expressed as [see the proof in (B.1)-(B.8)]:
H2 = T (u/σ u )′C0 (u/σ u )/(u/σ u )′ D̄1 (u/σ u ) ,
(7.14)
H3 = T (u/σ u )′C0 (u/σ u )/(u/σ u )′ D1 (u/σ u ) ,
(7.15)
T1 = κ 1 (u/σ u ) C0 (u/σ u )/(u/σ u ) (D̄1 − D1 )(u/σ u ),
(7.16)
T2 = κ 2 (u/σ u )′C0 (u/σ u )/(u/σ u )′ (D1 −C0 )(u/σ u ) ,
(7.17)
T3 = κ 3 (u/σ u )′C0 (u/σ u )/(u/σ u )′ D̄1 (u/σ u ),
(7.18)
T4 = κ 4 (u/σ u )′C0 (u/σ u )/(u/σ u )′ D1 (u/σ u )
(7.19)
′
′
where the matrices C0 , D1 , D̄1 are defined in (3.13)-(3.30). Suppose that
σ −1
u u | Y, Z ∼ F , where F independent of (Y, Z) under H0 and is completely specified. (7.20)
The MC p-values are computed through the following steps:
1. compute the test statistic W0 based on observed data;
( j)
( j)
( j)
( j) = u
′
j = 1, . . . , N , according to
2. generate N i.i.d. variables σ −1
∗ = [u∗1 , . . . , u∗T ] ,
u u
the specified distribution F , and compute the corresponding test statistics W j using (7.14)(7.19) ;
3. compute the MC p-value as
p̂MC (W0 ) =
∑Ni=1 1(Wi ≥ W0 ) + 1
;
N +1
(7.21)
4. reject the null hypothesis H0 at level α if p̂MC (W0 ) ≤ α .
We now study the performance of the standard exogeneity tests and the proposed Monte Carlo
tests through a Monte Carlo experiment.
8.
Simulation experiment
In each of the following experiments, the model is described by the following data generating process:
y = Y1 β 1 +Y2 β 2 + u, (Y1 ,Y2 ) = (Z2 Π21 , Z2 Π22 ) + (V1 ,V2 ),
(8.1)
3 Note
that the algorithm can easily be generalized to discrete distributions.
17
i.i.d.
where Z2 is a T × k2 matrix of instruments such that Z2t ∼ N(0, Ik2 ) for all t = 1, . . . , T, Π21 and
Π22 are k2 -dimensional vectors such that
Π21 = η 1C0 , Π22 = η 2C1 ,
(8.2)
where η 1 and η 2 take the value 0 (design of complete non identification), .01 (design of weak
identification) or .5 (design of strong identification), [C0 ,C1 ] is a k2 × 2 matrix obtained by taking
the first two columns of the identity matrix of order k2 . Observe that (8.2) allows us to consider
the partial identification of β = (β 1 , β 2 )′ . In particular, if Π21 = 0 but Π22 6= 0, β 1 is not identified
but β 2 is. The true value of β is set at β 0 = (2, 5)′ and the number of instruments k2 varies in
{5, 10, 20}. We assume that
(8.3)
u = Va + ε = V1 a1 +V2 a2 + ε ,
where a1 and a2 are 2 × 1 vectors and ε is independent with V = (V1 ,V2 ), V1 and V2 are T × 1
vectors. We consider two type framework for model error distributions:
 

1 0 0
i.i.d  

(1) (V1t , V2t , ε t )′ ∼ N 0,  0 1 0 
0 0 1
for all t = 1, . . . , T
(8.4)
Vt and ε t are independent such that
i.i.d
and (2) (V1t , V2t , ε t )′ ∼
standard Cauchy distribution for all , t = 1, . . . , T .
(8.5)
The sample size is fixed at T = 50 but our results remain valid for alternative choice of the sample
size (even less than 50). The endogeneity parameter a is chosen such that
a = (a1 , a2 )′ ∈ {(−20, 0)′ , (−5, 5)′ , (0, 0)′ , (.5, .2)′ , (100, 100)′ } .
(8.6)
From the above notations, the usual exogeneity hypothesis of Y is expressed as
H0 : a = (a1 , a2 )′ = (0, 0)′ .
(8.7)
The nominal level of the tests in each experiment is set at 5%.
8.1.
Standard exogeneity tests
Table 1 presents the empirical size and power when errors are Gaussian, while Table 2 is for Cauchytype errors. The number of replications is N = 10000 in both cases. The first column of each
table reports the statistics, while the second column contains the values of k2 (number of excluded
instruments). In the other columns, for each value of endogeneity parameter a and the quality of the
instruments η 1 and η 2 , the rejection frequencies are reported. Our main findings can be summarized
as follows:
18
1. all DWH and RH tests are valid (level is controlled) whether identification is strong or weak,
and the errors are Gaussian or not. This confirms our theoretical results. More precisely,
when the errors are Gaussian [framework (8.4)], T1 , T2 , T4 , H3 , and RH have a correct
level, while T3 , H1 and H2 are conservative when identification is weak. For Cauchy-type
errors [framework (8.5)], in addition to T3 , H1 and H2 , T1 is now conservative with weak
instruments. However, T2 , T4 , H3 , and RH still have correct level whether identification is
strong or weak;
2. all exogeneity tests exhibit power even if not all parameters are identified, provided partial
identification holds. This shows how Staiger and Stock (1997) weak instruments asymptotic
may be misleading when identification is partially weak. However, when the instruments are
completely irrelevant, i.e. η 1 = η 2 = 0, all DWH and RH tests have no power whether the
errors are Gaussian or not [similar to Staiger and Stock (1997) and Guggenberger (2010)];
3. our results also indicate that in terms of power, H3 dominates H2 and H2 dominates H1
irrespective of whether identification is deficient or not. In the same way, T2 dominates T4 ,
T4 dominates T1 and T1 dominates T3 .
We now analyze the performance of the proposed Monte Carlo exogeneity tests.
19
Table 1. Power of exogeneity tests at nominal level 5%; G = 2, T = 50
k2
(a1 , a2 )′ = (−20, 0)′
η 1 = 0 η 1 = .01 η 1 = .5
η2 = 0
η2 = 0
η2 = 0
(a1 , a2 )′ = (−5, 5)′
η 1 = 0 η 1 = .01 η 1 = .5
η2 = 0
η2 = 0
η2 = 0
(a1 , a2 )′ = (0, 0)′
η 1 = 0 η 1 = .01 η 1 = .5
η2 = 0
η2 = 0
η2 = 0
(a1 , a2 )′ = (.5, .2)′
η 1 = 0 η 1 = .01 η 1 = .5
η2 = 0
η2 = 0
η2 = 0
(a1 , a2 )′ = (100, 100)′
η 1 = 0 η 1 = .01 η 1 = .5
η2 = 0
η2 = 0
η2 = 0
20
T1
T2
T3
T4
H1
H2
H3
RH
5
5
5
5
5
5
5
5
4.98
4.98
0
4.64
0
0.01
5.34
4.84
4.6
24.92
0.19
24.07
0.09
0.25
25.73
45.25
65.81
100
97.93
100
92.53
98.09
100
100
5.26
5.04
0.02
4.67
0.01
0.03
5.33
5.36
4.92
6.77
0.05
6.29
0.02
0.05
7.19
7.83
70.9
100
97.85
100
91.83
98.02
100
100
4.87
4.96
0.02
4.63
0.01
0.02
5.27
5.04
5.06
5.38
0.03
4.91
0.02
0.04
5.72
5.2
5.24
5.26
0.59
4.93
0.26
0.74
5.56
4.9
5.09
4.87
0.03
4.51
0
0.04
5.18
4.88
4.84
4.61
0
4.42
0
0
4.92
4.73
19.85
53.19
29.02
52
17.97
31.42
54.41
41.31
4.94
4.91
0.01
4.62
0
0.02
5.31
5.02
4.18
76.71
5.83
76.25
3.59
6.89
77.11
100
70.09
100
97.93
100
92.48
98.14
100
100
T1
T2
T3
T4
H1
H2
H3
RH
10
10
10
10
10
10
10
10
4.9
5.01
0.35
4.65
0.16
0.46
5.32
5.17
3.95
17.5
1.88
16.77
1.05
2.3
18.11
57.58
98.38
100
100
100
99.31
100
100
100
4.92
5.19
0.38
4.75
0.18
0.48
5.43
4.83
5.34
6.2
0.29
5.73
0.14
0.42
6.56
7.62
98.93
100
100
100
99.22
100
100
100
4.82
5.16
0.3
4.78
0.2
0.38
5.46
4.83
4.81
4.88
0.33
4.55
0.14
0.43
5.18
5.34
5.25
5.07
1.47
4.72
0.49
1.76
5.41
4.97
4.88
4.77
0.36
4.45
0.14
0.46
5.06
4.93
5.22
5.45
0.3
5.02
0.14
0.39
5.75
5.41
34.18
54.24
43.01
52.81
28.92
45.54
55.31
34.5
4.91
4.8
0.22
4.46
0.1
0.33
5.12
4.57
3.28
50.74
14.7
50.05
9.88
16.85
51.25
100
99.23
100
100
100
99.25
100
100
100
T1
T2
T3
T4
H1
H2
H3
RH
20
20
20
20
20
20
20
20
4.93
4.75
1.95
4.43
1.08
2.32
5.15
4.88
2.26
8.97
3.73
8.42
2.43
4.37
9.36
79.08
99.8
100
100
100
99.89
100
100
100
4.94
4.9
1.82
4.51
1.13
2.26
5.25
5.03
4.64
5.54
2.01
5.21
1.08
2.6
5.73
8.36
99.78
100
100
100
99.82
100
100
100
4.9
5.09
2.1
4.74
1.13
2.67
5.4
5.38
5.02
5.32
2.02
5.04
1.2
2.57
5.68
5
5.07
4.99
2.79
4.61
1.03
3.28
5.41
5.21
5.02
4.95
2.01
4.63
1.08
2.46
5.23
5.07
4.93
4.94
1.95
4.57
1.21
2.48
5.18
5.04
39.4
49.34
44.9
47.89
29.88
47.46
50.31
24.88
5.02
4.92
1.94
4.52
1.15
2.33
5.23
5.3
1.5
17.32
9.2
16.45
6.44
10.39
17.76
100
99.96
100
100
100
99.7
100
100
100
Table 1 (continued). Power of exogeneity tests at nominal level 5%; G = 2, T = 50
k2
(a1 , a2 )′ = (−20, 0)′
η 1 = 0 η 1 = .01 η 1 = .5
η 2 = .5 η 2 = .5 η 2 = .5
(a1 , a2 )′ = (−5, 5)′
η 1 = 0 η 1 = .01 η 1 = .5
η 2 = .5 η 2 = .5 η 2 = .5
(a1 , a2 )′ = (0, 0)′
η 1 = 0 η 1 = .01 η 1 = .5
η 2 = .5 η 2 = .5 η 2 = .5
(a1 , a2 )′ = (.5, .2)′
η 1 = 0 η 1 = .01 η 1 = .5
η 2 = .5 η 2 = .5 η 2 = .5
(a1 , a2 )′ = (100, 100)′
η 1 = 0 η 1 = .01 η 1 = .5
η 2 = .5 η 2 = .5 η 2 = .5
21
T1
T2
T3
T4
H1
H2
H3
RH
5
5
5
5
5
5
5
5
4.73
5.1
0.63
4.77
0.27
0.77
5.48
5.13
15.16
37.9
18.25
36.89
10.48
20.16
38.88
28.27
81.58
100
98.68
100
90.44
98.82
100
100
69.69
100
98.15
100
92
98.33
100
100
68.76
100
98.26
100
92.3
98.43
100
100
78.22
100
98.50
100
92.20
98.52
100
100
4.91
5.51
0.75
5.06
0.39
0.87
5.83
4.77
5.26
5.29
0.85
4.98
0.29
0.96
5.64
5.13
5
5.2
0.83
4.78
0.32
0.99
5.41
5.17
8.01
12.95
3.82
12.24
1.93
4.44
13.39
9.81
7.48
12.42
3.47
11.72
1.69
4.08
12.95
10.28
24.2
64.31
42.79
63.06
24.39
45.64
65.44
50.59
63.6
100
97.43
100
92.4
97.59
100
100
65.14
100
97.09
100
91.95
97.31
100
100
78.04
100
98.52
100
92.12
98.64
100
100
T1
T2
T3
T4
H1
H2
H3
RH
10
10
10
10
10
10
10
10
5.18
5.29
1.7
4.96
0.73
2
5.61
5.24
26.81
41.58
31.1
40.35
18.21
33.67
42.64
24.16
99.76
100
99.98
100
98.22
99.98
100
100
98.81
100
99.97
100
99.08
99.98
100
100
99.17
100
99.99
100
98.98
100
100
100
99.56
100
100
100
98.9
100
100
100
5.26
4.92
1.58
4.57
0.55
1.88
5.3
4.92
5.3
5.19
1.6
4.87
0.5
2.03
5.53
5.07
4.86
5.07
1.88
4.67
0.48
2.31
5.38
5.11
11.05
13.49
7.75
12.81
3.34
8.65
14.05
8.55
11.61
14.75
8.29
14
3.88
9.3
15.32
8.94
43.71
66.24
57.52
65.15
32.85
60.4
67.3
43.87
99.12
100
100
100
99.28
100
100
100
99.28
100
100
100
99.26
100
100
100
99.74
100
100
100
98.29
100
100
100
T1
T2
T3
T4
H1
H2
H3
RH
20
20
20
20
20
20
20
20
5.12
5.06
2.97
4.7
1.2
3.59
5.32
5.46
27.67
34.7
30.26
33.32
17.73
32.57
35.69
16.17
99.96
100
100
100
99.24
100
100
100
99.45
100
100
100
99.93
100
100
100
99.48
100
100
100
99.91
100
100
100
99.62
100
100
100
99.93
100
100
100
4.86
4.93
3.2
4.57
1.1
3.65
5.25
5.2
4.91
4.77
2.88
4.45
1.03
3.39
5.06
4.64
4.29
4.3
2.74
3.97
0.72
3.27
4.55
4.82
10.45
11.85
9.14
11.13
4.51
10.24
12.42
7.45
10.95
12.03
9.14
11.34
4.53
10.25
12.55
7.45
41.15
51.76
47.52
50.35
27.81
50.07
52.91
26.62
99.91
100
100
100
99.77
100
100
100
99.9
100
100
100
99.81
100
100
100
99.94
100
100
100
98.75
100
100
100
Table 2. Power of exogeneity tests at nominal level 5% with Cauchy errors; G = 2, T = 50
k2
(a1 , a2 )′ = (−20, 0)′
η 1 = 0 η 1 = .01 η 1 = .5
η2 = 0
η2 = 0
η2 = 0
(a1 , a2 )′ = (−5, 5)′
η 1 = 0 η 1 = .01 η 1 = .5
η2 = 0
η2 = 0
η2 = 0
(a1 , a2 )′ = (0, 0)′
η 1 = 0 η 1 = .01 η 1 = .5
η2 = 0
η2 = 0
η2 = 0
(a1 , a2 )′ = (.5, .2)′
η 1 = 0 η 1 = .01 η 1 = .5
η2 = 0
η2 = 0
η2 = 0
(a1 , a2 )′ = (100, 100)′
η 1 = 0 η 1 = .01 η 1 = .5
η2 = 0
η2 = 0
η2 = 0
22
T1
T2
T3
T4
H1
H2
H3
RH
5
5
5
5
5
5
5
5
4.96
5.08
0.05
4.82
0.04
0.07
5.41
5.13
4.94
8.58
0.08
8.08
0.02
0.11
9.01
12.29
4.9
59.38
4.91
58.8
3.26
5.86
59.84
82.91
4.98
5.48
0.02
5.19
0.01
0.05
5.81
5.61
4.95
6.02
0.03
5.62
0
0.04
6.38
6.79
5.14
24.51
0.65
23.87
0.33
0.81
25.21
40.66
5.19
5.46
0
5.14
0
0
5.67
6.04
5.23
5.38
0.03
5
0.01
0.05
5.7
5.98
4.97
5.29
0.05
4.9
0.02
0.05
5.64
5.93
5.1
5.15
0.02
4.75
0.01
0.02
5.49
4.88
5.34
4.97
0.01
4.63
0
0.03
5.23
4.43
5.23
5.54
0.02
5.14
0.01
0.03
5.77
5.06
5.01
5.32
0.01
5.01
0
0.02
5.57
6.12
4.83
44.68
1.62
44
0.91
2.02
45.34
68.34
6.66
81.16
8.71
80.76
6.2
9.95
81.48
96.73
T1
T2
T3
T4
H1
H2
H3
RH
10
10
10
10
10
10
10
10
5.61
5.42
0.39
5.08
0.17
0.49
5.61
6.09
4.79
6.48
0.44
6.09
0.17
0.65
6.8
11.71
5.07
38.72
10.96
38.06
7.6
12.56
39.32
81.63
4.97
5.53
0.3
5.24
0.11
0.46
5.8
6.41
5.2
5.41
0.3
5
0.13
0.42
5.64
5.77
4.63
9.28
0.8
8.86
0.42
1.11
9.66
22.73
4.83
4.79
0.31
4.45
0.16
0.4
5.01
5.06
5.48
5.17
0.28
4.87
0.08
0.38
5.42
4.67
4.7
4.81
0.32
4.46
0.09
0.45
5.05
4.98
5.04
4.92
0.34
4.61
0.14
0.44
5.19
4.22
5.08
4.94
0.28
4.63
0.17
0.39
5.2
4.63
5.22
5.14
0.19
4.83
0.09
0.33
5.43
4.77
5.09
5.01
0.38
4.69
0.14
0.51
5.33
3.86
2.95
22.57
3.53
21.74
2.06
4.19
23.16
62.53
3.24
54.36
18.51
53.51
13.04
20.96
55.1
96.32
T1
T2
T3
T4
H1
H2
H3
RH
20
20
20
20
20
20
20
20
5.27
5.34
2.03
5.03
1.21
2.54
5.72
6.3
5.02
5.4
2.16
5.13
1.25
2.62
5.69
9.15
3.63
13.09
6.97
12.58
4.78
8.03
13.49
75.83
4.64
4.94
1.8
4.6
1.05
2.27
5.14
4.05
4.63
4.9
1.77
4.57
1.01
2.25
5.12
4.15
4.35
6.76
2.45
6.42
1.61
3.25
7.07
23.42
4.96
4.85
1.95
4.47
1.14
2.3
5.21
6.55
5.27
5.06
2.09
4.68
1.19
2.62
5.4
6.42
5.09
4.98
1.87
4.67
1.06
2.33
5.29
6.83
4.77
4.76
1.91
4.48
0.94
2.35
5.04
5.49
5.16
5.26
2.19
5.01
1.35
2.56
5.5
5.03
4.85
4.98
1.88
4.7
1.09
2.4
5.27
5.27
5.1
4.84
2.06
4.57
1.26
2.43
5.08
5.01
3
8.73
3.74
8.2
2.42
4.38
9.06
54.94
2.55
18.56
11.08
18.04
8.21
12.28
19.12
94.83
Table 2 (continued). Power of exogeneity tests at nominal level 5% with Cauchy errors; G = 2, T = 50
k2
(a1 , a2 )′ = (−20, 0)′
η 1 = 0 η 1 = .01 η 1 = .5
η 2 = .5 η 2 = .5 η 2 = .5
(a1 , a2 )′ = (−5, 5)′
η 1 = 0 η 1 = .01 η 1 = .5
η 2 = .5 η 2 = .5 η 2 = .5
(a1 , a2 )′ = (0, 0)′
η 1 = 0 η 1 = .01 η 1 = .5
η 2 = .5 η 2 = .5 η 2 = .5
(a1 , a2 )′ = (.5, .2)′
η 1 = 0 η 1 = .01 η 1 = .5
η 2 = .5 η 2 = .5 η 2 = .5
(a1 , a2 )′ = (100, 100)′
η 1 = 0 η 1 = .01 η 1 = .5
η 2 = .5 η 2 = .5 η 2 = .5
23
T1
T2
T3
T4
H1
H2
H3
RH
5
5
5
5
5
5
5
5
5.21
4.51
0.01
4.24
0
0.04
4.84
4.36
4.86
7.01
0.01
6.6
0
0.01
7.31
8.62
4.43
52.88
3.51
52.29
2.22
4.22
53.53
77.57
4.53
22.05
0.72
21.41
0.42
0.89
22.61
37.03
5.03
22.32
0.6
21.56
0.21
0.75
22.8
36.83
4.65
33.84
1.47
33.16
0.95
1.74
34.49
53.81
4.88
4.83
0.01
4.42
0
0.02
5.19
5.32
4.42
4.77
0.01
4.5
0
0.03
5.08
5.34
4.88
5.01
0.04
4.7
0.01
0.05
5.42
5.36
5.31
5.02
0.01
4.64
0
0.02
5.41
5.03
5.16
4.91
0.06
4.63
0.02
0.06
5.24
5.29
5.05
5.52
0.02
5.07
0.01
0.03
5.77
5.46
4.64
77.26
8.81
76.83
6.38
10.02
77.6
96.24
4.66
77.11
9.01
76.71
6.37
10.14
77.53
96.42
5.31
78.14
10.27
77.77
7.38
11.51
78.63
97.63
T1
T2
T3
T4
H1
H2
H3
RH
10
10
10
10
10
10
10
10
4.72
4.53
0.23
4.16
0.08
0.34
4.91
4.94
4.97
6.71
0.49
6.3
0.25
0.75
7.18
9.41
4.34
36.17
10.23
35.3
7.01
11.8
36.81
78.79
4.87
13.87
1.6
13.24
0.9
2.03
14.51
34.19
5.41
13.91
1.95
13.31
1.04
2.38
14.37
33.03
5.3
17.44
3.09
16.82
1.86
3.62
18.09
45.3
5.2
4.94
0.34
4.65
0.12
0.44
5.17
5.36
5.3
5.01
0.34
4.68
0.19
0.43
5.45
4.98
5.16
5.11
0.27
4.7
0.15
0.35
5.45
5.44
4.89
5.11
0.27
4.77
0.08
0.42
5.37
5.11
4.93
5.14
0.34
4.73
0.09
0.49
5.44
5.01
4.7
5.15
0.31
4.85
0.11
0.41
5.46
5.46
5.07
49.25
16.39
48.54
12.12
18.12
49.86
95.77
4.59
49.57
15.82
48.81
11.64
17.91
50.25
95.26
4.81
52.89
18.7
52.05
13.86
20.72
53.49
97.22
T1
T2
T3
T4
H1
H2
H3
RH
20
20
20
20
20
20
20
20
4.83
4.61
2.04
4.21
1.12
2.44
4.86
6.64
4.39
4.6
1.85
4.34
1.16
2.2
4.93
9.64
2.6
13.11
6.7
12.41
4.61
7.67
13.56
75.85
4.31
6.41
2.6
6.09
1.59
3.04
6.75
18.22
4.21
6.08
2.54
5.79
1.55
3.12
6.36
18.08
3.47
6.78
3
6.48
1.66
3.52
7.23
33.69
4.85
4.65
1.69
4.27
1.01
2.16
4.93
5.31
5.12
4.85
1.99
4.57
1.07
2.48
5.17
5.11
4.67
4.95
1.9
4.73
1.12
2.39
5.26
5.31
4.66
4.56
1.88
4.23
1.08
2.26
4.85
4.38
4.85
4.7
2
4.4
1.15
2.41
4.97
4.64
5.05
5.13
2.23
4.8
1.35
2.76
5.5
4.93
2.26
18.38
11.17
17.78
8.44
12.37
19.04
94.4
2.19
17.85
10.59
17.22
7.93
11.67
18.46
94.26
1.79
18.44
10.62
17.83
7.45
12.04
18.95
96.09
8.2.
Exact Monte Carlo exogeneity (MCE) tests
We now evaluate the empirical performance of the Monte Carlo tests described in the previous
section. To do that, we use the same DGP introduces in Section 8 in addition with a Student typedistribution for model errors. We generate J = 10000 samples of size T = 50 following this DGP.
For each sample r = 1, . . . , J, a replication Wr of the statistic W ∈ {Hi , Hl , RH , i = 1, 2, 3 ; l =
1, 2, 3, 4 } is computed from the simulated sample. In addition, for each sample r, we draw N = 99
realization of the statistic, namely, W j∗ , j = 1, . . . , 99, following the same DGP as above. We then
compute the Monte Carlo test p-value as
p̂r (Wr ) =
∗
∑99
j=1 1(W j ≥ Wr ) + 1
100
(8.8)
and estimate the rejection probability (RP) of the test as the proportion of the p̂r (Wr ) that are less
than α , the nominal level. This yields the following estimate of the RP of the Monte Carlo test:
c MC =
RP
1 10000
∑ 1[ p̂r (Wr ) < α ].
10000 r=1
(8.9)
Tables 3- 5 present the results. We note that all Monte Carlo tests now have approximately correct
size, even when identification is weak. So, size adjustment of all standard DWH tests is feasible,
unlike the conclusion of Staiger and Stock (1997). Furthermore, compared to the standard tests,
all tests power has improved slightly even when instruments are weak in both Cauchy and Student
distributions setups [see Tables 4-5]. But the Monte Carlo tests still exhibit low power when all
instruments are weak. In addition, the Monte Carlo tests seem to have more power when errors
have Gaussian and Student distributions than when their distributions are Cauchy-type.
Overall, our results clearly suggest that finite-sample improvement of standard exogeneity tests
is feasible, whether model errors are Gaussian or not, and identification is strong or weak. Hence,
the view that size adjustment is infeasible for some versions of DWH tests [see for example, Staiger
and Stock (1997)] is questionable.
24
Table 3 . Power of MCE tests with Gaussian errors
k2
(a1 , a2 )′ = (−20, 0)′
η 1 = 0 η 1 = .01 η 1 = .5
η2 = 0
η2 = 0
η2 = 0
(a1 , a2 )′ = (−5, 5)′
η 1 = 0 η 1 = .01 η 1 = .5
η2 = 0
η2 = 0
η2 = 0
(a1 , a2 )′ = (0, 0)′
η 1 = 0 η 1 = .01 η 1 = .5
η2 = 0
η2 = 0
η2 = 0
(a1 , a2 )′ = (.5, .2)′
η 1 = 0 η 1 = .01 η 1 = .5
η 2 = .5 η 2 = .5 η 2 = .5
(a1 , a2 )′ = (100, 100)′
η 1 = 0 η 1 = .01 η 1 = .5
η 2 = .5 η 2 = .5 η 2 = .5
25
T1
T2
T3
T4
H1
H2
H3
RH
5
5
5
5
5
5
5
5
5
5
3
5
4
4
5
5
7
8
9
8
9
9
8
18
30
100
93
100
91
93
100
100
4
5
3
5
3
3
5
5
4
6
4
5
4
4
6
6
41
99
95
99
94
95
99
100
4
5
4
5
4
4
5
5
5
4
3
5
4
5
5
5
5
5
4
5
4
5
5
5
12
13
12
13
12
12
13
13
9
7
8
7
9
8
7
9
13
23
22
23
22
22
23
23
77
100
91
100
91
91
100
100
78
100
94
100
93
94
100
100
81
100
89
100
89
89
100
100
T1
T2
T3
T4
H1
H2
H3
RH
10
10
10
10
10
10
10
10
4
5
4
5
3
4
5
5
7
8
5
8
5
5
8
16
55
99
99
99
99
99
99
100
4
5
4
5
3
4
5
5
3
7
3
7
3
3
7
6
50
99
99
99
98
99
99
100
4
5
3
5
3
4
5
4
5
5
4
5
4
3
5
5
5
5
3
5
3
4
5
4
6
6
5
6
6
5
6
6
5
8
7
8
7
7
8
13
17
23
20
23
21
22
23
24
97
99
99
99
99
99
99
100
95
100
98
100
98
98
100
100
95
98
97
98
96
97
98
100
T1
T2
T3
T4
H1
H2
H3
RH
20
20
20
20
20
20
20
20
5
5
4
5
3
3
4
5
6
7
7
7
6
7
7
6
33
80
82
80
81
82
80
100
4
5
3
5
4
3
5
4
5
7
4
6
4
4
6
7
68
99
99
99
99
99
99
100
4
5
3
4
3
3
5
3
5
4
3
5
3
3
5
5
4
5
3
4
3
3
5
4
6
6
5
6
6
5
6
12
7
8
8
8
7
8
8
13
10
12
10
12
10
11
12
12
88
92
94
92
94
94
92
100
83
84
88
84
89
88
84
100
90
93
93
93
93
93
93
100
Table 4 . Power of MCE tests with Cauchy errors
k2
(a1 , a2 )′ = (−20, 0)′
η 1 = 0 η 1 = .01 η 1 = .5
η2 = 0
η2 = 0
η2 = 0
(a1 , a2 )′ = (−5, 5)′
η 1 = 0 η 1 = .01 η 1 = .5
η2 = 0
η2 = 0
η2 = 0
(a1 , a2 )′ = (0, 0)′
η 1 = 0 η 1 = .01 η 1 = .5
η2 = 0
η2 = 0
η2 = 0
(a1 , a2 )′ = (.5, .2)′
η 1 = 0 η 1 = .01 η 1 = .5
η 2 = .5 η 2 = .5 η 2 = .5
(a1 , a2 )′ = (100, 100)′
η 1 = 0 η 1 = .01 η 1 = .5
η 2 = .5 η 2 = .5 η 2 = .5
26
T1
T2
T3
T4
H1
H2
H3
RH
5
5
5
5
5
5
5
5
4
5
4
5
4
4
5
5
7
8
5
8
5
5
8
7
27
62
38
62
38
38
62
82
4
5
5
5
4
4
5
5
9
6
5
6
5
5
6
5
17
32
20
32
20
20
32
47
2
5
5
5
5
5
5
5
4
5
3
5
5
3
5
5
3
4
3
4
3
3
4
5
6
9
7
9
7
7
9
8
5
6
4
6
3
4
6
6
5
5
4
5
4
4
5
6
38
76
50
76
47
50
76
95
41
81
50
81
51
50
81
96
40
78
50
78
49
50
78
98
T1
T2
T3
T4
H1
H2
H3
RH
10
10
10
10
10
10
10
10
3
5
5
5
4
5
5
5
2
3
5
3
4
5
3
7
24
37
34
37
34
34
37
82
5
5
4
5
4
4
5
4
8
6
3
6
3
3
6
5
10
15
12
15
14
12
15
37
3
4
4
4
4
4
4
4
5
3
2
3
2
2
3
4
4
5
4
5
4
4
5
5
4
6
4
6
4
4
6
7
6
9
7
9
7
7
9
10
5
6
5
4
4
5
6
7
34
55
43
55
43
43
55
96
36
54
49
54
52
49
54
96
37
51
44
51
41
44
51
93
T1
T2
T3
T4
H1
H2
H3
RH
20
20
20
20
20
20
20
20
5
5
5
5
5
5
5
4
6
9
8
9
8
8
9
6
11
15
13
15
15
13
15
73
4
5
5
5
4
5
5
5
7
7
8
7
8
8
7
10
6
8
8
8
8
8
8
31
4
5
4
5
3
4
5
5
4
5
5
5
5
5
5
4
4
4
4
4
4
4
4
5
6
6
5
6
4
5
6
5
6
5
5
5
5
5
5
7
4
5
5
5
5
5
5
8
9
14
15
14
14
15
14
98
12
19
17
19
17
17
19
94
10
12
11
12
12
11
12
98
Table 5 . Power of MCE tests with Student errors
k2
(a1 , a2 )′ = (−20, 0)′
η 1 = 0 η 1 = .01 η 1 = .5
η2 = 0
η2 = 0
η2 = 0
(a1 , a2 )′ = (−5, 5)′
η 1 = 0 η 1 = .01 η 1 = .5
η2 = 0
η2 = 0
η2 = 0
(a1 , a2 )′ = (0, 0)′
η 1 = 0 η 1 = .01 η 1 = .5
η2 = 0
η2 = 0
η2 = 0
(a1 , a2 )′ = (.5, .2)′
η 1 = 0 η 1 = .01 η 1 = .5
η 2 = .5 η 2 = .5 η 2 = .5
(a1 , a2 )′ = (100, 100)′
η 1 = 0 η 1 = .01 η 1 = .5
η 2 = .5 η 2 = .5 η 2 = .5
27
T1
T2
T3
T4
H1
H2
H3
RH
5
5
5
5
5
5
5
5
3
5
4
5
3
4
5
5
6
7
4
7
4
5
5
7
24
97
86
97
86
86
97
100
6
4
6
4
6
6
4
4
7
5
6
5
6
6
5
8
33
92
72
92
71
72
92
100
5
5
4
5
3
4
5
5
4
4
3
5
3
3
4
5
4
5
3
5
4
3
4
4
7
10
10
10
10
10
10
12
7
10
8
10
8
8
10
12
9
10
6
10
6
6
10
10
75
98
89
98
87
89
98
100
68
100
82
100
82
82
100
100
63
99
90
99
90
90
99
100
T1
T2
T3
T4
H1
H2
H3
RH
10
10
10
10
10
10
10
10
2
4
4
4
3
4
4
3
6
7
5
7
6
6
7
7
18
90
83
90
82
83
90
100
3
5
5
5
5
5
5
5
5
5
2
5
2
2
5
8
29
82
77
82
78
77
82
99
4
5
5
5
4
3
5
5
5
5
4
3
3
3
5
4
4
5
3
5
3
4
5
4
5
6
6
6
6
6
6
6
2
4
3
4
3
3
4
7
4
8
8
8
8
8
8
10
52
90
84
90
82
84
90
100
62
90
86
90
86
86
90
100
59
93
88
93
88
88
93
100
T1
T2
T3
T4
H1
H2
H3
RH
20
20
20
20
20
20
20
20
4
5
3
5
4
3
5
5
6
8
6
8
5
6
8
7
15
65
65
65
65
65
65
100
4
5
4
5
4
4
5
5
10
10
11
10
11
11
10
14
15
44
48
44
48
48
44
98
5
5
4
5
3
4
5
4
4
5
4
5
3
4
5
4
5
5
3
4
2
3
5
5
5
5
5
5
5
5
5
5
4
6
7
6
7
7
6
9
4
6
4
6
4
4
6
5
50
64
64
64
64
64
64
100
51
60
61
60
61
61
60
100
55
73
74
73
74
74
73
100
9.
Conclusion
This paper develops a finite-sample analysis of the distribution of the standard Durbin-Wu-Hausman
and Revankar-Hartley specification tests under both the null hypothesis of exogeneity (level) and the
alternative hypothesis of endogeneity (power), with or without identification. Our analysis provides
several new insights and extensions of earlier procedures. The characterization of the finite-sample
distributions of the statistics under the null hypothesis shows that all tests are typically robust to
weak instruments (level is controlled). We provide a characterization of the power of the tests that
clearly exhibits the factors that determine power. We show that exogeneity tests have no power in
the extreme case where all IVs are weak [similar to Staiger and Stock (1997), and Guggenberger
(2010)], but do have power as soon as we have one strong instrument. As a result, exogeneity
tests can detect an exogeneity problem even if not all model parameters are identified, provided
partial identification holds. Moreover, the finite-sample characterization of the distributions of the
tests allows the construction of exact identification-robust exogeneity tests even in cases where
conventional asymptotic theory breaks down. In particular, DWH and RH tests are valid even
if the distribution of the errors does not have moments (Cauchy-type distribution, for example).
We present a Monte Carlo experiment which confirms our finite-sample theory. The large-sample
properties of the tests and estimation issues related to pretesting are examined in Doko Tchatoka
and Dufour (2011).
28
APPENDIX
A. Notes
A.1.
Unified formulation of DWH test statistics
We establish the unified formulation of Durbin-Wu statistics in (4.1) - (3.11), as well as the three
versions of Hausman (1978) statistic. From Wu (1973, Eqs. (2.1), (2.18), (3.16), (3.20)), Tl ,
l = 1, 2, 3, 4 are defined as
T1 = κ 1 Q∗ /Q1 , T2 = κ 2 Q∗ /Q2 , T1 = κ 3 Q∗ /Q3 , T2 = κ 4 Q∗ /Q4 ,
−1
Q∗ = (b1 − b2 )′ (Y ′ A2Y )−1 − (Y ′ A1Y )−1
(b1 − b2 ),
′
(A.1)
(A.2)
∗
Q1 = (y −Y b2 ) A2 (y −Y b2 ), Q2 = Q4 − Q ,
′
(A.3)
′
Q4 = (y −Y b1 ) A1 (y −Y b1 ), Q3 = (y −Y b2 ) A1 (y −Y b2 ),
(A.4)
bi = (Y ′ AiY )−1Y ′ Ai y, i = 1, 2, A1 = M1 , A2 = M − M1 ,
(A.5)
where b1 is the ordinary least squares estimator of β , and b2 is the instrumental variables method
estimator of β . So, from our notations, b1 ≡ β̂ and b2 ≡ β̃ .
So, from (3.9) - (3.11), we have
Q∗ = T (β̃ − β̂ )′ ∆ˆ −1 (β̃ − β̂ ) = T σ̃ 2 (β̃ − β̂ )′ Σ̂2−1 (β̃ − β̂ ),
Q1 = T σ̃ 21 ,
Q3 = T σ̃ 2 ,
(A.6)
Q4 = T σ̂ 2 ,
(A.7)
Q2 = Q4 − Q∗ = T σ̂ 2 − T (β̃ − β̂ )′ ∆ˆ −1 (β̃ − β̂ ) = T σ̃ 22
(A.8)
so that Tl , can be expressed as:
Tl = κ l (β̃ − β̂ )′ Σ̃l−1 (β̃ − β̂ ) ,
l = 1, 2, 3, 4 ,
(A.9)
where κ l , and Σ̃l are defined in (4.1) - (3.11). The formulation in (A.9) shows clearly the link
between Wu (1973) tests and Hausman (1978) test.
A.2.
Regression interpretation of DWH test statistics
Consider equations (3.3) - (3.6). First, we note that H0 and Hb can be written as
H0 : Rθ = 0 ⇔ Rb = a ,
Hb : R∗ θ ∗ = 0 ⇔ R∗ θ ∗ = β − a ,
where R =
′
h
0 0 IG
′
i
and R∗ =
h
IG 0 −IG
i
′
. By definition, we have θ̂ ∗ = [β̃ , γ̃ ′ , b̃′ ]′ and
θ̂ ∗0 = [β̂ , γ̂ ′ , β̂ ]′ , where β̃ and γ̃ are the 2SLS estimators of β and γ and β̂ and γ̂ are the OLS
29
estimators of β and γ based on the following model:
y = Y β + Z1 γ + u, Ŷ = Z Π̂ ,
with Π̂ = (Z ′ Z)−1 Z ′Y. So, we can observe that
−1
θ̂ ∗0 = θ̂ ∗ + (X̂ ′ X̂)−1 R′∗ R∗ (X̂ ′ X̂)−1 R′∗
(−R∗ θ̂ ∗ )
−1
(R∗ θ̂ ∗ ).
S(θ̂ ∗0 ) − S(θ̂ ∗ ) = (θ̂ ∗0 − θ̂ ∗ )′ X̂ ′ X̂(θ̂ ∗0 − θ̂ ∗ ) = (R∗ θ̂ ∗ )′ R∗ (X̂ ′ X̂)−1 R′∗
Furthermore, we have
R∗ θ̂
′
X̂ X̂
(X̂1′ X̂1 )−1
=
h
IG 0 −IG
"
i

β̃


 γ̃  = β̃ − b̃ ,
b̃
#
"

(X̂1′ X̂1 )
0
(X̂1′ X̂1 )−1
0
′
−1
=
,
(
X̂
X̂)
=
0
(V̂ ′V̂ )
0
(V̂ ′V̂ )−1
#
#−1 "
"
M11 M12
Ŷ ′Ŷ Ŷ ′ Z1
,
=
=
′
′
M21 M22
Z1Ŷ Z1 Z1
#
,
−1
−1 ′
= [Y ′ (M1 − M)Y ]−1 . So,
= Ŷ M1Ŷ
where M11 = (Ŷ ′Ŷ ) − Ŷ ′ Z1 (Z1′ Z1 )−1 Z1′ Ŷ
(X̂ ′ X̂)−1 R′∗


 

M11 M12
0
IG
M11


 

=  M21 M22
0
M21
 0  = 

′
−1
′
−1
0
0 (V̂ V̂ )
−IG
−(V̂ V̂ )
R∗ (X̂ ′ X̂)−1 R′∗ = M11 + (V̂ ′V̂ )−1
 


M11
β̂ − β̃
 


′
−1 −1
(b̃ − β̃ ).
θ̂ ∗0 − θ̂ ∗ =  γ̂ − γ̃  = 
M21
 M11 + (V̂ V̂ )
′
−1
β̂ − b̃
−(V̂ V̂ )
Hence, we get
−1
−1
(b̃ − β̃ ) = M11 M11 + (V̂ ′V̂ )−1
ã ,
β̂ − β̃ = M11 M11 + (V̂ ′V̂ )−1
where ã = b̃ − β̃ is the OLS estimate of a from (3.4). We see from (??) that
ã = b̃ − β̃
So, we have
−1
M11 + (V̂ ′V̂ )−1 M11
(β̂ − β̃ )
′
= [Y (M1 − M)Y ]−1 + (V̂ ′V̂ )−1 [Y ′ (M1 − M)Y ](β̂ − β̃ ) .
=
−1
(R∗ θ̂ ∗ )
S(θ̂ ∗0 ) − S(θ̂ ∗ ) = (R∗ θ̂ ∗ )′ R∗ (X̂ ′ X̂)−1 R′∗
30
(A.10)
−1
= (b̃ − β̃ )′ [Y ′ (M1 − M)Y ]−1 + (V̂ ′V̂ )−1
(b̃ − β̃ )
′
′ ′
−1
= (β̂ − β̃ ) [Y (M1 − M)Y ] [Y (M1 − M)Y ] + (V̂ ′V̂ )−1 ×
−1
−1
M11 + (Y ′ MY )−1 M11
(β̂ − β̃ )
[Y ′ (M1 − M)Y ](β̂ − β̃ ) = (β̂ − β̃ )′ M11
−1
−1 −1
−1
= (β̂ − β̃ )′ M11
(A.11)
M11 + (Y ′ M1Y − M11
)
M11
(β̂ − β̃ ) .
Now, we can apply the following lemma which proof is straightforward and then, is omitted.
Lemma A.1 Let A and B be two nonsingular r × r matrices. Then
A−1 − B−1 = B−1 (B − A)A−1
= A−1 (B − A)B−1
= A−1 (A − AB−1 A)A−1
= B−1 (BA−1 B − B)B−1 .
Furthermore, if B − A is nonsingular, then A−1 − B−1 is nonsingular with
(A−1 − B−1 )−1 = A(B − A)−1 B = A + A(B − A)−1 A = A[A−1 + (B − A)−1 ]A
= B(B − A)−1 A = B(B − A)−1 B − B = B[(B − A)−1 − B−1 ]B
= A(A − AB−1 A)−1 A
= B(BA−1 B − B)−1 B.
−1
By setting A = M11
and B = Y ′ M1Y in (A.11), and applying Lemma A.1, we get
−1
−1
−1 −1
M11 + (Y ′ M1Y − M11
)
M11 (β̂ − β̃ )
S(θ̂ ∗0 ) − S(θ̂ ∗ ) = (β̂ − β̃ )′ M11
= (β̂ − β̃ )′ A A−1 + (B − A)−1 A(β̂ − β̃ ) = (β̂ − β̃ )′ (B−1 − A−1 )−1 (β̂ − β̃ )
= (β̂ − β̃ )′ {[Y ′ (M1 − M)Y ]−1 − (Y ′ M1Y )−1 }−1 (β̂ − β̃ )
1
1
−1
−1 −1
(β̃ − β̂ )′ [Ω̂IV
=
− Ω̂LS
] (β̃ − β̂ ) = (β̃ − β̂ )′ ∆ˆ −1 (β̃ − β̂ ) ,
T
T
(A.12)
where Ω̂IV = T1 Y ′ (M1 − M)Y and Ω̂LS = T1 Y ′ M1Y. Note also that
S(θ̂ ∗0 ) − S(θ̂ ∗ ) = S(θ̂ 0 ) − S(θ̂ ) = ã′ [V̂ ′ MX V̂ ]ã ,
(A.13)
where MX = I − PX = I − X(X ′ X)−1 X ′ , X = [Y, Z1 , V̂ ]. Moreover, from (3.12), we have
0
S(θ̂ ) = T σ̃ 22 , S(θ̂ 0 ) = T σ̂ 2 , S∗ (θ̂ ∗ ) = T σ̃ 2 .
(A.14)
Hence, except for H1 , the other statistics can be expressed as:
0
H2 = T [S(θ̂ 0 ) − S(θ̂ )]/S∗ (θ̂ ∗ ), H3 = T [S(θ̂ 0 ) − S(θ̂ )]/S(θ̂ 0 ) ,
T1 =
0
κ 1 [S(θ̂ 0 ) − S(θ̂ )]/[S∗ (θ̂ ∗ ) − Se (θ̂ )],
31
T2 = κ 2 [S(θ̂ 0 ) − S(θ̂ )]/S(θ̂ ) ,
(A.15)
(A.16)
0
T3 = κ 3 [S(θ̂ 0 ) − S(θ̂ )]/S∗ (θ̂ ∗ ), T4 = κ 4 [S(θ̂ 0 ) − S(θ̂ )]/S(θ̂ 0 ) ,
RH = κ [S̄(θ̄ˆ ) − S̄(θ̄ˆ )]/S̄(θ̄ˆ ) ,
R
0
0
(A.17)
(A.18)
Equations (A.15) - (A.18) are the regression interpretation of DWH and RH statistics.
B.
Proofs
P ROOF OF L EMMA 6.1
β̃
Note first that
= β + [Y ′ (M1 − M)Y ]−1Y ′ (M1 − M)u = β + Ā1 u,
Ā1 = [Y ′ (M1 − M)Y ]−1Y ′ (M1 − M) ,
β̂
β̃ − β̂
= β + (Y ′ M1Y )−1Y ′ M1 u = β + A1 u,
(B.1)
A1 = (Y ′ M1Y )−1Y ′ M1
= (Ā1 − A1 )u, (β̃ − β̂ )′ ∆ˆ −1 (β̃ − β̂ ) = u′C0 u ,
(B.2)
(B.3)
with C0 = (Ā1 − A1 )′ ∆ˆ −1 (Ā1 − A1 ). We also have
M1 (y −Y β̃ ) = B̄1 u ,
(B.4)
M(y −Y β̃ ) = Mu − MY Ā1 u = Mu − MM1Y Ā1 u = MM(M1 −M)Y u ,
(B.5)
where B̄1 = M1 − P(M1 −M)Y = M1 (I − P(M1 −M)Y ) = M1 M(M1 −M)Y , and
σ̃ 2 =
1 ′
u M1 M(M1 −M)Y u = u′ D̄1 u,
T
σ̃ 21 = σ̃ 2 − σ̂ 2 = u′ (D̄1 − D1 )u =
σ̃ 22 =
σ̂ 2 =
1 ′
u M1 MM1Y u = u′ D1 u ,
T
1 ′
u (M1 − M)M(M1 −M)Y u ,
T
1 ′
u M1 MM1Y u − u′C0 u = u′ (D1 −C0 )u .
T
(B.6)
(B.7)
(B.8)
Now, from (B.1) - (B.8) and the definitions of the statistics, we get:
H2 = Tu′C0 u/u′ D̄1 u = T (u/σ u )′C0 (u/σ u )/(u/σ u )′ D̄1 (u/σ u ) ,
(B.9)
H3 = Tu C0 u/u D1 u = T (u/σ u ) C0 (u/σ u )/(u/σ u ) D1 (u/σ u ) ,
(B.10)
T1 = κ 1 (u/σ u )′C0 (u/σ u )/(u/σ u )′ (D̄1 − D1 )(u/σ u ),
(B.11)
T2 = κ 2 (u/σ u )′C0 (u/σ u )/(u/σ u )′ (D1 −C0 )(u/σ u ) ,
(B.12)
T3 = κ 3 (u/σ u )′C0 (u/σ u )/(u/σ u )′ D̄1 (u/σ u ),
(B.13)
T4 = κ 4 (u/σ u )′C0 (u/σ u )/(u/σ u )′ D1 (u/σ u ) .
(B.14)
′
′
′
′
Under H0 , Y is independent of u, and if further the instruments Z are exogenous, the conditional
distribution, given X̄ of all statistics in (B.9) - (B.14) depend only on the distribution of u/σ u , irrespective of whether identification is strong or weak. The same result holds for H1 . By observing
32
that
1
T (MX1
− MX̄ ) = PD1 Z2 , RH can also be expressed as:
RH
= κ R (u/σ u )′ PD1 Z2 (u/σ u )/k2 /(u/σ u )′ (D1 − PD1 Z2 )(u/σ u ) .
(B.15)
Thus, under H0 , the distribution of RH , given X̄, only depends on u/σ u , whether Rank(Π2 ) = G
or not.
P ROOF OF L EMMA 6.1 Consider the identities expressing Hi , i = 1, 2, 3, Tl , l = 1, 2, 3, 4, and
RH in (B.9) - (B.15). Under H1 , we have u = Va + ε and the results of Theorem 5.2 follow.
P ROOF OF L EMMA 6.1
Suppose that a ∈ N (Cπ ). Then, we can show that
(Ā1 − A1 )Va = 0, C0Va = 0, D̄1Va = 0, D1Va = 0,
MX1 Va = D1Va = 0, MX̄ Va = D1Va − PD1 Z2 Va = 0 ,
(B.16)
(B.17)
where Ā1 , A1 , C0 , D̄1 , and D1 are defined in (B.1) - (B.8) and (B.15).
To simplify, let us prove that (Ā1 − A1 )Va = 0. First, note that V = Y − Z1 Π1 − Z2 Π2 so that
(Ā1 − A1 )Va = [Ā1Y − A1Y − (Ā1 − A1 )(Z1 Π1 + Z2 Π2 )]a. Since Ā1Y = IG = A1Y, hence we have
(Ā1 − A1 )Va = −[(Ā1 − A1 )(Z1 Π1 + Z2 Π2 )]a = −(Ā1 − A1 )Z2 Π2 a ,
(B.18)
because Ā1 Z1 = A1 Z1 = 0. Now, we observe that (M1 − M)Z2 = M1 Z2 , hence (Ā1 − A1 )Z2 Π2 a =
−1
−1
(Ω̂IV
)M1 Z2 Π2 a, which equals zero if and only M1 Z2 Π2 a = 0, i.e. Π2′ Z2′ M1 Z2 Π2 a or equiv− Ω̂LS
alently, a ∈ N (Cπ ). So, we have a ∈ N (Cπ ) if and only if (Ā1 − A1 )Va = 0. The proof is similar
for the other identities in (B.16)-(B.17). Thus by substituting these identities in Theorem 5.2, we
get the results of Corollary 5.3.
Suppose now that (5.13)-(5.17) hold. It is easy to see from Theorem 5.2 that this equivalent to
(Ā1 − A1 )Va = 0, C0Va = 0, D̄1Va = 0, D1Va = 0, PD1 Z2 Va = 0
(B.19)
with probability 1. However, we know that (B.19) holds if and only if a ∈ N (Cπ ). Hence the result
follows.
P ROOF OF L EMMA 6.1
To simplify the proof, let us focus on H3 . We recall that
H3 = T (β̃ − β̂ )′ Σ̂3−1 (β̃ − β̂ ) ,
(B.20)
where β̂ = (Y ′ M1Y )−1Y ′ M1 y, β̃ = [Y ′ (M1 − M)Y ]−1Y ′ (M1 − M)y, Σ̂3 = σ̂ 2 [(Y ′ (M1 − M)Y /T )−1 −
(Y ′ M1Y /T )−1 ], and σ̂ 2 = (y − Y β̂ )′ M1 (y − Y β̂ )/T . Let us replace y and Y by y∗ = yR11 + Y R21
and Y∗ = Y R22 in (B.20). Then, we get:
H3∗ = T (β̃ ∗ − β̂ ∗ )′ Σ̂∗−1
3 (β̃ ∗ − β̂ ∗ )
33
(B.21)
where β̂ ∗ , β̃ ∗ , Σ̂3∗ , and σ̂ 2∗ are also obtained by replacing y by y∗ = yR11 + Y R21 and Y by Y∗ =
Y R22 . Now, we have:
Y∗′ M1Y∗ = R′22Y ′ M1Y R22 = R′22Y ′ M1Y R22 , Y∗′ M1 y∗ = R′22 (Y ′ M1 yR11 +Y ′ M1Y R21 )
(B.22)
so that we get:
′
′
−1 −1
′
′
′
−1
β̂ ∗ = R−1
22 (Y M1Y ) (R22 ) R22 (Y M1 yR11 +Y M1Y R21 ) = R22 (β̂ R11 + R21 ) ,
−1
β̃ ∗ = (Y∗′ (M1 − M)Y∗ )−1Y∗′ (M1 − M)y∗ = R−1
22 (β̃ R11 + R21 ), β̃ ∗ − β̂ ∗ = R22 (β̃ − β̂ )R11 .
Furthermore, we also have
′
′
−1
(Y∗′ (M1 − M)Y∗ /T )−1 − (Y∗′ M1Y∗ /T )−1 = R−1
− (Y ′ M1Y /T )−1 (R−1
22 (Y (M1 − M)Y /T )
22 ) ,
and, since R11 > 0, we get
−1
(β̃ ∗ − β̂ ∗ )′ (Y∗′ (M1 − M)Y∗ /T )−1 − (Y∗′ M1Y∗ /T )−1
(β̃ ∗ − β̂ ∗ )
′
′
−1
′
−1 −1
2
(β̃ − β̂ ) .
= R11 (β̃ − β̂ ) (Y (M1 − M)Y /T ) − (Y M1Y /T )
By the same way, we find
−1 ′
y∗ − Ȳ β̂ ∗ = yR11 +Y R22 −Y R22 R22Y ′ (M1 − M)Y R22
Y R22 M1 (yR11 +Y R22 )
= yR11 +Y R22 −Y β̂ R11 −Y R22 = (y −Y β̂ )R11 .
σ̂ 2∗ = (y∗ − Ȳ β̂ ∗ )′ M1 (y∗ − Ȳ β̂ ∗ )/T = R211 (y −Y β̂ )′ M1 (y − Ȳ β̂ )/T = R211 σ̂ 2 .
Hence, from (B.21), we can see that
−1
H3∗ = T R211 (β̃ − β̂ )′ (Y∗′ (M1 − M)Y∗ /T )−1 − (Y∗′ M1Y∗ /T )−1
(β̃ − β̂ )/R211 σ̂ 2
h
i−1
= T (β̃ − β̂ )′ σ̂ 2 (Y∗′ (M1 − M)Y∗ /T )−1 − σ̂ 2 (Y∗′ M1Y∗ /T )−1
(β̃ − β̂ )
= H3
(B.23)
and the same invariance holds for the author statistics so that Lemma 6.1 follows.
P ROOF OF T HEOREM 6.2
Lemma 6.1, we can write:
Let us replace y by ȳ and Y by Ȳ in the expressions of the statistics. By
Hi = T (β̃ ∗ − β̂ ∗ )′ Σ̂i∗−1 (β̃ ∗ − β̂ ∗ ) , i = 1, 2, 3 ,
Tl =
RH
=
−1
(β̃ ∗ − β̂ ∗ ) ,
κ l (β̃ ∗ − β̂ ∗ ) Σ̃l∗
′
κ R ȳ Σ̂∗R ȳ ,
′
l = 1, 2, 3, 4 ,
(B.24)
(B.25)
(B.26)
where β̂ ∗ , β̃ ∗ , Σ̂∗i , Σ̃∗l and Σ̂∗R are the correspondents of β̂ , β̃ , Σ̂i and Σ̃l defined in (4.2)-(3.11).
34
From (6.40) and by observing that and MZ2 = 0, we have
M ȳ = M v̄ , MȲ = MV̄ , M1 ȳ = M1 (µ 1 + v̄) , M1Ȳ = M1 (µ 2 + V̄ ) ,
−1
where µ 1 = M1 Z2 Π2 ζ = µ 2 P22
ζ and µ 2 = M1 Z2 Π2 P22 , where ζ = β P11 + P21 . From (??), we get:
Ȳ ′ (M1 − M)ȳ = (µ 2 + V̄ )′ (M1 − M)(µ 1 + v̄), Ȳ ′ M1 ȳ = (µ 2 + V̄ )′ M1 (µ 1 + v̄),
Ȳ ′ M1Ȳ
Ȳ ′ (M1 − M)Ȳ
(B.27)
= (µ 2 + V̄ )′ M1 (µ 2 + V̄ ) = ΩLS (µ 2 , V̄ ) ,
(B.28)
= (µ 2 + V̄ )′ (M1 − M)(µ 2 + V̄ ) = ΩIV (µ 2 , V̄ ) ,
(B.29)
so that β̂ ∗ = ΩLS (µ 2 , V̄ )−1 (µ 2 + V̄ )′ M1 (µ 1 + v̄), β̃ ∗ = ΩIV (µ 2 , V̄ )−1 (µ 2 + V̄ )′ (M1 − M)(µ 1 + v̄),
and β̃ ∗ − β̂ ∗ = C(µ 1 + v̄), where C = ΩIV (µ 2 , V̄ )−1 (µ 2 + V̄ )′ (M1 − M) − ΩLS (µ 2 , V̄ )−1 (µ 2 +
V̄ )′ M1 . Moreover, we have σ̂ 2∗ = T1 (ȳ − Ȳ β̂ ∗ )′ M1 (ȳ − Ȳ β̂ ∗ ) = T1 (µ 1 + v̄)′C∗ (µ 1 + v̄) =
1 2
σ̃ 2∗ = T1 (ȳ − Ȳ β̃ ∗ )′ M1 (ȳ − Ȳ β̃ ∗ ) = T1 (µ 1 + v̄)′ D′∗ D∗ (µ 1 + v̄) =
T ω LS ( µ 1 , µ 2 , V̄ , v̄),
1 2
ω IV (µ 1 , µ 2 , V̄ , v̄), with C∗ = I − M1 (µ 2 + V̄ )ΩLS (µ 2 , V̄ )−1 (µ 2 + V̄ )′ M1 and D∗ =
T
I − M1 (µ 2 + V̄ )ΩIV (µ 2 , V̄ )−1 (µ 2 + V̄ )′ (M1 − M) M1 . Hence, we get
Σ̂1∗ = ω 2IV (µ 1 , µ 2 , V̄ , v̄)ΩIV (µ 2 , V̄ )−1 − ω 2LS (µ 1 , µ 2 , V̄ , v̄)ΩLS (µ 2 , V̄ )−1 ,
1 2
1
Σ̂2∗ =
ω IV (µ 1 , µ 2 , V̄ , v̄)∆ , Σ̂3∗ = ω 2LS (µ 1 , µ 2 , V̄ , v̄)∆ ,
T
T
(B.30)
where ∆ = C′C = ΩIV (µ 2 , V̄ )−1 − ΩLS (µ 2 , V̄ )−1 . If T − k1 − k2 > G, then ∆ > 0, thus
Hi = T [µ 1 + v̄]′Γi (µ 1 , µ 2 , v̄, V̄ )[µ 1 + v̄] ,
i = 1, 2, 3 .
where Γi (µ 1 , µ 2 , v̄, V̄ ), i = 1, 2, 3 are defined in Theorem 6.2. Since T4 = (κ 4 /T )H3 , we find
T4 = κ 4 [µ 1 + v̄]′Γ3 (µ 1 , µ 2 , v̄, V̄ )[µ 1 + v̄] .
(B.31)
In addition, σ̃ 2∗2 = σ̂ 2∗ − σ̃ 2∗ (β̄ ∗ − β̃ ∗ )′ (Γ̄2 )−1 (β̄ ∗ − β̃ ∗ ) and σ̃ 2∗2 = ω 2LS (µ 1 , µ 2 , V̄ , v̄) − (µ 1 +
v̄)′C′ ∆ −1C(µ 1 + v̄) = (µ 1 + v̄)′ (C∗ −C′ ∆ −1C)(µ 1 + v̄) = ω 22 (µ 1 , µ 2 , V̄ , v̄) ≡ ω 22 , hence, we find
T2 =
κ2
[µ + v̄]′C′ ∆ −1C[µ 1 + v̄] .
ω 22 1
(B.32)
In the same way, we also get:
Tl =
κl
[µ + v̄]′C′ ∆ −1C[µ 1 + v̄] ,
ω 2l 1
l = 1, 3, RH =
κR
[µ + v̄]′ PD1Z̄2 [µ 1 + v̄] ,
ω 2R 1
where ω 2l , l = 1, 3 and ω 2R are defined in Section 3.
P ROOF OF L EMMA 6.1
Set Π2 a = 0 in the above proof of Theorem 6.2 and Corollary 6.3 follows.
35
P ROOF OF T HEOREM 6.4
From Theorem 6.2, we have
Tl = κ l [µ 1 + v̄]′Γ̄l (µ 1 , µ 2 , v̄, V̄ )[µ 1 + v̄], Hi = T [µ 1 + v̄]′Γi (µ 1 , µ 2 , v̄, V̄ )[µ 1 + v̄],
= κ R [µ 1 + v̄]′ΓR (µ 1 , µ 2 , v̄, V̄ )[µ 1 + v̄] ,
RH
for all l = 1, 2, 3, 4 and all i = 1, 2, 3, where Γ̄l (µ 1 , µ 2 , V̄ , v̄), Γ̄i (µ 1 , µ 2 , V̄ , v̄), ΓR (µ 1 , µ 2 , V̄ , v̄),
µ 1 , µ 2 , κ l and κ R are defined in Section 3.
Assume that Z is fixed. Under the normality assumption (6.27), µ 1 + v̄ is independent of V̄
and µ 1 + v̄|Z ∼ N(µ 1 , 1). Since C′ ∆ −1C, is symmetric idempotent of rank G, C and ∆ are defined
in Theorem 6.2, we have (µ 1 + v̄)′C′ ∆ −1C(µ 1 + v̄)| V̄ ∼ χ 2 (G, ν 1 ), where ν 1 = µ ′1C′ ∆ −1C µ 1 .
By the same way, the denominator of T1 (without the scaling factor) is (µ 1 + v̄)′ E(µ 1 + v̄)| V̄ ∼
χ 2 (k2 − G, υ 1 ), where E defined in Section 3 is symmetric idempotent of rank k2 − G, and with
υ 1 = µ ′1 E µ 1 . Furthermore, we have (C′ ∆ −1C)E = 0, hence
T1 | V̄ ∼ F(G, k2 − G; ν 1 , υ 1 ).
(B.33)
T2 | V̄ ∼ F(G, T − k1 − 2G; ν 1 , υ 2 ),
(B.34)
By the same way, we get:
where υ 2 = µ ′1 (C∗ −C′ ∆ −1C)µ 1 . Now, from the notations in Theorem 6.2, we can write:
T4 = κ 4 /(1 +
1
),
κ 2 T2
and since T2 | V̄ ∼ F(G, T − k1 − 2G; ν 1 , υ 2 ), we have
T4 | V̄ ∼ κ 4 /[1 +
1
T2 | V̄
(B.35)
∼ F(T − k1 − 2G, G; υ 2 , ν 1 ) so that
1
F(T − k1 − 2G, G; υ 2 , ν 1 )].
κ2
(B.36)
Note also that ω 2LS ≥ ω 22 entails that
T4 | V̄ ≤
κ4
(µ + v̄)′C′ ∆ −1C(µ 1 + v̄)| V̄ = κ̄ ∗2 T2 | V̄ ∼ κ̄ ∗2 F(G, T − k1 − 2G; ν 1 , υ 2 ),
ω 22 1
(B.37)
where κ 2 , κ 4 , κ̄ ∗2 are given in Theorem 6.4. For T3 , we note that its numerator and denominator
are such that
(µ 1 + v̄)′C′ ∆ −1C(µ 1 + v̄)| V̄ ∼ χ 2 (G; ν 1 ), ω 2IV
= (µ 1 + v̄)′ D′∗ D∗ (µ 1 + v̄)
∼ χ 2 (T − k1 − G; ν 3 ) ,
(B.38)
where ν 3 = µ ′1 D′∗ D∗ µ 1 . Since D′∗ D∗ (C′ ∆ −1C) 6= 0, T3 does not follow necessary a F-distribution.
By the same way, we get the results for H2 , H3 and RH .
36
References
Anderson, T. W., Rubin, H., 1949. Estimation of the parameters of a single equation in a complete
system of stochastic equations. Annals of Mathematical Statistics 20, 46–63.
Bekker, P., 1994. Alternative approximations to the distributions of instrumental variable estimators.
Econometrica 62, 657–681.
Cragg, J., Donald, S., 1993. Testing identifiability and specification in instrumental variables models. Econometric Theory 9, 222–240.
Doko Tchatoka, F. , Dufour, J.-M. , 2008. Instrument endogeneity and identification-robust tests:
some analytical results. Journal of Statistical Planning and Inference 138(9), 2649–2661.
Doko Tchatoka, F., Dufour, J.-M., 2011. Exogeneity tests and estimation in IV regressions. Technical report, Department of Economics, McGill University, Canada Montréal, Canada.
Donald, S. G., Newey, W. K., 2001. Choosing the number of instruments. Econometrica 69, 1161–
1191.
Dufour, J.-M., 1987. Linear Wald methods for inference on covariances and weak exogeneity tests
in structural equations. In: I. B. MacNeill , G. J. Umphrey, eds, Advances in the Statistical
Sciences: Festschrift in Honour of Professor V.M. Joshi’s 70th Birthday. Volume III, Time
Series and Econometric Modelling. D. Reidel, Dordrecht, The Netherlands, pp. 317–338.
Dufour, J.-M., 1997. Some impossibility theorems in econometrics, with applications to structural
and dynamic models. Econometrica 65, 1365–1389.
Dufour, J.-M. , 2003. Identification, weak instruments and statistical inference in econometrics.
Canadian Journal of Economics 36(4), 767–808.
Dufour, J.-M. , 2006. Monte carlo tests with nuisance parameters: A general approach to finitesample inference and nonstandard asymptotics in econometrics. Journal of Econometrics
138, 2649–2661.
Dufour, J.-M. , Khalaf, L. , 2002. Simulation based finite and large sample tests in multivariate
regressions. Journal of Econometrics 111(2), 303–322.
Dufour, J.-M., Khalaf, L., Beaulieu, M.-C., 2010. Multivariate residual-based finite-sample tests for
serial dependence and GARCH with applications to asset pricing models. Journal of Applied
Econometrics 25, 263–285.
Dufour, J.-M., Taamouti, M., 2005. Projection-based statistical inference in linear structural models
with possibly weak instruments. Econometrica 73(4), 1351–1365.
Dufour, J.-M., Taamouti, M., 2007. Further results on projection-based inference in IV regressions
with weak, collinear or missing instruments. Journal of Econometrics 139(1), 133–153.
37
Durbin, J., 1954. Errors in variables. Review of the International Statistical Institute 22, 23–32.
Guggenberger, P., 2010. The impact of a Hausman pretest on the size of the hypothesis tests. Econometric Theory 156, 337–343.
Hahn, J., Ham, J., Moon, H. R., 2010. The Hausman test and weak instruments. Journal of Econometrics 160, 289–299.
Hall, A. R., Peixe, F. P. M., 2003. A consistent method for the selection of relevant instruments.
Econometric Reviews 2(3), 269–287.
Hall, A. R., Rudebusch, G. D., Wilcox, D. W., 1996. Judging instrument relevance in instrumental
variables estimation. International Economic Review 37, 283–298.
Hansen, L. P., 1982. Large sample properties of generalized method of moments estimators. Econometrica 50, 1029–1054.
Hansen, L. P., Heaton, J. , Yaron, A. , 1996. Finite sample properties of some alternative GMM
estimators. Journal of Business and Economic Statistics 14, 262–280.
Hansen, L. P. , Singleton, K. , 1982. Generalized instrumental variables estimation of nonlinear
rational expectations models. Econometrica 50, 1269–1286.
Harville, D. A., 1997. Matrix Algebra from a Statistician’s Perspective. Springer-Verlag, New York.
Hausman, J., 1978. Specification tests in econometrics. Econometrica 46, 1251–1272.
Kleibergen, F. , 2002. Pivotal statistics for testing structural parameters in instrumental variables
regression. Econometrica 70(5), 1781–1803.
Kleibergen, F., 2005. Testing parameters in GMM without assuming that they are identified. Econometrica 73, 1103–1124.
Moreira, M. J. , 2003. A conditional likelihood ratio test for structural models. Econometrica
71(4), 1027–1048.
Nelson, C., Startz, R., 1990a. The distribution of the instrumental variable estimator and its t-ratio
when the instrument is a poor one. The Journal of Business 63, 125–140.
Nelson, C., Startz, R., 1990b. Some further results on the exact small properties of the instrumental
variable estimator. Econometrica 58, 967–976.
Phillips, P. C. B., 1989. Partially identified econometric models. Econometric Theory 5, 181–240.
Rao, T., Mitra, C., eds, 1971. Generalized Inverse of Matrices and its Applications. John Wiley &
Sons, Inc. Hoboken, New Jersey.
38
Revankar, N. S., Hartley, M. J., 1973. An independence test and conditional unbiased predictions in
the context of simultaneous equation systems. International Economic Review 14, 625–631.
Sargan, J., 1983. Identification and lack of identification. Econometrica 51, 1605–1633.
Staiger, D., Stock, J. H., 1997. Instrumental variables regression with weak instruments. Econometrica 65(3), 557–586.
Stock, J. H., Wright, J. H., 2000. GMM with weak identification. Econometrica 68, 1055–1096.
Stock, J. H., Wright, J. H., Yogo, M., 2002. A survey of weak instruments and weak identification in
generalized method of moments. Journal of Business and Economic Statistics 20(4), 518–529.
Wang, J., Zivot, E., 1998. Inference on structural parameters in instrumental variables regression
with weak instruments. Econometrica 66(6), 1389–1404.
Wu, D.-M., 1973. Alternative tests of independence between stochastic regressors and disturbances.
Econometrica 41, 733–750.
39
Download