Preliminary and incomplete. Comments welcome.
GMM and hypothesis tests when rank conditions fail: a regularization approach∗
Jean-Marie Dufour †
McGill University
Pascale Valéry ‡
HEC-Montréal
February 20, 2009
∗ This work was supported by the William Dow Chair in Political Economy (McGill University), the Canada Research Chair Program (Chair in Econometrics, Université de Montréal), the Bank of Canada (Research Fellowship), a
Guggenheim Fellowship, a Konrad-Adenauer Fellowship (Alexander-von-Humboldt Foundation, Germany), the Institut
de finance mathématique de Montréal (IFM2), the Canadian Network of Centres of Excellence [program on Mathematics
of Information Technology and Complex Systems (MITACS)], the Natural Sciences and Engineering Research Council of
Canada, the Social Sciences and Humanities Research Council of Canada, and the Fonds de recherche sur la société et la
culture (Québec).
† William Dow Professor of Economics, McGill University, Centre interuniversitaire de recherche en analyse des
organisations (CIRANO), and Centre interuniversitaire de recherche en économie quantitative (CIREQ). Mailing address:
Department of Economics, McGill University, Leacock Building, Room 519, 855 Sherbrooke Street West, Montréal,
Québec H3A 2T7, Canada. TEL: (1) 514 398 8879; FAX: (1) 514 398 4938; e-mail: jean-marie.dufour@mcgill.ca . Web
page: http://www.jeanmariedufour.com
‡ Service de l'enseignement de la finance, École des Hautes Études Commerciales de Montréal (HEC-Montréal), 3000 chemin de la Côte-Sainte-Catherine, Montréal (Québec), Canada H3T 2A7. TEL: 1 (514) 340-7004, FAX: (514) 340-5632. email: pascale.valery@hec.ca
ABSTRACT
This paper considers the problem of testing hypotheses in the presence of (possibly) singular covariance matrices. We focus specifically on the case where the finite-sample covariance matrix is nonsingular but converges to a singular asymptotic population matrix. The econometrician faces this type of situation when the derivative matrix of the restrictions has lower rank only at the true parameter value, or when there is an identification problem asymptotically that cannot be detected in finite samples. More generally, when the econometrician suspects a possible singularity, the usual inverses are discarded and most of the time replaced with generalized inverses, or g-inverses [see Moore (1977) and Andrews (1987) for generalized Wald tests built on the Moore-Penrose inverse], or with modified inverses as proposed by Lütkepohl and Burda (1997). To face such difficulties, we propose modified inverses designed so that the continuity property is preserved (at least for some of them). Our modified inverses can be viewed as a generalization of the one introduced by Lütkepohl and Burda (1997). Our contributions are as follows. First, following Kato (1966) and Tyler (1981), we propose to work with the eigenprojection: the operator which projects onto the eigenspace (i.e. the subspace generated by the eigenvectors) is continuous. In this way, the important continuity property (w.r.t. the elements of the matrix) is preserved for eigenvalues and eigenprojections, even though eigenvectors are not continuous. The eigenvalues are perturbed in order to make them behave well in a certain neighborhood without losing accuracy. Among the different regularizing methods, some can be specifically designed to be continuous; inverses so modified remain continuous. As a result, test statistics are well defined and the distributional theory built from them is not misleading. Second, our regularized inverses are all the more appealing because they naturally underlie a decomposition of the covariance matrix into three components: one, viewed as regular, built on the large eigenvalues, while the others, which involve the small eigenvalues, may not be regular. Based on this decomposition, we identify a convenient decomposition for the test statistic itself. Third, the tests so regularized embed three possible cases. Case 1 corresponds to the usual case of consistent sample covariance matrices with full rank. Case 2 corresponds to singular consistent sample covariance matrices whose rank converges to the rank of the limiting population matrix; in such a case, the limiting distribution is modified only through an adjustment of the degrees of freedom, and this is the case covered by Andrews (1987). Finally, case 3 makes use of singular consistent sample covariance matrices which do not satisfy the rank condition pinned down by Andrews (1987). Under this condition, a regularization is required to provide a valid test, but at the cost of a fully modified asymptotic distribution: the asymptotic distribution is no longer chi-square but can be approximated by simulation. Besides, we also consider monotonic regularizations. In the monotonic case, we are able to bound the distribution of the test statistic; such a bound can then be simulated to provide a valid test. Fourth, cast into our decomposition framework, the generalized Wald tests defined in Andrews (1987) and Lütkepohl and Burda (1997) consist of keeping only the first, regular component of the test statistic and setting the other, nonregular components to zero. In so discarding the latter, their technique may result in a loss of information; revisiting their tests as we do may yield some efficiency gains. Finally, Carrasco and Florens (2000), Carrasco, Chernov, Florens and Ghysels (2007) and Carrasco (2007) use related spectral-decomposition-based tools to regularize estimators when a continuum of moments is used in a GMM framework. By contrast, we focus on testing issues by proposing regularized test statistics for any $\sqrt{n}$-consistent estimator, with the only requirement that $\psi(\hat{\theta}_n)$ be asymptotically normally distributed, so that the delta method applies.
1. Introduction
This paper considers the problem of testing hypotheses in the presence of (possibly) singular covariance matrices. We focus specifically on the case where the finite-sample covariance matrix is nonsingular but converges to a singular asymptotic population matrix. The econometrician faces this type of situation when the derivative matrix of the restrictions has lower rank only at the true parameter value, or when there is an identification problem asymptotically that cannot be detected in finite samples. More generally, when the econometrician suspects a possible singularity, the usual inverses are discarded and most of the time replaced with generalized inverses, or g-inverses [see Moore (1977) and Andrews (1987) for generalized Wald tests built on the Moore-Penrose inverse], or with modified inverses as proposed by Lütkepohl and Burda (1997). However, when using non-standard inverses, we have observed two difficulties of which econometricians are not always aware. First, the well-known continuous mapping theorem, so widely used by econometricians to derive asymptotic distributional results for test statistics, does not apply anymore because g-inverses are not (necessarily) continuous. As a matter of fact, the eigenvectors, unlike the eigenvalues, are not continuous functions of the elements of the matrix of interest. Second, when performing the singular value decomposition of a matrix, the eigenvectors corresponding to eigenvalues with multiplicity larger than one are not uniquely defined, ruling out the convergence of the estimates towards population quantities. Ignoring such concerns may lead to distributional results which are, strictly speaking, wrong.
To face such difficulties, we propose modified inverses designed so that the continuity property is preserved (at least for some of them). Our modified inverses can be viewed as a generalization of the one introduced by Lütkepohl and Burda (1997). Our contributions can be formulated as follows. First, following Kato (1966) and Tyler (1981), we propose to work with the eigenprojection (projection operator) instead, as a device to circumvent the discontinuity and nonuniqueness features of eigenvectors. The eigenprojection projects onto the eigenspace, i.e. the subspace generated by the eigenvectors, which is invariant to the choice of basis. A lemma given by Tyler (1981) states the continuity property for the total eigenprojection (the sum of the eigenprojections over a subset of the eigenvalues). In this way, the important continuity property is preserved for eigenvalues and eigenprojections even though eigenvectors are not continuous. By combining this total eigenprojection technique with a specific design for the perturbation function of the eigenvalues, the inverses introduced in the paper remain continuous. As a result, test statistics are well defined and the distributional theory built from them is not misleading. This continuity property (w.r.t. the elements of the matrix of interest) enjoyed by the inverses is an important contribution to the econometric literature. Second, the regularized inverses involve a variance regularizing function (or VRF) which inflates the small eigenvalues that fall below a certain threshold so that their inverses are well defined, whereas the large eigenvalues remain unchanged. The specification of the VRF allows us to define two classes of tests, depending on whether the threshold used to disentangle the small eigenvalues from the others is fixed or varies with the sample size. The first class of tests, which exploits a fixed threshold, introduces regularized test statistics with a nonstandard distribution (a mixture of chi-squares) that admits the standard chi-square distribution (with degrees of freedom corresponding to the full rank) as an upper bound. This upper bound is very useful to the extent that
the usual critical points can be used to provide an asymptotically valid test. However, the econometrician can do a little better from the viewpoint of power by simulating the test: the usual critical points yield a conservative test, which can lead to a loss of power under the alternative. The second class of tests allows the threshold to vary with the sample size. When the threshold declines to zero at an appropriate rate, the regularized tests are asymptotically distributed as chi-square with a reduced number of degrees of freedom (corresponding to the number of eigenvalues greater than zero), as in Lütkepohl and Burda (1997). However, this second class of tests requires more information regarding the asymptotic behavior of the sample eigenvalues. In this respect, a very general result established by Eaton and Tyler (1994), which generalizes some nonclassical results from a wide strand of the statistical and multivariate analysis literature [see Anderson (1963), Anderson (1987), Amemiya (1986)], provides us with the convergence speed of the sample eigenvalues (in the presence of eigenvalues with multiplicity greater than one) needed for our distributional results. Third, our modified inverses are all the more appealing because they naturally underlie a decomposition of the covariance matrix into three components: one, viewed as regular, built on the large eigenvalues, while the others, which involve the small eigenvalues, may not be regular. Based on this covariance matrix decomposition, we identify a convenient decomposition for the test statistic itself. Fourth, the tests so regularized embed three possible cases. Case 1 corresponds to the usual case of consistent sample covariance matrices with full rank. Case 2 corresponds to sample covariance matrices which converge to a singular limiting matrix under the condition that the rank of the matrix estimator converges to the rank of the limiting population matrix; in such a case, the limiting distribution is modified only through an adjustment of the degrees of freedom, and this is the case covered by Andrews (1987) and Lütkepohl and Burda (1997). Finally, case 3 makes use of consistent sample covariance matrices which do not satisfy the rank condition pinned down by Andrews (1987); under such a circumstance, a regularization is required to provide a valid test, but at the cost of a fully modified asymptotic distribution (under some specific designs). Fifth, our regularized inverses may be viewed as a generalization of Lütkepohl and Burda (1997)'s results, allowing more flexibility in selecting the threshold c that separates the eigenvalues. More generally, the estimate of the perturbation function of the eigenvalues can be viewed as a Hodges' estimator applied to the eigenvalues. With an appropriate choice of the regularization (tuning) parameter, the Hodges' estimator is a consistent estimator not only for the "small" components of the true eigenvalue vector but also for the large ones; this is the so-called superefficiency property enjoyed by the Hodges' estimator. Cast into our decomposition, the generalized Wald tests defined in Andrews (1987) and Lütkepohl and Burda (1997) consist of keeping only the first, regular component of the test statistic and setting the other, nonregular components to zero. In so discarding the latter, their technique may result in a loss of information; revisiting their tests as we do may yield some efficiency gains. More specifically, some of the regularizing functions used are of our own design, while others are commonly used in the literature on inverse problems; see Engl, Hanke and Neubauer (2000) and Kress (1999). Some regularizing functions will be continuous in a neighborhood of the true eigenvalues while others will only be monotonic. Even in the monotonic case, we will be able to bound the distribution of the test statistic; such a bound can then be simulated to provide a valid test. Sixth, whereas Moore (1977), Andrews (1987) and Lütkepohl and Burda (1997) focused on generalized Wald tests, we extend the regularized tests beyond Wald tests alone,
by considering C(α)-type tests. Insofar as they involve a quadratic form, they can be cast into this framework. Following Dufour and Trognon (2001), who provide a comprehensive study of generalized C(α)-type tests, we shall revisit the latter in a regularized setup.
On the other hand, while our main concern is testing, some authors make use of related spectral-decomposition-based tools to regularize estimators when a continuum of moments is used in a GMM or IV framework; see Carrasco and Florens (2000), Carrasco, Chernov, Florens and Ghysels (2007), Carrasco, Florens and Renault (2007), and Carrasco (2007). In particular, Carrasco (2007) proposes some modified IV estimators based on different ways of inverting the covariance matrix of instruments. Indeed, when the number of instruments is very large with respect to the sample size, or even infinite, the covariance matrix of the moment conditions becomes singular and non-standard inverses are required. By contrast, we focus on testing issues by proposing modified test statistics.
The paper is organized as follows. In Section 2 we review the well-known methodology of the usual Wald-type and C(α)-type tests in two different frameworks: a GMM framework, together with a more general setup where the only requirement is an asymptotically Gaussian estimator. We then introduce the regularized inverses together with the regularized statistics in Section 3; more specifically, a decomposition of the statistic is highlighted through the corresponding decomposition of the covariance matrix. In Section 4 we review and adapt some spectral theory elements which serve as useful tools for stating convergence results for the modified inverses. In particular, we highlight some (non)uniqueness and (dis)continuity properties related to the eigenvectors of a given matrix and provide a solution based on eigenprojection techniques to surmount such difficulties. In Section 5, we state new asymptotic distributional results for the regularized statistics involving continuous regularizing functions. Finally, we provide some examples in Section 6. The proofs are gathered in the Appendix.
2. Framework and test statistics
2.1. Basic assumptions
We consider an inference problem about a parameter of interest $\theta \in \Omega \subseteq \mathbb{R}^p$. In order to identify the true value $\theta_0$ of $\theta$, we consider an $m \times 1$ score-type function $D_n(\theta, Y_n)$, where $Y_n = [y_1, y_2, \dots, y_n]'$ is an $n \times k$ stochastic matrix satisfying the following assumptions.
Assumption 2.1 $\sqrt{n}\, D_n(\theta, Y_n) \xrightarrow[n\to\infty]{p} D_\infty(\theta; \theta_0)$, where $D_\infty(\theta; \theta_0)$ is a random variable.

Assumption 2.2 (Identification) $E_{\theta_0}\big[D_\infty(\theta; \theta_0)\big] = 0$ iff $\theta = \theta_0$, where the expectation is taken w.r.t. the true probability distribution.

Assumption 2.3 $D_\infty(\theta; \theta_0) \sim N\big[0, I(\theta_0)\big]$.
Thus $\sqrt{n}\, D_n(\theta_0, Y_n) \xrightarrow[n\to\infty]{d} N\big[0, I(\theta_0)\big]$. Assumption 2.3 leads to a GMM estimator through the
population orthogonality condition
$$E_{\theta_0}\big[D_\infty(\theta, \theta_0)\big] = 0 .$$
Typically, $D_n(\theta, Y_n)$ has the form
$$D_n(\theta, Y_n) = \frac{1}{n}\sum_{t=1}^{n} h_t(\theta, y_t)$$
where $E_{\theta_0}[h_t(\theta_0, y_t)] = 0$. Usually, Assumption 2.2 requires that $m \ge p$. A GMM estimator is obtained by minimizing w.r.t. $\theta$ an objective function of the form
$$M_n(\theta, W_n) = D_n(\theta, Y_n)'\, W_n\, D_n(\theta, Y_n) \qquad (2.1)$$
where $W_n$ is (usually) a symmetric positive definite matrix. Moreover, a standard regularity assumption in the GMM framework is the following.

Assumption 2.4 (Nonsingular weighting matrix) The symmetric positive definite (possibly random) $m \times m$ matrix $W_n$ satisfies the limiting condition:
$$\operatorname*{plim}_{n\to\infty} W_n = W_0 \quad \text{with} \quad \det(W_0) \neq 0 . \qquad (2.2)$$
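As an illustration of (2.1)-(2.2), here is a minimal Python sketch (our own illustration, not from the paper; the moment function and all names are hypothetical) that sets up a just-identified GMM objective and minimizes it numerically:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical moment function h_t(theta; y_t): mean and variance moments
# for i.i.d. data, so that D_n(theta) = (1/n) * sum_t h_t(theta, y_t).
def D_n(theta, y):
    mu, sig2 = theta
    h = np.column_stack([y - mu, (y - mu) ** 2 - sig2])
    return h.mean(axis=0)

def M_n(theta, y, W):
    """GMM objective M_n(theta, W_n) = D_n' W_n D_n, as in (2.1)."""
    d = D_n(theta, y)
    return d @ W @ d

rng = np.random.default_rng(0)
y = rng.normal(1.0, 2.0, size=500)
W = np.eye(2)  # a simple positive definite weighting matrix (Assumption 2.4)
theta_hat = minimize(M_n, x0=np.array([0.0, 1.0]), args=(y, W)).x
print(theta_hat)  # close to (1.0, 4.0)
```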
Later on, we shall relax this nonsingularity condition, which matters especially for the LR-type tests built from a GMM objective function. In this paper, we consider testing problems in the presence of singular covariance matrices; under such circumstances, our purpose is to propose solutions which hold irrespective of such nonsingularity assumptions. Let $\hat\theta_n$ be the unrestricted estimator obtained by minimizing the objective function $M_n(\theta, W_n)$, and $\hat\theta_n^0$ the corresponding constrained estimator under the null hypothesis:
$$H_0(\psi_0) : \psi(\theta) = \psi_0 \qquad (2.3)$$
where $\psi : \mathbb{R}^p \to \mathbb{R}^q$.
The usual distributional theory critically hinges on regularity conditions such as the assumptions reviewed in this section. These assumptions define the standard regular framework upon which the whole usual distributional theory is built. We shall later relax these assumptions, especially those referring to rank conditions usually imposed on the restriction matrix or the covariance matrices of interest.
Assumption 2.5 (Local regularity) There is a (nonempty) open neighborhood $N_1$ of $\theta_0$ such that the $q \times 1$ vector function $\psi(\theta)$ is continuously differentiable with $\operatorname{rank}[P(\theta)] = q$ for all $\theta \in N_1$, where
$$P(\theta) = \partial\psi/\partial\theta' . \qquad (2.4)$$
Let
$$H(\theta_0, Y_n) = \frac{\partial}{\partial\theta'} D_n(\theta_0, Y_n) \xrightarrow[n\to\infty]{p} J(\theta_0) . \qquad (2.5)$$
In the regular framework, the $m \times m$ matrix $I(\theta_0)$ and the $m \times p$ matrix $J(\theta_0)$ are (usually) assumed to be full-column-rank matrices. If
$$D_n(\theta, Y_n) = \frac{1}{n}\sum_{t=1}^{n} h_t(\theta; y_t) ,$$
$I(\theta_0)$ and $J(\theta_0)$ may be estimated using a heteroskedasticity and autocorrelation consistent (HAC) covariance matrix estimator:
$$\hat{I}_n(\theta) = \sum_{j=-(n-1)}^{n-1} \kappa(j/K_n)\, \hat{\Gamma}_n(j, \theta) \qquad (2.6)$$
where $\kappa(\cdot)$ is a kernel function, $K_n$ is a sample-size-dependent bandwidth parameter,
$$\hat{\Gamma}_n(j, \theta) = \begin{cases} \dfrac{1}{n}\displaystyle\sum_{t=j+1}^{n} h_t(\theta; y_t)\, h_{t-j}(\theta; y_{t-j})' & \text{if } j \ge 0, \\[2ex] \dfrac{1}{n}\displaystyle\sum_{t=-(j-1)}^{n} h_{t+j}(\theta; y_{t+j})\, h_t(\theta; y_t)' & \text{if } j < 0, \end{cases} \qquad (2.7)$$
and
$$\hat{J}_n(\theta) = \frac{\partial D_n}{\partial\theta'}(\theta) = H(\theta, Y_n) , \qquad (2.8)$$
with $\theta$ replaced by a consistent estimator.
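A minimal numerical sketch of the HAC estimator (2.6)-(2.7) follows (our illustration; the Bartlett kernel and bandwidth rule are assumptions, not prescriptions from the paper):

```python
import numpy as np

def hac_estimator(h, K_n):
    """HAC estimate of I(theta_0) from an (n x m) array of moment
    contributions h_t, using a Bartlett kernel kappa(x) = 1 - |x|."""
    n, m = h.shape
    h = h - h.mean(axis=0)                   # center the moment contributions
    I_hat = np.zeros((m, m))
    for j in range(-(n - 1), n):
        kappa = max(0.0, 1.0 - abs(j / K_n))  # Bartlett: zero beyond K_n lags
        if kappa == 0.0:
            continue
        if j >= 0:
            gamma = h[j:].T @ h[:n - j] / n   # (2.7), case j >= 0
        else:
            gamma = h[:n + j].T @ h[-j:] / n  # (2.7), case j < 0
        I_hat += kappa * gamma
    return I_hat

rng = np.random.default_rng(1)
h = rng.normal(size=(400, 2))
print(hac_estimator(h, K_n=int(400 ** (1 / 3))))
```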
2.2. Score and LR-type test statistics

In this paper we focus on two types of test statistics, namely score-type and LR-type statistics. We now briefly review the theory of C(α) tests, which encompass LM-type tests. The C(α) test statistic is well defined under the following assumptions; for a more comprehensive presentation of C(α) tests, see Dufour and Trognon (2001) and the references therein.

Assumption 2.6 (Root-n consistent restricted estimator) $\tilde\theta_n^0$ is a consistent restricted estimate of $\theta_0$ such that $\psi(\tilde\theta_n^0) = 0$ and $\sqrt{n}(\tilde\theta_n^0 - \theta_0)$ is asymptotically bounded in probability.
For the statistic to be well defined, we also need consistent estimators, as specified in the next assumption.

Assumption 2.7 (Consistent estimators) $\operatorname*{plim}_{n\to\infty} \hat{I}_n(\tilde\theta_n^0) = I(\theta_0)$, $\operatorname*{plim}_{n\to\infty} P(\tilde\theta_n^0) = P(\theta_0)$, $\operatorname*{plim}_{n\to\infty} \hat{J}_n(\tilde\theta_n^0) = J(\theta_0)$.
Let $\tilde{P}_0 = P(\tilde\theta_n^0)$, $\tilde{I}_0 = \hat{I}_n(\tilde\theta_n^0)$ and $\tilde{J}_0 = \hat{J}_n(\tilde\theta_n^0)$ be consistent estimators. The C(α) test statistic is then defined as
$$PC(\tilde\theta_n^0; \psi) = n\, D_n(\tilde\theta_n^0)'\, \tilde{W}_0\, D_n(\tilde\theta_n^0) \qquad (2.9)$$
where
$$\tilde{W}_0 \equiv \tilde{I}_0^{-1}\tilde{J}_0\big(\tilde{J}_0'\tilde{I}_0^{-1}\tilde{J}_0\big)^{-1}\tilde{P}_0'\big[\tilde{P}_0\big(\tilde{J}_0'\tilde{I}_0^{-1}\tilde{J}_0\big)^{-1}\tilde{P}_0'\big]^{-1}\tilde{P}_0\big(\tilde{J}_0'\tilde{I}_0^{-1}\tilde{J}_0\big)^{-1}\tilde{J}_0'\tilde{I}_0^{-1} .$$
Under standard regularity conditions, the C(α) statistic $PC(\tilde\theta_n^0; \psi)$ has an asymptotic null distribution free of nuisance parameters, namely $\chi^2(q)$.
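For concreteness, a sketch of how the weighting matrix $\tilde{W}_0$ and the statistic (2.9) might be assembled numerically (our reading of the formula; all inputs are assumed given):

```python
import numpy as np

def c_alpha_statistic(n, D, P0, I0, J0):
    """PC = n * D' W0 D with W0 as in (2.9).

    D : (m,) moment vector D_n at the restricted estimate
    P0: (q, p) Jacobian of restrictions; I0: (m, m); J0: (m, p)
    """
    I0_inv = np.linalg.inv(I0)
    A = np.linalg.inv(J0.T @ I0_inv @ J0)   # (J0' I0^-1 J0)^-1
    B = np.linalg.inv(P0 @ A @ P0.T)        # [P0 (J0' I0^-1 J0)^-1 P0']^-1
    W0 = I0_inv @ J0 @ A @ P0.T @ B @ P0 @ A @ J0.T @ I0_inv
    return n * D @ W0 @ D
```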
We also investigate an "LR-type" test initially suggested by Newey and West (1987):
$$LR(\psi) = n\big[M_n(\hat\theta_n^0, \tilde{I}_0^{-1}) - M_n(\hat\theta_n, \tilde{I}_0^{-1})\big] \qquad (2.10)$$
where
$$M_n(\theta, \tilde{I}_0^{-1}) = D_n(\theta)'\, \tilde{I}_0^{-1}\, D_n(\theta) , \qquad (2.11)$$
i.e. $W_n = \tilde{I}_0^{-1}$. Under standard regularity conditions, this test statistic is asymptotically pivotal with a $\chi^2(q)$ distribution.
2.3. Usual Wald-type tests

More generally, there is no need to restrict attention to a GMM estimator such as the one introduced in equation (2.1). Indeed, a Wald-type test can be built from any root-n consistent estimator satisfying the assumption below.

Assumption 2.8 (Estimator $\sqrt{n}$ convergence) $\hat\theta_n$ is a consistent estimator of $\theta_0$ such that:
$$\sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow[n\to\infty]{L} N(0, \Sigma_\theta) . \qquad (2.12)$$
The above condition can even be weakened to require root-n consistency of the restrictions only, without perturbing the Wald statistic.
Assumption 2.9 ($\sqrt{n}$ convergence of the restrictions)
$$\sqrt{n}\big(\psi(\hat\theta_n) - \psi(\theta_0)\big) \xrightarrow[n\to\infty]{L} N(0, \Sigma_\psi) . \qquad (2.13)$$
It is important to note that Assumption 2.8 implies Assumption 2.9, but not the converse.
The usual Wald statistic for testing $H_0(\psi_0)$ is
$$W_n(\psi_0) = n\,[\psi(\hat\theta) - \psi_0]'\, \hat\Sigma_\psi^{-1}\, [\psi(\hat\theta) - \psi_0] \qquad (2.14)$$
provided the inverse exists, where
$$\hat\Sigma_\psi = P(\hat\theta)\, \hat\Sigma_\theta\, P(\hat\theta)' ,$$
$P(\theta) = \partial\psi/\partial\theta'$ and $\hat\Sigma_\theta$ is a consistent estimator of $\Sigma_\theta$. If $H_0(\psi_0)$ holds and the $q \times q$ matrix
$$\Sigma_\psi(\theta) = P(\theta)\, \Sigma_\theta\, P(\theta)'$$
is nonsingular [i.e. $|\Sigma_\psi(\theta)| \neq 0$ for $\theta \in V(\theta_0)$, where $|\cdot|$ stands for the determinant], then $W_n(\psi_0)$ has an asymptotic $\chi^2(q)$ distribution. This is not necessarily true, however, if $\Sigma_\psi(\theta)$ is singular. The latter may occur if $\Sigma_\theta$ is singular or if $P(\theta)$ does not have full row rank $q$. In this case, $\Sigma_\psi(\theta)$ does not admit a usual inverse but can still be inverted by means of a generalized inverse, or g-inverse. In Section 3, we introduce the family of regularized test statistics based on regularized inverses of the covariance matrix.
3. Regularized inference

In this section, we introduce the concept of regularized tests, which embed three possible cases. Case 1 corresponds to the usual case of consistent sample covariance matrices with full rank. Case 2 corresponds to sample covariance matrices which converge to a singular limiting matrix under the condition that the rank of the matrix estimator converges to the rank of the limiting population matrix; in such a case, the limiting distribution is modified only through an adjustment of the degrees of freedom, and this is the case covered by Andrews (1987) and Lütkepohl and Burda (1997). Finally, case 3 makes use of consistent sample covariance matrices which do not satisfy the rank condition pinned down by Andrews (1987). Under such a circumstance, a regularization is required to provide a valid test, but at the cost of a fully modified asymptotic distribution. This is the path investigated in this paper. We consider situations where the rank of the covariance matrix is incomplete asymptotically but not in finite samples. We can face this type of situation when the derivative matrix of the restrictions has lower rank only at the true parameter value, or when there is an identification problem asymptotically that cannot be detected in finite samples. To the best of our knowledge, such cases have not been investigated in the literature. Under Assumption 3.1, and more generally for any matrix with incomplete rank (either asymptotically or in finite samples), the usual inverse is not defined, and one must either use a generalized inverse or transform the matrix so that it becomes invertible. Based on spectral decomposition tools, we propose to regularize the inverse of the covariance matrix by means of an eigenvalue perturbation function, specifically designed to make the inverse behave well in a neighborhood of the true eigenvalues. The variance regularization method works under general conditions as long as the threshold c(n) converges to 0 at an appropriate rate, depending on the rate at which the eigenvalues of the sample covariance matrix converge to those of the limiting matrix. Further, the variance regularization method leads to a decomposition of the regularized statistics into three components: one viewed as regular, built on the large eigenvalues, while the others, which involve the small eigenvalues, may not be regular.
3.1. Variance regularization

We first introduce some notation. Let $\bar\lambda = (\lambda_1, \dots, \lambda_q)'$, where $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_q$ are the eigenvalues of a $q \times q$ covariance matrix $\Sigma$, and let $V$ be an orthogonal matrix such that $\Sigma = V\Lambda V'$, where $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_q)$. Specifically, $V$ consists of eigenvectors of the matrix $\Sigma$ ordered so that $\Sigma V = V\Lambda$. Let $m(\lambda)$ be the multiplicity of the eigenvalue $\lambda$. Although the matrix $\Lambda$ is uniquely defined, the matrix $V$ of eigenvectors is not uniquely defined when there is an eigenvalue with multiplicity $m(\lambda) > 1$: the eigenvectors corresponding to eigenvalues with $m(\lambda) > 1$ are defined only up to post-multiplication by an $m(\lambda) \times m(\lambda)$ orthogonal matrix. Moreover, let $\hat\Sigma$ be a consistent estimator of $\Sigma$ with eigenvalues $\hat\lambda_1 \ge \hat\lambda_2 \ge \dots \ge \hat\lambda_q$, and $\hat{V}$ an orthogonal matrix such that $\hat\Sigma = \hat{V}\hat\Lambda\hat{V}'$, where $\hat\Lambda = \operatorname{diag}(\hat\lambda_1, \dots, \hat\lambda_q)$. For $c > 0$, we denote by $q(\Sigma, c)$ the number of eigenvalues $\lambda$ such that $\lambda > c$, and by $q(\hat\Sigma, c)$ the number of eigenvalues $\hat\lambda$ such that $\hat\lambda > c$.
If $\operatorname{rank}(\hat\Sigma) = \operatorname{rank}(\Sigma) = q$ with probability 1, i.e. both matrices are almost surely (a.s.) nonsingular, then the inverses $\Sigma^{-1} = V\Lambda^{-1}V'$ and $\hat\Sigma^{-1} = \hat{V}\hat\Lambda^{-1}\hat{V}'$ are (a.s.) well defined. However, if $\operatorname{rank}(\Sigma) < q$ and/or $\operatorname{rank}(\hat\Sigma) < q$, we need to make adjustments. For this, we define a regularized inverse of a covariance matrix $\Sigma$:
$$\Sigma^R = V \tilde\Lambda V' \qquad (3.1)$$
where
$$\tilde\Lambda = \tilde\Lambda[\bar\lambda; c] = \operatorname{diag}\big(g[\lambda_1; c], \dots, g[\lambda_q; c]\big) , \qquad (3.2)$$
$g[\lambda; c] \ge 0$, and $g[\lambda; c]$ is non-increasing in $\lambda$. The scalar function $g[\lambda; c]$ modifies the inverse of the eigenvalues in order to make the inverse well behaved in a neighborhood of the true eigenvalues. We shall call it the variance regularization function (VRF).
We now introduce a partition of the matrix $\tilde\Lambda$ into three submatrices, where $c$ represents a threshold which may depend on the sample size and possibly on the sample itself, i.e. $c = c[n, Y_n]$:
$$\tilde\Lambda = \begin{bmatrix} \tilde\Lambda_1[\bar\lambda; c] & 0 & 0 \\ 0 & \tilde\Lambda_2[\bar\lambda; c] & 0 \\ 0 & 0 & \tilde\Lambda_3[\bar\lambda; c] \end{bmatrix} . \qquad (3.3)$$
Let $q_i = \dim \tilde\Lambda_i[\bar\lambda; c]$, for $i = 1, 2, 3$, with $q_1 = q(\Sigma, c)$, $q_2 = m(c)$ and $q_3 = q - q_1 - q_2$. In other words, $q_1$ represents the number of eigenvalues $\lambda_i > c$. The three components correspond to
$$\tilde\Lambda_1[\bar\lambda; c] = \operatorname{diag}\big(g[\lambda_1; c], \dots, g[\lambda_{q_1}; c]\big) \quad \text{for } \lambda > c ,$$
$$\tilde\Lambda_2[\bar\lambda; c] = g(c; c)\, I_{q_2} \quad \text{for } \lambda = c ,$$
an $m(c) \times m(c)$ matrix with $m(c)$ denoting the multiplicity of the eigenvalue $\lambda = c$ (if any), and
$$\tilde\Lambda_3[\bar\lambda; c] = \operatorname{diag}\big(g(\lambda_{q_1+q_2+1}; c), \dots, g(\lambda_q; c)\big) \quad \text{for } \lambda < c .$$
Let $V_1(c)$ be a $q \times q_1$ matrix whose columns are the eigenvectors associated with the eigenvalues $\lambda > c$, arranged in the same order as the eigenvalues. The eigenvectors associated with an eigenvalue $\lambda$ form a basis of the eigenspace corresponding to $\lambda$; if $m(\lambda) = 1$, these eigenvectors are uniquely defined, otherwise not. The same holds for the $q \times q_2$ matrix $V_2(c)$ whose columns are the eigenvectors associated with the eigenvalues $\lambda = c$, and for the $q \times q_3$ matrix $V_3(c)$ whose columns are the eigenvectors associated with the eigenvalues $\lambda < c$. $\tilde\Lambda_1[\hat\lambda; c]$, $\tilde\Lambda_2[\hat\lambda; c]$, $\tilde\Lambda_3[\hat\lambda; c]$, $\hat{V}_1(c)$, $\hat{V}_2(c)$ and $\hat{V}_3(c)$ denote the corresponding quantities based on the estimate $\hat\Sigma$, with $\dim\tilde\Lambda_1[\hat\lambda; c] = \hat{q}_1 = \operatorname{card}\{i \in I : \hat\lambda_i > c(n)\}$, $\dim\tilde\Lambda_2[\hat\lambda; c] = \hat{q}_2 = \operatorname{card}\{i \in I : \hat\lambda_i = c(n)\}$ and $\dim\tilde\Lambda_3[\hat\lambda; c] = \hat{q}_3 = \operatorname{card}\{i \in I : \hat\lambda_i < c(n)\}$, respectively.
The following VRF will turn out to be particularly convenient:
$$g(\hat\lambda_{n,j}; c) = \begin{cases} \dfrac{1}{\hat\lambda_{n,j}} & \text{if } \hat\lambda_{n,j} > c(n) , \\[1.5ex] \dfrac{1}{\epsilon + \gamma\,|c(n) - \hat\lambda_{n,j}|} & \text{if } \hat\lambda_{n,j} \le c(n) , \end{cases} \qquad (3.4)$$
where $\gamma > 0$. It can be viewed as a modified Hodges' estimator applied to the eigenvalues.
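To fix ideas, here is a minimal Python sketch (our own illustration, not from the paper; the parameter values are arbitrary assumptions) of the regularized inverse $\hat\Sigma^R = \hat{V}\tilde\Lambda\hat{V}'$ built from the VRF (3.4):

```python
import numpy as np

def vrf(lam, c, gamma=1.0, eps=1e-8):
    """VRF g[lambda; c] of (3.4): 1/lambda above the threshold,
    a bounded surrogate below it."""
    if lam > c:
        return 1.0 / lam
    return 1.0 / (eps + gamma * abs(c - lam))

def regularized_inverse(Sigma, c, gamma=1.0):
    """Sigma^R = V diag(g[lambda_j; c]) V', as in (3.1)-(3.2)."""
    lam, V = np.linalg.eigh(Sigma)        # eigh returns ascending eigenvalues
    g = np.array([vrf(l, c, gamma) for l in lam])
    return V @ np.diag(g) @ V.T

# Example: a nearly singular 3 x 3 covariance matrix
Sigma = np.diag([2.0, 1.0, 1e-10])
print(regularized_inverse(Sigma, c=0.1))  # bounded, unlike np.linalg.inv(Sigma)
```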
3.2. Regularized score-type tests

Consider the $q \times m$ matrix
$$\tilde{Q}_n \equiv \tilde{Q}[W_n] \equiv \tilde{P}_n\big(\tilde{J}_n' W_n \tilde{J}_n\big)^{-}\tilde{J}_n' W_n , \qquad (3.5)$$
with $\tilde{P}_n = P(\tilde\theta_n^0)$, $\tilde{J}_n = \hat{J}(\tilde\theta_n^0)$, $P(\theta) = \partial\psi/\partial\theta'$ and $\hat{J}(\theta) = \frac{\partial D_n(\theta)}{\partial\theta'} = H_n(\theta)$.
The assumptions below define the specific nonregular setup we investigate throughout this paper. Specifically, we examine to what extent relaxing the usual nonsingularity assumptions affects the behavior of the test criterion. The proposed regularizing tools aim at curing the misbehavior of the test criterion under such circumstances.

Assumption 3.1 (Incomplete restriction rank) There is a (non-empty) open neighborhood $N_2$ of $\theta_0$ such that $\psi(\theta)$ is continuously differentiable and $\operatorname{rank}[P(\theta)] = r_1$ for all $\theta \in N_2$, with $0 < r_1 < q$.
In the following, we analyze situations where the asymptotic covariance matrix does not necessarily have full rank. We further investigate situations where the rank is incomplete not only asymptotically but also in finite samples. To the best of our knowledge, such cases have not yet been investigated in the literature. Under Assumption 3.1, and more generally for any matrix with incomplete rank (either asymptotically or in finite samples), the usual inverse is not defined, and one must either use a generalized inverse or transform the matrix so that it becomes invertible. While Dufour and Trognon (2001, Proposition 3.1) have chosen the generalized inverse approach, we opt for the regularizing approach instead. The variance regularization method works under general conditions as long as the threshold c(n) converges to 0 at an appropriate rate depending on the rate of convergence of the covariance matrix eigenvalues.

The following assumption is crucial for the distribution of the regularized C(α) test; for specific conditions under which this assumption holds in a GMM framework, see Dufour and Trognon (2001).
Assumption 3.2
$$\tilde{Z}_n \xrightarrow{p} Z_n , \qquad Z_n \xrightarrow{p} Z \sim N\big[0, V_\psi(\theta_0)\big] , \qquad (3.6)$$
where $\tilde{Z}_n = \sqrt{n}\,\tilde{Q}_n D_n(\tilde\theta_n^0)$, $Z_n = \sqrt{n}\,Q(\theta_0) D_n(\theta_0)$ and $Z = Q(\theta_0) D_\infty(\theta_0)$.

We now introduce the regularized C(α) statistic:
$$PC(\tilde\theta_n^0; \psi) = n\, D_n(\tilde\theta_n^0)'\, \tilde{Q}_n' \big[\tilde{Q}_n \tilde{I}_0 \tilde{Q}_n'\big]^R \tilde{Q}_n\, D_n(\tilde\theta_n^0) . \qquad (3.7)$$
The latter can be decomposed into three components. To do so, we denote
$$V_\psi(\theta) \equiv Q(\theta)\, I(\theta)\, Q(\theta)' , \qquad (3.8)$$
so that
$$\tilde{V}_\psi(\tilde\theta_n^0) \equiv \tilde{Q}_n \tilde{I}_0 \tilde{Q}_n' . \qquad (3.9)$$
The spectral decomposition of $\tilde{V}_\psi(\tilde\theta_n^0)$ is
$$\tilde{V}_\psi(\tilde\theta_n^0) = V(\tilde\theta_n^0)\, \Lambda(\tilde\theta_n^0)\, V(\tilde\theta_n^0)' . \qquad (3.10)$$
This yields, for the Moore-Penrose inverse,
$$[\tilde{V}_\psi(\tilde\theta_n^0)]^+ = \hat{V}_1(c)\, \hat{\tilde\Lambda}_1(c)\, \hat{V}_1'(c) , \qquad (3.11)$$
and, for the regularized inverse,
$$[\tilde{V}_\psi(\tilde\theta_n^0)]^R = \hat{V}_1(c)\, \hat{\tilde\Lambda}_1(c)\, \hat{V}_1'(c) + \hat{V}_2(c)\, \hat{\tilde\Lambda}_2(c)\, \hat{V}_2'(c) + \hat{V}_3(c)\, \hat{\tilde\Lambda}_3(c)\, \hat{V}_3'(c) . \qquad (3.12)$$
Based on this regularized inverse, we introduce the regularized C(α) statistic, which can be decomposed as follows:
$$PC^R(\tilde\theta_n^0; \psi) = n\, D_n(\tilde\theta_n^0)'\, \tilde{Q}_n'\, [\tilde{V}_\psi(\tilde\theta_n^0)]^R\, \tilde{Q}_n\, D_n(\tilde\theta_n^0) = PC_{1n}(c) + PC_{2n}(c) + PC_{3n}(c) , \qquad (3.13)$$
where
$$PC_{1n}(c) = n\, D_n(\tilde\theta_n^0)'\, \tilde{Q}_n'\, \hat{V}_1(c)\, \hat{\tilde\Lambda}_1(c)\, \hat{V}_1'(c)\, \tilde{Q}_n\, D_n(\tilde\theta_n^0) , \qquad (3.14)$$
$$PC_{2n}(c) = n\, D_n(\tilde\theta_n^0)'\, \tilde{Q}_n'\, \hat{V}_2(c)\, \hat{\tilde\Lambda}_2(c)\, \hat{V}_2'(c)\, \tilde{Q}_n\, D_n(\tilde\theta_n^0) , \qquad (3.15)$$
$$PC_{3n}(c) = n\, D_n(\tilde\theta_n^0)'\, \tilde{Q}_n'\, \hat{V}_3(c)\, \hat{\tilde\Lambda}_3(c)\, \hat{V}_3'(c)\, \tilde{Q}_n\, D_n(\tilde\theta_n^0) . \qquad (3.16)$$
It is important to remark that the regularized C(α) statistic given in equation (3.13), together with its three components, can easily be simulated as soon as we have a consistent estimate $\tilde{V}_\psi(\tilde\theta_n^0)$ of the limiting covariance matrix $V_\psi(\theta_0)$ of the score vector: applying the spectral decomposition followed by the proposed regularizing design, we can simulate the regularized statistic directly. This convenient simulability widens its scope of application to general nonregular situations. Another appealing feature of this regularized C(α) statistic is that it can be decomposed into three independent components, the independence arising from the eigenprojections onto different eigenspaces. This property greatly simplifies the derivation of the distributional results for the regularized C(α) test.
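A minimal sketch of this simulation idea (our illustration; `V_hat` is the consistent covariance estimate, and `vrf` is the VRF sketch given after equation (3.4)):

```python
import numpy as np

def simulate_regularized_null(V_hat, c, n_rep=10_000, gamma=1.0, seed=0):
    """Simulate the null distribution of Z' [V_hat]^R Z for
    Z ~ N(0, V_hat), mimicking the decomposition (3.13)."""
    rng = np.random.default_rng(seed)
    lam, V = np.linalg.eigh(V_hat)
    g = np.array([vrf(l, c, gamma) for l in lam])
    q = len(lam)
    Z = rng.multivariate_normal(np.zeros(q), V_hat, size=n_rep)
    W = Z @ V                             # coordinates in the eigenbasis
    return (W ** 2 * g).sum(axis=1)       # sum_j g(lam_j; c) * (v_j' Z)^2

# Simulated critical value at the 5% level:
# crit = np.quantile(simulate_regularized_null(V_hat, c=0.1), 0.95)
```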
3.3. Regularized LR-type tests
We now study the behavior of the LR-type statistic under the following nonregularity assumption.
Assumption 3.3 (Singularity of the weighting matrix) $\det(I(\theta_0)) = 0$.
We propose to regularize the LR-type test statistic in the following way:
$$LR(\psi)^R = n\big[M_n(\hat\theta_n^0, \bar{I}_{n,R}) - M_n(\hat\theta_n, \bar{I}_{n,R})\big] \qquad (3.17)$$
where
$$M_n(\theta, \bar{I}_{n,R}) = D_n(\theta)'\, \bar{I}_{n,R}\, D_n(\theta) = \sum_{i=1}^{3} D_n(\theta)'\, \hat{V}_i(c)\, \hat{\tilde\Lambda}_i(c)\, \hat{V}_i'(c)\, D_n(\theta) . \qquad (3.18)$$
In the statistic $M_n(\theta, \bar{I}_{n,R})$, $\bar{I}_{n,R}$ is any consistent estimator of the regularized covariance matrix $I_R(\theta_0)$, typically a function of a preliminary estimator $\bar\theta_n$ of $\theta$: $\bar{I}_{n,R} = \hat{I}_{n,R}(\bar\theta_n) = \big[\hat{I}_n(\bar\theta_n)\big]_R$, where $\hat{I}_n(\theta)$ is defined as in equation (2.6). In the sequel, let $M_{n,R}(\theta) \equiv M_n(\theta, \bar{I}_{n,R})$. This yields a decomposition of the LR statistic into three components:
$$LR(\psi)^R = \sum_{i=1}^{3} n\big[D_n(\hat\theta_n^0)'\, \hat{V}_i(c)\, \hat{\tilde\Lambda}_i(c)\, \hat{V}_i'(c)\, D_n(\hat\theta_n^0) - D_n(\hat\theta_n)'\, \hat{V}_i(c)\, \hat{\tilde\Lambda}_i(c)\, \hat{V}_i'(c)\, D_n(\hat\theta_n)\big] = LR_{1,n} + LR_{2,n} + LR_{3,n} . \qquad (3.19)$$
The regularized estimator $\hat{I}_{n,R}(\bar\theta_n)$ of the covariance matrix $I_R(\theta_0)$ corresponds to
$$\hat{I}_{n,R}(\bar\theta_n) = \sum_{j=-(n-1)}^{n-1} \kappa(j/K_n)\, \hat{\Gamma}_{n,R}(j, \bar\theta_n) \qquad (3.20)$$
where $\hat{\Gamma}_{n,R}(j, \bar\theta_n) = \big[\hat{\Gamma}_n(j, \bar\theta_n)\big]_R$, with $\hat{\Gamma}_n(j, \bar\theta_n)$ defined as in equation (2.7).
3.4. Regularized Wald tests

Similarly, we can decompose the original Wald statistic $W_n$ in (2.14) as
$$W_n = W_{1n}(c) + W_{2n}(c) + W_{3n}(c) \qquad (3.21)$$
where
$$W_{1n}(c) = n\, \psi(\hat\theta)'\, \hat{V}_1(c)\, \hat\Lambda_1^{-1}(c)\, \hat{V}_1'(c)\, \psi(\hat\theta) , \qquad (3.22)$$
$$W_{2n}(c) = n\, \psi(\hat\theta)'\, \hat{V}_2(c)\, \hat\Lambda_2^{-1}(c)\, \hat{V}_2'(c)\, \psi(\hat\theta) , \qquad (3.23)$$
$$W_{3n}(c) = n\, \psi(\hat\theta)'\, \hat{V}_3(c)\, \hat\Lambda_3^{-1}(c)\, \hat{V}_3'(c)\, \psi(\hat\theta) . \qquad (3.24)$$
We now propose to consider regularized Wald statistics defined as follows:
$$\tilde{W}_n = n\, \psi(\hat\theta)'\, \hat\Sigma^R\, \psi(\hat\theta) = n\, \psi(\hat\theta)'\, \hat{V} \tilde{\hat\Lambda} \hat{V}'\, \psi(\hat\theta) = n\, \psi(\hat\theta)'\, \hat{V}\, \operatorname{diag}\big(g(\hat\lambda_1; c), \dots, g(\hat\lambda_q; c)\big)\, \hat{V}'\, \psi(\hat\theta) , \qquad (3.25)$$
so that
$$\tilde{W}_n = n\, \psi(\hat\theta)'\, \hat{V}_1(c)\, \hat{\tilde\Lambda}_1(c)\, \hat{V}_1'(c)\, \psi(\hat\theta) + n\, \psi(\hat\theta)'\, \hat{V}_2(c)\, \hat{\tilde\Lambda}_2(c)\, \hat{V}_2'(c)\, \psi(\hat\theta) + n\, \psi(\hat\theta)'\, \hat{V}_3(c)\, \hat{\tilde\Lambda}_3(c)\, \hat{V}_3'(c)\, \psi(\hat\theta) = \tilde{W}_{1n}(c) + \tilde{W}_{2n}(c) + \tilde{W}_{3n}(c) . \qquad (3.26)$$
By partitioning the eigenvalue matrix $\tilde\Lambda$ into three blocks, $\tilde\Lambda_1(c)$ for $\lambda > c$, $\tilde\Lambda_2(c)$ for $\lambda = c$ and $\tilde\Lambda_3(c)$ for $\lambda < c$, we have identified a convenient decomposition of the statistic into three components: a first component involving the "large" eigenvalues, a second component gathering the eigenvalues exactly equal to the threshold $c$, and a third component incorporating the small remaining eigenvalues. As we will see below, the first component can be viewed as "regular", while the other ones may not be. In particular, the third component requires a regularization: because of the invertibility difficulties raised by small values of $\lambda$, we replace the latter with regularized, bounded eigenvalues. Instead, Lütkepohl and Burda (1997) keep only the eigenvalues greater than the threshold $c$, which boils down to setting $\tilde{W}_{2n}(c) = 0$ and $\tilde{W}_{3n}(c) = 0$. Discarding the small eigenvalues in this way may result in a loss of information. It is important to note that in finite samples it will be difficult to disentangle the estimates which really correspond to $\lambda = c$ from those which are close to, but distinct from, $c$. This makes the estimation procedure trickier and the asymptotic distribution more complicated. Note that $W_{1n}(c) = \tilde{W}_{1n}(c)$, since this is the regular component common to the usual and the regularized Wald statistics. Moreover, when there are no eigenvalues exactly equal to the threshold $c$, $m(c) = 0$ and the decomposition involves only two components. A numerical sketch comparing the two approaches is given below.
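The following Python sketch (our illustration; the data and threshold are arbitrary assumptions) contrasts the full regularized Wald statistic (3.26), built here with the DV1 rule $g(\lambda; c) = 1/c$ for $\lambda \le c$, with the Lütkepohl-Burda version that zeroes the small-eigenvalue components:

```python
import numpy as np

def wald_components(psi_hat, Sigma_psi_hat, n, c):
    """Decompose n * psi' Sigma^R psi over the eigenvalue blocks of (3.26).
    Returns (regularized statistic, Lutkepohl-Burda statistic)."""
    lam, V = np.linalg.eigh(Sigma_psi_hat)
    proj = (V.T @ psi_hat) ** 2               # (v_j' psi_hat)^2 per eigenvalue
    large = lam > c
    W1 = n * np.sum(proj[large] / lam[large])  # regular block, g = 1/lambda
    W3 = n * np.sum(proj[~large] / c)          # DV1 rule below the threshold
    return W1 + W3, W1                         # LB keeps only the regular block

rng = np.random.default_rng(2)
psi_hat = rng.normal(scale=0.05, size=3)
Sigma = np.diag([1.0, 0.5, 1e-9])              # nearly singular covariance
print(wald_components(psi_hat, Sigma, n=200, c=0.01))
```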
4. Preliminary results on eigenprojections

In this section, we discuss some non-uniqueness and discontinuity issues regarding the eigenvectors of a given matrix. While it is well known in spectral theory that the eigenvectors corresponding to multiple eigenvalues are not uniquely defined (but only up to post-multiplication by an $m(\lambda) \times m(\lambda)$ orthogonal matrix, with $m(\lambda)$ denoting the multiplicity of the eigenvalue), econometricians are not always cautious about checking such conditions. Second, whereas the eigenvalues are generally known to be continuous functions of the elements of the matrix, this statement does not necessarily hold for the eigenvectors. The main pitfall consists of drawing convergence results for the estimates of the eigenvectors from the consistency of the sample matrix, a step which critically hinges on the continuity of the eigenvectors w.r.t. the elements of the matrix. Even in the deterministic case, the eigenvectors are not necessarily continuous functions of the elements of the matrix. To see this, consider the following simple counter-example.
Example 4.1 Let $A(x)$ be the matrix function defined as
$$A(x) = \begin{cases} \begin{bmatrix} 1 + x & 0 \\ 0 & 1 - x \end{bmatrix} & \text{if } x < 0 , \\[2ex] \begin{bmatrix} 1 & x \\ x & 1 \end{bmatrix} & \text{if } x \ge 0 . \end{cases} \qquad (4.1)$$
This matrix function is clearly continuous at $x = 0$, with $A(0) = I_2$. However, for $x < 0$, the spectral decomposition of $A(x)$ is
$$A(x) = (1 + x)\begin{bmatrix} 1 \\ 0 \end{bmatrix}\begin{bmatrix} 1 & 0 \end{bmatrix} + (1 - x)\begin{bmatrix} 0 \\ 1 \end{bmatrix}\begin{bmatrix} 0 & 1 \end{bmatrix} \qquad (4.2)$$
[with $(1 + x)$ and $(1 - x)$ being the eigenvalues and $(1, 0)'$ and $(0, 1)'$ the eigenvectors], while for $x > 0$, it is
$$A(x) = \frac{1}{2}(1 + x)\begin{bmatrix} 1 \\ 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \end{bmatrix} + \frac{1}{2}(1 - x)\begin{bmatrix} 1 \\ -1 \end{bmatrix}\begin{bmatrix} 1 & -1 \end{bmatrix} \qquad (4.3)$$
[with $(1 + x)$ and $(1 - x)$ being the eigenvalues and $\frac{1}{\sqrt{2}}(1, 1)'$ and $\frac{1}{\sqrt{2}}(1, -1)'$ the eigenvectors]. Clearly, the eigenvalues $(1 + x)$ and $(1 - x)$ are continuous at $x = 0$, whereas the eigenvectors are not the same whether $x \to 0^+$ or $x \to 0^-$.
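A quick numerical check of Example 4.1 (our own illustration): the eigenvectors jump across $x = 0$, while the eigenvalues, and the total eigenprojection over the whole spectral set, stay continuous.

```python
import numpy as np

def A(x):
    """The matrix function of Example 4.1."""
    if x < 0:
        return np.array([[1 + x, 0.0], [0.0, 1 - x]])
    return np.array([[1.0, x], [x, 1.0]])

for x in (-1e-6, 1e-6):
    lam, V = np.linalg.eigh(A(x))
    # Eigenvalues are continuous at 0 (both tend to 1) ...
    print(x, np.round(lam, 8))
    # ... but the eigenvectors jump: axis-aligned for x < 0,
    # rotated 45 degrees for x > 0.
    print(np.round(V, 4))
    # The total eigenprojection over {1+x, 1-x} is continuous (here = I_2),
    # illustrating Lemma 4.2 ii) applied to the whole spectral set.
    P_total = sum(np.outer(V[:, j], V[:, j]) for j in range(2))
    print(np.round(P_total, 8))
```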
Being unaware of this caveat may lead to wrong distributional results through mistakenly applying the continuous mapping theorem to objects that are not continuous. Nevertheless, there exist functions of the eigenvectors which are continuous w.r.t. the elements of the matrix. Specifically, for an eigenvalue $\lambda$, the projection matrix $P(\lambda)$ onto the space spanned by the eigenvectors associated with $\lambda$ (the eigenspace $V(\lambda)$) is continuous in the elements of the matrix. This follows from the fact that $V(\lambda)$ is invariant to the choice of a basis [of eigenvectors of $\lambda$]. For further discussion of this important property, see Rellich (1953), Kato (1966) and Tyler (1981).
In order to derive the asymptotic distribution of the regularized statistics, it will be useful to review and adapt some results on spectral theory used by Tyler (1981). Let $S(\Sigma)$ denote the spectral set of $\Sigma$, i.e. the set of all eigenvalues of $\Sigma$. The eigenspace of $\Sigma$ associated with $\lambda$ consists of all linear combinations of a basis of eigenvectors $x_i(\lambda)$, $i = 1, \dots, m(\lambda)$, i.e.
$$V(\lambda) = \{x \in \mathbb{R}^q \,|\, \Sigma x = \lambda x\} . \qquad (4.4)$$
Clearly, $\dim V(\lambda) = m(\lambda)$. Since $\Sigma$ is a $q \times q$ matrix symmetric in the metric of a real positive definite symmetric matrix $T$ (i.e. $T\Sigma$ is symmetric), we have
$$\mathbb{R}^q = \sum_{\lambda \in S(\Sigma)} V(\lambda) . \qquad (4.5)$$
The eigenprojection of $\Sigma$ associated with $\lambda$, denoted $P(\lambda)$, is the projection operator onto $V(\lambda)$ w.r.t. the decomposition (4.5) of $\mathbb{R}^q$. For any set of vectors $x_i(\lambda)$ in $V(\lambda)$ such that $x_i(\lambda)'\, T\, x_j(\lambda) = \delta_{ij}$, where $\delta_{ij}$ denotes the Kronecker delta, $P(\lambda)$ has the representation
$$P(\lambda) = \sum_{j=1}^{m(\lambda)} x_j(\lambda)\, x_j(\lambda)'\, T . \qquad (4.6)$$
$P(\lambda)$ is symmetric in the metric of $T$. This yields
$$\Sigma = \sum_{\lambda \in S(\Sigma)} \lambda\, P(\lambda) , \qquad \hat\Sigma = \sum_{\hat\lambda \in S(\hat\Sigma)} \hat\lambda\, P(\hat\lambda) . \qquad (4.7)$$
If $v$ is any subset of the spectral set $S(\Sigma)$, then the total eigenprojection for $\Sigma$ associated with the eigenvalues in $v$ is defined to be $\sum_{\lambda \in v} P(\lambda)$. More specifically, let $P_{k,t}(\lambda) = \sum_{l=k}^{t} P(\lambda_l)$. We report below a lemma given by Tyler (1981, Lemma 2.1, p. 726), which states an important continuity property of eigenvalues and eigenprojections for non-random symmetric matrices; it will be useful to establish the consistency of the sample regularized inverses toward their population counterparts.
Lemma 4.2 (Continuity of eigenvalues and eigenprojections) Let $\Sigma_n$ be a $q \times q$ real matrix symmetric in the metric of a real positive definite symmetric matrix $T_n$, with eigenvalues $\lambda_1(\Sigma_n) \ge \lambda_2(\Sigma_n) \ge \dots \ge \lambda_q(\Sigma_n)$. Let $P_{k,t}(\Sigma_n)$ represent the total eigenprojection for $\Sigma_n$ associated with $\lambda_k(\Sigma_n), \dots, \lambda_t(\Sigma_n)$ for $t \ge k$. If $\Sigma_n \to \Sigma$ as $n \to \infty$, then
i) $\lambda_k(\Sigma_n) \to \lambda_k(\Sigma)$;
ii) $P_{k,t}(\Sigma_n) \to P_{k,t}(\Sigma)$, provided $\lambda_{k-1}(\Sigma) \neq \lambda_k(\Sigma)$ and $\lambda_t(\Sigma) \neq \lambda_{t+1}(\Sigma)$.
This lemma tells us that the eigenvalues are continuous functions of the elements of the matrix. The same property holds for the projection operators [or, equivalently, for the projection matrices, since there is a one-to-one relation between the operator and the matrix with respect to the given bases] associated with the eigenvalues, and it is transmitted to their sum. No matter what the multiplicity of the eigenvalues involved in the total eigenprojection $[P_{k,t}(\lambda) = \sum_{l=k}^{t} P(\lambda_l)]$, this continuity property holds provided we can find one eigenvalue just before and one just after which are distinct.

We consider first almost sure (a.s.) convergence and then convergence in probability (in symbol $\xrightarrow{p}$). It will be useful to extend Lemma 4.2 to random symmetric matrices; to the best of our knowledge, these results are not stated elsewhere. In the following, we tacitly assume that a probability space $(\mathcal{Z}, \mathcal{A}_Z, P)$ is given and that all random variables are defined on this space.
Lemma 4.3 Let $\hat\Sigma_n$ be a $q \times q$ real random matrix symmetric in the metric of a real positive definite symmetric random matrix $T_n$, with eigenvalues $\lambda_1(\hat\Sigma_n) \ge \lambda_2(\hat\Sigma_n) \ge \dots \ge \lambda_q(\hat\Sigma_n)$. Let $P_{k,t}(\hat\Sigma_n)$ represent the total eigenprojection for $\hat\Sigma_n$ associated with $\lambda_k(\hat\Sigma_n), \dots, \lambda_t(\hat\Sigma_n)$ for $t \ge k$. If $\hat\Sigma_n \xrightarrow{a.s.} \Sigma$ as $n \to \infty$, then:
i) $\lambda_k(\hat\Sigma_n) \xrightarrow{a.s.} \lambda_k(\Sigma)$, and
ii) $P_{k,t}(\hat\Sigma_n) \xrightarrow{a.s.} P_{k,t}(\Sigma)$, provided $\lambda_{k-1}(\Sigma) \neq \lambda_k(\Sigma)$ and $\lambda_t(\Sigma) \neq \lambda_{t+1}(\Sigma)$.
In the sequel, we show that the continuity properties of the eigenvalues and eigenprojections established in the a.s. case remain valid for convergence in probability.

Lemma 4.4 Let $\hat\Sigma_n$ be a $q \times q$ real random matrix symmetric in the metric of a real positive definite symmetric random matrix $T_n$, with eigenvalues $\lambda_1(\hat\Sigma_n) \ge \lambda_2(\hat\Sigma_n) \ge \dots \ge \lambda_q(\hat\Sigma_n)$. Let $P_{k,t}(\hat\Sigma_n)$ represent the total eigenprojection for $\hat\Sigma_n$ associated with $\lambda_k(\hat\Sigma_n), \dots, \lambda_t(\hat\Sigma_n)$ for $t \ge k$. If $\hat\Sigma_n \xrightarrow{p} \Sigma$ as $n \to \infty$, then, for $1 \le k \le q$:
i) $\lambda_k(\hat\Sigma_n) \xrightarrow{p} \lambda_k(\Sigma)$;
ii) $P_{k,t}(\hat\Sigma_n) \xrightarrow{p} P_{k,t}(\Sigma)$, provided $\lambda_{k-1}(\Sigma) \neq \lambda_k(\Sigma)$ and $\lambda_t(\Sigma) \neq \lambda_{t+1}(\Sigma)$.
5. Asymptotic theory

In this section, we state asymptotic results for the regularized statistics, which rely on different specifications of the regularized inverses. We first study the simple case of a continuous regularizing function before considering the monotonic case; the monotonicity property can be exploited to yield bounds on the limiting distribution of the modified statistics. Based on the spectral decomposition in equation (4.7), we immediately deduce a spectral decomposition for the regularized inverses:
$$\Sigma^R = \sum_{\lambda \in S(\Sigma)} g(\lambda; c)\, P(\lambda) , \qquad \hat\Sigma^R = \sum_{\hat\lambda \in S(\hat\Sigma)} g(\hat\lambda; c)\, P(\hat\lambda) . \qquad (5.1)$$
We provide below various regularized inverses based on different specifications of the regularizing function $g(\lambda; c)$, which may depend on the sample size $n$; this dependence may be introduced through the threshold $c_n$:
$$g(\lambda, n) = g(\lambda, c_n) . \qquad (5.2)$$
5.1. Distributional theory for regularizing functions continuous almost everywhere

Let the general eigenvalue-regularizing function $g(\lambda; c)$ be defined as
$$g(\lambda; c) = \begin{cases} \dfrac{1}{\lambda} & \text{if } \lambda > c , \\[1.5ex] \dfrac{1}{c + \gamma|\lambda - c|} & \text{if } \lambda \le c . \end{cases} \qquad (5.3)$$
Interesting special cases include:

i) $\gamma = \infty$, hence
$$g(\lambda; c) = \begin{cases} 1/\lambda & \text{if } \lambda > c \\ 0 & \text{if } \lambda \le c \end{cases}$$
and therefore $\tilde\Lambda = \Lambda_c^+$, where
$$\Lambda_c^+ = \operatorname{diag}\big[\tfrac{1}{\lambda_1} I(\lambda_1 > c), \dots, \tfrac{1}{\lambda_{q_1}} I(\lambda_{q_1} > c), 0, \dots, 0\big] ,$$
with $I(s)$ equal to 1 if the relation $s$ is satisfied; $\tilde\Lambda$ is a modified version of the Moore-Penrose inverse of
$$\Lambda_c = \operatorname{diag}\big[\lambda_1 I(\lambda_1 > c), \dots, \lambda_{q_1} I(\lambda_{q_1} > c), \lambda_{q_1+1} I(\lambda_{q_1+1} \le c), \dots, \lambda_q I(\lambda_q \le c)\big] ,$$
as the one used by Lütkepohl and Burda (1997, henceforth LB);

ii) $\gamma = 0$, hence
$$g(\lambda; c) = \begin{cases} 1/\lambda & \text{if } \lambda > c \\ 1/c & \text{if } \lambda \le c \end{cases}$$
henceforth called DV1;

iii) $\gamma = -\tfrac{1}{2}$, hence
$$g(\lambda; c) = \begin{cases} \dfrac{1}{\lambda} & \text{if } \lambda > c \\[1ex] \dfrac{2}{\lambda + c} & \text{if } \lambda \le c \end{cases}$$
henceforth called DV2.

Note that the eigenvalue-regularizing function $g(\lambda; c)$ may allow for a sample-size-dependent threshold $c_n$. The three cases are sketched in code below.
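A compact sketch of the three special cases (our own vectorized NumPy illustration; function names are hypothetical):

```python
import numpy as np

def g_LB(lam, c):
    """gamma = infinity: Moore-Penrose-style truncation (LB)."""
    return np.where(lam > c, 1.0 / lam, 0.0)

def g_DV1(lam, c):
    """gamma = 0: small eigenvalues replaced by the threshold (DV1)."""
    return np.where(lam > c, 1.0 / lam, 1.0 / c)

def g_DV2(lam, c):
    """gamma = -1/2: smooth blending 2/(lambda + c) below c (DV2)."""
    return np.where(lam > c, 1.0 / lam, 2.0 / (lam + c))

lam = np.array([2.0, 0.5, 0.01, 1e-12])
for g in (g_LB, g_DV1, g_DV2):
    print(g.__name__, g(lam, c=0.1))
```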
Proposition 5.1 (Almost sure convergence of the regularized inverses) Let $g(\lambda; c)$ be a continuous function at each eigenvalue of $\Sigma$. If $\hat\Sigma_n \xrightarrow{a.s.} \Sigma$ and $\lambda_j(\Sigma) \neq c$ for all $j$, then
$$\hat\Sigma_{R,n} \xrightarrow{a.s.} \Sigma_R . \qquad (5.4)$$
Before establishing the main convergence result for the regularized covariance matrices, we must first study the convergence rate of the eigenvalues in the general case where the covariance matrix may be singular with (possibly) multiple eigenvalues. To do so, we apply a general result of Eaton and Tyler (1994), who generalize some nonclassical results due to Anderson (1963) and Anderson (1987) on the behavior of sample roots (of a determinantal equation). Specifically, under fairly weak conditions they show that if the sequence of random matrices of interest $X_n$ is such that $Z_n = b_n(X_n - B_0) \xrightarrow{L} Z$, with the limit $B_0$ nonstochastic, then the sample eigenvalues have the same convergence rate, with $b_n[\Psi(X_n) - \Psi(B_0)] \xrightarrow{L} \Psi(Z)$, where $\Psi(X_n)$ denotes the $q$-vector of singular values of $X_n$. This convergence rate of the sample eigenvalues will be useful in establishing the asymptotic distribution of the regularized tests.
Let $d_1 > d_2 > \dots > d_k$ denote the distinct eigenvalues of $\Sigma$, and let $m_i$ be their multiplicities, $i = 1, \dots, k$. Given the eigenvalue multiplicities of $\Sigma$, a function $H$ on $q \times q$ symmetric matrices can be defined by
$$H(\Sigma) = \begin{bmatrix} \rho(\Sigma_{11}) \\ \rho(\Sigma_{22}) \\ \vdots \\ \rho(\Sigma_{kk}) \end{bmatrix} \qquad (5.5)$$
where $\Sigma_{ii}$ is the $m_i \times m_i$ diagonal block of $\Sigma$ determined by partitioning $\Sigma$ into $m_i \times m_j$ blocks, $i, j = 1, \dots, k$. Thus $H(\Sigma)$ takes values in $\mathbb{R}^q$, and $\rho(\Sigma_{ii})$ is the $m_i$-vector of ordered eigenvalues of the diagonal block $\Sigma_{ii}$, $i = 1, \dots, k$.
Lemma 5.2 Let $\Sigma_n$ be a $q \times q$ real random matrix such that $\Sigma_n \xrightarrow{p} \Sigma$, where $\Sigma$ is a $q \times q$ real nonstochastic matrix with $\operatorname{rank}(\Sigma) = r_1$, $1 \le r_1 < q$. Let also $b_n$ be a sequence of real constants such that
$$Z_n = b_n(\Sigma_n - \Sigma) \xrightarrow{L} Z \qquad (5.6)$$
where $b_n \to +\infty$ and $Z$ is a random matrix. Then
$$E_n = b_n\big[\Psi(\Sigma_n) - \Psi(\Sigma)\big] \xrightarrow{L} \begin{bmatrix} H\big(\tfrac{1}{2}[Z_{11} + Z_{11}']\big) \\ \Psi(Z_{22}) \end{bmatrix} \qquad (5.7)$$
where $Z_{11}$ and $Z_{22}$ are well-defined random elements. For a proof, see Eaton and Tyler (1994, Theorem 4.2).
We now state the convergence result for the regularized inverses, which is fundamental to obtaining well-behaved regularized test statistics. Let $\hat\lambda_{n,i} = \lambda_i(\Sigma_n)$ and $\lambda_i = \lambda_i(\Sigma)$. First, when designing the VRF $g(\lambda; c(n))$, the threshold $c(n)$ must be selected so that
$$\Pr\big[|\hat\lambda_{n,i} - \lambda_i| > c(n)\big] = \Pr\big[|b_n(\hat\lambda_{n,i} - \lambda_i)| > b_n c(n)\big] \xrightarrow[n\to\infty]{} 0$$
with $c(n) \to 0$ and $b_n c(n) \to \infty$ as $n$ grows to infinity. Thus, $c(n)$ declines to 0 more slowly than $1/b_n$, and $b_n c(n)$ grows more slowly than $b_n$ does. Indeed, the threshold must not decline to zero either too fast or too slowly. Selecting $c(n)$ in this way guarantees that the nonzero eigenvalues of the covariance matrix will eventually be greater than the threshold, while the true zero eigenvalues remain smaller than the threshold and are set to zero, at least in large samples. This convergence rate plays a crucial role in Proposition 5.3 below.
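As a concrete illustration (ours, not taken from the paper), if the covariance estimator is root-$n$ consistent, so that $b_n = \sqrt{n}$, then any threshold of the form $c(n) = c_0\, n^{-\alpha}$ with $c_0 > 0$ and $0 < \alpha < 1/2$ meets both requirements; for instance, with $\alpha = 1/3$,
$$c(n) = c_0\, n^{-1/3} \longrightarrow 0 , \qquad b_n\, c(n) = c_0\, n^{1/2 - 1/3} = c_0\, n^{1/6} \longrightarrow \infty .$$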
Because the random objects considered here are matrices, we must choose a norm suitable for matrices. For this reason, we consider the finite-dimensional inner product space $(S_q, \langle \cdot, \cdot \rangle)$, where $S_q$ is the vector space of $q \times q$ symmetric matrices equipped with the inner product $\langle \Sigma_1, \Sigma_2 \rangle = \operatorname{tr}[\Sigma_1'\Sigma_2]$, where $\operatorname{tr}$ denotes the trace. Let $\|\cdot\|$ denote the Frobenius norm induced by this inner product, i.e. $\|\Sigma\|_F^2 = \operatorname{tr}[\Sigma'\Sigma]$. Let $A_R$ denote the regularized inverse of a $q \times q$ real symmetric matrix $A$.
Proposition 5.3 Let $\Sigma$ be a $q \times q$ real positive semidefinite nonstochastic matrix, let $\Sigma_n$ be a $q \times q$ real symmetric random matrix, and suppose the assumptions of Lemma 5.2 hold. Let $g(\lambda; c(n))$ be a function continuous almost everywhere w.r.t. $\lambda$, with its set of discontinuities having measure zero. If (5.6) holds, then
$$(\Sigma_n)^R \xrightarrow{p} (\Sigma)^R .$$
Proofs appear in the Appendix.
In the following, we establish a characterization of the asymptotic distribution of the regularized test statistics in the general case. This characterization makes use of a decomposition of the modified statistic into a regular component and a regularized component. We first formulate the proposition under the following generic assumption before verifying it in special cases for the C(α) and Wald test statistics.

Assumption 5.4 Let $Z_n$ and $Z$ be $q \times 1$ random vectors defined on a common probability space such that
$$Z_n \xrightarrow{L} Z \sim N[0, \Sigma] , \quad \text{with } \operatorname{rank}(\Sigma) = q_1 < q .$$
Recall that $\hat{q}_1 = \operatorname{card}\{j : \hat\lambda_j > c(n)\}$ and $q_1 = \operatorname{card}\{j : \lambda_j > c(n)\}$. Let $A_{q_1} = \{\omega \in \Omega : \hat{q}_1(\omega) = q_1\}$, so that $\Pr(A_{q_1}) \xrightarrow[n\to\infty]{} 1$.
Proposition 5.5 (Asymptotic distribution of regularized statistics when the threshold goes to zero) Suppose Assumption 5.4 holds, and that the assumptions of Proposition 5.3 are satisfied. Then the regularized statistic satisfies, for all $\omega \in A_{q_1}$,
$$Z_n'[\Sigma_n]^R Z_n = Z_{1n}'\hat{\tilde\Lambda}_{1n} Z_{1n} + Z_{2n}'\hat{\tilde\Lambda}_{2n} Z_{2n} \xrightarrow{L} Z_1'\tilde\Lambda_1 Z_1 + Z_2'\tilde\Lambda_2 Z_2 = X_1 + X_2 ,$$
with $X_2 = Z_2'\tilde\Lambda_2 Z_2 = 0$ and $X_1 = Z_1'\tilde\Lambda_1 Z_1 \sim \chi^2(q_1)$.
It is worth noting that when the threshold $c(n)$ converges to zero at an appropriate rate, based on the convergence rate of the eigenvalues of the covariance matrix, the statistic thus regularized is asymptotically distributed as a $\chi^2(q_1)$ variable, with $q_1$ the dimension of the eigenspace corresponding to the nonzero eigenvalues. In this circumstance, we recover the asymptotic result obtained by Lütkepohl and Burda (1997), with the difference that the above regularized statistic is always greater than that of Lütkepohl and Burda (1997) in finite samples. While a simulation-based approach is required to control the level under the null hypothesis, this feature may entail some gains in power under the alternative.
In the following, we apply the above distributional result to specific test statistics: the regularized C(α) test, the regularized Wald test and, finally, the regularized LR-type test. To do so, we need to check that the assumptions required for Proposition 5.5 hold for these specific statistics. Also, under Assumptions 2.4 and 2.7, we have
$$\operatorname*{plim}_{n\to\infty} \tilde{V}_\psi(\tilde\theta_n^0) = \operatorname*{plim}_{n\to\infty} \tilde{Q}_n \tilde{I}_0 \tilde{Q}_n' = Q(\theta_0)\, I(\theta_0)\, Q(\theta_0)' = V_\psi(\theta_0) .$$
Corollary 5.6 Under Assumptions 2.4 and 2.7 and the assumptions of Proposition 5.3, we have
$$[\tilde{V}_\psi(\tilde\theta_n^0)]^R \xrightarrow{p} [V_\psi(\theta_0)]^R . \qquad (5.8)$$
The results stated in Proposition 5.5, when applied to $Z_n = \sqrt{n}\,\tilde{Q}_n D_n(\tilde\theta_n^0)$, yield a $\chi^2$ distribution for the regularized C(α) test.

Proposition 5.7 (Asymptotic distribution of the regularized C(α) statistic) Suppose Assumption 3.2 holds, and that the assumptions of Proposition 5.5 and those of Corollary 5.6 are satisfied. Then
$$PC^R(\tilde\theta_n^0; \psi) \xrightarrow{L} PC^R(\theta_0; \psi) = D_\infty(\theta_0)'\, Q(\theta_0)'\, [V_\psi(\theta_0)]^R\, Q(\theta_0)\, D_\infty(\theta_0) \sim \chi^2(q_1) \qquad (5.9)$$
with
i) $PC_{1n}(c(n)) = \sum_{j=1}^{\hat{q}_1} (\hat{b}_{1n,j}'\tilde{Z}_n)\, g(\hat\lambda_{n,j}; c(n))\, (\hat{b}_{1n,j}'\tilde{Z}_n) \xrightarrow{L} \sum_{j=1}^{q_1} (b_{1,j}'Z)\, g(\lambda_j; c(n))\, (b_{1,j}'Z) \sim \chi^2(q_1)$;
ii) $PC_{2n}(c(n)) = \sum_{j=\hat{q}_1+1}^{q} (\hat{b}_{2n,j}'\tilde{Z}_n)\, g(\hat\lambda_{n,j}; c(n))\, (\hat{b}_{2n,j}'\tilde{Z}_n) \xrightarrow{p} 0$;
where $b_{i,j}$, for $i = 1, 2$, represents an eigenvector associated with $\lambda_j$.
The results established above deserve some discussion. First, the two components are asymptotically independent: whereas the first component is asymptotically $\chi^2$-distributed, the second one collapses to zero. To the extent that the regularized C(α) statistic can be viewed as the sum of two "small" independent C(α) statistics, it can easily be simulated to provide a feasible test under singularity.
Similarly, taking $Z_n = \sqrt{n}\,\psi(\hat\theta)$, Proposition 5.5 establishes the asymptotic distribution of the regularized Wald test.
Proposition 5.8 (Asymptotic distribution of regularized Wald statistics) Suppose that the assumptions of Proposition 5.5 hold, and that Assumption 2.9 holds. Let
$$\tilde{W}_n = n\,\psi(\hat\theta)'\, \hat\Sigma_{\psi,R}\, \psi(\hat\theta) = W_{1n} + W_{2n} , \qquad (5.10)$$
with
$$W_{1n} = n\,\psi(\hat\theta)'\, \hat{V}_1(c(n))\, \hat{\tilde\Lambda}_1(c(n))\, \hat{V}_1'(c(n))\, \psi(\hat\theta) \qquad (5.11)$$
and
$$W_{2n} = n\,\psi(\hat\theta)'\, \hat{V}_2(c(n))\, \hat{\tilde\Lambda}_2(c(n))\, \hat{V}_2'(c(n))\, \psi(\hat\theta) . \qquad (5.12)$$
Then $\tilde{W}_{1n} \xrightarrow{L} W_1$, with $W_1 \sim \chi^2(q_1)$, and $W_{2n} \xrightarrow{p} 0$.
For example, in the DV1 case (i.e. with $\tilde\Lambda_2(c) = \frac{1}{c} I_{(q - q_c)}$), the modified statistic corresponds to
$$\tilde{W}_n = \tilde{W}_{1n}(c) + \tilde{W}_{2n}(c) = n\,\psi(\hat\theta)'\, \hat{V}_1(c)\, \hat{\tilde\Lambda}_1(c)\, \hat{V}_1'(c)\, \psi(\hat\theta) + \frac{n}{c}\,\psi(\hat\theta)'\, \hat{V}_2(c)\, \hat{V}_2'(c)\, \psi(\hat\theta) , \qquad (5.13)$$
and in the LB case the modified statistic corresponds to
$$\tilde{W}_n = \tilde{W}_{1n}(c) + 0 = n\,\psi(\hat\theta)'\, \hat{V}_1(c)\, \hat{\tilde\Lambda}_1(c)\, \hat{V}_1'(c)\, \psi(\hat\theta) = n\,\psi(\hat\theta)'\, \hat{V}(c)\, \hat\Lambda^+(c)\, \hat{V}'(c)\, \psi(\hat\theta) = W_n^+(c) , \qquad (5.14)$$
where $\Lambda^+(c) = \operatorname{diag}(\lambda_1^{-1}, \dots, \lambda_{q_c}^{-1}, 0, \dots, 0)$ represents the Moore-Penrose inverse of $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_{q_c}, \lambda_{q_c+1}, \dots, \lambda_q)$. Therefore $W_n^+(c) = \tilde{W}_{1n}(c)$.
When the threshold goes to zero at the appropriate speed, the regularized test statistics have a standard chi-square limiting distribution, with the nonregular component collapsing to zero. A more interesting case, producing a nonstandard asymptotic distribution, is the case where the threshold is fixed, putting positive probability mass on the eigenvalues exactly equal to the threshold $c$. Let us now study this case.
Proposition 5.9 (Asymptotic distribution of regularized statistics when the threshold is fixed) Let $Z_n$ and $Z$ be $q \times 1$ random vectors defined on a common probability space such that $Z_n \xrightarrow{L} Z \sim N[0, \Sigma]$, with $\operatorname{rank}(\Sigma) = r \le q$. Let the regularized statistic satisfy the following decomposition:
$$Z_n'[\Sigma_n]^R Z_n = Z_{1n}'\hat{\tilde\Lambda}_{1n} Z_{1n} + Z_{2n}'\hat{\tilde\Lambda}_{2n} Z_{2n} + Z_{3n}'\hat{\tilde\Lambda}_{3n} Z_{3n} .$$
Then
$$Z_{1n}'\hat{\tilde\Lambda}_{1n} Z_{1n} = \sum_{j=1}^{q_1} z_{1nj}\, g(\lambda_{1nj}; c)\, z_{1nj} \xrightarrow{L} \chi^2(q_1) ,$$
$$Z_{2n}'\hat{\tilde\Lambda}_{2n} Z_{2n} = \sum_{j=q_1+1}^{q_1+q_2} z_{2nj}\, g(\lambda_{2nj}; c)\, z_{2nj} \xrightarrow{L} \sum_{j=q_1+1}^{q_1+q_2} c\, g(c; c)\, I(\lambda_{2j} = c)\, v_j^2 \le \chi^2(q_2) ,$$
$$Z_{3n}'\hat{\tilde\Lambda}_{3n} Z_{3n} = \sum_{j=q_1+q_2+1}^{q} z_{3nj}\, g(\lambda_{3nj}; c)\, z_{3nj} \xrightarrow{L} \sum_{j=q_1+q_2+1}^{q} \lambda_{3j}\, g(\lambda_{3j}; c)\, I(\lambda_{3j} < c)\, v_j^2 \le \chi^2(q_3) ,$$
where $v_j \overset{i.i.d.}{\sim} N(0, 1)$.
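The mixture limits in Proposition 5.9 are easily simulated. The following sketch (our illustration; the eigenvalues and threshold are arbitrary assumptions) draws from the limiting law of the small-eigenvalue block and compares its quantile with the bounding $\chi^2(q_3)$ quantile:

```python
import numpy as np

def simulate_component3(lam3, c, g, n_rep=100_000, seed=3):
    """Draw from sum_j lam_j g(lam_j; c) v_j^2, v_j ~ N(0,1) i.i.d.,
    the limit of the small-eigenvalue block in Proposition 5.9."""
    rng = np.random.default_rng(seed)
    w = np.array([l * g(l, c) for l in lam3])  # weights lam_j g(lam_j; c) <= 1
    v2 = rng.standard_normal((n_rep, len(lam3))) ** 2
    return v2 @ w

lam3 = np.array([0.05, 0.01])                  # eigenvalues below c = 0.1
draws = simulate_component3(lam3, c=0.1, g=lambda l, c: 1.0 / c)  # DV1 rule
print(np.quantile(draws, 0.95))                # below the chi2(2) 95% point, 5.99
```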
6. Examples
6.1. Multistep noncausality
As already observed by Lütkepohl and Burda (1997), testing noncausality restrictions may raise singularity problems for the Wald test. We reconsider the example provided by Lütkepohl and Burda (1997) within our regularization design. A VAR(1) process is considered for the $3 \times 1$ vector $y_t = [x_t\ y_t\ z_t]'$:
$$\begin{bmatrix} x_t \\ y_t \\ z_t \end{bmatrix} = A_1 \begin{bmatrix} x_{t-1} \\ y_{t-1} \\ z_{t-1} \end{bmatrix} + u_t = \begin{bmatrix} \alpha_{xx} & \alpha_{xy} & \alpha_{xz} \\ \alpha_{yx} & \alpha_{yy} & \alpha_{yz} \\ \alpha_{zx} & \alpha_{zy} & \alpha_{zz} \end{bmatrix} \begin{bmatrix} x_{t-1} \\ y_{t-1} \\ z_{t-1} \end{bmatrix} + \begin{bmatrix} u_{x,t} \\ u_{y,t} \\ u_{z,t} \end{bmatrix} .$$
Let us consider
$$Y \equiv (y_1, \dots, y_T)\ (3 \times T), \quad B \equiv A_1\ (3 \times 3), \quad Z_t \equiv y_t\ (3 \times 1), \quad Z \equiv (Z_0, \dots, Z_{T-1})\ (3 \times T), \quad U \equiv (u_1, \dots, u_T)\ (3 \times T),$$
where $u_t = [u_{x,t}\ u_{y,t}\ u_{z,t}]'$ is a white noise with $3 \times 3$ nonsingular covariance matrix $\Sigma_u$. Let $\alpha = \operatorname{vec}(A_1) = \operatorname{vec}(B)$. Testing $H_0 : y_t \stackrel{(\infty)}{\nrightarrow} x_t$ requires testing $h = pK_z + 1 = 2$ restrictions on $\alpha$ [see Dufour and Renault (1998)] of the form:
$$r(\alpha) = \begin{bmatrix} \alpha_{xy} \\ \alpha_{xx}\alpha_{xy} + \alpha_{xy}\alpha_{yy} + \alpha_{xz}\alpha_{zy} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} .$$
These restrictions are fulfilled in the following three different parameter settings:
$$\alpha_{xy} = \alpha_{xz} = 0,\ \alpha_{zy} \neq 0 ; \qquad \alpha_{xy} = \alpha_{zy} = 0,\ \alpha_{xz} \neq 0 ; \qquad \alpha_{xy} = \alpha_{xz} = \alpha_{zy} = 0 . \qquad (6.1)$$
But we can observe that the first-order partial derivative of the restrictions leads to a singular matrix
$$\frac{\partial r}{\partial \alpha'} = \begin{bmatrix} 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ \alpha_{xy} & 0 & 0 & \alpha_{xx} + \alpha_{yy} & \alpha_{xy} & \alpha_{xz} & \alpha_{zy} & 0 & 0 \end{bmatrix} \qquad (6.2)$$
if (6.1) holds. Under such circumstances, the Wald test does not have the standard $\chi^2$ distribution under the null. To perform the Wald test, let us consider the multivariate LS estimator of $\alpha = \operatorname{vec}(A_1) = \operatorname{vec}(B)$. Using the column-stacking operator $\operatorname{vec}$, we have
$$Y = BZ + U \qquad (6.3)$$
or
vec(Y ) = vec(BZ) + vec(U )
¡
¢
y = Z 0 ⊗ I3 vec(B) + vec(U )
¡
¢
y = Z 0 ⊗ I3 α + u
where E(uu0 ) = I3 ⊗ Σu . The multivariate LS estimator α̂ is given by:
µ
¶
0 −1
α̂ = (ZZ ) Z ⊗ I3 y .
(6.4)
(6.5)
(6.6)
(6.7)
The asymptotic distribution of the multivariate LS estimator,
\[
\sqrt{T}\,(\hat\alpha - \alpha) \xrightarrow{L} N\big(0,\ \Gamma^{-1} \otimes \Sigma_u\big)\,, \tag{6.8}
\]
implies the asymptotic distribution for the restrictions:
\[
\sqrt{T}\,\big(r(\hat\alpha) - r(\alpha)\big) \xrightarrow{L} N\big(0,\ \Sigma_{r(\alpha)}\big) \tag{6.9}
\]
where
\[
\hat\Sigma_{r(\alpha)} = \frac{\partial r}{\partial \alpha'}(\hat\alpha)\ \hat\Sigma_\alpha\ \frac{\partial r'}{\partial \alpha}(\hat\alpha) \tag{6.10}
\]
is a consistent estimator of $\Sigma_{r(\alpha)}$, and
\[
\hat\Sigma_\alpha = \hat\Gamma^{-1} \otimes \hat\Sigma_u \tag{6.11}
\]
is a consistent estimator of $\Sigma_\alpha$, with
\[
\hat\Gamma = \frac{1}{T} ZZ' \tag{6.12}
\]
and
\[
\hat\Sigma_u = \frac{1}{T} \sum_{t=1}^{T} \hat u_t \hat u_t' = \frac{1}{T}\, Y\big[I_T - Z'(ZZ')^{-1}Z\big]Y'\,. \tag{6.13}
\]
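In code, the estimator (6.7) and the covariance pieces (6.11)-(6.13) can be assembled as follows (a sketch under the assumption that `Y` and `Z` are the $3 \times T$ data matrices defined above; the names are ours):

```python
import numpy as np

def var1_ls_pieces(Y, Z):
    """Multivariate LS for Y = B Z + U, with the pieces (6.7) and (6.11)-(6.13)."""
    T = Y.shape[1]
    B_hat = Y @ Z.T @ np.linalg.inv(Z @ Z.T)      # LS estimator of B = A1
    alpha_hat = B_hat.flatten(order="F")          # vec(B_hat): equivalent to (6.7)
    Gamma_hat = (Z @ Z.T) / T                     # (6.12)
    U_hat = Y - B_hat @ Z
    Sigma_u_hat = (U_hat @ U_hat.T) / T           # (6.13)
    Sigma_alpha_hat = np.kron(np.linalg.inv(Gamma_hat), Sigma_u_hat)   # (6.11)
    return alpha_hat, Sigma_alpha_hat
```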
From the asymptotic distribution (6.9), a Wald-type statistic to test the null $H_0 : r(\alpha) = 0$ is easily obtained:
\[
W_\psi = T\, r(\hat\alpha)'\, \big[\hat\Sigma_{r(\alpha)}\big]^R\, r(\hat\alpha)\,, \tag{6.14}
\]
where the regularized inverse $[\,\cdot\,]^R$ is required under the parameter settings in (6.1), in particular at the singular point $\alpha_{xy} = \alpha_{xz} = \alpha_{zy} = 0$.
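Putting the pieces together, here is a sketch of the regularized statistic (6.14), reusing `var1_ls_pieces`, `jacobian` and `regularized_inverse` from the earlier sketches (again, our names and layout, not code from the paper):

```python
import numpy as np

def noncausality_wald(Y, Z, c, scheme="LB"):
    """Regularized Wald statistic (6.14) for H0: r(alpha) = 0."""
    T = Y.shape[1]
    alpha_hat, Sigma_alpha_hat = var1_ls_pieces(Y, Z)
    a = alpha_hat   # (a_xx, a_yx, a_zx, a_xy, a_yy, a_zy, a_xz, a_yz, a_zz)
    r = np.array([a[3],
                  a[0] * a[3] + a[3] * a[4] + a[6] * a[5]])
    J = jacobian(a[0], a[4], a[3], a[6], a[5])       # dr/dalpha' at alpha_hat
    Sigma_r_hat = J @ Sigma_alpha_hat @ J.T          # (6.10)
    W = T * r @ regularized_inverse(Sigma_r_hat, c, scheme) @ r   # (6.14)
    return float(W)
```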
7. Conclusion

In this paper, we introduce two classes of tests allowing us to deal with asymptotically singular covariance matrices whose finite-sample analogs are nonsingular. Exploiting the continuity property of the total eigenprojection stated in Tyler (1981) enables us to build continuous regularized inverses that are useful in establishing asymptotic distributional results for the regularized test statistics. The standard distributional results can be used to provide asymptotically valid tests in a convenient way. However, the usual critical values may be conservative. In this respect, a simulation-based approach can improve the performance of the procedure under the alternative. Finally, the tuning parameter requires a deeper examination, together with the study of monotonic variance regularization functions.
A. Appendix: Proofs
PROOF of Lemma 4.3. If $\hat\Sigma_n \xrightarrow{a.s.} \Sigma$, then the event $A = \{\omega : \hat\Sigma_n(\omega) \underset{n\to\infty}{\to} \Sigma\}$ has probability one: $P(A) = 1$. For any $\omega \in A$, we have by Lemma 4.2:
\[
\big[\hat\Sigma_n(\omega) \underset{n\to\infty}{\to} \Sigma\big] \Rightarrow \big[\lambda_j(\hat\Sigma_n(\omega)) \to \lambda_j\,,\ j = 1, \ldots, J\big]\,.
\]
Denoting $B = \{\omega : \lambda_j(\hat\Sigma_n(\omega)) \underset{n\to\infty}{\to} \lambda_j(\Sigma)\}$, we have $A \subseteq B$; hence result i) stated in the Lemma holds with probability one. The same type of argument yields result ii) for the eigenprojections.
PROOF of Lemma 4.4. If $\hat\Sigma_n \xrightarrow{p} \Sigma$ with eigenvalues $\{\lambda_j(\hat\Sigma_n)\}$, then every subsequence $\{\hat\Sigma_{n_k}\}$, with eigenvalues $\{\lambda_j(\hat\Sigma_{n_k})\}$, also satisfies $\hat\Sigma_{n_k} \xrightarrow{p} \Sigma$. By Lukacs (1975, Theorem 2.4.3, page 48), there exists $\{\hat\Sigma_{m_l}\} \subseteq \{\hat\Sigma_{n_k}\}$ such that $\hat\Sigma_{m_l} \xrightarrow{a.s.} \Sigma$. Hence by Lemma 4.3, we have:

i) $\lambda_j(\hat\Sigma_{m_l}) \xrightarrow{a.s.} \lambda_j(\Sigma)$;

ii) $P_{j,t}(\hat\Sigma_{m_l}) \xrightarrow{a.s.} P_{j,t}(\Sigma)$ provided $\lambda_{j-1}(\Sigma) \neq \lambda_j(\Sigma)$ and $\lambda_t(\Sigma) \neq \lambda_{t+1}(\Sigma)$.

Since $\{\hat\Sigma_{m_l}\} \subseteq \{\hat\Sigma_{n_k}\} \subseteq \{\hat\Sigma_n\}$, with the corresponding eigenvalues $\{\lambda_j(\hat\Sigma_{m_l})\} \subseteq \{\lambda_j(\hat\Sigma_{n_k})\} \subseteq \{\lambda_j(\hat\Sigma_n)\}$, by Lukacs (1975, Theorem 2.4.4, page 49) it suffices that every subsequence $\{\lambda_j(\hat\Sigma_{n_k})\}$ of $\{\lambda_j(\hat\Sigma_n)\}$ contain a further subsequence $\{\lambda_j(\hat\Sigma_{m_l})\}$ which converges almost surely, to obtain convergence in probability: $\lambda_j(\hat\Sigma_n) \xrightarrow{p} \lambda_j(\Sigma)$. The same type of argument yields $P_{j,t}(\hat\Sigma_n) \xrightarrow{p} P_{j,t}(\Sigma)$.
PROOF of Proposition 5.1. We first focus on the case where all the eigenvalues are distinct before considering multiple eigenvalues. Abbreviate $\hat\lambda_j = \lambda_j(\hat\Sigma_n(\omega))$, $P(\hat\lambda_j) = P_{j,j}[\hat\Sigma_n(\omega)]$, $\lambda_j = \lambda_j(\Sigma)$ and $P(\lambda_j) = P_{j,j}(\Sigma)$. Recalling equation (4.7), we have:
\[
\tilde{\hat\Sigma}_n(\omega) = \sum_{\hat\lambda_j \in S(\hat\Sigma_n(\omega))} g(\hat\lambda_j; c)\, P(\hat\lambda_j) \quad \text{and} \quad \tilde\Sigma = \sum_{\lambda_j \in S(\Sigma)} g(\lambda_j; c)\, P(\lambda_j)\,. \tag{A.1}
\]
If $\hat\Sigma_n \xrightarrow{a.s.} \Sigma$, then by Lemma 4.3 i) we have $\hat\lambda_j \xrightarrow{a.s.} \lambda_j$ for all $j$; hence, by continuity, $g(\hat\lambda_j; c) \xrightarrow{a.s.} g(\lambda_j; c)$ $\forall j$. By Lemma 4.3 ii) and the continuity of the product, $g(\hat\lambda_j; c) P(\hat\lambda_j) \xrightarrow{a.s.} g(\lambda_j; c) P(\lambda_j)$ $\forall j$. Componentwise consistency then yields consistency of the whole sum. Hence $\tilde{\hat\Sigma}_n \xrightarrow{a.s.} \tilde\Sigma$ for $m(\lambda) = 1$.

Now, let us establish the consistency property in the case of multiple eigenvalues. Suppose $\lambda_j$ has multiplicity $m(\lambda_j)$, i.e., $\lambda_{j-1} > \lambda_j = \lambda_{j+1} = \cdots = \lambda_{j+m(\lambda_j)} > \lambda_{j+m(\lambda_j)+1}$, and $\hat\lambda_{j-1} > \hat\lambda_j \ge \hat\lambda_{j+1} \ge \cdots \ge \hat\lambda_{j+m(\hat\lambda_j)} > \hat\lambda_{j+m(\hat\lambda_j)+1}$. Let us focus on the corresponding part of the spectral decomposition of $\hat\Sigma_n$, i.e.,
\[
\begin{aligned}
\sum_{k=j}^{j+m(\hat\lambda_j)} g(\hat\lambda_k; c)\, P(\hat\lambda_k)
&= g(\hat\lambda_j; c)\, P(\hat\lambda_j) + \sum_{k=j+1}^{j+m(\hat\lambda_j)} g(\hat\lambda_k; c)\, P(\hat\lambda_k)\\
&= g(\hat\lambda_j; c)\, P(\hat\lambda_j) + \sum_{k=j+1}^{j+m(\hat\lambda_j)} \big[g(\hat\lambda_k; c) - g(\hat\lambda_j; c) + g(\hat\lambda_j; c)\big]\, P(\hat\lambda_k)\\
&= g(\hat\lambda_j; c) \sum_{k=j}^{j+m(\hat\lambda_j)} P(\hat\lambda_k) + \sum_{k=j+1}^{j+m(\hat\lambda_j)} \big[g(\hat\lambda_k; c) - g(\hat\lambda_j; c)\big]\, P(\hat\lambda_k)\\
&\xrightarrow{a.s.} g(\lambda_j; c) \sum_{k=j}^{j+m(\lambda_j)} P(\lambda_k) + \sum_{k=j+1}^{j+m(\lambda_j)} \big[g(\lambda_k; c) - g(\lambda_j; c)\big]\, P(\lambda_k)
\end{aligned} \tag{A.2}
\]
by Lemma 4.3 and the continuity of $g(\cdot; c)$. Moreover, since $\lambda_j = \lambda_{j+1} = \cdots = \lambda_{j+m(\lambda_j)}$ and $g(\cdot; c)$ is continuous, we have $g(\lambda_j; c) = g(\lambda_{j+1}; c) = \cdots = g(\lambda_{j+m(\lambda_j)}; c)$; hence $g(\lambda_k; c) - g(\lambda_j; c) = 0$ for all $k = j+1, \ldots, j+m(\lambda_j)$. As for the projection operator, $P$ is a continuous linear operator; following Kolmogorov and Fomin (1970, Theorem 1, page 223), a necessary condition for a linear operator $P$ to be continuous on a topological linear space is that $P$ be bounded. Therefore, equation (A.2) becomes
\[
\begin{aligned}
\sum_{k=j}^{j+m(\hat\lambda_j)} g(\hat\lambda_k; c)\, P(\hat\lambda_k)
&\xrightarrow{a.s.} g(\lambda_j; c) \sum_{k=j}^{j+m(\lambda_j)} P(\lambda_k) + \sum_{k=j+1}^{j+m(\lambda_j)} \underbrace{\big[g(\lambda_k; c) - g(\lambda_j; c)\big]}_{0}\ \underbrace{P(\lambda_k)}_{O(1)}\\
&\xrightarrow{a.s.} g(\lambda_j; c) \sum_{k=j}^{j+m(\lambda_j)} P(\lambda_k)\,.
\end{aligned} \tag{A.3}
\]
As the result holds for any $\lambda_j$ with an arbitrary multiplicity $m(\lambda_j) > 1$, we obtain
\[
\tilde{\hat\Sigma}_{\psi(\theta)} = \sum_{\hat\lambda \in \hat S} g(\hat\lambda; c)\, P(\hat\lambda) \xrightarrow{a.s.} \sum_{\lambda \in S} g(\lambda; c)\, P(\lambda) = \tilde\Sigma_{\psi(\theta)} \quad \forall\, m(\lambda) \ge 1\,.
\]
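The mechanism behind this proof is easy to check numerically: when an eigenvalue has multiplicity greater than one, individual eigenvectors of a perturbed matrix are unstable, yet the regularized matrix, a sum of $g(\lambda; c)$ times eigenprojections, still converges. A minimal sketch (ours, with an LB-type $g$ and an assumed threshold):

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma = np.diag([2.0, 1.0, 1.0])     # eigenvalue 1 has multiplicity 2
c = 0.5

def g_regularized(S):
    """Sum of g(lambda; c) * eigenprojections, with LB-type g."""
    lam, V = np.linalg.eigh(S)
    g = np.where(lam > c, 1.0 / np.maximum(lam, 1e-12), 0.0)
    return V @ np.diag(g) @ V.T

target = g_regularized(Sigma)
for n in [10**2, 10**4, 10**6]:
    E = rng.standard_normal((3, 3)) / np.sqrt(n)
    # Within the multiple-eigenvalue block, eigenvectors of the perturbed matrix
    # need not converge, but the regularized matrix does:
    print(n, np.linalg.norm(g_regularized(Sigma + (E + E.T) / 2) - target))
```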
PROOF of Proposition 5.5. By Assumption 5.4, we have $Z_n \xrightarrow{L} Z \sim N[0, \Sigma]$. By Proposition 5.3, we have $[\Sigma_n]^R \xrightarrow{p} [\Sigma]^R$. Hence,
\[
Z_n'[\Sigma_n]^R Z_n \xrightarrow{L} Z'[\Sigma]^R Z \sim \chi^2(\nu) \tag{A.4}
\]
with $\nu = \mathrm{rank}(\Sigma)$.
\[
\begin{aligned}
Z_n'[\Sigma_n]^R Z_n = Z_n' \hat V_n \hat{\tilde\Lambda}_n \hat V_n' Z_n
&= Z_n' \begin{bmatrix} \hat V_{1n} & \hat V_{2n} \end{bmatrix} \begin{bmatrix} \hat{\tilde\Lambda}_{1n} & 0 \\ 0 & \hat{\tilde\Lambda}_{2n} \end{bmatrix} \begin{bmatrix} \hat V_{1n}' Z_n \\ \hat V_{2n}' Z_n \end{bmatrix}\\
&= Z_n' \hat V_{1n} \hat{\tilde\Lambda}_{1n} \hat V_{1n}' Z_n + Z_n' \hat V_{2n} \hat{\tilde\Lambda}_{2n} \hat V_{2n}' Z_n
\end{aligned} \tag{A.5}
\]
Let $Z_{in} = \hat V_{in}' Z_n$ and $Z_i = V_i' Z$ for $i = 1, 2$. We decompose the limiting variable
\[
X = Z'[\Sigma]^R Z = Z' V_1 \tilde\Lambda_1 V_1' Z + Z' V_2 \tilde\Lambda_2 V_2' Z = Z_1' \tilde\Lambda_1 Z_1 + Z_2' \tilde\Lambda_2 Z_2 = X_1 + X_2
\]
in the same way as
\[
X_n = Z_n'[\Sigma_n]^R Z_n = Z_n' \hat V_{1n} \hat{\tilde\Lambda}_{1n} \hat V_{1n}' Z_n + Z_n' \hat V_{2n} \hat{\tilde\Lambda}_{2n} \hat V_{2n}' Z_n = Z_{1n}' \hat{\tilde\Lambda}_{1n} Z_{1n} + Z_{2n}' \hat{\tilde\Lambda}_{2n} Z_{2n} = X_{1n} + X_{2n}\,.
\]
Taking $f(x, y) = x + y$, by the Slutsky theorem¹ applied to the vector $X_n$, we have $X_{1n} + X_{2n} \xrightarrow{L} X_1 + X_2$, with $X_i$, $i = 1, 2$, being independent variables:
\[
X_{1n} + X_{2n} = Z_{1n}' \hat{\tilde\Lambda}_{1n} Z_{1n} + Z_{2n}' \hat{\tilde\Lambda}_{2n} Z_{2n} \xrightarrow{L} Z_1' \tilde\Lambda_1 Z_1 + Z_2' \tilde\Lambda_2 Z_2 = X_1 + X_2 \quad \forall\, \omega \in A_{q_1}\,.
\]
Note that $\mathrm{Var}(Z_1) = \Lambda_1$ and $\mathrm{Var}(Z_2) = \Lambda_2 = \mathrm{diag}(\lambda_j)$, with $\lambda_j = 0$ for all $j = q_1 + 1, \ldots, q$. For the large eigenvalues we have $\tilde\Lambda_1 \Lambda_1 = I_{q_1}$, and for the $q_2$ eigenvalues equal to zero we have $\tilde\Lambda_2 \Lambda_2 = 0_{q_2}$. This leads to $X_2 = 0$. Therefore $X_{1n} + X_{2n} \xrightarrow{L} X_1 \sim \chi^2(q_1)$ $\forall\, \omega \in A_{q_1}$.
PROOF of Proposition 5.7. Let $\hat b_{in,j}$, $i = 1, 2$, be the eigenvectors associated with $\hat\lambda_{n,j}$ which constitute the columns of the matrices $\hat V_i(c(n))$, respectively. Let
\[
PC_{1n}(c(n)) = \tilde Z_n' \hat V_1(c(n)) \hat{\tilde\Lambda}_1(c(n)) \hat V_1'(c(n)) \tilde Z_n = \sum_{j=1}^{\hat q_1} \big(\hat b_{1n,j}' \tilde Z_n\big)\, g(\hat\lambda_{n,j}; c(n))\, \big(\hat b_{1n,j}' \tilde Z_n\big) \xrightarrow{L} \sum_{j=1}^{q_1} \lambda_j \Big(\frac{1}{\lambda_j}\Big) v_j^2 = \sum_{j=1}^{q_1} v_j^2 \sim \chi^2(q_1)\,, \quad \forall\, \omega \in A_{q_1} \tag{A.6}
\]
for $v_j \overset{i.i.d.}{\sim} N(0, 1)$.

¹ The Slutsky theorem states that if $X_n \xrightarrow{L} X$ and $\Pr[X \in C(f)] = 1$, where $C(f)$ is the continuity set of $f$, then $f(X_n) \xrightarrow{L} f(X)$.
\[
PC_{2n}(c(n)) = \tilde Z_n' \hat V_2(c(n)) \hat{\tilde\Lambda}_2(c(n)) \hat V_2'(c(n)) \tilde Z_n = \sum_{j=\hat q_1+1}^{q} \big(\hat b_{2n,j}' \tilde Z_n\big)\, g(\hat\lambda_{n,j}; c(n))\, \big(\hat b_{2n,j}' \tilde Z_n\big) \xrightarrow{L} \sum_{j=\hat q_1+1}^{q} g(\hat\lambda_{n,j}; c(n))\, \hat\lambda_{n,j}\, v_j^2 \xrightarrow{p} 0 \quad \forall\, \omega \in A_{q_1}
\]
because $\hat\lambda_{n,j} \xrightarrow{p} \lambda_j = 0$ for $j = \hat q_1 + 1, \ldots, q$ and $\hat\lambda_{n,j} = O_p(1/b_n)$, with $g(\cdot; c(n))$ a function continuous except on a set of measure zero, and $v_j \overset{i.i.d.}{\sim} N(0, 1)$.
References

Amemiya, Y. (1986), Limiting distribution of certain characteristic roots under general conditions and their applications, Technical report 17, Economic Workshop, Stanford University.

Anderson, T. W. (1963), 'Asymptotic theory for principal component analysis', Annals of Mathematical Statistics 34, 122-148.

Anderson, T. W. (1987), The asymptotic distribution of characteristic roots and vectors in multivariate components of variance, Technical report 23, Econometric Workshop, Stanford University.

Andrews, D. W. K. (1987), 'Asymptotic results for generalized Wald tests', Econometric Theory 3, 348-358.

Carrasco, M. (2007), A regularizing approach to the many instrument problem, Technical report, Université de Montréal.

Carrasco, M., Chernov, M., Florens, J.-P. and Ghysels, E. (2007), 'Efficient estimation of general dynamic models with a continuum of moment conditions', Journal of Econometrics 140, 529-573.

Carrasco, M. and Florens, J.-P. (2000), 'Generalization of GMM to a continuum of moment conditions', Econometric Theory 16, 797-834.

Carrasco, M., Florens, J.-P. and Renault, E. (2007), Linear inverse problems in structural econometrics: estimation based on spectral decomposition and regularization, in J. Heckman and E. Leamer, eds, 'Handbook of Econometrics', Vol. 6, North-Holland, Amsterdam.

Dufour, J.-M. and Renault, E. (1998), 'Short run and long run causality in time series: theory', Econometrica 66, 1099-1125.

Dufour, J.-M. and Trognon, A. (2001), Invariant tests based on M-estimators, estimating functions, and the generalized method of moments, Technical report, CIREQ, University of Montreal.

Eaton, M. L. and Tyler, D. (1994), 'The asymptotic distribution of singular values with applications to canonical correlations and correspondence analysis', Journal of Multivariate Analysis 50, 238-264.

Engl, H. W., Hanke, M. and Neubauer, A. (2000), Regularization of inverse problems, Kluwer Academic Publishers.

Kato, T. (1966), Perturbation theory for linear operators, Springer.

Kolmogorov, A. N. and Fomin, S. V. (1970), Introductory real analysis, Dover Publications, Inc.

Kress, R. (1999), Linear integral equations, Applied Mathematical Sciences, second edn, Springer, New York.

Lukacs, E. (1975), Stochastic convergence, second edn, Academic Press, Inc.

Lütkepohl, H. and Burda, M. M. (1997), 'Modified Wald tests under nonregular conditions', Journal of Econometrics 78, 315-332.

Moore, D. S. (1977), 'Generalized inverses, Wald's method, and the construction of chi-squared tests of fit', Journal of the American Statistical Association 72(357), 131-137.

Newey, W. K. and West, K. D. (1987), 'Hypothesis testing with efficient method of moments estimators', International Economic Review 28, 777-787.

Rellich, F. (1953), Perturbation theory of eigenvalue problems, Lecture Notes, New York University.

Tyler, D. E. (1981), 'Asymptotic inference for eigenvectors', The Annals of Statistics 9(4), 725-736.