Preliminary and incomplete. Comments welcome.

GMM and hypothesis tests when rank conditions fail: a regularization approach∗

Jean-Marie Dufour†, McGill University
Pascale Valéry‡, HEC-Montréal

February 20, 2009

∗ This work was supported by the William Dow Chair in Political Economy (McGill University), the Canada Research Chair Program (Chair in Econometrics, Université de Montréal), the Bank of Canada (Research Fellowship), a Guggenheim Fellowship, a Konrad-Adenauer Fellowship (Alexander-von-Humboldt Foundation, Germany), the Institut de finance mathématique de Montréal (IFM2), the Canadian Network of Centres of Excellence [program on Mathematics of Information Technology and Complex Systems (MITACS)], the Natural Sciences and Engineering Research Council of Canada, the Social Sciences and Humanities Research Council of Canada, and the Fonds de recherche sur la société et la culture (Québec).

† William Dow Professor of Economics, McGill University, Centre interuniversitaire de recherche en analyse des organisations (CIRANO), and Centre interuniversitaire de recherche en économie quantitative (CIREQ). Mailing address: Department of Economics, McGill University, Leacock Building, Room 519, 855 Sherbrooke Street West, Montréal, Québec H3A 2T7, Canada. TEL: (1) 514 398 8879; FAX: (1) 514 398 4938; e-mail: jean-marie.dufour@mcgill.ca. Web page: http://www.jeanmariedufour.com

‡ Service de l'enseignement de la finance, École des Hautes Études Commerciales de Montréal (HEC-Montréal), 3000 chemin de la Côte-Sainte-Catherine, Montréal (Québec), Canada H3T 2A7. TEL: 1 (514) 340-7004; FAX: (514) 340-5632; e-mail: pascale.valery@hec.ca

ABSTRACT

This paper considers the issue of testing hypotheses in the presence of (possibly) singular covariance matrices. We specifically focus on the case where the finite-sample covariance matrix is not singular but converges to a singular population matrix asymptotically. The econometrician faces this type of situation when the derivative matrix of the restrictions has lower rank only at the true value of the parameter, or when there is an identification problem asymptotically that cannot be detected in finite samples. More generally, when the econometrician suspects a possible singularity issue, the usual inverses are discarded and most of the time replaced with generalized inverses, or g-inverses [see Moore (1977) and Andrews (1987) for generalized Wald tests built on the Moore-Penrose inverse], or with modified inverses as proposed by Lütkepohl and Burda (1997). To face such difficulties, we propose to introduce modified inverses in such a way that the continuity property is preserved (at least for some of them). Our modified inverses can be viewed as a generalization of the one introduced by Lütkepohl and Burda (1997). Our contributions are as follows. First, following Kato (1966) and Tyler (1981), we propose to work with the eigenprojection: the operator which projects onto the eigenspace (i.e., the subspace generated by the eigenvectors) is continuous. In this way, the important continuity property (w.r.t. the elements of the matrix) is preserved for eigenvalues and eigenprojections, even though eigenvectors are not continuous. The eigenvalues are perturbed in order to make them well behaved in a certain neighborhood without losing accuracy. Among different regularizing methods, some can be specifically designed to be continuous. Inverses so modified remain continuous.
As a result, test statistics are well defined and the distributional theory built from them is not misleading. Second, our regularized inverses are all the more appealing because they naturally underlie a decomposition of the covariance matrix into three components: one, viewed as regular, built on the large eigenvalues, while the others, which involve the small eigenvalues, may not be regular. Based on this decomposition, we identify a convenient decomposition for the test statistic itself. Third, the tests so regularized embed three possible cases. Case 1 corresponds to the usual case of consistent sample covariance matrices which have full rank. Case 2 corresponds to singular consistent sample covariance matrices whose rank converges to the rank of the limiting population matrix. In such a case, the limiting distribution is modified only through an adjustment of the degrees of freedom; this is the case covered by Andrews (1987). Finally, Case 3 makes use of singular consistent sample covariance matrices which do not satisfy the rank condition pinned down by Andrews (1987). In this case, a regularization is required to provide a valid test, but at the cost of a fully modified asymptotic distribution. The asymptotic distribution is no longer chi-square, but it can be approximated by simulation. Besides, we also consider monotonic regularizations. In the monotonic case, we are able to bound the distribution of the test statistic; such a bound can then be simulated to provide a valid test. Fourth, cast into our decomposition framework, the generalized Wald tests defined in Andrews (1987) and Lütkepohl and Burda (1997) consist of keeping only the first, regular component of the test statistic and setting the other, nonregular components to zero. In so discarding the latter, their technique may result in a loss of information. Revisiting their tests as we do may yield some efficiency gains. Finally, Carrasco and Florens (2000), Carrasco, Chernov, Florens and Ghysels (2007) and Carrasco (2007) use related spectral-decomposition-based tools to regularize estimators when a continuum of moments is used in a GMM framework. By contrast, we focus on testing issues by proposing regularized test statistics for any √n-consistent estimator, with the only requirement that ψ(θ̂n) be asymptotically normally distributed, for which the delta method applies.

1. Introduction

This paper considers the issue of testing hypotheses in the presence of (possibly) singular covariance matrices. We specifically focus on the case where the finite-sample covariance matrix is not singular but converges to a singular population matrix asymptotically. The econometrician faces this type of situation when the derivative matrix of the restrictions has lower rank only at the true value of the parameter, or when there is an identification problem asymptotically that cannot be detected in finite samples. More generally, when the econometrician suspects a possible singularity issue, the usual inverses are discarded and most of the time replaced with generalized inverses, or g-inverses [see Moore (1977) and Andrews (1987) for generalized Wald tests built on the Moore-Penrose inverse], or with modified inverses as proposed by Lütkepohl and Burda (1997). However, when using non-standard inverses, we have observed two difficulties of which econometricians are not always aware.
First, the well-known continuous mapping theorem, so widely used by econometricians to derive asymptotic distributional results for test statistics, does not apply anymore, because g-inverses are not (necessarily) continuous. As a matter of fact, the eigenvectors are not continuous functions of the elements of the matrix of interest, unlike the eigenvalues. Second, when performing the singular value decomposition of a matrix, the eigenvectors corresponding to eigenvalues with multiplicity larger than one are not uniquely defined, ruling out the convergence of the estimates towards population quantities. Ignoring such concerns may lead to distributional results which are, strictly speaking, wrong. To face such difficulties, we propose to introduce modified inverses in such a way that the continuity property is preserved (at least for some of them). Our modified inverses can be viewed as a generalization of the one introduced by Lütkepohl and Burda (1997).

Our contributions can be formulated as follows. First, following Kato (1966) and Tyler (1981), we propose to work with the eigenprojection (projection operator) instead, as a trick to circumvent the discontinuity and nonuniqueness features of eigenvectors. The eigenprojection projects onto the eigenspace, i.e., the subspace generated by the eigenvectors, which is invariant to the choice of the basis. A lemma given by Tyler (1981) states the continuity property for the total eigenprojection (the sum of the eigenprojections over a subset of the eigenvalues). In this way, the important continuity property is preserved for eigenvalues and eigenprojections even though eigenvectors are not continuous. By combining this total eigenprojection technique with a specific design for the perturbation function of the eigenvalues, the inverses introduced in the paper remain continuous. As a result, test statistics are well defined and the distributional theory built from them is not misleading. This continuity property (w.r.t. the elements of the matrix of interest) enjoyed by the inverses is an important contribution to the econometric literature.

Second, the regularized inverses involve a variance regularizing function (or VRF) which consists of inflating the small eigenvalues that fall below a certain threshold, so that their inverse is well defined, whereas the large eigenvalues remain unchanged. The specification of the VRF allows us to define two classes of tests, according to whether the threshold used to disentangle the small eigenvalues from the others is fixed or varies with the sample size. The first class of tests, which exploits a fixed threshold, introduces regularized test statistics with a nonstandard distribution (a mixture of chi-squares) that admits the standard chi-square distribution (with degrees of freedom corresponding to the full rank) as an upper bound. This upper bound is very useful to the extent that the usual critical points can be used to provide an asymptotically valid test. However, the econometrician can do a little better from the viewpoint of power by simulating the test. Indeed, the usual critical points yield a conservative test, which can lead to a loss of power under the alternative. The second class of tests allows the threshold to vary with the sample size.
When the threshold declines to zero at an appropriate rate, the regularized tests are asymptotically distributed as a chi-square with a reduced number of degrees of freedom (corresponding to the number of eigenvalues greater than zero), as in Lütkepohl and Burda (1997). However, this second class of tests requires more information regarding the asymptotic behavior of the sample eigenvalues. In this respect, a very general result established by Eaton and Tyler (1994), which generalizes some nonclassical results from a wide strand of the statistical and multivariate analysis literature [see Anderson (1963), Anderson (1987), Amemiya (1986)], provides us with the convergence rate of the sample eigenvalues (in the presence of eigenvalues with multiplicity greater than one) needed for our distributional results.

Third, our modified inverses are all the more appealing because they naturally underlie a decomposition of the covariance matrix into three components: one, viewed as regular, built on the large eigenvalues, while the others, which involve the small eigenvalues, may not be regular. Based on this covariance matrix decomposition, we identify a convenient decomposition for the test statistic itself.

Fourth, the tests so regularized embed three possible cases. Case 1 corresponds to the usual case of consistent sample covariance matrices which have full rank. Case 2 corresponds to sample covariance matrices which converge to a singular limiting matrix, under the condition that the rank of the matrix estimator converges to the rank of the limiting population matrix. In such a case, the limiting distribution is modified only through an adjustment of the degrees of freedom; this is the case covered by Andrews (1987) and Lütkepohl and Burda (1997). Finally, Case 3 makes use of consistent sample covariance matrices which do not satisfy the rank condition pinned down by Andrews (1987). In such a circumstance, a regularization is required to provide a valid test, but at the cost of a fully modified asymptotic distribution (under some specific designs).

Fifth, our regularized inverses may be viewed as a generalization of Lütkepohl and Burda (1997)'s results, allowing more flexibility in the selection of the threshold c that separates the eigenvalues. More generally, the estimate of the perturbation function of the eigenvalues can be viewed as a Hodges' estimator applied to the eigenvalues. With an appropriate choice of the regularization (tuning) parameter, the Hodges' estimator is a consistent estimator not only for the "small" components of the true eigenvalue vector but also for the large ones. This is the so-called superefficiency property enjoyed by the Hodges' estimator. Cast into our decomposition, the generalized Wald tests defined in Andrews (1987) and Lütkepohl and Burda (1997) consist of keeping only the first, regular component of the test statistic and setting the other, nonregular components to zero. In so discarding the latter, their technique may result in a loss of information. Revisiting their tests as we do may yield some efficiency gains. More specifically, some of the regularizing functions used are of our own design, while others are commonly used in the literature on inverse problems; see Engl, Hanke and Neubauer (2000) and Kress (1999). Some regularizing functions will be continuous in a neighborhood of the true eigenvalues, while others will only be monotonic. Even in the monotonic case we will be able to bound the distribution of the test statistic.
Such a bound can then be simulated to provide a valid test. Sixth, whereas Moore (1977), Andrews (1987) and Lütkepohl and Burda (1997) focused on generalized Wald tests, we extend the regularized tests beyond Wald tests alone by considering C(α)-type tests. Insofar as they involve a quadratic form, they can be cast into this framework. Following Dufour and Trognon (2001), who provide a comprehensive study of generalized C(α)-type tests, we shall revisit the latter in a regularized setup. On the other hand, while our main concern is testing, some authors make use of related spectral-decomposition-based tools to regularize estimators when a continuum of moments is used in a GMM or IV framework; see Carrasco and Florens (2000), Carrasco, Chernov, Florens and Ghysels (2007), Carrasco, Florens and Renault (2007) and Carrasco (2007). In particular, Carrasco (2007) proposes some modified IV estimators based on different ways of inverting the covariance matrix of instruments. Indeed, when the number of instruments is very large with respect to the sample size, or even infinite, the covariance matrix of the moment conditions becomes singular and some non-standard inverses are required. By contrast, we focus on testing issues by proposing modified test statistics.

The paper is organized as follows. In Section 2 we review the well-known methodology of usual Wald-type and C(α)-type tests in two different frameworks: a GMM framework, together with a more general setup where the only requirement is to find an asymptotically Gaussian estimator. We then introduce the regularized inverses together with the regularized statistics in Section 3. More specifically, a decomposition of the statistic is highlighted through the corresponding decomposition of the covariance matrix. In Section 4 we review and adapt some spectral theory elements which serve as useful tools for stating convergence results for the modified inverses. In particular, we highlight some (non)uniqueness and (dis)continuity properties related to the eigenvectors of a given matrix, and provide a solution based on eigenprojection techniques to surmount such difficulties. In Section 5 we state new asymptotic distributional results for the regularized statistics involving continuous regularizing functions. Finally, we provide some examples in Section 6. The proofs are gathered in the Appendix.

2. Framework and test statistics

2.1. Basic assumptions

We consider an inference problem about a parameter of interest θ ∈ Ω ⊆ Rᵖ. In order to identify the true value θ0 of θ, we consider an m × 1 vector score-type function Dn(θ, Yn), where Yn = [y1, y2, ..., yn]′ is an n × k stochastic matrix satisfying the following assumptions.

Assumption 2.1 √n Dn(θ, Yn) →p D∞(θ; θ0) as n → ∞, where D∞(θ; θ0) is a random variable.

Assumption 2.2 (Identification) Eθ0[D∞(θ; θ0)] = 0 iff θ = θ0, where the expectation is taken w.r.t. the true probability distribution.

Assumption 2.3 D∞(θ0; θ0) ∼ N[0, I(θ0)]. Thus √n Dn(θ0, Yn) →d N[0, I(θ0)] as n → ∞.

Assumption 2.3 leads to a GMM estimator through the population orthogonality condition
Eθ0[D∞(θ, θ0)] = 0.
Typically, Dn(θ, Yn) has the form
Dn(θ, Yn) = (1/n) Σ_{t=1}^{n} ht(θ, yt)
where Eθ0[ht(θ0, yt)] = 0. Usually, Assumption 2.2 requires that m ≥ p. A GMM estimator is obtained by minimizing w.r.t. θ an objective function of the form
Mn(θ, Wn) = Dn(θ, Yn)′ Wn Dn(θ, Yn)   (2.1)
where Wn is (usually) a symmetric positive definite matrix.
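To fix ideas, the following is a minimal Python sketch of the GMM objective (2.1); the moment function, data-generating design and optimizer choice are illustrative assumptions of ours, not part of the paper.

```python
import numpy as np
from scipy.optimize import minimize

def gmm_objective(theta, h, Y, W):
    """M_n(theta, W_n) = D_n' W_n D_n with D_n = (1/n) sum_t h(theta, y_t)."""
    Dn = np.mean([h(theta, y) for y in Y], axis=0)  # m x 1 sample moment vector
    return float(Dn @ W @ Dn)

# Hypothetical example: estimate mean/variance of i.i.d. data from two moments.
rng = np.random.default_rng(0)
Y = rng.normal(loc=1.0, scale=2.0, size=500)
h = lambda theta, y: np.array([y - theta[0], (y - theta[0]) ** 2 - theta[1]])

W = np.eye(2)  # first-step identity weighting matrix
res = minimize(gmm_objective, x0=np.array([0.0, 1.0]), args=(h, Y, W))
print(res.x)   # approximately (1, 4)
```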
Moreover, a standard regularity assumption usually made in the GMM framework is the following.

Assumption 2.4 (Nonsingular weighting matrix) The symmetric positive definite (possibly random) m × m matrix Wn satisfies the limiting condition:
plim_{n→∞} Wn = W0, with det(W0) ≠ 0.   (2.2)

Later on, we shall relax the above nonsingularity condition. This condition matters especially for the LR-type tests built from a GMM objective function. In this paper, we consider testing problems in the presence of singular covariance matrices. Under such circumstances, our purpose is to propose solutions which may hold irrespective of such nonsingularity assumptions. Let θ̂n be the unrestricted estimator obtained by minimizing the objective function Mn(θ, Wn), and θ̂n⁰ the corresponding constrained estimator under the null hypothesis:
H0(ψ0) : ψ(θ) = ψ0   (2.3)
where ψ : Rᵖ → R^q. The usual distributional theory critically hinges on regularity conditions such as the assumptions reviewed in this section. These assumptions define the standard regular framework upon which the whole usual distributional theory is built. We shall later relax these assumptions, especially those referring to rank conditions usually imposed on the restriction matrix or on the covariance matrices of interest.

Assumption 2.5 (Local regularity) There is a (nonempty) open neighborhood N1 of θ0 such that the q × 1 vector function ψ(θ) is continuously differentiable with rank[P(θ)] = q for all θ ∈ N1, where
P(θ) = ∂ψ/∂θ′.   (2.4)

Let
H(θ0, Yn) = ∂Dn(θ0, Yn)/∂θ′ →p J(θ0) as n → ∞.   (2.5)

In the regular framework, the m × m matrix I(θ0) and the m × p matrix J(θ0) are (usually) assumed to have full column rank. If Dn(θ, Yn) = (1/n) Σ_{t=1}^{n} ht(θ; yt), then I(θ0) and J(θ0) may be estimated using a heteroskedasticity and autocorrelation consistent (HAC) covariance matrix estimator:
Î_n(θ) = Σ_{j=−(n−1)}^{n−1} κ(j/Kn) Γ̂n(j, θ)   (2.6)
where κ(·) is a kernel function, Kn is a sample-size dependent bandwidth parameter,
Γ̂n(j, θ) = (1/n) Σ_{t=j+1}^{n} ht(θ; yt) h_{t−j}(θ; y_{t−j})′ if j ≥ 0, Γ̂n(j, θ) = (1/n) Σ_{t=1−j}^{n} h_{t+j}(θ; y_{t+j}) ht(θ; yt)′ if j < 0,   (2.7)
and
Ĵn(θ) = ∂Dn(θ)/∂θ′ = H(θ, Yn), with θ replaced by a consistent estimator.   (2.8)

2.2. Score and LR-type test statistics

In this paper we focus on two types of test statistics, namely score-type and LR-type statistics. We now briefly review the theory of C(α) tests, which encompass LM-type tests. The C(α) test statistic is well defined under the following assumptions; for a more comprehensive presentation of C(α) tests, see Dufour and Trognon (2001) and the references therein.

Assumption 2.6 (Root-n consistent restricted estimator) θ̃n⁰ is a consistent restricted estimate of θ0 such that ψ(θ̃n⁰) = 0 and √n(θ̃n⁰ − θ0) is asymptotically bounded in probability.

For the statistic to be well defined, we also need consistent estimators, as specified in the next assumption.

Assumption 2.7 (Consistent estimators) plim_{n→∞} Î_n(θ̃n⁰) = I(θ0), plim_{n→∞} P(θ̃n⁰) = P(θ0), plim_{n→∞} Ĵn(θ̃n⁰) = J(θ0).

Let P̃0 = P(θ̃n⁰), Ĩ0 = Î_n(θ̃n⁰) and J̃0 = Ĵn(θ̃n⁰) be consistent estimators. The C(α) test statistic is then defined as
PC(θ̃n⁰; ψ) = n Dn(θ̃n⁰)′ W̃0 Dn(θ̃n⁰)   (2.9)
where
W̃0 ≡ Ĩ0⁻¹ J̃0 (J̃0′ Ĩ0⁻¹ J̃0)⁻¹ P̃0′ [P̃0 (J̃0′ Ĩ0⁻¹ J̃0)⁻¹ P̃0′]⁻¹ P̃0 (J̃0′ Ĩ0⁻¹ J̃0)⁻¹ J̃0′ Ĩ0⁻¹.
Under standard regularity conditions, the C(α) statistic PC(θ̃n⁰; ψ) has an asymptotic null distribution free of nuisance parameters, namely χ²(q).
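A minimal sketch of the weighting matrix W̃0 and the C(α) statistic (2.9), assuming the inputs (score, information, Jacobian and restriction-Jacobian estimates) are already available; the function and argument names are ours and purely illustrative.

```python
import numpy as np

def c_alpha_stat(Dn, I0, J0, P0, n):
    """C(alpha) statistic (2.9): n * Dn' W0 Dn.

    Dn: m-vector score; I0: m x m; J0: m x p; P0: q x p -- all consistent
    estimates evaluated at the restricted estimator (illustrative names).
    """
    I0_inv = np.linalg.inv(I0)
    A = np.linalg.inv(J0.T @ I0_inv @ J0)     # (J0' I0^-1 J0)^-1, p x p
    B = np.linalg.inv(P0 @ A @ P0.T)          # [P0 (.) P0']^-1, q x q
    W0 = I0_inv @ J0 @ A @ P0.T @ B @ P0 @ A @ J0.T @ I0_inv
    return n * float(Dn @ W0 @ Dn)
```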
We also investigate an "LR-type" test initially suggested by Newey and West (1987):
LR(ψ) = n[Mn(θ̂n⁰, Ĩ0⁻¹) − Mn(θ̂n, Ĩ0⁻¹)]   (2.10)
where
Mn(θ, Ĩ0⁻¹) = Dn(θ)′ Ĩ0⁻¹ Dn(θ)   (2.11)
and Wn = Ĩ0⁻¹. Under standard regularity conditions, this test statistic is asymptotically pivotal with a χ²(q) distribution.

2.3. Usual Wald-type tests

More generally, there is no need to restrict our attention to a GMM estimator such as the one introduced in equation (2.1). Indeed, a Wald-type test can be built from any root-n consistent estimator satisfying the assumption below.

Assumption 2.8 (√n-convergent estimator) θ̂n is a consistent estimator of θ0 such that:
√n(θ̂n − θ0) →L N(0, Σθ) as n → ∞.   (2.12)

The above condition can even be weakened to require only that the restrictions be root-n consistent, without perturbing the Wald statistic.

Assumption 2.9 (√n convergence of the restrictions)
√n(ψ(θ̂n) − ψ(θ0)) →L N(0, Σψ) as n → ∞.   (2.13)

It is important to note that Assumption 2.8 implies Assumption 2.9, but not the converse. The usual Wald statistic for testing H0(ψ0) is:
Wn(ψ0) = n[ψ(θ̂) − ψ0]′ Σ̂ψ⁻¹ [ψ(θ̂) − ψ0]   (2.14)
provided the inverse exists, where Σ̂ψ = P(θ̂) Σ̂θ P(θ̂)′, P(θ) = ∂ψ/∂θ′, and Σ̂θ is a consistent estimator of Σθ. If H0(ψ0) holds and the q × q matrix Σψ(θ) = P(θ) Σθ P(θ)′ is nonsingular [i.e., |Σψ(θ)| ≠ 0 for θ ∈ V(θ0), where | · | stands for the determinant], then Wn(ψ0) has an asymptotic χ²(q) distribution. This is not necessarily true, however, if Σψ(θ) is singular. The latter may hold if Σθ is singular or if P(θ) does not have full row rank q. In this case, Σψ(θ) does not admit a usual inverse but can still be inverted by means of a generalized inverse, or g-inverse. In Section 3, we introduce the family of regularized test statistics based on regularized inverses of the covariance matrix.

3. Regularized inference

In this section, we introduce the concept of regularized tests, which embed three possible cases. Case 1 corresponds to the usual case of consistent sample covariance matrices which have full rank. Case 2 corresponds to sample covariance matrices which converge to a singular limiting matrix, under the condition that the rank of the matrix estimator converges to the rank of the limiting population matrix. In such a case, the limiting distribution is modified only through an adjustment of the degrees of freedom; this is the case covered by Andrews (1987) and Lütkepohl and Burda (1997). Finally, Case 3 makes use of consistent sample covariance matrices which do not satisfy the rank condition pinned down by Andrews (1987). In such a circumstance, a regularization is required to provide a valid test, but at the cost of a fully modified asymptotic distribution. This is the path investigated in this paper. We consider situations where the rank of the covariance matrix is incomplete asymptotically but not in finite samples. We can face this type of situation when the derivative matrix of the restrictions has lower rank only at the true value of the parameter, or when there is an identification problem asymptotically that cannot be detected in finite samples. To the best of our knowledge, such cases have not been investigated in the literature. Under Assumption 3.1 below, and more generally for any matrix with incomplete rank – either asymptotically or in finite samples – the usual inverse is not defined; one must either use a generalized inverse or transform the matrix so that it becomes invertible.
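A small numerical illustration of the problem, under hypothetical numbers of our own choosing: when Σψ is (nearly) singular, the usual inverse in (2.14) fails, while a Moore-Penrose g-inverse remains computable but, as emphasized above, is not a continuous function of the matrix entries.

```python
import numpy as np

# Hypothetical restriction covariance matrix that is singular in the limit
# (rank 1 instead of 2), as in Case 3.
Sigma_psi = np.array([[1.0, 1.0],
                      [1.0, 1.0]])        # rank 1: the usual inverse fails
psi_hat = np.array([0.02, -0.01])         # hypothetical restriction estimates
n = 1000

try:
    W = n * psi_hat @ np.linalg.inv(Sigma_psi) @ psi_hat
except np.linalg.LinAlgError:
    print("usual inverse undefined: Sigma_psi is singular")

# The Moore-Penrose g-inverse is always computable, but pinv is not continuous
# in the matrix entries, so the chi-square theory may break down when the
# sample matrix is near-singular.
W_ginv = n * psi_hat @ np.linalg.pinv(Sigma_psi) @ psi_hat
print(W_ginv)
```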
Based on spectral decomposition tools, we propose to regularize the inverse of the covariance matrix by means of an eigenvalue perturbation function. The eigenvalue perturbation function is specifically designed to make the inverse well behaved in a neighborhood of the true eigenvalues. The variance regularization method works under general conditions as long as the threshold c(n) converges to 0 at an appropriate rate, which depends on the convergence rate of the eigenvalues of the sample covariance matrix toward those of the limiting matrix. Further, the variance regularization method leads to a decomposition of the regularized statistics into three components: one, viewed as regular, built on the large eigenvalues, while the others, which involve the small eigenvalues, may not be regular.

3.1. Variance regularization

We first introduce some notation. Let λ̄ = (λ1, ..., λq)′, where λ1 ≥ λ2 ≥ ... ≥ λq are the eigenvalues of a q × q covariance matrix Σ, and let V be an orthogonal matrix such that Σ = VΛV′, where Λ = diag(λ1, ..., λq). Specifically, V consists of eigenvectors of the matrix Σ ordered so that ΣV = VΛ. Let m(λ) be the multiplicity of the eigenvalue λ. Although the matrix Λ is uniquely defined, the matrix V, which consists of the eigenvectors, is not uniquely defined when there is an eigenvalue with multiplicity m(λ) > 1. The eigenvectors corresponding to eigenvalues with m(λ) > 1 are defined only up to postmultiplication by an m(λ) × m(λ) orthogonal matrix. Moreover, let Σ̂ be a consistent estimator of Σ with eigenvalues λ̂1 ≥ λ̂2 ≥ ... ≥ λ̂q, and let V̂ be an orthogonal matrix such that Σ̂ = V̂Λ̂V̂′, where Λ̂ = diag(λ̂1, ..., λ̂q). For c > 0, we denote by q(Σ, c) the number of eigenvalues λ such that λ > c, and by q(Σ̂, c) the number of eigenvalues λ̂ such that λ̂ > c. If rank(Σ̂) = rank(Σ) = q with probability 1, i.e., both matrices are almost surely (a.s.) nonsingular, then the inverses Σ⁻¹ = VΛ⁻¹V′ and Σ̂⁻¹ = V̂Λ̂⁻¹V̂′ are (a.s.) well defined. However, if rank(Σ) < q and/or rank(Σ̂) < q, we need to make adjustments. For this, we define a regularized inverse of a covariance matrix Σ:
Σ_R = V Λ̃ V′   (3.1)
where
Λ̃ = Λ̃[λ̄; c] = diag(g[λ1; c], ..., g[λq; c]),   (3.2)
g[λ; c] ≥ 0, and g[λ; c] is non-increasing in λ. The scalar function g[λ; c] modifies the inverse of the eigenvalues in order to make the inverse well behaved in a neighborhood of the true eigenvalues. We shall call it the variance regularization function (VRF). We now introduce a partition of the matrix Λ̃ into three submatrices, where c represents a threshold which may depend on the sample size and possibly on the sample itself, i.e., c = c[n, Yn]:
Λ̃ = diag(Λ̃1[λ̄; c], Λ̃2[λ̄; c], Λ̃3[λ̄; c]).   (3.3)
Let qi = dim Λ̃i[λ̄; c], for i = 1, 2, 3, with q1 = q(Σ, c), q2 = m(c) and q3 = q − q1 − q2. In other words, q1 represents the number of eigenvalues λi > c. The three components correspond to:
Λ̃1[λ̄; c] = diag(g[λ1; c], ..., g[λ_{q1}; c]) for λ > c;
Λ̃2[λ̄; c] = g(c; c) I_{q2} for λ = c, an m(c) × m(c) matrix with m(c) denoting the multiplicity of the eigenvalue λ = c (if any);
Λ̃3[λ̄; c] = diag(g[λ_{q1+q2+1}; c], ..., g[λq; c]) for λ < c.
Let V1(c) be a q × q1 matrix whose columns are the eigenvectors associated with the eigenvalues λ > c, arranged in the same order as the eigenvalues. The eigenvectors associated with an eigenvalue λ form a basis of the eigenspace corresponding to λ. If m(λ) = 1, these eigenvectors are uniquely defined; otherwise they are not.
The same holds for the q × q2 matrix V2(c), whose columns are the eigenvectors associated with the eigenvalues λ = c, and for the q × q3 matrix V3(c), whose columns are the eigenvectors associated with the eigenvalues λ < c. Λ̃1[λ̂; c], Λ̃2[λ̂; c], Λ̃3[λ̂; c], V̂1(c), V̂2(c) and V̂3(c) denote the corresponding quantities based on the estimate Σ̂, with dim Λ̃1[λ̂; c] = q̂1 = card{i ∈ I : λ̂i > c(n)}, dim Λ̃2[λ̂; c] = q̂2 = card{i ∈ I : λ̂i = c(n)}, and dim Λ̃3[λ̂; c] = q̂3 = card{i ∈ I : λ̂i < c(n)}, respectively. The following VRF will turn out to be particularly convenient:
g(λ̂n,j; c) = 1/λ̂n,j if λ̂n,j > c(n); g(λ̂n,j; c) = 1/(ε + γ|c(n) − λ̂n,j|) if λ̂n,j ≤ c(n),   (3.4)
where γ > 0. It can be viewed as a modified Hodges' estimator applied to the eigenvalues.

3.2. Regularized score-type tests

Consider the q × m matrix
Q̃n ≡ Q̃[Wn] ≡ P̃n (J̃n′ Wn J̃n)⁻ J̃n′ Wn,   (3.5)
with P̃n = P(θ̃n⁰), J̃n = Ĵ(θ̃n⁰), P(θ) = ∂ψ/∂θ′, and Ĵ(θ) = ∂Dn(θ)/∂θ′ = Hn(θ).

The assumptions below define the specific nonregular setup we investigate throughout this paper. Specifically, we examine to what extent relaxing the usual nonsingularity assumptions affects the behavior of the test criterion. The proposed regularizing tools aim at curing the misbehavior of the test criterion under such circumstances.

Assumption 3.1 (Incomplete restriction rank) There is a (non-empty) open neighborhood N2 of θ0 such that ψ(θ) is continuously differentiable and rank[P(θ)] = r1 for all θ ∈ N2, with 0 < r1 < q.

In the following, we analyze situations where the asymptotic covariance matrix does not necessarily have full rank. We further investigate situations where the rank is incomplete not only asymptotically but also in finite samples. To the best of our knowledge, such cases have not yet been investigated in the literature. Under Assumption 3.1, and more generally for any matrix with incomplete rank – either asymptotically or in finite samples – the usual inverse is not defined; one must either use a generalized inverse or transform the matrix so that it becomes invertible. While Dufour and Trognon (2001, Proposition 3.1) have chosen the generalized inverse approach, we opt for the regularizing approach instead. The variance regularization method works under general conditions as long as the threshold c(n) converges to 0 at an appropriate rate, which depends on the rate of convergence of the covariance matrix eigenvalues. The following assumption is crucial for the distribution of the regularized C(α) test; for specific conditions under which this assumption holds in a GMM framework, see Dufour and Trognon (2001).

Assumption 3.2
Z̃n →p Zn, Zn →L Z ∼ N[0, Vψ(θ0)],   (3.6)
where Z̃n = √n Q̃n Dn(θ̃n⁰), Zn = √n Q(θ0) Dn(θ0), and Z = Q(θ0) D∞(θ0).

We now introduce the regularized C(α) statistic:
PC_R(θ̃n⁰; ψ) = n Dn(θ̃n⁰)′ Q̃n′ [Q̃n Ĩ0 Q̃n′]_R Q̃n Dn(θ̃n⁰).   (3.7)
The latter can be decomposed into three components. To do so, we set
Vψ(θ) ≡ Q(θ) I(θ) Q(θ)′,   (3.8)
so that
Ṽψ(θ̃n⁰) ≡ Q̃n Ĩ0 Q̃n′.   (3.9)
The spectral decomposition of Ṽψ(θ̃n⁰) is:
Ṽψ(θ̃n⁰) = V(θ̃n⁰) Λ(θ̃n⁰) V(θ̃n⁰)′.   (3.10)
This yields the Moore-Penrose inverse
[Ṽψ(θ̃n⁰)]⁺ = V̂1(c) Λ̂˜1(c) V̂1′(c)   (3.11)
and the regularized inverse
[Ṽψ(θ̃n⁰)]_R = V̂1(c) Λ̂˜1(c) V̂1′(c) + V̂2(c) Λ̂˜2(c) V̂2′(c) + V̂3(c) Λ̂˜3(c) V̂3′(c).   (3.12)
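A minimal sketch of the regularized inverse (3.1)-(3.2) built from the VRF (3.4); the default values of ε and γ, and the helper names, are hypothetical choices of ours.

```python
import numpy as np

def vrf(lam, c, eps=1e-2, gamma=1.0):
    """Variance regularization function (3.4): the inverses of large
    eigenvalues are untouched; those of eigenvalues below c are bounded."""
    return 1.0 / lam if lam > c else 1.0 / (eps + gamma * abs(c - lam))

def regularized_inverse(Sigma, c, g=vrf):
    """Sigma_R = V diag(g(lam_1;c), ..., g(lam_q;c)) V'  (equations (3.1)-(3.2))."""
    lam, V = np.linalg.eigh(Sigma)            # ascending eigenvalues
    lam, V = lam[::-1], V[:, ::-1]            # reorder: lam_1 >= ... >= lam_q
    g_lam = np.array([g(l, c) for l in lam])
    return V @ np.diag(g_lam) @ V.T

# Example with a nearly singular covariance matrix (hypothetical numbers).
Sigma_hat = np.array([[2.0, 0.0], [0.0, 1e-8]])
print(regularized_inverse(Sigma_hat, c=0.1))  # small eigenvalue's inverse is bounded
```

Unlike the Moore-Penrose-style truncation in (3.11), the small-eigenvalue block here keeps a bounded, nonzero contribution, which is precisely the source of the potential efficiency gains discussed above.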
Based on this regularized inverse, we introduce the regularized C(α) statistic, which can be decomposed as follows:
PC_R(θ̃n⁰; ψ) = n Dn(θ̃n⁰)′ Q̃n′ [Ṽψ(θ̃n⁰)]_R Q̃n Dn(θ̃n⁰)
= n Dn(θ̃n⁰)′ Q̃n′ [V̂1(c) Λ̂˜1(c) V̂1′(c) + V̂2(c) Λ̂˜2(c) V̂2′(c) + V̂3(c) Λ̂˜3(c) V̂3′(c)] Q̃n Dn(θ̃n⁰)
= PC_1n(c) + PC_2n(c) + PC_3n(c),   (3.13)
where
PC_1n(c) = n Dn(θ̃n⁰)′ Q̃n′ V̂1(c) Λ̂˜1(c) V̂1′(c) Q̃n Dn(θ̃n⁰),   (3.14)
PC_2n(c) = n Dn(θ̃n⁰)′ Q̃n′ V̂2(c) Λ̂˜2(c) V̂2′(c) Q̃n Dn(θ̃n⁰),   (3.15)
PC_3n(c) = n Dn(θ̃n⁰)′ Q̃n′ V̂3(c) Λ̂˜3(c) V̂3′(c) Q̃n Dn(θ̃n⁰).   (3.16)
It is important to remark that the regularized C(α) statistic given in equation (3.13), together with its three components, can easily be simulated as soon as we have a consistent estimate Ṽψ(θ̃n⁰) of the limiting covariance matrix Vψ(θ0) of the score vector. Applying the spectral decomposition followed by the proposed regularizing design, we can then easily simulate the regularized statistic. The appealing and convenient property of being simulable enjoyed by this specific formulation widens its scope of application to general nonregular situations. Another appealing feature of this regularized C(α) statistic is that it can be decomposed into three independent components, the independence of the components arising from the eigenprojections onto different eigenspaces. This property greatly simplifies the derivation of the distributional result for the regularized C(α) test.

3.3. Regularized LR-type tests

We now study the behavior of the LR-type statistic under the following nonregularity assumption.

Assumption 3.3 (Singularity of the weighting matrix) det(I(θ0)) = 0.

We propose to regularize the "LR-type" test statistic in the following way:
LR(ψ)_R = n[Mn(θ̂n⁰, Ī_{n,R}) − Mn(θ̂n, Ī_{n,R})]   (3.17)
where
Mn(θ, Ī_{n,R}) = Dn(θ)′ Ī_{n,R} Dn(θ)
= Dn(θ)′ V̂1(c) Λ̂˜1(c) V̂1′(c) Dn(θ) + Dn(θ)′ V̂2(c) Λ̂˜2(c) V̂2′(c) Dn(θ) + Dn(θ)′ V̂3(c) Λ̂˜3(c) V̂3′(c) Dn(θ).   (3.18)
In the statistic Mn(θ, Ī_{n,R}), Ī_{n,R} is any consistent estimator of the regularized covariance matrix I_R(θ0), typically a function of a preliminary estimator θ̄n of θ: Ī_{n,R} = Î_{n,R}(θ̄n) = [Î_n(θ̄n)]_R, where Î_n(θ) is defined as in equation (2.6). In the sequel, let M_{n,R}(θ) ≡ Mn(θ, Ī_{n,R}). This yields a decomposition of the LR statistic into three components:
LR(ψ)_R = n[Mn(θ̂n⁰, Ī_{n,R}) − Mn(θ̂n, Ī_{n,R})]
= n[Dn(θ̂n⁰)′ V̂1(c) Λ̂˜1(c) V̂1′(c) Dn(θ̂n⁰) − Dn(θ̂n)′ V̂1(c) Λ̂˜1(c) V̂1′(c) Dn(θ̂n)]
+ n[Dn(θ̂n⁰)′ V̂2(c) Λ̂˜2(c) V̂2′(c) Dn(θ̂n⁰) − Dn(θ̂n)′ V̂2(c) Λ̂˜2(c) V̂2′(c) Dn(θ̂n)]
+ n[Dn(θ̂n⁰)′ V̂3(c) Λ̂˜3(c) V̂3′(c) Dn(θ̂n⁰) − Dn(θ̂n)′ V̂3(c) Λ̂˜3(c) V̂3′(c) Dn(θ̂n)]
= LR_{1,n} + LR_{2,n} + LR_{3,n}.   (3.19)
The regularized estimator Î_{n,R}(θ̄n) of the covariance matrix I_R(θ0) corresponds to:
Î_{n,R}(θ̄n) = Σ_{j=−(n−1)}^{n−1} κ(j/Kn) Γ̂_{n,R}(j, θ̄n)   (3.20)
where Γ̂_{n,R}(j, θ̄n) = [Γ̂n(j, θ̄n)]_R, with Γ̂n(j, θ̄n) defined as in equation (2.7).

3.4. Regularized Wald tests

Similarly, we can decompose the original Wald statistic Wn in (2.14) as
Wn = W_{1n}(c) + W_{2n}(c) + W_{3n}(c)   (3.21)
where
W_{1n}(c) = n ψ(θ̂)′ V̂1(c) Λ̂1⁻¹(c) V̂1′(c) ψ(θ̂),   (3.22)
W_{2n}(c) = n ψ(θ̂)′ V̂2(c) Λ̂2⁻¹(c) V̂2′(c) ψ(θ̂),   (3.23)
W_{3n}(c) = n ψ(θ̂)′ V̂3(c) Λ̂3⁻¹(c) V̂3′(c) ψ(θ̂).   (3.24)
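The three-block split above applies to any of the quadratic-form statistics (3.13), (3.19) or (3.21). A minimal sketch, assuming a VRF g such as the one in the previous code block; the tolerance used to detect eigenvalues exactly equal to c is our own illustrative device.

```python
import numpy as np

def decompose_quadratic_form(x, Sigma, c, g, n, tol=1e-12):
    """Split n * x' Sigma_R x into the three blocks of (3.13)/(3.21):
    eigenvalues > c, = c (within tol), and < c."""
    lam, V = np.linalg.eigh(Sigma)
    lam, V = lam[::-1], V[:, ::-1]      # descending order
    proj = V.T @ x                      # coordinates of x in the eigenbasis
    parts = {1: 0.0, 2: 0.0, 3: 0.0}
    for l, z in zip(lam, proj):
        block = 1 if l > c else (2 if abs(l - c) <= tol else 3)
        parts[block] += n * g(l, c) * z ** 2
    return parts                        # parts[1] + parts[2] + parts[3] = statistic
```

With g equal to the VRF sketched earlier this reproduces the regularized statistic; setting g to 1/λ on the first block and 0 elsewhere reproduces the Lütkepohl-Burda truncated statistic discussed below.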
We now propose to consider regularized Wald statistics defined as follows:
W̃n = n ψ(θ̂)′ Σ̂_R ψ(θ̂) = n ψ(θ̂)′ V̂ Λ̂˜ V̂′ ψ(θ̂)
= n ψ(θ̂)′ V̂ diag(g(λ̂1; c), ..., g(λ̂q; c)) V̂′ ψ(θ̂)
= n ψ(θ̂)′ [V̂1(c) Λ̂˜1(c) V̂1′(c) + V̂2(c) Λ̂˜2(c) V̂2′(c) + V̂3(c) Λ̂˜3(c) V̂3′(c)] ψ(θ̂)   (3.25)
so that
W̃n = W̃_{1n}(c) + W̃_{2n}(c) + W̃_{3n}(c).   (3.26)
By partitioning the eigenvalue matrix Λ̃ into three blocks – Λ̃1(c) for λ > c, Λ̃2(c) for λ = c and Λ̃3(c) for λ < c – we have identified a convenient decomposition of the statistic into three components: a first component involving the "large" eigenvalues, a second component gathering the eigenvalues exactly equal to the threshold c, and a third component incorporating the small remaining eigenvalues. As we will see below, the first component can be viewed as "regular", while the other ones may not be. In particular, the third component requires a regularization: because of the invertibility difficulties raised by small values of λ, we replace the latter with regularized, bounded eigenvalues. Instead, Lütkepohl and Burda (1997) keep only the eigenvalues greater than the threshold c, which boils down to setting W̃_{2n}(c) = 0 and W̃_{3n}(c) = 0. Discarding the small eigenvalues in such a way may result in a loss of information. It is important to note that in finite samples it will be difficult to distinguish the estimates which truly correspond to λ = c from those which are close to c but distinct from it. This makes the estimation procedure trickier and the asymptotic distribution more complicated. Note that W_{1n}(c) = W̃_{1n}(c), for this is the regular component common to the usual and the regularized Wald statistics. Moreover, when there is no eigenvalue exactly equal to the threshold c, m(c) = 0 and the decomposition involves only two components.

4. Preliminary results on eigenprojections

In this section, we discuss some non-uniqueness and discontinuity issues regarding the eigenvectors of a given matrix. While it is well known in spectral theory that the eigenvectors corresponding to multiple eigenvalues are not uniquely defined (but only up to postmultiplication by an m(λ) × m(λ) orthogonal matrix, with m(λ) indicating the multiplicity of the eigenvalue), econometricians are not always cautious about checking such conditions. Second, whereas the eigenvalues are generally known to be continuous functions of the elements of the matrix, this statement does not necessarily hold for the eigenvectors. The main pitfall consists of drawing convergence results for the estimates of the eigenvectors from the consistency of the sample matrix, a step which critically hinges on the continuity of the eigenvectors w.r.t. the elements of the matrix. Even in the deterministic case, the eigenvectors are not necessarily continuous functions of the elements of the matrix. To see the discontinuity of the eigenvectors, we consider the following simple counter-example.

Example 4.1 Let A(x) be the matrix function defined as:
A(x) = [[1 + x, 0], [0, 1 − x]] if x < 0; A(x) = [[1, x], [x, 1]] if x ≥ 0.   (4.1)
This matrix function is clearly continuous at x = 0, with A(0) = I2.
However, for x < 0, the spectral decomposition of A(x) is:
A(x) = (1 + x) [1, 0]′[1, 0] + (1 − x) [0, 1]′[0, 1]   (4.2)
[with (1 + x) and (1 − x) being the eigenvalues and (1, 0)′ and (0, 1)′ the eigenvectors], while for x > 0 it is
A(x) = (1 + x) (1/2)[1, 1]′[1, 1] + (1 − x) (1/2)[1, −1]′[1, −1]   (4.3)
[with (1 + x) and (1 − x) being the eigenvalues and (1/√2)(1, 1)′ and (1/√2)(1, −1)′ the eigenvectors]. Clearly, the eigenvalues (1 + x) and (1 − x) are continuous at x = 0, whereas the eigenvectors are not the same whether x → 0⁺ or x → 0⁻. Being unaware of this caveat may lead to wrong distributional results through mistakenly applying the continuous mapping theorem to objects that are not continuous. Nevertheless, there exist functions of the eigenvectors which are continuous w.r.t. the elements of the matrix. Specifically, for an eigenvalue λ, the projection matrix P(λ) onto the space spanned by the eigenvectors associated with λ – the eigenspace V(λ) – is continuous in the elements of the matrix. This follows from the fact that V(λ) is invariant to the choice of a basis [of eigenvectors of λ]. For further discussion of this important property, see Rellich (1953), Kato (1966) and Tyler (1981).

In order to derive the asymptotic distribution of the regularized statistics, it will be useful to review and adapt some results on spectral theory used by Tyler (1981). Let S(Σ) denote the spectral set of Σ, i.e., the set of all the eigenvalues of Σ. The eigenspace of Σ associated with λ is defined as the set of all linear combinations of a basis of eigenvectors xi(λ), i = 1, ..., m(λ), i.e.,
V(λ) = {x(λ) ∈ R^q : Σx(λ) = λx(λ)}.   (4.4)
Clearly, dim V(λ) = m(λ). Since Σ is a q × q matrix symmetric in the metric of a real positive definite symmetric matrix T (i.e., TΣ is symmetric), we have:
R^q = Σ_{λ∈S(Σ)} V(λ).   (4.5)
The eigenprojection of Σ associated with λ, denoted P(λ), is the projection operator onto V(λ) w.r.t. the decomposition (4.5) of R^q. For any set of vectors xi(λ) in V(λ) such that xi(λ)′ T xj(λ) = δij, where δij denotes the Kronecker delta, P(λ) has the representation
P(λ) = Σ_{j=1}^{m(λ)} xj(λ) xj(λ)′ T.   (4.6)
P(λ) is symmetric in the metric of T. This yields
Σ = Σ_{λ∈S(Σ)} λ P(λ),  Σ̂ = Σ_{λ̂∈S(Σ̂)} λ̂ P(λ̂).   (4.7)
If v is any subset of the spectral set S(Σ), then the total eigenprojection for Σ associated with the eigenvalues in v is defined to be Σ_{λ∈v} P(λ). More specifically, let P_{k,t}(λ) = Σ_{l=k}^{t} P(λl). We report below a lemma given by Tyler (1981, Lemma 2.1, p. 726) which states an important continuity property for eigenvalues and eigenprojections onto eigenspaces of non-random symmetric matrices; it will be useful to establish the consistency of the sample regularized inverses toward their population counterparts.

Lemma 4.2 (Continuity of eigenvalues and eigenprojections) Let Σn be a q × q real matrix symmetric in the metric of a real positive definite symmetric matrix Tn, with eigenvalues λ1(Σn) ≥ λ2(Σn) ≥ ... ≥ λq(Σn). Let P_{k,t}(Σn) represent the total eigenprojection for Σn associated with λk(Σn), ..., λt(Σn) for t ≥ k. If Σn → Σ as n → ∞, then i) λk(Σn) → λk(Σ), and ii) P_{k,t}(Σn) → P_{k,t}(Σ) provided λ_{k−1}(Σ) ≠ λk(Σ) and λt(Σ) ≠ λ_{t+1}(Σ).

This lemma tells us that the eigenvalues are continuous functions of the elements of the matrix. The same property holds for the projection operators [or equivalently for the projection matrices, since there is a one-to-one relation between the operator and the matrix with respect to the given bases] associated with the eigenvalues, and it is transmitted to their sum. No matter what the multiplicity of the eigenvalues involved in the total eigenprojection [P_{k,t}(λ) = Σ_{l=k}^{t} P(λl)], this continuity property holds provided that we can find one eigenvalue just before and one just after which are distinct.
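The following sketch checks Example 4.1 and Lemma 4.2 numerically (a minimal illustration of ours, not part of the formal development): the eigenvector of the largest eigenvalue jumps across x = 0, while the total eigenprojection over the merging group stays continuous.

```python
import numpy as np

def A(x):
    """Matrix function from Example 4.1."""
    if x < 0:
        return np.array([[1.0 + x, 0.0], [0.0, 1.0 - x]])
    return np.array([[1.0, x], [x, 1.0]])

def eigprojections(M):
    """Eigenvalues (descending) and individual eigenprojections v_j v_j'."""
    lam, V = np.linalg.eigh(M)
    idx = range(V.shape[1] - 1, -1, -1)
    return lam[::-1], [np.outer(V[:, j], V[:, j]) for j in idx]

for x in (-1e-6, 1e-6):
    lam, P = eigprojections(A(x))
    print(x, lam, P[0].round(3))  # P(lambda_1): diag(0,1) for x<0 vs. all-1/2 for x>0

# The individual projection P(lambda_1) is discontinuous at x = 0 because the
# two eigenvalues merge there.  The total eigenprojection over the separated
# group, P(lambda_1) + P(lambda_2), is continuous (identically I_2), exactly
# as the proviso of Lemma 4.2 requires.
for x in (-1e-6, 1e-6):
    lam, P = eigprojections(A(x))
    print(P[0] + P[1])            # = I_2 on both sides of x = 0
```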
We consider first almost sure (a.s.) convergence and then convergence in probability (denoted →p). It will be useful to extend Lemma 4.2 to random symmetric matrices. To the best of our knowledge, these results are not stated elsewhere. In the following, we tacitly assume that a probability space (Z, A_Z, P) is given and that all random variables are defined on this space.

Lemma 4.3 Let Σ̂n be a q × q real random matrix symmetric in the metric of a real positive definite symmetric random matrix Tn, with eigenvalues λ1(Σ̂n) ≥ λ2(Σ̂n) ≥ ... ≥ λq(Σ̂n). Let P_{k,t}(Σ̂n) represent the total eigenprojection for Σ̂n associated with λk(Σ̂n), ..., λt(Σ̂n) for t ≥ k. If Σ̂n →a.s. Σ as n → ∞, then: i) λk(Σ̂n) →a.s. λk(Σ), and ii) P_{k,t}(Σ̂n) →a.s. P_{k,t}(Σ) provided λ_{k−1}(Σ) ≠ λk(Σ) and λt(Σ) ≠ λ_{t+1}(Σ).

In the sequel, we show that the continuity property of the eigenvalues and eigenprojections established in the a.s. case remains valid for convergence in probability.

Lemma 4.4 Let Σ̂n be a q × q real random matrix symmetric in the metric of a real positive definite symmetric random matrix Tn, with eigenvalues λ1(Σ̂n) ≥ λ2(Σ̂n) ≥ ... ≥ λq(Σ̂n). Let P_{k,t}(Σ̂n) represent the total eigenprojection for Σ̂n associated with λk(Σ̂n), ..., λt(Σ̂n) for t ≥ k. If Σ̂n →p Σ as n → ∞, then, for 1 ≤ k ≤ q: i) λk(Σ̂n) →p λk(Σ), and ii) P_{k,t}(Σ̂n) →p P_{k,t}(Σ) provided λ_{k−1}(Σ) ≠ λk(Σ) and λt(Σ) ≠ λ_{t+1}(Σ).

5. Asymptotic theory

In this section, we state some asymptotic results for the regularized statistics, which rely on different specifications of the regularized inverses. We first study the simple case of a continuous regularizing function before considering the monotonic case. The monotonicity property can be exploited to yield bounds on the limiting distribution of the modified statistics. Based on the spectral decomposition in equation (4.7), we immediately deduce a spectral decomposition for the regularized inverses:
Σ_R = Σ_{λ∈S(Σ)} g(λ; c) P(λ),  Σ̂_R = Σ_{λ̂∈S(Σ̂)} g(λ̂; c) P(λ̂).   (5.1)
We provide below various regularized inverses based on different specifications of the regularizing function g(λ; c), which may depend on the sample size n. This dependence may be introduced through the threshold cn:
g(λ, n) = g(λ, cn).   (5.2)

5.1. Distributional theory for regularizing functions continuous almost everywhere

Let the general eigenvalue-regularizing function g(λ; c) be defined as:
g(λ; c) = 1/λ if λ > c; g(λ; c) = 1/(c + γ|λ − c|) if λ ≤ c.   (5.3)
Interesting special cases include:
i) γ = ∞, hence
g(λ; c) = 1/λ if λ > c; g(λ; c) = 0 if λ ≤ c,
and therefore Λ̃ = Λc⁺, where Λc⁺ = diag[(1/λ1)I(λ1 > c), ..., (1/λ_{q1})I(λ_{q1} > c), 0, ..., 0], with I(s) equal to 1 if the relation s is satisfied; Λ̃ is then a modified version of the Moore-Penrose inverse of Λc = diag[λ1 I(λ1 > c), ..., λ_{q1} I(λ_{q1} > c), λ_{q1+1} I(λ_{q1+1} ≤ c), ..., λq I(λq ≤ c)], as used by Lütkepohl and Burda (1997, henceforth LB);
ii) γ = 0, hence
g(λ; c) = 1/λ if λ > c; g(λ; c) = 1/c if λ ≤ c,
henceforth called DV1;
iii) γ = −1/2, hence
g(λ; c) = 1/λ if λ > c; g(λ; c) = 2/(λ + c) if λ ≤ c,
henceforth called DV2.
Note that the eigenvalue-regularizing function g(λ; c) may allow for a sample-size dependent threshold cn.

Proposition 5.1 (Almost sure convergence of the regularized inverses) Let g(λ; c) be a continuous function at each eigenvalue of Σ. If Σ̂n →a.s. Σ and λj(Σ) ≠ c for all j, then
Σ̂_{R,n} →a.s. Σ_R.   (5.4)

Before establishing the main convergence result for the regularized covariance matrices, we must first study the convergence rate of the eigenvalues in the general case where the covariance matrix may be singular with (possibly) multiple eigenvalues. To do so, we apply a general result given by Eaton and Tyler (1994), which generalizes some nonclassical results due to Anderson (1963, 1987) on the behavior of sample roots (of a determinantal equation). Specifically, under fairly weak conditions they show that if the sequence of random matrices of interest Xn is such that Zn = bn(Xn − B0) →L Z, with the limit B0 nonstochastic, then the sample eigenvalues have the same convergence rate, with bn[Ψ(Xn) − Ψ(B0)] →L Ψ(Z), where Ψ(Xn) denotes the q-vector of singular values of Xn. This convergence rate of the sample eigenvalues will be useful in establishing the asymptotic distribution of the regularized tests. Let d1 > d2 > ... > dk denote the distinct eigenvalues of Σ and let mi be their multiplicities, i = 1, ..., k. Given the eigenvalue multiplicities of Σ, a function H on q × q symmetric matrices can be defined by
H(Σ) = [ρ(Σ11)′, ρ(Σ22)′, ..., ρ(Σkk)′]′   (5.5)
where Σii is the mi × mi diagonal block of Σ determined by partitioning Σ into mi × mj blocks, i, j = 1, ..., k. Thus H(Σ) takes values in R^q, and ρ(Σii) consists of the mi-vector of ordered eigenvalues of the diagonal block Σii, i = 1, ..., k.

Lemma 5.2 Let Σn be a q × q real random matrix such that Σn →p Σ, where Σ is a q × q real nonstochastic matrix with rank(Σ) = r1, 1 ≤ r1 < q. Let also bn be a sequence of real constants such that
Zn = bn(Σn − Σ) →L Z   (5.6)
where bn → +∞ and Z is a random matrix. Then
En = bn[Ψ(Σn) − Ψ(Σ)] →L [ H((1/2)[Z11 + Z11′])′, Ψ(Z22)′ ]′   (5.7)
where Z11 and Z22 are well-defined random elements. For a proof, see Eaton and Tyler (1994, Theorem 4.2).

We now state the convergence result for the regularized inverses, which is fundamental to obtaining well-behaved regularized test statistics. Let λ̂n,i = λi(Σn) and λi = λi(Σ). First, when designing the VRF g(λ; c(n)), the threshold c(n) must be selected so that
Pr[|λ̂n,i − λi| > c(n)] = Pr[|bn(λ̂n,i − λi)| > bn c(n)] → 0 as n → ∞,
with c(n) → 0 and bn c(n) → ∞ as n grows to infinity. Thus, c(n) declines to 0 more slowly than 1/bn, while bn c(n) diverges more slowly than bn. Indeed, the threshold must not decline to zero either too fast or too slowly. Selecting c(n) in this way guarantees that the nonzero eigenvalues of the covariance matrix will eventually be greater than the threshold, while the true zero eigenvalues remain smaller than the threshold and are set to zero, at least in large samples. This convergence rate plays a crucial role in Proposition 5.3 below.
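A small Monte Carlo check of the rate condition, under a data-generating design entirely of our own invention: with bn = √n, any threshold with c(n) → 0 and √n·c(n) → ∞ (here c(n) = n^(−1/3)) eventually separates the zero from the nonzero population eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_cov_eigs(n):
    """Eigenvalues of a sample covariance whose population counterpart is
    singular: population eigenvalues are (2, 0), estimated at rate sqrt(n)."""
    z = rng.normal(size=n)
    X = np.column_stack([z, z])                        # collinear -> singular limit
    X += rng.normal(scale=n ** -0.5, size=X.shape)     # vanishing noise (hypothetical)
    return np.linalg.eigvalsh(np.cov(X.T))[::-1]

for n in (100, 10_000, 1_000_000):
    c_n = n ** (-1 / 3)                # c(n) -> 0 while sqrt(n) * c(n) -> infinity
    lam = sample_cov_eigs(n)
    print(n, lam.round(6), "q1_hat =", int(np.sum(lam > c_n)))
# q1_hat settles at the true rank of the limiting matrix (here 1) as n grows.
```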
Because the random objects considered here are matrices, we must choose a norm suitable for matrices. For this reason, we consider the finite-dimensional inner product space (S_q, ⟨·,·⟩), where S_q is the vector space of q × q symmetric matrices equipped with the inner product ⟨Σ1, Σ2⟩ = tr[Σ1′Σ2], where tr denotes the trace. Let ‖·‖_F denote the Frobenius norm induced by this inner product, i.e., ‖Σ‖²_F = tr[Σ′Σ]. Let A_R denote the regularized inverse of a q × q real symmetric matrix A.

Proposition 5.3 Let Σ be a q × q real positive semidefinite nonstochastic matrix. Let Σn be a q × q real symmetric random matrix, and suppose the assumptions of Lemma 5.2 hold. Let g(λ; c(n)) be a function continuous almost everywhere w.r.t. λ, the set of discontinuities of g being of measure zero. If (5.6) holds, then
(Σn)_R →p (Σ)_R.
Proofs appear in the Appendix.

In the following, we establish a characterization of the asymptotic distribution of the regularized test statistics in the general case. This characterization makes use of a decomposition of the modified statistic into a regular component and the regularized components. We first formulate the proposition under the following generic assumption, before verifying it in special cases for the C(α) and Wald test statistics.

Assumption 5.4 Let Zn and Z be q × 1 random vectors defined on a common probability space such that:
Zn →L Z ∼ N[0, Σ], with rank(Σ) = q1 < q.
Recall that q̂1 = card{j : λ̂j > c(n)} and q1 = card{j : λj > c(n)}. Let A_{q1} = {ω ∈ Ω : q̂1(ω) = q1}, so that Pr(A_{q1}) → 1 as n → ∞.

Proposition 5.5 (Asymptotic distribution of regularized statistics when the threshold goes to zero) Suppose Assumption 5.4 holds, and that the assumptions of Proposition 5.3 are satisfied. Then the regularized statistic satisfies
Zn′[Σn]_R Zn = Z1n′ Λ̂˜_{1n} Z1n + Z2n′ Λ̂˜_{2n} Z2n →L Z1′ Λ̃1 Z1 + Z2′ Λ̃2 Z2 = X1 + X2 for all ω ∈ A_{q1},
with X2 = Z2′ Λ̃2 Z2 = 0 and X1 = Z1′ Λ̃1 Z1 ∼ χ²(q1).

It is worth noting that when the threshold c(n) converges to zero at an appropriate rate, based on the convergence rate of the eigenvalues of the covariance matrix, the statistic thus regularized is asymptotically distributed as a χ²(q1) variable, with q1 the dimension of the eigenspace corresponding to the nonzero eigenvalues. In this circumstance, we recover the asymptotic result obtained by Lütkepohl and Burda (1997), with the difference that the above regularized statistic is always greater than that of Lütkepohl and Burda (1997) in finite samples. While a simulation-based approach is required to control the level under the null hypothesis, this feature may entail some gains in power under the alternative. In the following, we apply the above distributional result to specific test statistics: the regularized C(α) test, the regularized Wald test and, finally, the regularized LR-type test. To do so, we need to check that the assumptions required for Proposition 5.5 are satisfied for these specific statistics. Also, under Assumptions 2.4 and 2.7, we have:
plim_{n→∞} Ṽψ(θ̃n⁰) = plim_{n→∞} Q̃n Ĩ0 Q̃n′ = Q(θ0) I(θ0) Q(θ0)′ = Vψ(θ0).

Corollary 5.6 Under Assumptions 2.4 and 2.7 and the assumptions of Proposition 5.3, we have:
[Ṽψ(θ̃n⁰)]_R →p [Vψ(θ0)]_R.   (5.8)

The results stated in Proposition 5.5, applied to Zn = √n Q̃n Dn(θ̃n⁰), yield a χ² distribution for the regularized C(α) test.
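A Monte Carlo sketch of Proposition 5.5 under assumptions of our own: Σn is a stylized root-n-consistent stand-in for an actual covariance estimator, and the LB-type VRF (γ = ∞) is used for simplicity; the rejection frequency against the χ²(q1) critical value should be close to the nominal level.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
q, q1 = 3, 2
Sigma = np.diag([2.0, 1.0, 0.0])       # rank-deficient limiting covariance

def regularized_stat(n, c_n, reps=20_000):
    """Simulate Z_n' [Sigma_n]_R Z_n with Sigma_n a noisy estimate of Sigma."""
    out = np.empty(reps)
    for r in range(reps):
        E = rng.normal(scale=n ** -0.5, size=(q, q))
        Sigma_n = Sigma + (E + E.T) / 2            # symmetric O(n^-1/2) perturbation
        Z = rng.multivariate_normal(np.zeros(q), Sigma)
        lam, V = np.linalg.eigh(Sigma_n)
        keep = lam > c_n
        g = np.where(keep, 1.0 / np.where(keep, lam, 1.0), 0.0)  # LB-type VRF
        out[r] = Z @ (V * g) @ V.T @ Z
    return out

sims = regularized_stat(n=10_000, c_n=10_000 ** (-1 / 3))
print(np.mean(sims > stats.chi2.ppf(0.95, df=q1)))  # close to 0.05, as predicted
```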
Proposition 5.7 (Asymptotic distribution of the regularized C(α) statistic) Suppose Assumption 3.2 holds, and that the assumptions of Proposition 5.5 and of Corollary 5.6 are satisfied. Then
PC_R(θ̃n⁰; ψ) →L PC_R(θ0; ψ) = D∞(θ0)′ Q(θ0)′ [Vψ(θ0)]_R Q(θ0) D∞(θ0) ∼ χ²(q1),   (5.9)
with
i) PC_1n(c(n)) = Σ_{j=1}^{q̂1} (b̂′_{1n,j} Z̃n) g(λ̂n,j; c(n)) (b̂′_{1n,j} Z̃n) →L Σ_{j=1}^{q1} (b′_{1,j} Z) g(λj; c(n)) (b′_{1,j} Z) ∼ χ²(q1);
ii) PC_2n(c(n)) = Σ_{j=q̂1+1}^{q} (b̂′_{2n,j} Z̃n) g(λ̂n,j; c(n)) (b̂′_{2n,j} Z̃n) →p 0;
where b_{i,j}, for i = 1, 2, represents an eigenvector associated with λj.

The results established above deserve some discussion. It is important to notice that both components are asymptotically independent. Whereas the first component is asymptotically χ²-distributed, the second one collapses to zero. To the extent that the regularized C(α) statistic can be viewed as the sum of two "small" independent C(α) statistics, it can easily be simulated to provide a feasible test under singularity conditions. Similarly, taking Zn = √n ψ(θ̂), Proposition 5.5 establishes the asymptotic distribution of the regularized Wald test.

Proposition 5.8 (Asymptotic distribution of regularized Wald statistics) Suppose the assumptions of Proposition 5.5 hold, together with Assumption 2.9. Let
W̃n = n ψ(θ̂)′ Σ̂_{ψ,R} ψ(θ̂) = W̃_{1n} + W̃_{2n},   (5.10)
with
W̃_{1n} = n ψ(θ̂)′ V̂1(c(n)) Λ̂˜1(c(n)) V̂1′(c(n)) ψ(θ̂)   (5.11)
and
W̃_{2n} = n ψ(θ̂)′ V̂2(c(n)) Λ̂˜2(c(n)) V̂2′(c(n)) ψ(θ̂).   (5.12)
Then W̃_{1n} →L W1, with W1 ∼ χ²(q1), and W̃_{2n} →p 0.

For example, in the DV1 case [i.e., with Λ̃2(c) = (1/c) I_{q−q_c}], the modified statistic corresponds to:
W̃n = W̃_{1n}(c) + W̃_{2n}(c) = n ψ(θ̂)′ V̂1(c) Λ̂˜1(c) V̂1′(c) ψ(θ̂) + (n/c) ψ(θ̂)′ V̂2(c) V̂2′(c) ψ(θ̂),   (5.13)
and in the LB case the modified statistic corresponds to:
W̃n = W̃_{1n}(c) + W̃_{2n}(c) = n ψ(θ̂)′ V̂1(c) Λ̂˜1(c) V̂1′(c) ψ(θ̂) + 0 = n ψ(θ̂)′ V̂(c) Λ̂⁺(c) V̂′(c) ψ(θ̂) = Wn⁺(c),   (5.14)
where Λ⁺(c) = diag(λ1⁻¹, ..., λ_{q_c}⁻¹, 0, ..., 0) represents the Moore-Penrose inverse of Λ(c) = diag(λ1, ..., λ_{q_c}, λ_{q_c+1}, ..., λq). Therefore Wn⁺(c) = W̃_{1n}(c).

When the threshold goes to zero at the appropriate speed, the regularized test statistics have a standard chi-square limiting distribution, with the nonregular component collapsing to zero. A more interesting case, producing a nonstandard asymptotic distribution, arises when the threshold is fixed, putting a positive probability mass on the eigenvalues exactly equal to the threshold c. Let us now study this case.

Proposition 5.9 (Asymptotic distribution of regularized statistics when the threshold is fixed) Let Zn and Z be q × 1 random vectors defined on a common probability space such that:
Zn →L Z ∼ N[0, Σ], with rank(Σ) = r ≤ q.
Let the regularized statistic satisfy the following decomposition:
Zn′[Σn]_R Zn = Z1n′ Λ̂˜_{1n} Z1n + Z2n′ Λ̂˜_{2n} Z2n + Z3n′ Λ̂˜_{3n} Z3n.
Then
Z1n′ Λ̂˜_{1n} Z1n = Σ_{j=1}^{q1} z_{1nj} g(λ_{1nj}; c) z_{1nj} →L χ²(q1),
Z2n′ Λ̂˜_{2n} Z2n = Σ_{j=q1+1}^{q2} z_{2nj} g(λ_{2nj}; c) z_{2nj} →L Σ_{j=q1+1}^{q2} c g(c; c) I(λ_{2j} = c) vj² ≤ χ²(q2),
Z3n′ Λ̂˜_{3n} Z3n = Σ_{j=q2+1}^{q} z_{3nj} g(λ_{3nj}; c) z_{3nj} →L Σ_{j=q2+1}^{q} λ_{3j} g(λ_{3j}; c) I(λ_{3j} < c) vj² ≤ χ²(q3),
where the vj are i.i.d. N(0, 1). Since λ g(λ; c) ≤ 1 for λ ≤ c under the VRFs considered here (e.g., DV1 and DV2), each of the last two limits is stochastically dominated by the corresponding chi-square variable, which yields the bounds above.
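In the fixed-threshold case the limit is nonstandard, so critical values must be simulated, as discussed above. A minimal sketch of that simulation (the DV1 choice, threshold, and covariance estimate are hypothetical inputs of ours):

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_null(Sigma_hat, c, g, reps=50_000):
    """Simulate the null distribution of Z'[Sigma_hat]_R Z, Z ~ N(0, Sigma_hat),
    in the spirit of the fixed-threshold case of Proposition 5.9."""
    lam, V = np.linalg.eigh(Sigma_hat)
    g_lam = np.array([g(l, c) for l in lam])
    R = (V * g_lam) @ V.T                        # regularized inverse
    Z = rng.multivariate_normal(np.zeros(len(lam)), Sigma_hat, size=reps)
    return np.einsum('ij,jk,ik->i', Z, R, Z)     # one quadratic form per draw

# Example with the DV1 regularization (g = 1/lambda above c, 1/c below).
dv1 = lambda l, c: 1.0 / l if l > c else 1.0 / c
Sigma_hat = np.diag([2.0, 1.0, 1e-6])            # hypothetical near-singular estimate
sims = simulate_null(Sigma_hat, c=0.1, g=dv1)
crit = np.quantile(sims, 0.95)                   # simulated 5% critical value
print(crit)
```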
6. Examples

6.1. Multistep noncausality

As already observed by Lütkepohl and Burda (1997), testing noncausality restrictions may raise singularity problems for the Wald test. We reconsider the example provided by Lütkepohl and Burda (1997) within our regularization design. A VAR(1) process is considered for the 3 × 1 vector yt = [xt, yt, zt]′:
[xt, yt, zt]′ = A1 [x_{t−1}, y_{t−1}, z_{t−1}]′ + ut,
where A1 = [αxx, αxy, αxz; αyx, αyy, αyz; αzx, αzy, αzz] and ut = [ux,t, uy,t, uz,t]′. Let
Y ≡ (y1, ..., yT) (3 × T), B ≡ A1 (3 × 3), Zt ≡ yt (3 × 1), Z ≡ (Z0, ..., Z_{T−1}) (3 × T), U ≡ (u1, ..., uT) (3 × T),
where ut is a white noise with 3 × 3 nonsingular covariance matrix Σu. Let α = vec(A1) = vec(B). Testing H0 : yt ↛(∞) xt (noncausality at all horizons) requires testing h = pKz + 1 = 2 restrictions on α [see Dufour and Renault (1998)] of the form:
r(α) = [αxy, αxx αxy + αxy αyy + αxz αzy]′ = [0, 0]′.
These restrictions are fulfilled in three different parameter settings:
αxy = αxz = 0, αzy ≠ 0;  αxy = αzy = 0, αxz ≠ 0;  αxy = αxz = αzy = 0.   (6.1)
But the first-order derivative matrix of the restrictions,
∂r/∂α′ = [0, 0, 0, 1, 0, 0, 0, 0, 0; αxy, 0, 0, αxx + αyy, αxy, αxz, αzy, 0, 0],   (6.2)
is singular if (6.1) holds. Under such circumstances, the Wald test does not have the standard χ² distribution under the null. To perform the Wald test, consider the multivariate LS estimator of α = vec(A1) = vec(B). Using the column-stacking operator vec, we have:
Y = BZ + U   (6.3)
or
vec(Y) = vec(BZ) + vec(U)   (6.4)
y = (Z′ ⊗ I3) vec(B) + vec(U)   (6.5)
y = (Z′ ⊗ I3) α + u   (6.6)
where E(uu′) = IT ⊗ Σu. The multivariate LS estimator α̂ is given by:
α̂ = ((ZZ′)⁻¹ Z ⊗ I3) y.   (6.7)
The asymptotic distribution of the multivariate LS estimator,
√T(α̂ − α) →L N(0, Γ⁻¹ ⊗ Σu),   (6.8)
implies the asymptotic distribution of the restrictions:
√T(r(α̂) − r(α)) →L N(0, Σ_{r(α)})   (6.9)
where
Σ̂_{r(α)} = (∂r/∂α′)(α̂) Σ̂α (∂r/∂α′)(α̂)′   (6.10)
is a consistent estimator of Σ_{r(α)},   (6.11)
Σ̂α = Γ̂⁻¹ ⊗ Σ̂u is a consistent estimator of Σα, with
Γ̂ = ZZ′/T   (6.12)
and
Σ̂u = (1/T) Σ_{t=1}^{T} ût ût′ = (1/T) Y [IT − Z′(ZZ′)⁻¹Z] Y′.   (6.13)
From the asymptotic distribution (6.9), a Wald-type test of the null H0 : r(α) = 0 is easily obtained:
Wψ = T r(α̂)′ [Σ̂_{r(α)}(α̂)]_R r(α̂)   (6.14)
where a regularization is required under the parameter settings (6.1).
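An end-to-end sketch of this example, reusing the regularized_inverse helper from the Section 3 code block; the VAR coefficients, sample size and threshold are hypothetical choices of ours, simulated under the third null setting of (6.1).

```python
import numpy as np

rng = np.random.default_rng(4)
T = 500

# Simulate the VAR(1) under alpha_xy = alpha_xz = alpha_zy = 0 (hypothetical A1).
A1 = np.array([[0.5, 0.0, 0.0],
               [0.2, 0.4, 0.1],
               [0.3, 0.0, 0.5]])
Y = np.zeros((3, T + 1))
for t in range(1, T + 1):
    Y[:, t] = A1 @ Y[:, t - 1] + rng.normal(size=3)
Yt, Z = Y[:, 1:], Y[:, :-1]

# Multivariate LS (6.7) and covariance pieces (6.11)-(6.13).
B_hat = Yt @ Z.T @ np.linalg.inv(Z @ Z.T)
alpha = B_hat.flatten(order='F')                 # vec stacks columns
U_hat = Yt - B_hat @ Z
Sigma_u = U_hat @ U_hat.T / T
Sigma_alpha = np.kron(np.linalg.inv(Z @ Z.T / T), Sigma_u)

# Restrictions r(alpha) and their Jacobian (6.2), in vec (column-stacked) order.
axx, ayx, azx, axy, ayy, azy, axz, ayz, azz = alpha
r = np.array([axy, axx * axy + axy * ayy + axz * azy])
R = np.array([[0, 0, 0, 1, 0, 0, 0, 0, 0],
              [axy, 0, 0, axx + ayy, axy, axz, azy, 0, 0]])
Sigma_r = R @ Sigma_alpha @ R.T                  # nearly singular under (6.1)

W_reg = T * r @ regularized_inverse(Sigma_r, c=T ** (-1 / 3)) @ r
print(np.linalg.eigvalsh(Sigma_r), W_reg)
```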
7. Conclusion

In this paper, we introduce two classes of tests allowing us to deal with asymptotically singular covariance matrices whose finite-sample analogues are nonsingular. Exploiting the continuity property of the total eigenprojection established in Tyler (1981) enables us to build continuous regularized inverses that are useful for deriving the asymptotic distributions of the regularized test statistics. Standard distributional results can then be used to provide asymptotically valid tests in a convenient way. However, the usual critical values may be conservative; in this respect, a simulation-based approach can improve the performance of the procedure under the alternative. Finally, the choice of the tuning parameter requires a deeper examination, together with the study of monotonic variance-regularization functions.

A. Appendix: Proofs

PROOF of Lemma 4.3. If $\hat\Sigma_n \xrightarrow{a.s.} \Sigma$, then the event $A = \{\omega : \hat\Sigma_n(\omega) \underset{n\to\infty}{\to} \Sigma\}$ has probability one: P(A) = 1. For any ω ∈ A, Lemma 4.2 yields
$$ [\hat\Sigma_n(\omega) \to \Sigma] \ \Rightarrow\ [\lambda_j(\hat\Sigma_n(\omega)) \to \lambda_j,\ j = 1,\ldots,J]. $$
Denoting $B = \{\omega : \lambda_j(\hat\Sigma_n(\omega)) \to \lambda_j(\Sigma)\}$, we have A ⊆ B; hence result i) of the lemma holds with probability one. The same type of argument yields result ii) for the eigenprojections.

PROOF of Lemma 4.4. If $\hat\Sigma_n \xrightarrow{p} \Sigma$ with eigenvalues $\{\lambda_j(\hat\Sigma_n)\}$, then every subsequence $\{\hat\Sigma_{n_k}\}$, with eigenvalues $\{\lambda_j(\hat\Sigma_{n_k})\}$, also satisfies $\hat\Sigma_{n_k} \xrightarrow{p} \Sigma$. By Lukacs (1975, Theorem 2.4.3, page 48), there exists a further subsequence $\{\hat\Sigma_{m_l}\} \subseteq \{\hat\Sigma_{n_k}\}$ such that $\hat\Sigma_{m_l} \xrightarrow{a.s.} \Sigma$. Hence, by Lemma 4.3, we have:

i) $\lambda_j(\hat\Sigma_{m_l}) \xrightarrow{a.s.} \lambda_j(\Sigma)$;

ii) $P_{j,t}(\hat\Sigma_{m_l}) \xrightarrow{a.s.} P_{j,t}(\Sigma)$, provided $\lambda_{j-1}(\Sigma) \neq \lambda_j(\Sigma)$ and $\lambda_t(\Sigma) \neq \lambda_{t+1}(\Sigma)$.

Since $\{\hat\Sigma_{m_l}\} \subseteq \{\hat\Sigma_{n_k}\} \subseteq \{\hat\Sigma_n\}$, with corresponding eigenvalues $\{\lambda_j(\hat\Sigma_{m_l})\} \subseteq \{\lambda_j(\hat\Sigma_{n_k})\} \subseteq \{\lambda_j(\hat\Sigma_n)\}$, by Lukacs (1975, Theorem 2.4.4, page 49) it suffices that every subsequence $\{\lambda_j(\hat\Sigma_{n_k})\}$ of $\{\lambda_j(\hat\Sigma_n)\}$ contain a further subsequence $\{\lambda_j(\hat\Sigma_{m_l})\}$ converging almost surely to obtain convergence in probability: $\lambda_j(\hat\Sigma_n) \xrightarrow{p} \lambda_j(\Sigma)$. The same type of argument yields $P_{j,t}(\hat\Sigma_n) \xrightarrow{p} P_{j,t}(\Sigma)$.

PROOF of Proposition 5.1. We first consider the case where all eigenvalues are distinct, before turning to multiple eigenvalues. Abbreviate $\hat\lambda_j = \lambda_j(\hat\Sigma_n(\omega))$, $P(\hat\lambda_j) = P_{j,j}[\hat\Sigma_n(\omega)]$, $\lambda_j = \lambda_j(\Sigma)$ and $P(\lambda_j) = P_{j,j}(\Sigma)$. Recalling equation (4.7), we have:
$$ \tilde{\hat\Sigma}_n(\omega) = \sum_{\hat\lambda_j \in S(\hat\Sigma_n(\omega))} g(\hat\lambda_j;c)\,P(\hat\lambda_j) \quad \text{and} \quad \tilde\Sigma = \sum_{\lambda_j \in S(\Sigma)} g(\lambda_j;c)\,P(\lambda_j). \qquad (A.1) $$
If $\hat\Sigma_n \xrightarrow{a.s.} \Sigma$, then by Lemma 4.3 i) we have $\hat\lambda_j \xrightarrow{a.s.} \lambda_j$ for all j; hence, by continuity, $g(\hat\lambda_j;c) \xrightarrow{a.s.} g(\lambda_j;c)$ for all j. By Lemma 4.3 ii) and the continuity of the product, $g(\hat\lambda_j;c)P(\hat\lambda_j) \xrightarrow{a.s.} g(\lambda_j;c)P(\lambda_j)$ for all j. Componentwise consistency then implies consistency of the whole sum. Hence $\tilde{\hat\Sigma}_n \xrightarrow{a.s.} \tilde\Sigma$ for m(λ) = 1.

Now, let us establish the consistency property in the case of multiple eigenvalues. Suppose that $\lambda_j$ has multiplicity $m(\lambda_j)$, i.e.
$$ \lambda_{j-1} > \lambda_j = \lambda_{j+1} = \cdots = \lambda_{j+m(\lambda_j)} > \lambda_{j+m(\lambda_j)+1}, \qquad \hat\lambda_{j-1} > \hat\lambda_j \ge \hat\lambda_{j+1} \ge \cdots \ge \hat\lambda_{j+m(\hat\lambda_j)} > \hat\lambda_{j+m(\hat\lambda_j)+1}. $$
Focusing on the corresponding part of the spectral decomposition of $\hat\Sigma_n$:
$$ \sum_{k=j}^{j+m(\hat\lambda_j)} g(\hat\lambda_k;c)P(\hat\lambda_k) = g(\hat\lambda_j;c)\sum_{k=j}^{j+m(\hat\lambda_j)} P(\hat\lambda_k) + \sum_{k=j+1}^{j+m(\hat\lambda_j)} [g(\hat\lambda_k;c) - g(\hat\lambda_j;c)]\,P(\hat\lambda_k) \xrightarrow{a.s.} g(\lambda_j;c)\sum_{k=j}^{j+m(\lambda_j)} P(\lambda_k) + \sum_{k=j+1}^{j+m(\lambda_j)} [g(\lambda_k;c) - g(\lambda_j;c)]\,P(\lambda_k) \qquad (A.2) $$
by Lemma 4.3 and the continuity of $g(\cdot;c)$. Moreover, for $\lambda_j = \lambda_{j+1} = \cdots = \lambda_{j+m(\lambda_j)}$ and since $g(\cdot;c)$ is continuous, we have $g(\lambda_j;c) = g(\lambda_{j+1};c) = \cdots = g(\lambda_{j+m(\lambda_j)};c)$; hence $g(\lambda_k;c) - g(\lambda_j;c) = 0$ for all $k = j+1,\ldots,j+m(\lambda_j)$. As for the projection operator, P is a continuous linear operator; following Kolmogorov and Fomin (1970, Theorem 1, page 223), a necessary condition for a linear operator P to be continuous on a topological linear space is that P be bounded. Therefore, equation (A.2) becomes
$$ \sum_{k=j}^{j+m(\hat\lambda_j)} g(\hat\lambda_k;c)P(\hat\lambda_k) \xrightarrow{a.s.} g(\lambda_j;c)\sum_{k=j}^{j+m(\lambda_j)} P(\lambda_k) + \sum_{k=j+1}^{j+m(\lambda_j)} \underbrace{[g(\lambda_k;c) - g(\lambda_j;c)]}_{0}\,\underbrace{P(\lambda_k)}_{O(1)} = g(\lambda_j;c)\sum_{k=j}^{j+m(\lambda_j)} P(\lambda_k). \qquad (A.3) $$
Since the result holds for any $\lambda_j$ with arbitrary multiplicity $m(\lambda_j) > 1$, we obtain
$$ \tilde{\hat\Sigma}_{\psi(\theta)} = \sum_{\hat\lambda \in \hat S} g(\hat\lambda;c)P(\hat\lambda) \xrightarrow{a.s.} \sum_{\lambda \in S} g(\lambda;c)P(\lambda) = \tilde\Sigma_{\psi(\theta)}, \qquad \forall\, m(\lambda) \ge 1. $$
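Before turning to the proof of Proposition 5.5, here is a quick numerical illustration of this consistency result (ours, not part of the formal argument; the DV1-type choice of g and all names are assumptions): as $\hat\Sigma_n \to \Sigma$ with $\Sigma$ singular, the regularized inverse converges while the ordinary inverse diverges.

```python
import numpy as np

def reg_inverse(Sigma, c):
    """Sigma~ = sum_j g(lambda_j; c) P(lambda_j), with g(lam; c) = 1/lam
    for lam > c and 1/c otherwise (a DV1-type choice; assumption ours)."""
    lam, V = np.linalg.eigh(Sigma)
    g = np.where(lam > c, 1.0 / np.maximum(lam, c), 1.0 / c)
    return V @ np.diag(g) @ V.T

Sigma = np.diag([2.0, 1.0, 0.0])                     # singular limit matrix
target = reg_inverse(Sigma, c=0.5)

rng = np.random.default_rng(1)
E = rng.standard_normal((3, 3)); E = (E + E.T) / 2   # symmetric perturbation
for n in (10, 100, 1000, 10000):
    Sigma_n = Sigma + E / n                          # Sigma_n -> Sigma
    err = np.linalg.norm(reg_inverse(Sigma_n, 0.5) - target)
    blowup = np.linalg.norm(np.linalg.inv(Sigma_n))  # ordinary inverse diverges
    print(f"n={n:6d}  ||reg - limit|| = {err:.2e}  ||plain inverse|| = {blowup:.2e}")
```

As the perturbation vanishes, the regularized inverse stabilizes while the ordinary inverse blows up, which is precisely the failure the regularization is designed to avoid.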
PROOF of Proposition 5.5. By Assumption 5.4, we have $Z_n \xrightarrow{L} Z \sim N[0,\Sigma]$. By Proposition 5.3, we have $[\Sigma_n]^R \xrightarrow{p} [\Sigma]^R$. Hence
$$ Z_n'[\Sigma_n]^R Z_n \xrightarrow{L} Z'[\Sigma]^R Z \sim \chi^2(\nu), \qquad (A.4) $$
with ν = rank(Σ). Moreover,
$$ Z_n'[\Sigma_n]^R Z_n = Z_n'\hat V_n \hat{\tilde\Lambda}_n \hat V_n' Z_n = \begin{bmatrix} Z_n'\hat V_{1n} & Z_n'\hat V_{2n} \end{bmatrix} \begin{bmatrix} \hat{\tilde\Lambda}_{1n} & 0 \\ 0 & \hat{\tilde\Lambda}_{2n} \end{bmatrix} \begin{bmatrix} \hat V_{1n}'Z_n \\ \hat V_{2n}'Z_n \end{bmatrix} = Z_n'\hat V_{1n}\hat{\tilde\Lambda}_{1n}\hat V_{1n}'Z_n + Z_n'\hat V_{2n}\hat{\tilde\Lambda}_{2n}\hat V_{2n}'Z_n. \qquad (A.5) $$
Let $Z_{in} = \hat V_{in}'Z_n$ and $Z_i = V_i'Z$ for i = 1, 2. We decompose the limiting variable
$$ X = Z'[\Sigma]^R Z = Z'V_1\tilde\Lambda_1 V_1'Z + Z'V_2\tilde\Lambda_2 V_2'Z = Z_1'\tilde\Lambda_1 Z_1 + Z_2'\tilde\Lambda_2 Z_2 = X_1 + X_2 $$
in the same way as
$$ X_n = Z_n'[\Sigma_n]^R Z_n = Z_{1n}'\hat{\tilde\Lambda}_{1n}Z_{1n} + Z_{2n}'\hat{\tilde\Lambda}_{2n}Z_{2n} = X_{1n} + X_{2n}. $$
Taking f(x, y) = x + y, by the Slutsky theorem applied to the vector $(X_{1n}, X_{2n})$ [the Slutsky theorem states that if $X_n \xrightarrow{L} X$ and $\Pr[X \in C(f)] = 1$, where C(f) is the continuity set of f, then $f(X_n) \xrightarrow{L} f(X)$], we have
$$ X_{1n} + X_{2n} = Z_{1n}'\hat{\tilde\Lambda}_{1n}Z_{1n} + Z_{2n}'\hat{\tilde\Lambda}_{2n}Z_{2n} \xrightarrow{L} Z_1'\tilde\Lambda_1 Z_1 + Z_2'\tilde\Lambda_2 Z_2 = X_1 + X_2, \qquad \forall\, \omega \in A_{q_1}, $$
with $X_1$ and $X_2$ independent. Since $\mathrm{Var}(Z_1) = \Lambda_1$ and $\mathrm{Var}(Z_2) = \Lambda_2 = \mathrm{diag}(\lambda_j)$ with $\lambda_j = 0$ for all $j = q_1+1,\ldots,q$, we have $\tilde\Lambda_1\Lambda_1 = I_{q_1}$ for the large eigenvalues and $\tilde\Lambda_2\Lambda_2 = 0_{q_2}$ for the $q_2$ zero eigenvalues. This leads to $X_2 = 0$. Therefore $X_{1n} + X_{2n} \xrightarrow{L} X_1 \sim \chi^2(q_1)$, $\forall\, \omega \in A_{q_1}$.

PROOF of Proposition 5.7. Let $\hat b_{in,j}$, i = 1, 2, be the eigenvectors associated with $\hat\lambda_{n,j}$ that constitute the columns of $\hat V_i(c(n))$, respectively. Then
$$ PC_{1n}(c(n)) = \tilde Z_n'\hat V_1(c(n))\hat{\tilde\Lambda}_1(c(n))\hat V_1'(c(n))\tilde Z_n = \sum_{j=1}^{\hat q_1} (\hat b_{1n,j}'\tilde Z_n)\,g(\hat\lambda_{n,j};c(n))\,(\hat b_{1n,j}'\tilde Z_n) = \sum_{j=1}^{\hat q_1} \frac{(\hat b_{1n,j}'\tilde Z_n)^2}{\hat\lambda_{n,j}} \xrightarrow{L} \sum_{j=1}^{q_1} \Big(\frac{\lambda_j}{\lambda_j}\Big)v_j^2 = \sum_{j=1}^{q_1} v_j^2 \sim \chi^2(q_1), \qquad \forall\, \omega \in A_{q_1}, \qquad (A.6) $$
for $v_j$ i.i.d. N(0,1). Similarly,
$$ PC_{2n}(c(n)) = \tilde Z_n'\hat V_2(c(n))\hat{\tilde\Lambda}_2(c(n))\hat V_2'(c(n))\tilde Z_n = \sum_{j=\hat q_1+1}^{q} (\hat b_{2n,j}'\tilde Z_n)\,g(\hat\lambda_{n,j};c(n))\,(\hat b_{2n,j}'\tilde Z_n) \xrightarrow{L} \sum_{j=\hat q_1+1}^{q} g(\hat\lambda_{n,j};c(n))\,\hat\lambda_{n,j}\,v_j^2 \xrightarrow{p} 0, \qquad \forall\, \omega \in A_{q_1}, $$
because $\hat\lambda_{n,j} \xrightarrow{p} \lambda_j = 0$ for $j = \hat q_1+1,\ldots,q$ and $\hat\lambda_{n,j} = O_p(1/b_n)$, with $g(\cdot;c(n))$ continuous a.e. (except on a set of measure zero), the $v_j$ being i.i.d. N(0,1).
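To illustrate these limiting results numerically, the following Monte Carlo sketch (ours; the matrix Σ, the threshold, and the sample size are illustrative assumptions) simulates the LB-regularized statistic under a rank-2 limit covariance and compares its empirical 95% quantile with the χ²(2) critical value:

```python
import numpy as np

rng = np.random.default_rng(2)
Sigma = np.diag([3.0, 1.0, 0.0, 0.0])        # singular limit, rank(Sigma) = q1 = 2
n, c, reps = 400, 0.5, 20000
Sigma_n = Sigma + np.eye(4) / n              # nonsingular finite-sample analogue
Ln = np.linalg.cholesky(Sigma_n)             # to draw Z_n ~ N(0, Sigma_n)

lam, V = np.linalg.eigh(Sigma_n)             # spectral decomposition
g = np.zeros_like(lam)
g[lam > c] = 1.0 / lam[lam > c]              # LB truncation: g(lam; c) = 0 below c
Sigma_R = V @ np.diag(g) @ V.T               # regularized inverse [Sigma_n]^R

stats = np.empty(reps)
for i in range(reps):
    Zn = Ln @ rng.standard_normal(4)
    stats[i] = Zn @ Sigma_R @ Zn
print(np.quantile(stats, 0.95))              # close to the chi2(2) value, 5.99
```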
References

Amemiya, Y. (1986), Limiting distributions of certain characteristic roots under general conditions and their applications, Technical Report 17, Econometric Workshop, Stanford University.

Anderson, T. W. (1963), 'Asymptotic theory for principal component analysis', Annals of Mathematical Statistics 34, 122–148.

Anderson, T. W. (1987), The asymptotic distribution of characteristic roots and vectors in multivariate components of variance, Technical Report 23, Econometric Workshop, Stanford University.

Andrews, D. W. K. (1987), 'Asymptotic results for generalized Wald tests', Econometric Theory 3, 348–358.

Carrasco, M. (2007), A regularizing approach to the many instrument problem, Technical report, Université de Montréal.

Carrasco, M., Chernov, M., Florens, J.-P. and Ghysels, E. (2007), 'Efficient estimation of general dynamic models with a continuum of moment conditions', Journal of Econometrics 140, 529–573.

Carrasco, M. and Florens, J.-P. (2000), 'Generalization of GMM to a continuum of moment conditions', Econometric Theory 16, 797–834.

Carrasco, M., Florens, J.-P. and Renault, E. (2007), Linear inverse problems in structural econometrics: estimation based on spectral decomposition and regularization, in J. Heckman and E. Leamer, eds, 'Handbook of Econometrics', Vol. 6, North-Holland, Amsterdam.

Dufour, J.-M. and Renault, E. (1998), 'Short run and long run causality in time series: theory', Econometrica 66, 1099–1125.

Dufour, J.-M. and Trognon, A. (2001), Invariant tests based on M-estimators, estimating functions, and the generalized method of moments, Technical report, CIREQ, Université de Montréal.

Eaton, M. L. and Tyler, D. (1994), 'The asymptotic distribution of singular values with applications to canonical correlations and correspondence analysis', Journal of Multivariate Analysis 50, 238–264.

Engl, H. W., Hanke, M. and Neubauer, A. (2000), Regularization of Inverse Problems, Kluwer Academic Publishers, Dordrecht.

Kato, T. (1966), Perturbation Theory for Linear Operators, Springer, Berlin.

Kolmogorov, A. N. and Fomin, S. V. (1970), Introductory Real Analysis, Dover Publications, New York.

Kress, R. (1999), Linear Integral Equations, Applied Mathematical Sciences, second edn, Springer, New York.

Lukacs, E. (1975), Stochastic Convergence, second edn, Academic Press, New York.

Lütkepohl, H. and Burda, M. M. (1997), 'Modified Wald tests under nonregular conditions', Journal of Econometrics 78, 315–332.

Moore, D. S. (1977), 'Generalized inverses, Wald's method, and the construction of chi-squared tests of fit', Journal of the American Statistical Association 72(357), 131–137.

Newey, W. K. and West, K. D. (1987), 'Hypothesis testing with efficient method of moments estimators', International Economic Review 28, 777–787.

Rellich, F. (1953), Perturbation Theory of Eigenvalue Problems, Lecture Notes, New York University.

Tyler, D. E. (1981), 'Asymptotic inference for eigenvectors', The Annals of Statistics 9(4), 725–736.