Treatment E¤ects with Many Covariates and Heteroskedasticity Matias D. Cattaneo Michael Jansson

advertisement
Treatment E¤ects with Many Covariates and Heteroskedasticity
Matias D. Cattaneoy
Michael Janssonz
Whitney K. Neweyx
February 29, 2016
Abstract
The linear regression model is widely used in empirical work in Economics. Researchers
often include many covariates in their linear model speci…cation in an attempt to control for
confounders. We give inference methods that allow for many covariates and heteroskedasticity.
Our results are obtained using high-dimensional approximations, where the number of included
covariates are allowed to grow as fast as the sample size. We …nd that all of the usual versions
of Eicker-White heteroskedasticity consistent standard error estimators for linear models are
inconsistent under this asymptotics. We then propose a new heteroskedasticity consistent standard error formula that is fully automatic and robust to both (conditional) heteroskedasticity
of unknown form and the inclusion of possibly many covariates. We apply our …ndings to three
settings: (i) parametric linear models with many covariates, (ii) semiparametric semi-linear
models with many technical regressors, and (iii) linear panel models with many …xed e¤ects.
Keywords: high-dimensional models, linear regression, many regressors, heteroskedasticity,
standard errors.
JEL: C12, C14, C21.
We thank Ulrich Müller and Andres Santos for very thoughtful discussions regarding this project. We also thank
Silvia Gonçalvez, Pat Kline, James MacKinnon and seminar participants at Duke University, ITAM, Northwestern
University, Pennsylvania State University, Princeton University, UCSD, University of Chicago, University of Miami,
UNC-Chapel Hill, University of Michigan, University of Montreal, Vanderbilt university, and Yale University for their
comments. The …rst author gratefully acknowledges …nancial support from the National Science Foundation (SES
1459931). The second author gratefully acknowledges …nancial support from the National Science Foundation (SES
1459967) and the research support of CREATES (funded by the Danish National Research Foundation under grant
no. DNRF78).
y
Department of Economics and Department of Statistics, University of Michigan.
z
Department of Economics, UC Berkeley and CREATES.
x
Department of Economics, MIT.
1
Introduction
A key goal in empirical work is to estimate the structural, causal, or treatment e¤ect of some
variable on an outcome of interest, such as the impact of a labor market policy on outcomes like
earnings or employment. Since many variables measuring policies or interventions are not exogenous, researchers often employ observational methods to estimate their e¤ects. One important
method is based on assuming that the variable of interest can be taken as exogenous after controlling for a su¢ ciently large set of other factors or covariates. A major problem that empirical
researchers face when employing selection-on-observables methods to estimate structural e¤ects is
the availability of many potential covariates. This problem has become even more pronounced in
recent years because of the widespread availability of large (or high-dimensional) new data sets.
Not only it is often the case that economic theory (or intuition) will suggest a large set of
variables that might be important, but also researchers prefer to also include additional “technical”
controls constructed using indicator variables, interactions, and other non-linear transformations
of those variables. Therefore, many empirical studies include very many covariates in order to
control for as broad array of confounders as possible. For example, it is common practice in
microeconometrics to include dummy variables for many potentially overlapping groups based on
age, cohort, geographic location, etc. Even when some controls are dropped after valid covariate
selection, as was recently developed by Belloni, Chernozhukov, and Hansen (2014b), many controls
usually may remain in the …nal model speci…cation. See also Angrist and Hahn (2004) for discussion
on when to include high-dimensional covariates in treatment e¤ect models.
We present valid inference methods that explicitly account for the presence of possibly many
controls in linear regression models with unrestricted (conditional) heteroskedasticity. Speci…cally,
we consider the setting where the object of interest is
yi;n =
0
xi;n +
0
n wi;n
+ ui;n ;
in a model of the form
i = 1; : : : ; n;
(1)
where yi;n is a scalar outcome variable, xi;n is a regressor of small (i.e., …xed) dimension d, wi;n
is a vector of covariates of possibly large (i.e., growing) dimension Kn , and ui;n is an unobserved
error term. Two important cases discussed in more detail below, are “‡exible”parametric modeling
1
of controls via basis expansions such as higher-order powers and interactions (i.e., a series-based
formulation of the partially linear regression model), and models with many dummy variables such
as …xed e¤ects and interactions thereof in panel data. In both cases conducting OLS-based inference
on
in (1) is straightforward when the error ui;n is homoskedastic and/or the dimension Kn of
the nuisance covariates is modeled as a vanishing fraction of the sample size. The latter modeling
assumption, however, seems inappropriate in applications with many dummy variables and does
not deliver the best approximation when many covariates are included.
Motivated by the above observations, this paper studies the consequences of allowing the error
ui;n in (1) to be (conditionally) heteroskedastic in a setting where the covariate wi;n is permitted
to be high-dimensional in the sense that Kn is allowed, but not required, to be a non-vanishing
fraction of the sample size. Our main purpose is to investigate the possibility of constructing
heteroskedasticity-consistent variance estimators for the OLS estimator of
in (1) without (nec-
essarily) assuming any special structure on the part of the covariate wi;n . We present two main
results. First, we provide high-level su¢ cient conditions guaranteeing a valid Gaussian distributional approximation to the …nite sample distribution of the OLS estimator of
, allowing for
the dimension of the nuisance covariates to be “large” relative to the sample size (Kn =n 6! 0).
Second, we characterize the large sample properties of a large class of variance estimators, and
use this characterization to obtain both negative and positive results. The negative …nding is that
the Eicker-White estimator is inconsistent in general, as are popular variants of this estimator.
The positive result gives conditions under which an alternative heteroskedasticity-robust variance
estimator (described in more detail below) is consistent. The main condition needed for our constructive results is a high-level assumption on the nuisance covariates requiring in particular that
their number be strictly less than half of the sample size. As a by-product, we also …nd that among
the popular HCk class of standard errors estimators for linear models, a variant of the HC3 estimator delivers standard errors that are asymptotically upward biased in general. Thus, standard OLS
inference employing HC3 standard errors will be asymptotically valid, albeit conservative, even in
high-dimensional settings where the number of covariate wi;n is large (i.e., when Kn =n 6! 0).
Our results contribute to the already sizeable literature on heteroskedasticity-robust variance
estimators for linear regression models, a recent review of which is given by MacKinnon (2012).
Important papers whose results are related to ours include White (1980), MacKinnon and White
2
(1985), Wu (1986), Chesher and Jewitt (1987), Shao and Wu (1987), Chesher (1989), Cribari-Neto,
Ferrari, and Cordeiro (2000), Bera, Suprayitno, and Premaratne (2002), Stock and Watson (2008),
Cribari-Neto and da Gloria A. Lima (2011), Müller (2013), and Abadie, Imbens, and Zheng (2014).
In particular, Bera, Suprayitno, and Premaratne (2002) analyze some …nite sample properties of
a variance estimator similar to the one whose asymptotic properties are studied herein. They use
unbiasedness or minimum norm quadratic unbiasedness to motivate a variance estimator that is
similar in structure to ours, but their results are obtained for …xed Kn and n and is silent about
the extent to which consistent variance estimation is even possible when Kn =n 6! 0:
This paper also adds to the literature on high-dimensional linear regression where the number
of regressors grow with the sample size; see, e.g., Huber (1973), Koenker (1988), Mammen (1993),
El Karoui, Bean, Bickel, Lim, and Yu (2013) and references therein. In particular, Huber (1973)
showed that …tted regression values are not asymptotically normal when the number of regressors
grows as fast as sample size, while Mammen (1993) obtained asymptotic normality for arbitrary
contrasts of OLS estimators in linear regression models where the dimension of the covariates is at
most a vanishing fraction of the sample size. More recently, El Karoui, Bean, Bickel, Lim, and Yu
(2013) showed that, if a Gaussian distributional assumption on regressors and homoskedasticity is
assumed, then certain estimated coe¢ cients and contrasts in linear models are asymptotically normal when the number of regressors grow as fast as sample size, but do not discuss inference results
(even under homoskedasticity). Our result in Theorem 1 below shows that certain contrasts of OLS
estimators in high-dimensional linear models are asymptotically normal under fairly general regularity conditions. Intuitively, we circumvent the problems associated with the lack of asymptotic
Gaussianity in general high-dimensional linear models by focusing exclusively on a small subset
of regressors when the number of covariates gets large. We give inference results by constructing
heteroskedasticity consistent standard errors without imposing any distributional assumption or
other very speci…c restrictions on the regressors.
As discussed in more detailed below, our high-level conditions allow for Kn / n and restrict
the data generating process in fairly general and intuitive ways. In particular, our generic su¢ cient condition on the nuisance covariates wi;n covers several special cases of interest for empirical
work. For example, our results encompass (and weakens in some sense; see Remark 2 below) those
reported in Stock and Watson (2008), who investigated the one-way …xed e¤ects panel data re3
gression model in detail and showed that the conventional Eicker-White heteroskedasticity-robust
variance estimator is inconsistent in that model, being plagued by a non-negligible bias problem
attributable to the presence of many covariates (i.e., the …xed e¤ects). The very special structure
of the covariates in the one-way …xed e¤ects model estimator enabled Stock and Watson (2008)
to give an explicit characterization of this bias and to demonstrate consistency of a bias-corrected
version of the Eicker-White variance estimator. The generic variance estimator proposed herein
essentially reduces to their bias corrected variance estimator in the special case of the one-way …xed
e¤ects model, even though our results are derived from a di¤erent perspective.
The rest of this paper is organized as follows. Section 2 presents the variance estimators we
study and gives a heuristic description of their main properties. Section 3 introduces the three
leading examples covered by our results. Section 4 introduces a general framework that uni…es
the examples, gives the main results of the paper, and discusses their implications for the three
examples we consider. Section 5 reports the results of a Monte Carlo experiment, while Section 6
concludes. Proofs of all results are reported in the appendix.
2
Variance Estimators
For the purposes of discussing variance estimators associated with the OLS estimator ^ n of
in
(1) it is convenient to write the estimator in “partialled out” form as
n
X
0
^ =(
v^i;n v^i;n
)
n
i=1
where Mij;n = 1(i = j)
0 (
wi;n
1
n
X
(
v^i;n yi;n );
i=1
v^i;n =
n
X
Mij;n xj;n ;
j=1
Pn
0
1
k=1 wk;n wk;n ) wj;n ;
1( ) denotes the indicator function, and the
P
0 =n; the objective is to …nd an
relevant inverses are assumed to exist. De…ning ^ n = ni=1 v^i;n v^i;n
P
p
estimator ^ n of the variance of ni=1 v^i;n ui;n = n such that
p
^ n 1=2 n( ^ n
) !d N (0; Id );
in which case asymptotically valid inference on
^ n = ^ n 1 ^ n ^ n 1;
(2)
can be conducted in the usual way by employing
a
the distributional approximation ^ n N ( ; ^ n =n):
P
0
De…ning u
^i;n = nj=1 Mij;n (yj;n ^ n xj;n ); standard choices of ^ n in the …xed-Kn case include
4
the homoskedasticity-only estimator
^ HO = ^ 2 ^ n ;
n
n
^ 2n =
n
n
X
1
u
^2i;n ;
d Kn
i=1
and the Eicker-White-type estimator
n
X
0
^ EW = 1
v^i;n v^i;n
u
^2i;n :
n
n
i=1
Perhaps not too surprisingly, we …nd that consistency of ^ HO
n under homoskedasticity holds quite
generally even for models with many covariates. In contrast, construction of a heteroskedasticityrobust estimator of
n
is more challenging, as it turns out that consistency of ^ EW
n generally requires
Kn to be a vanishing fraction of n.
0 ) are i.i.d. over i: It turns out that, under certain regularity
To …x ideas, suppose (yi;n ; x0i;n ; wi;n
conditions,
n
n
1 XX 2
0
^ EW
Mij;n v^i;n v^i;n
E[u2j;n jxj;n ; wj;n ] + op (1);
=
n
n
i=1 j=1
whereas a requirement for (2) to hold is that the estimator ^ n satis…es
n
X
0
^n = 1
v^i;n v^i;n
E[u2i;n jxi;n ; wi;n ] + op (1):
n
(3)
i=1
The di¤erence between the leading terms in the expansions is non-negligible in general unless
Kn =n ! 0: In recognition of this problem with ^ EW
n ; we study the more general class of estimators
of the form
n
^ n(
n
1 XX
n) =
n
^i v^i0 u
^2j ;
ij;n v
i=1 j=1
where
ij;n
denotes element (i; j) of a symmetric matrix
n
=
can be written in this fashion include ^ EW
n (which corresponds to
n (w1;n ; : : : ; wn;n ):
n
Estimators that
= In ) as well as variants of the
so-called HCk estimators, k 2 f1; 2; 3; 4g, discussed by MacKinnon (2012), among others. To be
speci…c, a natural variant of HCk is obtained by choosing
where (
(
i;n ; i;n )
i;n ; i;n )
n
to be diagonal with
= (1; 0) for HC0 (and corresponding to ^ EW
n ), (
= (1; 1) for HC2 , (
i;n ; i;n )
= (1; 2) for HC3 , and (
5
i;n ; i;n )
ii;n
=
i;n
i;n Mii;n
,
= (n=(n Kn ); 0) for HC1 ,
i;n ; i;n )
= (1; min(4; nMii;n =Kn ))
for HC4 . See Sections 4.4 for more details.
We show that all of the HCk -type estimators, which correspond to a diagonal choice of
n,
have the shortcoming that they do not satisfy (3) when Kn =n 9 0. On the other hand, it turns
out that a certain non-diagonal choice of
n
makes it possible to satisfy (3) even if Kn is a non-
vanishing fraction of n. To be speci…c, it turns out that (under regularity conditions and) under
mild conditions under the weights
n
^ n(
ij;n ;
n
^ n(
satis…es
n
1 XXX
n) =
n
i=1 j=1 k=1
2
0
^i;n v^i;n
E[u2j;n jxj;n ; wj;n ]
ik;n Mkj;n v
suggesting that (3) holds with ^ n = ^ n (
n
X
n)
n)
2
ik;n Mkj;n
provided
n
= 1(i = j);
+ op (1);
is chosen in such a way that
1
i; j
n:
k=1
Accordingly, we de…ne
n
^ HC
^
n = n(
HC
n )
n
1 XX
=
n
HC
0
^i;n v^i;n
u
^2j;n ;
ij;n v
i=1 j=1
where, with Mn denoting the matrix with element (i; j) given by Mij;n and
denoting the
Hadamard product,
HC
n
0
B
B
=B
B
@
HC
11;n
..
.
HC
1n;n
..
.
HC
n1;n
..
.
HC
nn;n
1
0
C B
C B
C=B
C B
A @
2
M11;n
..
.
..
.
2
Mn1;n
The estimator ^ HC
n is well de…ned whenever Mn
2
M1n;n
..
.
2
Mnn;n
1
1
C
C
C
C
A
= (Mn
Mn )
1
:
Mn is invertible, a simple su¢ cient condition for
which is that Mn < 1=2; where
Mn = 1
min Mii;n :
1 i n
The fact that Mn < 1=2 implies invertibility of Mn
Mn is a consequence of the Gershgorin circle
theorem. For details, see Section A.2 in the appendix. More importantly, a slight strengthening of
the condition Mn < 1=2 will be shown to be su¢ cient for (2) and (3) to hold with ^ n = ^ HC
n .
Remark 1. The estimator ^ HC
n can be written as n
6
1
Pn
0 u
^i;n v^i;n
~2i;n ;
i=1 v
where u
~2i;n =
Pn
j=1
HC u
2
ij;n ^j;n
can be interpreted as a bias-corrected “estimator” of (the conditional expectation of) u2i;n :
3
Examples
The heuristics of the preceding section will be made precise in the next section. Before doing so,
we present three leading examples, all of which are covered by the results developed in Section 4:
(i) linear regression models with increasing dimension, (ii) semiparametric partially linear models,
and (iii) …xed e¤ects panel data regression models.
Let
min (
) denote the minimum eigenvalue of its argument and let k k denote the Euclidean
norm.
3.1
Linear Regression Model with Increasing Dimension
The model of main interest is the linear regression model characterized by (1) and the following
assumptions.
0 ):1
Assumption LR1 f(yi;n ; x0i;n ; wi;n
i
ng are i.i.d. over i:
Assumption LR2 E[kxi;n k2 ] = O(1); nE[(E[ui;n jxi;n ; wi;n ])2 ] = o(1); and max1
p
vi;n k=
i n k^
n=
op (1):
Assumption LR3 P[
min (
CnLR =
Pn
0
i=1 wi;n wi;n )
> 0] ! 1; limn!1 Kn =n < 1; and CnLR = Op (1); where
max fE[u4i;n jxi;n ; wi;n ] + E[kVi;n k4 jwi;n ]g
1 i n
+ max f1=E[u2i;n jxi;n ; wi;n ] + 1=
1 i n
with Vi;n = xi;n
0
min (E[Vi;n Vi;n jwi;n ])g;
E[xi;n jwi;n ]:
We shall consider this model in some detail because it is important in its own right and because
the insights obtained for it can be used constructively in other cases, including the partially linear
model (4) and the …xed e¤ects panel data regression model (5) presented below. Linear regression
models with (possibly) increasing dimension have a long tradition in econometrics and statistics,
and we consider them here as a theoretical device to obtain asymptotic approximations that better
represent the …nite-sample behavior of the statistics of interest.
7
The main di¤erence between Assumptions LR1-LR3 and those familiar from the …xed-Kn case
p
is the presence of the conditions nE[(E[ui;n jxi;n ; wi;n ])2 ] = o(1) and max1 i n k^
vi;n k= n = op (1) in
Assumption LR2. The …rst condition is of course implied by the classical “exogeneity”assumption
E[ui;n jxi;n ; wi;n ] = 0, but is in general weaker because it allows for a (vanishing) misspeci…cation
error in the linear model speci…cation. As for the second condition, at the present level of generality
p
it seems di¢ cult to formulate primitive su¢ cient conditions for max1 i n k^
vi;n k= n = op (1) that
cover all cases of interest, but for completeness we mention that under mild moment conditions it
su¢ ces to require that one of the following conditions hold (see the appendix for details):
(i) Mn = op (1); or
(ii)
LR
n
= min
(iii) max1
i n
2RKn
Pn
d
E[kE(xi;n jwi;n )
j=1 1(Mij;n
0
wi;n k2 ] = o(1); or
6= 0) = op (n1=3 ):
Each of these conditions is interpretable. First, Mn
Kn =n because
Pn
i=1 Mii;n
=n
Kn and
a necessary condition for (i) is therefore that Kn =n ! 0: Conversely, because
Mn
Kn 1
n 1
min1
max1
i n Mii;n
i n Mii;n
;
the condition Kn =n ! 0 is su¢ cient for (i) whenever the design is “approximately balanced” in
the sense that (1
min1
i n Mii;n )=(1
max1
i n Mii;n )
= Op (1): In other words, (i) requires and
e¤ectively covers the case where it is assumed that Kn is a vanishing fraction of n: In contrast,
conditions (ii) and (iii) can hold also when Kn is a non-vanishing fraction of n; which is the case
of primary interest in this paper.
Because (ii) is a requirement on the accuracy of the approximation
E[xi;n jwi;n ]
0
n wi;n ;
n
0
= E[wi;n wi;n
]
1
E[wi;n x0i;n ];
primitive conditions for it are available when the elements of wi;n are approximating functions, as in
the partially linear model (4) discussed next. Indeed, in such cases one typically has
for some
LR
n
= O(Kn )
> 0; so condition (ii) not only accommodates Kn =n 9 0; but actually places no upper
bound on the magnitude of Kn in important special cases.
8
Finally, condition (iii), and its underlying higher-level condition described in the appendix, is
useful to handle cases where wi;n can not be interpreted as approximating functions, but rather just
many di¤erent covariates included in the linear model speci…cation. This condition is a “sparsity”
condition on the matrix Mn , which allows for Kn =n 9 0. Although somewhat stronger than needed,
the condition is easy to verify in certain cases, including the panel data model (5) discussed below.
3.2
Semiparametric Partially Linear Model
Another econometric model covered by our results is the partially linear model
yi =
0
xi + g(zi ) + "i ;
i = 1; : : : ; n;
(4)
where xi and zi are explanatory variables, "i is an error term, and the function g(z) is unknown.
Suppose fpk (z) : k = 1; 2;
g are functions having the property that linear combinations can
0 p (z )
n n i
approximate square-integrable functions of z well, in which case g(zi )
for some
n;
where pn (z) = (p1 (z); : : : ; pKn (z))0 . De…ning yi;n = yi ; xi;n = xi ; wi;n = pn (zi ); and ui;n =
0 w ;
n i;n
"i + g(zi )
the model (4) is of the form (1), and ^ n is the series estimator of
previously
studied by Donald and Newey (1994) and Cattaneo, Jansson, and Newey (2015).
Let h(zi ) = E[xi jzi ]: Our analysis of ^ n will proceed under the following assumptions.
Assumption PL1 f(yi ; x0i ; zi0 ) : 1
i
ng are i.i.d. over i:
Assumption PL2 E["i jxi ; zi ] = 0; %PL
n = o(1);
%PL
n = min E[jg(zi )
2RKn
Assumption PL3 P[
min (
where
Pn
0
PL
n
pn (zi )j2 ];
i=1 pn (zi )pn (zi )
0)
= o(1); and n%PL
n
PL
n
=
min
2RKn
d
PL
n
E[kh(zi )
1 i n
i
= xi
0
pn (zi )k2 ]:
> 0] ! 1; limn!1 Kn =n < 1; and CnPL = Op (1);
CnPL = max fE["4i jxi ; zi ] + E[k i k4 jzi ] + 1=E["2i jxi ; zi ] + 1=
with
= o(1); where
E[xi jzi ]:
9
0
min (E[ i i jzi ])g;
Because g(zi ) 6=
0 p (z )
n n i
in general, the partially linear model does not (necessarily) satisfy
E[ui;n jxi;n ; wi;n ] = 0: To accommodate this failure a relaxation of Assumption LR2 is needed.
The approach taken here, made precise in Assumption PL2, is motivated by the fact that linear
combinations of fpk (z)g are assumed to be able to approximate the functions g(z) and h(z) well.
Under standard smoothness conditions, and for standard choices of basis functions, we have %PL
n =
g
O(Kn
) and
PL
n
= O(Kn
PL2 holds provided Kn g
+
h
h
) for some pair (
g;
h)
of positive constants, in which case Assumption
=n ! 1: For further technical details see, for example, Newey (1997),
Chen (2007), Cattaneo and Farrell (2013), Belloni, Chernozhukov, Chetverikov, and Kato (2015)
and Chen and Christensen (2015).
3.3
Fixed E¤ects Panel Data Regression Model
Stock and Watson (2008) consider heteroskedasticity-robust inference for the panel data regression
model
Yit =
where
i
i
0
+
Xit + Uit ;
i = 1; : : : ; N;
t = 1; : : : ; T;
(5)
2 R is an individual-speci…c intercept, Xit 2 Rd is a regressor of dimension d; Uit 2 R is
an error term, and the following assumptions are satis…ed.
0 : : : ; X0 ) : 1
Assumption FE1 f(Ui1 ; : : : ; UiT ; Xi1
iT
i
ng are independent over i; T
3 is
…xed, and E[Uit Uis jXi1 : : : ; XiT ] = 0 for t 6= s:
Assumption FE2 E[Uit jXi1 : : : ; XiT ] = 0:
FE = O (1); where
Assumption FE3 CN
p
FE
CN
=
max
1 i N;1 t T
+
with V~it = Xit
E[Xit ]
De…ning n = N T; Kn = N;
(y(i
fE[Uit4 jXi1 : : : ; XiT ] + E[kXit k4 ]g
max
1 i N;1 t T
1
T
n
f1=E[Uit2 jXi1 : : : ; XiT ] + 1=
PT
=(
s=1 (Xis
1; : : : ;
~ ~0
min (E[Vit Vit ])g;
E[Xis ]):
0
N) ;
0
0
1)T +t;n ; x(i 1)T +t;n ; u(i 1)T +t;n ; w(i 1)T +t;n )
and
= (Yit ; Xit0 ; Uit ; e0i;N );
10
1
i
N; 1
t
T;
where ei;N 2 RN is the i-th unit vector of dimension N; the model (5) is also of the form (1) and
^ is the …xed e¤ects estimator of : In general, this model does not satisfy Assumption LR1, but
n
Assumption FE1 enables us to employ results for independent random variables when developing
asymptotics. In other respects this model is in fact more tractable than the previous models due
to the special nature of the covariates wi;n :
Remark 2. One implication of Assumptions FE1 and FE2 is that E[Yit jXi1 ; : : : ; XiT ] =
where
i
i+
0
Xit ;
can depend on i and the conditioning variables (Xi1 ; : : : ; XiT ) in an arbitrary way.
In the spirit of “…xed e¤ects”(as opposed to “correlated random e¤ects”) Assumptions FE1FE3 further allow V[Yit jXi1 ; : : : ; XiT ] to depend not only on (Xi1 ; : : : ; XiT ); but also on i: In
0 : : : ; X 0 ) to
particular, unlike Stock and Watson (2008), we do not require (Ui1 ; : : : ; UiT ; Xi1
iT
be i.i.d. over i. In addition, we do not require any kind of stationarity on the part of (Uit ; Xit0 ).
The amount of variance heterogeneity permitted is quite large, as Assumption FE3 basically
only requires V[Yit jXi1 ; : : : ; XiT ] = E[Uit2 jXi1 ; : : : ; XiT ] to be bounded and bounded away
from zero. (On the other hand, serial correlation is assumed away because Assumptions FE1
and FE2 imply that C[Yit ; Yis jXi1 ; : : : ; XiT ] = 0 for t 6= s:)
4
Results
The three models presented in the previous section are non-nested, but may be treated in a uni…ed
way by embedding them in a general framework. This general framework, which accommodates
our motivating examples as well as others, is presented next.
11
4.1
General Framework
0 ) : 1
Suppose f(yi;n ; x0i;n ; wi;n
i
ng is generated by (1). Let Xn = (x1;n ; : : : ; xn;n ) and for a set
Wn of random variables satisfying E[wi;n jWn ] = wi;n ; de…ne the constants
n
%n =
n
n
=
=
1X
2
E[Ri;n
];
n
1
n
1
n
i=1
n
X
i=1
n
X
i=1
Ri;n = E[ui;n jXn ; Wn ];
2
E[ri;n
];
ri;n = E[ui;n jWn ];
E[kQi;n k2 ];
Qi;n = E[vi;n jWn ];
P
P
0 ])( n E[w w 0 ])
( nj=1 E[xj;n wj;n
j;n j;n
j=1
where vi;n = xi;n
v^i;n : Also, de…ne
1w
i;n
is the population counterpart of
4
2
Cn = max fE[Ui;n
jXn ; Wn ] + E[kVi;n k4 jWn ] + 1=E[Ui;n
jXn ; Wn ]g + 1=
1 i n
where Ui;n = yi;n
Pn
j=1 Mij;n Vj;n :
~
min (E[ n jWn ])g;
Pn ~ ~ 0
~
E[xi;n jWn ]; ~ n =
i=1 Vi;n Vi;n =n; and Vi;n =
E[yi;n jXn ; Wn ]; Vi;n = xi;n
In the appendix we show how the three examples …t in this general framework and verify that
Assumptions LR1–LR3, PL1–PL3, and FE1–FE3, respectively, imply the following three assumptions.
Assumption 1 C[Ui;n ; Uj;n jXn ; Wn ] = 0 for i 6= j and max1
is the cardinality of Ti;n and where fTi;n : 1
i
i Nn
#Ti;n = O(1); where #Ti;n
Nn g is a partition of f1; : : : ; ng such that
f(Ut;n ; Vt;n ) : t 2 Ti;n g are independent over i conditional on Wn :
Assumption 2
n
Assumption 3 P[
4.2
= O(1); %n + n(%n
Pn
min (
0
i=1 wi;n wi;n )
n)
+n
n %n
= o(1); and max1
vi;n k=
i n k^
p
n = op (1):
> 0] ! 1; limn!1 Kn =n < 1; and Cn = Op (1):
Asymptotic Normality
As a means to the end of establishing (2), we give an asymptotic normality result for ^ n which may
be of interest in its own right.
12
Theorem 1 Suppose Assumptions 1–3 hold. Then
n
where
n
=
1=2 p
n( ^ n
) !d N (0; Id );
n
= ^n 1
^
n n
1
;
(6)
Pn
0 E[U 2 jX ; W ]=n:
^i;n v^i;n
n
i;n n
i=1 v
In the literature on high-dimensional linear models, Mammen (1993) obtains a similar asymptotic normality result as in Theorem 1 but under the condition Kn1+ =n ! 0 for
> 0 restricted by
certain moment condition on the covariates. In contrast, our result only requires limn!1 Kn =n < 1;
but imposes a di¤erent restriction on the high-dimensional covariates (e.g., condition (i), (ii) or
(iii) discussed previously) and furthermore exploits the fact that the parameter of interest is given
by the …rst d coordinates of the vector ( 0 ;
case c = ( 0 ; 00 )0 with
0 )0
n
(i.e., in Mammen (1993) notation, it considers the
denoting a d-dimensional vector of ones and 0 denoting a Kn -dimensional
vector of zeros).
In isolation, the fact that Theorem 1 removes the requirement Kn =n ! 0 may seem like little
more than a subtle technical improvement over results currently available. It should be recognized,
however, that conducting inference turn out to be considerably harder when Kn =n 6! 0: The latter is
an important insight about large-dimensional models that cannot be deduced from results obtained
under the assumption Kn =n ! 0; but can be obtained with the help of Theorem 1. In addition,
it is worth mentioning that Theorem 1 is a substantial improvement over Cattaneo, Jansson, and
Newey (2015, Theorem 1) because here it is not required that Kn ! 1 nor
n
= o(1). This
improvement applies not only to the partially linear model example, but more generally to linear
models with many covariates, because it allows now for quite general form of nuisance covariate wi;n
beyond speci…c approximating basis functions. In the speci…c case of the partially linear model,
this implies that we are able to weaken smoothness assumptions (or the curse of dimensionality),
otherwise required to satisfy the condition
4.3
n
= o(1).
Variance Estimation
Achieving (2), the counterpart of (6) in which the unknown matrix
n
is replaced by the estimator
^ n ; requires additional assumptions. One possibility is to impose homoskedasticity.
13
2 jX ; W ] =
Theorem 2 Suppose the assumptions of Theorem 1 hold. If E[Ui;n
n
n
2;
n
then (2) holds
with ^ n = ^ HO
n :
This result shows in quite some generality that homoskedastic inference in linear models remains
valid even when Kn is proportional to n; provided the variance estimator incorporates a degreesof-freedom correction, as ^ HO
n does.
Establishing (2) is also possible when Kn is assumed to be a vanishing fraction of n; as is of course
the case in the usual …xed-Kn linear regression model setup. The following theorem establishes
consistency of the conventional standard error estimator ^ EW
n under the assumption Mn !p 0, and
also derives an asymptotic representation for estimators of the form ^ n (
n)
without imposing this
assumption, which is useful to study the asymptotic properties of other members of the HCk class
of standard error estimators.
Theorem 3 Suppose the assumptions of Theorem 1 hold.
(a) If Mn !p 0; then (2) holds with ^ n = ^ EW
n :
P
(b) If k n k1 = max1 i n nj=1 j ij;n j = Op (1); then
n
^ n(
n) =
n
n
1 XXX
n
i=1 j=1 k=1
2
0
2
^i;n v^i;n
E[Uj;n
jXn ; Wn ]
ik;n Mkj;n v
+ op (1):
The conclusion of part (a) typically fails when the condition Kn =n ! 0 is dropped. For
example, when specialized to
n
= In part (b) implies that in the homoskedastic case (i.e., when
the assumptions of Theorem 2 are satis…ed) the standard estimator ^ EW
n is asymptotically downward
biased in general (unless Kn =n ! 0). In the following section we make this result precise and discuss
similar results for other popular variants of the HCk standard error estimators mentioned above.
P
2
On the other hand, because 1 k n HC
ik;n Mkj;n = 1(i = j) by construction, part (b) implies
that ^ HC
n is consistent provided k
HC k
n 1
= Op (1): A simple condition for this to occur can be stated
in terms of Mn : Indeed, if Mn < 1=2; then
HC
n
is diagonally dominant and it follows from Theorem
1 of Varah (1975) that
k
HC
n k1
1
1=2
Mn
:
As a consequence, we obtain the following theorem, whose conditions can hold even if Kn =n 9 0.
14
Theorem 4 Suppose the assumptions of Theorem 1 hold.
Mn ) = Op (1), then (2) holds with ^ n = ^ HC
n :
If P[Mn < 1=2] ! 1 and if 1=(1=2
Because Mn
Kn =n; a necessary condition for Theorem 4 to be applicable is that limn!1 Kn =n <
1=2: When the design is balanced, that is, when M11;n = : : : = Mnn;n (as occurs in the panel data
model (5)), the condition limn!1 Kn =n < 1=2 is also su¢ cient, but in general it seems di¢ cult
to formulate primitive su¢ cient conditions for the assumption made about Mn in Theorem 4.
In practice, the fact that Mn is observed means that the condition Mn < 1=2 is veri…able, and
therefore unless Mn is found to be “close” to 1=2 there is reason to expect ^ HC
n to perform well.
4.4
HCk Standard Errors with Many Covariates
The HCk variance estimators are very popular in empirical work, and in our context are of the
form ^ n (
n)
with
ij;n
= 1(i = j)
i;n
i;n Mii;n
for some choice of (
i;n ; i;n ).
See MacKinnon (2012)
for a review. Theorem 3(b) can be used to formulate conditions, including Kn =n ! 0, under which
these estimators are consistent in the sense that
n
^ n(
n)
=
n
+ op (1);
n
1X
0
2
v^i;n v^i;n
E[Ui;n
jXn ; Wn ]:
=
n
i=1
More generally, Theorem 3(b) shows that, under regularity conditions and if
ij;n
= 1(i = j)
i;n
i;n Mii;n
then
n
^ n(
n)
=
n( n)
+ op (1);
n
1 XX
n( n) =
n
i;n
i;n Mii;n
i=1 j=1
2
0
2
Mij;n
v^i;n v^i;n
E[Uj;n
jXn ; Wn ]:
We therefore obtain the following (mostly negative) results about the properties of HCk estimators
when Kn =n 9 0, that is, when potentially many covariates are included in the model.
HC0 : (
i;n ; i;n )
2 jX ; W ] =
= (1; 0): If E[Uj;n
n
n
n( n)
with n
1
Pn
i=1 (1
=
n
2
n
n
n
X
(1
2;
n
then
0
Mii;n )^
vi;n v^i;n
n;
i=1
0 6= o (1) in general (unless K =n ! 0). Thus, ^ (
Mii;n )^
vi;n v^i;n
p
n
n
15
n)
= ^ EW
n
,
is inconsistent in general. In particular, inference based on ^ EW
n is asymptotically liberal (even)
under homoskedasticity.
HC1 : (
n( n)
i;n ; i;n )
=
n;
2 jX ; W ] =
Kn ); 0): If E[Uj;n
n
n
= (n=(n
2
n
and if M11;n = : : : = Mnn;n ; then
but in general this estimator is inconsistent when Kn =n 9 0 (and so is any
other scalar multiple of ^ EW
n ).
HC2 : (
i;n ; i;n )
2 jX ; W ] =
= (1; 1): If E[Uj;n
n
n
2;
n
then
n( n)
=
n;
but in general this
estimator is inconsistent under heteroskedasticity when Kn =n 9 0: For instance, if d = 1 and
2 jX ; W ] = v
2 ; then
if E[Uj;n
^j;n
n
n
n
n( n)
n
n
2
1 X X Mij;n
1
1
=
[
(Mii;n
+ Mjj;n
)
n
2
i=1 j=1
2 2
1(i = j)]^
vi;n
v^j;n 6= op (1)
in general (unless Kn =n ! 0).
HC3 : (
i;n ; i;n )
= (1; 2): Inference based on this estimator is asymptotically conservative
because
n( n)
n
=
n
n
1X X
2
2
0
2
Mii;n
Mij;n
v^i;n v^i;n
E[Uj;n
jXn ; Wn ]
n
0;
i=1 j=1;j6=i
where n
0).
HC4 : (
1
Pn Pn
i=1
i;n ; i;n )
2
2 ^ v
0
2
i;n ^i;n E[Uj;n jXn ; Wn ]
j=1;j6=i Mii;n Mij;n v
6= op (1) in general (unless Kn =n !
= (1; min(4; nMii;n =Kn )): If M11;n = : : : = Mnn;n = 2=3 (as occurs when
T = 3 in the …xed e¤ects panel data model), then HC4 reduces to HC3 , so this estimator is
also inconsistent in general.
Among other things these results show that (asymptotically) conservative inference in linear
models with many covariates (i.e., even when K=n 6! 0) can be conducted using standard linear
methods (and software), provided the HC3 standard errors are used.
In Section 5 we present numerical evidence comparing all these standard error estimators. In
particular, we …nd that indeed standard OLS-based con…dence intervals employing HC3 standard
errors are always quite conservative. Furthermore, we also …nd that our proposed variance estimator
^ HC
n delivers con…dence intervals with close-to-correct empirical coverage.
16
4.5
4.5.1
Examples
Linear Regression Model with Increasing Dimension
Specializing Theorems 2–4 to the linear regression model, we obtain the following result.
Theorem LR Suppose Assumptions LR1–LR3 hold.
(a) If E[u2i;n jxi;n ; zi;n ] =
2;
n
then (2) holds with ^ n = ^ HO
n :
(b) If Mn !p 0; then (2) holds with ^ n = ^ EW
n :
(c) If P[Mn < 1=2] ! 1 and if 1=(1=2
Mn ) = Op (1); then (2) holds with ^ n = ^ HC
n :
This theorem gives a formal justi…cation for employing ^ HC
n as the variance estimator when
forming con…dence intervals for
in linear models with possibly many nuisance covariates and
heteroskedasticity. The resulting con…dence intervals for
will remain consistent even when Kn is
proportional to n, provided the technical conditions given in part (c) are satis…ed.
Remark 3. Our main results for linear models concern large-sample approximations for the …nitesample distribution of the usual t-statistics. An alternative, equally automatic approach
is to employ the bootstrap and closely related resampling procedures (see, among others,
Freedman (1981), Mammen (1993), Gonçalvez and White (2005), Kline and Santos (2012)).
Assuming Kn =n 9 0; Bickel and Freedman (1983) demonstrated an invalidity result for the
bootstrap. We conjecture that similar results can be obtained for other resampling procedures.
Furthermore, we also conjecture that employing appropriate resampling methods on the “biascorrected”residuals u
~2i;n (Remark 1) can lead to valid inference procedures. Investigating these
conjectures, however, is beyond the scope of this paper.
4.5.2
Semiparametric Partially Linear Model
The results for the partially linear model (4) are in perfect analogy with those for the linear
regression model.
Theorem PL Suppose Assumptions PL1–PL3 hold.
(a) If E["2i jxi ; zi ] =
2;
then (2) holds with ^ n = ^ HO
n :
(b) If Mn !p 0; then (2) holds with ^ n = ^ EW
n :
(c) If P[Mn < 1=2] ! 1 and if 1=(1=2
Mn ) = Op (1); then (2) holds with ^ n = ^ HC
n :
17
A result similar to Theorem PL(a) was previously reported in Cattaneo, Jansson, and Newey
(2015), but parts (b) and (c) of Theorem PL are new.
4.5.3
Fixed E¤ects Panel Data Regression Model
Finally, consider the panel data model (5). Because Kn =n = 1=T is …xed this model does not admit
an analog of Theorem 3. On the other hand, it does admit an analog of Theorems 2 and 4.
Theorem FE Suppose Assumptions FE1–FE3 hold. Then (2) holds with ^ n = ^ HC
n : If also
E[Uit2 jXi1 ; : : : ; XiT ] =
2;
then (2) holds with ^ n = ^ HO
n :
To see the connection between our results and those in Stock and Watson (2008), observe that
Mn = I N
[IT
0
T T =T ]
for
T
2 RT a T
(for i = 1; : : : ; n) and therefore Mn
HC
n
closed-form expression for
1 vector of ones. We then obtain Mii;n = 1
1=3 because T
1=T
3: More importantly, perhaps, we obtain a
given by
HC
n
= IN
T
T
1
IT
2
1)2
(T
0
T T
:
As a consequence,
N
^ HC
n
T
XX
1
~ it X
~ it0 U
^it2
=
X
N (T 2)
i=1 t=1
~ it = Xit T
where X
1
PT
s=1 Xis
N
X
1
N (T 2)
i=1
^it = Yit T
and U
1
PT
1
T
s=1 Yis
1
T
X
~ it X
~ it0
X
t=1
!
1
T
1
T
X
t=1
^it2
U
!
;
^0 X
~
n it : Apart from an asymptotically
negligible degrees of freedom correction, this estimator coincides with the estimator ^ HR
FE
of
Stock and Watson (2008, Eq. (6), p. 156).
Remark 4. The result above not only highlights a tight connection between our general standard
error estimator and the one in Stock and Watson (2008), but also indicates that our general
formula ^ HC
n could be used to derive explicit, simple expressions in other contexts where
multi-way …xed e¤ects or similar discrete regressors are included.
18
5
Simulations
We report the results from a small Monte Carlo experiment aimed to capture the extent to which
our main theoretical …ndings are present in samples of moderate size. To facilitate comparability
with other studies, we employ a data generating process (DGP) that is as similar as possible to
those employed in the literature before. In particular, we consider the following model:
yi = xi +
0w
i
+ ui ,
xi = vi ,
where
= (1; 1;
ui j(xi ; wi ) s i.i.d. N (0;
vi jwi s i.i.d. N (0;
; 1)0 ,
= 0 and
2 ),
vi
2 ),
ui
2
ui
= {u (1 + (xi + 0 wi )2 )# ,
2
vi
= {v (1 + ( 0 wi )2 )# ,
= 0, and the constants {u and {v are chosen so that
V[ui ] = V[vi ] = 1. In the absence of the additional covariates wi , this design coincides with the one
in Stock and Watson (2008), and is very similar to the one considered in MacKinnon (2012).
The simulation study employs 5; 000 replications, sets the sample size to n = 1; 000, and considers models with Kn =n 2 f0:1; 0:2; 0:3; 0:4g. The two main parameters varying in the Monte
Carlo experiments are: the constant # and the distribution of the covariates wi . The …rst parameter controls the degree of heteroskedasticity: # = 0 corresponds to homoskedasticity, and # = 1
corresponds to moderate heteroskedasticity, as classi…ed by MacKinnon (2012). For the distribution of the covariates we consider the following cases: independent standard N (0; 1) (Model 1),
independent U( 1; 1) (Model 2), independent discrete covariates constructed as 1(N (0; 1)
2:33).
The results are given in Table 1. These tables report empirical coverage rates for eight distinct
nominal 95% con…dence intervals for , across the range of Kn and values of #. Each con…dence
interval considered employs a di¤erent standard error formula: HO0 uses homoskedastic standard
errors without degrees of freedom correction, HO1 uses homoskedastic standard errors with degrees
of freedom correction, HC0 –HC4 are described in Section 4.4, and HCK uses ^ HC
n .
The main …ndings from the small simulation study are in line with our theoretical results. We
…nd that the con…dence interval estimators constructed our proposed standard errors formula ^ HC
n ,
denoted HCK , o¤er close-to-correct empirical coverage in all cases considered. The alternative
heteroskedasticity consistent standard errors currently available in the literature lead to con…dence
intervals that could deliver substantial under or over coverage depending on the design and degree
of heteroskedasticity considered. We also found that inference based on HC3 standard errors is
19
conservative, a general asymptotic result that is formally established in the supplemental appendix.
6
Conclusion
We established asymptotic normality of the OLS estimator of a subset of coe¢ cients in highdimensional linear regression models with many nuisance covariates, and investigated the properties of several popular heteroskedasticity-robust standard error estimators in this high-dimensional
context. We showed that none of the usual formulas deliver consistent standard errors when the
number of covariates is not a vanishing proportion of the sample size. We also proposed a new standard error formula that is consistent under (conditional) heteroskedasticity and many covariates,
which is fully automatic and does not assume special, restrictive structure on the regressors.
Our results concern high-dimensional models where the number of covariates is at most a
non-vanishing fraction of the sample size. A quite recent related literature concerns ultra-highdimensional models where the number of covariates is much larger than the sample size, but some
form of (approximate) sparsity is imposed in the model; see, e.g., Belloni, Chernozhukov, and
Hansen (2014a,b), Belloni, Chernozhukov, Hansen, and Fernandez-Val (2014), Farrell (2015), and
references therein. In that setting, inference is conducted after covariate selection, where the
resulting number of selected covariates is at most a vanishing fraction of the sample size (usually
much smaller). An implication of the results obtained in this paper is that the latter assumption
cannot be dropped if post covariate selection inference is based on conventional standard errors. It
would therefore be of interest to investigate whether the methods proposed herein can be applied
also for inference post covariate selection in ultra-high-dimensional settings, which would allow for
weaker forms of sparsity because more covariates could be selected for inference.
A
Technical Appendix
Our main results are based on seven technical lemmas, and are obtained by working with the
representation
p
n( ^ n
) = ^ n 1 Sn ;
P
P
p
0 =n and S =
where ^ n =
^i;n v^i;n
^i;n ui;n = n: Strictly speaking, the displayed
n
1 i nv
1 i nv
P
0 ) > 0 and
^
representation is valid only when min ( ni=1 wi;n wi;n
min ( n ) > 0: Both events occur
with probability approaching one under our assumptions and our main results are valid no matter
20
which de…nitions (of ^ n and ^ n ) are employed on the complement of the union of these events, but
P
0 ) > 0g; and, in a slight
for speci…city we let Mij;n = ! n Mij;n ; where ! n = 1f min ( nk=1 wk;n wk;n
abuse of notation, we de…ne
X
0
^n = 1
v^i;n v^i;n
;
n
1 X
Sn = p
v^i;n ui;n ;
n
1 i n
^ = 1f
n
^
min ( n )
> 0g ^ n 1 (
Mij;n xj;n ;
1 j n
1 i n
and
X
v^i;n =
1 X
v^i;n yi;n ):
n
1 i n
We …rst present our seven technical lemmas, some of which may be of independent interest.
Their proof is given at the end of the appendix, after establishing the main results of the paper
(Theorems 1–4).
The …rst lemma can be used to bound ^ n 1 :
Lemma A-1 If Assumptions 1–3 hold, then ^ n 1 = Op (1):
Let
n
=
n (Xn ; Wn )
= V[Sn jXn ; Wn ]: The second lemma can be used to bound
n
1
and to
show asymptotic normality of Sn :
Lemma A-2 If Assumptions 1–3 hold, then
n
1
= Op (1) and
n
1=2
Sn !d N (0; Id ):
The third lemma can be used to approximate ^ 2n by means of ~ 2n ; where
^ 2n =
with u
^i;n =
P
n
1 j n Mij;n (yj;n
X
1
u
^2i;n ;
d Kn
~ 2n =
1 i n
X
1
~2 ;
U
i;n
n Kn
1 i n
^ 0 xj;n ) and U
~i;n = P
n
1
j n Mij;n Uj;n :
Lemma A-3 If Assumptions 1–3 hold, then ^ 2n = E[~ 2n jXn ; Wn ] + op (1):
The fourth lemma can be used to approximate ^ n (
^ n(
n)
=
1
n
X
~ n(
0
^i;n v^i;n
u
^2j;n ;
ij;n v
n)
by means of ~ n (
n)
=
1 i;j n
Lemma A-4 Suppose Assumptions 1–3 hold.
P
If k n k1 = max1 i n 1 j n j ij;n j = Op (1); then ^ n (
n)
1
n
X
n );
where
0 ~2
^i;n v^i;n
Uj;n :
ij;n v
1 i;j n
= E[ ~ n (
n )jXn ; Wn ]
+ op (1):
The …fth lemma can be combined with the third lemma to show consistency of ^ HO
n under
homoskedasticity.
Lemma A-5 Suppose Assumption 1 holds.
2 jX ; W ] =
If E[Ui;n
n
n
2;
n
then E[~ 2n jXn ; Wn ] =
2!
n n
21
and
n
=
2^ :
n n
The sixth lemma can be combined with the fourth lemma to show consistency of ^ n (
n ):
Part
(a) is a general result stated under a high-level condition. Part (b) gives su¢ cient conditions for
the condition of part (a) for estimators of HCk type and part (c) does likewise for ^ HC : With a
n
slight abuse of notation, let
Mn = 1
min Mii;n :
1 i n
Lemma A-6 Suppose Assumption 3 holds.
(a) If
max fj
1 i n
then E[ ~ n (
If max1
k
n k1
i n fj1
= Op (1):
HC
n
=
n k1
X
X
j
1 j n;j6=i 1 k n
n
2
ik;n Mjk;n jg
= op (1);
+ op (1):
i;n
= ! n 1(i = j)
i;n Mii;n
n
= !n
0
HC
11;n
B
=B
@
HC ;
n
..
.
HC
1n;n
..
; where 0
i;n
n
+ op (1) and
where
.
HC
n1;n
..
.
HC
nn;n
1
0
2
M11;n
C B
..
C=B
.
A @
2
Mn1;n
..
.
1
2
M1n;n
C
..
C
.
A
2
Mnn;n
Mn ) = Op (1); then E[ ~ n (
If P[Mn < 1=2] ! 1 and if 1=(1=2
k
1j +
4 and i;n 0:
~
i;n jg = op (1) and if Mn = op (1); then E[ n ( n )jXn ; Wn ] =
ij;n
(c) Suppose
2
ik;n Mik;n
1 k n
n )jXn ; Wn ]
(b) Suppose
X
= Op (1):
1
= (Mn
n )jXn ; Wn ]
Mn )
=
n
1
:
+ op (1) and
Finally, the seventh lemma can be used to formulate primitive su¢ cient conditions for the last
part of Assumption 2.
Lemma A-7 Suppose Assumptions 1 and 3 hold and suppose that
1 X
E[kQi;n k2+ ] = O(1)
n
1 i n
for some
max1
A.1
i n
P
0: If either (i)
1 j n 1(Mij;n
> 0 and Mn = op (1); or (ii)
6= 0) = op (n
=(2 +2) );
then max1
i
= o(1); or (iii)
p
vi;n k= n = op (1):
n k^
n
> 0 and
Proofs of Main Results
Theorem 1 follows from Lemmas A-1 and A-2. Theorem 2 follows from Theorem 1 combined with
Lemmas A-3 and A-5. Theorems 3 and 4 follow from Theorem 1 combined with Lemmas A-4 and
A-6.
22
A.1.1
Linear Regression Model with Increasing Dimension
If Assumption LR1 holds, then Assumption 1 holds with Wn = (w1;n ; : : : ; wn;n ); Nn = n; Ti;n = fig;
and max1
i Nn
#Ti;n = 1: Moreover,
max1
n
i n E[kxi;n k
2]
and
%n = E[(E[ui;n jxi;n ; wi;n ])2 ],
n
so Assumption 2 holds if Assumptions LR1-LR2 hold. Finally, Assumption 3 is implied by Assumptions LR1-LR3. In particular,
~
min (E[ n jWn ])
1 X
0
jwi;n ])
E[V~i;n V~i;n
n
1 i n
1 X
0
= ! n min (
Mii;n E[Vi;n Vi;n
jwi;n ])
n
1 i n
1 X
0
!n
Mii;n min (E[Vi;n Vi;n
jwi;n ])
n
1 i n
1 X
0
Mii;n ) min min (E[Vi;n Vi;n
jwi;n ])
!n(
1 i n
n
=
min (
1 i n
! n (1
so 1=
~
min (E[ n jWn ])
Kn =n)=CnLR ;
= Op (1) because P[! n = 1] ! 1; limn!1 Kn =n < 1; and CnLR = Op (1):
Under Assumptions LR1 and LR3, we have
n
=
LR
n
=
min
2RKn
d
0
E[kE(xi;n jwi;n )
wi;n k2 ] = E[kQi;n k2 ];
where
Qi;n = E[vi;n jwi;n ];
Setting
vi;n = xi;n
0
0
E[xi;n wi;n
]E[wi;n wi;n
]
1
wi;n :
= 2 in Lemma A-7 and specializing it to the linear regression model with increasing
dimension we therefore obtain the following lemma, whose conditions are discussed in the main
text.
Lemma A-8 Suppose Assumptions LR1 and LR3 hold and suppose that E[kxi;n k2 ] = O(1); E[ui;n jxi;n ; wi;n ] =
P
0; and E[kQi;n k4 ] = O(1): If either (i) Mn = op (1); or (ii) LR
n = o(1); or (iii) max1 i n
1 j n 1(Mij;n 6=
p
1=3
0) = op (n ); then max1 i n k^
vi;n k= n = op (1):
A.1.2
Semiparametric Partially Linear Model
If Assumption PL1 holds, then Assumption 1 holds with Wn = (z1 ; : : : ; zn ); Nn = n; Ti;n = fig;
and max1
i Nn
#Ti;n = 1: Moreover, in this case we have
n
=
min
2RKn
d
E[kE(xi jzi )
23
0
pn (zi )k2 ] =
PL
n
and, using E(yi
0
0
xi jxi ; zi ) = g(zi ) = E(yi
0
%n = min E[jE(yi
2RKn
0
xi jzi )
xi jzi );
pn (zi )j2 ] = min E[jE(yi
2RKn
0
xi jxi ; zi )
0
so Assumption 2 holds when Assumptions PL1-PL2 hold, the condition max1
holding by Lemma A-7 because
n
pn (zi )j2 ] =
n
p
vi;n j=
i n j^
= %PL
n ;
n = op (1)
= o(1): Finally, Assumption 3 is implied by Assumptions PL1-
PL3. In particular,
~
min (E[ n jWn ])
1 X
Mii;n E[
n
1 i n
1 X
!n
Mii;n min (E[
n
1 i n
1 X
!n(
Mii;n ) min
1 i n
n
= !n
min (
0
i i jzi ])
0
i i jzi ])
0
min (E[ i i jzi ])
1 i n
Kn =n)=CnPL ;
! n (1
so 1=
A.1.3
~
min (E[ n jWn ])
= Op (1) because P[! n = 1] ! 1; limn!1 Kn =n < 1; and CnPL = Op (1):
Fixed E¤ects Panel Data Regression Model
If Assumption FE1 holds, then Assumption 1 holds with Wn = (w1;n ; : : : ; wn;n ); Nn = N = n=T;
Ti;n = fT (i 1)+1; : : : ; T ig; and max1
i Nn
#Ti;n = T: Moreover,
n
max1
i N;1 t T
E[kXit k2 ];
so Assumption 2 holds (with %n = n = 0) when Assumptions FE1-FE3 hold, the condition
P
p
max1 i n j^
vi;n j= n = op (1) holding by Lemma A-7 because 1 j n 1(Mij;n 6= 0) = T: Finally,
Assumption 3 is implied by Assumptions FE1-FE3. In particular,
~
min (E[ n jWn ])
=
min (
1
NT
min
A.2
~
min (E[ n jWn ])
E[V~it V~it0 ])
1 i N;1 t T
1 i N;1 t T
so 1=
X
~ ~0
min (E[Vit Vit ])
1=CnFE ;
= Op (1) because CnFE = Op (1):
Properties of Mn
Mn
Because Mn is symmetric, so is Mn
Mn and it follows from the Gerschgorin circle theorem (see,
e.g., Barnes and Ho¤man (1981) for an interesting discussion) that
min (Mn
Mn )
2
min fMii;n
1 i n
X
1 j n;j6=i
24
2
2
jMij;n
jg = min f2Mii;n
1 i n
X
1 j n
2
Mij;n
g;
where, using the fact that
X
2
min f2Mii;n
1 i n
Thus,
min (Mn
P
1 j n
2
1 j n Mij;n
= Mii;n because Mn is idempotent,
2
2
Mij;n
g = min f2Mii;n
1 i n
Mn ) > 0 (i.e., Mn
Under the same condition, Mn
Mii;n g = 2 min fMii;n (Mii;n
1 i n
1=2)g:
Mn is positive de…nite) whenever Mn < 1=2:
Mn is diagonally dominant and it follows from Theorem 1 of
Varah (1975) that
k(Mn
A.3
1
Mn )
k1
1
1=2
Mn
:
Proofs of Lemmas
Throughout the proofs we simplify the notation by assuming without loss of generality that d = 1:
In Lemma A-2 the case where d > 1 can be handled by means of the Cramér-Wold device and
simple bounding arguments.
A.3.1
Proof of Lemma A-1
It su¢ ces to show that ~ n = E[ ~ n jWn ] + op (1) and that ^ n
~n
op (1):
First,
~n = 1
n
where
P
1 i;j Nn
V[aij;n jWn ]
X
aii;n +
1 i Nn
2
n
X
aij;n ;
X
aij;n =
1 i;j Nn ;i<j
Mst;n Vs;n Vt;n ;
s2Ti;n ;t2Tj;n
V[aij;n jWn ] = Op (n) because
(#Ti;n )(#Tj;n )
X
s2Ti;n ;t2Tj;n
where CT ;n = max1
i Nn
X
2
Mst;n
V[Vs;n Vt;n jWn ]
#(Ti;n ) = Op (1); CV;n = 1 + max1
X
2
Mst;n
=
X
X
X
Mii;n
2
Mst;n
;
s2Ti;n ;t2Tj;n
4
i n E[kVi;n k jWn ]
2
Mij;n
=
1 i;j n
1 i;j Nn s2Ti;n ;t2Tj;n
CT2 ;n CV;n
= Op (1) and
n:
1 i n
As a consequence,
V[
1
n
X
1 i Nn
aii;n jWn ] =
1
n2
aij;n jWn ] =
1
n2
X
1 i Nn
V[aii;n jWn ]
1
n2
X
1 i;j Nn
V[aij;n jWn ] = op (1)
and
V[
1
n
X
1 i;j Nn ;i<j
X
1 i;j Nn ;i<j
V[aij;n jWn ]
implying in particular that ~ n = E[ ~ n jWn ] + op (1):
25
1
n2
X
1 i;j Nn
V[aij;n jWn ] = op (1);
~ i;n = P
Next, de…ning Q
1
j n Mij;n Qj;n ;
we have
X
X
~n = 1
~2 + 2
~ i;n V~i;n
Q
Q
i;n
n
n
^n
1 i n
2 X ~ ~
2 X ~
Qi;n Vi;n =
Qi;n Vi;n = op (1);
n
n
1 i n
1 i n
1 i n
~ i;n Vi;n jWn ] = 0 and
the last equality using the fact that E[Q
V[
1 X ~
Qi;n Vi;n jWn ] =
n
1
n2
1 i n
1
n2
X
X
V[
1 i Nn
X
s2Ti;n
(#Ti;n )
1 i Nn
~ s;n Vs;n jWn ]
Q
X
s2Ti;n
~ s;n Vs;n jWn ]
V[Q
1
1
1 X 2
Qi;n ) = Op (
CT ;n CV;n (
n
n
n
n)
X
1
1
CT ;n CV;n (
n
n
X
~ 2s;n )
Q
1 i Nn s2Ti;n
= op (1):
1 i n
A.3.2
Proof of Lemma A-2
De…ning S~n = Sn
Sn
E[Sn jXn ; Wn ] =
P
^i;n Ui;n =
1 i nv
p
n and employing the decomposition
1 X ~
1 X ~
1 X
S~n = p
Vi;n ri;n + p
Qi;n ri;n + p
v^i;n (Ri;n
n
n
n
1 i n
1 i n
1 i n
we begin by showing that Sn = S~n + op (1):
P
First, de…ning r~i;n = 1 j n Mij;n rj;n and using E[~
ri;n Vi;n jWn ] = 0 and
1 X
r~i;n Vi;n jWn ] =
V[ p
n
1 i n
1
n
X
V[
1 i Nn
CT ;n CV;n
we have
X
s2Ti;n
r~s;n Vs;n jWn ]
1 X 2
r~i;n
n
1 i n
ri;n );
CT ;n CV;n
1
n
X
(#Ti;n )
X
s2Ti;n
1 i Nn
1 X 2
ri;n = Op (
n
n)
V[~
rs;n Vs;n jWn ]
= Op (%n ) = op (1);
1 i n
1 X ~
1 X
p
Vi;n ri;n = p
r~i;n Vi;n = op (1):
n
n
1 i n
1 i n
Also, using the Cauchy-Schwarz inequality,
1 X ~
jp
Qi;n ri;n j2
n
n(
1 X 2
1 X 2
Qi;n )(
ri;n ) = Op (n
n
n
1 i n
1 i n
n n)
= op (1)
1 i n
and
1 X
jp
v^i;n (Ri;n
n
1 i n
ri;n )j2
n(
1 X 2
1 X
v^i;n )(
jRi;n
n
n
1 i n
1 i n
26
ri;n j2 ) = Op [n(%n
n )]
= op (1);
where the penultimate equality uses
1 X 2
v^i;n
n
1 X 2
vi;n
n
1 i n
1 i n
1 i n
1 i n
2 ]: As a consequence, S = S
~n + op (1):
E[ri;n
n
2 ]
ri;n j2 ] = E[Ri;n
and E[jRi;n
2 X 2
2 X 2
Qi;n +
Vi;n = Op (1)
n
n
Next, using Assumption 1,
n
=
1
n
X X
1 i Nn t2Ti
2
2
v^t;n
E[Ut;n
jXn ; Wn ] =
^ n min1 i n E[U 2 jXn ; Wn ];
i;n
so
n
1
1 X 2
2
v^i;n E[Ui;n
jXn ; Wn ]
n
1 i n
= Op (1): The proof can therefore be completed by showing that
We shall do so assuming without loss of generality that
1=2 ~
Sn
n
1
=p
n
where, conditional on (Xn ; Wn );
i;n
1
n
X
i;n ;
i;n
min ( n )
=
n
1=2
X
n
1=2 ~
Sn
!d N (0; 1):
> 0 (a.s.). Because
v^t;n Ut;n ;
t2Ti
1 i Nn
are mean zero independent random variables with
X
V[
1 i Nn
i;n jXn ; Wn ]
= 1;
it follows from the Berry-Esseen inequality that
sup jP(
z2R
where
1=2 ~
Sn
n
zjXn ; Wn )
(z)j
min(
P
1 i Nn
3
i;n j jXn ; Wn ]
; 1);
n3=2
E[j
( ) is the standard normal cdf. It therefore su¢ ces to show that
1
n3=2
X
E[j
1 i Nn
i;n j
3
jXn ; Wn ] = op (1):
Now,
1
n3=2
X
1 i Nn
E[j
3
i;n j jXn ; Wn ]
min ( n )
min ( n )
CT2 ;n CU;n
3=2
1
n3=2
3=2
1
n3=2
min ( n )
27
X
E[j
1 i Nn
X
t2Ti
(#Ti )2
1 i Nn
3=2
X
1
n3=2
v^t;n Ut;n j3 jXn ; Wn ]
X
t2Ti
j^
vt;n j3 E[jUt;n j3 jXn ; Wn ]
X X
1 i Nn t2Ti
j^
vt;n j3 = op (1);
where CU;n = 1 + max1
1
n3=2
A.3.3
X X
1 i Nn t2Ti
4
i n E[Ui;n jXn ; Wn ]
1
j^
vt;n j3 =
n3=2
X
1 i n
= Op (1) and where the last equality uses the fact that
j^
vi;n j3
(
max1
vi;n j
i n j^
p
n
)(
1 i n
Proof of Lemma A-3
It su¢ ces to show that ~ 2n = E[~ 2n jXn ; Wn ] + op (1) = Op (1) and that
First,
~ 2n =
where
1 X 2
v^i;n ) = op (1):
n
X
1
n Kn
P
1 i;j Nn
bii;n +
1 i Nn
2
n Kn
X
bij;n ;
P
ui;n
1 i n (^
X
bij;n =
1 i;j Nn ;i<j
~i;n )2 = op (n):
U
Mst;n Us;n Ut;n ;
s2Ti;n ;t2Tj;n
V[bij;n jXn ; Wn ] = Op (n) because
V[bij;n jXn ; Wn ]
(#Ti;n )(#Tj;n )
X
s2Ti;n ;t2Tj;n
CT2 ;n CU;n
X
2
Mst;n
V[Us;n Ut;n jXn ; Wn ]
2
Mst;n
CT2 ;n CU;n n:
s2Ti;n ;t2Tj;n
As a consequence,
V[
1
n Kn
X
1 i Nn
bii;n jXn ; Wn ] =
1
(n Kn )2
1
(n Kn )2
X
1 i Nn
V[bii;n jXn ; Wn ]
X
1 i;j Nn
V[bij;n jXn ; Wn ] = op (1)
and
V[
1
n Kn
X
1 i;j Nn ;i<j
bij;n jXn ; Wn ] =
1
(n Kn )2
1
(n Kn )2
X
1 i;j Nn ;i<j
X
1 i;j Nn
V[bij;n jXn ; Wn ]
V[bij;n jXn ; Wn ] = op (1);
implying in particular that ~ 2n = E[~ 2n jXn ; Wn ] + op (1); where
E[~ 2n jXn ; Wn ] =
X
1
2
Mii;n E[Ui;n
jXn ; Wn ]
n Kn
1 i n
Next, by Lemmas A-1 and A-2 and their proofs, ~ n ( ^ n
have
1 X ~2
Ri;n
n
1 i n
CU;n = Op (1):
)2 = op (1): Also, using %n ! 0; we
1 X 2
Ri;n = Op (%n ) = op (1):
n
1 i n
28
~i;n = R
~ i;n + V~i;n ( ^ n
U
As a consequence, using u
^i;n
X
~i;n )2
U
(^
ui;n
2n[
1 i n
A.3.4
);
1 X ~2
Ri;n + ~ n ( ^ n
n
)2 ] = op (n):
1 i n
Proof of Lemma A-4
It suffices to show that $\hat\Sigma_n(\kappa_n)=\tilde\Sigma_n(\kappa_n)+o_p(1)$ and that $\tilde\Sigma_n(\kappa_n)=E[\tilde\Sigma_n(\kappa_n)\mid X_n,W_n]+o_p(1)$.

First,
\[
\tilde\Sigma_n(\kappa_n)=\frac{1}{n}\sum_{1\le i\le N_n}c_{ii,n}+\frac{2}{n}\sum_{1\le i,j\le N_n,\,i<j}c_{ij,n},\qquad
c_{ij,n}=\sum_{s\in T_{i,n},\,t\in T_{j,n}}\ \sum_{1\le k,l\le n}\kappa_{kl,n}\hat v_{k,n}^2M_{sl,n}M_{tl,n}U_{s,n}U_{t,n},
\]
where $\sum_{1\le i,j\le N_n}V[c_{ij,n}\mid X_n,W_n]=o_p(n^2)$ because
\[
V[c_{ij,n}\mid X_n,W_n]\le(\#T_{i,n})(\#T_{j,n})\sum_{s\in T_{i,n},\,t\in T_{j,n}}\Big(\sum_{1\le k,l\le n}\kappa_{kl,n}\hat v_{k,n}^2M_{sl,n}M_{tl,n}\Big)^2V[U_{s,n}U_{t,n}\mid X_n,W_n],
\]
so that
\begin{align*}
\sum_{1\le i,j\le N_n}V[c_{ij,n}\mid X_n,W_n]
&\le C_{T,n}^2C_{U,n}\sum_{1\le i,j\le n}\ \sum_{1\le k,l,K,L\le n}\kappa_{kl,n}\kappa_{KL,n}\hat v_{k,n}^2\hat v_{K,n}^2M_{il,n}M_{jl,n}M_{iL,n}M_{jL,n}\\
&=C_{T,n}^2C_{U,n}\sum_{1\le k,l,K,L\le n}\kappa_{kl,n}\kappa_{KL,n}\hat v_{k,n}^2\hat v_{K,n}^2M_{lL,n}^2\\
&\le C_{T,n}^2C_{U,n}\sum_{1\le k,l,K,L\le n}|\kappa_{kl,n}||\kappa_{KL,n}|\hat v_{k,n}^2\hat v_{K,n}^2M_{lL,n}^2
\end{align*}
and
\begin{align*}
\sum_{1\le k,l,K,L\le n}|\kappa_{kl,n}||\kappa_{KL,n}|\hat v_{k,n}^2\hat v_{K,n}^2M_{lL,n}^2
&\le\Big(\max_{1\le i\le n}\hat v_{i,n}^2\Big)\sum_{1\le k,l,K,L\le n}|\kappa_{kl,n}||\kappa_{KL,n}|\hat v_{K,n}^2M_{lL,n}^2\\
&\le\Big(\max_{1\le i\le n}\hat v_{i,n}^2\Big)\|\kappa_n\|_1\sum_{1\le l,K,L\le n}|\kappa_{KL,n}|\hat v_{K,n}^2M_{lL,n}^2\\
&\le\Big(\max_{1\le i\le n}\hat v_{i,n}^2\Big)\|\kappa_n\|_1\sum_{1\le K,L\le n}|\kappa_{KL,n}|\hat v_{K,n}^2\\
&\le n^2\Big(\frac{\max_{1\le i\le n}|\hat v_{i,n}|}{\sqrt n}\Big)^2\|\kappa_n\|_1^2\Big(\frac{1}{n}\sum_{1\le i\le n}\hat v_{i,n}^2\Big)=o_p(n^2).
\end{align*}
As a consequence,
\[
V\Big[\frac{1}{n}\sum_{1\le i\le N_n}c_{ii,n}\,\Big|\,X_n,W_n\Big]=\frac{1}{n^2}\sum_{1\le i\le N_n}V[c_{ii,n}\mid X_n,W_n]\le\frac{1}{n^2}\sum_{1\le i,j\le N_n}V[c_{ij,n}\mid X_n,W_n]=o_p(1)
\]
and
\[
V\Big[\frac{1}{n}\sum_{1\le i,j\le N_n,\,i<j}c_{ij,n}\,\Big|\,X_n,W_n\Big]=\frac{1}{n^2}\sum_{1\le i,j\le N_n,\,i<j}V[c_{ij,n}\mid X_n,W_n]\le\frac{1}{n^2}\sum_{1\le i,j\le N_n}V[c_{ij,n}\mid X_n,W_n]=o_p(1).
\]
In particular, $\tilde\Sigma_n(\kappa_n)=E[\tilde\Sigma_n(\kappa_n)\mid X_n,W_n]+o_p(1)$, where
\[
\big|E[\tilde\Sigma_n(\kappa_n)\mid X_n,W_n]\big|\le\frac{1}{n}\sum_{1\le i,j\le n}|\kappa_{ij,n}|\hat v_{j,n}^2E[\tilde U_{i,n}^2\mid X_n,W_n]\le C_{U,n}\frac{1}{n}\sum_{1\le i,j\le n}|\kappa_{ij,n}|\hat v_{j,n}^2\le C_{U,n}\|\kappa_n\|_1\Big(\frac{1}{n}\sum_{1\le i\le n}\hat v_{i,n}^2\Big)=O_p(1).
\]
To complete the proof it suffices to show that
\[
\hat\Sigma_n(\kappa_n)-\tilde\Sigma_n(\kappa_n)=\frac{1}{n}\sum_{1\le i,j\le n}\kappa_{ij,n}\hat v_{j,n}^2\big([\tilde U_{i,n}-\tilde V_{i,n}(\hat\beta_n-\beta_n)+\tilde R_{i,n}]^2-\tilde U_{i,n}^2\big)=o_p(1).
\]
To do so, it suffices (by the Cauchy-Schwarz inequality and using $\hat v_{j,n}=\tilde V_{j,n}+\tilde Q_{j,n}$, $\tilde R_{i,n}=\tilde r_{i,n}+(\tilde R_{i,n}-\tilde r_{i,n})$, and $\tilde\Sigma_n(\kappa_n)=O_p(1)$) to show that
\[
\frac{1}{n}\sum_{1\le i,j\le n}|\kappa_{ij,n}|\tilde V_{j,n}^2\tilde r_{i,n}^2=o_p(1),\qquad
\frac{1}{n}\sum_{1\le i,j\le n}|\kappa_{ij,n}|\tilde Q_{j,n}^2\tilde R_{i,n}^2=o_p(1),
\]
\[
\frac{1}{n}\sum_{1\le i,j\le n}|\kappa_{ij,n}|\tilde V_{j,n}^2|\tilde R_{i,n}-\tilde r_{i,n}|^2=o_p(1),\qquad
(\hat\beta_n-\beta_n)^2\frac{1}{n}\sum_{1\le i,j\le n}|\kappa_{ij,n}|\hat v_{j,n}^2\tilde V_{i,n}^2=o_p(1).
\]
First, $n^{-1}\sum_{1\le i,j\le n}|\kappa_{ij,n}|\tilde V_{j,n}^2\tilde r_{i,n}^2=o_p(1)$ because
\[
E\Big[\frac{1}{n}\sum_{1\le i,j\le n}|\kappa_{ij,n}|\tilde V_{j,n}^2\tilde r_{i,n}^2\,\Big|\,W_n\Big]=\frac{1}{n}\sum_{1\le i\le n}\tilde r_{i,n}^2\sum_{1\le j\le n}|\kappa_{ij,n}|E[\tilde V_{j,n}^2\mid W_n]\le C_{T,n}C_{V,n}\|\kappa_n\|_1\frac{1}{n}\sum_{1\le i\le n}\tilde r_{i,n}^2=O_p(\varrho_n)=o_p(1),
\]
where the inequality uses
\begin{align*}
E[\tilde V_{i,n}^2\mid W_n]&=E\Big[\Big(\sum_{1\le j\le n}M_{ij,n}V_{j,n}\Big)^2\Big|W_n\Big]=E\Big[\Big(\sum_{1\le j\le N_n}\sum_{s\in T_{j,n}}M_{is,n}V_{s,n}\Big)^2\Big|W_n\Big]\\
&=\sum_{1\le j\le N_n}E\Big[\Big(\sum_{s\in T_{j,n}}M_{is,n}V_{s,n}\Big)^2\Big|W_n\Big]\le\sum_{1\le j\le N_n}(\#T_{j,n})\sum_{s\in T_{j,n}}M_{is,n}^2E[V_{s,n}^2\mid W_n]\\
&\le C_{T,n}C_{V,n}\sum_{1\le j\le N_n}\sum_{s\in T_{j,n}}M_{is,n}^2=C_{T,n}C_{V,n}M_{ii,n}\le C_{T,n}C_{V,n}.
\end{align*}
Next,
\[
\frac{1}{n}\sum_{1\le i,j\le n}|\kappa_{ij,n}|\tilde V_{j,n}^2|\tilde R_{i,n}-\tilde r_{i,n}|^2\le nC_{\kappa,n}\Big(\frac{1}{n}\sum_{1\le i\le n}\tilde V_{i,n}^2\Big)\Big(\frac{1}{n}\sum_{1\le i\le n}|\tilde R_{i,n}-\tilde r_{i,n}|^2\Big)=O_p(n\varrho_n\chi_n)=o_p(1)
\]
and
\[
\frac{1}{n}\sum_{1\le i,j\le n}|\kappa_{ij,n}|\tilde Q_{j,n}^2\tilde R_{i,n}^2\le n\|\kappa_n\|_1\Big(\frac{1}{n}\sum_{1\le i\le n}\tilde Q_{i,n}^2\Big)\Big(\frac{1}{n}\sum_{1\le i\le n}\tilde R_{i,n}^2\Big)=O_p(n\chi_n\varrho_n)=o_p(1).
\]
Finally,
\[
(\hat\beta_n-\beta_n)^2\frac{1}{n}\sum_{1\le i,j\le n}|\kappa_{ij,n}|\hat v_{j,n}^2\tilde V_{i,n}^2=o_p(1)
\]
because $\sqrt n(\hat\beta_n-\beta_n)=O_p(1)$ and
\[
\frac{1}{n^2}\sum_{1\le i,j\le n}|\kappa_{ij,n}|\hat v_{j,n}^2\tilde V_{i,n}^2\le\Big(\max_{1\le i\le n}\hat v_{i,n}^2\Big)\frac{1}{n^2}\sum_{1\le i,j\le n}|\kappa_{ij,n}|\tilde V_{i,n}^2\le\Big(\frac{\max_{1\le i\le n}|\hat v_{i,n}|}{\sqrt n}\Big)^2\|\kappa_n\|_1\Big(\frac{1}{n}\sum_{1\le i\le n}\tilde V_{i,n}^2\Big)=o_p(1).
\]
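The key simplification above is the identity $\sum_{1\le i,j\le n}M_{il,n}M_{jl,n}M_{iL,n}M_{jL,n}=M_{lL,n}^2$, which follows from symmetry and idempotency of $M_n$. A minimal numerical sketch (ours; same assumed setup as in the earlier sketches):

```python
import numpy as np

# Sketch (ours): checks sum_{i,j} M_il M_jl M_iL M_jL = (M_lL)^2 for a symmetric
# idempotent annihilator M, the identity used to collapse the fourfold sum above.
rng = np.random.default_rng(2)
n, K = 60, 15
W = rng.normal(size=(n, K))
M = np.eye(n) - W @ np.linalg.solve(W.T @ W, W.T)
l, L = 3, 7
lhs = sum(M[i, l] * M[j, l] * M[i, L] * M[j, L] for i in range(n) for j in range(n))
assert np.isclose(lhs, M[l, L] ** 2)
print(lhs, M[l, L] ** 2)
```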
A.3.5 Proof of Lemma A-5

Because $E[\tilde U_{j,n}^2\mid X_n,W_n]=\sum_{1\le i\le n}M_{ij,n}^2E[U_{i,n}^2\mid X_n,W_n]$,
\[
E[\tilde\sigma_n^2\mid X_n,W_n]=\frac{1}{n-K_n}\sum_{1\le i,j\le n}M_{ij,n}^2E[U_{i,n}^2\mid X_n,W_n]=\frac{1}{n-K_n}\sum_{1\le i\le n}M_{ii,n}E[U_{i,n}^2\mid X_n,W_n],
\]
so if $E[U_{i,n}^2\mid X_n,W_n]=\sigma_n^2$, then
\[
E[\tilde\sigma_n^2\mid X_n,W_n]=\sigma_n^2\,\frac{\sum_{1\le i\le n}M_{ii,n}}{n-K_n}=\sigma_n^2
\]
and
\[
\Omega_n=\frac{1}{n}\sum_{1\le i\le n}\hat v_{i,n}^2E[U_{i,n}^2\mid X_n,W_n]=\sigma_n^2\omega_n,\qquad\omega_n=\frac{1}{n}\sum_{1\le i\le n}\hat v_{i,n}^2.
\]
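The exactness of the middle display rests on $\operatorname{tr}(M_n)=n-K_n$; a one-line numerical sketch (ours, with illustrative values):

```python
import numpy as np

# Minimal sketch (ours, illustrative values): under homoskedasticity the display
# is exact because trace(M) = n - K, so sigma^2 * sum_i M_ii / (n - K) = sigma^2.
rng = np.random.default_rng(3)
n, K, sigma2 = 120, 40, 1.7
W = rng.normal(size=(n, K))
M = np.eye(n) - W @ np.linalg.solve(W.T @ W, W.T)
print(sigma2 * np.sum(np.diag(M)) / (n - K), sigma2)  # identical up to rounding
```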
A.3.6 Proof of Lemma A-6

Defining $d_{ij,n}=\sum_{1\le k\le n}\kappa_{ik,n}M_{jk,n}^2-\mathbf{1}(i=j)$, we have
\[
E[\tilde\Sigma_n(\kappa_n)\mid X_n,W_n]-\Omega_n=\frac{1}{n}\sum_{1\le i,j\le n}d_{ij,n}\hat v_{i,n}^2E[U_{j,n}^2\mid X_n,W_n],
\]
so if $\max_{1\le i\le n}\sum_{1\le j\le n}|d_{ij,n}|=o_p(1)$, then
\[
\big|E[\tilde\Sigma_n(\kappa_n)\mid X_n,W_n]-\Omega_n\big|\le\frac{1}{n}\sum_{1\le i,j\le n}|d_{ij,n}|\hat v_{i,n}^2E[U_{j,n}^2\mid X_n,W_n]\le C_{U,n}\frac{1}{n}\sum_{1\le i,j\le n}|d_{ij,n}|\hat v_{i,n}^2\le C_{U,n}\Big(\frac{1}{n}\sum_{1\le i\le n}\hat v_{i,n}^2\Big)\Big(\max_{1\le i\le n}\sum_{1\le j\le n}|d_{ij,n}|\Big)=o_p(1).
\]
This establishes part (a).

Next, if $\lambda_{\min}(\sum_{k=1}^nw_{k,n}w_{k,n}')>0$ and if $\kappa_{ij,n}=\mathbf{1}\{i=j\}\lambda_{i,n}M_{ii,n}^{-\zeta_{i,n}}$ (with $0\le\zeta_{i,n}\le4$ and $\lambda_{i,n}\ge0$), then
\begin{align*}
\sum_{1\le j\le n}|d_{ij,n}|&=\big|\lambda_{i,n}M_{ii,n}^{2-\zeta_{i,n}}-1\big|+\lambda_{i,n}M_{ii,n}^{-\zeta_{i,n}}\sum_{1\le j\le n,\,j\ne i}M_{ij,n}^2
=\big|\lambda_{i,n}M_{ii,n}^{2-\zeta_{i,n}}-1\big|+\lambda_{i,n}M_{ii,n}^{1-\zeta_{i,n}}(1-M_{ii,n})\\
&\le|\lambda_{i,n}-1|+\lambda_{i,n}\big|M_{ii,n}^{2-\zeta_{i,n}}-1\big|+\lambda_{i,n}M_{ii,n}^{1-\zeta_{i,n}}(1-M_{ii,n})\\
&\le|\lambda_{i,n}-1|+\lambda_{i,n}M_{ii,n}^{-3}\big[(1+M_{ii,n})M_{ii,n}+1\big](1-M_{ii,n}).
\end{align*}
Part (b) follows from this inequality and the fact that $P[\lambda_{\min}(\sum_{k=1}^nw_{k,n}w_{k,n}')>0]\to1$.

Finally, if $\bar M_n<1/2$, then
\[
\Big|\sum_{1\le k\le n}\kappa_{ik,n}^{HC}M_{ik,n}^2-1\Big|+\sum_{1\le j\le n,\,j\ne i}\Big|\sum_{1\le k\le n}\kappa_{ik,n}^{HC}M_{jk,n}^2\Big|=0
\]
and, by Theorem 1 of Varah (1975),
\[
\|\kappa_n^{HC}\|_1\le\frac{1}{1/2-\bar M_n}.
\]
Part (c) follows from these displays and the fact that $P[\bar M_n<1/2]\to1$.
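Our reading of part (c) is that the proposed weights solve the linear system $\sum_k\kappa_{ik,n}M_{jk,n}^2=\mathbf{1}(i=j)$, so that $d_{ij,n}=0$ identically, with Varah's (1975) bound controlling $\|\kappa_n^{HC}\|_1$ under diagonal dominance. The following sketch (ours; the construction of $M_n$ and all parameter values are illustrative) checks both claims numerically:

```python
import numpy as np

# Sketch (ours): with A_{jk} = M_{jk}^2, the weights kappa = A^{-1} satisfy
# sum_k kappa_{ik} M_{jk}^2 = 1(i = j). If Mbar = max_i (1 - M_ii) < 1/2, each row
# of A is strictly diagonally dominant (A_jj - sum_{k!=j} A_jk = 2M_jj^2 - M_jj > 0)
# and Theorem 1 of Varah (1975) yields ||A^{-1}||_inf <= 1/(1/2 - Mbar).
rng = np.random.default_rng(4)
n, K = 200, 5                        # few regressors keep leverages small
W = rng.normal(size=(n, K))
M = np.eye(n) - W @ np.linalg.solve(W.T @ W, W.T)
Mbar = np.max(1.0 - np.diag(M))
assert Mbar < 0.5                    # diagonal-dominance condition
A = M ** 2                           # elementwise squares M_{jk}^2
kappa = np.linalg.inv(A)
assert np.allclose(kappa @ A, np.eye(n), atol=1e-6)
norm_inf = np.max(np.sum(np.abs(kappa), axis=1))
print(norm_inf, 1.0 / (0.5 - Mbar))
assert norm_inf <= 1.0 / (0.5 - Mbar) + 1e-8   # Varah bound holds
```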
A.3.7 Proof of Lemma A-7

Because $\hat v_{i,n}=\tilde V_{i,n}+\tilde Q_{i,n}$, we have
\[
\frac{\max_{1\le i\le n}|\hat v_{i,n}|}{\sqrt n}\le\frac{\max_{1\le i\le n}|\tilde V_{i,n}|}{\sqrt n}+\frac{\max_{1\le i\le n}|\tilde Q_{i,n}|}{\sqrt n}=\frac{\max_{1\le i\le n}|\tilde Q_{i,n}|}{\sqrt n}+o_p(1),
\]
the equality using the fact that
\[
P\Big[\frac{\max_{1\le i\le n}|\tilde V_{i,n}|}{\sqrt n}>\varepsilon\,\Big|\,W_n\Big]\le\min\Big(\sum_{1\le i\le n}P[|\tilde V_{i,n}|>\varepsilon\sqrt n\mid W_n],\,1\Big)\le\min\Big(\frac{1}{\varepsilon^4n^2}\sum_{1\le i\le n}E[\tilde V_{i,n}^4\mid W_n],\,1\Big)
\]
and
\begin{align*}
E[\tilde V_{i,n}^4\mid W_n]&=E\Big[\Big(\sum_{1\le j\le n}M_{ij,n}V_{j,n}\Big)^4\Big|W_n\Big]=E\Big[\Big(\sum_{1\le j\le N_n}\sum_{s\in T_{j,n}}M_{is,n}V_{s,n}\Big)^4\Big|W_n\Big]\\
&=\sum_{1\le j\le N_n}E\Big[\Big(\sum_{s\in T_{j,n}}M_{is,n}V_{s,n}\Big)^4\Big|W_n\Big]+3\sum_{1\le j,k\le N_n,\,k\ne j}E\Big[\Big(\sum_{s\in T_{j,n}}M_{is,n}V_{s,n}\Big)^2\Big(\sum_{t\in T_{k,n}}M_{it,n}V_{t,n}\Big)^2\Big|W_n\Big]\\
&\le\sum_{1\le j\le N_n}(\#T_{j,n})^3\sum_{s\in T_{j,n}}M_{is,n}^4E[V_{s,n}^4\mid W_n]+3\sum_{1\le j,k\le N_n,\,k\ne j}(\#T_{j,n})(\#T_{k,n})\sum_{s\in T_{j,n},\,t\in T_{k,n}}M_{is,n}^2M_{it,n}^2E[V_{s,n}^2V_{t,n}^2\mid W_n]\\
&\le3C_{T,n}^3C_{V,n}\sum_{1\le j,k\le n}M_{ij,n}^2M_{ik,n}^2=3C_{T,n}^3C_{V,n}M_{ii,n}^2\le3C_{T,n}^3C_{V,n}=O_p(1).
\end{align*}
It therefore suffices to show that $\max_{1\le i\le n}|\tilde Q_{i,n}|/\sqrt n=o_p(1)$.

Defining $M_{ij,n}^{\perp}=\mathbf{1}(i=j)-M_{ij,n}$, we have
\[
\frac{\max_{1\le i\le n}|\tilde Q_{i,n}|}{\sqrt n}\le\frac{\max_{1\le i\le n}|Q_{i,n}|}{\sqrt n}+\frac{1}{\sqrt n}\max_{1\le i\le n}\Big|\sum_{1\le j\le n}M_{ij,n}^{\perp}Q_{j,n}\Big|=\frac{1}{\sqrt n}\max_{1\le i\le n}\Big|\sum_{1\le j\le n}M_{ij,n}^{\perp}Q_{j,n}\Big|+o_p(1),
\]
where the last equality uses the fact that $n^{-1}\sum_{1\le i\le n}E[|Q_{i,n}|^{2+\delta}]=o(n^{\delta/2})$ if $\delta>0$ or if $\chi_n=o(1)$. It therefore suffices to show that $\max_{1\le i\le n}|\sum_{1\le j\le n}M_{ij,n}^{\perp}Q_{j,n}|/\sqrt n=o_p(1)$.

In cases (i) and (ii) the desired conclusion follows from $\sum_{1\le j\le n}(M_{ij,n}^{\perp})^2=M_{ii,n}^{\perp}\le\bar M_n$ because, by the Cauchy-Schwarz inequality,
\[
\Big(\frac{\max_{1\le i\le n}|\sum_{1\le j\le n}M_{ij,n}^{\perp}Q_{j,n}|}{\sqrt n}\Big)^2\le\Big(\max_{1\le i\le n}M_{ii,n}^{\perp}\Big)\Big(\frac{1}{n}\sum_{1\le i\le n}Q_{i,n}^2\Big)\le\bar M_n\Big(\frac{1}{n}\sum_{1\le i\le n}Q_{i,n}^2\Big)=\bar M_nO_p(\chi_n).
\]
Finally, in case (iii) the desired conclusion follows from $n^{-1}\sum_{1\le i\le n}E[|Q_{i,n}|^{2+\delta}]=O(1)$ because, by the Hölder inequality,
\[
\frac{1}{\sqrt n}\Big|\sum_{1\le j\le n}M_{ij,n}^{\perp}Q_{j,n}\Big|\le\sqrt[(2+\delta)/(1+\delta)]{\frac{1}{n^{\delta/(2\delta+2)}}\sum_{1\le j\le n}|M_{ij,n}^{\perp}|^{(2+\delta)/(1+\delta)}}\ \sqrt[2+\delta]{\frac{1}{n}\sum_{1\le j\le n}|Q_{j,n}|^{2+\delta}}\le\sqrt[(2+\delta)/(1+\delta)]{\frac{1}{n^{\delta/(2\delta+2)}}\Big[1+\sum_{1\le j\le n}\mathbf{1}(M_{ij,n}\ne0)\Big]}\,O_p(1).
\]
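The Cauchy-Schwarz step in cases (i) and (ii) relies on $\sum_j(M_{ij,n}^{\perp})^2=M_{ii,n}^{\perp}$, i.e., on $M_n^{\perp}=I_n-M_n$ being symmetric and idempotent. A minimal numerical sketch (ours, same assumed setup as in the earlier sketches):

```python
import numpy as np

# Sketch (ours): checks sum_j (Mperp_{ij})^2 = Mperp_{ii}, where Mperp = I - M
# is the (symmetric, idempotent) projection onto the column space of W.
rng = np.random.default_rng(5)
n, K = 80, 20
W = rng.normal(size=(n, K))
M = np.eye(n) - W @ np.linalg.solve(W.T @ W, W.T)
Mperp = np.eye(n) - M
assert np.allclose(np.sum(Mperp ** 2, axis=1), np.diag(Mperp))
print(np.sum(Mperp ** 2, axis=1)[:3], np.diag(Mperp)[:3])
```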
References

Abadie, A., G. W. Imbens, and F. Zheng (2014): "Inference for Misspecified Models With Fixed Regressors," Journal of the American Statistical Association, 109(508), 1601-1614.

Angrist, J., and J. Hahn (2004): "When to Control for Covariates? Panel Asymptotics for Estimates of Treatment Effects," Review of Economics and Statistics, 86(1), 58-72.

Barnes, E. R., and A. J. Hoffman (1981): "On Bounds for Eigenvalues of Real Symmetric Matrices," Linear Algebra and its Applications, 40, 217-223.

Belloni, A., V. Chernozhukov, D. Chetverikov, and K. Kato (2015): "On the Asymptotic Theory for Least Squares Series: Pointwise and Uniform Results," Journal of Econometrics, 186(2), 345-366.

Belloni, A., V. Chernozhukov, and C. Hansen (2014a): "High-Dimensional Methods and Inference on Structural and Treatment Effects," Journal of Economic Perspectives, 28(2), 29-50.

Belloni, A., V. Chernozhukov, and C. Hansen (2014b): "Inference on Treatment Effects after Selection among High-Dimensional Controls," Review of Economic Studies, 81(2), 608-650.

Belloni, A., V. Chernozhukov, C. Hansen, and I. Fernandez-Val (2014): "Program Evaluation with High-Dimensional Data," Working paper, MIT.

Bera, A. K., T. Suprayitno, and G. Premaratne (2002): "On Some Heteroskedasticity-Robust Estimators of Variance-Covariance Matrix of the Least-Squares Estimators," Journal of Statistical Planning and Inference, 108, 121-136.

Bickel, P. J., and D. A. Freedman (1983): "Bootstrapping Regression Models with Many Parameters," in A Festschrift for Erich L. Lehmann, ed. by P. Bickel, K. Doksum, and J. Hodges. Chapman and Hall.

Cattaneo, M. D., and M. H. Farrell (2013): "Optimal Convergence Rates, Bahadur Representation, and Asymptotic Normality of Partitioning Estimators," Journal of Econometrics, 174(2), 127-143.

Cattaneo, M. D., M. Jansson, and W. K. Newey (2015): "Alternative Asymptotics and the Partially Linear Model with Many Regressors," Econometric Theory, forthcoming.

Chen, X. (2007): "Large Sample Sieve Estimation of Semi-Nonparametric Models," in Handbook of Econometrics, Volume VI, ed. by J. J. Heckman and E. E. Leamer, pp. 5549-5632. Elsevier Science B.V.

Chen, X., and T. M. Christensen (2015): "Optimal Uniform Convergence Rates and Asymptotic Normality for Series Estimators Under Weak Dependence and Weak Conditions," Journal of Econometrics, 188(2), 447-465.

Chesher, A. (1989): "Hájek Inequalities, Measures of Leverage and the Size of Heteroskedasticity Robust Wald Tests," Econometrica, 57(4), 971-977.

Chesher, A., and I. Jewitt (1987): "The Bias of a Heteroskedasticity Consistent Covariance Matrix Estimator," Econometrica, 55(5), 1217-1222.

Cribari-Neto, F., and M. da Gloria A. Lima (2011): "A Sequence of Improved Standard Errors under Heteroskedasticity of Unknown Form," Journal of Statistical Planning and Inference, 141(11), 3617-3627.

Cribari-Neto, F., S. L. P. Ferrari, and G. M. Cordeiro (2000): "Improved Heteroscedasticity-Consistent Covariance Matrix Estimators," Biometrika, 87(4), 907-918.

Donald, S. G., and W. K. Newey (1994): "Series Estimation of Semilinear Models," Journal of Multivariate Analysis, 50(1), 30-40.

El Karoui, N., D. Bean, P. J. Bickel, C. Lim, and B. Yu (2013): "On Robust Regression with High-Dimensional Predictors," Proceedings of the National Academy of Sciences, 110(36), 14557-14562.

Farrell, M. H. (2015): "Robust Inference on Average Treatment Effects with Possibly More Covariates than Observations," Journal of Econometrics, 189(1), 1-23.

Freedman, D. A. (1981): "Bootstrapping Regression Models," Annals of Statistics, 9(6), 1218-1228.

Gonçalves, S., and H. White (2005): "Bootstrap Standard Error Estimates for Linear Regression," Journal of the American Statistical Association, 100(471), 970-979.

Huber, P. J. (1973): "Robust Regression: Asymptotics, Conjectures, and Monte Carlo," Annals of Statistics, 1(5), 799-821.

Kline, P., and A. Santos (2012): "Higher Order Properties of the Wild Bootstrap under Misspecification," Journal of Econometrics, 171(1), 54-70.

Koenker, R. (1988): "Asymptotic Theory and Econometric Practice," Journal of Applied Econometrics, 3(2), 139-147.

MacKinnon, J., and H. White (1985): "Some Heteroskedasticity-Consistent Covariance Matrix Estimators with Improved Finite Sample Properties," Journal of Econometrics, 29, 305-325.

MacKinnon, J. G. (2012): "Thirty Years of Heteroskedasticity-Robust Inference," in Recent Advances and Future Directions in Causality, Prediction, and Specification Analysis, ed. by X. Chen and N. R. Swanson. Springer.

Mammen, E. (1993): "Bootstrap and Wild Bootstrap for High Dimensional Linear Models," Annals of Statistics, 21(1), 255-285.

Müller, U. K. (2013): "Risk of Bayesian Inference in Misspecified Models, and the Sandwich Covariance Matrix," Econometrica, 81(5), 1805-1849.

Newey, W. K. (1997): "Convergence Rates and Asymptotic Normality for Series Estimators," Journal of Econometrics, 79, 147-168.

Shao, J., and C. F. J. Wu (1987): "Heteroscedasticity-Robustness of Jackknife Variance Estimators in Linear Models," Annals of Statistics, 15(4), 1563-1579.

Stock, J. H., and M. W. Watson (2008): "Heteroskedasticity-Robust Standard Errors for Fixed Effects Panel Data Regression," Econometrica, 76(1), 155-174.

Varah, J. M. (1975): "A Lower Bound for the Smallest Singular Value of a Matrix," Linear Algebra and its Applications, 11(1), 3-5.

White, H. (1980): "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity," Econometrica, 48(4), 817-838.

Wu, C. F. J. (1986): "Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis," Annals of Statistics, 14(4), 1261-1295.
Table 1: Empirical Coverage of 95% Confidence Intervals.

(a) Model 1: Gaussian $w_{i,n}$ Regressors

 ϑ   Kn/n    HO0     HO1     HC0     HC1     HC2     HC3     HC4     HCK
 0   0.1     0.939   0.952   0.940   0.950   0.950   0.963   0.977   0.950
 0   0.2     0.922   0.952   0.920   0.952   0.951   0.969   0.993   0.950
 0   0.3     0.899   0.948   0.897   0.949   0.949   0.981   0.987   0.947
 0   0.4     0.867   0.952   0.866   0.951   0.951   0.987   0.974   0.950
 1   0.1     0.421   0.442   0.899   0.917   0.918   0.932   0.962   0.929
 1   0.2     0.436   0.479   0.852   0.896   0.896   0.934   0.976   0.929
 1   0.3     0.446   0.516   0.809   0.878   0.881   0.941   0.955   0.928
 1   0.4     0.442   0.554   0.742   0.858   0.858   0.937   0.902   0.922

(b) Model 2: Uniform $w_{i,n}$ Regressors

 ϑ   Kn/n    HO0     HO1     HC0     HC1     HC2     HC3     HC4     HCK
 0   0.1     0.937   0.950   0.938   0.950   0.950   0.962   0.980   0.950
 0   0.2     0.930   0.959   0.929   0.960   0.960   0.975   0.993   0.959
 0   0.3     0.905   0.955   0.904   0.953   0.952   0.982   0.988   0.953
 0   0.4     0.872   0.951   0.870   0.952   0.951   0.989   0.973   0.950
 1   0.1     0.387   0.406   0.901   0.922   0.922   0.939   0.964   0.936
 1   0.2     0.420   0.463   0.862   0.905   0.906   0.939   0.981   0.932
 1   0.3     0.427   0.499   0.807   0.886   0.885   0.943   0.959   0.931
 1   0.4     0.419   0.521   0.736   0.853   0.853   0.942   0.908   0.927

(c) Model 3: Discrete $w_{i,n}$ Regressors

 ϑ   Kn/n    HO0     HO1     HC0     HC1     HC2     HC3     HC4     HCK
 0   0.1     0.936   0.949   0.935   0.948   0.947   0.960   0.976   0.946
 0   0.2     0.919   0.948   0.920   0.948   0.947   0.967   0.993   0.946
 0   0.3     0.896   0.945   0.897   0.947   0.946   0.978   0.982   0.945
 0   0.4     0.868   0.945   0.871   0.947   0.943   0.988   0.971   0.943
 1   0.1     0.346   0.366   0.834   0.861   0.900   0.949   0.991   0.942
 1   0.2     0.516   0.569   0.802   0.856   0.893   0.957   0.992   0.940
 1   0.3     0.616   0.703   0.777   0.858   0.892   0.970   0.964   0.943
 1   0.4     0.670   0.790   0.751   0.867   0.899   0.982   0.927   0.950
Notes:
(i) ϑ = 0 and ϑ = 1 correspond to homoskedastic and heteroskedastic models, respectively.
(ii) HO0 and HO1 employ homoskedasticity-consistent standard errors without and with degrees-of-freedom correction, respectively.
(iii) HC0-HC4 employ HCk heteroskedasticity-consistent standard errors; see footnote ?? for details.
(iv) HCK employs our proposed standard errors formula, denoted by $\hat\Omega^{HC}_n$.
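For concreteness, the sketch below (ours; it uses the generic textbook HCk formulas and an illustrative design, not the paper's exact simulation models, and it does not reproduce the HC4 or HCK constructions) computes the competing standard errors on a single simulated data set:

```python
import numpy as np

# Minimal sketch (ours) of the standard-error variants compared in Table 1,
# using the generic textbook HO/HCk formulas on an illustrative design.
rng = np.random.default_rng(6)
n, K = 200, 40
X = np.column_stack([rng.normal(size=n), rng.normal(size=(n, K))])   # treatment + controls
y = 1.0 * X[:, 0] + rng.normal(size=n) * (1 + 0.5 * np.abs(X[:, 0]))  # heteroskedastic errors
XtX_inv = np.linalg.inv(X.T @ X)
u = y - X @ (XtX_inv @ (X.T @ y))             # OLS residuals
h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)   # leverages H_ii
k = X.shape[1]
meat = lambda w: XtX_inv @ X.T @ np.diag(w) @ X @ XtX_inv
variants = {
    'HO0': np.sum(u**2) / n * XtX_inv,        # homoskedastic, no dof correction
    'HO1': np.sum(u**2) / (n - k) * XtX_inv,  # homoskedastic, dof-corrected
    'HC0': meat(u**2),
    'HC1': n / (n - k) * meat(u**2),
    'HC2': meat(u**2 / (1 - h)),
    'HC3': meat(u**2 / (1 - h)**2),
}
for name, V in variants.items():
    print(name, np.sqrt(V[0, 0]))             # std. error of the first coefficient
```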