Research Division
Federal Reserve Bank of St. Louis
Working Paper Series
Tests of Equal Accuracy for Nested Models with Estimated Factors
Sílvia Gonçalves
Michael W. McCracken
and
Benoit Perron
Working Paper 2015-025A
http://research.stlouisfed.org/wp/2015/2015-025.pdf
September 2015
FEDERAL RESERVE BANK OF ST. LOUIS
Research Division
P.O. Box 442
St. Louis, MO 63166
______________________________________________________________________________________
The views expressed are those of the individual authors and do not necessarily reflect official positions of
the Federal Reserve Bank of St. Louis, the Federal Reserve System, or the Board of Governors.
Federal Reserve Bank of St. Louis Working Papers are preliminary materials circulated to stimulate
discussion and critical comment. References in publications to Federal Reserve Bank of St. Louis Working
Papers (other than an acknowledgment that the writer has had access to unpublished material) should be
cleared with the author or authors.
Tests of Equal Accuracy for Nested Models with Estimated Factors
Sílvia Gonçalves*, Michael W. McCracken† and Benoit Perron‡
September 14, 2015
Abstract
In this paper we develop asymptotics for tests of equal predictive ability between nested models
when factor-augmented regression models are used to forecast. We provide conditions under which
the estimation of the factors does not affect the asymptotic distributions developed in Clark and
McCracken (2001) and McCracken (2007). This enables researchers to use the existing tabulated
critical values when conducting inference. As an intermediate result, we derive the asymptotic
properties of the principal components estimator over recursive windows. We provide simulation
evidence on the finite sample effects of factor estimation and apply the tests to the case of forecasting
excess returns to the S&P 500 Composite Index.
JEL Nos.: C12, C32, C38, C52
Keywords: factor model, out-of-sample forecasts, recursive estimation
*Western University, Economics Department, SSC 4058, 1151 Richmond Street N., London, Ontario, Canada, N6A 5C2. Email: sgoncal9@uwo.ca.
†Federal Reserve Bank of St. Louis. Address: P.O. Box 442, St. Louis, MO 63166, U.S.A. Tel: 314-444-8594. Email: michael.w.mccracken@stls.frb.org.
‡Département de sciences économiques, CIREQ and CIRANO, Université de Montréal. Address: C.P. 6128, succ. Centre-Ville, Montréal, QC, H3C 3J7, Canada. Tel: (514) 343-2126. Email: benoit.perron@umontreal.ca. The views expressed here are those of the individual authors and do not necessarily reflect official positions of the Federal Reserve Bank of St. Louis, the Federal Reserve System, or the Board of Governors.
1 Introduction
Factors are now a common, parsimonious approach to incorporating many potential predictors into a predictive regression model. Examples that estimate factors using a large collection of macroeconomic variables include works by Stock and Watson (2002), who use factors to predict U.S. inflation and industrial production; Ludvigson and Ng (2009), who use factors to predict bond risk premia; Hofmann (2009), who uses factors to predict euro-area inflation; Shintani (2005), who uses factors to predict Japanese inflation; and Ciccarelli and Mojon (2010), who use factors to predict global inflation. Examples that estimate factors using a large collection of financial variables include works by Stock and Watson (2003), who use factors to predict a variety of macroeconomic variables; Kelly and Pruitt (2013), who use factors to predict stock returns; and Andreou, Ghysels, and Kourtellos (2013), who use factors to predict U.S. real GDP growth.
In many of these examples, the predictive content of the factors is evaluated using pseudo out-of-sample methods. Specifically, the usefulness of the factor is evaluated by comparing the mean
squared forecast error (MSE) of a model that includes factors with one that does not include factors.
Typically, the comparison is between a baseline autoregressive model and a comparable model that
has been augmented with estimated factors. As such, the models being compared are nested under
the null hypothesis of equal accuracy and the theoretical results in West (1996), based on non-nested
comparisons, may not apply. McCracken (2007) and Clark and McCracken (2001, 2005) show that,
for forecasts from nested models, the distributions of tests for equal forecast accuracy and encompassing are usually not asymptotically normal or chi-square. For one-step-ahead forecasts, tables are
provided in Clark and McCracken (2001) and McCracken (2007) and improved upon by Hansen and
Timmermann (2015). For longer horizons, a bootstrap approach is developed in Clark and McCracken
(2012).
The existing literature on out-of-sample methods for nested models is largely silent on situations
where generated regressors – and, in particular, factors – are used for forecasting.¹ In most cases, the
asymptotic distribution of the test of predictive ability is derived assuming that the predictors are
observed and do not themselves need to be estimated prior to forecasting. In the context of in-sample
methods, generated regressors have been shown to sometimes affect the asymptotic distribution of a
statistic, typically through the standard errors. Examples include works by Pagan (1984, 1986) and
Randles (1982). Of particular interest here, Gonçalves and Perron (2014) prove that in some instances
an asymptotic bias is also introduced when factors are estimated rather than known.
¹ Li and Patton (2013) establish the first-order validity of the asymptotics in Clark and McCracken (2001, 2005) when the dependent variable, not the regressors, is generated. Fosten (2015) considers the non-nested case with estimated factors.
Building on McCracken's (2007) and Clark and McCracken's (2001, 2005) results, we examine the asymptotic and finite sample properties of tests of equal forecast accuracy and encompassing applied to predictions from nested models when generated factors are used as predictors. Specifically, we establish conditions under which the asymptotic distributions of the MSE-F and ENC-F tests of predictive ability continue to hold even when the factors are estimated recursively prior to forecasting. Throughout we focus on one-step-ahead forecasts made by linear regression models estimated by ordinary least squares (OLS).

Because our asymptotics depend on the relative size of the number of cross-sectional units ($N$), the total number of observations ($T$), and the number of predictions ($P$), we provide a collection of Monte Carlo simulations designed to examine the size and power of the tests as we vary these parameters. In most, but not all, of our simulations the presence of estimated factors leads to only minor size distortions, yielding rejection frequencies akin to those found in the existing literature when the predictors are observed rather than estimated. Nevertheless, we find that the estimation of factors reduces power relative to the case where the factors are observed. Finally, to illustrate how the tests perform in practical settings, the paper concludes with a reexamination of the factor-based forecasting of excess stock returns.
The rest of the paper is structured as follows. Section 2 introduces notation and describes the setup. Section 3 provides assumptions and a discussion of these. Section 4 contains our main theoretical results, including some asymptotic results on the estimation of factors over recursive samples which may be of independent interest. Section 5 presents Monte Carlo results on the finite sample size and power of the tests, with an emphasis on the degree of distortions induced by factor estimation error. Section 6 applies the tests to forecasting excess stock returns using a large cross-section of macroeconomic aggregates and financial data. Finally, Section 7 concludes.
Before we proceed, we define some notation. For any matrix $A$, we let $\|A\| = \mathrm{tr}^{1/2}(A'A)$ denote the Frobenius norm and $\|A\|_1 = \lambda_{\max}^{1/2}(A'A)$ the spectral norm. For $R = T - P$, we let $\sup_t$ denote $\sup_{R \leq t \leq T}$.

2 Setup and statistics of interest

2.1 Setup
Consider a benchmark one-step-ahead predictive regression model given by

$$y_{t+1} = w_t'\beta_1 + u_{1,t+1}, \quad t = 1,\ldots,T, \qquad (1)$$

where $w_t$ is a $k_1 \times 1$ vector of predictors for the target variable $y_{t+1}$.
We would like to compare the forecast accuracy of this model with that of an alternative model given by

$$y_{t+1} = w_t'\gamma + f_t'\alpha + u_{2,t+1} = z_t'\beta + u_{2,t+1}, \quad t = 1,\ldots,T, \qquad (2)$$

where $\beta = (\gamma', \alpha')'$ and $z_t = (w_t', f_t')'$ is a $k_2 \times 1$ set of predictors, with $k_2 = k_1 + r$.

In this paper, the $r$ additional predictors $f_t$ are latent and denote a vector of $r$ common factors underlying a panel factor model

$$x_{it} = \lambda_i' f_t + e_{it}, \quad t = 1,\ldots,T; \; i = 1,\ldots,N. \qquad (3)$$
We base our comparison of the two models on their ability to forecast $y$ one step ahead. To do so, we divide the total sample of $T = R + P$ observations into in-sample and out-of-sample portions. The in-sample observations span $1$ to $R$, whereas the out-of-sample observations span $R+1$ through $T$, for a total of $P$ one-step-ahead predictions.

For each forecast origin $t = R,\ldots,T-1$, we generate forecasts of $y_{t+1}$ using the benchmark model (1) by regressing $y_{s+1}$ on $w_s$ for $s = 1,\ldots,t-1$:

$$\hat y_{1,t+1} = w_t'\hat\beta_{1,t}, \quad \text{where } \hat\beta_{1,t} = \left(\sum_{s=1}^{t-1} w_s w_s'\right)^{-1} \sum_{s=1}^{t-1} w_s y_{s+1}$$

is the recursive OLS estimator of $\beta_1$ in (1).
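The recursive (expanding window) scheme above is straightforward to implement. The following minimal numpy sketch (the function name, data layout, and 0-based indexing are our own illustrative choices) produces the $P = T - R$ benchmark forecasts:

```python
import numpy as np

def recursive_ols_forecasts(y, W, R):
    """At each origin t, regress y[s+1] on W[s] for s < t (an expanding window),
    then forecast y[t+1] as W[t] @ beta_t."""
    T = len(y)
    preds = []
    for t in range(R - 1, T - 1):          # origins yielding P = T - R forecasts
        X, z = W[:t], y[1:t + 1]           # pairs (W[s], y[s+1]), s = 0, ..., t-1
        beta_t, *_ = np.linalg.lstsq(X, z, rcond=None)
        preds.append(W[t] @ beta_t)        # one-step-ahead forecast of y[t+1]
    return np.array(preds)
```

Each pass re-estimates the OLS coefficients on the expanded sample at every forecast origin, matching the recursive scheme used throughout the paper.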
To generate forecasts of $y_{t+1}$ using the alternative model (2), we proceed in two steps. First, we estimate the latent factors recursively at each forecast origin $t = R, R+1,\ldots,T-1$ by the method of principal components. More specifically, at each time $t$, we compute an $N \times r$ matrix of factor loadings defined as $\tilde\Lambda_t = (\tilde\lambda_{1,t},\ldots,\tilde\lambda_{N,t})'$ and the corresponding factors collected in the $t \times r$ matrix $\tilde F_t = (\tilde f_{1,t},\ldots,\tilde f_{t,t})'$. Since estimation takes place recursively over $t = R,\ldots,T-1$, we index $\tilde\Lambda_t$ and $\tilde F_t$ by $t$. Thus, $\tilde f_{s,t}$ denotes the $s$th observation on the $r \times 1$ vector of factors estimated using data on $x_{is}$ up to time $t$.

The principal components estimators are defined as

$$\left(\tilde\Lambda_t, \tilde F_t\right) = \arg\min_{\{\lambda_i\},\{f_s\}} \frac{1}{Nt} \sum_{i=1}^{N} \sum_{s=1}^{t} \left(x_{is} - \lambda_i' f_s\right)^2,$$

subject to the normalization condition that $\tilde F_t'\tilde F_t / t = I_r$. Under this normalization, we can show that $\tilde F_t$ is equal to $\sqrt{t}$ times the matrix of $r$ leading eigenvectors of the $t \times t$ matrix $X_t X_t'/(tN)$ and $\tilde\Lambda_t = X_t'\tilde F_t / t$, where we let

$$\underset{t \times N}{X_t} = \begin{pmatrix} x_{1,1} & \cdots & x_{1,N} \\ \vdots & \ddots & \vdots \\ x_{t,1} & \cdots & x_{t,N} \end{pmatrix}, \quad \text{for } t = R, R+1,\ldots,T-1.$$

Thus, $\left(X_t X_t'/(tN)\right)\tilde F_t = \tilde F_t \tilde V_t$, where $\tilde V_t = \mathrm{diag}(\tilde v_{1,t},\ldots,\tilde v_{r,t})$, with $\tilde v_{1,t} \geq \tilde v_{2,t} \geq \cdots \geq \tilde v_{r,t} > 0$, and $\tilde v_{j,t}$ is the $j$th largest eigenvalue of $X_t X_t'/(tN)$. As is well known, the principal components estimator is not consistent for the true latent factors even under the normalization imposed above. As Bai (2003) and Stock and Watson (2002) showed, for given $t$, $\tilde F_t$ is mean square consistent for a rotated version of $F_t$, where the rotation matrix is defined as follows:

$$H_t = \tilde V_t^{-1} \left(\frac{\tilde F_t' F_t}{t}\right) \left(\frac{\Lambda'\Lambda}{N}\right). \qquad (4)$$
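Computationally, the estimator reduces to one eigendecomposition per forecast origin. A minimal numpy sketch (the function name is ours; this is an illustration, not the authors' code):

```python
import numpy as np

def pc_factors(X_t, r):
    """Principal components estimates from a t x N panel X_t: F_tilde equals
    sqrt(t) times the r leading eigenvectors of X_t X_t' / (t N), so that
    F_tilde' F_tilde / t = I_r, and Lambda_tilde = X_t' F_tilde / t."""
    t, N = X_t.shape
    eigval, eigvec = np.linalg.eigh(X_t @ X_t.T / (t * N))  # ascending eigenvalues
    lead = np.argsort(eigval)[::-1][:r]                     # indices of r largest
    F_tilde = np.sqrt(t) * eigvec[:, lead]
    Lambda_tilde = X_t.T @ F_tilde / t
    return F_tilde, Lambda_tilde
```

Recursive estimation then amounts to re-running `pc_factors` on the subpanel containing the first $t$ rows at each forecast origin $t = R,\ldots,T-1$, so that $\tilde f_{s,t}$ uses only data available at time $t$.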
Given the estimated factors $\{\tilde f_{s,t} : s = 1,\ldots,t\}$, we then regress $y_{s+1}$ onto $w_s$ and $\tilde f_{s,t}$ to get forecasts of $y_{t+1}$:

$$\hat y_{2,t+1} = w_t'\hat\gamma_t + \tilde f_{t,t}'\hat\alpha_t = \tilde z_{t,t}'\hat\beta_t, \quad \text{where } \hat\beta_t = \left(\sum_{s=1}^{t-1} \tilde z_{s,t} \tilde z_{s,t}'\right)^{-1} \sum_{s=1}^{t-1} \tilde z_{s,t} y_{s+1},$$

with $\tilde z_{s,t} = (w_s', \tilde f_{s,t}')'$ for $s = 1,\ldots,t-1$, where $t = R,\ldots,T-1$.
2.2 Test statistics

The statistics of interest are given by

$$ENC\text{-}F_{\tilde f} = \frac{\sum_{t=R}^{T-1} \hat u_{1,t+1}\left(\hat u_{1,t+1} - \hat u_{2,t+1}\right)}{\hat\sigma^2} \qquad (5)$$

and

$$MSE\text{-}F_{\tilde f} = \frac{\sum_{t=R}^{T-1} \left(\hat u_{1,t+1}^2 - \hat u_{2,t+1}^2\right)}{\hat\sigma^2}, \qquad (6)$$

with

$$\hat\sigma^2 = P^{-1} \sum_{t=R}^{T-1} \hat u_{2,t+1}^2, \qquad (7)$$

and where $\hat u_{1,t+1} = y_{t+1} - \hat y_{1,t+1}$ and $\hat u_{2,t+1} = y_{t+1} - \hat y_{2,t+1}$ denote the forecast errors of the benchmark and alternative models, respectively.
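Given the two sequences of out-of-sample forecast errors, the statistics in (5)-(7) are simple to compute; a short sketch (the function name is an illustrative choice):

```python
import numpy as np

def enc_f_mse_f(u1, u2):
    """ENC-F and MSE-F from the P benchmark errors u1 and alternative errors u2."""
    sigma2 = np.mean(u2 ** 2)                    # sigma_hat^2 in (7)
    enc_f = np.sum(u1 * (u1 - u2)) / sigma2      # (5)
    mse_f = np.sum(u1 ** 2 - u2 ** 2) / sigma2   # (6)
    return enc_f, mse_f
```

Both statistics are large and positive when the factor-augmented model forecasts well relative to the benchmark, so the tests reject for large right-tail values.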
Clark and McCracken (2001) derive the asymptotic distribution of the $ENC\text{-}F$ statistic under the null hypothesis $H_0 : E\left[u_{1,t+1}\left(u_{1,t+1} - u_{2,t+1}\right)\right] = 0$ when all the predictors are observed – that is, when both $w_t$ and $f_t$ are observed. Similarly, McCracken (2007) derives the asymptotic distribution of the $MSE\text{-}F$ statistic under the null hypothesis $H_0 : E\left(u_{1,t+1}^2 - u_{2,t+1}^2\right) = 0$ when all the predictors are observed. In particular, they derive the asymptotic distributions of the analogs of (5) and (6) when $\hat u_{1,t+1}$ and $\hat u_{2,t+1}$ are replaced with $\ddot u_{1,t+1}$ and $\ddot u_{2,t+1}$, the out-of-sample forecast errors based on $w_t$ and $z_t = (w_t', f_t')'$, respectively. Our main goal is to prove the asymptotic equivalence between the statistics based on $\hat u_{1,t+1}$ and $\hat u_{2,t+1}$ and those based on $\ddot u_{1,t+1}$ and $\ddot u_{2,t+1}$. This result will justify using the usual critical values when testing for equal predictability with estimated factors in the nesting model.

We first establish an algebraic relationship between the out-of-sample test statistics given in (5), (6), and (7) and their analogs based on $\ddot u_{1,t+1}$ and $\ddot u_{2,t+1}$ (which we denote by $ENC\text{-}F_f$, $MSE\text{-}F_f$, and $\ddot\sigma^2$, respectively).
We can write

$$ENC\text{-}F_{\tilde f} = ENC\text{-}F_f + ENC\text{-}F_f\left(\frac{\ddot\sigma^2}{\hat\sigma^2} - 1\right) + \frac{1}{\hat\sigma^2} \sum_{t=R}^{T-1} \ddot u_{1,t+1}\left(\ddot u_{2,t+1} - \hat u_{2,t+1}\right)$$

and

$$MSE\text{-}F_{\tilde f} = MSE\text{-}F_f + MSE\text{-}F_f\left(\frac{\ddot\sigma^2}{\hat\sigma^2} - 1\right) - \frac{2}{\hat\sigma^2} \sum_{t=R}^{T-1} \ddot u_{2,t+1}\left(\hat u_{2,t+1} - \ddot u_{2,t+1}\right) - \frac{1}{\hat\sigma^2} \sum_{t=R}^{T-1} \left(\hat u_{2,t+1} - \ddot u_{2,t+1}\right)^2.$$
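Because the benchmark model contains no factors, its forecast errors are the same whether the factors are observed or estimated ($\hat u_{1,t+1} = \ddot u_{1,t+1}$), and the decomposition above is then an exact algebraic identity. A quick numerical check of the MSE-F version, with arbitrary illustrative error series (not the output of any model):

```python
import numpy as np

rng = np.random.default_rng(0)
P = 25
u1 = rng.normal(size=P)                     # benchmark errors (common to both cases)
u2_dot = rng.normal(size=P)                 # alternative errors, observed factors
u2_hat = u2_dot + 0.1 * rng.normal(size=P)  # alternative errors, estimated factors

s_hat = np.mean(u2_hat ** 2)                # sigma_hat^2
s_dot = np.mean(u2_dot ** 2)                # sigma_ddot^2
mse_f_tilde = np.sum(u1 ** 2 - u2_hat ** 2) / s_hat
mse_f = np.sum(u1 ** 2 - u2_dot ** 2) / s_dot

rhs = (mse_f + mse_f * (s_dot / s_hat - 1)
       - 2.0 / s_hat * np.sum(u2_dot * (u2_hat - u2_dot))
       - 1.0 / s_hat * np.sum((u2_hat - u2_dot) ** 2))
assert np.isclose(mse_f_tilde, rhs)         # the MSE-F identity holds exactly
```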
This decomposition allows us to identify a set of sufficient conditions for the asymptotic equivalence of the two sets of statistics (based on $\tilde f$ and on $f$, respectively).

Lemma 2.1 Suppose that $R, P \to \infty$ such that

a) $\sum_{t=R}^{T-1} \left(\hat u_{2,t+1} - \ddot u_{2,t+1}\right)^2 = o_P(1)$;
b) $\sum_{t=R}^{T-1} \ddot u_{2,t+1}\left(\hat u_{2,t+1} - \ddot u_{2,t+1}\right) = o_P(1)$;
c) $\sum_{t=R}^{T-1} \ddot u_{1,t+1}\left(\ddot u_{2,t+1} - \hat u_{2,t+1}\right) = o_P(1)$;
d) $\hat\sigma^2 - \ddot\sigma^2 = o_P(1)$.

Then, it follows that $ENC\text{-}F_{\tilde f} = ENC\text{-}F_f + o_P(1)$ and $MSE\text{-}F_{\tilde f} = MSE\text{-}F_f + o_P(1)$.

The proof of Lemma 2.1 is trivial and therefore omitted. Our next goal is to provide a set of assumptions such that conditions a) through d) of Lemma 2.1 are verified. An important step in that direction is to provide bounds on the mean squared factor estimation uncertainty over recursive samples, which we do next. But first we need to introduce our assumptions.
3 Assumptions

In this section, we state the assumptions that will be used to establish the asymptotic equivalence of the test statistics with estimated factors ($ENC\text{-}F_{\tilde f}$ and $MSE\text{-}F_{\tilde f}$) and the analogous statistics with known factors ($ENC\text{-}F_f$ and $MSE\text{-}F_f$). Throughout, we let $M$ be a generic finite constant.

Assumption 1
a) For each $s$, $E\|f_s\|^{16} \leq M$, $E(f_s f_s') = \Sigma_F = I_r$, and $\Sigma_\Lambda = \lim_{N\to\infty} \Lambda'\Lambda/N$ is a diagonal matrix with distinct entries arranged in decreasing order.
b) $\sup_t \left\| T^{-1} \sum_{s=1}^{t} \left(f_s f_s' - I_r\right) \right\| = O_P\left(1/\sqrt{T}\right)$.
c) $\left\|\Lambda'\Lambda/N - \Sigma_\Lambda\right\| = O\left(1/\sqrt{N}\right)$, such that $\lambda_{\min}(\Sigma_\Lambda)$ is bounded away from zero and bounded away from infinity – that is, $0 < a < \lambda_{\min}(\Sigma_\Lambda) < b < \infty$ for some constants $a$ and $b$.
The first part of Assumption 1 a) requires $f_s$ to have finite moments of order 16. Although some of our results could be derived under less stringent moment conditions (for instance, the results in Section 4 require only finite second moments), we will rely on these many moments in later parts of our proofs and therefore we impose this condition from the beginning. The remaining part of Assumption 1 a) is a normalization condition often used in factor analysis. It implies that both the factors and the factor loadings are orthogonal. See, in particular, Stock and Watson (2002) for a similar assumption and, more recently, Fan, Liao, and Mincheva (2013), Bai and Li (2012), and Bai and Ng (2013). Part b) of Assumption 1 implies that $T^{-1} \sum_{s=1}^{T} f_s f_s'$ converges to $\Sigma_F = I_r$ at a $\sqrt{T}$-rate. This condition follows under standard mixing conditions on $f_s$ by a maximal inequality for mixing processes. Part c) of Assumption 1 requires that a nonnegligible fraction of the factor loadings is nonvanishing, implying that the factors are pervasive. We specify the rate at which $\Lambda'\Lambda/N$ converges to $\Sigma_\Lambda$, the diagonal matrix given in part a). For simplicity, we treat the factor loadings as fixed, but if the factor loadings were assumed random, this would be the rate implied by the central limit theorem. See Han and Inoue (2015) for a similar assumption.
Assumption 2
a) $E(e_{it}) = 0$ and $E|e_{it}|^8 \leq M$.
b) There exists a constant $m > 0$ such that $\lambda_{\min}(\Sigma_e) > m$ and $\|\Sigma_e\|_1 = \lambda_{\max}(\Sigma_e) < M$, where $\Sigma_e = E(e_t e_t')$.
c) $\gamma_{t,s} \equiv E(e_t'e_s/N) = N^{-1} \sum_{i=1}^{N} E(e_{it}e_{is})$ is such that $\gamma_{t,s} = 0$ whenever $|t - s| > \theta$ for some $\theta < \infty$.
d) For every $(t, s)$, $E\left| N^{-1/2} \sum_{i=1}^{N} \left(e_{it}e_{is} - E(e_{it}e_{is})\right) \right|^{16} \leq M$.
e) For every $t$, $E\left\| N^{-1/2} \sum_{i=1}^{N} \lambda_i e_{it} \right\|^{16} \leq M$.
f) $N^{-1} \sum_{i,j=1}^{N} T^{-1} \sum_{s_1=1}^{T} \sum_{s_2=1}^{T} \left|\mathrm{cov}\left(e_{is_1}e_{js_1}, e_{is_2}e_{js_2}\right)\right| \leq M$.
Assumption 2 is rather standard in this literature, allowing for weak dependence of $e_{it}$ across $i$ and over $t$. Part b) of Assumption 2 corresponds to the approximate factor model of Chamberlain and Rothschild (1983), imposing an upper bound on the maximum eigenvalue of the covariance matrix of $e_t = (e_{1t},\ldots,e_{Nt})'$. It is satisfied if we assume that $e_{it}$ is independent across $i$, given the moment conditions on $e_{it}$. However, it does not require cross-sectional independence and is implied by the condition that $\max_{1\leq j\leq N} \sum_{i=1}^{N} |E(e_{it}e_{jt})| \leq M$, an assumption used by Bai (2003).

Part c) of Assumption 2 allows for serial correlation in $e_{it}$ but restricts it to be of the moving average type. This is a stronger assumption than usual. The main reason we impose this assumption is the need for tight uniform bounds on the time average of the product of the factor estimation error $\tilde f_{s,t} - H_t f_s$ and $u_{s+1}$, the common value of $u_{1,s+1}$ and $u_{2,s+1}$ under the null hypothesis. In particular, we need that

$$\sup_t \left\| t^{-1} \sum_{s=1}^{t-1} \left(\tilde f_{s,t} - H_t f_s\right) u_{s+1} \right\| = O_P\left(\frac{1}{\sqrt{R}\,\delta_{N,R}}\right),$$

where $\delta_{N,R}^2 = \min(N, R)$. This in turn is satisfied if

$$\sup_t t^{-1} \sum_{l=1}^{t} \left(\sum_{s=1}^{t-1} \gamma_{l,s} u_{s+1}\right)^2 = O_P(1),$$

which follows under the assumption that $\gamma_{t,s} = 0$ whenever $|t-s| > \theta$ for some $\theta < \infty$ (but not necessarily if we impose only the more usual weak dependence assumption that $T^{-1} \sum_{l,s=1}^{T} \gamma_{l,s}^2 \leq M$).

Part d) of Assumption 2 strengthens Bai's (2003) Assumption C.5 by requiring 16 instead of 4 finite moments. Part e) is also a strengthened version of Assumption 3.d) of Gonçalves and Perron (2014), where 16 instead of 2 finite moments are required. The relatively large number of moments we require is explained by the fact that we use mean square convergence arguments to show the asymptotic equivalence of our statistics. Since these are a function of squared forecast errors that depend on the product of estimated factors and regression estimates, the 16-moment assumptions are the result of an application of the Cauchy-Schwarz inequality. Finally, Bai (2009) relies on an assumption similar to part f) (see, in particular, his Assumption C.4) in the context of the interactive fixed effects model.
Assumptions 3 and 4 are new to the out-of-sample forecasting context.
Assumption 3
a) $N^{-1} \sum_{i=1}^{N} \sup_t \left\| T^{-1/2} \sum_{s=1}^{t} f_s e_{is} \right\|^2 = O_P(1)$, where $E(f_s e_{is}) = 0$.
b) $T^{-1} \sum_{l=1}^{T} \sup_t \left| (NT)^{-1/2} \sum_{s=1}^{t-1} \sum_{i=1}^{N} \left(e_{il}e_{is} - E(e_{il}e_{is})\right) u_{s+1} \right|^2 = O_P(1)$, where $E\left(\left(e_{il}e_{is} - E(e_{il}e_{is})\right) u_{s+1}\right) = 0$.
c) $\sup_t \left\| (NT)^{-1/2} \sum_{s=1}^{t-1} \sum_{i=1}^{N} \lambda_i e_{is} u_{s+1} \right\|^2 = O_P(1)$, where $E(\lambda_i e_{is} u_{s+1}) = 0$.
d) $T^{-1} \sum_{l=1}^{T} \sup_t \left\| (NT)^{-1/2} \sum_{s=1}^{t-1} \sum_{i=1}^{N} \left(e_{il}e_{is} - E(e_{il}e_{is})\right) z_s \right\|^2 = O_P(1)$, where $E\left(\left(e_{il}e_{is} - E(e_{il}e_{is})\right) z_s\right) = 0$.
e) $\sup_t \left\| (NT)^{-1/2} \sum_{s=1}^{t-1} \sum_{i=1}^{N} \lambda_i e_{is} z_s' \right\|^2 = O_P(1)$, where $E(\lambda_i e_{is} z_s') = 0$.

Assumption 3 requires that several partial sums be bounded in probability, uniformly in $t = R,\ldots,T$. Since the summands have mean zero, this assumption is not very restrictive and is implied by an application of the functional central limit theorem. For instance, we can verify that Assumptions 3 b) and c) hold under Assumptions 2 d) and e) if we assume that $\{u_t\}$ and $\{e_t\}$ are independent of each other and $u_{t+1}$ is a martingale difference sequence, as we assume in Assumption 5 below.

Our next assumption requires that the partial sums in parts b) and c) of Assumption 3 have 8 finite moments.
Assumption 4
a) For each $(l, t)$, $E\left| (TN)^{-1/2} \sum_{s=1}^{t-1} \sum_{i=1}^{N} \left(e_{il}e_{is} - E(e_{il}e_{is})\right) u_{s+1} \right|^8 \leq M$.
b) For each $t$, $E\left\| (NT)^{-1/2} \sum_{s=1}^{t-1} \sum_{i=1}^{N} \lambda_i e_{is} u_{s+1} \right\|^8 \leq M$.

Assumption 4 is a strengthening of Assumption 4 of Gonçalves and Perron (2014), according to which $E\left| (TN)^{-1/2} \sum_{s=1}^{T} \sum_{i=1}^{N} \left(e_{il}e_{is} - E(e_{il}e_{is})\right) u_{s+1} \right|^2 \leq M$ and $E\left\| (TN)^{-1/2} \sum_{s=1}^{T} \sum_{i=1}^{N} \lambda_i e_{is} u_{s+1} \right\|^2 \leq M$. A sufficient condition for Assumption 4 is that $u_{t+1}$ is i.i.d., given Assumptions 2 d) and e), respectively. Our assumption on $u_{t+1}$ is nevertheless weaker, as we assume only that the regression innovation is a martingale difference sequence. More formally, we impose the following assumption on $u_{t+1}$.

Assumption 5 $E\left(u_{t+1} \mid \mathcal{F}_t\right) = 0$, where $\mathcal{F}_t = \sigma\left(y_t, y_{t-1},\ldots; X_t, X_{t-1},\ldots; z_t, z_{t-1},\ldots\right)$.

Assumption 5 is analogous to the martingale difference assumption in Bai and Ng (2006) when the forecasting horizon is 1.
Assumption 6
a) $E\|z_t\|^{16} \leq M$ and $E\left(u_{t+1}^8\right) \leq M$ for all $t = 1,\ldots,T$.
b) Let $V(t) = t^{-1} \sum_{s=1}^{t-1} z_s u_{s+1}$, where $E(z_s u_{s+1}) = 0$. Then $\sup_t \|V(t)\| = O_P\left(1/\sqrt{T}\right)$ and $E\left\|\sqrt{T}\,V(t)\right\|^8 \leq M$ for all $t$.
c) Let $B(t) \equiv \left(t^{-1} \sum_{s=1}^{t-1} z_s z_s'\right)^{-1}$ and $B = \left(E(z_s z_s')\right)^{-1}$. Then $\sup_t \|B(t)\| = O_P(1)$ and $\sup_t \|B(t) - B\| = O_P\left(1/\sqrt{T}\right)$.
d) $P, R \to \infty$ such that $P/R = O(1)$.

Assumption 6 imposes conditions on the regressors and errors of the predictive regression model based on the latent factors. In particular, Assumption 6 a) extends the moment conditions on the factors assumed in Assumption 1 to the observed regressors $w_t$. In addition, it requires the innovations $u_{t+1}$ to have eight finite moments. Parts b) and c) of Assumption 6 imply that $\sup_t \|\ddot\beta_t - \beta\| = \sup_t \|B(t)V(t)\| = O_P\left(1/\sqrt{T}\right)$, which is the usual rate of convergence. These assumptions are implied by more primitive assumptions that ensure the application of mixingale-type inequalities. Part d) is implied by the usual rate condition on $P$ and $R$ according to which $\lim_{P,R\to\infty} P/R = \pi$, with $0 \leq \pi < \infty$.
4 Theory

4.1 Properties of factors over recursive samples

As mentioned earlier, factor models are subject to a fundamental identification problem. In our setup, this means that the factors $\tilde f_{s,t}$ estimated at each forecast origin $t$ are consistent toward a rotation of the true factors given by $H_t f_s$, where the rotation matrix $H_t$ is time-varying. The asymptotic properties of the rotation matrix $H_t$ and its inverse (as well as the inverse of the eigenvalue matrix $\tilde V_t$) are a crucial step toward proving the asymptotic negligibility of the factor estimation error in the out-of-sample forecasting context. Lemmas A.5 and A.6 in Appendix A contain these results, which are crucial to bounding terms involving the estimated factors in the expansions of our test statistics.

The following theorem establishes the rates of convergence of the estimated factors over recursive subsamples.

Theorem 4.1 Under Assumptions 1-6,
a) $P^{-1} \sum_{t=R}^{T-1} \left\|\tilde f_{t,t} - H_t f_t\right\|^2 = O_P\left(1/\delta_{N,R}^2\right)$, where $\delta_{N,R} = \min\left(\sqrt{N}, \sqrt{R}\right)$;
b) $\sup_t t^{-1} \sum_{s=1}^{t} \left\|\tilde f_{s,t} - H_t f_s\right\|^2 = O_P\left(1/\delta_{N,R}^2\right)$.

Theorem 4.1 extends Lemma A.1 of Bai (2003) to the out-of-sample context. Just as Bai's (2003) result is crucial to proving the asymptotic theory of the method of principal components over one sample when both $N$ and $T$ are large, Theorem 4.1 is crucial to proving the asymptotic theory of this estimation method over recursive samples. Part a) of Theorem 4.1 shows that the time average over the out-of-sample period of the squared deviations between the estimated factor $\tilde f_{t,t}$ and the rotated factor $H_t f_t$ converges to zero at rate $O_P\left(1/\delta_{N,R}^2\right)$, where $\delta_{N,R} = \min\left(\sqrt{N}, \sqrt{R}\right)$. Part b) shows that the same rate of convergence applies uniformly over $t = R,\ldots,T$ to the time average of the squared deviations between $\tilde f_{s,t}$ and $H_t f_s$ for each recursive estimation sample $s = 1,\ldots,t$.
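The objects in Theorem 4.1 are easy to visualize in a simulated one-shot example: estimate the factors by principal components, form the rotation matrix $H_t$ as in (4) from the true factors and loadings, and measure the average squared deviation of the estimated factors from their rotated targets. The design below is purely illustrative (it is not the paper's simulation design):

```python
import numpy as np

rng = np.random.default_rng(2)
t, N, r = 200, 100, 1
F0 = rng.normal(size=(t, r))                  # true factors
Lam0 = rng.uniform(0.5, 1.5, size=(N, r))     # true loadings
X = F0 @ Lam0.T + rng.normal(size=(t, N))     # panel x_is

val, vec = np.linalg.eigh(X @ X.T / (t * N))  # ascending eigenvalues
F_tilde = np.sqrt(t) * vec[:, -r:]            # leading eigenvector(s), scaled
V_tilde = np.diag(val[-r:])                   # leading eigenvalue(s)
H = np.linalg.inv(V_tilde) @ (F_tilde.T @ F0 / t) @ (Lam0.T @ Lam0 / N)

# average squared deviation from the rotated factors, of order 1/min(N, t)
mse = np.mean(np.sum((F_tilde - F0 @ H.T) ** 2, axis=1))
```

Comparing `F_tilde` with `F0` directly would be misleading: the deviation is small only after applying the rotation `H`, which also absorbs the arbitrary sign of the eigenvectors.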
4.2 Asymptotic equivalence

This section's main contribution is to show that Lemma 2.1 holds under the null hypothesis of equal predictability. In particular, we verify conditions a) through d), which involve the forecast errors $\ddot u_{1,t+1}$, $\ddot u_{2,t+1}$, $\hat u_{1,t+1}$, and $\hat u_{2,t+1}$ and their respective differences.

We have that

$$\ddot u_{2,t+1} = y_{t+1} - z_t'\ddot\beta_t,$$

where $\ddot\beta_t$ is the recursive OLS estimator obtained from regressing $y_{s+1}$ on $z_s = (w_s', f_s')'$ for $s = 1,\ldots,t-1$, whereas $\hat u_{2,t+1}$ is given by

$$\hat u_{2,t+1} = y_{t+1} - \tilde z_{t,t}'\hat\beta_t,$$

where $\tilde z_{t,t} = (w_t', \tilde f_{t,t}')'$ and $\hat\beta_t$ is the OLS estimate obtained from regressing $y_{s+1}$ on $\tilde z_{s,t} = (w_s', \tilde f_{s,t}')'$. Thus,

$$\ddot u_{2,t+1} - \hat u_{2,t+1} = \tilde z_{t,t}'\hat\beta_t - z_t'\ddot\beta_t.$$

Let $\Phi_t = \mathrm{diag}(I_{k_1}, H_t)$. Adding and subtracting appropriately yields

$$\ddot u_{2,t+1} - \hat u_{2,t+1} = \left(\tilde z_{t,t} - \Phi_t z_t\right)'\left(\Phi_t'\right)^{-1}\left(\ddot\beta_t - \beta\right) + \tilde z_{t,t}'\left(\hat\beta_t - \left(\Phi_t'\right)^{-1}\ddot\beta_t\right), \qquad (8)$$

since $\alpha = 0$ under the null hypothesis of equal accuracy. This and the fact that $\tilde z_{t,t} - \Phi_t z_t = \left(0', (\tilde f_{t,t} - H_t f_t)'\right)'$ (since $w_t$ is fully observed) explain why the term $\left(\tilde z_{t,t} - \Phi_t z_t\right)'\beta$ is missing in (8).

The first term in (8) measures the discrepancy between the "ideal" predictors at time $t$, given by $z_t = (w_t', f_t')'$, and their estimated version, given by $\tilde z_{t,t} = (w_t', \tilde f_{t,t}')'$, interacted with the error in estimating $\beta$ using $\ddot\beta_t$. The second term in (8) measures the impact of the factor estimation error on the OLS estimators.
Our first goal is to show that $\sum_{t=R}^{T-1} \left(\ddot u_{2,t+1} - \hat u_{2,t+1}\right)^2 = o_P(1)$. By Assumption 6, $\sup_t \|\ddot\beta_t - \beta\| = O_P\left(1/\sqrt{T}\right)$, whereas

$$\frac{1}{T} \sum_{t=R}^{T-1} \left\|\tilde z_{t,t} - \Phi_t z_t\right\|^2 = \frac{1}{T} \sum_{t=R}^{T-1} \left\|\tilde f_{t,t} - H_t f_t\right\|^2 = O_P\left(1/\delta_{N,R}^2\right)$$

by Theorem 4.1. Thus, by an application of the Cauchy-Schwarz inequality, we can show that the sum of the squared contributions from the first term in (8) is of order $O_P\left(1/\delta_{N,R}^2\right)$. This is $o_P(1)$ given that $N, R \to \infty$.

In turn, the sum of squared contributions from the second term is bounded by

$$T \sup_t \left\|\hat\beta_t - \left(\Phi_t'\right)^{-1}\ddot\beta_t\right\|^2 \cdot \frac{1}{T} \sum_{t=R}^{T-1} \left\|\tilde z_{t,t}\right\|^2,$$

where $T^{-1} \sum_{t=R}^{T-1} \|\tilde z_{t,t}\|^2 = O_P(1)$ under our assumptions. Thus, condition a) of Lemma 2.1 follows if $\sup_t \|\hat\beta_t - (\Phi_t')^{-1}\ddot\beta_t\|^2 = o_P(1/T)$, as shown in the next lemma.
Lemma 4.1 Suppose Assumptions 1-6 hold and the null hypothesis is true. Then,

$$\sup_t \left\|\hat\beta_t - \left(\Phi_t'\right)^{-1}\ddot\beta_t\right\| = O_P\left(\frac{1}{\sqrt{R}\,\delta_{N,T}}\right).$$

Since Lemma 4.1 implies that $\sup_t \|\hat\beta_t - (\Phi_t')^{-1}\ddot\beta_t\|^2 = O_P\left(1/(R\,\delta_{N,T}^2)\right) = o_P(1/T)$, this result and the above arguments imply that

$$\sum_{t=R}^{T-1} \left(\hat u_{2,t+1} - \ddot u_{2,t+1}\right)^2 = O_P\left(1/\delta_{N,T}^2\right) = o_P(1),$$

which is condition a) of Lemma 2.1.
Next, we focus on condition b) of Lemma 2.1. Given the definition of $\ddot u_{2,t+1}$, we can write

$$\ddot u_{2,t+1} = y_{t+1} - z_t'\ddot\beta_t = y_{t+1} - z_t'\beta - z_t'\left(\ddot\beta_t - \beta\right) = u_{t+1} - z_t'\left(\ddot\beta_t - \beta\right),$$

where $u_{t+1} = y_{t+1} - w_t'\gamma = y_{t+1} - z_t'\beta$ under $H_0$. It follows that

$$\sum_{t=R}^{T-1} \ddot u_{2,t+1}\left(\ddot u_{2,t+1} - \hat u_{2,t+1}\right) = \sum_{t=R}^{T-1} u_{t+1}\left(\ddot u_{2,t+1} - \hat u_{2,t+1}\right) - \sum_{t=R}^{T-1} z_t'\left(\ddot\beta_t - \beta\right)\left(\ddot u_{2,t+1} - \hat u_{2,t+1}\right).$$

The second term is bounded by

$$\sum_{t=R}^{T-1} \left|z_t'\left(\ddot\beta_t - \beta\right)\right| \left|\ddot u_{2,t+1} - \hat u_{2,t+1}\right| \leq \left(\sum_{t=R}^{T-1} \left(z_t'\left(\ddot\beta_t - \beta\right)\right)^2\right)^{1/2} \left(\sum_{t=R}^{T-1} \left(\ddot u_{2,t+1} - \hat u_{2,t+1}\right)^2\right)^{1/2},$$

given the Cauchy-Schwarz inequality. Since the square of the first term is bounded by

$$\sum_{t=R}^{T-1} \left(z_t'\left(\ddot\beta_t - \beta\right)\right)^2 \leq \sup_t \left\|\sqrt{T}\left(\ddot\beta_t - \beta\right)\right\|^2 \cdot \frac{1}{T} \sum_{t=R}^{T-1} \|z_t\|^2,$$

which is $O_P(1)$ under Assumption 6, condition b) is implied by condition a) and the next lemma.
Lemma 4.2 Suppose Assumptions 1-6 hold and the null hypothesis is true. If $\sqrt{T}/N \to 0$, then $\sum_{t=R}^{T-1} u_{t+1}\left(\ddot u_{2,t+1} - \hat u_{2,t+1}\right) = o_P(1)$.

The rate restriction $\sqrt{T}/N \to 0$ is the same as the one used by Bai and Ng (2006) to show that factor estimation uncertainty disappears asymptotically when conducting inference on factor-augmented regression models. In our context, this rate restriction is used to show that the difference between the out-of-sample test statistics based on the estimated factors and those based on the latent factors is asymptotically negligible.
Similarly, condition c) of Lemma 2.1 follows using the same arguments as condition b), whereas d) follows from conditions a) and b) since

$$\hat\sigma^2 - \ddot\sigma^2 = P^{-1} \sum_{t=R}^{T-1} \left(\hat u_{2,t+1}^2 - \ddot u_{2,t+1}^2\right) = P^{-1} \sum_{t=R}^{T-1} \left(\hat u_{2,t+1} - \ddot u_{2,t+1}\right)^2 + 2P^{-1} \sum_{t=R}^{T-1} \ddot u_{2,t+1}\left(\hat u_{2,t+1} - \ddot u_{2,t+1}\right) = o_P(1),$$

given conditions a) and b) and $P \to \infty$. The next theorem contains our main theoretical result.

Theorem 4.2 Under Assumptions 1-6, if the null hypothesis holds and $\sqrt{T}/N \to 0$, then $ENC\text{-}F_{\tilde f} = ENC\text{-}F_f + o_P(1)$ and $MSE\text{-}F_{\tilde f} = MSE\text{-}F_f + o_P(1)$.

This result justifies using the asymptotic critical values given in Clark and McCracken (2001) for the $ENC\text{-}F$ statistic and in McCracken (2007) for the $MSE\text{-}F$ statistic. These critical values are derived assuming conditional homoskedasticity. Without this assumption, a different set of critical values should be used. In particular, Clark and McCracken (2012) have proposed bootstrap critical values to approximate the asymptotic distribution under conditional heteroskedasticity.
5 Simulations

In this section, we analyze the finite sample behavior of tests of equal predictive ability with estimated factors. For this purpose, we use a data-generating process similar to the one in Clark and McCracken (2001):

$$\begin{pmatrix} y_s \\ f_s \end{pmatrix} = \begin{pmatrix} 0.5 & b \\ 0 & 0.5 \end{pmatrix} \begin{pmatrix} y_{s-1} \\ f_{s-1} \end{pmatrix} + \begin{pmatrix} u_{y,s} \\ u_{f,s} \end{pmatrix},$$

for $s = 1,\ldots,T$, where the vector of errors is i.i.d. $N(0, I_2)$ and $y_0 = f_0 = 0$. The value of $b$ is set to 0 for the size experiment and 0.3 for the power experiment.²

The panel of variables used to extract the factors is obtained as

$$x_{is} = \lambda_i f_s + e_{is},$$

where $\lambda_i \sim U(0, 1)$, $e_{is} \sim N(0, \sigma_i^2)$, and $\sigma_i^2 \sim U(0.5, 1.5)$.³

² We also conducted an experiment with the autoregressive coefficient on $y_s$ set to 0 and $b$ set to 0 in both the benchmark and alternative models. This specification is analogous to our empirical specification in the next section, but because the results were similar, we decided not to report them. They are available from the authors upon request.
We consider sample sizes $N \in \{50, 100, 150, 200\}$ and $T \in \{50, 100, 200\}$. We report results for three sample splits: $\pi = P/R \in \{0.2, 1, 2\}$. The number of replications is set at 100,000.
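One replication of this design can be generated as follows (a sketch; the replication loop and the test computations are omitted, and the function name is our own):

```python
import numpy as np

def simulate_dgp(T, N, b, rng):
    """One draw: bivariate VAR(1) for (y_s, f_s) with y_0 = f_0 = 0 and
    i.i.d. N(0, I_2) errors, plus the panel x_is used to extract the factor."""
    y = np.zeros(T + 1)
    f = np.zeros(T + 1)
    u = rng.standard_normal((T, 2))
    for s in range(1, T + 1):
        y[s] = 0.5 * y[s - 1] + b * f[s - 1] + u[s - 1, 0]
        f[s] = 0.5 * f[s - 1] + u[s - 1, 1]
    lam = rng.uniform(0.0, 1.0, size=N)              # lambda_i ~ U(0, 1)
    sig2 = rng.uniform(0.5, 1.5, size=N)             # sigma_i^2 ~ U(0.5, 1.5)
    e = rng.standard_normal((T, N)) * np.sqrt(sig2)  # e_is ~ N(0, sigma_i^2)
    X = np.outer(f[1:], lam) + e                     # x_is = lambda_i f_s + e_is
    return y[1:], f[1:], X
```

Setting `b = 0` gives the size design (the factor has no predictive content for $y$), while `b = 0.3` gives the power design.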
We test the predictive ability of factors using recursive windows. For each $t = R,\ldots,T-1$, we estimate the parameters of the model under the null hypothesis,

$$y_{s+1} = \beta_0 + \beta_1 y_s + u_{1,s+1},$$

using data from $s = 1,\ldots,t-1$, while the model under the alternative hypothesis is given by

$$y_{s+1} = \beta_0 + \beta_1 y_s + \alpha f_s + u_{2,s+1}, \quad s = 1,\ldots,t-1,$$

where we estimate $f_s$ with $\tilde f_{s,t}$ recursively using data up to time $t$: $x_{is}$, $i = 1,\ldots,N$, and $s = 1,\ldots,t$. We consider the $ENC\text{-}F_{\tilde f}$ and $MSE\text{-}F_{\tilde f}$ statistics. All tests are conducted at the 5% significance level.

The results are reported in Table 1. We report right-tail rejection probabilities using the asymptotic critical values for each statistic for two cases: when the factor is known and when the factor is estimated. The asymptotic critical values are from Clark and McCracken (2001) for the $ENC\text{-}F$ statistic and McCracken (2007) for the $MSE\text{-}F$ statistic. The results for $\pi = 0.2$ are reported first, followed by $\pi = 1$ and $\pi = 2$. In each case, we report results for the $ENC\text{-}F$ and $MSE\text{-}F$, first with the factor assumed known and then with the factor estimated.
The size results are reported in the top panel of Table 1. While there should be no effect of changes in $N$ for the observed-factor case, we do see some variation that must be accounted for by randomness. As predicted by theory, the statistics behave very similarly under the null hypothesis regardless of whether the factor is estimated or known. The largest absolute difference in rejection rates is 0.11 percentage points. The average absolute discrepancy across all sample sizes and the three choices of $\pi$ is 0.039 percentage points. Overall, all tests have very good size properties, though there is some tendency to overreject for smaller values of $T$. The $MSE\text{-}F$ statistic tends to be less subject to size distortions than the $ENC\text{-}F$ test statistic.

The bottom panel of Table 1 reports power results for the two tests considered. This panel contains an important result: estimating the factor reduces power to some degree. For example, for $N = 50$, $T = 50$, and $\pi = 0.2$, power of the $MSE\text{-}F$ statistic is 41.7% if the factor is known but 40.3% if the factor is estimated. This is a general result for all values of $N$, $T$, and $\pi$. We also find other results from the literature, such as the power advantage of the $ENC\text{-}F$ statistic over the $MSE\text{-}F$ statistic (see Clark and McCracken, 2001) and the fact that power is increasing in $P$, which can be achieved by increasing $T$ for fixed $\pi$ or increasing $\pi$ for fixed $T$.

³ We have also experimented with errors drawn from a Student-t distribution with 3 degrees of freedom to check sensitivity to the strong moment conditions imposed in our assumptions. Results are qualitatively similar and are available upon request from the authors.

6 Empirical results
In this section, we provide empirical evidence on the predictive content of macroeconomic factors for the equity premium associated with the S&P 500 Composite Index. Others who have looked into factor-based prediction of the equity premium include Bai (2010), Cakmakli and van Dijk (2010), and Batje and Menkhoff (2012). Like us, they use monthly frequency data to investigate whether factor-based model forecasts are more accurate than just using the historical mean. Unlike us, formal tests of statistical significance are not necessarily conducted (though, to be fair, prior to the current paper a theoretical justification for doing so did not exist). In addition, in each of these three papers an in-sample information criterion is used to select the "best" model prior to forecasting, an approach we do not follow. Instead, we consider a range of possible models, each defined by a given permutation of the current estimated factors.
In our exercise, we construct factors f̃_{t,t} using a large monthly frequency macroeconomic database of 134 predictors documented in McCracken and Ng (2015).4 This dataset is designed to be representative of those most frequently used in the literature on factor-based forecasting and appears to be similar to those used in the papers cited above. If we let P_{t+1} denote the value of the stock index at the close of business on the last business day of month t+1, we construct our equity premium as y_{t+1} = ln(P_{t+1}/P_t) − r_{t+1}, where r_{t+1} denotes the 1-month Treasury bill rate.5 After transforming the data to induce stationarity and truncating the dataset to avoid missing values, our dataset spans 1960:03 through 2014:12 for a total of T = 658 observations.
We construct our forecasting exercises with two goals in mind. First, we want to identify the factors with the greatest predictive content for the equity premium. Toward this goal, we consider each of the first 8 macroeconomic factors (f̃_{1t}, ..., f̃_{8t}) as potential predictors.6 As shown later, the second factor, which loads heavily on interest rate variables, is clearly the strongest predictor.
Second, we want to investigate the importance of the “timing” of the estimated factors. In our
reading of the factor-based forecasting literature (not limited to the papers listed earlier), it is not
4 We extract the factors using the April 2015 vintage of FRED-MD. Four of the series are dropped to avoid the presence of missing values. By doing so, we can estimate the factors using precisely the same formulas given in the text.
5 The 1-month Treasury bill rates are obtained from Kenneth French's website (http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html).
6
As discussed in McCracken and Ng (2015), Factor 1 can be interpreted as a real activity/employment factor. Factor
2 is dominated by interest rate variables. Factor 3 is concentrated on price variables. Factors 4 and 5 are a mix of
housing and interest rate variables. Like Factor 1, Factor 6 concentrates on real activity/employment variables. Factor
7 is associated with stock market variables, while Factor 8 loads heavily on exchange rates.
always clear whether the time t factors are dated using calendar or data-release time. Note that in equation (2), the dependent variable is associated with time t+1, while the factor is associated with time t. Since our dependent variable is observable immediately at the close of business on the last business day of month t+1, the distinction between calendar and data-release time is irrelevant. This is in sharp contrast to the data used to construct the factors. The dataset used consists of macroeconomic series that are typically released with a 1-month delay. As such, the factor estimates based on data associated with (say) April 2015 cannot literally be constructed until May 2015, after the data are released. To evaluate the importance of this distinction, we conduct our forecasting exercise once estimating the factors f̃_{t,t} such that t represents calendar time (i.e., the data are associated with month t) and once such that t represents data-release time (i.e., the data are associated with month t−1).
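To fix ideas, the following is a minimal sketch of how recursively estimated principal-components factors and the two timing conventions can be implemented. The function names and window conventions are hypothetical illustrations of the standard PC estimator, not the authors' code.

```python
import numpy as np

def pc_factors(X, r):
    """Principal-components factors from a t x N panel X:
    F_tilde = sqrt(t) times the top-r eigenvectors of X X'/(t N),
    so that F_tilde' F_tilde / t = I_r."""
    t, N = X.shape
    eigval, eigvec = np.linalg.eigh(X @ X.T / (t * N))
    idx = np.argsort(eigval)[::-1][:r]     # keep the r largest eigenvalues
    return np.sqrt(t) * eigvec[:, idx]     # t x r matrix of factors

def recursive_factors(X, r, R, release_lag=0):
    """Re-estimate factors on each expanding window, t = R, ..., T.
    release_lag=1 mimics data-release timing: the factors dated month t
    are computed from data through month t-1 only."""
    T = X.shape[0]
    out = {}
    for t in range(R, T + 1):
        out[t] = pc_factors(X[: t - release_lag], r)
    return out

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 30))          # toy 60-month, 30-series panel
F = recursive_factors(X, r=2, R=40)
```

Note the design choice: the entire eigendecomposition is redone at every forecast origin, which is exactly the recursive scheme whose asymptotics the paper studies.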
For all model comparisons, our benchmark model uses the historical mean as the predictor, and hence Model 1 takes the form

y_{t+1} = α + u_{1,t+1}.

The competing models are similar but add estimated factors and hence take the form

y_{t+1} = α + f̃_{t,t}'β + u_{2,t+1}.
We consider all 2^8 − 1 = 255 permutations of models that include a single lag of at least 1 of the first 8 factors. For each, we construct one-step-ahead forecasts and test the null hypothesis of equal predictive ability at the 5% level using the MSE-F and ENC-F test statistics based on critical values taken from Clark and McCracken (2001) and McCracken (2007). Note that the appropriate critical values vary with the number of factors used in the competing model. All models are estimated recursively by OLS. We consider two out-of-sample periods: 1985:01-2014:12 and 2005:01-2014:12. These imply sample splits (R, P) equal to (298, 360) and (538, 120), respectively.
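For concreteness, a minimal sketch of the two test statistics computed from recursive forecast errors is given below, using the standard definitions from this literature: MSE-F = P(MSE_1 − MSE_2)/MSE_2 and ENC-F = P · c̄/MSE_2 with c̄ = P^{-1} Σ(û_1² − û_1 û_2). This is an illustrative implementation, not the authors' code; critical values must still be taken from Clark and McCracken (2001) and McCracken (2007).

```python
import numpy as np

def mse_f(e1, e2):
    """MSE-F statistic for nested models: model 1 (benchmark, errors e1)
    is nested in model 2 (errors e2)."""
    e1, e2 = np.asarray(e1, float), np.asarray(e2, float)
    P = e1.size
    mse1, mse2 = np.mean(e1 ** 2), np.mean(e2 ** 2)
    return P * (mse1 - mse2) / mse2

def enc_f(e1, e2):
    """ENC-F (forecast encompassing) statistic of Clark and McCracken:
    P * mean(e1^2 - e1*e2) / MSE_2."""
    e1, e2 = np.asarray(e1, float), np.asarray(e2, float)
    P = e1.size
    return P * np.mean(e1 ** 2 - e1 * e2) / np.mean(e2 ** 2)
```

When the two error sequences are identical both statistics are exactly zero; when the factor model's squared errors are smaller, both are positive, which is why the tests are one-sided to the right.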
Rather than report all of the potential pairwise comparisons, for each out-of-sample period and for each definition of time t (calendar vs. data-release time), we report the results associated with the 10 best-performing factor-based models. The results are reported in Table 2. The upper panel is associated with calendar time, while the lower is associated with data-release time. For each of the four sets of results we list (i) the factors in the model, (ii) the ratio of the factor-model MSE to the benchmark model MSE, and (iii) the MSE-F and ENC-F statistics. An asterisk is used to denote significance at the 5% level.
The upper panel of Table 2 suggests the presence of predictability for a variety of factor-based models. Despite the fact that the gains in forecast accuracy are nominally small, both the MSE-F and ENC-F tests indicate significant improvements relative to the benchmark. The predictive content is largely concentrated in the second factor, a feature also documented in Bai (2010) and Batje and Menkhoff (2012). Over the shorter, more recent, out-of-sample period, the gains in forecast accuracy are larger than before but still remain modest. Interestingly, it appears that the eighth factor has become increasingly important for forecasting the equity premium and, in fact, the best 6 models all include both the second and eighth factors.
All of that said, in the lower panel of Table 2 we find less predictive content, which suggests that the timing used to estimate factors can play an important role in determining their usefulness for forecasting. Over the longer out-of-sample period, exactly 2 of the 255 factor-based models have a nominally smaller MSE than the benchmark (note that the ratio is reported to only two decimals), and in only two instances is the improvement statistically significant based on either the MSE-F or the ENC-F test. Over the shorter out-of-sample period, the evidence of predictability is a bit stronger. As in the upper panel, the second and eighth factors seem to contain predictive content, providing an improvement in forecast accuracy of about 2% that is significant at the 5% level regardless of which test statistic we consider. Unlike the upper panel, the seventh factor shows up in roughly half of the models and is one of the better predictors over the longer out-of-sample period. The seventh factor, which loads heavily on stock market variables, shows up in only one model under calendar-based timing.
Overall, it seems the second factor, which loads heavily on interest rates, is the strongest and most consistent predictor of the equity premium. This is consistent with extensive empirical results in the literature on the ability of interest rates to forecast stock returns; see, for example, Hjalmarsson (2010) for evidence for developed countries. Even so, the eighth factor, which loads heavily on exchange rate variables, seems to be of increasing importance. Of course, one could certainly argue that the gains in forecast accuracy are small, but, as is common in the financial forecasting literature, even small statistical improvements in forecast accuracy can provide large economic gains. On a final note, it is interesting that the first factor, which loads heavily on real activity and employment variables, never appears in Table 2. This is of independent interest since the first factor is the one most often used in the literature on factor-based forecasting.
7 Conclusion
Factor-based forecasting is increasingly common at central banks and in academic research. The accuracy of these forecasts is sometimes compared with other benchmark models using tests of equal accuracy and encompassing for nested models developed in Clark and McCracken (2001) and McCracken (2007). In this paper, we establish conditions under which these tests remain asymptotically valid when the factors are estimated recursively across forecast origins. While some of these conditions are straightforward generalizations of existing theoretical work on factors, others are new and of independent interest for applications that estimate factors recursively over time.

Simulation evidence supports the theoretical results, showing that the size of the tests is typically near the nominal level despite finite sample estimation error in the factors. Perhaps not surprisingly, we find that this finite sample estimation error reduces power compared with a hypothetical case in which the factors can be observed directly. The paper concludes with an empirical application in which a large macroeconomic dataset is used to construct factors and forecast the equity premium. While the predictive content is not high, we find significant evidence that the second factor has consistently exhibited predictive content for the equity premium and that, of late, the eighth factor has become increasingly relevant.
A Appendix

This appendix contains two sections. First, we provide several lemmas useful to prove the results in Section 4, followed by their proofs. Then we prove the results in Section 4.

A.1 Auxiliary lemmas

Our first set of results is instrumental in proving some of the auxiliary lemmas that follow. As in Bai (2003), we use the following notation:

γ_{l,s} = E(e_l'e_s/N) = (1/N) Σ_{i=1}^N E(e_{il}e_{is}),    ζ_{l,s} = e_l'e_s/N − γ_{l,s} = (1/N) Σ_{i=1}^N (e_{il}e_{is} − E(e_{il}e_{is})),

η_{l,s} = f_l'Λ'e_s/N = (1/N) Σ_{i=1}^N f_l'λ_i e_{is},    and    ξ_{l,s} = f_s'Λ'e_l/N = (1/N) Σ_{i=1}^N f_s'λ_i e_{il}.

Moreover, for t = R, ..., T, we let

U_{t,T} ≡ (T/t) ( (1/√T) Σ_{s=1}^{t-1} z_s u_{s+1} ) u_{t+1},

and note that U_{t,T} is a martingale difference sequence array with respect to F^t given Assumption 5.
Lemma A.1 Under Assumptions 1-6,

a) sup_t (1/t) Σ_{l=1}^t ( Σ_{s=1}^{t-1} γ_{l,s} u_{s+1} )² = O_P(1);

b) sup_t (1/t) Σ_{l=1}^t ( Σ_{s=1}^{t-1} ζ_{l,s} u_{s+1} )² = O_P(T/N);

c) sup_t (1/t) Σ_{l=1}^t ( Σ_{s=1}^{t-1} η_{l,s} u_{s+1} )² = O_P(T/N);

d) sup_t (1/t) Σ_{l=1}^t ( Σ_{s=1}^{t-1} ξ_{l,s} u_{s+1} )² = O_P(T/N);

e) sup_t (1/t) Σ_{l=1}^t ‖ Σ_{s=1}^{t-1} γ_{l,s} z_s' ‖² = O_P(1);

f) sup_t (1/t) Σ_{l=1}^t ‖ Σ_{s=1}^{t-1} ζ_{l,s} z_s' ‖² = O_P(T/N);

g) sup_t (1/t) Σ_{l=1}^t ‖ Σ_{s=1}^{t-1} η_{l,s} z_s' ‖² = O_P(T/N).

Lemma A.2 Under Assumptions 1-6,

a) sup_t E ‖ (1/t) Σ_{l=1}^t f_l ( Σ_{s=1}^{t-1} γ_{l,s} u_{s+1} ) ‖⁴ = O(1);

b) sup_t E ‖ (1/t) Σ_{l=1}^t f_l ( Σ_{s=1}^{t-1} ζ_{l,s} u_{s+1} ) ‖⁴ = O(T²/N²);

c) sup_t E ‖ (1/t) Σ_{l=1}^t f_l ( Σ_{s=1}^{t-1} η_{l,s} u_{s+1} ) ‖⁴ = O(T²/N²);

d) sup_t E ‖ (1/t) Σ_{l=1}^t f_l ( Σ_{s=1}^{t-1} ξ_{l,s} u_{s+1} ) ‖⁴ = O(T²/N²).
Lemma A.3 Under Assumptions 1-6,

a) (1/T) Σ_{t=R}^T ‖U_{t,T}‖² = O_P(1);

b) Σ_{t=R}^T ( (1/t) Σ_{s=1}^t f_s γ_{s,t} )' A U_{t,T} = O_P(1), where A is any r × (r + k₁) matrix of constants;

c) Σ_{t=R}^T ( (1/t) Σ_{s=1}^t f_s ζ_{s,t} )' A U_{t,T} = O_P(√(T/N));

d) Σ_{t=R}^T ( (1/t) Σ_{s=1}^t f_s η_{s,t} )' A U_{t,T} = O_P(√(T/N));

e) Σ_{t=R}^T ( (1/t) Σ_{s=1}^t f_s ξ_{s,t} )' A U_{t,T} = O_P(√(T/N)).
Our next result provides bounds on the norms of A1_{s,t}, A2_{s,t}, A3_{s,t}, and A4_{s,t}, which are defined by the following equality given in Bai (2003). Specifically, for each t = R, ..., T and s = 1, ..., t, we can write

f̃_{s,t} − H_t f_s = Ṽ_t^{-1} ( A1_{s,t} + A2_{s,t} + A3_{s,t} + A4_{s,t} ),    (9)

where A1_{s,t} ≡ (1/t) Σ_{l=1}^t f̃_{l,t} γ_{l,s}, A2_{s,t} ≡ (1/t) Σ_{l=1}^t f̃_{l,t} ζ_{l,s}, A3_{s,t} ≡ (1/t) Σ_{l=1}^t f̃_{l,t} η_{l,s}, and A4_{s,t} ≡ (1/t) Σ_{l=1}^t f̃_{l,t} ξ_{l,s}.
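The decomposition in (9) is an exact algebraic identity for the principal-components estimator, which can be verified numerically. The simulation below is a sketch under an assumed Gaussian design in which E(e_{il}e_{is}) = 1{l = s}, so that the γ matrix is the identity; all variable names are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
t, N, r = 40, 25, 2
F = rng.standard_normal((t, r))      # true factors f_s (rows)
Lam = rng.standard_normal((N, r))    # loadings Lambda
e = rng.standard_normal((t, N))      # idiosyncratic errors
X = F @ Lam.T + e

# PC estimator on the window 1..t: eigendecomposition of X X'/(t N)
eigval, eigvec = np.linalg.eigh(X @ X.T / (t * N))
idx = np.argsort(eigval)[::-1][:r]
F_t = np.sqrt(t) * eigvec[:, idx]    # f~_{s,t} (rows), F_t'F_t/t = I_r
V_t = np.diag(eigval[idx])           # V~_t

H_t = np.linalg.inv(V_t) @ (F_t.T @ F / t) @ (Lam.T @ Lam / N)

# gamma_{l,s} = E(e_l'e_s)/N = 1{l=s} under this design; zeta deviates
gamma = np.eye(t)
zeta = e @ e.T / N - gamma
eta = F @ Lam.T @ e.T / N            # eta[l, s] = f_l' Lam' e_s / N
xi = eta.T                           # xi[l, s]  = f_s' Lam' e_l / N

A1 = F_t.T @ gamma / t               # r x t; column s gives A1_{s,t}
A2 = F_t.T @ zeta / t
A3 = F_t.T @ eta / t
A4 = F_t.T @ xi / t

lhs = F_t.T - H_t @ F.T              # column s: f~_{s,t} - H_t f_s
rhs = np.linalg.inv(V_t) @ (A1 + A2 + A3 + A4)
err = np.max(np.abs(lhs - rhs))      # should be machine-precision small
```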
Lemma A.4 Under Assumptions 1 through 6,

a) (1/P) Σ_{t=R}^{T-1} ‖A1_{t,t}‖² = O_P(1/T), whereas for j = 2, 3, 4, (1/P) Σ_{t=R}^{T-1} ‖Aj_{t,t}‖² = O_P(1/N);

b) sup_t (1/t) Σ_{s=1}^t ‖A1_{s,t}‖² = O_P(1/R), whereas for j = 2, 3, 4, sup_t (1/t) Σ_{s=1}^t ‖Aj_{s,t}‖² = O_P(1/N).
Lemma A.4 and part a) of the following result (which provides uniform bounds on ‖Ṽ_t^{-1}‖ over t = R, ..., T) are used to prove Theorem 4.1.

Lemma A.5 Under Assumptions 1-6, we have that (a) sup_t ‖Ṽ_t^{-1}‖ = O_P(1); and (b) sup_t ‖H_t‖^q = O_P(1) for any q > 0.
Our next result derives the convergence rates of H_t, H_t^{-1}, and Ṽ_t^{-1}. Given the identification condition in Assumption 1 a), we show that H_t converges to H_{0t} = diag(±1) at rate O(1/δ_{N,R}) uniformly over t = R, ..., T. A similar result holds for H_t^{-1} and for Ṽ_t^{-1}, which converge to H_{0t}^{-1} and to V_0^{-1} at rate O(1/δ_{N,R}) uniformly over t = R, ..., T, respectively.

Lemma A.6 Under Assumptions 1-6,

a) sup_t ‖H_t − H_{0t}‖ = sup_t ‖H_t^{-1} − H_{0t}^{-1}‖ = O_P(1/δ_{N,R}), where H_{0t} = diag(±1) and δ_{N,R} = min(√N, √R);

b) sup_t ‖Ṽ_t^{-1} − V_0^{-1}‖ = O_P(1/δ_{N,R}), where V_0 = lim_{N→∞} Λ'Λ/N > 0.
Next, we provide a result that is auxiliary in proving Lemma 4.1.

Lemma A.7 Under Assumptions 1 through 6,

a) sup_t ‖ (1/t) Σ_{s=1}^{t-1} A1_{s,t} u_{s+1} ‖ = O_P(1/R), whereas for j = 2, 3, 4, sup_t ‖ (1/t) Σ_{s=1}^{t-1} Aj_{s,t} u_{s+1} ‖ = O_P(1/√(RN));

b) sup_t ‖ (1/t) Σ_{s=1}^{t-1} A1_{s,t} z_s' ‖ = O_P(1/R); for j = 2, 3, sup_t ‖ (1/t) Σ_{s=1}^{t-1} Aj_{s,t} z_s' ‖ = O_P(1/√(RN)); and sup_t ‖ (1/t) Σ_{s=1}^{t-1} A4_{s,t} z_s' ‖ = O_P(1/δ²_{N,R}).
To prove Lemma 4.1, we need to introduce some additional notation. Let

B̂(t) = ( t^{-1} Σ_{s=1}^{t-1} z̃_{s,t} z̃_{s,t}' )^{-1}  and  V̂(t) = t^{-1} Σ_{s=1}^{t-1} z̃_{s,t} u_{s+1},

and note that y_{s+1} = z_s'β* + u_{s+1} with β* = (β₁*', 0')' under the null hypothesis. Moreover, adding and subtracting appropriately, it is also true that

y_{s+1} = z̃_{s,t}' Γ_t'^{-1} β* − ( z̃_{s,t} − Γ_t z_s )' Γ_t'^{-1} β* + u_{s+1} = z̃_{s,t}' Γ_t'^{-1} β* + u_{s+1},

given that ( z̃_{s,t} − Γ_t z_s )' Γ_t'^{-1} β* = 0 when β* = (β₁*', 0')' and there is no estimation error in the regressors of the benchmark model. Therefore,

β̂_t = β* + B̂(t) V̂(t)  and  β̈_t = β* + B(t) V(t),

which implies that

β̂_t − Γ_t'^{-1} β̈_t = B̂(t) V̂(t) − Γ_t'^{-1} B(t) V(t),

since Γ_t'^{-1} β* − β* = 0 under the null hypothesis. Thus, we can write

β̂_t − Γ_t'^{-1} β̈_t = Γ_t'^{-1} B(t) Γ_t^{-1} ( V̂(t) − Γ_t V(t) ) + ( B̂(t) − Γ_t'^{-1} B(t) Γ_t^{-1} ) V̂(t),

where the first term captures the factors estimation error in the score process and the second term captures the factors estimation error in the (inverse) Hessian.

The following result analyzes each term of β̂_t − Γ_t'^{-1} β̈_t.
Lemma A.8 Suppose Assumptions 1-6 hold. Then,

a) sup_t ‖ V̂(t) − Γ_t V(t) ‖ = O_P( 1/(√R δ_{N,R}) );

b) sup_t ‖ B̂(t) − Γ_t'^{-1} B(t) Γ_t^{-1} ‖ = O_P(1/δ²_{N,R});

c) sup_t ‖ V̂(t) ‖ = O_P(1/√T).
Our next result is useful to prove Lemma 4.2. Given (8), we can write

Σ_{t=R}^{T-1} u_{t+1} ( ü_{2,t+1} − û_{2,t+1} ) = Σ_{t=R}^{T-1} u_{t+1} ( z̃_{t,t} − Γ_t z_t )' Γ_t'^{-1} β̈_t + Σ_{t=R}^{T-1} u_{t+1} z̃_{t,t}' ( β̂_t − Γ_t'^{-1} β̈_t ).    (10)

We analyze each piece separately. The following lemma contains the results.

Lemma A.9 Suppose Assumptions 1-6 hold. If the null hypothesis holds and √T/N → 0, then

a) Σ_{t=R}^{T-1} u_{t+1} ( z̃_{t,t} − Γ_t z_t )' Γ_t'^{-1} β̈_t = o_P(1);

b) Σ_{t=R}^{T-1} u_{t+1} z̃_{t,t}' ( β̂_t − Γ_t'^{-1} β̈_t ) = o_P(1).

A.2 Proofs of auxiliary lemmas
Proof of Lemma A.1. We start with a). Under Assumption 2 c), we can write Σ_{s=1}^{t-1} γ_{l,s} u_{s+1} = Σ_{s=l−τ}^{l+τ} γ_{l,s} u_{s+1}, where we let γ_{l,s} ≡ 0 if min(l,s) ≤ 0 or max(l,s) > t. By Cauchy-Schwarz,

( Σ_{s=l−τ}^{l+τ} γ_{l,s} u_{s+1} )² ≤ ( Σ_{s=l−τ}^{l+τ} γ²_{l,s} ) ( Σ_{s=l−τ}^{l+τ} u²_{s+1} ) ≤ (2τ+1) max_{l,s} γ²_{l,s} Σ_{s=l−τ}^{l+τ} u²_{s+1} ≤ M Σ_{s=l−τ}^{l+τ} u²_{s+1}

for some constant M, given Assumption 2 c), implying that

sup_t (1/t) Σ_{l=1}^t ( Σ_{s=1}^{t-1} γ_{l,s} u_{s+1} )² ≤ M sup_t (1/t) Σ_{l=1}^t Σ_{s=l−τ}^{l+τ} u²_{s+1} ≤ M (1/R) Σ_{l=1}^{T-1} Σ_{s=l−τ}^{l+τ} u²_{s+1} = O_P(1),

provided E(u²_{s+1}) ≤ M, which follows under Assumption 6 a).

Next, consider part b). We have that

sup_t (1/t) Σ_{l=1}^t ( Σ_{s=1}^{t-1} ζ_{l,s} u_{s+1} )² = (T/N) sup_t (1/t) Σ_{l=1}^t ( (1/√(NT)) Σ_{s=1}^{t-1} Σ_{i=1}^N (e_{il}e_{is} − E(e_{il}e_{is})) u_{s+1} )²

≤ (T/N) (T/R) (1/T) Σ_{l=1}^T sup_t ( (1/√(NT)) Σ_{s=1}^{t-1} Σ_{i=1}^N (e_{il}e_{is} − E(e_{il}e_{is})) u_{s+1} )² = O_P(T/N),

under Assumption 3 b).

For c), since η_{l,s} = f_l' (1/N) Σ_{i=1}^N λ_i e_{is},

sup_t (1/t) Σ_{l=1}^t ( Σ_{s=1}^{t-1} η_{l,s} u_{s+1} )² ≤ (T/N) sup_t ( (1/t) Σ_{l=1}^t ‖f_l‖² ) sup_t ‖ (1/√(NT)) Σ_{s=1}^{t-1} Σ_{i=1}^N λ_i e_{is} u_{s+1} ‖² = O_P(T/N),

where both sup factors are O_P(1), given in particular Assumption 3 c).

Finally, for d), since Σ_{s=1}^{t-1} ξ_{l,s} u_{s+1} = ( (1/N) Σ_{i=1}^N λ_i e_{il} )' Σ_{s=1}^{t-1} f_s u_{s+1},

sup_t (1/t) Σ_{l=1}^t ( Σ_{s=1}^{t-1} ξ_{l,s} u_{s+1} )² ≤ (T/N) ( (1/R) Σ_{l=1}^T ‖ (1/√N) Σ_{i=1}^N λ_i e_{il} ‖² ) sup_t ‖ (1/√T) Σ_{s=1}^{t-1} f_s u_{s+1} ‖² = O_P(T/N),

where the first factor in parentheses is O_P(1) given Assumption 2 e) and the second is O_P(1) given Assumption 6 b).

Part e) follows exactly as the proof of a), given Assumption 2 c) and the fact that E‖z_s‖² ≤ M by Assumption 6 a).

Next, consider part f). We have that

sup_t (1/t) Σ_{l=1}^t ‖ Σ_{s=1}^{t-1} ζ_{l,s} z_s' ‖² ≤ (T/N) (T/R) (1/T) Σ_{l=1}^T sup_t ‖ (1/√(NT)) Σ_{s=1}^{t-1} Σ_{i=1}^N (e_{il}e_{is} − E(e_{il}e_{is})) z_s' ‖² = O_P(T/N),

under the analog of Assumption 3 b) with u_{s+1} replaced with z_s' (cf. Assumption 3 d)).

For g), proceeding as in c),

sup_t (1/t) Σ_{l=1}^t ‖ Σ_{s=1}^{t-1} η_{l,s} z_s' ‖² ≤ (T/N) sup_t ( (1/t) Σ_{l=1}^t ‖f_l‖² ) sup_t ‖ (1/√(NT)) Σ_{s=1}^{t-1} Σ_{i=1}^N λ_i e_{is} z_s' ‖² = O_P(T/N),

given in particular Assumption 3 e) (the analog of Assumption 3 c)).
Proof of Lemma A.2. Starting with a), by repeated application of the Cauchy-Schwarz inequality,

E ‖ (1/t) Σ_{l=1}^t f_l ( Σ_{s=1}^{t-1} γ_{l,s} u_{s+1} ) ‖⁴ ≤ (1/t) Σ_{l=1}^t E ( ‖f_l‖⁴ ( Σ_{s=1}^{t-1} γ_{l,s} u_{s+1} )⁴ ) ≤ ( (1/t) Σ_{l=1}^t E‖f_l‖⁸ )^{1/2} ( (1/t) Σ_{l=1}^t E ( Σ_{s=1}^{t-1} γ_{l,s} u_{s+1} )⁸ )^{1/2}.

The first term in parentheses is bounded given the moment conditions on f_l. For the second term, by the moving-average-type assumption on γ_{l,s}, we have that for given l,

E ( Σ_{s=1}^{t-1} γ_{l,s} u_{s+1} )⁸ = E ( Σ_{s=l−τ}^{l+τ} γ_{l,s} u_{s+1} )⁸ ≤ (2τ+1)⁷ Σ_{s=l−τ}^{l+τ} γ⁸_{l,s} E(u⁸_{s+1}) ≤ M,

given that |γ_{l,s}| ≤ M and E(u⁸_{s+1}) ≤ M by Assumptions 2 c) and 6 a).

For part b), following exactly the same steps as for a), we have that

(1/t) Σ_{l=1}^t E ( Σ_{s=1}^{t-1} ζ_{l,s} u_{s+1} )⁸ = (1/t) Σ_{l=1}^t E ( (1/N) Σ_{s=1}^{t-1} Σ_{i=1}^N (e_{il}e_{is} − E(e_{il}e_{is})) u_{s+1} )⁸

≤ (T/N)⁴ (T/R) (1/T) Σ_{l=1}^T sup_{R≤t≤T} E ( (1/√(TN)) Σ_{s=1}^{t-1} Σ_{i=1}^N (e_{il}e_{is} − E(e_{il}e_{is})) u_{s+1} )⁸,

which is O((T/N)⁴) given Assumption 4 a); combined with the bound on E‖f_l‖⁸, this yields the stated O(T²/N²) rate.

For part c),

sup_t E ‖ (1/t) Σ_{l=1}^t f_l ( Σ_{s=1}^{t-1} η_{l,s} u_{s+1} ) ‖⁴ = sup_t E ‖ ( (1/t) Σ_{l=1}^t f_l f_l' ) (1/N) Σ_{s=1}^{t-1} Σ_{i=1}^N λ_i e_{is} u_{s+1} ‖⁴

≤ (T²/N²) ( sup_l E‖f_l‖^{16} )^{1/2} ( sup_t E ‖ (1/√(NT)) Σ_{s=1}^{t-1} Σ_{i=1}^N λ_i e_{is} u_{s+1} ‖⁸ )^{1/2} = O(T²/N²),

given Assumptions 1 a) and 4 b).

Finally, for part d), since Σ_{s=1}^{t-1} ξ_{l,s} u_{s+1} = ( (1/N) Σ_{i=1}^N λ_i e_{il} )' Σ_{s=1}^{t-1} f_s u_{s+1}, we have that

sup_t E ‖ (1/t) Σ_{l=1}^t f_l ( Σ_{s=1}^{t-1} ξ_{l,s} u_{s+1} ) ‖⁴ ≤ (T²/N²) ( sup_l E‖f_l‖^{16} )^{1/4} ( sup_l E ‖ (1/√N) Σ_{i=1}^N λ_i e_{il} ‖^{16} )^{1/4} ( sup_t E ‖ (1/√T) Σ_{s=1}^{t-1} f_s u_{s+1} ‖⁸ )^{1/2} = O(T²/N²),

given Assumptions 1 a), 2 e), and 6 b).
Proof of Lemma A.3. For part a), note that by the definition of U_{t,T},

(1/T) Σ_{t=R}^T ‖U_{t,T}‖² ≤ (T/R)² sup_t ‖ (1/√T) Σ_{s=1}^{t-1} z_s u_{s+1} ‖² (1/T) Σ_{t=R}^T u²_{t+1} = O_P(1),

given Assumptions 6 a) and b).

For part b), it suffices to show that E | Σ_{t=R}^T ( (1/t) Σ_{l=1}^t f_l γ_{l,t} )' A U_{t,T} | = O(1). Using the triangle and Cauchy-Schwarz inequalities,

E | Σ_{t=R}^T ( (1/t) Σ_{l=1}^t f_l γ_{l,t} )' A U_{t,T} | ≤ ‖A‖ Σ_{t=R}^T E ( ‖U_{t,T}‖ (1/t) Σ_{l=1}^t ‖f_l‖ |γ_{l,t}| )

≤ ‖A‖ (1/R) Σ_{t=R}^T Σ_{l=1}^T ( E‖U_{t,T}‖² )^{1/2} ( E‖f_l‖² )^{1/2} |γ_{l,t}| ≤ ‖A‖ sup_t ( E‖U_{t,T}‖² )^{1/2} sup_l ( E‖f_l‖² )^{1/2} (1/R) Σ_{t=R}^T Σ_{l=1}^T |γ_{l,t}| = O(1),

provided (i) sup_t E‖U_{t,T}‖² = O(1), (ii) sup_l E‖f_l‖² = O(1), and (iii) (1/R) Σ_{t=R}^T Σ_{l=1}^T |γ_{l,t}| = O(1). (ii) follows by Assumption 1 a) and (iii) follows by Assumption 2 c). Next, we show that (i) follows under our assumptions. By the definition of U_{t,T},

sup_t E‖U_{t,T}‖² ≤ (T/R)² sup_t E ( u²_{t+1} ‖ (1/√T) Σ_{s=1}^{t-1} z_s u_{s+1} ‖² ) ≤ (T/R)² sup_t ( E(u⁴_{t+1}) )^{1/2} sup_t ( E ‖ (1/√T) Σ_{s=1}^{t-1} z_s u_{s+1} ‖⁴ )^{1/2},

implying that sup_{R≤t≤T} E‖U_{t,T}‖² = O(1), since sup_t E(u⁴_{t+1}) ≤ M and sup_t E ‖ (1/√T) Σ_{s=1}^{t-1} z_s u_{s+1} ‖⁴ ≤ M under Assumptions 6 a) and b).

For part c), we can write

Σ_{t=R}^T ( (1/t) Σ_{l=1}^t f_l ζ_{l,t} )' A U_{t,T} = Σ_{t=R}^T u_{t+1} W_{t,T},

where

W_{t,T} = (T/t) ( (1/(tN)) Σ_{l=1}^t Σ_{i=1}^N f_l (e_{il}e_{it} − E(e_{il}e_{it})) )' A ( (1/√T) Σ_{s=1}^{t-1} z_s u_{s+1} )

is measurable with respect to F^t. Thus, E(u_{t+1} W_{t,T} | F^t) = 0 given the martingale difference Assumption 5, and it follows that

Var ( Σ_{t=R}^T u_{t+1} W_{t,T} ) = Σ_{t=R}^T E ( u²_{t+1} W²_{t,T} ) ≤ ( Σ_{t=R}^T E(u⁴_{t+1}) )^{1/2} ( Σ_{t=R}^T E(W⁴_{t,T}) )^{1/2} = O(√T) O( (T/N²)^{1/2} ) = O(T/N),

which suffices to prove the result. Note that

sup_t E(W⁴_{t,T}) ≤ (T/R)⁴ ‖A‖⁴ ( sup_t E ‖ (1/√T) Σ_{s=1}^{t-1} z_s u_{s+1} ‖⁸ )^{1/2} ( sup_t E ‖ (1/(tN)) Σ_{l=1}^t Σ_{i=1}^N f_l (e_{il}e_{it} − E(e_{il}e_{it})) ‖⁸ )^{1/2} = O(1/N²),

where we have used Assumption 6 b) to bound the first expectation in parentheses; the second can be bounded by

(1/N⁴) ( sup_l E‖f_l‖^{16} )^{1/2} ( sup_{l,t} E | (1/√N) Σ_{i=1}^N (e_{il}e_{it} − E(e_{il}e_{it})) |^{16} )^{1/2} = O(1/N⁴),

given Assumptions 2 d) and 6 a).

Next, consider part d). Replacing η_{l,t} and U_{t,T} by their definitions, we can write

Σ_{t=R}^T ( (1/t) Σ_{l=1}^t f_l η_{l,t} )' A U_{t,T} = Σ_{t=R}^T ( ( (1/t) Σ_{l=1}^t f_l f_l' ) (1/N) Σ_{i=1}^N λ_i e_{it} )' A u_{t+1} (T/t) ( (1/√T) Σ_{s=1}^{t-1} z_s u_{s+1} ).

Adding and subtracting appropriately, we can decompose the term above as the sum of two terms,

Θ₁ = Σ_{t=R}^T ( ( (1/t) Σ_{l=1}^t f_l f_l' − I_r ) (1/N) Σ_{i=1}^N λ_i e_{it} )' A u_{t+1} (T/t) ( (1/√T) Σ_{s=1}^{t-1} z_s u_{s+1} )

and

Θ₂ = Σ_{t=R}^T ( (1/N) Σ_{i=1}^N λ_i e_{it} )' A u_{t+1} (T/t) ( (1/√T) Σ_{s=1}^{t-1} z_s u_{s+1} ).

We can bound ‖Θ₁‖ by

(T/R) ‖A‖ sup_t ‖ (1/t) Σ_{l=1}^t f_l f_l' − I_r ‖ sup_t ‖ (1/√T) Σ_{s=1}^{t-1} z_s u_{s+1} ‖ (1/√N) Σ_{t=R}^T |u_{t+1}| ‖ (1/√N) Σ_{i=1}^N λ_i e_{it} ‖ = O_P(√(T/N)),

given Assumptions 1 a) and b), 2 e), and 6 a). For Θ₂, we can write Θ₂ = Σ_{t=R}^T u_{t+1} Ω_{t,T}, where

Ω_{t,T} = (T/t) ( (1/N) Σ_{i=1}^N λ_i e_{it} )' A ( (1/√T) Σ_{s=1}^{t-1} z_s u_{s+1} )

is measurable with respect to F^t. Since E(u_{t+1} Ω_{t,T} | F^t) = 0 given the martingale difference Assumption 5, it follows that

Var ( Σ_{t=R}^T u_{t+1} Ω_{t,T} ) ≤ ( Σ_{t=R}^T E(u⁴_{t+1}) )^{1/2} ( Σ_{t=R}^T E(Ω⁴_{t,T}) )^{1/2} = O(√T) O( (T/N²)^{1/2} ) = O(T/N),

where

sup_t E(Ω⁴_{t,T}) ≤ (T/R)⁴ ‖A‖⁴ (1/N²) ( sup_t E ‖ (1/√T) Σ_{s=1}^{t-1} z_s u_{s+1} ‖⁸ )^{1/2} ( sup_t E ‖ (1/√N) Σ_{i=1}^N λ_i e_{it} ‖⁸ )^{1/2} = O(1/N²),

given our assumptions. This proves that ‖Θ₂‖ = O_P(√(T/N)) and concludes the proof of part d).

Finally, to prove part e), replacing ξ_{l,t} and U_{t,T} by their definitions, we can write

Σ_{t=R}^T ( (1/t) Σ_{l=1}^t f_l ξ_{l,t} )' A U_{t,T} = Σ_{t=R}^T u_{t+1} f_t' ς_{t,T},

where

ς_{t,T} = (T/t) ( (1/t) Σ_{l=1}^t ( (1/N) Σ_{i=1}^N λ_i e_{il} ) f_l' ) A ( (1/√T) Σ_{s=1}^{t-1} z_s u_{s+1} )

is measurable with respect to F^t. Since E(u_{t+1} f_t' ς_{t,T} | F^t) = 0, we can use the above argument to show that Var( Σ_{t=R}^T u_{t+1} f_t' ς_{t,T} ) = O(T/N), given our assumptions.
Proof of Lemma A.4. Part a). For j = 1, 2, 3, 4, let

I_j ≡ (1/P) Σ_{t=R}^{T-1} ‖Aj_{t,t}‖², with Aj_{t,t} = (1/t) Σ_{l=1}^t f̃_{l,t} δ_{j,l,t},

where δ_{1,l,t} = γ_{l,t}, δ_{2,l,t} = ζ_{l,t}, δ_{3,l,t} = η_{l,t}, and δ_{4,l,t} = ξ_{l,t}. We consider each term separately. Starting with the first term, by Cauchy-Schwarz,

I₁ = (1/P) Σ_{t=R}^{T-1} ‖ (1/t) Σ_{l=1}^t f̃_{l,t} γ_{l,t} ‖² ≤ (1/P) Σ_{t=R}^{T-1} ( (1/t) Σ_{l=1}^t ‖f̃_{l,t}‖² ) ( (1/t) Σ_{l=1}^t γ²_{l,t} ) ≤ (r/(PR)) Σ_{t=1}^T Σ_{l=1}^T γ²_{l,t} = O(1/T),

under Assumption 2 c) and given that T/R = O(1). Next,

I₂ ≤ (r/(PR)) Σ_{t=1}^T Σ_{l=1}^T ζ²_{l,t} = O_P(1/N),

given Assumption 2 d). I₃ and I₄ follow similarly, using the fact that

(1/(PR)) Σ_{t=1}^T Σ_{l=1}^T η²_{l,t} = (1/(PR)) Σ_{t=1}^T Σ_{l=1}^T ξ²_{l,t} = O_P(1/N).

Part b). The proof follows the same argument as part a) and is therefore omitted.
Proof of Lemma A.5. We start with (a). Recall that Ṽ_t denotes the eigenvalue matrix of X_t X_t'/(tN) for t = R, ..., T, with its eigenvalues arranged in non-increasing order: ṽ_{1,t} ≥ ṽ_{2,t} ≥ ... ≥ ṽ_{r,t}. It follows that Ṽ_t^{-1} = diag(ṽ_{j,t}^{-1}), where ṽ_{1,t}^{-1} ≤ ṽ_{2,t}^{-1} ≤ ... ≤ ṽ_{r,t}^{-1}. Hence, ‖Ṽ_t^{-1}‖² = tr(Ṽ_t^{-2}) = Σ_{j=1}^r ṽ_{j,t}^{-2} ≤ r ṽ_{r,t}^{-2}, where ṽ_{r,t} is the r-th largest eigenvalue of X_t X_t'/(tN). Since

sup_t ‖Ṽ_t^{-1}‖ ≤ r^{1/2} sup_t ṽ_{r,t}^{-1} = r^{1/2} / inf_t |ṽ_{r,t}|,

it suffices to show that inf_t |ṽ_{r,t}| ≥ C > 0 for some constant C, with probability approaching 1. We use an argument similar to that used by Fan et al. (2013; cf. Lemma C.4). First, note that for s = 1, ..., t, where t = R, ..., T, x_s = Λ f_s + e_s, where x_s = (x_{1s}, ..., x_{Ns})' is N × 1 and a similar definition applies to e_s. This implies that the N × N covariance matrix of x_s is equal to

Σ_x ≡ E(x_s x_s') = Λ E(f_s f_s') Λ' + E(e_s e_s') = ΛΛ' + Σ_e, or equivalently, Σ_x/N = ΛΛ'/N + Σ_e/N,

given that E(f_s e_{is}) = 0 (by Assumption 3 a)) and the identification condition in Assumption 1. Given this assumption, the r × r matrix Λ'Λ/N is diagonal with distinct and positive elements on the main diagonal. This implies that its eigenvalues are all positive and distinct for j = 1, ..., r. Since Λ'Λ and ΛΛ' share the same nonzero eigenvalues, it follows that ΛΛ' has r nonzero eigenvalues, with the remaining eigenvalues equal to 0. By Proposition 1 of Fan et al. (2013) (which relies on Weyl's eigenvalue theorem), and given Assumption 1, we can then show that for j = 1, ..., r, the difference between the j-th eigenvalue of Σ_x/N and the j-th eigenvalue of ΛΛ'/N is bounded by the maximum eigenvalue of Σ_e/N; that is,

| λ_j(Σ_x/N) − λ_j(ΛΛ'/N) | ≤ (1/N) ‖Σ_e‖₁ = O(1/N) = o(1),

given Assumption 2 b). By a second application of Weyl's theorem (cf. Fan et al., 2013, Lemma 1), for each j = 1, ..., r,

| ṽ_{j,t} − λ_j(Σ_x/N) | ≤ ‖ X_t'X_t/(tN) − Σ_x/N ‖,

given that X_t'X_t and X_t X_t' share the same nonzero eigenvalues. Thus,

ṽ_{r,t} ≥ λ_r(ΛΛ'/N) − | λ_r(Σ_x/N) − λ_r(ΛΛ'/N) | − ‖ X_t'X_t/(tN) − Σ_x/N ‖,

and since | λ_j(Σ_x/N) − λ_j(ΛΛ'/N) | = o(1), it follows that

inf_t |ṽ_{r,t}| ≥ λ_min(Λ'Λ/N)/2 − sup_t ‖ X_t'X_t/(tN) − Σ_x/N ‖,

and hence it suffices to show that sup_t ‖ X_t'X_t/(tN) − Σ_x/N ‖ = o_P(1). But X_t'X_t/(tN) − Σ_x/N = D_{1t} + D_{2t} + D_{3t} + D_{4t}, where

D_{1t} = Λ ( (1/t) Σ_{s=1}^t f_s f_s' − I_r ) Λ'/N,  D_{2t} = (1/N) ( (1/t) Σ_{s=1}^t e_s e_s' − Σ_e ),  D_{3t} = (1/(Nt)) Λ Σ_{s=1}^t f_s e_s',  and D_{4t} = D_{3t}',

where we recall that E(f_s f_s') = I_r by assumption. It follows that

sup_t ‖ X_t'X_t/(tN) − Σ_x/N ‖₂ ≤ sup_t ‖D_{1t}‖ + sup_t ‖D_{2t}‖ + 2 sup_t ‖D_{3t}‖,

since the spectral norm ‖A‖₂ is bounded by the Frobenius norm ‖A‖ for any matrix A (see Horn and Johnson, 1985, p. 314). Starting with D_{1t}, we have that

sup_t ‖D_{1t}‖ ≤ ‖Λ/√N‖² sup_t ‖ (1/t) Σ_{s=1}^t (f_s f_s' − I_r) ‖ ≤ C (T/R) sup_t ‖ (1/T) Σ_{s=1}^t (f_s f_s' − I_r) ‖

for some constant C, where ‖Λ/√N‖² = tr(Λ'Λ/N) = O(1). Since T/R = O(1) by assumption, it follows that sup_t ‖D_{1t}‖ = o_P(1), given that sup_t ‖ (1/T) Σ_{s=1}^t (f_s f_s' − I_r) ‖ = O_P(1/√T) = o_P(1) by Assumption 1. Next, consider D_{2t}. We have that for any ε > 0,

P ( sup_t ‖D_{2t}‖ > ε ) ≤ Σ_{t=R}^T P ( ‖D_{2t}‖ > ε ) ≤ (1/ε²) Σ_{t=R}^T E ‖D_{2t}‖²,

thus it suffices to prove that Σ_{t=R}^T E ‖D_{2t}‖² = o(1). By the definition of the Frobenius norm, ‖A‖² = tr(A'A) = Σ_{i,j=1}^N a²_{ij} for any N × N matrix. Thus, for given t = R, ..., T,

E ‖D_{2t}‖² = (1/N²) Σ_{i,j=1}^N E ( (1/t) Σ_{s=1}^t (e_{is}e_{js} − E(e_{is}e_{js})) )² = (1/N²) (1/t²) Σ_{i,j=1}^N Σ_{s₁=1}^t Σ_{s₂=1}^t cov(e_{is₁}e_{js₁}, e_{is₂}e_{js₂})

≤ (T/R²) (1/N) ( (1/N) Σ_{i,j=1}^N (1/T) Σ_{s₁=1}^T Σ_{s₂=1}^T | cov(e_{is₁}e_{js₁}, e_{is₂}e_{js₂}) | ),

which implies that

Σ_{t=R}^T E ‖D_{2t}‖² ≤ (T/R)² (1/N) ( (1/N) Σ_{i,j=1}^N (1/T) Σ_{s₁=1}^T Σ_{s₂=1}^T | cov(e_{is₁}e_{js₁}, e_{is₂}e_{js₂}) | ) = O(1/N) = o(1),

given in particular Assumption 2 f). Finally, for D_{3t} = D_{4t}', we have that

‖D_{3t}‖² ≤ ‖Λ/√N‖² (1/N) Σ_{i=1}^N ‖ (1/t) Σ_{s=1}^t f_s e_{is} ‖² ≤ C (T/R)² (1/R) (1/N) Σ_{i=1}^N ‖ (1/√T) Σ_{s=1}^t f_s e_{is} ‖² = O_P(1/R) = o_P(1),

given Assumption 3 a) and the fact that T/R = O(1).

To prove (b), we prove the result for q = 2 and note that this implies the result for any q > 0, since

sup_t ‖H_t‖^q = ( sup_t ‖H_t‖² )^{q/2} = O_P(1)

if sup_t ‖H_t‖² = O_P(1). By definition, for t = R, ..., T, H_t = Ṽ_t^{-1} (F̃_t'F_t/t) (Λ'Λ/N), implying that

‖H_t‖ ≤ ‖Ṽ_t^{-1}‖ ‖F̃_t'F_t/t‖ ‖Λ'Λ/N‖,

by the fact that ‖AB‖ ≤ ‖A‖ ‖B‖. Thus,

sup_t ‖H_t‖² ≤ sup_t ‖Ṽ_t^{-1}‖² sup_t ‖F̃_t'F_t/t‖² ‖Λ'Λ/N‖²,

where the first factor is O_P(1) given part a) and ‖Λ'Λ/N‖² = O(1) under Assumption 1 c). Thus, it suffices to show that sup_t ‖F̃_t'F_t/t‖² = O_P(1). By the Cauchy-Schwarz inequality,

‖F̃_t'F_t/t‖² = ‖ (1/t) Σ_{s=1}^t f̃_{s,t} f_s' ‖² ≤ ( (1/t) Σ_{s=1}^t ‖f̃_{s,t}‖² ) ( (1/t) Σ_{s=1}^t ‖f_s‖² ) = r (1/t) Σ_{s=1}^t ‖f_s‖²,

implying that

sup_t ‖F̃_t'F_t/t‖² ≤ r (T/R) (1/T) Σ_{s=1}^T ‖f_s‖² = O_P(1),

since sup_s E‖f_s‖² = O(1) by Assumption 1 a).
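Weyl's eigenvalue inequality, used twice in the proof above, states that perturbing a symmetric matrix moves each ordered eigenvalue by at most the spectral norm of the perturbation. A quick numerical illustration on assumed random matrices (all names ours):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
A = rng.standard_normal((n, n)); A = (A + A.T) / 2          # symmetric base
E = rng.standard_normal((n, n)); E = (E + E.T) / 2 * 0.01   # small symmetric perturbation

lam_A = np.sort(np.linalg.eigvalsh(A))        # ordered eigenvalues of A
lam_AE = np.sort(np.linalg.eigvalsh(A + E))   # ordered eigenvalues of A + E
spec_norm_E = np.linalg.norm(E, 2)            # spectral norm of E
max_shift = np.max(np.abs(lam_AE - lam_A))    # Weyl: <= spec_norm_E
```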
Proof of Lemma A.6. For part a), we follow the proof of result (2) in Bai and Ng (2013; cf. Appendix B, p. 27). In particular, by the definition of H_t, we can show that for t = R, ..., T,

H_t H_t' = I_r + C_{1t} + C_{2t} + C_{3t},

where

C_{1t} = (1/t) Σ_{s=1}^t f̃_{s,t} ( H_t f_s − f̃_{s,t} )',  C_{2t} = (1/t) Σ_{s=1}^t ( H_t f_s − f̃_{s,t} ) f_s' H_t',  and  C_{3t} = H_t ( I_r − (1/t) Σ_{s=1}^t f_s f_s' ) H_t'.

It follows that

sup_t ‖C_{1t}‖ ≤ sup_t { ( (1/t) Σ_{s=1}^t ‖f̃_{s,t}‖² )^{1/2} ( (1/t) Σ_{s=1}^t ‖f̃_{s,t} − H_t f_s‖² )^{1/2} } = r^{1/2} sup_t ( (1/t) Σ_{s=1}^t ‖f̃_{s,t} − H_t f_s‖² )^{1/2} = O_P(1/δ_{N,R}),

given Theorem 4.1. Similarly,

sup_t ‖C_{2t}‖ ≤ sup_t ( (1/t) Σ_{s=1}^t ‖H_t f_s‖² )^{1/2} sup_t ( (1/t) Σ_{s=1}^t ‖f̃_{s,t} − H_t f_s‖² )^{1/2} = O_P(1/δ_{N,R}),

by Theorem 4.1 and the fact that the first factor can be bounded by

sup_t ( (1/t) Σ_{s=1}^t ‖H_t f_s‖² )^{1/2} ≤ sup_t ‖H_t‖ ( (1/R) Σ_{s=1}^T ‖f_s‖² )^{1/2} = O_P(1),

given Lemma A.5 b). Finally, for C_{3t} we have that

sup_t ‖C_{3t}‖ ≤ sup_t ‖H_t‖² sup_t ‖ (1/t) Σ_{s=1}^t f_s f_s' − I_r ‖ = O_P(1/√T),

given Assumption 1 b) and the fact that T/R = O(1). This shows that H_t H_t' = I_r + O_P(1/δ_{N,R}), where the remainder is uniform over t = R, ..., T. We can now follow the argument in Bai and Ng (2013) to show that H_t = H_{0t} + O_P(1/δ_{N,R}) uniformly over t = R, ..., T, where H_{0t} is a diagonal matrix with ±1 on the main diagonal. Note that we are not assuming exactly the same identifying restrictions as Bai and Ng's (2013) condition PC1. The difference is that we assume that Σ_f = I_r instead of assuming that F_t'F_t/t = I_r for each t. This explains why we get only the rate 1/δ_{N,R} instead of 1/δ²_{N,R}.

For part b), note that we can show that

Ṽ_t = Λ'Λ/N + O_P(1/δ_{N,R})

by exactly the same argument as Bai and Ng (2013, p. 27). Given our Assumption 1 b), we can write

Ṽ_t = V_0 + O_P(1/δ_{N,R})    (11)

uniformly over t. Since Ṽ_t^{-1} − V_0^{-1} = Ṽ_t^{-1} ( V_0 − Ṽ_t ) V_0^{-1}, it follows that

sup_t ‖ Ṽ_t^{-1} − V_0^{-1} ‖ ≤ sup_t ‖Ṽ_t^{-1}‖ sup_t ‖Ṽ_t − V_0‖ ‖V_0^{-1}‖ = O_P(1) O_P(1/δ_{N,R}) O(1),

given Lemma A.5 a) and (11).
Proof of Lemma A.7. Part a). We rely on Lemma A.1. In particular, for j = 1, ..., 4,

‖ (1/t) Σ_{s=1}^{t-1} Aj_{s,t} u_{s+1} ‖ = ‖ (1/t²) Σ_{l=1}^t f̃_{l,t} ( Σ_{s=1}^{t-1} δ_{j,l,s} u_{s+1} ) ‖ ≤ (1/t) ( (1/t) Σ_{l=1}^t ‖f̃_{l,t}‖² )^{1/2} ( (1/t) Σ_{l=1}^t ( Σ_{s=1}^{t-1} δ_{j,l,s} u_{s+1} )² )^{1/2}

= (√r/t) ( (1/t) Σ_{l=1}^t ( Σ_{s=1}^{t-1} δ_{j,l,s} u_{s+1} )² )^{1/2},

where δ_{j,l,s} is one of γ_{l,s}, ζ_{l,s}, η_{l,s}, ξ_{l,s}, depending on j = 1, 2, 3, 4. By Lemma A.1 a) through d), it then follows that

sup_t ‖ (1/t) Σ_{s=1}^{t-1} A1_{s,t} u_{s+1} ‖ ≤ (√r/R) sup_t ( (1/t) Σ_{l=1}^t ( Σ_{s=1}^{t-1} γ_{l,s} u_{s+1} )² )^{1/2} = O_P(1/R),

whereas it is O_P( (1/R) √(T/N) ) = O_P(1/√(RN)) for j = 2, 3, 4, given that T/R = O(1).

Part b). The same arguments apply by relying on Lemma A.1 e), f), and g). For the last term, involving A4_{s,t}, the analogue of Lemma A.1 d) when we replace u_{s+1} with z_s' does not hold, because we cannot claim that sup_t ‖ (1/√T) Σ_{s=1}^{t-1} f_s z_s' ‖² = O_P(1), since E(f_s z_s') = ( E(f_s w_s'), E(f_s f_s') ) ≠ 0. Thus, we need a more refined analysis. Replacing A4_{s,t} with its definition yields

sup_t ‖ (1/t) Σ_{s=1}^{t-1} A4_{s,t} z_s' ‖ = sup_t ‖ ( (1/t) Σ_{l=1}^t f̃_{l,t} ( (1/N) Σ_{i=1}^N λ_i e_{il} )' ) ( (1/t) Σ_{s=1}^{t-1} f_s z_s' ) ‖

≤ sup_t ‖ (1/t) Σ_{l=1}^t f̃_{l,t} ( (1/N) Σ_{i=1}^N λ_i e_{il} )' ‖ sup_t ‖ (1/t) Σ_{s=1}^{t-1} f_s z_s' ‖,

where sup_t ‖ (1/t) Σ_{s=1}^{t-1} f_s z_s' ‖ is bounded under Assumption 6. By adding and subtracting appropriately, we can write

sup_t ‖ (1/t) Σ_{l=1}^t f̃_{l,t} ( (1/N) Σ_{i=1}^N λ_i e_{il} )' ‖ ≤ sup_t ‖ (1/t) Σ_{l=1}^t ( f̃_{l,t} − H_t f_l ) ( (1/N) Σ_{i=1}^N λ_i e_{il} )' ‖ + sup_t ‖H_t‖ sup_t ‖ (1/t) Σ_{l=1}^t f_l ( (1/N) Σ_{i=1}^N λ_i e_{il} )' ‖.

The first term can be shown to be O_P(1/δ_{N,R}) O_P(1/√N), given Theorem 4.1 and the fact that (1/T) Σ_{l=1}^T ‖ (1/√N) Σ_{i=1}^N λ_i e_{il} ‖² = O_P(1) under our assumptions. Thus, this term is O_P(1/δ²_{N,R}). For the second term, given that sup_t ‖H_t‖ = O_P(1), it suffices to note that

sup_t ‖ (1/t) Σ_{l=1}^t f_l ( (1/N) Σ_{i=1}^N λ_i e_{il} )' ‖ ≤ (√T/R) (1/√N) sup_t ‖ (1/√(TN)) Σ_{l=1}^t Σ_{i=1}^N f_l λ_i' e_{il} ‖ = O_P(1/√(TN)),

which follows under Assumption 3 e). This concludes the proof of part b).
Proof of Lemma A.8. Part a). Given the equality (9) and the definitions of $\hat V(t)$ and $V(t)$, we have that
\[
\hat V(t) - \Phi_t V(t) = \frac1t \sum_{s=1}^{t-1} \left( \tilde z_{s,t} - \Phi_t z_s \right) u_{s+1}
= \frac1t \sum_{s=1}^{t-1} \begin{pmatrix} 0 \\ \tilde f_{s,t} - H_t f_s \end{pmatrix} u_{s+1},
\]
which implies that
\[
\sup_t \left\| \hat V(t) - \Phi_t V(t) \right\|
= \sup_t \left\| \frac1t \sum_{s=1}^{t-1} \left( \tilde f_{s,t} - H_t f_s \right) u_{s+1} \right\|
\le \sup_t \left\| \tilde V_t^{-1} \right\| \sum_{j=1}^{4} \sup_t \left\| \frac1t \sum_{s=1}^{t-1} A_{js,t}\, u_{s+1} \right\|
= O_P\!\left( \frac1R \right) + O_P\!\left( \frac{1}{\sqrt R \sqrt N} \right)
= O_P\!\left( \frac{1}{\sqrt R\, \delta_{N,R}} \right),
\]
by Lemmas A.5 and A.7.
Part b). Given the equality $A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}$ and the facts that $\sup_t \| \hat B(t) \| = O_P(1)$ and $\sup_t \| B(t) \| = O_P(1)$, it suffices to consider
\[
\hat B^{-1}(t) - \Phi_t B^{-1}(t) \Phi_t'
= \frac1t \sum_{s=1}^{t-1} \left( \tilde z_{s,t} \tilde z_{s,t}' - \Phi_t z_s z_s' \Phi_t' \right)
= F_{1t} + F_{2t} + F_{2t}',
\]
where
\[
F_{1t} = \frac1t \sum_{s=1}^{t-1} \left( \tilde z_{s,t} - \Phi_t z_s \right) \left( \tilde z_{s,t} - \Phi_t z_s \right)'
\quad \text{and} \quad
F_{2t} = \frac1t \sum_{s=1}^{t-1} \left( \tilde z_{s,t} - \Phi_t z_s \right) z_s' \Phi_t'.
\]
By the triangle inequality and Theorem 4.1,
\[
\| F_{1t} \| \le \frac1t \sum_{s=1}^{t-1} \left\| \tilde z_{s,t} - \Phi_t z_s \right\|^2
= \frac1t \sum_{s=1}^{t-1} \left\| \tilde f_{s,t} - H_t f_s \right\|^2
= O_P\!\left( \frac{1}{\delta_{N,T}^2} \right).
\]
For $F_{2t}$, note that
\[
F_{2t} = \frac1t \sum_{s=1}^{t-1} \left( \tilde z_{s,t} - \Phi_t z_s \right) z_s' \Phi_t'
= \begin{pmatrix} 0 \\ \frac1t \sum_{s=1}^{t-1} \left( \tilde f_{s,t} - H_t f_s \right) z_s' \end{pmatrix} \Phi_t',
\]
where $\tilde f_{s,t} - H_t f_s = \tilde V_t^{-1} \left( A_{1s,t} + A_{2s,t} + A_{3s,t} + A_{4s,t} \right)$. Thus,
\[
F_{2t} = \begin{pmatrix} 0 \\ \tilde V_t^{-1} \frac1t \sum_{s=1}^{t-1} \left( A_{1s,t} + A_{2s,t} + A_{3s,t} + A_{4s,t} \right) z_s' \end{pmatrix} \Phi_t',
\]
which given Lemma A.7 is bounded in norm by
\[
\sup_t \| F_{2t} \|
\le \sup_t \left\| \tilde V_t^{-1} \right\| \sup_t \| \Phi_t \| \sum_{j=1}^{4} \sup_t \left\| \frac1t \sum_{s=1}^{t-1} A_{js,t}\, z_s' \right\|
= O_P\!\left( \frac1R \right) + O_P\!\left( \frac{1}{\sqrt{R N}} \right) + O_P\!\left( \frac{1}{\delta_{N,R}^2} \right),
\]
which is $O_P\!\left( 1/\delta_{N,R}^2 \right)$.
Part c). We have that
\[
\sup_t \left\| \hat V(t) \right\|
\le \sup_t \left\| \hat V(t) - \Phi_t V(t) \right\| + \sup_t \| \Phi_t \| \sup_t \| V(t) \|,
\]
where $\sup_t \| V(t) \| = O_P\!\left( 1/\sqrt T \right)$ given Assumption 6.b).
Proof of Lemma A.9. Part a). Given that $\ddot\beta_t = B(t) V(t)$, we can write
\[
\sum_{t=R}^{T-1} u_{t+1} \left( \tilde z_{t,t} - \Phi_t z_t \right)' \Phi_t'^{-1} \ddot\beta_t
= \sum_{t=R}^{T-1} u_{t+1} \left( \tilde z_{t,t} - \Phi_t z_t \right)' \Phi_t'^{-1} B(t) V(t)
= A_1 + A_2,
\]
where
\[
A_1 = \sum_{t=R}^{T-1} u_{t+1} \left( \tilde z_{t,t} - \Phi_t z_t \right)' \Phi_t^{0\prime -1} B\, V(t),
\qquad
A_2 = \sum_{t=R}^{T-1} u_{t+1} \left( \tilde z_{t,t} - \Phi_t z_t \right)' \left( \Phi_t'^{-1} B(t) - \Phi_t^{0\prime -1} B \right) V(t).
\]
Let $B = \left( E(z_s z_s') \right)^{-1}$ and note that by Assumption 6, $\sup_t \| B(t) - B \| = O_P\!\left( 1/\sqrt T \right)$. By the triangle and Cauchy-Schwarz inequalities,
\[
| A_2 |
\le \sup_t \left\| \Phi_t'^{-1} B(t) - \Phi_t^{0\prime -1} B \right\| \sup_t \| V(t) \|\;
T \left( \frac1T \sum_{t=1}^{T} u_{t+1}^2 \right)^{1/2} \left( \frac1T \sum_{t=1}^{T} \left\| \tilde z_{t,t} - \Phi_t z_t \right\|^2 \right)^{1/2}
= O_P\!\left( \frac{1}{\delta_{N,T}} \right) O_P\!\left( \frac{1}{\sqrt T} \right) O_P(T)\, O_P(1)\, O_P\!\left( \frac{1}{\delta_{N,T}} \right)
= O_P\!\left( \frac{\sqrt T}{\delta_{N,T}^2} \right) = o_P(1)
\]
if $\sqrt T / N \to 0$, where we have used Theorem 4.1 to bound $\frac1T \sum_{t=1}^{T} \| \tilde z_{t,t} - \Phi_t z_t \|^2$, and where (1) $\sup_t \left\| \Phi_t'^{-1} B(t) - \Phi_t^{0\prime -1} B \right\| = O_P(1/\delta_{N,T})$, (2) $\sup_t \| V(t) \| = O_P\!\left( 1/\sqrt T \right)$, (3) $\frac1T \sum_{t=1}^{T} u_{t+1}^2 = O_P(1)$ hold under our assumptions. To see (1), note that
\[
\sup_t \left\| \Phi_t'^{-1} B(t) - \Phi_t^{0\prime -1} B \right\|
\le \sup_t \left\| \Phi_t'^{-1} - \Phi_t^{0\prime -1} \right\| \sup_t \| B(t) \|
+ \sup_t \left\| \Phi_t^{0\prime -1} \right\| \sup_t \| B(t) - B \|
= O_P(1/\delta_{N,T}) + O_P\!\left( 1/\sqrt T \right) = O_P(1/\delta_{N,T}),
\]
given Assumption 6 and the fact that $\sup_t \left\| \Phi_t'^{-1} - \Phi_t^{0\prime -1} \right\|$ is bounded by $\sup_t \left\| H_t^{-1} - H_t^{0 -1} \right\|$, which is $O_P(1/\delta_{N,T})$ by Lemma A.6. (2) and (3) follow easily under Assumption 6. For $A_1$, note that given that
\[
\tilde z_{t,t} - \Phi_t z_t = \begin{pmatrix} 0 \\ \tilde f_{t,t} - H_t f_t \end{pmatrix}
\quad \text{and} \quad
\Phi_t^0 = \mathrm{diag}\left( I_{k_1}, H_t^0 \right), \text{ with } H_t^0 = \mathrm{diag}(\pm 1),
\]
we can write
\[
\left( \tilde z_{t,t} - \Phi_t z_t \right)' \Phi_t^{0\prime -1} B = \left( \tilde f_{t,t} - H_t f_t \right)' H_t^0 C,
\]
where $C$ is an $r \times (r + k_1)$ submatrix of $B$ containing its last $r$ rows. It follows that
\[
A_1 = \sum_{t=R}^{T-1} u_{t+1} \left( \tilde f_{t,t} - H_t f_t \right)' H_t^0 C V(t)
= \frac{1}{\sqrt T} \sum_{t=R}^{T-1} \left( \tilde f_{t,t} - H_t f_t \right)' H_t^0 C \underbrace{\sqrt T\, V(t)\, u_{t+1}}_{U_{t,T}},
\]
where we note that $U_{t,T}$ is a martingale difference sequence array with respect to $\mathcal F^t$ given Assumption 5. Given the decomposition (9), we can write
\[
\tilde f_{t,t} - H_t f_t
= V_0^{-1} \left( A_{1t,t} + A_{2t,t} + A_{3t,t} + A_{4t,t} \right)
+ \left( \tilde V_t^{-1} - V_0^{-1} \right) \left( A_{1t,t} + A_{2t,t} + A_{3t,t} + A_{4t,t} \right),
\]
which implies that we can decompose $A_1$ into the sum of $A_{1.1}$ and $A_{1.2}$ with
\[
A_{1.1} = \frac{1}{\sqrt T} \sum_{t=R}^{T-1} \left( A_{1t,t} + A_{2t,t} + A_{3t,t} + A_{4t,t} \right)' V_0^{-1} H_t^0 C U_{t,T},
\]
\[
A_{1.2} = \frac{1}{\sqrt T} \sum_{t=R}^{T-1} \left( A_{1t,t} + A_{2t,t} + A_{3t,t} + A_{4t,t} \right)' \left( \tilde V_t^{-1} - V_0^{-1} \right) H_t^0 C U_{t,T}.
\]
We can prove that $\| A_{1.2} \| = O_P\!\left( \sqrt T / \delta_{NT}^2 \right) = o_P(1)$ if $\sqrt T / N \to 0$, given that $\sup_t \left\| \tilde V_t^{-1} - V_0^{-1} \right\| = O_P(1/\delta_{NT})$ by Lemma A.6; that $\frac1T \sum_{t=R}^{T-1} \| A_{jt,t} \|^2 = O_P\!\left( 1/\delta_{NT}^2 \right)$ for all j = 1, ..., 4 by Lemma A.4 a); and given that $\frac1T \sum_{t=R}^{T-1} \| U_{t,T} \|^2 = O_P(1)$ by Lemma A.3 a). So, we concentrate on $A_{1.1}$. We can decompose
\[
A_{1.1} = R_1 + R_2 + R_3 + R_4 \equiv \sum_{j=1}^{4} R_j
\]
with obvious definitions. Let $\gamma_{1,l,t} = \gamma_{l,t}$, $\gamma_{2,l,t} = \zeta_{l,t}$, $\gamma_{3,l,t} = \eta_{l,t}$, and $\gamma_{4,l,t} = \xi_{l,t}$. Given the definition of $A_{jt,t}$ for j = 1, ..., 4, we can write
\[
A_{jt,t} = \frac1t \sum_{l=1}^{t} \tilde f_{l,t}\, \gamma_{j,l,t}
= H_t^0 \frac1t \sum_{l=1}^{t} f_l\, \gamma_{j,l,t}
+ \frac1t \sum_{l=1}^{t} \left( \tilde f_{l,t} - H_t^0 f_l \right) \gamma_{j,l,t}, \tag{12}
\]
which implies that
\[
R_j = \frac{1}{\sqrt T} \sum_{t=R}^{T-1} \left( \frac1t \sum_{l=1}^{t} f_l\, \gamma_{j,l,t} \right)' V_0^{-1} C U_{t,T}
+ \frac{1}{\sqrt T} \sum_{t=R}^{T-1} \left( \frac1t \sum_{l=1}^{t} \left( \tilde f_{l,t} - H_t^0 f_l \right) \gamma_{j,l,t} \right)' V_0^{-1} H_t^0 C U_{t,T}
\equiv R_{1j} + R_{2j},
\]
where we have used the fact that $V_0$ is diagonal under our assumptions and $H_t^0 = \mathrm{diag}(\pm 1)$ to write $H_t^{0\prime} V_0^{-1} H_t^0 = V_0^{-1}$ when defining $R_{1j}$. By Lemma A.3, we can conclude that $R_{11} = O_P\!\left( 1/\sqrt T \right)$, whereas $R_{1j} = O_P\!\left( 1/\sqrt N \right)$ for j = 2, 3, 4. For $R_{2j}$, note that
\[
\| R_{2j} \|
\le \sqrt T \left\| V_0^{-1} \right\| \sup_t \| H_t^0 \|\, \frac1T \sum_{t=R}^{T-1} \| U_{t,T} \| \left\| \frac1t \sum_{l=1}^{t} \left( \tilde f_{l,t} - H_t^0 f_l \right) \gamma_{j,l,t} \right\|
\le \sqrt T \left\| V_0^{-1} \right\| \sup_t \| H_t^0 \| \left( \frac1T \sum_{t=R}^{T-1} \| U_{t,T} \|^2 \right)^{1/2}
\sup_t \left( \frac1t \sum_{l=1}^{t} \left\| \tilde f_{l,t} - H_t^0 f_l \right\|^2 \right)^{1/2}
\left( \frac{1}{T R} \sum_{t=1}^{T} \sum_{l=1}^{T} \gamma_{j,l,t}^2 \right)^{1/2}.
\]
Note that
\[
\frac{1}{T R} \sum_{t=1}^{T} \sum_{l=1}^{T} \gamma_{l,t}^2 = O(1/T)
\]
under Assumption 2 c),
\[
\frac{1}{P R} \sum_{t=1}^{T} \sum_{l=1}^{T} \zeta_{l,t}^2
= \frac1N \cdot \frac{1}{P R} \sum_{t=1}^{T} \sum_{l=1}^{T} \left( \frac{1}{\sqrt N} \sum_{i=1}^{N} \left( e_{il} e_{it} - E(e_{il} e_{it}) \right) \right)^2 = O_P(1/N)
\]
given Assumption 2 d), and
\[
\frac{1}{P R} \sum_{t=1}^{T} \sum_{l=1}^{T} \eta_{l,t}^2
= \frac{1}{P R} \sum_{t=1}^{T} \sum_{l=1}^{T} \left( f_l' \frac1N \sum_{i=1}^{N} \lambda_i e_{it} \right)^2
\le \frac1N \left( \frac1R \sum_{l=1}^{T} \| f_l \|^2 \right) \frac1P \sum_{t=1}^{T} \left\| \frac{1}{\sqrt N} \sum_{i=1}^{N} \lambda_i e_{it} \right\|^2 = O_P(1/N),
\]
given Assumption 2 e) and the fact that $E \| f_l \|^2 = O(1)$. Moreover,
\[
\sup_t \frac1t \sum_{l=1}^{t} \left\| \tilde f_{l,t} - H_t^0 f_l \right\|^2
= \sup_t \frac1t \sum_{l=1}^{t} \left\| \tilde f_{l,t} - H_t f_l + \left( H_t - H_t^0 \right) f_l \right\|^2
\le 2 \sup_t \frac1t \sum_{l=1}^{t} \left\| \tilde f_{l,t} - H_t f_l \right\|^2
+ 2 \sup_t \left\| H_t - H_t^0 \right\|^2 \frac1R \sum_{l=1}^{T} \| f_l \|^2
= O_P\!\left( 1/\delta_{N,R}^2 \right) + O_P\!\left( 1/\delta_{N,R}^2 \right) = O_P\!\left( 1/\delta_{N,R}^2 \right),
\]
by Lemma A.6 and Theorem 4.1. Since $\frac1T \sum_{t=R}^{T-1} \| U_{t,T} \|^2 = O_P(1)$ by Lemma A.3 a), it follows that
\[
\| R_{2j} \| = O_P\!\left( \frac{1}{\delta_{N,T}} \right) \text{ when } j = 1,
\]
whereas for $j \ge 2$,
\[
\| R_{2j} \| = O_P\!\left( \frac{\sqrt T}{\sqrt N\, \delta_{N,T}} \right),
\]
which is $O_P\!\left( 1/\sqrt N \right)$ if $\delta_{N,T} = \sqrt T$ and $O_P\!\left( \sqrt T / N \right)$ if $\delta_{N,T} = \sqrt N$. Thus,
\[
A_{1.1} = O_P\!\left( \frac{1}{\sqrt T} \right) + O_P\!\left( \frac{1}{\sqrt N} \right)
+ O_P\!\left( \frac{1}{\delta_{N,T}} \right) + O_P\!\left( \frac{\sqrt T}{\sqrt N\, \delta_{N,T}} \right)
= o_P(1)
\]
if $\sqrt T / N \to 0$, which concludes part a).
Part b). Given that
\[
\hat\beta_t - \Phi_t'^{-1} \ddot\beta_t
= \hat B(t) \hat V(t) - \Phi_t'^{-1} B(t) V(t)
= \left( \hat B(t) - \Phi_t'^{-1} B(t) \Phi_t^{-1} \right) \hat V(t)
+ \Phi_t'^{-1} B(t) \Phi_t^{-1} \left( \hat V(t) - \Phi_t V(t) \right),
\]
we can write
\[
\sum_{t=R}^{T-1} u_{t+1} \tilde z_{t,t}' \left( \hat\beta_t - \Phi_t'^{-1} \ddot\beta_t \right) = B_1 + B_2,
\]
where
\[
B_1 = \sum_{t=R}^{T-1} u_{t+1} \tilde z_{t,t}' \Phi_t'^{-1} B(t) \Phi_t^{-1} \left( \hat V(t) - \Phi_t V(t) \right)
\quad \text{and} \quad
B_2 = \sum_{t=R}^{T-1} u_{t+1} \tilde z_{t,t}' \left( \hat B(t) - \Phi_t'^{-1} B(t) \Phi_t^{-1} \right) \hat V(t).
\]
We can easily show that $B_2 = O_P\!\left( \sqrt T / \delta_{NT}^2 \right) = o_P(1)$ if $\sqrt T / N \to 0$, given in particular Lemma A.8 b) and c). For $B_1$, note that we can show that under our assumptions,
\[
B_1 = \sum_{t=R}^{T-1} u_{t+1} z_t' B\, \Phi_t^0 \left( \hat V(t) - \Phi_t V(t) \right) + O_P\!\left( \frac{\sqrt T}{\delta_{NT}^2} \right).
\]
This follows by first replacing $\tilde z_{t,t}$ with $\Phi_t z_t + \left( \tilde z_{t,t} - \Phi_t z_t \right)$ and then replacing $\Phi_t'^{-1} B(t) \Phi_t^{-1}$ with $B \Phi_t^0$. The extra terms can be shown to be $O_P\!\left( \sqrt T / \delta_{NT}^2 \right)$ by applying Theorem 4.1 and Lemma A.8. Notice that we can write
\[
\hat V(t) - \Phi_t V(t)
= \frac1t \sum_{s=1}^{t-1} \left( \tilde z_{s,t} - \Phi_t z_s \right) u_{s+1}
= J \frac1t \sum_{s=1}^{t-1} \left( \tilde f_{s,t} - H_t f_s \right) u_{s+1}
= J \tilde V_t^{-1} \left( \frac1t \sum_{s=1}^{t-1} \sum_{j=1}^{4} A_{js,t}\, u_{s+1} \right),
\qquad
J \equiv \begin{pmatrix} 0_{k_1 \times r} \\ I_{r \times r} \end{pmatrix}.
\]
With this notation,
\[
C \equiv \sum_{t=R}^{T-1} u_{t+1} z_t' B \Phi_t^0 J \tilde V_t^{-1} \left( \frac1t \sum_{s=1}^{t-1} \sum_{j=1}^{4} A_{js,t}\, u_{s+1} \right) = C_1 + C_2,
\]
where
\[
C_1 = \sum_{t=R}^{T-1} u_{t+1} z_t' B \Phi_t^0 J V_0^{-1} \left( \frac1t \sum_{s=1}^{t-1} \sum_{j=1}^{4} A_{js,t}\, u_{s+1} \right)
\]
and
\[
C_2 = \sum_{t=R}^{T-1} u_{t+1} z_t' B \Phi_t^0 J \left( \tilde V_t^{-1} - V_0^{-1} \right) \left( \frac1t \sum_{s=1}^{t-1} \sum_{j=1}^{4} A_{js,t}\, u_{s+1} \right).
\]
We can easily show that $C_2 = O_P\!\left( \sqrt T / \delta_{NT}^2 \right)$ by the usual inequalities, given Lemmas A.6 b) and A.7 a) and b). Consider $C_1$. We have that $C_1 = \sum_{j=1}^{4} S_j$, where
\[
S_j = \sum_{t=R}^{T-1} u_{t+1} z_t' \Omega_t \left( \frac1t \sum_{s=1}^{t-1} A_{js,t}\, u_{s+1} \right),
\qquad \Omega_t \equiv B \Phi_t^0 J V_0^{-1}.
\]
Given the definition of $A_{js,t}$, by adding and subtracting appropriately (see (12)), we can write $S_j = S_{1j} + S_{2j}$, where
\[
S_{1j} = \sum_{t=R}^{T-1} u_{t+1} z_t' \Omega_t H_t^0 \frac1t \sum_{s=1}^{t-1} \frac1t \sum_{l=1}^{t} f_l\, \gamma_{j,l,s}\, u_{s+1}
\]
and
\[
S_{2j} = \sum_{t=R}^{T-1} u_{t+1} z_t' \Omega_t \frac1t \sum_{s=1}^{t-1} \frac1t \sum_{l=1}^{t} \left( \tilde f_{l,t} - H_t^0 f_l \right) \gamma_{j,l,s}\, u_{s+1}.
\]
It follows that
\[
| S_{2j} |
\le \sup_t \| \Omega_t \|\, \frac1R \sum_{t=R}^{T-1} \| u_{t+1} z_t \|\,
\sup_t \left\| \frac1t \sum_{l=1}^{t} \left( \tilde f_{l,t} - H_t^0 f_l \right) \frac1t \sum_{s=1}^{t-1} \gamma_{j,l,s}\, u_{s+1} \right\|
\le \sup_t \| \Omega_t \|\, \frac1R \sum_{t=R}^{T-1} \| u_{t+1} z_t \|\,
\sup_t \left( \frac1t \sum_{l=1}^{t} \left\| \tilde f_{l,t} - H_t^0 f_l \right\|^2 \right)^{1/2}
\left( \sup_t \frac1t \sum_{l=1}^{t} \left( \frac1t \sum_{s=1}^{t-1} \gamma_{j,l,s}\, u_{s+1} \right)^2 \right)^{1/2}.
\]
By Lemma A.7, the last factor is $O_P(1)$ when j = 1, which implies that $| S_{2j} | = O_P(1/\delta_{NT})$ for this value of j. For j = 2, 3, and 4, Lemma A.1 implies that the last factor is $O_P\!\left( \sqrt{T/N} \right)$, which implies that $| S_{2j} | = O_P(1/\delta_{NT})\, O_P\!\left( \sqrt{T/N} \right) = o_P(1)$ if $\sqrt T / N \to 0$. Hence, we conclude that
\[
C_1 = \sum_{j=1}^{4} S_{1j} + o_P(1),
\]
where we can write
\[
S_{1j} = \frac1T \sum_{t=R}^{T-1} s_{j,t+1}
\]
with $s_{j,t+1} \equiv u_{t+1} z_t' Z_{jt}$ and $Z_{jt} \equiv T\, \Omega_t H_t^0\, \frac1t \sum_{s=1}^{t-1} \frac1t \sum_{l=1}^{t} f_l\, \gamma_{j,l,s}\, u_{s+1}$. Noting that $Z_{jt}$ is a function of $(z_t, z_{t-1}, \ldots, u_t, u_{t-1}, \ldots, e_t, e_{t-1}, \ldots)$ and that this information set is identical to $\mathcal F^t = \sigma(z_t, z_{t-1}, \ldots, y_t, y_{t-1}, \ldots, X_t, X_{t-1}, \ldots)$, given that $y_{t+1} = z_t' \beta + u_{t+1}$ and that $X_t = \Lambda f_t + e_t$, it follows that $E\left( s_{j,t+1} \mid \mathcal F^t \right) = 0$ by Assumption 5. Thus,
\[
\mathrm{Var}(S_{1j})
= \frac{1}{T^2} \sum_{t=R}^{T-1} E\left( s_{j,t+1}^2 \right)
= \frac{1}{T^2} \sum_{t=R}^{T-1} E\left[ \left( u_{t+1} z_t' Z_{jt} \right)^2 \right]
\le \frac{1}{T^2} \sum_{t=R}^{T-1} E\left[ \left\| u_{t+1} z_t' \right\|^2 \| Z_{jt} \|^2 \right]
\le \frac{1}{T^2} \sum_{t=R}^{T-1} \left( E \left\| u_{t+1} z_t' \right\|^4 \right)^{1/2} \left( E \| Z_{jt} \|^4 \right)^{1/2}
\le \frac1T \sup_t \left( E \| Z_{jt} \|^4 \right)^{1/2} \frac1T \sum_{t=R}^{T-1} \left( E \left\| u_{t+1} z_t' \right\|^4 \right)^{1/2}.
\]
We can check that $\sup_t E \| Z_{1t} \|^4 = O(1)$ and $\sup_t E \| Z_{jt} \|^4 = O\!\left( (T/N)^2 \right)$ for j = 2, 3, 4 by Lemma A.2. Since $E \left\| u_{t+1} z_t' \right\|^4 \le M$ by Assumption 6, it follows that $\mathrm{Var}(S_{1j})$ is equal to $O(1/T)$ when j = 1 and $O(1/N)$ when j = 2, 3, or 4. This shows that $S_{1j} = o_P(1)$ and concludes the proof of part b).
A.3 Proof of results in Section 4

Proof of Theorem 4.1. Part a). By (9), it follows that
\[
\frac1P \sum_{t=R}^{T-1} \left\| \tilde f_{t,t} - H_t f_t \right\|^2
= \frac1P \sum_{t=R}^{T-1} \left\| \tilde V_t^{-1} \left( A_{1t,t} + A_{2t,t} + A_{3t,t} + A_{4t,t} \right) \right\|^2
\le 4 \sup_t \left\| \tilde V_t^{-1} \right\|^2
\left( \frac1P \sum_{t=R}^{T-1} \| A_{1t,t} \|^2 + \frac1P \sum_{t=R}^{T-1} \| A_{2t,t} \|^2
+ \frac1P \sum_{t=R}^{T-1} \| A_{3t,t} \|^2 + \frac1P \sum_{t=R}^{T-1} \| A_{4t,t} \|^2 \right),
\]
where $\sup_t \left\| \tilde V_t^{-1} \right\|^2 = O_P(1)$ by Lemma A.5. The result now follows by Lemma A.4.a), which shows that the first term inside the parentheses is $O_P(1/T)$, whereas the remaining terms are $O_P(1/N)$.
Part b). The proof follows exactly the same reasoning as part a), given Lemma A.4.b).
Proof of Lemma 4.1. The proof is immediate given Lemma A.8.
Proof of Lemma 4.2. The proof is immediate given (10) and Lemma A.9.
Proof of Theorem 4.2. The proof follows from Lemma 2.1, given Lemmas 4.1 and 4.2 and the
arguments described in the main text.
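As a rough numerical illustration of the rate in Theorem 4.1, the following sketch simulates a factor model and computes the average squared discrepancy between the recursively estimated factors and a rotation of the true ones. The one-factor DGP, the sample sizes, and all variable names here are illustrative assumptions, not the paper's simulation design.

```python
import numpy as np

# Illustrative check of Theorem 4.1 (assumed one-factor DGP, not the
# paper's design): X_t = lambda * f_t + e_t, with the factor re-estimated
# by principal components on each recursive window t = R, ..., T.
rng = np.random.default_rng(0)
N, T, R = 100, 200, 100
f = rng.standard_normal(T)                  # latent factor
lam = rng.standard_normal(N)                # loadings
X = np.outer(f, lam) + rng.standard_normal((T, N))

P = T - R
gap = 0.0
for t in range(R, T):
    Xt = X[:t]                              # data available at time t
    # PC estimator: f_tilde = sqrt(t) times the leading eigenvector of
    # X X' / (t N); np.linalg.eigh returns eigenvalues in ascending order.
    _, vecs = np.linalg.eigh(Xt @ Xt.T / (t * N))
    f_tilde = np.sqrt(t) * vecs[:, -1]
    # H_t: least-squares rotation (a scalar here) of f onto f_tilde,
    # which also fixes the arbitrary sign of the eigenvector.
    H = (f_tilde @ f[:t]) / (f[:t] @ f[:t])
    gap += (f_tilde[-1] - H * f[t - 1]) ** 2

avg_gap = gap / P  # sample analogue of the left-hand side of Theorem 4.1 a)
```

Under this design the average squared gap should be small, on the order of 1/min(N, R), and shrinking it further requires growing both the cross-section and the estimation window, consistent with the theorem.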
References
[1] Andreou, Elena, Ghysels, Eric, and Andros Kourtellos (2013), “Should Macroeconomic Forecasters Use Daily Financial Data?” Journal of Business and Economic Statistics 31, 1-12.
[2] Bai, Jennie (2010), “Equity Premium Predictions with Adaptive Macro Indices,” manuscript.
[3] Bai, Jushan (2003), “Inferential Theory for Factor Models of Large Dimensions,” Econometrica
71, 135-171.
[4] Bai, Jushan (2009), “Panel Data Models with Interactive Fixed Effects,” Econometrica 77, 1229-1279.
[5] Bai, Jushan, and Kunpeng Li (2012), “Statistical Analysis of Factor Models of High Dimension,”
Annals of Statistics 40, 436-465.
[6] Bai, Jushan, and Serena Ng (2006), “Confidence Intervals for Diffusion Index Forecasts and Inference with Factor-Augmented Regressions,” Econometrica 74, 1133-1155.
[7] Bai, Jushan, and Serena Ng (2013), “Principal Components Estimation and Identification of Static Factors,” Journal of Econometrics 176, 178-189.
[8] Batje, Fabian, and Lukas Menkhoff (2012), “Macro Determinants of U.S. Stock Market Risk
Premia in Bull and Bear Markets,” manuscript.
[9] Cakmakli, Cem, and Dick van Dijk (2010), “Getting the Most out of Macroeconomic Information
for Predicting Stock Returns and Volatility,” manuscript.
[10] Chamberlain, Gary, and Michael Rothschild (1983), “Arbitrage, Factor Structure and Mean-Variance Analysis in Large Asset Markets,” Econometrica 51, 1305-1324.
[11] Ciccarelli, Matteo, and Benoit Mojon (2010), “Global Inflation,” Review of Economics and Statistics
92, 524-535.
[12] Clark, Todd E., and Michael W. McCracken (2001), “Tests of Equal Forecast Accuracy and
Encompassing for Nested Models,” Journal of Econometrics 105, 85-110.
[13] Clark, Todd E., and Michael W. McCracken (2005), “Evaluating Direct Multistep Forecasts,”
Econometric Reviews 24, 369-404.
[14] Clark, Todd E., and Michael W. McCracken (2012), “Reality Checks and Comparisons of Nested
Predictive Models,” Journal of Business and Economic Statistics 30, 53-66.
[15] Diebold, Francis X., and Roberto S. Mariano (1995), “Comparing Predictive Accuracy,”Journal
of Business and Economic Statistics 13, 253-263.
[16] Fan, Jianqing, Yuan Liao, and Martina Mincheva (2013), “Large Covariance Estimation by Thresholding Principal Orthogonal Complements,” Journal of the Royal Statistical Society: Series B 75, 603-680.
[17] Fosten, Jack (2015), “Forecast Evaluation with Factor-Augmented Models,” University of Surrey, manuscript.
[18] Gonçalves, Silvia, and Benoit Perron (2014), “Bootstrapping Factor-Augmented Regression Models,” Journal of Econometrics 182, 156-173.
[19] Han, Xu, and Atsushi Inoue (2015), “Tests of Parameter Instability in Dynamic Factor Models,”
Econometric Theory, forthcoming.
[20] Hansen, Peter Reinhard, and Allan Timmermann (2015), “Equivalence Between Out-of-Sample
Forecast Comparisons and Wald Statistics,” Econometrica, forthcoming.
[21] Hjalmarsson, Erik (2010), “Predicting Global Stock Returns,” Journal of Financial and Quantitative Analysis 45, 49-80.
[22] Hofmann, Boris (2009), “Do Monetary Indicators Lead Euro Area Inflation?” Journal of International Money and Finance 28, 1165-1181.
[23] Horn, Roger A., and Charles R. Johnson (1985), Matrix Analysis, Cambridge University Press,
Cambridge UK.
[24] Kelly, Brian, and Seth Pruitt (2013), “The Three-Pass Regression Filter: A New Approach to
Forecasting With Many Predictors,” Journal of Econometrics 186, 294-316.
[25] Li, Jia, and Andrew J. Patton (2015), “Asymptotic Inference about Predictive Accuracy using
High Frequency Data,” manuscript.
[26] Ludvigson, Sydney, and Serena Ng (2009), “A Factor Analysis of Bond Risk Premia,” Handbook
of Empirical Economics and Finance, A. Ullah and D. Giles Eds., 313-372, Chapman and Hall.
[27] McCracken, Michael W. (2007), “Asymptotics for Out-of-Sample Tests of Granger Causality,”
Journal of Econometrics 140, 719-752.
[28] McCracken, Michael W., and Serena Ng (2015), “FRED-MD: A Monthly Database for Macroeconomic Research,” manuscript.
[29] Pagan, Adrian (1984), “Econometric Issues in the Analysis of Regressions with Generated Regressors,” International Economic Review 25, 183-209.
[30] Pagan, Adrian (1986), “Two Stage and Related Estimators and Their Applications,” Review of
Economic Studies 53, 517-538.
[31] Randles, Ronald H. (1982), “On the Asymptotic Normality of Statistics with Estimated Parameters,” Annals of Statistics 10, 463-474.
[32] Shintani, Mototsugu (2005), “Nonlinear Forecasting Analysis Using Diffusion Indexes: An Application to Japan,” Journal of Money, Credit and Banking 37, 517-538.
[33] Stock, James H., and Mark W. Watson (1996), “Evidence on Structural Instability in Macroeconomic Time Series Relations,” Journal of Business and Economic Statistics, 14, 11-30.
[34] Stock, James H., and Mark W. Watson (2002), “Macroeconomic Forecasting Using Diffusion
Indexes,” Journal of Business and Economic Statistics 20, 147-162.
[35] Stock, James H., and Mark W. Watson (2003), “Forecasting Output and Inflation: The Role of
Asset Prices,” Journal of Economic Literature 41, 788-829.
[36] Stock, James H., and Mark W. Watson (2007), “Has U.S. Inflation Become Harder to Forecast?”
Journal of Money, Credit, and Banking 39, 3-33.
[37] West, Kenneth D. (1996), “Asymptotic Inference About Predictive Ability,” Econometrica 64,
1067-1084.
Table 1: Size and power for MSE-F and ENC-F (%)

The table reports empirical rejection rates, in percent, of the MSE-F and ENC-F tests at the 5% nominal level. Rows are indexed by the cross-section size N ∈ {50, 100, 150, 200} within blocks for T ∈ {50, 100, 200}; the upper block sets b = 0 (size) and the lower block b = 0.3 (power). For each value of π = P/R ∈ {0.2, 1, 2}, columns report the tests based on the factors f (ENC-Ff, MSE-Ff) and on the estimated factors f˜ (ENC-Ff˜, MSE-Ff˜).

Empirical size is within a few percentage points of the 5% nominal level in all designs, with the largest distortions (up to about 8.6%) at the smallest T combined with π = 0.2 and near-nominal rejection rates (roughly 4.9% to 5.4%) at T = 200 with π = 2; the f and f˜ versions of each test differ by at most a few tenths of a percentage point. Power under b = 0.3 is essentially flat in N but increases strongly with T and π, ranging from roughly 40% at (T = 50, π = 0.2) to above 99% at (T = 200, π = 2), with factor estimation (f˜) costing at most one to two percentage points of power.
Table 2: Factor-based forecasting of the equity premium

Calendar time, 1985:01 - 2014:12
Model       Ratio   MSE-F    ENC-F
2           0.98    5.52*    5.49*
2, 6        0.99    4.82*    4.87*
2, 8        0.99    4.71*    5.41*
2, 6, 8     0.99    4.22*    4.79*
2, 4        0.99    3.56*    3.47*
2, 3        0.99    3.42*    3.39*
2, 4, 8     0.99    3.28*    4.21*
2, 7        0.99    3.15*    4.11*
2, 3, 6     0.99    3.12*    3.50*
2, 3, 8     0.99    2.97*    2.86*

Calendar time, 2005:01 - 2014:12
Model       Ratio   MSE-F    ENC-F
2, 8        0.95    5.73*    7.14*
2, 6, 8     0.96    5.51*    6.21*
2, 5, 8     0.96    5.46*    8.05*
2, 5, 6, 8  0.96    5.26*    7.21*
2, 4, 8     0.96    4.66*    5.99*
2, 4, 5, 8  0.96    4.45*    7.43*
2           0.97    4.33*    6.89*
2, 5        0.97    3.99*    6.88*
2, 6        0.97    3.86*    6.58*
2, 4, 6, 8  0.97    3.56*    8.32*

Data-release time, 1985:01 - 2014:12
Model       Ratio   MSE-F    ENC-F
2, 7        0.99    2.47*    3.08*
2, 6, 7     1.00    1.51     2.69
2           1.00    1.47     2.25*
2, 7, 8     1.00    0.94     2.77
2, 6        1.00    0.07     1.87
2, 6, 7, 8  1.00    0.07     2.41
7           1.00   -0.09     0.80
2, 3, 7     1.00   -0.10     2.50
2, 4, 7     1.00   -0.13     2.28
2, 8        1.00   -0.15     1.95

Data-release time, 2005:01 - 2014:12
Model       Ratio   MSE-F    ENC-F
2, 8        0.98    2.51*    2.17*
2, 4, 8     0.98    2.36*    2.48*
2, 7, 8     0.98    2.21*    1.79*
2, 4, 7, 8  0.98    2.15*    2.09*
2           0.98    2.03*    1.77*
2, 4        0.98    1.83*    2.05*
2, 7        0.99    1.68*    1.38*
2, 4, 7     0.99    1.58     1.66*
2, 3, 8     0.99    1.26     1.90*
2, 3, 4, 8  0.99    1.17     2.21*
Note: The table reports empirical evidence regarding the use of factor-based models for predicting the equity premium at the one-month horizon. Only the top 10 models, by MSE, are reported. The top panel uses factor estimates dated by calendar time, while the lower panel uses factor estimates dated by data-release time. Model denotes which factors are included in a given model. Ratio denotes MSE(factor)/MSE(benchmark). An asterisk next to the value of the MSE-F or ENC-F statistic denotes significance at the 5% level.
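The two statistics reported in the tables can be computed directly from the out-of-sample forecast errors of the nested pair of models, using the standard Clark and McCracken (2001) / McCracken (2007) formulas. The sketch below applies them to synthetic forecast errors; the names u1 and u2 are our own, not from the paper.

```python
import numpy as np

# MSE-F and ENC-F from out-of-sample forecast errors (standard
# Clark-McCracken formulas; synthetic errors for illustration).
rng = np.random.default_rng(1)
P = 360                                       # number of out-of-sample forecasts
u1 = rng.standard_normal(P)                   # errors of the restricted (benchmark) model
u2 = 0.9 * u1 + 0.3 * rng.standard_normal(P)  # errors of the factor-augmented model

mse1 = np.mean(u1 ** 2)
mse2 = np.mean(u2 ** 2)
mse_f = P * (mse1 - mse2) / mse2              # MSE-F: large positive values favor the larger model
enc_f = P * np.mean(u1 * (u1 - u2)) / mse2    # ENC-F: tests whether the benchmark encompasses it
```

Both statistics have the nonstandard limiting distributions tabulated in Clark and McCracken (2001) and McCracken (2007); the paper's results show that the same critical values remain valid when the factors are estimated, provided √T/N → 0.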