Research Division
Federal Reserve Bank of St. Louis
Working Paper Series

Tests of Equal Accuracy for Nested Models with Estimated Factors

Sílvia Gonçalves, Michael W. McCracken, and Benoit Perron

Working Paper 2015-025A
http://research.stlouisfed.org/wp/2015/2015-025.pdf

September 2015

FEDERAL RESERVE BANK OF ST. LOUIS
Research Division
P.O. Box 442
St. Louis, MO 63166

The views expressed are those of the individual authors and do not necessarily reflect official positions of the Federal Reserve Bank of St. Louis, the Federal Reserve System, or the Board of Governors. Federal Reserve Bank of St. Louis Working Papers are preliminary materials circulated to stimulate discussion and critical comment. References in publications to Federal Reserve Bank of St. Louis Working Papers (other than an acknowledgment that the writer has had access to unpublished material) should be cleared with the author or authors.

Tests of Equal Accuracy for Nested Models with Estimated Factors

Sílvia Gonçalves, Michael W. McCracken, and Benoit Perron

September 14, 2015

Abstract

In this paper we develop asymptotics for tests of equal predictive ability between nested models when factor-augmented regression models are used to forecast. We provide conditions under which the estimation of the factors does not affect the asymptotic distributions developed in Clark and McCracken (2001) and McCracken (2007). This enables researchers to use the existing tabulated critical values when conducting inference. As an intermediate result, we derive the asymptotic properties of the principal components estimator over recursive windows. We provide simulation evidence on the finite sample effects of factor estimation and apply the tests to the case of forecasting excess returns to the S&P 500 Composite Index.
JEL Nos.: C12, C32, C38, C52
Keywords: factor model, out-of-sample forecasts, recursive estimation

* Sílvia Gonçalves: Western University, Economics Department, SSC 4058, 1151 Richmond Street N., London, Ontario, Canada, N6A 5C2. Email: sgoncal9@uwo.ca.
† Michael W. McCracken: Federal Reserve Bank of St. Louis, P.O. Box 442, St. Louis, MO 63166, U.S.A. Tel: 314-444-8594. Email: michael.w.mccracken@stls.frb.org.
‡ Benoit Perron: Département de sciences économiques, CIREQ and CIRANO, Université de Montréal, C.P. 6128, succ. Centre-Ville, Montréal, QC, H3C 3J7, Canada. Tel: (514) 343-2126. Email: benoit.perron@umontreal.ca.

1 Introduction

Factors are now a common, parsimonious approach to incorporating many potential predictors into a predictive regression model. Examples that estimate factors using a large collection of macroeconomic variables include works by Stock and Watson (2002), who use factors to predict U.S. inflation and industrial production; Ludvigson and Ng (2009), who use factors to predict bond risk premia; Hofmann (2009), who uses factors to predict euro-area inflation; Shintani (2005), who uses factors to predict Japanese inflation; and Ciccarelli and Mojon (2010), who use factors to predict global inflation. Examples that estimate factors using a large collection of financial variables include works by Stock and Watson (2003), who use factors to predict a variety of macroeconomic variables; Kelly and Pruitt (2013), who use factors to predict stock returns; and Andreou, Ghysels, and Kourtellos (2013), who use factors to predict U.S. real GDP growth. In many of these examples, the predictive content of the factors is evaluated using pseudo out-of-sample methods.
Specifically, the usefulness of the factor is evaluated by comparing the mean squared forecast error (MSE) of a model that includes factors with one that does not include factors. Typically, the comparison is between a baseline autoregressive model and a comparable model that has been augmented with estimated factors. As such, the models being compared are nested under the null hypothesis of equal accuracy and the theoretical results in West (1996), based on non-nested comparisons, may not apply. McCracken (2007) and Clark and McCracken (2001, 2005) show that, for forecasts from nested models, the distributions of tests for equal forecast accuracy and encompassing are usually not asymptotically normal or chi-square. For one-step-ahead forecasts, tables are provided in Clark and McCracken (2001) and McCracken (2007) and improved upon by Hansen and Timmermann (2015). For longer horizons, a bootstrap approach is developed in Clark and McCracken (2012).

The existing literature on out-of-sample methods for nested models is largely silent on situations where generated regressors (and, in particular, factors) are used for forecasting.[1] In most cases, the asymptotic distribution of the test of predictive ability is derived assuming that the predictors are observed and do not themselves need to be estimated prior to forecasting. In the context of in-sample methods, generated regressors have been shown to sometimes affect the asymptotic distribution of a statistic, typically through the standard errors. Examples include works by Pagan (1984, 1986) and Randles (1982). Of particular interest here, Gonçalves and Perron (2014) prove that in some instances an asymptotic bias is also introduced when factors are estimated rather than known.

[1] Li and Patton (2013) establish the first-order validity of the asymptotics in Clark and McCracken (2001, 2005) when the dependent variable, not the regressors, is generated. Fosten (2015) considers the non-nested case with estimated factors.
Building on McCracken's (2007) and Clark and McCracken's (2001, 2005) results, we examine the asymptotic and finite sample properties of tests of equal forecast accuracy and encompassing applied to predictions from nested models when generated factors are used as predictors. Specifically, we establish conditions under which the asymptotic distributions of the MSE-F and ENC-F tests of predictive ability continue to hold even when the factors are estimated recursively prior to forecasting. Throughout we focus on one-step-ahead forecasts made by linear regression models estimated by ordinary least squares (OLS). Because our asymptotics depend on the relative sizes of the number of cross-sectional units (N), the total number of observations (T), and the number of predictions (P), we provide a collection of Monte Carlo simulations designed to examine the size and power of the tests as we vary these parameters. In most, but not all, of our simulations the presence of estimated factors leads to only minor size distortions, yielding rejection frequencies akin to those found in the existing literature when the predictors are observed rather than estimated. Nevertheless, we find that the estimation of factors reduces power relative to the case where factors are observed. Finally, to illustrate how the tests perform in practical settings, the paper concludes with a reexamination of the factor-based forecasting of excess stock returns.

The rest of the paper is structured as follows. Section 2 introduces notation and describes the setup. Section 3 provides assumptions and a discussion of these. Section 4 contains our main theoretical results, including some asymptotic results on the estimation of factors over recursive samples which may be of independent interest. Section 5 presents Monte Carlo results on the finite sample size and power of the tests with an emphasis on the degree of distortions induced by factor estimation error.
Section 6 applies the tests to forecasting excess stock returns using a large cross-section of macroeconomic aggregates and financial data. Finally, Section 7 concludes.

Before we proceed, we define some notation. For any matrix $A$, we let $\|A\| = \mathrm{tr}^{1/2}(A'A)$ denote the Frobenius norm and $\|A\|_1 = \lambda_{\max}^{1/2}(A'A)$ the spectral norm. For $R = T - P$, we let $\sup_t$ denote $\sup_{R \le t \le T}$.

2 Setup and statistics of interest

2.1 Setup

Consider a benchmark one-step-ahead predictive regression model given by
\[
y_{t+1} = w_t'\beta + u_{1,t+1}, \quad t = 1, \ldots, T-1, \tag{1}
\]
where $w_t$ is a $k_1 \times 1$ vector of predictors for the target variable $y_{t+1}$. We would like to compare the forecast accuracy of this model with that of an alternative model given by
\[
y_{t+1} = w_t'\beta + f_t'\gamma + u_{2,t+1} \equiv z_t'\delta + u_{2,t+1}, \quad t = 1, \ldots, T-1, \tag{2}
\]
where $\delta = (\beta', \gamma')'$ and $z_t = (w_t', f_t')'$ is a $k_2 \times 1$ set of predictors, with $k_2 = k_1 + r$. In this paper, the $r$ additional predictors $f_t$ are latent and denote a vector of $r$ common factors underlying a panel factor model
\[
x_{it} = \lambda_i' f_t + e_{it}, \quad t = 1, \ldots, T; \; i = 1, \ldots, N. \tag{3}
\]

We base our comparison of the two models on their ability to forecast $y$ one step ahead. To do so, we divide the total sample of $T = R + P$ observations into in-sample and out-of-sample portions. The in-sample observations span 1 to $R$, whereas the out-of-sample observations span $R+1$ through $T$, for a total of $P$ one-step-ahead predictions. For each forecast origin $t = R, \ldots, T-1$, we generate forecasts of $y_{t+1}$ using the benchmark model (1) by regressing $y_{s+1}$ on $w_s$ for $s = 1, \ldots, t-1$:
\[
\hat{y}_{1,t+1} = w_t'\hat{\beta}_t, \quad \text{where } \hat{\beta}_t = \left( \sum_{s=1}^{t-1} w_s w_s' \right)^{-1} \sum_{s=1}^{t-1} w_s y_{s+1}
\]
is the recursive OLS estimator of $\beta$ in (1).

To generate forecasts of $y_{t+1}$ using the alternative model (2), we proceed in two steps. First, we estimate the latent factors recursively at each forecast origin $t = R, R+1, \ldots, T-1$ by the method of principal components. More specifically, at each time $t$, we compute an $N \times r$ matrix of factor loadings, defined as $\tilde{\Lambda}_t = \left( \tilde{\lambda}_{1,t}, \ldots, \tilde{\lambda}_{N,t} \right)'$, and the corresponding factors, collected in the $t \times r$ matrix $\tilde{F}_t = \left( \tilde{f}_{1,t}, \ldots, \tilde{f}_{t,t} \right)'$. Since estimation takes place recursively over $t = R, \ldots, T-1$, we index $\tilde{\Lambda}_t$ and $\tilde{F}_t$ by $t$. Thus, $\tilde{f}_{s,t}$ denotes the $s$th observation on the $r \times 1$ vector of factors estimated using data on $x_{is}$ up to time $t$. The principal components estimators are defined as
\[
\left( \tilde{\Lambda}_t, \tilde{F}_t \right) = \arg\min_{\{\lambda_i\},\{f_s\}} \frac{1}{Nt} \sum_{i=1}^{N} \sum_{s=1}^{t} \left( x_{is} - \lambda_i' f_s \right)^2,
\]
subject to the normalization condition that $\tilde{F}_t'\tilde{F}_t / t = I_r$. Under this normalization, we can show that $\tilde{F}_t$ is equal to $\sqrt{t}$ times the $t \times r$ matrix of eigenvectors corresponding to the $r$ largest eigenvalues of the $t \times t$ matrix $X_t X_t' / (tN)$, where
\[
X_t = \begin{pmatrix} x_{1,1} & \cdots & x_{1,N} \\ \vdots & \ddots & \vdots \\ x_{t,1} & \cdots & x_{t,N} \end{pmatrix}_{t \times N}, \quad t = R, R+1, \ldots, T-1,
\]
and $\tilde{\Lambda}_t = X_t'\tilde{F}_t / t$. Thus,
\[
\frac{X_t X_t'}{tN} \tilde{F}_t = \tilde{F}_t \tilde{V}_t,
\]
where $\tilde{V}_t = \mathrm{diag}\left( \tilde{v}_{1,t}, \ldots, \tilde{v}_{r,t} \right)$, with $\tilde{v}_{1,t} \ge \tilde{v}_{2,t} \ge \cdots \ge \tilde{v}_{r,t} > 0$, and $\tilde{v}_{j,t}$ is the $j$th largest eigenvalue of $X_t X_t' / (tN)$.

As is well known, the principal components estimator is not consistent for the true latent factors even under the normalization imposed above. As Bai (2003) and Stock and Watson (2002) showed, for given $t$, $\tilde{F}_t$ is mean square consistent for a rotated version of $F_t$, where the rotation matrix is defined as follows:
\[
H_t = \tilde{V}_t^{-1} \left( \frac{\tilde{F}_t' F_t}{t} \right) \left( \frac{\Lambda'\Lambda}{N} \right). \tag{4}
\]

Given the estimated factors $\left\{ \tilde{f}_{s,t} : s = 1, \ldots, t \right\}$, we then regress $y_{s+1}$ onto $w_s$ and $\tilde{f}_{s,t}$ to get forecasts of $y_{t+1}$:
\[
\hat{y}_{2,t+1} = \tilde{z}_{t,t}'\hat{\delta}_t, \quad \text{where } \hat{\delta}_t = \left( \sum_{s=1}^{t-1} \tilde{z}_{s,t} \tilde{z}_{s,t}' \right)^{-1} \sum_{s=1}^{t-1} \tilde{z}_{s,t} y_{s+1},
\]
with $\tilde{z}_{s,t} = \left( w_s', \tilde{f}_{s,t}' \right)'$ for $s = 1, \ldots, t-1$, where $t = R, \ldots, T-1$.

2.2 Test statistics

The statistics of interest are given by
\[
ENC\text{-}F_{\tilde{f}} = \frac{\sum_{t=R}^{T-1} \hat{u}_{1,t+1} \left( \hat{u}_{1,t+1} - \hat{u}_{2,t+1} \right)}{\hat{\sigma}^2} \tag{5}
\]
and
\[
MSE\text{-}F_{\tilde{f}} = \frac{\sum_{t=R}^{T-1} \left( \hat{u}_{1,t+1}^2 - \hat{u}_{2,t+1}^2 \right)}{\hat{\sigma}^2}, \tag{6}
\]
with
\[
\hat{\sigma}^2 = P^{-1} \sum_{t=R}^{T-1} \hat{u}_{2,t+1}^2, \tag{7}
\]
and where $\hat{u}_{1,t+1} = y_{t+1} - \hat{y}_{1,t+1}$ and $\hat{u}_{2,t+1} = y_{t+1} - \hat{y}_{2,t+1}$ denote the forecast errors of the benchmark and alternative models, respectively.

Clark and McCracken (2001) derive the asymptotic distribution of the ENC-F statistic under the null hypothesis $H_0: E\left[ u_{1,t+1} \left( u_{1,t+1} - u_{2,t+1} \right) \right] = 0$ when all the predictors are observed, that is, when both $w_t$ and $f_t$ are observed. Similarly, McCracken (2007) derives the asymptotic distribution of the MSE-F statistic under the null hypothesis $H_0: E\left( u_{1,t+1}^2 - u_{2,t+1}^2 \right) = 0$ when all the predictors are observed. In particular, they derive the asymptotic distributions of the analogs of (5) and (6) when $\hat{u}_{1,t+1}$ and $\hat{u}_{2,t+1}$ are replaced with $\ddot{u}_{1,t+1}$ and $\ddot{u}_{2,t+1}$, the out-of-sample forecast errors based on $w_t$ and $z_t = (w_t', f_t')'$, respectively. Our main goal is to prove the asymptotic equivalence between the statistics based on $\hat{u}_{1,t+1}$ and $\hat{u}_{2,t+1}$ and those based on $\ddot{u}_{1,t+1}$ and $\ddot{u}_{2,t+1}$. This result will justify using the usual critical values when testing for equal predictability with estimated factors in the nesting model.

We first establish an algebraic relationship between the out-of-sample test statistics given in (5), (6), and (7) and their analogs based on $\ddot{u}_{1,t+1}$ and $\ddot{u}_{2,t+1}$ (which we denote by $ENC\text{-}F_f$, $MSE\text{-}F_f$, and $\ddot{\sigma}^2$, respectively). We can write
\[
ENC\text{-}F_{\tilde{f}} = ENC\text{-}F_f + ENC\text{-}F_f \left( \frac{\ddot{\sigma}^2}{\hat{\sigma}^2} - 1 \right) + \frac{1}{\hat{\sigma}^2} \sum_{t=R}^{T-1} \ddot{u}_{1,t+1} \left( \ddot{u}_{2,t+1} - \hat{u}_{2,t+1} \right)
\]
and
\[
MSE\text{-}F_{\tilde{f}} = MSE\text{-}F_f + MSE\text{-}F_f \left( \frac{\ddot{\sigma}^2}{\hat{\sigma}^2} - 1 \right) - \frac{2}{\hat{\sigma}^2} \sum_{t=R}^{T-1} \ddot{u}_{2,t+1} \left( \hat{u}_{2,t+1} - \ddot{u}_{2,t+1} \right) - \frac{1}{\hat{\sigma}^2} \sum_{t=R}^{T-1} \left( \hat{u}_{2,t+1} - \ddot{u}_{2,t+1} \right)^2.
\]
This decomposition allows us to identify a set of sufficient conditions for the asymptotic equivalence of the two sets of statistics (based on $\tilde{f}$ and on $f$, respectively).
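To make the recursive scheme and the statistics in (5)-(7) concrete, the following is a minimal sketch on simulated data (it is not the authors' code: the one-factor DGP, sample sizes, and constant-only benchmark are illustrative choices of ours; the principal components normalization $\tilde{F}_t'\tilde{F}_t/t = I_r$ matches the text).

```python
# Sketch: recursive principal-components estimation and the ENC-F / MSE-F
# statistics of (5)-(7). All data below are simulated for illustration.
import numpy as np

rng = np.random.default_rng(0)
N, T, R = 50, 120, 60          # cross-section size, total sample, in-sample size
P = T - R                      # number of one-step-ahead forecasts

# Simulated one-factor panel x_{is} = lambda_i f_s + e_{is} and target y.
f = rng.standard_normal(T)
lam = rng.uniform(0.0, 1.0, N)
X = np.outer(f, lam) + rng.standard_normal((T, N))
y = np.zeros(T + 1)
for s in range(T):
    y[s + 1] = 0.3 * f[s] + rng.standard_normal()

def pc_factors(Xt, r=1):
    """Principal-components estimate of r factors from the t x N panel Xt:
    sqrt(t) times the eigenvectors of Xt Xt' / (t N) for the r largest
    eigenvalues, so that F'F/t = I_r."""
    t = Xt.shape[0]
    _, eigvec = np.linalg.eigh(Xt @ Xt.T / (t * Xt.shape[1]))  # ascending order
    return np.sqrt(t) * eigvec[:, -r:][:, ::-1]                # top-r, descending

u1 = np.empty(P)
u2 = np.empty(P)
for t in range(R, T):          # forecast origins (0-based time index)
    # Benchmark: recursive OLS on a constant, i.e. the historical mean.
    u1[t - R] = y[1 : t + 1].mean()
    u1[t - R] = y[t + 1] - u1[t - R]
    # Alternative: re-estimate the factor with data up to t, then recursive
    # OLS of y_{s+1} on (1, f_tilde_{s,t}).
    F = pc_factors(X[: t + 1])
    Z = np.column_stack([np.ones(t), F[:t, 0]])
    delta = np.linalg.lstsq(Z, y[1 : t + 1], rcond=None)[0]
    u2[t - R] = y[t + 1] - np.array([1.0, F[t, 0]]) @ delta

sigma2 = np.mean(u2 ** 2)                     # (7)
enc_f = np.sum(u1 * (u1 - u2)) / sigma2       # (5)
mse_f = np.sum(u1 ** 2 - u2 ** 2) / sigma2    # (6)
```

Note that the sign indeterminacy of the estimated factor is harmless here, since the regression coefficient on $\tilde{f}_{s,t}$ absorbs it.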
Lemma 2.1 Suppose that $R, P \to \infty$ such that

a) $\sum_{t=R}^{T-1} \left( \hat{u}_{2,t+1} - \ddot{u}_{2,t+1} \right)^2 = o_P(1)$;

b) $\sum_{t=R}^{T-1} \ddot{u}_{2,t+1} \left( \hat{u}_{2,t+1} - \ddot{u}_{2,t+1} \right) = o_P(1)$;

c) $\sum_{t=R}^{T-1} \ddot{u}_{1,t+1} \left( \ddot{u}_{2,t+1} - \hat{u}_{2,t+1} \right) = o_P(1)$;

d) $\hat{\sigma}^2 - \ddot{\sigma}^2 = o_P(1)$.

Then, it follows that $ENC\text{-}F_{\tilde{f}} = ENC\text{-}F_f + o_P(1)$ and $MSE\text{-}F_{\tilde{f}} = MSE\text{-}F_f + o_P(1)$.

The proof of Lemma 2.1 is trivial and therefore omitted. Our next goal is to provide a set of assumptions such that conditions a) through d) of Lemma 2.1 are verified. An important step in that direction is to provide bounds on the mean squared factors estimation uncertainty over recursive samples, which we do next. But first we need to introduce our assumptions.

3 Assumptions

In this section, we state the assumptions that will be used to establish the asymptotic equivalence of the test statistics with estimated factors ($ENC\text{-}F_{\tilde{f}}$ and $MSE\text{-}F_{\tilde{f}}$) and the analogous statistics with known factors ($ENC\text{-}F_f$ and $MSE\text{-}F_f$). Throughout, we let $M$ be a generic finite constant.

Assumption 1

a) For each $s$, $E\|f_s\|^{16} \le M$; $E(f_s f_s') = \Sigma_F = I_r$; and $\Sigma_\Lambda$ is a diagonal matrix with distinct entries arranged in decreasing order.

b) $\sup_t \left\| t^{-1} \sum_{s=1}^{t} \left( f_s f_s' - I_r \right) \right\| = O_P\left( 1/\sqrt{T} \right)$.

c) $\left\| \Lambda'\Lambda / N - \Sigma_\Lambda \right\| = O\left( 1/\sqrt{N} \right)$, where $\lambda_{\min}(\Sigma_\Lambda)$ is bounded away from zero and bounded away from infinity; that is, $0 < a < \lambda_{\min}(\Sigma_\Lambda) < b < \infty$ for some constants $a$ and $b$.

The first part of Assumption 1 a) requires $f_s$ to have finite moments of order 16. Although some of our results could be derived under less stringent moment conditions (for instance, the results in Section 4 require only finite second moments), we will rely on these many moments in later parts of our proofs and therefore we impose this condition from the beginning. The remaining part of Assumption 1 a) is a normalization condition often used in factor analysis. It implies that both the factors and the factor loadings are orthogonal.
See, in particular, Stock and Watson (2002) for a similar assumption and, more recently, Fan, Liao, and Mincheva (2013), Bai and Li (2012), and Bai and Ng (2013). Part b) of Assumption 1 implies that $T^{-1} \sum_{s=1}^{T} f_s f_s'$ converges to $\Sigma_F = I_r$ at a $\sqrt{T}$-rate. This condition follows under standard mixing conditions on $f_s$ by a maximal inequality for mixing processes. Part c) of Assumption 1 requires that a nonnegligible fraction of the factor loadings is nonvanishing, implying that factors are pervasive. We specify the rate at which $\Lambda'\Lambda / N$ converges to $\Sigma_\Lambda$, a diagonal matrix given part a). For simplicity, we treat the factor loadings as fixed, but if the factor loadings were assumed random, this would be the rate implied by the central limit theorem. See Han and Inoue (2015) for a similar assumption.

Assumption 2

a) $E(e_{it}) = 0$ and $E|e_{it}|^8 \le M$.

b) There exists a constant $m > 0$ such that $\lambda_{\min}(\Sigma_e) > m$ and $\|\Sigma_e\|_1 = \lambda_{\max}(\Sigma_e) \le M$, where $\Sigma_e = E(e_t e_t')$.

c) $\gamma_{t,s} \equiv E\left( e_t' e_s / N \right) = N^{-1} \sum_{i=1}^{N} E(e_{it} e_{is})$ is such that $\gamma_{t,s} = 0$ whenever $|t - s| > \kappa$ for some $\kappa < \infty$.

d) For every $(t, s)$, $E\left| N^{-1/2} \sum_{i=1}^{N} \left( e_{it} e_{is} - E(e_{it} e_{is}) \right) \right|^{16} \le M$.

e) For every $t$, $E\left\| N^{-1/2} \sum_{i=1}^{N} \lambda_i e_{it} \right\|^{16} \le M$.

f) $N^{-1} \sum_{i,j=1}^{N} T^{-1} \sum_{s_1=1}^{T} \sum_{s_2=1}^{T} \left| \mathrm{cov}\left( e_{is_1} e_{js_1}, e_{is_2} e_{js_2} \right) \right| \le M$.

Assumption 2 is rather standard in this literature, allowing for weak dependence of $e_{it}$ across $i$ and over $t$. Part b) of Assumption 2 corresponds to the approximate factor model of Chamberlain and Rothschild (1983), imposing an upper bound on the maximum eigenvalue of the covariance matrix of $e_t = (e_{1t}, \ldots, e_{Nt})'$. It is satisfied if we assume that $e_{it}$ is independent across $i$, given the moment conditions on $e_{it}$. However, it does not require cross-sectional independence and is implied by the condition that $\max_{1 \le j \le N} \sum_{i=1}^{N} \left| E(e_{it} e_{jt}) \right| \le M$, an assumption used by Bai (2003). Part c) of Assumption 2 allows for serial correlation in $e_{it}$ but restricts it to be of the moving average type. This is a stronger assumption than usual.
The main reason we impose this assumption is the need for tight uniform bounds on the time average of the product of the factors estimation error $\tilde{f}_{s,t} - H_t f_s$ and $u_{s+1}$, the common value of $u_{1,s+1}$ and $u_{2,s+1}$ under the null hypothesis. In particular, we need that
\[
\sup_t \frac{1}{t} \left\| \sum_{s=1}^{t-1} \left( \tilde{f}_{s,t} - H_t f_s \right) u_{s+1} \right\| = O_P\left( \frac{1}{\sqrt{R}\,\delta_{N,R}} \right),
\]
where $\delta_{N,R}^2 = \min(N, R)$. This in turn is satisfied if
\[
\sup_t \frac{1}{t} \sum_{l=1}^{t} \left( \sum_{s=1}^{t} \gamma_{l,s} u_{s+1} \right)^2 = O_P(1),
\]
which follows under the assumption that $\gamma_{t,s} = 0$ whenever $|t - s| > \kappa$ for some $\kappa < \infty$ (but not necessarily if we impose only the more usual weak dependence assumption that $T^{-1} \sum_{l,s=1}^{T} \gamma_{l,s}^2 \le M$).

Part d) of Assumption 2 strengthens Bai's (2003) Assumption C.5 by requiring 16 instead of 4 finite moments. Part e) is also a strengthened version of Assumption 3.d) of Gonçalves and Perron (2014), where 16 instead of 2 finite moments are required. The relatively large number of moments we require is explained by the fact that we use mean square convergence arguments to show the asymptotic equivalence of our statistics. Since these are a function of squared forecast errors that depend on the product of estimated factors and regression estimates, the 16-moment assumptions are the result of an application of the Cauchy-Schwarz inequality. Finally, Bai (2009) relies on an assumption similar to part f) (see, in particular, his Assumption C.4) in the context of the interactive fixed effects model.

Assumptions 3 and 4 are new to the out-of-sample forecasting context.
Assumption 3

a) $N^{-1} \sum_{i=1}^{N} \sup_t \left\| T^{-1/2} \sum_{s=1}^{t} f_s e_{is} \right\|^2 = O_P(1)$, where $E(f_s e_{is}) = 0$.

b) $T^{-1} \sum_{l=1}^{T} \sup_t \left| (NT)^{-1/2} \sum_{s=1}^{t} \sum_{i=1}^{N} \left( e_{il} e_{is} - E(e_{il} e_{is}) \right) u_{s+1} \right|^2 = O_P(1)$, where $E\left( \left( e_{il} e_{is} - E(e_{il} e_{is}) \right) u_{s+1} \right) = 0$.

c) $\sup_t \left\| (NT)^{-1/2} \sum_{s=1}^{t} \sum_{i=1}^{N} \lambda_i e_{is} u_{s+1} \right\|^2 = O_P(1)$, where $E\left( \lambda_i e_{is} u_{s+1} \right) = 0$.

d) $T^{-1} \sum_{l=1}^{T} \sup_t \left\| (NT)^{-1/2} \sum_{s=1}^{t} \sum_{i=1}^{N} \left( e_{il} e_{is} - E(e_{il} e_{is}) \right) z_s \right\|^2 = O_P(1)$, where $E\left( \left( e_{il} e_{is} - E(e_{il} e_{is}) \right) z_s \right) = 0$.

e) $\sup_t \left\| (NT)^{-1/2} \sum_{s=1}^{t} \sum_{i=1}^{N} \lambda_i e_{is} z_s' \right\|^2 = O_P(1)$, where $E\left( \lambda_i e_{is} z_s' \right) = 0$.

Assumption 3 requires that several partial sums be bounded in probability, uniformly in $t = R, \ldots, T$. Since the summands have mean zero, this assumption is not very restrictive and is implied by an application of the functional central limit theorem. For instance, we can verify that Assumptions 3 b) and c) hold under Assumptions 2 d) and e) if we assume that $\{u_t\}$ and $\{e_t\}$ are independent of each other and $u_{t+1}$ is a martingale difference sequence, as we assume in Assumption 5 below. Our next assumption requires that the partial sums in parts b) and c) of Assumption 3 have 8 finite moments.

Assumption 4

a) For each $(l, t)$, $E\left| (TN)^{-1/2} \sum_{s=1}^{t} \sum_{i=1}^{N} \left( e_{il} e_{is} - E(e_{il} e_{is}) \right) u_{s+1} \right|^8 \le M$.

b) For each $t$, $E\left\| (NT)^{-1/2} \sum_{s=1}^{t} \sum_{i=1}^{N} \lambda_i e_{is} u_{s+1} \right\|^8 \le M$.

Assumption 4 is a strengthening of Assumption 4 of Gonçalves and Perron (2014), according to which $E\left| (TN)^{-1/2} \sum_{s=1}^{T} \sum_{i=1}^{N} \left( e_{il} e_{is} - E(e_{il} e_{is}) \right) u_{s+1} \right|^2 \le M$ and $E\left\| (NT)^{-1/2} \sum_{s=1}^{T} \sum_{i=1}^{N} \lambda_i e_{is} u_{s+1} \right\|^2 \le M$. A sufficient condition for Assumption 4 is that $u_{t+1}$ is i.i.d., given Assumptions 2 d) and e), respectively. Our assumption on $u_{t+1}$ is nevertheless weaker, as we assume only that the regression innovation is a martingale difference sequence. More formally, we impose the following assumption on $u_{t+1}$.

Assumption 5 $E\left( u_{t+1} \mid \mathcal{F}_t \right) = 0$, where $\mathcal{F}_t = \sigma\left( y_t, y_{t-1}, \ldots; X_t, X_{t-1}, \ldots; z_t, z_{t-1}, \ldots \right)$.

Assumption 5 is analogous to the martingale difference assumption in Bai and Ng (2006) when the forecasting horizon is 1.
Assumption 6

a) $E\|z_t\|^{16} \le M$ and $E\left( u_{t+1}^8 \right) \le M$ for all $t = 1, \ldots, T$.

b) Let $V(t) = t^{-1} \sum_{s=1}^{t-1} z_s u_{s+1}$, where $E(z_s u_{s+1}) = 0$. Then $\sup_t \|V(t)\| = O_P\left( 1/\sqrt{T} \right)$ and $E\left\| \sqrt{T} V(t) \right\|^8 \le M$.

c) Let $B(t) = \left( t^{-1} \sum_{s=1}^{t-1} z_s z_s' \right)^{-1}$ and $B = \left( E(z_s z_s') \right)^{-1}$. Then $\sup_t \|B(t)\| = O_P(1)$ and $\sup_t \|B(t) - B\| = O_P\left( 1/\sqrt{T} \right)$.

d) $P, R \to \infty$ such that $P/R = O(1)$.

Assumption 6 imposes conditions on the regressors and errors of the predictability regression model based on the latent factors. In particular, Assumption 6 a) extends the moment conditions on the factors assumed in Assumption 1 to the observed regressors $w_t$. In addition, it requires the innovations $u_{t+1}$ to have eight finite moments. Parts b) and c) of Assumption 6 imply that $\sup_t \left\| \ddot{\delta}_t - \delta \right\| = \sup_t \left\| B(t) V(t) \right\| = O_P\left( 1/\sqrt{T} \right)$, which is the usual rate of convergence. These assumptions are implied by more primitive assumptions that ensure the application of mixingale-type inequalities. Part d) is implied by the usual rate condition on $P$ and $R$ according to which $\lim_{P,R \to \infty} P/R = \pi$, with $0 \le \pi < \infty$.

4 Theory

4.1 Properties of factors over recursive samples

As mentioned earlier, factor models are subject to a fundamental identification problem. In our setup, this means that the factors $\tilde{f}_{s,t}$ estimated at each forecasting origin $t$ are consistent for a rotation of the true factors given by $H_t f_s$, where the rotation matrix $H_t$ is time-varying. The asymptotic properties of the rotation matrix $H_t$ and its inverse (as well as the inverse of the eigenvalue matrix $\tilde{V}_t$) are a crucial step in proving the asymptotic negligibility of the factors estimation error in the out-of-sample forecasting context. Lemmas A.5 and A.6 in Appendix A contain these results. These results are crucial to bounding terms involving the estimated factors in the expansions of our test statistics. The following theorem establishes the rates of convergence of the estimated factors over recursive subsamples.
Theorem 4.1 Under Assumptions 1-6,

a) $P^{-1} \sum_{t=R}^{T-1} \left\| \tilde{f}_{t,t} - H_t f_t \right\|^2 = O_P\left( 1/\delta_{N,R}^2 \right)$, where $\delta_{N,R}^2 = \min(N, R)$;

b) $\sup_t t^{-1} \sum_{s=1}^{t} \left\| \tilde{f}_{s,t} - H_t f_s \right\|^2 = O_P\left( 1/\delta_{N,R}^2 \right)$.

Theorem 4.1 extends Lemma A.1 of Bai (2003) to the out-of-sample context. Just as Bai's (2003) result is crucial to proving the asymptotic theory of the method of principal components over one sample when both $N$ and $T$ are large, Theorem 4.1 is crucial to proving the asymptotic theory of this estimation method over recursive samples. Part a) of Theorem 4.1 shows that the time average over the out-of-sample period of the squared deviations between the estimated factor $\tilde{f}_{t,t}$ and the rotated factor $H_t f_t$ converges to zero at rate $O_P\left( 1/\delta_{N,R}^2 \right)$, where $\delta_{N,R} = \min\left( \sqrt{N}, \sqrt{R} \right)$. Part b) shows that the same rate of convergence applies uniformly over $t = R, \ldots, T$ to the time average of the squared deviations between $\tilde{f}_{s,t}$ and $H_t f_s$ for each recursive sample estimation period $s = 1, \ldots, t$.

4.2 Asymptotic equivalence

This section's main contribution is to show that Lemma 2.1 holds under the null hypothesis of equal predictability. In particular, we verify conditions a) through d), which involve the forecast errors $\ddot{u}_{1,t+1}$, $\ddot{u}_{2,t+1}$, $\hat{u}_{1,t+1}$, and $\hat{u}_{2,t+1}$, and their respective differences. We have that $\ddot{u}_{2,t+1} = y_{t+1} - z_t'\ddot{\delta}_t$, where $\ddot{\delta}_t$ is the recursive OLS estimator obtained from regressing $y_{s+1}$ on $z_s = (w_s', f_s')'$ for $s = 1, \ldots, t-1$, whereas $\hat{u}_{2,t+1}$ is given by $\hat{u}_{2,t+1} = y_{t+1} - \tilde{z}_{t,t}'\hat{\delta}_t$, where $\tilde{z}_{t,t} = \left( w_t', \tilde{f}_{t,t}' \right)'$ and $\hat{\delta}_t$ is the OLS estimator obtained from regressing $y_{s+1}$ on $\tilde{z}_{s,t} = \left( w_s', \tilde{f}_{s,t}' \right)'$. Thus,
\[
\ddot{u}_{2,t+1} - \hat{u}_{2,t+1} = \tilde{z}_{t,t}'\hat{\delta}_t - z_t'\ddot{\delta}_t.
\]
Let $\Gamma_t = \mathrm{diag}(I_{k_1}, H_t)$. Adding and subtracting appropriately yields
\[
\ddot{u}_{2,t+1} - \hat{u}_{2,t+1} = \left( \tilde{z}_{t,t} - \Gamma_t z_t \right)' \Gamma_t'^{-1} \left( \ddot{\delta}_t - \delta \right) + \tilde{z}_{t,t}' \left( \hat{\delta}_t - \Gamma_t'^{-1} \ddot{\delta}_t \right), \tag{8}
\]
where $\delta = (\beta', \gamma')'$ and $\gamma = 0$ under the null hypothesis of equal accuracy.
This, together with the fact that $\tilde{z}_{t,t} - \Gamma_t z_t = \left( 0', \left( \tilde{f}_{t,t} - H_t f_t \right)' \right)'$ (since $w_t$ is fully observed), explains why the term $\left( \tilde{z}_{t,t} - \Gamma_t z_t \right)' \Gamma_t'^{-1} \delta$ is missing in (8). The first term in (8) measures the discrepancy between the "ideal" predictors at time $t$, given by $z_t = (w_t', f_t')'$, and their estimated version given by $\tilde{z}_{t,t} = \left( w_t', \tilde{f}_{t,t}' \right)'$, interacted with the error in estimating $\delta$ using $\ddot{\delta}_t$. The second term in (8) measures the impact of the factor estimation error on the OLS estimators.

Our first goal is to show that $\sum_{t=R}^{T-1} \left( \ddot{u}_{2,t+1} - \hat{u}_{2,t+1} \right)^2 = o_P(1)$. By Assumption 6, $\sup_t \left\| \ddot{\delta}_t - \delta \right\| = O_P\left( 1/\sqrt{T} \right)$, whereas
\[
\frac{1}{T} \sum_{t=R}^{T-1} \left\| \tilde{z}_{t,t} - \Gamma_t z_t \right\|^2 = \frac{1}{T} \sum_{t=R}^{T-1} \left\| \tilde{f}_{t,t} - H_t f_t \right\|^2 = O_P\left( 1/\delta_{N,R}^2 \right)
\]
by Theorem 4.1. Thus, by an application of the Cauchy-Schwarz inequality, we can show that the sum of the squared contributions from the first term in (8) is of order $O_P\left( 1/\delta_{N,R}^2 \right)$. This is $o_P(1)$ given that $N, R \to \infty$.

In turn, the sum of squared contributions from the second term is bounded by
\[
T \sup_t \left\| \hat{\delta}_t - \Gamma_t'^{-1} \ddot{\delta}_t \right\|^2 \cdot \frac{1}{T} \sum_{t=R}^{T-1} \left\| \tilde{z}_{t,t} \right\|^2,
\]
where $T^{-1} \sum_{t=R}^{T-1} \left\| \tilde{z}_{t,t} \right\|^2 = O_P(1)$ under our assumptions. Thus, condition a) of Lemma 2.1 follows if $\sup_t \left\| \hat{\delta}_t - \Gamma_t'^{-1} \ddot{\delta}_t \right\|^2 = o_P(1/T)$, as shown in the next lemma.

Lemma 4.1 Suppose Assumptions 1-6 hold and the null hypothesis is true. Then,
\[
\sup_t \left\| \hat{\delta}_t - \Gamma_t'^{-1} \ddot{\delta}_t \right\| = O_P\left( \frac{1}{\sqrt{R}\,\delta_{N,T}} \right).
\]

Since Lemma 4.1 implies that $\sup_t \left\| \hat{\delta}_t - \Gamma_t'^{-1} \ddot{\delta}_t \right\|^2 = O_P\left( \frac{1}{R\,\delta_{N,T}^2} \right) = o_P\left( \frac{1}{T} \right)$, this result and the above arguments imply that
\[
\sum_{t=R}^{T-1} \left( \hat{u}_{2,t+1} - \ddot{u}_{2,t+1} \right)^2 = O_P\left( \frac{1}{\delta_{N,T}^2} \right) = o_P(1),
\]
which is condition a) of Lemma 2.1.

Next, we focus on condition b) of Lemma 2.1. Given the definition of $\ddot{u}_{2,t+1}$, we can write
\[
\ddot{u}_{2,t+1} = y_{t+1} - z_t'\ddot{\delta}_t = u_{t+1} - z_t'\left( \ddot{\delta}_t - \delta \right),
\]
where $u_{t+1} = y_{t+1} - z_t'\delta = y_{t+1} - w_t'\beta$ under $H_0$. It follows that
\[
\sum_{t=R}^{T-1} \ddot{u}_{2,t+1} \left( \ddot{u}_{2,t+1} - \hat{u}_{2,t+1} \right) = \sum_{t=R}^{T-1} u_{t+1} \left( \ddot{u}_{2,t+1} - \hat{u}_{2,t+1} \right) - \sum_{t=R}^{T-1} z_t'\left( \ddot{\delta}_t - \delta \right) \left( \ddot{u}_{2,t+1} - \hat{u}_{2,t+1} \right).
\]
The second term is bounded by
\[
\sum_{t=R}^{T-1} \left| z_t'\left( \ddot{\delta}_t - \delta \right) \right| \left| \ddot{u}_{2,t+1} - \hat{u}_{2,t+1} \right| \le \left( \sum_{t=R}^{T-1} \left( z_t'\left( \ddot{\delta}_t - \delta \right) \right)^2 \right)^{1/2} \left( \sum_{t=R}^{T-1} \left( \ddot{u}_{2,t+1} - \hat{u}_{2,t+1} \right)^2 \right)^{1/2},
\]
given the Cauchy-Schwarz inequality.
Since the square of the first term is bounded by
\[
\sum_{t=R}^{T-1} \left( z_t'\left( \ddot{\delta}_t - \delta \right) \right)^2 \le \sup_t \left\| \sqrt{T} \left( \ddot{\delta}_t - \delta \right) \right\|^2 \cdot \frac{1}{T} \sum_{t=R}^{T-1} \left\| z_t \right\|^2,
\]
which is $O_P(1)$ under Assumption 6, condition b) is implied by condition a) and the next lemma.

Lemma 4.2 Suppose Assumptions 1-6 hold and the null hypothesis is true. If $\sqrt{T}/N \to 0$, then $\sum_{t=R}^{T-1} u_{t+1} \left( \ddot{u}_{2,t+1} - \hat{u}_{2,t+1} \right) = o_P(1)$.

The rate restriction that $\sqrt{T}/N \to 0$ is the same as the one used by Bai and Ng (2006) to show that factor estimation uncertainty disappears asymptotically when conducting inference on factor-augmented regression models. In our context, this rate restriction is used to show that the difference between the out-of-sample test statistics based on the estimated factors and those based on the latent factors is asymptotically negligible.

Similarly, condition c) of Lemma 2.1 follows using the same arguments as condition b), whereas d) follows from conditions a) and b) since
\[
\hat{\sigma}^2 - \ddot{\sigma}^2 = P^{-1} \sum_{t=R}^{T-1} \left( \hat{u}_{2,t+1}^2 - \ddot{u}_{2,t+1}^2 \right) = P^{-1} \sum_{t=R}^{T-1} \left( \hat{u}_{2,t+1} - \ddot{u}_{2,t+1} \right)^2 + 2 P^{-1} \sum_{t=R}^{T-1} \ddot{u}_{2,t+1} \left( \hat{u}_{2,t+1} - \ddot{u}_{2,t+1} \right) = o_P(1),
\]
given that $P \to \infty$. The next theorem contains our main theoretical result.

Theorem 4.2 Under Assumptions 1-6, if the null hypothesis holds and $\sqrt{T}/N \to 0$, then $ENC\text{-}F_{\tilde{f}} = ENC\text{-}F_f + o_P(1)$ and $MSE\text{-}F_{\tilde{f}} = MSE\text{-}F_f + o_P(1)$.

This result justifies using the asymptotic critical values given in Clark and McCracken (2001) for the ENC-F statistic and in McCracken (2007) for the MSE-F statistic. These critical values are derived assuming conditional homoskedasticity. Without this assumption, a different set of critical values should be used. In particular, Clark and McCracken (2012) have proposed bootstrap critical values to approximate the asymptotic distribution under conditional heteroskedasticity.

5 Simulations

In this section, we analyze the finite sample behavior of tests of equal predictive ability with estimated factors.
For this purpose, we use a data-generating process similar to the one in Clark and McCracken (2001):
\[
\begin{pmatrix} y_s \\ f_s \end{pmatrix} = \begin{pmatrix} 0.5 & b \\ 0 & 0.5 \end{pmatrix} \begin{pmatrix} y_{s-1} \\ f_{s-1} \end{pmatrix} + \begin{pmatrix} u_{y,s} \\ u_{f,s} \end{pmatrix}, \quad s = 1, \ldots, T,
\]
where the vector of errors is i.i.d. $N(0, I_2)$ and $y_0 = f_0 = 0$. The value of $b$ will be set to 0 for the size experiment and 0.3 for the power experiment.[2] The panel of variables used to extract the factors is obtained as
\[
x_{is} = \lambda_i f_s + e_{is},
\]
where $\lambda_i \sim U(0, 1)$, $e_{is} \sim N(0, \sigma_i^2)$, and $\sigma_i^2 \sim U(0.5, 1.5)$.[3]

We consider as sample sizes $N \in \{50, 100, 150, 200\}$ and $T \in \{50, 100, 200\}$. We report results for three sample splits: $\pi = P/R \in \{0.2, 1, 2\}$. The number of replications is set at 100,000. We test the predictive ability of factors using recursive windows. For each $t = R, \ldots, T-1$, we estimate the parameters of the model under the null hypothesis,
\[
y_{s+1} = \alpha_0 + \alpha_1 y_s + u_{1,s+1},
\]
using data from $s = 1, \ldots, t-1$, while the model under the alternative hypothesis is given by
\[
y_{s+1} = \alpha_0 + \alpha_1 y_s + \gamma f_s + u_{2,s+1}, \quad s = 1, \ldots, t-1,
\]
where we estimate $f_s$ with $\tilde{f}_{s,t}$ recursively using the data $x_{is}$, $i = 1, \ldots, N$, $s = 1, \ldots, t$, up to time $t$. We consider the $ENC\text{-}F_{\tilde{f}}$ and $MSE\text{-}F_{\tilde{f}}$ statistics. All tests are conducted at the 5% significance level.

The results are reported in Table 1. We report right-tail rejection probabilities using the asymptotic critical values for each statistic for two cases: when the factor is known and when the factor is estimated. The asymptotic critical values are from Clark and McCracken (2001) for the ENC-F statistic and McCracken (2007) for the MSE-F statistic.

[2] We also conducted an experiment with the autoregressive coefficient on $y_s$ set to 0 and $b$ set to 0 in both the benchmark and alternative models. This specification is analogous to our empirical specification in the next section, but because the results were similar, we decided not to report them. They are available from the authors upon request.
The results for $\pi = 0.2$ are reported first, followed by $\pi = 1$ and $\pi = 2$. In each case, we report results for the ENC-F and MSE-F statistics, first with the factor assumed known and then with the factor estimated.

The size results are reported in the top panel of Table 1. While there should be no effect of changes in $N$ for the observed-factor case, we do see some variation that must be accounted for by simulation randomness. As predicted by the theory, the statistics behave very similarly under the null hypothesis regardless of whether the factor is estimated or known. The largest absolute difference in rejection rates is 0.11 percentage points. The average absolute discrepancy across all sample sizes and the three choices of $\pi$ is 0.039 percentage points. Overall, all tests have very good size properties, though there is some tendency to overreject for smaller values of $T$. The MSE-F statistic tends to be less subject to size distortions than the ENC-F test statistic.

The bottom panel of Table 1 reports power results for the two tests considered. This panel contains an important result: estimating the factor reduces power to some degree. For example, for $N = 50$, $T = 50$, and $\pi = 0.2$, the power of the MSE-F statistic is 41.7% if the factor is known, but 40.3% if the factor is estimated. This is a general result for all values of $N$, $T$, and $\pi$. We also find other results from the literature, such as the power advantage of the ENC-F statistic over the MSE-F statistic (see Clark and McCracken, 2001) and the fact that power is increasing in $P$, which can be achieved by increasing $T$ for fixed $\pi$ or increasing $\pi$ for fixed $T$.

[3] We have also experimented with errors drawn from a Student distribution with 3 degrees of freedom to check sensitivity to the strong moment conditions imposed in our assumptions. Results are qualitatively similar and are available upon request from the authors.

6 Empirical results

In this section, we provide empirical evidence on the predictive content of macroeconomic factors for the equity premium associated with the S&P 500 Composite Index. Others who have looked into factor-based prediction of the equity premium include Bai (2010), Cakmakli and van Dijk (2010), and Batje and Menkhoff (2012). Like us, they use monthly frequency data to investigate whether factor-based model forecasts are more accurate than just using the historical mean. Unlike us, formal tests of statistical significance are not necessarily conducted (though, to be fair, prior to the current paper a theoretical justification for doing so did not exist). In addition, in each of these three papers an in-sample information criterion is used to select the "best" model prior to forecasting, an approach we do not follow. Instead, we consider a range of possible models, each defined by a given permutation of the current estimated factors.

In our exercise, we construct factors $\tilde{f}_{t,t}$ using a large monthly frequency macroeconomic database of 134 predictors documented in McCracken and Ng (2015).[4] This dataset is designed to be representative of those most frequently used in the literature on factor-based forecasting and appears to be similar to those used in the papers cited above. If we let $P_{t+1}$ denote the value of the stock index at the close of business on the last business day of month $t+1$, we construct our equity premium as $y_{t+1} = \ln(P_{t+1}/P_t) - r_{t+1}$, where $r_{t+1}$ denotes the 1-month Treasury bill rate.[5] After transforming the data to induce stationarity and truncating the dataset to avoid missing values, our dataset spans 1960:03 through 2014:12 for a total of $T = 658$ observations. We construct our forecasting exercises with two goals in mind.
First, we want to identify the factors with the greatest predictive content for the equity premium. Toward this goal, we consider each of the first 8 macroeconomic factors $(\tilde f_{1t},\ldots,\tilde f_{8t})$ as potential predictors.⁶ As shown later, the second factor, which loads heavily on interest rate variables, is clearly the strongest predictor. Second, we want to investigate the importance of the "timing" of the estimated factors. In our reading of the factor-based forecasting literature (not limited to the papers listed earlier), it is not always clear whether the time-t factors are dated using calendar or data-release time. Note that in equation (2), the dependent variable is associated with time t+1, while the factor is associated with time t. Since our dependent variable is observable immediately at the close of business on the last business day of month t+1, the distinction between calendar and data-release time is irrelevant. This is in sharp contrast to the data used to construct the factors. The dataset used consists of macroeconomic series that are typically released with a 1-month delay.

⁴We extract the factors using the April 2015 vintage of FRED-MD. Four of the series are dropped to avoid the presence of missing values. By doing so we can estimate the factors using precisely the same formulas given in the text.
⁵The 1-month Treasury bill rates are obtained from Kenneth French's website (http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html).
⁶As discussed in McCracken and Ng (2015), Factor 1 can be interpreted as a real activity/employment factor. Factor 2 is dominated by interest rate variables. Factor 3 is concentrated on price variables. Factors 4 and 5 are a mix of housing and interest rate variables. Like Factor 1, Factor 6 concentrates on real activity/employment variables. Factor 7 is associated with stock market variables, while Factor 8 loads heavily on exchange rates.
As such, the factor estimates based on data associated with (say) April 2015 cannot literally be constructed until May 2015, after the data are released. To evaluate the importance of this distinction, we conduct our forecasting exercise once estimating the factors $\tilde f_{t,t}$ such that t represents calendar time (i.e., the data are associated with month t) and once such that t represents data-release time (i.e., the data are associated with month t-1).

For all model comparisons, our benchmark model uses the historical mean as the predictor, and hence Model 1 takes the form $y_{t+1} = \alpha + u_{1,t+1}$. The competing models are similar but add estimated factors and hence take the form $y_{t+1} = \alpha + \beta'\tilde f_{t,t} + u_{2,t+1}$. We consider all $2^8 - 1 = 255$ permutations of models that include a single lag of at least 1 of the first 8 factors. For each, we construct one-step-ahead forecasts and test the null hypothesis of equal predictive ability at the 5% level using the MSE-F and ENC-F test statistics, based on critical values taken from Clark and McCracken (2001) and McCracken (2007). Note that the appropriate critical values vary with the number of factors used in the competing model. All models are estimated recursively by OLS. We consider two out-of-sample periods: 1985:01-2014:12 and 2005:01-2014:12. These imply sample splits (R, P) equal to (298, 360) and (538, 120), respectively.

Rather than report all of the potential pairwise comparisons, for each out-of-sample period and for each definition of time t (calendar vs. data-release time), we report the results associated with the 10 best-performing factor-based models. The results are reported in Table 2. The upper panel is associated with calendar time, while the lower is associated with data-release time. For each of the four sets of results we list (i) the factors in the model, (ii) the ratio of the factor-model MSE to the benchmark model MSE, and (iii) the MSE-F and ENC-F statistics. An asterisk is used to denote significance at the 5% level.
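Given the two sequences of recursive one-step-ahead forecast errors, the test statistics have simple closed forms: MSE-F = P(MSE₁ − MSE₂)/MSE₂ (McCracken, 2007) and ENC-F = P c̄/MSE₂ with c̄ the average of û₁(û₁ − û₂) (Clark and McCracken, 2001). The sketch below is a minimal illustration of those formulas; the function and variable names are our own.

```python
import numpy as np

def mse_f(u1, u2):
    """MSE-F statistic: P * (MSE1 - MSE2) / MSE2.

    u1: out-of-sample forecast errors of the benchmark (nested) model
    u2: out-of-sample forecast errors of the factor-augmented model
    """
    u1, u2 = np.asarray(u1, float), np.asarray(u2, float)
    P = u1.size
    mse2 = np.mean(u2 ** 2)
    return P * (np.mean(u1 ** 2) - mse2) / mse2

def enc_f(u1, u2):
    """ENC-F (encompassing) statistic: P * mean(u1 * (u1 - u2)) / MSE2."""
    u1, u2 = np.asarray(u1, float), np.asarray(u2, float)
    P = u1.size
    return P * np.mean(u1 * (u1 - u2)) / np.mean(u2 ** 2)
```

Both statistics are compared with the nonstandard critical values tabulated in the papers cited above, which depend on the number of extra regressors (here, factors) and on the sample-split ratio.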
The upper panel of Table 2 suggests the presence of predictability for a variety of factor-based models. Despite the fact that the gains in forecast accuracy are nominally small, both the MSE-F and ENC-F tests indicate significant improvements relative to the benchmark. The predictive content is largely concentrated in the second factor, a feature also documented in Bai (2010) and Batje and Menkhoff (2012). Over the shorter, more recent out-of-sample period, the gains in forecast accuracy are larger than before but still remain modest. Interestingly, it appears that the eighth factor has become increasingly important for forecasting the equity premium and, in fact, the best 6 models all include both the second and eighth factors.

All of that said, in the lower panel of Table 2 we find less predictive content, which suggests that the timing used to estimate factors can play an important role in determining their usefulness for forecasting. Over the longer out-of-sample period, exactly 2 of the 255 factor-based models have a nominally smaller MSE than the benchmark (note that the ratio is reported to only two decimals), and in only two instances is the improvement statistically significant based on either the MSE-F or ENC-F statistic. Over the shorter out-of-sample period, the evidence of predictability is a bit stronger. As in the upper panel, the second and eighth factors seem to contain predictive content, providing an improvement in forecast accuracy of about 2% that is significant at the 5% level regardless of which test statistic we consider. Unlike the upper panel, the seventh factor, which loads heavily on stock market variables, shows up in roughly half of the models and is one of the better predictors over the longer out-of-sample period; under calendar-based timing it shows up in only one model. Overall, it seems the second factor, which loads heavily on interest rates, is the strongest and most consistent predictor of the equity premium.
This is consistent with extensive empirical results in the literature on the ability of interest rates to forecast stock returns; see, for example, Hjalmarsson (2010) for evidence for developed countries. Even so, the eighth factor, which loads heavily on exchange rate variables, seems to be of increasing importance. Of course, one could certainly argue that the gains in forecast accuracy are small, but, as is common in the financial forecasting literature, even small statistical improvements in forecast accuracy can provide large economic gains. On a final note, it is interesting that the first factor, which loads heavily on real activity and employment variables, never appears in Table 2. This is of independent interest since the first factor is the one most often used in the literature on factor-based forecasting.

7 Conclusion

Factor-based forecasting is increasingly common at central banks and in academic research. The accuracy of these forecasts is sometimes compared with that of benchmark models using the tests of equal accuracy and encompassing for nested models developed in Clark and McCracken (2001) and McCracken (2007). In this paper, we establish conditions under which these tests remain asymptotically valid when the factors are estimated recursively across forecast origins. While some of these conditions are straightforward generalizations of existing theoretical work on factors, others are new and of independent interest for applications that estimate factors recursively over time. Simulation evidence supports the theoretical results, showing that the sizes of the tests are typically near their nominal levels despite finite sample estimation error in the factors. Perhaps not surprisingly, we find that this finite sample estimation error reduces power compared with a hypothetical case in which the factors can be observed directly.
The paper concludes with an empirical application in which a large macroeconomic dataset is used to construct factors and forecast the equity premium. While the predictive content is not high, we find significant evidence that the second factor has consistently exhibited predictive content for the equity premium and that, of late, the eighth factor has become increasingly relevant.

A Appendix

This appendix contains two sections. First, we provide several lemmas useful to prove the results in Section 4, followed by their proofs. Then we prove the results in Section 4.

A.1 Auxiliary lemmas

Our first set of results is instrumental in proving some of the auxiliary lemmas that follow. As in Bai (2003), we use the following notation:
$$\gamma_{l,s} = E\Bigl(\frac{e_l'e_s}{N}\Bigr) = \frac{1}{N}\sum_{i=1}^{N}E(e_{il}e_{is}), \qquad \zeta_{l,s} = \frac{1}{N}\sum_{i=1}^{N}\bigl(e_{il}e_{is} - E(e_{il}e_{is})\bigr),$$
$$\eta_{l,s} = \frac{f_l'\Lambda'e_s}{N} = \frac{1}{N}\sum_{i=1}^{N}f_l'\lambda_ie_{is}, \qquad \xi_{l,s} = \frac{f_s'\Lambda'e_l}{N} = \frac{1}{N}\sum_{i=1}^{N}f_s'\lambda_ie_{il}.$$
Moreover, for $t = R,\ldots,T$, we let
$$U_{t,T} \equiv \frac{T}{t}\,u_{t+1}\,\frac{1}{\sqrt{T}}\sum_{s=1}^{t-1}z_su_{s+1},$$
and note that $U_{t,T}$ is a martingale difference sequence array with respect to $\mathcal{F}^t$ given Assumption 5.

Lemma A.1 Under Assumptions 1-6,
a) $\sup_t \frac{1}{t}\sum_{l=1}^{t}\bigl(\sum_{s=1}^{t-1}\delta_{l,s}u_{s+1}\bigr)^2 = O_P(1)$, where $\delta_{l,s} = \gamma_{l,s}$;
b)-d) $\sup_t \frac{1}{t}\sum_{l=1}^{t}\bigl(\sum_{s=1}^{t-1}\delta_{l,s}u_{s+1}\bigr)^2 = O_P(T/N)$, where $\delta_{l,s} = \zeta_{l,s}$, $\eta_{l,s}$, and $\xi_{l,s}$, respectively;
e) $\sup_t \frac{1}{t}\sum_{l=1}^{t}\bigl\|\sum_{s=1}^{t-1}\delta_{l,s}z_s'\bigr\|^2 = O_P(1)$, where $\delta_{l,s} = \gamma_{l,s}$;
f)-g) $\sup_t \frac{1}{t}\sum_{l=1}^{t}\bigl\|\sum_{s=1}^{t-1}\delta_{l,s}z_s'\bigr\|^2 = O_P(T/N)$, where $\delta_{l,s} = \zeta_{l,s}$ and $\eta_{l,s}$, respectively.

Lemma A.2 Under Assumptions 1-6,
a) $\sup_t E\bigl\|\frac{1}{t}\sum_{l=1}^{t}f_l\sum_{s=1}^{t-1}\gamma_{l,s}u_{s+1}\bigr\|^4 = O(1)$;
b)-d) $\sup_t E\bigl\|\frac{1}{t}\sum_{l=1}^{t}f_l\sum_{s=1}^{t-1}\delta_{l,s}u_{s+1}\bigr\|^4 = O(T^2/N^2)$, where $\delta_{l,s} = \zeta_{l,s}$, $\eta_{l,s}$, and $\xi_{l,s}$, respectively.

Lemma A.3 Under Assumptions 1-6,
a) $\frac{1}{T}\sum_{t=R}^{T}\|U_{t,T}\|^2 = O_P(1)$;
b) $\sum_{t=R}^{T}\bigl(\frac{1}{t}\sum_{s=1}^{t}f_s\gamma_{s,t}\bigr)'AU_{t,T} = O_P(1)$, where $A$ is any $r \times (r + k_1)$ matrix of constants;
c)-e) $\sum_{t=R}^{T}\bigl(\frac{1}{t}\sum_{s=1}^{t}f_s\delta_{s,t}\bigr)'AU_{t,T} = O_P\bigl(\sqrt{T/N}\bigr)$, where $\delta_{s,t} = \zeta_{s,t}$, $\eta_{s,t}$, and $\xi_{s,t}$, respectively.

Our next result provides bounds on the norms of $A1_{s,t}$, $A2_{s,t}$, $A3_{s,t}$, and $A4_{s,t}$, which are defined by the following equality given in Bai (2003). Specifically, for each $t = R,\ldots,T$ and $s = 1,\ldots,t$, we can write
$$\tilde f_{s,t} - H_tf_s = \tilde V_t^{-1}\Bigl(\underbrace{\frac{1}{t}\sum_{l=1}^{t}\tilde f_{l,t}\gamma_{l,s}}_{A1_{s,t}} + \underbrace{\frac{1}{t}\sum_{l=1}^{t}\tilde f_{l,t}\zeta_{l,s}}_{A2_{s,t}} + \underbrace{\frac{1}{t}\sum_{l=1}^{t}\tilde f_{l,t}\eta_{l,s}}_{A3_{s,t}} + \underbrace{\frac{1}{t}\sum_{l=1}^{t}\tilde f_{l,t}\xi_{l,s}}_{A4_{s,t}}\Bigr). \qquad (9)$$

Lemma A.4 Under Assumptions 1 through 6,
a) $\frac{1}{P}\sum_{t=R}^{T-1}\|A1_{t,t}\|^2 = O_P(1/T)$, whereas for $j = 2,3,4$, $\frac{1}{P}\sum_{t=R}^{T-1}\|Aj_{t,t}\|^2 = O_P(1/N)$;
b) $\sup_t \frac{1}{t}\sum_{s=1}^{t}\|A1_{s,t}\|^2 = O_P(1/R)$, whereas for $j = 2,3,4$, $\sup_t \frac{1}{t}\sum_{s=1}^{t}\|Aj_{s,t}\|^2 = O_P(1/N)$.

Lemma A.4 and part a) of the following result (which provides uniform bounds on $\tilde V_t^{-1}$ over $t = R,\ldots,T$) are used to prove Theorem 4.1.

Lemma A.5 Under Assumptions 1-6, we have that (a) $\sup_t\|\tilde V_t^{-1}\| = O_P(1)$; and (b) $\sup_t\|H_t\|^q = O_P(1)$, for any $q > 0$.

Our next result derives the convergence rates of $H_t$, $H_t^{-1}$, and $\tilde V_t^{-1}$. Given the identification condition in Assumption 1 a), we show that $H_t$ converges to $H_{0t} = \mathrm{diag}(\pm 1)$ at rate $O(1/\delta_{N,R})$ uniformly over $t = R,\ldots,T$. A similar result holds for $H_t^{-1}$, which converges to $H_{0t}^{-1}$, and for $\tilde V_t^{-1}$, which converges to $V_0^{-1}$, uniformly over $t = R,\ldots,T$.

Lemma A.6 Under Assumptions 1-6,
a) $\sup_t\|H_t - H_{0t}\| = \sup_t\|H_t^{-1} - H_{0t}^{-1}\| = O_P(1/\delta_{N,R})$, where $H_{0t} = \mathrm{diag}(\pm 1)$ and $\delta_{N,R} = \min(\sqrt{N},\sqrt{R})$;
b) $\sup_t\|\tilde V_t^{-1} - V_0^{-1}\| = O_P(1/\delta_{N,R})$, where $V_0 > 0$ is the diagonal limit implied by Assumption 1.

Next, we provide a result that is auxiliary in proving Lemma 4.1.

Lemma A.7 Under Assumptions 1 through 6,
a) $\sup_t\bigl\|\frac{1}{t}\sum_{s=1}^{t-1}A1_{s,t}u_{s+1}\bigr\| = O_P(1/R)$, whereas for $j = 2,3,4$, $\sup_t\bigl\|\frac{1}{t}\sum_{s=1}^{t-1}Aj_{s,t}u_{s+1}\bigr\| = O_P(1/\sqrt{RN})$;
b) $\sup_t\bigl\|\frac{1}{t}\sum_{s=1}^{t-1}A1_{s,t}z_s'\bigr\| = O_P(1/R)$; for $j = 2,3$, $\sup_t\bigl\|\frac{1}{t}\sum_{s=1}^{t-1}Aj_{s,t}z_s'\bigr\| = O_P(1/\sqrt{RN})$; and $\sup_t\bigl\|\frac{1}{t}\sum_{s=1}^{t-1}A4_{s,t}z_s'\bigr\| = O_P(1/\delta_{N,R}^2)$.

To prove Lemma 4.1, we need to introduce some additional notation. Let
$$\hat B(t) = \Bigl(\frac{1}{t}\sum_{s=1}^{t-1}\tilde z_{s,t}\tilde z_{s,t}'\Bigr)^{-1} \qquad\text{and}\qquad \hat V(t) = \frac{1}{t}\sum_{s=1}^{t-1}\tilde z_{s,t}u_{s+1},$$
and note that $y_{s+1} = z_s'\beta^* + u_{s+1}$, with $\beta^* = (\beta_1', 0')'$ under the null hypothesis. Moreover, adding and subtracting appropriately, it is also true that
$$y_{s+1} = \tilde z_{s,t}'\Pi_t'^{-1}\beta^* - (\tilde z_{s,t} - \Pi_tz_s)'\Pi_t'^{-1}\beta^* + u_{s+1} = \tilde z_{s,t}'\Pi_t'^{-1}\beta^* + u_{s+1},$$
given that $(\tilde z_{s,t} - \Pi_tz_s)'\Pi_t'^{-1}\beta^* = 0$ when $\beta^* = (\beta_1', 0')'$, and there is no estimation error in the regressors of the benchmark model. Therefore,
$$\hat\beta_t = \Pi_t'^{-1}\beta^* + \hat B(t)\hat V(t) \qquad\text{and}\qquad \ddot\beta_t = \beta^* + B(t)V(t),$$
where $B(t)$ and $V(t)$ are defined analogously to $\hat B(t)$ and $\hat V(t)$ with $z_s$ replacing $\tilde z_{s,t}$. This implies that
$$\hat\beta_t - \Pi_t'^{-1}\ddot\beta_t = \hat B(t)\hat V(t) - \Pi_t'^{-1}B(t)V(t),$$
since $(\tilde z_{s,t} - \Pi_tz_s)'\Pi_t'^{-1}\beta^* = 0$ under the null hypothesis. Thus, we can write
$$\hat\beta_t - \Pi_t'^{-1}\ddot\beta_t = \Pi_t'^{-1}B(t)\Pi_t^{-1}\bigl(\hat V(t) - \Pi_tV(t)\bigr) + \bigl(\hat B(t) - \Pi_t'^{-1}B(t)\Pi_t^{-1}\bigr)\hat V(t),$$
where the first term captures the factor estimation error in the score process and the second term captures the factor estimation error in the (inverse) Hessian. The following result analyzes each term of $\hat\beta_t - \Pi_t'^{-1}\ddot\beta_t$.

Lemma A.8 Suppose Assumptions 1-6 hold. Then,
a) $\sup_t\|\hat V(t) - \Pi_tV(t)\| = O_P\bigl(1/(\sqrt{R}\,\delta_{N,T})\bigr)$;
b) $\sup_t\|\hat B(t) - \Pi_t'^{-1}B(t)\Pi_t^{-1}\| = O_P(1/\delta_{N,R}^2)$;
c) $\sup_t\|\hat V(t)\| = O_P(1/\sqrt{T})$.

Our next result is useful to prove Lemma 4.2. Given (8), we can write
$$\sum_{t=R}^{T-1}u_{t+1}(\ddot u_{2,t+1} - \hat u_{2,t+1}) = \sum_{t=R}^{T-1}u_{t+1}(\tilde z_{t,t} - \Pi_tz_t)'\Pi_t'^{-1}\ddot\beta_t + \sum_{t=R}^{T-1}u_{t+1}\tilde z_{t,t}'\bigl(\hat\beta_t - \Pi_t'^{-1}\ddot\beta_t\bigr). \qquad (10)$$
We analyze each piece separately. The following lemma contains the results.

Lemma A.9 Suppose Assumptions 1-6 hold.
If the null hypothesis holds and $\sqrt{T}/N \to 0$, then
a) $\sum_{t=R}^{T-1}u_{t+1}(\tilde z_{t,t} - \Pi_tz_t)'\Pi_t'^{-1}\ddot\beta_t = o_P(1)$;
b) $\sum_{t=R}^{T-1}u_{t+1}\tilde z_{t,t}'\bigl(\hat\beta_t - \Pi_t'^{-1}\ddot\beta_t\bigr) = o_P(1)$.

A.2 Proofs of auxiliary lemmas

Proof of Lemma A.1. We start with a). Under Assumption 2 c), we can write
$$\sum_{s=1}^{t-1}\gamma_{l,s}u_{s+1} = \sum_{s=l-m}^{l+m}\gamma_{l,s}u_{s+1},$$
where we let $\gamma_{l,s} \equiv 0$ if $\min(l,s) \le 0$ or $\max(l,s) > t$. By Cauchy-Schwarz,
$$\Bigl(\sum_{s=l-m}^{l+m}\gamma_{l,s}u_{s+1}\Bigr)^2 \le (2m+1)\max_{l,s}\gamma_{l,s}^2\sum_{s=l-m}^{l+m}u_{s+1}^2 \le M\sum_{s=l-m}^{l+m}u_{s+1}^2$$
for some constant $M$, given Assumption 2 c), implying that
$$\sup_t\frac{1}{t}\sum_{l=1}^{t}\Bigl(\sum_{s=1}^{t-1}\gamma_{l,s}u_{s+1}\Bigr)^2 \le M\sup_t\frac{1}{t}\sum_{l=1}^{t}\sum_{s=l-m}^{l+m}u_{s+1}^2 \le M(2m+1)\frac{1}{R}\sum_{s=1}^{T}u_{s+1}^2 = O_P(1),$$
provided $E(u_{s+1}^2) \le M$, which follows under Assumption 6 a).

Next, consider part b). We have that
$$\sup_t\frac{1}{t}\sum_{l=1}^{t}\Bigl(\sum_{s=1}^{t-1}\zeta_{l,s}u_{s+1}\Bigr)^2 = \frac{T}{N}\sup_t\frac{1}{t}\sum_{l=1}^{t}\Bigl(\frac{1}{\sqrt{NT}}\sum_{s=1}^{t-1}\sum_{i=1}^{N}(e_{il}e_{is} - E(e_{il}e_{is}))u_{s+1}\Bigr)^2 = O_P\Bigl(\frac{T}{N}\Bigr)$$
under Assumption 3 b). For c),
$$\sup_t\frac{1}{t}\sum_{l=1}^{t}\Bigl(\sum_{s=1}^{t-1}\eta_{l,s}u_{s+1}\Bigr)^2 \le \frac{T}{N}\underbrace{\sup_t\frac{1}{t}\sum_{l=1}^{t}\|f_l\|^2}_{=O_P(1)}\underbrace{\sup_t\Bigl\|\frac{1}{\sqrt{NT}}\sum_{s=1}^{t-1}\sum_{i=1}^{N}\lambda_ie_{is}u_{s+1}\Bigr\|^2}_{=O_P(1)} = O_P\Bigl(\frac{T}{N}\Bigr),$$
given in particular Assumption 3 c). Finally, for d),
$$\sup_t\frac{1}{t}\sum_{l=1}^{t}\Bigl(\sum_{s=1}^{t-1}\xi_{l,s}u_{s+1}\Bigr)^2 \le \frac{T}{N}\Bigl(\frac{1}{R}\sum_{l=1}^{T}\frac{1}{N}\Bigl\|\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_ie_{il}\Bigr\|^2\Bigr)\cdot N\,\sup_t\Bigl\|\frac{1}{\sqrt{T}}\sum_{s=1}^{t-1}f_su_{s+1}\Bigr\|^2 = O_P\Bigl(\frac{T}{N}\Bigr),$$
given Assumptions 2 e) and 6 b).

Part e) follows exactly as the proof of a), given Assumption 2 c) and the fact that $E\|z_s\|^2 \le M$ by Assumption 6 a). Next, consider part f).
We have that
$$\sup_t\frac{1}{t}\sum_{l=1}^{t}\Bigl\|\sum_{s=1}^{t-1}\zeta_{l,s}z_s'\Bigr\|^2 = \frac{T}{N}\sup_t\frac{1}{t}\sum_{l=1}^{t}\Bigl\|\frac{1}{\sqrt{NT}}\sum_{s=1}^{t-1}\sum_{i=1}^{N}(e_{il}e_{is} - E(e_{il}e_{is}))z_s'\Bigr\|^2 = O_P\Bigl(\frac{T}{N}\Bigr),$$
under the analog of Assumption 3 b) with $u_{s+1}$ replaced with $z_s'$ (cf. Assumption 3 d)). For g),
$$\sup_t\frac{1}{t}\sum_{l=1}^{t}\Bigl\|\sum_{s=1}^{t-1}\eta_{l,s}z_s'\Bigr\|^2 \le \frac{T}{N}\underbrace{\sup_t\frac{1}{t}\sum_{l=1}^{t}\|f_l\|^2}_{=O_P(1)}\underbrace{\sup_t\Bigl\|\frac{1}{\sqrt{NT}}\sum_{s=1}^{t-1}\sum_{i=1}^{N}\lambda_ie_{is}z_s'\Bigr\|^2}_{=O_P(1)} = O_P\Bigl(\frac{T}{N}\Bigr),$$
given in particular Assumption 3 e) (the analog of 3 c)).

Proof of Lemma A.2. Starting with a), by the Cauchy-Schwarz inequality (applied twice),
$$E\Bigl\|\frac{1}{t}\sum_{l=1}^{t}f_l\sum_{s=1}^{t-1}\gamma_{l,s}u_{s+1}\Bigr\|^4 \le \Bigl(\frac{1}{t}\sum_{l=1}^{t}E\|f_l\|^8\Bigr)^{1/2}\Bigl(\frac{1}{t}\sum_{l=1}^{t}E\Bigl(\sum_{s=1}^{t-1}\gamma_{l,s}u_{s+1}\Bigr)^8\Bigr)^{1/2}.$$
The first term in parentheses is bounded given the moment conditions on $f_l$. For the second term, by the moving average-type assumption on $\gamma_{l,s}$, we have that for given $l$,
$$E\Bigl(\sum_{s=1}^{t-1}\gamma_{l,s}u_{s+1}\Bigr)^8 = E\Bigl(\sum_{s=l-m}^{l+m}\gamma_{l,s}u_{s+1}\Bigr)^8 \le (2m+1)^7\sum_{s=l-m}^{l+m}\gamma_{l,s}^8\,E(u_{s+1}^8) \le M,$$
given that $|\gamma_{l,s}| \le M$ and $E(u_{s+1}^8) \le M$ by Assumptions 2 c) and 6 a).

For part b), following exactly the same steps as for a), we have that
$$\frac{1}{t}\sum_{l=1}^{t}E\Bigl(\sum_{s=1}^{t-1}\zeta_{l,s}u_{s+1}\Bigr)^8 = \frac{T^4}{N^4}\,\frac{1}{t}\sum_{l=1}^{t}E\Bigl(\frac{1}{\sqrt{NT}}\sum_{s=1}^{t-1}\sum_{i=1}^{N}(e_{il}e_{is} - E(e_{il}e_{is}))u_{s+1}\Bigr)^8 = O\Bigl(\frac{T^4}{N^4}\Bigr),$$
given Assumption 4 a), which yields the rate $O(T^2/N^2)$ for the fourth moment in the statement of the lemma.

For part c),
$$\sup_tE\Bigl\|\frac{1}{t}\sum_{l=1}^{t}f_l\sum_{s=1}^{t-1}\eta_{l,s}u_{s+1}\Bigr\|^4 = \sup_tE\Bigl\|\frac{1}{t}\sum_{l=1}^{t}f_lf_l'\,\frac{1}{N}\sum_{s=1}^{t-1}\sum_{i=1}^{N}\lambda_ie_{is}u_{s+1}\Bigr\|^4 \le \frac{T^2}{N^2}\bigl(\sup_lE\|f_l\|^{16}\bigr)^{1/2}\Bigl(\sup_tE\Bigl\|\frac{1}{\sqrt{NT}}\sum_{s=1}^{t-1}\sum_{i=1}^{N}\lambda_ie_{is}u_{s+1}\Bigr\|^8\Bigr)^{1/2} = O\Bigl(\frac{T^2}{N^2}\Bigr),$$
given Assumptions 1 a) and 4 b). Finally, for part d), we have that
$$\sup_tE\Bigl\|\frac{1}{t}\sum_{l=1}^{t}f_l\sum_{s=1}^{t-1}\xi_{l,s}u_{s+1}\Bigr\|^4 \le \frac{T^2}{N^2}\bigl(\sup_lE\|f_l\|^{16}\bigr)^{1/4}\Bigl(\sup_lE\Bigl\|\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_ie_{il}\Bigr\|^{16}\Bigr)^{1/4}\Bigl(\sup_tE\Bigl\|\frac{1}{\sqrt{T}}\sum_{s=1}^{t-1}f_su_{s+1}\Bigr\|^{8}\Bigr)^{1/2} = O\Bigl(\frac{T^2}{N^2}\Bigr),$$
given Assumptions 1 a), 2 e), and 6 b).

Proof of Lemma A.3. For part a), note that by the definition of $U_{t,T}$,
$$\frac{1}{T}\sum_{t=R}^{T}\|U_{t,T}\|^2 \le \Bigl(\frac{T}{R}\Bigr)^2\frac{1}{T}\sum_{t=R}^{T}u_{t+1}^2\,\sup_t\Bigl\|\frac{1}{\sqrt{T}}\sum_{s=1}^{t-1}z_su_{s+1}\Bigr\|^2 = O_P(1),$$
given Assumptions 6 a) and b).

For part b), it suffices to show that
$$E\Bigl|\sum_{t=R}^{T}\Bigl(\frac{1}{t}\sum_{l=1}^{t}f_l\gamma_{l,t}\Bigr)'AU_{t,T}\Bigr| = O(1).$$
Using the triangle and Cauchy-Schwarz inequalities,
$$E\Bigl|\sum_{t=R}^{T}\Bigl(\frac{1}{t}\sum_{l=1}^{t}f_l\gamma_{l,t}\Bigr)'AU_{t,T}\Bigr| \le \|A\|\,\sup_t\bigl(E\|U_{t,T}\|^2\bigr)^{1/2}\sup_l\bigl(E\|f_l\|^2\bigr)^{1/2}\,\frac{1}{R}\sum_{t=R}^{T}\sum_{l=1}^{t}|\gamma_{l,t}| = O(1),$$
provided (i) $\sup_tE\|U_{t,T}\|^2 = O(1)$, (ii) $\sup_lE\|f_l\|^2 = O(1)$, and (iii) $\frac{1}{R}\sum_{t=R}^{T}\sum_{l=1}^{t}|\gamma_{l,t}| = O(1)$. (ii) follows by Assumption 1 a), and (iii) follows by Assumption 2 c). Next, we show that (i) follows under our assumptions. By the definition of $U_{t,T}$,
$$\sup_tE\|U_{t,T}\|^2 \le \Bigl(\frac{T}{R}\Bigr)^2\sup_tE\Bigl(u_{t+1}^2\Bigl\|\frac{1}{\sqrt{T}}\sum_{s=1}^{t-1}z_su_{s+1}\Bigr\|^2\Bigr) \le \Bigl(\frac{T}{R}\Bigr)^2\sup_t\bigl(Eu_{t+1}^4\bigr)^{1/2}\sup_t\Bigl(E\Bigl\|\frac{1}{\sqrt{T}}\sum_{s=1}^{t-1}z_su_{s+1}\Bigr\|^4\Bigr)^{1/2},$$
implying that $\sup_{R\le t\le T}E\|U_{t,T}\|^2 = O(1)$, since $\sup_tEu_{t+1}^4 \le M$ and $\sup_tE\|T^{-1/2}\sum_{s=1}^{t-1}z_su_{s+1}\|^4 \le M$ under Assumptions 6 a) and b).

For part c), we can write
$$\sum_{t=R}^{T}\Bigl(\frac{1}{t}\sum_{l=1}^{t}f_l\zeta_{l,t}\Bigr)'AU_{t,T} = \sum_{t=R}^{T}u_{t+1}W_{t,T},$$
where
$$W_{t,T} = \frac{T}{t}\Bigl(\frac{1}{tN}\sum_{l=1}^{t}\sum_{i=1}^{N}f_l(e_{il}e_{it} - E(e_{il}e_{it}))\Bigr)'A\Bigl(\frac{1}{\sqrt{T}}\sum_{s=1}^{t-1}z_su_{s+1}\Bigr)$$
is measurable with respect to $\mathcal{F}^t$. Thus, $E(u_{t+1}W_{t,T}\mid\mathcal{F}^t) = 0$ given the martingale difference assumption 5, and it follows that
$$\mathrm{Var}\Bigl(\sum_{t=R}^{T}u_{t+1}W_{t,T}\Bigr) = \sum_{t=R}^{T}E\bigl(u_{t+1}^2W_{t,T}^2\bigr) \le \underbrace{\Bigl(\sum_{t=R}^{T}Eu_{t+1}^4\Bigr)^{1/2}}_{=O(\sqrt{T})}\underbrace{\Bigl(\sum_{t=R}^{T}EW_{t,T}^4\Bigr)^{1/2}}_{=O((T/N^2)^{1/2})} = O\Bigl(\frac{T}{N}\Bigr),$$
which suffices to prove the result. Note that
$$\sup_tEW_{t,T}^4 \le \Bigl(\frac{T}{R}\Bigr)^4\|A\|^4\Bigl(\sup_tE\Bigl\|\frac{1}{\sqrt{T}}\sum_{s=1}^{t-1}z_su_{s+1}\Bigr\|^8\Bigr)^{1/2}\Bigl(\sup_tE\Bigl\|\frac{1}{tN}\sum_{l=1}^{t}\sum_{i=1}^{N}f_l(e_{il}e_{it} - E(e_{il}e_{it}))\Bigr\|^8\Bigr)^{1/2} = O(1/N^2),$$
where we have used Assumption 6 b) to bound the first term in the square-root parentheses. The second term can be bounded by
$$\sup_tE\Bigl\|\frac{1}{tN}\sum_{l=1}^{t}\sum_{i=1}^{N}f_l(e_{il}e_{it} - E(e_{il}e_{it}))\Bigr\|^8 \le \frac{1}{N^4}\bigl(\sup_lE\|f_l\|^{16}\bigr)^{1/2}\Bigl(\sup_{l,t}E\Bigl|\frac{1}{\sqrt{N}}\sum_{i=1}^{N}(e_{il}e_{it} - E(e_{il}e_{it}))\Bigr|^{16}\Bigr)^{1/2} = O\Bigl(\frac{1}{N^4}\Bigr),$$
given Assumptions 2 d) and 6 a).

Next, consider part d). By replacing $\eta_{l,t}$ and $U_{t,T}$ by their definitions, we can write
$$\sum_{t=R}^{T}\Bigl(\frac{1}{t}\sum_{l=1}^{t}f_l\eta_{l,t}\Bigr)'AU_{t,T} = \sum_{t=R}^{T}\Bigl(\frac{1}{t}\sum_{l=1}^{t}f_lf_l'\,\frac{1}{N}\sum_{i=1}^{N}\lambda_ie_{it}\Bigr)'A\,u_{t+1}\,\frac{T}{t}\Bigl(\frac{1}{\sqrt{T}}\sum_{s=1}^{t-1}z_su_{s+1}\Bigr).$$
Adding and subtracting appropriately, we can decompose the term above as the sum of two terms,
$$\Delta_1 = \sum_{t=R}^{T}\Bigl(\Bigl(\frac{1}{t}\sum_{l=1}^{t}f_lf_l' - I_r\Bigr)\frac{1}{N}\sum_{i=1}^{N}\lambda_ie_{it}\Bigr)'A\,u_{t+1}\,\frac{T}{t}\Bigl(\frac{1}{\sqrt{T}}\sum_{s=1}^{t-1}z_su_{s+1}\Bigr)$$
and
$$\Delta_2 = \sum_{t=R}^{T}\Bigl(\frac{1}{N}\sum_{i=1}^{N}\lambda_ie_{it}\Bigr)'A\,u_{t+1}\,\frac{T}{t}\Bigl(\frac{1}{\sqrt{T}}\sum_{s=1}^{t-1}z_su_{s+1}\Bigr).$$
We can bound $\Delta_1$ by
$$|\Delta_1| \le \frac{T}{R}\|A\|\sup_t\Bigl\|\frac{1}{t}\sum_{l=1}^{t}f_lf_l' - I_r\Bigr\|\sup_t\Bigl\|\frac{1}{\sqrt{T}}\sum_{s=1}^{t-1}z_su_{s+1}\Bigr\|\Bigl(\sum_{t=R}^{T}u_{t+1}^2\Bigr)^{1/2}\Bigl(\sum_{t=R}^{T}\Bigl\|\frac{1}{N}\sum_{i=1}^{N}\lambda_ie_{it}\Bigr\|^2\Bigr)^{1/2} = O_P\Bigl(\frac{1}{\sqrt{T}}\Bigr)O_P(1)\,O_P(\sqrt{T})\,O_P\Bigl(\sqrt{\frac{T}{N}}\Bigr) = O_P\Bigl(\sqrt{\frac{T}{N}}\Bigr),$$
given Assumptions 1 a) and b), 2 e), and 6 a). For $\Delta_2$, we can write $\Delta_2 = \sum_{t=R}^{T}u_{t+1}\varsigma_{t,T}$, where
$$\varsigma_{t,T} = \frac{T}{t}\Bigl(\frac{1}{N}\sum_{i=1}^{N}\lambda_ie_{it}\Bigr)'A\Bigl(\frac{1}{\sqrt{T}}\sum_{s=1}^{t-1}z_su_{s+1}\Bigr)$$
is measurable with respect to $\mathcal{F}^t$. Since $E(u_{t+1}\varsigma_{t,T}\mid\mathcal{F}^t) = 0$ given the martingale difference assumption 5, it follows that
$$\mathrm{Var}\Bigl(\sum_{t=R}^{T}u_{t+1}\varsigma_{t,T}\Bigr) = \sum_{t=R}^{T}E\bigl(u_{t+1}^2\varsigma_{t,T}^2\bigr) \le \underbrace{\Bigl(\sum_{t=R}^{T}Eu_{t+1}^4\Bigr)^{1/2}}_{=O(\sqrt{T})}\underbrace{\Bigl(\sum_{t=R}^{T}E\varsigma_{t,T}^4\Bigr)^{1/2}}_{=O((T/N^2)^{1/2})} = O\Bigl(\frac{T}{N}\Bigr),$$
where
$$\sup_tE\varsigma_{t,T}^4 \le \Bigl(\frac{T}{R}\Bigr)^4\|A\|^4\Bigl(\sup_tE\Bigl\|\frac{1}{\sqrt{T}}\sum_{s=1}^{t-1}z_su_{s+1}\Bigr\|^8\Bigr)^{1/2}\Bigl(\sup_tE\Bigl\|\frac{1}{N}\sum_{i=1}^{N}\lambda_ie_{it}\Bigr\|^8\Bigr)^{1/2} = O(1/N^2),$$
given our assumptions. This proves that $\Delta_2 = O_P(\sqrt{T/N})$ and concludes the proof of part d).

Finally, to prove part e), by replacing $\xi_{l,t}$ and $U_{t,T}$ by their definitions, we can write
$$\sum_{t=R}^{T}\Bigl(\frac{1}{t}\sum_{l=1}^{t}f_l\xi_{l,t}\Bigr)'AU_{t,T} = \sum_{t=R}^{T}u_{t+1}f_t'\varkappa_{t,T}, \qquad \varkappa_{t,T} \equiv \frac{T}{t}\Bigl(\frac{1}{t}\sum_{l=1}^{t}\frac{1}{N}\sum_{i=1}^{N}\lambda_ie_{il}f_l'\Bigr)'A\Bigl(\frac{1}{\sqrt{T}}\sum_{s=1}^{t-1}z_su_{s+1}\Bigr),$$
where $\varkappa_{t,T}$ is measurable with respect to $\mathcal{F}^t$. Since $E(u_{t+1}f_t'\varkappa_{t,T}\mid\mathcal{F}^t) = 0$, we can use the above argument to show that $\mathrm{Var}(\sum_{t=R}^{T}u_{t+1}f_t'\varkappa_{t,T}) = O(T/N)$, given our assumptions.

Proof of Lemma A.4. Part a). For $j = 1,\ldots,4$, let
$$I_j \equiv \frac{1}{P}\sum_{t=R}^{T-1}\|Aj_{t,t}\|^2, \qquad\text{with}\qquad Aj_{t,t} = \frac{1}{t}\sum_{l=1}^{t}\tilde f_{l,t}\delta_{j,l,t},$$
where $\delta_{1,l,t} = \gamma_{l,t}$, $\delta_{2,l,t} = \zeta_{l,t}$, $\delta_{3,l,t} = \eta_{l,t}$, and $\delta_{4,l,t} = \xi_{l,t}$. We consider each term separately. Starting with the first term,
$$I_1 = \frac{1}{P}\sum_{t=R}^{T-1}\Bigl\|\frac{1}{t}\sum_{l=1}^{t}\tilde f_{l,t}\gamma_{l,t}\Bigr\|^2 \le \frac{1}{P}\sum_{t=R}^{T-1}\Bigl(\frac{1}{t}\sum_{l=1}^{t}\|\tilde f_{l,t}\|^2\Bigr)\Bigl(\frac{1}{t}\sum_{l=1}^{t}\gamma_{l,t}^2\Bigr) \le r\,\frac{1}{PR}\sum_{t=1}^{T}\sum_{l=1}^{T}\gamma_{l,t}^2 = O(1/T),$$
under Assumption 2 c) and given that $T/R = O(1)$. Next,
$$I_2 \le r\,\frac{1}{PR}\sum_{t=1}^{T}\sum_{l=1}^{T}\zeta_{l,t}^2 = O_P(1/N),$$
given Assumption 2 d). $I_3$ and $I_4$ follow similarly, using the fact that
$$\frac{1}{PR}\sum_{t=1}^{T}\sum_{l=1}^{T}\eta_{l,t}^2 = O_P(1/N) \qquad\text{and}\qquad \frac{1}{PR}\sum_{t=1}^{T}\sum_{l=1}^{T}\xi_{l,t}^2 = O_P(1/N).$$
Part b). The proof follows the same argument as part a) and is therefore omitted.

Proof of Lemma A.5. We start with (a). Recall that $\tilde V_t$ denotes the eigenvalue matrix of $X_tX_t'/tN$ for $t = R,\ldots,T$, with its eigenvalues arranged in non-increasing order: $\tilde v_{1,t} \ge \cdots \ge \tilde v_{r,t}$.
It follows that $\tilde V_t^{-1} = \mathrm{diag}(\tilde v_{j,t}^{-1})$, where $\tilde v_{1,t}^{-1} \le \tilde v_{2,t}^{-1} \le \cdots \le \tilde v_{r,t}^{-1}$. Hence, $\|\tilde V_t^{-1}\|^2 = \mathrm{tr}(\tilde V_t^{-2}) = \sum_{j=1}^{r}\tilde v_{j,t}^{-2} \le r\,\tilde v_{r,t}^{-2}$, where $\tilde v_{r,t}$ is the $r$th largest eigenvalue of $X_tX_t'/tN$. Since
$$\sup_t\|\tilde V_t^{-1}\| \le r^{1/2}\sup_t\tilde v_{r,t}^{-1} = r^{1/2}\Bigl(\inf_t|\tilde v_{r,t}|\Bigr)^{-1},$$
it suffices to show that $\inf_t|\tilde v_{r,t}| \ge C > 0$ for some constant $C$, with probability approaching 1. We use an argument similar to that used by Fan et al. (2013; cf. Lemma C.4). First, note that for $s = 1,\ldots,t$, where $t = R,\ldots,T$, $x_s = \Lambda f_s + e_s$, where $x_s = (x_{1s},\ldots,x_{Ns})'$ is $N \times 1$ and a similar definition applies to $e_s$. This implies that the $N \times N$ covariance matrix of $x_s$ is equal to
$$\Sigma \equiv E(x_sx_s') = \Lambda E(f_sf_s')\Lambda' + E(e_se_s') = \Lambda\Lambda' + \Sigma_e,$$
or equivalently, $\Sigma/N = \Lambda\Lambda'/N + \Sigma_e/N$, given that $E(f_se_{is}) = 0$ (by Assumption 3 a)) and the identification condition Assumption 1. Given this assumption, the $r \times r$ matrix $\Lambda'\Lambda/N$ is diagonal with distinct and positive elements on the main diagonal. This implies that its eigenvalues, say $\mu_j(\Lambda'\Lambda/N)$, are all positive and distinct for $j = 1,\ldots,r$. Since $\Lambda'\Lambda$ and $\Lambda\Lambda'$ share the same nonzero eigenvalues, it follows
It follows that 1 1 t and hence it su¢ ces to show that supt Xt0 Xt =tN N = oP (1) : But ; Xt0 Xt tN N = D1t + D2t + D3t + D4t ; where t D1t = 1X fs fs0 t N s=1 t 1X D3t = Ir Nt ! 0 t 1X es e0s t 1 ; D2t = N s=1 e ! ; 0 fs e0s , and D4t = D3t ; s=1 where we recall that E (fs fs0 ) = Ir by assumption. It follows that sup Xt0 Xt =tN N 1 since kAk1 sup kD1t k + sup kD2t k + 2 sup kD3t k ; 1 t t t t kAk for any matrix A (see Horn and Johnson, 1985, p. 314). Starting with D1t , we have that sup kD1t k = t p 2 = N | {z } =tr( 0 sup t =N )=O(1) T t t 1X fs fs0 T Ir s=1 ! C t T 1X sup fs fs0 R t T Ir s=1 for some constant C: Since T =R = O (1) by assumption, it follows that supt kD1t k = oP (1) given that p P supt T1 ts=1 (fs fs0 Ir ) = OP 1= T = oP (1) by Assumption 1. Next, consider D2t . We have that for any " > 0; P sup kD2t k > " t T X t=R P (kD2t k > ") 29 T 1 X E kD2t k2 ; "2 t=R P thus it su¢ ces to prove that Tt=R E kD2t k2 = o (1). By the de…nition of the Frobenius norm, kAk2 = P 2 tr (A0 A) = N N matrix. Thus, for given t = R; : : : ; T; i;j=1 aij for any N E kD2t k 2 = !2 N t N t t 1 X 1X 1 X 1 X X E (eis ejs E (eis ejs )) = 2 cov (eis1 ejs1 ; eis2 ejs2 ) N2 t N t2 s=1 s1 =1 s2 =1 i;j=1 i;j=1 1 0 N T T T 1 @1 X 1 X X jcov (eis1 ejs1 ; eis2 ejs2 )jA ; R2 N N T s1 =1 s2 =1 i;j=1 which implies that T X t=R 2 T R E kD2t k2 1 N T X T X X 1 @1 1 jcov (eis1 ejs1 ; eis2 ejs2 )jA = O (1=N ) = o (1) ; N N T 0 s1 =1 s2 =1 i;j=1 0 ; we have that given in particular Assumption 2 f). Finally, for D3t = D4t 2 t 1X 2 kD3t k 1 p N t N | {z } 2 C 1 C R s=1 N t 1 X 1X C fs eis N t fs e0s |{z} i=1 r N t N 1 X 1 X p fs eis N T s=1 i=1 T R 2 s=1 2 1 R = OP = oP (1) ; given Assumption 3 a) and the fact that T =R = O (1) : To prove (b), we prove the result for q = 2 and note that this implies the result for any q > 0 since q sup kHt k = sup kHt k t q=2 2 q=2 2 = sup kHt k = OP (1) t t if supt kHt k2 = OP (1). 
By definition, for $t = R,\ldots,T$, $H_t = \tilde V_t^{-1}\,\frac{\tilde F_t'F_t}{t}\,\frac{\Lambda'\Lambda}{N}$, implying that
$$\sup_t\|H_t\| \le \sup_t\|\tilde V_t^{-1}\|\,\sup_t\Bigl\|\frac{\tilde F_t'F_t}{t}\Bigr\|\,\Bigl\|\frac{\Lambda'\Lambda}{N}\Bigr\|,$$
by the fact that $\|AB\| \le \|A\|\,\|B\|$. The first factor is $O_P(1)$ given part a), and $\|\Lambda'\Lambda/N\| = O(1)$ under Assumption 1 c). Thus, it suffices to show that $\sup_t\|\tilde F_t'F_t/t\| = O_P(1)$. By the Cauchy-Schwarz inequality,
$$\Bigl\|\frac{\tilde F_t'F_t}{t}\Bigr\| = \Bigl\|\frac{1}{t}\sum_{s=1}^{t}\tilde f_{s,t}f_s'\Bigr\| \le \Bigl(\frac{1}{t}\sum_{s=1}^{t}\|\tilde f_{s,t}\|^2\Bigr)^{1/2}\Bigl(\frac{1}{t}\sum_{s=1}^{t}\|f_s\|^2\Bigr)^{1/2} = \sqrt{r}\Bigl(\frac{1}{t}\sum_{s=1}^{t}\|f_s\|^2\Bigr)^{1/2},$$
implying that
$$\sup_t\Bigl\|\frac{\tilde F_t'F_t}{t}\Bigr\|^2 \le r\,\frac{T}{R}\,\frac{1}{T}\sum_{s=1}^{T}\|f_s\|^2 = O_P(1),$$
since $\sup_sE\|f_s\|^2 = O(1)$ by Assumption 1 a).

Proof of Lemma A.6. For part a), we follow the proof of result (2) in Bai and Ng (2013; cf. Appendix B, p. 27). In particular, by the definition of $H_t$, we can show that for $t = R,\ldots,T$,
$$H_tH_t' = I_r + C_{1t} + C_{2t} + C_{3t},$$
where
$$C_{1t} = -\frac{1}{t}\sum_{s=1}^{t}\tilde f_{s,t}\bigl(\tilde f_{s,t} - H_tf_s\bigr)', \quad C_{2t} = -\frac{1}{t}\sum_{s=1}^{t}\bigl(\tilde f_{s,t} - H_tf_s\bigr)f_s'H_t', \quad C_{3t} = -H_t\Bigl(\frac{1}{t}\sum_{s=1}^{t}f_sf_s' - I_r\Bigr)H_t'.$$
It follows that
$$\sup_t\|C_{1t}\| \le \sup_t\Bigl(\frac{1}{t}\sum_{s=1}^{t}\|\tilde f_{s,t}\|^2\Bigr)^{1/2}\Bigl(\frac{1}{t}\sum_{s=1}^{t}\|\tilde f_{s,t} - H_tf_s\|^2\Bigr)^{1/2} = r^{1/2}\sup_t\Bigl(\frac{1}{t}\sum_{s=1}^{t}\|\tilde f_{s,t} - H_tf_s\|^2\Bigr)^{1/2} = O_P(1/\delta_{N,R}),$$
given Theorem 4.1. Similarly,
$$\sup_t\|C_{2t}\| \le \sup_t\Bigl(\frac{1}{t}\sum_{s=1}^{t}\|\tilde f_{s,t} - H_tf_s\|^2\Bigr)^{1/2}\sup_t\Bigl(\frac{1}{t}\sum_{s=1}^{t}\|H_tf_s\|^2\Bigr)^{1/2} = O_P(1/\delta_{N,R}),$$
by Theorem 4.1 and the fact that the second factor can be bounded by
$$\sup_t\Bigl(\frac{1}{t}\sum_{s=1}^{t}\|H_tf_s\|^2\Bigr)^{1/2} \le \sup_t\|H_t\|\Bigl(\frac{T}{R}\,\frac{1}{T}\sum_{s=1}^{T}\|f_s\|^2\Bigr)^{1/2} = O_P(1),$$
given Lemma A.5 b). Finally, for $C_{3t}$ we have that
$$\sup_t\|C_{3t}\| \le \sup_t\|H_t\|^2\,\sup_t\Bigl\|\frac{1}{t}\sum_{s=1}^{t}f_sf_s' - I_r\Bigr\| = O_P(1/\sqrt{T}),$$
given Assumption 1 b) and the fact that $T/R = O(1)$. This shows that $H_tH_t' = I_r + O_P(1/\delta_{N,R})$, where the remainder is uniform over $t = R,\ldots,T$. We can now follow the argument in Bai and Ng (2013) to show that $H_t = H_{0t} + O_P(1/\delta_{N,R})$ uniformly over $t = R,\ldots,T$, where $H_{0t}$ is a diagonal matrix with $\pm 1$ on the main diagonal. Note that we are not assuming exactly the same identifying restrictions as Bai and Ng's (2013) condition PC1. The difference is that we assume that $E(f_tf_t') = I_r$ instead of assuming that $F_t'F_t/t = I_r$ for each $t$. This explains why we get only the rate $1/\delta_{N,R}$ instead of $1/\delta_{N,R}^2$. For part b), note that we can show that $\tilde V_t = V_0 + O_P(1/\delta_{N,R})$
The di¤erence is that we assume that f instead of assuming that Ft0 Ft =t = Ir for each t: This explains why we get only the rate 1= instead of 1= 2 N;R : For part b), note that we can show that V~t = 0 N + OP (1= 31 N;R ) N;R = Ir by exactly the same argument as Bai and Ng (2013, p. 27). Given our Assumption 1.b), we can write V~t = uniformly over t. Since V~t sup V~t 1 1 1 = V~t sup V~t 1 1 t t + OP (1= V~t 1 N;R ) 1 ; it follows that V~t sup t (11) 1 = OP (1) OP (1= N;R ) O (1) given Lemma A.5 b) and (11). Proof of Lemma A.7. Part a). We rely on Lemma A.1. In particular, for j = 1; : : : ; 4; t 1 1X t Ajs;t us+1 t 1 X~ fl;t t2 = s=1 t 1 X s=1 l=1 = where we let 0 t t 1 1p @1 X X r t t be one of j;l;s j;l;s us+1 l=1 ! 2 C B X t 2C 1B ~ B1 fl;t C C tB A @ t l=1 {z } | 11=2 A j;l;s us+1 s=1 l;s ; l;s ; l;s ; l;s 11=2 0 =r 0 @ t t 1 1X X t l=1 2 j;l;s us+1 s=1 11=2 A ; depending on j = 1; 2; 3; 4. By Lemma A.1.a) through d), it then follows that sup t t 1 1X t 2 s=1 p 1 pT R N t t 1 1X X 1p 4 r sup @ R t t Ajs;t us+1 whereas it is OP 0 =O p1 RN 2 j;l;s us+1 s=1 l=1 131=2 A5 = OP 1 R if j = 1; for j = 2; 3; 4, given that T =R = O (1) : Part b). The same arguments follow by relying on Lemma A.1 e), f), and g). For the last term involving A4s;t , the analogue of Lemma A.1 d) when we replace us+1 with zs0 does not hold because 2 P we cannot claim that supt p1T ts=11 fs zs0 = OP (1) since E (fs zs0 ) = (E (fs ws0 ) ; E (fs fs0 )) 6= 0: Thus, we need a more re…ned analysis. Replacing A4s;t with its de…nition yields 1 sup t t t 1 X A4s;t zs0 s=1 = sup t 1 t = sup t t where supt t 1 Pt 1 0 s=1 fs zs t 1 s=1 1 t sup t t 1 X 1 t X l=1 t X1 t X f~l;t l=1 f~l;t N 1 X N i=1 fs zs0 s=1 sup t t N 1 X N i=1 ! 0 i eil 1 t X l=1 t 0 i eil 1 f~l;t ! t 1 X fs zs0 fs zs0 s=1 N 1 X N i=1 0 i eil ! ; is bounded under Assumption 6. By adding and subtracting appropriately, 32 we can write sup t 1 t t X N 1 X N f~l;t 0 i eil i=1 l=1 ! 
$$\sup_t\Bigl\|\frac{1}{t}\sum_{l=1}^{t}\tilde f_{l,t}\Bigl(\frac{1}{N}\sum_{i=1}^{N}\lambda_ie_{il}\Bigr)'\Bigr\| \le \sup_t\Bigl\|\frac{1}{t}\sum_{l=1}^{t}\bigl(\tilde f_{l,t} - H_tf_l\bigr)\Bigl(\frac{1}{N}\sum_{i=1}^{N}\lambda_ie_{il}\Bigr)'\Bigr\| + \sup_t\|H_t\|\,\sup_t\Bigl\|\frac{1}{t}\sum_{l=1}^{t}f_l\Bigl(\frac{1}{N}\sum_{i=1}^{N}\lambda_ie_{il}\Bigr)'\Bigr\|.$$
The first term can be shown to be $O_P(1/\delta_{N,R})\,O_P(1/\sqrt{N})$, given Theorem 4.1 and the fact that $\frac{1}{T}\sum_{t=1}^{T}\|\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\lambda_ie_{it}\|^2 = O_P(1)$ under our assumptions. Thus, this term is $O_P(1/\delta_{N,R}^2)$. For the second term, given that $\sup_t\|H_t\| = O_P(1)$, it suffices that
$$\sup_t\Bigl\|\frac{1}{t}\sum_{l=1}^{t}f_l\,\frac{1}{N}\sum_{i=1}^{N}e_{il}\lambda_i'\Bigr\| \le \frac{T}{R}\,\frac{1}{\sqrt{TN}}\,\sup_t\Bigl\|\frac{1}{\sqrt{TN}}\sum_{l=1}^{t}\sum_{i=1}^{N}f_le_{il}\lambda_i'\Bigr\|\sqrt{TN}\cdot\frac{1}{\sqrt{TN}} = O_P\bigl(1/\sqrt{TN}\bigr),$$
which follows under Assumption 3 e). This concludes the proof of part b).

Proof of Lemma A.8. Part a). Given the equality (9) and the definitions of $\hat V(t)$ and $V(t)$, we have that
$$\hat V(t) - \Pi_tV(t) = \frac{1}{t}\sum_{s=1}^{t-1}(\tilde z_{s,t} - \Pi_tz_s)u_{s+1} = \begin{pmatrix}0\\ \frac{1}{t}\sum_{s=1}^{t-1}\bigl(\tilde f_{s,t} - H_tf_s\bigr)u_{s+1}\end{pmatrix},$$
which implies that
$$\sup_t\bigl\|\hat V(t) - \Pi_tV(t)\bigr\| \le \sup_t\|\tilde V_t^{-1}\|\sum_{j=1}^{4}\sup_t\Bigl\|\frac{1}{t}\sum_{s=1}^{t-1}Aj_{s,t}u_{s+1}\Bigr\| = O_P\Bigl(\frac{1}{R}\Bigr) + O_P\Bigl(\frac{1}{\sqrt{RN}}\Bigr) = O_P\Bigl(\frac{1}{\sqrt{R}\,\delta_{N,T}}\Bigr),$$
by Lemmas A.5 and A.7.

Part b). Given the equality $A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}$ and the facts that $\sup_t\|\hat B(t)\| = O_P(1)$ and $\sup_t\|B(t)\| = O_P(1)$, it suffices to consider
$$\frac{1}{t}\sum_{s=1}^{t-1}\bigl(\tilde z_{s,t}\tilde z_{s,t}' - \Pi_tz_sz_s'\Pi_t'\bigr) = F_{1t} + F_{2t} + F_{2t}',$$
where
$$F_{1t} = \frac{1}{t}\sum_{s=1}^{t-1}(\tilde z_{s,t} - \Pi_tz_s)(\tilde z_{s,t} - \Pi_tz_s)' \qquad\text{and}\qquad F_{2t} = \frac{1}{t}\sum_{s=1}^{t-1}(\tilde z_{s,t} - \Pi_tz_s)z_s'\Pi_t'.$$
By the triangle inequality and Theorem 4.1,
$$\sup_t\|F_{1t}\| \le \sup_t\frac{1}{t}\sum_{s=1}^{t-1}\|\tilde z_{s,t} - \Pi_tz_s\|^2 = \sup_t\frac{1}{t}\sum_{s=1}^{t-1}\bigl\|\tilde f_{s,t} - H_tf_s\bigr\|^2 = O_P\Bigl(\frac{1}{\delta_{N,T}^2}\Bigr).$$
For $F_{2t}$, note that
We have that
\[
\sup_t \left\| \hat V(t) \right\| \le \sup_t \left\| \hat V(t) - \Lambda_t V(t) \right\| + \sup_t \|\Lambda_t\| \sup_t \|V(t)\|,
\]
where $\sup_t\|V(t)\| = O_P(1/\sqrt T)$ given Assumption 6.b).

Proof of Lemma A.9. Part a). We can write
\[
\sum_{t=R}^{T-1} u_{t+1}\left(\tilde z_{t,t} - \Lambda_tz_t\right)'\Lambda_t'^{-1}B(t)V(t) = A_1 + A_2,
\]
where
\[
A_1 = \sum_{t=R}^{T-1} u_{t+1}\left(\tilde z_{t,t} - \Lambda_tz_t\right)'\Lambda_{0t}'^{-1}BV(t)
\quad\text{and}\quad
A_2 = \sum_{t=R}^{T-1} u_{t+1}\left(\tilde z_{t,t} - \Lambda_tz_t\right)'\left(\Lambda_t'^{-1}B(t) - \Lambda_{0t}'^{-1}B\right)V(t).
\]
Let $B = \left(E(z_sz_s')\right)^{-1}$ and note that by Assumption 6, $\sup_t\|B(t) - B\| = O_P(1/\sqrt T)$. By the triangle and Cauchy-Schwarz inequalities,
\[
|A_2| \le T\left(\frac{1}{T}\sum_{t=1}^{T}u_{t+1}^2\right)^{1/2}\left(\frac{1}{T}\sum_{t=1}^{T}\left\|\tilde z_{t,t} - \Lambda_tz_t\right\|^2\right)^{1/2}
 \sup_t \left\| \Lambda_t'^{-1}B(t) - \Lambda_{0t}'^{-1}B \right\| \sup_t\|V(t)\|
 = O_P\left(\frac{\sqrt T}{\delta_{N,T}^2}\right) = o_P(1)
\]
if $\sqrt T/N \to 0$, where we have used Theorem 4.1 to bound $\frac{1}{T}\sum_{t=1}^{T}\|\tilde z_{t,t} - \Lambda_tz_t\|^2$, and where (1) $\sup_t\|\Lambda_t'^{-1}B(t) - \Lambda_{0t}'^{-1}B\| = O_P(1/\delta_{N,T})$, (2) $\frac{1}{T}\sum_{t=1}^{T}u_{t+1}^2 = O_P(1)$, and (3) $\sup_t\|V(t)\| = O_P(1/\sqrt T)$ hold under our assumptions. To see (1), note that
\[
\sup_t \left\| \Lambda_t'^{-1}B(t) - \Lambda_{0t}'^{-1}B \right\|
 \le \sup_t \left\| \Lambda_t'^{-1} - \Lambda_{0t}'^{-1} \right\| \sup_t\|B(t)\|
 + \sup_t \left\| \Lambda_{0t}'^{-1} \right\| \sup_t\|B(t) - B\|
 = O_P\left(\frac{1}{\delta_{N,T}}\right) + O_P\left(\frac{1}{\sqrt T}\right) = O_P\left(\frac{1}{\delta_{N,T}}\right),
\]
given Assumption 6 and the fact that $\sup_t\|\Lambda_t'^{-1} - \Lambda_{0t}'^{-1}\|$ is bounded by $\sup_t\|H_t^{-1} - H_{0t}^{-1}\|$, which is $O_P(1/\delta_{N,T})$ by Lemma A.6. (2) and (3) follow easily under Assumption 6.

For $A_1$, note that given $\tilde z_{t,t} - \Lambda_tz_t = \left(0', \left(\tilde f_{t,t} - H_tf_t\right)'\right)'$ and $\Lambda_{0t} = \mathrm{diag}\left(I_{k_1}, H_{0t}\right)$, with $H_{0t} = \mathrm{diag}(\pm1)$, we can write
\[
\left(\tilde z_{t,t} - \Lambda_tz_t\right)'\Lambda_{0t}'^{-1}B = \left(\tilde f_{t,t} - H_tf_t\right)'H_{0t}'C,
\]
where $C$ is an $r \times (r + k_1)$ submatrix of $B$ containing its last $r$ rows. It follows that
\[
A_1 = \sum_{t=R}^{T-1} u_{t+1}\left(\tilde f_{t,t} - H_tf_t\right)'H_{0t}'CV(t)
 = \frac{1}{\sqrt T}\sum_{t=R}^{T-1}\left(\tilde f_{t,t} - H_tf_t\right)'H_{0t}'C\underbrace{\sqrt T\,V(t)\,u_{t+1}}_{U_{t,T}},
\]
where we note that $U_{t,T}$ is a martingale difference sequence array with respect to $\mathcal F_t$ given Assumption 5.
Given the decomposition (9), we can write
\[
\tilde f_{t,t} - H_tf_t = V_0^{-1}\left(A_{1t,t} + A_{2t,t} + A_{3t,t} + A_{4t,t}\right) + \left(\tilde V_t^{-1} - V_0^{-1}\right)\left(A_{1t,t} + A_{2t,t} + A_{3t,t} + A_{4t,t}\right),
\]
which implies that we can decompose $A_1$ into the sum of $A_{1.1}$ and $A_{1.2}$, with
\[
A_{1.1} = \frac{1}{\sqrt T}\sum_{t=R}^{T-1}\left(A_{1t,t} + A_{2t,t} + A_{3t,t} + A_{4t,t}\right)'V_0'^{-1}H_{0t}'CU_{t,T},
\]
\[
A_{1.2} = \frac{1}{\sqrt T}\sum_{t=R}^{T-1}\left(A_{1t,t} + A_{2t,t} + A_{3t,t} + A_{4t,t}\right)'\left(\tilde V_t^{-1} - V_0^{-1}\right)'H_{0t}'CU_{t,T}.
\]
We can prove that $\|A_{1.2}\| = O_P\left(\sqrt T/\delta_{N,T}^2\right) = o_P(1)$ if $\sqrt T/N \to 0$, given that $\sup_t\|\tilde V_t^{-1} - V_0^{-1}\| = O_P(1/\delta_{N,T})$ by Lemma A.6; that $\frac{1}{T}\sum_{t=R}^{T-1}\|A_{jt,t}\|^2 = O_P\left(1/\delta_{N,T}^2\right)$ for all $j = 1,\dots,4$ by Lemma A.4 a); and given that $\frac{1}{T}\sum_{t=R}^{T-1}\|U_{t,T}\|^2 = O_P(1)$ by Lemma A.3 a). So, we concentrate on $A_{1.1}$. We can decompose $A_{1.1} = \sum_{j=1}^{4}R_j$ with obvious definitions, where $\pi_{j,l,t}$, $j = 1,\dots,4$, denotes the weight entering the definition of $A_{jt,t}$. Given the definition of $A_{jt,t}$ for $j = 1,\dots,4$, we can write
\[
A_{jt,t} = \frac{1}{t}\sum_{l=1}^{t}\tilde f_{l,t}\pi_{j,l,t}
 = H_{0t}\frac{1}{t}\sum_{l=1}^{t}f_l\pi_{j,l,t} + \frac{1}{t}\sum_{l=1}^{t}\left(\tilde f_{l,t} - H_{0t}f_l\right)\pi_{j,l,t}, \qquad (12)
\]
which implies that
\[
R_j = \frac{1}{\sqrt T}\sum_{t=R}^{T-1}\left(\frac{1}{t}\sum_{l=1}^{t}f_l\pi_{j,l,t}\right)'V_0^{-1}CU_{t,T}
 + \frac{1}{\sqrt T}\sum_{t=R}^{T-1}\left(\frac{1}{t}\sum_{l=1}^{t}\left(\tilde f_{l,t} - H_{0t}f_l\right)\pi_{j,l,t}\right)'V_0'^{-1}H_{0t}'CU_{t,T}
 \equiv R_{1j} + R_{2j},
\]
where we have used the fact that $V_0$ is diagonal under our assumptions and $H_{0t} = \mathrm{diag}(\pm1)$ to write $H_{0t}'V_0'^{-1}H_{0t} = V_0^{-1}$ when defining $R_{1j}$. By Lemma A.3, we can conclude that $R_{11} = O_P(1/\sqrt T)$, whereas $R_{1j} = O_P(1/\sqrt N)$ for $j = 2, 3, 4$. For $R_{2j}$, note that
\[
\|R_{2j}\| \le \sqrt T\left\|V_0^{-1}\right\|\sup_t\|H_{0t}\|
 \left(\frac{1}{T}\sum_{t=R}^{T-1}\left\|\frac{1}{t}\sum_{l=1}^{t}\left(\tilde f_{l,t} - H_{0t}f_l\right)\pi_{j,l,t}\right\|^2\right)^{1/2}
 \left(\frac{1}{T}\sum_{t=R}^{T-1}\|U_{t,T}\|^2\right)^{1/2},
\]
where, by the Cauchy-Schwarz inequality,
\[
\frac{1}{T}\sum_{t=R}^{T-1}\left\|\frac{1}{t}\sum_{l=1}^{t}\left(\tilde f_{l,t} - H_{0t}f_l\right)\pi_{j,l,t}\right\|^2
 \le \sup_t\frac{1}{t}\sum_{l=1}^{t}\left\|\tilde f_{l,t} - H_{0t}f_l\right\|^2 \cdot \frac{1}{PR}\sum_{t=1}^{T}\sum_{l=1}^{T}\pi_{j,l,t}^2.
\]
Note that for $j = 1$, $\frac{1}{PR}\sum_{t=1}^{T}\sum_{l=1}^{T}\pi_{1,l,t}^2 = O(1/T)$ under Assumption 2 c); for $j = 2$,
\[
\frac{1}{PR}\sum_{t=1}^{T}\sum_{l=1}^{T}\pi_{2,l,t}^2
 = \frac{1}{N}\frac{1}{PR}\sum_{t=1}^{T}\sum_{l=1}^{T}\left(\frac{1}{\sqrt N}\sum_{i=1}^{N}\left(e_{il}e_{it} - E(e_{il}e_{it})\right)\right)^2 = O_P(1/N)
\]
given Assumption 2 d); and for $j = 3$,
\[
\frac{1}{PR}\sum_{t=1}^{T}\sum_{l=1}^{T}\pi_{3,l,t}^2
 = \frac{1}{PR}\sum_{t=1}^{T}\sum_{l=1}^{T}\left(f_l'\frac{1}{N}\sum_{i=1}^{N}\lambda_ie_{it}\right)^2
 \le \frac{1}{N}\frac{1}{PR}\sum_{l=1}^{T}\|f_l\|^2\sum_{t=1}^{T}\left\|\frac{1}{\sqrt N}\sum_{i=1}^{N}\lambda_ie_{it}\right\|^2 = O_P(1/N),
\]
and similarly, for $j = 4$, $\frac{1}{PR}\sum_{t=1}^{T}\sum_{l=1}^{T}\pi_{4,l,t}^2 = O_P(1/N)$, given Assumption 2 e) and the fact that $E\|f_l\|^2 = O(1)$. Moreover,
\[
\sup_t\frac{1}{t}\sum_{l=1}^{t}\left\|\tilde f_{l,t} - H_{0t}f_l\right\|^2
 = \sup_t\frac{1}{t}\sum_{l=1}^{t}\left\|\left(\tilde f_{l,t} - H_tf_l\right) + \left(H_t - H_{0t}\right)f_l\right\|^2
 \le 2\sup_t\frac{1}{t}\sum_{l=1}^{t}\left\|\tilde f_{l,t} - H_tf_l\right\|^2 + 2\sup_t\left\|H_t - H_{0t}\right\|^2\frac{1}{R}\sum_{l=1}^{T}\|f_l\|^2
 = O_P\left(1/\delta_{N,R}^2\right),
\]
by Lemma A.6 and Theorem 4.1. Since $\frac{1}{T}\sum_{t=R}^{T-1}\|U_{t,T}\|^2 = O_P(1)$ by Lemma A.3 a), it follows that when $j = 1$,
\[
\|R_{2j}\| = O_P\left(\frac{1}{\delta_{N,T}}\right),
\]
whereas for $j \ge 2$,
\[
\|R_{2j}\| = O_P\left(\frac{\sqrt T}{\delta_{N,T}\sqrt N}\right),
\]
which is $O_P\left(1/\sqrt N\right)$ if $\delta_{N,T} = \sqrt T$ and $O_P\left(\sqrt T/N\right)$ if $\delta_{N,T} = \sqrt N$. Thus,
\[
A_{1.1} = O_P\left(\frac{1}{\sqrt T}\right) + O_P\left(\frac{1}{\sqrt N}\right) + O_P\left(\frac{1}{\delta_{N,T}}\right) + O_P\left(\frac{\sqrt T}{\delta_{N,T}\sqrt N}\right) = o_P(1)
\]
if $\sqrt T/N \to 0$, which concludes part a).

Part b). Given that
\[
\hat B(t)\hat V(t) - \Lambda_t'^{-1}B(t)V(t)
 = \Lambda_t'^{-1}B(t)\Lambda_t^{-1}\left(\hat V(t) - \Lambda_tV(t)\right) + \left(\hat B(t) - \Lambda_t'^{-1}B(t)\Lambda_t^{-1}\right)\hat V(t),
\]
we can write
\[
\sum_{t=R}^{T-1}u_{t+1}\tilde z_{t,t}'\hat B(t)\hat V(t) - \sum_{t=R}^{T-1}u_{t+1}\tilde z_{t,t}'\Lambda_t'^{-1}B(t)V(t) = B_1 + B_2,
\]
where
\[
B_1 = \sum_{t=R}^{T-1}u_{t+1}\tilde z_{t,t}'\Lambda_t'^{-1}B(t)\Lambda_t^{-1}\left(\hat V(t) - \Lambda_tV(t)\right)
\quad\text{and}\quad
B_2 = \sum_{t=R}^{T-1}u_{t+1}\tilde z_{t,t}'\left(\hat B(t) - \Lambda_t'^{-1}B(t)\Lambda_t^{-1}\right)\hat V(t).
\]
We can easily show that $B_2 = O_P\left(\sqrt T/\delta_{N,T}^2\right) = o_P(1)$ if $\sqrt T/N \to 0$, given in particular Lemma A.8 b) and c). For $B_1$, note that we can show that, under our assumptions,
\[
B_1 = \sum_{t=R}^{T-1}u_{t+1}z_t'B\Lambda_{0t}'\left(\hat V(t) - \Lambda_tV(t)\right) + O_P\left(\frac{\sqrt T}{\delta_{N,T}^2}\right).
\]
This follows by first replacing $\tilde z_{t,t}$ with $\Lambda_tz_t$ and then replacing $\Lambda_t'^{-1}B(t)\Lambda_t^{-1}$ with $B\Lambda_{0t}'$; the extra terms can be shown to be $O_P\left(\sqrt T/\delta_{N,T}^2\right)$ by applying Theorem 4.1 and Lemma A.8. Notice that we can write
\[
\hat V(t) - \Lambda_tV(t) = \frac{1}{t}\sum_{s=1}^{t-1}\left(\tilde z_{s,t} - \Lambda_tz_s\right)u_{s+1}
 = \underbrace{\begin{pmatrix} 0_{k_1\times r} \\ I_r \end{pmatrix}}_{J}\tilde V_t^{-1}\frac{1}{t}\sum_{s=1}^{t-1}\sum_{j=1}^{4}A_{js,t}u_{s+1}.
\]
With this notation,
\[
B_1 = C_1 + C_2 + O_P\left(\frac{\sqrt T}{\delta_{N,T}^2}\right),
\]
where
\[
C_1 = \sum_{t=R}^{T-1}u_{t+1}z_t'B\Lambda_{0t}'JV_0^{-1}\frac{1}{t}\sum_{s=1}^{t-1}\sum_{j=1}^{4}A_{js,t}u_{s+1}
\quad\text{and}\quad
C_2 = \sum_{t=R}^{T-1}u_{t+1}z_t'B\Lambda_{0t}'J\left(\tilde V_t^{-1} - V_0^{-1}\right)\frac{1}{t}\sum_{s=1}^{t-1}\sum_{j=1}^{4}A_{js,t}u_{s+1}.
\]
We can easily show that $C_2 = O_P\left(\sqrt T/\delta_{N,T}^2\right)$ by the usual inequalities, given Lemmas A.6 b) and A.7 a) and b). Consider $C_1$.
We have that $C_1 = \sum_{j=1}^{4}S_j$, where
\[
S_j = \sum_{t=R}^{T-1}u_{t+1}z_t'\phi_{0t}\frac{1}{t}\sum_{s=1}^{t-1}A_{js,t}u_{s+1},
\quad\text{with}\quad
\phi_{0t} \equiv B\Lambda_{0t}'JV_0^{-1}
\]
and $\pi_{j,l,s}$ the weight entering the definition of $A_{js,t}$. Given the definition of $A_{js,t}$, by adding and subtracting appropriately (see (12) with $s = t$), we can write $S_j = S_{1j} + S_{2j}$, where
\[
S_{1j} = \sum_{t=R}^{T-1}u_{t+1}z_t'\phi_{0t}H_{0t}\frac{1}{t}\sum_{s=1}^{t-1}\frac{1}{t}\sum_{l=1}^{t}f_l\pi_{j,l,s}u_{s+1}
\]
and
\[
S_{2j} = \sum_{t=R}^{T-1}u_{t+1}z_t'\phi_{0t}\frac{1}{t}\sum_{s=1}^{t-1}\frac{1}{t}\sum_{l=1}^{t}\left(\tilde f_{l,t} - H_{0t}f_l\right)\pi_{j,l,s}u_{s+1}.
\]
It follows that
\[
|S_{2j}| \le \sup_t\|\phi_{0t}\|\,\frac{1}{R}\sum_{t=R}^{T-1}\|u_{t+1}z_t\|
 \sup_t\left(\frac{1}{t}\sum_{l=1}^{t}\left\|\tilde f_{l,t} - H_{0t}f_l\right\|^2\right)^{1/2}
 \left(\sup_t\frac{1}{t}\sum_{l=1}^{t}\left(\frac{1}{t}\sum_{s=1}^{t-1}\pi_{j,l,s}u_{s+1}\right)^2\right)^{1/2}.
\]
By Lemma A.7, the last factor is $O_P(1)$ when $j = 1$, which implies that $\sup_t|S_{2j}| = O_P(1/\delta_{N,T})$ for this value of $j$. For $j = 2, 3$, and 4, Lemma A.1 implies that the last factor is $O_P\left(\sqrt{T/N}\right)$, which implies that $\sup_t|S_{2j}| = O_P(1/\delta_{N,T})\,O_P\left(\sqrt{T/N}\right) = o_P(1)$ if $\sqrt T/N \to 0$. Hence, we conclude that
\[
C_1 = \sum_{j=1}^{4}S_{1j} + o_P(1),
\quad\text{where}\quad
S_{1j} = \frac{1}{T}\sum_{t=R}^{T-1}s_{j,t+1},
\]
with $s_{j,t+1} \equiv u_{t+1}z_t'Z_{jt}$ and $Z_{jt} \equiv T\phi_{0t}H_{0t}\frac{1}{t}\sum_{s=1}^{t-1}\frac{1}{t}\sum_{l=1}^{t}f_l\pi_{j,l,s}u_{s+1}$. Noting that $Z_{jt}$ is a function of $(z_t, z_{t-1}, \dots; u_t, u_{t-1}, \dots; e_t, e_{t-1}, \dots)$, and that this information set is identical to $\mathcal F_t = \sigma(z_t, z_{t-1}, \dots; y_t, y_{t-1}, \dots; X_t, X_{t-1}, \dots)$ given that $y_{t+1} = z_t'\beta + u_{t+1}$ and that $X_t = \Lambda f_t + e_t$, it follows that $E\left(s_{j,t+1}\,|\,\mathcal F_t\right) = 0$ by Assumption 5. Thus,
\[
\mathrm{Var}(S_{1j}) = \frac{1}{T^2}\sum_{t=R}^{T-1}E\left(s_{j,t+1}^2\right)
 = \frac{1}{T^2}\sum_{t=R}^{T-1}E\left[\left(u_{t+1}z_t'Z_{jt}\right)^2\right]
 \le \frac{1}{T^2}\sum_{t=R}^{T-1}\left(E\left\|u_{t+1}z_t'\right\|^4\right)^{1/2}\left(E\|Z_{jt}\|^4\right)^{1/2}
 \le \frac{1}{T}\sup_t\left(E\|Z_{jt}\|^4\right)^{1/2}\frac{1}{T}\sum_{t=R}^{T-1}\left(E\left\|u_{t+1}z_t'\right\|^4\right)^{1/2}.
\]
We can check that $\sup_tE\|Z_{1t}\|^4 = O_P(1)$ and $\sup_tE\|Z_{jt}\|^4 = O_P\left((T/N)^2\right)$ for $j = 2, 3, 4$ by Lemma A.2. Since $E\|u_{t+1}z_t'\|^4 \le M$ by Assumption 6, it follows that $\mathrm{Var}(S_{1j})$ is equal to $O(1/T)$ when $j = 1$ and $O(1/N)$ when $j = 2, 3$, or 4. This shows that $S_{1j} = o_P(1)$ and concludes the proof of part b).

A.3 Proof of results in Section 4

Proof of Theorem 4.1.
Part a). By (9), it follows that
\[
\frac{1}{P}\sum_{t=R}^{T-1}\left\|\tilde f_{t,t} - H_tf_t\right\|^2
 = \frac{1}{P}\sum_{t=R}^{T-1}\left\|\tilde V_t^{-1}\left(A_{1t,t} + A_{2t,t} + A_{3t,t} + A_{4t,t}\right)\right\|^2
 \le 4\sup_t\left\|\tilde V_t^{-1}\right\|^2\left(\frac{1}{P}\sum_{t=R}^{T-1}\|A_{1t,t}\|^2 + \frac{1}{P}\sum_{t=R}^{T-1}\|A_{2t,t}\|^2 + \frac{1}{P}\sum_{t=R}^{T-1}\|A_{3t,t}\|^2 + \frac{1}{P}\sum_{t=R}^{T-1}\|A_{4t,t}\|^2\right),
\]
where $\sup_t\|\tilde V_t^{-1}\|^2 = O_P(1)$ by Lemma A.5. The result now follows by Lemma A.4.a), which shows that the first term inside the parentheses is $O_P(1/T)$, whereas the remaining terms are $O_P(1/N)$.

Part b). The proof follows exactly the same reasoning as part a), given Lemma A.4.b).

Proof of Lemma 4.1. The proof is immediate given Lemma A.8.

Proof of Lemma 4.2. The proof is immediate given (10) and Lemma A.9.

Proof of Theorem 4.2. The proof follows from Lemma 2.1, given Lemmas 4.1 and 4.2 and the arguments described in the main text.

References

[1] Andreou, Elena, Eric Ghysels, and Andros Kourtellos (2013), “Should Macroeconomic Forecasters Use Daily Financial Data?” Journal of Business and Economic Statistics 31, 1-12.

[2] Bai, Jennie (2010), “Equity Premium Predictions with Adaptive Macro Indices,” manuscript.

[3] Bai, Jushan (2003), “Inferential Theory for Factor Models of Large Dimensions,” Econometrica 71, 135-171.

[4] Bai, Jushan (2009), “Panel Data Models with Interactive Fixed Effects,” Econometrica 77, 1229-1279.

[5] Bai, Jushan, and Kunpeng Li (2012), “Statistical Analysis of Factor Models of High Dimension,” Annals of Statistics 40, 436-465.

[6] Bai, Jushan, and Serena Ng (2006), “Confidence Intervals for Diffusion Index Forecasts and Inference for Factor-Augmented Regressions,” Econometrica 74, 1133-1155.

[7] Bai, Jushan, and Serena Ng (2013), “Principal Components Estimation and Identification of Static Factors,” Journal of Econometrics 176, 178-189.

[8] Batje, Fabian, and Lukas Menkhoff (2012), “Macro Determinants of U.S. Stock Market Risk Premia in Bull and Bear Markets,” manuscript.
[9] Cakmakli, Cem, and Dick van Dijk (2010), “Getting the Most out of Macroeconomic Information for Predicting Stock Returns and Volatility,” manuscript.

[10] Chamberlain, Gary, and Michael Rothschild (1983), “Arbitrage, Factor Structure and Mean-variance Analysis in Large Asset Markets,” Econometrica 51, 1305-1324.

[11] Ciccarelli, Matteo, and Benoit Mojon (2010), “Global Inflation,” Review of Economics and Statistics 92, 524-535.

[12] Clark, Todd E., and Michael W. McCracken (2001), “Tests of Equal Forecast Accuracy and Encompassing for Nested Models,” Journal of Econometrics 105, 85-110.

[13] Clark, Todd E., and Michael W. McCracken (2005), “Evaluating Direct Multistep Forecasts,” Econometric Reviews 24, 369-404.

[14] Clark, Todd E., and Michael W. McCracken (2012), “Reality Checks and Comparisons of Nested Predictive Models,” Journal of Business and Economic Statistics 30, 53-66.

[15] Diebold, Francis X., and Roberto S. Mariano (1995), “Comparing Predictive Accuracy,” Journal of Business and Economic Statistics 13, 253-263.

[16] Fan, Jianqing, Yuan Liao, and Martina Mincheva (2013), “Large Covariance Estimation by Thresholding Principal Orthogonal Complements,” Journal of the Royal Statistical Society: Series B 75, 603-680.

[17] Fosten, Jack (2015), “Forecast Evaluation with Factor-Augmented Models,” University of Surrey, manuscript.

[18] Gonçalves, Sílvia, and Benoit Perron (2014), “Bootstrapping Factor-Augmented Regression Models,” Journal of Econometrics 182, 156-173.

[19] Han, Xu, and Atsushi Inoue (2015), “Tests of Parameter Instability in Dynamic Factor Models,” Econometric Theory, forthcoming.

[20] Hansen, Peter Reinhard, and Allan Timmermann (2015), “Equivalence Between Out-of-Sample Forecast Comparisons and Wald Statistics,” Econometrica, forthcoming.

[21] Hjalmarsson, Erik (2010), “Predicting Global Stock Returns,” Journal of Financial and Quantitative Analysis 45, 49-80.
[22] Hofmann, Boris (2009), “Do Monetary Indicators Lead Euro Area Inflation?” Journal of International Money and Finance 28, 1165-1181.

[23] Horn, Roger A., and Charles R. Johnson (1985), Matrix Analysis, Cambridge University Press, Cambridge, UK.

[24] Kelly, Bryan, and Seth Pruitt (2013), “The Three-Pass Regression Filter: A New Approach to Forecasting With Many Predictors,” Journal of Econometrics 186, 294-316.

[25] Li, Jia, and Andrew J. Patton (2015), “Asymptotic Inference about Predictive Accuracy Using High Frequency Data,” manuscript.

[26] Ludvigson, Sydney, and Serena Ng (2009), “A Factor Analysis of Bond Risk Premia,” Handbook of Empirical Economics and Finance, A. Ullah and D. Giles, Eds., 313-372, Chapman and Hall.

[27] McCracken, Michael W. (2007), “Asymptotics for Out-of-Sample Tests of Granger Causality,” Journal of Econometrics 140, 719-752.

[28] McCracken, Michael W., and Serena Ng (2015), “FRED-MD: A Monthly Database for Macroeconomic Research,” manuscript.

[29] Pagan, Adrian (1984), “Econometric Issues in the Analysis of Regressions with Generated Regressors,” International Economic Review 25, 183-209.

[30] Pagan, Adrian (1986), “Two Stage and Related Estimators and Their Applications,” Review of Economic Studies 53, 517-538.

[31] Randles, Ronald H. (1982), “On the Asymptotic Normality of Statistics with Estimated Parameters,” Annals of Statistics 10, 463-474.

[32] Shintani, Mototsugu (2005), “Nonlinear Forecasting Analysis Using Diffusion Indexes: An Application to Japan,” Journal of Money, Credit and Banking 37, 517-538.

[33] Stock, James H., and Mark W. Watson (1996), “Evidence on Structural Instability in Macroeconomic Time Series Relations,” Journal of Business and Economic Statistics 14, 11-30.

[34] Stock, James H., and Mark W. Watson (2002), “Macroeconomic Forecasting Using Diffusion Indexes,” Journal of Business and Economic Statistics 20, 147-162.

[35] Stock, James H., and Mark W.
Watson (2003), “Forecasting Output and Inflation: The Role of Asset Prices,” Journal of Economic Literature 41, 788-829.

[36] Stock, James H., and Mark W. Watson (2007), “Has U.S. Inflation Become Harder to Forecast?” Journal of Money, Credit, and Banking 39, 3-33.

[37] West, Kenneth D. (1996), “Asymptotic Inference About Predictive Ability,” Econometrica 64, 1067-1084.

Table 1: Size and power for MSE-F and ENC-F (%)

[Columns: ENC-F and MSE-F computed with true factors (f) and with estimated factors (f˜), for π = 0.2, π = 1, and π = 2; rows: N, T ∈ {50, 100, 150, 200}; panels: b = 0 (size) and b = 0.3 (power).]

Table 2: Factor-based forecasting of the equity premium

[Columns: Model, Ratio, MSE-F, and ENC-F, over the evaluation samples 1985:01-2014:12 and 2005:01-2014:12; top panel: factor estimates dated by calendar time; lower panel: factor estimates dated by data-release time.]

Note: The table reports empirical evidence regarding the use of factor-based models for predicting the equity premium at the one-month horizon. Only the top 10 models, by MSE, are reported. The top panel uses factor estimates dated by calendar time while the lower panel uses factor estimates dated by data-release time. Model denotes which factors are included in a given model. Ratio denotes MSE(factor)/MSE(benchmark). An asterisk next to the values of the MSE-F and ENC-F denotes significance at the 5% level.
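The quantities behind Tables 1 and 2 — factors re-estimated by principal components on each recursive window, and the MSE-F and ENC-F statistics of McCracken (2007) and Clark and McCracken (2001) — can be sketched in a few lines. The code below is an illustrative sketch, not the authors' code: the function names, the intercept-only benchmark, and the simulated data in the usage note are assumptions made here. It uses MSE-F = P(MSE₁ − MSE₂)/MSE₂ and ENC-F = P·c̄/MSE₂, with c̄ the average of û₁,ₜ₊₁(û₁,ₜ₊₁ − û₂,ₜ₊₁), û₁ and û₂ the benchmark and factor-model forecast errors, and P the number of out-of-sample forecasts.

```python
import numpy as np

def pca_factors(X, r):
    """Estimate r factors from a T x N panel X by principal components,
    normalized so that F'F/T = I_r (Stock and Watson, 2002, convention)."""
    T = X.shape[0]
    eigval, eigvec = np.linalg.eigh(X @ X.T)     # eigendecomposition of XX'
    idx = np.argsort(eigval)[::-1][:r]           # top-r eigenvectors
    return np.sqrt(T) * eigvec[:, idx]           # T x r factor estimates

def nested_oos_stats(y, X, R, r=1):
    """Recursive out-of-sample comparison of an intercept-only benchmark
    against a factor-augmented model; returns (MSE-F, ENC-F).
    Factors are re-estimated at each t using only data through t."""
    T = len(y) - 1
    e1, e2 = [], []                              # benchmark / factor-model errors
    for t in range(R, T):
        f = pca_factors(X[: t + 1], r)           # recursive-window PCA
        # benchmark forecast: recursive mean of y
        e1.append(y[t + 1] - np.mean(y[1 : t + 1]))
        # factor-augmented forecast: OLS of y_{s+1} on (1, f_s), s < t
        Z = np.column_stack([np.ones(t), f[:t]])
        beta, *_ = np.linalg.lstsq(Z, y[1 : t + 1], rcond=None)
        e2.append(y[t + 1] - np.array([1.0, *f[t]]) @ beta)
    e1, e2 = np.array(e1), np.array(e2)
    P, mse2 = len(e2), np.mean(e2 ** 2)
    mse_f = P * (np.mean(e1 ** 2) - mse2) / mse2           # MSE-F statistic
    enc_f = P * np.mean(e1 * (e1 - e2)) / mse2             # ENC-F statistic
    return mse_f, enc_f
```

The paper's point is that, because factor estimation is asymptotically negligible under its conditions, these statistics can be compared with the existing Clark–McCracken/McCracken critical values even though `pca_factors` is re-run on every window.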