Measuring Uncertainty of a Combined Forecast and a New Test for Forecaster Heterogeneity

Kajal Lahiri, Huaming Peng
Department of Economics, University at Albany, SUNY

Xuguang Sheng
Department of Economics, American University

Abstract

Using a standard factor decomposition of a panel of forecasts, we show that the forecast uncertainty from the standpoint of a policy maker can be expressed as the disagreement among forecasters plus the perceived variability of common aggregate shocks. Thus, the uncertainty of the average forecast is not the variance of the average forecast but rather the average of the variances of the individual forecasts, where the combined forecast is obtained by minimizing the risk averaged over all possible forecasts rather than the risk of the combined forecast. Using a new statistic developed in this paper to test the heterogeneity of idiosyncratic errors under joint limits with both T and N approaching infinity simultaneously, we show that previously used uncertainty measures significantly underestimate the correct benchmark forecast uncertainty.

JEL classification: C12; C33; E37
Keywords: Forecast Uncertainty, Forecast Disagreement, Forecast Combination, Panel Data, Survey of Professional Forecasters

1 Introduction

Consider the problem of a macro policy maker who often has to aggregate a number of competing forecasts from experts for the purpose of uniform policymaking. A general solution was provided by Bates and Granger (1969), whose work has inspired extensive research on forecast combination, as evidenced by the surveys of Clemen (1989) and Timmermann (2006), and many additional papers since 2006.¹ The solution, based on minimizing the mean square error of the combined forecast, calls for a performance-based weighted average of individual forecasts, and the precision of the combined forecast is readily shown to be better than that of any of the constituent elements under reasonable conditions.
Thus, Wei and Yang (2011) characterize this approach as "combination for improvement". However, a surprising result found by many is that a simple average is often as good as the Bates-Granger estimator; cf. Genre et al. (2013) for a recent case study. Under the assumption that all the cross correlations of forecast errors are due to a common aggregate shock, the precision of this equally-weighted average is simply a function of the variance of this common shock, netting out all idiosyncratic errors. Palm and Zellner (1992) is an authoritative summary of the developments in this area using alternative approaches until the early nineties. One characteristic of this precision formula for the average (or what is often called consensus) forecast is that it does not reflect disagreement among experts as part of forecast uncertainty, which may be undesirable in many situations. As Winkler (1989, p. 607) and Timmermann (2006, p. 141) have noted, heightened discord among forecasters, ceteris paribus, may be indicative of higher uncertainty in the combined forecast from the standpoint of a policy maker. In many contexts, disagreement has been used as a sole proxy for forecast uncertainty. It is important to note that the performance of the consensus forecast relative to individual forecasts follows trivially from Jensen's inequality, which states that with convex loss functions the loss associated with the mean forecast is generally less than the mean loss of individual forecasts; see McNees (1992) and Manski (2011). Thus, the consensus forecast is not more accurate than every individual forecast in the pool; rather it outperforms an individual forecast drawn randomly from the available forecasts.² Moreover,

¹ Granger and Jeon (2004) call this approach of making inference based on combined outputs from alternative models "thick modeling".
² Roughly one-third of individual forecasts have been found to be more accurate than the consensus, but the mix of superior forecasts changes depending on the occasion; see Zarnowitz (1984), McNees (1992) and Hibon and Evgeniou (2005).

as we elaborate later, the wedge between the two is determined precisely by the variance of the cross-sectional distribution of forecasts. Thus, should disagreement be a part of forecast uncertainty, the presumed superiority of the consensus forecast over the typical forecaster will be lost, and the primary advantage of averaging will be an insurance value against idiosyncratic errors that may be generated by heterogeneous learning in periods of structural instability, breaks and bad judgments, cf. Hendry and Clements (2002). The procedure will essentially rule out forecasts that are inferior in a particular period to that of the typical forecaster on average. The objective of the combination approach will then be less ambitious, and will be similar to that in the machine learning literature where forecasts are combined online to adapt to a few superior ones; see Wang (1994) and Sancetta (2012). We show that a slight reformulation of the Bates-Granger loss function justifies taking the average of the individual forecast variances, rather than the variance of the average, as the correct measure of aggregate uncertainty. A more fundamental justification for incorporating disagreement as part of aggregate uncertainty comes from the rich literature on model averaging using both Bayesian and Frequentist approaches; see Draper (1995) and Buckland et al. (1995) for an early explication of the result.

One motivation to explore the importance of using a theoretically sound uncertainty measure of the consensus forecast comes from the recent advances in the presentation and communication strategies of a number of central banks, pioneered by the Bank of England's fan charts to report forecast uncertainty. For the credibility of forecasts in the long run, it is essential that the reported confidence bands for forecasts be properly calibrated. In the U.S., since November 2007, all FOMC members are required to provide their judgments as to whether the uncertainty attached to their projections is greater than, smaller than, or broadly similar to typical levels of forecast uncertainty in the past. In order to help each FOMC member come up with a personal uncertainty estimate, Reifschneider and Tulip (2007) proposed a measure for gauging the average magnitude of historical uncertainty using information on past forecast errors from a number of private and government forecasters. These benchmark estimates for a number of target variables are reported in the minutes of each FOMC meeting and are used by the public to interpret the responses of the FOMC participants.

Even though this measure incorporates the disagreement amongst forecasters as a component of forecast uncertainty, we show that it underestimates the true historical uncertainty if the individual forecast errors are heterogeneous. Given that these historical benchmark numbers are fed into the highest level of national decision making, the importance of a careful examination of the Reifschneider and Tulip (RT) formula cannot be overemphasized. In this paper we establish the asymptotic relationships between these alternative measures of uncertainty under joint limits with both T and N approaching infinity simultaneously, and develop a test to check whether the uncertainty measures are statistically different and the forecasters are exchangeable. Using panel-data sequential asymptotics, Issler and Lima (2009) have shown the optimality of the (bias-corrected) simple average forecast. Our test for forecaster homogeneity differs from the Lagrange multiplier test for heteroskedasticity developed by Baltagi et al.
(2006) in that we do not assume that the idiosyncratic variance is a function of some known covariates. A Monte Carlo study confirms that the test performs well in our context. We also use individual inflation and output growth forecasts from the Survey of Professional Forecasters (SPF) over 1982-2011 to show that the uncertainty measure conventionally attached to a consensus forecast and the RT benchmark measure underestimate the true uncertainty, often substantially, with the underestimation increasing steadily as the horizon lengthens from one to eight quarters. Our test confirms these results at the 1% level for all eight forecast horizons.

The plan of the paper is as follows. Section 2 derives the relationship between disagreement and overall forecast uncertainty. Section 3 compares different measures of average historical uncertainty and develops a new test for forecaster homogeneity. In Section 4 we reexamine the confidence bands reported by Granger and Jeon (2004) for the Taylor rule parameters based on the average of 24 alternative models. In Section 5 we use SPF data on real GDP and inflation to highlight the differences in the alternative uncertainty measures, and implement our test for forecaster homogeneity. Finally, Section 6 summarizes the results and presents some concluding remarks. Technical proofs of the Proposition and the Theorem in Section 3 are presented in the Appendix.

2 Uncertainty and Disagreement

Let Y_t be the random variable of interest and F_{ith} the forecast of Y_t made by individual i at time t − h. Then individual i's forecast error, e_{ith}, can be defined as

e_{ith} = A_t − F_{ith},    (1)

where A_t is the actual realization of Y_t. Following a long tradition, e.g., Palm and Zellner (1992), Davies and Lahiri (1995) and Issler and Lima (2009), we write e_{ith} as the sum of an individual time-invariant bias, α_{ih}, a common component, λ_{th}, and an idiosyncratic error, ε_{ith}:

e_{ith} = α_{ih} + λ_{th} + ε_{ith},    (2)

λ_{th} = Σ_{j=1}^{h} u_{tj}.    (3)

Without loss of generality, the deterministic individual bias α_{ih} is assumed to satisfy ᾱ_{·h} = (1/N) Σ_{j=1}^{N} α_{jh} = 0 for convenience of exposition. In addition, we assume sup_{i,h} |α_{ih}| < ∞ so that lim_{N→∞} (1/N) Σ_{i=1}^{N} α_{ih}² is well defined. The common component, λ_{th}, represents the cumulative effect of all shocks that occurred from h quarters ahead to the end of target year t. Equation (3) shows that this accumulation of shocks is the sum of the shocks (u_{tj}) that occurred over the forecast span, one for each quarter. Even if forecasters make "perfect" forecasts, the forecast error may still be nonzero due to shocks which are, by nature, unpredictable. Forecasters, however, do not make "perfect" forecasts even in the absence of unanticipated shocks. This "lack of perfection" is due to other factors (e.g., differences in information acquisition and processing, interpretation, judgment, and forecasting models) specific to a given individual at a given point in time, and is represented by the idiosyncratic error, ε_{ith}.

In order to establish the relationship between different measures of uncertainty and derive their limiting distributions, we make the following simplifying assumptions:

Assumption 1 (Common Components)
(i) u_{tj} is a strictly stationary and ergodic stochastic process over t for all j, with E(u_{tj}) = 0, E(u_{tj}²) = σ_{uj}², and E|u_{tj}|^{4+δ} < ∞ for some δ > 0. In addition, u_{tj} is independent of u_{ts} for any j ≠ s.
(ii) E(u_{tj}² u_{t−k,j}²) − σ_{uj}⁴ → 0 as k → ∞ for all t and j, and

lim_{T→∞} { var[(Σ_{j=1}^{h} u_{tj})²] + 2 Σ_{k=1}^{T−1} (1 − k/T) Σ_{j=1}^{h} [E(u_{tj}² u_{t−k,j}²) − σ_{uj}⁴] } = c > 0.

Assumption 2 (Idiosyncratic Components)
ε_{ith} = σ_{εih} υ_{ith}, where υ_{ith} is i.i.d. across i and t, with E(υ_{ith}) = 0, E(υ_{ith}²) = 1, and E(υ_{ith}⁸) < ∞; 0 < inf_i σ_{εih}² ≤ sup_i σ_{εih}² < ∞ for all i.

Assumption 3 (Relations)
u_{tj} is independent of υ_{ish} for all i, j, s, t and h.

Remark 1.
Assumption 1 implies that the aggregate shock λ_{th} (= Σ_{j=1}^{h} u_{tj}) is weakly dependent over time, and that λ_{th}² is completely regular, that is, the correlation between λ_{th}² and λ_{t−k,h}² eventually dies out as the time lag k goes to infinity. The temporal dependence condition is thus intermediate between uniform mixing and strong mixing (see, for example, Hall and Heyde (1980) for a detailed explanation). The positiveness condition in (ii) controls the speed at which the correlation diminishes, and is important for obtaining a central limit theorem for the dependent sequence λ_{th}².

Remark 2. In Assumption 2, cross-agent heterogeneity is allowed but correlation in the time dimension is ruled out for analytical simplicity.

Remark 3. Assumption 3 is standard in the literature. It helps the identification of the variance from aggregate shocks separately from that from idiosyncratic shocks.

Taken together, Assumptions 1-3 imply that the individual forecast error is a strictly stationary and ergodic process for any h and has a factor model interpretation. Given a panel of forecasts, we can decompose the average squared individual forecast errors as

(1/N) Σ_{i=1}^{N} e_{ith}² = (A_t − F̄_{·th})² + (1/N) Σ_{i=1}^{N} (F_{ith} − F̄_{·th})²,    (4)

where F̄_{·th} = (1/N) Σ_{j=1}^{N} F_{jth}. Now taking the time average on both sides of (4), we get a measure of historical forecast uncertainty based on past errors:

(1/(TN)) Σ_{t=1}^{T} Σ_{i=1}^{N} e_{ith}² = (1/T) Σ_{t=1}^{T} (A_t − F̄_{·th})² + (1/(TN)) Σ_{t=1}^{T} Σ_{i=1}^{N} (F_{ith} − F̄_{·th})².    (5)

This is our suggested measure of the historical uncertainty of the average forecast, and is readily seen to be the average of the individual variances observed over the sample period. This is also the measure suggested by Reifschneider and Tulip (2007). Equation (5) states that the measure can be decomposed into two components: uncertainty that is common to all forecasters and uncertainty that arises from the heterogeneity of individual forecasters.
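As a concrete illustration (our own sketch, not part of the paper; all parameter values are invented for the example), one can simulate a panel of forecast errors with the factor structure of equations (2)-(3) and confirm that the decomposition in (5) holds as an exact algebraic identity:

```python
import numpy as np

# Simulate a T x N panel of forecast errors e_ith = alpha_ih + lambda_th + eps_ith
# per eqs. (2)-(3); parameter values are illustrative assumptions.
rng = np.random.default_rng(0)
T, N, h = 200, 30, 4
alpha = rng.normal(0, 0.1, size=N)
alpha -= alpha.mean()                      # impose zero average bias, as assumed
u = rng.normal(0, 0.3, size=(T, h))        # quarterly shocks u_tj (Assumption 1)
lam = u.sum(axis=1)                        # lambda_th = sum_{j=1}^h u_tj, eq. (3)
sigma_eps = rng.uniform(0.2, 1.0, size=N)  # heterogeneous idiosyncratic sds (Assumption 2)
e = alpha + lam[:, None] + sigma_eps * rng.normal(size=(T, N))

# Check the decomposition (5): average squared error = common part + disagreement.
avg_sq_error = (e ** 2).mean()                 # (1/TN) sum_t sum_i e_ith^2
e_bar = e.mean(axis=1, keepdims=True)          # consensus error A_t - Fbar_th
common = (e_bar ** 2).mean()                   # (1/T) sum_t (A_t - Fbar_th)^2
disagreement = ((e - e_bar) ** 2).mean()       # (1/TN) sum_t sum_i (F_ith - Fbar_th)^2
assert np.isclose(avg_sq_error, common + disagreement)   # exact identity
```

The identity holds for any panel, not just this simulated one, because e_{ith} − ē_{·th} = −(F_{ith} − F̄_{·th}) term by term.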
The first component is the variance of the average forecast, which is conventionally taken as the uncertainty of the consensus forecast. The second component is the disagreement among the forecasters. While analyzing individual density forecasts reported in the SPF, Zarnowitz and Lambros (1987) used the average of the individual variances as the uncertainty of the average forecast. Giordani and Söderlind (2003) and Lahiri and Sheng (2010) provided theoretical justifications for its use.

The above empirical measure of forecast uncertainty, which carries the additional disagreement term, can easily be justified by reformulating the objective function of the policy maker. Since each individual forecast comes with its own uncertainty, we argue that the policy maker, while using the combined forecast, should minimize the risk averaged over all possible forecasts. This is particularly the case where the forecasters are all experts who do not collude or have any systematic predictive superiority over others in all periods. The problem is then formulated as

min_{ω_i} Σ_{i=1}^{N} ω_i E(e_{ith}²)    (6)

subject to the constraint Σ_{i=1}^{N} ω_i = 1. Under (2) and Assumptions 1-3, the overall risk function becomes

Σ_{i=1}^{N} ω_i E(e_{ith}²) = E(λ_{th}²) + Σ_{i=1}^{N} ω_i E(ε_{ith}²) + Σ_{i=1}^{N} ω_i α_{ih}².    (7)

Now the overall risk is composed of the risk associated with the common shock, the weighted average of idiosyncratic risks, and the weighted average of squared biases. But, as is now well recognized, when N is large the estimation of the ω_i based on variances and covariances is problematic due to the curse of dimensionality. As shown by Issler and Lima (2009), an optimal approach is to set ω_i = 1/N empirically for large N. Then the overall risk becomes

(1/N) Σ_{i=1}^{N} E(e_{ith}²) = E(λ_{th}²) + (1/N) Σ_{i=1}^{N} E(ε_{ith}²) + (1/N) Σ_{i=1}^{N} α_{ih}².    (8)

Interestingly, the use of the population average of the individual loss functions as the social loss function for a policy maker has also been proposed and justified, among others, by Woodford (2005), on the ground that it is consistent with both the rational and herding behavior of market participants. In practice, E(e_{ith}²) is estimated using observed forecast errors. Thus, the policy maker, having an ensemble of forecasts, should expect to face an overall risk given by (1/(TN)) Σ_{i=1}^{N} Σ_{t=1}^{T} e_{ith}² on the average.

Note that this loss function is different from the loss function implicit in the vast forecast combination literature following the seminal work of Bates and Granger (1969). Specifically, with a quadratic loss function, Bates and Granger posited the policy maker's problem as

min_{ω_i} E[(Σ_{i=1}^{N} ω_i e_{ith})²]    (9)

subject to the constraint³ Σ_{i=1}^{N} ω_i = 1. Under the popular equal weighting scheme, we have

E[((1/N) Σ_{i=1}^{N} e_{ith})²] = E(λ_{th}²) + (1/N²) Σ_{i=1}^{N} E(ε_{ith}²)    (10)

because of the restriction ᾱ_{·h} = (1/N) Σ_{j=1}^{N} α_{jh} = 0. Note that in equation (10) the idiosyncratic risk is of magnitude order O(N⁻¹), and will be negligible for large N, making the overall risk E[((1/N) Σ_{i=1}^{N} e_{ith})²] completely dominated by the common risk E(λ_{th}²). By contrast, in equation (8), the idiosyncratic and aggregate risks are of the same order of magnitude O(1), and thus the idiosyncratic term will not vanish as N goes to infinity.

³ Elliott and Timmermann (2004) have shown that the optimal combination weights remain the same for a much wider variety of loss functions, including the popular MSE loss, provided the aggregate forecast errors are elliptically symmetric. They, however, did not consider the cross-sectional distribution of the individual forecasts.
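The different orders of magnitude of the idiosyncratic terms in (8) and (10) are easy to see numerically. The following sketch (ours; the variance values are illustrative assumptions) computes both population risks as N grows:

```python
import numpy as np

sigma_lambda2 = 0.5                 # variance of the common shock (illustrative)
rng = np.random.default_rng(2)

for N in (5, 50, 500):
    # heterogeneous idiosyncratic variances, drawn for illustration
    sigma_eps2 = rng.uniform(0.2, 1.0, size=N)
    risk_lps = sigma_lambda2 + sigma_eps2.mean()        # eq. (8): idiosyncratic term is O(1)
    risk_bg = sigma_lambda2 + sigma_eps2.sum() / N**2   # eq. (10): idiosyncratic term is O(1/N)
    print(N, round(risk_lps, 3), round(risk_bg, 3))
```

As N grows, the Bates-Granger risk in (10) collapses to the common-shock variance alone, while the averaged-risk measure in (8) retains the full average idiosyncratic variance.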
Even in cases where N is small, the term (1/N²) Σ_{i=1}^{N} E(ε_{ith}²) is always less than (1/N) Σ_{i=1}^{N} E(ε_{ith}²), and hence the measure of uncertainty embedded in the conventional forecast combination literature, as in (10), will be less than that associated with our measure, as in (8), regardless of the values of the individual biases. This is because, while using the combined forecast, equation (10) fails to propagate the uncertainties associated with individual forecasts as a component of risk to the policy maker, "sweeping disagreement under the rug," as Giordani and Söderlind (2003) put it. Observe that (8) can be written as

(1/N) Σ_{i=1}^{N} E(e_{ith}²) = (1/N) Σ_{i=1}^{N} E[F_{ith} − (A_t − α_{ih})]² + (1/N) Σ_{i=1}^{N} α_{ih}²,    (11)

where the first component on the right-hand side is identified as the average uncertainty measure within forecasters, while the second term is the disagreement measure due to systematic biases in individual forecasts. A similar decomposition of forecast uncertainty has been obtained by Lahiri et al. (1988), Draper (1995) and Wallis (2005) in cases where probability distributions of forecasts are available.⁴ Note that (11) can be further simplified, by virtue of Assumptions 1-3, to

(1/N) Σ_{i=1}^{N} E(e_{ith}²) = σ_λ² + (1/N) Σ_{i=1}^{N} σ_{εi}² + (1/N) Σ_{i=1}^{N} α_{ih}²,    (12)

which is the population analog of (5). It is now obvious from (12) that the uncertainty of the average arises from the variance of the aggregate shock common to all forecasters and from the heterogeneity of individual forecasters, which contains both the average idiosyncratic variance and the average squared forecast bias.

⁴ Geweke and Amisano (2013) have presented a parallel decomposition of predictive variance from Bayesian model averaging in terms of i) intrinsic, ii) within-model extrinsic, and iii) between-model extrinsic variances.
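The mixture-of-densities view of the combined forecast can be made concrete with the law of total variance: the variance of an equally weighted finite mixture is the average of the component variances plus the dispersion of the component means. In this sketch (ours; the point forecasts and variances are invented for illustration):

```python
import numpy as np

# Component forecast densities: means mu_i (point forecasts) and variances v_i.
mu = np.array([1.8, 2.0, 2.3, 2.6])     # illustrative point forecasts
v = np.array([0.30, 0.25, 0.40, 0.35])  # illustrative individual forecast variances

# Variance of the equally weighted finite mixture (law of total variance):
# average of the individual variances plus a disagreement term.
mixture_var = v.mean() + ((mu - mu.mean()) ** 2).mean()

assert mixture_var > v.mean()   # disagreement strictly adds to uncertainty
print(mixture_var)
```

Dropping the second term, as the conventional consensus-variance measure implicitly does, discards exactly the disagreement component.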
In order to appreciate the second term, one has to realize that the forecasts generated by any particular model or forecaster will have their own undeniable uncertainty, as Tetlow (2009) has demonstrated for the FRB/US model of the Federal Reserve. When individual experts provide forecasts partly based on judgment, as in the SPF and Blue Chip surveys, additional psychological variability will be involved too. Recognizing that each point forecast is invariably an outcome from an underlying forecast distribution, the combined forecast should be treated conceptually as the mean of a finite mixture distribution of unobserved densities, which suggests that the variance of the combined forecast will be the average of the individual variances plus a disagreement term due to systematic biases in the individual forecasts, cf. Hoeting et al. (1999) and Wallis (2005). What is not readily recognized in the literature is that, apart from the disagreement coming from time-invariant systematic biases (i.e., (1/N) Σ_{i=1}^{N} α_{ih}²), the average of the individual variances also contains a disagreement component coming from (1/N) Σ_{i=1}^{N} σ_{εi}². In the context of the empirical examples on real GDP and inflation forecasts that we report in Section 5, an "uncertainty audit" reveals that the variance explained by the systematic bias component in (12) is small compared to the other two components (see Tables 7 and 8). A similar result on the insignificance of the bias term is also reported by Reifschneider and Tulip (2007). That is why the assumption α_{ih} = 0 (for all i and h) is inconsequential in our context.

Equation (5) helps to reconcile the divergent findings in previous empirical studies examining the appropriateness of disagreement as a proxy for uncertainty.
As Zarnowitz and Lambros (1987) have pointed out, there may be periods when all forecasters agree on relatively high macroeconomic uncertainty in the immediate future, and hence disagreement between forecasters will be low even though uncertainty is high. The opposite is also possible: forecasters may disagree a lot about their point forecasts, yet be confident about their individual predictions. This situation will arise when forecasters disagree on otherwise precise models and scenarios that should be used to depict the movement of the economy over the forecasting horizon. Thus, lacking any theoretical basis, the strength and the stability of the relationship between disagreement and uncertainty become an empirical issue. But our result clearly suggests that, due to the presence of ē_{·th}² (i.e., the volatility of common shocks) in equation (5), the strength of the relation between uncertainty and disagreement will depend crucially on the sample period, the target variable, and the length of the forecast horizon; see Lahiri and Sheng (2010).

3 Measures of Historical Uncertainty

Based on the argument in Section 2, the historical risk faced by a policy maker while using the average forecasts from N experts is simply

RMSE_h^{LPS} = sqrt[ (1/(TN)) Σ_{t=1}^{T} Σ_{i=1}^{N} e_{ith}² ].    (13)

On the other hand, the conventional choice, as suggested by Bates and Granger (1969), is the variance of the mean forecast as measured by the root mean squared error (RMSE) of the average forecast,

RMSE_h^{AF} = sqrt[ (1/T) Σ_{t=1}^{T} ( (1/N) Σ_{i=1}^{N} e_{ith} )² ].    (14)

With the stated objective of using the root mean squared prediction errors made by a panel of forecasts to generate a benchmark estimate of historical forecast uncertainty, Reifschneider and Tulip (2007, p. 14), however, average the individual RMSEs as in the following formula:

RMSE_h^{RT} = (1/N) Σ_{i=1}^{N} sqrt[ (1/T) Σ_{t=1}^{T} e_{ith}² ].    (15)

They explicitly recognized that their average RMSE is not the conventional RMSE_h^{AF} associated with the average forecast as in (14), and noted: "rather we average the individual RMSEs of our forecasts in order to generate a benchmark for the typical amount of uncertainty we might expect to be associated with the separate forecasts of the different members of our sample, including the FOMC". Note that RMSE_h^{RT} is not exactly the same as RMSE_h^{LPS}; nevertheless, we will show that it incorporates at least partially the disagreement as a component of uncertainty. Below we present the asymptotic relationships between these measures and establish the limit distribution of a statistic for testing the null hypothesis that σ_{εih}² = σ_{εh}². As will be clear, testing for forecaster homogeneity has a very special significance in the present context.

To see the differences in these uncertainty measures, we derive the following limit theorem:

Proposition 1. Suppose Assumptions 1-3 hold. Then, as (T, N → ∞),

(1) (RMSE_h^{AF})² →_p σ_{λh}², where σ_{λh}² = Σ_{j=1}^{h} σ_{uj}².

(2) T^{1/2} [ (RMSE_h^{AF})² − σ_{λh}² − (1/N²) Σ_{i=1}^{N} σ_{εih}² ] ⇒ N(0, Σ_λ), where

Σ_λ = E(λ_{th}² − σ_{λh}²)² [ 1 + 2 lim_{T→∞} Σ_{k=1}^{T−1} (1 − k/T) ρ_λ(k) ]

and ρ_λ(k) = E[(λ_{th}² − σ_{λh}²)(λ_{(t−k)h}² − σ_{λh}²)] / E(λ_{th}² − σ_{λh}²)².

(3) (RMSE_h^{RT})² →_p [ lim_{N→∞} (1/N) Σ_{i=1}^{N} sqrt(σ_{λh}² + σ_{εih}²) ]².

(4) T^{1/2} { (RMSE_h^{RT})² − [ lim_{N→∞} (1/N) Σ_{i=1}^{N} sqrt(σ_{λh}² + σ_{εih}²) ]² } ⇒ N( 0, [ lim_{N→∞} (1/N) Σ_{i=1}^{N} sqrt(σ_{λh}² + σ_{εih}²) ]² [ lim_{N→∞} (1/N) Σ_{i=1}^{N} 1/sqrt(σ_{λh}² + σ_{εih}²) ]² Σ_λ ).

(5) (RMSE_h^{LPS})² →_p σ_{λh}² + σ_{εh}², where σ_{εh}² = lim_{N→∞} (1/N) Σ_{i=1}^{N} σ_{εih}².

(6) T^{1/2} [ (RMSE_h^{LPS})² − σ_{λh}² − (1/N) Σ_{i=1}^{N} σ_{εih}² ] ⇒ N(0, Σ_λ).

Remark 4. Although (RMSE_h^{AF})² has a bias term of order O(1/N), originating from averaging the idiosyncratic shocks in finite samples, it is a consistent estimator of the uncertainty associated with the forthcoming common shock, which is allowed to be more general than a strictly moving average process.
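The three measures in (13)-(15) are straightforward to compute from a T × N panel of forecast errors. The following sketch (ours, on simulated data with illustrative parameters) also checks the ordering RMSE_h^{AF} ≤ RMSE_h^{RT} ≤ RMSE_h^{LPS}, which in fact holds in any finite sample: the first inequality follows from the triangle inequality (the norm of an average is at most the average of the norms) and the second from Jensen's inequality.

```python
import numpy as np

def rmse_measures(e):
    """Compute the three historical uncertainty measures from a T x N error panel."""
    e_bar = e.mean(axis=1)                            # consensus error per period
    rmse_af = np.sqrt((e_bar ** 2).mean())            # eq. (14): RMSE of the average forecast
    rmse_rt = np.sqrt((e ** 2).mean(axis=0)).mean()   # eq. (15): average of individual RMSEs
    rmse_lps = np.sqrt((e ** 2).mean())               # eq. (13): root of the pooled mean square
    return rmse_af, rmse_rt, rmse_lps

rng = np.random.default_rng(3)
T, N = 120, 30
common = rng.normal(0, 0.7, size=(T, 1))                          # common shocks
idio = rng.uniform(0.2, 1.2, size=N) * rng.normal(size=(T, N))    # heterogeneous idiosyncratic
af, rt, lps = rmse_measures(common + idio)
assert af <= rt <= lps   # the ordering of Remark 5, here in-sample
```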
In establishing the joint limit distribution, we need the preliminary result that (1/N) Σ_{i=1}^{N} ε_{ith} →_p 0, or equivalently (1/N) Σ_{i=1}^{N} F_{ith} →_p A_t − λ_{th}, as N → ∞. Similar results are also obtained by Issler and Lima (2009) in their Proposition 3 for their bias-corrected average forecast, under the assumption that the common shock is a moving average process of order at most (h − 1). While their paper is aimed at obtaining the bias-corrected average forecast under the sequential limit where T → ∞ is followed by N → ∞, our focus is to find an appropriate benchmark measure of overall forecast uncertainty. In this regard, RMSE_h^{AF} completely ignores the uncertainty associated with the idiosyncratic shocks, especially when N is large.

Remark 5. Proposition 1 shows that RMSE_h^{AF} ≤ RMSE_h^{RT} ≤ RMSE_h^{LPS} in the limit, because lim_{N→∞} (1/N) Σ_{i=1}^{N} sqrt(σ_{λh}² + σ_{εih}²) is less than (σ_{λh}² + σ_{εh}²)^{1/2} by virtue of Jensen's inequality in the presence of unequal idiosyncratic error variances, with the latter given by [σ_{λh}² + lim_{N→∞} (1/N) Σ_{i=1}^{N} σ_{εih}²]^{1/2}.

Remark 6. Although both RMSE_h^{RT} and RMSE_h^{LPS} have the same √T rate of convergence, (RMSE_h^{RT})² is more dispersed than (RMSE_h^{LPS})² since

[ lim_{N→∞} (1/N) Σ_{i=1}^{N} sqrt(σ_{λh}² + σ_{εih}²) ]² [ lim_{N→∞} (1/N) Σ_{i=1}^{N} 1/sqrt(σ_{λh}² + σ_{εih}²) ]² ≥ 1

by the Cauchy-Schwarz inequality. In the special case where there is no individual heterogeneity with respect to the idiosyncratic error variance, that is, σ_{εih}² = σ_{εh}² a.s. for all i and h, RT and LPS will have the same limit distribution.

To check whether RMSE_h^{RT} and RMSE_h^{LPS} are statistically different in the context of a particular data set, we propose a simple test statistic under the null hypothesis that σ_{εih}² = σ_{εh}² for almost all i.⁵ The following limit theorem provides a theoretical basis to conduct statistical inference with both large T and large N.

⁵ The alternative hypothesis is that σ_{εih}² ≠ σ_{εh}² for a fixed nonzero fraction of the σ_{εih}².

Theorem 1. Suppose Assumptions 1-3 hold. Then, if σ_{εih}² = σ_{εh}² for almost all i,

2^{−1/2} N^{−1/2} Σ_{i=1}^{N} { [4 σ̂_{λh}² σ̂_{εih}² + V̂_{ih}]^{−1} T [ (1/T) Σ_{t=1}^{T} e_{ith}² − (1/(NT)) Σ_{i=1}^{N} Σ_{t=1}^{T} e_{ith}² ]² − 1 } ⇒ N(0, 1)

as (T, N → ∞) and N/T → 0, where

σ̂_{λh}² = (RMSE_h^{AF})² − (1/N) [ (RMSE_h^{LPS})² − (RMSE_h^{AF})² ],

σ̂_{εih}² = (1/T) Σ_{t=1}^{T} ε̂_{ith}², with ε̂_{ith} = e_{ith} − (1/N) Σ_{i=1}^{N} e_{ith}, and

V̂_{ih} = (1/T) Σ_{t=1}^{T} (ε̂_{ith}² − σ̂_{εih}²)².

Remark 7. Theorem 1 establishes the joint limit distribution of the statistic for testing the null hypothesis that σ_{εih}² = σ_{εh}² for almost all i and each h. The restriction N/T → 0 controls for the effect of errors in approximating the variance of (1/√T) Σ_{t=1}^{T} e_{ith}², the special bias in panel estimation, and prevents the approximation errors from having a dominant asymptotic impact on the test statistic.

Remark 8. Our decomposition of forecast errors in equation (2) can be expressed as a two-way error component model if we write ε_{ith} = μ_{ih} + ξ_{ith}, assuming μ_{ih} ∼ N(0, σ_{μih}²) and ξ_{ith} ∼ N(0, σ_{ξh}²).⁶ In this framework, our test of the null hypothesis that σ_{εih}² = σ_{εh}² for almost all i and each h is essentially equivalent to testing that σ_{μih}² = 0 for almost all i. Several tests for homoscedasticity in error component models have been developed in the literature. For example, Verbon (1980) derived a Lagrange multiplier test for the null hypothesis of homoscedasticity against the alternative of heteroscedasticity by assuming that σ_{μih}² is a function of some exogenous variables.⁷ In a similar spirit, Baltagi et al. (2006) developed joint LM tests for homoscedasticity, and Holly and Gardiol (2000) derived a score test for homoscedasticity.

⁶ Mazodier and Trognon (1978) studied models allowing for heteroscedasticity in the individual-specific error μ_{ih}, while Rao et al. (1981), Magnus (1982), Baltagi (1988) and Wansbeek (1989) considered models allowing for heteroscedasticity in the error term ξ_{ith}.
⁷ Verbon also assumed ξ_{ith} ∼ N(0, σ_{ξih}²), with σ_{ξih}² a function of the set of exogenous variables defining σ_{μih}².
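To fix ideas, the statistic in Theorem 1 can be sketched in code. This is our reading of the theorem as reconstructed above, not the authors' implementation; the estimator formulas follow the definitions of σ̂_{λh}², σ̂_{εih}² and V̂_{ih}, and the simulated panels use arbitrary illustrative parameters.

```python
import numpy as np

def heterogeneity_z(e):
    """Z statistic for H0: equal idiosyncratic variances (sketch of Theorem 1).

    e is a T x N panel of forecast errors for a given horizon h.
    """
    T, N = e.shape
    rmse_af2 = (e.mean(axis=1) ** 2).mean()             # (RMSE_AF)^2, eq. (14) squared
    rmse_lps2 = (e ** 2).mean()                         # (RMSE_LPS)^2, eq. (13) squared
    sigma_lam2 = rmse_af2 - (rmse_lps2 - rmse_af2) / N  # sigma_hat^2_lambda,h
    eps_hat = e - e.mean(axis=1, keepdims=True)         # eps_hat_ith, cross-sectionally demeaned
    sigma_eps2 = (eps_hat ** 2).mean(axis=0)            # sigma_hat^2_eps,ih
    v_ih = ((eps_hat ** 2 - sigma_eps2) ** 2).mean(axis=0)   # V_hat_ih
    dev = (e ** 2).mean(axis=0) - rmse_lps2             # (1/T) sum_t e_ith^2 minus grand mean
    terms = T * dev ** 2 / (4 * sigma_lam2 * sigma_eps2 + v_ih) - 1.0
    return terms.sum() / np.sqrt(2 * N)                 # approx N(0,1) under H0

rng = np.random.default_rng(4)
T, N = 400, 40
lam = rng.normal(0, 0.5, size=(T, 1))                                  # common shocks
e_null = lam + 0.6 * rng.normal(size=(T, N))                           # homogeneous variances
e_alt = lam + rng.uniform(0.2, 1.4, size=N) * rng.normal(size=(T, N))  # heterogeneous variances
z_null, z_alt = heterogeneity_z(e_null), heterogeneity_z(e_alt)
```

Under the heterogeneous alternative the statistic diverges with T, so z_alt should be far larger than z_null in this design.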
All of these LM tests are based on the assumption that the variance is a parametric function of some observable exogenous variables, as in the original Breusch and Pagan (1979) test for homoscedasticity. Our test, based on Theorem 1, differs from the tests mentioned above, since we do not assume that the idiosyncratic variance is a function of some variables. In fact, when N is small,

Σ_{i=1}^{N} [4 σ̂_{λh}² σ̂_{εih}² + V̂_{ih}]^{−1} T [ (1/T) Σ_{t=1}^{T} e_{ith}² − (1/(NT)) Σ_{i=1}^{N} Σ_{t=1}^{T} e_{ith}² ]²    (16)

approximately follows a χ²_{N−2} distribution, and is also closely related to

[8N ((RMSE_h^{LPS})²)^{3/2}]^{−1} Σ_{i=1}^{N} [ (1/T) Σ_{t=1}^{T} e_{ith}² − (1/(NT)) Σ_{i=1}^{N} Σ_{t=1}^{T} e_{ith}² ]²,

the dominant term in the difference between RMSE_h^{LPS} and RMSE_h^{RT}. However, when N goes to infinity with T, the test statistic in (16) is of stochastic order O_p(N), leading to over-rejection of the null even when the null hypothesis is true. Theorem 1 addresses this size issue by re-centering and then properly scaling the test statistic so that a central limit theorem can be applied.⁸ A similar approach was adopted by Pesaran and Yamagata (2008) in their test of slope homogeneity in large random coefficient panel data models. A detailed procedure for their test can also be found in Hsiao and Pesaran (2008).

It is interesting to note that, when idiosyncratic error variances are the same across forecasters, both RMSE_h^{RT} and RMSE_h^{LPS} can be interpreted as measuring the uncertainty of a typical forecast in the panel. This interpretation is consistent with the explicitly stated objective in Reifschneider and Tulip (2007) that the benchmark uncertainty figures they provide are those of a typical forecaster in the pool, to be individualized by each forecaster based on his or her own special experiences. Let us assume that, in predicting the variable for the target year t, denoted by A_t, each forecaster possesses two types of information: (i) common information, y_{th} = A_t + η_{th}, with precision κ_{th}; and (ii) private information, z_{ith} = A_t + ε_{ith}, with precision τ_{th}, where η_{th} and the ε_{ith} are mutually independent and normally distributed with mean zero.

⁸ We conducted a Monte Carlo study to explore the size and power of the new test at the 5% nominal level. The model is e_{it} = λ_t + ε_{it}, where λ_t follows a stationary AR(1) process, λ_t = ρλ_{t−1} + ω_t, with ρ = 0 or 0.4. ω_t is randomly drawn alternatively from N(0, σ_ω²) or U(−√3 σ_ω, √3 σ_ω). The ε_{it} are i.i.d. draws from N(0, σ_{εi}²) or U(−√3 σ_{εi}, √3 σ_{εi}). We studied the size of the test under the null hypothesis H₀: σ_{εi}² = σ_ε² for almost all i, and the power of the test under the alternative H₁: σ_{εi}² ≠ σ_ε² for a fixed proportion r (0 < r ≤ 1). To explore the effect of changes in r on the power of the test, we let σ_{εi}² = σ_ε² for (1 − r) of all i, and the remaining σ_{εi} were generated according to U(0.1σ_ε, 1.9σ_ε), with r = 0.2 or 0.4. The values of (σ_ω², σ_ε²) are selected from the set {(0.05, 0.01), (0.1, 0.01), (0.2, 0.2), (0.6, 1.5), (3, 1)}, and are similar to the estimated parameters in the empirical examples below. Tables 1-2 report the simulation results. Two points are worth noting. First, the Z test (Table 1) yields reasonably good empirical size for different values of N, T and the signal-to-noise ratio (SNR = σ_ω²/((1 − ρ²)σ_ε²)). Second, when only 20% of the σ_{εi}² are different from σ_ε² (Table 2, Panel A), the empirical power of the test is low, especially for small N. Even when N is moderately large, the test tends to under-reject the null hypothesis provided that both σ_ω² and σ_ε² are small and the SNR is large. But the test does well in detecting heteroscedasticity for moderately large σ_ω² and σ_ε² when both N and T are large. The power of the test is close to one when the SNR is not greater than one for N ≥ 40. The test becomes more powerful when r increases. More specifically, the test under r = 0.4 (Table 2, Panel B) has power that is almost double that under r = 0.2. Similar results from many more simulation experiments can be obtained from the authors.
Then according to Bayes rule, individual i’s forecast Fith is a weighted average of the two signals with their precision as the weights: κth yth + τth zith , (17) κth + τth 1 . Taking exand the uncertainty of an individual forecast is simply κth +τ th pectations on both sides conditional on available information, the left-hand side of equation (4) becomes Fith ≡ E[At | yth , zith ] = N N 1 X κth ηth + τth ith 2 1 X 2 E[(At − Fith ) | yth , zith ] = E[( ) | yth , zith ] N i=1 N i=1 κth + τth 1 = . (18) κth + τth Thus, we see that our uncertainty measure, RM SELP S , is the same as the 1 variance of an individual forecast, with both of them equal to κth +τ in the th population. So it can be interpreted as the confidence an outside observer will have in a randomly drawn typical individual forecast from the panel of forecasters, as pointed out first by Giordani and Söderlind (2003). Note that the expected value of the first term on the right-hand side of equation (4) can be expressed as power of test is close to one when the SNR is not greater than one for N ≥ 40. The test becomes more powerful when r increases. More specifically, the test under r = 0.4 (Table 2, Panel B) has power that is almost the double of those under r = 0.2. Similar results from from many more simulation experiments can be obtained from the authors. 17 E[(At − F·th )2 | yth , zith ] = κ + κth + τNth . (κth + τth )2 (19) τth th N in equation (19) is less than For N > 1, it is easy to verify that (κth +τth )2 1 in equation (18). This shows that the uncertainty conventionally κth +τth associated with the consensus forecast, RM SEAF , is less than RM SELS or RM SERT . However, as we have shown before, when idiosyncratic errors are heteroskedastic, RM SERT will be in generally smaller than RM SELP S implying that RM SERT does not fully incorporate disagreement among the forecasters. 
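The ordering of the three measures discussed above can be checked numerically. The sketch below simulates the paper's factor structure $e_{it}=\lambda_t+\varepsilon_{it}$ with heteroskedastic idiosyncratic errors and computes the three measures; all parameter values are our own illustrative choices, not estimates from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 30, 200
sigma_lam = 0.5                       # sd of the common aggregate shock
sigma_eps = rng.uniform(0.2, 1.5, N)  # heteroskedastic idiosyncratic sds

lam = rng.normal(0.0, sigma_lam, T)               # common shock lambda_t
eps = rng.normal(0.0, 1.0, (N, T)) * sigma_eps[:, None]
e = lam[None, :] + eps                            # forecast errors e_it

# RMSE_AF: RMSE of the consensus (cross-sectional average) forecast
rmse_af = np.sqrt(np.mean(e.mean(axis=0) ** 2))
# RMSE_RT: average over i of each forecaster's own RMSE
rmse_rt = np.mean(np.sqrt(np.mean(e ** 2, axis=1)))
# RMSE_LPS: square root of the average of the individual MSEs
rmse_lps = np.sqrt(np.mean(e ** 2))

print(rmse_af, rmse_rt, rmse_lps)
```

In any balanced panel the triangle inequality gives $RMSE_{AF}\le RMSE_{RT}$ and Jensen's inequality gives $RMSE_{RT}\le RMSE_{LPS}$; the gap between the latter two is driven precisely by the dispersion of the $\sigma_{\varepsilon i}$.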
4 Confidence Interval for Thick Modeling: An Illustration of Underestimation

Granger and Jeon (2004) presented a number of illustrative examples to show the advantages of using multiple models and forecast averaging for the purpose of inference. One particular case (their section 8), in which they report confidence intervals of average parameter estimates based on 24 closely related alternative specifications of the Taylor rule, highlights well the issue we raise in this paper. They reexamine the empirical findings of Kozicki (1999), who estimated the Taylor rule using reasonable variations in the alternative definitions of inflation and output gap with monthly data from 1983-1997. The Taylor rule stipulates

$$r_t = c + (1+\alpha)\pi_{t-1} + \beta y^g_{t-1},$$

where $r_t$ is the funds rate at time $t$, $c$ is a constant, and $\pi_{t-1}$ and $y^g_{t-1}$ are inflation and the output gap at $t-1$, respectively. The hypothesized values of the parameters are 0.5 for both $\alpha$ and $\beta$. Because the nature of the estimates and our conclusions are very similar for $\alpha$ and $\beta$, in this section we focus on $\alpha$, the inflation weight.

As reported in Table 5 of Granger and Jeon (2004, p.340), the 24 estimates of $\alpha$ varied widely across models, ranging from $-0.117$ to $1.375$ with an average value of $0.598$. The standard errors of these estimates varied from $0.137$ to $0.283$ with mean $0.219$. Based on bootstrap aggregation (bagging), Granger and Jeon obtained the standard error of the average $\hat\alpha$ to be $0.045$. Thus their 95% confidence interval for $\alpha$ was $(0.508, 0.688)$, which included only 3 of the 24 estimates of $\alpha$. Since they did not explain the details of the bagging procedure, it seems that they bootstrapped the standard error of the average $\hat\alpha$ under a very stringent assumption. Note that the variance of the average, $\operatorname{var}(\frac{1}{N}\sum_{i=1}^N\hat\alpha_i)$, is given by $N^{-2}l_N'\Sigma l_N$, where $l_N$ is an $N\times 1$ vector of ones and $\Sigma$ is the covariance matrix of $\hat\alpha$. This expression can also be written as

$$\operatorname{var}\Big(\frac{1}{N}\sum_{i=1}^N\hat\alpha_i\Big)=N^{-2}\sum_{i=1}^N\operatorname{var}(\hat\alpha_i)+N^{-2}\sum_{i\neq j}\operatorname{cov}(\hat\alpha_i,\hat\alpha_j). \qquad (20)$$

For the average $\hat\alpha$, their reported standard error of 0.045 is obtained using only the first term in (20), i.e., $N^{-2}\sum_{i=1}^N\operatorname{var}(\hat\alpha_i)$, with $\operatorname{var}(\hat\alpha_i)$ being their bootstrapped estimate for the $i$th model. This implies that the bagging estimate Granger and Jeon (2004) reported was based on bootstrap samples that assumed the parameter estimates to be uncorrelated across the 24 alternative models, cf. Hastie et al. (2009, p.285). Note also that if $N$ is allowed to tend to infinity and $N^{-1}\sum_{i=1}^N\operatorname{var}(\hat\alpha_i)$ is bounded, their confidence interval would eventually collapse to the mean, 0.598. In general, we expect $\hat\alpha_i$ and $\hat\alpha_j$ ($j\neq i$) to be correlated, and hence the variance of the average $\hat\alpha$ would converge to the limit of $N^{-2}\sum_{i\neq j}\operatorname{cov}(\hat\alpha_i,\hat\alpha_j)$. We have shown in Proposition 1 that the limit of $N^{-2}\sum_{i\neq j}\operatorname{cov}(\hat\alpha_i,\hat\alpha_j)$ would then be the variance of the common component of the various $\hat\alpha_i$'s.

In the absence of the actual data, we approximated expression (20) for different values of the cross correlations by assuming that the $\hat\alpha_i$'s are equicorrelated. As we increased this correlation from 0 to 0.9, the standard error of the average $\hat\alpha$ increased steadily from 0.045 to 0.219. Note that the 24 models are based on conceptually very similar covariates, and hence the parameters estimated across different models are expected to be reasonably correlated. If we assume the correlation to be 0.5, the standard deviation of the average $\hat\alpha$ is 0.158. This value is significantly less than 0.224 - the square root of the average of the variances, as suggested by our analysis in the absence of systematic biases in the parameter estimates. This underestimation is due to the fact that the Granger-Bates formula $N^{-2}l_N'\Sigma l_N$ does not incorporate the uncertainties associated with each of the 24 models; see Draper (1995, p.57) or Buckland et al. (1995, p.604).
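The equicorrelation experiment in (20) is easy to reproduce. Assuming, as an approximation, that all 24 estimates share the mean standard error of 0.219 (the individual values actually range from 0.137 to 0.283), the standard error of the equally weighted average is:

```python
import math

N, s = 24, 0.219          # number of models; (mean) standard error of each alpha-hat

def se_of_average(rho: float) -> float:
    """Standard error of the equally weighted average of N equicorrelated
    estimates, each with standard error s and pairwise correlation rho."""
    var = (s ** 2) / N + ((N - 1) / N) * rho * s ** 2
    return math.sqrt(var)

for rho in (0.0, 0.5, 0.9):
    print(rho, round(se_of_average(rho), 3))
```

At $\rho=0$ this reproduces the reported 0.045, at $\rho=0.5$ the 0.158 quoted in the text, and as $\rho\to 1$ it approaches 0.219.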
Even if the correlations among the components of $\hat\alpha$ are taken into account in (20), the confidence interval of the average $\hat\alpha$ would still be too narrow to be an appropriate measure of the uncertainty associated with a typical $\hat\alpha_i$. The difference between the standard error of the average $\hat\alpha$ corresponding to the true correlation among the individual $\hat\alpha_i$'s and 0.224 can be attributed to the variance of the biases in the individual estimates.

Given the data and the alternative models, one can of course do a full-fledged BMA or FMA analysis to obtain the true standard error of the average $\hat\alpha$ that will incorporate the uncertainty of the individual estimates. As explained before, their standard error of the average $\hat\alpha$ is expected to be a gross underestimation of the true risk that a policy maker will face in the presence of such varied point estimates from the 24 reasonably well specified Taylor rule specifications. Rather, we should compute the true uncertainty following (11), which includes both the average variance of the $\hat\alpha_i$ ($=0.050$) and the variance of the point estimates of $\alpha$, which was calculated to be 0.243. These numbers yield a standard error of 0.541, which is almost ten times bigger than what Granger and Jeon reported. This underestimation happens for two reasons: 1) the use of the variance of the average rather than the average of the variances, and 2) disregarding the variance of the point estimates.

5 Estimated Benchmark Uncertainty based on SPF Data

In this section, we present estimates of historical uncertainty in inflation and output growth forecasts using $RMSE_{LPS}$, and compare them to $RMSE_{AF}$ and $RMSE_{RT}$. The data in our study are taken from the Survey of Professional Forecasters (SPF) and the Real-Time Data Set for Macroeconomists (RTDSM), provided by the Federal Reserve Bank of Philadelphia.
Although the SPF began as the NBER-ASA survey in 1968:Q4, we focus on the forecasts made from 1981:Q3 until 2011:Q4.9 The starting date reflects a decision by the SPF to ask for forecasts of inflation and output growth for both the current year and the next year. In our analysis, inflation is measured as the annual-average over annual-average percent change of the GDP deflator, and output growth is similarly defined in terms of real GDP. The actual horizons for these forecasts are approximately 7½, 6½, ..., ½ quarters, but we refer to them simply as quarterly horizons of 8, 7, ..., 1.

In calculating the percentage change from year $t-1$ to $t$, we transform the quarterly forecasts of the level of the variables into annual forecasts of the growth rate of the variables. Specifically, we use the forecasts of the level of the variables in the current and subsequent quarters, and the actual values from the vintage of data available at that time, to calculate the forecast of annual inflation and output growth for the current year. For example, in the second quarter of the current year, a respondent would use data on the actual value of real GDP in the first quarter ($Y_{1,t}$) and make predictions through the end of year $t$ ($F_{2,t}, F_{3,t}, F_{4,t}$). Accordingly, as an example, the output growth forecast for the current year at a horizon of 3 quarters is calculated as

$$f_t = (Y_{1,t}+F_{2,t}+F_{3,t}+F_{4,t})/(Y_{1,t-1}+Y_{2,t-1}+Y_{3,t-1}+Y_{4,t-1}),$$

where $Y_{1,t-1}$ through $Y_{4,t-1}$ are the levels of actual real GDP in the first through fourth quarters of year $t-1$, which are available to forecasters in the second quarter of year $t$. In calculating the percentage change from year $t$ to $t+1$, we simply divide $F_{t+1}$ by $F_t$, where $F_{t+1}$ and $F_t$ are the annual forecasts of real GDP for the next year and the current year, respectively, which are available from the SPF surveys from 1981:Q3 onwards.

9 The data were downloaded on March 1, 2012, and thus include the corrections released on August 12, 2011.
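The level-to-growth conversion just described can be sketched in a few lines; the GDP levels below are made up solely to exercise the formula:

```python
# Hypothetical real GDP levels (billions): the four actual quarters of year t-1,
# the actual first quarter of year t, and level forecasts for Q2-Q4 of year t.
prev_year = [15000.0, 15100.0, 15200.0, 15300.0]   # Y_{1,t-1}, ..., Y_{4,t-1}
y1_actual = 15400.0                                 # Y_{1,t}, known in Q2 of year t
f2, f3, f4 = 15500.0, 15600.0, 15700.0              # F_{2,t}, F_{3,t}, F_{4,t}

# Annual-average over annual-average growth forecast for year t,
# made in the second quarter (horizon of 3 quarters).
ratio = (y1_actual + f2 + f3 + f4) / sum(prev_year)
growth_pct = 100.0 * (ratio - 1.0)
print(round(growth_pct, 2))
```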
Note that, according to the SPF, errors were made in the surveys conducted in 1985:Q1, 1986:Q1 and 1990:Q1, so that the first annual forecast is for the previous year and the second annual forecast is for the current year. We therefore do not have next-year forecasts made in those quarters.

To calculate forecast errors, we also need the actual values of annual inflation and output growth. As is well known, the NIPA data often go through serious revisions. Obviously, the most recent revision is not a good choice, since it involves adjustments of definitions and classifications. We choose the first quarterly release of annual inflation and output growth to compute the actual values.

Since not all forecasters responded at all times, we have to deal with unbalanced panel data. We can, of course, compute the uncertainty measures using a balanced subset of the unbalanced panel. Though convenient, this method tends to lose important information by throwing out valuable observations. Alternatively, we may construct a balanced data set by estimating the missing observations, as in Capistrán and Timmermann (2009). But estimation of the missing values will nevertheless introduce additional uncertainty, which may obscure or complicate the forecast uncertainty. In addition, the estimated values may not be reliable when a large number of observations are missing, as in the SPF data set.

One straightforward way to extend the measures designed exclusively for balanced panels to unbalanced panels is simply to replace $N$ with $N_t$ and $T$ with $T_i$ in equations (14), (15) and (13). While this approach utilizes all available data, it is easy to see that the inequality relationship between $RMSE_{AF}$ and $RMSE_{RT}$ (and hence $RMSE_{LPS}$) no longer holds.10 More specifically, this simple approach implies that, while calculating $RMSE^h_{AF}$, we are implicitly replacing the missing observations with the cross-section average $\frac{1}{N_t}\sum_i e_{ith}$ at time $t$.

10 Such a violation occurs because we can no longer apply the Cauchy-Schwarz inequality in unbalanced panels.
In contrast, for both $RMSE^h_{RT}$ and $RMSE^h_{LPS}$, this approach fills in the missing data with the values $\sqrt{\frac{1}{T_i}\sum_t e_{ith}^2}$. Thus $RMSE^h_{AF}$ and $RMSE^h_{RT}$ (and hence $RMSE^h_{LPS}$) are essentially using different data sets. This casts doubt on the appropriateness of using this approach to compute the three measures in an unbalanced panel.

When dealing with unbalanced panels, an equal weighting scheme fails to discriminate forecast accuracy among different forecasters. In particular, the more observations are available for forecaster $i$, the more reliable is the information that can be extracted for forecaster $i$. Thus it is necessary, and intuitively appealing, to adopt an unequal weighting scheme to partially address the problems in computing forecast uncertainty caused by unbalanced panels. Following Bates and Granger (1969), we use weights that are inversely proportional to the variance of the summation components. To avoid the complications of estimating the variances, we assume that the weights are proportional to the number of observations for each time period or for each forecaster. Specifically, suppose that at time $t$ we have $n_t^*$ cross-section observations starting from unit $n_t$, and that for forecaster $i$ we have $t_i^*$ time-series observations starting from time $t_i$. Define $\omega_t = n_t^*/\sum_{t=1}^T n_t^*$ and $\omega_i = t_i^*/\sum_{i=1}^N t_i^*$. Then the three uncertainty measures can be expressed as

$$RMSE_{LPS}=\sqrt{\frac{1}{\sum_{t=1}^T n_t^*}\sum_{t=1}^T\sum_{i=n_t}^{n_t+n_t^*-1} e_{it}^2} \qquad (13A)$$

$$RMSE_{AF}=\sqrt{\sum_{t=1}^T \omega_t\Big(\frac{1}{n_t^*}\sum_{i=n_t}^{n_t+n_t^*-1} e_{it}\Big)^2} \qquad (14A)$$

$$RMSE_{RT}=\sum_{i=1}^N \omega_i\sqrt{\frac{1}{t_i^*}\sum_{t=t_i}^{t_i+t_i^*-1} e_{it}^2} \qquad (15A)$$

Table 3 reports the results for inflation forecasts. Among the three measures, as expected, $RMSE_{AF}$ yields the smallest forecast uncertainty, and there is a large difference between $RMSE_{AF}$ and the other two.
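A minimal sketch of the unbalanced-panel measures (13A)-(15A), using a missing-value mask rather than the contiguous index ranges $n_t, t_i$ of the text (a simplification on our part); the simulated data are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 20, 40
e = rng.normal(0.0, 1.0, (N, T))          # forecast errors
mask = rng.random((N, T)) < 0.8           # True where forecaster i responded at t

n_t = mask.sum(axis=0)                    # respondents per period, n*_t
t_i = mask.sum(axis=1)                    # observations per forecaster, t*_i
w_t = n_t / n_t.sum()                     # period weights, eq. (14A)
w_i = t_i / t_i.sum()                     # forecaster weights, eq. (15A)

e1 = np.where(mask, e, 0.0)
e2 = np.where(mask, e ** 2, 0.0)

rmse_lps = np.sqrt(e2.sum() / mask.sum())                       # (13A)
rmse_af = np.sqrt((w_t * (e1.sum(axis=0) / n_t) ** 2).sum())    # (14A)
rmse_rt = (w_i * np.sqrt(e2.sum(axis=1) / t_i)).sum()           # (15A)
print(rmse_af, rmse_rt, rmse_lps)
```

As the text cautions, the balanced-panel ordering $RMSE_{AF}\le RMSE_{RT}\le RMSE_{LPS}$ is no longer guaranteed once the panel is unbalanced.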
At longer horizons, the idiosyncratic errors become increasingly important, reaching a size similar to the variance of the aggregate shock. Thus, pervasive heterogeneity seems to exist amongst the professional forecasters but is ignored in the traditional measure $RMSE_{AF}$. In addition, for all eight horizons $RMSE_{RT}$ is less than $RMSE_{LPS}$. Simple calculations show that the degree of underestimation is between 5% and 35% when $RMSE_{AF}$ is used as the measure of historical uncertainty, and between 8% and 16% when $RMSE_{RT}$ is used instead. As we show later, the estimated test statistics are such that the difference between $RMSE_{RT}$ and $RMSE_{LPS}$ is statistically significant at the 1% level for all eight horizons.

Table 4 presents the results for output growth forecasts. The picture here is somewhat different from the inflation forecasts. Although $RMSE_{AF}$ is still the smallest measure of forecast uncertainty, and $RMSE_{RT}$ is smaller than $RMSE_{LPS}$ across all eight horizons, the difference between $RMSE_{AF}$ and the other two is not as substantial as for inflation. This is due to the fact that the variance of the aggregate shock ($\sigma_\lambda^2$) is a more dominant factor, and forecast disagreement plays a lesser role, in output growth forecasts on average compared to inflation. As a result, when $RMSE_{AF}$ is used as the measure of historical uncertainty, the degree of underestimation is modest, between 3% and 15%. There is still a statistically significant difference between $RMSE_{RT}$ and $RMSE_{LPS}$. The RMSEs associated with output are uniformly higher compared to inflation due to a differential incidence of common shocks. This phenomenon, which makes real GDP growth a difficult variable to predict, has been documented by Lahiri and Sheng (2011) using a heterogeneous learning model; see also Elliott (2011a). One might note that at one quarter ahead, $RMSE_{AF}$ is slightly larger than $RMSE_{RT}$. This is possibly due to the unbalanced panel.
In a separate experiment, we constructed a pseudo-balanced panel of 21 forecasters, each representing the minimum, the 5th percentile, the 10th percentile, ..., the 95th percentile and the maximum of the individual forecasts, respectively. The results, reported in Tables 5 and 6, clearly show that, as theoretically expected, $RMSE_{AF} < RMSE_{RT} < RMSE_{LPS}$ at all eight horizons in both inflation and output growth forecasts.11

To further illustrate the relative importance of heterogeneity in idiosyncratic errors, we calculate the percentage of the average squared individual forecast errors that is explained by the three components in (8), including the two components of forecast disagreement. For inflation, the contribution due to idiosyncratic errors increases from 8% at the 1-quarter horizon to 50% at 8 quarters, with a corresponding decline in the role of the common shock from 90% to 46%. As the forecast horizon increases, disagreement becomes an increasingly important factor of aggregate uncertainty. The interesting result to note is that the contribution due to time-invariant individual biases is rather small, never exceeding 8% over the eight horizons. For growth forecasts, as noted before, the common shock maintains its relative importance as the horizon increases more than it does for inflation. The relative importance of disagreement hovers around 18% at long horizons and is still non-ignorable. We must add that these are average percentages over the sample period. When estimating aggregate forecast uncertainty in real time, the disagreement term can sometimes be very large, often indicating an unusual future. The distribution of disagreement over time, for both inflation and real GDP at each horizon, is highly skewed to the right. The RMSE explained by individual systematic biases is again very small, as in the inflation forecasts.

Note that the RMSE figures reported in Reifschneider and Tulip (2007) and those in this paper are not directly comparable.
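The decomposition shares discussed above can be computed directly from a panel of errors. The sketch below reflects our reading of equation (8), which lies outside this excerpt: the three components are taken to be the variance of the common shock, the dispersion of time-invariant individual biases, and the average idiosyncratic variance (all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 30, 60
bias = rng.normal(0.0, 0.1, N)            # time-invariant individual biases
lam = rng.normal(0.0, 0.6, T)             # common aggregate shocks
eps = rng.normal(0.0, 0.4, (N, T))        # idiosyncratic errors
e = bias[:, None] + lam[None, :] + eps    # individual forecast errors

mse = np.mean(e ** 2)                     # average squared individual error
share_common = np.mean(lam ** 2) / mse
share_bias = np.mean(bias ** 2) / mse
share_idio = np.mean(eps ** 2) / mse
print(share_common, share_bias, share_idio)
```

Because the three components are mutually independent, the shares sum to one up to small sampling cross-terms; the bias share is small here by construction, mirroring the empirical finding above.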
We present average year-over-year forecasts, whereas RT use Q4-over-Q4 percent changes; the latter target tends to be more variable. Our estimates are based on SPF individual forecasts beginning in 1981:Q3, whereas RT started with an initial 20-year sample beginning in 1986. More importantly, Reifschneider and Tulip (2007) used the simple averages of the individual projections in the SPF, Blue Chip and FOMC panels, together with the Greenbook, Congressional Budget Office (CBO) and Administration forecasts ($N=6$) in their group calculations. More generally, their measure is expressed as

$$RMSE_h=\sqrt{\frac{1}{M}\frac{1}{T}\sum_{m=1}^M\sum_{t=1}^T (A_t - F^m_{\cdot th})^2},$$

where $F^m_{\cdot th}$ is the mean forecast of group $m$ for target year $t$, made $h$ periods before the end of the target year. By averaging across individual projections, most of the idiosyncratic differences and disagreement in the FOMC, SPF and Blue Chip forecasts have inadvertently been washed away; they found very little heterogeneity in these six forecasts. On the other hand, their simultaneous use of the Greenbook, CBO, Administration, mean FOMC, SPF, and Blue Chip forecasts meant that RT had to meticulously sort out important differences in the comparability of these six forecasts due to data coverage, timing of forecasts, reporting basis for projections, and forecast conditionality. Despite all these differences, a close examination of the two sets of uncertainty estimates suggests that our longer-term forecasts are clearly tagged with more historical uncertainty than the corresponding RT estimates.

11 In constructing the confidence interval for the combined forecast, Granger and Jeon (2004) suggest using the variance of the mean, which is much less than $RMSE_{LPS}$. For example, in their Table 5, the standard deviation calculated using the variance of the mean is 0.045 for the inflation weight in a Taylor rule equation, much less than 0.219, the standard deviation calculated using $RMSE_{LPS}$.
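The washing-out effect of within-group averaging can be demonstrated directly: an RT-style RMSE computed from group mean forecasts is systematically below the pooled individual measure. A minimal sketch on synthetic data (not the actual RT forecast set):

```python
import numpy as np

rng = np.random.default_rng(3)
M, K, T = 6, 30, 40        # groups, forecasters per group, target years
actual = rng.normal(0.0, 1.0, T)
# Individual forecasts: truth plus a common group-year error plus idiosyncratic noise
common = rng.normal(0.0, 0.5, (M, T))
fcst = actual[None, None, :] + common[:, None, :] + rng.normal(0.0, 0.7, (M, K, T))

group_mean = fcst.mean(axis=1)            # F-bar^m_th: disagreement averaged away
rmse_group = np.sqrt(np.mean((actual[None, :] - group_mean) ** 2))       # RT-style
rmse_indiv = np.sqrt(np.mean((actual[None, None, :] - fcst) ** 2))       # pooled individual
print(rmse_group, rmse_indiv)
```

By Jensen's inequality the group-mean measure is never larger than the pooled individual measure, and the gap equals the average within-group disagreement.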
At least part of the explanation for this discrepancy has to be their use of an incorrect formula for estimating benchmark uncertainty, together with their aggregation of individual forecasts from the FOMC, SPF and Blue Chip surveys.

6 Conclusion

The recent model averaging literature suggests that the risk associated with a combined forecast should incorporate not only the average of the variances of the individual forecasts but also the variance of the point forecasts across different models. This means that even if we have highly precise individual forecasts, we might end up with considerable uncertainty about the combined forecast if these point forecasts are very different across forecasters.

The benchmark uncertainty estimates that we report in this paper will help the public to interpret the mean or consensus forecasts that are released by surveys such as the SPF, Blue Chip, or FOMC. Often the consensus forecasts are reported together with the maximum, minimum and interquartile range of the point forecasts, presumably to suggest the underlying uncertainty. However, as we have explained, these features of the cross-sectional distribution indicate only a part of the aggregate uncertainty. For a knowledgeable use of the consensus forecasts by the public, historical benchmark uncertainty estimates along the lines suggested in this paper will lead to more meaningful use of the estimates by providing historical confidence intervals. Which is worse: widening the bands now, or missing the truth later (Draper, 1995)? Due to heterogeneity and the entry and exit of forecasters, Manski (2011) suggested not combining at all; we instead try to give confidence bands that recognize the uncertainties associated with the individual forecasts. Leamer warned us about the fragility of inferences arising from our failure to recognize model uncertainty. The developments in the Bates-Granger approach have sidestepped this concern, and Granger and Jeon's example is a very pertinent illustration of such disregard.
Acknowledgements

We have benefited from discussions with Robert Engle, David Hendry, Hashem Pesaran, Peter Phillips, Kenneth Wallis and Mark Watson. We also thank participants at the 10th World Congress of the Econometric Society, the 30th International Symposium on Forecasting, the 12th ZEW Summer Workshop for Young Economists, New York Camp Econometrics, the Society of Government Economists annual meeting, the 6th Colloquium on Modern Tools for Business Cycle Analysis, George Washington University and Howard University for their comments.

7 Appendix

7.1 Some Useful Lemmas and Their Proofs

A word on notation. $\|\cdot\|$ denotes the Euclidean norm, $\to_p$ denotes convergence in probability, and $\Rightarrow$ denotes convergence in distribution. Limits taken under $(T,N\to\infty)_{seq}$ refer to sequential limits, where $T$ goes to infinity first and then $N$ goes to infinity. Limits taken under $(T,N\to\infty)$ are joint limits, with both $T$ and $N$ approaching infinity at the same time. We start with some lemmas that are useful in the subsequent arguments. Lemmas 1 and 2 come from Lemma 6 and Theorem 1, respectively, in Phillips and Moon (1999). For simplicity, the subscript $h$ is omitted in the proofs.

Lemma 1 (Conditions for Sequential Convergence to Imply Joint Convergence)
(a) Suppose there exist random variables $X_N$ and $X$ on the same probability space as $X_{N,T}$ satisfying, for all $N$, $X_{N,T}\to_p X_N$ as $T\to\infty$ and $X_N\to_p X$ as $N\to\infty$. Then $X_{N,T}\to_p X$ as $(T,N\to\infty)$ if and only if

$$\limsup_{N,T} P\{|X_{N,T}-X_N|>\epsilon\}=0 \quad \forall \epsilon>0. \qquad (A1)$$

(b) Suppose there exist random variables $X_N$ such that, for any fixed $N$, $X_{N,T}\Rightarrow X_N$ as $T\to\infty$ and $X_N\Rightarrow X$ as $N\to\infty$. Then $X_{N,T}\Rightarrow X$ as $(T,N\to\infty)$ if and only if

$$\limsup_{N,T}\,|E f(X_{N,T})-E f(X_N)|=0 \quad \forall f\in C, \qquad (A2)$$

where $C$ is the class of all bounded, continuous real functions on $\mathbb{R}$.

Lemma 2 Suppose the random variables $Y_{i,T}=\frac{1}{T}\sum_{t=1}^T X_{i,t}$ are independent across $i$ for all $T$ and integrable. Assume that $Y_{i,T}\Rightarrow Y_i$ as $T\to\infty$ for all $i$.
Let the following conditions hold:
(i) $\limsup_{N,T}\frac{1}{N}\sum_{i=1}^N E|Y_{i,T}|<\infty$;
(ii) $\limsup_{N,T}\frac{1}{N}\sum_{i=1}^N |EY_{i,T}-EY_i|=0$;
(iii) $\limsup_{N,T}\frac{1}{N}\sum_{i=1}^N E\big[|Y_{i,T}|\,1\{|Y_{i,T}|>N\varepsilon\}\big]=0$ for all $\varepsilon>0$;
(iv) $\limsup_{N,T}\frac{1}{N}\sum_{i=1}^N E\big[|Y_i|\,1\{|Y_i|>N\varepsilon\}\big]=0$ for all $\varepsilon>0$.

Then (a) condition (A2) holds; and (b) if $\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^N EY_i\,(=\mu_x)$ exists and $X_N\to_p\mu_x$, we have $X_{N,T}\to_p\mu_x$ as $(T,N\to\infty)$.

Lemma 3 Suppose Assumptions 1-3 hold. Then:
(a) As $T\to\infty$: $\frac{1}{T}\sum_{t=1}^T\lambda_t^2\to_p\sigma_\lambda^2$, $\frac{1}{T}\sum_{t=1}^T\lambda_t\upsilon_{jt}\to_p 0$, $\frac{1}{T}\sum_{t=1}^T\upsilon_{it}^2\to_p 1$, and $\frac{1}{T}\sum_{t=1}^T\upsilon_{it}\upsilon_{jt}\to_p 0$.
(b) As $T\to\infty$: $\frac{1}{T^{1/2}}\sum_{t=1}^T\lambda_t\upsilon_{it}\Rightarrow N(0,\sigma_\lambda^2)$, $\frac{1}{T^{1/2}}\sum_{t=1}^T\upsilon_{it}\upsilon_{jt}\Rightarrow N(0,1)$, $\frac{1}{T^{1/2}}\sum_{t=1}^T(\upsilon_{it}^2-1)\Rightarrow N(0,E\upsilon_{it}^4-1)$, and $\frac{1}{T^{1/2}}\sum_{t=1}^T(\lambda_t^2-\sigma_\lambda^2)\Rightarrow N(0,\Sigma_\lambda)$, where

$$\Sigma_\lambda=E(\lambda_t^2-\sigma_\lambda^2)^2\Big[1+2\lim_{T\to\infty}\sum_{k=1}^{T-1}\Big(1-\frac{k}{T}\Big)\rho_\lambda(k)\Big];$$

moreover $\frac{1}{T^{1/2}}\sum_{t=1}^T(\upsilon_{it}^4-E\upsilon_{it}^4)\Rightarrow N(0,E\upsilon_{it}^8-(E\upsilon_{it}^4)^2)$.
(c) $\sup_T E|\frac{1}{T}\sum_t\lambda_t\upsilon_{it}|<\infty$, $\sup_T E|\frac{1}{T}\sum_t\upsilon_{it}^2|<\infty$, $\sup_T E|\frac{1}{T}\sum_t\upsilon_{it}\upsilon_{jt}|<\infty$.
(d) $\sup_T E|\frac{1}{T^{1/2}}\sum_t(\upsilon_{it}^2-1)|<\infty$, $\sup_T E|\frac{1}{T^{1/2}}\sum_t\lambda_t\upsilon_{it}|<\infty$, $\sup_T E|\frac{1}{T^{1/2}}\sum_t\upsilon_{it}\upsilon_{jt}|<\infty$.
(e) $\sup_T E|\frac{1}{T^{1/2}}\sum_t(\upsilon_{it}^2-1)|^4<\infty$, $\sup_T E|\frac{1}{T^{1/2}}\sum_t\lambda_t\upsilon_{it}|^4<\infty$.

Proof. Under Assumption 1(i), $\lambda_t$ is a strictly stationary and ergodic stochastic process with $E(\lambda_t)=0$, $E(\lambda_t^2)=\sigma_\lambda^2$ and $E|\lambda_t|^{4+\delta}<\infty$ for some $\delta>0$. Hence $\lambda_t^2$ is strictly stationary and ergodic, and $\rho_\lambda(k)=E[(\lambda_t^2-\sigma_\lambda^2)(\lambda_{t-k}^2-\sigma_\lambda^2)]/E(\lambda_t^2-\sigma_\lambda^2)^2\to 0$ as $k\to\infty$. The convergence of $\frac{1}{T}\sum_t\lambda_t^2$ to $\sigma_\lambda^2$ then follows by applying the weak law of large numbers (WLLN) for strictly stationary and ergodic processes. The remaining convergence results in part (a) are easily established using Khintchine's WLLN, since $\upsilon_{it}$ is i.i.d. across $i$ and over $t$ with finite eighth moments, and is independent of $\lambda_t$. Now Assumption 1(ii) implies that $\lim_{T\to\infty}\big[2\sum_{k=1}^{T-1}(1-\frac{k}{T})\rho_\lambda(k)+1\big]$ exists. Then the asymptotic normality of $\frac{1}{T^{1/2}}\sum_t(\lambda_t^2-\sigma_\lambda^2)$ in part (b) holds by virtue of Theorem 5.6 of Hall and Heyde (1980). The remaining asymptotic normality results in part (b) follow trivially from the classical central limit theorem. Part (c) is readily verified by applying the Cauchy-Schwarz and Jensen inequalities. Part (e) follows directly from Theorem 3.6 of Billingsley (1999) and part (b). For part (d), the first two results follow easily from part (e), and the last one is obtained by first applying the Jensen inequality and then following the procedure in part (e).

Lemma 4 Suppose Assumptions 1-3 hold. Then:
(a) $\frac{1}{T}\sum_{t=1}^T\upsilon_{it}\big(\frac{1}{N}\sum_{i=1}^N\upsilon_{it}\big)=O_p\big(\frac{1}{N}\big)$;
(b) $\frac{1}{T}\sum_{t=1}^T\upsilon_{it}^2\big(\frac{1}{N}\sum_{i=1}^N\upsilon_{it}\big)=O_p\big(\frac{1}{\min\{N,\,N^{1/2}T^{1/2}\}}\big)$;
(c) $\frac{1}{T}\sum_{t=1}^T\upsilon_{it}^3\big(\frac{1}{N}\sum_{i=1}^N\upsilon_{it}\big)=O_p\big(\frac{1}{\min\{N,\,N^{1/2}T^{1/2}\}}\big)$;
(d) $\frac{1}{T}\sum_{t=1}^T\upsilon_{it}\big(\frac{1}{N}\sum_{i=1}^N\upsilon_{it}\big)^2=O_p\big(\frac{1}{\min\{N^2,\,NT^{1/2}\}}\big)$;
(e) $\frac{1}{T}\sum_{t=1}^T\upsilon_{it}\big(\frac{1}{N}\sum_{i=1}^N\upsilon_{it}\big)^3=O_p\big(\frac{1}{\min\{N^3,\,N^{3/2}T^{1/2}\}}\big)$;
(f) $\frac{1}{T}\sum_{t=1}^T\big(\frac{1}{N}\sum_{i=1}^N\upsilon_{it}\big)^4=O_p\big(\frac{1}{N^2}\big)$;
(g) $\frac{1}{T}\sum_{t=1}^T\lambda_t\big(\frac{1}{N}\sum_{i=1}^N\upsilon_{it}\big)=O_p\big(\frac{1}{N^{1/2}T^{1/2}}\big)$.

Proof. Parts (b), (f) and (g) can be obtained using arguments similar to those in Lemma 6(b). For part (a), it can be established by a similar argument that $\frac{1}{T}\sum_t\upsilon_{it}\frac{1}{N}\sum_{j\neq i}\upsilon_{jt}=O_p((NT)^{-1/2})$; then, using also Lemma 1(a), we get the desired result because $\frac{1}{T}\sum_t\upsilon_{it}\frac{1}{N}\sum_i\upsilon_{it}=\frac{1}{NT}\sum_t\upsilon_{it}^2+\frac{1}{T}\sum_t\upsilon_{it}\frac{1}{N}\sum_{j\neq i}\upsilon_{jt}$. Parts (c), (d) and (e) can be established in the same way as part (a).

Lemma 5 Let $\hat\varepsilon_{it}=e_{it}-\frac{1}{N}\sum_{i=1}^N e_{it}$, $\hat\sigma_{\varepsilon i}^2=\frac{1}{T}\sum_{t=1}^T\hat\varepsilon_{it}^2$, $\hat\sigma_\lambda^2=\frac{1}{T}\sum_{t=1}^T\big(\frac{1}{N}\sum_{i=1}^N e_{it}\big)^2$, and $\hat V_i=\frac{1}{T}\sum_{t=1}^T(\hat\varepsilon_{it}^2-\hat\sigma_{\varepsilon i}^2)^2$. Then, under Assumptions 1-3, we have

$$4\hat\sigma_\lambda^2\hat\sigma_{\varepsilon i}^2+\hat V_i-\big(4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\operatorname{Var}(\varepsilon_{it}^2)\big)=O_p\Big(\frac{1}{\min\{T^{1/2},N\}}\Big).$$

Proof. Write

$$\hat\varepsilon_{it}=e_{it}-\frac{1}{N}\sum_{i=1}^N e_{it}=\sigma_{\varepsilon i}\upsilon_{it}-\frac{1}{N}\sum_{i=1}^N\sigma_{\varepsilon i}\upsilon_{it}.$$
Then, using Lemma 3(b) and Lemmas 4(a) and 4(b), we have

$$\hat\sigma_{\varepsilon i}^2-\sigma_{\varepsilon i}^2=\frac{1}{T}\sum_{t=1}^T\hat\varepsilon_{it}^2-\sigma_{\varepsilon i}^2 =\sigma_{\varepsilon i}^2\frac{1}{T}\sum_{t=1}^T(\upsilon_{it}^2-1)-2\sigma_{\varepsilon i}^2\frac{1}{T}\sum_{t=1}^T\upsilon_{it}\Big(\frac{1}{N}\sum_{i=1}^N\upsilon_{it}\Big)+\sigma_{\varepsilon i}^2\frac{1}{T}\sum_{t=1}^T\Big(\frac{1}{N}\sum_{i=1}^N\upsilon_{it}\Big)^2$$
$$=O_p\Big(\frac{1}{T^{1/2}}\Big)+O_p\Big(\frac{1}{\min\{N,N^{1/2}T^{1/2}\}}\Big)+O_p\Big(\frac{1}{N}\Big)=O_p\Big(\frac{1}{\min\{T^{1/2},N\}}\Big).$$

Similarly, we have

$$\hat\sigma_\lambda^2-\sigma_\lambda^2=\frac{1}{T}\sum_{t=1}^T\Big(\frac{1}{N}\sum_{i=1}^N e_{it}\Big)^2-\sigma_\lambda^2=O_p\Big(\frac{1}{\min\{T^{1/2},N\}}\Big)$$

and

$$\frac{1}{T}\sum_{t=1}^T\hat\varepsilon_{it}^4-\sigma_{\varepsilon i}^4 E\upsilon_{it}^4=\frac{1}{T}\sum_{t=1}^T\Big(\sigma_{\varepsilon i}\upsilon_{it}-\frac{1}{N}\sum_{i=1}^N\sigma_{\varepsilon i}\upsilon_{it}\Big)^4-\sigma_{\varepsilon i}^4 E\upsilon_{it}^4=O_p\Big(\frac{1}{\min\{T^{1/2},N\}}\Big).$$

It then follows that

$$\hat V_i-(\sigma_{\varepsilon i}^4 E\upsilon_{it}^4-\sigma_{\varepsilon i}^4)=\frac{1}{T}\sum_{t=1}^T(\hat\varepsilon_{it}^2-\hat\sigma_{\varepsilon i}^2)^2-(\sigma_{\varepsilon i}^4 E\upsilon_{it}^4-\sigma_{\varepsilon i}^4) =\frac{1}{T}\sum_{t=1}^T\hat\varepsilon_{it}^4-\sigma_{\varepsilon i}^4 E\upsilon_{it}^4-(\hat\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^2)(\hat\sigma_{\varepsilon i}^2-\sigma_{\varepsilon i}^2)=O_p\Big(\frac{1}{\min\{T^{1/2},N\}}\Big).$$

Therefore

$$4\hat\sigma_\lambda^2\hat\sigma_{\varepsilon i}^2+\hat V_i-(4\sigma_\lambda^2\sigma_{\varepsilon i}^2+V_i)=O_p\Big(\frac{1}{\min\{T^{1/2},N\}}\Big).$$

Lemma 6 Suppose Assumptions 1-3 hold. Then:
(a) $\frac{1}{T^{1/2}}\sum_{t=1}^T(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2-\sigma_{\varepsilon i}^2)\Rightarrow N\big(0,\,4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4[E(\upsilon_{it}^4)-1]\big)$ as $T\to\infty$;
(b) $\frac{1}{N^{1/2}}\sum_{i=1}^N\frac{1}{T^{1/2}}\sum_{t=1}^T(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2-\sigma_{\varepsilon i}^2)\Rightarrow N\Big(0,\,\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^N\big(4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4[E(\upsilon_{it}^4)-1]\big)\Big)$ as $(T,N\to\infty)$.

Proof. Part (a) can be established by a direct application of Theorem 5.6 in Hall and Heyde (1980), because $(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2-\sigma_{\varepsilon i}^2)$ is a strictly stationary and ergodic stochastic process for given $i$. From (a), we have, as $T\to\infty$ with $N$ fixed,

$$\frac{1}{N^{1/2}}\sum_{i=1}^N\frac{1}{T^{1/2}}\sum_{t=1}^T(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2-\sigma_{\varepsilon i}^2)\Rightarrow\frac{1}{N^{1/2}}\sum_{i=1}^N z_i\big(4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4[E(\upsilon_{it}^4)-1]\big)^{1/2},$$

where the $z_i$ are i.i.d. $N(0,1)$ variables. Using Liapunov's central limit theorem for independent variables, we obtain

$$\frac{1}{N^{1/2}}\sum_{i=1}^N z_i\big(4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4[E(\upsilon_{it}^4)-1]\big)^{1/2}\Rightarrow N\Big(0,\,\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^N\big(4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4[E(\upsilon_{it}^4)-1]\big)\Big)$$

as $N\to\infty$. Now it remains to show that conditions (i)-(iv) of Lemma 2 are satisfied.
Applying triangle inequality, we have N T 1 X 1 X 2 E 1/2 (2λt εit + ε2it − σεi ) lim sup N i=1 T N,T t=1 N T N T 1 X 1 X 1 X 1 X 2 E 1/2 2λt εit + lim sup E 1/2 (ε2it − σεi ) ≤ lim sup N i=1 T N i=1 T N,T N,T t=1 t=1 ≤ T T 1 X 1 X 2 2supσεi supE 1/2 λt εit + supσεi supE 1/2 υit2 T T i i T T t=1 t=1 − 1 < ∞ by virtue of Lemma 3 (d). Condition (ii) holds trivially because 2 2 E(λt εit + ε2it − σεi ) = E(λt )E(εit ) + Eε2it − σεi = 0. Next, using Cauchy-Schwarz inequality and the indicator inequality 1 {b > x} ≤ ! 32 1 {a > x} if a > b for some real number a, b, x, we have N T 1 X 1 X 2 E 1/2 (2λt εit + ε2it − σεi ) 1 N i=1 T t=1 ( T 1 X 1/2 (2λt εit T ≤ ≤ ) 2 2 + εit − σεi ) > N t=1 !2 !2 1/2 N T T q 1 X 2 1 X 1 X 2 2 E σεi 4 + σεi λ υ + (υ − 1) t it N i=1 T 1/2 t=1 T 1/2 t=1 it !2 !2 1/2 T T q X X 1 1 2 2 2 ×1 σεi 4 + σεi λ υ + (υ − 1) > N t it it T 1/2 t=1 T 1/2 t=1 !2 !2 1/2 T T q X X 1 1 2 2 2 4 + σεi sup E 1/2 λt υit + (υ − 1) supσεi T T 1/2 t=1 it i T t=1 !2 1/2 !2 T T X X 1 N 1 2 q ×1 λ υ (υ − 1) > + t it T 1/2 t=1 T 1/2 t=1 it σ2 4 + σ2 εi εi → 0 as (T, N → ∞), since by Jensen inequality, ≤ ≤ 1 sup E 1/2 T T 1 sup E T 1/2 T " T 1X sup T T !2 1/2 1 2 λt υit + (υ − 1) T 1/2 t=1 it t=1 !2 !2 1/2 T T X X 1 2 λt υit + E (υ − 1) it T 1/2 t=1 t=1 ! !# 1/2 T 1X !2 T X T X E(λt υit )2 + sup T T t=1 E(υit2 − 1)2 t=1 < ∞, so condition (iii) is verified. Since zi is an i.i.d. normal variable for all i, 2 4 supE 4σλ2 σεi + σεi [E(υit4 ) − 1] zi2 i ≤ 2 4σλ2 supσεi i + [E(υit4 ) < ∞, verifying condition (iv) of lemma 2. − 4 1]supσεi i E(zi2 ) 33 7.2 Proof of Proposition 1 Proof. We show the proofs for part (c) and (d). Part (a), (b), (e) and (f) can be established by similar arguments. (c) Using Lemma 3(a), we see that v u RM SERT N u T N q X 1 X 1 X t1 2 as T → ∞ for fixed N = e2it →p σλ2 + σεi N i=1 T t=1 N i=1 1 N →∞ N 2 < ∞ by assumption, lim Since supσεi i it follows that N q P σ2 i=1 λ 2 is well defined. 
Then + σεi N q 1 X 2 σλ2 + σεi N →∞ N i=1 RM SERT →p lim as (T, N → ∞)seq . To establish the joint limit, it remains to show that conditions (i)-(iv) of s Lemma 2 hold for 1 N N P 1 T i=1 T P e2 . By Jensen inequality and Lemma 3(b) it t=1 v u N u T q X 1 X tE 1 2 E |RM SERT | ≤ e2it ≤ sup σλ2 + σεi N i=1 T t=1 i ! verifying condition (i). Next, using power series expansion v u T q u1 X 2 E t e − σλ2 it T t=1 2 + σεi ≤ 1 (σλ2 2 + 2 −1/2 σεi ) E T 1X 2 (e2 − σλ2 − σεi ) T t=1 it ! 2 for sufficiently large T . Then condition (ii) holds because E(e2it ) = σλ2 + σεi for all i and t. To establish condition (iii), it is enough to verify that 1 T T P e2 t=1 it is uniformly integrable in T, which holds by virtue of Theorem 3.6 in Billingsley (1999), because of 1 T T P e2 t=1 it ≥ 0 and 1 T T P e2 t=1 it 2 →p (σλ2 + σεi ) by Lemma 3(a). 2 Condition (iv) is trivial to establish since supσεi < ∞ by assumption. So we i can conclude that 2 RM SERT →p N q 1 X 2 lim σλ2 + σεi N →∞ N i=1 !2 34 as (T, N → ∞). (d) Using power series expansion again, we see that T = 1/2 RM SERT N q 1 X 2 − σλ2 + σεi N i=1 ! N T 1 2 1 X 1 1 X 2 −1/2 2 (σλ + σεi ) [ 1/2 (e2it − σλ2 − σεi )] + Op ( 1/2 ). N i=1 2 T T t=1 Then it suffices to show N T 1 2 1 X 1 X 2 −1/2 2 (σλ + σεi (e2it − σλ2 − σεi ) [ 1/2 )] N i=1 2 T t=1 =⇒ N 0, 1 N →∞ N lim N q X !2 2 σλ2 + σεi i=1 1 N →∞ N lim N X 1 q i=1 σλ2 + 2 2 σεi Σλ as (N, T → ∞). Using Lemma 3(b), we have T T 1 X 2 ) (e2it − σλ2 − σεi 1/2 = t=1 N T X 11 X 1 1 2 −1/2 1 (λ2t − σλ2 ) + Op ( 1/2 ) + Op ( 1/2 ) (σλ2 + σεi ) 1/2 2 N i=1 T T N t=1 =⇒ N 11 X 2 −1/2 (σ 2 + σεi ) N (0, Σλ ) . 2 N i=1 λ 1 N →∞ N 2 2 Since 0 < inf σεi ≤ supσεi < ∞ by assumption, lim i i N P 2 −1/2 (σλ2 + σεi ) is i=1 well defined. Then it follows that T 1/2 RM SERT − 1 N N q X i=1 ! 2 σλ2 + σεi =⇒ N 0, 1 1 lim 4 N →∞ N N X 1 q i=1 2 σλ2 + σεi 2 Σλ 35 as (T, N → ∞) . 
Consequently, using the results in part (c),
$$
T^{1/2}\Big(RMSE_{RT}^2-\Big(\frac{1}{N}\sum_{i=1}^{N}\sqrt{\sigma_\lambda^2+\sigma_{\varepsilon i}^2}\Big)^2\Big)
=\Big(RMSE_{RT}+\frac{1}{N}\sum_{i=1}^{N}\sqrt{\sigma_\lambda^2+\sigma_{\varepsilon i}^2}\Big)\,T^{1/2}\Big(RMSE_{RT}-\frac{1}{N}\sum_{i=1}^{N}\sqrt{\sigma_\lambda^2+\sigma_{\varepsilon i}^2}\Big)
$$
$$
\Longrightarrow N\Big(0,\ \Big(\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\sqrt{\sigma_\lambda^2+\sigma_{\varepsilon i}^2}\Big)^2\Big(\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\frac{1}{\sqrt{\sigma_\lambda^2+\sigma_{\varepsilon i}^2}}\Big)^2\Sigma_\lambda\Big)
$$
as $(T,N\to\infty)$.

7.3 Proof of Theorem 1

Proof. Using Lemma 5, we obtain
$$
\frac{T}{N^{1/2}}\sum_{i=1}^{N}\big[4\widehat\sigma_\lambda^2\widehat\sigma_{\varepsilon i}^2+\widehat V_i\big]^{-1}\Big(\frac{1}{T}\sum_{t=1}^{T}e_{it}^2-\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}e_{it}^2\Big)^2
=\frac{T}{N^{1/2}}\sum_{i=1}^{N}\big[4\sigma_\lambda^2\sigma_{\varepsilon i}^2+V_i\big]^{-1}\Big(\frac{1}{T}\sum_{t=1}^{T}e_{it}^2-\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}e_{it}^2\Big)^2
+O_p\Big(\frac{N}{T}\Big)+O_p\Big(\frac{1}{T}\Big)+O_p\Big(\frac{1}{T^{1/2}}\Big).
$$
Next, define $\Psi_i=4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4(E\upsilon_{it}^4-1)$. Then
$$
\sup_i\Psi_i^{-1}=\big[4\sigma_\lambda^2\inf_i\sigma_{\varepsilon i}^2+\inf_i\sigma_{\varepsilon i}^4\,(E\upsilon_{it}^4-1)\big]^{-1}<\infty.
$$
Hence,
$$
\frac{T}{N^{1/2}}\sum_{i=1}^{N}\Psi_i^{-1}\Big[\frac{1}{T}\sum_{t=1}^{T}(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2-\sigma_{\varepsilon i}^2)\Big]\Big[\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2-\sigma_{\varepsilon i}^2)\Big]
\le \sup_i\Psi_i^{-1}\,\frac{1}{N^{1/2}}\Big[\frac{1}{N^{1/2}}\sum_{i=1}^{N}\frac{1}{T^{1/2}}\sum_{t=1}^{T}(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2-\sigma_{\varepsilon i}^2)\Big]^2
=O_p(N^{-1/2})
$$
by virtue of Lemma 6(b). Similarly,
$$
\frac{T}{N^{1/2}}\sum_{i=1}^{N}\Psi_i^{-1}\Big[\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2-\sigma_{\varepsilon i}^2)\Big]^2
\le \sup_i\Psi_i^{-1}\,\frac{1}{N^{1/2}}\Big[\frac{1}{N^{1/2}}\sum_{i=1}^{N}\frac{1}{T^{1/2}}\sum_{t=1}^{T}(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2-\sigma_{\varepsilon i}^2)\Big]^2
=O_p(N^{-1/2}).
$$
Then, by expanding the term $\big(\frac{1}{T}\sum_{t=1}^{T}e_{it}^2-\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}e_{it}^2\big)^2$ and substituting $\lambda_t+\varepsilon_{it}$ for $e_{it}$, we have, under the null hypothesis that $\sigma_{\varepsilon i}^2=\sigma_\varepsilon^2$ for all $i$,
$$
\frac{T}{N^{1/2}}\sum_{i=1}^{N}\big[4\sigma_\lambda^2\sigma_{\varepsilon i}^2+V_i\big]^{-1}\Big(\frac{1}{T}\sum_{t=1}^{T}e_{it}^2-\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}e_{it}^2\Big)^2
=\frac{1}{N^{1/2}}\sum_{i=1}^{N}\big[4\sigma_\lambda^2\sigma_{\varepsilon i}^2+V_i\big]^{-1}\Big[\frac{1}{T^{1/2}}\sum_{t=1}^{T}(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2-\sigma_{\varepsilon i}^2)\Big]^2+O_p(N^{-1/2}).
$$
So
$$
2^{-1/2}N^{-1/2}\sum_{i=1}^{N}\Big\{\big[4\widehat\sigma_\lambda^2\widehat\sigma_{\varepsilon i}^2+\widehat V_i\big]^{-1}T\Big(\frac{1}{T}\sum_{t=1}^{T}e_{it}^2-\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}e_{it}^2\Big)^2-1\Big\}
$$
$$
=2^{-1/2}N^{-1/2}\sum_{i=1}^{N}\Big\{\big[4\sigma_\lambda^2\sigma_\varepsilon^2+\sigma_\varepsilon^4(E\upsilon_{it}^4-1)\big]^{-1}\Big[\frac{1}{T^{1/2}}\sum_{t=1}^{T}(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2-\sigma_\varepsilon^2)\Big]^2-1\Big\}+O_p(N^{-1/2})+O_p\Big(\frac{N}{T}\Big).
$$
Given $(N,T\to\infty)$ with $N/T\to 0$, it remains to establish that, under the null,
$$
2^{-1/2}N^{-1/2}\sum_{i=1}^{N}\Big\{\big[4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4(E(\upsilon_{it}^4)-1)\big]^{-1}\Big[\frac{1}{T^{1/2}}\sum_{t=1}^{T}(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2-\sigma_{\varepsilon i}^2)\Big]^2-1\Big\}
\Longrightarrow N(0,1)\quad\text{as } (N,T\to\infty).
$$
By Lemma 6(a) and the continuous mapping theorem, we see that, when $T\to\infty$,
$$
\big[4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4(E(\upsilon_{it}^4)-1)\big]^{-1}\Big[\frac{1}{T^{1/2}}\sum_{t=1}^{T}(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2-\sigma_{\varepsilon i}^2)\Big]^2\Longrightarrow \chi_{1,i}^2,
$$
where the $\chi_{1,i}^2$ are i.i.d. chi-square random variables with 1 degree of freedom. By applying the classical central limit theorem, we obtain
$$
2^{-1/2}N^{-1/2}\sum_{i=1}^{N}\Big\{\big[4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4(E(\upsilon_{it}^4)-1)\big]^{-1}\Big[\frac{1}{T^{1/2}}\sum_{t=1}^{T}(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2-\sigma_{\varepsilon i}^2)\Big]^2-1\Big\}
\Longrightarrow N(0,1)\quad\text{as } (N,T\to\infty)_{seq}.
$$
The above limit distribution continues to hold under the joint limit $(N,T\to\infty)$, provided that conditions (i)-(iv) of Lemma 2 are satisfied. Conditions (i) and (ii) can be established in a similar way as in Lemma 6(b). Condition (iv) is trivial to verify because the variable $(\chi_{1,i}^2-1)$ is uniformly integrable. It remains to examine condition (iii). Since $2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2-\sigma_{\varepsilon i}^2=2\sigma_{\varepsilon i}\lambda_t\upsilon_{it}+\sigma_{\varepsilon i}^2(\upsilon_{it}^2-1)$, we have the identity
$$
\Big[\frac{1}{T^{1/2}}\sum_{t=1}^{T}(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2-\sigma_{\varepsilon i}^2)\Big]^2-\Psi_i=\big(4\sigma_{\varepsilon i}^2,\ \sigma_{\varepsilon i}^4,\ 4\sigma_{\varepsilon i}^3\big)Z_{iT},
$$
so that
$$
\limsup_{N,T}\frac{1}{N}\sum_{i=1}^{N}E\Big\{\Big|\Psi_i^{-1}\Big[\frac{1}{T^{1/2}}\sum_{t=1}^{T}(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2-\sigma_{\varepsilon i}^2)\Big]^2-1\Big|
\,1\Big\{\Big|\Psi_i^{-1}\Big[\frac{1}{T^{1/2}}\sum_{t=1}^{T}(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2-\sigma_{\varepsilon i}^2)\Big]^2-1\Big|>N\varepsilon\Big\}\Big\}
$$
$$
\le \big[\inf_i\Psi_i\big]^{-1}\,\limsup_{N,T}\frac{1}{N}\sum_{i=1}^{N}E\Big\{\big|\big(4\sigma_{\varepsilon i}^2,\ \sigma_{\varepsilon i}^4,\ 4\sigma_{\varepsilon i}^3\big)Z_{iT}\big|
\,1\Big\{\big|\big(4\sigma_{\varepsilon i}^2,\ \sigma_{\varepsilon i}^4,\ 4\sigma_{\varepsilon i}^3\big)Z_{iT}\big|>N\varepsilon\,\Psi_i\Big\}\Big\},
$$
where
$$
Z_{iT}=\begin{pmatrix}\big(\frac{1}{T^{1/2}}\sum_{t=1}^{T}\lambda_t\upsilon_{it}\big)^2-\sigma_\lambda^2\\[4pt]
\big(\frac{1}{T^{1/2}}\sum_{t=1}^{T}(\upsilon_{it}^2-1)\big)^2-[E(\upsilon_{it}^4)-1]\\[4pt]
\big(\frac{1}{T^{1/2}}\sum_{t=1}^{T}\lambda_t\upsilon_{it}\big)\big(\frac{1}{T^{1/2}}\sum_{t=1}^{T}(\upsilon_{it}^2-1)\big)\end{pmatrix}.
$$
Consider the indicator function first.
Note that, by the Cauchy-Schwarz inequality,
$$
1\Big\{\big|\big(4\sigma_{\varepsilon i}^2,\ \sigma_{\varepsilon i}^4,\ 4\sigma_{\varepsilon i}^3\big)Z_{iT}\big|>N\varepsilon\,\Psi_i\Big\}
\le 1\Big\{\|Z_{iT}\|>\frac{N\varepsilon\big\{4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4[E(\upsilon_{it}^4)-1]\big\}}{\sup_i\big\|\big(4\sigma_{\varepsilon i}^2,\ \sigma_{\varepsilon i}^4,\ 4\sigma_{\varepsilon i}^3\big)\big\|}\Big\}. \tag{A3}
$$
Next, using (A3), the expression above is bounded above by
$$
\big[\inf_i\Psi_i\big]^{-1}\,\sup_i\big\|\big(4\sigma_{\varepsilon i}^2,\ \sigma_{\varepsilon i}^4,\ 4\sigma_{\varepsilon i}^3\big)\big\|\,
\sup_T E\Big\{\|Z_{iT}\|\,1\Big\{\|Z_{iT}\|>\frac{N\varepsilon\big\{4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4[E(\upsilon_{it}^4)-1]\big\}}{\sup_i\big\|\big(4\sigma_{\varepsilon i}^2,\ \sigma_{\varepsilon i}^4,\ 4\sigma_{\varepsilon i}^3\big)\big\|}\Big\}\Big\}. \tag{A4}
$$
By Assumption 2,
$$
\frac{N\varepsilon\big\{4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4[E(\upsilon_{it}^4)-1]\big\}}{\sup_i\big\|\big(4\sigma_{\varepsilon i}^2,\ \sigma_{\varepsilon i}^4,\ 4\sigma_{\varepsilon i}^3\big)\big\|}\to\infty\quad\text{as } N\to\infty.
$$
Then expression (A4) $\to 0$ as $(T,N\to\infty)$, since $\|Z_{iT}\|$ is uniformly integrable in $T$, which follows from
$$
\sup_T E\|Z_{iT}\|
\le \sup_T\Big\{E\Big(\frac{1}{T^{1/2}}\sum_{t=1}^{T}\lambda_t\upsilon_{it}\Big)^2+\sigma_\lambda^2
+E\Big(\frac{1}{T^{1/2}}\sum_{t=1}^{T}(\upsilon_{it}^2-1)\Big)^2+[E(\upsilon_{it}^4)-1]
+\Big[E\Big(\frac{1}{T^{1/2}}\sum_{t=1}^{T}\lambda_t\upsilon_{it}\Big)^2\Big]^{1/2}\Big[E\Big(\frac{1}{T^{1/2}}\sum_{t=1}^{T}(\upsilon_{it}^2-1)\Big)^2\Big]^{1/2}\Big\}<\infty,
$$
using the triangle and Cauchy-Schwarz inequalities together with Lemma 3(c), 3(d) and 3(e).

References

Aiolfi, M. and A. Timmermann (2006). Persistence in forecasting performance and conditional combination strategies. Journal of Econometrics 135, 31-53.

Baltagi, B.H. (1988). An alternative heteroscedastic error component model. Econometric Theory 4, 349-350.

Baltagi, B.H., G. Bresson and A. Pirotte (2006). Joint LM test for heteroskedasticity in a one-way error component model. Journal of Econometrics 134, 401-417.

Bates, J.M. and C.W.J. Granger (1969). The combination of forecasts. Operations Research Quarterly 20, 451-468.

Billingsley, P. (1999). Convergence of Probability Measures. John Wiley & Sons, New York.

Boero, G., J. Smith and K.F. Wallis (2008). Uncertainty and disagreement in economic prediction: the Bank of England Survey of External Forecasters. Economic Journal 118, 1107-1127.

Bomberger, W.A. (1996).
Disagreement as a measure of uncertainty. Journal of Money, Credit and Banking 28, 381-392.

Breusch, T.S. and A.R. Pagan (1979). A simple test for heteroscedasticity and random coefficient variation. Econometrica 47, 1287-1294.

Buckland, S.T., K.P. Burnham and N.H. Augustin (1997). Model selection: an integral part of inference. Biometrics 53, 603-618.

Capistrán, C. and A. Timmermann (2009). Forecast combination with entry and exit of experts. Journal of Business and Economic Statistics 27, 429-440.

Davidson, J. (1994). Stochastic Limit Theory. Oxford University Press, Oxford.

Davies, A. and K. Lahiri (1995). A new framework for analyzing survey forecasts using three-dimensional panel data. Journal of Econometrics 68, 205-227.

Diebold, F.X., A.S. Tay and K.F. Wallis (1999). Evaluating density forecasts of inflation: the Survey of Professional Forecasters. In Engle, R.F. and White, H. (eds.), Cointegration, Causality, and Forecasting: A Festschrift in Honour of Clive W.J. Granger. Oxford University Press, Oxford.

Draper, D. (1995). Assessment and propagation of model uncertainty. Journal of the Royal Statistical Society B 57, 45-97.

Elliott, G. (2011a). Forecast combination when outcomes are difficult to predict. UCSD Department of Economics Working Paper, May 10, 2011.

Elliott, G. (2011b). Averaging and the optimal combination of forecasts. UCSD Department of Economics Working Paper, September 27, 2011.

Elliott, G. and A. Timmermann (2004). Optimal forecast combination under general loss functions and forecast error distributions. Journal of Econometrics 122, 47-79.

Elliott, G. and A. Timmermann (2005). Optimal forecast combination under regime switching. International Economic Review 46, 1081-1102.

Engelberg, J., C.F. Manski and J. Williams (2011). Assessing the temporal variation of macroeconomic forecasts by a panel of changing composition. Journal of Applied Econometrics 26, 1059-1078.

Engle, R.F. (1982).
Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987-1008.

Engle, R.F. (1983). Estimates of the variance of U.S. inflation based upon the ARCH model. Journal of Money, Credit and Banking 15, 286-301.

Genre, V., G. Kenny, A. Meyler and A. Timmermann (2013). Combining expert forecasts: can anything beat the simple average? International Journal of Forecasting 29, 108-121.

Geweke, J. and C. Whiteman (2006). Bayesian forecasting. In Elliott, G., Granger, C.W.J. and Timmermann, A. (eds.), Handbook of Economic Forecasting. Elsevier, 4-80.

Giordani, P. and P. Söderlind (2003). Inflation forecast uncertainty. European Economic Review 47, 1037-1059.

Granger, C.W.J. and Y. Jeon (2004). Thick modeling. Economic Modelling 21, 323-343.

Hall, P. and C.C. Heyde (1980). Martingale Limit Theory and its Application. Academic Press, New York.

Hastie, T., R. Tibshirani and J. Friedman (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2nd edition.

Hendry, D.F. and M.P. Clements (2004). Pooling of forecasts. Econometrics Journal 7, 1-31.

Hibon, M. and T. Evgeniou (2005). To combine or not to combine: selecting among forecasts and their combinations. International Journal of Forecasting 21, 15-24.

Hoeting, J.A., D. Madigan, A.E. Raftery and C.T. Volinsky (1999). Bayesian model averaging: a tutorial (with discussion). Statistical Science 14, 382-417.

Holly, A. and L. Gardiol (2000). A score test for individual heteroscedasticity in a one-way error component model. In Krishnakumar, J. and E. Ronchetti (eds.), Panel Data Econometrics: Future Directions. North-Holland, Amsterdam.

Hsiao, C. and M.H. Pesaran (2008). Random coefficient panel data models. In Matyas, L. and P. Sevestre (eds.), The Econometrics of Panel Data: Fundamentals and Recent Developments in Theory and Practice. Springer, 3rd edition.

Hsiao, C. and S.K.
Wan (2011). Comparison of forecasting methods with an application to predicting excess equity premium. Mathematics and Computers in Simulation 81(7), 1235-1246.

Issler, J.V. and L.R. Lima (2009). A panel data approach to economic forecasting: the bias-corrected average forecast. Journal of Econometrics 152, 153-164.

Kleiber, W., A.E. Raftery and T. Gneiting (2011). Geostatistical model averaging for locally calibrated probabilistic quantitative precipitation forecasting. Journal of the American Statistical Association 106(496), 1291-1303.

Koop, G. and S. Potter (2004). Forecasting in dynamic factor models using Bayesian model averaging. Econometrics Journal 7, 550-565.

Lahiri, K. and F. Liu (2006). Modeling multi-period inflation uncertainty using a panel of density forecasts. Journal of Applied Econometrics 21, 1199-1219.

Lahiri, K. and X. Sheng (2008). Evolution of forecast disagreement in a Bayesian learning model. Journal of Econometrics 144, 325-340.

Lahiri, K. and X. Sheng (2010). Measuring forecast uncertainty by disagreement: the missing link. Journal of Applied Econometrics 25, 514-538.

Lahiri, K., C. Teigland and M. Zaporowski (1988). Interest rates and the subjective probability distribution of inflation forecasts. Journal of Money, Credit and Banking 20, 233-248.

Magnus, J.R. (1982). Multivariate error components analysis of linear and nonlinear regression models by maximum likelihood. Journal of Econometrics 19, 239-285.

Makridakis, S., R.M. Hogarth and A. Gaba (2009). Forecasting and uncertainty in the economic and business world. International Journal of Forecasting 25, 794-812.

Manski, C.F. (2011). Interpreting and combining heterogeneous survey forecasts. In Clements, M.P. and Hendry, D.F. (eds.), Oxford Handbook of Economic Forecasting, 457-472. Oxford University Press.

Mazodier, P. and A. Trognon (1978). Heteroscedasticity and stratification in error components models. Annales de l'INSEE 30-31, 451-482.

McNees, S.K. (1992).
The uses and abuses of "consensus" forecasts. Journal of Forecasting 11, 703-710.

Mishkin, F.S. (2007). Inflation dynamics. Speech delivered at the Annual Macro Conference, Federal Reserve Bank of San Francisco, California (March 23).

Palm, F.C. and A. Zellner (1992). To combine or not to combine? Issues of combining forecasts. Journal of Forecasting 11, 687-701.

Pesaran, M.H. and T. Yamagata (2008). Testing slope homogeneity in large panels. Journal of Econometrics 142, 50-93.

Phillips, P.C.B. and H.R. Moon (1999). Linear regression limit theory for nonstationary panel data. Econometrica 67, 1057-1111.

Rao, P.S.R.S., J. Kaplan and W.G. Cochran (1981). Estimators for the one-way random effects model with unequal error variances. Journal of the American Statistical Association 76, 89-97.

Reifschneider, D. and P. Tulip (2007). Gauging the uncertainty of the economic outlook from historical forecasting errors. Federal Reserve Board, Finance and Economics Discussion Series, No. 60.

Sancetta, A. (2010). Recursive forecast combination for dependent heterogeneous data. Econometric Theory 26, 598-631.

Smith, J. and K.F. Wallis (2009). A simple explanation of the forecast combination puzzle. Oxford Bulletin of Economics and Statistics 71, 331-355.

Stock, J.H. and M.W. Watson (2007). Why has U.S. inflation become harder to forecast? Journal of Money, Credit and Banking 39, 3-33.

Swamy, P.A.V.B. (1970). Efficient inference in a random coefficient regression model. Econometrica 38, 311-323.

Tetlow, R.J. (2006). Real-time model uncertainty in the United States: "robust" policies put to the test. Federal Reserve Board, mimeo.

Timmermann, A. (2006). Forecast combinations. In Elliott, G., C.W.J. Granger and A. Timmermann (eds.), Handbook of Economic Forecasting. Elsevier, 135-196.

Tulip, P. and S. Wallace (2012). Estimates of uncertainty around the RBA's forecasts. Economic Research Department, Reserve Bank of Australia, Discussion Paper 2012-07.

Verbon, H.A.A. (1980).
Testing for heteroscedasticity in a model of seemingly unrelated regression equations with variance components (SUREVC). Economics Letters 5, 149-153.

Wallis, K.F. (2005). Combining density and interval forecasts: a modest proposal. Oxford Bulletin of Economics and Statistics 67, 983-994.

Wansbeek, T.J. (1989). An alternative heteroscedastic error components model. Econometric Theory 5, 326.

Wei, X. and Y. Yang (2012). Robust forecast combinations. Journal of Econometrics 166, 224-236.

Wilson, E.B. and M.M. Hilferty (1931). The distribution of chi-square. Proceedings of the National Academy of Sciences 17, 684-688.

Winkler, R.L. (1989). Combining forecasts: a philosophical basis and some current issues. International Journal of Forecasting 5, 605-609.

Woodford, M. (2005). Central bank communication and policy effectiveness. Proceedings, Federal Reserve Bank of Kansas City, August, 399-474.

Yang, Y. (2004). Combining forecasting procedures: some theoretical results. Econometric Theory 20, 176-222.

Zarnowitz, V. (1984). The accuracy of individual and group forecasts from business outlook surveys. Journal of Forecasting 3, 11-26.

Zarnowitz, V. and L.A. Lambros (1987). Consensus and uncertainty in economic prediction. Journal of Political Economy 95, 591-621.

TABLES

Table 1: Size of Z test

Parameter settings: (1) $\sigma_\omega^2=0.05$, $\sigma_\varepsilon^2=0.01$; (2) $\sigma_\omega^2=0.1$, $\sigma_\varepsilon^2=0.01$; (3) $\sigma_\omega^2=0.2$, $\sigma_\varepsilon^2=0.2$; (4) $\sigma_\omega^2=0.6$, $\sigma_\varepsilon^2=1.5$; (5) $\sigma_\omega^2=3$, $\sigma_\varepsilon^2=1$.

                          (1)                   (2)                   (3)                   (4)                   (5)
                 N=20  N=40  N=100     N=20  N=40  N=100     N=20  N=40  N=100     N=20  N=40  N=100     N=20  N=40  N=100
rho=0    T=30   0.044 0.039 0.050     0.047 0.039 0.047     0.051 0.044 0.051     0.057 0.063 0.076     0.041 0.044 0.040
         T=60   0.039 0.050 0.054     0.045 0.051 0.049     0.046 0.058 0.043     0.046 0.053 0.054     0.047 0.044 0.049
         T=120  0.049 0.041 0.051     0.048 0.044 0.047     0.053 0.046 0.050     0.054 0.052 0.051     0.051 0.053 0.045
rho=0.4  T=30   0.044 0.042 0.045     0.042 0.045 0.041     0.050 0.052 0.055     0.059 0.058 0.068     0.048 0.044 0.046
         T=60   0.047 0.041 0.049     0.044 0.040 0.049     0.049 0.047 0.055     0.054 0.055 0.048     0.045 0.043 0.050
         T=120  0.049 0.052 0.043     0.045 0.056 0.051     0.055 0.055 0.043     0.040 0.052 0.053     0.047 0.044 0.043

Note: Rejection rates of the Z test under $H_0: \sigma_{\varepsilon i}^2=\sigma_\varepsilon^2$ for almost all $i$, at the 5% nominal level, based on 2000 replications. The model is $e_{it}=\lambda_t+\varepsilon_{it}$, where $\lambda_t=\rho\lambda_{t-1}+\omega_t$, $\lambda_0\sim U\big(-\sigma_\omega\sqrt{\tfrac{3}{1-\rho^2}},\ \sigma_\omega\sqrt{\tfrac{3}{1-\rho^2}}\big)$, $\omega_t\sim$ i.i.d. $U(-\sqrt{3}\sigma_\omega,\sqrt{3}\sigma_\omega)$ for all $t$, and $\varepsilon_{it}\sim$ i.i.d. $U(-\sqrt{3}\sigma_\varepsilon,\sqrt{3}\sigma_\varepsilon)$ for all $(i,t)$.

Table 2: Power of Z test

Panel A
                          (1)                   (2)                   (3)                   (4)                   (5)
                 N=20  N=40  N=100     N=20  N=40  N=100     N=20  N=40  N=100     N=20  N=40  N=100     N=20  N=40  N=100
rho=0    T=30   0.047 0.175 0.125     0.050 0.096 0.075     0.087 0.881 0.808     0.153 0.999 0.998     0.066 0.344 0.665
         T=60   0.071 0.403 0.282     0.051 0.162 0.125     0.141 0.999 0.992     0.281 1.000 1.000     0.080 0.703 0.971
         T=120  0.088 0.798 0.694     0.065 0.424 0.296     0.298 1.000 1.000     0.596 1.000 1.000     0.121 0.982 1.000
rho=0.4  T=30   0.055 0.147 0.105     0.046 0.071 0.066     0.089 0.805 0.717     0.149 0.996 0.990     0.057 0.294 0.565
         T=60   0.052 0.345 0.232     0.045 0.143 0.119     0.131 0.990 0.980     0.255 1.000 1.000     0.065 0.613 0.927
         T=120  0.086 0.716 0.590     0.056 0.343 0.245     0.248 1.000 1.000     0.539 1.000 1.000     0.102 0.948 1.000

Panel B
                          (1)                   (2)                   (3)                   (4)                   (5)
                 N=20  N=40  N=100     N=20  N=40  N=100     N=20  N=40  N=100     N=20  N=40  N=100     N=20  N=40  N=100
rho=0    T=30   0.079 0.033 0.373     0.056 0.136 0.158     0.356 0.991 0.999     0.710 1.000 1.000     0.133 0.550 0.243
         T=60   0.149 0.658 0.782     0.088 0.318 0.347     0.661 1.000 1.000     0.962 1.000 1.000     0.241 0.921 0.550
         T=120  0.315 0.968 0.993     0.149 0.661 0.776     0.962 1.000 1.000     1.000 1.000 1.000     0.507 0.999 0.947
rho=0.4  T=30   0.078 0.245 0.304     0.061 0.130 0.128     0.307 0.954 0.987     0.643 1.000 1.000     0.113 0.466 0.193
         T=60   0.128 0.563 0.671     0.079 0.253 0.274     0.593 1.000 1.000     0.930 1.000 1.000     0.202 0.837 0.474
         T=120  0.250 0.928 0.969     0.135 0.552 0.663     0.924 1.000 1.000     1.000 1.000 1.000     0.437 0.995 0.877

Note: Rejection rates of the Z test at the 5% nominal level, based on 2000 replications; parameter settings (1)-(5) as in Table 1. The model is $e_{it}=\lambda_t+\varepsilon_{it}$, where $\lambda_t=\rho\lambda_{t-1}+\omega_t$, $\lambda_0\sim U\big(-\sigma_\omega\sqrt{\tfrac{3}{1-\rho^2}},\ \sigma_\omega\sqrt{\tfrac{3}{1-\rho^2}}\big)$, $\omega_t\sim$ i.i.d. $U(-\sqrt{3}\sigma_\omega,\sqrt{3}\sigma_\omega)$ for all $t$, and $\varepsilon_{it}\sim$ i.i.d. $U(-\sqrt{3}\sigma_{\varepsilon i},\sqrt{3}\sigma_{\varepsilon i})$ for all $(i,t)$. In Panel A, $\sigma_{\varepsilon i}^2=\sigma_\varepsilon^2$ for 80% of the samples and $\sigma_{\varepsilon i}^2\neq\sigma_\varepsilon^2$ for the remaining 20%, where $\sigma_{\varepsilon i}^2\sim U(0.1\sigma_\omega^2, 1.9\sigma_\omega^2)$. In Panel B, $\sigma_{\varepsilon i}^2=\sigma_\varepsilon^2$ for 60% of the samples and $\sigma_{\varepsilon i}^2\neq\sigma_\varepsilon^2$ for the remaining 40%, where $\sigma_{\varepsilon i}^2\sim U(0.1\sigma_\omega^2, 1.9\sigma_\omega^2)$.

Table 3: Measures of aggregate uncertainty in inflation forecasts

Horizon    $RMSE_{AF}$  $RMSE_{RT}$  $RMSE_{LS}$  $\sigma_\lambda^2$  $\sigma^2$      Z
1Q ahead      0.19         0.18         0.20           0.04            0.00       22.37*
2Q ahead      0.23         0.26         0.28           0.05            0.02       11.95*
3Q ahead      0.29         0.38         0.45           0.08            0.12        7.36*
4Q ahead      0.45         0.58         0.67           0.20            0.25       11.29*
5Q ahead      0.65         0.81         0.91           0.41            0.41        9.80*
6Q ahead      0.80         0.92         1.02           0.63            0.41        8.12*
7Q ahead      0.85         1.02         1.15           0.71            0.61        5.89*
8Q ahead      0.88         1.09         1.30           0.74            0.94        8.23*

Note: (a) The inflation rate is measured as the annual-average over annual-average percent change of the GDP deflator. The actual inflation rate for 1982-2011 is taken from the first quarterly release of the Federal Reserve Bank of Philadelphia "real-time" data set; (b) The inflation forecasts used in this study are taken from the Survey of Professional Forecasters from 1981:Q3 until 2011:Q4; (c) * indicates significance at the 1% level.

Table 4: Measures of aggregate uncertainty in output growth forecasts

Horizon    $RMSE_{AF}$  $RMSE_{RT}$  $RMSE_{LS}$  $\sigma_\lambda^2$  $\sigma^2$      Z
1Q ahead      0.29         0.25         0.30           0.09            0.00       23.82*
2Q ahead      0.37         0.39         0.41           0.14            0.04        5.41*
3Q ahead      0.52         0.59         0.62           0.27            0.12        5.40*
4Q ahead      0.88         0.97         1.03           0.77            0.31        7.30*
5Q ahead      1.20         1.29         1.36           1.43            0.61        4.06*
6Q ahead      1.59         1.63         1.77           2.51            1.00       14.25*
7Q ahead      1.54         1.60         1.69           2.35            0.63        9.97*
8Q ahead      1.67         1.73         1.89           2.77            0.97        9.02*

Note: (a) Output growth is measured as the annual-average over annual-average percent change of real GDP. The actual output growth rate for 1982-2011 is taken from the first quarterly release of the Federal Reserve Bank of Philadelphia "real-time" data set; (b) The output growth forecasts used in this study are taken from the Survey of Professional Forecasters from 1981:Q3 until 2011:Q4; (c) * indicates significance at the 1% level.

Table 5: Measures of aggregate uncertainty from a pseudo-balanced panel of inflation forecasts

Horizon    $RMSE_{AF}$  $RMSE_{RT}$  $RMSE_{LS}$
1Q ahead      0.18         0.19         0.19
2Q ahead      0.23         0.27         0.29
3Q ahead      0.30         0.43         0.48
4Q ahead      0.43         0.65         0.72
5Q ahead      0.63         0.90         0.99
6Q ahead      0.78         0.98         1.04
7Q ahead      0.83         1.10         1.20
8Q ahead      0.85         1.28         1.39

Note: (a) The inflation rate is measured as the annual-average over annual-average percent change of the GDP deflator. The actual inflation rate for 1982-2011 is taken from the first quarterly release of the Federal Reserve Bank of Philadelphia "real-time" data set; (b) The inflation forecasts used in this study are taken from the Survey of Professional Forecasters from 1981:Q3 until 2011:Q4; (c) The pseudo-balanced panel includes 21 forecasters, with each of them representing the minimum, 5th percentile, 10th percentile, ..., 95th percentile and the maximum of individual forecasts, respectively.

Table 6: Measures of aggregate uncertainty from a pseudo-balanced panel of output growth forecasts

Horizon    $RMSE_{AF}$  $RMSE_{RT}$  $RMSE_{LS}$
1Q ahead      0.26         0.28         0.28
2Q ahead      0.36         0.41         0.42
3Q ahead      0.54         0.64         0.66
4Q ahead      0.92         1.08         1.11
5Q ahead      1.18         1.36         1.39
6Q ahead      1.59         1.79         1.82
7Q ahead      1.53         1.72         1.74
8Q ahead      1.61         1.85         1.93

Note: (a) Output growth is measured as the annual-average over annual-average percent change of real GDP. The actual output growth rate for 1982-2011 is taken from the first quarterly release of the Federal Reserve Bank of Philadelphia "real-time" data set; (b) The output growth forecasts used in this study are taken from the Survey of Professional Forecasters from 1981:Q3 until 2011:Q4; (c) The pseudo-balanced panel includes 21 forecasters, with each of them representing the minimum, 5th percentile, 10th percentile, ..., 95th percentile and the maximum of individual forecasts, respectively.

Table 7: Historical GDP forecast uncertainty due to the variance of

Horizon    Aggregate shocks    Idiosyncratic errors    Systematic individual biases
1Q ahead        0.95                  0.04                       0.01
2Q ahead        0.80                  0.16                       0.04
3Q ahead        0.72                  0.24                       0.04
4Q ahead        0.73                  0.20                       0.07
5Q ahead        0.78                  0.18                       0.05
6Q ahead        0.81                  0.15                       0.04
7Q ahead        0.82                  0.14                       0.04
8Q ahead        0.78                  0.18                       0.03

Table 8: Historical inflation forecast uncertainty due to the variance of

Horizon    Aggregate shocks    Idiosyncratic errors    Systematic individual biases
1Q ahead        0.90                  0.08                       0.02
2Q ahead        0.70                  0.26                       0.04
3Q ahead        0.42                  0.53                       0.05
4Q ahead        0.45                  0.47                       0.08
5Q ahead        0.52                  0.41                       0.07
6Q ahead        0.62                  0.32                       0.06
7Q ahead        0.55                  0.39                       0.05
8Q ahead        0.46                  0.50                       0.04
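The Monte Carlo design described in the notes to Tables 1-2 can be sketched in code. The following is a minimal, illustrative implementation under the null hypothesis, with two simplifications flagged here and in the comments: the variance components in $\Psi_i$ are plugged in at their population values rather than estimated, and the replication count is reduced from 2000, so the rejection rate is only indicative of the tabulated size:

```python
import numpy as np

# Sketch of the Tables 1-2 Monte Carlo design: e_it = lambda_t + eps_it,
# lambda_t = rho*lambda_{t-1} + omega_t with uniform innovations, and the
# Z test of H0: sigma_eps_i^2 = sigma_eps^2 for all i.  Simplifications:
# population variance components are plugged in (the paper uses estimators),
# and the replication count is reduced, so results are only indicative.
rng = np.random.default_rng(1)

def z_stat(e, sigma_lam2, sigma_eps2, Eu4):
    """Z statistic with population plug-ins (a simplification of Theorem 1)."""
    T, N = e.shape
    psi = 4 * sigma_lam2 * sigma_eps2 + sigma_eps2**2 * (Eu4 - 1.0)
    s2_i = (e**2).mean(axis=0)               # (1/T) sum_t e_it^2, per forecaster
    dev = s2_i - s2_i.mean()                 # deviation from the grand mean
    return (T * dev**2 / psi - 1.0).sum() / np.sqrt(2.0 * N)

def simulate_panel(T, N, rho, sigma_om, sigma_eps, rng):
    a = np.sqrt(3.0) * sigma_om
    lam = np.empty(T)
    lam[0] = rng.uniform(-a, a) / np.sqrt(1.0 - rho**2)   # stationary start
    for t in range(1, T):
        lam[t] = rho * lam[t - 1] + rng.uniform(-a, a)
    b = np.sqrt(3.0) * sigma_eps
    eps = rng.uniform(-b, b, size=(T, N))    # homoskedastic: H0 holds
    return lam[:, None] + eps

T, N, rho = 120, 40, 0.0
sigma_om2, sigma_eps2 = 0.2, 0.2             # setting (3) of Table 1
sigma_lam2 = sigma_om2 / (1.0 - rho**2)
Eu4 = 9.0 / 5.0                              # 4th moment of U(-sqrt(3), sqrt(3))

reps = 300                                   # reduced from 2000 for speed
rej = 0
for _ in range(reps):
    e = simulate_panel(T, N, rho, np.sqrt(sigma_om2), np.sqrt(sigma_eps2), rng)
    rej += z_stat(e, sigma_lam2, sigma_eps2, Eu4) > 1.645   # one-sided 5% test
rate = rej / reps
print(rate)
```

Because the cross-sectional demeaning removes the common component $\frac{1}{T}\sum_t\lambda_t^2$ exactly, the empirical rejection rate should land in the neighborhood of the 5% nominal level, mirroring the size results in Table 1.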