Measuring Uncertainty of a Combined Forecast and a
New Test for Forecaster Heterogeneity
Kajal Lahiri, Huaming Peng
Department of Economics
University at Albany, SUNY
and
Xuguang Sheng
Department of Economics
American University
Abstract
Using a standard factor decomposition of a panel of forecasts, we show that the forecast uncertainty from the standpoint of a policy maker can be expressed as the disagreement among forecasters plus the perceived variability of common aggregate shocks. Thus, the uncertainty of the average forecast is not the variance of the average forecast but rather the average of the variances of the individual forecasts, where the combined forecast is obtained by minimizing the risk averaged over all possible forecasts rather than the risk of the combined forecast. Using a new statistic, developed in this paper to test the heterogeneity of idiosyncratic errors under joint limits with both T and N approaching infinity simultaneously, we show that previously used uncertainty measures significantly underestimate the correct benchmark forecast uncertainty.
JEL classification: C12; C33; E37
Keywords: Forecast Uncertainty, Forecast Disagreement, Forecast Combination, Panel Data, Survey of Professional Forecasters
1 Introduction
Consider the problem of a macro policy maker who often has to aggregate a number of competing forecasts from experts for the purpose of uniform policymaking. A general solution was provided by Bates and Granger (1969), who inspired extensive research on forecast combination, as evidenced by the surveys of Clemen (1989) and Timmermann (2006) and by many additional papers since 2006.1 The solution based on minimizing the mean square error
of the combined forecasts calls for a performance-based weighted average of
individual forecasts with precision of the combined forecast that is readily
shown to be better than any of the constituent elements under reasonable
conditions. Thus, Wei and Yang (2011) characterize this approach as "combination for improvement". However, a surprising result found by many is that a simple average is often as good as the Granger-Bates estimator, cf. Genre et al. (2013) for a recent case study. Under the assumption that all the
cross correlations of forecast errors are due to a common aggregate shock,
the precision of this equally-weighted average is simply a function of the
variance of this common shock, netting out all idiosyncratic errors. Palm and Zellner (1992) provide an authoritative summary of developments in this area using alternative approaches until the early 1990s. One characteristic
of this precision formula for the average (or what is often called consensus)
forecast is that it does not reflect disagreement among experts as part of
forecast uncertainty, which may be undesirable in many situations. As Winkler (1989, p.607) and Timmermann (2006, p.141) have noted, heightened
discord among forecasters ceteris paribus may be indicative of higher uncertainty in the combined forecast from the standpoint of a policy-maker.
In many contexts, disagreement has been used as a sole proxy for forecast
uncertainty.
It is important to note that the performance of the consensus forecast relative to individual forecasts follows trivially from Jensen’s inequality, which
states that with convex loss functions, the loss associated with the mean
forecast is generally less than the mean loss of individual forecasts, see McNees (1992) and Manski (2011). Thus, the consensus forecast is not more
accurate than every individual forecast in the pool; rather it outperforms an
individual forecast drawn randomly from the available forecasts.2 Moreover,
1
Granger and Jeon (2004) call this approach of making inference based on combined outputs from alternative models "thick modeling".
2
Roughly one-third of individual forecasts have been found to be more accurate than a consensus, but the mix of superior forecasts changes depending on the occasion; see Zarnowitz (1984), McNees (1992) and Hibon and Evgeniou (2005).
as we elaborate later, the wedge between the two is determined precisely by the variance of the cross-sectional distribution of forecasts. Thus, should disagreement be a part of forecast uncertainty, the presumed superiority of the consensus forecast over the typical forecaster will be lost, and the primary advantage of averaging will be an insurance value against idiosyncratic errors that may be generated by heterogeneous learning in periods of structural instability, breaks and bad judgments, cf. Hendry and Clements (2002). The procedure will essentially rule out forecasts that are, in a particular period, inferior to that of the typical forecaster on average. The objective of the combination approach will then be less ambitious, and will be similar to that in the machine learning literature, where forecasts are combined online to adapt to a few superior ones, see Wang (1994) and Sancetta (2012). We
show that a slight reformulation of the Bates-Granger loss function justifies taking the average of the individual forecast variances, rather than the variance of the average, as the correct measure for aggregate uncertainty. A
more fundamental justification for incorporating disagreement as part of aggregate uncertainty comes from the rich literature on model averaging using
both Bayesian and Frequentist approaches, see Draper (1995) and Buckland
et al. (1995) for an early explication of the result.
One motivation to explore the importance of using a theoretically sound
uncertainty measure of the consensus forecast comes from the recent advances in the presentation and communication strategies by a number of
central banks, pioneered by Bank of England’s fan charts to report forecast
uncertainty. For the credibility of forecasts in the long run, it is essential
that the reported confidence bands for forecasts be properly calibrated. In
the U.S., from November 2007, all FOMC members are required to provide
their judgments as to whether the uncertainty attached to their projections
is greater than, smaller than, or broadly similar to typical levels of forecast
uncertainty in the past. To help each FOMC member come up with such personal uncertainty estimates, Reifschneider and Tulip (2007)
have proposed a measure for gauging the average magnitude of historical
uncertainty using information on past forecast errors from a number of private and government forecasters. These benchmark estimates for a number
of target variables are reported in the minutes of each FOMC meeting and
are used by the public to interpret the responses of the FOMC participants.
Even though this measure incorporates the disagreement amongst forecasters
as a component of forecast uncertainty, we show that it underestimates the
true historical uncertainty if the individual forecast errors are heterogeneous.
Given that these historical benchmark numbers are fed into the highest level
of national decision making, a careful examination of the Reifschneider and
Tulip (RT) formula cannot be overemphasized.
In this paper we establish the asymptotic relationships between these alternative measures of uncertainty under the joint limits with both T and N
approaching infinity simultaneously, and develop a test to check if the uncertainty measures are statistically different and the forecasters are exchangeable. Using panel-data sequential asymptotics, Issler and Lima (2009) have
shown the optimality of the (bias-corrected) simple average forecast. Our
test for forecaster homogeneity differs from the Lagrange multiplier test for
heteroskedasticity developed by Baltagi et al. (2006) in that we do not assume that the idiosyncratic variance is a function of some known covariates.
A Monte Carlo study confirms that the test performs well in our context. We
also use individual inflation and output growth forecasts from the Survey of
Professional Forecasters (SPF) over 1982-2011 to show that the uncertainty measure conventionally attached to a consensus forecast and the RT benchmark measure underestimate the true uncertainty, often substantially, and that the underestimation increases steadily as the horizon lengthens from one to eight quarters. Our test confirms these results at the 1% level for all eight forecast horizons.
The plan of the paper is as follows. Section 2 derives the relationship between disagreement and overall forecast uncertainty. Section 3 compares different measures of average historical uncertainty and develops a new test for forecaster homogeneity. In Section 4, we reexamine the confidence bands reported by Granger and Jeon (2004) for the Taylor rule parameters based on the average of 24 alternative models. In Section 5, we use SPF data on real GDP and inflation to highlight the differences between the alternative uncertainty measures, and implement our test for forecaster homogeneity. Finally, Section 6 summarizes the results and presents some concluding remarks. Technical proofs of the Proposition and Theorem in Section 3 are presented in the Appendix.
2 Uncertainty and Disagreement
Let $Y_t$ be the random variable of interest and $F_{ith}$ be the forecast of $Y_t$ made by individual $i$ at time $t-h$. Then individual $i$'s forecast error, $e_{ith}$, can be defined as
$$e_{ith} = A_t - F_{ith}, \qquad (1)$$
where $A_t$ is the actual realization of $Y_t$. Following a long tradition, e.g., Palm and Zellner (1992), Davies and Lahiri (1995) and Issler and Lima (2009), we write $e_{ith}$ as the sum of an individual time-invariant bias, $\alpha_{ih}$, a common component, $\lambda_{th}$, and an idiosyncratic error, $\varepsilon_{ith}$:
$$e_{ith} = \alpha_{ih} + \lambda_{th} + \varepsilon_{ith}, \qquad (2)$$
$$\lambda_{th} = \sum_{j=1}^{h} u_{tj}. \qquad (3)$$
Without loss of generality, the deterministic individual bias $\alpha_{ih}$ is assumed to satisfy $\bar{\alpha}_{\cdot h} = \frac{1}{N}\sum_{i=1}^{N}\alpha_{ih} = 0$ for convenience of exposition. In addition, we assume $\sup_{i,h}\alpha_{ih} < \infty$ so that $\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\alpha_{ih}^{2}$ is well defined. The common component, $\lambda_{th}$, represents the cumulative effect of all shocks that occurred from $h$ quarters ahead to the end of target year $t$. Equation (3) shows that this accumulation is the sum of the shocks $u_{tj}$ that occurred over the forecast span for each quarter. Even if forecasters make "perfect" forecasts, the forecast error may still be nonzero due to shocks which are, by nature, unpredictable. Forecasters, however, do not make "perfect" forecasts even in the absence of unanticipated shocks. This "lack of perfection" is due to other factors (e.g., differences in information acquisition and processing, interpretation, judgment, and forecasting models) specific to a given individual at a given point in time and is represented by the idiosyncratic error, $\varepsilon_{ith}$.
In order to establish the relationship between different measures of uncertainty and derive their limiting distributions, we make the following simplifying assumptions:

Assumption 1 (Common Components)
(i) $u_{tj}$ is a strictly stationary and ergodic stochastic process over $t$ for all $j$, with $E(u_{tj}) = 0$, $E(u_{tj}^{2}) = \sigma_{uj}^{2}$, and $E|u_{tj}|^{4+\delta} < \infty$ for some $\delta > 0$. In addition, $u_{tj}$ is independent of $u_{ts}$ for any $j \neq s$.
(ii) $E(u_{tj}^{2}u_{t-k,j}^{2}) - \sigma_{uj}^{4} \to 0$ as $k \to \infty$ for all $t$ and $j$, and
$$\lim_{T\to\infty}\left\{\frac{2\sum_{k=1}^{T-1}\sum_{j=1}^{h}\left(1-\frac{k}{T}\right)\left[E(u_{tj}^{2}u_{t-k,j}^{2}) - \sigma_{uj}^{4}\right]}{\mathrm{var}\left[\left(\sum_{j=1}^{h}u_{tj}\right)^{2}\right]} + 1\right\} = c > 0.$$

Assumption 2 (Idiosyncratic Components)
$\varepsilon_{ith} = \sigma_{\varepsilon ih}\upsilon_{ith}$, where $\upsilon_{ith}$ is i.i.d. across $i$ and $t$, with $E(\upsilon_{ith}) = 0$, $E(\upsilon_{ith}^{2}) = 1$, and $E\upsilon_{ith}^{8} < \infty$; $0 < \inf_{i}\sigma_{\varepsilon ih}^{2} \le \sup_{i}\sigma_{\varepsilon ih}^{2} < \infty$.

Assumption 3 (Relations)
$u_{tj}$ is independent of $\upsilon_{ish}$ for all $i$, $j$, $s$, $t$ and $h$.
Remark 1. Assumption 1 implies that the aggregate shock $\lambda_{th}\,(=\sum_{j=1}^{h}u_{tj})$ is weakly dependent over time, and $\lambda_{th}^{2}$ is completely regular, that is, the correlation between $\lambda_{th}^{2}$ and $\lambda_{t-k,h}^{2}$ eventually dies out as the time lag $k$ goes to infinity. The temporal dependence condition is thus intermediate between uniform mixing and strong mixing (see, for example, Hall and Heyde (1980) for a detailed explanation). The positiveness condition in (ii) controls the speed at which the correlation diminishes, and is important for obtaining a central limit theorem for the dependent sequence $\lambda_{th}^{2}$.

Remark 2. In Assumption 2, cross-agent heterogeneity is allowed, but correlation in the time dimension is ruled out for analytical simplicity.

Remark 3. Assumption 3 is standard in the literature. It helps to identify the variance arising from aggregate shocks separately from that arising from idiosyncratic shocks.
Taken together, Assumptions 1-3 imply that the individual forecast error is a strictly stationary and ergodic process for any $h$ and has a factor model interpretation. Given a panel of forecasts, we can decompose the average squared individual forecast error as
$$\frac{1}{N}\sum_{i=1}^{N}e_{ith}^{2} = (A_t - \bar{F}_{\cdot th})^{2} + \frac{1}{N}\sum_{i=1}^{N}(F_{ith} - \bar{F}_{\cdot th})^{2}, \qquad (4)$$
where $\bar{F}_{\cdot th} = \frac{1}{N}\sum_{i=1}^{N}F_{ith}$. Now, taking the time average of both sides of (4), we get a measure of historical forecast uncertainty based on past errors:
$$\frac{1}{NT}\sum_{t=1}^{T}\sum_{i=1}^{N}e_{ith}^{2} = \frac{1}{T}\sum_{t=1}^{T}(A_t - \bar{F}_{\cdot th})^{2} + \frac{1}{NT}\sum_{t=1}^{T}\sum_{i=1}^{N}(F_{ith} - \bar{F}_{\cdot th})^{2}. \qquad (5)$$
This is our suggested measure of historical uncertainty of the average forecast, and is readily seen to be the average of the individual variances observed
over the sample period. This is also the measure suggested by Reifschneider and Tulip (2007). Equation (5) states that the measure can be decomposed into two components: uncertainty that is common to all forecasters
and uncertainty that arises from heterogeneity of individual forecasters. The
first component is the variance of the average that is conventionally taken as the uncertainty of the consensus forecast. The second component is
the disagreement among the forecasters. While analyzing individual density
forecasts reported in SPF, Zarnowitz and Lambros (1987) used the average
of the individual variances as the uncertainty of the average forecast. Giordani and Soderlind (2003) and Lahiri and Sheng (2010) provided theoretical
justifications for its use.
The above empirical measure of forecast uncertainty, which carries the additional disagreement term, can easily be justified by reformulating the objective function of the policy maker. Since each individual forecast comes with its own uncertainty, we argue that the policy maker, while using the combined forecast, should minimize the risk averaged over all possible forecasts. This is particularly the case where the forecasters are all experts who do not collude or have any systematic predictive superiority over others in all periods. The problem is then formulated as
$$\min_{\omega_i}\ \sum_{i=1}^{N}\omega_i E(e_{ith}^{2}) \qquad (6)$$
subject to the constraint $\sum_{i=1}^{N}\omega_i = 1$.
Under (2) and Assumptions 1-3, the overall risk function becomes
$$\sum_{i=1}^{N}\omega_i E(e_{ith}^{2}) = E(\lambda_{th}^{2}) + \sum_{i=1}^{N}\omega_i E(\varepsilon_{ith}^{2}) + \sum_{i=1}^{N}\omega_i\alpha_{ih}^{2}. \qquad (7)$$
Now the overall risk is composed of the risk associated with the common shock, the weighted average of idiosyncratic risks, and the weighted average of squared biases. But, as is now well recognized, when $N$ is large, the estimation of $\omega_i$ based on variances and covariances is problematic due to the curse of dimensionality. As shown by Issler and Lima (2009), an optimal approach is to set $\omega_i = \frac{1}{N}$ empirically for large $N$. Then the overall risk becomes
$$\frac{1}{N}\sum_{i=1}^{N}E(e_{ith}^{2}) = E(\lambda_{th}^{2}) + \frac{1}{N}\sum_{i=1}^{N}E(\varepsilon_{ith}^{2}) + \frac{1}{N}\sum_{i=1}^{N}\alpha_{ih}^{2}. \qquad (8)$$
Interestingly, the use of the population average of the individual loss functions as the social loss function for a policy maker has also been proposed and justified, among others, by Woodford (2005) on the ground that it is consistent with both rational and herding behavior of market participants. In practice, $E(e_{ith}^{2})$ is estimated using observed forecast errors. Thus, the policy maker, having an ensemble of forecasts, should expect to face an overall risk given by $\frac{1}{TN}\sum_{i=1}^{N}\sum_{t=1}^{T}e_{ith}^{2}$ on average.
Note that this loss function is different from the loss function implicit in the vast forecast combination literature following the seminal work of Bates and Granger (1969). Specifically, with a quadratic loss function, Bates and Granger posited the policy-maker's problem as
$$\min_{\omega_i}\ E\left[\left(\sum_{i=1}^{N}\omega_i e_{ith}\right)^{2}\right] \qquad (9)$$
subject to the constraint3 $\sum_{i=1}^{N}\omega_i = 1$. Under the popular equal weighting scheme, we have
$$E\left[\left(\frac{1}{N}\sum_{i=1}^{N}e_{ith}\right)^{2}\right] = E(\lambda_{th}^{2}) + \frac{1}{N^{2}}\sum_{i=1}^{N}E(\varepsilon_{ith}^{2}) \qquad (10)$$

3
Elliott and Timmermann (2004) have shown that the optimal combination weights remain the same for a much wider variety of loss functions including the popular MSE loss, provided the aggregate forecast errors are elliptically symmetric. They, however, did not consider the cross-sectional distribution of the individual forecasts.
because of the restriction $\bar{\alpha}_{\cdot h} = \frac{1}{N}\sum_{i=1}^{N}\alpha_{ih} = 0$. Note that in equation (10) the idiosyncratic risk is of magnitude order $O(N^{-1})$ and will be negligible for large $N$, making the overall risk $E[(\frac{1}{N}\sum_{i=1}^{N}e_{ith})^{2}]$ completely dominated by the common risk $E(\lambda_{th}^{2})$. By contrast, in equation (8) the idiosyncratic and aggregate risks are of the same order of magnitude, $O(1)$, and thus the idiosyncratic term will not vanish as $N$ goes to infinity. Even in cases where $N$ is small, the term $\frac{1}{N^{2}}\sum_{i=1}^{N}E(\varepsilon_{ith}^{2})$ is always less than $\frac{1}{N}\sum_{i=1}^{N}E(\varepsilon_{ith}^{2})$, and hence the measure of uncertainty embedded in the conventional forecast combination literature, as in (10), will be less than that associated with our measure, as in (8), regardless of the values of the individual biases. This is because, while using the combined forecast, equation (10) fails to propagate the uncertainties associated with individual forecasts as a component of risk to the policy-maker by "sweeping disagreement under the rug," as Giordani and Söderlind (2003) put it.
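The gap between the two risk measures is easy to see in a small simulation. The sketch below (hypothetical parameter values, with $\alpha_{ih}=0$ for simplicity) generates errors from the factor structure in (2) and compares the combined-forecast risk in (10) with the average risk in (8):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 20, 50_000                                    # hypothetical panel dimensions
sigma_lam = 1.0                                      # sd of the common shock lambda_th
sigma_eps = np.linspace(0.5, 2.0, N)                 # heterogeneous idiosyncratic sds
lam = rng.normal(0.0, sigma_lam, size=T)             # common shock
eps = rng.normal(0.0, sigma_eps[:, None], size=(N, T))
e = lam + eps                                        # errors from eq. (2), alpha_ih = 0

risk_combined = (e.mean(axis=0) ** 2).mean()         # eq. (10): E[(mean error)^2]
risk_average = (e ** 2).mean()                       # eq. (8): average of E[e_i^2]

# (10) nets the idiosyncratic risk down to O(1/N); (8) keeps it in full,
# so the conventional combination risk is always the smaller of the two:
assert risk_combined < risk_average
```

With these illustrative values, `risk_combined` sits near $\sigma_\lambda^2$ while `risk_average` is larger by roughly the average idiosyncratic variance.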
Observe that (8) can be written as
$$\frac{1}{N}\sum_{i=1}^{N}Ee_{ith}^{2} = \frac{1}{N}\sum_{i=1}^{N}E\left[F_{ith} - (A_t - \alpha_{ih})\right]^{2} + \frac{1}{N}\sum_{i=1}^{N}\alpha_{ih}^{2}, \qquad (11)$$
where the first component on the right-hand side is identified as the average uncertainty measure within forecasters, while the second term is the disagreement measure due to systematic biases in individual forecasts. A similar decomposition of forecast uncertainty has been obtained by Lahiri et al. (1988), Draper (1995) and Wallis (2005) in cases where probability distributions of forecasts are available.4
Note that (11) can be further simplified, by virtue of Assumptions 1-3, to
$$\frac{1}{N}\sum_{i=1}^{N}Ee_{ith}^{2} = \sigma_{\lambda}^{2} + \frac{1}{N}\sum_{i=1}^{N}\sigma_{\varepsilon i}^{2} + \frac{1}{N}\sum_{i=1}^{N}\alpha_{ih}^{2}, \qquad (12)$$
which is the population analog of (5). It is now obvious from (12) that the uncertainty of the average forecast arises from the variance of the aggregate shock common to all forecasters and from the heterogeneity of individual forecasters, which contains both the average idiosyncratic variance and the average squared forecast bias.

4
Geweke and Amisano (2013) have presented a parallel decomposition of predictive variance from Bayesian model averaging in terms of i) intrinsic, ii) within-model extrinsic, and iii) between-model extrinsic variances.

In order to appreciate the second term, one has to
realize that forecasts generated by any particular model or forecaster will have their own undeniable uncertainty, as Tetlow (2009) has demonstrated for the FRB/US model of the Federal Reserve. When individual experts provide forecasts partly based on judgments, as in the SPF and Blue Chip surveys, additional psychological variability will be involved too. Recognizing that each point forecast is invariably an outcome from an underlying forecast distribution, the combined forecast should be treated conceptually as the mean of a finite mixture distribution of unobserved densities, which suggests that the variance of the combined forecast will be the average of the individual variances plus a disagreement term due to systematic biases in the individual forecasts, cf. Hoeting et al. (1999) and Wallis (2005). What is not readily recognized in the literature is that, apart from the disagreement coming from time-invariant systematic biases (i.e., $\frac{1}{N}\sum_{i=1}^{N}\alpha_{ih}^{2}$), the average of the individual variances also contains a disagreement component coming from $\frac{1}{N}\sum_{i=1}^{N}\sigma_{\varepsilon i}^{2}$. In the context of the empirical examples on real GDP and inflation forecasts that we report in Section 5, an "uncertainty audit" reveals that the variance explained by the systematic bias component in (12) is small compared to the other two components (see Tables 7 and 8). A similar result on the insignificance of the bias term is also reported by Reifschneider and Tulip (2007). That is why the assumption $\alpha_{ih} = 0$ (for all $i$ and $h$) is inconsequential in our context.
Equation (5) helps to reconcile the divergent findings in previous empirical studies examining the appropriateness of disagreement as a proxy
for uncertainty. As Zarnowitz and Lambros (1987) have pointed out, there
may be periods where all forecasters agree on relatively high macroeconomic
uncertainty in the immediate future, and hence disagreement between forecasters will be low even though uncertainty is high. The opposite is also
possible where forecasters disagree a lot about their point forecasts, but are
confident about their individual predictions. This situation will arise when
forecasters disagree on otherwise precise models and scenarios that should
be used to depict the movement of the economy over the forecasting horizon. Thus, lacking any theoretical basis, the strength and the stability of the relationship between disagreement and uncertainty become an empirical issue. But our result clearly suggests that, due to the presence of $\bar{e}_{\cdot th}^{2}$ (i.e., the volatility of common shocks) in equation (5), the strength of the relation between uncertainty and disagreement will depend crucially on the sample period, the target variable, and the length of the forecast horizon, see Lahiri and Sheng (2010).
3 Measures of Historical Uncertainty
Based on the argument in Section 2, the historical risk faced by a policy maker while using the average forecast from $N$ experts is simply
$$\mathrm{RMSE}_h^{LPS} = \sqrt{\frac{1}{TN}\sum_{t=1}^{T}\sum_{i=1}^{N}e_{ith}^{2}}. \qquad (13)$$
On the other hand, the conventional choice, as suggested by Bates and Granger (1969), is the variance of the mean forecast as measured by the root mean squared error (RMSE) of the average forecast:
$$\mathrm{RMSE}_h^{AF} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(\frac{1}{N}\sum_{i=1}^{N}e_{ith}\right)^{2}}. \qquad (14)$$
With the stated objective of using the root mean squared prediction errors made by a panel of forecasters to generate a benchmark estimate of historical forecast uncertainty, Reifschneider and Tulip (2007, p. 14), however, average the individual RMSEs, as in the following formula:
$$\mathrm{RMSE}_h^{RT} = \frac{1}{N}\sum_{i=1}^{N}\sqrt{\frac{1}{T}\sum_{t=1}^{T}e_{ith}^{2}}. \qquad (15)$$
They explicitly recognized that their average RMSE is not the conventional $\mathrm{RMSE}_h^{AF}$ associated with the average forecast as in (14), and noted "rather we average the individual RMSEs of our forecasts in order to generate a benchmark for the typical amount of uncertainty we might expect to be associated with the separate forecasts of the different members of our sample, including the FOMC". Note that $\mathrm{RMSE}_h^{RT}$ is not exactly the same as $\mathrm{RMSE}_h^{LPS}$; nevertheless, we will show that it incorporates at least partially the disagreement as a component of uncertainty. Below we present the asymptotic relationships between these measures and establish the limit distribution of a statistic for testing the null hypothesis that $\sigma_{\varepsilon ih}^{2} = \sigma_{\varepsilon h}^{2}$. As will become clear, testing for forecaster homogeneity has a very special significance in the present context.
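A minimal sketch of the three measures, assuming a balanced $(N \times T)$ panel of errors at a fixed horizon and purely illustrative parameter values, makes the ordering $\mathrm{RMSE}_h^{AF} \le \mathrm{RMSE}_h^{RT} \le \mathrm{RMSE}_h^{LPS}$ concrete:

```python
import numpy as np

def rmse_measures(e):
    """e: (N, T) panel of forecast errors e_{ith} at a fixed horizon h."""
    rmse_af = np.sqrt((e.mean(axis=0) ** 2).mean())       # eq. (14), consensus RMSE
    rmse_rt = np.sqrt((e ** 2).mean(axis=1)).mean()       # eq. (15), average of RMSEs
    rmse_lps = np.sqrt((e ** 2).mean())                   # eq. (13), our measure
    return rmse_af, rmse_rt, rmse_lps

rng = np.random.default_rng(2)
N, T = 25, 500                                            # hypothetical panel sizes
lam = rng.normal(0.0, 1.0, size=T)                        # common shock
eps = rng.normal(0.0, np.linspace(0.5, 2.0, N)[:, None], size=(N, T))
af, rt, lps = rmse_measures(lam + eps)
assert af < rt < lps    # ordering under heterogeneous idiosyncratic variances
```

Note that $\mathrm{RMSE}^{RT} \le \mathrm{RMSE}^{LPS}$ holds in every finite sample by Jensen's inequality applied to the square root, while the comparison with $\mathrm{RMSE}^{AF}$ is an asymptotic statement.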
To see the differences between these uncertainty measures, we derive the following limit theorem:

Proposition 1 Suppose Assumptions 1-3 hold. Then, as $(T, N \to \infty)$,
(1)
$$\left(\mathrm{RMSE}_h^{AF}\right)^{2} \to_p \sigma_{\lambda h}^{2}, \quad \text{where } \sigma_{\lambda h}^{2} = \sum_{j=1}^{h}\sigma_{uj}^{2}.$$
(2)
$$T^{1/2}\left[\left(\mathrm{RMSE}_h^{AF}\right)^{2} - \sigma_{\lambda h}^{2} - \frac{1}{N^{2}}\sum_{i=1}^{N}\sigma_{\varepsilon ih}^{2}\right] \Longrightarrow N(0, \Sigma_\lambda),$$
where $\Sigma_\lambda = E(\lambda_{th}^{2}-\sigma_{\lambda h}^{2})^{2}\left[1 + 2\lim_{T\to\infty}\sum_{k=1}^{T-1}(1-\frac{k}{T})\rho_\lambda(k)\right]$ and $\rho_\lambda(k) = \frac{E[(\lambda_{th}^{2}-\sigma_{\lambda h}^{2})(\lambda_{(t-k)h}^{2}-\sigma_{\lambda h}^{2})]}{E(\lambda_{th}^{2}-\sigma_{\lambda h}^{2})^{2}}$.
(3)
$$\left(\mathrm{RMSE}_h^{RT}\right)^{2} \to_p \left(\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\sqrt{\sigma_{\lambda h}^{2}+\sigma_{\varepsilon ih}^{2}}\right)^{2}.$$
(4)
$$T^{1/2}\left[\left(\mathrm{RMSE}_h^{RT}\right)^{2} - \left(\frac{1}{N}\sum_{i=1}^{N}\sqrt{\sigma_{\lambda h}^{2}+\sigma_{\varepsilon ih}^{2}}\right)^{2}\right] \Longrightarrow N\left(0,\ \left(\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\sqrt{\sigma_{\lambda h}^{2}+\sigma_{\varepsilon ih}^{2}}\right)^{2}\left(\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\frac{1}{\sqrt{\sigma_{\lambda h}^{2}+\sigma_{\varepsilon ih}^{2}}}\right)^{2}\Sigma_\lambda\right).$$
(5)
$$\left(\mathrm{RMSE}_h^{LPS}\right)^{2} \to_p \left(\sigma_{\lambda h}^{2} + \sigma_{\varepsilon h}^{2}\right), \quad \text{where } \sigma_{\varepsilon h}^{2} = \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\sigma_{\varepsilon ih}^{2}.$$
(6)
$$T^{1/2}\left[\left(\mathrm{RMSE}_h^{LPS}\right)^{2} - \sigma_{\lambda h}^{2} - \frac{1}{N}\sum_{i=1}^{N}\sigma_{\varepsilon ih}^{2}\right] \Longrightarrow N(0, \Sigma_\lambda).$$
Remark 4. Although $\mathrm{RMSE}_h^{AF}$ has a bias term of order $O(\frac{1}{N})$ originating from averaging the idiosyncratic shocks in finite samples, it is a consistent estimator of the uncertainty associated with the forthcoming common shock, which is allowed to be more general than a strictly moving average process. In establishing the joint limit distribution, we need the preliminary result that $\frac{1}{N}\sum_{i=1}^{N}\varepsilon_{ith} \to_p 0$, or $\frac{1}{N}\sum_{i=1}^{N}F_{ith} \to_p A_t - \lambda_{th}$, as $N \to \infty$. Similar results are also obtained by Issler and Lima (2009) in their Proposition 3 for their bias-corrected average forecast, under the assumption that the common shock is a moving average process of order at most $(h-1)$. While their paper is aimed at obtaining the bias-corrected average forecast under the sequential limit where $T \to \infty$ followed by $N \to \infty$, our focus is to find an appropriate benchmark measure of overall forecast uncertainty. In this regard, $\mathrm{RMSE}_h^{AF}$ completely ignores the uncertainty associated with the idiosyncratic shocks, especially when $N$ is large.
Remark 5. Proposition 1 shows that $\mathrm{RMSE}_h^{AF} \le \mathrm{RMSE}_h^{RT} \le \mathrm{RMSE}_h^{LPS}$ in the limit, because $\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\sqrt{\sigma_{\lambda h}^{2}+\sigma_{\varepsilon ih}^{2}}$ is less than $(\sigma_{\lambda h}^{2}+\sigma_{\varepsilon h}^{2})^{1/2}$ by virtue of Jensen's inequality, with the latter given by $\sqrt{\sigma_{\lambda h}^{2} + \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\sigma_{\varepsilon ih}^{2}}$ in the presence of unequal idiosyncratic error variances.

Remark 6. Although both $\mathrm{RMSE}_h^{RT}$ and $\mathrm{RMSE}_h^{LPS}$ have the same $\sqrt{T}$ rate of convergence, $\mathrm{RMSE}_h^{RT}$ is more dispersed than $\mathrm{RMSE}_h^{LPS}$, since
$$\left(\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\sqrt{\sigma_{\lambda h}^{2}+\sigma_{\varepsilon ih}^{2}}\right)^{2}\left(\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\frac{1}{\sqrt{\sigma_{\lambda h}^{2}+\sigma_{\varepsilon ih}^{2}}}\right)^{2} \ge 1$$
by the Cauchy-Schwarz inequality. In the special case where there is no individual heterogeneity with respect to the idiosyncratic error variance, that is, $\sigma_{\varepsilon ih}^{2} = \sigma_{\varepsilon h}^{2}$ a.s. for all $i$ and $h$, RT and LPS will have the same limit distribution.

To check whether $\mathrm{RMSE}_h^{RT}$ and $\mathrm{RMSE}_h^{LPS}$ are statistically different in the context of a particular data set, we propose a simple test statistic under the null hypothesis that $\sigma_{\varepsilon ih}^{2} = \sigma_{\varepsilon h}^{2}$ for almost all $i$.5 The following limit theorem provides a theoretical basis for conducting statistical inference with both large $T$ and large $N$.

5
The alternative hypothesis is that $\sigma_{\varepsilon ih}^{2} \neq \sigma_{\varepsilon h}^{2}$ for a fixed nonzero fraction of the $\sigma_{\varepsilon ih}^{2}$.
Theorem 1 Suppose Assumptions 1-3 hold. Then, if $\sigma_{\varepsilon ih}^{2} = \sigma_{\varepsilon h}^{2}$ for almost all $i$,
$$2^{-1/2}N^{-1/2}\sum_{i=1}^{N}\left\{\left[4\hat{\sigma}_{\lambda h}^{2}\hat{\sigma}_{\varepsilon ih}^{2} + \hat{V}_{ih}\right]^{-1}T\left(\frac{1}{T}\sum_{t=1}^{T}e_{ith}^{2} - \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}e_{ith}^{2}\right)^{2} - 1\right\} \Longrightarrow N(0,1)$$
as $(T, N \to \infty)$ and $\frac{N}{T} \to 0$, where
$$\hat{\sigma}_{\lambda h}^{2} = (\mathrm{RMSE}_h^{AF})^{2} - \frac{1}{N}\left[(\mathrm{RMSE}_h^{LPS})^{2} - (\mathrm{RMSE}_h^{AF})^{2}\right],$$
$$\hat{\sigma}_{\varepsilon ih}^{2} = \frac{1}{T}\sum_{t=1}^{T}\hat{\varepsilon}_{ith}^{2}, \quad \hat{\varepsilon}_{ith} = e_{ith} - \frac{1}{N}\sum_{i=1}^{N}e_{ith}, \quad \hat{V}_{ih} = \frac{1}{T}\sum_{t=1}^{T}\left(\hat{\varepsilon}_{ith}^{2} - \hat{\sigma}_{\varepsilon ih}^{2}\right)^{2}.$$

Remark 7. Theorem 1 establishes the joint limit distribution of the statistic for testing the null hypothesis that $\sigma_{\varepsilon ih}^{2} = \sigma_{\varepsilon h}^{2}$ for almost all $i$ and each $h$. The restriction $\frac{N}{T} \to 0$ controls for the effect of errors in approximating the variance of $\frac{1}{\sqrt{T}}\sum_{t=1}^{T}e_{ith}^{2}$ and the special bias in panel estimation, and prevents the approximation errors from having a dominant asymptotic impact on the test statistic.
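For reference, the statistic of Theorem 1 can be computed in a few lines. The sketch below (hypothetical simulated data with $\alpha_{ih}=0$ and a serially independent common shock) implements the statistic exactly as displayed above and contrasts its behavior under homogeneous and strongly heterogeneous idiosyncratic variances:

```python
import numpy as np

def z_statistic(e):
    """Z statistic of Theorem 1 for H0: sigma^2_{eps,ih} equal across i (fixed h).
    e: (N, T) array of forecast errors e_{ith}."""
    N, T = e.shape
    af2 = (e.mean(axis=0) ** 2).mean()                  # (RMSE_AF)^2
    lps2 = (e ** 2).mean()                              # (RMSE_LPS)^2
    sig_lam2 = af2 - (lps2 - af2) / N                   # sigma-hat^2_{lambda,h}
    eps_hat = e - e.mean(axis=0)                        # eps-hat_{ith}
    sig_eps2 = (eps_hat ** 2).mean(axis=1)              # sigma-hat^2_{eps,ih}
    V = ((eps_hat ** 2 - sig_eps2[:, None]) ** 2).mean(axis=1)   # V-hat_{ih}
    dev = (e ** 2).mean(axis=1) - lps2                  # per-i deviation of mean sq. error
    terms = T * dev ** 2 / (4.0 * sig_lam2 * sig_eps2 + V) - 1.0
    return terms.sum() / np.sqrt(2.0 * N)               # => N(0, 1) under the null

rng = np.random.default_rng(0)
N, T = 50, 500
lam = rng.normal(0.0, 1.0, size=T)                      # common shock
z0 = z_statistic(lam + rng.normal(0.0, 1.0, size=(N, T)))   # homogeneous variances
z1 = z_statistic(lam + rng.normal(0.0, np.linspace(0.5, 2.0, N)[:, None], size=(N, T)))
# z0 is approximately N(0,1); z1 is far in the right tail under heterogeneity
assert abs(z0) < 10 and z1 > 5
```

The variable names (`z_statistic`, `dev`, etc.) are ours; the construction follows the displayed formula term by term.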
Remark 8. Our decomposition of forecast errors in equation (2) can be expressed as a two-way error component model if we write $\varepsilon_{ith} = \mu_{ih} + \xi_{ith}$, assuming $\mu_{ih} \sim N(0, \sigma_{\mu ih}^{2})$ and $\xi_{ith} \sim N(0, \sigma_{\xi h}^{2})$.6 In this framework, our test of the null hypothesis that $\sigma_{\varepsilon ih}^{2} = \sigma_{\varepsilon h}^{2}$ for almost all $i$ and each $h$ is essentially equivalent to testing that $\sigma_{\mu ih}^{2} = 0$ for almost all $i$.

Several tests for homoscedasticity in error component models have been developed in the literature. For example, Verbon (1980) derived a Lagrange multiplier test of the null hypothesis of homoscedasticity against the alternative of heteroscedasticity by assuming that $\sigma_{\mu ih}^{2}$ is a function of some

6
Mazodier and Trognon (1978) studied models allowing for heteroscedasticity in the individual-specific error $\mu_{ih}$, while Rao et al. (1981), Magnus (1982), Baltagi (1988) and Wansbeek (1989) considered models allowing for heteroscedasticity in the error term $\xi_{ith}$.
exogenous variables.7 In a similar spirit, Baltagi et al. (2006) developed joint LM tests for homoscedasticity, and Holly and Gardiol (2000) derived a score test for homoscedasticity. All of these LM tests are based on the assumption that the variance is a parametric function of some observable exogenous variables, as in the original Breusch and Pagan (1979) test for homoscedasticity. Our test, based on Theorem 1, differs from the tests mentioned above, since we do not assume that the idiosyncratic variance is a function of some variables. In fact, when $N$ is small,
$$\sum_{i=1}^{N}\left[4\hat{\sigma}_{\lambda h}^{2}\hat{\sigma}_{\varepsilon ih}^{2} + \hat{V}_{ih}\right]^{-1}T\left(\frac{1}{T}\sum_{t=1}^{T}e_{ith}^{2} - \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}e_{ith}^{2}\right)^{2} \qquad (16)$$
approximately follows a $\chi_{N-2}^{2}$ distribution, and is also closely related to $\frac{1}{8N\left[(\mathrm{RMSE}_h^{LPS})^{2}\right]^{3/2}}\sum_{i=1}^{N}\left(\frac{1}{T}\sum_{t=1}^{T}e_{ith}^{2} - \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}e_{ith}^{2}\right)^{2}$, the dominant term in the difference between $\mathrm{RMSE}_h^{LPS}$ and $\mathrm{RMSE}_h^{RT}$. However, when $N$ goes to infinity with $T$, the test statistic in (16) is of stochastic order $O_p(N)$, leading to over-rejection of the null even when the null hypothesis is true. Theorem 1 addresses this size issue by re-centering and then properly scaling the test statistic so that a central limit theorem can be applied.8 A similar approach was adopted by Pesaran and Yamagata (2008) in their test of slope homogeneity in large
7
Verbon also assumed $\xi_{ith} \sim N(0, \sigma_{\xi ih}^{2})$, with $\sigma_{\xi ih}^{2}$ a function of the set of exogenous variables defining $\sigma_{\mu ih}^{2}$.
8
We conducted a Monte Carlo study to explore the size and power of the new test at the 5% nominal level. The model is $e_{it} = \lambda_t + \varepsilon_{it}$, where $\lambda_t$ follows a stationary AR(1) process, $\lambda_t = \rho\lambda_{t-1} + \omega_t$, with $\rho = 0$ or $0.4$. $\omega_t$ is randomly drawn alternatively from $N(0, \sigma_\omega^{2})$ or $U(-\sqrt{3}\sigma_\omega, \sqrt{3}\sigma_\omega)$. The $\varepsilon_{it}$ are i.i.d. draws from $N(0, \sigma_{\varepsilon i}^{2})$ or $U(-\sqrt{3}\sigma_{\varepsilon i}, \sqrt{3}\sigma_{\varepsilon i})$. We studied the size of the test under the null hypothesis $H_0$: $\sigma_{\varepsilon i}^{2} = \sigma_{\varepsilon}^{2}$ for almost all $i$, and the power of the test under the alternative $H_1$: $\sigma_{\varepsilon i}^{2} \neq \sigma_{\varepsilon}^{2}$ for a fixed proportion $r$ ($0 < r \le 1$). To explore the effect of changes in $r$ on the power of the test, we let $\sigma_{\varepsilon i}^{2} = \sigma_{\varepsilon}^{2}$ for $(1-r)$ of all $i$, and the remaining $\sigma_{\varepsilon i}^{2}$ were generated according to $U(0.1\sigma_{\varepsilon}^{2}, 1.9\sigma_{\varepsilon}^{2})$, with $r = 0.2$ or $0.4$. The values of $(\sigma_\omega^{2}, \sigma_{\varepsilon}^{2})$ are selected from the set $\{(0.05, 0.01), (0.1, 0.01), (0.2, 0.2), (0.6, 1.5), (3, 1)\}$, and are similar to the estimated parameters in the empirical examples below. Tables 1-2 report the simulation results. Two points are worth noting. First, the Z test (Table 1) yields reasonably good empirical size for different values of $N$, $T$ and the signal-to-noise ratio ($SNR = \frac{\sigma_\omega^{2}}{(1-\rho^{2})\sigma_{\varepsilon}^{2}}$). Second, when only 20% of the $\sigma_{\varepsilon i}^{2}$ are different from $\sigma_{\varepsilon}^{2}$ (Table 2, Panel A), the empirical power of the test is low, especially for small $N$. Even when $N$ is moderately large, the test tends to under-reject the null hypothesis provided that both $\sigma_\omega^{2}$ and $\sigma_{\varepsilon}^{2}$ are small and the SNR is large. But the test does well in detecting heteroscedasticity for moderately large $\sigma_\omega^{2}$ and $\sigma_{\varepsilon}^{2}$ when both $N$ and $T$ are large. The power of the test is close to one when the SNR is not greater than one for $N \ge 40$. The test becomes more powerful as $r$ increases; more specifically, the test under $r = 0.4$ (Table 2, Panel B) has power that is almost double that under $r = 0.2$. Similar results from many more simulation experiments can be obtained from the authors.
random coefficient panel data models. A detailed procedure for their test
can also be found in Hsiao and Pesaran (2008).
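A compact version of the size experiment described in footnote 8 can be sketched as follows, using one hypothetical $(\sigma_\omega^2, \sigma_\varepsilon^2)$ pair from the listed set, normal draws only, and a modest number of replications; the helper function re-implements the Theorem 1 statistic:

```python
import numpy as np

def z_stat(e):
    # Theorem 1 statistic (same construction as in Section 3)
    N, T = e.shape
    af2 = (e.mean(axis=0) ** 2).mean()
    lps2 = (e ** 2).mean()
    sig_lam2 = af2 - (lps2 - af2) / N
    eh = e - e.mean(axis=0)
    s2 = (eh ** 2).mean(axis=1)
    V = ((eh ** 2 - s2[:, None]) ** 2).mean(axis=1)
    dev = (e ** 2).mean(axis=1) - lps2
    return (T * dev ** 2 / (4.0 * sig_lam2 * s2 + V) - 1.0).sum() / np.sqrt(2.0 * N)

rng = np.random.default_rng(42)
rho, sig_w, sig_e = 0.4, np.sqrt(0.2), np.sqrt(0.2)   # (sigma_w^2, sigma_e^2) = (0.2, 0.2)
N, T, reps = 40, 400, 200
rej = 0
for _ in range(reps):
    lam = np.empty(T)
    lam[0] = rng.normal(0.0, sig_w / np.sqrt(1.0 - rho ** 2))  # stationary start
    for t in range(1, T):
        lam[t] = rho * lam[t - 1] + rng.normal(0.0, sig_w)      # AR(1) common shock
    e = lam + rng.normal(0.0, sig_e, size=(N, T))   # H0: equal idiosyncratic variances
    rej += abs(z_stat(e)) > 1.96
print("empirical size at the 5% nominal level:", rej / reps)
```

The empirical rejection frequency should be in the neighborhood of the nominal 5% level, subject to simulation noise from the small number of replications.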
It is interesting to note that, when idiosyncratic error variances are the same across forecasters, both $\mathrm{RMSE}_h^{RT}$ and $\mathrm{RMSE}_h^{LPS}$ can be interpreted as measuring the uncertainty of a typical forecast in the panel. This interpretation is consistent with the explicitly stated objective in Reifschneider and Tulip (2007) that the benchmark uncertainty figures they provide are those of a typical forecaster in the pool, to be individualized by forecasters based on their own special experiences. Let us assume that, in predicting the variable for target year $t$, denoted by $A_t$, each forecaster possesses two types of information: (i) common information, $y_{th} = A_t + \eta_{th}$, with precision $\kappa_{th}$; and (ii) private information, $z_{ith} = A_t + \zeta_{ith}$, with precision $\tau_{th}$, where $\eta_{th}$ and the $\zeta_{ith}$'s are mutually independent and normally distributed with mean 0. Then, according to Bayes' rule, individual $i$'s forecast $F_{ith}$ is a weighted average of the two signals with their precisions as the weights:
$$F_{ith} \equiv E[A_t \mid y_{th}, z_{ith}] = \frac{\kappa_{th}y_{th} + \tau_{th}z_{ith}}{\kappa_{th} + \tau_{th}}, \qquad (17)$$
and the uncertainty of an individual forecast is simply $\frac{1}{\kappa_{th} + \tau_{th}}$. Taking expectations on both sides conditional on available information, the left-hand side of equation (4) becomes
N
N
1 X
κth ηth + τth ith 2
1 X
2
E[(At − Fith ) | yth , zith ] =
E[(
) | yth , zith ]
N i=1
N i=1
κth + τth
1
=
.
(18)
κth + τth
Thus, we see that our uncertainty measure, RM SELP S , is the same as the
1
variance of an individual forecast, with both of them equal to κth +τ
in the
th
population. So it can be interpreted as the confidence an outside observer
will have in a randomly drawn typical individual forecast from the panel of
forecasters, as pointed out first by Giordani and Söderlind (2003).
Note that the expected value of the first term on the right-hand side of
equation (4) can be expressed as
power of test is close to one when the SNR is not greater than one for N ≥ 40. The test
becomes more powerful when r increases. More specifically, the test under r = 0.4 (Table
2, Panel B) has power that is almost the double of those under r = 0.2. Similar results
from from many more simulation experiments can be obtained from the authors.
17
E[(At − F·th )2 | yth , zith ] =
κ +
κth + τNth
.
(κth + τth )2
(19)
τth
th
N
in equation (19) is less than
For N > 1, it is easy to verify that (κth
+τth )2
1
in equation (18). This shows that the uncertainty conventionally
κth +τth
associated with the consensus forecast, RM SEAF , is less than RM SELS or
RM SERT . However, as we have shown before, when idiosyncratic errors
are heteroskedastic, RM SERT will be in generally smaller than RM SELP S
implying that RM SERT does not fully incorporate disagreement among the
forecasters.
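The ordering implied by equations (18) and (19) is easy to check with a short Monte Carlo sketch. The precisions ($\kappa_{th} = 2$, $\tau_{th} = 3$), panel size and replication count below are our own illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
kappa, tau, N = 2.0, 3.0, 25       # hypothetical precisions and panel size
A, reps = 1.5, 200_000             # true value A_t and Monte Carlo replications

eta = rng.normal(0.0, 1.0 / np.sqrt(kappa), reps)        # common noise, variance 1/kappa
eps = rng.normal(0.0, 1.0 / np.sqrt(tau), (reps, N))     # private noise, variance 1/tau

y = A + eta                                              # common signal y_th
z = A + eps                                              # private signals z_ith
F = (kappa * y[:, None] + tau * z) / (kappa + tau)       # equation (17)

avg_indiv_var = ((A - F) ** 2).mean()                    # equation (18)
consensus_var = ((A - F.mean(axis=1)) ** 2).mean()       # equation (19)
print(avg_indiv_var, 1 / (kappa + tau))                  # both near 0.2
print(consensus_var, (kappa + tau / N) / (kappa + tau) ** 2)  # both near 0.0848
```

With these precisions the average individual variance is $1/5 = 0.2$, while the consensus variance is $(2 + 3/25)/25 = 0.0848$, illustrating why the variance of the consensus forecast understates the uncertainty of a typical forecast.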
4 Confidence Interval for Thick Modeling: An Illustration of Underestimation
Granger and Jeon (2004) have presented a number of illustrative examples to show the advantages of using multiple models and forecast averaging for the purpose of inference. One particular case (their section 8), where they report the confidence intervals of average parameter estimates based on 24 closely related alternative specifications of the Taylor rule, highlights well the issue we raise in this paper. They reexamine the empirical findings of Kozicki (1999), who estimated the Taylor rule using reasonable variations in the alternative definitions of inflation and output gap with monthly data from 1983-1997. The Taylor rule stipulates:

$$r_t = c + (1 + \alpha)\pi_{t-1} + \beta y^g_{t-1},$$

where $r_t$ is the funds rate at time $t$, $c$ is a constant, and $\pi_{t-1}$ and $y^g_{t-1}$ are the inflation and output gap at $t-1$, respectively. The hypothesized values of the parameters are 0.5 for both $\alpha$ and $\beta$. Because the nature of the estimates and our conclusions are very similar for $\alpha$ and $\beta$, in this section we will focus on $\alpha$, the inflation weight. As reported in Table 5 of Granger and Jeon (2004, p.340), the 24 estimates of $\alpha$ varied widely across models, ranging from $-0.117$ to $1.375$ with an average value of $0.598$. The standard errors of these estimates varied from 0.137 to 0.283 with mean 0.219. Based on bootstrap aggregation (bagging), Granger and Jeon obtained the standard error of the average $\hat\alpha$ to be 0.045. Thus their 95% confidence interval for $\alpha$ was $(0.508, 0.688)$, which included only 3 of the 24 estimates of $\alpha$. Since they did not explain the details of the bagging procedure, it seems that they bootstrapped the standard error of the average $\hat\alpha$ under a very stringent assumption. Note that $var(\frac{1}{N}\sum_{i=1}^{N}\hat\alpha_i)$ is given by $N^{-2} l_N' \Sigma l_N$, where $l_N$ is an $N \times 1$ vector of ones and $\Sigma$ is the covariance matrix of $\hat\alpha$. This expression can also be written as

$$var\Big(\frac{1}{N}\sum_{i=1}^{N}\hat\alpha_i\Big) = N^{-2}\sum_{i=1}^{N} var(\hat\alpha_i) + N^{-2}\sum_{i \neq j} cov(\hat\alpha_i, \hat\alpha_j). \qquad (20)$$

For the average $\hat\alpha$, their reported standard error of 0.045 is obtained using only the first term in (20), i.e., $N^{-2}\sum_{i=1}^{N} var(\hat\alpha_i)$, with $var(\hat\alpha_i)$ being their bootstrapped variance estimate for the $i$th model. It implies that the bagging estimate that Granger and Jeon (2004) reported was based on bootstrap samples that assumed the parameter estimates to be uncorrelated across the 24 alternative models, cf. Hastie et al. (2009, p.285). Note also that if $N$ is allowed to tend to infinity and $N^{-1}\sum_{i=1}^{N} var(\hat\alpha_i)$ is bounded, their confidence interval would eventually collapse to the mean 0.598. In general, we expect $\hat\alpha_i$ and $\hat\alpha_j$ $(j \neq i)$ to be correlated, and hence the variance of the average $\hat\alpha$ would converge to the limit of $N^{-2}\sum_{i \neq j} cov(\hat\alpha_i, \hat\alpha_j)$. We have shown in Proposition 1 that the limit of $N^{-2}\sum_{i \neq j} cov(\hat\alpha_i, \hat\alpha_j)$ would then be the variance of the common component of the various $\hat\alpha_i$'s.
In the absence of the actual data, we approximated the above expression (20) for different values of the cross correlations by assuming that the $\hat\alpha_i$'s are equi-correlated. As we increased this correlation from 0 to 0.9, the standard error of the average $\hat\alpha$ increased steadily from 0.045 to 0.219. Note that the 24 models are based on conceptually very similar covariates, and hence the parameters estimated across different models are expected to be reasonably correlated. If we assume the correlation to be 0.5, the standard deviation of the average $\hat\alpha$ is 0.158. This value is significantly less than 0.224, the square root of the average of the variances, as suggested by our analysis in the absence of systematic biases in parameter estimates. This underestimation is due to the fact that the Granger-Bates formula $N^{-2} l_N' \Sigma l_N$ does not incorporate the uncertainties associated with each of the 24 models; see Draper (1995, p.57) or Buckland et al. (1995, p.604). Even if the correlations among the components of $\hat\alpha$ are taken into account in (20), the confidence interval of the average $\hat\alpha$ would still be too narrow to be an appropriate measure of the uncertainty associated with a typical $\hat\alpha_i$. The difference between the standard error of the average $\hat\alpha$ corresponding to the true correlation among the individual $\hat\alpha_i$'s and 0.224 can be attributed to the variance of the biases in the individual estimates.

Given the data and the alternative models, one can of course do a full-fledged BMA or FMA analysis to obtain the true standard error of the average $\hat\alpha$ that will incorporate the uncertainty of the individual estimates. As explained before, their standard error of the average $\hat\alpha$ is expected to be a gross underestimate of the true risk that a policy maker will face in the presence of such varied point estimates from the 24 reasonably well specified Taylor rule specifications. Rather, we should compute the true uncertainty following (11), which includes both the average variance of the $\hat\alpha_i$ $(= 0.050)$ and the variance of the point estimates of $\alpha$, which was calculated to be 0.243. These numbers yield a standard error of 0.541, which is more than ten times bigger than what Granger and Jeon reported. This underestimation happens for two reasons: 1) the use of the variance of the average rather than the average of the variances, and 2) disregarding the variance of the point estimates.
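Using only the summary statistics quoted above (the 24 individual estimates themselves are not available to us), the equi-correlation calculation can be sketched as follows; the common-variance simplification is our assumption, not Granger and Jeon's procedure:

```python
import numpy as np

# Summary statistics reported for the 24 Taylor-rule estimates of alpha.
N = 24
avg_var = 0.050          # average of the individual variances of alpha_i
var_points = 0.243       # variance of the 24 point estimates of alpha

def se_average(rho, n=N, v=avg_var):
    # Equation (20) under equi-correlation, assuming a common variance v:
    # var(mean) = v/n + (1 - 1/n) * rho * v
    return np.sqrt(v / n + (1.0 - 1.0 / n) * rho * v)

for rho in (0.0, 0.5, 0.9):
    print(rho, round(se_average(rho), 3))      # 0.046, 0.161, 0.213
print(round(np.sqrt(avg_var + var_points), 3)) # 0.541, the proposed benchmark
```

The values at $\rho = 0$ and $\rho = 0.5$ are close to the 0.045 and 0.158 quoted in the text (small differences reflect our equal-variance simplification), while combining the average variance with the cross-model variance reproduces the benchmark standard error of 0.541.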
5 Estimated Benchmark Uncertainty based on SPF Data
In this section, we present estimates of historical uncertainty in inflation and output growth forecasts using $RMSE_{LPS}$, and compare it to $RMSE_{AF}$ and $RMSE_{RT}$. The data in our study are taken from the Survey of Professional Forecasters (SPF) and the Real Time Data Set for Macroeconomists (RTDSM), provided by the Federal Reserve Bank of Philadelphia. Although the SPF began as the NBER-ASA survey in 1968:Q4, we focus on the forecasts made from 1981:Q3 until 2011:Q4.9 The starting date reflects a decision by the SPF to ask for the forecasts of inflation and output growth for both the current year and the next year. In our analysis, inflation is measured as the annual-average over annual-average percent change of the GDP deflator, and output growth is similarly defined in terms of real GDP. The actual horizons for these forecasts are approximately 7 1/2, 6 1/2, ..., 1/2 quarters, but we refer to them simply as quarterly horizons of 8, 7, ..., 1.

9 The data were downloaded on March 1, 2012, and thus include the corrections released on August 12, 2011.
In calculating the percentage change from year $t-1$ to $t$, we transform the quarterly forecasts of the level of the variables into annual forecasts of the growth rate of the variables. Specifically, we use the forecasts of the level of the variables in the current and subsequent quarters and the actual values from the vintage of data available at that time to calculate the forecast of annual inflation and output growth for the current year. For example, in the second quarter of the current year, a respondent would use data on the actual value of real GDP in the first quarter ($Y_{1,t}$) and make predictions through the end of year $t$ ($F_{2,t}, F_{3,t}, F_{4,t}$). Accordingly, as an example, the output growth forecast for the current year at a horizon of 3 quarters is calculated as $f_t = (Y_{1,t} + F_{2,t} + F_{3,t} + F_{4,t})/(Y_{1,t-1} + Y_{2,t-1} + Y_{3,t-1} + Y_{4,t-1})$, where $Y_{1,t-1}$ through $Y_{4,t-1}$ are the levels of actual real GDP in the first through fourth quarters of year $t-1$, which are available to forecasters in the second quarter of year $t$.
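The construction just described can be sketched in a few lines; the level figures below are made up solely to illustrate the arithmetic:

```python
# Illustrative numbers only; Y are actual levels, F are level forecasts.
Y_prev = [15000.0, 15100.0, 15200.0, 15300.0]   # Y_{1,t-1} .. Y_{4,t-1}
Y1_t = 15400.0                                  # Y_{1,t}, known in Q2 of year t
F_t = [15500.0, 15600.0, 15700.0]               # F_{2,t}, F_{3,t}, F_{4,t}

f_t = (Y1_t + sum(F_t)) / sum(Y_prev)           # annual-average ratio, as in the text
growth_pct = 100.0 * (f_t - 1.0)                # implied current-year growth forecast
print(round(growth_pct, 2))                     # 2.64
```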
In calculating the percentage change from year $t$ to $t+1$, we simply divide $F_{t+1}$ by $F_t$, where $F_{t+1}$ and $F_t$ are the annual forecasts of real GDP for the next year and the current year, respectively, which are available from the SPF surveys from 1981:Q3 onwards. Note that, according to the SPF, errors were made in the surveys conducted in 1985:Q1, 1986:Q1 and 1990:Q1, so that the first annual forecast is for the previous year and the second annual forecast is for the current year. So we do not have the forecasts for the next year made at these quarters.

To calculate forecast errors, we also need the actual values of annual inflation and output growth. As is well known, the NIPA data often go through serious revisions. Obviously, the most recent revision is not a good choice, since it involves adjustments of definitions and classifications. We choose the first quarterly release of annual inflation and output growth to compute the actual values.

Since not all forecasters responded at all times, we have to deal with an unbalanced panel. We can, of course, compute the uncertainty measures using a balanced subset of the unbalanced panel. Though convenient, this method tends to lose important information by throwing out valuable observations. On the other hand, we may also construct a balanced data set by estimating the missing observations, as in Capistrán and Timmermann (2009). But estimation of the missing values will nevertheless introduce additional uncertainty, which may obscure the measurement of forecast uncertainty. In addition, the estimated values may not be reliable when a large number of observations are missing, as in the SPF data set.
One straightforward way to extend the measures designed exclusively for balanced panels to unbalanced panels is simply to replace $N$ with $N_t$ and $T$ with $T_i$ in equations (14), (15) and (13). While this approach utilizes all available data, it is easy to see that the inequality relationship between $RMSE_{AF}$ and $RMSE_{RT}$ (and hence $RMSE_{LPS}$) no longer holds.10 More specifically, this simple approach implies that while calculating $RMSE^h_{AF}$ we are implicitly replacing the missing observations with the cross-section average $\frac{1}{N_t}\sum_{i} e_{ith}$ at time $t$. In contrast, for both $RMSE^h_{RT}$ and $RMSE^h_{LPS}$ this approach fills in the missing data with the values $\frac{1}{T_i}\sum_{t} e^2_{ith}$. Thus, $RMSE^h_{AF}$ and $RMSE^h_{RT}$ (and hence $RMSE^h_{LPS}$) are essentially using different data sets. This casts doubt on the appropriateness of using this approach to compute the three measures in an unbalanced panel.
When dealing with unbalanced panels, an equal weighting scheme fails to discriminate forecast accuracy among different forecasters. In particular, the more observations that are available for forecaster $i$, the more reliable the information that can be extracted from forecaster $i$. Thus, it is necessary and intuitively appealing to adopt an unequal weighting scheme to partially address the problems in computing forecast uncertainty caused by unbalanced panels. Following Bates and Granger (1969), we use weights that are inversely proportional to the variance of the summation components. To avoid the complications in estimating the variances, we assume that the weights are proportional to the number of observations for each time period or for each forecaster. Specifically, suppose that at $t = \tau$ we have $n^*_\tau$ cross-section observations starting from unit $n_t$, and at $n = i$ we have $t^*_i$ time series observations starting from time $t_i$. Define

$$\omega_t = \frac{n^*_t}{\sum_{t=1}^{T} n^*_t} \quad \text{and} \quad \omega_i = \frac{t^*_i}{\sum_{i=1}^{N} t^*_i}.$$

Then the three uncertainty measures can be expressed as

10 Such a violation occurs because we can no longer apply the Cauchy-Schwarz inequality in unbalanced panels.
$$RMSE_{LPS} = \sqrt{\frac{1}{\sum_{t=1}^{T} n^*_t} \sum_{t=1}^{T} \sum_{i=n_t}^{n_t+n^*_t-1} e^2_{it}} \qquad (13A)$$

$$RMSE_{AF} = \sqrt{\sum_{t=1}^{T} \omega_t \left(\frac{1}{n^*_t} \sum_{i=n_t}^{n^*_t+n_t-1} e_{it}\right)^2} \qquad (14A)$$

$$RMSE_{RT} = \sum_{i=1}^{N} \omega_i \sqrt{\frac{1}{t^*_i} \sum_{t=t_i}^{t^*_i+t_i-1} e^2_{it}} \qquad (15A)$$
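As a rough illustration of how measures (13A)-(15A) can be computed in practice, the sketch below evaluates all three on a synthetic unbalanced panel, using observation counts as weights; it is our own illustration, not the authors' code. The formulas above index the observed blocks explicitly, whereas the sketch simply masks missing entries:

```python
import numpy as np

rng = np.random.default_rng(1)
e = rng.normal(0.0, 1.0, (30, 8))       # T = 30 periods, N = 8 forecasters
e[rng.random(e.shape) < 0.2] = np.nan   # ~20% missing: an unbalanced panel

obs = ~np.isnan(e)
n_t = obs.sum(axis=1)                   # forecasters present at each t
t_i = obs.sum(axis=0)                   # periods available for each i
w_t = n_t / n_t.sum()                   # time weights, as in omega_t
w_i = t_i / t_i.sum()                   # forecaster weights, as in omega_i

# (13A): pooled root mean squared individual error
rmse_lps = np.sqrt(np.nansum(e ** 2) / obs.sum())
# (14A): weighted RMSE of the consensus (cross-section mean) error
rmse_af = np.sqrt(np.sum(w_t * np.nanmean(e, axis=1) ** 2))
# (15A): weighted average of the individual RMSEs
rmse_rt = np.sum(w_i * np.sqrt(np.nanmean(e ** 2, axis=0)))

print(rmse_af, rmse_rt, rmse_lps)
```

With independent errors the consensus measure comes out far smaller than the other two, mirroring the underestimation discussed in the text.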
Table 3 reports the results for inflation forecasts. Among the three measures, as expected, $RMSE_{AF}$ yields the smallest forecast uncertainty, and there is a large difference between $RMSE_{AF}$ and the other two. At longer horizons, the idiosyncratic errors become increasingly important, reaching a size similar to that of the variance of the aggregate shock. Thus, pervasive heterogeneity seems to exist amongst the professional forecasters but is ignored in the traditional measure $RMSE_{AF}$. In addition, for all eight horizons $RMSE_{RT}$ is less than $RMSE_{LPS}$. Simple calculations show that the degree of underestimation is between 5% and 35% when $RMSE_{AF}$ is used as the measure of historical uncertainty, and between 8% and 16% when $RMSE_{RT}$ is used instead. As we show later, the estimated test statistics are such that the difference between $RMSE_{RT}$ and $RMSE_{LPS}$ is statistically significant at the 1% level for all eight horizons.

Table 4 presents the results for output growth forecasts. The picture here is somewhat different from inflation forecasts. Although $RMSE_{AF}$ is still the smallest measure of forecast uncertainty, and $RMSE_{RT}$ is smaller than $RMSE_{LPS}$ across all eight horizons, the difference between $RMSE_{AF}$ and the other two is not as substantial as in inflation. This is due to the fact that the variance of the aggregate shock ($\sigma^2_\lambda$) is a more dominant factor and forecast disagreement plays a lesser role in output growth forecasts on average compared to inflation. As a result, when $RMSE_{AF}$ is used as the measure of historical uncertainty, the degree of underestimation is modest, between 3% and 15%. There is still a statistically significant difference between $RMSE_{RT}$ and $RMSE_{LPS}$. The RMSEs associated with output are uniformly higher compared to inflation due to a differential incidence of common shocks. This phenomenon, which makes real GDP growth a difficult variable to predict, has been documented by Lahiri and Sheng (2011) using a heterogeneous learning model; see also Elliott (2011a).
One might note that at one quarter ahead, $RMSE_{AF}$ is slightly larger than $RMSE_{RT}$. This is possibly due to the unbalanced panel. In a separate experiment, we constructed a pseudo-balanced panel of 21 forecasters, with each of them representing the minimum, 5th percentile, 10th percentile, ..., 95th percentile and the maximum of individual forecasts, respectively. The results, reported in Tables 5 and 6, clearly show that, as theoretically expected, $RMSE_{AF} < RMSE_{RT} < RMSE_{LPS}$ at all eight horizons in both inflation and output growth forecasts.11

To further illustrate the relative importance of heterogeneity in idiosyncratic errors, we calculate the percentage of the average squared individual forecast errors that is explained by the three components in (8), including the two components of forecast disagreement. For inflation, the contribution due to idiosyncratic errors increases from 8% for the 1Q horizon to 50% for 8Q, with a corresponding decline in the role of the common shock from 90% to 46%. As the forecast horizon increases, disagreement becomes an increasingly important factor of aggregate uncertainty. The interesting result to note is that the contribution due to time-invariant individual biases is rather small, never exceeding 8% for the eight horizons. For growth forecasts, as noted before, the common shock maintains its relative importance more than its role in inflation as the horizon increases. The relative importance of disagreement hovers around 18% at long horizons and is still non-ignorable. We must add that these are average percentages over the sample period. While estimating aggregate forecast uncertainty in real time, the disagreement term can sometimes be very large, often signaling an unusual future. The distribution of disagreement over time for both inflation and real GDP for each horizon is highly skewed to the right. The RMSE explained by individual systematic biases again is very small, as in the inflation forecasts.

11 In constructing the confidence interval for the combined forecast, Granger and Jeon (2004) suggest using the variance of the mean, which is much less than $RMSE_{LPS}$. For example, in their Table 5, the standard deviation calculated using the variance of the mean is 0.045 for the inflation weight in a Taylor rule equation, much less than 0.219, the standard deviation calculated using $RMSE_{LPS}$.
Note that the RMSE figures reported in Reifschneider and Tulip (2007) and those in this paper are not directly comparable. We present average year-over-year forecasts whereas RT use Q4-over-Q4 percent changes. The latter target tends to be more variable. Our estimates are based on SPF individual forecasts beginning 1981:Q3 whereas RT started with an initial 20-year sample beginning 1986. More importantly, Reifschneider and Tulip (2007) used the simple averages of the individual projections in the SPF, Blue Chip and FOMC panels, together with the Greenbook, Congressional Budget Office (CBO) and Administration forecasts ($N = 6$) in their calculations. More generally, their measure is expressed as

$$RMSE^{group}_h = \sqrt{\frac{1}{M}\sum_{m=1}^{M} \frac{1}{T}\sum_{t=1}^{T} (A_t - F^m_{\cdot th})^2},$$

where $F^m_{\cdot th}$ is the mean forecast of group $m$ for the target year $t$, made $h$ periods ahead of the end of the target year. By averaging across individual projections, most of the idiosyncratic differences and disagreement in the FOMC, SPF and Blue Chip forecasts have inadvertently been washed away. They found very little heterogeneity in these six forecasts. On the other hand, their simultaneous use of the Greenbook, CBO, Administration, mean FOMC, SPF, and Blue Chip forecasts meant that RT had to meticulously sort out important differences in the comparability of these six forecasts due to data coverage, timing of forecasts, reporting basis for projections, and forecast conditionality. Despite all these differences, a close examination of these two sets of uncertainty estimates suggests that our longer term forecasts are clearly tagged with more historical uncertainty than the corresponding RT estimates. At least a part of the explanation for this discrepancy has to be due to their use of an incorrect formula for estimating benchmark uncertainty and the aggregation of individual forecasts from the FOMC, SPF and Blue Chip surveys.
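To make the group measure concrete, here is a small synthetic sketch of $RMSE^{group}_h$ (our own illustration, with invented group forecasts and parameter values): because only each group's mean forecast enters the formula, any within-group disagreement is invisible to the measure.

```python
import numpy as np

rng = np.random.default_rng(2)
M, T = 6, 20                                   # six forecast groups, twenty target years
A = rng.normal(2.5, 1.0, T)                    # actual values A_t
F_group = A[None, :] + rng.normal(0.0, 0.8, (M, T))   # each group's mean forecast

rmse_group = np.sqrt(((A[None, :] - F_group) ** 2).mean())
print(round(rmse_group, 3))
```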
6 Conclusion
Recent model averaging literature suggests that the risk associated with a combined forecast should incorporate not only the average of the variances of the individual forecasts but also the variance of the point forecasts across different models. This means that even if we have highly precise individual forecasts, we might end up with considerable uncertainty about the combined forecast if these point forecasts are very different across forecasters.

The benchmark uncertainty estimates that we report in this paper will help the public to interpret the mean or consensus forecasts that are released by surveys such as the SPF, Blue Chip, or FOMC. Often the consensus forecasts are reported together with the maximum, minimum and interquartile range of the point forecasts, presumably to suggest the underlying uncertainty. However, as we have explained, these features of the cross-sectional distribution capture only a part of the aggregate uncertainty. For a knowledgeable use of the consensus forecasts by the public, historical benchmark uncertainty estimates along the lines suggested in this paper will lead to more meaningful use of the estimates by providing historical confidence intervals.
Which is worse: widening the bands now, or missing the truth later (Draper)? Due to heterogeneity and the entry and exit of forecasters, Manski suggested not to combine. We instead try to give confidence bands that recognize the uncertainties associated with individual forecasts. Leamer warned us about the fragility of inferences caused by our failure to recognize model uncertainty, a concern that developments in the Bates-Granger approach have sidestepped. Granger and Jeon's example is a very pertinent illustration of such disregard.
Acknowledgement

We have benefited from discussions with Robert Engle, David Hendry, Hashem Pesaran, Peter Phillips, Kenneth Wallis and Mark Watson. We also thank participants at the 10th World Congress of the Econometric Society, the 30th International Symposium on Forecasting, the 12th ZEW Summer Workshop for Young Economists, New York Camp Econometrics, the Society of Government Economists annual meeting, the 6th Colloquium on Modern Tools for Business Cycle Analysis, George Washington University and Howard University for their comments.
7 Appendix

7.1 Some Useful Lemmas and Their Proofs
A word on notation. $\|\cdot\|$ denotes the Euclidean norm; $\rightarrow_p$ denotes convergence in probability; $\Rightarrow$ denotes convergence in distribution. The limits taken under $(T, N \rightarrow \infty)_{seq}$ refer to sequential limits where $T$ goes to infinity first and then $N$ goes to infinity. The limits taken under $(T, N \rightarrow \infty)$ are regarded as joint limits with both $T$ and $N$ approaching infinity at the same time.

We start with some lemmas that are useful in the following arguments. Lemmas 1 and 2 come from Lemma 6 and Theorem 1, respectively, in Phillips and Moon (1999). For simplicity, the subscript $h$ will be omitted in the proofs.
Lemma 1 (Conditions for Sequential Convergence to Imply Joint Convergence)

(a) Suppose there exist random variables $X_N$ and $X$ on the same probability space as $X_{N,T}$ satisfying, for all $N$, $X_{N,T} \rightarrow_p X_N$ as $T \rightarrow \infty$ and $X_N \rightarrow_p X$ as $N \rightarrow \infty$. Then $X_{N,T} \rightarrow_p X$ as $(T, N \rightarrow \infty)$ if and only if

$$\limsup_{N,T} P\{|X_{N,T} - X_N| > \epsilon\} = 0 \quad \forall \epsilon > 0. \qquad (A1)$$

(b) Suppose there exist random variables $X_N$ such that, for any fixed $N$, $X_{N,T} \Rightarrow X_N$ as $T \rightarrow \infty$ and $X_N \Rightarrow X$ as $N \rightarrow \infty$. Then $X_{N,T} \Rightarrow X$ as $(T, N \rightarrow \infty)$ if and only if

$$\limsup_{N,T} |Ef(X_{N,T}) - Ef(X_N)| = 0 \quad \forall f \in C, \qquad (A2)$$

where $C$ is the class of all bounded, continuous real functions on $R$.
Lemma 2 Suppose the random variables $Y_{i,T} = \frac{1}{T}\sum_{t=1}^{T} X_{i,t}$ are independent across $i$ for all $T$ and integrable, and let $X_{N,T} = \frac{1}{N}\sum_{i=1}^{N} Y_{i,T}$ and $X_N = \frac{1}{N}\sum_{i=1}^{N} Y_i$. Assume that $Y_{i,T} \Rightarrow Y_i$ as $T \rightarrow \infty$ for all $i$. Let the following conditions hold:

(i) $\limsup_{N,T} \frac{1}{N}\sum_{i=1}^{N} E|Y_{i,T}| < \infty$,

(ii) $\limsup_{N,T} \frac{1}{N}\sum_{i=1}^{N} |EY_{i,T} - EY_i| = 0$,

(iii) $\limsup_{N,T} \frac{1}{N}\sum_{i=1}^{N} E|Y_{i,T}|\, 1\{|Y_{i,T}| > N\varepsilon\} = 0 \quad \forall \varepsilon > 0$,

(iv) $\limsup_{N,T} \frac{1}{N}\sum_{i=1}^{N} E|Y_i|\, 1\{|Y_i| > N\varepsilon\} = 0 \quad \forall \varepsilon > 0$.

Then

(a) Condition (A2) holds.

(b) If $\lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} EY_i\ (= \mu_x)$ exists and $X_N \rightarrow_p \mu_x$, we have $X_{N,T} \rightarrow_p \mu_x$ as $(T, N \rightarrow \infty)$.
Lemma 3 Suppose Assumptions 1-3 hold. Then

(a) As $T \rightarrow \infty$, $\frac{1}{T}\sum_{t=1}^{T}\lambda_t^2 \rightarrow_p \sigma_\lambda^2$, $\frac{1}{T}\sum_{t=1}^{T}\lambda_t\upsilon_{jt} \rightarrow_p 0$, $\frac{1}{T}\sum_{t=1}^{T}\upsilon_{it}^2 \rightarrow_p 1$, and $\frac{1}{T}\sum_{t=1}^{T}\upsilon_{it}\upsilon_{jt} \rightarrow_p 0$.

(b) As $T \rightarrow \infty$, $\frac{1}{T^{1/2}}\sum_{t=1}^{T}\lambda_t\upsilon_{it} \Rightarrow N(0, \sigma_\lambda^2)$, $\frac{1}{T^{1/2}}\sum_{t=1}^{T}(\upsilon_{it}^2 - 1) \Rightarrow N(0, E\upsilon_{it}^4 - 1)$, $\frac{1}{T^{1/2}}\sum_{t=1}^{T}\upsilon_{it}\upsilon_{jt} \Rightarrow N(0, 1)$, and $\frac{1}{T^{1/2}}\sum_{t=1}^{T}(\lambda_t^2 - \sigma_\lambda^2) \Rightarrow N(0, \Sigma_\lambda)$, where

$$\Sigma_\lambda = E(\lambda_t^2 - \sigma_\lambda^2)^2\left[1 + 2\lim_{T\to\infty}\sum_{k=1}^{T-1}\Big(1 - \frac{k}{T}\Big)\rho_\lambda(k)\right],$$

and $\frac{1}{T^{1/2}}\sum_{t=1}^{T}(\upsilon_{it}^4 - E\upsilon_{it}^4) \Rightarrow N(0,\ E\upsilon_{it}^8 - (E\upsilon_{it}^4)^2)$.

(c) $\sup_T E\big|\frac{1}{T}\sum_{t=1}^{T}\lambda_t\upsilon_{it}\big| < \infty$, $\sup_T E\big|\frac{1}{T}\sum_{t=1}^{T}\upsilon_{it}^2\big| < \infty$, $\sup_T E\big|\frac{1}{T}\sum_{t=1}^{T}\upsilon_{it}\upsilon_{jt}\big| < \infty$.

(d) $\sup_T E\big|\frac{1}{T^{1/2}}\sum_{t=1}^{T}(\upsilon_{it}^2 - 1)\big| < \infty$, $\sup_T E\big|\frac{1}{T^{1/2}}\sum_{t=1}^{T}\lambda_t\upsilon_{it}\big| < \infty$, $\sup_T E\big|\frac{1}{T^{1/2}}\sum_{t=1}^{T}\upsilon_{it}\upsilon_{jt}\big| < \infty$.

(e) $\sup_T E\big|\frac{1}{T^{1/2}}\sum_{t=1}^{T}(\upsilon_{it}^2 - 1)\big|^4 < \infty$, $\sup_T E\big|\frac{1}{T^{1/2}}\sum_{t=1}^{T}\lambda_t\upsilon_{it}\big|^4 < \infty$.
Proof. Under Assumption 1(i), we see that $\lambda_t$ is a strictly stationary and ergodic stochastic process with $E(\lambda_t) = 0$, $E(\lambda_t^2) = \sigma_\lambda^2$ and $E|\lambda_t|^{4+\delta} < \infty$ for some $\delta > 0$. Hence $\lambda_t^2$ is strictly stationary and ergodic and $\rho_\lambda(k) = \frac{E[(\lambda_t^2 - \sigma_\lambda^2)(\lambda_{t-k}^2 - \sigma_\lambda^2)]}{E(\lambda_t^2 - \sigma_\lambda^2)^2} \rightarrow 0$ as $k \rightarrow \infty$. Then the convergence of $\frac{1}{T}\sum_{t=1}^{T}\lambda_t^2$ to $\sigma_\lambda^2$ follows by applying the weak law of large numbers (WLLN) for strictly stationary and ergodic processes. The remaining convergence results in part (a) can be easily established by using Khintchine's WLLN, since $\upsilon_{it}$ is i.i.d. across $i$ and over $t$ with finite eighth moments and is independent of $\lambda_t$. Now Assumption 1(ii) implies that $\lim_{T\to\infty} T\big\{2\sum_{k=1}^{T-1}(1 - \frac{k}{T})\rho_\lambda(k) + 1\big\} = \infty$. Then the asymptotic normality of $\frac{1}{T^{1/2}}\sum_{t=1}^{T}(\lambda_t^2 - \sigma_\lambda^2)$ in part (b) holds by virtue of Theorem 5.6 of Hall and Heyde (1980). The remaining asymptotic normality results in part (b) can be proved trivially by the classical central limit theorem. Part (c) can be readily verified by application of the Cauchy-Schwarz inequality and the Jensen inequality. Part (e) follows directly from Theorem 3.6 of Billingsley (1999) and part (b). For part (d), the first two results follow easily from part (e), and the last one can be obtained by first applying the Jensen inequality and then following the procedure in part (e).
Lemma 4 Suppose Assumptions 1-3 hold. Then

(a) $\frac{1}{T}\sum_{t=1}^{T}\upsilon_{it}\,\frac{1}{N}\sum_{i=1}^{N}\upsilon_{it} = O_p\Big(\frac{1}{\min\{N,\, N^{1/2}T^{1/2}\}}\Big)$,

(b) $\frac{1}{T}\sum_{t=1}^{T}\upsilon_{it}^2\,\frac{1}{N}\sum_{i=1}^{N}\upsilon_{it} = O_p\Big(\frac{1}{\min\{N,\, N^{1/2}T^{1/2}\}}\Big)$,

(c) $\frac{1}{T}\sum_{t=1}^{T}\upsilon_{it}^3\,\frac{1}{N}\sum_{i=1}^{N}\upsilon_{it} = O_p\Big(\frac{1}{\min\{N,\, N^{1/2}T^{1/2}\}}\Big)$,

(d) $\frac{1}{T}\sum_{t=1}^{T}\Big(\frac{1}{N}\sum_{i=1}^{N}\upsilon_{it}\Big)^2 = O_p\Big(\frac{1}{N}\Big)$,

(e) $\frac{1}{T}\sum_{t=1}^{T}\Big(\frac{1}{N}\sum_{i=1}^{N}\upsilon_{it}\Big)^3 = O_p\Big(\frac{1}{\min\{N^2,\, N^{3/2}T^{1/2}\}}\Big)$,

(f) $\frac{1}{T}\sum_{t=1}^{T}\Big(\frac{1}{N}\sum_{i=1}^{N}\upsilon_{it}\Big)^4 = O_p\Big(\frac{1}{N^2}\Big)$,

(g) $\frac{1}{T}\sum_{t=1}^{T}\lambda_t\,\frac{1}{N}\sum_{i=1}^{N}\upsilon_{it} = O_p\Big(\frac{1}{N^{1/2}T^{1/2}}\Big)$.
Proof. Parts (b), (f), and (g) can be obtained by using arguments similar to those in Lemma 6(b). For part (a), it can be established that $\frac{1}{T}\sum_{t=1}^{T}\upsilon_{it}\,\frac{1}{N}\sum_{j=1, j\neq i}^{N}\upsilon_{jt} = O_p((NT)^{-1/2})$ by using a similar argument as in Lemma 6(b). Then, using also Lemma 1(a), we get the desired result because

$$\frac{1}{T}\sum_{t=1}^{T}\upsilon_{it}\,\frac{1}{N}\sum_{i=1}^{N}\upsilon_{it} = \frac{1}{NT}\sum_{t=1}^{T}\upsilon_{it}^2 + \frac{1}{T}\sum_{t=1}^{T}\upsilon_{it}\,\frac{1}{N}\sum_{j=1, j\neq i}^{N}\upsilon_{jt}.$$

Parts (c), (d) and (e) can be established in the same way as part (a).
Lemma 5 Let $\hat\varepsilon_{it} = e_{it} - \frac{1}{N}\sum_{i=1}^{N} e_{it}$, $\hat\sigma_{\varepsilon i}^2 = \frac{1}{T}\sum_{t=1}^{T}\hat\varepsilon_{it}^2$, $\hat\sigma_\lambda^2 = \frac{1}{T}\sum_{t=1}^{T}\big(\frac{1}{N}\sum_{i=1}^{N} e_{it}\big)^2$, and $\hat V_i = \frac{1}{T}\sum_{t=1}^{T}(\hat\varepsilon_{it}^2 - \hat\sigma_{\varepsilon i}^2)^2$. Then under Assumptions 1-3, we have

$$4\hat\sigma_\lambda^2\hat\sigma_{\varepsilon i}^2 + \hat V_i - \big(4\sigma_\lambda^2\sigma_{\varepsilon i}^2 + Var(\varepsilon_{it}^2)\big) = O_p\Big(\frac{1}{\min\{T^{1/2}, N\}}\Big).$$

Proof. Write

$$\hat\varepsilon_{it} = e_{it} - \frac{1}{N}\sum_{i=1}^{N} e_{it} = \sigma_{\varepsilon i}\upsilon_{it} - \frac{1}{N}\sum_{i=1}^{N}\sigma_{\varepsilon i}\upsilon_{it}.$$

Then, using Lemma 3(b) and Lemma 4(a) and 4(b), we have

$$\hat\sigma_{\varepsilon i}^2 - \sigma_{\varepsilon i}^2 = \frac{1}{T}\sum_{t=1}^{T}\hat\varepsilon_{it}^2 - \sigma_{\varepsilon i}^2 = \sigma_{\varepsilon i}^2\frac{1}{T}\sum_{t=1}^{T}(\upsilon_{it}^2 - 1) - 2\sigma_{\varepsilon i}^2\frac{1}{T}\sum_{t=1}^{T}\upsilon_{it}\,\frac{1}{N}\sum_{i=1}^{N}\upsilon_{it} + \sigma_{\varepsilon i}^2\frac{1}{T}\sum_{t=1}^{T}\Big(\frac{1}{N}\sum_{i=1}^{N}\upsilon_{it}\Big)^2$$
$$= O_p\Big(\frac{1}{T^{1/2}}\Big) + O_p\Big(\frac{1}{\min\{N, N^{1/2}T^{1/2}\}}\Big) + O_p\Big(\frac{1}{N}\Big) = O_p\Big(\frac{1}{\min\{T^{1/2}, N\}}\Big).$$

Similarly, we have

$$\hat\sigma_\lambda^2 - \sigma_\lambda^2 = \frac{1}{T}\sum_{t=1}^{T}\Big(\frac{1}{N}\sum_{i=1}^{N} e_{it}\Big)^2 - \sigma_\lambda^2 = O_p\Big(\frac{1}{\min\{T^{1/2}, N\}}\Big)$$

and

$$\frac{1}{T}\sum_{t=1}^{T}\hat\varepsilon_{it}^4 - \sigma_{\varepsilon i}^4 E\upsilon_{it}^4 = \sigma_{\varepsilon i}^4\left[\frac{1}{T}\sum_{t=1}^{T}\Big(\upsilon_{it} - \frac{1}{N}\sum_{i=1}^{N}\upsilon_{it}\Big)^4 - E\upsilon_{it}^4\right] = O_p\Big(\frac{1}{\min\{T^{1/2}, N\}}\Big).$$

Then it follows that

$$\hat V_i - (\sigma_{\varepsilon i}^4 E\upsilon_{it}^4 - \sigma_{\varepsilon i}^4) = \frac{1}{T}\sum_{t=1}^{T}(\hat\varepsilon_{it}^2 - \hat\sigma_{\varepsilon i}^2)^2 - (\sigma_{\varepsilon i}^4 E\upsilon_{it}^4 - \sigma_{\varepsilon i}^4)$$
$$= \left[\frac{1}{T}\sum_{t=1}^{T}\hat\varepsilon_{it}^4 - \sigma_{\varepsilon i}^4 E\upsilon_{it}^4\right] - (\hat\sigma_{\varepsilon i}^2 + \sigma_{\varepsilon i}^2)(\hat\sigma_{\varepsilon i}^2 - \sigma_{\varepsilon i}^2) = O_p\Big(\frac{1}{\min\{T^{1/2}, N\}}\Big).$$

Therefore

$$4\hat\sigma_\lambda^2\hat\sigma_{\varepsilon i}^2 + \hat V_i - (4\sigma_\lambda^2\sigma_{\varepsilon i}^2 + V_i) = O_p\Big(\frac{1}{\min\{T^{1/2}, N\}}\Big),$$

where $V_i = Var(\varepsilon_{it}^2) = \sigma_{\varepsilon i}^4(E\upsilon_{it}^4 - 1)$.
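Lemma 5 can also be checked numerically. The sketch below is our own illustration with Gaussian errors and invented parameter values (so that $Var(\varepsilon_{it}^2) = 2\sigma_{\varepsilon i}^4$); it forms the plug-in estimate $4\hat\sigma_\lambda^2\hat\sigma_{\varepsilon i}^2 + \hat V_i$ and compares it with its population counterpart:

```python
import numpy as np

rng = np.random.default_rng(3)
T, N = 20_000, 200
sig_lam = 1.0
sig_eps = np.sqrt(0.5 + rng.random(N))          # heterogeneous sigma_eps_i

lam = rng.normal(0.0, sig_lam, T)               # common shock lambda_t
eps = rng.normal(0.0, 1.0, (T, N)) * sig_eps    # eps_it = sigma_eps_i * v_it
e = lam[:, None] + eps                          # forecast errors e_it

e_bar = e.mean(axis=1)                          # cross-section average of e_it
eps_hat = e - e_bar[:, None]                    # hat-eps_it
sig_eps2_hat = (eps_hat ** 2).mean(axis=0)      # hat-sigma_eps_i^2
sig_lam2_hat = (e_bar ** 2).mean()              # hat-sigma_lambda^2
V_hat = ((eps_hat ** 2 - sig_eps2_hat) ** 2).mean(axis=0)   # hat-V_i

psi_hat = 4 * sig_lam2_hat * sig_eps2_hat + V_hat
psi_true = 4 * sig_lam ** 2 * sig_eps ** 2 + 2 * sig_eps ** 4   # Var(eps^2) = 2*sigma^4
print(np.max(np.abs(psi_hat - psi_true)))       # small relative to psi_true
```

The maximum deviation should shrink roughly at the $O_p(1/\min\{T^{1/2}, N\})$ rate claimed by the lemma as $T$ and $N$ are increased.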
Lemma 6 Suppose Assumptions 1-3 hold. Then we have

(a) $\frac{1}{T^{1/2}}\sum_{t=1}^{T}(2\lambda_t\varepsilon_{it} + \varepsilon_{it}^2 - \sigma_{\varepsilon i}^2) \Rightarrow N\big(0,\ 4\sigma_\lambda^2\sigma_{\varepsilon i}^2 + \sigma_{\varepsilon i}^4[E(\upsilon_{it}^4) - 1]\big)$ as $T \rightarrow \infty$,

(b) $\frac{1}{N^{1/2}}\sum_{i=1}^{N}\frac{1}{T^{1/2}}\sum_{t=1}^{T}(2\lambda_t\varepsilon_{it} + \varepsilon_{it}^2 - \sigma_{\varepsilon i}^2) \Rightarrow N\Big(0,\ \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\big(4\sigma_\lambda^2\sigma_{\varepsilon i}^2 + \sigma_{\varepsilon i}^4[E(\upsilon_{it}^4) - 1]\big)\Big)$ as $(T, N \rightarrow \infty)$.
Proof. Part (a) can be established by a direct application of Theorem 5.6 in Hall and Heyde (1980), because $(2\lambda_t\varepsilon_{it} + \varepsilon_{it}^2 - \sigma_{\varepsilon i}^2)$ is a strictly stationary and ergodic stochastic process for given $i$.

From (a), we have, as $T \rightarrow \infty$ and $N$ is fixed,

$$\frac{1}{N^{1/2}}\sum_{i=1}^{N}\frac{1}{T^{1/2}}\sum_{t=1}^{T}(2\lambda_t\varepsilon_{it} + \varepsilon_{it}^2 - \sigma_{\varepsilon i}^2) \Rightarrow \frac{1}{N^{1/2}}\sum_{i=1}^{N} z_i\big(4\sigma_\lambda^2\sigma_{\varepsilon i}^2 + \sigma_{\varepsilon i}^4[E(\upsilon_{it}^4) - 1]\big)^{1/2},$$

where the $z_i$ are i.i.d. $N(0,1)$ variables. Using Liapunov's central limit theorem for i.i.d. variables, we obtain

$$\frac{1}{N^{1/2}}\sum_{i=1}^{N} z_i\big(4\sigma_\lambda^2\sigma_{\varepsilon i}^2 + \sigma_{\varepsilon i}^4[E(\upsilon_{it}^4) - 1]\big)^{1/2} \Rightarrow N\Big(0,\ \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\big(4\sigma_\lambda^2\sigma_{\varepsilon i}^2 + \sigma_{\varepsilon i}^4[E(\upsilon_{it}^4) - 1]\big)\Big)$$

as $N \rightarrow \infty$. Now it remains to show that conditions (i)-(iv) of Lemma 2 are satisfied. Applying the triangle inequality, we have

$$\limsup_{N,T}\frac{1}{N}\sum_{i=1}^{N} E^{1/2}\Big[\frac{1}{T^{1/2}}\sum_{t=1}^{T}(2\lambda_t\varepsilon_{it} + \varepsilon_{it}^2 - \sigma_{\varepsilon i}^2)\Big]^2$$
$$\leq \limsup_{N,T}\frac{1}{N}\sum_{i=1}^{N} E^{1/2}\Big[\frac{1}{T^{1/2}}\sum_{t=1}^{T} 2\lambda_t\varepsilon_{it}\Big]^2 + \limsup_{N,T}\frac{1}{N}\sum_{i=1}^{N} E^{1/2}\Big[\frac{1}{T^{1/2}}\sum_{t=1}^{T}(\varepsilon_{it}^2 - \sigma_{\varepsilon i}^2)\Big]^2$$
$$\leq 2\sup_i\sigma_{\varepsilon i}\,\sup_T E^{1/2}\Big[\frac{1}{T^{1/2}}\sum_{t=1}^{T}\lambda_t\upsilon_{it}\Big]^2 + \sup_i\sigma_{\varepsilon i}^2\,\sup_T E^{1/2}\Big[\frac{1}{T^{1/2}}\sum_{t=1}^{T}(\upsilon_{it}^2 - 1)\Big]^2 < \infty$$

by virtue of Lemma 3(d). Condition (ii) holds trivially because

$$E(2\lambda_t\varepsilon_{it} + \varepsilon_{it}^2 - \sigma_{\varepsilon i}^2) = 2E(\lambda_t)E(\varepsilon_{it}) + E\varepsilon_{it}^2 - \sigma_{\varepsilon i}^2 = 0.$$

Next, using the Cauchy-Schwarz inequality and the indicator inequality $1\{b > x\} \leq 1\{a > x\}$ if $a > b$, for some real numbers $a, b, x$, we have

$$\frac{1}{N}\sum_{i=1}^{N} E\left[\Big|\frac{1}{T^{1/2}}\sum_{t=1}^{T}(2\lambda_t\varepsilon_{it} + \varepsilon_{it}^2 - \sigma_{\varepsilon i}^2)\Big|\, 1\Big\{\Big|\frac{1}{T^{1/2}}\sum_{t=1}^{T}(2\lambda_t\varepsilon_{it} + \varepsilon_{it}^2 - \sigma_{\varepsilon i}^2)\Big| > N\epsilon\Big\}\right]$$
$$\leq \frac{1}{N}\sum_{i=1}^{N} E\left[\sigma_{\varepsilon i}\sqrt{4 + \sigma_{\varepsilon i}^2}\left(\Big(\frac{1}{T^{1/2}}\sum_{t=1}^{T}\lambda_t\upsilon_{it}\Big)^2 + \Big(\frac{1}{T^{1/2}}\sum_{t=1}^{T}(\upsilon_{it}^2 - 1)\Big)^2\right)^{1/2} 1\left\{\sigma_{\varepsilon i}\sqrt{4 + \sigma_{\varepsilon i}^2}\left(\Big(\frac{1}{T^{1/2}}\sum_{t=1}^{T}\lambda_t\upsilon_{it}\Big)^2 + \Big(\frac{1}{T^{1/2}}\sum_{t=1}^{T}(\upsilon_{it}^2 - 1)\Big)^2\right)^{1/2} > N\epsilon\right\}\right]$$
$$\leq \sup_i\sigma_{\varepsilon i}\sqrt{4 + \sigma_{\varepsilon i}^2}\ \sup_T E\left[\left(\Big(\frac{1}{T^{1/2}}\sum_{t=1}^{T}\lambda_t\upsilon_{it}\Big)^2 + \Big(\frac{1}{T^{1/2}}\sum_{t=1}^{T}(\upsilon_{it}^2 - 1)\Big)^2\right)^{1/2} 1\left\{\left(\Big(\frac{1}{T^{1/2}}\sum_{t=1}^{T}\lambda_t\upsilon_{it}\Big)^2 + \Big(\frac{1}{T^{1/2}}\sum_{t=1}^{T}(\upsilon_{it}^2 - 1)\Big)^2\right)^{1/2} > \frac{N\epsilon}{\sigma_{\varepsilon i}\sqrt{4 + \sigma_{\varepsilon i}^2}}\right\}\right]$$
$$\rightarrow 0 \ \text{ as } (T, N \rightarrow \infty),$$

since, by the Jensen inequality,

$$\sup_T E\left(\Big(\frac{1}{T^{1/2}}\sum_{t=1}^{T}\lambda_t\upsilon_{it}\Big)^2 + \Big(\frac{1}{T^{1/2}}\sum_{t=1}^{T}(\upsilon_{it}^2 - 1)\Big)^2\right)^{1/2} \leq \sup_T\left(E\Big(\frac{1}{T^{1/2}}\sum_{t=1}^{T}\lambda_t\upsilon_{it}\Big)^2 + E\Big(\frac{1}{T^{1/2}}\sum_{t=1}^{T}(\upsilon_{it}^2 - 1)\Big)^2\right)^{1/2}$$
$$\leq \left(\sup_T\frac{1}{T}\sum_{t=1}^{T}E(\lambda_t\upsilon_{it})^2 + \sup_T\frac{1}{T}\sum_{t=1}^{T}E(\upsilon_{it}^2 - 1)^2\right)^{1/2} < \infty,$$

so condition (iii) is verified. Since $z_i$ is an i.i.d. normal variable for all $i$,

$$\sup_i E\big(4\sigma_\lambda^2\sigma_{\varepsilon i}^2 + \sigma_{\varepsilon i}^4[E(\upsilon_{it}^4) - 1]\big)z_i^2 \leq \big(4\sigma_\lambda^2\sup_i\sigma_{\varepsilon i}^2 + [E(\upsilon_{it}^4) - 1]\sup_i\sigma_{\varepsilon i}^4\big)E(z_i^2) < \infty,$$

verifying condition (iv) of Lemma 2.
7.2 Proof of Proposition 1

Proof. We show the proofs for parts (c) and (d). Parts (a), (b), (e) and (f) can be established by similar arguments.
(c) Using Lemma 3(a), we see that

$$RMSE_{RT} = \frac{1}{N}\sum_{i=1}^{N}\sqrt{\frac{1}{T}\sum_{t=1}^{T} e_{it}^2} \rightarrow_p \frac{1}{N}\sum_{i=1}^{N}\sqrt{\sigma_\lambda^2 + \sigma_{\varepsilon i}^2} \quad \text{as } T \rightarrow \infty \text{ for fixed } N.$$

Since $\sup_i\sigma_{\varepsilon i}^2 < \infty$ by assumption, $\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\sqrt{\sigma_\lambda^2 + \sigma_{\varepsilon i}^2}$ is well defined. Then

$$RMSE_{RT} \rightarrow_p \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\sqrt{\sigma_\lambda^2 + \sigma_{\varepsilon i}^2} \quad \text{as } (T, N \rightarrow \infty)_{seq}.$$

To establish the joint limit, it remains to show that conditions (i)-(iv) of Lemma 2 hold for $\frac{1}{N}\sum_{i=1}^{N}\sqrt{\frac{1}{T}\sum_{t=1}^{T} e_{it}^2}$. By the Jensen inequality and Lemma 3(b),

$$E|RMSE_{RT}| \leq \frac{1}{N}\sum_{i=1}^{N}\sqrt{E\frac{1}{T}\sum_{t=1}^{T} e_{it}^2} \leq \sup_i\sqrt{\sigma_\lambda^2 + \sigma_{\varepsilon i}^2} < \infty,$$

verifying condition (i). Next, using a power series expansion,

$$E\left|\sqrt{\frac{1}{T}\sum_{t=1}^{T} e_{it}^2} - \sqrt{\sigma_\lambda^2 + \sigma_{\varepsilon i}^2}\right| \leq \frac{1}{2}(\sigma_\lambda^2 + \sigma_{\varepsilon i}^2)^{-1/2}\, E\left|\frac{1}{T}\sum_{t=1}^{T}(e_{it}^2 - \sigma_\lambda^2 - \sigma_{\varepsilon i}^2)\right|$$

for sufficiently large $T$. Then condition (ii) holds because $E(e_{it}^2) = \sigma_\lambda^2 + \sigma_{\varepsilon i}^2$ for all $i$ and $t$. To establish condition (iii), it is enough to verify that $\frac{1}{T}\sum_{t=1}^{T} e_{it}^2$ is uniformly integrable in $T$, which holds by virtue of Theorem 3.6 in Billingsley (1999), because $\frac{1}{T}\sum_{t=1}^{T} e_{it}^2 \geq 0$ and $\frac{1}{T}\sum_{t=1}^{T} e_{it}^2 \rightarrow_p (\sigma_\lambda^2 + \sigma_{\varepsilon i}^2)$ by Lemma 3(a). Condition (iv) is trivial to establish since $\sup_i\sigma_{\varepsilon i}^2 < \infty$ by assumption. So we can conclude that

$$RMSE_{RT}^2 \rightarrow_p \left(\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\sqrt{\sigma_\lambda^2 + \sigma_{\varepsilon i}^2}\right)^2$$

as $(T, N \rightarrow \infty)$.
(d) Using a power series expansion again, we see that
$$
T^{1/2}\left(RMSE_{RT}-\frac{1}{N}\sum_{i=1}^N\sqrt{\sigma_\lambda^2+\sigma_{\varepsilon i}^2}\right)=\frac{1}{N}\sum_{i=1}^N\frac{1}{2}\left(\sigma_\lambda^2+\sigma_{\varepsilon i}^2\right)^{-1/2}\left[\frac{1}{T^{1/2}}\sum_{t=1}^T\left(e_{it}^2-\sigma_\lambda^2-\sigma_{\varepsilon i}^2\right)\right]+O_p\left(\frac{1}{T^{1/2}}\right).
$$
Then it suffices to show
$$
\frac{1}{N}\sum_{i=1}^N\frac{1}{2}\left(\sigma_\lambda^2+\sigma_{\varepsilon i}^2\right)^{-1/2}\left[\frac{1}{T^{1/2}}\sum_{t=1}^T\left(e_{it}^2-\sigma_\lambda^2-\sigma_{\varepsilon i}^2\right)\right]\Longrightarrow N\left(0,\ \frac{1}{4}\left(\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^N\frac{1}{\sqrt{\sigma_\lambda^2+\sigma_{\varepsilon i}^2}}\right)^2\Sigma_\lambda\right)
$$
as $(N,T\to\infty)$. Using Lemma 3(b), we have
$$
\frac{1}{N}\sum_{i=1}^N\frac{1}{2}\left(\sigma_\lambda^2+\sigma_{\varepsilon i}^2\right)^{-1/2}\frac{1}{T^{1/2}}\sum_{t=1}^T\left(e_{it}^2-\sigma_\lambda^2-\sigma_{\varepsilon i}^2\right)=\frac{1}{2}\frac{1}{N}\sum_{i=1}^N\left(\sigma_\lambda^2+\sigma_{\varepsilon i}^2\right)^{-1/2}\frac{1}{T^{1/2}}\sum_{t=1}^T\left(\lambda_t^2-\sigma_\lambda^2\right)+O_p\left(\frac{1}{T^{1/2}}\right)+O_p\left(\frac{1}{N^{1/2}}\right)
$$
$$
\Longrightarrow\frac{1}{2}\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^N\left(\sigma_\lambda^2+\sigma_{\varepsilon i}^2\right)^{-1/2}\,N\left(0,\Sigma_\lambda\right).
$$
Since $0<\inf_i\sigma_{\varepsilon i}^2\le\sup_i\sigma_{\varepsilon i}^2<\infty$ by assumption, $\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^N\left(\sigma_\lambda^2+\sigma_{\varepsilon i}^2\right)^{-1/2}$ is well defined. Then it follows that
$$
T^{1/2}\left(RMSE_{RT}-\frac{1}{N}\sum_{i=1}^N\sqrt{\sigma_\lambda^2+\sigma_{\varepsilon i}^2}\right)\Longrightarrow N\left(0,\ \frac{1}{4}\left(\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^N\frac{1}{\sqrt{\sigma_\lambda^2+\sigma_{\varepsilon i}^2}}\right)^2\Sigma_\lambda\right)
$$
as $(T,N\to\infty)$. Consequently, using the results in part (c),
$$
T^{1/2}\left(RMSE_{RT}^2-\left(\frac{1}{N}\sum_{i=1}^N\sqrt{\sigma_\lambda^2+\sigma_{\varepsilon i}^2}\right)^2\right)=\left(RMSE_{RT}+\frac{1}{N}\sum_{i=1}^N\sqrt{\sigma_\lambda^2+\sigma_{\varepsilon i}^2}\right)T^{1/2}\left(RMSE_{RT}-\frac{1}{N}\sum_{i=1}^N\sqrt{\sigma_\lambda^2+\sigma_{\varepsilon i}^2}\right)
$$
$$
\Longrightarrow N\left(0,\ \left(\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^N\sqrt{\sigma_\lambda^2+\sigma_{\varepsilon i}^2}\right)^2\left(\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^N\frac{1}{\sqrt{\sigma_\lambda^2+\sigma_{\varepsilon i}^2}}\right)^2\Sigma_\lambda\right)
$$
as $(T,N\to\infty)$.
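The convergence in part (c) is easy to check numerically. The sketch below is an illustration written for this appendix, not the authors' code (all names and parameter values are ours): it simulates the factor structure $e_{it}=\lambda_t+\varepsilon_{it}$ with heterogeneous idiosyncratic variances and compares $RMSE_{RT}$ with its probability limit $\frac{1}{N}\sum_i\sqrt{\sigma_\lambda^2+\sigma_{\varepsilon i}^2}$.

```python
import numpy as np

# Monte Carlo check of Proposition 1(c): for e_it = lambda_t + eps_it,
# RMSE_RT = (1/N) sum_i sqrt((1/T) sum_t e_it^2) should approach
# (1/N) sum_i sqrt(sigma_lambda^2 + sigma_eps_i^2) as T and N grow.
rng = np.random.default_rng(0)
N, T = 200, 5000
sigma_lambda = 0.5
sigma_eps = rng.uniform(0.2, 1.0, size=N)          # heterogeneous idiosyncratic s.d.

lam = rng.normal(0.0, sigma_lambda, size=T)         # common shocks lambda_t
eps = rng.normal(0.0, 1.0, size=(N, T)) * sigma_eps[:, None]
e = lam[None, :] + eps                              # forecast errors e_it

rmse_rt = np.mean(np.sqrt(np.mean(e**2, axis=1)))   # average of individual RMSEs
limit = np.mean(np.sqrt(sigma_lambda**2 + sigma_eps**2))
print(rmse_rt, limit)                               # the two values should be close
```

With $T=5000$ and $N=200$ the two printed numbers agree to about two decimal places, in line with the sequential and joint limits established above.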
7.3 Proof of Theorem 1
Proof. Using Lemma 5, we obtain
$$
\frac{1}{N^{1/2}}\sum_{i=1}^N\left\{\left[4\hat\sigma_\lambda^2\hat\sigma_{\varepsilon i}^2+\hat V_i\right]^{-1}T\left(\frac{1}{T}\sum_{t=1}^T e_{it}^2-\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T e_{it}^2\right)^2-1\right\}
$$
$$
=\frac{1}{N^{1/2}}\sum_{i=1}^N\left\{\left[4\sigma_\lambda^2\sigma_{\varepsilon i}^2+V_i\right]^{-1}T\left(\frac{1}{T}\sum_{t=1}^T e_{it}^2-\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T e_{it}^2\right)^2-1\right\}+O_p\left(\left(\frac{N}{T}\right)^{1/2}\right).
$$
Next, define $\Psi_i=4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4\left(E\upsilon_{it}^4-1\right)$. Then
$$
\sup_i\Psi_i^{-1}=\left[4\sigma_\lambda^2\inf_i\sigma_{\varepsilon i}^2+\inf_i\sigma_{\varepsilon i}^4\left(E\upsilon_{it}^4-1\right)\right]^{-1}<\infty.
$$
Hence,
$$
\left|\frac{1}{N^{1/2}}\sum_{i=1}^N\Psi_i^{-1}\left(\frac{1}{T^{1/2}}\sum_{t=1}^T\left(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2-\sigma_{\varepsilon i}^2\right)\right)\left(\frac{1}{N}\sum_{i=1}^N\frac{1}{T^{1/2}}\sum_{t=1}^T\left(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2-\sigma_{\varepsilon i}^2\right)\right)\right|
$$
$$
\le\sup_i\Psi_i^{-1}\,\frac{1}{N^{1/2}}\left(\frac{1}{N^{1/2}}\sum_{i=1}^N\frac{1}{T^{1/2}}\sum_{t=1}^T\left(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2-\sigma_{\varepsilon i}^2\right)\right)^2=O_p\left(N^{-1/2}\right)
$$
by virtue of Lemma 6(b). Similarly,
$$
\left|\frac{1}{N^{1/2}}\sum_{i=1}^N\Psi_i^{-1}\,T\left(\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T\left(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2-\sigma_{\varepsilon i}^2\right)\right)^2\right|\le\sup_i\Psi_i^{-1}\,\frac{1}{N^{1/2}}\left(\frac{1}{N^{1/2}}\sum_{i=1}^N\frac{1}{T^{1/2}}\sum_{t=1}^T\left(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2-\sigma_{\varepsilon i}^2\right)\right)^2=O_p\left(N^{-1/2}\right).
$$
Then, by expanding the term $\left(\frac{1}{T}\sum_{t=1}^T e_{it}^2-\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T e_{it}^2\right)^2$ and substituting $\lambda_t+\varepsilon_{it}$ for $e_{it}$, we have, under the null hypothesis that $\sigma_{\varepsilon i}^2=\sigma_\varepsilon^2$ for all $i$,
$$
\frac{1}{N^{1/2}}\sum_{i=1}^N\left\{\left[4\sigma_\lambda^2\sigma_{\varepsilon i}^2+V_i\right]^{-1}T\left(\frac{1}{T}\sum_{t=1}^T e_{it}^2-\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T e_{it}^2\right)^2-1\right\}
$$
$$
=\frac{1}{N^{1/2}}\sum_{i=1}^N\left\{\left[4\sigma_\lambda^2\sigma_{\varepsilon i}^2+V_i\right]^{-1}\left(\frac{1}{T^{1/2}}\sum_{t=1}^T\left(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2-\sigma_{\varepsilon i}^2\right)\right)^2-1\right\}+O_p\left(N^{-1/2}\right).
$$
So
$$
2^{-1/2}N^{-1/2}\sum_{i=1}^N\left\{\left[4\hat\sigma_\lambda^2\hat\sigma_{\varepsilon i}^2+\hat V_i\right]^{-1}T\left(\frac{1}{T}\sum_{t=1}^T e_{it}^2-\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T e_{it}^2\right)^2-1\right\}
$$
$$
=2^{-1/2}N^{-1/2}\sum_{i=1}^N\left\{\left[4\sigma_\lambda^2\sigma_\varepsilon^2+\sigma_\varepsilon^4\left(E\upsilon_{it}^4-1\right)\right]^{-1}\left(\frac{1}{T^{1/2}}\sum_{t=1}^T\left[\left(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2\right)-\sigma_\varepsilon^2\right]\right)^2-1\right\}+O_p\left(\left(\frac{N}{T}\right)^{1/2}\right)+O_p\left(N^{-1/2}\right).
$$
Given $(N,T\to\infty)$ and $N/T\to 0$, it remains to establish that, under the null hypothesis,
$$
2^{-1/2}N^{-1/2}\sum_{i=1}^N\left\{\left[4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4\left(E(\upsilon_{it}^4)-1\right)\right]^{-1}\left(\frac{1}{T^{1/2}}\sum_{t=1}^T\left[\left(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2\right)-\sigma_{\varepsilon i}^2\right]\right)^2-1\right\}\Longrightarrow N(0,1)
$$
as $(N,T\to\infty)$. By Lemma 6(a) and the continuous mapping theorem, we see that, when $T\to\infty$,
$$
\left[4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4\left(E(\upsilon_{it}^4)-1\right)\right]^{-1}\left(\frac{1}{T^{1/2}}\sum_{t=1}^T\left[\left(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2\right)-\sigma_{\varepsilon i}^2\right]\right)^2\Longrightarrow\chi^2_{1,i},
$$
where $\chi^2_{1,i}$ is an i.i.d. chi-square random variable with 1 degree of freedom.
By applying the classical central limit theorem, we obtain
$$
2^{-1/2}N^{-1/2}\sum_{i=1}^N\left\{\left[4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4\left(E(\upsilon_{it}^4)-1\right)\right]^{-1}\left(\frac{1}{T^{1/2}}\sum_{t=1}^T\left[\left(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2\right)-\sigma_{\varepsilon i}^2\right]\right)^2-1\right\}\Longrightarrow N(0,1)
$$
as $(N,T\to\infty)_{seq}$.
The above limit distribution continues to hold under $(N,T\to\infty)$, provided that conditions (i)-(iv) of Lemma 2 are satisfied. Conditions (i) and (ii) can be established in a similar way as in Lemma 6(b). Condition (iv) is trivial to verify because the variable $(\chi^2_{1,i}-1)$ is uniformly integrable. It remains to examine condition (iii). Note that
$$
\limsup_{N,T}\frac{1}{N}\sum_{i=1}^N E\left[\left|\left\{4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4\left[E(\upsilon_{it}^4)-1\right]\right\}^{-1}\left(\frac{1}{T^{1/2}}\sum_{t=1}^T\left[\left(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2\right)-\sigma_{\varepsilon i}^2\right]\right)^2-1\right|\right.
$$
$$
\left.\times\,1\left\{\left|\left\{4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4\left[E(\upsilon_{it}^4)-1\right]\right\}^{-1}\left(\frac{1}{T^{1/2}}\sum_{t=1}^T\left[\left(2\lambda_t\varepsilon_{it}+\varepsilon_{it}^2\right)-\sigma_{\varepsilon i}^2\right]\right)^2-1\right|>N\varepsilon\right\}\right]
$$
$$
\le\left[\inf_i\left\{4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4\left[E(\upsilon_{it}^4)-1\right]\right\}\right]^{-1}\limsup_{N,T}\frac{1}{N}\sum_{i=1}^N E\left[\left\|\left(4\sigma_{\varepsilon i}^2,\ \sigma_{\varepsilon i}^4,\ 4\sigma_{\varepsilon i}^3\right)Z_{iT}\right\|\right.
$$
$$
\left.\times\,1\left\{\left\|\left(4\sigma_{\varepsilon i}^2,\ \sigma_{\varepsilon i}^4,\ 4\sigma_{\varepsilon i}^3\right)Z_{iT}\right\|>N\varepsilon\left\{4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4\left[E(\upsilon_{it}^4)-1\right]\right\}\right\}\right],
$$
where
$$
Z_{iT}=\begin{pmatrix}\left(\frac{1}{T^{1/2}}\sum_{t=1}^T\lambda_t\upsilon_{it}\right)^2-\sigma_\lambda^2\\[1.5ex]\left(\frac{1}{T^{1/2}}\sum_{t=1}^T(\upsilon_{it}^2-1)\right)^2-\left[E(\upsilon_{it}^4)-1\right]\\[1.5ex]\left(\frac{1}{T^{1/2}}\sum_{t=1}^T\lambda_t\upsilon_{it}\right)\left(\frac{1}{T^{1/2}}\sum_{t=1}^T(\upsilon_{it}^2-1)\right)\end{pmatrix}.
$$
Consider the indicator function first. Note that
$$
1\left\{\left\|\left(4\sigma_{\varepsilon i}^2,\ \sigma_{\varepsilon i}^4,\ 4\sigma_{\varepsilon i}^3\right)Z_{iT}\right\|>N\varepsilon\left\{4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4\left[E(\upsilon_{it}^4)-1\right]\right\}\right\}
\le 1\left\{\left\|Z_{iT}\right\|>\frac{N\varepsilon\left\{4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4\left[E(\upsilon_{it}^4)-1\right]\right\}}{\sup_i\left\|\left(4\sigma_{\varepsilon i}^2,\ \sigma_{\varepsilon i}^4,\ 4\sigma_{\varepsilon i}^3\right)\right\|}\right\}.\tag{A3}
$$
Next, expression (A3) is bounded above by
$$
\left[\inf_i\left\{4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4\left[E(\upsilon_{it}^4)-1\right]\right\}\right]^{-1}\sup_i\left\|\left(4\sigma_{\varepsilon i}^2,\ \sigma_{\varepsilon i}^4,\ 4\sigma_{\varepsilon i}^3\right)\right\|\times\sup_T E\left[\left\|Z_{iT}\right\|\,1\left\{\left\|Z_{iT}\right\|>\frac{N\varepsilon\left\{4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4\left[E(\upsilon_{it}^4)-1\right]\right\}}{\sup_i\left\|\left(4\sigma_{\varepsilon i}^2,\ \sigma_{\varepsilon i}^4,\ 4\sigma_{\varepsilon i}^3\right)\right\|}\right\}\right].\tag{A4}
$$
By Assumption 2,
$$
\frac{N\varepsilon\left\{4\sigma_\lambda^2\sigma_{\varepsilon i}^2+\sigma_{\varepsilon i}^4\left[E(\upsilon_{it}^4)-1\right]\right\}}{\sup_i\left\|\left(4\sigma_{\varepsilon i}^2,\ \sigma_{\varepsilon i}^4,\ 4\sigma_{\varepsilon i}^3\right)\right\|}\to\infty
$$
as $N\to\infty$. Then expression (A4) $\to 0$ as $(T,N\to\infty)$, since $\left\|Z_{iT}\right\|$ is uniformly integrable in $T$, which follows from
$$
\sup_T E\left\|Z_{iT}\right\|\le\sup_T\left\{E\left|\left(\frac{1}{T^{1/2}}\sum_{t=1}^T\lambda_t\upsilon_{it}\right)^2-\sigma_\lambda^2\right|+E\left|\left(\frac{1}{T^{1/2}}\sum_{t=1}^T(\upsilon_{it}^2-1)\right)^2-\left[E(\upsilon_{it}^4)-1\right]\right|+E\left|\left(\frac{1}{T^{1/2}}\sum_{t=1}^T\lambda_t\upsilon_{it}\right)\left(\frac{1}{T^{1/2}}\sum_{t=1}^T(\upsilon_{it}^2-1)\right)\right|\right\}
$$
$$
\le\sup_T\left\{E\left|\left(\frac{1}{T^{1/2}}\sum_{t=1}^T\lambda_t\upsilon_{it}\right)^2-\sigma_\lambda^2\right|+E\left|\left(\frac{1}{T^{1/2}}\sum_{t=1}^T(\upsilon_{it}^2-1)\right)^2-\left[E(\upsilon_{it}^4)-1\right]\right|+\left[E\left(\frac{1}{T^{1/2}}\sum_{t=1}^T\lambda_t\upsilon_{it}\right)^2\right]^{1/2}\left[E\left(\frac{1}{T^{1/2}}\sum_{t=1}^T(\upsilon_{it}^2-1)\right)^2\right]^{1/2}\right\}<\infty
$$
by virtue of Lemma 3(c), 3(d) and 3(e).
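The central limit step above reduces, term by term, to a sum of centered chi-square(1) variables: $2^{-1/2}N^{-1/2}\sum_i(\chi^2_{1,i}-1)\Longrightarrow N(0,1)$. A minimal numerical illustration of that step (our own sketch, not part of the paper's proof):

```python
import numpy as np

# If each standardized squared term behaves like an independent chi-square(1)
# variable, then 2^{-1/2} N^{-1/2} * sum_i (chi2_i - 1) is approximately
# standard normal (E[chi2(1)] = 1, Var[chi2(1)] = 2).
rng = np.random.default_rng(1)
N, reps = 500, 2000

chi2 = rng.chisquare(df=1, size=(reps, N))
z = (chi2 - 1.0).sum(axis=1) / np.sqrt(2.0 * N)

print(z.mean(), z.std())   # close to 0 and 1, respectively
```

Across the 2000 replications the standardized sum has mean near 0 and standard deviation near 1, matching the $N(0,1)$ limit used to calibrate the Z test.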
References
Aiolfi, M. and A. Timmermann (2006). Persistence in forecasting performance and conditional combination strategies. Journal of Econometrics
135, 31-53.
Baltagi, B.H. (1988). An alternative heteroscedastic error component model.
Econometric Theory 4, 349-350.
Baltagi, B.H., G. Bresson and A. Pirotte (2006). Joint LM test for heteroskedasticity in a one way error component model. Journal of Econometrics 134, 401-417.
Bates, J.M. and C.W.J. Granger (1969). The combination of forecasts. Operations Research Quarterly 20, 451-468.
Billingsley, P. (1999). Convergence of Probability Measures. John Wiley &
Sons Inc., New York.
Boero, G., J. Smith and K.F. Wallis (2008). Uncertainty and disagreement in
economic prediction: the Bank of England Survey of External Forecasters.
Economic Journal 118, 1107-1127.
Bomberger, W.A. (1996). Disagreement as a measure of uncertainty. Journal
of Money, Credit and Banking 28, 381-392.
Breusch, T.S. and A.R. Pagan (1979). Simple test for heteroscedasticity and
random coefficient variation. Econometrica 47, 1287-1294.
Buckland, S.T., K.P. Burnham, and N.H. Augustin (1997). Model Selection:
An Integral Part of Inference. Biometrics 53, 603-618.
Capistrán, C. and A. Timmermann (2009). Forecast combination with entry
and exit of experts. Journal of Business and Economic Statistics 27, 429-440.
Davidson, J. (1994). Stochastic Limit Theory. Oxford University Press, Oxford.
Davies, A. and K. Lahiri (1995). A new framework for analyzing survey
forecasts using three-dimensional panel data. Journal of Econometrics 68,
205-227.
Diebold, F.X., A.S. Tay and K.F. Wallis (1999). Evaluating density forecasts
of inflation: The survey of professional forecasters. In Engle, R.F. and
White, H. (eds.), Cointegration, Causality, and Forecasting: A Festschrift
in Honour of Clive W.J. Granger. Oxford University Press, Oxford.
Draper, D. (1995). Assessment and propagation of model uncertainty. Journal of the Royal Statistical Society B 57: 45-97.
Elliott, G. (2011a). Forecast Combination when Outcomes are Difficult to
Predict. UCSD Department of Economics Working Paper May 10, 2011.
Elliott, G. (2011b). Averaging and the Optimal Combination of Forecasts.
UCSD Department of Economics Working Paper Sept. 27, 2011.
Elliott, G. and A. Timmermann (2004). Optimal Forecast Combination under General Loss Functions and Forecast Error Distributions. Journal of
Econometrics 122, 47-79.
Elliott, G. and A. Timmermann (2005). Optimal forecast combination under
regime switching. International Economic Review 46, 1081-1102.
Engelberg, J., C.F. Manski and J. Williams (2011). Assessing the Temporal
Variation of Macroeconomic Forecasts by a Panel of Changing Composition.
Journal of Applied Econometrics 26, 1059-1078.
Engle, R.F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50:
987-1008.
Engle, R.F. (1983). Estimates of the variance of U.S. inflation based upon
the ARCH model. Journal of Money, Credit and Banking 15, 286-301.
Genre, V., G. Kenny, A. Meyler and A. Timmermann (2013). Combining
expert forecasts: can anything beat the simple average? International
Journal of Forecasting 29, 108-121
Geweke, J. and C. Whiteman (2006). Bayesian forecasting. In Elliott, G.,
Granger, C.W.J. and Timmermann, A. (eds.), Handbook of Economic
Forecasting. Elsevier, 4-80.
Giordani, P. and P. Söderlind (2003). Inflation forecast uncertainty. European
Economic Review 47, 1037-1059.
Granger, C.W.J. and Y. Jeon (2004). Thick modeling. Economic Modelling
21, 323-343.
Hall, P. and C.C. Heyde (1980). Martingale Limit Theory and its Application.
Academic Press, New York.
Hastie, T., R. Tibshirani and J. Friedman (2009), The Elements of Statistical
Learning: Data Mining, Inference, and Prediction. Springer, 2nd edition.
Hendry, D.F. and M.P. Clements (2004), Pooling of Forecasts. Econometrics
Journal 7, 1-31.
Hibon, M. and T. Evgeniou (2005). To combine or not to combine: Selecting
among forecasts and their combinations. International Journal of Forecasting 21, 15-24.
Hoeting, J.A., D. Madigan, A.E. Raftery and C.T. Volinsky, (1999). Bayesian
Model Averaging: A Tutorial (with discussion). Statistical Science 14, 382-417.
Holly, A. and L. Gardiol (2000). A score test for individual heteroscedasticity
in a one way error component model. In Krishnakumar, J. and E. Ronchetti
(eds.) Panel Data Econometrics: Future Directions, North-Holland, Amsterdam.
Hsiao, C. and M.H. Pesaran (2008). Random coefficient panel data models.
In Matyas, L. and P. Sevestre (eds.), The Econometrics of Panel Data:
Fundamentals and Recent Developments in Theory and Practice. Springer,
3rd edition.
Hsiao, C. and S.K. Wan (2011), Comparison of Forecasting methods with an
application to predicting excess equity premium. Mathematics and Computers in Simulation 81(7), 1235-1246.
Issler, J.V. and L.R. Lima (2009). A Panel data approach to economic forecasting: the bias-corrected average forecast. Journal of Econometrics 152,
153-164.
Kleiber, W., A.E. Raftery, and T. Gneiting (2011). Geostatistical Model
Averaging for Locally Calibrated Probabilistic Quantitative Precipitation
Forecasting. Journal of the American Statistical Association 106 (496),
1291-1303.
Koop, G. and S. Potter (2004). Forecasting in Dynamic Factor Models using
Bayesian Model Averaging. Econometrics Journal 7, 550-565.
Lahiri, K. and F. Liu (2006). Modeling multi-period inflation uncertainty
using a panel of density forecasts. Journal of Applied Econometrics 21,
1199-1219.
Lahiri, K. and X. Sheng (2008). Evolution of forecast disagreement in a
Bayesian learning model. Journal of Econometrics 144, 325-340.
Lahiri, K. and X. Sheng (2010). Measuring forecast uncertainty by disagreement: the missing link. Journal of Applied Econometrics 25, 514-538.
Lahiri, K., C. Teigland and M. Zaporowski (1988). Interest Rates and the
Subjective Probability Distribution of Inflation Forecasts. Journal of Money, Credit, and Banking 20, 233-248.
Magnus, J.R. (1982). Multivariate error components analysis of linear and
nonlinear regression models by maximum likelihood. Journal of Econometrics 19, 239-285.
Makridakis, S., R.M. Hogarth and A. Gaba (2009). Forecasting and uncertainty in the economic and business world. International Journal of Forecasting 25, 794-812.
Manski, C.F. (2011). Interpreting and combining heterogeneous survey forecasts. In Clements, M.P. and Hendry, D.F. (eds.), Oxford Handbook of
Economic Forecasting, 457-472. Oxford University Press.
Mazodier, P. and A. Trognon (1978). Heteroscedasticity and stratification in
error components models. Annales de l'INSEE 30-31, 451-482.
McNees, S.K. (1992). The Uses and Abuses of “Consensus” Forecasts. Journal
of Forecasting 11, 703-710.
Mishkin, F.S. (2007). Inflation dynamics. Speech delivered at the Annual Macro Conference, Federal Reserve Bank of San Francisco, California
(March 23).
Palm, F.C. and A. Zellner (1992). To combine or not to combine? Issues of
Combining Forecasts. Journal of Forecasting 11, 687-701.
Pesaran, M.H. and T. Yamagata (2008). Testing slope homogeneity in large
panels. Journal of Econometrics 142, 50-93.
Phillips, P.C.B. and H.R. Moon (1999). Linear regression limit theory for
nonstationary panel data. Econometrica 67, 1057-1111.
Rao, P.S.R.S., J. Kaplan and W.C. Cochran (1981). Estimators for the oneway random effects model with unequal error variances. Journal of the
American Statistical Association 76, 89-97.
Reifschneider, D. and P. Tulip (2007). Gauging the uncertainty of the economic outlook from historical forecasting errors. Federal Reserve Board,
Finance and Economics Discussion Series, No. 60.
Sancetta, A. (2010). Recursive Forecast Combination for Dependent Heterogeneous Data. Econometric Theory 26, 598-631.
Smith, J. and K.F. Wallis (2009). A Simple Explanation of the Forecast
Combination Puzzle. Oxford Bulletin of Economics and Statistics 71, 331-355.
Stock, J.H. and M.W. Watson (2007). Why has U.S. inflation become harder
to forecast? Journal of Money, Credit and Banking 39, 3-33.
Swamy, P. (1970). Efficient inference in a random coefficient regression model. Econometrica 38, 311-323.
Tetlow, R.J. (2006). Real-time model uncertainty in the United States: “Robust” Policies put to the test. Federal Reserve Board Mimeo.
Timmermann, A. (2006). Forecast combinations. In Elliott, G., C.W.J.
Granger and A. Timmermann (eds.), Handbook of Economic Forecasting.
Elsevier, 135-196.
Tulip, P. and S. Wallace (2012). Estimates of Uncertainty around the RBA’s
Forecasts. Economic Research Department, Reserve Bank of Australia Discussion Paper 2012-07.
Verbon, H.A.A. (1980). Testing for heteroscedasticity in a model of seemingly unrelated regression equations with variance components (SUREVC),
Economics Letters 5, 149-153.
Wallis, K.F. (2005). Combining Density and Interval Forecasts: A Modest
proposal. Oxford Bulletin of Economics and Statistics 67, 983-994.
Wansbeek, T.J. (1989). An alternative heteroscedastic error components
model. Econometric Theory 5, 326.
Wei, X., Y. Yang, (2012). Robust forecast combinations. Journal of Econometrics 166, 224-236.
Wilson, E.B. and M.M. Hilferty (1931). The distribution of chi-square. Proceedings of the National Academy of Sciences 17, 684-688.
Winkler, R.L. (1989). Combining Forecasts: A philosophical basis and some
current issues. International Journal of Forecasting 5, 605-609.
Woodford, M. (2005). Central bank communication and policy effectiveness.
Proceedings, Federal Reserve Bank of Kansas City, issue Aug, 399-474.
Yang, Y. (2004). Combining Forecasting Procedures: Some Theoretical Results. Econometric Theory 20, 176-222.
Zarnowitz, V. (1984). The accuracy of individual and group forecasts from
business outlook surveys. Journal of Forecasting 3, 11-26.
Zarnowitz, V. and L.A. Lambros (1987). Consensus and uncertainty in economic prediction. Journal of Political Economy 95, 591-621.
Table 1: Size of Z test

              σω²=0.05, σε²=0.01    σω²=0.1, σε²=0.01     σω²=0.2, σε²=0.2      σω²=0.6, σε²=1.5      σω²=3, σε²=1
              N=20  N=40  N=100     N=20  N=40  N=100     N=20  N=40  N=100     N=20  N=40  N=100     N=20  N=40  N=100
ρ=0    T=30   0.044 0.039 0.050     0.047 0.039 0.047     0.051 0.044 0.051     0.057 0.063 0.076     0.041 0.044 0.040
       T=60   0.039 0.050 0.054     0.045 0.051 0.049     0.046 0.058 0.043     0.046 0.053 0.054     0.047 0.044 0.049
       T=120  0.049 0.041 0.051     0.048 0.044 0.047     0.053 0.046 0.050     0.054 0.052 0.051     0.051 0.053 0.045
ρ=0.4  T=30   0.044 0.042 0.045     0.042 0.045 0.041     0.050 0.052 0.055     0.059 0.058 0.068     0.048 0.044 0.046
       T=60   0.047 0.041 0.049     0.044 0.040 0.049     0.049 0.047 0.055     0.054 0.055 0.048     0.045 0.043 0.050
       T=120  0.049 0.052 0.043     0.045 0.056 0.051     0.055 0.055 0.043     0.040 0.052 0.053     0.047 0.044 0.043

Note: Rejection rates of the Z test under H0: σεi² = σε² for almost all i, at the 5% nominal level, based on 2000 replications. The model is eit = λt + εit, where λt = ρλt−1 + ωt, λ0 ∼ U(−σω√(3/(1−ρ²)), σω√(3/(1−ρ²))), ωt ∼ i.i.d. U(−√3 σω, √3 σω) for all t, and εit ∼ i.i.d. U(−√3 σε, √3 σε) for all (i, t).
Table 2: Power of Z test

Panel A
              σω²=0.05, σε²=0.01    σω²=0.1, σε²=0.01     σω²=0.2, σε²=0.2      σω²=0.6, σε²=1.5      σω²=3, σε²=1
              N=20  N=40  N=100     N=20  N=40  N=100     N=20  N=40  N=100     N=20  N=40  N=100     N=20  N=40  N=100
ρ=0    T=30   0.047 0.175 0.125     0.050 0.096 0.075     0.087 0.881 0.808     0.153 0.999 0.998     0.066 0.344 0.665
       T=60   0.071 0.403 0.282     0.051 0.162 0.125     0.141 0.999 0.992     0.281 1.000 1.000     0.080 0.703 0.971
       T=120  0.088 0.798 0.694     0.065 0.424 0.296     0.298 1.000 1.000     0.596 1.000 1.000     0.121 0.982 1.000
ρ=0.4  T=30   0.055 0.147 0.105     0.046 0.071 0.066     0.089 0.805 0.717     0.149 0.996 0.990     0.057 0.294 0.565
       T=60   0.052 0.345 0.232     0.045 0.143 0.119     0.131 0.990 0.980     0.255 1.000 1.000     0.065 0.613 0.927
       T=120  0.086 0.716 0.590     0.056 0.343 0.245     0.248 1.000 1.000     0.539 1.000 1.000     0.102 0.948 1.000

Panel B
ρ=0    T=30   0.079 0.033 0.373     0.056 0.136 0.158     0.356 0.991 0.999     0.710 1.000 1.000     0.133 0.550 0.243
       T=60   0.149 0.658 0.782     0.088 0.318 0.347     0.661 1.000 1.000     0.962 1.000 1.000     0.241 0.921 0.550
       T=120  0.315 0.968 0.993     0.149 0.661 0.776     0.962 1.000 1.000     1.000 1.000 1.000     0.507 0.999 0.947
ρ=0.4  T=30   0.078 0.245 0.304     0.061 0.130 0.128     0.307 0.954 0.987     0.643 1.000 1.000     0.113 0.466 0.193
       T=60   0.128 0.563 0.671     0.079 0.253 0.274     0.593 1.000 1.000     0.930 1.000 1.000     0.202 0.837 0.474
       T=120  0.250 0.928 0.969     0.135 0.552 0.663     0.924 1.000 1.000     1.000 1.000 1.000     0.437 0.995 0.877

Note: Rejection rates of the Z test at the 5% nominal level, based on 2000 replications. The model is eit = λt + εit, where λt = ρλt−1 + ωt, λ0 ∼ U(−σω√(3/(1−ρ²)), σω√(3/(1−ρ²))), ωt ∼ i.i.d. U(−√3 σω, √3 σω) for all t, and εit ∼ i.i.d. U(−√3 σεi, √3 σεi) for all (i, t). In Panel A, σεi² = σε² for 80% of the samples, and σεi² ≠ σε² for the remaining 20%, where σεi² ∼ U(0.1σω², 1.9σω²). In Panel B, σεi² = σε² for 60% of the samples, and σεi² ≠ σε² for the remaining 40%, where σεi² ∼ U(0.1σω², 1.9σω²).
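The Monte Carlo design described in the notes to Tables 1 and 2 can be sketched as follows. This is a hypothetical reimplementation for illustration only (the function name and the particular parameter values are ours); the uniform bounds are chosen so that the innovations have the stated variances.

```python
import numpy as np

# e_it = lambda_t + eps_it, lambda_t = rho * lambda_{t-1} + omega_t, with
# uniform innovations: U(-a, a) has variance a^2 / 3, so a = sqrt(3) * s.d.
# lambda_0 is drawn from the stationary distribution's matching uniform.
def simulate_panel(N, T, rho, sigma_omega, sigma_eps, rng):
    half_omega = np.sqrt(3.0) * sigma_omega
    half_eps = np.sqrt(3.0) * sigma_eps
    half_lam0 = sigma_omega * np.sqrt(3.0 / (1.0 - rho**2))  # stationary start
    lam = np.empty(T)
    lam_prev = rng.uniform(-half_lam0, half_lam0)
    for t in range(T):
        lam_prev = rho * lam_prev + rng.uniform(-half_omega, half_omega)
        lam[t] = lam_prev
    eps = rng.uniform(-half_eps, half_eps, size=(N, T))
    return lam[None, :] + eps                                # forecast errors e_it

rng = np.random.default_rng(2)
# one design cell from the tables: sigma_omega^2 = 0.2, sigma_eps^2 = 0.2
e = simulate_panel(N=40, T=120, rho=0.4,
                   sigma_omega=np.sqrt(0.2), sigma_eps=np.sqrt(0.2), rng=rng)
print(e.shape)
```

The overall variance of the generated errors is approximately σω²/(1−ρ²) + σε², the common-plus-idiosyncratic decomposition that the Z test exploits.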
TABLES

Table 3: Measures of aggregate uncertainty in inflation forecasts

Horizon     RMSE_AF  RMSE_RT  RMSE_LS  σλ²   σ²    Z
1Q ahead    0.19     0.18     0.20     0.04  0.00  22.37*
2Q ahead    0.23     0.26     0.28     0.05  0.02  11.95*
3Q ahead    0.29     0.38     0.45     0.08  0.12  7.36*
4Q ahead    0.45     0.58     0.67     0.20  0.25  11.29*
5Q ahead    0.65     0.81     0.91     0.41  0.41  9.80*
6Q ahead    0.80     0.92     1.02     0.63  0.41  8.12*
7Q ahead    0.85     1.02     1.15     0.71  0.61  5.89*
8Q ahead    0.88     1.09     1.30     0.74  0.94  8.23*

Note:
(a) The inflation rate is measured as the annual-average over annual-average percent change of the GDP deflator. The actual inflation rate for 1982-2011 is taken from the first quarterly release of the Federal Reserve Bank of Philadelphia "real-time" data set;
(b) The inflation forecasts used in this study are taken from the Survey of Professional Forecasters from 1981:Q3 until 2011:Q4;
(c) * indicates significance at the 1% level.
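The two headline measures compared in Tables 3-6 can be computed from any N × T panel of forecast errors. The sketch below reflects our reading of the definitions (RMSE_AF is the RMSE of the average forecast; RMSE_RT is the average of the individual RMSEs) and uses purely illustrative simulated data, not the SPF panel.

```python
import numpy as np

def rmse_af(e):
    """RMSE of the average forecast: errors of the mean forecast across targets."""
    return float(np.sqrt(np.mean(e.mean(axis=0) ** 2)))

def rmse_rt(e):
    """Benchmark uncertainty: average of individual root mean squared errors."""
    return float(np.mean(np.sqrt(np.mean(e ** 2, axis=1))))

# Illustrative panel: common shocks plus idiosyncratic errors (our parameters).
rng = np.random.default_rng(3)
lam = rng.normal(0.0, 0.5, size=200)                  # common shocks over targets
e = lam[None, :] + rng.normal(0.0, 0.7, size=(30, 200))

print(rmse_af(e), rmse_rt(e))   # averaging shrinks idiosyncratic noise, so AF < RT
```

Because averaging across forecasters cancels the idiosyncratic components, RMSE_AF understates the benchmark RMSE_RT, which is the pattern visible at almost every horizon in the tables.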
Table 4: Measures of aggregate uncertainty in output growth forecasts

Horizon     RMSE_AF  RMSE_RT  RMSE_LS  σλ²   σ²    Z
1Q ahead    0.29     0.25     0.30     0.09  0.00  23.82*
2Q ahead    0.37     0.39     0.41     0.14  0.04  5.41*
3Q ahead    0.52     0.59     0.62     0.27  0.12  5.40*
4Q ahead    0.88     0.97     1.03     0.77  0.31  7.30*
5Q ahead    1.20     1.29     1.36     1.43  0.61  4.06*
6Q ahead    1.59     1.63     1.77     2.51  1.00  14.25*
7Q ahead    1.54     1.60     1.69     2.35  0.63  9.97*
8Q ahead    1.67     1.73     1.89     2.77  0.97  9.02*

Note:
(a) The output growth is measured as the annual-average over annual-average percent change of real GDP. The actual output growth rate for 1982-2011 is taken from the first quarterly release of the Federal Reserve Bank of Philadelphia "real-time" data set;
(b) The output growth forecasts used in this study are taken from the Survey of Professional Forecasters from 1981:Q3 until 2011:Q4;
(c) * indicates significance at the 1% level.
Table 5: Measures of aggregate uncertainty from a pseudo-balanced panel of inflation forecasts

Horizon     RMSE_AF  RMSE_RT  RMSE_LS
1Q ahead    0.18     0.19     0.19
2Q ahead    0.23     0.27     0.29
3Q ahead    0.30     0.43     0.48
4Q ahead    0.43     0.65     0.72
5Q ahead    0.63     0.90     0.99
6Q ahead    0.78     0.98     1.04
7Q ahead    0.83     1.10     1.20
8Q ahead    0.85     1.28     1.39

Note:
(a) The inflation rate is measured as the annual-average over annual-average percent change of the GDP deflator. The actual inflation rate for 1982-2011 is taken from the first quarterly release of the Federal Reserve Bank of Philadelphia "real-time" data set;
(b) The inflation forecasts used in this study are taken from the Survey of Professional Forecasters from 1981:Q3 until 2011:Q4;
(c) The pseudo-balanced panel includes 21 forecasters, with each of them representing the minimum, 5th percentile, 10th percentile, ..., 95th percentile and the maximum of individual forecasts, respectively.
Table 6: Measures of aggregate uncertainty from a pseudo-balanced panel of output growth forecasts

Horizon     RMSE_AF  RMSE_RT  RMSE_LS
1Q ahead    0.26     0.28     0.28
2Q ahead    0.36     0.41     0.42
3Q ahead    0.54     0.64     0.66
4Q ahead    0.92     1.08     1.11
5Q ahead    1.18     1.36     1.39
6Q ahead    1.59     1.79     1.82
7Q ahead    1.53     1.72     1.74
8Q ahead    1.61     1.85     1.93

Note:
(a) The output growth is measured as the annual-average over annual-average percent change of real GDP. The actual output growth rate for 1982-2011 is taken from the first quarterly release of the Federal Reserve Bank of Philadelphia "real-time" data set;
(b) The output growth forecasts used in this study are taken from the Survey of Professional Forecasters from 1981:Q3 until 2011:Q4;
(c) The pseudo-balanced panel includes 21 forecasters, with each of them representing the minimum, 5th percentile, 10th percentile, ..., 95th percentile and the maximum of individual forecasts, respectively.
Table 7: Historical GDP forecast uncertainty due to the variance of aggregate shocks, idiosyncratic errors, and systematic individual biases

Horizon     Aggregate shocks  Idiosyncratic errors  Systematic individual biases
1Q ahead    0.95              0.04                  0.01
2Q ahead    0.80              0.16                  0.04
3Q ahead    0.72              0.24                  0.04
4Q ahead    0.73              0.20                  0.07
5Q ahead    0.78              0.18                  0.05
6Q ahead    0.81              0.15                  0.04
7Q ahead    0.82              0.14                  0.04
8Q ahead    0.78              0.18                  0.03
Table 8: Historical inflation forecast uncertainty due to the variance of aggregate shocks, idiosyncratic errors, and systematic individual biases

Horizon     Aggregate shocks  Idiosyncratic errors  Systematic individual biases
1Q ahead    0.90              0.08                  0.02
2Q ahead    0.70              0.26                  0.04
3Q ahead    0.42              0.53                  0.05
4Q ahead    0.45              0.47                  0.08
5Q ahead    0.52              0.41                  0.07
6Q ahead    0.62              0.32                  0.06
7Q ahead    0.55              0.39                  0.05
8Q ahead    0.46              0.50                  0.04