FORECAST ERRORS: BALANCING THE RISKS AND COSTS
OF BEING WRONG
Qiang Xu1
Hilke Kayser
Lynn Holland
March 2007
1 Chief Econometrician and Director of Research
Mailing Address:
New York State Executive Department
Division of the Budget
Room 135 – State Capitol
Albany, NY 12224, USA
Email Address: bdxu@budget.state.ny.us
Phone: (518) 474-1766
Forecast Errors: Balancing the Risks and Costs of Being Wrong
In practice, there is no such thing as a perfect forecast. Forecast errors can arise
from various sources, including an incorrect model specification, errors in the data,
incorrect assumptions regarding the future values of explanatory variables, and shocks or
events that, by nature, cannot be predicted at the time when the forecast was made. Thus,
even under a correct model specification and correct assumptions, forecasts will differ
from the actual values. Forecast errors are typically assumed to be drawn from a zero-mean process, such as white noise. Errors of that nature are the best one can hope for,
since no model can presume to capture all of the factors that affect the variable under
consideration.
Though the model specification may be correct, the analyst typically works with
sample data rather than population data, making parameter estimates subject to sampling
error. However, when a model is solved to produce a forecast, the model coefficient
estimates are treated as fixed numbers, when in fact, they are themselves random
variables. The forecaster can only hope to estimate the "true" model parameters within a
statistically acceptable margin of error. For example, though the true parameter value
may be 0.85, an estimate of 0.75 may be judged to be statistically significant. Indeed,
any value between 0.75 and 0.85 might reasonably be expected to pass the test of
statistical significance. But when either 0.75 or 0.80 is used instead of 0.85 to predict
future values, the forecast outcome will be different.
In light of the many sources of risk, the forecaster must be prepared to make an
assessment of the risks to the forecast, and evaluate the costs associated with those risks.
After performing such an assessment, the forecaster may want to implement a feedback
mechanism from the risk assessment back to the forecast. If the risk that the forecast will
be too high is assessed to be greater than the risk of being too low, then the analyst may
want to lower the forecast in order to restore balance. For example, it is unlikely that an
econometric model can adequately capture the impact of geopolitical turmoil on oil
prices. Consequently, when there is a war going on in the Middle East, the probability
that actual oil prices will rise above the model forecast may be greater than the
probability that oil prices will be below. In such cases, the analyst may not only want to
make explicit the asymmetric nature of the risks, but may also feel justified in making an
upward adjustment to the model forecast.
Even when the forecast risks are balanced, the costs associated with forecast errors
may not be. In many situations, the cost of an overestimate may outweigh the cost of an
underestimate, and, in such cases, the analyst may feel justified in making a downward
adjustment to the model forecast in order to balance the costs. In estimating budgetary
revenues and spending, the cost of overestimating tax receipts may include the risk of a
fiscal crisis, while no such risk is inherent in underestimation. These concerns lead to a
discussion of the forecaster's "loss function" and an evaluation of the costs of being
wrong.
Section 1 of this chapter introduces various measures of forecast error, including the
notion of symmetric vs. asymmetric error distribution. Section 2 presents methods for
assessing forecast risks (prediction intervals and density forecasts) and for presenting
those risks to other interested parties. These methods include Monte Carlo simulation
and the construction of fan charts. For simplicity of exposition, sections 1 and 2 abstract
from the forecaster's loss function, implicitly assuming that the forecaster's loss is simply
proportional to the absolute value of the error itself. Section 3 introduces more general
forms for the forecaster's loss function and discusses the choice of an optimal forecast
under a given loss function and a given distribution of risks. Section 4 discusses methods
for choosing among a menu of forecasts given a particular loss function.
1. Measures of Forecast Error
There are a number of statistics that are commonly used to measure forecast error.
Suppose $Y_t$ is an observed time series and one is interested in forecasting its future values $H$ periods ahead. Define $e_{t+h,t}$ as the time $t+h$ forecast error for a forecast made at time $t$, such that

$$e_{t+h,t} = Y_{t+h} - \hat{Y}_{t+h,t}$$

where $Y_{t+h}$ is the actual value of $Y$ at time $t+h$ and $\hat{Y}_{t+h,t}$ is the forecast for $Y_{t+h}$ made at time $t$. Similarly, we define the percentage error as

$$p_{t+h,t} = \frac{Y_{t+h} - \hat{Y}_{t+h,t}}{Y_{t+h}}.$$
In addition, there are various statistics that summarize the model's overall fit. For a given value of $h$, these include the mean error,

$$\text{ME} = \frac{1}{T}\sum_{t=1}^{T} e_{t+h,t}$$

which can be interpreted as a measure of bias. An ME greater than zero indicates that the model has a tendency to underestimate. All else being equal, the smaller the ME in absolute value, the better the model. We can also define the error variance,

$$\text{EV} = \frac{1}{T}\sum_{t=1}^{T} \left(e_{t+h,t} - \text{ME}\right)^2$$

which measures the dispersion of the forecast errors. Squaring the errors amplifies the penalty for large errors and does not permit positive and negative errors to cancel each other out. All else being equal, the smaller the EV, the better the model. Popular measures also include the mean squared error,

$$\text{MSE} = \frac{1}{T}\sum_{t=1}^{T} e_{t+h,t}^2$$
and the mean squared percent error,

$$\text{MSPE} = \frac{1}{T}\sum_{t=1}^{T} p_{t+h,t}^2.$$

Often the square roots of these measures are used to preserve units, yielding the root mean squared error,

$$\text{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T} e_{t+h,t}^2}$$

and the root mean squared percent error,

$$\text{RMSPE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T} p_{t+h,t}^2}.$$

Some less popular but nevertheless common accuracy measures include the mean absolute error,

$$\text{MAE} = \frac{1}{T}\sum_{t=1}^{T} \left|e_{t+h,t}\right|$$

and the mean absolute percent error,

$$\text{MAPE} = \frac{1}{T}\sum_{t=1}^{T} \left|p_{t+h,t}\right|.$$
It is clear that the length of the forecast horizon, H, is of crucial importance, as longer-term forecasts tend to have larger errors than nearer-term forecasts.
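The accuracy measures above are straightforward to compute. The sketch below implements them with NumPy; the function name and sample data are illustrative, not from the chapter:

```python
import numpy as np

def error_measures(actual, forecast):
    """Compute the forecast-accuracy statistics defined above."""
    e = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    p = e / np.asarray(actual, dtype=float)   # percentage errors
    me = e.mean()
    return {
        "ME":   me,                           # mean error (bias)
        "EV":   ((e - me) ** 2).mean(),       # error variance
        "MSE":  (e ** 2).mean(),
        "RMSE": np.sqrt((e ** 2).mean()),
        "MAE":  np.abs(e).mean(),
        "MAPE": np.abs(p).mean(),
    }

measures = error_measures([100.0, 110.0, 120.0], [98.0, 113.0, 120.0])
```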
2. Risk Assessment: Monte Carlo Simulation and Fan Charts
Since no forecast can be expected to be 100 percent accurate, risk assessment
involving measures of expected forecast accuracy has become increasingly popular. The
construction of such measures is usually simulation-based and the availability of ample
computing power has made these computations more widely feasible. The most common
constructs for assessing risk are prediction intervals and density forecasts. A prediction
interval supplements a point forecast with a range and a probability that the actual value
will fall within that range. A density forecast goes one step further by assigning varying
degrees of likelihood to particular values as one moves further from the point forecast.
The basic tool for constructing these measures is Monte Carlo simulation.
Monte Carlo Simulation
Applications of Monte Carlo methods have enjoyed a flowering in the econometrics
literature. In these studies, data are generated using computer-based pseudorandom number generators, i.e., computer programs that generate sequences of values that appear to be strings of draws from a specified probability distribution. For a given set of three values $\{p, q, r\}$, the method of generation usually proceeds as follows:
0. Initialize the seed.
1. Update the seed according to $\text{seed}_j = f(\text{seed}_{j-1}, p, q)$.
2. Calculate $x_j = \text{seed}_j / r$.
3. Perform a distribution-specific transformation on $x_j$ if necessary (if the desired distribution is something other than the standard uniform distribution, U[0,1]); then move $x_j$ into memory.
4. Return to step 1.
For example, the following simple pseudorandom number generator has been widely used for $x \sim U[0,1]$:

0. Initialize $\text{seed}_0$.
1. Update the seed according to $\text{seed}_j = \text{mod}(p \cdot \text{seed}_{j-1}, q)$.
2. Calculate $x_j = \text{seed}_j / r$.
3. Move $x_j$ into memory.
4. Return to step 1.
The modulus function, mod(a, b), is the integer remainder after a is divided by b; for example, mod(11, 3) = 2. The generator will produce several million pseudorandom draws from U[0,1]. For example, suppose the seed is initialized at 1234567.0 and let $\{p, q, r\} = \{16807.0,\ 2147483648.0,\ 2147483655.0\}$. Then the first ten values produced by this random number generator are:
Iteration    SEED            X
0            1234567         --
1            1422014737      0.662177
2            456166167       0.212419
3            268145409       0.124865
4            1299195559      0.604985
5            2113510897      0.984180
6            250624311       0.116706
7            1027361249      0.478402
8            1091982023      0.508494
9            546604753       0.254533
10           1998521175      0.930634
The above sample is drawn from a standard uniform, or U[0,1], population. Given the initial seed, $x_j$ is ultimately a function of $x_{j-1}$, so the sequence is essentially a deterministic difference equation. In most cases, the result at step 2 is a pseudo draw from the continuous uniform distribution on the range zero to one.
For a given model specification and a given set of exogenous inputs, Monte Carlo
simulation studies evaluate the risk to the forecast due to variation in the dependent
variable that cannot be explained by the model, as well as the random variation in the
model parameters. By assumption, the model errors are considered to be draws from a
normally distributed random variable with mean zero. For purposes of the simulation,
the model parameters are also considered to be random variables that are distributed as
multivariate normal. The standard deviation of the regression errors, and the means and
standard deviations of the parameter distribution are derived from the regression analysis.
In order to simulate values for the dependent variable, a random number generator is
used to generate a value for the model error and values for the parameters from each of
the above probability distributions. Based on these draws and values from the input data
set, which for purposes of the simulation is assumed to be fixed, the model is solved for
the dependent variable. This "experiment" is typically repeated thousands of times,
yielding thousands of simulated values for each observation of the dependent variable.
The means and standard deviations of these simulated values can be used to construct a
prediction interval and provide the starting point for creating a density forecast typically
portrayed by a fan chart.
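The simulation loop just described can be sketched for a simple linear regression model. All numbers below (coefficient estimates, covariance matrix, regression standard error, input values) are hypothetical placeholders, not DOB estimates:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical regression output for y = b0 + b1*x + e
beta_hat = np.array([1.0, 0.5])              # coefficient estimates
cov_beta = np.array([[0.04, 0.0],
                     [0.0,  0.01]])          # coefficient covariance matrix
sigma_e = 0.8                                # regression standard error
x_future = np.array([1.0, 2.0])              # [constant, x] for the forecast period

n_sims = 10_000
betas = rng.multivariate_normal(beta_hat, cov_beta, size=n_sims)  # parameter draws
shocks = rng.normal(0.0, sigma_e, size=n_sims)                    # model-error draws
y_sim = betas @ x_future + shocks            # simulated forecast distribution

lo, hi = np.percentile(y_sim, [5, 95])       # 90 percent prediction interval
```

The means and standard deviations of `y_sim` across the draws play the role described in the text: they anchor the prediction interval and the density forecast.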
Figure 1
Fan Chart for Partnership/S Corporation Income Growth
(90 percent prediction intervals; percent change, 1991-2007; series shown: Monte Carlo Mean and DOB Forecast)
Note: With 90 percent probability, actual growth will fall into the shaded region. Bands represent 5 percent probability regions.
Source: NYS Department of Taxation and Finance; DOB staff estimates.
Density Forecasts and Fan Charts
Fan charts display prediction intervals as shown in Figure 1. It is estimated that with
90 percent probability, future values will fall into the shaded area of the fan. Each band
within the shaded area reflects a five percent probability region. The chart "fans out" over time to reflect the increasing uncertainty and growing risk as the forecast departs further
from the base year. Not only does the fan chart graphically depict the risks associated
with a point forecast as time progresses, but it also highlights how realizations that are
quite far from the point estimate can have a reasonably high likelihood of occurring. Fan
charts can exhibit skewness that reflects more downside or upside risk to the forecast, and
the costs associated with erring on either side.
Theoretical Underpinnings of the Fan Chart
To capture the notion of asymmetric risk, the fan chart used by DOB assumes a two-piece normal distribution for each of the forecast years, following an approach inspired by Wallis (1999) and others. A two-piece normal distribution of the form

$$f(x) = \begin{cases} A \exp\left[-(x-\mu)^2 / 2\sigma_1^2\right], & x \le \mu \\ A \exp\left[-(x-\mu)^2 / 2\sigma_2^2\right], & x > \mu \end{cases}$$

with $A = \left(\sqrt{2\pi}\,(\sigma_1+\sigma_2)/2\right)^{-1}$, is formed by combining halves of two normal distributions having the same mean but different standard deviations, with parameters $(\mu, \sigma_1)$ and $(\mu, \sigma_2)$, and scaling them to give the common value $f(\mu)$. If $\sigma_1 < \sigma_2$, the two-piece normal has positive skewness, with the mean and median exceeding the mode. A smooth distribution $f(x)$ arises from scaling the discontinuous distribution $f(z)$ to the left of $\mu$ using $2\sigma_1/(\sigma_1+\sigma_2)$ and to the right of $\mu$ using $2\sigma_2/(\sigma_1+\sigma_2)$.
Figure 2
Two halves of normal distributions with mean $\mu$ and standard deviations $\sigma_1$ and $\sigma_2$ (solid), and the resulting two-piece normal distribution with mean $\mu$ (dashed).
One can determine the cutoff values for the smooth probability density function $f(x)$ from the underlying standard normal cumulative distribution functions by recalling the scaling factors. For $\alpha \le \sigma_1/(\sigma_1+\sigma_2)$, i.e., to the left of $\mu$, the point of the two-piece normal distribution defined by $\text{Prob}(X \le x_\alpha) = \alpha$ is the same as the point defined by $\text{Prob}(Z \le z_\beta) = \beta$, with

$$\beta = \frac{\alpha(\sigma_1+\sigma_2)}{2\sigma_1} \quad \text{and} \quad x_\alpha = \sigma_1 z_\beta + \mu.$$

Likewise, for $(1-\alpha) \le \sigma_2/(\sigma_1+\sigma_2)$, i.e., to the right of $\mu$, the point of the two-piece normal distribution defined by $\text{Prob}(X \ge x_{1-\alpha}) = \alpha$ is the same as the point defined by $\text{Prob}(Z \ge z_{1-\beta}) = \beta$, with

$$\beta = \frac{\alpha(\sigma_1+\sigma_2)}{2\sigma_2} \quad \text{and} \quad x_{1-\alpha} = \sigma_2 z_{1-\beta} + \mu.$$

For the two-piece normal distribution, the mode remains at $\mu$. The median of the distribution can be determined as the value defined by $\text{Prob}(X \le x) = 0.5$. The mean of the two-piece normal distribution depends on the skewness of the distribution and can be calculated as

$$E(X) = \mu + \sqrt{\frac{2}{\pi}}\,(\sigma_2 - \sigma_1).$$
Choice of Parameters
In constructing its fan charts, DOB uses means from the Monte Carlo simulation
study as the mean, μ, of the two underlying normal distributions. As mentioned above, if
the two-piece normal distribution is skewed, the Monte Carlo mean becomes the mode or
most likely outcome of the distribution and will differ from the median and the mean. In
the sample fan chart above, the mode is displayed as the crossed line. Except in extremely skewed cases, the mode tends to fall close to the middle of the central 10 percent prediction interval. As Britton et al. (1998) point out in their discussion of the Bank of England's inflation fan chart, the difference between the mean and the mode provides a measure of the skewness of the distribution. Given the skewness parameter, $\gamma$, DOB determines the two standard deviations, $\sigma_1$ and $\sigma_2$, as $\sigma_1 = (1+\gamma)\sigma$ and $\sigma_2 = (1-\gamma)\sigma$, where $\sigma$ is the standard deviation from the Monte Carlo simulation study.
By definition, the mean of the distribution is the weighted average of the realizations
of the variable under all possible scenarios, with the weights corresponding to the
probability or likelihood of each scenario. In its forecasts, DOB aims to assess and
incorporate the likely risks. Though no attempt is made to strictly calculate the probability-weighted average, the forecast is considered a close approximation of the mean. Thus the skewness parameter, $\gamma$, is determined as the difference between
DOB's forecast and the Monte Carlo mean. DOB's fan chart shows central prediction
intervals with equal tail probabilities. For example, the region in the darkest two slivers
represents the ten percent region in the center of the distribution. DOB adds regions with
5 percent probability on either side of the central interval to obtain the next prediction
interval. If the distribution is skewed, the corresponding 5 percent prediction intervals
will include different ranges of growth rates at the top and the bottom, thus leading to an
asymmetric fan chart.
The 5 percent prediction regions encompass increasingly wider ranges of growth
rates as one moves away from the center because the probability density of the two-piece
normal distribution decreases as one moves further into the tails. Thus the limiting
probability for any single outcome to occur is higher for the central prediction regions
than for intervals further out because a smaller range of outcomes shares the same
cumulative probability. Over time, risks become cumulative and uncertainties grow.
DOB uses its own forecast history to determine the degree to which σ1 and σ2 need to be
adjusted upward to maintain the appropriate probability regions.
3. Generalizing the Forecaster's Loss Function
When the forecaster's loss function is more general than the simple one assumed for the
prior section, the forecaster's choice of an optimal forecast may deviate even further from
the model forecast. Suppose a forecaster working for a private sector manufacturing firm
is asked to provide guidance as to whether the firm should raise its level of inventories
based on the outlook for demand for the company's product. If demand is projected to be
high, then the firm will proceed to build inventories; if low, then the firm will reduce
inventories. There are costs to the firm of being wrong. If demand is unexpectedly low,
the firm will have unplanned inventories, while if demand is higher than expected, the
firm will lose market share. The simple tables below, which summarize the costs to the
firm of bad planning under alternative loss structures, clearly illustrate that the loss
structure will factor critically into the firm's decision.
Under Symmetric Losses
Decision         Demand High    Demand Low
High Forecast    $0             $10,000
Low Forecast     $10,000        $0

Under Asymmetric Losses
Decision         Demand High    Demand Low
High Forecast    $0             $10,000
Low Forecast     $20,000        $0
The construct for measuring the cost attached by the forecaster to an incorrect
prediction is the loss function, $L(e_{t+h,t})$, where $e_{t+h,t}$ is defined as above. The cost associated with the forecast error is presumed to depend only on the size of the forecast error and to be positive unless the error is (in theory) zero. Typically, $L(e)$ is constructed to satisfy three requirements:
1. $L(0) = 0$.
2. L(e) is continuous, implying that two nearly identical forecast errors should
produce nearly identical losses.
3. L(e) increases as the absolute value of e increases, implying that the bigger the
size of the absolute value of the error, the bigger the loss.
Figure 3
Quadratic Loss Function (loss plotted against forecast error)
Figure 4
Absolute Loss Function (loss plotted against forecast error)
Loss functions can be either symmetric or asymmetric. Depicted in Figure 3 is the
quadratic loss function, where

$$L(e) = e^2.$$
The squaring associated with quadratic loss makes large errors much more costly than
small ones. In addition, the loss increases at an increasing rate on each side of the origin,
implying symmetry. The absolute loss function is depicted in Figure 4, where
L(e) | e | .
This function is also symmetric, but the loss increases at a constant rate with the size of
the error, producing its V-shape.
In reality, the costs associated with being wrong may not always be symmetric. For
example, if the costs associated with under- and over-predicting travel time to the airport
were symmetric, we would expect many more missed flights than we actually observe.
That we observe few missed flights is an indication that the cost of a missed flight must
outweigh the cost of arriving early and having to wait in the airport, implying that the
loss function is not symmetric. As alluded to above, government budget analysts may
also face asymmetric costs associated with over-predicting vs. under-predicting revenues.
Indeed, the different branches of government may have asymmetric loss functions that
are mirror images of each other. Industry analysts may also attach a higher cost to an
overly pessimistic forecast than to an overly optimistic one.
Here we present the two asymmetric loss functions that are most popular in the
literature. A more detailed presentation can be found in Christoffersen and Diebold
(1997). The first is the "linex" function,
L(e)  b exp(ae)  ae 1 , a  R \ 0 , b  R .
The linex loss function is so-named since for a greater than (less than) 0, it assigns a cost
that is linear in the forecast error if the error is negative (positive) and exponential in the
forecast error if it is positive (negative). Thus, negative forecast errors (Yt+h<Yt+h,t) are
much less costly than positive errors. The linex loss function, which is depicted in the
graph below, may well pertain to forecasting the time it will take to get to the airport. A
negative error implies a longer wait at the airport, while a large positive error could entail
a missed flight.
Under the linex loss function, the optimal h-step-ahead forecast solves the following minimization problem:

$$\min_{\hat{Y}_{t+h}} E_t\left\{ b\left[\exp\left(a(Y_{t+h} - \hat{Y}_{t+h})\right) - a(Y_{t+h} - \hat{Y}_{t+h}) - 1\right] \right\}.$$

Differentiating and using the conditional moment-generating function for a conditionally normally distributed random variate yields

$$\hat{Y}_{t+h} = \mu_{t+h|t} + \frac{a}{2}\,\sigma^2_{t+h|t}$$

assuming conditional heteroskedasticity.2 Thus, the optimal predictor is a simple
function of the conditional mean and a bias term that depends on the conditional h-step
ahead prediction-error variance and the degree of loss function asymmetry, as measured
by the parameter a. When a is positive, the larger is a, the greater the bias toward
negative errors (over-prediction). In addition, when a is positive, the optimal predictor is
also positive in the prediction-error variance.
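A quick simulation illustrates the bias result: under linex loss with a > 0, the upward-adjusted forecast achieves a lower expected loss than the conditional mean. The parameter values below are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 1.0, 1.0                # linex asymmetry and scale (illustrative values)
mu, sigma = 2.0, 0.5           # conditional mean and std. deviation (illustrative)
y = rng.normal(mu, sigma, 200_000)

def linex_loss(e):
    return b * (np.exp(a * e) - a * e - 1.0)

loss_mean_forecast = linex_loss(y - mu).mean()                  # forecast = conditional mean
loss_optimal = linex_loss(y - (mu + a * sigma**2 / 2.0)).mean() # bias-adjusted forecast
# the bias-adjusted forecast attains the lower average linex loss
```

Analytically, the expected loss at the optimum is $b\,a^2\sigma^2/2$, which the simulated average approximates.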
Figure 5
Linex Loss Function (loss plotted against forecast error)
A second commonly used asymmetric loss function is the "linlin" loss function,
which can be expressed as follows,
a e , if e  0
L (e)  
.
b e , if e  0
The "linlin" loss function is so-called since it is linear in the errors, and is a
generalization of the absolute loss function depicted above where the slopes are allowed
to differ on either side of the origin. The optimal predictor solves the following
minimization problem,
2 Christoffersen and Diebold derive a "pseudo-optimal" estimator by replacing the conditional h-step-ahead prediction-error variance $\sigma^2_{t+h|t}$ with the unconditional variance $\sigma^2_h$; the resulting estimator is optimal only under conditional homoskedasticity. Under conditional heteroskedasticity, the "pseudo-optimal" estimator will fail to result in a lower conditionally expected loss than the conditional mean except during times of high volatility.
$$\min_{\hat{Y}_{t+h}} \left\{ a \int_{\hat{Y}_{t+h}}^{\infty} \left(Y_{t+h} - \hat{Y}_{t+h}\right) f(Y_{t+h} \mid \Omega_t)\, dY_{t+h} \;+\; b \int_{-\infty}^{\hat{Y}_{t+h}} \left(\hat{Y}_{t+h} - Y_{t+h}\right) f(Y_{t+h} \mid \Omega_t)\, dY_{t+h} \right\}.$$
The first-order condition implies the following result:

$$F(\hat{Y}_{t+h} \mid \Omega_t) = \frac{a}{a+b}$$

where $F(\cdot \mid \Omega_t)$ is the conditional cumulative distribution function (c.d.f.) of $Y_{t+h}$. If $Y_{t+h}$ is normally distributed, then the optimal predictor is

$$\hat{Y}_{t+h} = \mu_{t+h|t} + \sigma_{t+h|t}\,\Phi^{-1}\!\left(\frac{a}{a+b}\right)$$

where $\Phi(z)$ is the standard normal c.d.f.
The above results pertain to two fairly simple loss functions. However, Christoffersen and Diebold also show how an optimal predictor can be approximated, using numerical simulation, when the loss function is more general. Though less
restrictive, this approach may be less accessible to the average practitioner. Moreover, on
choosing values for parameters a and b, the literature is silent. However, it is hoped that
the above discussion has illustrated how the problem of asymmetric loss fits into the
broader problem of forecasting and can provide a useful guideline as to how to proceed
and communicate the central issue.
4. Statistical Comparison of Alternative Forecasts
Choosing Among Competing Models
Suppose one must choose between two competing models, A and B, given a particular
loss function. This can be couched as a hypothesis testing problem:
$$H_0: E[L(e^A_{t+h,t})] = E[L(e^B_{t+h,t})]$$
$$H_A: E[L(e^A_{t+h,t})] > E[L(e^B_{t+h,t})] \quad \text{or} \quad E[L(e^A_{t+h,t})] < E[L(e^B_{t+h,t})]$$

Equivalently, one might test the hypothesis that the expected loss differential is zero:

$$E[d_t] = E[L(e^A_{t+h,t})] - E[L(e^B_{t+h,t})] = 0.$$
If $d_t$ is a stationary series, the large-sample distribution of the sample mean loss differential is

$$\sqrt{T}\,(\bar{d} - \mu) \sim N(0, f)$$

where

$$\bar{d} = \frac{1}{T}\sum_{t=1}^{T}\left[L(e^A_{t+h,t}) - L(e^B_{t+h,t})\right]$$

is the sample mean loss differential, $f$ is the variance of the sample mean differential, and $\mu$ is the population mean loss differential. Under the null hypothesis of a zero population mean loss differential, the standardized sample mean loss differential has a standard normal distribution:

$$\frac{\bar{d}}{\sqrt{\hat{f}/T}} \sim N(0,1)$$

where $\hat{f}$ is a consistent estimate of $f$.3
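For one-step-ahead forecasts with serially uncorrelated loss differentials, a minimal version of this test statistic can be sketched as follows; for h > 1, $f$ should instead be estimated with an autocorrelation-robust (HAC) variance estimator:

```python
import numpy as np

def dm_statistic(loss_a, loss_b):
    """Standardized mean loss differential, ~N(0,1) under the null of
    equal expected loss. Uses the naive variance estimate f_hat = var(d_t),
    valid when d_t is serially uncorrelated."""
    d = np.asarray(loss_a, dtype=float) - np.asarray(loss_b, dtype=float)
    f_hat = d.var(ddof=1)
    return d.mean() / np.sqrt(f_hat / d.size)
```

A large absolute value of the statistic rejects the hypothesis that the two models are equally accurate under the chosen loss function.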
Forecast Combination
Suppose one has two competing models, A and B, and statistical test results indicate that
they are equally accurate. Should you combine them?
Forecast Encompassing
Suppose models A and B produce forecasts $\hat{Y}^A_{t+h,t}$ and $\hat{Y}^B_{t+h,t}$. The following regression can be performed:

$$Y_{t+h} = \beta_A \hat{Y}^A_{t+h,t} + \beta_B \hat{Y}^B_{t+h,t} + \varepsilon_{t+h,t}.$$

If $\beta_A = 1$ and $\beta_B = 0$, then Model A forecast-encompasses Model B. If $\beta_A = 0$ and $\beta_B = 1$, then Model B forecast-encompasses Model A. Otherwise, neither model encompasses the other and you may want to combine them.
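The encompassing regression can be run with ordinary least squares. The simulated data below are purely illustrative, constructed so that model A tracks the actuals while model B is uninformative noise:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 500
y = rng.normal(size=T)                       # actual values (simulated)
f_a = y + rng.normal(scale=0.3, size=T)      # model A: truth plus small noise
f_b = rng.normal(size=T)                     # model B: pure noise

X = np.column_stack([f_a, f_b])
(beta_a, beta_b), *_ = np.linalg.lstsq(X, y, rcond=None)
# beta_a near 1 and beta_b near 0: model A forecast-encompasses model B
```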
Forecast Combination
The Blue Chip Consensus forecast is a simple average of about 50 forecasts. However,
under certain circumstances, equally weighting all of the participating forecasters may
not be optimal. For example, suppose again there are two forecasts, $\hat{Y}^A_{t+h,t}$ and $\hat{Y}^B_{t+h,t}$. One might combine them in a weighted average:
3 Alternatively, the sophisticated practitioner might want to choose between competing density forecasts. This problem is treated rigorously in Tay and Wallis (2000), under loss functions of general form, but is beyond the scope of this chapter.
Yt Ch,t   *Yt Ah,t  (1  )Yt Bh,t
where Yt C h,t is the combination forecast. Alternatively, one can write the problem in
terms of forecast errors:
etCh,t   * etAh,t  (1  )* etBh,t
with variance
2
 C2   2 A2  (1   )2  B2  2 (1   )2  AB
based on forecasters' past performances. The value of  can be determined as the
solution to an optimization problem where the objective is to minimize the weighted
average forecast error. The first order condition indicates that the simple Blue Chip
weighting scheme is not necessarily optimal.
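Minimizing the combination-error variance over the weight gives a closed-form solution; a sketch (the function name is ours):

```python
def optimal_weight(var_a, var_b, cov_ab):
    """Variance-minimizing weight on forecast A:
    w* = (var_B - cov_AB) / (var_A + var_B - 2*cov_AB)."""
    return (var_b - cov_ab) / (var_a + var_b - 2.0 * cov_ab)

optimal_weight(1.0, 1.0, 0.0)   # equal weights only when the models are equally accurate
optimal_weight(1.0, 4.0, 0.0)   # the more accurate model A gets the larger weight
```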
The above methods abstract from consideration of the form of the forecaster's loss
function. Forecast combination under more general circumstances is discussed more
rigorously in Elliott and Timmermann (2002). The authors show that as long as the
forecast error density is elliptically symmetric, the forecast combination weights are
invariant over all loss functions, leaving only the constant term to capture the tradeoff
between the bias in the loss function and the variance of the forecast error. As to the
importance of the shape of the loss function to the choice of weights, the authors offer the
intuitive conclusion that the larger the degree of loss function asymmetry, the larger the
gains from optimally estimating the combination weights compared to equally weighting
the forecasts.
Following Elliott and Timmermann (2002), we generalize the problem of forecast combination by defining $\hat{\mathbf{Y}}_{t+h,t}$ as a vector of forecasts and assuming that $Y_{t+h}$ and $\hat{\mathbf{Y}}_{t+h,t}$ are jointly distributed with the following first and second moments:

$$E\begin{pmatrix} Y_{t+h} \\ \hat{\mathbf{Y}}_{t+h,t} \end{pmatrix} = \begin{pmatrix} \mu_y \\ \boldsymbol{\mu} \end{pmatrix}$$

and

$$\text{Var}\begin{pmatrix} Y_{t+h} \\ \hat{\mathbf{Y}}_{t+h,t} \end{pmatrix} = \begin{pmatrix} \sigma^2_y & \boldsymbol{\sigma}'_{21} \\ \boldsymbol{\sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{pmatrix}.$$
Assume that the optimal combination forecast is a linear combination of the elements of $\hat{\mathbf{Y}}_{t+h,t}$, giving rise to the forecast error defined as

$$e_{t+h,t} = Y_{t+h} - \omega_c - \boldsymbol{\omega}'\hat{\mathbf{Y}}_{t+h,t}$$

where $\boldsymbol{\omega}$ is a vector of combination weights and $\omega_c$ is a scalar constant, and $e_{t+h,t}$ has the following first and second moments:

$$\mu_e = \mu_y - \omega_c - \boldsymbol{\omega}'\boldsymbol{\mu}$$
$$\sigma^2_e = \sigma^2_y + \boldsymbol{\omega}'\boldsymbol{\Sigma}_{22}\boldsymbol{\omega} - 2\,\boldsymbol{\omega}'\boldsymbol{\sigma}_{21}.$$
Under a symmetric quadratic loss function, the first-order conditions of the minimization problem imply the optimal population values

$$\omega^0_c = \mu_y - \boldsymbol{\omega}^{0\prime}\boldsymbol{\mu}$$
$$\boldsymbol{\omega}^0 = \boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\sigma}_{21}.$$
Although Elliott and Timmermann (2002) present very general results, a common special class of cases is that of elliptically symmetric forecast errors but asymmetric loss. The solution values for the optimal weights have the convenient property that only the constant term $\omega_c$ depends on the shape of the loss function. Thus, if

$$E[L(e_{t+h,t})] = g(\mu_e, \sigma^2_e)$$

then $\omega^0_c$ is the solution to

$$\partial g(\mu^*_e, \sigma^2_e)/\partial \mu_e = 0$$

where $\mu^*_e$ is the optimal value for $\mu_e$. Thus, under the assumption of normally distributed forecast errors and a linex loss function,

$$\omega^0_c = \mu_y - \boldsymbol{\omega}^{0\prime}\boldsymbol{\mu} + \frac{a}{2}\,\sigma^2_e$$

while under linlin loss,

$$\omega^0_c = \mu_y - \boldsymbol{\omega}^{0\prime}\boldsymbol{\mu} + \sigma_e\,\Phi^{-1}\!\left(\frac{a}{a+b}\right).$$
References
Britton, E., P. Fisher and J. Whitley (1998). "The Inflation Report projections:
understanding the fan chart." Bank of England Quarterly Bulletin, 38, 30-37.
Christoffersen, P. and F.X. Diebold (1997). "Optimal prediction under asymmetric
loss." Econometric Theory 13, 806-817.
Elliott, G. and A. Timmermann (2002). "Optimal forecast combinations under general
loss functions and forecast error distributions." University of California at San Diego,
Economics Working Paper Series 2002-08.
Granger, Clive (1989). Forecasting in Business and Economics (2nd edition), San Diego:
Academic Press.
Tay, A. and K. Wallis (2000). "Density forecasting: a survey." Journal of Forecasting 19,
235-254.
Wallis, K. (1999). "Asymmetric density forecasts of inflation and the Bank of England's
fan chart." National Institute Economic Review, no. 167, January, 106-112.