CHAPTER 2
Forecasting
McGraw-Hill/Irwin
Copyright © 2009 by The McGraw-Hill Companies, Inc. All rights reserved.
Introduction to Forecasting
What is forecasting?
Primary Function is to Predict the Future
Why are we interested?
Affects the decisions we make today
Examples: who uses forecasting in their jobs?
forecast demand for products and services
forecast availability of manpower
forecast inventory and materiel needs daily
Characteristics of Forecasts
They are usually wrong!
A good forecast is more than a single number
mean and standard deviation
range (high and low)
Aggregate forecasts are usually more accurate
Accuracy erodes as we go further into the future.
Forecasts should not be used to the exclusion of known information.
What Makes a Good Forecast
It should be timely
It should be as accurate as possible
It should be reliable
It should be in meaningful units
It should be presented in writing
The method should be easy to use and understand in most cases.
Forecast Horizons in Operations Planning
Figure 2.1
Subjective Forecasting Methods
Sales Force Composites
Aggregation of sales personnel estimates
Customer Surveys
Jury of Executive Opinion
The Delphi Method
Individual opinions are compiled and reconsidered, repeating until an overall group consensus is (hopefully) reached.
Objective Forecasting Methods
Two primary methods: causal models and
time series methods
Causal Models
Let Y be the quantity to be forecasted and (X_1, X_2, . . . , X_n) be n variables that have predictive power for Y.
A causal model is Y = f(X_1, X_2, . . . , X_n).
A typical relationship is a linear one. That is,
Y = a_0 + a_1 X_1 + . . . + a_n X_n.
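As an illustration, a minimal sketch of fitting a linear causal model by ordinary least squares; the data values and variable names below are hypothetical, not from the text.

```python
import numpy as np

# Hypothetical observations: each row of X holds (X1, X2) for one period.
X = np.array([[120.0, 3.1],
              [135.0, 2.8],
              [150.0, 3.5],
              [160.0, 3.9],
              [175.0, 4.2]])
Y = np.array([410.0, 430.0, 495.0, 520.0, 560.0])

# Prepend a column of ones so the intercept a0 is estimated as well.
A = np.column_stack([np.ones(len(Y)), X])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
a0, a1, a2 = coef
print(f"Y = {a0:.1f} + {a1:.2f}*X1 + {a2:.1f}*X2")
```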
Time Series Methods
A time series is just a collection of past values of the variable being predicted. Time series methods are also known as naïve methods. The goal is to isolate patterns in past data.
(See Figures on following pages)
Trend
Seasonality
Cycles
Randomness
Figure 2.2
Notation Conventions
Let D_1, D_2, . . . , D_n, . . . be the past values of the series to be predicted (demand). If we are making a forecast in period t, assume we have observed D_t, D_{t-1}, etc.
Let F_{t, t+τ} = the forecast made in period t for the demand in period t + τ, where τ = 1, 2, 3, . . .
Then F_{t-1, t} is the forecast made in t-1 for t, and F_{t, t+1} is the forecast made in t for t+1 (one-step-ahead forecasts). Use the shorthand notation F_t = F_{t-1, t}.
Evaluation of Forecasts
The forecast error in period t, e_t, is the difference between the forecast for demand in period t and the actual value of demand in t.
For a multiple-step-ahead forecast: e_t = F_{t-τ, t} - D_t.
For a one-step-ahead forecast: e_t = F_t - D_t.
MAD = (1/n) Σ |e_i|
MSE = (1/n) Σ e_i²
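A small sketch of these two error measures; the forecast and demand values are made up for illustration.

```python
import numpy as np

def mad(errors):
    """Mean absolute deviation: (1/n) * sum of |e_i|."""
    e = np.asarray(errors, dtype=float)
    return np.mean(np.abs(e))

def mse(errors):
    """Mean squared error: (1/n) * sum of e_i squared."""
    e = np.asarray(errors, dtype=float)
    return np.mean(e ** 2)

# One-step-ahead errors e_t = F_t - D_t for a hypothetical history:
forecasts = np.array([100, 105, 98, 110], dtype=float)
demand    = np.array([ 95, 108, 97, 120], dtype=float)
errors = forecasts - demand
print("MAD =", mad(errors), " MSE =", mse(errors))
```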
Biases in Forecasts
A bias occurs when the average value of a
forecast error tends to be positive or
negative.
Mathematically, an unbiased forecast is one in which E(e_i) = 0. See Figure 2.3 on page 64 in the text (next slide).
Forecast Errors Over Time
Figure 2.3
Forecasting for Stationary Series
A stationary time series has the form:
D_t = μ + ε_t
where μ is a constant and ε_t is a random variable with mean 0 and variance σ².
Two common methods for forecasting
stationary series are moving averages and
exponential smoothing.
Moving Averages
In words: the arithmetic average of the N most recent observations. For a one-step-ahead forecast:
F_t = (1/N) (D_{t-1} + D_{t-2} + . . . + D_{t-N})
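A minimal sketch of the one-step-ahead moving-average forecast; the demand history is hypothetical.

```python
import numpy as np

def moving_average_forecast(demand, N):
    """One-step-ahead MA(N) forecast: the mean of the N most recent observations."""
    demand = np.asarray(demand, dtype=float)
    if len(demand) < N:
        raise ValueError("need at least N observations")
    return demand[-N:].mean()

# Hypothetical demand history; forecast the next period with N = 3.
history = [92, 87, 95, 90, 88]
print(moving_average_forecast(history, N=3))  # (95 + 90 + 88) / 3 = 91.0
```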
Summary of Moving Averages
Advantages of Moving Average Method
Easily understood
Easily computed
Provides stable forecasts
Disadvantages of Moving Average Method
Requires storing the last N data points
Lags behind a trend
Ignores complex relationships in data
Moving Average Lags a Trend
Figure 2.4
Exponential Smoothing Method
A type of weighted moving average that applies declining weights to past data.
1. New forecast = α (most recent observation) + (1 - α) (last forecast)
or
2. New forecast = last forecast - α (last forecast error)
where 0 < α < 1, and α is generally small (around 0.1 to 0.2) for stability of forecasts.
Exponential Smoothing (cont.)
In symbols:
F_{t+1} = α D_t + (1 - α) F_t
        = α D_t + (1 - α)(α D_{t-1} + (1 - α) F_{t-1})
        = α D_t + (1 - α)(α) D_{t-1} + (1 - α)²(α) D_{t-2} + . . .
Hence the method applies a set of exponentially declining weights to past data. It is easy to show that the sum of the weights is exactly one.
(Equivalently, F_{t+1} = F_t - α (F_t - D_t).)
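A minimal sketch of this recursion; the demand values and the initial forecast f0 are hypothetical (in practice f0 is often seeded with the first observation or an early moving average).

```python
def exponential_smoothing(demand, alpha, f0):
    """Return one-step-ahead forecasts F_{t+1} = alpha*D_t + (1-alpha)*F_t.

    f0 is the initial forecast for the first period.
    """
    forecasts = [f0]
    for d in demand:
        # Each new forecast blends the latest observation with the last forecast.
        forecasts.append(alpha * d + (1 - alpha) * forecasts[-1])
    return forecasts  # forecasts[t] is the forecast made for period t

demand = [200, 250, 175, 186, 225]
print(exponential_smoothing(demand, alpha=0.1, f0=200.0))
```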
Weights in Exponential Smoothing
Comparison of ES and MA
Similarities
Both methods are appropriate for stationary series
Both methods depend on a single parameter
Both methods lag behind a trend
One can achieve the same distribution of forecast error by setting α = 2/(N + 1).
Differences
ES carries all past history. MA eliminates “bad” data
after N periods
MA requires all N past data points while ES only
requires last forecast and last observation.
Using Regression for Times Series Forecasting
Regression Methods Can be Used When Trend is Present.
–Model: D_t = a + bt.
If t is scaled to 1, 2, 3, . . . , n, then the least squares estimates for a and b can be computed as follows:
Set S_xx = n²(n + 1)(2n + 1)/6 - [n(n + 1)/2]²
Set S_xy = n Σ i D_i - [n(n + 1)/2] Σ D_i
–Let b = S_xy / S_xx and a = D̄ - b(n + 1)/2, where D̄ is the sample mean of the D_i.
These values of a and b provide the “best” fit of the data in a least squares sense.
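A sketch of these closed-form estimates; the demand series below is made up for illustration.

```python
import numpy as np

def trend_fit(demand):
    """Least-squares fit of D_t = a + b*t with t = 1, 2, ..., n,
    using the closed-form Sxx / Sxy expressions above."""
    D = np.asarray(demand, dtype=float)
    n = len(D)
    i = np.arange(1, n + 1)
    Sxx = n**2 * (n + 1) * (2*n + 1) / 6 - (n * (n + 1) / 2) ** 2
    Sxy = n * np.sum(i * D) - (n * (n + 1) / 2) * np.sum(D)
    b = Sxy / Sxx
    a = D.mean() - b * (n + 1) / 2
    return a, b

a, b = trend_fit([52, 55, 61, 64, 70, 74])
print(f"D_t = {a:.2f} + {b:.2f} t")
```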
Other Methods When Trend is Present
Double exponential smoothing, of which Holt’s
method is only one example, can also be used to
forecast when there is a linear trend present in
the data. The method requires separate
smoothing constants for slope and intercept.
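A minimal sketch of Holt's method in its standard textbook form, assuming smoothing constants α (intercept/level) and β (slope) and user-supplied initial values s0 and g0; all numbers below are hypothetical.

```python
def holt(demand, alpha, beta, s0, g0):
    """Holt's double exponential smoothing (standard form).

    S_t = alpha*D_t + (1 - alpha)*(S_{t-1} + G_{t-1})   # intercept (level)
    G_t = beta*(S_t - S_{t-1}) + (1 - beta)*G_{t-1}     # slope (trend)
    The tau-step-ahead forecast made at time t is S_t + tau*G_t.
    """
    s, g = s0, g0
    for d in demand:
        s_prev = s
        s = alpha * d + (1 - alpha) * (s + g)
        g = beta * (s - s_prev) + (1 - beta) * g
    return s, g

s, g = holt([10, 12, 15, 17, 20], alpha=0.2, beta=0.1, s0=10.0, g0=2.0)
print("one-step-ahead forecast:", s + g)
```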
Forecasting for Seasonal Series
Seasonality corresponds to a pattern in the data that repeats at
regular intervals. (See figure next slide)
Multiplicative seasonal factors: c_1, c_2, . . . , c_N, where i = 1 is the first period of the season, i = 2 is the second period of the season, etc.
Σ c_i = N.
c_i = 1.25 implies 25% higher than the baseline on average.
c_i = 0.75 implies 25% lower than the baseline on average.
Figure 2.8
Quick and Dirty Method of Estimating Seasonal Factors
Compute the sample mean of the entire data set
(should be at least several seasons of data).
Divide each observation by the sample mean.
(This gives a factor for each observation.)
Average the factors for like periods in a season.
The resulting N numbers will add up to exactly N and correspond to the N seasonal factors. (A sketch follows.)
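A sketch of this procedure, assuming the history covers a whole number of seasons; the quarterly data are hypothetical.

```python
import numpy as np

def seasonal_factors(demand, N):
    """Quick-and-dirty seasonal factors for a season of length N.
    Assumes len(demand) is a whole number of seasons."""
    D = np.asarray(demand, dtype=float).reshape(-1, N)  # one row per season
    ratios = D / D.mean()        # divide each observation by the sample mean
    return ratios.mean(axis=0)   # average the factors for like periods

# Two seasons of quarterly data (N = 4); the factors sum to exactly 4.
factors = seasonal_factors([10, 20, 26, 17, 12, 23, 30, 22], N=4)
print(factors, factors.sum())
```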
Deseasonalizing a Series
To remove seasonality from a series, simply divide each observation in the series by the appropriate seasonal factor. The resulting series will have no seasonality and may then be predicted using an appropriate method. Once a forecast is made on the deseasonalized series, one then multiplies that forecast by the appropriate seasonal factor to obtain a forecast for the original series.
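A sketch of the full cycle (deseasonalize, forecast, reseasonalize), using a simple moving average as the stationary-series method; all data are hypothetical.

```python
import numpy as np

demand = np.array([10, 20, 26, 17, 12, 23, 30, 22], dtype=float)
N = 4  # quarterly season

# Seasonal factors as in the quick-and-dirty sketch above.
factors = (demand.reshape(-1, N) / demand.mean()).mean(axis=0)

# 1. Deseasonalize: divide each observation by its seasonal factor.
deseasonalized = demand / np.tile(factors, len(demand) // N)

# 2. Forecast the deseasonalized series with any stationary-series
#    method; here, a 3-period moving average.
base = deseasonalized[-3:].mean()

# 3. Reseasonalize: multiply by the factor of the period being forecast
#    (the next period is period 1 of a new season).
print(base * factors[0])
```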
Box-Jenkins Models
Recommended when at least 72 data points
of past history are available.
Primary feature: Exploits the structure of
the autocorrelation function of the time
series. Autocorrelation coefficient of lag k:
r_k = [ Σ_{t=k+1}^{n} (D_t - D̄)(D_{t-k} - D̄) ] / [ Σ_{t=1}^{n} (D_t - D̄)² ]
where D̄ is the sample mean of the series.
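A direct implementation of this sample autocorrelation; the series below is made up for illustration.

```python
import numpy as np

def autocorrelation(demand, k):
    """Sample autocorrelation r_k at lag k, per the formula above."""
    D = np.asarray(demand, dtype=float)
    Dbar = D.mean()
    if k == 0:
        return 1.0
    num = np.sum((D[k:] - Dbar) * (D[:-k] - Dbar))  # pairs (D_t, D_{t-k})
    den = np.sum((D - Dbar) ** 2)
    return num / den

series = [15, 21, 27, 19, 14, 22, 29, 18, 16, 23]
print([round(autocorrelation(series, k), 3) for k in range(1, 4)])
```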
Stationary Time Series
Box Jenkins models can only be constructed for
stationary series. That is, series that exhibit no
trend, seasonality, growth, etc. If the series is
represented by D_1, D_2, . . . , then this translates to the assumptions that E(D_i) = μ and Var(D_i) = σ², independent of i.
Later we will show how differencing can convert
many non-stationary series to stationary series.
The Autoregressive Process
D_t = a_0 + a_1 D_{t-1} + a_2 D_{t-2} + . . . + a_p D_{t-p} + ε_t
Interpret a_0, a_1, . . . , a_p as the linear regression coefficients and ε_t as the error term. This is an AR(p) process. Simpler and more common is the AR(1) process given by:
D_t = a_0 + a_1 D_{t-1} + ε_t
Theoretical Autocorrelation Function of the AR(1) Process
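For a stationary AR(1) process with |a_1| < 1, the theoretical autocorrelation at lag k is ρ_k = a_1^k: the ACF decays geometrically, alternating in sign when a_1 < 0.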
The Moving Average Process
D_t = b_0 - b_1 ε_{t-1} - b_2 ε_{t-2} - . . . - b_q ε_{t-q} + ε_t
Note that the weights (b_1, b_2, . . . , b_q) are shown with negative signs by convention. It can be shown that an AR(1) process is equivalent to an MA(∞) process. The MA(1) model is powerful because its autocorrelation function, which has a non-zero value only at lag 1, is often observed in practice.
Typical Realizations of the MA(1) Process with Negative and Positive One-Period Autocorrelations (figures)
ARMA Models
An ARMA model is one that includes both AR terms
and MA terms. For example, the ARMA(1,1)
model is:
D_t = c + a_1 D_{t-1} - b_1 ε_{t-1} + ε_t
By combining AR and MA terms into a single
model, we are able to capture complex
relationships in the data with a parsimonious
model (i.e., one with as few terms as possible).
ARIMA Models
The “I” in ARIMA stands for integrated, which means applying an ARMA model to a differenced process. Differencing can convert a non-stationary time series into a stationary one under some circumstances. One order of differencing eliminates a linear trend, and two orders of differencing eliminate a quadratic trend. First differencing would be denoted:
U_t = D_t - D_{t-1}
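A small sketch showing how differencing removes deterministic trends; the series here are synthetic.

```python
import numpy as np

# First differencing U_t = D_t - D_{t-1} removes a linear trend.
t = np.arange(1, 11)
D = 5.0 + 2.0 * t            # deterministic linear trend, for illustration
print(np.diff(D))            # constant first difference (the slope, 2.0)

# Differencing twice removes a quadratic trend.
Q = 1.0 + 0.5 * t**2
print(np.diff(Q, n=2))       # constant second difference (1.0)
```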
Practical Considerations in Forecasting
Overly sophisticated forecasting methods can be problematic, especially for long-term forecasting. (Refer to the figure on the next slide.)
Tracking signals may be useful for indicating forecast
bias.
Box-Jenkins methods require substantial data history,
use the correlation structure of the data, and can
provide significantly improved forecasts under some
circumstances.
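One common tracking signal is the running sum of forecast errors divided by the running MAD; values drifting far from zero (rules of thumb often cite roughly ±4 to ±6) suggest bias. A sketch, with hypothetical errors:

```python
import numpy as np

errors = np.array([4, -2, 5, 6, 3, 7], dtype=float)  # e_t = F_t - D_t

cumulative_error = np.cumsum(errors)                  # running sum of errors
mad = np.cumsum(np.abs(errors)) / np.arange(1, len(errors) + 1)
tracking_signal = cumulative_error / mad
print(tracking_signal.round(2))  # steadily growing values signal bias
```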
Figure 2.12
Case Study: Sport Obermeyer Saves Money
Using Sophisticated Forecasting Methods
Problem: the company had to commit at least half of its production based on forecasts, which were often very wrong. The standard jury-of-executive-opinion method of forecasting was replaced by a type of Delphi method that could itself predict forecast accuracy from the dispersion in the forecasts received. The firm could commit early to items whose forecasts were more likely to be accurate and hold off on items whose forecasts were probably off. Use of early information from retailers improved forecasting on difficult items. Consensus forecasting in this case was not the best method.