EXPONENTIAL SMOOTHING -- AN EXTENSION

189-66

Christopher R. Sprague

This working paper should not be reproduced, quoted, or cited without the written permission of the author.
EXPONENTIAL SMOOTHING -- AN EXTENSION

Christopher R. Sprague

I. Introduction

Exponential smoothing in its various forms is a commonly-used technique in demand forecasting. This paper proposes a method for extending the usefulness of the technique where long lead times are encountered. Section II describes a technique for estimating errors at an arbitrary lead time. Section III describes a computer program incorporating this technique. Section IV gives a brief summary of results, which is offered only as indication without any claim of statistical significance.
II. The Exponential Smoothing Model of Demand -- Evaluation of Errors

A wide class of demand-generating processes seem to be well modeled by

    d(t) = (s + rt) * f(t) + u(t)                                  (1)

where d(t) is demand at time t,
      s    is "average" demand at time zero,
      r    is a linear trend term,
      f    is a seasonal factor, normalized so that the sum of f(t) over
           the m periods of one seasonal cycle (t = 1 ... m) equals m,
and   u(t) is a random variable normally distributed with zero mean and
           variance independent of t.

While this model is conceptually neat, one sad fact of life is that, to remain valid, it must admit of changes (albeit slow changes) in the values of s, r, and the f's. Normally it is expected that the changes in model parameters are sufficiently slow to be swamped by the u's in any given period.
One useful way of estimating the model parameters so as to use them for predictive purposes is known as Exponential Smoothing with Linear Trend and Ratio Seasonals,[1] in which the parameters are recursively estimated as follows:

    Q(I) = D(I)/F(I)                                               (2a)
    S(I) = A * Q(I) + (1.-A) * (S(I-1) + R(I-1))                   (2b)
    R(I) = B * (S(I) - S(I-1)) + (1.-B) * R(I-1)                   (2c)
    F(I+M) = C * (D(I)/S(I)) + (1.-C) * F(I)                       (2d)

where:

    D(I) is actual demand in period I
    Q(I) is deseasonalized demand as of period I
    F(I) is the estimated seasonal factor for period I
    S(I) is estimated average deseasonalized demand as of period I
    R(I) is the estimated trend for period I
    M is the number of periods in one seasonal cycle (e.g., 12 for most monthly data)

and A, B, and C are "smoothing constants" between 0 and 1. It is evident from the above that the process needs S(0), R(0), and F(1) ... F(M) to begin, but thereafter needs only D(I) for each period to continue.
The proof of the pudding, however, is in the prediction, and the appropriate prediction for month I + L is:

    P(I+L) = (S(I) + L * R(I)) * F(I+L)                            (3)

where P is predicted demand, and the F's are reused, i.e.,

    F(I+L) = F(I+L-M),  M < L <= 2M                                (4)
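In modern terms, the recursion (2a)-(2d) and the prediction rule (3)-(4) might be sketched in a few lines of Python. This is an illustrative sketch only; the function names and the list-based handling of the F's are our own choices, not those of the FORTRAN program described in Section III.

    def run_smoothing(demand, S0, R0, F_init, A, B, C):
        """One pass of equations (2a)-(2d) over a demand history.

        demand -- the list D(1) ... D(N)
        F_init -- the initial seasonal factors F(1) ... F(M)
        Returns the final level S(N), trend R(N), and the M most
        recent seasonal factors F(N+1) ... F(N+M).
        """
        S, R = S0, R0
        F = list(F_init)          # F[0] is always the factor for the next period
        for D in demand:
            F_cur = F.pop(0)                              # F(I) for this period
            Q = D / F_cur                                 # (2a) deseasonalize
            S_new = A * Q + (1. - A) * (S + R)            # (2b) new level
            R = B * (S_new - S) + (1. - B) * R            # (2c) new trend
            F.append(C * (D / S_new) + (1. - C) * F_cur)  # (2d) factor for I+M
            S = S_new
        return S, R, F

    def predict(S, R, F, L):
        """Equation (3), reusing the seasonal factors cyclically per (4)."""
        M = len(F)
        return (S + L * R) * F[(L - 1) % M]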
Now, if the model (1) is valid, then the "best" possible estimates of S, R, and the F's will give the "best" possible prediction. These estimates are obviously dependent on the choice of A, B, and C, as well as on the measured demands D, so that the problem is to choose appropriate values of A, B, and C to produce the "best" estimates of the parameters. Typically this is done by establishing a set of smoothing constants, running the smoothing process against historical data for periods 1 ... I, predicting 1 period ahead, L = 1, and forming the sum:

    E = Sum(J=1 to I) ((D(J) - P(J)) ** 2) * G ** (I-J)            (5)

where G is a weighting factor <= 1. This associates a squared-error term with a set of smoothing constants. If G is less than 1.0, the most recent periods are emphasized in the sum, reflecting the idea that errors long-past are not so important as errors in the recent past.

Since we can associate an error term with any set of smoothing constants, we can obviously find an "approximately optimum" set by an exhaustive search, or by hill-climbing, or by some combination of both. This is called "Adaptive Smoothing". The remainder of this paper is insensitive to which of the two methods is used, so we here leave the point.
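As a sketch of the exhaustive-search variant of Adaptive Smoothing (hill-climbing would simply refine a coarse winner), the Python fragment below accumulates the weighted sum (5) while running the recursion (2a)-(2d), then tries every combination of smoothing constants on a grid. The grid shown happens to match the default values used by the program of Section III; the helper names and the value of G are illustrative assumptions.

    import itertools

    def one_step_error(demand, S0, R0, F_init, A, B, C, G):
        """Equation (5): weighted sum of squared one-period-ahead errors,
        with weights falling geometrically with age (G <= 1)."""
        S, R, F = S0, R0, list(F_init)
        sq = []
        for D in demand:
            P = (S + R) * F[0]                  # P(J) at lead time L = 1
            sq.append((D - P) ** 2)
            F_cur = F.pop(0)                    # then update via (2a)-(2d)
            Q = D / F_cur
            S_new = A * Q + (1. - A) * (S + R)
            R = B * (S_new - S) + (1. - B) * R
            F.append(C * (D / S_new) + (1. - C) * F_cur)
            S = S_new
        I = len(sq)
        return sum(e * G ** (I - j) for j, e in enumerate(sq, start=1))

    def adaptive_search(demand, S0, R0, F_init, G=0.9,
                        grid=(.1, .3, .5, .7, .9)):
        """Exhaustive search over all combinations of A, B, and C."""
        return min(itertools.product(grid, repeat=3),
                   key=lambda abc: one_step_error(demand, S0, R0, F_init,
                                                  abc[0], abc[1], abc[2], G))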
It is widely held that the procedure outlined above results in a set of parameters which will give the "best" predictions of future demand. While "nonsense" is perhaps too strong a word to apply to this belief, let us at least try to cast some doubt (or some stones). There are at least four reasons why this approach ought to be examined very carefully before being adopted:
1. The model (1) is only approximately valid. If it were exact, then we could indeed conclude that the set of smoothing constants which minimized prediction error 1 period in advance was the set which would yield the "best" estimates of model parameters.

2. The manager who is on the line for a sales forecast does not care whether the parameters are correct. He wants to know that his predictions are good for some meaningful lead time (perhaps 6 to 30 periods). It may be useful to note here that after some time, the F's may no longer sum to M, and S and R may be under- or over-stated proportionately. This great change in model parameters makes absolutely no difference in the prediction!

3. Assuming that the desired prediction lead-time is greater than 1, the process as described yields no estimate of error (or cost) at the actual lead-time to be used.

4. Where did the sum-of-squared-errors term arise, anyway? It came from statistics, least-squared fits, and the like, where its virtue is computational simplicity (later in this paper we exploit this in another context). In this case, however, there is no great computational advantage, and the error (or, more properly, cost) function might just as well be absolute deviation or any arbitrary thing like:

    |D(J) - P(J)|                                                  (6)
Before proceeding, we should probably dispose of objection (4) above. While there is nothing ideal about the sum-of-squared-error formulation, it does have a number of advantages. First, and probably controlling, if we divide such a sum by the sum of weights Sum(J=1 to I) G ** (I-J) (see Equation (5)), we obtain a number like a variance, which can be viewed as an estimate of the variance of u in (1).[2] While this analogy cannot be pushed too far, it is useful for people with some "feel" for statistical concepts. Second, the square root of the "variance" is at least of the same order of magnitude as the expected absolute error in each prediction, and this is a useful number for the user. Third, and last, if the error (or cost) function is even mildly discontinuous, there is little hope of finding an acceptable "approximately optimum" set of parameters by any method short of exhaustive search on an extremely fine grid.
So, having raised four objections and backed off from one, we pursue the other three, all of which seem to be easily answered by simply evaluating predictions at the desired lead time rather than at a lead time of 1. As with many brilliantly simple solutions, this one has rather grave flaws, for if we have data for periods 1 ... N and desire a prediction for period N + L, our error function will contain terms for predicted vs. actual demands in periods L ... N, i.e., instead of having N terms, it will contain N - L + 1 terms. Worse yet, the most recent of those will be L periods old. Perhaps some numbers are in order. Suppose we have 30 periods of real data and wish to predict period 42, so L is 12. If we were predicting with L = 1, our error function would contain P(1) - D(1) through P(30) - D(30), based on parameter values S(0), R(0), F(1) ... F(M) through S(29), R(29), F(29+M). Instead, with L = 12, we will use P(12) - D(12) through P(30) - D(30), based on S(0), R(0), F(1) ... F(M) through S(18), R(18), F(19) ... F(18+M).
The fact that each sum of errors has 11 fewer terms than it otherwise would is serious, but it pales by comparison with the fact that the most recent such term is always 11 periods too old (relatively). Clearly another way out is desirable. We now propose, indeed, a method of estimating error at an arbitrary lead time. It is based on the assumption that the expected squared error at any lead time L is a linear function of L itself, given a set of smoothing constants and the initial estimates S(0), R(0), and F(1) ... F(M). Let us adopt some new notation:

    Z(I,L) = (S(I) + L * R(I)) * F(I+L); the prediction of period
    I + L made at the end of period I.                             (7)

    E2(I,L) = (Z(I,L) - D(I+L)) ** 2; the square of the difference
    between the actual demand in period I + L and the prediction
    made for the same period at the end of period I.               (8)

    W(I) = G ** (N-I), G < 1; a weighting factor corresponding to
    the set of predictions made at the end of period I for periods
    I + L, L = 1 ... N - I.                                        (9)
We now seek coefficients U and V such that the double sum

    Sum(I=0 to N-1) W(I) * Sum(L=1 to N-I) (E2(I,L) - U - V * L) ** 2      (10)

is minimized. In essence, we are going to perform a weighted curve-fitting with lead time the independent variable, squared prediction error the dependent variable, and weights falling geometrically with age of prediction. We will produce N * (N+1)/2 "observations" (or, more properly, points to be fitted) as follows:
Using data through period 0 (i.e., the initial estimates), we produce predictions for periods 1 ... N, i.e., Z(0,1) ... Z(0,N). The squared difference between each prediction and its corresponding data item is E2(0,1) ... E2(0,N), and the associated lead times are L = 1 ... N. The weight for all these points is G ** N. Similarly, using data through period 1, we produce predictions for periods 2 ... N, i.e., Z(1,1) ... Z(1,N-1), corresponding errors E2(1,1) ... E2(1,N-1), and lead times L = 1 ... N-1. The weight for all these is G ** (N-1). This process continues until, using data through period N-1, we predict period N, i.e., Z(N-1,1), error E2(N-1,1), at lead time L = 1, with weight G.
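A sketch of this observation-generating pass, under the same illustrative Python conventions as the earlier sketches (the demand list holds D(1) ... D(N), and the F list always begins with the factor for the next period):

    def make_observations(demand, S0, R0, F_init, A, B, C, G):
        """Generate the N*(N+1)/2 points of equations (7)-(9): from each
        origin I = 0 ... N-1, predict periods I+1 ... N, square the
        errors, and attach the weight W(I) = G ** (N - I)."""
        N, M = len(demand), len(F_init)
        S, R, F = S0, R0, list(F_init)
        points = []                                 # triples (L, E2(I,L), W(I))
        for I in range(N):
            W = G ** (N - I)
            for L in range(1, N - I + 1):
                Z = (S + L * R) * F[(L - 1) % M]    # (7), F's reused per (4)
                points.append((L, (Z - demand[I + L - 1]) ** 2, W))
            # advance the recursion (2a)-(2d) one period before the next origin
            F_cur = F.pop(0)
            Q = demand[I] / F_cur
            S_new = A * Q + (1. - A) * (S + R)
            R = B * (S_new - S) + (1. - B) * R
            F.append(C * (demand[I] / S_new) + (1. - C) * F_cur)
            S = S_new
        return points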
Ignoring the weights for a moment, a scatter diagram of these points would look something like Figure 2:

    [Figure 2: scatter diagram of squared prediction error (vertical axis) against lead time (horizontal axis)]
Methods for obtaining such a "best linear fit" are well known, so we state only the equations used to find U and V:

    X0  = Y[1]                                                     (12a)
    X1  = Y[L]/X0                                                  (12b)
    X2  = Y[E2(I,L)]/X0                                            (12c)
    X11 = Y[L ** 2]/X0 - X1 ** 2                                   (12d)
    X22 = Y[E2(I,L) ** 2]/X0 - X2 ** 2                             (12e)
    X21 = Y[L * E2(I,L)]/X0 - X1 * X2                              (12f)
    V   = X21/X11                                                  (12g)
    U   = X2 - V * X1                                              (12h)
where the special symbol Y[H] is taken to mean

    Y[H] = Sum(I=0 to N-1) W(I) * Sum(L=1 to N-I) H                (13)

Now we obtain an estimate of the expected squared error at any lead time L as

    E(L) = U + L * V                                               (14)
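Equations (12a)-(12h) amount to an ordinary weighted least-squares line fitted through the (L, E2, W) triples; a minimal Python sketch follows. X22 of (12e) is not needed to obtain the line itself, so it is omitted here.

    def fit_error_line(points):
        """Fit E2 = U + V*L by weighted least squares over the triples
        (L, E2, W), following equations (12a)-(12h)."""
        def Y(h):                                     # equation (13): weighted sum
            return sum(W * h(L, E2) for (L, E2, W) in points)
        X0  = Y(lambda L, E2: 1.)                     # (12a) total weight
        X1  = Y(lambda L, E2: L) / X0                 # weighted mean lead time
        X2  = Y(lambda L, E2: E2) / X0                # weighted mean squared error
        X11 = Y(lambda L, E2: L * L) / X0 - X1 * X1   # variance of L
        X21 = Y(lambda L, E2: L * E2) / X0 - X1 * X2  # covariance of L and E2
        V = X21 / X11                                 # (12g) slope
        U = X2 - V * X1                               # (12h) intercept
        return U, V

    # Equation (14): the expected squared error at an arbitrary lead
    # time L is then estimated as U + V * L.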
This answer to the first three objections stated above has some flaws, of which any potential user should be aware:

1. It is expensive. Each set of smoothing constants tested requires doing a linear regression computation involving N * (N + 1)/2 observations. This works out to approximately 48 times the calculations required for simple evaluation of error at L = 1. Computer time is rapidly becoming a cheap commodity, however, and a practical problem (N = 96, M = 12, L = 12, 125 different combinations of A, B, and C) takes about 3 minutes of 7094 time. This is not excessive for, say, a monthly forecast of corporate sales.

2. It could be wrong -- especially where the actual demand-generating process at work is very close to the model given in (1), where, presumably, the simple scheme of evaluation at L = 1 would be better.[3] In certain pathological cases (such as using the wrong value of M), V might be negative for some L, a clear indication of trouble, since this would imply negative squared errors. This did not occur in several trials with "dirty" data, however.
3. It is very sensitive to the choice of G, S(0), R(0), and F(1) ... F(M). The simple scheme is also sensitive to these, but, because the relationships are simpler, obvious mistakes are more easily detected.

4. It rests on some shaky foundations -- notably the assumption that squared error of prediction is a linear function of lead time. This is not much more ridiculous than the rest of the assumptions which underlie exponential smoothing, a technique which is well established and demonstrably useful.
In summary, we have described a method for choosing a set of smoothing constants on the basis of expected error at some arbitrary prediction lead time. It attempts to accomplish three objectives: overcome minor invalidities in the model underlying Exponential Smoothing with Linear Trend and Ratio Seasonals; emphasize quality of prediction rather than quality of parameters; and give a meaningful estimate of error at arbitrary lead time rather than at the normal, nearly-meaningless 1 period. These objectives are achieved at some cost: increased computation; increased risks of error due to poor initial conditions or strange data; and shaky theoretical underpinning. While the model has performed well in limited tests with real data, it will be some time before any determination of its relative advantage can be made.
III. Design of a Computer Program Using the Method for Choosing Smoothing Constants

As occasionally happens, the method of selecting smoothing constants described above was incorporated into an operating computer program somewhat before it had been conceptualized. It is useful, perhaps, to describe the program from the point of view of a user.

1. General

The program is designed to run on an IBM 709-7090-7094 operating under the control of the 9/90/94 FORTRAN Monitor System (FMS). The operating deck consists of:

    * (installation sign-on card)
    * XEQ
      (binary deck)
    * DATA
      (data cards)

The program is input-driven by a set of control cards. The basic scheme is to read input data until a control card is recognized, then log a message to the effect that such-and-such control card was encountered, and then proceed to take action as designated by the control card. Thus a wide variety of functions can be called for by changes in the input deck rather than in the program.
2. Entering Basic Data

When the program encounters a card where the first six columns are "DEPEN," it is conditioned to accept basic demand data. It reads one more card, interpreted as follows:

    Columns  1 - 60   Alphabetic description of data
            61 - 62   Period of first piece of data entered
            63 - 64   Year of first piece of data entered
            65 - 66   Period of last piece of data entered
            67 - 68   Year of last piece of data entered
            69 - 70   M, the number of periods/year (e.g., 01, 04, 12, 52)

The first and last periods given in this card define the number of periods (200 maximum) of data to be entered. The program expects these data to follow immediately, 9 eight-column fields per card.
3. Establishing a Set of Smoothing Constants

When the program encounters a card whose first six columns are "ALPHA," it reads one more card to define a set of smoothing constants as follows:

    Columns  1 - 2             Number of values of A to be tried
             3 - 4             First value of A, in 1/100ths
             5 - 6 ... 23 - 24 Second through 11th values of A, as above
            25 - 26            Number of values of B to be tried
            27 - 48            As above
            49 - 50            Number of values of C to be tried
            51 - 72            As above

In default, the program uses 5 values each of A, B, and C, specifically .1, .3, .5, .7, .9.
4. Establishing a Range of Predictions

When the program encounters a card whose first six columns are "RANGE," it reads one more card to define a set of ranges of data as follows:

    Prediction Range:
        Columns  1 - 2    Period, first data point
                 3 - 4    Year, first data point
                 5 - 6    Period, last data point
                 7 - 8    Year, last data point
    Evaluation Range:
                 9 - 10   Period, first data point
                11 - 12   Year, first data point
                13 - 14   Period, last data point
                15 - 16   Year, last data point
    Computation Range:
                17 - 18   Period, first data point
                19 - 20   Year, first data point
                21 - 22   Period, last data point
                23 - 24   Year, last data point

The prediction range is the range of dates over which predictions based on the already stored data are desired. The evaluation range is the range of dates over which prediction error is to be estimated. It normally would be identical with the prediction range. The computation range is the range of dates of the basic data which may be used in the predictions. It normally coincides with the range of stored data.
If we denote the range of stored data as the "data range", then the default conditions are as follows:

    If one leaves unspecified        The program uses
    Low date, prediction range       High date, data range, + 1
    High date, prediction range      Low date, prediction range
    Low date, evaluation range       Low date, prediction range
    High date, evaluation range      Low date, evaluation range
    Low date, computation range      Low date, data range
    High date, computation range     Minimum of high date, data range,
                                     and low date, prediction range, less 1
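Read as code, this defaulting cascade might look as follows; a hypothetical Python sketch in which dates are simplified to single period indices rather than the (period, year) pairs of the card format:

    def resolve_ranges(data, prediction=(None, None),
                       evaluation=(None, None), computation=(None, None)):
        """Apply the default rules above; each range is a (low, high)
        pair with None standing for 'left unspecified'."""
        d_lo, d_hi = data
        p_lo, p_hi = prediction
        e_lo, e_hi = evaluation
        c_lo, c_hi = computation
        if p_lo is None: p_lo = d_hi + 1             # low date, prediction
        if p_hi is None: p_hi = p_lo                 # high date, prediction
        if e_lo is None: e_lo = p_lo                 # low date, evaluation
        if e_hi is None: e_hi = e_lo                 # high date, evaluation
        if c_lo is None: c_lo = d_lo                 # low date, computation
        if c_hi is None: c_hi = min(d_hi, p_lo - 1)  # high date, computation
        return (p_lo, p_hi), (e_lo, e_hi), (c_lo, c_hi)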
The "RANGE" card also initiates the actual prediction procedure:

A. Force all ranges to consistency with the prediction range.

B. Form initial estimates of S(0), R(0), F(1) ... F(M). The program "cheats" by using the first two years of the computation range to produce these estimates.

C. For all possible combinations of A, B, and C, form an estimate of the error at the mid-point of the evaluation range. This corresponds exactly to the method discussed in Section II, except that it truncates the curve-fitting process not at N but at the lead time corresponding to the difference between the high dates of the evaluation and computation ranges. (This saves some time.)
D. Using the "best" A, B, C, form terminal estimates of S(N), R(N), F(N+1) ... F(N+M).

E. Form and print estimates for the prediction range.

For purposes of testing, the program also contains facilities for analyzing differences between actual and predicted demands, and for comparing its predictions with those made by a simple linear regression against an optimally-lagged leading indicator series. These, however, are not directly relevant.
IV. Results in Brief

Tests were made on two series, both monthly ten-year sales histories of inexpensive durable goods. Predictions were made for the period covered by the last year of the series, given average lead times of six (6) to thirty (30) months. Over this limited range, at least, the actual average squared error appeared to be no worse than linear with lead time. In none of the trials did the estimated average error at a specified lead time differ from the actual average error observed by more than 10%.
Footnotes

1. See Holt, Modigliani, Muth, and Simon, Planning Production, Inventories, and Work Force (Englewood Cliffs: Prentice Hall), 1960, and R. G. Brown, Smoothing, Forecasting, and Prediction of Discrete Time Series (Englewood Cliffs: Prentice Hall), 1963.

2. Note that the model which minimizes this "variance" does not necessarily produce unbiased estimates of demand.

3. Some closed systems tend to generate their own internal cycles. As Forrester points out, such systems are not good subjects for statistical or quasi-statistical analysis at the aggregate level.