MASSACHUSETTS INSTITUTE OF TECHNOLOGY

EXPONENTIAL SMOOTHING -- AN EXTENSION

189-66

Christopher R. Sprague

This working paper should not be reproduced, quoted, or cited without the written permission of the author.

I. Introduction

Exponential smoothing in its various forms is a commonly-used technique in demand forecasting. This paper proposes a method for extending the usefulness of the technique where long lead times are encountered. Section II describes a technique for estimating errors at an arbitrary lead time. Section III describes a computer program incorporating this technique. Section IV gives a brief summary of results, which is offered only as an indication, without any claim of statistical significance.

II. The Exponential Smoothing Model of Demand -- Evaluation of Errors

A wide class of demand-generating processes seems to be well modeled by

    d(t) = (s + r*t) * f(t) + u(t)                                   (1)

where

    d(t) is demand at time t,
    s is "average" demand at time 0,
    r is a linear trend term,
    f is a seasonal factor, normalized so that Σ[t=1..m] f(t) = m, where m is the number of periods in a seasonal cycle, and
    u(t) is a random variable, normally distributed with zero mean and variance independent of t.

While this model is conceptually neat, one sad fact of life is that, if the model is to remain valid, it must admit of changes (albeit slow changes) in the values of s, r, and the f's. Normally it is expected that the changes in model parameters are sufficiently slow to be swamped by the u's in any given period.

One useful way of estimating the model parameters so as to use them for predictive purposes is known as Exponential Smoothing with Linear Trend and Ratio Seasonals,[1] in which the parameters are recursively estimated as follows:

    Q(I) = D(I) / F(I)                                               (2a)
    S(I) = A * Q(I) + (1.-A) * (S(I-1) + R(I-1))                     (2b)
    R(I) = B * (S(I) - S(I-1)) + (1.-B) * R(I-1)                     (2c)
    F(I+M) = C * (D(I) / S(I)) + (1.-C) * F(I)                       (2d)

where:

    D(I) is actual demand in period I,
    Q(I) is deseasonalized demand as of period I,
    F(I) is the estimated seasonal factor for period I,
    S(I) is estimated average deseasonalized demand as of period I,
    R(I) is the estimated trend as of period I,
    M is the number of periods in one seasonal cycle (e.g., 12 for most monthly data), and
    A, B, and C are "smoothing constants" between 0 and 1.

It is evident from the above that the process needs S(0), R(0), and F(1) ... F(M) to begin, but thereafter needs only D(I) for each period to continue.

The proof of the pudding, however, is in the prediction, and the appropriate prediction for period I + L is:

    P(I+L) = (S(I) + L * R(I)) * F(I+L)                              (3)

where P is predicted demand and the F's are reused beyond one seasonal cycle, i.e.,

    F(I+L) = F(I+L-M),    M < L ≤ 2M                                 (4)
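Since the recursion (2a)-(2d) and the prediction rule (3)-(4) carry the rest of the paper, a short code sketch may help fix the indexing. It is illustrative only: the paper's own program (described in Section III) is a FORTRAN deck for the 7094, and the function names and the convention of a placeholder D[0], so that demands are indexed from 1, are assumptions of the sketch.

```python
# Illustrative sketch of equations (2a)-(2d), (3), and (4).
# D is a list of demands with a leading placeholder so D[1] is period 1.

def smooth(D, M, A, B, C, S0, R0, F_init):
    """One pass of Exponential Smoothing with Linear Trend and Ratio
    Seasonals over D[1..N]; returns the terminal S and R and the
    seasonal factors accumulated along the way."""
    N = len(D) - 1
    S, R = S0, R0
    F = dict(enumerate(F_init, start=1))          # F[1] .. F[M]
    for I in range(1, N + 1):
        Q = D[I] / F[I]                           # (2a) deseasonalize
        S_prev = S
        S = A * Q + (1.0 - A) * (S + R)           # (2b) smoothed level
        R = B * (S - S_prev) + (1.0 - B) * R      # (2c) smoothed trend
        F[I + M] = C * (D[I] / S) + (1.0 - C) * F[I]   # (2d) next-cycle factor
    return S, R, F

def predict(S, R, F, I, M, L):
    """Prediction (3) for period I+L made at the end of period I;
    seasonal factors are reused cycle by cycle as in (4)."""
    J = I + L
    while J not in F:                             # F(I+L) = F(I+L-M)
        J -= M
    return (S + L * R) * F[J]
```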
Now, if the model (1) is valid, then the "best" possible estimates of S, R, and the F's will give the "best" possible prediction. These estimates are obviously dependent on the choice of A, B, and C, as well as on the measured demands D, so that the problem of producing the "best" estimates of the parameters is the problem of choosing appropriate values of A, B, and C. Typically this is done by establishing a set of smoothing constants, running the smoothing process against historical data for periods 1 ... I, predicting one period ahead (L = 1), and forming the sum:

    E = Σ[J=1..I] (D(J) - P(J)) ** 2 * G ** (I-J)                    (5)

where G is a weighting factor ≤ 1. This associates a squared-error term with a set of smoothing constants. If G is less than 1.0, the most recent periods are emphasized in the sum, reflecting the idea that errors long past are not so important as errors in the recent past.

Since we can associate an error term with any set of smoothing constants, we can obviously find an "approximately optimum" set by an exhaustive search, or by hill-climbing, or by some combination of both. This is called "Adaptive Smoothing". The remainder of this paper is insensitive to which of the two methods is used, so we leave the point here.
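For concreteness, here is a minimal sketch of this conventional procedure: an exhaustive grid search minimizing the weighted one-step error sum (5). The five-value grid happens to match the program defaults quoted in Section III; the value G = 0.9 and all names are illustrative assumptions, and the inner recursion is the `smooth`-style update sketched above.

```python
from itertools import product

def one_step_error(D, M, A, B, C, S0, R0, F_init, G=0.9):
    """Weighted sum of squared one-step-ahead errors, equation (5)."""
    N = len(D) - 1
    S, R = S0, R0
    F = dict(enumerate(F_init, start=1))
    E = 0.0
    for I in range(1, N + 1):
        P = (S + R) * F[I]                        # prediction (3) with L = 1
        E += G ** (N - I) * (D[I] - P) ** 2       # one term of the sum (5)
        Q = D[I] / F[I]                           # (2a)
        S_prev = S
        S = A * Q + (1.0 - A) * (S + R)           # (2b)
        R = B * (S - S_prev) + (1.0 - B) * R      # (2c)
        F[I + M] = C * (D[I] / S) + (1.0 - C) * F[I]   # (2d)
    return E

def adaptive_smoothing(D, M, S0, R0, F_init,
                       values=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Exhaustive search over (A, B, C); hill-climbing would instead
    refine around a starting point."""
    return min(product(values, repeat=3),
               key=lambda abc: one_step_error(D, M, *abc, S0, R0, F_init))
```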
It is widely held that the procedure outlined above results in a set of parameters which will give the "best" predictions of future demand. "Nonsense" is perhaps too strong a word to apply to this belief, but let us at least try to cast some doubt (or some stones). There are at least four reasons why this approach ought to be examined very carefully before being adopted:

1. The model (1) is only approximately valid. If it were exact, then we could indeed conclude that the set of smoothing constants which minimized prediction error one period in advance was the set which would yield the "best" estimates of the model parameters.

2. The manager who is on the line for a sales forecast does not care whether the parameters are correct. He wants to know that his predictions are good for some meaningful lead time (perhaps 6 to 30 periods). It may be useful to note here that after some time the F's may no longer sum to M, and S and R may be under- or over-stated proportionately; this change in parameters makes absolutely no difference in the prediction!

3. Assuming that the desired prediction lead time is greater than 1, the process as described yields no estimate of error (or cost) at the actual lead time to be used.

4. Where did the sum-of-squared-errors term arise, anyway? It came from statistics, least-squares fits, and the like, where its virtue is computational simplicity (later in this paper we exploit this in another context). In this case, however, there is no great computational advantage, and the error (or, more properly, cost) function might just as well be absolute deviation or any arbitrary thing like:

    | D(J) - P(J) |                                                  (6)

Before proceeding, we should probably dispose of objection (4) above. While there is nothing ideal about the sum-of-squared-errors formulation, it does have a number of advantages. First, and probably controlling, if we divide such a sum by the sum of weights (see Equation (5)), we obtain a number,

    E / Σ[J=1..I] G ** (I-J)

which can be viewed as an estimate of the variance of u in (1). While this analogy cannot be pushed too far, it is useful for people with some "feel" for statistical concepts. Second, the square root of this "variance" is at least of the same order of magnitude as the expected absolute error in each prediction, and this is a useful number for the user.[2] Third and last, if the error (or cost) function is even mildly discontinuous, there is little hope of finding an acceptable "approximately optimum" set of parameters by any method short of exhaustive search on an extremely fine grid.

So, having raised four objections and backed off from one, we pursue the other three, all of which seem to be easily answered by simply evaluating predictions at the desired lead time rather than at a lead time of 1.

As with many brilliantly simple solutions, this one has rather grave flaws, for if we have data for periods 1 ... N and desire a prediction for period N + L, our error function will contain terms for predicted vs. actual demands in periods 1 ... N - L only. Worse yet, the most recent of those terms will be L periods old. Perhaps some numbers are in order. Suppose we have 30 periods of real data and wish to predict period 42, so L is 12. If we were predicting with L = 1, our error function would contain P(1) - D(1) through P(30) - D(30), based on parameter values running from S(0), R(0), F(1) ... F(M) through S(29), R(29), F(30) ... F(29+M). Instead, with L = 12, we will use P(12) - D(12) through P(30) - D(30), based on parameter values running from S(0), R(0), F(1) ... F(M) through only S(18), R(18), F(19) ... F(18+M).

The fact that each sum of errors has 11 fewer terms than it otherwise would is serious, but it pales by comparison with the fact that the most recent such term is 11 periods too old (relatively). Clearly another way out is desirable.

We now propose a method of estimating error at an arbitrary lead time. It is based on the assumption that the expected squared error at any lead time L is a linear function of L itself, given a set of smoothing constants and the initial estimates S(0), R(0), and F(1) ... F(M). Let us adopt some new notation:

    Z(I,L) = (S(I) + L * R(I)) * F(I+L),                             (7)
        the prediction of period I + L made at the end of period I;

    E2(I,L) = (Z(I,L) - D(I+L)) ** 2,                                (8)
        the square of the difference between the actual demand in period I + L and the prediction made for the same period at the end of period I;

    W(I) = G ** (N-I),    I = 0 ... N-1,  G ≤ 1,                     (9)
        a weighting factor corresponding to the set of predictions made at the end of period I.

We now seek coefficients U and V such that the double sum

    Σ[I=0..N-1] W(I) * Σ[L=1..N-I] (E2(I,L) - U - V * L) ** 2        (10)

is minimized. In essence, we are going to perform a weighted curve fitting with lead time the independent variable, squared prediction error the dependent variable, and weights falling geometrically with the age of the prediction. We will produce N * (N+1) / 2 "observations" (or, more properly, points to be fitted) as follows:

Using data through period 0 (i.e., the initial estimates), we produce the predictions for periods 1 ... N, i.e., Z(0,1) ... Z(0,N). The squared differences between each prediction and its corresponding data item are E2(0,1) ... E2(0,N), the associated lead times are L = 1 ... N, and the weight for all these points is G ** N. Similarly, using data through period 1, we produce predictions for periods 2 ... N, i.e., Z(1,1) ... Z(1,N-1), with corresponding errors E2(1,1) ... E2(1,N-1) and lead times L = 1 ... N-1; the weight for all these is G ** (N-1). This process continues until, using data through period N-1, we predict period N, i.e., Z(N-1,1), with error E2(N-1,1), at lead time L = 1 and with weight G.
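A sketch of how these N(N+1)/2 points might be produced, reusing the `smooth` and `predict` sketches given earlier. Re-running the recursion from scratch at each cutoff period is wasteful (a real program would run it once and snapshot S, R, and the F's after each period), but it keeps the correspondence with (7)-(9) plain; the weighting G = 0.9 is again an illustrative assumption.

```python
def observations(D, M, A, B, C, S0, R0, F_init, G=0.9):
    """Generate the (L, E2, W) triples of equations (7)-(9)."""
    N = len(D) - 1
    points = []
    for I in range(0, N):                     # data through period I
        S, R, F = smooth(D[:I + 1], M, A, B, C, S0, R0, F_init)
        W = G ** (N - I)                      # (9): weight falls with age
        for L in range(1, N - I + 1):         # predict periods I+1 .. N
            Z = predict(S, R, F, I, M, L)     # (7)
            E2 = (Z - D[I + L]) ** 2          # (8)
            points.append((L, E2, W))
    return points                             # N*(N+1)/2 triples in all
```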
Ignoring the weights for a moment, a scatter diagram of these points would look something like Figure 2:

    [Figure 2: a scatter diagram of the fitted points, squared error (vertical axis) against lead time (horizontal axis).]

Methods for obtaining such a "best linear fit" are well known, so we state only the equations used to find U and V:

    X0  = Y[1]                                                       (12a)
    X1  = Y[L] / X0                                                  (12b)
    X2  = Y[E2(I,L)] / X0                                            (12c)
    X11 = Y[L ** 2] / X0 - X1 ** 2                                   (12d)
    X22 = Y[E2(I,L) ** 2] / X0 - X2 ** 2                             (12e)
    X21 = Y[L * E2(I,L)] / X0 - X1 * X2                              (12f)
    V   = X21 / X11                                                  (12g)
    U   = X2 - V * X1                                                (12h)

where the special symbol Y[H] is taken to mean

    Y[H] = Σ[I=0..N-1] W(I) * Σ[L=1..N-I] H                          (13)

Now we obtain an estimate of the expected squared error at any lead time L as

    E = U + V * L                                                    (14)
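Equations (12a)-(12h) amount to a weighted least-squares line through the (L, E2) points. The sketch below computes it from the `observations` output above, taking the slope in the standard covariance-over-variance form; X22 enters only if one wants a measure of how well the line fits, so it is omitted here.

```python
def fit_error_line(points):
    """Weighted least-squares line through (L, E2) points: returns
    (U, V) so that U + V*L estimates the expected squared error at
    lead time L, as in equation (14)."""
    def Y(h):                                     # the Y[H] operator of (13)
        return sum(W * h(L, E2) for (L, E2, W) in points)
    X0 = Y(lambda L, E2: 1.0)                     # (12a) total weight
    X1 = Y(lambda L, E2: L) / X0                  # (12b) mean lead time
    X2 = Y(lambda L, E2: E2) / X0                 # (12c) mean squared error
    X11 = Y(lambda L, E2: L * L) / X0 - X1 * X1   # (12d) variance of L
    X21 = Y(lambda L, E2: L * E2) / X0 - X1 * X2  # (12f) covariance
    V = X21 / X11                                 # (12g) slope
    U = X2 - V * X1                               # (12h) intercept
    return U, V
```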
This answer to the first three objections raised earlier has some flaws, of which any potential user should be aware:

1. It is expensive. Each set of smoothing constants tested requires a linear regression computation involving N * (N+1) / 2 observations, which works out to approximately N/2 times the calculations required for simple evaluation of error at L = 1. Computer time, however, is rapidly becoming a cheap commodity, and a practical problem (N = 96, M = 12, and 125 different combinations of A, B, and C) takes about 3 minutes of 7094 time. This is not excessive for, say, a monthly forecast of corporate sales.[3]

2. It could be wrong -- especially where the actual demand-generating process at work is very close to the model (1), where, presumably, the simple scheme of evaluation at L = 1 would be better. In certain pathological cases (such as using the wrong value of M), V might be negative; since this would imply negative squared errors at long lead times, it is at least a clear indication of trouble. This did not occur in several trials with "dirty" data, however.

3. It is very sensitive to the choice of G, S(0), R(0), and F(1) ... F(M). The simple scheme is also sensitive to these, but, because the relationships are simpler, obvious mistakes are more easily detected.

4. It rests on some shaky foundations -- notably the assumption that the squared error of prediction is a linear function of lead time. Still, this is not much more ridiculous than the rest of the assumptions which underlie exponential smoothing, a technique which is well established and demonstrably useful.

In summary, we have described a method for choosing a set of smoothing constants on the basis of expected error at some arbitrary prediction lead time. It attempts to accomplish three objectives: to overcome minor invalidities in the model underlying Exponential Smoothing with Linear Trend and Ratio Seasonals; to emphasize quality of prediction rather than quality of parameters; and to give a meaningful estimate of error at an arbitrary lead time rather than at the normal, nearly-meaningless lead time of one period. These objectives are achieved at some cost: increased computation; increased risk of error under conditions of strange data; and shaky theoretical underpinnings. While the method has performed well in limited tests with real data, it will be some time before any determination of its relative advantage can be made.

III. Design of a Computer Program Using the Method for Choosing Smoothing Constants

As occasionally happens, the method of selecting smoothing constants described above was incorporated into an operating computer program somewhat before it had been conceptualized. It is useful, perhaps, to describe the program from the point of view of a user.

1. General

The program is designed to run on an IBM 709/7090/7094 operating under the control of the 9/90/94 FORTRAN Monitor System (FMS). The operating deck consists of:

    * (installation sign-on card)
    * XEQ
      (binary deck)
    * DATA
      (data cards)

The program is input-driven by a set of control cards. The basic scheme is to read input data until a control card is recognized, then log a message to the effect that such-and-such control card was encountered, and then proceed to take action as designated by the control card. Thus a wide variety of functions can be called for by changes in the input deck rather than in the program.

2. Entering Basic Data

When the program encounters a card where the first six columns are "DEPEN,", it is conditioned to accept basic demand data. It reads one more card, interpreted as follows:

    Columns  1 - 60   Alphabetic description of data
            61 - 62   Period of first piece of data entered
            63 - 64   Year
            65 - 66   Period of last piece of data entered
            67 - 68   Year
            69 - 70   M, the number of periods/year (e.g., 01, 04, 12, 52)

The first and last periods given in this card define the number of periods of data to be entered (200 maximum). The program expects these data to follow immediately, 9 eight-column fields per card.

3. Establishing a Set of Smoothing Constants

When the program encounters a card whose first six columns are "ALPHA,", it reads one more card to define a set of smoothing constants as follows:

    Columns  1 - 2    Number of values of A to be tried
             3 - 4    First value of A, in 1/100ths
             5 - 6 through 23 - 24    Second through 11th values of A, as above
            25 - 26   Number of values of B to be tried
            27 - 48   As above
            49 - 50   Number of values of C to be tried
            51 - 72   As above

In default, the program uses 5 values each of A, B, and C, specifically .1, .3, .5, .7, .9.

4. Establishing a Range of Predictions

When the program encounters a card whose first six columns are "RANGE,", it reads one more card to define a set of ranges of dates as follows:

    Prediction Range:    Columns  1 - 2    Period of first data point
                                  3 - 4    Year
                                  5 - 6    Period of last data point
                                  7 - 8    Year
    Evaluation Range:             9 - 10   Period of first data point
                                 11 - 12   Year
                                 13 - 14   Period of last data point
                                 15 - 16   Year
    Computation Range:           17 - 18   Period of first data point
                                 19 - 20   Year
                                 21 - 22   Period of last data point
                                 23 - 24   Year

The prediction range is the range of dates over which predictions based on the already-stored data are desired. The evaluation range is the range of dates over which prediction error is to be estimated; it normally would be identical with the prediction range. The computation range is the range of dates of the basic data which may be used in the predictions; it normally coincides with the range of stored data.

If we denote the range of stored data as the "data range", then the default conditions are as follows:

    If one leaves unspecified:        The program uses:
    Low date of prediction range      High date of data range + 1
    High date of prediction range     Low date of prediction range
    Low date of evaluation range      Low date of prediction range
    High date of evaluation range     Low date of evaluation range
    Low date of computation range     Low date of data range
    High date of computation range    Minimum of the high date of the data range and the low date of the prediction range less 1

The RANGE card also initiates the actual prediction procedure:

    A. Force all ranges to consistency with the prediction range.
    B. Form initial estimates of S(0), R(0), F(1) ... F(M). The program "cheats" by using the first two years of the computation range to produce these estimates.
    C. For all possible combinations of A, B, C, form an estimate of the error at the mid-point of the evaluation range. This corresponds exactly to the method discussed in Section II, except that it truncates the curve-fitting process not at N but at the lead time corresponding to the difference between the high dates of the evaluation and computation ranges. (This saves some time.)
    D. Using the "best" A, B, C, form terminal estimates of S(N), R(N), F(N+1) ... F(N+M).
    E. Form and print estimates for the prediction range.

For purposes of testing, the program also contains facilities for analyzing differences between actual and predicted demands, and for comparing its predictions with those made by a simple linear regression against an optimally-lagged leading indicator series. These, however, are not directly relevant here.
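To tie the pieces of Section III together, here is a hypothetical driver following steps A through E, built from the earlier sketches. The card-image input, the range bookkeeping of step A, and the particular "cheat" used for the initial estimates are all inventions of the sketch; only the shape of the procedure is meant to mirror the program described above.

```python
from itertools import product

def initial_estimates(D, M):
    """Starting values from the first two years of data (step B).
    The arithmetic here is invented for the sketch."""
    y1 = sum(D[1:M + 1]) / M                  # average level, year 1
    y2 = sum(D[M + 1:2 * M + 1]) / M          # average level, year 2
    R0 = (y2 - y1) / M                        # trend per period
    S0 = y1 - R0 * (M + 1) / 2.0              # level extrapolated to period 0
    raw = [D[t] / (S0 + R0 * t) for t in range(1, M + 1)]
    scale = M / sum(raw)                      # normalize so the f's sum to M
    return S0, R0, [f * scale for f in raw]

def fitted_error(D, M, A, B, C, S0, R0, F_init, L):
    """U + V*L from the fitted line, the criterion of step C."""
    U, V = fit_error_line(observations(D, M, A, B, C, S0, R0, F_init))
    return U + V * L

def run_range(D, M, horizon, L_mid, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    # Step A is assumed already done: the ranges are taken as consistent.
    S0, R0, F_init = initial_estimates(D, M)                    # step B
    best = min(product(grid, repeat=3),                         # step C
               key=lambda abc: fitted_error(D, M, *abc,
                                            S0, R0, F_init, L_mid))
    S, R, F = smooth(D, M, *best, S0, R0, F_init)               # step D
    N = len(D) - 1
    for L in range(1, horizon + 1):                             # step E
        print("period", N + L, "prediction:", predict(S, R, F, N, M, L))
    return best
```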
IV. Brief Results

Tests were made on two series, both monthly ten-year sales histories of inexpensive durable goods. Predictions were made for the period covered by the last year of each series, given average lead times of six (6) to thirty (30) months. Over this limited range, at least, the actual average squared error appeared to be no worse than linear with lead time. In none of the trials did the estimated average error at a specified lead time differ from the actual average error observed by more than 10%.

Footnotes

1. See Holt, Modigliani, Muth, and Simon, Planning Production, Inventories, and Work Force (Englewood Cliffs: Prentice-Hall, 1960), and R. G. Brown, Smoothing, Forecasting, and Prediction of Discrete Time Series (Englewood Cliffs: Prentice-Hall, 1963).

2. Note that the model which minimizes this "variance" does not necessarily produce unbiased estimates of demand.

3. Some closed systems tend to generate their own internal cycles. As Forrester points out, such systems are not good subjects for statistical or quasi-statistical analysis at the aggregate level.