A Managerial Approach to Using Error Measures in the Evaluation of Forecasting Methods

[published in the International Journal of Business Research, Vol. 7, No. 3, 2007, pp. 143-149]

James E. Cox, Jr., Illinois State University, Normal, Illinois, USA
David G. Loomis, Illinois State University, Normal, Illinois, USA

ABSTRACT

Summary error (accuracy) measures are used in several steps of the forecasting process; for example, they are used to evaluate and choose forecasting methods. However, more effort should be made to take into account the implicit managerial assumptions made by using these error measures.

Keywords: Error Measures, Forecasting, Forecasting Management, Accuracy Measures, Forecasting Process, Forecasting Techniques.

1. INTRODUCTION

The topic of forecasting error has been addressed by many authors. Forecasting books have shown how to calculate basic summary error (accuracy) measures [Makridakis, Wheelwright, and Hyndman, 1998; DeLurgio, 1998]. In addition, articles have been written on ways of making forecasts more accurate [Geurts and Whitlark, 2000], how to measure the impact of error on an enterprise [Kahn, 2003], measuring the cost of forecasting error [Jain, 2004], examining errors in various industries [Jain, 2003], and the empirical measurement of error [Mentzer and Cox, 1984]. However, surprisingly absent from the literature is the impact that implicit managerial assumptions regarding summary error measures have on the selection of the proper methods to use in a forecasting situation. What the forecast user or manager is prepared to assume will influence which summary error measures should be used in the technique selection process. The choice of error measures can affect the selection and ranking of methods [Armstrong, 2001]. Empirical research has shown that the most preferred error measures have varied over time. A study by Carbone and Armstrong (1982) found that 33% of practitioners preferred root mean squared error (RMSE), the most popular of seven error measures, while 11% preferred mean absolute percentage error (MAPE). A later study by Mentzer and Kahn (1995) showed that 52% of their sample of companies used MAPE and 10% used RMSE.

The purpose of this article is not to survey all possible error measures but rather to draw attention to the importance of considering the assumptions made in selecting an error measure. Understanding the context and the managerial assumptions and preferences for which a forecast is made will assist the forecaster in choosing the appropriate error measure for that particular situation. This is important regardless of whether managers are generating their own forecasts or forecasters are preparing the forecasts for managers. Our focus in this article is to give managers and forecasters advice on the best error measure to use for a particular time series rather than on choosing the best error measure across multiple series. Focusing on the best error measure for one series is an important forecasting perspective when a particular product accounts for a significant proportion of company sales. The best error measure across multiple series is discussed in other literature; Makridakis et al. (1982), Thompson (1990), Gardner (1990), Armstrong and Collopy (1992), Fildes (1992), and Armstrong and Fildes (1995) have addressed the multi-series situation. Armstrong and Fildes (1995, 68) state, “We believe that most users of forecasts are able to examine more than one error measure.
Computer technology is making such comparisons easier…Software designers now include a variety of summary statistics and the issue should be which error measure or measures are appropriate for a given situation.”

2. THE FORECASTING PROCESS

In generating forecasts, the manager or forecaster will go through a forecasting process similar to the one shown in Figure 1.

FIGURE 1 – The Forecasting Process
Step 1: Set objectives for the forecast
Step 2: Select possible forecasting techniques
Step 3: Data collection and preparation
Step 4: Parameterize the technique(s)
Step 5: Technique(s) evaluation and selection
Step 6: Application of technique(s) and forecast revision
Step 7: Evaluation of technique performance

Typically, the forecaster uses a summary measure of forecasting error to parameterize the technique (Step 4), to evaluate the technique (Step 5), and to evaluate the technique's performance (Step 7). Forecasting error is defined as actual sales minus the forecast. However, there are several ways to summarize the error over several periods, and the summary error measure that is chosen will ultimately have a significant impact on selecting a forecasting method.

One of the necessary steps in generating forecasts is parameterizing the forecasting methods. At this stage the forecaster must pick the particular parameter values that will be used to run the method. For example, single exponential smoothing uses the parameter alpha (α), where alpha is chosen to be between 0 and 1. The formula for single exponential smoothing is:

F(t+1) = F(t) + α * [X(t) – F(t)], where 0 ≤ α ≤ 1, initialized by letting F(1) = X(1)

F(t+1) = one-step-ahead forecast made at time period t
F(t) = forecast for time period t
X(t) = sales for time period t

As different values for the parameter are chosen, the forecasting method will generate very different forecasts. In the following example, single exponential smoothing is used to generate forecasts using three different parameter values. The difference the parameter value makes in the forecasts, and consequently in the technique's accuracy, can clearly be seen. Although this simple example uses only a one-step-ahead forecast for illustration purposes, the methodology generalizes to multi-step-ahead forecasts as well.

TABLE 1 – Forecasts and Parameterization
t    Sales    Forecast (α=.2)    Forecast (α=.5)    Forecast (α=.8)
1     20         20                 20                 20
2     40         20                 20                 20
3     30         24                 30                 36
4     50         25.2               30                 31.2
5     40         30.16              40                 46.24

The key question is what parameter values will do the best job of forecasting. Although some software programs do not allow the user to choose different error measures in parameterization, the most popular statistical program, SAS, allows this flexibility. Other programs may offer this feature as well if it is requested by users; in addition, users can use spreadsheets to calculate these measures. Typically, the best job in parameterization is defined by the parameter(s) that give the forecasting method the lowest forecasting error. Again, the error for each period is actual sales minus the forecast, but there are several ways to summarize the error over several periods, and the summary error measure that is chosen will have a great influence on the parameters chosen.
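To make the parameterization step concrete, the following minimal sketch (plain Python, assuming only the sales series from Table 1; the helper names ses_forecasts and mse are ours, not from any forecasting package) reproduces the three forecast columns and then scores each alpha with one summary error measure, here the mean squared error, to show how the chosen measure feeds directly into the parameter choice.

    # Single exponential smoothing parameterization (illustrative sketch)
    sales = [20, 40, 30, 50, 40]             # X(1)..X(5) from Table 1

    def ses_forecasts(x, alpha):
        # F(1) = X(1); F(t+1) = F(t) + alpha * (X(t) - F(t))
        f = [x[0]]
        for t in range(len(x) - 1):
            f.append(f[t] + alpha * (x[t] - f[t]))
        return f

    def mse(actual, forecast):
        # mean squared error over the fit periods
        errors = [a - f for a, f in zip(actual, forecast)]
        return sum(e * e for e in errors) / len(errors)

    for alpha in (0.2, 0.5, 0.8):
        f = ses_forecasts(sales, alpha)
        print(alpha, [round(v, 2) for v in f], round(mse(sales, f), 2))
    # The alpha giving the lowest value of the chosen summary measure would be
    # selected; a different measure (MAD, MAPE, etc.) can be substituted for mse.

Running the sketch reproduces the forecast columns in Table 1; which alpha is selected then depends on which summary error measure is minimized, which is why the choice of measure matters.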
Seven summary measures of error are commonly used:

Mean (Average) Error (ME) = the average of the error for each period.
Mean Absolute Deviation (MAD) = the average of the absolute values of the error for each period.
Mean Squared Error (MSE) = the average of the squared error for each period.
Standard Deviation (SD) = the square root of the MSE, also known as root mean squared error (RMSE).
Signed Squared Error (SSE) = the average of the squared error with the sign (direction) of the error retained.
Mean Percentage Error (MPE) = the average of the percentage error for each period, where the percentage is calculated by dividing the error by the actual sales.
Mean Absolute Percentage Error (MAPE) = the average of the absolute value of the percentage error for each period, where the percentage is calculated by dividing the error by the actual sales.

Table 2 contains a brief example that illustrates how each summary error measure is computed.

TABLE 2 – Error Measures
t     Sales   Forecast   Error   Abs. Error   Sqr. Error   Signed Sqr. Error   Percent   Abs. Percent
1      20       10         10       10           100            100              50%        50%
2      40       30         10       10           100            100              25%        25%
3      30       50        -20       20           400           -400             -67%        67%
4      50       40         10       10           100            100              20%        20%
Sum                        10       50           700           -100              28%       162%
Ave                        2.5      12.5         175            -25               7%        40.5%
                           ME       MAD          MSE            SSE              MPE        MAPE
                                                 SD/RMSE = √175 = 13.23

Each of these summary error measures has certain characteristics.

Mean Error (ME)
- shows direction of error
- does not penalize extreme errors
- errors cancel out (no idea of how much)
- in original units

Mean Absolute Deviation (MAD)
- shows magnitude of overall error
- does not penalize extreme errors
- errors do not cancel out
- no idea of direction of error
- in original units

Mean Squared Error (MSE)
- penalizes extreme errors
- errors do not offset one another
- not in original units
- does not show direction of error

Standard Deviation (SD) or RMSE
- penalizes extreme errors
- errors do not offset one another
- in original units
- does not show direction of error

Signed Squared Error (SSE)
- penalizes extreme errors
- errors can offset one another
- shows direction of error
- not in original units

Mean Percentage Error (MPE)
- takes percentage of actual sales
- does not penalize extreme errors
- errors can offset one another
- shows direction of error
- assumes more sales can absorb more error in units

Mean Absolute Percentage Error (MAPE)
- takes percentage of actual sales
- does not penalize extreme deviations
- does not cancel offsetting errors
- assumes more sales can absorb more error in units
- does not show direction of error

These characteristics are illustrated in Figure 2 below, which organizes the measures as a decision tree.

FIGURE 2 – Summary Error Measures Taxonomy
Units or percentage?
- Units:
  - Let + and – errors cancel? Yes: Penalize extreme errors? Yes → SSE; No → ME
  - Let + and – errors cancel? No: Penalize extreme errors? Yes → Original units? Yes → SD/RMSE; No → MSE
                                   Penalize extreme errors? No → MAD
- Percentage:
  - Let + and – errors cancel? Yes → MPE; No → MAPE
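As an arithmetic check on Table 2, the minimal sketch below (plain Python; the variable names are ours, not from any statistics library) computes all seven summary measures from the same four periods of sales and forecasts.

    import math

    # Sales and forecasts from Table 2 (four periods)
    sales     = [20, 40, 30, 50]
    forecasts = [10, 30, 50, 40]

    errors = [a - f for a, f in zip(sales, forecasts)]         # actual minus forecast
    n = len(errors)

    me   = sum(errors) / n                                      # Mean Error
    mad  = sum(abs(e) for e in errors) / n                      # Mean Absolute Deviation
    mse  = sum(e * e for e in errors) / n                       # Mean Squared Error
    sd   = math.sqrt(mse)                                       # SD / RMSE
    sse  = sum(math.copysign(e * e, e) for e in errors) / n     # Signed Squared Error
    mpe  = sum(e / a for e, a in zip(errors, sales)) / n        # Mean Percentage Error
    mape = sum(abs(e) / a for e, a in zip(errors, sales)) / n   # Mean Absolute Percentage Error

    print(me, mad, mse, round(sd, 2), sse, round(100 * mpe), round(100 * mape, 1))
    # Matches Table 2 up to rounding: 2.5, 12.5, 175, 13.23, -25, 7%, 40.5%

The same few lines could equally be set up in a spreadsheet; the point is that each measure is a different one-number summary of the same error column.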
Since the summary error measure characteristics differ, a manager should ask the following questions to help determine which summary error measure would be best to use in technique evaluation and in the other stages of the forecasting process.

1. Is the manager looking at a long-term perspective, i.e., more interested in the final result than in period-by-period accuracy, or is period-by-period accuracy more important than ultimate accuracy? If the final result is more important, then ME, SSE, and MPE would be most appropriate. If period-by-period accuracy is more important, then MAD, MSE, SD, and MAPE would be most appropriate.
2. Would the manager have trouble comprehending the error (accuracy) unless "regular" units are used to express it? If regular units are desired, then ME, MAD, and SD would be appropriate.
3. Is the manager willing to accept more error if the (sales) base is larger? If yes, then MPE and MAPE would be appropriate.
4. Would extreme error be very costly, so that the manager would be willing to accept lower overall accuracy if extreme error could be avoided in any one period? If yes, then MSE, SD, and SSE would be most appropriate.
5. Does the direction (sign) of the error make a difference in cost; in other words, is there an asymmetrical loss function? (For a complete discussion of asymmetrical loss functions, see Diebold (2001, 34-37).) If yes, then ME, SSE, and MPE would be most appropriate.

Table 3 shows the summary error measures that would be most appropriate given the managerial questions posed above.

TABLE 3 – Managerial Questions Versus Summary Error Measures
Question                                                         ME   MAD   MSE   SD   SSE   MPE   MAPE
End result accuracy: errors cancel                               x                     x     x
Period by period accuracy: errors don't cancel                        x     x     x                x
Regular units?                                                   x    x           x
Willing to accept more absolute error if the base is larger?                                 x     x
Penalize extreme error?                                                      x     x    x
Does direction of error make a difference? Asymmetrical loss?    x                      x    x

3. APPLICATION

Four industries with particular issues can be examined to illustrate the importance of accounting for managerial concerns in selecting an error measure. We have purposely limited the complexity of the applications so as not to obscure the managerial perspective regarding the error measure used in each forecasting situation.

First, in the electricity industry, short-term forecasters are called upon to forecast the demand for electricity on an hourly basis one day ahead, taking account of the number and type of customers, weather, and so on. If the forecaster overestimates demand, the utility will have excess capacity on hand for which it receives no payment because no customer buys it. If the forecaster underestimates demand, the utility may have to buy power on the spot market at very high prices, or there may be blackouts if no electricity can be found elsewhere. Clearly, managers would be somewhat unhappy about the former but extremely upset at the latter; underforecasts are much worse for the utility than overforecasts. In this case, the forecaster would be best served by choosing Signed Squared Error (SSE) because it penalizes extreme values and the direction of the error matters.

Second, consider a manufacturer of airplanes. In this industry, a forecaster may be called upon to forecast future demand for a model of airplane that has not been built yet. In this case, management might be biased against overforecasts for fear of having billions of dollars tied up in airplanes that it cannot sell; in technical terms, there is an asymmetrical loss function. If management is risk averse, more error might be acceptable only as the expected number of planes sold gets larger, since a larger revenue base can absorb the cost of the error, and consequently the most appropriate error measure would be in percentage terms. Because of the danger of overforecasting, the sign of the error is still important, so the logical choice here would be Mean Percentage Error (MPE). If, on the other hand, management is not risk averse and a larger revenue base is not important, Mean Error (ME) would be acceptable. In addition to tying up billions of dollars, an overforecast might also disrupt the production process and raise per-unit costs.
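To see why the direction-sensitive measures matter in situations like the electricity and airplane examples, the brief sketch below (plain Python with made-up hourly demand figures; none of these numbers come from the article) scores two hypothetical forecasts that miss actual demand by the same absolute amounts, one persistently underforecasting and one persistently overforecasting. MSE cannot tell them apart, while ME and SSE expose the direction of the bias that management considers most costly.

    import math

    # Hypothetical hourly demand (MWh) and two candidate day-ahead forecasts
    actual     = [100, 120, 150, 170, 160, 130]
    under_fcst = [ 95, 110, 140, 155, 150, 125]   # misses low every hour
    over_fcst  = [105, 130, 160, 185, 170, 135]   # same absolute misses, but high

    def summarize(actual, forecast):
        errors = [a - f for a, f in zip(actual, forecast)]    # positive = underforecast
        n = len(errors)
        me  = sum(errors) / n
        mse = sum(e * e for e in errors) / n
        sse = sum(math.copysign(e * e, e) for e in errors) / n
        return me, mse, sse

    print("underforecasting method:", summarize(actual, under_fcst))
    print("overforecasting method: ", summarize(actual, over_fcst))
    # MSE is identical for the two methods; ME and SSE are positive for the
    # first (persistent underforecasts) and negative for the second, which is
    # exactly the distinction the utility's management cares about.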
A third industry is represented by a restaurant, where demand for different dishes needs to be forecast in order to buy ingredients. Some of these ingredients are perishable goods that cannot be carried over from day to day. If ingredients are used for only one dish, an overforecast results in the waste of the perishable goods, but an underforecast may result in angry customers who cannot order the dish they want. In this case, Mean Absolute Percentage Error (MAPE) would be the best error measure for the forecaster, since errors don't cancel and, as in the previous example, management is willing to accept more error when it has a larger revenue base, with more sales, to absorb the error. If management were not willing to accept more error with larger revenue, Mean Absolute Deviation (MAD) would be the best error measure.

Finally, a manager forecasting future commodity prices may need to be as accurate as possible, with underforecasts being just as bad as overforecasts. Further, the manager may not be willing to accept greater error as prices rise, and errors in one month do not cancel out errors in another month since all errors are costly. Because an extreme error could be financially disastrous, the manager should penalize extreme errors. Thus, the best error measure would be Mean Squared Error (MSE) or, if the forecaster prefers to work in regular units, Standard Deviation (SD).

4. SUMMARY

When an error measure is chosen for use in evaluating technique performance, certain assumptions are being made about what is important in the forecasting situation. For example, the measure chosen should reflect the importance that the company's management places on the size and sign of the forecasting error. If a manager is extremely concerned with underforecasts of demand, for fear of not having enough product and disappointing customers, the forecaster should choose an error measure that reflects this concern. If a manager only feels comfortable thinking in regular units, then this should be considered. All too often, management does not consider the characteristics of the error summary measures used by their forecasters for technique selection. More effort should be made to use these characteristics in the evaluation stages of the forecasting process. If forecasters did this, they would be more likely to meet managerial expectations for the forecast.

5. REFERENCES

Armstrong, Scott J. and Collopy, Fred, "Error Measures for Generalizing About Forecasting Methods: Empirical Comparisons", International Journal of Forecasting, Vol. VIII (1), 1992, 69-80.

Armstrong, Scott J. and Fildes, Robert, "On the Selection of Error Measures for Comparisons Among Forecasting Methods", Journal of Forecasting, Vol. XIV (1), 1995, 67-71.

Armstrong, Scott J., Principles of Forecasting, Kluwer, Boston, 2001.

Carbone, Robert and Armstrong, Scott J., "Evaluation of Extrapolative Forecasting Methods: Results of a Survey of Academics and Practitioners", Journal of Forecasting, Vol. I, 1982, 215-217.

DeLurgio, Stephen A., Forecasting Principles and Applications, Irwin/McGraw-Hill, New York, 1998.

Diebold, Francis X., Elements of Forecasting, 2nd Ed., South-Western, Cincinnati, OH, 2001.

Fildes, Robert, "The Evaluation of Extrapolative Forecasting Methods", International Journal of Forecasting, Vol. VIII (1), 1992, 88-98.

Gardner, E. S., Jr., "Evaluating Forecast Performance in an Inventory Control System", Management Science, Vol. XXXVI (4), 1990, 490-499.
Geurts, M.D. and Whitlark, D.B., "Six Ways to Make Sales Forecasts More Accurate", The Journal of Business Forecasting, Vol. XVIII (4), Winter 1999-2000, 21-23, 30.

Jain, Chaman L., "Forecasting Errors in the Consumer Products Industry", The Journal of Business Forecasting, Vol. XXII (2), Summer 2003, 2-4.

Jain, Chaman L., "How to Measure the Cost of Forecast Error", The Journal of Business Forecasting Methods and Systems, Vol. XXII (4), Winter 2003-04, 2, 29-30.

Kahn, K.B., "How to Measure the Impact of Forecast Error on an Enterprise", The Journal of Business Forecasting, Spring 2003, 21-25.

Makridakis, S. et al., "The Accuracy of Extrapolation (Time Series) Methods: Results of a Forecasting Competition", Journal of Forecasting, Vol. I, 1982, 111-153.

Makridakis, S., Wheelwright, S.C., and Hyndman, R.J., Forecasting Methods and Applications, Wiley & Sons, New York, 1998.

Mentzer, John T. and Cox, James E., Jr., "Familiarity, Application and Performance of Sales Forecasting Techniques", Journal of Forecasting, Vol. III (1), January-March 1984, 27-36.

Mentzer, John T. and Kahn, Kenneth B., "Forecasting Technique Familiarity, Satisfaction, Usage and Application", Journal of Forecasting, Vol. XIV (5), 1995, 465-476.

Thompson, Patrick A., "An MSE Statistic for Comparing Forecast Accuracy Across Series", International Journal of Forecasting, Vol. VI (2), 1990, 219-227.

6. AUTHOR PROFILES

Dr. James E. Cox, Jr. earned his Ph.D. at the University of Illinois at Urbana-Champaign in 1981. He is currently a professor of marketing at Illinois State University. In addition to teaching forecasting for over 25 years, he has consulted with major corporations in the forecasting area.

Dr. David G. Loomis earned his Ph.D. in economics at Temple University in 1995. He is currently an associate professor of economics at Illinois State University, where he teaches forecasting. He formerly worked at Bell Atlantic in the forecasting area.