Chapter 13 FORECASTING CHAPTER OUTLINE 13.1 Introduction 13.2 Quantitative Forecasting 13.3 Causal Forecasting Models 13.4 Time-Series Forecasting Models 13.5 The Role of Historical Data: Divide and Conquer 13.6 Qualitative Forecasting 13.7 Notes on Implementation KEY TERMS SELF-REVIEW EXERCISES PROBLEMS CASE 1: BANK OF LARAMIE CASE 2: SHUMWAY, HORCH, AND SAGER (B) CASE 3: MARRIOTT ROOM FORECASTING REFERENCES CD13-2 C D C H A P T E R S APPLICATION CAPSULE Forecasting Improvement at L. L. Bean L. L. Bean is a widely known retailer of high-quality outdoor goods and apparel. The majority of its sales are generated through telephone orders via 800-service, which was introduced in 1986. Ten percent of its $870 million in 1993 sales was derived through store transactions, 18% was ordered through the mail, leaving the bulk (72%) from orders taken at the company’s call center. Calls to L. L. Bean’s call center fit into two major classifications, telemarketing (TM) and telephone inquiry (TI), each with its own 800-number. TM calls are primarily the orderplacing calls that generate the vast majority of the company’s sales. TI callers are mainly customers who ask about the status of their orders, report problems about orders, and so on. The volume of calls and average duration of each call are quite different for these two classes. Annual call volumes for TM are many times higher than those of TI, but the average length is much less. TI agents are responsible for customer inquiries in a variety of areas and thus require special training. Thus it is important to accurately forecast the incoming call volumes for TI and TM separately to properly schedule these two distinct server groups. The real focus of these forecasts is on the third week ahead. Once the forecast is made, the schedulers can make up a weekly schedule for their workers and give them two weeks’ advance notice. Inaccurate forecasts are very costly to L. L. Bean because they result in a mismatch of supply and demand. Understaffing of TM agents increases opportunity costs due to diminished revenues from lost orders (a percentage of customers who don’t get through immediately, abandon the call, and never call back). Understaffing of TI agents decreases overall customer satisfaction and erodes customer loyalty. In both cases, understaffing leads to excessive queue times, which causes telephone-connect charges to 13.1 INTRODUCTION increase dramatically. On the other hand, overstaffing of either group of agents incurs the obvious penalty of excessive direct labor costs for the underutilized pool of agents on duty. The staff-scheduling decisions would be quite routine if it were not for the erratic nature and extreme seasonality of L. L. Bean’s business. For example, the three-week period before Christmas can make or break the year, as nearly 20% of the annual calls come during this short period. L. L. Bean will typically double the number of agents and quadruple the number of phone lines during this period. After this period, there is, of course, the exact opposite problem, the build-down process. In addition, there is a strong day-of-week pattern throughout the year in both types of calls, with the volume in the week the highest on Monday and monotonically decreasing down to the low on Sunday. Other factors that must be considered by the forecasting model is the effect of catalog mailings or “drops.” These are generally done so that the bulk of the catalogs arrive around Tuesday, which disrupts the normal pattern of calls tremendously. 
Many eager customers order immediately, which creates a surge of new calls around the time of the “drop.” The new forecasting model that was developed had much greater forecast accuracy than L. L. Bean’s previous approach and was able to produce a mean absolute percentage error of 7.4% for the TM group and 11.4% for the TI group on five years of historical data. So far, on future threeweek ahead forecasts, the new forecasting model has had about the same accuracy as that demonstrated on the historical data. The increased precision afforded by these models is estimated to translate into $300,000 annual savings for L. L. Bean through more efficient scheduling. (See Andrews and Cunningham.) The date is June 15, 1941. Joachim von Ribbentrop, Hitler’s special envoy, is meeting in Venice with Count Ciano, the Italian foreign minister, whereupon von Ribbentrop says: “My dear Ciano, I cannot tell you anything as yet because every decision is locked in the impenetrable bosom of the Führer. However, one thing is certain: If we attack, the Russia of Stalin will be erased from the map within eight weeks.” (see Bullock) Nine days later, Nazi Germany launched operation Barbarossa and declared war on Russia. With this decision, a chain of events that led to the end of the Third Reich had been set in motion, and the course of history was dramatically changed. Although few decisions are this significant, it is clearly true that many of the most important decisions made by individuals and organizations crucially depend on an assessment of the future. Predictions or forecasts with greater accuracy than that achieved by the C H A P T E R 1 3 Forecasting CD13-3 German General Staff are thus fervidly hoped for and in some cases diligently worked for. There are a few “wise” sayings that illustrate the promise and frustration of forecasting: • “It is difficult to forecast, especially in regards to the future.” • “It isn’t difficult to forecast, just to forecast correctly.” • “Numbers, if tortured enough, will confess to just about anything!” Economic forecasting considered by itself is an important activity. Government policies and business decisions are based on forecasts of the GDP, the level of unemployment, the demand for refrigerators, and so on. Among the major insurance companies, one is hard-pressed to find an investment department that does not have a contract with some expert or firm to obtain economic forecasts on a regular basis. Billions of dollars of investments in mortgages and bonds are influenced by these forecasts. Over 2,000 people show up each year at the Annual Forecast Luncheon sponsored by the University of Chicago to hear the views of three economists on the economic outlook. The data are overwhelming. Forecasting is playing an increasingly important role in the modern firm. Not only is forecasting increasingly important, but quantitative models are playing an increasingly important role in the forecasting function. There is clearly a steady increase in the use of quantitative forecasting models at many levels in industry and government. A conspicuous example is the widespread use of inventory control programs that include a forecasting subroutine. Another example is the reliance of several industries (airlines, hotels, rental cars, cruise lines) in the services sector of the economy on accurate forecasts of demand as inputs to their sophisticated mathematical optimizers used for revenue management (e.g., How much to overbook? 
How many units should be made available at different discount levels?). For economic entities such as the GDP or exchange rates, many firms now rely on econometric models for their forecasts. These models, which consist of a system of statistically estimated equations, have had a significant impact on the decision processes in both industry and government. There are numerous ways to classify forecasting models and the terminology varies with the classification. For example, one can refer to “long-range,” “medium-range,” and “short-range” models. There are “regression” models, “extrapolation” models, and “conditional” or “precedent-based” models, as well as “nearest-neighbor” models. The major distinction we employ will be between quantitative and qualitative forecasting techniques. 13.2 QUANTITATIVE FORECASTING Quantitative forecasting models possess two important and attractive features: 1. They are expressed in mathematical notation. Thus, they establish an unambiguous record of how the forecast is made. This provides an excellent vehicle for clear communication about the forecast among those who are concerned. Furthermore, they provide an opportunity for systematic modification and improvement of the forecasting technique. In a quantitative model coefficients can be modified and/or terms added until the model yields good results. (This assumes that the relationship expressed in the model is basically sound.) 2. With the use of spreadsheets and computers, quantitative models can be based on an amazing quantity of data. For example, a major oil company was considering a reorganization and expansion of its domestic marketing facilities (gasoline stations). Everyone understood that this was a pivotal decision for the firm. The size of the proposed capital investment alone, not to mention the possible influences on the revenue from gasoline sales, dictated that this decision be made by the board of directors. In order to evaluate the alternative expansion strategies, the board needed forecasts of the demand for gasoline in each of the marketing regions (more than 100 regions were involved) for each of the next 15 years. Each of these 1,500 estimates was based CD13-4 C D C H A P T E R S on a combination of several factors, including the population and the level of new construction in each region. Without the use of computers and quantitative models, a study involving this level of detail would generally be impossible. In a similar way inventory control systems that require forecasts that are updated on a monthly basis for literally thousands of items could not be constructed without quantitative models and computers. The technical literature related to quantitative forecasting models is enormous, and a high level of technical, mainly statistical, sophistication is required to understand the intricacies of the models in certain areas. In the following two sections we summarize some of the important characteristics and the applicability of such models. We shall distinguish two categories based on the underlying approach. These are causal models and time-series models. 13.3 CAUSAL FORECASTING MODELS In a causal forecasting model, the forecast for the quantity of interest “rides piggyback” on another quantity or set of quantities. In other words, our knowledge of the value of one variable (or perhaps several variables) enables us to forecast the value of another variable. 
In more precise terms, let y denote the true value for some variable of interest, and let ŷ denote a predicted or forecast value for that variable. Then, in a causal model, ŷ = f(x1, x 2, . . . , xn) where f is a forecasting rule, or function, and x1, x 2, . . . , xn is a set of variables. In this representation the x variables are often called independent variables, whereas ŷ is the dependent or response variable. The notion is that we know the independent variables and use them in the forecasting model to forecast the dependent variable. Consider the following examples: 1. If y is the demand for baby food, then x might be the number of children between 7 and 24 months old. 2. If y is the demand for plumbing fixtures, then x1 and x 2 might be the number of housing starts and the number of existing houses, respectively. 3. If y is the traffic volume on a proposed expressway, x1 and x 2 might be the traffic volume on each of two nearby existing highways. 4. If y is the yield of usable material per pound of ingredients from a proposed chemical plant, then x might be the same quantity produced by a small-scale experimental plant. For a causal model to be useful, either the independent variables must be known in advance or it must be possible to forecast them more easily than ŷ, the dependent variable. For example, knowing a functional relationship between the pounds of sauerkraut and the number of bratwurst sold in Milwaukee in the same year may be interesting to sociologists, but unless sauerkraut usage can be easily predicted, the relationship is of little value for anyone in the bratwurst forecasting business. More generally, companies often find by looking at past performance that their monthly sales are directly related to the monthly GDP, and thus figure that a good forecast could be made using next month’s GDP figure. The only problem is that this quantity is not known, or it may just be a forecast and thus not a truly independent variable. To use a causal forecasting model, then, requires two conditions: 1. There must be a relationship between values of the independent and dependent variables such that the former provides information about the latter. 2. The values for the independent variables must be known and available to the forecaster at the time the forecast must be made. Before we proceed, let’s reemphasize what we mean by point 1. Simply because there is a mathematical relationship does not guarantee that there is really cause and effect. Since C H A P T E R 1 3 Forecasting CD13-5 the Super Bowl began in 1967, almost every time an NFC team wins, the stock market’s Standard & Poor 500 indicator increases for that year. When an AFC team wins, the market usually goes down. In 32 years this rule has worked 88% of the time (28 out of 32)! If you really believed there was a significant relationship between these two variables (which team wins the Super Bowl and subsequent stock market performance that year), then in 1999 you would have taken all of your money out of the stock market and put it into a money market account (savings) or if you wanted to go even further, you might have sold short some stocks or the S&P Index because the Denver Broncos (AFC) won the Super Bowl early that year. One commonly used approach in creating a causal forecasting model is called curve fitting. CURVE FITTING: AN OIL COMPANY EXPANSION The fundamental ideas of curve fitting are easily illustrated by a model in which one independent variable is used to predict the value of the dependent variable. 
As a specific example, consider an oil company that is planning to expand its network of modern self-service gasoline stations. It plans to use traffic flow (measured in the average number of cars per hour) to forecast sales (measured in average dollar sales per hour). The firm has had five stations in operation for more than a year and has used historical data to calculate the averages shown in Figure 13.1 (OILCOMP.XLS). These data are plotted in Figure 13.2. Such a plot is often called a scatter diagram. In order to create this diagram (or chart) in Excel, we must do the following: 1. 2. Highlight the range of data (B2:C6); then click on the Chart Wizard. In the first step, indicate that you want the XY (Scatter) type of chart (the fifth choice), then indicate you want the first subtype of scatter chart you can choose (only the data points, no lines connecting). 3. In the second step, click on “Next>” because all of Excel’s default choices are fine. 4. In the third step, enter the X-axis label as “Cars/hour” and the Y-axis label as “Sales/hour ($),” then click on “Next>.” 5. In the final step, click on “As new sheet” to place the chart in a separate worksheet called “Chart1”; then click on “Finish.” We now wish to use these data to construct a function that will enable us to forecast the sales at any proposed location by measuring the traffic flow at that location and plugging its value into the function we construct. In particular, suppose that the traffic flow at a proposed location in Buffalo Grove is 183 cars per hour. How might we use the data in Figure 13.2 to forecast the sales at this location? Least Squares Fits The method of least squares is a formal procedure for curve fitting. It is a two-step process. 1. Select a specific functional form (e.g., a straight line or a quadratic curve). 2. Within the set of functions specified in step 1, choose the specific function that minimizes the sum of the squared deviations between the data points and the function values. FIGURE 13.1 Sales and Traffic Data CD13-6 C D C H A P T E R S FIGURE 13.2 Scatter Plot of Sales Versus Traffic To demonstrate this process, consider the sales-traffic flow example. In step 1, assume that we select a straight line; that is, we restrict our attention to functions of the form y = a + bx, Step 2 is illustrated in Figure 13.3. Here values for a and b were chosen (we’ll show how to do this in Excel momentarily), the appropriate line y = a + bx was drawn, and the deviations between observed points and the function are indicated. For example, d1 = y1 – [a + bx1] = 220 – [a + 150b] where y1 = actual (observed) sales/hr at location 1 (i.e., 220) x1 = actual (observed) traffic flow at location 1 (i.e., 150) a = intercept (on the vertical axis) for function in Figure 13.3 b = slope for the function in Figure 13.3 The value d12 is one measure of how close the value of the function [a + bx1] is to the observed value, y1; that is, it indicates how well the function fits at this one point. FIGURE 13.3 y Method of Least Squares 300 Sales/hour ($) 250 d3 d1 200 d5 d4 150 100 y = a + bx d2 50 50 100 150 200 Cars/hour x C H A P T E R 1 3 Forecasting CD13-7 We want the function to fit well at all points. One measure of how well it fits overall is 5 the sum of the squared deviations, which is di2. Let us now consider a general model with i=1 n as opposed to five observations. 
Then, since each di = yi – (a + bxi), the sum of the squared deviations can be written as

\sum_{i=1}^{n} \left( y_i - [a + b x_i] \right)^2   (13.1)

Using the method of least squares, we select a and b so as to minimize the sum shown in equation (13.1). The rules of calculus can be used to determine the values of a and b that minimize this sum. Over a century ago, mathematicians could not determine the straight line that minimized the absolute deviation or error |yi – ŷi|, but they could use calculus to determine the line that minimized the squared error (yi – ŷi)². Thus, forecasting has been inundated with “least squares” formulas and rationalizations as to why “squared” errors should be minimized. Today, with the advent of spreadsheets, we are able to use other error measures (e.g., Mean Absolute Deviation [MAD] and Mean Absolute Percentage Error [MAPE]) because spreadsheets combined with the Solver algorithm can minimize sums of absolute errors or percentage errors as well. These two newer error measures will be used and demonstrated extensively in Section 13.4.

To continue with the development of the traditional least squares approach, the procedure is to take the partial derivative of the sum in equation (13.1) with respect to a and set the resulting expression equal to zero. This yields one equation. A second equation is derived by following the same procedure with b. The equations that result from this procedure are

\sum_{i=1}^{n} -2\left( y_i - [a + b x_i] \right) = 0 \qquad \text{and} \qquad \sum_{i=1}^{n} -2 x_i \left( y_i - [a + b x_i] \right) = 0

Recall that the values for xi and yi are the observations, and our goal is to find the values of a and b that satisfy these two equations. The solution can be shown to be

b = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}, \qquad a = \frac{1}{n}\sum_{i=1}^{n} y_i - b\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)   (13.2)

The next step is to determine the values of \sum x_i, \sum x_i^2, \sum y_i, and \sum x_i y_i. Note that these quantities depend only on the data we have observed and that we can find them with simple arithmetic operations. Of course, Excel is highly capable of doing this for us and in fact has some predefined functions that will do it automatically. To do this, we simply

1. Click on Tools, then “Data Analysis . . .” (If you don’t see this as a choice on the submenu, you will need to click on “Add-Ins” and then select “Analysis ToolPak.”)
2. Choose “Regression”; this brings up the dialog box shown in Figure 13.4. Enter the Y-range as $C$2:$C$6 and the X-range as $B$2:$B$6.
3. Indicate that we want the results reported to us in a separate spreadsheet entitled “Results.”
4. Click on OK.

The results that are automatically calculated and reported by Excel are shown in Figure 13.5.

FIGURE 13.4 Excel’s Regression Tool Dialog Box
FIGURE 13.5 Results of Regression

There is a wealth of information reported to us, but the parameters of immediate interest are contained in cells B17:B18. We note that the “Intercept” (a) and “X Variable 1” value (b) are reported as:

b = 0.92997
a = 57.104

To add the resulting least squares line we must follow these steps:

1. Click on the worksheet with our original scatter plot (Chart1).
2. Click on the data series so that it is highlighted.
3. Click on the Chart menu, followed by “Add Trendline . . .”
4. In responding to the type of trendline we want, click on OK (because the linear trend is the default choice).
5. Click on OK.

The result is shown in Figure 13.6 as a solid line. Let’s explore what some of the other information reported in the regression’s “Summary Output” means.
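First, though, readers who want to verify the slope and intercept outside the spreadsheet can evaluate equation (13.2) directly. The sketch below is a minimal Python/NumPy version; the five (traffic, sales) pairs are illustrative stand-ins for the OILCOMP.XLS data (only location 1, 150 cars per hour and $220 per hour, is quoted in the text), so the fitted numbers will not match b = 0.92997 and a = 57.104 exactly.

```python
import numpy as np

# Illustrative data only: five (traffic, sales) pairs standing in for OILCOMP.XLS.
# The first pair (150 cars/hr, $220/hr) is from the text; the rest are hypothetical.
x = np.array([150.0, 205.0, 120.0, 175.0, 240.0])   # cars per hour
y = np.array([220.0, 250.0, 175.0, 245.0, 280.0])   # sales per hour ($)
n = len(x)

# Equation (13.2): slope and intercept that minimize the sum of squared deviations
b = (np.sum(x * y) - np.sum(x) * np.sum(y) / n) / (np.sum(x**2) - np.sum(x)**2 / n)
a = np.mean(y) - b * np.mean(x)

# Cross-check against NumPy's built-in least squares polynomial fit
b_chk, a_chk = np.polyfit(x, y, deg=1)
print(f"equation (13.2): a = {a:.3f}, b = {b:.5f}")
print(f"np.polyfit:      a = {a_chk:.3f}, b = {b_chk:.5f}")
```

The two lines of output should agree, since np.polyfit solves exactly the same least squares problem that equation (13.2) solves in closed form.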
The “R Square” value reported in cell B5 of Figure 13.5 is given as 69.4%. This is a “goodness of fit” measure, like the sum of squared deviations. This number represents the R² statistic discussed in introductory statistics classes. It ranges in value from 0 to 1 and gives us an idea of how much of the total variation in Y from its mean is explained by the line we’ve drawn.

FIGURE 13.6 Least Squares Linear Trend Line

Put another way, statisticians like to talk about three different sums of squares (Total Sum of Squares [TSS], Error Sum of Squares [ESS], and Regression Sum of Squares [RSS]). The basic relationship between them is TSS = ESS + RSS, and they are defined as follows:

TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \qquad ESS = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2, \qquad RSS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2   (13.3)

The ESS is the quantity that we tried to minimize with the Regression tool. Essentially, the sum of squared errors that is left after regression has done its job (ESS) is the amount of variation that can’t be explained by the regression. The RSS is effectively the amount of the original, total variation (TSS) that we could remove by using our newfound regression line. Put another way, R² is defined as

R^2 = \frac{RSS}{TSS}

If we could come up with a perfectly fitting regression line (ESS = 0), we would have RSS = TSS and R² = 1.0 (its maximum value). In our case, R² = .694, meaning we can explain approximately 70% of the variation in the Y values by using our one explanatory variable (X), cars per hour.

Now let’s get back to our original task—should we build a station at Buffalo Grove, where the traffic is 183 cars/hour? Our best guess at the corresponding sales volume is found by placing this X value into our new regression equation:

Sales/hour = 57.104 + 0.92997 * (183 cars/hour)

This gives us a forecasted sales/hour of $227.29. How confident are we in our forecast? It would be nice to be able to state a 95% confidence interval around our best guess. The information we need to do that is also contained in Excel’s summary output. In cell B7, Excel reports that the standard error (Se) is 44.18. This quantity represents the amount of scatter in the actual data around our regression line and is very similar in concept to the ESS. In fact, its formula is

S_e = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - k - 1}}   (13.4)

where n is the number of data points (5 in our example) and k is the number of independent variables (1 in our example). We can see that equation (13.4) is equivalent to

S_e = \sqrt{\frac{ESS}{n - k - 1}}

Once we have this Se value, we can take advantage of a rough rule of thumb based on the normal distribution: we have 68% confidence that the actual value of sales/hour would be within ±1 Se of our predicted value ($227.29). Likewise, we have 95% confidence that the actual value of sales/hour would be within ±2 Se of our predicted value ($227.29), meaning our 95% confidence interval would be [227.29 – 2(44.18); 227.29 + 2(44.18)], or [$138.93; $315.65]. To be more precise about these confidence intervals requires that we calculate Sp (the standard prediction error), which is always larger than Se but is more complicated to derive and beyond the scope of this introductory coverage. The intuition to remember is that when we’re trying to predict Y based on values of X that are near the mean X̄, Sp is very close to Se. The farther the X values get from X̄, the larger the difference between Sp and Se.
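The same goodness-of-fit quantities are easy to compute directly from a fitted line. The sketch below continues the Python example (with the same illustrative stand-in data, so R² and Se will not equal the text’s 0.694 and 44.18); it computes TSS, ESS, RSS, R², Se from equation (13.4), and the rough ±2·Se interval around the prediction at 183 cars/hour.

```python
import numpy as np

x = np.array([150.0, 205.0, 120.0, 175.0, 240.0])   # illustrative traffic data
y = np.array([220.0, 250.0, 175.0, 245.0, 280.0])   # illustrative sales data

b, a = np.polyfit(x, y, deg=1)        # slope, intercept
y_hat = a + b * x
n, k = len(x), 1                      # 5 observations, 1 independent variable

tss = np.sum((y - y.mean())**2)       # total sum of squares
ess = np.sum((y - y_hat)**2)          # error (residual) sum of squares
rss = tss - ess                       # regression sum of squares
r2 = rss / tss
se = np.sqrt(ess / (n - k - 1))       # equation (13.4)

# Forecast at 183 cars/hour with the rough +/- 2*Se rule of thumb
forecast = a + b * 183.0
print(f"R^2 = {r2:.3f}, Se = {se:.2f}")
print(f"forecast = {forecast:.2f}, rough 95% interval = "
      f"[{forecast - 2 * se:.2f}, {forecast + 2 * se:.2f}]")
```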
Another value of interest in the Summary report is the t-statistic for the X variable and its associated values (cells D18:G18). The t-statistic is given in cell D18 as 2.61. The P-value in cell E18 is 0.0798. We desire to have the P-value less than 0.05. This would represent that we have at least 95% confidence that the slope parameter (b) is statistically significantly different than zero (a zero slope would be a flat line and indicate no relationship between Y and X). In fact, Excel provides the 95% confidence interval for its estimate of b. In our case, we have 95% confidence the true value for b is between –0.205 and 2.064 (cells F18 and G18) and thus we can’t exclude the possibility that the true value of b might be zero. Lastly, the F-significance reported in cell F12 is identical to the P-value for the t-statistic (0.0798) and will always be so if there is only a single independent variable. In the case of more than one X variable, the F-significance tests the hypothesis that all the X variable parameters as a group are statistically significantly different than zero. One final note as you move into multiple regression models: As you add other X variables, the R2 statistic will always increase, meaning the RSS has increased. But in order to keep from overmassaging the data (similar to what we discussed with the polynomial of degree n – 1), you should keep an eye on the Adjusted R2 statistic (cell B6) as the more reliable indicator of the true goodness of fit because it compensates for the reduction in the ESS due to the addition of more independent variables. Thus it may report a decreased adjusted R2 value even though R2 has increased, unless the improvement in RSS is more than compensated for by the addition of the new independent variables. We should point out that we could have obtained the same parameter values for a and b by using the Solver algorithm (set our objective function as the sum of the squared deviations, let the decision variables be a and b and then turn the Solver loose in trying to minimize our nonlinear objective function). Note that our forecast also “predicts” earning $57.10 (the value for a) when no cars arrive (i.e., cars/hour = 0). At this point it might be well to establish limits on the range over C H A P T E R 1 3 Forecasting CD13-11 which we feel that the forecast is valid (e.g., from 30 to 250 cars) or seek a logical explanation. Many service stations have convenience foods and also do a walk-in business. Thus “a” might represent the amount of walk-in business (which would be constant regardless of how much car traffic there is). Fitting a Quadratic Function The example above has shown how to make linear fits for the case of one independent variable. But the method of least squares can be used with any number of independent variables and with any functional form. As an illustration, suppose that we wish to fit a quadratic function of the form y = a0 + a1x + a2x 2 to our previous data with the method of least squares. Our goal, then, is to select a0, a1, and a2 in order to minimize the sum of squared deviations, which is now 5 (yi – [a0 + a1xi + a2xi2])2 i=1 (13.5) We proceed by setting the partial derivatives with respect to a0, a1, and a2 equal to zero. This gives the equations 5a0 + (xi)a1 + (xi2)a2 = yi (xi)a0 + (xi2)a1 + (xi3)a2 = xi yi (xi2)a0 + (xi3)a1 + (xi4)a2 = xi2yi. (13.6) This is a simple set of three linear equations in three unknowns. 
Thus, the general name for this least squares curve fitting is “Linear Regression.” The term linear comes not from a straight line being fit, but from the fact that simultaneous linear equations are being solved. Finding the numerical values of the coefficients is a straightforward task in a spreadsheet. This time instead of using the “Regression” tool in Excel, we will demonstrate the use of the Solver algorithm. We use the new spreadsheet called “Quadratic” in the same OILCOMP.XLS workbook that is shown in Figure 13.7. FIGURE 13.7 Quadratic Trend Spreadsheet Cell Formula Copy To D7 E7 F7 F13 F15 = $B$2$B$3*B7$B$4B7^2 = C7D7 = E7^2 = SUM(F7:F11) = SUMXMY2(C7:C11,D7:D11) D8:D11 E8:E11 F8:F11 — — CD13-12 C D C H A P T E R S FIGURE 13.8 Solver Dialog Box for Quadratic Trend The steps to find the optimal values for the parameters (a0, a1, and a2) are indicated below: 1. Click on the Tools menu, and then “Solver . . .” 2. Complete the Solver Parameters dialog box, as shown in Figure 13.8. Click on “Solve.” We are basically setting up an unconstrained, nonlinear optimization model, where the three parameters (cells B2:B4) are our decision variables (changing cells) and our objective function is to minimize the sum of squared errors (cell F13). 3. When Solver returns its dialog box showing that it has found a solution, click on OK and you see the results shown in Figure 13.9. We see the optimal parameters are: a0 = –13.586 a1 = 2.147 a2 = –0.0044 which yields a sum of squared errors of 4954. Note: Excel has a built-in function to help us calculate this quantity directly (i.e., we could do it without columns E and F) known as =sumxmy2(range1, range2) and it is shown in cell F15 of Figure 13.9. The function takes FIGURE 13.9 Results for Optimal Quadratic Parameters Cell Formula Copy To D7 E7 F7 F13 F15 = $B$2$B$3*B7$B$4B7^2 = C7D7 = E7^2 = SUM(F7:F11) = SUMXMY2(C7:C11,D7:D11) D8:D11 E8:E11 F8:F11 — –– C H A P T E R 1 3 Forecasting CD13-13 FIGURE 13.10 Quadratic Least Squares Function Fit to Data the values in the second range and subtracts them from the values in the first range (one at a time), squares the difference, and sums these squared differences up for all the values in the range. To plot the original data and this quadratic function, we use the Chart Wizard with the following steps: 1. Highlight the original range of data (B7:C11) in the “Quadratic” spreadsheet, then click on the Chart Wizard. 2. In the first step, indicate that you want the XY (Scatter) type of chart (the fifth choice), then indicate that you want the first subtype of scatter chart you can choose (only the data points, no lines connecting). 3. In the second step, click on “Next>” because all of Excel’s default choices are fine. 4. In the third step, enter the X-axis label as “Cars/hour” and the Y-axis label as “Sales/hour($)”; then click on “Next>.” 5. In the final step, click on “As new sheet” to place the chart in a separate worksheet called “Chart2”; then click on “Finish.” 6. Click on the data series in Chart2 so that they’re highlighted. 7. Click on the Chart menu, followed by “Add Trendline.” 8. In responding to the type of trendline we want, click on “Polynomial” of order 2. 9. Click on OK and you get the graph shown in Figure 13.10. To do this same thing with the “Regression” tool, you must first create a column for a second independent variable, X2 = X12, and then regress Y (Sales/hr) on both X1 (Cars/hr) and X2 ([Cars/hr] ˆ2). We leave this as an exercise for the student (see Problem 13-23). 
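The quadratic fit can also be reproduced outside Excel. The sketch below (Python with NumPy; the same illustrative stand-in data as before, so the coefficients will not match a0 = –13.586, a1 = 2.147, a2 = –0.0044) solves the three normal equations (13.6) directly and cross-checks the answer with a library polynomial fit; minimizing the sum of squared errors with Solver, as in Figure 13.8, is simply another route to the same least squares solution.

```python
import numpy as np

# Illustrative stand-ins for the OILCOMP.XLS observations (not the textbook's data)
x = np.array([150.0, 205.0, 120.0, 175.0, 240.0])
y = np.array([220.0, 250.0, 175.0, 245.0, 280.0])
n = len(x)

# Normal equations (13.6) for y = a0 + a1*x + a2*x^2, written as A @ [a0, a1, a2] = rhs
A = np.array([
    [n,            np.sum(x),     np.sum(x**2)],
    [np.sum(x),    np.sum(x**2),  np.sum(x**3)],
    [np.sum(x**2), np.sum(x**3),  np.sum(x**4)],
])
rhs = np.array([np.sum(y), np.sum(x * y), np.sum(x**2 * y)])
a0, a1, a2 = np.linalg.solve(A, rhs)

sse = np.sum((y - (a0 + a1 * x + a2 * x**2))**2)   # the quantity Solver minimizes (cell F13)
print(f"a0 = {a0:.3f}, a1 = {a1:.3f}, a2 = {a2:.5f}, SSE = {sse:.1f}")

# Cross-check: np.polyfit solves the same least squares problem
a2_chk, a1_chk, a0_chk = np.polyfit(x, y, deg=2)
print(f"check: a0 = {a0_chk:.3f}, a1 = {a1_chk:.3f}, a2 = {a2_chk:.5f}")
```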
Comparing the Linear and Quadratic Fits In the method of least squares, we have selected the sum of the squared deviations as our measure of “goodness of fit.” We can thus compare the linear and the quadratic fit with this criterion. In order to make this comparison, we have to go back and use the linear regression “Results” spreadsheet and make the corresponding calculation in the original “Data” spreadsheet. This work is shown in Figure 13.11. We see that the sum of the squared deviations for the quadratic function is indeed smaller than that for the linear function (i.e., 4954 < 5854.7). Indeed, the quadratic gives us CD13-14 C D C H A P T E R S FIGURE 13.11 Sum of Squared Errors Calculation for Linear Regression Results Cell Formula Copy To D2 E2 F2 F8 = Results!$B$17Results!$B$18*B2 = C2D2 = E2^2 = SUM(F2:F6) D3:D6 E3:E6 F3:F6 — roughly a 15% decrease in the sum of squared deviations. The general result has to hold in this direction; that is, the quadratic function must always fit better than the linear function. A linear function is, after all, a special type of a quadratic function (one in which a2 = 0). It follows then that the best quadratic function must be at least as good as the best linear function. WHICH CURVE TO FIT? If a quadratic function is at least as good as a linear function, why not choose an even more general form, such as a cubic or a quartic, thereby getting an even better fit? In principle the method can be applied to any specified functional form. In practice, functions of the form (again using only a single independent variable for illustrative purposes) y = a0 + a1x + a2x 2 + · · · + an xn are often suggested. Such a function is called a polynomial of degree n, and it represents a broad and flexible class of functions (for n = 2 we have a quadratic, n = 3 a cubic, n = 4 a quartic, etc.). One can obtain an amazing variety of curves with polynomials, and thus they are popular among curve fitters. One must, however, proceed with caution when fitting data with a polynomial function. Under quite general conditions it is possible, for example, to find a (k – 1)-degree polynomial that will perfectly fit k data points. To be more specific, suppose that we have on hand seven historical observations, denoted (xi , yi), i = 1, 2, . . . , 7. It is possible to find a sixth-degree polynomial y = a0 + a1x + a2x 2 + · · · + a6x 6 that exactly passes through each of these seven data points (see Figure 13.12). This perfect fit (giving zero for the sum of squared deviations), however, is deceptive, for it does not imply as much as you may think about the predictive value of the model for use in future forecasting. For example, refer again to Figure 13.12. When the independent variable (at some future time) assumes the value x8, the true value of y might be given by y8, whereas the predicted value is ŷ8. Despite the previous perfect fit, the forecast is very inaccurate. In this situation a linear fit (i.e., a first-degree polynomial) such as the one indicated in Figure 13.12 might well provide more realistic forecasts, although by the criterion of least squares it does not “fit” the historical data nearly as well as the sixth-degree polynomial. Also, note that the polynomial fit has hazardous extrapolation properties. That is, the polynomial “blows up” at its extremes; x values only slightly larger than x6 produce very large predicted y’s. 
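This “perfect fit, poor forecast” behavior is easy to reproduce numerically. The sketch below (Python with NumPy; the seven observations are hypothetical, not the data pictured in Figure 13.12) fits a sixth-degree polynomial and a straight line to seven roughly linear points, then extrapolates both just beyond the data.

```python
import numpy as np

# Seven hypothetical observations that follow a roughly linear pattern plus noise
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.3, 11.9, 14.2])

p6 = np.polyfit(x, y, deg=6)   # sixth-degree polynomial: passes through all seven points
p1 = np.polyfit(x, y, deg=1)   # straight line: small but nonzero residual error

print("SSE, degree 6:", np.sum((y - np.polyval(p6, x))**2))   # essentially zero
print("SSE, degree 1:", np.sum((y - np.polyval(p1, x))**2))

# Extrapolating only slightly beyond the data shows the "wild" behavior
for x_new in (8.0, 9.0, 10.0):
    print(f"x = {x_new:4.1f}: degree-6 forecast = {np.polyval(p6, x_new):10.1f}, "
          f"linear forecast = {np.polyval(p1, x_new):6.1f}")
```

The sixth-degree fit wins decisively on historical SSE yet produces absurd forecasts a step or two past the last observation, while the straight line extrapolates sensibly.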
Looking at Figure 13.12, you can understand why high-order polynomial fits are referred to as “wild.” C H A P T E R FIGURE 13.12 1 3 Forecasting CD13-15 y A Sixth-Degree Polynomial Produces a Perfect Fit Linear fit y8 A sixth-degree polynomial yˆ 8 x7 x8 x2 x5 x1 x3 x4 x6 x One way of finding which fit is truly “better” is to use a different standard of comparison, the “mean squared error” or MSE—which is the sum of squared errors/(number of points – number of parameters). For our linear fit, the number of parameters estimated is 2 (a, b), so MSE = 5854/(5 – 2) = 1951.3; and for the quadratic fit, the MSE = 4954/ (5 – 3) = 2477.0. Thus, the MSE gets worse in this case, even though the total sum of squares will always be less or the same for a higher-order fit. (Note: This is similar to how the adjusted R2 statistic works.) We still have to be somewhat careful even with this new standard of comparison, because when there is a perfect fit, both the total sum of squares and MSE will be 0.00. Because of this, most prepackaged forecasting programs will fit only up through a cubic polynomial, since higher degrees simply don’t reflect the general trend of actual data. What Is a Good Fit? The intent of the paragraphs above is to suggest that a model that has given a good fit to historical data may provide a terrible fit to future data. That is, a good historical fit may have poor predictive power. So what is a good fit? The answer to this question involves considerations both philosophic and technical. It depends, first, on whether one has some idea about the underlying real-world process that relates the y’s and the x’s. To be an effective forecasting device, the forecasting function must to some extent capture important features of that process. The more one knows, the better one can do. To go very far into this topic, one must employ a level of statistics that would extend well beyond this introductory coverage. For our purposes it suffices to state that knowledge of the underlying process is typically phrased in statistical language. For example, linear curve fitting, in the statistical context, is called linear regression. If the statistical assumptions about the linear regression model are precisely satisfied (e.g., errors are normally distributed around the regression line), then in a precise and well-defined sense statisticians can prove that the linear fit is the “best possible fit.” But in a real sense, this begs the question. In the real world one can never be completely certain about the underlying process. It is never “served to us on a silver platter.” One only has some (and often not enough) historical data to observe. The question then becomes: How much confidence can we have that the underlying process is one that satisfies a particular set of statistical assumptions? Fortunately, quantitative measures do exist. Statistical analysis, at least for simple classes of models like linear regression, can reveal how well the historical data do indeed satisfy those assumptions. And what if they do not? One tries a different model. Let us regress (digress) for a moment to recall some of the philosophy involved with the use of optimization models CD13-16 C D C H A P T E R S (which is exactly what least squares fitting is—an unconstrained nonlinear optimization). There is an underlying real-world problem. The model is a selective representation of that problem. How good is that model, or representation? 
One usually does not have precise measures, and many paragraphs in this text have been devoted to the role of managerial judgment and sensitivity analysis in establishing a model’s credibility. Ideally, to test the goodness of a model, one would like to have considerable experience with its use. If, in repeated use, we observe that the model performs well, then our confidence is high.1 However, what confidence can we have at the outset, without experience? Validating Models One benchmark, which brings us close to the current context, is to ask the question: Suppose the model had been used to make past decisions; how well would the firm have fared? This approach “creates” experience by simulating the past. This is often referred to as validation of the model. One way to use this approach, in the forecasting context, is called “divide and conquer” and is discussed in Section 13.5. Typically, one uses only a portion of the historical data to create the model—for example, to fit a polynomial of a specified degree. One can then use the remaining data to see how well the model would have performed. This procedure is specified in some detail in Section 13.5. At present, it suffices to conclude by stressing that in curve fitting the question of “goodness of fit” is both philosophic and technical, and you do not want to lose sight of either issue. SUMMARY A causal forecasting model uses one or more independent variables to forecast the value of a dependent or response variable. The model is often created by fitting a curve to an existing set of data and then using this curve to determine the response associated with new values of the independent variable(s). The method of least squares is a particularly useful method of fitting a curve. We illustrated the general concept of this method and considered the specific problems of fitting a straight line, a quadratic function, and higher-order polynomials to a set of data. For simplicity, all of our illustrations involved a single independent variable but the same techniques apply to models with many variables. These few examples of causal forecasting models demonstrate that even in simple models the required calculations are tedious. The wide availability of spreadsheets has reduced the problem of performing the necessary calculations so that it is insignificant. The important questions are: What model, if any, can do a reliable job of forecasting, and are the data required for such a model available and reliable? We have discussed both philosophic and technical issues that the “curve fitting” manager must address. Comments on the role of causal models in managerial decision making are reserved for Section 13.7. We now turn our attention to time-series analysis. 13.4 TIME-SERIES FORECASTING MODELS Another class of quantitative forecasting techniques comprises the so-called time-series forecasting models. These models produce forecasts by extrapolating the historical behavior of the values of a particular single variable of interest. For example, one may be interested in the sales for a particular item, or a fluctuation of a particular market price with time. Timeseries models use a technique to extrapolate the historical behavior into the future. Figuratively, the series is being lifted into the future “by its own bootstraps.” Time-series data are historical data in chronological order, with only one value per time period. 
Thus, the data for the service station from the previous section are not time-series data and cannot be analyzed using the techniques in this section. 1 No matter how much observation seems to substantiate the model, we can never conclude that the model is “true.” Recall the high degree of “substantiation” of the flat earth model. “If you leave port and sail westward you will eventually fall off the earth and never be seen again.” C H A P T E R 1 3 Forecasting CD13-17 EXTRAPOLATING HISTORICAL BEHAVIOR In order to provide several examples of bootstrap methods, let us suppose that we have on hand from the Wall Street Journal the daily closing prices of a March cocoa futures contract for the past 12 days, including today, and that from this past stream of data we wish to predict tomorrow’s closing price. Several possibilities come to mind: 1. If it is felt that all historical values are important, and that all have equal predictive power, we might take the average of the past 12 values as our best forecast for tomorrow. 2. If it is felt that today’s value (the 12th) is far and away the most important, this value might be our best prediction for tomorrow. 3. It may be felt that in the current “fast-trending market” the first six values are too antiquated, but the most recent six are important and each has equal predictive power. We might then take the average of the most recent six values as our best estimate for tomorrow. 4. It may be felt that all past values contain useful information, but today’s (the 12th observation) is the most important of all, and, in succession, the 11th, 10th, 9th, and so on, observations have decreasing importance. In this case we might take a weighted average of all 12 observations, with increasing weights assigned to each value in the order 1 through 12 and with the 12 weights summing to 1. 5. We might actually plot the 12 values as a function of time and then draw a linear “trend line” that lies close to these values. This line might then be used to predict tomorrow’s value. Let us now suppose that tomorrow’s actual closing price is observed and consider our forecast for the day after tomorrow, using the 13 available historical values. Methods 1 and 2 can be applied in a straightforward manner. Now consider method 3. In this case we might take tomorrow’s actual observed price, together with today’s and the previous four prices, to obtain a new 6-day average. This technique is called a simple 6-period moving average, and it will be discussed in more detail in the following sections. Let us now refer to method 4. In this instance, since we employ all past values, we would be using 13 rather than 12 values, with new weights assigned to these values. An important class of techniques called exponential smoothing models operate in this fashion. These models will also be explored in the ensuing discussion. Finally, we shall explore in more detail the technique mentioned in item 5. This provides another illustration of forecasting by a curve fitting method. We mention at this point that whenever we have values for a particular (single) variable of interest, which can be plotted against time, these values are often termed a time series, and any method used to analyze and extrapolate such a series into the future falls within the general category of time-series analysis. This is currently a very active area of research in statistics and management science. We will be able to barely scratch the surface in terms of formal development. 
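To make these five options concrete, the sketch below applies each of them to a hypothetical 12-day series of closing prices (Python with NumPy; the actual Wall Street Journal prices are not reproduced in the text, so the numbers are purely illustrative).

```python
import numpy as np

# Twelve hypothetical daily closing prices for the March cocoa contract
prices = np.array([88.2, 88.9, 88.5, 89.4, 90.1, 89.8,
                   90.6, 91.2, 91.0, 91.9, 92.4, 92.8])
t = np.arange(1, 13)

f1 = prices.mean()                         # 1: average of all 12 values
f2 = prices[-1]                            # 2: today's (12th) value
f3 = prices[-6:].mean()                    # 3: simple 6-period moving average
w = np.arange(1, 13, dtype=float)          # 4: weights increasing toward today,
w /= w.sum()                               #    summing to 1
f4 = np.dot(w, prices)
slope, intercept = np.polyfit(t, prices, deg=1)
f5 = intercept + slope * 13                # 5: linear trend line extrapolated to day 13

for name, f in [("average of all data", f1), ("last value", f2),
                ("6-period moving average", f3), ("weighted average", f4),
                ("linear trend", f5)]:
    print(f"{name:25s} forecast for day 13: {f:.2f}")
```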
Nevertheless, some of the important concepts, from the manager’s viewpoint, will be developed. In this section, we will use the error measures of MAD (mean absolute deviation) and MAPE (mean absolute percentage error) instead of mean squared error (MSE), which was used extensively in Section 13.3. CURVE FITTING We have already considered curve fitting in the discussion of causal models. The main difference in the time-series context is that the independent variable is time. The historical observations of the dependent variable are plotted against time, and a curve is then fitted to these data. The curve is then extended into the future to yield a forecast. In this context, extending the curve simply means evaluating the derived function for larger values of t, the time. This procedure is illustrated for a straight line in Figure 13.13. CD13-18 C D FIGURE 13.13 Fitting a Straight Line C H A P T E R S Sales Forecast for period t + 2 Historical data Forecast for period t + 1 t Time The use of time as an independent variable has more serious implications than altering a few formulas, and a manager should understand the important difference between a causal model using curve fitting and a time-series model using curve fitting. One of the assumptions with curve fitting is that all the data are equally important (weighted). This method produces a very stable forecast that is fairly insensitive to slight changes in the data. The mathematical techniques for fitting the curves are identical, but the rationale, or philosophy, behind the two models is basically quite different. To understand this difference, think of the values of y, the variable of interest, as being produced by a particular underlying process or system. The causal model assumes that as the underlying system changes to produce different values of y, it will also produce corresponding differences in the independent variables and thus, by knowing the independent variables, a good forecast of y can be deduced. The time-series model assumes that the system that produces y is essentially stationary (or stable) and will continue to act in the future as it has in the past. Future patterns in the movement of y will closely resemble past patterns. This means that time is a surrogate for many factors that may be difficult to measure but that seem to vary in a consistent and systematic manner with time. If the system that produces y significantly changes (e.g., because of changes in environment, technology, or government policy), then the assumption of a stationary process is invalid and consequently a forecast based on time as an independent variable is apt to be badly in error. Just as for causal models, it is, of course, possible to use other than linear functions to extrapolate a series of observations (i.e., to forecast the future). As you might imagine, one alternative that is often suggested in practice is to assume that y is a higher-order polynomial in t, that is, yt = b0 + b1t + b2t 2 + . . . + bkt k As before, appropriate values for the parameters b0, b1, . . . ,bk must be mathematically derived from the values of previous observations. The higher-order polynomial, however, suffers from the pitfalls described earlier. That is, perfect (or at least extremely good) historical fits with little or no predictive power may be obtained. MOVING AVERAGES: FORECASTING STECO’S STRUT SALES The assumption behind models of this type is that the average performance over the recent past is a good forecast of the future. 
The fact that only the most recent data are being used to forecast the future, perhaps with the most recent data weighted most heavily, produces a forecast that is much more responsive than a curve fitting model. This new type of model will be sensitive to increases or decreases in sales, or other changes in the data. It is perhaps surprising that these “naive” models are extremely important in applications. Many of the world’s major airlines need to generate forecasts of the demand still to come in each fare class in order to feed these forecasts into sophisticated revenue management optimization engines. A great number of the airlines use a particular type of moving average called an exponentially weighted moving average. In addition, almost all inventory control packages include a forecasting subroutine based on this same type of moving average (exponentially weighted moving averages). On the basis of a criterion such as “frequency of use,” the method of moving averages is surely an important forecasting procedure.

One person who is deeply concerned about the use of simple forecasting models is Victor Kowalski, the new vice president of operations of STECO. His introduction to inventory control models is discussed in Chapter 7. Since he is responsible for the inventory of thousands of items, simple (i.e., inexpensive) forecasting models are important to him. In order to become familiar with the various models, he decides to “try out” different models on some historical data. In particular he decides to use last year’s monthly sales data for stainless steel struts to learn about the different models and to see how well they would have worked if STECO had been using the models last year. He is performing what is called a validation study. The forecasting models are presented, of course, in symbols. Victor feels that it would be useful to use a common notation throughout his investigation. He thus decides to let

yt–1 = observed sales of struts in month t – 1
ŷt = forecast of sales for struts in period t

He is interested in forecasting the sales one month ahead; that is, he will take the known historical values y1, . . . , yt–1 (demand in months 1 through t – 1) and use this information to produce ŷt, the forecast for demand in month t. In other words, he will take the actual past sales, up through May, for example, and use them to forecast the sales in June; then he will use the sales through June to forecast sales in July, and so on. This process produces a sequence of ŷt values. By comparing these values with the observed yt values, one obtains an indication of how the forecasting model would have worked had it actually been in use last year.

Simple n-Period Moving Average

The simplest model in the moving average category is the simple n-period moving average. In this model the average of a fixed number (say, n) of the most recent observations is used as an estimate of the next value of y. For example, if n equals 4, then after we have observed the value of y in period 15, our estimate for period 16 would be

\hat{y}_{16} = \frac{y_{15} + y_{14} + y_{13} + y_{12}}{4}

In general,

\hat{y}_{t+1} = \frac{1}{n}\left(y_t + y_{t-1} + \cdots + y_{t-n+1}\right)

The application of a 3-period and a 4-period moving average to STECO’s strut sales data is shown in Table 13.1. We see that the 3-month moving average forecast for sales in April is the average of January, February, and March sales, (20 + 24 + 27)/3, or 23.67. Ex post (i.e., after the forecast), actual sales in April were 31.
Thus in this case the sales forecast differed from actual sales by 31 – 23.67, or 7.33. Comparing the actual sales to the forecast sales using the data in Table 13.1 suggests that neither forecasting method seems particularly accurate. It is, however, useful to replace this qualitative impression with some quantitative measure of how well the two methods performed. The measures of comparison we’ll use in this section are the mean absolute deviation (MAD) and the mean absolute percentage error (MAPE), where

MAD = \frac{\sum_{\text{all forecasts}} \left| \text{actual sales} - \text{forecast sales} \right|}{\text{number of forecasts}}

MAPE = \frac{\sum_{\text{all forecasts}} \dfrac{\left| \text{actual sales} - \text{forecast sales} \right|}{\text{actual sales}}}{\text{number of forecasts}} \times 100\%   (13.7)

Table 13.1 Three- and Four-Month Simple Moving Averages

MONTH   ACTUAL SALES ($000s)   THREE-MONTH SIMPLE MOVING AVERAGE FORECAST   FOUR-MONTH SIMPLE MOVING AVERAGE FORECAST
Jan.    20
Feb.    24
Mar.    27
Apr.    31    (20 + 24 + 27)/3 = 23.67
May     37    (24 + 27 + 31)/3 = 27.33    (20 + 24 + 27 + 31)/4 = 25.50
June    47    (27 + 31 + 37)/3 = 31.67    (24 + 27 + 31 + 37)/4 = 29.75
July    53    (31 + 37 + 47)/3 = 38.33    (27 + 31 + 37 + 47)/4 = 35.50
Aug.    62    (37 + 47 + 53)/3 = 45.67    (31 + 37 + 47 + 53)/4 = 42.00
Sept.   54    (47 + 53 + 62)/3 = 54.00    (37 + 47 + 53 + 62)/4 = 49.75
Oct.    36    (53 + 62 + 54)/3 = 56.33    (47 + 53 + 62 + 54)/4 = 54.00
Nov.    32    (62 + 54 + 36)/3 = 50.67    (53 + 62 + 54 + 36)/4 = 51.25
Dec.    29    (54 + 36 + 32)/3 = 40.67    (62 + 54 + 36 + 32)/4 = 46.00

The MAD is calculated for the 3-month (beginning with April) and 4-month (beginning with May) moving average forecasts in a new spreadsheet (STRUT.XLS), which is shown in Figure 13.14. Since the 3-month moving average yields a MAD of 12.67 (cell D16), whereas the 4-month moving average yields a MAD of 15.59 (cell F16), it seems (at least historically) that including more historical data harms rather than helps the forecasting accuracy. A simple moving average will always lag behind rising data and stay above declining data. Thus, if there are broad rises and falls, simple moving averages will not perform well. They are best suited to data with small erratic ups and downs, providing some stability in the face of the random perturbations.

FIGURE 13.14 Mean Absolute Deviation Comparison for Three- and Four-Month Moving Average Forecasts

Cell   Formula              Copy To
C5     =AVERAGE(B2:B4)      C6:C13
D5     =ABS(B5-C5)          D6:D13
E6     =AVERAGE(B2:B5)      E7:E13
F6     =ABS(B6-E6)          F7:F13
D15    =SUM(D5:D13)         F15
D16    =AVERAGE(D5:D13)     F16

The simple moving average has two shortcomings, one philosophical and the other operational. The philosophical problem centers on the fact that in calculating a forecast (say, ŷ8), the most recent observation (y7) receives no more weight or importance than an older observation such as y5. This is because each of the last n observations is assigned the weight 1/n. This procedure of assigning equal weights stands in opposition to one’s intuition that in many instances the more recent data should tell us more than the older data about the future. Indeed, the analysis in Figure 13.14 suggests that better predictions for strut sales are based on the most recent data. The second shortcoming, which is operational, is that if n observations are to be included in the moving average, then (n – 1) pieces of past data must be brought forward to be combined with the current (the nth) observation. All of these data must be stored in some way in order to calculate the forecast. This is not a serious problem when a small number of forecasts is involved.
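Before turning to larger-scale settings, note that the calculations behind Table 13.1 and Figure 13.14 are easy to reproduce outside the spreadsheet. The sketch below is a minimal Python/NumPy version (the helper functions are ours, not part of the text); it uses the strut sales from Table 13.1 and should reproduce the MAD values of 12.67 and 15.59.

```python
import numpy as np

# STECO monthly strut sales ($000s), January through December (Table 13.1)
sales = np.array([20, 24, 27, 31, 37, 47, 53, 62, 54, 36, 32, 29], dtype=float)

def sma_forecasts(y, n):
    """Simple n-period moving average: the forecast for period t is the mean of the n prior values."""
    return np.array([y[t - n:t].mean() for t in range(n, len(y))])

def mad(actual, forecast):
    """Mean absolute deviation, equation (13.7)."""
    return np.mean(np.abs(actual - forecast))

for n in (3, 4):
    f = sma_forecasts(sales, n)
    print(f"{n}-month simple moving average: MAD = {mad(sales[n:], f):.2f}")
# Expected output, matching Figure 13.14: 12.67 for n = 3 and 15.59 for n = 4
```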
The situation is quite different for the firm that needs to forecast the demand for thousands of individual products on an item-by-item basis. If, for example, STECO is using 8-period moving averages to forecast demand for 5,000 small parts, then for each item 7 pieces of old data must be stored for each forecast, in addition to the most recent observation. This implies that a total of 40,000 pieces of data must be stored. Another example where the number of forecasts is huge comes from the airline industry. Consider an airline with a large number of flights departing per day (like United Airlines or Continental). Suppose it has 2,000 flights departing every day and it tracks all flights for 300 days in advance. This means it has 600,000 flights to track and forecast on an ongoing basis. In both these cases, storage requirements, as well as computing time, may become important factors in designing a forecasting and/or inventory control system.

Weighted n-Period Moving Average

The notion that recent data are more important than old data can be implemented with a weighted n-period moving average. This generalizes the notion of a simple n-period moving average, where, as we have seen, each weight is 1/n. In this more general form, taking n = 3 as a specific example, we would set

\hat{y}_7 = \alpha_0 y_6 + \alpha_1 y_5 + \alpha_2 y_4

where the α’s (which are called weights) are nonnegative numbers that are chosen so that smaller weights are assigned to more ancient data and all the weights sum to 1. There are, of course, innumerable ways of selecting a set of α’s to satisfy these criteria. For example, if the weighted average is to include the last three observations (a weighted 3-period moving average), one might set

\hat{y}_7 = \frac{3}{6} y_6 + \frac{2}{6} y_5 + \frac{1}{6} y_4

Alternatively, one could define

\hat{y}_7 = \frac{5}{10} y_6 + \frac{3}{10} y_5 + \frac{2}{10} y_4

In both these expressions we have decreasing weights that sum to 1. In practice, the proper choice of weights could easily be left to the Solver algorithm. To get some idea about its performance, Victor applies the 3-month weighted moving average with initial weights 3/6, 2/6, 1/6 to the historical stainless strut data. The forecasts and the MAD are developed by Victor in a new sheet “WMA” in the same workbook (STRUT.XLS) and are shown in Figure 13.15. Comparing the new MAD Victor obtained of 11.04 (cell G16) to the MAD of the 3-month simple moving average (12.67) and the 4-month simple moving average (15.59) confirms the suggestion that recent sales results are a better indicator of future sales than are older data.

FIGURE 13.15 Initial Three-Month Weighted Moving Average

Cell   Formula                           Copy To
B4     =SUM(B1:B3)                       —
F5     =SUMPRODUCT($B$1:$B$3, E2:E4)     F6:F13
G5     =ABS(E5-F5)                       G6:G13
G15    =SUM(G5:G13)                      —
G16    =AVERAGE(G5:G13)                  —

Of course, if we let the Solver choose the optimal weights for us, we can do even better than our initial guess at the weights. To let Solver choose the weights that minimize the MAD, we do the following:

1. Click on Tools, then “Solver.”
2. Set the Target cell to G16 and tell Solver we want to minimize it.
3. Indicate that the changing cells are B1:B3.
4. Add the constraints that (a) B4 = 1.0, (b) B1:B3 ≥ 0, (c) B1:B3 ≤ 1, (d) B1 ≥ B2, and (e) B2 ≥ B3.
5. Click on Solve and you get the results shown in Figure 13.16.

Here we see that the optimal weighting is to place all the weight on the most recent observation, which yields a MAD of 7.56 (a 31.5% reduction in error from our initial guess).
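The same exercise can be reproduced outside Excel. The sketch below (Python with NumPy; the helper function and the crude grid search standing in for Solver are ours) evaluates the MAD for the initial weights 3/6, 2/6, 1/6 and then searches over nonincreasing weights that sum to 1; it should land on all of the weight going to the most recent month, matching the Solver result.

```python
import numpy as np

# STECO monthly strut sales ($000s), January through December (Table 13.1)
sales = np.array([20, 24, 27, 31, 37, 47, 53, 62, 54, 36, 32, 29], dtype=float)

def wma_mad(w):
    """MAD of a weighted 3-month moving average; w = (weight on t-1, t-2, t-3), summing to 1."""
    w1, w2, w3 = w
    forecasts = np.array([w1 * sales[t - 1] + w2 * sales[t - 2] + w3 * sales[t - 3]
                          for t in range(3, len(sales))])
    return np.mean(np.abs(sales[3:] - forecasts))

print(f"initial weights (3/6, 2/6, 1/6): MAD = {wma_mad((3/6, 2/6, 1/6)):.2f}")   # 11.04

# A crude stand-in for Solver: enumerate nonincreasing weights (in steps of 0.05) summing to 1
candidates = [(i / 20, j / 20, k / 20)
              for i in range(21) for j in range(21) for k in range(21)
              if i + j + k == 20 and i >= j >= k]
best = min(candidates, key=wma_mad)
print(f"best weights found: {best}, MAD = {wma_mad(best):.2f}")   # (1.0, 0.0, 0.0), 7.56
```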
This continues to confirm our idea that we should give more weight to the most recent observation (to the extreme in this example). Although the weighted moving average places more weight on more recent data, it does not solve the operational problem of data storage, since (n – 1) pieces of historical sales data must still be stored. We now turn to a weighting scheme that cleverly addresses this problem.

FIGURE 13.16  Optimal Three-Month Weighted Moving Average

EXPONENTIAL SMOOTHING: THE BASIC MODEL

We saw that, in using a weighted moving average, there are many different ways to assign decreasing weights that sum to 1. One way is called exponential smoothing, which is a shortened name for an exponentially weighted moving average. This is a scheme that weights recent data more heavily than past data, with weights summing to 1, but it avoids the operational problem just discussed. In this model, for any t ≥ 1 the forecast for period t + 1, denoted ŷt+1, is a weighted sum (with weights summing to 1) of the actual observed sales in period t (i.e., yt) and the forecast for period t (which was ŷt). In other words,

ŷt+1 = αyt + (1 – α)ŷt        (13.8)

where yt is the value observed in period t, ŷt is the forecast that had been made for period t, and α is a user-specified constant such that 0 ≤ α ≤ 1. The value assigned to α determines how much weight is placed on the most recent observation in calculating the forecast for the next period. Note in equation (13.8) that if α is assigned a value close to 1, almost all the weight is placed on the actual demand in period t.

Exponential smoothing has important computational advantages. To compute ŷt+1, only ŷt need be stored (together with the value of α). As soon as the actual yt is observed, we compute ŷt+1 = αyt + (1 – α)ŷt. If STECO wanted to forecast demand for 5,000 small parts in each period, then 10,001 items would have to be stored (the 5,000 ŷt values, the 5,000 yt values, and the value of α), as opposed to the previously computed 40,000 items needed to implement an 8-period moving average. Depending on the behavior of the data, it might be necessary to store a different value of α for each item, but even then much less storage would be required than if using moving averages. The thing that is nice about exponential smoothing is that by saving α and the last forecast, all the previous forecasts are being stored implicitly.

In order to obtain more insight into the exponential smoothing model, note that when t = 1 the expression used to define ŷ2 is

ŷ2 = αy1 + (1 – α)ŷ1

In this expression ŷ1 is an "initial guess" at the value for y in period 1, and y1 is the observed value in period 1. To get the exponential smoothing forecast going, we need to provide this "initial guess." Several options are available to us: (1) First and most commonly, we let ŷ1 = y1 (i.e., we assume a "perfect" forecast to get the process rolling, but we don't count this error of zero in our calculation of the MAD). Other choices include (2) looking ahead at all the available data and letting ŷ1 = ȳ (the average of all available data), or (3) letting ŷ1 = the average of just the first couple of months. We will choose the first approach. At this point Victor decides to use the spreadsheet "EXPSMTH" in the same workbook (STRUT.XLS) to apply exponential smoothing to the stainless steel strut data. Figure 13.17 shows actual sales and estimated sales for 12 months using an initial value of α = 0.5. He has also calculated the MAD for February through December.
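The recursion in (13.8) takes only a few lines of code. The following is a minimal sketch on the strut data, using the start-up convention just described (ŷ1 = y1, with that zero error excluded from the MAD); it is not Victor's spreadsheet, and the names are illustrative.

```python
# Minimal sketch of basic exponential smoothing, equation (13.8), on the strut data.

sales = [20, 24, 27, 31, 37, 47, 53, 62, 54, 36, 32, 29]

def exp_smooth_mad(alpha, data):
    forecast = data[0]                    # start-up convention: y-hat_1 = y_1
    abs_errors = []
    for t in range(1, len(data)):
        forecast = alpha * data[t - 1] + (1 - alpha) * forecast   # y-hat_t from (13.8)
        abs_errors.append(abs(data[t] - forecast))                # error for February .. December
    return sum(abs_errors) / len(abs_errors)

print(round(exp_smooth_mad(0.5, sales), 2))   # about 9.92, as in Figure 13.17
print(round(exp_smooth_mad(1.0, sales), 2))   # about 6.82 when all weight is on the most recent month
```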
Indeed, the exponential smoothing model with α = 0.5 yields a smaller MAD (9.92 in cell G16) than the moving average models (see Figure 13.14) or our initial guess at a weighted moving average model (see Figure 13.15). Victor knows he can find a better model by using the Solver to select the optimal value of α (one that minimizes the MAD), but he is pleased with the initial results. The MAD is smaller than what he obtained with several previous models, and the calculations are simple. From a computational viewpoint it is reasonable to consider exponential smoothing as an affordable way to forecast the sales of the many products STECO holds in inventory. Although the results obtained from the exponential smoothing model are impressive, it is clear that the particular numerical values (column F of Figure 13.17) depend on the values selected for the smoothing constant α and the "initial guess" ŷ1.

FIGURE 13.17  Exponential Smoothing Forecast, Initial α = 0.5

Cell   Formula                       Copy To
F3     $B$1*E2+(1-$B$1)*F2           F4:F13
G3     ABS(E3-F3)                    G4:G13
G15    SUM(G3:G13)                   —
G16    AVERAGE(G3:G13)               —

In order to find the optimal value of α, we just set up a nonlinear optimization model using Excel's Solver tool. Of course, if we let the Solver choose the optimal α for us, we can do even better than our initial guess of α = 0.5. To let Solver choose the α that minimizes the MAD, we do the following:

1. Click on Tools, then "Solver . . ."
2. Set the Target cell to G16 and tell Solver we want to minimize it.
3. Indicate that the changing cell is B1.
4. Add the constraints that (a) B1 ≥ 0, and (b) B1 ≤ 1.
5. Click on Solve and you get the results shown in Figure 13.18.

FIGURE 13.18  Exponential Smoothing Forecast, Optimal α

Again, as with the weighted moving average approach, we see that because of the strong trend in the data (up, then down), the more weight we can put on the most recent observation, the better the forecast. So, not surprisingly, the optimal α = 1.0, and this forecasting approach gives Victor a MAD of 6.82 (cell G16), which is the best performance he has seen so far.

Because of the importance of the basic exponential smoothing model, it is worth exploring in more detail how it works and when it can be successfully applied to real models. We will now examine some of its properties. To begin, note that if t ≥ 2 it is possible to substitute t – 1 for t in (13.8) to obtain

ŷt = αyt–1 + (1 – α)ŷt–1

Substituting this relationship for ŷt back into the original expression for ŷt+1 (i.e., into [13.8]) yields, for t ≥ 2,

ŷt+1 = αyt + α(1 – α)yt–1 + (1 – α)²ŷt–1

By successively performing similar substitutions, one is led to the following general expression for ŷt+1:

ŷt+1 = αyt + α(1 – α)yt–1 + α(1 – α)²yt–2 + . . . + α(1 – α)^(t–1)y1 + (1 – α)^t ŷ1        (13.9)

For example,

ŷ4 = αy3 + α(1 – α)y2 + α(1 – α)²y1 + (1 – α)³ŷ1

Since usually 0 < α < 1, it follows that 0 < 1 – α < 1. Thus,

α > α(1 – α) > α(1 – α)²

In other words, in the previous example y3, the most recent observation, receives more weight than y2, which receives more weight than y1. This illustrates a general property of an exponential smoothing model: the coefficients of the y's decrease as the data become older. It can also be shown that the sum of all of the coefficients (including the coefficient of ŷ1) is 1; that is, in the case of ŷ4, for example,

α + α(1 – α) + α(1 – α)² + (1 – α)³ = 1

We have thus seen in equation (13.9) that the general value ŷt+1 is a weighted sum of all previous observations (including the last observed value, yt).
Moreover, the weights sum to 1 and are decreasing as historical observations get older. The last term in the sum, namely ŷ1, is not a historical observation. Recall that it was a "guess" at y1. We can now observe that as t increases, the influence of ŷ1 on ŷt+1 decreases and in time becomes negligible. To see this, note that the coefficient of ŷ1 in (13.9) is (1 – α)^t. Thus, the weight assigned to ŷ1 decreases exponentially with t. Even if α is small (which makes [1 – α] nearly 1), the value of (1 – α)^t decreases rapidly. For example, if α = 0.1 and t = 20, then (1 – α)^t = 0.12. If α = 0.1 and t = 40, then (1 – α)^t = 0.015. Thus, as soon as enough data have been observed, the value of ŷt+1 will be quite insensitive to the choice for ŷ1.

Obviously, the value of α, which is a parameter input by the manager, affects the performance of the model. As you can see explicitly in (13.8), it is the weight given to the data value (yt) most recently observed. This implies that the larger the value of α, the more strongly the model will react to the last observation (we call this a responsive forecast). This, as we will see, may or may not be desirable. When α ≈ 0.0, the model places almost complete trust in the last forecast and almost completely ignores the most recent observation (i.e., the last data point). This would be an extremely stable forecast. Table 13.2 shows values for the weights (in equation [13.9]) when α = 0.1, 0.3, and 0.5. You can see that for the larger values of α (e.g., α = 0.5) more relative weight is assigned to the more recent observations, and the influence of older data is more rapidly diminished. To illustrate further the effect of choosing various values for α (i.e., putting more or less weight on recent observations), we consider three specific cases.

Table 13.2  Weights for Different Values of α

VARIABLE   COEFFICIENT      α = 0.1    α = 0.3    α = 0.5
yt         α                0.1        0.3        0.5
yt–1       α(1 – α)         0.09       0.21       0.25
yt–2       α(1 – α)²        0.081      0.147      0.125
yt–3       α(1 – α)³        0.07290    0.10290    0.0625
yt–4       α(1 – α)⁴        0.06561    0.07203    0.03125
yt–5       α(1 – α)⁵        0.05905    0.05042    0.01563
yt–6       α(1 – α)⁶        0.05314    0.03530    0.00781
yt–7       α(1 – α)⁷        0.04783    0.02471    0.00391
yt–8       α(1 – α)⁸        0.04305    0.01729    0.00195
yt–9       α(1 – α)⁹        0.03874    0.01211    0.00098
yt–10      α(1 – α)¹⁰       0.03487    0.00847    0.00049
Sum of the Weights          0.68619    0.98023    0.99951

Case 1 (Response to a Sudden Change)

Suppose that at a certain point in time the underlying system experiences a rapid and radical change. How does the choice of α influence the way in which the exponential smoothing model will react? As an illustrative example consider an extreme case in which

yt = 0 for t = 1, 2, . . . , 99
yt = 1 for t = 100, 101, . . .

This situation is illustrated in Figure 13.19. Note that in this case if ŷ1 = 0, then ŷ100 = 0 for any value of α, since we are taking the weighted sum of a series of zeros. Thus, at time 99 our best estimate of y100 is 0, whereas the actual value will be 1. At time 100 we will first see that the system has changed. The question is: How quickly will the forecasting system respond as time passes and the information that the system has changed becomes available? To answer this question, we plot ŷt+1 for α = 0.5 and α = 0.1 in Figure 13.20. Note that when α = 0.5, ŷ106 = 0.984; thus at time 105 our estimate of y106 would be 0.984, whereas the true value will turn out to be 1. When α = 0.1 our estimate of y106 is only 0.468. We see then that a forecasting system with α = 0.5 responds much more quickly to changes in the data than does a forecasting system with α = 0.1.
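A minimal sketch of this Case 1 step response, assuming the same jump from 0 to 1 at t = 100; it simply iterates (13.8) for six periods after the jump and is not the book's figure.

```python
# Minimal sketch of Case 1: how quickly exponential smoothing reacts to a sudden jump
# from 0 to 1 at t = 100, for alpha = 0.5 versus alpha = 0.1.

def step_response(alpha, periods_after_jump=6):
    forecast = 0.0                       # y-hat is 0 right up to the jump
    for _ in range(periods_after_jump):  # observe y = 1 for t = 100, 101, ...
        forecast = alpha * 1.0 + (1 - alpha) * forecast
    return forecast

print(round(step_response(0.5), 3))   # 0.984  (y-hat_106 when alpha = 0.5)
print(round(step_response(0.1), 3))   # 0.469  (y-hat_106 when alpha = 0.1; the text rounds this to 0.468)
```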
The manager would thus prefer a relatively large α if the system is characterized by a low level of random behavior but is subject to occasional enduring shocks. (Case 1 is an extreme example of this situation.) However, suppose that the data are characterized by large random errors but a stable mean. Then if α is large, a large random error in yt will throw the forecast value, ŷt+1, way off. Hence, for this type of process a smaller value of α would be preferred.

FIGURE 13.19  System Change when t = 100

FIGURE 13.20  Response to a Unit Change in yt (when α = 0.5, ŷ106 = 0.984; when α = 0.1, ŷ106 = 0.468)

Case 2 (Response to a Steady Change)

As opposed to the rapid and radical change investigated in Case 1, suppose now that a system experiences a steady change in the value of y. An example of a steady growth pattern is illustrated in Figure 13.21. This example is called a linear ramp. Again the questions are: How will the exponential smoothing model respond, and how will this response be affected by the choice of α? In this case, recall that

ŷt+1 = αyt + α(1 – α)yt–1 + . . .

Since all previous y's (y1, . . . , yt–1) are smaller than yt, and since the weights sum to 1, it can be shown that, for any α between 0 and 1, ŷt+1 < yt. Also, since yt+1 is greater than yt, we see that ŷt+1 < yt < yt+1. Thus our forecast will always be too small. Finally, since smaller values of α put more weight on older data, the smaller the value of α, the worse the forecast becomes. But even with α very close to 1 the forecast is not very good if the ramp is steep. The moral for managers is that exponential smoothing (or indeed any weighted moving average), without an appropriate modification, is not a good forecasting tool in a rapidly growing market or a declining market. The model can be adjusted to include the trend; this adjustment is called Holt's model (or exponential smoothing with trend), and the method is shown in more detail later in this section. In reality, the observation that Victor made with the struts in our previous example (i.e., both the weighted moving average and exponential smoothing placed ALL the weight on the most recent observation) is a good clue to you as a manager that there is an obvious trend in the data and that you should consider a different forecasting model.

FIGURE 13.21  Steadily Increasing Values of yt (a Linear Ramp)

Case 3 (Response to a Seasonal Change)

Suppose that a system experiences a regular seasonal pattern in y (such as would be the case if y represents, for example, the demand in the city of Denver for swimming suits). How then will the exponential smoothing model respond, and how will this response be affected by the choice of α? Consider, for example, the seasonal pattern illustrated in Figure 13.22, and suppose it is desired to extrapolate several periods forward. For example, suppose we wish to forecast demand in periods 8 through 11 based only on data through period 7. Then

ŷ8 = αy7 + (1 – α)ŷ7

Now to obtain ŷ9, since we have data only through period 7, we assume that y8 = ŷ8. Then

ŷ9 = αy8 + (1 – α)ŷ8 = αŷ8 + (1 – α)ŷ8 = ŷ8

Similarly, it can be shown that ŷ11 = ŷ10 = ŷ9 = ŷ8. In other words, ŷ8 is the best estimate of all future demands. Now let us see how good these predictions are. We know that

ŷt+1 = αyt + α(1 – α)yt–1 + α(1 – α)²yt–2 + . . .

Suppose that a small value of α is chosen.
By referring to Table 13.2 we see that when α is small (say, 0.1) the coefficients for the most recent terms change relatively slowly (i.e., they are nearly equal to each other). Thus, ŷt+1 will resemble a simple moving average of a number of terms. In this case the future predictions (e.g., ŷ11) will all be somewhere near the average of the past observations. The forecast thus essentially ignores the seasonal pattern. If a large value of α is chosen, ŷ11, which equals ŷ8, will be close in value to y7, which is obviously not good. In other words, the model fares poorly in this case regardless of the choice of α.

The exponential smoothing model ŷt+1 = αyt + (1 – α)ŷt is intended for situations in which the behavior of the variable of interest is essentially stable, in the sense that deviations over time have nothing to do with time, per se, but are caused by random effects that do not follow a regular pattern. This is what we have termed the stationarity assumption. Not surprisingly, then, the model has various shortcomings when it is used in situations (such as a linear ramp or swimming suit demand) that do not fit this prescription. Although this statement may be true, it is not very constructive. What approach should a manager take when the exponential smoothing model as described above is not appropriate?

In the case of a seasonal pattern, a naive approach would be to use the exponential smoothing model on "appropriate" past data. For example, the airlines or hotels, which exhibit strong day-of-week seasonality, could take a smoothed average of demand on previous Mondays to forecast demand on upcoming Mondays. Another business with monthly seasonality might take a smoothed average of sales in previous Junes to forecast sales this June. This latter approach has two problems. First, it ignores a great deal of useful information. Certainly sales from last Tuesday to Sunday in the airline or hotel example (or July through this May in the other example) should provide at least a limited amount of information about the likely level of sales this Monday (or June). Second, if the cycle is very long, say a year, this approach means that very old data must be used to get a reasonable sample size. The above assumption, that the system or process producing the variable of interest is essentially stationary over time, becomes more tenuous when the span of time covered by the data becomes quite large.

FIGURE 13.22  Seasonal Pattern in yt

If the manager is convinced that there is either a trend (Case 2) or a seasonal effect (Case 3) in the variable being predicted, a better approach is to develop forecasting models that incorporate these features. When there is a discernible pattern of seasonality (which can be seen fairly easily by graphing the data in Excel) there are methods, using simple moving averages, to determine a seasonality factor. Using this factor, the data can be "deseasonalized," some forecasting method used on the deseasonalized data, and the forecast can then be "reseasonalized." This approach will be shown after the trend model in the next section.

HOLT'S MODEL (EXPONENTIAL SMOOTHING WITH TREND)

As discussed above, simple exponential smoothing models don't perform very well on data that have an obvious up or down trend (and no seasonality).
To correct this, Holt developed the following model:

ŷt+k = Lt + kTt

where

Lt = αyt + (1 – α)(Lt–1 + Tt–1)
Tt = β(Lt – Lt–1) + (1 – β)Tt–1        (13.10)

Holt's model allows us to forecast up to k time periods ahead. In this model, we now have two smoothing parameters, α and β, both of which must be between 0 and 1. The Lt term indicates the long-term level or base value for the time-series data. The Tt term indicates the expected increase or decrease per period (i.e., the trend). Let's demonstrate how to make this model work with a new example.

Amy Luford is an analyst with a large brokerage firm on Wall Street. She has been looking at the quarterly earnings of Startup Airlines and is expected to make a forecast of next quarter's earnings. She has the following data and graph available to her in a spreadsheet (STARTUP.XLS) as shown in Figure 13.23. Amy can see that the data have an obvious trend, as she would expect for a successful new business venture. She wants to apply Holt's trend model to the data to generate her forecast of earnings per share (EPS) for the thirteenth quarter. This forecasting approach is demonstrated in her spreadsheet "Holt" in the same workbook (STARTUP.XLS) and is shown in Figure 13.24. She needs initial values for both L and T.

FIGURE 13.23  Startup Airlines Earnings per Share

FIGURE 13.24  Exponential Smoothing with Trend Model for Startup Airlines

Cell   Formula                                  Copy To
C5     B5                                       —
C6     $B$1*B6+(1-$B$1)*(C5+D5)                 C7:C16
D6     $B$2*(C6-C5)+(1-$B$2)*D5                 D7:D16
E6     SUM(C5:D5)                               E7:E17
F6     ABS(B6-E6)/B6                            F7:F16
F18    AVERAGE(F6:F16)                          —

She has several choices: (1) let L1 = actual EPS for quarter 1 and T1 = 0, (2) let L1 = average EPS for all 12 quarters and T1 = average trend for all 12 quarters, and many other variations in between. Amy chooses the first option. With initial guesses of α = 0.5 and β = 0.5, she sees that the mean absolute percentage error (MAPE) is 43.3% (cell F18). Although this is fairly high, Amy tries putting in a β of zero (as if there were no trend and she were back to simple exponential smoothing, to see if she's gaining anything by this new model), and she sees in Figure 13.25 that the MAPE is much worse at 78.1%.

FIGURE 13.25  Spreadsheet Model for Startup Airlines with No Trend

FIGURE 13.26  Optimal Exponential Smoothing with Trend Spreadsheet Model for Startup Airlines

Finally, she decides to use Solver to help her find the optimal values for α and β because she knows Solver can do better than her initial guesses of 0.5. To let Solver choose the α and β that minimize the MAPE, she does the following:

1. Click on Tools, then "Solver."
2. Set the Target cell to F18 and tell Solver we want to minimize it.
3. Indicate that the changing cells are B1:B2.
4. Add the constraints that (a) B1:B2 ≥ 0, and (b) B1:B2 ≤ 1.
5. Click on Solve to get the results shown in Figure 13.26.

Amy sees that α* = 0.59 and β* = 0.42 and the MAPE has been lowered to 38%, which is nearly a 12.5% improvement over the MAPE with her initial guesses of α = 0.5 and β = 0.5. The other forecasting approach Amy could have tried, given the obvious trend in her data (and therefore that simple exponential smoothing and weighted moving averages would not be effective), would be to run a linear regression on the data with time as the independent variable. We'll leave this as an exercise for the student (see Problem 13-19).

SEASONALITY

When making forecasts using data from a time series, one can often take advantage of seasonality.
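Before working through seasonality, here is a minimal sketch of Holt's recursion (13.10) in code. The quarterly series below is made up for illustration (Amy's actual EPS figures live in STARTUP.XLS and are not reproduced in the text); the initialization follows Amy's first option, and the names are my own.

```python
# Minimal sketch of Holt's model (exponential smoothing with trend), equation (13.10).
# Initialization: L1 = first observation, T1 = 0; the one-step MAPE excludes period 1.

eps = [0.10, 0.15, 0.22, 0.26, 0.34, 0.41, 0.50, 0.54, 0.65, 0.74, 0.85, 0.95]  # hypothetical EPS

def holt(data, alpha, beta, k=1):
    level, trend = data[0], 0.0
    pct_errors = []
    for y in data[1:]:
        forecast = level + trend                     # one-step-ahead forecast: L + T
        pct_errors.append(abs(y - forecast) / y)
        new_level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    mape = 100 * sum(pct_errors) / len(pct_errors)
    return level + k * trend, mape                   # k-period-ahead forecast and fit MAPE

forecast_q13, mape = holt(eps, alpha=0.5, beta=0.5)
print(round(forecast_q13, 3), round(mape, 1))
```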
Seasonality comprises movements up and down in a pattern of constant length that repeats itself. For example, if you were looking at monthly data on sales of ice cream, you would expect to see higher sales in the warmer months (June to August in the Northern Hemisphere) than in the winter months, year after year. The seasonal pattern would be 12 months long. If we used weekly data, the seasonal pattern would repeat every 52 periods. The number of time periods in a seasonal pattern depends on how often the observations are collected. In another example we may be looking at daily data on the number of guests staying overnight at a downtown business hotel. Our intuition might tell us that we expect high numbers on Monday, Tuesday, and Wednesday nights, low numbers on Friday and Saturday, and medium numbers on Thursday and Sunday. So our pattern would be as follows, starting with Sunday: Medium, High, High, High, Medium, Low, Low. The pattern would repeat itself every seven days. CD13-32 C D C H A P T E R S FIGURE 13.27 Coal Receipts over a Nine-Year Period The approach for treating such seasonal patterns consists of four steps: (1) Look at the original data that exhibit a seasonal pattern. From examining the data and from our own judgment, we hypothesize an m-period seasonal pattern. (2) Using the numerical approach described in the next section, we deseasonalize the data. (3) Using the best forecasting method available, we make a forecast in deseasonalized terms. (4) We reseasonalize the forecast to account for the seasonal pattern. We will illustrate these concepts with data on U.S. coal receipts by the commercial/residential sectors over a nine-year period (measured in thousands of tons).2 Frank Keetch is the manager of Gillette Coal Mine and he is trying to make a forecast of demand in the upcoming two quarters. He has entered the following data for the entire industry in a spreadsheet (COAL.XLS) and it is graphed in Figure 13.27. Intuition tells Frank to expect higher than average coal receipts in the first and fourth quarters (winter effects) and lower than average in the second and third quarters (spring/summer effects). Deseasonalizing The procedure to deseasonalize data is quite simply to average out all variations that occur within one season. Thus for quarterly data an average of four periods is used to eliminate within-year seasonality. In order to deseasonalize a whole time series, the first step is to calculate a series of m-period moving averages, where m is the length of the seasonal pattern. In order to calculate this four-period moving average, he has to add two columns (C and D) to his Excel spreadsheet, which is shown in Figure 13.28. Column C of Figure 13.28 shows a four-period moving average of the data in column B. The first number is the average of the first four periods, (2159 + 1203 + 1094 + 1996)/4 = 1613 The second number is the average of the next four periods, and so on. Frank really would like to center the moving average in the middle of the data from which it was calculated. If m is odd, the first moving average (average of points 1 to m) is easily centered on the (m + 1)/2 point (e.g., suppose you have daily data where m = 7, the first seven-period moving average is centered on the (7 + 1)/2 or fourth point). This process rolls forward to find the average of the second through (m + 1)st point, which is centered on the (m + 3)/2 point, and so forth. 2 See Quarterly Coal Reports. 
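To make the mechanics of the centering step concrete, here is a minimal sketch on a toy quarterly series: the first four values are the coal receipts quoted above, and the remaining values are made up for illustration (they are not the real data set, so the indices it prints will not match Frank's). The grouping of ratios by quarter anticipates the seasonal-index step described next.

```python
# Minimal sketch of deseasonalizing quarterly data: four-period moving average, centering,
# ratios to the centered average, and rough quarterly seasonal indices.

receipts = [2159, 1203, 1094, 1996, 2095, 1334, 1160, 2079]  # first 4 from the text; rest illustrative

m = 4
moving_avg = [sum(receipts[i:i + m]) / m for i in range(len(receipts) - m + 1)]
centered = [(a + b) / 2 for a, b in zip(moving_avg, moving_avg[1:])]   # centered on period 3, 4, ...

ratios = {}
for offset, c in enumerate(centered):
    idx = offset + m // 2               # first centered average sits at the third period (index 2)
    quarter = idx % 4 + 1               # series assumed to start in quarter 1
    ratios.setdefault(quarter, []).append(receipts[idx] / c)

seasonal_index = {q: sum(r) / len(r) for q, r in ratios.items()}
print({q: round(v, 3) for q, v in sorted(seasonal_index.items())})
```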
C H A P T E R 1 3 Forecasting CD13-33 FIGURE 13.28 Spreadsheet to Deseasonalize the Data Cell Formula Copy To C10 D10 E10 F8 F9 F10 F11 AVERAGE(B8:B11) AVERAGE(C10:C11) B10/D10 $E$1 $E$2 $E$3 $E$4 C11:C42 D11:D41 E11:E41 F12, F16, F20, F24, F28, F32, F36, F40 F13, F17, F21, F25, F29, F33, F37, F41 F14, F18, F22, F26, F30, F34, F38, F42 F15, F19, F23, F27, F31, F35, F39, F43 If m is even, as it is in Frank’s situation, the task is a little more complicated, using an additional step to get the moving averages centered. Since the average of the first four points should really be centered at the midpoint between the second and third data point, and the average of periods two through five should be centered halfway between periods three and four, the value to be centered at period three can be approximated by taking the average of the first two averages. Thus the first number in the centered moving average column (column D) is (1613 + 1594)/2 = 1603 A graph of the original data along with their centered moving averages is shown in Figure 13.29. Notice that as Frank expected, coal receipts are above the centered moving average in the first and fourth quarters, and below average in the second and third quarters. Notice that the moving average has much less volatility than the original series; again, the averaging process eliminates the quarter-to-quarter movement. FIGURE 13.29 Graph of Data and Its Centered Moving Average CD13-34 C D C H A P T E R S FIGURE 13.30 Ratios by Quarter and Their Averages Cell Formula Copy To E1 E3 AVERAGE(H1:O1) AVERAGE(G3:N3) E2 E4 The third step is to divide the actual data at a given point in the series by the centered moving average corresponding to the same point. This calculation cannot be done for all possible points, since at the beginning and end of the series we are unable to compute a centered moving average. These ratios represent the degree to which a particular observation is below (as in the .682 for period three shown in cell E10 of Figure 13.28) or above (as in the 1.24 for period four shown in cell E11) the typical level. Note that these ratios for the third quarter tend to be below 1.0 and the ratios for the fourth quarter tend to be above 1.0. These ratios form the basis for developing a “seasonal index.” To develop the seasonal index, we first group the ratios according by quarter (columns G through O), as shown in Figure 13.30. We then average all of the ratios to moving averages quarter by quarter (column E). For example, all the ratios for the first quarter average 1.108. This is a seasonal index for the first quarter, and Frank concludes that the first quarter produces coal receipts that are on average about 110.8% relative to the average of all quarters. These seasonal indices represent what that particular season’s data look like on average compared to the average of the entire series. A seasonal index greater than 1 means that season is higher than the average for the year; likewise an index less than 1 means that season is lower than the average for the year. The last step of deseasonalization is to take the actual data and divide it by the appropriate seasonal index. This is shown in columns F and G of Figure 13.31. The deseasonalized data are graphed in Figure 13.32. Frank notices that the deseasonalized data seem to “jump around” a lot less than the original data. Forecasting Once the data have been deseasonalized, a deseasonalized forecast can be made. 
This should be based on an appropriate methodology that accounts for the pattern in the deseasonalized data (e.g., if there is trend in the data, use a trend-based model). In this example, Frank chooses to forecast coal receipts for the first quarter of the tenth year by simple exponential smoothing. Using this forecasting method, it turns out that the optimal smoothing constant is α* = 0.653, which yields an MSE of 47,920 (see Figure 13.33, page CD13-36). Frank determines that the deseasonalized forecast would be 1726 thousand tons for the first quarter of next year (cell H44). This would also be the deseasonalized forecast amount for the second quarter, given the data we currently have.

FIGURE 13.31  Calculation of Deseasonalized Values

Cell   Formula   Copy To
G8     B8/F8     G9:G43

FIGURE 13.32  Graph of Deseasonalized Data

Reseasonalizing

The last step in the process is for Frank to reseasonalize the forecast of 1726. The way to do this is to multiply 1726 by the seasonal index for the first quarter (1.108) to obtain a value of 1912. A seasonalized forecast for the second quarter would be 1726 times its seasonal index (0.784), giving a value of 1353. These values would represent Frank's point forecasts for the coming two quarters.

FIGURE 13.33  Spreadsheet for Exponential Smoothing Forecast of Deseasonalized Data

Cell   Formula                                        Copy To
H8     G8                                             —
H9     $J$7*G8+(1-$J$7)*H8                            H10:H44
J10    SUMXMY2(G9:G43,H9:H43)/COUNT(G9:G43)           —

THE RANDOM WALK

The moving average and exponential smoothing techniques discussed above are examples of what are called time-series models. Recently, much more sophisticated methods for time-series analysis have become available. These methods, based primarily on developments by G. E. P. Box and G. M. Jenkins, have already had an important impact on the practice of forecasting, and indeed the Box-Jenkins approach is incorporated in several computer-based forecasting packages. These time-series forecasting techniques are based on the assumption that the true values of the variable of interest, yt, are generated by a stochastic (i.e., probabilistic) model. Introducing enough of the theory of probability to enable us to discuss these models in any generality seems inappropriate, but one special and very important (and very simple) process, called a random walk, serves as a nice illustration of a stochastic model. Here the variable yt is assumed to be produced by the relationship

yt = yt–1 + ε

where the value of ε is determined by a random event. To illustrate this process even more explicitly, let us consider a man standing at a street corner on a north-south street. He flips a fair coin. If it lands with a head showing (H), he walks one block north. If it lands with a tail showing (T), he walks one block south. When he arrives at the next corner (whichever one it turns out to be), he repeats the process. This is the classic example of a random walk. To put this example in the form of the model, label the original corner zero. We shall call this the value of the first observation, y1. Starting at this point, label successive corners going north +1, +2, . . . Also starting at the original corner, label successive corners going south –1, –2, . . . (see Figure 13.34). These labels that describe the location of our random walker are the yt's.
FIGURE 13.34  Classic Random Walk (corners labeled +1, +2, . . . going north and –1, –2, . . . going south from the original corner at 0)

In the model, yt = yt–1 + ε, where (assuming a fair coin) ε = 1 with probability 1/2 and ε = –1 with probability 1/2. If our walker observes the sequence H, H, H, T, T, H, T, T, T, he will follow the path shown in Figure 13.34.

Forecasts Based on Conditional Expected Value

Suppose that after our special agent has flipped the coin nine times (i.e., he has moved nine times, and we have [starting with corner 0] ten observations of corners), we would like to forecast where he will be after another move. This is the typical forecasting problem in the time-series context. That is, we have observed y1, y2, . . . , y10 and we need a good forecast ŷ11 of the forthcoming value y11. In this case, according to a reasonable criterion, the best value for ŷ11 is the conditional expected value of the random quantity y11. In other words, the best forecast is the expected value of y11 given that we know y1, y2, . . . , y10. From the model we know that y11 will equal (y10 + 1) with a probability equal to 1/2 and y11 will equal (y10 – 1) with a probability equal to 1/2. Thus, E(y11 | y1, . . . , y10), the conditional expected value of y11 given y1, y2, . . . , y10, is calculated as follows:

E(y11 | y1, . . . , y10) = (y10 + 1)(1/2) + (y10 – 1)(1/2) = y10

Thus we see that for this model the data y1, . . . , y9 are irrelevant, and the best forecast of the random walker's position one move from now is his current position. It is interesting to observe that the best forecast of the next observation y12 given y1, . . . , y10 (the original set) is also y10. Indeed, the best forecast for any future value of y, given this particular model, is its current value.

Seeing What Isn't There

This example is not as silly as it may seem at first glance. Indeed, there is a great deal of evidence that supports the idea that stock prices and foreign currency exchange rates behave like a random walk and that the best forecast of a future stock price or of an exchange rate (e.g., German Mark/$) is its current value. Not surprisingly, this conclusion is not warmly accepted by research directors and technical chartists who make their living forecasting stock prices or exchange rates. One reason for the resistance to the random walk hypothesis is the almost universal human tendency when looking at a set of data to observe certain patterns or regularities, no matter how the data are produced. Consider the time-series data plotted in Figure 13.35. It does not seem unreasonable to believe that the data are following a sinusoidal pattern as suggested by the smooth curve in the figure. In spite of this impression, the data were in fact generated by the random walk model presented earlier in this section. This illustrates the tendency to see patterns where there are none. In Figure 13.35, any attempt to predict future values by extrapolating the sinusoidal pattern would have no more validity than flipping a coin.

FIGURE 13.35  Time-Series Data

In concluding this section we should stress that it is not a general conclusion of time-series analysis that the best estimate of the future is the present (i.e., that ŷt+1 = yt). This result holds for the particular random walk model presented above. The result depends crucially on the assumption that the expected or mean value of ε, the random component, is zero.
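A minimal sketch of the fair-coin walk, included only to make the conditional-expectation argument tangible: averaging many possible next steps from the same current corner recovers the current corner itself. The seed and sample size are arbitrary choices, not anything from the text.

```python
# Minimal sketch: simulate the fair-coin random walk and check empirically that,
# given the current corner y_t, the average next position is y_t itself.

import random

random.seed(42)  # arbitrary seed so the run is reproducible

def walk(n_steps):
    y = [0]                                   # y_1 = 0, the original corner
    for _ in range(n_steps):
        y.append(y[-1] + random.choice([1, -1]))
    return y

current = 3
next_positions = [current + random.choice([1, -1]) for _ in range(100_000)]
print(sum(next_positions) / len(next_positions))   # close to 3.0, i.e., the current position

print(walk(9))   # one possible 10-corner path, like the H/T sequence in the text
```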
If the probability that ε equals 1 had been 0.6 and the probability that ε equals –1 had been 0.4, the best forecast of yt+1 would not have been yt . To find this forecast one would have had to find the new value for E(yt+1y1, . . . . yt). Such a model is called a random walk with a drift. 13.5 THE ROLE OF HISTORICAL DATA: DIVIDE AND CONQUER Historical data play a critical role in the construction and testing of forecasting models. One hopes that a rationale precedes the construction of a quantitative forecasting model. There may be theoretical reasons for believing that a relationship exists between some independent variables and the dependent variable to be forecast and thus that a causal model is appropriate. Alternatively, one may take the time-series view that the “behavior of the past” is a good indication of the future. In either case, however, if a quantitative model is to be used, the parameters of the model must be selected. For example: 1. In a causal model using a linear forecasting function, y = a + bx, the values of a and b must be specified. 2. In a time-series model using a weighted n-period moving average, ŷt+1 = 0yt + 1yt–1 + . . . + n–1yt–n+1, the number of terms, n, and the values for the weights, 0, 1, . . . . , n–1, must be specified. 3. In a time-series model using exponential smoothing, ŷt+1 = yt + (1 – )ŷt , the value of must be specified. In any of these models, in order to specify the parameter values, one typically must make use of historical data. A useful guide in seeking to use such data effectively is to “divide and conquer.” More directly, this means that it is often a useful practice to use part of the data to estimate the parameters and the rest of the data to test the model. With real data, it is also important to “clean” the data—that is, examine them for irregularities, missing information, or special circumstances, and adjust them accordingly. For example, suppose that a firm has weekly sales data on a particular product for the last two years (104 observations) and plans to use an exponential smoothing model to forecast sales for this product. The firm might use the following procedure: 1. Pick a particular value of , and compare the values of ŷt+1 to yt+1 for t = 25 to 75. The first 24 values are not compared, so as to negate any initial or “startup” effect; that is, to nullify the influence of the initial guess, ŷ1. The manager would continue to select different values of until the model produces a satisfactory fit during the period t = 25 to 75. 2. Test the model derived in step 1 on the remaining 29 pieces of data. That is, using the best value of from step 1, compare the values of ŷt+1 to yt+1 for t = 76 to 104. If the model does a good job of forecasting values for the last part of the historical data, there is some reason to believe that it will also do a good job with the future. On the other hand, if by using the data from weeks 1 through 75, the model cannot perform well in predicting the demand in weeks 76 through 104, the prospects for predicting the future with the same model seem dubious. In this case, another forecasting technique might be applied. The same type of divide-and-conquer strategy can be used with any of the forecasting techniques we have presented. This approach amounts to simulating the model’s performance on past data. It is a popular method of testing models. It should be stressed, however, that this procedure represents what is termed a null test. 
If the model fails on historical data, the model probably is not appropriate. If the model succeeds on historical data, C H A P T E R 1 3 Forecasting CD13-39 one cannot be sure that it will work in the future. Who knows, the underlying system that is producing the observations may change. It is this type of sobering experience that causes certain forecasters to be less certain. 13.6 QUALITATIVE FORECASTING EXPERT JUDGMENT Many important forecasts are not based on formal models. This point seems obvious in the realm of world affairs—matters of war and peace, so to speak. Perhaps more surprisingly it is also often true in economic matters. For example, during the high-interest-rate period of 1980 and 1981, the most influential forecasters of interest rates were not two competing econometric models run by teams of econometricians. Rather, they were Henry Kaufman of Salomon Brothers and Albert Wojnilower of First Boston, the so-called Doctors Doom and Gloom of the interest-rate world. These gentlemen combined relevant factors such as the money supply and unemployment, as well as results from quantitative models, in their own intuitive way (their own “internal” models) and produced forecasts that had widespread credibility and impact on the financial community. The moral for managers is that qualitative forecasts can well be an important source of information. Managers must consider a wide variety of sources of data before coming to a decision. Expert opinion should not be ignored. A sobering and useful measure of all forecasts—quantitative and qualitative—is a record of past performance. Good performance in the past is a sensible type of null test. An excellent track record does not promise good results in the future. A poor record, however, hardly creates enthusiasm for high achievement in the future. Managers should thus listen to experts cautiously and hold them to a standard of performance. This same type of standard should be applied by individuals to the thousands of managers of stock and bond mutual funds. Every month the Wall Street Journal publishes the ranking of the different mutual funds’ performance for the previous month, year, three-year, five-year and ten-year period. Wise investors check the track record of the different funds although it is not a guarantee of future performance. There is, however, more to qualitative forecasting than selecting “the right” expert. Techniques exist to elicit and combine forecasts from various groups of experts, and we now turn our attention to these techniques. THE DELPHI METHOD AND CONSENSUS PANEL The Delphi Method confronts the problem of obtaining a combined forecast from a group of experts. One approach is to bring the experts together in a room and let them discuss the event until a consensus emerges. Not surprisingly, this group is called a consensus panel. This approach suffers because of the group dynamics of such an exercise. One strong individual can have an enormous effect on the forecast because of his or her personality, reputation, or debating skills. Accurate analysis may be pushed into a secondary position. The Delphi Method was developed by the Rand Corporation to retain the strength of a joint forecast, while removing the effects of group dynamics. The method uses a coordinator and a set of experts. No expert knows who else is in the group. All communication is through the coordinator. The process is illustrated in Figure 13.36. After three or four passes through this process, a consensus forecast typically emerges. 
The forecast may be near the original median, but if a forecast that is an outlier in round 1 is supported by strong analysis, the extreme forecast in round 1 may be the group forecast after three or four rounds. GRASSROOTS FORECASTING AND MARKET RESEARCH Other qualitative techniques focus primarily on forecasting demand for a product or group of products. They are based on the concept of asking either those who are close to the even- CD13-40 C D FIGURE 13.36 C H A P T E R S Coordinator requests forecasts Delphi Method Coordinator receives individual forecasts Coordinator determines (a) Median response (b) Range of middle 50% of answers Coordinator requests explanations from any expert whose estimate is not in the middle 50% Coordinator sends to all experts (a) Median response (b) Range of middle 50% (c) Explanations tual consumer, such as salespeople, or consumers themselves, about a product or their purchasing plans. Consulting Salesmen In grassroots forecasting, salespeople are asked to forecast demand in their districts. In the simplest situations, these forecasts are added together to get a total demand forecast. In more sophisticated systems individual forecasts or the total may be adjusted on the basis of the historical correlation between the salesperson’s forecasts and the actual sales. Such a procedure makes it possible to adjust for an actual occurrence of the stereotyped salesperson’s optimism. Grassroots forecasts have the advantage of bringing a great deal of detailed knowledge to bear on the forecasting problem. The individual salesperson who is keenly aware of the situation in his or her district should be able to provide better forecasts than more aggregate models. There are, however, several problems: 1. High cost: The time salespeople spend forecasting is not spent selling. Some view this opportunity cost of grassroots forecasting as its major disadvantage. 2. Potential conflict of interest: Sales forecasts may well turn into marketing goals that can affect a salesperson’s compensation in an important way. Such considerations exert a downward bias in individual forecasts. 3. Product schizophrenia (i.e., stereotyped salesperson’s optimism): It is important for salespeople to be enthusiastic about their product and its potential uses. It is not clear that this enthusiasm is consistent with a cold-eyed appraisal of its market potential. In summary, grassroots forecasting may not fit well with other organization objectives and thus may not be effective in an overall sense. Consulting Consumers Market research is a large and important topic in its own right. It includes a variety of techniques, from consumer panels through consumer surveys and on to test marketing. The goal is to make predictions about the size and structure of the market for specific goods and/or services. These predictions (forecasts) are usually based on small samples and are qualitative in the sense that the original data typically con- C H A P T E R 1 3 Forecasting CD13-41 sist of subjective evaluations of consumers. A large menu of quantitative techniques exists to aid in determining how to gather the data and how to analyze them. Market research is an important activity in most consumer product firms. It also plays an increasingly important role in the political and electoral process. 13.7 NOTES ON IMPLEMENTATION APPLICATION CAPSULE Whether in the private or public sector, the need to deal with the future is an implicit or explicit part of every management action and decision. 
Because of this, managing the forecasting activity is a critical part of a manager’s responsibility. A manager must decide what resources to devote to a particular forecast and what approach to use to obtain it. The question of “what resources” hinges on two issues: 1. The importance of the forecast, or more precisely, the importance of the decision awaiting the forecast and its sensitivity to the forecast. 2. The quality of the forecast as a function of the resources devoted to it. In other words, how much does it matter, and how much does it cost? These are the same questions that management must ask and answer about many of the services it purchases. In actual applications, the selection of the appropriate forecasting method for a particular situation depends on a variety of factors. Some of the features that distinguish one situation from the next are 1. The importance of the decision 2. The availability of relevant data 3. The time horizon for the forecast 4. The cost of preparing the forecast 5. The time until the forecast is needed 6. The number of times such a forecast will be needed 7. The stability of the environment Improved Forecasting at Taco Bell Helps Save Millions in Labor Costs Taco Bell Corporation has approximately 6,500 companyowned, licensed, and franchised locations in 50 states a growing international market. Worldwide yearly sales are approximately $4.6 billion. In the late 1980s, they restructured the business to become more efficient and cost-effective. To do this, the company relied on an integrated set of operations research models, including forecasting to predict customer arrivals. Through 1997, these models have saved over $53 million in labor costs. At Taco Bell, labor costs represent approximately 30% of every sales dollar and are among the largest controllable costs. They are also among the most difficult to manage because of the direct link that exists between sales capacity and labor. Because the product must be fresh when sold, it is not possible to produce large amounts during periods of low demand to be warehoused and sold during periods of high demand. Instead, the product must be prepared virtually when it is ordered. And since demand is highly variable and is concentrated during the meal periods (52% of daily sales occur during the three-hour lunch period from 11 A.M. to 2 P.M.), determining how many employees to schedule to per- form what functions in the store at any given time is a complex and vexing problem. Customer transactions during a 15-minute interval are subject to many sources of variability, including but not limited to time of the day, day of the week, week of the month, and month of the year. To eliminate as many sources of variability as possible, all customer transaction data was separated into a number of independent time series, each representing the history corresponding to a specific 15-minute interval during a specific day of the week. For example, the customer transaction history at a particular store for all Fridays from 9:00 to 9:15 A.M. constituted the time series to be used to forecast customer transactions at that store for future Fridays from 9:00 to 9:15 A.M. Many different time-series forecasting methods were tried and it was found that a six-week moving average provided the best forecast (i.e., the one that minimized mean squared error). The forecasting system provides the manager with the forecasts obtained for the next week for scheduling purposes. 
The manager has the authority to modify the forecast based upon known events. (See Hueter and Swart.) CD13-42 C D C H A P T E R S The importance of the decision probably plays the strongest role in determining what forecasting method to use. Curiously, qualitative approaches (as opposed to quantitative) dominate the stage at the extremes of important and not very important forecasts. On the low end of the importance scale, think of the many decisions a supermarket manager makes on what are typically implicit forecasts: what specials to offer, what to display at the ends of the aisles, how many baggers to employ. In such cases, forecasts are simply business judgments. The potential return is not high enough to justify the expenditure of resources required for formal and extensive model development. On the high end, the decisions are too important (and perhaps too complex) to be left entirely to formal quantitative models. The future of the company, to say nothing of the chief executive, may hinge on a good forecast and the ensuing decision. Quantitative models may certainly provide important input. In fact, the higher the planning level, the more you can be sure that forecasting models will be employed at least to some extent. But for very important decisions, the final forecast will be based on the judgment of the executive and his or her colleagues. The extent to which a quantitative model is employed as an input to this judgment will depend, in the final analysis, on management’s assessment of the model’s validity. A consensus panel (a management committee) is often the chosen vehicle for achieving the final forecast. For example, what forecasts do you think persuaded Henry Ford IV to reject Lee Iacocca’s plan to move Ford into small, energy-efficient cars in the late 1970s? Also, what forecasts led Panasonic to introduce a tape-based system while RCA introduced a disk-based system for the TV player market? And what about the Cuban missile crisis? The Bay of Pigs? Clearly, management’s personal view of the future played an important role. Quantitative models play a major role in producing directly usable forecasts in situations that are deemed to be of “midlevel importance.” This is especially true in short-range (up to one month) and medium-range (one month to two years) scenarios. Time-series analyses are especially popular for repetitive forecasts of midlevel importance in a relatively stable environment. The use of exponential smoothing to forecast the demand for mature products is a prototype of this type of application. Causal models actively compete with various experts for forecasting various economic phenomena in the midlevel medium range. Situations in which a forecast will be repeated quite often, and where much relevant data are available, are prime targets for quantitative models, and in such cases many successful models have been constructed. As our earlier discussion of interest rates forecasts indicated, there is ample room in this market for the “expert” with a good record of performance. In commercial practice one finds that many management consulting groups, as well as specialized firms, provide forecasting “packages” for use in a variety of midlevel scenarios. As a final comment, we can make the following observations about the use of forecasting in decision making within the public sector: Just as in private industry, it is often the case that the higher the level of the planning function, the more one sees the use of forecasting models employed as inputs. 
In such high-level situations there is a high premium on expertise, and forecasting is, in one sense, a formal extension of expert judgment. Think of the Council of Economic Advisors, the chairman of the Federal Reserve Board, or the director of the Central Intelligence Agency. You can be sure that forecasts are of importance in these contexts, and you can be sure that there is within these environments a continuing updating and, one hopes, improvement of forecasting techniques. As always, the extent to which the results of existing models are employed is a function of the executive’s overall assessment of the model itself. C H A P T E R 1 3 Forecasting CD13-43 Key Terms Causal Forecasting. The forecast for the quantity of interest is determined as a function of other variables. Curve Fitting. Selecting a “curve” that passes close to the data points in a scatter diagram. Scatter Diagram. A plot of the response variable against a single independent variable. Method of Least Squares. A procedure for fitting a curve to a set of data. It minimizes the sum of the squared deviations of the data from the curve. Polynomial of Degree n. A function of the form y = a0 + a1x + a2x 2 + . . . + anx n. Often used as the curve in a least squares fit. Linear Regression. A statistical technique used to estimate the parameters of a polynomial in such a way that the polynomial “best” represents a set of data. Also sometimes used to describe the problem of fitting a linear function to a set of data. Validation. The process of using a model on past data to assess its credibility. Time-Series Forecasting. A variable of interest is plotted against time and extrapolated into the future using one of several techniques. Simple n-Period Moving Average. Average of last n periods is used as the forecast of future values; (n – 1) pieces of data must be stored. Weighted n-Period Moving Average. A weighted sum, with decreasing weights, of the last n observations is used as a forecast. The sum of the weights equals 1; (n – 1) pieces of data must be stored. Exponential Smoothing. A weighted sum, with decreasing weights of all past observations, the sum of the weights equals 1; only one piece of information need be stored. Holt’s Model. A variation of simple exponential smoothing that accounts for either up or down trend in the data. Seasonality. Movements up and down in a pattern of constant length that repeats itself in a time-series set of data. Random Walk. A stochastic process in which the variable at time t equals the variable at time (t – 1) plus a random element. Delphi Method. A method of achieving a consensus among experts while eliminating factors of group dynamics. Consensus Panel. An assembled group of experts that produces an agreedupon forecast. Grassroots Forecasting. Soliciting forecasts from individuals “close to” and thus presumably knowledgeable about the entity being forecast. Market Research. A type of grassroots forecasting that is based on getting information directly from consumers. Self-Review Exercises True-False n 1. T F Minimizing total deviations (i.e., di) is a reasoni=1 able way to define a “good fit.” 2. T F Least squares fits can be used for a variety of curves in addition to straight lines. 3. T F Regression analysis can be used to prove that the method of least squares produces the best possible fit for any specific real model. 4. T F The method of least squares is used in causal models as well as in time-series models. 5. 
T F In a weighted three-period moving-average forecast, the weights can be assigned in many different ways. Multiple Choice 11. Linear regression (with one independent variable) a. requires the estimation of three parameters b. is a special case of polynomial least squares c. is a quick and dirty method d. uses total deviation as a measure of good fit 6. T F Exponential smoothing automatically assigns weights that decrease in value as the data get older. 7. T F Average squared error is one way to compare various forecasting techniques. 8. T F Validation refers to the process of determining a model’s credibility by simulating its performance on past data. 9. T F 10. T F At higher levels of management, qualitative forecasting models become more important. A “random walk” is a stochastic model. 12. An operational problem with a simple k-period moving average is that a. it assigns equal weight to each piece of past data b. it assigns equal weight to each of the last k observations c. it requires storage of k – 1 pieces of data d. none of the above CD13-44 C D C H A P T E R S 13. A large value of puts more weight on a. recent b. older data in an exponential smoothing model. b. divide the data into two parts; estimate the parameters of the model on the first part; see how well the model works on the second part. c. compare two models on the same database. d. none of the above. 14. If the data being observed can be best thought of as being generated by random deviations about a stationary mean, a a. large b. small value of is preferable in an exponential smoothing model. 16. The Delphi Method a. relies on the power of written arguments b. requires resolution of differences via face-to-face debate c. is mainly used as an alternative to exponential smoothing d. none of the above 15. A divide-and-conquer strategy means a. divide the modeling procedure into two parts: (1) use all the data to estimate parameter values, and (2) use the parameter values from part (1) to see how well the model works. 17. Conflict of interest can be a serious problem in a. the Delphi Method b. asking salespeople c. market research based on consumer data d. none of the above Answers 1 . F, 2 . T, 3 . F, 4 . T, 5 . T, 6 . T, 7 . T, 8 . T, 9 . T, 1 0 . T, 1 1 . b , 1 2 . c , 1 3 . a , 14. b, 15. b, 16. a, 17. b Skill Problems 13-1. Consider the data set shown (contained in 13-1.XLS): x 100 70 30 40 80 60 50 20 10 90 y 57 40 35 33 56 46 45 26 26 53 (a) Plot a scatter diagram of these data. (b) Fit a straight line to the data using the method of least squares. (c) Use the function derived in part (b) to forecast a value for y when x = 120. 13-2. Consider the following set of data where x is the independent and y the dependent variable (contained in 13-2.XLS): x 30 25 20 15 10 5 y 15 20 30 35 45 60 (a) Plot the scatter diagram for these data. (b) Fit a straight line to the data by the method of least squares. 13-3. Consider the following set of data (contained in 13-3.XLS): x 1 2 3 4 5 6 7 y 2.00 1.50 4.50 4.00 5.50 4.50 6.00 (a) Plot a scatter diagram of the data. (b) Fit a straight line to the data by the method of least squares. Plot the line on the scatter diagram. (c) Fit a quadratic function to the data by the method of least squares. Plot the curve on the scatter diagram. 13-4. Fit a quadratic function to the data in Problem 13-2 by the method of least squares. 13-5. 
Skill Problems
13-1. Consider the data set shown (contained in 13-1.XLS):
x: 100 70 30 40 80 60 50 20 10 90
y:  57 40 35 33 56 46 45 26 26 53
(a) Plot a scatter diagram of these data. (b) Fit a straight line to the data using the method of least squares. (c) Use the function derived in part (b) to forecast a value for y when x = 120.
13-2. Consider the following set of data, where x is the independent and y the dependent variable (contained in 13-2.XLS):
x: 30 25 20 15 10  5
y: 15 20 30 35 45 60
(a) Plot the scatter diagram for these data. (b) Fit a straight line to the data by the method of least squares.
13-3. Consider the following set of data (contained in 13-3.XLS):
x: 1    2    3    4    5    6    7
y: 2.00 1.50 4.50 4.00 5.50 4.50 6.00
(a) Plot a scatter diagram of the data. (b) Fit a straight line to the data by the method of least squares. Plot the line on the scatter diagram. (c) Fit a quadratic function to the data by the method of least squares. Plot the curve on the scatter diagram.
13-4. Fit a quadratic function to the data in Problem 13-2 by the method of least squares.
13-5. Compare the goodness of fit on the data in Problem 13-3 for the least squares linear function and the least squares quadratic function by calculating the sum of the squared deviations.
13-6. Compare the goodness of fit on the data in Problem 13-2 for the least squares linear function and the least squares quadratic function (derived in Problem 13-4) by calculating the sum of the squared deviations. Is the answer for 13-4 always better than that for 13-2?
13-7. Further investigation reveals that the x variable in Problem 13-1 is simply 10 times the time at which an observation was recorded, and the y variable is demand. For example, a demand of 57 occurred at time 10; a demand of 26 occurred at times 1 and 2. (a) Plot actual demand against time. (b) Use a simple four-period moving average to forecast demand at time 11. (c) By inspecting the data, would you expect this to be a good model or not? Why?
13-8. Consider the following data set (contained in 13-8.XLS):
TIME:    1  2  3  4  5  6  7  8  9 10 11 12
DEMAND: 10 14 19 26 31 35 39 44 51 55 61 54
(a) Plot this time series. Connect the points with a straight line. (b) Use a simple four-period moving average to forecast the demand for periods 5–13. (c) Find the mean absolute deviation. (d) Does this seem like a reasonable forecasting device in view of the data?
13-9. Consider the data in Problem 13-7. (a) Use a four-period weighted moving average with the weights 4/10, 3/10, 2/10, and 1/10 to forecast demand for time 11. Heavier weights should apply to more recent observations. (b) Do you prefer this approach to the simple four-period model suggested in Problem 13-7? Why? (c) Now find the optimal weights using the Solver. How much have you reduced your error measure compared to (a)?
13-10. Consider the data in Problem 13-8. (a) Use a four-period weighted moving average with the weights 0.1, 0.2, 0.3, and 0.4 to forecast demand for time periods 5–13. Heavier weights should apply to more recent observations. (b) Find the mean absolute deviation. (c) Do you prefer this approach to the simple four-period model suggested in Problem 13-8? Why? (d) Now find the optimal weights using the Solver. How much have you reduced your error measure compared to the method of (a)?
13-11. Consider the time-series data in Problem 13-7. (a) Let ŷ1 = 22 and α = 0.4. Use an exponential smoothing model to forecast demand in period 11. (b) If you were to use an exponential smoothing model to forecast this time series, would you prefer a larger (than 0.4) or smaller value for α? Why? (c) Find the optimal value for α using Solver by minimizing the mean absolute percentage error.
13-12. Consider the time-series data in Problem 13-8. (a) Assume that ŷ1 = 8 and α = 0.3. Use an exponential smoothing model to forecast demand in periods 2–13. (b) Find the mean absolute percentage error. (c) Repeat the analysis using α = 0.5. (d) If you were to use an exponential smoothing model to forecast this time series, would you prefer α = 0.3, a larger value, or a smaller value of α? Why? (e) What is the optimal value for α? How much does it reduce the MAPE from part (a)?
13-13. The president of Quacker Mills wants a subjective evaluation of the market potential of a new nacho-flavored breakfast cereal from a group consisting of (1) the vice president of marketing, (2) the marketing manager of the western region, and (3) ten district sales managers from the western region.
Discuss the advantages and disadvantages of a consensus panel and the Delphi Method for obtaining this evaluation.
13-14. Given that yt is produced by the relationship yt = yt–1 + ε, where ε is a random number with mean zero, and y1 = 1, y2 = 2, y3 = 1.5, y4 = 0.8, y5 = 1, what is your best forecast of y6?
13-15. Given your current knowledge of the situation, would you recommend a causal or a time-series model to forecast next month's demand for Kellogg's Rice Crispies? Why?
13-16. If α = 0.3, in calculating ŷ5, what is the weight on (a) ŷ1, (b) y1, (c) y4?
13-17. Discuss the merit of the measure "mean squared error." In comparing two forecasting methods, is the one with a smaller average squared error always superior?
13-18. If a company experiences exponential sales growth, how would you alter the sales forecasting model to account for this?
13-19. Generate the following function for 100 periods (i.e., xt from 1 to 100): yt = 300 + 200 * COS(0.1 * xt). (a) Create a twenty-period simple moving average forecast for periods 21 to 100 and a fifty-period simple moving average forecast for periods 51 to 100, and compare these forecasts against the original function. What can you infer about the performance of the simple moving average forecasting technique? (b) Create an exponential smoothing forecast using α = 0.3. What can you infer about the performance of exponential smoothing forecasts? (c) Given the above function and forecasts, how would you as a manager use qualitative forecasting to adjust the quantitative forecasts to achieve higher accuracy?
13-20. Discuss the advantages and disadvantages of MAD, MAPE, and MSE as error measurements. Why would you use one over another?
13-21. The marketing manager for 7–12, a small upstart convenience store corporation, wants to forecast the sales of a new store opening in a small midwestern city. He has gathered the population statistics and sales at the other ten stores the corporation operates. The data are shown in Table 13.3. (a) Plot the scatter diagram of these data. (b) Fit a straight line to the data using the method of least squares. (c) Forecast sales for the new store if the population of the new city is 50,000.

Table 13.3
STORE NUMBER   SALES     POPULATION
1               400000    10,000
2              1250000    65,000
3              1300000    72,000
4              1100000    54,000
5               450000    42,500
6               540000    36,800
7               500000    27,500
8              1425000    85,000
9              1700000    98,000
10              475000    37,500

13-22. The marketing manager for 7–12 has found that sales over time at his highest-performing store are as shown in Table 13.4. (a) Use a three-period simple moving average to forecast periods 4 to 11. (b) Use a three-period weighted moving average to forecast periods 4 to 11. Use Solver to optimize the weights for the forecast. (c) Use an exponentially smoothed forecast to forecast periods 2 to 11. Use Solver to optimize the value of alpha.

Table 13.4
PERIOD   SALES
1        $  750,000
2        $  790,000
3        $  810,000
4        $  875,000
5        $  990,000
6        $1,090,000
7        $  950,000
8        $1,050,000
9        $1,150,000
10       $1,200,000
11       $1,250,000

13-23. Discuss the effort and accuracy required with forecasts for the following: (a) A short-term investment of $1,000,000 in volatility.com common stock with a time horizon of two weeks or less. (b) A long-term investment in an S&P 500 index fund with a time horizon of 30 years or more. (c) A monthly materials requirements forecast for the production of jumbo jet aircraft. (d) A weekly forecast for the number of custodians in a school.
13-24. Discuss the role of probability in the development of forecasting models. When would it be appropriate to use random variables in a forecasting model, and what kind of statistical data would be used in conjunction with the model?
13-25. Discuss the divide-and-conquer strategy as it applies to time-series forecasting. What alternatives would you consider if the results from the divide-and-conquer strategy fail to meet your standards?

Application Problems
13-26. In some cases it is possible to obtain better forecasts by using a trend-adjusted forecast. (a) Use the Holt trend model with ŷ1 = 22 to forecast the sequence of demands in Problem 13-7. (b) Use the MSE error measure to compare the simple exponential smoothing model (Problem 13-11) with the trend-adjusted model from part (a) on forecasting demand.
13-27. In some cases it is possible to obtain better forecasts by using a trend-adjusted forecast. (a) Use the Holt trend model with ŷ1 = 8 to forecast the sequence of demands in Problem 13-8. (b) As in Problem 13-26, compare the above result with the result from Problem 13-12 (simple exponential smoothing).
13-28. In Section 13.4 we presented the Holt trend model for forecasting the earnings of Startup Airlines by Amy Luford. Use the same data to (a) develop a trend model using linear regression with time as the independent variable. (b) How does its forecasting performance compare with the Holt trend model?
13-29. The spreadsheet AUTO.XLS contains data from Business Week on auto sales by month for 43 months. (a) Deseasonalize the data. (b) Find the best forecasting method for the deseasonalized data. (c) What is your forecast for period 44? (d) How much confidence do you have in your forecast?
13-30. Using the OILCOMP.XLS data, use the "Regression" tool of Excel to fit a quadratic function to the data. Hint: You must first create a column for a second independent variable, X2 = X1^2, and then regress Y (Sales/hr) on both X1 (Cars/hr) and X2 ([Cars/hr]^2). (a) How do the results compare to those found in the chapter by using the Solver? (b) Which technique seems easier? (c) Which error measures can you use to compare these two approaches?
13-31. Use an "investment" Internet web site to download the daily closing price data for AT&T stock for the most recent thirty days. Use Holt's model to forecast the next five days. Follow the next five days' closing prices for AT&T and compare the actual closings with your forecast. Note: www.yahoo.com will provide data in its finance section that can be downloaded into an Excel spreadsheet. Ensure that you save the spreadsheet as an Excel spreadsheet after downloading the .csv format.

Case Study
The Bank of Laramie
Jim Cowan was reviewing the staffing needs for the Proof Department. He was the president of the Bank of Laramie, and one of his first priorities was to evaluate a staffing and scheduling problem.

Company Background
The Bank of Laramie conducted retail and corporate banking activities and provided a full line of trust services to its customers. It provided its customers a full line of corporate banking services, including cash management and money market services. It had total assets of $19 million and net income of $300 thousand.

Proof Department
The Proof Department was the heart of the bank's check-clearing operations. The department received and processed checks and other documents to clear them in the shortest possible time in order to save on float, which averaged $450 thousand a day.
The department was charged with the responsibility for sorting checks, proving the accuracy of deposits, distributing checks, and listing transactions arising from the daily operations of the bank. The physical facility consisted of a room with two proof machines and several tables. The department operated from 8:00 A.M. to 5:30 P.M., Monday through Friday. Despite the practice by other banks of handling check processing almost entirely at night, the Bank of Laramie believed it was important to give its employees a normal workday.
The volume of items processed in the Proof Department had increased significantly in the last two years, from 780 thousand/year to 1.6 million/year. The scheduling problem in the department was magnified because of the uneven nature of the volume. Exhibit 1 contains deseasonalized weekly proof volumes going back to the beginning of the prior year. This volume pattern led management to use a part-time staff to cover peak loads. Currently, one full-time and two part-time proof operators were working at the bank. Each operator had an average processing rate of 700 items per hour.

EXHIBIT 1 Deseasonalized Weekly Proof Volumes
WEEK  VOLUME (000)   WEEK  VOLUME (000)
 1    23.4           34    31.1
 2    26.4           35    31.0
 3    28.7           36    29.6
 4    26.4           37    31.5
 5    28.6           38    31.3
 6    29.4           39    31.1
 7    29.9           40    34.9
 8    29.3           41    32.3
 9    32.2           42    35.6
10    28.7           43    33.8
11    27.8           44    31.3
12    31.1           45    31.2
13    32.7           46    30.4
14    32.5           47    31.1
15    28.9           48    32.7
16    31.8           49    34.8
17    32.8           50    34.5
18    32.7           51    36.0
19    31.7           52    28.3
20    32.5           53    27.8
21    32.7           54    30.4
22    30.9           55    28.4
23    30.5           56    29.3
24    31.3           57    28.9
25    30.1           58    33.5
26    32.4           59    32.6
27    28.5           60    32.0
28    29.9           61    30.6
29    31.7           62    31.9
30    30.7           63    31.3
31    31.6           64    31.6
32    32.1           65    31.1
33    30.1           66    32.0

Forecasting
The first thing Mr. Cowan had to do was forecast demand for next week, week 67, and then he would need to work out a schedule for the number of full- and part-time staff to meet the predicted demand. A couple of simple forecasting methods appealed to him: using the previous week's actual demand as the next week's forecast, or just using the weekly average over the past 66 weeks. He wondered how accurate these simple methods were, however, and whether or not there was a better way.
He would use his forecast to determine how many hours of additional part-time workers to schedule for the next week. His base schedule was enough to do 15,000 checks; he could add as many additional part-time hours as he wished to the schedule. (Note: The base schedule includes the number of checks that could be processed by one full-time and two part-time workers along with their other duties.) If he scheduled either full- or part-time hours, he had to pay for them even if the workers completed the check processing early. On the other hand, if the volume of checks was so high that the checks couldn't be processed in the hours he scheduled for the week, he would need to pay overtime wages (which were 50% above regular wages) in order to complete the work for the week. There was no requirement to finish a given day's checks that same day, but all the checks for the week had to be done by Friday afternoon.
His first task was to get a handle on the forecasting problem; then he could easily use it to find the number of part-time hours to schedule.
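Before the case questions, here is a minimal sketch (not part of the case, and not its solution) of how the two simple rules Mr. Cowan mentions, plus an exponential smoothing alternative, could be compared on the Exhibit 1 volumes by mean absolute deviation. The list below shows only the first few weeks and would need to be extended with the remaining volumes from Exhibit 1; the function names are illustrative.

# Sketch comparing Mr. Cowan's simple forecasting rules on the Exhibit 1 volumes.
# "volumes" should hold all 66 deseasonalized weekly volumes (in thousands).

volumes = [23.4, 26.4, 28.7, 26.4, 28.6]   # first few weeks only; extend from Exhibit 1

def naive_forecasts(data):
    """Rule 1: forecast each week with the previous week's actual volume."""
    return [data[t - 1] for t in range(1, len(data))]

def cumulative_average_forecasts(data):
    """Rule 2: forecast each week with the average of all prior weeks."""
    return [sum(data[:t]) / t for t in range(1, len(data))]

def exponential_smoothing_forecasts(data, alpha, initial):
    """Alternative: the forecast for week t is the smoothed value through week t-1."""
    forecasts, smoothed = [], initial
    for y in data:
        forecasts.append(smoothed)
        smoothed = alpha * y + (1 - alpha) * smoothed
    return forecasts[1:]                     # align with actuals for weeks 2, 3, ...

def mad(forecasts, actuals):
    """Mean absolute deviation between forecasts and actual volumes."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)

actuals = volumes[1:]
print("naive MAD:          ", mad(naive_forecasts(volumes), actuals))
print("cumulative avg MAD: ", mad(cumulative_average_forecasts(volumes), actuals))
print("exp. smoothing MAD: ", mad(exponential_smoothing_forecasts(volumes, 0.5, volumes[0]), actuals))

The same comparison is what the spreadsheet BANK.XLS, referenced in the questions below, is set up to support in Excel.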
Questions
1. Look at the following five procedures suggested for forecasting the weekly workload requirements of the Proof Department:
• Method 1: This simple forecasting scheme uses the previous week's volume to forecast each succeeding week's volume (e.g., the forecast for week 10 would be 32.2, the volume for week 9).
• Method 2: This approach uses the average volume over all the previous weeks as the forecast. Thus, the forecast for week 23 would be 30.05 (the average over the first 22 weeks). To forecast week 67, weeks 1 through 66 would be averaged together.
• Method 3: This method is exponential smoothing with alpha = 0.5. Use the spreadsheet (BANK.XLS) that already has the data from Exhibit 1 entered and the exponential smoothing forecast calculated for an alpha of 0.5. (Notice that this forecasting method requires an initial average to get things started. The spreadsheet used 23.4 as this average, in essence taking a "best-case" approach by assuming the first average was perfect.)
• Method 4: This method takes a moving average approach. The number of periods in the moving average is up to you. Try several different values to come up with the best forecasting method you can.
• Method 5: This method is called linear regression. It basically fits a straight line through the data, looking for the overall trend.
2. Which of the forecasting methods presented would you recommend to forecast weekly volumes? Adjust the smoothing constant if needed to improve Method 3.
3. How many part-time hours should Mr. Cowan schedule for next week? Hint: Think about the costs of over- and underscheduling the number of hours.

Case Study
Shumway, Horch, and Sager (B)1
Christensen started to look at the circulation data of some of the other monthly magazines represented by the client organization. The first set of data was for Working Woman (found in the "WkgWom" sheet of SHSB.XLS and shown in Exhibit 1), which was targeted at women who were in management careers in business. Contents included sections devoted to entrepreneurs, business news, economic trends, technology, politics, career fields, social and behavioral sciences, fashion, and health. It was sold almost entirely through subscriptions, as evidenced by the latest figures reported to the Audit Bureau of Circulation (823.6K subscriptions out of 887.8K total circulation).

EXHIBIT 1 Graph of Working Woman Circulation

The next graph represented circulation data for Country Living (found in the "CtryLiving" sheet of SHSB.XLS and shown in Exhibit 2), a journal that focused on both the practical concerns and the intangible rewards of living on the land. It was sold to people who had a place in the country, whether that was a working farm, a gentleman's country place, or a weekend retreat.

EXHIBIT 2 Graph of Country Living Circulation

The third set of data was for Health (found in the "Hlth" sheet of SHSB.XLS and shown in Exhibit 3), which was a lifestyle magazine edited for women who were trying to look and feel better. The magazine provided information on fitness, beauty, nutrition, medicine, psychology, and fashions for the active woman.

1 This case is to be used as the basis for class discussion rather than to illustrate either the effective or ineffective handling of an administrative situation. © 1990 by the Darden Graduate Business School Foundation. Preview Darden case abstracts on the World Wide Web at www.darden.virginia.edu/publishing.
EXHIBIT 3 Graph of Health Circulation

A fourth graph was for Better Homes and Gardens (found in the "BH&G" sheet of SHSB.XLS and shown in Exhibit 4), which competed with Good Housekeeping and was published for husbands and wives who had serious interests in home and family as the focal points of their lives. It covered these home-and-family subjects in depth: food and appliances, building and handyman, decorating, family money management, gardening, travel, health, cars in your family, home and family entertainment, new-product information, and shopping. The magazine's circulation appeared to be experiencing increased volatility over time. Was this the beginning of a new pattern?

EXHIBIT 4 Graph of Better Homes and Gardens Circulation

The last magazine was True Story (found in the "TrueStry" sheet of SHSB.XLS and shown in Exhibit 5). It was edited for young women and featured story editorials as well as recipes and food features, beauty and health articles, and home management and personal advice. This journal's circulation appeared to have a definite downward trend over the past nine years. Was the cause a general declining interest in the subject matter, or was this a cycle that would correct itself in the future (like the sine wave Christensen had studied in trigonometry)?

EXHIBIT 5 Graph of True Story Circulation

Case Study Question
1. What's the best forecasting method for each of the five magazines? Use the concept of "Divide and Conquer" to really test your different forecast methods.

Marriott Room Forecasting1
1 This case is to be used as the basis for class discussion rather than to illustrate either the effective or ineffective handling of an administrative situation. © 1989 by the Darden Graduate Business School Foundation. Preview Darden case abstracts on the World Wide Web at www.darden.virginia.edu/publishing.

"A hotel room is a perishable good. If it is vacant for one night, the revenue is lost forever." Linda Snow was commenting on the issue of capacity utilization in the hotel business. "On the other hand, the customer is king with us. We go to great pains to avoid telling a customer with a reservation at the front desk that we don't have a room for him in the hotel." As reservation manager of one of Marriott's hotels, Linda faced this trade-off constantly. To complicate the matter, customers often booked reservations and then failed to show, or cancelled reservations just before their expected arrival. In addition, some guests stayed over in the hotel extra days beyond their original reservation and others checked out early. A key aspect of dealing with the capacity-management problem was having a good forecast of how many rooms would be needed on any future date. It was Linda's responsibility to prepare a forecast on Tuesday afternoon of the number of rooms that would be occupied each day of the next week (Saturday through Friday). This forecast was used by almost every department within the hotel for a variety of purposes; now she needed the forecast for a decision in her own department.

Hamilton Hotel
The Hamilton Hotel was a large downtown business hotel with 1,877 rooms and abundant meeting space for groups and conventions. It had been built and was operated by Marriott Hotels, a company that operated more than 180 hotels and resorts worldwide and was expanding rapidly into other lodging-market segments. Management of The Hamilton reported regularly to Marriott Corporation on occupancy and revenue performance.
Hotel managers were rewarded for their ability to meet targets for occupancy and revenue. Linda could not remember a time when the targets went down, but she had seen them go up in the two years since she took the job as reservation manager. The hotel managers were continuously comparing forecasts of performance against these targets. In addition to overseeing the reservations office with eight reservationists, Linda prepared the week-ahead forecast and presented it on Tuesday afternoon to the other department managers in the hotel. The forecast was used to schedule, for example, daily work assignments for housekeeping personnel, the clerks at the front desk, restaurant personnel, and others. It also played a role in purchasing, revenue, and cost planning.

Overbooking
At the moment, however, Linda needed her forecast to know how to treat an opportunity that was developing for next Saturday. It was Tuesday, August 18, and Linda's forecasts were due by midafternoon for Saturday, August 22, through Friday, August 28. Although 1,839 rooms were already reserved for Saturday, Linda had just received a request from a tour company for as many as 60 more rooms for that night. The tour company would take any number of rooms Linda could provide, up to a maximum of 60. Normally Linda would be ecstatic about such a request: selling out the house for a business hotel on a Saturday would be a real coup. The request, in its entirety, put reservations above the capacity of the hotel, however. True, a reservation on the books Tuesday was not the same as a "head in the bed" on Saturday, especially when weekend nights produced a lot of "no-show" reservations. "Chances are good we still wouldn't have a full house on Saturday," Linda thought out loud. "But if everybody came and someone was denied a room due to overbooking, I would certainly hear about it, and maybe Bill Marriott would also!"
Linda considered the trade-off between a vacant room and denying a customer a room. The contribution margin from a room was about $90, since the low variable costs arose primarily from cleaning the room and check-in/check-out. On the other side, if a guest with a reservation was denied a room at The Hamilton, the front desk would find a comparable room somewhere in the city, transport the guest there, and provide some gratuity, such as a fruit basket, in consideration for the inconvenience. If the customer were a Marquis cardholder (a frequent guest staying more than 45 nights a year in the hotel), he or she would receive $200 cash plus the next two stays at Marriott free. Linda wasn't sure how to put a cost figure on a denied room; in her judgment, it should be valued, goodwill and all, at about twice the contribution figure.

Forecasting
Linda focused on getting a good forecast for Saturday, August 22, and making a decision on whether to accept the additional reservations for that day. She had historical data on demand for rooms in the hotel; Exhibit 1 shows demand for the first three weeks for dates starting with Saturday, May 23. (Ten additional weeks [weeks 4–13] are contained in MARRIOTT.XLS, and thus Saturday, August 22, was the beginning of week 14 in this database.)

EXHIBIT 1 Historical Demand and Bookings Data
Cell   Formula   Copy To
E5     C5/D5     E6:E91
“Demand” figures (column C) included the number of turned-down requests for a reservation on a night when the hotel had stopped taking reservations because of capacity plus the number of rooms actually occupied that night. Also included in Exhibit 1 is the number of rooms booked (column D) as of the Tuesday morning of the week prior to each date. (Note that this Tuesday precedes a date by a number of days that depends on the date’s day of week. It is four days ahead of a Saturday date, seven days ahead of a Tuesday, ten days ahead of a Friday. Also note that on a Tuesday morning, actual demand is known for Monday night, but not for Tuesday night.) Linda had calculated pickup ratios for each date where actual demand was known in Exhibit 1 (column E). Between a Tuesday one week ahead and any date, new reservations were added, reservations were canceled, some reservations were extended to more nights, some were shortened, and some resulted in no-shows. The net effect was a final demand that might be larger than Tuesday bookings (a pickup ratio greater than 1.0) or smaller than Tuesday bookings (a pickup ratio less than 1.0). Linda looked at her forecasting task as one of predicting the pickup ratio. With a good forecast of pickup ratio, she could simply multiply by Tuesday bookings to obtain a forecast of demand. From her earliest experience in a hotel, Linda was aware that the day of the week (DOW) made a lot of difference in demand for rooms; her recent experience in reservations sug- gested that it was key in forecasting pickup ratios. Downtown business hotels like hers tended to be busiest in the middle of the workweek (Tuesday, Wednesday, Thursday) and light on the weekends. Using the data in her spreadsheet, she had calculated a DOW index for the pickup ratio during each day of the week, which is shown in column F of Exhibit 1. Thus, for example, the average pickup ratio for Saturday is about 86.5% of the average pickup ratio for all days of the week. Her plan was to adjust the data for this DOW effect by dividing each pickup ratio by this factor. This adjustment would take out the DOW effect, and put the pickup ratios on the same footing. Then she could use the stream of adjusted pickup ratios to forecast Saturday’s adjusted pickup ratio. To do this, she needed to think about how to level out the peaks and valleys of demand, which she knew from experience couldn’t be forecasted. Once she had this forecast of adjusted pickup ratio, then she could multiply it by the Saturday DOW index to get back to an unadjusted pickup ratio. “Let’s get on with it,” she said to herself. “I need to get an answer back on that request for 60 reservations.” Questions 1. 2. 3. Verify the Day-of-Week indices in column F of Exhibit 1. What forecasting procedure would you recommend for making the Tuesday afternoon forecast for each day’s demand for the following Saturday through Friday? What is your forecast for Saturday, August 22? What will you do about the current request for up to 60 rooms for Saturday? References Bruce Andrews and Shawn Cunningham, “L. L. Bean Improves Call-Center Forecasting,” Interfaces, 25, no. 6 (1995), 1–13. George E. P. Box and Gwilym M. Jenkins, Time Series Analysis, Forecasting and Control (San Francisco: Holden-Day, Inc., 1970). Alan L. C. Bullock, Hitler: A Study in Tyranny (New York: Harper & Row, 1962). Jackie Hueter and William Swart, “An Integrated Labor-Management System for Taco Bell,” Interfaces, 28, no. 1 (1998), 75–91. 
References
Bruce Andrews and Shawn Cunningham, "L. L. Bean Improves Call-Center Forecasting," Interfaces, 25, no. 6 (1995), 1–13.
George E. P. Box and Gwilym M. Jenkins, Time Series Analysis, Forecasting and Control (San Francisco: Holden-Day, Inc., 1970).
Alan L. C. Bullock, Hitler: A Study in Tyranny (New York: Harper & Row, 1962).
Jackie Hueter and William Swart, "An Integrated Labor-Management System for Taco Bell," Interfaces, 28, no. 1 (1998), 75–91.
Quarterly Coal Reports, published by the Department of Energy, Energy Information Administration.