King Abdulaziz University, Faculty of Engineering, Industrial Engineering Dept.
IE 436 Dynamic Forecasting

CHAPTER 3
Exploring Data Patterns and an Introduction to Forecasting Techniques

Cross-sectional data: data collected at a single point in time.
A time series: data collected and recorded over successive increments of time. (Page 62)

Exploring Time Series Data Patterns
• Horizontal (stationary)
• Trend
• Cyclical
• Seasonal

A Stationary Series
Its mean and variance remain constant over time.

The Trend
The long-term component that represents the growth or decline in the time series.

The Cyclical Component
The wavelike fluctuation around the trend.

FIGURE 3-2 Trend and Cyclical Components of an Annual Time Series Such as Housing Costs (Page 63)
[Plot of cost vs. year: a rising trend line with a cyclical peak above it and a cyclical valley below it.]

The Seasonal Component
A pattern of change that repeats itself year after year.

FIGURE 3-3 Electrical Usage for Washington Water Power Company, 1980-1991 (Page 64)
[Time series plot of electrical usage showing a repeating seasonal pattern.]

Exploring Data Patterns with Autocorrelation Analysis
• Autocorrelation: the correlation between a variable lagged one or more periods and itself.

    r_k = Σ_{t=k+1}^{n} (Y_t − Ȳ)(Y_{t−k} − Ȳ) / Σ_{t=1}^{n} (Y_t − Ȳ)²,   k = 0, 1, 2, ...   (3.1)

where:
r_k = autocorrelation coefficient for a lag of k periods
Ȳ = mean of the values of the series
Y_t = observation in time period t
Y_{t−k} = observation in time period t − k
(Pages 64-65)

Autocorrelation Function (Correlogram)
A graph of the autocorrelations for various lags.
Computation of the lag 1 autocorrelation coefficient: Table 3-1 (page 65).

Example 3.1
Data are presented in Table 3-1 (page 65). Table 3-2 shows the computations that lead to the calculation of the lag 1 autocorrelation coefficient. Figure 3-4 contains a scatter diagram of the pairs of observations (Y_t, Y_{t−1}).
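Equation 3.1 is simple to compute directly. Below is a minimal Python sketch (mine, not the textbook's); the 12-observation series used is consistent with Example 3.1 in that it reproduces the Table 3-2 totals 843 and 1474 exactly:

```python
def autocorr(y, k):
    """Autocorrelation coefficient r_k per Equation 3.1."""
    n = len(y)
    ybar = sum(y) / n
    # numerator: sum over t = k+1 .. n (0-based: t = k .. n-1)
    num = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, n))
    # denominator: total sum of squared deviations
    den = sum((v - ybar) ** 2 for v in y)
    return num / den

# 12-observation series consistent with Example 3.1
# (it yields numerator 843 and denominator 1474, so r_1 = 0.572)
y = [123, 130, 125, 138, 145, 142, 141, 146, 147, 157, 150, 160]
print(round(autocorr(y, 1), 3))  # 0.572
```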
Using the totals from Table 3-2 and Equation 3.1:

    r_1 = Σ_{t=2}^{n} (Y_t − Ȳ)(Y_{t−1} − Ȳ) / Σ_{t=1}^{n} (Y_t − Ȳ)² = 843 / 1474 = 0.572

Autocorrelation Function (Correlogram) (Cont.)
Minitab instructions: Stat > Time Series > Autocorrelation

FIGURE 3-5 Correlogram or Autocorrelation Function for the Data Used in Example 3.1
[Bar chart of autocorrelations for lags 1-3; vertical axis runs from -1.0 to 1.0.]

Questions to be Answered Using Autocorrelation Analysis
• Are the data random?
• Do the data have a trend?
• Are the data stationary?
• Are the data seasonal?
(Page 68)

Are the data random?
If a series is random:
• The successive values are not related to each other.
• Almost all the autocorrelation coefficients are close to zero (not significantly different from zero).

Is an autocorrelation coefficient significantly different from zero?
• The autocorrelation coefficients of random data have an approximately normal sampling distribution.
• At a specified confidence level, a series can be considered random if the autocorrelation coefficients are within the interval [0 ± t·SE(r_k)] (z instead of t for large samples).
• The following t statistic can be used:

    t = r_k / SE(r_k)

• Standard error of the autocorrelation at lag k:

    SE(r_k) = sqrt( (1 + 2 Σ_{i=1}^{k−1} r_i²) / n )   (3.2)

where:
r_i = the autocorrelation at time lag i
k = the time lag
n = the number of observations in the time series

Example 3.2 (Page 69)
A hypothesis test: is a particular autocorrelation coefficient significantly different from zero?
At significance level 0.05: the critical values ±2.2 are the upper and lower t points for n − 1 = 11 degrees of freedom.
Decision rule: if t < −2.2 or t > 2.2, reject H₀: ρ_k = 0.
Note: t is given directly in the Minitab output under the heading T.

Is an autocorrelation coefficient different from zero? (Cont.)
The modified Box-Pierce Q statistic (developed by Ljung and Box), "LBQ".
A portmanteau test: it tests whether a whole set of autocorrelation coefficients is zero at once.
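Before moving to the portmanteau test, the individual-coefficient test above (Equation 3.2 and the t statistic) can be sketched in a few lines. This is an illustrative sketch of mine, applied to the Example 3.1 result r_1 = 0.572 with n = 12:

```python
import math

def se_rk(prev_r, n):
    """Standard error of r_k per Equation 3.2.

    prev_r holds the earlier autocorrelations r_1 .. r_{k-1}
    (empty for k = 1)."""
    return math.sqrt((1 + 2 * sum(ri ** 2 for ri in prev_r)) / n)

# Lag-1 check for Example 3.1: n = 12, r_1 = 0.572; no earlier lags, so prev_r = []
n, r1 = 12, 0.572
se1 = se_rk([], n)     # sqrt(1/12) ≈ 0.289
t_stat = r1 / se1      # ≈ 1.98; compare with the ±2.2 critical values of Example 3.2
print(round(se1, 3), round(t_stat, 2))
```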
    Q = n(n + 2) Σ_{k=1}^{m} r_k² / (n − k)   (3.3)

where:
n = number of observations
k = the time lag
m = number of time lags to be considered
r_k = kth autocorrelation coefficient (lagged k time periods)

The value of Q can be compared with the chi-square distribution with m degrees of freedom.

Example 3.3 (Page 70)

 t   Y_t    t   Y_t    t   Y_t    t   Y_t
 1   343   11   946   21   704   31   555
 2   574   12   142   22   291   32   476
 3   879   13   477   23    43   33   612
 4   728   14   452   24   118   34   574
 5    37   15   727   25   682   35   518
 6   227   16   147   26   577   36   296
 7   613   17   199   27   834   37   970
 8   157   18   744   28   981   38   204
 9   571   19   627   29   263   39   616
10    72   20   122   30   424   40    17

FIGURE 3-7 Autocorrelation Function for the Data Used in Example 3.3
[Correlogram for lags 1-10; Minitab output:]

Lag   Corr     T     LBQ
 1   -0.19  -1.21   1.57
 2   -0.01  -0.04   1.58
 3   -0.15  -0.89   2.53
 4    0.10   0.63   3.04
 5   -0.25  -1.50   6.13
 6    0.03   0.16   6.17
 7    0.17   0.95   7.63
 8   -0.03  -0.15   7.67
 9   -0.03  -0.18   7.73
10    0.02   0.12   7.75

• The Q statistic for m = 10 time lags is 7.75 (using Minitab).
• The chi-square value χ²₀.₀₅ = 18.307 (0.05 significance level, degrees of freedom df = m = 10), Table B-4 (Page 527).
• Q < χ²₀.₀₅. Conclusion: the series is random.

Do the Data Have a Trend?
• A significant relationship exists between successive time series values.
• The autocorrelation coefficients are large for the first several time lags and then gradually drop toward zero as the number of periods increases.
• The autocorrelation for time lag 1 is close to 1; for time lag 2 it is large but smaller than for time lag 1.
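Returning to the Q statistic, Equation 3.3 can be sketched directly on top of the autocorrelation function. This is a self-contained illustration of mine; the tiny alternating test series is chosen so the result works out to an exact value by hand (r_1 = −5/6, r_2 = 2/3, Q = 12 for m = 2):

```python
def autocorr(y, k):
    """Autocorrelation coefficient r_k per Equation 3.1."""
    n = len(y)
    ybar = sum(y) / n
    num = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, n))
    return num / sum((v - ybar) ** 2 for v in y)

def ljung_box_q(y, m):
    """Modified Box-Pierce (Ljung-Box) Q statistic per Equation 3.3."""
    n = len(y)
    return n * (n + 2) * sum(autocorr(y, k) ** 2 / (n - k)
                             for k in range(1, m + 1))

# Tiny alternating series: r_1 = -5/6, r_2 = 2/3, so Q for m = 2 is exactly 12
q = ljung_box_q([1, 2, 1, 2, 1, 2], 2)
print(q)  # 12.0

# In practice, compare q with the chi-square critical value for m degrees of
# freedom (18.307 for m = 10 at the 0.05 level, as in Example 3.3).
```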
Example 3.4 (Page 72)
Data in Table 3-4 (Page 74):

Year   Y_t     Year   Y_t     Year   Y_t     Year   Y_t
1955   3307    1966   6769    1977  17224    1988  50251
1956   3556    1967   7296    1978  17946    1989  53794
1957   3601    1968   8178    1979  17514    1990  55972
1958   3721    1969   8844    1980  25195    1991  57242
1959   4036    1970   9251    1981  27357    1992  52345
1960   4134    1971  10006    1982  30020    1993  50838
1961   4268    1972  10991    1983  35883    1994  54559
1962   4578    1973  12306    1984  38828    1995  34925
1963   5093    1974  13101    1985  40715    1996  38236
1964   5716    1975  13639    1986  44282    1997  41296
1965   6357    1976  14950    1987  48440    1998  …….

Data Differencing
• A time series can be differenced to remove the trend and to create a stationary series.
• See FIGURE 3-8 (Page 73) for differencing the data of Example 3.1.
• See FIGURES 3-12, 3-13 (Page 75).

Are the Data Seasonal?
• For quarterly data: a significant autocorrelation coefficient will appear at time lag 4.
• For monthly data: a significant autocorrelation coefficient will appear at time lag 12.

Example 3.5 (Page 76)
See Figures 3-14, 3-15 (Page 77). Table 3-5:

Year   December 31   March 31   June 30   September 30
1994      147.6        251.8      273.1       249.1
1995      139.3        221.2      260.2       259.5
1996      140.5        245.5      298.8       287.0
1997      168.8        322.6      393.5       404.3
1998      259.7        401.1      464.6       497.7
1999      264.4        402.6      411.3       385.9
2000      232.7        309.2      310.7       293.0
2001      205.1        234.4      285.4       258.7
2002      193.2        263.7      292.5       315.2
2003      178.3        274.5      295.4       286.4
2004      190.8        263.5      318.8       305.5
2005      242.6        318.8      329.6       338.2
2006      232.1        285.6      291.0       281.4

FIGURE 3-14 Time Series Plot of Quarterly Sales for Coastal Marine for Example 3.5
[Time series graph titled "Quarterly Sales: 1995-2007"; sales roughly 100-500 with a repeating within-year pattern.]
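The differencing step from the Data Differencing slide above can be sketched in a couple of lines. A minimal example of mine (not from the textbook): a linearly trended series becomes constant, hence stationary, after one difference.

```python
def difference(y):
    """First differences Y_t - Y_{t-1}; removes a linear trend."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

# A trended series 3, 5, 7, ... differences to a constant series
print(difference([3, 5, 7, 9, 11]))  # [2, 2, 2, 2]
```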
FIGURE 3-15 Autocorrelation Function for Quarterly Sales for Coastal Marine for Example 3.5
[Correlogram for lags 1-13; Minitab output:]

Lag   Corr     T     LBQ
 1    0.39   2.83   8.49
 2    0.16   1.03  10.00
 3    0.29   1.81  14.91
 4    0.74   4.30  46.79
 5    0.15   0.67  48.14
 6   -0.15  -0.64  49.44
 7   -0.05  -0.23  49.60
 8    0.34   1.48  56.92
 9   -0.18  -0.77  59.10
10   -0.43  -1.79  71.46
11   -0.32  -1.24  78.32
12    0.09   0.32  78.83
13   -0.35  -1.34  87.77

The autocorrelation coefficients at time lags 1 and 4 are significantly different from zero: sales are seasonal on a quarterly basis.

Choosing a Forecasting Technique
Questions to be considered:
• Why is a forecast needed?
• Who will use the forecast?
• What are the characteristics of the data?
• What time period is to be forecast?
• What are the minimum data requirements?
• How much accuracy is required?
• What will the forecast cost?

Choosing a Forecasting Technique (Cont.)
The forecaster should accomplish the following:
• Define the nature of the forecasting problem.
• Explain the nature of the data.
• Describe the properties of the techniques.
• Develop criteria for selection.

Choosing a Forecasting Technique (Cont.)
Factors considered:
• Level of detail.
• Time horizon.
• Based on judgment or data manipulation.
• Management acceptance.
• Cost.

General Considerations for Choosing the Appropriate Method
• Judgment — Uses: can be used in the absence of historical data (e.g., a new product); most helpful in medium- and long-term forecasts. Considerations: subjective estimates are subject to the biases and motives of estimators.
• Causal — Uses: sophisticated methods; very good for medium- and long-term forecasts. Considerations: must have historical data; relationships can be difficult to specify.
• Time series — Uses: easy to implement; work well when the series is relatively stable. Considerations: rely exclusively on past data; most useful for short-term estimates.
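The quarterly-seasonality criterion from Example 3.5 above (a spike in the autocorrelation function at lag 4) is easy to demonstrate. Below is a sketch of mine on a synthetic, noise-free quarterly series (not the Coastal Marine data), chosen so the lag-4 coefficient works out exactly:

```python
def autocorr(y, k):
    """Autocorrelation coefficient r_k per Equation 3.1."""
    n = len(y)
    ybar = sum(y) / n
    num = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, n))
    return num / sum((v - ybar) ** 2 for v in y)

# Synthetic quarterly series: the pattern [13, 7, 10, 10] repeats for six years
y = [13, 7, 10, 10] * 6
r = {k: autocorr(y, k) for k in range(1, 9)}

# r_4 = 5/6 here, the largest coefficient in absolute value among lags 1-8:
# the quarterly signature the slides describe (cf. r_4 = 0.74 in Figure 3-15)
print(max(r, key=lambda k: abs(r[k])))  # 4
```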
Method                                           Pattern of Data   Time Horizon   Type of Model   Minimal Data Requirements
Naïve                                            ST, T, S          S              TS              1
Simple averages                                  ST                S              TS              30
Moving averages                                  ST                S              TS              4-20
Single exponential smoothing                     ST                S              TS              2
Linear (double) exponential smoothing (Holt's)   T                 S              TS              3
Quadratic exponential smoothing                  T                 S              TS              4
Seasonal exponential smoothing (Winters')        S                 S              TS              2 × s
Adaptive filtering                               S                 S              TS              5 × s
Simple regression                                T                 I              C               10
Multiple regression                              C, S              I              C               10 × V
Classical decomposition                          S                 S              TS              5 × s
Exponential trend models                         T                 I, L           TS              10
S-curve fitting                                  T                 I, L           TS              10
Gompertz models                                  T                 I, L           TS              10
Growth curves                                    T                 I, L           TS              10
Census X-12                                      S                 S              TS              6 × s
ARIMA (Box-Jenkins)                              ST, T, C, S       S              TS              24 (nonseasonal); 3 × s (seasonal)
Leading indicators                               C                 S              C               24
Econometric models                               C                 S              C               30
Time series multiple regression                  T, S              I, L           C               6 × s

Pattern of data: ST, stationary; T, trended; S, seasonal; C, cyclical.
Time horizon: S, short term (less than three months); I, intermediate; L, long term.
Type of model: TS, time series; C, causal.
Seasonal: s, length of seasonality.
Variable: V, number of variables.

Measuring Forecast Error
Basic forecasting notation:
Y_t = actual value of a time series in time period t
Ŷ_t = forecast value for time period t
e_t = Y_t − Ŷ_t = forecast error (residual) in time period t

Measuring Forecast Error (Cont.)
Equations (3.7)-(3.11):

The mean absolute deviation:        MAD  = (1/n) Σ_{t=1}^{n} |Y_t − Ŷ_t|
The mean squared error:             MSE  = (1/n) Σ_{t=1}^{n} (Y_t − Ŷ_t)²
The root mean squared error:        RMSE = sqrt( (1/n) Σ_{t=1}^{n} (Y_t − Ŷ_t)² )
The mean absolute percentage error: MAPE = (1/n) Σ_{t=1}^{n} |Y_t − Ŷ_t| / |Y_t|
The mean percentage error:          MPE  = (1/n) Σ_{t=1}^{n} (Y_t − Ŷ_t) / Y_t

Used for:
• Measuring the usefulness or reliability of a technique.
• Comparing the accuracy of two different techniques.
• Searching for an optimal technique.

Example 3.6 (Page 83)
• Evaluate the model using MAD, MSE, RMSE, MAPE, and MPE.
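The five accuracy measures (Equations 3.7-3.11) are straightforward to compute side by side. Below is a sketch of mine using small hypothetical actuals and forecasts (not the Example 3.6 data):

```python
import math

def forecast_errors(actual, forecast):
    """MAD, MSE, RMSE, MAPE, MPE per Equations 3.7-3.11."""
    n = len(actual)
    e = [a - f for a, f in zip(actual, forecast)]   # residuals e_t = Y_t - Yhat_t
    mad = sum(abs(x) for x in e) / n
    mse = sum(x * x for x in e) / n
    rmse = math.sqrt(mse)
    mape = sum(abs(x) / abs(a) for x, a in zip(e, actual)) / n
    mpe = sum(x / a for x, a in zip(e, actual)) / n  # signed: detects bias
    return mad, mse, rmse, mape, mpe

# Hypothetical series for illustration only
actual = [100, 110, 120, 130]
forecast = [90, 115, 120, 140]
mad, mse, rmse, mape, mpe = forecast_errors(actual, forecast)
print(mad, mse, rmse)  # 6.25 56.25 7.5
```

Note that MPE keeps the sign of each error, so a value near zero with a large MAPE indicates errors that are sizable but unbiased.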
Empirical Evaluation of Forecasting Methods
Results on forecast accuracy for a sample of 3,003 time series (1997):
• Complex methods do not necessarily produce more accurate forecasts than simpler ones.
• The various accuracy measures (MAD, MSE, MAPE) produce consistent results.
• The performance of methods depends on the forecasting horizon and the kind of data analyzed (yearly, quarterly, monthly).

Determining the Adequacy of a Forecasting Technique
• Do the residuals indicate a random series? (Examine the autocorrelation coefficients of the residuals; none should be significant.)
• Are the residuals approximately normally distributed?
• Is the technique simple to use and understood by decision makers?