Rob J Hyndman
Forecasting: Principles and Practice
2. The forecaster's toolbox
OTexts.com/fpp/2/

Outline
1 Time series graphics
2 Seasonal or cyclic?
3 Autocorrelation
4 Forecast residuals
5 White noise
6 Evaluating forecast accuracy

Time series graphics
> plot(melsyd[,"Economy.Class"])
[Figure: Economy class passengers: Melbourne-Sydney, thousands, 1988-1993]

Time series graphics
> plot(a10)
[Figure: Antidiabetic drug sales, $ million, monthly, 1995-2008]

Time series graphics
[Figure: Seasonal plot: antidiabetic drug sales, one line per year (1992-2008), months Jan-Dec on the horizontal axis, $ million on the vertical axis]

Seasonal plots
Data plotted against the individual "seasons" in which the data were observed. (In this case a "season" is a month.)
Something like a time plot except that the data from each season are overlapped.
Enables the underlying seasonal pattern to be seen more clearly, and also allows any substantial departures from the seasonal pattern to be easily identified.
In R: seasonplot

Seasonal subseries plots
> monthplot(a10)
[Figure: Seasonal subseries plot: antidiabetic drug sales, $ million, months Jan-Dec]

Seasonal subseries plots
Data for each season collected together in a time plot as separate time series.
Enables the underlying seasonal pattern to be seen clearly, and changes in seasonality over time to be visualized.
In R: monthplot

Quarterly Australian beer production
beer <- window(ausbeer,start=1992)
plot(beer)
seasonplot(beer,year.labels=TRUE)
monthplot(beer)

[Figure: Australian quarterly beer production, megalitres, from 1992]
[Figure: Seasonal plot: quarterly beer production, one line per year, quarters Q1-Q4]
[Figure: Seasonal subseries plot: quarterly beer production]

Time series graphics
Time plots. R command: plot or plot.ts
Seasonal plots. R command: seasonplot
Seasonal subseries plots. R command: monthplot
Lag plots. R command: lag.plot
ACF plots. R command: Acf

Outline
1 Time series graphics
2 Seasonal or cyclic?
3 Autocorrelation
4 Forecast residuals
5 White noise
6 Evaluating forecast accuracy

Time series patterns
Trend pattern exists when there is a long-term increase or decrease in the data.
Seasonal pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or the day of the week).
Cyclic pattern exists when data exhibit rises and falls that are not of fixed period (duration usually of at least 2 years).

Time series patterns
[Figure: Australian electricity production, GWh, 1980-1995]
[Figure: Australian clay brick production, million units, 1960-1990]
[Figure: Sales of new one-family houses, USA, total sales, 1975-1995]
[Figure: US Treasury bill contracts, price, days 0-100]
[Figure: Annual Canadian lynx trappings, number trapped, 1820-1920]

Seasonal or cyclic?
Differences between seasonal and cyclic patterns:
seasonal pattern constant length; cyclic pattern variable length
average length of cycle longer than length of seasonal pattern
magnitude of cycle more variable than magnitude of seasonal pattern
The timing of peaks and troughs is predictable with seasonal data, but unpredictable in the long term with cyclic data.
Outline
1 Time series graphics
2 Seasonal or cyclic?
3 Autocorrelation
4 Forecast residuals
5 White noise
6 Evaluating forecast accuracy

Autocorrelation
Covariance and correlation: measure the extent of a linear relationship between two variables (y and X).
Autocovariance and autocorrelation: measure the linear relationship between lagged values of a time series y.
We measure the relationship between:
yt and yt−1
yt and yt−2
yt and yt−3
etc.
Example: Beer production
> lag.plot(beer,lags=9)
[Figure: lagged scatterplots of quarterly beer production, lags 1 to 9]

Example: Beer production
> lag.plot(beer,lags=9,do.lines=FALSE)
[Figure: lagged scatterplots (points only) of quarterly beer production, lags 1 to 9]

Lagged scatterplots
Each graph shows yt plotted against yt−k for different values of k.
The autocorrelations are the correlations associated with these scatterplots.

Autocorrelation
We denote the sample autocovariance at lag k by ck and the sample autocorrelation at lag k by rk. Then define

ck = (1/T) Σ_{t=k+1}^{T} (yt − ȳ)(yt−k − ȳ)   and   rk = ck/c0

r1 indicates how successive values of y relate to each other
r2 indicates how y values two periods apart relate to each other
rk is almost the same as the sample correlation between yt and yt−k.
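The ck and rk formulas above can be sketched directly as a few lines of Python (used here only as a testable illustration; the slides themselves use R). The series y and all function names below are hypothetical.

```python
# Sketch of the sample autocovariance c_k and autocorrelation r_k = c_k / c_0,
# assuming a plain Python list of observations. Names are illustrative only.

def autocovariance(y, k):
    """c_k = (1/T) * sum over t = k+1..T of (y_t - ybar)(y_{t-k} - ybar)."""
    T = len(y)
    ybar = sum(y) / T
    # With 0-based indexing, pair y[t] with y[t-k] for t = k, ..., T-1.
    return sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, T)) / T

def autocorrelation(y, k):
    """r_k = c_k / c_0."""
    return autocovariance(y, k) / autocovariance(y, 0)

# A short made-up series with an obvious period-4 pattern, echoing quarterly data:
y = [10, 4, 3, 9, 11, 5, 2, 10, 12, 4, 3, 11]
print(autocorrelation(y, 4))  # large and positive, like r4 for the beer data
```

Note the 1/T in front of the sum even though only T − k products are summed; that matches the definition on the slide and is what makes rk only "almost" the usual sample correlation.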
Autocorrelation
Results for the first 9 lags for the beer data:

r1 = −0.126, r2 = −0.650, r3 = −0.094, r4 = 0.863, r5 = −0.099, r6 = −0.642, r7 = −0.098, r8 = 0.834, r9 = −0.116

[Figure: ACF of quarterly beer production, lags 1-17]

Autocorrelation
r4 is higher than for the other lags.
This is due to the seasonal pattern in the data: the peaks tend to be 4 quarters apart and the troughs tend to be 2 quarters apart.
r2 is more negative than for the other lags because troughs tend to be 2 quarters behind peaks.
Together, the autocorrelations at lags 1, 2, . . . , make up the autocorrelation function, or ACF.
The plot is known as a correlogram.

ACF
> Acf(beer)
[Figure: ACF of quarterly beer production, lags 1-17]

Recognizing seasonality in a time series
If there is seasonality, the ACF at the seasonal lag (e.g., 12 for monthly data) will be large and positive.
For seasonal monthly data, a large ACF value will be seen at lag 12 and possibly also at lags 24, 36, . . .
For seasonal quarterly data, a large ACF value will be seen at lag 4 and possibly also at lags 8, 12, . . .

Australian monthly electricity production
[Figure: Australian electricity production, GWh, 1980-1995]
[Figure: ACF of Australian monthly electricity production, lags 0-40]

Australian monthly electricity production
Time plot shows clear trend and seasonality.
The same features are reflected in the ACF.
The slowly decaying ACF indicates trend.
The ACF peaks at lags 12, 24, 36, . . . , indicate seasonality of length 12.

Which is which?
[Figure: four time plots: 1. Daily morning temperature of a cow; 2. Accidental deaths in USA (monthly); 3. International airline passengers; 4. Annual mink trappings (Canada); and four ACFs labelled A, B, C, D, to be matched to the series]

Outline
1 Time series graphics
2 Seasonal or cyclic?
3 Autocorrelation
4 Forecast residuals
5 White noise
6 Evaluating forecast accuracy

Forecasting residuals
Residuals in forecasting: the difference between an observed value and its forecast based on all previous observations: et = yt − ŷt|t−1.

Assumptions
1 {et} uncorrelated. If they aren't, then there is information left in the residuals that should be used in computing forecasts.
2 {et} have mean zero. If they don't, then the forecasts are biased.

Useful properties (for prediction intervals)
3 {et} have constant variance.
4 {et} are normally distributed.
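The residual definition et = yt − ŷt|t−1 and the mean-zero assumption can be illustrated with the one-step naïve forecast ŷt|t−1 = yt−1. A minimal Python sketch with made-up numbers (the slides use R; all names here are hypothetical):

```python
# One-step naive forecast: yhat_{t|t-1} = y_{t-1}, so e_t = y_t - y_{t-1}.
# Assumption 2 above says these residuals should average to (near) zero.

def naive_residuals(y):
    """Residuals e_t = y_t - y_{t-1} for t = 2, ..., T."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

y = [3600, 3650, 3640, 3700, 3690, 3720]   # made-up index levels
res = naive_residuals(y)
print(res)                  # [50, -10, 60, -10, 30]
print(sum(res) / len(res))  # mean residual 24.0: this toy series drifts upward,
                            # so the naive forecasts are biased low
```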
Forecasting Dow-Jones index
[Figure: Dow-Jones index over 300 trading days]

Forecasting Dow-Jones index
Naïve forecast:
ŷt|t−1 = yt−1
et = yt − yt−1
Note: et are one-step-forecast residuals.

[Figure: Dow-Jones index over 300 days]
[Figure: change in Dow-Jones index (residuals et) over 300 days]

Forecasting Dow-Jones index
[Figure: histogram of the residuals; roughly normal?]
[Figure: ACF of the Dow-Jones residuals, lags 1-22]

Forecasting Dow-Jones index
fc <- rwf(dj)
res <- residuals(fc)
plot(res)
hist(res,breaks="FD")
Acf(res,main="")

Outline
1 Time series graphics
2 Seasonal or cyclic?
3 Autocorrelation
4 Forecast residuals
5 White noise
6 Evaluating forecast accuracy

Example: White noise
[Figure: a white noise series, 50 observations]
White noise data is uncorrelated across time with zero mean and constant variance. (Technically, we require independence as well.)
Think of white noise as completely uninteresting with no predictable patterns.

Example: White noise
Sample autocorrelations for the white noise series:

r1 = 0.013, r2 = −0.163, r3 = 0.163, r4 = −0.259, r5 = −0.198, r6 = 0.064, r7 = −0.139, r8 = −0.032, r9 = 0.199, r10 = −0.240

[Figure: ACF of the white noise series, lags 1-15]
For uncorrelated data, we would expect each autocorrelation to be close to zero.

Sampling distribution of autocorrelations
Sampling distribution of rk for white noise data is asymptotically N(0, 1/T).
95% of all rk for white noise must lie within ±1.96/√T.
If this is not the case, the series is probably not WN.
Common to plot lines at ±1.96/√T when plotting the ACF. These are the critical values.

Autocorrelation
Example: T = 50 and so critical values at ±1.96/√50 = ±0.28.
All autocorrelation coefficients lie within these limits, confirming that the data are white noise. (More precisely, the data cannot be distinguished from white noise.)
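The ±1.96/√T check can be sketched numerically in Python (an illustration, not the slides' R workflow); the r1-r10 values below are the ones quoted for the white noise example, with T = 50.

```python
import math

# Flag sample autocorrelations lying outside +/-1.96/sqrt(T), the approximate
# 95% critical bounds for white noise. Returns the offending lags (1-based).

def outside_bounds(r_values, T):
    bound = 1.96 / math.sqrt(T)
    return [k + 1 for k, r in enumerate(r_values) if abs(r) > bound]

r = [0.013, -0.163, 0.163, -0.259, -0.198, 0.064, -0.139, -0.032, 0.199, -0.240]
print(round(1.96 / math.sqrt(50), 3))  # 0.277, i.e. the +/-0.28 on the slide
print(outside_bounds(r, 50))           # [] -- consistent with white noise
```

All ten coefficients lie inside ±0.28, so the check agrees with the slide: the series cannot be distinguished from white noise.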
[Figure: ACF of the white noise series with ±0.28 critical bounds, lags 1-15]

Example: Pigs slaughtered
[Figure: Number of pigs slaughtered in Victoria, thousands, 1990-1995]
[Figure: ACF of monthly pig slaughterings, lags 0-40]

Example: Pigs slaughtered
Monthly total number of pigs slaughtered in the state of Victoria, Australia, from January 1990 through August 1995.
(Source: Australian Bureau of Statistics.)
Difficult to detect a pattern in the time plot.
The ACF shows some significant autocorrelation at lags 1, 2, and 3.
r12 is relatively large although not significant. This may indicate some slight seasonality.
These show the series is not a white noise series.

ACF of residuals
We assume that the residuals are white noise (uncorrelated, mean zero, constant variance). If they aren't, then there is information left in the residuals that should be used in computing forecasts.
So a standard residual diagnostic is to check the ACF of the residuals of a forecasting method. We expect these to look like white noise.
Dow-Jones naive forecasts revisited

ŷ_{t|t−1} = y_{t−1}
e_t = y_t − y_{t−1}

Forecasting: Principles and Practice White noise 53

Forecasting Dow-Jones index
[Time plot: change in Dow-Jones index, days 0 to 300]

Forecasting: Principles and Practice White noise 54

Forecasting Dow-Jones index
[ACF of the changes in the Dow-Jones index, lags 1 to 23]

Forecasting: Principles and Practice White noise 55

Example: Dow-Jones residuals
[ACF of the naive-forecast residuals, lags 1 to 23]

These look like white noise. But the ACF is a multiple testing problem.

Forecasting: Principles and Practice White noise 56

Portmanteau tests

Consider a whole set of r_k values, and develop a test to see whether the set is significantly different from a zero set.

Box-Pierce test

Q = T ∑_{k=1}^{h} r_k²

where h is the maximum lag being considered and T is the number of observations. My preferences: h = 10 for non-seasonal data, h = 2m for seasonal data.

If each r_k is close to zero, Q will be small.
If some r_k values are large (positive or negative), Q will be large.

Forecasting: Principles and Practice White noise 57

Ljung-Box test

Q* = T(T + 2) ∑_{k=1}^{h} (T − k)⁻¹ r_k²

where h is the maximum lag being considered and T is the number of observations. My preferences: h = 10 for non-seasonal data, h = 2m for seasonal data. Better performance, especially in small samples.

Portmanteau tests

If the data are WN, Q* has a χ² distribution with (h − K) degrees of freedom, where K = number of parameters in the model. When applied to raw data, set K = 0.
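Both statistics follow directly from their definitions. A minimal Python sketch of the calculation (illustrative only; in R this is what Box.test computes):

```python
def box_pierce_q(r, T):
    """Box-Pierce Q = T * sum_{k=1}^{h} r_k^2 for autocorrelations r = [r_1, ..., r_h]."""
    return T * sum(rk ** 2 for rk in r)

def ljung_box_q(r, T):
    """Ljung-Box Q* = T(T+2) * sum_{k=1}^{h} (T-k)^{-1} r_k^2."""
    return T * (T + 2) * sum(rk ** 2 / (T - k) for k, rk in enumerate(r, start=1))
```

The statistic is then compared with the upper tail of a χ² distribution with h − K degrees of freedom. For the Dow-Jones residuals below, Q = 14.0451 with h = 10 and K = 0 falls short of the 5% critical value 18.31, so the white-noise hypothesis is not rejected. Note each weight (T + 2)/(T − k) exceeds 1, so Q* is always slightly larger than Q.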
For the Dow-Jones example:

res <- residuals(naive(dj))
# lag = h and fitdf = K
> Box.test(res, lag=10, fitdf=0)
        Box-Pierce test
X-squared = 14.0451, df = 10, p-value = 0.1709
> Box.test(res, lag=10, fitdf=0, type="Lj")
        Box-Ljung test
X-squared = 14.4615, df = 10, p-value = 0.153

Forecasting: Principles and Practice White noise 58

Exercise

1 Calculate the residuals from a seasonal naive forecast applied to the quarterly Australian beer production data from 1992.
2 Test if the residuals are white noise.

Forecasting: Principles and Practice White noise 59

beer <- window(ausbeer, start=1992)
fc <- snaive(beer)
res <- residuals(fc)
Acf(res)
Box.test(res, lag=8, fitdf=0, type="Lj")

Forecasting: Principles and Practice White noise 60

Outline

1 Time series graphics
2 Seasonal or cyclic?
3 Autocorrelation
4 Forecast residuals
5 White noise
6 Evaluating forecast accuracy

Forecasting: Principles and Practice Evaluating forecast accuracy 61

Measures of forecast accuracy

Let y_t denote the t-th observation and ŷ_{t|t−1} denote its forecast based on all previous data, where t = 1, …, T. Then the following measures are useful.

MAE = T⁻¹ ∑_{t=1}^{T} |y_t − ŷ_{t|t−1}|
MSE = T⁻¹ ∑_{t=1}^{T} (y_t − ŷ_{t|t−1})²
RMSE = √( T⁻¹ ∑_{t=1}^{T} (y_t − ŷ_{t|t−1})² )
MAPE = 100 T⁻¹ ∑_{t=1}^{T} |y_t − ŷ_{t|t−1}| / |y_t|

MAE, MSE and RMSE are all scale dependent. MAPE is scale independent but is only sensible if y_t ≫ 0 for all t, and y has a natural zero.

Forecasting: Principles and Practice Evaluating forecast accuracy 62

Measures of forecast accuracy

Mean Absolute Scaled Error

MASE = T⁻¹ ∑_{t=1}^{T} |y_t − ŷ_{t|t−1}| / Q

where Q is a stable measure of the scale of the time series {y_t}.
Proposed by Hyndman and Koehler (IJF, 2006).

For non-seasonal time series,

Q = (T − 1)⁻¹ ∑_{t=2}^{T} |y_t − y_{t−1}|

works well. Then MASE is equivalent to MAE relative to a naive method.

For seasonal time series,

Q = (T − m)⁻¹ ∑_{t=m+1}^{T} |y_t − y_{t−m}|

works well. Then MASE is equivalent to MAE relative to a seasonal naive method.

Forecasting: Principles and Practice Evaluating forecast accuracy 63
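The measures above can be sketched directly from their definitions. An illustrative Python version (not the R accuracy() function; here the MASE scale Q is computed from the same series y, and m = 1 gives the non-seasonal case):

```python
import math

def accuracy_measures(y, f, m=1):
    """MAE, RMSE, MAPE and MASE for forecasts f of observations y.
    Q (the MASE scale) is the MAE of the period-m (seasonal) naive method on y."""
    T = len(y)
    e = [y[t] - f[t] for t in range(T)]
    mae = sum(abs(v) for v in e) / T
    rmse = math.sqrt(sum(v ** 2 for v in e) / T)
    mape = 100 * sum(abs(e[t]) / abs(y[t]) for t in range(T)) / T
    q = sum(abs(y[t] - y[t - m]) for t in range(m, T)) / (T - m)
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "MASE": mae / q}
```

A MASE below 1 means the forecasts beat the corresponding naive method on average, as the seasonal naive method does in the beer example below.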
Forecasting: Principles and Practice Evaluating forecast accuracy 64

Measures of forecast accuracy
[Forecasts for quarterly beer production: mean, naive and seasonal naive methods, 1995 to 2008]

Forecasting: Principles and Practice Evaluating forecast accuracy 65

Measures of forecast accuracy

Mean method:           RMSE 38.0145   MAE 33.7776   MAPE  8.1700   MASE 2.2990
Naïve method:          RMSE 70.9065   MAE 63.9091   MAPE 15.8765   MASE 4.3498
Seasonal naïve method: RMSE 12.9685   MAE 11.2727   MAPE  2.7298   MASE 0.7673

Forecasting: Principles and Practice Evaluating forecast accuracy 66

Measures of forecast accuracy
[Dow Jones Index (daily, ending 15 Jul 94): forecasts from the mean method, naive method and drift model]

Forecasting: Principles and Practice Evaluating forecast accuracy 67

Measures of forecast accuracy

Mean method:  RMSE 148.2357   MAE 142.4185   MAPE 3.6630   MASE 8.6981
Naïve method: RMSE  62.0285   MAE  54.4405   MAPE 1.3979   MASE 3.3249
Drift model:  RMSE  53.6977   MAE  45.7274   MAPE 1.1758   MASE 2.7928

Forecasting: Principles and Practice Evaluating forecast accuracy 68

Training and test sets

Available data = training set (e.g., 80%) + test set (e.g., 20%).

The test set must not be used for any aspect of model development or calculation of forecasts. Forecast accuracy is based only on the test set.
Forecasting: Principles and Practice Evaluating forecast accuracy 69

Training and test sets

beer3 <- window(ausbeer, start=1992, end=2005.99)
beer4 <- window(ausbeer, start=2006)
fit1 <- meanf(beer3, h=20)
fit2 <- rwf(beer3, h=20)
accuracy(fit1, beer4)
accuracy(fit2, beer4)

In-sample accuracy (one-step forecasts):

accuracy(fit1)
accuracy(fit2)

Forecasting: Principles and Practice Evaluating forecast accuracy 70

Beware of over-fitting

A model which fits the data well does not necessarily forecast well. A perfect fit can always be obtained by using a model with enough parameters. (Compare R².) Over-fitting a model to data is as bad as failing to identify the systematic pattern in the data.
Problems can be overcome by measuring true out-of-sample forecast accuracy.
That is, total data divided into "training" set and "test" set. Training set used to estimate parameters. Forecasts are made for test set. Accuracy measures computed for errors in test set only.

Forecasting: Principles and Practice Evaluating forecast accuracy 71

Poll: true or false?

1 Good forecast methods should have normally distributed residuals.
2 A model with small residuals will give good forecasts.
3 The best measure of forecast accuracy is MAPE.
4 If your model doesn't forecast well, you should make it more complicated.
5 Always choose the model with the best forecast accuracy as measured on the test set.

Forecasting: Principles and Practice Evaluating forecast accuracy 72