ECON 251 Research Methods
11. Time Series Analysis and Forecasting

Introduction
Any variable that is measured over time in sequential order is called a time series. We analyze time series to detect patterns, and the patterns help in forecasting future values of the series.

Components of a Time Series
A time series can consist of four components:
• Long-term trend (T)
• Cyclical effect (C)
• Seasonal effect (S)
• Random variation (R)

A trend is a relatively smooth long-term pattern or direction that usually persists for more than one year.

A cycle is a wavelike pattern describing long-term behavior (longer than one year). Cycles are seldom regular, and often appear in combination with the other components.

The seasonal component of a time series exhibits short-term (less than one year), calendar-related repetitive behavior.

Random variation comprises the irregular, unpredictable changes in the time series. It tends to hide the other (more predictable) components, so we try to remove the random variation and thereby identify the other components.
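The interplay of the four components can be illustrated with a small simulated series. This is an illustrative sketch only: the data, coefficients, and the additive combination used here are invented for demonstration, not taken from the examples that follow.

```python
import math
import random

# Illustrative synthetic quarterly series built from three components
# (all numbers below are invented for demonstration).
random.seed(1)

n = 16  # four years of quarterly data
trend = [50 + 2.0 * t for t in range(1, n + 1)]                           # T: long-term trend
seasonal = [10 * math.sin(2 * math.pi * t / 4) for t in range(1, n + 1)]  # S: period-4 pattern
noise = [random.gauss(0, 3) for _ in range(n)]                            # R: random variation

# Combine the components additively: y_t = T_t + S_t + R_t
y = [T + S + R for T, S, R in zip(trend, seasonal, noise)]
for t, value in enumerate(y, start=1):
    print(f"t={t:2d}  y={value:6.1f}")
```

Plotting such a series shows how the random component obscures the trend and seasonal patterns, which is exactly why smoothing is needed.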
Time-Series Models
There are two commonly used time-series models:
• The additive model: yt = Tt + Ct + St + Rt
• The multiplicative model: yt = Tt × Ct × St × Rt

Smoothing Techniques
To produce a better forecast, we need to determine which components are present in a time series. To identify the components, we first remove the random variation. This can be done with smoothing techniques:
• moving averages
• exponential smoothing

Moving Averages
A k-period moving average for time period t is the arithmetic average of the time-series values centered on period t.
• A 3-period moving average for period t is (yt-1 + yt + yt+1)/3.
• A 5-period moving average for period t is (yt-2 + yt-1 + yt + yt+1 + yt+2)/5.

Example: Gasoline Sales
To forecast future gasoline sales, quarterly sales for the last four years were recorded. Calculate the three-quarter and five-quarter moving averages.

Solution (by hand):

Period  Gas sales  3-period MA  5-period MA
1       39         *            *
2       37         45.67        *
3       61         52.00        42.60
4       58         45.67        46.00
5       18         44.00        55.00
6       56         52.00        48.20
7       82         55.00        44.80
8       27         50.00        55.00
9       41         45.67        53.60
10      69         53.00        50.40
11      49         61.33        55.80
12      66         56.33        56.00
13      54         54.00        60.20
14      42         62.00        63.60
15      90         66.00        *
16      66         *            *

Notice how the averaging process removes some of the random variation; a trend component and some seasonality remain visible.

The 5-period moving average removes more variation than the 3-period moving average. Too much smoothing, however, may eliminate patterns of interest: here, the seasonal component is removed by the 5-period moving average.
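The moving averages above can be computed directly. A minimal sketch (the helper name `moving_average` is ours, not from the text):

```python
def moving_average(series, k):
    """k-period centered moving average (k odd); None where undefined at the ends."""
    half = k // 2
    out = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half: t + half + 1]
        out[t] = sum(window) / k
    return out

# Quarterly gasoline sales from the example.
gas = [39, 37, 61, 58, 18, 56, 82, 27, 41, 69, 49, 66, 54, 42, 90, 66]
ma3 = moving_average(gas, 3)
ma5 = moving_average(gas, 5)
print(round(ma3[1], 2))  # period 2: 45.67
print(round(ma5[2], 2))  # period 3: 42.6
```

The `None` entries at the ends reflect the fact that a centered moving average is undefined for the first and last half-window of periods.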
Too little smoothing, on the other hand, leaves much of the random variation, which disguises the real patterns.

Example – Armani's Pizza Sales
Below are the daily sales figures for Armani's pizza (Armani TS.xls). Compute a 3-day moving average.

Day        Week 1  Week 2
Monday     35      38
Tuesday    42      46
Wednesday  56      61
Thursday   46      52
Friday     67      73
Saturday   51      58
Sunday     39      42

What is the moving average associated with Monday of week 2?

Centered Moving Average
With an even number of observations in the moving average, the average falls between the two middle periods. To place the moving average in an actual time period, we need to center it: two consecutive moving averages are averaged, and the result is placed in the period between them.

Example
Calculate the 4-period moving average and center it for the data below:

Period  Time series  4-period MA   Centered MA
1       15
2       27           (2.5) 19.0
3       20           (3.5) 21.5    20.25
4       14           (4.5) 17.5    19.50
5       25
6       11

Example – Armani's Pizza Sales (continued)
Using the week 1 data above, compute a 2-day centered moving average. What is the centered moving average associated with Friday of week 1?

Trend Analysis
A trend is a relatively smooth long-term pattern or direction that usually persists for more than one year. The trend component of a time series can be linear or nonlinear, and it is easy to isolate using regression.
• For a linear trend, use the model y = b0 + b1t + e.
• For a nonlinear trend with one (major) change in slope, use the quadratic model y = b0 + b1t + b2t² + e.

Example – Pharmaceutical Sales (identifying trend)
Annual sales for a pharmaceutical company are believed to change linearly over time. Based on the last 10 years of sales records, measure the trend component. Start by renaming the years 1, 2, 3, etc.

Year   1990  1991  1992  1993  1994  1995  1996  1997  1998  1999
Sales  18.0  19.4  18.0  19.9  19.3  21.1  23.5  23.2  20.4  24.4

Solution (Excel regression output):
ŷ = 17.28 + 0.6254t
R² = 0.688, standard error = 1.352, n = 10; F = 17.66 (p = 0.0030).

Variable   Coefficient  Std. error  t Stat  P-value
Intercept  17.28        0.9236      18.71   6.9E-08
t          0.6254       0.1488      4.20    0.0030

The trend forecast for period 11 is ŷ(11) = 17.28 + 0.6254(11) = 24.16.

Example – Cigarette Consumption
Determine the long-term trend of the number of cigarettes smoked per capita by Americans 18 years and older (Cigarettes.xls; annual data beginning in 1955, 40 observations: 3675, 3718, 3762, 3892, 4066, 4197, 4284, …).

Solution
• Cigarette consumption increased between 1955 and 1963, but decreased thereafter.
• A quadratic model seems to fit this pattern.

Regression output:
ŷ = 3,730.689 + 71.380t − 2.558t²
R² = 0.9647, standard error = 97.99, n = 40; F = 505.12 (p = 1.4E-27). The quadratic model fits the data very well.

Variable   Coefficient  Std. error  t Stat   P-value
Intercept  3730.689     48.90       76.29    2.7E-42
t          71.380       5.501       12.98    2.4E-15
t²         −2.558       0.1301      −19.66   3.5E-21

Example – Forecasting the Trend
Forecast the trend components for the previous two examples for periods 12 and 41, respectively:
• ŷ = 17.28 + 0.6254t: find ŷ(t = 12).
• ŷ = 3,730.689 + 71.380t − 2.558t²: find ŷ(t = 41).

Measuring the Cyclical Effects
A cycle is a wavelike pattern describing long-term behavior (longer than one year). Although often unpredictable, cycles need to be isolated. To identify cyclical variation we use the percentage of trend:
• Determine the trend line (by regression).
• Compute the trend value ŷt for each period t.
• Calculate the percentage of trend as (yt / ŷt) × 100.

Example – Energy Demand
Does the demand for energy in the US exhibit cyclic behavior over time? Assume a linear trend and calculate the percentage of trend (Energy Demand.xls; annual energy consumption, 1970–1993). Find the trend values and percentage of trend for the years 1975 and 1976.

Consider the multiplicative model yt = Tt · Ct · Rt (assuming no seasonal effects). Then
yt / ŷt = (Tt · Ct · Rt) / Tt = Ct · Rt.
The regression line represents the trend: ŷ = 69.33 + 0.502t. After detrending, no trend is observed, but cyclicality and randomness still exist. For example, for the first period: (66.4 / 69.828) × 100 = 95.09.

Consumption  Predicted  % of trend
66.4         69.8283    95.09
69.7         70.3299    99.10
72.2         70.8314    101.93
74.3         71.3329    104.16
72.5         71.8344    100.93
…            …          …
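The percentage-of-trend step can be sketched in a few lines. The trend coefficients below (b0 ≈ 69.33, b1 ≈ 0.502) are approximate values read from the example's regression output, so the computed percentages match the table only to rounding.

```python
def percent_of_trend(y, b0, b1):
    """Percentage of trend: (y_t / yhat_t) * 100 under a linear trend yhat = b0 + b1*t."""
    return [y_t / (b0 + b1 * t) * 100 for t, y_t in enumerate(y, start=1)]

# First five energy-consumption values from the example; trend coefficients are
# approximations read from the fitted line, so expect small rounding differences.
energy = [66.4, 69.7, 72.2, 74.3, 72.5]
for t, pct in enumerate(percent_of_trend(energy, 69.33, 0.502), start=1):
    print(f"t={t}: {pct:.2f}% of trend")
```

Values above 100% indicate periods where demand ran above trend, and values below 100% indicate periods below trend.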
When groups of the percentage-of-trend values alternate around 100%, cyclical effects are present.

Measuring the Seasonal Effects
The seasonal component of a time series exhibits short-term (less than one year), calendar-related repetitive behavior. Seasonal variation may occur within a year or even within a shorter time interval. To measure the seasonal effects we can use two common methods:
1. constructing seasonal indexes
2. using indicator variables
We will do each in turn.

Example – Hotel Occupancy
Calculate the quarterly seasonal indexes for the hotel occupancy rate in order to measure seasonal variation (Hotel Occupancy.xls).

Year  Q1     Q2     Q3     Q4
1991  0.561  0.702  0.800  0.568
1992  0.575  0.738  0.868  0.605
1993  0.594  0.738  0.729  0.600
1994  0.622  0.708  0.806  0.632
1995  0.665  0.835  0.873  0.670

First, perform regression analysis for the model y = b0 + b1t + e, where t represents chronological time (t = 1, …, 20) and y represents the occupancy rate:
ŷ = 0.639368 + 0.005246t
The regression line represents the trend.

Now consider the multiplicative model and detrend the data:
yt = Tt · St · Rt (assuming no cyclical effects), so
yt / ŷt = (Tt · St · Rt) / Tt = St · Rt.
After detrending, no trend is observed, but seasonality and randomness still exist.
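The detrending step can be sketched as follows, using the hotel-occupancy data and the trend line fitted in the example (the first ratios reproduce the 0.870, 1.080, 1.221, … values discussed next):

```python
# Detrend the hotel-occupancy series: the ratios y_t / yhat_t remove the trend,
# leaving only the seasonal and random components (S_t * R_t).
rates = [0.561, 0.702, 0.800, 0.568, 0.575, 0.738, 0.868, 0.605,
         0.594, 0.738, 0.729, 0.600, 0.622, 0.708, 0.806, 0.632,
         0.665, 0.835, 0.873, 0.670]

# Trend line from the example's regression: yhat = 0.639368 + 0.005246 t
ratios = [y / (0.639368 + 0.005246 * t) for t, y in enumerate(rates, start=1)]
print([round(r, 3) for r in ratios[:4]])
```

The ratios cycle around 1 with a clear period-4 pattern, confirming that seasonality remains after the trend is removed.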
The detrended ratios yt / ŷt, arranged by quarter:

Year  Q1     Q2     Q3     Q4
1991  0.870  1.080  1.221  0.860
1992  0.864  1.100  1.284  0.888
1993  0.865  1.067  1.046  0.854
1994  0.879  0.993  1.122  0.874
1995  0.913  1.138  1.181  0.900

To remove most of the random variation but leave the seasonal effects, average the ratios St · Rt for each season:
• Quarter 1: (.870 + .864 + .865 + .879 + .913)/5 = .878
• Quarter 2: (1.080 + 1.100 + 1.067 + .993 + 1.138)/5 = 1.076
• Quarter 3: (1.221 + 1.284 + 1.046 + 1.122 + 1.181)/5 = 1.171
• Quarter 4: (.860 + .888 + .854 + .874 + .900)/5 = .875

Interpreting the results
• A seasonal index is the ratio between the time-series value in a given season and the overall seasonal average.
• In our problem: quarter 1 is 12.2% below the annual average (index .878), quarter 2 is 7.6% above (1.076), quarter 3 is 17.1% above (1.171), and quarter 4 is 12.5% below (.875).

Creating Seasonal Indexes
Normalizing the ratios:
• The sum of all the ratios must be 4, so that the average ratio per season equals 1.
• If the sum of the ratios is not 4, we normalize (adjust) them proportionately:
Seasonal index = (seasonal averaged ratio) × (number of seasons) / (sum of averaged ratios)
For example, if the sum of the ratios were 4.1, each ratio would be multiplied by 4/4.1.
• In our problem the sum of the averaged ratios is already 4 (.878 + 1.076 + 1.171 + .875 = 4.0), so no normalization is needed; these ratios become the seasonal indexes.

Example
Using a different example on quarterly GDP, assume that the averaged ratios were:
Quarter 1: .97  Quarter 2: 1.03  Quarter 3: .87  Quarter 4: 1.00
Determine the seasonal indexes.
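The normalization step can be sketched directly; the ratios below are the hypothetical GDP figures from the exercise (their sum is 3.87, so normalization is needed):

```python
def seasonal_indexes(avg_ratios):
    """Normalize averaged seasonal ratios so they sum to the number of seasons."""
    factor = len(avg_ratios) / sum(avg_ratios)
    return [r * factor for r in avg_ratios]

# Hypothetical quarterly GDP ratios from the exercise; sum = 3.87 != 4.
gdp = seasonal_indexes([0.97, 1.03, 0.87, 1.00])
print([round(i, 3) for i in gdp])
print(round(sum(gdp), 6))  # the indexes now sum to 4
```

Each ratio is scaled by 4/3.87 ≈ 1.0336, so the indexes keep their relative sizes while averaging exactly 1 per season.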
Deseasonalizing the Time Series
Seasonally adjusted time series = actual time series / seasonal index.
By removing the seasonality, we can:
• compare data across different seasons
• identify changes in the other components of the time series.

Example – Hotel Occupancy
A new manager was hired in Q3 1994, when the hotel occupancy rate was 80.6%. In the next quarter, the occupancy rate was 63.2%. Should the manager be fired for poor performance?
Method 1: compare Q4 1994 (.632) with Q3 1994 (.806). The manager has underperformed; he should be fired.
Method 2: compare Q4 1994 (.632) with Q4 1993 (.600). The manager has performed very well; he should get a raise.
Method 3: compare the deseasonalized occupancy rates in Q3 1994 and Q4 1994:
Seasonally adjusted rate in Q4 1994 = .632 / .875 = .722
Seasonally adjusted rate in Q3 1994 = .806 / 1.171 = .688
The manager has indeed performed well; maybe he should get a raise.

Recomposing the Time Series
Recomposition recreates the time series from the trend and seasonal components only; comparing it with the actual series reveals the random component.
ŷt = T̂t · Ŝt = (.639 + .0052t) · Ŝt
In period 1 (quarter 1): ŷ1 = (.639 + .0052(1))(.878) = .566
In period 2 (quarter 2): ŷ2 = (.639 + .0052(2))(1.076) = .699

Example – Recomposing the Time Series
Recompose the smoothed series for periods 3 and 4, using the seasonal indexes
Quarter 1: .878  Quarter 2: 1.076  Quarter 3: 1.171  Quarter 4: .875
and the equation ŷt = (.639 + .0052t) · Ŝt:
ŷ3 = (.639 + .0052(3))(1.171) = .767
ŷ4 = (.639 + .0052(4))(.875) = .577

Seasonal Time Series with Indicator Variables
We can instead build a seasonal time-series model with indicator variables, to determine whether there is seasonal variation in the data; the resulting regression can then forecast the seasonal effects. If there are n seasons, we use n − 1 indicator variables to define the season of period t:
• [Indicator variable i] = 1 if period t falls in season i
• [Indicator variable i] = 0 otherwise

Example – Hotel Occupancy
Let's again build a model to forecast the quarterly occupancy rates at the hotel, this time with indicator variables (Hotel Occupancy.xls; same data as before). Use regression analysis with indicator variables to forecast the hotel occupancy rate in 1996.

Define Qi = 1 if period t falls in quarter i, and 0 otherwise (i = 1, 2, 3; quarter 4 is the base season):

y      t  Q1  Q2  Q3
0.561  1  1   0   0
0.702  2  0   1   0
0.800  3  0   0   1
0.568  4  0   0   0
0.575  5  1   0   0
0.738  6  0   1   0
0.868  7  0   0   1
0.605  8  0   0   0
…      …  …   …   …

The regression model: y = b0 + b1t + b2Q1 + b3Q2 + b4Q3 + e

Regression output:
R² = 0.889, adjusted R² = 0.860, standard error = 0.0381, n = 20; F = 30.12 (p = 5.2E-07) — a good fit.

Variable   Coefficient  Std. error  t Stat     P-value
Intercept  0.55455      0.024812    22.35      6.27E-13
t          0.005037     0.001504    3.35       0.004399
Q1         0.003512     0.02449     0.14       0.887865
Q2         0.139275     0.024258    5.74       3.9E-05
Q3         0.205237     0.024118    8.51       3.99E-07

• There is sufficient evidence to conclude that a trend is present.
• The seasonal effects on occupancy in quarters 2 and 3 differ from that of quarter 4.
• There is insufficient evidence to conclude that occupancy in quarter 1 differs from that of quarter 4.

The estimated regression model is
ŷt = .555 + .00504t + .0035Q1 + .139Q2 + .205Q3
and the 1996 forecasts (periods 21–24) are:
ŷ21 = .555 + .00504(21) + .0035(1) = .664
ŷ22 = .555 + .00504(22) + .139(1) = .805
ŷ23 = .555 + .00504(23) + .205(1) = .876
ŷ24 = .555 + .00504(24) = .675

Time-Series Forecasting with Autoregressive (AR) Models
Autocorrelation among the errors of the regression model provides an opportunity to produce accurate forecasts. Correlation between consecutive residuals leads to the autoregressive model
yt = b0 + b1yt-1 + et
This type of model is termed a first-order autoregressive model, denoted AR(1). A model that also includes yt-2 is a second-order model, AR(2), and so on. We select among them based on the significance of the highest-order term: if the highest-order term is not significant, rerun the model without it, and apply the same criterion to the new highest-order term.

Example – Consumer Price Index (CPI)
Forecast the increase in the Consumer Price Index (CPI) for the year 1994, based on data collected for the years 1951–1993 (CPI.xls):

t     1    2    3    4    …
y(t)  7.9  2.2  0.8  0.5  …
The final observation is y43 = 3 (the increase for 1993).

Example – Consumer Price Index (CPI)
Regressing the percent increase yt on the time variable t does not lead to good results:
ŷt = 2.14 + .098t, r² = 15.0%.
The residuals form a pattern over time (runs of positive and negative residuals), which indicates the potential presence of first-order autocorrelation between consecutive residuals.

An autoregressive model therefore appears to be a desirable technique: the increases in the CPI for periods 1, 2, 3, … serve as predictors of the increases for periods 2, 3, 4, …, respectively.

Regression output for ŷt = .8066 + .78705yt-1:
R² = 0.637, standard error = 1.943, n = 42; F = 70.06 (p = 2.51E-10).

Variable   Coefficient  Std. error  t Stat  P-value
Intercept  0.8066       0.5061      1.59    0.1188
y(t-1)     0.78705      0.0940      8.37    2.51E-10

Forecast for 1994 (t = 44):
ŷ44 = .8066 + .78705 y43 = .8066 + .78705(3) = 3.168

Plotting the fitted values shows that the autoregressive model tracks the series far better than the linear trend model.

Selecting Among Alternative Forecasting Models
There are many forecasting models available. Which model performs better?

Model Selection
To choose a forecasting method, we evaluate forecast accuracy using the actual time series. The two most commonly used measures of forecast accuracy are:
• Mean Absolute Deviation: MAD = Σt |yt − Ft| / n
• Sum of Squares for Forecast Error: SSE = SSFE = Σt (yt − Ft)²
In practice, SSE and SSFE are used interchangeably.

Procedure for model selection:
1. Use some of the observations to develop several competing forecasting models.
2. Run the models on the rest of the observations.
3.
Calculate the accuracy of each model using both the MAD and SSE criteria.
4. Use the model that generates the lowest MAD value, unless it is important to avoid even a few large errors, in which case use the model with the lowest SSE.

Example – Model Selection I
Assume we have annual data from 1950 to 2001. We used the data from 1950 to 1997 to develop three competing forecasting models (two different regression models and an autoregressive model). Use MAD and SSE to determine which model performed best, comparing the model forecasts with the actual data for 1998–2001.

Year  Actual y  Reg 1  Reg 2  AR
1998  129       136    118    130
1999  142       148    141    146
2000  156       150    158    170
2001  183       175    163    180

Solution
For model Reg 1:
MAD = (|129 − 136| + |142 − 148| + |156 − 150| + |183 − 175|)/4 = (7 + 6 + 6 + 8)/4 = 6.75
SSE = (129 − 136)² + (142 − 148)² + (156 − 150)² + (183 − 175)² = 49 + 36 + 36 + 64 = 185

Summary of results:

Model  MAD   SSE
Reg 1  6.75  185
Reg 2  8.5   526
AR     5.5   222

The AR model has the lowest MAD, while Reg 1 has the lowest SSE.

Example – Model Selection II
For the actual values and the forecast values of a time series shown in the following table, calculate MAD and SSE.

Actual value yt:    166  179  195  214  220
Forecast value Ft:  173  186  192  211  223
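The two accuracy measures are easy to compute. A sketch that reproduces the Model Selection I numbers (the helper names `mad` and `sse` are ours):

```python
def mad(actual, forecast):
    """Mean Absolute Deviation of the forecast errors."""
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)

def sse(actual, forecast):
    """Sum of Squares for Forecast Error (SSFE)."""
    return sum((y - f) ** 2 for y, f in zip(actual, forecast))

# Model Selection I: actual values for 1998-2001 and the three model forecasts.
actual = [129, 142, 156, 183]
models = {"Reg 1": [136, 148, 150, 175],
          "Reg 2": [118, 141, 158, 163],
          "AR":    [130, 146, 170, 180]}

for name, f in models.items():
    print(f"{name}: MAD = {mad(actual, f):.2f}, SSE = {sse(actual, f)}")
# Reg 1: MAD = 6.75, SSE = 185
# Reg 2: MAD = 8.50, SSE = 526
# AR:    MAD = 5.50, SSE = 222
```

Note how the two criteria disagree here: AR minimizes MAD, but Reg 1 minimizes SSE because squaring penalizes AR's one large error (14 in 2000) more heavily.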