Chapter 11: Time Series
1. a. 23.96 b. 24.064
2. a. S5 = 24.95; T5 = 1.19 b. S6 = 26.026; T6 = 1.1672
3. 5,660
4. Seasonality is apparent in a listing or plot if the same pattern appears at regular intervals. This typically occurs in monthly or quarterly data. You might notice, for example, that every year there is a high in summer and a low in winter, as you might expect for sales of a product like ice cream. The ACF is helpful in detecting seasonality because it shows a peak at the seasonal lag (and often at its multiples). For example, if monthly data are seasonal, you should expect a peak at lag 12 in the ACF. Similarly, if quarterly data are seasonal, you should expect a peak at lag 4.
If there is additive seasonality in monthly data, then December should be high (or low) each year by the same amount relative to the monthly average. For example, December sales could be about $80,000 higher than the monthly average for the year. Multiplicative seasonality, on the other hand, involves a percentage increment. In this case, December sales might be 10% higher than the monthly average.
5. The politician's claim that unemployment fell was based on raw data: 88,000 more people were employed in that month than in the previous month. The Bureau's claim that unemployment rose by 98,000 was based on the expectation that employment would rise by 186,000 (perhaps because of the seasonal business cycle) when in fact it rose by only 88,000 (186,000 - 88,000 = 98,000). The Bureau's number gives the better picture of the state of the economy: when the economy is healthy, about 186,000 jobs are added in that period, so the fact that this number was not reached indicates a faltering economy.
6. a. The line chart of the leading batting averages for the years 1901 to 2001 appears as follows:
[Figure: Leading Batting Average by year, 1901 to 2001 (y-axis: Batting Average, 0.300 to 0.440; x-axis: Year)]
From the plot, or from the data in the Edit worksheet, a downward trend is apparent, although the average was low relative to the trend in the early years and high relative to the trend in recent years. George Brett's average in 1980 (0.390) was high relative to neighboring observations, but other values stand out just as much. For example, Rod Carew hit 0.388 in 1977. There is also a peak in 1994.
b. The trendline is:
[Figure: Leading Batting Average by year with linear trendline]
c. The ACF is:
The ACF plot follows the pattern associated with a trend: large positive autocorrelations that die out slowly as the lag increases. This agrees with the trend that appears in the plot of the raw data.
d. The ACF of the difference in batting average is:
The line plot appears as:
[Figure: Difference in Batting Avg by year, 1901 to 2001 (y-axis: Difference, -0.06 to 0.06; x-axis: Year)]
The differenced series has no evident trend, although there seems to be less variability near the beginning of the series.
e. The only significant correlation is at the first lag, and it is negative. The correlation drops off after the first lag, and no other lags are statistically significant. The negative lag-1 value indicates that the change in the batting average is negatively correlated with the change in the previous year. In other words, if the batting average decreased one year, it is more likely to increase the next.
f.
Here are the predictions and standard errors of simple exponential smoothing models. The minimum standard error occurs with a smoothing parameter value of 0.4.

Smoothing Parameter  Forecasted Value  Standard Error
0.2                  0.3653            0.01928
0.3                  0.3659            0.01883
0.4                  0.3657            0.01875
0.5                  0.3655            0.01887

7. a. The line chart appears as follows:
[Figure: Power vs. Date, January 1978 to 1990 (y-axis: Power, 150 to 290; x-axis: Date)]
There are two peaks each year. There is a summer peak, probably caused by high demand for air conditioning, and a winter peak, probably resulting from heating demand.
b. The forecasted values for the next 12 months are:

Obs.  Forecast  Lower   Upper
157   257.42    245.41  269.43
158   229.45    217.30  241.60
159   234.65    222.35  246.95
160   217.36    204.91  229.82
161   229.44    216.82  242.06
162   249.15    236.36  261.94
163   274.52    261.55  287.49
164   274.80    261.64  287.96
165   240.71    227.35  254.06
166   230.54    216.98  244.09
167   228.18    214.42  241.94
168   252.54    238.56  266.51

c. The multiplicative seasonal indices are:
July, August, and January are the three months of highest power production. August production is 25% higher than April production.
d.
e. The standard errors for the six exponential smoothing models are as follows:

Location  Linear  Seasonal  Standard Error
0.05      0.15    0.05      6.743145
0.05      0.30    0.05      7.463687
0.15      0.15    0.05      6.039115
0.15      0.30    0.05      6.165231
0.30      0.15    0.05      5.779812
0.30      0.30    0.05      5.972524

The smallest standard error comes from a location parameter of 0.30, a linear parameter of 0.15, and a seasonal parameter of 0.05.
8. a.
The line plot of the visitation for Exit Glacier appears as follows:
There are two patterns present. There is a seasonal pattern, with visitation at Exit Glacier high in the summer and very low in the winter. There is also an upward trend, with overall visitation growing every year.
b. The two line plots for visitation appear as follows:
June 1994 is much higher than any previous month and much higher than previous June values. It will cause forecasts for future months to be higher.
c. The plot of the seasonally adjusted values appears as follows:
The 36th observation (December 1992) shows an increase in the seasonally adjusted values. After this date, the adjusted values appear to vary around a higher average value.
d. Using exponential smoothing with location parameter = 0.15, linear parameter = 0.15, and seasonal parameter = 0.05, here are the forecasted values for the next 12 months:
e. Here are the forecasted visitation data for the next 12 months with parameters of 0.05, 0.15, and 0.05:
Here are the data with parameters of 0.01, 0.15, and 0.05:

Location  Linear  Seasonal  Standard Error
0.15      0.15    0.05      8419.575
0.05      0.15    0.05      7644.315
0.01      0.15    0.05      8402.932

The predictions for the next 12 months are much lower with the smaller location parameters, and the standard error is smaller as well. The second model (0.05, 0.15, 0.05) might be the best, since it has the smallest standard error.
More information that would help in deciding between the projections is some explanation for the jump in June 1994. Was there an event at the glacier that drew more visitors? Was a new road recently put in, or a new shuttle system, or something else that would make access to the glacier much easier? Or has the Park experienced a dramatic but temporary increase in visitation due to publicity from some other event?
f. The lower ends of many confidence intervals are negative, particularly in the winter. Since the numbers represent visitors, they cannot be negative!
Clearly these confidence intervals have to be viewed with caution.
9. a. Use the LOG10(D2) function to transform the count of visitors to Exit Glacier, and then use the Fill Down command to fill in the rest of the values in the column.
b. The line plot of log10(visits) appears as follows:
By plotting the logs, we can see smaller variations in the number of visits in the winter months. There may be winter peaks that we could not see in the plots of the raw counts, because the y-axis scale of those plots covers such a wide range.
c. The predicted and exponentiated log(visits) values for an exponential smoothing model with location parameter = 0.15, linear parameter = 0.15, and seasonal parameter = 0.05 are:
The confidence intervals are unreasonably wide. The upper 95% confidence limit suggests that more than 3 million people could visit the Exit Glacier site in June 1995!
d. Here are the forecasted values with parameters 0.01, 0.15, and 0.05:
and with parameters 0.05, 0.15, and 0.05:

Location  Linear  Seasonal  Standard Error
0.15      0.15    0.05      0.363263
0.05      0.15    0.05      0.326634
0.01      0.15    0.05      0.317860

The parameter set 0.01, 0.15, and 0.05 gives the smallest standard error, as well as the smallest forecasted visitation, for the log(visitation) data.
e. Here are the projected winter visitation numbers using the raw and log counts:

Month          Raw Count Projection  Log Count Projection
December 1995  442                   248
January 1995   275                   217
February 1995  201                   143
March 1995     551                   471

The transformed values predict fewer visits for the winter months. When determining the number of personnel needed during the winter months, a difference of 200 visits might mean hiring (or not hiring) an additional employee. The results from previous years indicate that it may be more appropriate to forecast the lower number of visits rather than the higher number.
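In parts (c) to (e), the forecasts are made on the log10 scale and then exponentiated. The following is a minimal sketch of that back-transformation. It assumes the log-scale 95% interval is forecast ± 1.96 × standard error, which is a normal approximation; the textbook's software may compute its limits differently. The helper name `back_transform` and the example forecast value of 2.3 are hypothetical. The sketch shows two things. First, the back-transformed lower limit is always positive, unlike the raw-count intervals in Exercise 8f. Second, the interval is lopsided on the count scale, with a long upper tail.

```python
"""Back-transform a forecast made on the log10 scale to visitor counts."""


def back_transform(log10_forecast, std_error, z=1.96):
    """Return (lower, point, upper) on the original count scale.

    Hypothetical helper: assumes the log10-scale interval is
    log10_forecast +/- z * std_error (a normal approximation).
    """
    half_width = z * std_error
    lower = 10 ** (log10_forecast - half_width)
    point = 10 ** log10_forecast
    upper = 10 ** (log10_forecast + half_width)
    return lower, point, upper


if __name__ == "__main__":
    # Hypothetical winter-month forecast: log10(visits) = 2.3 (about 200 visits),
    # paired with a standard error of the size reported in part (d).
    lower, point, upper = back_transform(2.3, 0.317860)
    print(f"point {point:.0f}, 95% interval ({lower:.0f}, {upper:.0f})")
    # Symmetric on the log scale means symmetric in ratio on the count scale.
    print(f"upper/point = {upper / point:.3f}, point/lower = {point / lower:.3f}")
```

Because the interval is symmetric on the log scale, the upper limit is the forecast multiplied by the same factor that divides it to give the lower limit. A large standard error therefore stretches the upper limit enormously, which is consistent with the very wide upper limits reported in part (c).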
Other information besides the forecasts from the models should be taken into account, however.
10. a. The line plot appears as follows:
There is some evidence of seasonality, albeit with a great deal of variability. There appear to be about 7 separate peaks.
b. The boxplot appears as:
A period of high body temperatures precedes menstruation. The onset of menstruation appears to be associated with a decrease in temperature of about 0.5 degrees.
c. The ACF plot is:
Based on the shape of the plot, the distance between peaks is roughly 35 lags, so the length of the period is about 35 days.
d. The exponential smoothing model using location parameter = 0.15, linear parameter = 0.01, and seasonal parameter = 0.05 appears as follows:
e. Here are the predicted temperatures for the next cycle, using parameters of 0.15, 0.01, and 0.15:
Here are the predicted temperatures for the next cycle, using parameters of 0.15, 0.01, and 0.25:

Location  Linear  Seasonal  Standard Error
0.15      0.01    0.05      0.221952
0.15      0.01    0.15      0.229522
0.15      0.01    0.25      0.236464

11. a. The smoothed plot with a smoothing parameter of w = 0.15 appears as follows (after reformatting):
[Figure: smoothed draft numbers with w = 0.15 (y-axis: draft number, 1 to 401; x-axis: 0 to 400)]
MSE = 11,595.06
For w = 0.085, MSE = 11,263.54:
[Figure: smoothed draft numbers with w = 0.085]
For w = 0.05, MSE = 11,188.20:
[Figure: smoothed draft numbers with w = 0.05]
The lowest MSE occurs with the smoothing parameter w = 0.05.
b. The plots show a downward trend in draft numbers as the year progresses.
c. The ACF plot (shown following) does not indicate the presence of any significant autocorrelation in the draft numbers.
d. The draft numbers show a downward trend as the year progresses, so people born later in the year were more likely to receive lower draft numbers (and thus more likely to be drafted, since low numbers were called first).
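Part (c) looks for autocorrelations in the ACF plot that fall outside the significance limits. The following is a minimal pure-Python sketch of the sample ACF. It assumes the common rough bounds of ±2/√n for a series with no autocorrelation; the chapter's software may draw slightly different limits. The function names and the short series are hypothetical and are not the draft data.

```python
"""Sample autocorrelation function with approximate significance bounds."""
import math


def sample_acf(x, max_lag):
    """Return [r_1, ..., r_max_lag], the sample autocorrelations of x."""
    n = len(x)
    mean = sum(x) / n
    # Denominator: total sum of squared deviations from the mean.
    denom = sum((v - mean) ** 2 for v in x)
    acf = []
    for k in range(1, max_lag + 1):
        # Numerator: cross-products of deviations that are k steps apart.
        num = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
        acf.append(num / denom)
    return acf


def significant_lags(x, max_lag):
    """Return the lags k whose |r_k| exceeds the rough bound 2 / sqrt(n)."""
    bound = 2 / math.sqrt(len(x))
    return [k for k, r in enumerate(sample_acf(x, max_lag), start=1)
            if abs(r) > bound]


if __name__ == "__main__":
    # Hypothetical series, not the draft data.
    trending = list(range(10))  # steady upward trend
    zigzag = [1, -1] * 10       # alternates up and down
    print("trend ACF: ", [round(r, 2) for r in sample_acf(trending, 3)])
    print("zigzag ACF:", [round(r, 2) for r in sample_acf(zigzag, 3)])
    print("significant zigzag lags:", significant_lags(zigzag, 3))
```

A trending series gives large positive autocorrelations that die out slowly, which is the pattern described in Exercise 6c. For a series with no autocorrelation, nearly all r_k should fall inside the bounds, since a few may cross them by chance.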
There is no indication that draft numbers are autocorrelated.
12. a. The line chart appears as follows:
[Figure: Oil Production by month (Jan to Dec), one line per year for 1992 to 1995 (y-axis: Production, 0 to 160)]
Production is seasonal. Cottonseed oil production peaks in the months from November to January and is at its lowest from July to September.
b. The smoothed plot and 12 forecasted values appear as follows:
[Figure: smoothed production with 12 forecasted values (x-axis: observation, 0 to 70)]
c. Adjusting for the seasonal effects yields the following plot:
[Figure: Production and seasonally adjusted production by observation]
It is difficult to determine whether the mean production level (after adjusting for seasonal variation) has changed. One peak in the last year suggests that it might have.
The ANOVA table for the regression is:

Regression Statistics
Multiple R          0.266
R Square            0.071
Adjusted R Square   0.051
Standard Error      11.321
Observations        48

ANOVA
            df  SS        MS       F      Significance F
Regression   1  449.403   449.403  3.507  0.067
Residual    46  5895.438  128.162
Total       47  6344.841

            Coefficients  Standard Error  t Stat  P-value  Lower 95%  Upper 95%
Intercept   94.896        3.320           28.585  0.000    88.214     101.578
Obs.        0.221         0.118           1.873   0.067    -0.017     0.458

The p-value for the slope of the regression is 0.067, which is not significant at the 5% level. We therefore cannot reject the null hypothesis that the slope is zero. There is not enough evidence of a significant increase in cottonseed oil production over the past four years.
13. a.
The two-way table appears as:

Stoppage
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1981   6   7  16  17  18  30  23   9   5   7   5   2
1982   2   3   4  14  15  18  13   9  14   3   1   0
1983   1   5   5   2  12  16  10   7   7  12   4   0
1984   6   3   2   7   5   5   8   5  10   4   4   3
1985   2   4   4   3   2   2   9   6  11   6   3   2
1986   4   3   2   4   6  11  13  10   8   5   2   1
1987   2   5   3   2   3   8   6   3   7   1   6   0
1988   3   5   3   0   5   7   4   7   2   3   1   0
1989   3   0   3   6   8   2   6   6   6   5   5   1
1990   2   3   7   5   5   6   1   5   3   2   3   2
1991   0   2   1   7   7   5   0   4   3   6   3   1
1992   0   1   1   4   6   6   1   3   8   5   0   0
1993   2   1   4   2   5   2   3   5   4   4   3   0
1994   1   2   3   5   4   9   4   5   7   4   1   0
1995   1   1   4   3   1   2   3   5   4   5   2   0

b. The boxplot and line plots appear as:
[Figure: boxplot of work stoppages by month]
[Figure: Work Stoppage by month, one line per year for 1981 to 1995 (y-axis: Stoppage, 0 to 35)]
The most work stoppages occur in the summer months from May to September. The fewest occur from December to February.
c. The adjusted values appear as follows (after reformatting):
[Figure: Stoppage and seasonally adjusted stoppage by observation]
After adjusting for seasonal effects, there does appear to be a decrease in work stoppages over the period covered by the data.
d. The plot of smoothed values appears as follows:
[Figure: smoothed work stoppages by observation]
e. Based on these findings, work stoppages appear to be seasonal, and they have declined over the period 1981 to 1995.
14. a. The two-way table appears as:
Unemployment
Year   Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
1981  19.1  19.3  19.2  18.8  19.1  19.8  18.6  18.8  19.7  20.3  21.3  21.1
1982    22  22.6  21.8  22.8  22.8  22.9    24  23.7  23.6  23.7  24.1  24.1
1983  23.1  22.8  23.5  23.4  22.8    24  22.8  22.9  21.7  21.4  20.2  19.9
1984  19.5  19.4  19.8  19.2  18.7  18.2  18.8  18.7  19.2  18.6  17.7  18.8
1985  18.8  18.3  18.2  17.5  18.5  18.5  20.2  17.9  17.9    20  18.3  19.1
1986  18.1  18.8  18.2  19.2  18.6  19.2  18.4    18  18.4  17.7  18.1  17.5
1987  17.7    18  17.9  17.3  17.4  16.5  15.8  15.9  16.2  17.3  16.6    16
1988  16.1  15.6  16.6    16  15.3  14.2  14.8  15.4  15.5  15.1  13.9  14.8
1989  16.4    15  13.9  14.6  14.8  15.7  14.2  14.6  15.2    15  15.5  15.3
1990  14.8    15  14.3  14.7    15  14.3    15  16.3  16.4  16.5  17.1  17.4
1991  18.6  17.4  18.3  17.8  18.8  18.5  19.4  18.9  18.8  19.1    19  20.3
1992  19.2  20.1  20.3  18.5  20.1    23  20.8  19.9    21  18.3  20.5  19.8
1993  19.9  19.7  19.7  19.5  19.8  19.9  18.4  18.4  18.2  18.7  18.5  17.9
1994  18.3  18.1    18    19    18  17.7  17.5  17.3  17.6  17.4  15.5  16.9
1995  16.5  17.5  16.1  17.3  17.5  17.2    18  17.4  17.8  17.3  17.5  17.9
1996  17.8    17  17.1  16.8  16.6  16.2  16.7    17    16  16.3  16.8  16.5

b. The spaghetti plot appears as:
[Figure: Unemployment by month, one line per year for 1981 to 1996 (y-axis: Unemployment, 12 to 26)]
c. The boxplot appears as:
[Figure: boxplot of Unemployment by month]
There is no evidence of seasonality in this plot.
d. The results of the seasonal adjustment appear as follows:
There is nothing in the seasonal indices to indicate that these data are seasonal.
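Part (d) judges seasonality from the seasonal indices. The following is a simplified sketch using the unemployment table from part (a). It assumes multiplicative indices computed as each month's mean divided by the overall mean. This method is cruder than a classical ratio-to-moving-average decomposition, ignores any trend, and is not necessarily what the chapter's software does. The names `seasonal_indices` and `UNEMPLOYMENT` are hypothetical. Indices close to 1 for every month are consistent with no seasonality.

```python
"""Simple multiplicative seasonal indices from a year-by-month table."""

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Unemployment two-way table from part (a): one row per year, 1981 to 1996.
UNEMPLOYMENT = {
    1981: [19.1, 19.3, 19.2, 18.8, 19.1, 19.8, 18.6, 18.8, 19.7, 20.3, 21.3, 21.1],
    1982: [22.0, 22.6, 21.8, 22.8, 22.8, 22.9, 24.0, 23.7, 23.6, 23.7, 24.1, 24.1],
    1983: [23.1, 22.8, 23.5, 23.4, 22.8, 24.0, 22.8, 22.9, 21.7, 21.4, 20.2, 19.9],
    1984: [19.5, 19.4, 19.8, 19.2, 18.7, 18.2, 18.8, 18.7, 19.2, 18.6, 17.7, 18.8],
    1985: [18.8, 18.3, 18.2, 17.5, 18.5, 18.5, 20.2, 17.9, 17.9, 20.0, 18.3, 19.1],
    1986: [18.1, 18.8, 18.2, 19.2, 18.6, 19.2, 18.4, 18.0, 18.4, 17.7, 18.1, 17.5],
    1987: [17.7, 18.0, 17.9, 17.3, 17.4, 16.5, 15.8, 15.9, 16.2, 17.3, 16.6, 16.0],
    1988: [16.1, 15.6, 16.6, 16.0, 15.3, 14.2, 14.8, 15.4, 15.5, 15.1, 13.9, 14.8],
    1989: [16.4, 15.0, 13.9, 14.6, 14.8, 15.7, 14.2, 14.6, 15.2, 15.0, 15.5, 15.3],
    1990: [14.8, 15.0, 14.3, 14.7, 15.0, 14.3, 15.0, 16.3, 16.4, 16.5, 17.1, 17.4],
    1991: [18.6, 17.4, 18.3, 17.8, 18.8, 18.5, 19.4, 18.9, 18.8, 19.1, 19.0, 20.3],
    1992: [19.2, 20.1, 20.3, 18.5, 20.1, 23.0, 20.8, 19.9, 21.0, 18.3, 20.5, 19.8],
    1993: [19.9, 19.7, 19.7, 19.5, 19.8, 19.9, 18.4, 18.4, 18.2, 18.7, 18.5, 17.9],
    1994: [18.3, 18.1, 18.0, 19.0, 18.0, 17.7, 17.5, 17.3, 17.6, 17.4, 15.5, 16.9],
    1995: [16.5, 17.5, 16.1, 17.3, 17.5, 17.2, 18.0, 17.4, 17.8, 17.3, 17.5, 17.9],
    1996: [17.8, 17.0, 17.1, 16.8, 16.6, 16.2, 16.7, 17.0, 16.0, 16.3, 16.8, 16.5],
}


def seasonal_indices(table):
    """Return {month: month mean / overall mean} for a dict of 12-value rows.

    Simplified method (an assumption): it ignores any trend in the data.
    """
    rows = list(table.values())
    n_years = len(rows)
    # Average each calendar month across all years.
    month_means = [sum(row[m] for row in rows) / n_years for m in range(12)]
    # With every year complete, the mean of the month means is the grand mean.
    overall_mean = sum(month_means) / 12
    return {name: mean / overall_mean for name, mean in zip(MONTHS, month_means)}


if __name__ == "__main__":
    for month, index in seasonal_indices(UNEMPLOYMENT).items():
        print(f"{month}: {index:.3f}")
```

With these data, every index comes out within a few percent of 1. By construction, the twelve indices always sum to 12. A strongly seasonal series would instead show some months well above 1 and others well below.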