DS 303 Spring 2004 Final Exam Name: ____KEY_______________ Show All your Work 1. Decision Science Associates has been asked to do a feasibility study for a proposed destination resort to be located within half a mile of the Grand Coulee Dam. Mark Craze is not happy with the regression model that used the price of a regular gallon of gasoline to predict the number of visitors to the Grand Coulee Visitor Center. After plotting the data, Mark decides to use a dummy variable to represent significant celebrations in the general area. Mark uses a 1 to represent a celebration and a 0 to represent no celebration.. Mark also decides to use time as a predictor variable. Mark runs these data on the computer using Excel. The partial out put is given below. SUMMARY OUTPUT Regression Statistics Multiple R 0.904651648 R Square 0.818394605 Adjusted R Square 0.763912986 Standard Error 70006.05694 Observations 14 ANOVA df Regression Residual Total Intercept Time index Price of gasoline Celebration 3 10 13 SS MS 2.20854E+11 73617995859 49008480079 4900848008 2.69862E+11 Coefficients Standard Error 309899.41 59495.89 24430.90 7240.12 -193330.79 97705.70 217138.10 47412.24 t Stat 5.209 3.374 -1.979 F 15.02148113 P-value 0.00040 0.0071 0.076 a) Write the estimated least square regression line? ŷ = 309899.41 + 24430.9 Time – 193330.79 Gas + 217138.1 Celeb. b) Should we keep all the variables in the model? If no, which one do you suggest to drop and why? Yes, they are all statistically significant predictors of the number of visitors. Price of gasoline is border line, but since it is not far from 5% I still keep it. c) How are b2 and b3 are interpreted here? b2 For every dollar increase in price of gas while all other variables are kept the same , the number of visitors will go down by 193330. b3 While all other variables are kept the same, the number of visitors is 217138 more for times when there is a significant celebration in the area. d) What is the estimated number of visitors if the Time index is 16, price of Gasoline is 1.45, and there is celebration in the general area. ŷ = 309899.41 + 24430.9(16) – 193330.79(1.45) + 217138.1(1) e) ŷ = 637602.26 Give a 90% confidence interval for 2. b2 ± t*s (b2) t* = t.05, 10 = 1.812 -193330.79 ± 1.812 (97705.70) -193330.79 ± 177042.72 (-370373.52, 16288.1) f) Test the overall fit of the model (State the null and alternative hypothesis, test statistic, the decision criteria, and your conclusion) use =5%. Ho: β1 = β2 = β3 = 0 Ha: Not all βi = 0 Decision criteria Reject Ho if F > F.05, 3, 10 = 3.71 F = 15.02 F = 15.02 > 3.71 reject Ho: at least one of these explanatory variables is a significant predictor. 2. An ANOVA table is Source Regression Error Total DF 1 23 24 SS 50 450 500 a. Complete the table. b. How large was the sample? 25 Determine the coefficient of determination. c. MS 50 19.56 F 2.556 R2 = SSR/SST = 50/450 = .11 3. A sample of 25 mayoral campaigns in cities with population larger than 50,000 showed that the correlation between the percent of the vote received and the amount spent on the campaign by the candidate was .34. At the 5% significance level, is there a positive correlation between the variables? State the null and the alternative hypothesis, the test statistic, the decision criteria, and your conclusion. n = 25 Ho : ρ = 0 Reject Ho if t > t.05, 23 = 1.714 r = .34 Ha : ρ > 0 α = 5% t t 4. r 0 1 r2 n2 .34 1.73 .8844 23 Reject Ho: there is a positive correlation between the % vote received and the amount spent on the campaign. You test for serial correlation, at the .05 level with 32 residuals from a regression with two independent variables. If the calculated Durbin-Watson statistic is equal to 1.0, what is your conclusion? State the null, and the alternative hypothesis, the decision criteria, and your conclusion. α = .05 DW = 1 Ho : ρ = 0 n = 32 Ha : ρ ≠ 0 k=2 from table 1.31 = L 1.57 = U Since DW = 1 < L = 1.31 Reject Ho There is serial correlation 5. A tanning parlor located in a major shopping center near a large New England city has the following history of customers over the last four years (data are in hundreds of customers): Year 1 2 3 4 Number of Moving Centered CMA Seasonal Seasonal Cycle Quarters Customers Average Moving Average Trend Factor Index Factor 1 3.50 2 2.90 3 2.00 4 1 .73 1.028 .98 1.247 1.27 .986 1.01 2.9 2.975 2.90 .672 3.20 3.05 3.113 3.10 4.10 3.175 3.288 3.29 2 3.40 3.4 3.45 3.49 3 2.90 3.5 3.638 3.68 .797 4 3.60 3.775 3.913 3.88 .920 1 5.20 4.05 4.075 4.08 1.276 2 4.50 4.1 4.213 4.27 1.068 3 3.10 4.325 4.438 4.47 .699 4 4.50 4.55 4.613 4.66 .976 1 6.10 4.675 4.838 4.86 1.261 2 5.00 5 5.188 5.05 .964 3 4.40 5.375 4 6.00 1.026 1.004 .999 .989 .989 1.009 .999 .987 .993 .999 .995 1.027 5.25 5.45 a) Find a four period moving average for each quarter. b) Find the centered moving average for the sample. c) Find the seasonal factors and the seasonal indexes. d) Find the cycle factors. e) Use the multiplicative decomposition method to forecast the number of customers for each quarter of year 4. ŷ1 = 6.09 ŷ2 = 5.24 ŷ3 = 3.81 ŷ4 =5.29 f) Evaluate the RMSE for the year 4. e1 = 6.10 – 6.09 = .01 e2 = 5 – 5.24 = -.24 e3 = 4.40 – 3.81 = .59 e4 = 6 – 5.29 = .71 Σ(y – ŷ)2 = .9099 RMSE = √(.9099/4) = √.227 = .48 Multiple Choice Questions Select the best answer 1. Seasonal components A) cannot be predicted. B) are regular repeated patterns. C) are long runs of observations above or below the trend line. D) reflect a shift in the series over time. 2. Forecast errors A) are the difference in successive values of a time series B) are the differences between actual and forecast values C) should all be nonnegative D) should be summed to judge the goodness of a forecasting model 3. To select a value for for exponential smoothing A) use a small when the series varies substantially. B) use a large when the series has little random variability. C) use any value between 0 and 1 D) All of the alternatives are true. 4. Quantitative forecasting methods do not require that patterns from the past will necessarily continue in the future. A) True B) False 5. All quarterly time series contain seasonality. A) True B) False 6. You are given a time series of sales data with 10 observations. You construct forecasts according to last period’s actual level of sales. How many data points will be lost in the forecast process relative to the original data series? 7. A) One. B) Two. C) Three. D) Zero. E) None of the above. If a sample of n= 30 is used to estimate a multiple regression having four independent variables, the degrees of freedom for the F distribution test of model significance are A) B) C) D) 3 and 24 2 and 24 3 and 23 3 and 21 8. 9. D) None of the above How many first initial values must the forecaster set using Holt's exponential smoothing? A) 0. B) 1. C) 2. D) 3. E) None of the above. Which of the following is not correct? Seasonality in a time series dataset containing quarterly observations can be handled by A) B) C) D) using four dummy variables, one for each season. using only three dummy variables. using Winter's smoothing. deseasonalizing the data and then applying nonseasonal methods. 10. In the linear model, the slope coefficient i measures the expected change in Y per unit change in Xi given the other independent variables are fixed. 11. A) True B) false C) Not enough information Which of the following is not a technique used to generate forecasts with time series decomposition? 12. 13. A) Moving averages. B) Trend projection. C) Multiplicative seasonality. D) Dummy variables. E) All of the above. In time-series decomposition analysis, decomposition refers to: A) converting an annual trend line into a monthly trend line. B) deseasonalizing the data. C) separating a time series into component parts. D) isolating the cyclical component of a time series. E) None of the above. When calculating centered moving-averages using a 4-period moving average, how many data points are lost at the beginning of the original series? A) B) C) D) E) 1. 2. 3. 4. None of the above. 14. 15. 16. A seasonal index number of 1.80 for quarter one of an automobile parts manufacturer suggests A) Quarter one sales are 80% above the norm. B) Quarter one sales are 1.80% below the norm. C) Quarter one sales are 20% below the norm. D) Quarter one sales are 80% below the norm. E) None of the above. Quarter one sales for a tire manufacturer were $120,000,000. If the quarter one seasonal index was 1.20, what is an estimate of annual sales for this firm? A) $100,000,000. B) $144,000,000. C) $400,000,000. D) $576,000,000. E) None of the above. Suppose Nike sales are expected to be 1.2 billion dollars for the year 2005. If the January seasonal index for Nike is .98, what is a reasonable estimate for January 2005 sales revenue? A) .098 billion. B) .1 billion. C) 1.176 billion. D) 2.18 billion. Note: The next three questions relate to the following data: Actual Series Forecast Series Forecast Error Time Period 100 100 0 1 110 --2 115 --3 17. If a smoothing constant of .3 is used, what is the exponentially smoothed forecast for period 4? 18. A) 106.6. B) 103.0. C) 115.0. D) 112.6. E) 104.4. What is the forecast error for period 3? A) B) C) D) F) +3. +12. +12. -7. +7. 19. If a three-month moving-average model is used, what is the forecast for period 4? A) B) C) D) E) 20. 104.4. 106.6. 107.1. 108.3. 110.2. If the smoothing constant were chosen to be unity, the exponential smoothing model would equal A) B) C) D) E) moving average smoothing. Holt's exponential smoothing. the simple naive model. Winter's exponential smoothing. moving average smoothing with a one-year lag.