Assignment #1

“The work contained and presented here is my work and my work alone.”

1. Which of the options given below is a time series forecasting problem? (2 points)
a) A chocolate company sends out marketing postcards and models who will respond
b) A paint manufacturer wants to know how long it will take for its paint to dry
c) A hospital wants to know how long its patients will survive after a heart surgery so that the effects can be caught early
d) A fire department wants to know how many fires it will likely need to fight during the holidays so that it can staff accordingly
Ans: d

2. Which of the following is an example of a time series problem? (2 points)
a) Estimating the number of hotel room bookings in the next six months
b) Estimating the total admissions in the next three years of an educational institute
c) Estimating the number of calls for the next one week
d) a) & b)
e) a) & c)
f) a), b) & c)
Ans: f

3. What are your observations from the below graphs regarding trend, seasonality, white noise? Do you think the time series is stationary? Reason your answers. (3 points)

Ans:
A.) Observations regarding Trend:
The given time series exhibits trend because of the below factors:
The time series has significant ACF, PACF, and IACF values at lag 1.
The ACF has many significant lags that decay slowly from lag 1.
The ACF has only a few significant values after first differencing is applied.
The unit root tests are not significant but become significant after the first difference is applied.
The time series plot shows an increase (Sep ’86 to Sep ’88) followed by a decrease.
Hence the time series exhibits trend.

B.) Observations regarding Seasonality:
There is a repeating pattern in the autocorrelation (AC) plot, which hints that seasonality is present. However, the seasonality root test has only non-significant values (all the values are less than the cutoff), hence seasonality cannot be established in the series.
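The ACF behaviour described under A.) can be illustrated outside SAS. The following is a minimal Python sketch, not the SAS analysis used for this answer: the data are a synthetic random walk (the series from the question is not reproduced here), and the helper names `sample_acf` and `first_difference` are hypothetical. It compares the sample ACF of a trending series before and after the first difference.

```python
import random


def sample_acf(series, max_lag):
    """Return the sample autocorrelations r_1 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def first_difference(series):
    """Return the first difference (delta) of a series."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


# Assumption: a synthetic random walk stands in for a series with a
# stochastic trend; it is NOT the data from the question.
rng = random.Random(0)
level = 0.0
walk = []
for _ in range(200):
    level += rng.gauss(0.0, 1.0)
    walk.append(level)

acf_raw = sample_acf(walk, 12)
acf_diff = sample_acf(first_difference(walk), 12)

# Approximate 95% significance band for the ACF of white noise.
band = 1.96 / len(walk) ** 0.5

print("lags outside band, original series:   ",
      sum(abs(r) > band for r in acf_raw))
print("lags outside band, differenced series:",
      sum(abs(r) > band for r in acf_diff))
```

A slowly decaying ACF in the original series that largely disappears after differencing is the pattern the trend argument above relies on.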
The series is non-stationary due to the presence of trend.

C.) Observations regarding White Noise:
For the given time series, as seen in the white noise test plot above, the series fails the white noise test. This indicates that the series is not purely white noise but contains other components, both stationary and non-stationary.

Based on the above analysis:
We have already established that the series is non-stationary and exhibits trend. Hence the non-stationary part needs to be modelled.

4. For the same time series as in question 3, the graphs look like below when the first difference (delta) is applied. What can you infer from this? Do you think you need to apply Seasonal difference (delta S) on this? Reason your answers. (3 points)

Inference on the effect of applying First Difference (Delta) on the time series:
The time series plot shows that the trend disappears once the first difference is applied to the original series. This indicates that trend is present in the original series.
Further, the unit root test becomes significant after the first difference is applied, which is further evidence of trend in the original series.
The autocorrelation plot also has only a few significant values after the first difference, which again indicates trend in the original series.
There is no need to apply a seasonal difference, because applying the first difference (delta) produces no change in the seasonality root test. (Further, we noticed in the previous analysis that seasonality is not established in the given time series.)

General Instructions for questions 5 and 6.
Download all the datasets on to your machines and save them in a single folder. You can use this folder as your library when you are building models.
Submit a Word file with snapshots of the models you have built with detailed explanations of the process
Do not use the “Automatically Fit Model” option in SAS
Explain which model is the best fit and your rationale for model selection.
Explore the RMSE, Residuals, Autocorrelation, White Noise plots, and the metrics to pick the best model

5. Open ‘Monthly_milk_production.sas7bdat’ File. This dataset contains information about past Milk production. Determine if the time series has a trend and seasonality. Describe your best model and your rationale in detail. (10 points)

From the time series plot, we can clearly see an increasing trend and seasonality, which can be confirmed statistically as explained below:

First Difference:

First Seasonal Difference:

The given time series exhibits trend because of the below factors:
The time series has significant ACF, PACF, and IACF values at lag 1.
The ACF has many significant lags that decay slowly from lag 1.
The ACF has only a few significant values after first differencing is applied.
The unit root tests are not significant but become significant after the first difference is applied.
The time series plot exhibits an increasing pattern.

The time series exhibits seasonality as well due to the below factors:
Significant ACF, PACF, and IACF values at lag S.
ACF with significant values at lags that are multiples of S.
ACF without a significant value at lag S after a difference of order S has been applied.
Seasonal unit root tests that are not significant but become significant when a difference of order S is applied.

Building Models:
The given series is non-stationary with trend and seasonality, so both need to be modelled.

Trend is modelled with various methods as noted below:
1. Linear Trend Model
2. Quadratic Trend Model
3. Cubic Trend Model
4. Exponential Trend Model
5. Logarithmic Trend Model
6. Logistic Trend Model

Seasonality is modelled using:
1. Seasonal Dummy Regressors Model

The models are thoroughly compared based on:
1. Accuracy based on:
   a. Root Mean Square Error (RMSE)
2. Complexity based on:
   a. Akaike information criterion
   b. Schwarz Bayesian information criterion
3. Model Fit based on:
   a. Residuals, ACF, PACF, IACF plots
   b. White Noise Test, Unit Root Test, Seasonality Test
   c. Parameter Estimates
   d. Statistics of Fit

The details of the models and their comparisons are as below:

Best Model: Linear Trend + Seasonal Dummies (Model M1)
Although the cubic and quadratic trends model the time series with marginally less error, the Linear Trend + Seasonal Dummies model is the best model because:
1. Model M1 has a very good RMSE
2. It does not involve complex non-linear terms (quadratic and cubic)
3. The quadratic and cubic terms have very low coefficients
4. It requires less computational effort
5. It is easier to explain and interpret

6. Open ‘CocoCola.sas7bdat’. This dataset contains weekly sales of boxes of Coco-Cola during 2004. Fit the best model and explain your rationale in detail. (10 points)

For the given time series, it is difficult to assess trend and seasonality from the plot alone, as no evident pattern is observed. Hence the series is investigated further using statistical plots such as the ACF, PACF and IACF, along with hypothesis tests for white noise, seasonality and trend.

First Difference:

Based on the ACF, PACF and IACF plots and the tests, we can assess that:
1. There is no trend in the data, as even after applying the first difference (delta) there is not much change in the unit root test plot
2. There is evidently no seasonality in the data (the seasonality root test graph is absent)
3. There is significant white noise in the time series
4. The series is hence stationary.

Since the series is stationary, we should refrain from modelling the trend and seasonality of the data.