In-class Exercise 1: Fitting Temperature Data with ARIMA (due Fri 3/06/2015)
Name: Sample Solution

Use this file as a template for your assignment. Submit your code and comments together with (selected) output from the R console. Your comments must be BOLD FACED.

First, load the global temperature data from the class web site using the R code below.

    library(forecast)   # provides auto.arima(), Arima(), forecast()
    D <- read.csv("http://gozips.uakron.edu/~nmimoto/pages/datasets/gtemp.txt")
    X <- ts(D, start=c(1880), freq=1)   # annual series starting in 1880
    plot(X, type='o')                   # d=0: original series
    plot(diff(X), type='o')             # d=1: first difference
    Fit1 <- auto.arima(X)
    Fit1

[Plots: the original series (d=0) and its first difference (d=1)]

1. Use auto.arima and select the best ARIMA(p,d,q) model for this dataset. How did auto.arima come to the final model? Briefly explain.

    ARIMA(0,1,2) with drift
              ma1      ma2   drift
          -0.5134  -0.1915  0.0065
    s.e.   0.0833   0.0783  0.0026
    sigma^2 estimated as 0.009181:  log likelihood=118.78
    AIC=-229.56   AICc=-229.23   BIC=-218.12

auto.arima chose d=1 using the KPSS stationarity test. It then searched stepwise over ARIMA(p,1,q) models for the lowest AICc.

2. Determine the adequacy of the model fit by residual analysis.

The Ljung-Box and McLeod-Li randomness tests of the residuals all have high p-values, so the residuals look uncorrelated and the model seems adequate. The Jarque-Bera test also has a high p-value, which is consistent with a normal residual distribution.

3. Check for parameter significance of the current model (trusting the asymptotic s.e. from auto.arima()).

Using the standard errors in the output, all model parameters are significant (i.e., for each parameter the interval estimate +/- 1.96 * s.e. does not contain 0).

4. Check the adequacy of the value of d selected by auto.arima(). Which method is used by default? Use Stationarity.tests() from the class website and check d=0, d=1, d=2 for (non)stationarity. Do you agree with the choice made by auto.arima()?
    Stationarity.tests(X)
             KPSS    ADF     PP
    p-val:   0.01  0.706   0.01

    Stationarity.tests(diff(X))
             KPSS    ADF     PP
    p-val:    0.1   0.01   0.01

    Stationarity.tests(diff(diff(X)))
             KPSS    ADF     PP
    p-val:    0.1   0.01   0.01

KPSS has stationarity as its null hypothesis, while ADF and PP have a unit root as theirs, so a small KPSS p-value and a large ADF p-value both point to non-stationarity. When d=0, the KPSS and ADF tests both indicate non-stationarity, which is also obvious from the plot (see top). When d=1, all three tests indicate stationarity, and the plot does look stationary. Since the series is stationary at d=1, it looks stationary at d=2 as well. We therefore agree with the choice d=1.

5. Look for signs of under-differencing (d too low) and over-differencing (d too high). What are the signs? Do you see any from your fit?

    Mod( polyroot( c( 1, -0.5134, -0.1915)) )
    [1] 1.30883 3.98977

An AR polynomial with a root close to the unit circle is a sign of under-differencing. Since the model has no AR part, we cannot look for that sign here. However, we already checked stationarity at d=1 in part (4), so we are not worried about under-differencing.

For over-differencing, we check whether the MA polynomial has a root close to the unit circle. Note that auto.arima() uses the (1 + theta B) sign convention. The polyroot output shows that the roots are not too close to the unit circle. From a different point of view, d=0 is clearly non-stationary, so d=1 cannot be over-differencing.

6. State your final model using equation(s) with parameter values.

ARIMA(0,1,2) with drift:

    ∇X_t = Y_t
    Y_t = 0.0065 + e_t − 0.5134 e_{t−1} − 0.1915 e_{t−2}

7. Use the trace=TRUE option in auto.arima() to see the second and third lowest models from part (1). Use Arima() and repeat parts (2) and (3) for those models. Is there any reason that they are better than the model chosen in part (1)?

    auto.arima(X, trace=TRUE)
     ARIMA(1,1,3) with drift         : -226.4521
     ARIMA(0,1,3) with drift         : -224.9137

    Arima(X, order=c(0,1,3), include.drift=TRUE)
    Coefficients:
              ma1      ma2      ma3   drift
          -0.5048  -0.1875  -0.0164  0.0065
    s.e.   0.0960   0.0816   0.0877  0.0025
    sigma^2 estimated as 0.009179:  log likelihood=118.8
    AIC=-227.59   AICc=-227.1   BIC=-213.29

    Arima(X, order=c(1,1,3), include.drift=TRUE)
    Coefficients:
              ar1     ma1      ma2      ma3   drift
          -0.9376  0.4846  -0.6341  -0.2857  0.0065
    s.e.   0.0950  0.1169   0.0959   0.0869  0.0025
    sigma^2 estimated as 0.008824:  log likelihood=121.23
    AIC=-230.45   AICc=-229.76   BIC=-213.29

With the trace=TRUE option, the next two lowest-AICc models are as above. ARIMA(0,1,3) contains one non-significant parameter (ma3). In ARIMA(1,1,3), the estimate of phi1 is not significantly different from -1, which is a sign of under-differencing. However, from part (4) we are convinced that the series is stationary at d=1, so this is not a good model either.

8. Perform 5-step prediction of global temperature using your final model.

    X.hat <- forecast(Fit1, h=5)   # 5-step-ahead forecast
    plot(X.hat)
    plot(X.hat, xlim=c(2000, 2015))
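
The significance check in part (3) and the root check in part (5) can be reproduced from the reported estimates alone. The following is a minimal sketch in base R. The variable names (est, se, ci, significant, ma_root_mod) are illustrative and not part of the assignment, and the numbers are copied by hand from the auto.arima() output in part (1).

```r
# Reported estimates and standard errors for ARIMA(0,1,2) with drift (part 1).
# These values are typed in from the output above, not recomputed from data.
est <- c(ma1 = -0.5134, ma2 = -0.1915, drift = 0.0065)
se  <- c(ma1 =  0.0833, ma2 =  0.0783, drift = 0.0026)

# Part 3: approximate 95% intervals, estimate +/- 1.96 * s.e.
ci <- cbind(lower = est - 1.96 * se, upper = est + 1.96 * se)

# A parameter is called significant when its interval excludes 0.
significant <- ci[, "lower"] > 0 | ci[, "upper"] < 0
print(round(ci, 4))
print(significant)

# Part 5: moduli of the roots of the MA polynomial 1 + theta1*z + theta2*z^2,
# using the (1 + theta B) sign convention noted in part 5.
ma_root_mod <- Mod(polyroot(c(1, est[["ma1"]], est[["ma2"]])))
print(round(sort(ma_root_mod), 3))
```

Moduli well above 1 mean the MA roots are away from the unit circle, which is the over-differencing check described in part (5). The same two steps could be applied to the alternative models in part (7) by swapping in their estimates.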