Characterizing Time Series: The Autocorrelation Function

The autocorrelation function is very useful because it provides a partial description of the process for modeling purposes. It tells us how much correlation, and hence how much interdependency, there is between neighboring data points in the series $Y_t$. Define the autocorrelation with lag $K$ as

$$\rho_K = \frac{E[(Y_t - \bar{Y})(Y_{t-K} - \bar{Y})]}{\sqrt{E[(Y_t - \bar{Y})^2]\, E[(Y_{t-K} - \bar{Y})^2]}} = \frac{\operatorname{cov}(Y_t, Y_{t-K})}{\sigma_t \, \sigma_{t-K}}$$

For a stationary series, the variance at time $t$ is the same as the variance at time $t-K$; thus the denominator is just the variance of $Y_t$:

$$\rho_K = \frac{E[(Y_t - \bar{Y})(Y_{t-K} - \bar{Y})]}{\sigma_Y^2}$$

The numerator is the covariance between $Y_t$ and $Y_{t-K}$, denoted $\gamma_K$. So we have

$$\rho_K = \frac{\gamma_K}{\gamma_0}$$

Notice that, by definition, $\rho_0 = 1$ for any stochastic process.

Example: $Y_t = \varepsilon_t$, where $\varepsilon_t$ is an independently distributed random variable with zero mean. The autocorrelation function for this process is given by $\rho_0 = 1$ and $\rho_K = 0$ for $K \neq 0$. This process is called white noise, and there is no model that can provide a forecast any better than $\hat{Y}_{t+l} = 0$ for all $l$. Thus, if the autocorrelation function is zero, or close to zero, for all $K \neq 0$, there is little or no value in using a model to forecast the series.

Estimation: How to Calculate an Estimate of the Autocorrelation Function

The sample autocorrelation function of $Y_t$ is

$$\hat{\rho}_K = \frac{\sum_{t=1}^{T-K} (Y_t - \bar{Y})(Y_{t+K} - \bar{Y})}{\sum_{t=1}^{T} (Y_t - \bar{Y})^2}$$

The autocorrelation function is symmetrical; that is, the correlation for a positive displacement is the same as that for a negative displacement, so that $\rho_K = \rho_{-K}$. When plotting an autocorrelation function (plotting $\hat{\rho}_K$ for different values of $K$), one need consider only positive values of $K$. Spreadsheet Example.

How to Test Whether a Particular Value of $\rho_K$ Equals Zero

If $Y_t$ has been generated by a white noise process, the sample autocorrelation coefficients ($K \neq 0$) are approximately distributed according to a normal distribution with mean zero and standard deviation $1/\sqrt{T}$, where $T$ is the number of observations in the series.
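As a sketch of the estimator above (NumPy assumed; the function name `sample_acf` is our own), the sample autocorrelation of a simulated white noise series can be computed directly from the formula, and its values at $K \neq 0$ should all be small:

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelation rho_hat_K for K = 0, ..., max_lag.

    Numerator sums (Y_t - Ybar)(Y_{t+K} - Ybar) over t = 1, ..., T-K;
    the denominator is the total sum of squares, as in the formula above.
    """
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    denom = np.sum(d ** 2)
    return np.array([np.sum(d[: len(y) - k] * d[k:]) / denom
                     for k in range(max_lag + 1)])

# White noise example: Y_t = eps_t, so rho_hat_0 = 1 exactly and the
# remaining lags should lie within about 2/sqrt(T) of zero.
rng = np.random.default_rng(0)
white_noise = rng.standard_normal(500)
acf = sample_acf(white_noise, 10)
print(acf)
```

With $T = 500$ the standard error is $1/\sqrt{500} \approx 0.045$, so sample autocorrelations inside roughly $\pm 0.09$ are consistent with white noise.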
For example, if $T = 100$, then $1/\sqrt{T} = 0.1$, which is the standard error attached to each autocorrelation coefficient. So if a particular coefficient were greater than 0.2 in magnitude, we could be 95 percent sure that the true autocorrelation coefficient is not zero.

How to Test the Joint Hypothesis That All the Autocorrelation Coefficients Are Zero

Box and Pierce's Q test:

$$Q = T \sum_{K=1}^{M} \hat{\rho}_K^2$$

is approximately distributed as $\chi^2$ with $M$ degrees of freedom. Thus, if the calculated value of $Q$ is greater than the critical 5 percent level, we can be 95 percent sure that the autocorrelation coefficients $\rho_1, \rho_2, \ldots, \rho_M$ are not all zero. SAS Examples: PROC ARIMA; IDENTIFY statement. SAS Simulated Series.

The Partial Autocorrelation Function

The partial autocorrelation function can be used to determine the order of AR processes. It is called the partial autocorrelation function because it describes the correlation between $Y_t$ and $Y_{t-K}$ minus the part explained linearly by the intervening lags. The idea here is to use the Yule-Walker equations to solve for successive values of $p$, which is the order of the AR process.

Example. Suppose we start from the assumption that the autoregressive order is one, i.e., $p = 1$. Then we have $\rho_1 = \phi_1$, or in terms of the sample autocorrelation, $\hat{\phi}_1 = \hat{\rho}_1$. If the calculated value $\hat{\phi}_1$ is significantly different from zero, the autoregressive order is at least one (using $a_1$ to denote $\hat{\phi}_1$). Now consider $p = 2$. Solving the Yule-Walker equations for $p = 2$ yields $\hat{\phi}_1$ and $\hat{\phi}_2$. If $\hat{\phi}_2$ is significantly different from zero, the process is at least of order 2 (denote $a_2 = \hat{\phi}_2$). If $\hat{\phi}_2$ is approximately zero, the order is one. Repeating the process, we get $a_1, a_2, \ldots$. We call $a_1, a_2, \ldots$ the partial autocorrelation function, and we can determine the order of an AR process from its behavior. In particular, if the true order is $p$, then $a_j = 0$ for $j > p$. To test $a_j$ and see if it is zero, we use the fact that it is approximately normally distributed with mean zero and standard error $1/\sqrt{T}$.
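The Q statistic above can be sketched in Python (NumPy assumed; `box_pierce_q` is our own name). The 5 percent critical value of $\chi^2$ with 10 degrees of freedom is 18.31:

```python
import numpy as np

def box_pierce_q(y, m):
    """Box-Pierce statistic Q = T * sum_{K=1}^{M} rho_hat_K^2."""
    y = np.asarray(y, dtype=float)
    t = len(y)
    d = y - y.mean()
    denom = np.sum(d ** 2)
    rho = np.array([np.sum(d[: t - k] * d[k:]) / denom
                    for k in range(1, m + 1)])
    return t * np.sum(rho ** 2)

# For a simulated white noise series, Q should usually fall below the
# chi-square 5 percent critical value (18.31 for M = 10 degrees of
# freedom), so we fail to reject the hypothesis that all rho_K are zero.
rng = np.random.default_rng(1)
q = box_pierce_q(rng.standard_normal(200), m=10)
print(q)
```

A Q value above the critical level would instead suggest that at least some of the first $M$ autocorrelations are nonzero, and that a model may have forecasting value.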
So a test at the 5 percent level is to see whether it exceeds $2/\sqrt{T}$ in magnitude.
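The successive Yule-Walker solutions described above can be sketched with the Durbin-Levinson recursion, which delivers exactly the last AR coefficient $\hat{\phi}_p$ from each fit of order $p$ (NumPy assumed; the function name is our own):

```python
import numpy as np

def pacf_yule_walker(y, max_lag):
    """Partial autocorrelations a_1, ..., a_max_lag: the last AR
    coefficient from the Yule-Walker fit of each successive order,
    computed via the Durbin-Levinson recursion."""
    y = np.asarray(y, dtype=float)
    t = len(y)
    d = y - y.mean()
    denom = np.sum(d ** 2)
    # Sample autocorrelations rho_hat_0, ..., rho_hat_max_lag.
    rho = np.array([np.sum(d[: t - k] * d[k:]) / denom
                    for k in range(max_lag + 1)])

    pacf = np.zeros(max_lag)
    phi = np.array([rho[1]])   # order p = 1: phi_1 = rho_1
    pacf[0] = rho[1]
    for p in range(2, max_lag + 1):
        num = rho[p] - np.dot(phi, rho[p - 1:0:-1])
        den = 1.0 - np.dot(phi, rho[1:p])
        a_p = num / den        # last coefficient of the order-p fit
        phi = np.append(phi - a_p * phi[::-1], a_p)
        pacf[p - 1] = a_p
    return pacf

# AR(1) simulation: Y_t = 0.7 Y_{t-1} + eps_t. We expect a_1 near 0.7
# and a_j for j > 1 within about 2/sqrt(T) of zero.
rng = np.random.default_rng(2)
eps = rng.standard_normal(2000)
y = np.zeros(2000)
for i in range(1, 2000):
    y[i] = 0.7 * y[i - 1] + eps[i]
print(pacf_yule_walker(y, 5))
```

Only the first partial autocorrelation should exceed $2/\sqrt{T} \approx 0.045$ in magnitude here, which is how the cutoff identifies the order of the AR process.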