Moving Average Models MA(q)

In the moving average process of order $q$, each observation $Y_t$ is generated by a weighted average of random disturbances going back $q$ periods. It is denoted MA(q), and the equation is

$Y_t = \mu + \varepsilon_t + \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2} + \cdots + \theta_q\varepsilon_{t-q}$

where the parameters $\theta_1, \theta_2, \ldots, \theta_q$ may be positive or negative, $\varepsilon_t \sim WN(0, \sigma_\varepsilon^2)$, and $\mathrm{Cov}(\varepsilon_t, \varepsilon_{t-K}) = 0$ for $K \neq 0$. White noise processes themselves are not often observed in practice, but weighted sums of a white noise process can provide a good representation of processes that are not white noise.

The mean of the moving average process is independent of time, since $E(Y_t) = \mu$.

The variance:

$\gamma_0 = \mathrm{var}(Y_t) = E[(Y_t - \mu)^2] = E[(\varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q})(\varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q})]$
$= E[\varepsilon_t^2 + \theta_1^2\varepsilon_{t-1}^2 + \theta_2^2\varepsilon_{t-2}^2 + \cdots + \theta_q^2\varepsilon_{t-q}^2 + \theta_1\varepsilon_t\varepsilon_{t-1} + \theta_2\varepsilon_t\varepsilon_{t-2} + \cdots]$
$= \sigma_\varepsilon^2 + \theta_1^2\sigma_\varepsilon^2 + \theta_2^2\sigma_\varepsilon^2 + \cdots + \theta_q^2\sigma_\varepsilon^2 = \sigma_\varepsilon^2(1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2)$

(since $E(\varepsilon_t\varepsilon_s) = 0$ for $t \neq s$).

If $Y_t$ is stationary, the variance of $Y_t$ must be finite. So we require $\sum_{i=1}^{q}\theta_i^2 < \infty$, or, more generally, $\lim_{q\to\infty}\sum_{i=1}^{q}\theta_i^2 < \infty$.

Let us examine some simple moving average processes, calculating the mean, variance, covariance, and autocorrelation function for each. These statistics are important since:
1. They provide information that helps characterize the process;
2. They help us identify the process when we construct models.

Example 1 MA(1)

$Y_t = \mu + \varepsilon_t + \theta_1\varepsilon_{t-1}$

*mean = $\mu$
*variance = $\sigma_\varepsilon^2(1 + \theta_1^2)$
*covariance for lag one:
$\gamma_1 = E[(Y_t - \mu)(Y_{t-1} - \mu)] = E[(\varepsilon_t + \theta_1\varepsilon_{t-1})(\varepsilon_{t-1} + \theta_1\varepsilon_{t-2})] = \theta_1\sigma_\varepsilon^2$
In general, for $K > 1$,
$\gamma_K = E[(\varepsilon_t + \theta_1\varepsilon_{t-1})(\varepsilon_{t-K} + \theta_1\varepsilon_{t-K-1})] = 0$

Thus, the MA(1) process has a covariance of zero when the displacement is more than one period. (It has a memory of only one period.)

*Autocorrelation function for MA(1):
$\rho_K = \dfrac{\theta_1}{1 + \theta_1^2}$ for $K = 1$; $\rho_K = 0$ otherwise.

[Graph MA(1)]

Example 2 MA(2)

Equation: $Y_t = \mu + \varepsilon_t + \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2}$

*mean = $\mu$
*variance = $\gamma_0 = \sigma_\varepsilon^2(1 + \theta_1^2 + \theta_2^2)$
*covariances:
$\gamma_1 = E[(\varepsilon_t + \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2})(\varepsilon_{t-1} + \theta_1\varepsilon_{t-2} + \theta_2\varepsilon_{t-3})] = \theta_1\sigma_\varepsilon^2 + \theta_2\theta_1\sigma_\varepsilon^2 = \theta_1(1 + \theta_2)\sigma_\varepsilon^2$
$\gamma_2 = E[(\varepsilon_t + \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2})(\varepsilon_{t-2} + \theta_1\varepsilon_{t-3} + \theta_2\varepsilon_{t-4})] = \theta_2\sigma_\varepsilon^2$
and $\gamma_K = 0$ for $K > 2$.
*Autocorrelation function:
$\rho_1 = \dfrac{\gamma_1}{\gamma_0} = \dfrac{\theta_1(1 + \theta_2)}{1 + \theta_1^2 + \theta_2^2}$; $\quad \rho_2 = \dfrac{\gamma_2}{\gamma_0} = \dfrac{\theta_2}{1 + \theta_1^2 + \theta_2^2}$.

NOTE: The MA(2) process has a memory of two periods.

[Show graph MA(2)]

In general, the autocorrelation function for a moving average process of order $q$ [MA(q)] is

$\rho_K = \dfrac{\theta_K + \theta_1\theta_{K+1} + \cdots + \theta_{q-K}\theta_q}{1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2}, \quad K = 1, \ldots, q$

We can now see why the sample autocorrelation function is useful in specifying the order of a moving average process: the autocorrelation function $\rho_K$ for the MA(q) process has $q$ non-zero values and is zero for $K > q$.

Invertible MA(1) process

$Y_t = \varepsilon_t + \theta_1\varepsilon_{t-1}$

If $|\theta_1| < 1$, the process is invertible, i.e., we can "invert" the series and express the current value $Y_t$ in terms of the current disturbance and lagged values of the series. This is the so-called autoregressive representation. From
$\varepsilon_t = Y_t - \theta_1\varepsilon_{t-1}, \quad \varepsilon_{t-1} = Y_{t-1} - \theta_1\varepsilon_{t-2}, \quad \varepsilon_{t-2} = Y_{t-2} - \theta_1\varepsilon_{t-3}, \ldots$
we obtain, by repeated substitution,
$Y_t = \varepsilon_t + \theta_1(Y_{t-1} - \theta_1\varepsilon_{t-2}) = \varepsilon_t + \theta_1 Y_{t-1} - \theta_1^2(Y_{t-2} - \theta_1\varepsilon_{t-3})$
$= \varepsilon_t + \theta_1 Y_{t-1} - \theta_1^2 Y_{t-2} + \theta_1^3(Y_{t-3} - \theta_1\varepsilon_{t-4})$
$= \varepsilon_t + \theta_1 Y_{t-1} - \theta_1^2 Y_{t-2} + \theta_1^3 Y_{t-3} - \cdots$

Notice that the autoregressive representation exists only if $|\theta_1| < 1$.
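To make the MA(q) cutoff concrete, here is a minimal sketch (in Python with NumPy; the parameter values $\theta_1 = 0.6$, $\theta_2 = 0.4$ and the sample size are illustrative assumptions, not from the notes) that simulates an MA(2) process and compares its sample autocorrelations with the theoretical $\rho_1$ and $\rho_2$; lags beyond 2 should be near zero.

```python
import numpy as np

# A minimal sketch: simulate an MA(2) process and check that the sample
# autocorrelation function cuts off after lag q = 2. The parameters
# (theta1 = 0.6, theta2 = 0.4) are illustrative assumptions.
rng = np.random.default_rng(0)
T = 5000
theta1, theta2 = 0.6, 0.4
eps = rng.normal(0.0, 1.0, T + 2)                      # white noise disturbances
y = eps[2:] + theta1 * eps[1:-1] + theta2 * eps[:-2]   # Y_t = e_t + th1*e_{t-1} + th2*e_{t-2}

def sample_acf(x, max_lag):
    """rho_hat_K = sum (x_t - xbar)(x_{t+K} - xbar) / sum (x_t - xbar)^2"""
    x = x - x.mean()
    denom = np.sum(x ** 2)
    return np.array([np.sum(x[:-k] * x[k:]) / denom for k in range(1, max_lag + 1)])

acf = sample_acf(y, 5)
# Theory: rho1 = th1(1 + th2)/(1 + th1^2 + th2^2), rho2 = th2/(1 + th1^2 + th2^2)
denom = 1 + theta1**2 + theta2**2
print("sample ACF:", np.round(acf, 3))
print("theory rho1, rho2:", round(theta1 * (1 + theta2) / denom, 3), round(theta2 / denom, 3))
# Lags 3 and beyond should hover near zero (within ~2/sqrt(T) sampling error).
```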
Autoregressive Models AR(p)

In the autoregressive process of order $p$, the current observation $Y_t$ is generated by a weighted average of past observations going back $p$ periods, together with a random disturbance in the current period. It is denoted AR(p), and the equation is

$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + \delta + \varepsilon_t$

where $\delta$ is a constant term that relates to the mean of the series, and $\phi_1, \phi_2, \ldots, \phi_p$ can be positive or negative.

The properties of autoregressive processes: if the autoregressive process is stationary, then its mean, denoted $\mu$, must be invariant with respect to time, i.e., $E(Y_t) = E(Y_{t-1}) = \cdots = \mu$. The mean of AR(p) then satisfies
$\mu = \phi_1\mu + \phi_2\mu + \cdots + \phi_p\mu + \delta$, or $\mu = \dfrac{\delta}{1 - \phi_1 - \phi_2 - \cdots - \phi_p}$.
This formula also gives us a necessary condition for stationarity: $\phi_1 + \phi_2 + \cdots + \phi_p < 1$.

Example 1 AR(1)

$Y_t = \phi_1 Y_{t-1} + \delta + \varepsilon_t$

*mean = $\mu = \dfrac{\delta}{1 - \phi_1}$, and the process is stationary if $|\phi_1| < 1$.
*variance (set $\delta = 0$):
$\gamma_0 = E[(\phi_1 Y_{t-1} + \varepsilon_t)(\phi_1 Y_{t-1} + \varepsilon_t)] = \phi_1^2\gamma_0 + \sigma_\varepsilon^2$, so $\gamma_0 = \dfrac{\sigma_\varepsilon^2}{1 - \phi_1^2}$.
*covariance:
$\gamma_1 = E[Y_{t-1}(\phi_1 Y_{t-1} + \varepsilon_t)] = \phi_1\gamma_0 = \dfrac{\phi_1\sigma_\varepsilon^2}{1 - \phi_1^2}$ (substituting $\gamma_0$ into the equation)
$\gamma_2 = E[Y_{t-2}(\phi_1 Y_{t-1} + \varepsilon_t)]$; since $Y_{t-1} = \phi_1 Y_{t-2} + \varepsilon_{t-1}$, we have
$\gamma_2 = E[Y_{t-2}(\phi_1^2 Y_{t-2} + \phi_1\varepsilon_{t-1} + \varepsilon_t)] = \phi_1^2\gamma_0 = \dfrac{\phi_1^2\sigma_\varepsilon^2}{1 - \phi_1^2}$
Similarly, the covariance for a K-lag displacement is $\gamma_K = \phi_1^K\gamma_0 = \dfrac{\phi_1^K\sigma_\varepsilon^2}{1 - \phi_1^2}$.
*the autocorrelation function:
$\rho_0 = 1, \quad \rho_K = \dfrac{\gamma_K}{\gamma_0} = \phi_1^K$ (declines geometrically!)

NOTE: This process has an infinite memory. The current value $Y_t$ depends on all past values, although the magnitude of this dependence declines with time.

[Show graph AR(1): $Y_t = 0.9Y_{t-1} + \varepsilon_t$]

Example 2 AR(2)

$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \delta + \varepsilon_t$

*mean = $\dfrac{\delta}{1 - \phi_1 - \phi_2}$. The necessary condition for stationarity is $\phi_1 + \phi_2 < 1$.
*variance and covariances (assuming $\delta = 0$):
$\gamma_0 = E[Y_t(\phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \varepsilon_t)] = \phi_1\gamma_1 + \phi_2\gamma_2 + \sigma_\varepsilon^2$  (1)
$\gamma_1 = E[Y_{t-1}(\phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \varepsilon_t)] = \phi_1\gamma_0 + \phi_2\gamma_1$  (2)
$\gamma_2 = E[Y_{t-2}(\phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \varepsilon_t)] = \phi_1\gamma_1 + \phi_2\gamma_0$  (3)
In general, for $K \geq 2$, we have $\gamma_K = \phi_1\gamma_{K-1} + \phi_2\gamma_{K-2}$.

Now we can solve for $\gamma_0$, $\gamma_1$, and $\gamma_2$ in terms of $\phi_1$, $\phi_2$, and $\sigma_\varepsilon^2$. Starting from (2):
$\gamma_1 = \phi_1\gamma_0 + \phi_2\gamma_1 \;\Rightarrow\; \gamma_1 = \dfrac{\phi_1\gamma_0}{1 - \phi_2}$  (4)
Substituting (3) into (1):
$\gamma_0 = \phi_1\gamma_1 + \phi_2(\phi_1\gamma_1 + \phi_2\gamma_0) + \sigma_\varepsilon^2 = \phi_1(1 + \phi_2)\gamma_1 + \phi_2^2\gamma_0 + \sigma_\varepsilon^2$  (5)
then substituting (4) into (5) and rearranging to solve for $\gamma_0$:
$\gamma_0 = \dfrac{(1 - \phi_2)\sigma_\varepsilon^2}{(1 + \phi_2)[(1 - \phi_2)^2 - \phi_1^2]}$

*autocorrelation function:
$\rho_1 = \dfrac{\gamma_1}{\gamma_0} = \dfrac{\phi_1}{1 - \phi_2}; \quad \rho_2 = \dfrac{\gamma_2}{\gamma_0} = \phi_2 + \dfrac{\phi_1^2}{1 - \phi_2}$.
In general, for $K \geq 2$: $\rho_K = \phi_1\rho_{K-1} + \phi_2\rho_{K-2}$.

We can use $\hat\rho_K$ to derive the autoregressive parameters via the Yule-Walker equations:
$\rho_1 = \phi_1 + \rho_1\phi_2$
$\rho_2 = \rho_1\phi_1 + \phi_2$

Suppose we have a time series which is AR(2). We calculate the sample autocorrelation function
$\hat\rho_K = \dfrac{\sum_{t=1}^{T-K}(Y_t - \bar Y)(Y_{t+K} - \bar Y)}{\sum_{t=1}^{T}(Y_t - \bar Y)^2}$
to obtain $\hat\rho_1$ and $\hat\rho_2$, then substitute these values into the Yule-Walker equations to solve for $\phi_1$ and $\phi_2$.

[Show graph of AR(2)]

The Partial Autocorrelation Function

The partial autocorrelation function can be used to determine the order of AR processes. It is called the partial autocorrelation function because it describes the correlation between $Y_t$ and $Y_{t-K}$ minus the part explained linearly by the intervening lags. The idea is to use the Yule-Walker equations to solve for successive values of $p$, the order of the AR process.

Example: Suppose we start from the assumption that the autoregressive order is one, i.e., $p = 1$. Then we have $\rho_1 = \phi_1$, or in terms of the sample autocorrelation, $\hat\rho_1 = \hat\phi_1$. If the calculated value $\hat\phi_1$ is significantly different from zero, the autoregressive order is at least one (we use $a_1$ to denote $\hat\phi_1$).

Now consider $p = 2$. Solving the Yule-Walker equations for $p = 2$ gives $\hat\phi_1$ and $\hat\phi_2$. If $\hat\phi_2$ is significantly different from zero, the process is at least of order 2 (denote $a_2 = \hat\phi_2$). If $\hat\phi_2$ is approximately zero, the order is one.

Repeating the process, we obtain $a_1, a_2, \ldots$. We call $a_1, a_2, \ldots$ the partial autocorrelation function, and we can determine the order of an AR process from its behavior. In particular, if the true order is $p$, then $a_j = 0$ for $j > p$. To test whether $a_j$ is zero, we use the fact that it is approximately normally distributed with mean zero and standard error $1/\sqrt{T}$. So a test at the 5 percent level is to see whether it exceeds $2/\sqrt{T}$ in magnitude.
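The procedure just described is easy to carry out numerically. Below is a minimal sketch (in Python with NumPy; the true AR(2) parameters $\phi_1 = 0.5$, $\phi_2 = 0.3$ are illustrative assumptions) that computes $\hat\rho_1$ and $\hat\rho_2$ from a simulated series and solves the two Yule-Walker equations for $\phi_1$ and $\phi_2$.

```python
import numpy as np

# A minimal sketch of Yule-Walker estimation for AR(2). The true parameters
# (phi1 = 0.5, phi2 = 0.3) are illustrative assumptions.
rng = np.random.default_rng(1)
T = 5000
phi1, phi2 = 0.5, 0.3
y = np.zeros(T)
eps = rng.normal(0.0, 1.0, T)
for t in range(2, T):
    y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + eps[t]  # Y_t = phi1*Y_{t-1} + phi2*Y_{t-2} + e_t

# Sample autocorrelations rho_hat_1 and rho_hat_2
x = y - y.mean()
denom = np.sum(x ** 2)
r1 = np.sum(x[:-1] * x[1:]) / denom
r2 = np.sum(x[:-2] * x[2:]) / denom

# Yule-Walker equations:  r1 = phi1 + r1*phi2,  r2 = r1*phi1 + phi2
A = np.array([[1.0, r1],
              [r1, 1.0]])
b = np.array([r1, r2])
phi_hat = np.linalg.solve(A, b)
print("phi_hat:", np.round(phi_hat, 3))  # should be close to (0.5, 0.3)
```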
Autoregressive-Moving Average Models ARMA(p,q)

$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + \delta + \varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q}$

where $\varepsilon_t \sim WN(0, \sigma_\varepsilon^2)$.

Assume that the process is stationary, so that its mean is constant over time:
$\mu = \phi_1\mu + \phi_2\mu + \cdots + \phi_p\mu + \delta$, i.e., $\mu = \dfrac{\delta}{1 - \phi_1 - \phi_2 - \cdots - \phi_p}$.
Notice that this gives a necessary condition for the stationarity of $Y_t$: $\phi_1 + \phi_2 + \cdots + \phi_p < 1$.

For the variance and covariances, let us consider ARMA(1,1):
$Y_t = \phi_1 Y_{t-1} + \varepsilon_t + \theta_1\varepsilon_{t-1}$

Variance:
$\gamma_0 = E[(\phi_1 Y_{t-1} + \varepsilon_t + \theta_1\varepsilon_{t-1})(\phi_1 Y_{t-1} + \varepsilon_t + \theta_1\varepsilon_{t-1})] = \phi_1^2\gamma_0 + \sigma_\varepsilon^2 + \theta_1^2\sigma_\varepsilon^2 + 2\phi_1\theta_1 E[\varepsilon_{t-1}Y_{t-1}]$
where $E[\varepsilon_t Y_{t-1}] = 0$ and $E[\varepsilon_{t-1}Y_{t-1}] = \sigma_\varepsilon^2$. We have
$\gamma_0 = \phi_1^2\gamma_0 + \sigma_\varepsilon^2 + \theta_1^2\sigma_\varepsilon^2 + 2\phi_1\theta_1\sigma_\varepsilon^2$, or
$\gamma_0 = \dfrac{\sigma_\varepsilon^2(1 + \theta_1^2 + 2\phi_1\theta_1)}{1 - \phi_1^2}$

Covariance:
$\gamma_1 = E[Y_{t-1}(\phi_1 Y_{t-1} + \varepsilon_t + \theta_1\varepsilon_{t-1})] = \phi_1\gamma_0 + \theta_1\sigma_\varepsilon^2$
$\gamma_2 = E[Y_{t-2}(\phi_1 Y_{t-1} + \varepsilon_t + \theta_1\varepsilon_{t-1})] = \phi_1\gamma_1$

Autocorrelation:
$\rho_1 = \dfrac{\gamma_1}{\gamma_0} = \dfrac{(1 + \phi_1\theta_1)(\phi_1 + \theta_1)}{1 + \theta_1^2 + 2\phi_1\theta_1}$
For $K \geq 2$: $\rho_K = \phi_1\rho_{K-1}$.

Notice that the autocorrelation function begins at its starting point $\rho_1$, which is a function of $\phi_1$ and $\theta_1$, and then declines geometrically from that starting value. This reflects the fact that the moving average part of the process has a memory of only one period.

[Show graphs: $Y_t = 0.8Y_{t-1} + \varepsilon_t + 0.9\varepsilon_{t-1}$ and $Y_t = 0.8Y_{t-1} + \varepsilon_t - 0.9\varepsilon_{t-1}$]

Determining the Order of an ARMA(p,q) Model

In practice, choosing the orders $p$ and $q$ requires balancing the benefit of including more lags against the cost of additional estimation uncertainty. On the one hand, if the order of an estimated autoregression is too low ($p$ is too low), one omits potentially valuable information contained in the more distant lagged values. On the other hand, if it is too high, one estimates more coefficients than necessary, which in turn introduces additional error into one's forecasts.

One approach to choosing, say, $p$, is to start with a model with many lags and perform hypothesis tests on the final lag, i.e., the F-statistic approach. For example, one might start by estimating an AR(6) and test whether the coefficient on the sixth lag is significant at the 5% level; if not, drop it, estimate an AR(5) model, test the coefficient on the fifth lag, and so on. The drawback of this approach is that it will produce too large a model at least some of the time: even if the true AR order is 5, so that the sixth coefficient is zero, a 5% test will incorrectly reject this null hypothesis 5% of the time just by chance. Thus, when the true value of $p$ is five, the method will estimate $p$ to be six 5% of the time.

The BIC (Bayes information criterion)

One way around this problem is to estimate $p$ by minimizing an "information criterion". One such criterion is the BIC, sometimes also referred to as the Schwarz information criterion (SIC), which is defined as:

$BIC(p) = \ln\left(\dfrac{SSR(p)}{T}\right) + (p+1)\dfrac{\ln(T)}{T}$

The BIC estimator $\hat p$ is the value that minimizes $BIC(p)$ among the possible choices $p = 0, 1, 2, \ldots, p_{\max}$.

AR(p) Model for U.S. Inflation (T = 152; quarterly, 1962-1999)

p | SSR(p)/T | ln(SSR(p)/T) | (p+1)ln(T)/T | BIC   | R-squared
0 | 2.853    | 1.048        | 0.033        | 1.081 | 0
1 | 2.726    | 1.003        | 0.066        | 1.069 | 0.045
2 | 2.361    | 0.859        | 0.099        | 0.958 | 0.173
3 | 2.264    | 0.817        | 0.132        | 0.949 | 0.206
4 | 2.261    | 0.816        | 0.165        | 0.981 | 0.207
5 | 2.260    | 0.815        | 0.198        | 1.014 | 0.208
6 | 2.257    | 0.814        | 0.231        | 1.045 | 0.209

The BIC is minimized at $p = 3$.

The AIC (Akaike information criterion)

$AIC(p) = \ln\left(\dfrac{SSR(p)}{T}\right) + (p+1)\dfrac{2}{T}$

where $p$ is the order of the autoregression and $T$ is the sample size.
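Here is a minimal sketch of BIC-based order selection (in Python with NumPy; the data are simulated from an AR(2) as an illustrative assumption, standing in for a series such as inflation). It fits AR(p) by OLS for each candidate $p$ over a common estimation sample, so the sums of squared residuals are comparable, and picks the $p$ that minimizes $BIC(p)$.

```python
import numpy as np

# A minimal sketch of choosing the AR order p by minimizing BIC(p).
# The data are simulated from an AR(2) (an illustrative assumption);
# in practice y would be the observed series of interest.
rng = np.random.default_rng(2)
T, p_max = 300, 6
y = np.zeros(T)
eps = rng.normal(0.0, 1.0, T)
for t in range(2, T):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + eps[t]

def bic_for_order(y, p, p_max):
    """BIC(p) = ln(SSR(p)/T) + (p+1) ln(T)/T, with AR(p) fit by OLS."""
    yy = y[p_max:]                        # common sample so SSRs are comparable
    n = len(yy)
    X = np.column_stack([np.ones(n)] + [y[p_max - k:-k] for k in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, yy, rcond=None)
    ssr = np.sum((yy - X @ beta) ** 2)
    return np.log(ssr / n) + (p + 1) * np.log(n) / n

bics = [bic_for_order(y, p, p_max) for p in range(p_max + 1)]
print("BIC(p) for p = 0..6:", np.round(bics, 3))
print("chosen p:", int(np.argmin(bics)))  # should usually pick p = 2 here
```

Replacing the $(p+1)\ln(n)/n$ penalty with $(p+1)\,2/n$ in the last line of the function gives the AIC version of the same procedure.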
Non-stationary Processes

Very few economic time series in practice are stationary, but many can be differenced so that the resulting series is stationary. The number of times that the original series must be differenced before a stationary series results is called the order of integration.

Example: If $Y_t$ is a first-order integrated non-stationary series, then $\Delta Y_t = Y_t - Y_{t-1}$ is stationary. If $Y_t$ is a second-order integrated series, then $\Delta^2 Y_t = \Delta Y_t - \Delta Y_{t-1}$ is stationary.

If $Y_t$ is non-stationary, the statistical characteristics of the process are no longer independent of time.

Example: Random walk
$Y_t = Y_{t-1} + \varepsilon_t$, where $\varepsilon_t \sim WN(0, \sigma_\varepsilon^2)$
The variance of the process:
$\gamma_0 = E(Y_t^2) = E[(Y_{t-1} + \varepsilon_t)^2] = E(Y_{t-1}^2) + \sigma_\varepsilon^2 = E(Y_{t-2}^2) + 2\sigma_\varepsilon^2 = \cdots = E(Y_{t-N}^2) + N\sigma_\varepsilon^2$
The variance is infinite as $N$ approaches infinity. The same is true for the covariance. But $\Delta Y_t = \varepsilon_t$ is stationary (white noise), so we have $\rho_0 = 1$ and $\rho_K = 0$ for $K \neq 0$.

How Can We Decide Whether a Series Is Non-stationary?

1. Autocorrelation function. If the series is stationary, the autocorrelation function should die off quickly.
Example: Nonstationary vs. stationary series. ARIMA_stationary_nonstationary.sas

2. Unit root test.
The problem: consider the model $Y_t = \rho Y_{t-1} + \varepsilon_t$, $\varepsilon_t \sim WN$.
In the random walk case, $\rho = 1$, and OLS estimation of this equation produces an estimate of $\rho$ that is biased toward 0. The OLS estimate is also biased toward zero when $\rho$ is less than but near 1.

Dickey-Fuller Unit Root Test

Consider the model
$Y_t = \alpha_0 + \alpha_1 t + u_t, \quad u_t = \rho u_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim WN$
The reduced form is:
$Y_t = \alpha_0(1 - \rho) + \alpha_1\rho + \alpha_1(1 - \rho)t + \rho Y_{t-1} + \varepsilon_t$
or
$Y_t = \gamma + \delta t + \rho Y_{t-1} + \varepsilon_t$
The equation is said to have a unit root if $\rho = 1$. There are three test statistics:
(1) $T(\hat\rho - 1)$
(2) $t = \dfrac{\hat\rho - 1}{SE(\hat\rho)}$
(3) $F$ for $(\delta, \rho) = (0, 1)$ (i.e., $\delta = 0$ and $\rho = 1$)
The critical values for these statistics are simulated.

The Augmented Dickey-Fuller Test for a Unit Autoregressive Root

The Augmented Dickey-Fuller (ADF) test for a unit autoregressive root tests the null hypothesis $H_0: a = 0$ against the one-sided alternative $H_1: a < 0$ in the regression
$\Delta Y_t = \beta_0 + a Y_{t-1} + \gamma_1\Delta Y_{t-1} + \gamma_2\Delta Y_{t-2} + \cdots + \gamma_p\Delta Y_{t-p} + \varepsilon_t$
Under the null hypothesis, $Y_t$ has a stochastic trend; under the alternative hypothesis, $Y_t$ is stationary. The ADF statistic is the OLS t-statistic testing $a = 0$.

If instead the alternative hypothesis is that $Y_t$ is stationary around a deterministic linear trend, then this trend "t" (the observation number) must be added as an additional regressor, in which case the Dickey-Fuller regression becomes:
$\Delta Y_t = \beta_0 + \beta t + a Y_{t-1} + \gamma_1\Delta Y_{t-1} + \gamma_2\Delta Y_{t-2} + \cdots + \gamma_p\Delta Y_{t-p} + \varepsilon_t$
where $\beta$ is an unknown coefficient, and the ADF statistic is again the OLS t-statistic testing $a = 0$. The lag length $p$ can be determined using the AIC. The ADF statistic does not have a normal distribution, even in large samples. Critical values for the one-sided ADF test are simulated.

Problems with Non-stationary Processes

Suppose we have
$Y_t = \beta^* + Y_{t-1} + \varepsilon_t$, where $\varepsilon_t \sim WN(0, \sigma^2)$ and $t = 1, 2, \ldots, T$.
The series is non-stationary since it has a trend in its variance. Issues:
1. Regression of a random walk on time $t$ by least squares will produce a high $R^2$ value; even if the true process has $\beta^* = 0$, $R^2 \approx 0.44$ on average just by doing so.
2. If $\beta^* \neq 0$, $R^2$ will be even higher; it will increase with the sample size and reach one in the limit.
3. The residuals have on average only about 14% of the true variance.
4. The residuals are highly correlated, with a lag-one autocorrelation of roughly $(1 - 10/T)$, where $T$ is the sample size.
5. Conventional t-tests are not valid.
6. Regression of one random walk variable on another is strongly subject to the spurious regression phenomenon.
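To illustrate points 1 and 6, here is a minimal sketch (in Python with NumPy; the sample size and seed are arbitrary choices) that regresses a driftless random walk on a time trend, and one independent random walk on another, and reports the $R^2$ values, which tend to be high even though no true relationship exists.

```python
import numpy as np

# A minimal sketch of the spurious regression phenomenon: a driftless random
# walk regressed on a time trend, and one independent random walk regressed
# on another, both tend to show high R^2 despite no true relationship.
rng = np.random.default_rng(3)
T = 200
y = np.cumsum(rng.normal(0.0, 1.0, T))   # random walk: Y_t = Y_{t-1} + e_t
x = np.cumsum(rng.normal(0.0, 1.0, T))   # an independent random walk

def ols_r2(y, regressor):
    """R^2 from OLS of y on a constant and one regressor."""
    X = np.column_stack([np.ones(len(y)), regressor])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

t = np.arange(1, T + 1)
print("R^2 of random walk on time trend:", round(ols_r2(y, t), 3))
print("R^2 of one random walk on another:", round(ols_r2(y, x), 3))
# Averaged over many simulated walks, the first R^2 is about 0.44, and
# conventional t-tests on these regressions are not valid.
```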