Time Series II: Autoregression and Moving Average

IV. Moving Average in Time Series Data

A. What is a moving average process?

In a time series with an MA process, this period's error is a weighted average of this period's random innovation and last period's random innovation. The process depends solely on these innovations, so it has no memory of past levels (unlike even an AR-1 process, which does). Put another way, Cov(e_t, e_{t-2}) = 0 under an MA-1 process. An MA-1 process might be generated by a variable with measurement error, or by some process in which the impact of a shock to y takes exactly one period to fade away. In an MA-2 process, the shock takes two periods to fade away completely. We can express an MA-1 process with the following equation, where both v terms are innovations (random errors):

e_t = θ v_{t-1} + v_t

B. How can we identify a moving average process?

We can look at special time series graphs called autocorrelation functions (ACFs) and partial autocorrelation functions (PACFs) to determine whether we have an AR process, an MA process, or neither. To produce either chart in Stata, we first need to tell it that we have time series data, give it the name of the variable that indicates where we are in the series, and tell it the units (daily, monthly, yearly, etc.) of the series. After that, we can ask for an ACF of a particular dependent variable and tell it how far back in time to go. An ACF simply plots the raw correlations between y_t and y_{t-M}. Stata draws a confidence band around zero; if a correlation falls outside this band, then the correlation between y_t and y_{t-M} is statistically significant.

tsset t, monthly
        time variable:  t, 1960m2 to 1972m1

ac air, lags(6) level(95)

[Figure: ACF of air, lags 1 through 6, with Bartlett's formula for MA(q) 95% confidence bands]

This looks like an AR process. How do I know? In an autoregressive process such as an agency's budget, this year's budget should be highly correlated with last year's budget, and there should be enough memory in the process that its correlation with the budget from six years ago is statistically significant. In an MA-1 process, the correlation would last only one time period; the correlations with the second through sixth lags should fall inside the confidence band.

We should see the reverse of this pattern in a PACF, which calculates each correlation controlling for the correlations with all prior lags. Suppose we have an AR-1 process. The ρ for the first lag should be significant, but controlling for that, the correlation of today's value with the value two time periods ago should not be. If the partial correlations are significant for two time periods, but only two, we probably have an AR-2 process. What will the PACF look like when we have a moving average process? It will look like the ACF of an AR process: the lags will continue to be significantly (partially) correlated with each other for a while (the Stata commands for producing a PACF are sketched below).

[Figure: PACF, lags 1 through 6, with 95% confidence bands (se = 1/sqrt(n))]

V. Integration and ARIMA Models

A. What is integration?

An integrated time series has a time trend: the dependent variable is a function of some random innovation plus the product of time and a coefficient.

y_t = β t + e_t

Perhaps your height (until about age 16) is an integrated time series. The interesting variation to explain in any time unit is then not your height but how much you grew. An I-1 model accordingly explains variation in the difference of the dependent variable, y_t - y_{t-1}.
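As a minimal sketch (assuming the data are already tsset as above; the variable name d_air is purely illustrative), Stata's pac command produces the partial autocorrelation chart discussed earlier, and the D. time-series operator generates the first differences that an I-1 model explains:

* PACF of air, six lags, 95% confidence bands
pac air, lags(6) level(95)

* first difference of air, via the D. time-series operator (d_air is an illustrative name)
gen d_air = D.air

* ACF of the differenced series rather than of the level
ac d_air, lags(6) level(95)

The D.air label in the ARIMA(2,1,1) output below reflects this same differencing operator.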
B. The ARIMA model

An ARIMA model is one that can estimate any and all of these processes together. It is written by specifying the order of each component process, in the order AR, I, MA. For instance, an ARIMA(2,0,1) process is an AR-2 and an MA-1 process. An ARIMA(2,1,1) process explains variation in the first differences of the dependent variable (y_t - y_{t-1}, which comes from the I-1 part) using the first differences at one and two lags (y_{t-1} - y_{t-2} and y_{t-2} - y_{t-3}, coming from the AR-2 part), the previous disturbance (e_{t-1}, coming from the MA-1 part), and a random disturbance e_t. Here are examples of each:

arima air, arima(2,0,1)

(setting optimization to BHHH)
Iteration 0:   log likelihood = -860.83601
Iteration 22:  log likelihood = -699.12461

ARIMA regression

Sample:  1960m2 to 1972m1                       Number of obs      =       144
                                                Wald chi2(3)       =   1087.32
Log likelihood = -699.1246                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |                 OPG
         air |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
air          |
       _cons |   281.7383   62.14023     4.53   0.000     159.9456    403.5309
-------------+----------------------------------------------------------------
ARMA         |
          ar |
          L1 |   .4990531   .1305942     3.82   0.000     .2430931     .755013
          L2 |   .4313754   .1244967     3.46   0.001     .1873665    .6753844
          ma |
          L1 |   .8564644   .0812863    10.54   0.000     .6971461    1.015783
-------------+----------------------------------------------------------------
      /sigma |    30.6991   1.748526    17.56   0.000     27.27205    34.12615
------------------------------------------------------------------------------

arima air, arima(2,1,1)

(setting optimization to BHHH)
Iteration 0:   log likelihood = -714.88177
Iteration 15:  log likelihood = -675.8479

ARIMA regression

Sample:  1960m3 to 1972m1                       Number of obs      =       143
                                                Wald chi2(3)       =    366.94
Log likelihood = -675.8479                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |                 OPG
       D.air |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
air          |
       _cons |   2.669459   .1533041    17.41   0.000     2.368988    2.969929
-------------+----------------------------------------------------------------
ARMA         |
          ar |
          L1 |   1.104262   .0667915    16.53   0.000     .9733534    1.235171
          L2 |  -.5103714   .0801074    -6.37   0.000     -.667379   -.3533638
          ma |
          L1 |  -.9999992   654.2094    -0.00   0.999    -1283.227    1281.227
-------------+----------------------------------------------------------------
      /sigma |   26.88058    8792.34     0.00   0.998    -17205.79    17259.55
------------------------------------------------------------------------------

Why does the effect of the constant get smaller in the model that includes integration? Why is the number of observations smaller?

VI. Granger Causality and Vector Autoregression

A. How can I Granger-cause something?

If we can control for past values of Y, and X still has an effect on present Y, then X "Granger-causes" Y. This Granger-caused a UCSD economist to win the Nobel Prize and got a whole notion of causality named after him. So how do we find out whether X Granger-causes Y? To test this, perform a "vector autoregression": regress Y_t on all of its lags and on all of the lags of the X variables that might possibly, in a kitchen-sink world, be related to it. This cleans up the errors (though at the cost of multicollinearity, low degrees of freedom, and an utter lack of theory). A Granger test compares the standard error of this regression with the S.E.R. of a regression of Y on lagged Ys alone. Another way to see whether X Granger-causes Y is to do a joint F-test on the lagged Xs.
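As a minimal sketch of both approaches, with y and x standing in as hypothetical variable names and an arbitrary choice of four lags (the data are assumed to be tsset already):

* regress y on four of its own lags plus four lags of x
regress y L(1/4).y L(1/4).x

* joint test: do the lagged x's add anything once y's own lags are included?
test L1.x L2.x L3.x L4.x

* Stata will also estimate the VAR directly and report Granger causality
* Wald tests for each equation
var y x, lags(1/4)
vargranger

The four-lag choice here is only for illustration; in practice the lag length would come from theory or a model-selection criterion.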
VII. Unit Roots

A. What is a stationary process?

In a stationary process, the mean and variance of Y do not depend on time. We like to assume that this is true. If there is a time trend in our process (but the variable is stationary around that trend), that is not a big problem: we can simply "detrend" our data and then model it. The problem comes when we have a "random walk" or a "random walk with drift." The mean and/or variance of Y moves around over time, but not in any systematic fashion (more like a drunk by a lamppost). If we have a random walk, then in a model regressing Y on its lag we should get ρ = 1, which is called a "unit root." The problem is that the variance of the lagged Y (the denominator of ρ) will be infinite, making it impossible to perform a traditional t-test of whether ρ = 1. So to see whether we indeed have a unit root, we perform a slightly different test called the Dickey-Fuller test.
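As a minimal sketch of the test in Stata, run here on the air series from above (the lag and trend options are illustrative choices):

* Dickey-Fuller test: the null hypothesis is that air contains a unit root
dfuller air

* augmented version with one lagged difference, allowing for a time trend
dfuller air, lags(1) trend

Rejecting the null suggests the series is stationary (perhaps around a trend); failing to reject points toward differencing the series, as in the ARIMA(2,1,1) model above.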