Moving Average Models MA(q)

In the moving average process of order q, each observation $Y_t$ is generated by a weighted average of random disturbances going back q periods. It is denoted MA(q), and the equation is
$$Y_t = \mu + \epsilon_t - \theta_1 \epsilon_{t-1} - \theta_2 \epsilon_{t-2} - \dots - \theta_q \epsilon_{t-q}$$
where the parameters $\theta_1, \theta_2, \dots, \theta_q$ may be positive or negative.
Here $\epsilon_t \sim WN(0, \sigma^2)$ is white noise, with covariance $\gamma_K = 0$ for $K \neq 0$.
White noise processes may not occur very often in practice, but weighted sums of a white noise process can provide a good representation of processes that are nonwhite.
The mean of the moving average process is independent of time, since
$$E(Y_t) = \mu.$$
The variance:
$$\gamma_0 = \operatorname{var}(Y_t) = E[(Y_t - \mu)^2]$$
$$= E[(\epsilon_t - \theta_1 \epsilon_{t-1} - \dots - \theta_q \epsilon_{t-q})(\epsilon_t - \theta_1 \epsilon_{t-1} - \dots - \theta_q \epsilon_{t-q})]$$
$$= E[\epsilon_t^2 + \theta_1^2 \epsilon_{t-1}^2 + \theta_2^2 \epsilon_{t-2}^2 + \dots + \theta_q^2 \epsilon_{t-q}^2 - 2\theta_1 \epsilon_{t-1}\epsilon_t - 2\theta_2 \epsilon_{t-2}\epsilon_t - \dots]$$
$$= \sigma^2 + \theta_1^2 \sigma^2 + \theta_2^2 \sigma^2 + \dots + \theta_q^2 \sigma^2$$
$$= \sigma^2 (1 + \theta_1^2 + \theta_2^2 + \dots + \theta_q^2)$$
(since $E(\epsilon_t \epsilon_\tau) = 0$ for $t \neq \tau$).
For $Y_t$ to be stationary, its variance must be finite. So we require
$$\sum_{i=1}^{q} \theta_i^2 < \infty,$$
or, more generally (allowing $q \to \infty$),
$$\lim_{q \to \infty} \sum_{i=1}^{q} \theta_i^2 < \infty.$$
Let’s examine some simple moving average processes, calculating the mean, variance, covariance, and autocorrelation function for each. These statistics are important since:
1. they provide information that helps characterize the process;
2. they help us identify the process when we construct models.
Example 1: MA(1)
$$Y_t = \mu + \epsilon_t - \theta_1 \epsilon_{t-1}$$
mean $= \mu$
variance $= \sigma^2 (1 + \theta_1^2)$
covariance for lag one:
$$\gamma_1 = E[(Y_t - \mu)(Y_{t-1} - \mu)] = E[(\epsilon_t - \theta_1 \epsilon_{t-1})(\epsilon_{t-1} - \theta_1 \epsilon_{t-2})] = -\theta_1 \sigma^2$$
For $\gamma_K$, $K > 1$, in general
$$\gamma_K = E[(\epsilon_t - \theta_1 \epsilon_{t-1})(\epsilon_{t-K} - \theta_1 \epsilon_{t-K-1})] = 0$$
Thus, the MA(1) process has a covariance of zero when the displacement
is more than one period. (It has a memory of only one period.)
*Autocorrelation function for MA(1):
$$\rho_K = \frac{\gamma_K}{\gamma_0} = \begin{cases} \dfrac{-\theta_1}{1 + \theta_1^2} & \text{for } K = 1 \\[4pt] 0 & \text{otherwise} \end{cases}$$
[Graph MA(1)]
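As a quick numerical check, here is a minimal Python sketch (the values $\theta_1 = 0.6$, $T = 5000$, and the helper name sample_acf are illustrative choices, not from the notes): it simulates an MA(1) and compares the sample ACF with the formula above.

```python
import numpy as np

# Minimal sketch: simulate an MA(1) and check the sample ACF against
# rho_1 = -theta_1/(1 + theta_1^2), with rho_K ~ 0 for K > 1.
# theta1 = 0.6 and T = 5000 are illustrative choices, not from the notes.
rng = np.random.default_rng(0)
theta1, T = 0.6, 5000
eps = rng.normal(0.0, 1.0, T + 1)        # white noise disturbances
y = eps[1:] - theta1 * eps[:-1]          # Y_t = eps_t - theta1 * eps_{t-1}

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat_1 .. rho_hat_max_lag."""
    x = x - x.mean()
    denom = np.sum(x ** 2)
    return np.array([np.sum(x[:-k] * x[k:]) / denom for k in range(1, max_lag + 1)])

print("theoretical rho_1:", -theta1 / (1 + theta1 ** 2))  # about -0.441
print("sample ACF, lags 1-4:", sample_acf(y, 4))          # lag 1 close, rest ~ 0
```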
Example 2: MA(2)
Equation:
$$Y_t = \mu + \epsilon_t - \theta_1 \epsilon_{t-1} - \theta_2 \epsilon_{t-2}$$
*mean $= \mu$
*variance $= \sigma^2 (1 + \theta_1^2 + \theta_2^2) = \gamma_0$
*covariance
$$\gamma_1 = E[(\epsilon_t - \theta_1 \epsilon_{t-1} - \theta_2 \epsilon_{t-2})(\epsilon_{t-1} - \theta_1 \epsilon_{t-2} - \theta_2 \epsilon_{t-3})] = -\theta_1 \sigma^2 + \theta_2 \theta_1 \sigma^2 = -\theta_1 (1 - \theta_2) \sigma^2$$
$$\gamma_2 = E[(\epsilon_t - \theta_1 \epsilon_{t-1} - \theta_2 \epsilon_{t-2})(\epsilon_{t-2} - \theta_1 \epsilon_{t-3} - \theta_2 \epsilon_{t-4})] = -\theta_2 \sigma^2$$
and $\gamma_K = 0$ for $K > 2$.
*Autocorrelation function:
$$\rho_1 = \frac{\gamma_1}{\gamma_0} = \frac{-\theta_1 (1 - \theta_2)}{1 + \theta_1^2 + \theta_2^2}; \qquad \rho_2 = \frac{\gamma_2}{\gamma_0} = \frac{-\theta_2}{1 + \theta_1^2 + \theta_2^2}.$$
NOTE: The process MA(2) has a memory of two periods.
[Show graph MA(2) ]
In general, the autocorrelation function for a moving average process of order q, MA(q), is
$$\rho_K = \frac{-\theta_K + \theta_1 \theta_{K+1} + \dots + \theta_{q-K} \theta_q}{1 + \theta_1^2 + \theta_2^2 + \dots + \theta_q^2}, \qquad K = 1, \dots, q.$$
We can see now why the sample autocorrelation function can be useful in specifying the order of a moving average process: the autocorrelation function $\rho_K$ for the MA(q) process has q nonzero values and is zero for $K > q$.

Invertibility
MA(1) process: $Y_t = \mu + \epsilon_t - \theta_1 \epsilon_{t-1}$
If $|\theta_1| < 1$, the process is invertible, i.e., we can “invert” the series and express the current value $Y_t$ in terms of the current disturbance and lagged values of the series itself. This is the so-called autoregressive representation. Taking $\mu = 0$,
$$\epsilon_t = Y_t + \theta_1 \epsilon_{t-1}$$
$$\epsilon_{t-1} = Y_{t-1} + \theta_1 \epsilon_{t-2}$$
$$\epsilon_{t-2} = Y_{t-2} + \theta_1 \epsilon_{t-3}$$
so that
$$Y_t = \epsilon_t - \theta_1 (Y_{t-1} + \theta_1 \epsilon_{t-2})$$
$$= \epsilon_t - \theta_1 Y_{t-1} - \theta_1^2 (Y_{t-2} + \theta_1 \epsilon_{t-3})$$
$$= \epsilon_t - \theta_1 Y_{t-1} - \theta_1^2 Y_{t-2} - \theta_1^3 (Y_{t-3} + \theta_1 \epsilon_{t-4})$$
$$= \epsilon_t - \theta_1 Y_{t-1} - \theta_1^2 Y_{t-2} - \theta_1^3 Y_{t-3} - \dots$$
Notice that the autoregressive representation exists only if $|\theta_1| < 1$.
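A small numerical illustration (a sketch, assuming $\theta_1 = 0.5$, $T = 200$, and a zero starting value for the recursion): the recursion $\epsilon_t = Y_t + \theta_1 \epsilon_{t-1}$ recovers the disturbances, and the effect of the wrong starting value dies out geometrically because $|\theta_1| < 1$.

```python
import numpy as np

# Sketch: recover the disturbances of an MA(1) by the recursion
# eps_t = Y_t + theta1 * eps_{t-1}, starting (incorrectly) from zero.
# theta1 = 0.5 and T = 200 are assumed values for illustration.
rng = np.random.default_rng(1)
theta1, T = 0.5, 200
e = rng.normal(size=T + 1)               # e[0] is the pre-sample disturbance
y = e[1:] - theta1 * e[:-1]              # Y_t = eps_t - theta1 * eps_{t-1}

eps_hat = np.zeros(T)
eps_hat[0] = y[0]                        # wrong start: pretends eps_{-1} = 0
for t in range(1, T):
    eps_hat[t] = y[t] + theta1 * eps_hat[t - 1]

# the start-up error decays like theta1^t, so late observations are recovered
print(np.max(np.abs(e[1:] - eps_hat)[-50:]))   # essentially zero
```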
Autoregressive Models – AR(p)

In the autoregressive process of order p, the current observation $Y_t$ is generated by a weighted average of past observations going back p periods, together with a random disturbance in the current period. It is denoted AR(p), and the equation is:
$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \delta + \epsilon_t$$
where $\delta$ is a constant term which relates to the mean of the series, and $\phi_1, \phi_2, \dots, \phi_p$ can be positive or negative.

The properties of autoregressive processes:
If the autoregressive process is stationary, then its mean, denoting  ,
must be invariant with respect to time, i.e.,
E(Yt )  E(Yt 1 )  ....  
The mean of AR ( p ) then is
E (Yt )    1  2   ....    
or  

1  1  2 ....   
This formula also gives us a condition for stationarity, i.e.,
1  2 ....     1.
Example 1: AR(1)
$$Y_t = \phi_1 Y_{t-1} + \delta + \epsilon_t$$
*mean $= \dfrac{\delta}{1 - \phi_1}$, and the process is stationary if $|\phi_1| < 1$.
*variance (set $\delta = 0$):
$$\gamma_0 = E[(\phi_1 Y_{t-1} + \epsilon_t)^2] = E[(\phi_1 Y_{t-1} + \epsilon_t)(\phi_1 Y_{t-1} + \epsilon_t)] = \phi_1^2 \gamma_0 + \sigma^2 \;\Rightarrow\; \gamma_0 = \frac{\sigma^2}{1 - \phi_1^2}$$
*covariance
$$\gamma_1 = E[Y_{t-1}(\phi_1 Y_{t-1} + \epsilon_t)] = \phi_1 \gamma_0 = \frac{\phi_1 \sigma^2}{1 - \phi_1^2} \quad (\text{substituting } \gamma_0 \text{ into the equation})$$
$$\gamma_2 = E[Y_{t-2}(\phi_1 Y_{t-1} + \epsilon_t)];$$
since $Y_{t-1} = \phi_1 Y_{t-2} + \epsilon_{t-1}$, we have
$$\gamma_2 = E[Y_{t-2}(\phi_1^2 Y_{t-2} + \phi_1 \epsilon_{t-1} + \epsilon_t)] = \phi_1^2 \gamma_0 = \frac{\phi_1^2 \sigma^2}{1 - \phi_1^2}.$$
Similarly, the covariance for a K-lag displacement is
$$\gamma_K = \phi_1^K \gamma_0 = \frac{\phi_1^K \sigma^2}{1 - \phi_1^2}.$$
*the autocorrelation function
$$\rho_K = \frac{\gamma_K}{\gamma_0} = \phi_1^K \qquad (\text{declines geometrically!})$$
NOTE: This process has an infinite memory. The current value Yt
depends on all past values, although the magnitude of this dependence declines
with time.
[Show graph of AR(1): $Y_t = 0.9 Y_{t-1} + \epsilon_t$]
Example 2: AR(2)
$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \delta + \epsilon_t$$
*mean $= \dfrac{\delta}{1 - \phi_1 - \phi_2}$
The necessary condition for stationarity is that $\phi_1 + \phi_2 < 1$.
*variance and covariances (assuming $\delta = 0$)
$$\gamma_0 = E[Y_t(\phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \epsilon_t)] = \phi_1 \gamma_1 + \phi_2 \gamma_2 + \sigma^2 \qquad (1)$$
$$\gamma_1 = E[Y_{t-1}(\phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \epsilon_t)] = \phi_1 \gamma_0 + \phi_2 \gamma_1 \qquad (2)$$
$$\gamma_2 = E[Y_{t-2}(\phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \epsilon_t)] = \phi_1 \gamma_1 + \phi_2 \gamma_0 \qquad (3)$$
In general, for $K \geq 2$, we have
$$\gamma_K = \phi_1 \gamma_{K-1} + \phi_2 \gamma_{K-2}.$$
Now we can solve for $\gamma_0$, $\gamma_1$, and $\gamma_2$ in terms of $\phi_1$, $\phi_2$, and $\sigma^2$. Starting from (2):
$$\gamma_1 = \phi_1 \gamma_0 + \phi_2 \gamma_1 \;\Rightarrow\; \gamma_1 = \frac{\phi_1 \gamma_0}{1 - \phi_2} \qquad (4)$$
Substituting (3) into (1):
$$\gamma_0 = \phi_1 \gamma_1 + \phi_1 \phi_2 \gamma_1 + \phi_2^2 \gamma_0 + \sigma^2 \qquad (5)$$
then substituting (4) into (5):
$$\gamma_0 = \frac{\phi_1^2 \gamma_0}{1 - \phi_2} + \frac{\phi_2 \phi_1^2 \gamma_0}{1 - \phi_2} + \phi_2^2 \gamma_0 + \sigma^2.$$
After rearranging,
$$\gamma_0 = \frac{(1 - \phi_2)\,\sigma^2}{(1 + \phi_2)\,[(1 - \phi_2)^2 - \phi_1^2]},$$
and $\gamma_1$ and $\gamma_2$ then follow from (4) and (3).
*autocorrelation function
$$\rho_1 = \frac{\gamma_1}{\gamma_0} = \frac{\phi_1}{1 - \phi_2}; \qquad \rho_2 = \frac{\gamma_2}{\gamma_0} = \phi_2 + \frac{\phi_1^2}{1 - \phi_2}.$$
In general, for $K \geq 2$,
$$\rho_K = \phi_1 \rho_{K-1} + \phi_2 \rho_{K-2}.$$

We can use the $\hat\rho_K$ to derive the autoregressive parameters via the Yule-Walker equations:
$$\rho_1 = \frac{\phi_1}{1 - \phi_2}, \qquad \rho_2 = \phi_2 + \frac{\phi_1^2}{1 - \phi_2}.$$
Suppose we have a time series which is AR(2). We calculate the sample autocorrelation function
$$\hat\rho_K = \frac{\sum_{t=1}^{T-K} (Y_t - \bar Y)(Y_{t+K} - \bar Y)}{\sum_{t=1}^{T} (Y_t - \bar Y)^2}$$
to obtain $\hat\rho_1$ and $\hat\rho_2$, then substitute these values into the Yule-Walker equations to solve for $\phi_1$ and $\phi_2$, as in the sketch below.
[Show graph of AR(2) ]
The Partial Autocorrelation Function



The partial autocorrelation function can be used to determine the order of
AR processes.
It is called the partial autocorrelation function because it describes the correlation between $Y_t$ and $Y_{t-K}$ net of the part explained linearly by the intervening lags.
The idea is to use the Yule-Walker equations to solve for successive trial values of p, the order of the AR process.
Example
Suppose we start from the assumption that the autoregressive order is one, i.e., p = 1. Then we have $\phi_1 = \rho_1$, or, in sample terms, $\hat\phi_1 = \hat\rho_1$. If the calculated value $\hat\phi_1$ is significantly different from zero, the autoregressive order is at least one (use $a_1$ to denote this $\hat\phi_1$).
Now consider p = 2. Solving the Yule-Walker equations for p = 2 gives $\hat\phi_1$ and $\hat\phi_2$. If $\hat\phi_2$ is significantly different from zero, the process is at least of order 2 (denote $a_2 = \hat\phi_2$). If $\hat\phi_2$ is approximately zero, the order is one.
Repeating the process, we get $a_1, a_2, \dots$. We call $a_1, a_2, \dots$ the partial autocorrelation function, and we can determine the order of an AR process from its behavior. In particular, if the true order is p, then $a_j \approx 0$ for $j > p$.

To test whether $a_j$ is zero, we use the fact that it is approximately normally distributed with mean zero and standard error $1/\sqrt{T}$. So a test at the 5 percent level is to check whether $\hat a_j$ exceeds $2/\sqrt{T}$ in magnitude.
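A sketch of this procedure (the helper name and the simulated AR(2) series are my own; statsmodels also ships a ready-made pacf): for each trial order j, solve the j-th order Yule-Walker system and keep the last coefficient $a_j$, comparing against the $\pm 2/\sqrt{T}$ band.

```python
import numpy as np

# Sketch: partial autocorrelations via successive Yule-Walker solutions.
def pacf_yule_walker(x, max_order):
    x = x - x.mean()
    denom = np.sum(x ** 2)
    rho = np.array([np.sum(x[:-k] * x[k:]) / denom
                    for k in range(1, max_order + 1)])
    a = []
    for j in range(1, max_order + 1):
        # Toeplitz matrix of autocorrelations: R[r, c] = rho_|r-c|
        R = np.array([[1.0 if r == c else rho[abs(r - c) - 1]
                       for c in range(j)] for r in range(j)])
        a.append(np.linalg.solve(R, rho[:j])[-1])   # a_j = last coefficient
    return np.array(a)

rng = np.random.default_rng(3)
T = 2000
y = np.zeros(T)
for t in range(2, T):                      # simulated AR(2): expect a_3.. ~ 0
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.normal()
print(pacf_yule_walker(y, 5), "5% band:", 2 / np.sqrt(T))
```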
Autoregressive-Moving Average Models ARMA(p,q)
The ARMA(p, q) model is
$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \delta + \epsilon_t - \theta_1 \epsilon_{t-1} - \dots - \theta_q \epsilon_{t-q}$$
where $\epsilon_t \sim WN(0, \sigma^2)$.
Assume that the process is stationary, so that its mean is constant over time:
$$\mu = \phi_1 \mu + \phi_2 \mu + \dots + \phi_p \mu + \delta, \qquad \text{or} \qquad \mu = \frac{\delta}{1 - \phi_1 - \phi_2 - \dots - \phi_p}.$$
Notice that this gives a necessary condition for stationarity of $Y_t$:
$$\phi_1 + \phi_2 + \dots + \phi_p < 1.$$
For the variance and covariances, let us consider ARMA(1,1):
$$Y_t = \phi_1 Y_{t-1} + \epsilon_t - \theta_1 \epsilon_{t-1}$$
Variance:
$$\gamma_0 = E[(\phi_1 Y_{t-1} + \epsilon_t - \theta_1 \epsilon_{t-1})(\phi_1 Y_{t-1} + \epsilon_t - \theta_1 \epsilon_{t-1})] = \phi_1^2 \gamma_0 + \sigma^2 + \theta_1^2 \sigma^2 - 2\phi_1 \theta_1 E[\epsilon_{t-1} Y_{t-1}]$$
and $E[\epsilon_{t-1} Y_{t-1}] = \sigma^2$. We have
$$\gamma_0 = \phi_1^2 \gamma_0 + \sigma^2 + \theta_1^2 \sigma^2 - 2\phi_1 \theta_1 \sigma^2$$
or
$$\gamma_0 = \frac{\sigma^2 (1 + \theta_1^2 - 2\phi_1 \theta_1)}{1 - \phi_1^2}.$$
Covariance:
$$\gamma_1 = E[Y_{t-1}(\phi_1 Y_{t-1} + \epsilon_t - \theta_1 \epsilon_{t-1})] = \phi_1 \gamma_0 - \theta_1 \sigma^2$$
$$\gamma_2 = E[Y_{t-2}(\phi_1 Y_{t-1} + \epsilon_t - \theta_1 \epsilon_{t-1})] = \phi_1 \gamma_1$$
Autocorrelation:
$$\rho_1 = \frac{\gamma_1}{\gamma_0} = \frac{(1 - \phi_1 \theta_1)(\phi_1 - \theta_1)}{1 + \theta_1^2 - 2\phi_1 \theta_1}$$
For $K \geq 2$, $\rho_K = \phi_1 \rho_{K-1}$.
Notice that the autocorrelation function begins at its starting value $\rho_1$, which is a function of $\phi_1$ and $\theta_1$, and then declines geometrically. This reflects the fact that the moving average part of the process has a memory of only one period.
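A tiny sketch of these ARMA(1,1) autocorrelations, using the illustrative values $\phi_1 = 0.8$ and $\theta_1 = 0.9$ from the graphed process below:

```python
import numpy as np

# Sketch: ARMA(1,1) autocorrelations for phi1 = 0.8, theta1 = 0.9.
phi1, theta1 = 0.8, 0.9
rho = [(1 - phi1 * theta1) * (phi1 - theta1) / (1 + theta1 ** 2 - 2 * phi1 * theta1)]
for _ in range(4):
    rho.append(phi1 * rho[-1])      # rho_K = phi1 * rho_{K-1} for K >= 2
print(np.round(rho, 3))             # starts at rho_1, then declines geometrically
```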
[Show graphs of $Y_t = 0.8 Y_{t-1} + \epsilon_t - 0.9 \epsilon_{t-1} + 2$ and $Y_t = 0.8 Y_{t-1} + \epsilon_t + 0.9 \epsilon_{t-1} + 2$]
Determining the Order of an ARMA(p,q) Model
In practice, choosing the orders p and q requires balancing the benefit of including more lags against the cost of additional estimation uncertainty. On the one hand, if the order of an estimated autoregression is too low (p is too low), one omits potentially valuable information contained in the more distant lagged values. On the other hand, if it is too high, one estimates more coefficients than necessary, which in turn introduces additional error into one’s forecasts.
One approach to choosing, say, p, is to start with a model with many lags and to perform hypothesis tests on the final lag, i.e., the F-statistic approach. For example, one might start by estimating an AR(6) and test whether the coefficient on the sixth lag is significant at the 5% level; if not, drop it, estimate an AR(5) model, test the coefficient on the fifth lag, and so on. The drawback of this approach is that it will produce too large a model, at least some of the time: even if the true AR order is 5, so that the sixth coefficient is zero, a 5% test will incorrectly reject this null hypothesis 5% of the time just by chance. Thus, when the true value of p is five, the method will estimate p to be six 5% of the time.
The BIC (Bayes information criterion)
One way around the problem is to estimate p by minimizing an “information criterion.” One such criterion is the BIC, sometimes also referred to as the Schwarz information criterion (SIC), which is defined as:
$$BIC(p) = \ln\!\left(\frac{SSR(p)}{T}\right) + (p+1)\,\frac{\ln(T)}{T}$$
The BIC estimator of p, $\hat p$, is the value that minimizes BIC(p) among the possible choices $p = 0, 1, 2, \dots, p_{\max}$.
AR(p) Model for U.S. Inflation (T = 152, quarterly, 1962-1999)

p   SSR(p)/T   ln(SSR(p)/T)   (p+1)ln(T)/T   BIC(p)   R-squared
0   2.853      1.048          0.033          1.081    0
1   2.726      1.003          0.066          1.069    0.045
2   2.361      0.859          0.099          0.958    0.173
3   2.264      0.817          0.132          0.949    0.206
4   2.261      0.816          0.165          0.981    0.207
5   2.260      0.815          0.198          1.014    0.208
6   2.257      0.814          0.231          1.045    0.209
The AIC (Akaike information criterion)
$$AIC(p) = \ln\!\left(\frac{SSR(p)}{T}\right) + (p+1)\,\frac{2}{T}$$
where p is the order of the autoregression and T is the sample size.
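A minimal sketch of criterion-based lag selection (the helper name ar_ssr and the simulated AR(2) series are assumptions; a real application would use the inflation data in the table above): fit AR(p) by OLS for each p, form SSR(p), and compute BIC(p) and AIC(p).

```python
import numpy as np

# Sketch: choose the AR order by minimizing BIC(p) or AIC(p).
def ar_ssr(y, p):
    """SSR from OLS of y_t on (1, y_{t-1}, ..., y_{t-p})."""
    T = len(y)
    X = np.column_stack([np.ones(T - p)] +
                        [y[p - k:T - k] for k in range(1, p + 1)])
    beta = np.linalg.lstsq(X, y[p:], rcond=None)[0]
    return np.sum((y[p:] - X @ beta) ** 2)

rng = np.random.default_rng(4)
y = np.zeros(200)
for t in range(2, 200):                       # simulated series, true order 2
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.normal()

T = len(y)
for p in range(7):
    ssr = ar_ssr(y, p) if p > 0 else np.sum((y - y.mean()) ** 2)
    bic = np.log(ssr / T) + (p + 1) * np.log(T) / T
    aic = np.log(ssr / T) + (p + 1) * 2 / T
    print(p, round(bic, 3), round(aic, 3))    # pick p minimizing the criterion
```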
Non-stationary Process

Very few of the economic time series encountered in practice are stationary, but many can be differenced so that the resulting series is stationary. The number of times the original series must be differenced before a stationary series results is called the order of integration.
Example:
If $Y_t$ is a first-order integrated (I(1)) non-stationary series, then
$$\Delta Y_t = Y_t - Y_{t-1}$$
is stationary. If $Y_t$ is a second-order integrated (I(2)) series, then
$$\Delta^2 Y_t = \Delta Y_t - \Delta Y_{t-1}$$
would be stationary.
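In code, differencing is one call (np.diff); a toy quadratic-trend series illustrates first versus second differences:

```python
import numpy as np

# Differencing in code: np.diff gives Delta Y_t; n=2 gives Delta^2 Y_t.
y = np.array([1.0, 3.0, 6.0, 10.0, 15.0])   # toy series with a quadratic trend
print(np.diff(y))          # [2. 3. 4. 5.] -- first differences still trend
print(np.diff(y, n=2))     # [1. 1. 1.]    -- second differences are constant
```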
If $Y_t$ is non-stationary, the statistical characteristics of the process are no longer independent of time.
Example: Random walk
$$Y_t = Y_{t-1} + \epsilon_t, \qquad \epsilon_t \sim WN(0, \sigma^2)$$
The variance of the process:
$$\gamma_0 = E(Y_t^2) = E[(Y_{t-1} + \epsilon_t)^2] = E(Y_{t-1}^2) + \sigma^2 = E(Y_{t-2}^2) + 2\sigma^2 = \dots = E(Y_{t-N}^2) + N\sigma^2$$
The variance grows without bound as N approaches infinity. The same is true for the covariances.
But $\Delta Y_t = \epsilon_t$ is stationary (white noise), so for the differenced series $\rho_0 = 1$ and $\rho_K = 0$ for $K \neq 0$.
How Can We Decide Whether a Series Is Non-stationary?
1. Autocorrelation function. If the series is stationary, its autocorrelation function should die off quickly.
Example: Nonstationary vs. stationary series.
ARIMA_stationary_nonstationary.sas
2. Unit Root Test
The problem:
Consider the model
$$Y_t = \rho Y_{t-1} + \epsilon_t, \qquad \epsilon_t \sim WN.$$
In the random walk case, $\rho = 1$, OLS estimation of this equation produces an estimate of $\rho$ that is biased toward zero. The OLS estimate is also biased toward zero when $\rho$ is less than but near 1.
Dickey-Fuller Unit Root Test
Consider the model
$$Y_t = \beta_0 + \beta_1 t + \eta_t, \qquad \eta_t = \rho \eta_{t-1} + \epsilon_t, \qquad \epsilon_t \sim WN.$$
The reduced form is
$$Y_t = \beta_0(1 - \rho) + \rho\beta_1 + \beta_1(1 - \rho)\,t + \rho Y_{t-1} + \epsilon_t$$
or
$$Y_t = \alpha + \beta t + \rho Y_{t-1} + \epsilon_t.$$
The equation is said to have a unit root if $\rho = 1$.
There are three test statistics:
$$K(1) = T(\hat\rho - 1)$$
$$t(1) = \frac{\hat\rho - 1}{SE(\hat\rho)}$$
$$F(0, 1) \quad (\text{i.e., a joint test of } \beta = 0 \text{ and } \rho = 1)$$
The critical values for these statistics are obtained by simulation.
The Augmented Dickey-Fuller Test for a Unit Autoregressive Root
The Augmented Dickey-Fuller (ADF) test for a unit autoregressive root tests the null hypothesis $H_0: a = 0$ against the one-sided alternative $H_1: a < 0$ in the regression
$$\Delta Y_t = \beta_0 + a Y_{t-1} + \gamma_1 \Delta Y_{t-1} + \gamma_2 \Delta Y_{t-2} + \dots + \gamma_p \Delta Y_{t-p} + \epsilon_t$$
Under the null hypothesis, $Y_t$ has a stochastic trend; under the alternative hypothesis, $Y_t$ is stationary. The ADF statistic is the OLS t-statistic testing $a = 0$.
If instead the alternative hypothesis is that $Y_t$ is stationary around a deterministic linear trend, then this trend “t” (the observation number) must be added as an additional regressor, in which case the Dickey-Fuller regression becomes
$$\Delta Y_t = \beta_0 + \delta t + a Y_{t-1} + \gamma_1 \Delta Y_{t-1} + \gamma_2 \Delta Y_{t-2} + \dots + \gamma_p \Delta Y_{t-p} + \epsilon_t$$
where $\delta$ is an unknown coefficient and the ADF statistic is again the OLS t-statistic testing $a = 0$.
The lag length p can be determined using the AIC. The ADF statistic does not have a normal distribution, even in large samples, so critical values for the one-sided ADF test are obtained by simulation.
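A sketch using the ADF implementation in statsmodels (statsmodels.tsa.stattools.adfuller), with the lag length chosen by AIC as suggested above; the simulated series and parameter values are illustrative:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Sketch: ADF test on a random walk vs. a stationary AR(1).
# regression='c' includes an intercept only; use 'ct' to add the trend term.
rng = np.random.default_rng(6)
rw = rng.normal(size=500).cumsum()          # random walk: has a unit root
ar = np.zeros(500)                          # stationary AR(1)
for t in range(1, 500):
    ar[t] = 0.7 * ar[t - 1] + rng.normal()

for name, series in (("random walk", rw), ("AR(1)", ar)):
    stat, pvalue, *_ = adfuller(series, regression="c", autolag="AIC")
    print(name, round(stat, 2), round(pvalue, 3))
# expected: fail to reject the unit root for the random walk, reject for AR(1)
```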
Problems with Non-stationary Processes
Suppose we have
$$Y_t = \alpha + \beta t + Y_{t-1} + \epsilon_t$$
where $\epsilon_t \sim WN(0, \sigma^2)$ and $t = 1, 2, \dots, T$. The series is non-stationary since it has a trend in its variance.
Issues:
1. Regression of a random walk on time t by least squares will produce a high $R^2$ value; i.e., even if the true process has $\beta = 0$, one obtains $R^2 \approx 0.44$ on average just by regressing on the time trend.
2. If $\beta \neq 0$, $R^2$ will be even higher; it will increase with the sample size and approach one in the limit.
3. The residuals have on average only about 14% of the true variance.
4. The residuals are highly autocorrelated, roughly $1 - 10/T$ at lag one, where T is the sample size.
5. Conventional t-tests are not valid.
6. Regression of one random walk variable on another is strongly subject to the spurious regression phenomenon, as the sketch below illustrates.
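A simulation sketch of point 6 (T = 200 and 1000 replications are arbitrary choices): regressing one independent random walk on another rejects the no-relationship null far more often than the nominal 5 percent.

```python
import numpy as np

# Sketch of the spurious regression phenomenon: regress one independent
# random walk on another and count how often |t| > 1.96.
rng = np.random.default_rng(7)
T, n_reps, reject = 200, 1000, 0
for _ in range(n_reps):
    x = rng.normal(size=T).cumsum()
    w = rng.normal(size=T).cumsum()          # independent of x
    X = np.column_stack([np.ones(T), x])
    beta, res = np.linalg.lstsq(X, w, rcond=None)[:2]
    s2 = res[0] / (T - 2)                    # residual variance estimate
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    reject += abs(beta[1] / se) > 1.96
print("rejection rate:", reject / n_reps)    # typically far above 0.05
```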