ST4064 Time Series Analysis Lecture notes
Outline
I
Introduction to time series analysis
II
Stationarity and ARMA modelling
1. Stationarity
a. Definitions
b. Strict stationarity
c. Weak stationarity
2. Autocovariance, autocorrelation and partial autocorrelation
a. Autocovariance
b. Autocorrelation
c. Partial autocorrelation
d. Estimation of the ACF and PACF
3. ARMA modelling
a. AR models
b. MA models
c. ARMA models
4. Backward Shift Operator and Difference Operator
5. AR(p) models, stationarity and the Yule-Walker equations
a. The AR(1) model
b. The AR(p) model and stationarity
c. Yule-Walker equations
6. MA(q) models and invertibility
a. The MA(1) model
b. The MA(q) model and invertibility
7. ARMA(p,q) models
8. ARIMA(p,d,q) models
a. Non-ARMA processes
b. The I(d) notation
9. The Markov property
III
Non-stationarity: trends and techniques
1. Typical trends
2. Least squares trend removal
3. Differencing
a. Linear trend removal
b. Selection of d
4. Seasonal differencing
5. Method of moving averages
6. Seasonal means
7. Filtering, smoothing
8. Transformations
IV
Box-Jenkins methodology
1. Overview
2. Model selection
a. Identification of white noise
b. Identification of MA(q)
c. Identification of AR(p)
3. Model fitting
a. Fitting an ARMA(p,q) model
b. Parameter estimation: LS and ML
c. Parameter estimation: method of moments
d. Diagnostic checking
V
Forecasting
1. The Box-Jenkins approach
2. Forecasting ARIMA processes
3. Exponential smoothing and Holt-Winters
4. Filtering
VI
Multivariate time series analysis
1. Principal component analysis and dimension reduction
2. Vector AR processes
3. Cointegration
4. Other common models
a. Bilinear models
b. Threshold AR models
c. Random coefficient AR models
5. ARCH and GARCH
a. ARCH
b. GARCH
I. Introduction to time series analysis
A time series is a stochastic process in discrete time with a continuous state space.
Notation: {X1, X2, ..., Xn } denotes a time series process, whereas {x1, x2, ..., xn } denotes a univariate
time series, i.e. a sequence of realisations of the time series process.
[Diagram: the random variables X1, X2, ..., Xn-1, Xn, Xn+1 at times 1, 2, ..., n-1, n, n+1, with state space S = (-∞, ∞); the observed realisations are x1, x2, ..., xn, and Xn+1 is yet to be observed.]

I.1 Purposes of Time Series Analysis
• Describe the observed time series data:
  - mean, variance, correlation structure, ...
  - e.g. correlation coefficient between sales 1 month apart, 2 months apart, etc.
  → Autocorrelation Function (ACF)
  → Partial Autocorrelation Function (PACF)
• Construct a model which fits the data:
  → From the class of ARMA models, select a model which best fits the data based on the ACF and PACF of the observed time series
  → Apply the Box-Jenkins methodology:
    o Identify a tentative model
    o Estimate the model parameters
    o Diagnostic checks - does the model fit?
• Forecast future values of the time series process
  → easy, once a model has been fitted to past data
All ARMA models are stationary. If an observed time series is non-stationary (e.g. upward trend),
it must be converted to a stationary time series (e.g. by differencing).
I.2 Other forms of analysis
Another important approach to the analysis of time series relies on the Spectral Density Function; the
analysis is then based on the autocorrelation function of a time series model. This approach is not
covered in this course.
II. Stationarity and ARMA modelling
II.1 Stationarity
a. Definition
A stochastic process is (strictly) stationary if its statistical properties remain unchanged over time.
[Diagram: two segments of the process, (X5, ..., X10) and (X120, ..., X125), shown on a common time axis with state space S.]

Joint distribution of (Xt1, Xt2, ..., Xtn) = joint distribution of (Xk+t1, Xk+t2, ..., Xk+tn), for all k and for all n.
Example: joint distribution of (X5, X6, ..., X10) = joint distribution of (X120, X121, ..., X125)
✓ for any 'chunk' of variables
✓ for any 'shift' of start
Implications of (strict) stationarity

Take n = 1:
• Distribution of Xt = distribution of Xt+k for any integer k
  - Xt discrete: P(Xt = i) = P(Xt+k = i) for any k
  - Xt continuous: f(Xt) = f(Xt+k) for any k
  In particular,
  E(Xt) = E(Xt+k) for any k
  Var(Xt) = Var(Xt+k) for any k
• A stationary process has constant mean and variance
• The variables Xt in a stationary process must be identically distributed (but not necessarily independent)
Take n = 2:
• Joint distribution of (Xs, Xt) = joint distribution of (Xs+k, Xt+k)
  ✓ for all lags (t - s)
  ✓ for all integers k
• In particular, Cov(Xs, Xt) = Cov(Xs+k, Xt+k), which depends on the lag (t - s),
  where Cov(Xs, Xt) = E[(Xs – E(Xs))(Xt – E(Xt))]
• Thus Cov(Xs, Xt) depends only on the lag (t – s) and not on the time s
b. Strict Stationarity
• Very stringent requirement
• Hard to prove a process is stationary
• To show a process is not stationary, show that one condition doesn't hold
Examples:
• Simple Random Walk: the {Xt} are not identically distributed → NOT stationary
• White Noise Process: {Zt} i.i.d. → trivially stationary
c. Weak Stationarity
• This requires only that E(Xt) is constant AND Cov(Xs, Xt) depends only on (t – s)
• Since Var(Xt) = Cov(Xt, Xt), this implies that Var(Xt) is constant
• Weak stationarity does not imply strict stationarity
• For weak stationarity, Cov(Xt, Xt+k) is constant with respect to t for all lags k
• Here (and often), "stationary" is shorthand for "weakly stationary"
Question: Show that if the joint distribution of the Xt's is multivariate normal, then weak stationarity implies strict stationarity.
Solution: If X ~ N(µ, Σ) then the distribution of X is completely determined by µ and Σ (property of the multivariate normal distribution). If these do not depend on t, neither does the distribution of X.

Example: Xt = sin(ωt + U), U ~ U[0, 2π]. Then E(Xt) = 0.
Here Cov(Xt, Xt+k) = cos(ωk) E(sin²(ωt + U)) = ½ cos(ωk), which does not depend on t
→ Xt is weakly stationary

Question: If we know X0, then we can work out U, since X0 = sin(U). We then know all the values of Xt = sin(ωt + U)
→ Xt is completely determined by X0

Definition: X is purely indeterministic if the values of X1, ..., Xn are progressively less useful for predicting XN as N → ∞.

Here "stationary time series" means a weakly stationary, purely indeterministic process.
II.2 Autocovariance, autocorrelation and partial autocorrelation
a. Autocovariance function
• For a stationary process, E(Xt) = µt = µ, for any t
• We define γk = Cov(Xt, Xt+k) = E(Xt Xt+k) - E(Xt) E(Xt+k), the "autocovariance at lag k".
• This function does not depend on t.
• Autocovariance function of X: {γ0, γ1, γ2, ...} = {γk : k ≥ 0}
• Note: γ0 = Var(Xt)
Question: Properties of covariance – needed when calculating autocovariances for specified models.
b. Autocorrelation function (ACF)
• Recall that corr(X,Y) = Cov(X,Y) / (σX σY)
• For a stationary process, we define ρk = corr(Xt, Xt+k) = γk/γ0, the "autocorrelation at lag k".
  (This is the usual correlation coefficient, since Var(Xt) = Var(Xt+k) = γ0)
• Autocorrelation Function (ACF) of X: {ρ0, ρ1, ρ2, ...} = {ρk : k ≥ 0}
• Note: ρ0 = 1
• For a purely indeterministic process, we expect ρk → 0 as k → ∞ (i.e. values far apart will not be correlated)
• Recall (ST3053): a sequence of i.i.d. random variables {Zt} is called a white noise process and is trivially stationary.
Example: {et} is a zero-mean white noise process if
  ✓ E(et) = 0 for any t, and
  ✓ γk = Cov(et, et+k) = σ², if k = 0; 0, otherwise.
• Note: the variables et have zero mean, variance σ² and are uncorrelated
• A sequence of i.i.d. variables with zero mean and finite variance is a white noise process according to this definition. In particular, Zt independent, Zt ~ N(0, σ²), is a white noise process.
• Result: γk = γ-k and ρk = ρ-k
• Correlogram = plot of the ACF {ρk : k ≥ 0} as a function of the lag k. It is widely used, as it tells a lot about the time series.
c. Partial autocorrelation function (PACF)
Let r(x,y|z) = corr(x,y|z) denote the partial correlation coefficient between x and y, adjusted for z (or with z held constant).
• Denote:
  ψ2 = corr(Xt, Xt+2 | Xt+1)
  ψ3 = corr(Xt, Xt+3 | Xt+1, Xt+2)
  ψk = corr(Xt, Xt+k | Xt+1, ..., Xt+k-1)
     = partial autocorrelation coefficient at lag k.
• Partial autocorrelation function (PACF):
  {ψ1, ψ2, ...} = {ψk : k ≥ 1}
• The ψk's are related to the ρk's:
  ψ1 = corr(Xt, Xt+1) = ρ1
Recall that
  r(x,y|z) = [r(x,y) - r(x,z) r(y,z)] / [√(1 - r²(x,z)) √(1 - r²(y,z))]
Applying this here, with x = Xt, y = Xt+2, z = Xt+1, so that ψ2 = corr(Xt, Xt+2 | Xt+1) = r(x,y|z), and with r(x,z) = r(y,z) = ρ1 and r(x,y) = ρ2, yields:
  ψ2 = (ρ2 - ρ1²) / (1 - ρ1²)
d. Estimation of the ACF and PACF
We assume that the sequence of observations {x1, x2, ...xn} comes from a stationary time series process.
The following functions are central to the analysis of time series:
• {γk} – autocovariance function
• {ρk} – autocorrelation function (ACF)
• {ψk} – partial autocorrelation function (PACF)
• f(ω) – spectral density function

To find a model to fit the sequence {x1, x2, ..., xn}, we must be able to estimate the ACF of the process of which the data is a realisation. Since the model underlying the data is assumed to be stationary, its mean can be estimated using the sample mean:

  µ̂ = (1/n) Σ_{t=1}^{n} xt

The autocovariance function γk can be estimated using the sample autocovariance function:

  γ̂k = (1/n) Σ_{t=k+1}^{n} (xt - µ̂)(xt-k - µ̂)

from which are derived the estimates rk of the autocorrelations ρk:

  rk = γ̂k / γ̂0

The collection {rk : k ∈ Z} is called the sample autocorrelation function (SACF). The plot of rk against k is called a correlogram.

Recall that the partial autocorrelation coefficients ψk are calculated as follows:

  ψ1 = ρ1
  ψ2 = det[[1, ρ1], [ρ1, ρ2]] / det[[1, ρ1], [ρ1, 1]] = (ρ2 - ρ1²) / (1 - ρ1²)

In general, ψk is given as a ratio of determinants involving ρ1, ρ2, ..., ρk. The sample partial autocorrelation coefficients are given by the same formulae, but with the ρk replaced by their estimates rk:

  ψ̂1 = r1
  ψ̂2 = (r2 - r1²) / (1 - r1²)
  etc.

The collection {ψ̂k} is called the sample partial autocorrelation function (SPACF). The plot of {ψ̂k} against k is called the partial correlogram.

[Sketch: the sample ACF {rk} and the sample PACF {ψ̂k} plotted as bars against the lag k, each taking values between -1 and 1.]
These are the main tools in identifying a model for a stationary time series.
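Remark: the following short Python sketch shows one way these sample quantities can be computed from an observed series; the helper names sample_acf and sample_pacf are assumptions introduced here for illustration (the PACF uses the ratio-of-determinants formula above), not part of any particular package.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_k, k = 0..max_lag, using gamma_hat_k / gamma_hat_0."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mu_hat = x.mean()
    gamma = np.array([np.sum((x[k:] - mu_hat) * (x[:n - k] - mu_hat)) / n
                      for k in range(max_lag + 1)])
    return gamma / gamma[0]

def sample_pacf(x, max_lag):
    """Sample partial autocorrelations via the ratio-of-determinants formula."""
    r = sample_acf(x, max_lag)
    pacf = []
    for k in range(1, max_lag + 1):
        # k x k Toeplitz matrix of autocorrelations r_0, ..., r_{k-1}
        R = np.array([[r[abs(i - j)] for j in range(k)] for i in range(k)])
        R_num = R.copy()
        R_num[:, -1] = r[1:k + 1]      # last column replaced by (r_1, ..., r_k)
        pacf.append(np.linalg.det(R_num) / np.linalg.det(R))
    return np.array(pacf)

# Example: for a white noise realisation, r_k and psi_hat_k should lie mostly within +-2/sqrt(n)
rng = np.random.default_rng(0)
z = rng.normal(size=500)
print(sample_acf(z, 5).round(3), sample_pacf(z, 5).round(3))
```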
II.3 ARMA modelling
Autoregressive moving average (ARMA) models constitute the main class of linear models for time
series. More specifically:
• Autoregressive (AR)
• Moving Average (MA)
• Autoregressive Moving Average (ARMA)
• Autoregressive Integrated Moving Average (ARIMA)
  → The last type is non-stationary
  → The others are stationary

a. AR models
• Recall: a Markov chain is a process such that the conditional distribution of Xn+1, given Xn, Xn-1, ..., X0, depends only on Xn, i.e. "the future depends on the present, but not on the past".
• The simplest type of autoregressive model, AR(1), has this property: Xt = φ Xt-1 + et, where et is zero-mean white noise.
• For AR(1), one can prove that ψ2 = corr(Xt, Xt-2 | Xt-1) = 0.
• Similarly, ψk = 0 for k > 2.
• A more general form of the AR(1) model is
  Xt = µ + φ(Xt-1 – µ) + et
  where µ = E(Xt) is the process mean.
• Autoregressive process of order p (AR(p)):
  Xt = µ + φ1(Xt-1 – µ) + φ2(Xt-2 – µ) + ... + φp(Xt-p – µ) + et
b. MA models
A realisation of a white noise process is very 'jagged', since successive observations are realisations of independent variables. Most time series observed in practice have a smoother plot than a realisation of a white noise process. In that respect, taking a "moving average" is a standard way of smoothing an observed time series:

Observed data: x1, x2, x3, x4, ...
Moving average: (1/3)(x1 + x2 + x3), (1/3)(x2 + x3 + x4), ...

[Sketch: the observed data and its moving average plotted together; the moving average is smoother.]

• A moving average process is "smoothed white noise"
• The simplest type of moving average (MA) process is Xt = µ + et + θet-1, where et is zero-mean white noise
• The et's are uncorrelated, but the Xt's are not: Xt-1 and Xt both involve et-1, Xt-2 and Xt-1 both involve et-2, and so on.
• For MA(1) one can prove that ρ2 = corr(Xt, Xt-2) = 0
• Similarly, ρk = 0 for k > 2
• Moving average process of order q (MA(q)):
  Xt = µ + et + θ1 et-1 + ... + θq et-q
c. ARMA models
ARMA processes combine AR and MA parts:
  Xt = µ + φ1(Xt-1 – µ) + ... + φp(Xt-p – µ) + et + θ1 et-1 + ... + θq et-q
Note: ARMA(p,0) = AR(p)
      ARMA(0,q) = MA(q)
II.4 Backwards Shift Operator and Difference Operator
The following operators will be useful:
• Backwards shift operator: B Xt = Xt-1, Bµ = µ
• Difference operator: ∇ = 1 - B, hence
  ∇Xt = Xt – Xt-1
  B²Xt = B(B Xt) = B Xt-1 = Xt-2
  ∇²Xt = ∇Xt - ∇Xt-1
       = Xt – Xt-1 – (Xt-1 – Xt-2)
       = (1 - B)²Xt
       = (1 - 2B + B²)Xt
       = Xt – 2Xt-1 + Xt-2
II.5 AR(p) models, stationarity and the Yule-Walker equations
a. The AR(1) model
• Recall Xt = µ + φ(Xt-1 – µ) + et
• Substituting in for Xt-1, then for Xt-2, and so on:
  Xt = µ + φ[φ(Xt-2 – µ) + et-1] + et = µ + φ²(Xt-2 – µ) + et + φ et-1
     = ...
     = µ + φ^t (X0 – µ) + et + φ et-1 + ... + φ^(t-1) e1
     = µ + φ^t (X0 – µ) + Σ_{j=0}^{t-1} φ^j et-j
• Note: X0 is a random variable
• Since E(et) = 0 for any t, µt = E(Xt) = µ + φ^t (µ0 – µ), where µ0 = E(X0)
• Since the et's are uncorrelated with each other and with X0,
  Var(Xt) = Var( µ + φ^t (X0 – µ) + Σ_{j=0}^{t-1} φ^j et-j )
          = φ^{2t} Var(X0) + Σ_{j=0}^{t-1} φ^{2j} σ²
          = φ^{2t} Var(X0) + σ² (1 – φ^{2t}) / (1 – φ²)

Question: When will an AR(1) process be stationary?
Answer: This requires a constant mean and variance.
  If µ0 = µ, then µt = µ + φ^t (µ0 – µ) = µ.
  If Var(X0) = σ² / (1 – φ²), then
  Var(Xt) = φ^{2t} σ²/(1 – φ²) + σ²(1 – φ^{2t})/(1 – φ²) = σ²/(1 – φ²).
Neither µt nor Var(Xt) then depends on t. We also require that |φ| < 1 for the AR(1) process to be stationary; for general starting conditions,
  µt = µ + φ^t (µ0 – µ)   AND   Var(Xt) – σ²/(1 – φ²) = φ^{2t} [ Var(X0) – σ²/(1 – φ²) ]
• If |φ| < 1, both terms decay away to zero for large t
  → X is almost stationary for large t
• Equivalently, if we assume that the process has already been running for a very long time, it will be stationary
• Any AR(1) process with infinite history and |φ| < 1 will be stationary:
  ..., e-2, e-1, e0, e1, ..., et
  ..., X-2, X-1, X0, X1, ..., Xt
  (steady state is reached long before the observed time series begins)
• An AR(1) process with infinite history can be represented as
  Xt = µ + Σ_{j=0}^{∞} φ^j et-j
  and this converges only if |φ| < 1.
• The AR(1) model Xt = µ + φ(Xt-1 – µ) + et can be written as
  (1 – φB)(Xt – µ) = et
  If |φ| < 1, then (1 – φB) is invertible and
  Xt – µ = (1 – φB)^{-1} et = (1 + φB + φ²B² + ...) et
         = et + φ et-1 + φ² et-2 + ...
• So
  Xt = µ + Σ_{j=0}^{∞} φ^j et-j
  From this representation,
  µt = E(Xt) = µ and Var(Xt) = Σ_{j=0}^{∞} φ^{2j} σ² = σ²/(1 – φ²), if |φ| < 1.
• So, if |φ| < 1, the mean and variance are constant, as required for stationarity.
• We must also calculate the autocovariance γk = Cov(Xt, Xt+k) and show that it depends only on the lag k. We need the properties of covariance:
  Cov(X + Y, W) = Cov(X, W) + Cov(Y, W)
  Cov(X, c) = 0 for any constant c
• Since Xt-1 depends only on et-1, et-2, ... (and the starting value), and the et's are uncorrelated, et and Xt-1 are uncorrelated. Hence
  Cov(et, Xt-1) = 0
  Cov(et, Xt-k) = 0, for k ≥ 1
  Cov(et, Xt) = σ²
Then
  γ1 = Cov(Xt, Xt-1) = Cov(µ + φ(Xt-1 – µ) + et, Xt-1)
     = φ Cov(Xt-1, Xt-1) + Cov(et, Xt-1)
     = φ γ0 + 0
  γ2 = Cov(Xt, Xt-2) = Cov(µ + φ(Xt-1 – µ) + et, Xt-2)
     = φ Cov(Xt-1, Xt-2) + Cov(et, Xt-2)
     = φ γ1 + 0 = φ² γ0
In general,
  γk = Cov(Xt, Xt-k) = Cov(µ + φ(Xt-1 – µ) + et, Xt-k)
     = φ Cov(Xt-1, Xt-k) + Cov(et, Xt-k)
     = φ γk-1 + 0
Hence,
  γk = φ^k γ0 = φ^k σ²/(1 – φ²), for k ≥ 0
and
  ρk = γk/γ0 = φ^k, for k ≥ 0
→ the ACF decreases geometrically with k.
Recall that the partial autocorrelations ψ1 and ψ2 satisfy
  ψ1 = ρ1  and  ψ2 = (ρ2 – ρ1²)/(1 – ρ1²)
Here ψ1 = ρ1 = φ and
  ψ2 = (φ² – φ²)/(1 – φ²) = 0
In fact, ψk = 0 for k > 1.

In summary, for the AR(1) model (illustrated by the simulation sketch below),
• the ACF "tails off" to zero
• the PACF "cuts off" after lag 1
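Remark: a quick simulation check of these two facts, reusing the sample_acf and sample_pacf helpers sketched earlier (illustrative only; the value φ = 0.7 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
phi, n = 0.7, 5000
x = np.zeros(n)
for t in range(1, n):
    # AR(1) with mu = 0: X_t = phi * X_{t-1} + e_t
    x[t] = phi * x[t - 1] + rng.normal()

print(sample_acf(x, 4).round(2))   # close to (1, 0.7, 0.49, 0.34, 0.24), i.e. rho_k = phi^k
print(sample_pacf(x, 4).round(2))  # roughly (0.7, 0, 0, 0): cuts off after lag 1
```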
Example: Consumer price index Qt. The quantity
  rt = ln(Qt/Qt-1)
models the force of inflation. Assume rt follows an AR(1) process:
  rt = µ + φ(rt-1 – µ) + et
Note: here µ is the long-run mean. Ignoring et,
  rt – µ = φ(rt-1 – µ)
If |φ| < 1, then rt – µ → 0 and so rt → µ as t → ∞. In this case rt is said to be mean-reverting.
b. The AR(p) model and stationarity
Recall that the AR(p) model can be written either in its generic form
  Xt = µ + φ1(Xt-1 – µ) + φ2(Xt-2 – µ) + ... + φp(Xt-p – µ) + et
or using the B operator as
  (1 – φ1B – φ2B² – ... – φpB^p)(Xt – µ) = et

Result: AR(p) is stationary IF AND ONLY IF the roots of the characteristic equation
  1 – φ1z – φ2z² – ... – φpz^p = 0
are all greater than 1 in absolute value.
(1 – φ1z – φ2z² – ... – φpz^p is the characteristic polynomial.)

Explanation for this result: write the AR(p) process in the form
  (1 – B/z1)(1 – B/z2) ... (1 – B/zp)(Xt – µ) = et
where z1, ..., zp are the roots of the characteristic polynomial:
  1 – φ1z – ... – φpz^p = (1 – z/z1)(1 – z/z2) ... (1 – z/zp)
In the AR(1) case,
  1 – φz = 1 – z/z1, where z1 = 1/φ,
and we can invert the factor (1 – B/z1) in
  (1 – B/z1)(Xt – µ) = et
IF AND ONLY IF |z1| > 1. In the AR(p) case, we need to be able to invert all of the factors (1 – B/zi). This will be the case IF AND ONLY IF |zi| > 1 for i = 1, 2, ..., p.
Example: AR(2)
  Xt = 5 – 2(Xt-1 – 5) + 3(Xt-2 – 5) + et
  ⇔ (1 + 2B – 3B²)(Xt – 5) = et
→ 1 + 2z – 3z² = 0 is the characteristic equation here.

Question: when is an AR(1) process stationary?
Answer: we have
  Xt = µ + φ(Xt-1 – µ) + et,
i.e. (1 – φB)(Xt – µ) = et, so 1 – φz = 0 is the characteristic equation, with solution z = 1/φ. So |φ| < 1 is equivalent to |z| > 1, as required.

Question: Consider the AR(2) process Xn = Xn-1 – ½Xn-2 + en. Is it stationary?
Answer: Using the B operator: (1 – B + ½B²)Xn = en. So the characteristic equation is
  1 – z + ½z² = 0, with roots 1 ± i, and |1 ± i| = √2 > 1.
Since both roots satisfy |zi| > 1, the process is stationary.
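Remark: this kind of root check is easy to automate; a minimal sketch for the AR(2) example above (coefficients are passed to np.roots highest degree first):

```python
import numpy as np

# Characteristic polynomial 1 - z + 0.5 z^2 of X_n = X_{n-1} - 0.5 X_{n-2} + e_n
roots = np.roots([0.5, -1.0, 1.0])
print(roots)                      # [1.+1.j, 1.-1.j]
print(np.all(np.abs(roots) > 1))  # True -> the AR(2) process is stationary
```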
In the AR(1) model, we had γ1 = φγ0 and γ0 = φγ1 + σ². These are a particular case of the Yule-Walker equations for AR(p):
  Cov(Xt, Xt-k) = Cov(µ + φ1(Xt-1 – µ) + ... + φp(Xt-p – µ) + et, Xt-k)
                = φ1 Cov(Xt-1, Xt-k) + ... + φp Cov(Xt-p, Xt-k) + { σ², if k = 0; 0, otherwise }

c. Yule-Walker equations
The Yule-Walker equations are defined by the following relationship:
  γk = φ1 γk-1 + φ2 γk-2 + ... + φp γk-p + { σ², if k = 0; 0, otherwise },  for 0 ≤ k ≤ p

Considering the AR(1) (i.e. p = 1): for k = 1 we get γ1 = φγ0, and for k = 0 we get γ0 = φγ1 + σ².

Example (p = 3):
  γ3 = φ1 γ2 + φ2 γ1 + φ3 γ0
  γ2 = φ1 γ1 + φ2 γ0 + φ3 γ1
  γ1 = φ1 γ0 + φ2 γ1 + φ3 γ2
  γ0 = φ1 γ1 + φ2 γ2 + φ3 γ3 + σ²
Example: consider the AR(3) model Xt = 0.6Xt-1 + 0.4Xt-2 – 0.1Xt-3 + et.
Yule-Walker equations:
  γ0 = 0.6γ1 + 0.4γ2 – 0.1γ3 + σ²   (0)
  γ1 = 0.6γ0 + 0.4γ1 – 0.1γ2        (1)
  γ2 = 0.6γ1 + 0.4γ0 – 0.1γ1        (2)
  γ3 = 0.6γ2 + 0.4γ1 – 0.1γ0        (3)
From (1), γ2 = 6γ0 – 6γ1.
From (2), γ2 = 0.4γ0 + 0.5γ1; hence γ1 = (56/65)γ0, and hence γ2 = (54/65)γ0.
From (3), γ3 = (483/650)γ0.
From (0), σ² = 0.22508γ0.
Hence γ0 = 4.4429σ², γ1 = 3.8278σ², γ2 = 3.6910σ², γ3 = 3.3014σ²,
and so, since ρk = γk/γ0: ρ0 = 1, ρ1 = 0.862, ρ2 = 0.831, ρ3 = 0.743.
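Remark: as a check on the arithmetic, the same Yule-Walker system can be solved numerically; a minimal sketch, taking σ² = 1 so that the unknowns are (γ0, γ1, γ2, γ3):

```python
import numpy as np

# Yule-Walker equations (0)-(3) for X_t = 0.6 X_{t-1} + 0.4 X_{t-2} - 0.1 X_{t-3} + e_t,
# written as A @ (g0, g1, g2, g3) = b with sigma^2 = 1.
A = np.array([
    [1.0,  -0.6, -0.4,  0.1],   # (0): g0 - 0.6 g1 - 0.4 g2 + 0.1 g3 = 1
    [-0.6,  0.6,  0.1,  0.0],   # (1): g1 = 0.6 g0 + 0.4 g1 - 0.1 g2
    [-0.4, -0.5,  1.0,  0.0],   # (2): g2 = 0.4 g0 + 0.5 g1
    [0.1,  -0.4, -0.6,  1.0],   # (3): g3 = 0.6 g2 + 0.4 g1 - 0.1 g0
])
b = np.array([1.0, 0.0, 0.0, 0.0])
g = np.linalg.solve(A, b)
print(g.round(4))           # approx [4.4429, 3.8278, 3.6910, 3.3014]
print((g / g[0]).round(3))  # rho_k: [1, 0.862, 0.831, 0.743]
```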
It may be shown that for AR(p) models,
• the ACF "tails off" to zero,
• the PACF "cuts off" after lag p, i.e. ψk = 0 for k > p.
II.6 MA(q) models and invertibility
a. The MA(1) model
The model is given by Xt = µ + et + θet-1, where µt = E(Xt) = µ, and
  γ0 = Var(et + θet-1) = (1 + θ²)σ²
  γ1 = Cov(et + θet-1, et-1 + θet-2) = θσ²
  γk = 0 for k > 1
Hence, the ACF of MA(1) is:
  ρ0 = 1
  ρ1 = θ / (1 + θ²)
  ρk = 0 for k > 1
Since the mean E(Xt) and the covariances γk = Cov(Xt, Xt-k) do not depend on t, the MA(1) process is (weakly) stationary - for all values of the parameter θ.
However, we require MA models to be invertible, and this imposes conditions on the parameters.
Recall: if |φ| < 1 then in the AR(1) model
  (1 – φB)(Xt – µ) = et,
(1 – φB) is invertible and
  Xt = µ + Σ_{j=0}^{∞} φ^j et-j = µ + et + φ et-1 + φ² et-2 + ...
i.e. an AR(1) process is MA(∞). Conversely, an MA(1) process can be written as
  Xt – µ = (1 + θB)et
or
  (1 + θB)^{-1}(Xt – µ) = et
i.e.
  (Xt – µ) – θ(Xt-1 – µ) + θ²(Xt-2 – µ) – ... = et
So an MA(1) process can be represented as an AR(∞) one – but only if |θ| < 1, in which case the MA(1) process is said to be invertible.

Example: MA(1) with θ = 0.5 or θ = 2.
For both values of θ we have:
  ρ1 = θ/(1 + θ²) = 0.5/(1 + 0.5²) = 2/(1 + 2²) = 0.4
So both models have the same ACF. However, only the model with θ = 0.5 is invertible.

Question: Interpretation of invertibility.
Consider the MA(1) model Xn = µ + en + θen-1. We have
  en = Xn – µ – θen-1 = Xn – µ – θ(Xn-1 – µ – θen-2)
     = ...
     = (Xn – µ) – θ(Xn-1 – µ) + θ²(Xn-2 – µ) – ... + (–θ)^{n-1}(X1 – µ) + (–θ)^n e0
As n gets large, the dependence of en on e0 will be small if |θ| < 1.

Note:
  AR(1) is stationary IF AND ONLY IF |φ| < 1.
  MA(1) is invertible IF AND ONLY IF |θ| < 1.

For an MA(1) process, we have ρk = 0 for k > 1, so the ACF "cuts off" after lag 1. It may be shown that the PACF "tails off" to zero.
         AR(1)                    MA(1)
ACF      Tails off to zero        Cuts off after lag 1
PACF     Cuts off after lag 1     Tails off to zero
b. The MA(q) model and invertibility
An MA(q) process is given by Xt = µ + et + θ1 et-1 + ... + θq et-q, where {et} is a zero-mean white noise process. For this model we have γk = Cov(Xt, Xt-k) = 0 for k > q. Indeed, writing θ0 = 1,
  γk = Cov(Xt, Xt-k)
     = E[(et + θ1 et-1 + ... + θq et-q)(et-k + θ1 et-k-1 + ... + θq et-k-q)]
     = Σ_{i=0}^{q} Σ_{j=0}^{q} θi θj E(et-i et-j-k)
     = σ² Σ_{j=0}^{q-k} θj+k θj,   for k ≤ q,
since the only non-zero terms occur when the subscripts of et-i and et-j-k match, i.e. when i = j + k, which requires j ≤ q – k.
In summary, for k > q, γk = 0:
• For MA(q), the ACF cuts off after lag q
• For AR(p), the PACF cuts off after lag p
Question: ACF of the MA(2) process Xn = 1 + en – 5en-1 + 6en-2, where E(en) = 0 and Var(en) = 1.
  γ0 = Cov(en – 5en-1 + 6en-2, en – 5en-1 + 6en-2) = (1 + 25 + 36)(1) = 62
  γ1 = Cov(en – 5en-1 + 6en-2, en-1 – 5en-2 + 6en-3) = (–5)(1) + (6)(–5) = –35
  γ2 = Cov(en – 5en-1 + 6en-2, en-2 – 5en-3 + 6en-4) = (6)(1) = 6
  γk = 0, for k > 2
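Remark: these values can also be checked by simulation; a minimal sketch (the sample autocovariances are only approximately 62, -35, 6, 0 for a finite sample):

```python
import numpy as np

rng = np.random.default_rng(2)
e = rng.normal(size=200_000)                 # Var(e_n) = 1
x = 1 + e[2:] - 5 * e[1:-1] + 6 * e[:-2]     # X_n = 1 + e_n - 5 e_{n-1} + 6 e_{n-2}

def sample_autocov(x, k):
    xc = x - x.mean()
    return np.mean(xc[k:] * xc[:len(xc) - k])

print([round(sample_autocov(x, k), 1) for k in range(4)])  # roughly [62, -35, 6, 0]
```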
Recall that an AR(p) process is stationary IF AND ONLY IF the roots z of its characteristic equation satisfy |z| > 1. For an MA(q) process, we have
  Xt – µ = (1 + θ1B + θ2B² + ... + θqB^q) et
Consider the equation 1 + θ1z + θ2z² + ... + θqz^q = 0. The MA(q) process is invertible IF AND ONLY IF all roots z of this equation satisfy |z| > 1.
In summary:
• If AR(p) is stationary, then AR(p) = MA(∞)
• If MA(q) is invertible, then MA(q) = AR(∞)

Question: Assess the invertibility of the MA(2) process Xt = 2 + et – 5et-1 + 6et-2.
We have Xt = 2 + (1 – 5B + 6B²)et.
The relevant equation is 1 – 5z + 6z² = (1 – 2z)(1 – 3z) = 0, i.e. the roots are z = ½ and z = 1/3
→ Not invertible.
II.7 ARMA(p,q) models
Recall that the ARMA(p,q) model can be written either in its generic form
  Xt = µ + φ1(Xt-1 – µ) + ... + φp(Xt-p – µ) + et + θ1 et-1 + ... + θq et-q
or using the B operator:
  (1 – φ1B – ... – φpB^p)(Xt – µ) = (1 + θ1B + ... + θqB^q)et
i.e.
  φ(B)(Xt – µ) = θ(B)et
where
  φ(λ) = 1 – φ1λ – ... – φpλ^p
  θ(λ) = 1 + θ1λ + ... + θqλ^q
If φ(λ) and θ(λ) have factors in common, we simplify the defining relation.
Consider the simple ARMA(1,1) process with θ = –φ, written either
  Xt = φXt-1 + et – φet-1
or
  (1 – φB)Xt = (1 – φB)et, with |φ| < 1.
Dividing through by (1 – φB), we obtain Xt = et. Therefore the process is actually ARMA(0,0), also called white noise.
We therefore assume that φ(λ) and θ(λ) have no common factors. The properties of ARMA(p,q) are a mixture of those of AR(p) and those of MA(q):
• Characteristic polynomial of ARMA(p,q) = 1 – φ1z – ... – φpz^p (as for AR(p))
• ARMA(p,q) is stationary IF AND ONLY IF all the roots z of 1 – φ1z – ... – φpz^p = 0 satisfy |z| > 1
• ARMA(p,q) is invertible IF AND ONLY IF all the roots z of 1 + θ1z + ... + θqz^q = 0 satisfy |z| > 1

Example: the ARMA(1,1) process Xt = φXt-1 + et + θet-1 is stationary if |φ| < 1 and invertible if |θ| < 1.
Example: ACF of ARMA(1,1). For the model Xt = φXt-1 + et + θet-1 we have
  Cov(et, Xt-1) = 0
  Cov(et, et-1) = 0
  Cov(et, Xt) = φ Cov(et, Xt-1) + Cov(et, et) + θ Cov(et, et-1) = σ²
  Cov(et-1, Xt) = φ Cov(et-1, Xt-1) + Cov(et-1, et) + θ Cov(et-1, et-1)
                = φσ² + 0 + θσ² = (φ + θ)σ²
  γ0 = Cov(Xt, Xt) = φ Cov(Xt, Xt-1) + Cov(Xt, et) + θ Cov(Xt, et-1)
     = φγ1 + σ² + θ(φ + θ)σ²
     = φγ1 + (1 + φθ + θ²)σ²
  γ1 = Cov(Xt-1, Xt) = φ Cov(Xt-1, Xt-1) + Cov(Xt-1, et) + θ Cov(Xt-1, et-1)
     = φγ0 + θσ²
For k > 1,
  γk = Cov(Xt-k, Xt) = φ Cov(Xt-k, Xt-1) + Cov(Xt-k, et) + θ Cov(Xt-k, et-1)
     = φγk-1
(analogues of the Yule-Walker equations).
→ Solving for γ0 and γ1:
  γ0 = (1 + 2φθ + θ²)σ² / (1 – φ²)
  γ1 = (φ + θ)(1 + φθ)σ² / (1 – φ²)
  γk = φ^{k-1} γ1, for k > 1
Hence
  ρ1 = γ1/γ0 = (1 + φθ)(φ + θ) / (1 + 2φθ + θ²),  and ρk = φ^{k-1} ρ1 for k > 1
(compare ρk = φ^k, k ≥ 0, for AR(1)).
For a (stationary) ARMA(p,q) process:
• the ACF tails off to zero
• the PACF tails off to zero
Question: Consider the ARMA(2,2) process
  12Xt = 10Xt-1 – 2Xt-2 + 12et – 11et-1 + 2et-2
i.e.
  (12 – 10B + 2B²)Xt = (12 – 11B + 2B²)et
The roots of
  12 – 10z + 2z² = 2(z – 2)(z – 3) = 0
are z = 2 and z = 3; |z| > 1 for both roots, so the process is stationary.
II.8 ARIMA(p,d,q) models
a. Non-ARMA processes
• Given time series data x1, ..., xn, we want to find a model for this data.
• Calculate sample statistics: sample mean, sample ACF, sample PACF.
• Compare with the known ACF/PACF of the class of ARMA models to select a suitable model.
• All ARMA models considered are stationary – so they can only be used for stationary time series data.
• If the time series data is non-stationary, transform it to a stationary time series (e.g. by differencing).
• Model this transformed series using an ARMA model.
• Take the "inverse transform" of this model as the model for the original non-stationary time series.

Example: Random Walk. X0 = 0, Xn = Xn-1 + Zn, where Zn is a white noise process.
Xn is non-stationary, but ∇Xn = Xn – Xn-1 = Zn is stationary.

Question: Given x0, x1, ..., xn, the first order differences are wi = xi – xi-1, i = 1, ..., n.
From the differences w1, w2, ..., wn and x0 we can recover the original time series:
  w1 = x1 – x0, so x1 = x0 + w1
  w2 = x2 – x1, so x2 = x1 + w2 = x0 + w1 + w2, etc.
The inverse process of differencing is integration, since we must sum the differences to obtain the original time series.
b. The I(d) notation (“integrated of order d”)
• X is said to be I(0) if X is stationary
• X is said to be I(1) if X is not stationary but Yt = Xt – Xt-1 is stationary
• X is said to be I(2) if X is not stationary but Yt = Xt – Xt-1 is I(1), and so on.
Thus X is I(d) if X must be "differenced" d times to make it stationary.

Example: Suppose the first differences ∇xn = xn – xn-1 of x1, x2, ..., xn are modelled by the (stationary) AR(1) model
  ∇Xn = 0.5∇Xn-1 + en.
Then Xn – Xn-1 = 0.5(Xn-1 – Xn-2) + en, so Xn = 1.5Xn-1 – 0.5Xn-2 + en is the model for the original time series.
This AR(2) model is non-stationary: written as (1 – 1.5B + 0.5B²)Xn = en, its characteristic equation is
  1 – 1.5z + 0.5z² = 0
with roots z = 1 and z = 2. The model is non-stationary since |z| > 1 does not hold for BOTH roots.

X is ARIMA(p,1,q) if X is non-stationary, but ∇X (the first difference of X) is a stationary ARMA(p,q) process.
• Recall that a process X is I(1) if X is non-stationary, but ∇Xt = Xt – Xt-1 is stationary.
Note: if Xt is ARIMA(p,1,q) then Xt is I(1).
Example: Random Walk. Xt – Xt-1 = et, where et is a white noise process.
We have
  Xt = X0 + Σ_{j=1}^{t} ej
So E(Xt) = E(X0) if E(et) = 0, but Var(Xt) = Var(X0) + tσ². Hence Xt is non-stationary, but ∇Xt = et, where et is a stationary white noise process.

Example: Let Zt = closing share price on day t. Here the model is given by
  Zt = Zt-1 exp(µ + et)
Let Yt = ln Zt; then Yt = µ + Yt-1 + et. This is a random walk with drift.
Now consider the daily returns Yt – Yt-1 = ln(Zt/Zt-1). Since Yt – Yt-1 = µ + et and the et's are independent, Yt – Yt-1 is independent of Y1, ..., Yt-1; equivalently, ln(Zt/Zt-1) is independent of the past prices Z0, Z1, ..., Zt-1.
Example: Recall the example of Qt = consumer price index at time t. We assumed that
  rt = ln(Qt/Qt-1) follows the AR(1) model
  rt = µ + φ(rt-1 – µ) + et
i.e.
  ∇ln(Qt) = µ + φ(∇ln(Qt-1) – µ) + et
thus ∇ln(Qt) is AR(1) and so ln(Qt) is ARIMA(1,1,0).

If
• X needs to be differenced at least d times to reduce it to stationarity,
• and Y = ∇^d X is a stationary ARMA(p,q),
then X is an ARIMA(p,d,q) process. An ARIMA(p,d,q) process is I(d).
Example: Identify as ARIMA(p,d,q) the following model:
  Xt = 0.6Xt-1 + 0.3Xt-2 + 0.1Xt-3 + et – 0.25et-1
i.e.
  (1 – 0.6B – 0.3B² – 0.1B³)Xt = (1 – 0.25B)et
Check for a factor (1 – B) on the LHS: (1 – B)(1 + 0.4B + 0.1B²)Xt = (1 – 0.25B)et
→ The model is ARIMA(2,1,1).
Characteristic equation of the differenced process: 1 + 0.4z + 0.1z² = 0, with roots –2 ± i√6.
Since |z| = √10 > 1 for both roots, ∇Xt is stationary, as required.

Alternative method: write the model in terms of ∇Xt = Xt – Xt-1, ∇Xt-1, etc.:
  Xt – Xt-1 = –0.4(Xt-1 – Xt-2) – 0.1(Xt-2 – Xt-3) + et – 0.25et-1
  ∇Xt = –0.4∇Xt-1 – 0.1∇Xt-2 + et – 0.25et-1
Hence ∇Xt is ARMA(2,1) (check stationarity as above), and so Xt is ARIMA(2,1,1).
Note: if ∇^d Xt is ARMA(1,q), to check for stationarity we only need to check that |φ1| < 1.
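Remark: the same identification can be checked numerically from the roots of the two polynomials; a minimal sketch (a root at z = 1 signals a (1 – B) factor, i.e. d ≥ 1):

```python
import numpy as np

# AR polynomial 1 - 0.6 z - 0.3 z^2 - 0.1 z^3 (highest degree first for np.roots)
ar_roots = np.roots([-0.1, -0.3, -0.6, 1.0])
print(np.round(ar_roots, 3))            # one root equals 1 -> a (1 - B) factor, so d >= 1
print(np.abs(ar_roots).round(3))        # the other two roots have modulus sqrt(10) > 1
ma_roots = np.roots([-0.25, 1.0])       # MA polynomial 1 - 0.25 z
print(ma_roots, np.abs(ma_roots) > 1)   # root 4 -> invertible
```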
II.9 The Markov Property
AR(1) model: Xt = µ + φ(Xt-1 – µ) + et.
The conditional distribution of Xn+1, given Xn, Xn-1, ..., X0, depends only on Xn
→ AR(1) has the Markov property.

AR(2) model: Xt = µ + φ1(Xt-1 – µ) + φ2(Xt-2 – µ) + et.
The conditional distribution of Xn+1, given Xn, Xn-1, ..., X0, depends on Xn-1 as well as Xn
→ AR(2) does not have the Markov property.

Consider now (absorbing µ into the constant term)
  Xn+1 = µ + φ1Xn + φ2Xn-1 + en+1
or, in vector form,
  (Xn+1, Xn)ᵀ = (µ, 0)ᵀ + [[φ1, φ2], [1, 0]] (Xn, Xn-1)ᵀ + (en+1, 0)ᵀ
Define Yn = (Xn, Xn-1)ᵀ; then
  Yn+1 = (µ, 0)ᵀ + [[φ1, φ2], [1, 0]] Yn + (en+1, 0)ᵀ
• Y is said to be a vector autoregressive process of order 1.
• Notation: VAR(1)
• Y has the Markov property.
In general, AR(p) does not have the Markov property for p > 1, but Y = (Xt, Xt-1, ..., Xt-p+1)ᵀ does.
• Recall: the random walk – ARIMA(0,1,0), defined by Xt – Xt-1 = et – has independent increments and hence does have the Markov property.
It may be shown that for p + d > 1, an ARIMA(p,d,0) process does not have the Markov property, but Yt = (Xt, Xt-1, ..., Xt-p-d+1)ᵀ does.
Consider the MA(1) process Xt = µ + et + θet-1. It is clear that "knowing Xn will never be enough to deduce the value of en, on which the distribution of Xn+1 depends". Hence an MA(1) process does not have the Markov property.
Now consider an (invertible) MA(q) process, which is AR(∞). An AR(p) process has the Markov property when considered as the p-dimensional vector process Y = (Xt, Xt-1, ..., Xt-p+1)ᵀ, for p finite. Since here p is infinite, an MA(q) process has no finite-dimensional Markov representation.
Question: Associate a vector-valued Markov process with 2Xt = 5Xt-1 – 4Xt-2 + Xt-3 + et.
We have
  2(Xt – Xt-1) = 3(Xt-1 – Xt-2) – (Xt-2 – Xt-3) + et
  2∇Xt = 3∇Xt-1 – ∇Xt-2 + et
  2∇²Xt = ∇²Xt-1 + et, i.e. ∇²Xt = ½∇²Xt-1 + ½et
→ ARIMA(1,2,0), i.e. ARIMA(p,d,q) with p = 1 and d = 2.
Since p + d = 3 > 1, Yt = (Xt, Xt-1, ..., Xt-p-d+1)ᵀ = (Xt, Xt-1, Xt-2)ᵀ is Markov.
Question: Consider the MA(1) process Xn = en + en-1, where
  en = +1 with probability ½, –1 with probability ½.
Then
  P(Xn = 2 | Xn-1 = 0) = P(en = 1, en-1 = 1 | en-1 + en-2 = 0)
                       = P(en = 1) P(en-1 = 1 | en-1 + en-2 = 0)
                       = ½ × ½ = ¼
  P(Xn = 2 | Xn-1 = 0, Xn-2 = 2) = P(en = 1, en-1 = 1 | en-1 + en-2 = 0, en-2 + en-3 = 2) = 0
(since Xn-2 = 2 forces en-2 = 1, and then Xn-1 = 0 forces en-1 = –1).
→ Not Markov: the two conditional probabilities differ, so the distribution of Xn given the past does not depend on the immediate past Xn-1 only.
III. Non-stationarity: trends and techniques
III.1 Typical trends
Possible causes of non-stationarity in a time series are:
• a deterministic trend (e.g. linear or exponential growth)
• a deterministic cycle (e.g. seasonal effects)
• the time series is integrated (i.e. it needs to be differenced)

Example:
  Xn = Xn-1 + Zn, where Zn = +1 with probability 0.6, –1 with probability 0.4.
Here Xn is I(1), since Zn = Xn – Xn-1 is stationary. Also, E(Xn) = E(Xn-1) + 0.2, so the process has a deterministic trend.

Many techniques allow us to detect non-stationary series; among the simplest methods are:
• a plot of the time series against t
• the sample ACF
The sample ACF is an estimate of the theoretical ACF, based on the sample data (see section II.2.d). A plot of the time series will highlight a trend in the data and will show up any cyclic variation.
[Sketches: a time series with a trend; a time series with a seasonal pattern (e.g. over 2003-2004); a time series with both trend and seasonal pattern.]
Recall: for a stationary time series, ρk → 0 as k → ∞, i.e. the (theoretical) ACF converges toward zero. Hence, the sample ACF should also converge toward zero. If the sample ACF decreases slowly, the time series is non-stationary and needs to be differenced before fitting a model.
[Sketches: two sample correlograms rk against the lag k - one decaying rapidly to zero, one decaying slowly / oscillating with period 12.]
If sample ACF exhibits periodic oscillation, there is probably a seasonal pattern in the data. This
should be removed before fitting a model (see Figures 7.3a and 7.3b). The following graph (Fig 7.3(a))
shows the number of hotel rooms occupied over several years. Inspection shows the clear seasonal
dependence, manifested as a cyclic effect.
The next graph (Fig 7.3(b)) shows the sample autocorrelation function for this data. It is clear that the
seasonal effect shows up as a cycle in this function. In particular, the period of this cycle looks to be 12
months, reinforcing the idea that it is a seasonal effect.
!"#$%&'()*+'
Seasonal variation- hotel room occupancy (7.3a) 1963-1976 and its sample ACF (7.3b)
Methods for removing a linear trend:
• Least squares
• Differencing

Methods for removing a seasonal effect:
• Seasonal differencing
• Method of moving averages
• Method of seasonal means
III.2 Least squares trend removal
Fit a model
  Xt = a + bt + Yt
where Yt is a zero-mean, stationary process.
Recall: the et are the error variables ("true residuals") in a regression model; assume et ~ IN(0, σ²).
• Estimate the parameters a and b using linear regression.
• Fit a stationary model to the residuals:
  ŷt = xt – (â + b̂t)
Note: least squares may also be used to remove nonlinear trends from a time series. It is naturally possible to model any observed nonlinear trend by some term g(t) within
  Xt = g(t) + Yt
which can be estimated using least squares. For example, a plot of hourly energy load data over a one-day time frame may indicate quadratic variation over the day; in this case one could use g(t) = a + bt².
III.3 Differencing
a. Differencing and linear trend removal
Use differencing if the sample ACF decreases slowly. If there is a linear trend, e.g. xt = a + bt + yt, then
  ∇xt = xt – xt-1 = b + ∇yt,
so differencing has removed the linear trend. If xt is I(d), then differencing xt d times will make it stationary. Differencing xt once will remove any linear trend, as above.
Suppose xt is I(1) with a linear trend. If we difference xt once, then ∇xt is stationary and we have removed the trend. However, if we remove the trend using linear regression, we will still be left with an I(1) process that is non-stationary.

Example:
  Xn = Xn-1 + Zn, where Zn = +1 with probability 0.6, –1 with probability 0.4.
Let X0 = 0. Then E(X1) = 0.2, since E(Z1) = 0.2, and
  E(X2) = 0.2(2), ..., E(Xn) = 0.2n.
So Xn is I(1) AND Xn has a linear trend.
Let Yn = Xn – 0.2n. Then E(Yn) = 0, so we have removed the linear trend, but
  Yn – Yn-1 = Xn – Xn-1 – 0.2 = Zn – 0.2
Hence Yn is a random walk (which is non-stationary) with ∇Yn stationary, so Yn is an I(1) process.
b. Selection of d
How many times (d) do we have to difference the time series Xt to convert it to stationarity? This will
determine the parameter d in the fitted ARIMA(p,d,q) model.
Recall the three causes of non-stationarity:
• Trend
• Cycle
• The time series is an integrated series
We are assuming that linear trends and cycles have been removed, so if the plot of the time series and its
SACF indicate non-stationarity, it could be that the time series is a realisation of an integrated process
and so must be differenced a number of times to achieve stationarity.
Choosing an appropriate value of d:
• Look at the SACF. If the SACF decays slowly to zero, this indicates a need for differencing (for a stationary ARMA model, the SACF decays rapidly to zero).
• Look at the sample variance of the original time series X and of its differences.
Let σ̂²(d) be the sample variance of z(d) = ∇^d x. It is normally the case that σ̂²(d) first decreases with d until stationarity is reached, and then starts to increase, since differencing too much introduces correlation. Take d equal to the value that minimises σ̂²(d).

[Sketch: σ̂²(d) plotted against d = 0, 1, 2, 3; the minimum is at d = 2.]
In the above example, take d = 2, which is the value for which the estimated variance is minimised.
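Remark: a minimal sketch of this variance criterion (the data x below, a quadratic trend plus noise, is an illustrative assumption and typically needs d = 2):

```python
import numpy as np

def choose_d(x, max_d=3):
    """Return the number of differences d minimising the sample variance of the differenced series."""
    variances = []
    for d in range(max_d + 1):
        z = np.diff(x, n=d) if d else np.asarray(x, dtype=float)
        variances.append(z.var())
    return int(np.argmin(variances)), variances

t = np.arange(200)
x = 0.05 * t**2 + np.random.default_rng(3).normal(size=200)
print(choose_d(x))   # d = 2, plus the list of sample variances for d = 0, 1, 2, 3
```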
III.4 Seasonal differencing
Example: Let X be the monthly average temperature in London. Suppose that the model
  xt = µ + st + yt
applies, where st is a periodic function with period 12 and yt is stationary. The seasonal difference of X is defined as:
  (∇12 x)t = xt – xt-12
But xt – xt-12 = (µ + st + yt) – (µ + st-12 + yt-12) = yt – yt-12, since st = st-12.
Hence xt – xt-12 is a stationary process. We can model xt – xt-12 as a stationary process and thus obtain a model for xt.

Example: In the UK, monthly inflation figures are obtained by seasonal differencing of the retail prices index (RPI). If xt is the value of the RPI for month t, then the annual inflation figure for month t is
  (xt – xt-12)/xt-12 × 100%

Remark 1: the number of seasonal differences taken is denoted by D. For example, for the seasonal differencing Xt – Xt-12 = ∇12 Xt we have D = 1.
Remark 2: in practice, for most time series we would need at most d = 1 and D = 1.
III.5 Method of moving averages
This method makes use of a simple linear filter to eliminate the effects of periodic variation. If X is a time series with seasonal effects of even period d = 2h, we define a smoothed process Y by
  yt = (1/2h) [ ½xt-h + xt-h+1 + ... + xt-1 + xt + ... + xt+h-1 + ½xt+h ]
This ensures that each period makes an equal contribution to yt.

Example with quarterly data: a yearly period has d = 4 = 2h, so h = 2, and
  yt = ¼ ( ½xt-2 + xt-1 + xt + xt+1 + ½xt+2 )
This is a centred moving average, since the average is taken symmetrically around the time t. Such an average can only be calculated retrospectively.

For odd periods d = 2h + 1, the end terms xt-h and xt+h need not be halved:
  yt = 1/(2h+1) ( xt-h + xt-h+1 + ... + xt-1 + xt + ... + xt+h-1 + xt+h )

Example: with data every 4 months, a yearly period has d = 3 = 2h + 1, so h = 1 and
  yt = 1/3 (xt-1 + xt + xt+1)
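Remark: a minimal sketch of the centred moving average for quarterly data (even period d = 4), using the weights above; the data values are illustrative:

```python
import numpy as np

def centred_ma(x, period):
    """Centred moving average for an even period d = 2h: half weights on the two end terms."""
    h = period // 2
    weights = np.array([0.5] + [1.0] * (2 * h - 1) + [0.5]) / period
    # 'valid' mode: y_t is only defined for t = h, ..., n - h - 1 (retrospective smoothing)
    return np.convolve(x, weights, mode="valid")

quarterly = np.array([10, 20, 30, 40, 12, 22, 32, 42], dtype=float)
print(centred_ma(quarterly, 4))   # the seasonal pattern of period 4 is averaged out
```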
III.6 Seasonal means
In fitting the seasonal model
  xt = µ + st + yt, with E(Yt) = 0 (additive model)
to a monthly time series x extending over 10 years from January 1990, the estimate of µ is x̄ (the average over all 120 observations) and the estimate of sJanuary is
  ŝJanuary = (1/10)(x1 + x13 + ... + x109) – x̄,
the difference between the average value for January and the overall average over all the months.
Recall that st is a periodic function with period 12 and yt is stationary. Thus st contains the deviation of the model (from the overall mean µ) at time t due to the seasonal effect.

  Month \ Year    1      2      ...    10     | seasonal estimate
  January         x1     x13    ...    x109   | ŝ1
  ...             ...    ...    ...    ...    | ...
  December        x12    x24    ...    x120   | ŝ12
  (overall mean: x̄)
III.7 Filtering, smoothing
Filtering and exponential smoothing techniques are commonly applied to time series in order to “clean”
the original series from undesired artifacts. The moving average is an example of a filtering technique.
Other filters may be applied depending on the nature of the input series.
Exponential smoothing is another common set of techniques. It is typically used to "simplify" the input time series by dampening its variations so as to retain primarily the underlying dynamics.
III.8 Transformations
Recall: in the simple linear model
  yi = β0 + β1xi + ei
where ei ~ IN(0, σ²), we use regression diagnostic plots of the residuals êi to test the assumptions of the model (e.g. the normality of the error variables ei, or their constant variance). To test the latter assumption we plot the residuals against the fitted values.

[Sketch: residuals êi plotted against fitted values ŷi, scattered randomly around 0 with no pattern.]

If the plot does not appear as above, the data is transformed; the most common transformation is the logarithmic transformation.
Similarly, if after fitting an ARMA model to a time series xt a plot of the "residuals" versus the "fitted values" indicates a dependence, then we should consider modelling a transformation of the time series xt, the most common transformation being the logarithmic transformation
  Yt = ln(Xt)
IV. Box-Jenkins methodology
IV.1 Overview
We consider how to fit an ARIMA(p,d,q) model to historical data {x1, x2, ...xn}. We assume that trends
and seasonal effects have been removed from the data.
The methodology developed by Box and Jenkins consists of 3 distinct steps:
• Tentative identification of an ARIMA model
• Estimation of the parameters of the identified model
• Diagnostic checks
If the tentatively identified model passes the diagnostic tests, it can be used for forecasting.
If it does not, the diagnostic tests should indicate how the model should be modified, and a new cycle of
• Identification
• Estimation
• Diagnostic checks
is performed.
IV.2 Model selection
a. Identification of white noise
Recall: in a simple linear regression model, yi = β0 + β1xi + ei, ei ~ IN(0, σ²), we use regression diagnostic plots of the residuals êi to test the goodness of fit of the model, i.e. whether the assumptions ei ~ IN(0, σ²) are justified.
The error variables ei form a zero-mean white noise process: they are uncorrelated, with common variance σ².
Recall: {et : t ∈ Z} is a zero-mean white noise process if
  E(et) = 0 for all t
  γk = Cov(et, et-k) = σ², if k = 0; 0, otherwise.
Thus the ACF and PACF of a white noise process (when plotted against k) are zero at every positive lag:

[Sketch: ACF ρk and PACF ψk of white noise plotted against k; apart from ρ0 = 1, all values are zero.]

i.e. apart from ρ0 = 1, we have ρk = 0 for k = 1, 2, ... and ψk = 0 for k = 1, 2, ...
Question: how do we test whether the residuals from a time series model look like a realisation of a white noise process?
Answer: we look at the SACF and SPACF of the residuals. In studying the SACF and SPACF, we realise that even if the original process was white noise we would not expect rk = 0 for k = 1, 2, ... and ψ̂k = 0 for k = 1, 2, ..., since rk is only an estimate of ρk and ψ̂k is only an estimate of ψk.

Question: how close to 0 should rk and ψ̂k be, if ρk = 0 for k = 1, 2, ... and ψk = 0 for k = 1, 2, ...?
Answer: if the original model is white noise, Xt = µ + et, then for each k the SACF and SPACF satisfy (approximately, for large n)
  rk ~ N(0, 1/n)  and  ψ̂k ~ N(0, 1/n)
Values of rk or ψ̂k outside the range (–2/√n, +2/√n) can be taken as suggesting that a white noise model is inappropriate.
However, these are only approximate 95% confidence intervals: if ρk = 0, we can be 95% certain that rk lies between these limits, so 1 value in 20 will lie outside them even if the white noise model is correct. Hence a single value of rk or ψ̂k outside these limits would not be regarded as significant on its own, but three such values might well be significant.

There is an overall goodness-of-fit test, based on all the rk's in the SACF rather than on individual rk's, called the portmanteau test of Ljung and Box. It checks whether the first m sample autocorrelation coefficients of the residuals are too large to resemble those of a white noise process (for which they should all be negligible).
Given the residuals from an estimated ARMA(p,q) model, the test statistic is
  Q = n(n + 2) Σ_{k=1}^{m} rk²/(n – k)
Under the null hypothesis that all the ρk are 0, Q is asymptotically χ²-distributed with s = m – p – q degrees of freedom (or s = m – p – q – 1 if a constant, say µ, is also estimated); when the test is applied directly to an observed series rather than to residuals, the degrees of freedom are simply m.
If the Q-statistic is found to be greater than the 95th percentile of that χ² distribution, the null hypothesis is rejected, which means that the alternative hypothesis that "at least one autocorrelation is non-zero" is accepted. Statistical packages print these statistics. For large n, the Ljung-Box Q-statistic closely approximates the Box-Pierce statistic:
  n(n + 2) Σ_{k=1}^{m} rk²/(n – k) ≈ n Σ_{k=1}^{m} rk²
The overall diagnostic test is therefore performed as follows (for centred realisations):
• Fit an ARMA(p,q) model
• Estimate the (p + q) parameters
• Test whether
  Q = n(n + 2) Σ_{k=1}^{m} rk²/(n – k) ~ χ²_{m–p–q}
Remark: the above Ljung-Box Q-statistic was suggested as an improvement on the simpler Box-Pierce test statistic
  Q = n Σ_{k=1}^{m} rk²
which was found to perform poorly even for moderately large sample sizes.
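Remark: a minimal sketch of the Ljung-Box check on a vector of residuals, reusing the sample_acf helper sketched earlier; scipy is assumed to be available for the χ² tail probability:

```python
import numpy as np
from scipy.stats import chi2

def ljung_box(residuals, m, n_params):
    """Ljung-Box Q-statistic on the first m residual autocorrelations."""
    resid = np.asarray(residuals, dtype=float)
    n = len(resid)
    r = sample_acf(resid, m)[1:]                              # r_1, ..., r_m
    q = n * (n + 2) * np.sum(r**2 / (n - np.arange(1, m + 1)))
    p_value = chi2.sf(q, df=m - n_params)                     # df = m - (p + q)
    return q, p_value

# Example usage for residuals of a fitted ARMA(1,1), so n_params = 2:
# q, p = ljung_box(residuals, m=20, n_params=2); reject the white noise hypothesis if p < 0.05
```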
b. Identification of MA(q)
Recall: for an MA(q) process, ρk = 0 for all k > q, i.e. the "ACF cuts off after lag q".
To test whether an MA(q) model is appropriate, we check whether rk is close to 0 for all k > q. If the data do come from an MA(q) model (so that only the first q + 1 coefficients ρ0, ..., ρq are non-zero), then for k > q, approximately for large n,
  rk ~ N( 0, (1/n)(1 + 2 Σ_{i=1}^{q} ρi²) )
and 95% of the rk's should lie in the interval
  ( –1.96 √[(1 + 2 Σ_{i=1}^{q} ρi²)/n], +1.96 √[(1 + 2 Σ_{i=1}^{q} ρi²)/n] )
(note that it is common to use 2 instead of 1.96 in the above formula). We would expect 1 in 20 values to lie outside the interval. In practice, the ρi's are replaced by the ri's. The "confidence limits" on SACF plots are based on this. If rk lies outside these limits it is "significantly different from zero" and we conclude that ρk ≠ 0. Otherwise, rk is not significantly different from zero and we conclude that ρk = 0.

[Sketch: SACF bar plot of rk against k, with dashed confidence limits.]
For q = 0, the limits for k = 1 are
  ( –1.96/√n, +1.96/√n )
as for testing for a white noise model; the coefficient r1 is compared with these limits. For q = 1, the limits for k = 2 are
  ( –1.96 √[(1 + 2r1²)/n], +1.96 √[(1 + 2r1²)/n] )
and r2 is compared with these limits. Again, 2 is often used in place of 1.96.
c. Identification of AR(p)
Recall: for an AR(p) process, we have ψk = 0 for all k > p, i.e. the "PACF cuts off after lag p".
To test whether an AR(p) model is appropriate, we check whether the sample estimate ψ̂k is close to 0 for all k > p. If the data do come from an AR(p) model, then for k > p, approximately for large n,
  ψ̂k ~ N(0, 1/n)
and 95% of the sample estimates should lie in the interval
  ( –2/√n, +2/√n )
The "confidence limits" on SPACF plots are based on this: if the sample estimate ψ̂k lies outside these limits, it is "significant".

[Sketch: sample PACF of an AR(1) process plotted against the lag k; only the lag-1 value is significant.]
IV.3 Model fitting
a. Fitting an ARMA(p,q) model
We make the following assumptions:
• An appropriate value of d has been found and {zd+1, zd+2, ..., zn} (the differenced data) is stationary.
• The sample mean z̄ = 0; if not, subtract µ̂ = z̄ from each zt.
• For simplicity, we assume that d = 0 (to simplify the upper and lower limits of sums).
We look for an ARMA(p,q) model for the data z:
• If the SACF appears to cut off after lag q, an MA(q) model is indicated (using the tests of significance described previously).
• If the SPACF appears to cut off after lag p, an AR(p) model is indicated.
• If neither the SACF nor the SPACF cuts off, mixed models must be considered, starting with ARMA(1,1).
b. Parameter estimation: LS and ML
Having identified the values of p and q, we must now estimate the values of the parameters φ1, φ2, ..., φp and θ1, θ2, ..., θq in the model
  Zt = φ1Zt-1 + ... + φpZt-p + et + θ1et-1 + ... + θqet-q
Least squares (LS) estimation is equivalent to maximum likelihood (ML) estimation if et is assumed normally distributed.

Example: in the AR(p) model, et = Zt – φ1Zt-1 – ... – φpZt-p. The estimators φ̂1, ..., φ̂p are chosen to minimise
  Σ_{t=p+1}^{n} (zt – φ̂1zt-1 – ... – φ̂pzt-p)²
Once these estimates are obtained, the residual at time t is given by
  êt = zt – φ̂1zt-1 – ... – φ̂pzt-p
For general ARMA models, êt cannot be deduced directly from the zt. In the MA(1) model, for instance,
  êt = zt – θ̂1êt-1
We can solve this iteratively for êt as long as some starting value ê0 is assumed. For an ARMA(p,q) model, the list of starting values is (ê0, ê1, ..., êq-1). The starting values are estimated recursively by backforecasting:
0. Assume (ê0, ê1, ..., êq-1) are all zero
1. Estimate the φi and θj
2. Use forecasting on the time-reversed process {zn, ..., z1} to predict values for (ê0, ê1, ..., êq-1)
3. Repeat the cycle (1)-(2) until the estimates converge.
c. Parameter estimation: method of moments
• Calculate the theoretical ACF of ARMA(p,q): the ρk's will be functions of the φ's and θ's.
• Set ρk = rk and solve for the φ's and θ's. These are the method of moments estimators.

Example: you have decided to fit the following MA(1) model:
  xn = en + θen-1, en ~ N(0,1)
You have calculated γ̂0 = 1 and γ̂1 = –0.25. Estimate θ.
We have r1 = γ̂1/γ̂0 = –0.25.
Recall: here γ0 = (1 + θ²)σ² = 1 + θ² and γ1 = θσ² = θ, from which ρ1 = θ/(1 + θ²).
Setting ρ1 = r1, i.e. θ/(1 + θ²) = –0.25, and solving for θ gives θ = –0.268 or θ = –3.732.
Recall: the MA(1) process is invertible IF AND ONLY IF |θ| < 1. So for θ = –0.268 the model is invertible, but for θ = –3.732 the model is not invertible.
Note: if instead γ̂1 = –0.5 here, then ρ1 = r1 = θ/(1 + θ²) = –0.5, which gives (θ + 1)² = 0, so θ = –1, and neither estimate gives an invertible model.
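Remark: the algebra in this example amounts to solving a quadratic in θ; a minimal sketch:

```python
import numpy as np

r1 = -0.25
# rho_1 = theta / (1 + theta^2) = r1  <=>  r1 * theta^2 - theta + r1 = 0
theta_candidates = np.roots([r1, -1.0, r1])
print(theta_candidates)                                    # approx [-3.732, -0.268]
print(theta_candidates[np.abs(theta_candidates) < 1])      # keep the invertible root, -0.268
```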
Now let us estimate σ² = Var(et).
Recall that in the simple linear model Yi = β0 + β1Xi + ei, ei ~ IN(0, σ²), σ² is estimated by
  σ̂² = (1/(n – 2)) Σ_{i=1}^{n} êi²
where êi = yi – β̂0 – β̂1xi is the ith residual. Here we use
  σ̂² = (1/n) Σ_{t=p+1}^{n} êt²
     = (1/n) Σ_{t=p+1}^{n} (zt – φ̂1zt-1 – ... – φ̂pzt-p – θ̂1êt-1 – ... – θ̂qêt-q)²
No matter which estimation method is used, this parameter is estimated last, as estimates of the φ's and θ's are required first.
Note: in using either least squares or maximum likelihood estimation we also obtain the residuals êt, whereas when using the method of moments to estimate the φ's and θ's these residuals have to be calculated afterwards.
Note: for large n, there will be little difference between the LS, ML and method of moments estimators.
d. Diagnostic checking
Assume we have identified a tentative ARIMA(p,d,q) model and calculated the estimates µ̂, σ̂, φ̂1, ..., φ̂p, θ̂1, ..., θ̂q.
We must perform diagnostic checks based on the residuals. If the ARMA(p,q) model is a good approximation to the underlying time series process, then the residuals êt will form a good approximation to a white noise process.

(I) Tests to see if the residuals are white noise:
• Study the SACF and SPACF of the residuals. Do rk and ψ̂k lie outside (–1.96/√n, +1.96/√n)?
• Portmanteau test of the residuals (carried out on the residual SACF):
  n(n + 2) Σ_{k=1}^{m} rk²/(n – k) ~ χ²_{m–s}, where s = number of parameters of the model
If the SACF or SPACF of the residuals has too many values outside the interval (–1.96/√n, +1.96/√n), we conclude that the fitted model does not have enough parameters, and a new model with additional parameters should be fitted. The portmanteau test may also be used for this purpose. Other tests are:
• inspection of the graph of {êt}
• counting turning points
• study of the sample spectral density function of the residuals

(II) Inspection of the graph of {êt}:
• plot êt against t
• plot êt against zt
Any patterns evident in these plots may indicate that the residuals are not a realisation of a set of independent (uncorrelated) variables, and so that the model is inadequate.
(III) Counting turning points:
This is a test of independence. Are the residuals a realisation of a set of independent variables?
Possible configurations of three successive points are:
[Sketch: the six possible orderings of three successive values; all except the two monotone ones, (a) and (b), exhibit a turning point.]
Since four out of the six possible configurations exhibit a turning point, the probability of observing one is 4/6 = 2/3.
If y1, y2, ..., yn is a sequence of numbers, the sequence has a turning point at time k if
either
yk-1 < yk AND yk > yk+1
or
yk-1 > yk AND yk < yk+1
Result: if Y1, Y2, ..., YN is a sequence of independent random variables, then
• the probability of a turning point at time k is 2/3
• the expected number of turning points is (2/3)(N – 2)
• the variance is (16N – 29)/90
[Kendall and Stuart, "The Advanced Theory of Statistics", 1966, vol 3, p.351]
Therefore, the number of turning points in a realisation of Y1, Y2, ..., YN should lie within the 95% confidence interval
  ( (2/3)(N – 2) – 1.96 √[(16N – 29)/90], (2/3)(N – 2) + 1.96 √[(16N – 29)/90] )
(IV) Study the sample spectral density function of the residuals:
Recall: the spectral density function of a white noise process is f(ω) = σ²/2π, –π < ω < π. So the sample spectral density function of the residuals should be roughly constant for a white noise process.
V. Forecasting
V.1 The Box-Jenkins approach
Having fitted an ARMA model to {x1, x2, ..., xn}, we have the equation
  Xn+k = µ + φ1(Xn+k-1 – µ) + ... + φp(Xn+k-p – µ) + en+k + θ1en+k-1 + ... + θqen+k-q

[Sketch: the observed values x1, x2, ..., xn on the time axis, with the future value xn+k at time n + k still unknown.]

x̂n(k) = forecast value of Xn+k, given all observations up until time n
       = the k-step ahead forecast at time n.
In the Box-Jenkins approach, x̂n(k) is taken as E(Xn+k | X1, ..., Xn), i.e. x̂n(k) is the conditional expectation of the future value of the process, given the information currently available.
From result 2 in ST3053 (section A), we know that E(Xn+k | X1, ..., Xn) minimises the mean square error E(Xn+k – h(X1, ..., Xn))² over all functions h(X1, ..., Xn).
x̂n(k) is calculated as follows from the equation for Xn+k:
• Replace all unknown parameters by their estimated values
• Replace the random variables X1, ..., Xn by their observed values x1, ..., xn
• Replace the random variables Xn+1, ..., Xn+k-1 by their forecast values x̂n(1), ..., x̂n(k-1)
• Replace the variables e1, ..., en by the residuals ê1, ..., ên
• Replace the future variables en+1, ..., en+k by their expectations, 0.

Example: AR(2) model xn = µ + φ1(xn-1 – µ) + φ2(xn-2 – µ) + en. Since
  Xn+1 = µ + φ1(Xn – µ) + φ2(Xn-1 – µ) + en+1
  Xn+2 = µ + φ1(Xn+1 – µ) + φ2(Xn – µ) + en+2
we have
  x̂n(1) = µ̂ + φ̂1(xn – µ̂) + φ̂2(xn-1 – µ̂)
  x̂n(2) = µ̂ + φ̂1(x̂n(1) – µ̂) + φ̂2(xn – µ̂)
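Remark: a minimal sketch of these recursive forecasts for a fitted AR(2); the data and parameter values below are illustrative assumptions, not estimates from real data:

```python
import numpy as np

def ar2_forecast(x, mu, phi1, phi2, steps):
    """k-step ahead Box-Jenkins forecasts for a fitted AR(2) model."""
    hist = list(x)                      # observed values; forecasts are appended as we go
    forecasts = []
    for _ in range(steps):
        x_hat = mu + phi1 * (hist[-1] - mu) + phi2 * (hist[-2] - mu)
        forecasts.append(x_hat)
        hist.append(x_hat)              # later steps use earlier forecasts in place of unknown values
    return forecasts

x_obs = [10.2, 9.8, 10.5, 10.1]
print(ar2_forecast(x_obs, mu=10.0, phi1=0.5, phi2=0.2, steps=3))
```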
Example: 2-step ahead forecast of the ARMA(2,2) model
  xn = µ + φ1(xn-1 – µ) + φ2(xn-2 – µ) + en + θ2en-2.
Since xn+2 = µ + φ1(xn+1 – µ) + φ2(xn – µ) + en+2 + θ2en, we have
  x̂n(2) = µ̂ + φ̂1(x̂n(1) – µ̂) + φ̂2(xn – µ̂) + θ̂2ên

The (forecast) error of the forecast x̂n(k) is
  xn+k – x̂n(k)
The expected value of this error is
  E(Xn+k – x̂n(k) | x1, ..., xn) = x̂n(k) – x̂n(k) = 0
Hence the variance of the forecast error is
  E((Xn+k – x̂n(k))² | x1, ..., xn)
This is needed for confidence interval forecasts, which are more useful than a point estimate.
For stationary processes, it may be shown that x̂n(k) → µ as k → ∞. Hence the variance of the forecast error tends to E(Xn+k – µ)², the variance of the process, as k → ∞.
V.2 Forecasting ARIMA processes
If X is ARIMA(p,d,q), then Z = ∇^d X is ARMA(p,q).
• Use the methods reviewed above to produce forecasts for Z
• Reverse the differencing procedure to produce forecasts for X

Example: if X is ARIMA(0,1,1) then Z = ∇X is ARMA(0,1), leading to the forecast ẑn(1).
But Xn+1 = Xn + Zn+1, so x̂n(1) = xn + ẑn(1).

Question: Find x̂n(2) for an ARIMA(1,2,1) process.
Let Zn = ∇²Xn and assume Zn = µ + φ(Zn-1 – µ) + en + θen-1. Now
  Zn+2 = ∇²Xn+2 = (Xn+2 – Xn+1) – (Xn+1 – Xn) = Xn+2 – 2Xn+1 + Xn
so Xn+2 = 2Xn+1 – Xn + Zn+2. Hence,
  x̂n(2) = 2x̂n(1) – xn + ẑn(2) = 2x̂n(1) – xn + µ̂ + φ̂(ẑn(1) – µ̂)
V.3 Exponential smoothing and Holt-Winters
• The Box-Jenkins method requires a skilled operator in order to obtain reliable results.
• For cases where only a simple forecast is needed, exponential smoothing is much simpler (Holt, 1958).
A weighted combination of past values is used to predict future observations. For example, the one-step ahead forecast is obtained by
x̂n(1) = α ( xn + (1 – α) xn-1 + (1 – α)² xn-2 + ... )
or
x̂n(1) = α Σi≥0 (1 – α)^i xn-i = [α / (1 – (1 – α)B)] xn
• The sum of the weights is α Σi≥0 (1 – α)^i = α / (1 – (1 – α)) = 1.
• Generally we use a value of α with 0 < α < 1, so that there is less emphasis on historic values further back in time (usually 0.2 ≤ α ≤ 0.3).
• There is only one parameter to control, usually estimated via least squares.
• The weights decrease geometrically – hence the name exponential smoothing.
Updating forecasts is easy with exponential smoothing:
It is easy to see that
x̂n(1) = (1 – α) x̂n-1(1) + α xn = x̂n-1(1) + α (xn – x̂n-1(1))
Current forecast = previous forecast + α × (error in previous forecast).
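For illustration only (not from the notes), a minimal Python sketch of this recursive update; α = 0.25 sits in the typical 0.2–0.3 range quoted above, and starting the recursion at the first observation is simply one common, assumed choice.

def exponential_smoothing(x, alpha=0.25):
    # Return the sequence of one-step ahead forecasts x_hat_t(1) for t = 1, ..., n.
    forecast = x[0]                                      # assumed initial forecast
    forecasts = []
    for obs in x:
        forecast = forecast + alpha * (obs - forecast)   # previous forecast + alpha * error
        forecasts.append(forecast)                       # forecast of the next observation
    return forecasts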
• Simple exponential smoothing cannot cope with trend or seasonal variation.
• Holt-Winters smoothing can cope with trend and seasonal variation.
• Holt-Winters can sometimes outperform Box-Jenkins forecasts.
V.4 Linear filtering
[Diagram: the input time series {xt} passes through a linear filter (defined by its filter weights) to produce the output time series {yt}.]
A linear filter is a transformation of a time series {xt} (the input series) to create an output series {yt}
which satisfies:
yt = Σk=–∞..∞ ak xt-k.
The collection of weights {ak : k ∈ Z} forms a complete description of the filter.
The objective of the filtering is to modify the input series to meet particular objectives, or to display
specific features of the data. For example, an important problem in analysis of economic time series is
detection, isolation and removal of deterministic trends.
In practice, a filter {ak : k ∈ Z} normally contains only a relatively small number of non-zero components.
Example: regular differencing. This is used to remove a linear trend. Here a0 = 1, a1 = -1, ak = 0
otherwise. Hence yt = xt – xt-1.
Example: seasonal differencing. Here a0 = 1, a12 = -1, ak = 0 otherwise, and yt = xt – xt-12.
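For illustration only (not from the notes), a short Python sketch applying a finite linear filter yt = Σk ak xt-k, with the filter supplied as a dictionary of lag/weight pairs; the two differencing examples above are shown as usage. The function name and interface are hypothetical.

def linear_filter(x, weights):
    # weights: {lag k: a_k} with non-negative lags, as in the examples above.
    max_lag = max(weights)
    # y_t is only defined once every required past value x_{t-k} exists
    return [sum(a_k * x[t - k] for k, a_k in weights.items())
            for t in range(max_lag, len(x))]

# Regular differencing, y_t = x_t - x_{t-1}:    linear_filter(x, {0: 1, 1: -1})
# Seasonal differencing, y_t = x_t - x_{t-12}:  linear_filter(x, {0: 1, 12: -1})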
Example: if the input series is white noise and the filter takes the form {θ0 = 1, θ1, ..., θq}, then the output series is MA(q), since
yt = Σk=0..q θk et-k
If the input series x is AR(p) and the filter takes the form {φ0 = 1, –φ1, ..., –φp}, then the output series is white noise:
yt = xt – Σk=1..p φk xt-k = et
VI. Multivariate time series analysis
VI.1 Principal component analysis and dimension reduction
a. Principal Component Analysis
See lectures and practicals.
b. Multivariate correlation: basic properties
The multivariate process (X, Y, Z) defined by
(X, Y, Z) ~ N( (µX, µY, µZ)ᵀ , Σ ),   where Σ is the covariance matrix
Σ = [ ΣXX  ΣXY  ΣXZ
      ΣYX  ΣYY  ΣYZ
      ΣZX  ΣZY  ΣZZ ]
satisfies the following:
X adjusted for Z:
X – E(X | Z) = (X – µX) – ΣXZ ΣZZ⁻¹ (Z – µZ)
since E(X | Z) = µX + β (Z – µZ) with β = ΣXZ ΣZZ⁻¹.
Y adjusted for Z:
Y – E(Y | Z) = (Y – µY) – ΣYZ ΣZZ⁻¹ (Z – µZ)
Partial covariance:
Cov[ (X – µX) – ΣXZ ΣZZ⁻¹ (Z – µZ) , (Y – µY) – ΣYZ ΣZZ⁻¹ (Z – µZ) ]
= E[ ((X – µX) – ΣXZ ΣZZ⁻¹ (Z – µZ)) ((Y – µY) – ΣYZ ΣZZ⁻¹ (Z – µZ)) ]
= ΣXY – ΣXZ ΣZZ⁻¹ ΣZY – ΣXZ ΣZZ⁻¹ ΣZY + ΣXZ ΣZZ⁻¹ ΣZZ ΣZZ⁻¹ ΣZY
= ΣXY – ΣXZ ΣZZ⁻¹ ΣZY
Variance:
Var[ (X – µX) – ΣXZ ΣZZ⁻¹ (Z – µZ) ]
= E[ ((X – µX) – ΣXZ ΣZZ⁻¹ (Z – µZ))² ]
= ΣXX – ΣXZ ΣZZ⁻¹ ΣZX
and
Var[ (Y – µY) – ΣYZ ΣZZ⁻¹ (Z – µZ) ] = ΣYY – ΣYZ ΣZZ⁻¹ ΣZY
from which we get the partial correlation
P(X, Y | Z) = (ΣXY – ΣXZ ΣZZ⁻¹ ΣZY) / √[ (ΣXX – ΣXZ ΣZZ⁻¹ ΣZX)(ΣYY – ΣYZ ΣZZ⁻¹ ΣZY) ]
Now substituting
X = Xt,   Y = Xt+2,   Z = Xt+1
ΣXY = γ2,   ΣXZ = γ1 = ΣZY,   ΣXX = ΣYY = ΣZZ = γ0
we get
P(Xt, Xt+2 | Xt+1) = (γ2 – γ1 γ0⁻¹ γ1) / (γ0 – γ1 γ0⁻¹ γ1)
= (γ2/γ0 – (γ1/γ0)²) / (1 – (γ1/γ0)²)
= (ρ2 – ρ1²) / (1 – ρ1²)
which is the lag-2 partial autocorrelation; equivalently it can be written as the ratio of determinants
| 1   ρ1 |       | 1   ρ1 |
| ρ1  ρ2 |   ÷   | ρ1  1  |
VI.2 Vector AR processes
A univariate time series consists of a sequence of random variables Xt, where Xt is the value of the
single variable X of interest at time t.
An m-dimensional multivariate time series consists of a sequence of random vectors X1, X2 , ... There
are m variables of interest, denoted X(1) , ..., X(m), and Xt(m) is the value of X(m) at time t.
Thus at time t we have a vector of observations
Xt = ( Xt(1), ..., Xt(m) )ᵀ
[Diagram: for a univariate series there is a single observation Xt of the variable X at each time t = 1, ..., n; for an m-dimensional multivariate series there is a vector of observations (Xt(1), ..., Xt(m)) at each time t.]
As for the second order properties of {Xt}, we use:
• vectors of expected values µt = E(Xt);
• covariance matrices Cov(Xt, Xt+k) for all pairs of random vectors.
The vector process {Xt} is weakly stationary if E(Xt) and Cov(Xt, Xt+k) are independent of t. Let µ
denote the common mean vector E(Xt) and Γk denote the common lag k covariance matrix, i.e.
Γk = Cov(Xt, Xt+k)
In the stationary case, Γk is an m × m matrix whose rows and columns are indexed by X(1), ..., X(m).
For k = 0, Γ0 is the variance/covariance matrix of X(1), ..., X(m).
Γk(1,1) = Cov(Xt(1), Xt+k(1)) = covariance at lag k for X(1).
Γk(i,j) = Cov(Xt(i), Xt+k(j)) = lag k cross-covariance of X(i) with X(j).
Example: Multivariate White Noise. Recall that univariate white noise is a sequence e1, e2, ... of random
variables with E(et) = 0 and Cov(et, et+k) = σ² 1(k=0) (where 1(.) is the indicator function). Multivariate
white noise is the simplest example of a multivariate random process.
Let e1, e2, ... be a sequence of independent, zero-mean random vectors, each with the same covariance
matrix Σ. Thus for k = 0, the lag k covariance matrix of the et's is Γ0 = Σ.
Since the et's are independent vectors, Γk = 0 for k > 0. Note that Σ need not be a diagonal matrix, i.e.
the components of et at a given time t need not be independent of each other. However, the et's are
independent vectors: the components of et and et+k are independent for k > 0.
Example: A vector autoregressive process of order p, VAR(p), is a sequence of m-component random
vectors {X1, X2, ...} satisfying
Xt = µ + Σj=1..p Aj (Xt-j – µ) + et
where e is an m-dimensional white noise process and the Aj are (m × m) matrices.
Example: Let it denote the interest rate at time t and It the tendency to invest at time t. We might believe
these two are related as follows:
it – µi = a11 (it-1 – µi) + et(i)
It – µI = a21 (it-1 – µi) + a22 (It-1 – µI) + et(I)
where e(i) and e(I) are zero-mean, univariate white noise. They may have different variances and are not
necessarily uncorrelated, i.e. we do not require that Cov (et(i) , et(I) ) = 0 for any t. However, we do require
Cov(et(i), es(I)) = 0 for s ≠ t.
The model can be expressed as a 2-dimensional VAR(1):
"
! i t -µ i " ! !11 0 " ! i t-1 -µ i " ! e(i)
t
#
$=#
$ + # (I) $
$#
% I t -µ I & % ! 21 ! 22 & % I t-1 -µ I & % et &
The theory and analysis of VAR(1) closely parallels that of a univariate AR(1).
Recall: the AR(1) model xt = µ + φ (xt-1 – µ) + et is stationary IFF |φ| < 1. For the VAR(p) process with
p = 1, i.e. VAR(1),
Xt = µ + A (Xt-1 – µ) + et
we have
Xt = µ + Σj=0..t-1 A^j et-j + A^t (X0 – µ)
In order that X should represent a stationary time series, the powers of A should converge to zero in
some sense: this will happen if all eigenvalues of the matrix A are less than 1 in absolute magnitude.
Recall eigenvalues (see appendix): λ is an eigenvalue of the n × n matrix A if there is a non-zero vector
x (called an eigenvector) such that
Ax = λx
or
(A – λI) x = 0
These equations have a non-zero solution x IFF |A – λI| = 0. This equation is solved for λ to find the
eigenvalues.
2"!
Example: Find the eigenvalues of !# 2 1 "$ . Solution: Solve
% 4 2&
4
1
= 0 which is
2"!
equivalent to (2 – 2)2 – 4 = 22 - 42 = 2 (2 – 4) = 0. The eigenvalues are 0 and 4.
Question: Is the following multivariate time series stationary?
[ xt ]   [ 0.3  0.5 ] [ xt-1 ]   [ etx ]
[ yt ] = [ 0.2  0.2 ] [ yt-1 ] + [ ety ]
We find the eigenvalues of [ 0.3 0.5 ; 0.2 0.2 ]:
| 0.3–λ   0.5  |
| 0.2    0.2–λ | = (0.3 – λ)(0.2 – λ) – 0.1 = λ² – 0.5λ – 0.04 = 0
λ = 0.57, –0.07
Since |λ| < 1 for both eigenvalues, the process is stationary.
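As a quick check (not part of the notes), the eigenvalues can be computed directly with NumPy; the printed values match the ones quoted above.

import numpy as np

A = np.array([[0.3, 0.5],
              [0.2, 0.2]])
eigenvalues = np.linalg.eigvals(A)
print(eigenvalues)                          # approximately [ 0.57, -0.07 ]
print(np.all(np.abs(eigenvalues) < 1))      # True, so the VAR(1) process is stationary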
Question: Write the model in question 7.18 in terms of Xt only. Show that Xt is stationary in its own
right. Solution: The model can be written as:
Xt = 0.3Xt-1 + 0.5Yt-1 + etX   (1)
Yt = 0.2Xt-1 + 0.2Yt-1 + etY   (2)
Rearranging (1): Yt-1 = 2(Xt – 0.3Xt-1 – etX) so Yt = 2(Xt+1 – 0.3Xt – et+1X). Substituting for Yt and Yt-1 in
(2) and tidying up:
Xt+1 = 0.5Xt + 0.04Xt-1 + et+1X – 0.2etX + 0.5etY
Since the white noise terms do not affect stationarity, the characteristic equation is
1 – 0.5λ – 0.04λ² = 0
Since the model can be written as (1 – 0.5B – 0.04B²)Xt = ..., the roots of the characteristic equation are
λ1 = –14.25 and λ2 = 1.75. Since |λ| > 1 for both roots, the Xt process is stationary.
Example: A 2-dimensional VAR(2). Let Yt denote the national income over a period of time, Ct the total
consumption over the same period, and It the total investment over the same period. We assume
Ct = αYt-1 + et(1), where e(1) is a zero-mean white noise (consumption over a period depends on the income over
the previous period).
We assume It = β (Ct-1 – Ct-2) + et(2), where e(2) is another zero-mean white noise.
We assume Yt = Ct + It (any part of the national income is either consumed or invested).
Eliminating Yt, we get the following 2-dimensional VAR(2):
Ct = α Ct-1 + α It-1 + et(1)
It = β (Ct-1 – Ct-2) + et(2)
Using matrix notation, we get
[ Ct ]   [ α  α ] [ Ct-1 ]   [  0   0 ] [ Ct-2 ]   [ et(1) ]
[ It ] = [ β  0 ] [ It-1 ] + [ –β   0 ] [ It-2 ] + [ et(2) ]
VI.3 Cointegration
Cointegration can be used to analyse non-stationary multivariate time series.
Recall: X is integrated of order d (X is I(d)) if Y = ∇d X is stationary.
For univariate models, we have seen that a stochastic trend can be removed by differencing, so that the
resulting time series can be estimated using the univariate Box-Jenkins approach. In the multivariate
case, the appropriate way to treat non-stationary variables is not so straightforward, since it is possible
for there to be a linear combination of integrated variables that is stationary. In this case, the variables
are said to be cointegrated. This property can be found in many econometric models.
Definition: Two time series X and Y are called cointegrated if:
i) X and Y are I(1) random processes;
ii) there exists a non-zero vector (α, β) such that αX + βY is stationary.
Thus X and Y are themselves non-stationary (being I(1)), but their movements are correlated in such a
way that a certain weighted average of the two processes is stationary. The vector (α, β) is called a
cointegrating vector.
We may expect two processes to be cointegrated if
• one of the processes is driving the other;
• both are being driven by the same underlying process.
Remarks:
R1 – Any equilibrium relationship among a set of non-stationary variables indicates that the variables
cannot move independently of each other, and implies that their stochastic trends must be linked. This
linkage implies that the variables are cointegrated.
R2 – If the linear relationship identified by the cointegrating vector is already stationary, differencing the
relationship entails a misspecification error.
R3 – There are two main popular tests for cointegration (the Engle-Granger and Johansen procedures), but they are not the only ones.
Reference: see e.g. Enders, “Applied Econometric Time Series”, Wiley 2004.
Example: Let Xt denote the U.S. Dollar/GB Pound exchange rate. Let Pt be the consumer price index
for the U.S. and Qt the consumer price index for the U.K.
It is assumed that Xt fluctuates around the purchasing power ratio Pt/Qt according to the following model:
ln Xt = ln(Pt/Qt) + Yt
Yt = µ + φ (Yt-1 – µ) + et + θ et-1
where e is a zero-mean white noise.
We assume ln P and ln Q follow ARIMA(1,1,0) models:
(1 – B) ln Pt = µ1 + φ1 [(1 – B) ln Pt-1 – µ1] + et(1)
(1 – B) ln Qt = µ2 + φ2 [(1 – B) ln Qt-1 – µ2] + et(2)
where e(1) and e(2) are zero-mean white noise, possibly correlated. Since ln Pt and ln Qt are both
ARIMA(1,1,0) processes, they are both I(1) and hence non-stationary, and ln Xt is also non-stationary. However,
ln Xt – ln Pt + ln Qt = Yt
and Yt is an ARMA(1,1) process, which is stationary. Hence, the sequence of random vectors
{(ln Xt, ln Pt, ln Qt) : t = 1, 2, ...}
is a cointegrated model with cointegrating vector (1, –1, 1).
Question: Show that the two processes Xt and Yt defined by
Xt = 0.65Xt-1 + 0.35Yt-1 + etX   (1)
Yt = 0.35Xt-1 + 0.65Yt-1 + etY   (2)
are cointegrated, with cointegrating vector (1,-1).
Solution: We have to show that Xt – Yt is a stationary process. If we subtract the second equation from
the first, we get
Xt – Yt = 0.3Xt-1 – 0.3Yt-1 + etX - etY
= 0.3 (Xt-1 – Yt-1) + etX - etY
Hence the process is stationary, since |0.3| < 1; the white noise terms don’t affect the stationarity.
Strictly speaking, we should also show that the processes Xt and Yt are both I(1). We use the method of
question 7.19 to find the process Yt: from the first equation (1) we have
Yt-1 = 1/0.35 (Xt – 0.65Xt-1 – etX)
and so
Yt = 1/0.35 (Xt+1 – 0.65Xt – et+1X).
Substituting in the second equation (2) gives
(1/0.35) (Xt+1 – 0.65Xt – et+1X) = 0.35Xt-1 + 0.65 (1/0.35) (Xt – 0.65Xt-1 – etX) + etY
Tidying up, we have:
Xt+1 = 1.3Xt – 0.3Xt-1 + et+1X – 0.65etX + 0.35etY
If this is to be an I(1) process, we need to show that the first difference is I(0). Look at the characteristic
equation or re-write the above equation in terms of differences:
∇Xt+1 = 0.3 ∇Xt + et+1X – 0.65etX + 0.35etY
Since |0.3| < 1, this process is I(0) and so Xt is I(1). Similarly, Yt can be shown to be I(1).
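Purely as an illustration (not from the notes), the sketch below simulates this system and checks numerically that Xt wanders like an I(1) process while the spread Xt – Yt stays roughly stable; the sample size and random seed are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    x[t] = 0.65 * x[t - 1] + 0.35 * y[t - 1] + rng.normal()
    y[t] = 0.35 * x[t - 1] + 0.65 * y[t - 1] + rng.normal()

spread = x - y
print(np.std(x[: n // 2]), np.std(x[n // 2:]))            # typically grows: X is non-stationary
print(np.std(spread[: n // 2]), np.std(spread[n // 2:]))  # roughly constant: X - Y is stationary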
VI.4 Other common models
a. Bilinear models
The simplest example of this class is
Xn + α (Xn-1 – µ) = µ + en + β en-1 + b (Xn-1 – µ) en-1
Considered as a function of X, this relation is linear; it is also linear when considered as a function of e
only; hence, the name “bilinear”.
• Many bilinear models exhibit “burst” behaviour: when the process is far from its mean, it tends to exhibit larger fluctuations.
• The difference between this model and ARMA(1,1) is in the final term, b(Xn-1 – µ)en-1. If Xn-1 is far from µ and en-1 is far from 0, this term assumes a much greater significance.
b. Threshold AR models
Let us look at a simple example:
Xn = µ + φ1 (Xn-1 – µ) + en   if Xn-1 ≤ d
Xn = µ + φ2 (Xn-1 – µ) + en   if Xn-1 > d
These models exhibit cyclic behaviour.
Example: set φ2 = 0. Xn follows an AR(1) process until it passes the threshold value d. Then Xn returns to
µ and the process effectively starts again. Thus we get cyclic behaviour as the process keeps resetting.
[Plot: a realisation of the threshold AR process against time t, with horizontal lines marking the threshold d and the mean µ; the path repeatedly builds up towards d and then resets towards µ.]
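For illustration only (not from the notes), a small Python simulation of the φ2 = 0 case; the parameter values are arbitrary and chosen just to make the build-up-and-reset pattern visible.

import numpy as np

rng = np.random.default_rng(1)
mu, phi1, d, n = 0.0, 0.9, 2.0, 300    # illustrative values only
x = np.zeros(n)
for t in range(1, n):
    if x[t - 1] <= d:
        x[t] = mu + phi1 * (x[t - 1] - mu) + rng.normal()   # AR(1) regime below the threshold
    else:
        x[t] = mu + rng.normal()                            # phi2 = 0: resets towards mu
print(np.round(x[:20], 2))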
c. Random coefficient AR models
Consider a simple example: Xt = µ + φt (Xt-1 – µ) + et, where {φ1, φ2, ...} is a sequence of independent
random variables.
Example: Xt = value of an investment fund at time t. We have Xt = (1 + it) Xt-1 + et. It follows that µ = 0
and φt = 1 + it, where it is the random rate of return. The behaviour of such models is generally more
irregular than that of the corresponding AR(1) model.
VI.5 ARCH and GARCH
a. ARCH
Recall: Homoscedastic = constant variance
Heteroscedastic = different variances
Financial assets often display the following behaviour:
- A large change in asset price is followed by a period of high volatility.
- A small change in asset price tends to be followed by further small changes.
Thus the variance of the process is dependent upon the size of the previous value. This is what is meant
by conditional heteroscedasticity.
The class of autoregressive models with conditional heteroscedasticity of order p – the ARCH(p) models
– is defined by:
Xt = µ + et √( α0 + Σk=1..p αk (Xt-k – µ)² )
where e is a sequence of independent standard normal variables.
Example: The ARCH(1) model
Xt = µ + et √( α0 + α1 (Xt-1 – µ)² )
A significant deviation of Xt-1 from the mean µ gives rise to an increase in the conditional variance of
Xt, given Xt-1:
(Xt – µ)² = et² (α0 + α1 (Xt-1 – µ)²)
E[(Xt – µ)² | Xt-1] = α0 + α1 (Xt-1 – µ)²
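For illustration only (not from the notes), a short simulation of an ARCH(1) process with arbitrary parameter values, showing the volatility clustering described above (large deviations tend to be followed by large deviations).

import numpy as np

rng = np.random.default_rng(2)
mu, alpha0, alpha1, n = 0.0, 0.2, 0.7, 5000    # assumed values, for illustration only
x = np.zeros(n)
for t in range(1, n):
    cond_var = alpha0 + alpha1 * (x[t - 1] - mu) ** 2   # conditional variance given X_{t-1}
    x[t] = mu + rng.normal() * np.sqrt(cond_var)
# positive correlation between successive absolute deviations indicates volatility clustering
print(np.corrcoef(np.abs(x[:-1] - mu), np.abs(x[1:] - mu))[0, 1])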
Example: Let Zt denote the price of an asset at the end of the t-th trading day, and let Xt = ln(Zt/Zt-1) be the
daily rate of return on day t.
It has been found that the ARCH model can be used to model Xt.
Brief history of cointegration and ARCH modelling:
• Cointegration (1981 - ) – Granger
• ARCH (1982 - ) – Engle
2003 Nobel prize in Economics – Engle/Granger
b. GARCH