
General Insurance Maths II

Figure 1: Retail Price Index (UK)
Figure 2: Annual force of inflation (annual log-difference of RPI)

Contents

• Introduction, ACF, Stationarity and Operators
• Moving Average Processes: MA(q) and MA(∞)
• Autoregressive Processes
• The ARMA processes
• Partial ACF
• ARIMA Processes
• Estimation for Linear Processes
• Estimation of ACF and ARMA parameters
• Model selection
• Multivariate Time Series
• Introduction to ARCH and GARCH

1 Introduction, ACF, Stationarity and Operators

Definition 1.1 A Time Series (TS) is a stochastic process defined on a probability space (Ω, F, P).
That means, for a given index set T, a time series Y = {Yt}t∈T is a collection of random variables
defined on a probability space (Ω, F, P).

Remarks:

• The term "time series" is used both for the sequence of random variables {Yt} and for the
  sequence of observations {yt}.
• In most cases T = {0, 1, . . . , ∞} or T = {0, 1, . . . , n} or T = {−∞, . . . , −1, 0, 1, . . . , ∞}.
  This means Y = {Yt}t∈T is a stochastic process in discrete time, that is, Y = {Yt}t∈T is a
  set of random variables Yt in time order.
• The random variables Yt (observations yt) are usually equally spaced in time, e.g. daily,
  monthly, quarterly, annually.
• The random variables Yt are NOT just an i.i.d. sample; the Yt's will be related to one another.
• The series
    y1, y2, . . . , yn = {yt}, t = 1, . . . , n, with yt = Yt(ω)
  is called an observation (or observed series) of the time series Y1, . . . , Yn = {Yt}, t = 1, . . . , n.
• It is this relationship or "serial dependence" or "structure" that is of interest.
• "Time" could be another dimension, e.g. length (along a river, pipeline, etc.).
• We consider only univariate series, Yt ∈ R, but there are multivariate series.
• Areas of application: finance, economics, medicine, ...

1.1 Objectives

1. Description and Analysis
   Plot the series (always do this first);
   Calculate summary measures, e.g. estimate trend, seasonal variation, autocorrelations;
   Model, i.e. fit a stochastic model.

2. Forecasting
   Predict future values (with confidence limits).

3. Explanation
   Relate one series to another (like regression).

4. Control
   Monitoring and adjusting.

Figure 3: UK annual GDP at current prices (in Million GBP)

Figure 4: Log-difference of UK annual GDP at current prices

1.2 ACF and Stationarity

Definition 1.2 The covariance between two elements Xt and Xs, t, s ∈ T, of a time series is
called the autocovariance function of X, denoted by
γ(t, s) = cov(Xt , Xs ) = E[(Xt − E(Xt ))(Xs − E(Xs ))].
In particular, γ(t, t) = var(Xt).

Definition 1.3 A time series is said to be weakly stationary, if

i) E(Xt²) < ∞ for all t ∈ T,

ii) EXt = µ for all t ∈ T, and

iii) γ(r, s) = γ(r + t, s + t) for all r, s, t ∈ T.

Weak stationarity is thus defined only through the autocovariances, i.e. the first two moments of
the process.

Properties of a weakly stationary TS:

• mean and variance are constant;

• the autocovariances γ(r, s) = cov(Xr, Xs) only depend on k = s − r. γ is hence a one-dimensional
  function,
    γ(k) = cov(Xt, Xt+k) for k = . . . , −1, 0, 1, . . . and ∀ t ∈ T,
  which is called the autocovariance function of a (weakly) stationary process Xt.

Definition 1.4 The standardised autocovariance function of a (weakly) stationary process Xt,

  ρ(k) = γ(k)/γ(0) = cov(Xt, Xt+k)/√(var(Xt) var(Xt+k)) = cov(Xt, Xt+k)/var(Xt),

will be called the autocorrelation function (acf).

Example: White Noise
This is the simplest stationary series. All X's are i.i.d. with mean 0 and variance σ². So γ(t, s) = 0
for t ≠ s and γ(t, t) = σ². The White Noise process is usually denoted by Zt (or Z(t)). It has no
"structure" but is used within models for series with structure (see later chapters).

Properties of the autocovariance function of a (weakly) stationary process (similar properties hold
for the acf):

• γ(0) = var(Xt) = σX² ≥ 0 and ρ(0) ≡ 1,

• |γ(k)| ≤ γ(0) for all k ∈ T,

• γ(−k) = γ(k) for all k ∈ T.

Definition 1.5 A time series is said to be strongly (or strictly) stationary, if

  P(Xt1 < x1, . . . , Xtk < xk) = P(Xt1+t < x1, . . . , Xtk+t < xk)

∀ k = 1, 2, . . . ; t, t1, . . . , tk ∈ T and x1, . . . , xk ∈ R.

Strong stationarity is a property of the whole distribution of a process, whereas weak stationarity
is defined only through the first two moments. It is clear that strong stationarity does not follow
from weak stationarity. However, note that, in general, weak stationarity also does not follow from
strong stationarity, because the second moments (even the mean) of a strongly stationary process
may not exist. If the variance of a strongly stationary process exists, then the process is also weakly
stationary. In particular, if the process is Gaussian, i.e. the distribution of Xt for any t ∈ T is
normal, then the two stationarity properties are equivalent to each other.

Examples:

(i) Xt = εt, t = 0, 1, . . ., with iid Cauchy εt is strongly, but not weakly, stationary.

(ii) Let εt be iid with E(εt) = 0 and E(εt²) < ∞. Then the MA(1) process
    Xt = ψεt−1 + εt, |ψ| < 1,
  and the AR(1) process
    Xt = φXt−1 + εt, |φ| < 1,
  are both strongly and weakly stationary.

(iii) Let εt be iid N(0, 1) random variables. Then the process
    Xt = εt for t = 1, 3, . . . and Xt = (εt² − 1)/√2 for t = 2, 4, . . .
  is weakly, but not strongly, stationary.

(iv) The Random Walk Xt = Xt−1 + εt with iid εt is neither weakly nor strongly stationary.

Figure 5: Two realizations of a second order moving average process (MA(2) with ψ1 = 0.5, ψ2 = 0.4)

Figure 6: Two realizations of a second order autoregressive process (AR(2) with φ1 = 0.5, φ2 = 0.4)

1.3 Operators

A time series model is often defined based on another, simpler process, e.g. an i.i.d. series. On the
other hand, we may want to transform one time series into another. This is done by the use of
operators. The use of operators simplifies much work in TS.

An operator is defined by its effect. Operators can be applied to the random variables Yt or to the
observations yt.

Basic operators are:

Backward shift: BYt := Yt−1
Difference: ∆Yt := Yt − Yt−1
Identity: 1Yt := Yt
Forward shift: F Yt := Yt+1, F^r Yt = Yt+r
Seasonal difference: Dr Yt := Yt − Yt−r = (1 − B^r)Yt
Summation: SYt := Yt + Yt−1 + Yt−2 + . . .

Operators can be handled algebraically and in particular have inverses. For example,

  ∆yt := yt − yt−1 = yt − Byt = (1 − B)yt  =⇒  ∆ ≡ 1 − B.

Also,

  S(1 − B)yt = S(yt − yt−1) = Syt − Syt−1
             = (yt + yt−1 + yt−2 + . . .) − (yt−1 + yt−2 + yt−3 + . . .) = yt,

so that S(1 − B) ≡ 1, i.e. S ≡ (1 − B)^{−1}, and S∆ ≡ 1, i.e. S ≡ ∆^{−1}. Alternatively,

  Syt = yt + yt−1 + yt−2 + . . . = (1 + B + B² + . . .)yt = (1 − B)^{−1} yt.

The Moving Average Operator

Notation:

  T = [w−k, . . . , w−1, w0, w1, . . . , wk]

is called a moving average operator. We write mt = T yt. T is defined by

  T yt = Σ_{i=−k}^{k} wi yt+i = w−k yt−k + · · · + w0 yt + · · · + wk yt+k,

so that the window y1, y2, . . . , yk+1, . . . , y2k+1 gives mk+1, then the window y2, . . . , y2k+2 gives
mk+2, etc.

Some examples of time series models based on white noise are:

(i) MA(1) (first order moving average process):
    Xt = 0.8εt−1 + εt = (1 + 0.8B)εt, where the εt are e.g. iid N(0, 1) random variables.

(ii) MA(2) (second order moving average process):
    Xt = 0.5εt−1 + 0.3εt−2 + εt = (1 + 0.5B + 0.3B²)εt, with εt as in (i).
    Using the MA-operator: Xt = T εt with T = [0.3, 0.5, 1, 0, 0].

(iii) AR(1) (first order autoregressive process):
    Xt = 0.8Xt−1 + εt ⇐⇒ (1 − 0.8B)Xt = εt, with εt as in (i).

(iv) Random Walk:
    Xt = Xt−1 + εt ⇐⇒ ∆Xt = εt, t = 0, 1, . . ., with εt as in (i); equivalently Xt = Sεt.
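The basic operators above map directly onto base R functions. The following short sketch is not part of the original notes; the series and the moving average weights are chosen arbitrarily, and it only illustrates the difference operator ∆, the summation operator S and a moving average operator T = [1/3, 1/3, 1/3].

```r
## Illustrative sketch: basic time series operators in base R (assumed example)
set.seed(1)
y  <- cumsum(rnorm(100))                         # a random walk, so differencing is natural here
dy <- diff(y)                                    # difference operator: (1 - B) y_t = y_t - y_{t-1}
sy <- cumsum(dy)                                 # summation operator S applied to the differences
m  <- stats::filter(y, rep(1/3, 3), sides = 2)   # moving average operator T = [1/3, 1/3, 1/3]
all.equal(sy, y[-1] - y[1])                      # S(1 - B)y recovers y up to the starting value
```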
2 Univariate Linear Processes
A very wide class of time series is that of linear processes, including the well known AR
(autoregressive), MA (moving average) and ARMA (autoregressive moving average) processes with
iid innovations.

We will start from the simplest white noise (WN) model, which is defined as follows.

Definition 2.1 An iid series εt with E(εt) = 0 and var(εt) = σε² is called a White Noise.   (2.1)

The WN is strongly and weakly stationary with

  γ(k) = σε² for k = 0, and γ(k) = 0 for k ≠ 0,

and

  ρ(k) = 1 for k = 0, and ρ(k) = 0 for k ≠ 0.

Infinite variance i.i.d. series, e.g. i.i.d. Cauchy series, are not included in our definition.

A time series is said to be linear, if it can be represented as a linear (possibly infinite) sum (called
a linear filter) of εt, where {εt} is a WN process. See the MA(∞) process defined later.
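As a quick illustration (my own sketch, not from the notes), a Gaussian white noise can be simulated in R and its sample autocorrelations at non-zero lags checked to be close to zero.

```r
## White noise: gamma(k) = 0 for k != 0 (simulation sketch)
set.seed(42)
eps <- rnorm(500, mean = 0, sd = 1)   # iid N(0,1) innovations, sigma_eps^2 = 1
acf(eps, lag.max = 20, main = "Sample ACF of a simulated white noise")
```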
2.1 Moving Average Processes
Definition 2.2 Let εt be a white noise. The process defined in the following is called a moving
average process of order q (MA(q)):

  Xt = ψ1εt−1 + . . . + ψqεt−q + εt = Σ_{i=0}^{q} ψi εt−i,   (2.2)

where ψq ≠ 0 and ψ0 = 1.

• Using the backshift operator B we have

    Xt = ψ0B⁰εt + ψ1Bεt + . . . + ψqB^q εt = [ψ0B⁰ + ψ1B + . . . + ψqB^q]εt = ψ(B)εt,   (2.3)

  where ψ(B) = Σ_{i=0}^{q} ψi B^i.

• E(Xt) = 0 for an MA(q) process.

• A finite order MA process is always stationary.

Examples:

(i) An MA(1) process: Xt = 0.8εt−1 + εt,

(ii) Another MA(1) process: Xt = −0.9εt−1 + εt,

(iii) An MA(2) process: Xt = 0.7εt−1 + 0.4εt−2 + εt,

(iv) Another MA(2) process: Xt = 0.6εt−1 − 0.3εt−2 + εt.

Figure 7: Two simulated realisations following the MA(2) model (iii) given above. (F71TS MA ACF.r)

2.2 Properties of MA processes

The MA process defined above does not involve εi with i > t, i.e. it does not involve information
about the future. A process which does not involve future information is called causal.

For some theoretical reasons, an MA model can also be defined with coefficients αi ≠ 0 for i < 0
(see e.g. the MA(∞) process given later). In this case the process is no longer causal.

We can obtain the autocovariances γ(k) and the autocorrelations ρ(k) of an MA process easily
using the fact that var(εt) = σε² and cov(εt, εt±k) ≡ 0 for all t and k ≠ 0. For example, for the
MA(1) process Xt = 0.8εt−1 + εt we have

  γ(0) = var(Xt) = var(0.8εt−1 + εt) = 0.8²var(εt) + var(εt) = 1.64σε²,
  γ(±1) = cov(Xt, Xt+1) = cov(0.8εt−1 + εt, 0.8εt + εt+1) = 0.8σε²,
  γ(±k) = 0 for |k| > 1, and hence
  ρ(0) = 1, ρ(±1) = 0.8σε²/(1.64σε²) = 0.488, and ρ(±k) = 0 for |k| > 1.

Let γε(k) = cov(εt, εt+k). Then γε(0) = σε² and γε(k) = 0 for k ≠ 0.
For an MA(q) we obtain:

a) The autocovariances:

  γ(k) = Σ_{i=0}^{q} Σ_{j=0}^{q} ψi ψj γε(k + i − j)
       = σε² Σ_{i=0}^{q−k} ψi ψi+k   for 0 ≤ k ≤ q,
       = 0                           for k > q,
       = γ(−k)                       for k < 0.   (2.4)

b) In particular, var(Xt) = σ² = σε² Σ_{i=0}^{q} ψi².

c) The autocorrelations:

  ρ(k) = Σ_{i=0}^{q−k} ψi ψi+k / Σ_{i=0}^{q} ψi²   for 0 ≤ k ≤ q,
       = 0                                         for k > q,
       = ρ(−k)                                     for k < 0.   (2.5)

We see, γ(k) and ρ(k) of an MA(q) process are zero for |k| > q.

Figure 8: The acf's (estimated and theoretical) for the MA(2) process. The estimated acf is based
on the second realisation shown in figure 7. (F71TS MA ACF.r)
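The script F71TS MA ACF.r itself is not reproduced here; the following is a minimal sketch of the same idea, simulating the MA(2) model (iii) and comparing the sample acf with the theoretical acf from (2.5), obtained with ARMAacf().

```r
## MA(2) with psi1 = 0.7, psi2 = 0.4: sample vs theoretical ACF (sketch)
set.seed(123)
x    <- arima.sim(model = list(ma = c(0.7, 0.4)), n = 500)  # X_t = 0.7 e_{t-1} + 0.4 e_{t-2} + e_t
emp  <- acf(x, lag.max = 8, plot = FALSE)$acf[-1]           # sample ACF, lags 1..8
theo <- ARMAacf(ma = c(0.7, 0.4), lag.max = 8)[-1]          # theoretical ACF from (2.5)
round(cbind(lag = 1:8, empirical = emp, theoretical = theo), 3)
# theoretical values: rho(1) = 0.594, rho(2) = 0.242, rho(k) = 0 for k > 2
```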
Examples:

(i) For the MA(1) process Xt = −0.9εt−1 + εt we have

  a) γ(0) = var(Xt) = (ψ0² + ψ1²)σε² = 1.81σε²,
  b) γ(±1) = ψ1σε² = −0.9σε², ρ(±1) = −0.497, and
  c) γ(±k) = ρ(±k) = 0 for |k| > 1.

(ii) For the MA(2) process Xt = 0.7εt−1 + 0.4εt−2 + εt we have

  a) γ(0) = var(Xt) = (1 + ψ1² + ψ2²)σε² = 1.65σε²,
  b) γ(±1) = (ψ0ψ1 + ψ1ψ2)σε² = 0.98σε², ρ(±1) = 0.594,
  c) γ(±2) = ψ0ψ2σε² = 0.4σε², ρ(±2) = 0.242, and
  d) γ(±k) = ρ(±k) = 0 for |k| > 2.

2.3 The MA(∞) processes

In time series analysis we often need to consider infinite order MA processes, denoted by MA(∞),
which are a generalisation of the MA(q) processes described above. This will be motivated in the
following by the simple AR(1) (first order autoregressive) process.

Example: The AR(1) process Xt = φXt−1 + εt with |φ| < 1 has the MA(∞) representation

  Xt = Σ_{i=0}^{∞} φ^i εt−i.   (2.6)

Proof: Xt = φXt−1 + εt = φ(φXt−2 + εt−1) + εt = . . . = Σ_{i=0}^{∞} φ^i εt−i.

A general MA(∞) process is defined through a linear filter (MA-filter) of εt:

  Xt = Σ_{i=−∞}^{∞} αi εt−i.   (2.7)

The MA(∞) process is causal, if αi = 0 ∀ i < 0. We see, the MA(∞) representation of an AR(1)
process is causal. Hereafter, we will mainly consider causal MA(∞) processes

  Xt = Σ_{i=0}^{∞} αi εt−i.   (2.8)

Without loss of generality, we will often put α0 = 1.

The MA(∞) process is stationary, if the αi, i = 0, 1, . . ., are at least squared summable, i.e.

  Σ_{i=0}^{∞} αi² < ∞.   (2.9)

Provided that (2.9) holds, by extending the results on the acf of an MA(q) process, we have

(i) γ(0) = var(Xt) = Σ_{i=0}^{∞} αi² σε² < ∞;

(ii) γ(k) = cov(Xt, Xt+k) = Σ_{i=0}^{∞} αi αi+|k| σε², k = 0, ±1, . . . ;

(iii) ρ(k) = Σ_{i=0}^{∞} αi αi+|k| / Σ_{i=0}^{∞} αi², k = 0, ±1, . . . .
Later we will see (Theorem 2.6 below) that many practically relevant autoregressive processes
have the causal MA(∞) representation defined in (2.8). The above results can hence be used to
calculate (or approximate) the acf's of autoregressive processes.

A stronger condition on the coefficients αi of an MA(∞) process is absolute summability:

  Σ_{i=0}^{∞} |αi| < ∞.   (2.10)

It is easy to show that

  Σ_{i=0}^{∞} |αi| < ∞  =⇒  Σ_{i=0}^{∞} αi² < ∞,

but not vice versa. A series that converges faster than i^{−1/2} is squared summable, and one that
converges faster than i^{−1} is absolutely summable. Some examples are given below.

Examples: Let α0 = 1.

(i) αi = i^{−1/2} for i = 1, 2, . . .  =⇒  Σ αi² = ∞ and Σ |αi| = ∞.

(ii) αi = i^{−1} for i = 1, 2, . . .  =⇒  Σ αi² < ∞ but Σ |αi| = ∞.

(iii) αi = i^{−3/2} for i = 1, 2, . . .  =⇒  Σ αi² < ∞ and Σ |αi| < ∞.

Summability conditions play an important role in the theory of time series analysis, which can be
seen from the following simple lemma.

Lemma 2.3 For an MA(∞) process defined by (2.8) we have

  Σ_{k=−∞}^{∞} |γ(k)| < ∞, if Σ_{i=0}^{∞} |αi| < ∞,   (2.11)

and

  Σ_{k=−∞}^{∞} γ²(k) < ∞, if Σ_{i=0}^{∞} αi² < ∞.   (2.12)

This also holds for the general MA(∞) given in (2.7). The proof of Lemma 2.3 is left to the tutorial.

The coefficients in (2.6) are obviously absolutely summable, i.e. an AR(1) with |φ| < 1 is stationary
with absolutely summable γ(k). A stationary process with absolutely summable γ(k) is said to
have short memory. This means that a stationary AR(1) process has short memory.

2.4 Autoregressive Processes

The autoregressive (AR) processes are another class of important time series models in theory and
practice. These processes are closely related to regression analysis, but now the regressors are some
(shifted) random variables of the same series in the past.

Definition 2.4 A p-th order autoregressive process AR(p) is defined by

  Xt = φ1Xt−1 + . . . + φpXt−p + εt,   (2.13)

where φp ≠ 0.

Equivalently, we have

  εt = Xt − Σ_{i=1}^{p} φi Xt−i.   (2.14)

Setting φ0 = 1, we have

  εt = φ0B⁰Xt − φ1BXt − . . . − φpB^pXt = φ(B)Xt,   (2.15)

where φ(B) = 1 − Σ_{i=1}^{p} φi B^i.

Examples: Given a WN process εt, we can define

(i) An AR(1) process: Xt = 0.8Xt−1 + εt,

(ii) An AR(2) process: Xt = 1.1Xt−1 − 0.3Xt−2 + εt.

Replacing B in φ(B) with a variable z we obtain a p-th order polynomial φ(z).

Definition 2.5 The polynomial φ(z) is called the characteristic polynomial of an AR(p) model.
Similarly, ψ(z) is the characteristic polynomial of an MA(q) model.

Setting φ(z) = 0 we obtain the characteristic equation of an AR(p) model:

  φ(z) = 1 − φ1z − . . . − φpz^p = 0.   (2.16)

This equation has exactly p roots z1, . . . , zp (possibly multiple or complex).

Note that φ(z) and {z1, . . . , zp} determine each other. Hence, the correlation structure and some
other important properties of Xt are determined by z1, . . . , zp.

To show this we first introduce the important concept of the unit circle in time series analysis.
The unit circle in the complex plane is the set of all complex numbers with norm one:

  z = a + ib such that |z| = 1,

where i = √−1 denotes the imaginary unit and |z| = √(a² + b²) is the Euclidean norm of z.

We have seen that an MA process with summable coefficients is stationary. But this is not true
for an AR model. For instance, Xt = Xt−1 + εt, the random walk, is non-stationary.

Theorem 2.6 An AR(p) process Xt is causal and stationary, iff (if and only if) all of the roots
of φ(z) lie outside the unit circle, i.e. iff

  |zk| > 1, ∀ 1 ≤ k ≤ p.

Proof. See e.g. Theorem 3.1.1 of Brockwell and Davis (1991).
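Theorem 2.6 can be checked numerically. The sketch below (not from the notes) uses polyroot() to compute the roots of φ(z) for the AR(2) example (ii) above and for the random walk.

```r
## Check |z_k| > 1 for the characteristic polynomial phi(z) = 1 - phi1 z - ... - phip z^p
## Coefficients are passed to polyroot() in increasing order of powers.
roots_ar2 <- polyroot(c(1, -1.1, 0.3))   # X_t = 1.1 X_{t-1} - 0.3 X_{t-2} + e_t
Mod(roots_ar2)                           # both moduli > 1, so causal stationary

roots_rw <- polyroot(c(1, -1))           # random walk: phi(z) = 1 - z
Mod(roots_rw)                            # root on the unit circle, so not stationary
```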
These results are based on the assumption that the process starts at t = −∞. For an AR process
starting at t = 0 or t = 1, these results only hold asymptotically, i.e. Xt will converge (almost
surely) to another X̃t as t → ∞, where X̃t is defined following the same AR model but starts at
t = −∞.

If we can find a factorised version of φ(z), then it is easy to check if the process is causal stationary.
For the AR(3) process

  Xt = 1.8Xt−1 − 1.07Xt−2 + 0.21Xt−3 + εt

we have

  φ(z) = (1 − 0.5z)(1 − 0.6z)(1 − 0.7z)

with

  z1 = 2, z2 = 10/6 and z3 = 10/7.

These are all outside the unit circle and X is therefore stationary.

If the conditions of Theorem 2.6 are fulfilled, we have

  Xt = φ^{−1}(B)εt = Σ_{i=0}^{∞} αi εt−i,   (2.17)

which is a causal stationary MA(∞) process with

  Σ_{i=0}^{∞} |αi| < ∞.

Generally, αi ≠ 0 for all i, because the reciprocal φ^{−1}(z) of a finite order polynomial φ(z) is an
infinite order polynomial.

2.5 The AR(1) processes

The simplest AR process is the AR(1) process

  Xt = φXt−1 + εt.   (2.18)

For this process the condition of Theorem 2.6 reduces to |φ| < 1, and we have seen that αi = φ^i and

  Xt = Σ_{i=0}^{∞} φ^i εt−i.   (2.19)

We obtain

  Σ_{i=0}^{∞} αi = Σ_{i=0}^{∞} φ^i = 1/(1 − φ) < ∞.

The acf of an AR(1) model can be calculated using its MA(∞) representation:

  γ(k) = E(Xt Xt+k) = E[(Σ_{i=0}^{∞} φ^i εt−i)(Σ_{j=0}^{∞} φ^j εt+k−j)] = σε² φ^{|k|} Σ_{i=0}^{∞} φ^{2i} = σε² φ^{|k|}/(1 − φ²),

and

  ρ(k) = γ(k)/γ(0) = φ^{|k|},

for k = 0, ±1, ±2, . . . . Note that γ(k) ≠ 0 for any k. For φ > 0, the ρ(k) are always positive and
decrease monotonically. For φ < 0, |ρ(k)| decreases monotonically but with alternating signs.

These results can also be obtained recursively. Note that

  E(Xt) = 0, γ(k) = E[Xt Xt+k], and E[Xt−k εt] = σε² for k = 0 and 0 for k ≠ 0.

Multiply both sides of (2.18) with Xt−k and take expectations.

For k = 0:

  E(Xt Xt) = E(Xt φXt−1) + E(Xt εt), i.e. γ(0) = φγ(1) + σε².   (2.20)

For k = 1:

  E(Xt−1 Xt) = E(Xt−1 φXt−1) + E(Xt−1 εt), i.e. γ(1) = φγ(0).   (2.21)

For k ≥ 2:

  E(Xt−k Xt) = E(Xt−k φXt−1) + E(Xt−k εt), i.e. γ(k) = φγ(k − 1).   (2.22)

By solving this system of equations we obtain the same results as given above, i.e.

  γ(0) = σε²/(1 − φ²), γ(1) = φσε²/(1 − φ²), γ(2) = φ²σε²/(1 − φ²), . . . ,
  ρ(0) ≡ 1, ρ(1) = φ, ρ(2) = φ², . . . ,

and in general

  ρ(k) = γ(k)/γ(0) = φγ(k − 1)/γ(0) = φρ(k − 1) = φ^k.

Example: ρ(k) for AR(1) processes with φ = 0.8 and φ = −0.8, respectively. The following figure
displays two realisations following each of these two AR(1) models with iid N(0, 1) εt. The
realisations following AR models with φ > 0 and φ < 0 look quite different: the former is
dominated by positive and the latter by negative correlations. (F71TS AR1 ACF.r)
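In the spirit of F71TS AR1 ACF.r (a sketch of my own, not the original script), the geometric decay ρ(k) = φ^|k| can be compared with the sample acf of simulated AR(1) paths.

```r
## AR(1) with phi = 0.8 and phi = -0.8: rho(k) = phi^k (sketch)
set.seed(7)
for (phi in c(0.8, -0.8)) {
  x    <- arima.sim(model = list(ar = phi), n = 500)
  emp  <- acf(x, lag.max = 10, plot = FALSE)$acf[-1]
  theo <- phi^(1:10)
  print(round(rbind(empirical = emp, theoretical = theo), 3))
}
```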
[Figure: two realisations and the corresponding acf's of the AR(1) processes with φ = 0.8 and φ = −0.8]

2.6 The AR(2) processes

Some properties of the more complex AR(2) processes will be discussed in this section.

For a given AR(2) model we first have to check whether it is causal stationary. In some special
cases this can be done by means of a factorisation of φ(z).

Examples:

1) For the AR(2) process Xt = 1.1Xt−1 − 0.3Xt−2 + εt we have

  φ(z) = 1 − 1.1z + 0.3z² = (1 − 0.5z)(1 − 0.6z)

with the roots z1 = 2 and z2 = 1⅔. This process is causal stationary.

2) For the AR(2) process Xt = −1.5Xt−1 + Xt−2 + εt we have

  φ(z) = 1 + 1.5z − z² = (1 − 0.5z)(1 + 2z)

with the roots z1 = 2 and z2 = −1/2. This process is not causal stationary (however, there exists a
non-causal stationary solution of Xt).

Usually we have to check whether the conditions of Theorem 2.6 are fulfilled or not. For an AR(2)
model

  Xt = φ1Xt−1 + φ2Xt−2 + εt

these conditions are equivalent to the following simple conditions on the coefficients:

i) φ1 + φ2 < 1;
ii) φ2 − φ1 < 1;
iii) −1 < φ2 < 1.

Using these conditions it is easy to check whether an AR(2) model is causal stationary or not.

Examples:

1) Xt = 0.556Xt−1 + 0.322Xt−2 + εt is causal stationary, because
   φ1 + φ2 = 0.878 < 1, φ2 − φ1 = −0.234 < 1 and −1 < φ2 = 0.322 < 1.

2) Xt = 0.424Xt−1 − 1.122Xt−2 + εt is not causal stationary, because φ2 = −1.122 < −1.

The above stationarity conditions define a triangle in the φ1-φ2 plane, which can be divided into
four regions:

Region A: φ1 > 0 and φ1² + 4φ2 > 0 (two real roots z1 and z2)
Region B: φ1 < 0 and φ1² + 4φ2 > 0 (two real roots z1 and z2)
Region C: φ1 < 0 and φ1² + 4φ2 < 0 (two conjugate complex roots z1 and z2)
Region D: φ1 > 0 and φ1² + 4φ2 < 0 (two conjugate complex roots z1 and z2)

The acf's of an AR(2) process with coefficients in different regions have different functional forms
(F71TS AR2 regions.r).

[Figure: the four regions A, B, C, D of the AR(2) parameters in the φ1-φ2 plane]

The ACF of an AR(2) model

We have explicit formulas for the ACFs of MA processes and AR(1) processes. The acf of an AR(p)
process with p > 1 has to be calculated in a recursive way.

Consider an AR(2)

  Xt − φ1Xt−1 − φ2Xt−2 = εt.

Multiply both sides with Xt−k and take expectations. We obtain, for k = 0, 1 and 2 respectively,

  γ(0) − φ1γ(1) − φ2γ(2) = σε²,   i.e.   γ(0) − φ1γ(1) − φ2γ(2) = σε²,
  γ(1) − φ1γ(0) − φ2γ(1) = 0,     i.e.  −φ1γ(0) + (1 − φ2)γ(1) + 0·γ(2) = 0,
  γ(2) − φ1γ(1) − φ2γ(0) = 0,     i.e.  −φ2γ(0) − φ1γ(1) + γ(2) = 0.   (2.23)
Solving these equations, we have

  γ(0) = (1 − φ2)σε² / {(1 + φ2)[(1 − φ2)² − φ1²]},
  γ(1) = φ1σε² / {(1 + φ2)[(1 − φ2)² − φ1²]},
  γ(2) = [φ1² + φ2(1 − φ2)]σε² / {(1 + φ2)[(1 − φ2)² − φ1²]},   (2.24)

and for k ≥ 2

  γ(k) = φ1γ(k − 1) + φ2γ(k − 2).   (2.25)

This recursive formula and the initial solutions given in (2.24) allow us to calculate the
autocovariances γ(k) of an AR(2) process for any finite k (using e.g. R). This idea can be generalized
to a general AR(p) model, for which p + 1 initial values have to be solved.

For ρ(k) we have ρ(0) ≡ 1. For k = 1, dividing the second equation of (2.23) by γ(0), we have

  ρ(1) − φ1 − φ2ρ(1) = 0,   (2.26)

and hence

  ρ(1) = φ1/(1 − φ2).   (2.27)

Analogously, we obtain the recursive formula for ρ(k) for k ≥ 2:

  ρ(k) = φ1ρ(k − 1) + φ2ρ(k − 2).   (2.28)

We see, for an AR(2) model, the calculation of ρ(k) is a little easier than that of γ(k).

Similarly, we can obtain recursive formulas for computing the coefficients αi in the MA(∞)
representation of an AR process given in (2.17). But this will not be discussed further in our lecture.
What we need to know are only some properties of the αi, e.g. absolute summability.

Example. Calculate ρ(k), k = 0, 1, . . . , 50, of the AR(2) process Xt = 0.5Xt−1 + 0.4Xt−2 + εt.

Solution: Following the above formulas we have
ρ(0) = 1, ρ(1) = φ1/(1 − φ2) = 0.833, ρ(2) = φ1ρ(1) + φ2ρ(0) = 0.817,
ρ(3) = φ1ρ(2) + φ2ρ(1) = 0.742, ρ(4) = 0.698, ρ(5) = 0.645, ρ(6) = 0.602, . . . ,
ρ(48) = 0.029, ρ(49) = 0.027 and ρ(50) = 0.025.

The ACF for the four different regions (F71TS AR2 ACF.r):

(i) Xt = 0.5Xt−1 + 0.4Xt−2 + εt with φ1 and φ2 in region A;
(ii) Xt = −0.5Xt−1 + 0.4Xt−2 + εt with φ1 and φ2 in region B;
(iii) Xt = −1.8Xt−1 − 0.9Xt−2 + εt with φ1 and φ2 in region C;
(iv) Xt = 1.8Xt−1 − 0.9Xt−2 + εt with φ1 and φ2 in region D.

[Figure: realisations and acf's of the AR(2) processes (i)-(iv), one for each of the regions A-D]

We see, the acf of an AR(2) process with coefficients in region A looks like that of an AR(1) with a
positive coefficient, and that of an AR(2) process with coefficients in region B looks like that of an
AR(1) with a negative coefficient.

If the roots of φ(z) are complex, then the ρ(k) behave like damped sine waves. If the coefficients
are in region C, the sign of ρ(k) changes quite frequently. If the coefficients are in region D, the
sign of ρ(k) stays the same over each half period, as for a sine function.
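The recursion (2.28) with the starting values ρ(0) = 1 and ρ(1) = φ1/(1 − φ2) is easy to code. The sketch below (my own, using the example coefficients φ1 = 0.5, φ2 = 0.4) reproduces the values quoted above and cross-checks them against ARMAacf().

```r
## Recursive ACF of an AR(2): rho(k) = phi1 rho(k-1) + phi2 rho(k-2)  (sketch)
phi1 <- 0.5; phi2 <- 0.4
rho <- numeric(51)                     # rho[k+1] holds rho(k), k = 0, ..., 50
rho[1] <- 1
rho[2] <- phi1 / (1 - phi2)            # rho(1) from (2.27)
for (k in 3:51) rho[k] <- phi1 * rho[k - 1] + phi2 * rho[k - 2]
round(rho[1:7], 3)                     # 1, 0.833, 0.817, 0.742, 0.698, 0.645, 0.602
max(abs(rho - ARMAacf(ar = c(phi1, phi2), lag.max = 50)))   # agrees with R's ARMAacf
```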
2.7 The AR(∞) processes

The AR(∞) process is given by

  Xt = Σ_{i=1}^{∞} βi Xt−i + εt,   (2.29)

or equivalently,

  εt = Σ_{i=0}^{∞} βi Xt−i   (2.30)

with β0 = 1. Usually, it is assumed that Σ_{i=0}^{∞} |βi| < ∞.

Example: An MA(1) process Xt = ψεt−1 + εt with |ψ| < 1 has the following AR(∞) representation:

  εt = Σ_{i=0}^{∞} (−ψ)^i Xt−i   (2.31)

with absolutely summable coefficients βi = (−ψ)^i, i = 0, 1, . . . .

Note that the Xt are observable but the εt are often unobservable. Hence, the above AR(∞) model
of the innovations εt is very useful in theory and practice, because it makes it possible to estimate
εt from the data. The question is whether such an AR(∞) model is well defined for a given MA
process. The answer is that this is only possible if the MA process is invertible.

2.8 Invertibility of MA(q) Processes

Invertibility: A process Xt is said to be invertible, if it can be represented as an AR(∞) process
with (absolutely) summable coefficients.

Note that any AR process with summable coefficients is invertible.

Example: The RW Xt = Xt−1 + εt is non-stationary but invertible.

There are however some simple MA models which are not invertible.

Example: The MA(1) process Xt = εt−1 + εt is stationary but not invertible.

Hence, we need to discuss when an MA process is invertible. For this we have the following
theorem, which is closely related to Theorem 2.6 on the causal stationarity of an AR process. The
characteristic equation of an MA(q) model is

  ψ(z) = 1 + ψ1z + . . . + ψqz^q = 0.   (2.32)

Again, this equation has q roots z1, . . . , zq, and ψ(z) and {z1, . . . , zq} determine each other. Hence,
the correlation structure of Xt is determined by z1, . . . , zq.

Theorem 2.7 An MA(q) process Xt is invertible, iff all of the roots of ψ(z) lie outside the unit
circle, i.e. iff

  |zi| > 1, ∀ 1 ≤ i ≤ q.

Proof. See Brockwell and Davis (1991).

The conditions |zi| > 1 in Theorem 2.7 imply that ψ^{−1}(B) is well defined with non-negative powers
and absolutely summable coefficients. Now we have

  εt = ψ^{−1}(B)Xt = Σ_{i=0}^{∞} βi Xt−i,

which is a causal stationary AR(∞) process with Σ_{i=0}^{∞} |βi| < ∞.

We see there is a dual relationship between the MA and AR processes:

• A causal invertible MA process can be represented as a causal stationary AR(∞).

• A causal stationary AR process can be represented as a causal, stationary (and also invertible)
  MA(∞).

Analogously to the causal stationarity conditions for AR(1) and AR(2), we have the following
invertibility conditions for MA(1) and MA(2) processes:

1) An MA(1) process Xt = ψεt−1 + εt is invertible, iff |ψ| < 1.

2) An MA(2) process Xt = ψ1εt−1 + ψ2εt−2 + εt is invertible, iff

   i) ψ1 + ψ2 > −1;
   ii) ψ1 − ψ2 < 1;
   iii) −1 < ψ2 < 1.

Also the invertibility conditions for an MA(2) process define a triangle in the ψ1-ψ2 plane, which
can again be divided into four regions (F71TS AR2 regions.r).

[Figure: the four regions A, B, C, D of the MA(2) parameters in the ψ1-ψ2 plane]
Region A: ψ1 < 0 and ψ1² − 4ψ2 > 0 (two real roots z1 and z2)
Region B: ψ1 > 0 and ψ1² − 4ψ2 > 0 (two real roots z1 and z2)
Region C: ψ1 > 0 and ψ1² − 4ψ2 < 0 (two conjugate complex roots z1 and z̄1)
Region D: ψ1 < 0 and ψ1² − 4ψ2 < 0 (two conjugate complex roots z1 and z̄1)

Examples:

1) Xt = 1.5εt−1 + 0.75εt−2 + εt is invertible, because
   ψ1 + ψ2 = 2.25 > −1, ψ1 − ψ2 = 0.75 < 1, and −1 < ψ2 = 0.75 < 1.

2) Xt = 0.75εt−1 − 0.5εt−2 + εt is not invertible, because ψ1 − ψ2 = 1.25 > 1.

2.9 The ARMA processes

Definition 2.8 An Autoregressive Moving Average process of order (p, q) (ARMA(p, q)) is defined by

  Xt = φ1Xt−1 + . . . + φpXt−p + ψ1εt−1 + . . . + ψqεt−q + εt.   (2.33)

An ARMA model combines an AR and an MA model. Equivalently, (2.33) can be represented in
the following way:

  Xt − φ1Xt−1 − . . . − φpXt−p = ψ1εt−1 + . . . + ψqεt−q + εt,   (2.34)

or

  φ(B)Xt = ψ(B)εt,   (2.35)

where

  φ(z) = 1 − φ1z − . . . − φpz^p

is the characteristic polynomial of the AR part and

  ψ(z) = 1 + ψ1z + . . . + ψqz^q

the characteristic polynomial of the MA part.

An AR(p) model is an ARMA(p, 0) model with ψ(z) ≡ 1, and an MA(q) model is an ARMA(0, q)
model with φ(z) ≡ 1.

Examples: Given a WN process εt, we can define

(i) An ARMA(1, 1) process: Xt = 0.8Xt−1 + 0.6εt−1 + εt,

(ii) An ARMA(2, 2) process: Xt = 0.7Xt−1 + 0.1Xt−2 + 0.8εt−1 + 0.16εt−2 + εt.

The following theorem is one of the most important theorems in time series analysis. For an
ARMA(p, q) process we have:

Theorem 2.9 Assume that φ(z) and ψ(z) have no common factors. Then the ARMA(p, q) process is

a) causal (stationary), iff all roots of φ(z) lie outside the unit circle,

b) invertible, iff all roots of ψ(z) lie outside the unit circle,

c) causal (stationary) and invertible, iff all roots of φ(z) and ψ(z) lie outside the unit circle.

Theorem 2.9 combines Theorems 2.6 and 2.7. The causal stationarity and invertibility conditions
of an ARMA model do not depend on each other.

By combining the conditions given above we can check whether an ARMA(p, q) process, for
p = 0, 1, 2 and q = 0, 1, 2, is causal stationary and/or invertible or not.

φ(z) and ψ(z) have a common factor if there exists a function f(z) such that

  φ(z) = f(z)φ̃(z) and ψ(z) = f(z)ψ̃(z).

If a common factor exists then φ̃(z) and ψ̃(z) instead of φ(z) and ψ(z) should be used in Theorem 2.9.

Examples:

1) Xt − 0.6Xt−1 − 0.3Xt−2 = 1.5εt−1 + 0.75εt−2 + εt is both causal stationary and invertible.

2) Xt − 0.6Xt−1 − 0.3Xt−2 = 0.75εt−1 − 0.5εt−2 + εt is causal stationary but not invertible.

3) Xt − 0.6Xt−1 − 0.5Xt−2 = 1.5εt−1 + 0.75εt−2 + εt is invertible but not causal stationary.

4) Xt − 0.6Xt−1 − 0.5Xt−2 = 0.75εt−1 − 0.5εt−2 + εt is neither causal stationary nor invertible.

Remark. Under the assumptions of Theorem 2.9 c), an ARMA(p, q) process has on the one hand
the MA(∞) representation

  Xt = Σ_{i=0}^{∞} αi εt−i,   (2.36)

where α(z) = φ(z)^{−1}ψ(z) with Σ |αi| < ∞, and on the other hand the AR(∞) representation

  εt = Σ_{i=0}^{∞} βi Xt−i,   (2.37)

where β(z) = φ(z)ψ(z)^{−1} with Σ |βi| < ∞.

The fact that the αi or βi are absolutely summable follows since the convolution of two absolutely
summable sequences is absolutely summable.

Under the assumptions of Theorem 2.9 c), the autocovariances of an ARMA(p, q) process are always
absolutely summable, i.e.

  Σ_{k=−∞}^{∞} |γX(k)| < ∞.
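The coefficients αi of the MA(∞) representation (2.36) can be obtained numerically with ARMAtoMA(). The sketch below (not from the notes) does this for the ARMA(1, 1) example (i); the second call swaps the roles of the two polynomials (with the sign convention this requires, an assumption of this sketch) to approximate the AR(∞) coefficients βi in (2.37).

```r
## MA(infinity) and AR(infinity) coefficients of X_t = 0.8 X_{t-1} + 0.6 e_{t-1} + e_t (sketch)
alpha <- ARMAtoMA(ar = 0.8, ma = 0.6, lag.max = 10)    # alpha_1, alpha_2, ... in (2.36)
round(alpha, 4)                                        # here alpha_i = (0.8 + 0.6) * 0.8^(i-1)

## beta(z) = phi(z)/psi(z): pass ar_i = -psi_i and ma_i = -phi_i to invert the roles
beta <- ARMAtoMA(ar = -0.6, ma = -0.8, lag.max = 10)   # beta_1, beta_2, ... in (2.37), beta_0 = 1
round(beta, 4)
```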
The mean of MA, AR or ARMA processes may be non-zero.

Definition 2.10 An ARMA(p, q) with mean µ is defined by

  Xt − µ = φ1(Xt−1 − µ) + . . . + φp(Xt−p − µ) + ψ1εt−1 + . . . + ψqεt−q + εt.   (2.38)

Note that, if the mean is known, we can simply assume that µ = 0 as before. If the mean is unknown,
it is not difficult to estimate µ from the data and to remove it. Hence, ARMA processes with
unknown mean have similar properties as those given above.

The ACF

We rewrite the ARMA(p, q) process as

  Xt − φ1Xt−1 − . . . − φpXt−p = ψ1εt−1 + . . . + ψqεt−q + εt.   (2.39)

Multiplying both sides of (2.39) by Xt−k and taking expectations, we obtain

  γ(k) − φ1γ(k − 1) − . . . − φpγ(k − p) = σε² Σ_{k≤j≤q} ψj ψj−k   (2.40)

for 0 ≤ k < max(p, q + 1), and

  γ(k) − φ1γ(k − 1) − . . . − φpγ(k − p) = 0   (2.41)

for k ≥ max(p, q + 1). The recursive equation (2.41) is the same as for an AR(p).

3 Spectral Density Functions of ARMA Processes

We have seen how to calculate the ACF of an MA or AR process. On the other hand, given a
function γ(k), we might want to check if it could be an autocovariance function of some stationary
process. If the answer is positive, we would further like to know the properties of the stationary
process determined by the given ACF.

Discussion of this problem involves so-called spectral analysis, which is carried out by means of the
spectral density of a time series. The spectral density is a nonnegative function defined on the
frequency interval λ ∈ [−π, π].

Assume that the autocovariances γ(k) of a (real-valued)¹ stationary process Xt are absolutely
summable. Then

  f(λ) = (1/2π) Σ_{k=−∞}^{∞} γ(k)e^{−ikλ}
       = (1/2π) Σ_{k=−∞}^{∞} γ(k)cos(kλ)
       = (1/2π) [γ(0) + 2 Σ_{k=1}^{∞} γ(k)cos(kλ)]   (3.42)

is a well defined continuous nonnegative even (symmetric) function. The last two equalities are due
to the facts that e^{−ikλ} = cos(kλ) − i sin(kλ), the function γ(k) is symmetric, cos is a symmetric
function and sin is an anti-symmetric function.

f(λ) defined here is called the spectral density function (spdf) or simply spectral density of the
acf γ(k) or of the process Xt. It is a periodic function with period 2π; hence, we only need to
consider f(λ) on the interval λ ∈ [−π, π].

Note that f(λ) is not necessarily a density function, because generally ∫_{−π}^{π} f(λ)dλ ≠ 1.

The spectral density function can be calculated from either γ(k) or ρ(k). The difference between
the results is just the constant factor γ(0). We therefore write acf for both the autocorrelation and
the autocovariance function.

On the other hand, for the spectral density function f(λ) we have

  γ(k) = ∫_{−π}^{π} f(λ)e^{ikλ} dλ.   (3.43)

This means that, if the γ(k) are absolutely summable, then γ(k) and f(λ) determine each other.
Indeed, f(λ) is the Fourier transform of γ(k), and the γ(k) are the Fourier coefficients of f(λ).

¹ Spectral analysis often involves complex-valued stochastic processes, which will be avoided in our lecture.

Examples:

(i) For the WN process Xt = εt with γ(0) = σε² and γ(k) = 0 for k ≠ 0 we have

  fX(λ) = (1/2π) Σ_{k=−∞}^{∞} γ(k)cos(kλ) = (1/2π) γ(0) = σε²/(2π).   (3.44)
(ii) For the MA(1) process Xt = ψεt−1 + εt with |ψ| < 1 we have γ(0) = (1 + ψ²)σε², γ(±1) = ψσε²
and γ(k) = 0 for |k| > 1. Now,

  fX(λ) = (1/2π) Σ_{k=−∞}^{∞} γ(k)cos(kλ)
        = (1/2π) [γ(0) + 2γ(1)cos(λ)]
        = (σε²/2π) [1 + 2ψcos(λ) + ψ²].   (3.45)

The basic idea of spectral analysis is that any stationary process in discrete time can be decomposed
into (usually infinitely many) periodic components with different frequencies λ ∈ [−π, π] (a low
frequency corresponds to a long period).

The (relative) value of the spectral density at a given frequency λ shows how strong the influence of
the periodic fluctuation with this frequency is on the whole process.

We see the spectral density of a White Noise is a constant; this means that in a WN all frequencies
are equally important.

The spdf can be used to check whether a given symmetric function γ(k), k = 0, ±1, . . ., is the
autocovariance function of some stationary process or not.

Besides the basic properties of γ(k) (γ(0) ≥ 0, |γ(k)| ≤ γ(0) and γ(−k) = γ(k)), an autocovariance
function of a stationary process has to be non-negative (or positive semi-) definite. For any given
k ≥ 1 define the (k+1) × (k+1)-matrix

  Γ = [ γ(0)    γ(1)    · · ·   γ(k)
        γ(1)    γ(0)    · · ·   γ(k−1)
        ...     ...     ...     ...
        γ(k)    γ(k−1)  · · ·   γ(0)  ].

That γ(k) is positive semidefinite means that, for any real vector a = (a1, . . . , ak)′, we have a′Γa ≥ 0.

This condition is usually not easy to check. However, the following theorem shows that the answer
is clear, if f(λ) is easy to calculate.

Theorem 3.1 A symmetric, absolutely summable function γ(k) defined on the integers is the
autocovariance function of a stationary process, iff

  (1/2π) Σ_{k=−∞}^{∞} e^{−ikλ} γ(k) ≥ 0   ∀ λ ∈ [−π, π].   (3.47)

See Corollary 4.3.2 in Brockwell and Davis (1991). Other properties of γ(k) follow from the
assumptions of Theorem 3.1. Similar results hold for the autocorrelation function.

Example. Now we can show that the following function,

  ρ(k) = 1 if k = 0,  ρ if k = ±1,  0 otherwise,

is an acf of a stationary process, if and only if |ρ| ≤ 1/2.

Solution: For this function we have

  f(λ) = (1/2π) Σ_{k=−∞}^{∞} e^{−ikλ} ρ(k)
       = (1/2π) {ρ[cos(−λ) + i sin(−λ)] + 1 + ρ[cos(λ) + i sin(λ)]}
       = (1/2π) [1 + 2ρ cos λ].   (3.46)

Now it is clear that f(λ) ≥ 0 for all λ ∈ [−π, π], if and only if |ρ| ≤ 1/2.

On the other hand, given a function f(λ) on [−π, π], how can we check if it is a spectral density of
some (real-valued) stationary process or not? To this end we have

Theorem 3.2 A function f(λ) defined on [−π, π] is the spectral density function of a (real-valued)
stationary process, if and only if

  (i) f(λ) = f(−λ),
  (ii) f(λ) ≥ 0, and
  (iii) ∫_{−π}^{π} f(λ)dλ < ∞.

Example. Let f(λ) = (1 − 2φ cos λ + φ²)^{−1} with |φ| < 1, λ ∈ [−π, π]. Then f(λ) = f(−λ) and
1 − 2φ cos λ + φ² ≥ 1 + φ² − 2|φ| > 0. This means that f(λ) > 0 and f(λ) < ∞. The latter ensures
∫_{−π}^{π} f(λ)dλ < ∞. We conclude that f(λ) is the spdf of some stationary process.

3.1 Spectral Densities of ARMA Processes

Closed form formulas for the spectral density functions of ARMA(p, q) processes are well known.
They are summarised in the following theorem.

Theorem 3.3 (Spectral density of an ARMA(p, q) process). Let {Xt} be a causal stationary and
invertible ARMA(p, q) process given by

  φ(B)Xt = ψ(B)εt,   (3.48)

where εt is a WN process and φ(z) and ψ(z) have no common roots. Then {Xt} has the spectral
density function

  fX(λ) = (σε²/2π) |ψ(e^{−iλ})|² / |φ(e^{−iλ})|²,   −π ≤ λ ≤ π.   (3.49)

Remark. The spdf depends only on γ(k) (or ρ(k)). Hence, the above results hold if the εt are only
uncorrelated with mean zero and variance σε². The invertibility condition is not necessary. These
results can also be extended to causal stationary MA(∞) or AR(∞) processes.

Remark. Although the above formula seems to be very simple, the following examples show that it
is still not easy to obtain the explicit solution in a special case, because we have to calculate the
norm of complex functions.
Examples:

1. We will show again that the spdf of an MA(1) process Xt = ψεt−1 + εt with |ψ| < 1 is

  fX(λ) = (σε²/2π)(1 + 2ψ cos λ + ψ²).

Proof: Following Theorem 3.3 we have

  f(λ) = (σε²/2π) |ψ(e^{−iλ})|²
       = (σε²/2π) |1 + ψe^{−iλ}|²
       = (σε²/2π) |1 + ψ[cos(−λ) + i sin(−λ)]|²
       = (σε²/2π) {[1 + ψ cos(−λ)]² + [ψ sin(−λ)]²}
       = (σε²/2π) (1 + 2ψ cos λ + ψ²).   (3.50)

2. The spdf of an AR(1) process Xt = φXt−1 + εt with |φ| < 1 is

  f(λ) = (σε²/2π) (1 − 2φ cos λ + φ²)^{−1}.

Proof: Now we have

  f(λ) = (σε²/2π) |φ(e^{−iλ})|^{−2}
       = (σε²/2π) |1 − φe^{−iλ}|^{−2}
       = (σε²/2π) {|1 − φ[cos(−λ) + i sin(−λ)]|²}^{−1}
       = (σε²/2π) {[1 − φ cos(−λ)]² + [−φ sin(−λ)]²}^{−1}
       = (σε²/2π) (1 − 2φ cos λ + φ²)^{−1}.   (3.51)

We see, the example given after Theorem 3.2 is indeed the spdf of an AR(1) process with this φ and
σε² = 2π. Combining the above examples, we obtain: the spdf of an ARMA(1, 1) model
Xt = φXt−1 + ψεt−1 + εt with |φ| < 1 and |ψ| < 1 is

  f(λ) = (σε²/2π) (1 + 2ψ cos λ + ψ²)/(1 − 2φ cos λ + φ²).   (3.52)

The spectral densities of the following models:

• MA(1) with ψ = 0.8, −0.8,
• AR(1) with φ = 0.8, −0.8,
• ARMA(1, 1) with φ = −0.3, ψ = 0.8 and
• ARMA(1, 1) with φ = 0.3, ψ = −0.8

are shown in the following figures.

[Figure: spectral densities of the six models listed above, plotted for λ ∈ [0, π]]

Later we will see that for a time series Xt the variance of the sample mean, var(x̄), depends on
Σ_{k=−∞}^{∞} γX(k). Generally, it is not easy to calculate this sum, since it depends on all γ(k),
k = 0, 1, . . . . However, we can show for an ARMA process that Σ_{k=−∞}^{∞} γX(k) can easily be
obtained by means of its spdf (without calculating γ(k) and f(λ)!).

Assume that the γX(k) of a stationary process Xt are absolutely summable. Then Xt has the
spectral density fX(λ). Following the definition we have, at the origin λ = 0,

  f(0) = (1/2π) Σ_{k=−∞}^{∞} γ(k)e^{−ik·0} = (1/2π) Σ_{k=−∞}^{∞} γ(k),   (3.53)

because e^{−ik·0} ≡ 1 for any k. We therefore obtain

  Σ_{k=−∞}^{∞} γ(k) = 2πf(0).   (3.54)

Note again that e^{−iλ} = 1 for λ = 0. For an ARMA(p, q) process the above formula reduces to

  Σ_{k=−∞}^{∞} γ(k) = 2πf(0) = σε² [ψ(1)/φ(1)]².   (3.55)

Inserting ψ(1) and φ(1), we have for any causal stationary ARMA(p, q):

  Σ_{k=−∞}^{∞} γ(k) = 2πf(0) = σε² [Σ_{j=0}^{q} ψj / (1 − Σ_{i=1}^{p} φi)]².   (3.56)
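Formula (3.49) is straightforward to evaluate numerically. The sketch below is my own code (the function name arma_spdf and the choice σε² = 1 are assumptions); it plots the spdf of the ARMA(1, 1) model with φ = 0.3 and ψ = −0.8, one of the cases shown in the figures.

```r
## Spectral density of a causal ARMA(p,q), following (3.49), with sigma_eps^2 = 1 (sketch)
arma_spdf <- function(lambda, phi = numeric(0), psi = numeric(0), sigma2 = 1) {
  psi_part <- sapply(lambda, function(l) Mod(1 + sum(psi * exp(-1i * l * seq_along(psi))))^2)
  phi_part <- sapply(lambda, function(l) Mod(1 - sum(phi * exp(-1i * l * seq_along(phi))))^2)
  sigma2 / (2 * pi) * psi_part / phi_part
}
lam <- seq(0, pi, length.out = 200)
plot(lam, arma_spdf(lam, phi = 0.3, psi = -0.8), type = "l",
     xlab = "lambda", ylab = "spdf", main = "ARMA(1,1): phi = 0.3, psi = -0.8")
```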
Examples: Assume that the εt are iid N(0, 1) random variables.

1. For the MA(1) process Xt = ψεt−1 + εt, |ψ| < 1, we have Σ_{k=−∞}^{∞} γX(k) = σε²(1 + ψ)².

2. For the AR(1) process Xt = φXt−1 + εt, |φ| < 1, we have Σ_{k=−∞}^{∞} γX(k) = σε²(1 − φ)^{−2}.

3. For Xt = 0.6Xt−1 − 0.3Xt−2 + 0.2Xt−3 + 0.4εt−1 + 0.2εt−2 + 0.3εt−3 + εt we have

  Σ_{k=−∞}^{∞} γX(k) = [(1 + 0.4 + 0.2 + 0.3)/(1 − 0.6 + 0.3 − 0.2)]² = (1.9/0.5)² = 14.44.

The above results can easily be extended to more general cases. Assume that Xt has a causal MA(∞)
representation

  Xt = Σ_{i=0}^{∞} αi εt−i   (3.57)

with Σ_{i=0}^{∞} |αi| < ∞, where the εt are uncorrelated with E(εt) = 0 and var(εt) = σε², but not
necessarily independent². Recall that now the γ(k) are absolutely summable (see Lemma 2.3). By
extending the above results, we obtain the following relationship:

  Σ_{k=−∞}^{∞} γ(k) = σε² (Σ_{i=0}^{∞} αi)².   (3.58)

This can also be shown by means of the convolution of two infinite series.

² A well known theorem in time series analysis, called the Wold Decomposition, ensures that almost all common
stationary processes can be represented as an MA(∞) process of uncorrelated innovations.
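Relation (3.56), equivalently (3.58), can be verified numerically. The sketch below (my own) approximates Σαi with ARMAtoMA() for the model in example 3 and compares (Σαi)² with the value (1.9/0.5)² = 14.44 computed above; the truncation at lag 200 is an assumption of the sketch.

```r
## Check sum_k gamma(k) = sigma_eps^2 * (sum_i alpha_i)^2 = sigma_eps^2 * (psi(1)/phi(1))^2
alpha <- ARMAtoMA(ar = c(0.6, -0.3, 0.2), ma = c(0.4, 0.2, 0.3), lag.max = 200)
(1 + sum(alpha))^2                               # should be close to 14.44
((1 + 0.4 + 0.2 + 0.3) / (1 - 0.6 + 0.3 - 0.2))^2  # exact value (1.9 / 0.5)^2
```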
4 Partial Autocorrelations

If we want to fit an AR(p) model to our data, we have to choose a proper order for the model. By
means of the acf it is not easy to determine which p should be used. The partial autocorrelation
function (pacf) was traditionally introduced as a tool for selecting the order of an AR model,
because, whereas an AR(p) model has an acf which is infinite in extent, its partial autocorrelations
are non-zero only up to lag k = p, similarly to the autocorrelations of an MA model.

The partial autocorrelation of a stationary process at lag k, denoted by α(k), may be regarded as
the correlation between Xt and Xt−k after adjusting for {Xt−1, . . . , Xt−k+1}, or conditionally on
{Xt−1, . . . , Xt−k+1}. In other words, the partial autocorrelation α(k) is the correlation between the
residuals of the regression of Xt on {Xt−1, . . . , Xt−k+1} and the residuals of the regression of Xt−k
on {Xt−1, . . . , Xt−k+1}.

Assume we are given a stationary process with autocorrelations ρ(k), k = . . . , −1, 0, 1, . . ., such that
ρ(k) → 0 as k → ∞. The formulas for its pacf are very complex. However, it is easy to calculate the
sample pacf's from your data using software, e.g. R.

The pacf of an AR(2) process:

(i) Define α(0) = 1 and α(1) = ρ(1) (no observations come in between Xt and Xt−1).

(ii) For k = 2 we have

  α(2) = det[ 1  ρ(1) ; ρ(1)  ρ(2) ] / det[ 1  ρ(1) ; ρ(1)  1 ] = [ρ(2) − ρ²(1)] / [1 − ρ²(1)].

Example 1. For the AR(1) process Xt = φXt−1 + εt, |φ| < 1, we have α(0) = 1, α(1) = φ and
α(k) = 0 for k > 1.

Example 2. For a causal AR(2) process Xt = φ1Xt−1 + φ2Xt−2 + εt we have α(0) = 1, α(±1) =
ρ(1) = φ1/(1 − φ2), α(2) = φ2 and α(k) = 0 for k > 2 (following the above results). Note in particular
that −1 < α(2) = φ2 < 1, because it is some kind of correlation coefficient.

For an AR(p) process it can be shown that α(p) = φp and α(k) = 0 for all k > p. The result α(k) = 0
for all k > p is due to the fact that an AR(p) model is a p-th order Markov process. For k > p, the
influence of Xt−k on Xt is totally included in Xt−1, . . . , Xt−p.

The pacf's of MA processes are very complex and are in general nonzero for all lags, like the acf of
AR models. For the simplest invertible MA(1) process Xt = ψεt−1 + εt with |ψ| < 1 it can be shown,
after lengthy calculation, that

  α(k) = − (−ψ)^k (1 − ψ²) / (1 − ψ^{2(k+1)}).   (4.59)

See Box and Jenkins (1976). For ψ < 0, α(k) < 0 ∀ k ≠ 0. For ψ > 0, α(k) has alternating signs.

We see that the acf of an MA(q) process is zero for lags |k| > q, whereas the acf of an AR(p) process
is nonzero for all lags. In contrast to this, the pacf of an MA(q) process is nonzero for all lags, and
the pacf of an AR(p) process is zero for lags |k| > p. This shows again the duality between the MA
and AR processes.

5 Linear Non-stationary Processes

The class of ARMA processes is the most important class of stationary time series. However, in
practice, in particular in finance and insurance, most observed time series are non-stationary. One
important reason for non-stationarity is the effect due to the integration of two stationary
observations. Non-stationary processes in this sense are hence called integrated ones. In the sequel,
the ARMA processes will be extended to the well known linear non-stationary integrated processes.

5.1 ARIMA Processes

Processes Xt whose d-th differenced series Yt are ARMA processes, where d = 0, 1, . . ., are called
ARIMA (autoregressive integrated moving average) processes, denoted by ARIMA(p, d, q).

Definition 5.1 (The ARIMA(p, d, q) process) If d is a non-negative integer, then {Xt} is said to be
an ARIMA(p, d, q) process if Yt := (1 − B)^d Xt is a causal stationary ARMA process.

This definition means that Xt satisfies the difference equation

  φ(B)(1 − B)^d Xt = ψ(B)εt,

or

  φ(B)Yt = ψ(B)εt with Yt = (1 − B)^d Xt,   (5.60)

where d = 0, 1, . . ., φ(z) and ψ(z) are the characteristic polynomials of the AR and MA parts,
respectively, and {εt} is a WN.

If d = 0 we simply have an ARMA process. The process is stationary, if and only if d = 0. This
means that an ARIMA model with d ≥ 1 is non-stationary.

An ARIMA process with d > 1 is not practically relevant. Hence, we will mainly consider cases with
either d = 0 or d = 1.

If a time series follows an ARIMA(p, 1, q) model, then the series of first differences, i.e.
Yt = ∆Xt = Xt − Xt−1, will follow an ARMA(p, q) model and is stationary.

Given observations x1, x2, . . . , xn of a time series Xt, the graph of the sample autocorrelations ρ̂(k),
called the correlogram, can be used empirically to check whether d = 1 or d = 0, that is, whether Xt
would be non-stationary or stationary.

Part (a) of the following figure displays a realisation x0, x1, . . . , x300 following the ARIMA(2, 1, 0)
model

  (1 − B)Xt = Yt with Yt = 0.4Yt−1 + 0.2Yt−2 + εt,

where the εt are iid N(0, 1) random variables. The differenced series yt = xt − xt−1, t = 1, 2, . . . , 300,
is shown in part (b). Note that the original series {xi} is of length 301, but the differenced series {yi}
is of length 300: one observation is lost by taking the first order difference.

The estimated acf's (the correlograms) of these two series are shown in parts (c) and (d),
respectively. The correlogram shown in part (c) indicates (empirically) clear non-stationarity of the
data, whereas that given in part (d) is a typical correlogram of a realisation of a stationary process.
The two dashed lines in a correlogram are the so-called ±2/√n confidence bounds, which will be
explained later.
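The figure described above can be reproduced along the following lines (a sketch, not the original script): simulate the stationary ARMA part, integrate it once, and compare the correlograms of the level and of the differenced series.

```r
## ARIMA(2,1,0) with phi1 = 0.4, phi2 = 0.2: level vs first differences (sketch)
set.seed(11)
y <- arima.sim(model = list(ar = c(0.4, 0.2)), n = 300)  # stationary ARMA part Y_t
x <- cumsum(c(0, y))                                     # integrate once: X_t, length 301
op <- par(mfrow = c(2, 2))
plot.ts(x, main = "(a) ARIMA(2,1,0)"); plot.ts(diff(x), main = "(b) First Diff.")
acf(x, main = "(c) ACF of X_t");       acf(diff(x), main = "(d) ACF of diff(X_t)")
par(op)
```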
[Figure: (a) a realisation of the ARIMA(2, 1, 0) model; (b) its first differences; (c) the correlogram of
the ARIMA(2, 1, 0) realisation; (d) the correlogram of the differenced series]

5.2 Random Walk with or without Drift

The random walk is a useful model for financial time series. For practical and theoretical reasons, a
RW is often assumed to start at the time point t = 0 with known X0 = x0. Without loss of
generality it is often assumed that X0 = 0.

Definition 5.2 (Random Walk) A random walk is the stochastic process defined by

  Xt = X0 for t = 0, and Xt = Xt−1 + εt for t > 0,   (5.61)

where {εt} is a WN with mean zero and variance σε².

A random walk is indeed an ARIMA(0, 1, 0) model starting from t = 1. Obviously, we have

  Xt = Σ_{i=1}^{t} εi + X0,   (5.62)

so that E(Xt) = X0 and var(Xt) = tσε²; the process is causal but not stationary.

Due to the very large variance, a realisation of a random walk often shows a (stochastic) trend,
which is however purely random.

Two realisations of length n = 301 following the same RW model with iid N(0, 1) innovations are
shown in the following figure together with their sample acf's.

A RW is also invertible, because we have

  εt = Xt − Xt−1,   (5.63)

for t > 0, where the εt are iid.

[Figure: two random walk realisations and their sample acf's]

A time series in practice may also have a non-stochastic trend together with a stochastic one. If
there is a simple linear trend in a RW, then it can be modelled by a random walk with drift, defined
by

  Xt = X0 for t = 0, and Xt = Xt−1 + µ + εt for t > 0,   (5.64)

where {εt} is a WN with mean zero and variance σε², and µ ≠ 0 is an unknown constant. Now we
have

  Xt = tµ + Σ_{i=1}^{t} εi + X0   (5.65)

with mean E(Xt) = tµ + X0, which forms a linear non-stochastic trend in such a time series.
Moreover, it holds that εt = Xt − Xt−1 − µ.
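A random walk with drift is simulated in one line with cumsum(). The sketch below is my own (the drift µ = 0.1 is arbitrary) and shows how the deterministic trend tµ in (5.65) emerges.

```r
## Random walk with drift: X_t = X_{t-1} + mu + e_t, X_0 = 0 (sketch, mu = 0.1 assumed)
set.seed(3)
mu <- 0.1
x  <- cumsum(mu + rnorm(300))   # equals t*mu + sum of innovations, cf. (5.65)
plot.ts(x, main = "Random walk with drift")
abline(a = 0, b = mu, lty = 2)  # the linear trend t*mu
```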
6 Estimation for Univariate Linear Processes

We will now discuss the estimation of

• the mean E(Xt) = µ,
• the autocovariances γ(k) = E[(Xt − µ)(Xt+k − µ)],
• the parameters,

under the assumption that Xt follows an ARIMA(p, d, q) process.

The data will be one realisation x1, . . . , xn of Xt, which will be called a time series. Furthermore,
the selection of the unknown model and the application of the estimated model for forecasting will
also be discussed. We will mainly consider the so-called large sample properties of an estimator,
based on the assumption that we have a relatively long time series.

6.1 Estimation of µ

The expected value µ = E(Xt) can be estimated by the sample mean

  µ̂ = X̄ = (1/n) Σ_{i=1}^{n} Xi.   (6.66)

If the Xt = εt are iid with E(Xt) = µ and var(Xt) = σX², we have E(X̄) = µ and var(X̄) = σX²/n,
so that

  lim_{n→∞} E[(µ̂ − µ)²] = 0.   (6.67)

For an estimator θ̂ of an unknown parameter θ, the quantity

  E[(θ̂ − θ)²] = [E(θ̂) − θ]² + var(θ̂)

is called the mean squared error (MSE) of θ̂. If MSE(θ̂) → 0 as n → ∞, then θ̂ is said to be
consistent, denoted by θ̂ → θ, as n → ∞. In the i.i.d. case X̄ is consistent.

6.2 Properties of X̄

If Xt is stationary, then X̄ defined above is clearly unbiased, i.e. E(X̄) = µ.
Hence E[(X̄ − µ)²] = var(X̄).

The following lemma will be used for some of the proofs in this chapter.

Lemma 6.1 Let {an} be a sequence of real numbers and a ∈ R. Then

  lim_{n→∞} an = a  =⇒  lim_{n→∞} (1/n) Σ_{i=1}^{n} ai = a.

Proof. By assumption, given ε > 0, we may choose an N such that |an − a| < ε/2 for all n > N.
For n > N we have

  |(1/n) Σ_{i=1}^{n} ai − a| ≤ (1/n) Σ_{i=1}^{N} |ai − a| + (1/n) Σ_{i=N+1}^{n} |ai − a|
                             ≤ (1/n) Σ_{i=1}^{N} |ai − a| + ε/2.

We can choose n large enough so that the first term is smaller than ε/2. The result follows.

Theorem 6.2 Let {Xt; t = 1, 2, . . .} be a time series satisfying

  lim_{t→∞} E(Xt) = µ  and  lim_{t→∞} cov(X̄, Xt) = 0,

where X̄ is as defined before. Then

  lim_{n→∞} E[(X̄ − µ)²] = 0.

Remark. The conditions of Theorem 6.2 mean that E(Xt) is asymptotically a constant, and no
single observation dominates the covariance. Note that stationarity is not required.

Proof. Now

  E[(X̄ − µ)²] = var(X̄) + [(1/n) Σ_{t=1}^{n} E(Xt) − µ]²,

where the second term on the right hand side converges to zero by Lemma 6.1. Furthermore,

  var(X̄) = (1/n²) Σ_{t=1}^{n} Σ_{j=1}^{n} cov(Xt, Xj)
          = (2/n²) Σ_{t=1}^{n} Σ_{j=1}^{t} cov(Xt, Xj) − (1/n²) Σ_{t=1}^{n} var(Xt)
          = (2/n²) Σ_{t=1}^{n} t cov(X̄t, Xt) − (1/n²) Σ_{t=1}^{n} var(Xt)
          ≤ (2/n) Σ_{t=1}^{n} |cov(X̄t, Xt)|,

where X̄t is the sample mean of the first t observations. By Lemma 6.1 the last sum converges to
zero. ♦

Theorem 6.3 Let {Xt} be a stationary time series with mean µ and autocovariances γ(k) such that
γ(k) → 0 as k → ∞. Then X̄ is a consistent estimator of the mean µ.

Proof. Note that for a stationary process we have E(X̄) = µ. Hence we only need to show

  lim_{n→∞} var(X̄) = 0.

Now

  var(X̄) = cov((1/n) Σ_{i=1}^{n} Xi, (1/n) Σ_{k=1}^{n} Xk) = (1/n) Σ_{i=1}^{n} cov(Xi, (1/n) Σ_{k=1}^{n} Xk),

and for each i,

  |cov(Xi, (1/n) Σ_{k=1}^{n} Xk)| ≤ (1/n) Σ_{k=1}^{n} |cov(Xi, Xk)| = (1/n) Σ_{k=1}^{n} |γ(k − i)|
                                  ≤ (2/n) Σ_{k=0}^{n} |γ(k)| =: an → 0,

since γ(k) → 0 (Lemma 6.1). We obtain

  var(X̄) ≤ (1/n) Σ_{i=1}^{n} an = an → 0

as n → ∞. The result holds. ♦
Example. For any causal stationary ARMA process Xt we have γ(k) → 0. Hence, X̄ → µ as n → ∞.

Example. Let Z be a Bernoulli random variable with distribution P(Z = 1) = P(Z = 0) = 0.5.
Define

  Xt = sign(Z − 0.5) = 1 for Z = 1 and −1 for Z = 0,   t = 1, 2, . . . .

It is easy to show that {Xt} is a stationary process with zero mean and γ(k) ≡ 1 for all k, i.e.
γ(k) does not tend to 0 as k → ∞. For this process we have either X̄ ≡ 1 (for z = 1) or X̄ ≡ −1
(for z = 0). Neither of these is equal to, or converges to, µ = 0.

Lemma 6.4 (Kronecker's lemma) If the sequence {aj} is such that

  lim_{n→∞} Σ_{j=0}^{n} |aj| < ∞,

then

  lim_{n→∞} Σ_{j=0}^{n} (j/n) |aj| = 0.

Proof. Set e.g. N = n^{1/3}. Then we have

  Σ_{j=0}^{n} (j/n) |aj| = Σ_{j=0}^{N} (j/n) |aj| + Σ_{j=N+1}^{n} (j/n) |aj|,

where the second term tends to zero by assumption, because N → ∞, and

  Σ_{j=0}^{N} (j/n) |aj| ≤ max(|aj|) n^{−1/3} → 0

too, as n → ∞. The result holds.

Theorem 6.5 Assume that {Xt} is a stationary time series with absolutely summable
autocovariances γ(k). Then X̄ → µ, as n → ∞. Furthermore,

  lim_{n→∞} n var(X̄) = Σ_{k=−∞}^{∞} γ(k).   (6.68)

Proof. We only need to show the second statement, for which we have

  n var(X̄) = (1/n) Σ_{j=1}^{n} Σ_{t=1}^{n} γ(t − j) = Σ_{k=−(n−1)}^{n−1} [(n − |k|)/n] γ(k)
            = Σ_{k=−(n−1)}^{n−1} γ(k) − 2 Σ_{k=1}^{n−1} (k/n) γ(k).

By Lemma 6.4 the second term of the last equation tends to zero. Theorem 6.5 is proved.

The asymptotic variance of X̄ is larger than that of the sample mean of iid random variables Yt
with the same variance as Xt, if Σ_{k=−∞}^{∞} γ(k) > γ(0), and smaller, if Σ_{k=−∞}^{∞} γ(k) < γ(0).

For a causal stationary ARMA(p, q) process, the above result reduces to

  var(X̄) ≈ (σε²/n) [Σ_{j=0}^{q} ψj / (1 − Σ_{i=1}^{p} φi)]².

Example: Let x1, x2, . . . , x400 be an observed time series following the theoretical model

  Xt − µ = 0.5(Xt−1 − µ) + 0.3(Xt−2 − µ) + εt,

where the εt are iid N(0, σε²) random variables. Calculate the asymptotic variance of the sample
mean x̄. Furthermore, assume that yt, t = 1, 2, . . . , 400, are iid random variables with
var(Yt) = var(Xt) and unknown mean. Compare var(ȳ) with the asymptotic variance of x̄, where ȳ
is the sample mean of the yt.

Solution. We have

  var(x̄) ≈ (σε²/400)(1 − 0.5 − 0.3)^{−2} = (25/400)σε² = 0.0625σε².

Using the formula for γ(0) of an AR(2) model, we obtain

  var(Yt) = var(Xt) = γ(0) = (1 − φ2)σε² / {(1 + φ2)[(1 − φ2)² − φ1²]}
          = (1 − 0.3)σε² / {(1 + 0.3)[(1 − 0.3)² − 0.5²]} = [0.7/(1.3 · 0.24)]σε² = 2.244σε².

We have var(ȳ) = var(Yt)/400 = 0.0056σε², i.e., asymptotically, var(x̄) > 10 var(ȳ).

The following theorem provides a CLT for a general linear process, which is a linear filter of an iid
WN εt with var(εt) = σε².

Theorem 6.6 Let Xt be a causal stationary process with MA(∞) representation

  Xt = Σ_{j=0}^{∞} αj εt−j

with Σ_{j=0}^{∞} |αj| < ∞ and Σ_{j=0}^{∞} αj ≠ 0, where the εt are iid random variables with E(εt) = 0
and var(εt) = σε². Then

  √n X̄ →D N(0, V),   (6.69)

where

  V = Σ_{k=−∞}^{∞} γ(k) = (Σ_{j=0}^{∞} αj)² σε²

and the sign →D means convergence in distribution.
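The asymptotic formula var(x̄) ≈ (σε²/n)(1 − φ1 − φ2)^{−2} used in the AR(2) example above can be checked by simulation. The sketch below is my own (σε² = 1 and 2000 replications are assumptions) and compares the Monte Carlo variance of the sample mean with the asymptotic value.

```r
## Monte Carlo check of var(x_bar) for X_t = 0.5 X_{t-1} + 0.3 X_{t-2} + e_t, n = 400 (sketch)
set.seed(5)
xbar <- replicate(2000, mean(arima.sim(model = list(ar = c(0.5, 0.3)), n = 400)))
var(xbar)                          # empirical variance of the sample mean
(1 / 400) * (1 - 0.5 - 0.3)^(-2)   # asymptotic value 0.0625 from Theorem 6.5 / (3.56)
```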
Solution. Xt is MA(2). We have
Note that Theorem 6.6 holds for all ARMA processes.
0.25 · 9
. σ2
var(x̄) = (1 − 0.8 + 0.3)2 =
= 0.0025.
n
900
We can give a confidence interval for µ of an ARMA model.
√
√
Since nX̄ tends to a normal distribution, X̄ is called n convergent.
In general case with E(Xt ) = µ 6= 0, we have
√
n(X̄ − µ) →D N (0, V ),
and γ(0) = (1 + ψ12 + ψ22 )σ2 = 1.73σ2 and hence var(ȳ) = var(Yt )/n = γ(0)/900 = 0.0173.
.
Asymptotically, var(x̄) = 0.1445var(ȳ) and SDX̄ = 0.05, SDȲ = 0.1315.
(6.70)
The approximate 95% confidence intervals are:
µX ∈ [x̄ − 2SDX̄ , x̄ + 2SDX̄ ] = [35.15, 35.35]
where V is the same as in (6.69).
The assumption
∞
P
and
αj 6= 0 is necessary.
µY ∈ [ȳ − 2SDȲ , ȳ + 2SDȲ ] = [25.487, 26.013].
j=0
Example. Let Xt = t − t−1 . Now we have
∞
P
αj = 1 − 1 = 0 and hence the results of Theorem
j=0
6.6 do not hold for this Xt .
Example. Continue the example about the variances of x̄ and ȳ. Assume that there σ2 = 1, we
have the standard deviations of x̄ and ȳ are about 0.25 and 0.075, respectively. For simplicity,
.
we can use the standard normal quantile Z0.025 = 1.96 = 2 to calculate the approximate 95%
confidence interval. Now an approximate 95% confidence interval, e.g. for µX , is simply x̄ ± 2 ×
SDX̄ . Assume that we obtained x̄ = 10.5 and ȳ = 15.15 from the data. Then the approximate
95% confidence intervals are µX ∈ [10, 11] and µY ∈ [15, 15.30]. The length of the former is more
than three times of that of the latter.
√
Following the CLT n(X̄ − µ) →D N (0, V ) we have, asymptotically,
In this example, the asymptotic variance of x̄ is about 14.5% of that of ȳ by an iid random variable
with the same variance. Hence, the aymptotic standard deviation of x̄ is also much smaller than
that of ȳ. Consequently, the confidence interval of µX is much shorter than that of µY .
This example provides a case where the estimation of the unknown mean in dependent data is
more accurate than the estimation in independent data.
X̄ − µ
q
∼ N (0, 1),
V
n
where
p
V /n is the asymptotic standard deviation of X̄. For an ARMA model we have simply
!2
Pq
j=0 ψj
P
V = σ2
.
1 − pi=1 φi
This means, for any (upper) normal quantile Zα/2 we have, with about (1 − α) cover probability,
r
r
V
V
−Zα/2
≤ X̄ − µ ≤ Zα/2
,
n
n
or equivalently,
"
µ ∈ X̄ − Zα/2
r
V
, X̄ + Zα/2
n
r #
V
.
n
Example: Let x1 , x2 , ..., x900 be an observed time series following the theoretical model
Xt − µX = −0.8t−1 + 0.3t−2 + t ,
where t are iid N (0, σ2 ) random variables with σ2 = 9. Furthermore, assume that yt , t =
1, 2, ..., 900, are iid random variables with var(Yt ) = var(Xt ) and unknown mean. Assume we
have x̄ = 35.25 and ȳ = 25.75. Calculate var(x̄) asymptotically. Calculate var(ȳ) and compare it
with var(x̄). And then construct the approximate 95% confidence intervalers of µX and µY , and
compare them with each other.
42
43
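Remark (computation). The calculations in the last example can be checked numerically. The following is a minimal Python sketch, assuming only numpy; the variable names are illustrative and not part of the lecture.

    import numpy as np

    # MA(2): X_t - mu = eps_t - 0.8*eps_{t-1} + 0.3*eps_{t-2}, eps_t ~ N(0, sigma2)
    psi = np.array([1.0, -0.8, 0.3])    # MA coefficients, psi_0 = 1
    sigma2 = 9.0
    n = 900
    xbar, ybar = 35.25, 25.75           # sample means given in the example

    V = sigma2 * psi.sum() ** 2         # asymptotic variance of sqrt(n)*xbar, eq. (6.69)
    var_xbar = V / n                    # ~ 0.0025
    gamma0 = sigma2 * (psi ** 2).sum()  # var(X_t) = var(Y_t) = 1.73 * sigma2
    var_ybar = gamma0 / n               # ~ 0.0173

    sd_x, sd_y = np.sqrt(var_xbar), np.sqrt(var_ybar)
    print("95% CI for mu_X:", (xbar - 2 * sd_x, xbar + 2 * sd_x))   # [35.15, 35.35]
    print("95% CI for mu_Y:", (ybar - 2 * sd_y, ybar + 2 * sd_y))   # [25.487, 26.013]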
6.3 Estimation of the acf
Given a stationary process with mean µ and autocovariances γ(k), two reasonable estimators of γ(k) are (sample size n)

    γ̂(k) = (1/n) Σ_{t=1}^{n−k} (Xt − X̄)(Xt+k − X̄)   (6.71)

and

    γ̃(k) = (1/(n − k)) Σ_{t=1}^{n−k} (Xt − X̄)(Xt+k − X̄)   (6.72)

for k = 0, 1, ..., n − 1. For k = −(n − 1), ..., −1 we define γ̂(k) = γ̂(−k) and γ̃(k) = γ̃(−k). Note that γ̂(0) = γ̃(0).

Remark. If µ is known, we can use µ instead of X̄ in the above definitions. Now the error caused by X̄ is avoided. However, it can be shown that, for fixed k, the asymptotic properties of γ̂(k) or γ̃(k) are the same for cases with known or unknown µ.

The autocorrelations can be estimated by

    ρ̂(k) = γ̂(k)/γ̂(0)   (6.73)

or

    ρ̃(k) = γ̃(k)/γ̃(0).   (6.74)

For k = 0 we have ρ̂(0) = ρ̃(0) ≡ 1. Hence, for estimating ρ(k) we only need to discuss the properties of these estimators for k ≠ 0.

Which of these two estimators should be used? At first sight it seems that γ̃(k) is more logical than γ̂(k), because there are just n − k product terms in the sum. However,

• bias(γ̂(k)) > bias(γ̃(k)),

• MSE(γ̂(k)) < MSE(γ̃(k)) (in general),

• γ̃(k) is not necessarily positive semi-definite, but γ̂(k) is always positive semi-definite.

For these reasons, γ̂(k) is the estimator usually implemented in statistical software.

γ̂(k) also has a clear disadvantage, namely

    Σ_{k=−(n−1)}^{n−1} γ̂(k) = 0

for any data set, no matter what the process is. This is a restriction introduced by the definition and is not a property of the underlying process. Hence, only γ̂(k) with relatively small lags should be estimated and used. Also the estimator γ̃(k) should only be calculated for relatively small k. A rule of thumb is that the maximal lag used, say m, should be negligible compared to n, e.g. m = √n.

Remark. If k is negligible compared to n, then the difference between γ̂(k) and γ̃(k) is small. For fixed k, γ̃(k) resp. ρ̃(k) have the same asymptotic properties as γ̂(k) resp. ρ̂(k).

Theorem 6.7 Let the time series {Xt} be defined by

    Xt − µ = Σ_{j=−∞}^{∞} αj εt−j,

where the sequence {αj} is absolutely summable and the εt are iid (0, σ²) innovations with E(εt⁴) = ησ⁴. Then, for fixed k ≥ h ≥ 0,

    E{γ̂(k) − γ(k)} = −(|k|/n) γ(k) − ((n − |k|)/n) var(X̄) + O(n⁻²)   (6.75)

and

    lim_{n→∞} (n²/(n − k)) cov{γ̂(k), γ̂(h)} = (η − 3)γ(k)γ(h) + Σ_{j=−∞}^{∞} [γ(j)γ(j − k + h) + γ(j + h)γ(j − k)].   (6.76)

Remark. O(n⁻²) denotes a series of the order n⁻². For two series ai and bi, we say ai is of the same order as bi, denoted by ai = O(bi), if lim_{i→∞} ai/bi tends to some non-zero constant. And if lim_{i→∞} ai/bi = 0, we say that ai is of a smaller order than bi, denoted by ai = o(bi).

Remark. What we can learn from the above results is this: for estimating µ, the existence of the variance is often required, and for estimating γ(k), the existence of the fourth order moments is often required. Note that, if the εt are independent N(0, σ²) random variables, then (η − 3)γ(k)γ(h) = 0, since η = 3 for a normal distribution.

Theorem 6.8 Under the assumptions of Theorem 6.7 and for fixed i ≥ j > 0, the following results hold:

    E{ρ̂(i)} = ((n − |i|)/n) ρ(i) + O(n⁻¹)

and

    n cov{ρ̂(i), ρ̂(j)} ≈ wij,   (6.77)

where the wij are constants depending on ρ(k) in a very complex way.

Let k ≥ 1 be fixed. Define ρ = (ρ(1), ..., ρ(k))′ and ρ̂ = (ρ̂(1), ..., ρ̂(k))′. The following theorem shows that ρ̂(h), h = 1, 2, ..., k, for fixed k are asymptotically normal.

Theorem 6.9 Under the assumptions of Theorem 6.7 and for fixed k ≥ 1, we have

    ρ̂ →D N(ρ, n⁻¹ W),

where W = (wij) with wij defined above.

Example. Let Xt = εt be WN with E(εt) = 0 and var(εt) = σ² (so ρ(k) = 0 for k ≠ 0). Then

    wij = 1 if i = j, and wij = 0 otherwise.

Hence ρ̂(1), ρ̂(2), ..., ρ̂(k) are asymptotically independent with mean zero and variance 1/n. Therefore, 95% of the sample autocorrelations should lie between the bounds

    ±1.96/√n ≈ ±2/√n.

These are the so-called ±2/√n confidence bands given on a correlogram in R for the sample acf, which show empirically whether the underlying process could be a WN or not.

Remark. Note that this result is obtained under the iid assumption. It should be noticed that, if more than 5% of the ρ̂(k) lie outside these bounds, then we can say that the underlying process is possibly not an independent WN process. However, if more than 95% of the ρ̂(k) lie between these two bounds, we cannot say that the process is iid, because the second order properties of an iid process and an uncorrelated white noise process are the same.
7 Estimation of the ARMA Model

In this section we will discuss the estimation of the unknown parameters of an ARMA model. At first it is assumed that the orders p and q are known. For an ARMA(p, q) model with

    φ(B)Xt = ψ(B)εt   (7.78)

the unknown parameters are θ = (σ²; φ1, ..., φp; ψ1, ..., ψq)′.

7.1 Estimation of an AR(1) Model

Consider the AR(1) model

    Xt = φXt−1 + εt,   (7.79)

where |φ| < 1 and εt is a WN process. We have two unknown parameters, σ² and φ.

Note that φ = ρ(1). Hence an estimator of ρ(1) will also be an estimator of φ. Given observations x1, ..., xn, the simplest estimator of φ is hence

    φ̂ = ρ̂(1) = Σ_{i=1}^{n−1} (xi − x̄)(xi+1 − x̄) / Σ_{i=1}^{n} (xi − x̄)².   (7.80)

If we know that E(Xt) = 0, the use of x̄ is not necessary and φ̂ can also be calculated by

    φ̂ = ρ̂(1) = Σ_{i=1}^{n−1} xi xi+1 / Σ_{i=1}^{n} xi².   (7.81)

The properties of φ̂ are the same as discussed in the last section. In particular, |φ̂| ≤ 1 (why?) and, under given conditions, it is √n-consistent and asymptotically normally distributed.

Example. AR(1) model with zero mean: From a time series x1, ..., x400 we have obtained

    Σ_{i=1}^{400} xi² = 1012.74 and Σ_{i=1}^{399} xi xi+1 = 798.15.

Then we have φ̂ = ρ̂(1) = 798.15/1012.74 = 0.788. The fitted model is

    Xt = 0.788Xt−1 + εt.

This example is indeed calculated using a realisation following the AR(1) model Xt = 0.8Xt−1 + εt, where the εt are iid N(0, 1) random variables.

The estimation of an AR(1) model in the case with unknown mean µX is similar. Now we should first calculate x̄ (from the sum of all xt, Σ_{t=1}^{n} xt), and then calculate the two sums Σ_{t=1}^{n} (xt − x̄)² and Σ_{t=1}^{n−1} (xt − x̄)(xt+1 − x̄). If information about these three sums is given, it is enough for all further calculations.

Example. Assume that we have a time series x1, ..., x900 following an AR(1) model with unknown mean. From the data we obtained

    Σ_{t=1}^{900} xt = 27254.45, Σ_{t=1}^{900} (xt − x̄)² = 13347.46 and Σ_{t=1}^{899} (xt − x̄)(xt+1 − x̄) = 8385.93.

Then

    x̄ = 27254.45/900 = 30.28

and

    φ̂ = ρ̂(1) = Σ_{t=1}^{899} (xt − x̄)(xt+1 − x̄) / Σ_{t=1}^{900} (xt − x̄)² = 8385.93/13347.46 = 0.6283.

The fitted model is

    Xt − 30.28 = 0.6283(Xt−1 − 30.28) + εt.

For an AR(1) model, the unknown innovation variance σ² can also be estimated from the information given in the above examples. Let Yt = Xt − µ; then Yt is an AR(1) process with zero mean, unknown parameter φ and var(Yt) = var(Xt). We know Yt has the MA(∞) representation

    Yt = Σ_{i=0}^{∞} φ^i εt−i,

from which we have

    var(Xt) = var(Yt) = γ(0) = (Σ_{i=0}^{∞} φ^{2i}) σ² = σ²/(1 − φ²).

By definition, we have γ̂(0) = (1/n) Σ_{t=1}^{n} xt² in the zero-mean case. The unknown innovation variance σ² can hence be estimated from φ̂ and γ̂(0):

    σ̂² = γ̂(0)(1 − φ̂²).

Example. In the first example above, where it is assumed that µ = 0, we have simply

    γ̂(0) = (1/n) Σ_{t=1}^{n} xt² = 1012.74/400 = 2.532

and

    σ̂² = (1 − φ̂²)γ̂(0) = 2.532(1 − 0.788²) = 0.960.

Example. In the second example above, we have

    γ̂(0) = (1/n) Σ_{t=1}^{n} (xt − x̄)² = 13347.46/900 = 14.831

and

    σ̂² = (1 − φ̂²)γ̂(0) = 14.831(1 − 0.6283²) = 8.976.

Furthermore, we have var(x̄) ≈ (8.976/900)(1 − 0.6283)⁻² = 0.0722, and hence SDx̄ ≈ 0.27. The approximate 95%-CI of µ is µ ∈ [30.28 − 2 · 0.27, 30.28 + 2 · 0.27] = [29.74, 30.82]. If we want to check whether the distribution is normal or not, a more detailed analysis would be involved.

Remark. Denote by Yt = Xt − µ and X1t = Xt−1. Then we have, for t = 2, ..., n,

    Yt = φX1t + εt.

We have E[Yt | X1t = xt−1] = φxt−1. That is, we obtain a linear regression function f(x1t) = a + bx1t with a = 0 and b = φ, which is called a regression through the origin because of the fact f(0) = 0. The proposed φ̂ here is indeed an approximate least squares estimator of φ in the above regression model. This idea can be extended to general AR(p) models.
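Remark (computation). A minimal Python sketch, assuming only numpy, of the AR(1) estimation with unknown mean; the simulated series only roughly reproduces the numbers of the example above, and the function name is illustrative.

    import numpy as np

    def fit_ar1(x):
        """phi_hat = rho_hat(1), sigma2_hat = gamma_hat(0)(1 - phi_hat^2), and var(xbar)."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        d = x - x.mean()
        s0 = np.sum(d ** 2)             # sum (x_t - xbar)^2
        s1 = np.sum(d[:-1] * d[1:])     # sum (x_t - xbar)(x_{t+1} - xbar)
        phi_hat = s1 / s0
        sigma2_hat = (s0 / n) * (1 - phi_hat ** 2)
        var_xbar = sigma2_hat / n / (1 - phi_hat) ** 2   # asymptotic variance of the sample mean
        return x.mean(), phi_hat, sigma2_hat, var_xbar

    rng = np.random.default_rng(1)
    x = np.zeros(900)
    for t in range(1, 900):             # roughly X_t - 30 = 0.63 (X_{t-1} - 30) + eps_t
        x[t] = 0.63 * x[t - 1] + rng.standard_normal()
    x += 30.0
    print(fit_ar1(x))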
7.2 The Yule-Walker Estimators of an AR(p) Model

For an AR(p) process Y with unknown mean µ we have

    Yt − µ = φ1(Yt−1 − µ) + ··· + φp(Yt−p − µ) + εt.   (7.82)

Let Xt = Yt − µ. We obtain

    Xt = φ1Xt−1 + ··· + φpXt−p + εt.

Remark: This is a linear regression model (through the origin). Although this is not a usual regression model, because the regressors Xt−1, ..., Xt−p depend strongly on each other, the unknown parameters φ1, ..., φp can still be estimated using the method of least squares as in standard regression analysis. A solution based on such a least squares method will lead to the same result as the following Yule-Walker estimation.

Consider again the equations for γ(k) of an AR(p); see one of the previous lectures for the case p = 2:

    γ(0) − φ1γ(1)     − φ2γ(2)     − ... − φpγ(p)     = σ²,
    γ(1) − φ1γ(0)     − φ2γ(1)     − ... − φpγ(p − 1) = 0,
    γ(2) − φ1γ(1)     − φ2γ(0)     − ... − φpγ(p − 2) = 0,
     ...
    γ(p) − φ1γ(p − 1) − φ2γ(p − 2) − ... − φpγ(0)     = 0.

The last p equations can be rewritten as

    ( γ(1) )   ( γ(0)     γ(1)     γ(2)     ...  γ(p−1) ) ( φ1 )
    ( γ(2) )   ( γ(1)     γ(0)     γ(1)     ...  γ(p−2) ) ( φ2 )
    ( γ(3) ) = ( γ(2)     γ(1)     γ(0)     ...  γ(p−3) ) ( φ3 ),   (7.83)
    (  ...  )   (  ...      ...      ...            ...   ) ( ... )
    ( γ(p) )   ( γ(p−1)   γ(p−2)   γ(p−3)   ...  γ(0)   ) ( φp )

i.e. γ = Γφ, and hence

    γ = Γφ ⇐⇒ φ = Γ⁻¹γ,

provided the inverse of Γ exists.

We now replace γ and Γ by their estimates γ̂ and Γ̂ to obtain our estimates for φ,

    φ̂ = Γ̂⁻¹γ̂,   (7.84)

provided the inverse of Γ̂ exists. Generally, Γ̂⁻¹ exists, because γ̂(k) is positive semi-definite. Results based on (7.84) are called Yule-Walker estimators.

Remark. If all γ̂(k) are replaced by ρ̂(k), the solutions will not change (why?).

Once we have estimates for γ and φ we obtain an estimate for σ² from

    γ(0) − φ1γ(1) − φ2γ(2) − ... − φpγ(p) = σ²,

or from the explicit formulas for σ² for AR(1) and AR(2) processes.

Two special cases:

(i) For AR(1) with p = 1, the system (7.83) contains only one equation, i.e. φ̂1 = [γ̂(0)]⁻¹γ̂(1) = γ̂(1)/γ̂(0) = ρ̂(1), as proposed earlier.

(ii) For AR(2) with p = 2, by replacing all γ̂(k) with ρ̂(k), we have

    ( φ̂1 )   ( 1      ρ̂(1) )⁻¹ ( ρ̂(1) )
    ( φ̂2 ) = ( ρ̂(1)  1    )    ( ρ̂(2) ).

The solutions are

    φ̂1 = ρ̂(1)(1 − ρ̂(2)) / (1 − ρ̂²(1)) and φ̂2 = 1 − (1 − ρ̂(2)) / (1 − ρ̂²(1)).

Example. From a time series with n = 400, x1, ..., x400, we have x̄ = 10.06, γ̂(0) = 1.6378, γ̂(1) = 0.9176 and γ̂(2) = 0.1969. Assume that the theoretical model is an AR(2) with unknown mean µ. Estimate and write down the model. Find a 95%-CI for µ.

Solution. We have ρ̂(1) = 0.9176/1.6378 = 0.5603 and ρ̂(2) = 0.1969/1.6378 = 0.1202. Hence

    φ̂1 = ρ̂(1)(1 − ρ̂(2)) / (1 − ρ̂²(1)) = 0.5603 · (1 − 0.1202)/(1 − 0.5603²) = 0.7185

and

    φ̂2 = 1 − (1 − ρ̂(2)) / (1 − ρ̂²(1)) = 1 − (1 − 0.1202)/(1 − 0.5603²) = −0.2824.

The estimated AR(2) model is

    Xt − 10.06 = 0.7185(Xt−1 − 10.06) − 0.2824(Xt−2 − 10.06) + εt.

Now, inserting φ̂1 and φ̂2 into the formula

    γ(0) = (1 − φ2)σ² / ((1 + φ2)[(1 − φ2)² − φ1²]),

we have γ̂(0) = 1.5839σ², and hence σ̂² = 1.6378/1.5839 = 1.034.

Furthermore, we have

    var(x̄) ≈ (σ̂²/400)(1 − φ̂1 − φ̂2)⁻² = (1.034/400)(1 − 0.7185 + 0.2824)⁻² = 0.0081.

Hence, SDx̄ ≈ 0.09. The approximate 95% CI for µ is

    µ ∈ [10.06 − 2 · 0.09, 10.06 + 2 · 0.09] = [9.88, 10.24].

Remark. The estimation of an AR(p) model is not difficult, because it is related to a regression problem. The first p + 1 estimates of the autocovariances γ̂(k), k = 0, 1, ..., p, (together with n) contain all the information we need to estimate an AR(p) model using the above proposal.
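Remark (computation). A minimal Python sketch, assuming only numpy, of the Yule-Walker estimation for the AR(2) example above; σ̂² is obtained here from the first equation of the system, which is equivalent to the explicit AR(2) formula used in the text.

    import numpy as np

    g = np.array([1.6378, 0.9176, 0.1969])           # gamma_hat(0), gamma_hat(1), gamma_hat(2)
    p = 2
    Gamma = np.array([[g[abs(i - j)] for j in range(p)] for i in range(p)])
    gamma = g[1 : p + 1]
    phi_hat = np.linalg.solve(Gamma, gamma)           # approx [0.7185, -0.2824]
    sigma2_hat = g[0] - phi_hat @ gamma               # gamma(0) - phi_1 gamma(1) - ... = sigma^2
    var_xbar = sigma2_hat / 400 / (1 - phi_hat.sum()) ** 2
    print(phi_hat, sigma2_hat, var_xbar)              # approx 1.034 and 0.0081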
7.3 Least Squares Estimators of an ARMA Model

In the following, only the basic ideas for estimating an ARMA(p, q) model will be described briefly.

Assume that E(Xt) = 0. For an AR(p) we have

    εt = Xt − φ1Xt−1 − ... − φpXt−p.

The least squares estimators φ̂1, ..., φ̂p are obtained by minimising the squared sum

    Q := Σ_{t=p+1}^{n} ε̂t² = Σ_{t=p+1}^{n} (Xt − φ̂1Xt−1 − ... − φ̂pXt−p)².

This convenience does not exist if the model has an MA part. Now we have

    εt = Xt − φ1Xt−1 − ... − φpXt−p − ψ1εt−1 − ... − ψqεt−q,   (7.86)

which involves unobservable ε-values in the past. Hence, an approximate recursion is required to carry out a least squares estimation for a general ARMA model.

For some trial parameters φ1, ..., φp and ψ1, ..., ψq, the ε̂t can be approximated in the following way.

(i) Calculate µ̂ = x̄ and define yt = xt − x̄, the centralised data again.

(ii) Put ε̂t = 0 and yt = 0 for t ≤ 0, because we do not have information about them.

(iii) Set

    ε̂1 = y1,
    ε̂2 = y2 − φ1y1 − ψ1ε̂1,
    ...
    ε̂t = yt − φ1yt−1 − ... − φpyt−p − ψ1ε̂t−1 − ... − ψqε̂t−q

for t > max(p, q). The effect due to the above approximation is negligible for large n.

Now we can define

    Q(φ1, ..., φp; ψ1, ..., ψq) = Σ_{t=1}^{n} ε̂t².   (7.87)

We estimate φ1, ..., φp and ψ1, ..., ψq by minimising Q over all values of φ1, ..., φp and ψ1, ..., ψq within the causal stationary regions of these parameters. These are the approximate least squares estimators of an ARMA model. Note that an estimated ARMA model will always be causal stationary, because parameters outside the causal stationary regions are not considered. This is a restriction of the estimation method, which does not mean that the underlying model must certainly be a causal stationary ARMA model. We could test the hypothesis that the model is an ARMA, but this is not the topic of our lecture.

Remark: Another commonly used method for estimating an ARMA model is the maximum likelihood (ML) method, which will not be discussed in our lecture. When the distribution of Xt is normal, the least squares method and the ML method are asymptotically equivalent.
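Remark (computation). A minimal Python sketch, assuming only numpy, of the approximate least squares criterion Q of (7.87); the grid search at the end is only meant to illustrate the criterion, not the optimiser one would use in practice.

    import numpy as np

    def approx_Q(x, phi, psi):
        """Q(phi, psi) of (7.87) via the recursion (iii), with eps_hat_t = 0 and y_t = 0 for t <= 0."""
        y = np.asarray(x, dtype=float) - np.mean(x)      # centralised data
        p, q = len(phi), len(psi)
        eps = np.zeros(len(y))
        for t in range(len(y)):
            ar = sum(phi[i] * (y[t - 1 - i] if t - 1 - i >= 0 else 0.0) for i in range(p))
            ma = sum(psi[j] * (eps[t - 1 - j] if t - 1 - j >= 0 else 0.0) for j in range(q))
            eps[t] = y[t] - ar - ma
        return np.sum(eps ** 2)

    # Evaluate Q on a grid of trial ARMA(1,1) parameters for a simulated series.
    rng = np.random.default_rng(2)
    e = rng.standard_normal(500)
    x = np.zeros(500)
    for t in range(1, 500):
        x[t] = 0.5 * x[t - 1] + e[t] + 0.4 * e[t - 1]
    grid = np.linspace(-0.9, 0.9, 19)
    best = min(((approx_Q(x, [a], [b]), a, b) for a in grid for b in grid))
    print(best)    # the smallest Q should be attained near (phi, psi) = (0.5, 0.4)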
7.4 Selection of an ARMA(p, q) Model

Assume that the underlying process Xt is an ARMA(p0, q0) model with unknown (p0, q0). The question is how the unknown orders (p0, q0) can be estimated. This is the so-called model selection problem, which also occurs in other areas of statistics. Earlier it was indicated that the partial acf can be used for selecting an AR model. This method is, however, no longer commonly used in modern statistics. Nowadays, model selection is usually carried out following some information criterion. The optimal orders p0 and q0 are then estimated (or selected) from some reasonable range of values of p and q.

Note that, for given (p, q), an ARMA(p, q) model is estimated by minimising the least squares criterion

    Q = Σ_{t=1}^{n} ε̂t².   (7.88)

However, the criterion Q cannot be used directly for selecting p and q, because the larger p or q are, the smaller the minimised Q is. Hence, for model selection, a penalty term for large p and q should be introduced.

For given (p, q), denote the minimised value of Q by Q(p, q). Then define

    σ̂²_{p,q} = Q(p, q)/n.   (7.89)

Note that Q is the squared sum of the ε̂t. Hence σ̂²_{p,q} defined in this way is an estimator of σ² by definition. Two well known information criteria proposed in the literature based on σ̂²_{p,q} are

(i) the AIC (Akaike's Information Criterion, Akaike, 1978)

    AIC(p, q) = ln σ̂²_{p,q} + 2 (p + q)/n,   (7.90)

(ii) the BIC (Bayesian Information Criterion, Akaike, 1977, Schwarz, 1978)

    BIC(p, q) = ln σ̂²_{p,q} + ln(n) (p + q)/n.   (7.91)

The penalty term in the BIC is larger than that in the AIC (since ln(n) > 2 for n ≥ 8). Hence, the orders selected following the BIC are not larger than those selected following the AIC.

The main difference between these two criteria is that p̂ and q̂ selected following the BIC are consistent estimators of p0 and q0, i.e. they tend to p0 and q0, respectively, as n → ∞, whereas the orders selected following the AIC are not consistent. For this reason we prefer to use the BIC. We see that results in time series analysis are often obtained based on a large number of observations.

Simulated examples: Three realisations are generated following the AR(4) model

    Xt = 0.4Xt−1 + 0.2Xt−2 + 0.0Xt−3 − 0.3Xt−4 + εt

with iid N(0, 1) εt. Table 3.1 shows the estimated BIC's for p = 0, 1, ..., 8, where the p̂ with minimal BIC is indicated with a star.

Table 3.1. BIC's and p̂ for the three simulated time series

    p     1. S.    2. S.    3. S.
    0     0.446    0.233    0.548
    1     0.208    0.115    0.272
    2     0.223    0.119    0.272
    3     0.224    0.133    0.293
    4     0.137*   0.091*   0.183*
    5     0.158    0.122    0.211
    6     0.179    0.153    0.237
    7     0.188    0.181    0.268
    8     0.201    0.207    0.297

The estimated models obtained from these three time series, using p̂ as the AR order, are:

(i) Xt = 0.407Xt−1 + 0.254Xt−2 − 0.027Xt−3 − 0.323Xt−4 + εt

(ii) Xt = 0.326Xt−1 + 0.111Xt−2 − 0.029Xt−3 − 0.264Xt−4 + εt

(iii) Xt = 0.462Xt−1 + 0.142Xt−2 + 0.088Xt−3 − 0.358Xt−4 + εt.
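Remark (computation). A minimal Python sketch, assuming only numpy, of BIC-based order selection for pure AR models (q = 0), in the spirit of the simulated AR(4) example above; the AR(p) fit is done by least squares regression on the lagged values.

    import numpy as np

    def bic_ar(x, p):
        """BIC(p, 0) of (7.91) for an AR(p) fitted by least squares."""
        x = np.asarray(x, dtype=float) - np.mean(x)
        n = len(x)
        if p == 0:
            sigma2 = np.mean(x ** 2)
        else:
            X = np.column_stack([x[p - i - 1 : n - i - 1] for i in range(p)])   # lags 1..p
            y = x[p:]
            phi, *_ = np.linalg.lstsq(X, y, rcond=None)
            sigma2 = np.sum((y - X @ phi) ** 2) / n
        return np.log(sigma2) + np.log(n) * p / n

    rng = np.random.default_rng(3)
    e = rng.standard_normal(1000)
    x = np.zeros(1000)
    for t in range(4, 1000):     # the AR(4) model of the simulated example
        x[t] = 0.4 * x[t - 1] + 0.2 * x[t - 2] + 0.0 * x[t - 3] - 0.3 * x[t - 4] + e[t]
    p_hat = min(range(9), key=lambda p: bic_ar(x, p))
    print(p_hat)                 # typically 4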
7.5 Selection of d in an ARIMA(p, d, q) Model

Assume now that the time series Xt follows an ARIMA(p, d, q) with d ∈ {0, 1}. We also have to determine whether Xt is integrated (d = 1) or not (d = 0). This can be done in the following way.

(i) Set d = 0. Estimate ARMA models from the original data x1, ..., xn and select p̂0, q̂0 using the BIC. We obtain the minimised BIC in this case, denoted by BIC(0).

(ii) Set d = 1. Let yt = ∆xt = xt − xt−1 for t = 2, ..., n. Estimate ARMA models from y2, ..., yn and select p̂1, q̂1 using the BIC. We obtain the minimised BIC in this case, denoted by BIC(1).

(iii) Compare the two values of the BIC. We select d = 0 if BIC(0) < BIC(1), and use the ARIMA(p̂0, 0, q̂0) (i.e. the ARMA(p̂0, q̂0)) model obtained in (i). Otherwise we select d = 1 and use the ARIMA(p̂1, 1, q̂1) model obtained in (ii).
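Remark (computation). A minimal Python sketch of the comparison in steps (i)-(iii); it assumes a user-supplied function bic_fn(x, p) returning the minimised BIC for order p (for example the bic_ar sketch from Section 7.4, i.e. restricting attention to pure AR fits for simplicity).

    import numpy as np

    def select_d(x, bic_fn, max_p=8):
        """Choose d in {0, 1} by comparing the minimised BIC of the original and the differenced series."""
        bic0 = min(bic_fn(x, p) for p in range(max_p + 1))
        y = np.diff(x)                    # y_t = x_t - x_{t-1}
        bic1 = min(bic_fn(y, p) for p in range(max_p + 1))
        return 0 if bic0 < bic1 else 1

    # e.g. select_d(x, bic_ar) with the bic_ar sketch from Section 7.4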
8 Forecasting for Linear Processes

8.1 Point Forecasting for an ARMA Process

For simplicity, the data considered in this and the next subsection are assumed to be centralised, with zero sample mean.

Assume that we have fitted an ARMA model to a time series x1, ..., xn. Now we want to forecast a future observation Xn+k with k ≥ 1 based on the fitted model. This will be explained briefly in the following for AR(p) models. The method introduced in this subsection is often called the Box-Jenkins method (Box and Jenkins, 1970). We will see again that, similar to the estimation of a model, point forecasting is easier for AR(p) than for MA or ARMA models.

If we have a prediction (the same as a forecast) X̂n+k, then the MSE of the prediction is

    MSE(X̂n+k) = E[(X̂n+k − Xn+k)²].

Considered are only the so-called linear predictions, which are linear functions of all observations in the past. Then the optimal linear prediction of Xn+k is given by the conditional expectation of Xn+k given X1, ..., Xn, i.e.

    X̂n+k = E[Xn+k | Xn, ..., X1].   (8.92)

For an AR(p) process we have Xn+k = φ1Xn+k−1 + ... + φpXn+k−p + εn+k. The point prediction for Xn+k, given observations x1, ..., xn, is now

    x̂n+k = φ1(x̂n+k−1 | xn, ..., x1) + ... + φp(x̂n+k−p | xn, ..., x1),   (8.93)

where (x̂n+k−i | xn, ..., x1), i = 1, ..., p, are either the observations xn+k−i, if k − i ≤ 0, or the predictions x̂n+k−i obtained before, if k − i > 0.

The prediction procedure is as follows.

(i) The first step optimal linear prediction for k = 1 is

    x̂n+1 = φ1xn + ... + φpxn+1−p.

(ii) The second step optimal linear prediction for k = 2 is

    x̂n+2 = φ1x̂n+1 + φ2xn + ... + φpxn+2−p.

(iii) The k-step optimal linear prediction for k > p is

    x̂n+k = φ1x̂n+k−1 + ... + φpx̂n+k−p.

Remark. If an AR(p) model is given, then the point predictions for such a model depend only on the last p observations. This will be clarified using examples.

Remark. For large k, X̂n+k tends to zero, the true mean. This means that the given observations have a clear effect only on future observations within a short period.

Example: From a time series x1, ..., xn we obtained the following AR(1) model:

    Xt = 0.7Xt−1 + εt,

where the εt are iid WN with mean zero. The last observation is xn = −1.2. Calculate the first 10 point predictions.

Solution: x̂n+1 = φ̂xn = 0.7 · (−1.2) = −0.84, and x̂n+k = φ̂x̂n+k−1 = φ̂^k xn for k ≥ 2. All predictions are listed in the following table.

The first 10 point predictions for the above AR(1) model with xn = −1.2:

    k       1       2       3       4       5       6       7       8       9       10
    x̂n+k   -0.840  -0.588  -0.412  -0.288  -0.202  -0.141  -0.099  -0.069  -0.048  -0.034

Example. From a time series x1, ..., xn we obtained the following AR(2) model:

    Xt = 0.6Xt−1 + 0.3Xt−2 + εt,

where the εt are iid WN with mean zero. The last two observations are xn−1 = 1.1 and xn = 0.5. Calculate the first 10 point predictions.

Solution: x̂n+1 = φ̂1xn + φ̂2xn−1 = 0.6 · 0.5 + 0.3 · 1.1 = 0.630, x̂n+2 = φ̂1x̂n+1 + φ̂2xn = 0.6 · 0.630 + 0.3 · 0.5 = 0.528, and x̂n+k = φ̂1x̂n+k−1 + φ̂2x̂n+k−2 for k ≥ 3. All predictions are listed in the following table.

The first 10 point predictions for the above AR(2) model with xn−1 = 1.1, xn = 0.5:

    k       1      2      3      4      5      6      7      8      9      10
    x̂n+k   0.630  0.528  0.506  0.462  0.429  0.396  0.366  0.338  0.313  0.289

The optimal linear prediction for ARMA(p, q) processes is more complex and will not be discussed in this lecture.
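Remark (computation). A minimal Python sketch, assuming only numpy, of the recursive point prediction (8.93) for a centralised AR(p); the two calls reproduce the tables above.

    import numpy as np

    def ar_forecast(phi, last_obs, steps):
        """k-step point predictions for an AR(p). last_obs = [x_{n-p+1}, ..., x_n] in time order."""
        phi = np.asarray(phi, dtype=float)
        hist = list(last_obs)
        preds = []
        for _ in range(steps):
            x_hat = sum(phi[i] * hist[-1 - i] for i in range(len(phi)))
            preds.append(x_hat)
            hist.append(x_hat)        # later predictions use the earlier ones
        return np.round(preds, 3)

    print(ar_forecast([0.7], [-1.2], 10))             # AR(1) example: -0.840, -0.588, ...
    print(ar_forecast([0.6, 0.3], [1.1, 0.5], 10))    # AR(2) example: 0.630, 0.528, 0.506, ...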
8.2 Interval Forecasting for Normal ARMA(p, q) Processes

Assume now that the process is normally distributed; then it is possible to calculate forecasting intervals for a future observation Xn+k. Recall that a causal stationary ARMA(p, q) process can be represented in MA(∞) form:

    Xt = Σ_{i=0}^{∞} αi εt−i.

Note that the point prediction X̂n+k introduced in the last subsection is the conditional mean of a future observation Xn+k. It can be shown that the conditional variance of Xn+k given x1, ..., xn is

    var(Xn+k | Xn, ..., X1) = σ² Σ_{i=0}^{k−1} αi².

Observing that α0 = 1 and var(Xn+k) = σ² Σ_{i=0}^{∞} αi², we have:

(i) var(Xn+k | Xn, ..., X1) increases as k increases.

(ii) var(Xn+1 | Xn, ..., X1) = σ².

(iii) σ² ≤ var(Xn+k | Xn, ..., X1) ≤ σ² Σ_{i=0}^{∞} αi² = var(Xn+k) for k ≥ 2.

(iv) lim_{k→∞} var(Xn+k | Xn, ..., X1) = σ² Σ_{i=0}^{∞} αi² = var(Xn+k).

The approximate 95% forecasting interval (FI) for Xn+k is

    Xn+k ∈ [x̂n+k − 2σ √(Σ_{i=0}^{k−1} αi²), x̂n+k + 2σ √(Σ_{i=0}^{k−1} αi²)].

Remark. The FI here is for one observation, which is not the same as the confidence interval for the sample mean x̄. Note that lim_{k→∞} X̂n+k = 0 = µX and lim_{k→∞} var(Xn+k | Xn, ..., X1) = var(Xn+k). This means that the larger k is, the less information about a future observation Xn+k is contained in the past observations.

Example. Figure 3.2 shows a realisation of length 150 following the AR(1) model

    Xt = 0.8Xt−1 + εt,

where the εt are iid N(0, 1) random variables. Note that we now have αi = 0.8^i, i = 0, 1, ... The first 100 observations are shown as a solid line, whereas the last 50 (see the first row of the table) are shown as points; these are treated as unknown future values. The point predictions for n = 100 and k = 1, 2, ..., 50, and the 95% forecasting intervals, are shown as long- resp. short-dashed lines, where x100 = 2.966. The first 10 point predictions and the corresponding forecasting intervals are listed in the following table.

Figure 3.2: An example of interval forecasting with an AR(1) model.

The first 10 point predictions and the corresponding 95% forecasting intervals:

    k                  1       2       3       4       5       6       7       8       9       10
    xn+k (future o.)   1.403   3.032   1.687   2.337   2.509   1.888   3.956   4.148   2.917   1.375
    x̂n+k              2.373   1.898   1.519   1.215   0.972   0.778   0.622   0.498   0.398   0.318
    lower bound        0.373  -0.663  -1.345  -1.826  -2.177  -2.439  -2.637  -2.788  -2.905  -2.996
    upper bound        4.373   4.460   4.382   4.256   4.121   3.994   3.881   3.784   3.701   3.633
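Remark (computation). A minimal Python sketch, assuming only numpy, for the AR(1) case with αi = φ^i; it reproduces the table above.

    import numpy as np

    def ar1_forecast_intervals(phi, sigma, x_n, steps):
        """Point predictions and approximate 95% forecasting intervals for an AR(1)."""
        out = []
        for k in range(1, steps + 1):
            x_hat = phi ** k * x_n
            cond_sd = sigma * np.sqrt(np.sum(phi ** (2 * np.arange(k))))
            out.append((round(x_hat, 3), round(x_hat - 2 * cond_sd, 3), round(x_hat + 2 * cond_sd, 3)))
        return out

    for row in ar1_forecast_intervals(0.8, 1.0, 2.966, 10):
        print(row)   # (2.373, 0.373, 4.373), (1.898, -0.663, 4.460), ...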
9 Vector Time Series

Literature: Brockwell, P. J. and Davis, R. A.: Introduction to Time Series and Forecasting, Springer, 2002.

Definition 9.1 A multivariate time series (or vector time series) is a sequence of m-dimensional random vectors

    X(t) = (X1(t), ..., Xm(t))′, t ∈ T,

for a given index set T.

Clearly, each Xi is a univariate time series.

Example 9.2 Let X1 be the inflation rate in the UK measured by the Retail Price Index (RPI), and X2 the inflation rate in the UK measured by the Consumer Price Index (CPI).

Figure: UK RPI and CPI inflation rates, and the RPI and CPI index series, 1990 to 2010.

We can now define the second order properties of X:

• the mean of X(t):

    µ(t) = (µ1(t), ..., µm(t))′ = (E[X1(t)], ..., E[Xm(t)])′ = E[X(t)].

Note that µ(t) is an m-dimensional vector.

• and the autocovariance matrix:

    Γ(t, s) = cov(X(t), X(s)) = E[(X(t) − µ(t))(X(s) − µ(s))′].

Γ(t, s) is a matrix of the form

    Γ(t, s) = ( γ1,1(t, s)  ...  γ1,m(t, s) )
              (    ...              ...     )
              ( γm,1(t, s)  ...  γm,m(t, s) )

with

    γi,j(t, s) = cov(Xi(t), Xj(s))

for i, j = 1, ..., m. Therefore, the diagonal elements of Γ(t, s) are the autocovariance functions of the one-dimensional time series Xi.

9.1 Stationary Vector TS

As in the one-dimensional case, we call X weakly stationary if

i) E(Xi(t)²) < ∞ for all i ∈ 1, ..., m, t ∈ T,

ii) EX(t) = µ ∈ R^m for all t ∈ T, and

iii) Γ(r, s) = Γ(r + t, s + t) for all r, s, t ∈ T.

For a weakly stationary time series we use the notation

    Γ(h) = Γ(t + h, t)

and call it the covariance matrix at lag h. γi,i(h) is the autocovariance function of the stationary TS Xi, and γi,j(h) is called the cross-covariance function of Xi and Xj.

We can also define the correlation matrix at lag h:

    R(h) = ( ρ1,1(h)  ...  ρ1,m(h) )
           (   ...           ...   )   with ρi,j(h) = γi,j(h) / √(γi,i(0)γj,j(0)).
           ( ρm,1(h)  ...  ρm,m(h) )

ρi,j(h) is the correlation between Xi(h) and Xj(0).

Properties of Γ and R:

(i) γi,j(h) ≠ γj,i(h) (in general)

(ii) Γ(h) = Γ′(−h) ≠ Γ(−h)

(iii) ρi,i(0) = 1

(iv) ρi,j(0) ≠ 1 (in general)
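Remark (computation). A minimal Python sketch, assuming only numpy, of the sample versions of Γ(h) and R(h) for an (n × m) data matrix; the divisor n is used, analogous to γ̂(k) in the univariate case, and the function names are illustrative.

    import numpy as np

    def cross_cov_matrix(X, h):
        """Sample Gamma_hat(h) = (1/n) sum_t (X(t+h) - mean)(X(t) - mean)' for lag h >= 0."""
        X = np.asarray(X, dtype=float)
        n = X.shape[0]
        D = X - X.mean(axis=0)
        return D[h:].T @ D[: n - h] / n

    def cross_corr_matrix(X, h):
        g0 = np.diag(cross_cov_matrix(X, 0))
        return cross_cov_matrix(X, h) / np.sqrt(np.outer(g0, g0))

    rng = np.random.default_rng(4)
    Z = rng.standard_normal((500, 2))
    print(cross_corr_matrix(Z, 1))     # close to the zero matrix for white noise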
9.2 Multivariate ARMA Processes

We call a vector TS Z an m-dimensional White Noise process with covariance matrix Σ if Z is a sequence of i.i.d. random vectors, E[Z] = (0, ..., 0)′ ∈ R^m and Γ(0) = Σ.

Note that some authors (for example Brockwell & Davis) call Z a white noise process if Z is stationary with zero mean and covariance

    Γ(h) = Σ for h = 0, and Γ(h) = 0 otherwise.

This would be a sequence of uncorrelated random vectors with zero mean and common covariance matrix Σ.

Similarly to the definition of a one-dimensional ARMA process, we can now define a multivariate ARMA process.

Definition 9.3 A multivariate Autoregressive Moving Average process of order (p, q) (ARMA(p, q)) is defined by

    X(t) − Φ1X(t − 1) − ... − ΦpX(t − p) = Ψ1Z(t − 1) + ... + ΨqZ(t − q) + Z(t),

where Φ1, ..., Φp and Ψ1, ..., Ψq are m × m matrices.

One can now define causality and invertibility as for a one-dimensional time series; see for example Brockwell & Davis.

Of particular importance, and widely used for modelling econometric data, are vector autoregressive processes VAR(p), that is, q = 0 (no moving average term).

9.3 VAR(p)

Every VAR(p) process can be written as a VAR(1) process. Consider the m-dimensional VAR(p) process

    X(t) = Φ1X(t − 1) + ... + ΦpX(t − p) + Z(t).

Define

    ξ(t) = (X(t)′, X(t − 1)′, ..., X(t − p + 1)′)′,   V(t) = (Z(t)′, 0′, ..., 0′)′

and

        ( Φ1  Φ2  Φ3  ...  Φp−1  Φp )
        ( I   0   0   ...  0     0  )
    F = ( 0   I   0   ...  0     0  )
        ( ..  ..  ..       ..    .. )
        ( 0   0   0   ...  I     0  )

where I is the m × m identity matrix. We then obtain

    ξ(t) = Fξ(t − 1) + V(t),

where ξ(t), V(t) ∈ R^{mp} and F is an mp × mp matrix.

The covariance function Γ of a VAR(1) process can be found recursively, again as in the one-dimensional case. For a VAR(1) process

    X(t) = ΦX(t − 1) + Z(t)

we have

    X(t)X(t − h)′ = ΦX(t − 1)X(t − h)′ + Z(t)X(t − h)′,
    E[X(t)X(t − h)′] = ΦE[X(t − 1)X(t − h)′] + E[Z(t)X(t − h)′].

In particular:

    Γ(0) = ΦΓ(−1) + E[Z(t)X(t)′] = ΦΓ(1)′ + E[Z(t)Z(t)′] = ΦΓ(1)′ + Σ,
    Γ(1) = ΦΓ(0),
    Γ(h) = ΦΓ(h − 1).

It follows that

    Φ = Γ(1)Γ(0)⁻¹ if Γ(0)⁻¹ exists.

This relationship can be used for estimating the coefficient matrix Φ. Once Φ has been estimated, we can estimate Σ from

    Γ(0) − ΦΓ(1)′ = Σ.

Example 9.4 Consider the following two-dimensional VAR(1) process:

    X(t) = ΦX(t − 1) + Z(t),

    ( X1(t) )   (  0.5  0.2 ) ( X1(t − 1) )   ( Z1(t) )
    ( X2(t) ) = ( −0.5  0.2 ) ( X2(t − 1) ) + ( Z2(t) ),

i.e.

    X1(t) = 0.5X1(t − 1) + 0.2X2(t − 1) + Z1(t),
    X2(t) = −0.5X1(t − 1) + 0.2X2(t − 1) + Z2(t).
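Remark (computation). A minimal Python sketch, assuming only numpy, that simulates the VAR(1) of Example 9.4 and recovers Φ via Φ = Γ(1)Γ(0)⁻¹ with the sample covariance matrices; this moment-based estimate is only an illustration (standard software typically uses least squares).

    import numpy as np

    rng = np.random.default_rng(5)
    Phi = np.array([[0.5, 0.2], [-0.5, 0.2]])
    n = 2000
    X = np.zeros((n, 2))
    for t in range(1, n):
        X[t] = Phi @ X[t - 1] + rng.standard_normal(2)

    D = X - X.mean(axis=0)
    Gamma0 = D.T @ D / n
    Gamma1 = D[1:].T @ D[:-1] / n                 # sample Gamma(1) = cov(X(t+1), X(t))
    Phi_hat = Gamma1 @ np.linalg.inv(Gamma0)
    Sigma_hat = Gamma0 - Phi_hat @ Gamma1.T       # from Gamma(0) - Phi Gamma(1)' = Sigma
    print(np.round(Phi_hat, 2))                   # close to [[0.5, 0.2], [-0.5, 0.2]]
    print(np.round(Sigma_hat, 2))                 # close to the identity matrix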
10 Introduction to ARCH and GARCH

The purpose of this chapter is to introduce the well known non-linear time series models ARCH (autoregressive conditional heteroscedastic) (Engle, 1982, joint winner of the Nobel Prize in Economics 2003) and GARCH (generalised ARCH, Bollerslev, 1986), which are very important for modelling financial time series.

10.1 Properties of Financial Time Series

Financial time series have some special properties. Some of these are not consistent with linear models such as ARMA or ARIMA. In particular:

• Financial time series are generally integrated, i.e. they are not stationary but difference stationary. Hence, we should either study the difference series or some other transforms of them. Commonly used transformations of a financial time series Xt are the difference series Yt = ∆Xt, the series of log-returns Yt = ∆(log Xt), or the series of relative (simple) returns Yt = ∆Xt/Xt−1.

• The conditional variance of financial time series does often depend on past values of X. This is usually not the case for the conditional mean.

• Financial returns are often uncorrelated but not independent. They often depend on each other in some non-linear way, e.g. the squared returns often depend on each other. Due to this fact, such processes are called non-linear processes.

• The conditional distribution of financial returns may be normal, but their unconditional distribution is usually not.

• Financial returns often have heavy tails, i.e. var(Yt) < ∞ but E(Yt⁴) = ∞.

Some of the above properties are illustrated in Figure 9, which shows the time series of the daily log-returns of the FTSE100 stock market index from Jan 1985 to Feb 2010. There are some clustered observations with large variances, which illustrate the so-called GARCH effect. The acf's of the returns and squared returns (Figure 10) show that the former seem to be uncorrelated but the latter are clearly correlated with each other. This means further that the returns are not independent.

Figure 9: Daily returns of the FTSE100.

Figure 10: acf of the daily returns of the FTSE100, and acf of the squared daily returns.
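Remark (computation). A minimal Python sketch, assuming only numpy, illustrating the "uncorrelated but not independent" property via the acf of returns and of squared returns. The FTSE100 data are not reproduced here, so a conditionally heteroscedastic simulation (of the kind defined later in this chapter) is used in their place.

    import numpy as np

    def acf(x, max_lag):
        d = x - x.mean()
        g = np.array([np.sum(d[: len(x) - k] * d[k:]) for k in range(max_lag + 1)]) / len(x)
        return g / g[0]

    rng = np.random.default_rng(6)
    n = 6000
    h = np.full(n, 1e-4)
    z = np.zeros(n)
    for t in range(1, n):                  # GARCH(1,1)-type returns, see Section 10.3
        h[t] = 1e-5 + 0.1 * z[t - 1] ** 2 + 0.85 * h[t - 1]
        z[t] = np.sqrt(h[t]) * rng.standard_normal()

    band = 2 / np.sqrt(n)
    print(np.mean(np.abs(acf(z, 30)[1:]) > band))        # returns: roughly at the 5% level
    print(np.mean(np.abs(acf(z ** 2, 30)[1:]) > band))   # squared returns: clearly above it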
10.2 The ARCH Models

In 1982 Robert Engle introduced the very important ARCH model for modelling the above special properties of financial time series.

Conditional expectation of an AR(p):

    E[Xt | Xt−1, ..., Xt−p] = φ1Xt−1 + ... + φpXt−p.

Conditional variance of an AR(p):

    E[(Xt − E[Xt | Xt−1, ..., Xt−p])² | Xt−1, ..., Xt−p] = E[(Xt − φ1Xt−1 − ... − φpXt−p)² | Xt−1, ..., Xt−p] = E[εt²] = σ².

This means that the conditional variance does not depend on the past observations, but the conditional mean does.

ARCH models have been introduced as an alternative. These models are about the error term εt. The particular form of the linear term is not important and is therefore ignored in the following definition.

An ARCH model of order p (ARCH(p)) is defined by

    Zt = √(ht) εt with ht = α0 + α1 Zt−1² + ... + αp Zt−p²,

where {εt} is a white noise process with E[εt²] = 1, and α0 > 0, αi ≥ 0 for i = 1, ..., p. We often make the assumption that εt ∼ N(0, 1).

Here ht > 0 is the conditional variance of Zt, which depends on some past observations. This is what conditional heteroscedasticity means: the larger the squared past observations are, the larger the conditional variance.

Zt can now replace the white noise term in linear models or can be used directly to model a time series.

We obtain

    E[Zt | Zt−1, ...] = E[√(ht) εt | Zt−1, ...] = E[√(ht) | Zt−1, ...] E[εt] = 0,

since εt is independent of ht. This is a common property of many financial return series (the so-called martingale property). Note that E[Zt] = 0 too.

We obtain for the covariance function of Z:

    cov(Zt, Zt+k) = E[Zt Zt+k] = E[√(ht) εt √(ht+k) εt+k] = E[√(ht) εt √(ht+k)] E[εt+k] = 0,

since εt+k, for k > 0, is independent of all other three terms and has mean zero. This is another important property of financial returns, that is, they are often uncorrelated with each other.

It is clear that Zt and Zt+k are not independent, because the variance of Zt+k depends on the past of Z, and therefore on Zt. An ARCH model is an example of uncorrelated random variables which are not independent.

The process is called a non-linear process, because there is no linear correlation between the observations, but there are dependencies between them in a non-linear way. In particular, Zt² and Zt+k² are correlated.

Furthermore, note that the conditional distribution of an ARCH model is normal if εt ∼ N(0, 1). Hence, the conditional moments of any order exist. However, this does not mean that the unconditional variance of an ARCH model exists.

If var(Zt) < ∞, we have var(Zt) = E(Zt²). Assume further that the process started from the infinite past. It can be shown that an ARCH process with finite variance is weakly (and strongly) stationary. We then obtain

    var(Zt) = E[Zt²] = E[ht εt²] = E[α0 + α1 Zt−1² + ... + αp Zt−p²] = α0 + α1 var(Zt) + ... + αp var(Zt),

since Z is stationary. We solve this equation for var(Zt) and obtain

    var(Zt) = α0 / (1 − α1 − ... − αp).

We obtain:

    0 < var(Zt) < ∞ ⇐⇒ Σ_{i=1}^{p} αi < 1.   (10.94)

If the variance exists, its value depends linearly on α0, which is hence called the scale parameter.

Finally, an ARCH process may have a finite variance but infinite fourth order moments, again an important property of financial returns.

10.3 The GARCH Models

ARCH models were generalised by Bollerslev (1986) to the GARCH(p, q) model, defined by

    Zt = εt √(ht)

with εt ∼ N(0, 1) i.i.d. and

    ht = α0 + α1 Zt−1² + ... + αp Zt−p² + β1 ht−1 + ... + βq ht−q,   (10.95)

where α0 > 0, αi ≥ 0 for i = 1, ..., p, and βj ≥ 0 for j = 1, ..., q.

A GARCH model has similar properties to those of an ARCH model. The key point is that a GARCH model is more practically relevant. For GARCH models, the conditional variance of Zt depends on Zt−1² and on the conditional variance of Zt−1, while it only depends on Zt−1², ..., Zt−p² if Z follows an ARCH model.

Figure 11 shows the acf of the squared residuals of a GARCH(1,1) model fitted to the daily returns of the FTSE100.
Figure 11: acf of the squared residuals of a GARCH(1,1) model fitted to the FTSE100 returns.
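Remark (computation). A minimal Python sketch, assuming only numpy, that simulates an ARCH/GARCH process as defined by (10.95) and checks the unconditional variance formula below (10.94) for an ARCH(1); the function name is illustrative.

    import numpy as np

    def simulate_garch(n, alpha, beta, seed=0):
        """Z_t = sqrt(h_t) eps_t with h_t = alpha_0 + sum alpha_i Z_{t-i}^2 + sum beta_j h_{t-j},
        eps_t iid N(0, 1). With beta = [] this is an ARCH(p) model."""
        rng = np.random.default_rng(seed)
        alpha = np.asarray(alpha, dtype=float)   # alpha[0] = alpha_0 > 0, alpha[1:] the ARCH coefficients
        beta = np.asarray(beta, dtype=float)
        p, q = len(alpha) - 1, len(beta)
        z = np.zeros(n)
        h = np.full(n, alpha[0])
        for t in range(max(p, q), n):
            h[t] = (alpha[0]
                    + sum(alpha[i] * z[t - i] ** 2 for i in range(1, p + 1))
                    + sum(beta[j - 1] * h[t - j] for j in range(1, q + 1)))
            z[t] = np.sqrt(h[t]) * rng.standard_normal()
        return z

    # ARCH(1) with alpha_0 = 1, alpha_1 = 0.5: condition (10.94) holds, so var(Z_t) = 1/(1 - 0.5) = 2.
    z = simulate_garch(200000, alpha=[1.0, 0.5], beta=[])
    print(z.var())    # close to 2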