[Figure 1: Retail Price Index (UK). Figure 2: Annual force of inflation (annual log-difference of the RPI).]

Contents

• Introduction, ACF, Stationarity and Operators
• Moving Average Processes: MA(q) and MA(∞)
• Autoregressive Processes
• The ARMA processes
• Partial ACF
• ARIMA Processes
• Estimation for Linear Processes
• Estimation of ACF and ARMA parameters
• Model selection
• Multivariate Time Series
• Introduction to ARCH and GARCH

1 Introduction, ACF, Stationarity and Operators

Definition 1.1 A Time Series (TS) is a stochastic process defined on a probability space (Ω, F, P). That is, for a given index set T, a time series Y = {Y_t}_{t∈T} is a collection of random variables defined on a probability space (Ω, F, P).

Remarks:

• The term "time series" is used both for the sequence of random variables {Y_t} and for the sequence of observations {y_t}.
• In most cases T = {0, 1, ..., ∞}, T = {0, 1, ..., n} or T = {..., −1, 0, 1, ...}, so that Y = {Y_t}_{t∈T} is a stochastic process in discrete time, i.e. a set of random variables Y_t in time order.
• The random variables Y_t (observations y_t) are usually equally spaced in time, e.g. daily, monthly, quarterly or annually.
• The series y_1, y_2, ..., y_n = {y_t}_{t=1}^n with y_t = Y_t(ω) is called an observation (or observed series) of the time series Y_1, ..., Y_n = {Y_t}_{t=1}^n.
• The random variables Y_t are NOT just an i.i.d. sample – the Y_t's will be related to one another.
• It is this relationship, "serial dependence" or "structure", that is of interest.
• "Time" could be another dimension, e.g. length (along a river, a pipeline, etc.).
• We consider only univariate series, Y_t ∈ R, but there are also multivariate series.
• Areas of application: finance, economics, medicine, ...

1.1 Objectives

1. Description and Analysis
Plot the series (always do this first); calculate summary measures, e.g. estimate trend, seasonal variation, autocorrelations; model, i.e. fit a stochastic model.

[Figure 3: UK annual GDP at current prices (in Million GBP). Figure 4: Log-difference of UK annual GDP at current prices.]

2. Forecasting
Predict future values (with confidence limits).

3. Explanation
Relate one series to another (like regression).

4. Control
Monitoring and adjusting.

1.2 ACF and Stationarity

Definition 1.2 The covariance between two elements X_t and X_s, t, s ∈ T, of a time series is called the autocovariance function of X, denoted by
\[
\gamma(t,s) = \mathrm{cov}(X_t, X_s) = E[(X_t - E(X_t))(X_s - E(X_s))].
\]
In particular, γ(t, t) = var(X_t).

Definition 1.3 A time series is said to be weakly stationary if

i) E(X_t²) < ∞ for all t ∈ T,
ii) E(X_t) = µ for all t ∈ T, and
iii) γ(r, s) = γ(r + t, s + t) for all r, s, t ∈ T.

Weak stationarity is thus defined through the first two moments (mean and autocovariances) of the process.

[Figure 5: Two realizations of a second order moving average process (MA(2) with ψ_1 = 0.5, ψ_2 = 0.4).]

Properties of a weakly stationary TS:

• mean and variance are constant;
• the autocovariances γ(r, s) = cov(X_r, X_s) depend only on the lag k = s − r. The autocovariance function is hence one-dimensional,
\[
\gamma(k) = \mathrm{cov}(X_t, X_{t+k}), \qquad k = \ldots, -1, 0, 1, \ldots \ \text{and all } t \in T,
\]
which is called the autocovariance function of a (weakly) stationary process X_t.

Definition 1.4 The standardised autocovariance function of a (weakly) stationary process X_t,
\[
\rho(k) = \frac{\gamma(k)}{\gamma(0)} = \frac{\mathrm{cov}(X_t, X_{t+k})}{\mathrm{var}(X_t)} = \frac{\mathrm{cov}(X_t, X_{t+k})}{\sqrt{\mathrm{var}(X_t)\,\mathrm{var}(X_{t+k})}},
\]
is called the autocorrelation function (acf).

Example: White Noise. This is the simplest stationary series: all X's are i.i.d. with mean 0 and variance σ²_ε, so that γ(t, s) = 0 for t ≠ s and γ(t, t) = σ²_ε. The White Noise process is usually denoted by Z_t (or ε_t). It has no "structure", but it is used within models for series with structure (see later chapters).

Properties of the autocovariance function of a (weakly) stationary process (similar properties hold for the acf):

• γ(0) = var(X_t) = σ²_X ≥ 0 and ρ(0) ≡ 1,
• |γ(k)| ≤ γ(0) for all k ∈ T,
• γ(−k) = γ(k) for all k ∈ T.

Definition 1.5 A time series is said to be strongly (or strictly) stationary if
\[
P(X_{t_1} < x_1, \ldots, X_{t_k} < x_k) = P(X_{t_1+t} < x_1, \ldots, X_{t_k+t} < x_k)
\]
for all k = 1, 2, ...; t, t_1, ..., t_k ∈ T and x_1, ..., x_k ∈ R.

Strong stationarity is a property of the whole distribution of a process, whereas weak stationarity only restricts its first two moments. It is clear that strong stationarity does not follow from weak stationarity. Note, however, that in general weak stationarity does not follow from strong stationarity either, because the second moments (even the mean) of a strongly stationary process may not exist. If the variance of a strongly stationary process exists, then the process is also weakly stationary. In particular, if the process is Gaussian, i.e. all finite-dimensional distributions of the process are normal, then the two stationarity properties are equivalent to each other.

[Figure 6: Two realizations of a second order autoregressive process (AR(2) with φ_1 = 0.5, φ_2 = 0.4).]

Examples:

(i) X_t = ε_t, t = 0, 1, ..., with i.i.d. Cauchy ε_t is strongly, but not weakly, stationary.

(ii) Let ε_t be i.i.d. with E(ε_t) = 0 and E(ε_t²) < ∞. Then the MA(1) process X_t = ψε_{t−1} + ε_t, |ψ| < 1, and the AR(1) process X_t = φX_{t−1} + ε_t, |φ| < 1, are both strongly and weakly stationary.

(iii) Let ε_t be i.i.d. N(0, 1) random variables. Then the process
\[
X_t = \begin{cases} \varepsilon_t & \text{for } t = 1, 3, \ldots,\\ (\varepsilon_t^2 - 1)/\sqrt{2} & \text{for } t = 2, 4, \ldots,\end{cases}
\]
is weakly, but not strongly, stationary.

(iv) The Random Walk X_t = X_{t−1} + ε_t with i.i.d. ε_t is neither weakly nor strongly stationary.

1.3 Operators

A time series model is often defined in terms of another, simpler process, e.g. an i.i.d. series. On the other hand, we may want to transform one time series into another. This is done by the use of operators, which simplify much of the work in TS. An operator is defined by its effect. Operators can be applied to the random variables Y_t or to the observations y_t.
Basic operators are:

• Backward shift: BY_t := Y_{t−1}
• Forward shift: FY_t := Y_{t+1}, F^r Y_t = Y_{t+r}
• Identity: 1Y_t = Y_t
• Difference: ∆Y_t = Y_t − Y_{t−1}
• Seasonal difference: D_r Y_t = Y_t − Y_{t−r} = (1 − B^r)Y_t
• Summation: SY_t = Y_t + Y_{t−1} + Y_{t−2} + ...

Operators can be handled algebraically and in particular have inverses. For example,
\[
\Delta y_t := y_t - y_{t-1} = y_t - By_t = (1 - B)y_t \quad\Longrightarrow\quad \Delta \equiv 1 - B.
\]
Also,
\[
S(1 - B)y_t = S(y_t - y_{t-1}) = Sy_t - Sy_{t-1}
= (y_t + y_{t-1} + y_{t-2} + \ldots) - (y_{t-1} + y_{t-2} + y_{t-3} + \ldots) = y_t,
\]
so that S(1 − B) ≡ 1, i.e. S ≡ (1 − B)^{−1}, and S∆ ≡ 1, i.e. S ≡ ∆^{−1}. Alternatively,
\[
Sy_t = y_t + y_{t-1} + y_{t-2} + \ldots = (1 + B + B^2 + \ldots)y_t = (1 - B)^{-1}y_t.
\]

The Moving Average Operator

Notation: T = [w_{−k}, ..., w_{−1}, w_0, w_1, ..., w_k] is called a moving average operator. We write m_t = Ty_t, where T is defined by
\[
Ty_t = \sum_{i=-k}^{k} w_i y_{t+i} = w_{-k}y_{t-k} + \cdots + w_0 y_t + \cdots + w_k y_{t+k}.
\]
Thus the window y_1, ..., y_{2k+1} gives m_{k+1}, then y_2, ..., y_{2k+2} gives m_{k+2}, and so on.

Some examples of time series models based on a white noise ε_t are:

(i) MA(1) (first order moving average process): X_t = 0.8ε_{t−1} + ε_t = (1 + 0.8B)ε_t, where the ε_t are e.g. i.i.d. N(0, 1) random variables.

(ii) MA(2) (second order moving average process): X_t = 0.5ε_{t−1} + 0.3ε_{t−2} + ε_t = (1 + 0.5B + 0.3B²)ε_t, with ε_t as in (i). Using the MA-operator: X_t = Tε_t with T = [0.3, 0.5, 1, 0, 0].

(iii) AR(1) (first order autoregressive process): X_t = 0.8X_{t−1} + ε_t ⟺ (1 − 0.8B)X_t = ε_t, with ε_t as in (i).

(iv) Random Walk: X_t = X_{t−1} + ε_t ⟺ ∆X_t = ε_t, or equivalently X_t = Sε_t, for t = 0, 1, ..., with ε_t as in (i).

2 Univariate Linear Processes

A very wide class of time series is that of linear processes, including the well known AR (autoregressive), MA (moving average) and ARMA (autoregressive moving average) processes with i.i.d. innovations. We start from the simplest model, the white noise (WN), which is defined as follows.

Definition 2.1 An i.i.d. series ε_t with E(ε_t) = 0 and var(ε_t) = σ²_ε is called a White Noise.

The WN is strongly and weakly stationary with
\[
\gamma(k) = \begin{cases} \sigma_\varepsilon^2, & k = 0,\\ 0, & k \neq 0,\end{cases}
\qquad\text{and}\qquad
\rho(k) = \begin{cases} 1, & k = 0,\\ 0, & k \neq 0.\end{cases}
\tag{2.1}
\]
Infinite variance i.i.d. series, e.g. i.i.d. Cauchy series, are not included in our definition.

A time series is said to be linear if it can be represented as a linear (possibly infinite) sum (a linear filter) of ε_t, where {ε_t} is a WN process; see the MA(∞) process defined later.

2.1 Moving Average Processes

Definition 2.2 Let ε_t be a white noise. The process defined by
\[
X_t = \psi_1\varepsilon_{t-1} + \ldots + \psi_q\varepsilon_{t-q} + \varepsilon_t = \sum_{i=0}^{q}\psi_i\varepsilon_{t-i},
\tag{2.2}
\]
where ψ_q ≠ 0 and ψ_0 = 1, is called a moving average process of order q (MA(q)).

Using the backshift operator B we have
\[
X_t = \psi_0 B^0\varepsilon_t + \psi_1 B\varepsilon_t + \ldots + \psi_q B^q\varepsilon_t
= \big[\psi_0 B^0 + \psi_1 B + \ldots + \psi_q B^q\big]\varepsilon_t = \psi(B)\varepsilon_t,
\tag{2.3}
\]
where ψ(B) = Σ_{i=0}^{q} ψ_i B^i.

• E(X_t) = 0 for an MA(q) process.
• A finite order MA process is always stationary.

Examples:

(i) An MA(1) process: X_t = 0.8ε_{t−1} + ε_t.
(ii) Another MA(1) process: X_t = −0.9ε_{t−1} + ε_t.
(iii) An MA(2) process: X_t = 0.7ε_{t−1} + 0.4ε_{t−2} + ε_t.
(iv) Another MA(2) process: X_t = 0.6ε_{t−1} − 0.3ε_{t−2} + ε_t.

[Figure 7: Two simulated realisations following the MA(2) model (iii) above (F71TS MA ACF.r).]
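The realisations in Figure 7 come from the course script F71TS MA ACF.r, which is not reproduced here. The following R sketch shows one way such realisations could be generated; the seed and series length are arbitrary choices, not taken from the original notes.

## Minimal sketch: simulate two realisations of the MA(2) model (iii),
## X_t = 0.7*e_{t-1} + 0.4*e_{t-2} + e_t, with iid N(0,1) innovations.
set.seed(1)                                    # arbitrary seed, for reproducibility
n  <- 500
x1 <- arima.sim(model = list(ma = c(0.7, 0.4)), n = n)
x2 <- arima.sim(model = list(ma = c(0.7, 0.4)), n = n)

op <- par(mfrow = c(2, 1))
plot.ts(x1, ylab = "Xt", main = "A simulated MA(2) series, psi1 = 0.7, psi2 = 0.4")
plot.ts(x2, ylab = "Xt", main = "A second simulated series from the same model")
par(op)

Note that R's arima.sim() uses the sign convention X_t = e_t + psi_1 e_{t-1} + psi_2 e_{t-2}, which matches the notation used here.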
2.2 Properties of MA processes

The MA process defined above does not involve ε_i with i > t, i.e. it does not use information about the future. A process which does not involve future information is called causal. For some theoretical purposes an MA model can be defined with coefficients α_i ≠ 0 for i < 0 (see e.g. the MA(∞) process given later); in that case the process is no longer causal.

We can obtain the autocovariances γ(k) and the autocorrelations ρ(k) of an MA process easily using the facts var(ε_t) = σ²_ε and cov(ε_t, ε_{t±k}) ≡ 0 for all t and k ≠ 0. Let γ_ε(k) = cov(ε_t, ε_{t+k}); then γ_ε(0) = σ²_ε and γ_ε(k) = 0 for k ≠ 0.

For example, for the MA(1) process X_t = 0.8ε_{t−1} + ε_t we have
\[
\begin{aligned}
\rho(0) &= 1,\\
\mathrm{var}(X_t) &= \mathrm{var}(0.8\varepsilon_{t-1} + \varepsilon_t) = 0.8^2\,\mathrm{var}(\varepsilon_t) + \mathrm{var}(\varepsilon_t) = 1.64\,\sigma_\varepsilon^2,\\
\gamma(\pm 1) &= \mathrm{cov}(0.8\varepsilon_{t-1} + \varepsilon_t,\ 0.8\varepsilon_t + \varepsilon_{t+1}) = 0.8\,\sigma_\varepsilon^2,\qquad \gamma(\pm k) = 0 \ \text{for } |k| > 1,
\end{aligned}
\]
and hence ρ(±1) = 0.8σ²_ε/1.64σ²_ε = 0.488 and ρ(±k) = 0 for |k| > 1.

For an MA(q) we obtain:

a) The autocovariances:
\[
\gamma(k) = \sum_{i=0}^{q}\sum_{j=0}^{q}\psi_i\psi_j\,\gamma_\varepsilon(k+i-j)
= \begin{cases}
\sigma_\varepsilon^2\sum_{i=0}^{q-k}\psi_i\psi_{i+k}, & 0 \le k \le q,\\
0, & k > q,\\
\gamma(-k), & k < 0.
\end{cases}
\tag{2.4}
\]

b) In particular, var(X_t) = σ²_X = σ²_ε Σ_{i=0}^{q} ψ_i².

c) The autocorrelations:
\[
\rho(k) = \begin{cases}
\displaystyle\sum_{i=0}^{q-k}\psi_i\psi_{i+k} \Big/ \sum_{i=0}^{q}\psi_i^2, & 0 \le k \le q,\\
0, & k > q,\\
\rho(-k), & k < 0.
\end{cases}
\tag{2.5}
\]

We see that γ(k) and ρ(k) of an MA(q) process are zero for |k| > q.

[Figure 8: Empirical acf of the second simulated series and theoretical acf of the MA(2) process; the estimated acf is based on the second realisation shown in Figure 7 (F71TS MA ACF.r).]

Examples:

(i) For the MA(1) process X_t = −0.9ε_{t−1} + ε_t we have
a) γ(0) = var(X_t) = (ψ_0² + ψ_1²)σ²_ε = 1.81σ²_ε,
b) γ(±1) = ψ_1σ²_ε = −0.9σ²_ε, ρ(±1) = −0.497, and
c) γ(±k) = ρ(±k) = 0 for |k| > 1.

(ii) For the MA(2) process X_t = 0.7ε_{t−1} + 0.4ε_{t−2} + ε_t we have
a) γ(0) = var(X_t) = (1 + ψ_1² + ψ_2²)σ²_ε = 1.65σ²_ε,
b) γ(±1) = (ψ_0ψ_1 + ψ_1ψ_2)σ²_ε = 0.98σ²_ε, ρ(±1) = 0.594,
c) γ(±2) = ψ_0ψ_2σ²_ε = 0.4σ²_ε, ρ(±2) = 0.242, and
d) γ(±k) = ρ(±k) = 0 for |k| > 2.

2.3 The MA(∞) processes

In time series analysis we often need to consider infinite order MA processes, denoted by MA(∞), which are a generalisation of the MA(q) processes described above. This will be motivated by the simple AR(1) (first order autoregressive) process.

Example: The AR(1) process X_t = φX_{t−1} + ε_t with |φ| < 1 has the MA(∞) representation
\[
X_t = \sum_{i=0}^{\infty}\phi^i\varepsilon_{t-i}.
\tag{2.6}
\]
Proof: X_t = φX_{t−1} + ε_t = φ(φX_{t−2} + ε_{t−1}) + ε_t = ... = Σ_{i=0}^{∞} φ^i ε_{t−i}.

A general MA(∞) process is defined through a linear filter (MA-filter) of ε_t:
\[
X_t = \sum_{i=-\infty}^{\infty}\alpha_i\varepsilon_{t-i}.
\tag{2.7}
\]
The MA(∞) process is causal if α_i = 0 for all i < 0. We see that the MA(∞) representation of an AR(1) process is causal. Hereafter we will mainly consider causal MA(∞) processes
\[
X_t = \sum_{i=0}^{\infty}\alpha_i\varepsilon_{t-i}.
\tag{2.8}
\]
Without loss of generality, we will often put α_0 = 1.

The MA(∞) process is stationary if the α_i, i = 0, 1, ..., are at least square summable, i.e.
\[
\sum_{i=0}^{\infty}\alpha_i^2 < \infty.
\tag{2.9}
\]
Provided that (2.9) holds, by extending the results on the acf of an MA(q) process we have

(i) γ(0) = var(X_t) = Σ_{i=0}^{∞} α_i² σ²_ε < ∞,
(ii) γ(k) = cov(X_t, X_{t+k}) = Σ_{i=0}^{∞} α_i α_{i+|k|} σ²_ε, k = 0, ±1, ...,
(iii) ρ(k) = Σ_{i=0}^{∞} α_i α_{i+|k|} / Σ_{i=0}^{∞} α_i², k = 0, ±1, ... .

Later we will see (Theorem 2.6 below) that many practically relevant autoregressive processes have the causal MA(∞) representation (2.8). The above results can hence be used to calculate (or approximate) the acf's of autoregressive processes.
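Returning to the MA(2) example (ii) above, the following R sketch evaluates formula (2.5) and compares it with the built-in routine ARMAacf(); the helper function rho_ma is an ad hoc name introduced here, not part of the original notes.

## Sketch: theoretical acf of X_t = 0.7*e_{t-1} + 0.4*e_{t-2} + e_t,
## once via the closed formula (2.5) and once via ARMAacf().
psi <- c(1, 0.7, 0.4)                          # psi_0 = 1, psi_1, psi_2

rho_ma <- function(k, psi) {                   # formula (2.5), for k >= 0
  q <- length(psi) - 1
  if (k > q) return(0)
  sum(psi[1:(q - k + 1)] * psi[(k + 1):(q + 1)]) / sum(psi^2)
}

sapply(0:3, rho_ma, psi = psi)                 # 1.000 0.594 0.242 0.000
ARMAacf(ma = psi[-1], lag.max = 3)             # the same values from the built-in routine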
A stronger condition on the coefficients α_i of an MA(∞) process is absolute summability:
\[
\sum_{i=0}^{\infty}|\alpha_i| < \infty.
\tag{2.10}
\]
It is easy to show that
\[
\sum_{i=0}^{\infty}|\alpha_i| < \infty \;\Longrightarrow\; \sum_{i=0}^{\infty}\alpha_i^2 < \infty,
\]
but not vice versa. Roughly speaking, a sequence that decays faster than i^{−1/2} is square summable, and one that decays faster than i^{−1} is absolutely summable. Some examples are given below (with α_0 = 1):

(i) α_i = i^{−1/2} for i = 1, 2, ...  ⟹  Σ α_i² = ∞ and Σ |α_i| = ∞.
(ii) α_i = i^{−1} for i = 1, 2, ...  ⟹  Σ α_i² < ∞ but Σ |α_i| = ∞.
(iii) α_i = i^{−3/2} for i = 1, 2, ...  ⟹  Σ α_i² < ∞ and Σ |α_i| < ∞.

Summability conditions play an important role in the theory of time series analysis, as can be seen from the following simple lemma.

Lemma 2.3 For an MA(∞) process defined by (2.8) we have
\[
\sum_{k=-\infty}^{\infty}|\gamma(k)| < \infty \quad\text{if}\quad \sum_{i=0}^{\infty}|\alpha_i| < \infty,
\tag{2.11}
\]
and
\[
\sum_{k=-\infty}^{\infty}\gamma^2(k) < \infty \quad\text{if}\quad \sum_{i=0}^{\infty}\alpha_i^2 < \infty.
\tag{2.12}
\]
This also holds for the general MA(∞) given in (2.7). The proof of Lemma 2.3 is left to the tutorial.

The coefficients in (2.6) are obviously absolutely summable, i.e. an AR(1) process with |φ| < 1 is stationary with absolutely summable γ(k). A stationary process with absolutely summable γ(k) is said to have short memory. This means that a stationary AR(1) process has short memory.

2.4 Autoregressive Processes

The autoregressive (AR) processes are another class of time series models that are important in theory and practice. These processes are closely related to regression analysis, but now the regressors are (shifted) random variables of the same series in the past.

Definition 2.4 A p-th order autoregressive process AR(p) is defined by
\[
X_t = \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \varepsilon_t,
\tag{2.13}
\]
where φ_p ≠ 0.

Equivalently, we have
\[
\varepsilon_t = X_t - \sum_{i=1}^{p}\phi_i X_{t-i}.
\tag{2.14}
\]
Setting φ_0 = 1, we have
\[
\varepsilon_t = \phi_0 B^0 X_t - \phi_1 B X_t - \ldots - \phi_p B^p X_t = \phi(B)X_t,
\tag{2.15}
\]
where φ(B) = 1 − Σ_{i=1}^{p} φ_i B^i.

Examples: Given a WN process ε_t, we can define

(i) An AR(1) process: X_t = 0.8X_{t−1} + ε_t.
(ii) An AR(2) process: X_t = 1.1X_{t−1} − 0.3X_{t−2} + ε_t.

Replacing B in φ(B) with a variable z we obtain a p-th order polynomial φ(z).

Definition 2.5 The polynomial φ(z) is called the characteristic polynomial of an AR(p) model. Similarly, ψ(z) is the characteristic polynomial of an MA(q) model.

Setting φ(z) = 0 we obtain the characteristic equation of an AR(p) model:
\[
\phi(z) = 1 - \phi_1 z - \ldots - \phi_p z^p = 0.
\tag{2.16}
\]
This equation has exactly p roots z_1, ..., z_p (possibly multiple or complex). Note that φ(z) and {z_1, ..., z_p} determine each other. Hence the correlation structure and some other important properties of X_t are determined by z_1, ..., z_p. To show this we first introduce the important concept of the unit circle.

The unit circle in the complex plane is the set of all complex numbers with norm one,
\[
z = a + ib \quad\text{such that}\quad |z| = 1,
\]
where i = √−1 denotes the imaginary unit and |z| = √(a² + b²) is the Euclidean norm of z.

We have seen that an MA process with summable coefficients is stationary. This is not true for an AR model: for instance, the random walk X_t = X_{t−1} + ε_t is non-stationary.

Theorem 2.6 An AR(p) process X_t is causal and stationary iff (if and only if) all of the roots of φ(z) lie outside the unit circle, i.e. iff |z_k| > 1 for all 1 ≤ k ≤ p.

Proof. See e.g. Theorem 3.1.1 of Brockwell and Davis (1991).

These results are based on the assumption that the process starts at t = −∞. For an AR process starting at t = 0 or t = 1 they only hold asymptotically, i.e. X_t converges (almost surely) to another process X̃_t as t → ∞, where X̃_t follows the same AR model but starts at t = −∞.

If we can find a factorised version of φ(z), it is easy to check whether the process is causal stationary. For the AR(3) process X_t = 1.8X_{t−1} − 1.07X_{t−2} + 0.21X_{t−3} + ε_t we have
\[
\phi(z) = (1 - 0.5z)(1 - 0.6z)(1 - 0.7z)
\]
with roots z_1 = 2, z_2 = 10/6 and z_3 = 10/7. These are all outside the unit circle and X_t is therefore causal stationary.

If the conditions of Theorem 2.6 are fulfilled, we have
\[
X_t = \phi^{-1}(B)\varepsilon_t = \sum_{i=0}^{\infty}\alpha_i\varepsilon_{t-i},
\tag{2.17}
\]
which is a causal stationary MA(∞) process with Σ_{i=0}^{∞} |α_i| < ∞. Generally α_i ≠ 0 for all i, because the reciprocal φ^{−1}(z) of a finite order polynomial φ(z) is an infinite order power series.
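The root condition of Theorem 2.6 can also be checked numerically. The following R sketch does this for the AR(3) example above; polyroot() expects the coefficients of φ(z) in increasing powers of z.

## Sketch: check causal stationarity of X_t = 1.8 X_{t-1} - 1.07 X_{t-2} + 0.21 X_{t-3} + e_t
## via the roots of phi(z) = 1 - 1.8 z + 1.07 z^2 - 0.21 z^3.
phi   <- c(1.8, -1.07, 0.21)                   # AR coefficients phi_1, phi_2, phi_3
roots <- polyroot(c(1, -phi))                  # coefficients of phi(z) in increasing powers
sort(Mod(roots))                               # 1.4286 1.6667 2.0000
all(Mod(roots) > 1)                            # TRUE: all roots outside the unit circle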
2.5 The AR(1) processes

The simplest AR process is the AR(1) process
\[
X_t = \phi X_{t-1} + \varepsilon_t.
\tag{2.18}
\]
For this process the condition of Theorem 2.6 reduces to |φ| < 1, and we have seen that α_i = φ^i and
\[
X_t = \sum_{i=0}^{\infty}\phi^i\varepsilon_{t-i}.
\tag{2.19}
\]
We obtain Σ_{i=0}^{∞} α_i = Σ_{i=0}^{∞} φ^i = 1/(1 − φ) < ∞.

The acf of an AR(1) model can be calculated using its MA(∞) representation:
\[
\gamma(k) = E(X_t X_{t+k})
= E\Big[\Big(\sum_{i=0}^{\infty}\phi^i\varepsilon_{t-i}\Big)\Big(\sum_{j=0}^{\infty}\phi^j\varepsilon_{t+k-j}\Big)\Big]
= \sigma_\varepsilon^2\,\phi^{|k|}\sum_{i=0}^{\infty}\phi^{2i}
= \frac{\sigma_\varepsilon^2\,\phi^{|k|}}{1-\phi^2},
\]
and
\[
\rho(k) = \gamma(k)/\gamma(0) = \phi^{|k|}, \qquad k = 0, \pm 1, \pm 2, \ldots .
\]
Note that γ(k) ≠ 0 for any k. For φ > 0 the ρ(k) are always positive and decrease monotonically; for φ < 0 the |ρ(k)| decrease monotonically but with alternating signs.

These results can also be obtained recursively. Note that E(X_t) = 0, γ(k) = E[X_t X_{t+k}] and
\[
E[X_{t-k}\varepsilon_t] = \begin{cases}\sigma_\varepsilon^2 & \text{for } k = 0,\\ 0 & \text{for } k > 0.\end{cases}
\]
Multiply both sides of (2.18) by X_{t−k} and take expectations.

For k = 0:
\[
E(X_t X_t) = E(X_t\,\phi X_{t-1}) + E(X_t\varepsilon_t) \quad\Longrightarrow\quad \gamma(0) = \phi\gamma(1) + \sigma_\varepsilon^2.
\tag{2.20}
\]
For k = 1:
\[
E(X_{t-1}X_t) = E(X_{t-1}\,\phi X_{t-1}) + E(X_{t-1}\varepsilon_t) \quad\Longrightarrow\quad \gamma(1) = \phi\gamma(0).
\tag{2.21}
\]
For k ≥ 2:
\[
E(X_{t-k}X_t) = E(X_{t-k}\,\phi X_{t-1}) + E(X_{t-k}\varepsilon_t) \quad\Longrightarrow\quad \gamma(k) = \phi\gamma(k-1).
\tag{2.22}
\]
Solving this system of equations we obtain the same results as above, i.e.
\[
\gamma(0) = \frac{\sigma_\varepsilon^2}{1-\phi^2},\quad \gamma(1) = \frac{\phi\,\sigma_\varepsilon^2}{1-\phi^2},\quad \gamma(2) = \frac{\phi^2\sigma_\varepsilon^2}{1-\phi^2},\ \ldots,
\]
\[
\rho(0) \equiv 1,\quad \rho(1) = \phi,\quad \rho(2) = \phi^2,\ \ldots,\quad
\rho(k) = \frac{\gamma(k)}{\gamma(0)} = \frac{\phi\gamma(k-1)}{\gamma(0)} = \phi\rho(k-1) = \phi^k.
\]

Example: ρ(k) for AR(1) processes with φ = 0.8 and φ = −0.8, respectively. The figure below displays two realisations, one from each of these two AR(1) models with i.i.d. N(0, 1) ε_t, together with their acf's. The realisations for φ > 0 and φ < 0 look quite different: the former is dominated by positive and the latter by negative correlations (F71TS AR1 ACF.r).

[Figure: realisations and acf's of AR(1) processes with φ = 0.8 and φ = −0.8.]
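A minimal R sketch in the spirit of F71TS AR1 ACF.r (whose exact content is not reproduced here); the seed and sample size are arbitrary choices.

## Sketch: realisations of AR(1) processes with phi = 0.8 and phi = -0.8,
## their sample acf's, and a comparison with the theoretical acf rho(k) = phi^k.
set.seed(2)
n <- 500
x.pos <- arima.sim(model = list(ar =  0.8), n = n)
x.neg <- arima.sim(model = list(ar = -0.8), n = n)

op <- par(mfrow = c(2, 2))
plot.ts(x.pos, main = "AR(1) with phi = 0.8");  acf(x.pos, lag.max = 8)
plot.ts(x.neg, main = "AR(1) with phi = -0.8"); acf(x.neg, lag.max = 8)
par(op)

rbind(sample = acf(x.pos, lag.max = 5, plot = FALSE)$acf[-1],
      theory = 0.8^(1:5))                      # sample vs. theoretical autocorrelations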
2.6 The AR(2) processes

Some properties of the more complex AR(2) processes will be discussed in this section. For a given AR(2) model we first have to check whether it is causal stationary. In some special cases this can be done by means of a factorisation of φ(z).

Examples:

1) For the AR(2) process X_t = 1.1X_{t−1} − 0.3X_{t−2} + ε_t we have
\[
\phi(z) = 1 - 1.1z + 0.3z^2 = (1 - 0.5z)(1 - 0.6z)
\]
with roots z_1 = 2 and z_2 = 5/3. This process is causal stationary.

2) For the AR(2) process X_t = −1.5X_{t−1} + X_{t−2} + ε_t we have
\[
\phi(z) = 1 + 1.5z - z^2 = (1 - 0.5z)(1 + 2z)
\]
with roots z_1 = 2 and z_2 = −1/2. This process is not causal stationary (there exists, however, a non-causal stationary solution of X_t).

Usually we have to check whether the conditions of Theorem 2.6 are fulfilled or not. For an AR(2) model
\[
X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \varepsilon_t
\]
these conditions are equivalent to the following simple conditions on the coefficients:

i) φ_1 + φ_2 < 1;
ii) φ_2 − φ_1 < 1;
iii) −1 < φ_2 < 1.

Using these conditions it is easy to check whether an AR(2) model is causal stationary or not.

Examples:

1) X_t = 0.556X_{t−1} + 0.322X_{t−2} + ε_t is causal stationary, because φ_1 + φ_2 = 0.878 < 1, φ_2 − φ_1 = −0.234 < 1 and −1 < φ_2 = 0.322 < 1.
2) X_t = 0.424X_{t−1} − 1.122X_{t−2} + ε_t is not causal stationary, because φ_2 = −1.122 < −1.

The above stationarity conditions define a triangle in the φ_1-φ_2 plane, which can be divided into four regions:

Region A: φ_1 > 0 and φ_1² + 4φ_2 > 0 (two real roots z_1 and z_2)
Region B: φ_1 < 0 and φ_1² + 4φ_2 > 0 (two real roots z_1 and z_2)
Region C: φ_1 < 0 and φ_1² + 4φ_2 < 0 (two conjugate complex roots z_1 and z_2)
Region D: φ_1 > 0 and φ_1² + 4φ_2 < 0 (two conjugate complex roots z_1 and z_2)

The acf's of AR(2) processes with coefficients in different regions have different functional forms (F71TS AR2 regions.r).

[Figure: the four regions of the AR(2) parameters in the φ_1-φ_2 plane.]

The ACF of an AR(2) model

We have explicit formulas for the acf's of MA processes and of AR(1) processes. The acf of an AR(p) process with p > 1 has to be calculated recursively. Consider an AR(2) process
\[
X_t - \phi_1 X_{t-1} - \phi_2 X_{t-2} = \varepsilon_t.
\]
Multiplying both sides by X_{t−k} and taking expectations we obtain, for k = 0, 1 and 2 respectively,
\[
\begin{aligned}
\gamma(0) - \phi_1\gamma(1) - \phi_2\gamma(2) &= \sigma_\varepsilon^2,\\
\gamma(1) - \phi_1\gamma(0) - \phi_2\gamma(1) &= 0,\\
\gamma(2) - \phi_1\gamma(1) - \phi_2\gamma(0) &= 0,
\end{aligned}
\qquad\text{or}\qquad
\begin{aligned}
\gamma(0) - \phi_1\gamma(1) - \phi_2\gamma(2) &= \sigma_\varepsilon^2,\\
-\phi_1\gamma(0) + (1-\phi_2)\gamma(1) + 0\cdot\gamma(2) &= 0,\\
-\phi_2\gamma(0) - \phi_1\gamma(1) + \gamma(2) &= 0.
\end{aligned}
\tag{2.23}
\]
Solving these equations, we have
\[
\gamma(0) = \frac{(1-\phi_2)\sigma_\varepsilon^2}{(1+\phi_2)[(1-\phi_2)^2 - \phi_1^2]},\quad
\gamma(1) = \frac{\phi_1\sigma_\varepsilon^2}{(1+\phi_2)[(1-\phi_2)^2 - \phi_1^2]},\quad
\gamma(2) = \frac{[\phi_1^2 + \phi_2(1-\phi_2)]\sigma_\varepsilon^2}{(1+\phi_2)[(1-\phi_2)^2 - \phi_1^2]},
\tag{2.24}
\]
and for k ≥ 2
\[
\gamma(k) = \phi_1\gamma(k-1) + \phi_2\gamma(k-2).
\tag{2.25}
\]
This recursive formula and the initial solutions given in (2.24) allow us to calculate the autocovariances γ(k) of an AR(2) process for any finite k (using e.g. R). The idea can be generalised to a general AR(p) model, for which p + 1 initial values have to be solved for.

For ρ(k) we have ρ(0) ≡ 1. For k = 1, dividing the second equation of (2.23) by γ(0) gives
\[
\rho(1) - \phi_1 - \phi_2\rho(1) = 0,
\tag{2.26}
\]
and hence
\[
\rho(1) = \frac{\phi_1}{1-\phi_2}.
\tag{2.27}
\]
Analogously, we obtain the recursive formula for ρ(k), k ≥ 2:
\[
\rho(k) = \phi_1\rho(k-1) + \phi_2\rho(k-2).
\tag{2.28}
\]
We see that, for an AR(2) model, the calculation of ρ(k) is a little easier than that of γ(k).

Similarly, we could obtain recursive formulas for the coefficients α_i in the MA(∞) representation (2.17) of an AR process, but this will not be discussed further in this lecture. What we need to know are only some properties of the α_i, e.g. absolute summability.

Example. Calculate ρ(k), k = 0, 1, ..., 50, of the AR(2) process X_t = 0.5X_{t−1} + 0.4X_{t−2} + ε_t.

Solution: Following the above formulas we have ρ(0) = 1, ρ(1) = φ_1/(1 − φ_2) = 0.833, ρ(2) = φ_1ρ(1) + φ_2ρ(0) = 0.817, ρ(3) = φ_1ρ(2) + φ_2ρ(1) = 0.742, ρ(4) = 0.698, ρ(5) = 0.645, ρ(6) = 0.602, ..., ρ(48) = 0.029, ρ(49) = 0.027 and ρ(50) = 0.025.

The ACF for the four different regions (F71TS AR2 ACF.r):

(i) X_t = 0.5X_{t−1} + 0.4X_{t−2} + ε_t with φ_1 and φ_2 in region A;
(ii) X_t = −0.5X_{t−1} + 0.4X_{t−2} + ε_t with φ_1 and φ_2 in region B;
(iii) X_t = −1.8X_{t−1} − 0.9X_{t−2} + ε_t with φ_1 and φ_2 in region C;
(iv) X_t = 1.8X_{t−1} − 0.9X_{t−2} + ε_t with φ_1 and φ_2 in region D.

[Figure: realisations and acf's of the four AR(2) models in regions A–D.]

We see that the acf of an AR(2) process with coefficients in region A looks like that of an AR(1) with positive coefficient, and the acf of an AR(2) process with coefficients in region B looks like that of an AR(1) with negative coefficient. If the roots of φ(z) are complex, then ρ(k) behaves like a damped sine wave. If the coefficients are in region C, the sign of ρ(k) changes quite frequently; if they are in region D, the sign of ρ(k) stays the same over half a period, as for a sine function.
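The worked example above (ρ(k) of X_t = 0.5X_{t−1} + 0.4X_{t−2} + ε_t) can be reproduced with a few lines of R; the recursion below implements (2.27) and (2.28).

## Sketch: compute rho(k), k = 0,...,50, of the AR(2) process with phi1 = 0.5, phi2 = 0.4.
phi1 <- 0.5; phi2 <- 0.4
rho    <- numeric(51)                          # rho[k+1] stores rho(k)
rho[1] <- 1                                    # rho(0) = 1
rho[2] <- phi1 / (1 - phi2)                    # rho(1) = 0.833, formula (2.27)
for (k in 3:51)                                # rho(k) = phi1*rho(k-1) + phi2*rho(k-2)
  rho[k] <- phi1 * rho[k - 1] + phi2 * rho[k - 2]
round(rho[1:7], 3)                             # 1.000 0.833 0.817 0.742 0.698 0.645 0.602
round(rho[49:51], 3)                           # 0.029 0.027 0.025
ARMAacf(ar = c(phi1, phi2), lag.max = 6)       # agrees with the recursion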
2.7 The AR(∞) processes

The AR(∞) process is given by
\[
X_t = \sum_{i=1}^{\infty}\beta_i X_{t-i} + \varepsilon_t,
\tag{2.29}
\]
or equivalently,
\[
\varepsilon_t = \sum_{i=0}^{\infty}\beta_i X_{t-i}
\tag{2.30}
\]
with β_0 = 1 (the β_i for i ≥ 1 in (2.30) being the negatives of those in (2.29)). Usually it is assumed that Σ_{i=0}^{∞} |β_i| < ∞.

Example: An MA(1) process X_t = ψε_{t−1} + ε_t with |ψ| < 1 has the AR(∞) representation
\[
\varepsilon_t = \sum_{i=0}^{\infty}(-\psi)^i X_{t-i}
\tag{2.31}
\]
with absolutely summable coefficients β_i = (−ψ)^i, i = 0, 1, ... .

Note that the X_t are observable but the ε_t are often unobservable. Hence the above AR(∞) model for the innovations ε_t is very useful in theory and practice, because it makes it possible to estimate ε_t from the data. The question is whether such an AR(∞) model is well defined for a given MA process. The answer is that this is only possible if the MA process is invertible.

2.8 Invertibility of MA(q) Processes

Invertibility: A process X_t is said to be invertible if it can be represented as an AR(∞) process with (absolutely) summable coefficients.

Note that any AR process with summable coefficients is invertible.

Example: The random walk X_t = X_{t−1} + ε_t is non-stationary but invertible.

There are, however, some simple MA models which are not invertible.

Example: The MA(1) process X_t = ε_{t−1} + ε_t is stationary but not invertible.

Hence we need to discuss when an MA process is invertible. For this we have the following theorem, which is closely related to Theorem 2.6 on the causal stationarity of an AR process. The characteristic equation of an MA(q) model is
\[
\psi(z) = 1 + \psi_1 z + \ldots + \psi_q z^q = 0.
\tag{2.32}
\]
Again, this equation has q roots z_1, ..., z_q, and ψ(z) and {z_1, ..., z_q} determine each other. Hence the correlation structure of X_t is determined by z_1, ..., z_q.

Theorem 2.7 An MA(q) process X_t is invertible iff all of the roots of ψ(z) lie outside the unit circle, i.e. iff |z_i| > 1 for all 1 ≤ i ≤ q.

Proof. See Brockwell and Davis (1991).

The conditions |z_i| > 1 in Theorem 2.7 imply that ψ^{−1}(B) is well defined with non-negative powers and absolutely summable coefficients. Now we have
\[
\varepsilon_t = \psi^{-1}(B)X_t = \sum_{i=0}^{\infty}\beta_i X_{t-i},
\]
which is a causal AR(∞) representation with Σ_{i=0}^{∞} |β_i| < ∞.

We see that there is a dual relationship between the MA and AR processes:

• A causal invertible MA process can be represented as a causal stationary AR(∞).
• A causal stationary AR process can be represented as a causal, stationary (and also invertible) MA(∞).

Analogously to the causal stationarity conditions for AR(1) and AR(2) models, we have the following invertibility conditions for MA(1) and MA(2) processes:

1) An MA(1) process X_t = ψε_{t−1} + ε_t is invertible iff |ψ| < 1.
2) An MA(2) process X_t = ψ_1ε_{t−1} + ψ_2ε_{t−2} + ε_t is invertible iff
   i) ψ_1 + ψ_2 > −1;
   ii) ψ_1 − ψ_2 < 1;
   iii) −1 < ψ_2 < 1.

The invertibility conditions for an MA(2) process also define a triangle, now in the ψ_1-ψ_2 plane, which can again be divided into four regions (F71TS AR2 regions.r):

Region A: ψ_1 < 0 and ψ_1² − 4ψ_2 > 0 (two real roots z_1 and z_2)
Region B: ψ_1 > 0 and ψ_1² − 4ψ_2 > 0 (two real roots z_1 and z_2)
Region C: ψ_1 > 0 and ψ_1² − 4ψ_2 < 0 (two conjugate complex roots z_1 and z̄_1)
Region D: ψ_1 < 0 and ψ_1² − 4ψ_2 < 0 (two conjugate complex roots z_1 and z̄_1)

[Figure: the four regions of the MA(2) parameters in the ψ_1-ψ_2 plane.]

Examples:

1) X_t = 1.5ε_{t−1} + 0.75ε_{t−2} + ε_t is invertible, because ψ_1 + ψ_2 = 2.25 > −1, ψ_1 − ψ_2 = 0.75 < 1 and −1 < ψ_2 = 0.75 < 1.
2) X_t = 0.75ε_{t−1} − 0.5ε_{t−2} + ε_t is not invertible, because ψ_1 − ψ_2 = 1.25 > 1.
2.9 The ARMA processes

Definition 2.8 An Autoregressive Moving Average process of order (p, q) (ARMA(p, q)) is defined by
\[
X_t = \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \psi_1\varepsilon_{t-1} + \ldots + \psi_q\varepsilon_{t-q} + \varepsilon_t.
\tag{2.33}
\]
An ARMA model combines an AR and an MA model. Equivalently, (2.33) can be represented as
\[
X_t - \phi_1 X_{t-1} - \ldots - \phi_p X_{t-p} = \psi_1\varepsilon_{t-1} + \ldots + \psi_q\varepsilon_{t-q} + \varepsilon_t,
\tag{2.34}
\]
or
\[
\phi(B)X_t = \psi(B)\varepsilon_t,
\tag{2.35}
\]
where φ(z) = 1 − φ_1 z − ... − φ_p z^p is the characteristic polynomial of the AR part and ψ(z) = 1 + ψ_1 z + ... + ψ_q z^q the characteristic polynomial of the MA part. An AR(p) model is an ARMA(p, 0) model with ψ(z) ≡ 1, and an MA(q) model is an ARMA(0, q) model with φ(z) ≡ 1.

Examples: Given a WN process ε_t, we can define

(i) An ARMA(1, 1) process: X_t = 0.8X_{t−1} + 0.6ε_{t−1} + ε_t.
(ii) An ARMA(2, 2) process: X_t = 0.7X_{t−1} + 0.1X_{t−2} + 0.8ε_{t−1} + 0.16ε_{t−2} + ε_t.

The following theorem is one of the most important theorems in time series analysis. For an ARMA(p, q) process we have:

Theorem 2.9 Assume that φ(z) and ψ(z) have no common factors. Then the ARMA(p, q) process is

a) causal (stationary) iff all roots of φ(z) lie outside the unit circle,
b) invertible iff all roots of ψ(z) lie outside the unit circle,
c) causal (stationary) and invertible iff all roots of φ(z) and ψ(z) lie outside the unit circle.

Theorem 2.9 combines Theorems 2.6 and 2.7. The causal stationarity and invertibility conditions of an ARMA model do not depend on each other. By combining the conditions given above we can check whether an ARMA(p, q) process, for p = 0, 1, 2 and q = 0, 1, 2, is causal stationary and/or invertible.

φ(z) and ψ(z) have a common factor if there exists a function f(z) such that
\[
\phi(z) = f(z)\tilde\phi(z) \quad\text{and}\quad \psi(z) = f(z)\tilde\psi(z).
\]
If a common factor exists, then φ̃(z) and ψ̃(z) instead of φ(z) and ψ(z) should be used in Theorem 2.9.

Examples:

1) X_t − 0.6X_{t−1} − 0.3X_{t−2} = 1.5ε_{t−1} + 0.75ε_{t−2} + ε_t is both causal stationary and invertible.
2) X_t − 0.6X_{t−1} − 0.3X_{t−2} = 0.75ε_{t−1} − 0.5ε_{t−2} + ε_t is causal stationary but not invertible.
3) X_t − 0.6X_{t−1} − 0.5X_{t−2} = 1.5ε_{t−1} + 0.75ε_{t−2} + ε_t is invertible but not causal stationary.
4) X_t − 0.6X_{t−1} − 0.5X_{t−2} = 0.75ε_{t−1} − 0.5ε_{t−2} + ε_t is neither causal stationary nor invertible.

Remark. Under the assumptions of Theorem 2.9 c), an ARMA(p, q) process has on the one hand the MA(∞) representation
\[
X_t = \sum_{i=0}^{\infty}\alpha_i\varepsilon_{t-i},
\tag{2.36}
\]
where α(z) = φ(z)^{−1}ψ(z), with Σ|α_i| < ∞, and on the other hand the AR(∞) representation
\[
\varepsilon_t = \sum_{i=0}^{\infty}\beta_i X_{t-i},
\tag{2.37}
\]
where β(z) = φ(z)ψ(z)^{−1}, with Σ|β_i| < ∞. The fact that the α_i and β_i are absolutely summable follows since the convolution of two absolutely summable sequences is absolutely summable. Under the assumptions of Theorem 2.9 c), the autocovariances of an ARMA(p, q) process are therefore always absolutely summable, i.e.
\[
\sum_{k=-\infty}^{\infty}|\gamma_X(k)| < \infty.
\]

The mean of MA, AR or ARMA processes may be non-zero.

Definition 2.10 An ARMA(p, q) process with mean µ is defined by
\[
X_t - \mu = \phi_1(X_{t-1} - \mu) + \ldots + \phi_p(X_{t-p} - \mu) + \psi_1\varepsilon_{t-1} + \ldots + \psi_q\varepsilon_{t-q} + \varepsilon_t.
\tag{2.38}
\]
If the mean is known, we can simply assume that µ = 0 as before. If the mean is unknown, it is not difficult to estimate µ from the data and to remove it. Hence ARMA processes with unknown mean have properties similar to those given above.
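Returning to Examples 1)–4) above, the causal stationarity and invertibility conditions can be checked numerically via the roots of φ(z) and ψ(z); the following R lines are an illustrative sketch, not part of the original notes.

## Sketch: root checks for the ARMA(2,2) examples above.
phi <- c(0.6, 0.3)                             # AR coefficients of examples 1) and 2)
psi <- c(1.5, 0.75)                            # MA coefficients of examples 1) and 3)
all(Mod(polyroot(c(1, -phi))) > 1)             # TRUE  -> causal stationary AR part
all(Mod(polyroot(c(1,  psi))) > 1)             # TRUE  -> invertible MA part
all(Mod(polyroot(c(1, -c(0.6, 0.5)))) > 1)     # FALSE -> the AR part of examples 3) and 4) is not causal stationary
all(Mod(polyroot(c(1,  c(0.75, -0.5)))) > 1)   # FALSE -> the MA part of examples 2) and 4) is not invertible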
The ACF of an ARMA process

We have seen how to calculate the acf of an MA or AR process. For an ARMA(p, q) process we rewrite the model as
\[
X_t - \phi_1 X_{t-1} - \ldots - \phi_p X_{t-p} = \psi_1\varepsilon_{t-1} + \ldots + \psi_q\varepsilon_{t-q} + \varepsilon_t.
\tag{2.39}
\]
Multiplying both sides of (2.39) by X_{t−k} and taking expectations, we obtain
\[
\gamma(k) - \phi_1\gamma(k-1) - \ldots - \phi_p\gamma(k-p) = \sigma_\varepsilon^2\sum_{k\le j\le q}\psi_j\psi_{j-k}
\tag{2.40}
\]
for 0 ≤ k < max(p, q + 1), and
\[
\gamma(k) - \phi_1\gamma(k-1) - \ldots - \phi_p\gamma(k-p) = 0
\tag{2.41}
\]
for k ≥ max(p, q + 1). The recursive equation (2.41) is the same as for an AR(p).

3 Spectral Density Functions of ARMA Processes

On the other hand, given a function γ(k), we might want to check whether it could be the autocovariance function of some stationary process. If the answer is positive, we would further like to know the properties of the stationary process determined by the given acf. The discussion of this problem involves the so-called spectral analysis of time series, which is carried out by means of the spectral density of a time series.

The spectral density is a nonnegative function defined on the frequency interval λ ∈ [−π, π]. Assume that the autocovariances γ(k) of a (real-valued)¹ stationary process X_t are absolutely summable. Then
\[
f(\lambda) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\gamma(k)e^{-ik\lambda}
= \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\gamma(k)\cos(k\lambda)
= \frac{1}{2\pi}\Big[\gamma(0) + 2\sum_{k=1}^{\infty}\gamma(k)\cos(k\lambda)\Big]
\tag{3.42}
\]
is a well defined, continuous, nonnegative, even (symmetric) function. The last two equalities follow from e^{−ikλ} = cos(kλ) − i sin(kλ), together with the facts that γ(k) is symmetric, cos is an even function and sin is an odd function.

(Footnote 1: Spectral analysis often involves complex-valued stochastic processes, which will be avoided in this lecture.)

The f(λ) defined here is called the spectral density function (spdf), or simply the spectral density, of the acf γ(k) or of the process X_t. It is a periodic function with period 2π; hence we only need to consider f(λ) on the interval λ ∈ [−π, π]. Note that f(λ) is not necessarily a density function, because in general ∫_{−π}^{π} f(λ)dλ ≠ 1.

The spectral density function can be calculated from either γ(k) or ρ(k); the difference between the two results is just the constant factor γ(0). We therefore write acf for both the autocorrelation and the autocovariance function.

Conversely, for the spectral density function f(λ) we have
\[
\gamma(k) = \int_{-\pi}^{\pi} f(\lambda)e^{ik\lambda}\,d\lambda.
\tag{3.43}
\]
This means that, if the γ(k) are absolutely summable, then γ(k) and f(λ) determine each other. Indeed, f(λ) is the Fourier transform of γ(k), and the γ(k) are the Fourier coefficients of f(λ).

Examples:

(i) For the WN process X_t = ε_t with γ(0) = σ²_ε and γ(k) = 0 for k ≠ 0 we have
\[
f_X(\lambda) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\gamma(k)\cos(k\lambda) = \frac{1}{2\pi}\gamma(0) = \frac{\sigma_\varepsilon^2}{2\pi}.
\tag{3.44}
\]
We see that the spectral density of a White Noise is constant; this means that in a WN all frequencies are equally important.

(ii) For the MA(1) process X_t = ψε_{t−1} + ε_t with |ψ| < 1 we have γ(0) = (1 + ψ²)σ²_ε, γ(±1) = ψσ²_ε and γ(k) = 0 for |k| > 1. Hence
\[
f_X(\lambda) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\gamma(k)\cos(k\lambda)
= \frac{1}{2\pi}\big[\gamma(0) + 2\gamma(1)\cos\lambda\big]
= \frac{\sigma_\varepsilon^2}{2\pi}\big[1 + 2\psi\cos\lambda + \psi^2\big].
\tag{3.45}
\]

The basic idea of spectral analysis is that any stationary process in discrete time can be decomposed into (usually infinitely many) periodic components with different frequencies λ ∈ [−π, π] (a low frequency corresponds to a long period). The (relative) value of the spectral density at a given frequency λ shows how strong the influence of the periodic fluctuation with this frequency is on the whole process.

The spdf can be used to check whether a given symmetric function γ(k), k = 0, ±1, ..., is the autocovariance function of some stationary process. Besides the basic properties of γ(k) (γ(0) ≥ 0, |γ(k)| ≤ γ(0) and γ(−k) = γ(k)), an autocovariance function of a stationary process has to be non-negative (or positive semi-) definite. For any given k ≥ 1 define the (k+1) × (k+1) matrix
\[
\Gamma = \begin{pmatrix}
\gamma(0) & \gamma(1) & \cdots & \gamma(k)\\
\gamma(1) & \gamma(0) & \cdots & \gamma(k-1)\\
\vdots & \vdots & \ddots & \vdots\\
\gamma(k) & \gamma(k-1) & \cdots & \gamma(0)
\end{pmatrix}.
\]
Positive semi-definiteness of γ(k) means that, for any real vector a = (a_1, ..., a_{k+1})', we have a'Γa ≥ 0. This condition is usually not easy to check. However, the following theorem shows that the answer is clear if f(λ) is easy to calculate.

Theorem 3.1 A symmetric, absolutely summable function γ(k) defined on the integers is the autocovariance function of a stationary process iff
\[
f(\lambda) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}e^{-ik\lambda}\gamma(k) \ge 0 \qquad \text{for all } \lambda \in [-\pi, \pi].
\tag{3.47}
\]
The other properties of γ(k) follow from the assumptions of Theorem 3.1. Similar results hold for the autocorrelation function.

Example. We can now show that the function
\[
\rho(k) = \begin{cases} 1, & k = 0,\\ \rho, & k = \pm 1,\\ 0, & \text{otherwise},\end{cases}
\]
is the acf of a stationary process if and only if |ρ| ≤ 1/2.

Solution: For this function we have
\[
f(\lambda) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}e^{-ik\lambda}\rho(k)
= \frac{1}{2\pi}\big\{1 + \rho[\cos(-\lambda) + i\sin(-\lambda)] + \rho[\cos(\lambda) + i\sin(\lambda)]\big\}
= \frac{1}{2\pi}\big[1 + 2\rho\cos\lambda\big].
\tag{3.46}
\]
Now it is clear that f(λ) ≥ 0 for all λ ∈ [−π, π] if and only if |ρ| ≤ 1/2.

On the other hand, given a function f(λ) on [−π, π], how can we check whether it is the spectral density of some (real-valued) stationary process? To this end we have:

Theorem 3.2 A function f(λ) defined on [−π, π] is the spectral density function of a (real-valued) stationary process if and only if

(i) f(λ) = f(−λ),
(ii) f(λ) ≥ 0, and
(iii) ∫_{−π}^{π} f(λ)dλ < ∞.

Example. Let f(λ) = (1 − 2φ cos λ + φ²)^{−1} with |φ| < 1, λ ∈ [−π, π]. Then f(λ) = f(−λ), and 1 − 2φ cos λ + φ² ≥ 1 + φ² − 2|φ| > 0. This means that f(λ) > 0 and f(λ) < ∞; the latter ensures ∫_{−π}^{π} f(λ)dλ < ∞. We conclude that f(λ) is the spdf of some stationary process.

3.1 Spectral Densities of ARMA Processes

Closed form formulas for the spectral density functions of ARMA(p, q) processes are well known; they are summarised in the following theorem.

Theorem 3.3 (Spectral density of an ARMA(p, q) process). Let {X_t} be a causal stationary and invertible ARMA(p, q) process given by
\[
\phi(B)X_t = \psi(B)\varepsilon_t,
\tag{3.48}
\]
where ε_t is a WN process and φ(z) and ψ(z) have no common roots. Then {X_t} has the spectral density function
\[
f_X(\lambda) = \frac{\sigma_\varepsilon^2}{2\pi}\,\frac{|\psi(e^{-i\lambda})|^2}{|\phi(e^{-i\lambda})|^2}, \qquad -\pi \le \lambda \le \pi.
\tag{3.49}
\]

Remark. The spdf depends only on γ(k) (or ρ(k)). Hence the above result holds as long as the ε_t are uncorrelated with mean zero and variance σ²_ε; the invertibility condition is not necessary. These results can also be extended to causal stationary MA(∞) or AR(∞) processes. See Corollary 4.3.2 in Brockwell and Davis (1991).

Remark. Although the above formula seems very simple, the following examples show that it is still not easy to obtain the explicit solution in a special case, because we have to calculate the norm of complex functions.

Examples:

1. We show again that the spdf of an MA(1) process X_t = ψε_{t−1} + ε_t with |ψ| < 1 is
\[
f_X(\lambda) = \frac{\sigma_\varepsilon^2}{2\pi}\big(1 + 2\psi\cos\lambda + \psi^2\big).
\]
Proof: Following Theorem 3.3 we have
\[
f(\lambda) = \frac{\sigma_\varepsilon^2}{2\pi}\big|\psi(e^{-i\lambda})\big|^2
= \frac{\sigma_\varepsilon^2}{2\pi}\big|1 + \psi e^{-i\lambda}\big|^2
= \frac{\sigma_\varepsilon^2}{2\pi}\big|1 + \psi[\cos(-\lambda) + i\sin(-\lambda)]\big|^2
= \frac{\sigma_\varepsilon^2}{2\pi}\big\{[1 + \psi\cos(-\lambda)]^2 + [\psi\sin(-\lambda)]^2\big\}
= \frac{\sigma_\varepsilon^2}{2\pi}\big(1 + 2\psi\cos\lambda + \psi^2\big).
\tag{3.50}
\]

2. The spdf of an AR(1) process X_t = φX_{t−1} + ε_t with |φ| < 1 is
\[
f(\lambda) = \frac{\sigma_\varepsilon^2}{2\pi}\,\frac{1}{1 - 2\phi\cos\lambda + \phi^2}.
\tag{3.51}
\]
Proof.
Now we have
\[
\big|\phi(e^{-i\lambda})\big|^{-2} = \big|1 - \phi e^{-i\lambda}\big|^{-2}
= \big|1 - \phi[\cos(-\lambda) + i\sin(-\lambda)]\big|^{-2}
= \big\{[1 - \phi\cos(-\lambda)]^2 + [\phi\sin(-\lambda)]^2\big\}^{-1}
= \big(1 - 2\phi\cos\lambda + \phi^2\big)^{-1},
\]
so that f(λ) = (σ²_ε/2π)(1 − 2φ cos λ + φ²)^{−1}. We see that the example given after Theorem 3.2 is indeed the spdf of an AR(1) process with this φ and with σ²_ε = 2π.

Combining the above examples, we obtain: the spdf of an ARMA(1, 1) model X_t = φX_{t−1} + ψε_{t−1} + ε_t with |φ| < 1 and |ψ| < 1 is
\[
f(\lambda) = \frac{\sigma_\varepsilon^2}{2\pi}\,\frac{1 + 2\psi\cos\lambda + \psi^2}{1 - 2\phi\cos\lambda + \phi^2}.
\tag{3.52}
\]

The spectral densities of the following models:

• MA(1) with ψ = 0.8 and ψ = −0.8,
• AR(1) with φ = 0.8 and φ = −0.8,
• ARMA(1, 1) with φ = −0.3, ψ = 0.8, and
• ARMA(1, 1) with φ = 0.3, ψ = −0.8

are shown in the following figures.

[Figure: spectral densities of the MA(1), AR(1) and ARMA(1, 1) models listed above.]

Later we will see that, for a time series X_t, the variance of the sample mean, var(x̄), depends on Σ_{k=−∞}^{∞} γ_X(k). Generally it is not easy to calculate this sum directly, since it depends on all γ(k), k = 0, 1, .... However, for an ARMA process Σγ_X(k) can easily be obtained by means of its spdf (without calculating the γ(k) and f(λ) explicitly!).

Assume that the γ_X(k) of a stationary process X_t are absolutely summable, so that X_t has the spectral density f_X(λ). Following the definition we have, at the origin λ = 0,
\[
f(0) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\gamma(k)e^{-ik\cdot 0} = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\gamma(k),
\tag{3.53}
\]
because e^{−ik·0} ≡ 1 for any k. We therefore obtain
\[
\sum_{k=-\infty}^{\infty}\gamma(k) = 2\pi f(0).
\tag{3.54}
\]
Note again that e^{−iλ} = 1 for λ = 0. For an ARMA(p, q) process the above formula reduces to
\[
\sum_{k=-\infty}^{\infty}\gamma(k) = 2\pi f(0) = \sigma_\varepsilon^2\left(\frac{\psi(1)}{\phi(1)}\right)^2.
\tag{3.55}
\]
Inserting ψ(1) and φ(1), we have for any causal stationary ARMA(p, q):
\[
\sum_{k=-\infty}^{\infty}\gamma(k) = 2\pi f(0) = \sigma_\varepsilon^2\left(\frac{\sum_{j=0}^{q}\psi_j}{1 - \sum_{i=1}^{p}\phi_i}\right)^2.
\tag{3.56}
\]

Examples: Assume that the ε_t are i.i.d. N(0, 1) random variables.

1. For the MA(1) process X_t = ψε_{t−1} + ε_t, |ψ| < 1, we have Σ_{k=−∞}^{∞} γ_X(k) = σ²_ε(1 + ψ)².
2. For the AR(1) process X_t = φX_{t−1} + ε_t, |φ| < 1, we have Σ_{k=−∞}^{∞} γ_X(k) = σ²_ε(1 − φ)^{−2}.
3. For X_t = 0.6X_{t−1} − 0.3X_{t−2} + 0.2X_{t−3} + 0.4ε_{t−1} + 0.2ε_{t−2} + 0.3ε_{t−3} + ε_t we have
\[
\sum_{k=-\infty}^{\infty}\gamma_X(k) = \left(\frac{1 + 0.4 + 0.2 + 0.3}{1 - 0.6 + 0.3 - 0.2}\right)^2 = (1.9/0.5)^2 = 14.44.
\]

The above results can easily be extended to more general cases. Assume that X_t has a causal MA(∞) representation
\[
X_t = \sum_{i=0}^{\infty}\alpha_i\varepsilon_{t-i}
\tag{3.57}
\]
with Σ_{i=0}^{∞}|α_i| < ∞, where the ε_t are uncorrelated with E(ε_t) = 0 and var(ε_t) = σ²_ε, but not necessarily independent². Recall that the γ(k) are then absolutely summable (see Lemma 2.3). By extending the above results, we obtain the relationship
\[
\sum_{k=-\infty}^{\infty}\gamma(k) = \sigma_\varepsilon^2\left(\sum_{i=0}^{\infty}\alpha_i\right)^2.
\tag{3.58}
\]
This can also be shown by means of the convolution of two infinite series.

(Footnote 2: A well known theorem in time series analysis, the Wold Decomposition, ensures that almost all common stationary processes can be represented as an MA(∞) process of uncorrelated innovations.)
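As an illustration, the following R sketch evaluates the spdf (3.49) for the ARMA(1, 1) model with φ = −0.3 and ψ = 0.8 (taking σ²_ε = 1, an arbitrary choice) and checks relation (3.56) at λ = 0.

## Sketch: spectral density of the ARMA(1,1) model with phi = -0.3, psi = 0.8, sigma_eps^2 = 1.
phi <- -0.3; psi <- 0.8; sig2 <- 1
f <- function(lambda) {
  z <- exp(-1i * lambda)                       # e^{-i*lambda}
  sig2 / (2 * pi) * Mod(1 + psi * z)^2 / Mod(1 - phi * z)^2
}
curve(f, from = 0, to = pi, xlab = "lambda", ylab = "spdf")   # plotted on [0, pi] by symmetry

2 * pi * f(0)                                  # sum of all autocovariances, = 1.917
sig2 * ((1 + psi) / (1 - phi))^2               # formula (3.56) gives the same value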
4 Partial Autocorrelations

If we want to fit an AR(p) model to our data, we have to choose a proper order for the model. By means of the acf it is not easy to determine which p should be used. The partial autocorrelation function (pacf) was traditionally introduced as a tool for selecting the order of an AR model because, whereas an AR(p) model has an acf which is infinite in extent, its partial autocorrelations are non-zero only up to lag k = p, similarly to the autocorrelations of an MA model.

The partial autocorrelation of a stationary process at lag k, denoted by α(k), may be regarded as the correlation between X_t and X_{t−k} after adjusting for {X_{t−1}, ..., X_{t−k+1}}, or conditionally on {X_{t−1}, ..., X_{t−k+1}}. In other words, the partial autocorrelation α(k) is the correlation between the residuals of the regression of X_t on {X_{t−1}, ..., X_{t−k+1}} and the residuals of the regression of X_{t−k} on {X_{t−1}, ..., X_{t−k+1}}.

Assume we are given a stationary process with autocorrelations ρ(k), k = ..., −1, 0, 1, ..., such that ρ(k) → 0 as k → ∞. The general formulas for its pacf are very complex. However, it is easy to calculate the sample pacf from data using software, e.g. R.

The pacf of an AR(2) process:

(i) Define α(0) = 1 and α(1) = ρ(1) (there are no observations coming in between X_t and X_{t−1}).

(ii) For k = 2 we have
\[
\alpha(2) =
\frac{\begin{vmatrix} 1 & \rho(1)\\ \rho(1) & \rho(2)\end{vmatrix}}
     {\begin{vmatrix} 1 & \rho(1)\\ \rho(1) & 1\end{vmatrix}}
= \frac{\rho(2) - \rho^2(1)}{1 - \rho^2(1)}.
\]

For an AR(p) process it can be shown that α(p) = φ_p and α(k) = 0 for all k > p. The result α(k) = 0 for all k > p is due to the fact that an AR(p) model is a p-th order Markov process: for k > p, the influence of X_{t−k} on X_t is entirely contained in X_{t−1}, ..., X_{t−p}.

Example 1. For the AR(1) process X_t = φX_{t−1} + ε_t, |φ| < 1, we have α(0) = 1, α(1) = φ and α(k) = 0 for k > 1.

Example 2. For a causal AR(2) process X_t = φ_1X_{t−1} + φ_2X_{t−2} + ε_t we have α(0) = 1, α(±1) = ρ(1) = φ_1/(1 − φ_2), α(2) = φ_2 and α(k) = 0 for k > 2 (following the above results). Note in particular that −1 < α(2) = φ_2 < 1, because it is a kind of correlation coefficient.

The pacf's of MA processes are very complex and are in general nonzero at all lags, like the acf of AR models. For the simplest invertible MA(1) process X_t = ψε_{t−1} + ε_t with |ψ| < 1 it can be shown, after a lengthy calculation, that
\[
\alpha(k) = -\frac{(-\psi)^k(1 - \psi^2)}{1 - \psi^{2(k+1)}}.
\tag{4.59}
\]
See Box and Jenkins (1976). For ψ < 0, α(k) < 0 for all k ≠ 0; for ψ > 0, α(k) has alternating signs.

In summary: the acf of an MA(q) process is zero for lags |k| > q, while the acf of an AR(p) process is nonzero at all lags. In contrast, the pacf of an MA(q) process is nonzero at all lags, while the pacf of an AR(p) process is zero for lags |k| > p. This shows again the duality between the MA and AR processes.
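These pacf patterns are easy to inspect numerically. The following R sketch uses ARMAacf(..., pacf = TRUE) for an AR(2) and an invertible MA(1); the particular coefficient values are taken from examples in these notes.

## Sketch: partial autocorrelations of an AR(2) and an MA(1) model.
ARMAacf(ar = c(0.5, 0.4), lag.max = 5, pacf = TRUE)
# 0.833 0.400 0.000 0.000 0.000  -> alpha(1) = phi1/(1 - phi2), alpha(2) = phi2, zero afterwards

psi <- 0.8                                     # invertible MA(1), X_t = 0.8 e_{t-1} + e_t
k   <- 1:5
rbind(ARMAacf(ma = psi, lag.max = 5, pacf = TRUE),
      -(-psi)^k * (1 - psi^2) / (1 - psi^(2 * (k + 1))))   # formula (4.59) gives the same values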
5 Linear Non-stationary Processes

The class of ARMA processes is the most important class of stationary time series. However, in practice, in particular in finance and insurance, most observed time series are non-stationary. One important source of non-stationarity is integration, i.e. the observed series behaves like a cumulative sum of a stationary series. Non-stationary processes of this kind are hence called integrated. In the sequel the ARMA processes will be extended to the well known linear non-stationary integrated processes.

5.1 ARIMA Processes

Processes X_t whose d-th differenced series Y_t is an ARMA process, where d = 0, 1, ..., are called ARIMA (autoregressive integrated moving average) processes, denoted by ARIMA(p, d, q).

Definition 5.1 (The ARIMA(p, d, q) process) If d is a non-negative integer, then {X_t} is said to be an ARIMA(p, d, q) process if Y_t := (1 − B)^d X_t is a causal stationary ARMA(p, q) process.

This definition means that X_t satisfies the difference equation
\[
\phi(B)(1-B)^d X_t = \psi(B)\varepsilon_t,
\tag{5.60}
\]
or φ(B)Y_t = ψ(B)ε_t with Y_t = (1 − B)^d X_t, where d = 0, 1, ..., φ(z) and ψ(z) are the characteristic polynomials of the AR and MA parts, respectively, and {ε_t} is a WN. If d = 0 we simply have an ARMA process. The process is stationary if and only if d = 0; an ARIMA model with d ≥ 1 is non-stationary. An ARIMA process with d > 1 is rarely relevant in practice. Hence we will mainly consider cases with either d = 0 or d = 1.

If a time series follows an ARIMA(p, 1, q) model, then the series of first differences, i.e. Y_t = ∆X_t = X_t − X_{t−1}, follows an ARMA(p, q) model and is stationary.

Given observations x_1, x_2, ..., x_n of a time series X_t, the graph of the sample autocorrelations ρ̂(k), called the correlogram, can be used empirically to check whether d = 1 or d = 0, that is, whether X_t is non-stationary or stationary.

Part (a) of the figure below displays a realisation x_0, x_1, ..., x_{300} of the ARIMA(2, 1, 0) model (1 − B)X_t = Y_t with Y_t = 0.4Y_{t−1} + 0.2Y_{t−2} + ε_t, where the ε_t are i.i.d. N(0, 1) random variables. The differenced series y_t = x_t − x_{t−1}, t = 1, 2, ..., 300, is shown in part (b). Note that the original series {x_t} is of length 301 but the differenced series {y_t} is of length 300: one observation is lost by taking first order differences. The estimated acf's (the correlograms) of these two series are shown in parts (c) and (d), respectively. The correlogram in part (c) indicates (empirically) clear non-stationarity of the data, whereas that in part (d) is a typical correlogram of a realisation of a stationary process. The two dashed lines in a correlogram are the so-called ±2/√n confidence bounds, which will be explained later.

[Figure: (a) ARIMA(2, 1, 0) realisation; (b) first differences; (c), (d) the corresponding correlograms.]

5.2 Random Walk with or without Drift

The random walk (RW) is a useful model for financial time series. For practical and theoretical reasons, a RW is often assumed to start at the time point t = 0 with known X_0 = x_0; without loss of generality it is often assumed that X_0 = 0.

Definition 5.2 (Random Walk) A random walk is the stochastic process defined by
\[
X_t = \begin{cases} X_0, & t = 0,\\ X_{t-1} + \varepsilon_t, & t > 0,\end{cases}
\tag{5.61}
\]
where {ε_t} is a WN with mean zero and variance σ²_ε.

A random walk is indeed an ARIMA(0, 1, 0) model starting from t = 1. Obviously, we have
\[
X_t = \sum_{i=1}^{t}\varepsilon_i + X_0,
\tag{5.62}
\]
so that E(X_t) = X_0 and var(X_t) = tσ²_ε; the process is causal but not stationary. Due to the very large variance, a realisation of a random walk often shows a (stochastic) trend, which is, however, purely random. Two realisations of length n = 301 following the same RW model with i.i.d. N(0, 1) innovations are shown in the figure below together with their sample acf's.

[Figure: two random walk realisations and their sample acf's.]

A RW is also invertible, because we have
\[
\varepsilon_t = X_t - X_{t-1}
\tag{5.63}
\]
for t > 0, where the ε_t are i.i.d.

A time series in practice may also have a non-stochastic trend together with a stochastic one. A simple linear trend in a RW can be modelled by a random walk with drift, defined by
\[
X_t = \begin{cases} X_0, & t = 0,\\ X_{t-1} + \mu + \varepsilon_t, & t > 0,\end{cases}
\tag{5.64}
\]
where {ε_t} is a WN with mean zero and variance σ²_ε, and µ ≠ 0 is an unknown constant. Now we have
\[
X_t = t\mu + \sum_{i=1}^{t}\varepsilon_i + X_0
\tag{5.65}
\]
with mean E(X_t) = tµ + X_0, which forms a linear non-stochastic trend in such a time series. Moreover, it holds that ε_t = X_t − X_{t−1} − µ.
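The ARIMA(2, 1, 0) illustration above can be reproduced along the following lines in R; seed and plotting details are arbitrary choices, not taken from the original figures.

## Sketch: a realisation of (1-B)X_t = Y_t with Y_t = 0.4 Y_{t-1} + 0.2 Y_{t-2} + e_t,
## its first differences and their correlograms.
set.seed(3)
y <- arima.sim(model = list(ar = c(0.4, 0.2)), n = 300)  # stationary AR(2) increments
x <- cumsum(c(0, y))                                     # integrate: x_0, ..., x_300

op <- par(mfrow = c(2, 2))
plot.ts(x,       main = "(a) ARIMA(2,1,0)")
plot.ts(diff(x), main = "(b) first differences")
acf(x,       main = "(c) correlogram of x")              # dies out very slowly: non-stationary
acf(diff(x), main = "(d) correlogram of diff(x)")        # typical stationary pattern
par(op)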
6 Estimation for Univariate Linear Processes

We will now discuss the estimation of

• the mean E(X_t) = µ,
• the autocovariances γ(k) = E[(X_t − µ)(X_{t+k} − µ)], and
• the model parameters,

under the assumption that X_t follows an ARIMA(p, d, q) process. The data will be one realisation x_1, ..., x_n of X_t, which will be called a time series. Furthermore, the selection of the unknown model and the application of the estimated model for forecasting will also be discussed. We will mainly consider the so-called large sample properties of an estimator, based on the assumption that we have a relatively long time series.

6.1 Estimation of µ

The expected value µ = E(X_t) can be estimated by the sample mean
\[
\hat\mu = \bar X = \frac{1}{n}\sum_{i=1}^{n}X_i.
\tag{6.66}
\]
If the X_t are i.i.d. with E(X_t) = µ and var(X_t) = σ²_X, we have E(X̄) = µ and var(X̄) = σ²_X/n, so that
\[
\lim_{n\to\infty} E\big[(\hat\mu - \mu)^2\big] = 0.
\tag{6.67}
\]
For an estimator θ̂ of an unknown parameter θ, the quantity
\[
E\big[(\hat\theta - \theta)^2\big] = \big[E(\hat\theta) - \theta\big]^2 + \mathrm{var}(\hat\theta)
\]
is called the mean squared error (MSE) of θ̂. If MSE(θ̂) → 0 as n → ∞, then θ̂ is said to be consistent, denoted by θ̂ → θ as n → ∞. In the i.i.d. case X̄ is consistent.

6.2 Properties of X̄

If X_t is stationary, then X̄ defined above is clearly unbiased, i.e. E(X̄) = µ. Hence E[(X̄ − µ)²] = var(X̄). The following lemma will be used for some of the proofs in this chapter.

Lemma 6.1 Let {a_n} be a sequence of real numbers and a ∈ R. Then
\[
\lim_{n\to\infty} a_n = a \quad\Longrightarrow\quad \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}a_i = a.
\]

Proof. By assumption, given ε > 0, we may choose an N such that |a_n − a| < ε/2 for all n > N. For n > N we have
\[
\Big|\frac{1}{n}\sum_{i=1}^{n}a_i - a\Big|
\le \frac{1}{n}\sum_{i=1}^{N}|a_i - a| + \frac{1}{n}\sum_{i=N+1}^{n}|a_i - a|
\le \frac{1}{n}\sum_{i=1}^{N}|a_i - a| + \frac{\varepsilon}{2}.
\]
We can choose n large enough so that the first term is also smaller than ε/2. The result follows.

Theorem 6.2 Let {X_t; t = 1, 2, ...} be a time series satisfying
\[
\lim_{t\to\infty} E(X_t) = \mu \qquad\text{and}\qquad \lim_{t\to\infty}\mathrm{cov}(\bar X, X_t) = 0,
\]
where X̄ is as defined before. Then
\[
\lim_{n\to\infty} E\big[(\bar X - \mu)^2\big] = 0.
\]

Remark. The conditions of Theorem 6.2 mean that E(X_t) is asymptotically a constant and that no single observation dominates the covariance. Note that stationarity is not required.

Proof. We have
\[
E\big[(\bar X - \mu)^2\big] = \mathrm{var}(\bar X) + \Big(\frac{1}{n}\sum_{t=1}^{n}E(X_t) - \mu\Big)^2,
\]
where the second term on the right-hand side converges to zero by Lemma 6.1. Furthermore,
\[
\mathrm{var}(\bar X) = \frac{1}{n^2}\sum_{t=1}^{n}\sum_{j=1}^{n}\mathrm{cov}(X_t, X_j)
= \frac{2}{n^2}\sum_{t=1}^{n}\sum_{j=1}^{t}\mathrm{cov}(X_t, X_j) - \frac{1}{n^2}\sum_{t=1}^{n}\mathrm{var}(X_t)
= \frac{2}{n^2}\sum_{t=1}^{n} t\,\mathrm{cov}(\bar X_t, X_t) - \frac{1}{n^2}\sum_{t=1}^{n}\mathrm{var}(X_t)
\le \frac{2}{n}\sum_{t=1}^{n}\big|\mathrm{cov}(\bar X_t, X_t)\big|,
\]
where X̄_t is the sample mean of the first t observations. By Lemma 6.1 the last sum converges to zero. ♦

Theorem 6.3 Let {X_t} be a stationary time series with mean µ and autocovariances γ(k) such that γ(k) → 0 as k → ∞. Then X̄ is a consistent estimator of the mean µ.

Proof. For a stationary process we have E(X̄) = µ, hence we only need to show that var(X̄) → 0 as n → ∞. Now
\[
\mathrm{var}(\bar X) = \mathrm{cov}\Big(\frac{1}{n}\sum_{i=1}^{n}X_i,\ \frac{1}{n}\sum_{k=1}^{n}X_k\Big)
\le \frac{1}{n}\sum_{i=1}^{n}\Big|\mathrm{cov}\Big(X_i,\ \frac{1}{n}\sum_{k=1}^{n}X_k\Big)\Big|,
\]
and for each i,
\[
\Big|\mathrm{cov}\Big(X_i,\ \frac{1}{n}\sum_{k=1}^{n}X_k\Big)\Big|
\le \frac{1}{n}\sum_{k=1}^{n}\big|\mathrm{cov}(X_i, X_k)\big|
= \frac{1}{n}\sum_{k=1}^{n}\big|\gamma(k-i)\big|
\le \frac{2}{n}\sum_{k=0}^{n}\big|\gamma(k)\big| =: a_n \to 0
\]
by Lemma 6.1, since |γ(k)| → 0. We obtain var(X̄) ≤ (1/n)Σ_{i=1}^{n} a_n = a_n → 0. ♦

Example. For any causal stationary ARMA process X_t we have γ(k) → 0; hence X̄ → µ as n → ∞.

Example. Let Z be a Bernoulli random variable with P(Z = 1) = P(Z = 0) = 0.5. Define
\[
X_t = \mathrm{sign}(Z - 0.5) = \begin{cases} 1, & \text{for } Z = 1,\\ -1, & \text{for } Z = 0,\end{cases} \qquad t = 1, 2, \ldots.
\]
It is easy to show that {X_t} is a stationary process with zero mean and γ(k) ≡ 1 for all k, i.e. γ(k) does not tend to 0 as k → ∞. For this process we have either X̄ ≡ 1 (for z = 1) or X̄ ≡ −1 (for z = 0). Neither is equal to, or converges to, µ = 0.
Lemma 6.4 (Kronecker's lemma) If the sequence {a_j} is such that
\[
\sum_{j=0}^{\infty}|a_j| < \infty,
\]
then
\[
\lim_{n\to\infty}\sum_{j=0}^{n}\frac{j}{n}\,|a_j| = 0.
\]

Proof. Set e.g. N = n^{1/3}. Then
\[
\sum_{j=0}^{n}\frac{j}{n}|a_j| = \sum_{j=0}^{N}\frac{j}{n}|a_j| + \sum_{j=N+1}^{n}\frac{j}{n}|a_j|,
\]
where the second term tends to zero by assumption, because N → ∞, and
\[
\sum_{j=0}^{N}\frac{j}{n}|a_j| \le \max_j(|a_j|)\,\frac{N^2}{n} = \max_j(|a_j|)\,n^{-1/3} \to 0
\]
as n → ∞. The result holds.

Theorem 6.5 Assume that {X_t} is a stationary time series with absolutely summable autocovariances γ(k). Then X̄ → µ as n → ∞, and furthermore
\[
\lim_{n\to\infty} n\,\mathrm{var}(\bar X) = \sum_{k=-\infty}^{\infty}\gamma(k).
\]

Proof. We only need to show the second statement, for which we have
\[
n\,\mathrm{var}(\bar X) = \frac{1}{n}\sum_{j=1}^{n}\sum_{t=1}^{n}\gamma(t-j)
= \sum_{k=-(n-1)}^{n-1}\frac{n-|k|}{n}\,\gamma(k)
= \sum_{k=-(n-1)}^{n-1}\gamma(k) - 2\sum_{k=1}^{n-1}\frac{k}{n}\,\gamma(k).
\]
By Lemma 6.4 the second term of the last expression tends to zero, while the first tends to Σ_{k=−∞}^{∞} γ(k). Theorem 6.5 is proved.

The asymptotic variance of X̄ is larger than that of the sample mean of i.i.d. random variables Y_t with the same variance as X_t if Σγ(k) > γ(0), and smaller if Σγ(k) < γ(0).

For a causal stationary ARMA(p, q) process, the above result reduces to
\[
\mathrm{var}(\bar X) \approx \frac{\sigma_\varepsilon^2}{n}\left(\frac{\sum_{j=0}^{q}\psi_j}{1 - \sum_{i=1}^{p}\phi_i}\right)^2.
\tag{6.68}
\]

Example: Let x_1, x_2, ..., x_{400} be an observed time series following the theoretical model X_t − µ = 0.5(X_{t−1} − µ) + 0.3(X_{t−2} − µ) + ε_t, where the ε_t are i.i.d. N(0, σ²_ε) random variables. Calculate the asymptotic variance of the sample mean x̄. Furthermore, assume that y_t, t = 1, 2, ..., 400, are i.i.d. random variables with var(Y_t) = var(X_t) and unknown mean. Compare var(ȳ) with the asymptotic variance of x̄, where ȳ is the sample mean of the y_t.

Solution. We have
\[
\mathrm{var}(\bar x) \approx \frac{\sigma_\varepsilon^2}{400}\,(1 - 0.5 - 0.3)^{-2} = \frac{25}{400}\,\sigma_\varepsilon^2 = 0.0625\,\sigma_\varepsilon^2.
\]
Using the formula for γ(0) of an AR(2) model, we obtain
\[
\mathrm{var}(Y_t) = \mathrm{var}(X_t) = \gamma(0)
= \frac{(1-\phi_2)\sigma_\varepsilon^2}{(1+\phi_2)[(1-\phi_2)^2 - \phi_1^2]}
= \frac{(1-0.3)\sigma_\varepsilon^2}{(1+0.3)[(1-0.3)^2 - 0.5^2]} = \frac{0.7}{1.3\cdot 0.24}\,\sigma_\varepsilon^2 = 2.244\,\sigma_\varepsilon^2.
\]
Hence var(ȳ) = var(Y_t)/400 = 0.0056σ²_ε, i.e. asymptotically var(x̄) > 10 var(ȳ).

The following theorem provides a CLT for a general linear process, i.e. a linear filter of an i.i.d. WN ε_t with var(ε_t) = σ²_ε.

Theorem 6.6 Let X_t be a causal stationary process with MA(∞) representation
\[
X_t = \sum_{j=0}^{\infty}\alpha_j\varepsilon_{t-j}
\]
with Σ_{j=0}^{∞}|α_j| < ∞ and Σ_{j=0}^{∞} α_j ≠ 0, where the ε_t are i.i.d. random variables with E(ε_t) = 0 and var(ε_t) = σ²_ε. Then
\[
\sqrt{n}\,\bar X \to_D N(0, V), \qquad\text{where}\quad V = \sum_{k=-\infty}^{\infty}\gamma(k) = \Big(\sum_{j=0}^{\infty}\alpha_j\Big)^2\sigma_\varepsilon^2,
\tag{6.69}
\]
and →_D denotes convergence in distribution.
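The AR(2) example above can be checked numerically; the following R sketch evaluates formula (6.68) with σ²_ε = 1 and compares it with a small simulation (the number of replications is an arbitrary choice).

## Sketch: asymptotic variance of the sample mean for X_t = 0.5 X_{t-1} + 0.3 X_{t-2} + e_t, n = 400.
phi <- c(0.5, 0.3); n <- 400
(1 - sum(phi))^(-2) / n                        # formula (6.68): 0.0625

xbar <- replicate(2000, mean(arima.sim(model = list(ar = phi), n = n)))
var(xbar)                                      # in the same ballpark as 0.0625 (agreement improves with n)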
Note that Theorem 6.6 holds for all causal stationary ARMA processes. Since √n X̄ tends to a normal distribution, X̄ is called √n-convergent. In the general case with E(X_t) = µ ≠ 0 we have
\[
\sqrt{n}\,(\bar X - \mu) \to_D N(0, V),
\tag{6.70}
\]
where V is the same as in (6.69). The assumption Σ_{j=0}^{∞} α_j ≠ 0 is necessary.

Example. Let X_t = ε_t − ε_{t−1}. Now Σ_{j=0}^{∞} α_j = 1 − 1 = 0, and hence the results of Theorem 6.6 do not hold for this X_t.

We can use this CLT to give a confidence interval for µ of an ARMA model. Following the CLT √n(X̄ − µ) →_D N(0, V) we have, asymptotically,
\[
\frac{\bar X - \mu}{\sqrt{V/n}} \sim N(0, 1),
\]
where √(V/n) is the asymptotic standard deviation of X̄. For an ARMA model we have simply
\[
V = \sigma_\varepsilon^2\left(\frac{\sum_{j=0}^{q}\psi_j}{1 - \sum_{i=1}^{p}\phi_i}\right)^2.
\]
This means that, for any (upper) normal quantile Z_{α/2}, we have with approximately (1 − α) coverage probability
\[
-Z_{\alpha/2}\sqrt{V/n} \le \bar X - \mu \le Z_{\alpha/2}\sqrt{V/n},
\]
or equivalently,
\[
\mu \in \Big[\bar X - Z_{\alpha/2}\sqrt{V/n},\ \bar X + Z_{\alpha/2}\sqrt{V/n}\Big].
\]
For simplicity we can use the standard normal quantile Z_{0.025} = 1.96 ≈ 2, so that an approximate 95% confidence interval, e.g. for µ_X, is simply x̄ ± 2·SD_x̄.

Example. Continue the example about the variances of x̄ and ȳ (the AR(2) model with n = 400). Assume that σ²_ε = 1; then the standard deviations of x̄ and ȳ are about 0.25 and 0.075, respectively. Assume that we obtained x̄ = 10.5 and ȳ = 15.15 from the data. Then the approximate 95% confidence intervals are µ_X ∈ [10, 11] and µ_Y ∈ [15, 15.30]. The length of the former is more than three times that of the latter.

Example: Let x_1, x_2, ..., x_{900} be an observed time series following the theoretical model X_t − µ_X = −0.8ε_{t−1} + 0.3ε_{t−2} + ε_t, where the ε_t are i.i.d. N(0, σ²_ε) random variables with σ²_ε = 9. Furthermore, assume that y_t, t = 1, 2, ..., 900, are i.i.d. random variables with var(Y_t) = var(X_t) and unknown mean. Assume we have x̄ = 35.25 and ȳ = 25.75. Calculate var(x̄) asymptotically, calculate var(ȳ) and compare it with var(x̄), and then construct the approximate 95% confidence intervals for µ_X and µ_Y and compare them with each other.

Solution. X_t is an MA(2) process. We have
\[
\mathrm{var}(\bar x) \approx \frac{\sigma_\varepsilon^2}{n}\,(1 - 0.8 + 0.3)^2 = \frac{0.25\cdot 9}{900} = 0.0025,
\]
and γ(0) = (1 + ψ_1² + ψ_2²)σ²_ε = 1.73σ²_ε, hence var(ȳ) = var(Y_t)/n = γ(0)/900 = 0.0173. Asymptotically, var(x̄) = 0.1445 var(ȳ), and SD_x̄ = 0.05, SD_ȳ = 0.1315. The approximate 95% confidence intervals are
\[
\mu_X \in [\bar x - 2\,\mathrm{SD}_{\bar x},\ \bar x + 2\,\mathrm{SD}_{\bar x}] = [35.15,\ 35.35]
\quad\text{and}\quad
\mu_Y \in [\bar y - 2\,\mathrm{SD}_{\bar y},\ \bar y + 2\,\mathrm{SD}_{\bar y}] = [25.487,\ 26.013].
\]
In this example the asymptotic variance of x̄ is about 14.5% of that of ȳ based on i.i.d. random variables with the same variance. Hence the asymptotic standard deviation of x̄ is also much smaller than that of ȳ, and consequently the confidence interval for µ_X is much shorter than that for µ_Y. This is a case where the estimation of the unknown mean from dependent data is more accurate than the estimation from independent data.

6.3 Estimation of the acf

Given a stationary process with mean µ and autocovariances γ(k), two reasonable estimators of γ(k) are (with sample size n)
\[
\hat\gamma(k) = \frac{1}{n}\sum_{t=1}^{n-k}(X_t - \bar X)(X_{t+k} - \bar X)
\tag{6.71}
\]
and
\[
\tilde\gamma(k) = \frac{1}{n-k}\sum_{t=1}^{n-k}(X_t - \bar X)(X_{t+k} - \bar X)
\tag{6.72}
\]
for k = 0, 1, ..., n − 1. For k = −(n − 1), ..., −1 we define γ̂(k) = γ̂(−k) and γ̃(k) = γ̃(−k). Note that γ̂(0) = γ̃(0).

Remark. If µ is known, we can use µ instead of X̄ in the above definitions; the error caused by X̄ is then avoided. However, it can be shown that, for fixed k, the asymptotic properties of γ̂(k) and γ̃(k) are the same whether µ is known or unknown.

The autocorrelations can be estimated by
\[
\hat\rho(k) = \frac{\hat\gamma(k)}{\hat\gamma(0)}
\tag{6.73}
\]
or
\[
\tilde\rho(k) = \frac{\tilde\gamma(k)}{\tilde\gamma(0)}.
\tag{6.74}
\]
For k = 0 we have ρ̂(0) = ρ̃(0) ≡ 1; hence, for estimating ρ(k), we only need to discuss the properties of these estimators for k ≠ 0.

Which of the two estimators should be used? At first sight it seems that γ̃(k) is more logical than γ̂(k), because there are only n − k product terms in the sum. However:

• bias(γ̂(k)) > bias(γ̃(k)),
• MSE(γ̂(k)) < MSE(γ̃(k)) (in general),
• γ̃(k) is not necessarily positive semi-definite, but γ̂(k) is always positive semi-definite.

For these reasons γ̂(k) is the estimator usually implemented in statistical software. γ̂(k) also has a clear disadvantage, namely
\[
\sum_{k=-(n-1)}^{n-1}\hat\gamma(k) = 0
\]
for any data set, no matter what the process is. This is a restriction introduced by the definition and is not a property of the underlying process. Hence only γ̂(k) with relatively small lags should be estimated and used; the estimator γ̃(k) should also only be calculated for relatively small k. A rule of thumb is that the maximal lag used, say m, should be negligible compared to n, e.g. m = √n.

Remark. If k is negligible compared to n, then the difference between γ̂(k) and γ̃(k) is small. For fixed k, γ̃(k) and ρ̃(k) have the same asymptotic properties as γ̂(k) and ρ̂(k).
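The following R sketch computes both estimators for one simulated series and compares them with R's acf(), which uses the divisor n, i.e. the estimator γ̂(k); the simulated model, seed and helper names are arbitrary choices.

## Sketch: the two autocovariance estimators (6.71) and (6.72) on a simulated AR(1) series.
set.seed(4)
x <- arima.sim(model = list(ar = 0.8), n = 200)
n <- length(x); xbar <- mean(x)

gamma.hat   <- function(k) sum((x[1:(n - k)] - xbar) * (x[(1 + k):n] - xbar)) / n        # (6.71)
gamma.tilde <- function(k) sum((x[1:(n - k)] - xbar) * (x[(1 + k):n] - xbar)) / (n - k)  # (6.72)

rbind(hat   = sapply(0:5, gamma.hat),
      tilde = sapply(0:5, gamma.tilde))        # differences are small for small k
acf(x, lag.max = 5, type = "covariance", plot = FALSE)$acf[, 1, 1]   # R returns gamma.hat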
Note that, if the εt are independent N(0, σε²) random variables, then (η − 3)γ(k)γ(h) = 0, since η = 3 for a normal distribution.

Remark. What we can learn from the above results is:

• bias(γ̂(k)) > bias(γ̃(k)),
• MSE(γ̂(k)) < MSE(γ̃(k)) (in general),
• γ̃(k) is not necessarily positive semi-definite, but γ̂(k) is always positive semi-definite.

For these reasons, γ̂(k) is the estimator commonly used in statistical software. However, γ̂(k) also has a clear disadvantage, namely

Σ_{k=−(n−1)}^{n−1} γ̂(k) = 0

for any data set, no matter what the process is. This is a restriction introduced by the definition and is not a property of the underlying process. Hence, only γ̂(k) at relatively small lags should be estimated and used. The estimator γ̃(k) should likewise only be calculated for relatively small k. A rule of thumb is that the maximal lag used, say m, should be negligible compared to n, e.g. m = √n.

Remark. If k is negligible compared to n, then the difference between γ̂(k) and γ̃(k) is small. For fixed k, γ̃(k) resp. ρ̃(k) have the same asymptotic properties as γ̂(k) resp. ρ̂(k).

Theorem 6.8 Under the assumptions of Theorem 6.7 and for fixed i ≥ j > 0, the following results hold:

E{ρ̂(i)} = ((n − |i|)/n) ρ(i) + O(n^{−1})

and

n cov{ρ̂(i), ρ̂(j)} ≈ wij,    (6.77)

where the wij are constants depending on ρ(k) in a very complex way.

Let k ≥ 1 be fixed and define ρ = (ρ(1), ..., ρ(k))′ and ρ̂ = (ρ̂(1), ..., ρ̂(k))′. The following theorem shows that the ρ̂(h), h = 1, 2, ..., k, for fixed k are asymptotically normal.

Theorem 6.9 Under the assumptions of Theorem 6.7 and for fixed k ≥ 1, we have

ρ̂ →D N(ρ, n^{−1} W),

where W = (wij) with wij defined above.

Example. Let Xt = εt be WN with E(εt) = 0 and var(εt) = σε², so that ρ(k) = 0 for k ≠ 0. Then

wij = 1 if i = j, and wij = 0 otherwise.

Hence ρ̂(1), ρ̂(2), ..., ρ̂(k) are asymptotically independent with mean zero and variance 1/n. Therefore, 95% of the sample autocorrelations should lie between the bounds

±1.96/√n ≈ ±2/√n.

These are the so-called ±2/√n confidence bands given on a correlogram in R for the sample acf, which show empirically whether the underlying process could be a WN or not.

Remark. Note that this result is obtained under the iid assumption. If more than 5% of the ρ̂(k) lie outside these bounds, then we can say that the underlying process is possibly not an independent WN process. However, if more than 95% of the ρ̂(k) lie between these two bounds, we cannot conclude that the process is iid, because the second order properties of an iid process and an uncorrelated white noise process are the same. If we want to check whether the distribution is normal or not, a more detailed analysis is required.
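A quick simulation (an illustration under the iid normal assumption, not part of the original notes) shows that for a white noise series only roughly 5% of the sample autocorrelations fall outside the ±2/√n bands.

    # Simulate iid N(0,1) noise and count how many of the first m sample
    # autocorrelations fall outside the +-2/sqrt(n) bands.
    import numpy as np

    rng = np.random.default_rng(2)
    n, m = 1000, 30
    z = rng.standard_normal(n)
    zc = z - z.mean()

    rho = np.array([np.sum(zc[: n - k] * zc[k:]) / np.sum(zc * zc)
                    for k in range(1, m + 1)])
    band = 2 / np.sqrt(n)
    print("share outside the bands:", np.mean(np.abs(rho) > band))  # about 0.05 or less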
7 Estimation of the ARMA Model

In this section we discuss the estimation of the unknown parameters of an ARMA model. At first it is assumed that the orders p and q are known. For an ARMA(p, q) model with

φ(B)Xt = ψ(B)εt    (7.78)

the unknown parameters are θ = (σε²; φ1, ..., φp; ψ1, ..., ψq)′.

7.1 Estimate an AR(1) model

Consider the AR(1) model

Xt = φXt−1 + εt,    (7.79)

where |φ| < 1 and εt is a WN process. We have two unknown parameters, σε² and φ. Note that φ = ρ(1); hence an estimator of ρ(1) is also an estimator of φ. Given observations x1, ..., xn, the simplest estimator of φ is

φ̂ = ρ̂(1) = Σ_{i=1}^{n−1} (xi − x̄)(xi+1 − x̄) / Σ_{i=1}^{n} (xi − x̄)².    (7.80)

If we know that E(Xt) = 0, the use of x̄ is not necessary and φ̂ can also be calculated by

φ̂ = ρ̂(1) = Σ_{i=1}^{n−1} xi xi+1 / Σ_{i=1}^{n} xi².    (7.81)

The properties of φ̂ are the same as discussed in the last section. In particular, |φ̂| ≤ 1 (why?), and under the given conditions φ̂ is √n-consistent and asymptotically normally distributed.

Example. AR(1) model with zero mean: from a time series x1, ..., x400 we have obtained

Σ_{i=1}^{400} xi² = 1012.74 and Σ_{i=1}^{399} xi xi+1 = 798.15.

Then φ̂ = ρ̂(1) = 798.15/1012.74 = 0.788 and the fitted model is

Xt = 0.788Xt−1 + εt.

This example is indeed calculated using a realisation of the AR(1) model Xt = 0.8Xt−1 + εt, where the εt are iid N(0, 1) random variables.

The estimation of an AR(1) model in the case with unknown mean µX is similar. We first calculate x̄ (from the sum of all xt, Σ_{t=1}^{n} xt) and then the two sums Σ_{t=1}^{n} (xt − x̄)² and Σ_{t=1}^{n−1} (xt − x̄)(xt+1 − x̄). If these three sums are given, they are enough for all further calculations.

Example. Assume that we have a time series x1, ..., x900 following an AR(1) model with unknown mean. From the data we obtained

Σ_{t=1}^{900} xt = 27254.45, Σ_{t=1}^{900} (xt − x̄)² = 13347.46 and Σ_{t=1}^{899} (xt − x̄)(xt+1 − x̄) = 8385.93.

Then x̄ = 27254.45/900 = 30.28 and

φ̂ = ρ̂(1) = 8385.93/13347.46 = 0.6283.

The fitted model is Xt − 30.28 = 0.6283(Xt−1 − 30.28) + εt.

For an AR(1) model, the unknown innovation variance σε² can also be estimated from the information given in the above examples. Let Yt = Xt − µ; then Yt is an AR(1) process with zero mean, unknown parameter φ and var(Yt) = var(Xt). We know that Yt has the MA(∞) representation

Yt = Σ_{i=0}^{∞} φ^i εt−i,

from which we have

var(Xt) = var(Yt) = γ(0) = (Σ_{i=0}^{∞} φ^{2i}) σε² = σε²/(1 − φ²).

The unknown innovation variance σε² can hence be estimated from φ̂ and γ̂(0),

σ̂ε² = γ̂(0)(1 − φ̂²).

Example. In the first example above, where it is assumed that µ = 0, by definition we have simply

γ̂(0) = (1/n) Σ_{t=1}^{n} xt² = 1012.74/400 = 2.532

and σ̂ε² = (1 − 0.788²) · 2.532 = 0.960.

Example. In the second example above, we have

γ̂(0) = (1/n) Σ_{t=1}^{n} (xt − x̄)² = 13347.46/900 = 14.831

and σ̂ε² = (1 − 0.6283²) · 14.831 = 8.976. Furthermore, var(x̄) ≈ (8.976/900)(1 − 0.6283)^{−2} = 0.0722 and hence SDx̄ ≈ 0.27. The approximate 95%-CI for µ is

µ ∈ [30.28 − 2 · 0.27, 30.28 + 2 · 0.27] = [29.74, 30.82].
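The whole calculation above needs only the three sums. The following minimal sketch (not part of the original notes; the variable names are made up here) reproduces the worked example.

    # AR(1) estimation with unknown mean from the three sums of the example.
    n = 900
    sum_x = 27254.45
    s0 = 13347.46        # sum of (x_t - xbar)^2
    s1 = 8385.93         # sum of (x_t - xbar)(x_{t+1} - xbar)

    xbar = sum_x / n                                 # 30.28
    phi_hat = s1 / s0                                # 0.6283
    gamma0_hat = s0 / n                              # 14.831
    sigma2_hat = gamma0_hat * (1 - phi_hat ** 2)     # about 8.98
    var_xbar = sigma2_hat / n / (1 - phi_hat) ** 2   # about 0.072
    sd_xbar = var_xbar ** 0.5
    print(xbar, phi_hat, sigma2_hat, (xbar - 2 * sd_xbar, xbar + 2 * sd_xbar))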
7.2 The Yule-Walker Estimators of an AR(p) Model

For an AR(p) process Y with unknown mean µ we have

Yt − µ = φ1(Yt−1 − µ) + · · · + φp(Yt−p − µ) + εt.    (7.82)

Let Xt = Yt − µ. We obtain

Xt = φ1Xt−1 + · · · + φpXt−p + εt.    (7.83)

Remark: This is a linear regression model (through the origin). Although it is not a usual regression model, because the regressors Xt−1, ..., Xt−p depend strongly on each other, the unknown parameters φ1, ..., φp can still be estimated using the method of least squares as in standard regression analysis. A solution based on such a least squares method leads to the same result as the following Yule-Walker estimation.

Consider again the equations for γ of an AR(p) (see one of the previous lectures for p = 2):

γ(0) − φ1γ(1) − φ2γ(2) − ... − φpγ(p) = σε²,
γ(1) − φ1γ(0) − φ2γ(1) − ... − φpγ(p − 1) = 0,
γ(2) − φ1γ(1) − φ2γ(0) − ... − φpγ(p − 2) = 0,
...
γ(p) − φ1γ(p − 1) − φ2γ(p − 2) − ... − φpγ(0) = 0.

The last p equations can be rewritten as

γ = Γφ,    (7.84)

where γ = (γ(1), ..., γ(p))′, φ = (φ1, ..., φp)′ and Γ is the p × p matrix with entries Γij = γ(|i − j|), i.e. γ(0) on the diagonal, γ(1) on the first off-diagonals, and so on up to γ(p − 1) in the corners. Hence

γ = Γφ ⟺ φ = Γ^{−1}γ,

provided the inverse of Γ exists. We now replace γ and Γ by their estimates γ̂ and Γ̂ to obtain the estimate

φ̂ = Γ̂^{−1}γ̂,

provided the inverse of Γ̂ exists. Generally, Γ̂^{−1} exists, because the sequence γ̂(k) is positive semi-definite (in practice Γ̂ is positive definite). Results based on (7.84) are called Yule-Walker estimators.

Remark. If all γ̂(k) are replaced by ρ̂(k), the solutions will not change (why?).

Remark. Denote by Yt = Xt − µ and X1t = Xt−1. Then we have, for t = 2, ..., n,

Yt = φX1t + εt    (7.85)

and E[Yt | X1t = xt−1] = φxt−1. That is, we obtain a linear regression function f(x1t) = a + bx1t with a = 0 and b = φ, which is called a regression through the origin because of the fact that f(0) = 0. The φ̂ proposed in Section 7.1 is indeed an approximate least squares estimator of φ in this regression model. This idea can be extended to general AR(p) models.

Once we have estimates for γ and φ, we obtain an estimate for σε² from

γ(0) − φ1γ(1) − φ2γ(2) − ... − φpγ(p) = σε²,

or from the explicit formulas for σε² for AR(1) and AR(2) processes.

Two special cases:

(i) For AR(1) with p = 1 there is only one Yule-Walker equation in (7.84), i.e.

φ̂1 = [γ̂(0)]^{−1}γ̂(1) = γ̂(1)/γ̂(0) = ρ̂(1),

as proposed earlier.

(ii) For AR(2) with p = 2, replacing all γ̂(k) by ρ̂(k), we have

(φ̂1, φ̂2)′ = [1 ρ̂(1); ρ̂(1) 1]^{−1} (ρ̂(1), ρ̂(2))′.

The solutions are

φ̂1 = ρ̂(1)(1 − ρ̂(2))/(1 − ρ̂²(1)) and φ̂2 = 1 − (1 − ρ̂(2))/(1 − ρ̂²(1)).

Example. From a time series with n = 400, x1, ..., x400, we have x̄ = 10.06, γ̂(0) = 1.6378, γ̂(1) = 0.9176 and γ̂(2) = 0.1969. Assume that the theoretical model is an AR(2) with unknown mean µ. Estimate and write down the model, and find a 95%-CI for µ.

Solution. We have ρ̂(1) = 0.9176/1.6378 = 0.5603 and ρ̂(2) = 0.1969/1.6378 = 0.1202. Hence

φ̂1 = ρ̂(1)(1 − ρ̂(2))/(1 − ρ̂²(1)) = 0.5603 · (1 − 0.1202)/(1 − 0.5603²) = 0.7185

and

φ̂2 = 1 − (1 − ρ̂(2))/(1 − ρ̂²(1)) = 1 − (1 − 0.1202)/(1 − 0.5603²) = −0.2824.

The estimated AR(2) model is

Xt − 10.06 = 0.7185(Xt−1 − 10.06) − 0.2824(Xt−2 − 10.06) + εt.

Now insert φ̂1 and φ̂2 into the formula

γ(0) = (1 − φ2)σε² / {(1 + φ2)[(1 − φ2)² − φ1²]}.

We obtain γ̂(0) = 1.5839σε², hence σ̂ε² = 1.6378/1.5839 = 1.034. Furthermore, we have

var(x̄) ≈ (σ̂ε²/400)(1 − φ̂1 − φ̂2)^{−2} = (1.034/400)(1 − 0.7185 + 0.2824)^{−2} = 0.0081.

Hence SDx̄ ≈ 0.09 and the approximate 95% CI for µ is

µ ∈ [10.06 − 2 · 0.09, 10.06 + 2 · 0.09] = [9.88, 10.24].

Remark. The estimation of an AR(p) model is not difficult, because it is related to a regression problem. The first p + 1 estimates of the autocovariances, γ̂(k), k = 0, 1, ..., p, (together with n) contain all the information we need to estimate an AR(p) model using the above proposal.
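A minimal sketch (not part of the original notes) of the Yule-Walker computation for the AR(2) example above; it recovers σ̂ε² from γ(0) − φ1γ(1) − φ2γ(2) = σε², which agrees with the value 1.034 obtained from the explicit AR(2) formula.

    # Yule-Walker estimation of an AR(2) from gamma_hat(0), (1), (2).
    import numpy as np

    n, xbar = 400, 10.06
    g = np.array([1.6378, 0.9176, 0.1969])       # gamma_hat(0), gamma_hat(1), gamma_hat(2)

    Gamma = np.array([[g[0], g[1]],
                      [g[1], g[0]]])
    gamma = g[1:]
    phi = np.linalg.solve(Gamma, gamma)          # about [0.7185, -0.2824]

    sigma2 = g[0] - phi @ gamma                  # gamma(0) - phi1*gamma(1) - phi2*gamma(2)
    var_xbar = sigma2 / n / (1 - phi.sum()) ** 2
    sd = var_xbar ** 0.5
    print(phi, round(sigma2, 3), (round(xbar - 2 * sd, 2), round(xbar + 2 * sd, 2)))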
7.3 Least Squares Estimators of an ARMA Model

In the following, only the basic ideas for estimating an ARMA(p, q) model are described briefly. Assume that E(Xt) = 0. For an AR(p) we have

εt = Xt − φ1Xt−1 − ... − φpXt−p.

The least squares estimators φ̂1, ..., φ̂p are obtained by minimising the squared sum

Q := Σ_{t=p+1}^{n} ε̂t² = Σ_{t=p+1}^{n} (Xt − φ̂1Xt−1 − ... − φ̂pXt−p)².

This convenience does not exist if the model has an MA part. Now we have

εt = Xt − φ1Xt−1 − ... − φpXt−p − ψ1εt−1 − ... − ψqεt−q,    (7.86)

which involves unobservable ε-values in the past. Hence, an approximate recursion is required to carry out a least squares estimation for a general ARMA model. For some trial parameters φ1, ..., φp and ψ1, ..., ψq, the ε̂t can be approximated in the following way.

(i) Calculate µ̂ = x̄ and define yt = xt − x̄, the centralised data.

(ii) Put ε̂t = 0 and yt = 0 for t ≤ 0, because we do not have information about them.

(iii) Set

ε̂1 = y1,
ε̂2 = y2 − φ1y1 − ψ1ε̂1,
...
ε̂t = yt − φ1yt−1 − ... − φpyt−p − ψ1ε̂t−1 − ... − ψqε̂t−q

for t > max(p, q).

The effect of this approximation is negligible for large n. We see that results in time series analysis are often obtained based on a large number of observations. Now we can define

Q(φ1, ..., φp; ψ1, ..., ψq) = Σ_{t=1}^{n} ε̂t²    (7.87)

and estimate φ1, ..., φp and ψ1, ..., ψq by minimising Q over all values of φ1, ..., φp and ψ1, ..., ψq within the causal stationary regions of these parameters. These are the approximate least squares estimators of an ARMA model. Note that an estimated ARMA model will always be causal stationary, because parameters outside the causal stationary regions are not considered. This is a restriction of the estimation method; it does not mean that the underlying model must certainly be a causal stationary ARMA model. We can test the hypothesis that the model is an ARMA model, but this is not a topic of our lecture.

Remark: Another commonly used method for estimating an ARMA model is the maximum-likelihood (ML) method, which will not be discussed in our lecture. When the distribution of Xt is normal, the least squares method and the ML method are asymptotically equivalent.

7.4 Selection of an ARMA(p, q) Model

Assume that the underlying process Xt is an ARMA(p0, q0) model with unknown (p0, q0). The question is how the unknown orders (p0, q0) can be estimated. This is the so-called model selection problem, which also occurs in other areas of statistics. Earlier it was indicated that the partial acf can be used for selecting an AR model. This method is, however, no longer commonly used in modern statistics. Nowadays, model selection is usually carried out following some information criterion, and the optimal orders p0 and q0 are then estimated (selected) from some reasonable range of values of p and q. Note that, for given (p, q), an ARMA(p, q) model is estimated by minimising the least squares criterion

Q = Σ ε̂t².    (7.88)

However, the criterion Q cannot be used directly for selecting p and q, because the larger p or q are, the smaller the minimised Q is. Hence, for model selection a penalty term for large p and q should be introduced. For given (p, q), denote the minimised value of Q by Q(p, q) and define

σ̂²_{p,q} = Q(p, q)/n.    (7.89)

Note that Q is the squared sum of the ε̂t; hence σ̂²_{p,q} defined in this way is an estimator of σε² by definition. Two well known information criteria proposed in the literature based on σ̂²_{p,q} are

(i) the AIC (Akaike's Information Criterion, Akaike, 1978),

AIC(p, q) = ln σ̂²_{p,q} + 2(p + q)/n,    (7.90)

(ii) the BIC (Bayesian Information Criterion, Akaike, 1977; Schwarz, 1978),

BIC(p, q) = ln σ̂²_{p,q} + ln(n)(p + q)/n.    (7.91)

The penalty term in the BIC is larger than that in the AIC. Hence, the orders selected following the BIC are not larger than those selected following the AIC. The main difference between these two criteria is that p̂ and q̂ selected following the BIC are consistent estimators of p0 and q0, i.e. they tend to p0 and q0, respectively, as n → ∞, whereas those selected following the AIC are not consistent. For this reason we prefer to use the BIC.
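The following is a minimal sketch (an illustration, not any package's exact routine) of the residual recursion (7.86)-(7.87) and of the BIC (7.91) for given trial parameters; the numerical minimisation of Q over (φ, ψ) is not shown.

    # Approximate residuals for an ARMA(p, q) with trial parameters phi, psi,
    # following steps (i)-(iii) above, and the BIC built from them.
    import numpy as np

    def approx_residuals(x, phi, psi):
        y = np.asarray(x, dtype=float) - np.mean(x)      # step (i): centralise
        p, q = len(phi), len(psi)
        eps = np.zeros(len(y))
        for t in range(len(y)):                          # steps (ii)-(iii)
            ar = sum(phi[i] * y[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
            ma = sum(psi[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
            eps[t] = y[t] - ar - ma
        return eps

    def bic(x, phi, psi):
        eps = approx_residuals(x, phi, psi)
        n = len(x)
        sigma2 = np.sum(eps ** 2) / n                    # (7.89), with Q from (7.87)
        return np.log(sigma2) + np.log(n) * (len(phi) + len(psi)) / n   # (7.91)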
Simulated examples: Three realisations are generated following the AR(4) model

Xt = 0.4Xt−1 + 0.2Xt−2 + 0.0Xt−3 − 0.3Xt−4 + εt

with iid N(0, 1) εt. Table 3.1 shows the estimated BIC's for p = 0, 1, ..., 8, where the p̂ with minimal BIC is indicated with a star.

Table 3.1. BIC's and p̂ for the three simulated time series

p      0      1      2      3      4       5      6      7      8
1. S.  0.446  0.208  0.223  0.224  0.137*  0.158  0.179  0.188  0.201
2. S.  0.233  0.115  0.119  0.133  0.091*  0.122  0.153  0.181  0.207
3. S.  0.548  0.272  0.272  0.293  0.183*  0.211  0.237  0.268  0.297

The estimated models obtained from these three time series using p̂ as the AR order are:

(i) Xt = 0.407Xt−1 + 0.254Xt−2 − 0.027Xt−3 − 0.323Xt−4 + εt
(ii) Xt = 0.326Xt−1 + 0.111Xt−2 − 0.029Xt−3 − 0.264Xt−4 + εt
(iii) Xt = 0.462Xt−1 + 0.142Xt−2 + 0.088Xt−3 − 0.358Xt−4 + εt.

7.5 Selection of d in an ARIMA(p, d, q) model

Assume now that the time series Xt follows an ARIMA(p, d, q) model with d ∈ {0, 1}. We then also have to determine whether Xt is integrated (d = 1) or not (d = 0). This can be solved in the following way.

(i) Set d = 0. Estimate ARMA models from the original data x1, ..., xn and select p̂0, q̂0 using the BIC. We obtain the minimised BIC in this case, denoted by BIC(0).

(ii) Set d = 1. Let yt = ∆xt = xt − xt−1 for t = 2, ..., n. Estimate ARMA models from y2, ..., yn and select p̂1, q̂1 using the BIC. We obtain the minimised BIC in this case, denoted by BIC(1).

(iii) Compare the two values. We select d = 0 if BIC(0) < BIC(1) and use the ARIMA(p̂0, 0, q̂0) (i.e. the ARMA(p̂0, q̂0)) model obtained in (i). Otherwise we select d = 1 and use the ARIMA(p̂1, 1, q̂1) model obtained in (ii).

8 Forecasting for Linear Processes

8.1 Point Forecasting for an ARMA Process

For simplicity, the data considered in this and the next subsection are assumed to be centralised with zero sample mean. Assume that we have fitted an ARMA model to a time series x1, ..., xn and now want to forecast a future observation Xn+k, k ≥ 1, based on the fitted model. This will be explained briefly for AR(p) models. The method introduced in this subsection is often called the Box-Jenkins method (Box and Jenkins, 1970). We will see again that, as with estimation, point forecasting is easier for AR(p) models than for MA or ARMA models.

If we have a prediction (the same as a forecast) X̂n+k, then the MSE of the prediction is

MSE(X̂n+k) = E[(X̂n+k − Xn+k)²].

We consider only the so-called linear predictions, which are linear functions of the observations in the past. Then the optimal linear prediction of Xn+k is given by the conditional expectation of Xn+k given X1, ..., Xn, i.e.

X̂n+k = E[Xn+k | Xn, ..., X1].    (8.92)

For an AR(p) process we have

Xn+k = φ1Xn+k−1 + ... + φpXn+k−p + εn+k.

The point prediction of Xn+k given the observations x1, ..., xn is now

x̂n+k = φ1(x̂n+k−1 | xn, ..., x1) + ... + φp(x̂n+k−p | xn, ..., x1),    (8.93)

where (x̂n+k−i | xn, ..., x1), i = 1, ..., p, is either the observation xn+k−i, if k − i ≤ 0, or the prediction x̂n+k−i obtained before, if k − i > 0.

The prediction procedure is as follows.

(i) The first-step optimal linear prediction (k = 1) is x̂n+1 = φ1xn + ... + φpxn+1−p.

(ii) The second-step optimal linear prediction (k = 2) is x̂n+2 = φ1x̂n+1 + φ2xn + ... + φpxn+2−p.

(iii) The k-step optimal linear prediction for k > p is x̂n+k = φ1x̂n+k−1 + ... + φpx̂n+k−p.

Remark. If an AR(p) model is given, then the point predictions depend only on the last p observations. This will be clarified using examples.

Remark. For large k, X̂n+k tends to zero, the true mean. This means that the given observations only have a clear effect on future observations over a short period.

Example: From a time series x1, ..., xn we obtained the following AR(1) model:

Xt = 0.7Xt−1 + εt,

where the εt are iid WN with mean zero. The last observation is xn = −1.2. Calculate the first 10 point predictions.

Solution: x̂n+1 = φ̂xn = 0.7 · (−1.2) = −0.84 and x̂n+k = φ̂x̂n+k−1 = φ̂^k xn for k ≥ 2. All predictions are listed in the following table.

The first 10 point predictions for the above AR(1) model with xn = −1.2

k       1       2       3       4       5       6       7       8       9       10
x̂n+k   -0.840  -0.588  -0.412  -0.288  -0.202  -0.141  -0.099  -0.069  -0.048  -0.034
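The recursion (8.93) is easy to code. The following is a minimal sketch (not part of the original notes); it reproduces the AR(1) table above and, with φ̂1 = 0.6, φ̂2 = 0.3 and the last two observations 1.1 and 0.5, the AR(2) table in the example that follows.

    # k-step point forecasts for an AR(p) model by the recursion (8.93).
    def ar_forecast(phi, last_obs, steps):
        # phi = [phi_1, ..., phi_p]; last_obs = [x_{n-p+1}, ..., x_n]
        hist = list(last_obs)
        preds = []
        for _ in range(steps):
            pred = sum(f * x for f, x in zip(phi, reversed(hist)))
            preds.append(pred)
            hist.append(pred)        # predictions replace unknown future values
        return preds

    print([round(v, 3) for v in ar_forecast([0.7], [-1.2], 10)])
    print([round(v, 3) for v in ar_forecast([0.6, 0.3], [1.1, 0.5], 10)])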
Example. From a time series x1, ..., xn we obtained the following AR(2) model:

Xt = 0.6Xt−1 + 0.3Xt−2 + εt,

where the εt are iid WN with mean zero. The last two observations are xn−1 = 1.1 and xn = 0.5. Calculate the first 10 point predictions.

Solution: x̂n+1 = φ̂1xn + φ̂2xn−1 = 0.6 · 0.5 + 0.3 · 1.1 = 0.630, x̂n+2 = φ̂1x̂n+1 + φ̂2xn = 0.6 · 0.630 + 0.3 · 0.5 = 0.528, and x̂n+k = φ̂1x̂n+k−1 + φ̂2x̂n+k−2 for k ≥ 3. All predictions are listed in the following table.

The first 10 point predictions for the above AR(2) model with xn−1 = 1.1, xn = 0.5

k       1      2      3      4      5      6      7      8      9      10
x̂n+k   0.630  0.528  0.506  0.462  0.429  0.396  0.366  0.338  0.313  0.289

The optimal linear prediction for ARMA(p, q) processes is more complex and will not be discussed in this lecture.

8.2 Interval Forecasting for Normal ARMA(p, q) Processes

Assume now that the process is normally distributed; then it is possible to calculate forecasting intervals for a future observation Xn+k. Recall that a causal stationary ARMA(p, q) process can be represented in MA(∞) form:

Xt = Σ_{i=0}^{∞} αi εt−i.

Note that the point prediction X̂n+k introduced in the last subsection is the conditional mean of the future observation Xn+k. It can be shown that the conditional variance of Xn+k given x1, ..., xn is

var(Xn+k | Xn, ..., X1) = σε² Σ_{i=0}^{k−1} αi².

Observing that α0 = 1 and var(Xn+k) = σε² Σ_{i=0}^{∞} αi², we have

(i) var(Xn+k | Xn, ..., X1) increases as k increases;
(ii) var(Xn+1 | Xn, ..., X1) = σε²;
(iii) σε² ≤ var(Xn+k | Xn, ..., X1) ≤ σε² Σ_{i=0}^{∞} αi² = var(Xn+k) for k ≥ 2;
(iv) lim_{k→∞} var(Xn+k | Xn, ..., X1) = σε² Σ_{i=0}^{∞} αi² = var(Xn+k).

The approximate 95% forecasting interval (FI) for Xn+k is

Xn+k ∈ [x̂n+k − 2σε √(Σ_{i=0}^{k−1} αi²), x̂n+k + 2σε √(Σ_{i=0}^{k−1} αi²)].

Remark. The FI here is for one observation, which is not the same as the CI for the sample mean x̄. Note that lim_{k→∞} X̂n+k = 0 = µX and lim_{k→∞} var(Xn+k | Xn, ..., X1) = var(Xn+k). This means that the larger k is, the less information about a future observation Xn+k is contained in the past observations.

Example. Figure 3.2 shows a realisation of length 150 following the AR(1) model Xt = 0.8Xt−1 + εt, where the εt are iid N(0, 1) random variables. Note that we now have αi = 0.8^i, i = 0, 1, ... The first 100 observations are shown as a solid line, whereas the last 50 (see the first row of the table below) are shown as points and are assumed to be unknown future values. The point predictions for n = 100 and k = 1, 2, ..., 50 and the 95%-forecasting intervals are shown as long- resp. short-dashed lines, where x100 = 2.966. The first 10 point predictions and the corresponding forecasting intervals are listed in the following table.

Figure 3.2: An example of interval forecasting by an AR(1) model.

The first 10 point predictions and the corresponding 95%-forecasting intervals

k                    1      2       3       4       5       6       7       8       9       10
xn+k (future obs.)   1.403  3.032   1.687   2.337   2.509   1.888   3.956   4.148   2.917   1.375
x̂n+k                2.373  1.898   1.519   1.215   0.972   0.778   0.622   0.498   0.398   0.318
lower bound          0.373  -0.663  -1.345  -1.826  -2.177  -2.439  -2.637  -2.788  -2.905  -2.996
upper bound          4.373  4.460   4.382   4.256   4.121   3.994   3.881   3.784   3.701   3.633
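A minimal sketch (not part of the original notes) of the 95% forecasting interval for this AR(1) example, using αi = φ^i, σε = 1 and x100 = 2.966; it reproduces the point predictions and bounds in the table above up to rounding.

    # 95% forecasting intervals for an AR(1) via its MA(infinity) weights.
    import numpy as np

    phi, sigma, xn, K = 0.8, 1.0, 2.966, 10
    for k in range(1, K + 1):
        pred = phi ** k * xn
        s = sigma * np.sqrt(np.sum(phi ** (2 * np.arange(k))))   # sqrt(sum of alpha_i^2, i < k)
        print(k, round(pred, 3), round(pred - 2 * s, 3), round(pred + 2 * s, 3))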
9 Vector Time Series

Literature: Brockwell, P. J. and Davis, R. A.: Introduction to Time Series and Forecasting, Springer, 2002.

Definition 9.1 A multivariate time series (or vector time series) is a sequence of m-dimensional random vectors

X(t) = (X1(t), ..., Xm(t))′, t ∈ T,

for a given index set T. Clearly, each Xi is a univariate time series.

Example 9.2 Let X1 be the inflation rate in the UK measured by the Retail Price Index (RPI), and X2 the inflation rate in the UK measured by the Consumer Price Index (CPI).

Figure: RPI and CPI inflation rates (UK), 1990-2010.
Figure: RPI and CPI index levels (UK), 1990-2010.

We can now define the second order properties of X:

• the mean of X(t):

µ(t) = (µ1(t), ..., µm(t))′ = (E[X1(t)], ..., E[Xm(t)])′ = E[X(t)].

Note that µ(t) is an m-dimensional vector.

• and the autocovariance matrix

Γ(t, s) = cov(X(t), X(s)) = E[(X(t) − µ(t))(X(s) − µ(s))′].

Γ(t, s) is a matrix of the form

Γ(t, s) = (γi,j(t, s))_{i,j=1,...,m} with γi,j(t, s) = cov(Xi(t), Xj(s))

for i, j = 1, ..., m. Therefore, the diagonal elements of Γ(t, s) are the autocovariance functions of the one-dimensional time series Xi.

9.1 Stationary Vector TS

As in the one-dimensional case we call X weakly stationary if

i) E(Xi(t)²) < ∞ for all i ∈ {1, ..., m} and t ∈ T,
ii) EX(t) = µ ∈ R^m for all t ∈ T, and
iii) Γ(r, s) = Γ(r + t, s + t) for all r, s, t ∈ T.

For a weakly stationary time series we use the notation

Γ(h) = Γ(t + h, t)

and call it the covariance matrix at lag h. γi,i(h) is the autocovariance function of the stationary TS Xi, and γi,j(h) is called the cross-covariance function of Xi and Xj.

We can also define the correlation matrix at lag h:

R(h) = (ρi,j(h))_{i,j=1,...,m} with ρi,j(h) = γi,j(h)/√(γi,i(0)γj,j(0)).

ρi,j(h) is the correlation between Xi(h) and Xj(0).

Properties of Γ and R:

(i) γi,j(h) ≠ γj,i(h) (in general)
(ii) Γ(h) = Γ′(−h) ≠ Γ(−h)
(iii) ρi,i(0) = 1
(iv) ρi,j(0) ≠ 1 (in general)

9.2 Multivariate ARMA Processes

We call a vector TS Z an m-dimensional White Noise process with covariance matrix Σ if Z is a sequence of i.i.d. random vectors with E[Z] = (0, ..., 0)′ ∈ R^m and Γ(0) = Σ. Note that some authors (for example Brockwell & Davis) call Z a white noise process if Z is stationary with zero mean and

Γ(h) = Σ for h = 0, and Γ(h) = 0 otherwise.

This would be a sequence of uncorrelated, identically distributed random vectors.

Similarly to the definition of a one-dimensional ARMA process, we can now define a multivariate ARMA process.

Definition 9.3 A multivariate Autoregressive Moving Average process of order (p, q) (ARMA(p, q)) is defined by

X(t) − Φ1X(t − 1) − ... − ΦpX(t − p) = Ψ1Z(t − 1) + ... + ΨqZ(t − q) + Z(t),

where Φ1, ..., Φp and Ψ1, ..., Ψq are m × m matrices.

One can now define causality and invertibility as for a one-dimensional time series; see for example Brockwell & Davis. Of particular importance, and widely used for modelling econometric data, are vector autoregressive processes VAR(p), that is, the case q = 0 (no moving average term).

9.3 VAR(p)

Every VAR(p) process can be written as a VAR(1) process. Consider the m-dimensional VAR(p) process

X(t) = Φ1X(t − 1) + ... + ΦpX(t − p) + Z(t).

Define

ξ(t) = (X(t)′, X(t − 1)′, ..., X(t − p + 1)′)′, V(t) = (Z(t)′, 0′, ..., 0′)′

and the block matrix

F = [ Φ1 Φ2 Φ3 · · · Φp−1 Φp ;
      I  0  0  · · · 0    0  ;
      0  I  0  · · · 0    0  ;
      ...              
      0  0  0  · · · I    0  ],

where I is the m × m identity matrix. We then obtain

ξ(t) = F ξ(t − 1) + V(t),

where ξ(t), V(t) ∈ R^{mp} and F is an mp × mp matrix.
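A minimal sketch (not part of the original notes) of the companion form: Φ1 is borrowed from Example 9.4 further below, while Φ2 and the numerical vectors are made up purely for illustration. The first block of ξ(t) = Fξ(t − 1) + V(t) reproduces the VAR(p) recursion.

    # Build the companion matrix F for an m-dimensional VAR(p) and check one step.
    import numpy as np

    m, p = 2, 2
    Phi = [np.array([[0.5, 0.2], [-0.5, 0.2]]),      # Phi_1 (as in Example 9.4)
           np.array([[0.1, 0.0], [0.0, 0.1]])]       # Phi_2 (made up for this sketch)

    F = np.zeros((m * p, m * p))
    F[:m, :] = np.hstack(Phi)                        # first block row: Phi_1, ..., Phi_p
    F[m:, :-m] = np.eye(m * (p - 1))                 # identity blocks below

    x_lags = [np.array([1.0, -0.5]), np.array([0.3, 0.2])]   # X(t-1), X(t-2), arbitrary
    z = np.array([0.1, -0.2])                                 # Z(t), arbitrary
    xi_prev = np.concatenate(x_lags)
    v = np.concatenate([z, np.zeros(m * (p - 1))])
    xi_new = F @ xi_prev + v
    x_direct = Phi[0] @ x_lags[0] + Phi[1] @ x_lags[1] + z
    print(np.allclose(xi_new[:m], x_direct))   # True: first block equals X(t)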
The covariance function Γ of a VAR(1) process can be found recursively, again as in the one-dimensional case. For

X(t) = ΦX(t − 1) + Z(t)

we post-multiply by X(t − h)′ and take expectations:

X(t)X(t − h)′ = ΦX(t − 1)X(t − h)′ + Z(t)X(t − h)′,
E[X(t)X(t − h)′] = ΦE[X(t − 1)X(t − h)′] + E[Z(t)X(t − h)′].

In particular,

Γ(0) = ΦΓ(−1) + E[Z(t)X(t)′] = ΦΓ(1)′ + E[Z(t)Z(t)′] = ΦΓ(1)′ + Σ,
Γ(1) = ΦΓ(0),
Γ(h) = ΦΓ(h − 1).

It follows that

Φ = Γ(1)Γ(0)^{−1},

if Γ(0)^{−1} exists. This relationship can be used for estimating the coefficient matrix Φ. Once Φ has been estimated, we can estimate Σ from

Γ(0) − ΦΓ(1)′ = Σ.

Example 9.4 Consider the following two-dimensional VAR(1) process:

X(t) = ΦX(t − 1) + Z(t)

with

(X1(t), X2(t))′ = [0.5 0.2; −0.5 0.2] (X1(t − 1), X2(t − 1))′ + (Z1(t), Z2(t))′,

i.e.

X1(t) = 0.5X1(t − 1) + 0.2X2(t − 1) + Z1(t),
X2(t) = −0.5X1(t − 1) + 0.2X2(t − 1) + Z2(t).
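A minimal simulation sketch (not part of the original notes, assuming iid standard normal Z(t)) of the moment-based estimation Φ̂ = Γ̂(1)Γ̂(0)^{−1} and Σ̂ = Γ̂(0) − Φ̂Γ̂(1)′ for the VAR(1) of Example 9.4:

    # Simulate the VAR(1) of Example 9.4 and estimate Phi and Sigma from
    # the sample lag-0 and lag-1 covariance matrices.
    import numpy as np

    rng = np.random.default_rng(3)
    Phi = np.array([[0.5, 0.2], [-0.5, 0.2]])
    n, burn = 2000, 200
    X = np.zeros((n + burn, 2))
    for t in range(1, n + burn):
        X[t] = Phi @ X[t - 1] + rng.standard_normal(2)
    X = X[burn:]

    Xc = X - X.mean(axis=0)
    G0 = Xc.T @ Xc / n                      # Gamma_hat(0)
    G1 = Xc[1:].T @ Xc[:-1] / n             # Gamma_hat(1) = E[X(t) X(t-1)']
    Phi_hat = G1 @ np.linalg.inv(G0)
    Sigma_hat = G0 - Phi_hat @ G1.T
    print(np.round(Phi_hat, 2))             # close to [[0.5, 0.2], [-0.5, 0.2]]
    print(np.round(Sigma_hat, 2))           # close to the identity matrix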
10 Introduction to ARCH and GARCH

The purpose of this chapter is to introduce the well known non-linear time series models ARCH (autoregressive conditional heteroscedastic) (Engle, 1982, joint winner of the Nobel Prize in Economics 2003) and GARCH (generalised ARCH, Bollerslev, 1986), which are very important for modelling financial time series.

10.1 Properties of Financial Time Series

Financial time series have some special properties, some of which are not consistent with linear models such as ARMA or ARIMA. In particular:

• Financial time series are generally integrated, i.e. they are not stationary but differencing stationary. Hence, we should study either the difference series or some other transformation of them. Commonly used transformations of a financial time series Xt are the difference series

Yt = ∆Xt,

the series of the log-returns

Yt = ∆(log Xt),

or that of the relative returns

Yt = ∆Xt/Xt−1.

• The conditional variance of a financial time series often depends on past values of X. This is usually not the case for the conditional mean.

• Financial returns are often uncorrelated but not independent. They often depend on each other in some non-linear way, e.g. the squared returns are often correlated with each other. Due to this fact, such processes are called non-linear processes.

• The conditional distribution of financial returns may be normal, but their unconditional distribution usually is not.

• Financial returns often have heavy tails, i.e. var(Yt) < ∞ but E(Yt⁴) = ∞.

Some of the above properties are illustrated in Figure 9, which shows the time series of the daily log-returns of the FTSE100 stock market index from Jan 1985 to Feb 2010. There are some clustered observations with large variances, which illustrate the so-called GARCH effect.

Figure 9: Daily returns of the FTSE100.

The acf's of the returns and of the squared returns (Figure 10) show that the former seem to be uncorrelated but the latter are clearly correlated with each other. This means, further, that the returns are not independent.

Figure 10: acf of the daily returns of the FTSE100, and acf of the squared daily returns.

10.2 The ARCH Models

In 1982 Robert Engle introduced the very important ARCH model for modelling the above special properties of financial time series.

To see why a new model class is needed, consider the conditional expectation and the conditional variance of an AR(p) process:

E[Xt | Xt−1, ..., Xt−p] = φ1Xt−1 + ... + φpXt−p

and

E[(Xt − E[Xt | Xt−1, ..., Xt−p])² | Xt−1, ..., Xt−p] = E[(Xt − φ1Xt−1 − ... − φpXt−p)² | Xt−1, ..., Xt−p] = E[εt²] = σε².

This means that the conditional variance does not depend on the past observations, but the conditional mean does. ARCH models have been introduced as an alternative. These models are about the error term εt; the particular form of the linear term is not important and is therefore ignored in the following definition.

An ARCH model of order p (ARCH(p)) is defined by

Zt = √ht εt with ht = α0 + α1Z²t−1 + ... + αpZ²t−p,

where εt is a white noise process with E[εt²] = 1, and α0 > 0, αi ≥ 0 for i = 1, ..., p. We often make the assumption that εt ∼ N(0, 1). Here ht > 0 is the conditional variance of Zt, which depends on some past observations; this is what conditional heteroscedasticity means. The larger the squared past observations are, the larger the conditional variance. Zt can now replace the white noise term in linear models or can be used directly to model a time series.

We obtain

E[Zt | Zt−1, ...] = E[√ht εt | Zt−1, ...] = E[√ht | Zt−1, ...] E[εt] = 0,

since εt is independent of ht. This is a common property of many financial return series (the so-called martingale property). Note that E[Zt] = 0 too. For the covariance function of Z we obtain

cov(Zt, Zt+k) = E[Zt Zt+k] = E[√ht εt √ht+k εt+k] = E[√ht εt √ht+k] E[εt+k] = 0,

since εt+k, for k > 0, is independent of the other three terms and has mean zero. This is another important property of financial returns: they are often uncorrelated with each other.

It is clear, however, that Zt and Zt+k are not independent, because the variance of Zt+k depends on the past of Z, and therefore on Zt. An ARCH model is hence an example of uncorrelated random variables which are not independent.
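A minimal simulation sketch (not part of the original notes, assuming εt ∼ N(0,1) and made-up parameter values) of an ARCH(1) process, illustrating that the Zt are essentially uncorrelated while the Z²t are clearly correlated:

    # Simulate an ARCH(1) process and compare the lag-1 sample autocorrelation
    # of Z_t (about zero) with that of Z_t^2 (clearly positive).
    import numpy as np

    rng = np.random.default_rng(4)
    n, a0, a1 = 5000, 0.5, 0.5
    z = np.zeros(n)
    for t in range(1, n):
        h = a0 + a1 * z[t - 1] ** 2          # conditional variance h_t
        z[t] = np.sqrt(h) * rng.standard_normal()

    def acf1(x):
        xc = x - x.mean()
        return np.sum(xc[1:] * xc[:-1]) / np.sum(xc * xc)

    print("acf(1) of Z:   ", round(acf1(z), 3))       # approximately 0
    print("acf(1) of Z^2: ", round(acf1(z ** 2), 3))  # clearly positive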
The process is called a non-linear process because there is no linear correlation between the observations, but there are dependencies between them of a non-linear kind; in particular, Z²t and Z²t+k are correlated. Furthermore, note that the conditional distribution of an ARCH model is normal if εt ∼ N(0, 1). Hence the conditional moments of any order exist. However, this does not mean that the unconditional variance of an ARCH model exists.

If var(Zt) < ∞, we have var(Zt) = E(Z²t). Assume further that the process started in the infinite past. It can be shown that an ARCH process with finite variance is weakly (and strongly) stationary. We then obtain

var(Zt) = E[Z²t] = E[ht ε²t] = E[α0 + α1Z²t−1 + ... + αpZ²t−p] = α0 + α1 var(Zt) + ... + αp var(Zt),

since Z is stationary. Solving this equation for var(Zt) gives

var(Zt) = α0 / (1 − α1 − ... − αp).    (10.94)

We obtain:

0 < var(Zt) < ∞ ⟺ Σ_{i=1}^{p} αi < 1.

If the variance exists, its size depends linearly on α0, which is hence called the scale parameter. Finally, an ARCH process may have finite variance but infinite fourth order moments, which is again an important property of financial returns.

10.3 The GARCH Models

ARCH models were generalised by Bollerslev (1986) to the GARCH(p, q) model, defined by

Zt = εt √ht with εt ∼ N(0, 1) i.i.d. and

ht = α0 + α1Z²t−1 + ... + αpZ²t−p + β1ht−1 + ... + βqht−q,    (10.95)

where α0 > 0, αi ≥ 0 for i = 1, ..., p, and βj ≥ 0 for j = 1, ..., q.

A GARCH model has similar properties to an ARCH model; the key point is that a GARCH model is more practically relevant. For GARCH models, the variance of Zt depends on Z²t−1 and on the variance of Zt−1, while it only depends on Z²t−1 if Z follows an ARCH model. Figure 11 shows the acf of the squared residuals of a GARCH(1,1) model fitted to the daily returns of the FTSE100.

Figure 11: acf of the squared residuals of a GARCH(1,1) model fitted to the FTSE100 returns.
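A minimal simulation sketch of a GARCH(1,1) process (again assuming εt ∼ N(0,1), with made-up parameters): the recursion for ht now also involves ht−1, which produces the pronounced volatility clustering seen in financial returns. By the same argument as for (10.94), the unconditional variance of a GARCH(1,1) is α0/(1 − α1 − β1), which the simulation should roughly reproduce.

    # Simulate a GARCH(1,1) process following (10.95).
    import numpy as np

    rng = np.random.default_rng(5)
    n, a0, a1, b1 = 20000, 0.05, 0.1, 0.85
    z = np.zeros(n)
    h = np.full(n, a0 / (1 - a1 - b1))       # start at the unconditional variance
    for t in range(1, n):
        h[t] = a0 + a1 * z[t - 1] ** 2 + b1 * h[t - 1]
        z[t] = np.sqrt(h[t]) * rng.standard_normal()

    print("sample variance of Z:      ", round(z.var(), 3))
    print("alpha0/(1 - alpha1 - beta1):", round(a0 / (1 - a1 - b1), 3))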