ST4064 Time Series Analysis
Lecture notes

Outline

I Introduction to time series analysis

II Stationarity and ARMA modelling
1. Stationarity
a. Definitions
b. Strict stationarity
c. Weak stationarity
2. Autocovariance, autocorrelation and partial autocorrelation
a. Autocovariance
b. Autocorrelation
c. Partial autocorrelation
d. Estimation of the ACF and PACF
3. ARMA modelling
a. AR models
b. MA models
c. ARMA models
4. Backward shift operator and difference operator
5. AR(p) models, stationarity and the Yule-Walker equations
a. The AR(1) model
b. The AR(p) model and stationarity
c. Yule-Walker equations
6. MA(q) models and invertibility
a. The MA(1) model
b. The MA(q) model and invertibility
7. ARMA(p,q) models
8. ARIMA(p,d,q) models
a. Non-ARMA processes
b. The I(d) notation
9. The Markov property

III Non-stationarity: trends and techniques
1. Typical trends
2. Least squares trend removal
3. Differencing
a. Linear trend removal
b. Selection of d
4. Seasonal differencing
5. Method of moving averages
6. Seasonal means
7. Filtering, smoothing
8. Transformations

IV Box-Jenkins methodology
1. Overview
2. Model selection
a. Identification of white noise
b. Identification of MA(q)
c. Identification of AR(p)
3. Model fitting
a. Fitting an ARMA(p,q) model
b. Parameter estimation: LS and ML
c. Parameter estimation: method of moments
d. Diagnostic checking

V Forecasting
1. The Box-Jenkins approach
2. Forecasting ARIMA processes
3. Exponential smoothing and Holt-Winters
4. Filtering

VI Multivariate time series analysis
1. Principal component analysis and dimension reduction
2. Vector AR processes
3. Cointegration
4. Other common models
a. Bilinear models
b. Threshold AR models
c. Random coefficient AR models
5. ARCH and GARCH
a. ARCH
b. GARCH

I. Introduction to time series analysis

A time series is a stochastic process in discrete time with a continuous state space S = (−∞, ∞). Notation: {X1, X2, ..., Xn} denotes a time series process, whereas {x1, x2, ..., xn} denotes a univariate time series, i.e. a sequence of realisations of the time series process.

[Diagram: the random variables X1, X2, ..., Xn, Xn+1 on a time axis, with state space S = (−∞, ∞); the values x1, x2, ..., xn have been observed up to time n, and the value at time n+1 is still unknown.]

I.1 Purposes of time series analysis

• Describe the observed time series data:
- mean, variance, correlation structure, ...
- e.g. correlation coefficient between sales 1 month apart, 2 months apart, etc.
→ Autocorrelation Function (ACF)
→ Partial Autocorrelation Function (PACF)
• Construct a model which fits the data
→ From the class of ARMA models, select the model which best fits the data, based on the ACF and PACF of the observed time series
→ Apply the Box-Jenkins methodology:
o Identify a tentative model
o Estimate the model parameters
o Diagnostic checks: does the model fit?
• Forecast future values of the time series process
→ easy, once a model has been fitted to past data

All ARMA models are stationary. If an observed time series is non-stationary (e.g. it shows an upward trend), it must first be converted to a stationary time series (e.g. by differencing).

I.2 Other forms of analysis

Another important approach to the analysis of time series relies on the spectral density function, which is derived from the autocovariance function of the time series model. This approach is not covered in this course.

II. Stationarity and ARMA modelling

II.1 Stationarity
a. Definition

A stochastic process is (strictly) stationary if its statistical properties remain unchanged over time:

Joint distribution of Xt1, Xt2, ..., Xtn = joint distribution of Xk+t1, Xk+t2, ..., Xk+tn, for all k and for all n.

Example: joint distribution of X5, X6, ..., X10 = joint distribution of X120, X121, ..., X125
→ holds for any 'chunk' of variables
→ and for any 'shift' of the start

Implications of (strict) stationarity

Take n = 1:
• Distribution of Xt = distribution of Xt+k for any integer k
  Xt discrete: P(Xt = i) = P(Xt+k = i) for any k
  Xt continuous: f(Xt) = f(Xt+k) for any k
  In particular, E(Xt) = E(Xt+k) and Var(Xt) = Var(Xt+k) for any k
• A stationary process has constant mean and variance
• The variables Xt in a stationary process must be identically distributed (but not necessarily independent)

Take n = 2:
• Joint distribution of (Xs, Xt) = joint distribution of (Xs+k, Xt+k), for all lags (t − s) and for all integers k
• In particular, Cov(Xs, Xt) = Cov(Xs+k, Xt+k), where Cov(Xs, Xt) = E[(Xs − E(Xs))(Xt − E(Xt))]
• Thus Cov(Xs, Xt) depends only on the lag (t − s) and not on the time s

b. Strict stationarity

• Very stringent requirement
• Hard to prove that a process is stationary
• To show a process is not stationary, show that one condition does not hold

Examples:
Simple random walk: the {Xt} are not identically distributed → NOT stationary
White noise process: {Zt} i.i.d. → trivially stationary

c. Weak stationarity

• This requires only that E(Xt) is constant AND Cov(Xs, Xt) depends only on (t − s)
• Since Var(Xt) = Cov(Xt, Xt), this implies that Var(Xt) is constant
• Weak stationarity does not imply strict stationarity
• For weak stationarity, Cov(Xt, Xt+k) is constant with respect to t for all lags k
• Here (and often), stationary is shorthand for weakly stationary

Question: show that if the joint distribution of the Xt's is multivariate normal, then weak stationarity implies strict stationarity.
Solution: if X ~ N(μ, Σ) then the distribution of X is completely determined by μ and Σ (a property of the multivariate normal distribution). If these do not depend on t, neither does the distribution of X.

Example: Xt = sin(ωt + U), with U ~ U[0, 2π]. Then E(Xt) = 0, and Cov(Xt, Xt+k) = cos(ωk) E(sin²(U)), which does not depend on t → Xt is weakly stationary.

Question: note that if we know X0, then we can work out U, since X0 = sin(U). We then know all the values of Xt = sin(ωt + U) → Xt is completely determined by X0 (so this process is not purely indeterministic).

Definition: X is purely indeterministic if the values of X1, ..., Xn become progressively less useful at predicting XN as N → ∞.

Here "stationary time series" means a weakly stationary, purely indeterministic process.

II.2 Autocovariance, autocorrelation and partial autocorrelation

a. Autocovariance function

• For a stationary process, E(Xt) = μt = μ, for any t
• We define
  γk = Cov(Xt, Xt+k) = E(Xt Xt+k) − E(Xt) E(Xt+k)
  the "autocovariance at lag k"
• This function does not depend on t
• Autocovariance function of X: {γ0, γ1, γ2, ...} = {γk : k ≥ 0}
• Note: γ0 = Var(Xt)

Question: review the properties of covariance; they are needed when calculating autocovariances for specified models.

b. Autocorrelation function (ACF)

• Recall that corr(X, Y) = Cov(X, Y) / (σX σY)
• For a stationary process, we define ρk = corr(Xt, Xt+k) = γk/γ0, the "autocorrelation at lag k".
(This is the usual correlation coefficient, since Var(Xt) = Var(Xt+k) = γ0.)

• Autocorrelation function (ACF) of X: {ρ0, ρ1, ρ2, ...} = {ρk : k ≥ 0}
• Note: ρ0 = 1
• For a purely indeterministic process, we expect ρk → 0 as k → ∞ (i.e. values far apart will not be correlated)
• Recall (ST3053): a sequence of i.i.d. random variables {Zt} is called a white noise process and is trivially stationary.

Example: {et} is a zero-mean white noise process if
  E(et) = 0 for any t, and
  γk = Cov(et, et+k) = σ² if k = 0, and 0 otherwise.

• Note: the variables et have zero mean, variance σ² and are uncorrelated
• A sequence of i.i.d. variables with zero mean is a white noise process according to this definition. In particular, Zt independent with Zt ~ N(0, σ²) is a white noise process.
• Result: γk = γ−k and ρk = ρ−k
• Correlogram = plot of the ACF {ρk : k ≥ 0} as a function of the lag k. It is widely used as it tells a lot about the time series.

c. Partial autocorrelation function (PACF)

Let r(x, y | z) = corr(x, y | z) denote the partial correlation coefficient between x and y, adjusted for z (or with z held constant).

• Denote
  φ2 = corr(Xt, Xt+2 | Xt+1)
  φ3 = corr(Xt, Xt+3 | Xt+1, Xt+2)
  ...
  φk = corr(Xt, Xt+k | Xt+1, ..., Xt+k-1) = partial autocorrelation coefficient at lag k
• Partial autocorrelation function (PACF): {φ1, φ2, ...} = {φk : k ≥ 1}
• The φk's are related to the ρk's: φ1 = corr(Xt, Xt+1) = ρ1

Recall that
  r(x, y | z) = [r(x, y) − r(x, z) r(y, z)] / [√(1 − r²(x, z)) √(1 − r²(y, z))]

Applying this here, using x = Xt, y = Xt+2, z = Xt+1, so that φ2 = corr(Xt, Xt+2 | Xt+1) = r(x, y | z), with r(x, z) = r(y, z) = ρ1 and r(x, y) = ρ2, yields
  φ2 = (ρ2 − ρ1²) / (1 − ρ1²)

d. Estimation of the ACF and PACF

We assume that the sequence of observations {x1, x2, ..., xn} comes from a stationary time series process. The following functions are central to the analysis of time series:
  {γk}  Autocovariance function
  {ρk}  Autocorrelation function (ACF)
  {φk}  Partial autocorrelation function (PACF)
  f(ω)  Spectral density function

To find a model to fit the sequence {x1, x2, ..., xn}, we must be able to estimate the ACF of the process of which the data is a realisation. Since the model underlying the data is assumed to be stationary, its mean can be estimated using the sample mean
  μ̂ = (1/n) Σ_{t=1}^{n} xt

The autocovariance function γk can be estimated using the sample autocovariance function
  γ̂k = (1/n) Σ_{t=k+1}^{n} (xt − μ̂)(xt-k − μ̂)

from which are derived the estimates rk of the autocorrelations ρk:
  rk = γ̂k / γ̂0

The collection {rk : k ∈ Z} is called the sample autocorrelation function (SACF). The plot of rk against k is called a correlogram.

Recall that the partial autocorrelation coefficients φk are calculated as follows:
  φ1 = ρ1
  φ2 = det[1, ρ1; ρ1, ρ2] / det[1, ρ1; ρ1, 1] = (ρ2 − ρ1²) / (1 − ρ1²)

In general, φk is given as a ratio of determinants involving ρ1, ρ2, ..., ρk.

The sample partial autocorrelation coefficients are given by these formulae, but with the ρk replaced by their estimates rk:
  φ̂1 = r1
  φ̂2 = (r2 − r1²) / (1 − r1²),  etc.

The collection {φ̂k} is called the sample partial autocorrelation function (SPACF). The plot of {φ̂k} against k is called the partial correlogram.

[Diagram: bar plots of the sample ACF rk and the sample PACF φ̂k against lag k.]

These are the main tools in identifying a model for a stationary time series.
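As an illustration, here is a minimal Python/numpy sketch of these estimators; the helper name sample_acf and the simulated white noise series are illustrative choices only, not prescribed by the method.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocovariances (1/n convention) and sample autocorrelations r_k."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mu_hat = x.mean()
    gamma_hat = np.array([np.sum((x[k:] - mu_hat) * (x[:n - k] - mu_hat)) / n
                          for k in range(max_lag + 1)])
    return gamma_hat, gamma_hat / gamma_hat[0]

# Illustration on simulated white noise: every r_k with k >= 1 should be close to 0.
rng = np.random.default_rng(0)
gamma_hat, r = sample_acf(rng.normal(size=500), max_lag=10)

# First two sample partial autocorrelations, using the formulae above.
phi1_hat = r[1]
phi2_hat = (r[2] - r[1] ** 2) / (1 - r[1] ** 2)
print(np.round(r[1:4], 3), round(phi1_hat, 3), round(phi2_hat, 3))
```

For lags k ≥ 3 the same idea extends by evaluating the corresponding determinant ratios (or, in practice, by using a statistical package).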
II.3 ARMA modelling

Autoregressive moving average (ARMA) models constitute the main class of linear models for time series. More specifically:
• Autoregressive (AR)
• Moving Average (MA)
• Autoregressive Moving Average (ARMA)
• Autoregressive Integrated Moving Average (ARIMA)
→ The last type is non-stationary; the others are stationary.

a. AR models

• Recall: a Markov chain is a process such that the conditional distribution of Xn+1, given Xn, Xn-1, ..., X0, depends only on Xn, i.e. "the future depends on the present, but not on the past".
• The simplest type of autoregressive model, AR(1), has this property:
  Xt = φ Xt-1 + et,  where et is zero-mean white noise.
• For AR(1), we prove that φ2 = corr(Xt, Xt-2 | Xt-1) = 0.
• Similarly, φk = 0 for k > 2; thus the PACF of an AR(1) cuts off after lag 1.
• A more general form of the AR(1) model is
  Xt = μ + φ (Xt-1 − μ) + et
  where μ = E(Xt) is the process mean.
• Autoregressive process of order p, AR(p):
  Xt = μ + φ1 (Xt-1 − μ) + φ2 (Xt-2 − μ) + ... + φp (Xt-p − μ) + et

b. MA models

A realisation of a white noise process is very 'jagged', since successive observations are realisations of independent variables. Most time series observed in practice have a smoother plot than a realisation of a white noise process. Taking a "moving average" is a standard way of smoothing an observed time series:
  Observed data: x1, x2, x3, x4, ...
  Moving average: (1/3)(x1 + x2 + x3), (1/3)(x2 + x3 + x4), ...

• A moving average process is "smoothed white noise".
• The simplest type of moving average (MA) process is
  Xt = μ + et + θ et-1
  where et is zero-mean white noise.
• The et's are uncorrelated, but the Xt's are not (Xt-1 and Xt both involve et-1).
• For MA(1) we prove that ρ2 = corr(Xt, Xt-2) = 0.
• Similarly, ρk = 0 for k > 2; thus the ACF of an MA(1) cuts off after lag 1.
• Moving average process of order q, MA(q):
  Xt = μ + et + θ1 et-1 + ... + θq et-q

c. ARMA models

ARMA processes combine AR and MA parts:
  Xt = μ + φ1 (Xt-1 − μ) + ... + φp (Xt-p − μ) + et + θ1 et-1 + ... + θq et-q

Note: ARMA(p,0) = AR(p) and ARMA(0,q) = MA(q).

II.4 Backward shift operator and difference operator

The following operators will be useful:
• Backward shift operator B: B Xt = Xt-1, and Bμ = μ
• Difference operator: Δ = 1 − B, hence ΔXt = Xt − Xt-1

  B²Xt = B(B Xt) = B Xt-1 = Xt-2
  Δ²Xt = ΔXt − ΔXt-1
       = Xt − Xt-1 − (Xt-1 − Xt-2)
       = Xt − 2Xt-1 + Xt-2
       = (1 − B)²Xt = (1 − 2B + B²)Xt

II.5 AR(p) models, stationarity and the Yule-Walker equations

a. The AR(1) model

• Recall Xt = μ + φ (Xt-1 − μ) + et
• Substituting in for Xt-1, then for Xt-2, and so on:
  Xt = μ + φ[φ (Xt-2 − μ) + et-1] + et = μ + φ²(Xt-2 − μ) + et + φ et-1
  ...
  Xt = μ + φ^t (X0 − μ) + et + φ et-1 + ... + φ^(t-1) e1
     = μ + φ^t (X0 − μ) + Σ_{j=0}^{t-1} φ^j et-j
• Note: X0 is a random variable.
• Since E(et) = 0 for any t,
  μt = E(Xt) = μ + φ^t (μ0 − μ)
• Since the et's are uncorrelated with each other and with X0,
  Var(Xt) = Var( μ + φ^t (X0 − μ) + Σ_{j=0}^{t-1} φ^j et-j )
          = φ^(2t) Var(X0) + Σ_{j=0}^{t-1} φ^(2j) σ²
          = φ^(2t) Var(X0) + σ² (1 − φ^(2t)) / (1 − φ²)

Question: when will the AR(1) process be stationary?
Answer: this requires constant mean and variance.
  If μ0 = μ, then μt = μ + φ^t (μ0 − μ) = μ.
  If Var(X0) = σ² / (1 − φ²), then
  Var(Xt) = φ^(2t) σ²/(1 − φ²) + σ² (1 − φ^(2t))/(1 − φ²) = σ²/(1 − φ²).
Neither μt nor Var(Xt) then depends on t.
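Before the further condition |φ| < 1 is imposed below, the following simulation sketch illustrates the calculation just made: if X0 is drawn from N(μ, σ²/(1 − φ²)), the mean and variance of the process stay constant in t. The parameter values (φ = 0.7, σ = 1) and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
phi, mu, sigma, T, reps = 0.7, 0.0, 1.0, 50, 20000

# Start each replication from the stationary initial condition
# X0 ~ N(mu, sigma^2 / (1 - phi^2)).
x = rng.normal(mu, sigma / np.sqrt(1 - phi ** 2), size=reps)
means, variances = [x.mean()], [x.var()]
for t in range(1, T + 1):
    x = mu + phi * (x - mu) + rng.normal(0.0, sigma, size=reps)
    means.append(x.mean())
    variances.append(x.var())

# Both stay near 0 and sigma^2/(1 - phi^2) = 1.96 for every t,
# in agreement with mu_t = mu and Var(X_t) = sigma^2/(1 - phi^2).
print(np.round(means[:5], 2), np.round(variances[:5], 2))
```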
We also require that |,| < 1 so that the AR(1) process be stationary, in which case µt = µ + " t (µ0 % µ ) AND Var ( X t ) % • !2 !2 $ 2t # = " Var ( X ) % & ' 0 1%" 2 1%" 2 ) ( If |,| < 1, both terms will decay away to zero for large t ! X is almost stationary for large t • Equivalently, if we assume that the process has already been running for a very long time, it will be stationary • Any AR(1) process with infinite history and |,| < 1 will be stationary: ... --2, --1, -0, -1, ... -t ... X-2, X-1, X0, X1, ... Xt Steady State reached • Observed time series An AR(1) process can be represented as: # X t = µ + %! j" t $ j j =0 and this converges only if |,| < 1. • The AR(1) model Xt = µ + , (Xt-1 – µ) + -t can be written as (1 – ,B)(Xt – µ) = -t If |,| < 1, then (1 – ,B) is invertible and Xt – µ = (1 – ,B)-1-t = (1 + ,B + ,2B2 + ...) -t = -t + ,-t-1 + ,2-t-2 + ..., • So # X t = µ + %! j " t $ j j =0 From this representation, µt = E(Xt) = µ and Var(Xt) = # $! j =0 "2 = 2j !2 1#" 2 if |,| < 1. ST4064 Time Series Analysis • So, if |,| < 1, the mean and variance are constant, as required for stationarity • We must calculate the autocovariance 'x = Cov(Xt , Xt+k) and show that this depends only on the lag k. We need properties of covariance: Cov(X+Y,W) = Cov(X,W) + Cov (Y,W) Cov(X,e) = 0 From the following diagram ... -t-2, -t-1, -t (uncorrelated) and ... -t-2, -t-1, -t Xt Xt-1 we can tell that -t and Xt-1 are uncorrelated, hence Cov(-t, Xt-1) = 0 Cov(-t, Xt-k) = 0, Cov(-t, Xt) = )2 k(1 '1 = Cov(Xt, Xt-1) = Cov(µ + ,(Xt-1 – µ) + -t, Xt-1) = , Cov(Xt-1, Xt-1) + Cov(-t, Xt-1) = ,'0 + 0 '2 = Cov(Xt, Xt-2) = Cov(µ + ,(Xt-1 – µ) + -t, Xt-2) = , Cov(Xt-1, Xt-2) + Cov(-t, Xt-2) = , '1 + 0 = ,2'0 Similarly, 'k = ,k '0, k ( 0 In general, 'k = Cov(Xt,Xt-k) = Cov(µ + ,(Xt-1 – µ) + -t,Xt-k) = ,Cov(Xt-1, Xt-k) + Cov(-t,Xt-k) = , 'k-1 + 0 Hence, 2 'k = ,k'0 = ,k ! 1#" 2 and 16 for k ( 0 *k = 'k/'0 = ,k for k ( 0 ! ACF decreases geometrically with k ST4064 Time Series Analysis Recall the partial autocorrelations +1 and +2 satisfy !2 # !12 "1 = !1 and "2 = 1 # !12 Here +1 = *1 = , and ! #! =0 1#! 2 "2 = In fact, 2 2 #k = 0 for k > 1 In summary, for the AR(1) model, • ACF “tails off” to zero • PACF “cuts off” after lag 1 Example: Consumer price index Qt rt = ln(Qt/Qt-1) models the force of inflation Assume rt is an AR(1) process: rt= µ + ,(rt-1-µ) + et Note: Here µ is the long-run mean rt - µ = ,(rt-1- µ), ignoring et If |,| < 1, then rt – µ % 0 and so rt % µ as t % &. In this case rt is said to be mean-reverting. b. The AR(P) model and stationarity Recall that the AR(p) model can be written either in its generic form Xt = µ + ,1(Xt-1 – µ ) + ,2(Xt-2 – µ) + ... + ,p(Xt-p – µ) + et or using the B operator as (1 – ,1 B – ,2 B2 – ,3 ... – ,pBp) (Xt – µ) = et Result: AR(p) is stationary IFF the roots of the characteristic equation 1 – ,1z – ,2z2 – ... – ,pzp = 0 are all greater than 1 in absolute value. 17 ST4064 Time Series Analysis 18 1 – ,1z – ,2z 2 – ... – ,pzp # Characteristic Polynomial Explanation for this result: write the AR(p) process in the form ! B "! B" ! B" 1 # 1 # ... 1 # %% ( X t # µ ) = et $ %$ % $$ z z z & 1 '& 2 ' & p ' where z1 ...zp are roots of the characteristic polynomial: ! z "! z " ! z " 1 # %$1 # % ... $1 # % $ 1 – ,1z ... – ,pz = & z1 '& z 2 ' $ z p % & ' p In the AR(1) case, 1 – ,z = 1 – z/z1, where z1 = 1/, In AR(1) case, we can invert the term ! B" #1- $ % z1 & in ! B" $1- % ( X t # µ ) = et & z1 ' IFF |z1| > 1. In the AR(p) case, we need to be able to invert all of the factors ! 
B" #1- $ % zi & This will be the case IFF |zi| > 1 for i = 1,2, ..., p. Example : AR(2) Xt = 5 – 2(Xt-1 - 5) + 3(Xt-2 – 5) + et ! or (1 + 2B -3B2)(Xt – 5) = et 1 + 2z -3z2 = 0 is the characteristic equation here Question: when is an AR(1) process stationary ? Answer: we have Xt = µ + , (Xt-1 – µ) + et. i.e. (1 – ,B)(Xt – µ) = et, so 1 – ,z = 0 is the characteristic equation with solution z = 1/,. So |,| < 1 is equivalent to |z| > 1, as required. Question: Consider the AR(2) process Xn = Xn-1 – / Xn-2 + en. Is it stationary ? ST4064 Time Series Analysis Answer: Use B-operator: (1 – B + / B2)Xn = en. So characteristic equation is 1 – z + / z2 = 0, with roots 1 ± i and |1± i| = !2 > 1 Since both roots satisfy |zi| > 1, the process is stationary. In the AR(1) model, we had '1 = ,'0 and '0 = )2. These are a particular case of the Yule-Walker Equations for AR(p): Cov( X t , X t #k ) = Cov( µ + !1 ( X t #1 # µ ) + ... + ! p ( X t # p # µ ) + et , X t #k ) $" 2 , if k=0 = !1Cov( X t #1 , X t #k ) + ... + ! pCov( X t # p , X t #k ) + % & 0, otherwise c. Yule-Walker equations The Yule-Walker equations are defined by the following relationship: %! 2 , if k=0 " k = #1" k $1 + # 2" k $2 + ... + # p" k $ p + ' , for 0 & k & p ( 0, otherwise Considering the AR(1) (i.e. p = 1), for k = 1, we get '1 = ,'0, and for k = 0, we get '0 = )2. Example (p=3): '3 = ,1'2 + ,2'1 + ,3'0 '2 = ,1'1 + ,2'0 + ,3'1 '1 = ,1'0 + ,2'1 + ,3'2 '0 = ,1'1 + ,2'2 + ,3'3 + )2 Example: consider the AR(3) model Xt = 0.6Xt-1 + 0.4Xt-2 – 0.1Xt-3 + et Yule-Walker Equations: '0 = 0.6'1 + 0.4'2 – 0.1'3 + )2 '1 = 0.6'0 + 0.4'1 – 0.1'2 '2 = 0.6'1 + 0.4'0 – 0.1'1 '3 = 0.6'2 + 0.4'1 – 0.1'0 (0) (1) (2) (3) From (1), '2 = 6'0 – 6'1 From (2), '2 = 0.4'0 + 0.56'1, hence '1 = From (3), '3 = 56 54 '0, and hence '2 = '0. 65 65 483 '0 650 From (0), )2 = 0.22508'0 Hence, '0 = 4.4429)2, '1=3.8278)2, '2=3.6910)2, '3=3.3014)2 19 ST4064 Time Series Analysis 20 and so, since *k = 'k/'0, *0 = 1, *1 = 0.862, *2 = 0.831, *3 = 0.743. It may be shown that for AR(p) models, • ACF “tails off” to zero, • PACF “cuts off” after lag p, i.e. #k = 0 for k > p II.6 MA(q) models and invertibility a. The MA(1) model The model is given by Xt = µ + et + .et-1, where µt = E(Xt) = µ, and '0 = Var(et + .et-1) = (1 + .2))2 '1 = Cov(et + .et-1, et-1+.et-2)=.)2 'k = 0 for k > 1 Hence, the ACF for MA(1) is: *0 = 1 *1 = . / (1+.2) *k = 0 for k > 1 Since the mean E(Xt) and covariance 'k = E(Xt, Xt-k) do not depend on t, the MA(1) process is (weakly) stationary - for all values of the parameter &. However, we require MA models to be invertible and this imposes conditions on the parameters. Recall: If |,| < 1 then in the AR(1) model (1 – ,B)(Xt – µ) = et, (1-,B) is invertible and # X t = µ " $! j et" j = µ + et + ! et "1 + ! 2et "2 + ... j =0 i.e. an AR(1) process is MA(&). An MA(1) process can be written as Xt – µ = (1 + .B)et or (1 + .B)-1(Xt – µ) = et i.e. ST4064 Time Series Analysis 21 Xt-µ – .(Xt-1 – µ) + .2(Xt-2 – µ) + ... = et So an MA(1) process is represented as an AR(&) one – but only if |.| < 1, in which case the MA(1) process is invertible. Example: MA(1) with & = 0.5 or & = 2 For both values of . we have: *1 = ./(1+.2) = 0.5 2 = = 0.4, 2 1 + (0.5) 1 + 22 So both models have the same ACF. However, only the model with .=0.5 is invertible. Question: Interpretation of invertibility Consider the MA(1) model Xn – µ – .en-1. We have en = Xn – µ – .en-1 = Xn – µ – .(Xn-1 – µ – .en-2) = ... = Xn – µ – .(Xn-1 – µ) + .2(Xn-2 – µ) ... 
+ (-.)n-1(X1 – µ) + (-.)ne0 As n gets large, the dependence of en on e0 will be small if .| < 1. Note: AR(1) is stationary IFF |$| < 1. MA(1) is invertible IFF |%| < 1. For an MA(1) process, we have *k = 0 for k > 1, so for an MA(1) process, the ACF “cuts off” after lag1. It may be shown that PACF “tails off” to zero. AR(1) MA(1) ACF Tails off to zero Cuts off after lag 1 PACF Cuts off after lag 1 Tails off to zero b. The MA(q) model and invertibility An MA(q) process is modeled by Xt = µ + et + .1et-1 + ... + .qet-q, where {et} is a sequence of uncorrelated realisations. For this model we have 'k = Cov(Xt, Xt-k) = 0 for k > q. 'k = Cov(Xt, Xt-k) = E[(et + .1et-1 + ... + .qet-q) ( et-k + .1et-k-1 + ... + .qet-k-q)] q = q ## ! ! E(e i i =0 j =0 j e t -i t " j " k ) [where &0 = 1] ST4064 = )2 Time Series Analysis 22 q "k #! j +k !j , [since j = i-k ' q-k] j =0 since the only non-zero terms occur when the subscripts of et-i and et-j-k match, i.e. when i = j+k, for k ' q. In summary, for k > q, !k = 0: • For MA(q), ACF cuts off after lag q • For AR(p), PACF cuts off after lag p Question: ACF of the MA(2) process Xn = 1 + en – 5en-1 + 6en-2. '0 = Cov(1 + en – 5en-1 + 6en-2, 1 + en – 5en-1 + 6en-2) = (1 + 25 + 36) 1 = 62 If E(en) = 0 and Var(en) = 1. '1 = Cov (1 + en – 5en-1 + 6en-2, 1 + en-1 – 5en-2 + 6en-3) = (-5)(1) + (6)(-5) = -35 '2 = Cov (1 + en – 5en-1 + 6en-2, 1 + en-2 – 5en-3 + 6en-4) = (6)(1) = 6 'k = 0, k > 2 Recall that an AR(p) process is stationary IFF roots z of the characteristic eq satisfy |z| > 1. For an MA(q) process , we have Xt – µ = (1 + .1B + .2B2 + ... + .pBp) et Consider the equation 1 + .1z + .2x2 + ... + .pzp = 0. The MA(q) process is invertible IFF all roots z of this equation satisfy |z| > 1. In summary: • If AR(p) stationary, then AR(p) = MA(&) • If MA(q) is invertible, then MA(q) = AR(&) Question: Assess invertibility of the MA(2) process Xt = 2 + et – 5et-1 + 6et-2. We have Xt = 2 + (1-5B +6B2)et. The characteristic equation is 1 – 5z + 6z2 = 0 with roots (1-2z)(1-3z) = 0, i.e. roots z = / and z = 1/3 ! Not invertible II.7 ARMA(p,q) models Recall that the ARMA(p,q) model can be written either in its generic form ST4064 Time Series Analysis 23 Xt = µ + ,1(Xt-1 – µ) + ... ,p(Xt-p –µ) + et + .1et-1 + ...+ .qet-q or using the B operator: (1 – ,1B ... –,pBp) (Xt – µ) = (1 + .1B ... + .qBq)et i.e. 0(B)(Xt – µ) = 1(B)et where 0 (2) = 1 – ,12 - ... - ,p2p 1 (2) = 1 + .12 + ... + .p2q If 0 (2) and 1 (2) have factors in common, we simplify the defining relation. Consider the simple ARMA(1,1) process with . = -,, written either Xt = ,Xt-1 + et – ,et-1 or (1 – ,B)Xt = (1 – ,B)et , with |,| < 1 Dividing through by (1 – ,B), we obtain Xt = et. Therefore the process is actually an ARMA(0,0), also called white noise. We assume that 0(2) and 1(2) have no common factors. Properties of ARMA(p,q) are a mixture of those of AR(p) and those of MA(q). • Characteristic polynomial of ARMA(p,q) = 1 – ,1z ... – ,pzp (as for AR(p)) • ARMA(p,q) is stationary IFF all the roots z of 1 – ,1z ... – ,pzp = 0 satisfy |z| > 1 • ARMA(p,q) is invertible IFF all the roots z of 1 – .1z ... – .pzq = 0 satisfy |z| > 1 Example: the ARMA(1,1) process Xt = ,Xt-1 + et + .et-1 is stationary if |,| < 1 and invertible if |.| < 1. Example: ACF of ARMA(1,1). For the model given by Xt = ,Xt-1 + et + .et-1 we have Cov(et, Xt-1) =0 Cov(et, et-1) =0 Cov(et, Xt) = , Cov(et,Xt-1) + Cov(et,et) + . Cov(et,et-1) = )2 Cov(et-1, Xt) = , Cov(et-1,Xt-1) + Cov(et-1,et) + . Cov(et-1,et-1) = , )2 + 0 + . )2 = (, + .) 
)2 ST4064 Time Series Analysis '0 = Cov(Xt,Xt) = , Cov(Xt,Xt-1) + Cov(Xt,et) + . Cov(Xt,et-1) = ,'1 + )2 + . (,+.) )2 = ,'1 + (1 + ,. + .2) )2 '1 = Cov(Xt-1,Xt) = , Cov(Xt-1,Xt-1) + Cov(Xt-1,et) + .Cov(Xt-1,et-1) = ,'0 + .)2 For k > 1, 'k = Cov(Xt-1,Xt) = , Cov(Xt-k,Xt-1) + Cov(Xt-k,et) + . Cov(Xt-k,et-1) = , 'k-1 (Analogues of Yule-Walker Equations) ! Solve for '0 and '1: 1 + 2!" + " 2 2 '0 = # 1-! 2 '1 = (!+")(1 + !") 2 # 1-! 2 'k = ,k-1 '1, for k > 1 !1 (1+"#)("+#) , *k = ,k-1*1, for k > 1 (compare *k = ,k, for k ( 0 for AR(1)). = 2 !0 1+2"#+# For (stationary) ARMA(p,q), Hence • ACF tails off to zero • PACF tails off to zero Question: ARMA(2,2) process 12 Xt = 10Xt-1 – 2Xt-2 + 12et – 11et-1 + 2et-2 24 ST4064 Time Series Analysis 25 (12 – 10B +2B2)Xt = (12 – 11B +2B2)et The roots of 12 – 10z + 2z2 = 2(z – 2)(z –3) = 0 Are z = 2 and z = 3, |z| > 1 for both roots; process stationary. II.8 ARIMA(p,d,q) models a. Non-ARMA processes • Given time series data X1 ... Xn, find a model for this data. • Calculate sample statistics: sample mean, sample ACF, sample PACF. • Compare with known ACF/PACF of class of ARMA models to select suitable model. • All ARMA models considered are stationary – so can only be used for stationary time series data. • If time-series data is non-stationary, transform it to a stationary time series (e.g. by differencing) • Model this transformed series using an ARMA model • Take the “inverse transform” of this model as model for the original non-stationary time series. Example: Random Walk X0 = 0, Xn = Xn-1 + Zn, where Zn is a white noise process. Xn is non-stationary, but Xn = Xn – Xn-1 = Zn is stationary. Question: Given X0, X1 ...Xn the first order differences are wi = xn – xi-1 , i = 1, ... , N From the differences w1, w2, ..., wN and x0 we can calculate the original time series: w1 = x1 – x0 , so x1 = x0 + w1 w2 = x2 – x1 , so x2 = x1 + w2 = x0 + w1 + w2, etc. The inverse process of differencing is integration, since we must sum the differences to obtain the original time series. b. The I(d) notation (“integrated of order d”) • X is said to be I(0) if X is stationary ST4064 Time Series Analysis • X is said to be I(1) if X is not stationary but Yt = Xt – Xt-1 is stationary • X is said to be I(2) if X is not stationary, but Y is I(1). 26 Thus X is I(d) if X must be “differenced” d times to make it stationary. Example: If the first differences xn = xn – xn-1 of x1, x2 ... xn are modelled by an AR(1) model (stationary) !Xn = 0.5 !Xn-1 + en, Then, Xn – Xn-1 = 0.5(Xn-1 – Xn-2) + en, so Xn = 1.5Xn-1 – 0.5Xn-2 +en is the model for the original time series. This AR(2) model is non-stationary since written as (1 – 1.5B + 0.5B2)Xn = en, for which the characteristic equation is: 1 – 1.5z + 0.5z2 = 0 with roots z = 1 and z = 2. The model is non-stationary since |z| > 1 does not hold for BOTH roots. X is ARIMA(p,1,q) if X is non-stationary, but !X (the first difference of X) is a stationary ARMA(p,q) process • Recall that a process X is I(1) if X is non-stationary, but !X = Xt – Xt-1 is stationary Note: If Xt is ARIMA(p,1,q) then Xt is I(1). Example: Random Walk. Xt – Xt-1 = et, where et is a white noise process. We have t Xt = X0 + !e j j=1 So E(Xt) = E(X0), if E(et) = 0, but Var(Xt) = Var(X0) + t)2. Hence Xt is non-stationary, but !Xt = et, where et is a stationary white noise process. Example: Zt = closing share price on day t. Here the model is given by Zt = Zt-1 exp(µ + et) Let Yt = ln Zt , then Yt = µ + Yt-1 + et . This is a random walk with drift. 
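Before turning to the daily returns, here is a short simulation sketch of the random walk just described (the drift μ and noise level are chosen for illustration only): the variance of Xt grows roughly linearly in t, so the walk is not stationary, while its first difference is white noise with mean μ.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, T, reps = 0.05, 1.0, 200, 10000

# reps independent random walks with drift: X_t = X_0 + sum of N(mu, sigma^2) steps.
steps = rng.normal(mu, sigma, size=(reps, T))
x = np.cumsum(steps, axis=1)

# Var(X_t) grows roughly like t * sigma^2, so X is not stationary ...
print(np.round(x[:, [9, 49, 199]].var(axis=0), 1))      # roughly 10, 50, 200

# ... but the first difference X_t - X_{t-1} is white noise with mean mu.
dx = np.diff(x, axis=1)
print(round(dx.mean(), 3), round(dx.var(), 3))           # roughly 0.05 and 1.0
```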
Now consider the daily returns Yt − Yt-1 = ln(Zt/Zt-1). Since Yt − Yt-1 = μ + et and the et's are independent, Yt − Yt-1 is independent of Y1, ..., Yt-1; equivalently, ln(Zt/Zt-1) is independent of the past prices Z0, Z1, ..., Zt-1.

Example: recall the example where Qt = consumer price index at time t. We have that rt = ln(Qt/Qt-1) follows the AR(1) model
  rt = μ + φ (rt-1 − μ) + et
  ln(Qt/Qt-1) = μ + φ (ln(Qt-1/Qt-2) − μ) + et
  Δln(Qt) = μ + φ (Δln(Qt-1) − μ) + et
thus Δln(Qt) is AR(1), and so ln(Qt) is ARIMA(1,1,0).

If X needs to be differenced at least d times to reduce it to stationarity,
• and Y = Δ^d X is a stationary ARMA(p,q) process,
• then X is an ARIMA(p,d,q) process.
An ARIMA(p,d,q) process is I(d).

Example: identify as ARIMA(p,d,q) the following model:
  Xt = 0.6Xt-1 + 0.3Xt-2 + 0.1Xt-3 + et − 0.25et-1
  (1 − 0.6B − 0.3B² − 0.1B³) Xt = (1 − 0.25B) et
Check for a factor (1 − B) on the LHS:
  (1 − B)(1 + 0.4B + 0.1B²) Xt = (1 − 0.25B) et
→ The model is ARIMA(2,1,1).
Characteristic equation of the ARMA part: 1 + 0.4z + 0.1z² = 0, with roots −2 ± i√6.
Since |z| = √10 > 1 for both roots, ΔXt is stationary, as required.

Alternative method: write the model in terms of ΔXt = Xt − Xt-1, ΔXt-1, etc.:
  Xt − Xt-1 = −0.4Xt-1 + 0.4Xt-2 − 0.1Xt-2 + 0.1Xt-3 + et − 0.25et-1
  ΔXt = −0.4 ΔXt-1 − 0.1 ΔXt-2 + et − 0.25et-1
Hence ΔXt is ARMA(2,1) (check for stationarity as above), and so Xt is ARIMA(2,1,1).

Note: if Δ^d Xt is ARMA(1,q), then to check for stationarity we only need to check that |φ1| < 1.

II.9 The Markov property

AR(1) model: Xt = μ + φ (Xt-1 − μ) + et. The conditional distribution of Xn+1, given Xn, Xn-1, ..., X0, depends only on Xn.
→ AR(1) has the Markov property.

AR(2) model: Xt = μ + φ1 (Xt-1 − μ) + φ2 (Xt-2 − μ) + et. The conditional distribution of Xn+1, given Xn, Xn-1, ..., X0, depends on Xn-1 as well as on Xn.
→ AR(2) does not have the Markov property.

Consider now Xn+1 = μ + φ1 Xn + φ2 Xn-1 + en+1, or in vector form

  ( Xn+1 )   ( μ )   ( φ1  φ2 ) ( Xn   )   ( en+1 )
  ( Xn   ) = ( 0 ) + ( 1    0 ) ( Xn-1 ) + ( 0    )

Define Yn = (Xn, Xn-1)ᵀ; then Yn+1 = (μ, 0)ᵀ + A Yn + (en+1, 0)ᵀ, with A the 2 × 2 matrix above.

• Y is said to be a vector autoregressive process of order 1.
• Notation: VAR(1).
• Y has the Markov property.

In general, AR(p) does not have the Markov property for p > 1, but Y = (Xt, Xt-1, ..., Xt-p+1)ᵀ does.

• Recall: the random walk, ARIMA(0,1,0), defined by Xt − Xt-1 = et, has independent increments and hence does have the Markov property.
• It may be shown that for p + d > 1, an ARIMA(p,d,0) process does not have the Markov property, but Yt = (Xt, Xt-1, ..., Xt-p-d+1)ᵀ does.

Consider the MA(1) process Xt = μ + et + θ et-1. It is clear that "knowing Xn will never be enough to deduce the value of en, on which the distribution of Xn+1 depends". Hence an MA(1) process does not have the Markov property.

Now consider an MA(q) = AR(∞) process. It is known that AR(p) processes have the Markov property when Y = (Xt, Xt-1, ..., Xt-p+1)ᵀ is considered as a p-dimensional vector process (p finite). It follows that an MA(q) process has no finite-dimensional Markov representation.

Question: associate a vector-valued Markov process with
  2Xt = 5Xt-1 − 4Xt-2 + Xt-3 + et
We have
  2(Xt − Xt-1) = 3(Xt-1 − Xt-2) − (Xt-2 − Xt-3) + et
  2ΔXt = 3ΔXt-1 − ΔXt-2 + et
  2Δ²Xt = Δ²Xt-1 + et,  i.e.  Δ²Xt = 0.5 Δ²Xt-1 + 0.5 et
→ ARIMA(1,2,0), i.e. ARIMA(p,d,q) with p = 1 and d = 2.
Since p + d = 3 > 1, Yt = (Xt, Xt-1, ..., Xt-p-d+1)ᵀ = (Xt, Xt-1, Xt-2)ᵀ is Markov.
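As a quick numerical cross-check of the characteristic-equation arguments above, the roots can be computed directly. This is only an illustrative sketch; the helper name ar_is_stationary is mine, not part of the notes.

```python
import numpy as np

# Characteristic polynomial 1 + 0.4 z + 0.1 z^2 from the ARIMA(2,1,1) example.
# np.roots takes the coefficients ordered from the highest power downwards.
roots = np.roots([0.1, 0.4, 1.0])
print(roots, np.abs(roots))      # -2 +/- i*sqrt(6), both of modulus sqrt(10) > 1

def ar_is_stationary(phi):
    """Check whether all roots of 1 - phi_1 z - ... - phi_p z^p lie outside the unit circle."""
    coeffs = np.r_[-np.asarray(phi, dtype=float)[::-1], 1.0]   # highest power first
    return bool(np.all(np.abs(np.roots(coeffs)) > 1.0))

# AR(2) example of section II.5: X_n = X_{n-1} - 0.5 X_{n-2} + e_n  (roots 1 +/- i).
print(ar_is_stationary([1.0, -0.5]))                            # True
```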
Question: consider the MA(1) process Xn = en + en-1, where en = +1 with probability 1/2 and −1 with probability 1/2. Then
  P(Xn = 2 | Xn-1 = 0) = P(en = 1, en-1 = 1 | en-1 + en-2 = 0)
   = P(en = 1) P(en-1 = 1 | en-1 + en-2 = 0)
   = (1/2)(1/2) = 1/4
  P(Xn = 2 | Xn-1 = 0, Xn-2 = 2) = P(en = 1, en-1 = 1 | en-1 + en-2 = 0, en-2 + en-3 = 2) = 0
(the second probability is 0 because Xn-2 = 2 forces en-2 = 1, and then Xn-1 = 0 forces en-1 = −1).
→ Not Markov: since the two probabilities differ, the conditional distribution of Xn given the past does not depend on the immediate past Xn-1 only.

III. Non-stationarity: trends and techniques

III.1 Typical trends

Possible causes of non-stationarity in a time series are:
• a deterministic trend (e.g. linear or exponential growth)
• a deterministic cycle (e.g. seasonal effects)
• the time series is an integrated process (I(d) with d ≥ 1)

Example: Xn = Xn-1 + Zn, where Zn = +1 with probability 0.6 and −1 with probability 0.4.
Here Xn is I(1), since Zn = Xn − Xn-1 is stationary. Also, E(Xn) = E(Xn-1) + 0.2, so the process has a deterministic trend.

Many techniques allow us to detect non-stationary series; among the simplest methods are:
• a plot of the time series against t
• the sample ACF

The sample ACF is an estimate of the theoretical ACF, based on the sample data (it was defined in section II.2.d). A plot of the time series will highlight a trend in the data and will show up any cyclic variation.

[Diagram: sketches of time series exhibiting a trend, a seasonal pattern (e.g. over 2003-2004), and trend plus seasonality.]

Recall: for a stationary time series, ρk → 0 as k → ∞, i.e. the (theoretical) ACF converges toward zero. Hence the sample ACF should also converge toward zero. If the sample ACF decreases slowly, the time series is non-stationary and needs to be differenced before fitting a model. If the sample ACF exhibits periodic oscillation, there is probably a seasonal pattern in the data. This should be removed before fitting a model (see Figures 7.3a and 7.3b).

[Diagram: a sample ACF decaying slowly (non-stationary series) and a sample ACF oscillating with period 12 (seasonal series).]

Figure 7.3a shows the number of hotel rooms occupied over several years (1963-1976). Inspection shows the clear seasonal dependence, manifested as a cyclic effect. Figure 7.3b shows the sample autocorrelation function for these data: the seasonal effect shows up as a cycle in this function. In particular, the period of the cycle looks to be 12 months, reinforcing the idea that it is a seasonal effect.

[Figure 7.3a: seasonal variation in hotel room occupancy, 1963-1976; Figure 7.3b: its sample ACF.]

Methods for removing a linear trend:
• Least squares
• Differencing

Methods for removing a seasonal effect:
• Seasonal differencing
• Method of moving averages
• Method of seasonal means

III.2 Least squares trend removal

Fit a model Xt = a + bt + Yt, where Yt is a zero-mean, stationary process.
Recall: the et are the error variables ("true residuals") in a regression model; assume et ~ IN(0, σ²).
• Estimate the parameters a and b using linear regression.
• Fit a stationary model to the residuals
  ŷt = xt − (â + b̂ t)

Note: least squares may also be used to remove nonlinear trends from a time series. Any observed nonlinear trend can be modelled by a deterministic term m(t) in Xt = m(t) + Yt, which can be estimated using least squares. For example, a plot of hourly energy-load data over a one-day time frame may indicate quadratic variation over the day; in this case one could use m(t) = a + bt².

III.3 Differencing

a. Differencing and linear trend removal

Use differencing if the sample ACF decreases slowly.
If there is a linear trend, e.g. xt = a + bt + yt, then "xt = xt ! xt !1 = b + " yt , so differencing has removed the linear trend. If xt is I(d), then differencing xt d times will make it stationary. Differencing xt once will remove any linear trend, as above. ST4064 Time Series Analysis 33 Suppose xt is I(1) with a linear trend. If we difference xt once, then !xt is stationary and we have removed the trend. However, if we remove the trend using linear regression we will still be left with an I(1) process that is non-stationary. Example: +1, prob. 0.6 Xn = Xn-1 + Zn, where Zn = -1, prob. 0.4 Let X0 = 0. Then E(X1) = 0.2, since E(Z1) = 0.2, and E(X2) = 0.2(2) E(Xn) = 0.2(n). Then Xn is I(1) AND Xn has a linear trend. Let Yn = Xn – 0.2(n). Then E(Yn) = 0, so we have removed the linear trend but Yn – Yn-1 = Xn – Xn-1 -0.2 = Zn – 0.2 Hence Yn is a random walk (which is non-stationary) and !Yn is stationary, so Yn is an I(1) process. b. Selection of d How many times (d) do we have to difference the time series Xt to convert it to stationarity? This will determine the parameter d in the fitted ARIMA(p,d,q) model. Recall the three causes of non-stationarity: • • • Trend Cycle Time series is an integrated series We are assuming that linear trends and cycles have been removed, so if the plot of the time series and its SACF indicate non-stationarity, it could be that the time series is a realisation of an integrated process and so must be differenced a number of times to achieve stationarity. Choosing an appropriate value of d: • Look at the SACF. If the SACF decays slowly to zero, this indicates a need for differencing (for a stationary ARMA model, the SACF decays rapidly to zero). • Look at the sample variance of the original time series X and its difference. Let !ˆ 2 be the sample variance of z ( d ) =! d x . It is normally the case that !ˆ 2 first decreases with d until stationarity is reached, and then starts to increase, since differencing too much introduces correlation. ST4064 Time Series Analysis 34 Take d equal to the value that minimises !ˆ 2 . !ˆ 2 5 5 5 5 0 5 5 5 5 1 2 3 d In the above example, take d=2, which is the value for which the estimated variance is minimised. III.4 Seasonal differencing Example: Let X be the monthly average temperature in London. Suppose that the model xt = µ + 4t + yt applies, where 4t is a periodic function with period 12 and yt is stationary. The seasonal difference of X is defined as: ( "12 x )t = xt – xt !12 But: xt – xt-12 = (µ + 4t + yt) – (µ + 4t-12 + yt-12) = yt – yt-12 since 4t = 4t-12. Hence xt – xt-12 is a stationary process. We can model xt – xt-12 as a stationary process and thus get a model for xt. Example: In the UK, monthly inflation figures are obtained by seasonal differencing of the retail prices index (RPI). If xt is the value of RPI for month t, then annual inflation figure for month t is x t - x t-12 !100% x t-12 Remark 1: the number of seasonal differences taken is denoted by D. For example, for the seasonal differencing X t ! X t !12 = "12 X t we have D=1. Remark 2: in practice, for most time series we would need at most d=1 and D=1. III.5 Method of moving averages This method makes use of a simple linear filter to eliminate the effects of periodic variation. If X is a time series with seasonal effects with even period d = 2h, we define a smoothed process Y by yt = 1 !1 1 " # xt -h + xt -h +1 + ... + xt -1 + xt + ... + xt + h -1 + xt + h $ 2h % 2 2 & This ensures that each period makes equal contribution to yt. 
Example with quarterly data: A yearly period will have d = 4 = 2h, so h = 2, and ST4064 Time Series Analysis 35 yt = 3 ( / xt-2 + xt-1 + xt + xt+1 + / xt+2) This is a centred moving average, since the average is taken symmetrically around the time t. Such an average can only be calculated retrospectively. For odd periods d = 2h + 1, the end terms xt-h and xt+h need not be halved: yt = 1 ( x t-h +x t-h+1 +...+x t-1 +x t +...+x t+h-1 + x t+h ) 2h + 1 Example: with data every 4 months, a yearly period will have d = 3 = 2h+1, so h = 1 and yt = 1/3 (xt-1 + xt + xt+1) III.6 Seasonal means In fitting the seasonal model xt = µ + 4t + yt with E(Yt)=0 (additive model) to a monthly time series, x extending over 10 years from January 1990, the estimate of µ is x (the average over all 120 observations) and the estimate of 4January is 1 !ˆ January = (x1 +x13 +...+x109 )-x , 10 the difference between the average value for January, and the overall average over all the months. Recall that 4t is a periodic function with period 12 and yt is stationary. Thus, 4t contains the deviation of the model (from the overall mean µ) at time t due to the seasonal effect. Month/Year January . . . December 1 x1 . . . x12 2 x13 . . . x24 .... ... ... 10 x109 . . . x120 mean !ˆ 1 !ˆ12 overall mean x III.7 Filtering, smoothing Filtering and exponential smoothing techniques are commonly applied to time series in order to “clean” the original series from undesired artifacts. The moving average is an example of a filtering technique. Other filters may be applied depending on the nature of the input series. ST4064 Time Series Analysis 36 Exponential smoothing is another common set of techniques. It is used typically to “simplify” the input time series by dampening its variations so as to retain in priority the underlying dynamics. III.8 Transformations Recall: In the simple linear model yi = .0 + .ixi + ei where ei ~ IN (0,)2), we use regression diagnostic plots of the residuals, eˆi , to test the assumptions about the model (e.g. the normality of the error variables ei or the constant variance of the error variables ei). To test the later assumption we plot the residuals against the fitted values. eˆi x 0 x x x x x x x x x xx x x x x x x x x x ŷi If the plot does not appear as above, the data is transformed, and the most common transformation is the logarithmic transformation. Similarly, if after fitting an ARMA model to a time series xt, a plot of the “residuals” versus the “fitted values” indicates a dependence, then we should consider modelling a transformation of the time series xt and the most common transformation is the logarithmic Transformation Yt = ln(Xt) ST4064 IV. Time Series Analysis 37 Box-Jenkins methodology IV.1 Overview We consider how to fit an ARIMA(p,d,q) model to historical data {x1, x2, ...xn}. We assume that trends and seasonal effects have been removed from the data. The methodology developed by Box and Jenkins consists in 3 distinct steps: Tentative identification of an ARIMA model Estimation of the parameters of the identified model Diagnostic checks • • • If the tentatively identified model passes the diagnostic tests, it can be used for forecasting. If it does not, the diagnostic tests should indicate how the model should be modified, and a new cycle of Identification Estimation Diagnostic checks • • • is performed. IV.2 Model selection a. 
Identification of white noise

Recall: in the simple linear regression model yi = β0 + β1 xi + ei, with ei ~ IN(0, σ²), we use regression diagnostic plots of the residuals êi to test the goodness of fit of the model, i.e. whether the assumptions ei ~ IN(0, σ²) are justified. The error variables ei form a zero-mean white noise process: they are uncorrelated, with common variance σ².

Recall: {et : t ∈ Z} is a zero-mean white noise process if
  E(et) = 0 for all t, and
  γk = Cov(et, et-k) = σ² if k = 0, and 0 otherwise.

Thus, apart from ρ0 = 1, the ACF and PACF of a white noise process are identically zero: ρk = 0 for k = 1, 2, ... and φk = 0 for k = 1, 2, ...

[Diagram: ACF and PACF of white noise plotted against k, showing a single spike of height 1 at lag 0 in the ACF and no spikes at lags 1, 2, 3, ...]

Question: how do we test whether the residuals from a time series model look like a realisation of a white noise process?
Answer: we look at the SACF and SPACF of the residuals.

In studying the SACF and SPACF, we realise that even if the original process was white noise, we would not expect rk = 0 for k = 1, 2, ... and φ̂k = 0 for k = 1, 2, ..., since rk is only an estimate of ρk and φ̂k is only an estimate of φk.

Question: how close to 0 should rk and φ̂k be if ρk = 0 and φk = 0 for k = 1, 2, ...?
Answer: if the original model is white noise, Xt = μ + et, then for each k the SACF and SPACF satisfy (for large samples, i.e. for large values of n)
  rk ~ N(0, 1/n)  and  φ̂k ~ N(0, 1/n)

Values of rk or φ̂k outside the range (−2/√n, +2/√n) can be taken as suggesting that a white noise model is inappropriate. However, these are only approximate 95% confidence intervals: if ρk = 0, we can be 95% certain that rk lies between these limits. This means that 1 value in 20 will lie outside these limits even if the white noise model is correct. Hence a single value of rk or φ̂k outside these limits would not be regarded as significant on its own, but three such values might well be significant.

There is an overall goodness-of-fit test, based on all the rk's in the SACF rather than on individual rk's, called the portmanteau test of Ljung and Box. It consists in checking whether the first m sample autocorrelation coefficients of the residuals are too large to resemble those of a white noise process (for which they should all be negligible). The Q-statistic is

  Q = n(n + 2) Σ_{k=1}^{m} rk² / (n − k)

Under the null hypothesis that all the autocorrelations ρk are zero, Q is asymptotically χ²-distributed. When the test is applied to the residuals of an estimated ARMA(p,q) model, the number of degrees of freedom is s = m − p − q (or s = m − p − q − 1 if a constant, say μ, is also estimated); when it is applied to an observed series directly, the number of degrees of freedom is m. If the Q-statistic is found to be greater than the 95th percentile of that χ² distribution, the null hypothesis is rejected, which means that the alternative hypothesis that "at least one autocorrelation is non-zero" is accepted. Statistical packages print these statistics.

For large n, the Ljung-Box Q-statistic tends to closely approximate the Box-Pierce statistic:
p!q k=1 Remark: the above Ljung-Box Q-statistic was first suggested to improve upon the simpler Box-Pierce test statistic m Q = n! rk2 k =1 which was found to perform poorly even for moderately large sample sizes. b. Identification of MA(q) Recall: for an MA(q) process, #k = 0 for all k > q, i.e. the “ACF cuts off after lag q”. To test if an MA(q) model is appropriate, we see if rk is close to 0 for all k > q. If the data do come from an MA(q) model, then for k > q (since the first q+1 coefficients are significant), q " 1" %% rk ~ N $$ 0, $$1+ ! 2 !i2 '''' && # n # i=1 and 95% of the rk’s should lie in the interval q q " # 1$ 1$ 2% 2% ' &1.96 )1 + 2/ !i * , +1.96 )1 + 2/ !i * ( n+ n+ i =1 i =1 ', , (. (note that it is common to use 2 instead of 1.96 in the above formula). We would expect 1 in 20 values to lie outside the interval. In practise, the #i’s are replaced by ri’s. The “confidence limits” on SACF plots are based on this. If rk lies outside these limits it is “significantly different from zero” and we conclude that #k $ 0. Otherwise, rk is not significantly different to zero and we conclude that #k = 0. SACF --- rk 1 --- --- --- 2 --- k --- ST4064 Time Series Analysis 40 For q=0, the limits for k=1 are ! 1.96 1.96 " , $# % n n' & as for testing for white noise model. Coefficient r1 is compared with these limits. For q = 1, the limits for k = 2 are ! " 1 1 2 2 $ #1.96 (1 + 2r1 ),1.96 (1 + 2r1 ) % n n & ' and r2 is compared with these limits. Again, 2 is often used in place of 1.96. c. Identification of AR(p) Recall: for an AR(p) process, we have !k = 0 for all k > p, i.e. the “PACF cuts off after lag p”. To test if an AR(p) model is appropriate, we see if the sample estimate of !k is close to 0 for all k > p. If the data do come from an AR(p) model, then for k > p, ! 1$ !ˆk ~ N # 0, & " n% and 95% of the sample estimates should lie in the interval ! 2 2 " , $# % & n n' The “confidence limits” on SPACF plots are based on this: if the sample estimate of !k lies outside these limits, it is “significant”. 0.4 0.2 -0.2 0.0 SPACF 0.6 0.8 Sample PACF of AR(1) 5 10 Lag k 15 ST4064 Time Series Analysis 41 IV.3 Model fitting a. Fitting an ARMA(p,q) model We make the following assumptions: • An appropriate value of d has been found and {zd+1, zd+2, ... zn} is stationary. • Sample mean z = 0; if not, subtract µˆ = z from each zi. • For simplicity, we assume that d = 0 (to simplify upper and lower limits of sums). We look for an ARMA(p,q) model for the data z: • If the SACF appears to cut off after lag q, an MA(q) model is indicated (we use the tests of significance described previously). • If the SPACF appears to cut off after lag p, and AR(p) model is indicated. If neither the SACF nor the SPACF cut off, mixed models must be considered, starting with ARMA(1,1). b. Parameter estimation: LS and ML Having identified the values for the parameters p and q, we must now estimate the values of the parameters (1, (2, ... (p and &1, &2, ..., &q in the model Zt = (1Zt-1 + ... + (pZt-p + et + &1et-1 + &qet-q Least squares (LS) estimation is equivalent to maximum likelihood (ML) estimation if et is assumed normally distributed. Example: in the AR(p) model, et = Zt – (1Zt-1 – ... – (pZt-p. The estimators !ˆ1 ,...,!ˆ p are chosen to minimise n " (z t ! !ˆ 1z t-1 ! ... ! !ˆ p z t-p )2 t=p+1 Once these estimates obtained, the residual at time t is given by eˆt = z " !ˆ1 zt -1 " ... " !ˆ p zt - p For general ARMA models, êt cannot be deduced from the zt. 
In the MA(1) model for instance, eˆt = zt " !ˆ1eˆt "1 We can solve this iteratively for êt as long as some starting value ê0 is assumed. For an ARMA(p,q) model, the list of starting values is ( ê0 , ê1 , ..., êq!1 ). The starting values are estimated recursively by backforecasting: ST4064 0. Time Series Analysis 42 Assume ( ê0 , ê1 , ..., êq!1 ) are all zero Estimate the (i and &j 2. Use forecasting on the time-reversed process {zn, ..., z1} to predict values for ( ê0 , ê1 , ..., êq!1 ) 1. 3. Repeat cycle (1)-(2) until the estimates converge. c. Parameter estimation: method of moments • Calculate theoretical ACF or ARMA(p,q): #k’s will be a function of the (’s and &’s. • Set #k = rk and solve for the (’s and &’s. These are the method of moments estimators. Example: you have decided to fit the following MA(1) model xn = en + .en-1 , en ~ N(0,1) You have calculated !ˆ0 =1, !ˆ1 = -0.25. Estimate .. ˆ We have r1 = ! 1 = -0.25. !ˆ0 Recall: '0 = (1 + .2) )2 = 1 + .2 and '1 = .)2 = . here, from which *1 = Setting #1 = r1 = ! . 1+! 2 ! = -0.25 and solving for . gives . = -0.268 or . = -3.732. 1+! 2 Recall: the MA(1) process is invertible IFF |.| < 1. So for . = -0.268, the model is invertible. But for . = -3.732 the model is not invertible. Note: If !ˆ1 = -0.5 here, then #1 = r1 = ! = -0.5, which gives (. + 1)2 = 0, so . = -1, and neither 1+! 2 estimate gives an invertible model. Now, let us estimate )2 = Var (et). Recall that in the simple linear model Yi = &0 + &1Xi + ei, ei ~ IN(0, )2), )2 is estimated by !ˆ 2 = 1 n-2 n 2 i " eˆ i =1 where eˆi = yi - !ˆ0 - !ˆ1 xi is the ith residual. Here we use !ˆ 2 = = 1 n 2 $ eˆt n t = p +1 1 n n $ ( z - "ˆ z t t = p +1 1 t -1 -...- "ˆ p zt - p - #ˆ1eˆt -1 -...- #ˆq eˆt -q ) ST4064 Time Series Analysis 43 No matter which estimation method is used this parameter is estimated last, as estimates of the (’s and .’s are required first. Note: In using either Least Squares or Maximum Likelihood Estimation we also find the residuals, ê t , whereas using the Method of Moments to estimate the ,’s and .’s these residuals have to be calculated afterwards. Note: for large n, there will be little difference between LS, ML and Method of Moments estimators. d. Diagnostic checking Assume we have identified a tentative ARIMA(p,d,q) model and calculated the estimates ˆ !, ˆ "ˆ 1 , ... "ˆ p , #ˆ 1, ... ,#ˆ q . µ, We must perform diagnostic checks based on the residuals. If the ARMA(p,q) model is a good approximation to the underlying time series process, then the residuals ê t will form a good approximation to a white noise process. (I) Tests to see if the residuals are white noise: ! 1.96 1.96 " Study SACF and SPACF of residuals. Do rk and !ˆk lie outside $ # , %? n n' & • Portmanteau test of residuals (carried out on the residual SACF): m r2 n(n + 2) # k ~ ! m2 "s , for s = number of parameters of the model n k k =1 • If the SACF or SPACF of the residuals has too many values outside the interval !$ # 1.96 , 1.96 "% we & n n' conclude that the fitted model does not have enough parameters and a new model with additional parameters should be fitted. The Portmanteau test may also be used for this purpose. 
Other tests are: {eˆ t } • Inspection of the graph of • Counting turning points • Study the sample spectral density function of the residuals (II) Inspection of the graph of {eˆ t }: plot ê t against t • plot ê t against zt any patterns evident in these plots may indicate that the residuals are not a realisation of a set of independent (uncorrelated) variables and so the model is inadequate. • ST4064 (III) Time Series Analysis 44 Counting Turning Points: This is a test of independence. Are the residuals a realisation of a set of independent variables? Possible configurations for a turning point are: In the diagram above, there exists a turning point for all configurations except (a) and (b). Since four out of the six possible configurations exhibit a turning point, the probability to observe one is 4/6 = 2/3. If y1, y2, ..., yn is a sequence of numbers, the sequence has a turning point at time k if either yk-1 < yk AND yk > yk+1 or yk-1 > yk AND yk < yk+1 Result: if Y1, Y2, ... YN is a sequence of independent random variables, then • the probability of a turning point at time k is 2/3 • The expected number of turning points is 2/3 (N - 2) • The variance is (16N – 29)/90 [Kendall and Stuart, “The Advanced Theory of Statistics”, 1966, vol 3, p.351] therefore, the number of turning points in a realisation of Y1, Y2, ... YN should lie within the 95% confidence interval: !2 $ 16 N # 29 % 2 $ 16 N # 29 % " & ( N # 2) # 1.96 ( ) , ( N # 2) + 1.96 ( )' * 90 + 3 * 90 + -' ,& 3 Study the sample spectral density function of the residuals: Recall: the spectral density function on white noise process is f(#) = )2/2$ , -$ < # < $. So the sample spectral density function of the residuals should be roughly constant for a white noise process. ST4064 V. Time Series Analysis 45 Forecasting V.1 The Box-Jenkins approach Having fitted an ARMA model to {x1, x2, ... xn} we have the equation: Xn+k = µ + ,1 (xn+k-1 – µ) + ... + ,p (xn+k-p – µ) + en+k + .1en+k-1 + ...+ .qen+k-q x1 x2 ... ... xn .... xn+k S ? • 1 • • 2 n n+k time x̂ n (k) = Forecast value of xn+k, given all observations up until time n. = k-step ahead forecast at time n. In the Box-Jenkins approach, x̂ n (k) is taken as E(Xn+k | X1 , ... , Xn), i.e. x̂ n (k) is the conditional expectation of the future value of the process, given the information currently available. From result 2 in ST3053 (section A), we know that E(Xn+k | X1 , ... , Xn) minimises the mean square error E(Xn+k – h( X1 , ... , Xn))2 of all functions h(X1 , ... , Xn). x̂ n (k) is calculated as follows from the equation for Xn+k: • Replace all unknown parameters by their estimated values • Replace random variables X1, ..., Xn by their observed values x1 , ... , xn. • Replace random variables Xn+1 , ... , Xn+k-1 by their forecast values, x̂ n (1) , ... , x̂ n (k-1) • Replace variables e1 , ... , en by the residuals eˆ1 , ... , eˆ n • Replace variables en+1 , ... , en+k-1 by their expectations 0. Example: AR(2) model xn = µ + !1 ( xn-1 - µ ) + ! 2 ( xn-2 - µ ) + en . Since X n+1 = µ + !1 ( X n – µ ) + ! 2 ( X n"1 – µ ) + en+1 X n+2 = µ + !1 ( X n+1 – µ ) + ! 2 ( X n – µ ) + en+2 we have xˆn (1) = µˆ + !ˆ1 ( xn " µˆ ) + !ˆ 2 ( xn-1 " µˆ ) xˆn (2) = µˆ + !ˆ1 ( xˆn (1) " µˆ ) + !ˆ 2 ( xn " µˆ ) ST4064 Time Series Analysis 46 Example: 2-step ahead forecast of an ARMA(2,2) model xn = µ + !1 ( xn-1 - µ ) + ! 2 ( xn-2 - µ ) + en + "2en#2 . Since xn+2 = µ + !1 ( xn+1 - µ ) + ! 
2 ( xn - µ ) + en+ 2 + "2en , we have xˆn (2) = µˆ + !ˆ1 ( xˆn (1) - µˆ ) + !ˆ 2 ( xn - µˆ ) + "ˆ2eˆn The (forecast) error of the forecast x̂ n (k) is x n+k - xˆ n (k ) The expected value of this error is E(xn+k - xˆ n (k) | x1,...,x n ) = xˆ n (k) - xˆ n (k) = 0 Hence the variance of the forecast error is E((x n+k ! xˆ n (k ))2 | x1 ,..., x n ) This is needed for confidence interval forecasts as it is more useful than a point estimate. For stationary processes, it may be shown that x̂ n (k) ! µ as k ! " . Hence, the variance of the forecast error tends to E(xn+k-µ)2 = )2 as k % &, where )2 is the variance of the process. V.2 Forecasting ARIMA processes If X is ARIMA(p,d,q) then Z = ! d X is ARMA(p,q). • Use methods reviewed to produce forecasts for Z • Reverse the differencing procedure to produce forecasts for X Example: if X is ARIMA(0,1,1) then Z = !X is ARMA(0,1), leading to the forecast ẑn (1) . But Xn+1 = Xn + Zn+1, so xˆ n (1) = x n + zˆ n (1) Question: Find x̂ n (2) for an ARIMA(1,2,1) process. Let Z n = ! 2 X n and assume Zn = µ + , (Zn-1 – µ) + en + .en-1, but Z n+ 2 = !2 X n+ 2 = ( X n+ 2 " X n+1 ) " ( X n+1 " X n ) = X n+ 2 " 2 X n+1 + X n ST4064 Time Series Analysis 47 so Xn+2 = 2Xn+1 – Xn + Zn+2. Hence, ˆ ˆ n (1) ! µˆ xˆ n (2) = 2xˆ n (1) ! x n + zˆ n (2) = 2xˆ n (1) ! x n + µˆ +!(z V.3 Exponential smoothing and Holt-Winters • The Box-Jenkins method requires a skilled operator in order to obtain reliable results. • For cases where only a simple forecast is needed, exponential smoothing is much simpler (Holt, 1958). A weighted combination of past values is used to predict future observations. For example, the first forecast for an AR model is obtained by ( 2 xˆn (1) = ! xn + (1 " ! ) xn "1 + (1 – ! ) xn "2 + ... ) or " xˆn (1) = ! # (1- ! )i xn-i = i =0 ! xn 1- (1- ! ) B ! • The sum of the weights is ! " (1-!)i = i=0 ! =1 1-(1-!) • Generally we use a value of , such that 0 < , < 1, so that there is less emphasis on historic values further back in time (usually, 0.2 6 , 6 0.3). • There is only one parameter to control, usually estimated via least squares. • The weights decrease geometrically – hence the name exponential smoothing. Updating forecasts is easy with exponential smoothing: Xn-1 Xn Xn+1 5 ? 5 | n-1 | n | n+1 It is easy to see that xˆn (1) = (1- ! ) xˆn-1 (1) + ! xn = xˆn-1 (1) + ! ( xn - xˆn-1 (1)) Current forecast = previous forecast + , 7 (error in previous forecast). ST4064 Time Series Analysis 48 • Simple exponential smoothing can’t cope with trend or seasonal variation. • Holt-Winters smoothing can cope with trend and seasonal variation • Holt –Winters can sometimes outperform Box-Jenkins forecasts. V.4 Linear filtering input process linear filter xt time Series output process yt filter weights time series A linear filter is a transformation of a time series {xt} (the input series) to create an output series {yt} which satisfies: yt = ! #a k x t-k . k= "! The collection of weights {ak : k % Z} forms a complete description of the filter. The objective of the filtering is to modify the input series to meet particular objectives, or to display specific features of the data. For example, an important problem in analysis of economic time series is detection, isolation and removal of deterministic trends. In practice, a filter {ak : k % Z} normally contains only a relatively small number of non-zero components. Example: regular differencing. This is used to remove a linear trend. Here a0 = 1, a1 = -1, ak = 0 otherwise. Hence yt = xt – xt-1. 
V.4 Linear filtering

[Diagram: an input series {xt} passes through a linear filter with weights {ak} to produce an output series {yt}.]

A linear filter is a transformation of a time series {xt} (the input series) into an output series {yt} satisfying
yt = Σk ak xt−k, the sum running over all integers k.
The collection of weights {ak : k ∈ Z} forms a complete description of the filter. The objective of filtering is to modify the input series to meet particular objectives, or to display specific features of the data. For example, an important problem in the analysis of economic time series is the detection, isolation and removal of deterministic trends. In practice a filter {ak : k ∈ Z} normally contains only a relatively small number of non-zero weights.

Example: regular differencing. This is used to remove a linear trend. Here a0 = 1, a1 = −1, ak = 0 otherwise, hence yt = xt − xt−1.

Example: seasonal differencing (monthly data). Here a0 = 1, a12 = −1, ak = 0 otherwise, and yt = xt − xt−12.

Example: if the input series is a white noise e and the filter takes the form {θ0 = 1, θ1, ..., θq}, then the output series is MA(q), since
yt = θ0 et + θ1 et−1 + ... + θq et−q.
If the input series x is AR(p) and the filter takes the form {φ0 = 1, −φ1, ..., −φp}, then the output series is white noise:
yt = xt − φ1 xt−1 − ... − φp xt−p = et.

VI. Multivariate time series analysis

VI.1 Principal component analysis and dimension reduction

a. Principal Component Analysis
See lectures and practicals.

b. Multivariate correlation: basic properties
The multivariate process (X, Y, Z) defined by

                ( ( µX )   ( ΣXX  ΣXY  ΣXZ ) )
(X, Y, Z) ~ N ( ( µY ) , ( ΣYX  ΣYY  ΣYZ ) )
                ( ( µZ )   ( ΣZX  ΣZY  ΣZZ ) )

satisfies the following.

X adjusted for Z:
X − E(X | Z) = (X − µX) − ΣXZ ΣZZ⁻¹ (Z − µZ),
since E(X | Z) = µX + β(Z − µZ) with β = ΣXZ ΣZZ⁻¹.

Y adjusted for Z:
Y − E(Y | Z) = (Y − µY) − ΣYZ ΣZZ⁻¹ (Z − µZ).

Partial covariance:
Cov[ (X − µX) − ΣXZ ΣZZ⁻¹(Z − µZ) , (Y − µY) − ΣYZ ΣZZ⁻¹(Z − µZ) ]
= E[ ((X − µX) − ΣXZ ΣZZ⁻¹(Z − µZ)) ((Y − µY) − ΣYZ ΣZZ⁻¹(Z − µZ)) ]
= ΣXY − ΣXZ ΣZZ⁻¹ ΣZY.

Variances:
Var[ (X − µX) − ΣXZ ΣZZ⁻¹(Z − µZ) ] = ΣXX − ΣXZ ΣZZ⁻¹ ΣZX
Var[ (Y − µY) − ΣYZ ΣZZ⁻¹(Z − µZ) ] = ΣYY − ΣYZ ΣZZ⁻¹ ΣZY,

from which we get the partial correlation
P(X, Y | Z) = ( ΣXY − ΣXZ ΣZZ⁻¹ ΣZY ) / √[ (ΣXX − ΣXZ ΣZZ⁻¹ ΣZX)(ΣYY − ΣYZ ΣZZ⁻¹ ΣZY) ].

Now substituting
X = Xt, Y = Xt+2, Z = Xt+1, with ΣXY = γ2, ΣXZ = ΣZY = γ1, ΣXX = ΣYY = ΣZZ = γ0,
we get
P(Xt, Xt+2 | Xt+1) = (γ2 − γ1 γ0⁻¹ γ1) / (γ0 − γ1 γ0⁻¹ γ1) = (ρ2 − ρ1²) / (1 − ρ1²),
which is the familiar determinant formula for the lag-2 partial autocorrelation:
φ2 = det( 1  ρ1 ; ρ1  ρ2 ) / det( 1  ρ1 ; ρ1  1 ).
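A small numpy check of the partial-correlation formula above, applied to the autocovariances of a stationary AR(1) process (for which γk = φᵏγ0, so ρ2 = ρ1² and the lag-2 partial autocorrelation should be 0); the function name and parameter values are illustrative:

```python
import numpy as np

def partial_corr(S_xy, S_xz, S_yz, S_zz, S_xx, S_yy):
    """Partial correlation of X and Y given Z from the blocks of the joint
    covariance matrix (scalar case, so S_zz^{-1} is just 1/S_zz)."""
    num = S_xy - S_xz / S_zz * S_yz
    vx = S_xx - S_xz / S_zz * S_xz
    vy = S_yy - S_yz / S_zz * S_yz
    return num / np.sqrt(vx * vy)

phi, g0 = 0.6, 1.0
g1, g2 = phi * g0, phi**2 * g0            # AR(1): gamma_k = phi^k * gamma_0
# X = X_t, Z = X_{t+1}, Y = X_{t+2}
print(partial_corr(S_xy=g2, S_xz=g1, S_yz=g1, S_zz=g0, S_xx=g0, S_yy=g0))  # ~ 0.0
# equivalently (rho2 - rho1^2) / (1 - rho1^2)
r1, r2 = g1 / g0, g2 / g0
print((r2 - r1**2) / (1 - r1**2))          # ~ 0.0
```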
VI.2 Vector AR processes

A univariate time series consists of a sequence of random variables Xt, where Xt is the value of the single variable of interest X at time t. An m-dimensional multivariate time series consists of a sequence of random vectors X1, X2, ... There are m variables of interest, denoted X(1), ..., X(m), and Xt(j) is the value of X(j) at time t. Thus at time t we have a vector of observations
Xt = (Xt(1), ..., Xt(m))ᵀ.

[Diagram: a univariate series records the single value Xt at each time t; an m-dimensional series records the vector (Xt(1), ..., Xt(m)) at each time t.]

For the second-order properties of {Xt} we use
• the vectors of expected values µt = E(Xt)
• the covariance matrices Cov(Xt, Xt+k) for all pairs of random vectors.

The vector process {Xt} is weakly stationary if E(Xt) and Cov(Xt, Xt+k) are independent of t. Let µ denote the common mean vector E(Xt) and Σk the common lag-k covariance matrix, i.e.
Σk = Cov(Xt, Xt+k).
In the stationary case Σk is an m × m matrix whose rows and columns are indexed by X(1), ..., X(m). For k = 0, Σ0 is the variance/covariance matrix of (X(1), ..., X(m)). In general,
Σk(1,1) = Cov(Xt(1), Xt+k(1)) = autocovariance at lag k of X(1),
Σk(i,j) = Cov(Xt(i), Xt+k(j)) = lag-k cross-covariance of X(i) with X(j).

Example: multivariate white noise. Recall that univariate white noise is a sequence e1, e2, ... of random variables with E(et) = 0 and Cov(et, et+k) = σ² 1(k = 0), where 1(·) is the indicator function. Multivariate white noise is the simplest example of a multivariate random process. Let e1, e2, ... be a sequence of independent, zero-mean random vectors, each with the same covariance matrix Σ. Then for k = 0 the lag-k covariance matrix of the et's is Σ0 = Σ, and since the et's are independent vectors, Σk = 0 for k > 0. Note that Σ need not be a diagonal matrix, i.e. the components of et at a given time t need not be independent of each other. However, the et's are independent vectors: the components of et and et+k are independent for k > 0.

Example: a vector autoregressive process of order p, VAR(p), is a sequence of m-component random vectors {X1, X2, ...} satisfying
Xt = µ + A1(Xt−1 − µ) + ... + Ap(Xt−p − µ) + et,
where e is an m-dimensional white noise process and the Aj are m × m matrices.

Example: let it denote the interest rate at time t and It the tendency to invest at time t. We might believe these two are related as follows:
it − µi = α11(it−1 − µi) + et(i)
It − µI = α21(it−1 − µi) + α22(It−1 − µI) + et(I),
where e(i) and e(I) are zero-mean, univariate white noise processes. They may have different variances and are not necessarily uncorrelated with each other, i.e. we do not require Cov(et(i), et(I)) = 0 for any t. However, we do require Cov(et(i), es(I)) = 0 for s ≠ t. The model can be expressed as a 2-dimensional VAR(1):

( it − µi )   ( α11   0  ) ( it−1 − µi )   ( et(i) )
( It − µI ) = ( α21  α22 ) ( It−1 − µI ) + ( et(I) )

The theory and analysis of a VAR(1) closely parallel those of a univariate AR(1).

Recall: the AR(1) model xt = µ + φ(xt−1 − µ) + et is stationary if and only if |φ| < 1. For the VAR(1) process
Xt = µ + A(Xt−1 − µ) + et
we have
Xt = µ + et + A et−1 + ... + A^(t−1) e1 + A^t (X0 − µ).
In order that X should represent a stationary time series, the powers of A should converge to zero in some sense; this happens if all eigenvalues of the matrix A are less than 1 in absolute value.

Recall eigenvalues (see appendix): λ is an eigenvalue of the n × n matrix A if there is a non-zero vector x (called an eigenvector) such that Ax = λx, i.e. (A − λI)x = 0. These equations have a non-zero solution x if and only if |A − λI| = 0; this equation is solved for λ to find the eigenvalues.

Example: find the eigenvalues of the matrix ( 2 1 ; 4 2 ).
Solution: solve det( 2−λ  1 ; 4  2−λ ) = 0, which is equivalent to (2 − λ)² − 4 = λ² − 4λ = λ(λ − 4) = 0. The eigenvalues are 0 and 4.

Question: is the following multivariate time series stationary?

( xt )   ( 0.3  0.5 ) ( xt−1 )   ( etx )
( yt ) = ( 0.2  0.2 ) ( yt−1 ) + ( ety )

We find the eigenvalues of ( 0.3 0.5 ; 0.2 0.2 ):
det( 0.3−λ  0.5 ; 0.2  0.2−λ ) = (0.3 − λ)(0.2 − λ) − 0.1 = λ² − 0.5λ − 0.04 = 0,
giving λ = 0.57, −0.07. Since |λ| < 1 for both eigenvalues, the process is stationary.

Question: write the model in the previous question in terms of Xt only, and show that Xt is stationary in its own right.
Solution: the model can be written as
Xt = 0.3Xt−1 + 0.5Yt−1 + etX   (1)
Yt = 0.2Xt−1 + 0.2Yt−1 + etY   (2)
Rearranging (1): Yt−1 = 2(Xt − 0.3Xt−1 − etX), so Yt = 2(Xt+1 − 0.3Xt − et+1X). Substituting for Yt and Yt−1 in (2) and tidying up:
Xt+1 = 0.5Xt + 0.04Xt−1 + et+1X − 0.2etX + 0.5etY.
Since the white noise terms do not affect stationarity, and the model can be written as (1 − 0.5B − 0.04B²)Xt = (white noise terms), the characteristic equation is
1 − 0.5λ − 0.04λ² = 0,
with roots λ1 = −14.25 and λ2 = 1.75. Since |λ| > 1 for both roots, the Xt process is stationary.
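A quick numpy check of the eigenvalue condition for the VAR(1) considered in the question above (a minimal sketch; numpy is assumed available):

```python
import numpy as np

A = np.array([[0.3, 0.5],
              [0.2, 0.2]])              # VAR(1) coefficient matrix from the question
eigvals = np.linalg.eigvals(A)
print(eigvals)                          # approximately [0.57, -0.07]
print(np.all(np.abs(eigvals) < 1))      # True => the VAR(1) process is stationary
```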
Example: a 2-dimensional VAR(2). Let Yt denote the national income over a period of time, Ct the total consumption over the same period, and It the total investment over the same period. We assume
Ct = α Yt−1 + et(1),
where e(1) is a zero-mean white noise (consumption over a period depends on the income over the previous period). We assume
It = β(Ct−1 − Ct−2) + et(2),
where e(2) is another zero-mean white noise. Finally, we assume Yt = Ct + It (any part of the national income is either consumed or invested). Eliminating Yt, we get the following 2-dimensional VAR(2):
Ct = α Ct−1 + α It−1 + et(1)
It = β(Ct−1 − Ct−2) + et(2)
Using matrix notation,

( Ct )   ( α  α ) ( Ct−1 )   (  0  0 ) ( Ct−2 )   ( et(1) )
( It ) = ( β  0 ) ( It−1 ) + ( −β  0 ) ( It−2 ) + ( et(2) )

VI.3 Cointegration

Cointegration can be used to analyse non-stationary multivariate time series.

Recall: X is integrated of order d (X is I(d)) if Y = ∇ᵈX is stationary.

For univariate models we have seen that a stochastic trend can be removed by differencing, so that the resulting time series can be modelled using the univariate Box-Jenkins approach. In the multivariate case, the appropriate way to treat non-stationary variables is not so straightforward, since it is possible for a linear combination of integrated variables to be stationary. In this case, the variables are said to be cointegrated. This property arises in many econometric models.

Definition: two time series X and Y are called cointegrated if
i) X and Y are I(1) random processes;
ii) there exists a non-zero vector (α, β) such that αX + βY is stationary.
Thus X and Y are themselves non-stationary (being I(1)), but their movements are correlated in such a way that a certain weighted average of the two processes is stationary. The vector (α, β) is called a cointegrating vector.

We may expect two processes to be cointegrated if
• one of the processes is driving the other, or
• both are being driven by the same underlying process.

Remarks:
R1 – Any equilibrium relationship among a set of non-stationary variables implies that the variables cannot move independently of each other, and hence that their stochastic trends must be linked. This linkage implies that the variables are cointegrated.
R2 – If the linear relationship (as revealed by cointegration) is already stationary, differencing the relationship entails a misspecification error.
R3 – There are two main popular tests for cointegration, but they are not the only ones. Reference: see e.g. Enders, "Applied Econometric Time Series", Wiley, 2004.

Example: let Xt denote the US Dollar / GB Pound exchange rate, let Pt be the consumer price index for the US, and let Qt be the consumer price index for the UK. It is assumed that Xt fluctuates around the purchasing power ratio Pt/Qt according to the following model:
ln Xt = ln(Pt/Qt) + Yt
Yt = µ + φ(Yt−1 − µ) + et + θ et−1,
where e is a zero-mean white noise. We assume ln P and ln Q follow ARIMA(1,1,0) models:
(1 − B) ln Pt = µ1 + φ1[(1 − B) ln Pt−1 − µ1] + et(1)
(1 − B) ln Qt = µ2 + φ2[(1 − B) ln Qt−1 − µ2] + et(2),
where e(1) and e(2) are zero-mean white noise, possibly correlated. Since ln Pt and ln Qt are both ARIMA(1,1,0) processes, they are both I(1) and hence non-stationary, and ln Xt is also non-stationary. However,
ln Xt − ln Pt + ln Qt = Yt,
and Yt is an ARMA(1,1) process, hence stationary. So the sequence of random vectors {(ln Xt, ln Pt, ln Qt) : t = 1, 2, ...} is cointegrated with cointegrating vector (1, −1, 1).
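A minimal simulation sketch of the idea (all series and parameter values here are illustrative, not taken from the notes): two I(1) series share a common random-walk trend, so each wanders without settling down, but the combination Xt − Yt is stationary.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000
trend = np.cumsum(rng.normal(size=n))   # common stochastic trend (a random walk, I(1))
x = trend + rng.normal(size=n)          # X_t = trend_t + stationary noise  -> I(1)
y = trend + rng.normal(size=n)          # Y_t = trend_t + stationary noise  -> I(1)

spread = x - y                          # cointegrating combination, vector (1, -1)

# crude check: the sample variance of an I(1) series grows with the sample length,
# while the variance of the stationary spread stays roughly constant
for part in (slice(0, n // 2), slice(0, n)):
    print(np.var(x[part]), np.var(spread[part]))
```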
Question: show that the two processes Xt and Yt defined by
Xt = 0.65Xt−1 + 0.35Yt−1 + etX   (1)
Yt = 0.35Xt−1 + 0.65Yt−1 + etY   (2)
are cointegrated, with cointegrating vector (1, −1).

Solution: we have to show that Xt − Yt is a stationary process. Subtracting the second equation from the first gives
Xt − Yt = 0.3Xt−1 − 0.3Yt−1 + etX − etY = 0.3(Xt−1 − Yt−1) + etX − etY.
Hence the process Xt − Yt is stationary, since |0.3| < 1; the white noise terms do not affect the stationarity.

Strictly speaking, we should also show that the processes Xt and Yt are both I(1). We use the same substitution method as in VI.2 to write the model in terms of Xt only: from the first equation (1) we have
Yt−1 = (1/0.35)(Xt − 0.65Xt−1 − etX), and so Yt = (1/0.35)(Xt+1 − 0.65Xt − et+1X).
Substituting in the second equation (2) gives
(1/0.35)(Xt+1 − 0.65Xt − et+1X) = 0.35Xt−1 + 0.65 × (1/0.35)(Xt − 0.65Xt−1 − etX) + etY.
Tidying up, we have
Xt+1 = 1.3Xt − 0.3Xt−1 + et+1X − 0.65etX + 0.35etY.
If this is to be an I(1) process, we need to show that the first difference is I(0). Look at the characteristic equation, or re-write the above equation in terms of differences:
∇Xt+1 = 0.3 ∇Xt + et+1X − 0.65etX + 0.35etY.
Since |0.3| < 1, the differenced process is I(0), and so Xt is I(1). Similarly, Yt can be shown to be I(1).

VI.4 Other common models

a. Bilinear models
The simplest example of this class is
Xn + φ(Xn−1 − µ) = µ + en + θ en−1 + b(Xn−1 − µ) en−1.
Considered as a function of the X's, this relation is linear; it is also linear when considered as a function of the e's only; hence the name "bilinear".
• Many bilinear models exhibit "burst" behaviour: when the process is far from its mean, it tends to exhibit larger fluctuations.
• The difference between this model and an ARMA(1,1) is the final term, b(Xn−1 − µ)en−1. If Xn−1 is far from µ and en−1 is far from 0, this term assumes a much greater significance.

b. Threshold AR models
Let us look at a simple example:
Xn = µ + φ1(Xn−1 − µ) + en   if Xn−1 ≤ d,
Xn = µ + φ2(Xn−1 − µ) + en   if Xn−1 > d.
These models can exhibit cyclic behaviour. Example: set φ2 = 0. Then Xn follows an AR(1) process until it passes the threshold value d; once the threshold is crossed, Xn returns to µ and the process effectively starts again. Thus we get cyclic behaviour as the process keeps resetting.
[Figure: sample path of a threshold AR process, repeatedly drifting up past the threshold d and then resetting towards the mean µ.]

c. Random coefficient AR models
Consider a simple example: Xt = µ + φt(Xt−1 − µ) + et, where {φ1, φ2, ...} is a sequence of independent random variables.
Example: let Xt be the value of an investment fund at time t. Then Xt = (1 + it)Xt−1 + et, so that µ = 0 and φt = 1 + it, where it is the random rate of return. The behaviour of such models is generally more irregular than that of the corresponding AR(1) model.
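A short simulation sketch of the random-coefficient AR(1) investment-fund example; the distributions chosen for it and et, the starting value and the variable names are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 250
i_t = rng.normal(loc=0.0005, scale=0.01, size=n)   # random rate of return (illustrative)
e_t = rng.normal(scale=0.5, size=n)                 # additive noise, e.g. net cash flows

x = np.empty(n)
x[0] = 100.0                                        # initial fund value (illustrative)
for t in range(1, n):
    # random-coefficient AR(1): phi_t = 1 + i_t, mu = 0
    x[t] = (1.0 + i_t[t]) * x[t - 1] + e_t[t]

print(x[-1])   # paths are typically more irregular than a fixed-coefficient AR(1)
```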
VI.5 ARCH and GARCH

a. ARCH
Recall: homoscedastic = constant variance; heteroscedastic = non-constant variance.

Financial assets often display the following behaviour:
- a large change in asset price is followed by a period of high volatility;
- a small change in asset price tends to be followed by further small changes.
Thus the variance of the process depends on the size of the previous value. This is what is meant by conditional heteroscedasticity.

The class of autoregressive models with conditional heteroscedasticity of order p – the ARCH(p) models – is defined by
Xt = µ + et √( α0 + α1(Xt−1 − µ)² + ... + αp(Xt−p − µ)² ),
where e is a sequence of independent standard normal variables.

Example: the ARCH(1) model
Xt = µ + et √( α0 + α1(Xt−1 − µ)² ).
A significant deviation of Xt−1 from the mean µ gives rise to an increase in the conditional variance of Xt given Xt−1:
(Xt − µ)² = et² ( α0 + α1(Xt−1 − µ)² ),
so, since et is independent of Xt−1 with E(et²) = 1,
E[(Xt − µ)² | Xt−1] = α0 + α1(Xt−1 − µ)².

Example: let Zt denote the price of an asset at the end of the t-th trading day, and let Xt = ln(Zt/Zt−1) be the daily rate of return on day t. It has been found that ARCH models can be used to model Xt.

Brief history of cointegration and ARCH modelling:
• Cointegration (1981 – ) – Granger
• ARCH (1982 – ) – Engle
• 2003 Nobel prize in Economics – Engle and Granger

b. GARCH