Structural Break Detection in Time Series Models Richard A. Davis Thomas Lee Gabriel Rodriguez-Yam Colorado State University (http://www.stat.colostate.edu/~rdavis/lectures) This research supported in part by: • National Science Foundation • IBM faculty award • EPA funded project entitled: Space-Time Aquatic Resources Modeling and Analysis Program (STARMAP) The work reported here was developed under STAR Research Assistance Agreements CR-829095 awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This presentation has not been formally reviewed by EPA. EPA does not endorse any products or commercial services mentioned in this presentation. Corvallis 9/05 1 Illustrative Example -6 -4 -2 0 2 4 6 How many segments do you see? 0 Corvallis 9/05 t1 = 51 100 200 t2 = 151 300 time t3 = 251 400 2 Illustrative Example Auto-PARM=Auto-Piecewise AutoRegressive Modeling -6 -4 -2 0 2 4 6 4 pieces, 2.58 seconds. 0 Corvallis 9/05 t1 = 51 100 200 t2 = 157 300 time t3 = 259 400 3 Introduction Setup Examples AR GARCH Stochastic volatility State space model Model Selection Using Minimum Description Length (MDL) General principles Application to AR models with breaks Optimization using Genetic Algorithms Basics Implementation for structural break estimation Simulation results Applications Simulation results for GARCH and SSM Corvallis 9/05 4 Examples 1. Piecewise AR model: Yt = g j j1Yt 1 jp jYt p j s j et , if t j-1 t < t j , where t0 = 1 < t1 < . . . < tm-1 < tm = n + 1, and {et} is IID(0,1). Goal: Estimate m = number of segments tj = location of jth break point gj = level in jth epoch pj = order of AR process in jth epoch ( j1,, jp j ) = AR coefficients in jth epoch sj = scale in jth epoch Corvallis 9/05 6 Piecewise AR models (cont) Structural breaks: Kitagawa and Akaike (1978) • fitting locally stationary autoregressive models using AIC • computations facilitated by the use of the Householder transformation Davis, Huang, and Yao (1995) • likelihood ratio test for testing a change in the parameters and/or order of an AR process. Kitagawa, Takanami, and Matsumoto (2001) • signal extraction in seismology-estimate the arrival time of a seismic signal. Ombao, Raz, von Sachs, and Malow (2001) • orthogonal complex-valued transforms that are localized in time and frequency- smooth localized complex exponential (SLEX) transform. • applications to EEG time series and speech data. Corvallis 9/05 7 Motivation for using piecewise AR models: Piecewise AR is a special case of a piecewise stationary process (see Adak 1998), m ~ Yt ,n = Yt j I[ t j 1 ,t j ) (t / n), j =1 where {Yt j } , j = 1, . . . , m is a sequence of stationary processes. It is argued in Ombao et al. (2001), that if {Yt,n} is a locally stationary process (in the sense of Dahlhaus), then there exists a piecewise ~ stationary process {Yt ,n } with mn with mn / n 0, as n , that approximates {Yt,n} (in average mean square). Roughly speaking: {Yt,n} is a locally stationary process if it has a timevarying spectrum that is approximately |A(t/n,w)|2 , where A(u,w) is a continuous function in u. Corvallis 9/05 8 Examples (cont) 2. Segmented GARCH model: Yt = st et , st2 = w j j1Yt 21 jp j Yt 2 p j j1st21 jq j st2q j , if t j-1 t < t j , where t0 = 1 < t1 < . . . < tm-1 < tm = n + 1, and {et} is IID(0,1). 3. Segmented stochastic volatility model: Yt = st et , log st2 = g j j1 log st21 jp j log st2 p j j t , if t j-1 t < t j . 4. Segmented state-space model (SVM a special case): p( yt | t ,..., 1, yt 1,..., y1 ) = p( yt | t ) is specified t = g j j1t 1 jp j t p j s j t , Corvallis 9/05 if t j-1 t < t j . 9 Model Selection Using Minimum Description Length Basics of MDL: Choose the model which maximizes the compression of the data or, equivalently, select the model that minimizes the code length of the data (i.e., amount of memory required to encode the data). M = class of operating models for y = (y1, . . . , yn) LF (y) = code length of y relative to F M Typically, this term can be decomposed into two pieces (two-part code), LF ( y ) = L(Fˆ|y) L(eˆ | Fˆ), where L(F̂|y) = code length of the fitted model for F L(eˆ|F̂) = code length of the residuals based on the fitted model Corvallis 9/05 10 Model Selection Using Minimum Description Length (cont) Applied to the segmented AR model: Yt = g j j1Yt 1 jp jYt p j s j et , if t j-1 t < t j , First term L(F̂|y) : L(Fˆ|y) = L(m) L( t1 ,, tm ) L( p1,, pm ) L(ˆ 1 | y ) L(ˆ m | y ) m m = log 2 m m log 2 n log 2 p j j =1 j =1 pj 2 2 log 2 n j Encoding: integer I : log2 I bits (if I unbounded) log2 IU bits (if I bounded by IU) MLE ̂ : ½ log2N bits (where N = number of observations used to compute ̂ ; Rissanen (1989)) Corvallis 9/05 11 L(eˆ | F̂) : Using Shannon’s classical results on information theory, Rissanen demonstrates that the code length of Second term can be approximated by the negative of the log-likelihood of the fitted model, i.e., by m L(eˆ | Fˆ) log 2 L(ˆ 1 | y ) j =1 For fixed values of m, (t1,p1),. . . , (tm,pm), we define the MDL as MDL (m, ( t1 , p1 ), , ( tm , pm )) m m pj 2 j =1 j =1 2 = log 2 m m log 2 n log 2 p j m log 2 n j log 2 L(ˆ j | y ) j =1 The strategy is to find the best segmentation that minimizes MDL(m,t1,p1,…, tm,pm). To speed things up for AR processes, we use Y-W estimates of AR parameters and we replace log 2 L(ˆ j | y ) with log 2 (2sˆ 2j ) n j Corvallis 9/05 12 Optimization Using Genetic Algorithms Basics of GA: Class of optimization algorithms that mimic natural evolution. • Start with an initial set of chromosomes, or population, of possible solutions to the optimization problem. • Parent chromosomes are randomly selected (proportional to the rank of their objective function values), and produce offspring using crossover or mutation operations. • After a sufficient number of offspring are produced to form a second generation, the process then restarts to produce a third generation. • Based on Darwin’s theory of natural selection, the process should produce future generations that give a smaller (or larger) objective function. Corvallis 9/05 13 Application to Structural Breaks—(cont) Genetic Algorithm: Chromosome consists of n genes, each taking the value of 1 (no break) or p (order of AR process). Use natural selection to find a near optimal solution. Map the break points with a chromosome c via (m, ( t1 , p1 ) , ( tm , pm )) c = (1 ,, nn ), where no break break point point at at tt,, 11,, ifif no tt == break point point at at time time tt == ttjj11 and and AR AR order order isis ppjj.. ppjj,, ifif break For example, c = (2, -1, -1, -1, -1, 0, -1, -1, -1, -1, 0, -1, -1, -1, 3, -1, -1, -1, -1,-1) t: 1 6 11 15 would correspond to a process as follows: AR(2), t=1:5; AR(0), t=6:10; AR(0), t=11:14; AR(3), t=15:20 Corvallis 9/05 14 Simulation Examples-based on Ombao et al. (2001) test cases 1. Piecewise stationary with dyadic structure: Consider a time series following the model, 513,, ifif 11tt<<513 513tt<<769 769,, ifif 513 769tt1024 1024,, ifif 769 -10 -5 0 5 10 where {et} ~ IID N(0,1). .9.9YYt t 11eet ,t , .69YYt t 11.81 .81YYt t 22eet ,t , YYt t ==11.69 11.32 .81YYt t 22eet ,t , .32YYt t 11.81 1 Corvallis 9/05 200 400 600 Time 800 1000 17 1. Piecewise stat (cont) GA results: 3 pieces breaks at t1=513; t2=769. Total run time 16.31 secs 1 2 s2 1- 512: .857 .9945 513- 768: 1.68 -0.801 1.1134 769-1024: 1.36 -0.801 1.1300 Fitted model: 0.4 0.3 0.2 0.1 0.0 0.0 0.1 0.2 0.3 0.4 0.5 Fitted Model 0.5 True Model 0.0 0.2 0.4 0.6 Time Corvallis 9/05 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Time 18 Simulation Examples (cont) 2. Slowly varying AR(2) model: Yt = atYt 1 .81Yt 2 et if 1 t 1024 0.8 0.4 -6 -4 0.6 -2 a_t 0 2 1.0 4 1.2 where at = .8[1 0.5 cos( t / 1024)], and {et} ~ IID N(0,1). 1 Corvallis 9/05 200 400 600 Time 800 1000 0 200 400 600 time 800 1000 19 2. Slowly varying AR(2) (cont) GA results: 3 pieces, breaks at t1=293, t2=615. Total run time 27.45 secs 1 2 s2 1- 292: .365 -0.753 1.149 293- 614: .821 -0.790 1.176 615-1024: 1.084 -0.760 0.960 Fitted model: Fitted Model 0.0 0.0 0.1 0.1 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.5 True Model 0.0 Corvallis 9/05 0.2 0.4 0.6 Time 0.8 1.0 0.0 0.2 0.4 0.6 Time 0.8 1.0 20 2. Slowly varying AR(2) (cont) In the graph below right, we average the spectogram over the GA fitted models generated from each of the 200 simulated realizations. Average Model 0.3 0.2 0.0 0.0 0.1 0.1 0.2 Frequency 0.3 0.4 0.4 0.5 0.5 True Model 0.0 0.2 0.4 0.6 Time Corvallis 9/05 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Time 24 Examples -1500 -500 0 500 1000 1500 Speech signal: GREASY G 1 R EA 1000 2000 S 3000 4000 Y 5000 Time Corvallis 9/05 27 1500 1000 500 0 -1500 -500 Speech signal: GREASY n = 5762 observations m = 15 break points Run time = 18.02 secs G EA 1000 2000 S 3000 0.5 1 R Y 4000 5000 0.0 0.1 0.2 0.3 0.4 Time Corvallis 9/05 0.0 0.2 0.4 0.6 Time 0.8 1.0 28 Application to GARCH (cont) Garch(1,1) model: Yt = st et , t t t {ett} ~ IID(0,1) IID(0,1) st2t2 = wjj jjYtt2211 jjst2t211, if t j-j-11 t < t jj. .4 .1Yt21 .5st21, if if 1 t < 501 501, s = 2 2 if 501 501 t <1000 1000. .4 .1Yt 1 .6st1, if -4 -2 0 2 2 t 1 200 400 600 800 1000 # of CPs GA % AG % 0 80.4 72.0 1 19.2 24.0 2 0.4 0.4 Time CP estimate = 506 AG = Andreou and Ghysels (2002) Corvallis 9/05 29 Count Data Example 1008 0 1002 1004 5 1006 y MDL 10 1010 1012 1014 15 Model: Yt | t ~ Pois(exp{ t }), t = t-1+ et , {et}~IID N(0, s2) 1 100 200 300 400 500 time 1 100 200 300 400 500 Breaking Point True model: Yt | t ~ Pois(exp{.7 + t }), t = .5t-1+ et , {et}~IID N(0, .3), t < 250 Yt | t ~ Pois(exp{.7 + t }), t = -.5t-1+ et , {et}~IID N(0, .3), t > 250. GA estimate 251, time 267secs Corvallis 9/05 30 SV Process Example 1305 1295 -2 -1 1300 0 y MDL 1 2 1310 3 1315 Model: Yt | t ~ N(0,exp{t}), t = g t-1+ et , {et}~IID N(0, s2) 1 500 1000 1500 2000 2500 time 3000 1 500 1000 1500 2000 2500 3000 Breaking Point True model: Yt | t ~ N(0, exp{t}), t = -.05 + .975t-1+ et , {et}~IID N(0, .05), t 750 Yt | t ~ N(0, exp{t }), t = -.25 +.900t-1+ et , {et}~IID N(0, .25), t > 750. GA estimate 754, time 1053 secs Corvallis 9/05 31 SV Process Example -530 -0.5 -525 -520 -515 MDL 0.0 y -510 0.5 -505 -500 Model: Yt | t ~ N(0,exp{t}), t = g t-1+ et , {et}~IID N(0, s2) 1 100 200 300 time 400 500 1 100 200 300 400 500 Breaking Point True model: Yt | t ~ N(0, exp{t}), t = -.175 + .977t-1+ et , {et}~IID N(0, .1810), t 250 Yt | t ~ N(0, exp{t }), t = -.010 +.996t-1+ et , {et}~IID N(0, .0089), t > 250. GA estimate 251, time 269s Corvallis 9/05 32 SV Process Example-(cont) True model: Yt | t ~ N(0, exp{at}), t = -.175 + .977t-1+ et , {et}~IID N(0, .1810), t 250 Yt | t ~ N(0, exp{t }), t = -.010 .996t-1+ et , {et}~IID N(0, .0089), t > 250. Fitted model based on no structural break: 1.0 Yt | t ~ N(0, exp{t}), t = -.0645 + .9889t-1+ et , {et}~IID N(0, .0935) 0.5 simulated series 0.0 y 0.0 -0.5 -0.5 y 0.5 original series 1 100 200 300 time Corvallis 9/05 400 500 1 100 200 300 400 500 time 33 SV Process Example-(cont) Fitted model based on no structural break: -468 -466 simulated series -472 -478 -0.5 -476 -474 0.0 y MDL -470 0.5 1.0 Yt | t ~ N(0, exp{t}), t = -.0645 + .9889t-1+ et , {et}~IID N(0, .0935) 1 100 200 300 time Corvallis 9/05 400 500 1 100 200 300 400 500 Breaking Point 34 Summary Remarks 1. MDL appears to be a good criterion for detecting structural breaks. 2. Optimization using a genetic algorithm is well suited to find a near optimal value of MDL. 3. This procedure extends easily to multivariate problems. 4. While estimating structural breaks for nonlinear time series models is more challenging, this paradigm of using MDL together GA holds promise for break detection in parameter-driven models and other nonlinear models. Corvallis 9/05 35