Structural Break Detection in Time Series Models Richard A. Davis Thomas Lee

advertisement
Structural Break Detection in Time Series Models
Richard A. Davis
Thomas Lee
Gabriel Rodriguez-Yam
Colorado State University
(http://www.stat.colostate.edu/~rdavis/lectures)
This research supported in part by:
• National Science Foundation
• IBM faculty award
• EPA funded project entitled: Space-Time Aquatic Resources
Modeling and Analysis Program (STARMAP)
The work reported here was developed under STAR Research Assistance Agreements CR-829095
awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This
presentation has not been formally reviewed by EPA. EPA does not endorse any products or
commercial services mentioned in this presentation.
Corvallis 9/05
1
Illustrative Example
-6
-4
-2
0
2
4
6
How many segments do you see?
0
Corvallis 9/05
t1 = 51
100
200
t2 = 151
300
time
t3 = 251
400
2
Illustrative Example
Auto-PARM=Auto-Piecewise AutoRegressive Modeling
-6
-4
-2
0
2
4
6
4 pieces, 2.58 seconds.
0
Corvallis 9/05
t1 = 51
100
200
t2 = 157
300
time
t3 = 259
400
3
 Introduction
 Setup
 Examples
 AR
 GARCH
 Stochastic volatility
 State space model
 Model Selection Using Minimum Description Length (MDL)
 General principles
 Application to AR models with breaks
 Optimization using Genetic Algorithms
 Basics
 Implementation for structural break estimation
 Simulation results
 Applications
 Simulation results for GARCH and SSM
Corvallis 9/05
4
Examples
1. Piecewise AR model:
Yt = g j   j1Yt 1     jp jYt  p j  s j et , if t j-1  t < t j ,
where t0 = 1 < t1 < . . . < tm-1 < tm = n + 1, and {et} is IID(0,1).
Goal: Estimate
m = number of segments
tj = location of jth break point
gj = level in jth epoch
pj = order of AR process in jth epoch
( j1,,  jp j ) = AR coefficients in jth epoch
sj = scale in jth epoch
Corvallis 9/05
6
Piecewise AR models (cont)
Structural breaks:
Kitagawa and Akaike (1978)
• fitting locally stationary autoregressive models using AIC
• computations facilitated by the use of the Householder
transformation
Davis, Huang, and Yao (1995)
• likelihood ratio test for testing a change in the parameters
and/or order of an AR process.
Kitagawa, Takanami, and Matsumoto (2001)
• signal extraction in seismology-estimate the arrival time of
a seismic signal.
Ombao, Raz, von Sachs, and Malow (2001)
• orthogonal complex-valued transforms that are localized
in time and frequency- smooth localized complex
exponential (SLEX) transform.
• applications to EEG time series and speech data.
Corvallis 9/05
7
Motivation for using piecewise AR models:
Piecewise AR is a special case of a piecewise stationary process (see
Adak 1998),
m
~
Yt ,n = Yt j I[ t j 1 ,t j ) (t / n),
j =1
where {Yt j } , j = 1, . . . , m is a sequence of stationary processes. It is
argued in Ombao et al. (2001), that if {Yt,n} is a locally stationary
process (in the sense of Dahlhaus), then there exists a piecewise
~
stationary process {Yt ,n } with
mn   with mn / n  0, as n  ,
that approximates {Yt,n} (in average mean square).
Roughly speaking: {Yt,n} is a locally stationary process if it has a timevarying spectrum that is approximately |A(t/n,w)|2 , where A(u,w) is a
continuous function in u.
Corvallis 9/05
8
Examples (cont)
2. Segmented GARCH model:
Yt = st et ,
st2 = w j   j1Yt 21     jp j Yt 2 p j   j1st21     jq j st2q j ,
if t j-1  t < t j ,
where t0 = 1 < t1 < . . . < tm-1 < tm = n + 1, and {et} is IID(0,1).
3. Segmented stochastic volatility model:
Yt = st et ,
log st2 = g j   j1 log st21     jp j log st2 p j   j t ,
if t j-1  t < t j .
4. Segmented state-space model (SVM a special case):
p( yt | t ,..., 1, yt 1,..., y1 ) = p( yt | t ) is specified
t = g j   j1t 1     jp j t  p j  s j t ,
Corvallis 9/05
if t j-1  t < t j .
9
Model Selection Using Minimum Description Length
Basics of MDL:
Choose the model which maximizes the compression of the data or,
equivalently, select the model that minimizes the code length of the
data (i.e., amount of memory required to encode the data).
M = class of operating models for y = (y1, . . . , yn)
LF (y) = code length of y relative to F  M
Typically, this term can be decomposed into two pieces (two-part code),
LF ( y ) = L(Fˆ|y)  L(eˆ | Fˆ),
where
L(F̂|y) = code length of the fitted model for F
L(eˆ|F̂) = code length of the residuals based on the fitted model
Corvallis 9/05
10
Model Selection Using Minimum Description Length (cont)
Applied to the segmented AR model:
Yt = g j   j1Yt 1     jp jYt  p j  s j et , if t j-1  t < t j ,
First term L(F̂|y) :
L(Fˆ|y) = L(m)  L( t1 ,, tm )  L( p1,, pm )  L(ˆ 1 | y )    L(ˆ m | y )
m
m
= log 2 m  m log 2 n   log 2 p j  
j =1
j =1
pj  2
2
log 2 n j
Encoding:
integer I : log2 I bits (if I unbounded)
log2 IU bits (if I bounded by IU)
MLE ̂ : ½ log2N bits (where N = number of observations used to
compute ̂ ; Rissanen (1989))
Corvallis 9/05
11
L(eˆ | F̂) : Using Shannon’s classical results on
information theory, Rissanen demonstrates that the code length of
Second term
can be approximated by the negative of the log-likelihood of the
fitted model, i.e., by
m
L(eˆ | Fˆ)   log 2 L(ˆ 1 | y )
j =1
For fixed values of m, (t1,p1),. . . , (tm,pm), we define the MDL as
MDL (m, ( t1 , p1 ), , ( tm , pm ))
m
m
pj  2
j =1
j =1
2
= log 2 m  m log 2 n   log 2 p j  
m
log 2 n j   log 2 L(ˆ j | y )
j =1
The strategy is to find the best segmentation that minimizes
MDL(m,t1,p1,…, tm,pm). To speed things up for AR processes, we use
Y-W estimates of AR parameters and we replace
 log 2 L(ˆ j | y ) with log 2 (2sˆ 2j )  n j
Corvallis 9/05
12
Optimization Using Genetic Algorithms
Basics of GA:
Class of optimization algorithms that mimic natural evolution.
• Start with an initial set of chromosomes, or population, of
possible solutions to the optimization problem.
• Parent chromosomes are randomly selected (proportional to
the rank of their objective function values), and produce
offspring using crossover or mutation operations.
• After a sufficient number of offspring are produced to form a
second generation, the process then restarts to produce a third
generation.
• Based on Darwin’s theory of natural selection, the process
should produce future generations that give a smaller (or larger)
objective function.
Corvallis 9/05
13
Application to Structural Breaks—(cont)
Genetic Algorithm: Chromosome consists of n genes, each taking
the value of 1 (no break) or p (order of AR process). Use natural
selection to find a near optimal solution.
Map the break points with a chromosome c via
(m, ( t1 , p1 ) , ( tm , pm )) 
 c = (1 ,, nn ),
where
no break
break point
point at
at tt,,
 11,, ifif no
tt == 
break point
point at
at time
time tt == ttjj11 and
and AR
AR order
order isis ppjj..
 ppjj,, ifif break
For example,
c = (2, -1, -1, -1, -1, 0, -1, -1, -1, -1, 0, -1, -1, -1, 3, -1, -1, -1, -1,-1)
t: 1
6
11
15
would correspond to a process as follows:
AR(2), t=1:5; AR(0), t=6:10; AR(0), t=11:14; AR(3), t=15:20
Corvallis 9/05
14
Simulation Examples-based on Ombao et al. (2001) test cases
1. Piecewise stationary with dyadic structure: Consider a time
series following the model,
513,,
ifif 11tt<<513
513tt<<769
769,,
ifif 513
769tt1024
1024,,
ifif 769
-10
-5
0
5
10
where {et} ~ IID N(0,1).
 .9.9YYt t 11eet ,t ,

.69YYt t 11.81
.81YYt t 22eet ,t ,
YYt t ==11.69
11.32
.81YYt t 22eet ,t ,
 .32YYt t 11.81
1
Corvallis 9/05
200
400
600
Time
800
1000
17
1. Piecewise stat (cont)
GA results: 3 pieces breaks at t1=513; t2=769. Total run time 16.31 secs
1
2
s2
1- 512: .857
.9945
513- 768: 1.68 -0.801 1.1134
769-1024: 1.36 -0.801 1.1300
Fitted model:
0.4
0.3
0.2
0.1
0.0
0.0
0.1
0.2
0.3
0.4
0.5
Fitted Model
0.5
True Model
0.0
0.2
0.4
0.6
Time
Corvallis 9/05
0.8
1.0
0.0
0.2
0.4
0.6
0.8
1.0
Time
18
Simulation Examples (cont)
2. Slowly varying AR(2) model:
Yt = atYt 1  .81Yt 2  et if 1  t  1024
0.8
0.4
-6
-4
0.6
-2
a_t
0
2
1.0
4
1.2
where at = .8[1  0.5 cos( t / 1024)], and {et} ~ IID N(0,1).
1
Corvallis 9/05
200
400
600
Time
800
1000
0
200
400
600
time
800
1000
19
2. Slowly varying AR(2) (cont)
GA results: 3 pieces, breaks at t1=293, t2=615. Total run time 27.45 secs
1
2
s2
1- 292: .365 -0.753 1.149
293- 614: .821 -0.790 1.176
615-1024: 1.084 -0.760 0.960
Fitted model:
Fitted Model
0.0
0.0
0.1
0.1
0.2
0.2
0.3
0.3
0.4
0.4
0.5
0.5
True Model
0.0
Corvallis 9/05
0.2
0.4
0.6
Time
0.8
1.0
0.0
0.2
0.4
0.6
Time
0.8
1.0
20
2. Slowly varying AR(2) (cont)
In the graph below right, we average the spectogram over the GA
fitted models generated from each of the 200 simulated realizations.
Average Model
0.3
0.2
0.0
0.0
0.1
0.1
0.2
Frequency
0.3
0.4
0.4
0.5
0.5
True Model
0.0
0.2
0.4
0.6
Time
Corvallis 9/05
0.8
1.0
0.0
0.2
0.4
0.6
0.8
1.0
Time
24
Examples
-1500
-500
0
500
1000
1500
Speech signal: GREASY
G
1
R
EA
1000
2000
S
3000
4000
Y
5000
Time
Corvallis 9/05
27
1500
1000
500
0
-1500
-500
Speech signal: GREASY
n = 5762 observations
m = 15 break points
Run time = 18.02 secs
G
EA
1000
2000
S
3000
0.5
1
R
Y
4000
5000
0.0
0.1
0.2
0.3
0.4
Time
Corvallis 9/05
0.0
0.2
0.4
0.6
Time
0.8
1.0
28
Application to GARCH (cont)
Garch(1,1) model: Yt = st et ,
t
t t
{ett} ~ IID(0,1)
IID(0,1)
st2t2 = wjj   jjYtt2211   jjst2t211,
if t j-j-11  t < t jj.
 .4 .1Yt21 .5st21, if
if 1  t < 501
501,
s =
2
2
if 501
501  t <1000
1000.
.4 .1Yt 1 .6st1, if
-4
-2
0
2
2
t
1
200
400
600
800
1000
# of CPs
GA
%
AG
%
0
80.4
72.0
1
19.2
24.0
2
0.4
0.4
Time
CP estimate = 506
AG = Andreou and Ghysels (2002)
Corvallis 9/05
29
Count Data Example
1008
0
1002
1004
5
1006
y
MDL
10
1010
1012
1014
15
Model: Yt | t ~ Pois(exp{  t }), t = t-1+ et , {et}~IID N(0, s2)
1
100
200
300
400
500
time
1
100
200
300
400
500
Breaking Point
True model:
 Yt | t ~ Pois(exp{.7 + t }), t = .5t-1+ et , {et}~IID N(0, .3), t < 250
 Yt | t ~ Pois(exp{.7 + t }), t = -.5t-1+ et , {et}~IID N(0, .3), t > 250.
 GA estimate 251, time 267secs
Corvallis 9/05
30
SV Process Example
1305
1295
-2
-1
1300
0
y
MDL
1
2
1310
3
1315
Model: Yt | t ~ N(0,exp{t}), t = g   t-1+ et , {et}~IID N(0, s2)
1
500
1000
1500
2000
2500
time
3000
1
500
1000
1500
2000
2500
3000
Breaking Point
True model:
 Yt | t ~ N(0, exp{t}), t = -.05 + .975t-1+ et , {et}~IID N(0, .05), t  750
 Yt | t ~ N(0, exp{t }), t = -.25 +.900t-1+ et , {et}~IID N(0, .25), t > 750.
 GA estimate 754, time 1053 secs
Corvallis 9/05
31
SV Process Example
-530
-0.5
-525
-520
-515
MDL
0.0
y
-510
0.5
-505
-500
Model: Yt | t ~ N(0,exp{t}), t = g   t-1+ et , {et}~IID N(0, s2)
1
100
200
300
time
400
500
1
100
200
300
400
500
Breaking Point
True model:
 Yt | t ~ N(0, exp{t}), t = -.175 + .977t-1+ et , {et}~IID N(0, .1810), t  250
 Yt | t ~ N(0, exp{t }), t = -.010 +.996t-1+ et , {et}~IID N(0, .0089), t > 250.
 GA estimate 251, time 269s
Corvallis 9/05
32
SV Process Example-(cont)
True model:
 Yt | t ~ N(0, exp{at}), t = -.175 + .977t-1+ et , {et}~IID N(0, .1810), t  250
 Yt | t ~ N(0, exp{t }), t = -.010 .996t-1+ et , {et}~IID N(0, .0089), t > 250.
Fitted model based on no structural break:
1.0
 Yt | t ~ N(0, exp{t}), t = -.0645 + .9889t-1+ et , {et}~IID N(0, .0935)
0.5
simulated series
0.0
y
0.0
-0.5
-0.5
y
0.5
original series
1
100
200
300
time
Corvallis 9/05
400
500
1
100
200
300
400
500
time
33
SV Process Example-(cont)
Fitted model based on no structural break:
-468
-466
simulated series
-472
-478
-0.5
-476
-474
0.0
y
MDL
-470
0.5
1.0
 Yt | t ~ N(0, exp{t}), t = -.0645 + .9889t-1+ et , {et}~IID N(0, .0935)
1
100
200
300
time
Corvallis 9/05
400
500
1
100
200
300
400
500
Breaking Point
34
Summary Remarks
1. MDL appears to be a good criterion for detecting structural
breaks.
2. Optimization using a genetic algorithm is well suited to find a
near optimal value of MDL.
3. This procedure extends easily to multivariate problems.
4. While estimating structural breaks for nonlinear time series
models is more challenging, this paradigm of using MDL together
GA holds promise for break detection in parameter-driven models
and other nonlinear models.
Corvallis 9/05
35
Download