Extensions of some classical methods in change point analysis

Lajos Horváth & Gregory Rice
TEST
An Official Journal of the Spanish
Society of Statistics and Operations
Research
ISSN 1133-0686
DOI 10.1007/s11749-014-0368-4
INVITED PAPER
© Sociedad de Estadística e Investigación Operativa 2014
Abstract A common goal in modeling and data mining is to determine, based on
sample data, whether or not a change of some sort has occurred in a quantity of
interest. The study of statistical problems of this nature is typically referred to as
change point analysis. Though change point analysis originated nearly 70 years ago,
it is still an active area of research and much effort has been put forth to develop new
methodology and discover new applications to address modern statistical questions.
In this paper we survey some classical results in change point analysis and recent
extensions to time series, multivariate, panel and functional data. We also present real
data examples which illustrate the utility of the surveyed results.
Keywords Change point analysis · Sequential monitoring · Panel data · Time series ·
Functional data · Linear models
Mathematics Subject Classification Primary 60F17 · 62M10; Secondary 60F05 ·
60F25 · 62F05 · 60F12 · 62G30 · 62G10 · 62J05 · 62L20 · 62P12 · 62P20
Research supported by NSF grant DMS 1305858.
This invited paper is discussed in comments available at: doi:10.1007/s11749-014-0367-5, doi:10.1007/s11749-014-0369-3, doi:10.1007/s11749-014-0370-x, doi:10.1007/s11749-014-0371-9, doi:10.1007/s11749-014-0372-8, doi:10.1007/s11749-014-0373-7, doi:10.1007/s11749-014-0376-4, doi:10.1007/s11749-014-0377-3.
L. Horváth (B) · G. Rice
Department of Mathematics, University of Utah, Salt Lake City, UT 84112–0090, USA
e-mail: horvath@math.utah.edu
G. Rice
e-mail: rice@math.utah.edu
1 Introduction
Change point analysis, which is the study of the detection and estimation of changes in
a quantity of interest based on sample data, originated in the 1940s and initially focused
on data-driven quality control techniques. Over time, methods in change point analysis
have been developed to address data analytic questions in fields ranging from biology
to finance, and in many cases such methodology has become standard. The statistical
community now enjoys a vast literature on change point analysis where many of the
most natural and common questions have received at least some attention. In spite of
this, change point analysis is still an active area of research, and much effort is being put
forth to extend classical results to data which exhibit dependence, high-dimensionality
and/or follow models which have not yet been considered. The present paper is meant to
accomplish two goals. The first goal is to survey the most frequently used methods and
ideas in change point analysis. Toward this end we have supplied detailed developments of
several procedures and an extensive, though far from exhaustive, list of references.
Throughout the paper we have also included applications of several classical results to
real data sets. The second goal is to derive extensions of some results presented in the
paper. The surveyed results and their extensions are organized into sections as follows.
In Sect. 2, we consider the early contribution to change point analysis by Page and other
nonparametric methods. These results are extended to time series in Sect. 3. Section
4 contains Darling–Erdős results and applications to changes in correlation. In Sect.
5, we consider changes to the parameters in regression models. Section 6 discusses
sequential methods in change point analysis. Section 7 contains applications of change
point analysis to panel data. In Sect. 8, we discuss functional data.
2 Page’s procedure and its extension
Some of the first results in change point analysis were derived in the context of quality
control, where an important problem is to detect, as quickly as possible, an increase in
the proportion of defective products being manufactured. Naturally,
many of these procedures were nonparametric in nature. Page (1954, 1955) suggested
a very simple method to test the stability of the quantiles of underlying observations.
To simplify the calculations below we take the quantile of interest to be the median.
Let us assume that
$$X_1, X_2, \ldots, X_N \ \text{are independent random variables.} \qquad (2.1)$$
If $m_i$ denotes the median of $X_i$, the null hypothesis is stated as
$$m = m_1 = m_2 = \cdots = m_N,$$
where $m$ is known, and the change point alternative is
$$\text{there is an integer } k^* \text{ such that } m_1 = m_2 = \cdots = m_{k^*} \neq m_{k^*+1} = m_{k^*+2} = \cdots = m_N.$$
We say that $k^*$ is the location of the change, assuming that the alternative holds. It is
assumed in Page (1955) that $m = m_1 = m_2 = \cdots = m_N$, i.e. the common value
under the no change null (and the initial value under the change point alternative) is
known. Page defined
$$V_j = \begin{cases} 1, & \text{if } X_j > m,\\ -1, & \text{if } X_j \le m,\end{cases}$$
and he recommended rejecting the no change null hypothesis if
$$T_N = \max_{1\le k\le N}\left\{\sum_{i=1}^{k} V_i - \min_{1\le j\le k}\sum_{i=1}^{j} V_i\right\}$$
is large. The limit distribution of $T_N$ under $H_0$ was approximated using combinatorial
arguments. The exact distribution of $T_N$ can also be computed for any $N$; it is
derived by Gombay (1994) from Csáki (1986) [cf. also Csörgő and Horváth (1997),
p 91]. However, using weak convergence of empirical measures [cf. Billingsley
(1968)], the limit distribution of $T_N$ can be derived easily. Assuming that the distributions of the $X_i$'s are continuous at the common median, the variables $V_1, V_2, \ldots, V_N$
are independent identically distributed random variables taking values $\pm 1$ with probability $1/2$. Hence, by the weak convergence of the simple random walk,
$$N^{-1/2}\sum_{i=1}^{\lfloor Nt\rfloor} V_i \ \xrightarrow{D[0,1]}\ W(t), \qquad (2.2)$$
where $W(t)$, $0\le t<\infty$, stands for a Wiener process (standard Brownian motion).
We use $\xrightarrow{D[0,1]}$ to denote the weak convergence of random functions in the Skorokhod
space [cf. Billingsley (1968)]. The weak convergence of partial sums in (2.2) implies
immediately that under the no change null hypothesis
$$N^{-1/2}\, T_N \ \xrightarrow{D}\ \sup_{0\le t\le 1}\left\{W(t) - \inf_{0\le s\le t} W(s)\right\},$$
where $\xrightarrow{D}$ means convergence in distribution. According to Lévy's formula [cf. Chung
and Williams (1983)], we have that
$$\left\{W(t) - \inf_{0\le s\le t} W(s),\ 0\le t<\infty\right\} \stackrel{D}{=} \left\{|W(t)|,\ 0\le t<\infty\right\}, \qquad (2.3)$$
and therefore
$$N^{-1/2}\, T_N \ \xrightarrow{D}\ \sup_{0\le t\le 1} |W(t)|.$$
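As a small numerical illustration, $T_N$ and its limit can be checked by simulation. The sketch below (helper name and simulation sizes are our own choices, kept small for speed) computes $T_N$ for a sample with known median $m$ and compares $N^{-1/2}T_N$ with a Monte Carlo approximation of the 0.95 quantile of $\sup_{0\le t\le 1}|W(t)|$:

```python
import numpy as np

rng = np.random.default_rng(0)

def page_statistic(x, m):
    # Signs V_j = 1 if X_j > m, -1 otherwise
    v = np.where(np.asarray(x, dtype=float) > m, 1.0, -1.0)
    s = np.cumsum(v)                        # partial sums of the V_i
    # T_N = max_k { S_k - min_{1<=j<=k} S_j }
    return float(np.max(s - np.minimum.accumulate(s)))

N = 500
x = rng.normal(size=N)                      # H0 holds: median is m = 0
t_obs = page_statistic(x, m=0.0) / np.sqrt(N)

# Monte Carlo null distribution of sup |W(t)| via scaled random walks
sims = [np.abs(np.cumsum(rng.normal(size=N)) / np.sqrt(N)).max()
        for _ in range(2000)]
crit = float(np.quantile(sims, 0.95))
reject = t_obs > crit                       # should rarely be True under H0
```

Under the alternative, a shift in the median inflates the range of the sign random walk, and $N^{-1/2}T_N$ exceeds the simulated critical value with high probability.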
We can consider $T_N$ as the supremum functional of
$$V_N(t) = N^{-1/2}\left\{\sum_{i=1}^{\lfloor Nt\rfloor} V_i - \inf_{0\le s\le t}\sum_{j=1}^{\lfloor Ns\rfloor} V_j\right\},$$
and via (2.2) the limit distributions of several other functionals of $V_N(t)$ can be derived.
For example,
$$\int_0^1 V_N^2(t)\,dt \ \xrightarrow{D}\ \int_0^1 W^2(t)\,dt \quad\text{and}\quad \int_0^1 |V_N(t)|\,dt \ \xrightarrow{D}\ \int_0^1 |W(t)|\,dt.$$
This procedure can be modified to test for the equality of other quantiles by altering
the definition of $V_j$. The main limitation of this technique, though, is that it assumes that
the quantile of interest may be specified, which, although reasonable in some contexts such as quality control, is not plausible in many situations. Furthermore, even if a
single quantile is specified, this procedure cannot in general detect deviations in other
quantiles of the underlying sample.
In light of this, one may wish to consider the more general hypothesis test of
$$H_0: F_1^* = F_2^* = \cdots = F_N^*,$$
versus
$$H_A: F_1^* = F_2^* = \cdots = F_{k^*}^* \neq F_{k^*+1}^* = F_{k^*+2}^* = \cdots = F_N^*,$$
for some unknown change point $k^*$, where $F_i^*$ is the cumulative distribution function
of $X_i$. Below we outline a test of these hypotheses, which exploits empirical process
theory, along the lines of Wolfe and Schechtman (1984) and Csörgő and Horváth
(1987). Let
$$F_{Nt}(x) = \frac{1}{N}\sum_{i=1}^{\lfloor Nt\rfloor} I\{X_i \le x\}, \quad -\infty < x < \infty,\ 0\le t\le 1,$$
denote the sequential empirical distribution function. Writing $F_N(x) = F_{N1}(x)$, the empirical quantile function
is defined as the generalized inverse of $F_N(x)$:
$$Q_N(s) = \inf\{x : F_N(x) \ge s\}, \quad 0 < s < 1.$$
The process
$$Y_N(t,s) = N^{1/2}\left(F_{Nt}(Q_N(s)) - ts\right)$$
is a version of the Wilcoxon-type process studied in Wolfe and Schechtman (1984)
and Csörgő and Horváth (1987). If
$$X_1, X_2, \ldots, X_N \ \text{are identically distributed with a continuous distribution function } F, \qquad (2.4)$$
then by the Glivenko–Cantelli theorem for the empirical distribution and quantile
functions, we get that for every 0 < t, s < 1 and −∞ < x < ∞
$$F_{Nt}(x) = \frac{\lfloor Nt\rfloor}{N}\cdot\frac{1}{\lfloor Nt\rfloor}\sum_{i=1}^{\lfloor Nt\rfloor} I\{X_i \le x\} \approx t F(x) \quad\text{and}\quad Q_N(s) \approx Q(s) = F^{-1}(s),$$
so the asymptotic value of $F_{Nt}(Q_N(s))$ is $ts$. Empirical process theory can be used
to derive the weak limit of $Y_N(t,s)$ under (2.1) and (2.4). First, we write
$$F_{Nt}(Q_N(s)) - ts = \frac{1}{N}\sum_{i=1}^{\lfloor Nt\rfloor}\left(I\{F(X_i) \le F(Q_N(s))\} - F(Q_N(s))\right) + \frac{\lfloor Nt\rfloor}{N}\left(F(Q_N(s)) - s\right) + \left(\frac{\lfloor Nt\rfloor}{N} - t\right)s. \qquad (2.5)$$
Clearly,
$$\sup_{0\le t,s\le 1}\left|\left(\frac{\lfloor Nt\rfloor}{N} - t\right)s\right| = O\!\left(\frac{1}{N}\right). \qquad (2.6)$$
Also, on account of (2.1) and (2.4), the variables $F(X_1), F(X_2), \ldots, F(X_N)$ are
independent and uniformly distributed on $[0,1]$. Hence the distribution of $Y_N(t,s)$ does
not depend on $F$. By the weak convergence of the sequential empirical process, we
can define Gaussian processes $K_N(t,s)$, $0\le t,s\le 1$, with $EK_N(t,s) = 0$ and
$EK_N(t,s)K_N(t',s') = \min(t,t')(\min(s,s') - ss')$ such that
$$\sup_{0\le t\le 1}\ \sup_{-\infty<x<\infty}\left| N^{-1/2}\sum_{i=1}^{\lfloor Nt\rfloor}\left(I\{X_i\le x\} - F(x)\right) - K_N(t, F(x))\right| = o_P(1). \qquad (2.7)$$
By the Bahadur–Kiefer representation [cf. Sect. 3.3 in Csörgő and Horváth (1993)]
we have that
$$\sup_{0\le s\le 1}\left|F(Q_N(s)) + F_N(Q(s)) - 2s\right| = o_P(N^{-1/2}). \qquad (2.8)$$
Putting together (2.6)–(2.8), we conclude
$$\sup_{0\le t,s\le 1}\left|Y_N(t,s) - \left[K_N(t, F(Q_N(s))) - t\,K_N(1, s)\right]\right| = o_P(1).$$
It also follows from (2.7) and (2.8) that
$$u_N = \sup_{0\le s\le 1}\left|F(Q_N(s)) - s\right| = O_P(N^{-1/2}). \qquad (2.9)$$
So by the almost sure continuity of the sample paths of $K_N(t,s)$ [cf. Chapter 1 in
Csörgő and Révész (1981)] we get
$$\sup_{0\le t,s\le 1}\left|K_N(t, F(Q_N(s))) - K_N(t,s)\right| \le 2\sup_{0\le t,s\le 1}\ \sup_{0\le h\le u_N}\left|K_N(t, s+h) - K_N(t,s)\right| = o_P(1).$$
Thus, we have obtained the following result:
Theorem 2.1 If (2.1) and (2.4) hold, then we can define a sequence of Gaussian
processes $K_N^\circ(t,s) = K_N(t,s) - t\,K_N(1,s)$ with $EK_N^\circ(t,s) = 0$ and
$EK_N^\circ(t,s)K_N^\circ(t',s') = (\min(t,t') - tt')(\min(s,s') - ss')$ such that
$$\sup_{0\le t,s\le 1}\left|Y_N(t,s) - K_N^\circ(t,s)\right| = o_P(1).$$
The distribution of $K_N^\circ(t,s)$ is the same for all $N$. The computation of the distributions of functionals of the limiting process is discussed in Blum et al. (1961) and
Shorack and Wellner (1986).
Statistics based on functionals of $Y_N(t,s)$ are robust and can be considered
adaptations of Wilcoxon-type statistics to change point detection. Rank-based procedures are discussed in Hušková (1997a,b) and Gombay and Hušková (1998), while
$U$-statistics are utilized in Gombay (2000, 2001) and Horváth and Hušková (2005). For
an up-to-date survey on robust change point analysis we refer to Hušková (2013).
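For concreteness, $Y_N(t,s)$ can be evaluated on a grid directly from its definition. The sketch below (helper name and grid sizes are our own choices) computes $Y_N(t,s)$ and a Riemann approximation of $\int_0^1\int_0^1 Y_N^2(t,s)\,dt\,ds$:

```python
import numpy as np

def wilcoxon_process(x, t_grid, s_grid):
    """Y_N(t,s) = N^{1/2} (F_{Nt}(Q_N(s)) - t s) evaluated on a grid."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    xs = np.sort(x)
    Y = np.empty((len(t_grid), len(s_grid)))
    for a, t in enumerate(t_grid):
        head = x[: int(np.floor(N * t))]              # first floor(Nt) points
        for b, s in enumerate(s_grid):
            q = xs[max(int(np.ceil(N * s)) - 1, 0)]   # Q_N(s)
            Y[a, b] = np.sqrt(N) * ((head <= q).sum() / N - t * s)
    return Y

t_grid = np.linspace(0.05, 1.0, 20)
s_grid = np.linspace(0.05, 0.95, 19)
rng = np.random.default_rng(3)
x = rng.normal(size=200)
Y = wilcoxon_process(x, t_grid, s_grid)
cvm = float(np.mean(Y ** 2))      # approximates the double integral
```

Under a change in distribution, the surface $Y_N(t,s)$ develops a pronounced peak near the change point, which the integrated square statistic picks up.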
Example 1 To illustrate the utility of Theorem 2.1, we consider data consisting of
the monthly average temperatures in Prague between 1775 and 1989 which were
provided to us by Dr. Jarušková (Technical University, Prague). For each month there
are 215 observations which we consider to be independent due to temporal separation.
The first panel of Fig. 1 is the plot of $Y_{215}(t,s)$ for the March averages. We used
$\int_0^1\int_0^1 Y_{215}^2(t,s)\,dt\,ds$ to test the no change in the distribution null hypothesis. Using the
critical values from Blum et al. (1961), the null hypothesis was rejected at the 0.05
level. In total the test rejected the no change null hypothesis at the 0.05 level in 9 out of
the 12 months which is consistent with the results in Horváth et al. (1999) where these
data were also considered. To illustrate the difference between the shape of $Y_N(t,s)$
under $H_0$ and $H_A$, we generated 215 independent uniform random variables on $[0,1]$.
The graph of $Y_{215}(t,s)$ for the uniform variables is in the second panel of Fig. 1. Note
that, when $H_0$ was not rejected, we do not observe the large peak in $Y_N$ that is a typical
feature under the alternative.
Fig. 1 Plots of $Y_{215}(t,s)$ for the March average temperature in Prague from 1775 to 1989 (left panel) and
for simulated uniform random variables on $[0,1]$ (right panel)
3 Empirical process technique for time series
In several applications, especially in economics and finance, it cannot be assumed that
the observations satisfy (2.1). In this section we are interested in the behavior of Page’s
procedure if we allow the independence assumption to be violated. We replace (2.1)
with
$$\{X_i,\ -\infty < i < \infty\}\ \text{is a stationary sequence.} \qquad (3.1)$$
As before, the common distribution is denoted by $F$, and we assume that
$$F\ \text{is continuous on the real line.} \qquad (3.2)$$
We show how to use empirical process theory and invariance principles to establish the
weak convergence of $Y_N(t,s)$ and to approximate the distribution of $\int_0^1\int_0^1 Y_N^2(t,s)\,dt\,ds$
under (3.1). Due to its importance in applications, the sequential empirical process
$$\alpha_N(t,x) = N^{-1/2}\sum_{i=1}^{\lfloor Nt\rfloor}\left(I\{X_i \le x\} - F(x)\right)$$
has received special attention in the literature. Assuming further conditions on
the dependence structure of the $X_i$'s, which we address below, one can define a
sequence of Gaussian processes $K_N(t,s)$, $0\le t,s\le 1$, with $EK_N(t,s) = 0$ and
$EK_N(t,s)K_N(t',s') = \min(t,t')\,C(s,s')$, where
$$C(s,s') = \min(s,s') - ss' + \sum_{i=1}^{\infty}\left[E\left(I\{F(X_0)\le s\}I\{F(X_i)\le s'\}\right) - ss'\right] + \sum_{i=1}^{\infty}\left[E\left(I\{F(X_0)\le s'\}I\{F(X_i)\le s\}\right) - ss'\right]$$
such that
$$\sup_{0\le t\le 1}\ \sup_{-\infty<x<\infty}\left|\alpha_N(t,x) - K_N(t, F(x))\right| = o_P(1). \qquad (3.3)$$
Note that the distribution of $K_N(t,s)$ does not depend on $N$. Let $K(t,s)$ be a process
satisfying $\{K(t,s),\ 0\le t,s\le 1\} \stackrel{D}{=} \{K_N(t,s),\ 0\le t,s\le 1\}$. We show that if
$$K(t,s)\ \text{has continuous sample paths with probability 1,} \qquad (3.4)$$
then the weak approximation in (3.3) yields the limit of functionals of Y N (t, s) in the
dependent case. Results establishing (3.3) under various dependence conditions can
be found in Berkes et al. (2008, 2009a), Berkes and Horváth (2001, 2003), Berkes
and Philipp (1977), Billingsley (1968), Louhichi (2000), Shao and Yu (1996) and Yu
(1993). For surveys we refer to Berkes and Horváth (2002), Bradley (2007), Dedecker
et al. (2007) and Doukhan (1994).
Theorem 3.1 If (3.1)–(3.4) hold, then we have
$$\sup_{0\le t,s\le 1}\left|Y_N(t,s) - K_N^\circ(t,s)\right| = o_P(1),$$
where $K_N^\circ(t,s) = K_N(t,s) - t\,K_N(1,s)$.
Proof On account of (2.5) and (2.6) we need to consider the joint approximation
of $\alpha_N(t, Q(s))$ and the corresponding quantile process $N^{1/2}(F(Q_N(s)) - s)$ in the
dependent case. First, we show that (3.3) implies the Bahadur–Kiefer representation of
(2.8). Using Lemma 1.1 in Csörgő and Horváth (1993), p 24 [cf. also Horváth (1984)],
and then (3.3), we conclude
$$\sup_{0\le s\le 1}\left|F(Q_N(s)) - s\right| \le \sup_{0\le s\le 1}\left|F_N(Q(s)) - s\right| = O_P(N^{-1/2}).$$
Lemma 1.2 of Csörgő and Horváth (1993), p 25, combined with (3.3) and (3.4) implies
that
$$\sup_{0\le s\le 1}\left|N^{1/2}\left(s - F(Q_N(s))\right) - \alpha_N(1, Q_N(s))\right| = o_P(1). \qquad (3.5)$$
Hence by (2.5) and (2.6) we have
$$\sup_{0\le t,s\le 1}\left|Y_N(t,s) - \left(\alpha_N(t, Q_N(s)) - t\,\alpha_N(1, Q_N(s))\right)\right| = o_P(1).$$
Now the result follows from (3.3), the almost sure continuity of the approximating
process assumed in (3.4), and (3.5).
Remark 3.1 The arguments used in the proof of Theorem 3.1 show that many results
for the empirical distribution function and the empirical process can be automatically
established for the empirical quantile and quantile processes. Not only asymptotic
normality, weak convergence and laws of the iterated logarithm, but also rates of
convergence and Bahadur–Kiefer representations (with exact rate) can be established
if we have a rate in (3.3) and the modulus of continuity of $K$ is known [cf. Chapter
1 in Csörgő and Révész (1981) on the modulus of continuity of Wiener and related
processes]. For example, the results in Dominicy et al. (2013), Dutta and Sen (1971),
Oberhofer and Haupt (2005), Sen (1968) and Wu (2005) can also be derived from
empirical process theory.
It follows immediately from Theorem 3.1 that, under the no change null hypothesis,
for every $0 < s_0 < 1$
$$C^{-1/2}(s_0, s_0)\,Y_N(t, s_0) \ \xrightarrow{D[0,1]}\ B(t),$$
where $B(t)$, $0\le t\le 1$, stands for a Brownian bridge. However, $C(s_0, s_0)$ is typically
unknown and must be estimated from the sample. A Bartlett-type estimator computed
from $I\{X_i \le Q_N(s_0)\}$, $1\le i\le N$, can be used to construct an asymptotically
consistent estimator for $C(s_0, s_0)$. For some recent results on Bartlett estimators for
the long-run variance we refer to Liu and Wu (2010) and Taniguchi and Kakizawa
(2000).
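A minimal sketch of such a Bartlett-type estimator, applied to the indicators $I\{X_i \le Q_N(s_0)\}$ (the function name and bandwidth rule are our own choices; the estimators cited above include further refinements):

```python
import numpy as np

def bartlett_lrv(z, bandwidth):
    """Bartlett-kernel long-run variance of a scalar sequence."""
    z = np.asarray(z, dtype=float)
    z = z - z.mean()
    N = len(z)
    lrv = np.mean(z * z)                    # lag-0 autocovariance
    for h in range(1, bandwidth + 1):
        w = 1.0 - h / (bandwidth + 1.0)     # triangular Bartlett weight
        lrv += 2.0 * w * np.sum(z[h:] * z[:-h]) / N
    return float(lrv)

rng = np.random.default_rng(4)
x = rng.normal(size=2000)
s0 = 0.5
q = np.sort(x)[int(np.ceil(len(x) * s0)) - 1]          # Q_N(s0)
c_hat = bartlett_lrv((x <= q).astype(float),
                     bandwidth=int(len(x) ** (1 / 3)))
# For iid data, C(s0, s0) = s0 (1 - s0) = 0.25, so c_hat should be near 0.25
```

For dependent data, the positive Bartlett weights downweight higher-lag autocovariances and keep the estimate consistent under mild mixing-type conditions.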
An alternative to estimating $C(s_0, s_0)$ is to use ratio statistics. Adapting the main
idea in Busetti and Taylor (2004), Horváth et al. (2008), Kim (2000) and Kim et al.
(2002) to our case, the testing of the stability of the quantiles can be based on the
following consequence of Theorem 3.1: for every $0 < \delta < 1$ we have
$$\sup_{\delta<t<1}\frac{\sup_{0\le u\le t}\left|Y_N(u, s_0)\right|}{\sup_{0\le u\le t}\left|Y_N(u, s_0) - (u/t)Y_N(t, s_0)\right|} \ \xrightarrow{D}\ \sup_{\delta<t<1}\frac{\sup_{0\le u\le t}\left|K^\circ(u, s_0)\right|}{\sup_{0\le u\le t}\left|K^\circ(u, s_0) - (u/t)K^\circ(t, s_0)\right|}. \qquad (3.6)$$
The limit in (3.6) is distribution free so it is easy to tabulate its distribution using
Monte Carlo simulations.
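Such a tabulation can use the fact that the limit process in (3.6) is, in its first argument, a Brownian bridge multiplied by the scale factor $C^{1/2}(s_0,s_0)$, which cancels in the ratio. A sketch (grid and repetition counts are deliberately small; a serious tabulation would use far more of both):

```python
import numpy as np

rng = np.random.default_rng(5)

def ratio_limit_draw(n_grid=200, delta=0.1):
    """One draw from the limit in (3.6), simulated on a grid."""
    t = np.arange(1, n_grid + 1) / n_grid
    w = np.cumsum(rng.normal(size=n_grid)) / np.sqrt(n_grid)
    b = w - t * w[-1]                       # Brownian bridge on the grid
    best = 0.0
    for k in range(int(delta * n_grid), n_grid):
        num = np.abs(b[: k + 1]).max()
        den = np.abs(b[: k + 1] - (t[: k + 1] / t[k]) * b[k]).max()
        if den > 0.0:
            best = max(best, num / den)
    return best

draws = [ratio_limit_draw() for _ in range(500)]
crit = float(np.quantile(draws, 0.95))      # simulated 0.95 quantile
```

The observed ratio statistic is then compared with `crit`; no nuisance parameter estimation is needed.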
The covariance function of $K^\circ$ is the product of the covariance function of a Brownian bridge and $C(s,s')$, so by the Karhunen–Loève expansion of stochastic processes
we get that
$$\int_0^1\!\!\int_0^1 \left(K^\circ(t,s)\right)^2\,dt\,ds \stackrel{D}{=} \sum_{i=1}^{\infty}\sum_{j=1}^{\infty}\frac{1}{(i\pi)^2}\,N_{i,j}^2\,\lambda_j,$$
where $N_{i,j}$, $1\le i,j<\infty$, are independent standard normal random variables and
$\lambda_1 \ge \lambda_2 \ge \cdots$ are the eigenvalues of the operator associated with $C(s,s')$, i.e.
$$\lambda_i \varphi_i(s') = \int_0^1 C(s,s')\varphi_i(s)\,ds, \quad 1\le i<\infty, \qquad (3.7)$$
with eigenfunctions $\varphi_1, \varphi_2, \ldots$. Since $C(s,s')$ is unknown, first we consider its estimation from the sample. The estimation is based on the observation that $C(s,s')$ is
also the long-run covariance function of the random functions $I\{X_1 \le Q(s)\}, I\{X_2 \le Q(s)\}, \ldots, I\{X_N \le Q(s)\}$. Since $Q$ is unknown, the estimation of $C(s,s')$ is based on
$I\{X_1 \le Q_N(s)\}, I\{X_2 \le Q_N(s)\}, \ldots, I\{X_N \le Q_N(s)\}$. We suggest $\hat C_N$, the kernel-based Bartlett estimator of Horváth et al. (2013b) [cf. also Horváth and Kokoszka
(2012)]. It can be shown under very mild assumptions on the dependence structure of
the observations that
$$\int_0^1\!\!\int_0^1 \left(\hat C_N(s,s') - C(s,s')\right)^2\,ds\,ds' = o_P(1), \qquad (3.8)$$
which implies for any fixed $i$ that
$$|\hat\lambda_{i,N} - \lambda_i| = o_P(1), \qquad (3.9)$$
where $\hat\lambda_{1,N} \ge \hat\lambda_{2,N} \ge \cdots$ are the eigenvalues of the operator associated with $\hat C_N(s,s')$,
i.e.
$$\hat\lambda_{i,N}\,\hat\varphi_{i,N}(s') = \int_0^1 \hat C_N(s,s')\,\hat\varphi_{i,N}(s)\,ds. \qquad (3.10)$$
The result in (3.9) is an immediate consequence of (3.7), (3.8) and (3.10). If $N$, $d(1)$
and $d(2)$ are sufficiently large, then
$$P\left\{\int_0^1\!\!\int_0^1 Y_N^2(t,s)\,dt\,ds \le x\right\} \approx P\left\{\sum_{i=1}^{d(1)}\sum_{j=1}^{d(2)}\frac{1}{(i\pi)^2}\,N_{i,j}^2\,\hat\lambda_{j,N} \le x\right\} \quad\text{for all } x.$$
Usually $d(1)$ and $d(2)$ are chosen using the cumulative variance approach, see Sect. 8.
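The truncated double sum above is straightforward to simulate once the empirical eigenvalues $\hat\lambda_{j,N}$ are available. A sketch (function name ours; as a stand-in for estimated eigenvalues we use $\lambda_j = 1/(j\pi)^2$, the Brownian bridge eigenvalues arising in the iid case):

```python
import numpy as np

rng = np.random.default_rng(6)

def limit_cdf(x, eigvals, d1=50, reps=4000):
    """Monte Carlo P{ sum_{i<=d1} sum_j N_{ij}^2 lam_j / (i pi)^2 <= x }."""
    lam = np.asarray(eigvals, dtype=float)
    i = np.arange(1, d1 + 1)[:, None]
    coefs = lam[None, :] / (i * np.pi) ** 2            # (d1, d2) weights
    draws = (rng.chisquare(1, size=(reps, d1, len(lam))) * coefs).sum(axis=(1, 2))
    return float(np.mean(draws <= x))

# Brownian bridge eigenvalues as a stand-in for estimated ones (iid case)
lam_hat = 1.0 / (np.arange(1, 21) * np.pi) ** 2
p = limit_cdf(0.06, lam_hat)
```

Critical values are obtained by inverting this simulated distribution function, with `eigvals` replaced by the $\hat\lambda_{j,N}$ computed from $\hat C_N$.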
Fremdt (2013, 2014) develops some interesting applications of Page’s procedure
to sequential stability testing. Empirical U -quantiles are used in Dehling and Fried
(2012) to detect changes in the location parameter in case of dependent data.
4 Darling–Erdős laws and change in correlations
The survey paper Aue and Horváth (2013) explains the applications of CUSUMs
(cumulative sums) including their weighted and self-normalized versions. The maximally selected self-normalized CUSUM is the maximally selected two-sample likelihood ratio when the errors are normally distributed, and its distribution was derived
in Horváth (1993). It is also noted in Aue and Horváth (2013) that a weak invariance
principle for the sums of the observations is enough to obtain the limit distribution
Fig. 2 Plots of $|Z_{215}(t)|$ (left panel) and $|Z_{215}(t)|/\sqrt{t(1-t)}$ (right panel) for the March average temperature in Prague from 1775 to 1989
of CUSUM statistics, but the derivation of the limit of the maximally selected self-normalized CUSUM requires an approximation of the partial sums of the observations
with a rate. In this case, the limit distribution is in the extreme value class of distributions. Due to the nonstandard nature of the limit, Andrews (1993) argued that
the maximum must be computed over a restricted range,
i.e. the maximum is trimmed. It must then be implicitly assumed that the break does
not occur in a given percentage of the beginning or the end of the data. The limit
of the trimmed maximum depends on several unknown parameters and functions so
one usually needs resampling to get critical values. Andrews’ approach has been criticized by, among others, Hidalgo and Seo (2013) who pointed out that the choice of
the trimming parameter is arbitrary and the test loses power if the change occurs in
the trimmed off observations. In light of the comments in Hidalgo and Seo (2013),
it would be interesting to reinvestigate the likelihood ratio test for multiple changes
in Bai (1999) when the whole sample is used. All the proofs of the limit distribution
of the likelihood ratio tests and the maximally selected self-normalized CUSUM are
based on the following technique: (1) first, it is shown that the underlying process
can be approximated with suitably chosen Gaussian processes; (2) a limit theorem is
proven for the Gaussian-based process. This technique first appeared in Darling and
Erdős (1956), and therefore the limits of the maxima of self-normalized statistics are
usually referred to as Darling–Erdős laws. For more information on Darling–Erdős
laws we refer to Csörgő and Horváth (1993, 1997).
Example 2 As an illustration of the CUSUM methods we reinvestigated the temperature data used in Example 1. In this case, we consider the test of $H_0: \mu_1 = \mu_2 = \cdots = \mu_N$, where $\mu_i = EX_i$, against the one change in the mean alternative. The
CUSUM process is defined as
$$Z_N(t) = \frac{1}{N^{1/2}\hat\sigma_N}\left(\sum_{i=1}^{\lfloor Nt\rfloor} X_i - t\sum_{i=1}^{N} X_i\right), \quad 0\le t\le 1,$$
where $X_1, X_2, \ldots, X_N$ denote the March averages and $\hat\sigma_N$ stands for the sample standard deviation. The left panel of Fig. 2 exhibits the graph of $|Z_N(t)|$ together with
the 0.05 critical value (red line) computed from the distribution of $\sup_{0\le t\le 1}|B(t)|$,
where $B$ is a Brownian bridge. The right panel is the graph of $|Z_N(t)|/(t(1-t))^{1/2}$. The rate of convergence in the Darling–Erdős type limit results is slow
[cf. Berkes et al. (2004c)], so we used the 0.05 critical value recommended in Gombay
and Horváth (1996) (the red line in the right panel). The no change in the mean null
hypothesis is rejected by both tests. This is consistent with our findings in Example 1.
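A minimal version of this CUSUM computation (illustrated on simulated data rather than the Prague series; the asymptotic 0.05 critical value of $\sup_{0\le t\le 1}|B(t)|$ is approximately 1.358, from the Kolmogorov distribution):

```python
import numpy as np

def cusum_process(x):
    """Z_N(t) at t = k/N, self-normalized by the sample standard deviation."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    t = np.arange(1, N + 1) / N
    return t, (np.cumsum(x) - t * x.sum()) / (np.sqrt(N) * x.std(ddof=1))

rng = np.random.default_rng(7)
# Mean change of size 1 at the midpoint of the sample
x = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(1.0, 1.0, 100)])
t, z = cusum_process(x)
reject = np.abs(z).max() > 1.358            # asymptotic 0.05 critical value
```

The argmax of $|Z_N(t)|$ also serves as the usual estimator of the change point location.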
The CUSUM method was also applied to detect changes in variances as in Gombay et al. (1996), Inclán and Tiao (1994) and Lee and Park (2001). For applications
to stock prices and air traffic we refer to Hsu (1979). CUSUM methodology was
extended to detect the stability of the covariances of the observation vectors in Berkes
et al. (2009c). The proofs in their paper are based on weighted approximations of
partial sums. Change detection in linear processes was investigated by Hušková et al.
(2007, 2008), Kirch (2007a,b), Kirch and Politis (2011) and Kirch and Steinebach
(2006). Horváth and Serbinowska (1995) and Batsidis et al. (2013) propose several
methods, including self-normalized and maximally selected $\chi^2$ statistics, to detect
changes in multinomial data. Their methods are used to determine the number of
translators of the Lindisfarne Gospels. Due to the popularity of volatility models (i.e.
when the conditional variance depends on the previous observations and variances),
several papers have been devoted to testing the stability of ARCH- and GARCH-type
processes. For a general introduction and survey on volatility models we refer to Francq
and Zakoïan (2010). In a nonlinear time series setting, parametric procedures were
utilized by Berkes et al. (2004a) and Kokoszka and Leipus (2000) to detect breaks
in the parameters of ARCH and GARCH processes, and by Berkes et al. (2004b) to
sequentially monitor for breaks in the parameters of GARCH processes. For more general nonlinear processes we refer to Kirch and Tadjuidje Kamgaing (2012). Andreou
and Ghysels (2002) were concerned with the dynamic evolution of financial market
volatilities. Davis et al. (2008) analyze parametric nonlinear time series models by
means of minimum description length procedures.
In the rest of this section we focus on the application of the CUSUM method to detect
instability in correlations following Aue et al. (2009c). We observe the d-dimensional
random vectors y1 , y2 , . . . , y N and wish to check
$$H_0: \operatorname{Cov}(\mathbf{y}_1) = \operatorname{Cov}(\mathbf{y}_2) = \cdots = \operatorname{Cov}(\mathbf{y}_N)$$
against the alternative
$$H_A: \text{there is an integer } k^* \text{ such that } \operatorname{Cov}(\mathbf{y}_1) = \cdots = \operatorname{Cov}(\mathbf{y}_{k^*}) \neq \operatorname{Cov}(\mathbf{y}_{k^*+1}) = \cdots = \operatorname{Cov}(\mathbf{y}_N).$$
As before, k ∗ is the time of change and is unknown. In the derivation below we assume
that the means of the observations do not change. This may require an initial test of
constancy of the means as described in Aue and Horváth (2013), and if necessary, a
transformation of the data so that the expected values can be regarded as stable for
the whole observation period. To construct a test statistic for distinguishing between
$H_0$ and $H_A$, we let $\operatorname{vech}(\cdot)$ be the operator that stacks the columns on and below the
diagonal of a symmetric $d\times d$ matrix into a vector with $\tilde d = d(d+1)/2$ components.
For a detailed study of the vech operator we refer to Horn and Johnson (1991). The
CUSUM will be constructed from
$$\tilde S_k = \frac{1}{N^{1/2}}\left\{\sum_{j=1}^{k}\operatorname{vech}(\tilde{\mathbf{y}}_j\tilde{\mathbf{y}}_j^T) - \frac{k}{N}\sum_{j=1}^{N}\operatorname{vech}(\tilde{\mathbf{y}}_j\tilde{\mathbf{y}}_j^T)\right\}, \quad 1\le k\le N,$$
where $\cdot^T$ denotes the transpose of vectors and matrices and
$$\tilde{\mathbf{y}}_j = \mathbf{y}_j - \bar{\mathbf{y}}_N \quad\text{with}\quad \bar{\mathbf{y}}_N = \frac{1}{N}\sum_{j=1}^{N}\mathbf{y}_j.$$
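The vech operator itself is simple to implement; a sketch in column-major order (stacking each column from the diagonal down):

```python
import numpy as np

def vech(a):
    """Stack the columns on and below the diagonal of a symmetric matrix."""
    a = np.asarray(a)
    # column j contributes entries a[j, j], a[j+1, j], ..., a[d-1, j]
    return np.concatenate([a[j:, j] for j in range(a.shape[0])])

A = np.array([[1.0, 2.0],
              [2.0, 3.0]])
v = vech(A)          # array([1., 2., 3.]), length d(d+1)/2 = 3
```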
The variables $\tilde S_k$, $1\le k\le N$, form a vector-valued CUSUM sequence. The asymptotic
properties of $\tilde S_k$, $1\le k\le N$, are determined by the partial sum process of $\operatorname{vech}\{(\mathbf{y}_i - E\mathbf{y}_i)(\mathbf{y}_i - E\mathbf{y}_i)^T\}$. Let
$$\Sigma = \sum_{i=-\infty}^{\infty}\operatorname{Cov}\left(\operatorname{vech}\{(\mathbf{y}_0 - E\mathbf{y}_0)(\mathbf{y}_0 - E\mathbf{y}_0)^T\},\ \operatorname{vech}\{(\mathbf{y}_i - E\mathbf{y}_i)(\mathbf{y}_i - E\mathbf{y}_i)^T\}\right)$$
denote the long-run covariance matrix. The matrix $\Sigma$ is typically unknown, and we
assume that it can be estimated asymptotically consistently by $\hat\Sigma_N$, i.e.
$$|\hat\Sigma_N - \Sigma| = o_P(1). \qquad (4.1)$$
In addition to Taniguchi and Kakizawa (2000) and Liu and Wu (2010), a discussion
of Bartlett-type kernel estimators satisfying (4.1) is in Brockwell and Davis (1991).
The proposed estimators take the possible change into account, so (4.1) holds under
the null as well as under the change point alternative. If the possibility of a change is
not built into the definition of $\hat\Sigma_N$, the power of the tests $\Lambda_N^{(i)}$, $i = 1, \ldots, 4$, defined
in this section might be reduced.
The statistics
$$\Lambda_N^{(1)} = \max_{1\le k\le N}\tilde S_k^T\,\hat\Sigma_N^{-1}\,\tilde S_k \quad\text{and}\quad \Lambda_N^{(2)} = \frac{1}{N}\sum_{k=1}^{N}\tilde S_k^T\,\hat\Sigma_N^{-1}\,\tilde S_k$$
were suggested in Aue et al. (2009c) and their asymptotic distributions were established
when the sequence yk , −∞ < k < ∞ is a stationary, m-decomposable sequence.
However, their assumption can be replaced with other measures of dependence, like
mixing or near-epoch dependence conditions [cf. Wied et al. (2012)]. Under H0 and
certain regularity conditions it is shown in Aue et al. (2009c) that
$$\Lambda_N^{(1)} \ \xrightarrow{D}\ \Lambda^{(1)} = \sup_{0\le t\le 1}\sum_{i=1}^{\tilde d} B_i^2(t) \qquad (4.2)$$
and
$$\Lambda_N^{(2)} \ \xrightarrow{D}\ \Lambda^{(2)} = \sum_{i=1}^{\tilde d}\int_0^1 B_i^2(t)\,dt, \qquad (4.3)$$
where $B_1, B_2, \ldots, B_{\tilde d}$ are independent Brownian bridges. Formulas and critical values
for the distribution functions of $\Lambda^{(1)}$ and $\Lambda^{(2)}$ are given in Kiefer (1959). It is interesting
to note that if $\tilde d$ is large, both $\Lambda^{(1)} = \Lambda^{(1)}(\tilde d)$ and $\Lambda^{(2)} = \Lambda^{(2)}(\tilde d)$ can be approximated
with normal random variables. Namely, both
$$\bar\Lambda^{(1)} = \frac{\Lambda^{(1)}(\tilde d) - \tilde d/4}{(\tilde d/8)^{1/2}} \quad\text{and}\quad \bar\Lambda^{(2)} = \frac{\Lambda^{(2)}(\tilde d) - \tilde d/6}{(\tilde d/45)^{1/2}}$$
converge in distribution to a standard normal random variable as $\tilde d \to \infty$.
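The max- and average-type covariance CUSUM statistics above reduce to quadratic forms of the sequence $\tilde S_k$. A sketch (the identity matrix stands in for the inverse long-run covariance estimate purely for illustration; in practice a consistent estimator of $\Sigma$ is required):

```python
import numpy as np

def vech(a):
    return np.concatenate([a[j:, j] for j in range(a.shape[0])])

def cov_cusum_stats(y, sigma_inv):
    """Max- and average-type statistics from the CUSUM sequence S~_k."""
    y = np.asarray(y, dtype=float)           # shape (N, d)
    N = y.shape[0]
    yc = y - y.mean(axis=0)                  # centered observations y~_j
    v = np.stack([vech(np.outer(r, r)) for r in yc])
    cums = np.cumsum(v, axis=0)
    k = np.arange(1, N + 1)[:, None]
    s = (cums - (k / N) * cums[-1]) / np.sqrt(N)       # S~_k, 1 <= k <= N
    q = np.einsum('ki,ij,kj->k', s, sigma_inv, s)      # S~_k^T A S~_k
    return float(q.max()), float(q.mean())

rng = np.random.default_rng(8)
y = rng.normal(size=(300, 2))                # d = 2, so vech has 3 components
lam1, lam2 = cov_cusum_stats(y, np.eye(3))
```

By construction the max-type statistic dominates the average-type one, and both vanish at $k = N$ since $\tilde S_N = 0$.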
We can define another type of test statistic by comparing the sample means of $\operatorname{vech}(\tilde{\mathbf{y}}_j\tilde{\mathbf{y}}_j^T)$,
$1\le j\le k$, to the sample means of $\operatorname{vech}(\tilde{\mathbf{y}}_j\tilde{\mathbf{y}}_j^T)$, $k+1\le j\le N$. If $H_0$ holds, then
$$\operatorname{Cov}\left\{\frac{1}{k}\sum_{j=1}^{k}\operatorname{vech}(\tilde{\mathbf{y}}_j\tilde{\mathbf{y}}_j^T) - \frac{1}{N-k}\sum_{j=k+1}^{N}\operatorname{vech}(\tilde{\mathbf{y}}_j\tilde{\mathbf{y}}_j^T)\right\} \approx \left(\frac{1}{k} + \frac{1}{N-k}\right)\Sigma,$$
so other natural statistics are
$$\Lambda_N^{(3)} = \max_{1\le k<N}\left\{\frac{1}{k}\sum_{j=1}^{k}\operatorname{vech}(\tilde{\mathbf{y}}_j\tilde{\mathbf{y}}_j^T) - \frac{1}{N-k}\sum_{j=k+1}^{N}\operatorname{vech}(\tilde{\mathbf{y}}_j\tilde{\mathbf{y}}_j^T)\right\}^T \left[\left(\frac{1}{k} + \frac{1}{N-k}\right)\hat\Sigma_N\right]^{-1} \left\{\frac{1}{k}\sum_{j=1}^{k}\operatorname{vech}(\tilde{\mathbf{y}}_j\tilde{\mathbf{y}}_j^T) - \frac{1}{N-k}\sum_{j=k+1}^{N}\operatorname{vech}(\tilde{\mathbf{y}}_j\tilde{\mathbf{y}}_j^T)\right\}$$
and
$$\Lambda_N^{(4)} = \frac{1}{N}\sum_{k=1}^{N-1}\left\{\frac{1}{k}\sum_{j=1}^{k}\operatorname{vech}(\tilde{\mathbf{y}}_j\tilde{\mathbf{y}}_j^T) - \frac{1}{N-k}\sum_{j=k+1}^{N}\operatorname{vech}(\tilde{\mathbf{y}}_j\tilde{\mathbf{y}}_j^T)\right\}^T \left[\left(\frac{1}{k} + \frac{1}{N-k}\right)\hat\Sigma_N\right]^{-1} \left\{\frac{1}{k}\sum_{j=1}^{k}\operatorname{vech}(\tilde{\mathbf{y}}_j\tilde{\mathbf{y}}_j^T) - \frac{1}{N-k}\sum_{j=k+1}^{N}\operatorname{vech}(\tilde{\mathbf{y}}_j\tilde{\mathbf{y}}_j^T)\right\}.$$
The statistic $\Lambda_N^{(3)}$ is a maximally selected self-normalized quadratic form, while $\Lambda_N^{(4)}$
is a sum of self-normalized CUSUMs. Following the proofs of (4.2) and (4.3), one
can show that under $H_0$
$$\Lambda_N^{(4)} \ \xrightarrow{D}\ \Lambda^{(4)} = \sum_{i=1}^{\tilde d}\int_0^1 \frac{B_i^2(t)}{t(1-t)}\,dt. \qquad (4.4)$$
However, it is much harder and more delicate to obtain the limit distribution of $\Lambda_N^{(3)}$. The
main difficulty is that the weak convergence type arguments leading to (4.2)–(4.4) need
to be replaced with an approximation of the partial sums of $\operatorname{vech}[(\mathbf{y}_j - E\mathbf{y}_j)(\mathbf{y}_j - E\mathbf{y}_j)^T]$, $1\le j\le N$, with a suitable rate. Also, we need to replace (4.1) with
$$|\hat\Sigma_N - \Sigma| = o_P(1/\log\log N). \qquad (4.5)$$
To prove a Darling–Erdős law for $\Lambda_N^{(3)}$, first we need to find an approximation for the
partial sums of $\operatorname{vech}[(\mathbf{y}_i - E\mathbf{y}_i)(\mathbf{y}_i - E\mathbf{y}_i)^T]$, $1\le i\le N$. Namely, one must show
that there are $\boldsymbol{\xi}_1 = \boldsymbol{\xi}_1(N), \boldsymbol{\xi}_2 = \boldsymbol{\xi}_2(N), \ldots, \boldsymbol{\xi}_N = \boldsymbol{\xi}_N(N)$, independent identically
distributed standard normal random vectors in $\mathbb{R}^{\tilde d}$, such that
$$\max_{1\le k<n\le N}\frac{1}{(n-k)^{1/2-\epsilon}}\left|\sum_{j=k+1}^{n}\operatorname{vech}[(\mathbf{y}_j - E\mathbf{y}_j)(\mathbf{y}_j - E\mathbf{y}_j)^T] - \Sigma^{1/2}\sum_{j=k+1}^{n}\boldsymbol{\xi}_j\right| = O_P(1) \qquad (4.6)$$
with some $\epsilon > 0$, and
$$\max_{1\le k\le N}\left|\sum_{j=1}^{k}\operatorname{vech}[(\mathbf{y}_j - E\mathbf{y}_j)(\mathbf{y}_j - E\mathbf{y}_j)^T] - \Sigma^{1/2}\sum_{j=1}^{k}\boldsymbol{\xi}_j\right| = o_P(N^{1/2}). \qquad (4.7)$$
The approximations assumed in (4.6) and (4.7) have been established under a large
variety of dependence conditions [cf., for example, Berkes et al. (2014), Berkes and
Philipp (1979), Bradley (2007) and Eberlein (1986)]. Using (4.6) and (4.7), or even
weaker versions of these, the arguments in Csörgő and Horváth (1993, 1997) can be
repeated [cf. also Aue and Horváth (2013)] and the following result can be established
under the no change in the covariances null hypothesis:
$$\lim_{N\to\infty} P\left\{(2\log\log N)\,\Lambda_N^{(3)} \le \left(x + 2\log\log N + \frac{\tilde d}{2}\log\log\log N - \log\Gamma(\tilde d/2)\right)^2\right\} = \exp(-2e^{-x}) \qquad (4.8)$$
for all $x$, where $\Gamma(t)$ denotes the Gamma function.
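Inverting the limit relation (4.8) gives asymptotic critical values directly; a sketch (function name ours, and the formula is only trustworthy for $N$ large enough that the iterated logarithms are positive):

```python
from math import lgamma, log

def de_critical_value(N, d_tilde, alpha):
    """Asymptotic level-alpha critical value from the limit in (4.8)."""
    # exp(-2 e^{-x}) = 1 - alpha  =>  x = -log(-log(1 - alpha) / 2)
    x = -log(-log(1.0 - alpha) / 2.0)
    loglogN = log(log(N))
    # norming constants from (4.8); lgamma is log of the Gamma function
    b = 2.0 * loglogN + (d_tilde / 2.0) * log(loglogN) - lgamma(d_tilde / 2.0)
    # reject when (2 log log N) * statistic >= (x + b)^2
    return (x + b) ** 2 / (2.0 * loglogN)
```

As the surrounding discussion notes, the convergence in (4.8) is slow, so such plug-in critical values should be treated with caution for moderate $N$.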
Fig. 3 AIG stock prices and the Housing Price Index between 02 January, 1991 and 31 January, 2013
It has been observed that the rate of convergence in (4.8) might be slow, and large N is needed to use the limit to approximate the distribution of T_N^{(3)}. It is pointed out in Davis et al. (1995) that the result in (4.8) remains valid with different, but of course asymptotically equivalent, norming sequences, and the choice of these sequences might affect the finite sample performance. Due to the slow rate of convergence, resampling methods can be used to get critical values for T_N^{(3)}. The rate of convergence in Darling–Erdős laws was studied in Berkes et al. (2004c) in a very simple case; they proved that the rate of convergence of the permutation resampling is better than the rate of convergence in the Darling–Erdős limit theorem for the maximally selected CUSUM.
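To make the role of the limit in (4.8) concrete, the following sketch computes a weighted CUSUM of the vech'd centered outer products together with the asymptotic critical value implied by the double-exponential limit. The function names are ours, and the standardization uses the scalar weight k(N − k)/N rather than the long-run covariance matrix Σ, so this is an illustration of the construction, not the exact statistic of the paper.

```python
import numpy as np

def vech(m):
    """Stack the lower-triangular part (diagonal included) of a square matrix."""
    return m[np.tril_indices(m.shape[0])]

def cov_change_cusum(y):
    """Maximally selected weighted CUSUM built from the vech'd centered outer
    products of the rows of y (shape N x p); a simplified stand-in for the
    statistic behind (4.8), standardized by the weight k(N - k)/N instead of
    the long-run covariance matrix Sigma."""
    N = y.shape[0]
    yc = y - y.mean(axis=0)
    v = np.array([vech(np.outer(r, r)) for r in yc])
    s = np.cumsum(v, axis=0)                  # partial sums of the vech terms
    total = s[-1]
    stat = 0.0
    for k in range(1, N):
        d = s[k - 1] - (k / N) * total        # CUSUM at time k
        stat = max(stat, d @ d / (k * (N - k) / N))
    return stat

def darling_erdos_critical_value(alpha):
    """x solving exp(-2 e^{-x}) = 1 - alpha, i.e. the asymptotic critical
    value delivered by the Darling-Erdos limit law in (4.8)."""
    return -np.log(-np.log(1.0 - alpha) / 2.0)

rng = np.random.default_rng(0)
calm = rng.normal(size=(100, 2))              # unit covariance
wild = 3.0 * rng.normal(size=(100, 2))        # inflated covariance after the break
stat = cov_change_cusum(np.vstack([calm, wild]))
print(round(darling_erdos_critical_value(0.05), 2))  # ≈ 3.66
```

A change in the covariance produces a drift in the partial sums of the vech terms, so the maximally selected statistic far exceeds the asymptotic critical value in this contaminated sample.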
Correlations instead of covariances are used in Wied et al. (2012) to check the
stability of the connection between the coordinates of the observations, while the
method in Wied et al. (2014) is based on the nonparametric Spearman’s rho.
Example 3 In this example, we examine the relationship between the stock price of
American International Group (AIG) and the Housing Price Index (HPI) published
by the US Federal Housing Finance Agency. The data we consider are available at
http://finance.yahoo.com and http://www.fhfa.gov, and graphs of these observations
from January, 1991 to January, 2013 are displayed in Fig. 3. Major losses are visible
in both graphs around the summer of 2008 which correspond to the collapse of the
US housing bubble and the beginning of a global financial crisis. The monthly log
returns of the stock price of AIG as well as the detrended monthly log returns of the
HPI are displayed in Fig. 4. Both sequences show typical GARCH features [cf. Francq
and Zakoïan (2010)]. Maximally selected CUSUM procedures suggest that both time
series experience a change in volatility at roughly the same time. CUSUM procedures
can also be used to test the stability of the correlation between the log returns of the
AIG stock price and those of the HPI over the observation period as in Wied et al.
(2012). Let ρ̂k and ρ̄ N −k denote the sample correlations computed from the first k
and the last N − k observations, respectively. The corresponding CUSUM process is
defined by
Fig. 4 The monthly log returns on the AIG stock price and on the Housing Price Index between 02 January,
1991 and 31 January, 2013 are on the first two panels. The detrended log returns on the Housing Price Index
are on the third panel
Fig. 5 Plots of |R_264(t)| (left panel) and |R_264(t)|/√(t(1 − t)) (right panel) for the correlation between the AIG and housing index monthly log returns
R_N(t) = (t(1 − t) N^{1/2} / σ̂_N) (ρ̂_{⌊Nt⌋} − ρ̄_{N−⌊Nt⌋}), 0 ≤ t ≤ 1,
where σ̂ N is a Bartlett-type estimator of the variance of the summands in the correlation.
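A minimal numerical sketch of this correlation CUSUM, with the Bartlett-type estimator σ̂_N replaced by a naive plug-in and all names ours:

```python
import numpy as np

def correlation_cusum(x, y):
    """|R_N(t)| on the grid t = k/N: forward and backward sample correlations,
    weighted by t(1 - t) sqrt(N) as in the displayed formula.  The
    standardization sigma_hat is a naive plug-in (set to 1 here); the paper
    uses a Bartlett-type long-run variance estimator."""
    N = len(x)
    sigma_hat = 1.0                                # placeholder standardization
    stats = []
    for k in range(2, N - 1):
        r_fwd = np.corrcoef(x[:k], y[:k])[0, 1]    # rho_hat from the first k obs
        r_bwd = np.corrcoef(x[k:], y[k:])[0, 1]    # rho_bar from the last N - k obs
        t = k / N
        stats.append(abs(t * (1 - t) * np.sqrt(N) * (r_fwd - r_bwd)) / sigma_hat)
    return np.array(stats)

rng = np.random.default_rng(1)
# correlation 0 before the break, roughly 0.9 afterwards
x = rng.normal(size=300)
y = np.r_[rng.normal(size=150), 0.9 * x[150:] + 0.4 * rng.normal(size=150)]
stats = correlation_cusum(x, y)
k_hat = int(np.argmax(stats)) + 2                  # peak of |R_N|, candidate change point
```

As in Fig. 5, the trajectory of the statistic has a definitive peak near the true change in the correlation.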
The graphs of |R N (t)| and its weighted version are shown in Fig. 5. In each case the
process has a definitive peak which corresponds to the month of August in 2008; the
month immediately following the signing of the Housing and Economic Recovery
Act and one month before Lehman Brothers Holdings Inc. filed for bankruptcy. The
correlation of the monthly log returns of the stock price of AIG and the HPI in the
months prior to August 2008 was 0.011 compared to a correlation of 0.305 after August
2008. We observed similar results when considering other major US banks and lending companies.
5 Regressions
Following Quandt (1958, 1960) we consider linear regression with one possible change
in a subset of the regressors:
y_i = x_{i,1}^T (β + δ I{i > k*}) + x_{i,2}^T γ + ε_i, 1 ≤ i ≤ N,   (5.1)
where xi,1 ∈ R d , xi,2 ∈ R p are known column vectors, and k ∗ , the time of change,
is unknown. The regression parameter β ∈ R d changes to β + δ ∈ R d at time k ∗
while γ ∈ R p remains constant during the observation period. All the parameters are
unknown but it is assumed that δ ≠ 0. Under the null hypothesis

H_0 : k* ≥ N,   (5.2)

while under the alternative

H_A : 1 < k* < N.   (5.3)
In the model above not all regressors are changing, only a subset of them. This problem was first studied in Quandt (1958, 1960) when all regressors could change, i.e. it is known that γ = 0. We use the quasi-likelihood method, i.e. we assume that the errors ε_i, 1 ≤ i ≤ N are independent normal random variables with unknown variance σ². Assuming that the change occurs at time k* = k, this is a standard two-sample problem and the likelihood ratio Λ_k can be easily and explicitly computed. For details we refer to Sect. 3.1.1 of Csörgő and Horváth (1997).
Since the time of change is unknown, we use the maximally selected likelihood

Z_N = max_{d+p<k<N−(d+p)} (−2 log Λ_k).
It was pointed out in Quandt (1958, 1960) that even under H_0 we have that Z_N → ∞ in probability, which was misinterpreted as meaning that Z_N does not have a limit distribution. Following Andrews (1993) the truncated statistic

Z_{N,δ} = max_{Nδ≤k≤N−Nδ} (−2 log Λ_k)

has been considered in the literature since under H_0 it has a limit distribution for every 0 < δ < 1/2. However, the power depends strongly on δ and the unknown k*. Also,
the limit distribution depends heavily on the unknown parameters. The statistic Z N
does not have these drawbacks.
Let

X_{11,k} = (x_{1,1}, x_{2,1}, ..., x_{k,1})^T,   X_{12,k} = (x_{1,2}, x_{2,2}, ..., x_{k,2})^T,

and

X_{21,k} = (x_{k+1,1}, x_{k+2,1}, ..., x_{N,1})^T,   X_{22,k} = (x_{k+1,2}, x_{k+2,2}, ..., x_{N,2})^T,

i.e. the design matrices with rows x_{i,1}^T and x_{i,2}^T before and after time k.
We assume that there are matrices A_{1,1}, A_{1,2}, A_{2,2} such that

(1/k) X_{11,k}^T X_{11,k} ≈ A_{1,1},   (1/k) X_{11,k}^T X_{12,k} ≈ A_{1,2},   (1/k) X_{12,k}^T X_{12,k} ≈ A_{2,2},   (5.4)

(1/(N − k)) X_{21,k}^T X_{21,k} ≈ A_{1,1},   (1/(N − k)) X_{21,k}^T X_{22,k} ≈ A_{1,2},   (1/(N − k)) X_{22,k}^T X_{22,k} ≈ A_{2,2},   (5.5)

and

the matrix A = ( A_{1,1}  A_{1,2} ; A_{1,2}^T  A_{2,2} ) has rank d + p.   (5.6)
Assuming that H_0 holds, (5.4)–(5.6) are satisfied and ε_1, ε_2, ..., ε_N are independent and identically distributed with enough moments, then one can prove that

lim_{N→∞} P{(2 log log N)^{1/2} Z_N^{1/2} ≤ x + 2 log log N + (d/2) log log log N − log Γ(d/2)} = exp(−2e^{−x})   (5.7)

for all x, where Γ(·) denotes the Gamma function. The limit result in (5.7) was obtained in Horváth (1995) and it is also given in Sect. 3.1.1 of Csörgő and Horváth (1997).
Conditions (5.4)–(5.6) are satisfied not only by a large class of numerical sequences but also by realizations of stationary and ergodic variables. We used “≈” in (5.4) and
(5.5) since it can also mean closeness in probability. Likelihood methods can also be
extended to include a possible change in the variance under the alternative. The natural
estimator for k* is argmax_k (−2 log Λ_k). The asymptotic properties of this estimator
are investigated in Horváth et al. (1997) and Hušková (1996). Chapter 3 of Csörgő and
Horváth (1997) contains results on the power of the maximally selected likelihood and
further testing methods. The proof of (5.7) is based on a strong approximation of the
partial sums of the vectors (xi,1 + xi,2 )εi , 1 ≤ i ≤ N , so it can be extended to the case
when the errors εi , 1 ≤ i ≤ N , are dependent. However, since Z N was derived under
the assumption of independence, adjustments must be made to the limit result in (5.7)
involving the long-run covariance matrix of the sum of (xi,1 + xi,2 )εi , 1 ≤ i ≤ N [cf.
(5.10) below].
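The scan over k can be sketched as follows for the case where all coefficients may change (the γ = 0 case). With normal quasi-likelihood and a common error variance, −2 log Λ_k reduces to N log(RSS_0/(RSS_1 + RSS_2)), and the trimming mirrors the range d + p < k < N − (d + p). Function and variable names here are illustrative.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

def max_lr_scan(X, y, trim):
    """Maximally selected (quasi-)likelihood ratio for one change in all
    regression coefficients: with normal errors and a common unknown variance,
    -2 log Lambda_k = N log(RSS_0 / (RSS_1 + RSS_2))."""
    N = len(y)
    rss0 = rss(X, y)                        # fit under no change
    best, k_best = -np.inf, None
    for k in range(trim, N - trim):
        stat = N * np.log(rss0 / (rss(X[:k], y[:k]) + rss(X[k:], y[k:])))
        if stat > best:
            best, k_best = stat, k
    return best, k_best

rng = np.random.default_rng(2)
N = 200
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta0, beta1 = np.array([0.0, 1.0]), np.array([1.0, -1.0])   # change at i = 120
y = np.where(np.arange(N) < 120, X @ beta0, X @ beta1) + 0.5 * rng.normal(size=N)
z_n, k_hat = max_lr_scan(X, y, trim=10)
```

The argmax of −2 log Λ_k then serves as the natural estimator for k*, as discussed above.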
Next we consider the model

y_i = x_i^T (β + δ I{i > k*}) + ε_i, 1 ≤ i ≤ N,   (5.8)

where

x_i = h(i/N), 1 ≤ i ≤ N.   (5.9)
As before δ ≠ 0 and we wish to test (5.2) against (5.3). Assumption (5.9) contains the polynomial and harmonic regression in Jandhyala and MacNeill (1997), the linear model in Albin and Jarušková (2003), Hansen (2000), Hušková and Picek (2005), Jarušková (1998), and the polynomial regression in Aue et al. (2009a), Aue et al. (2012), Aue et al. (2008a), Jarušková (1999) and Kuang (1998). The maximally selected likelihood ratio can be derived as in the first part of this section,
so we need to derive the limit distribution of Z N under assumption (5.9). Note that
(5.4) and (5.5) hold only if h is the constant function so the limit result in (5.7)
cannot be used even if the errors ε1 , ε2 , . . . , ε N were independent and identically
distributed. Instead of assuming mixing or similar conditions on the stationary errors
εi , 1 ≤ i ≤ N , Aue et al. (2012) assumed that the partial sums of the errors can be
approximated with suitably constructed Wiener processes. For example, approximations have been obtained for ARMA, GARCH-type, mixing, near epoch-dependent
and m-approximable sequences. These results can be found in Aue et al. (2006b),
Berkes et al. (2014), Bradley (2007), Dedecker et al. (2007), Doukhan (1994) and
Eberlein (1986). The main result in Aue et al. (2012) says that
lim_{N→∞} P{(σ/τ)² Z_N ≤ x + 2 log log N + d log log log N − 2 log(2^{d/2} Γ(d/2)/d)} = exp(−2e^{−x/2})   (5.10)

for all x, where

σ² = Eε_i²   and   τ² = lim_{N→∞} (1/N) E( Σ_{i=1}^{N} ε_i )².
We note that (5.10) can be written as

lim_{N→∞} P{(2 log log N)^{1/2} (σ/τ) Z_N^{1/2} ≤ x + 2 log log N + (d/2) log log log N − log(2^{d/2} Γ(d/2)/d)} = exp(−2e^{−x}).   (5.11)
As before in the classical case of (5.7), under assumptions (5.4)–(5.6), the limit result in (5.10) depends only on the number of parameters which can change under the alternative. Only the constant is different in the normalization in (5.7) and (5.11).
The methods discussed in this section will falsely reject the no change in the parameters null hypothesis if the errors are nonstationary, even when the null is correct; typically a random walk is used to model nonstationarity. Hence, it is a challenging question to test for a unit root if there are change points in the regression line.
Wright (1996) derives the limits of some of the stability tests, including the Lagrange multiplier and sup-Wald tests, when the regressor is local to a random walk, extending the result in Hansen (1992). A procedure which utilizes auxiliary statistics to detect the presence of trend breaks, and then uses the outcome of the detection step to check for a possible unit root, is outlined in Carrión-i-Silvestre et al. (2009). Their method achieves nearly asymptotically efficient unit root inference in both the no trend break and the trend break environments. However, according to their simulations the high efficiency is not apparent if the sample size is small or moderate.
Harvey et al. (2013) propose a test that allows for multiple breaks in the regression,
obtained by taking the infimum of the sequence (across all possible break points)
of detrended Dickey–Fuller type statistics. Change point detection is considered in
Iacone et al. (2013) when the errors are integrated (including random walk) in a linear
regression.
6 Sequential testing
So far we have discussed retrospective break point tests, i.e. the case when we have
a set of observations and wish to check if the parameter of interest changed during
the observation period. There is a large and still growing literature concerned with
retrospective break point tests and estimation procedures, but much less attention has
been paid to the corresponding sequential procedures. Starting with the seminal paper
Chu et al. (1996), several authors have developed fluctuation tests that are based on the general paradigm that an initial time period (sometimes called historical or training sample) of length m is used to estimate a model, with the goal of monitoring for parameter changes on-line. We assume that X_1, X_2, ..., X_m are from a stable observation period, i.e. the parameter of interest is the same for these observations. The asymptotic analysis is carried out for m → ∞. Under the null hypothesis the parameter of X_{m+1}, X_{m+2}, ... is the same, while under the alternative there is an integer k* ≥ 0 such that the parameter of X_{m+1}, X_{m+2}, ..., X_{m+k*} is the same as in the historical sample, but X_{m+k*+1}, X_{m+k*+2}, ... have a different parameter (k* = 0 is taken to mean that the parameter changes immediately after the last historical observation). To test the null hypothesis of structural stability sequentially, one defines a stopping time τ_m that rejects the null as soon as the detector D_m(k), suitably constructed from the sample, crosses an appropriate threshold g(m, k) (measuring the growth of the detector under the null). Two types of stopping times have been used in the literature:
τm = inf{k : Dm (k) ≥ g(m, k)},
(6.1)
where inf(∅) = ∞, and

τ_m(N) = inf{k < N : D_m(k) ≥ g(m, k)}   (6.2)

with τ_m(N) = N + 1 if D_m(k) < g(m, k) for all 1 ≤ k ≤ N. If the stopping time τ_m is used, the method is called open-ended since we may not stop the data-generating process under the null hypothesis. We say that the sequential testing is closed-ended if it is based on τ_m(N) since the data collection will terminate after the Nth observation.
The boundary g(m, k) is chosen such that, under H_0,

lim_{m→∞} P{τ_m < ∞} = α,   (6.3)

and in case of a closed-ended procedure

lim_{m→∞} P{τ_m(N) ≤ N} = α,   (6.4)

where α is a fixed small number picked by the practitioner. Equations (6.3) and (6.4) mean that the probability of stopping when we should not is α if the historical sample size is large enough. Of course, (6.3) or (6.4) does not determine the boundary, and we should use a boundary such that τ_m − k* (or τ_m(N) − k*) is small under the alternative.
To illustrate the method, we consider the linear model
yi = xiT β i + εi , i ≥ 1.
We assume that the training sample of size m is noncontaminated, i.e. β_1 = β_2 = · · · = β_m. Under H_0 we have that β_m = β_{m+1} = · · ·, while under the alternative β_m = β_{m+1} = · · · = β_{m+k*} ≠ β_{m+k*+1} = β_{m+k*+2} = · · ·. Let β̂_m denote the least squares estimator for the regression parameter computed from y_i, x_i, 1 ≤ i ≤ m.
Following Horváth et al. (2004) the detector is based on the residuals
ε̂i = yi − xiT β̂ m , m + 1 ≤ i < ∞
and σ̂m2 , the estimator for the common variance of the εi ’s computed from the training
sample. The detector is the CUSUM of the residuals defined as
D_m(k) = (1/σ̂_m) | Σ_{m<i≤m+k} ε̂_i |,

and the boundary function is

g(m, k) = c m^{1/2} (1 + k/m) (k/(m + k))^γ,   (6.5)
where 0 ≤ γ < 1/2 is a given parameter and c is determined by (6.3). Assuming that ε_1, ε_2, ... are independent and identically distributed random variables with Eε_i = 0, Eε_i² = σ² and E|ε_i|^ν < ∞ with some ν > 2, then

lim_{m→∞} P{τ_m < ∞} = lim_{m→∞} P{ sup_{1≤k<∞} D_m(k)/g(m, k) ≥ 1 } = P{ sup_{0≤t≤1} |W(t)|/t^γ > c },   (6.6)
where W denotes a Wiener process. The assumption that the errors are independent and identically distributed random variables can be replaced with the assumption that the partial sums of the ε_i's can be approximated with Wiener processes, but in this case σ̂_m² must be a consistent estimator for lim_{m→∞} var(m^{−1/2} Σ_{1≤i≤m} ε_i). For details we refer to Aue et al. (2008b). By the law of the iterated logarithm for the Wiener process we get immediately that

sup_{0<t<1} |W(t)|/t^{1/2} = ∞   with probability one,
so we cannot choose γ = 1/2 in (6.6). The limit distribution of τm was studied in Aue
and Horváth (2004) and Aue et al. (2008b) under the alternative in the special case of
a change in the location (i.e. x_i = 1). They found numerical sequences a_m = a_m(γ) and b_m = b_m(γ) > 0 such that under the alternative

lim_{m→∞} P{ (τ_m − a_m)/b_m ≤ x } = Φ(x)   for all x,

where Φ(x) denotes the standard normal distribution function. They also pointed out
that τ_m − k* ≈ m^{(1−2γ)/(1−γ)} (in probability), so it is decreasing as γ → 1/2 and it would be as small as possible when γ = 1/2, but this case is not allowed in (6.6). The case of γ = 1/2 is investigated in Horváth et al. (2007), where it is made clear that the boundary function g(m, k) of (6.5) cannot be used. We use
g(m, k) = [ (c + 2 log log m + (1/2) log log log m − (1/2) log π) / (2 log log m)^{1/2} ] m^{1/2} (1 + k/m) (k/(k + m))^{1/2}   (6.7)
instead of (6.5) in case of a closed-ended procedure. It is proved in Horváth et al. (2007) that, if N is proportional to m^λ, where λ ≥ 1, then assuming no change in the linear model parameters we have

lim_{m→∞} P{τ_m(N) ≤ N} = exp(−e^{−c}),   (6.8)
where the boundary is given by (6.7). One can also show [cf. Aue and Horváth (2004)
and Aue et al. (2008b)] that τm (N ) − k ∗ is bounded by log log m in probability which
is an improvement over the polynomial bound if (6.5) is used. For a multivariate
extension we refer to Aue et al. (2014). It is interesting to note that the square root
boundary function (i.e. the standard deviation of the detector) leads to a Darling–Erdős
type limit result in (6.8).
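A sketch of the open-ended procedure for a change in location (x_i = 1), using the residual CUSUM detector and the boundary (6.5). The constant c = 3.0 is purely illustrative; in practice it is calibrated from the boundary-crossing probability of |W(t)|/t^γ in (6.6).

```python
import numpy as np

def monitor_mean(history, stream, c, gamma=0.25):
    """Open-ended sequential CUSUM for a change in the mean (x_i = 1 in the
    linear model): stop at the first k with D_m(k) >= g(m, k), where g is the
    boundary (6.5).  Returns the stopping index k, or None if no alarm."""
    m = len(history)
    mu_hat = history.mean()                 # parameter fitted on the training sample
    sigma_hat = history.std(ddof=1)         # variance estimator from the training sample
    csum = 0.0
    for k, x in enumerate(stream, start=1):
        csum += x - mu_hat                  # CUSUM of residuals
        d = abs(csum) / sigma_hat
        g = c * np.sqrt(m) * (1 + k / m) * (k / (m + k)) ** gamma
        if d >= g:
            return k
    return None

rng = np.random.default_rng(3)
hist = rng.normal(size=200)                            # stable training period
drift = np.r_[rng.normal(size=100),                    # k* = 100 ...
              rng.normal(loc=1.5, size=400)]           # ... then the mean shifts by 1.5
alarm = monitor_mean(hist, drift, c=3.0)               # c = 3.0 is illustrative only
```

Smaller γ makes the boundary harder to cross early, which is the trade-off behind the detection-delay discussion above.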
For the sake of simplicity we considered linear models with not necessarily independent error terms. However, the sequential detection method can also be extended
to nonlinear models along the lines of Berkes et al. (2004b). Resampling to get
critical values is proposed in Hušková and Kirch (2012). Sequential monitoring for
changes in the parameters of a linear model is considered in Aue et al. (2006a) and
Černíková et al. (2013). Fourier coefficients are utilized in Hlávka et al. (2012). The
paper Chochola et al. (2013) contains an interesting application to finance.
7 Panel models
Panel models are very popular when short segments of several sets of observations are
available. This is common when collectively examining the performance of companies
where usually yearly data are available for a large number of companies [cf. Bartram
et al. (2012)]. Hence it is an important consideration in case of panel data that N , the
number of panels, might be much larger than T , the number of observations in any
given panel. For a survey on panel data we refer to Hsiao (2007). To illustrate change
point detection in panels, we consider a very simple model:
X i,t = μi + δi I {t > t0 } + γi ηt + ei,t , 1 ≤ i ≤ N , 1 ≤ t ≤ T.
(7.1)
The ith panel has an initial mean μi which might change to μi + δi if the change in
the mean happens before time T , i.e. the change occurs in the observation period. The
panels are connected by the common factors ηt and the effect of the common factors
in the ith panel is measured by γi . We do not assume that ηt is observed; it is used
as an error term common to all panels. The errors {ei,t , 1 ≤ t ≤ T } are time series
for each i, and it is assumed that {ei,t , 1 ≤ t ≤ T, 1 ≤ i ≤ N } is independent of
{η_t, 1 ≤ t ≤ T}, or at least uncorrelated. We wish to test the null hypothesis

H_0 : t_0 ≥ T   (7.2)

against the alternative

H_A : t_0 < T.   (7.3)
The model in (7.1) has a large number of parameters but all of them are nuisance parameters with the exception of t0 . No statement is made about the nuisance parameters
under H0 nor under H A , and only T observations are available for them. However,
the possible time of change appears in all observations, so the number of observations
which can be used to estimate t0 is NT. The panel data approach has excellent performance if the statistical inference is on parameters which are common in a large number
of panels. The model (7.1) was introduced by Bai (2010) without the common factors
and assuming that H A holds the author estimated the time of change. In this section
we discuss the CUSUM-based testing method of Horváth and Hušková (2012). Due
to possible dependence between the observations in the ith panel the CUSUM will be
normalized by
σ_i² = lim_{T→∞} (1/T) E( Σ_{t=1}^{T} e_{i,t} )², 1 ≤ i ≤ N.
The CUSUM process in the panel model of (7.1) is defined by

V̄_{N,T}(x) = (1/N^{1/2}) Σ_{i=1}^{N} [ (1/σ_i²) Z_{T,i}²(x) − ⌊Tx⌋(T − ⌊Tx⌋)/T² ],

where

Z_{T,i}(x) = (1/T^{1/2}) ( S_{T,i}(x) − (⌊Tx⌋/T) S_{T,i}(1) )

with

S_{T,i}(x) = Σ_{t=1}^{⌊Tx⌋} X_{i,t}, 0 ≤ x ≤ 1.
By definition, the CUSUM process for the panels is the sum of the squares of all CUSUM processes computed from the individual panels. Next we list the most important conditions which are used to establish the weak convergence of V̄_{N,T}(x):

for each i the sequence {X_{i,t}, −∞ < t < ∞} is a linear process,   (7.4)

N/T² → 0,   (7.5)

{e_{i,t}, 1 ≤ i ≤ N, 1 ≤ t ≤ T} and {η_t, 1 ≤ t ≤ T} are independent,   (7.6)

(1/T^{1/2}) Σ_{t=1}^{⌊Tx⌋} η_t →^{D[0,1]} W(x), where W is a Wiener process,   (7.7)

for all i we have that γ_i = γ_{i,N} = ζ_i/N^{ρ_i} with some ρ_i > 1/4, where ζ_i ≠ 0.   (7.8)
Assuming that H_0, (7.4)–(7.8) and some additional moment conditions are satisfied, then

V̄_{N,T}(x) →^{D[0,1]} V̄(x),   (7.9)

where V̄(x) is a Gaussian process with EV̄(x) = 0 and EV̄(x)V̄(y) = 2x²(1 − y)², 0 ≤ x ≤ y ≤ 1. For the proof of (7.9) we refer to Horváth and Hušková (2012).
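The panel CUSUM is straightforward to compute on the grid x = k/T. The sketch below treats the long-run variances σ_i² as known, whereas in practice they are replaced by Bartlett-type kernel estimators; the data and all names are illustrative.

```python
import numpy as np

def panel_cusum(X, sigma2):
    """V_bar_{N,T} on the grid x = k/T for panel data X of shape (N, T):
    squared standardized CUSUMs, centered and summed across panels.  The
    long-run variances sigma2 are taken as known here; in practice they are
    replaced by Bartlett-type kernel estimators."""
    N, T = X.shape
    S = np.cumsum(X, axis=1)                         # S_{T,i} evaluated at k/T
    k = np.arange(1, T + 1)
    Z = (S - (k / T) * S[:, -1:]) / np.sqrt(T)       # Z_{T,i} evaluated at k/T
    center = k * (T - k) / T ** 2                    # E Z^2 under H_0 (iid case)
    return ((Z ** 2) / sigma2[:, None] - center).sum(axis=0) / np.sqrt(N)

rng = np.random.default_rng(4)
N, T, t0 = 50, 100, 60
E = rng.normal(size=(N, T))
shift = 0.4 * (np.arange(T) >= t0)                   # small common mean change at t0
v_null = np.abs(panel_cusum(E, np.ones(N))).max()
v_alt = np.abs(panel_cusum(E + shift, np.ones(N))).max()
```

Even a change of 0.4 standard deviations, far too small to detect reliably in any single panel of length T = 100, produces a pronounced excursion once the N panels are pooled, illustrating the remark above that very small changes in the mean can be detected.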
The weak convergence in (7.9) should remain true if the independence in (7.6) is replaced with “uncorrelated”. Assumption (7.7) is minor since only the normality of the sum of the common factors is required. If (7.5) does not hold, then a drift term of order N^{1/2}/T will appear. Also, if for all i we have ζ_i ≠ 0 and ρ_i ≤ 1/4, then the weak convergence in (7.9) does not hold. It is an interesting and still unsolved problem to test H_0 in (7.2) when the correlation between the panels is large, i.e. the loadings are large. Also it would be important to find tests when T is much smaller than √N. Theoretical considerations as well as Monte Carlo simulations show that in the panel data model very small changes in the mean can be detected.
The long-run variances of the errors are usually unknown so the σi2 ’s in the definition
of V̄N ,T must be replaced with some suitable estimators. Bartlett-type kernel estimators
were proposed in Horváth and Hušková (2012) and a simulation study in that paper
supports the suggestion.
A quasi-maximum likelihood argument leads to the self-normalized statistic

E_{N,T} = sup_{0<x<1} (1/N^{1/2}) Σ_{i=1}^{N} [ T² Z_{T,i}²(x) / (σ_i² ⌊Tx⌋(T − ⌊Tx⌋)) − 1 ].

A Darling–Erdős type result for E_{N,T} under the null hypothesis was proven in Chan et al. (2013), but assuming much stronger conditions than (7.4)–(7.8).
The estimator for t_0 in Bai (2010) is strongly related to E_{N,T}. Bai (2010) estimates the time of change with

t̂_{N,T} = T argmax_x Σ_{i=1}^{N} Z_{T,i}²(x) / (σ_i² ⌊Tx⌋(T − ⌊Tx⌋))

and obtains the distribution of the normalized difference between t_0 and t̂_{N,T} assuming that γ_i = 0 for all 1 ≤ i ≤ N, i.e. the panels are independent. The more general case of γ_i ≠ 0 for some 1 ≤ i ≤ N is investigated in Horváth et al. (2013a).
It has been observed [cf. Aue et al. (2009b) and Berkes et al. (2006)] that CUSUM and the quasi-maximum likelihood tests reject the no change in the mean null hypothesis, even if the null hypothesis is correct, when the errors follow a random walk or there is long-range dependence between the errors. This phenomenon was also investigated in panel data. Baltagi et al. (2012) consider the regression model

y_{i,t} = α + δ_1 I{t > t_0} + (β + δ_2 I{t > t_0}) x_{i,t} + u_{i,t}, 1 ≤ i ≤ N, 1 ≤ t ≤ T,
where (δ_1, δ_2) ≠ (0, 0). They assume that {x_{i,t}, 1 ≤ t ≤ T} and {u_{i,t}, 0 ≤ t ≤ T} are independent. This implies immediately that the panels are independent and the regressors and the errors are independent in each panel. Furthermore, Ex_{i,t} = Eu_{i,t} = 0,

for each i, {x_{i,t}, 0 ≤ t ≤ T} is an AR(1) process with autoregressive parameter λ,   (7.10)

and

for each i, {u_{i,t}, 0 ≤ t ≤ T} is an AR(1) process with autoregressive parameter ρ.   (7.11)
Stationarity means that both λ and ρ are in the interval (−1, 1), while nonstationarity means that λ or ρ or both are 1 (random walk). Let t̃_{N,T} denote the least squares estimator for t_0 as defined in Baltagi et al. (2012), i.e. the value at which the ordinary least squares objective takes its smallest value with respect to all parameters including the time of change. They obtained several limit results for the difference between t̃_{N,T} and t_0. Assuming that t_0 = ⌊Tτ_0⌋, 0 < τ_0 < 1, they show t̃_{N,T}/T → τ_0 in probability even in the case when |ρ| < 1 and λ = 1. However, if ρ = 1, |λ| < 1 or ρ = λ = 1, then t̃_{N,T}/T converges in distribution to a nondegenerate limit.
Westerlund and Larsson (2012) pointed out that the common regression parameters
in (7.10) and (7.11) restrict the applicability of the model. Following Im et al. (2003)
one possibility is that for each i the AR(1) (autoregressive of order 1) processes x_{i,t} and u_{i,t} have their own autoregressive parameters. This means that in this model we have an additional 2N − 2 parameters. However, the statistical inference is only about t_0
and the additional parameters are nuisance parameters. The discussion of the model
in (7.1) suggests that testing if t0 ≥ T or the estimation of t0 is possible even if the
number of nuisance parameters is large.
The other possibility to weaken conditions (7.10) and (7.11) is to use the random
coefficient panel model of Horváth and Trapani (2013), Ng (2008) and Westerlund and
Larsson (2012). In the random coefficient approach, for each i, 1 ≤ i ≤ N , xi,t is an
AR(1) process with parameter λi with Eλi = λ and similarly the parameter of u i,t is a
random variable ρi with Eρi = ρ. The main result in Horváth and Trapani (2013) states
that the statistical inference for the autoregression parameter is the same for stationary,
nonstationary and mixed cases. It is an interesting and possibly challenging question
if one could have statistical inference for t0 even in nonstationary cases.
8 Functional observations
In this section we assume that the observations are functions defined on [0, 1]. Many types of data may be considered naturally as curves. For example, temperatures observed several times on a given day can be considered as a discrete sample from an underlying
daily temperature curve. Similarly, pollution levels or blood pressure measurements
can be considered as realizations of curves, densely observed. Stock prices change
when the stock is traded, but in the case of frequently traded stocks this happens so often that the resulting data are too high dimensional for classical multivariate methods. Also, the trading might
happen at different times on different days. Introduction and thorough reviews of
functional data methods are given in Cuevas (2014), Ferraty and Romain (2011) and
Horváth and Kokoszka (2012).
One of the first papers on change detection in functional data was motivated by
yearly temperature measurements; one curve is constructed from 365 daily observations in Berkes et al. (2009b). Based on these curves we wish to test if the mean
yearly temperature curve remained the same since the data collection started. Let the
functional observations satisfy the model
X i (t) = μ(t) + δ(t)I {i > k ∗ } + εi (t), 1 ≤ i ≤ N .
We assume that ‖δ‖ ≠ 0, where ‖x‖ = (∫_0^1 x²(t) dt)^{1/2}. We wish to test

H_0 : k* ≥ N

against the alternative

H_A : 1 < k* < N.
We can test H_0 against H_A using the functional version of the CUSUM process

S_N°(x, t) = (1/N^{1/2}) ( Σ_{i=1}^{⌊Nx⌋} X_i(t) − (⌊Nx⌋/N) Σ_{i=1}^{N} X_i(t) ), 0 ≤ x, t ≤ 1.

It is clear that under H_0 the process S_N°(x, t) does not depend on the unknown μ(t).
We assume that

ε_1, ε_2, ..., ε_N are independent and identically distributed random functions,   (8.1)

Eε_i(t) = 0 and E‖ε_i‖² < ∞.   (8.2)

Let

C(t, s) = Eε_i(t)ε_i(s)
denote the covariance function of the errors. If (8.1) and (8.2) hold, then we can define a sequence of Gaussian processes G_N°(x, t) with EG_N°(x, t) = 0 and EG_N°(x, t)G_N°(y, s) = (min(x, y) − xy)C(t, s) such that

∫∫ [S_N°(x, t) − G_N°(x, t)]² dx dt = o_P(1), as N → ∞.   (8.3)

The approximation in (8.3) follows immediately from the approximation of partial sums in Hilbert spaces. Namely, we can define a sequence of Gaussian processes G_N(x, t) with EG_N(x, t) = 0 and EG_N(x, t)G_N(y, s) = min(x, y)C(t, s) such that

∫∫ (S_N(x, t) − G_N(x, t))² dx dt = o_P(1), as N → ∞,   (8.4)

where

S_N(x, t) = (1/N^{1/2}) Σ_{i=1}^{⌊Nx⌋} X_i(t).   (8.5)
The proof of the result in (8.4) can be found in Horváth and Kokoszka (2012). By the spectral theorem we have

C(t, s) = Σ_{i=1}^{∞} λ_i φ_i(t)φ_i(s),

where λ_1 ≥ λ_2 ≥ · · · ≥ 0 and the orthonormal functions φ_1(t), φ_2(t), ... satisfy

λ_i φ_i(t) = ∫ C(t, s)φ_i(s) ds, 1 ≤ i < ∞.

Using the Karhunen–Loève expansion we conclude

∫∫ (G_N°(x, t))² dt dx =^D Σ_{i=1}^{∞} λ_i ∫ B_i²(x) dx,   (8.6)
where B_1, B_2, ... are independent Brownian bridges. The limit distribution depends on the unknown λ_i's so we need to estimate them from the random sample. The covariance kernel C can be estimated by

Ĉ_N(t, s) = (1/N) Σ_{i=1}^{N} (X_i(t) − X̄_N(t))(X_i(s) − X̄_N(s)),

where

X̄_N(t) = (1/N) Σ_{i=1}^{N} X_i(t).

Under assumptions (8.1) and (8.2) we have that

‖Ĉ_N − C‖ = o_P(1).   (8.7)
Let λ̂_1 ≥ λ̂_2 ≥ λ̂_3 ≥ · · · ≥ 0 and φ̂_1(t), φ̂_2(t), ... be the empirical eigenvalues and corresponding eigenfunctions satisfying

λ̂_i φ̂_i(t) = ∫ Ĉ_N(t, s)φ̂_i(s) ds.

If λ_1 > λ_2 > · · ·, the relation in (8.7) [cf. Horváth and Kokoszka (2012)] implies that

|λ̂_i − λ_i| = o_P(1) and ‖φ̂_i − φ_i‖ = o_P(1),

and therefore the limit in (8.6) can be approximated with Σ_{i=1}^{d} λ̂_i ∫ B_i²(t) dt, assuming that d is large enough. However, we still need to use Monte Carlo simulations to get critical values.
A different approach in Berkes et al. (2009b) is based on projections. First we define the score vectors

ξ̂_i = (ξ̂_{i,1}, ξ̂_{i,2}, ..., ξ̂_{i,d})^T, with ξ̂_{i,j} = ⟨X_i, φ̂_j⟩, 1 ≤ j ≤ d,

where ⟨·, ·⟩ denotes the inner product in the Hilbert space of square integrable functions on [0, 1]. The CUSUM process is now defined as

H_N(x) = (1/N) Σ_{j=1}^{d} (1/λ̂_j) ( Σ_{i=1}^{⌊Nx⌋} ξ̂_{i,j} − x Σ_{i=1}^{N} ξ̂_{i,j} )².
It is proven in Berkes et al. (2009b) that under assumptions (8.1) and (8.2)

H_N(x) →^D Σ_{i=1}^{d} B_i²(x),   (8.8)

where B_1, B_2, ..., B_d are independent Brownian bridges. The distributions of the supremum and integral functionals of the limit process in (8.8) are discussed in Sect. 4 of the present paper. Comparing (8.6) and (8.8) we see that the empirical projection method provides a simple, distribution-free procedure. Using the supremum functional of H_N(x) as a test statistic, the central England daily temperature data was segmented into six homogeneous parts in Berkes et al. (2009b).
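The projection method is easy to implement for densely observed curves. The following sketch uses grid-based functional principal components on a synthetic sample (rather than the temperature data); all names are ours.

```python
import numpy as np

def projection_cusum(X, d):
    """H_N(x) of the projection method: estimate the covariance operator on a
    grid, project the curves onto the top d empirical eigenfunctions, and
    CUSUM the scores, standardized by the empirical eigenvalues."""
    N, m = X.shape
    Xc = X - X.mean(axis=0)
    A = Xc.T @ Xc / (N * m)                  # discretized covariance operator C_hat
    lam, U = np.linalg.eigh(A)
    lam, U = lam[::-1][:d], U[:, ::-1][:, :d]
    scores = Xc @ U / np.sqrt(m)             # xi_hat_{i,j} = <X_i, phi_hat_j> on the grid
    S = np.cumsum(scores, axis=0)
    x = np.arange(1, N + 1)[:, None] / N
    return ((S - x * S[-1]) ** 2 / lam).sum(axis=1) / N

rng = np.random.default_rng(5)
N, m, kstar = 80, 101, 40
t = np.linspace(0, 1, m)
# smooth error curves spanned by two Fourier components
eps = (rng.normal(size=(N, 1)) * np.sin(2 * np.pi * t)
       + 0.5 * rng.normal(size=(N, 1)) * np.cos(2 * np.pi * t))
delta = np.cos(2 * np.pi * t)                # mean change delta(t) after k*
X = eps + delta * (np.arange(N) >= kstar)[:, None]
H = projection_cusum(X, d=2)
```

The supremum or integral of H over the grid x = k/N can then be compared with the critical values of the corresponding functional of the sum of d squared Brownian bridges in (8.8).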
Fig. 6 Plots of 5 yearly temperature curves for Atlantic City (NJ), the raw data are on the upper panel and
the smoothed curves are on the lower panel
Example 4 The National Climatic Data Center collects, stores, and analyzes climatic
data obtained from thousands of weather stations across the US. The data sets are
published at their website www.ncdc.noaa.gov/cdo-web/, of which many go back over
100 years. In this example, we consider the daily maximum temperature in Atlantic
City (NJ) from January 1, 1874 to December 31, 2012. Only two observations in
the data set consisting of over 50,000 data points were missing, and we replaced
them using linear interpolation between the adjacent data points. Due to the yearly
periodicity in temperature, we divided the data set into 138 yearly observations each
of which consists of 365 or 366 observations. The plots of the first 5 years of data are
displayed in the upper panel of Fig. 6. Since the data points can be thought of as discrete observations of an underlying continuous yearly temperature curve, we
consider the data to be functional in nature and proceed by approximating the data
with continuous curves. Many techniques for creating functional data objects from a
discrete collection of points have been implemented in the fda package within R, see
Ramsay et al. (2009) for details. To create functional data objects from the temperature
data, we used the fda package to approximately interpolate the data points using a B-spline basis with 50 basis functions. The smooth curves generated in this way are displayed in
the lower panel of Fig. 6. To obtain an approximate test of the hypothesis that the mean
yearly temperature curve does not change over the observation period using (8.8), we
calculate ∫_0^1 H_138(x) dx. We compare this value to the critical values of Σ_{i=1}^{4} ∫_0^1 B_i²(x) dx, which are tabulated in Kiefer (1959). The calculation of H_138(x) requires the choice of d. The most common technique to choose d in practice is the cumulative variance approach; d is chosen so that

Σ_{i=1}^{d} λ̂_i / Σ_{i=1}^{N} λ̂_i ≈ v,
where v is a specified percentage. In our analysis we used the cumulative variance
approach with v = 0.9 which gave d = 4. For a thorough account of the cumulative
variance approach and principal component analysis for functional data we refer to
Horváth and Kokoszka (2012). The value of our test statistic is 4.817, which is larger
than the critical value at level 0.001 tabulated in Kiefer (1959), so the hypothesis of a
constant mean yearly temperature curve is rejected.
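The cumulative variance rule for selecting d can be sketched in a few lines. Below is a minimal illustration in Python with NumPy, standing in for the R fda tools used in the analysis above; the function name choose_d and the synthetic eigenvalue sequence are our own illustrative choices, with eigvals_hat playing the role of the estimated eigenvalues λ̂_i of the sample covariance operator:

```python
import numpy as np

def choose_d(eigvals_hat, v=0.9):
    """Smallest d such that the leading d eigenvalues explain a
    fraction >= v of the total sample variance."""
    ratios = np.cumsum(eigvals_hat) / np.sum(eigvals_hat)
    return int(np.searchsorted(ratios, v) + 1)

# A rapidly decaying synthetic spectrum, mimicking smooth functional data.
lam_hat = np.array([5.0, 2.0, 1.5, 0.7, 0.2, 0.1])
d = choose_d(lam_hat, v=0.9)  # first d with cumulative proportion >= 0.9
```

With v = 0.9 this rule returns d = 4 for the synthetic spectrum above, since the first three eigenvalues account for only about 89% of the total variance.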
Some basic results on functional time series models are summarized in Hörmann
and Kokoszka (2010). An analog of the CUSUM process H_n(x) is introduced and
studied in Horváth et al. (2014) in the case of dependent observations. Their method is
used to analyze stock returns. The most popular time series model is the functional
autoregression due to Bosq (2000). A test for the stability of functional autoregressive
processes is provided by Horváth et al. (2010). The model is used for prediction in
Kargin and Onatski (2008). For some interesting applications to biological data we
refer to Aston and Kirch (2012a,b). Models for nonlinear functional time series are
presented in Hörmann et al. (2013) and Horváth et al. (2013b).
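The projection-based CUSUM statistics of Example 4 and of the papers cited above share a common structure. The sketch below is a minimal Python/NumPy implementation for independent observations (in the spirit of Berkes et al. 2009b); equation (8.8) is not reproduced in this excerpt, so the scaling conventions, the function name functional_cusum_stat, and the synthetic data are our own illustrative choices, not the paper's exact formulation:

```python
import numpy as np

def functional_cusum_stat(X, d):
    """Approximate the integral over (0,1) of H_N(x) for curves X (N x m grid
    on [0,1]), where H_N sums, over the first d principal components, the
    squared score CUSUM processes normalized by the estimated eigenvalues."""
    N, m = X.shape
    Xc = X - X.mean(axis=0)                     # center the curves
    # Sample covariance operator on the grid (Riemann approximation).
    C = Xc.T @ Xc / (N * m)
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1][:d]
    lam_hat = eigvals[order]                    # estimated eigenvalues
    phi_hat = eigvecs[:, order] * np.sqrt(m)    # eigenfunctions, unit L2 norm
    scores = Xc @ phi_hat / m                   # N x d projection scores
    S = np.cumsum(scores, axis=0) / np.sqrt(N)  # normalized partial sums
    x = np.arange(1, N + 1) / N
    bridge = S - x[:, None] * S[-1]             # bridge-type CUSUM per score
    H = np.sum(bridge**2 / lam_hat, axis=1)     # H_N evaluated at k/N
    return float(np.mean(H))                    # Riemann sum over (0, 1]

# Synthetic example: a mean shift halfway through inflates the statistic.
rng = np.random.default_rng(0)
N, m = 100, 50
t = np.linspace(0.0, 1.0, m)
X = rng.standard_normal((N, 1)) * np.sin(np.pi * t) \
    + 0.3 * rng.standard_normal((N, m))
stat_null = functional_cusum_stat(X, d=2)
X_shift = X.copy()
X_shift[N // 2:] += 2.0                         # change in the mean curve
stat_alt = functional_cusum_stat(X_shift, d=2)
```

Under no change the statistic remains moderate in size, while the mean shift produces a much larger value; in practice one compares the statistic to the critical values of the limit Σ_{i=1}^d ∫_0^1 B_i^2(x)dx tabulated in Kiefer (1959).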
Acknowledgments We are grateful to Marie Hušková, Stefan Fremdt and the participants of the Time
Series Seminar at the University of Utah for pointing out mistakes in the earlier versions of this paper and
to Daniela Jarušková and Brad Hatch for some of the data sets.
References
Albin JMP, Jarušková D (2003) On a test statistic for linear trend. Extremes 6:247–258
Andreou E, Ghysels E (2002) Detecting multiple breaks in financial market volatility dynamics. J Appl
Econom 17:579–600
Andrews DWK (1993) Tests for parameter instability and structural change with unknown change point.
Econometrica 61:821–856
Aston J, Kirch C (2012a) Evaluating stationarity via change-point alternatives with applications to fMRI
data. Ann Appl Stat 6:1906–1948
Aston J, Kirch C (2012b) Detecting and estimating changes in dependent functional data. J Multivar Anal
109:204–220
Aue A, Horváth L (2004) Delay time in sequential detection of change. Stat Prob Lett 67:221–231
Aue A, Horváth L, Hušková M, Kokoszka P (2006a) Change-point monitoring in linear models with
conditionally heteroscedastic errors. Econom J 9:373–403
Aue A, Berkes I, Horváth L (2006b) Strong approximation for the sums of squares of augmented GARCH
sequences. Bernoulli 12:583–608
Aue A, Horváth L, Hušková M, Kokoszka P (2008a) Testing for changes in polynomial regression. Bernoulli
14:637–660
Aue A, Horváth L, Kokoszka P, Steinebach JG (2008b) Monitoring shifts in mean: asymptotic normality
of stopping times. Test 17:515–530
Aue A, Horváth L, Hušková M (2009a) Extreme value theory for stochastic integrals of Legendre polynomials. J Multivar Anal 100:1029–1043
Aue A, Horváth L, Hušková M, Ling S (2009b) On distinguishing between random walk and changes in
the mean alternatives. Econom Theory 25:411–441
Aue A, Hörmann S, Horváth L, Reimherr M (2009c) Break detection in the covariance structure of multivariate time series models. Ann Stat 37:4046–4087
Aue A, Horváth L, Hušková M (2012) Segmenting mean-nonstationary time series via trending regression.
J Econom 168:367–381
Aue A, Horváth L (2013) Structural breaks in time series. J Time Ser Anal 34:1–16
Aue A, Dienes C, Fremdt S, Steinebach JG (2014) Reaction times of monitoring schemes for ARMA time
series. Bernoulli (to appear)
Bai J (1999) Likelihood ratio test for multiple structural changes. J Econom 91:299–323
Bai J (2010) Common breaks in means and variances for panel data. J Econom 157:78–92
Baltagi BH, Kao C, Liu L (2012) Estimation and identification of change points in panel models with
nonstationary or stationary regressors and error terms. Preprint.
Bartram SM, Brown G, Stulz RM (2012) Why are U.S. stocks more volatile? J Finance 67:1329–1370
Batsidis A, Horváth L, Martín N, Pardo L, Zografos K (2013) Change-point detection in multinomial data
using phi-convergence test statistics. J Multivar Anal 118:53–66
Berkes I, Philipp W (1977) An almost sure invariance principle for the empirical distribution function of
mixing random variables. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 41:115–137
Berkes I, Philipp W (1979) Approximation theorems for independent and weakly dependent random vectors.
Ann Prob 7:29–54
Berkes I, Horváth L (2001) Strong approximation of the empirical process of GARCH sequences. Ann Appl
Prob 11:789–809
Berkes I, Horváth L (2002) Empirical processes of residuals. In: Dehling H, Mikosch T, Sorensen M (eds)
Empirical process techniques for dependent data. Birkhäuser, Basel, pp 195–209
Berkes I, Horváth L (2003) Limit results for the empirical process of squared residuals in GARCH models.
Stoch Process Appl 105:271–298
Berkes I, Horváth L, Kokoszka P (2004a) Testing for parameter constancy in GARCH(p, q) models. 70:263–273
Berkes I, Gombay E, Horváth L, Kokoszka P (2004b) Sequential change-point detection in GARCH(p, q)
models. Econom Theory 20:1140–1167
Berkes I, Horváth L, Hušková M, Steinebach J (2004c) Applications of permutations to the simulation of
critical values. J Nonparametr Stat 16:197–216
Berkes I, Horváth L, Kokoszka P, Shao Q-M (2006) On discriminating between long-range dependence
and changes in the mean. Ann Stat 34:1140–1165
Berkes I, Hörmann S, Horváth L (2008) The functional central limit theorem for a family of garch observations with applications. Stat Prob Lett 78:2725–2730
Berkes I, Hörmann S, Schauer J (2009a) Asymptotic results for the empirical process of stationary sequences.
Stoch Process Appl 119:1298–1324
Berkes I, Gabrys R, Horváth L, Kokoszka P (2009b) Detecting changes in the mean of functional observations. J R Stat Soc Ser B 71:927–946
Berkes I, Gombay E, Horváth L (2009c) Testing for changes in the covariance structure of linear processes.
J Stat Plan Inference 139:2044–2063
Berkes I, Liu W, Wu WB (2014) Komlós–Major–Tusnády approximation under dependence. Ann Prob
42:794–817
Billingsley P (1968) Convergence of probability measures. Wiley, New York
Blum JR, Kiefer J, Rosenblatt M (1961) Distribution free tests of independence based on the sample
distribution function. Ann Math Stat 32:485–498
Bosq D (2000) Linear processes in function spaces. Springer, New York
Bradley RC (2007) Introduction to strong mixing conditions, vol 1–3. Kendrick Press, Heber City
Brockwell PJ, Davis RA (1991) Time series: theory and methods, 2nd edn. Springer, New York
Busetti F, Taylor AMR (2004) Tests of stationarity against a change in persistence. J Econom 123:33–66
Carrion-i-Silvestre JL, Kim D, Perron P (2009) GLS-based unit root tests with multiple structural breaks
both under the null and the alternative hypotheses. Econom Theory 25:1754–1792
Černíková A, Hušková M, Prášková Z, Steinebach J (2013) Delay time in monitoring jump changes in
linear models. Statistics 47:1–25
Chan J, Horváth L, Hušková M (2013) Darling–Erdős limit results for change-point detection in panel data.
J Stat Plan Inference 143:955–970
Chochola O, Hušková M, Prášková Z, Steinebach JG (2013) Robust monitoring of CAPM portfolio betas. J
Multivar Anal 115:374–396
Chu C-SJ, Stinchcombe M, White H (1996) Monitoring structural change. Econometrica 64:1045–1065
Chung KL, Williams RJ (1983) Introduction to stochastic integration. Birkhäuser, Boston
Csáki E (1986) Some applications of the classical formula on ruin probabilities. J Stat Plan Inference
14:35–42
Csörgő M, Révész P (1981) Strong approximations in probability and statistics. Academic Press, New York
Csörgő M, Horváth L (1987) Nonparametric tests for the changepoint problem. J Stat Plan Inference 17:1–9
Csörgő M, Horváth L (1993) Weighted approximations in probability and statistics. Wiley, Chichester
Csörgő M, Horváth L (1997) Limit theorems in change-point analysis. Wiley, Chichester
Cuevas A (2014) A partial overview of the theory of statistics with functional data. J Stat Plan Inference
147:1–23
Darling DA, Erdős P (1956) A limit theorem for the maximum of normalized sums of independent random
variables. Duke Math J 23:143–155
Davis RA, Huang D, Yao Y-C (1995) Testing for a change in the parameter values and order of an autoregressive model. Ann Stat 23:282–304
Davis RA, Lee TC, Rodriguez-Yam G (2008) Break detection for a class of nonlinear time series models.
J Time Ser Anal 29:834–867
Dedecker J, Doukhan P, Lang G, León JR, Louhichi S, Prieur C (2007) Weak dependence with examples
and applications. Lecture Notes in Statistics. Springer, Berlin
Dehling H, Fried R (2012) Asymptotic distribution of two-sample empirical U-quantiles with applications
to robust tests for shifts in location. J Multivar Anal 105:124–140
Dominicy Y, Hörmann S, Ogata H, Veredas D (2013) On sample marginal quantiles for stationary processes.
Stat Prob Lett 83:28–36
Doukhan P (1994) Mixing: properties and examples, vol 85. Lecture Notes in Statistics. Springer, Berlin
Dutta K, Sen PK (1971) On the Bahadur representation of sample quantiles in some stationary multivariate
autoregressive processes. J Multivar Anal 1:186–198
Eberlein E (1986) On strong invariance principles under dependence assumptions. Ann Prob 14:260–270
Ferraty F, Romain Y (2011) Oxford handbook of functional data analysis. Oxford University Press, Oxford
Francq C, Zakoïan J-M (2010) GARCH models. Wiley, Chichester
Fremdt S (2013) Page's sequential procedure for change-point detection in time series regression. http://arxiv.org/abs/1308.1237
Fremdt S (2014) Asymptotic distribution of delay time in Page's sequential procedure. J Stat Plan Inference
145:74–91
Gombay E (1994) Testing for change-points with rank and sign statistics. Stat Prob Lett 20:49–56
Gombay E, Horváth L (1996) On the rate of approximations for maximum likelihood tests in change-point
models. J Multivar Anal 56:120–152
Gombay E, Horváth L, Hušková M (1996) Estimators and tests for change in variances. Stat Decis 14:145–
159
Gombay E, Hušková M (1998) Rank based estimators of the change point. J Stat Plan Inference 67:137–154
Gombay E (2000) U-statistics for sequential change-detection. Metrika 54:133–145
Gombay E (2001) U-statistics for change under alternative. J Multivar Anal 78:139–158
Hansen BE (1992) Tests for parameter instability in regression with I(1) processes. J Bus Econ Stat
10:321–335
Hansen BE (2000) Testing for structural change in conditional models. J Econom 97:93–115
Harvey DI, Leybourne SJ, Taylor AMR (2013) Testing for unit roots in the possible presence of multiple
trend breaks using minimum Dickey–Fuller statistics. J Econom 177:265–284
Hidalgo J, Seo MH (2013) Testing for structural stability in the whole sample. J Econom 175:84–93
Hlávka Z, Hušková M, Kirch C, Meintanis S (2012) Monitoring changes in the error distribution of autoregressive models based on Fourier methods. TEST 21:605–634
Hörmann S, Kokoszka P (2010) Weakly dependent functional data. Ann Stat 38:1845–1884
Hörmann S, Horváth L, Reeder R (2013) A functional version of the ARCH model. Econom Theory 29:267–
288
Horn RA, Johnson CR (1991) Topics in matrix analysis. Cambridge University Press, Cambridge
Horváth L (1984) Strong approximation of renewal processes. Stoch Process Appl 18:127–138
Horváth L (1993) The maximum likelihood method for testing changes in the parameters of normal observations. Ann Stat 21:671–680
Horváth L (1995) Detecting changes in linear regressions. Statistics 26:189–208
Horváth L, Serbinowska M (1995) Testing for changes in multinomial observations: the Lindisfarne scribes
problem. Scand J Stat 22:371–384
Horváth L, Hušková M, Serbinowska M (1997) Estimators for the time of change in linear models. Statistics
29:109–130
Horváth L, Kokoszka P, Steinebach J (1999) Testing for changes in multivariate dependent observations
with an application to temperature changes. J Multivar Anal 68:96–119
Horváth L, Hušková M, Kokoszka P, Steinebach J (2004) Monitoring changes in linear models. J Stat Plan
Inference 126:225–251
Horváth L, Hušková M (2005) Testing for changes using permutations of u-statistics. J Stat Plan Inference
128:351–371
Horváth L, Kokoszka P, Steinebach J (2007) On sequential detection of parameter changes in linear regression. Stat Prob Lett 77:885–895
Horváth L, Horváth Zs, Hušková M (2008) Ratio tests for change point detection. In: Beyond parametrics
in interdisciplinary research. IMS Collections 1:293–304
Horváth L, Hušková M, Kokoszka P (2010) Testing the stability of the functional autoregressive model. J
Multivar Anal 101:352–367
Horváth L, Hušková M (2012) Change-point detection in panel data. J Time Ser Anal 33:631–648
Horváth L, Kokoszka P (2012) Inference for functional data with applications. Springer, New York
Horváth L, Hušková M, Wang J (2013a) Estimation of the time of change in panel data. Preprint
Horváth L, Kokoszka P, Reeder R (2013b) Estimation of the mean of functional time series and a two
sample problem. J R Stat Soc Ser B 75:103–122
Horváth L, Trapani L (2013) Statistical inference in a random coefficient panel model. Preprint
Horváth L, Kokoszka P, Rice G (2014) Testing stationarity of functional data. J Econom 179:66–82
Hsiao C (2007) Panel data analysis—advantages and challenges. TEST 16:1–22
Hsu DA (1979) Detecting shifts of parameter in gamma sequences with applications to stock prices and air
traffic flow analysis. J Am Stat Assoc 74:31–40
Hušková M (1996) Estimation of a change in linear models. Stat Prob Lett 26:13–24
Hušková M (1997a) Limit theorems for rank statistics. Stat Prob Lett 32:45–55
Hušková M (1997b) Multivariate rank statistics processes and change point analysis. In: Applied statistical
sciences III, Nova Science Publishers, New York, pp 83–96
Hušková M, Picek J (2005) Bootstrap in detection of changes in linear regression. Sankhya Ser B 67:1–27
Hušková M, Prášková Z, Steinebach J (2007) On the detection of changes in autoregressive time series, I.
Asymptotics. J Stat Plan Inference 137:1243–1259
Hušková M, Kirch C, Prášková Z, Steinebach J (2008) On the detection of changes in autoregressive time
series, II. Resampling procedures. J Stat Plan Inference 138:1697–1721
Hušková M, Kirch C (2012) Bootstrapping sequential change-point tests for linear regression. Metrika
75:673–708
Hušková M (2013) Robust change point analysis. In: Robustness and complex data structures. Springer,
Berlin, pp 171–190
Iacone F, Leybourne SJ, Taylor AMR (2013) Testing for a break in trend when the order of integration is
unknown. J Econom 176:30–45
Im KS, Pesaran MH, Shin Y (2003) Testing for unit roots in heterogeneous panels. J Econom 115:53–74
Inclán C, Tiao GC (1994) Use of cumulative sums of squares for retrospective detection of change of
variance. J Am Stat Assoc 89:913–923
Jandhyala VK, MacNeill IB (1997) Iterated partial sum sequences of regression residuals and tests for
changepoints with continuity constraints. J R Stat Soc Ser B 59:147–156
Jarušková D (1998) Testing appearance of linear trend. J Stat Plan Inference 70:263–276
Jarušková D (1999) Testing appearance of polynomial trend. Extremes 2:25–37
Kargin V, Onatski A (2008) Curve forecasting by functional autoregression. J Multivar Anal 99:2508–2526
Kiefer J (1959) K-sample analogues of the Kolmogorov–Smirnov and Cramér-v. Mises tests. Ann Math Stat
30:420–447
Kim JY (2000) Detection of change in persistence of a linear time series. J Econom 95:97–116
Kim JY, Belaire-Franch J, Badillo AR (2002) Corrigendum to "Detection of change in persistence of a
linear time series". J Econom 109:389–392
Kirch C, Steinebach J (2006) Permutation principles for the change analysis of stochastic processes under
strong invariance. J Comput Appl Math 186:64–88
Kirch C (2007a) Resampling in the frequency domain of time series to determine critical values for change-point tests. Stat Decis 25:237–261
Kirch C (2007b) Block permutation principles for the change analysis of dependent data. J Stat Plan
Inference 137:2453–2474
Kirch C, Politis DN (2011) TFT-bootstrap: resampling time series in the frequency domain to obtain replicates
in the time domain. Ann Stat 39:1427–1470
Kirch C, Tadjuidje Kamgaing J (2012) Testing for parameter stability in nonlinear autoregressive models.
J Time Ser Anal 33:365–385
Kokoszka P, Leipus R (2000) Change-point estimation in ARCH models. Bernoulli 6:513–539
Kuan C-M (1998) Tests for changes in models with a polynomial trend. J Econom 84:75–91
Lee S, Park S (2001) The CUSUM of squares test for scale changes in infinite order moving average processes.
Scand J Stat 28:625–644
Liu W, Wu WB (2010) Asymptotics of spectral density estimates. Econom Theory 26:1218–1245
Louhichi S (2000) Weak convergence for empirical processes of associated sequences. Annales de l’Institut
Henri Poincaré Probabilités et Statistiques 36:547–567
Ng S (2008) A simple test for nonstationarity in mixed panels. J Bus Econ Stat 26:113–126
Oberhofer W, Haupt H (2005) The asymptotic distribution of the unconditional quantile estimator under
dependence. Stat Prob Lett 73:243–250
Page ES (1954) Continuous inspection schemes. Biometrika 41:100–105
Page ES (1955) A test for a change in a parameter occurring at an unknown point. Biometrika 42:523–526
Quandt RE (1958) Tests of the hypothesis that a linear regression system obeys two separate regimes. J Am
Stat Assoc 53:873–880
Quandt RE (1960) The estimation of the parameters of a linear regression system obeying two separate
regimes. J Am Stat Assoc 55:324–330
Ramsay JO, Hooker G, Graves S (2009) Functional data analysis with R and MATLAB. Springer, New
York
Sen PK (1968) Asymptotic normality of sample quantiles for m-dependent processes. Ann Math Stat
39:1724–1730
Shao QM, Yu H (1996) Weak convergence for weighted empirical processes of dependent sequences. Ann
Prob 24:2098–2127
Shorack GR, Wellner JA (1986) Empirical processes with applications to statistics. Wiley, New York
Taniguchi M, Kakizawa Y (2000) Asymptotic theory of statistical inference for time series. Springer,
New York
Westerlund J, Larsson R (2012) Testing for a unit root in a random coefficient panel data model. J Econom
167:254–273
Wied D, Krämer W, Dehling H (2012) Testing for a change in correlation at an unknown point in time using
an extended functional delta method. Econom Theory 28:570–589
Wied D, Dehling H, van Kampen M, Vogel D (2014) A fluctuation test for constant Spearman’s rho with
nuisance-free limit distribution. Comput Stat Data Anal (to appear)
Wolfe DA, Schechtman E (1984) Nonparametric statistical procedures for the changepoint problem. J Stat
Plan Inference 9:389–396
Wright JH (1996) Structural stability tests in the linear regression model when the regressors have roots
local to unity. Econ Lett 52:257–262
Wu WB (2005) On the Bahadur representation of sample quantiles for dependent sequences. Ann Stat
33:1934–1963
Yu H (1993) A Glivenko–Cantelli lemma and weak convergence for empirical processes of associated
sequences. Prob Theory Relat Fields 95:357–370