Extensions of some classical methods in change point analysis

Lajos Horváth · Gregory Rice

TEST: An Official Journal of the Spanish Society of Statistics and Operations Research
ISSN 1133-0686 · DOI 10.1007/s11749-014-0368-4

INVITED PAPER

© Sociedad de Estadística e Investigación Operativa 2014

Abstract A common goal in modeling and data mining is to determine, based on sample data, whether or not a change of some sort has occurred in a quantity of interest. The study of statistical problems of this nature is typically referred to as change point analysis. Though change point analysis originated nearly 70 years ago, it is still an active area of research, and much effort has been put forth to develop new methodology and discover new applications that address modern statistical questions. In this paper we survey some classical results in change point analysis and recent extensions to time series, multivariate, panel and functional data. We also present real data examples which illustrate the utility of the surveyed results.
Keywords Change point analysis · Sequential monitoring · Panel data · Time series · Functional data · Linear models

Mathematics Subject Classification Primary 60F17 · 62M10; Secondary 60F05 · 60F25 · 62F05 · 60F12 · 62G30 · 62G10 · 62J05 · 62L20 · 62P12 · 62P20

Research supported by NSF grant DMS 1305858.

This invited paper is discussed in comments available at: doi:10.1007/s11749-014-0367-5, doi:10.1007/s11749-014-0369-3, doi:10.1007/s11749-014-0370-x, doi:10.1007/s11749-014-0371-9, doi:10.1007/s11749-014-0372-8, doi:10.1007/s11749-014-0373-7, doi:10.1007/s11749-014-0376-4, doi:10.1007/s11749-014-0377-3.

L. Horváth (B) · G. Rice
Department of Mathematics, University of Utah, Salt Lake City, UT 84112-0090, USA
e-mail: horvath@math.utah.edu
G. Rice
e-mail: rice@math.utah.edu

1 Introduction

Change point analysis, which is the study of the detection and estimation of changes in a quantity of interest based on sample data, originated in the 1940s and initially focused on data-driven quality control techniques. Over time, methods in change point analysis have been developed to address data analytic questions in fields ranging from biology to finance, and in many cases such methodology has become standard. The statistical community now enjoys a vast literature on change point analysis in which many of the most natural and common questions have received at least some attention. In spite of this, change point analysis is still an active area of research, and much effort is being put forth to extend classical results to data which exhibit dependence, high dimensionality, and/or follow models which have not yet been considered. The present paper is meant to accomplish two goals. The first goal is to survey the most frequently used methods and ideas in change point analysis. Toward this we have supplied detailed developments of several procedures and an extensive, though far from exhaustive, list of references.
Throughout the paper we have also included applications of several classical results to real data sets. The second goal is to derive extensions of some of the results presented in the paper.

The surveyed results and their extensions are organized into sections as follows. In Sect. 2, we consider the early contribution to change point analysis by Page and other nonparametric methods. These results are extended to time series in Sect. 3. Section 4 contains Darling–Erdős results and applications to changes in correlation. In Sect. 5, we consider changes to the parameters in regression models. Section 6 discusses sequential methods in change point analysis. Section 7 contains applications of change point analysis to panel data. In Sect. 8, we discuss functional data.

2 Page's procedure and its extension

Some of the first results in change point analysis were derived in the context of quality control, where an important problem is to detect, as quickly as possible, an increase in the proportion of defective products being manufactured. Naturally, many of these procedures were nonparametric in nature. Page (1954, 1955) suggested a very simple method to test the stability of the quantiles of the underlying observations. To simplify the calculations below we take the quantile of interest to be the median. Let us assume that

\[ X_1, X_2, \ldots, X_N \ \text{are independent random variables.} \tag{2.1} \]

If $m_i$ denotes the median of $X_i$, the null hypothesis is stated as

\[ m = m_1 = m_2 = \cdots = m_N, \]

where $m$ is known, and the change point alternative is that there is an integer $k^*$ such that

\[ m_1 = m_2 = \cdots = m_{k^*} \neq m_{k^*+1} = m_{k^*+2} = \cdots = m_N. \]

We say that $k^*$ is the location of the change, assuming that the alternative holds. It is assumed in Page (1955) that $m = m_1 = m_2 = \cdots = m_N$, i.e. the common value under the no change null (and the initial value under the change point alternative) is known.
Page defined

\[ V_j = \begin{cases} 1, & \text{if } X_j > m,\\ -1, & \text{if } X_j \le m, \end{cases} \]

and he recommended rejecting the no change null hypothesis if

\[ T_N = \max_{1\le k\le N}\left\{ \sum_{i=1}^{k} V_i - \min_{1\le j\le k}\sum_{i=1}^{j} V_i \right\} \]

is large. The limit distribution of $T_N$ under $H_0$ was approximated using combinatorial arguments. The exact distribution of $T_N$ can also be computed for any $N$, as it is derived by Gombay (1994) from Csáki (1986) [cf. also Csörgő and Horváth (1997), p 91]. However, using weak convergence of empirical measures [cf. Billingsley (1968)], the limit distribution of $T_N$ can be derived easily. Assuming that the distributions of the $X_i$'s are continuous at the common median, the variables $V_1, V_2, \ldots, V_N$ are independent identically distributed random variables taking values $\pm 1$ with probability $1/2$. Hence by the weak convergence of the simple random walk

\[ N^{-1/2}\sum_{i=1}^{\lfloor Nt\rfloor} V_i \xrightarrow{\mathcal D[0,1]} W(t), \tag{2.2} \]

where $W(t)$, $0\le t<\infty$, stands for a Wiener process (standard Brownian motion). We use $\xrightarrow{\mathcal D[0,1]}$ to denote the weak convergence of random functions in the Skorokhod space [cf. Billingsley (1968)]. The weak convergence of partial sums in (2.2) implies immediately that under the no change null hypothesis

\[ N^{-1/2} T_N \xrightarrow{\mathcal D} \sup_{0\le t\le 1}\left( W(t) - \inf_{0\le s\le t} W(s) \right), \]

where $\xrightarrow{\mathcal D}$ means convergence in distribution. According to Lévy's formula [cf. Chung and Williams (1983)], we have that

\[ \left\{ W(t) - \inf_{0\le s\le t} W(s),\ 0\le t<\infty \right\} \stackrel{\mathcal D}{=} \left\{ |W(t)|,\ 0\le t<\infty \right\}, \tag{2.3} \]

and therefore

\[ N^{-1/2} T_N \xrightarrow{\mathcal D} \sup_{0\le t\le 1} |W(t)|. \]

We can consider $T_N$ as the supremum functional of

\[ V_N(t) = N^{-1/2}\left\{ \sum_{i=1}^{\lfloor Nt\rfloor} V_i - \inf_{0\le s\le t}\sum_{j=1}^{\lfloor Ns\rfloor} V_j \right\}, \]

and via (2.2) the limit distributions of several other functionals of $V_N(t)$ can be derived. For example,

\[ \int_0^1 V_N^2(t)\,dt \xrightarrow{\mathcal D} \int_0^1 W^2(t)\,dt \quad\text{and}\quad \int_0^1 |V_N(t)|\,dt \xrightarrow{\mathcal D} \int_0^1 |W(t)|\,dt. \]

This procedure can be modified to test for the equality of other quantiles by altering the definition of $V_j$.
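Page's statistic is simple to compute from the running minimum of the partial sums of the $V_i$. The following sketch (simulated data; the sample size, change location and shift size are illustrative choices, not from the paper) contrasts the size of $N^{-1/2}T_N$ under the null with its size under a median shift:

```python
import numpy as np

def page_statistic(x, m):
    """Page's statistic T_N = max_k ( S_k - min_{j<=k} S_j ), where S_k
    is the partial sum of the signs V_i = +1 if X_i > m, -1 otherwise."""
    v = np.where(np.asarray(x) > m, 1.0, -1.0)
    s = np.concatenate(([0.0], np.cumsum(v)))   # S_0, S_1, ..., S_N
    running_min = np.minimum.accumulate(s)      # min_{j<=k} S_j
    return float(np.max(s - running_min))

rng = np.random.default_rng(0)
x = rng.normal(size=500)                          # no change: median 0 throughout
t_null = page_statistic(x, m=0.0) / np.sqrt(len(x))

x_change = np.concatenate([x[:250], x[250:] + 2.0])  # median shifts at k* = 250
t_alt = page_statistic(x_change, m=0.0) / np.sqrt(len(x_change))
```

Under the null the normalized statistic behaves like $\sup_{0\le t\le 1}|W(t)|$, whose upper 5% quantile is approximately 2.24, so values well above that indicate a change.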
The main crux of this technique, though, is that it assumes that the quantile of interest can be specified in advance, which, although reasonable in some contexts like quality control, is not plausible in many situations. Furthermore, even if a single quantile is specified, this procedure cannot in general detect deviations in other quantiles of the underlying sample. In light of this, one may wish to consider the more general hypothesis test of

\[ H_0: F_1^* = F_2^* = \cdots = F_N^*, \]

versus

\[ H_A: F_1^* = F_2^* = \cdots = F_{k^*}^* \neq F_{k^*+1}^* = F_{k^*+2}^* = \cdots = F_N^*, \]

for some unknown change point $k^*$, where $F_i^*$ is the cumulative distribution function of $X_i$. Below we outline a test of these hypotheses, which exploits empirical process theory, along the lines of Wolfe and Schechtman (1984) and Csörgő and Horváth (1987). Let

\[ F_{N,t}(x) = \frac{1}{N}\sum_{i=1}^{\lfloor Nt\rfloor} I\{X_i\le x\}, \quad -\infty<x<\infty,\ 0\le t\le 1, \]

denote the sequential empirical distribution function. The empirical quantile function is defined as the generalized inverse of $F_N(x) = F_{N,1}(x)$:

\[ Q_N(s) = \inf\{x: F_N(x)\ge s\}, \quad 0<s<1. \]

The process

\[ Y_N(t,s) = N^{1/2}\bigl(F_{N,t}(Q_N(s)) - ts\bigr) \]

is a version of the Wilcoxon-type process studied in Wolfe and Schechtman (1984) and Csörgő and Horváth (1987). If

\[ X_1, X_2, \ldots, X_N \ \text{are identically distributed with a continuous distribution function}\ F, \tag{2.4} \]

then by the Glivenko–Cantelli theorem for the empirical distribution and quantile functions, we get that for every $0<t,s<1$ and $-\infty<x<\infty$

\[ F_{N,t}(x) = \frac{\lfloor Nt\rfloor}{N}\,\frac{1}{\lfloor Nt\rfloor}\sum_{i=1}^{\lfloor Nt\rfloor} I\{X_i\le x\} \approx tF(x) \quad\text{and}\quad Q_N(s) \approx Q(s) = F^{-1}(s), \]

so the asymptotic value of $F_{N,t}(Q_N(s))$ is $ts$. Empirical process theory can be used to derive the weak limit of $Y_N(t,s)$ under (2.1) and (2.4). First, we write

\[ F_{N,t}(Q_N(s)) - ts = \frac{1}{N}\sum_{i=1}^{\lfloor Nt\rfloor}\bigl( I\{F(X_i)\le F(Q_N(s))\} - F(Q_N(s))\bigr) + \frac{\lfloor Nt\rfloor}{N}\bigl(F(Q_N(s)) - s\bigr) + \left( \frac{\lfloor Nt\rfloor}{N} - t \right)s. \tag{2.5} \]

Clearly,
\[ \sup_{0\le t,s\le 1}\left| \left( \frac{\lfloor Nt\rfloor}{N} - t\right)s \right| = O\left(\frac{1}{N}\right). \tag{2.6} \]

Also, on account of (2.1) and (2.4), the variables $F(X_1), F(X_2), \ldots, F(X_N)$ are independent and uniformly distributed on $[0,1]$. Hence the distribution of $Y_N(t,s)$ does not depend on $F$. By the weak convergence of the sequential empirical process, we can define Gaussian processes $K_N(t,s)$, $0\le t,s\le 1$, with $EK_N(t,s)=0$ and $EK_N(t,s)K_N(t',s') = \min(t,t')(\min(s,s')-ss')$ such that

\[ \sup_{0\le t\le 1}\ \sup_{-\infty<x<\infty}\left| N^{-1/2}\sum_{i=1}^{\lfloor Nt\rfloor}\bigl( I\{X_i\le x\} - F(x)\bigr) - K_N(t,F(x)) \right| = o_P(1). \tag{2.7} \]

By the Bahadur–Kiefer representation [cf. Sect. 3.3 in Csörgő and Horváth (1993)] we have that

\[ \sup_{0\le s\le 1}\bigl| F(Q_N(s)) + F_N(Q(s)) - 2s \bigr| = o_P(N^{-1/2}). \tag{2.8} \]

Putting together (2.6)–(2.8), we conclude

\[ \sup_{0\le t,s\le 1}\Bigl| Y_N(t,s) - \bigl[K_N(t,F(Q_N(s))) - t\,K_N(1,s)\bigr] \Bigr| = o_P(1). \]

It also follows from (2.7) and (2.8) that

\[ u_N = \sup_{0\le s\le 1}\bigl| F(Q_N(s)) - s \bigr| = O_P(N^{-1/2}). \tag{2.9} \]

So by the almost sure continuity of the sample paths of $K_N(t,s)$ [cf. Chapter 1 in Csörgő and Révész (1981)] we get

\[ \sup_{0\le t,s\le 1}\bigl| K_N(t,F(Q_N(s))) - K_N(t,s) \bigr| \le 2\sup_{0\le t,s\le 1}\ \sup_{0\le h\le u_N}\bigl| K_N(t,s+h) - K_N(t,s)\bigr| = o_P(1). \]

Thus, we have obtained the following result:

Theorem 2.1 If (2.1) and (2.4) hold, then we can define a sequence of Gaussian processes $K^\circ_N(t,s) = K_N(t,s) - t\,K_N(1,s)$ with $EK^\circ_N(t,s) = 0$ and $EK^\circ_N(t,s)K^\circ_N(t',s') = (\min(t,t')-tt')(\min(s,s')-ss')$ such that

\[ \sup_{0\le t,s\le 1}\bigl| Y_N(t,s) - K^\circ_N(t,s)\bigr| = o_P(1). \]

The distribution of $K^\circ_N(t,s)$ is the same for all $N$. The computation of the distributions of functionals of the limiting process is discussed in Blum et al. (1961) and Shorack and Wellner (1986). Statistics based on functionals of $Y_N(t,s)$ are robust, and they could be considered as adaptations of Wilcoxon-type statistics to change point detection. Rank-based procedures are discussed in Hušková (1997a,b) and Gombay and Hušková (1998), while $U$-statistics are utilized in Gombay (2000, 2001) and Horváth and Hušková (2005).
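On a finite grid, $Y_N(t,s)$ and the Cramér–von Mises-type functional $\int_0^1\int_0^1 Y_N^2(t,s)\,dt\,ds$ can be evaluated directly from order statistics. A sketch with simulated data (the grid size, sample size and the shift in the second half of the sample are illustrative assumptions, not from the paper):

```python
import numpy as np

def Y_process(x, n_grid=40):
    """Evaluate Y_N(t, s) = N^{1/2} (F_{N,t}(Q_N(s)) - t s) on a grid,
    with F_{N,t} the sequential empirical d.f. and Q_N the empirical
    quantile function of the full sample."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    grid = np.arange(1, n_grid + 1) / n_grid
    # Q_N(s) = inf{x : F_N(x) >= s} is the ceil(N s)-th order statistic
    q = np.sort(x)[np.ceil(N * grid).astype(int) - 1]
    Y = np.empty((n_grid, n_grid))
    for i, t in enumerate(grid):
        head = x[: int(np.floor(N * t))]
        for j, (s, qs) in enumerate(zip(grid, q)):
            Y[i, j] = np.sqrt(N) * (np.sum(head <= qs) / N - t * s)
    return Y

rng = np.random.default_rng(0)
cvm_null = np.mean(Y_process(rng.uniform(size=400)) ** 2)
x_alt = np.concatenate([rng.uniform(size=200), rng.uniform(size=200) + 0.5])
cvm_alt = np.mean(Y_process(x_alt) ** 2)
```

Under the null the grid average of $Y_N^2$ is close on average to $1/36$, the mean of the limit functional, while a distributional change inflates it markedly.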
For an up-to-date survey on robust change point analysis we refer to Hušková (2013).

Example 1 To illustrate the utility of Theorem 2.1, we consider data consisting of the monthly average temperatures in Prague between 1775 and 1989, which were provided to us by Dr. Jarušková (Technical University, Prague). For each month there are 215 observations, which we consider to be independent due to temporal separation. The first panel of Fig. 1 is the plot of $Y_{215}(t,s)$ for the March averages. We used $\int_0^1\int_0^1 Y_{215}^2(t,s)\,dt\,ds$ to test the no change in the distribution null hypothesis. Using the critical values from Blum et al. (1961), the null hypothesis was rejected at the 0.05 level. In total, the test rejected the no change null hypothesis at the 0.05 level in 9 out of the 12 months, which is consistent with the results in Horváth et al. (1999), where these data were also considered. To illustrate the difference between the shape of $Y_N(t,s)$ under $H_0$ and $H_A$, we generated 215 independent uniform random variables on $[0,1]$. The graph of $Y_{215}(t,s)$ for the uniform variables is in the second panel of Fig. 1. Note that when $H_0$ was not rejected we do not observe a large peak in $Y_N$, which is a typical feature under the alternative.

Fig. 1 Plots of $Y_{215}(t,s)$ for the March average temperature in Prague from 1775 to 1989 (left panel) and for simulated uniform random variables on $[0,1]$ (right panel)

3 Empirical process technique for time series

In several applications, especially in economics and finance, it cannot be assumed that the observations satisfy (2.1). In this section we are interested in the behavior of Page's procedure if we allow the independence assumption to be violated. We replace (2.1) with

\[ \{X_i,\ -\infty<i<\infty\}\ \text{is a stationary sequence.} \tag{3.1} \]

As before, the common distribution is denoted by $F$, and we assume that $F$ is continuous on the real line.
(3.2)

We show how to use empirical process theory and invariance principles to establish the weak convergence of $Y_N(t,s)$ and to approximate the distribution of $\int_0^1\int_0^1 Y_N^2(t,s)\,dt\,ds$ under (3.1). Due to its importance in applications, the sequential empirical process

\[ \alpha_N(t,x) = N^{-1/2}\sum_{i=1}^{\lfloor Nt\rfloor}\bigl( I\{X_i\le x\} - F(x)\bigr) \]

has received special attention in the literature. Assuming further conditions on the dependence structure of the $X_i$'s, which we address below, one can define a sequence of Gaussian processes $K_N(t,s)$, $0\le t,s\le 1$, with $EK_N(t,s)=0$ and $EK_N(t,s)K_N(t',s') = \min(t,t')\,C(s,s')$, where

\[ C(s,s') = \min(s,s') - ss' + \sum_{i=1}^{\infty}\bigl[ E\bigl(I\{F(X_0)\le s\}\,I\{F(X_i)\le s'\}\bigr) - ss'\bigr] + \sum_{i=1}^{\infty}\bigl[ E\bigl(I\{F(X_0)\le s'\}\,I\{F(X_i)\le s\}\bigr) - ss'\bigr], \]

such that

\[ \sup_{0\le t\le 1}\ \sup_{-\infty<x<\infty}\bigl| \alpha_N(t,x) - K_N(t,F(x))\bigr| = o_P(1). \tag{3.3} \]

Note that the distribution of $K_N(t,s)$ does not depend on $N$. Let $K(t,s)$ be a process satisfying $\{K(t,s),\ 0\le t,s\le 1\} \stackrel{\mathcal D}{=} \{K_N(t,s),\ 0\le t,s\le 1\}$. We show that if

\[ K(t,s)\ \text{has continuous sample paths with probability 1}, \tag{3.4} \]

then the weak approximation in (3.3) yields the limit of functionals of $Y_N(t,s)$ in the dependent case. Results establishing (3.3) under various dependence conditions can be found in Berkes et al. (2008, 2009a), Berkes and Horváth (2001, 2003), Berkes and Philipp (1977), Billingsley (1968), Louhichi (2000), Shao and Yu (1996) and Yu (1993). For surveys we refer to Berkes and Horváth (2002), Bradley (2007), Dedecker et al. (2007) and Doukhan (1994).

Theorem 3.1 If (3.1)–(3.3) hold, then we have

\[ \sup_{0\le t,s\le 1}\bigl| Y_N(t,s) - K^\circ_N(t,s)\bigr| = o_P(1), \]

where $K^\circ_N(t,s) = K_N(t,s) - t\,K_N(1,s)$.

Proof On account of (2.5) and (2.6) we need to consider the joint approximation of $\alpha_N(t,Q(s))$ and the corresponding quantile process $N^{1/2}(F(Q_N(s)) - s)$ in the dependent case. First, we show that (3.3) implies the Bahadur–Kiefer representation of (2.8). Using Lemma 1.1 in Csörgő and Horváth (1993), p 24 [cf.
also Horváth (1984)] and then (3.3), we conclude

\[ \sup_{0\le s\le 1}\bigl| F(Q_N(s)) - s\bigr| \le \sup_{0\le s\le 1}\bigl| F_N(Q(s)) - s\bigr| = O_P(N^{-1/2}). \]

Lemma 1.2 of Csörgő and Horváth (1993), p 25, combined with (3.3) and (3.4) implies that

\[ \sup_{0\le s\le 1}\bigl| N^{1/2}\bigl(s - F(Q_N(s))\bigr) - \alpha_N(1,Q_N(s))\bigr| = o_P(1). \tag{3.5} \]

Hence by (2.5) and (2.6) we have

\[ \sup_{0\le t,s\le 1}\bigl| Y_N(t,s) - \bigl(\alpha_N(t,Q_N(s)) - t\,\alpha_N(1,Q_N(s))\bigr)\bigr| = o_P(1). \]

Now the result follows from (3.3), the almost sure continuity of the approximating process assumed in (3.4), and (3.5).

Remark 3.1 The arguments used in the proof of Theorem 3.1 show that many results for the empirical distribution function and the empirical process can be automatically established for the empirical quantile and quantile processes. Not only asymptotic normality, weak convergence and laws of the iterated logarithm, but also rates of convergence and Bahadur–Kiefer representations (with exact rates) can be established if we have a rate in (3.3) and the modulus of continuity of $K$ is known [cf. Chapter 1 in Csörgő and Révész (1981) on the modulus of continuity of Wiener and related processes]. For example, the results in Dominicy et al. (2013), Dutta and Sen (1971), Oberhofer and Haupt (2005), Sen (1968) and Wu (2005) can also be derived from empirical process theory.

It follows immediately from Theorem 3.1 that under the no change null hypothesis, for every $0<s_0<1$,

\[ C^{-1/2}(s_0,s_0)\,Y_N(t,s_0) \xrightarrow{\mathcal D[0,1]} B(t), \]

where $B(t)$, $0\le t\le 1$, stands for a Brownian bridge. However, $C(s_0,s_0)$ is typically unknown and must be estimated from the sample. A Bartlett-type estimator computed from $I\{X_i\le Q_N(s_0)\}$, $1\le i\le N$, can be used to construct an asymptotically consistent estimator for $C(s_0,s_0)$. For some recent results on Bartlett estimators for the long-run variance we refer to Liu and Wu (2010) and Taniguchi and Kakizawa (2000). An alternative to estimating $C(s_0,s_0)$ is to use ratio statistics.
Adapting the main idea in Busetti and Taylor (2004), Horváth et al. (2008), Kim (2000) and Kim et al. (2002) to our case, the testing of the stability of the quantiles can be based on the following consequence of Theorem 3.1: for every $0<\delta<1$ we have

\[ \sup_{\delta<t<1} \frac{\sup_{0\le u\le t}|Y_N(u,s_0)|}{\sup_{0\le u\le t}|Y_N(u,s_0) - (u/t)Y_N(t,s_0)|} \xrightarrow{\mathcal D} \sup_{\delta<t<1} \frac{\sup_{0\le u\le t}|K^\circ(u,s_0)|}{\sup_{0\le u\le t}|K^\circ(u,s_0) - (u/t)K^\circ(t,s_0)|}. \tag{3.6} \]

The limit in (3.6) is distribution free, so it is easy to tabulate its distribution using Monte Carlo simulations. The covariance function of $K^\circ$ is the product of the covariance function of a Brownian bridge and $C(s,s')$, so by the Karhunen–Loève expansion of stochastic processes we get that

\[ \int_0^1\!\!\int_0^1 \bigl(K^\circ(t,s)\bigr)^2\,dt\,ds = \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \frac{1}{(i\pi)^2}\,\lambda_j\,N_{i,j}^2, \]

where $N_{i,j}$, $1\le i,j<\infty$, are independent standard normal random variables and $\lambda_1\ge\lambda_2\ge\cdots$ are the eigenvalues of the operator associated with $C(s,s')$, i.e.

\[ \lambda_i\,\varphi_i(s') = \int_0^1 C(s,s')\,\varphi_i(s)\,ds, \quad 1\le i<\infty, \tag{3.7} \]

with eigenfunctions $\varphi_1,\varphi_2,\ldots$. Since $C(s,s')$ is unknown, first we consider its estimation from the sample. The estimation is based on the observation that $C(s,s')$ is also the long-run covariance function of the random functions $I\{X_1\le Q(s)\}, I\{X_2\le Q(s)\}, \ldots, I\{X_N\le Q(s)\}$. Since $Q$ is unknown, the estimation of $C(s,s')$ is based on $I\{X_1\le Q_N(s)\}, I\{X_2\le Q_N(s)\}, \ldots, I\{X_N\le Q_N(s)\}$. We suggest $\hat C_N$, the kernel-based Bartlett estimator of Horváth et al. (2013b) [cf. also Horváth and Kokoszka (2012)]. It can be shown under very mild assumptions on the dependence structure of the observations that

\[ \int_0^1\!\!\int_0^1 \bigl(\hat C_N(s,s') - C(s,s')\bigr)^2\,ds\,ds' = o_P(1), \tag{3.8} \]

which implies for any fixed $i$ that

\[ |\hat\lambda_{i,N} - \lambda_i| = o_P(1), \tag{3.9} \]

where $\hat\lambda_{1,N}\ge\hat\lambda_{2,N}\ge\cdots$ are the eigenvalues of the operator associated with $\hat C_N(s,s')$, i.e.

\[ \hat\lambda_{i,N}\,\hat\varphi_{i,N}(s') = \int_0^1 \hat C_N(s,s')\,\hat\varphi_{i,N}(s)\,ds. \tag{3.10} \]
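In practice, eigenvalue problems such as (3.7) are solved numerically after discretizing the (estimated) kernel on a grid. As a sanity check, the sketch below (the grid size is an arbitrary choice) recovers the known eigenvalues $\lambda_j = 1/(j\pi)^2$ of the Brownian bridge kernel $C(s,s') = \min(s,s') - ss'$, which is the form $C$ takes for independent data:

```python
import numpy as np

# Midpoint discretization of the eigenvalue problem
#   lambda_i phi_i(s') = int_0^1 C(s, s') phi_i(s) ds
# for the independent-data kernel C(s,s') = min(s,s') - s s'.
n = 400
s = (np.arange(n) + 0.5) / n                    # midpoint grid on (0, 1)
C = np.minimum.outer(s, s) - np.outer(s, s)     # Brownian bridge covariance
lam = np.linalg.eigvalsh(C / n)[::-1]           # operator ~ matrix C/n, descending
exact = 1.0 / (np.arange(1, 6) * np.pi) ** 2    # known eigenvalues 1/(j*pi)^2
```

The same recipe applies with a kernel estimate evaluated on the grid in place of the exact kernel.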
The result in (3.9) is an immediate consequence of (3.7), (3.8) and (3.10). If $N$, $d(1)$ and $d(2)$ are sufficiently large, then

\[ P\left\{ \int_0^1\!\!\int_0^1 Y_N^2(t,s)\,dt\,ds \le x \right\} \approx P\left\{ \sum_{i=1}^{d(1)}\sum_{j=1}^{d(2)} \frac{1}{(i\pi)^2}\,\hat\lambda_{j,N}\,N_{i,j}^2 \le x \right\} \quad\text{for all } x. \]

Usually $d(1)$ and $d(2)$ are chosen using the cumulative variance approach; see Sect. 8. Fremdt (2013, 2014) develops some interesting applications of Page's procedure to sequential stability testing. Empirical $U$-quantiles are used in Dehling and Fried (2012) to detect changes in the location parameter in the case of dependent data.

4 Darling–Erdős laws and change in correlations

The survey paper Aue and Horváth (2013) explains the applications of CUSUMs (cumulative sums), including their weighted and self-normalized versions. The maximally selected self-normalized CUSUM is the maximally selected two-sample likelihood ratio when the errors are normally distributed, and its distribution was derived in Horváth (1993). It is also noted in Aue and Horváth (2013) that a weak invariance principle for the sums of the observations is enough to obtain the limit distribution of CUSUM statistics, but the derivation of the limit of the maximally selected self-normalized CUSUM requires approximation of the partial sums of the observations with a rate. In this case, the limit distribution is in the extreme value class of distributions. Due to the nonstandard nature of the limit, Andrews (1993) argued that the maximum must be computed over a restricted range, i.e. the maximum is trimmed. It must then be implicitly assumed that the break does not occur in a given percentage of observations at the beginning or the end of the data.

Fig. 2 Plots of $|Z_{215}(t)|$ (left panel) and $|Z_{215}(t)|/\sqrt{t(1-t)}$ (right panel) for the March average temperature in Prague from 1775 to 1989
The limit of the trimmed maximum depends on several unknown parameters and functions, so one usually needs resampling to obtain critical values. Andrews' approach has been criticized by, among others, Hidalgo and Seo (2013), who pointed out that the choice of the trimming parameter is arbitrary and that the test loses power if the change occurs among the trimmed off observations. In light of the comments in Hidalgo and Seo (2013), it would be interesting to reinvestigate the likelihood ratio test for multiple changes in Bai (1999) when the whole sample is used. All the proofs of the limit distributions of the likelihood ratio tests and the maximally selected self-normalized CUSUM are based on the following technique: (1) first, it is shown that the underlying process can be approximated by suitably chosen Gaussian processes; (2) then a limit theorem is proven for the Gaussian-based process. This technique first appeared in Darling and Erdős (1956), and therefore the limits of maxima of self-normalized statistics are usually referred to as Darling–Erdős laws. For more information on Darling–Erdős laws we refer to Csörgő and Horváth (1993, 1997).

Example 2 As an illustration of the CUSUM methods we reinvestigated the temperature data used in Example 1. In this case, we consider the test of $H_0: \mu_1 = \mu_2 = \cdots = \mu_N$, where $\mu_i = EX_i$, against the one change in the mean alternative. The CUSUM process is defined as

\[ Z_N(t) = \frac{1}{N^{1/2}\hat\sigma_N}\left( \sum_{i=1}^{\lfloor Nt\rfloor} X_i - \frac{\lfloor Nt\rfloor}{N}\sum_{i=1}^{N} X_i \right), \quad 0\le t\le 1, \]

where $X_1, X_2, \ldots, X_N$ denote the March averages and $\hat\sigma_N$ stands for the sample standard deviation. The left panel of Fig. 2 exhibits the graph of $|Z_N(t)|$ together with the 0.05 critical value (red line) computed from the distribution of $\sup_{0\le t\le 1}|B(t)|$, where $B$ is a Brownian bridge. The right panel is the graph of $|Z_N(t)|/(t(1-t))^{1/2}$. The rate of convergence in Darling–Erdős type limit results is slow [cf. Berkes et al.
(2004c)], so we used the 0.05 critical value recommended in Gombay and Horváth (1996) (the red line in the right panel). The no change in the mean null hypothesis is rejected by both tests. This is consistent with our findings in Example 1.

The CUSUM method was also applied to detect changes in variances, as in Gombay et al. (1996), Inclán and Tiao (1994) and Lee and Park (2001). For applications to stock prices and air traffic we refer to Hsu (1979). CUSUM methodology was extended to detect the stability of the covariances of observation vectors in Berkes et al. (2009c). The proofs in their paper are based on weighted approximations of partial sums. Change detection in linear processes was investigated by Hušková et al. (2007, 2008), Kirch (2007a,b), Kirch and Politis (2011) and Kirch and Steinebach (2006). Horváth and Serbinowska (1995) and Batsidis et al. (2013) propose several methods, including self-normalized and maximally selected $\chi^2$ statistics, to detect changes in multinomial data. Their methods are used to determine the number of translators of the Lindisfarne Gospels. Due to the popularity of volatility models (i.e. models in which the conditional variance depends on the previous observations and variances), several papers have been devoted to testing the stability of ARCH- and GARCH-type processes. For a general introduction and survey on volatility models we refer to Francq and Zakoïan (2010). In a nonlinear time series setting, parametric procedures were utilized by Berkes et al. (2004a) and Kokoszka and Leipus (2000) to detect breaks in the parameters of ARCH and GARCH processes, and by Berkes et al. (2004b) to sequentially monitor for breaks in the parameters of GARCH processes. For more general nonlinear processes we refer to Kirch and Tadjuidje Kamgaing (2012). Andreou and Ghysels (2002) were concerned with the dynamic evolution of financial market volatilities. Davis et al.
(2008) analyze parametric nonlinear time series models by means of minimum description length procedures.

In the rest of this section we focus on the application of the CUSUM method to detect instability in correlations, following Aue et al. (2009c). We observe the $d$-dimensional random vectors $y_1, y_2, \ldots, y_N$ and wish to check

\[ H_0: \mathrm{Cov}(y_1) = \mathrm{Cov}(y_2) = \cdots = \mathrm{Cov}(y_N) \]

against the alternative

\[ H_A: \text{there is an integer } k^* \text{ such that } \mathrm{Cov}(y_1) = \cdots = \mathrm{Cov}(y_{k^*}) \neq \mathrm{Cov}(y_{k^*+1}) = \cdots = \mathrm{Cov}(y_N). \]

As before, $k^*$ is the time of change and is unknown. In the derivation below we assume that the means of the observations do not change. This may require an initial test of the constancy of the means as described in Aue and Horváth (2013) and, if necessary, a transformation of the data so that the expected values can be regarded as stable over the whole observation period. To construct a test statistic for distinguishing between $H_0$ and $H_A$, we let $\mathrm{vech}(\cdot)$ be the operator that stacks the columns on and below the diagonal of a symmetric $d\times d$ matrix as a vector with $\tilde d = d(d+1)/2$ components. For a detailed study of the vech operator we refer to Horn and Johnson (1991). The CUSUM will be constructed from

\[ \tilde S_k = \frac{1}{N^{1/2}}\left\{ \sum_{j=1}^{k} \mathrm{vech}(\tilde y_j\tilde y_j^T) - \frac{k}{N}\sum_{j=1}^{N} \mathrm{vech}(\tilde y_j\tilde y_j^T) \right\}, \quad 1\le k\le N, \]

where $\cdot^T$ denotes the transpose of vectors and matrices and $\tilde y_j = y_j - \bar y_N$ with

\[ \bar y_N = \frac{1}{N}\sum_{j=1}^{N} y_j. \]

The variables $\tilde S_k$, $1\le k\le N$, form a vector-valued CUSUM sequence. The asymptotic properties of $\tilde S_k$, $1\le k\le N$, are determined by the partial sum process of $\mathrm{vech}\{(y_i - Ey_i)(y_i - Ey_i)^T\}$. Let

\[ \Omega = \sum_{i=-\infty}^{\infty} \mathrm{Cov}\bigl( \mathrm{vech}\{(y_0 - Ey_0)(y_0 - Ey_0)^T\},\ \mathrm{vech}\{(y_i - Ey_i)(y_i - Ey_i)^T\} \bigr) \]

denote the long-run covariance matrix. The matrix $\Omega$ is typically unknown, and we assume that it can be estimated asymptotically consistently by $\hat\Omega_N$, i.e.
\[ |\hat\Omega_N - \Omega| = o_P(1). \tag{4.1} \]

In addition to Taniguchi and Kakizawa (2000) and Liu and Wu (2010), a discussion of Bartlett-type kernel estimators satisfying (4.1) is in Brockwell and Davis (1991). The proposed estimators take the possible change into account, so (4.1) holds under the null as well as under the change point alternative. If the possibility of a change is not built into the definition of $\hat\Omega_N$, the power of the tests $\Lambda^{(i)}_N$, $i=1,\ldots,4$, defined in this section might be reduced. The statistics

\[ \Lambda^{(1)}_N = \max_{1\le k\le N} \tilde S_k^T \hat\Omega_N^{-1}\tilde S_k \quad\text{and}\quad \Lambda^{(2)}_N = \frac{1}{N}\sum_{k=1}^{N} \tilde S_k^T \hat\Omega_N^{-1}\tilde S_k \]

were suggested in Aue et al. (2009c), and their asymptotic distributions were established when the sequence $y_k$, $-\infty<k<\infty$, is a stationary, $m$-decomposable sequence. However, their assumption can be replaced with other measures of dependence, like mixing or near-epoch dependence conditions [cf. Wied et al. (2012)]. Under $H_0$ and certain regularity conditions it is shown in Aue et al. (2009c) that

\[ \Lambda^{(1)}_N \xrightarrow{\mathcal D} \Lambda^{(1)} = \sup_{0\le t\le 1}\sum_{i=1}^{\tilde d} B_i^2(t) \tag{4.2} \]

and

\[ \Lambda^{(2)}_N \xrightarrow{\mathcal D} \Lambda^{(2)} = \sum_{i=1}^{\tilde d}\int_0^1 B_i^2(t)\,dt, \tag{4.3} \]

where $B_1, B_2, \ldots, B_{\tilde d}$ are independent Brownian bridges. Formulas and critical values for the distribution functions of $\Lambda^{(1)}$ and $\Lambda^{(2)}$ are given in Kiefer (1959). It is interesting to note that if $\tilde d$ is large, both $\Lambda^{(1)} = \Lambda^{(1)}(\tilde d)$ and $\Lambda^{(2)} = \Lambda^{(2)}(\tilde d)$ can be approximated with normal random variables. Namely, both

\[ \bar\Lambda^{(1)} = \frac{\Lambda^{(1)}(\tilde d) - \tilde d/4}{(\tilde d/8)^{1/2}} \quad\text{and}\quad \bar\Lambda^{(2)} = \frac{\Lambda^{(2)}(\tilde d) - \tilde d/6}{(\tilde d/45)^{1/2}} \]

converge in distribution to a standard normal random variable as $\tilde d\to\infty$. We can define another type of test statistic by comparing the sample means of $\mathrm{vech}(\tilde y_j\tilde y_j^T)$, $1\le j\le k$, to the sample means of $\mathrm{vech}(\tilde y_j\tilde y_j^T)$, $k+1\le j\le N$.
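For independent observations the long-run covariance matrix reduces to the ordinary covariance of the vech terms, and the averaged self-normalized quadratic form above takes only a few lines. A sketch with simulated data; the plug-in sample covariance used as the normalization, and all sizes, are illustrative assumptions rather than the estimator proposed in the paper:

```python
import numpy as np

def vech(m):
    """Stack the lower triangle (including the diagonal) of a symmetric matrix."""
    i, j = np.tril_indices(m.shape[0])
    return m[i, j]

def cov_cusum_avg(y):
    """Average of the self-normalized quadratic forms S_k^T Omega^{-1} S_k
    built from the CUSUM of vech(y~_j y~_j^T); Omega is replaced by the
    sample covariance of the vech terms (adequate for independent data)."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    v = np.array([vech(np.outer(r, r)) for r in (y - y.mean(axis=0))])
    s = np.cumsum(v, axis=0)
    S = (s - np.arange(1, N + 1)[:, None] / N * s[-1]) / np.sqrt(N)
    omega_inv = np.linalg.inv(np.cov(v, rowvar=False))
    return float(np.einsum('ki,ij,kj->k', S, omega_inv, S).mean())

rng = np.random.default_rng(3)
y0 = rng.normal(size=(400, 2))                       # stable covariance
y1 = np.vstack([rng.normal(size=(200, 2)),
                3.0 * rng.normal(size=(200, 2))])    # variance break at k* = 200
lam0, lam1 = cov_cusum_avg(y0), cov_cusum_avg(y1)
```

For d = 2 (so that the vech has 3 components), the limit in (4.3) has mean 1/2, which calibrates the two values computed above.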
If $H_0$ holds, then

\[ \mathrm{Cov}\left\{ \frac{1}{k}\sum_{j=1}^{k}\mathrm{vech}(\tilde y_j\tilde y_j^T) - \frac{1}{N-k}\sum_{j=k+1}^{N}\mathrm{vech}(\tilde y_j\tilde y_j^T) \right\} \approx \left( \frac{1}{k} + \frac{1}{N-k} \right)\Omega, \]

so other natural statistics are

\[ \Lambda^{(3)}_N = \max_{1\le k<N}\left\{ \frac{1}{k}\sum_{j=1}^{k}\mathrm{vech}(\tilde y_j\tilde y_j^T) - \frac{1}{N-k}\sum_{j=k+1}^{N}\mathrm{vech}(\tilde y_j\tilde y_j^T) \right\}^T \left[ \left( \frac{1}{k}+\frac{1}{N-k}\right)\hat\Omega_N \right]^{-1} \left\{ \frac{1}{k}\sum_{j=1}^{k}\mathrm{vech}(\tilde y_j\tilde y_j^T) - \frac{1}{N-k}\sum_{j=k+1}^{N}\mathrm{vech}(\tilde y_j\tilde y_j^T) \right\} \]

and

\[ \Lambda^{(4)}_N = \frac{1}{N}\sum_{k=1}^{N-1}\left\{ \frac{1}{k}\sum_{j=1}^{k}\mathrm{vech}(\tilde y_j\tilde y_j^T) - \frac{1}{N-k}\sum_{j=k+1}^{N}\mathrm{vech}(\tilde y_j\tilde y_j^T) \right\}^T \left[ \left( \frac{1}{k}+\frac{1}{N-k}\right)\hat\Omega_N \right]^{-1} \left\{ \frac{1}{k}\sum_{j=1}^{k}\mathrm{vech}(\tilde y_j\tilde y_j^T) - \frac{1}{N-k}\sum_{j=k+1}^{N}\mathrm{vech}(\tilde y_j\tilde y_j^T) \right\}. \]

The statistic $\Lambda^{(3)}_N$ is a maximally selected self-normalized quadratic form, while $\Lambda^{(4)}_N$ is a sum of self-normalized CUSUMs. Following the proofs of (4.2) and (4.3), one can show that under $H_0$

\[ \Lambda^{(4)}_N \xrightarrow{\mathcal D} \Lambda^{(4)} = \sum_{i=1}^{\tilde d}\int_0^1 \frac{B_i^2(t)}{t(1-t)}\,dt. \tag{4.4} \]

However, it is much harder and more delicate to obtain the limit distribution of $\Lambda^{(3)}_N$. The main difficulty is that the weak convergence type arguments leading to (4.2)–(4.4) need to be replaced with an approximation of the partial sums of $\mathrm{vech}[(y_j - Ey_j)(y_j - Ey_j)^T]$, $1\le j\le N$, with a suitable rate. Also, we need to replace (4.1) with

\[ |\hat\Omega_N - \Omega| = o_P(1/\log\log N). \tag{4.5} \]

To prove a Darling–Erdős law for $\Lambda^{(3)}_N$, first we need to find an approximation for the partial sums of $\mathrm{vech}[(y_i - Ey_i)(y_i - Ey_i)^T]$, $1\le i\le N$. Namely, one must show that there are $\xi_1 = \xi_1(N), \xi_2 = \xi_2(N), \ldots, \xi_N = \xi_N(N)$, independent identically distributed standard normal random vectors in $R^{\tilde d}$, such that

\[ \max_{1\le k<n\le N} \frac{1}{(n-k)^{1/2-\epsilon}}\left| \sum_{j=k+1}^{n} \mathrm{vech}[(y_j - Ey_j)(y_j - Ey_j)^T] - \Omega^{1/2}\sum_{j=k+1}^{n}\xi_j \right| = O_P(1) \tag{4.6} \]

with some $\epsilon>0$, and

\[ \max_{1\le k\le N}\left| \sum_{j=1}^{k} \mathrm{vech}[(y_j - Ey_j)(y_j - Ey_j)^T] - \Omega^{1/2}\sum_{j=1}^{k}\xi_j \right| = o_P(N^{1/2}). \tag{4.7} \]

The approximations we assumed in (4.6) and (4.7) have been established under a large variety of dependence conditions [cf., for example, Berkes et al. (2014), Berkes and Philipp (1979), Bradley (2007) and Eberlein (1986)].
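Each summand of the limit in (4.4) is an Anderson–Darling-type weighted bridge integral with mean $\int_0^1 t(1-t)/(t(1-t))\,dt = 1$, so the limit has mean equal to the number of vech components. A Monte Carlo sketch of this limit for three components (the grid and replication sizes are arbitrary choices):

```python
import numpy as np

# Simulate sum_{i=1}^{3} int_0^1 B_i^2(t) / (t(1-t)) dt by building
# discrete Brownian bridges from scaled random walks.
rng = np.random.default_rng(5)
d_t, n, reps = 3, 500, 600
t = np.arange(1, n) / n                                  # interior grid points
vals = np.empty(reps)
for r in range(reps):
    w = np.cumsum(rng.normal(size=(d_t, n)), axis=1) / np.sqrt(n)
    b = (w - np.arange(1, n + 1) / n * w[:, -1:])[:, :-1]  # bridges at the grid
    vals[r] = np.sum(b ** 2 / (t * (1 - t))) / n           # Riemann sum
```

The empirical mean of vals should be close to 3, and the empirical quantiles of vals give usable Monte Carlo critical values for the corresponding statistic.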
Using (4.6) and (4.7), or even weaker versions of these, the arguments in Csörgő and Horváth (1993, 1997) can be repeated [cf. also Aue and Horváth (2013)], and the following result can be established under the no change in the covariances null hypothesis:

\[ \lim_{N\to\infty} P\left\{ (2\log\log N)\,\Lambda^{(3)}_N \le \bigl( x + 2\log\log N + (\tilde d/2)\log\log\log N - \log\Gamma(\tilde d/2)\bigr)^2 \right\} = \exp(-2e^{-x}) \tag{4.8} \]

for all $x$, where $\Gamma(t)$ denotes the Gamma function.

It has been observed that the rate of convergence in (4.8) might be slow, and a large $N$ is needed to use the limit to approximate the distribution of $\Lambda^{(3)}_N$. It is pointed out in Davis et al. (1995) that the result in (4.8) remains valid with different, but of course asymptotically equivalent, norming sequences, and the choice of these sequences might affect the finite sample performance. Due to the slow rate of convergence, resampling methods can be used to get critical values for $\Lambda^{(3)}_N$. The rate of convergence in Darling–Erdős laws was studied in Berkes et al. (2004c) in a very simple case, and they proved that the rate of convergence of the permutation resampling is better than the rate of convergence in the Darling–Erdős limit theorem for the maximally selected CUSUM. Correlations instead of covariances are used in Wied et al. (2012) to check the stability of the connection between the coordinates of the observations, while the method in Wied et al. (2014) is based on the nonparametric Spearman's rho.

Example 3 In this example, we examine the relationship between the stock price of American International Group (AIG) and the Housing Price Index (HPI) published by the US Federal Housing Finance Agency. The data we consider are available at http://finance.yahoo.com and http://www.fhfa.gov, and graphs of these observations from January, 1991 to January, 2013 are displayed in Fig. 3.

Fig. 3 AIG stock prices and the Housing Price Index between 02 January, 1991 and 31 January, 2013
Major losses are visible in both graphs around the summer of 2008, which correspond to the collapse of the US housing bubble and the beginning of a global financial crisis. The monthly log returns of the stock price of AIG, as well as the detrended monthly log returns of the HPI, are displayed in Fig. 4. Both sequences show typical GARCH features [cf. Francq and Zakoïan (2010)]. Maximally selected CUSUM procedures suggest that both time series experience a change in volatility at roughly the same time. CUSUM procedures can also be used to test the stability of the correlation between the log returns of the AIG stock price and those of the HPI over the observation period, as in Wied et al. (2012). Let $\hat\rho_k$ and $\bar\rho_{N-k}$ denote the sample correlations computed from the first $k$ and the last $N-k$ observations, respectively. The corresponding CUSUM process is defined by
\[
R_N(t)=\frac{N^{1/2}\,t(1-t)}{\hat\sigma_N}\bigl(\hat\rho_{\lfloor Nt\rfloor}-\bar\rho_{N-\lfloor Nt\rfloor}\bigr),\qquad 0\le t\le 1,
\]
where $\hat\sigma_N$ is a Bartlett-type estimator of the variance of the summands in the correlation.

Fig. 4 The monthly log returns on the AIG stock price and on the Housing Price Index between 02 January, 1991 and 31 January, 2013 are on the first two panels. The detrended log returns on the Housing Price Index are on the third panel

Fig. 5 Plots of $|R_{264}(t)|$ (left panel) and $|R_{264}(t)|/\sqrt{t(1-t)}$ (right panel) for the correlation between the AIG and housing index monthly log returns

The graphs of $|R_N(t)|$ and its weighted version are shown in Fig. 5. In each case the process has a definitive peak which corresponds to the month of August 2008, the month immediately following the signing of the Housing and Economic Recovery Act and one month before Lehman Brothers Holdings Inc. filed for bankruptcy.
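The correlation CUSUM above can be imitated with a few lines of code. This sketch is not the Wied et al. (2012) implementation: the Bartlett-type long-run estimator $\hat\sigma_N$ is replaced by a simple i.i.d.-style plug-in computed from the standardized cross-products, and the function name `correlation_cusum` is our own.

```python
import numpy as np

def correlation_cusum(x, y):
    """CUSUM process comparing the sample correlation of the first k
    observations with that of the last N - k, weighted by t(1 - t)."""
    n = len(x)
    r = np.zeros(n)
    # plug-in variance estimate from standardized cross-products: an
    # i.i.d.-style simplification of the Bartlett-type estimator
    xs = (x - x.mean()) / x.std()
    ys = (y - y.mean()) / y.std()
    sigma_hat = np.std(xs * ys)
    for k in range(3, n - 2):                 # need a few points on each side
        rho_front = np.corrcoef(x[:k], y[:k])[0, 1]
        rho_back = np.corrcoef(x[k:], y[k:])[0, 1]
        t = k / n
        r[k] = np.sqrt(n) * t * (1 - t) * (rho_front - rho_back) / sigma_hat
    return r

rng = np.random.default_rng(1)
n = 300
z = rng.normal(size=n)
x = rng.normal(size=n)
# independent in the first half, correlated with x in the second half
y = np.where(np.arange(n) < 150, rng.normal(size=n), 0.8 * x + 0.6 * z)
stat = np.abs(correlation_cusum(x, y)).max()
```

As in Fig. 5, one would plot the absolute process (or its weighted version) and look for a pronounced peak near the break.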
The correlation of the monthly log returns of the stock price of AIG and the HPI in the months prior to August 2008 was 0.011, compared to a correlation of 0.305 after August 2008. We observed similar results when considering other major US banks and lending companies.

5 Regressions

Following Quandt (1958, 1960), we consider linear regression with one possible change in a subset of the regressors:
\[
y_i=x_{i,1}^T(\beta+\delta I\{i>k^*\})+x_{i,2}^T\gamma+\varepsilon_i,\qquad 1\le i\le N, \tag{5.1}
\]
where $x_{i,1}\in\mathbb R^d$, $x_{i,2}\in\mathbb R^p$ are known column vectors and $k^*$, the time of change, is unknown. The regression parameter $\beta\in\mathbb R^d$ changes to $\beta+\delta\in\mathbb R^d$ at time $k^*$, while $\gamma\in\mathbb R^p$ remains constant during the observation period. All the parameters are unknown, but it is assumed that $\delta\ne 0$. Under the null hypothesis
\[
H_0:\ k^*\ge N, \tag{5.2}
\]
while under the alternative
\[
H_A:\ 1<k^*<N. \tag{5.3}
\]
In the model above not all regressors change, only a subset of them. This problem was first studied in Quandt (1958, 1960) in the case when all regressors could change, i.e. when it is known that $\gamma=0$. We use the quasi-likelihood method, i.e. we assume that the errors $\varepsilon_i$, $1\le i\le N$, are independent normal random variables with unknown variance $\sigma^2$. Assuming that the change occurs at time $k^*=k$, this is a standard two-sample problem and the likelihood ratio $\Lambda_k$ can be computed easily and explicitly. For details we refer to Sect. 3.1.1 of Csörgő and Horváth (1997). Since the time of change is unknown, we use the maximally selected likelihood ratio
\[
Z_N=\max_{d+p<k<N-(d+p)}(-2\log\Lambda_k).
\]
It was pointed out in Quandt (1958, 1960) that even under $H_0$ we have $Z_N\to\infty$ in probability, which was misinterpreted to mean that $Z_N$ does not have a limit distribution. Following Andrews (1993), the truncated statistic
\[
Z_{N,\delta}=\max_{N\delta\le k\le N-N\delta}(-2\log\Lambda_k)
\]
has been considered in the literature, since under $H_0$ it has a limit distribution for every $0<\delta<1/2$. However, the power depends strongly on $\delta$ and the unknown $k^*$.
Also, the limit distribution depends heavily on the unknown parameters. The statistic $Z_N$ does not have these drawbacks. Let
\[
X_{11,k}=\begin{bmatrix}x_{1,1}^T\\ \vdots\\ x_{k,1}^T\end{bmatrix},\quad
X_{12,k}=\begin{bmatrix}x_{1,2}^T\\ \vdots\\ x_{k,2}^T\end{bmatrix},\quad
X_{21,k}=\begin{bmatrix}x_{k+1,1}^T\\ \vdots\\ x_{N,1}^T\end{bmatrix},\quad
X_{22,k}=\begin{bmatrix}x_{k+1,2}^T\\ \vdots\\ x_{N,2}^T\end{bmatrix}.
\]
We assume that there are matrices $A_{1,1}$, $A_{1,2}$, $A_{2,2}$ such that
\[
\frac{1}{k}X_{11,k}^TX_{11,k}\approx A_{1,1},\quad \frac{1}{k}X_{11,k}^TX_{12,k}\approx A_{1,2},\quad \frac{1}{k}X_{12,k}^TX_{12,k}\approx A_{2,2}, \tag{5.4}
\]
\[
\frac{1}{N-k}X_{21,k}^TX_{21,k}\approx A_{1,1},\quad \frac{1}{N-k}X_{21,k}^TX_{22,k}\approx A_{1,2},\quad \frac{1}{N-k}X_{22,k}^TX_{22,k}\approx A_{2,2}, \tag{5.5}
\]
and the matrix
\[
A=\begin{bmatrix}A_{1,1}&A_{1,2}\\ A_{1,2}^T&A_{2,2}\end{bmatrix}\ \text{has rank } d+p. \tag{5.6}
\]
Assuming that $H_0$ holds, that (5.4)–(5.6) are satisfied, and that $\varepsilon_1,\varepsilon_2,\ldots,\varepsilon_N$ are independent and identically distributed with enough moments, one can prove that
\[
\lim_{N\to\infty}P\left\{(2\log\log N)^{1/2}Z_N^{1/2}\le x+2\log\log N+(d/2)\log\log\log N-\log\Gamma(d/2)\right\}=\exp(-2e^{-x}) \tag{5.7}
\]
for all $x$, where $\Gamma(\cdot)$ denotes the Gamma function. The limit result in (5.7) was obtained in Horváth (1995) and is also given in Sect. 3.1.1 of Csörgő and Horváth (1997). Conditions (5.4)–(5.6) are satisfied not only by a large class of numerical sequences but also by realizations of stationary and ergodic variables. We used "$\approx$" in (5.4) and (5.5) since it can also mean closeness in probability. Likelihood methods can also be extended to include a possible change in the variance under the alternative. The natural estimator for $k^*$ is $\operatorname{argmax}_k(-2\log\Lambda_k)$. The asymptotic properties of this estimator are investigated in Horváth et al. (1997) and Hušková (1996). Chapter 3 of Csörgő and Horváth (1997) contains results on the power of the maximally selected likelihood ratio and further testing methods.
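The maximally selected likelihood ratio is simple to compute by scanning candidate break points. The sketch below treats the special case where every coefficient may change (i.e. $\gamma = 0$, the Quandt setting), in which, under normal errors with unknown common variance, $-2\log\Lambda_k = N\log\bigl(\mathrm{RSS}_0/(\mathrm{RSS}_{1,k}+\mathrm{RSS}_{2,k})\bigr)$; the function names are our own.

```python
import numpy as np

def rss(x, y):
    """Residual sum of squares from an OLS fit of y on x."""
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ beta
    return resid @ resid

def max_likelihood_ratio(x, y):
    """Maximally selected quasi-likelihood ratio Z_N for a change in all
    regression coefficients (the gamma = 0 special case of model (5.1))."""
    n, d = x.shape
    rss0 = rss(x, y)                          # no-change fit
    z = -np.inf
    for k in range(d + 1, n - d - 1):         # trim so both segment fits exist
        split = rss(x[:k], y[:k]) + rss(x[k:], y[k:])
        z = max(z, n * np.log(rss0 / split))  # -2 log Lambda_k
    return z

rng = np.random.default_rng(2)
n = 200
x = np.column_stack([np.ones(n), rng.normal(size=n)])
y = x @ np.array([1.0, 2.0]) + rng.normal(size=n)
y[120:] += 1.5 * x[120:, 1]                   # slope shifts at k* = 120
zn = max_likelihood_ratio(x, y)
```

The trimming in the loop mirrors the range $d+p<k<N-(d+p)$ of the maximization; critical values would then come from a Darling–Erdős normalization of $Z_N$ as in (5.7) or from resampling.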
The proof of (5.7) is based on a strong approximation of the partial sums of the vectors $(x_{i,1}+x_{i,2})\varepsilon_i$, $1\le i\le N$, so it can be extended to the case when the errors $\varepsilon_i$, $1\le i\le N$, are dependent. However, since $Z_N$ was derived under the assumption of independence, adjustments must be made to the limit result in (5.7) involving the long-run covariance matrix of the sum of $(x_{i,1}+x_{i,2})\varepsilon_i$, $1\le i\le N$ [cf. (5.10) below]. Next we consider the model
\[
y_i=x_i^T(\beta+\delta I\{i>k^*\})+\varepsilon_i,\qquad 1\le i\le N, \tag{5.8}
\]
where
\[
x_i=h(i/N),\qquad 1\le i\le N. \tag{5.9}
\]
As before, $\delta\ne 0$ and we wish to test (5.2) against (5.3). Assumption (5.9) covers the polynomial and harmonic regression in Jandhyala and MacNeill (1997), the linear model in Albin and Jarušková (2003), Hansen (2000), Hušková and Picek (2005), Jarušková (1998), and the polynomial regression in Aue et al. (2009a), Aue et al. (2012), Aue et al. (2008a), Jarušková (1999) and Kuang (1998). The maximally selected likelihood ratio can be derived as in the first part of this section, so we need to derive the limit distribution of $Z_N$ under assumption (5.9). Note that (5.4) and (5.5) hold only if $h$ is a constant function, so the limit result in (5.7) cannot be used even if the errors $\varepsilon_1,\varepsilon_2,\ldots,\varepsilon_N$ were independent and identically distributed. Instead of assuming mixing or similar conditions on the stationary errors $\varepsilon_i$, $1\le i\le N$, Aue et al. (2012) assumed that the partial sums of the errors can be approximated with suitably constructed Wiener processes. For example, such approximations have been obtained for ARMA, GARCH-type, mixing, near epoch-dependent and $m$-approximable sequences. These results can be found in Aue et al. (2006b), Berkes et al. (2014), Bradley (2007), Dedecker et al. (2007), Doukhan (1994) and Eberlein (1986). The main result in Aue et al.
(2012) says that
\[
\lim_{N\to\infty}P\left\{(\sigma/\tau)^2Z_N\le x+2\log\log N+d\log\log\log N-2\log\bigl(2^{d/2}\Gamma(d/2)/d\bigr)\right\}=\exp(-2e^{-x/2}) \tag{5.10}
\]
for all $x$, where $\sigma^2=E\varepsilon_i^2$ and
\[
\tau^2=\lim_{N\to\infty}\frac{1}{N}E\left(\sum_{i=1}^{N}\varepsilon_i\right)^2.
\]
We note that (5.10) can be written as
\[
\lim_{N\to\infty}P\left\{(2\log\log N)^{1/2}(\sigma/\tau)Z_N^{1/2}\le x+2\log\log N+(d/2)\log\log\log N-\log\bigl(2^{d/2}\Gamma(d/2)/d\bigr)\right\}=\exp(-2e^{-x}). \tag{5.11}
\]
As in the classical case of (5.7), under assumptions (5.4)–(5.6) the limit result in (5.10) depends only on the number of parameters which can change under the alternative. Only the constant in the normalization differs between (5.7) and (5.11). The methods discussed in this section will falsely reject the null hypothesis of no change in the parameters, even when the null is correct, if the errors are nonstationary; typically a random walk is used to model nonstationarity. Hence, it is a challenging question to test for a unit root if there are change points in the regression line. Wright (1996) derives the limits of some of the stability tests, including the Lagrange multiplier and sup-Wald tests, when the regressor is local to a random walk, extending the result in Hansen (1992). A procedure which utilizes auxiliary statistics to detect the presence of trend breaks, and uses the outcome of the detection step to check for a possible unit root, is outlined in Carrion-i-Silvestre et al. (2009). Their method achieves nearly asymptotically efficient unit root inference in both the no trend break and the trend break environments. However, according to their simulations, the high efficiency is not apparent if the sample size is small or moderate. Harvey et al. (2013) propose a test that allows for multiple breaks in the regression, obtained by taking the infimum of the sequence (across all possible break points) of detrended Dickey–Fuller type statistics. Change point detection is considered in Iacone et al.
(2013) when the errors in a linear regression are integrated (including the random walk case).

6 Sequential testing

So far we have discussed retrospective break point tests, i.e. the case when we have a set of observations and wish to check whether the parameter of interest changed during the observation period. There is a large and still growing literature concerned with retrospective break point tests and estimation procedures, but much less attention has been paid to the corresponding sequential procedures. Starting with the seminal paper of Chu et al. (1996), several authors have developed fluctuation tests based on the general paradigm that an initial time period (sometimes called the historical or training sample) of length $m$ is used to estimate a model, with the goal of monitoring for parameter changes on-line. We assume that $X_1,X_2,\ldots,X_m$ are from a stable observation period, i.e. the parameter of interest is the same for these observations. The asymptotic analysis is carried out for $m\to\infty$. Under the null hypothesis the parameter of $X_{m+1},X_{m+2},\ldots$ is the same, while under the alternative there is an integer $k^*\ge 0$ such that the parameter of $X_{m+1},X_{m+2},\ldots,X_{m+k^*}$ is the same as in the historical sample, but $X_{m+k^*+1},X_{m+k^*+2},\ldots$ have a different parameter ($k^*=0$ is taken to mean that the parameter changes immediately after the last historical observation). To test the null hypothesis of structural stability sequentially, one defines a stopping time $\tau_m$ that rejects the null as soon as a detector $D_m(k)$, suitably constructed from the sample, crosses an appropriate threshold $g(m,k)$ (which measures the growth of the detector under the null). Two types of stopping times have been used in the literature:
\[
\tau_m=\inf\{k:\ D_m(k)\ge g(m,k)\}, \tag{6.1}
\]
where $\inf\emptyset=\infty$, and
\[
\tau_m(N)=\inf\{k<N:\ D_m(k)\ge g(m,k)\}, \tag{6.2}
\]
with $\tau_m(N)=N+1$ if $D_m(k)<g(m,k)$ for all $1\le k\le N$.
If the stopping time $\tau_m$ is used, the method is called open-ended, since under the null hypothesis we may never stop the data-generating process. We say that the sequential testing is closed-ended if it is based on $\tau_m(N)$, since the data collection will terminate after the $N$th observation. The boundary $g(m,k)$ is chosen such that
\[
\lim_{m\to\infty}P\{\tau_m<\infty\}=\alpha\quad\text{under } H_0, \tag{6.3}
\]
and, in case of a closed-ended procedure,
\[
\lim_{m\to\infty}P\{\tau_m(N)\le N\}=\alpha\quad\text{under } H_0, \tag{6.4}
\]
where $\alpha$ is a fixed small number picked by the practitioner. Equations (6.3) and (6.4) mean that the probability of stopping when we should not is $\alpha$ if the historical sample size is large enough. Of course, (6.3) or (6.4) does not determine the boundary, and we should use a boundary such that the delay $\tau_m-k^*$ (or $\tau_m(N)-k^*$) is small under the alternative. To illustrate the method, we consider the linear model
\[
y_i=x_i^T\beta_i+\varepsilon_i,\qquad i\ge 1.
\]
We assume that the training sample of size $m$ is noncontaminated, i.e. $\beta_1=\beta_2=\cdots=\beta_m$. Under $H_0$ we have $\beta_m=\beta_{m+1}=\cdots$, while under the alternative $\beta_m=\beta_{m+1}=\cdots=\beta_{m+k^*}\ne\beta_{m+k^*+1}=\beta_{m+k^*+2}=\cdots$. Let $\hat\beta_m$ denote the least squares estimator for the regression parameter computed from $y_i,x_i$, $1\le i\le m$. Following Horváth et al. (2004), the detector is based on the residuals $\hat\varepsilon_i=y_i-x_i^T\hat\beta_m$, $m+1\le i<\infty$, and on $\hat\sigma_m^2$, the estimator for the common variance of the $\varepsilon_i$'s computed from the training sample. The detector is the CUSUM of the residuals,
\[
D_m(k)=\frac{1}{\hat\sigma_m}\left|\sum_{m<i\le m+k}\hat\varepsilon_i\right|,
\]
and the boundary function is
\[
g(m,k)=cm^{1/2}\left(1+\frac{k}{m}\right)\left(\frac{k}{m+k}\right)^{\gamma}, \tag{6.5}
\]
where $0\le\gamma<1/2$ is a given parameter and $c$ is determined by (6.3). Assuming that $\varepsilon_1,\varepsilon_2,\ldots$ are independent and identically distributed random variables with $E\varepsilon_i=0$, $E\varepsilon_i^2=\sigma^2$ and $E|\varepsilon_i|^\nu<\infty$ for some $\nu>2$, one can show that
\[
\lim_{m\to\infty}P\{\tau_m<\infty\}=\lim_{m\to\infty}P\left\{\sup_{1\le k<\infty}D_m(k)/g(m,k)\ge 1\right\}=P\left\{\sup_{0\le t\le 1}|W(t)|/t^{\gamma}\ge c\right\}, \tag{6.6}
\]
where $W$ denotes a Wiener process. The assumption that the errors are independent and identically distributed can be replaced by the assumption that the partial sums of the $\varepsilon_i$'s can be approximated with Wiener processes, but in this case $\hat\sigma_m^2$ must be a consistent estimator for $\lim_{m\to\infty}\operatorname{var}(m^{-1/2}\sum_{1\le i\le m}\varepsilon_i)$. For details we refer to Aue et al. (2008b). By the law of the iterated logarithm for the Wiener process we get immediately that
\[
\sup_{0<t<1}|W(t)|/t^{1/2}=\infty\quad\text{with probability one,}
\]
so we cannot choose $\gamma=1/2$ in (6.6). The limit distribution of $\tau_m$ was studied in Aue and Horváth (2004) and Aue et al. (2008b) under the alternative in the special case of a change in location (i.e. $x_i=1$). They found numerical sequences $a_m=a_m(\gamma)$ and $b_m=b_m(\gamma)>0$ such that under the alternative
\[
\lim_{m\to\infty}P\left\{\frac{\tau_m-a_m}{b_m}\le x\right\}=\Phi(x)
\]
for all $x$, where $\Phi(x)$ denotes the standard normal distribution function. They also pointed out that $\tau_m-k^*\approx m^{(1-2\gamma)/(1-\gamma)}$ (in probability), so the delay decreases as $\gamma\to 1/2$; it would be smallest at $\gamma=1/2$, but this case is not allowed in (6.6). The case $\gamma=1/2$ is investigated in Horváth et al. (2007), where it is made clear that the boundary function $g(m,k)$ of (6.5) cannot be used. In case of a closed-ended procedure we use
\[
g(m,k)=\frac{c+2\log\log m+(1/2)\log\log\log m-(1/2)\log\pi}{(2\log\log m)^{1/2}}\,m^{1/2}\left(1+\frac{k}{m}\right)\left(\frac{k}{k+m}\right)^{1/2} \tag{6.7}
\]
instead of (6.5). It is proved in Horváth et al. (2007) that if $N$ is proportional to $m^{\lambda}$, where $\lambda\ge 1$, then, assuming no change in the parameters of the linear model, we have
\[
\lim_{m\to\infty}P\{\tau_m(N)\le N\}=\exp(-e^{-c}), \tag{6.8}
\]
where the boundary is given by (6.7). One can also show [cf. Aue and Horváth (2004) and Aue et al. (2008b)] that $\tau_m(N)-k^*$ is bounded by $\log\log m$ in probability, which is an improvement over the polynomial bound obtained when (6.5) is used. For a multivariate extension we refer to Aue et al. (2014).
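The open-ended monitoring scheme is easy to sketch in the location special case ($x_i = 1$), where the least squares fit on the training sample reduces to the sample mean. The fragment below is a minimal illustration, assuming i.i.d. errors; the threshold constant `c` is simply fixed at an illustrative value rather than calibrated from the distribution of $\sup|W(t)|/t^\gamma$, and the function name `monitor` is our own.

```python
import numpy as np

def monitor(training, stream, gamma=0.25, c=2.0):
    """Open-ended CUSUM monitoring of a location model, using the detector
    and the boundary function of (6.5).  Returns the stopping index k
    (1-based) or None if the detector never crosses the boundary."""
    m = len(training)
    mu_hat = training.mean()                  # least squares fit (sample mean)
    sigma_hat = training.std(ddof=1)          # variance estimate from training
    cusum = 0.0
    for k, obs in enumerate(stream, start=1):
        cusum += obs - mu_hat                 # accumulate residuals
        detector = abs(cusum) / sigma_hat     # D_m(k)
        boundary = c * np.sqrt(m) * (1 + k / m) * (k / (m + k)) ** gamma
        if detector >= boundary:
            return k                          # stop: evidence of a change
    return None                               # never stopped (open-ended)

rng = np.random.default_rng(3)
train = rng.normal(size=200)                  # stable historical sample
stream = np.concatenate([rng.normal(size=50),             # still stable ...
                         rng.normal(loc=2.0, size=300)])  # ... then mean shifts
tau = monitor(train, stream)
```

In a closed-ended version one would simply truncate the loop at $N$ and report $N+1$ when no crossing occurs, with the boundary (6.7) replacing (6.5) in the $\gamma = 1/2$ case.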
It is interesting to note that the square root boundary function (i.e. the standard deviation of the detector) leads to a Darling–Erdős type limit result in (6.8). For the sake of simplicity we considered linear models with not necessarily independent error terms. However, the sequential detection method can also be extended to nonlinear models along the lines of Berkes et al. (2004b). Resampling to obtain critical values is proposed in Hušková and Kirch (2012). Sequential monitoring for changes in the parameters of a linear model is considered in Aue et al. (2006a) and Černíková et al. (2013). Fourier coefficients are utilized in Hlávka et al. (2012). The paper of Chochola et al. (2013) contains an interesting application to finance.

7 Panel models

Panel models are very popular when short segments of several sets of observations are available. This is common when collectively examining the performance of companies, where usually yearly data are available for a large number of companies [cf. Bartram et al. (2012)]. Hence it is an important consideration in the case of panel data that $N$, the number of panels, might be much larger than $T$, the number of observations in any given panel. For a survey on panel data we refer to Hsiao (2007). To illustrate change point detection in panels, we consider a very simple model:
\[
X_{i,t}=\mu_i+\delta_i I\{t>t_0\}+\gamma_i\eta_t+e_{i,t},\qquad 1\le i\le N,\ 1\le t\le T. \tag{7.1}
\]
The $i$th panel has an initial mean $\mu_i$, which might change to $\mu_i+\delta_i$ if the change in the mean happens before time $T$, i.e. if the change occurs in the observation period. The panels are connected by the common factors $\eta_t$, and the effect of the common factors in the $i$th panel is measured by $\gamma_i$. We do not assume that $\eta_t$ is observed; it is used as an error term common to all panels. The errors $\{e_{i,t},\ 1\le t\le T\}$ are time series for each $i$, and it is assumed that $\{e_{i,t},\ 1\le t\le T,\ 1\le i\le N\}$ is independent of $\{\eta_t,\ 1\le t\le T\}$, or at least uncorrelated with it.
We wish to test the null hypothesis
\[
H_0:\ t_0\ge T \tag{7.2}
\]
against the alternative
\[
H_A:\ t_0<T. \tag{7.3}
\]
The model in (7.1) has a large number of parameters, but all of them are nuisance parameters with the exception of $t_0$. No statement is made about the nuisance parameters under $H_0$ or under $H_A$, and only $T$ observations are available for each of them. However, the possible time of change appears in all observations, so the number of observations which can be used to estimate $t_0$ is $NT$. The panel data approach has excellent performance if the statistical inference concerns parameters which are common to a large number of panels. The model (7.1) was introduced by Bai (2010) without the common factors; assuming that $H_A$ holds, the author estimated the time of change. In this section we discuss the CUSUM-based testing method of Horváth and Hušková (2012). Due to possible dependence between the observations in the $i$th panel, the CUSUM is normalized by
\[
\sigma_i^2=\lim_{T\to\infty}\frac{1}{T}E\left(\sum_{t=1}^{T}e_{i,t}\right)^2,\qquad 1\le i\le N.
\]
The CUSUM process in the panel model of (7.1) is defined by
\[
\bar V_{N,T}(x)=\frac{1}{N^{1/2}}\sum_{i=1}^{N}\left[\frac{Z_{T,i}^2(x)}{\sigma_i^2}-\frac{\lfloor Tx\rfloor(T-\lfloor Tx\rfloor)}{T^2}\right],
\]
where
\[
Z_{T,i}(x)=\frac{1}{T^{1/2}}\left(S_{T,i}(x)-\frac{\lfloor Tx\rfloor}{T}S_{T,i}(1)\right)\quad\text{with}\quad S_{T,i}(x)=\sum_{t=1}^{\lfloor Tx\rfloor}X_{i,t},\qquad 0\le x\le 1.
\]
By definition, the CUSUM process for the panels is the sum of the squared (and centered) CUSUM processes computed from the individual panels. Next we list the most important conditions which are used to establish the weak convergence of $\bar V_{N,T}(x)$:
\[
\text{for each } i \text{ the sequence } \{X_{i,t},\ -\infty<t<\infty\} \text{ is a linear process,} \tag{7.4}
\]
\[
\frac{N}{T^2}\to 0, \tag{7.5}
\]
\[
\{e_{i,t},\ 1\le i\le N,\ 1\le t\le T\}\ \text{and}\ \{\eta_t,\ 1\le t\le T\}\ \text{are independent,} \tag{7.6}
\]
\[
\frac{1}{T^{1/2}}\sum_{t=1}^{\lfloor Tx\rfloor}\eta_t\ \xrightarrow{D[0,1]}\ W(x),\ \text{where } W \text{ is a Wiener process,} \tag{7.7}
\]
\[
\text{for all } i \text{ we have } \gamma_i=\gamma_{i,N}=\frac{\zeta_i}{N^{\rho_i}}\ \text{with some } \rho_i>1/4,\ \text{where } \zeta_i\ne 0. \tag{7.8}
\]
Assuming that $H_0$, (7.4)–(7.8) and some additional moment conditions are satisfied, then
\[
\bar V_{N,T}(x)\ \xrightarrow{D[0,1]}\ \bar V(x), \tag{7.9}
\]
where $\bar V(x)$ is a Gaussian process with $E\bar V(x)=0$ and $E\bar V(x)\bar V(y)=2x^2(1-y)^2$, $0\le x\le y\le 1$. For the proof of (7.9) we refer to Horváth and Hušková (2012). The weak convergence in (7.9) should remain true if the independence in (7.6) is replaced with "uncorrelated". Assumption (7.7) is minor, since only the normality of the sum of the common factors is required. If (7.5) does not hold, then a drift term of order $N^{1/2}/T$ will appear. Also, if for all $i$ we have $\zeta_i\ne 0$ and $\rho_i\le 1/4$, then the weak convergence in (7.9) does not hold. It is an interesting and still unsolved problem to test $H_0$ in (7.2) when the correlation between the panels is large, i.e. when the loadings are large. It would also be important to find tests when $T$ is much smaller than $\sqrt N$. Theoretical considerations as well as Monte Carlo simulations show that in the panel data model very small changes in the mean can be detected. The long-run variances of the errors are usually unknown, so the $\sigma_i^2$'s in the definition of $\bar V_{N,T}$ must be replaced with suitable estimators. Bartlett-type kernel estimators were proposed in Horváth and Hušková (2012), and a simulation study in that paper supports the suggestion. A quasi-maximum likelihood argument leads to the self-normalized statistic
\[
E_{N,T}=\sup_{0<x<1}\frac{1}{N^{1/2}}\sum_{i=1}^{N}\left[\frac{T^2Z_{T,i}^2(x)}{\sigma_i^2\lfloor Tx\rfloor(T-\lfloor Tx\rfloor)}-1\right].
\]
A Darling–Erdős type result for $E_{N,T}$ under the null hypothesis was proven in Chan et al. (2013), but assuming much stronger conditions than (7.4)–(7.8). The estimator for $t_0$ in Bai (2010) is strongly related to $E_{N,T}$. Bai (2010) estimates the time of change with
\[
\hat t_{N,T}=T\operatorname{argmax}_x\sum_{i=1}^{N}\frac{Z_{T,i}^2(x)}{\sigma_i^2\lfloor Tx\rfloor(T-\lfloor Tx\rfloor)}
\]
and obtains the distribution of the normalized difference between $t_0$ and $\hat t_{N,T}$ assuming that $\gamma_i=0$ for all $1\le i\le N$, i.e. that the panels are independent.
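The panel CUSUM process is a short vectorized computation. The sketch below is a simplified illustration: the long-run variances $\sigma_i^2$ are replaced by plain per-panel sample variances (an i.i.d.-errors shortcut for the Bartlett-type estimators the method calls for), and the function name `panel_cusum` is our own.

```python
import numpy as np

def panel_cusum(x):
    """Panel CUSUM process: sum over panels of squared, centered and
    normalized within-panel CUSUMs, evaluated on the grid t/T.
    x is an N x T array (one row per panel)."""
    n, t = x.shape
    s = np.cumsum(x, axis=1)                     # partial sums S_{T,i}
    frac = np.arange(1, t + 1) / t               # floor(Tx)/T on the grid
    z = (s - frac * s[:, -1:]) / np.sqrt(t)      # bridge processes Z_{T,i}
    sigma2 = x.var(axis=1, ddof=1)[:, None]      # per-panel variance (shortcut
                                                 # for the long-run variance)
    centering = frac * (1 - frac)                # floor(Tx)(T - floor(Tx))/T^2
    return (z ** 2 / sigma2 - centering).sum(axis=0) / np.sqrt(n)

rng = np.random.default_rng(4)
n_panels, t_len = 100, 40
x = rng.normal(size=(n_panels, t_len))
x[:, 20:] += 0.5                                 # common mean shift at t0 = 20
v = panel_cusum(x)
stat = np.abs(v).max()
```

Note that the change per panel (0.5 of a standard deviation over only 40 observations) would be hard to detect in any single panel; aggregating over 100 panels is what makes it visible, in line with the remark that very small changes in the mean can be detected in the panel model.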
The more general case of $\gamma_i\ne 0$ for some $1\le i\le N$ is investigated in Horváth et al. (2013a). It has been observed [cf. Aue et al. (2009b) and Berkes et al. (2006)] that the CUSUM and quasi-maximum likelihood tests reject the null hypothesis of no change in the mean, even when the null hypothesis is correct, if the errors follow a random walk or there is long-range dependence between the errors. This phenomenon has also been investigated in panel data. Baltagi et al. (2012) consider the regression model
\[
y_{i,t}=\alpha+\delta_1 I\{t>t_0\}+(\beta+\delta_2 I\{t>t_0\})x_{i,t}+u_{i,t},\qquad 1\le i\le N,\ 1\le t\le T,
\]
where $(\delta_1,\delta_2)\ne(0,0)$. They assume that the sequences $\{x_{i,t},\ 1\le t\le T\}$ and $\{u_{i,t},\ 0\le t\le T\}$, $1\le i\le N$, are independent. This implies immediately that the panels are independent and that the regressors and the errors are independent in each panel. Furthermore, $Ex_{i,t}=Eu_{i,t}=0$,
\[
\text{for each } i,\ \{x_{i,t},\ 0\le t\le T\} \text{ is an AR(1) process with autoregressive parameter } \lambda, \tag{7.10}
\]
and
\[
\text{for each } i,\ \{u_{i,t},\ 0\le t\le T\} \text{ is an AR(1) process with autoregressive parameter } \rho. \tag{7.11}
\]
Stationarity means that both $\lambda$ and $\rho$ are in the interval $(-1,1)$, while nonstationarity means that $\lambda$ or $\rho$ or both are 1 (random walk). Let $\tilde t_{T,N}$ denote the least squares estimator for $t_0$ as defined in Baltagi et al. (2012), i.e. the value at which the ordinary least squares objective takes its smallest value with respect to all parameters, including the time of change. They obtained several limit results for the difference between $\tilde t_{T,N}$ and $t_0$. Assuming that $t_0=\lfloor T\tau_0\rfloor$, $0<\tau_0<1$, they show that $\tilde t_{N,T}/T\to\tau_0$ in probability even in the case when $|\rho|<1$ and $\lambda=1$. However, if $\rho=1$ and $|\lambda|<1$, or $\rho=\lambda=1$, then $\tilde t_{N,T}/T$ converges in distribution to a nondegenerate limit. Westerlund and Larsson (2012) pointed out that the common regression parameters in (7.10) and (7.11) restrict the applicability of the model. Following Im et al.
(2003), one possibility is that for each $i$ the AR(1) (autoregressive of order 1) processes $x_{i,t}$ and $u_{i,t}$ have their own autoregressive parameters. This means that the model has an additional $2N-2$ parameters. However, the statistical inference concerns only $t_0$, and the additional parameters are nuisance parameters. The discussion of the model in (7.1) suggests that testing whether $t_0\ge T$, or the estimation of $t_0$, is possible even if the number of nuisance parameters is large. The other possibility for weakening conditions (7.10) and (7.11) is to use the random coefficient panel model of Horváth and Trapani (2013), Ng (2008) and Westerlund and Larsson (2012). In the random coefficient approach, for each $i$, $1\le i\le N$, $x_{i,t}$ is an AR(1) process with parameter $\lambda_i$, where $E\lambda_i=\lambda$, and similarly the parameter of $u_{i,t}$ is a random variable $\rho_i$ with $E\rho_i=\rho$. The main result in Horváth and Trapani (2013) states that the statistical inference for the autoregression parameter is the same in the stationary, nonstationary and mixed cases. It is an interesting and possibly challenging question whether one could carry out statistical inference for $t_0$ even in nonstationary cases.

8 Functional observations

In this section we assume that the observations are functions defined on $[0,1]$. Many data may be considered naturally as curves. For example, temperatures observed several times on a given day can be considered as a discrete sample from an underlying daily temperature curve. Similarly, pollution levels or blood pressure measurements can be considered as realizations of densely observed curves. Stock prices change when the stock is traded, but in the case of frequently traded stocks this happens so often that using classical methods would be too high-dimensional. Also, trading might happen at different times on different days.
Introductions and thorough reviews of functional data methods are given in Cuevas (2014), Ferraty and Romain (2011) and Horváth and Kokoszka (2012). One of the first papers on change detection in functional data, Berkes et al. (2009b), was motivated by yearly temperature measurements, where one curve is constructed from the 365 daily observations of a year. Based on these curves we wish to test whether the mean yearly temperature curve has remained the same since the data collection started. Let the functional observations satisfy the model
\[
X_i(t)=\mu(t)+\delta(t)I\{i>k^*\}+\varepsilon_i(t),\qquad 1\le i\le N.
\]
We assume that $\|\delta\|\ne 0$, where $\|x\|=\bigl(\int_0^1 x^2(t)\,dt\bigr)^{1/2}$. We wish to test
\[
H_0:\ k^*\ge N
\]
against the alternative
\[
H_A:\ 1<k^*<N.
\]
We can test $H_0$ against $H_A$ using the functional version of the CUSUM process,
\[
S_N^{\circ}(x,t)=\frac{1}{N^{1/2}}\left(\sum_{i=1}^{\lfloor Nx\rfloor}X_i(t)-\frac{\lfloor Nx\rfloor}{N}\sum_{i=1}^{N}X_i(t)\right),\qquad 0\le x,t\le 1.
\]
It is clear that under $H_0$ the process $S_N^{\circ}(x,t)$ does not depend on the unknown $\mu(t)$. We assume that
\[
\varepsilon_1,\varepsilon_2,\ldots,\varepsilon_N \text{ are independent and identically distributed random functions} \tag{8.1}
\]
and
\[
E\varepsilon_i(t)=0\quad\text{and}\quad E\|\varepsilon_i\|^2<\infty. \tag{8.2}
\]
Let
\[
C(t,s)=E\varepsilon_i(t)\varepsilon_i(s)
\]
denote the covariance function of the errors. If (8.1) and (8.2) hold, then we can define a sequence of Gaussian processes $G_N^{\circ}(x,t)$ with $EG_N^{\circ}(x,t)=0$ and $EG_N^{\circ}(x,t)G_N^{\circ}(y,s)=(\min(x,y)-xy)C(t,s)$ such that
\[
\iint\bigl(S_N^{\circ}(x,t)-G_N^{\circ}(x,t)\bigr)^2\,dx\,dt=o_P(1),\quad\text{as } N\to\infty. \tag{8.3}
\]
The approximation in (8.3) follows immediately from the approximation of partial sums in Hilbert spaces. Namely, we can define a sequence of Gaussian processes $G_N(x,t)$ with $EG_N(x,t)=0$ and $EG_N(x,t)G_N(y,s)=\min(x,y)C(t,s)$ such that
\[
\iint\bigl(S_N(x,t)-G_N(x,t)\bigr)^2\,dx\,dt=o_P(1),\quad\text{as } N\to\infty, \tag{8.4}
\]
where
\[
S_N(x,t)=\frac{1}{N^{1/2}}\sum_{i=1}^{\lfloor Nx\rfloor}X_i(t). \tag{8.5}
\]
The proof of the result in (8.4) can be found in Horváth and Kokoszka (2012). By the spectral theorem we have
\[
C(t,s)=\sum_{i=1}^{\infty}\lambda_i\varphi_i(t)\varphi_i(s),
\]
where $\lambda_1\ge\lambda_2\ge\cdots$
$\ge 0$ and the orthonormal functions $\varphi_1(t),\varphi_2(t),\ldots$ satisfy
\[
\lambda_i\varphi_i(t)=\int C(t,s)\varphi_i(s)\,ds,\qquad 1\le i<\infty.
\]
Using the Karhunen–Loève expansion we conclude that
\[
\iint\bigl(G_N^{\circ}(x,t)\bigr)^2\,dt\,dx\ \stackrel{D}{=}\ \sum_{i=1}^{\infty}\lambda_i\int B_i^2(x)\,dx, \tag{8.6}
\]
where $B_1,B_2,\ldots$ are independent Brownian bridges. The limit distribution depends on the unknown $\lambda_i$'s, so we need to estimate them from the random sample. The covariance kernel $C$ can be estimated by
\[
\hat C_N(t,s)=\frac{1}{N}\sum_{i=1}^{N}\bigl(X_i(t)-\bar X_N(t)\bigr)\bigl(X_i(s)-\bar X_N(s)\bigr),\quad\text{where}\quad \bar X_N(t)=\frac{1}{N}\sum_{i=1}^{N}X_i(t).
\]
Under assumptions (8.1) and (8.2) we have that
\[
\|\hat C_N-C\|=o_P(1). \tag{8.7}
\]
Let $\hat\lambda_1\ge\hat\lambda_2\ge\hat\lambda_3\ge\cdots\ge 0$ and $\hat\varphi_1(t),\hat\varphi_2(t),\ldots$ be the empirical eigenvalues and corresponding eigenfunctions satisfying
\[
\hat\lambda_i\hat\varphi_i(t)=\int\hat C_N(t,s)\hat\varphi_i(s)\,ds.
\]
If $\lambda_1>\lambda_2>\cdots$, the relation in (8.7) implies [cf. Horváth and Kokoszka (2012)] that
\[
|\hat\lambda_i-\lambda_i|=o_P(1)\quad\text{and}\quad\|\hat\varphi_i-\varphi_i\|=o_P(1),
\]
and therefore the limit in (8.6) can be approximated with $\sum_{i=1}^{d}\hat\lambda_i\int B_i^2(t)\,dt$, assuming that $d$ is large enough. However, we still need to use Monte Carlo simulations to obtain critical values. A different approach in Berkes et al. (2009b) is based on projections. First we define the score vectors
\[
\hat{\boldsymbol\xi}_i=(\hat\xi_{i,1},\hat\xi_{i,2},\ldots,\hat\xi_{i,d})^T,\quad\text{with}\quad \hat\xi_{i,j}=\langle X_i,\hat\varphi_j\rangle,\ 1\le j\le d,
\]
where $\langle\cdot,\cdot\rangle$ denotes the inner product in the Hilbert space of square integrable functions on $[0,1]$. The CUSUM process is now defined as
\[
H_N(x)=\frac{1}{N}\sum_{j=1}^{d}\frac{1}{\hat\lambda_j}\left(\sum_{i=1}^{\lfloor Nx\rfloor}\hat\xi_{i,j}-x\sum_{i=1}^{N}\hat\xi_{i,j}\right)^2.
\]
It is proven in Berkes et al. (2009b) that under assumptions (8.1) and (8.2),
\[
H_N(x)\ \xrightarrow{\;D\;}\ \sum_{i=1}^{d}B_i^2(x), \tag{8.8}
\]
where $B_1,B_2,\ldots,B_d$ are independent Brownian bridges. The distributions of the supremum and integral functionals of the limit process in (8.8) are discussed in Sect. 4 of the present paper. Comparing (8.6) and (8.8), we see that the empirical projection method provides a simple, distribution-free procedure.
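On discretized curves, the projection procedure reduces to plain linear algebra: an eigendecomposition of the empirical covariance matrix stands in for the eigenfunctions, and inner products become Riemann sums. The sketch below is our own simplified imitation of this pipeline (not the fda-based analysis), with the grid-scaling conventions and the function name `projection_cusum` as assumptions.

```python
import numpy as np

def projection_cusum(curves, var_fraction=0.9):
    """Projection CUSUM H_N(x): project the curves onto the leading empirical
    principal components, then sum the normalized squared CUSUMs of the
    scores.  curves is an N x M array: N curves on M grid points of [0, 1]."""
    n, m = curves.shape
    centered = curves - curves.mean(axis=0)
    c_hat = centered.T @ centered / n               # empirical covariance kernel
    eigval, eigvec = np.linalg.eigh(c_hat)
    eigval, eigvec = eigval[::-1], eigvec[:, ::-1]  # descending order
    lam = eigval / m                                # Riemann-sum scaling, dt = 1/m
    # cumulative-variance choice of the number of components d
    d = int(np.searchsorted(np.cumsum(eigval) / eigval.sum(), var_fraction)) + 1
    # scores <X_i, phi_j> as Riemann sums; centering the curves leaves the
    # CUSUM unchanged since the sample mean cancels in the bridge
    scores = centered @ eigvec[:, :d] / np.sqrt(m)
    csum = np.cumsum(scores, axis=0)
    x_grid = np.arange(1, n + 1) / n
    h = ((csum - x_grid[:, None] * csum[-1]) ** 2 / lam[:d]).sum(axis=1) / n
    return h, d

rng = np.random.default_rng(5)
grid = np.linspace(0.0, 1.0, 101)
curves = np.sin(2 * np.pi * grid) + 0.5 * rng.normal(size=(80, 101))
curves[40:] += 0.75 * grid                          # mean curve shifts at k* = 40
h, d = projection_cusum(curves)
stat = h.max()
```

The supremum (or integral) of `h` would then be compared to critical values of the corresponding functional of $\sum_{i=1}^d B_i^2(x)$, which is what makes the procedure distribution-free.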
Using the supremum functional of $H_N(x)$ as a test statistic, the central England daily temperature data were segmented into six homogeneous parts in Berkes et al. (2009b).

Fig. 6 Plots of 5 yearly temperature curves for Atlantic City (NJ); the raw data are on the upper panel and the smoothed curves are on the lower panel

Example 4 The National Climatic Data Center collects, stores, and analyzes climatic data obtained from thousands of weather stations across the US. The data sets are published at their website www.ncdc.noaa.gov/cdo-web/, and many of them go back over 100 years. In this example, we consider the daily maximum temperature in Atlantic City (NJ) from January 1, 1874 to December 31, 2012. Only two observations in this data set of over 50,000 data points were missing, and we replaced them using linear interpolation between the adjacent data points. Due to the yearly periodicity in temperature, we divided the data set into 138 yearly observations, each of which consists of 365 or 366 observations. The plots of the first 5 years of data are displayed in the upper panel of Fig. 6. Since the underlying data points can be thought of as discrete observations of an underlying continuous yearly temperature curve, we consider the data to be functional in nature and proceed by approximating the data with continuous curves. Many techniques for creating functional data objects from a discrete collection of points have been implemented in the fda package within R; see Ramsay et al. (2009) for details. To create functional data objects from the temperature data, we used the fda package to approximately interpolate the data points using a B-spline basis with 50 basis functions. The smooth curves generated in this way are displayed in the lower panel of Fig. 6.
To obtain an approximate test, based on (8.8), of the hypothesis that the mean yearly temperature curve does not change over the observation period, we calculate $\int_0^1 H_{138}(x)\,dx$. We compare this value to the critical values of $\sum_{i=1}^{4}\int_0^1 B_i^2(x)\,dx$, which are tabulated in Kiefer (1959). The calculation of $H_{138}(x)$ requires the choice of $d$. The most common technique for choosing $d$ in practice is the cumulative variance approach: $d$ is chosen so that
\[
\sum_{i=1}^{d}\hat\lambda_i\Bigg/\sum_{i=1}^{N}\hat\lambda_i\approx v,
\]
where $v$ is a specified percentage. In our analysis we used the cumulative variance approach with $v=0.9$, which gave $d=4$. For a thorough account of the cumulative variance approach and of principal component analysis for functional data we refer to Horváth and Kokoszka (2012). The value of our test statistic is 4.817, which is larger than the 0.001 critical value in Kiefer (1959). Some basic results on functional time series models are summarized in Hörmann and Kokoszka (2010). An analog of the CUSUM process $H_N(x)$ is introduced and studied in Horváth et al. (2014) in the case of dependent observations. Their method is used to analyze stock returns. The most popular time series model is the functional autoregression due to Bosq (2000). A test for the stability of functional autoregressive processes is provided by Horváth et al. (2010). The model is used for prediction in Kargin and Onatski (2008). For some interesting applications to biological data we refer to Aston and Kirch (2012a,b). Models for nonlinear functional time series are presented in Hörmann et al. (2013) and Horváth et al. (2013b).

Acknowledgments We are grateful to Marie Hušková, Stefan Fremdt and the participants of the Time Series Seminar at the University of Utah for pointing out mistakes in earlier versions of this paper, and to Daniela Jarušková and Brad Hatch for some of the data sets.

References

Albin JMP, Jarušková D (2003) On a test statistic for linear trend.
Extremes 6:247–258
Andreou E, Ghysels E (2002) Detecting multiple breaks in financial market volatility dynamics. J Appl Econom 17:579–600
Andrews DWK (1993) Tests for parameter instability and structural change with unknown change point. Econometrica 61:821–856
Aston J, Kirch C (2012a) Evaluating stationarity via change-point alternatives with applications to fMRI data. Ann Appl Stat 6:1906–1948
Aston J, Kirch C (2012b) Detecting and estimating changes in dependent functional data. J Multivar Anal 109:204–220
Aue A, Horváth L (2004) Delay time in sequential detection of change. Stat Prob Lett 67:221–231
Aue A, Horváth L, Hušková M, Kokoszka P (2006a) Change-point monitoring in linear models with conditionally heteroscedastic errors. Econom J 9:373–403
Aue A, Berkes I, Horváth L (2006b) Strong approximation for the sums of squares of augmented GARCH sequences. Bernoulli 12:583–608
Aue A, Horváth L, Hušková M, Kokoszka P (2008a) Testing for changes in polynomial regression. Bernoulli 14:637–660
Aue A, Horváth L, Kokoszka P, Steinebach JG (2008b) Monitoring shifts in mean: asymptotic normality of stopping times. Test 17:515–530
Aue A, Horváth L, Hušková M (2009a) Extreme value theory for stochastic integrals of Legendre polynomials. J Multivar Anal 100:1029–1043
Aue A, Horváth L, Hušková M, Ling S (2009b) On distinguishing between random walk and changes in the mean alternatives. Econom Theory 25:411–441
Aue A, Hörmann S, Horváth L, Reimherr M (2009c) Break detection in the covariance structure of multivariate time series models. Ann Stat 37:4046–4087
Aue A, Horváth L, Hušková M (2012) Segmenting mean-nonstationary time series via trending regression. J Econom 168:367–381
Aue A, Horváth L (2013) Structural breaks in time series. J Time Ser Anal 34:1–16
Aue A, Dienes C, Fremdt S, Steinebach JG (2014) Reaction times of monitoring schemes for ARMA time series.
Bernoulli (to appear)
Bai J (1999) Likelihood ratio test for multiple structural changes. J Econom 91:299–323
Bai J (2010) Common breaks in means and variances for panel data. J Econom 157:78–92
Baltagi BH, Kao C, Liu L (2012) Estimation and identification of change points in panel models with nonstationary or stationary regressors and error terms. Preprint
Bartram SM, Brown G, Stulz RM (2012) Why are US stocks more volatile? J Finance 67:1329–1370
Batsidis A, Horváth L, Martín N, Pardo L, Zografos K (2013) Change-point detection in multinomial data using phi-divergence test statistics. J Multivar Anal 118:53–66
Berkes I, Philipp W (1977) An almost sure invariance principle for the empirical distribution function of mixing random variables. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 41:115–137
Berkes I, Philipp W (1979) Approximation theorems for independent and weakly dependent random vectors. Ann Prob 7:29–54
Berkes I, Horváth L (2001) Strong approximation of the empirical process of GARCH sequences. Ann Appl Prob 11:789–809
Berkes I, Horváth L (2002) Empirical processes of residuals. In: Dehling H, Mikosch T, Sorensen M (eds) Empirical process techniques for dependent data. Birkhäuser, Basel, pp 195–209
Berkes I, Horváth L (2003) Limit results for the empirical process of squared residuals in GARCH models. Stoch Process Appl 105:271–298
Berkes I, Horváth L, Kokoszka P (2004a) Testing for parameter constancy in GARCH(p, q) models. 70:263–273
Berkes I, Gombay E, Horváth L, Kokoszka P (2004b) Sequential change-point detection in GARCH(p, q) models. Econom Theory 20:1140–1167
Berkes I, Horváth L, Hušková M, Steinebach J (2004c) Applications of permutations to the simulation of critical values. J Nonparametr Stat 16:197–216
Berkes I, Horváth L, Kokoszka P, Shao Q-M (2006) On discriminating between long-range dependence and changes in the mean.
Ann Stat 34:1140–1165
Berkes I, Hörmann S, Horváth L (2008) The functional central limit theorem for a family of GARCH observations with applications. Stat Prob Lett 78:2725–2730
Berkes I, Hörmann S, Schauer J (2009a) Asymptotic results for the empirical process of stationary sequences. Stoch Process Appl 119:1298–1324
Berkes I, Gabrys R, Horváth L, Kokoszka P (2009b) Detecting changes in the mean of functional observations. J R Stat Soc Ser B 70:927–946
Berkes I, Gombay E, Horváth L (2009c) Testing for changes in the covariance structure of linear processes. J Stat Plan Inference 139:2044–2063
Berkes I, Liu W, Wu WB (2014) Komlós–Major–Tusnády approximation under dependence. Ann Prob 42:794–817
Billingsley P (1968) Convergence of probability measures. Wiley, New York
Blum JR, Kiefer J, Rosenblatt M (1961) Distribution free tests of independence based on the sample distribution function. Ann Math Stat 32:485–498
Bosq D (2000) Linear processes in function spaces. Springer, New York
Bradley RC (2007) Introduction to strong mixing conditions, vol 1–3. Kendrick Press, Heber City
Brockwell PJ, Davis RA (1991) Time series: theory and methods, 2nd edn. Springer, New York
Busetti F, Taylor AMR (2004) Tests of stationarity against a change in persistence. J Econom 123:33–66
Carrion-i-Silvestre JL, Kim D, Perron P (2009) GLS-based unit root tests with multiple structural breaks both under the null and the alternative hypotheses. Econom Theory 25:1754–1792
Černíková A, Hušková M, Prášková Z, Steinebach J (2013) Delay time in monitoring jump changes in linear models. Statistics 47:1–25
Chan J, Horváth L, Hušková M (2013) Darling–Erdős limit results for change-point detection in panel data. J Stat Plan Inference 143:955–970
Chochola O, Hušková M, Prášková Z, Steinebach JG (2013) Robust monitoring of CAPM portfolio betas. J Multivar Anal 115:374–396
Chu C-SJ, Stinchcombe M, White H (1996) Monitoring structural change.
Econometrica 64:1045–1065
Chung KL, Williams RJ (1983) Introduction to stochastic integration. Birkhäuser, Boston
Csáki E (1986) Some applications of the classical formula on ruin probabilities. J Stat Plan Inference 14:35–42
Csörgő M, Révész P (1981) Strong approximations in probability and statistics. Academic Press, New York
Csörgő M, Horváth L (1987) Nonparametric tests for the changepoint problem. J Stat Plan Inference 17:1–9
Csörgő M, Horváth L (1993) Weighted approximations in probability and statistics. Wiley, Chichester
Csörgő M, Horváth L (1997) Limit theorems in change-point analysis. Wiley, Chichester
Cuevas A (2014) A partial overview of the theory of statistics with functional data. J Stat Plan Inference 147:1–23
Darling DA, Erdős P (1956) A limit theorem for the maximum of normalized sums of independent random variables. Duke Math J 23:143–155
Davis RA, Huang D, Yao Y-C (1995) Testing for a change in the parameter values and order of an autoregressive model. Ann Stat 23:282–304
Davis RA, Lee TC, Rodriguez-Yam G (2008) Break detection for a class of nonlinear time series models. J Time Ser Anal 29:834–867
Dedecker J, Doukhan P, Lang G, León JR, Louhichi S, Prieur C (2007) Weak dependence with examples and applications. Lecture Notes in Statistics. Springer, Berlin
Dehling H, Fried R (2012) Asymptotic distribution of two-sample empirical U-quantiles with applications to robust tests for shifts in location. J Multivar Anal 105:124–140
Dominicy Y, Hörmann S, Ogata H, Veredas D (2013) On sample marginal quantiles for stationary processes. Stat Prob Lett 83:28–36
Doukhan P (1994) Mixing: properties and examples. Lecture Notes in Statistics, vol 85. Springer, Berlin
Dutta K, Sen PK (1971) On the Bahadur representation of sample quantiles in some stationary multivariate autoregressive processes. J Multivar Anal 1:186–198
Eberlein E (1986) On strong invariance principles under dependence assumptions.
Ann Prob 14:260–270
Ferraty F, Romain Y (2011) Oxford handbook of functional data analysis. Oxford University Press, Oxford
Francq C, Zakoïan J-M (2010) GARCH models. Wiley, Chichester
Fremdt S (2013) Page's sequential procedure for change-point detection in time series regression. http://arxiv.org/abs/1308.1237
Fremdt S (2014) Asymptotic distribution of delay time in Page's sequential procedure. J Stat Plan Inference 145:74–91
Gombay E (1994) Testing for change-points with rank and sign statistics. Stat Prob Lett 20:49–56
Gombay E, Horváth L (1996) On the rate of approximations for maximum likelihood tests in change-point models. J Multivar Anal 56:120–152
Gombay E, Horváth L, Hušková M (1996) Estimators and tests for change in variances. Stat Decis 14:145–159
Gombay E, Hušková M (1998) Rank based estimators of the change point. J Stat Plan Inference 67:137–154
Gombay E (2000) U-statistics for sequential change detection. Metrika 54:133–145
Gombay E (2001) U-statistics for change under alternative. J Multivar Anal 78:139–158
Hansen BE (1992) Tests for parameter instability in regression with I(1) processes. J Bus Econ Stat 10:321–335
Hansen BE (2000) Testing for structural change in conditional models. J Econom 97:93–115
Harvey DI, Leybourne SJ, Taylor AMR (2013) Testing for unit roots in the possible presence of multiple trend breaks using minimum Dickey–Fuller statistics. J Econom 177:265–284
Hidalgo J, Seo MH (2013) Testing for structural stability in the whole sample. J Econom 175:84–93
Hlávka Z, Hušková M, Kirch C, Meintanis S (2012) Monitoring changes in the error distribution of autoregressive models based on Fourier methods. TEST 21:605–634
Hörmann S, Kokoszka P (2010) Weakly dependent functional data. Ann Stat 38:1845–1884
Hörmann S, Horváth L, Reeder R (2013) A functional version of the ARCH model.
Econom Theory 29:267–288
Horn RA, Johnson CR (1991) Topics in matrix analysis. Cambridge University Press, Cambridge
Horváth L (1984) Strong approximation of renewal processes. Stoch Process Appl 18:127–138
Horváth L (1993) The maximum likelihood method for testing changes in the parameters of normal observations. Ann Stat 21:671–680
Horváth L (1995) Detecting changes in linear regressions. Statistics 26:189–208
Horváth L, Serbinowska M (1995) Testing for changes in multinomial observations: the Lindisfarne scribes problem. Scand J Stat 22:371–384
Horváth L, Hušková M, Serbinowska M (1997) Estimators for the time of change in linear models. Statistics 29:109–130
Horváth L, Kokoszka P, Steinebach J (1999) Testing for changes in multivariate dependent observations with an application to temperature changes. J Multivar Anal 68:96–119
Horváth L, Hušková M, Kokoszka P, Steinebach J (2004) Monitoring changes in linear models. J Stat Plan Inference 126:225–251
Horváth L, Hušková M (2005) Testing for changes using permutations of U-statistics. J Stat Plan Inference 128:351–371
Horváth L, Kokoszka P, Steinebach J (2007) On sequential detection of parameter changes in linear regression. Stat Prob Lett 77:885–895
Horváth L, Horváth Zs, Hušková M (2008) Ratio tests for change point detection. In: Beyond parametrics in interdisciplinary research. IMS Collections, 1:293–304
Horváth L, Hušková M, Kokoszka P (2010) Testing the stability of the functional autoregressive model. J Multivar Anal 101:352–367
Horváth L, Hušková M (2012) Change-point detection in panel data. J Time Ser Anal 33:631–648
Horváth L, Kokoszka P (2012) Inference for functional data with applications. Springer, New York
Horváth L, Hušková M, Wang J (2013a) Estimation of the time of change in panel data. Preprint
Horváth L, Kokoszka P, Reeder R (2013b) Estimation of the mean of functional time series and a two sample problem.
J R Stat Soc Ser B 75:103–122
Horváth L, Trapani L (2013) Statistical inference in a random coefficient panel model. Preprint
Horváth L, Kokoszka P, Rice G (2014) Testing stationarity of functional data. J Econom 179:66–82
Hsiao C (2007) Panel data analysis—advantages and challenges. TEST 16:1–22
Hsu DA (1979) Detecting shifts of parameter in gamma sequences with applications to stock prices and air traffic flow analysis. J Am Stat Assoc 74:31–40
Hušková M (1996) Estimation of a change in linear models. Stat Prob Lett 26:13–24
Hušková M (1997a) Limit theorems for rank statistics. Stat Prob Lett 32:45–55
Hušková M (1997b) Multivariate rank statistics processes and change point analysis. In: Applied statistical sciences III. Nova Science Publishers, New York, pp 83–96
Hušková M, Picek J (2005) Bootstrap in detection of changes in linear regression. Sankhya Ser B 67:1–27
Hušková M, Prášková Z, Steinebach J (2007) On the detection of changes in autoregressive time series I. Asymptotics. J Stat Plan Inference 137:1243–1259
Hušková M, Kirch C, Prášková Z, Steinebach J (2008) On the detection of changes in autoregressive time series II. Resampling procedures. J Stat Plan Inference 138:1697–1721
Hušková M, Kirch C (2012) Bootstrapping sequential change-point tests for linear regression. Metrika 75:673–708
Hušková M (2013) Robust change point analysis. In: Robustness and complex data structures. Springer, Berlin, pp 171–190
Iacone F, Leybourne SJ, Taylor AMR (2013) Testing for a break in trend when the order of integration is unknown. J Econom 176:30–45
Im KS, Pesaran MH, Shin Y (2003) Testing for unit roots in heterogeneous panels. J Econom 115:53–74
Inclán C, Tiao GC (1994) Use of cumulative sums of squares for retrospective detection of changes of variance.
J Am Stat Assoc 89:913–923
Jandhyala VK, MacNeill IB (1997) Iterated partial sum sequences of regression residuals and tests for changepoints with continuity constraints. J R Stat Soc Ser B 59:147–156
Jarušková D (1998) Testing appearance of linear trend. J Stat Plan Inference 70:263–276
Jarušková D (1999) Testing appearance of polynomial trend. Extremes 2:25–37
Kargin V, Onatski A (2008) Curve forecasting by functional autoregression. J Multivar Anal 99:2508–2526
Kiefer J (1959) K-sample analogues of the Kolmogorov–Smirnov and Cramér–von Mises tests. Ann Math Stat 30:420–447
Kim JY (2000) Detection of change in persistence of a linear time series. J Econom 95:97–116
Kim JY, Belaire-Franch J, Badillo AR (2002) Corrigendum to "Detection of change in persistence of a linear time series". J Econom 109:389–392
Kirch C, Steinebach J (2006) Permutation principles for the change analysis of stochastic processes under strong invariance. J Comput Appl Math 186:64–88
Kirch C (2007a) Resampling in the frequency domain of time series to determine critical values for change-point tests. Stat Decis 25:237–261
Kirch C (2007b) Block permutation principles for the change analysis of dependent data. J Stat Plan Inference 137:2453–2474
Kirch C, Politis DN (2011) TFT-bootstrap: resampling time series in the frequency domain to obtain replicates in the time domain. Ann Stat 39:1427–1470
Kirch C, Tadjuidje Kamgaing J (2012) Testing for parameter stability in nonlinear autoregressive models. J Time Ser Anal 33:365–385
Kokoszka P, Leipus R (2000) Change-point estimation in ARCH models. Bernoulli 6:513–539
Kuan C-M (1998) Tests for changes in models with a polynomial trend. J Econom 84:75–91
Lee S, Park S (2001) The CUSUM of squares test for scale changes in infinite order moving average processes. Scand J Stat 28:625–644
Liu W, Wu WB (2010) Asymptotics of spectral density estimates.
Econom Theory 26:1218–1245
Louhichi S (2000) Weak convergence for empirical processes of associated sequences. Annales de l'Institut Henri Poincaré Probabilités et Statistiques 36:547–567
Ng S (2008) A simple test for nonstationarity in mixed panels. J Bus Econ Stat 26:113–126
Oberhofer W, Haupt H (2005) The asymptotic distribution of the unconditional quantile estimator under dependence. Stat Prob Lett 73:243–250
Page ES (1954) Continuous inspection schemes. Biometrika 41:100–105
Page ES (1955) A test for a change in a parameter occurring at an unknown point. Biometrika 42:523–526
Quandt RE (1958) Tests of the hypothesis that a linear regression system obeys two separate regimes. J Am Stat Assoc 53:873–880
Quandt RE (1960) The estimation of the parameters of a linear regression system obeying two separate regimes. J Am Stat Assoc 55:324–330
Ramsay JO, Hooker G, Graves S (2009) Functional data analysis with R and MATLAB. Springer, New York
Sen PK (1968) Asymptotic normality of sample quantiles for m-dependent processes. Ann Math Stat 39:1724–1730
Shao Q-M, Yu H (1996) Weak convergence for weighted empirical processes of dependent sequences. Ann Prob 24:2098–2127
Shorack GR, Wellner JA (1986) Empirical processes with applications to statistics. Wiley, New York
Taniguchi M, Kakizawa Y (2000) Asymptotic theory of statistical inference for time series. Springer, New York
Westerlund J, Larsson R (2012) Testing for a unit root in a random coefficient panel data model. J Econom 167:254–273
Wied D, Krämer W, Dehling H (2012) Testing for a change in correlation at an unknown point in time using an extended functional delta method. Econom Theory 28:570–589
Wied D, Dehling H, van Kampen M, Vogel D (2014) A fluctuation test for constant Spearman's rho with nuisance-free limit distribution. Comput Stat Data Anal (to appear)
Wolfe DA, Schechtman E (1984) Nonparametric statistical procedures for the changepoint problem.
J Stat Plan Inference 9:389–396
Wright JH (1996) Structural stability tests in the linear regression model when the regressors have roots local to unity. Econ Lett 52:257–262
Wu WB (2005) On the Bahadur representation of sample quantiles for dependent sequences. Ann Stat 33:1934–1963
Yu H (1993) A Glivenko–Cantelli lemma and weak convergence for empirical processes of associated sequences. Prob Theory Relat Fields 95:357–370