Seasonal functional autoregressive models. Article in Journal of Time Series Analysis, June 2021. DOI: 10.1111/jtsa.12608.

Seasonal functional autoregressive models

Atefeh Zamani¹, Hossein Haghbin², Maryam Hashemi³, and Rob J Hyndman⁴

27 May 2020

Abstract

Functional autoregressive models are popular for functional time series analysis, but the standard formulation fails to address seasonal behavior in functional time series data. To overcome this shortcoming, we introduce seasonal functional autoregressive time series models. For the model of order one, we derive sufficient stationarity conditions and limiting behavior, and provide estimation and prediction methods. Moreover, we consider a portmanteau test for testing the adequacy of this model and we derive its asymptotic distribution. The merits of this model are demonstrated using simulation studies and via an application to real data.

Keywords: Functional time series analysis, seasonal functional autoregressive model, central limit theorem, prediction, estimation.

1 Introduction

Improved acquisition techniques make it possible to collect large amounts of high-dimensional data, including data that can be considered functional (Ramsay, 1982). Functional time series arise when such data are collected over time.
We are interested in functional time series which exhibit seasonality. The underlying functional process is denoted by f_t(x), where t = 1, ..., T indexes regularly spaced time and x is a continuous variable. A seasonal pattern exists when f_t(x) is influenced by seasonal factors (e.g., the quarter of the year, the month, or the day of the week). Usually seasonality is considered to have a fixed and known period. For example, consider satellite observations measuring the normalized difference vegetation index (NDVI) (He, 2018). These are often averaged to obtain monthly observations over the land surface. Here x denotes the two spatial dimensions, while t denotes the month. Seasonal patterns are present due to the natural annual patterns of vegetation variation.

¹ Department of Statistics, Faculty of Science, Shiraz University, Shiraz, Iran. ² Department of Statistics, Faculty of Science, Persian Gulf University, Boushehr, Iran. ³ Department of Statistics, University of Khansar, Khansar, Iran. ⁴ Department of Econometrics & Business Statistics, Monash University, Clayton VIC 3800, Australia. Email: Rob.Hyndman@monash.edu. Corresponding author.

Another example arises in demography, where f_t(x) is the mortality rate for people aged x at time t (Hyndman and Ullah, 2007). When such data are collected more frequently than annually, seasonality arises due to deaths being influenced by weather. In other applications, x may denote a second time variable. For example, Hörmann, Kokoszka, and Nisol (2018) study pollution data observed every 30 minutes. The long time series is sliced into separate functions, where x denotes the time of day and t denotes the day. A similar approach has been applied to the El Niño-Southern Oscillation (ENSO) (Besse, Cardot, and Stephenson, 2000), Eurodollar futures rates (Kargin and Onatski, 2008; Horváth, Kokoszka, and Reeder, 2013), electricity demand (Shang, 2013), and many other applications.
Although the term "functional data analysis" was coined by Ramsay (1982), the history of this area is much older, dating back to Grenander (1950) and Rao (1958). Functional data cannot be analyzed using classical statistical tools and need appropriate new techniques in order to be studied theoretically and computationally. A popular functional time series model is the functional autoregressive (FAR) process introduced by Bosq (2000), and further studied by Hörmann and Kokoszka (2010), Horváth, Hušková, and Kokoszka (2010), Horváth and Kokoszka (2012), Berkes, Horváth, and Rice (2013), Hörmann, Horváth, and Reeder (2013) and Aue, Norinho, and Hörmann (2015). Although these models are applied in the analysis of various functional time series, they cannot handle seasonality adequately. For example, even though highway traffic flow follows a weekly pattern, Klepsch, Klüppelberg, and Wei (2017) applied non-seasonal functional ARMA processes to model highway traffic data.

A popular model for seasonal univariate time series {X_1, ..., X_T} is the class of seasonal autoregressive processes, denoted SAR(P)_S, where S is the seasonal period and P is the order of the autoregression. These models satisfy

    X_t = φ_1 X_{t−S} + φ_2 X_{t−2S} + ⋯ + φ_P X_{t−PS} + ε_t,

where φ(x) = 1 − φ_1 x^S − φ_2 x^{2S} − ⋯ − φ_P x^{PS} is the seasonal characteristic polynomial and ε_t is independent of X_{t−1}, X_{t−2}, .... For stationarity, the roots of φ(x) = 0 must be greater than 1 in absolute value. This model is a special case of the AR(p) model of order p = PS, with nonzero φ-coefficients only at the seasonal lags S, 2S, 3S, ..., PS.

In this paper, we propose a class of seasonal functional AR models, analogous to seasonal autoregressive models. We present some notation and definitions in Section 2. Section 3 introduces the seasonal functional AR(1) model and discusses some of its properties.
Estimation of the parameters of this model is studied in Section 4. In Section 5, the portmanteau test for the seasonal functional AR(1) is presented and its asymptotic distribution is derived. The prediction problem is considered in Section 6, and Sections 7 and 8 are devoted to simulation studies and real data analysis. We conclude in Section 9.

2 Preliminary notations and definitions

Let H = L²([0, 1]) be the separable real Hilbert space of square integrable functions x : [0, 1] → R with the inner product ⟨x, y⟩ = ∫₀¹ x(t)y(t) dt and the norm ‖x‖ = (∫₀¹ x(t)² dt)^{1/2}. Let B denote the Borel field in H, and (Ω, F, P) stand for a probability space. A random function in H is an F/B measurable mapping from Ω into H. Additionally, let L(H) denote the space of all continuous linear operators from H to H, with operator norm ‖·‖_L. An important subspace of L(H) is the space of Hilbert-Schmidt operators, HS(H), which forms a Hilbert space equipped with the inner product ⟨A, B⟩_HS = ∑_{k=1}^∞ ⟨Aφ_k, Bφ_k⟩ and the norm ‖A‖_HS = (∑_{k=1}^∞ ‖Aφ_k‖²)^{1/2}, where {φ_k} is any orthonormal basis of H. The space of nuclear or trace class operators, N(H), is another notable subclass of HS(H), and the associated norm is defined as

    ‖A‖_N = ∑_{k=1}^∞ ⟨|A|φ_k, φ_k⟩ = ∑_{k=1}^∞ ⟨(A*A)^{1/2} φ_k, φ_k⟩,   (2.1)

where A* is the adjoint of A (Conway, 2000). If A is a self-adjoint nuclear operator with associated eigenvalues λ_k, then ‖A‖_N = ∑_{k=1}^∞ |λ_k|. If, in addition, A is non-negative, then ‖A‖_N = ∑_{k=1}^∞ ⟨Aφ_k, φ_k⟩ = ∑_{k=1}^∞ λ_k. Note that ‖·‖_L ≤ ‖·‖_HS ≤ ‖·‖_N (Hsing and Eubank, 2015).

For x and y in H, the tensorial product of x and y, x ⊗ y, is a nuclear operator defined as (x ⊗ y)z := ⟨y, z⟩x, z ∈ H. We point out some relations which are simple consequences of the definition of x ⊗ y and will be applied in the following sections (Conway, 2000):

    (x ⊗ y)* = y ⊗ x,   (Ax) ⊗ y = A(x ⊗ y),   x ⊗ (Ay) = (x ⊗ y)A*.
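In a finite-dimensional coordinate system, x ⊗ y is simply the outer product x y′, and the identities above can be verified numerically. The following is a small sanity check, not part of the paper; the dimension and seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
x, y, z = rng.normal(size=(3, d))
A = rng.normal(size=(d, d))

def tensor(u, v):
    """(u ⊗ v)w = <v, w> u, i.e. the outer product u v'."""
    return np.outer(u, v)

assert np.allclose(tensor(x, y) @ z, np.dot(y, z) * x)    # (x ⊗ y)z = <y, z> x
assert np.allclose(tensor(x, y).T, tensor(y, x))          # (x ⊗ y)* = y ⊗ x
assert np.allclose(tensor(A @ x, y), A @ tensor(x, y))    # (Ax) ⊗ y = A(x ⊗ y)
assert np.allclose(tensor(x, A @ y), tensor(x, y) @ A.T)  # x ⊗ (Ay) = (x ⊗ y)A*
```

In this picture the adjoint is the matrix transpose, which makes the four relations immediate.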
Let Z denote the set of integers. We define a functional discrete time stochastic process as a sequence of random functions in H, namely X = {X_t, t ∈ Z}. A random function X in H is said to be strongly second order if E‖X‖² < ∞. Similarly, a functional discrete time stochastic process X is said to be strongly second order if every X_t is strongly second order. Let L²_H(Ω, F, P) stand for the Hilbert space of all strongly second order random functions on the probability space (Ω, F, P). For the random function X_t, the mean function is denoted by µ_t := E(X_t) and is defined in terms of the Bochner integral. For any t, t′ ∈ Z, the auto-covariance operator is defined as

    C^X_{t,t′} := E[(X_t − µ_t) ⊗ (X_{t′} − µ_{t′})].

Besides, as an integral operator, C^X_{t,t′} can be represented as

    C^X_{t,t′} h(s) = ∫₀¹ C^X_{t,t′}(s, s′) h(s′) ds′,   s, s′ ∈ [0, 1] and t, t′ ∈ Z,

where C^X_{t,t′}(s, s′) := E[(X_t(s) − µ_t(s))(X_{t′}(s′) − µ_{t′}(s′))] is the corresponding auto-covariance kernel. When the process is stationary, we will denote C^X_{t,t′} as C^X_{t−t′} and, if no confusion arises, we will drop the superscript X.

As in any time series analysis, functional white noise processes are of great importance in functional time series analysis.

Definition 2.1. A sequence ε = {ε_t, t ∈ Z} of random functions is called functional white noise if
(i) 0 < E‖ε_t‖² = σ² < ∞, E(ε_t) = 0 and C^ε_{t,t} := C^ε do not depend on t;
(ii) ε_t is orthogonal to ε_{t′}, t, t′ ∈ Z, t ≠ t′; i.e., C^ε_{t,t′} = 0.

Gaussian random functions are of great importance in the analysis of functional data. The next definition, from Lord, Powell, and Shardlow (2014), introduces these random functions.

Definition 2.2. The random function X is called Gaussian with mean µ and covariance operator C, abbreviated N(µ, C), if for all x ∈ H, ⟨X, x⟩ is a real-valued Gaussian random variable with mean ⟨µ, x⟩ and variance ⟨C(x), x⟩.
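A concrete example of Definition 2.2 is standard Brownian motion on [0, 1], a Gaussian random function N(0, C) with covariance kernel C(s, t) = min(s, t), whose Karhunen-Loève eigenpairs are λ_k = ((k − 1/2)π)^{−2} and e_k(t) = √2 sin((k − 1/2)πt). The sketch below samples it by truncating that expansion; the truncation level and grid are illustrative choices, not from the paper:

```python
import numpy as np

def brownian_motion_kl(grid, n_terms=200, rng=None):
    """Sample a Gaussian random function N(0, C), C(s,t) = min(s,t), via the
    truncated Karhunen-Loeve expansion
    B(t) ~ sum_k xi_k * sqrt(2) * sin((k - 1/2) pi t) / ((k - 1/2) pi)."""
    rng = np.random.default_rng(rng)
    freq = (np.arange(1, n_terms + 1) - 0.5) * np.pi
    xi = rng.normal(size=n_terms)                  # i.i.d. N(0, 1) scores
    return (xi / freq) @ (np.sqrt(2.0) * np.sin(np.outer(freq, grid)))

grid = np.linspace(0.0, 1.0, 101)
path = brownian_motion_kl(grid, rng=0)             # one sample path, B(0) = 0
```

Each projection ⟨B, x⟩ is then a finite sum of independent Gaussians, matching the definition; the variance of B(1) approaches 1 as the truncation level grows.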
The following definitions will be applied in the subsequent sections.

Definition 2.3. Let X_n be a sequence of random functions. We say that X_n converges to X in L²_H(Ω, F, P) if E‖X_n − X‖² → 0, as n goes to infinity.

Definition 2.4. G is said to be an L-closed subspace (LCS, or hermetically closed subspace) of L²_H(Ω, F, P) if G is a Hilbertian subspace of L²_H(Ω, F, P) and, if X ∈ G and ℓ ∈ L, then ℓ(X) ∈ G. A zero-mean LCS is an L-closed subspace which contains only zero-mean random functions.

Moreover, let H^p = (L²([0, 1]))^p denote the product Hilbert space equipped with the inner product ⟨x, y⟩_p = ∑_{i=1}^p ⟨x_i, y_i⟩ and the norm ‖x‖_p = ⟨x, x⟩_p^{1/2}. We denote by L(H^p) the space of bounded linear operators on H^p.

3 The SFAR(1)_S model

Following the development of univariate seasonal autoregressions in Harrison (1965), Chatfield and Prothero (1973) and Box et al. (2015), we define the seasonal functional autoregressive process of order one as follows.

Definition 3.1. A sequence X = {X_t; t ∈ Z} of random functions is said to be a pure seasonal functional autoregressive process of order 1 with seasonality S (SFAR(1)_S) associated with (µ, ε, φ) if

    X_t − µ = φ(X_{t−S} − µ) + ε_t,   (3.1)

where ε = {ε_t; t ∈ Z} is a functional white noise process, µ ∈ H and φ ∈ L(H), with φ ≠ 0.

SFAR(1)_S processes can be studied from two different perspectives. From the first viewpoint, an SFAR(1)_S model is an FAR(S) model with most coefficients equal to zero. This point of view will be applied when dealing with the basic properties of these processes in the next subsection. From the other perspective, SFAR(1)_S processes are studied as a special case of an autoregressive time series of order one with values in the product Hilbert space H^S, which will be used when studying the limit theorems of such processes. For simplicity of notation, we take µ to be zero.
In this case, as there is no intercept, the unconditional mean of the process, regardless of the season, is equal to zero. However, the conditional mean given the past of X_t depends on S, since E(X_t | X_{t−1}, ...) = φX_{t−S} (Ghysels and Osborn, 2001).

3.1 Basic properties

In order to study the existence of the process X, the following assumption is required:

Assumption 1: There exists an integer j₀ ≥ 1 such that ‖φ^{j₀}‖_L < 1.

In the following, we will call an SFAR(1)_S a standard time series if µ = 0 and Assumption 1 holds. The following theorem, which deals with the existence and uniqueness of the SFAR(1)_S process, is a reformulation of Theorem 3.1 of Bosq (2000).

Theorem 3.1. Let X_t be a standard SFAR(1)_S time series. Then

    X_t = φX_{t−S} + ε_t   (3.2)

has a unique stationary solution given by

    X_t = ∑_{j=0}^∞ φ^j ε_{t−jS},   t ∈ Z,   (3.3)

where the series converges in L²_H(Ω, F, P) and with probability 1, and ε is the functional white noise process of X_t.

Proof. The proof is an extension of the proof of Theorem 3.1 of Bosq (2000) for the functional AR(1) and consequently is omitted.

In time series analysis, the auto-covariance operator is of great importance and plays a crucial role in the analysis of the data. The following theorem demonstrates the structure of the auto-covariance operator for SFAR(1)_S processes.

Theorem 3.2. If X is a standard SFAR(1)_S, the following relations hold:

    C^X_0 = C^X_S φ* + C^ε_0 = φC^X_0 φ* + C^ε_0,   (3.4)

    C^X_0 = ∑_{j=0}^∞ φ^j C^ε_0 φ*^j,   (3.5)

where C^X_S is the auto-covariance operator of X at lag S and the series converges in the ‖·‖_N sense. Besides, for t > t′,

    C^X_{t,t′} = φ^k C^X_0 if t − t′ = kS, and 0 otherwise,   (3.6)

and

    C^X_{t′,t} = C^X_0 φ*^k if t − t′ = kS, and 0 otherwise,   (3.7)

where k is some positive integer.

Proof. We have

    C^X_0 = E(X_t ⊗ X_t) = E(X_t ⊗ (φX_{t−S} + ε_t)),

which, using the properties of the tensorial product, results in C^X_0 = C^X_S φ* + C^ε_0.
Similarly,

    C^X_0 = E((φX_{t−S} + ε_t) ⊗ (φX_{t−S} + ε_t)),

which implies that C^X_0 = φC^X_0 φ* + C^ε_0. As demonstrated in the proof of Theorem 3.1, X_t = ∑_{j=0}^k φ^j ε_{t−Sj} + φ^{k+1} X_{t−(k+1)S}, which results in

    C^X_0 = E(X_t ⊗ X_t) = E[(∑_{j=0}^k φ^j ε_{t−Sj} + φ^{k+1} X_{t−(k+1)S}) ⊗ (∑_{j=0}^k φ^j ε_{t−Sj} + φ^{k+1} X_{t−(k+1)S})].

Consequently, C^X_0 = ∑_{j=0}^k φ^j C^ε_0 φ*^j + φ^{k+1} C^X_0 φ*^{k+1}. Since φ^{k+1} C^X_0 φ*^{k+1} is the auto-covariance operator of φ^{k+1} X_0, it is a nuclear operator and

    ‖φ^{k+1} C^X_0 φ*^{k+1}‖_N = ‖φ^{k+1} A^{1/2} A*^{1/2} φ*^{k+1}‖_N ≤ ‖φ^{k+1} A^{1/2}‖²_HS ≤ ‖φ^{k+1}‖²_L ‖A‖_HS,

where C^X_0 = A^{1/2} A*^{1/2}. Therefore, ‖C^X_0 − ∑_{j=0}^k φ^j C^ε_0 φ*^j‖_N ≤ ‖φ^{k+1}‖²_L ‖A‖_HS, which tends to zero as k goes to infinity, and results in C^X_0 = ∑_{j=0}^∞ φ^j C^ε_0 φ*^j.

Let t − t′ = kS, for some positive integer k. Using recursive calculation, we have C^X_{t,t′} = E[(φ^k X_{t′} + ⋯) ⊗ X_{t′}], which demonstrates that C^X_{t,t′} = φ^k C^X_0. Concerning (3.7), it suffices to write

    C^X_{t′,t} = (C^X_{t,t′})* = (φ^k C^X_0)* = C^X_0 φ*^k,

and the proof is completed.

Note 3.1. The auto-covariance operators characterize all the second-order dynamical properties of a time series and are the focus of time domain analysis of time series data. However, sometimes studying time series in the frequency domain can open new avenues in time series research. Here, we formulate the spectral density operator of {X_t}, based on Panaretos and Tavakoli (2013). If the auto-covariance operators satisfy ∑_{t∈Z} ‖C^X_t‖_N < ∞, then the spectral density operator of a stationary functional time series {X_t} at frequency λ (which is nuclear) is defined as

    F_λ = (1/2π) ∑_{t∈Z} e^{−iλt} C^X_t,   (3.8)

where λ ∈ R and the convergence holds in nuclear norm. Similarly, based on Theorem 3.2, for an SFAR(1)_S time series, if ∑_{t∈Z} ‖C^X_t‖_N < ∞, then the spectral density operator at frequency λ is

    F_λ = (1/2π) ∑_{t∈Z} e^{−iλt} C^X_t = (1/2π) { φC^X_0 φ* + C^ε_0 + ∑_{k=1}^∞ e^{−iλkS} φ^k C^X_0 + ∑_{k=1}^∞ e^{iλkS} C^X_0 φ*^k }.   (3.9)

Note 3.2.
Let ε_t = X_t − φX_{t−S}. It can be shown that

    C^ε_h = C^X_0 − φC^X_{−S} − C^X_S φ* + φC^X_0 φ*   if h = 0,
    C^ε_h = C^X_{kS} − φC^X_{(k−1)S} − C^X_{(k+1)S} φ* + φC^X_{kS} φ*   if h = kS,
    C^ε_h = 0   if h ≠ kS.   (3.10)

This result will be used in Section 5.

Suppose φ is a symmetric compact operator. In this case, φ admits the spectral decomposition

    φ = ∑_{j=1}^∞ α_j e_j ⊗ e_j,   (3.11)

where {e_j} is an orthonormal basis for H and {α_j} is a sequence of real numbers such that α_1 ≥ α_2 ≥ ⋯ ≥ 0 and lim_{j→∞} α_j = 0. If there exists j such that E⟨ε_0, e_j⟩² > 0, we can define the operator φ_ε, which connects ε and φ, as follows:

    φ_ε = ∑_{j=j_min}^∞ α_j e_j ⊗ e_j,   (3.12)

where j_min is the smallest j satisfying E⟨ε_0, e_j⟩² > 0. If such a j_min does not exist, we set φ_ε = 0.

Theorem 3.3. Let φ be a symmetric compact operator over H and ε be a functional white noise process. Then the equation X_t = φX_{t−S} + ε_t has a stationary solution with innovation ε if and only if ‖φ_ε‖_L < 1.

Proof. The proof is an easy consequence of Theorem 3.5, page 83, of Bosq (2000).

The following theorem is of great importance when dealing with portmanteau tests for SFAR(1)_S.

Theorem 3.4. Let φ = ∑_{j=1}^∞ α_j e_j ⊗ e_j be a symmetric compact operator on H. Then X_t is an SFAR(1)_S model if and only if ⟨X_t, e_k⟩ is a SAR(1)_S model.

Proof. Let X_t be a pure SFAR(1)_S process, where ε = {ε_t; t ∈ Z} is a functional white noise and φ ∈ L(H). Then, using the spectral decomposition and the symmetry of φ, it can be shown that

    ⟨X_t, e_k⟩ = α_k ⟨X_{t−S}, e_k⟩ + ⟨ε_t, e_k⟩,   k ≥ 1.   (3.13)

If α_k ≠ 0 and E⟨ε_t, e_k⟩² > 0, then ⟨X_t, e_k⟩ is a SAR(1)_S model. If α_k = 0, then ⟨X_t, e_k⟩ = ⟨ε_t, e_k⟩ and ⟨X_t, e_k⟩ is a degenerate AR(1). Conversely, by substituting equation (3.13) in ⟨X_t, x⟩ = ∑_{k=1}^∞ ⟨X_t, e_k⟩⟨x, e_k⟩ and applying the spectral decomposition, we have ⟨X_t, x⟩ = ⟨φX_{t−S} + ε_t, x⟩. Consequently, X_t = φX_{t−S} + ε_t.

3.2 Limit theorems

In this subsection, we focus on some limiting properties of the SFAR(1)_S model. Let us set Y_t = (X_t, ..., X_{t−S+1})′ and ε_t = (ε_t, 0, ..., 0)′, where 0 appears S − 1 times. Define the operator ρ on H^S as

        ⎛ 0  0  ⋯  0  φ ⎞
        ⎜ I  0  ⋯  0  0 ⎟
    ρ = ⎜ 0  I  ⋯  0  0 ⎟ ,   (3.14)
        ⎜ ⋮  ⋮  ⋱  ⋮  ⋮ ⎟
        ⎝ 0  0  ⋯  I  0 ⎠

where I denotes the identity operator on H. Consequently, we have the following simple but crucial lemma.

Lemma 3.1. If X is an SFAR(1)_S process associated with (ε, φ), then Y is an autoregressive time series of order one with values in the product Hilbert space H^S, associated with (ε, ρ); i.e.,

    Y_t = ρY_{t−1} + ε_t.   (3.15)

Proof. This lemma is an immediate consequence of Lemma 5.1 of Bosq (2000).

The next theorem gives the required condition for the existence and uniqueness of the process X based on the operator ρ.

Theorem 3.5. Let L(H^S) denote the space of bounded linear operators on H^S equipped with the norm ‖·‖_{L_S}. If

    ‖ρ^{j₀}‖_{L_S} < 1   for some j₀ ≥ 1,   (3.16)

then (3.1) has a unique stationary solution given by

    X_t = ∑_{j=0}^∞ πρ^j ε_{t−j},   t ∈ Z,   (3.17)

where the series converges in L²_{H^S}(Ω, F, P) and with probability 1, and π is the projector of H^S onto H, defined as π(x_1, ..., x_S) = x_1, (x_1, ..., x_S) ∈ H^S.

By the structure of ρ, it can be demonstrated that ρ^S = φI, where I denotes the identity operator on H^S. Consequently, the following lemma is clear.

Lemma 3.2. If ‖φ‖ < 1, then (3.16) holds.

Considering S_n = ∑_{j=1}^n X_j, we can state the law of large numbers for the process X.

Theorem 3.6. If X is a standard SFAR(1)_S then, as n → ∞,

    n^{1/4} S_n / ((log n)^β n) → 0,   β > 0.5.   (3.18)

Proof. The proof is an easy consequence of Theorem 5.6 of Bosq (2000).

The central limit theorem for SFAR(1)_S processes is established in the next theorem, which can be proved using Theorem 5.9 of Bosq (2000).

Theorem 3.7. Let X be a standard SFAR(1)_S associated with a functional white noise ε and such that I − φ is invertible. Then

    S_n / √n → N(0, Γ),   (3.19)

where

    Γ = (I − φ)^{−1} C^ε (I − φ*)^{−1}.   (3.20)
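The block structure of ρ in (3.14), and the identity ρ^S = φI underlying Lemma 3.2, can be checked in a finite-dimensional approximation H ≈ R^d. The sketch below is illustrative only; d, S, the seed and the matrix Phi are arbitrary choices, not from the paper:

```python
import numpy as np

d, S = 3, 4                              # dim of discretised H; seasonal period
rng = np.random.default_rng(2)
Phi = rng.normal(size=(d, d))
Phi *= 0.9 / np.linalg.norm(Phi, 2)      # scale so that ||Phi|| = 0.9 < 1

# rho: S x S block operator matrix, as in (3.14) -- phi in the top-right
# block and identity operators on the subdiagonal
rho = np.zeros((S * d, S * d))
rho[:d, -d:] = Phi
for i in range(1, S):
    rho[i * d:(i + 1) * d, (i - 1) * d:i * d] = np.eye(d)

# rho^S acts blockwise as phi: rho^S = diag(phi, ..., phi)
assert np.allclose(np.linalg.matrix_power(rho, S), np.kron(np.eye(S), Phi))
```

In this finite-dimensional picture, the eigenvalues of ρ are S-th roots of those of φ, so ‖φ‖ < 1 keeps the spectral radius of ρ below one, in line with Lemma 3.2.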
4 Parameter estimation

In this section, three methods are applied to estimate the parameter φ in model (3.1). These methods, namely the method of moments, unconditional least squares estimation and the Kargin-Onatski method, are studied briefly in the following subsections.

4.1 Method of moments

Identification of SFAR(1)_S takes place via estimation of the parameter φ. This parameter can be estimated using the classical method of moments. For this purpose, it can be shown that

    E(X_t ⊗ X_{t−S}) = E(φX_{t−S} ⊗ X_{t−S}) + E(ε_t ⊗ X_{t−S}),

or equivalently, by stationarity of SFAR(1)_S, C^X_S = φC^X_0 + C^{ε,X}_{t,t−S}. Since C^{ε,X}_{t,t−S} = 0, it can be concluded that

    C^X_S = φC^X_0.   (4.1)

The operator C^X_0 is compact and, consequently, it has the following spectral decomposition:

    C^X_0 = ∑_{m∈N} λ_m ν_m ⊗ ν_m,   ∑_m λ_m < ∞,   (4.2)

where (λ_m)_{m≥1} is the sequence of non-negative eigenvalues of C^X_0, with λ_1 ≥ λ_2 ≥ ⋯ ≥ 0, and (ν_m)_{m≥1} is the corresponding set of eigenfunctions, which forms a complete orthonormal system in H. Note that, if, for some k, λ_k > 0 and λ_{k+1} = 0, then X_t belongs to the k-dimensional space spanned by ν_m, m = 1, 2, ..., k, which is isometric to R^k. Therefore, in the following, and in order to reflect the infinite-dimensional nature of functional data, we consider the case where λ_m > 0 for all m ≥ 1 (Zhang, 2016).

From equation (4.1), we have C^X_S ν_j = φC^X_0 ν_j = λ_j φ(ν_j). Then, for any x ∈ H, this leads to the following representation:

    φ(x) = φ(∑_{j=1}^∞ ⟨x, ν_j⟩ ν_j) = ∑_{j=1}^∞ ⟨x, ν_j⟩ φ(ν_j) = ∑_{j=1}^∞ (C^X_S ν_j / λ_j) ⟨x, ν_j⟩,   (4.3)

which gives the core idea for the estimation of φ. Define the empirical auto-covariance operators at lags 0 and S, respectively, by Ĉ^X_0 = (1/n) ∑_{j=1}^n X_j ⊗ X_j and Ĉ^X_S = (1/n) ∑_{j=S}^n X_j ⊗ X_{j−S}. Let (λ̂_j, ν̂_j)_{1≤j≤n} be the eigenvalues and the corresponding eigenfunctions of Ĉ^X_0, where λ̂_1 ≥ λ̂_2 ≥ ⋯ ≥ λ̂_n ≥ 0.
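On a discretization grid, the empirical operators Ĉ^X_0 and Ĉ^X_S become matrices and the pairs (λ̂_j, ν̂_j) come from a symmetric eigendecomposition. A minimal sketch follows; the data layout and the omission of quadrature weights are simplifying assumptions for illustration:

```python
import numpy as np

def empirical_operators(X, S):
    """X: (n, d) array whose row t holds X_t evaluated on a grid of d points.
    Returns the (discretised) lag-0 and lag-S empirical auto-covariance
    matrices and the eigenpairs of C0_hat, eigenvalues in decreasing order."""
    n, d = X.shape
    C0 = X.T @ X / n               # (1/n) sum_j X_j ⊗ X_j
    CS = X[S:].T @ X[:-S] / n      # (1/n) sum_j X_j ⊗ X_{j-S}
    lam, nu = np.linalg.eigh(C0)   # ascending order for symmetric C0
    return C0, CS, lam[::-1], nu[:, ::-1]
```

From these quantities, the truncated estimator (4.4) below is assembled by keeping only the first k_n eigenpairs.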
Using equation (4.3), φ can be estimated via (λ̂_j, ν̂_j)_{1≤j≤k_n}, which leads to the truncated estimator

    φ̂(x) = ∑_{j=1}^{k_n} (Ĉ^X_S ν̂_j / λ̂_j) ⟨x, ν̂_j⟩ = (1/n) ∑_{i=S}^n ∑_{j=1}^{k_n} λ̂_j^{−1} ⟨X_{i−S}, ν̂_j⟩ ⟨x, ν̂_j⟩ X_i,   (4.4)

where k_n ≤ max{j : 1 ≤ j ≤ n, λ̂_j > 0}. The choice of k_n can be based on the cumulative percentage of total variance (CPV). Apart from the truncation method used in equation (4.4), there are other regularization methods that can be applied in the estimation of the parameter φ. For instance, one may alternatively consider the Tikhonov regularization given by

    φ̃(x) = (1/n) ∑_{i=S}^n ∑_{j=1}^n (λ̂_j / (λ̂_j² + a_n)) ⟨X_{i−S}, ν̂_j⟩ ⟨x, ν̂_j⟩ X_i,

or the penalization given by

    φ̌(x) = (1/n) ∑_{i=S}^n ∑_{j=1}^n (1 / (λ̂_j + a_n)) ⟨X_{i−S}, ν̂_j⟩ ⟨x, ν̂_j⟩ X_i,

with (a_n)_{n∈Z} being a positive sequence that converges to zero (Tikhonov and Arsenin, 1977; Groetsch, 1993; Zhang, 2016).

Following the same steps as Theorem 8.7 of Bosq (2000), it can be proved that, under certain technical assumptions, φ̂ is a strongly consistent estimator of φ. Hörmann and Kidziński (2015) showed that, even if some of the conditions assumed in Bosq (2000) are violated, φ̂ stays consistent. Moreover, Zhang (2016) presented assumptions which slightly generalize the results of Bosq (2000) by allowing the errors to be dependent. These assumptions are stated in Section 5.

Note 4.1. For a sequence of zero-mean SFAR(1)_S processes, the Yule-Walker equations can be obtained as

    C^X_h = φC^X_{h−S} if h = kS, and 0 otherwise,   (4.5)

and

    C^X_0 = C^X_S φ* + C^ε,   (4.6)

where k is some positive integer. If the auto-covariance operators are estimated using their empirical counterparts, the Yule-Walker equations can be applied for estimating the unknown parameters of the model.

4.2 Unconditional least squares estimation method

In this section we obtain the least squares estimator of the model parameter φ. Let C^X_0 be the auto-covariance operator of X_t, with (λ_j, ν_j) its eigenvalues and eigenfunctions.
The idea is that the functional data can be represented by their coordinates with respect to the functional principal components (FPCs) of X_t; e.g., X_{tk} = ⟨X_t, ν_k⟩ is the projection of the t-th observation onto the k-th largest FPC. Therefore,

    ⟨X_t, ν_k⟩ = ⟨φX_{t−S}, ν_k⟩ + ⟨ε_t, ν_k⟩
               = ∑_{j=1}^∞ ⟨X_{t−S}, ν_j⟩ ⟨φ(ν_j), ν_k⟩ + ⟨ε_t, ν_k⟩
               = ∑_{j=1}^p ⟨X_{t−S}, ν_j⟩ ⟨φ(ν_j), ν_k⟩ + δ_{tk},   (4.7)

where δ_{tk} = ⟨ε_t, ν_k⟩ + ∑_{j=p+1}^∞ ⟨X_{t−S}, ν_j⟩ ⟨φ(ν_j), ν_k⟩. Consequently, for any 1 ≤ k ≤ p, we have

    X_{tk} = ∑_{j=1}^p φ_{kj} X_{t−S,j} + δ_{tk},   (4.8)

where φ_{kj} = ⟨φ(ν_j), ν_k⟩. Note that the δ_{tk} are not i.i.d. Setting X_t = (X_{t1}, ..., X_{tp})′, δ_t = (δ_{t1}, ..., δ_{tp})′ and φ = (φ_{11}, ..., φ_{1p}, φ_{21}, ..., φ_{2p}, ..., φ_{p1}, ..., φ_{pp})′, we rewrite (4.8) as

    X_t = Z_{t−S} φ + δ_t,   t = 1, 2, ..., n,

where each Z_t is a p × p² matrix

          ⎛ X_t′  0_p′  ⋯  0_p′ ⎞
    Z_t = ⎜ 0_p′  X_t′  ⋯  0_p′ ⎟ ,   (4.9)
          ⎜  ⋮          ⋱   ⋮  ⎟
          ⎝ 0_p′  0_p′  ⋯  X_t′ ⎠

with 0_p = (0, ..., 0)′. Finally, defining the np × 1 vectors X and δ and the np × p² matrix Z by

    X = (X_1′, ..., X_n′)′,   δ = (δ_1′, ..., δ_n′)′,   Z = (Z_{1−S}′, ..., Z_{n−S}′)′,

we obtain the linear model

    X = Zφ + δ.   (4.10)

Representation (4.10) leads to the formal least squares estimator of φ:

    φ̃ = (Z′Z)^{−1} Z′X,   (4.11)

which can be computed if the eigenvalues and eigenfunctions of C^X_0 are known. However, in practical problems, since C^X_0 is estimated by its empirical counterpart, each ν_j must be replaced by its empirical counterpart ν̂_j. From φ̃, the unconditional least squares estimator of φ can be reconstructed. Following the approach of Kokoszka and Reimherr (2013), it can be proved that this is a consistent estimator of φ.

4.3 The Kargin-Onatski method

The Kargin-Onatski method, also known as the predictive factor method, was introduced by Kargin and Onatski (2008). The main idea of this method is to focus on estimating those linear functionals of the data that most reduce the expected squared error of prediction.
In fact, we would like to find an operator A, approximating φ, that minimizes E‖X_{t+1} − A(X_{t−S+1})‖². Using this method, we obtain a consistent estimator as follows. Let Ĉ_0 and Ĉ_S be the empirical auto-covariance operators at lags 0 and S, respectively; i.e.,

    Ĉ_0(x) = (1/n) ∑_{t=1}^n ⟨X_t, x⟩ X_t,   Ĉ_S(x) = (1/(n−S)) ∑_{t=S}^n ⟨X_t, x⟩ X_{t−S}.

Define Ĉ_α = Ĉ_0 + αI, where α is a positive real number. Let {υ_{α,i}, i = 1, 2, ..., k_n} be the k_n eigenfunctions of the operator Ĉ_α^{−1/2} Ĉ_S* Ĉ_S Ĉ_α^{−1/2} corresponding to the largest eigenvalues τ̂_{α,1} > ⋯ > τ̂_{α,k_n}. Based on these eigenfunctions, we construct the empirical estimator of φ, φ̂_{α,k_n}, as follows:

    φ̂_{α,k_n} = ∑_{i=1}^{k_n} Ĉ_α^{−1/2} υ_{α,i} ⊗ Ĉ_S Ĉ_α^{−1/2} υ_{α,i}.   (4.12)

It can be demonstrated that, if {α_n} is a sequence of positive real numbers such that n^{1/6} α_n → 0 as n goes to infinity, and {k_n} is any sequence of positive integers such that K ≤ n^{−1/4} k_n ≤ n for some K > 0, then φ̂_{α,k_n} is a consistent estimator of φ. Based on Kargin and Onatski (2008) and Horváth and Kokoszka (2012), in practical implementations k_n can be taken equal to the number of empirical FPCs and α_n = 0.75.

5 Diagnostic checking

In the Box-Jenkins methodology of time series analysis, an appropriate procedure for long-term forecasting consists of model identification, parameter estimation, diagnostic checking and forecasting. For functional time series, portmanteau tests of the independence of a sequence of functional observations were proposed by Gabrys and Kokoszka (2007) and Horváth, Hušková, and Rice (2013). They used the independent and identically distributed (i.i.d.) assumption to validate the distribution of the proposed test statistics. Zhang (2016) proposed a Cramér-von Mises type test based on the functional periodogram to test the uncorrelatedness of the residuals.
In functional time series, the effect of estimating the model parameters and replacing the unobservable innovations with their estimates is not asymptotically negligible, due to the dependence structure of these models. To overcome this problem, Zhang (2016) introduced a modified block bootstrap that takes the estimation effect into account. In order to check the adequacy of seasonal time series models, it is recommended to use tests at seasonal as well as nonseasonal lags, since testing only at nonseasonal lags ignores the possibility of autocorrelation at the seasonal lags S, 2S, ..., mS (Mahdi, 2016). Therefore, in this section, we present the seasonal version of the test introduced by Zhang (2016), which can be applied to check the adequacy of the fit of the SFAR(1)_S model at seasonal lags.

Let R_{t_1,...,t_{2k−1}}(s_1, ..., s_{2k−1}, s_{2k}) = cum(X_{t_1}(s_1), ..., X_{t_{2k−1}}(s_{2k−1}), X_0(s_{2k})) be the cumulant function, and let the 2k-th order cumulant operator be defined as

    R_{t_1,...,t_{2k−1}} g(s_1, ..., s_k) = ∫_{[0,1]^k} R_{t_1,...,t_{2k−1}}(s_1, ..., s_{2k−1}, s_{2k}) g(s_{k+1}, ..., s_{2k}) ds_{k+1} ⋯ ds_{2k}.   (5.1)

To establish the test statistic for diagnostic checking, we make the following assumptions.

Assumption 5.1: ‖φ‖_HS < 1 and ∑_{t_1,t_2,t_3} ‖R^ε_{t_1,t_2,t_3}‖_HS < ∞.

Besides, considering the auto-covariance operator C^X_0 and its spectral decomposition presented in equation (4.2), we assume:

Assumption 5.2: For all j, λ_j > 0.

Assumption 5.3: Let a_1 = 2√2 (λ_1 − λ_2)^{−1} and a_j = 2√2 max{(λ_{j−1} − λ_j)^{−1}, (λ_j − λ_{j+1})^{−1}} for j ≥ 2. Assume that n^{−1/4} ∑_{j=1}^{k_n} a_j/λ_j = o(1), λ_{k_n} n^{−1/2} ∑_{j=1}^{k_n} λ_j^{−2} = o(1), and ∑_{j≥k_n} ‖φ(ν_j)‖² = o(1/√n).

It can be shown that, under Assumptions 5.1-5.3, φ̂, defined in Section 4.1, is a consistent estimator for φ. Let ε̂_t = X_t − φ̂X_{t−S}, t = S, S+1, ..., n, with X_0 = 0.
Define V̂^ε̂_h = Ĉ^ε̂_h + Ĉ^ε̂_{−h}, with the corresponding kernel

    V̂^ε̂_h(s, s′) = (1/n) ∑_{t=h+1}^n [ε̂_t(s) ε̂_{t−h}(s′) + ε̂_t(s′) ε̂_{t−h}(s)].   (5.2)

As can be seen from equation (3.10), if the model fits the data adequately, we may expect zero autocovariance at non-seasonal lags. Consequently, the test statistic defined by Zhang (2016) is redefined as

    Q_ε̂ = (n/8π²) ∑_{k=1}^{[n/S]} ∫₀¹ ∫₀¹ [V̂^ε̂_{kS}(s, s′)]² ds ds′.   (5.3)

Let ψ_h(λ) = sin(hπλ)/(hπ) and (C^X_0)^{−1/2} = ∑_{j=1}^∞ λ_j^{−1/2} ν_j ⊗ ν_j. The following theorem determines the asymptotic behavior of Q_ε̂.

Theorem 5.1. Consider the standard SFAR(1)_S model with ε_t a sequence of i.i.d. random functions in L²_H(Ω, F, P). If Assumptions 5.1-5.3 are fulfilled and

    ‖(C^X_0)^{−1/2} φ^{kS−1} (C^ε_0)² φ*^{kS−1} (C^X_0)^{−1/2}‖_N < ∞,   (5.4)

uniformly for all k = 1, ..., [n/S], we have

    √(n/2) ∑_{k=1}^{[n/S]} V̂^ε̂_{kS}(s, s′) ψ_{kS}(λ) ⇒ g(λ, s, s′),   (5.5)

and

    Q_ε̂ → ∫₀¹ ∫₀¹ ∫₀¹ g²(λ, s, s′) ds ds′ dλ,   (5.6)

where ⇒ denotes weak convergence and g(λ, s, s′) stands for a zero-mean Gaussian random process such that

    Cov(g(λ, s, s′), g(λ′, s_1, s_1′)) = (1/4) ∑_{k,k′=1}^∞ ψ_{kS}(λ) ψ_{k′S}(λ′) × Cov(ξ_{0,kS}(s, s′) + ξ_{0,kS}(s′, s), ξ_{0,k′S}(s, s′) + ξ_{0,k′S}(s′, s)),   (5.7)

and ξ_{i,j}(s, s′) = ε_i(s) ε_{i−j}(s′) − ε_i(s) [C^ε_0 φ*^{j−1} (C^X_0)^{−1} (X_{i−1})](s′).

6 Prediction

In this section, we present the h-step functional best linear predictor (FBLP) X̂_{n+h} of X_{n+h} based on X_1, ..., X_n. For this purpose, we follow the method of Bosq (2014). In the final section, this prediction method is compared with the Hyndman-Ullah method and multivariate predictors.

Let X_n = (X_1, X_2, ..., X_n)′. It can be demonstrated that the L-closed subspace generated by X_n, called G, is the closure of {ℓ′X_n; ℓ′ ∈ L(H^n, H)} (see Bosq, 2000, Theorem 1.8 for the proof). The h-step FBLP of X_{n+h} is defined as the projection of X_{n+h} onto G; i.e., X̂_{n+h} = P_G X_{n+h}.
The following proposition, a modified version of Proposition 2.2 of Bosq (2014), presents the necessary and sufficient condition for the existence of the FBLP in terms of bounded linear operators and determines its form.

Proposition 6.1. For h ∈ N the following statements are equivalent:
(i) There exists ℓ₀ ∈ L(H^n, H) such that C_{X_n, X_{n+h}} = ℓ₀ C_{X_n}.
(ii) P_G X_{n+h} = ℓ₀ X_n for some ℓ₀ ∈ L(H^n, H).

Based on this proposition, to determine the FBLP of X_{n+h}, it is required to find ℓ₀ ∈ L(H^n, H) such that C_{X_n, X_{n+h}} = ℓ₀ C_{X_n}. Let h = aS + c > 0, with a ≥ 0 and 0 < c ≤ S. Then, for any x = (x_1, ..., x_n) ∈ H^n, we have:

    C_{X_n, X_{n+h}}(x) = E(⟨X_n, x⟩_{H^n} X_{n+h})
                        = E(⟨X_n, x⟩_{H^n} (φ^{a+1} X_{n−S+c} + ∑_{k=0}^a φ^k ε_{n−(k−a)S+c}))
                        = E(⟨X_n, x⟩_{H^n} φ^{a+1} X_{n−S+c})
                        = E(⟨X_n, x⟩_{H^n} Φ^{a+1}_{n−S+c} X_n)
                        = Φ^{a+1}_{n−S+c} E(⟨X_n, x⟩_{H^n} X_n)
                        = Φ^{a+1}_{n−S+c} C_{X_n}(x),

where Φ^i_j is in L(H^n, H) and is defined as an n × 1 vector of zeros with φ^i in the j-th position. Consequently, X̂_{n+h} = P_G X_{n+h} = Φ^{a+1}_{n−S+c} X_n = φ^{a+1} X_{n−S+c}.

Note 6.1. Based on the Kargin-Onatski estimation method, the 1-step-ahead predictor of X_{n+1} is

    X̂_{n+1} = ∑_{i=1}^{k_n} ⟨X_{n−S+1}, ẑ_{α,i}⟩ Ĉ_S(ẑ_{α,i}),   (6.1)

where

    ẑ_{α,i} = ∑_{j=1}^q τ̂_j^{−1/2} ⟨υ_{α,i}, ν̂_j⟩ ν̂_j + αυ_{α,i}.   (6.2)

The method depends on a selection of q and k_n. We selected q by the cumulative variance method and set k_n = q.

7 Simulation results

Let {X_t} follow an SFAR(1)_S model, i.e.,

    X_t(τ) = φX_{t−S}(τ) + ε_t(τ),   t = 1, ..., n,   (7.1)

where φ is an integral operator with parabolic kernel k_φ(τ, u) = γ₀ (2 − (2τ − 1)² − (2u − 1)²). The value of γ₀ is chosen such that ‖φ‖_S = (∫₀¹ ∫₀¹ k_φ²(τ, u) dτ du)^{1/2} = 0.1, 0.5, 0.9. Moreover, the white noise terms ε_t(τ) are taken to be independent standard Brownian motion processes on [0, 1] with variance 0.05.
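One trajectory from model (7.1) can be simulated on a grid as in the following sketch; the grid size, burn-in length and seed are illustrative choices, γ₀ is obtained by normalising the parabolic kernel to the target ‖φ‖_S, and interpreting "variance 0.05" as a multiplicative scaling of standard Brownian motion is an assumption:

```python
import numpy as np

def simulate_sfar1(n, S, norm_phi, d=51, sigma2=0.05, rng=None):
    """Simulate X_t = phi X_{t-S} + eps_t of model (7.1) on a grid of d
    points, with phi the integral operator with parabolic kernel and
    eps_t a Brownian motion scaled by sqrt(sigma2)."""
    rng = np.random.default_rng(rng)
    tau = np.linspace(0.0, 1.0, d)
    K = 2.0 - (2 * tau[:, None] - 1) ** 2 - (2 * tau[None, :] - 1) ** 2
    K *= norm_phi / np.sqrt((K ** 2).mean())  # normalise so ||k_phi|| = norm_phi
    h = 1.0 / (d - 1)                         # quadrature step of the integral
    burn = 10 * S                             # burn-in towards stationarity
    X = np.zeros((n + burn, d))
    for t in range(S, n + burn):
        inc = rng.normal(0.0, np.sqrt(h), d)
        inc[0] = 0.0                          # Brownian motion starts at 0
        eps = np.sqrt(sigma2) * np.cumsum(inc)
        X[t] = h * (K @ X[t - S]) + eps       # (phi X_{t-S})(tau) + eps_t(tau)
    return X[burn:]

X = simulate_sfar1(n=200, S=7, norm_phi=0.9, rng=0)
```

Since ‖φ‖_S < 1 here, the simulated trajectories remain stable, consistent with the stationarity conditions of Section 3.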
Then $B = 1000$ trajectories are simulated from Model (7.1), and the model parameter is estimated using the method of moments (MME), unconditional least squares (ULSE), and Kargin-Onatski (KOE) estimators. As a measure of the adequacy of the fit, the root mean squared error (RMSE), defined as
$$\mathrm{RMSE} = \sqrt{\frac{1}{B}\sum_{i=1}^{B}\|\hat\phi_i - \phi\|_S^2}, \tag{7.2}$$
is applied. The RMSE results are reported in Table 1. Comparing the results in this table, MME outperforms the other methods when $\|\phi\|_S = 0.5, 0.9$, while ULSE and KOE have better RMSE when $\|\phi\|_S = 0.1$. Generally, and as expected, the RMSE of all methods improves as the sample size $n$ gets larger.

To better understand these methods, a trajectory of length $n = 200$ from Model (7.1) is generated with $\|\phi\|_S = 0.9$ and $k_n = 1$. The three estimation methods are then compared on the generated path, and the results are shown in Figure 1. The corresponding MME results for $k_n = 1, 2, 3$ are shown in Figure 2. As can be seen in these figures, the method of moments and unconditional least squares perform better than the Kargin-Onatski method. Moreover, increasing $k_n$ in the estimation methods decreases the accuracy of estimation.
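As a rough illustration of a moment-based estimate and of the RMSE in (7.2), the sketch below works on a discretized grid: it forms the empirical covariance and lag-$S$ autocovariance operators, inverts $\hat C_0$ on its leading $k_n$ eigenfunctions (the usual regularization of this ill-posed inverse), and evaluates $\|\hat\phi - \phi\|_S$ by quadrature. This is our own schematic reconstruction, not the authors' implementation, and the noise is plain Gaussian white noise rather than Brownian motion.

```python
import numpy as np

def hs_norm(K):
    """Hilbert-Schmidt norm of an integral operator from its grid kernel."""
    return float(np.sqrt(np.mean(K ** 2)))

def mme_kernel(X, S, k_n):
    """Moment-type kernel estimate: φ̂ = Ĉ_S Ĉ_0^{-1}, with Ĉ_0 inverted
    on its leading k_n eigenpairs.  X is an (n, m) array of curves."""
    n, m = X.shape
    Xc = X - X.mean(axis=0)
    C0 = Xc.T @ Xc / (n * m)                  # operator matrix of Ĉ_0
    CS = Xc[S:].T @ Xc[:-S] / ((n - S) * m)   # operator matrix of Ĉ_S
    w, V = np.linalg.eigh(C0)
    lead = np.argsort(w)[::-1][:k_n]          # leading k_n eigenpairs
    C0_inv = (V[:, lead] / w[lead]) @ V[:, lead].T
    return m * (CS @ C0_inv)                  # kernel values on the grid

# RMSE of (7.2) over a few replications of a small simulated model
rng = np.random.default_rng(1)
m, S, n, B = 32, 4, 300, 20
u = (np.arange(m) + 0.5) / m
K_true = 2 - (2 * u[:, None] - 1) ** 2 - (2 * u[None, :] - 1) ** 2
K_true *= 0.9 / hs_norm(K_true)

sq_errs = []
for _ in range(B):
    X = np.zeros((n, m))
    for t in range(S, n):
        X[t] = K_true @ X[t - S] / m + rng.normal(0.0, 0.2, m)
    sq_errs.append(hs_norm(mme_kernel(X, S, k_n=2) - K_true) ** 2)
rmse = float(np.sqrt(np.mean(sq_errs)))
```

The truncation level $k_n$ plays the same role as in the paper: a larger $k_n$ divides by ever smaller eigenvalues of $\hat C_0$, which is what makes the estimates deteriorate as $k_n$ grows.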
                ‖φ‖_S = 0.1                 ‖φ‖_S = 0.5                 ‖φ‖_S = 0.9
 n    k_n   MME     ULSE    KOE      MME     ULSE    KOE      MME     ULSE    KOE
 50    1    0.1750  0.1645  0.0951   0.2403  0.2838  0.3716   0.1986  0.2323  0.5096
       2    0.5484  0.5189  0.0959   0.4931  0.7000  0.3720   0.4387  1.0381  0.5099
       3    1.0239  0.9657  0.0961   0.9988  1.1478  0.3721   1.0435  1.7282  0.5099
       4    1.5573  1.4934  0.0962   1.5340  1.6513  0.3721   1.6382  2.4725  0.5099
 100   1    0.1222  0.1183  0.0861   0.2050  0.2579  0.3539   0.1387  0.1709  0.4134
       2    0.3662  0.3598  0.0866   0.3325  0.6087  0.3541   0.2743  0.9728  0.4136
       3    0.6830  0.6798  0.0868   0.6661  0.8723  0.3541   0.6694  1.3925  0.4136
       4    1.0645  1.0243  0.0868   1.0245  1.1973  0.3542   1.0193  1.9377  0.4136
 150   1    0.1033  0.1027  0.0825   0.1946  0.2505  0.3460   0.1205  0.1517  0.3735
       2    0.2903  0.2900  0.0830   0.2666  0.5704  0.3462   0.2149  0.9478  0.3737
       3    0.5533  0.5449  0.0831   0.5387  0.7601  0.3462   0.5272  1.2493  0.3736
       4    0.8560  0.8237  0.0831   0.8256  1.0040  0.3462   0.8106  1.6683  0.3736
 200   1    0.0917  0.0935  0.0798   0.1879  0.2457  0.3393   0.1114  0.1419  0.3496
       2    0.2490  0.2610  0.0803   0.2285  0.5568  0.3394   0.1818  0.9411  0.3497
       3    0.4790  0.4745  0.0804   0.4684  0.7047  0.3394   0.4542  1.1896  0.3497
       4    0.7438  0.7127  0.0804   0.7199  0.9042  0.3394   0.7040  1.5134  0.3497

Table 1: The RMSE of the parameter estimators.

Figure 1: Comparison of all estimation methods when n = 200, ‖φ‖_S = 0.9, and k_n = 1.

Figure 2: Method-of-moments estimation of the model parameter kernel for n = 200, ‖φ‖_S = 0.9, and k_n = 1, 2, 3.

Figure 3: Square root of functional pedestrian hourly counts at Flagstaff Street Station, by day of the week (panels Monday through Sunday).

8 Application to pedestrian traffic

In this section, we study pedestrian behavior in Melbourne, Australia. The data are counts of pedestrians captured at hourly intervals by 43 sensors scattered around the city.
The dataset can shed light on people's daily rhythms, and assist the city administration and local businesses with event planning and operational management. Our dataset consists of measurements at Flagstaff Street Station from 00:00 1 Jan 2016 to 23:00 31 Dec 2016.¹ In Figure 3, we show the pedestrian behavior by day of the week; the counts are converted to functional objects using 11 Fourier basis functions. Although Figure 3 shows the existence of a weekly pattern in the data, the different pattern on weekends casts doubt on the stationarity of the data. Therefore, we exclude weekends from the analysis and apply the SFAR(1)₇ model to the remaining data. The number of basis functions is chosen using a generalized cross-validation criterion. Figure 4 shows the estimate of the autocorrelation kernel using the different estimation methods, as described in Section 4.

One-day-ahead forecasts were computed using a rolling forecast origin throughout the data set. The root mean squared error (RMSE) and mean absolute error (MAE) of the forecasts are tabulated in Table 2, comparing the MME and ULSE estimation methods. The table also includes the seasonal naive (SN) forecast for comparison. As can be seen in this table, the best predictions come with small values of k_n (similar to the results in the last section) and using MME. Using this method of prediction, the one-day-ahead forecasts are shown in Figure ??, along with the functional pedestrian data.

¹The data are available using the rwalkr package (Wang, 2019) in R.

Figure 4: The estimated kernel of the autocorrelation operator using the MME and ULSE methods.
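The rolling-origin evaluation described above can be mimicked as follows. This sketch is our own (helper names are hypothetical, and simulated seasonal curves stand in for the pedestrian counts): at each forecast origin it refits a moment-type kernel estimate, forms the one-step SFAR forecast $\hat\phi X_{t-S}$, and accumulates MAE and RMSE alongside the seasonal naive forecast $X_{t-S}$.

```python
import numpy as np

def fit_kernel(X, S, k_n=1):
    """Moment-type kernel estimate on the grid (schematic regularized inverse)."""
    n, m = X.shape
    Xc = X - X.mean(axis=0)
    C0 = Xc.T @ Xc / (n * m)
    CS = Xc[S:].T @ Xc[:-S] / ((n - S) * m)
    w, V = np.linalg.eigh(C0)
    lead = np.argsort(w)[::-1][:k_n]
    return m * (CS @ ((V[:, lead] / w[lead]) @ V[:, lead].T))

def rolling_one_step(X, S, min_train):
    """Rolling forecast origin: refit at each origin t, forecast X_t by
    K̂ X_{t-S} / m, and compare with the seasonal naive forecast X_{t-S}."""
    n, m = X.shape
    e_mod, e_sn = [], []
    for t in range(min_train, n):
        K_hat = fit_kernel(X[:t], S)
        e_mod.append(X[t] - K_hat @ X[t - S] / m)
        e_sn.append(X[t] - X[t - S])
    e_mod, e_sn = np.array(e_mod), np.array(e_sn)
    stats = lambda e: (float(np.mean(np.abs(e))), float(np.sqrt(np.mean(e ** 2))))
    return stats(e_mod), stats(e_sn)   # (MAE, RMSE) for the model and for SN

# Demo on simulated weekly-seasonal curves (a stand-in for the real data)
rng = np.random.default_rng(7)
m, S, n = 24, 7, 120
season = np.sin(2 * np.pi * (np.arange(m) + 0.5) / m)
X = np.zeros((n, m))
for t in range(n):
    base = 0.8 * X[t - S] if t >= S else season
    X[t] = base + 0.1 * season + rng.normal(0.0, 0.1, m)

(mae_m, rmse_m), (mae_sn, rmse_sn) = rolling_one_step(X, S, min_train=60)
```

Refitting at every origin is what the rolling evaluation requires; for long series the fit could instead be updated incrementally, but the simple refit keeps the sketch transparent.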
            MAE                RMSE
 k_n    MME      ULSE     MME      ULSE
 1      3.04     3.025    6.08     6.073
 2      135.19   24.55    138.09   24.70
 3      139.46   27.65    142.09   27.88
 4      155.52   30.07    157.62   30.41
 5      165.34   32.60    168.06   33.23
 6      180.86   33.20    186.92   34.01
 SN     3.67              6.46

Table 2: The MAE and RMSE of the 1-step predictor when the autocorrelation operator is estimated by the MME and ULSE methods.

9 Conclusion

In this paper, we have focused on seasonal functional autoregressive time series. We have presented conditions for the existence of a unique stationary and causal solution, and we have derived some basic properties and limiting behavior. For the SFAR(1)_S model, we proposed three estimation methods, namely the method of moments, unconditional least squares, and the Kargin-Onatski method. Furthermore, we have presented a test for diagnostic checking and, for arbitrary h ∈ N, we have investigated the h-step predictor. The performance of the estimation methods is compared in a simulation study, which demonstrates that MME has the best performance among the estimation methods considered. Finally, a real data set is analyzed using the SFAR(1)_S model.

References

Aue, A, DD Norinho, and S Hörmann (2015). On the prediction of stationary functional time series. Journal of the American Statistical Association 110(509), 378–392.
Berkes, I, L Horváth, and G Rice (2013). Weak invariance principles for sums of dependent random functions. Stochastic Processes and their Applications 123(2), 385–403.
Besse, PC, H Cardot, and DB Stephenson (2000). Autoregressive forecasting of some functional climatic variations. Scandinavian Journal of Statistics 27(4), 673–687.
Bosq, D (2000). Linear processes in function spaces: theory and applications. Vol. 149. Lecture Notes in Statistics. Springer Science & Business Media.
Bosq, D (2014). Computing the best linear predictor in a Hilbert space. Applications to general ARMAH processes. Journal of Multivariate Analysis 124, 436–450.
Box, GE, GM Jenkins, GC Reinsel, and GM Ljung (2015).
Time series analysis: forecasting and control. John Wiley & Sons.
Chatfield, C and D Prothero (1973). Box-Jenkins seasonal forecasting: problems in a case-study. Journal of the Royal Statistical Society, Series A, 295–336.
Conway, JB (2000). A course in operator theory. American Mathematical Society.
Gabrys, R and P Kokoszka (2007). Portmanteau test of independence for functional observations. Journal of the American Statistical Association 102(480), 1338–1348.
Ghysels, E and DR Osborn (2001). The econometric analysis of seasonal time series. Cambridge University Press.
Grenander, U (1950). Stochastic processes and statistical inference. Arkiv för matematik 1(3), 195–277.
Groetsch, CW (1993). Inverse problems in the mathematical sciences. Vol. 52. Springer.
Harrison, PJ (1965). Short-term sales forecasting. Applied Statistics, 102–139.
He, F (2018). "Statistical Modeling of CO2 Flux Data". PhD thesis. The University of Western Ontario.
Hörmann, S, L Horváth, and R Reeder (2013). A functional version of the ARCH model. Econometric Theory 29(2), 267–288.
Hörmann, S and Ł Kidziński (2015). A note on estimation in Hilbertian linear models. Scandinavian Journal of Statistics 42(1), 43–62.
Hörmann, S, P Kokoszka, et al. (2010). Weakly dependent functional data. The Annals of Statistics 38(3), 1845–1884.
Hörmann, S, P Kokoszka, G Nisol, et al. (2018). Testing for periodicity in functional time series. The Annals of Statistics 46(6A), 2960–2984.
Horváth, L, M Hušková, and P Kokoszka (2010). Testing the stability of the functional autoregressive process. Journal of Multivariate Analysis 101(2), 352–367.
Horváth, L, M Hušková, and G Rice (2013). Test of independence for functional data. Journal of Multivariate Analysis 117, 100–119.
Horváth, L and P Kokoszka (2012). Inference for functional data with applications. Vol. 200. Springer Science & Business Media.
Horváth, L, P Kokoszka, and R Reeder (2013).
Estimation of the mean of functional time series and a two-sample problem. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 75(1), 103–122.
Hsing, T and R Eubank (2015). Theoretical foundations of functional data analysis, with an introduction to linear operators. John Wiley & Sons.
Hyndman, RJ and S Ullah (2007). Robust forecasting of mortality and fertility rates: A functional data approach. Computational Statistics & Data Analysis 51(10), 4942–4956.
Kargin, V and A Onatski (2008). Curve forecasting by functional autoregression. Journal of Multivariate Analysis 99(10), 2508–2526.
Klepsch, J, C Klüppelberg, and T Wei (2017). Prediction of functional ARMA processes with an application to traffic data. Econometrics and Statistics 1, 128–149.
Kokoszka, P and M Reimherr (2013). Determining the order of the functional autoregressive model. Journal of Time Series Analysis 34(1), 116–129.
Lord, GJ, CE Powell, and T Shardlow (2014). An introduction to computational stochastic PDEs. Vol. 50. Cambridge University Press.
Mahdi, E (2016). Portmanteau test statistics for seasonal serial correlation in time series models. SpringerPlus 5(1), 1485.
Panaretos, VM and S Tavakoli (2013). Fourier analysis of stationary time series in function space. The Annals of Statistics 41(2), 568–603.
Ramsay, J (1982). When the data are functions. Psychometrika 47(4), 379–396.
Rao, CR (1958). Some statistical methods for comparison of growth curves. Biometrics 14(1), 1–17.
Shang, HL (2013). Functional time series approach for forecasting very short-term electricity demand. Journal of Applied Statistics 40(1), 152–168.
Tikhonov, AN and VY Arsenin (1977). Solutions of ill-posed problems. New York, 1–30.
Wang, E (2019). rwalkr: API to Melbourne Pedestrian Data. R package version 0.4.0. URL: http://pkg.earo.me/rwalkr.
Zhang, X (2016). White noise testing and model diagnostic checking for functional time series. Journal of Econometrics 194(1), 76–95.