Convolutional Autoregressive Models for Functional Time Series Xialu Liu, Han Xiao, and Rong Chen Rutgers University Abstract Functional data analysis has became an increasingly popular class of problems in statistical research. However, functional data observed over time with serial dependence remains a less studied area. Motivated by Bosq (2000), who first introduced the functional autoregressive (FAR) models, we propose a convolutional functional autoregressive (CFAR) model, where the function at time t is a result of the sum of convolutions of the past functions with a set of convolution functions, plus a noise process, mimicking the autoregressive process. It provides an intuitive and direct interpretation of the dynamics of a stochastic process. Instead of spectral decomposition approach commonly used in functional data analysis, we adopt a sieve estimation procedure based on B-spline approximation of the convolution functions. We establish convergence rate of the proposed estimator, and investigate its theoretical properties. The model building, model validation, and prediction procedures are also developed. Both simulated and real data examples are presented. KEYWORDS: Functional time series; Nonparametric methods; Spline methods; Sieve estimation. 1 Introduction Functional data analysis has received much attention in the last few decades, and has been widely applied in many fields, including medical science (Houghton et al., 1980; Gasser et al., 1984; Ratcliffe et al., 2002a; Ratcliffe et al., 2002b), behavioral science (Keselman and Keselman, 1993), Xialu Liu is Ph.D. student, Department of Statistics, Rutgers University, Piscataway, NJ 08854. Email: xialu.liu@rutgers.edu. Han Xiao is Assistant Professor, Department of Statistics, Rutgers University, Piscataway, NJ 08854. E-mail: hxiao@stat.rutgers.edu. Rong Chen is Professor, Department of Statistics, Rutgers University, Piscataway, NJ 08854. E-mail: rongchen@stat.rutgers.edu. Rong Chen is the corresponding author. 1 and economics (Roberts, 1995; Diebold and Li, 2006). Nonparametric methods, such as spline methods (Silverman, 1994; Brumback and Rice, 1998; Zhou, et al., 1998; Cai et al., 2000) and kernel smoothing (Nadaraya, 1964; Watson, 1964; Gasser et al., 1984; Fan and Gijbels, 1996), were often implemented to analyze functional data. Unsupervised learning methods, such as principal component analysis (James et al., 2000) and clustering analysis (James and Sugar, 2003) were extended for functional data. Books by Ramsay and Silverman (2005), Ferraty and Vieu (2006), Horváth and Kokoszka (2012), provide a comprehensive introduction on functional data analysis. Often, a variety of functional data is observed over time and has serial dependence. For example, in financial industry, the implied volatility of an option as a function of moneyness changes over time. In insurance industry, age-specific mortality rate as a function of age changes over time. In banking industry, term structure of interest rates (yield as a function of time to maturity of a bond) changes over time. In meteorology, daily records of temperature, precipitation and cloud cover for a region, viewed as three related functional surfaces, change over time. 
Time series analysis, designed to explore the underlying dynamics of the data, is well studied and understood, with modern development in nonlinear (Tong and Lam, 1980; Chan, 1993), nonparametric (Chen and Tsay, 1993a; Chen and Tsay, 1993b; Härdle et al., 1997; Xia and Li, 1999; Cai et al., 2000; Fan and Yao, 2003), multivariate (Tiao and Tsay, 1983; Tiao and Tsay, 1989; Lüetkepohl, 2005) and spatial-temporal modelling (Handcock and Wallis, 1994; Cressie and Huang 1999; Gneiting, 2002). However, these scalar or vector time series models cannot be applied directly to functional data. Bosq (2000) first introduced functional autoregressive (FAR) models with order p, Xt where X ∆1Xt1 . . . ∆p Xtp εt , pXt, t P Zq and ε pεt, t P Zq are a sequence of random functions and a functional white noise process respectively, and ∆i , is a linear operator in Hilbert functional space H. The linear operators can be estimated by performing functional principal component analysis on the sample autocovariance operators. The consistency of such estimators has been proved (Bosq, 2000; Hörmann et al. 2013). All the theoretical and empirical results in the literature have been developed based on the models and methods in Bosq (2000), including Hörmann and Kokoszka (2010), Horváth et al. (2010), Aue et al. (2012), Horváth et al. (2012), Berkes et al. (2013), and Hörmann et al. (2013). So far functional time series data is still a less studied area. Sieve methods are a popular set of tools to estimate parameters in an infinite-dimensional space, including the functional space. It optimizes an objective function over a sequence of finite2 dimensional parameter subspaces, which are called sieve spaces. Hence, the problem is reduced to a sequence of parametric ones. The consistency of the sequence of the sieve estimators can be derived, under certain conditions of sieve spaces. Sieve methods have been discussed in Chen and Shen (1998), Chen (2008), Halberstam and Richert (2013). In this article, we develop a new class of functional time series models called the convolutional functional autoregressive (CFAR) models, along with its associated estimation procedure using splines and sieve methods. As a special case of Bosq (2000), our model provides an intuitive and direct interpretation of the dynamics of a stochastic process. It assumes that the function at time t is a result of the sum of convolutions of the past functions with convolution functions plus a noise process, mimicking the autoregressive process commonly used in scalar time series. It is also an extension of vector autoregressive process. The paper makes contributions to the literature in three aspects. First, we establish the convergence rate of the convolution function estimator in a general case, while Bosq (2000) only considered consistency. Second, in reality functional data are often observed at discrete points, hence nonparametric methods such as splines method are used to obtain a continuous curve. The possible estimation error introduced by these methods was overlooked by Hörmann and Kokoszka (2010), Horváth et al. (2010), Horváth et al. (2012), Hörmann et al. (2013). Under the CFAR model settings, we consider the recovery procedure of functional data, when studying asymptotic properties of the estimators. 
It can be shown that the estimation error due to discrete observation points is of smaller order, hence as long as the number of observations at each time goes to infinity, the estimator is consistent and the convergence rate does not depend on the number of discrete samples available directly. Thirdly, we develop model building, model validation and prediction procedures for CFAR models, while the picture of FAR models is less complete due to lack of specific model assumptions. The rest of the paper is organized as follows. In Sections 2 and 3, detailed model setting and statistical inference procedures are introduced. The asymptotic properties are presented in Section 4. Simulation results are presented in Section 5 and a real example is analyzed in Section 6. All proofs are shown in the Appendix. 3 2 Convolutional Functional Autoregressive Models We introduce some notations first. For any vector µ, pµqi denotes its i-th entry. For any matrix H, pHqij denotes its pi, j q-th entry. Let Liph r1, 1s Lipr2 tf P r1, 1s : |f px δq f pxq| ¤ M δh, M 8u, h r1, 1s tf P C r r1, 1s : f prq P Liphr1, 1s, r P N, h P p0, 1su, P Lipζ2 r1, 1s, then ζ is called moduli of smoothness of f pq, which measures the smoothness of f pq. If f Without loss of generality, in this paper, we restrict ourselves only to consider functional time series in the space L2 r0, 1s. If a function f : Ω Ñ Rr0,1s is defined on a probability space pΩ, F, P q, and f P L2r0, 1s, then f is called a random function on r0, 1s. The space containing all the random functions on r0, 1s is denoted by Hr0, 1s, when L-2 norm is applied. Definition 1. A sequence of random functions ε pεt, t P Zq in Hr0, 1s is said to be a white noise, if E}εt}22 σ2 8, Epεtq 0, and Covpεtps1q, εtps2qq s1 , s2 P r0, 1s. 2) Covpεt ps1 q, εt h ps2 qq 0, for h 0 and s1 , s2 P r0, 1s. 1) 0 does not depend on t for any pXt, t P Zq in Hr0, 1s is (weakly) stationary, if the mean and covariance functions do not vary with time, i.e., for @h P Z, s1 , s2 P r0, 1s, Definition 2. A sequence of random functions X EpXt ps1 qq µps1 q, CovpXn and A sequence of random functions X h ps1q, Xm hps2qq CovpXnps1q, Xmps2qq. pXt, t P Zq in Hr0, 1s is called a convolutional functional autoregressive model with order p, CFAR(p), if Xt psq p »1 ¸ i 1 0 φi ps uqXti puqdu εt psq, s P r0, 1s, (1) P L2r1, 1s for i 1, . . . , p, are called convolution functions, and ε pεt, t P Zq are i.i.d. Ornstein-Uhlenbeck processes defined on r0, 1s, following the stochastic differential equation, dεt psq ρεt psqds σdWs , ρ ¡ 0 with Ws being a Wiener process. Remark 1. The skeleton of Xt pq, excluding the noise process, defined as ft pq, is the sum of where φi convolutions of φi and Xti . It can also be viewed as the sum of non-normalized smoothed 4 versions of the p past functions tXt1 , . . . , Xtp u. From a pointwise view, ft psq is a weighted sum of tXt1 , . . . , Xtp u, and the weights φi ps uq depend on the distance between s and u. Remark 2. We assume the noise process in model (1) is an Ornstein-Uhlenbeck process, which is spatially stationary, Markovian and continuous but not differentiable with the following properties: εt ps1 q N p0, σ2 q, 2ρ Corrpεt ps1 q, εt ps2 qq eρ|s1 s2 | , @s1, s2 P r0, 1s. The variance is constant, and the correlation of the process at s1 and s2 is determined by the distance of s1 and s2 . If all the convolution functions tφi pq, i 1, . . . , pu are continuous, Xt is also continuous, but not differentiable. The convolution functions tφi pq, i 1, . . . 
, pu determine the pattern of Xtpq process. Here we show three examples to illustrate the impact of the convolution functions. Example 1. Convolutional functional autoregressive model of order 1, CFAR(1), Xt psq ρ 5, σ 2 »1 0 φps uqXt1 puqdu εt psq, s P r0, 1s, (2) 10. We consider three convolution functions: (i) φpsq 1, s P r1, 1s, and X0 pq 0; (ii) φpsq 1 |s|, s P r1, 1s, and X0 pq 10; (iii) φpsq I ps ¡ 0q, s P r1, 1s, and X0 pq 10. Simulated processes of functions for each φpq are shown in the top, middle and bottom panel 1, 2, 3, 100. The solid lines and dashed lines are Xtpq and ftpq respectively. In case(i), since φpq is a constant function, ft pq is simply the average of Xt1 pq hence a constant function. In case(ii), φpq is a unimodal function around 0. We start the process with a constant function, X0 pq 10, and f1 psq is larger when s is around 0.5. In case (iii), φpq is an indicator function on p0, 1s, so the skeleton of ft psq would be a partial integration of Xt1 pq on the left of s in r0, ss. At s 0, Xt p0q contains no information of Xt1 , but only noise; as s increases, the weight functions φps q increases and information carried by Xt psq on Xt1 pq increases as well; at s 1, Xt p1q is the integration of the function Xt1 pq in the entire range of r0, 1s plus noise. It is worth noting that in case(i), we start at X0 0, but Xtpq gets explosive as time goes by. On the other hand, although we start at a large value, X0 psq 10 for both cases of (ii) and (iii), the processes become close to 0 when t 100. This is because, the process in case(i) of Figure 1, respectively, for t 5 0.6 0.8 1.0 0.8 1.0 0.8 1.0 2 1 f[99, ] −3 0.6 0.8 1.0 0.0 Figure 1: Plots of Xt pq and ft pq, t 0.6 0.8 1.0 0.4 0.6 0.8 1.0 0.8 1.0 8 10 f[99, ] 6 8 10 f[4, ] 0.4 0.2 t=100 0 0.2 1.0 8 10 f[99, ] 0.4 −4 0.0 0.8 0 0.2 6 8 10 4 2 f[3, ] 1.0 0.6 −4 0.0 t=3 0 0.8 0.4 6 8 10 0.6 −4 0.6 0.2 t=100 4 f[4, ] 0.4 6 8 10 6 4 2 0 0.4 0.0 0 0.2 t=2 −4 0.2 1.0 −4 0.0 t=1 0.0 0.8 0 0.6 0.6 −4 0.4 0.4 6 8 10 4 f[3, ] 2 0 −4 0.2 0.2 t=3 6 8 10 6 4 2 0 −4 0.0 −5 0.0 t=2 4 0.4 2 0.2 −1 0 1 f[4, ] −3 −5 0.0 4 1.0 2 0.8 2 0.6 t=1 4 0.4 2 0.2 −1 0 1 −5 −3 f[3, ] −1 0 1 −1 0 −3 −5 0.0 t=100 2 t=3 2 t=2 2 t=1 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 1, 2, 3, 100 for Example 1(i) (top panel), Example 1(ii) (middle panel) and Example 1(iii) (bottom panel). is nonstationary, while the processes in case(ii) and case(iii) are stationary, and their stationary means are a constant function at 0. Theorem 1 presents a sufficient condition for the stationarity of CFAR(1) models. Theorem 1. The CFAR(1) process X κ sup pXt, t P Zq, defined in (1) is (weakly) stationary, if » 1 ¤¤ 0 s 1 1{2 φ ps uqdu 2 0 1. (3) Here weak stationarity is acturally equivalent to strong stationarity, since we assume that the noise process is Gaussian. It is easy to see that }φ}22 1 is also a sufficient condition for ³1 stationarity. However, the condition in Theorem 1 is a weaker one, since for any s, 0 φ2 ps uqdu ¤ ³1 2 ³1 2 1 φ puqdu. Note that 0 φ ps uqdu is the sum of squares of the convolution weights of Xt1 pq to obtain ft psq. Theorem 2 presents a sufficient condition for stationarity of CFAR(p) models. Theorem 2. The CFAR(p) process X pXt, t P Zq defined in (1) is (weakly) stationary, if all the roots of the characteristic function 1 κ1 z κ2 z 2 . . . κp z p 6 0, (4) are outside the unit circle, where κi b³ 1 sup0¤s¤1 0 φ2i ps uqdu for i 1 . . . , p. 
Condition in Theorem 2 is similar to the sufficient condition for scalar AR(p) models to be stationary, replacing the AR coefficients by the maximum of the norm of the weights function φi ps q, s P r0, 1s. Corollary 1. The CFAR(1) process X on r0, 1s2 : Ψ1 ps, uq φps uq, pXt, t P Zq satisfies (3). Define a sequence of functions Ψ` ps, uq Then Xt has the following representation Xt psq εt psq Corollary 2. The CFAR(p) process X on r0, 1s2 : »1 0 8̧ » 1 ` 1 0 Ψ`1 ps, v qφpv uq dv, for ` ¥ 2. (5) Ψ` ps, uqεt` puq du. (6) pXt, t P Zq satisfies (4). Define a sequence of functions Ψ1 ps, uq φ1 ps uq, Ψ` ps, uq Ψ` ps, uq »1 ¸ ` 1 ¸» i 1 0 p 1 i 1 0 φi ps v qΨ`i pv, uq dv Φi ps, v qΨ`i pv, uq dv, Then Xt has the following representation Xt psq εt psq 8̧ » 1 ` 1 0 φ` ps, uq, 2 ¤ ` ¤ p, ` ¡ p. Ψ` ps, uqεt` puq du. (7) Corollary 1 and Corollary 2 are derived from Theorem 3.1 and Theorem 5.1 of Bosq(2000) directly. This is similar to that of stationary scalar AR(p) case when no constant is in the model. To include nonzero mean function µpq, we use Xt psq µpsq 3 °p ³1 i1 0 φi ps uqpXti puq µpuqqdu. Estimation, Prediction and Order Determination Assume that we observe Xt pq at discrete points, s n{N , n 0, . . . , N , for time t 1, .., T . 3.1 Estimation Since the convolution operator guarantees continuous path of Xt pq, we approximate it with linear interpolation of the observations. For any s P r1, 1s, if sn1 ¤ s sn, let rt psq psn sqXt psn1 q ps sn1 qXt psn q . X 1{N 7 (8) Because Xt pq is continuous but not differentiable and observations are not subject to noises, ³ linear interpolation suffices to evaluate φi ps uqXti puqdu. In a typical nonparametric fashion, we approximate the unknown convolution functions φi pq with a B-spline approximation. Specifically, ķ φi pq φrk,i pq βrk,i,j Bk,j pq, for i 1, . . . , p, (9) j 1 where tBk,j pq, j 1, . . . , ku are uniform cubic B-spline functions with degree of freedom k. Plugging in the linear interpolation of Xti pq and B-spline approximation of φi pq into (1), we get εt psn q Xt psn q p ¸ ķ βrk,i,j »1 0 i 1j 1 Let r ε1t pεrt,0, . . . , εrt,N q1 be an pN rti puqdu. Bk,j psn uqX (10) 1q 1 vector. εrt,n is the approximated noise at sn and time Xtpsq Mtβr k , where rti puqdu, and Mt pMt,1 , . . . , Mt,p q, Mt,i is an pN 1q k matrix, pMt,i qnj 0 Bk,j psn uqX r k is a pk 1 vector, βr k pβr 1k,i , . . . βr 1k,p q1 , and pβk,i qj βrk,i,j . Since Bk,j pq are fixed and known β t defined as on the right hand side of (10). It can be expressed as ε̃t ³1 functions, Mt is known, given the observations. Under the Ornstein-Uhenleck process, for equally spaced sn , εt pq follows an AR(1) process with AR coefficient eρ{N , and covariance matrix Σ, where pΣqij eρ|ij|{N σ2{2ρ. Therefore, β tβij , i 1, . . . , p, j 1, . . . , ku, σ2, and ρ can be estimated by maximizing the approximated log-likelihood function, Qk,T,N pβ, σ 2 , ρq pk Let β βpk,i,j . pN 1qpT 2 pq ln πσ2 N pT pq lnp1 e2ρ{N q 1 ρ Ţ 2 tp 2 ε̃1t Σ1 ε̃t ,(11) 1 pβp 1k,1, . . . , βp 1k,pq1 be the estimator of βr by maximizing Qk,T,N pβ, σ2, ρq, where pβp k,iqj After reparameterization, the objective function can be written as Qk,T,N pβ, ϕ, ω q pN 1qpT 2 pq lnp2πωq N pT pq lnp1 ϕ2q epβ, ϕq , 2 eρ{N , ω σ2{2ρ, Σ0 2ρΣ{σ2, and epβ, ϕq °Ttp correlation matrix of tεt psn q, n 1, . . . , N u. where ϕ 2ω (12) p1 1pεt . Σ0 is actually the 1 εt Σ0 Given β and ϕ, the maximizer of w is p ω pN ep1β,qpϕT q pq . 
8 (13) Hence pN β,ϕ max pN max Qk,T,N pβ, ϕ, ω q max β,ϕ,ω 1qpT 2 1qpT 2 ϕ pq ln epβ, ϕq N pT pq lnp1 ϕ2q pq ln epβpϕq, ϕq c 2 N pT p q lnp1 ϕ2 q 2 c , (14) where c is a constant and p pϕq arg min epβ, ϕq β Ţ M1t Σ1 Mt β 1 t p 1 Ţ M1t Σ1 Xt psq . (15) t p 1 Then this problem becomes one parameter optimization, and the estimator of ϕ can be easily obtained by maximizing (14). Together with (13) and (15), we have the estimators for σ 2 , ρ, β. The convolution function φi pq can be estimated by φpk,i pq °k p j 1 βk,i,j Bk,j pq. Remark 3. The above method can also deal with non-equally spaced tsn , n 0, . . . , N u, different tsn, n 0, . . . , N u vary with time), or different number of observations at each time. For simplicity, we assume equally spaced ts n{N u in this paper. observation points at each time (i.e. 3.2 Fitted values Given estimated φi pq, the continuous process Xt pq can be interpolated more precisely than simple linear interpolation. By taking advantage of information from the observed tXt psn q, n 0, . . . , N u and the Markovian property of Ornstein-Uhlenbeck process, we obtain a better approximation of Xt pq. Specifically, for a fixed s P psn , sn frt psq 1 p »1 ¸ i 1 0 q, let rti puqds, φpk,i ps uqX and let rt psq X frt psq eρppssn q eρppsn 1 p 1 s q Σ (16) Xt psn q frt psn q Xt psn p 0 is the estimated correlation matrix of εt psn q and εt psn where Σ 1 q frtpsn 1q , (17) q, with diagonal entry 1 and rt psq Xt psq. Here frt pq is the estimated skeleton of off-diagonal entry eρp{N . When s sn , X Xt pq process, and Xt psn q frt psn q is the approximated residual process. rt 1 psq iteratively. However, empirical Remark 4. Plugging (17) into (16), we can fit frt 1 pq, and X results show that its impact is minor with large N . 9 1 3.3 Prediction Given X1 , . . . , Xt , the least squares prediction of Xt pt X rt where X p »1 ¸ 1 psq i 1 0 1 psq is rt φpk,i ps uqX puqdu, (18) 1 i pq is the fitted processes of Xt 1i pq in (17). The residual is defined as 1 i εpt In addition, if Xt 1 1 psq Xt 1psq Xpt 1psq. (19) pq is partially observed at s 0, N1 , . . . , s, the prediction of Xt 1psq, for s P ps , 1s can be obtained with, pt X pt where X 3.4 1 1 psq Xpt 1psq eρppss q pXt 1psq Xpt 1psqq, (20) psq is from (18). Determination of the B-spline approximation order As in all nonparametric estimation, bandwidth is the key control parameter that balances the estimation bias and variance (Ruppert et al., 1995; Fan and Gijbels, 1996). For spline methods, the control parameter is the degree of freedom k (Zhou et al., 1998; Huang, 2003). We propose to choose k to minimize the out-sample rolling forecasting error instead of the typically used cross validation criterion due to the time series nature of our problem. Specifically, for a given k, and each t T0 , . . . , T , we use data tXh psn q, h 0, ..., t 1, n 0, . . . , N u, observed before time t to estimate the convolution functions using B-spline with degree of freedom k, and predict Xk,t psn q as in (18). Define overall squared rolling forecasting error S pk q Ţ Ņ pXtpsnq Xpk,tpsnqq2. (21) t T0 n 0 The optimal k is chosen to be the one that minimizes S pk q. 3.5 AR order determination F-test can be constructed for hypothesis testing of the significance of the convolution functions as well as for AR order determination. For testing H0 : φr 1 pq . . . 
φppq 0 vs H1 : not H0, we reject H0 if F pSSE prq SSE pf qq{rkpp rqs ¡ F SSE α,kpprq,pN pf q {rpN 1qpT pq pks 10 qp qpk , 1 T p (22) where SSE pf q and SSE prq are sum of squared estimated residuals of the approximated model in (10) for full CFAR(p) and reduced CFAR(r) model respectively, for an optimally chosen k. Specifically, SSE pf q Ţ 1 p pε1t,1 Σ εt,1 , SSE prq 0,1 p t p 1 Ţ 1 p pε1t,2 Σ εt,2 , 0,2 p t p 1 p 0,1 and Σ p 0,2 are the estimates of Σ0 , the correlation matrix of εt psq, whose pi, j q-th entry where Σ is eρ|ij |{N , from full and reduced models, respectively; tp εt,i psn q, t p i 1, 2 are the residuals of the full and reduced models, respectively. 1, . . . , T, n 0, . . . N u, This test can be used for model specification. We begin with CFAR(1) model, and sequentially add more lags of Xt , until the newly introduced lag is not significant. In addition, define the cross-sectional residuals of εt , et psn q εt psn q eρp{N εt psn1 q. (23) Under our model, tet psn q, n 1, . . . , N u is a white noise process, for t p 1, . . . , T . Let ept psn q εpt psn q eρp{N εpt psn1 q. Hence, the features of tept psn q, n 1, . . . , N u, can be studied with standard residual time series analysis for model validation. 4 Theoretical Properties We study the asymptotic properties of our estimator as both N and T go to infinity. The degree of freedom of B-spline approximation k also goes to infinity with N and T . 4.1 Sieve Framework In this section we reformulate our method under the sieve estimation framework. The sieve method is designed for estimation of a parameter in an infinite-dimensional space Θ. Often optimization the objective function over the parameter space cannot be directly solved. Instead, we optimize the function over a sequence of subspaces tΘk , k P Z u, which are called sieve spaces. If the sieve spaces satisfy certain conditions, the sequence of estimators, called sieve estimators, is expected to be consistent, as the complexity of sieve spaces goes to infinity with sample size, see Chen (2008), Halberstam and Richert (2013). The parameter space Θ for CFAR(p) models contains all the functions from L2 r1, 1s that satisfy stationary condition specified in Theorem 2. The sieve space, Θk , used to approximate 11 Spk X Θ, where Spk is product of p copies of Sk , the parameter space Θ, is defined as Θk Sk t ķ βj Bk,j pq, β pβ1, . . . , βk q1 P Rk u, j 1 and tBk,j pq, j 1, . . . , ku are the uniform cubic B-spline functions defined on r1, 1s with degree of freedom k. In other words, Sk is the cubic spline space with k 4 interior knots at t1, 1 1{m, . . . , 1 1{m, 1u, where m pk 3q{2. The population objective function to maximize we use here is, 1 Qpφ, σ , ρq E 2 2σ »1 2 where εt psq Xt psq ε2t 2 ρ 0 psqds ρε2t p0 q ρε2t p1 q ρ , (24) °p ³ 1 i1 0 φi ps uqXti puqdu. In fact, (24) is the expectation of log-likelihood of the Ornstein-Uhlenbeck process εt pq; see Rao (1999). Define the population objective function Qk pβ, σ 2 , ρq in sieve space Θk , 1 Qk pβ, σ , ρq E 2 2σ 2 where εt psq Xtpsq °pi1 °kj1 βi,j ³1 0 »1 2 ε2t ρ 0 psq ds ρε2t p0q ρε2t p1q ρ Bk,j ps uqXti puq du, and φi pq , °kj1 βi,j Bk,j pq, i 1, . . . , p. When Xt pq is only observed at discrete points, with (8) we have the sample objective function Qk,T,N pβ, σ 2 , ρq Qk,T,N pβ, σ where εrt psq 2 pN , ρq 1qpT 2 pq ln πσ2 N pT pq lnp1 ϕ2q 1 Xtpsq °pi1 °kj1 βi,j ρ ³1 0 Ţ 2 tp 2 rti puqdu, and φi pq Bk,j ps uqX εr1t Σ1 εrt , 1 °kj1 βi,j Bk,j pq, i 1, . . . , p. 
Then the sieve estimator in space Θk is pβp k , σpk2, ρpk q arg max Qk,T,N pβ, σ2, ρq. 4.2 Asymptotic Properties for CFAR(1) Models Let Qk be the linear operator defined in Section 6.4 of Schumaker (1981), which maps C r1, 1s into the space Θk . Let φrk Qk φ βrk,1Bk,1 ... βrk,k Bk,k , and rk psq φpsq φrk psq. p k and βr k are the B-spline Theorem 3. Xt is a stationary CFAR(1) process defined in (1). β coefficients of φpk and φrk . Assume φ N, T Ñ 8, ? P Lipζ2 r1, 1s, and ζ ¡ 1. p k βr k bk q Ñ N p0, Σk q, T pβ d 12 Then with given σ 2 and ρ, as where Σk Γk 1Υk Γk 1, Υk is the long-run variance of the process ut zt 1 Γ k µk , and the entries in ut , zt , At , µk and Γk are listed as follows put q i pz t q i pAtqij pµk qi pΓk qij where γ pu, v q 1 σ2 »1»1 »1 0 0 0 At bk , where bk ApBk,i , rk qpu, v qXt1 puqXt1 pv qpu, v q dudv, 2ρ Bk,i puqεt p0q σ2 » » 1 σ »1 0 pρBk,ipv uq 1 Bk,i pv uqq dWt pv q Xt1 puq du, 1 1 1 ApBk,i , Bk,j qpu, v qXt1 puqXt1 pv q dudv, σ2 0 0 » » 1 1 1 ApBk,i , rk qpu, v qγ pu, v q dudv, σ2 0 0 » » 1 1 1 ApBk,i , Bk,j qpu, v qγ pu, v q dudv, σ2 0 0 CovpXtpuq, Xtpvqq, and A is a binary functional operator, A : H1 b H1 Ñ H2, H1 tf : r1, 1s Ñ Ru,H2 tf : r1, 1s2 Ñ Ru. Apf, g qpu, v q ρf puqgpvq ρf p1 uqg p1 v q »1 0 f 1 ps uqg 1 ps v q ds »1 2 ρ 0 f ps uqg ps v q ds. (25) Theorem 3 provides the asymptotic distribution of the estimated B-spline coefficients when the dimension of the sieve space is fixed. It is interesting to note that the distribution does not depend on N . As long as N goes to infinity and T opN 2q, the estimated B-spline coefficients converges, and the rate depends on T , but is independent of N . It is due to that errors introduced by the linear interpolation has smaller order than others. With Theorem 3, we can easily obtain the asymptotic pointwise estimation error of the convolution function when k is fixed. P Lipζ r1, 1s with ζ ¡ 1, Bk psq pBk,1 psq, . . . , Bk,k psqq1 , and Corollary 3. Assume φ bk psq Bk psq1 bk rk psq, Then ? }bk pq}8 Opkζ where ζ0 0 o pN 2 q. For each fixed k, define σk2 psq Bk psq1 Σk Bk psq. T φpk psq φpsq bk psq Theorem 4. Assume φ P C ζ r1, 1s, where ζ and T Ñd N p0, σk2psqq. ¥ 2 is an integer. Then as k Ñ 8, { q, 3 2 mintζ, 4u. σpk2 Ñp σ2, and ρpk Ñp ρ. 13 }σk2pq}8 Opkq, Theorem 4 shows the consistency of our proposed estimators as well as the convergence rates of the asymptotic bias and variance. The asymptotic bias depends on the smoothness of the convolution function and complexity of the sieve space used for approximation, while the asymptotic variance only depends on the complexity of the sieve space. It reflects the fact that the complexity of sieve space controls the trade-off between bias and variance. When k is a small number, the bias dominates the error, since there are not enough knots to approximate the convolution function. When k is large enough, the variance dominates, and estimation of extra B-spline coefficients introduces too much error. Theorem 4 also reveals the selection principle for the number of knots. When the convolution function is smooth, a small k is favorable; when the sample size is large, we prefer to have more knots. Remark 5. By balancing the bias and variance, the optimal convergence rate of the estimation 2ζ0 3 error is attained at k OpT 1{p2ζ0 2q q, then both the squared bias and variance will be OpT 4ζ0 4 q. If the moduli of smoothness of φ is greater than 4 and k OpT 1{4q, the optimal convergence rate is OpT 5{12 q. 
If B-splines with higher order are adopted, the estimator is expected to converge faster. 4.3 Asymptotic Properties for CFAR(p) The asymptotic properties of the estimators of CFAR(p) models are very similar to these of CFAR(1) models. Let φrk,i Qk φi βrk,i,1Bk,1 ... βrk,i,k Bk,k , and rk,i psq φpsq φrk,i psq. Theorem 5 provides the bound of the B-spline coefficients estimation error, and is a higher dimension version of Theorem 3. p k and βr k be the B-spline Theorem 5. Xt is a stationary CFAR(p) process defined in (1). β coefficients of tφpk,1 , . . . , φpk,p u and tφrk,1 , . . . , φrk,p u respectively. Assume φi ζi ¡ 1, for i 1, . . . , p. Then with given σ2 and ρ, as N, T Ñ 8, ? p r d T pβ k β k bk q Ñ N p0, Σk q . P Lipζ2i r1, 1s, and Γk 1Υk Γk 1, Υk is the long-run variance of the process ut zt Atbk , where bk 1 1 1 1 1 1 1 1 1 1 Γ k µk . Here ut put,1 , . . . , ut,p q , zt pzt,1 , . . . , zt,p q , µk pµk,1 , . . . , µk,p q , Γk and At can be partitioned into p p blocks. where Σk Γ Γk,1 k,0 Γk,1 Γk,0 Γk .. .. . . Γk,p1 Γk,p2 . . . Γk,p . . . Γk,p .. .. . . ... 1 2 A At,1,2 . . . At,1,p t,1,1 A At,2,2 . . . At,2,p , At t,2,1 .. .. .. .. . . . . Γk,0 At,p,1 At,p,2 14 . . . At,p,p . The entries in ut , zt , At , bk and Γk are listed as follows put,hqi pzt,hqi pAt,l,hqij pµk,hqi pΓk,hqij for i, j p » » 1 ¸ 1 1 ApBk,i , rk,q qpu, v qXth puqXtq pv qpu, v q dudv, σ 2 q1 0 0 »1 0 1 σ2 2ρ Bk,i puqεt p0q σ2 »1»1 0 p 0 1 σ »1 0 pρBk,ipv uq 1 Bk,i pv uqq dWt pv q Xth puq du, ApBk,i , Bk,j qpu, v qXtl puqXth pv q dudv, » » 1 ¸ 1 1 ApBk,i , rk,q qpu, v qγqh pu, v q dudv, σ 2 q1 0 0 1 σ2 »1»1 0 0 ApBk,i , Bk,j qpu, v qγh pu, v q dudv, 1, . . . , k and l, h 1, . . . , p, where γq pu, vq CovpXtpuq, Xt q pvqq, q P Z. With Theorem 5, we have the asymptotic pointwise estimation error for each convolution function when k is fixed. P Lipζ r1, 1s, with ζi ¡ 1, i 1, . . . , p, and T opN 2q. For each fixed k, define Bk,i psq pB1k,i,1 , . . . , B1k,i,k q1 , Bk,i,i pBk,1 psq, . . . , Bk,k psqq1 , and Bk,i,j is a pdimensional vector whose entries are all 0, for j i. Then Corollary 4. Assume φi 2 psq Bk,ipsq1Σk Bk,ipsqq, σk,i bk,i psq Bk,i psq1 bk rk,i psq, for i 1, . . . , p. Then ? T φpk,i psq φpsq bk,i psq Theorem 6. Assume φi for i 1, . . . , p. i 0 5 2 psqq, Ñd N p0, σk,i P C ζ r1, 1s with ζi ¥ 3, and T opN 2q. It holds that as k Ñ 8, }bk,ipq}8 Opkζ where ζ0 { q, 3 2 2 }σk,i pq}8 Opkq, for i 1, . . . , p, mintζ1, . . . , ζp, 4u. σpk2 Ñp σ2, and ρpk Ñp ρ. Simulations In this section, we illustrate the performance of the proposed estimators, and demonstrate the impacts of k, T and N through two simulated examples. For each model and combination of pk, T, N q, we generate the corresponding functional time series 100 times and estimate the convolution functions. The performance of estimation is evaluated by the integrated squared error, Ei }φpk,i φi}22, for i 1, . . . , p. 15 Example 2. Consider a simple CFAR(1) model with convolution function φpq being the normal density function with mean 0 and standard deviation 0.1, truncated at we use ρ 5 and σ 2 r1, 1s. For noise process, 10. We demonstrate various features of the model with a typical data set whose estimation error is the median value of the 100 simulated sets for k 11, N 100, and T 100. Table 1: The average (sd) of errors when k 7, 11, 19 for Example 2 k distance N T E 7 1/2 100 100 0.4257 (0.0955) 11 1/4 100 100 0.1685 (0.0954) 19 1/8 100 100 0.3318 (0.1370) Table 1 shows the average of E for different k, with fixed T 100 and N 100. 
The second column shows the distance between two adjacent knots. As k increases, average of E decreases first, due to improvement of the sieve approximation, and then increases, because of the introduction of extra B-spline coefficients. It reaches the minimum when k Table 2: The average (sd) of errors when T 11. 10, 100, 1000 for Example 2 k N T E 11 100 10 4.3748 (3.3497 ) 11 100 100 0.1685 (0.0954) 11 100 1000 0.0312 (0.0187) Table 2 shows the estimation performance as the sample size T changes from 10, 100 to 1000, with fixed k and N . As T increases, the means and standard deviations of E all decrease, and the estimates become more accurate and stable. It can be seen that E decreases approximately at the rate of 1{T . Table 3: The average (sd) of errors when N 10, 100, 1000 for Example 2 k N T E 11 10 100 0.2015 (0.1001) 11 100 100 0.1685 (0.0954) 11 1000 100 0.1687 (0.0968) Table 3 summarizes the estimation performance for k 16 11 and T 100 with different N . It is seen that when N is small, increasing N improves the performance of the estimators as they benefit from the increase of information. However, when N is large enough, E stays at the same level, because there are sufficient observations to get an accurate approximation of Xt pq by linear interpolation. Due to the strong spatial correlation in the error process, dense observations do not provide much extra information. t= 3 0.6 0.2 0.8 1.0 0.4 1 0.8 1.0 0.6 0.0 0.2 0.8 1.0 0.2 0.4 0.6 0.8 1.0 0.6 ● ● 0.8 ●● ● ●● ●● ● ●● ● ●● ●● ● ● 2 3 ● ● ●●● ● ● ● ●● ● ● ● ● ● ● ● 1 ● ● ● ● ● ● ● ● ●● ● ●● ●●●● ●● ● ●● ● ●● ●● ● ● ● ● ● ● ● ●● ● ●● ● ●● ● ● ● ● 0.0 0.4 4 ● ● ● ● ● ● ● ● ● ●● ● ●● ● ●● ●● ● ●● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ●●● ● ●● ●● ● ● ●● ● ● ●● ● ●● ● ● ● ● t= 9 3 1 0 ● 0.2 0.6 ●● ● ● ●● ● ● t= 8 ● ●●● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● 0.0 0.4 0 3 2 1 0 0.0 2 2 1 0.4 1.0 ● ●● ● ● ● ●● ● ●● ● ● ● ● −3 −2 ● ● ● ● ●● ●● ● ● ● 0.2 0.8 ● ● ●● ● ● ●● ● ● ● ● ●● ● ●●● ●● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● −1 0 −1 ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ●● ●● ● ● ● ● ●● ● ● ●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● −2 2 1 ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● 0.0 0.6 t= 7 ● ● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● 0.4 0 0.2 −1 0.0 t= 6 ● ●●● ●● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ●● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ●● ● ● ●● ●●● ● ● ●● ●● ● ●● ●● ●● ● ● ● ● ●● ● ●●● ●●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ●● −2 1.0 ● ●● ● ●● ● ● ● ●● ● ●● ●● ● ● ● ●● ● ● ● ● ● ● ● ●● ● −2 0.8 ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ●● ● ● ● ● ●●● ● ●● ● ● ●●● ● ● ●● ● ●● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ●●● ● ● ● ● ● −1 ● ● ● ● ● ● ● ● −1 1 0 ● ● ● ● ● ● ● ● ●● ●●● ● 0 0.6 ● ● ● ●● ●● ● ●● ● ● ● −1 0.4 t= 5 ● ● ● ●●● ● ● ●● ●● ● ● −2 0.2 ● ● −3 −3 ● ● ● ● 0.0 ● ● ●●●● ● ● ● ● ● ● ●● ●● ● ●● ● ● ●●● ● −1 1 0 −1 −2 ● ● ●● ● ● ●● ● ●● ● ●● ● ●● ●● ● ● ● ● ●● −2 ● ●● ● ●●● ●● ●● ●●● ●● ● ●● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● −2 ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ●●●● ● ●●● ●● ● ● ● ● ● ● ● ● ● t= 4 2 t= 2 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Figure 2: Plots of predicted Xt pq when k the typical data set from Example 2. 15, N 100, and T 100, t 2, 3, 4, 5, 6, 7, 8, 9, for The black lines are Xt pq; the blue lines are ft pq; the red lines are the predictions; the black circles are the observations. 
Figure 2 displays the predicted Xt pq using (18) from time 2 to 9, using k N 15, T 100 and 100 for the typical data set. Because the estimated φpq is very close to the true convolution function, the predictions (red lines) are also close to the skeleton (blue lines). Also note that the noisy process is large in magnitude with strong spatial correlation, hence simple smoothing of Xt pq will be far away from the true skeleton ft pq. Figure 3 shows the performance of using out-sample forecasting to choose k, for T 100, T {2 and N 100 . The left panel in Figure 3 displays the sum of squared forecasting error across k for the typical data set. The minimum is achieved at k 15. It is seen that the sum of T0 squared forecasting error experiences two different phases as k increases. At the beginning, it is very large, due to the lack of knots to approximate φpq well. As the complexity of sieve space k increases, the squared forecasting error reduces very quickly, and the performance of estimator improves, until its minimum is reached. In this phase, the bias dominates the estimation error. After that, the error increases gradually with k because the fixed sample size T is not sufficient 17 30 5200 Histogram of chosen k for 100 data sets 20 ● 5 10 5000 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 4800 sum of squared forecasting error Sum of squared forecasting error 5 10 15 20 0 5 10 k 15 20 25 30 k Figure 3: Left panel: plot of the mean squared out-sample forecasting error against k for the 100 and N 100 from Example 2; right panel: histogram of chosen k for 100 data sets when T 100 and N 100 from Example 2. typical data set when T for the extra knots estimation. In this phase the variance dominates the error. The right panel in Figure 3 shows the histogram of chosen k for the 100 data sets. It is truncated at k 5, and most data sets require the dimension of the sieve space to be greater than 8. Note that when k is odd, the prediction error is smaller than that when k is even. This is because φpq reaches its maximum value and maximum second derivatives at 0. The estimate benefits from a knot at 0. −1.0 −0.5 0.0 0.5 1.0 4 −1 0 1 2 3 4 3 2 1 0 −1 −1 0 1 2 3 4 5 k=11 5 k=7 5 k=5 −1.0 −0.5 0.0 0.5 1.0 0.0 0.5 1.0 0.0 0.5 1.0 0.5 1.0 5 −1 0 1 2 3 4 5 4 2 1 0 −1 −0.5 −0.5 k=101 3 4 3 2 1 0 −1 −1.0 −1.0 k=19 5 k=15 −1.0 −0.5 0.0 0.5 1.0 −1.0 Figure 4: Plots of φpq (solid line) and φppq (dashed line) with T 5, 7, 11, 15, 19, 101 for the typical data set from Example 2. 18 −0.5 0.0 100, N 100, and k Figure 4 shows the estimated φpq for the typical data set when k 5, 7, 11, 15, 19, 101. As expected, the accuracy of the estimate improves first and then becomes worse as k increases. The top panel shows the first phase where the bias dominates, and the bottom panel shows the second phase where the variance dominates. When k 11, φppq is very close to the true convolution function, shown in the top right plot of Figure 4. Table 4 reports the performance of the F-test, where the first row is for testing H0 : φ under CFAR(1) model, and the second row is for testing H0 : φ2 0 0 under CFAR(2) model, comparing with CFAR(1) model. The second column shows the p-values for the typical data set. The last 5 columns show the minimum, mean, maximum and standard deviation of the p-values, and frequency of rejection at 5% level, respectively. φ1 pq is significant for all the data set, and φ2 pq is not significant for most of them. 
Table 4: The p-values for H0 : φi 0, when k 11, T 100, N 100 for Example 2 Overall H0 Significance Example Min Mean Max Sd frequency 0 0.0001 0.0001 0.0001 0.0001 0.0001 φ2 0 0.8442 0.0286 0.5698 0.9832 0.2679 φ1 100% 2% Example 3. A CFAR(2) model is considered in this example. Here we use φ1 psq and φ2 psq 1 2 cosp2πsq, for s P r1, 1s. We fix ρ 5, and σ2 10. 100. Table 5: The average (sd) of errors when k 2 Again, a typical data is selected whose estimation error is the median of these of 100 data sets when k N ? 2?50π e50s , 11, T 500 and 5, 7, 11, 19 for Example 3 k distance N T E1 E2 5 1 100 500 0.2711 (0.0406) 0.0464 (0.0216) 7 1/2 100 500 0.1227 (0.0317) 0.0446 (0.0226) 11 1/4 100 500 0.0623 (0.0335) 0.0596 (0.0286) 19 1/8 100 500 0.1029 (0.0434) 0.0972 (0.0410) Tables 5-7 show the average and standard deviation of E1 and E2 with different combination of (k, T, N ). The pattern is similar to that in Example 2. As k increases, the errors decrease first, reach the minimum, then increase. E1 and E2 are roughly linear in T , but does not change too much with N . 19 Table 6: The average (sd) of errors when T k N T E1 E2 11 100 100 0.3000 (0.1626) 0.3128 (0.1822) 11 100 200 0.1493 (0.0693) 0.1406 (0.0838) 11 100 500 0.0623 (0.0335) 0.0596 (0.0286) Table 7: The average (sd) of errors when N N T E1 E2 11 100 500 0.0623 (0.0335) 0.0596 (0.0286) 11 200 500 0.0623 (0.0334) 0.0595 (0.0285) 11 500 500 0.0622 (0.0330) 0.0593 (0.0284) 35 ● ● ● ● ● ● 0 ● 5 24700 ● ● 25 Frequency ● 15 25300 Histogram of chosen k for 100 data sets ● 24900 25100 100, 200, 500 for Example 3 k Sum of squared forecasting error sum of squared forecasting error 100, 200, 500 for Example 3 6 8 10 12 14 6 k 8 10 12 14 k Figure 5: Left panel: plot of the mean squared out-sample forecasting error against k for the 500 and N 100 from Example 3; right panel: histogram of chosen k for 100 data sets when T 500 and N 100 from Example 3. typical data set when T 9, or k 11 are 500, T0 T {2, and N 100. Figure 5 reads similarly to Figure 3. For most of the data sets, k selected based on the out-sample prediction criterion, when T The estimated φ1 pq and estimated φ2 pq for the typical data set are shown in Figure 6, when 0 than that around the boundary. This is because φi puq’s are used as weights for obtaining Xt psq, when u P r1 s, ss. Specifically φi p0q’s are the convolution weights for Xt psq with u P r1 s, ss. Every Xt psq, s P r1, 1s contains information of φi p0q but only Xt p1q contains information about φi p1q. Hence, the estimate of φi pq in the center exploits data more efficiently than that close to k 11, T 500 and N 100. The estimated function is more accurate around s 20 the boundary. Estimated φ2 −1.0 −0.5 −0.5 0.5 0.0 1.5 0.5 1.0 2.5 Estimated φ1 −1.0 −0.5 0.0 0.5 1.0 −1.0 Figure 6: Plots of estimated φ1 pq and φ2 pq when k −0.5 0.0 0.5 1.0 11, T 500 and N 100 for the typical data set from Example 3. Table 8 summarizes the p-values of a sequence of tests. The first row is for testing H0 : φ 0 under CFAR(1) model. The second row is for testing H0 : φ2 0 under CFAR(2) model, comparing with CFAR(1) model in the F-test. The third row is defined similarly. Based on the p-values shown in the second column, CFAR(2) would be correctly specified for the typical data set. The last 5 columns show that F-tests perform well in determining the AR order. 
Table 8: The p-values for H0 : φi 0, when k 11, T 500 N 100 from Example 3 Overall p-value Significance Example Min Mean Max Sd frequency 0 0.0001 0.0001 0.0001 0.0001 0.0001 φ2 0 0.0001 0.0001 0.0001 0.0001 0.0001 φ3 0 0.0797 0.0119 0.4906 0.9966 0.2678 φ1 6 100% 100% 2% Real Data Analysis We apply our method to the S&P 500 index European call option data in this section. It is well-known that the implied volatility of an option is a function of its strike prices, and this phenomenon is called volatility smiles. In this paper, we treat the implied volatility as a function of moneyness, which is the relative strike price with respective to the price of underlying asset. The option data is collected from July 9, 2004 to September 20, 2004, and the expiration date of the options is December 18, 2004. Hence, the time series is T 51 long. We select the options with strikes between 950 and 1550, which were actively traded within this period, and 21 have a solution for implied volatility derived from Black-Scholes model most of the time. Number of observations at each time varies from 43 to 48. Our aim is to construct the volatility curve against moneyness, study its evolution mechanism, and predict implied volatility in the future. Figure 7: Plots of implied volatilities against moneyness (top left panel) across time (bottom left panel) and plots of differenced implied volatility against moneyness (top right panel) across time (bottom right panel). The colorbars on the right show the time (top panel) and implied volatility (bottom panel) corresponding to different color scales. The left panel of Figure 7 shows the implied volatility curves as functions of moneyness over time, where the volatility smiles are clearly demonstrated, since deep in-the-money or out-themoney options have larger volatility than others. As expiration approaches, the implied volatilities of at-the-money options decreases, the implied volatilities of out-the-money options increase, and the location where minimum reaches approaches to 1 seen from the top left panel of Figure 7. Hence, we take a difference of the implied volatilities, and model them under CFAR settings. Specifically, let Yt psn q Xtpsnq Xrt1psnq, for n 1, . . . , Nt, t 2, . . . , T , where Xtpsnq is the rt pq is the linear interpolation of Xt pq defined implied volatility at time t with moneyness sn , X in (8), and Nt is number of observations at time t. Data after taking a difference is plotted in the right panel of Figure 7. The implied volatilities of deep in-the-money and out-the-money options 22 are much more volatile than these of at-the-money options. 0, when k 11 φ2 0 Table 9: The p-values for H0 : φi H0 p-value φ1 0 0.0001 0.1027 0 vs H1 : φi 0 for i 1, 2 in Table 9 indicate that CFAR(1) model should be chosen to fit the data. The estimates of φpq for k 12 (top left panel), k 14 (top right panel) and k 18 (bottom left panel) are shown in Figure 8. It can be seen that φppq is not very sensitive to number of knots. The bottom right panel of Figure 8 displays the sum of squared out-sample forecasting errors against k when T0 4{5T . When k 18, the errors reaches the minimum value, hence we select k 18. φppsq is negative when s ¤ 0.1 or 0.65 ¤ s ¤ 0.9; otherwise, it is positive, for k 18 from bottom right panel of Figure The sequential F-tests of testing H0 : φi 8. Hence, the implied volatility with moneyness s at time t is likely to increase, when these with moneyness u, u ¥ s 0.1, s 0.65 ¤ u ¤ s 0.9 decrease, and others increase at time t 1. 
Define ept pnq εpt psn q eρp|sn sn1 | εpt psn1 q, n 2, . . . , Nt . The sample autocorrelation functions of ept pnq are shown in Figure 9 for t 2, . . . , 37. Most of the time, the sample autocorrelations of cross-sectional errors lie within confidence intervals, except t 8, 14, 15, 30, 37. Overall, most of the sample autocorrelations of ept psn q are not significant, which implies that the CFAR(1) model fits data well. One-step-ahead out-sample forecasting error of Xt pq is used to compare the prediction perfor- pt mance of the estimated CFAR(1) model with a functional random walk model , where X 1 psnq 0, @sn r0, 1s . Table 10 shows the estimated ρ and σ 2 using CFAR(1) model, and the mean squared out-sample forecasting errors for both models. CFAR(1) outperforms the functional random walk model, reducing the MSE by about 9%. Table 10: Parameter estimates for CFAR model and out-sample forecasting MSEs for both models ρp σ p2 MSE1(CFAR model) MSE2(random walk) 0.0001 1.5176e 04 1.2755e 04 1.167e 04 23 1 0 −2 −4 −4 −2 0 1 2 Estimated convolution function, k=16 2 Estimated convolution function, k=12 −1.0 −0.5 0.0 0.5 1.0 −1.0 −0.5 0.0 0.5 0.5 1.0 0.0575 ● 0.0560 ● ● ● ● 0.0545 2 1 0 −2 −4 −1.0 0.0 Sum of squared forecasting error Sum of squared forecasting error Estimated convolution function, k=18 −0.5 1.0 ● ● ● 5 ● ● 10 ● ● ● ● 15 ● ● ● 20 k Figure 8: Plots of φppq for different k, k k 12 (top left panel), k 16 (top right panel) and 18 (bottom left panel), and plot of the sum of squared out-sample forecasting error against k (bottom right panel). Appendix By Cauchy-Schwarz inequality, for any function x P L2 r0, 1s, Proof of Theorem 1: } »1 0 ¤ φps uqxpuqdu} » 1 » 1 0 0 2 2 }x}¤1 0 »01 φ2 ps uqdu It follows that sup » 1 » 1 } »1 0 0 2 φps uqxpuqdu ds x2 puqdu ds ¤ κ2 ||x||. φps uqxpuqdu}22 ¤ κ2 1. By Lemma 3.1 in Bosq (2000), Xt is stationary. Proof of Theorem 2: Let Hp r0, 1s be the Cartesian product of p copies of the random function 24 0.2 −0.4 ACF 0.2 −0.4 ACF Lag 12 2 6 0.2 −0.4 ACF 0.2 ACF Lag 12 2 6 0.2 −0.4 ACF 0.2 Lag 12 2 6 t= 36 −0.4 0.2 Lag ACF 0.2 12 t= 37 Lag ACF 12 t= 31 −0.4 2 6 12 12 t= 25 Lag 12 12 t= 19 t= 30 ACF 0.2 0.2 −0.4 ACF 0.2 ACF −0.4 2 6 −0.4 2 6 Lag 2 6 0.2 −0.4 ACF 0.2 ACF −0.4 ACF 0.2 0.2 ACF −0.4 0.2 ACF −0.4 ACF 0.2 12 t= 35 12 12 Lag −0.4 2 6 ACF 2 6 Lag t= 24 −0.4 −0.4 ACF 0.2 ACF 0.2 12 Lag 12 2 6 Lag t= 34 −0.4 2 6 12 12 t= 13 Lag t= 29 ACF 0.2 ACF 2 6 Lag −0.4 2 6 −0.4 12 t= 33 12 12 2 6 t= 18 Lag Lag 0.2 2 6 2 6 t= 23 t= 28 −0.4 ACF 0.2 12 12 −0.4 2 6 Lag Lag 2 6 0.2 ACF 12 t= 27 t= 32 2 6 −0.4 2 6 −0.4 2 6 0.2 ACF 0.2 0.2 −0.4 ACF 0.2 12 12 12 Lag Lag Lag t= 7 t= 12 t= 17 t= 22 Lag Lag 2 6 −0.4 2 6 t= 21 t= 26 12 Lag 12 2 6 Lag t= 16 −0.4 ACF 0.2 −0.4 2 6 −0.4 2 6 0.2 ACF 2 6 Lag 12 12 t= 11 −0.4 12 t= 15 Lag 2 6 Lag 0.2 2 6 t= 20 12 t= 10 −0.4 ACF 0.2 12 Lag 2 6 −0.4 2 6 Lag t= 14 t= 6 −0.4 12 t= 9 −0.4 2 6 0.2 ACF 2 6 Lag 0.2 12 t= 8 0.2 2 6 t= 5 −0.4 ACF −0.4 −0.4 t= 4 0.2 t= 3 0.2 t= 2 2 6 12 2 6 12 Figure 9: Sample autocorrelations of ept with lag 0 autocorrelation removed when t 2, . . . , 37. space Hr0, 1s. The norm in Hp r0, 1s is defined as g f¸ fp }pX1, . . . , Xpq}p e }X1}22, where X1, . . . , Xp P Hr0, 1s. i1 Consider κ1 κ2 . . . κp ∆1 ∆2 . . . ∆p 1 0 ... 0 I 0 ... 0 , ∆ , K 0 1 ... 0 0 I ... 0 0 ... I 0 0 ... 1 (26) 0 where I denotes the identity operator, ∆i is a convolution operator associating with φi , i.e. ∆i X ³1 0 φi p uqX puqdu. Note that }∆X}p ¤ }KX}p. 
And all the roots of the characteristic function are eigenvalues of the matrix K. Let λmax be the maximum eigenvalue in modulus of K, and |λmax | 1. Hence, there exists an integer j, such that }∆j } ¤ }Kj } 1. By Theorem 5.1 in Bosq (2000), CFAR(p) model is stationary if (4) is satisfied. P Lipζ2 r1, 1s, and ζ ¡ 1, ts0 0, s1, . . . , sN 1u is an equally spaced partition of r0, 1s. For any u, v P r0, 1s, let fu and gv be pN 1q1 vectors, fu pf ps0 uq, f ps1 uq, . . . , f psN uqq1 , Lemma 1. Let f, g 25 pgps0 vq, gps1 vq, . . . , gpsN vqq1. Σ is defined in Section 3.1, where pΣqij eρ|ij|{N σ2{2ρ. Define AN pf, g qpu, v q fu1 Σ1 gv , then gv AN pf, g qpu, v q where ζ0 OpN ζ0 1 Apf, g qpu, v q σ2 1 q, (27) mintζ, 2u. Proof: Let ϕ eρ{N , and it is easy to show that Σ1 a 1 ϕ2 1 a ϕ 2 1ϕ U 0 σ2ρ U1U, where 2 0 . 1 ϕ 0 0 (28) 1 And we have Ufu a 1 1 ϕ2 a 1 ϕ2 f ps0 uq f ps1 uq ϕf ps0 uq . . . f psN uq ϕf psN 1 uq 1 , Ugv has a similar expression, hence f 1 Σ1 gv u 2ρ f puqg pv q σ2 2ρ p1 ϕ2qσ2 2ρ f puqg pv q σ2 2ρ p1 ϕ2qσ2 Ņ 2ρ p1 ϕ2qσ2 Ņ n 1 Ņ pf psn uq ϕf psn1 uqq pgpsn vq ϕgpsn1 vqq rf psn uq f psn1 uqs rgpsn vq gpsn1 vqs n 1 p1 ϕq rf psn uq f psn1 uqs gpsn1 vq n 1 Ņ 2ρ p1 ϕ2qσ2 n1p1 ϕqf psn1 vq rgpsn vq gpsn1 vqs Since f, g ζ Ņ 2ρ p1 ϕ2qσ2 L1 L2 p1 ϕq2f psn1 vqgpsn1 vq n 1 L3 L4 L5 . P Lipζ2 r1, 1s, |f 1puq f 1pvq| ¤ M |u v|, if ζ ¥ 2; |f 1puq f 1pvq| ¤ M |u v|ζ 1, 2, for some positive constant M . It follows that Ņ 1 f psn uq f psn1 uq g psn v q g psn1 v q 2ρ L2 N p1 ϕ2 qσ 2 n1 N 1{N 1{N 2ρ N p1 ϕ2 qσ 2 1 σ2 »1 0 1 1 1 pf psn1 uq N n1 f 1 ps uqg 1 ps v q ds OpN ζ0 OpN ζ0 1 1 qqpg1psn1 uq σρ q, and L5 σ 2 0 26 OpN ζ0 1 qq q. ³1 1 OpN ζ0 0 f ps uqg ps v qds ρ2 ³ 1 f ps uqg ps v qds OpN ζ0 1 q. 2 Similarly, we obtain L3 OpN ζ0 Ņ 1 q, L4 σρ 2 ³1 0 f ps uqg 1 ps v qds if Using integration by parts, we have fu1 Σ1 gv 1 σ2 ρf puqgpvq OpN ζ0 1 »1 ρf p1 uqg p1 v q 0 q. Lemma 2. If the CFAR(1) process X »1 f 1 ps uqg 1 ps v q ds 2 ρ 0 f ps uqg ps v q ds pXt, t P Zq satisfies (3) and φ P C 2r1, 1s, then there exists a positive constant ι which only depends on φ and ρ such that: (i) }Ψh }8 (ii) }γh }8 ¤ κh for h ¥ 2. ¤ σ2ικ|h| for h P Z. (iii) For h ¥ 1, " B γ p u, v q h max u,v Bu , and when h 0, for any u1 , u2 , v Bγh pu, vq * 2 h Bv ¤ σ ικ , P r0, 1s, |γ pu1, vq γ pu2, vq| ¤ σ2ι|u1 u2|. (iv) For h ¥ 1, " 2 B γh pu, v q max u,v Bu2 , and when h 0, 2 B γh pu, vq BuBv , " 2 B γ p u, v q max 2 uv Bu , 2 B γ pu, vq BuBv , Proof: By Cauchy-Schwarz inequality, for any u, v » 1 |Ψ2pu, vq| |Ψhpu, vq| ¤ 0 φpu sqφps v q ds ¤ » 1 0 » 1 1{2 » 1 pu, sqds Ψ2h 1 0 2 B γh pu, vq * 2 h Bv 2 ¤ σ ικ , 0 2 B γ pu, vq * 2 Bv 2 ¤ σ ι. P r0, 1s, 1{2 » 1 φ2 pu sqds 1{2 φ ps v qds 2 0 1{2 φ2 ps v qds ¤ κ}Ψh1}8, for h ¥ 3. Hence, Lemma 2(i) can be proved inductively. By the definition of Ψ, for h ¥ 2, Ψh pu, v q » r0,1sh φpu s1 qφps1 s2 q . . . φpsh1 v q ds1 . . . dsh1 , 27 ¤ κ, then we have BΨhpu, vq » φ1pu sqΨ ps, vq ds, BΨhpu, vq » h1 Bu Bv r0,1s r0,1s h Ψh1 pu, sqφ1 ps v q ds, h B2Ψhpu, vq » φ2pu sqΨ ps, vq ds, B2Ψhpu, vq » Ψ pu, sqφ2ps vq ds, h1 h1 Bu2 Bv2 r0,1s r0,1s Let d1 max1¤s¤1 |φ1 psq| and d2 max1¤s¤1 |φ2 psq|, and it holds that ! BΨ pu, vq BΨ pu, vq ) h h1 max , h u,v Bu Bv ¤ d1κ , for h ¥ 1, ! B2 Ψ pu, vq B2 Ψ pu, vq ) h h h1 max uv Bu2 , Bv2 ¤ d2κ , for h ¥ 1. By Corollary 1, the covariances γ pu, v q have the expression: h γ pu, v q h σ 2 ρ|uv| e 2ρ It follows that, for any u, v 8̧ » 1 » 1 σ2 2ρ `1 0 0 Ψ` pu, wqΨ` pv, z qeρ|wz | dwdz. 
P r0, 1s, }γ }8 ¤ σ2{r2ρp1 κ2qs, 2 |γ pu1, vq γ pu2, vq| ¤ σ2ρ pρ " 2 B γ pu, v q max uv Bu2 , where d maxtd21 , d2 κu. 2 B γ pu, vq BuBv , d1 κ q|u1 u2|, 1 κ2 2 B γ pu, vq * σ 2 2 Bv 2 ¤ 2ρ pρ For h ¥ 1, the autocovariances γh pu, v q CovpXt puq, Xt γh pu, v q σ2 2ρ »1 0 eρ|us| Ψh pv, sq ds Hence, for h ¥ 1, we have 8̧ » 1 » 1 σ2 2ρ `1 0 0 h d q, 1 κ2 pvqq is given by Ψ` pu, wqΨ` h pv, zqeρ|wz| dwdz. }γh}8 ¤ σ2κ|h|{r2ρp1 κ2qs, 2 Bγh pu, vq d1 κ ¤σ ρ κh , Bu 8 2ρ 1 κ2 2 h1 Bγh pu, vq ¤ σ d1 κ , Bv 8 2ρ 1 κ2 * " 2 B γh pu, v q B 2 γh pu, v q B 2 γh pu, v q d σ2 2 max , p ρ qκh, , ¤ 2 2 uv Bu BuBv Bv 2ρ 1 κ2 Since γh pu, v q γh pv, uq, the proof of Lemma 2(ii), 2(iii) and 2(iv) is complete. 28 pXt, t P Zq satisfies (3). Let ε0 be an i.i.d. copy of ε0, and Xt be obtained by replacing ε0 with ε0 in the definition of Xt . If f : r0, 1s2 Ñ R be a continuous function, and set }f }8 : maxu,v |f pu, v q|. Define Lemma 3. The CFAR(1) process X Yt »1»1 0 0 Then, for each q f pu, v qXt puqXt pv q dudv, and Yt »1»1 0 0 f pu, v qXt puqXt pv q dudv. P N and q ¥ 1, and t ¥ 0, ? 2 }Yt Yt }q ¤ ? 2σ 2 rp2q 1q!!s1{q }f }8 κt. ρ 1κ Proof: By Cauchy-Schwarz inequality, » 1» 1 f pu, v qrXt puq Xt puqsXt pv q dudv q 0 0 »1»1 rXt puq Xt puqsXt pv q dudv 2}f }8 q »01 »01 » 1 r Ψt pu, wqpεt pwq εt pwqdw Xt pv q}2q dudv 2}f }8 2q }Yt Yt}q ¤ 2 ¤ ¤ ? 02 0 0 ¤ a 2σ 2 rp2q 1q!!s1{q }f }8 κt. ρ p1 κ q Define δt pq Proof of Theorem 3: pf gqpuq ³1 0 Xtpq Xrtpq. For any function f P C r1, 1s, denote f pu v qg pv qdv. We first observe that Xt can be decomposed as Xt psq ķ βk,j pBk,j Xrt1qpsq prk Xrt1qpsq εt psq pφ δtqpsq. j 1 rt and w r t be two pN Let v 1q-dimensional vectors whose i-th entries are prk and pφ δt qppi 1q{N q respectively. Let εt be the pN p k can be decomposed as εt ppi 1q{N q. The estimator β pk β βr k Ţ M1t Σ1 Mt 1 Ţ t 2 o pN 2 q Ţ ?1 t p k βr k β Γk,N1 T1 k,N Ţ rt M1t Σ1 w εt r tq . w opp1q. T (29) t 2 Γk,N1 µk,N , then t 1 Γ1 bk,N rt M1t Σ1 pv 1 E pM Σ1Mtq, and bk,N T Let µk,N 1q-dimensional vector whose i-th entry is t 2 We claim that under the condition T E pM1 Σ1vrtq, Γk,N Xrt1qppi 1q{N q Ţ pM1 Σ1vrt µ k,N t t 2 pM1tΣ1Mt Γk,N qbk,N t 2 29 q 1 Γ1 k,N T op pT 1{2 q. Ţ t 2 M1t Σ1 εt (30) We claim that pk β βr k 1 Γ k bk Ţ Γk 1 T1 1 T Ţ put µk q t 2 »u σ 0 pztqj 0 The vector-valued process ut zt t 2 (31) eρpuvq dWt pv q, then similar to proof of Lemma 1, we get 1 σ Ţ op pT 1{2 q. pAt Γk qbk εt puq εt p0qeρu 2ρ Bk,j puqεt p0q σ2 1 T t 2 If we represent εt puq as »1 1 Γ k »1 0 1 Bk,j pv uqq dWt pv q Xt1 puq du. pρBk,j pv uq zt (32) At bk is a stationary process. For any fixed unit vector P Rk , by Lemma 3, the physical dependence measures defined in Wu (2005) of the process θ 1 put zt At bk q decay geometrically fast, and therefore the martingale approximation used in ° Wu (2005) can be applied to obtain the central limit theorem for Tt2 θ 1 put zt At bk q. By θ the Cramer-Wold device, we have ? 1 1 p k βr k bk q Ñ N p0, Γ T pβ k Υk Γk q, d where Υk is the long-run variance of the process ut zt At bk . Now we prove the claim (29). We proceed by showing that each entry of the vector converges r t can be written as to zero in probability. The j-th entry of Mt Σ1 w »1»1 0 0 rt1 puqδt pv q dudv. AN pBk,j , φqpu, v qX By Lemma 1 , there exists a constant c1 ¡ 0 such that |AN pBj , φqpu, vq| ¤ c1, for all u, v, N. (33) Let rh pu, vq CovpXt puq, δt γ By Lemma 2, there exists a constant c2 h pvqq, »1»1 E 0 0 h ¡ 0 such that }γ̃h}8 ¤ c2κh{N, It follows that γ̄h pu, v q Covpδt puq, δt }γ̄h}8 ¤ c2κh{N. 
|AN pBk,j , φqpu, vqδtpuqδtpvq| dudv OpN 1q, 30 pvqq, °T 1{2 q, where t2 yjt op pT »1»1 so it suffices to prove yjt 0 0 AN pBk,j , φqpu, v qXt1 puqδt pv q dudv. By (33), Eyjt »1»1 0 0 AN pBk,j , φqpu, v qrγ1 pu, v q γ r1 pu, vqs dudv Op1{N q. (34) The autocovariance is Covpyjt , yj,t hq » r0,1s4 AN pBk,j , φqpu1 , v1 qAN pBk,j , φqpu2 , v2 q rγhpu1, u2qγ̄hpv1, v2q rh γ 1 pu1, v2qγrh 1pu2, v1qs du1du2du3du4. ¡ 0 such that By (33) and Lemma 2, we know there exist a constant c3 |Covpwjt, wj,t hq| ¤ c3κ2h{N. It follows that Ţ Var yjt OpT {N q. (35) t 2 Combining (34) and (35), and using the condition T opN 2q, we have °Tt2 yjt oppT 1{2q, and the proof of claim (29) is complete. The difference between the two expressions in (30) and (31) is due to the approximation of rt , so the proof of claim (31) is in the same fashion of that of claim (29). We omit the Xt by X details. Lemma 4. The distance of two adjacent knots for uniform cubic B-spline functions defined on r1, 1s with degree of freedom k is 1{m, and m pk 3q{2. Define two pm 4qpm 4q matrices Ps and Vs : pPsqqj » 1 s{m » 1 s{m 1 eρ|uv| Bk,q { s{m » 1 s{ m » 1 s{ m s m pVsqqj ρ2 { s m { 1 puqBk,j m 1 eρ|uv| Bk,q puqBk,j m 1 s m pvq dudv, m 1 pvq dudv, m 1 1, . . . , m 4. Let b pb0, . . . , bm 3q1 be a unit vector. There exists constants c1 and c2 ° 1 such that if s P Iα , 3j 0 b2j ¥ 3m , and m ¥ c1 , then for q, j b1 pPs Vs qb ¥ c2 3̧ j 0 31 b2j . Proof: Rescale the uniform B-spline functions, Bq psq Bk,m q ps{mq rq, q The support of Bq pq is rq, q pPsqqj 0, . . . , m 3. 4s and we denote B puq B2 puq. Ps and Vs can be written as »m s»m s » »m s ρ2 pVsqqj m2 1, . . . , m 4sp sq3 , for q 1, . . . , q s m s s eρ|uv|{m Bq1 1 puqBj1 1 pv q dudv, s s eρ|uv|{m Bq1 puqBj 1 pv q dudv, s 4, there are m 4 spline functions which are not identically zero on the interval rs, m ss: B0 puq, B1 puq, . . . , Bm 3 puq. For any numbers b0 , b1 , b2 , b3 , 2 ³4 °3 ° b B p s q ds ¥ c3 3j 0 b2j . Thus there exists an there exists a constant c3 ¡ 0 such that 3 j 0 j j s P r3, 4s such that 3̧ 3̧ 1{2 b B psq ¥ ?c b2 . for q, j 4. For a fixed 3 j 0 j j 3 j j 0 P r3, 4s, °3j0 bj Bj1 psq ¤ As a result, there exists an interval Iα of length c5 ¡ 0, which is contained in On the other hand, there exists a constant c4 such that for all s c4 ° 3 2 j 0 bj 1{2 . r3, 4s, such that for each s in this interval 3̧ 3̧ 1{2 b B psq ¥ c b2 , j 0 j j 6 j 0 j where 0 c6 1 is an absolute constant. Define the functions: 3̧ f1 puq bj Bj puq, f2 puq j 0 and let f puq f1 puq ¸ m 1 bj Bj puq, ¸ m 3 bj Bj puq, j m j 4 f2 puq f3 puq f3 puq. Using the identity » 1 8 iuλ ρ|u|{m e π e 1 8 m{ρ pmλ{ρq2 dλ, we have 1 b1 P s b s f 1 puqf 1 pv q »8 eipuvqλ m{ρ pmλ{ρq2 dλdudv 1 s s 8 2 » 8 » m s 1 m{ρ f 1 puqeiuλ du dλ π 8 s 1 pmλ{ρq2 2 » »m s 1 8 isλ ipm sqλ iuλ f1 psqe f3 pm sqe p iλq f puqe du π 8 1 s π »m s»m 32 m{ρ pmλ{ρq2 dλ. Similarly, 2 » 8 » m s iuλ f puqe du 2 πm 8 s 1 m{ρ pmλ{ρq2 dλ. Neuman (1981) showed the Fourier transform of the central spline B2 puq is »2 2 sinpλ{2q 4 iuλ B puqe du . ρ2 b1 V s b 2 Let g pλq »m λ °m1 ipj 2qλ , we have j 4 bj e »m s s f puqe iuλ s 2 du » s m s s »m f1 puqe iuλ f1 puqe iuλ s du » s m s du s f3 puqe iuλ f3 puqe iuλ du du g pλ q g pλ q »4 0 B2 puqeiuλ du 2 sinpλ{2q λ 4 . (36) Note that » m s »7 iuλ s f1 puqe du ¤ 3 3̧ |bj |Bj puq du ¤ 2, (37) j 0 and a similar inequality holds for the Fourier transform of f3 puq. We will consider two cases. Case 1: maxλPr0,2{ms |g pλq| ¤ m |f1psq|{3. 
Lemma 4. The distance between two adjacent knots for the uniform cubic B-spline functions defined on $[-1,1]$ with $k$ degrees of freedom is $1/m$, where $m=(k-3)/2$. Define two $(m+4)\times(m+4)$ matrices $P_s$ and $V_s$ by
$$(P_s)_{qj} = \int_{-1+s/m}^{-1+(s+m)/m}\int_{-1+s/m}^{-1+(s+m)/m} e^{-\rho|u-v|}\,B_{k,q+m-1}'(u)\,B_{k,j+m-1}'(v)\,du\,dv,$$
$$(V_s)_{qj} = \rho^2\int_{-1+s/m}^{-1+(s+m)/m}\int_{-1+s/m}^{-1+(s+m)/m} e^{-\rho|u-v|}\,B_{k,q+m-1}(u)\,B_{k,j+m-1}(v)\,du\,dv,$$
for $q,j=1,\ldots,m+4$. Let $b=(b_0,\ldots,b_{m+3})'$ be a unit vector. There exist constants $c_1$ and $c_2$ such that if $s\in I_\alpha$, $\sum_{j=0}^3 b_j^2\ge \frac{1}{3m}$ and $m\ge c_1$, then
$$b'(P_s+V_s)b \ge c_2\sum_{j=0}^3 b_j^2.$$
Proof: Rescale the uniform B-spline functions:
$$B_q(s) = B_{k,m+q}(s/m-1), \qquad q=0,\ldots,m+3,$$
so that the support of $B_q(\cdot)$ is $[q,q+4]$, and write $B(u)$ for the central spline. In the rescaled coordinates, $P_s$ and $V_s$ can be written as
$$(P_s)_{qj} = \int_s^{m+s}\int_s^{m+s} e^{-\rho|u-v|/m}\,B_{q-1}'(u)\,B_{j-1}'(v)\,du\,dv, \qquad (V_s)_{qj} = \frac{\rho^2}{m^2}\int_s^{m+s}\int_s^{m+s} e^{-\rho|u-v|/m}\,B_{q-1}(u)\,B_{j-1}(v)\,du\,dv,$$
for $q,j=1,\ldots,m+4$. On the interval $[s,m+s]$ there are $m+4$ spline functions which are not identically zero: $B_0(u), B_1(u),\ldots,B_{m+3}(u)$. For any numbers $b_0,b_1,b_2,b_3$, there exists a constant $c_3>0$ such that $\int_3^4\big(\sum_{j=0}^3 b_jB_j(s)\big)^2 ds \ge c_3\sum_{j=0}^3 b_j^2$. Thus there exists an $s\in[3,4]$ such that
$$\Big|\sum_{j=0}^3 b_jB_j(s)\Big| \ge \sqrt{c_3}\,\Big(\sum_{j=0}^3 b_j^2\Big)^{1/2}.$$
On the other hand, there exists a constant $c_4$ such that for all $s\in[3,4]$, $\big|\sum_{j=0}^3 b_jB_j'(s)\big| \le c_4\big(\sum_{j=0}^3 b_j^2\big)^{1/2}$. As a result, there exists an interval $I_\alpha$ of length $c_5>0$, contained in $[3,4]$, such that for each $s$ in this interval,
$$\Big|\sum_{j=0}^3 b_jB_j(s)\Big| \ge c_6\Big(\sum_{j=0}^3 b_j^2\Big)^{1/2},$$
where $0<c_6<1$ is an absolute constant. Define the functions
$$f_1(u) = \sum_{j=0}^3 b_jB_j(u), \qquad f_2(u) = \sum_{j=4}^{m-1} b_jB_j(u), \qquad f_3(u) = \sum_{j=m}^{m+3} b_jB_j(u),$$
and let $f(u)=f_1(u)+f_2(u)+f_3(u)$. Using the identity
$$e^{-\rho|u|/m} = \frac1\pi\int_{-\infty}^{\infty} e^{iu\lambda}\,\frac{m/\rho}{1+(m\lambda/\rho)^2}\,d\lambda,$$
we have
$$b'P_sb = \int_s^{m+s}\int_s^{m+s} f'(u)f'(v)\,\frac1\pi\int_{-\infty}^{\infty}\frac{e^{i(u-v)\lambda}\,m/\rho}{1+(m\lambda/\rho)^2}\,d\lambda\,du\,dv = \frac1\pi\int_{-\infty}^{\infty}\Big|\int_s^{m+s}f'(u)\,e^{iu\lambda}\,du\Big|^2\frac{m/\rho}{1+(m\lambda/\rho)^2}\,d\lambda$$
$$= \frac1\pi\int_{-\infty}^{\infty}\Big|f_1(s)e^{is\lambda} - f_3(m+s)e^{i(m+s)\lambda} + i\lambda\int_s^{m+s}f(u)\,e^{iu\lambda}\,du\Big|^2\frac{m/\rho}{1+(m\lambda/\rho)^2}\,d\lambda.$$
Similarly,
$$b'V_sb = \frac{\rho^2}{\pi m^2}\int_{-\infty}^{\infty}\Big|\int_s^{m+s}f(u)\,e^{iu\lambda}\,du\Big|^2\frac{m/\rho}{1+(m\lambda/\rho)^2}\,d\lambda.$$
Neuman (1981) showed that the Fourier transform of the central spline is $\big(2\sin(\lambda/2)/\lambda\big)^4$, up to a phase factor. Letting $g(\lambda)=\sum_{j=4}^{m-1}b_j\,e^{i(j+2)\lambda}$, we have
$$\int_s^{m+s}f(u)\,e^{iu\lambda}\,du = \int_s^{m+s}f_1(u)\,e^{iu\lambda}\,du + \int_s^{m+s}f_3(u)\,e^{iu\lambda}\,du + g(\lambda)\Big(\frac{2\sin(\lambda/2)}{\lambda}\Big)^4. \qquad (36)$$
Note that
$$\Big|\int_s^{m+s}f_1(u)\,e^{iu\lambda}\,du\Big| \le \int_3^7\sum_{j=0}^3|b_j|\,B_j(u)\,du \le 2, \qquad (37)$$
and a similar inequality holds for the Fourier transform of $f_3(u)$. We consider two cases.

Case 1: $\max_{\lambda\in[0,2/m]}|g(\lambda)| \le m|f_1(s)|/3$. On the interval $\lambda\in[\pi/(6m), 11\pi/(6m)]$, by the law of cosines, it holds that
$$\big|f_1(s)e^{is\lambda} - f_3(m+s)e^{i(m+s)\lambda}\big| \ge |f_1(s)\sin(m\lambda)| \ge |f_1(s)|/2,$$
and, with (36) and (37),
$$\Big|f_1(s)e^{is\lambda} - f_3(m+s)e^{i(m+s)\lambda} + i\lambda\int_s^{m+s}f(u)\,e^{iu\lambda}\,du\Big| \ge \frac{|f_1(s)|}{2} - \frac{22\pi}{3m} - \frac{|f_1(s)|}{3} = \frac{|f_1(s)|}{6} - \frac{22\pi}{3m}.$$
It follows that
$$b'P_sb \ge \frac1\pi\int_{\pi/(6m)}^{11\pi/(6m)}\Big(\frac{|f_1(s)|}{6}-\frac{22\pi}{3m}\Big)^2\frac{m/\rho}{1+(m\lambda/\rho)^2}\,d\lambda = \frac1\pi\int_{\pi/6}^{11\pi/6}\Big(\frac{|f_1(s)|}{6}-\frac{22\pi}{3m}\Big)^2\frac{\rho}{\rho^2+\lambda^2}\,d\lambda \ge \frac{60\rho}{36\rho^2+121\pi^2}\Big(\frac{|f_1(s)|}{6}-\frac{22\pi}{3m}\Big)^2.$$
Recall that $\sum_{j=0}^3b_j^2\ge\frac{1}{3m}$, so $|f_1(s)| \ge c_6\big(\sum_{j=0}^3b_j^2\big)^{1/2} \ge c_6/\sqrt{3m}$. Therefore, for Case 1, there exists a constant $c_7>0$ such that when $m\ge c_7$, it holds that
$$b'P_sb \ge \frac{5\rho}{109\rho^2+364\pi^2}\,|f_1(s)|^2 \ge \frac{5\rho c_6^2}{109\rho^2+364\pi^2}\sum_{j=0}^3b_j^2.$$

Case 2: $\max_{\lambda\in[0,2/m]}|g(\lambda)| > m|f_1(s)|/3$. This implies that there exists a $\lambda_0\in[0,2/m]$ such that $|g(\lambda_0)| \ge m|f_1(s)|/3 \ge c_6\sqrt m/6$. Since $b$ is a unit vector, $|g(\lambda)|\le\sqrt m$, and it follows that $|f_1(s)|\le 3/\sqrt m$. By Zygmund (2002), the derivative of $g$ satisfies $|g'(\lambda)|\le m^{3/2}$. Therefore, on an interval $I_1\subset[0,3/m]$ of length $c_6/(12m)$, $|g(\lambda)|\ge c_6\sqrt m/12$. On this interval, by (36) and (37),
$$\Big|\int_s^{m+s}f(u)\,e^{iu\lambda}\,du\Big| \ge \frac{c_6\sqrt m}{12}\Big(\frac{2\sin(\lambda/2)}{\lambda}\Big)^4 - 4.$$
There exists a $c_8>0$ such that when $m\ge c_8$,
$$\Big|\int_s^{m+s}f(u)\,e^{iu\lambda}\,du\Big| \ge \frac{c_6\sqrt m}{13}.$$
It follows that
$$b'V_sb \ge \frac{\rho^2}{\pi m^2}\,\frac{c_6^2m}{169}\int_{I_1}\frac{m/\rho}{1+(m\lambda/\rho)^2}\,d\lambda = \frac{\rho^2}{\pi m^2}\,\frac{c_6^2m}{169}\int_{mI_1}\frac{\rho}{\rho^2+\lambda^2}\,d\lambda \ge \frac{c_6^3\rho^3}{2028\pi(\rho^2+9)}\,\frac1m.$$
Note that $|f_1(s)|\ge c_6\big(\sum_{j=0}^3b_j^2\big)^{1/2}$ and $|f_1(s)|\le3/\sqrt m$ together imply $\sum_{j=0}^3b_j^2\le 9/(c_6^2m)$, so
$$b'V_sb \ge \frac{c_6^5\rho^3}{18252\pi(\rho^2+9)}\sum_{j=0}^3b_j^2.$$
Finally, $b'P_sb=\int_s^{m+s}\int_s^{m+s}\sum_{q,j}b_qb_j\,B_q'(u)B_j'(v)\,e^{-\rho|u-v|/m}\,du\,dv\ge0$, and similarly $b'V_sb\ge0$, so both $P_s$ and $V_s$ are non-negative definite. Hence the proof is completed by setting $c_1=\max\{c_7,c_8\}$ and $c_2=\rho\min\{\rho^2,1\}\,c_6^5/[18252\pi(\rho^2+9)]$.
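The conclusion of Lemma 4 can be probed numerically. The sketch below is our own rough illustration, with hypothetical choices rho = 2, k = 11 (so m = 4) and window position s = 0.5: it assembles the quadratic form behind $P_s+V_s$ by Riemann sums over the window $[-1+s/m,\,-1+(s+m)/m]$, keeps the $m+4$ B-splines that are not identically zero there, and checks that the smallest eigenvalue stays positive (it can be small for splines barely overlapping the window).

```python
# Rough numerical probe of Lemma 4 (illustration only).
import numpy as np
from scipy.interpolate import BSpline

rho, k = 2.0, 11
deg, m = 3, (11 - 3) // 2
knots = np.concatenate((np.full(deg, -1.0),
                        np.linspace(-1.0, 1.0, k - deg + 1),
                        np.full(deg, 1.0)))
splines = [BSpline(knots, np.eye(k)[j], deg, extrapolate=False)
           for j in range(k)]

s = 0.5
x = np.linspace(-1.0 + s / m, s / m, 400)          # the length-one window
w = x[1] - x[0]
E = np.exp(-rho * np.abs(x[:, None] - x[None, :]))
B = np.array([np.nan_to_num(f(x)) for f in splines])
dB = np.array([np.nan_to_num(f.derivative()(x)) for f in splines])
P = (dB @ E @ dB.T) * w * w                        # Riemann sum for P_s
V = rho**2 * (B @ E @ B.T) * w * w                 # Riemann sum for V_s
keep = np.abs(B).sum(axis=1) > 1e-12               # splines alive on the window
M = (P + V)[np.ix_(keep, keep)]
print(keep.sum(), "splines kept; smallest eigenvalue:",
      np.linalg.eigvalsh(M).min())
```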
Lemma 5. The minimum eigenvalue of $\Gamma_k$, defined in Theorem 3, is of the order $k^{-1}$.

Proof: $\Gamma_k$ can be written as the sum of four matrices, $\Gamma_k=\sum_{i=1}^4G_{k,i}$, where, for $i,j=1,\ldots,k$,
$$(G_{k,1})_{ij} = \int_0^1\int_0^1\int_0^1 \gamma_0(u,v)\,B_{k,i}'(s-u)\,B_{k,j}'(s-v)\,du\,dv\,ds, \qquad (G_{k,2})_{ij} = \rho^2\int_0^1\int_0^1\int_0^1 \gamma_0(u,v)\,B_{k,i}(s-u)\,B_{k,j}(s-v)\,du\,dv\,ds,$$
$$(G_{k,3})_{ij} = \rho\int_0^1\int_0^1 \gamma_0(u,v)\,B_{k,i}(-u)\,B_{k,j}(-v)\,du\,dv, \qquad (G_{k,4})_{ij} = \rho\int_0^1\int_0^1 \gamma_0(u,v)\,B_{k,i}(1-u)\,B_{k,j}(1-v)\,du\,dv.$$
Define $W_{k,1}$ and $W_{k,2}$, for $i,j=1,\ldots,k$, by
$$(W_{k,1})_{ij} = \frac{\sigma^2}{2\rho}\int_0^1\int_0^1\int_0^1 e^{-\rho|u-v|}\,B_{k,i}'(s-u)\,B_{k,j}'(s-v)\,du\,dv\,ds, \qquad (W_{k,2})_{ij} = \frac{\rho\sigma^2}{2}\int_0^1\int_0^1\int_0^1 e^{-\rho|u-v|}\,B_{k,i}(s-u)\,B_{k,j}(s-v)\,du\,dv\,ds.$$
For any $b=(b_1,\ldots,b_k)'\in\mathbb{R}^k$,
$$b'G_{k,1}b = \int_0^1 E\bigg(\sum_{i=1}^k b_i\int_0^1 X_t(u)\,B_{k,i}'(s-u)\,du\bigg)^2 ds = \int_0^1 E\bigg(\sum_{i=1}^k b_i\int_0^1\Big[\int_0^1\phi(u-v)X_{t-1}(v)\,dv+\varepsilon_t(u)\Big]B_{k,i}'(s-u)\,du\bigg)^2 ds$$
$$\ge \int_0^1 E\bigg(\sum_{i=1}^k b_i\int_0^1 \varepsilon_t(u)\,B_{k,i}'(s-u)\,du\bigg)^2 ds = b'W_{k,1}b \ge 0.$$
So $G_{k,1}\succeq W_{k,1}\succeq0$. In a similar way, $G_{k,2}\succeq W_{k,2}\succeq0$, while $G_{k,3}$ and $G_{k,4}$ are clearly non-negative definite; thus $\Gamma_k\succeq W_{k,1}+W_{k,2}$.

Let $b=(b_0,\ldots,b_{2m+2})'$ be a unit vector, and define
$$D_1 = \Big\{0\le j\le 2m-1: \sum_{l=j}^{j+3}b_l^2 \ge \frac{1}{3m}\Big\}, \qquad D_2 = \{0,1,\ldots,2m-1\}\setminus D_1.$$
Since $\sum_{j\in D_2}\sum_{l=j}^{j+3}b_l^2 \le 2m\cdot\frac{1}{3m} = \frac23$, it holds that $\sum_{j\in D_1}\sum_{l=j}^{j+3}b_l^2 \ge \frac13$. For each $j\in D_1$ with $j\le m-1$, by Lemma 4, there exists an interval $I_j\subset[-1+j/m,\,-1+(j+1)/m]$ of length $c_5/m$ such that for each $s\in I_j$,
$$\int_0^1\int_0^1\sum_{h,l=0}^{2m+2}b_hb_l\big[B_{k,h}'(s-u)B_{k,l}'(s-v)+\rho^2B_{k,h}(s-u)B_{k,l}(s-v)\big]e^{-\rho|u-v|}\,du\,dv \ge c_6\sum_{l=j}^{j+3}b_l^2.$$
It follows that
$$\frac{\sigma^2}{2\rho}\int_0^1\int_0^1\int_0^1\sum_{h,l=0}^{2m+2}b_hb_l\big[B_{k,h}'(s-u)B_{k,l}'(s-v)+\rho^2B_{k,h}(s-u)B_{k,l}(s-v)\big]e^{-\rho|u-v|}\,du\,dv\,ds \ge \frac{c_3c_6}{m}\sum_{j\in D_1,\,j\le m-1}\ \sum_{l=j}^{j+3}b_l^2,$$
where $c_3=\sigma^2c_5/(2\rho)$. By reverting the order of the rows and the columns, we can show that for each $j\in D_1$ with $j\ge m$ there also exists an interval $I_j\subset[-1+j/m,\,-1+(j+1)/m]$ of length $c_5/m$ on which the same inequality holds, so the left-hand side above is similarly bounded below by $\frac{c_3c_6}{m}\sum_{j\in D_1,\,j\ge m}\sum_{l=j}^{j+3}b_l^2$. Since at least one of the two index sets carries a sum of at least $1/6$, we conclude that
$$b'(W_{k,1}+W_{k,2})b \ge \frac{c_3c_6}{6m},$$
and the proof is complete.

Lemma 6. Assume $\Xi:[0,1]^2\to\mathbb{R}$ satisfies
$$\|\Xi^{(1)}\|_\infty := \sup_{0\le u_1,u_2,v\le1,\,u_1\ne u_2}\max\bigg\{\Big|\frac{\Xi(u_1,v)-\Xi(u_2,v)}{u_1-u_2}\Big|,\ \Big|\frac{\Xi(v,u_1)-\Xi(v,u_2)}{u_1-u_2}\Big|\bigg\} < \infty,$$
$$\|\Xi^{(2)}\|_\infty := \sup_{0\le u,v\le1,\,u\ne v}\max\bigg\{\Big|\frac{\partial^2\Xi(u,v)}{\partial u^2}\Big|,\ \Big|\frac{\partial^2\Xi(u,v)}{\partial u\,\partial v}\Big|,\ \Big|\frac{\partial^2\Xi(u,v)}{\partial v^2}\Big|\bigg\} < \infty. \qquad (38)$$
Define the $k\times k$ matrix $\Lambda$ with $(j,l)$-th entry
$$\int_0^1\int_0^1\int_0^1 B_{k,j}'(s-u)\,B_{k,l}'(s-v)\,\Xi(u,v)\,du\,dv\,ds. \qquad (39)$$
Then $\|\Lambda\|=O(k^{-1})$.

Proof: Let $S_j$ be the support of $B_{k,j}$, and let $S_{jl}$ be the set of $s$ which makes the integral
$$\int_0^1\int_0^1 B_{k,j}'(s-u)\,B_{k,l}'(s-v)\,\Xi(u,v)\,du\,dv \qquad (40)$$
nonzero. Define two subsets of $S_{jl}$: $S_{jl}^1 := \{s\in S_{jl}: (S_j\cup S_l)\not\subset[s-1,s]\}$ and $S_{jl}^2 := S_{jl}\setminus S_{jl}^1$. Use $\lambda$ to denote the Lebesgue measure. Recall that $k=2m+3$ and that $1/m$ is the mesh size. For each $1\le j\le k$, we consider the entries in the $j$-th row of $\Lambda$. There are three cases.

Case 1. If $||l-j|-m|\le3$, then $\lambda(S_{jl})\le7/m$, and it follows that $|(\Lambda)_{jl}|\le C_1/k$, where $C_1$ is a constant depending only on $\Xi$. Observe that for each $j$, the inequality $||l-j|-m|\le3$ holds for at most 14 different values of $l$.

Case 2. If $|j-l|\le3$, then $\lambda(S_{jl}^1)\le7/m$, and integrating (40) over $s\in S_{jl}^1$ leads to a value bounded by $C_1/(2k)$. For $s\in S_{jl}^2$, the integral (40) can be written as
$$\int_{S_j}\int_{S_l} B_{k,j}'(u)\,B_{k,l}'(v)\,\Xi(s-u,s-v)\,du\,dv.$$
Let $u_j$ be the left boundary of $S_j$. Since $\int_{S_j}B_{k,j}'(u)\,du=0$,
$$\Big|\int_{S_j}\int_{S_l}B_{k,j}'(u)B_{k,l}'(v)\,\Xi(s-u,s-v)\,du\,dv\Big| = \Big|\int_{S_j}\int_{S_l}B_{k,j}'(u)B_{k,l}'(v)\big[\Xi(s-u,s-v)-\Xi(s-u_j,s-v)\big]\,du\,dv\Big|$$
$$\le \int_{S_j}\int_{S_l}\big|B_{k,j}'(u)B_{k,l}'(v)\big|\,\|\Xi^{(1)}\|_\infty\,(u-u_j)\,du\,dv \le C_1/(2k). \qquad (41)$$
Since $\lambda(S_{jl}^2)\le1$, we have $|(\Lambda)_{jl}|\le C_1/k$.

Case 3. If $3<|j-l|<m-3$, then for each $s\in S_{jl}^1$ it must hold that either $S_j\subset[s-1,s]$ or $S_l\subset[s-1,s]$; as in (41), for each such $s$ the integral (40) has absolute value at most $C_1/k$. For $s\in S_{jl}^2$, since $S_j$ and $S_l$ have no intersection, we can expand
$$\Xi(s-u,s-v) = \Xi(s-u_j,s-v_l) + \frac{\partial\Xi(s-u_j,s-v_l)}{\partial u}\,(u_j-u) + \frac{\partial\Xi(s-u_j,s-v_l)}{\partial v}\,(v_l-v) + R(u,v),$$
where $|R(u,v)|\le\|\Xi^{(2)}\|_\infty\,(|u-u_j|+|v-v_l|)^2$ for $u\in S_j$, $v\in S_l$. Since the linear terms integrate to zero,
$$\Big|\int_{S_j}\int_{S_l}B_{k,j}'(u)B_{k,l}'(v)\,\Xi(s-u,s-v)\,du\,dv\Big| = \Big|\int_{S_j}\int_{S_l}B_{k,j}'(u)B_{k,l}'(v)\,R(u,v)\,du\,dv\Big|$$
$$\le \int_{S_j}\int_{S_l}\big|B_{k,j}'(u)B_{k,l}'(v)\big|\,\|\Xi^{(2)}\|_\infty\,(|u-u_j|+|v-v_l|)^2\,du\,dv \le C_1/k^2. \qquad (42)$$
Since $\lambda(S_{jl}^1)\le8/m$ and $\lambda(S_{jl}^2)\le1$, it holds that
$$|(\Lambda)_{jl}| \le \frac{C_1}{k}\cdot\frac8m + \frac{C_1}{k^2} \le \frac{C_2}{k^2},$$
where $C_2$ is a constant which only depends on $\Xi$. Combining the three cases, we see that there exists a constant $C_3$ which only depends on $\Xi$, such that $\|\Lambda\|_\infty\le C_3/k$ and $\|\Lambda\|_1\le C_3/k$. Therefore $\|\Lambda\|\le\|\Lambda\|_\infty^{1/2}\,\|\Lambda\|_1^{1/2}\le C_3/k$.
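Lemma 6 can likewise be checked numerically. In the sketch below, which is our own illustration, the kernel $\Xi(u,v)=e^{-|u-v|}$ is a hypothetical choice satisfying (38); the triple integral in (39) is approximated by Riemann sums, and the printed spectral norm of $\Lambda$ should shrink roughly like $1/k$ as $k$ grows.

```python
# Hedged numerical check of Lemma 6 (illustration only).
import numpy as np
from scipy.interpolate import BSpline

def dbasis(k, pts, deg=3):
    """Rows are B'_{k,j}(pts) for k clamped uniform splines on [-1, 1]."""
    knots = np.concatenate((np.full(deg, -1.0),
                            np.linspace(-1.0, 1.0, k - deg + 1),
                            np.full(deg, 1.0)))
    return np.array([np.nan_to_num(
        BSpline(knots, np.eye(k)[j], deg, extrapolate=False).derivative()(pts))
        for j in range(k)])

nu, ns = 80, 40
u = np.linspace(0.0, 1.0, nu)
Xi = np.exp(-np.abs(u[:, None] - u[None, :]))      # hypothetical kernel
for k in (7, 13, 25, 49):
    Lam = np.zeros((k, k))
    for s in np.linspace(0.0, 1.0, ns):
        A = dbasis(k, s - u)                       # B'_{k,j}(s - u), (k, nu)
        Lam += A @ Xi @ A.T
    Lam /= nu * nu * ns                            # Riemann-sum weights
    print(k, np.linalg.norm(Lam, 2))               # roughly halves as k doubles
```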
Proof of Theorem 4: By Corollary 6.26 of Schumaker (1981),
$$\|\phi-\widetilde\phi_k\|_\infty \le c_1k^{-\zeta}, \qquad \|\phi'(\cdot)-\widetilde\phi_k'(\cdot)\|_\infty \le c_1k^{-\zeta+1}, \qquad (43)$$
where $c_1$ is an absolute constant. There are four terms in the definition of $A(B_{k,j}, r_k)$. We first consider
$$\int_0^1\int_0^1\int_0^1 B_{k,j}'(s-u)\,r_k'(s-v)\,\gamma(u,v)\,du\,dv\,ds.$$
Let $S_j$ be the support of $B_{k,j}$, and define $S_j^1:=\{s: \lambda(S_j\cap[s-1,s])>0,\ S_j\not\subset[s-1,s]\}$ and $S_j^2:=\{s: S_j\subset[s-1,s]\}$. When $s\in S_j^1$, it holds that
$$\Big|\int_0^1\int_0^1 B_{k,j}'(s-u)\,r_k'(s-v)\,\gamma(u,v)\,du\,dv\Big| \le Ck^{-\zeta+1},$$
where $C$ is a constant which only depends on $\rho$, $\sigma^2$ and $\phi$. When $s\in S_j^2$ and $3\le j\le k-2$, using the fact that $\int_{S_j}B_{k,j}'(u)\,du=0$ and Lemma 2, we have
$$\Big|\int_0^1\int_0^1 B_{k,j}'(s-u)\,r_k'(s-v)\,\gamma(u,v)\,du\,dv\Big| = \Big|\int_{S_j}\int_0^1 B_{k,j}'(u)\,r_k'(s-v)\,\big[\gamma(s-u,v)-\gamma(s-u_j,v)\big]\,du\,dv\Big| \le Ck^{-\zeta}.$$
Noticing that $\lambda(S_j^1)\le4/m$, and combining the previous two cases, we see that
$$\Big|\int_0^1\int_0^1\int_0^1 B_{k,j}'(s-u)\,r_k'(s-v)\,\gamma(u,v)\,du\,dv\,ds\Big| \le Ck^{-\zeta}.$$
It can be shown that the other three terms in the definition of $A(B_{k,j},r_k)$ have the same order $O(k^{-\zeta})$, and hence $|(\mu_k)_j|\le Ck^{-\zeta}$. By Lemma 5,
$$\|b_k\| = \|\Gamma_k^{-1}\mu_k\| = O(k^{-\zeta+3/2}).$$
It follows that
$$\|b_k(\cdot)\|_\infty \le \|B_k(\cdot)'b_k\|_\infty + \|r_k(\cdot)\|_\infty = O(k^{-\zeta+3/2}).$$
For the statement regarding $\|\sigma_k^2(\cdot)\|_\infty$, it suffices to show that $\|\Sigma_k\|=O(k)$. We proceed by calculating the long-run variances of the three processes $u_t$, $z_t$ and $A_tb_k$. First we observe that $z_t$ is a martingale difference sequence, and $\mathrm{Var}(z_t)=\Gamma_k$. The covariance between $u_{tj}$ and $u_{t+h,l}$ is
$$\mathrm{Cov}(u_{tj}, u_{t+h,l}) = \int_{[0,1]^4} A(B_{k,j},r_k)(u_1,v_1)\,A(B_{k,l},r_k)(u_2,v_2)\,\big[\gamma_h(u_1,u_2)\gamma_h(v_1,v_2)+\gamma_h(u_1,v_2)\gamma_h(v_1,u_2)\big]\,du_1\,dv_1\,du_2\,dv_2.$$
By (43), $|A(B_{k,j},r_k)(u,v)|\le C_\rho k^{-\zeta+1}$, where $C_\rho$ is a constant which only depends on $\rho$. The preceding bound, together with Lemma 2, implies that
$$|\mathrm{Cov}(u_{tj}, u_{t+h,l})| \le \frac{2C_\rho^2\sigma^4}{(1-\kappa^2)^2}\,k^{-2\zeta+2}\,\kappa^{2|h|}.$$
Therefore, the operator norm of the long-run variance of $u_t$ is of the order $O(k^{-2\zeta+3})$. (44)

The $(j_1,j_2)$-th entry of the matrix $E(A_tb_kb_k'A_{t+h}')$ is
$$\sum_{l_1,l_2=1}^k (b_k)_{l_1}(b_k)_{l_2}\int_{[0,1]^4} A(B_{k,l_1},B_{k,j_1})(u_1,v_1)\,A(B_{k,l_2},B_{k,j_2})(u_2,v_2)\,\big[\gamma_h(u_1,u_2)\gamma_h(v_1,v_2)+\gamma_h(u_1,v_2)\gamma_h(v_1,u_2)\big]\,du_1\,dv_1\,du_2\,dv_2.$$
By the definition of $A(\cdot,\cdot)$, the product $A(B_{k,l_1},B_{k,j_1})\,A(B_{k,l_2},B_{k,j_2})$ can be expanded into 16 terms. We first consider the term
$$\int_{[0,1]^6} B_{k,l_1}'(s_1-u_1)\,B_{k,j_1}'(s_1-v_1)\,B_{k,l_2}'(s_2-u_2)\,B_{k,j_2}'(s_2-v_2)\,\big[\gamma_h(u_1,u_2)\gamma_h(v_1,v_2)+\gamma_h(u_1,v_2)\gamma_h(v_1,u_2)\big]\,ds_1\,ds_2\,du_1\,dv_1\,du_2\,dv_2. \qquad (45)$$
Let $\mathcal{B}_k$ be the set of the pairs $(j,l)$ such that either $|j-l|\le3$ or $||j-l|-m|\le3$. Lemma 2 gives bounds on the derivatives of $\gamma_h(u,v)$; using these bounds, and a similar argument as in the proof of Lemma 6, we can show that there exists a constant $c_2>0$ such that the absolute value of (45) is bounded by $c_2\kappa^{2h}/k^2$ if $(j_1,l_1)\in\mathcal{B}_k$ and $(j_2,l_2)\in\mathcal{B}_k$; by $c_2\kappa^{2h}/k^3$ if one and only one of $(j_1,l_1)$ and $(j_2,l_2)$ belongs to $\mathcal{B}_k$; and by $c_2\kappa^{2h}/k^4$ if $(j_1,l_1)\notin\mathcal{B}_k$ and $(j_2,l_2)\notin\mathcal{B}_k$. It can be shown that the other 15 terms have the same bounds, and we omit the details. Let $C$ be the $k\times k$ matrix whose $(j,l)$-th entry is $1/k$ when $(j,l)\in\mathcal{B}_k$, and $1/k^2$ when $(j,l)\notin\mathcal{B}_k$; then
$$\big\|E(A_tb_kb_k'A_{t+h}')\big\| \le c_3\,\kappa^{2h}\,\big\|C\,|b_k|\,|b_k|'\,C\big\| \le c_4\,\kappa^{2h}\,k^{-2\zeta+3},$$
where $c_3$ and $c_4$ are absolute constants, and $|b_k|$ consists of the entry-wise absolute values of $b_k$. Therefore, the long-run variance of $A_tb_k$ is of the order $O(k^{-2\zeta+3})$, and the proof is complete.
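The spline approximation rate (43) borrowed from Schumaker (1981) is also easy to illustrate: fit least-squares cubic splines with an increasing number of basis functions to a smooth function and watch the sup-norm error fall. In the sketch below, which is our own illustration, the test function is a hypothetical choice; scipy's make_lsq_spline expects the full clamped knot vector.

```python
# Quick illustration (ours) of the polynomial spline approximation rate.
import numpy as np
from scipy.interpolate import make_lsq_spline

phi = lambda s: np.sin(2.0 * s) * np.exp(-s**2)    # hypothetical smooth phi
x = np.linspace(-1.0, 1.0, 4001)
deg = 3
for nb in (8, 16, 32, 64):                         # number of basis functions
    t = np.concatenate((np.full(deg, -1.0),
                        np.linspace(-1.0, 1.0, nb - deg + 1),
                        np.full(deg, 1.0)))
    spl = make_lsq_spline(x, phi(x), t, k=deg)
    print(nb, np.abs(spl(x) - phi(x)).max())       # sup-norm error drops
```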
Lemma 7. If the CFAR(p) process $X=(X_t, t\in\mathbb{Z})$ satisfies (4) and $\phi_i\in C^2[-1,1]$ for $i=1,\ldots,p$, then $\mathbf{X}_t=(X_t, X_{t-1},\ldots,X_{t-p+1})'$ is a stationary process with $\mathbf{X}_t=\sum_{j=0}^\infty\boldsymbol\Delta^j\boldsymbol\varepsilon_{t-j}$, where $\boldsymbol\varepsilon_t=(\varepsilon_t,0,\ldots,0)'$ and $\boldsymbol\Delta$ is defined in (26). There exist three constants $\iota$, $t_0$ and $\lambda$ which only depend on $\{\phi_i, i=1,\ldots,p\}$ and $\rho$ such that:
(i) $\|\boldsymbol\Delta^h\|\le\iota\lambda^h$ for $h\ge2$;
(ii) $\|\gamma_h\|_\infty\le\sigma^2\iota\lambda^{|h|}$ for $h\in\mathbb{Z}$;
(iii) for $h\ge1$, $\max_{u,v}\big\{\big|\partial\gamma_h(u,v)/\partial u\big|,\ \big|\partial\gamma_h(u,v)/\partial v\big|\big\}\le\sigma^2\iota\lambda^h$;
(iv) for $h\ge1$, $\max_{u,v}\big\{\big|\partial^2\gamma_h(u,v)/\partial u^2\big|,\ \big|\partial^2\gamma_h(u,v)/\partial u\,\partial v\big|,\ \big|\partial^2\gamma_h(u,v)/\partial v^2\big|\big\}\le\sigma^2\iota\lambda^h$.

Proof: By Lemma 5.1 and Theorem 5.1 in Bosq (2000), it is straightforward to see that $\mathbf{X}_t$ has the expression $\mathbf{X}_t=\sum_{j=0}^\infty\boldsymbol\Delta^j\boldsymbol\varepsilon_{t-j}$. Since we show that $\|\boldsymbol\Delta^j\|^{1/j}\to\lambda_{\max}<1$ in the proof of Theorem 2, there exists an integer $t_0$ such that $\|\boldsymbol\Delta^h\|\le\lambda^h$ for $h\ge t_0$, where $\lambda=(1+\lambda_{\max})/2$. For $h\ge t_0$ and any $u,v\in[0,1]$,
$$|\gamma_h(u,v)| = \big|E\big(X_t(u)\,X_{t+h}(v)\big)\big| \le \sum_{j=0}^\infty \big\|(\boldsymbol\Delta^j\boldsymbol\varepsilon_{t-j})_1(u)\big\|_2\,\big\|(\boldsymbol\Delta^{j+h}\boldsymbol\varepsilon_{t-j})_1(v)\big\|_2 \le \frac{\delta^2\sigma^2\lambda^h}{2\rho(1-\lambda)},$$
where $\delta=\max_{j\ge0}\|\boldsymbol\Delta^j\|$. By the definition of $\Delta_i$, we have
$$\Delta_{q_1}\Delta_{q_2}\cdots\Delta_{q_n}f(s) = \int_{[0,1]^n}\phi_{q_1}(s-u_1)\,\phi_{q_2}(u_1-u_2)\cdots\phi_{q_n}(u_{n-1}-u_n)\,f(u_n)\,du_1\cdots du_n.$$
The entries in the $i$-th row of $\boldsymbol\Delta^p$ can be regarded as polynomials in $\{\Delta_1,\ldots,\Delta_p\}$ of degree $p-i+1$, without intercept (the identity operator). Hence we can define two linear operators $\boldsymbol\Delta^{(1)}$ and $\boldsymbol\Delta^{(2)}$ on $\mathcal{H}^p[0,1]$ such that $\boldsymbol\Delta^{(1)}f(s)=[\boldsymbol\Delta^pf(s)]'$ and $\boldsymbol\Delta^{(2)}f(s)=[\boldsymbol\Delta^pf(s)]''$. Specifically, if the $(i,j)$-th entry of $\boldsymbol\Delta^p$ is $\Delta_{q_1}\Delta_{q_2}\cdots\Delta_{q_n}$, we define the $(i,j)$-th entries of $\boldsymbol\Delta^{(1)}$ and $\boldsymbol\Delta^{(2)}$ as
$$(\boldsymbol\Delta^{(1)}f(s))_{ij} = \int_{[0,1]^n}\phi_{q_1}'(s-u_1)\,\phi_{q_2}(u_1-u_2)\cdots\phi_{q_n}(u_{n-1}-u_n)\,f(u_n)\,du_1\cdots du_n,$$
$$(\boldsymbol\Delta^{(2)}f(s))_{ij} = \int_{[0,1]^n}\phi_{q_1}''(s-u_1)\,\phi_{q_2}(u_1-u_2)\cdots\phi_{q_n}(u_{n-1}-u_n)\,f(u_n)\,du_1\cdots du_n.$$
Then the operator norms of $\boldsymbol\Delta^{(1)}$ and $\boldsymbol\Delta^{(2)}$ are bounded by $p\,d_{p,1}\kappa_{\max}^{p-1}$ and $p\,d_{p,2}\kappa_{\max}^{p-1}$ respectively, where $\kappa_{\max}=\max\{\kappa_1,\ldots,\kappa_p,1\}$, $d_{p,1}=\max_{1\le j\le p,\,0\le s\le1}|\phi_j'(s)|$ and $d_{p,2}=\max_{1\le j\le p,\,0\le s\le1}|\phi_j''(s)|$. On the other hand, by the definition of $\boldsymbol\Delta$, it suffices to observe that the operator norms of $\partial(\boldsymbol\Delta^jf(s))/\partial s$ and $\partial^2(\boldsymbol\Delta^jf(s))/\partial s^2$ are bounded by $p\,d_{p,1}\kappa_{\max}^{p-1}\|f\|_\infty$ and $p\,d_{p,2}\kappa_{\max}^{p-1}\|f\|_\infty$ respectively, for $1\le j<p$. It follows that, for $h\ge t_0+p$,
$$\Big|\frac{\partial\gamma_h(u,v)}{\partial u}\Big| \le \frac{\sigma^2\delta_1\delta\,\lambda^{h-p}}{2\rho(1-\lambda)}, \qquad \Big|\frac{\partial\gamma_h(u,v)}{\partial v}\Big| \le \frac{\sigma^2\delta\,d_{p,1}\kappa_{\max}^{p-1}\lambda^{h-p}}{2\rho(1-\lambda)},$$
$$\Big|\frac{\partial^2\gamma_h(u,v)}{\partial u^2}\Big| \le \frac{\sigma^2\delta_2\delta\,\lambda^{h-p}}{2\rho(1-\lambda)}, \qquad \Big|\frac{\partial^2\gamma_h(u,v)}{\partial v^2}\Big| \le \frac{\sigma^2\delta\,d_{p,2}\kappa_{\max}^{p-1}\lambda^{h-p}}{2\rho(1-\lambda)}, \qquad \Big|\frac{\partial^2\gamma_h(u,v)}{\partial u\,\partial v}\Big| \le \frac{\sigma^2\delta_2\delta\,d_{p,2}\kappa_{\max}^{p-1}\lambda^{h-p}}{2\rho(1-\lambda)},$$
where $\delta_1=\max\{\rho,\ p\,d_{p,1}\kappa_{\max}^{p-1}\delta\}$ and $\delta_2=\max\{\rho^2,\ p\,d_{p,2}\kappa_{\max}^{p-1}\delta\}$. The proof is complete.

Lemma 8. Suppose the CFAR(p) process $X=(X_t, t\in\mathbb{Z})$ satisfies (4). Let $\varepsilon_0^*$ be an i.i.d. copy of $\varepsilon_0$, and let $X_t^*$ be obtained by replacing $\varepsilon_0$ with $\varepsilon_0^*$ in the definition of $X_t$. Let $f:[0,1]^2\to\mathbb{R}$ be a continuous function with $\|f\|_\infty:=\max_{u,v}|f(u,v)|$, and define, for $-p\le h\le p$,
$$Y_{t,h} = \int_0^1\int_0^1 f(u,v)\,X_t(u)X_{t-h}(v)\,du\,dv, \qquad Y_{t,h}^* = \int_0^1\int_0^1 f(u,v)\,X_t^*(u)X_{t-h}^*(v)\,du\,dv.$$
Then, for each $q\in\mathbb{N}$, there exists a constant $\iota$ which depends on $\{\phi_i, i=1,\ldots,p\}$, $\rho$, $\sigma^2$, $p$ and $f(u,v)$, such that
$$\|Y_{t,h}-Y_{t,h}^*\|_q \le \iota\lambda^t.$$
Proof: By the Cauchy-Schwarz inequality and Lemma 7, for $t\ge t_0+p$,
$$\|Y_{t,h}-Y_{t,h}^*\|_q \le \Big\|\int_0^1\int_0^1 f(u,v)\,[X_t(u)-X_t^*(u)]\,X_{t-h}(v)\,du\,dv\Big\|_q + \Big\|\int_0^1\int_0^1 f(u,v)\,[X_{t-h}(v)-X_{t-h}^*(v)]\,X_t^*(u)\,du\,dv\Big\|_q$$
$$\le \|f\|_\infty\int_0^1\int_0^1 \big\|(\boldsymbol\Delta^t\boldsymbol\varepsilon_0)_1(u)-(\boldsymbol\Delta^t\boldsymbol\varepsilon_0^*)_1(u)\big\|_{2q}\,\|X_{t-h}(v)\|_{2q}\,du\,dv + \|f\|_\infty\int_0^1\int_0^1 \big\|(\boldsymbol\Delta^{t-h}\boldsymbol\varepsilon_0)_1(u)-(\boldsymbol\Delta^{t-h}\boldsymbol\varepsilon_0^*)_1(u)\big\|_{2q}\,\|X_t^*(v)\|_{2q}\,du\,dv$$
$$\le \frac{2\sigma\gamma_{\max}^{1/2}}{\rho^{1/2}}\,[(2q-1)!!]^{1/q}\,\|f\|_\infty\,\lambda^{t-p},$$
where $\gamma_{\max}=\max_{s\in[0,1]}\mathrm{Var}(X_t(s))$.

Proof of Theorem 5: The proof is similar to that of Theorem 3. $X_t$ can be decomposed as
$$X_t(s) = \sum_{i=1}^p\sum_{j=1}^k \beta_{k,i,j}\,(B_{k,j}*\widetilde X_{t-i})(s) + \sum_{i=1}^p (r_{k,i}*\widetilde X_{t-i})(s) + \varepsilon_t(s) + \sum_{i=1}^p(\phi_i*\delta_{t-i})(s).$$
Let $\widetilde v_t$ and $\widetilde w_t$ be the $(N+1)$-dimensional vectors whose $j$-th entries are $\sum_{i=1}^p(r_{k,i}*\widetilde X_{t-i})((j-1)/N)$ and $\sum_{i=1}^p(\phi_i*\delta_{t-i})((j-1)/N)$ respectively, and let $\varepsilon_t$ be the $(N+1)$-dimensional vector whose $j$-th entry is $\varepsilon_t((j-1)/N)$. The estimator $\widehat\beta_k$ can be decomposed as
$$\widehat\beta_k = \widetilde\beta_k + \Big(\sum_{t=p+1}^T M_t'\Sigma^{-1}M_t\Big)^{-1}\sum_{t=p+1}^T M_t'\Sigma^{-1}\big(\widetilde v_t+\varepsilon_t+\widetilde w_t\big).$$
We claim that, under the condition $T=o(N^2)$,
$$\frac{1}{\sqrt T}\sum_{t=p+1}^T M_t'\Sigma^{-1}\widetilde w_t = o_p(1). \qquad (46)$$
Let $\mu_{k,N}=E(M_t'\Sigma^{-1}\widetilde v_t)$, $\Gamma_{k,N}=E(M_t'\Sigma^{-1}M_t)$ and $b_{k,N}=\Gamma_{k,N}^{-1}\mu_{k,N}$; then
$$\widehat\beta_k-\widetilde\beta_k-b_{k,N} = \Gamma_{k,N}^{-1}\bigg[\frac1T\sum_{t=p+1}^T\big(M_t'\Sigma^{-1}\widetilde v_t-\mu_{k,N}\big) - \frac1T\sum_{t=p+1}^T\big(M_t'\Sigma^{-1}M_t-\Gamma_{k,N}\big)b_{k,N} + \frac1T\sum_{t=p+1}^T M_t'\Sigma^{-1}\varepsilon_t\bigg] + o_p(T^{-1/2}). \qquad (47)$$
We also claim that
$$\widehat\beta_k-\widetilde\beta_k-b_k = \Gamma_k^{-1}\bigg[\frac1T\sum_{t=p+1}^T(u_t-\mu_k) - \frac1T\sum_{t=p+1}^T(A_t-\Gamma_k)b_k + \frac1T\sum_{t=p+1}^T z_t\bigg] + o_p(T^{-1/2}). \qquad (48)$$
By Lemma 8, the proof can be finished in the same way as that of Theorem 3.
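For the CFAR(p) case, the companion operator $\boldsymbol\Delta$ appearing in Lemma 7 and above can be discretized and examined directly. The sketch below is our own illustration, with hypothetical $\phi_1$, $\phi_2$ and $\rho$: on a grid, $\boldsymbol\Delta$ for a CFAR(2) model becomes the block matrix $[[K_1, K_2], [I, 0]]$; its spectral radius should lie below one, and $\|\boldsymbol\Delta^h\|$ should decay geometrically for large $h$, as in Lemma 7(i).

```python
# Hedged numerical companion to Lemma 7(i) (illustration only).
import numpy as np

N, rho = 50, 2.0
grid = np.linspace(0.0, 1.0, N + 1)
D = grid[:, None] - grid[None, :]
K1 = 0.4 * np.exp(-D**2) / N                       # hypothetical phi_1
K2 = 0.2 * np.cos(D) / N                           # hypothetical phi_2
I = np.eye(N + 1)
Z = np.zeros((N + 1, N + 1))
Delta = np.block([[K1, K2], [I, Z]])               # companion (stacked) form
print("spectral radius:", np.abs(np.linalg.eigvals(Delta)).max())
P = np.eye(2 * (N + 1))
for h in range(1, 9):
    P = P @ Delta
    print(h, np.linalg.norm(P, 2))                 # eventually ~ lambda^h
```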
Lemma 9. Let $q\ge2$ be an integer. Consider two $(qk)\times(qk)$ symmetric positive semi-definite matrices
$$\Omega_1 = \begin{pmatrix}\Omega & 0\\ 0 & 0\end{pmatrix}, \qquad \Omega_2 = \begin{pmatrix}\Omega_{11} & \Omega_{12}\\ \Omega_{21} & \Omega_{22}\end{pmatrix},$$
where $\Omega$ and $\Omega_{11}$ are $(q-1)k$-dimensional square matrices. Assume there exist positive constants $c_1\le c_2$ such that $\|\Omega_{11}\|\le c_2$, $\Omega\succeq c_1I_{(q-1)k}$, and $\Omega_{22}\succeq c_1I_k$. Then
$$\Omega_1+\Omega_2 \succeq \frac{c_1^2}{c_1+2c_2}\,I_{qk}.$$
Proof: Let $c>1$, and define
$$\Omega_3 = \begin{pmatrix}c^{1/2}I_{(q-1)k} & 0\\ 0 & c^{-1/2}I_k\end{pmatrix}\begin{pmatrix}\Omega_{11} & \Omega_{12}\\ \Omega_{21} & \Omega_{22}\end{pmatrix}\begin{pmatrix}c^{1/2}I_{(q-1)k} & 0\\ 0 & c^{-1/2}I_k\end{pmatrix} = \begin{pmatrix}c\,\Omega_{11} & \Omega_{12}\\ \Omega_{21} & c^{-1}\Omega_{22}\end{pmatrix}.$$
We see that $\Omega_3$ is positive semi-definite, and
$$\Omega_1+\Omega_2-\Omega_3 = \begin{pmatrix}\Omega-(c-1)\Omega_{11} & 0\\ 0 & (1-c^{-1})\Omega_{22}\end{pmatrix}.$$
Since $\Omega\succeq c_1I_{(q-1)k}$ and $\|\Omega_{11}\|\le c_2$, it holds that
$$\Omega-(c-1)\Omega_{11} \succeq \big(c_1-(c-1)c_2\big)\,I_{(q-1)k}.$$
By taking $c=(c_1+2c_2)/(2c_2)$, both diagonal blocks are bounded below by $\frac{c_1^2}{c_1+2c_2}$ times the identity, so
$$\Omega_1+\Omega_2 \succeq \Omega_3 + \frac{c_1^2}{c_1+2c_2}\,I_{qk} \succeq \frac{c_1^2}{c_1+2c_2}\,I_{qk},$$
and the proof is complete.

Lemma 10. Assume that $\phi_1,\ldots,\phi_p\in C^2[-1,1]$. The minimum eigenvalue of $\Gamma_k$, defined in Theorem 5, is of the order $k^{-1}$.

Proof: For any function $\Psi_i$, denote by $(\Psi_i\varepsilon_t)(u)$ the random variable $\int_0^1\Psi_i(u,v)\,\varepsilon_t(v)\,dv$. Define $\gamma_{11}(u,v):=\sigma^2e^{-\rho|u-v|}$. For each $2\le i\le p$, let $\gamma_{1i}(u,v):=\mathrm{Cov}[\varepsilon_t(u), (\Psi_{i-1}\varepsilon_t)(v)]$ and $\gamma_{i1}(u,v):=\gamma_{1i}(v,u)$. For every pair $2\le i,j\le p$, define $\gamma_{ij}(u,v)=\mathrm{Cov}[(\Psi_{i-1}\varepsilon_t)(u), (\Psi_{j-1}\varepsilon_t)(v)]$. For each pair $1\le i,j\le p$, define a $k\times k$ matrix $\Xi_{ij}$ with $(h,l)$-th entry
$$(\Xi_{ij})_{hl} = \int_0^1\int_0^1\int_0^1 \big[B_{k,h}'(s-u)\,B_{k,l}'(s-v) + \rho^2\,B_{k,h}(s-u)\,B_{k,l}(s-v)\big]\,\gamma_{ij}(u,v)\,du\,dv\,ds.$$
For each $1\le i\le p$, define the $p\times p$ block matrix
$$\Gamma_{p,i} = \begin{pmatrix}\Xi_{i,i} & \Xi_{i,i-1} & \cdots & \Xi_{i,1} & 0\\ \Xi_{i-1,i} & \Xi_{i-1,i-1} & \cdots & \Xi_{i-1,1} & 0\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ \Xi_{1,i} & \Xi_{1,i-1} & \cdots & \Xi_{1,1} & 0\\ 0 & 0 & \cdots & 0 & 0\end{pmatrix}.$$
Using the representation (7), we know that $\Gamma_k\succeq\sum_{i=1}^p\Gamma_{p,i}$. By Lemma 5, there exists a constant $c_1>0$ such that $\Xi_{11}\succeq (c_1/k)\,I_k$. The condition $\phi_1,\ldots,\phi_p\in C^2[-1,1]$ implies that all the functions $\gamma_{ij}(u,v)$ satisfy assumption (38), and then, by Lemma 6, the operator norms of all the matrices $\Xi_{ij}$ are of the order $O(k^{-1})$ uniformly. We can then apply Lemma 9 inductively to complete the proof.

Proof of Theorem 6: By Lemma 9 and Lemma 10, the proof of Theorem 6 can be completed following the proof of Theorem 4.
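Lemma 9 admits a quick numerical sanity check on randomly generated matrices. The construction below is our own illustration: it enforces the lemma's hypotheses by adding $c_1$-multiples of the identity where needed, and compares the smallest eigenvalue of $\Omega_1+\Omega_2$ with the guaranteed bound $c_1^2/(c_1+2c_2)$.

```python
# Sanity check of Lemma 9 on random matrices (illustration only).
import numpy as np

rng = np.random.default_rng(2)
q, k, c1 = 3, 4, 0.5
n, n1 = q * k, (q - 1) * k
A = rng.normal(size=(n1, n1))
Omega1 = np.zeros((n, n))
Omega1[:n1, :n1] = A @ A.T + c1 * np.eye(n1)       # Omega >= c1 * I
B = rng.normal(size=(n, n))
Omega2 = B @ B.T                                   # PSD
Omega2[n1:, n1:] += c1 * np.eye(k)                 # Omega_22 >= c1 * I
c2 = np.linalg.norm(Omega2[:n1, :n1], 2)           # ||Omega_11||
lam = np.linalg.eigvalsh(Omega1 + Omega2).min()
print("lambda_min:", lam, ">= bound:", c1**2 / (c1 + 2.0 * c2))
```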
References

Antoniadis, A., Paparoditis, E., and Sapatinas, T. (2009), "Bandwidth selection for functional time series prediction", Statistics and Probability Letters, 79, 733-740.

Aue, A., Norinho, D.D., and Hörmann, S. (2012), "On the prediction of functional time series", Technical Report.

Berkes, I., Horváth, L., and Rice, G. (2013), "Weak invariance principles for sums of dependent random functions", Stochastic Processes and their Applications, 123, 385-403.

Besse, P., Cardot, H., and Stephenson, D. (2000), "Autoregressive forecasting of some functional climatic variations", Scandinavian Journal of Statistics, 27, 673-687.

Bosq, D. (2000), Linear Processes in Function Spaces: Theory and Applications, Lecture Notes in Statistics, New York: Springer-Verlag.

Brockwell, P. and Davis, R. (1990), Time Series: Theory and Methods, New York: Springer.

Brumback, B. and Rice, J.A. (1998), "Smoothing spline models for the analysis of nested and crossed samples of curves", Journal of the American Statistical Association, 93, 961-994.

Cai, Z., Fan, J., and Yao, Q. (2000), "Functional-coefficient regression models for non-linear time series", Journal of the American Statistical Association, 95, 941-956.

Chan, K.S. (1993), "Consistency and limiting distribution of the least squares estimator of a threshold autoregressive model", The Annals of Statistics, 21, 520-533.

Chen, R. and Tsay, R.S. (1993a), "Functional coefficient autoregressive models", Journal of the American Statistical Association, 88, 298-308.

Chen, R. and Tsay, R.S. (1993b), "Nonlinear additive ARX models", Journal of the American Statistical Association, 88, 955-967.

Chen, R., Liu, J.S., and Tsay, R.S. (1995), "Additivity tests for nonlinear autoregressive models", Biometrika, 82, 369-383.

Chen, X. (2008), "Large sample sieve estimation of semi-nonparametric models", Handbook of Econometrics, 6, 5549-5632.

Chen, X. and Shen, X. (1998), "Sieve extremum estimates for weakly dependent data", Econometrica, 66, 289-314.

Cressie, N. and Huang, H.-C. (1999), "Classes of nonseparable, spatio-temporal stationary covariance functions", Journal of the American Statistical Association, 94, 1330-1339.

Diebold, F.X. and Li, C. (2006), "Forecasting the term structure of government bond yields", Journal of Econometrics, 130, 337-364.

Fan, J. and Gijbels, I. (1996), Local Polynomial Modelling and Its Applications, London: Chapman and Hall.

Fan, J. and Yao, Q. (2003), Nonlinear Time Series: Nonparametric and Parametric Methods, New York: Springer-Verlag.

Ferraty, F. and Vieu, P. (2006), Nonparametric Functional Data Analysis, New York: Springer.

Gasser, T., Müller, H.G., Köhler, W., Molinari, L., and Prader, A. (1984), "Nonparametric regression analysis of growth curves", The Annals of Statistics, 12, 210-229.

Gneiting, T. (2002), "Nonseparable, stationary covariance functions for space-time data", Journal of the American Statistical Association, 97, 590-600.

Grenander, U. and Szegő, G. (1984), Toeplitz Forms and Their Applications, University of California Press.

Halberstam, H. and Richert, H.E. (2013), Sieve Methods, Courier Dover Publications.

Hall, P. and Heyde, C.C. (1980), Martingale Limit Theory and Its Application, New York: Academic Press.

Handcock, M.S. and Wallis, J.R. (1994), "An approach to statistical spatial-temporal modeling of meteorological fields", Journal of the American Statistical Association, 89, 368-378.

Härdle, W., Chen, R., and Lütkepohl, H. (1997), "A review of nonparametric time series analysis", International Statistical Review, 65, 49-72.

Hörmann, S., Horváth, L., and Reeder, R. (2013), "A functional version of the ARCH model", Econometric Theory, 29, 267-288.

Hörmann, S. and Kokoszka, P. (2010), "Weakly dependent functional data", The Annals of Statistics, 38(3), 1845-1884.

Horváth, L., Hušková, M., and Kokoszka, P. (2010), "Testing the stability of the functional autoregressive process", Journal of Multivariate Analysis, 101, 352-367.

Horváth, L. and Kokoszka, P. (2012), Inference for Functional Data with Applications, New York: Springer.

Horváth, L., Kokoszka, P., and Reeder, R. (2012), "Estimation of the mean of functional time series and a two-sample problem", Journal of the Royal Statistical Society: Series B, 75, 103-122.

Houghton, A.N., Flannery, J., and Viola, M.V. (1980), "Malignant melanoma in Connecticut and Denmark", International Journal of Cancer, 25, 95-104.

Huang, J.Z. (2003), "Local asymptotics for polynomial spline regression", The Annals of Statistics, 31(5), 1600-1635.

Hyndman, R.J. and Ullah, M.S. (2007), "Robust forecasting of mortality and fertility rates: a functional data approach", Computational Statistics & Data Analysis, 51, 4942-4956.

James, G.M., Hastie, T.J., and Sugar, C.A. (2000), "Principal component models for sparse functional data", Biometrika, 87(3), 587-602.
James, G.M. and Sugar, C.A. (2003), "Clustering for sparsely sampled functional data", Journal of the American Statistical Association, 98(462), 397-408.

Keselman, H.J. and Keselman, J.C. (1993), "Analysis of repeated measurements", in L.K. Edwards (ed.), Applied Analysis of Variance in Behavioral Science, New York: Marcel Dekker, 105-145.

Lütkepohl, H. (2005), New Introduction to Multiple Time Series Analysis, Berlin: Springer.

Nadaraya, E.A. (1964), "On estimating regression", Theory of Probability & Its Applications, 9, 141-142.

Neuman, E. (1981), "Moments and Fourier transforms of B-splines", Journal of Computational and Applied Mathematics, 7(1), 51-62.

Ramsay, J.O. and Silverman, B.W. (2005), Functional Data Analysis, New York: Springer.

Rao, B.P. (1999), Semimartingales and Statistical Inference, London: Chapman and Hall.

Ratcliffe, S.J., Leader, L.R., and Heller, G.Z. (2002a), "Functional data analysis with application to periodically stimulated foetal heart rate data. I: Functional regression", Statistics in Medicine, 21, 1103-1114.

Ratcliffe, S.J., Leader, L.R., and Heller, G.Z. (2002b), "Functional data analysis with application to periodically stimulated foetal heart rate data. II: Functional logistic regression", Statistics in Medicine, 21, 1115-1127.

Roberts, J.M. (1995), "New Keynesian economics and the Phillips curve", Journal of Money, Credit and Banking, 27, 975-984.

Ruppert, D., Sheather, S., and Wand, M.P. (1995), "An effective bandwidth selector for local least squares regression", Journal of the American Statistical Association, 90(432), 1257-1270.

Schumaker, L. (1981), Spline Functions: Basic Theory, New York: John Wiley & Sons.

Shoji, I. (1998), "Approximation of continuous time stochastic processes by a local linearization method", Mathematics of Computation, 67, 287-298.

Silverman, B.W. (1984), "Spline smoothing: the equivalent variable kernel method", The Annals of Statistics, 12, 898-916.

Tiao, G.C. and Tsay, R.S. (1983), "Multiple time series modeling and extended sample cross-correlations", Journal of Business & Economic Statistics, 1(1), 43-56.

Tiao, G.C. and Tsay, R.S. (1989), "Model specification in multivariate time series", Journal of the Royal Statistical Society: Series B, 51, 157-213.

Tong, H. (1983), Threshold Models in Non-linear Time Series Analysis, Lecture Notes in Statistics, New York: Springer-Verlag.

Tsay, R.S. (2010), Analysis of Financial Time Series, New Jersey: John Wiley & Sons.

Watson, G.S. (1964), "Smooth regression analysis", Sankhyā: The Indian Journal of Statistics, Series A, 26, 359-372.

Wu, W.B. (2005), "Nonlinear system theory: another look at dependence", Proceedings of the National Academy of Sciences of the United States of America, 102(40), 14150-14154.

Xia, Y. and Li, W.K. (1999), "On single-index coefficient regression models", Journal of the American Statistical Association, 94, 1275-1285.

Zhou, S., Shen, X., and Wolfe, D.A. (1998), "Local asymptotics for regression splines and confidence regions", The Annals of Statistics, 26, 1760-1782.

Zygmund, A. (2002), Trigonometric Series, Cambridge: Cambridge University Press.