Convolutional Autoregressive Models for Functional Time Series

advertisement
Convolutional Autoregressive Models for Functional
Time Series
Xialu Liu, Han Xiao, and Rong Chen
Rutgers University
Abstract
Functional data analysis has became an increasingly popular class of problems in statistical
research. However, functional data observed over time with serial dependence remains a less
studied area. Motivated by Bosq (2000), who first introduced the functional autoregressive
(FAR) models, we propose a convolutional functional autoregressive (CFAR) model, where
the function at time t is a result of the sum of convolutions of the past functions with a
set of convolution functions, plus a noise process, mimicking the autoregressive process. It
provides an intuitive and direct interpretation of the dynamics of a stochastic process. Instead
of spectral decomposition approach commonly used in functional data analysis, we adopt a
sieve estimation procedure based on B-spline approximation of the convolution functions. We
establish convergence rate of the proposed estimator, and investigate its theoretical properties.
The model building, model validation, and prediction procedures are also developed. Both
simulated and real data examples are presented.
KEYWORDS: Functional time series; Nonparametric methods; Spline methods; Sieve estimation.
1
Introduction
Functional data analysis has received much attention in the last few decades, and has been widely
applied in many fields, including medical science (Houghton et al., 1980; Gasser et al., 1984;
Ratcliffe et al., 2002a; Ratcliffe et al., 2002b), behavioral science (Keselman and Keselman, 1993),
Xialu Liu is Ph.D. student, Department of Statistics, Rutgers University, Piscataway, NJ 08854. Email: xialu.liu@rutgers.edu. Han Xiao is Assistant Professor, Department of Statistics, Rutgers University, Piscataway,
NJ 08854. E-mail: hxiao@stat.rutgers.edu. Rong Chen is Professor, Department of Statistics, Rutgers University,
Piscataway, NJ 08854. E-mail: rongchen@stat.rutgers.edu. Rong Chen is the corresponding author.
1
and economics (Roberts, 1995; Diebold and Li, 2006). Nonparametric methods, such as spline
methods (Silverman, 1994; Brumback and Rice, 1998; Zhou, et al., 1998; Cai et al., 2000) and
kernel smoothing (Nadaraya, 1964; Watson, 1964; Gasser et al., 1984; Fan and Gijbels, 1996), were
often implemented to analyze functional data. Unsupervised learning methods, such as principal
component analysis (James et al., 2000) and clustering analysis (James and Sugar, 2003) were
extended for functional data. Books by Ramsay and Silverman (2005), Ferraty and Vieu (2006),
Horváth and Kokoszka (2012), provide a comprehensive introduction on functional data analysis.
Often, a variety of functional data is observed over time and has serial dependence. For
example, in financial industry, the implied volatility of an option as a function of moneyness
changes over time. In insurance industry, age-specific mortality rate as a function of age changes
over time. In banking industry, term structure of interest rates (yield as a function of time to
maturity of a bond) changes over time. In meteorology, daily records of temperature, precipitation
and cloud cover for a region, viewed as three related functional surfaces, change over time.
Time series analysis, designed to explore the underlying dynamics of the data, is well studied
and understood, with modern development in nonlinear (Tong and Lam, 1980; Chan, 1993),
nonparametric (Chen and Tsay, 1993a; Chen and Tsay, 1993b; Härdle et al., 1997; Xia and Li,
1999; Cai et al., 2000; Fan and Yao, 2003), multivariate (Tiao and Tsay, 1983; Tiao and Tsay,
1989; Lüetkepohl, 2005) and spatial-temporal modelling (Handcock and Wallis, 1994; Cressie
and Huang 1999; Gneiting, 2002). However, these scalar or vector time series models cannot be
applied directly to functional data. Bosq (2000) first introduced functional autoregressive (FAR)
models with order p,
Xt
where X
∆1Xt1
. . . ∆p Xtp
εt ,
pXt, t P Zq and ε pεt, t P Zq are a sequence of random functions and a functional
white noise process respectively, and ∆i , is a linear operator in Hilbert functional space H. The
linear operators can be estimated by performing functional principal component analysis on the
sample autocovariance operators. The consistency of such estimators has been proved (Bosq,
2000; Hörmann et al. 2013). All the theoretical and empirical results in the literature have been
developed based on the models and methods in Bosq (2000), including Hörmann and Kokoszka
(2010), Horváth et al. (2010), Aue et al. (2012), Horváth et al. (2012), Berkes et al. (2013), and
Hörmann et al. (2013). So far functional time series data is still a less studied area.
Sieve methods are a popular set of tools to estimate parameters in an infinite-dimensional
space, including the functional space. It optimizes an objective function over a sequence of finite2
dimensional parameter subspaces, which are called sieve spaces. Hence, the problem is reduced
to a sequence of parametric ones. The consistency of the sequence of the sieve estimators can be
derived, under certain conditions of sieve spaces. Sieve methods have been discussed in Chen and
Shen (1998), Chen (2008), Halberstam and Richert (2013).
In this article, we develop a new class of functional time series models called the convolutional
functional autoregressive (CFAR) models, along with its associated estimation procedure using
splines and sieve methods. As a special case of Bosq (2000), our model provides an intuitive and
direct interpretation of the dynamics of a stochastic process. It assumes that the function at time
t is a result of the sum of convolutions of the past functions with convolution functions plus a
noise process, mimicking the autoregressive process commonly used in scalar time series. It is also
an extension of vector autoregressive process. The paper makes contributions to the literature in
three aspects. First, we establish the convergence rate of the convolution function estimator in
a general case, while Bosq (2000) only considered consistency. Second, in reality functional data
are often observed at discrete points, hence nonparametric methods such as splines method are
used to obtain a continuous curve. The possible estimation error introduced by these methods
was overlooked by Hörmann and Kokoszka (2010), Horváth et al. (2010), Horváth et al. (2012),
Hörmann et al. (2013). Under the CFAR model settings, we consider the recovery procedure of
functional data, when studying asymptotic properties of the estimators. It can be shown that the
estimation error due to discrete observation points is of smaller order, hence as long as the number
of observations at each time goes to infinity, the estimator is consistent and the convergence rate
does not depend on the number of discrete samples available directly. Thirdly, we develop model
building, model validation and prediction procedures for CFAR models, while the picture of FAR
models is less complete due to lack of specific model assumptions.
The rest of the paper is organized as follows. In Sections 2 and 3, detailed model setting
and statistical inference procedures are introduced. The asymptotic properties are presented in
Section 4. Simulation results are presented in Section 5 and a real example is analyzed in Section
6. All proofs are shown in the Appendix.
3
2
Convolutional Functional Autoregressive Models
We introduce some notations first. For any vector µ, pµqi denotes its i-th entry. For any matrix
H, pHqij denotes its pi, j q-th entry. Let
Liph r1, 1s
Lipr2
tf P r1, 1s : |f px δq f pxq| ¤ M δh, M 8u,
h
r1, 1s tf P C r r1, 1s : f prq P Liphr1, 1s, r P N, h P p0, 1su,
P Lipζ2 r1, 1s, then ζ is called moduli of smoothness of f pq, which measures the smoothness
of f pq.
If f
Without loss of generality, in this paper, we restrict ourselves only to consider functional time
series in the space L2 r0, 1s.
If a function f : Ω Ñ Rr0,1s is defined on a probability space pΩ, F, P q, and f
P L2r0, 1s, then
f is called a random function on r0, 1s. The space containing all the random functions on r0, 1s is
denoted by Hr0, 1s, when L-2 norm is applied.
Definition 1. A sequence of random functions ε
pεt, t P Zq in Hr0, 1s is said to be a white
noise, if
E}εt}22 σ2 8, Epεtq 0, and Covpεtps1q, εtps2qq
s1 , s2 P r0, 1s.
2) Covpεt ps1 q, εt h ps2 qq 0, for h 0 and s1 , s2 P r0, 1s.
1) 0
does not depend on t for any
pXt, t P Zq in Hr0, 1s is (weakly) stationary, if the mean and covariance functions do not vary with time, i.e., for @h P Z, s1 , s2 P r0, 1s,
Definition 2. A sequence of random functions X
EpXt ps1 qq µps1 q,
CovpXn
and
A sequence of random functions X
h
ps1q, Xm hps2qq CovpXnps1q, Xmps2qq.
pXt, t P Zq in Hr0, 1s is called a convolutional functional
autoregressive model with order p, CFAR(p), if
Xt psq
p »1
¸
i 1 0
φi ps uqXti puqdu
εt psq,
s P r0, 1s,
(1)
P L2r1, 1s for i 1, . . . , p, are called convolution functions, and ε pεt, t P Zq are
i.i.d. Ornstein-Uhlenbeck processes defined on r0, 1s, following the stochastic differential equation,
dεt psq ρεt psqds σdWs , ρ ¡ 0 with Ws being a Wiener process.
Remark 1. The skeleton of Xt pq, excluding the noise process, defined as ft pq, is the sum of
where φi
convolutions of φi and Xti . It can also be viewed as the sum of non-normalized smoothed
4
versions of the p past functions tXt1 , . . . , Xtp u. From a pointwise view, ft psq is a weighted sum
of tXt1 , . . . , Xtp u, and the weights φi ps uq depend on the distance between s and u.
Remark 2. We assume the noise process in model (1) is an Ornstein-Uhlenbeck process, which is
spatially stationary, Markovian and continuous but not differentiable with the following properties:
εt ps1 q N p0,
σ2
q,
2ρ
Corrpεt ps1 q, εt ps2 qq eρ|s1 s2 | ,
@s1, s2 P r0, 1s.
The variance is constant, and the correlation of the process at s1 and s2 is determined by the
distance of s1 and s2 . If all the convolution functions tφi pq, i
1, . . . , pu are continuous, Xt is
also continuous, but not differentiable.
The convolution functions tφi pq, i
1, . . . , pu determine the pattern of Xtpq process.
Here
we show three examples to illustrate the impact of the convolution functions.
Example 1. Convolutional functional autoregressive model of order 1, CFAR(1),
Xt psq ρ 5, σ 2
»1
0
φps uqXt1 puqdu
εt psq,
s P r0, 1s,
(2)
10. We consider three convolution functions:
(i) φpsq 1, s P r1, 1s, and X0 pq 0;
(ii) φpsq 1 |s|, s P r1, 1s, and X0 pq 10;
(iii) φpsq I ps ¡ 0q, s P r1, 1s, and X0 pq 10.
Simulated processes of functions for each φpq are shown in the top, middle and bottom panel
1, 2, 3, 100. The solid lines and dashed lines are Xtpq and ftpq
respectively. In case(i), since φpq is a constant function, ft pq is simply the average of Xt1 pq
hence a constant function. In case(ii), φpq is a unimodal function around 0. We start the process
with a constant function, X0 pq 10, and f1 psq is larger when s is around 0.5. In case (iii), φpq
is an indicator function on p0, 1s, so the skeleton of ft psq would be a partial integration of Xt1 pq
on the left of s in r0, ss. At s 0, Xt p0q contains no information of Xt1 , but only noise; as
s increases, the weight functions φps q increases and information carried by Xt psq on Xt1 pq
increases as well; at s 1, Xt p1q is the integration of the function Xt1 pq in the entire range of
r0, 1s plus noise. It is worth noting that in case(i), we start at X0 0, but Xtpq gets explosive as
time goes by. On the other hand, although we start at a large value, X0 psq 10 for both cases of
(ii) and (iii), the processes become close to 0 when t 100. This is because, the process in case(i)
of Figure 1, respectively, for t
5
0.6
0.8
1.0
0.8
1.0
0.8
1.0
2
1
f[99, ]
−3
0.6
0.8
1.0
0.0
Figure 1: Plots of Xt pq and ft pq, t
0.6
0.8
1.0
0.4
0.6
0.8
1.0
0.8
1.0
8 10
f[99, ]
6
8 10
f[4, ]
0.4
0.2
t=100
0
0.2
1.0
8 10
f[99, ]
0.4
−4
0.0
0.8
0
0.2
6
8 10
4
2
f[3, ]
1.0
0.6
−4
0.0
t=3
0
0.8
0.4
6
8 10
0.6
−4
0.6
0.2
t=100
4
f[4, ]
0.4
6
8 10
6
4
2
0
0.4
0.0
0
0.2
t=2
−4
0.2
1.0
−4
0.0
t=1
0.0
0.8
0
0.6
0.6
−4
0.4
0.4
6
8 10
4
f[3, ]
2
0
−4
0.2
0.2
t=3
6
8 10
6
4
2
0
−4
0.0
−5
0.0
t=2
4
0.4
2
0.2
−1 0
1
f[4, ]
−3
−5
0.0
4
1.0
2
0.8
2
0.6
t=1
4
0.4
2
0.2
−1 0
1
−5
−3
f[3, ]
−1 0
1
−1 0
−3
−5
0.0
t=100
2
t=3
2
t=2
2
t=1
0.0
0.2
0.4
0.6
0.8
1.0
0.0
0.2
0.4
0.6
1, 2, 3, 100 for Example 1(i) (top panel), Example 1(ii)
(middle panel) and Example 1(iii) (bottom panel).
is nonstationary, while the processes in case(ii) and case(iii) are stationary, and their stationary
means are a constant function at 0.
Theorem 1 presents a sufficient condition for the stationarity of CFAR(1) models.
Theorem 1. The CFAR(1) process X
κ sup
pXt, t P Zq, defined in (1) is (weakly) stationary, if
» 1
¤¤
0 s 1
1{2
φ ps uqdu
2
0
1.
(3)
Here weak stationarity is acturally equivalent to strong stationarity, since we assume that
the noise process is Gaussian. It is easy to see that }φ}22
1 is also a sufficient condition for
³1
stationarity. However, the condition in Theorem 1 is a weaker one, since for any s, 0 φ2 ps uqdu ¤
³1 2
³1 2
1 φ puqdu. Note that 0 φ ps uqdu is the sum of squares of the convolution weights of Xt1 pq
to obtain ft psq.
Theorem 2 presents a sufficient condition for stationarity of CFAR(p) models.
Theorem 2. The CFAR(p) process X
pXt, t P Zq defined in (1) is (weakly) stationary, if all
the roots of the characteristic function
1 κ1 z κ2 z 2 . . . κp z p
6
0,
(4)
are outside the unit circle, where κi
b³
1
sup0¤s¤1
0
φ2i ps uqdu for i 1 . . . , p.
Condition in Theorem 2 is similar to the sufficient condition for scalar AR(p) models to be
stationary, replacing the AR coefficients by the maximum of the norm of the weights function
φi ps q, s P r0, 1s.
Corollary 1. The CFAR(1) process X
on r0, 1s2 :
Ψ1 ps, uq φps uq,
pXt, t P Zq satisfies (3). Define a sequence of functions
Ψ` ps, uq Then Xt has the following representation
Xt psq εt psq
Corollary 2. The CFAR(p) process X
on r0, 1s2 :
»1
0
8̧ » 1
` 1 0
Ψ`1 ps, v qφpv uq dv, for ` ¥ 2.
(5)
Ψ` ps, uqεt` puq du.
(6)
pXt, t P Zq satisfies (4). Define a sequence of functions
Ψ1 ps, uq φ1 ps uq,
Ψ` ps, uq Ψ` ps, uq »1
¸
` 1
¸»
i 1 0
p
1
i 1 0
φi ps v qΨ`i pv, uq dv
Φi ps, v qΨ`i pv, uq dv,
Then Xt has the following representation
Xt psq εt psq
8̧ » 1
` 1 0
φ` ps, uq,
2 ¤ ` ¤ p,
` ¡ p.
Ψ` ps, uqεt` puq du.
(7)
Corollary 1 and Corollary 2 are derived from Theorem 3.1 and Theorem 5.1 of Bosq(2000)
directly. This is similar to that of stationary scalar AR(p) case when no constant is in the model.
To include nonzero mean function µpq, we use Xt psq µpsq 3
°p ³1
i1 0 φi ps uqpXti puq µpuqqdu.
Estimation, Prediction and Order Determination
Assume that we observe Xt pq at discrete points, s n{N , n 0, . . . , N , for time t 1, .., T .
3.1
Estimation
Since the convolution operator guarantees continuous path of Xt pq, we approximate it with linear
interpolation of the observations. For any s P r1, 1s, if sn1
¤ s sn, let
rt psq psn sqXt psn1 q ps sn1 qXt psn q .
X
1{N
7
(8)
Because Xt pq is continuous but not differentiable and observations are not subject to noises,
³
linear interpolation suffices to evaluate φi ps uqXti puqdu.
In a typical nonparametric fashion, we approximate the unknown convolution functions φi pq
with a B-spline approximation. Specifically,
ķ
φi pq φrk,i pq βrk,i,j Bk,j pq, for i 1, . . . , p,
(9)
j 1
where tBk,j pq, j
1, . . . , ku are uniform cubic B-spline functions with degree of freedom k.
Plugging in the linear interpolation of Xti pq and B-spline approximation of φi pq into (1),
we get
εt psn q Xt psn q p
¸
ķ
βrk,i,j
»1
0
i 1j 1
Let r
ε1t
pεrt,0, . . . , εrt,N q1 be an pN
rti puqdu.
Bk,j psn uqX
(10)
1q 1 vector. εrt,n is the approximated noise at sn and time
Xtpsq Mtβr k , where
rti puqdu, and
Mt pMt,1 , . . . , Mt,p q, Mt,i is an pN 1q k matrix, pMt,i qnj 0 Bk,j psn uqX
r k is a pk 1 vector, βr k pβr 1k,i , . . . βr 1k,p q1 , and pβk,i qj βrk,i,j . Since Bk,j pq are fixed and known
β
t defined as on the right hand side of (10). It can be expressed as ε̃t
³1
functions, Mt is known, given the observations. Under the Ornstein-Uhenleck process, for equally
spaced sn , εt pq follows an AR(1) process with AR coefficient eρ{N , and covariance matrix Σ,
where pΣqij
eρ|ij|{N σ2{2ρ.
Therefore, β
tβij , i 1, . . . , p, j 1, . . . , ku, σ2, and ρ can be
estimated by maximizing the approximated log-likelihood function,
Qk,T,N pβ, σ 2 , ρq pk
Let β
βpk,i,j .
pN
1qpT
2
pq ln πσ2 N pT pq lnp1 e2ρ{N q 1
ρ
Ţ
2 tp
2
ε̃1t Σ1 ε̃t ,(11)
1
pβp 1k,1, . . . , βp 1k,pq1 be the estimator of βr by maximizing Qk,T,N pβ, σ2, ρq, where pβp k,iqj After reparameterization, the objective function can be written as
Qk,T,N pβ, ϕ, ω q pN
1qpT
2
pq lnp2πωq N pT pq lnp1 ϕ2q epβ, ϕq ,
2
eρ{N , ω σ2{2ρ, Σ0 2ρΣ{σ2, and epβ, ϕq °Ttp
correlation matrix of tεt psn q, n 1, . . . , N u.
where ϕ
2ω
(12)
p1 1pεt . Σ0 is actually the
1 εt Σ0
Given β and ϕ, the maximizer of w is
p
ω
pN ep1β,qpϕT q pq .
8
(13)
Hence
pN
β,ϕ
max pN
max Qk,T,N pβ, ϕ, ω q max
β,ϕ,ω
1qpT
2
1qpT
2
ϕ
pq ln epβ, ϕq N pT pq lnp1 ϕ2q
pq ln epβpϕq, ϕq c
2
N pT p q
lnp1 ϕ2 q
2
c , (14)
where c is a constant and
p pϕq arg min epβ, ϕq β
Ţ
M1t Σ1 Mt
β
1 t p 1
Ţ
M1t Σ1 Xt psq .
(15)
t p 1
Then this problem becomes one parameter optimization, and the estimator of ϕ can be easily
obtained by maximizing (14). Together with (13) and (15), we have the estimators for σ 2 , ρ, β.
The convolution function φi pq can be estimated by φpk,i pq °k p
j 1 βk,i,j Bk,j pq.
Remark 3. The above method can also deal with non-equally spaced tsn , n 0, . . . , N u, different
tsn, n 0, . . . , N u vary with time), or different number of
observations at each time. For simplicity, we assume equally spaced ts n{N u in this paper.
observation points at each time (i.e.
3.2
Fitted values
Given estimated φi pq, the continuous process Xt pq can be interpolated more precisely than simple
linear interpolation. By taking advantage of information from the observed tXt psn q, n 0, . . . , N u
and the Markovian property of Ornstein-Uhlenbeck process, we obtain a better approximation of
Xt pq. Specifically, for a fixed s P psn , sn
frt psq 1
p »1
¸
i 1 0
q, let
rti puqds,
φpk,i ps uqX
and let
rt psq
X
frt psq
eρppssn q eρppsn
1
p 1 s q
Σ
(16)
Xt psn q frt psn q
Xt psn
p 0 is the estimated correlation matrix of εt psn q and εt psn
where Σ
1
q frtpsn 1q
,
(17)
q, with diagonal entry 1 and
rt psq Xt psq. Here frt pq is the estimated skeleton of
off-diagonal entry eρp{N . When s sn , X
Xt pq process, and Xt psn q frt psn q is the approximated residual process.
rt 1 psq iteratively. However, empirical
Remark 4. Plugging (17) into (16), we can fit frt 1 pq, and X
results show that its impact is minor with large N .
9
1
3.3
Prediction
Given X1 , . . . , Xt , the least squares prediction of Xt
pt
X
rt
where X
p »1
¸
1 psq i 1 0
1
psq is
rt
φpk,i ps uqX
puqdu,
(18)
1 i
pq is the fitted processes of Xt 1i pq in (17). The residual is defined as
1 i
εpt
In addition, if Xt
1
1
psq Xt 1psq Xpt 1psq.
(19)
pq is partially observed at s 0, N1 , . . . , s, the prediction of Xt 1psq, for
s P ps , 1s can be obtained with,
pt
X
pt
where X
3.4
1
1
psq Xpt 1psq
eρppss
q
pXt 1psq Xpt 1psqq,
(20)
psq is from (18).
Determination of the B-spline approximation order
As in all nonparametric estimation, bandwidth is the key control parameter that balances the
estimation bias and variance (Ruppert et al., 1995; Fan and Gijbels, 1996). For spline methods,
the control parameter is the degree of freedom k (Zhou et al., 1998; Huang, 2003). We propose to
choose k to minimize the out-sample rolling forecasting error instead of the typically used cross
validation criterion due to the time series nature of our problem. Specifically, for a given k, and
each t T0 , . . . , T , we use data tXh psn q, h 0, ..., t 1, n 0, . . . , N u, observed before time t to
estimate the convolution functions using B-spline with degree of freedom k, and predict Xk,t psn q
as in (18). Define overall squared rolling forecasting error
S pk q Ţ
Ņ
pXtpsnq Xpk,tpsnqq2.
(21)
t T0 n 0
The optimal k is chosen to be the one that minimizes S pk q.
3.5
AR order determination
F-test can be constructed for hypothesis testing of the significance of the convolution functions
as well as for AR order determination. For testing H0 : φr
1
pq . . . φppq 0 vs H1 : not H0,
we reject H0 if
F
pSSE prq SSE pf qq{rkpp rqs ¡ F
SSE
α,kpprq,pN
pf q {rpN 1qpT pq pks
10
qp qpk ,
1 T p
(22)
where SSE pf q and SSE prq are sum of squared estimated residuals of the approximated model
in (10) for full CFAR(p) and reduced CFAR(r) model respectively, for an optimally chosen k.
Specifically,
SSE pf q
Ţ
1
p
pε1t,1 Σ
εt,1 , SSE prq 0,1 p
t p 1
Ţ
1
p
pε1t,2 Σ
εt,2 ,
0,2 p
t p 1
p 0,1 and Σ
p 0,2 are the estimates of Σ0 , the correlation matrix of εt psq, whose pi, j q-th entry
where Σ
is eρ|ij |{N , from full and reduced models, respectively; tp
εt,i psn q, t p
i
1, 2 are the residuals of the full and reduced models, respectively.
1, . . . , T, n 0, . . . N u,
This test can be used for
model specification. We begin with CFAR(1) model, and sequentially add more lags of Xt , until
the newly introduced lag is not significant.
In addition, define the cross-sectional residuals of εt ,
et psn q εt psn q eρp{N εt psn1 q.
(23)
Under our model, tet psn q, n
1, . . . , N u is a white noise process, for t p 1, . . . , T . Let
ept psn q εpt psn q eρp{N εpt psn1 q. Hence, the features of tept psn q, n 1, . . . , N u, can be studied
with standard residual time series analysis for model validation.
4
Theoretical Properties
We study the asymptotic properties of our estimator as both N and T go to infinity. The degree
of freedom of B-spline approximation k also goes to infinity with N and T .
4.1
Sieve Framework
In this section we reformulate our method under the sieve estimation framework.
The sieve method is designed for estimation of a parameter in an infinite-dimensional space
Θ. Often optimization the objective function over the parameter space cannot be directly solved.
Instead, we optimize the function over a sequence of subspaces tΘk , k
P Z u, which are called
sieve spaces. If the sieve spaces satisfy certain conditions, the sequence of estimators, called sieve
estimators, is expected to be consistent, as the complexity of sieve spaces goes to infinity with
sample size, see Chen (2008), Halberstam and Richert (2013).
The parameter space Θ for CFAR(p) models contains all the functions from L2 r1, 1s that
satisfy stationary condition specified in Theorem 2. The sieve space, Θk , used to approximate
11
Spk X Θ, where Spk is product of p copies of Sk ,
the parameter space Θ, is defined as Θk
Sk
t
ķ
βj Bk,j pq, β
pβ1, . . . , βk q1 P Rk u,
j 1
and tBk,j pq, j
1, . . . , ku are the uniform cubic B-spline functions defined on r1, 1s with degree
of freedom k. In other words, Sk is the cubic spline space with k 4 interior knots at t1, 1
1{m, . . . , 1 1{m, 1u, where m pk 3q{2.
The population objective function to maximize we use here is,
1
Qpφ, σ , ρq E 2
2σ
»1
2
where εt psq Xt psq
ε2t
2
ρ
0
psqds
ρε2t
p0 q
ρε2t
p1 q ρ
,
(24)
°p ³ 1
i1 0 φi ps uqXti puqdu. In fact, (24) is the expectation of log-likelihood
of the Ornstein-Uhlenbeck process εt pq; see Rao (1999). Define the population objective function
Qk pβ, σ 2 , ρq in sieve space Θk ,
1
Qk pβ, σ , ρq E 2
2σ
2
where εt psq
Xtpsq °pi1 °kj1 βi,j
³1
0
»1
2
ε2t
ρ
0
psq ds
ρε2t
p0q
ρε2t
p1q ρ
Bk,j ps uqXti puq du, and φi pq
,
°kj1 βi,j Bk,j pq, i 1, . . . , p.
When Xt pq is only observed at discrete points, with (8) we have the sample objective function
Qk,T,N pβ, σ 2 , ρq
Qk,T,N pβ, σ
where εrt psq
2
pN
, ρq 1qpT
2
pq ln πσ2 N pT pq lnp1 ϕ2q 1
Xtpsq °pi1 °kj1 βi,j
ρ
³1
0
Ţ
2 tp
2
rti puqdu, and φi pq
Bk,j ps uqX
εr1t Σ1 εrt ,
1
°kj1 βi,j Bk,j pq, i 1, . . . , p. Then the sieve estimator in space Θk is
pβp k , σpk2, ρpk q arg max Qk,T,N pβ, σ2, ρq.
4.2
Asymptotic Properties for CFAR(1) Models
Let Qk be the linear operator defined in Section 6.4 of Schumaker (1981), which maps C r1, 1s
into the space Θk . Let φrk
Qk φ βrk,1Bk,1
...
βrk,k Bk,k , and rk psq φpsq φrk psq.
p k and βr k are the B-spline
Theorem 3. Xt is a stationary CFAR(1) process defined in (1). β
coefficients of φpk and φrk . Assume φ
N, T
Ñ 8,
?
P Lipζ2 r1, 1s, and ζ ¡ 1.
p k βr k bk q Ñ N p0, Σk q,
T pβ
d
12
Then with given σ 2 and ρ, as
where Σk
Γk 1Υk Γk 1, Υk is the long-run variance of the process ut
zt
1
Γ
k µk , and the entries in ut , zt , At , µk and Γk are listed as follows
put q i pz t q i pAtqij pµk qi pΓk qij where γ pu, v q
1
σ2
»1»1
»1
0
0
0
At bk , where bk
ApBk,i , rk qpu, v qXt1 puqXt1 pv qpu, v q dudv,
2ρ
Bk,i puqεt p0q
σ2
» »
1
σ
»1
0
pρBk,ipv uq
1
Bk,i pv uqq dWt pv q Xt1 puq du,
1 1 1
ApBk,i , Bk,j qpu, v qXt1 puqXt1 pv q dudv,
σ2 0 0
» »
1 1 1
ApBk,i , rk qpu, v qγ pu, v q dudv,
σ2 0 0
» »
1 1 1
ApBk,i , Bk,j qpu, v qγ pu, v q dudv,
σ2 0 0
CovpXtpuq, Xtpvqq, and A is a binary functional operator, A : H1 b H1 Ñ H2,
H1 tf : r1, 1s Ñ Ru,H2 tf : r1, 1s2 Ñ Ru.
Apf, g qpu, v q
ρf puqgpvq
ρf p1 uqg p1 v q
»1
0
f 1 ps uqg 1 ps v q ds
»1
2
ρ
0
f ps uqg ps v q ds.
(25)
Theorem 3 provides the asymptotic distribution of the estimated B-spline coefficients when
the dimension of the sieve space is fixed. It is interesting to note that the distribution does not
depend on N . As long as N goes to infinity and T
opN 2q, the estimated B-spline coefficients
converges, and the rate depends on T , but is independent of N . It is due to that errors introduced
by the linear interpolation has smaller order than others.
With Theorem 3, we can easily obtain the asymptotic pointwise estimation error of the convolution function when k is fixed.
P Lipζ r1, 1s with ζ ¡ 1,
Bk psq pBk,1 psq, . . . , Bk,k psqq1 , and
Corollary 3. Assume φ
bk psq Bk psq1 bk rk psq,
Then
?
}bk pq}8 Opkζ
where ζ0
0
o pN 2 q.
For each fixed k, define
σk2 psq Bk psq1 Σk Bk psq.
T φpk psq φpsq bk psq
Theorem 4. Assume φ P C ζ r1, 1s, where ζ
and T
Ñd N p0, σk2psqq.
¥ 2 is an integer. Then as k Ñ 8,
{ q,
3 2
mintζ, 4u. σpk2 Ñp σ2, and ρpk Ñp ρ.
13
}σk2pq}8 Opkq,
Theorem 4 shows the consistency of our proposed estimators as well as the convergence rates
of the asymptotic bias and variance. The asymptotic bias depends on the smoothness of the convolution function and complexity of the sieve space used for approximation, while the asymptotic
variance only depends on the complexity of the sieve space. It reflects the fact that the complexity
of sieve space controls the trade-off between bias and variance. When k is a small number, the bias
dominates the error, since there are not enough knots to approximate the convolution function.
When k is large enough, the variance dominates, and estimation of extra B-spline coefficients
introduces too much error. Theorem 4 also reveals the selection principle for the number of knots.
When the convolution function is smooth, a small k is favorable; when the sample size is large,
we prefer to have more knots.
Remark 5. By balancing the bias and variance, the optimal convergence rate of the estimation
2ζ0 3
error is attained at k OpT 1{p2ζ0 2q q, then both the squared bias and variance will be OpT 4ζ0 4 q.
If the moduli of smoothness of φ is greater than 4 and k
OpT 1{4q, the optimal convergence rate
is OpT 5{12 q. If B-splines with higher order are adopted, the estimator is expected to converge
faster.
4.3
Asymptotic Properties for CFAR(p)
The asymptotic properties of the estimators of CFAR(p) models are very similar to these of
CFAR(1) models. Let φrk,i
Qk φi βrk,i,1Bk,1
...
βrk,i,k Bk,k , and rk,i psq φpsq φrk,i psq.
Theorem 5 provides the bound of the B-spline coefficients estimation error, and is a higher
dimension version of Theorem 3.
p k and βr k be the B-spline
Theorem 5. Xt is a stationary CFAR(p) process defined in (1). β
coefficients of tφpk,1 , . . . , φpk,p u and tφrk,1 , . . . , φrk,p u respectively. Assume φi
ζi
¡ 1, for i 1, . . . , p. Then with given σ2 and ρ, as N, T Ñ 8,
? p r
d
T pβ k β k bk q Ñ N p0, Σk q .
P
Lipζ2i r1, 1s, and
Γk 1Υk Γk 1, Υk is the long-run variance of the process ut zt Atbk , where bk 1
1
1 1
1
1 1
1
1 1
Γ
k µk . Here ut put,1 , . . . , ut,p q , zt pzt,1 , . . . , zt,p q , µk pµk,1 , . . . , µk,p q , Γk and At can be
partitioned into p p blocks.
where Σk
Γ
Γk,1
k,0
Γk,1
Γk,0
Γk ..
..
.
.
Γk,p1 Γk,p2
. . . Γk,p
. . . Γk,p
..
..
.
.
...
1
2
A
At,1,2 . . . At,1,p
t,1,1
A
At,2,2 . . . At,2,p
, At t,2,1
..
..
..
..
.
.
.
.
Γk,0
At,p,1 At,p,2
14
. . . At,p,p
.
The entries in ut , zt , At , bk and Γk are listed as follows
put,hqi pzt,hqi pAt,l,hqij pµk,hqi pΓk,hqij for i, j
p » »
1 ¸ 1 1
ApBk,i , rk,q qpu, v qXth puqXtq pv qpu, v q dudv,
σ 2 q1 0 0
»1
0
1
σ2
2ρ
Bk,i puqεt p0q
σ2
»1»1
0
p
0
1
σ
»1
0
pρBk,ipv uq
1
Bk,i pv uqq dWt pv q Xth puq du,
ApBk,i , Bk,j qpu, v qXtl puqXth pv q dudv,
» »
1 ¸ 1 1
ApBk,i , rk,q qpu, v qγqh pu, v q dudv,
σ 2 q1 0 0
1
σ2
»1»1
0
0
ApBk,i , Bk,j qpu, v qγh pu, v q dudv,
1, . . . , k and l, h 1, . . . , p, where γq pu, vq CovpXtpuq, Xt q pvqq, q P Z.
With Theorem 5, we have the asymptotic pointwise estimation error for each convolution
function when k is fixed.
P Lipζ r1, 1s, with ζi ¡ 1, i 1, . . . , p, and T opN 2q. For each
fixed k, define Bk,i psq pB1k,i,1 , . . . , B1k,i,k q1 , Bk,i,i pBk,1 psq, . . . , Bk,k psqq1 , and Bk,i,j is a pdimensional vector whose entries are all 0, for j i. Then
Corollary 4. Assume φi
2
psq Bk,ipsq1Σk Bk,ipsqq,
σk,i
bk,i psq Bk,i psq1 bk rk,i psq,
for i 1, . . . , p.
Then
?
T φpk,i psq φpsq bk,i psq
Theorem 6. Assume φi
for i 1, . . . , p.
i
0
5
2
psqq,
Ñd N p0, σk,i
P C ζ r1, 1s with ζi ¥ 3, and T opN 2q. It holds that as k Ñ 8,
}bk,ipq}8 Opkζ
where ζ0
{ q,
3 2
2
}σk,i
pq}8 Opkq,
for i 1, . . . , p,
mintζ1, . . . , ζp, 4u. σpk2 Ñp σ2, and ρpk Ñp ρ.
Simulations
In this section, we illustrate the performance of the proposed estimators, and demonstrate the
impacts of k, T and N through two simulated examples.
For each model and combination of pk, T, N q, we generate the corresponding functional time
series 100 times and estimate the convolution functions. The performance of estimation is evaluated by the integrated squared error, Ei
}φpk,i φi}22, for i 1, . . . , p.
15
Example 2. Consider a simple CFAR(1) model with convolution function φpq being the normal
density function with mean 0 and standard deviation 0.1, truncated at
we use ρ 5 and σ 2
r1, 1s. For noise process,
10. We demonstrate various features of the model with a typical data set
whose estimation error is the median value of the 100 simulated sets for k 11, N 100, and
T 100.
Table 1: The average (sd) of errors when k
7, 11, 19 for Example 2
k
distance
N
T
E
7
1/2
100
100
0.4257 (0.0955)
11
1/4
100
100
0.1685 (0.0954)
19
1/8
100
100
0.3318 (0.1370)
Table 1 shows the average of E for different k, with fixed T
100 and N
100. The
second column shows the distance between two adjacent knots. As k increases, average of E
decreases first, due to improvement of the sieve approximation, and then increases, because of the
introduction of extra B-spline coefficients. It reaches the minimum when k
Table 2: The average (sd) of errors when T
11.
10, 100, 1000 for Example 2
k
N
T
E
11
100
10
4.3748 (3.3497 )
11
100
100
0.1685 (0.0954)
11
100
1000
0.0312 (0.0187)
Table 2 shows the estimation performance as the sample size T changes from 10, 100 to 1000,
with fixed k and N . As T increases, the means and standard deviations of E all decrease, and
the estimates become more accurate and stable. It can be seen that E decreases approximately
at the rate of 1{T .
Table 3: The average (sd) of errors when N
10, 100, 1000 for Example 2
k
N
T
E
11
10
100
0.2015 (0.1001)
11
100
100
0.1685 (0.0954)
11
1000
100
0.1687 (0.0968)
Table 3 summarizes the estimation performance for k
16
11 and T 100 with different N .
It
is seen that when N is small, increasing N improves the performance of the estimators as they
benefit from the increase of information. However, when N is large enough, E stays at the same
level, because there are sufficient observations to get an accurate approximation of Xt pq by linear
interpolation. Due to the strong spatial correlation in the error process, dense observations do
not provide much extra information.
t= 3
0.6
0.2
0.8
1.0
0.4
1
0.8
1.0
0.6
0.0
0.2
0.8
1.0
0.2
0.4
0.6
0.8
1.0
0.6
●
●
0.8
●●
●
●●
●●
●
●●
●
●●
●●
●
●
2
3
●
●
●●●
●
●
●
●●
●
●
●
●
●
●
●
1
●
●
●
●
●
●
●
● ●●
●
●●
●●●●
●●
● ●●
●
●●
●● ●
●
●
●
●
● ●
●● ●
●●
●
●●
●
●
●
●
0.0
0.4
4
●
●
●
●
●
●
●
●
● ●●
●
●●
●
●●
●●
●
●●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●●●
●
●● ●●
●
● ●● ●
●
●●
●
●●
●
●
●
●
t= 9
3
1
0
●
0.2
0.6
●●
●
●
●●
●
●
t= 8
●
●●●
●
●●
●
●
●
● ●
●●
●
●
● ●
●
●
●
●
●
●
● ● ●●
●
●
●● ● ●
●
●
0.0
0.4
0
3
2
1
0
0.0
2
2
1
0.4
1.0
●
●●
●
●
●
●●
●
●●
●
●
●
●
−3
−2
●
● ●
●
●●
●● ●
●
●
0.2
0.8
●
●
●●
●
●
●●
●
●
●
●
●●
●
●●●
●●
●
●
●
●
● ●● ● ●
●
●
●●
●
●●
●
●
●●
●
●
●
●
●
●●
●
●
−1
0
−1
●
●
●
●● ●● ●
●
● ●
●
●
●
●
● ●●●
●
●
●
●
●
●● ●●
●
●
●
●
●●
●
● ●●
●
●
●
●●
● ●●
●
●
●
●
●
●
● ●
●●
−2
2
1
●
●●
●
●
●
●
● ●
●●
●
●
●
●
●
0.0
0.6
t= 7
●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
0.4
0
0.2
−1
0.0
t= 6
●
●●●
●●
● ●
●
●
●
●
●● ●
●
● ●
●●
● ●
●
●
●●
●
●
●
●●
●
●
●●
●● ● ●
●
●
●
●
●
●
●
●
●
● ● ●
●
●
●
●
●
●
● ●
●
● ●
●
●
●●
●
●
●●
●● ●
●
●●
●●●
●
●
●●
●●
● ●●
●●
●● ●
● ●
● ●● ●
●●●
●●●
●● ●
●
●
●
●
●
● ●
●
●
●
●
●● ●
●
●● ●●
−2
1.0
●
●●
●
●●
●
●
●
●● ●
●●
●● ●
●
● ●● ●
●
●
●
●
●
●
●●
●
−2
0.8
●
●
●
●
●
●
● ●
●
●
●● ●
●
●
●● ●● ●
●
●●
● ● ● ●
●
● ●
●
●
●
●
●
●
●●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●● ●
●● ●
● ●
●
●●●
●
●●
●
●
●●●
●
●
●●
●
●●
●
●
●
●●
●
●
●
●
●●
●
● ●
●
●
●
●
●●●
●
●
●
●
●
−1
●
●
●
●
●
●
●
●
−1
1
0
●
●
●
●
●
●
●
●
●●
●●●
●
0
0.6
●
●
● ●●
●●
●
●●
●
●
●
−1
0.4
t= 5
●
●
●
●●●
●
●
●●
●●
●
●
−2
0.2
●
●
−3
−3
●
●
●
●
0.0
●
●
●●●●
●
●
●
●
●
●
●●
●●
● ●● ●
●
●●●
●
−1
1
0
−1
−2
●
●
●●
●
●
●●
●
●●
●
●●
●
●●
●●
●
●
●
● ●●
−2
●
●●
● ●●●
●●
●●
●●●
●●
●
●● ●
●●●
●
●
●
●
●
● ●
●
●
●
●
●●
●
●
●
−2
●
●
● ●
●
●
●●
●
●
●
●
●● ●
●
● ● ●●
●
● ●
●
●●●●
● ●●●
●●
●
●
●
●
●
●
●
●
●
t= 4
2
t= 2
1.0
0.0
0.2
0.4
0.6
0.8
1.0
Figure 2: Plots of predicted Xt pq when k
the typical data set from Example 2.
15, N 100, and T 100, t 2, 3, 4, 5, 6, 7, 8, 9, for
The black lines are Xt pq; the blue lines are ft pq; the red
lines are the predictions; the black circles are the observations.
Figure 2 displays the predicted Xt pq using (18) from time 2 to 9, using k
N
15, T 100 and
100 for the typical data set. Because the estimated φpq is very close to the true convolution
function, the predictions (red lines) are also close to the skeleton (blue lines). Also note that the
noisy process is large in magnitude with strong spatial correlation, hence simple smoothing of
Xt pq will be far away from the true skeleton ft pq.
Figure 3 shows the performance of using out-sample forecasting to choose k, for T
100,
T {2 and N 100 . The left panel in Figure 3 displays the sum of squared forecasting error
across k for the typical data set. The minimum is achieved at k 15. It is seen that the sum of
T0
squared forecasting error experiences two different phases as k increases. At the beginning, it is
very large, due to the lack of knots to approximate φpq well. As the complexity of sieve space
k increases, the squared forecasting error reduces very quickly, and the performance of estimator
improves, until its minimum is reached. In this phase, the bias dominates the estimation error.
After that, the error increases gradually with k because the fixed sample size T is not sufficient
17
30
5200
Histogram of chosen k for 100 data sets
20
●
5 10
5000
●
●
●
●
●
●
●
● ●
● ● ● ● ● ●
0
4800
sum of squared forecasting error
Sum of squared forecasting error
5
10
15
20
0
5
10
k
15
20
25
30
k
Figure 3: Left panel: plot of the mean squared out-sample forecasting error against k for the
100 and N 100 from Example 2; right panel: histogram of chosen k
for 100 data sets when T 100 and N 100 from Example 2.
typical data set when T
for the extra knots estimation. In this phase the variance dominates the error.
The right panel in Figure 3 shows the histogram of chosen k for the 100 data sets. It is
truncated at k
5, and most data sets require the dimension of the sieve space to be greater
than 8. Note that when k is odd, the prediction error is smaller than that when k is even. This
is because φpq reaches its maximum value and maximum second derivatives at 0. The estimate
benefits from a knot at 0.
−1.0
−0.5
0.0
0.5
1.0
4
−1
0
1
2
3
4
3
2
1
0
−1
−1
0
1
2
3
4
5
k=11
5
k=7
5
k=5
−1.0
−0.5
0.0
0.5
1.0
0.0
0.5
1.0
0.0
0.5
1.0
0.5
1.0
5
−1
0
1
2
3
4
5
4
2
1
0
−1
−0.5
−0.5
k=101
3
4
3
2
1
0
−1
−1.0
−1.0
k=19
5
k=15
−1.0
−0.5
0.0
0.5
1.0
−1.0
Figure 4: Plots of φpq (solid line) and φppq (dashed line) with T
5, 7, 11, 15, 19, 101 for the typical data set from Example 2.
18
−0.5
0.0
100, N
100, and k
Figure 4 shows the estimated φpq for the typical data set when k
5, 7, 11, 15, 19, 101.
As
expected, the accuracy of the estimate improves first and then becomes worse as k increases. The
top panel shows the first phase where the bias dominates, and the bottom panel shows the second
phase where the variance dominates. When k
11, φppq is very close to the true convolution
function, shown in the top right plot of Figure 4.
Table 4 reports the performance of the F-test, where the first row is for testing H0 : φ
under CFAR(1) model, and the second row is for testing H0 : φ2
0
0 under CFAR(2) model,
comparing with CFAR(1) model. The second column shows the p-values for the typical data set.
The last 5 columns show the minimum, mean, maximum and standard deviation of the p-values,
and frequency of rejection at 5% level, respectively. φ1 pq is significant for all the data set, and
φ2 pq is not significant for most of them.
Table 4: The p-values for H0 : φi
0, when k 11, T 100, N 100 for Example 2
Overall
H0
Significance
Example
Min
Mean
Max
Sd
frequency
0 0.0001 0.0001 0.0001 0.0001 0.0001
φ2 0
0.8442
0.0286
0.5698
0.9832
0.2679
φ1
100%
2%
Example 3. A CFAR(2) model is considered in this example. Here we use φ1 psq
and φ2 psq
1
2
cosp2πsq, for s
P r1, 1s.
We fix ρ
5, and σ2 10.
100.
Table 5: The average (sd) of errors when k
2
Again, a typical data is
selected whose estimation error is the median of these of 100 data sets when k
N
?
2?50π e50s ,
11, T 500 and
5, 7, 11, 19 for Example 3
k
distance
N
T
E1
E2
5
1
100
500
0.2711 (0.0406)
0.0464 (0.0216)
7
1/2
100
500
0.1227 (0.0317)
0.0446 (0.0226)
11
1/4
100
500
0.0623 (0.0335)
0.0596 (0.0286)
19
1/8
100
500
0.1029 (0.0434)
0.0972 (0.0410)
Tables 5-7 show the average and standard deviation of E1 and E2 with different combination
of (k, T, N ). The pattern is similar to that in Example 2. As k increases, the errors decrease first,
reach the minimum, then increase. E1 and E2 are roughly linear in T , but does not change too
much with N .
19
Table 6: The average (sd) of errors when T
k
N
T
E1
E2
11
100
100
0.3000 (0.1626)
0.3128 (0.1822)
11
100
200
0.1493 (0.0693)
0.1406 (0.0838)
11
100
500
0.0623 (0.0335)
0.0596 (0.0286)
Table 7: The average (sd) of errors when N
N
T
E1
E2
11
100
500
0.0623 (0.0335)
0.0596 (0.0286)
11
200
500
0.0623 (0.0334)
0.0595 (0.0285)
11
500
500
0.0622 (0.0330)
0.0593 (0.0284)
35
●
●
●
●
●
●
0
●
5
24700
●
●
25
Frequency
●
15
25300
Histogram of chosen k for 100 data sets
●
24900
25100
100, 200, 500 for Example 3
k
Sum of squared forecasting error
sum of squared forecasting error
100, 200, 500 for Example 3
6
8
10
12
14
6
k
8
10
12
14
k
Figure 5: Left panel: plot of the mean squared out-sample forecasting error against k for the
500 and N 100 from Example 3; right panel: histogram of chosen k
for 100 data sets when T 500 and N 100 from Example 3.
typical data set when T
9, or k 11 are
500, T0 T {2, and N 100.
Figure 5 reads similarly to Figure 3. For most of the data sets, k
selected based on the out-sample prediction criterion, when T
The estimated φ1 pq and estimated φ2 pq for the typical data set are shown in Figure 6, when
0 than
that around the boundary. This is because φi puq’s are used as weights for obtaining Xt psq, when
u P r1 s, ss. Specifically φi p0q’s are the convolution weights for Xt psq with u P r1 s, ss.
Every Xt psq, s P r1, 1s contains information of φi p0q but only Xt p1q contains information about
φi p1q. Hence, the estimate of φi pq in the center exploits data more efficiently than that close to
k
11, T 500 and N 100.
The estimated function is more accurate around s
20
the boundary.
Estimated φ2
−1.0
−0.5
−0.5
0.5
0.0
1.5
0.5
1.0
2.5
Estimated φ1
−1.0
−0.5
0.0
0.5
1.0
−1.0
Figure 6: Plots of estimated φ1 pq and φ2 pq when k
−0.5
0.0
0.5
1.0
11, T 500 and N 100 for the typical
data set from Example 3.
Table 8 summarizes the p-values of a sequence of tests. The first row is for testing H0 : φ 0
under CFAR(1) model. The second row is for testing H0 : φ2
0 under CFAR(2) model,
comparing with CFAR(1) model in the F-test. The third row is defined similarly. Based on the
p-values shown in the second column, CFAR(2) would be correctly specified for the typical data
set. The last 5 columns show that F-tests perform well in determining the AR order.
Table 8: The p-values for H0 : φi
0, when k 11, T 500 N 100 from Example 3
Overall
p-value
Significance
Example
Min
Mean
Max
Sd
frequency
0 0.0001 0.0001 0.0001 0.0001 0.0001
φ2 0 0.0001 0.0001 0.0001 0.0001 0.0001
φ3 0
0.0797
0.0119
0.4906
0.9966
0.2678
φ1
6
100%
100%
2%
Real Data Analysis
We apply our method to the S&P 500 index European call option data in this section. It is
well-known that the implied volatility of an option is a function of its strike prices, and this
phenomenon is called volatility smiles. In this paper, we treat the implied volatility as a function
of moneyness, which is the relative strike price with respective to the price of underlying asset.
The option data is collected from July 9, 2004 to September 20, 2004, and the expiration date
of the options is December 18, 2004. Hence, the time series is T
51 long. We select the
options with strikes between 950 and 1550, which were actively traded within this period, and
21
have a solution for implied volatility derived from Black-Scholes model most of the time. Number
of observations at each time varies from 43 to 48. Our aim is to construct the volatility curve
against moneyness, study its evolution mechanism, and predict implied volatility in the future.
Figure 7: Plots of implied volatilities against moneyness (top left panel) across time (bottom left
panel) and plots of differenced implied volatility against moneyness (top right panel) across time
(bottom right panel). The colorbars on the right show the time (top panel) and implied volatility
(bottom panel) corresponding to different color scales.
The left panel of Figure 7 shows the implied volatility curves as functions of moneyness over
time, where the volatility smiles are clearly demonstrated, since deep in-the-money or out-themoney options have larger volatility than others. As expiration approaches, the implied volatilities
of at-the-money options decreases, the implied volatilities of out-the-money options increase, and
the location where minimum reaches approaches to 1 seen from the top left panel of Figure 7.
Hence, we take a difference of the implied volatilities, and model them under CFAR settings.
Specifically, let Yt psn q
Xtpsnq Xrt1psnq, for n 1, . . . , Nt, t 2, . . . , T , where Xtpsnq is the
rt pq is the linear interpolation of Xt pq defined
implied volatility at time t with moneyness sn , X
in (8), and Nt is number of observations at time t. Data after taking a difference is plotted in the
right panel of Figure 7. The implied volatilities of deep in-the-money and out-the-money options
22
are much more volatile than these of at-the-money options.
0, when k 11
φ2 0
Table 9: The p-values for H0 : φi
H0
p-value
φ1
0
0.0001
0.1027
0 vs H1 : φi 0 for i 1, 2 in Table 9 indicate
that CFAR(1) model should be chosen to fit the data. The estimates of φpq for k 12 (top
left panel), k 14 (top right panel) and k 18 (bottom left panel) are shown in Figure 8. It
can be seen that φppq is not very sensitive to number of knots. The bottom right panel of Figure
8 displays the sum of squared out-sample forecasting errors against k when T0 4{5T . When
k 18, the errors reaches the minimum value, hence we select k 18. φppsq is negative when
s ¤ 0.1 or 0.65 ¤ s ¤ 0.9; otherwise, it is positive, for k 18 from bottom right panel of Figure
The sequential F-tests of testing H0 : φi
8. Hence, the implied volatility with moneyness s at time t is likely to increase, when these with
moneyness u, u ¥ s 0.1, s 0.65 ¤ u ¤ s 0.9 decrease, and others increase at time t 1.
Define
ept pnq εpt psn q eρp|sn sn1 | εpt psn1 q,
n 2, . . . , Nt .
The sample autocorrelation functions of ept pnq are shown in Figure 9 for t 2, . . . , 37. Most of the
time, the sample autocorrelations of cross-sectional errors lie within confidence intervals, except
t
8, 14, 15, 30, 37.
Overall, most of the sample autocorrelations of ept psn q are not significant,
which implies that the CFAR(1) model fits data well.
One-step-ahead out-sample forecasting error of Xt pq is used to compare the prediction perfor-
pt
mance of the estimated CFAR(1) model with a functional random walk model , where X
1
psnq 0, @sn r0, 1s . Table 10 shows the estimated ρ and σ 2 using CFAR(1) model, and the mean squared
out-sample forecasting errors for both models. CFAR(1) outperforms the functional random walk
model, reducing the MSE by about 9%.
Table 10: Parameter estimates for CFAR model and out-sample forecasting MSEs for both models
ρp
σ
p2
MSE1(CFAR model)
MSE2(random walk)
0.0001
1.5176e 04
1.2755e 04
1.167e 04
23
1
0
−2
−4
−4
−2
0
1
2
Estimated convolution function, k=16
2
Estimated convolution function, k=12
−1.0
−0.5
0.0
0.5
1.0
−1.0
−0.5
0.0
0.5
0.5
1.0
0.0575
●
0.0560
●
●
●
●
0.0545
2
1
0
−2
−4
−1.0
0.0
Sum of squared forecasting error
Sum of squared forecasting error
Estimated convolution function, k=18
−0.5
1.0
●
●
●
5
●
●
10
●
●
●
●
15
●
●
●
20
k
Figure 8: Plots of φppq for different k, k
k
12 (top left panel), k 16 (top right panel) and
18 (bottom left panel), and plot of the sum of squared out-sample forecasting error against
k (bottom right panel).
Appendix
By Cauchy-Schwarz inequality, for any function x P L2 r0, 1s,
Proof of Theorem 1:
}
»1
0
¤
φps uqxpuqdu}
» 1 » 1
0
0
2
2
}x}¤1
0
»01
φ2 ps uqdu
It follows that
sup
» 1 » 1
}
»1
0
0
2
φps uqxpuqdu
ds
x2 puqdu ds ¤ κ2 ||x||.
φps uqxpuqdu}22
¤ κ2 1.
By Lemma 3.1 in Bosq (2000), Xt is stationary.
Proof of Theorem 2:
Let Hp r0, 1s be the Cartesian product of p copies of the random function
24
0.2
−0.4
ACF
0.2
−0.4
ACF
Lag
12
2 6
0.2
−0.4
ACF
0.2
ACF
Lag
12
2 6
0.2
−0.4
ACF
0.2
Lag
12
2 6
t= 36
−0.4
0.2
Lag
ACF
0.2
12
t= 37
Lag
ACF
12
t= 31
−0.4
2 6
12
12
t= 25
Lag
12
12
t= 19
t= 30
ACF
0.2
0.2
−0.4
ACF
0.2
ACF
−0.4
2 6
−0.4
2 6
Lag
2 6
0.2
−0.4
ACF
0.2
ACF
−0.4
ACF
0.2
0.2
ACF
−0.4
0.2
ACF
−0.4
ACF
0.2
12
t= 35
12
12
Lag
−0.4
2 6
ACF
2 6
Lag
t= 24
−0.4
−0.4
ACF
0.2
ACF
0.2
12
Lag
12
2 6
Lag
t= 34
−0.4
2 6
12
12
t= 13
Lag
t= 29
ACF
0.2
ACF
2 6
Lag
−0.4
2 6
−0.4
12
t= 33
12
12
2 6
t= 18
Lag
Lag
0.2
2 6
2 6
t= 23
t= 28
−0.4
ACF
0.2
12
12
−0.4
2 6
Lag
Lag
2 6
0.2
ACF
12
t= 27
t= 32
2 6
−0.4
2 6
−0.4
2 6
0.2
ACF
0.2
0.2
−0.4
ACF
0.2
12
12
12
Lag
Lag
Lag
t= 7
t= 12
t= 17
t= 22
Lag
Lag
2 6
−0.4
2 6
t= 21
t= 26
12
Lag
12
2 6
Lag
t= 16
−0.4
ACF
0.2
−0.4
2 6
−0.4
2 6
0.2
ACF
2 6
Lag
12
12
t= 11
−0.4
12
t= 15
Lag
2 6
Lag
0.2
2 6
t= 20
12
t= 10
−0.4
ACF
0.2
12
Lag
2 6
−0.4
2 6
Lag
t= 14
t= 6
−0.4
12
t= 9
−0.4
2 6
0.2
ACF
2 6
Lag
0.2
12
t= 8
0.2
2 6
t= 5
−0.4
ACF
−0.4
−0.4
t= 4
0.2
t= 3
0.2
t= 2
2 6
12
2 6
12
Figure 9: Sample autocorrelations of ept with lag 0 autocorrelation removed when t 2, . . . , 37.
space Hr0, 1s. The norm in Hp r0, 1s is defined as
g
f¸
fp
}pX1, . . . , Xpq}p e }X1}22, where X1, . . . , Xp P Hr0, 1s.
i1
Consider
κ1 κ2 . . . κp
∆1 ∆2 . . . ∆p
1 0 ... 0 I 0 ... 0 ,
∆
, K 0 1 ... 0 0 I ... 0 0
...
I
0
0
...
1
(26)
0
where I denotes the identity operator, ∆i is a convolution operator associating with φi , i.e.
∆i X
³1
0
φi p uqX puqdu. Note that }∆X}p
¤ }KX}p.
And all the roots of the characteristic
function are eigenvalues of the matrix K. Let λmax be the maximum eigenvalue in modulus of K,
and |λmax | 1. Hence, there exists an integer j, such that }∆j } ¤ }Kj } 1. By Theorem 5.1 in
Bosq (2000), CFAR(p) model is stationary if (4) is satisfied.
P Lipζ2 r1, 1s, and ζ ¡ 1, ts0 0, s1, . . . , sN 1u is an equally spaced partition of r0, 1s. For any u, v P r0, 1s, let fu and gv be pN 1q1 vectors, fu pf ps0 uq, f ps1 uq, . . . , f psN uqq1 ,
Lemma 1. Let f, g
25
pgps0 vq, gps1 vq, . . . , gpsN vqq1. Σ is defined in Section 3.1, where pΣqij eρ|ij|{N σ2{2ρ.
Define AN pf, g qpu, v q fu1 Σ1 gv , then
gv
AN pf, g qpu, v q where ζ0
OpN ζ0
1
Apf, g qpu, v q
σ2
1
q,
(27)
mintζ, 2u.
Proof: Let ϕ eρ{N , and it is easy to show that Σ1
a
1 ϕ2
1
a
ϕ
2
1ϕ U
0
σ2ρ U1U, where
2
0 .
1 ϕ
0
0
(28)
1
And we have
Ufu
a
1
1 ϕ2
a
1 ϕ2 f ps0 uq f ps1 uq ϕf ps0 uq . . . f psN
uq ϕf psN 1 uq
1
,
Ugv has a similar expression, hence
f 1 Σ1 gv
u
2ρ
f puqg pv q
σ2
2ρ
p1 ϕ2qσ2
2ρ
f puqg pv q
σ2
2ρ
p1 ϕ2qσ2
Ņ
2ρ
p1 ϕ2qσ2
Ņ
n 1
Ņ
pf psn uq ϕf psn1 uqq pgpsn vq ϕgpsn1 vqq
rf psn uq f psn1 uqs rgpsn vq gpsn1 vqs
n 1
p1 ϕq rf psn uq f psn1 uqs gpsn1 vq
n 1
Ņ
2ρ
p1 ϕ2qσ2 n1p1 ϕqf psn1 vq rgpsn vq gpsn1 vqs
Since f, g
ζ
Ņ
2ρ
p1 ϕ2qσ2
L1
L2
p1 ϕq2f psn1 vqgpsn1 vq
n 1
L3
L4
L5 .
P Lipζ2 r1, 1s, |f 1puq f 1pvq| ¤ M |u v|, if ζ ¥ 2; |f 1puq f 1pvq| ¤ M |u v|ζ
1,
2, for some positive constant M . It follows that
Ņ
1 f psn uq f psn1 uq
g psn v q g psn1 v q
2ρ
L2 N p1 ϕ2 qσ 2 n1 N
1{N
1{N
2ρ
N p1 ϕ2 qσ 2
1
σ2
»1
0
1
1 1
pf psn1 uq
N
n1
f 1 ps uqg 1 ps v q ds
OpN ζ0
OpN ζ0
1
1
qqpg1psn1 uq
σρ
q, and L5 σ
2
0
26
OpN ζ0
1
qq
q.
³1 1
OpN ζ0
0 f ps uqg ps v qds
ρ2 ³ 1
f ps uqg ps v qds OpN ζ0 1 q.
2
Similarly, we obtain L3
OpN ζ0
Ņ
1
q, L4 σρ
2
³1
0
f ps uqg 1 ps v qds
if
Using integration by parts, we have
fu1 Σ1 gv
1
σ2 ρf puqgpvq
OpN ζ0
1
»1
ρf p1 uqg p1 v q
0
q.
Lemma 2. If the CFAR(1) process X
»1
f 1 ps uqg 1 ps v q ds
2
ρ
0
f ps uqg ps v q ds
pXt, t P Zq satisfies (3) and φ P C 2r1, 1s, then there
exists a positive constant ι which only depends on φ and ρ such that:
(i) }Ψh }8
(ii) }γh }8
¤ κh for h ¥ 2.
¤ σ2ικ|h| for h P Z.
(iii) For h ¥ 1,
"
B
γ
p
u,
v
q
h
max u,v
Bu ,
and when h 0, for any u1 , u2 , v
Bγh pu, vq *
2 h
Bv ¤ σ ικ ,
P r0, 1s,
|γ pu1, vq γ pu2, vq| ¤ σ2ι|u1 u2|.
(iv) For h ¥ 1,
" 2
B
γh pu, v q max u,v
Bu2 ,
and when h 0,
2
B γh pu, vq BuBv ,
" 2
B
γ
p
u,
v
q
max 2
uv
Bu ,
2
B γ pu, vq BuBv ,
Proof: By Cauchy-Schwarz inequality, for any u, v
» 1
|Ψ2pu, vq| |Ψhpu, vq| ¤
0
φpu sqφps v q ds ¤
» 1
0
» 1
1{2 » 1
pu, sqds
Ψ2h 1
0
2
B γh pu, vq *
2 h
Bv 2 ¤ σ ικ ,
0
2
B γ pu, vq *
2
Bv 2 ¤ σ ι.
P r0, 1s,
1{2 » 1
φ2 pu sqds
1{2
φ ps v qds
2
0
1{2
φ2 ps v qds
¤ κ}Ψh1}8,
for h ¥ 3.
Hence, Lemma 2(i) can be proved inductively.
By the definition of Ψ, for h ¥ 2,
Ψh pu, v q »
r0,1sh
φpu s1 qφps1 s2 q . . . φpsh1 v q ds1 . . . dsh1 ,
27
¤ κ,
then we have
BΨhpu, vq » φ1pu sqΨ ps, vq ds, BΨhpu, vq »
h1
Bu
Bv
r0,1s
r0,1s
h
Ψh1 pu, sqφ1 ps v q ds,
h
B2Ψhpu, vq » φ2pu sqΨ ps, vq ds, B2Ψhpu, vq » Ψ pu, sqφ2ps vq ds,
h1
h1
Bu2
Bv2
r0,1s
r0,1s
Let d1 max1¤s¤1 |φ1 psq| and d2 max1¤s¤1 |φ2 psq|, and it holds that
! BΨ pu, vq BΨ pu, vq )
h
h1
max , h
u,v
Bu
Bv ¤ d1κ , for h ¥ 1,
! B2 Ψ pu, vq B2 Ψ pu, vq )
h
h
h1
max uv
Bu2 , Bv2 ¤ d2κ , for h ¥ 1.
By Corollary 1, the covariances γ pu, v q have the expression:
h
γ pu, v q h
σ 2 ρ|uv|
e
2ρ
It follows that, for any u, v
8̧ » 1 » 1
σ2
2ρ `1
0
0
Ψ` pu, wqΨ` pv, z qeρ|wz | dwdz.
P r0, 1s,
}γ }8 ¤ σ2{r2ρp1 κ2qs,
2
|γ pu1, vq γ pu2, vq| ¤ σ2ρ pρ
" 2
B
γ pu, v q max uv
Bu2 ,
where d maxtd21 , d2 κu.
2
B γ pu, vq BuBv ,
d1 κ
q|u1 u2|,
1 κ2
2
B γ pu, vq * σ 2 2
Bv 2 ¤ 2ρ pρ
For h ¥ 1, the autocovariances γh pu, v q CovpXt puq, Xt
γh pu, v q σ2
2ρ
»1
0
eρ|us| Ψh pv, sq ds
Hence, for h ¥ 1, we have
8̧ » 1 » 1
σ2
2ρ `1
0
0
h
d
q,
1 κ2
pvqq is given by
Ψ` pu, wqΨ`
h
pv, zqeρ|wz| dwdz.
}γh}8 ¤ σ2κ|h|{r2ρp1 κ2qs,
2
Bγh pu, vq d1 κ
¤σ ρ
κh ,
Bu 8 2ρ
1 κ2
2
h1
Bγh pu, vq ¤ σ d1 κ ,
Bv 8 2ρ 1 κ2
*
" 2
B
γh pu, v q B 2 γh pu, v q B 2 γh pu, v q d
σ2 2
max ,
p
ρ
qκh,
,
¤
2
2
uv
Bu
BuBv
Bv
2ρ
1 κ2
Since γh pu, v q γh pv, uq, the proof of Lemma 2(ii), 2(iii) and 2(iv) is complete.
28
pXt, t P Zq satisfies (3). Let ε0 be an i.i.d. copy of ε0, and
Xt be obtained by replacing ε0 with ε0 in the definition of Xt . If f : r0, 1s2 Ñ R be a continuous
function, and set }f }8 : maxu,v |f pu, v q|. Define
Lemma 3. The CFAR(1) process X
Yt
»1»1
0
0
Then, for each q
f pu, v qXt puqXt pv q dudv,
and
Yt
»1»1
0
0
f pu, v qXt puqXt pv q dudv.
P N and q ¥ 1, and t ¥ 0,
? 2
}Yt Yt }q ¤ ? 2σ 2 rp2q 1q!!s1{q }f }8 κt.
ρ 1κ
Proof: By Cauchy-Schwarz inequality,
» 1» 1
f pu, v qrXt puq Xt puqsXt pv q dudv q
0 0
»1»1
rXt puq Xt puqsXt pv q dudv
2}f }8
q
»01 »01 » 1
r Ψt pu, wqpεt pwq εt pwqdw Xt pv q}2q dudv
2}f }8
2q
}Yt Yt}q ¤
2
¤
¤
? 02 0 0
¤ a 2σ 2 rp2q 1q!!s1{q }f }8 κt.
ρ p1 κ q
Define δt pq
Proof of Theorem 3:
pf gqpuq ³1
0
Xtpq Xrtpq.
For any function f
P C r1, 1s, denote
f pu v qg pv qdv. We first observe that Xt can be decomposed as
Xt psq ķ
βk,j pBk,j
Xrt1qpsq prk Xrt1qpsq
εt psq
pφ δtqpsq.
j 1
rt and w
r t be two pN
Let v
1q-dimensional vectors whose i-th entries are prk
and pφ δt qppi 1q{N q respectively. Let εt be the pN
p k can be decomposed as
εt ppi 1q{N q. The estimator β
pk
β
βr k
Ţ
M1t Σ1 Mt
1 Ţ
t 2
o pN 2 q
Ţ
?1
t
p k βr k
β
Γk,N1 T1
k,N
Ţ
rt
M1t Σ1 w
εt
r tq .
w
opp1q.
T
(29)
t 2
Γk,N1 µk,N , then
t
1
Γ1
bk,N
rt
M1t Σ1 pv
1
E pM Σ1Mtq, and bk,N
T
Let µk,N
1q-dimensional vector whose i-th entry is
t 2
We claim that under the condition T
E pM1 Σ1vrtq, Γk,N
Xrt1qppi 1q{N q
Ţ
pM1 Σ1vrt µ
k,N
t
t 2
pM1tΣ1Mt Γk,N qbk,N
t 2
29
q
1
Γ1
k,N
T
op pT 1{2 q.
Ţ
t 2
M1t Σ1 εt
(30)
We claim that
pk
β
βr k
1
Γ
k
bk
Ţ
Γk 1 T1
1
T
Ţ
put µk q
t 2
»u
σ
0
pztqj 0
The vector-valued process ut
zt
t 2
(31)
eρpuvq dWt pv q,
then similar to proof of Lemma 1, we get
1
σ
Ţ
op pT 1{2 q.
pAt Γk qbk
εt puq εt p0qeρu
2ρ
Bk,j puqεt p0q
σ2
1
T
t 2
If we represent εt puq as
»1
1
Γ
k
»1
0
1
Bk,j pv uqq dWt pv q Xt1 puq du.
pρBk,j pv uq
zt
(32)
At bk is a stationary process. For any fixed unit vector
P Rk , by Lemma 3, the physical dependence measures defined in Wu (2005) of the process
θ 1 put zt At bk q decay geometrically fast, and therefore the martingale approximation used in
°
Wu (2005) can be applied to obtain the central limit theorem for Tt2 θ 1 put zt At bk q. By
θ
the Cramer-Wold device, we have
?
1
1
p k βr k bk q Ñ N p0, Γ
T pβ
k Υk Γk q,
d
where Υk is the long-run variance of the process ut
zt
At bk .
Now we prove the claim (29). We proceed by showing that each entry of the vector converges
r t can be written as
to zero in probability. The j-th entry of Mt Σ1 w
»1»1
0
0
rt1 puqδt pv q dudv.
AN pBk,j , φqpu, v qX
By Lemma 1 , there exists a constant c1
¡ 0 such that
|AN pBj , φqpu, vq| ¤ c1,
for all u, v, N.
(33)
Let
rh pu, vq CovpXt puq, δt
γ
By Lemma 2, there exists a constant c2
h
pvqq,
»1»1
E
0
0
h
¡ 0 such that
}γ̃h}8 ¤ c2κh{N,
It follows that
γ̄h pu, v q Covpδt puq, δt
}γ̄h}8 ¤ c2κh{N.
|AN pBk,j , φqpu, vqδtpuqδtpvq| dudv OpN 1q,
30
pvqq,
°T
1{2 q, where
t2 yjt op pT
»1»1
so it suffices to prove
yjt
0
0
AN pBk,j , φqpu, v qXt1 puqδt pv q dudv.
By (33),
Eyjt
»1»1
0
0
AN pBk,j , φqpu, v qrγ1 pu, v q γ
r1 pu, vqs dudv
Op1{N q.
(34)
The autocovariance is
Covpyjt , yj,t
hq »
r0,1s4
AN pBk,j , φqpu1 , v1 qAN pBk,j , φqpu2 , v2 q
rγhpu1, u2qγ̄hpv1, v2q
rh
γ
1
pu1, v2qγrh 1pu2, v1qs du1du2du3du4.
¡ 0 such that
By (33) and Lemma 2, we know there exist a constant c3
|Covpwjt, wj,t hq| ¤ c3κ2h{N.
It follows that
Ţ
Var
yjt
OpT {N q.
(35)
t 2
Combining (34) and (35), and using the condition T
opN 2q, we have °Tt2 yjt oppT 1{2q, and
the proof of claim (29) is complete.
The difference between the two expressions in (30) and (31) is due to the approximation of
rt , so the proof of claim (31) is in the same fashion of that of claim (29). We omit the
Xt by X
details.
Lemma 4. The distance of two adjacent knots for uniform cubic B-spline functions defined on
r1, 1s with degree of freedom k is 1{m, and m pk 3q{2. Define two pm 4qpm 4q matrices
Ps and Vs :
pPsqqj » 1 s{m » 1 s{m
1
eρ|uv| Bk,q
{
s{m
» 1 s{ m » 1 s{ m
s m
pVsqqj ρ2
{
s m
{
1
puqBk,j
m 1
eρ|uv| Bk,q
puqBk,j
m 1
s m
pvq dudv,
m 1
pvq dudv,
m 1
1, . . . , m 4. Let b pb0, . . . , bm 3q1 be a unit vector. There exists constants c1 and c2
°
1
such that if s P Iα , 3j 0 b2j ¥ 3m
, and m ¥ c1 , then
for q, j
b1 pPs
Vs qb ¥ c2
3̧
j 0
31
b2j .
Proof: Rescale the uniform B-spline functions,
\[
B_q(s) = B_{k,m+q+1}(-1+s/m), \qquad q = 0, \ldots, m+3.
\]
The support of $B_q(\cdot)$ is $[q, q+4]$, and we denote by $B(\cdot)$ the central cubic B-spline with support $[-2,2]$, so that $B_j(u) = B(u-j-2)$. $P_s$ and $V_s$ can be written as
\[
(P_s)_{qj} = \int_s^{m+s}\int_s^{m+s} e^{-\rho|u-v|/m}\,B_{q-1}'(u)\,B_{j-1}'(v)\,du\,dv,
\]
\[
(V_s)_{qj} = \frac{\rho^2}{m^2}\int_s^{m+s}\int_s^{m+s} e^{-\rho|u-v|/m}\,B_{q-1}(u)\,B_{j-1}(v)\,du\,dv,
\]
for $q, j = 1, \ldots, m+4$. For a fixed $s \in [3,4]$, there are $m+4$ spline functions which are not identically zero on the interval $[s, m+s]$: $B_0(u), B_1(u), \ldots, B_{m+3}(u)$. For any numbers $b_0, b_1, b_2, b_3$, there exists a constant $c_3 > 0$ such that $\int_3^4\big(\sum_{j=0}^3 b_j B_j(s)\big)^2\,ds \ge c_3\sum_{j=0}^3 b_j^2$. Thus there exists an $s \in [3,4]$ such that
\[
\Big|\sum_{j=0}^{3} b_j B_j(s)\Big| \ge \sqrt{c_3}\Big(\sum_{j=0}^{3} b_j^2\Big)^{1/2}.
\]
On the other hand, there exists a constant $c_4$ such that for all $s \in [3,4]$, $\big|\sum_{j=0}^3 b_j B_j'(s)\big| \le c_4\big(\sum_{j=0}^3 b_j^2\big)^{1/2}$. As a result, there exists an interval $I_\alpha$ of length $c_5 > 0$, contained in $[3,4]$, such that for each $s$ in this interval
\[
\Big|\sum_{j=0}^{3} b_j B_j(s)\Big| \ge c_6\Big(\sum_{j=0}^{3} b_j^2\Big)^{1/2},
\]
where $0 < c_6 < 1$ is an absolute constant.
Define the functions
\[
f_1(u) = \sum_{j=0}^{3} b_j B_j(u), \qquad f_2(u) = \sum_{j=4}^{m-1} b_j B_j(u), \qquad f_3(u) = \sum_{j=m}^{m+3} b_j B_j(u),
\]
and let $f(u) = f_1(u) + f_2(u) + f_3(u)$. Using the identity
\[
e^{-\rho|u|/m} = \frac{1}{\pi}\int_{-\infty}^{\infty} e^{iu\lambda}\,\frac{m/\rho}{1+(m\lambda/\rho)^2}\,d\lambda,
\]
we have
\begin{align*}
b'P_s b &= \int_s^{m+s}\int_s^{m+s} f'(u)f'(v)\,\frac{1}{\pi}\int_{-\infty}^{\infty} e^{i(u-v)\lambda}\,\frac{m/\rho}{1+(m\lambda/\rho)^2}\,d\lambda\,du\,dv \\
&= \frac{1}{\pi}\int_{-\infty}^{\infty}\Big|\int_s^{m+s} f'(u)e^{iu\lambda}\,du\Big|^2\,\frac{m/\rho}{1+(m\lambda/\rho)^2}\,d\lambda \\
&= \frac{1}{\pi}\int_{-\infty}^{\infty}\Big|f_1(s)e^{is\lambda} - f_3(m+s)e^{i(m+s)\lambda} + i\lambda\int_s^{m+s} f(u)e^{iu\lambda}\,du\Big|^2\,\frac{m/\rho}{1+(m\lambda/\rho)^2}\,d\lambda.
\end{align*}
Similarly,
\[
b'V_s b = \frac{\rho^2}{\pi m^2}\int_{-\infty}^{\infty}\Big|\int_s^{m+s} f(u)e^{iu\lambda}\,du\Big|^2\,\frac{m/\rho}{1+(m\lambda/\rho)^2}\,d\lambda.
\]
Neuman (1981) showed that the Fourier transform of the central spline $B(u)$ is
\[
\int_{-2}^{2} B(u)e^{iu\lambda}\,du = \Big(\frac{2\sin(\lambda/2)}{\lambda}\Big)^4.
\]
Let $g(\lambda) = \sum_{j=4}^{m-1} b_j e^{i(j+2)\lambda}$; we have
\[
\int_s^{m+s} f(u)e^{iu\lambda}\,du = \int_s^{m+s} f_1(u)e^{iu\lambda}\,du + \int_s^{m+s} f_3(u)e^{iu\lambda}\,du + g(\lambda)\Big(\frac{2\sin(\lambda/2)}{\lambda}\Big)^4. \tag{36}
\]
Note that
\[
\Big|\int_s^{m+s} f_1(u)e^{iu\lambda}\,du\Big| \le \int_3^{7}\sum_{j=0}^{3}|b_j|\,B_j(u)\,du \le 2, \tag{37}
\]
and a similar inequality holds for the Fourier transform of $f_3(u)$.
We consider two cases.

Case 1: $\max_{\lambda\in[0,2/m]}|g(\lambda)| \le m|f_1(s)|/3$. On the interval $\lambda \in [\pi/(6m), 11\pi/(6m)]$, by the law of cosines, it holds that
\[
\big|f_1(s)e^{is\lambda} - f_3(m+s)e^{i(m+s)\lambda}\big| \ge |f_1(s)\sin(m\lambda)| \ge |f_1(s)|/2,
\]
and with (36) and (37), we have
\[
\Big|f_1(s)e^{is\lambda} - f_3(m+s)e^{i(m+s)\lambda} + i\lambda\int_s^{m+s} f(u)e^{iu\lambda}\,du\Big| \ge \frac{|f_1(s)|}{2} - \frac{|f_1(s)|}{3} - \frac{22\pi}{3m} = \frac{|f_1(s)|}{6} - \frac{22\pi}{3m}.
\]
It follows that
\begin{align*}
b'P_s b &\ge \frac{1}{\pi}\int_{\pi/(6m)}^{11\pi/(6m)}\Big(\frac{|f_1(s)|}{6} - \frac{22\pi}{3m}\Big)^2\,\frac{m/\rho}{1+(m\lambda/\rho)^2}\,d\lambda \\
&= \frac{1}{\pi}\Big(\frac{|f_1(s)|}{6} - \frac{22\pi}{3m}\Big)^2\int_{\pi/6}^{11\pi/6}\frac{\rho}{\rho^2+\lambda^2}\,d\lambda \ \ge\ \frac{60\rho}{36\rho^2+121\pi^2}\Big(\frac{|f_1(s)|}{6} - \frac{22\pi}{3m}\Big)^2.
\end{align*}
Recall that
\[
|f_1(s)| \ge c_6\Big(\sum_{j=0}^{3} b_j^2\Big)^{1/2} \ge c_6/\sqrt{3m}.
\]
Therefore, for Case 1, there exists a constant $c_7 > 0$ such that when $m \ge c_7$, it holds that
\[
b'P_s b \ge \frac{5\rho}{109\rho^2+364\pi^2}\,|f_1(s)|^2 \ge \frac{5\rho c_6^2}{109\rho^2+364\pi^2}\sum_{j=0}^{3} b_j^2.
\]
Case 2: $\max_{\lambda\in[0,2/m]}|g(\lambda)| > m|f_1(s)|/3$. This implies that there exists a $\lambda_0 \in [0, 2/m]$ such that $|g(\lambda_0)| \ge m|f_1(s)|/3 \ge c_6\sqrt{m}/6$. Since $b$ is a unit vector, $|g(\lambda)| \le \sqrt{m}$, and it follows that $|f_1(s)| \le 3/\sqrt{m}$. By Zygmund (2002), the derivative of $g(\lambda)$ satisfies $|g'(\lambda)| \le m^{3/2}$. Therefore, on an interval $I_1 \subset [0, 3/m]$ of length $c_6/(12m)$, $|g(\lambda)| \ge c_6\sqrt{m}/12$. On this interval, it holds that
\[
\Big|\int_s^{m+s} f(u)e^{iu\lambda}\,du\Big| \ge \frac{c_6\sqrt{m}}{12}\Big(\frac{2\sin(\lambda/2)}{\lambda}\Big)^4 - 4.
\]
There exists a $c_8 > 0$ such that when $m \ge c_8$,
\[
\Big|\int_s^{m+s} f(u)e^{iu\lambda}\,du\Big| \ge \frac{c_6\sqrt{m}}{13}.
\]
It follows that
\[
b'V_s b \ge \frac{\rho^2}{\pi m^2}\int_{I_1}\Big|\int_s^{m+s} f(u)e^{iu\lambda}\,du\Big|^2\,\frac{m/\rho}{1+(m\lambda/\rho)^2}\,d\lambda \ge \frac{\rho^2}{\pi m^2}\cdot\frac{c_6^2 m}{169}\cdot\frac{c_6}{12}\cdot\frac{\rho}{\rho^2+9} = \frac{c_6^3\rho^3}{2028\pi m(\rho^2+9)}.
\]
Note that $|f_1(s)| \ge c_6\big(\sum_{j=0}^3 b_j^2\big)^{1/2}$ and $|f_1(s)| \le 3/\sqrt{m}$ together imply $\sum_{j=0}^{3} b_j^2 \le 9/(c_6^2 m)$, so
\[
b'V_s b \ge \frac{c_6^5\rho^3}{18252\pi(\rho^2+9)}\sum_{j=0}^{3} b_j^2.
\]
On the other hand, writing the quadratic forms against the covariance kernel of an Ornstein-Uhlenbeck process $\varepsilon$ with $\mathrm{Cov}(\varepsilon(u),\varepsilon(v)) = \sigma^2 e^{-\rho|u-v|}/(2\rho)$,
\[
b'P_s b = \frac{2\rho}{\sigma^2}\,E\Big(\sum_{q=0}^{m+3} b_q\int_{-1+s/m}^{s/m}\varepsilon(u)\,B_{k,q+m+1}'(u)\,du\Big)^2 \ge 0,
\]
and similarly $b'V_s b \ge 0$, so both $P_s$ and $V_s$ are non-negative definite. Hence $b'(P_s+V_s)b$ is bounded below by each of the Case 1 and Case 2 bounds, and the proof is completed by setting $c_1 = \max\{c_7, c_8\}$ and
\[
c_2 = \frac{c_6^5\,\rho\min\{\rho^2, 1\}}{18252\pi(\rho^2+9)}.
\]

Lemma 5. The minimum eigenvalue of $\Gamma_k$, defined in Theorem 3, is of the order $k^{-1}$.

Proof: $\Gamma_k$ can be written as the sum of the following four matrices, $\Gamma_k = \sum_{i=1}^4 G_{k,i}$, where
\[
(G_{k,1})_{ij} = \int_0^1\int_0^1\int_0^1 \gamma_0(u,v)\,B_{k,i}'(s-u)\,B_{k,j}'(s-v)\,du\,dv\,ds,
\]
\[
(G_{k,2})_{ij} = \rho^2\int_0^1\int_0^1\int_0^1 \gamma_0(u,v)\,B_{k,i}(s-u)\,B_{k,j}(s-v)\,du\,dv\,ds,
\]
\[
(G_{k,3})_{ij} = \rho\int_0^1\int_0^1 \gamma_0(u,v)\,B_{k,i}(u)\,B_{k,j}(v)\,du\,dv,
\]
\[
(G_{k,4})_{ij} = \rho\int_0^1\int_0^1 \gamma_0(u,v)\,B_{k,i}(1-u)\,B_{k,j}(1-v)\,du\,dv,
\]
for $i,j = 1,\ldots,k$. Define $W_{k,1}$ and $W_{k,2}$ by
\[
(W_{k,1})_{ij} = \frac{\sigma^2}{2\rho}\int_0^1\int_0^1\int_0^1 e^{-\rho|u-v|}\,B_{k,i}'(s-u)\,B_{k,j}'(s-v)\,du\,dv\,ds,
\]
\[
(W_{k,2})_{ij} = \frac{\rho\sigma^2}{2}\int_0^1\int_0^1\int_0^1 e^{-\rho|u-v|}\,B_{k,i}(s-u)\,B_{k,j}(s-v)\,du\,dv\,ds,
\]
for $i,j = 1,\ldots,k$. For any $b = (b_1,\ldots,b_k)' \in \mathbb{R}^k$,
\begin{align*}
b'G_{k,1}b &= \int_0^1 E\Big(\sum_{i=1}^k b_i\int_0^1 X_t(u)\,B_{k,i}'(s-u)\,du\Big)^2\,ds \\
&= \int_0^1 E\Big(\sum_{i=1}^k b_i\int_0^1\Big[\varepsilon_t(u) + \int_0^1\phi(u-v)X_{t-1}(v)\,dv\Big]B_{k,i}'(s-u)\,du\Big)^2\,ds \\
&\ge \int_0^1 E\Big(\sum_{i=1}^k b_i\int_0^1\varepsilon_t(u)\,B_{k,i}'(s-u)\,du\Big)^2\,ds = b'W_{k,1}b \ge 0.
\end{align*}
So $G_{k,1} \succeq W_{k,1} \succeq 0$. In a similar way, we can show that $G_{k,2} \succeq W_{k,2} \succeq 0$. Thus $\Gamma_k \succeq W_{k,1} + W_{k,2}$.

Let $b = (b_0,\ldots,b_{2m+2})'$ be a unit vector, and let
\[
D_1 = \Big\{0 \le j \le 2m-1 : \sum_{l=j}^{j+3} b_l^2 \ge \frac{1}{3m}\Big\},
\]
and $D_2 = \{0,1,\ldots,2m-1\}\setminus D_1$. Since $\sum_{j\in D_2}\sum_{l=j}^{j+3} b_l^2 \le 2m\cdot\frac{1}{3m} = \frac{2}{3}$, it holds that $\sum_{j\in D_1}\sum_{l=j}^{j+3} b_l^2 \ge \frac{1}{3}$. For each $j \in D_1$ with $j \le m-1$, by Lemma 4, there exists an interval $I_j \subset [-1+j/m,\,-1+(j+1)/m]$ of length $c_5/m$, such that for each $s \in I_j$,
\[
\int_0^1\int_0^1\sum_{h,l=0}^{2m+2} b_h b_l\Big[B_{k,h}'(s-u)B_{k,l}'(s-v) + \rho^2 B_{k,h}(s-u)B_{k,l}(s-v)\Big]e^{-\rho|u-v|}\,du\,dv \ \ge\ c_6\sum_{l=j}^{j+3} b_l^2.
\]
It follows that
\[
\int_0^1\int_0^1\int_0^1\sum_{h,l=0}^{2m+2} b_h b_l\Big[B_{k,h}'(s-u)B_{k,l}'(s-v) + \rho^2 B_{k,h}(s-u)B_{k,l}(s-v)\Big]e^{-\rho|u-v|}\,du\,dv\,ds \ \ge\ \frac{c_5 c_6}{m}\sum_{j\in D_1,\,j\le m-1}\ \sum_{l=j}^{j+3} b_l^2.
\]
By reverting the order of the rows and the columns, we can show that for each $j \in D_1$ with $j \ge m$, there exists an interval $I_j \subset [-1+j/m,\,-1+(j+1)/m]$ of length $c_5/m$ such that the same inequality holds for each $s \in I_j$; so similarly,
\[
\int_0^1\int_0^1\int_0^1\sum_{h,l=0}^{2m+2} b_h b_l\Big[B_{k,h}'(s-u)B_{k,l}'(s-v) + \rho^2 B_{k,h}(s-u)B_{k,l}(s-v)\Big]e^{-\rho|u-v|}\,du\,dv\,ds \ \ge\ \frac{c_5 c_6}{m}\sum_{j\in D_1,\,j\ge m}\ \sum_{l=j}^{j+3} b_l^2.
\]
Therefore,
\[
b'(W_{k,1}+W_{k,2})b \ge \frac{c_5 c_6}{6m}.
\]
Since $k = 2m+3$, this gives a lower bound of the order $k^{-1}$ for the minimum eigenvalue of $\Gamma_k$, and the proof is complete.
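As a quick numerical plausibility check of the $k^{-1}$ eigenvalue scaling established by Lemmas 4 and 5, one can form analogous Gram-type matrices from a cubic B-spline basis and the kernel $e^{-\rho|u-v|}$ and watch the smallest eigenvalue decay like $1/k$. The sketch below uses clamped splines on $[0,1]$ and simple Riemann quadrature for convenience; it is not the uniform $[-1,1]$ construction of the lemmas.

```python
import numpy as np
from scipy.interpolate import BSpline

def basis_and_derivative(num_intervals, x, deg=3):
    # Clamped cubic B-spline basis on [0,1] with uniform interior knots,
    # evaluated (with first derivatives) on the grid x.
    t = np.r_[[0.0] * (deg + 1),
              np.linspace(0, 1, num_intervals + 1)[1:-1],
              [1.0] * (deg + 1)]
    n = len(t) - deg - 1
    B = np.empty((len(x), n)); Bp = np.empty((len(x), n))
    for j in range(n):
        c = np.zeros(n); c[j] = 1.0
        spl = BSpline(t, c, deg)
        B[:, j] = spl(x); Bp[:, j] = spl.derivative()(x)
    return B, Bp

rho = 1.0
x = np.linspace(0, 1, 801); h = x[1] - x[0]
K = np.exp(-rho * np.abs(x[:, None] - x[None, :]))   # kernel e^{-rho|u-v|}
for m in (4, 8, 16, 32):
    B, Bp = basis_and_derivative(m, x)
    P = h * h * (Bp.T @ K @ Bp)                      # derivative Gram matrix
    V = rho**2 * h * h * (B.T @ K @ B)               # level Gram matrix
    lam = np.linalg.eigvalsh(P + V).min()
    k = B.shape[1]
    print(k, lam, k * lam)                           # k * lam roughly constant
```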
Lemma 6. Assume $\Xi : [0,1]^2 \to [0,1]$ is a function satisfying
\[
\|\Xi^{(1)}\|_\infty = \sup_{0\le u_1,u_2,v\le 1,\ u_1\ne u_2}\Big\{\frac{|\Xi(u_1,v)-\Xi(u_2,v)|}{|u_1-u_2|},\ \frac{|\Xi(v,u_1)-\Xi(v,u_2)|}{|u_1-u_2|}\Big\} < \infty,
\]
\[
\|\Xi^{(2)}\|_\infty = \sup_{0\le u,v\le 1,\ u\ne v}\Big\{\Big|\frac{\partial^2\Xi(u,v)}{\partial u^2}\Big|,\ \Big|\frac{\partial^2\Xi(u,v)}{\partial u\partial v}\Big|,\ \Big|\frac{\partial^2\Xi(u,v)}{\partial v^2}\Big|\Big\} < \infty. \tag{38}
\]
Define the $k\times k$ matrix $\Lambda$ with $(j,l)$-th entry
\[
\int_0^1\int_0^1\int_0^1 B_{k,j}'(s-u)\,B_{k,l}'(s-v)\,\Xi(u,v)\,du\,dv\,ds. \tag{39}
\]
Then $\|\Lambda\| = O(k^{-1})$.
Proof: Let $S_j$ be the support of $B_{k,j}$, and let $S_{jl}$ be the set of $s$ for which the integral
\[
\int_0^1\int_0^1 B_{k,j}'(s-u)\,B_{k,l}'(s-v)\,\Xi(u,v)\,du\,dv \tag{40}
\]
is nonzero. Define two subsets of $S_{jl}$:
\[
S_{jl}^1 = \{s \in S_{jl} : (S_j\cup S_l) \not\subset [s-1, s]\},
\]
and $S_{jl}^2 = S_{jl}\setminus S_{jl}^1$. Use $\lambda$ to denote the Lebesgue measure. Let $k = 2m+3$, and note that $1/m$ is the mesh size. For each $1 \le j \le k$, we consider the entries in the $j$-th row of the matrix $\Lambda$. There are three cases.

Case 1. If $||l-j|-m| \le 3$, then $\lambda(S_{jl}) \le 7/m$, and it follows that $|(\Lambda)_{jl}| \le C_1/k$, where $C_1$ is a constant depending only on $\Xi$. Observe that for each $j$, the inequality $||l-j|-m| \le 3$ holds for at most 14 different values of $l$.

Case 2. If $|j-l| \le 3$, then $\lambda(S_{jl}^1) \le 7/m$, and integrating (40) over $s \in S_{jl}^1$ leads to a value bounded by $C_1/(2k)$. For $s \in S_{jl}^2$, the integral (40) can be written as
\[
\int_0^1\int_0^1 B_{k,j}'(s-u)\,B_{k,l}'(s-v)\,\Xi(u,v)\,du\,dv = \int_{S_j}\int_{S_l} B_{k,j}'(u)\,B_{k,l}'(v)\,\Xi(s-u, s-v)\,du\,dv.
\]
Let $u_j$ be the left boundary of $S_j$; then, since $\int_{S_j}B_{k,j}'(u)\,du = 0$,
\begin{align*}
\Big|\int_{S_j}\int_{S_l} B_{k,j}'(u)\,B_{k,l}'(v)\,\Xi(s-u, s-v)\,du\,dv\Big| &= \Big|\int_{S_j}\int_{S_l} B_{k,j}'(u)\,B_{k,l}'(v)\big[\Xi(s-u, s-v) - \Xi(s-u_j, s-v)\big]\,du\,dv\Big| \\
&\le \int_{S_j}\int_{S_l}|B_{k,j}'(u)B_{k,l}'(v)|\,\|\Xi^{(1)}\|_\infty\,(u-u_j)\,du\,dv \le C_1/(2k). \tag{41}
\end{align*}
Since $\lambda(S_{jl}^2) \le 1$, we have $|(\Lambda)_{jl}| \le C_1/k$.

Case 3. If $3 < |j-l| < m-3$, then for each $s \in S_{jl}^1$ it must hold that either $S_j \subset [s-1,s]$ or $S_l \subset [s-1,s]$. Similarly to (41), for each $s \in S_{jl}^1$ the integral (40) has absolute value less than $C_1/k$. For $s \in S_{jl}^2$, since $S_j$ and $S_l$ have no intersection, we have
\[
\Xi(s-u, s-v) = \Xi(s-u_j, s-v_l) - \frac{\partial\Xi(s-u_j, s-v_l)}{\partial u}(u-u_j) - \frac{\partial\Xi(s-u_j, s-v_l)}{\partial v}(v-v_l) + R(u,v),
\]
where
\[
|R(u,v)| \le \|\Xi^{(2)}\|_\infty\,(u-u_j+v-v_l)^2, \qquad u \in S_j,\ v \in S_l.
\]
Therefore,
\begin{align*}
\Big|\int_{S_j}\int_{S_l} B_{k,j}'(u)\,B_{k,l}'(v)\,\Xi(s-u,s-v)\,du\,dv\Big| &= \Big|\int_{S_j}\int_{S_l} B_{k,j}'(u)\,B_{k,l}'(v)\,R(u,v)\,du\,dv\Big| \\
&\le \int_{S_j}\int_{S_l}|B_{k,j}'(u)B_{k,l}'(v)|\,\|\Xi^{(2)}\|_\infty\,(u-u_j+v-v_l)^2\,du\,dv \le C_1/k^2. \tag{42}
\end{align*}
Since $\lambda(S_{jl}^1) \le 8/m$ and $\lambda(S_{jl}^2) \le 1$, it holds that
\[
|(\Lambda)_{jl}| \le \frac{C_1}{k}\cdot\frac{8}{m} + \frac{C_1}{k^2} \le \frac{C_2}{k^2},
\]
where $C_2$ is a constant which only depends on $\Xi$.

Combining these three cases, we see that there exists a constant $C_3$ which only depends on $\Xi$ such that $\|\Lambda\|_\infty \le C_3/k$ and $\|\Lambda\|_1 \le C_3/k$. Therefore,
\[
\|\Lambda\| \le \|\Lambda\|_\infty^{1/2}\,\|\Lambda\|_1^{1/2} \le C_3/k.
\]
Proof of Theorem 4: By Corollary 6.26 of Schumaker (1981),
\[
\|\phi - \tilde\phi_k\|_\infty \le c_1 k^{-\zeta}, \qquad \|\phi'(\cdot) - \tilde\phi_k'(\cdot)\|_\infty \le c_1 k^{-\zeta+1}, \tag{43}
\]
where $c_1$ is an absolute constant. There are four terms in the definition of $A(B_{k,j}, r_k)$. We first consider
\[
\int_0^1\int_0^1\int_0^1 B_{k,j}'(s-u)\,r_k'(s-v)\,\gamma(u,v)\,du\,dv\,ds.
\]
Let
\[
\bar S_j = \{s : \lambda(S_j\cap[s-1,s]) > 0\},
\]
and $S_j^1 = \{s \in \bar S_j : S_j \not\subset [s-1,s]\}$, $S_j^2 = \bar S_j\setminus S_j^1$. When $s \in S_j^1$, it holds that
\[
\Big|\int_0^1\int_0^1 B_{k,j}'(s-u)\,r_k'(s-v)\,\gamma(u,v)\,du\,dv\Big| \le C k^{-\zeta+1},
\]
where $C$ is a constant which only depends on $\rho$, $\sigma^2$ and $\phi$. When $s \in S_j^2$ and $3 < j < k-2$, using the fact that $\int_{S_j} B_{k,j}'(u)\,du = 0$ and Lemma 2, we have
\[
\Big|\int_0^1\int_0^1 B_{k,j}'(s-u)\,r_k'(s-v)\,\gamma(u,v)\,du\,dv\Big| = \Big|\int_0^1\int_{S_j} B_{k,j}'(u)\,r_k'(s-v)\big[\gamma(s-u,v) - \gamma(s-u_j,v)\big]\,du\,dv\Big| \le C k^{-\zeta}.
\]
Noticing that $\lambda(S_j^1) \le 4/m$, and combining the previous two cases, we see that
\[
\Big|\int_0^1\int_0^1\int_0^1 B_{k,j}'(s-u)\,r_k'(s-v)\,\gamma(u,v)\,du\,dv\,ds\Big| \le C k^{-\zeta}.
\]
It can be shown that the other three terms in the definition of $A(B_{k,j}, r_k)$ have the same order $O(k^{-\zeta})$, and hence
\[
|(\mu_k)_j| \le C k^{-\zeta}.
\]
By Lemma 5,
\[
\|b_k\| = \|\Gamma_k^{-1}\mu_k\| = O(k^{-\zeta+3/2}).
\]
It follows that
\[
\|b_k(\cdot)\|_\infty \le \|B_k(\cdot)'b_k\|_\infty + \|r_k(\cdot)\|_\infty = O(k^{-\zeta+3/2}).
\]
For the statement regarding $\|\sigma_k^2(\cdot)\|_\infty$, it suffices to show that
\[
\|\Sigma_k\| = O(k).
\]
We proceed by calculating the long-run variances of the three processes $u_t$, $z_t$ and $A_t b_k$. First we observe that $z_t$ is a martingale difference sequence, and
\[
\mathrm{Var}(z_t) = \Gamma_k.
\]
The covariance between $u_{tj}$ and $u_{t+h,l}$ is
\[
\mathrm{Cov}(u_{tj}, u_{t+h,l}) = \int_{[0,1]^4} A(B_{k,j}, r_k)(u_1,v_1)\,A(B_{k,l}, r_k)(u_2,v_2)\,\big[\gamma_h(u_1,u_2)\gamma_h(v_1,v_2) + \gamma_h(u_1,v_2)\gamma_h(v_1,u_2)\big]\,du_1\,dv_1\,du_2\,dv_2.
\]
By (43), we have
\[
|A(B_{k,j}, r_k)(u,v)| \le C_\rho\,k^{-\zeta+1},
\]
where $C_\rho$ is a constant which only depends on $\rho$. The preceding bound, together with Lemma 2, implies that
\[
|\mathrm{Cov}(u_{tj}, u_{t+h,l})| \le 2C_\rho^2(1-\kappa^2)^{-2}\sigma^4\,k^{-2\zeta+2}\,\kappa^{2|h|}.
\]
Therefore, the operator norm of the long-run variance of $u_t$ is of the order $O(k^{-2\zeta+3})$.

The $(j_1,j_2)$-th entry of the matrix $E(A_t b_k b_k' A_{t+h}')$ is
\[
\sum_{l_1,l_2=1}^{k}\int_{[0,1]^4} A(B_{k,l_1}, B_{k,j_1})(u_1,v_1)\,(b_k)_{l_1}(b_k)_{l_2}\,A(B_{k,l_2}, B_{k,j_2})(u_2,v_2)\,\big[\gamma_h(u_1,u_2)\gamma_h(v_1,v_2) + \gamma_h(u_1,v_2)\gamma_h(v_1,u_2)\big]\,du_1\,dv_1\,du_2\,dv_2. \tag{44}
\]
By the definition of $A(\cdot,\cdot)$, the product $A(B_{k,l_1}, B_{k,j_1})A(B_{k,l_2}, B_{k,j_2})$ can be expanded into 16 terms. We first consider the term
\[
\int_{[0,1]^6} B_{k,l_1}'(s_1-u_1)\,B_{k,j_1}'(s_1-v_1)\,B_{k,l_2}'(s_2-u_2)\,B_{k,j_2}'(s_2-v_2)\,\big[\gamma_h(u_1,u_2)\gamma_h(v_1,v_2) + \gamma_h(u_1,v_2)\gamma_h(v_1,u_2)\big]\,ds_1\,ds_2\,du_1\,dv_1\,du_2\,dv_2. \tag{45}
\]
Let $\mathcal{B}_k$ be the set of pairs $(j,l)$ such that either $|j-l| \le 3$ or $||j-l|-m| \le 3$. Lemma 2 gives bounds on the derivatives of $\gamma_h(u,v)$. Using these bounds, and a similar argument as in the proof of Lemma 6, we can show that there exists a constant $c_2 > 0$ such that the absolute value of (45) is bounded by
\[
\begin{cases}
c_2\kappa^{2h}/k^2 & \text{if } (j_1,l_1) \in \mathcal{B}_k \text{ and } (j_2,l_2) \in \mathcal{B}_k;\\
c_2\kappa^{2h}/k^3 & \text{if one and only one of } (j_1,l_1) \text{ and } (j_2,l_2) \text{ belongs to } \mathcal{B}_k;\\
c_2\kappa^{2h}/k^4 & \text{if } (j_1,l_1) \notin \mathcal{B}_k \text{ and } (j_2,l_2) \notin \mathcal{B}_k.
\end{cases}
\]
It can be shown that the other 15 terms obey the same bounds, and we omit the details. Let $C$ be a $k\times k$ matrix whose $(j,l)$-th entry is $1/k$ when $(j,l) \in \mathcal{B}_k$, and $1/k^2$ when $(j,l) \notin \mathcal{B}_k$; then
\[
\|E(A_t b_k b_k' A_{t+h}')\| \le c_3\kappa^{2h}\,\big\|C\,|b_k|\,|b_k|'\,C\big\| \le c_4\kappa^{2h}\,k^{-2\zeta+3},
\]
where $c_3$ and $c_4$ are absolute constants, and $|b_k|$ consists of the entry-wise absolute values of $b_k$. Therefore, the long-run variance of $A_t b_k$ is of the order $O(k^{-2\zeta+3})$, and the proof is complete.
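In practice, the long-run variance $\Upsilon_k$ appearing in the limiting covariance $\Gamma_k^{-1}\Upsilon_k\Gamma_k^{-1}$ would be estimated from the observed vector sequence; a standard Bartlett-kernel (Newey-West) estimator is sketched below. The bandwidth rule is an illustrative choice, not one prescribed by the paper.

```python
import numpy as np

def long_run_variance(U, bandwidth):
    """Bartlett-kernel estimate of the long-run variance of a mean-zero
    stationary vector sequence U of shape (T, k)."""
    T = U.shape[0]
    Upsilon = U.T @ U / T                          # lag-0 covariance
    for h in range(1, bandwidth + 1):
        w = 1.0 - h / (bandwidth + 1.0)            # Bartlett weights
        G = U[h:].T @ U[:-h] / T                   # lag-h autocovariance
        Upsilon += w * (G + G.T)
    return Upsilon
```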
Lemma 7. If the CFAR(p) process $(X_t, t\in\mathbb{Z})$ satisfies (4) and $\phi_i \in C^2[-1,1]$ for $i = 1,\ldots,p$, then $\mathbf{X}_t = \sum_{j=0}^{\infty}\Delta^j\boldsymbol{\varepsilon}_{t-j}$ is a stationary process, where $\mathbf{X}_t = (X_t, X_{t-1},\ldots,X_{t-p+1})'$, $\boldsymbol{\varepsilon}_t = (\varepsilon_t, 0, \ldots, 0)'$, and $\Delta$ is defined in (26). There exist three constants $\iota$, $t_0$ and $\lambda$ which only depend on $\{\phi_i, i=1,\ldots,p\}$ and $\rho$ such that:

(i) $\|\Delta^h\| \le \iota\lambda^h$ for $h \ge 2$.

(ii) $\|\gamma_h\|_\infty \le \sigma^2\iota\lambda^{|h|}$ for $h \in \mathbb{Z}$.

(iii) For $h \ge 1$,
\[
\max_{u,v}\Big\{\Big|\frac{\partial\gamma_h(u,v)}{\partial u}\Big|,\ \Big|\frac{\partial\gamma_h(u,v)}{\partial v}\Big|\Big\} \le \sigma^2\iota\kappa^h.
\]

(iv) For $h \ge 1$,
\[
\max_{u,v}\Big\{\Big|\frac{\partial^2\gamma_h(u,v)}{\partial u^2}\Big|,\ \Big|\frac{\partial^2\gamma_h(u,v)}{\partial u\partial v}\Big|,\ \Big|\frac{\partial^2\gamma_h(u,v)}{\partial v^2}\Big|\Big\} \le \sigma^2\iota\kappa^h.
\]

Proof: By Lemma 5.1 and Theorem 5.1 in Bosq (2000), it is straightforward to see that $\mathbf{X}_t$ has the following expression: $\mathbf{X}_t = \sum_{j=0}^{\infty}\Delta^j\boldsymbol{\varepsilon}_{t-j}$. Since we show that $\|\Delta^j\|^{1/j} \to \lambda_{\max} < 1$ in the proof of Theorem 2, there exists an integer $t_0$ such that $\|\Delta^h\| \le \lambda^h$ for $h \ge t_0$, where $\lambda = (1+\lambda_{\max})/2$. For $h \ge t_0$ and any $u, v \in [0,1]$,
\begin{align*}
\gamma_h(u,v) &= \frac{1}{p}E\big(\mathbf{X}_t(u)'\mathbf{X}_{t+h}(v)\big) = \frac{1}{p}\sum_{j=0}^{\infty}E\Big[(\Delta^j\boldsymbol{\varepsilon}_{t-j})(u)'(\Delta^{j+h}\boldsymbol{\varepsilon}_{t-j})(v)\Big] \\
&\le \frac{1}{p}\sum_{j=0}^{\infty}\big\|(\Delta^j\boldsymbol{\varepsilon}_{t-j})(u)\big\|_2\,\big\|(\Delta^{j+h}\boldsymbol{\varepsilon}_{t-j})(v)\big\|_2 \le \frac{\delta^2\sigma^2\lambda^h}{2\rho p(1-\lambda^2)},
\end{align*}
where $\delta = \max_{j\ge 0}\|\Delta^j\|\lambda^{-j}$. By the definition of $\Delta_i$, we have
\[
\Delta_{q_1}\Delta_{q_2}\cdots\Delta_{q_n} f(s) = \int_{[0,1]^n}\phi_{q_1}(s-u_1)\,\phi_{q_2}(u_1-u_2)\cdots\phi_{q_n}(u_{n-1}-u_n)\,f(u_n)\,du_1\cdots du_n.
\]
The entries in the $i$-th row of $\Delta^p$ can be regarded as polynomials in $\{\Delta_1,\ldots,\Delta_p\}$ of degree $p-i+1$, without intercept (the identity operator). Hence, we can define two linear operators $\Delta^{(1)}$ and $\Delta^{(2)}$ on $\mathcal{H}^p[0,1]$ such that $\Delta^{(1)}f(s) = [\Delta^p f(s)]'$ and $\Delta^{(2)}f(s) = [\Delta^p f(s)]''$. Specifically, assume that the $(i,j)$-th entry of $\Delta^p$ is $\Delta_{q_1}\Delta_{q_2}\cdots\Delta_{q_n}$; we define the $(i,j)$-th entries of $\Delta^{(1)}$ and $\Delta^{(2)}$ as
\[
(\Delta^{(1)}f(s))_{ij} = \int_{[0,1]^n}\phi_{q_1}'(s-u_1)\,\phi_{q_2}(u_1-u_2)\cdots\phi_{q_n}(u_{n-1}-u_n)\,f(u_n)\,du_1\cdots du_n,
\]
\[
(\Delta^{(2)}f(s))_{ij} = \int_{[0,1]^n}\phi_{q_1}''(s-u_1)\,\phi_{q_2}(u_1-u_2)\cdots\phi_{q_n}(u_{n-1}-u_n)\,f(u_n)\,du_1\cdots du_n.
\]
Then the operator norms of $\Delta^{(1)}$ and $\Delta^{(2)}$ are bounded by $p\,d_{p,1}\kappa_{\max}^{p-1}$ and $p\,d_{p,2}\kappa_{\max}^{p-1}$ respectively, where $\kappa_{\max} = \max\{\kappa_1,\ldots,\kappa_p,1\}$, $d_{p,1} = \max_{1\le j\le p,\,-1\le s\le 1}|\phi_j'(s)|$, and $d_{p,2} = \max_{1\le j\le p,\,-1\le s\le 1}|\phi_j''(s)|$. On the other hand, by the definition of $\Delta$, it can be shown that the operator norms of $\partial(\Delta^j f(s))/\partial s$ and $\partial^2(\Delta^j f(s))/\partial s^2$ are bounded by $p\,d_{p,1}\kappa_{\max}^{p-1}\|f\|_\infty$ and $p\,d_{p,2}\kappa_{\max}^{p-1}\|f\|_\infty$ respectively, for $1 \le j < p$. It follows that, for $h \ge t_0 + p$,
\[
\Big|\frac{\partial\gamma_h(u,v)}{\partial u}\Big| \le \frac{\sigma^2\delta\delta_1\lambda^h}{2\rho p(1-\lambda)}, \qquad \Big|\frac{\partial\gamma_h(u,v)}{\partial v}\Big| \le \frac{\sigma^2\delta d_{p,1}\kappa_{\max}^{p-1}\lambda^{h-p}}{2\rho(1-\lambda)},
\]
\[
\Big|\frac{\partial^2\gamma_h(u,v)}{\partial u^2}\Big| \le \frac{\sigma^2\delta\delta_2\lambda^h}{2\rho p(1-\lambda)}, \qquad \Big|\frac{\partial^2\gamma_h(u,v)}{\partial v^2}\Big| \le \frac{\sigma^2\delta d_{p,2}\kappa_{\max}^{p-1}\lambda^{h-p}}{2\rho(1-\lambda)},
\]
\[
\Big|\frac{\partial^2\gamma_h(u,v)}{\partial u\partial v}\Big| \le \frac{\sigma^2\delta_2\delta d_{p,2}\kappa_{\max}^{p-1}\lambda^{h-p}}{2\rho(1-\lambda)},
\]
where $\delta_1 = \max\{\rho,\ p\,d_{p,1}\kappa_{\max}^{p-1}\delta\}$ and $\delta_2 = \max\{\rho^2,\ p\,d_{p,2}\kappa_{\max}^{p-1}\delta\}$. The proof is complete.
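The geometric decay in Lemma 7(ii) is easy to observe by simulation. The sketch below generates a CFAR(1) process on a grid and prints $\|\gamma_h\|_\infty$ for a few lags; the convolution kernel $\phi$ and all tuning values are arbitrary illustrative choices (with $\int|\phi| < 1$ so that the process is stationary).

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, rho, sigma = 50, 5000, 2.0, 1.0
grid = np.arange(N + 1) / N
phi = lambda s: 0.5 * np.exp(-4 * s**2)           # illustrative kernel on [-1,1]
Phi = phi(grid[:, None] - grid[None, :]) / N      # discretized convolution operator

# exact OU innovations on the grid, as in the earlier sketch
a = np.exp(-rho / N)
sd = sigma * np.sqrt((1 - a**2) / (2 * rho))
def ou():
    e = np.empty(N + 1)
    e[0] = rng.normal(0, sigma / np.sqrt(2 * rho))
    for i in range(N):
        e[i + 1] = a * e[i] + sd * rng.normal()
    return e

X = np.zeros((T, N + 1))
for t in range(1, T):
    X[t] = Phi @ X[t - 1] + ou()

X -= X.mean(axis=0)
for h in range(6):
    gamma_h = X[:T - h].T @ X[h:] / (T - h)       # gamma_h(u, v) on the grid
    print(h, np.abs(gamma_h).max())               # decays geometrically in h
```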
Lemma 8. The CFAR(p) process $(X_t, t\in\mathbb{Z})$ satisfies (4). Let $\varepsilon_0^*$ be an i.i.d. copy of $\varepsilon_0$, and let $X_t^*$ be obtained by replacing $\varepsilon_0$ with $\varepsilon_0^*$ in the definition of $X_t$. Let $f : [0,1]^2 \to \mathbb{R}$ be a continuous function, set $\|f\|_\infty = \max_{u,v}|f(u,v)|$, and for $-p \le h \le p$ define
\[
Y_{t,h} = \int_0^1\int_0^1 f(u,v)\,X_t(u)\,X_{t-h}(v)\,du\,dv, \qquad Y_{t,h}^* = \int_0^1\int_0^1 f(u,v)\,X_t^*(u)\,X_{t-h}^*(v)\,du\,dv.
\]
Then, for each $q \in \mathbb{N}$, there exists a constant $\iota$ which depends on $\{\phi_i, i=1,\ldots,p\}$, $\rho$, $\sigma^2$, $p$ and $f(u,v)$, such that
\[
\|Y_{t,h} - Y_{t,h}^*\|_q \le \iota\lambda^t.
\]

Proof: By the Cauchy-Schwarz inequality and Lemma 7, for $t \ge t_0 + p$,
\begin{align*}
\|Y_{t,h} - Y_{t,h}^*\|_q &\le \Big\|\int_0^1\int_0^1 f(u,v)\big[X_t(u) - X_t^*(u)\big]X_{t-h}(v)\,du\,dv\Big\|_q + \Big\|\int_0^1\int_0^1 f(u,v)\big[X_{t-h}(v) - X_{t-h}^*(v)\big]X_t^*(u)\,du\,dv\Big\|_q \\
&\le \|f\|_\infty\int_0^1\int_0^1\big\|(\Delta^t\boldsymbol{\varepsilon}_0)(u) - (\Delta^t\boldsymbol{\varepsilon}_0^*)(u)\big\|_{2q}\,\|X_{t-h}(v)\|_{2q}\,du\,dv \\
&\qquad + \|f\|_\infty\int_0^1\int_0^1\big\|(\Delta^{t-h}\boldsymbol{\varepsilon}_0)(u) - (\Delta^{t-h}\boldsymbol{\varepsilon}_0^*)(u)\big\|_{2q}\,\|X_t^*(v)\|_{2q}\,du\,dv \\
&\le 2\|f\|_\infty\,\lambda^{t-p}\,\big[(2q-1)!!\big]^{1/(2q)}\,\frac{\sigma}{\rho^{1/2}}\,\gamma_{\max}^{1/2}\,\big[(2q-1)!!\big]^{1/(2q)} \\
&= \frac{2\sigma\gamma_{\max}^{1/2}}{\rho^{1/2}}\,\big[(2q-1)!!\big]^{1/q}\,\|f\|_\infty\,\lambda^{t-p},
\end{align*}
where $\gamma_{\max} = \max_{s\in[0,1]}\mathrm{Var}(X_t(s))$.
Proof of Theorem 5: The proof is similar to that of Theorem 3. $X_t$ can be decomposed as
\[
X_t(s) = \sum_{i=1}^{p}\sum_{j=1}^{k}\tilde\beta_{k,i,j}\,(B_{k,j} * \tilde X_{t-i})(s) + \sum_{i=1}^{p}(r_{k,i} * \tilde X_{t-i})(s) + \varepsilon_t(s) + \sum_{i=1}^{p}(\phi_i * \delta_t)(s).
\]
Let $\tilde v_t$ and $\tilde w_t$ be two $(N+1)$-dimensional vectors whose $j$-th entries are $\sum_{i=1}^{p}(r_{k,i} * \tilde X_{t-i})((j-1)/N)$ and $\sum_{i=1}^{p}(\phi_i * \delta_t)((j-1)/N)$, respectively. Let $\varepsilon_t$ be the $(N+1)$-dimensional vector whose $j$-th entry is $\varepsilon_t((j-1)/N)$. The estimate $\hat\beta_k$ can be decomposed as
\[
\hat\beta_k = \tilde\beta_k + \Big(\sum_{t=p+1}^{T} M_t'\Sigma^{-1}M_t\Big)^{-1}\sum_{t=p+1}^{T} M_t'\Sigma^{-1}(\tilde v_t + \varepsilon_t + \tilde w_t).
\]
We claim that under the condition $T = o(N^2)$,
\[
\frac{1}{\sqrt{T}}\sum_{t=p+1}^{T} M_t'\Sigma^{-1}\tilde w_t = o_p(1). \tag{46}
\]
Let $\mu_{k,N} = E(M_t'\Sigma^{-1}\tilde v_t)$, $\Gamma_{k,N} = E(M_t'\Sigma^{-1}M_t)$, and $b_{k,N} = \Gamma_{k,N}^{-1}\mu_{k,N}$; then
\[
\hat\beta_k - \tilde\beta_k = b_{k,N} + \Gamma_{k,N}^{-1}\frac{1}{T}\sum_{t=p+1}^{T}\big(M_t'\Sigma^{-1}\tilde v_t - \mu_{k,N}\big) - \Gamma_{k,N}^{-1}\frac{1}{T}\sum_{t=p+1}^{T}\big(M_t'\Sigma^{-1}M_t - \Gamma_{k,N}\big)b_{k,N} + \Gamma_{k,N}^{-1}\frac{1}{T}\sum_{t=p+1}^{T} M_t'\Sigma^{-1}\varepsilon_t + o_p(T^{-1/2}). \tag{47}
\]
We claim that
\[
\hat\beta_k - \tilde\beta_k = b_k + \Gamma_k^{-1}\frac{1}{T}\sum_{t=p+1}^{T}(u_t - \mu_k) - \Gamma_k^{-1}\frac{1}{T}\sum_{t=p+1}^{T}(A_t - \Gamma_k)b_k + \Gamma_k^{-1}\frac{1}{T}\sum_{t=p+1}^{T} z_t + o_p(T^{-1/2}). \tag{48}
\]
By Lemma 8, the proof can be finished in the same way as that of Theorem 3.
Lemma 9. Let $q \ge 2$ be an integer. Consider two $(qk)\times(qk)$ symmetric positive semi-definite matrices
\[
\Omega_1 = \begin{pmatrix}\Omega & 0\\ 0 & 0\end{pmatrix}, \qquad \Omega_2 = \begin{pmatrix}\Omega_{11} & \Omega_{12}\\ \Omega_{21} & \Omega_{22}\end{pmatrix},
\]
where $\Omega$ and $\Omega_{11}$ are $(q-1)k$-dimensional square matrices. Assume there exist positive constants $c_1 \le c_2$ such that
\[
\Omega \succeq c_1 I_{(q-1)k}, \qquad \|\Omega_{11}\| \le c_2, \qquad \Omega_{22} \succeq c_1 I_k.
\]
Then
\[
\Omega_1 + \Omega_2 \succeq \frac{c_1^2}{c_1 + 2c_2}\,I_{qk}.
\]

Proof: Let $0 < c < 1$, and define
\[
\Omega_3 = \begin{pmatrix}c^{-1/2} I_{(q-1)k} & 0\\ 0 & c^{1/2} I_k\end{pmatrix}\begin{pmatrix}\Omega_{11} & \Omega_{12}\\ \Omega_{21} & \Omega_{22}\end{pmatrix}\begin{pmatrix}c^{-1/2} I_{(q-1)k} & 0\\ 0 & c^{1/2} I_k\end{pmatrix}.
\]
We see that $\Omega_3$ is positive semi-definite, and
\[
\Omega_1 + \Omega_2 - \Omega_3 = \begin{pmatrix}\Omega - (c^{-1}-1)\Omega_{11} & 0\\ 0 & (1-c)\Omega_{22}\end{pmatrix}.
\]
Since $\Omega \succeq c_1 I_{(q-1)k}$ and $\|\Omega_{11}\| \le c_2$, it holds that
\[
\Omega - (c^{-1}-1)\Omega_{11} \succeq \big(c_1 - (c^{-1}-1)c_2\big)I_{(q-1)k}.
\]
By taking $c = 2c_2/(c_1+2c_2)$, we have
\[
\Omega_1 + \Omega_2 \succeq \Omega_3 + \frac{c_1^2}{c_1+2c_2}\,I_{qk} \succeq \frac{c_1^2}{c_1+2c_2}\,I_{qk},
\]
and the proof is complete.
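A small numerical spot-check of the bound in Lemma 9 follows; all matrix sizes and random draws are arbitrary test values.

```python
import numpy as np

rng = np.random.default_rng(3)
q, k = 3, 4
d1, d = (q - 1) * k, q * k
c1 = 0.5

A = rng.standard_normal((d1, d1))
Omega = A @ A.T + c1 * np.eye(d1)                  # Omega >= c1 I
Omega1 = np.zeros((d, d)); Omega1[:d1, :d1] = Omega

B = rng.standard_normal((d, d))
Omega2 = B @ B.T                                   # symmetric PSD
Omega2[d1:, d1:] += c1 * np.eye(k)                 # Omega22 >= c1 I (sum of PSD parts)

c2 = np.linalg.norm(Omega2[:d1, :d1], 2)           # ||Omega11||, spectral norm
lam_min = np.linalg.eigvalsh(Omega1 + Omega2).min()
print(lam_min, c1**2 / (c1 + 2 * c2))              # lam_min should exceed the bound
```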
Lemma 10. Assume that $\phi_1,\ldots,\phi_p \in C^2[-1,1]$. The minimum eigenvalue of $\Gamma_k$, defined in Theorem 5, is of the order $k^{-1}$.

Proof: For any function $\Psi_i$, denote by $(\Psi_i * \varepsilon_t)(u)$ the random variable $\int_0^1\Psi_i(u,v)\varepsilon_t(v)\,dv$. Define $\gamma_{11}(u,v) = \sigma^2 e^{-\rho|u-v|}$. For each $2 \le i \le p$, let $\gamma_{1i}(u,v) = \mathrm{Cov}[\varepsilon_t(u), (\Psi_{i-1} * \varepsilon_t)(v)]$ and $\gamma_{i1}(u,v) = \gamma_{1i}(v,u)$. For every pair $2 \le i,j \le p$, define $\gamma_{ij}(u,v) = \mathrm{Cov}[(\Psi_{i-1} * \varepsilon_t)(u), (\Psi_{j-1} * \varepsilon_t)(v)]$. Now for each pair $1 \le i,j \le p$, define a $k\times k$ matrix $\Xi_{ij}$ with $(h,l)$-th entry
\[
(\Xi_{ij})_{hl} = \int_0^1\int_0^1\int_0^1\Big[B_{k,h}'(s-u)\,B_{k,l}'(s-v) + \rho^2 B_{k,h}(s-u)\,B_{k,l}(s-v)\Big]\,ds\ \gamma_{ij}(u,v)\,du\,dv.
\]
For each $1 \le i \le p$, define the $p\times p$ block matrix
\[
\Gamma_{p,i} = \begin{pmatrix}
\Xi_{i,i} & \Xi_{i,i-1} & \cdots & \Xi_{i,1} & 0 & \cdots & 0\\
\Xi_{i-1,i} & \Xi_{i-1,i-1} & \cdots & \Xi_{i-1,1} & 0 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots\\
\Xi_{1,i} & \Xi_{1,i-1} & \cdots & \Xi_{11} & 0 & \cdots & 0\\
0 & 0 & \cdots & 0 & 0 & \cdots & 0\\
\vdots & \vdots & & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{pmatrix}.
\]
Using the representation (7), we know
\[
\Gamma_k \succeq \sum_{i=1}^{p}\Gamma_{p,i}.
\]
By Lemma 5, there exists a constant $c_1 > 0$ such that
\[
\Xi_{11} \succeq \frac{c_1}{k}\,I_k.
\]
The condition that $\phi_1,\ldots,\phi_p \in C^2[-1,1]$ implies that all the functions $\gamma_{ij}(u,v)$ satisfy assumption (38), and then by Lemma 6, the operator norms of all the matrices $\Xi_{ij}$ are of the order $O(k^{-1})$ uniformly. We can then apply Lemma 9 inductively to complete the proof.

Proof of Theorem 6: By Lemma 9 and Lemma 10, we can complete the proof of Theorem 6, following the proof of Theorem 4.
References
Antoniadis, A., Paparoditis, E., and Sapatinas, T. (2009), "Bandwidth selection for functional time series prediction", Statistics and Probability Letters, 79, 733-740.
Aue, A., Norinho, D.D., and Hörmann, S. (2012), "On the prediction of functional time series", Technical Report.
Berkes, I., Horváth, L., and Rice, G. (2013), "Weak invariance principles for sums of dependent random functions", Stochastic Processes and their Applications, 123, 385-403.
Besse, P., Cardot, H., and Stephenson, D. (2000), “Autoregressive forecasting of some functional
climatic variations”, Scandinavian Journal of Statistics, 27, 673-687.
Bosq, D. (2000), Linear Processes in Function Spaces: Theory and Applications, Lecture Notes in Statistics, New York: Springer-Verlag.
Brockwell, P. and Davis, R. (1990), Time Series: Theory and Methods, New York: Springer.
Brumback, B. and Rice, J.A. (1998), "Smoothing spline models for the analysis of nested and crossed samples of curves", Journal of the American Statistical Association, 93, 961-994.
Cai, Z., Fan, J., and Yao, Q. (2000), “Functional-coefficient regression models for non-linear time
series”, Journal of the American Statistical Association, 95, 941-956.
Chan, K.S. (1993), “Consistency and limiting distribution of the least squares estimator of a
threshold autoregressive model”, The Annals of Statistics, 520-533.
Chen, R. and Tsay, R.S. (1993a), “Functional coefficient autoregressive models”, Journal of the
American Statistical Association, 88, 298-308.
Chen, R. and Tsay, R.S. (1993b), “Nonlinear additive ARX models”, Journal of the American
Statistical Association, 88, 955-967.
Chen, R., Liu, J.S., and Tsay, R.S. (1995), “Additivity tests for nonlinear autoregressive models”,
Biometrika, 82, 369-383.
Chen, X. (2008), “Large sample sieve estimation of semi-nonparametric models”, Handbook of
Econometrics, 6, 5549-5632.
Chen, X. and Shen, X. (1998), “Sieve extremum estimates for weakly dependent data”, Econometrica, 66, 289-314.
Cressie, N. and Huang, H. (1999), "Classes of nonseparable, spatio-temporal stationary covariance functions", Journal of the American Statistical Association, 94, 1330-1339.
Diebold, F.X. and Li, C. (2006), "Forecasting the term structure of government bond yields", Journal of Econometrics, 130, 337-364.
Fan, J. and Gijbels, I. (1996), Local Polynomial Modelling and Its Applications, London: Chapman and Hall.
Fan, J. and Yao, Q. (2003), Nonlinear Time Series: Nonparametric and Parametric Methods,
New York: Springer-Verlag.
Ferraty, F. and Vieu, P. (2006), Nonparametric Functional Data Analysis, New York: Springer.
Gasser, T., Müller, H.G., Köhler, W., Molinari, L., and Prader, A. (1984), "Nonparametric regression analysis of growth curves", The Annals of Statistics, 12, 210-229.
Gneiting, T. (2002), “Nonseparable, stationary covariance functions for space-time data”, Journal of the American Statistical Association, 97, 590-600.
Grenander, U. and Szegő, G. (1984), Toeplitz Forms and Their Applications, University of California Press.
Halberstam, H. and Richert, H.E. (2013), Sieve Methods, Courier Dover Publications.
Hall, P. and Heyde, C.C. (1980), Martingale Limit Theory and Its Application, New York: Academic Press.
Handcock, M.S. and Wallis, J.R. (1994), “An approach to statistical spatial-temporal modeling
of meteorological fields”, Journal of the American Statistical Association, 89, 368-378.
Härdle, W., Chen, R., and Lütkepohl, H. (1997), "A review of nonparametric time series analysis", International Statistical Review, 65, 49-72.
Hörmann, S., Horváth, L., and Reeder, R. (2013), “A functional version of ARCH model”,
Econometric Theory, 29, 267-288.
Hörmann, S. and Kokoszka, P. (2010), "Weakly dependent functional data", The Annals of Statistics, 38(3), 1845-1884.
Horváth, L., Hušková, M., and Kokoszka, P. (2010), "Testing the stability of the functional autoregressive process", Journal of Multivariate Analysis, 101, 352-367.
Horváth, L. and Kokoszka, P. (2012), Inference for Functional Data with Applications, Springer.
Horváth, L., Kokoszka, P., and Reeder, R. (2012), "Estimation of the mean of functional time series and a two-sample problem", Journal of the Royal Statistical Society, Series B, 75, 103-122.
Houghton, A.N., Flannery, J., and Viola, M.V. (1980), “Malignant melanoma in Connecticut
and Denmark” , International Journal of Cancer, 25, 95-104.
Huang, J.Z. (2003), “Local asymptotics for polynomial spline regression”, The Annals of Statistics, 31(5), 1600-1635.
Hyndman, R.J. and Ullah, M.S. (2007), "Robust forecasting of mortality and fertility rates: a functional data approach", Computational Statistics & Data Analysis, 51, 4942-4956.
James, G.M., Hastie, T.J., and Sugar, C.A. (2000), “Principal component models for sparse
functional data”, Biometrika, 87(3), 587-602.
James, G.M. and Sugar, C.A. (2003), “Clustering for sparsely sampled functional data”, Journal
of the American Statistical Association, 98(462), 397-408.
Keselman, H.J. and Keselman, J.C. (1993), "Analysis of repeated measurements", in L.K. Edwards (ed.), Applied Analysis of Variance in Behavioral Science, New York: Marcel Dekker, 105-145.
Lütkepohl, H. (2005), New Introduction to Multiple Time Series Analysis, Berlin: Springer.
Neuman, E. (1981), "Moments and Fourier transforms of B-splines", Journal of Computational and Applied Mathematics, 7(1), 51-62.
Nadaraya, E.A. (1964), "On estimating regression", Theory of Probability & Its Applications, 9, 141-142.
Ratcliffe, S.J., Leader, L.R., and Heller, G.Z. (2002a), “Functional data analysis with application
to periodically stimulated foetal heart rate data. I: Functional regression”, Statistics in
Medicine, 21, 1103-1114.
Ratcliffe, S.J., Leader, L.R., and Heller, G.Z. (2002b), “Functional data analysis with application
to periodically stimulated foetal heart rate data. II: Functional logistic regression”, Statistics
in Medicine, 21, 1115-1127.
Ramsay, J.O. and Silverman, B.W. (2005), Functional Data Analysis, New York: Springer.
Prakasa Rao, B.L.S. (1999), Semimartingales and their Statistical Inference, London: Chapman and Hall.
Roberts, J. M. (1995), “New Keynesian economics and the Phillips curve”, Journal of Money,
Credit and Banking, 975-984.
Ruppert, D., Sheather, S., and Wand, M.P. (1995), "An effective bandwidth selector for local least squares regression", Journal of the American Statistical Association, 90(432), 1257-1270.
Silverman, B.W. (1984), “Spline smoothing: the equivalent variable kernel method”, The Annals
of Statistics, 898-916.
Schumaker, L. (1981), Spline Functions: Basic Theory, New York: John Wiley & Sons.
Shoji, I. (1998), "Approximation of continuous time stochastic processes by a local linearization method", Mathematics of Computation, 67, 287-298.
Tong, H. (1983), Threshold Models in Non-linear Time Series Analysis, Lecture Notes in Statistics, New York: Springer-Verlag.
Tiao, G.C. and Tsay, R.S. (1989), "Model specification in multivariate time series", Journal of the Royal Statistical Society, Series B, 51, 157-213.
Tiao, G.C. and Tsay, R.S. (1983), "Multiple time series modeling and extended sample cross-correlations", Journal of Business & Economic Statistics, 1(1), 43-56.
Tsay, R.S. (2010), Analysis of Financial Time Series, New Jersey: John Wiley & Sons.
Watson, G.S. (1964), "Smooth regression analysis", Sankhyā: The Indian Journal of Statistics, Series A, 359-372.
Wu, W.B. (2005), “Nonlinear system theory: Another look at dependence”, Proceedings of the
National Academy of Sciences of the United States of America, 102(40), 14150-14154.
Xia, Y. and Li, W.K. (1999), “On single-index coefficient regression models”, Journal of the
American Statistical Association, 94, 1275-1285.
Zhou, S., Shen, X., and Wolfe, D.A. (1998), "Local asymptotics for regression splines and confidence regions", The Annals of Statistics, 26, 1760-1782.
Zygmund, A. (2002), Trigonometric Series, Cambridge: Cambridge University Press.