Journal of Statistical Planning and Inference 139 (2009) 533 -- 546
Asymptotic properties of nonparametric M-estimation for mixing
functional data
Jia Chen∗ , Lixin Zhang
Department of Mathematics, Zhejiang University, Yuquan Campus, Hangzhou 310027, China
Abstract
We investigate the asymptotic behavior of a nonparametric M-estimator of a regression function for stationary dependent processes, where the explanatory variables take values in some
abstract functional space. Under some regularity conditions, we give the weak and strong
consistency of the estimator as well as its asymptotic normality. We also give two examples
of functional processes that satisfy the mixing conditions assumed in this paper. Furthermore,
a simulated example is presented to examine the finite sample performance of the proposed
estimator.
© 2008 Elsevier B.V. All rights reserved.
Article history:
Received 19 February 2006
Received in revised form
20 May 2007
Accepted 1 May 2008
Available online 22 May 2008
MSC:
primary 62N02
62G09
secondary 62E20
Keywords:
α-Mixing
Asymptotic normality
Consistency
Functional data
Nonparametric M-estimation
1. Introduction
Nonparametric regression techniques for modelling economic and financial time series data have gained a lot of attention
during the last two decades. Both estimation and specification testing have been systematically examined for real-valued i.i.d.
or weakly dependent stationary processes. See, for example, Robinson (1989), Härdle et al. (1997), Fan and Yao (2003) and the
references therein.
In this paper, we consider the case of mixing functional data. Let {Xi, Yi}_{i=1}^∞ be a stationary process, where the Yi are real-valued random variables and the Xi take values in some abstract functional space H (e.g. a Banach or Hilbert space) with norm ‖·‖ and the induced distance d(u, v) = ‖u − v‖, u, v ∈ H. In recent years, there has been increasing interest in studying limit theorems for functional data. For example, Araujo and Gine (1980) and Vakashnia et al. (1987) extended certain probability limit theorems to Banach-space-valued random variables. Gasser et al. (1998) considered density and mode estimation for Xi taking values in a normed vector space. Ferraty and Vieu (2002) studied nonparametric regression estimation, and Ramsay and Silverman (2002) gave an introduction to the basic methods of functional data analysis and provided diverse case studies in several areas including criminology, economics and neurophysiology.
In this paper, we consider the following regression functional:

m(x) = E(Y1 | X1 = x),   x ∈ H,   (1.1)
assuming that E|Y1| < ∞. Masry (2005) established the asymptotic normality of the Nadaraya–Watson (NW) kernel estimator m∗n(x) of m(x), which is defined by

m∗n(x) = [ ∑_{i=1}^n Yi K(d(x, Xi)/hn) ] / [ ∑_{i=1}^n K(d(x, Xi)/hn) ],   (1.2)

where K(·) is a kernel function and hn is the bandwidth, which tends to zero as n tends to infinity. The NW estimator m∗n(x) can be seen as a local least squares estimator, since it minimizes

∑_{i=1}^n K(d(x, Xi)/hn) (Yi − θ)²

with respect to θ.
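As a concrete illustration (a minimal Python sketch of ours, not code from the paper), the estimator (1.2) only requires the distances d(x, Xi), the responses Yi, a kernel and a bandwidth; the quadratic kernel used as the default below is the one employed in the simulation of Section 4.

```python
import numpy as np

def quadratic_kernel(u):
    """K(u) = 1.5 * (1 - u^2) on [0, 1], the kernel used in Section 4."""
    return 1.5 * (1.0 - u**2) * ((u >= 0) & (u <= 1))

def nw_estimate(dists, Y, h, kernel=quadratic_kernel):
    """Nadaraya-Watson estimate (1.2) from the distances d(x, X_i) and responses Y_i."""
    dists, Y = np.asarray(dists), np.asarray(Y)
    w = kernel(dists / h)          # kernel weights K(d(x, X_i)/h)
    return np.sum(w * Y) / np.sum(w)
```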
Although NW estimators are central in nonparametric regression, it is known that they are not robust due to the fact that NW
estimators can be considered as local least squares estimators and the least squares estimators are not robust. For instance, they
are sensitive to outliers and do not perform well when the errors are heavy-tailed. However, outliers or aberrant observations are
observed very often in econometrics and finance as well as in many other applied fields. A treatment of outliers is an important
step in highlighting features of a data set. So in order to attenuate the lack of robustness of NW estimators, M-type regression
estimators are natural candidates for achieving desirable robustness properties. In this paper, we investigate the asymptotic
properties of the nonparametric M-estimator mn(x), which is implicitly defined as the solution to the following problem: find θ ∈ R to minimize

∑_{i=1}^n K(d(x, Xi)/hn) ρ(Yi − θ)   (1.3)

or to satisfy the equation

∑_{i=1}^n K(d(x, Xi)/hn) ψ(Yi − θ) = 0,   (1.4)

where ρ(·) is a given outlier-resistant loss function defined on R and ψ(·) is its derivative. If ψ(y) = y, then the estimator mn(x) becomes the NW estimator m∗n(x). If ψ(y) = sign(y), then the resulting estimator of (1.4) is a nonparametric least absolute distance estimator of m(x). Many examples of choices of ρ and ψ can be found in the book of Serfling (1980).
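A hedged sketch of how (1.4) can be solved numerically for a convex loss: since ψ is monotone, the weighted score is monotone in θ and simple bisection suffices. Huber's ψ with c = 1.75 (used later in Section 4) is taken as the default; the tolerance and the kernel choice are our illustrative assumptions.

```python
import numpy as np

def huber_psi(y, c=1.75):
    """Huber's psi: psi(y) = max(-c, min(y, c))."""
    return np.clip(y, -c, c)

def local_m_estimate(dists, Y, h, psi=huber_psi, tol=1e-6):
    """Solve sum_i K(d(x, X_i)/h) * psi(Y_i - theta) = 0 in theta by bisection."""
    dists, Y = np.asarray(dists), np.asarray(Y)
    u = dists / h
    w = 1.5 * (1.0 - u**2) * ((u >= 0) & (u <= 1))   # quadratic kernel weights
    lo, hi = Y.min(), Y.max()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # psi is nondecreasing, so the weighted score is nonincreasing in theta
        if np.sum(w * psi(Y - mid)) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```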
There is much literature on M-estimation for real-valued data. For example, Huber (1964) studied M-estimation of a location
parameter and obtained the asymptotic theory for the proposed estimator. Härdle (1984) established weak and strong consistency
as well as asymptotic normality of a nonparametric regression M-estimator for i.i.d. observations. Härdle (1989) considered the
asymptotic maximal deviation of the nonparametric M-smoother mn(x) from the regression curve m(x), i.e. sup_{0 ≤ x ≤ 1} |mn(x) − m(x)|. Boente and Fraiman (1989) proposed nonparametric M-estimators for regression and autoregression of mixing processes. Beran (1991) studied M-estimation of a location parameter for Gaussian long-memory processes and Beran et al. (2003) investigated the behavior of a nonparametric kernel M-estimator for a fixed design model with long-memory errors.
To the best of our knowledge, however, nonparametric M-estimation has not been developed for functional data. The purpose of this paper is to investigate the nonparametric M-estimator mn(x), which is defined as the solution of Eq. (1.4), and to obtain its weak and strong consistency as well as its asymptotic normality under α-mixing dependence assumptions. From the asymptotic normality, the asymptotic mean and variance of the error mn(x) − m(x) can be derived.
The rest of the paper is organized as follows. In Section 2, we present the assumptions and main results. The derivations of the
main results are given in Section 3. Section 4 includes some examples of functional processes that satisfy the mixing conditions
in this paper as well as a simulated example. The simulation results show that the proposed M-estimator works well in practice.
Some useful lemmas are given in the Appendix.
2. Assumptions and main results
A stationary process {Xi, Yi}_{i=1}^∞ is called α-mixing if α(n) → 0 as n → ∞, where

α(i) = sup_{A ∈ F_{−∞}^0, B ∈ F_i^∞} |P(AB) − P(A)P(B)|,

where F_a^b is the σ-algebra generated by {Xi, Yi}_{i=a}^b. α-Mixing dependence is satisfied by many time series models (see Pham and Tran, 1985 for more details). Hence, it is assumed by many authors in nonparametric inference for real-valued time series,
say Roussas (1990), Masry and Fan (1997) and Cai (2002). For limit theorems for α-mixing sequences, we refer to Lin and Lu (1996) and the references therein.
Define Di = d(x, Xi) and εi = Yi − m(Xi). We first give the assumptions that are necessary in deriving the asymptotic properties
of the nonparametric M-estimator mn (x).
A1. K(·) is a nonnegative and bounded kernel function with compact support, say [0, 1].

A2. ρ(·) is a convex function. Let ψ(·) be the derivative of ρ(·); then ψ(·) is monotone and continuous almost everywhere.

A3. (i) F(u, x) := P(D1 ≤ u) = φ(u) f1(x) as u → 0, where φ(0) = 0, φ(u) is absolutely continuous in a neighborhood of 0 and f1 : H → R is a nonnegative function;
(ii) sup_{i≠j} P(Di ≤ u, Dj ≤ u) ≤ τ(u) f2(x) as u → 0, where τ(u) → 0 as u → 0, τ(hn)/φ²(hn) is bounded, and f2(·) is a nonnegative function defined on H;
(iii) for i = 1, 2, Ii(hn) := (hn/φ(hn)) ∫_0^1 K^i(v) φ′(hn v) dv → Ki as n → ∞, where K1 and K2 are two positive constants.

A4. Uniformly for s in a neighborhood of s = m(x),
(i) there exists a function λ1(·) defined on H such that, as |u| → 0,
E[ψ(ε1 + u) | X1 = z] = λ1(z) u + o(u),
where λ1(z) is continuous in a neighborhood of x and λ1(x) ≠ 0;
(ii) E|ψ(Y1 − s)|^r < ∞ for some r > 2, and if we let g1(z, s) = E[|ψ(Y1 − s)|² | X1 = z] and g2(z, s) = E[|ψ(Y1 − s)|^r | X1 = z], then g1(z, s) and g2(z, s) are continuous in a neighborhood of x with respect to z. Furthermore, we have, for z in a neighborhood of x,
max{ E[(ψ(ε1 + u) − ψ(ε1))² | X1 = z], E[|ψ(ε1 + u) − ψ(ε1)|^r | X1 = z] } ≤ λ2(|u|),
where λ2(u) is continuous at u = 0 with λ2(0) = 0;
(iii) gi,j(x1, x2; s) = E[ψ(Yi − s) ψ(Yj − s) | Xi = x1, Xj = x2], i ≠ j, is uniformly bounded in a neighborhood of (x1, x2) = (x, x).

A5. |m(z) − m(x)| ≤ C d^β(z, x) for z in a neighborhood of x, where β is some positive constant.

A6. ∑_{l=1}^∞ l^a [α(l)]^{1−2/r} < ∞ for some a > 1 − 2/r.
For simplicity, we will write h instead of hn in the rest of the paper.
Remark 1. A1 is standard in nonparametric kernel estimation except that K(·) is a one-sided kernel on [0, 1]. This is because K(·) takes the form K(d(x, Xi)/h) in our paper and d(x, Xi) ≥ 0. Therefore, we assume that K(·) has compact support [0, 1], as in Masry (2005). In A2, we impose some mild restrictions on ρ(·) and ψ(·). In Härdle (1984, 1989), ψ(·) is assumed to be monotone, antisymmetric, bounded and twice continuously differentiable. In contrast to their assumption, A2 is much milder and covers some well-known special cases such as the least squares estimate (LSE), the least absolute distance estimate (LADE) and the mixed LSE and LADE. Assumptions A3, A5 and A6 are the same as Conditions 1, 3 and 4 in Masry (2005). Gasser et al. (1998) also used the condition F(u, x) = φ(u) f1(x) as u → 0, and refer to f1(·) as the functional probability density of X. For an insight into A3, take H = R^d as an example and assume that f1(·) is the probability density function of X1. Then, F(u, x) = C1 u^d f1(x) as u → 0, where C1 is the volume of the unit ball in R^d. Let f2(x) = sup_{i≠j} fi,j(x, x), where fi,j(x, x) is the joint density of (Xi, Xj) at (x, x). Then, sup_{i≠j} P(Di ≤ u, Dj ≤ u) ≤ (C1 u^d)² f2(x) as u → 0. So in this case, φ(u) = C1 u^d and τ(u) = O(φ²(u)) as u → 0. Furthermore, by an elementary calculation, we know that Ii(h) = d ∫_0^1 K^i(v) v^{d−1} dv for i = 1, 2, which is obviously finite. Masry (2005) also gave an alternative condition on F(h, x) (i.e. 0 < c3 φ(h) f1(x) ≤ F(h, x) ≤ c4 φ(h) f1(x)) when calculating the variance of the proposed NW estimator. A4 imposes some uniform continuity assumptions on certain conditional moments of ψ(εi + ·), and A4(i) and (ii) are similar to (M3) and (M4) in Bai et al. (1992). Assumption A6 concerns the decay rate of the mixing coefficient α(l), which is related to the moment order r in A4.
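As a quick numerical check of the finite-dimensional case discussed above (our illustration, not from the paper; X1 is taken uniform on [0, 1]² so that f1(x) = 1 at interior points), the small-ball probability F(u, x) is indeed close to C1 u^d with C1 = π, the area of the unit disc:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(size=(200_000, 2))           # X_1 uniform on [0,1]^2, so f1(x) = 1 inside
x = np.array([0.5, 0.5])
D = np.linalg.norm(X - x, axis=1)            # distances D_1 = ||X_1 - x||
for u in (0.05, 0.10, 0.20):
    print(u, np.mean(D <= u), np.pi * u**2)  # empirical F(u, x) vs. C1 * u^d
```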
Our main results are stated as follows.
Theorem 1. Suppose that m(x) is an isolated root of E[ψ(Y1 − θ)|X1 = x] = 0 and that A1–A6 hold. Moreover, the bandwidth h satisfies h → 0 and nφ(h) → ∞. Then we have

mn(x) →P m(x),   n → ∞.   (2.1)
Theorem 2. Assume that the assumptions of Theorem 1 hold. Besides, there exists a sequence of positive constants {An} such that An → ∞ and the quantities An, h and α(l) satisfy

∑_{n=1}^∞ An^{−r} < ∞,   ∑_{n=1}^∞ (An/φ(h))^{1/2} n^{1/2} α([n^{1/2}]) < ∞,   nφ²(h)/(An² log² n) → ∞.   (2.2)
Then,

mn(x) → m(x) a.s.,   n → ∞.   (2.3)
Remark 2. To gain an insight into what kind of h, α(l) and An satisfy the assumptions above, suppose that An = n^{1/2} φ(h) (log n)^{−2} and α(l) = O(l^{−γ}), where γ > max{7/2, (a + 1)/(1 − 2/r)}, and that h is chosen such that φ^{−1}(h) = O(n^{(1/2)(1/2−1/r)}). Then (2.2) and A6 hold.
Theorem 3. Assume that the conditions of Theorem 1 hold. Besides, there exists a sequence of positive integers {vn} such that vn → ∞, vn = o((nφ(h))^{1/2}) and (n/φ(h))^{1/2} α(vn) → 0 as n → ∞. Then, for all x that satisfy λ1(x) f1(x) ≠ 0, we have

(nφ(h))^{1/2} (mn(x) − m(x) − μn) →d N(0, σ1²(x)),   (2.4)

where

μn = Bn / (λ1(x) f1(x) K1),   Bn = (1/φ(h)) E[Δ1 ψ(Y1 − m(x))] = O(h^β),
σ1²(x) = σ²(m(x)) / (λ1(x) f1(x) K1)²,   Δi = K(d(x, Xi)/h),   σ²(s) = g1(x, s) f1(x) K2,

and K1, K2 are defined in A3 and g1(x, s) is defined in A4.
Remark 3. In Masry (2005), a smoothness condition on the regression function m(x) is also imposed, that is, |m(u) − m(v)| ≤ C d^β(u, v) for all u, v ∈ H and some β > 0, and the asymptotic normality of the NW-type estimator m∗n(x) is given as

(nφ(h))^{1/2} (m∗n(x) − m(x) − B∗n) →d N(0, σ∗²(x)),

where the bias term B∗n = O(h^β), which is of the same order as μn in (2.4).
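For intuition (this observation is ours, not a statement from the paper): in the finite-dimensional setting of Remark 1, φ(h) = C1 h^d, so the normalization in (2.4) reduces to the familiar d-dimensional nonparametric rate,

```latex
\sqrt{n\varphi(h)}\,\bigl(m_n(x)-m(x)-\mu_n\bigr)
  \;=\;\sqrt{C_1\,n h^{d}}\,\bigl(m_n(x)-m(x)-\mu_n\bigr)
  \;\xrightarrow{\;d\;}\;N\bigl(0,\sigma_1^{2}(x)\bigr).
```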
3. Proofs of the main results
In order to avoid unnecessary repetition, we suppose that all limits are taken as n tends to infinity unless otherwise stated, and that the limits in the proofs of Theorems 1 and 2 hold uniformly in a neighborhood of s = m(x).
Proof of Theorem 1. Define Δi = K(d(x, Xi)/h), Hn(x, s) = (1/(nφ(h))) ∑_{i=1}^n Δi ψ(Yi − s) and H(x, s) = E[ψ(Y1 − s)|X1 = x] f1(x) K1. We first prove

Hn(x, s) →P H(x, s)   (3.1)

and then prove (2.1) by the monotonicity and continuity of the function ψ(·). We prove (3.1) by verifying the following two statements:

Hn(x, s) − EHn(x, s) →P 0   (3.2)

and

EHn(x, s) → H(x, s).   (3.3)
The validity of (3.3) is obvious by an elementary calculation. In fact, by A1, A3 and A4,

EHn(x, s) = (1/φ(h)) E[Δ1 ψ(Y1 − s)]
          = (1/φ(h)) E( Δ1 E[ψ(Y1 − s)|X1] )
          = E[ψ(Y1 − s)|X1 = x](1 + o(1)) (1/φ(h)) EΔ1
          = E[ψ(Y1 − s)|X1 = x](1 + o(1)) (1/φ(h)) ∫_0^h K(u/h) F(du, x)
          = E[ψ(Y1 − s)|X1 = x](1 + o(1)) f1(x) (h/φ(h)) ∫_0^1 K(v) φ′(hv) dv
          → E[ψ(Y1 − s)|X1 = x] f1(x) K1 = H(x, s).   (3.4)
To prove (3.2), we first show

nφ(h) Var[Hn(x, s)] → σ²(s),   (3.5)

where σ²(s) = g1(x, s) f1(x) K2. Let Zn,i(s) = Δi ψ(Yi − s); then

Hn(x, s) = (1/(nφ(h))) ∑_{i=1}^n Zn,i(s)

and

(nφ(h))^{1/2} (Hn(x, s) − EHn(x, s)) = (nφ(h))^{−1/2} ∑_{i=1}^n (Zn,i(s) − EZn,i(s)).   (3.6)

Notice that

nφ(h) Var[Hn(x, s)] = (1/φ(h)) Var(Zn,1) + (1/(nφ(h))) ∑_{i,j=1, i≠j}^n Cov(Zn,i, Zn,j).   (3.7)

By (3.4) and the definition of Zn,1(s), we have

EZn,1(s) = O(φ(h)).   (3.8)

By A5, we know that when d(X1, x) ≤ h, m(X1) = m(x) + O(h^β). Hence,

Var(Zn,1) = E[Δ1 ψ(Y1 − s)]² + O(φ²(h))
          = E( Δ1² E[ψ²(Y1 − s)|X1] ) + O(φ²(h))
          = g1(x, s)(1 + o(1)) h f1(x) ∫_0^1 K²(v) φ′(hv) dv + O(φ²(h)).

This, in conjunction with A3(iii) and φ(h) → 0, gives

(1/φ(h)) Var(Zn,1) → σ²(s).   (3.9)

Next, we will show

∑_{i,j=1, i≠j}^n |Cov(Zn,i, Zn,j)| = o(nφ(h)).   (3.10)
Let 0 < an < n be a sequence of positive integers that tends to infinity; its value will be specified later. Then,

∑_{i,j=1, i≠j}^n |Cov(Zn,i, Zn,j)| ≤ ∑_{0<|i−j|≤an} |Cov(Zn,i, Zn,j)| + ∑_{|i−j|>an} |Cov(Zn,i, Zn,j)| =: J1 + J2.   (3.11)

In view of (3.8), A1, A3(ii) and A4(iii), we have

J1 ≤ ∑_{0<|i−j|≤an} [ |E Zn,i Zn,j| + (E Zn,1)² ]
   ≤ ∑_{0<|i−j|≤an} [ |E gi,j(Xi, Xj; s) Δi Δj| + O(φ²(h)) ]
   ≤ ∑_{0<|i−j|≤an} [ C P(Di ≤ h, Dj ≤ h) + O(φ²(h)) ]
   ≤ C n an [ f2(x) τ(h) + O(φ²(h)) ],   (3.12)

where C denotes a positive constant that may take different values in different places.
As for the covariances in J2, we employ A4 and Corollary A.2 (Davydov's lemma) in Hall and Heyde (1980) and get

|Cov(Zn,i, Zn,j)| ≤ 8 { E|ψ(Y1 − s) Δ1|^r }^{2/r} [α(|i − j|)]^{1−2/r}
                 ≤ C { P(D1 ≤ h) }^{2/r} [α(|i − j|)]^{1−2/r}
                 ≤ C { f1(x) φ(h) }^{2/r} [α(|i − j|)]^{1−2/r}.

Hence,

J2 ≤ C ∑_{|i−j|>an} { f1(x) φ(h) }^{2/r} [α(|i − j|)]^{1−2/r}
   ≤ C n { f1(x) φ(h) }^{2/r} ∑_{l=an+1}^∞ [α(l)]^{1−2/r}
   ≤ C n { f1(x) φ(h) }^{2/r} an^{−a} ∑_{l=an+1}^∞ l^a [α(l)]^{1−2/r}.   (3.13)
Now we choose an = [(φ(h))^{−(1−2/r)/a}]. Then by (3.12) and (3.13), we have

J1 ≤ C n (φ(h))^{2−(1−2/r)/a}   and   J2 ≤ C n φ(h) ∑_{l=an+1}^∞ l^a [α(l)]^{1−2/r}.

By A6, we have

J1 = o(nφ(h))   (3.14)

and

J2 = o(nφ(h)).   (3.15)

Combining (3.11), (3.14) and (3.15), we get (3.10). By (3.7), (3.9) and (3.10), we know that (3.5) is true. Therefore,

Hn(x, s) − EHn(x, s) = OP( (nφ(h))^{−1/2} ) = oP(1).

This implies that (3.2) holds. Thus we have proved (3.1).

The remaining proof of Theorem 1 is rather standard in the context of M-estimation and we omit it here. For details, see Huber (1964).

Proof of Theorem 2. We first prove

Hn(x, s) − EHn(x, s) → 0 a.s.,   n → ∞,   (3.16)

then the rest of the proof of mn(x) → m(x) a.s. is similar to that of mn(x) →P m(x). To prove (3.16), we first employ a truncation method.
Let {An} ↑ ∞ be the sequence that satisfies (2.2) in Theorem 2 and denote

Z^A_{n,i} = Δi ψ_A(Yi − s),   H^A_n(x, s) = (1/(nφ(h))) ∑_{i=1}^n Z^A_{n,i},

where ψ_A(y) = ψ(y) I(|ψ(y)| ≤ An). Then,

|Hn(x, s) − EHn(x, s)| ≤ |Hn(x, s) − H^A_n(x, s)| + |H^A_n(x, s) − EH^A_n(x, s)| + |EH^A_n(x, s) − EHn(x, s)|.

Since ∑_{n=1}^∞ An^{−r} < ∞, we have, for any ε > 0,

∑_{n=1}^∞ P(|ψ(Yn − s)| > εAn) ≤ ∑_{n=1}^∞ ε^{−r} E|ψ(Y1 − s)|^r An^{−r} < ∞.   (3.17)
By Theorem 4.2.1 in Galambos (1978), we know that

P( max_{1≤i≤n} |ψ(Yi − s)| > An  i.o. ) = 0.

Therefore, for n sufficiently large,

Hn(x, s) = H^A_n(x, s) a.s.   (3.18)

Besides, by A3(iii), A4(ii), r > 2 and An → ∞, we have

|EH^A_n(x, s) − EHn(x, s)| ≤ (1/φ(h)) E[ |ψ(Y1 − s)| I(|ψ(Y1 − s)| > An) Δ1 ]
                          ≤ An^{1−r} (1/φ(h)) E[ |ψ(Y1 − s)|^r I(|ψ(Y1 − s)| > An) Δ1 ]
                          ≤ An^{1−r} (1/φ(h)) E[ |ψ(Y1 − s)|^r Δ1 ]
                          = g2(x, s)(1 + o(1)) An^{1−r} (1/φ(h)) EΔ1
                          = g2(x, s)(1 + o(1)) An^{1−r} f1(x) (h/φ(h)) ∫_0^1 K(v) φ′(hv) dv
                          → 0.   (3.19)

As |Z^A_{n,i}/φ(h)| ≤ An/φ(h), we now apply Lemma 1 in the Appendix and get, for each q = 1, . . . , [n/2],

P(|H^A_n − EH^A_n| > ε) = P( | ∑_{i=1}^n ( Z^A_{n,i}/φ(h) − E Z^A_{n,i}/φ(h) ) | > nε )
  ≤ 4 exp( −ε²q / (8 v²(q)) ) + 22 ( 1 + 4An/(εφ(h)) )^{1/2} q α([n/(2q)]),   (3.20)

where v²(q) = 2σ²(q)/p² + εAn/(2φ(h)), p = [n/(2q)] and

σ²(q) = max_{0≤j≤2q−1} Var( Z^A_{n,1}/φ(h) + · · · + Z^A_{n,[(j+1)p]−[jp]}/φ(h) ),

which, by (3.5), is bounded above by Cp/φ(h). If we choose q = [n^{1/2}], then (3.20) is bounded above by

4 exp{ −C n^{1/2} φ(h)/An } + C (An/φ(h))^{1/2} n^{1/2} α([n^{1/2}]).

By nφ²(h)/(An² log² n) → ∞ and ∑_{n=1}^∞ (An/φ(h))^{1/2} n^{1/2} α([n^{1/2}]) < ∞, we know that

∑_{n=1}^∞ P(|H^A_n − EH^A_n| > ε) < ∞,

which, by the Borel–Cantelli lemma, implies H^A_n − EH^A_n → 0 a.s. Combining this with (3.17)–(3.19), we get (3.16).
Remark 4. Notice that if we choose q ≫ n^{1/2} (i.e. q/n^{1/2} → ∞), then by (3.20) we know that, in order that ∑_{n=1}^∞ P(|H^A_n − EH^A_n| > ε) < ∞, we need more rigid conditions on the mixing coefficient α(l) but weaker conditions on the bandwidth h. For example, if we choose q = [n^{2/3}], then in order that the bound in (3.20) be summable, it suffices that n( φ(h)/(An log n) )^{3/2} → ∞ and ∑_{n=1}^∞ (An/φ(h))^{1/2} n^{2/3} α([n^{1/3}]) < ∞. Hence, there is a tradeoff between the bandwidth h and the mixing coefficient α(l). In the proof of Theorem 2, we just choose q = [n^{1/2}].
Proof of Theorem 3. We first prove that for any c > 0,

sup_{(nφ(h))^{1/2}|s−m(x)| ≤ c} | ∑_{i=1}^n Δi [ ρ(Yi − s) − ρ(Yi − m(x)) + ψ(Yi − m(x))(s − m(x)) ]
   − (1/2)(m(x) − s)² λ1(x) f1(x) K1 nφ(h) | = oP(1).
As ρ(·) is convex, we have

Δi | ρ(Yi − s) − ρ(Yi − m(x)) + ψ(Yi − m(x))(s − m(x)) |
  ≤ Δi |s − m(x)| |ψ(Yi − s) − ψ(Yi − m(x))|
  = Δi |s − m(x)| |ψ(εi + m(Xi) − s) − ψ(εi + m(Xi) − m(x))|
  ≤ Δi |s − m(x)| [ |ψ(εi + m(Xi) − s) − ψ(εi)| + |ψ(εi + m(Xi) − m(x)) − ψ(εi)| ].   (3.21)
By A5, we know that when (nφ(h))^{1/2}|s − m(x)| ≤ c and d(x, Xi) ≤ h, we have

|m(Xi) − s| ≤ |m(Xi) − m(x)| + |m(x) − s| ≤ C h^β + (nφ(h))^{−1/2} c   (3.22)

and

|m(Xi) − m(x)| ≤ C h^β.   (3.23)
Let ηni = Δi |ψ(εi + m(Xi) − m(x)) − ψ(εi)|; then by Theorem 17.2.2 in Ibragimov and Linnik (1971) and A6, we have

Var( ∑_{i=1}^n ηni ) ≤ n Eη²n1 + Cn ∑_{i=2}^n |Cov(ηn1, ηni)|
                    ≤ n Eη²n1 + Cn ∑_{i=1}^n [α(i)]^{1−2/r} (E|ηn1|^r)^{2/r}
                    ≤ n Eη²n1 + Cn (E|ηn1|^r)^{2/r}.   (3.24)

Furthermore, by A3, A4 and (3.23), we have

Eη²n1 = E[ E(η²n1 | X1) ] ≤ C E[ Δ1² λ2(|m(X1) − m(x)|) ] ≤ C φ(h) λ2(h^β).

Similarly, we have

E|ηn1|^r ≤ C φ(h) λ2(h^β).

Therefore, by (3.24), we know that

Var( ∑_{i=1}^n ηni ) = O( nφ(h) λ2(h^β) ) = o(nφ(h)).   (3.25)

If we let η̃ni = Δi |ψ(εi + m(Xi) − s) − ψ(εi)|, then by (3.22), we can similarly prove

Var( ∑_{i=1}^n η̃ni ) = O( nφ(h) λ2(h^β + (nφ(h))^{−1/2}) ) = o(nφ(h)).   (3.26)
From (3.21), (3.25) and (3.26), we know that the following equation holds uniformly for (nφ(h))^{1/2}|s − m(x)| ≤ c:

Var( ∑_{i=1}^n Δi [ ρ(Yi − s) − ρ(Yi − m(x)) + ψ(Yi − m(x))(s − m(x)) ] ) = o( (s − m(x))² nφ(h) ) = o(1).

This implies that

∑_{i=1}^n { Δi [ ρ(Yi − s) − ρ(Yi − m(x)) + ψ(Yi − m(x))(s − m(x)) ]
          − E( Δi [ ρ(Yi − s) − ρ(Yi − m(x)) + ψ(Yi − m(x))(s − m(x)) ] ) } = oP(1)   (3.27)

uniformly for (nφ(h))^{1/2}|s − m(x)| ≤ c. Moreover, by Lemma 1 in Bai et al. (1992), we have

E[ ρ(ε1 + u) − ρ(ε1) | X1 = z ] = (1/2) λ1(z) u² + o(u²)   as |u| → 0.
By (3.22) and (3.23), we know that

∑_{i=1}^n E( Δi [ ρ(Yi − s) − ρ(Yi − m(x)) ] ) = n E( Δ1 E[ ρ(Y1 − s) − ρ(Y1 − m(x)) | X1 ] )
  = (1/2) n E( Δ1 λ1(X1) [ (m(X1) − s)² − (m(X1) − m(x))² ] )(1 + o(1))
  = (1/2) (m(x) − s) n E[ Δ1 λ1(X1) (2m(X1) − s − m(x)) ](1 + o(1))   (3.28)

holds uniformly for (nφ(h))^{1/2}|s − m(x)| ≤ c. On the other hand,

(s − m(x)) ∑_{i=1}^n E[ Δi ψ(Yi − m(x)) ] = (s − m(x)) n E[ Δ1 λ1(X1) (m(X1) − m(x)) ](1 + o(1)).   (3.29)

Combining (3.28) with (3.29), we get

E ∑_{i=1}^n ( Δi [ ρ(Yi − s) − ρ(Yi − m(x)) + ψ(Yi − m(x))(s − m(x)) ] )
  = (1/2) (m(x) − s)² n E[ Δ1 λ1(X1) ](1 + o(1))
  = (1/2) (m(x) − s)² λ1(x) f1(x) K1 nφ(h)(1 + o(1)).   (3.30)
By (3.27) and (3.30), we know that

| ∑_{i=1}^n Δi [ ρ(Yi − s) − ρ(Yi − m(x)) + ψ(Yi − m(x))(s − m(x)) ] − (1/2)(m(x) − s)² λ1(x) f1(x) K1 nφ(h) | = oP(1)

holds uniformly for (nφ(h))^{1/2}|s − m(x)| ≤ c. As ∑_{i=1}^n Δi [ ρ(Yi − s) − ρ(Yi − m(x)) + ψ(Yi − m(x))(s − m(x)) ] is convex in s, and (1/2)(m(x) − s)² λ1(x) f1(x) K1 nφ(h) is continuous and convex in s, by Theorem 10.8 of Rockafellar (1970), we have

sup_{(nφ(h))^{1/2}|s−m(x)| ≤ c} | ∑_{i=1}^n Δi [ ρ(Yi − s) − ρ(Yi − m(x)) + ψ(Yi − m(x))(s − m(x)) ]
   − (1/2)(m(x) − s)² λ1(x) f1(x) K1 nφ(h) | = oP(1).   (3.31)

By arguments similar to the proof of (3.31) and by Theorem 25.7 of Rockafellar (1970), we have

sup_{(nφ(h))^{1/2}|s−m(x)| ≤ c} | (1/(nφ(h))) ∑_{i=1}^n Δi [ ψ(Yi − s) − ψ(Yi − m(x)) ] + λ1(x) f1(x) K1 (s − m(x)) | = oP(1).   (3.32)
Denote δn = mn(x) − m(x) and

ξn = ( λ1(x) f1(x) K1 )^{−1} (1/(nφ(h))) ∑_{i=1}^n Δi ψ(Yi − m(x)) = ( λ1(x) f1(x) K1 )^{−1} Hn(x, m(x)).

Using the same technique as that used in the proof of Theorem 2.4 in Bai et al. (1992), we obtain

(nφ(h))^{1/2} |δn − ξn| = oP(1).   (3.33)

Hence, in order to prove Theorem 3, it remains for us to prove

(nφ(h))^{1/2} ( Hn(x, m(x)) − EHn(x, m(x)) ) →d N(0, σ²(m(x)))   (3.34)

and

EHn(x, m(x)) = Bn = O(h^β).   (3.35)
Eq. (3.35) is obvious. In fact,

|Bn| ≤ (1/φ(h)) E( Δ1 |E[ψ(Y1 − m(x))|X1]| )
     ≤ (1/φ(h)) E( Δ1 |λ1(X1)(m(X1) − m(x))| )(1 + o(1))
     ≤ C h^β |λ1(x)| f1(x) (h/φ(h)) ∫_0^1 K(u) φ′(hu) du
     = O(h^β).
Next, we will prove (3.34). Let Z̃n,i(m(x)) = (1/φ^{1/2}(h)) ( Zn,i(m(x)) − EZn,i(m(x)) ); then

(nφ(h))^{1/2} ( Hn(x, m(x)) − EHn(x, m(x)) ) = (1/√n) ∑_{i=1}^n Z̃n,i(m(x)).

For simplicity, we will write Z̃n,i instead of Z̃n,i(m(x)). It is easy to see that (3.34) is equivalent to

(1/√n) ∑_{i=1}^n Z̃n,i →d N(0, σ²(m(x))).   (3.36)
To show this, we adopt the small- and large-block technique. Namely, partition {1, . . . , n} into 2kn + 1 subsets with small blocks of size v = vn, which is defined in Theorem 3, and large blocks of size u = un, where kn = [n/(un + vn)] and [a] denotes the integer part of a. For the choice of un, note that by the conditions in Theorem 3, there exists a sequence of positive integers {qn} such that qn → ∞, qn vn = o((nφ(h))^{1/2}) and qn (n/φ(h))^{1/2} α(vn) → 0. Let un = [(nφ(h))^{1/2}/qn]; then

vn/un → 0,   un/n → 0,   un/(nφ(h))^{1/2} → 0,   (n/un) α(vn) → 0.   (3.37)
Define

ηj = ∑_{i=j(u+v)+1}^{j(u+v)+u} Z̃n,i,   j = 0, . . . , kn − 1,

ξj = ∑_{i=j(u+v)+u+1}^{(j+1)(u+v)} Z̃n,i,   j = 0, . . . , kn − 1,

and

ζ_{kn} = ∑_{i=kn(u+v)+1}^{n} Z̃n,i;

then ∑_{i=1}^n Z̃n,i = ∑_{j=0}^{kn−1} ηj + ∑_{j=0}^{kn−1} ξj + ζ_{kn} =: Qn1 + Qn2 + Qn3. We will show (3.36) via

(1/n) EQ²n2 → 0,   (3.38)

(1/n) EQ²n3 → 0,   (3.39)

| E exp(itn^{−1/2} Qn1) − ∏_{j=0}^{kn−1} E exp(itn^{−1/2} ηj) | → 0,   (3.40)

(1/n) ∑_{j=0}^{kn−1} Eη²j → σ²(m(x))   (3.41)

and

(1/n) ∑_{j=0}^{kn−1} E[ η²j I(|ηj| ≥ ε σ(m(x)) √n) ] → 0   (3.42)

for any ε > 0.
As for (3.38) and (3.39), we only prove (3.38), since the proof of (3.39) is similar. By stationarity,

EQ²n2 = kn Eξ²0 + ∑_{i,j=0, i≠j}^{kn−1} Cov(ξi, ξj) =: I1 + I2.   (3.43)

By (3.37), the proofs of (3.9) and (3.10) and the definition of kn, we know

I1 = kn [ v E Z̃²n,1 + ∑_{i,j=1, i≠j}^{v} Cov(Z̃n,i, Z̃n,j) ]
   = kn [ v(σ²(m(x)) + o(1)) + o(v) ]
   = kn v [σ²(m(x)) + o(1)] = o(n).   (3.44)

Note that

|I2| ≤ ∑_{i,j=0, i≠j}^{kn−1} ∑_{l1=i(u+v)+u+1}^{(i+1)(u+v)} ∑_{l2=j(u+v)+u+1}^{(j+1)(u+v)} |Cov(Z̃n,l1, Z̃n,l2)|.

For l1 ∈ {i(u+v)+u+1, i(u+v)+u+2, . . . , (i+1)(u+v)} and l2 ∈ {j(u+v)+u+1, j(u+v)+u+2, . . . , (j+1)(u+v)}, i ≠ j, it is obvious that |l1 − l2| ≥ u. Hence,

|I2| ≤ ∑_{i,j=1, |j−i|≥u}^{n} |Cov(Z̃n,i, Z̃n,j)| = o(n).   (3.45)

In view of (3.43)–(3.45), (3.38) holds.
As for (3.40), by Lemma 2 in the Appendix, we know that

| E exp(itn^{−1/2} Qn1) − ∏_{j=0}^{kn−1} E exp(itn^{−1/2} ηj) | ≤ 16 kn α(vn),

which tends to zero by (3.37).

A simple computation gives

(1/n) ∑_{j=0}^{kn−1} Eη²j = (kn un/n) [ σ²(m(x)) + o(1) ],

which implies (3.41) by kn un/n → 1.
It remains for us to prove (3.42). As ψ is not necessarily bounded, we employ a truncation method. Denote ψ_L(·) = ψ(·) I(|ψ(·)| ≤ L) and define Z^L_{n,i} = Δi ψ_L(Yi − m(x)), Z̃^L_{n,i} = ( Z^L_{n,i} − EZ^L_{n,i} )/φ^{1/2}(h), and let η^L_j, j = 0, . . . , kn − 1, be the large blocks with Z̃n,i replaced by Z̃^L_{n,i}. By the boundedness of K(·), we have

|η^L_j| ≤ C un / φ^{1/2}(h) = o(√n).   (3.46)

Let σ²_L(s) = E[ ψ²_L(Y1 − s) | X1 = x ] K2 f1(x); then by (3.46), we have

(1/n) ∑_{j=0}^{kn−1} E[ (η^L_j)² I(|η^L_j| ≥ ε σ_L(m(x)) √n) ] → 0

for any ε > 0.
It is obvious that (3.38)–(3.41) still hold when we substitute ψ with ψ_L in each expression. As a result,

(1/√n) ∑_{j=0}^{kn−1} η^L_j →d N(0, σ²_L(m(x))).   (3.47)

Let η̄^L_j = ηj − η^L_j. Recalling (3.47), we only need to show

(1/n) E( ∑_{j=0}^{kn−1} η̄^L_j )² → 0

as first n → ∞ and then L → ∞. In fact, a calculation similar to the proof of (3.38) gives

(1/n) E( ∑_{j=0}^{kn−1} η̄^L_j )² = (kn/n) E(η̄^L_0)² + (1/n) ∑_{i,j=0, i≠j}^{kn−1} Cov(η̄^L_i, η̄^L_j)
  = (kn un/n) E[ ψ̄²_L(Y1 − m(x)) | X1 = x ] f1(x) K2 (1 + o(1)) + o(1)
  → f1(x) K2 E[ ψ̄²_L(Y1 − m(x)) | X1 = x ]   (as n → ∞)
  → 0   (as L → ∞),

where ψ̄_L = ψ − ψ_L. Thus, the proof of Theorem 3 is completed.

Remark 5. Beran et al. (2003) studied the following fixed design model

yi = g(ti) + εi,   ti = i/n,   i = 1, 2, . . . , n,

with long-memory errors εi and obtained the order O(h²) for the bias of the proposed nonparametric M-estimator ĝ(t). In their paper, K(·) was assumed to be a symmetric kernel on [−1, 1], so ∫_{−1}^{1} u K(u) du = 0. After applying a third order Taylor expansion to g(ti), O(h²) was obtained for E ĝ(t) − g(t). However, as H is an abstract functional space in our paper, we do not apply a Taylor expansion directly to m(Xi). By imposing the smoothness condition A5 on the regression function m(·), we get

E mn(x) − m(x) = μn = O(Bn) = O( h^β λ1(x) f1(x) (h/φ(h)) ∫_0^1 K(u) φ′(hu) du ) = O(h^β).
4. Examples and numerical implementation
We first give two examples that satisfy the α-mixing conditions assumed in this paper. The first one is a multivariate process and the second one is a sequence of random curves (or functions).
Example 1. Let H = R^d. For x = (x1, . . . , xd) ∈ R^d, the norm ‖x‖ is defined as ‖x‖ = (x1² + · · · + xd²)^{1/2}, the Euclidean norm of x. For any d × d matrix A, ‖A‖ = sup_{‖x‖=1} ‖Ax‖. Suppose that {Xt} is a multivariate MA(∞) process defined by

Xt = ∑_{i=0}^∞ Ai εt−i,   (4.1)

where the Ai are d × d matrices with ‖An‖ → 0 exponentially fast as n → ∞, and the εt are i.i.d. R^d-valued random vectors with mean 0. If the probability density function of εt exists (such as the multivariate normal, exponential and uniform distributions), then by Pham and Tran (1985), {Xt} is α-mixing with α(n) → 0 exponentially fast. It can be easily verified that A5 holds. If we choose An such that ∑_{n=1}^∞ An^{−r} < ∞ and nφ²(h)/(An² log² n) → ∞, then ∑_{n=1}^∞ (An/φ(h))^{1/2} n^{1/2} α([n^{1/2}]) < ∞. So the conditions in (2.2) are satisfied. Besides, if we let φ(h) = O(n^{−δ}) with 0 < δ < 1 and vn = O(n^{(1−δ)/3}), then (n/φ(h))^{1/2} α(vn) → 0, which implies that the assumptions concerning the α-mixing coefficient in Theorem 3 are satisfied.
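A hedged sketch (ours, not from the paper) of how such a process might be simulated in practice; the truncation at 50 lags and the choice Ai = 0.5^i A0 are illustrative, picked so that ‖An‖ decays exponentially.

```python
import numpy as np

def simulate_ma_inf(n, d=2, trunc=50, seed=0):
    """Approximate the multivariate MA(infinity) process (4.1) by truncating at `trunc` lags."""
    rng = np.random.default_rng(seed)
    A0 = rng.standard_normal((d, d))
    coeffs = [0.5**i * A0 for i in range(trunc)]   # ||A_i|| -> 0 exponentially fast
    eps = rng.standard_normal((n + trunc, d))      # i.i.d. innovations with a density (normal)
    X = np.zeros((n, d))
    for t in range(n):
        for i, Ai in enumerate(coeffs):
            X[t] += Ai @ eps[trunc + t - i]
    return X

X = simulate_ma_inf(500)
```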
Example 2. Assume that {Vt} is a real-valued autoregressive moving average (ARMA) process; then it admits a Markovian representation

Vi = H Zi,   Zi = F Zi−1 + G ei,   (4.2)

where the Zi are random variables, H, F, G are appropriate constants and the ei are i.i.d. random variables with density function g(·). If |F| < 1, E|e1| < ∞ and ∫ |g(x) − g(x − u)| dx ≤ C|u|^κ for some C > 0 and κ > 0, then by Pham and Tran (1985), {Vt} is an α-mixing process with α(n) → 0 at an exponential rate. Let Xi(t) = h(Vi t), where h(·) is some real-valued function; then {Xi(t)} is a sequence of α-mixing random curves with an exponentially decaying mixing rate. Hence, for suitably chosen h and vn, the α-mixing conditions in Theorem 3 hold.
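To illustrate Example 2, here is a short sketch under our own choices (F = 0.5, G = H = 1, h(v) = sin(v), and a grid of 100 points on [0, 1]); none of these specific values come from the paper.

```python
import numpy as np

def simulate_mixing_curves(n, grid_size=100, F=0.5, seed=0):
    """Random curves X_i(t) = h(V_i * t) with V_i from the AR(1) state equation in (4.2)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, grid_size)
    V = np.zeros(n)
    for i in range(1, n):
        V[i] = F * V[i - 1] + rng.standard_normal()   # Z_i = F Z_{i-1} + G e_i with G = H = 1
    return np.sin(np.outer(V, t)), t                  # h(v) = sin(v), an illustrative choice

curves, t = simulate_mixing_curves(200)
```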
Although many time series are mixing, the mixing condition is difficult to check in general. So in the context of the asymptotic theory of statistics, it is common practice to assume the mixing property as a precondition.
Numerical implementation: Next, we give a numerical example to examine the performance of the M-estimation discussed in this paper. We choose H = C[0, 1], the space of all continuous functions defined on [0, 1].
Table 1
The average MADE of mn and m∗n

Error distribution    n      mn                  m∗n
(i)                   300    0.1073 (0.0102)     0.2152 (0.0221)
(i)                   800    0.0919 (0.0068)     0.1967 (0.0183)
(ii)                  300    0.1206 (0.0121)     0.5029 (0.0510)
(ii)                  800    0.0997 (0.0081)     0.4308 (0.0385)
(iii)                 300    0.2843 (0.0636)     6.3254 (12.3969)
(iii)                 800    0.3122 (0.0618)     6.4584 (11.0093)
We consider the random curves Xi(t) = exp(2 Wi t), 0 ≤ t ≤ 1, where the Wi are random variables generated by

Wi = 0.1 Wi−1 + ei,   ei ∼ N(0, 1).

By Example 2, we know that {Xi(t)} is a sequence of α-mixing random curves with an exponentially decaying mixing rate.
Consider the following model:

Yi = m(Xi) + εi,   m(x) = 4 ∫_0^1 (x′(t))² dt,   (4.3)

which was also used in Laksaci et al. (2007). In view of the smoothness of the random curves Xi(t), we choose the distance in H as d(x, y) = ( ∫_0^1 (x′(t) − y′(t))² dt )^{1/2} for x = x(t) ∈ H and y = y(t) ∈ H. For more discussion on the choice of distance in functional nonparametric estimation, see Ferraty and Vieu (2006).
In this example, we apply the quadratic kernel function K(u) = (3/2)(1 − u²) I(0 ≤ u ≤ 1). In order to compare the performance of the proposed M-estimator mn with the NW estimator m∗n, we consider the following distributions for the errors εi:

(i) standard normal distribution: εi ∼ N(0, 1);
(ii) contaminated errors: εi ∼ 0.4 N(0, 2²) + 0.9 N(0, 1);
(iii) Cauchy distribution: εi ∼ Cauchy(0, 1).

Furthermore, we choose ψ(y) = max{−1.75, min{y, 1.75}}, which is Huber's ψ function with c = 1.75.
A difficult problem in simulation is the choice of a proper bandwidth. Here, we apply a simple method: for a predetermined sequence of h's from a wide range, say from 0.05 to 1 with an increment of 0.02, we choose the bandwidth that minimizes the mean absolute deviation error (MADE) defined in (4.4).

As it may not be easy to solve (1.4) directly, we apply an iterative procedure to obtain mn. For any initial value θ0, calculate θt iteratively by

θt = θt−1 + ( ∑_{i=1}^n ψ′(Yi − θt−1) Δi )^{−1} ∑_{i=1}^n ψ(Yi − θt−1) Δi.

This procedure terminates at t0 when |θ_{t0} − θ_{t0−1}| < 1 × 10^{−4} or t0 = 500, and then we let mn = θ_{t0}. We compare the MADE of mn with that of the NW estimator m∗n. The MADEs of mn and m∗n are defined as

(1/n) ∑_{i=1}^n |mn(Xi) − m(Xi)|   and   (1/n) ∑_{i=1}^n |m∗n(Xi) − m(Xi)|,   (4.4)
respectively. The simulation data consist of 500 replications of sample sizes n = 300 and 800. The results are reported in Table 1.
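To make the design concrete, here is a condensed, hedged sketch of one replication under error distribution (i). The grid of 100 time points, the fixed bandwidth h = 0.5, the random seed and the finite-difference/trapezoidal approximations are our own illustrative choices (in the study above, h is instead chosen by minimizing the MADE over a grid of values).

```python
import numpy as np

rng = np.random.default_rng(1)
n, grid = 300, np.linspace(0.0, 1.0, 100)

# alpha-mixing curves X_i(t) = exp(2 W_i t) with W_i = 0.1 W_{i-1} + e_i, e_i ~ N(0, 1)
W = np.zeros(n)
for i in range(1, n):
    W[i] = 0.1 * W[i - 1] + rng.standard_normal()
X = np.exp(2.0 * np.outer(W, grid))

dX = np.gradient(X, grid, axis=1)                        # numerical derivatives x_i'(t)
m_true = 4.0 * np.trapz(dX**2, grid, axis=1)             # m(X_i) from (4.3)
Y = m_true + rng.standard_normal(n)                      # error distribution (i): N(0, 1)

# semi-metric d(x, y) = (integral of (x'(t) - y'(t))^2 dt)^(1/2)
D = np.sqrt(np.trapz((dX[:, None, :] - dX[None, :, :])**2, grid, axis=-1))

psi = lambda y, c=1.75: np.clip(y, -c, c)                # Huber's psi with c = 1.75

def m_hat(i, h=0.5):
    """M-estimate m_n(X_i) by the Newton-type iteration described above."""
    u = D[i] / h
    w = 1.5 * (1.0 - u**2) * (u <= 1.0)                  # quadratic kernel weights
    theta = np.sum(w * Y) / np.sum(w)                    # start from the NW estimate
    for _ in range(500):
        denom = np.sum(w * (np.abs(Y - theta) <= 1.75))  # sum of w_j * psi'(Y_j - theta)
        if denom <= 0:
            break
        step = np.sum(w * psi(Y - theta)) / denom
        theta += step
        if abs(step) < 1e-4:
            break
    return theta

made = np.mean([abs(m_hat(i) - m_true[i]) for i in range(n)])
print("MADE of the M-estimator:", made)
```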
From Table 1, we see that the proposed M-estimator mn is more robust than the NW estimator m∗n. This is most apparent when the errors εi are heavy-tailed (distribution (iii)). So when dealing with outliers or aberrant observations, mn performs better than the NW estimator m∗n.
Acknowledgments
We are grateful to the referees for their constructive comments, which helped greatly to improve the presentation of this paper. We would also like to thank Dr. D.G. Li for his help in the completion of this paper. This work was supported by the National Natural Science Foundation of China (Grant no. 10771192).
Appendix
The first lemma follows by Theorem 1.3 of Bosq (1998) and the second lemma is due to Volkonskii and Rozanov (1959).
Lemma 1. Suppose that {Xt} is a stationary α-mixing process with mean 0 and Sn = ∑_{t=1}^n Xt. If P(|Xt| ≤ b) = 1, t = 1, . . . , n, then for each q = 1, . . . , [n/2] and ε > 0,

P(|Sn| > nε) ≤ 4 exp( −ε²q / (8 v²(q)) ) + 22 ( 1 + 4b/ε )^{1/2} q α([n/(2q)]),

where v²(q) = 2σ²(q)/p² + bε/2, p = n/(2q), and

σ²(q) = max_{0≤j≤2q−1} E{ ([jp] + 1 − jp) X1 + X2 + · · · + X_{[(j+1)p]−[jp]} + (jp + p − [jp + p]) X_{[(j+1)p]−[jp]+1} }².
Lemma 2. Let V1, . . . , VL be α-mixing random variables that take values in the complex space and are measurable with respect to the σ-algebras F_{i1}^{j1}, . . . , F_{iL}^{jL}, respectively, with 1 ≤ i1 < j1 < i2 < · · · < jL ≤ n and il+1 − jl ≥ w ≥ 1 for 1 ≤ l ≤ L − 1. Assume that |Vj| ≤ 1 for j = 1, . . . , L. Then

| E( ∏_{j=1}^L Vj ) − ∏_{j=1}^L E(Vj) | ≤ 16 (L − 1) α(w),

where α(w) is the mixing coefficient.
References
Araujo, A., Gine, E., 1980. The Central Limit Theorem for Real and Banach Valued Random Variables. Wiley, New York.
Bai, Z.D., Rao, C.R., Wu, Y., 1992. M-estimation of multivariate linear regression parameters under a convex discrepancy function. Statist. Sinica 2, 237–254.
Beran, J., 1991. M estimators of location for Gaussian and related processes with slowly decaying serial correlations. J. Amer. Statist. Assoc. 86, 704–707.
Beran, J., Ghosh, S., Sibbertsen, P., 2003. Nonparametric M-estimation with long-memory errors. J. Statist. Plann. Inference 117, 199–205.
Boente, G., Fraiman, R., 1989. Robust nonparametric regression estimation for dependent observations. Ann. Statist. 17, 1242–1256.
Bosq, D., 1998. Nonparametric Statistics for Stochastic Processes: Estimation and Prediction. Lecture Notes in Statistics, second ed., vol. 110. Springer, Berlin.
Cai, Z., 2002. Regression quantiles for time series. Econom. Theory 18, 169–192.
Fan, J., Yao, Q., 2003. Nonlinear Time Series. Springer, New York.
Ferraty, F., Vieu, P., 2002. Nonparametric models for functional data, with application in regression, time series prediction and curve discrimination. Presented at
the International Conference on Advances and Trends in Nonparametric Statistics, Crete, Greece.
Ferraty, F., Vieu, P., 2006. Nonparametric Functional Data Analysis. Springer Series in Statistics. Springer, New York.
Galambos, J., 1978. The Asymptotic Theory of Extreme Order Statistics. Wiley, New York.
Gasser, T., Hall, P., Presnell, B., 1998. Nonparametric estimation of the mode of a distribution of random curves. J. Roy. Statist. Soc. Ser. B 60, 681–691.
Hall, P., Heyde, C.C., 1980. Martingale Limit Theory and its Applications. Academic Press, New York.
Härdle, W., 1984. Robust regression function estimation. J. Multivariate Anal. 14, 169–180.
Härdle, W., 1989. Asymptotic maximal deviation of M-smoothers. J. Multivariate Anal. 29, 163–179.
Härdle, W., Lütkepohl, H., Chen, R., 1997. A review of nonparametric time series analysis. Internat. Statist. Rev. 21, 49–72.
Huber, P.J., 1964. Robust estimation of a location parameter. Ann. Math. Statist. 35, 73–101.
Ibragimov, I.A., Linnik, Yu.V., 1971. Independent and Stationary Sequences of Random Variables. Wolters, Groningen.
Laksaci, A., Lemdani, M., Ould-Saïd, E., 2007. L1-norm kernel estimator of conditional quantile for functional regressors: consistency and asymptotic normality.
Manuscript.
Lin, Z., Lu, C., 1996. Limit Theory for Mixing Dependent Random Variables. Science Press, Kluwer Academic Publisher, Beijing.
Masry, E., 2005. Nonparametric regression estimation for dependent functional data: asymptotic normality. Stochastic Process. Appl. 115, 155–177.
Masry, E., Fan, J., 1997. Local polynomial estimation of regression functions for mixing processes. Scand. J. Statist. 24, 165–179.
Pham, T.D., Tran, L.T., 1985. Some mixing properties of time series models. Stochastic Process. Appl. 19, 297–303.
Ramsay, J., Silverman, B., 2002. Applied Functional Data Analysis: Methods and Case Studies. Springer, New York.
Robinson, P.M., 1989. Hypothesis testing in semiparametric and nonparametric models for econometric time series. Rev. Econom. Stud. 56, 511–534.
Rockafellar, R.T., 1970. Convex Analysis. Princeton University Press, Princeton.
Roussas, G.G., 1990. Nonparametric regression estimation under mixing conditions. Stochastic Process. Appl. 36, 107–116.
Serfling, R.J., 1980. Approximation Theorems of Mathematical Statistics. Wiley, New York.
Vakashnia, N.N., Tarieladze, V.I., Chobanyan, S.A., 1987. Probability Distributions on Banach Spaces. D. Reidel Publishing Co., Boston.
Volkonskii, V.A., Rozanov, Yu.A., 1959. Some limit theorems for random functions. Theory Probab. Appl. 4, 178–197.