Weighted Sums and Residual Empirical Processes for Time-varying Processes

Gabriel Chandler^1 and Wolfgang Polonik^2

^1 Department of Mathematics, Pomona College, Claremont, CA 91711
^2 Department of Statistics, University of California, Davis, CA 95616
E-mail: gabriel.chandler@pomona.edu and polonik@wald.ucdavis.edu
December 15, 2012
Abstract
In the context of a time-varying AR-process we study both function indexed weighted sums and sequential residual empirical processes. For the latter it turns out that, somewhat surprisingly, under appropriate assumptions the non-parametric estimation of the parameter functions has a negligible asymptotic effect on the estimation of the error distribution. Function indexed weighted sum processes are generalized partial sums. An exponential inequality for such weighted sums of time-varying processes provides the basis for a weak convergence result for weighted sum processes. Properties of weighted sum processes are needed to treat the residual empirical processes. As an application of our results we discuss testing for the shape of the variance function.
Keywords: Cumulants, empirical process theory, exponential inequality, locally stationary processes, nonstationary processes, investigating unimodality.
1 Introduction
Consider the following time-varying AR-model of the form

$$Y_t - \sum_{k=1}^{p} \theta_k\big(\tfrac{t}{n}\big)\, Y_{t-k} = \sigma\big(\tfrac{t}{n}\big)\, \epsilon_t, \qquad t = -p+1, \dots, 0, 1, \dots, n, \qquad (1)$$

where the $\theta_k$ are the autoregressive parameter functions, $p$ is the order of the model, $\sigma$ is a function controlling the volatility, and the $\epsilon_t \sim (0,1)$ are i.i.d. Following Dahlhaus (1997), time is rescaled to the interval $[0,1]$ in order to make a large sample analysis feasible. Observe that this in particular means that $Y_t = Y_{t,n}$ satisfying (1) in fact forms a triangular array.
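To fix ideas, model (1) can be simulated directly. The following minimal sketch is our own illustration, not taken from the paper: the function names (`simulate_tvar`, `burn_in`) are ours, Gaussian innovations are assumed, and a burn-in stretch at rescaled time $u = 0$ is used to start the recursion.

```python
import math
import random

def simulate_tvar(n, theta_funcs, sigma, burn_in=200, seed=0):
    """Simulate model (1): Y_t = sum_k theta_k(t/n) Y_{t-k} + sigma(t/n) eps_t.

    theta_funcs: list of parameter functions theta_k on [0, 1]; p = len(theta_funcs).
    Rescaled time t/n is clipped to [0, 1]; zero initial conditions are
    forgotten during the burn-in stretch.
    """
    rng = random.Random(seed)
    p = len(theta_funcs)
    y = [0.0] * p          # initial conditions Y_{-p+1}, ..., Y_0
    out = []
    for t in range(-burn_in, n + 1):
        u = min(max(t / n, 0.0), 1.0)
        eps = rng.gauss(0.0, 1.0)
        # y[-1] is Y_{t-1}, y[-2] is Y_{t-2}, etc.
        yt = sum(theta_funcs[k](u) * y[-(k + 1)] for k in range(p)) + sigma(u) * eps
        y.append(yt)
        if t >= 1:
            out.append(yt)
    return out

# Example: tvAR(1) with theta_1(u) drifting from 0.1 to 0.6 and unimodal volatility.
ys = simulate_tvar(500, [lambda u: 0.1 + 0.5 * u], lambda u: 1.0 + math.sin(math.pi * u))
```

Since the coefficient stays well inside the unit interval and $\sigma$ is bounded, the simulated paths remain stochastically bounded, consistent with the triangular-array setup.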
The consideration of non-stationary time series models goes back to Priestley (1965) who
considered evolutionary spectra, i.e. spectra of time series evolving in time. The time-varying
AR-process has always been an important special case, either in more methodological and
theoretical considerations of non-stationary processes, or in applications such as signal processing and (financial) econometrics; see, e.g., Subba Rao (1970), Grenier (1983), Hall et al. (1983), Rajan and Rayner (1996), Girault et al. (1998), Eom (1999), Drees and Stărică (2002), Orbe et al. (2005), Fryzlewicz et al. (2006), and Chandler and Polonik (2006).
Dahlhaus (1997) advanced the formal analysis of time-varying processes by introducing the
notion of a locally stationary process. This is a time-varying process with time being rescaled to [0, 1] that satisfies certain regularity assumptions; see (14)-(16) below. We would
like to point out, however, that in this paper local stationarity is only used to calculate
the asymptotic covariance function in Theorem 3. All the other results hold under weaker
assumptions.
Let $\mathbf{Y}_{t-1} = (Y_{t-1}, \dots, Y_{t-p})'$. Given an estimator $\hat\theta$ of $\theta = (\theta_1, \dots, \theta_p)'$ and corresponding residuals $\hat\eta_t = Y_t - \hat\theta\big(\tfrac{t}{n}\big)' \mathbf{Y}_{t-1}$, we estimate the distribution of the innovations $\eta_t = \sigma\big(\tfrac{t}{n}\big)\epsilon_t$ by the distribution function of the residuals. Define the sequential empirical distribution function of the residuals as

$$\hat F_n(\alpha, z) = \frac{1}{n} \sum_{t=1}^{\lfloor \alpha n \rfloor} 1\{\hat\eta_t \le z\}, \qquad z \in \mathbb{R},\ \alpha \in [0,1].$$

Let $F_n(\alpha, z) = \frac{1}{n} \sum_{t=1}^{\lfloor \alpha n \rfloor} 1\{\eta_t \le z\}$ denote the sequential empirical distribution function of the innovations $\eta_t$. A somewhat surprising result that will be shown here is that under appropriate conditions we have (see Theorem 2)

$$\sup_{\alpha \in [0,1],\, z \in [-L,L]} |\hat F_n(\alpha, z) - F_n(\alpha, z)| = o_P(n^{-1/2}) \qquad (2)$$
even though non-parametric estimation of the parameter functions $\theta_k$ is involved. See also (12), where the $\hat\eta_t$ are replaced by the $\hat\epsilon_t$.
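The two sequential empirical distribution functions are straightforward to compute. The following toy sketch is our own illustration of the indexing conventions (with a "perfect fit", so that residuals equal innovations and the sup in (2) is exactly zero; with estimated parameters it would be small but nonzero):

```python
import random

def seq_edf(values, alpha, z, n=None):
    """Sequential empirical df: (1/n) * sum_{t <= floor(alpha*n)} 1{values_t <= z}."""
    n = len(values) if n is None else n
    m = int(alpha * n)                     # floor(alpha * n) for alpha in [0, 1]
    return sum(1 for v in values[:m] if v <= z) / n

rng = random.Random(1)
n = 200
eta = [rng.gauss(0, 1) for _ in range(n)]   # innovations eta_t
eta_hat = list(eta)                          # residuals from a hypothetical perfect fit
sup_diff = max(
    abs(seq_edf(eta_hat, a, z) - seq_edf(eta, a, z))
    for a in [i / 10 for i in range(11)]
    for z in [-2 + 0.5 * j for j in range(9)]
)
# here sup_diff is identically 0; Theorem 2 says it stays o_P(n^{-1/2}) under estimation
```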
One can see that using appropriate estimators for the parameter functions, the (average)
distribution function of the ηt can be estimated just as well as if the parameters were known.
The same applies to the estimation of the distribution function of the errors $\epsilon_t$ (cf. (12) in Section 2.1). This effect is similar to what is known in the literature for the parametric case, i.e. for a standard AR(p)-process (see Koul 2002). In our case, this phenomenon is caused by the particular structure underlying the time-varying processes considered. Other empirical processes based on residuals studied in the literature that also involve non-parametric estimation do not exhibit such behavior. See, for instance, Akritas and van Keilegom (2001), Cheng (2005), Müller, Schick and Wefelmeyer (2007, 2009a, 2009b, 2012) and van Keilegom et al. (2008).
The fact that in this work we emphasize processes based on the $\hat\eta_t$ rather than on the $\hat\epsilon_t$ is
triggered by the applications of our results we have in mind. However, the incorporation of
the estimation of σ is more or less straightforward. See section 2.1 for more details.
The proof of (2) (and also the one of (12)) involves the behavior of two types of stochastic
processes, the large sample behavior of which based on observations from a non-stationary
process satisfying (1) is investigated in this paper. One of these processes is the residual sequential empirical process, closely related to the sequential empirical process based on the residuals; the other is a generalized partial sum process, or weighted sum process. Both are of independent interest.
Residual sequential empirical processes are defined as follows. First observe that we can write $\hat F_n(\alpha, z) = \frac{1}{n} \sum_{t=1}^{\lfloor \alpha n \rfloor} 1\big\{\sigma(\tfrac{t}{n})\, \epsilon_t \le (\theta - \hat\theta)'(\tfrac{t}{n})\, \mathbf{Y}_{t-1} + z\big\}$. This motivates the consideration of the process

$$\nu_n(\alpha, z, \mathbf{g}) = \frac{1}{\sqrt{n}} \sum_{t=1}^{\lfloor \alpha n \rfloor} \Big[ 1\big\{\sigma(\tfrac{t}{n})\,\epsilon_t \le \mathbf{g}'(\tfrac{t}{n})\, \mathbf{Y}_{t-1} + z\big\} - F\big( \big(\mathbf{g}'(\tfrac{t}{n})\, \mathbf{Y}_{t-1} + z\big)/\sigma(\tfrac{t}{n}) \big) \Big], \qquad (3)$$
where $\alpha \in [0,1]$, $z \in \mathbb{R}$, $\mathbf{g} : [0,1] \to \mathbb{R}^p$, $\mathbf{g} \in \mathcal{G}^p = \{\mathbf{g} = (g_1, \dots, g_p)',\ g_i \in \mathcal{G}\}$ with $\mathcal{G}$ an appropriate function class such that $\theta - \hat\theta \in \mathcal{G}^p$ (see below), and $F(z)$ denotes the distribution function of the errors $\epsilon_t$. Following Koul (2002), we call the process (3) a residual sequential empirical process even though residuals are no longer directly involved. Observe that with $\mathbf{0} = (0, \dots, 0)'$ denoting the $p$-vector of null-functions, $\nu_n(\alpha, z, \mathbf{0})$ equals the sequential empirical process based on the innovations $\eta_t$. The basic form of $\nu_n$ is standard, and residual empirical processes based on non-parametric models have been considered in the literature as indicated above. Our contribution here is to study these processes based on a non-stationary $Y_t$ of the form (1).
The second type of key process in this paper are weighted sums of the form

$$Z_n(h) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} h\big(\tfrac{t}{n}\big)\, Y_t, \qquad h \in \mathcal{H},$$

where $\mathcal{H}$ is an appropriate function class. Such processes can be considered as generalizations of partial sum processes. Exponential inequalities and weak convergence results for such processes are derived below.
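A direct computation of $Z_n(h)$ can be sketched as follows. This is our own illustration with i.i.d. placeholder data rather than data generated by (1), and the names (`weighted_sum`, `z_const`, `z_indic`) are ours; the indicator weight recovers a classical partial sum up to $\alpha = 0.5$.

```python
import math
import random

def weighted_sum(h, ys):
    """Z_n(h) = n^{-1/2} * sum_{t=1}^n h(t/n) Y_t for a weight function h on [0, 1]."""
    n = len(ys)
    return sum(h(t / n) * y for t, y in zip(range(1, n + 1), ys)) / math.sqrt(n)

rng = random.Random(2)
n = 1000
ys = [rng.gauss(0, 1) for _ in range(n)]   # placeholder data; the paper uses Y_t from (1)

z_const = weighted_sum(lambda u: 1.0, ys)                        # normalized partial sum
z_indic = weighted_sum(lambda u: 1.0 if u <= 0.5 else 0.0, ys)   # partial sum up to alpha = 0.5
```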
Motivation. Properties of both processes, $\nu_n(\alpha, z, \mathbf{g})$ and $Z_n(h)$, are used to derive conditions for (2) to hold. Moreover, properties of both processes are also used to derive the large sample behavior of a test statistic for the shape of the variance function in model (1) that is proposed in an accompanying paper, Chandler and Polonik (2012). This test statistic is based on the sequential empirical process of the squared residuals; see the definition of $\hat G_{n,\gamma}(\alpha)$ in (21) below. For more details, see Section 4.

The outline of the paper is as follows. In Sections 2 and 3 we analyze the large sample behavior of the function indexed residual empirical process and of weighted sums, respectively, under the time-varying model (1), and apply the results to show (2). Section 4 applies some of the results from the preceding sections to derive an approximation result for $\hat G_{n,\gamma}(\alpha)$. This approximation provides a crucial ingredient to Chandler and Polonik (2012), where the asymptotically distribution free limit of a test statistic for testing modality of the variance function in model (1) is derived. Proofs are deferred to Section 5.
Remark on measurability. Suprema of function indexed processes will enter the theoretical results below. We assume throughout the paper that such suprema are measurable.
2 Residual empirical processes under time-varying AR-models
Work on empirical processes based on residuals in models involving non-parametric components has already been mentioned above. There of course also exists a substantial body of work on residual empirical processes indexed by a finite dimensional parameter. For the time series setting see, for instance, Horváth et al. (2001), Stute (2001), Koul (2002), Koul and Ling (2006), Laïb et al. (2008), Müller et al. (2009a), and references therein. Here we
are considering function indexed residual empirical processes based on triangular arrays of
non-stationary time series of the form (1).
In order to formulate one of our main results for the residual empirical process νn (α, z, g)
defined in (3) above, we first introduce some notation and formulate the underlying assumptions.
Let $\mathcal{H}$ denote a class of functions defined on $[0,1]$, and let $d$ denote a metric on $\mathcal{H}$. For a given $\delta > 0$, let $N(\delta, \mathcal{H}, d)$ denote the minimal number $N$ of $d$-balls of radius less than or equal to $\delta$ that are needed to cover $\mathcal{H}$, i.e., there exist functions $g_k$, $k = 1, \dots, N$, such that the balls $A_k = \{h \in \mathcal{H} : d(g_k, h) \le \delta\}$ satisfy $\mathcal{H} \subset \bigcup_{k=1}^{N} A_k$. Then $\log N(\delta, \mathcal{H}, d)$ is called the metric entropy of $\mathcal{H}$ with respect to $d$. If the balls $A_k$ are replaced by brackets $B_k = \{h \in \mathcal{H} : \underline{g}_k \le h \le \overline{g}_k\}$ for pairs of functions $\underline{g}_k \le \overline{g}_k$, $k = 1, \dots, N$, with $d(\underline{g}_k, \overline{g}_k) \le \delta$, then the minimal number $N = N_B(\delta, \mathcal{H}, d)$ of such brackets with $\mathcal{H} \subset \bigcup_{k=1}^{N} B_k$ is called a bracketing covering number, and $\log N_B(\delta, \mathcal{H}, d)$ is called the metric entropy with bracketing of $\mathcal{H}$ with respect to $d$. For a function $h : [0,1] \to \mathbb{R}$ we denote

$$\|h\|_\infty := \sup_{u \in [0,1]} |h(u)| \qquad \text{and} \qquad \|h\|_n^2 := \frac{1}{n} \sum_{t=1}^{n} h^2\big(\tfrac{t}{n}\big).$$

We further denote by $N_n(\delta, \mathcal{H})$ and $N_{n,B}(\delta, \mathcal{H})$ the metric entropy and the metric entropy with bracketing, respectively, with respect to the metric defined by $\|\cdot\|_n$.
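The two norms can be sketched numerically as follows (our own illustration; the grid approximation of $\|\cdot\|_\infty$ is an assumption of the sketch). For $h(u) = u$ one has $\|h\|_\infty = 1$ and $\|h\|_n^2 = \frac{1}{n}\sum_{t=1}^n (t/n)^2 \to 1/3$.

```python
def sup_norm(h, grid_size=1000):
    """||h||_inf approximated on an equidistant grid of [0, 1]."""
    return max(abs(h(i / grid_size)) for i in range(grid_size + 1))

def n_norm(h, n):
    """||h||_n = sqrt((1/n) * sum_{t=1}^n h(t/n)^2), the empirical norm behind N_n and N_{n,B}."""
    return (sum(h(t / n) ** 2 for t in range(1, n + 1)) / n) ** 0.5

h = lambda u: u   # identity on [0, 1]: ||h||_inf = 1, ||h||_n^2 -> 1/3
```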
Assumptions. (i) The process $Y_t = Y_{t,T}$ has an MA-type representation

$$Y_{t,T} = \sum_{j=0}^{\infty} a_{t,T}(j)\, \epsilon_{t-j}, \qquad (4)$$

where the $\epsilon_t$ are i.i.d. with mean zero and unit variance. The distribution $F$ of the $\epsilon_t$ has a strictly positive Lipschitz continuous Lebesgue density $f$. The function $\sigma(u)$ in (1) is of bounded variation with $0 < m_* < \sigma(u) < m^* < \infty$ for all $u$.

(ii) The coefficients $a_{t,T}(\cdot)$ of the MA-type representation of $Y_{t,T}$ given in (i) satisfy

$$\sup_{1 \le t \le T} |a_{t,T}(j)| \le \frac{K}{\ell(j)}$$

for some $K > 0$, where for some $\kappa > 0$ we have $\ell(j) = j (\log j)^{1+\kappa}$ for $j > 1$ and $\ell(j) = 1$ for $j = 0, 1$.
Assumptions (i) and (ii) have been used in the literature on locally stationary processes
before (e.g. Dahlhaus and Polonik, 2006, 2009). It is shown in Dahlhaus and Polonik (2005)
by using a proof similar to Künsch (1995) that (i) holds for time-varying AR-processes (1)
if the zeros of the corresponding AR-polynomials are bounded away from the unit circle
(uniformly in the rescaled time u) and the parameter functions are of bounded variation.
Theorem 1 Suppose that assumptions (i) and (ii) hold, and that $\mathcal{G}$ is a function class satisfying, for some $c > 0$,

$$\int_{c/n}^{1} \sqrt{\log N_{n,B}(u^2, \mathcal{G})}\; du < \infty \qquad \forall\, n. \qquad (5)$$

Then we have for any $L > 0$ and $\delta_n \to 0$ as $n \to \infty$ that

$$\sup_{\substack{\alpha \in [0,1],\, z \in [-L,L],\, \mathbf{g} \in \mathcal{G}^p \\ \|\mathbf{g}\|_n \le \delta_n}} |\nu_n(\alpha, z, \mathbf{g}) - \nu_n(\alpha, z, \mathbf{0})| = o_P(1). \qquad (6)$$
This theorem is typical of work on residual empirical processes (e.g. (8.2.32) in Koul 2002), although here we are dealing with time-varying AR-processes and are considering a non-parametric index class of functions. Also keep in mind that we are considering triangular arrays (recall that $Y_t = Y_{t,T}$).

Notice that the above result considers the difference of two processes where the 'right' centering is used. In contrast to that, we now consider the difference between two sequential empirical distribution functions, one of them based on the residuals and the other on the innovations. It will turn out that, somewhat surprisingly, the difference of these two functions is $o_P(1/\sqrt{n})$, so that the non-parametric estimation of the parameter functions has a negligible asymptotic effect on the estimation of the (average) innovation distribution. More assumptions are needed.
Assumptions.

(iii) There exists a class $\mathcal{G}$ with

$$\theta_k(\cdot) - \hat\theta_k(\cdot) \in \mathcal{G}, \qquad k = 1, \dots, p, \quad \text{with probability tending to 1 as } n \to \infty,$$

such that $\sup_{g \in \mathcal{G}} \|g\|_\infty < \infty$ and, for some $C, c > 0$,

$$\int_{c/n}^{1} \log N_n(u, \mathcal{G})\, du < C < \infty \qquad \forall\, n.$$

(iv) For $k = 1, \dots, p$ we have $\frac{1}{n} \sum_{t=1}^{n} \big( \hat\theta_k(\tfrac{t}{n}) - \theta_k(\tfrac{t}{n}) \big)^2 = O_P(m_n^{-1})$ with $m_n \to \infty$ as $n \to \infty$.

Assumptions (iii) and (iv) are discussed below.
Theorem 2 Suppose that the assumptions of Theorem 1 hold. If, in addition, assumptions (iii) and (iv) are satisfied, where $m_n$ satisfies $n^{1/2} / (m_n^2 \log n) \to 0$ as $n \to \infty$, then for any $L > 0$

$$\sup_{\alpha \in [0,1],\, z \in [-L,L]} |\hat F_n(\alpha, z) - F_n(\alpha, z)| = o_P(1/\sqrt{n}). \qquad (7)$$
Discussion of assumptions (iii) and (iv). The assumptions on the entropy integrals (see assumption (iii) and (5)) control the complexity of the class $\mathcal{G}$. Many classes $\mathcal{G}$ are known to satisfy these assumptions; see below for one example (for more examples we refer to the literature on empirical process theory). Notice that a more standard condition on the covering integral is $\int_{c/n}^{1} \sqrt{\log N_n(u, \mathcal{G})}\; du < \infty$ (or similarly with bracketing). In contrast to that, the entropy integral in assumption (iii) does not have a square root. This is similar to condition (5), where the integrand is $\sqrt{\log N_{n,B}(u^2, \mathcal{G})}$ (notice the $u^2$). This makes both our entropy conditions stronger than the standard assumption. The reason for this is that the exponential inequality underlying the derivation of our results is not of sub-Gaussian type (see Lemma 3), which in turn is caused by the dependence structure of our underlying process.
A class of non-parametric estimators satisfying conditions (iii) and (iv) is given by the wavelet estimators of Dahlhaus et al. (1999). These estimators lie in the Besov smoothness class $B^s_{p,q}(C)$, where the smoothness parameters satisfy the condition $s + \frac{1}{2} - \frac{1}{\max(2,p)} > 1$. The constant $C > 0$ is a uniform bound on the (Besov) norm of the functions in the class. Dahlhaus et al. derive conditions under which their estimators converge at rate $\big( \frac{\log n}{n} \big)^{s/(2s+1)}$ in the $L_2$-norm. For $s \ge 1$ the functions in $B^s_{p,q}(C)$ have uniformly bounded total variation. Assuming that the model parameter functions also possess this property, the rate of convergence in the $\|\cdot\|_n$-norm is the same as that in the $L_2$-norm, because in this case the error in approximating the integral by the average over equidistant points is of order $O(n^{-1})$. Consequently, in this case we have $m_n^{-1} = \big( \frac{\log n}{n} \big)^{s/(2s+1)}$. In order to verify the condition on the bracketing covering numbers from (iii), we use Nickl and Pötscher (2007). Their Corollary 1, applied with $s = 2$, $p = q = 2$, implies that the bracketing entropy with respect to the $L_2$-norm can be bounded by $C \delta^{-1/2}$. (When applying their Corollary to our situation choose, in their notation, $\beta = 0$, $\mu = U[0,1]$, $r = 2$ and $\gamma = 2$, say.)
The proof of Theorem 1 rests on the following lemma, which is of independent interest. It is modeled after similar results for empirical processes (see van de Geer, 2000, Theorem 5.11). Let $\mathcal{H}_L = [0,1] \times [-L,L] \times \mathcal{G}^p$ denote the index space of the process $\nu_n$, where $L > 0$ is some constant. Define a metric on $\mathcal{H}_L$ as

$$d_n(h_1, h_2) = d_n\big( (\alpha_1, z_1, \mathbf{g}_1), (\alpha_2, z_2, \mathbf{g}_2) \big) = |\alpha_1 - \alpha_2| + |z_1 - z_2| + \sum_{k=1}^{p} \|g_{1,k} - g_{2,k}\|_n. \qquad (8)$$
Lemma 1 For $C_0 > 0$ let $A_n = \big\{ \frac{1}{n} \sum_{s=-p+1}^{n} Y_s^2 \le C_0^2 \big\}$ and define $K^* = 1 + (1 + p\, C_0)\, \frac{\|f\|_\infty}{m_*}$. Suppose that $\mathcal{H}_L$ is totally bounded with respect to $d_n$, that assumptions (i) and (ii) hold, and that for $C_1 > 0$,

$$\eta \ge \frac{2^6 K^*}{\sqrt{n}}, \qquad (9)$$

$$\eta \le \tfrac{1}{2}\, K^* \sqrt{n}\, (\tau^2 \wedge \tau), \qquad (10)$$

$$\eta \ge C_1 \Big( \int_{\eta / (2^8 K^* \sqrt{n})}^{\tau} \sqrt{\log N_B(u^2, \mathcal{H}_L, d_n)}\; du \;\vee\; \tau \Big). \qquad (11)$$

Then, for every $L > 0$ and for $C_1 \ge 2\sqrt{10\, K^*}$, we have with $C_2 = \frac{2^6 (2^6 + 1)\, K^*}{C_1^2} + 2$ that

$$P\Big( \sup_{h_1, h_2 \in \mathcal{H}_L :\; d_n(h_1, h_2) \le \tau^2} |\nu_n(h_1) - \nu_n(h_2)| \ge \eta,\; A_n \Big) \le C_2 \exp\Big( - \frac{\eta^2}{2^6 (2^6 + 1)\, K^* \tau^2} \Big).$$
2.1 Estimating the distribution function of the errors $\epsilon_t$

As indicated in the introduction, the fact that we present our results for the case of $\sigma(\cdot)$ not being estimated is related to the applications we have in mind (see Section 4). However, the case of also estimating the variance function, and thus estimating the distribution of $\epsilon_t$ rather than the average distribution of $\eta_t = \sigma(\tfrac{t}{n})\epsilon_t$, can be treated with very similar techniques. This is discussed here briefly.

Assume we have available an estimator $\hat\sigma(\cdot)$ of the variance function. Then the residuals become $\hat\epsilon_t = \big( Y_t - \hat\theta(\tfrac{t}{n})' \mathbf{Y}_{t-1} \big) / \hat\sigma(\tfrac{t}{n})$, and the sequential empirical distribution function of the $\hat\epsilon_t$ can be written as

$$\hat F_n^*(\alpha, z) = \frac{1}{n} \sum_{t=1}^{\lfloor \alpha n \rfloor} 1\{\hat\epsilon_t \le z\} = \frac{1}{n} \sum_{t=1}^{\lfloor \alpha n \rfloor} 1\Big\{ \epsilon_t \le \frac{(\hat\theta - \theta)(\tfrac{t}{n})'\, \mathbf{Y}_{t-1}}{\sigma(\tfrac{t}{n})} + \Big( \frac{\hat\sigma(\tfrac{t}{n})}{\sigma(\tfrac{t}{n})} - 1 \Big)\, z + z \Big\}.$$
Accordingly, we define the modified residual sequential empirical process as

$$\nu_n^*(\alpha, z, \mathbf{g}, s) = \frac{1}{\sqrt{n}} \sum_{t=1}^{\lfloor \alpha n \rfloor} \Big[ 1\big\{ \sigma(\tfrac{t}{n})\, \epsilon_t \le \mathbf{g}'(\tfrac{t}{n})\, \mathbf{Y}_{t-1} + z\, s(\tfrac{t}{n}) + z \big\} - F\Big( \big( \mathbf{g}'(\tfrac{t}{n})\, \mathbf{Y}_{t-1} + z\, s(\tfrac{t}{n}) + z \big) / \sigma(\tfrac{t}{n}) \Big) \Big],$$

where the function $s$ lies in a class $\mathcal{S}$ such that $\hat\sigma - \sigma \in \mathcal{S}$ (with probability tending to 1 as $n \to \infty$). An adapted version of Theorem 1 holds for $\nu_n^*(\alpha, z, \mathbf{g}, s) - \nu_n^*(\alpha, z, \mathbf{0}, 0)$, where the adaptation consists in replacing $\mathcal{G}$ by $\mathcal{G} \cup \mathcal{S}$ in the entropy integral, and the supremum is also taken over all $s \in \mathcal{S}$ with $\|s\|_n \le \delta_n \to 0$ as $n \to \infty$.

Assuming further that the conditions on $\mathcal{S}$ and on the difference $\hat\sigma - \sigma \in \mathcal{S}$ follow exactly the ones given in assumptions (iii) and (iv) above, then under the same assumption on the rates of convergence of our estimators, the non-parametric estimation of the AR-parameter functions and of the variance function has a negligible asymptotic effect on the estimation of the distribution function of the errors $\epsilon_t$, i.e.
$$\sup_{\alpha \in [0,1],\, z \in [-L,L]} \Big| \frac{1}{n} \sum_{t=1}^{\lfloor \alpha n \rfloor} 1\{\hat\epsilon_t \le z\} - \frac{1}{n} \sum_{t=1}^{\lfloor \alpha n \rfloor} 1\{\epsilon_t \le z\} \Big| = o_P(1/\sqrt{n}). \qquad (12)$$
The proofs of these results require only some straightforward adaptations, beginning with a modification of $d_n$ (given in (8)) to a metric on $\mathcal{H}_L^* = [0,1] \times [-L,L] \times \mathcal{G}^p \times \mathcal{S}$ by adding $\|s_1 - s_2\|_n$. With this modification Lemma 1 still holds for $\mathcal{H}_L^*$. This then is the main ingredient to the proofs of Theorems 1 and 2 for processes based on the $\hat\epsilon_t$ rather than on the $\hat\eta_t$.

Just as for the estimators of the AR-parameters, the wavelet based estimator of the variance function of Dahlhaus et al. (1999) satisfies the necessary assumptions on $\hat\sigma$.
3 Weighted sums under local stationarity

As discussed above, the second type of process of importance in our context are weighted partial sums of locally stationary processes given by

$$Z_n(h) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} h\big(\tfrac{t}{n}\big)\, Y_t, \qquad h \in \mathcal{H}. \qquad (13)$$
In the i.i.d. case, weighted sums have received some attention in the literature. For functional
central limit theorems and exponential inequalities see, for instance, Alexander and Pyke
(1986), van de Geer (2000), and references therein.
We will show below that under appropriate assumptions $Z_n(h)$ converges weakly to a Gaussian process. In order to calculate the covariance function of the limit we assume that the process $Y_t$ is locally stationary as in Dahlhaus and Polonik (2009). Recalling assumption (i), this means that we assume the existence of functions $a(\cdot, j) : (0,1] \to \mathbb{R}$ with

$$\sup_u |a(u, j)| \le \frac{K}{\ell(j)}, \qquad (14)$$

$$\sup_j \sum_{t=1}^{n} \Big| a_{t,T}(j) - a\big(\tfrac{t}{T}, j\big) \Big| \le K, \qquad (15)$$

$$TV(a(\cdot, j)) \le \frac{K}{\ell(j)}, \qquad (16)$$

where for a function $g : [0,1] \to \mathbb{R}$ we denote by $TV(g)$ the total variation of $g$ on $[0,1]$. Conditions (14)-(16) hold if the zeros of the corresponding AR-polynomials are bounded away from the unit circle (uniformly in the rescaled time $u$) and the parameter functions are of bounded variation (see Dahlhaus and Polonik, 2006). Further we define the time-varying spectral density as the function

$$f(u, \lambda) := \frac{1}{2\pi}\, |A(u, \lambda)|^2 \qquad \text{with} \qquad A(u, \lambda) := \sum_{j=-\infty}^{\infty} a(u, j) \exp(-i \lambda j),$$

and

$$c(u, k) := \int_{-\pi}^{\pi} f(u, \lambda) \exp(i \lambda k)\, d\lambda = \sum_{j=-\infty}^{\infty} a(u, k + j)\, a(u, j)$$

is the time-varying covariance of lag $k$ at rescaled time $u \in [0,1]$. We also denote by $\text{cum}_k(X)$ the $k$-th order cumulant of a random variable $X$.
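For intuition, in the tvAR(1) case with coefficient function $\theta(u)$, local stationarity gives $a(u,j) = \theta(u)^j$, hence $c(u,k) = \theta(u)^{|k|} / (1 - \theta(u)^2)$ and $S(u) = \sum_k c(u,k) = 1/(1 - \theta(u))^2$. This closed form is the standard AR(1) calculation, not a formula from the paper; the following sketch (our own) checks it numerically:

```python
def local_acov(theta_u, k):
    """c(u, k) for a tvAR(1): locally AR(1) with coefficient theta(u), so
    a(u, j) = theta(u)**j and c(u, k) = theta(u)**|k| / (1 - theta(u)**2)."""
    return theta_u ** abs(k) / (1.0 - theta_u ** 2)

def S(theta_u, K=2000):
    """Truncated S(u) = sum_k c(u, k); closed form is 1 / (1 - theta(u))**2."""
    return sum(local_acov(theta_u, k) for k in range(-K, K + 1))

theta_u = 0.5
# closed form: 1 / (1 - 0.5)**2 = 4.0, matched by the truncated sum
```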
Theorem 3 Let $\mathcal{H}$ denote a class of uniformly bounded, real valued functions of bounded variation defined on $[0,1]$. Assume further that for some $C, c > 0$,

$$\int_{c/n}^{1} \log N_n(u, \mathcal{H})\, du < C < \infty \qquad \forall\, n, \qquad (17)$$

and that for some constant $C_1 > 0$ we have $\text{cum}_k(\epsilon_t) \le k!\, C_1^k$ for $k = 1, 2, \dots$ Then we have under assumptions (i) and (ii) that, as $n \to \infty$, the process $Z_n(h)$, $h \in \mathcal{H}$, converges weakly to a tight, mean zero Gaussian process $\{G(h),\ h \in \mathcal{H}\}$. If, in addition, (14)-(16) hold, then the variance-covariance function of $G(h)$ can be calculated as $C(h_1, h_2) = \int_0^1 h_1(u)\, h_2(u)\, S(u)\, du$, where $S(u) = \sum_{k=-\infty}^{\infty} c(u, k)$.
Remarks. a) Here weak convergence is meant in the sense of Hoffmann-Jørgensen; see van der Vaart and Wellner (1996) for more details.

b) It is well known that for normally distributed errors we have $\text{cum}_k(\epsilon_t) = 0$ for $k = 3, 4, \dots$, and thus they satisfy the assumptions of the theorem.

c) Weighted partial sums of the form

$$Z_n(\alpha, g) = \frac{1}{\sqrt{n}} \sum_{t=1}^{\lfloor \alpha n \rfloor} g\big(\tfrac{t}{n}\big)\, Y_t$$

are in fact a special case of processes considered in the theorem: here $h(u) = h_{g,\alpha}(u) = 1_{[0,\alpha]}(u)\, g(u)$. Note that if $g \in \mathcal{G}$ and $\mathcal{G}$ satisfies the assumptions on the covering integral from the above theorem, then so does the class $\{h_{g,\alpha} : g \in \mathcal{G},\ \alpha \in [0,1]\}$. In this case the limit covariance can be written as $C(h_{g_1,\alpha_1}, h_{g_2,\alpha_2}) = \int_0^{\alpha_1 \wedge \alpha_2} g_1(u)\, g_2(u)\, S(u)\, du$.

d) Assumptions (14)-(16) are only used for calculating the covariance function of the limit process.
The main ingredients to the proof of Theorem 3 are presented in the following results, which are of independent interest.

Theorem 4 Let $\{Y_t, t = 1, \dots, n\}$ satisfy assumptions (i) and (ii) and let $\mathcal{H} = \{h : [0,1] \to \mathbb{R}\}$ be totally bounded with respect to $\|\cdot\|_n$. Further assume that there exists a constant $C > 0$ such that for all $k = 1, 2, \dots$ we have $\text{cum}_k(\epsilon_t) \le k!\, C^k$, and let $A_n = \big\{ \frac{1}{n} \sum_{t=1}^{n} Y_t^2 \le M^2 \big\}$ where $M > 0$. There exist constants $c_0, c_1, c_2 > 0$ such that for all $\eta > 0$ satisfying

$$\eta < 16\, M \sqrt{n}\, \tau \qquad (18)$$

and

$$\eta > c_0 \Big( \int_{\eta / (8 M \sqrt{n})}^{\tau} \log N_n(u, \mathcal{H})\, du \;\vee\; \tau \Big) \qquad (19)$$

we have

$$P\Big( \sup_{h \in \mathcal{H},\, \|h\|_n \le \tau} |Z_n(h)| > \eta,\; A_n \Big) \le c_1 \exp\Big( - \frac{\eta}{c_2\, \tau} \Big).$$
The next result of importance for the proof of Theorem 3 deals with cumulants. For random variables $X_1, \dots, X_k$ we denote by $\text{cum}(X_1, \dots, X_k)$ their joint cumulant, and if $X_i = X$ for all $i = 1, \dots, k$, then $\text{cum}(X_1, \dots, X_k) = \text{cum}(X, \dots, X) = \text{cum}_k(X)$, the $k$-th order cumulant of $X$.

Lemma 2 Let $\{Y_t, t = 1, \dots, n\}$ satisfy assumptions (i) and (ii). For $j = 1, 2, \dots$, let $h_j$ be functions defined on $[a, b]$ with $\|h_j\|_n < \infty$. Then there exists a constant $1 \le K_0 < \infty$ such that for all $k \ge 1$,

$$\big| \text{cum}(Z_n(h_1), \dots, Z_n(h_k)) \big| \le K_0^{k-1}\, \text{cum}_k(\epsilon_1) \prod_{j=1}^{k} \|h_j\|_n.$$

If, in addition, $\|h_j\|_\infty \le M < \infty$, $j = 1, \dots, k$, then for $k \ge 3$,

$$\big| \text{cum}(Z_n(h_1), \dots, Z_n(h_k)) \big| \le K_0^{k-2}\, M^k\, \text{cum}_k(\epsilon_1)\, n^{-\frac{k-2}{2}}.$$
The behavior of the cumulants given in the above lemma is needed for the following crucial exponential inequality.

Lemma 3 Let $\{Y_t, t = 1, \dots, n\}$ satisfy the assumptions of Lemma 2. Let $h$ be a function with $\|h\|_n < \infty$. Assume that there exists a constant $C > 0$ such that for all $k = 1, 2, \dots$ we have $\text{cum}_k(\epsilon_t) \le k!\, C^k$. Then there exist constants $c_1, c_2 > 0$ such that for any $\eta > 0$ we have

$$P\big( |Z_n(h)| > \eta \big) \le c_1 \exp\Big( - \frac{\eta}{c_2\, \|h\|_n} \Big). \qquad (20)$$

The assumptions on the cumulants in particular hold for errors with $E|\epsilon_t|^k \le \big(\tfrac{C}{2}\big)^k$ for all $k = 1, 2, \dots$
4 Applications

We present two applications that motivate the above study of the two types of processes. The first is a test for constancy of the variance function, i.e. for homoskedasticity. The second, closely related though more involved, is a test for determining the modality of the variance function.

4.1 A test of homoskedasticity

We are interested in testing the null hypothesis $H : \sigma(u) = \sigma_0$ for all $u \in [0,1]$. (Note that if we additionally assume that the AR parameters do not depend on time, then under mild conditions on the $\theta_k$ this is also a test for weak stationarity.) To this end, consider the process

$$\hat G_{n,\gamma}(\alpha) = \frac{1}{n} \sum_{t=1}^{\lfloor \alpha n \rfloor} 1\big( \hat\eta_t^2 \ge \hat q_\gamma^2 \big), \qquad \alpha \in [0,1], \qquad (21)$$

where $\hat q_\gamma^2$ is the empirical upper $\gamma$-quantile of the squared residuals. Notice that $\hat G_{n,\gamma}(\alpha)$ counts the number of large (squared) residuals among the first $(100 \times \alpha)\%$ of the observations $Y_1, \dots, Y_n$. If the variance function is constant then, since one has a total of $\lfloor n \gamma \rfloor$ large residuals, the expected value of $\hat G_{n,\gamma}(\alpha)$ approximately equals $\alpha \gamma$. This motivates the form of our test statistic, $T_n = \sup_{\alpha \in [0,1]} \big| \hat G_{n,\gamma}(\alpha) - \alpha \gamma \big|$. The behavior of this test statistic follows from the discussion of our second application.
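The statistic $\hat G_{n,\gamma}$ and $T_n$ can be sketched as follows. This is our own illustration: the sample-quantile convention `int((1 - gamma) * n)` and the grid over $\alpha$ are assumptions of the sketch, not choices made in the paper.

```python
import random

def G_hat(res_sq, gamma, alpha):
    """(21): fraction of the first floor(alpha*n) squared residuals that are at
    least the empirical upper-gamma quantile of all squared residuals."""
    n = len(res_sq)
    q2 = sorted(res_sq)[int((1 - gamma) * n)]   # one convention for the sample quantile
    m = int(alpha * n)
    return sum(1 for r in res_sq[:m] if r >= q2) / n

def T_stat(res_sq, gamma, grid=101):
    """T_n = sup_alpha |G_hat(alpha) - alpha * gamma| over a grid of alpha values."""
    return max(abs(G_hat(res_sq, gamma, i / (grid - 1)) - (i / (grid - 1)) * gamma)
               for i in range(grid))

rng = random.Random(3)
res_sq = [rng.gauss(0, 1) ** 2 for _ in range(400)]   # homoskedastic toy residuals
t_n = T_stat(res_sq, gamma=0.1)
# under the null, t_n is small (of order n^{-1/2}); under a non-constant
# variance, large residuals cluster in time and t_n inflates
```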
4.2 Asymptotic distribution of a test statistic for modality of the variance function

We argue briefly that the test for constancy of the variance from subsection 4.1 can be modified to a test for modality of the variance function. For more details we refer to Chandler and Polonik (2012). A key realization is that for a function to be monotonic (which, say, a unimodal function is on either side of the mode), the variance function must be 'at least' constant. In other words, a constant variance function provides the boundary case under the null for a test of monotonicity. We conduct this test on appropriate intervals $[a, b] \subset [0,1]$, where this crucial choice depends on the underlying variance function. Under the alternative hypothesis the interval needs to be located around a deviation from the null hypothesis, meaning around an antimode. Chandler and Polonik (2012) provide an estimator for the interval of interest and show that this estimation does not influence the asymptotic limit of the test statistic. Via this estimation, and supposing we are testing for a monotonically increasing function (locally on $[a, b]$), it is appropriate to modify the test statistic to $T_n = \sup_{\alpha \in [a,b]} \big| \hat G_{n,\gamma}(\alpha) - \alpha \gamma \big|$.

To simplify the notation we assume in the following that $[a, b] = [0, 1]$. Notice that $\hat G_{n,\gamma}(\alpha)$ is closely related to the sequential residual empirical process, and, as can be seen from the proof of Theorem 5, weighted empirical processes enter the analysis of $\hat G_{n,\gamma}(\alpha)$ through handling the estimation of $q_\gamma$. Theorem 5 below provides an approximation of the test statistic $T_n$ by independent (but not necessarily identically distributed) random variables. This result crucially enters the proofs in Chandler and Polonik (2012). In particular, it implies that the large sample behavior of the test statistic $T_n$ under the null hypothesis is not influenced by the in general non-parametric estimation of the parameter functions, as long as the rate of convergence of these estimators is sufficiently fast. This is somewhat of a surprise, and it is connected to the particular structure of our model.

First we introduce some additional notation. Let $f_u$ denote the pdf of $\sigma(u)\, \epsilon_t$, i.e. $f_u(z) = \frac{1}{\sigma(u)}\, f\big( \frac{z}{\sigma(u)} \big)$, and
$$G_{n,\gamma}(\alpha) = \frac{1}{n} \sum_{t=1}^{\lfloor \alpha n \rfloor} 1\Big( \epsilon_t^2\, \sigma^2\big(\tfrac{t}{n}\big) \ge q_\gamma^2 \Big),$$

where $q_\gamma$ is defined by means of

$$\Psi(z) = \int_0^1 F\big( \tfrac{z}{\sigma(u)} \big)\, du - \int_0^1 F\big( \tfrac{-z}{\sigma(u)} \big)\, du, \qquad z \ge 0, \qquad (22)$$

as the solution to the equation

$$\Psi(q_\gamma) = 1 - \gamma. \qquad (23)$$

Notice that this solution is unique since we have assumed $F$ to be strictly monotonic, and if $\sigma^2(u) = \sigma_0^2$ is constant for all $u \in [a, b]$, then $q_\gamma^2$ equals the upper $\gamma$-quantile of the squared innovations $\eta_t^2 = \sigma_0^2\, \epsilon_t^2$. The approximation result that follows does not assume that the variance is constant, however.
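Given $\sigma(\cdot)$ and $F$, the defining equation (23) can be solved numerically. A sketch (ours, with $F = \Phi$ the standard normal CDF assumed for illustration, and the integrals in (22) approximated by midpoint sums): since $\Psi$ is strictly increasing on $z \ge 0$, bisection applies directly. For constant $\sigma_0$ it recovers $q_\gamma = \sigma_0$ times the two-sided normal quantile.

```python
import math

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Psi(z, sigma, m=400):
    """(22) with F = Phi; integrals over u in [0, 1] via midpoint sums."""
    us = [(i + 0.5) / m for i in range(m)]
    return sum(Phi(z / sigma(u)) - Phi(-z / sigma(u)) for u in us) / m

def q_gamma(gamma, sigma, lo=0.0, hi=50.0, iters=80):
    """Solve Psi(q) = 1 - gamma by bisection; Psi is strictly increasing in z >= 0."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if Psi(mid, sigma) < 1.0 - gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Constant sigma_0 = 2: q_gamma = 2 * Phi^{-1}(0.95) for gamma = 0.1.
q = q_gamma(0.1, lambda u: 2.0)
```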
Theorem 5 Let $\gamma \in [0,1]$ and suppose that $0 \le a < b \le 1$ are non-random. Assume further that $\text{cum}_k(|\epsilon_t|) < k!\, C^k$ for some $C > 0$ and all $k = 1, 2, \dots$ Then, under assumptions (i)-(iv), with $\frac{n^{1/2}}{m_n^2 \log n} = o(1)$ we have

$$\sqrt{n}\, \sup_{\alpha \in [0,1]} \Big| \hat G_{n,\gamma}(\alpha) - G_{n,\gamma}(\alpha) + c(\alpha)\, \big( G_{n,\gamma}(1) - E\, G_{n,\gamma}(1) \big) \Big| = o_P(1), \qquad (24)$$

where

$$c(\alpha) = \frac{\int_0^\alpha \big( f_u(q_\gamma) + f_u(-q_\gamma) \big)\, du}{\int_0^1 \big( f_u(q_\gamma) + f_u(-q_\gamma) \big)\, du}.$$

Under the null hypothesis $\sigma(u) \equiv \sigma_0 > 0$ for $u \in [0,1]$ we have $c(\alpha) = \alpha$. Moreover, in case the AR-parameters in model (1) are constant and $\sqrt{n}$-consistent estimators are used, the moment assumptions on the innovations can be significantly relaxed to $E\, \epsilon_t^2 < \infty$.
Under the (worst case) null hypothesis $\sigma^2(u) = \sigma_0^2$ for all $u \in [0,1]$ the innovations are i.i.d., we have $c(\alpha) = \alpha$, and $E\, G_{n,\gamma}(\alpha) = \alpha \gamma$. Therefore the above result implies that $(\gamma(1-\gamma))^{-1/2}\, \sqrt{n}\, \big( \hat G_{n,\gamma}(\alpha) - \alpha \gamma \big)$ converges weakly to a standard Brownian bridge. Further details can be found in Chandler and Polonik (2012).
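Critical values for the limiting statistic $\sup_\alpha |B(\alpha)|$, with $B$ a standard Brownian bridge, can be obtained by Monte Carlo. The sketch below is our own (grid size and number of draws are arbitrary choices); the 95% point of the Kolmogorov distribution is approximately 1.358, which the simulation should roughly reproduce.

```python
import math
import random

def sup_abs_brownian_bridge(steps, rng):
    """One draw of sup_alpha |B(alpha)| on a grid, via B(t) = W(t) - t * W(1)."""
    w, path = 0.0, [0.0]
    for _ in range(steps):
        w += rng.gauss(0.0, math.sqrt(1.0 / steps))
        path.append(w)
    w1 = path[-1]
    return max(abs(path[i] - (i / steps) * w1) for i in range(steps + 1))

rng = random.Random(4)
draws = sorted(sup_abs_brownian_bridge(500, rng) for _ in range(2000))
crit_95 = draws[int(0.95 * len(draws))]   # Monte Carlo 95% critical value
```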
5 Proofs

5.1 Proof of Lemma 1

We only present a brief outline. First notice that $\nu_n(\alpha, z, \mathbf{g})$ is a sum of bounded martingale differences, because with $\xi_t^{z,\mathbf{g}} = 1\big\{ \sigma(\tfrac{t}{n})\, \epsilon_t \le \mathbf{g}'(\tfrac{t}{n})\, \mathbf{Y}_{t-1} + z \big\}$ we have

$$\nu_n(\alpha, z, \mathbf{g}) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \tilde\xi_t^{z,\mathbf{g}}\, 1\big( \tfrac{t}{n} \le \alpha \big),$$

where $\tilde\xi_t^{z,\mathbf{g}} = \xi_t^{z,\mathbf{g}} - E(\xi_t^{z,\mathbf{g}} \mid \mathcal{F}_{t-1})$ and $\mathcal{F}_t = \sigma(\epsilon_t, \epsilon_{t-1}, \dots)$ denotes the $\sigma$-algebra generated by $\{\epsilon_t, \epsilon_{t-1}, \dots\}$; obviously $\nu_n(\alpha_1, z_1, \mathbf{g}_1) - \nu_n(\alpha_2, z_2, \mathbf{g}_2)$ is also a sum of martingale differences. The proof of this lemma is based on the basic chaining device that is well known in empirical process theory, utilizing the following exponential inequality for sums of bounded martingale differences from Freedman (1975).
Lemma. (Freedman 1975) Let $Z_1, \dots, Z_n$ denote martingale differences with respect to a filtration $\{\mathcal{F}_t, t = 0, \dots, n-1\}$ with $|Z_t| \le C$ for all $t = 1, \dots, n$. Let further $S_n = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} Z_t$ and $V_n = V_n(S_n) = \frac{1}{n} \sum_{t=1}^{n} E(Z_t^2 \mid \mathcal{F}_{t-1})$. Then we have for all $\epsilon, \tau^2 > 0$ that

$$P\big( S_n \ge \epsilon,\; V_n \le \tau^2 \big) \le \exp\Big( - \frac{\epsilon^2}{2\tau^2 + \frac{2 C \epsilon}{\sqrt{n}}} \Big). \qquad (25)$$
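As a quick numerical sanity check of (25) (our own illustration, not part of the proof): for Rademacher differences we have $|Z_t| \le 1$ and $E(Z_t^2 \mid \mathcal{F}_{t-1}) = 1$, so $V_n = 1$ always and the bound applies with $C = 1$, $\tau^2 = 1$.

```python
import math
import random

def freedman_bound(eps, tau2, C, n):
    """Right-hand side of (25) for |Z_t| <= C, S_n = n^{-1/2} sum Z_t, V_n <= tau2."""
    return math.exp(-eps ** 2 / (2 * tau2 + 2 * C * eps / math.sqrt(n)))

rng = random.Random(5)
n, eps, trials = 400, 1.5, 4000
hits = sum(
    sum(rng.choice((-1.0, 1.0)) for _ in range(n)) / math.sqrt(n) >= eps
    for _ in range(trials)
)
empirical = hits / trials                         # roughly 1 - Phi(1.5) ~ 0.07
bound = freedman_bound(eps, tau2=1.0, C=1.0, n=n)  # ~ 0.35, comfortably above
```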
The form of (25) shows that we first need to control the quadratic variation $V_n$. Let $\eta_t^{\alpha,z,\mathbf{g}} = \tilde\xi_t^{z,\mathbf{g}}\, 1\big( \tfrac{t}{n} \le \alpha \big)$. We have for $h_1 = (\alpha_1, z_1, \mathbf{g}_1)$, $h_2 = (\alpha_2, z_2, \mathbf{g}_2) \in \mathcal{H}_L$ with $d_n(h_1, h_2) \le \epsilon$ that

$$V_n = V_n\big( \nu_n(h_1) - \nu_n(h_2) \big) = \frac{1}{n} \sum_{t=1}^{n} E\Big( \big| \eta_t^{\alpha_1,z_1,\mathbf{g}_1} - \eta_t^{\alpha_2,z_2,\mathbf{g}_2} \big|^2 \,\Big|\, \mathcal{F}_{t-1} \Big)$$
$$\le \frac{1}{n} \sum_{t=1}^{n} \big| 1\big( \tfrac{t}{n} \le \alpha_1 \big) - 1\big( \tfrac{t}{n} \le \alpha_2 \big) \big|\, E\big( |\xi_t^{z_1,\mathbf{g}_1}| \,\big|\, \mathcal{F}_{t-1} \big) + \frac{1}{n} \sum_{t=1}^{n} E\big( |\xi_t^{z_1,\mathbf{g}_1} - \xi_t^{z_2,\mathbf{g}_2}| \,\big|\, \mathcal{F}_{t-1} \big)$$
$$\le \frac{1}{n} \big( n\, (\alpha_2 - \alpha_1) + 1 \big) + \frac{1}{n} \sum_{t=1}^{n} E\big( |\xi_t^{z_1,\mathbf{g}_1} - \xi_t^{z_2,\mathbf{g}_2}| \,\big|\, \mathcal{F}_{t-1} \big)$$
$$\le |\alpha_1 - \alpha_2| + \frac{1}{n} + \frac{1}{n} \sum_{t=1}^{n} E\big( |\xi_t^{z_1,\mathbf{g}_1} - \xi_t^{z_2,\mathbf{g}_2}| \,\big|\, \mathcal{F}_{t-1} \big).$$

On $A_n$ we have for the last sum, with $F_u$ denoting the distribution function of $\sigma(u)\,\epsilon_t$,

$$\frac{1}{n} \sum_{t=1}^{n} E\big( |\xi_t^{z_1,\mathbf{g}_1} - \xi_t^{z_2,\mathbf{g}_2}| \,\big|\, \mathcal{F}_{t-1} \big) \le \frac{1}{n} \sum_{t=1}^{n} \Big| F_{t/n}\big( z_1 + \mathbf{g}_1'(\tfrac{t}{n})\, \mathbf{Y}_{t-1} \big) - F_{t/n}\big( z_2 + \mathbf{g}_1'(\tfrac{t}{n})\, \mathbf{Y}_{t-1} \big) \Big|$$
$$+ \frac{1}{n} \sum_{t=1}^{n} \Big| F_{t/n}\big( z_2 + \mathbf{g}_1'(\tfrac{t}{n})\, \mathbf{Y}_{t-1} \big) - F_{t/n}\big( z_2 + \mathbf{g}_2'(\tfrac{t}{n})\, \mathbf{Y}_{t-1} \big) \Big|$$
$$\le \sup_{u,x} f_u(x)\, |z_1 - z_2| + \sup_{u,x} f_u(x)\, \frac{1}{n} \sum_{t=1}^{n} \big| (\mathbf{g}_1 - \mathbf{g}_2)'(\tfrac{t}{n})\, \mathbf{Y}_{t-1} \big|$$
$$\le \frac{\|f\|_\infty}{m_*}\, |z_1 - z_2| + \frac{\|f\|_\infty}{m_*} \sum_{k=1}^{p} \|g_{1,k} - g_{2,k}\|_n \sqrt{\frac{1}{n} \sum_{t=-p+1}^{n} Y_t^2} \;\le\; (1 + C_0)\, \frac{\|f\|_\infty}{m_*}\, \epsilon. \qquad (26)$$

Thus, on $A_n$ we have for $h_1 = (\alpha_1, z_1, \mathbf{g}_1)$, $h_2 = (\alpha_2, z_2, \mathbf{g}_2) \in \mathcal{H}_L$ with $d_n(h_1, h_2) \le \epsilon$ and $\epsilon \ge \frac{1}{n}$ that

$$V_n\big( \nu_n(h_1) - \nu_n(h_2) \big) = \frac{1}{n} \sum_{s=1}^{n} E\Big( \big| \eta_s^{h_1} - \eta_s^{h_2} \big|^2 \,\Big|\, \mathcal{F}_{s-1} \Big) \le K^*\, \epsilon. \qquad (27)$$

This control of the quadratic variation in conjunction with Freedman's exponential bound for martingales now enables us to apply the chaining argument in a way similar to the proof of Theorem 5.11 in van de Geer (2000). Details are omitted.
5.2 Proof of Theorem 1

We will utilize Lemma 1. For $\eta > 0$ we have

$$P\Big( \sup_{\alpha \in [0,1],\, z \in [-L,L],\, \mathbf{g} \in \mathcal{G}^p} |\nu_n(\alpha, z, \mathbf{g}) - \nu_n(\alpha, z, \mathbf{0})| \ge \eta \Big) \le P\big( A_n^c \big) + P\Big( \sup_{\substack{h_1, h_2 \in \mathcal{H}_L \\ d_n(h_1, h_2) \le C \delta_n}} |\nu_n(h_1) - \nu_n(h_2)| \ge \eta,\; A_n \Big),$$

where again $A_n = \big\{ \frac{1}{n} \sum_{t=-p+1}^{n} Y_t^2 \le C \big\}$ for some $C > 0$. We will see below that $\frac{1}{n} \sum_{t=-p+1}^{n} Y_t^2 = O_P(1)$ as $n \to \infty$. Therefore, for any given $\epsilon > 0$ we can choose $C$ such that $P(A_n^c) \le \epsilon$ for $n$ large enough. An application of Lemma 1 now gives the assertion. Similar arguments as those leading to (19) show that $\int_{c/n}^{1} \sqrt{\log N_{n,B}(u^2, \mathcal{G})}\; du < \infty$ implies $\int_{c/n}^{1} \sqrt{\log N_B(u^2, \mathcal{H}_L, d_n)}\; du < \infty$ (see also (46)). It remains to show that in fact $\frac{1}{n} \sum_{t=-p+1}^{n} Y_t^2 = O_P(1)$. To this end we show that

$$E\Big[ \frac{1}{n} \sum_{t=1}^{n} Y_t^2 \Big] < \infty, \qquad (28)$$

$$\text{Var}\Big[ \frac{1}{n} \sum_{t=1}^{n} Y_t^2 \Big] = o(1). \qquad (29)$$
These two facts can be seen by direct calculations, as demonstrated now. First we consider (28). We have

$$E\, Y_t^2 = E\Big[ \sum_{j=-\infty}^{t} a_{t,T}(t-j)\, \epsilon_j \Big]^2 = E \sum_{j=-\infty}^{t} \sum_{k=-\infty}^{t} a_{t,T}(t-j)\, a_{t,T}(t-k)\, \epsilon_j \epsilon_k = \sum_{j=-\infty}^{t} a_{t,T}^2(t-j) \le \sum_{j=0}^{\infty} \Big( \frac{K}{\ell(j)} \Big)^2 < C < \infty \qquad (30)$$

for some $C > 0$, where we used assumption (ii). This implies (28). Next we indicate (29). Straightforward calculations show that
$$E(Y_t^2 Y_s^2) - E(Y_t^2)\, E(Y_s^2) = \sum_{i=-\infty}^{t} \sum_{j=-\infty}^{t} \sum_{k=-\infty}^{s} \sum_{\ell=-\infty}^{s} a_{t,T}(t-i)\, a_{t,T}(t-j)\, a_{s,T}(s-k)\, a_{s,T}(s-\ell)\, E(\epsilon_i \epsilon_j \epsilon_k \epsilon_\ell) - \sum_{i=-\infty}^{t} a_{t,T}^2(t-i) \sum_{k=-\infty}^{s} a_{s,T}^2(s-k)$$
$$= \big( E\, \epsilon_t^4 - 3 \big)\, A(t,s) + 2\, B(t,s),$$

where

$$A(t,s) = \sum_{i=0}^{\infty} a_{t,T}^2(i)\, a_{s,T}^2(i + |t-s|), \qquad B(t,s) = \Big( \sum_{i=0}^{\infty} a_{t,T}(i)\, a_{s,T}(i + |t-s|) \Big)^2.$$
We obtain that for some $C^* < \infty$,
\[
A(t,s) \le K^4 \sum_{i=0}^{\infty} \frac{1}{\ell^2(i)}\,\frac{1}{\ell^2(i+|t-s|)} \le K^4 \sum_{i=0}^{\infty} \frac{1}{\ell(i)}\,\frac{1}{\ell(i+|t-s|)} \le K^4\, \frac{1}{\ell(|t-s|)} \sum_{i=0}^{\infty} \frac{1}{\ell(i)} < C^*\, \frac{1}{\ell(|t-s|)},
\]
and similarly
\[
|B(t,s)| \le K^4 \Big[\sum_{i=0}^{\infty} \frac{1}{\ell(i)}\,\frac{1}{\ell(i+|t-s|)}\Big]^2 \le C^*\, \frac{1}{\ell(|t-s|)}. \tag{31}
\]
Now we have
\[
\frac{1}{n^2} \sum_{s=1}^{n}\sum_{t=1}^{n} A(t,s) \le \frac{C^*}{n^2} \sum_{s=1}^{n}\sum_{t=1}^{n} \frac{1}{\ell(|t-s|)} \le \frac{C^*}{n} \sum_{k=0}^{n-1} \frac{2(n-k)}{n}\, \frac{1}{\ell(k)} \le \frac{2\,C^*}{n} \sum_{k=0}^{n-1} \frac{1}{\ell(k)} = O\Big(\frac{1}{n}\Big) \quad \text{as } n \to \infty.
\]
The same bound holds for the sum over $B(t,s)$. This implies that $\mathrm{Var}\big(\frac{1}{n}\sum_{s=1}^{n} Y_s^2\big) = O(1/n)$, which is (29).
5.3 Proof of Theorem 3
Showing weak convergence of the process $Z_n(h)$ means proving asymptotic tightness and convergence of the finite dimensional distributions (e.g. see van der Vaart and Wellner, 1996). Tightness follows from Theorem 4.
It remains to show convergence of the finite dimensional distributions. To this end we will utilize the Cramér–Wold device in conjunction with the method of cumulants. It follows from Lemma 2 that all the cumulants of $Z_n(h)$ of order $k \ge 3$ converge to zero as $n \to \infty$. Using the linearity of the cumulants, the same holds for any linear combination of $Z_n(h_i)$, $i = 1, \ldots, K$. The mean of all the $Z_n(h)$ equals zero. It remains to show convergence of the covariances $\mathrm{cov}(Z_n(h_1), Z_n(h_2))$. The ranges of the summation indices below are such that the indices of the $Y$-variables are between $1$ and $n$. For ease of notation we achieve this by formally setting $h_i(u) = 0$ for $u \le 0$ and $u > 1$, $i = 1, 2$. We have
\[
\mathrm{cov}(Z_n(h_1), Z_n(h_2)) = \frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n} h_1\big(\tfrac{t}{n}\big)\, h_2\big(\tfrac{s}{n}\big)\, \mathrm{cov}(Y_s, Y_t)
= \frac{1}{n}\sum_{t=1}^{n}\sum_{k=t-n}^{t-1} h_1\big(\tfrac{t}{n}\big)\, h_2\big(\tfrac{t-k}{n}\big)\, \mathrm{cov}(Y_t, Y_{t-k})
\]
\[
= \frac{1}{n}\sum_{t=1}^{n}\sum_{|k| \le \sqrt{n}} h_1\big(\tfrac{t}{n}\big)\, h_2\big(\tfrac{t-k}{n}\big)\, \mathrm{cov}(Y_t, Y_{t-k}) + R_{1n}, \tag{32}
\]
where for $n$ sufficiently large
\[
|R_{1n}| \le \frac{1}{n}\sum_{t=1}^{n}\sum_{|k| > \sqrt{n}} \Big| h_1\big(\tfrac{t}{n}\big)\, h_2\big(\tfrac{t-k}{n}\big)\Big|\; \big|\mathrm{cov}(Y_t, Y_{t-k})\big|.
\]
From Proposition 5.4 of Dahlhaus and Polonik (2009) we obtain that $\sup_t |\mathrm{cov}(Y_t, Y_{t-k})| \le \frac{K}{\ell(k)}$ for some constant $K$. Since both $h_1$ and $h_2$ are bounded and $\sum_{k=-\infty}^{\infty} \frac{1}{\ell(k)} < \infty$, we can conclude that $R_{1n} = o(1)$. The main term in (32) can be approximated as
\[
\frac{1}{n}\sum_{t=1}^{n}\sum_{|k| \le \sqrt{n}} h_1\big(\tfrac{t}{n}\big)\, h_2\big(\tfrac{t-k}{n}\big)\, c\big(\tfrac{t}{n}, k\big) + R_{2n}, \tag{33}
\]
where
\[
|R_{2n}| \le \frac{1}{n}\sum_{t=1}^{n}\sum_{|k| \le \sqrt{n}} \Big| h_1\big(\tfrac{t}{n}\big)\, h_2\big(\tfrac{t-k}{n}\big)\Big|\; \Big|\mathrm{cov}(Y_t, Y_{t-k}) - c\big(\tfrac{t}{n}, k\big)\Big|.
\]
Proposition 5.4 of Dahlhaus and Polonik (2009) also says that for $|k| \le \sqrt{n}$ we have $\sum_{t=0}^{n} \big|\mathrm{cov}(Y_t, Y_{t-k}) - c\big(\tfrac{t}{n}, k\big)\big| \le K\,\frac{1+|k|}{\ell(|k|)}$ for some $K > 0$. Applying this result we obtain that
\[
|R_{2n}| \le \frac{1}{n}\sum_{k=-\sqrt{n}}^{\sqrt{n}} \sum_{t=1}^{n} \Big| h_1\big(\tfrac{t}{n}\big)\, h_2\big(\tfrac{t-k}{n}\big)\Big|\; \Big|\mathrm{cov}(Y_t, Y_{t-k}) - c\big(\tfrac{t}{n}, k\big)\Big| \le K_1\, \frac{1}{n}\sum_{k=-\sqrt{n}}^{\sqrt{n}} \frac{1+|k|}{\ell(|k|)} = o(1)
\]
as $n \to \infty$. Next we replace $h_2(\tfrac{t-k}{n})$ in the main term of (33) by $h_2(\tfrac{t}{n})$. The approximation error can be bounded by
\[
\frac{1}{n}\sum_{t=1}^{n}\sum_{|k| \le \sqrt{n}} \Big| h_1\big(\tfrac{t}{n}\big)\Big|\; \Big| h_2\big(\tfrac{t-k}{n}\big) - h_2\big(\tfrac{t}{n}\big)\Big|\, \frac{K}{\ell(|k|)} = o(1).
\]
Here we are using the fact that $\sup_u |c(u,k)| \le \frac{K}{\ell(|k|)}$ (see Proposition 5.4 in Dahlhaus and Polonik, 2009) together with the assumed (uniform) continuity of $h_2$, the boundedness of $h_1$, and the boundedness of $\sum_{k=-\infty}^{\infty} \frac{1}{\ell(|k|)}$. We have seen that
\[
\mathrm{cov}(Z_n(h_1), Z_n(h_2)) = \frac{1}{n}\sum_{t=1}^{n} h_1\big(\tfrac{t}{n}\big)\, h_2\big(\tfrac{t}{n}\big) \sum_{|k| \le \sqrt{n}} c\big(\tfrac{t}{n}, k\big) + o(1).
\]
Since $S(u) = \sum_{k=-\infty}^{\infty} c(u, k) < \infty$ we also have
\[
\mathrm{cov}(Z_n(h_1), Z_n(h_2)) = \frac{1}{n}\sum_{t=1}^{n} h_1\big(\tfrac{t}{n}\big)\, h_2\big(\tfrac{t}{n}\big) \sum_{k=-\infty}^{\infty} c\big(\tfrac{t}{n}, k\big) + o(1).
\]
Finally, we utilize the fact that $TV(c(\cdot, k)) \le \frac{K}{\ell(|k|)}$, which is another result from Proposition 5.4 of Dahlhaus and Polonik (2009). This result, together with the assumed bounded variation of both $h_1$ and $h_2$, allows us to replace the average over $t$ by the integral.
5.4 Proof of Lemma 2
By utilizing multilinearity of cumulants we obtain that
\[
\mathrm{cum}(Z_n(h_1), \ldots, Z_n(h_k)) = n^{-\frac{k}{2}} \sum_{t_1, t_2, \ldots, t_k = 1}^{n} h_1\big(\tfrac{t_1}{n}\big) \cdots h_k\big(\tfrac{t_k}{n}\big)\, \mathrm{cum}(Y_{t_1}, \ldots, Y_{t_k}).
\]
In order to estimate $\mathrm{cum}(Y_{t_1}, \ldots, Y_{t_k})$ we utilize the special structure of the $Y_t$-variables. Since the $\epsilon_j$ are independent, we obtain, by again using multilinearity of the cumulants together with the fact that $\mathrm{cum}(\epsilon_{j_1}, \ldots, \epsilon_{j_k}) = 0$ unless all the $j_\ell$, $\ell = 1, \ldots, k$, are equal, that
\[
\mathrm{cum}(Y_{t_1}, \ldots, Y_{t_k}) = \sum_{j=-\infty}^{\min\{t_1, \ldots, t_k\}} a_{t_1,n}(t_1 - j) \cdots a_{t_k,n}(t_k - j)\, \mathrm{cum}(\epsilon_j, \ldots, \epsilon_j) = \mathrm{cum}_k(\epsilon_1) \sum_{j=-\infty}^{\min\{t_1, \ldots, t_k\}} a_{t_1,n}(t_1 - j) \cdots a_{t_k,n}(t_k - j).
\]
Thus
\[
|\mathrm{cum}(Y_{t_1}, \ldots, Y_{t_k})| \le |\mathrm{cum}_k(\epsilon_1)| \sum_{j=-\infty}^{\infty} \prod_{i=1}^{k} \frac{K}{\ell(|t_i - j|)}, \tag{34}
\]
and consequently,
\[
|\mathrm{cum}(Z_n(h_1), \ldots, Z_n(h_k))| \le n^{-\frac{k}{2}}\, |\mathrm{cum}_k(\epsilon_1)| \sum_{j=-\infty}^{\infty} \prod_{i=1}^{k} \Big[\sum_{t_i=0}^{n} \Big| h_i\big(\tfrac{t_i}{n}\big)\Big|\, \frac{K}{\ell(|t_i - j|)}\Big]
\]
\[
= n^{-\frac{k}{2}}\, |\mathrm{cum}_k(\epsilon_1)| \sum_{j=-\infty}^{\infty} \prod_{i=1}^{2} \Big[\sum_{t_i=0}^{n} \Big| h_i\big(\tfrac{t_i}{n}\big)\Big|\, \frac{K}{\ell(|t_i - j|)}\Big] \times \prod_{i=3}^{k} \Big[\sum_{t_i=0}^{n} \Big| h_i\big(\tfrac{t_i}{n}\big)\Big|\, \frac{K}{\ell(|t_i - j|)}\Big].
\]
Utilizing the Cauchy–Schwarz inequality we have for the last product
\[
\prod_{i=3}^{k} \Big[\sum_{t_i=0}^{n} \Big| h_i\big(\tfrac{t_i}{n}\big)\Big|\, \frac{K}{\ell(|t_i - j|)}\Big] \le \prod_{i=3}^{k} \sqrt{\sum_{t_i=0}^{n} h_i\big(\tfrac{t_i}{n}\big)^2}\; \sqrt{\sum_{t_i=0}^{n} \Big(\frac{K}{\ell(|t_i - j|)}\Big)^2}
\le n^{\frac{k}{2}-1} \prod_{i=3}^{k} \|h_i\|_n\; \Bigg(\sqrt{\sum_{t=-\infty}^{\infty} \Big(\frac{K}{\ell(|t|)}\Big)^2}\,\Bigg)^{k-2}
\le K_0^{k-2}\, n^{\frac{k}{2}-1} \prod_{i=3}^{k} \|h_i\|_n, \tag{35}
\]
where we used the fact that $\sqrt{\sum_{t=-\infty}^{\infty} \big(\frac{K}{\ell(|t|)}\big)^2} \le \sum_{t=-\infty}^{\infty} \frac{K}{\ell(|t|)} \le K_0$ for some $K_0 < \infty$. Notice that the bound (35) does not depend on the index $j$ anymore, so that
\[
|\mathrm{cum}(Z_n(h_1), \ldots, Z_n(h_k))| \le K_0^{k-2}\, n^{-1} \prod_{i=3}^{k} \|h_i\|_n\; |\mathrm{cum}_k(\epsilon_1)| \sum_{j=-\infty}^{\infty} \prod_{i=1}^{2} \Big[\sum_{t_i=0}^{n} \Big| h_i\big(\tfrac{t_i}{n}\big)\Big|\, \frac{K}{\ell(|t_i - j|)}\Big].
\]
As for the last sum, we have, by again using the fact that $\sum_{j=-\infty}^{\infty} \frac{K}{\ell(|t_1 - j|)}\, \frac{K}{\ell(|t_2 - j|)} \le \frac{K^*}{\ell(|t_1 - t_2|)}$ for some $K^* > 0$ (cf. (31)), together with the Cauchy–Schwarz inequality, that
\[
\sum_{j=-\infty}^{\infty} \prod_{i=1}^{2} \Big[\sum_{t_i=0}^{n} \Big| h_i\big(\tfrac{t_i}{n}\big)\Big|\, \frac{K}{\ell(|t_i - j|)}\Big] = \sum_{t_1=0}^{n} \sum_{t_2=0}^{n} \Big| h_1\big(\tfrac{t_1}{n}\big)\Big|\, \Big| h_2\big(\tfrac{t_2}{n}\big)\Big| \sum_{j=-\infty}^{\infty} \frac{K}{\ell(|t_1 - j|)}\, \frac{K}{\ell(|t_2 - j|)}
\]
\[
\le K^* \sum_{t_1=0}^{n} \sum_{t_2=0}^{n} \Big| h_1\big(\tfrac{t_1}{n}\big)\Big|\, \Big| h_2\big(\tfrac{t_2}{n}\big)\Big|\, \frac{1}{\ell(|t_1 - t_2|)}
\le K^* \sqrt{\sum_{t_1=0}^{n} \sum_{t_2=0}^{n} h_1\big(\tfrac{t_1}{n}\big)^2\, \frac{1}{\ell(|t_1 - t_2|)}}\; \sqrt{\sum_{t_1=0}^{n} \sum_{t_2=0}^{n} h_2\big(\tfrac{t_2}{n}\big)^2\, \frac{1}{\ell(|t_1 - t_2|)}}
\]
\[
= K^* \sqrt{\sum_{t_1=0}^{n} h_1\big(\tfrac{t_1}{n}\big)^2 \sum_{t_2=0}^{n} \frac{1}{\ell(|t_1 - t_2|)}}\; \sqrt{\sum_{t_2=0}^{n} h_2\big(\tfrac{t_2}{n}\big)^2 \sum_{t_1=0}^{n} \frac{1}{\ell(|t_1 - t_2|)}}
\le K_0\, n\, \|h_1\|_n\, \|h_2\|_n.
\]
This completes the proof of the first part of the lemma. The second part follows similarly to the above by observing that if $\|h_i\|_\infty < M$ for all $i = 1, \ldots, k$, then, instead of the estimate (35), we have
\[
\prod_{i=3}^{k} \Big[\sum_{t_i=0}^{n} \Big| h_i\big(\tfrac{t_i}{n}\big)\Big|\, \frac{1}{\ell(|t_i - j|)}\Big] \le M^{k-2} \prod_{i=3}^{k} \sum_{t_i=0}^{n} \frac{1}{\ell(|t_i - j|)} \le (M K_0)^{k-2}
\]
with $K_0 = \sum_{t=-\infty}^{\infty} \frac{1}{\ell(|t|)}$.
5.5 Proof of Lemma 3
Using Lemma 2, the assumptions on the cumulants imply that
\[
\Psi_{Z_n(h)}(t) = \log E\, e^{t Z_n(h)} = \sum_{k=1}^{\infty} \frac{t^k}{k!}\, \mathrm{cum}_k(Z_n(h)) \le \frac{1}{K_0} \sum_{k=1}^{\infty} \big(t\, C\, K_0\, \|h\|_n\big)^k,
\]
assuming that $t > 0$ is such that the infinite sum exists and is finite. We obtain
\[
P\big(|Z_n(h)| > \eta\big) \le 2\, e^{-t\eta}\, E\, e^{t Z_n(h)} = 2 \exp\{-t\eta\} \exp\{\Psi_{Z_n(h)}(t)\} \le 2 \exp\Big\{-t\eta + K_0^{-1} \sum_{k=1}^{\infty} \big(t\, C\, K_0\, \|h\|_n\big)^k\Big\}.
\]
Choosing $t = \frac{1}{2 C K_0 \|h\|_n}$ gives the assertion:
\[
P\big(|Z_n(h)| > \eta\big) \le 2\, e^{1/K_0} \exp\Big\{-\, \frac{\eta}{2\, C\, K_0\, \|h\|_n}\Big\}.
\]
The fact that
\[
|\mathrm{cum}_j(\epsilon_t)| \le j!\, C^j, \qquad j = 1, 2, \ldots \tag{36}
\]
holds if $E|\epsilon_t|^k \le \big(\tfrac{C}{2}\big)^k$, $k = 1, 2, \ldots$, can be seen by induction as follows. First observe that since $|\mathrm{cum}_1(\epsilon_t)| = |E(\epsilon_t)| = 0$, (36) holds for $j = 1$. Now assume that (36) holds for all $j = 1, \ldots, k-1$. Using $|\mathrm{cum}_k(\epsilon_t)| \le E(|\epsilon_t|^k) + \sum_{j=1}^{k-1} \binom{k-1}{j-1}\, E|\epsilon_t|^{k-j}\, |\mathrm{cum}_j(\epsilon_t)|$ we get
\[
|\mathrm{cum}_k(\epsilon_t)| \le \Big(\frac{C}{2}\Big)^k + \sum_{j=1}^{k-1} \binom{k-1}{j-1}\, E|\epsilon_t|^{k-j}\, |\mathrm{cum}_j(\epsilon_t)|
\le \Big(\frac{C}{2}\Big)^k + \sum_{j=1}^{k-1} \binom{k-1}{j-1}\, \Big(\frac{C}{2}\Big)^{k-j}\, j!\, C^j
\]
\[
\le \Big(\frac{C}{2}\Big)^k + (k-1)!\, C^k \sum_{j=1}^{k-1} \frac{j}{(k-j)!\, 2^{k-j}}
\le \Big(\frac{C}{2}\Big)^k + k!\, C^k \sum_{\ell=1}^{k-1} \frac{2^{-\ell}}{\ell!}
\le \Big(\frac{C}{2}\Big)^k + k!\, C^k\, (\sqrt{e} - 1)
\le k!\, C^k \qquad \text{(for } k \ge 2\text{)}.
\]
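For a concrete innovation law the bound (36) is easy to check numerically. The sketch below (ours, for illustration only) takes Rademacher innovations, $P(\epsilon = \pm 1) = 1/2$, so that $E|\epsilon|^k = 1 \le (C/2)^k$ with $C = 2$, computes the cumulants from the moments via the standard recursion $\mathrm{cum}_k = m_k - \sum_{j=1}^{k-1}\binom{k-1}{j-1}\mathrm{cum}_j\, m_{k-j}$ (the same identity used in the induction above), and verifies $|\mathrm{cum}_k| \le k!\, C^k$:

```python
from math import comb, factorial

def cumulants_from_moments(moments):
    """moments[k] = E eps^k (with moments[0] = 1); returns cum[0..K] via the
    recursion cum_k = m_k - sum_{j=1}^{k-1} C(k-1, j-1) cum_j m_{k-j}."""
    K = len(moments) - 1
    cum = [0.0] * (K + 1)
    for k in range(1, K + 1):
        cum[k] = moments[k] - sum(
            comb(k - 1, j - 1) * cum[j] * moments[k - j] for j in range(1, k)
        )
    return cum

K = 10
# Rademacher: all odd moments vanish, all even moments equal 1.
rademacher_moments = [1.0 if k % 2 == 0 else 0.0 for k in range(K + 1)]
cum = cumulants_from_moments(rademacher_moments)
C = 2.0  # since E|eps|^k = 1 <= (C/2)^k
for k in range(1, K + 1):
    assert abs(cum[k]) <= factorial(k) * C ** k
print([round(c, 1) for c in cum[1:]])  # even-order cumulants: 1, -2, 16, -272, 7936
```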
5.6 Proof of Theorem 4
Using the exponential inequality from Lemma 3 we can mimic the proof of Lemma 3.2 from van de Geer (2000). As compared to van de Geer, our exponential bound is of the form $c_1 \exp\{-c_2\, \frac{\eta}{\|h\|_n}\}$ rather than $c_1 \exp\{-c_2\, \frac{\eta^2}{\|h\|_n^2}\}$. It is well known that this type of inequality leads to the covering integral being the integral of the metric entropy rather than the square root of the metric entropy. (See for instance Theorem 2.2.4 in van der Vaart and Wellner, 1996.) This indicates the necessary modifications to the proof in van de Geer. Details are omitted. Condition (18) just makes sure that the upper limit in the integral from (19) is larger than the lower limit.
5.7 Proof of Theorem 2 and Theorem 5
First we indicate how the two processes studied in this paper enter the picture, and simultaneously we provide an outline of the proof. Recall that
\[
\hat F_n(\alpha, z) = \frac{1}{n} \sum_{t=1}^{\lfloor \alpha n \rfloor} 1\big(\hat\eta_t \le z\big) \qquad \text{and} \qquad F_n(\alpha, z) = \frac{1}{n} \sum_{t=1}^{\lfloor \alpha n \rfloor} 1\big(\sigma(\tfrac{t}{n})\, \epsilon_t \le z\big).
\]
With this notation we can write
\[
\sqrt{n}\, \big(\hat G_{n,\gamma}(\alpha) - G_{n,\gamma}(\alpha)\big)
= \Big[\sqrt{n}\, \big(\hat F_n(\alpha, \hat q_\gamma) - F_n(\alpha, \hat q_\gamma)\big) - \sqrt{n}\, \big(\hat F_n(\alpha, -\hat q_\gamma) - F_n(\alpha, -\hat q_\gamma)\big)\Big]
\]
\[
\quad - \Big[\sqrt{n}\, \big(F_n(\alpha, \hat q_\gamma) - F_n(\alpha, q_\gamma)\big) - \sqrt{n}\, \big(F_n(\alpha, -\hat q_\gamma) - F_n(\alpha, -q_\gamma)\big)\Big]
=: I_n(\alpha) - II_n(\alpha) \tag{37}
\]
with $I_n(\alpha)$ and $II_n(\alpha)$, respectively, denoting the two quantities inside the two $[\cdot]$-brackets.
The assertion of Theorem 5 follows from
\[
\sup_{\alpha \in [0,1]} |I_n(\alpha)| = o_P(1) \tag{38}
\]
and
\[
\sup_{\alpha \in [0,1]} \big| II_n(\alpha) - c(\alpha)\, \big(G_{n,\gamma}(1) - E G_{n,\gamma}(1)\big)\big| = o_P(1). \tag{39}
\]
Property (39) can be shown by using empirical process theory based on independent, but not identically distributed random variables. This will be done below. First we consider (38). We will see that this involves properties of both residual empirical processes and weighted sum processes.
Proof of $\sup_{\alpha \in [0,1]} |I_n(\alpha)| = o_P(1)$ and proof of Theorem 2. Observe that
\[
\hat F_n(\alpha, z) = \frac{1}{n} \sum_{t=1}^{\lfloor \alpha n \rfloor} 1\big(\hat\eta_t \le z\big) = \frac{1}{n} \sum_{t=1}^{\lfloor \alpha n \rfloor} 1\big(\sigma(\tfrac{t}{n})\, \epsilon_t \le \sigma(\tfrac{t}{n})\, \epsilon_t - \hat\eta_t + z\big) = \frac{1}{n} \sum_{t=1}^{\lfloor \alpha n \rfloor} 1\big(\sigma(\tfrac{t}{n})\, \epsilon_t \le (\theta - \hat\theta)(\tfrac{t}{n})'\, Y_{t-1} + z\big).
\]
Recall that by assumption $P\big(\hat\theta_n(\cdot) - \theta(\cdot) \in \mathcal{G}^p\big) \to 1$ as $n \to \infty$, where for $g = (g_1, \ldots, g_p)' : [0,1] \to \mathbb{R}^p$ we write $\{g \in \mathcal{G}^p\}$ for $\{g_i \in \mathcal{G},\ i = 1, \ldots, p\}$. We also write
\[
F_n(\alpha, z, g) = \frac{1}{n} \sum_{t=1}^{\lfloor \alpha n \rfloor} 1\big(\sigma(\tfrac{t}{n})\, \epsilon_t \le g(\tfrac{t}{n})'\, Y_{t-1} + z\big).
\]
By taking into account that by assumption $\hat\theta - \theta = O_P(m_n^{-1})$, the above implies that for any $\epsilon > 0$ we can find a $C > 0$ such that, with probability at least $1 - \epsilon$, we have for large enough $n$ that
\[
\sup_{\alpha \in [0,1],\, z \in [-L,L]} \sqrt{n}\, \big| \hat F_n(\alpha, z) - F_n(\alpha, z)\big| \le \sup_{\substack{\alpha \in [0,1],\, z \in [-L,L],\\ g \in \mathcal{G}^p,\, \|g\|_n \le C m_n^{-1}}} \sqrt{n}\, \big| F_n(\alpha, z) - F_n(\alpha, z, g)\big|. \tag{40}
\]
Thus, if $L$ is such that $q_\gamma \in (-L, L)$ and $P(\hat q_\gamma \in [-L, L]) \to 1$ as $n \to \infty$, then the right hand side in (40) being $o_P(1)$ for any $C > 0$ implies that $\sup_\alpha |I_n(\alpha)| = o_P(1)$, and this then also proves Theorem 2. In order to control the right hand side in (40), we first introduce an appropriate centering. Let
\[
E_n(\alpha, z, g) := \frac{1}{n} \sum_{t=1}^{\lfloor \alpha n \rfloor} E\Big[ 1\big(\sigma(\tfrac{t}{n})\, \epsilon_t \le g(\tfrac{t}{n})'\, Y_{t-1} + z\big) \,\Big|\, \epsilon_{t-1}, \epsilon_{t-2}, \ldots \Big] = \frac{1}{n} \sum_{t=1}^{\lfloor \alpha n \rfloor} F\bigg( \frac{g(\tfrac{t}{n})'\, Y_{t-1} + z}{\sigma(\tfrac{t}{n})} \bigg) \tag{41}
\]
and let $0 = (0, \ldots, 0)' \in \mathcal{G}^p$ denote the $p$-vector of null functions. Write
\[
F_n(\alpha, z) - F_n(\alpha, z, g) = \Big[ F_n(\alpha, z, 0) - E_n(\alpha, z, 0) \Big] - \Big[ F_n(\alpha, z, g) - E_n(\alpha, z, g) \Big] + \Big[ E_n(\alpha, z, 0) - E_n(\alpha, z, g) \Big] =: T_{n1}(\alpha, z, g) + T_{n2}(\alpha, z, g) \tag{42}
\]
with $T_{n1}$ denoting the difference of the first two bracketed terms and $T_{n2}$ the last bracketed term. This centering makes $T_{n1}$ a sum of martingale differences (cf. proof of Lemma 1 given above). The quantity $\nu_n(\alpha, z, g) = \sqrt{n}\, \big( F_n(\alpha, z, g) - E_n(\alpha, z, g) \big)$ in fact is the residual empirical process discussed in Section 2.
It follows from (40) and (42) that
\[
\sup_{\substack{\alpha \in [0,1],\, z \in [-L,L],\\ g \in \mathcal{G}^p,\, \|g\|_n \le C m_n^{-1}}} \big| \sqrt{n}\, T_{n1}(\alpha, z, g) \big| = o_P(1) \tag{43}
\]
and
\[
\sup_{\substack{\alpha \in [0,1],\, z \in [-L,L],\\ g \in \mathcal{G}^p,\, \|g\|_n \le C m_n^{-1}}} \big| \sqrt{n}\, T_{n2}(\alpha, z, g) \big| = o_P(1) \tag{44}
\]
together imply the desired result, provided we can show that $P(\hat q_\gamma \notin [-L, L]) = o(1)$ as $n \to \infty$. Since $q_\gamma \in (-L, L)$, the desired property follows from consistency of $\hat q_\gamma$ as an estimator of $q_\gamma$. This will be shown at the end of this proof.
Verification of (43). We have, with $\mathcal{H}_L$ as defined immediately above Lemma 1 in Section 2, that
\[
\sup_{\substack{\alpha \in [0,1],\, z \in [-L,L],\\ g \in \mathcal{G}^p,\, \|g\|_n \le C m_n^{-1}}} |\nu_n(\alpha, z, g) - \nu_n(\alpha, z, 0)| \le \sup_{\substack{d_n(h_1, h_2) \le C m_n^{-1}\\ h_1, h_2 \in \mathcal{H}_L}} |\nu_n(h_1) - \nu_n(h_2)|.
\]
As above, let $A_n = \{\frac{1}{n} \sum_{t=1-p}^{n} Y_t^2 \le C_0\}$ for some $C_0 > 0$. It is shown in the proof of Theorem 1 that $\frac{1}{n} \sum_{s=-p+1}^{n} Y_s^2 = O_P(1)$, and thus $P(A_n^c) = o(1)$ for $C_0$ large enough. It remains to show that for $\eta > 0$
\[
P\Big( \sup_{\substack{d_n(h_1, h_2) \le C m_n^{-1}\\ h_1, h_2 \in \mathcal{H}_L}} |\nu_n(h_1) - \nu_n(h_2)| \ge \eta,\; A_n \Big) = o(1). \tag{45}
\]
This, however, is an immediate application of Theorem 1, and (43) is verified. Notice here that the finiteness of the covering integral of $\mathcal{H}_L$ with respect to $\|\cdot\|_n$ follows by standard arguments. In fact, it is not difficult to see that for some $C_0 > 0$
\[
\log N_n(C_1 \delta, \mathcal{H}_L) \le -C_0 \log \delta + \log N_n(\delta, \mathcal{G}). \tag{46}
\]
Verification of (44). We have
\[
\sqrt{n}\, T_{n2}(\alpha, z, g) = \frac{1}{\sqrt{n}} \sum_{t=1}^{\lceil n\alpha \rceil} \bigg[ F\bigg( \frac{g(\tfrac{t}{n})'\, Y_{t-1} + z}{\sigma(\tfrac{t}{n})} \bigg) - F\bigg( \frac{z}{\sigma(\tfrac{t}{n})} \bigg) \bigg]
= \frac{1}{\sqrt{n}} \sum_{t=1}^{\lceil n\alpha \rceil} f_{\frac{t}{n}}(z)\, g(\tfrac{t}{n})'\, Y_{t-1} + \frac{1}{\sqrt{n}} \sum_{t=1}^{\lceil n\alpha \rceil} \big( f_{\frac{t}{n}}(\zeta_t) - f_{\frac{t}{n}}(z) \big)\, g(\tfrac{t}{n})'\, Y_{t-1} \tag{47}
\]
with $\zeta_t$ between $z$ and $z + g(\tfrac{t}{n})'\, Y_{t-1}$. The second term in (47) is a remainder term that
will be treated below. The first term is a weighted sum of the $Y_s$'s, and we will now use Theorem 4 to control this term. We can write the first term as $\sum_{k=1}^{p} Z_{k,n}(h)$ where
\[
Z_{k,n}(h) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} h\big(\tfrac{t}{n}\big)\, Y_{t-k}, \qquad k = 1, \ldots, p, \tag{48}
\]
for $h \in \mathcal{H}$ with $\mathcal{H} = \{ h_{\alpha,z,g}(u) = 1_\alpha(u)\, f_u(z)\, g(u) :\ \alpha \in [0,1],\, z \in \mathbb{R},\, g \in \mathcal{G} \}$, where we use the shorthand notation $1_\alpha(u) = 1(u \le \alpha)$. We will apply Theorem 4 to show that each $Z_{k,n}(h)$ tends to zero uniformly in $h \in \mathcal{H}$. Since by our assumptions the functions $\{f_u(z),\, z \in [-L,L]\}$ are uniformly bounded, we have $\sup_{\alpha,z,g} \|h_{\alpha,z,g}\|_n^2 \le \sup_{u,z} |f_u(z)|^2\, m_n^{-1} = o(1)$. Since $E Z_n(h_{\alpha,z,g}) = 0$, an application of Theorem 4 now shows that the first term of (47) converges to zero in probability uniformly in $(\alpha, z, g)$.
Now we treat the second term in (47). Recall that our assumptions imply that the functions $\{f_u,\, u \in [0,1]\}$ are uniformly Lipschitz continuous with Lipschitz constant $c$, say. Therefore we can estimate the last term in (47) by
\[
\frac{c}{\sqrt{n}} \sum_{t=1}^{n} \big( g(\tfrac{t}{n})'\, Y_{t-1} \big)^2 \le \frac{c}{\sqrt{n}} \sum_{t=1}^{n} \sum_{k=1}^{p} g_k^2\big(\tfrac{t}{n}\big) \sum_{j=1}^{p} Y_{t-j}^2 \le c\, p \sup_{-p \le t \le n} Y_t^2\; \sqrt{n} \sum_{k=1}^{p} \|g_k\|_n^2 = c\, p^2\, \frac{\sqrt{n}}{m_n^2}\, O_P(\log n) = o_P(1),
\]
where the last inequality uses the fact that $\sup_{-p \le t \le n} Y_t^2 = O_P(\log n)$, which follows as in the proof of Lemma 5.9 of Dahlhaus and Polonik (2009).
Proof of $\sup_{\alpha \in [0,1]} \big| II_n(\alpha) - c(\alpha)\, \big(G_{n,\gamma}(1) - E G_{n,\gamma}(1)\big) \big| = o_P(1)$. Define
\[
\bar F_n(\alpha, z) := E F_n(\alpha, z) = \frac{1}{n} \sum_{t=1}^{\lfloor \alpha n \rfloor} F\Big( \frac{z}{\sigma(\tfrac{t}{n})} \Big).
\]
We can write
\[
II_n(\alpha) = \sqrt{n}\, \Big[ (F_n - \bar F_n)(\alpha, \hat q_\gamma) - (F_n - \bar F_n)(\alpha, q_\gamma) \Big] \tag{49}
\]
\[
\quad - \sqrt{n}\, \Big[ (F_n - \bar F_n)(\alpha, -\hat q_\gamma) - (F_n - \bar F_n)(\alpha, -q_\gamma) \Big] \tag{50}
\]
\[
\quad + \sqrt{n}\, \big( \bar F_n(\alpha, \hat q_\gamma) - \bar F_n(\alpha, q_\gamma) \big) - \sqrt{n}\, \big( \bar F_n(\alpha, -\hat q_\gamma) - \bar F_n(\alpha, -q_\gamma) \big). \tag{51}
\]
The process $\nu_n(\alpha, z, 0) = \sqrt{n}\, (F_n - \bar F_n)(\alpha, z)$ is a sequential empirical process, or a Kiefer–Müller process, based on independent, but not necessarily identically distributed random
variables. This process is asymptotically stochastically equicontinuous, uniformly in $\alpha$, with respect to $\rho_n(v, w) = |\bar F_n(1, v) - \bar F_n(1, w)|$, i.e. for every $\eta > 0$ there exists an $\epsilon > 0$ with
\[
\limsup_{n \to \infty}\; P\Big( \sup_{\alpha \in [0,1],\, \rho_n(z_1, z_2) \le \epsilon} |\nu_n(\alpha, z_1, 0) - \nu_n(\alpha, z_2, 0)| > \eta \Big) = 0. \tag{52}
\]
In fact, with $\rho_n\big((\alpha_1, z_1), (\alpha_2, z_2)\big) = |\alpha_1 - \alpha_2| + \rho_n(z_1, z_2)$ we have
\[
\sup_{\alpha \in [0,1]}\; \sup_{\substack{z_1, z_2 \in \mathbb{R}\\ \rho_n(z_1, z_2) \le \epsilon}} |\nu_n(\alpha, z_1, 0) - \nu_n(\alpha, z_2, 0)| \le \sup_{\substack{\alpha_1, \alpha_2 \in [0,1],\, z_1, z_2 \in \mathbb{R}\\ \rho_n((\alpha_1, z_1), (\alpha_2, z_2)) \le \epsilon}} |\nu_n(\alpha_1, z_1, 0) - \nu_n(\alpha_2, z_2, 0)|.
\]
Thus, (52) follows from asymptotic stochastic $\rho_n$-equicontinuity of $\nu_n(\alpha, z, 0)$. This in turn follows from a proof similar to, but simpler than, the proof of Lemma 1. In fact, it can be seen from (26) that for $g_1 = g_2 = 0$ we can simply use the metric $\rho_n\big((\alpha_1, z_1), (\alpha_2, z_2)\big) = |\alpha_1 - \alpha_2| + \rho_n(z_1, z_2)$ in the estimation of the quadratic variation, which in the simple case of $g_1 = g_2 = 0$ amounts to the estimation of the variance, because the randomness only comes in through the $\epsilon_t$. With this modification the proof of the $\rho_n$-equicontinuity of $\nu_n(\alpha, z, 0)$ follows the proof of Lemma 1.
Thus, if $\hat q_\gamma$ is consistent for $q_\gamma$ with respect to $\rho_n$, then it follows that both (49) and (50) are $o_P(1)$. We now prove this consistency of $\hat q_\gamma$.
Recall that $\Psi(z) = \int_0^1 F\big( \frac{z}{\sigma(u)} \big)\, du - \int_0^1 F\big( \frac{-z}{\sigma(u)} \big)\, du$. Observe that $\int_0^1 F\big( \frac{q_\gamma}{\sigma(u)} \big)\, du$ is close to $\bar F_n(1, q_\gamma)$. In fact, since by assumption $u \mapsto F\big( \frac{q_\gamma}{\sigma(u)} \big)$ is of bounded variation, their difference is of the order $O(1/n)$. Consequently we have
\[
\Big| (1 - \gamma) - \big( \bar F_n(1, q_\gamma) - \bar F_n(1, -q_\gamma) \big) \Big| = \Big| \Psi(q_\gamma) - \big( \bar F_n(1, q_\gamma) - \bar F_n(1, -q_\gamma) \big) \Big| \le c\, n^{-1} \tag{53}
\]
for some $c \ge 0$. In fact, (53) holds uniformly in $\gamma$. This follows from the fact that the functions $u \mapsto F\big( \frac{z}{\sigma(u)} \big)$ are Lipschitz continuous uniformly in $z$. (Notice that in the case where $\sigma(\cdot) \equiv \sigma_0$ is constant on $[0,1]$, we have $\Psi(z) = F\big( \frac{z}{\sigma_0} \big) - F\big( \frac{-z}{\sigma_0} \big) = \bar F_n(1, z) - \bar F_n(1, -z)$. In other words, in this case we can choose $c = 0$.) We now show that under our assumptions we have for any
fixed $0 < \gamma < 1$ that
\[
\rho_n(\hat q_\gamma, q_\gamma) = \big| \bar F_n(1, \hat q_\gamma) - \bar F_n(1, q_\gamma) \big| = o_P(1). \tag{54}
\]
By assumption $\Psi(z)$ is a strictly monotonic function in $z$. Together with (53) this implies that (54) is equivalent to $\big| \bar F_n(1, \hat q_\gamma) - \bar F_n(1, -\hat q_\gamma) - \big( \bar F_n(1, q_\gamma) - \bar F_n(1, -q_\gamma) \big) \big| = o_P(1)$, which (by using (53)) follows from
\[
\big| \bar F_n(1, \hat q_\gamma) - \bar F_n(1, -\hat q_\gamma) - (1 - \gamma) \big| = o_P(1). \tag{55}
\]
Since by definition of $\hat q_\gamma$ we have $\hat F_n(1, \hat q_\gamma) - \hat F_n(1, -\hat q_\gamma) = 1 - \gamma$, (55) follows from
\[
\sup_z \big| \hat F_n(1, z) - \bar F_n(1, z) \big| = o_P(1). \tag{56}
\]
To see (56), notice that $\sup_z |\hat F_n(1, z) - F_n(1, z)| = o_P(1)$. This follows from (40), (42), (43) and (44). Utilizing the triangle inequality, it remains to show that $\sup_z |F_n(1, z) - \bar F_n(1, z)| = o_P(1)$. This uniform law of large numbers follows from the arguments given below. It can also be seen directly by observing that for every fixed $z$ we have $|F_n(1, z) - \bar F_n(1, z)| = o_P(1)$ (which easily follows by observing that the variance of this quantity tends to zero as $n \to \infty$), together with a standard argument as in the proof of the classical Glivenko–Cantelli theorem for continuous random variables, utilizing monotonicity of $\bar F_n(1, z)$ and $F_n(1, z)$. This completes the proof of (54), and as outlined above, this implies that both (49) and (50) are $o_P(1)$, uniformly in $\alpha$.
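The uniform law of large numbers $\sup_z |F_n(1,z) - \bar F_n(1,z)| = o_P(1)$ used here can be visualized with a small simulation. The following sketch is our own illustration under hypothetical choices $\sigma(u) = 1 + u/2$ and standard normal $\epsilon_t$: it compares the empirical distribution function of the independent, non-identically distributed variables $\sigma(\tfrac{t}{n})\epsilon_t$ with the averaged marginal distribution function $\bar F_n(1,z)$ over a grid of $z$-values:

```python
import math
import random

def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

sigma = lambda u: 1.0 + 0.5 * u   # hypothetical volatility function

rng = random.Random(3)
n = 5000
xs = [sigma(t / n) * rng.gauss(0.0, 1.0) for t in range(1, n + 1)]

def F_n(z):
    # empirical cdf F_n(1, z) of the sigma(t/n) eps_t
    return sum(x <= z for x in xs) / n

def F_bar(z):
    # averaged marginal cdf \bar F_n(1, z) = (1/n) sum_t Phi(z / sigma(t/n))
    return sum(Phi(z / sigma(t / n)) for t in range(1, n + 1)) / n

grid = [-4 + 0.1 * i for i in range(81)]
sup_diff = max(abs(F_n(z) - F_bar(z)) for z in grid)
print(round(sup_diff, 4))
```

For $n = 5000$ the supremum over the grid is already small, of the order $n^{-1/2}$.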
It remains to consider the quantity in (51). First, we derive an upper bound for $\hat q_\gamma$. Let $B_n(\epsilon) = \{ |\bar F_n(1, \hat q_\gamma) - \bar F_n(1, q_\gamma)| < \epsilon;\; \sup_z |(\hat F_n - \bar F_n)(1, z)| < \delta_\epsilon / 6 \}$, where $\delta_\epsilon > 0$ is such that $|\bar F_n(1, q_\gamma \pm \delta_\epsilon) - \bar F_n(1, q_\gamma)| < \epsilon$. We already know that $P(B_n(\epsilon)) \to 1$ as $n \to \infty$ for any $\epsilon > 0$. Now choose $\epsilon > 0$ small enough (such that $\gamma > \delta_\epsilon / 6$, in order for the expressions below to be well defined). On $B_n(\epsilon)$ we have
\[
\hat q_\gamma = \inf\big\{ z \ge 0 : \hat F_n(1, z) - \hat F_n(1, -z) \ge 1 - \gamma;\; |\bar F_n(1, z) - \bar F_n(1, q_\gamma)| < \epsilon \big\}
\]
\[
\le \inf\Big\{ z \ge 0 : \bar F_n(1, z) - \bar F_n(1, -z) \ge 1 - \gamma - \big[ (\hat F_n - \bar F_n)(1, q_\gamma) - (\hat F_n - \bar F_n)(1, -q_\gamma) \big] + 2 \sup_{|\bar F_n(1,v) - \bar F_n(1,w)| \le \epsilon} \big| (\hat F_n - \bar F_n)(1, v) - (\hat F_n - \bar F_n)(1, w) \big| \Big\}
\]
\[
\le \inf\Big\{ z \ge 0 : \Psi(z) \ge 1 - \gamma - \big[ (\hat F_n - \bar F_n)(1, q_\gamma) - (\hat F_n - \bar F_n)(1, -q_\gamma) \big] + 2 \sup_{|\bar F_n(1,v) - \bar F_n(1,w)| \le \epsilon} \big| (\hat F_n - \bar F_n)(1, v) - (\hat F_n - \bar F_n)(1, w) \big| + c\, n^{-1} \Big\}
\]
\[
= \Psi^{-1}(1 - \gamma - Q_n + r_n),
\]
where for short $r_n = 2 \sup_{|\bar F_n(1,v) - \bar F_n(1,w)| \le \epsilon} \big| (\hat F_n - \bar F_n)(1, v) - (\hat F_n - \bar F_n)(1, w) \big| + c\, n^{-1}$ with $c$ from (53), and $Q_n = (\hat F_n - \bar F_n)(1, q_\gamma) - (\hat F_n - \bar F_n)(1, -q_\gamma)$. Now let
\[
\Psi(\alpha, z) = \int_0^\alpha F\Big( \frac{z}{\sigma(u)} \Big)\, du - \int_0^\alpha F\Big( \frac{-z}{\sigma(u)} \Big)\, du, \qquad z \ge 0,\ \alpha \in [0,1]. \tag{57}
\]
Observe that $\Psi(z) = \Psi(1, z)$, where $\Psi(z)$ is defined in (22). Since by definition of $q_\gamma$ we have $q_\gamma = \Psi^{-1}(1 - \gamma)$, and since $\Psi(\alpha, z)$ is strictly increasing in $z$ for any $\alpha$, we have on $B_n(\epsilon)$,
\[
\bar F_n(\alpha, \hat q_\gamma) - \bar F_n(\alpha, q_\gamma) - \big( \bar F_n(\alpha, -\hat q_\gamma) - \bar F_n(\alpha, -q_\gamma) \big)
\le \Psi(\alpha, \hat q_\gamma) - \Psi(\alpha, q_\gamma) + c\, n^{-1}
\]
\[
\le \Psi\big( \alpha, \Psi^{-1}(1 - \gamma - Q_n + r_n) \big) - \Psi\big( \alpha, \Psi^{-1}(1 - \gamma) \big) + c\, n^{-1}
= -\, \frac{\frac{\partial}{\partial z} \Psi\big( \alpha, \Psi^{-1}(\xi_n^+) \big)}{\Psi'\big( \Psi^{-1}(\xi_n^+) \big)}\, (Q_n - r_n) + c\, n^{-1}
\]
\[
= -\, \frac{\int_0^\alpha \big[ f_u\big( \Psi^{-1}(\xi_n^+) \big) + f_u\big( -\Psi^{-1}(\xi_n^+) \big) \big]\, du}{\int_0^1 \big[ f_u\big( \Psi^{-1}(\xi_n^+) \big) + f_u\big( -\Psi^{-1}(\xi_n^+) \big) \big]\, du}\, (Q_n - r_n) + c\, n^{-1}
=: -\big( c(\alpha) + o_P(1) \big)\, (Q_n - r_n) + c\, n^{-1}
\]
with $\xi_n^+$ between $1 - \gamma$ and $1 - \gamma - Q_n + r_n$, $f_u(z) = \frac{\partial}{\partial z} F\big( \frac{z}{\sigma(u)} \big) = \frac{1}{\sigma(u)}\, f\big( \frac{z}{\sigma(u)} \big)$, and the $o_P(1)$-term equals $c_n(\alpha) - c(\alpha)$ with $c_n(\alpha)$ the ratio from the second-to-last line in the above formula, and
\[
c(\alpha) = \frac{\int_0^\alpha \big[ f_u(q_\gamma) + f_u(-q_\gamma) \big]\, du}{\int_0^1 \big[ f_u(q_\gamma) + f_u(-q_\gamma) \big]\, du}.
\]
The fact that $|c_n(\alpha) - c(\alpha)| = o_P(1)$ follows from our smoothness assumptions together with the fact that $|Q_n - r_n| = o_P(1)$. Similarly, we obtain a lower bound of the form
\[
\bar F_n(\alpha, \hat q_\gamma) - \bar F_n(\alpha, q_\gamma) - \big( \bar F_n(\alpha, -\hat q_\gamma) - \bar F_n(\alpha, -q_\gamma) \big) \ge -\big( c(\alpha) + o_P(1) \big)\, (Q_n + r_n) - c\, n^{-1}.
\]
From the above we know that
\[
\sqrt{n}\, Q_n = \sqrt{n}\, (F_n - \bar F_n)(1, q_\gamma) - \sqrt{n}\, (F_n - \bar F_n)(1, -q_\gamma) + o_P(1)
\]
and $\sqrt{n}\, r_n = o_P(1)$, so that
\[
\sqrt{n}\, \Big[ \big( \bar F_n(\alpha, \hat q_\gamma) - \bar F_n(\alpha, q_\gamma) \big) - \big( \bar F_n(\alpha, -\hat q_\gamma) - \bar F_n(\alpha, -q_\gamma) \big) \Big]
= -\, \frac{c(\alpha)}{\sqrt{n}} \sum_{t=1}^{n} \Big[ 1\big( \sigma^2(\tfrac{t}{n})\, \epsilon_t^2 \le q_\gamma^2 \big) - P\big( \sigma^2(\tfrac{t}{n})\, \epsilon_t^2 \le q_\gamma^2 \big) \Big] + o_P(1)
= -\, c(\alpha)\, \sqrt{n}\, \big( G_{n,\gamma}(1) - E G_{n,\gamma}(1) \big) + o_P(1).
\]
Finally, observe that under $H_0$, where $\sigma(u) \equiv \sigma_0$, we have $c(\alpha) = \alpha$, because $f_u$ does not depend on $u \in [0,1]$. The proof is complete once we have shown that $|\hat q_\gamma - q_\gamma| = o_P(1)$ as $n \to \infty$ (cf. comment right after (44)). To see that, first notice that our regularity conditions imply that $\inf_u f_u(q_\gamma) =: d_* > 0$. Again using the fact that our assumptions imply the uniform Lipschitz continuity of the functions $z \mapsto f_u(z)$, we obtain the existence of an $\epsilon_0 > 0$ such that $f_u(z) > \frac{d_*}{2}$ for all $|z - q_\gamma| \le \epsilon_0$ and all $u \in [0,1]$. Consequently, if $|\hat q_\gamma - q_\gamma| \le \epsilon_0$ then $\big| F\big( \frac{\hat q_\gamma}{\sigma(t/n)} \big) - F\big( \frac{q_\gamma}{\sigma(t/n)} \big) \big| = \big| \int_{q_\gamma}^{\hat q_\gamma} f_{\frac{t}{n}}(u)\, du \big| \ge \frac{d_*}{2}\, |\hat q_\gamma - q_\gamma|$ for all $t = 0, 1, \ldots, n$. On the other hand, if $|\hat q_\gamma - q_\gamma| > \epsilon_0$ then, because the $f_u(z)$ are all non-negative, $\big| F\big( \frac{\hat q_\gamma}{\sigma(t/n)} \big) - F\big( \frac{q_\gamma}{\sigma(t/n)} \big) \big| = \big| \int_{q_\gamma}^{\hat q_\gamma} f_{\frac{t}{n}}(u)\, du \big| \ge \frac{d_*}{2}\, \epsilon_0$ for all $t = 0, 1, \ldots, n$. Thus, for any $\epsilon > 0$ there exists an $\eta > 0$ with
\[
\{ |\hat q_\gamma - q_\gamma| > \epsilon \} \subset \Big\{ \Big| F\Big( \frac{\hat q_\gamma}{\sigma(\tfrac{t}{n})} \Big) - F\Big( \frac{q_\gamma}{\sigma(\tfrac{t}{n})} \Big) \Big| > \eta \ \ \forall\, t = 0, 1, \ldots, n \Big\}. \tag{58}
\]
Since all the functions $z \mapsto F\big( \frac{z}{\sigma(t/n)} \big)$ are strictly monotone, we have
\[
\rho_n(\hat q_\gamma, q_\gamma) = \big| \bar F_n(1, \hat q_\gamma) - \bar F_n(1, q_\gamma) \big| = \frac{1}{n} \sum_{t=1}^{n} \Big| F\Big( \frac{\hat q_\gamma}{\sigma(\tfrac{t}{n})} \Big) - F\Big( \frac{q_\gamma}{\sigma(\tfrac{t}{n})} \Big) \Big|.
\]
Consequently, (58) implies that if $|\hat q_\gamma - q_\gamma|$ is not $o_P(1)$, then $\rho_n(\hat q_\gamma, q_\gamma)$ is also not $o_P(1)$. This is a contradiction to (54), and this completes our proof.
6 References
AKRITAS, M.G. and VAN KEILEGOM, I. (2001): Non-parametric estimation of the residual distribution. Scand. J. Statist. 28, 549–567.
ALEXANDER, K.S. and PYKE, R. (1986): A uniform central limit theorem for set-indexed
partial-sum processes with finite variance. Ann. Probab. 14, 582–597.
CHANDLER, G. and POLONIK, W. (2006). Discrimination of locally stationary time series
based on the excess mass functional. J. Amer. Statist. Assoc. 101, 240–253.
CHANDLER, G. and POLONIK, W. (2012): Mode Identification of volatility in time-varying
autoregression. J. Amer. Statist. Assoc. 107, 1217-1229.
CHENG, F. (2005). Asymptotic distributions of error density and distribution function estimators in nonparametric regression. J. Statist. Plann. Inference 128, 327–349.
DAHLHAUS, R. (1997). Fitting time series models to nonstationary processes. Ann. Statist.
25, 1–37.
DAHLHAUS, R., NEUMANN, M. and VON SACHS, R. (1999). Nonlinear wavelet estimation of time-varying autoregressive processes. Bernoulli 5, 873–906.
DAHLHAUS, R. and POLONIK, W. (2005). Nonparametric maximum likelihood estimation
and empirical spectral processes for Gaussian locally stationary processes. Technical report.
DAHLHAUS, R. and POLONIK, W. (2006). Nonparametric quasi-maximum likelihood estimation for Gaussian locally stationary processes. Ann. Statist. 34, 2790–2824.
DAHLHAUS, R. and POLONIK, W. (2009). Empirical spectral processes for locally stationary time series. Bernoulli 15, 1–39.
DREES, H. and STǍRICǍ, C. (2002). A simple non-stationary model for stock returns.
Preprint.
EOM, K.B. (1999). Analysis of acoustic signatures from moving vehicles using time-varying
autoregressive models, Multidimensional Systems and Signal Processing 10, 357–378.
FRYZLEWICZ, P., SAPATINAS, T., and SUBBA RAO, S. (2006). A Haar-Fisz technique
for locally stationary volatility estimation. Biometrika 93, 687–704.
GIRAULT, J-M., OSSANT, F., OUAHABI, A., KOUAMÉ, D. and PATAT, F. (1998). Time-varying autoregressive spectral estimation for ultrasound attenuation in tissue characterization. IEEE Trans. Ultrasonic, Ferroelectric, and Frequency Control 45, 650–659.
GRENIER, Y. (1983): Time-dependent ARMA modeling of nonstationary signals, IEEE
Trans. on Acoustics, Speech, and Signal Processing 31 , 899–911.
HALL, M.G., OPPENHEIM, A.V. and WILLSKY, A.S. (1983), Time varying parametric
modeling of speech, Signal Processing 5, 267–285.
HORVÁTH, L., KOKOSZKA, P. and TEYSSIÈRE, G. (2001): Empirical process of the squared residuals of an ARCH sequence. Ann. Statist. 29, 445–469.
KOUL, H. L. (2002). Weighted empirical processes in dynamic nonlinear models, 2nd edn.,
Lecture Notes in Statistics, 166, Springer, New York.
KOUL, H.L., LING, S. (2006): Fitting an error distribution in some heteroscedastic time
series models. Ann. Statist. 34, 994–1012.
KÜNSCH, H.R. (1995): A note on causal solutions for locally stationary AR processes.
Preprint. ETH Zürich.
LAÏB, N., LEMDANI, M. and OULD-SAÏD, E. (2008): On residual empirical processes of
GARCH-SM models: Application to conditional symmetry tests. J. Time Ser. Anal. 29,
762–782.
MÜLLER, U., SCHICK, A. and WEFELMEYER, W. (2007): Estimating the error distribution function in semiparametric regression. Statistics & Decisions 25, 1–18.
MÜLLER, U., SCHICK, A. and WEFELMEYER, W. (2009a): Estimating the innovation
distribution in nonparametric autoregression. Probab. Theory Relat. Fields 144, 53–77
MÜLLER, U., SCHICK, A. and WEFELMEYER, W. (2009b): Estimating the error distribution function in nonparametric regression with multivariate covariates. Statist. Probab.
Letters 79, 957–964.
MÜLLER, U., SCHICK, A. and WEFELMEYER, W. (2012): Estimating the error distribution function in semiparametric additive regression models. J. Statist. Plan. Inference
142, 552–566.
NICKL, R. and PÖTSCHER, B.M. (2007). Bracketing metric entropy rates and empirical central limit theorems for function classes of Besov- and Sobolev-type. J. Theor. Probab. 20, 177–199.
ORBE, S., FERREIRA, E. and RODRIGUEZ-POO, J. (2005). Nonparametric estimation of time varying parameters under shape restrictions. J. Econometrics 126, 53–77.
PRIESTLEY, M.B. (1965). Evolutionary spectra and non-stationary processes. J. Royal
Statist. Soc. - Ser. B 27, 204–237.
RAJAN, J.J. and Rayner, P.J. (1996). Generalized feature extraction for time-varying autoregressive models, IEEE Trans. on Signal Processing 44, 2498–2507.
STUTE, W. (2001). Residual analysis for ARCH(p)-time series. Test 1, 393–403.
SUBBA RAO, T. (1970). The fitting of nonstationary time-series models with time-dependent parameters. J. Royal Statist. Soc. - Ser. B 32, 312–322.
VAN DE GEER, S. (2000): Empirical processes in M-estimation. Cambridge University
Press.
VAN DER VAART, A.W. and WELLNER, J. (1996). Weak convergence and empirical processes. Springer, New York.
VAN KEILEGOM, I., GONZÁLEZ MANTEIGA, W. and SÁNCHEZ SELLERO, C. (2008).
Goodness-of-fit tests in parametric regression based on the estimation of the error distribution, Test 17, 401–415.