Goodness-of-fit tests based on the empirical characteristic function∗

V. Alba-Fernández¹, M.D. Jiménez-Gamero², and J. Muñoz García²

¹ Dpt. of Statistics and O.R., University of Jaén, Spain, mvalba@ujaen.es
² Dpt. of Statistics and O.R., University of Seville, Spain, (dolores,joaquinm)@us.es

∗ This work is partially supported by MEC grant MTM2004-01433.
Summary. We study a class of goodness-of-fit tests based on the empirical characteristic function that can be applied to continuous and discrete data of any fixed dimension. Under some not too restrictive conditions, the tests are
consistent against any fixed alternative and they are also able to detect contiguous
alternatives. We show that the bootstrap can be employed to consistently estimate
the null distribution.
Key words: goodness-of-fit, characteristic function, consistent test, contiguous alternatives, parametric bootstrap.
1 Introduction
Since the empirical characteristic function (e.c.f.) converges to the population characteristic function (c.f.) and the c.f. characterizes the distribution of the population,
several goodness-of-fit tests have been proposed whose test statistics measure deviations between the e.c.f. and the c.f. under the null hypothesis. Examples are the tests proposed in Koutrouvelis [Kou80], Koutrouvelis and Kellermeier [KK81] and Fan [Fan97]. All these tests evaluate the difference between the e.c.f. and the c.f. of the law in the null hypothesis at several points, and hence they require choosing both the points and their number. Some tests have been designed for special laws. An example is the one proposed by Epps and Pulley [EP83] for testing univariate normality, which has been extended to the multivariate case by Baringhaus and Henze [BH88]. These authors consider as test statistic
n ∫ |cn(t) − ψ(t; µ̂, Σ̂)|² g(t; µ̂, Σ̂) dt,        (1)

where cn(t) = n^{−1} Σ_{j=1}^{n} exp(it′Xj) is the e.c.f. of the sample X1, X2, ..., Xn, the prime denotes transpose, ψ(t; µ, Σ) is the c.f. of the law Nd(µ, Σ) and g(t; µ, Σ) ∝
|ψ(t; µ, Σ)|. That is, (1) is a weighted integral of the squared modulus of the difference between cn (t), which is a consistent estimator of the c.f. of the distribution generating the data, and ψ(t; µ̂, Σ̂), which is an estimator of the population c.f. under the null hypothesis of normality. Because of its desirable properties
( [BH88], [Epp99], [HW97]), it is interesting to try to generalize this test for testing
fit to any distribution. With this aim, to test the composite null hypothesis,
H0 : the law of X1 ∈ F,
where F is a parametric family, F = {F (x; θ), x ∈ Rd , θ ∈ Θ}, Θ being an open
subset of Rp , we consider the statistic
Tn,G(θ̂) = n ∫ |cn(t) − c(t; θ̂)|² dG(t),        (2)
where c(t; θ) is the characteristic function of F(x; θ), θ̂ is a consistent estimator of θ, and G is a distribution function on Rd which is not necessarily proportional to |c(t; θ̂)|. The aim of this paper is to show that, under some not too restrictive conditions
on F and θ̂, the test that rejects the null composite hypothesis H0 for large values
of Tn,G (θ̂) is consistent against any fixed alternative, and that it is able to detect
Pitman local alternatives that differ from the null by O(n^{−1/2}). We also see that these properties hold if the weight function G in (2) depends on θ̂, as is the case for statistic (1):
Tn,Ĝ(θ̂) = n ∫ |cn(t) − c(t; θ̂)|² dG(t; θ̂).
Baringhaus and Henze [BH88] have shown that the null distribution of statistic (1)
does not depend on the unknown true parameter value. We study conditions for
this to hold in the general case considered here. The article also gives a consistent
parametric bootstrap procedure to approximate the null distribution of Tn,G (θ̂) when
it depends on some unknown parameters. The last section summarizes the results of a simulation study.
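To make these ingredients concrete, the following minimal sketch (ours, not the authors' code) evaluates a statistic of the form (2) for the special case of testing d-variate normality: it estimates (µ, Σ) by the sample moments, takes G to be the d-variate standard normal law as an illustrative choice, and approximates the integral in (2) by a Monte Carlo average over draws t ~ G; all function names are ours.

```python
import numpy as np

def ecf(ts, xs):
    """Empirical characteristic function c_n(t) = (1/n) sum_j exp(i t' X_j),
    evaluated at every row of ts (m x d) for the sample xs (n x d)."""
    return np.exp(1j * ts @ xs.T).mean(axis=1)

def normal_cf(ts, mu, sigma):
    """C.f. of N_d(mu, Sigma): exp(i t' mu - t' Sigma t / 2)."""
    quad = np.einsum('ij,jk,ik->i', ts, sigma, ts)
    return np.exp(1j * ts @ mu - 0.5 * quad)

def T_nG(xs, mu_hat, sigma_hat, m=20000, seed=0):
    """Monte Carlo approximation of statistic (2),
    n * int |c_n(t) - c(t; theta_hat)|^2 dG(t), for the normal null family,
    with G taken (as an illustrative choice) to be the d-variate standard
    normal law, so the integral is an expectation over t ~ G."""
    n, d = xs.shape
    ts = np.random.default_rng(seed).standard_normal((m, d))   # t ~ G
    diff = ecf(ts, xs) - normal_cf(ts, mu_hat, sigma_hat)
    return n * np.mean(np.abs(diff) ** 2)

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 2))                              # toy sample, H0 true
print(T_nG(x, x.mean(axis=0), np.cov(x, rowvar=False)))
```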
2 Properties
For a fixed integer d ≥ 1, let X1 , X2 , ..., Xn be independent d-dimensional random
column vectors with common distribution function F (x) and c.f. c(t). To test the
composite null hypothesis H0 we consider the statistic Tn,G (θ̂) as defined in (2). To
study properties of this test we first express Tn,G (θ̂) in a more convenient form that
also provides us with a way to compute it.
Lemma 1. The statistic Tn,G(θ̂) defined in (2) can be expressed as

Tn,G(θ̂) = n^{−1} Σ_{j=1}^{n} Σ_{k=1}^{n} h(Xj, Xk; θ̂),

where

h(x, y; θ) = u(x − y) − u^0(x; θ) − u^0(y; θ) + u^{00}(θ),        (3)

with u^0(x; θ) = ∫ u(x − y) dF(y; θ) and u^{00}(θ) = ∫∫ u(x − y) dF(x; θ) dF(y; θ), where u(t) is the real part of the c.f. of G, u(t) = ∫ cos(x′t) dG(x).
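As a computational illustration of Lemma 1 (again a sketch of ours, not code from the paper), the snippet below evaluates the double sum n^{−1} Σ_j Σ_k h(Xj, Xk; θ̂) when G is taken to be the d-variate standard normal, so that u(t) = exp(−‖t‖²/2) in closed form; the integrals u^0 and u^{00} are approximated by Monte Carlo draws from the fitted null distribution (here a fitted multivariate normal), and the helper names are ours.

```python
import numpy as np

def u(t):
    """Real part of the c.f. of G; with G = N_d(0, I) this is u(t) = exp(-||t||^2 / 2).
    The last axis of t has length d."""
    return np.exp(-0.5 * np.sum(t ** 2, axis=-1))

def T_nG_vstat(xs, sample_null, m=5000):
    """Lemma 1: T_{n,G}(theta_hat) = (1/n) sum_j sum_k h(X_j, X_k; theta_hat), with
    h(x, y) = u(x - y) - u^0(x) - u^0(y) + u^{00}.  The integrals u^0 and u^{00}
    (taken w.r.t. F(.; theta_hat)) are approximated by Monte Carlo using
    `sample_null`, a user-supplied function returning draws from the fitted null law."""
    n, d = xs.shape
    ys = sample_null(m)                                        # Y_l ~ F(.; theta_hat)
    u0 = u(xs[:, None, :] - ys[None, :, :]).mean(axis=1)       # u^0(X_j), shape (n,)
    u00 = u(ys - sample_null(m)).mean()                        # u^{00}, independent draws
    h = u(xs[:, None, :] - xs[None, :, :]) - u0[:, None] - u0[None, :] + u00
    return h.sum() / n

rng = np.random.default_rng(1)
x = rng.standard_normal((100, 2))                              # toy sample
mu_hat, S_hat = x.mean(axis=0), np.cov(x, rowvar=False)
draw = lambda m: rng.multivariate_normal(mu_hat, S_hat, size=m)
print(T_nG_vstat(x, draw))
```

When u^0 and u^{00} are available in closed form, as they are for the normality test of [BH88], the Monte Carlo step can of course be replaced by the exact expressions.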
Note that Tn,G(θ̂) is not, in general, a degree-2 V statistic, but what Jiménez-Gamero et al. [JMP03] call a parameter estimated degree-2 V statistic, since for each fixed
value of θ, Tn,G (θ) is a degree-2 V statistic. Tn,Ĝ (θ̂) also has an expansion similar to
that of Tn,G (θ̂) given in Lemma 1, with G(t) replaced by G(t; θ̂) in the expression
of h(x, y; θ̂) as defined in (3), and hence Tn,Ĝ (θ̂) is a parameter estimated degree-2
V statistic, too.
To derive properties of the statistics Tn,G(θ̂) and Tn,Ĝ(θ̂) we first introduce some notation and specify some conditions. Let ‖·‖ denote the Euclidean norm. For any real function f(t; θ) differentiable at θ = (θ1, θ2, ..., θp)′, the following notation will be used for any t ∈ Rd: f_(r)(t; θ) = ∂f(t; θ)/∂θr, r = 1, 2, ..., p, and ∇f(t; θ) = (f_(1)(t; θ), f_(2)(t; θ), ..., f_(p)(t; θ))′. For any distribution in the family {F(x; θ), x ∈ Rd, θ ∈ Θ}, Θ ⊂ Rp, let R(t; θ) and I(t; θ) denote the real and imaginary parts of c(t; θ), respectively.
Condition A. Let X1, X2, ..., Xn be independent and identically distributed d-dimensional random vectors with distribution function F and let θ̂ = θ̂(X1, X2, ..., Xn) be such that

θ̂ = θ + n^{−1} Σ_{j=1}^{n} l(Xj; θ) + oP(n^{−1/2}),

for some θ ∈ Θ, with θ = θ0 if F(.) = F(.; θ0), and the components of l(x; θ) = (l1(x; θ), l2(x; θ), ..., lp(x; θ))′ satisfy ∫ lr(x; θ) dF(x) = 0, ∫ lr(x; θ)² dF(x) < ∞, 1 ≤ r ≤ p.
Condition B-s. There is a neighborhood of θ, call it K(θ), such that for all γ ∈ K(θ) we have that R_(r)(t; γ) and I_(r)(t; γ) exist and |R_(r)(t; γ)| ≤ ρr(t), ∀t ∈ Rd, with ∫ ρr(t)^s dG(t) < ∞, and |I_(r)(t; γ)| ≤ ιr(t), ∀t ∈ Rd, with ∫ ιr(t)^s dG(t) < ∞, r = 1, 2, ..., p.

Condition C. For any ε > 0 there is a bounded sphere C in Rp centered at θ, such that if γ ∈ C then R_(r)(t; γ) and I_(r)(t; γ) exist for r = 1, 2, ..., p and ‖∇R(t; γ) − ∇R(t; θ)‖ ≤ ρ(t), ∀t ∈ Rd, with ∫ ρ(t)² dG(t) < ε, and ‖∇I(t; γ) − ∇I(t; θ)‖ ≤ ι(t), ∀t ∈ Rd, with ∫ ι(t)² dG(t) < ε.
Condition A says that θ̂ is a consistent asymptotically normal estimator of θ. This condition is quite usual and is satisfied by a large class of estimators. Conditions B and C are smoothness conditions on c(t; θ) as a function of θ. The next theorem gives the asymptotic null distribution of Tn,G(θ̂).
Theorem 1. Suppose conditions A, B-2 and C hold, and that H0 is true, that is, F(.) = F(.; θ). Then Tn,G(θ̂) → Σ_{j=1}^{∞} λj^c χ²_{1j} in distribution, where χ²_{11}, χ²_{12}, ... are independent chi-square variates with one degree of freedom, {λj^c} is the set of eigenvalues of the operator A^c = A^c(F(.; θ)) defined on L²(Rd, F(.; θ)) by

A^c w(x) = ∫ h^c(x, y; θ) w(y) dF(y; θ),

and h^c(x, y; θ) = h(x, y; θ) − l(x; θ)′m(y; θ) − l(y; θ)′m(x; θ) + m(x; θ)′M(θ)m(y; θ), with m(x; θ) = ∫ {cos(t′x) − R(t; θ)}∇R(t; θ) dG(t) + ∫ {sin(t′x) − I(t; θ)}∇I(t; θ) dG(t) and M(θ) = ∫ {∇R(t; θ)∇R(t; θ)′ + ∇I(t; θ)∇I(t; θ)′} dG(t).
Note that the limiting null distribution of Tn,G (θ̂) may depend on the estimator
θ̂ of θ we are considering. The following result gives the behaviour of Tn,G (θ̂) under
fixed alternatives.
Theorem 2. If θ̂ = θ + o(1) for some θ ∈ Θ, condition B-1 holds and G is such that

η(F; θ) = ∫ |c(t) − c(t; θ)|² dG(t) > 0,   ∀F ∉ F,        (4)
then Tn,G (θ̂) → ∞ almost surely when H0 does not hold.
Hence, if G is such that (4) holds, then the test that rejects H0 for large values of
Tn,G (θ̂) is strongly consistent against any fixed alternative. The next theorem gives
the behaviour of this test under contiguous alternatives. It shows that it is able to
detect alternatives which converge to some F ∈ F at the rate n^{−1/2}, irrespective of the underlying dimension d.
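To fix ideas, here is one concrete sequence (our illustration, not taken from the paper) of the type handled by Theorem 3 below: for d = 1 and F0 = F(.; θ) continuous, take an^c = a^c with

a^c(x) = cos{2πF0(x)},   dFn/dF0 = 1 + n^{−1/2} a^c.

Since F0(X) ∼ U(0, 1) under F0, ∫ a^c dF0 = ∫_0^1 cos(2πu) du = 0 and a^c ∈ L²(R, F0); moreover |a^c| ≤ 1 gives dFn/dF0 ≥ 1 − n^{−1/2} ≥ 0, so each Fn is a genuine distribution that approaches F0 at the n^{−1/2} rate considered here.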
Let {fj^c} be the set of orthonormal eigenfunctions corresponding to the eigenvalues {λj^c} of the operator A^c(F(.; θ)) in Theorem 1.
Theorem 3. Suppose conditions A, B-2 and C hold, and that the probability measure induced by F is dominated by the probability measure induced by F0, where F0(.) = F(.; θ), with Radon-Nikodym derivative dF/dF0 = 1 + n^{−1/2} an^c, for some sequence {an^c} in L²(Rd, F0) converging to a^c ∈ L²(Rd, F0), say. Then lim_{n→∞} P{Tn,G(θ̂) ≤ x} = P{Σ_{k≥1} λk (Zk + ck^c)² ≤ x}, where ck^c = ∫ a^c(x) fk^c(x) dF0(x) and Z1, Z2, ... are i.i.d. standard normal variates, N1(0, 1).
The results in Theorems 1-3 have been proven by Fan [Fan98] for continuous
families F satisfying more stringent conditions than the ones assumed here and for
θ̂ being the quasi maximum likelihood estimator of θ. Note that our results are valid
for continuous and discrete populations and for general θ̂ satisfying Condition A.
The following result shows that if condition D holds, then Tn,G (θ̂) and Tn,Ĝ (θ̂)
both have the same limit behaviour.
Condition D. For all γ in a neighborhood of θ, G(t; γ) has a probability density g(t; γ) with respect to τ, where τ is a σ-finite measure over (Rd, B) and B is the class of Borel sets of Rd, satisfying |g(t; θ) − g(t; γ)| ≤ g0(t; θ)‖θ − γ‖, where g0(t; θ) is such that ∫ g0(t; θ) dt < ∞.
Theorem 4. Suppose θ̂ = θ + rn , for some θ ∈ Θ, and that condition D holds.
(a) If rn = oP (1) and H0 holds, then Tn,Ĝ (θ̂) = Tn,G (θ̂) + oP (1).
(b) If rn = o(1), condition B-1 holds and G(.; θ) satisfies (4), then Tn,Ĝ (θ̂) → ∞
almost surely when H0 does not hold.
(c) If rn = oP(1) and the probability measure induced by F is dominated by the probability measure induced by F0, where F0(.) = F(.; θ), with Radon-Nikodym derivative dF/dF0 = 1 + n^{−1/2} an^c, for some sequence {an^c} in L²(Rd, F0) converging to a^c ∈ L²(Rd, F0), then Tn,Ĝ(θ̂) = Tn,G(θ̂) + oP(1).
3 Null distribution not depending on θ
In general, the null distribution of statistics Tn,G (θ̂) and Tn,Ĝ (θ̂) depends on the
unknown true parameter value θ. In this section we see that in some cases, and
for adequate choices of the weight function, the null distribution of these statistics
does not depend on θ, but only on the family F and the weight function G. Let F0
be a fixed distribution function on Rd with var_{F0}(X) = Σ0 positive definite. Let
X0 be a random vector having distribution function F0 and consider F = F1 =
{F(x; υ, V) = P(X ≤ x), with X = V X0 + υ, υ ∈ Rd, V ∈ Md×d, det(V) ≠ 0},
where Md×d is the set of d × d matrices. We have the following result.
Lemma 2. If F ∈ F1 and G(t) = G(t; V) is such that dG(t; V) = det(V) dG0(V t), for some fixed distribution G0, then

Tn,Ĝ(θ̂) = n ∫ |ĉn(t) − c0(t)|² dG0(t),        (5)

where θ = (υ, V), c0(t) is the c.f. of F0, ĉn(t) is the e.c.f. of Z1, Z2, ..., Zn, Zj = V̂ ^{−1}(Xj − υ̂), j = 1, 2, ..., n, υ̂ = υ̂(X1, X2, ..., Xn) is an estimator of υ and V̂ = V̂ (X1, X2, ..., Xn) is an estimator of V such that det(V̂ ) ≠ 0 with probability one.
Assume without loss of generality that Σ0 = Id. A problem with the result in Lemma 2 is the estimation of V. To see this, consider the following example: if X0 ∼ Nd(0, Id), then V X0 + υ ∼ Nd(υ, Σ), with Σ = V V ′; since different matrices A with det(A) ≠ 0 give rise to the same covariance matrix Σ = AA′, one cannot identify V and thus one cannot estimate it. To circumvent this difficulty, we can make the parametrization in the definition of F1 one-to-one by imposing some condition on V, such as being lower triangular or symmetric positive definite. However, in some cases, we do not have to worry about the parametrization
in the definition of F1. Suppose, for example, that the weight function G satisfies the condition in Lemma 2, that is, G(t) = G(t; V) is such that dG(t; V) = det(V) dG0(V t), and that G0 is a spherically symmetric distribution. In this case, since ∫ cos(t′x) dG0(x) = u(t′t) for some real-valued function u of a scalar variable, from equality (5) we have that Tn,Ĝ(θ̂) depends on V only through V V ′ = Σ = var(X), which can be estimated, for example, by the sample covariance matrix. The next result, which is an immediate consequence of Lemma 2, shows that in this case, if the estimator of θ = (υ, Σ) satisfies an equivariance condition, then the null distribution of Tn,Ĝ(θ̂) does not depend on θ.
Theorem 5. Suppose conditions in Lemma 2 hold with G0 a spherically symmetric
distribution and that θ̂ = (υ̂, Σ̂) satisfies
(i) υ̂(BX1 + b, BX2 + b, ..., BXn + b) = B υ̂(X1 , X2 , ..., Xn ) + b,
(ii) Σ̂(BX1 + b, BX2 + b, ..., BXn + b) = B Σ̂(X1 , X2 , ..., Xn )B ′ ,
∀b ∈ Rd, ∀B ∈ Md×d with det(B) ≠ 0, then Tn,Ĝ(θ̂) is invariant with respect to
nonsingular affine transformations of the data X1 , X2 , ..., Xn .
Thus, if F ∈ F1, to obtain the null distribution of Tn,Ĝ(θ̂) we can assume, under some conditions on G and θ̂, that the data are a random sample from F0. F1 is a location-scale family; results similar to that stated in Theorem 5 can be given when F is a location family and when F is a scale family.
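The practical content of Lemma 2 and Theorem 5 is that, for a location-scale null family and a spherically symmetric G0, one may first standardize the data and then compare its e.c.f. with the c.f. of the fixed member F0. The sketch below is our illustration, with F0 = Nd(0, I) and G0 the d-variate standard normal as assumed choices; it standardizes with the sample mean and the Cholesky factor of the sample covariance, which satisfy the equivariance conditions (i)-(ii), and the Monte Carlo approximation of the integral means the two printed values agree only up to simulation noise.

```python
import numpy as np

def standardize(xs):
    """Z_j = V_hat^{-1}(X_j - v_hat), with v_hat the sample mean and V_hat the
    lower-triangular Cholesky factor of the sample covariance matrix; both
    estimators satisfy the equivariance conditions (i)-(ii) of Theorem 5."""
    v_hat = xs.mean(axis=0)
    V_hat = np.linalg.cholesky(np.cov(xs, rowvar=False))
    return np.linalg.solve(V_hat, (xs - v_hat).T).T

def T_hat(xs, m=20000, seed=0):
    """Statistic (5), n * int |c_hat_n(t) - c_0(t)|^2 dG_0(t), for the illustrative
    choices F_0 = N_d(0, I) (so c_0(t) = exp(-||t||^2/2)) and G_0 = N_d(0, I),
    approximating the integral by Monte Carlo over t ~ G_0."""
    zs = standardize(xs)
    n, d = zs.shape
    ts = np.random.default_rng(seed).standard_normal((m, d))
    ecf = np.exp(1j * ts @ zs.T).mean(axis=1)
    c0 = np.exp(-0.5 * np.sum(ts ** 2, axis=1))
    return n * np.mean(np.abs(ecf - c0) ** 2)

# Affine invariance: the exact statistic is identical for X and BX + b; the two
# Monte Carlo approximations below differ only by simulation noise.
rng = np.random.default_rng(2)
x = rng.standard_normal((200, 2))
B, b = np.array([[2.0, 0.3], [-1.0, 1.5]]), np.array([5.0, -3.0])
print(T_hat(x), T_hat(x @ B.T + b))
```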
4 A bootstrap test
In general, the null distribution of Tn,G (θ̂) is difficult to obtain and in addition it
depends on the unknown true parameter value θ. This is also true for its limiting null
distribution, since the eigenvalues of the operator A^c = A^c(F(.; θ)), defined in Theorem
1, may depend on θ. Even if they were known, it would be computationally difficult
to evaluate the limiting null distribution. Thus it is wise to look for other ways of
approximating the null distribution of Tn,G (θ̂). In this section we show that the
bootstrap can be used to consistently approximate the null distribution of Tn,G (θ̂)
(and also that of Tn,Ĝ (θ̂)). In order to define properly the null bootstrap distribution
of Tn,G (θ̂), we assume the following condition.
Condition E. θ̂ = θ + o(1), for some θ ∈ Θ, with θ = θ0 if F (.) = F (.; θ0 ).
Condition E implies that if n ≥ n0, for some n0 ∈ N, then F(.; θ̂) ∈ F, given X1, X2, ..., Xn. Assume n ≥ n0. Let X1∗, X2∗, ..., Xn∗ be a random sample from F(.; θ̂), given X1, X2, ..., Xn, and let T∗n,G(θ̂∗) be the bootstrap version of Tn,G(θ̂), that is, T∗n,G(θ̂∗) = n ∫ |c∗n(t) − c(t; θ̂∗)|² dG(t), where θ̂∗ = θ̂(X1∗, X2∗, ..., Xn∗) and c∗n(t) is the e.c.f. of the bootstrap sample X1∗, X2∗, ..., Xn∗.
To show that the bootstrap consistently estimates the null distribution of
Tn,G(θ̂), we need F(x; θ) to be smooth as a function of θ. The next condition guarantees this.
Condition F. Each distribution F (x; θ) ∈ F has a probability density f (x; θ)
with respect to τ , where τ is a σ-finite measure over (Rd , B), such that f(r) (x; θ)
exists and satisfies Eθ[{f_(r)(X; θ)/f(X; θ)}²] < ∞, 1 ≤ r ≤ p, and there is a neighborhood of θ, call it K(θ), such that if γ ∈ K(θ) then Eθ[{f_(r)(X; θ) − f_(r)(X; γ)}²/f(X; θ)²] < ∞, 1 ≤ r ≤ p.
Let Pθ denote the probability assuming that F (.) = F (.; θ) and let P∗ denote
the conditional bootstrap probability defined above, given X1, X2, ..., Xn, that is, P∗(.) = Pθ̂.
Theorem 6. Suppose conditions A, B-2, C, E and F hold and that Σ1(γ) is continuous at γ = θ, where Σ1(γ) = ∫ l(x; γ) l(x; γ)′ dF(x; γ). Then, sup_{x∈R} |P∗(T∗n,G(θ̂∗) ≤ x) − Pθ(Tn,G(θ̂) ≤ x)| → 0, as n → ∞, with probability one.
As an immediate consequence of Theorem 6, we have that if we take t∗α such that P∗(T∗n,G(θ̂∗) ≥ t∗α) = α, for some 0 < α < 1, then the test that rejects H0 when Tn,G(θ̂) ≥ t∗α has asymptotically the desired level.
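In algorithmic form, the parametric bootstrap test of this section proceeds as follows (a generic sketch of ours; `estimate`, `sample_from` and `statistic` are hypothetical user-supplied functions, not taken from the paper): estimate θ from the data, compute the observed statistic, draw B resamples of size n from F(.; θ̂), re-estimate θ and recompute the statistic on each resample, and reject H0 when the observed value exceeds the bootstrap (1 − α) quantile t∗α.

```python
import numpy as np

def bootstrap_test(x, estimate, sample_from, statistic, B=500, alpha=0.05, seed=0):
    """Parametric bootstrap approximation to the null distribution of T_{n,G}(theta_hat),
    following the scheme of this section (a generic sketch):
      estimate(x)                -> theta_hat, a consistent estimator (Conditions A/E)
      sample_from(theta, n, rng) -> n draws from F(.; theta)
      statistic(x, theta)        -> value of T_{n,G}(theta)
    Returns the observed statistic, the bootstrap critical value t*_alpha and a p-value."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    theta_hat = estimate(x)
    t_obs = statistic(x, theta_hat)
    t_star = np.empty(B)
    for b in range(B):
        xb = sample_from(theta_hat, n, rng)        # X_1*, ..., X_n* ~ F(.; theta_hat)
        t_star[b] = statistic(xb, estimate(xb))    # T*_{n,G}(theta_hat*)
    crit = np.quantile(t_star, 1.0 - alpha)        # bootstrap estimate of t*_alpha
    p_value = np.mean(t_star >= t_obs)
    return t_obs, crit, p_value
```

Here `statistic` could be, for instance, the Monte Carlo approximation of (2) sketched in the Introduction, or the V-statistic form of Lemma 1.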
Note that conditions in Theorem 6 are weaker than those required in Fan [Fan98]
to derive the consistency of the bootstrap approximation to the null distribution of
Tn,G (θ̂).
If, in addition to the assumptions in Theorem 6, we assume that the weight function G is smooth as a function of θ, we get a result similar to that in Theorem 6 for Tn,Ĝ(θ̂), with identical consequences. Let T∗n,Ĝ(θ̂∗) = n ∫ |c∗n(t) − c(t; θ̂∗)|² dG(t; θ̂∗).
Theorem 7. Suppose the assumptions in Theorem 6 hold with G(t) = G(t; θ) and that G(t; θ) satisfies condition D. Then, sup_{x∈R} |P∗(T∗n,Ĝ(θ̂∗) ≤ x) − Pθ(Tn,Ĝ(θ̂) ≤ x)| → 0, as n → ∞, with probability one.
It is important to note that Theorems 6 and 7 hold whether H0 is true or not. If
H0 is indeed true, Theorem 6 (7) implies that the bootstrap distribution of Tn,G (θ̂)
(Tn,Ĝ (θ̂)) converges to its null distribution almost surely. If H0 is not true, then
Theorem 6 (7) asserts that the bootstrap distribution of Tn,G (θ̂) (Tn,Ĝ (θ̂)) converges
to the distribution of Tn,G (θ̂) (Tn,Ĝ (θ̂)) when the data have distribution function
F (.; θ).
5 A simulation study
When the null distribution of the test statistic depends on the unknown true parameter value, we have seen that the bootstrap provides a way to obtain an approximate
α-level test for testing H0 . To study the finite sample performance of the bootstrap
test we have carried out a simulation study. The family F in H0 that we have considered is Morgenstern’s system of bivariate distributions with uniform marginals
F = {F(x, y; θ) = F(x)F(y)[1 + θ{1 − F(x)}{1 − F(y)}], |θ| ≤ 1},
where F(x) is the distribution function of a U(0, 1) population. For testing H0 we have considered the test statistic Tn,G(θ̂) with dG(t) = ς‖c(t; 0)‖² dt, where ς = 1/∫ ‖c(t; 0)‖² dt and c(t; θ) is the c.f. of F(x, y; θ). For several sample sizes and several populations, we have generated M = 1000 samples and, for each sample, we calculated the bootstrap test. Table 1 shows the empirical power for α = 0.05, 0.10.
Column (1) examines the bootstrap approximation to the null distribution of the test statistic, which seems to be good since the rejection frequencies are quite close to the nominal sizes. Columns (2) to (5) examine the power of the test, which is quite high for all the considered alternatives.
Table 1. Empirical power of the bootstrap test for data coming from (1) F(x, y; 0.5), (2) two independent β(0.5, 0.5), (3) two independent β(0.5, 1), (4) two independent β(3, 2), (5) two independent β(3, 3).

Population      (1)          (2)          (3)          (4)          (5)
α            0.05  0.10   0.05  0.10   0.05  0.10   0.05  0.10   0.05  0.10
n = 20       0.049 0.107  0.252 0.378  0.861 0.909  0.737 0.849  0.431 0.634
n = 30       0.046 0.086  0.287 0.434  0.956 0.974  0.921 0.962  0.674 0.830
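The paper does not spell out how the Morgenstern samples were generated; one standard possibility is the conditional-inverse method sketched below (our code, with the sanity check using the fact that corr(U, V) = θ/3 for this family with uniform marginals).

```python
import numpy as np

def fgm_sample(n, theta, rng):
    """Draw n pairs from Morgenstern's bivariate family with U(0,1) marginals,
    F(x, y; theta) = x y [1 + theta (1 - x)(1 - y)], |theta| <= 1, by the
    conditional-inverse method: given U = u, the conditional c.d.f. of V is
    F(v | u) = (1 + a) v - a v^2 with a = theta (1 - 2u), inverted below."""
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    a = theta * (1.0 - 2.0 * u)
    small = np.abs(a) < 1e-12                      # a ~ 0: conditional law is uniform
    a_safe = np.where(small, 1.0, a)
    disc = np.sqrt((1.0 + a) ** 2 - 4.0 * a * w)
    v = np.where(small, w, (1.0 + a - disc) / (2.0 * a_safe))
    return np.column_stack((u, v))

rng = np.random.default_rng(4)
xy = fgm_sample(10000, 0.5, rng)
# Sanity check: for this family corr(U, V) = theta / 3, so about 1/6 here.
print(np.corrcoef(xy[:, 0], xy[:, 1])[0, 1])
```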
References
[Kou80]  Koutrouvelis, I.A.: A goodness-of-fit test of simple hypothesis based on the empirical characteristic function. Biometrika, 67, 238–240 (1980)
[KK81]   Koutrouvelis, I.A., Kellermeier, J.: A goodness-of-fit test based on the empirical characteristic function when parameters must be estimated. J. R. Statist. Soc. B, 43, 173–176 (1981)
[Fan97]  Fan, Y.: Goodness-of-fit tests for a multivariate distribution by the empirical characteristic function. J. Multivariate Anal., 62, 36–63 (1997)
[EP83]   Epps, T.W., Pulley, L.B.: A test for normality based on the empirical characteristic function. Biometrika, 70, 723–726 (1983)
[BH88]   Baringhaus, L., Henze, N.: A consistent test for multivariate normality based on the empirical characteristic function. Metrika, 35, 339–348 (1988)
[Epp99]  Epps, T.W.: Limiting behavior of the ICF test for normality under Gram-Charlier alternatives. Statist. Probab. Lett., 42, 175–184 (1999)
[HW97]   Henze, N., Wagner, T.: A new approach to the BHEP tests for multivariate normality. J. Multivariate Anal., 62, 1–23 (1997)
[JMP03]  Jiménez-Gamero, M.D., Muñoz-García, J., Pino-Mejías, R.: Bootstrapping parameter estimated degenerate U and V statistics. Statist. Probab. Lett., 61, 61–70 (2003)
[Fan98]  Fan, Y.: Goodness-of-fit tests based on kernel density estimators with fixed smoothing parameters. Econometric Theory, 14, 604–621 (1998)