Goodness-of-fit tests based on the empirical characteristic function∗

V. Alba-Fernández¹, M.D. Jiménez-Gamero², and J. Muñoz García²

¹ Dpt. of Statistics and O.R., University of Jaén, Spain, mvalba@ujaen.es
² Dpt. of Statistics and O.R., University of Seville, Spain, (dolores,joaquinm)@us.es

Summary. We study a class of goodness-of-fit tests based on the empirical characteristic function that can be applied to continuous and discrete data of any fixed dimension. Under some not too restrictive conditions, the tests are consistent against any fixed alternative and are also able to detect contiguous alternatives. We show that the bootstrap can be employed to consistently estimate the null distribution.

Key words: goodness-of-fit, characteristic function, consistent test, contiguous alternatives, parametric bootstrap.

1 Introduction

Since the empirical characteristic function (e.c.f.) converges to the population characteristic function (c.f.), and the c.f. characterizes the distribution of the population, several goodness-of-fit tests have been proposed whose test statistics measure deviations between the e.c.f. and the c.f. under the null hypothesis. Examples are the tests proposed by Koutrouvelis [Kou80], Koutrouvelis and Kellermeier [KK81] and Fan [Fan97]. All these tests evaluate the difference between the e.c.f. and the c.f. of the law in the null hypothesis at several points, and hence they require choosing the points and their number. Some tests have been designed for special laws. An example is the one proposed by Epps and Pulley [EP83] for testing univariate normality, which has been extended to the multivariate case by Baringhaus and Henze [BH88]. These authors consider the test statistic

  n ∫ |cn(t) − ψ(t; µ̂, Σ̂)|² g(t; µ̂, Σ̂) dt,  (1)

where cn(t) = (1/n) Σ_{j=1}^n exp(it′Xj) is the e.c.f. of the sample X1, X2, ..., Xn, the prime denotes transpose, ψ(t; µ, Σ) is the c.f.
of the law Nd(µ, Σ) and g(t; µ, Σ) ∝ |ψ(t; µ, Σ)|. That is, (1) is a weighted integral of the squared modulus of the difference between cn(t), which is a consistent estimator of the c.f. of the distribution generating the data, and ψ(t; µ̂, Σ̂), which is an estimator of the population c.f. under the null hypothesis of normality. Because of its desirable properties ([BH88], [Epp99], [HW97]), it is interesting to generalize this test to testing fit to any distribution. With this aim, to test the composite null hypothesis

  H0: the law of X1 ∈ F,

where F is a parametric family, F = {F(x; θ), x ∈ R^d, θ ∈ Θ}, Θ being an open subset of R^p, we consider the statistic

  Tn,G(θ̂) = n ∫ |cn(t) − c(t; θ̂)|² dG(t),  (2)

where c(t; θ) is the characteristic function of F(x; θ), θ̂ is a consistent estimator of θ, and G is a distribution function on R^d, not necessarily proportional to |c(t; θ̂)|. The aim of this paper is to show that, under some not too restrictive conditions on F and θ̂, the test that rejects the composite null hypothesis H0 for large values of Tn,G(θ̂) is consistent against any fixed alternative, and that it is able to detect Pitman local alternatives that differ from the null by O(n^{−1/2}). We also show that these properties hold if the weight function G in (2) depends on θ̂, as is the case for statistic (1),

  Tn,Ĝ(θ̂) = n ∫ |cn(t) − c(t; θ̂)|² dG(t; θ̂).

Baringhaus and Henze [BH88] have shown that the null distribution of statistic (1) does not depend on the unknown true parameter value. We study conditions for this to hold in the general case considered here. The article also gives a consistent parametric bootstrap procedure to approximate the null distribution of Tn,G(θ̂) when it depends on some unknown parameters. The last section summarizes the results of a simulation study.

∗ This work is partially supported by MEC grant MTM2004-01433.
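To fix ideas, the statistic (2) can be evaluated by direct numerical integration once a null family and a weight G are chosen. The sketch below is our own illustration, not a construction from the paper: it assumes a hypothetical univariate normal null family N(µ, σ²), takes G to be the standard normal distribution, and estimates θ = (µ, σ) by the sample mean and standard deviation.

```python
# Illustrative sketch (our assumptions: univariate normal null family,
# weight G = N(0, 1)); not a setup prescribed by the paper.
import numpy as np

def ecf(t, x):
    # Empirical characteristic function c_n(t) = (1/n) sum_j exp(i t x_j).
    return np.exp(1j * np.outer(t, x)).mean(axis=1)

def T_nG(x, t):
    # n * integral of |c_n(t) - c(t; theta_hat)|^2 dG(t), approximated by
    # a Riemann sum on the grid t; dG(t) = phi(t) dt for G = N(0, 1).
    n = len(x)
    mu, sigma = x.mean(), x.std()                 # consistent estimators
    c_null = np.exp(1j * t * mu - 0.5 * (sigma * t) ** 2)
    g = np.exp(-0.5 * t ** 2) / np.sqrt(2 * np.pi)
    return n * np.sum(np.abs(ecf(t, x) - c_null) ** 2 * g) * (t[1] - t[0])

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=200)      # data drawn under H0
t = np.linspace(-8.0, 8.0, 2001)
stat = T_nG(x, t)
print(stat)  # small for this sample, since H0 holds
```

In practice the choice of G controls which frequencies t the comparison between cn(t) and c(t; θ̂) emphasizes.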
2 Properties

For a fixed integer d ≥ 1, let X1, X2, ..., Xn be independent d-dimensional random column vectors with common distribution function F(x) and c.f. c(t). To test the composite null hypothesis H0 we consider the statistic Tn,G(θ̂) defined in (2). To study the properties of this test we first express Tn,G(θ̂) in a more convenient form, which also provides a way to compute it.

Lemma 1. The statistic Tn,G(θ̂) defined in (2) can be expressed as

  Tn,G(θ̂) = (1/n) Σ_{j=1}^n Σ_{k=1}^n h(Xj, Xk; θ̂),

where

  h(x, y; θ) = u(x − y) − u₀(x; θ) − u₀(y; θ) + u₀₀(θ),  (3)

with u₀(x; θ) = ∫ u(x − y) dF(y; θ) and u₀₀(θ) = ∫∫ u(x − y) dF(x; θ) dF(y; θ), u(t) = ∫ cos(t′x) dG(x) being the real part of the c.f. of G.

Note that Tn,G(θ̂) is not, in general, a degree-2 V-statistic, but what Jiménez-Gamero et al. [JMP03] call a parameter-estimated degree-2 V-statistic, since for each fixed value of θ, Tn,G(θ) is a degree-2 V-statistic. Tn,Ĝ(θ̂) also has an expansion similar to that of Tn,G(θ̂) given in Lemma 1, with G(t) replaced by G(t; θ̂) in the expression of h(x, y; θ̂) defined in (3); hence Tn,Ĝ(θ̂) is a parameter-estimated degree-2 V-statistic, too.

To derive properties of the statistics Tn,G(θ̂) and Tn,Ĝ(θ̂) we first introduce some notation and specify some conditions. Let ‖·‖ denote the Euclidean norm. For any real function f(t; θ) differentiable at θ = (θ1, θ2, ..., θp)′, the following notation will be used for any t ∈ R^d: f(r)(t; θ) = (∂/∂θr) f(t; θ), r = 1, 2, ..., p, and ∇f(t; θ) = (f(1)(t; θ), f(2)(t; θ), ..., f(p)(t; θ))′. For any distribution in the family {F(x; θ), x ∈ R^d, θ ∈ Θ}, Θ ⊂ R^p, let R(t; θ) and I(t; θ) denote the real and imaginary parts of c(t; θ), respectively.

Condition A.
Let X1, X2, ..., Xn be independent and identically distributed d-dimensional random vectors with distribution function F, and let θ̂ = θ̂(X1, X2, ..., Xn) be such that

  θ̂ = θ + n⁻¹ Σ_{j=1}^n l(Xj; θ) + o_P(n^{−1/2}),

for some θ ∈ Θ, with θ = θ0 if F(.) = F(.; θ0), and the components of l(x; θ) = (l1(x; θ), l2(x; θ), ..., lp(x; θ))′ satisfy ∫ lr(x; θ) dF(x) = 0 and ∫ lr(x; θ)² dF(x) < ∞, 1 ≤ r ≤ p.

Condition B-s. There is a neighborhood of θ, call it K(θ), such that for all γ ∈ K(θ), R(r)(t; γ) and I(r)(t; γ) exist and

  |R(r)(t; γ)| ≤ ρr(t), ∀t ∈ R^d, with ∫ ρr(t)^s dG(t) < ∞,
  |I(r)(t; γ)| ≤ ιr(t), ∀t ∈ R^d, with ∫ ιr(t)^s dG(t) < ∞,

r = 1, 2, ..., p.

Condition C. For any ε > 0 there is a bounded sphere C in R^p centered at θ such that if γ ∈ C then R(r)(t; γ) and I(r)(t; γ) exist for r = 1, 2, ..., p and

  ‖∇R(t; γ) − ∇R(t; θ)‖ ≤ ρ(t), ∀t ∈ R^d, with ∫ ρ²(t) dG(t) < ε,
  ‖∇I(t; γ) − ∇I(t; θ)‖ ≤ ι(t), ∀t ∈ R^d, with ∫ ι²(t) dG(t) < ε.

Condition A says that θ̂ is a consistent, asymptotically normal estimator of θ. This condition is quite usual and is satisfied by a large class of estimators. Conditions B and C are smoothness conditions on c(t; θ) as a function of θ. The next theorem gives the asymptotic null distribution of Tn,G(θ̂).

Theorem 1. Suppose conditions A, B-2 and C hold, and that H0 is true, that is, F(.) = F(.; θ). Then Tn,G(θ̂) → Σ_{j=1}^∞ λ_j^c χ²_{1j} in distribution, where χ²_{11}, χ²_{12}, ... are independent chi-square variates with one degree of freedom and {λ_j^c} are the eigenvalues of the operator A^c = A^c(F(.; θ)) defined on L²(R^d, F(.; θ)) by

  A^c w(x) = ∫ h^c(x, y; θ) w(y) dF(y; θ),

with h^c(x, y; θ) = h(x, y; θ) − l(x; θ)′m(y; θ) − l(y; θ)′m(x; θ) + m(x; θ)′M(θ)m(y; θ), where m(x; θ) = ∫ {cos(t′x) − R(t; θ)}∇R(t; θ) dG(t) + ∫ {sin(t′x) − I(t; θ)}∇I(t; θ) dG(t) and M(θ) = ∫ {∇R(t; θ)∇R(t; θ)′ + ∇I(t; θ)∇I(t; θ)′} dG(t).
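The double-sum representation in Lemma 1 makes Tn,G(θ̂) straightforward to compute whenever u₀ and u₀₀ are available in closed form. The sketch below is our own hedged example, not taken from the paper: for a hypothetical univariate normal null N(µ, σ²) with weight G = N(0, 1), u(t) = exp(−t²/2), and standard Gaussian convolution identities give u₀(x; θ) = (1 + σ²)^{−1/2} exp{−(x − µ)²/(2(1 + σ²))} and u₀₀(θ) = (1 + 2σ²)^{−1/2}. The code checks that the double sum agrees with a direct numerical integration of (2).

```python
# Our illustrative assumptions: null family N(mu, sigma^2), weight
# G = N(0, 1); u_0 and u_00 below follow from Gaussian convolutions.
import numpy as np

def T_double_sum(x):
    # (1/n) sum_{j,k} h(x_j, x_k; theta_hat), Lemma 1 form.
    n = len(x)
    mu, s2 = x.mean(), x.var()
    d = np.subtract.outer(x, x)
    u = np.exp(-0.5 * d ** 2)                                 # u(x_j - x_k)
    u0 = np.exp(-(x - mu) ** 2 / (2 * (1 + s2))) / np.sqrt(1 + s2)
    u00 = 1.0 / np.sqrt(1 + 2 * s2)
    h = u - u0[:, None] - u0[None, :] + u00
    return h.sum() / n

def T_integral(x, t):
    # n * integral of |c_n(t) - c(t; theta_hat)|^2 dG(t), Riemann sum.
    n = len(x)
    mu, s2 = x.mean(), x.var()
    cn = np.exp(1j * np.outer(t, x)).mean(axis=1)
    c = np.exp(1j * t * mu - 0.5 * s2 * t ** 2)
    g = np.exp(-0.5 * t ** 2) / np.sqrt(2 * np.pi)
    return n * np.sum(np.abs(cn - c) ** 2 * g) * (t[1] - t[0])

rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.5, size=100)
t = np.linspace(-10.0, 10.0, 4001)
print(T_double_sum(x), T_integral(x, t))  # the two forms agree
```

The double-sum form costs O(n²) but needs no numerical integration, which is what makes the statistic practical in higher dimensions.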
Note that the limiting null distribution of Tn,G(θ̂) may depend on the estimator θ̂ of θ being considered. The following result gives the behaviour of Tn,G(θ̂) under fixed alternatives.

Theorem 2. If θ̂ = θ + o(1) for some θ ∈ Θ, condition B-1 holds and G is such that

  η(F; θ) = ∫ |c(t) − c(t; θ)|² dG(t) > 0, ∀F ∉ F,  (4)

then Tn,G(θ̂) → ∞ almost surely when H0 does not hold.

Hence, if G is such that (4) holds, then the test that rejects H0 for large values of Tn,G(θ̂) is strongly consistent against any fixed alternative. The next theorem gives the behaviour of this test under contiguous alternatives. It shows that the test is able to detect alternatives which converge to some F ∈ F at the rate n^{−1/2}, irrespective of the underlying dimension d. Let {f_j^c} be the set of orthonormal eigenfunctions corresponding to the eigenvalues {λ_j^c} of the operator A^c(F(.; θ)) in Theorem 1.

Theorem 3. Suppose conditions A, B-2 and C hold, and that the probability measure induced by F is dominated by the probability measure induced by F0, where F0(.) = F(.; θ), with Radon-Nikodym derivative dF/dF0 = 1 + n^{−1/2} a_n^c, for some sequence {a_n^c} in L²(R^d, F0) converging to a^c ∈ L²(R^d, F0), say. Then lim_{n→∞} P{Tn,G(θ̂) ≤ x} = P{Σ_{k≥1} λ_k^c (Zk + c_k^c)² ≤ x}, where c_k^c = ∫ a^c(x) f_k^c(x) dF0(x) and Z1, Z2, ... are i.i.d. standard normal, N(0, 1), variates.

The results in Theorems 1-3 have been proven by Fan [Fan98] for continuous families F satisfying more stringent conditions than the ones assumed here, and for θ̂ the quasi-maximum likelihood estimator of θ. Note that our results are valid for continuous and discrete populations and for any θ̂ satisfying Condition A. The following result shows that if condition D holds, then Tn,G(θ̂) and Tn,Ĝ(θ̂) have the same limit behaviour.

Condition D.
For all γ in a neighborhood of θ, G(t; γ) has a probability density g(t; γ) with respect to τ, where τ is a σ-finite measure over (R^d, B) and B is the class of Borel sets of R^d, satisfying |g(t; θ) − g(t; γ)| ≤ g0(t; θ)‖θ − γ‖, where g0(t; θ) is such that ∫ g0(t; θ) dt < ∞.

Theorem 4. Suppose θ̂ = θ + rn for some θ ∈ Θ, and that condition D holds.
(a) If rn = oP(1) and H0 holds, then Tn,Ĝ(θ̂) = Tn,G(θ̂) + oP(1).
(b) If rn = o(1), condition B-1 holds and G(.; θ) satisfies (4), then Tn,Ĝ(θ̂) → ∞ almost surely when H0 does not hold.
(c) If rn = oP(1) and the probability measure induced by F is dominated by the probability measure induced by F0, where F0(.) = F(.; θ), with Radon-Nikodym derivative dF/dF0 = 1 + n^{−1/2} a_n^c, for some sequence {a_n^c} in L²(R^d, F0) converging to a^c ∈ L²(R^d, F0), then Tn,Ĝ(θ̂) = Tn,G(θ̂) + oP(1).

3 Null distribution not depending on θ

In general, the null distribution of the statistics Tn,G(θ̂) and Tn,Ĝ(θ̂) depends on the unknown true parameter value θ. In this section we see that in some cases, and for adequate choices of the weight function, the null distribution of these statistics does not depend on θ, but only on the family F and the weight function G. Let F0 be a fixed distribution function on R^d with var_{F0}(X) = Σ0 positive definite. Let X0 be a random vector having distribution function F0 and consider F = F1 = {F(x; υ, V) = P(X ≤ x), with X = V X0 + υ, υ ∈ R^d, V ∈ M_{d×d}, det(V) ≠ 0}, where M_{d×d} is the set of d × d matrices. We have the following result.

Lemma 2. If F ∈ F1 and G(t) = G(t; V) is such that dG(t; V) = det(V) dG0(V t), for some fixed distribution G0, then

  Tn,Ĝ(θ̂) = n ∫ |ĉn(t) − c0(t)|² dG0(t),  (5)

where θ = (υ, V), c0(t) is the c.f. of F0, and ĉn(t) is the e.c.f.
of Z1, Z2, ..., Zn, Zj = V̂^{−1}(Xj − υ̂), j = 1, 2, ..., n, where υ̂ = υ̂(X1, X2, ..., Xn) is an estimator of υ and V̂ = V̂(X1, X2, ..., Xn) is an estimator of V such that det(V̂) ≠ 0 with probability one.

Assume without loss of generality that Σ0 = Id. A problem with the result in Lemma 2 is the estimation of V. To see this, consider the following example: if X0 ∼ Nd(0, Id), then V X0 + υ ∼ Nd(υ, Σ), with Σ = V V′; since different matrices A with det(A) ≠ 0 give rise to the same covariance matrix Σ = AA′, one cannot identify V and thus one cannot estimate it. To circumvent this difficulty, we can make the parametrization in the definition of F1 one-to-one by imposing some condition on V, such as being lower triangular or being symmetric positive definite. However, in some cases we do not have to worry about the parametrization in the definition of F1. Suppose, for example, that the weight function G satisfies the condition in Lemma 2, that is, G(t) = G(t; V) is such that dG(t; V) = det(V) dG0(V t), and that G0 is a spherically symmetric distribution. In this case, since ∫ cos(t′x) dG0(x) = u(t′t) for some real-valued function u of a scalar variable, from equality (5) we have that Tn,Ĝ(θ̂) depends on V only through V V′ = Σ = var(X), which can be estimated, for example, by the sample covariance matrix. The next result, an immediate consequence of Lemma 2, shows that in this case, if the estimator of θ = (υ, Σ) satisfies an equivariance condition, then the null distribution of Tn,Ĝ(θ̂) does not depend on θ.

Theorem 5. Suppose the conditions in Lemma 2 hold with G0 a spherically symmetric distribution and that θ̂ = (υ̂, Σ̂) satisfies
(i) υ̂(BX1 + b, BX2 + b, ..., BXn + b) = B υ̂(X1, X2, ..., Xn) + b,
(ii) Σ̂(BX1 + b, BX2 + b, ..., BXn + b) = B Σ̂(X1, X2, ..., Xn) B′,
∀b ∈ R^d, ∀B ∈ M_{d×d} with det(B) ≠ 0. Then Tn,Ĝ(θ̂) is invariant with respect to nonsingular affine transformations of the data X1, X2, ..., Xn.
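The invariance stated in Theorem 5 can be checked numerically. The sketch below is our own example under assumptions the paper does not prescribe: it takes F0 = N₂(0, I₂) and the spherical weight G0 = N₂(0, I₂), in which case (5) has a BHEP-type closed form in the standardized data Zj = V̂^{−1}(Xj − X̄) (obtained exactly as in Lemma 1, with u(t) = exp(−‖t‖²/2), u₀(z) = 2^{−d/2} exp(−‖z‖²/4) and u₀₀ = 3^{−d/2}). The sample mean and the (biased) sample covariance satisfy conditions (i) and (ii), so the computed value should be unchanged by any nonsingular affine transformation of the data.

```python
# Our illustrative assumptions: F0 = N_2(0, I), G0 = N_2(0, I);
# V_hat is taken to be the Cholesky factor of the sample covariance.
import numpy as np

def bhep(x):
    # Closed form of (5): (1/n) sum_{j,k} u(z_j - z_k)
    #                     - 2 sum_j u_0(z_j) + n * u_00.
    n, d = x.shape
    S = np.cov(x, rowvar=False, bias=True)        # equivariant, cond. (ii)
    L = np.linalg.cholesky(S)                     # one choice of V_hat
    z = np.linalg.solve(L, (x - x.mean(axis=0)).T).T
    sq = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    return (np.exp(-0.5 * sq).sum() / n
            - 2 * 2 ** (-d / 2) * np.exp(-0.25 * (z ** 2).sum(axis=-1)).sum()
            + n * 3 ** (-d / 2))

rng = np.random.default_rng(2)
x = rng.normal(size=(50, 2))
B = np.array([[2.0, 1.0], [0.5, 3.0]])            # nonsingular B
b = np.array([-1.0, 4.0])
t1, t2 = bhep(x), bhep(x @ B.T + b)
print(t1, t2)  # equal up to floating-point error, as Theorem 5 asserts
```

Although the Cholesky factor itself is not an identifiable estimator of V, the statistic depends on it only through Σ̂ = V̂V̂′, which is why the spherical weight removes the identifiability problem discussed above.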
Thus if F ∈ F1, to obtain the null distribution of Tn,Ĝ(θ̂) we can assume, under the above conditions on G and θ̂, that the data are a random sample from F0. F1 is a location-scale family. Results similar to Theorem 5 can be given when F is a location family and when F is a scale family.

4 A bootstrap test

In general, the null distribution of Tn,G(θ̂) is difficult to obtain and, in addition, it depends on the unknown true parameter value θ. This is also true for its limiting null distribution, since the eigenvalues of the operator A^c = A^c(F(.; θ)) defined in Theorem 1 may depend on θ. Even if they were known, it would be computationally difficult to evaluate the limiting null distribution. Thus it is wise to look for other ways of approximating the null distribution of Tn,G(θ̂). In this section we show that the bootstrap can be used to consistently approximate the null distribution of Tn,G(θ̂) (and also that of Tn,Ĝ(θ̂)). In order to properly define the null bootstrap distribution of Tn,G(θ̂), we assume the following condition.

Condition E. θ̂ = θ + o(1), for some θ ∈ Θ, with θ = θ0 if F(.) = F(.; θ0).

Condition E implies that if n ≥ n0, for some n0 ∈ N, then F(.; θ̂) ∈ F, given X1, X2, ..., Xn. Assume n ≥ n0. Let X1*, X2*, ..., Xn* be a random sample from F(.; θ̂), given X1, X2, ..., Xn, and let T*n,G(θ̂*) be the bootstrap version of Tn,G(θ̂), that is,

  T*n,G(θ̂*) = n ∫ |c*n(t) − c(t; θ̂*)|² dG(t),

where θ̂* = θ̂(X1*, X2*, ..., Xn*) and c*n(t) is the e.c.f. of the bootstrap sample X1*, X2*, ..., Xn*. To show that the bootstrap consistently estimates the null distribution of Tn,G(θ̂), we need F(x; θ) to be smooth as a function of θ. The next condition guarantees this.

Condition F.
Each distribution F(x; θ) ∈ F has a probability density f(x; θ) with respect to τ, where τ is a σ-finite measure over (R^d, B), such that f(r)(x; θ) exists and satisfies E_θ[{f(r)(X; θ)/f(X; θ)}²] < ∞, 1 ≤ r ≤ p, and there is a neighborhood of θ, call it K(θ), such that if γ ∈ K(θ) then E_θ[{f(r)(X; θ) − f(r)(X; γ)}²/f²(X; θ)] < ∞, 1 ≤ r ≤ p.

Let P_θ denote the probability assuming that F(.) = F(.; θ), and let P* denote the conditional bootstrap probability defined above, given X1, X2, ..., Xn, that is, P*(.) = P_{θ̂}.

Theorem 6. Suppose conditions A, B-2, C, E and F hold and that Σ1(γ) is continuous at γ = θ, where Σ1(γ) = ∫ l(x; γ) l(x; γ)′ dF(x; γ). Then sup_{x∈R} |P*(T*n,G(θ̂*) ≤ x) − P_θ(Tn,G(θ̂) ≤ x)| → 0, as n → ∞, with probability one.

As an immediate consequence of Theorem 6, if we take t*_α such that P*(T*n,G(θ̂*) ≥ t*_α) = α, for some 0 < α < 1, then the test that rejects H0 when Tn,G(θ̂) ≥ t*_α has asymptotically the desired level. Note that the conditions in Theorem 6 are weaker than those required in Fan [Fan98] to derive the consistency of the bootstrap approximation to the null distribution of Tn,G(θ̂).

If, in addition to the assumptions in Theorem 6, we assume that the weight function G is smooth as a function of θ, we get a result similar to Theorem 6 for Tn,Ĝ(θ̂), with identical consequences. Let T*n,Ĝ(θ̂*) = n ∫ |c*n(t) − c(t; θ̂*)|² dG(t; θ̂*).

Theorem 7. Suppose the assumptions in Theorem 6 hold with G(t) = G(t; θ) and that G(t; θ) satisfies condition D. Then sup_{x∈R} |P*(T*n,Ĝ(θ̂*) ≤ x) − P_θ(Tn,Ĝ(θ̂) ≤ x)| → 0, as n → ∞, with probability one.

It is important to note that Theorems 6 and 7 hold whether H0 is true or not. If H0 is indeed true, Theorem 6 (7) implies that the bootstrap distribution of Tn,G(θ̂) (Tn,Ĝ(θ̂)) converges to its null distribution almost surely.
If H0 is not true, then Theorem 6 (7) asserts that the bootstrap distribution of Tn,G(θ̂) (Tn,Ĝ(θ̂)) converges to the distribution of Tn,G(θ̂) (Tn,Ĝ(θ̂)) when the data have distribution function F(.; θ).

5 A simulation study

When the null distribution of the test statistic depends on the unknown true parameter value, we have seen that the bootstrap provides a way to obtain an approximate α-level test for H0. To study the finite-sample performance of the bootstrap test we have carried out a simulation study. The family F in H0 that we have considered is Morgenstern's system of bivariate distributions with uniform marginals,

  F = {F(x, y; θ) = F(x)F(y)[1 + θ{1 − F(x)}{1 − F(y)}], |θ| ≤ 1},

where F(x) is the distribution function of a U(0, 1) population. For testing H0 we have considered the test statistic Tn,G(θ̂) with dG(t) = ς |c(t; 0)|² dt, where ς = 1/∫ |c(t; 0)|² dt and c(t; θ) is the c.f. of F(x, y; θ). For several sample sizes and several populations, we have generated M = 1000 samples and, for each sample, carried out the bootstrap test. Table 1 shows the empirical power for α = 0.05, 0.10. Column (1) examines the bootstrap approximation to the null distribution of the test statistic, which seems to be good since the rejection frequencies are quite close to the nominal sizes. Columns (2) to (5) examine the power of the test, which is quite high for all the considered alternatives.

Table 1. Empirical power of the bootstrap test for data coming from (1) F(x, y; 0.5), (2) two independent β(0.5, 0.5), (3) two independent β(0.5, 1), (4) two independent β(3, 2), (5) two independent β(3, 3).
Population   (1)           (2)           (3)           (4)           (5)
α            0.05  0.10    0.05  0.10    0.05  0.10    0.05  0.10    0.05  0.10
n = 20       0.049 0.107   0.252 0.378   0.861 0.909   0.737 0.849   0.431 0.634
n = 30       0.046 0.086   0.287 0.434   0.956 0.974   0.921 0.962   0.674 0.830

References

[Kou80] Koutrouvelis, I.A.: A goodness-of-fit test of simple hypothesis based on the empirical characteristic function. Biometrika, 67, 238–240 (1980)
[KK81] Koutrouvelis, I.A., Kellermeier, J.: A goodness-of-fit test based on the empirical characteristic function when parameters must be estimated. J. R. Statist. Soc. B, 43, 173–176 (1981)
[Fan97] Fan, Y.: Goodness-of-fit tests for a multivariate distribution by the empirical characteristic function. J. Multivariate Anal., 62, 36–63 (1997)
[EP83] Epps, T.W., Pulley, L.B.: A test for normality based on the empirical characteristic function. Biometrika, 70, 723–726 (1983)
[BH88] Baringhaus, L., Henze, N.: A consistent test for multivariate normality based on the empirical characteristic function. Metrika, 35, 339–348 (1988)
[Epp99] Epps, T.W.: Limiting behavior of the ICF test for normality under Gram-Charlier alternatives. Statist. Probab. Lett., 42, 175–184 (1999)
[HW97] Henze, N., Wagner, T.: A new approach to the BHEP tests for multivariate normality. J. Multivariate Anal., 62, 1–23 (1997)
[JMP03] Jiménez-Gamero, M.D., Muñoz-García, J., Pino-Mejías, R.: Bootstrapping parameter estimated degenerate U and V statistics. Statist. Probab. Lett., 61, 61–70 (2003)
[Fan98] Fan, Y.: Goodness-of-fit tests based on kernel density estimators with fixed smoothing parameters. Econometric Theory, 14, 604–621 (1998)