Using nonparametric regression to find local departures from a parametric model

Mario Francisco-Fernández1 and Jean Opsomer2
1 Departamento de Matemáticas, Facultad de Informática, Universidad de A Coruña, 15071 A Coruña. mariofr@udc.es
2 Department of Statistics, Iowa State University, Ames, IA 50014, U.S.A. jopsomer@iastate.edu
Summary. Goodness-of-fit evaluation of a parametric regression model is often
done through hypothesis testing, where the fit of the model of interest is compared
statistically to that obtained under a broader class of models. Nonparametric regression models are frequently used as the latter type of model, because of their flexibility
and wide applicability. To date, this type of test has generally been performed globally, by comparing the parametric and nonparametric fits over the whole range of
the data. However, in some instances it might be of interest to test for deviations
from the parametric model that are localized to a subset of the data. In this case, a
global test will have low power and hence can miss important local deviations. Alternatively, a naive testing approach that discards all observations outside the local
interval will suffer from reduced sample size and potential overfitting. We therefore
propose a new local goodness-of-fit test for parametric regression models that can
be applied to a subset of the data but relies on global model fits. We compare the
new approach with the global and the naive tests, both theoretically and through
simulations. We find that the local test has a better ability to detect local deviations
than the other two tests, and propose a bootstrap-based approach for obtaining the
distribution of the test statistic.
Key words: Cramér-von Mises test, wild bootstrap, local polynomial regression
1 Introduction
A common task for statisticians is to determine whether a certain parametric model
is an appropriate representation for a dataset. As part of this determination, it is
often desirable to formally test the model, by treating the model as a null hypothesis
against an alternative (broader) model and evaluating the probability of obtaining
the observed data under the null hypothesis. Nonparametric models are a conceptually appealing choice for the alternative hypothesis, since they only require weak
model assumptions such as continuity and/or differentiability.
A number of authors have developed goodness-of-fit tests for parametric models
that rely on a nonparametric alternative, including [CKWY88], [ABH89] and [ES90].
[HM93], [SG-M96] and [ACG-M99] describe tests based on overall distance between
the parametric and nonparametric fits.
While the methods covered in those articles are quite different, the deviations
between the parametric fit and the (nonparametric) fit under the alternative model
are all measured globally, i.e. they measure the cumulative discrepancies between
both models over the whole range of the data. However, there are cases in which we
might be more interested in discovering local deviations from the null model. Such
a deviation would be caused by a departure from a parametric model for a subset
of the data only, while the rest of the data might be well represented by the parametric model.
The idea of testing for local deviations was motivated by a collaboration of the
second author with U.S. Bureau of Labor Statistics staff, to evaluate the appropriateness of a ratio model used in producing monthly employment estimates. The
model used appears to be a reasonable overall model, but there is some concern that
the overall relationship might not hold uniformly well for all establishment sizes
(especially for the smallest or largest establishments), and hence lead to poor predictions for some of the establishments. A local test for deviations from the overall
parametric model could therefore be useful, even in cases where the overall model fits well.
In situations like that described above, three nonparametric testing approaches
are in principle possible: (1) apply a global test such as those described above and attempt
to capture the local deviation, (2) apply a global test after discarding the data
outside the range of the potential local deviation, or (3) adapt a global test so
that it specifically targets the local deviation. The third type of test does not exist
in the statistical literature and will be developed later in this article, but we will
describe some of the drawbacks with the first two testing approaches in detecting
local deviations.
Because of the cumulative or “integrated” nature of most global measures of
discrepancy between null and alternative hypotheses, a global test is likely to be
unable to differentiate between one relatively large but localized discrepancy and
many smaller discrepancies between the fits. This inability results in global methods
having low power in situations with localized deviations.
The second type of test consists of applying a global test on a restricted dataset
obtained by discarding all data outside the range of interest. This approach, which
we will refer to as a “naive” test, has the intuitive advantage of specifically targeting
the area of interest, but has a number of important drawbacks. First, by reducing
the sample size, the test loses power. Second, this procedure suffers from potentially
severe “over-fitting bias,” in the sense that the parametric fit in the local interval
might deviate from the overall parametric model fit, with a resulting further loss of
power. Finally, it requires that the null and alternative fit be recomputed for each
subset to be tested, which adds to the computational complexity of the procedure.
The third type, to be further developed in what follows, is a local testing procedure that can be applied when a global parametric fit is obtained, but the presence
of local deviations is suspected in a subset of the data. By relying on global fits
but restricting the testing interval to a specific area, this testing approach can be
expected to combine the good features of global and naive tests while avoiding most
of their disadvantages.
In this article, we will generalize the global Cramér-von Mises-based test of
[ACG-M99], but many other methods could be localized in a similar manner. We
will derive the asymptotic properties of the proposed local test, and compare them
with those of the corresponding global and naive tests. We then propose a bootstrap-
based method to compute the critical value for the local test without relying on its
asymptotic distribution. Finally, we will compare the behavior of the three testing
procedures in simulation experiments. The results will confirm the superiority of
the proposed local procedure over the other two approaches when a local deviation
needs to be evaluated.
2 Proposed Local Test
We describe the local hypothesis test for the situation in which the assumed null
model is a polynomial in a single continuous covariate, as was done by [ACG-M99].
This is also the situation considered in the Current Employment Statistics (CES) survey discussed in Section 1. The approach continues to hold, with some modifications, if a different parametric form is considered.
Suppose that we observe data (xi, Yi), i = 1, ..., n, where xi ∈ [a, b], and we believe that the model

Yi = xi^T β + εi,

with xi = (1, xi, ..., xi^p)^T for some fixed p ≥ 0, is a reasonable overall specification
for the data, but we are concerned that there might be local departures from that
model. Therefore, we would like to test the hypothesis
H0 : E(Y |x) = xT β
versus
HA : E(Y |x) = m(x) ≠ x^T β,
where m(·) is a smooth but otherwise unspecified function. Because we are interested
in a local test, we would like to develop a procedure that can be applied to a specific
interval within [a, b] and determine whether observed discrepancies in that interval
are statistically significant. If such a local discrepancy is identified and found to be
sufficiently large to warrant modifying the model, then the overall parametric model
could be extended in such a way that it captures the observed deviation. For our
theoretical results to hold, the interval needs to be chosen without first looking at
the data. This problem is present in many situations in statistics, for instance, in
model selection. We could also use multiple intervals, but in this case a Bonferroni
correction, for example, should be used.
We will use the Cramér-von Mises test statistic proposed in [ACG-M99], adjusted
for the fact that we are only testing for deviations in a domain Dn ⊂ [a, b] (the
subscript n is added as we will be considering the asymptotic properties of the test).
Under H0 , the mean function is assumed to be a polynomial in x, and the parameter
β is estimated by least squares regression. Let m0 (x; β̂) denote the estimator for
m(x) in that case. Under HA , m is no longer assumed to be a polynomial function,
but it is still assumed to be smooth function of x. In that case, m(x) is estimated
by local polynomial regression, m̂(x; hn ), with kernel function K(·), degree of the
local polynomial q, and bandwidth hn . The local Cramér-von Mises test statistic is
defined as
Tn,L = dn ∫_Dn [m̂(x; hn) − m0(x; β̂)]² w(x) dx,    (1)
where w(·) is a fixed weight function and dn is a variance stabilizing constant. It is
clear that the statistic Tn,L will be large when the difference between the parametric
and nonparametric fits, evaluated on the domain Dn , is large, and small otherwise.
Specifically, the types of model deviations that can be captured by this test are of
the form m(x) − m0 (x; β) ≡ cn g(x), where g(·) is a fixed non-zero bounded function
orthogonal to polynomials of degree p, and cn a measure of the size of the deviation.
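As a numerical illustration of (1), the statistic can be approximated by combining a global polynomial (null) fit with a local linear fit and integrating their squared difference over Dn on a grid. The sketch below is a minimal Python version under simplifying assumptions not prescribed by the paper: w(x) = 1, dn = 1, an Epanechnikov kernel, and a plain Riemann-sum approximation of the integral; the function names are ours.

```python
import numpy as np

def local_linear_fit(x, y, x0, h):
    """Local linear estimate m_hat(x0; h) with an Epanechnikov kernel."""
    u = (x - x0) / h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)  # kernel weights
    sw = np.sqrt(w)
    X = np.column_stack([np.ones_like(x), x - x0])
    # Weighted least squares: minimize sum_i w_i (y_i - b0 - b1 (x_i - x0))^2.
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta[0]  # intercept = fitted value at x0

def cvm_local(x, y, a_loc, b_loc, h, p=1, d_n=1.0, n_grid=200):
    """Riemann-sum approximation of the local statistic in (1), with w(x) = 1."""
    beta_hat = np.polyfit(x, y, p)            # global polynomial (null) fit
    grid = np.linspace(a_loc, b_loc, n_grid)
    m0 = np.polyval(beta_hat, grid)           # parametric fit m_0(x; beta_hat)
    mh = np.array([local_linear_fit(x, y, g, h) for g in grid])
    return d_n * np.mean((mh - m0) ** 2) * (b_loc - a_loc)
```

On data with a deviation confined to one sub-interval, the statistic evaluated on that sub-interval should dominate its value on a deviation-free one, which is the behavior the local test exploits.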
The following assumptions are needed:
A 1 Model assumptions: The variable x has compact support [a, b] and its density
f (·) is bounded away from 0. The functions m(·) and f (·) are twice continuously
differentiable. The variance function σ 2 (x) = Var(Y |x) is continuous and bounded
away from 0 and ∞.
A 2 Parametric estimator assumptions: The parameter β is estimated by β̂, with β̂ − β = Op(1/√n).
A 3 Nonparametric estimator assumptions: The kernel K is symmetric, bounded and has compact support. The local polynomial degree q ≥ p, where p is the degree of the null model polynomial. The bandwidth satisfies hn → 0 and n hn^(q+1) → ∞.
A 4 Test assumptions: The model deviation cn satisfies cn → 0 and hn/cn → 0. The domain Dn is of size |Dn|, with |Dn| → 0 and hn/|Dn| → 0.
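Assumption A2 is satisfied, for instance, by ordinary least squares under standard moment conditions. The following small Monte Carlo sketch (illustrative only; the model, sample sizes and replication count are arbitrary choices of ours) shows the Op(1/√n) rate empirically: multiplying n by 100 shrinks the root-mean-squared estimation error by roughly a factor of 10.

```python
import numpy as np

def ols_rmse(n, reps=200, seed=1):
    """RMSE of the OLS estimate of (beta0, beta1) = (1, 2) at sample size n."""
    rng = np.random.default_rng(seed)
    errs = np.empty(reps)
    for r in range(reps):
        x = rng.uniform(0.0, 1.0, n)
        y = 1.0 + 2.0 * x + 0.5 * rng.standard_normal(n)
        X = np.column_stack([np.ones(n), x])
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        errs[r] = np.sum((beta_hat - np.array([1.0, 2.0])) ** 2)
    return np.sqrt(errs.mean())

# Root-n rate: error ratio between n = 100 and n = 10000 is near sqrt(100) = 10.
ratio = ols_rmse(100) / ols_rmse(10_000)
```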
These assumptions are not very restrictive. The theory for Tn,L (and the test) will fail if assumption A1 does not hold, that is, if the relationship between x and Y is not smooth. This requirement reflects the type of alternative hypothesis one wants to test, and in many cases it will be reasonable: the test concerns the shape of the relationship only, not the existence of a relationship itself. The other assumptions involve quantities the statistician can control, so overall the results obtained are quite general.
Theorem 1. Under assumptions A1–A4, the test statistic Tn,L defined in (1) has the asymptotic distribution

Tn,L →L N(b0n + b1n, VL),

where

b0n = (dn / (n hn)) K̃q^(2)(0) ∫_Dn σ²(x) w(x) / f(x) dx,

b1n = dn cn² ∫_Dn [K̃q,h ∗ g(x)]² w(x) dx,

and

VL = 2 (dn² / (n² hn)) K̃q^(4)(0) ∫_Dn [σ²(x) w(x) / f(x)]² dx,

where K̃q(·) is the equivalent kernel of order q, K̃q^(r)(·) denotes its r-fold convolution with itself, and K̃q,h(x) = (1/h) K̃q(x/h).
The asymptotic distribution in Theorem 1 is valid under both the null and the
alternative hypothesis. If the null model is true, i.e. cn = 0, then b1n = 0, so that
this result can be used to derive an asymptotic test as long as b0n is known or can
be estimated consistently.
The asymptotic distribution of the global test statistic, denoted by Tn,G in what
follows, is similar to that shown in Theorem 1 for Tn,L , except that the Dn are
replaced by the full interval [a, b] everywhere. That of the naive local test, denoted
Tn,N , is the same as in Theorem 1 except that n needs to be replaced by nD ,
the sample size for the observations that fall in Dn. Note that nD/n ∝ |Dn| by
assumption A1. We will denote by Vt , t = L, G, N the asymptotic variance of the
three statistics.
The asymptotic power of all three tests depends only on the ratio b1n/√Vt, which
is independent of the normalization constant dn . In order to compare the tests in
the case of a local deviation, we consider a deviation from the null model that is
localized in an interval Dn and of size cn. Such a deviation would have g(x) ≠ 0 for
x ∈ Dn and g(x) = 0 otherwise. In this case, the asymptotic bias term b1n is the same as in
Theorem 1 for all three tests. We are assuming that we can estimate the bias term b0n for each of them, and denote by T†n,t the test statistic adjusted for this bias. In the following theorem, we compare the power of the three tests, defined as

Pα,t ≡ Pr( T†n,t / √Vt > z1−α − b1n / √Vt ),
where z1−α is the 1 − α quantile of the standard normal distribution. We make the
following additional assumption:
A 5 b1n/√Vt does not converge to 0 for t = L, G, N.
Theorem 2. Under the assumptions of Theorem 1 and A5, there exists n0 such that
for all n > n0 , the following relations hold:
Pα,L / Pα,G = 1 + aLG |Dn|^(−1/2),

Pα,L / Pα,N = 1 + aLN |Dn|^(−1),

Pα,G / Pα,N = 1 + aGN |Dn|^(−1/2),

for some positive constants aLG, aLN, aGN > 0, as long as b1n ≠ 0.
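The ordering in Theorem 2 can be seen numerically by plugging illustrative values into the first-order power approximation Pα,t ≈ α + φ(z1−α) Bt used in the proof, with Bt ∝ |Dn|^(t/2) and t = 1, 2, 3 for the local, global and naive tests respectively. The constants below are arbitrary stand-ins, chosen only to exhibit the relative ordering, not values from the paper.

```python
import math
from statistics import NormalDist

def approx_power(alpha, B):
    """First-order power approximation: alpha + phi(z_{1-alpha}) * B."""
    z = NormalDist().inv_cdf(1.0 - alpha)                      # z_{1-alpha}
    phi = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)    # standard normal density
    return alpha + phi * B

alpha, c, D = 0.05, 0.5, 0.2          # illustrative constants, |D_n| = 0.2
p_local = approx_power(alpha, c * D ** 0.5)   # t = 1 (local)
p_global = approx_power(alpha, c * D ** 1.0)  # t = 2 (global)
p_naive = approx_power(alpha, c * D ** 1.5)   # t = 3 (naive)
```

Since |Dn| < 1, the exponents force p_local > p_global > p_naive, mirroring the three power ratios of the theorem.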
The theorem shows that, for sufficiently large n and for small local intervals
Dn , the local test has more power to detect a local deviation from the null hypothesis than either the global or the naive test. The global test of [ACG-M99] still
dominates the naive test, which is asymptotically the worst performer. It should be
noted that the potential “overfitting bias” of the naive test mentioned in Section
1 is excluded by assumption A2, so that the inferior performance of the naive test in
Theorem 2 will be further degraded when that assumption does not hold.
Assumption A5 excludes the situation in which the power of any of the tests
converges to its level α, which implies that the power of that test decreases with
increasing sample size. In that situation, any test with non-decreasing power will
trivially dominate a test with decreasing power, but the rates of Theorem 2 will no
longer hold. Similarly, power ratios involving tests that both have decreasing power
will converge to 1.
As noted in other nonparametric testing contexts, the asymptotic distribution
of Theorem 1 is often not sufficiently precise for constructing a practical test in
small-to-medium sample size situations. Hence, following [ACG-M99], we will instead
construct a bootstrap-based distribution function for Tn,L , from which the relevant
probabilities can be estimated.
The procedure is based on the wild bootstrap procedure of [HM93]. Using the bootstrap residuals ξi∗, a bootstrap sample under the null model is constructed, and the bootstrap test statistic Tn,L∗ is computed as in (1).

Theorem 3. Under assumptions A1–A4, the bootstrap test statistic Tn,L∗ has the same asymptotic distribution as Tn,L under H0:

Tn,L∗ →L N(b0n, VL).

Similar results can be obtained for the bootstrapped statistics Tn,G∗ and Tn,N∗.
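A minimal sketch of the bootstrap step, assuming a generic statistic function and using Rademacher (±1) multipliers for the wild-bootstrap residuals — a common simple choice; [HM93] work with a two-point golden-section distribution instead. Function and parameter names are ours.

```python
import numpy as np

def wild_bootstrap_pvalue(x, y, stat_fn, p=1, B=500, rng=None):
    """Wild-bootstrap p-value for a test statistic stat_fn(x, y).

    Residuals are taken from the null (degree-p polynomial) fit and
    multiplied by random +/-1 signs to build bootstrap samples under H0.
    """
    rng = np.random.default_rng() if rng is None else rng
    beta_hat = np.polyfit(x, y, p)
    fitted = np.polyval(beta_hat, x)
    resid = y - fitted
    t_obs = stat_fn(x, y)
    t_star = np.empty(B)
    for b in range(B):
        signs = rng.choice([-1.0, 1.0], size=len(x))
        y_star = fitted + resid * signs      # bootstrap sample under H0
        t_star[b] = stat_fn(x, y_star)
    return (1.0 + np.sum(t_star >= t_obs)) / (B + 1.0)
```

In the setting of this paper, stat_fn would be the local statistic Tn,L evaluated on the interval of interest; the returned value estimates Pr(Tn,L∗ ≥ Tn,L) under the bootstrap null distribution.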
3 Simulation Study
In this Section, we describe a simulation experiment to compare the behavior of the
three tests. We generated 1000 samples of size n = 100 from the regression
model Yi = (2xi − 1) + 0.5 exp(−50(2xi − 0.5)²) + σεi, with σ² = 0.25, a fixed and
equally-spaced design in the interval [0, 1], and errors generated from i.i.d. N (0, 1)
random variables. The wild bootstrap resampling was performed B = 500 times for
each sample. The weight function used in all three tests was taken as w(x) = 1.
The interval [0, 1] was split into six sub-intervals, [0, 0.2], [0.2, 0.4], [0.4, 0.6], [0.6, 0.8],
[0.8, 1] as well as [0.15, 0.35], where the deviation from a linear model is localized.
Figure 1 shows the regression function, a random sample from this model and the
six sub-intervals considered.
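The simulation model can be reproduced in a few lines of Python; a sketch under the stated design (function and parameter names are ours). Note that we write the bump as exp(−50(2x − 0.5)²), which places the deviation at x = 0.25, inside the interval [0.15, 0.35] where the text localizes it.

```python
import numpy as np

def generate_sample(n=100, sigma2=0.25, rng=None):
    """Draw one sample from the Section 3 simulation model:
    Y_i = (2 x_i - 1) + 0.5 exp(-50 (2 x_i - 0.5)^2) + sigma * eps_i,
    with an equally spaced fixed design on [0, 1] and eps_i ~ N(0, 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.linspace(0.0, 1.0, n)
    m = (2.0 * x - 1.0) + 0.5 * np.exp(-50.0 * (2.0 * x - 0.5) ** 2)
    y = m + np.sqrt(sigma2) * rng.standard_normal(n)
    return x, y
```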
Fig. 1. Regression function, a random sample and the six sub-intervals considered.
In this model, the parametric estimator under the null hypothesis was calculated
using the Ordinary Least Squares estimator, while the model under the alternative
hypothesis was fitted using local linear regression with an Epanechnikov kernel and
bandwidth values h = 0.05, 0.075 and 0.1. The simulated rejection probabilities
using Tn,L, Tn,G and Tn,N for significance levels α = 0.05 and 0.1, as well as the mean and standard deviation of the p-values over the 1000 trials, were computed.
When comparing the results for the tests, it is clear that the probability of (correctly) rejecting the null hypothesis is substantially higher, and the average p-value substantially lower, for the local test on the intervals [0.2, 0.4] and [0.15, 0.35] than for the global test, even though the overall deviation in the mean model is the same. The naive
test performed the worst, almost completely missing the model deviation. This is
likely due primarily to the overfitting bias described earlier, which causes a line fitted inside each of the intervals to be close to the nonparametric fit, instead of to
the overall linear fit. We can also evaluate the behavior of the local and naive tests
when the null model is correct, by considering the results in the intervals [0.6, 0.8]
and [0.8, 1]. The results show that the tests approximately achieve the correct level
of α for all three considered bandwidth values.
Finally, we evaluated the finite sample behavior of the asymptotic test of Theorem 1, by applying it to the same simulation setup as above. Under the null hypothesis, it requires explicit estimation of b0n and VL . In all the intervals considered, the
asymptotic test displayed a pronounced tendency to reject the null hypothesis. This
happened even when the null model was correct (as in the intervals [0.6, 0.8] and
[0.8, 1]). Hence, the asymptotic test appears to be severely biased, and we recommend using the bootstrap-based test for practical applications of local testing. We
repeated the simulations using the asymptotic test with other sample sizes (n = 200
and 500) and observed only slightly better behavior for this procedure as the sample size increased.
Acknowledgments
The authors wish to thank two referees for their helpful comments and suggestions. The research of M. Francisco-Fernández was partially supported by MEC
Grant MTM2005-00429 (ERDF included) and Grant PGIDIT03PXIC10505PN. The
research of J. Opsomer was supported by Personal Service Contract B9J31191 with
the U.S. Bureau of Labor Statistics.
A Proofs
Proof of Theorem 1: The proof proceeds along the lines of that of Theorem 2.1
in [ACG-M99].
Proof of Theorem 2: Note first that by Theorem 1 and a Taylor expansion, we can approximate

Pα,t = [1 − Φ(z1−α − b1n/√Vt)] (1 + o(1)) = [α + φ(z1−α) b1n/√Vt] (1 + o(1)),

where Φ(·) is the standard normal cumulative distribution function and φ(·) is its density. From Theorem 1, we also know that

Bt ≡ b1n/√Vt = O(cn² |Dn|^(t/2) h^(1/2))
with L = 1, G = 2 and N = 3, and we note that Bt > 0 for all t (by Theorem 1
and the assumption on b1n ). Hence, we can approximate the three different ratios
as
Pα,t / Pα,t′ = 1 + [φ(z1−α)(Bt − Bt′)/Bt′] / [α/Bt′ + φ(z1−α)] · (1 + o(1)), for t ≠ t′.    (2)
The results of the theorem then follow directly from assumption A5 and (2), as long
as n is large enough for the leading positive term in the approximation to dominate
the smaller order terms.
Proof of Theorem 3: The proof proceeds by imitation of the proof of Theorem 1, and is again analogous to that of Theorem 2.1 in [ACG-M99],
except that the moments under the bootstrap model require an additional approximation step.
References
[ACG-M99] Alcalá, J.T., Cristobal, J.A., González-Manteiga, W.: Goodness-of-fit
test for linear models based on local polynomials. Statistics & Probability
Letters, 42, 39–46 (1999)
[ABH89] Azzalini, A., Bowman, A.W., Härdle, W.: On the use of nonparametric
regression for model checking. Biometrika, 76, 1–12 (1989)
[CKWY88] Cox, D., Koh, E., Wahba, G., Yandell, B.S.: Testing the (parametric)
null model hypothesis in (semiparametric) partial and generalized spline
models. The Annals of Statistics, 16, 113–119 (1988)
[ES90] Eubank, R.L., Spiegelman, C.H.: Testing the goodness of fit of a linear model via nonparametric regression techniques. Journal of the American Statistical Association, 85, 387–392 (1990)
[HM93] Härdle, W., Mammen, E.: Comparing nonparametric versus parametric
regression fits. The Annals of Statistics, 21, 1926–1947 (1993)
[SG-M96] Stute, W., González-Manteiga, W.: NN goodness-of-fit tests for linear
models. Journal of Statistical Planning and Inference, 53, 75–92 (1996)