Biometrika (1986), 73, 3, pp. 625-33
Printed in Great Britain
Residual variance and residual pattern
in nonlinear regression
BY THEO GASSER
Zentralinstitut für Seelische Gesundheit, D-6800 Mannheim 1, Federal Republic of Germany
LOTHAR SROKA
Institut für Angewandte Mathematik, Universität Heidelberg, Im Neuenheimer Feld 294, D-6900 Heidelberg 1, Federal Republic of Germany
AND CHRISTINE JENNEN-STEINMETZ
Zentralinstitut für Seelische Gesundheit, D-6800 Mannheim 1, Federal Republic of Germany
SUMMARY
A nonparametric estimator of residual variance in nonlinear regression is proposed.
It is based on local linear fitting. Asymptotically the estimator has a small bias, but a
larger variance compared with the parametric estimator in linear regression. Finite sample
properties are investigated in a simulation study, including a comparison with other
nonparametric estimators. The method is also useful for spotting heteroscedasticity and
outliers in the residuals at an early stage of the data analysis. A further application is
checking the fit of parametric models. This is illustrated for longitudinal growth data.
Some key words: Heteroscedasticity; Nonlinear regression; Nonparametric estimation; Outlier; Residual
variance.
1. INTRODUCTION
Let us assume that data X(t_1), ..., X(t_n) follow a regression model with fixed design

    X(t_i) = μ(t_i) + ε_i    (i = 1, ..., n).    (1)

The regression function μ is required to be 'smooth', as specified below, and the residuals
{ε_i} are independent random variables with expectation zero.
Information about one of the two summands in (1) will be helpful to obtain information
about the other one. Whenever a valid parametric model for μ is at hand, least-squares
techniques will give an estimate of residual variance in addition to an estimate of the
parameter vector. At the onset of data analysis, there often is no valid parametric model
available and a nonparametric estimate of residual variance may be wanted for a number
of reasons. For instance, it is often required in the field of application, it is helpful in
the process of model building, and it might also be helpful for nonparametric regression
estimates. As pointed out by Silverman (1984), cross-validation for smoothing splines
works badly in some cases, by leading to interpolation rather than smoothing. This could
be spotted if information on residual variance were available. Our recent work on fitting
regression functions to longitudinal data of height growth (Stützle et al., 1980; Gasser,
Müller et al., 1984; Gasser, Kohler et al., 1984) has shown that deficient models are in
use in classical fields of application. A parametric approach will then yield a biased
2. DEFINITION AND THEORY
Pseudo-residuals ε̃_i are defined as follows. Assume model (1) with a common residual
variance σ². Pseudo-residuals {ε̃_i} are obtained by taking consecutive triples of design
points t_{i−1}, t_i, t_{i+1}, joining the two outer observations by a straight line and then computing
the difference between this straight line and the middle observation:

    ε̃_i = a_i X(t_{i−1}) + b_i X(t_{i+1}) − X(t_i)    (i = 2, ..., n−1),    (2)

where a_i = (t_{i+1} − t_i)/(t_{i+1} − t_{i−1}) and b_i = (t_i − t_{i−1})/(t_{i+1} − t_{i−1}).
Since E(ε̃_i²) = (a_i² + b_i² + 1)σ² + O(n⁻⁴) for μ twice continuously differentiable, we are led
to the following definition of an estimate of residual variance S²_ε:

    S²_ε = (n−2)⁻¹ Σ_{i=2}^{n−1} c_i² ε̃_i²,

where c_i² = (a_i² + b_i² + 1)⁻¹ for i = 2, ..., n−1.
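As a sketch of how the pseudo-residuals (2) and the estimate S²_ε can be computed (NumPy assumed; the function names are ours, not the paper's):

```python
import numpy as np

def pseudo_residuals(t, x):
    """Pseudo-residuals (2): join the outer observations of each consecutive
    triple by a straight line and take the difference at the middle point.
    Returns (eps_tilde, c^2) for i = 2, ..., n-1."""
    t, x = np.asarray(t, float), np.asarray(x, float)
    a = (t[2:] - t[1:-1]) / (t[2:] - t[:-2])    # a_i
    b = (t[1:-1] - t[:-2]) / (t[2:] - t[:-2])   # b_i
    eps = a * x[:-2] + b * x[2:] - x[1:-1]      # eps~_i
    c2 = 1.0 / (a ** 2 + b ** 2 + 1.0)          # c_i^2
    return eps, c2

def s2_eps(t, x):
    """Residual variance estimate S^2_eps = (n-2)^{-1} sum c_i^2 eps~_i^2."""
    eps, c2 = pseudo_residuals(t, x)
    return float(np.sum(c2 * eps ** 2) / eps.size)

# Exponential-decay model of Table 1, with residual variance 0.01
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)
x = 5.0 * np.exp(-5.0 * t) + rng.normal(0.0, 0.1, t.size)
print(s2_eps(t, x))   # true value is 0.01
```

Because each pseudo-residual subtracts the local straight line, any linear trend is removed exactly; only curvature of μ contributes to the bias.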
Two further proposals for obtaining residual variance nonparametrically will be mentioned briefly. Rice, in an unpublished report, suggested using

    σ̂² = {2(n−1)}⁻¹ Σ_{i=1}^{n−1} {X(t_{i+1}) − X(t_i)}²,    (3)
and J. Cuzick, in a talk given at Oberwolfach, used this as a criterion for determining
parameters within a semiparametric framework. Wahba (1978) and Silverman (1985)
proposed estimating pseudo-residuals at t_i by leaving out X(t_i) when computing a
nonparametric estimate at t_i, using an optimal bandwidth.
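Rice's difference-based estimator (3) is even simpler; a sketch (NumPy assumed), which also exposes the trend-induced bias discussed below:

```python
import numpy as np

def rice_sigma2(x):
    """First-difference estimator (3): squared successive differences,
    summed and divided by 2(n-1)."""
    x = np.asarray(x, float)
    return float(np.sum(np.diff(x) ** 2) / (2.0 * (x.size - 1)))

# On noise-free data the estimate equals the pure trend contribution,
# i.e. the bias term of (3); here for the exponential decay of Table 1.
t = np.linspace(0.0, 1.0, 25)
mu = 5.0 * np.exp(-5.0 * t)
print(rice_sigma2(mu))
```

The bias term is driven by the squared first differences of μ, so it shrinks quickly as the design becomes denser.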
Table 1 compares these estimators in terms of finite sample expectation as determined
by simulation; for details, see §3. The underlying functional model was μ(t) =
z_1 exp(−z_2 t), z_1 = 5, z_2 = 5. The bias is always positive due to an incomplete removal of
the variation of μ(t), and it is proportionally largest for small σ² and small n. For the
estimator proposed here, the bias is small in practical terms and it is much smaller than
for the two other estimators. The same pattern prevailed for some other functional models,
estimate not only of the regression function but also of residual variance. It is a further
aim of the proposed procedure to obtain information about heteroscedasticity and outliers
at an early stage of data analysis.
Our approach is based on local fitting and has grown out of work on the analysis of
growth curves (Gasser, Müller et al., 1984). It is remarkably simple and has in related
forms been considered by others; see, for instance, Breiman & Meisel (1976) and Rice
(1984). Rice proposed a slightly different estimator for the purpose of fixing the bandwidth
in kernel estimation. The problem is semiparametric, since the parameter σ² has to be
estimated in the presence of the infinite-dimensional nuisance parameter μ. In §2, some
theory is presented giving asymptotic expressions for bias and variance, as well as for
skewness and kurtosis. Consistency and asymptotic normality are derived and, for
normally distributed residuals, distributional results are obtained. The simulations in §3
check the bias and the accuracy of asymptotic approximations. In §4 we study heteroscedasticity and the possibility of spotting outliers in nonlinear regression. The last section
illustrates the usefulness of the method for applications.
Table 1. Nonparametric estimates of residual variance; mean of 400 replications

    σ²×10⁴    n      S²_ε×10⁴    σ̂²×10⁴    Leave-one-out, optimal smoothing ×10⁴
    50        25     59          590        190
    50        100    49          81         59
    100       25     108         643        267
    100       100    100         132        115
    900       25     950         1494       1406
    900       100    908         942        997

Regression function, exponential decay. Residuals, zero mean normal with variance σ²; n, sample size.
Assumption 1. There are no multiple measurements at any design point, that is,
a = t_1 < t_2 < ... < t_n = b; a = 0, b = 1 without loss of generality.

Assumption 2. We require max_i |t_i − t_{i−1}| = O(1/n).

Assumption 3. The {ε_i} are independent and identically distributed with E(ε_i) = 0,
var(ε_i) = σ² and E(ε_i⁴) < ∞.

Assumption 4. The function μ is continuous.

The case of multiple measurements will be considered at the end of this section.
Heteroscedasticity is deferred to §4. Normally distributed residuals are of particular
importance and will be dealt with in more detail. It is easy to see that the above conditions
are sufficient for asymptotic unbiasedness of S²_ε. The assumption that μ is twice continuously differentiable leads to a bias of order O(n⁻⁴), proportional to the integrated squared
second derivative of μ.
Let C denote the (n−2) × (n−2) diagonal matrix with elements C_ii = c_{i+1}, let A
be the (n−2) × n tridiagonal matrix with elements A_{i,i} = a_{i+1}, A_{i,i+1} = −1, A_{i,i+2} = b_{i+1}, and let D = AᵀC²A.
The vector of pseudo-residuals ε̃ = (ε̃_2, ..., ε̃_{n−1})ᵀ and ε = (ε_1, ..., ε_n)ᵀ are then related
by ε̃ = Aε + B, where B represents bias, and S²_ε = (n−2)⁻¹ ‖Cε̃‖². This leads to

    var(S²_ε) = σ⁴(n−2)⁻² {(m₄ − 3) Σ_i D_ii² + 2 tr(D²)} + remainder terms,    (4)

where E(ε_i³) = m₃σ³ and E(ε_i⁴) = m₄σ⁴.
The first remainder term, which involves m₃ and the bias B, disappears for symmetric residual distributions. For an
equidistant design and normally distributed residuals, the leading term becomes

    var(S²_ε) ≈ (35/9) σ⁴/n.

The increase in asymptotic variance compared with a linear parametric regression is then
94%. The penalty for not knowing μ could be reduced by local fitting over more data
points. This would, however, need further assumptions for controlling the bias, which is
the differences between the estimators often being smaller than for the example considered. A
theoretical explanation will be given later in this section.
Some assumptions are needed for obtaining asymptotic results.
kept minimal in the present set-up. For normally distributed residuals, skewness and
excess are as follows (Box, 1954):

    skewness = 2^{3/2} tr(D³)/{tr(D²)}^{3/2},    excess = 12 tr(D⁴)/{tr(D²)}².
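These trace constants can be checked numerically; the following sketch (NumPy assumed; function name ours) builds D for an equidistant design and confirms that tr(D) = n−2 by construction and that tr(D²)/(n−2) approaches 35/18 ≈ 1.94, which is the source of the 94% variance-inflation figure quoted above:

```python
import numpy as np

def D_matrix(t):
    """D = A^T C^2 A with A tridiagonal and C diagonal as in Section 2."""
    t = np.asarray(t, float)
    n = t.size
    a = (t[2:] - t[1:-1]) / (t[2:] - t[:-2])
    b = (t[1:-1] - t[:-2]) / (t[2:] - t[:-2])
    c2 = 1.0 / (a ** 2 + b ** 2 + 1.0)
    A = np.zeros((n - 2, n))
    idx = np.arange(n - 2)
    A[idx, idx], A[idx, idx + 1], A[idx, idx + 2] = a, -1.0, b
    return A.T @ (c2[:, None] * A)   # A^T C^2 A

n = 400
D = D_matrix(np.linspace(0.0, 1.0, n))
# tr(D) = n-2 exactly; tr(D^2)/(n-2) -> 35/18 for an equidistant design
print(np.trace(D) / (n - 2), np.trace(D @ D) / (n - 2))
```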
Given Assumptions 1-4, S²_ε is a strongly consistent estimate of σ². Assuming
in addition that |μ(t) − μ(s)| ≤ const |t − s|^γ for t, s ∈ [0, 1], γ > ½, we obtain the following.

THEOREM. n^{1/2}(S²_ε − σ²) → N(0, σ⁴_*) in distribution, where σ⁴_* = lim_{n→∞} n var(S²_ε) is given by the leading term of (4).

If the bias term is disregarded, the finite sample distribution of S²_ε for normally distributed residuals is that of

    σ²(n−2)⁻¹ Σ_{j=1}^{n−2} λ_j χ²_j,    (5)

where the χ²_j are independent χ²-variables with one degree of freedom and the λ_j are
the eigenvalues of D. By equating the first two moments (Box, 1954), one might treat
this approximately as a χ²-variable with h degrees of freedom times a constant g, with

    g = σ² tr(D²)/(n−2)²,    h = (n−2)²/tr(D²).
Alternatively, one might take log S²_ε as normally distributed with expectation log σ² and
variance 2 tr(D²)/(n−2)². The distribution given by (5) and the above approximations
are expected to be useful for computing confidence intervals and one-sample tests. For
misspecified regression models, the residual variance will be overestimated by the
parametric estimator as a result of bias, and confidence intervals for σ² might thus be
useful for model selection and model checking.
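A hedged sketch of such a confidence interval based on the g·χ²_h moment matching (NumPy and SciPy assumed; this is our illustration, not code from the paper):

```python
import numpy as np
from scipy.stats import chi2

def s2_conf_int(t, x, alpha=0.05):
    """Approximate (1-alpha) CI for sigma^2, treating S^2_eps as
    sigma^2 * chi^2_h / h with h = (n-2)^2 / tr(D^2) (Box, 1954)."""
    t, x = np.asarray(t, float), np.asarray(x, float)
    n = t.size
    a = (t[2:] - t[1:-1]) / (t[2:] - t[:-2])
    b = (t[1:-1] - t[:-2]) / (t[2:] - t[:-2])
    c2 = 1.0 / (a ** 2 + b ** 2 + 1.0)
    eps = a * x[:-2] + b * x[2:] - x[1:-1]
    s2 = float(np.sum(c2 * eps ** 2) / (n - 2))
    A = np.zeros((n - 2, n))
    idx = np.arange(n - 2)
    A[idx, idx], A[idx, idx + 1], A[idx, idx + 2] = a, -1.0, b
    D = A.T @ (c2[:, None] * A)
    h = (n - 2) ** 2 / np.trace(D @ D)      # effective degrees of freedom
    lo = h * s2 / chi2.ppf(1.0 - alpha / 2.0, df=h)
    hi = h * s2 / chi2.ppf(alpha / 2.0, df=h)
    return lo, s2, hi

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 200)
x = 5.0 * np.sin(2.0 * np.pi * t) + rng.normal(0.0, 0.5, t.size)
print(s2_conf_int(t, x))   # interval for sigma^2; true value here is 0.25
```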
When multiple measurements are allowed at some design points, two cases have to be
distinguished. When t_{i−1} = t_i = t_{i+1}, we simply define a_i = b_i = ½. When ties occur on one side only,
let t_{i−k} denote the nearest design point distinct from t_i and define

    a_i = (t_{i+1} − t_i)/(t_{i+1} − t_{i−k}),    b_i = (t_i − t_{i−k})/(t_{i+1} − t_{i−k}).

The arguments then carry over with minor modifications.
The estimator σ̂², given by (3), is such that

    E(σ̂²) = σ² + {2(n−1)}⁻¹ Σ_{i=1}^{n−1} {μ(t_{i+1}) − μ(t_i)}² = σ² + O(n⁻²).    (6)

The leading term of the bias of σ̂² is of lower order in n⁻¹ than that of S²_ε, and is moreover
proportional to the integrated first, not the second, squared derivative of μ. This explains
why the finite sample bias in Table 1, and in other cases too, is much larger for σ̂². The
estimator of σ² which is based on leave-one-out residuals with respect to an optimally
smoothed curve has a larger bias than S²_ε, too; see Table 1. These residuals inherit the
The proof of strong consistency is straightforward. Asymptotic normality follows from
a proof of the central limit theorem for m-dependent sequences by Orey (1958) or from
a proof of asymptotic normality for quadratic forms given by Whittle (1964). In the latter
case, the moment conditions have to be strengthened somewhat.
smoothing bias, which may be substantial (Gasser & Müller, 1984). The asymptotic
variance of σ̂² is somewhat lower than that of S²_ε. This turned out to be true in simulations
as well, at least for large sample size.
3. SIMULATION RESULTS
We have undertaken a small simulation to study primarily bias and secondly the validity
of the formula for asymptotic variance (4) and of the asymptotic distribution of S2E as
given in the theorem. The following three functional models were used:
model A,
(z, = 2-0, z2 = - 5 , z3 = 5, z4 = 5, z5 = 0-04),
model B,
fi(t) = zt sin (2irt), (zi = 5),
with an equidistant design on [0,1]. The sample size was taken as n = 25, 100. Pseudorandom normal residuals were generated from uniform variates by the polar method;
uniform variates were obtained by a two-seed congruential scheme (Marsaglia, 1972;
McLaren & Marsaglia, 1965). Three levels of residual variance, corresponding to low,
medium and high noise, according to a visual judgement, were fixed as given in Tables
1 and 2. To obtain reasonably accurate estimates for our purposes, 400 replications were
considered sufficient.
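The flavour of such a check can be reproduced with modern tools (a sketch with NumPy's generator in place of the polar/congruential scheme used in the paper):

```python
import numpy as np

def s2_eps(t, x):
    """S^2_eps from pseudo-residuals, as in Section 2."""
    t, x = np.asarray(t, float), np.asarray(x, float)
    a = (t[2:] - t[1:-1]) / (t[2:] - t[:-2])
    b = (t[1:-1] - t[:-2]) / (t[2:] - t[:-2])
    c2 = 1.0 / (a ** 2 + b ** 2 + 1.0)
    eps = a * x[:-2] + b * x[2:] - x[1:-1]
    return float(np.sum(c2 * eps ** 2) / eps.size)

rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 100)
mu = 5.0 * np.sin(2.0 * np.pi * t)   # the harmonic model of this section
sigma2 = 0.4                         # one of the noise levels of Table 2(b)
est = [s2_eps(t, mu + rng.normal(0.0, np.sqrt(sigma2), t.size))
       for _ in range(400)]
print(np.mean(est))   # mean over 400 replications; true value 0.4
```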
Table 2 gives the results for models A and C, while results for model B have been
given in Table 1. The first column gives the mean, the second the bias, standardized by
Table 2. Nonparametric estimate S²_ε of residual variance: mean, standardized bias and
empirical variance for 400 replications, and asymptotic variance

(a) Gaussian regression function

    σ²×10³    n      Ave S²_ε×10³    var(S²_ε)×10³ emp.    var(S²_ε)×10³ asymp.
    50        25     55              0.40                   0.41
    50        100    51              0.10                   0.10
    250       25     250             9.6                    10.0
    250       100    250             2.4                    2.5
    900       25     904             150                    130
    900       100    911             30                     32

(b) Harmonic regression function

    σ²×10³    n      Ave S²_ε×10³    var(S²_ε)×10³ emp.    var(S²_ε)×10³ asymp.
    80        25     90              1.0                    1.1
    80        100    81              0.23                   0.25
    400       25     408             27                     26
    400       100    401             5.6                    6.3
    1000      25     1040            180                    170
    1000      100    1000            35                     39

Residuals, zero mean normal with variance σ²; n, sample size. [The standardized-bias
column is not legible in the source scan.]
4. HETEROSCEDASTICITY AND OUTLIERS
Assume a smooth trend σ²(t) in the residual variance. From pseudo-residuals, one
can obtain an asymptotically unbiased pointwise estimate of σ²(t_i) as c_i² ε̃_i². Since its variability is intolerably high, one has to smooth these local estimates to obtain a reasonable
estimate of σ²(t), or to average over replicates, if available; see §5 for an example. In a
small simulation, model A with a parabolic trend in the residual variance was assumed
(n = 50). Based on 400 replications, the averaged pointwise residual variance is shown
in Fig. 1, both for ordinary and pseudo-residuals. Bias does not become a serious problem
even when estimating a trend in residual variance.
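A sketch of this pointwise estimate with a simple running-mean smoother (NumPy assumed; the parabolic trend below is an illustrative stand-in for the one used in the simulation):

```python
import numpy as np

def pointwise_var(t, x, window=15):
    """Raw pointwise estimates c_i^2 eps~_i^2 of sigma^2(t_i),
    smoothed by a running mean to tame their variability."""
    t, x = np.asarray(t, float), np.asarray(x, float)
    a = (t[2:] - t[1:-1]) / (t[2:] - t[:-2])
    b = (t[1:-1] - t[:-2]) / (t[2:] - t[:-2])
    c2 = 1.0 / (a ** 2 + b ** 2 + 1.0)
    eps = a * x[:-2] + b * x[2:] - x[1:-1]
    raw = c2 * eps ** 2
    smooth = np.convolve(raw, np.ones(window) / window, mode="same")
    return t[1:-1], smooth

rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 300)
sig2 = 0.1 + 0.4 * (t - 0.5) ** 2            # hypothetical parabolic trend
x = np.sin(2.0 * np.pi * t) + rng.normal(0.0, np.sqrt(sig2))
ti, v = pointwise_var(t, x)
print(v[:5])
```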
Outliers should ideally be spotted at the beginning of a data analysis. In nonlinear
regression, there is usually no adequate model at this stage and this makes it a rather
difficult task. Figure 2 illustrates that pseudo-residuals might be helpful for detecting outliers
visually and perhaps also automatically: an outlier was produced for model A (σ² = 0.9)
and it can, in fact, be spotted easily from the pseudo-residuals. By checking for pseudo-residuals falling outside, for example, ±3S_ε, one can identify potential outliers automatically. For this, it is advisable to use a robust estimate of variance, such as the properly
normalized interquartile range, which we have used in the example.
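A sketch of such an automatic screen (NumPy assumed; the index convention and the 1.349 normalizing constant for the interquartile range of a normal distribution are ours):

```python
import numpy as np

def flag_outliers(t, x, thresh=3.0):
    """Positions i whose standardized pseudo-residual c_i * eps~_i exceeds
    thresh times a robust scale (normalized interquartile range)."""
    t, x = np.asarray(t, float), np.asarray(x, float)
    a = (t[2:] - t[1:-1]) / (t[2:] - t[:-2])
    b = (t[1:-1] - t[:-2]) / (t[2:] - t[:-2])
    c2 = 1.0 / (a ** 2 + b ** 2 + 1.0)
    eps = a * x[:-2] + b * x[2:] - x[1:-1]
    z = np.sqrt(c2) * eps                 # roughly N(0, sigma^2) under the model
    q25, q75 = np.percentile(z, [25.0, 75.0])
    s_rob = (q75 - q25) / 1.349           # IQR of N(0, s^2) is about 1.349 s
    return np.flatnonzero(np.abs(z) > thresh * s_rob) + 1   # positions in t

rng = np.random.default_rng(5)
t = np.linspace(0.0, 1.0, 100)
x = np.sin(2.0 * np.pi * t) + rng.normal(0.0, 0.3, t.size)
x[45] += 3.0                              # inject a gross outlier (10 sd)
print(flag_outliers(t, x))
```

Note that a gross error at t_i also perturbs the two neighbouring pseudo-residuals with half weight, so neighbours of a flagged point may be flagged as well.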
Fig. 1. Heteroscedastic variance with parabolic trend. Solid line, simulated residual variance; dotted line, pointwise estimated residual variance. Model A, n = 50 data points. Average of 400 Monte Carlo replications.
the standard error, and the next two the empirical and the asymptotic variances. The
bias is small for medium to large residual variance, even for small n; for small
residual variance it is small for large n, from n = 50 onwards (not tabulated), and modest
for small n.
The agreement between the empirical and asymptotic variance of the estimate is satisfactory.
For model B we checked the effect of nonequidistance: the design was perturbed by
setting t_i* = t_i + U(t_i − t_{i−1}), where {t_i} is equidistant and U is uniform on the
interval (−½, +½). The bias of S²_ε was then about the same as for the equidistant design.
The adequacy of the distributional approximations was checked graphically and by applying the Kolmogorov-Smirnov test. The gχ²_h-approximation was best, and the normal
approximation to log S²_ε not much worse provided the expectation was taken up to order 1/n,
whereas the direct normal approximation was clearly worst. For small n and small σ², bias led
to some discrepancies between the empirical and approximate distributions. From the results
of Tables 1 and 2, confidence intervals or tests are not likely to be much distorted.
5. APPLICATIONS
Despite long study, modelling of human height growth from birth to adulthood is still
difficult (Gasser, Kohler et al., 1984; Gasser, Miiller et al., 1984). A parametric model
was suggested by Bock et al. (1973) and judged to provide a satisfactory fit. Preece &
Baines (1978) and El Lozy (1978), however, found this model to be seriously deficient
and Preece and Baines suggested better fitting parametric functions. In our above
mentioned papers, it was shown that the Preece & Baines models, too, have some
deficiencies, first qualitatively, by lacking a mid-growth spurt at about 7 years of age,
which is present in the data, and secondly quantitatively, when characterizing the pubertal
spurt. The question, then, arises whether the techniques just discussed might be helpful
in such a situation. Relying on the longitudinal data of 45 boys and 45 girls previously
analysed, the following specific problems were considered.
(i) How large is the residual variance in height growth and is there heteroscedasticity?
Note that model bias might mimic heteroscedasticity in a parametric analysis.
(ii) Are there sex differences in residual variance? Residual variance with respect to
parametric models was consistently higher for boys, but this might be due to a
larger bias of the model.
(iii) Is it possible to spot deficiencies in the Preece-Baines model using pseudo-residuals? In view of previous work by Preece & Baines (1978) and by Hauspie
et al. (1980), this is not a trivial matter.
To study average residual variance across age for a sample of N boys or girls, the
following estimator, correcting for bias, was used: the squared, weighted pseudo-residuals
c_i² ε̃_k(t_i)² averaged over the N subjects at each age t_i, where ε̃_k(t_i) is the
pseudo-residual for subject k at age t_i.
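In essence (a sketch with NumPy; the additional bias correction used in the paper is omitted here, and the growth curve and age grid are hypothetical):

```python
import numpy as np

def pooled_pointwise_var(t, X):
    """Average the squared, weighted pseudo-residuals c_i^2 eps~_k(t_i)^2
    over the N subjects (rows of X) at each common age t_i."""
    t, X = np.asarray(t, float), np.asarray(X, float)   # X has shape (N, n)
    a = (t[2:] - t[1:-1]) / (t[2:] - t[:-2])
    b = (t[1:-1] - t[:-2]) / (t[2:] - t[:-2])
    c2 = 1.0 / (a ** 2 + b ** 2 + 1.0)
    eps = a * X[:, :-2] + b * X[:, 2:] - X[:, 1:-1]
    return t[1:-1], np.mean(c2 * eps ** 2, axis=0)

rng = np.random.default_rng(6)
t = np.linspace(0.0, 20.0, 101)                  # hypothetical common age grid
mu = 170.0 / (1.0 + np.exp(-0.5 * (t - 12.0)))   # stand-in growth curve (cm)
X = mu + rng.normal(0.0, 0.5, (45, t.size))      # 45 subjects, sigma^2 = 0.25
ti, v = pooled_pointwise_var(t, X)
print(float(v.mean()))   # close to the true residual variance 0.25
```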
Figure 3 shows the results comparing boys and girls. For both, heteroscedasticity is
present, with larger variances below 4 years and smaller ones towards the end of
the growth process. The pattern and the size are similar for the two sexes, except around
14-15 years, where boys have much larger variance. Our interpretation is that this is a
Fig. 2(a). One simulation of the noisy regression function for model A (σ² = 0.9); outlier produced at
t = 0.45 (dotted line). (b) Pseudo-residuals; horizontal dotted lines at ±3 standard deviations, estimated
robustly.
true sex difference and that it comes from a later pubertal spurt, and thus a later stopping
of growth, for boys. Evidently, variability has to go down as growth comes to a stop.
Our expectation that pseudo-residuals might be useful for model diagnostics in nonlinear regression was verified for this example. Such diagnostics are, by other means,
more difficult whenever heteroscedasticity is suspected.
Figure 4 shows pointwise residual variances for boys, obtained from pseudo-residuals
and from fitting the Preece-Baines model. Large discrepancies arise below 5 years of
age; Preece & Baines (1978) validated their model from 4 years onwards. The squared
parametric residuals show a further large deviation peaking at age 7. This is the so-called
mid-growth spurt, not contained in the Preece-Baines model; this small phenomenon
was verified by Gasser, Müller et al. (1984) and quantified for the first time by Gasser
et al. (1985). The nonparametric residual variance and the parametric variance were also
compared by the Wilcoxon one-sample test, separately for boys and girls from 4 years
onwards. For both sexes, the parametric residual variance was significantly higher,
P < 10⁻⁸, indicating serious lack of fit of the parametric model. For diagnostic purposes,
tests were also computed for each age separately. The results again pointed out the
importance of incorporating the mid-growth spurt into a functional model. The residual
analysis undertaken by Preece & Baines (1978) and by Hauspie et al. (1980) did not lead
to detailed insight about model deficiencies.
Fig. 4. Comparison of parametric residual variance (Preece-Baines model, dotted line) with the nonparametric estimate (solid line). Longitudinal height, n = 45 boys. Pointwise estimate by age, averaged across subjects.
Fig. 3. Residual variance of height growth for n = 45 boys (solid line) and n = 45 girls (dotted line); pointwise estimate by age, averaged across subjects and corrected for bias.
The tools suggested are computationally very cheap and directly interpretable. They
provide useful information at an early stage of the analysis of curve data which is
otherwise not easily available. The relatively small bias in residual variance, arising when
the true variance is small, should not be too problematic, since this is a favourable
situation for data analysis anyhow. The estimator suggested also leads to a nonparametric
measure of determinacy, 1 − S²_ε/S²_X, where S²_X is the variance of the measurements; this
might often be useful, but has not been studied here.
ACKNOWLEDGMENT
This work was performed as part of the research programme of the Sonderforschungsbereich 123 (project B1) at the University of Heidelberg and was made possible by
financial support from the Deutsche Forschungsgemeinschaft.
REFERENCES

BOCK, R. D., WAINER, H., PETERSEN, A., THISSEN, D., MURRAY, J. & ROCHE, A. (1973). A parameterization for individual human growth curves. Hum. Biol. 45, 63-80.
BOX, G. E. P. (1954). Some theorems on quadratic forms applied in the study of analysis of variance problems, I: Effect of inequality of variance in the one-way classification. Ann. Math. Statist. 25, 290-302.
BREIMAN, L. & MEISEL, W. S. (1976). General estimates of the intrinsic variability of data in nonlinear regression models. J. Am. Statist. Assoc. 71, 301-7.
EL LOZY, M. (1978). A critical review of the double and triple logistic growth curves. Ann. Hum. Biol. 5, 389-94.
GASSER, T., KOHLER, W., MÜLLER, H. G., KNEIP, A., LARGO, R., MOLINARI, L. & PRADER, A. (1984). Velocity and acceleration of height growth using kernel estimation. Ann. Hum. Biol. 11, 397-411.
GASSER, T. & MÜLLER, H. G. (1984). Estimating regression functions and their derivatives by the kernel method. Scand. J. Statist. 11, 171-85.
GASSER, T., MÜLLER, H. G., KOHLER, W., MOLINARI, L. & PRADER, A. (1984). Nonparametric regression analysis of growth curves. Ann. Statist. 12, 210-29.
GASSER, T., MÜLLER, H. G., KOHLER, W., PRADER, A., LARGO, R. & MOLINARI, L. (1985). An analysis of the mid-growth and adolescent spurts of height based on acceleration. Ann. Hum. Biol. 12, 129-48.
HAUSPIE, R. C., WACHHOLDER, A., BARON, G., CANTRAINE, F., SUSANNE, C. & GRAFFAR, M. (1980). A comparative study of the fit of four different functions to longitudinal data of growth in height of Belgian girls. Ann. Hum. Biol. 7, 347-58.
MARSAGLIA, G. (1972). The structure of linear congruential sequences. In Applications of Number Theory to Numerical Analysis, Ed. S. K. Zaremba, pp. 241-85. New York: Academic Press.
MCLAREN, M. D. & MARSAGLIA, G. (1965). Uniform random number generators. J. Assoc. Comput. Mach. 12, 83-9.
OREY, S. (1958). A central limit theorem for m-dependent random variables. Duke Math. J. 25, 543-6.
PREECE, M. A. & BAINES, M. J. (1978). A new family of mathematical models describing the human growth curve. Ann. Hum. Biol. 5, 1-24.
RICE, J. (1984). Bandwidth choice for nonparametric regression. Ann. Statist. 12, 1215-30.
SILVERMAN, B. W. (1984). A fast and efficient cross-validation method for smoothing parameter choice in spline regression. J. Am. Statist. Assoc. 79, 584-9.
SILVERMAN, B. W. (1985). Some aspects of the spline smoothing approach to non-parametric regression curve fitting. J. R. Statist. Soc. B 47, 1-52.
STÜTZLE, W., GASSER, T., MOLINARI, L., PRADER, A. & HUBER, P. J. (1980). Shape-invariant modeling of human growth. Ann. Hum. Biol. 7, 507-28.
WAHBA, G. (1978). Improper priors, spline smoothing, and the problem of guarding against model errors in regression. J. R. Statist. Soc. B 40, 364-72.
WHITTLE, P. (1964). On the convergence to normality of quadratic forms in independent variables. Theory Prob. Applic. 9, 103-8.
[Received May 1985. Revised February 1986]