The Statistician (1999), 48, Part 4, pp. 495-506
A degrees-of-freedom approximation for a t-statistic
with heterogeneous variance
Stuart R. Lipsitz and Joseph G. Ibrahim
Harvard School of Public Health and Dana-Farber Cancer Institute, Boston, USA
and Michael Parzen
University of Chicago, USA
[Received November 1998]

Address for correspondence: Joseph G. Ibrahim, Department of Biostatistics, Harvard School of Public Health and Dana-Farber Cancer Institute, 44 Binney Street, Boston, MA 02115, USA.
E-mail: ibrahim@jimmy.harvard.edu
Summary. Ordinary least squares is a commonly used method for obtaining parameter estimates for a linear model. When the variance is heterogeneous, the ordinary least squares estimate is unbiased, but the ordinary least squares variance estimate may be biased. Various jackknife estimators of variance have been shown to be consistent for the asymptotic variance of the ordinary least squares estimator of the parameters. In small samples, simulations have shown that the coverage probabilities of confidence intervals using the jackknife variance estimators are often low. We propose a finite sample degrees-of-freedom approximation to use with the t-distribution, which, in simulations, appears to raise the coverage probabilities closer to the nominal level. We also discuss an 'estimated generalized least squares estimator', in which we model and estimate the variance, and use the estimated variance in a weighted regression. Even when the variance is modelled correctly, the coverage can be low in small samples, and, in terms of the coverage probability, one appears better off using ordinary least squares with a robust variance and our degrees-of-freedom approximation. An example with real data is given to demonstrate how potentially misleading conclusions may arise when the degrees-of-freedom approximation is not used.

Keywords: Homogeneous variance; Linear model; Ordinary least squares
1. Introduction
Linear regression with ordinary least squares is used extensively in statistics. Often, although assumed homogeneous, the underlying variance of the response is heterogeneous. When using ordinary least squares, if the linear model is correctly specified, and the error variance of the response is heterogeneous, the parameter estimates are unbiased and consistent (under weak regularity conditions). However, the ordinary least squares variance estimators (assuming homogeneous error variance) are usually biased and inconsistent. Alternatively, various jackknife estimators of variance have been shown to be consistent for the asymptotic variance of the ordinary least squares estimator of the parameters (Hinkley, 1977; MacKinnon and White, 1985; Wu, 1986).

Heterogeneous variance occurs, for example, when the variance of the response increases or decreases when a covariate value increases (this covariate may or may not be in the model for the response); it also occurs when the mean and variance are linked, such as a linear model for binary
data. In some situations, a transformation of the response and covariates can be used to formulate a regression model with constant variance. Unfortunately, the interpretation of the model is often ruined with such a transformation. Therefore, methods that preserve the linearity of the model for the outcome as a function of covariates, but also model the variance as a function of covariates, have been developed. Methods for obtaining an appropriate variance model as a function of covariates have been developed by Cook and Weisberg (1983). After finding the appropriate variance model and estimating the variance, weighted least squares can then be used to estimate the regression parameters of the mean, in which an individual's observation is weighted by the inverse of the estimated variance. Unfortunately, when the sample size is small, even if the variance is modelled correctly, estimates of the variance parameters can be highly unstable, leading to confidence intervals for the regression parameters that have low coverage. Thus, because of the problems that can occur when estimating the variance in small samples, we propose the use of ordinary least squares estimation with a robust variance, and a degrees-of-freedom correction. The proposed degrees-of-freedom correction gives better confidence interval coverage than do parametric models for the variance function. In addition, the degrees-of-freedom approximation and the parametric variance function methods give similar coverages for large samples.
In an excellent review of methods to use with heterogeneous variance, Wu (1986) posed the model

$$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad (1)$$

where $Y_i$ is the response, $x_i$ is a fixed covariate and the $\epsilon_i$ are independently distributed as $N(0, 0.5 x_i)$. To explore finite sample properties of the jackknife, Wu fixed the sample size at 12, with the 12 covariate values equal to 1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 6, 7, 8 and 10. In these small samples, his simulations found that the coverage probabilities of confidence intervals using the jackknife variance estimators were often low. In Section 3 we discuss a finite sample degrees-of-freedom approximation to use with the t-distribution, which, in simulations, appears to raise the coverage probabilities closer to the nominal level. Section 2 discusses the notation and models, and reviews the jackknife variance estimator, which gives consistent variance estimators when the variance is heterogeneous. Section 3 discusses the degrees-of-freedom approximation. Section 4 discusses a generalized least squares method for estimating the heterogeneous variances, and then using these to re-estimate the regression parameters. Section 5 presents simulation results, comparing the ordinary least squares estimate with robust variance and the degrees-of-freedom correction with the estimated generalized least squares estimate.
To illustrate the degrees-of-freedom correction, in Section 6, we consider the gasoline vapour data set of Cook and Weisberg (1982). This data set has been analysed in detail by Cook and Weisberg (1982, 1983) and Weisberg (1985) with respect to the issue of non-constant variance. The data set contains information on the amount of gasoline vapour given off into the atmosphere when filling a tank, under various temperature and pressure conditions. In particular, the data set consists of four predictors $x_1, \ldots, x_4$ and a response variable $Y$, where $Y$ denotes the amount of gasoline vapour given off, $x_1$ is the initial temperature of the tank, $x_2$ is the temperature of the gasoline dispensed, $x_3$ is the initial vapour pressure in the tank and $x_4$ is the vapour pressure of the gasoline dispensed. The data set contains 32 observations. The linear regression model for these data is given by

$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \epsilon_i.$$

Cook and Weisberg (1983) used these data to develop diagnostic tools for checking whether the variance function has a particular parametric form. In particular, through their proposed score
statistic, they found that the error variance is a function of $(x_1, x_4)$. Our main aim here is to demonstrate the potential effect of the degrees-of-freedom approximation on the t-tests when using the jackknife variance estimator.
2. Notation and review
We have a random sample of $n$ individuals, in which the $i$th individual has response $Y_i$ and a $P \times 1$ covariate vector $x_i = (x_{i1}, \ldots, x_{iP})'$. Let $Y = (Y_1, \ldots, Y_n)'$ be the $n \times 1$ vector of responses and $X = (x_1, \ldots, x_n)'$ be the $n \times P$ design matrix. The response $Y_i$ is assumed to be normal with first two moments

$$E(Y_i) = x_i'\beta, \qquad \mathrm{var}(Y_i) = \sigma_i^2.$$
If $\sigma_i^2$ is unknown or is difficult to model, an attractive method for estimating $\beta$ is ordinary least squares. The ordinary least squares estimator of $\beta$, which is unbiased, is

$$\hat\beta_{\mathrm{OLS}} = (X'X)^{-1}X'Y = \left(\sum_{i=1}^n x_i x_i'\right)^{-1} \sum_{i=1}^n x_i y_i.$$
However,

$$\mathrm{var}(\hat\beta_{\mathrm{OLS}}) = (X'X)^{-1} X'\,\mathrm{var}(Y)\,X (X'X)^{-1} = \left(\sum_{i=1}^n x_i x_i'\right)^{-1} \left(\sum_{i=1}^n x_i x_i' \sigma_i^2\right) \left(\sum_{i=1}^n x_i x_i'\right)^{-1}. \qquad (2)$$
The ordinary least squares estimator $\hat\beta_{\mathrm{OLS}}$ is consistent for $\beta$, and $n^{1/2}(\hat\beta_{\mathrm{OLS}} - \beta)$ is asymptotically multivariate normal with mean vector 0 and covariance matrix given by $n$ times expression (2). In equation (2), $\mathrm{var}(\hat\beta_{\mathrm{OLS}}) = \sigma^2 (X'X)^{-1}$ if $\sigma_i^2 = \sigma^2$ for all $i$. If $\sigma_i^2 \neq \sigma^2$ for some $i$, then the usual ordinary least squares variance estimate may be biased.
Several researchers have shown that equation (2) can be consistently estimated by various jackknife variance estimators of the form

$$(X'X)^{-1} X'\,\mathrm{diag}\!\left(\frac{e_i^2}{w_i}\right) X (X'X)^{-1} = \left(\sum_{i=1}^n x_i x_i'\right)^{-1} \left(\sum_{i=1}^n x_i x_i' \frac{e_i^2}{w_i}\right) \left(\sum_{i=1}^n x_i x_i'\right)^{-1}, \qquad (3)$$
where $e_i = Y_i - x_i'\hat\beta_{\mathrm{OLS}}$ is the residual, $w_i$ is a known positive weight that converges to 1 as $n \to \infty$ and $\mathrm{diag}(a_i)$ is a diagonal matrix with $(a_1, \ldots, a_n)$ on the main diagonal. In particular, White (1980) let $w_i = 1$ for all $i$, and Horn et al. (1975) let $w_i = 1 - h_i$, where $h_i = x_i'(X'X)^{-1} x_i$ is the leverage. Horn et al. (1975) justified their weights by noting that, if $\sigma_i^2 = \sigma^2$, then $E(e_i^2) = \sigma^2 (1 - h_i)$ and their estimator of $\mathrm{var}(\hat\beta)$ is unbiased. Also, dividing $e_i^2$ by $1 - h_i$ tends to inflate the squared residual to approximate $\sigma_i^2$ more closely. When the variance is heterogeneous, simulations have shown (MacKinnon and White, 1985) that setting $w_i = 1 - h_i$ has better finite sample properties than setting $w_i = 1$. If we let $\hat\beta_p$ be the $p$th element of $\hat\beta_{\mathrm{OLS}}$ and $\widehat{\mathrm{var}}(\hat\beta_p)$ be the $p$th diagonal element of expression (3) with $w_i = 1 - h_i$, then, to test $H_0: \beta_p = 0$, Wu (1986) assumed that

$$T = \frac{\hat\beta_p - \beta_p}{\sqrt{\widehat{\mathrm{var}}(\hat\beta_p)}} \qquad (4)$$

follows a t-distribution with $n - P$ degrees of freedom. However, Wu's (1986) simulations with small $n$ found that the coverage probabilities using this t-approximation were often low. In this paper, we propose to use a t-statistic as in equation (4), but with a modified degrees of freedom derived from the Satterthwaite (1946) approximation.
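To make the estimator concrete, the following is a minimal sketch in Python of expression (3) with the Horn et al. (1975) weights $w_i = 1 - h_i$ (numpy is assumed; the function name and the example design, which copies Wu's covariate values with illustrative coefficient values, are our own):

```python
import numpy as np

def robust_variance(X, y):
    """OLS estimate and the jackknife-type variance estimator (3)
    with w_i = 1 - h_i (Horn et al., 1975)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                        # OLS estimate of beta
    e = y - X @ beta                                # residuals
    h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)     # leverages h_i = x_i'(X'X)^{-1} x_i
    meat = X.T @ (X * (e**2 / (1.0 - h))[:, None])  # sum of x_i x_i' e_i^2 / (1 - h_i)
    return beta, XtX_inv @ meat @ XtX_inv

# Wu's (1986) 12-point design; the intercept and slope used here are illustrative
x = np.array([1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 6, 7, 8, 10.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 0.5 * x + np.random.default_rng(0).normal(0, np.sqrt(0.5 * x))
beta_hat, V = robust_variance(X, y)
se = np.sqrt(np.diag(V))                            # robust standard errors
```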
3. The degrees-of-freedom approximation
To use the Satterthwaite approximation, we rewrite equation (4) as

$$T = \frac{\hat\beta_p - \beta_p}{\sqrt{\widehat{\mathrm{var}}(\hat\beta_p)}} = \frac{Z_p}{\sqrt{Q_p / f_p}}, \qquad (5)$$

where

$$Z_p = \frac{\hat\beta_p - \beta_p}{\sqrt{E\{\widehat{\mathrm{var}}(\hat\beta_p)\}}}$$

is approximately a normal (0, 1) random variable and

$$Q_p = \frac{f_p\,\widehat{\mathrm{var}}(\hat\beta_p)}{E\{\widehat{\mathrm{var}}(\hat\beta_p)\}} \qquad (6)$$

is an approximate $\chi^2$ random variable with $f_p$ degrees of freedom. Here, we also assume that $\widehat{\mathrm{var}}(\hat\beta_p)$ is approximately unbiased so that $E\{\widehat{\mathrm{var}}(\hat\beta_p)\} = \mathrm{var}(\hat\beta_p)$. In the Satterthwaite approximation, the degrees of freedom $f_p$ are chosen so that $Q_p$ has the same first two moments as a $\chi^2$ random variable, i.e. so that the variance of $Q_p$ is twice its mean. Satterthwaite (1946) showed that the appropriate $f_p$ is

$$f_p = \frac{2\,[E\{\widehat{\mathrm{var}}(\hat\beta_p)\}]^2}{\mathrm{var}\{\widehat{\mathrm{var}}(\hat\beta_p)\}}. \qquad (7)$$

If we substitute $f_p$ in equation (6), it is easy to see that the variance of $Q_p$ is twice its mean. Then, we shall assume that equation (5), and equivalently equation (4), follows a t-distribution with $f_p$ degrees of freedom. We note here that $f_p$ can be different for the different $\beta_p$.
Thus, to use the Satterthwaite approximation, we need to estimate $E\{\widehat{\mathrm{var}}(\hat\beta_p)\}$ and $\mathrm{var}\{\widehat{\mathrm{var}}(\hat\beta_p)\}$ in equation (7). An obvious estimate of $E\{\widehat{\mathrm{var}}(\hat\beta_p)\}$ is $\widehat{\mathrm{var}}(\hat\beta_p)$. Thus, we now only need an estimate of $\mathrm{var}\{\widehat{\mathrm{var}}(\hat\beta_p)\}$ to estimate $f_p$ in equation (7). To estimate $\mathrm{var}\{\widehat{\mathrm{var}}(\hat\beta_p)\}$, we first rewrite $\hat\beta_p$ and $\widehat{\mathrm{var}}(\hat\beta_p)$ in simpler form. We write

$$\hat\beta_p = c_p Y = \sum_{i=1}^n c_{pi} y_i,$$

where $c_p = (c_{p1}, \ldots, c_{pn})$ is the $p$th row of $(X'X)^{-1} X'$. Then,

$$\mathrm{var}(\hat\beta_p) = \sum_{i=1}^n c_{pi}^2 \sigma_i^2,$$

which we estimate by

$$\widehat{\mathrm{var}}(\hat\beta_p) = \sum_{i=1}^n \frac{c_{pi}^2 e_i^2}{1 - h_i} = \sum_{i=1}^n \left(\frac{c_{pi} e_i}{\sqrt{1 - h_i}}\right)^2. \qquad (8)$$

If we let $W = \mathrm{diag}\{(1 - h_i)^{-1/2}\}$, i.e. a diagonal matrix with elements $(1 - h_i)^{-1/2}$ on the diagonal, and $e = (e_1, \ldots, e_n)'$, we can rewrite equation (8) as

$$\widehat{\mathrm{var}}(\hat\beta_p) = (c_p W e)'(c_p W e) = e'(W c_p' c_p W) e = e' A_p e, \qquad (9)$$

where $A_p = W c_p' c_p W$. Equation (9) is just a quadratic form in $e$.
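Equation (9) can be checked numerically. Note that, for the identity with equation (8) to hold, $c_p' c_p$ must be read as the diagonal matrix $\mathrm{diag}(c_{p1}^2, \ldots, c_{pn}^2)$, rather than as an outer product, since an outer-product form would introduce cross terms $c_{pi} c_{pj} e_i e_j$. A small sketch of the check (our own, in Python with numpy):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 12, 1                               # p indexes the slope coefficient
X = np.column_stack([np.ones(n), rng.uniform(1, 10, n)])
y = rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
H = X @ XtX_inv @ X.T                      # hat matrix
h = np.diag(H)                             # leverages h_i
e = y - H @ y                              # OLS residuals
c_p = (XtX_inv @ X.T)[p]                   # pth row of (X'X)^{-1} X'

W = np.diag(1.0 / np.sqrt(1.0 - h))
A_p = W @ np.diag(c_p**2) @ W              # diagonal reading of A_p = W c_p' c_p W

sum_form = np.sum(c_p**2 * e**2 / (1.0 - h))   # equation (8)
quad_form = e @ A_p @ e                        # equation (9)
assert np.isclose(sum_form, quad_form)
```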
Thus, to obtain $\mathrm{var}\{\widehat{\mathrm{var}}(\hat\beta_p)\}$, we can use multivariate normal theory results (Searle, 1982) to obtain the variance of the quadratic form $e' A_p e$,

$$\mathrm{var}\{\widehat{\mathrm{var}}(\hat\beta_p)\} = \mathrm{var}(e' A_p e) = 2\,\mathrm{tr}[\{A_p\,\mathrm{var}(e)\}^2] + 4\,E(e')\,A_p\,\mathrm{var}(e)\,A_p\,E(e), \qquad (10)$$

where $\mathrm{tr}[\cdot]$ denotes the trace of a matrix. Then, to obtain the variance of $\widehat{\mathrm{var}}(\hat\beta_p)$, we compute the mean and variance of $e$ and substitute them in equation (10). Using results from linear models, it is easily shown that

$$E(e) = E(Y - X\hat\beta_{\mathrm{OLS}}) = 0$$

and

$$\mathrm{var}(e) = (I_n - H)\,\mathrm{diag}(\sigma_i^2)\,(I_n - H),$$

where $I_n$ is an $n \times n$ identity matrix and $H = X(X'X)^{-1}X'$ is the hat matrix. Plugging these results into equation (10), we obtain

$$\mathrm{var}\{\widehat{\mathrm{var}}(\hat\beta_p)\} = 2\,\mathrm{tr}[\{A_p (I_n - H)\,\mathrm{diag}(\sigma_i^2)\,(I_n - H)\}^2]. \qquad (11)$$
After a little algebra, it can be shown that equation (11) can be rewritten as

$$\mathrm{var}\{\widehat{\mathrm{var}}(\hat\beta_p)\} = 2\,\mathrm{tr}(\{(I_n - H) A_p (I_n - H)\}[\{(I_n - H) A_p (I_n - H)\}\,\#\,\Sigma]), \qquad (12)$$

where the symbol $\#$ represents elementwise multiplication (the $ij$th element of the new matrix is the product of the $ij$th elements of the matrices that are being 'multiplied'). Also, $\Sigma$ has $i$th diagonal elements equal to $\sigma_i^4$ and $ij$th off-diagonal elements equal to $\sigma_i^2 \sigma_j^2$. Thus, we need estimates of $\sigma_i^4$ and $\sigma_i^2 \sigma_j^2$. We propose 'almost unbiased' estimates of these two quantities by noting that, if $\sigma_i^2 = \sigma^2$, then

$$E(e_i^4) = 3\sigma^4 (1 - h_i)^2$$

and

$$E(e_i^2 e_j^2) = \mathrm{var}(Y_i)\,\mathrm{var}(Y_j)\{2 h_{ij}^2 + (1 - h_i)(1 - h_j)\},$$

where $h_{ij}$ is the $ij$th element of the hat matrix $H$. Thus, we propose to estimate $\Sigma$ with $\hat\Sigma$, which has $i$th diagonal element equal to

$$\hat\sigma_i^4 = \frac{e_i^4}{3(1 - h_i)^2} \qquad (13)$$

and $ij$th off-diagonal element equal to

$$\hat\sigma_i^2 \hat\sigma_j^2 = \frac{e_i^2 e_j^2}{2 h_{ij}^2 + (1 - h_i)(1 - h_j)}. \qquad (14)$$

As $n \to \infty$, $h_{ij} \to 0$ and $h_i \to 0$, and equations (13) and (14) are asymptotically unbiased for $\sigma_i^4$ and $\sigma_i^2 \sigma_j^2$ respectively.
Thus, we propose to estimate the degrees of freedom from the Satterthwaite approximation by

$$\hat f_p = \frac{2\,\widehat{\mathrm{var}}(\hat\beta_p)^2}{2\,\mathrm{tr}(\{(I_n - H) A_p (I_n - H)\}[\{(I_n - H) A_p (I_n - H)\}\,\#\,\hat\Sigma])} = \frac{\widehat{\mathrm{var}}(\hat\beta_p)^2}{\mathrm{tr}(\{(I_n - H) A_p (I_n - H)\}[\{(I_n - H) A_p (I_n - H)\}\,\#\,\hat\Sigma])}. \qquad (15)$$
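Gathering equations (8), (9) and (12)-(15), the estimated degrees of freedom can be computed in a few lines. The following is a minimal sketch in Python (numpy assumed; the function name is ours, and $A_p$ again uses the diagonal reading of equation (9)):

```python
import numpy as np

def satterthwaite_df(X, y, p):
    """Estimated degrees of freedom f_p-hat of equation (15) for the pth
    coefficient, based on the robust variance (3) with w_i = 1 - h_i."""
    n = X.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    H = X @ XtX_inv @ X.T                        # hat matrix
    h = np.diag(H)                               # leverages h_i
    e = y - H @ y                                # OLS residuals
    c_p = (XtX_inv @ X.T)[p]                     # pth row of (X'X)^{-1} X'

    A_p = np.diag(c_p**2 / (1.0 - h))            # A_p = W diag(c_p)^2 W, equation (9)
    var_hat = e @ A_p @ e                        # robust variance, equations (8)-(9)

    # Sigma-hat: off-diagonal elements from equation (14) ...
    Sigma = np.outer(e**2, e**2) / (2.0 * H**2 + np.outer(1.0 - h, 1.0 - h))
    # ... and diagonal elements from equation (13)
    np.fill_diagonal(Sigma, e**4 / (3.0 * (1.0 - h)**2))

    M = (np.eye(n) - H) @ A_p @ (np.eye(n) - H)  # (I_n - H) A_p (I_n - H)
    return var_hat**2 / np.trace(M @ (M * Sigma))  # equation (15); * is elementwise (#)
```

On small designs such as Wu's with $n = 12$, $\hat f_p$ is typically well below $n - P$; compare the average degrees of freedom in Tables 1 and 2 below.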
4. Generalized least squares
If $\mathrm{var}(Y_i) = \sigma_i^2$ is known, then the optimal linear estimation procedure is to estimate $\beta$ via generalized least squares, i.e.

$$\hat\beta_{\mathrm{GLS}} = \left(\sum_{i=1}^n \frac{1}{\sigma_i^2} x_i x_i'\right)^{-1} \sum_{i=1}^n \frac{1}{\sigma_i^2} x_i y_i, \qquad (16)$$

which is unbiased. Furthermore (Mardia et al., 1979), $\hat\beta_{\mathrm{GLS}}$ is consistent for $\beta$ and $n^{1/2}(\hat\beta_{\mathrm{GLS}} - \beta)$ is asymptotically multivariate normal with mean vector 0 and covariance matrix given by $n$ times

$$\mathrm{var}(\hat\beta_{\mathrm{GLS}}) = \left(\sum_{i=1}^n \frac{1}{\sigma_i^2} x_i x_i'\right)^{-1}. \qquad (17)$$

Since this generalized least squares problem can be transformed to ordinary least squares by dividing $Y_i$ and $x_i$ by $\sigma_i$,

$$T = \frac{\hat\beta_p - \beta_p}{\sqrt{\widehat{\mathrm{var}}(\hat\beta_p)}} \qquad (18)$$

follows a t-distribution with $n - P$ degrees of freedom under the null hypothesis $H_0: \beta_p = 0$.
Unfortunately, we do not know $\sigma_i^2$. An estimated generalized least squares estimator of $\beta$, say $\hat\beta_{\mathrm{EGLS}}$, obtained by replacing $\sigma_i^2$ in equation (16) with a consistent estimator, say $\hat\sigma_i^2$, will have the same asymptotic properties as $\hat\beta_{\mathrm{GLS}}$. The variance as a function of covariates is often modelled as

$$\sigma_i^2 = g(u_i'\alpha), \qquad (19)$$

where $g(\cdot)$ is the identity function $g(u_i'\alpha) = u_i'\alpha$ or the exponential function $g(u_i'\alpha) = \exp(u_i'\alpha)$, and $u_i$ is some function of the covariates (often $u_i$ equals $x_i$, but this is not necessary).

By modelling the variance structure as in equation (19), we hope to obtain an estimator which is more efficient than ordinary least squares. The mean of the asymptotic distribution of $e_i^2 = (Y_i - x_i'\hat\beta_{\mathrm{OLS}})^2$ is $\sigma_i^2$, i.e. $e_i^2 = \sigma_i^2 + r_i$ (with $r_i$ having asymptotic mean 0). This suggests a set of estimating equations for $\alpha$ of the form

$$\sum_{i=1}^n u_i\{e_i^2 - g(u_i'\alpha)\} = 0. \qquad (20)$$
If $g(\cdot)$ is the linear function, then the solution to equations (20) is the ordinary least squares estimate, i.e.

$$\hat\alpha = \left(\sum_{i=1}^n u_i u_i'\right)^{-1} \sum_{i=1}^n u_i e_i^2.$$

If $g(\cdot)$ is the exponential function, then the solution to equations (20) can be obtained by using any generalized linear models program (such as SAS procedure Genmod (SAS Institute, 1993)) with Poisson error structure and a log-link. Then $\beta$ is re-estimated by equation (16) with $\sigma_i^2$ replaced by $g(u_i'\hat\alpha)$. This is a three-stage technique in which stage 1 is ordinary least squares for $\beta$, stage 2 is ordinary least squares or a generalized linear model for $\alpha$ and stage 3 is estimated generalized least squares for $\beta$ using $\hat\sigma_i$ from stage 2. In our analyses, we iterate among the three stages until a convergence criterion is met for $\hat\beta_{\mathrm{EGLS}}$ and $\hat\alpha$. The three-stage estimator and the fully iterated estimator have the same asymptotic properties.
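A minimal sketch of the iterated three-stage procedure for the linear variance function $g(u_i'\alpha) = u_i'\alpha$ follows (Python with numpy assumed; the function name, the convergence rule and the flooring of the fitted variances, which the paper does not discuss, are our own):

```python
import numpy as np

def egls(X, y, U, tol=1e-8, max_iter=100):
    """Iterated three-stage estimated generalized least squares with
    the linear variance model sigma_i^2 = u_i' alpha."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # stage 1: OLS for beta
    for _ in range(max_iter):
        e2 = (y - X @ beta) ** 2
        alpha = np.linalg.lstsq(U, e2, rcond=None)[0]    # stage 2: OLS of e_i^2 on u_i
        s2 = np.maximum(U @ alpha, 1e-8)                 # fitted variances, floored
                                                         # (the linear form can go negative)
        Xw = X / s2[:, None]                             # rows x_i' / sigma_i^2
        beta_new = np.linalg.solve(Xw.T @ X, Xw.T @ y)   # stage 3: equation (16)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta, alpha, np.linalg.inv(Xw.T @ X)          # variance as in equation (21)
```

For the exponential variance function, stage 2 would instead be a generalized linear model fit of $e_i^2$ on $u_i$ with Poisson error and log-link, as described above; in Python this could be done with, for example, statsmodels' GLM.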
Using the same type of asymptotic expansions as in Prentice (1988), assuming that equation (19) is correctly specified, $\hat\beta_{\mathrm{EGLS}}$ is consistent and has an asymptotic covariance matrix consistently estimated by

$$\widehat{\mathrm{var}}(\hat\beta_{\mathrm{EGLS}}) = \left(\sum_{i=1}^n \frac{1}{\hat\sigma_i^2} x_i x_i'\right)^{-1}. \qquad (21)$$

The theory on estimated generalized least squares (Maddala, 1971; De Gruttola et al., 1987) proposes treating the estimates $\hat\sigma_i^2$ as if they are known (and equal to $\sigma_i^2$) when making inferences about $\beta$. As such, the ratio of an element of $\hat\beta_{\mathrm{EGLS}}$ to its estimated standard error is taken to have the same distribution as the ratio of an element of $\hat\beta_{\mathrm{GLS}}$ to its estimated standard error, which is a t-distribution with $n - P$ degrees of freedom. Unfortunately, when the sample size is small, the estimate of $\sigma_i^2$ based on equations (20) can be highly variable, and treating it as known, so that the resulting t-statistic has $n - P$ degrees of freedom, can result in low coverage, as seen in the simulations in the next section.
5. Simulation
We performed simulations to compare coverage probabilities using the following:
(a) ordinary least squares with $\hat\sigma_i^2 = \hat\sigma^2$ and $n - P$ degrees of freedom;
(b) ordinary least squares with variance estimator (3), $\hat\sigma_i^2 = e_i^2/(1 - h_i)$ and $n - P$ degrees of freedom;
(c) ordinary least squares with variance estimator (3), $\hat\sigma_i^2 = e_i^2/(1 - h_i)$ and $\hat f_p$ degrees of freedom;
(d) estimated generalized least squares with $\hat\sigma_i^2 = \hat\alpha_0 + \hat\alpha_1 x_i$ and $n - P$ degrees of freedom.
The linear regression model considered was

$$Y_i = 0.4 x_i - 0.25 x_i^2 + \epsilon_i, \qquad (22)$$

where the $x_i$ are fixed and the $\epsilon_i$ are independent. We performed sets of simulations with both $\epsilon_i \sim N(0, x_i)$ and $\epsilon_i \sim N(0, 1)$. As such, the estimated variance when using estimated generalized least squares, $\hat\sigma_i^2 = \hat\alpha_0 + \hat\alpha_1 x_i$, is correctly specified (actually overspecified) in both cases. To explore the finite sample properties of the coverage probabilities, we considered sample sizes of 12, 24 and 48, with 12 distinct covariate values equal to 1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 6, 7, 8 and 10. For $n = 24$ each of these values was used twice, and for $n = 48$ each was used four times, as sketched below.
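To be explicit about how we read the replication scheme, the three designs can be built as follows (a sketch; numpy assumed):

```python
import numpy as np

base = np.array([1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 6, 7, 8, 10.0])
x12 = base                # n = 12
x24 = np.tile(base, 2)    # n = 24: each distinct covariate value used twice
x48 = np.tile(base, 4)    # n = 48: each distinct covariate value used four times
```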
We are most interested in coverage probabilities for confidence intervals. Specifically, we look at 95% confidence intervals of the form

$$\hat\beta_{pb} \pm t_{0.025,\,df_p}\, v_{pb}^{1/2},$$

where $\hat\beta_{pb}$ is an estimate of the $p$th element of $\beta$ from the $b$th simulation and $v_{pb}$ is the estimate of its variance using one of the methods discussed. Also, $t_{0.025,\,df_p}$ is the 97.5th percentile from a t-distribution with $df_p$ degrees of freedom. With 1825 replications, using large sample normal theory, the proportion of confidence intervals containing the true value should be in the closed interval [94%, 96%] if the true coverage probability is 95%. Also of interest are the average lengths of the confidence intervals and the average degrees of freedom. The results are given in Tables 1 and 2.
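In outline, one cell of Table 1 (method (c), $\beta_2$, $n = 12$) can be reproduced as follows; this sketch reuses the robust_variance and satterthwaite_df helpers from the earlier sketches, and the seed is arbitrary (Python with numpy and scipy assumed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = np.array([1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 6, 7, 8, 10.0])
X = np.column_stack([np.ones_like(x), x, x**2])
beta_true = np.array([0.0, 0.4, -0.25])                  # model (22)
p, B, hits = 2, 1825, 0                                  # p = 2 picks beta_2

for _ in range(B):
    y = X @ beta_true + rng.normal(0, np.sqrt(x))        # var(Y|x) = x
    beta_hat, V = robust_variance(X, y)                  # Section 2 sketch
    df = satterthwaite_df(X, y, p)                       # Section 3 sketch
    half = stats.t.ppf(0.975, df) * np.sqrt(V[p, p])     # half-length of the 95% CI
    hits += abs(beta_hat[p] - beta_true[p]) <= half

print(hits / B)   # Table 1 reports a coverage of about 94.6% for this cell
```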
Table 1 contains the simulations under heterogeneous variance. When wrongly assuming constant variance ($\sigma_i^2 = \sigma^2$), we see that the coverage probabilities do not appear to depend very much on the sample size. For $\beta_0$, the coverage is high (about 98%), for $\beta_1$, the coverage is just a little low (about 93.5%) and, for $\beta_2$, it is low (about 90%). When using the robust variance estimator with $df = n - P$, the coverage for the intercept is apparently correct, for $\beta_1$, the coverage is slightly low and, for $\beta_2$, it is also low. As $n$ increases, the coverage probabilities for $\beta_1$ and $\beta_2$ increase closer to the nominal level, though still low for $\beta_2$. The degrees-of-freedom correction $\hat f_p$ appears to make the coverage for $\beta_1$ and $\beta_2$ not significantly different from the nominal level. Note that the lengths of the confidence intervals are larger when using our degrees-of-freedom approximation. Further, the average degrees of freedom are much smaller using our degrees-of-freedom approximation. This is particularly true for $\beta_2$: when $n = 48$, the average of the estimated degrees of freedom ($\hat f_p$) is only 16.2.
Table 1. Heterogeneous variance: coverage probabilities and average lengths of confidence intervals from simulations with $(\beta_0, \beta_1, \beta_2) = (0, 0.4, -0.25)$ and $\mathrm{var}(Y|x) = x$. Each cell gives coverage probability (%) / average length of the confidence interval / average degrees of freedom. Methods: (i) ordinary least squares, $\hat\sigma_i^2 = \hat\sigma^2$, $df_p = n - P$; (ii) ordinary least squares, $\hat\sigma_i^2 = e_i^2/(1 - h_i)$, $df_p = n - P$; (iii) ordinary least squares, $\hat\sigma_i^2 = e_i^2/(1 - h_i)$, $df_p = \hat f_p$; (iv) estimated generalized least squares, $\hat\sigma_i^2 = \hat\alpha_0 + \hat\alpha_1 x_i$, $df_p = n - P$.

Parameter   n    (i)                   (ii)                  (iii)                 (iv)
$\beta_0$   12   97.5 / 4.45 / 9.0     95.1 / 3.76 / 9.0     96.2 / 4.12 / 4.5     95.2 / 3.86 / 9.0
            24   98.3 / 3.00 / 21.0    95.4 / 2.51 / 21.0    95.9 / 2.62 / 14.2    94.7 / 2.51 / 21.0
            48   98.8 / 2.10 / 45.0    95.3 / 1.77 / 45.0    95.5 / 1.69 / 28.7    94.6 / 1.65 / 45.0
$\beta_1$   12   93.6 / 2.01 / 9.0     93.2 / 1.99 / 9.0     95.5 / 2.20 / 6.9     93.2 / 1.88 / 9.0
            24   93.5 / 1.35 / 21.0    93.9 / 1.37 / 21.0    94.8 / 1.46 / 13.1    94.2 / 1.31 / 21.0
            48   93.7 / 0.95 / 45.0    93.9 / 0.99 / 45.0    94.9 / 0.94 / 23.4    94.9 / 0.88 / 45.0
$\beta_2$   12   91.0 / 0.18 / 9.0     90.5 / 0.19 / 9.0     94.6 / 0.59 / 5.7     90.8 / 0.18 / 9.0
            24   90.9 / 0.14 / 21.0    92.2 / 0.14 / 21.0    93.9 / 0.15 / 10.0    93.1 / 0.13 / 21.0
            48   90.1 / 0.087 / 45.0   92.9 / 0.098 / 45.0   94.5 / 0.110 / 16.2   94.7 / 0.091 / 45.0
Table 2. Homogeneous variance: coverage probabilities and average lengths of confidence intervals from simulations with $(\beta_0, \beta_1, \beta_2) = (0, 0.4, -0.25)$ and $\mathrm{var}(Y|x) = 1$. Each cell gives coverage probability (%) / average length of the confidence interval / average degrees of freedom. Methods (i)-(iv) are as in Table 1.

Parameter   n    (i)                   (ii)                  (iii)                 (iv)
$\beta_0$   12   95.5 / 2.22 / 9.0     92.5 / 2.15 / 9.0     95.1 / 2.32 / 5.8     92.9 / 2.25 / 9.0
            24   94.1 / 1.46 / 21.0    93.2 / 1.43 / 21.0    94.4 / 1.51 / 12.5    94.4 / 1.48 / 21.0
            48   94.8 / 1.01 / 45.0    94.5 / 1.00 / 45.0    95.1 / 1.03 / 23.8    94.9 / 1.00 / 45.0
$\beta_1$   12   94.5 / 1.00 / 9.0     93.5 / 0.98 / 9.0     95.3 / 1.05 / 6.9     93.6 / 1.01 / 9.0
            24   95.4 / 0.66 / 21.0    93.5 / 0.65 / 21.0    94.6 / 0.71 / 14.7    93.6 / 0.68 / 21.0
            48   94.6 / 0.45 / 45.0    94.2 / 0.45 / 45.0    95.1 / 0.65 / 29.1    94.4 / 0.48 / 45.0
$\beta_2$   12   95.3 / 0.092 / 9.0    92.8 / 0.089 / 9.0    95.7 / 0.10 / 6.1     92.3 / 0.094 / 9.0
            24   94.5 / 0.061 / 21.0   93.1 / 0.059 / 21.0   93.9 / 0.063 / 12.9   93.6 / 0.060 / 21.0
            48   94.1 / 0.042 / 45.0   93.8 / 0.042 / 45.0   94.5 / 0.043 / 24.5   94.9 / 0.041 / 45.0
When using $\hat\beta_{\mathrm{EGLS}}$, the coverage is low (about 91%) for $\beta_2$ when $n = 12$ and slightly low (about 93%) for $\beta_2$ when $n = 24$. Thus, even when the variance is modelled correctly, the coverage can be low in small samples, and, in terms of coverage probability, one is better off using ordinary least squares with a robust variance and our degrees-of-freedom approximation. Fig. 1 gives a plot of the coverage probabilities as a function of $n$ for $\beta_2$ (based on more simulations than those given in Table 1). As stated above, we see that using ordinary least squares with a robust variance estimate and our degrees-of-freedom approximation has the best coverage.
Looking at Table 2, in which the true model has homogeneous variance, we see that the usual ordinary least squares method gives correct coverage probabilities, as expected. The coverage probability using the robust variance estimator with $n - P$ degrees of freedom appears low, although becoming closer to the nominal level as $n$ increases. Finally, our degrees-of-freedom approximation tends to increase the coverage to the nominal level. When using $\hat\beta_{\mathrm{EGLS}}$, the coverage is slightly low (about 92%) for $\beta_2$ when $n = 12$.
Fig. 1. Coverage probabilities as a function of $n$ for $\beta_2$ from simulations with $(\beta_0, \beta_1, \beta_2) = (0, 0.4, -0.25)$ and $\mathrm{var}(Y|x) = x$: dotted curve, ordinary least squares with $\hat\sigma_i^2 = e_i^2/(1 - h_i)$ and $df_p = \hat f_p$; short-dashed curve, estimated generalized least squares; long-dashed curve, ordinary least squares with $\hat\sigma_i^2 = e_i^2/(1 - h_i)$ and $df_p = n - P$; full curve, ordinary least squares with $\hat\sigma_i^2 = \hat\sigma^2$ and $df_p = n - P$

Because of the broad range of possible data configurations, it is difficult to draw definitive conclusions from a simulation. We can only make general suggestions. We have looked at a simple case to study the properties of the degrees-of-freedom approximation. We encourage using our degrees-of-freedom approximation $\hat f_p$ since, when $n$ is large, it will give the same coverage as the usual degrees of freedom $n - P$ and, when $n$ is small, it appears more appropriate in our limited simulations and can increase the coverage probability by up to 5%. Further, it apparently has a better coverage probability in small samples than does the estimated generalized least squares estimate with correctly modelled variance. When analysing a data set in which the error variance appears heterogeneous, we suggest that the data analyst calculate our proposed degrees of freedom $\hat f_p$ and, if $\hat f_p < 30$, use our degrees-of-freedom approximation.
6. Example
Gasoline vapours are a significant source of air pollution. One source of gasoline vapours occurs when gasoline is pumped into a tank and vapours are forced into the atmosphere. A laboratory experiment was conducted to determine the amount of gasoline vapours given off into the atmosphere under different temperature and vapour conditions (Cook and Weisberg, 1982). The model of interest is

$$E(Y_i \mid x_{i1}, x_{i2}, x_{i3}, x_{i4}) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4},$$

where $Y_i$ is the amount of vapour given off, $x_{i1}$ is the initial temperature of the tank, $x_{i2}$ is the temperature of the gasoline dispensed, $x_{i3}$ is the initial vapour pressure in the tank and $x_{i4}$ is the vapour pressure of the gasoline dispensed. Through their proposed score statistic, Cook and Weisberg (1983) found that the error variance is of the form

$$\mathrm{var}(Y_i \mid x_{i1}, x_{i2}, x_{i3}, x_{i4}) = \exp(\alpha_0 + \alpha_1 x_{i1} + \alpha_2 x_{i4}).$$

Tables 3-6 give the t-tests and p-values obtained using the following respective variance estimators:

(a) ordinary least squares with $\hat\sigma_i^2 = \hat\sigma^2$ and $n - P$ degrees of freedom;
(b) ordinary least squares with variance estimator (3), $\hat\sigma_i^2 = e_i^2/(1 - h_i)$ and $n - P$ degrees of freedom;
(c) ordinary least squares with variance estimator (3), $\hat\sigma_i^2 = e_i^2/(1 - h_i)$ and $\hat f_p$ degrees of freedom;
(d) estimated generalized least squares with $\hat\sigma_i^2 = \exp(\hat\alpha_0 + \hat\alpha_1 x_{i1} + \hat\alpha_2 x_{i4})$ and $n - P$ degrees of freedom.
Table 3. Ordinary least squares estimates of variance

Parameter   Degrees of freedom for error   Estimate   Standard error   t-statistic   p-value
$\beta_0$   27                             0.7663     2.0214           0.379         0.7076
$\beta_1$   27                             0.0151     0.0903           0.167         0.8683
$\beta_2$   27                             0.1992     0.0709           2.808         0.0092
$\beta_3$   27                             -4.8195    2.9318           -1.644        0.1118
$\beta_4$   27                             9.1439     2.9497           3.100         0.0045
Table 4. Robust estimates of variance with $n - P$ degrees of freedom

Parameter   Degrees of freedom for error   Estimate   Standard error   t-statistic   p-value
$\beta_0$   27                             0.7663     1.7390           0.4406        0.6630
$\beta_1$   27                             0.0151     0.0923           0.1637        0.8711
$\beta_2$   27                             0.1992     0.0573           3.4783        0.0017
$\beta_3$   27                             -4.8195    3.9144           -1.2312       0.2288
$\beta_4$   27                             9.1439     3.9841           2.2950        0.0297
Table 5. Robust estimates of variance with the proposed degrees-of-freedom correction

Parameter   Degrees of freedom for error   Estimate   Standard error   t-statistic   p-value
$\beta_0$   13.628                         0.7663     1.7390           0.4406        0.6664
$\beta_1$   16.947                         0.0151     0.0923           0.1637        0.8718
$\beta_2$   10.282                         0.1992     0.0573           3.4783        0.0057
$\beta_3$   8.143                          -4.8195    3.9144           -1.2312       0.2526
$\beta_4$   5.789                          9.1439     3.9841           2.2950        0.0631
Table 6. Estimated generalized least squares estimates with $\hat\sigma_i^2 = \exp(\hat\alpha_0 + \hat\alpha_1 x_{i1} + \hat\alpha_2 x_{i4})$

Parameter   Degrees of freedom for error   Estimate   Standard error   t-statistic   p-value
$\beta_0$   27                             -0.2610    1.6747           -0.1558       0.8773
$\beta_1$   27                             0.0219     0.0759           0.2884        0.7752
$\beta_2$   27                             0.1863     0.0569           3.2725        0.0029
$\beta_3$   27                             -4.7927    2.9221           -1.6401       0.1126
$\beta_4$   27                             9.4411     2.9246           3.2282        0.0033
From Tables 3-6, we observe several interesting results. First, we see how the degrees of freedom change dramatically from Table 4 to Table 5, especially for $\beta_4$. Secondly, using the usual ordinary least squares method with homogeneous variance and $n - P$ degrees of freedom, we see from Table 3 that the p-value for $\beta_4$ is highly significant (p-value 0.0045) and becomes less significant (Table 4) when the robust estimate of variance is used (p-value 0.0297). However, when the degrees-of-freedom correction is used (Table 5), we see that the p-value for $\beta_4$ becomes non-significant at the 5% level (p-value 0.0631). Further, using estimated generalized least squares, we obtain results that are similar to the usual ordinary least squares results (p-value 0.0033).

Thus, the effect of $x_{i4}$ is reduced greatly when using ordinary least squares with robust variance and the degrees-of-freedom approximation $\hat f_p$, and we see how potentially different conclusions can be reached when the degrees-of-freedom approximation is used. This example agrees with the simulations in Section 5: the ordinary least squares variance, the robust variance with $n - P$ degrees of freedom and estimated generalized least squares can all be anticonservative in small samples.
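The shift in the p-values for $\beta_4$ can be recovered directly from the reported t-statistic and degrees of freedom; for instance (Python with scipy assumed):

```python
from scipy import stats

t4 = 2.2950                        # t-statistic for beta_4 (Tables 4 and 5)
print(2 * stats.t.sf(t4, 27))      # df = n - P = 27: about 0.030 (Table 4)
print(2 * stats.t.sf(t4, 5.789))   # df = f_4-hat = 5.789: about 0.063 (Table 5)
```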
References
Cook, R. D. and Weisberg, S. (1982) Residuals and Influence in Regression. London: Chapman and Hall.
Cook, R. D. and Weisberg, S. (1983) Diagnostics for heteroscedasticity in regression. Biometrika, 70, 1-10.
De Gruttola, V., Ware, J. H. and Louis, T. A. (1987) Influence analysis of generalized least squares estimators. J. Am. Statist. Ass., 82, 911-917.
Hinkley, D. V. (1977) Jackknifing in unbalanced situations. Technometrics, 19, 285-292.
Horn, S. D., Horn, R. A. and Duncan, D. B. (1975) Estimating heteroscedastic variances in linear models. J. Am. Statist. Ass., 70, 380-385.
MacKinnon, J. G. and White, H. (1985) Some heteroscedasticity-consistent covariance matrix estimators with improved finite sample properties. J. Econometr., 29, 305-325.
Maddala, G. S. (1971) Generalized least squares with an estimated variance covariance matrix. Econometrica, 39, 23-33.
Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979) Multivariate Analysis. New York: Academic Press.
Prentice, R. L. (1988) Correlated binary regression with covariates specific to each binary observation. Biometrics, 44, 1033-1048.
SAS Institute (1993) The GENMOD procedure, release 6.09. Technical Report P-243, SAS/STAT Software. SAS Institute, Cary.
Satterthwaite, F. E. (1946) An approximate distribution of estimates of variance components. Biometr. Bull., 2, 110-114.
Searle, S. R. (1982) Matrix Algebra Useful for Statistics. New York: Wiley.
Weisberg, S. (1985) Applied Linear Regression, 2nd edn. New York: Wiley.
White, H. (1980) A heteroscedasticity-consistent covariance matrix and a direct test for heteroscedasticity. Econometrica, 48, 817-838.
Wu, C. F. J. (1986) Jackknife, bootstrap and other resampling methods in regression analysis. Ann. Statist., 14, 1261-1295.