The Statistician (1999) 48, Part 4, pp. 495-506

A degrees-of-freedom approximation for a t-statistic with heterogeneous variance

Stuart R. Lipsitz and Joseph G. Ibrahim
Harvard School of Public Health and Dana-Farber Cancer Institute, Boston, USA

Michael Parzen
University of Chicago, USA

[Received November 1998]

Summary. Ordinary least squares is a commonly used method for obtaining parameter estimates for a linear model. When the variance is heterogeneous, the ordinary least squares estimate is unbiased, but the ordinary least squares variance estimate may be biased. Various jackknife estimators of variance have been shown to be consistent for the asymptotic variance of the ordinary least squares estimator of the parameters. In small samples, simulations have shown that the coverage probabilities of confidence intervals using the jackknife variance estimators are often low. We propose a finite sample degrees-of-freedom approximation to use with the t-distribution, which, in simulations, appears to raise the coverage probabilities closer to the nominal level. We also discuss an 'estimated generalized least squares estimator', in which we model and estimate the variance, and use the estimated variance in a weighted regression. Even when the variance is modelled correctly, the coverage can be low in small samples, and, in terms of the coverage probability, one appears better off using ordinary least squares with a robust variance and our degrees-of-freedom approximation. An example with real data is given to demonstrate how potentially misleading conclusions may arise when the degrees-of-freedom approximation is not used.

Keywords: Homogeneous variance; Linear model; Ordinary least squares

Address for correspondence: Joseph G. Ibrahim, Department of Biostatistics, Harvard School of Public Health and Dana-Farber Cancer Institute, 44 Binney Street, Boston, MA 02115, USA. E-mail: ibrahim@jimmy.harvard.edu

1. Introduction

Linear regression with ordinary least squares is used extensively in statistics. Often, although assumed homogeneous, the underlying variance of the response is heterogeneous. When using ordinary least squares, if the linear model is correctly specified and the error variance of the response is heterogeneous, the parameter estimates are unbiased and consistent (under weak regularity conditions). However, the ordinary least squares variance estimators, which assume a homogeneous error variance, are usually biased and inconsistent. Alternatively, various jackknife estimators of variance have been shown to be consistent for the asymptotic variance of the ordinary least squares estimator of the parameters (Hinkley, 1977; MacKinnon and White, 1985; Wu, 1986).

Heterogeneous variance occurs, for example, when the variance of the response increases or decreases as a covariate value increases (this covariate may or may not be in the model for the response); it also occurs when the mean and variance are linked, as in a linear model for binary data. In some situations, a transformation of the response and covariates can be used to formulate a regression model with constant variance. Unfortunately, the interpretation of the model is often ruined by such a transformation. Therefore, methods that preserve the linearity of the model for the outcome as a function of covariates, but also model the variance as a function of covariates, have been developed.
Methods for obtaining an appropriate variance model as a function of covariates have been developed by Cook and Weisberg (1983). After finding the appropriate variance model and estimating the variance, weighted least squares can then be used to estimate the regression parameters of the mean, in which an individual's observation is weighted by the inverse of the estimated variance. Unfortunately, when the sample size is small, even if the variance is modelled correctly, estimates of the variance parameters can be highly unstable, leading to confidence intervals for the regression parameters that have low coverage. Thus, because of the problems that can occur when estimating the variance in small samples, we propose the use of ordinary least squares estimation with a robust variance and a degrees-of-freedom correction. The proposed degrees-of-freedom correction gives better confidence interval coverage than do parametric models for the variance function. In addition, the degrees-of-freedom approximation and the parametric variance function methods give similar coverages for large samples.

In an excellent review of methods to use with heterogeneous variance, Wu (1986) posed the model

    $Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$,    (1)

where $Y_i$ is the response, $x_i$ is a fixed covariate and the $\epsilon_i$ are independently distributed as $N(0, 0.5 x_i)$. To explore finite sample properties of the jackknife, Wu fixed the sample size at 12, with the 12 covariate values equal to 1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 6, 7, 8 and 10. In these small samples, his simulations found that the coverage probabilities of confidence intervals using the jackknife variance estimators were often low. In Section 3 we discuss a finite sample degrees-of-freedom approximation to use with the t-distribution, which, in simulations, appears to raise the coverage probabilities closer to the nominal level.

Section 2 discusses the notation and models, and reviews the jackknife variance estimator, which gives consistent variance estimators when the variance is heterogeneous. Section 3 discusses the degrees-of-freedom approximation. Section 4 discusses a generalized least squares method for estimating the heterogeneous variances, and then using these to re-estimate the regression parameters. Section 5 presents simulation results, comparing the ordinary least squares estimate with robust variance and the degrees-of-freedom correction against the estimated generalized least squares estimate.

To illustrate the degrees-of-freedom correction, in Section 6 we consider the gasoline vapour data set of Cook and Weisberg (1982). This data set has been analysed in detail by Cook and Weisberg (1982, 1983) and Weisberg (1985) with respect to the issue of non-constant variance. The data set contains information on the amount of gasoline vapour given off into the atmosphere when filling a tank, under various temperature and pressure conditions. In particular, the data set consists of four predictors $x_1, \ldots, x_4$ and a response variable $Y$, where $Y$ denotes the amount of gasoline vapour given off, $x_1$ is the initial temperature of the tank, $x_2$ is the temperature of the gasoline dispensed, $x_3$ is the initial vapour pressure in the tank and $x_4$ is the vapour pressure of the gasoline dispensed. The data set contains 32 observations. The linear regression model for these data is given by

    $Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \epsilon_i$.

Cook and Weisberg (1983) used these data to develop diagnostic tools for checking whether the variance function has a particular parametric form. In particular, through their proposed score statistic, they found that the error variance is a function of $(x_1, x_4)$. Our main aim here is to demonstrate the potential effect of the degrees-of-freedom approximation on the t-tests when using the jackknife variance estimator.
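Before turning to the details, a concrete picture of the heteroscedastic set-up involved may help. The following sketch (ours, not the authors'; written in Python, with an arbitrary seed and purely illustrative values for the intercept and slope) generates one sample from Wu's design for model (1), in which $\epsilon_i \sim N(0, 0.5 x_i)$:

```python
# A sketch of Wu's (1986) simulation design: 12 fixed covariate values and
# errors whose variance 0.5 * x_i grows with the covariate.
import numpy as np

rng = np.random.default_rng(seed=1)                # arbitrary seed
x = np.array([1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 6, 7, 8, 10])
beta0, beta1 = 1.0, 2.0                            # illustrative values only
eps = rng.normal(loc=0.0, scale=np.sqrt(0.5 * x))  # sd_i = sqrt(0.5 x_i)
y = beta0 + beta1 * x + eps                        # responses under model (1)
```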
2. Notation and review

We have a random sample of n individuals, in which the ith individual has response $Y_i$ and a $P \times 1$ covariate vector $x_i = (x_{i1}, \ldots, x_{iP})'$. Let $Y = (Y_1, \ldots, Y_n)'$ be the $n \times 1$ vector of responses and $X = (x_1, \ldots, x_n)'$ be the $n \times P$ design matrix. The response $Y_i$ is assumed to be normal with first two moments

    $E(Y_i) = x_i' \beta$,    $var(Y_i) = \sigma_i^2$.

If $\sigma_i^2$ is unknown or is difficult to model, an attractive method for estimating $\beta$ is ordinary least squares. The ordinary least squares estimator of $\beta$, which is unbiased, is

    $\hat\beta_{OLS} = (X'X)^{-1} X'Y = \left( \sum_{i=1}^n x_i x_i' \right)^{-1} \sum_{i=1}^n x_i y_i$.

However,

    $var(\hat\beta_{OLS}) = (X'X)^{-1} X'\, var(Y)\, X (X'X)^{-1} = \left( \sum_{i=1}^n x_i x_i' \right)^{-1} \left( \sum_{i=1}^n x_i x_i' \sigma_i^2 \right) \left( \sum_{i=1}^n x_i x_i' \right)^{-1}$.    (2)

The ordinary least squares estimator $\hat\beta_{OLS}$ is consistent for $\beta$, and $n^{1/2}(\hat\beta_{OLS} - \beta)$ is asymptotically multivariate normal with mean vector 0 and covariance matrix given by n times expression (2). In equation (2), $var(\hat\beta_{OLS}) = \sigma^2 (X'X)^{-1}$ if $\sigma_i^2 = \sigma^2$ for all i. If $\sigma_i^2 \ne \sigma^2$ for some i, then the usual ordinary least squares variance estimate may be biased. Several researchers have shown that expression (2) can be consistently estimated by various jackknife variance estimators of the form

    $(X'X)^{-1} X'\, \mathrm{diag}\!\left( \frac{e_i^2}{w_i} \right) X (X'X)^{-1} = \left( \sum_{i=1}^n x_i x_i' \right)^{-1} \left( \sum_{i=1}^n x_i x_i' \frac{e_i^2}{w_i} \right) \left( \sum_{i=1}^n x_i x_i' \right)^{-1}$,    (3)

where $e_i = Y_i - x_i' \hat\beta_{OLS}$ is the residual, $w_i$ is a known positive weight that converges to 1 as $n \to \infty$ and $\mathrm{diag}(a_i)$ is a diagonal matrix with $(a_1, \ldots, a_n)$ on the main diagonal. In particular, White (1980) let $w_i = 1$ for all i, and Horn et al. (1975) let $w_i = 1 - h_i$, where $h_i = x_i'(X'X)^{-1} x_i$ is the leverage. Horn et al. (1975) justified their weights by noting that, if $\sigma_i^2 = \sigma^2$, then $E(e_i^2) = \sigma^2 (1 - h_i)$ and their estimator of $var(\hat\beta)$ is unbiased. Also, dividing $e_i^2$ by $1 - h_i$ tends to inflate the squared residual to approximate $\sigma_i^2$ more closely. When the variance is heterogeneous, simulations have shown (MacKinnon and White, 1985) that setting $w_i = 1 - h_i$ has better finite sample properties than setting $w_i = 1$.

If we let $\hat\beta_p$ be the pth element of $\hat\beta_{OLS}$ and $\widehat{var}(\hat\beta_p)$ be the pth diagonal element of expression (3) with $w_i = 1 - h_i$, then, to test $H_0: \beta_p = 0$, Wu (1986) assumed that

    $T_p = \frac{\hat\beta_p - \beta_p}{\sqrt{\widehat{var}(\hat\beta_p)}}$    (4)

follows a t-distribution with $n - P$ degrees of freedom. However, Wu's (1986) simulations with small n found that the coverage probabilities using this t-approximation were often low. In this paper, we propose to use a t-statistic for equation (4), but with a modified degrees of freedom derived from the Satterthwaite (1946) approximation.
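As a computational companion to this review, a minimal sketch of the ordinary least squares fit with the jackknife-type variance estimator (3), using the Horn et al. (1975) weights $w_i = 1 - h_i$, might look as follows (our illustration, not the authors' code; the function name is ours):

```python
# Ordinary least squares with the jackknife-type sandwich variance (3),
# with w_i = 1 - h_i; X is the n x P design matrix (including an
# intercept column) and y the n-vector of responses.
import numpy as np

def ols_robust(X, y):
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                        # OLS estimate of beta
    e = y - X @ beta                                # residuals e_i
    h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)     # leverages h_i = x_i'(X'X)^{-1}x_i
    meat = (X * (e**2 / (1.0 - h))[:, None]).T @ X  # sum_i x_i x_i' e_i^2 / (1 - h_i)
    V = XtX_inv @ meat @ XtX_inv                    # sandwich estimator (3)
    return beta, V

# For the test (4): T_p = beta[p] / sqrt(V[p, p]) under H_0: beta_p = 0.
```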
3. The degrees-of-freedom approximation

To use the Satterthwaite approximation, we rewrite equation (4) as

    $T_p = \frac{\hat\beta_p - \beta_p}{\sqrt{\widehat{var}(\hat\beta_p)}} = \frac{Z_p}{\sqrt{Q_p / f_p}}$,    (5)

where

    $Z_p = \frac{\hat\beta_p - \beta_p}{\sqrt{E\{\widehat{var}(\hat\beta_p)\}}}$

is approximately a normal (0, 1) random variable and

    $Q_p = f_p \frac{\widehat{var}(\hat\beta_p)}{E\{\widehat{var}(\hat\beta_p)\}}$    (6)

is an approximate $\chi^2$ random variable with $f_p$ degrees of freedom. Here, we also assume that $\widehat{var}(\hat\beta_p)$ is approximately unbiased, so that $E\{\widehat{var}(\hat\beta_p)\} = var(\hat\beta_p)$. In the Satterthwaite approximation, the degrees of freedom $f_p$ are chosen so that $Q_p$ has the same first two moments as a $\chi^2$ random variable, i.e. so that the variance of $Q_p$ is twice its mean. Satterthwaite (1946) showed that the appropriate $f_p$ is

    $f_p = \frac{2 [E\{\widehat{var}(\hat\beta_p)\}]^2}{var\{\widehat{var}(\hat\beta_p)\}}$.    (7)

If we substitute $f_p$ in equation (6), it is easy to see that the variance of $Q_p$ is twice its mean. Then we shall assume that equation (5), and equivalently equation (4), follows a t-distribution with $f_p$ degrees of freedom. We note here that $f_p$ can be different for the different $\beta_p$.

Thus, to use the Satterthwaite approximation, we need to estimate $E\{\widehat{var}(\hat\beta_p)\}$ and $var\{\widehat{var}(\hat\beta_p)\}$ in equation (7). An obvious estimate of $E\{\widehat{var}(\hat\beta_p)\}$ is $\widehat{var}(\hat\beta_p)$ itself. Thus, we now only need an estimate of $var\{\widehat{var}(\hat\beta_p)\}$ to estimate $f_p$ in equation (7). To estimate $var\{\widehat{var}(\hat\beta_p)\}$, we first rewrite $\hat\beta_p$ and $\widehat{var}(\hat\beta_p)$ in simpler form. We write

    $\hat\beta_p = c_p Y = \sum_{i=1}^n c_{pi} y_i$,

where $c_p = (c_{p1}, \ldots, c_{pn})$ is the pth row of $(X'X)^{-1} X'$. Then

    $var(\hat\beta_p) = \sum_{i=1}^n c_{pi}^2 \sigma_i^2$,

which we estimate by

    $\widehat{var}(\hat\beta_p) = \sum_{i=1}^n \frac{c_{pi}^2 e_i^2}{1 - h_i} = \sum_{i=1}^n \left\{ \frac{c_{pi} e_i}{\sqrt{1 - h_i}} \right\}^2$.    (8)

If we let $W = \mathrm{diag}\{(1 - h_i)^{-1/2}\}$, i.e. a diagonal matrix with elements $(1 - h_i)^{-1/2}$ on the diagonal, and $e = (e_1, \ldots, e_n)'$, we can rewrite equation (8) as

    $\widehat{var}(\hat\beta_p) = (c_p W e)'(c_p W e) = e'(W c_p' c_p W) e = e' A_p e$,    (9)

where $A_p = W c_p' c_p W$. Equation (9) is just a quadratic form in $e$.

Thus, to obtain $var\{\widehat{var}(\hat\beta_p)\}$, we can use multivariate normal theory results (Searle, 1982) to obtain the variance of the quadratic form $e' A_p e$,

    $var\{\widehat{var}(\hat\beta_p)\} = var(e' A_p e) = 2\, \mathrm{tr}[\{A_p\, var(e)\}^2] + 4 E(e') A_p\, var(e)\, A_p E(e)$,    (10)

where $\mathrm{tr}[\cdot]$ denotes the trace of a matrix. Then, to obtain the variance of $\widehat{var}(\hat\beta_p)$, we compute the mean and variance of $e$ and substitute them in equation (10). Using results from linear models, it is easily shown that

    $E(e) = E(Y - X \hat\beta_{OLS}) = 0$

and

    $var(e) = (I_n - H)\, \mathrm{diag}(\sigma_i^2)\, (I_n - H)$,

where $I_n$ is an $n \times n$ identity matrix and $H = X(X'X)^{-1} X'$ is the hat matrix. Substituting these results into equation (10), we obtain

    $var\{\widehat{var}(\hat\beta_p)\} = 2\, \mathrm{tr}[\{A_p (I_n - H)\, \mathrm{diag}(\sigma_i^2)\, (I_n - H)\}^2]$.    (11)

After a little algebra, it can be shown that equation (11) can be rewritten as

    $var\{\widehat{var}(\hat\beta_p)\} = 2\, \mathrm{tr}(\{(I_n - H) A_p (I_n - H)\} [\{(I_n - H) A_p (I_n - H)\} \,\#\, \Sigma])$,    (12)

where the symbol # represents elementwise multiplication (the ijth element of the new matrix is the product of the ijth elements of the matrices that are being 'multiplied'). Also, $\Sigma$ has ith diagonal elements equal to $\sigma_i^4$ and ijth off-diagonal elements equal to $\sigma_i^2 \sigma_j^2$. Thus, we need estimates of $\sigma_i^4$ and $\sigma_i^2 \sigma_j^2$. We propose 'almost unbiased' estimates of these two quantities by noting that, if $\sigma_i^2 = \sigma^2$, then

    $E(e_i^4) = 3 \sigma^4 (1 - h_i)^2$

and

    $E(e_i^2 e_j^2) = var(Y_i)\, var(Y_j) \{2 h_{ij}^2 + (1 - h_i)(1 - h_j)\}$,

where $h_{ij}$ is the ijth element of the hat matrix $H$. Thus, we propose to estimate $\Sigma$ with $\hat\Sigma$, which has ith diagonal element equal to

    $\hat\sigma_i^4 = \frac{e_i^4}{3 (1 - h_i)^2}$    (13)

and ijth off-diagonal element equal to

    $\hat\sigma_i^2 \hat\sigma_j^2 = \frac{e_i^2 e_j^2}{2 h_{ij}^2 + (1 - h_i)(1 - h_j)}$.    (14)

As $n \to \infty$, $h_{ij} \to 0$ and $h_i \to 0$, and equations (13) and (14) are asymptotically unbiased for $\sigma_i^4$ and $\sigma_i^2 \sigma_j^2$ respectively. Thus, we propose to estimate the degrees of freedom from the Satterthwaite approximation by

    $\hat f_p = \frac{2\, \widehat{var}(\hat\beta_p)^2}{2\, \mathrm{tr}(\{(I_n - H) A_p (I_n - H)\} [\{(I_n - H) A_p (I_n - H)\} \,\#\, \hat\Sigma])} = \frac{\widehat{var}(\hat\beta_p)^2}{\mathrm{tr}(\{(I_n - H) A_p (I_n - H)\} [\{(I_n - H) A_p (I_n - H)\} \,\#\, \hat\Sigma])}$.    (15)
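The estimate $\hat f_p$ can be computed directly from equations (8) and (12)-(15). The sketch below is our reading of these formulas, not the authors' code; in particular, we take $A_p$ to be the diagonal matrix with entries $c_{pi}^2/(1 - h_i)$, the form under which the quadratic form $e' A_p e$ reproduces equation (8) exactly, and that reading is our assumption:

```python
# Estimated Satterthwaite degrees of freedom f_hat_p of equation (15)
# for the p-th coefficient (p is a 0-based column index of X here).
import numpy as np

def satterthwaite_df(X, y, p):
    n = X.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    H = X @ XtX_inv @ X.T                          # hat matrix
    h = np.diag(H)                                 # leverages h_i
    e = y - H @ y                                  # OLS residuals
    c_p = (XtX_inv @ X.T)[p]                       # p-th row of (X'X)^{-1} X'
    A_p = np.diag(c_p**2 / (1.0 - h))              # so that e'A_p e equals eq. (8)
    var_hat = e @ A_p @ e                          # robust variance, eq. (8)
    B = (np.eye(n) - H) @ A_p @ (np.eye(n) - H)    # (I_n - H) A_p (I_n - H)
    # Sigma_hat: off-diagonal elements from eq. (14), diagonal from eq. (13)
    Sigma = np.outer(e**2, e**2) / (2.0 * H**2 + np.outer(1.0 - h, 1.0 - h))
    np.fill_diagonal(Sigma, e**4 / (3.0 * (1.0 - h)**2))
    return var_hat**2 / np.trace(B @ (B * Sigma))  # eq. (15); * is elementwise (#)
```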
4. Generalized least squares

If $var(Y_i) = \sigma_i^2$ is known, then the optimal linear estimation procedure is to estimate $\beta$ via generalized least squares, i.e.

    $\hat\beta_{GLS} = \left( \sum_{i=1}^n \frac{1}{\sigma_i^2} x_i x_i' \right)^{-1} \sum_{i=1}^n \frac{1}{\sigma_i^2} x_i y_i$,    (16)

which is unbiased. Furthermore (Mardia et al., 1979), $\hat\beta_{GLS}$ is consistent for $\beta$ and $n^{1/2}(\hat\beta_{GLS} - \beta)$ is asymptotically multivariate normal with mean vector 0 and covariance matrix given by n times

    $var(\hat\beta_{GLS}) = \left( \sum_{i=1}^n \frac{1}{\sigma_i^2} x_i x_i' \right)^{-1}$.    (17)

Since this generalized least squares problem can be transformed to ordinary least squares by dividing $Y_i$ and $x_i$ by $\sigma_i$,

    $T_p = \frac{\hat\beta_p - \beta_p}{\sqrt{\widehat{var}(\hat\beta_p)}}$    (18)

follows a t-distribution with $n - P$ degrees of freedom under the null hypothesis $H_0: \beta_p = 0$.

Unfortunately, we do not know $\sigma_i^2$. An estimated generalized least squares estimator of $\beta$, say $\hat\beta_{EGLS}$, obtained by replacing $\sigma_i^2$ in equation (16) with a consistent estimator, say $\hat\sigma_i^2$, will have the same asymptotic properties as $\hat\beta_{GLS}$. The variance as a function of covariates is often modelled as

    $\sigma_i^2 = g(u_i' \alpha)$,    (19)

where $g(\cdot)$ is the identity function, $g(u_i'\alpha) = u_i'\alpha$, or the exponential function, $g(u_i'\alpha) = \exp(u_i'\alpha)$, and $u_i$ is some function of the covariates (often $u_i$ equals $x_i$, but this is not necessary). By modelling the variance structure as in equation (19), we hope to obtain an estimator which is more efficient than ordinary least squares. The mean of the asymptotic distribution of $e_i^2 = (Y_i - x_i'\hat\beta_{OLS})^2$ is $\sigma_i^2$, i.e. $e_i^2 = \sigma_i^2 + r_i$ (with $r_i$ having asymptotic mean 0). This suggests a set of estimating equations for $\sigma_i^2$ of the form

    $\sum_{i=1}^n u_i \{e_i^2 - g(u_i'\alpha)\} = 0$.    (20)

If $g(\cdot)$ is the linear function, then the solution to equations (20) is the ordinary least squares estimate, i.e.

    $\hat\alpha = \left( \sum_{i=1}^n u_i u_i' \right)^{-1} \sum_{i=1}^n u_i e_i^2$.

If $g(\cdot)$ is the exponential function, then the solution to equations (20) can be obtained by using any generalized linear models program (such as the SAS procedure GENMOD (SAS Institute, 1993)) with a Poisson error structure and a log-link. Then $\beta$ is re-estimated by equation (16) with $\sigma_i^2$ replaced by $g(u_i'\hat\alpha)$. This is a three-stage technique in which stage 1 is ordinary least squares for $\beta$, stage 2 is ordinary least squares or a generalized linear model for $\alpha$ and stage 3 is estimated generalized least squares for $\beta$ using $\hat\sigma_i$ from stage 2. In our analyses, we iterate among the three stages until a convergence criterion is met for $\hat\beta_{EGLS}$ and $\hat\alpha$. The three-stage estimator and the fully iterated estimator have the same asymptotic properties.

Using the same type of asymptotic expansions as in Prentice (1988), and assuming that equation (19) is correctly specified, $\hat\beta_{EGLS}$ is consistent and has an asymptotic covariance matrix that is consistently estimated by

    $\widehat{var}(\hat\beta_{EGLS}) = \left( \sum_{i=1}^n \frac{1}{\hat\sigma_i^2} x_i x_i' \right)^{-1}$.    (21)

The theory on estimated generalized least squares (Maddala, 1971; De Gruttola et al., 1987) proposes treating the estimates $\hat\sigma_i^2$ as if they were known (and equal to $\sigma_i^2$) when making inferences about $\beta$. As such, the ratio of an element of $\hat\beta_{EGLS}$ to its estimated standard error is taken to have the same distribution as the ratio of an element of $\hat\beta_{GLS}$ to its estimated standard error, which is a t-distribution with $n - P$ degrees of freedom. Unfortunately, when the sample size is small, the estimate of $\sigma_i^2$ based on equations (20) can be variable, and treating it as known, with the resulting t-statistic on $n - P$ degrees of freedom, can result in low coverage, as seen in the simulations in the next section.
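A minimal sketch of this three-stage procedure with the linear variance function $g(u_i'\alpha) = u_i'\alpha$ might look as follows (our illustration, not the authors' code; the iteration limit, the convergence tolerance and the guard that keeps the fitted variances positive are our own assumptions, and U holds the variance covariates $u_i$):

```python
# Three-stage estimated generalized least squares with a linear variance
# model, iterated to convergence as described above.
import numpy as np

def egls(X, y, U, n_iter=50, tol=1e-8):
    beta = np.linalg.solve(X.T @ X, X.T @ y)            # stage 1: OLS for beta
    for _ in range(n_iter):
        e2 = (y - X @ beta)**2                          # squared residuals
        alpha = np.linalg.solve(U.T @ U, U.T @ e2)      # stage 2: solves eq. (20)
        sig2 = np.clip(U @ alpha, 1e-8, None)           # fitted variances, kept > 0
        Xw = X / sig2[:, None]
        beta_new = np.linalg.solve(Xw.T @ X, Xw.T @ y)  # stage 3: eq. (16)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    V = np.linalg.inv(Xw.T @ X)                         # estimated covariance, eq. (21)
    return beta, V
```

With the exponential variance function the positivity guard is unnecessary, and stage 2 becomes a Poisson log-link generalized linear model fit, as noted above.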
5. Simulation

We performed simulations to compare coverage probabilities using the following:

(a) ordinary least squares with $\hat\sigma_i^2 = \hat\sigma^2$ and $n - P$ degrees of freedom;
(b) ordinary least squares with variance estimator (3), $\hat\sigma_i^2 = e_i^2/(1 - h_i)$ and $n - P$ degrees of freedom;
(c) ordinary least squares with variance estimator (3), $\hat\sigma_i^2 = e_i^2/(1 - h_i)$ and $\hat f_p$ degrees of freedom;
(d) estimated generalized least squares with $\hat\sigma_i^2 = \hat\alpha_0 + \hat\alpha_1 x_i$ and $n - P$ degrees of freedom.

The linear regression model considered was

    $Y_i = 0.4 x_i - 0.25 x_i^2 + \epsilon_i$,    (22)

where the $x_i$ are fixed and the $\epsilon_i$ are independent. We performed sets of simulations with both $\epsilon_i \sim N(0, x_i)$ and $\epsilon_i \sim N(0, 1)$. As such, the estimated variance when using estimated generalized least squares, $\hat\sigma_i^2 = \hat\alpha_0 + \hat\alpha_1 x_i$, is correctly specified (actually overspecified) in both cases. To explore the finite sample properties of the coverage probabilities, we considered sample sizes of 12, 24 and 48, with 12 distinct covariate values equal to 1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 6, 7, 8 and 10. For n = 24 we used each of these values twice, and for n = 48 four times.

We are most interested in coverage probabilities for confidence intervals. Specifically, we look at 95% confidence intervals of the form

    $\hat\beta_{pb} \pm t_{0.025, df_p} v_{pb}^{1/2}$,

where $\hat\beta_{pb}$ is the estimate of the pth element of $\beta$ from the bth simulation and $v_{pb}$ is the estimate of its variance using one of the methods discussed. Also, $t_{0.025, df_p}$ is the 97.5th percentile from a t-distribution with $df_p$ degrees of freedom. With 1825 replications, using large sample normal theory, the proportion of confidence intervals containing the true value should lie in the closed interval [94%, 96%] if the true coverage probability is 95%. Also of interest are the average lengths of the confidence intervals and the average degrees of freedom.
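The coverage computation just described can be sketched as follows (our illustration, reusing the hypothetical ols_robust and satterthwaite_df functions from the sketches after Sections 2 and 3; the seed is arbitrary, and only methods (b) and (c) are shown):

```python
# Monte Carlo coverage for beta_2 under model (22) with eps_i ~ N(0, x_i),
# comparing df = n - P with the Satterthwaite df of equation (15).
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)                    # arbitrary seed
x = np.array([1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 6, 7, 8, 10])
X = np.column_stack([np.ones_like(x), x, x**2])
beta_true = np.array([0.0, 0.4, -0.25])
n, P, p = len(x), X.shape[1], 2                        # p indexes beta_2
B = 1825                                               # replications, as in the text
hits_np = hits_fp = 0
for _ in range(B):
    y = X @ beta_true + rng.normal(scale=np.sqrt(x))   # var(Y|x) = x
    beta, V = ols_robust(X, y)
    se = np.sqrt(V[p, p])
    err = abs(beta[p] - beta_true[p])
    hits_np += err <= stats.t.ppf(0.975, n - P) * se                      # method (b)
    hits_fp += err <= stats.t.ppf(0.975, satterthwaite_df(X, y, p)) * se  # method (c)
print(hits_np / B, hits_fp / B)                        # empirical coverages
```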
The results are given in Tables 1 and 2; Table 1 contains the simulations under heterogeneous variance and Table 2 those under homogeneous variance.

Table 1. Heterogeneous variance - coverage probabilities and average lengths of confidence intervals from simulations with $(\beta_0, \beta_1, \beta_2) = (0, 0.4, -0.25)$ and $var(Y|x) = x$. In each cell the three entries are the coverage probability (%), the average length of the confidence interval and the average degrees of freedom; methods (a)-(d) are as listed at the start of this section.

    Parameter  n    (a)                (b)                (c)                (d)
    beta_0     12   97.5  4.45   9.0   95.1  3.76   9.0   96.2  4.12   4.5   95.2  3.86   9.0
               24   98.3  3.00  21.0   95.4  2.51  21.0   95.9  2.62  14.2   94.7  2.51  21.0
               48   98.8  2.10  45.0   95.3  1.77  45.0   95.5  1.69  28.7   94.6  1.65  45.0
    beta_1     12   93.6  2.01   9.0   93.2  1.99   9.0   95.5  2.20   6.9   93.2  1.88   9.0
               24   93.5  1.35  21.0   93.9  1.37  21.0   94.8  1.46  13.1   94.2  1.31  21.0
               48   93.7  0.95  45.0   93.9  0.99  45.0   94.9  0.94  23.4   94.9  0.88  45.0
    beta_2     12   91.0  0.18   9.0   90.5  0.19   9.0   94.6  0.59   5.7   90.8  0.18   9.0
               24   90.9  0.14  21.0   92.2  0.14  21.0   93.9  0.15  10.0   93.1  0.13  21.0
               48   90.1  0.087 45.0   92.9  0.098 45.0   94.5  0.110 16.2   94.7  0.091 45.0

When wrongly assuming constant variance ($\sigma_i^2 = \sigma^2$), we see that the coverage probabilities do not appear to depend very much on the sample size. For $\beta_0$ the coverage is high (about 98%), for $\beta_1$ the coverage is just a little low (about 93.5%) and for $\beta_2$ it is low (about 90%). When using the robust variance estimator with $df = n - P$, the coverage for the intercept is apparently correct, for $\beta_1$ the coverage is slightly low and for $\beta_2$ it is also low. As n increases, the coverage probabilities for $\beta_1$ and $\beta_2$ increase closer to the nominal level, though they remain low for $\beta_2$. The degrees-of-freedom correction $\hat f_p$ appears to make the coverage for $\beta_1$ and $\beta_2$ not significantly different from the nominal level. Note that the lengths of the confidence intervals are larger when using our degrees-of-freedom approximation. Further, the average degrees of freedom are much smaller using our degrees-of-freedom approximation; this is particularly true for $\beta_2$, for which, when n = 48, the average of the degrees of freedom ($\hat f_p$) is only 16.2. When using $\hat\beta_{EGLS}$, the coverage is low (about 91%) for $\beta_2$ when n = 12 and slightly low (about 93%) for $\beta_2$ when n = 24. Thus, even when the variance is modelled correctly, the coverage can be low in small samples, and, in terms of coverage probability, one is better off using ordinary least squares with a robust variance and our degrees-of-freedom approximation. Fig. 1 gives a plot of the coverage probabilities as a function of n for $\beta_2$ (based on more simulations than those given in Table 1). As stated above, we see that using ordinary least squares with a robust variance estimate and our degrees-of-freedom approximation has the best coverage.

[Fig. 1 about here. Caption: Coverage probabilities as a function of n for $\beta_2$ from simulations with $(\beta_0, \beta_1, \beta_2) = (0, 0.4, -0.25)$ and $var(Y|x) = x$: dotted curve, ordinary least squares with $\hat\sigma_i^2 = e_i^2/(1 - h_i)$ and $df_p = \hat f_p$; broken curve, estimated generalized least squares; long-dashed curve, ordinary least squares with $\hat\sigma_i^2 = e_i^2/(1 - h_i)$ and $df_p = n - P$; full curve, ordinary least squares with $\hat\sigma_i^2 = \hat\sigma^2$ and $df_p = n - P$.]

Table 2. Homogeneous variance - coverage probabilities and average lengths of confidence intervals from simulations with $(\beta_0, \beta_1, \beta_2) = (0, 0.4, -0.25)$ and $var(Y|x) = 1$. Cell entries and methods (a)-(d) are as for Table 1.

    Parameter  n    (a)                (b)                (c)                (d)
    beta_0     12   95.5  2.22   9.0   92.5  2.15   9.0   95.1  2.32   5.8   92.9  2.25   9.0
               24   94.1  1.46  21.0   93.2  1.43  21.0   94.4  1.51  12.5   94.4  1.48  21.0
               48   94.8  1.01  45.0   94.5  1.00  45.0   95.1  1.03  23.8   94.9  1.00  45.0
    beta_1     12   94.5  1.00   9.0   93.5  0.98   9.0   95.3  1.05   6.9   93.6  1.01   9.0
               24   95.4  0.66  21.0   93.5  0.65  21.0   94.6  0.71  14.7   93.6  0.68  21.0
               48   94.6  0.45  45.0   94.2  0.45  45.0   95.1  0.65  29.1   94.4  0.48  45.0
    beta_2     12   95.3  0.092  9.0   92.8  0.089  9.0   95.7  0.10   6.1   92.3  0.094  9.0
               24   94.5  0.061 21.0   93.1  0.059 21.0   93.9  0.063 12.9   93.6  0.060 21.0
               48   94.1  0.042 45.0   93.8  0.042 45.0   94.5  0.043 24.5   94.9  0.041 45.0

Looking at Table 2, in which the true model has homogeneous variance, we see that the usual ordinary least squares method gives correct coverage probabilities, as expected. The coverage probability using the robust variance estimator with $n - P$ degrees of freedom appears low, although it becomes closer to the nominal level as n increases. Finally, our degrees-of-freedom approximation tends to increase the coverage to the nominal level. When using $\hat\beta_{EGLS}$, the coverage is slightly low (about 92%) for $\beta_2$ when n = 12.
When using â is slightly low (about 92%) for â3 when n 12. Because of the broad range of possible data con®gurations, it is dif®cult to draw de®nitive 504 S. R. Lipsitz, J. G. Ibrahim and M. Parzen Fig. 1. Coverage probabilities as a function of n for â2 from simulations with (â0 , â1 , â2 ) (0, 0.4, ÿ0.25) and var(Y|x ) x: ´´´´´´´´´´´, ordinary least squares with ó^ 2i e 2i =(1 ÿ h i ) and df p ^f p ; - - - - - -, estimated generalized least squares; ± ± ± ±, ordinary least squares with ó^ 2i e 2i =(1 ÿ h i ) and df p n ÿ P; ÐÐÐ, ordinary least squares with ó^ 2i ó^ 2 and df p n ÿ P conclusions from a simulation. We can only make general suggestions. We have looked at a simple case to study the properties of the degrees-of-freedom approximation. We encourage using our degrees-of-freedom approximation ^f p since, when n is large, it will give the same coverage as the usual degrees of freedom n ÿ P and, when n is small, it appears more appropriate in our limited simulations and can increase the coverage probability by up to 5%. Further, it apparently has a better coverage probability in small samples than does the estimated generalized least squares estimate with correctly modelled variance. When analysing a data set in which the error variance appears heterogeneous, we suggest that the data analyst should calculate our proposed degrees of freedom ^f p and, if ^f p < 30, we suggest that our degrees-of-freedom approximation should be used. 6. Example Gasoline vapours are a signi®cant source of air pollution. One source of gasoline vapours occurs when gasoline is pumped into a tank and vapours are forced into the atmosphere. A laboratory experiment was conducted to determine the amount of gasoline vapours given off into the atmosphere under different temperature and vapour conditions (Cook and Weisberg, 1982). The model of interest is E(Y i jxi1 , xi2 , xi3 , xi4 ) â0 â1 xi1 â2 xi2 â3 xi3 â4 xi4 , where Y i is the amount of vapour given off, xi1 is the initial temperature of the tank, xi2 is the temperature of the gasoline dispensed, xi3 is the initial vapour pressure in the tank and xi4 is the vapour pressure of the gasoline dispensed. Through their proposed score statistic, Cook and Weisberg (1982) found that the error variance is of the form var(Y i jxi1 , xi2 , xi3 , xi4 ) exp(á0 á1 xi1 á2 xi4 ): Tables 3±6 give the t-tests and p-values obtained using the following respective variance estimators: Degrees-of-freedom Approximation for a t-statistic 505 Table 3. Ordinary least squares estimates of variance Parameter â0 â1 â2 â3 â4 Degrees of freedom for error Estimate Standard error t-statistic p-value 27 27 27 27 27 0.7663 0.0151 0.1992 ÿ4.8195 9.1439 2.0214 0.0903 0.0709 2.9318 2.9497 0.379 0.167 2.808 ÿ1.644 3.100 0.7076 0.8683 0.0092 0.1118 0.0045 Table 4. Robust estimates of variance with n ÿ P degrees of freedom Parameter â0 â1 â2 â3 â4 Degrees of freedom for error Estimate Standard error t-statistic p-value 27 27 27 27 27 0.7663 0.0151 0.1992 ÿ4.8195 9.1439 1.7390 0.0923 0.0573 3.9144 3.9841 0.4406 0.1637 3.4783 ÿ1.2312 2.2950 0.6630 0.8711 0.0017 0.2288 0.0297 Table 5. Robust estimates of variance with proposed degrees-of-freedom correction Parameter â0 â1 â2 â3 â4 Degrees of freedom for error Estimate Standard error t-statistic p-value 13.628 16.947 10.282 8.143 5.789 0.7663 0.0151 0.1992 ÿ4.8195 9.1439 1.7390 0.0923 0.0573 3.9144 3.9841 0.4406 0.1637 3.4783 ÿ1.2312 2.2950 0.6664 0.8718 0.0057 0.2526 0.0631 ^ 1 x i1 á ^ 2 x i4 ) Table 6. 
Table 6. Estimated generalized least squares estimates with $\hat\sigma_i^2 = \exp(\hat\alpha_0 + \hat\alpha_1 x_{i1} + \hat\alpha_2 x_{i4})$

    Parameter  Degrees of freedom  Estimate  Standard  t-statistic  p-value
               for error                     error
    beta_0     27                  -0.2610   1.6747    -0.1558      0.8773
    beta_1     27                   0.0219   0.0759     0.2884      0.7752
    beta_2     27                   0.1863   0.0569     3.2725      0.0029
    beta_3     27                  -4.7927   2.9221    -1.6401      0.1126
    beta_4     27                   9.4411   2.9246     3.2282      0.0033

From Tables 3-6, we observe several interesting results. First, we see how the degrees of freedom change dramatically from Table 4 to Table 5, especially for $\beta_4$. Secondly, using the usual ordinary least squares method with homogeneous variance and $n - P$ degrees of freedom, we see from Table 3 that the p-value for $\beta_4$ is highly significant (p-value 0.0045) and becomes less significant (Table 4) when the robust estimate of variance is used (p-value 0.0297). However, when the degrees-of-freedom correction is used (Table 5), we see that the p-value for $\beta_4$ becomes non-significant at the 5% level (p-value 0.0631). Further, using estimated generalized least squares, we obtain results that are similar to the usual ordinary least squares results (p-value 0.0033). Thus, the effect of $x_{i4}$ is reduced greatly when using ordinary least squares with robust variance and the degrees-of-freedom approximation $\hat f_p$, and we see how potentially different conclusions can be reached when the degrees-of-freedom approximation is used. This example agrees with the simulations in Section 5: the ordinary least squares variance, the robust variance with $n - P$ degrees of freedom and estimated generalized least squares can all be anticonservative in small samples.
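To make the stage 2 variance fit concrete, the following self-contained sketch (ours, not the authors' code; since the raw gasoline data are not reproduced in the text, U and e2 are placeholders for the variance covariates $(1, x_{i1}, x_{i4})$ and the squared ordinary least squares residuals) solves the estimating equations (20) for the exponential function $g(u_i'\alpha) = \exp(u_i'\alpha)$ by Newton-Raphson, which is the same fit that a Poisson log-link generalized linear models routine such as PROC GENMOD produces:

```python
# Newton-Raphson solution of sum_i u_i {e2_i - exp(u_i'alpha)} = 0,
# i.e. the Poisson log-link score equations for the squared residuals.
import numpy as np

def fit_log_variance(U, e2, n_iter=50, tol=1e-10):
    alpha = np.zeros(U.shape[1])
    alpha[0] = np.log(np.mean(e2))   # constant-variance start (assumes column 0 of U is the intercept)
    for _ in range(n_iter):
        mu = np.exp(U @ alpha)                           # fitted variances g(u_i'alpha)
        step = np.linalg.solve((U * mu[:, None]).T @ U,  # information U' diag(mu) U
                               U.T @ (e2 - mu))          # score of eq. (20)
        alpha += step
        if np.max(np.abs(step)) < tol:
            break
    return alpha   # stage 3 then reweights each observation by 1/exp(u_i'alpha)
```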
References

Cook, R. D. and Weisberg, S. (1982) Residuals and Influence in Regression. London: Chapman and Hall.
Cook, R. D. and Weisberg, S. (1983) Diagnostics for heteroscedasticity in regression. Biometrika, 70, 1-10.
De Gruttola, V., Ware, J. H. and Louis, T. A. (1987) Influence analysis of generalized least squares estimators. J. Am. Statist. Ass., 82, 911-917.
Hinkley, D. V. (1977) Jackknifing in unbalanced situations. Technometrics, 19, 285-292.
Horn, S. D., Horn, R. A. and Duncan, D. B. (1975) Estimating heteroscedastic variances in linear models. J. Am. Statist. Ass., 70, 380-385.
MacKinnon, J. G. and White, H. (1985) Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. J. Econometr., 29, 305-325.
Maddala, G. S. (1971) Generalized least squares with an estimated variance covariance matrix. Econometrica, 39, 23-33.
Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979) Multivariate Analysis. New York: Academic Press.
Prentice, R. L. (1988) Correlated binary regression with covariates specific to each binary observation. Biometrics, 44, 1033-1048.
SAS Institute (1993) The GENMOD procedure, release 6.09. Technical Report P-243, SAS/STAT Software. Cary: SAS Institute.
Satterthwaite, F. E. (1946) An approximate distribution of estimates of variance components. Biometr. Bull., 2, 110-114.
Searle, S. R. (1982) Matrix Algebra Useful for Statistics. New York: Wiley.
Weisberg, S. (1985) Applied Linear Regression, 2nd edn. New York: Wiley.
White, H. (1980) A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48, 817-838.
Wu, C. F. J. (1986) Jackknife, bootstrap and other resampling methods in regression analysis. Ann. Statist., 14, 1261-1295.