Outline for Class meeting 25 (Chapter 9, Lohr, 4/19/04)
Variance Estimation
I. Often, analytic assessment of the variance of estimators made from probability designs
can be done using the simple relationship
V\left( \sum_{i=1}^{k} a_i \hat{t}_i \right) = \sum_{i=1}^{k} a_i^2 V(\hat{t}_i) + 2 \sum_{i=1}^{k} \sum_{j=i+1}^{k} a_i a_j Cov(\hat{t}_i, \hat{t}_j) .
A similar relationship holds for estimates of means. (See Lohr, p. 290.)
If the sample design is a simple random sample of size n, then
Cov(\hat{t}_i, \hat{t}_j) = N^2 (1 - f) S_{ij} / n ,
where S_{ij} = \sum_{k=1}^{N} (y_{ik} - \bar{y}_i)(y_{jk} - \bar{y}_j) / (N - 1). This can be estimated the same way as the variance
estimators, by substituting the sample version of the quantity for the population one.
There are two kinds of situations in which this relationship is helpful.
A. When the parameter of interest really is a linear combination of totals or means.
Example: Visitor’s survey
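To make the formula concrete, here is a minimal sketch in Python; it assumes an SRS of size n from a population of size N, and the function names and the two-variable example data are made up for illustration. It plugs estimated SRS covariances of totals into the linear-combination formula above.

```python
import numpy as np

def srs_total_cov(y_i, y_j, N):
    """Estimated Cov(t_hat_i, t_hat_j) under SRS: N^2 (1 - f) s_ij / n."""
    n = len(y_i)
    s_ij = np.cov(y_i, y_j, ddof=1)[0, 1]   # sample analogue of S_ij
    return N**2 * (1 - n / N) * s_ij / n

def linear_comb_var(ys, a, N):
    """Estimated V(sum_i a_i t_hat_i), built term by term from the formula above."""
    k = len(ys)
    v = 0.0
    for i in range(k):
        v += a[i]**2 * srs_total_cov(ys[i], ys[i], N)               # variance terms
        for j in range(i + 1, k):
            v += 2 * a[i] * a[j] * srs_total_cov(ys[i], ys[j], N)   # covariance terms
    return v

# Hypothetical example: difference of two estimated totals, a = (1, -1)
rng = np.random.default_rng(1)
N, n = 5000, 100
y1, y2 = rng.normal(10, 2, n), rng.normal(8, 3, n)
print(linear_comb_var([y1, y2], [1, -1], N))
```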
B. When the parameter of interest isn’t really a linear combination of totals or means, but
can be approximated by one. This is the Taylor linearization approach discussed
previously, leading to
V h(tˆ)  h(t )2 V (tˆ) ,
or in the multivariate case
V[h(\hat{t}_1, \ldots, \hat{t}_k)] \approx \sum_{i=1}^{k} \left( \frac{\partial h}{\partial t_i} \right)^2 V(\hat{t}_i) + \sum_{i \neq j} \frac{\partial h}{\partial t_i} \frac{\partial h}{\partial t_j} Cov(\hat{t}_i, \hat{t}_j) ,
where the partial derivatives are evaluated at (t_1, \ldots, t_k).
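As a small illustration of the multivariate formula, the sketch below applies linearization to the ratio \hat{t}_y / \hat{t}_x, whose partial derivatives are 1/t_x and -t_y/t_x^2. It reuses the SRS covariance helper from Section I; the function names are invented here, and the derivatives are evaluated at the estimated totals (the usual plug-in step).

```python
import numpy as np

def srs_total_cov(y_i, y_j, N):
    """Estimated Cov(t_hat_i, t_hat_j) under SRS (same helper as in Section I)."""
    n = len(y_i)
    return N**2 * (1 - n / N) * np.cov(y_i, y_j, ddof=1)[0, 1] / n

def ratio_var_linearized(y, x, N):
    """Linearization variance of B_hat = t_hat_y / t_hat_x under SRS.

    Plugs dh/dt_y = 1/t_x and dh/dt_x = -t_y/t_x^2 (evaluated at the
    estimated totals) into the multivariate formula above.
    """
    t_y, t_x = N * np.mean(y), N * np.mean(x)
    d_y, d_x = 1.0 / t_x, -t_y / t_x**2
    return (d_y**2 * srs_total_cov(y, y, N)
            + d_x**2 * srs_total_cov(x, x, N)
            + 2 * d_y * d_x * srs_total_cov(y, x, N))

# Hypothetical example
rng = np.random.default_rng(2)
N, n = 2000, 80
x = rng.uniform(5, 15, n)
y = 2 * x + rng.normal(0, 1, n)
print(ratio_var_linearized(y, x, N))
```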
II. When surveys have very complicated designs, or when non-linear estimators are
required, it becomes either hard or impossible to calculate variances using analytical
methods, such as those we have discussed. There are alternatives.
An excellent reference for all these methods is a book by Kirk Wolter, Introduction to
Variance Estimation (1985).
A. Random Group Method
1. Denote your estimator by \hat{\theta}. The idea here is that if you conducted your survey in R
identical replicates, computed your estimator from each one (\hat{\theta}_r), and formed the
average of the estimators (called \tilde{\theta}), then its variance could be estimated unbiasedly by
\hat{V}_1(\tilde{\theta}) = \frac{1}{R(R-1)} \sum_{r=1}^{R} (\hat{\theta}_r - \tilde{\theta})^2 .
Another estimator (biased upward slightly) is
\hat{V}_2 = \frac{1}{R(R-1)} \sum_{r=1}^{R} (\hat{\theta}_r - \hat{\theta})^2 .
2. In practice, your sample design is not carried out in identical replicates. But you
pretend it was(!), and just divide up your sample into R identical pieces.
a. This is easy for a SRS.
b. In a cluster sample, you must keep all the units in a PSU together in the same piece to
preserve the correlation structure.
c. In a stratified multistage design, each random group contains a sample of PSUs from
each stratum.
3. Note that if k PSUs are sampled from the smallest stratum, you cannot have more than
k random groups. (A small code sketch of the random group computation follows.)
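The sketch below is a minimal illustration of the random group computation, assuming an SRS and a simple estimator such as the sample mean; for cluster or stratified designs the splitting would have to respect points b and c above. The helper names are hypothetical.

```python
import numpy as np

def random_group_variance(sample, estimator, R, seed=None):
    """Random group variance estimate (the V_hat_2 form above).

    sample    : 1-d array of SRS observations (for a cluster design you
                would split whole PSUs, not individual units)
    estimator : function mapping a subsample to theta_hat_r
    R         : number of random groups
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(sample))
    groups = np.array_split(idx, R)              # R (nearly) equal pieces
    theta_hat = estimator(sample)                # estimate from the full sample
    theta_r = np.array([estimator(sample[g]) for g in groups])
    return np.sum((theta_r - theta_hat) ** 2) / (R * (R - 1))

# Hypothetical example: variance of the sample mean from an SRS
y = np.random.default_rng(7).exponential(2.0, size=400)
print(random_group_variance(y, np.mean, R=10, seed=3))
```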
B. The jackknife
1. Background (Review for most of you, I think?)
The jackknife is an all-purpose tool (hence the name) for calculating variances when the
analytics are intractable. It was developed for "regular" (non-finite-population) parameter
estimation; it can also be used in finite population sampling, but less is known about its
performance there. The jackknife was popularized by John Tukey and was originally used
for bias reduction.
a. Denote your estimator by \hat{\theta}. (E.g., it could be an estimator of the ratio \hat{t}_y / \hat{t}_x.) Let \hat{\theta}_{(j)} be
the estimator of the same form as \hat{\theta}, but not using observation j. Then
\hat{V}_{JK}(\hat{\theta}) = \frac{n-1}{n} \sum_{j=1}^{n} (\hat{\theta}_{(j)} - \hat{\theta})^2 .
b. Why would this work?
An example for which you know the answer will illustrate (not prove) that this is
sensible. Suppose the estimator is of a total from an SRS. That is, let \hat{\theta} = \hat{t} = \frac{N}{n} \sum_{i} y_i .
Then \hat{\theta}_{(j)} = \frac{N}{n-1} \left( \frac{n}{N}\hat{t} - y_j \right), so that
(\hat{\theta}_{(j)} - \hat{\theta})^2 = \left( \frac{\hat{t} - N y_j}{n-1} \right)^2 , and
\hat{V}_{JK}(\hat{t}) = \frac{n-1}{n} \sum_{j=1}^{n} \left( \frac{\hat{t} - N y_j}{n-1} \right)^2 = \frac{N^2}{n(n-1)} \sum_{j=1}^{n} (y_j - \bar{y})^2 = \frac{N^2 s^2}{n} .
Note that this is the correct variance estimator, except for the missing fpc.
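The algebra above can be checked numerically. The sketch below (function names invented for illustration) computes the delete-one jackknife for \hat{t} = N\bar{y} and compares it with N^2 s^2 / n; the two should agree up to rounding.

```python
import numpy as np

def jackknife_variance(y, estimator):
    """Delete-one jackknife: (n-1)/n * sum_j (theta_hat_(j) - theta_hat)^2."""
    n = len(y)
    theta_hat = estimator(y)
    theta_minus_j = np.array([estimator(np.delete(y, j)) for j in range(n)])
    return (n - 1) / n * np.sum((theta_minus_j - theta_hat) ** 2)

# Numerical check of the algebra for a total estimated from an SRS
rng = np.random.default_rng(0)
N, n = 1000, 25
y = rng.normal(50, 10, n)
t_hat = lambda v: N * np.mean(v)
print(jackknife_variance(y, t_hat))    # jackknife estimate
print(N**2 * np.var(y, ddof=1) / n)    # N^2 s^2 / n (no fpc); should agree
```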
2. Jackknife in complex designs
a. As with other resampling-type variance estimators we have discussed, one leaves out a
PSU at a time when calculating \hat{\theta}_{(j)}. When stratification is present, the variance
estimation is done separately in each stratum. (A sketch of this delete-one-PSU idea appears after this list.)
b. The jackknife approach does not work well for estimators that are not smooth in the
data, like estimates of the median or other quantiles.
c. A classic reference for the jackknife and the bootstrap (not for finite populations, but in
general) is a monograph, The Jackknife, the Bootstrap and Other Resampling Plans
(1982), by Bradley Efron.
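One common way to carry out the delete-one-PSU idea in 2a is to rescale the weights of the remaining PSUs in a stratum by n_h / (n_h - 1) when one of its n_h PSUs is dropped, and to weight each stratum's squared deviations by (n_h - 1)/n_h. The sketch below is only an illustration under that assumption; the data layout, function names, and the weighted ratio estimator are all hypothetical.

```python
import numpy as np

def ratio_estimator(data):
    """Weighted ratio t_hat_y / t_hat_x over all strata and PSUs (hypothetical layout)."""
    num = sum(np.sum(psu['w'] * psu['y']) for stratum in data for psu in stratum)
    den = sum(np.sum(psu['w'] * psu['x']) for stratum in data for psu in stratum)
    return num / den

def delete_one_psu_jackknife(data, estimator):
    """Stratified delete-one-PSU jackknife (a sketch of the idea in 2a).

    data : list of strata; each stratum is a list of PSUs; each PSU is a
           dict with arrays 'w' (weights), 'y', 'x'.
    When PSU j of stratum h is deleted, the remaining PSUs in stratum h
    have their weights rescaled by n_h / (n_h - 1).
    """
    theta_hat = estimator(data)
    v = 0.0
    for h, stratum in enumerate(data):
        n_h = len(stratum)
        for j in range(n_h):
            replicate = list(data)                      # copy the list of strata
            kept = [psu for i, psu in enumerate(stratum) if i != j]
            replicate[h] = [dict(psu, w=psu['w'] * n_h / (n_h - 1)) for psu in kept]
            v += (n_h - 1) / n_h * (estimator(replicate) - theta_hat) ** 2
    return v

# Hypothetical example: 2 strata with 5 and 6 sampled PSUs
rng = np.random.default_rng(4)
def make_psu(m):
    x = rng.uniform(5, 15, m)
    return {'w': np.full(m, 10.0), 'x': x, 'y': 2 * x + rng.normal(0, 1, m)}
data = [[make_psu(4) for _ in range(5)], [make_psu(3) for _ in range(6)]]
print(delete_one_psu_jackknife(data, ratio_estimator))
```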
III. Software (http://www.fas.harvard.edu/~stats/survey-soft/METHODS.html)
Summary of survey software: Methods
SAS
Taylor expansion.
Stata
Taylor-series linearization is used in the survey analysis commands. There are also commands for jackknife
and bootstrap variance estimation, although these are not specifically oriented to survey data.
SUDAAN
The Taylor series linearization method (GEE for regression models) is used combined with variance
estimation formulas specific to the sample design. The user does not need to develop special replicate
weights since the sample design can be specified directly to the program.
Jackknife and Balanced Repeated Replication (BRR) variance estimation is also
supported.
WesVar
Balanced repeated replication (including the Fay method), jackknife (several variants) and other replication
methods specified by users through the development of replicate weights (e.g., bootstrap).