Variance Estimation in Complex Surveys

Drew Hardin
Kinfemichael Gedif
So far..

Variance for estimated mean and total
under
 SRS, Stratified, Cluster (single, multi-stage), etc.

Variance for estimating a ratio of two
means under
 SRS (we used linearization method)
What about other cases?

Variance for estimators that are not linear
combinations of means and totals
– Ratios

Variance for estimating other statistics from
complex surveys
– Median, quantiles, functions of EMF, etc.

Other approaches are necessary
Outline

Variance Estimation Methods
– Linearization
– Random Group Methods
– Balanced Repeated Replication (BRR)
– Resampling techniques
 Jackknife, Bootstrap
Adapting to complex surveys
 ‘Hot’ research areas
 Reference

Linearization (Taylor Series
Methods)
We have seen this before (ratio estimator
and other courses).
 Suppose our statistic is non-linear. It can
often be approximated using Taylor’s
Theorem.
 We know how to calculate variances of
linear functions of means and totals.

Linearization (Taylor Series
Methods)

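Linearize

\[
h(\hat t_1, \hat t_2, \ldots, \hat t_k) \approx h(t_1, t_2, \ldots, t_k)
  + \sum_{j=1}^{k} \left. \frac{\partial h(c_1, c_2, \ldots, c_k)}{\partial c_j} \right|_{t_1, t_2, \ldots, t_k} (\hat t_j - t_j)
\]

 Calculate Variance

\[
V\left[ h(\hat t_1, \ldots, \hat t_k) \right] \approx
  \left( \frac{\partial h}{\partial c_1} \right)^2 V(\hat t_1) + \cdots
  + \left( \frac{\partial h}{\partial c_k} \right)^2 V(\hat t_k)
  + \sum_{i \neq j} \left( \frac{\partial h}{\partial c_i} \right) \left( \frac{\partial h}{\partial c_j} \right) \mathrm{Cov}(\hat t_i, \hat t_j)
\]

All partial derivatives are evaluated at \( (t_1, \ldots, t_k) \).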
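As a minimal sketch of these two steps, take the ratio h(c1, c2) = c1/c2 of two sample means under SRS. The function names, the sample data, and the optional population size N are illustrative assumptions, not part of the slides. The first function plugs the partial derivatives into the variance formula; the second uses the algebraically equivalent residual form with e_i = y_i - B x_i, so the two results should agree up to rounding.

```python
from statistics import mean


def cov(u, v):
    """Sample covariance with divisor n - 1 (cov(u, u) is the sample variance)."""
    ub, vb = mean(u), mean(v)
    return sum((a - ub) * (b - vb) for a, b in zip(u, v)) / (len(u) - 1)


def ratio_taylor_variance(y, x, N=None):
    """Linearization variance of ybar / xbar under SRS (N=None ignores the fpc)."""
    n = len(y)
    fpc = 1.0 if N is None else 1.0 - n / N
    ybar, xbar = mean(y), mean(x)
    # Partial derivatives of h(c1, c2) = c1 / c2, evaluated at the estimates
    d1 = 1.0 / xbar
    d2 = -ybar / xbar ** 2
    v_y = fpc * cov(y, y) / n
    v_x = fpc * cov(x, x) / n
    c_yx = fpc * cov(y, x) / n
    # The sum over ordered pairs i != j contributes the covariance term twice
    return d1 ** 2 * v_y + d2 ** 2 * v_x + 2 * d1 * d2 * c_yx


def ratio_residual_variance(y, x, N=None):
    """Same quantity written with residuals e_i = y_i - B * x_i."""
    n = len(y)
    fpc = 1.0 if N is None else 1.0 - n / N
    b = mean(y) / mean(x)
    e = [yi - b * xi for yi, xi in zip(y, x)]
    return fpc * cov(e, e) / (n * mean(x) ** 2)


# Hypothetical sample of (y, x) pairs
y = [12.0, 15.0, 9.0, 20.0, 14.0]
x = [4.0, 5.0, 3.0, 7.0, 5.0]
v_taylor = ratio_taylor_variance(y, x)
v_resid = ratio_residual_variance(y, x)
print(v_taylor, v_resid)
```

The residual form is usually the more convenient one in practice because it needs only a single variance calculation on the e_i.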

Linearization (Taylor Series)
Methods
– Pro:
 Can be applied in general sampling designs
 Theory is well developed
 Software is available
– Con:
 Finding partial derivatives may be difficult
 Different method is needed for each statistic
 The function of interest may not be expressible as a
smooth function of population totals or means
 Accuracy depends on how well the linearization approximates the statistic
Random Group Methods
Based on the concept of replicating the survey
design
 Not usually possible to merely go and replicate
the survey
 However, the survey can often be divided into R
groups so that each group forms a miniature
version of the survey

Random Group Methods
[Figure: five strata (Stratum 1 to Stratum 5), each with units numbered 1 to 8. Units with the same number across the five strata are grouped and treated as a miniature sample.]

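Unbiased Estimator (Average of Samples)

\[
\tilde\theta = \frac{1}{R} \sum_{r=1}^{R} \hat\theta_r , \qquad
\hat V_1(\tilde\theta) = \frac{1}{R} \, \frac{1}{R-1} \sum_{r=1}^{R} (\hat\theta_r - \tilde\theta)^2
\]

Slightly Biased Estimator (All Data)

\[
\hat V_2(\hat\theta) = \frac{1}{R} \, \frac{1}{R-1} \sum_{r=1}^{R} (\hat\theta_r - \hat\theta)^2
\]

Here \( \hat\theta_r \) is the estimate from random group r and \( \hat\theta \) is the estimate computed from all the data.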
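The two random group estimators can be sketched as follows. The function name, the choice of the median as the statistic, and the four groups of data are assumptions made for illustration; in a real survey each group would have to preserve the full design.

```python
from statistics import mean, median


def random_group_variances(groups, estimator):
    """Return (theta_tilde, V1, V2) from R random groups.

    V1 centers the group estimates at their own average (theta_tilde);
    V2 centers them at the estimate computed from all the data.
    """
    R = len(groups)
    group_estimates = [estimator(g) for g in groups]
    theta_tilde = mean(group_estimates)
    all_data = [value for g in groups for value in g]
    theta_hat = estimator(all_data)
    v1 = sum((t - theta_tilde) ** 2 for t in group_estimates) / (R * (R - 1))
    v2 = sum((t - theta_hat) ** 2 for t in group_estimates) / (R * (R - 1))
    return theta_tilde, v1, v2


# Hypothetical data: R = 4 random groups, statistic = median (a non-smooth function)
groups = [
    [3, 7, 8, 5, 12],
    [4, 9, 6, 10, 2],
    [11, 5, 7, 3, 8],
    [6, 6, 9, 13, 4],
]
theta_tilde, v1, v2 = random_group_variances(groups, median)
print(theta_tilde, v1, v2)
```

Because the average of the group estimates minimizes the sum of squared deviations, V2 can never be smaller than V1.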
Random Group Methods

Pro:
– Easy to calculate
– General method (can also be used for non smooth
functions)

Con:
– Assumes the groups are independent (a problem when N
is small)
– Small number of groups (particularly if one stratum is
sampled only a few times)
– Survey design must be replicated in each random
group (the strata and cluster structure must remain the
same)
Resampling and Replication Methods

Balanced Repeated Replication (BRR)
– Special case when nh=2
 Jackknife (Quenouille (1949); Tukey (1958))
 Bootstrap (Efron (1979); Shao and Tu (1995))
 These methods
– Extend the idea of the random group method
– Allow replicate groups to overlap
– Are all-purpose methods
– Asymptotic properties?
Balanced Repeated Replication
Suppose we had sampled 2 per stratum
 There are 2^H ways to pick 1 from each
of the H strata.
 Each combination could be treated as a
sample.
 Pick R samples.

Balanced Repeated Replication

Which samples should we include?
– Assign each value either 1 or –1 within the stratum
– Select samples that are orthogonal to one another to
create balance
– You can use the design matrix for a fractional factorial
– Specify a vector a_r of 1, -1 values, one for each stratum

Estimator

\[
\hat V_{BRR}(\hat\theta) = \frac{1}{R} \sum_{r=1}^{R} \left( \hat\theta(a_r) - \hat\theta \right)^2
\]
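A minimal sketch of BRR for a stratified design with two PSUs per stratum follows. It assumes a Sylvester-type Hadamard matrix as the source of balanced ±1 vectors (one common choice, not necessarily the one used in the slides); the function names and data are hypothetical. For a linear statistic, a fully balanced set of half-samples reproduces the usual stratified variance estimator, which the example checks.

```python
def hadamard(order):
    """Sylvester construction of a Hadamard matrix; order must be a power of two."""
    H = [[1]]
    while len(H) < order:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


def brr_variance(strata, weights, g=lambda m: m):
    """BRR variance of g(weighted mean) with exactly two PSUs per stratum.

    strata  : list of (y_h1, y_h2) pairs
    weights : stratum weights W_h (assumed to sum to 1)
    """
    n_strata = len(strata)
    R = 1
    while R <= n_strata:
        R *= 2  # smallest power of two exceeding the number of strata
    had = hadamard(R)
    theta_hat = g(sum(W * (a + b) / 2 for W, (a, b) in zip(weights, strata)))
    total = 0.0
    for r in range(R):
        # Column 0 is all ones, so stratum h uses column h + 1 to keep balance
        half_sample_mean = sum(
            W * (pair[0] if had[r][h + 1] == 1 else pair[1])
            for h, (W, pair) in enumerate(zip(weights, strata))
        )
        total += (g(half_sample_mean) - theta_hat) ** 2
    return theta_hat, total / R


# Hypothetical data: three strata, two PSU values each
strata = [(10.0, 14.0), (20.0, 18.0), (7.0, 9.0)]
weights = [0.5, 0.3, 0.2]
theta_hat, v_brr = brr_variance(strata, weights)
# Stratified estimator for n_h = 2: sum of W_h^2 * (y_h1 - y_h2)^2 / 4
v_stratified = sum(W ** 2 * (a - b) ** 2 / 4 for W, (a, b) in zip(weights, strata))
print(theta_hat, v_brr, v_stratified)
```

With 3 strata only R = 4 replicates are needed instead of all 2^3 = 8 half-samples, which is the point of balancing.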
Balanced Repeated Replication

Pro
– Relatively few computations
– Asymptotically equivalent to linearization methods for
smooth functions of population totals and quantiles
– Can be extended to use weights

Con
– Requires 2 PSUs per stratum
 Can be extended with more complex schemes
The Jackknife
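SRS with replacement

Quenouille (1949); Tukey (1958); Shao and Tu (1995)
Let \( \hat\theta_{(i)} \) be the estimator of \( \theta \) after omitting the ith observation

Jackknife estimate

\[
\tilde\theta_J = \sum_{i=1}^{n} \tilde\theta_i / n , \quad \text{where } \tilde\theta_i = n \hat\theta - (n-1) \hat\theta_{(i)}
\]

Jackknife estimator of \( V(\hat\theta) \)

\[
V_J(\hat\theta) = \frac{n-1}{n} \sum_{i=1}^{n} \left( \hat\theta_{(i)} - \hat\theta_{(\cdot)} \right)^2
  = \frac{1}{n(n-1)} \sum_{i=1}^{n} \left( \tilde\theta_i - \tilde\theta_J \right)^2 ,
\quad \text{where } \hat\theta_{(\cdot)} = \sum_{i=1}^{n} \hat\theta_{(i)} / n
\]

For stratified SRS without replacement, see Jones (1974)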
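The delete-one jackknife for an SRS can be sketched as below. The function name and the data are illustrative assumptions. Both forms of the variance (leave-one-out estimates and pseudovalues) are computed so that their equality can be checked; for the sample mean they both reduce to s^2/n.

```python
from statistics import mean, variance


def jackknife(data, estimator):
    """Delete-one jackknife: returns (theta_J, V from leave-one-out, V from pseudovalues)."""
    n = len(data)
    theta_hat = estimator(data)
    # Estimator recomputed with observation i removed
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    # Pseudovalues: n * theta_hat - (n - 1) * theta_hat_(i)
    pseudo = [n * theta_hat - (n - 1) * t for t in leave_one_out]
    theta_j = mean(pseudo)
    theta_dot = mean(leave_one_out)
    v_loo = (n - 1) / n * sum((t - theta_dot) ** 2 for t in leave_one_out)
    v_pseudo = sum((p - theta_j) ** 2 for p in pseudo) / (n * (n - 1))
    return theta_j, v_loo, v_pseudo


# Hypothetical sample; the estimator here is the sample mean
data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
theta_j, v_loo, v_pseudo = jackknife(data, mean)
print(theta_j, v_loo, v_pseudo, variance(data) / len(data))
```

Any function of the data list can be passed as `estimator`, for example a ratio or a correlation computed from a list of pairs.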
The Jackknife
stratified multistage design
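In stratum h, delete one PSU at a time
 Let \( \hat\theta^{(hi)} \) be the estimator of the same form as \( \hat\theta \)
when PSU i of stratum h is omitted
 Jackknife estimate:

\[
\bar y^{(hi)} = \sum_{h' \neq h} W_{h'} \bar y_{h'} + W_h \frac{n_h \bar y_h - y_{hi}}{n_h - 1} , \quad \text{where } \hat\theta^{(hi)} = g(\bar y^{(hi)})
\]

Or using pseudovalues

\[
\tilde\theta^{(hi)} = n_h \hat\theta - (n_h - 1) \hat\theta^{(hi)}
\]

\[
\tilde\theta_J^{(I)} = \sum_{h=1}^{L} \sum_{i=1}^{n_h} \tilde\theta^{(hi)} / n ; \qquad
\tilde\theta_J^{(II)} = \frac{1}{L} \sum_{h=1}^{L} \frac{1}{n_h} \sum_{i=1}^{n_h} \tilde\theta^{(hi)}
\]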
The Jackknife
stratified multistage design

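Different formulae for \( V(\hat\theta) \)

\[
V_L(\hat\theta) = \sum_{h=1}^{L} \frac{n_h - 1}{n_h} \sum_{i=1}^{n_h} \left( \hat\theta^{(hi)} - \hat\theta^{\text{method}} \right)^2
\]

Where \( \hat\theta^{\text{method}} \) can be \( \hat\theta^{(h)} \), \( \hat\theta \), \( \sum_{h=1}^{L} \sum_{i=1}^{n_h} \hat\theta^{(hi)} / n \), or \( \sum_{h=1}^{L} \hat\theta^{(h)} / L \)

Using the pseudovalues

\[
V_L(\hat\theta) = \sum_{h=1}^{L} \frac{1}{n_h (n_h - 1)} \sum_{i=1}^{n_h} \left( \tilde\theta^{(hi)} - \tilde\theta_J^{(j)} \right)^2 , \quad j = I, II
\]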
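A sketch of the delete-one-PSU jackknife for a stratified design, using the centering choice θ̂ (one of the options listed above). It assumes one observed value per PSU and a statistic of the form g(ȳ); the function name, weights, and data are hypothetical. In the linear case g(ȳ) = ȳ the result should match the usual stratified estimator, the sum of W_h^2 s_h^2 / n_h.

```python
from statistics import mean, variance


def stratified_jackknife_variance(strata, weights, g):
    """Delete-one-PSU jackknife variance of g(ybar), centered at theta_hat."""
    ybar_h = [mean(s) for s in strata]
    ybar = sum(W * m for W, m in zip(weights, ybar_h))
    theta_hat = g(ybar)
    v = 0.0
    for h, (s, W) in enumerate(zip(strata, weights)):
        n_h = len(s)
        for y_hi in s:
            # Replace stratum h's mean by the mean of its remaining PSUs
            ybar_del = ybar - W * ybar_h[h] + W * (n_h * ybar_h[h] - y_hi) / (n_h - 1)
            v += (n_h - 1) / n_h * (g(ybar_del) - theta_hat) ** 2
    return theta_hat, v


# Hypothetical PSU values in three strata, with assumed stratum weights W_h
strata = [[10.0, 12.0, 15.0], [20.0, 22.0, 19.0, 25.0], [7.0, 9.0]]
weights = [0.5, 0.3, 0.2]

theta_lin, v_lin = stratified_jackknife_variance(strata, weights, g=lambda m: m)
v_textbook = sum(W ** 2 * variance(s) / len(s) for s, W in zip(strata, weights))

# A nonlinear statistic for comparison: g(ybar) = ybar squared
theta_sq, v_sq = stratified_jackknife_variance(strata, weights, g=lambda m: m ** 2)
print(v_lin, v_textbook, v_sq)
```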
The Jackknife
Asymptotics

Krewski and Rao (1981)
Based on the concept of a sequence of finite populations \( \{ \Pi_L \} \)
with L strata in \( \Pi_L \)

Under conditions C1-C6 given in the paper, as \( L \to \infty \)

i) \( n^{1/2} (\hat\theta - \theta) \to_d N(0, \sigma^2) \)
ii) \( n V_{\text{method}}(\hat\theta) \to_p \sigma^2 \)
iii) \( T_{\text{method}} = (\hat\theta - \theta) / \sqrt{V_{\text{method}}(\hat\theta)} \to_d N(0, 1) \)

Where method is the estimator used (Linearization, BRR, Jackknife)
The Bootstrap
Naïve bootstrap


Efron (1979); Rao and Wu (1988); Shao and Tu (1995)
Resample y  with replacement in stratum h
* nh
hi i 1
yh*(b )  nh
1
y
i
*(b )
hi
,
y *(b )  h yh*(b ) , and ˆ*(b )  g ( y * )

Estimate: b  1,2,..., B

Variance: VˆNBS (ˆ* )  E* (ˆ*  E* (ˆ* )) 2


B
1
ˆ*(b )  ˆ*. )
– Or approximate by Vˆ * (ˆ* ) 
(


NBS
B  1 b 1

The estimator is not a consistent estimator of the
variance of a general nonlinear statistic
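The Monte Carlo version of the naive bootstrap can be sketched as follows. The function name, the seed, B = 4000, and the data are assumptions for illustration. In the linear case the exact bootstrap variance is known in closed form, so the example compares the Monte Carlo value with it and with the customary unbiased estimator to show the shortfall.

```python
import random
from statistics import mean, variance


def naive_bootstrap_variance(strata, weights, g, B=4000, seed=1):
    """Monte Carlo approximation of the naive bootstrap variance of g(ybar*)."""
    rng = random.Random(seed)
    reps = []
    for _ in range(B):
        ybar_star = 0.0
        for s, W in zip(strata, weights):
            # Resample n_h values with replacement within stratum h
            draw = rng.choices(s, k=len(s))
            ybar_star += W * mean(draw)
        reps.append(g(ybar_star))
    center = mean(reps)
    return sum((t - center) ** 2 for t in reps) / (B - 1)


# Hypothetical PSU values in three strata, with assumed stratum weights W_h
strata = [[10.0, 12.0, 15.0], [20.0, 22.0, 19.0, 25.0], [7.0, 9.0]]
weights = [0.5, 0.3, 0.2]

v_nbs = naive_bootstrap_variance(strata, weights, g=lambda m: m)
# Linear case, exact bootstrap variance: sum of W_h^2 (n_h - 1) s_h^2 / n_h^2
v_exact_boot = sum(
    W ** 2 * (len(s) - 1) * variance(s) / len(s) ** 2 for s, W in zip(strata, weights)
)
# Customary unbiased estimator: sum of W_h^2 s_h^2 / n_h
v_unbiased = sum(W ** 2 * variance(s) / len(s) for s, W in zip(strata, weights))
print(v_nbs, v_exact_boot, v_unbiased)
```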
The Bootstrap
Naïve bootstrap

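For \( \hat\theta^* = \sum_h W_h \bar y_h^* = \bar y^* \)

\[
\mathrm{Var}(\bar y^*) = \sum_h \frac{W_h^2}{n_h} \left( \frac{n_h - 1}{n_h} \right) s_h^2
\]

Comparing with

\[
\mathrm{Var}(\bar y) = \sum_h \frac{W_h^2}{n_h} s_h^2
\]

The ratio \( \mathrm{Var}(\bar y^*) / \mathrm{Var}(\bar y) \) does not converge to 1 for
bounded \( n_h \)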
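To make the comparison concrete, here is the ratio worked out under the illustrative assumption that every stratum has the same sample size, \( n_h = n_0 \):

```latex
\[
\frac{\mathrm{Var}(\bar y^*)}{\mathrm{Var}(\bar y)}
= \frac{\sum_h \frac{W_h^2}{n_0} \cdot \frac{n_0 - 1}{n_0} \, s_h^2}{\sum_h \frac{W_h^2}{n_0} \, s_h^2}
= \frac{n_0 - 1}{n_0} ,
\qquad \text{e.g. } n_0 = 2 \;\Rightarrow\; \frac{\mathrm{Var}(\bar y^*)}{\mathrm{Var}(\bar y)} = \frac{1}{2} .
\]
```

The ratio stays at \( (n_0 - 1)/n_0 \) however many strata are added, so with two PSUs per stratum the naive bootstrap variance is half the customary estimate even as L grows.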
The Bootstrap
Modified bootstrap
Resample \( \{ y^*_{hi} \}_{i=1}^{m_h} \), \( m_h \ge 1 \), with replacement in stratum h
 Calculate:

\[
\tilde y_{hi} = \bar y_h + \frac{m_h^{1/2}}{(n_h - 1)^{1/2}} \left( y^*_{hi} - \bar y_h \right)
\]

\[
\tilde y_h = \sum_{i=1}^{m_h} \tilde y_{hi} / m_h , \quad
\tilde y = \sum_{h=1}^{L} W_h \tilde y_h , \quad
\tilde\theta = g(\tilde y)
\]

Variance:

\[
\hat V_{MBS}(\tilde\theta) = E_* \left( \tilde\theta - E_*(\tilde\theta) \right)^2
\]

 Can be approximated with Monte Carlo
 For the linear case, it reduces to the customary
unbiased variance estimator


mh < nh
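The rescaling step can be sketched as below for the with-replacement case. The function name, the default choice m_h = n_h - 1, the seed, B = 4000, and the data are assumptions for illustration. For the linear statistic the Monte Carlo value should land close to the customary unbiased estimator, which is what the rescaling is designed to achieve.

```python
import math
import random
from statistics import mean, variance


def rescaled_bootstrap_variance(strata, weights, g, m=None, B=4000, seed=1):
    """Monte Carlo approximation of the modified (rescaled) bootstrap variance."""
    rng = random.Random(seed)
    ybar_h = [mean(s) for s in strata]
    if m is None:
        m = [len(s) - 1 for s in strata]  # assumed default: m_h = n_h - 1
    reps = []
    for _ in range(B):
        y_tilde = 0.0
        for s, W, yb, m_h in zip(strata, weights, ybar_h, m):
            n_h = len(s)
            scale = math.sqrt(m_h / (n_h - 1))
            # Draw m_h values with replacement, then rescale around the stratum mean
            draw = rng.choices(s, k=m_h)
            y_tilde_h = mean([yb + scale * (y - yb) for y in draw])
            y_tilde += W * y_tilde_h
        reps.append(g(y_tilde))
    center = mean(reps)
    return sum((t - center) ** 2 for t in reps) / (B - 1)


# Hypothetical PSU values in three strata, with assumed stratum weights W_h
strata = [[10.0, 12.0, 15.0], [20.0, 22.0, 19.0, 25.0], [7.0, 9.0]]
weights = [0.5, 0.3, 0.2]

v_mbs = rescaled_bootstrap_variance(strata, weights, g=lambda m: m)
# Customary unbiased estimator in the linear case: sum of W_h^2 s_h^2 / n_h
v_unbiased = sum(W ** 2 * variance(s) / len(s) for s, W in zip(strata, weights))
print(v_mbs, v_unbiased)
```

Passing a different list `m` (one resample size per stratum) changes only the resample size and the scale factor; the target variance in the linear case stays the same.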
More on bootstrap

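The method can be extended to stratified SRS
without replacement by simply changing \( \tilde y_{hi} \) to

\[
\tilde y_{hi} = \bar y_h + \frac{m_h^{1/2}}{(n_h - 1)^{1/2}} (1 - f_h)^{1/2} \left( y^*_{hi} - \bar y_h \right)
\]

where \( f_h = n_h / N_h \) is the sampling fraction in stratum h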
 For mh=nh-1 in the with-replacement case, the rescaling factor equals 1, so this method reduces to the naïve bootstrap with resample size nh-1
 For nh=2, mh=1, the method reduces to the
random half-sample replication method
 For nh>3, choice of mh …see Rao and Wu (1988)

Simulation
Rao and Wu (1988)





– Jackknife and linearization intervals showed
substantial bias for nonlinear statistics in one-sided
intervals
– The bootstrap performs best for one-sided intervals
(especially when mh=nh-1)
– For two-sided intervals, the three methods have
similar coverage probabilities
– The jackknife and linearization methods are more
stable than the bootstrap
– B=200 bootstrap replicates is sufficient
‘Hot’ topics
Jackknife with non-smooth functions (Rao
and Sitter 1996)
 Two-phase variance estimation (Graubard
and Korn 2002; Rubin-Bleuer and Schiopu-Kratina 2005)
 Estimating Function (EF) bootstrap method
(Rao and Tausi 2004)

Software





OSIRIS – BRR, Jackknife
SAS – Linearization
Stata – Linearization
SUDAAN – Linearization, Bootstrap, Jackknife
WesVar – BRR, Jackknife, Bootstrap
References:
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7, 1-26.
Graubard, B. I., and Korn, E. L. (2002). Inference for superpopulation parameters using sample surveys. Statistical Science, 17, 73-96.
Krewski, D., and Rao, J. N. K. (1981). Inference from stratified samples: Properties of linearization, jackknife, and balanced replication methods. Annals of Statistics, 9, 1010-1019.
Quenouille, M. H. (1949). Problems in plane sampling. Annals of Mathematical Statistics, 20, 355-375.
Rao, J. N. K., and Wu, C. F. J. (1988). Resampling inference with complex survey data. JASA, 83, 231-241.
Rao, J. N. K., and Tausi, M. (2004). Estimating function variance estimation under stratified multistage sampling. Communications in Statistics, 33, 2087-2095.
Rao, J. N. K., and Sitter, R. R. (1996). Discussion of Shao's paper. Statistics, 27, 246-247.
Rubin-Bleuer, S., and Schiopu-Kratina, I. (2005). On the two-phase framework for joint model and design-based inference. Annals of Statistics (to appear).
Shao, J., and Tu, D. (1995). The Jackknife and Bootstrap. New York: Springer-Verlag.
Tukey, J. W. (1958). Bias and confidence in not-quite large samples. Annals of Mathematical Statistics, 29, 614.
Not cited in the presentation
Wolter, K. M. (1985). Introduction to Variance Estimation. New York: Springer-Verlag.
Shao, J. (1996). Resampling methods in sample surveys. Invited paper, Statistics, 27, 203-237, with discussion, 237-254.
