Variance Estimation in Complex Surveys

Third International Conference on Establishment Surveys
Montreal, Quebec
June 18-21, 2007

Presented by: Kirk Wolter, NORC and the University of Chicago

Outline of Lecture

- Introduction (Chapter 1)
- Textbook Methods (Chapter 1)
- Replication-Based Methods
  - Random Group (Chapter 2)
  - Balanced Half-Samples (Chapter 3)
  - Jackknife (Chapter 4)
  - Bootstrap (Chapter 5)
- Taylor Series (Chapter 6)
- Generalized Variance Functions (Chapter 7)

Chapter 1: Introduction

Notation and Basic Definitions

1. Finite population, $U = \{1, \ldots, N\}$
- Residents of Canada
- Restaurants in Montreal
- Farms in Quebec
- Schools in Ottawa

2. Sample, $s$
- Simple random sampling, without replacement
- Systematic sampling
- Stratification
- Clustering
- Double sampling


3. Probability sampling design, $P(\cdot)$
- $P(s) \ge 0$, $\sum_s P(s) = 1$

4. Characteristic of interest, $Y_i$
- $Y_i = 1$ if the $i$-th resident is employed, $Y_i = 0$ if not employed
- $Y_i$ = yield in tons of the $i$-th farm


5. Parameter, $\theta$
- Proportion of residents who are employed
- Total production of farms
- Trend in price index for restaurants
- Regression of sales on area for pharmacies

6. Estimator, $\hat\theta$
- $\hat\theta = \hat\theta(s)$


7. Expectation and variance
- $E\{\hat\theta\} = \sum_s P(s)\, \hat\theta(s)$
- $\mathrm{Var}\{\hat\theta\} = E\{(\hat\theta - E\{\hat\theta\})^2\} = E\{\hat\theta^2\} - (E\{\hat\theta\})^2$

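8. Estimator of variance, $v(\hat\theta)$
- Unbiased: $E\{v(\hat\theta)\} = \mathrm{Var}\{\hat\theta\}$
- Consistent: $P\{\, |v(\hat\theta) / \mathrm{Var}\{\hat\theta\} - 1| > \epsilon \,\} \to 0$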

Textbook Methods

1. Design: srs wor of size $n$

Estimator: $\hat Y = f^{-1} \sum_{i=1}^{n} y_i$, with $f = n/N$

Variance estimator: $v(\hat Y) = N^2 (1 - f)\, s^2 / n$, where
$s^2 = \sum_{i=1}^{n} (y_i - \bar y)^2 / (n - 1)$ and $\bar y = \sum_{i=1}^{n} y_i / n$
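A minimal Python sketch of the srs wor total estimator and its textbook variance estimator. The function name `srswor_total` and the sample values are hypothetical and serve only as an illustration.

```python
from statistics import variance

def srswor_total(y, N):
    """Estimate a population total and its variance from an srs wor sample y."""
    n = len(y)
    f = n / N                        # sampling fraction f = n / N
    total_hat = sum(y) / f           # Y-hat = f^{-1} * sum of y_i
    s2 = variance(y)                 # s^2 with divisor n - 1
    v = N ** 2 * (1 - f) * s2 / n    # v(Y-hat) = N^2 (1 - f) s^2 / n
    return total_hat, v

# Hypothetical data: customers served at n = 5 of N = 50 restaurants.
y = [12, 15, 9, 20, 14]
total_hat, v = srswor_total(y, N=50)
print(total_hat, v)
```

The finite population correction `(1 - f)` shrinks the variance estimate toward zero as the sample approaches a census.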

Textbook Methods

2. Design: srs wor at both the first and second stages of sampling

Estimator: $\hat Y = f_1^{-1} \sum_{i=1}^{n} f_{2i}^{-1} \sum_{j=1}^{m_i} y_{ij}$, with $f_1 = n/N$ and $f_{2i} = m_i / M_i$

Variance estimator:
$v(\hat Y) = N^2 (1 - f_1) \sum_{i=1}^{n} (M_i \bar y_{i.} - \hat Y / N)^2 / \{n(n-1)\} + (N/n) \sum_{i=1}^{n} M_i^2 (1 - f_{2i})\, s_i^2 / m_i$, where
$s_i^2 = \sum_{j=1}^{m_i} (y_{ij} - \bar y_{i.})^2 / (m_i - 1)$ and $\bar y_{i.} = \sum_{j=1}^{m_i} y_{ij} / m_i$
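The two-stage formulas can be sketched in Python as follows. This is an illustration under the stated design (srs wor at both stages), using the standard two-component textbook form with divisor $n(n-1)$ in the between-PSU term; the function name `two_stage_total` and the data are hypothetical.

```python
from statistics import mean, variance

def two_stage_total(clusters, M, N):
    """clusters: y_ij values for each of the n sampled PSUs;
    M: PSU sizes M_i; N: number of PSUs in the population."""
    n = len(clusters)
    f1 = n / N
    # Estimated PSU totals M_i * ybar_i.
    psu_totals = [M_i * mean(y_i) for M_i, y_i in zip(M, clusters)]
    total_hat = sum(psu_totals) / f1
    # Between-PSU component: N^2 (1 - f1) * sum (M_i ybar_i. - Y-hat/N)^2 / (n (n - 1))
    between = N ** 2 * (1 - f1) * variance(psu_totals) / n
    # Within-PSU component: (N / n) * sum M_i^2 (1 - f2i) s_i^2 / m_i
    within = 0.0
    for M_i, y_i in zip(M, clusters):
        m_i = len(y_i)
        f2i = m_i / M_i
        within += M_i ** 2 * (1 - f2i) * variance(y_i) / m_i
    within *= N / n
    return total_hat, between + within

# Hypothetical sample: n = 3 PSUs out of N = 10.
total_hat, v = two_stage_total([[2, 4], [3, 5, 7], [6, 6]], M=[4, 6, 4], N=10)

# Sanity check: a complete census at both stages has zero estimated variance.
census_total, census_v = two_stage_total([[1, 2, 3], [4, 5]], M=[3, 2], N=2)
```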

Replication-Based Methods

$v(\hat\theta) = C \sum_{\alpha=1}^{k} (\hat\theta_\alpha - \hat\theta)^2$

Chapter 2: The Method of Random Groups

- Interpenetrating samples
- Replicated samples
- Ultimate cluster
- Resampling
- Random groups


The Case of Independent Random Groups

(i) Draw a sample, $s_1$. No restrictions on the sampling methodology.
(ii) Replace the first sample. Draw a second sample, $s_2$. Use the same sampling methodology.

Samples obtained, $k \ge 2$: $s_1, s_2, \ldots, s_k$










Common estimation procedure:
- Editing procedures
- Adjustments for nonresponse
- Outlier procedures
- Estimator of parameter

Common measurement process:
- Field work
- Callbacks
- Clerical screening and coding
- Conversion to machine-readable form






Estimators of the Parameter of Interest, $\theta$:

Random group estimators: $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_k$

Overall estimators: $\hat{\bar\theta} = k^{-1} \sum_{\alpha=1}^{k} \hat\theta_\alpha$ and $\hat\theta$


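Two Examples:

Population total, $\theta = Y = \sum_{i=1}^{N} Y_i$
- $\hat\theta_\alpha = \hat Y_\alpha = \sum_{i \in s_\alpha} W_{\alpha i} Y_i$
- $\hat\theta = \hat Y = \sum_{i \in s} W_i Y_i$
- $\hat{\bar\theta} = k^{-1} \sum_{\alpha=1}^{k} \sum_{i \in s_\alpha} W_{\alpha i} Y_i$

Ratio, $\theta = Y / X$
- $\hat\theta_\alpha = \hat Y_\alpha / \hat X_\alpha$
- $\hat\theta = \hat Y / \hat X$
- $\hat{\bar\theta} = k^{-1} \sum_{\alpha=1}^{k} \hat Y_\alpha / \hat X_\alpha$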


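Estimators of $\mathrm{Var}\{\hat{\bar\theta}\}$ or $\mathrm{Var}\{\hat\theta\}$:

$v(\hat{\bar\theta}) = \sum_{\alpha=1}^{k} (\hat\theta_\alpha - \hat{\bar\theta})^2 / \{k(k-1)\}$

$v_1(\hat\theta) = v(\hat{\bar\theta})$

$v_2(\hat\theta) = \sum_{\alpha=1}^{k} (\hat\theta_\alpha - \hat\theta)^2 / \{k(k-1)\}$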


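Properties:

$E\{v(\hat{\bar\theta})\} = \mathrm{Var}\{\hat{\bar\theta}\}$

$\mathrm{CV}\{v(\hat{\bar\theta})\} = [\mathrm{Var}\{v(\hat{\bar\theta})\}]^{1/2} / \mathrm{Var}\{\hat{\bar\theta}\} = \left\{ \left[ \beta_4(\hat\theta_1) - \frac{k-3}{k-1} \right] / k \right\}^{1/2}$, where $\beta_4(\hat\theta_1)$ is the kurtosis of the random group estimator

$v_2(\hat\theta) \ge v_1(\hat\theta)$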


Confidence Intervals:

$\left( \hat\theta - c \sqrt{v(\hat\theta)},\ \hat\theta + c \sqrt{v(\hat\theta)} \right)$, with $c = z_{\alpha/2}$ or $t_{k-1,\, \alpha/2}$
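The random group variance estimators and the confidence interval can be sketched in Python as below. The function names and the five random group estimates are hypothetical. The interval uses the normal multiplier $z_{\alpha/2}$ from the standard library; with few random groups the $t_{k-1,\alpha/2}$ multiplier would be the more cautious choice, but it is not available in the standard library and is therefore omitted here.

```python
from math import sqrt
from statistics import NormalDist, mean

def random_group_variance(theta_alpha, center=None):
    """Return sum((theta_a - center)^2) / (k (k - 1)).
    center defaults to the mean of the random group estimates."""
    k = len(theta_alpha)
    if center is None:
        center = mean(theta_alpha)
    return sum((t - center) ** 2 for t in theta_alpha) / (k * (k - 1))

def confidence_interval(theta_hat, v, level=0.95):
    """Normal-theory interval theta_hat -/+ z * sqrt(v)."""
    c = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = c * sqrt(v)
    return theta_hat - half_width, theta_hat + half_width

# Hypothetical estimates from k = 5 independent random groups.
theta_alpha = [10.2, 9.8, 10.5, 9.9, 10.1]
theta_bar = mean(theta_alpha)
v1 = random_group_variance(theta_alpha)               # centered at the mean of the estimates
v2 = random_group_variance(theta_alpha, center=10.0)  # centered at a hypothetical full-sample estimate
lower, upper = confidence_interval(theta_bar, v1)
```

Centering at any value other than the mean of the random group estimates can only increase the sum of squares, which is why `v2` is never smaller than `v1`.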

Chapter 3: Variance Estimation Based on Balanced Half-Samples

Description of Basic Techniques

- $L$ strata
- $N_h$ units per stratum
- $N$ size of entire population
- $n_h = 2$ units selected per stratum, srs wr

Example: restaurants in Montreal

$\bar Y$ = average number of customers served by Montreal restaurants on a Monday night

$\bar y_{st} = \sum_{h=1}^{L} W_h \bar y_h$, with $W_h = N_h / N$ and $\bar y_h = (y_{h1} + y_{h2}) / 2$

Textbook Estimator of Variance

$v(\bar y_{st}) = \sum_{h=1}^{L} W_h^2 s_h^2 / 2 = \sum_{h=1}^{L} W_h^2 d_h^2 / 4$, where
$s_h^2 = \sum_{i=1}^{2} (y_{hi} - \bar y_h)^2 / (2 - 1)$ and $d_h = y_{h1} - y_{h2}$

Random Group Estimator of Variance

$k = 2$ independent random groups are available:
- $y_{11}, y_{21}, \ldots, y_{L1}$
- $y_{12}, y_{22}, \ldots, y_{L2}$

$v_{RG}(\bar y_{st}) = \frac{1}{2(1)} \sum_{\alpha=1}^{2} (\bar y_{st,\alpha} - \bar y_{st})^2 = (\bar y_{st,1} - \bar y_{st,2})^2 / 4$, where $\bar y_{st,\alpha} = \sum_{h=1}^{L} W_h y_{h\alpha}$

Half-Sample Methodology

$\delta_{h1\alpha} = 1$ if unit $(h, 1)$ is selected for the $\alpha$-th half sample, and $\delta_{h1\alpha} = 0$ otherwise; $\delta_{h2\alpha} = 1 - \delta_{h1\alpha}$

$2^L$ possible half samples

$\bar y_{st,\alpha} = \sum_{h=1}^{L} W_h (\delta_{h1\alpha}\, y_{h1} + \delta_{h2\alpha}\, y_{h2})$

Choosing a Manageable Number, $k$, of Half-Samples

$v_k(\bar y_{st}) = \sum_{\alpha=1}^{k} (\bar y_{st,\alpha} - \bar y_{st})^2 / k$

- $k$ random
- $k$ balanced

Table 3.2.1. Definition of Balanced Half-Sample Replicates for 5, 6, 7, or 8 Strata

                       Stratum (h)
Replicate              1    2    3    4    5    6    7    8
$\delta_h^{(1)}$      +1   -1   -1   +1   -1   +1   +1   -1
$\delta_h^{(2)}$      +1   +1   -1   -1   +1   -1   +1   -1
$\delta_h^{(3)}$      +1   +1   +1   -1   -1   +1   -1   -1
$\delta_h^{(4)}$      -1   +1   +1   +1   -1   -1   +1   -1
$\delta_h^{(5)}$      +1   -1   +1   +1   +1   -1   -1   -1
$\delta_h^{(6)}$      -1   +1   -1   +1   +1   +1   -1   -1
$\delta_h^{(7)}$      -1   -1   +1   -1   +1   +1   +1   -1
$\delta_h^{(8)}$      -1   -1   -1   -1   -1   -1   -1   -1

Properties of the Balanced Half-Sample Methods

$v_k(\bar y_{st}) = v(\bar y_{st})$

$k^{-1} \sum_{\alpha=1}^{k} \bar y_{st,\alpha} = \bar y_{st}$, provided $k > L$
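A short Python sketch can check both properties numerically. As an assumption for illustration, the balanced set of half-samples is built from a Sylvester-type Hadamard matrix of order 8 rather than from Table 3.2.1; any Hadamard matrix gives the same balance. The function names, weights, and data are hypothetical.

```python
def sylvester_hadamard(order):
    """Hadamard matrix with +1/-1 entries; order must be a power of 2."""
    H = [[1]]
    while len(H) < order:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

def bhs_variance(y, W, k=8):
    """y: (y_h1, y_h2) pairs, one per stratum; W: stratum weights N_h / N."""
    L = len(y)
    if L >= k:
        raise ValueError("need k > L so the all +1 column can be skipped")
    H = sylvester_hadamard(k)
    ybar_st = sum(W_h * (a + b) / 2 for W_h, (a, b) in zip(W, y))
    reps = []
    for alpha in range(k):
        est = 0.0
        for h in range(L):
            # Column h + 1 skips the all +1 first column, so every stratum
            # column sums to zero and distinct columns are orthogonal.
            pick_first = H[alpha][h + 1] == 1
            est += W[h] * (y[h][0] if pick_first else y[h][1])
        reps.append(est)
    v_k = sum((r - ybar_st) ** 2 for r in reps) / k
    return ybar_st, v_k, reps

# Hypothetical data: L = 5 strata, two sampled units per stratum.
y = [(30, 42), (55, 47), (18, 25), (60, 66), (35, 29)]
W = [0.3, 0.2, 0.2, 0.1, 0.2]
ybar_st, v_k, reps = bhs_variance(y, W)
v_textbook = sum(W_h ** 2 * (a - b) ** 2 / 4 for W_h, (a, b) in zip(W, y))
```

With only 8 replicates instead of all $2^5 = 32$ half-samples, `v_k` reproduces the textbook estimator for this linear statistic.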

Usage with Multistage Designs

$Y$ = total number of employed persons in Canada

$\hat Y = \sum_{h=1}^{L} \left( \frac{\hat Y_{h1}}{2 p_{h1}} + \frac{\hat Y_{h2}}{2 p_{h2}} \right)$

- $p_{h1} \propto$ housing units
- $\hat Y_{h1}$ = estimator of the total number of employed persons in the $(h, 1)$-th PSU

$v(\hat Y) = \sum_{h=1}^{L} \left( \hat Y_{h1} / p_{h1} - \hat Y_{h2} / p_{h2} \right)^2 / 4$

Balanced Half-Sample Methodology

$\hat Y_\alpha = \sum_{h=1}^{L} \left( \delta_{h1\alpha}\, \hat Y_{h1} / p_{h1} + \delta_{h2\alpha}\, \hat Y_{h2} / p_{h2} \right)$

$v_k(\hat Y) = \sum_{\alpha=1}^{k} (\hat Y_\alpha - \hat Y)^2 / k$

Alternative Half-Sample Estimators of Variance

- $v_k(\hat\theta) = \sum_{\alpha=1}^{k} (\hat\theta_\alpha - \hat\theta)^2 / k$
- $v_k^c(\hat\theta) = \sum_{\alpha=1}^{k} (\hat\theta_\alpha^c - \hat\theta)^2 / k$
- $\bar v_k(\hat\theta) = [\, v_k(\hat\theta) + v_k^c(\hat\theta) \,] / 2$
- $v_k^\dagger(\hat\theta) = \frac{1}{4k} \sum_{\alpha=1}^{k} (\hat\theta_\alpha - \hat\theta_\alpha^c)^2$

Here $\hat\theta_\alpha^c$ is the estimator computed from the complement of the $\alpha$-th half-sample.

Estimators are not necessarily equal

Chapter 4: The Jackknife Method

- Quenouille (1949): bias reduction
- Tukey (1958): variance estimation, testing, interval estimation
- Resampling

Basic Methodology

- Random sample: $y_1, y_2, \ldots, y_n$
- Random groups: $n = km$
- Parameter: $\theta$ (example: yield per acre of farms in Quebec)
- Estimator: $\hat\theta$

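Drop out the $m$ observations in group $\alpha$: $\hat\theta_{(\alpha)}$, $\alpha = 1, \ldots, k$

Pseudovalue: $\hat\theta_\alpha = k \hat\theta - (k - 1)\, \hat\theta_{(\alpha)}$

Quenouille's estimator: $\hat{\bar\theta} = k^{-1} \sum_{\alpha=1}^{k} \hat\theta_\alpha$

Variance estimator:
- $v_1(\hat{\bar\theta}) = \frac{1}{k(k-1)} \sum_{\alpha=1}^{k} (\hat\theta_\alpha - \hat{\bar\theta})^2$
- $v_2(\hat{\bar\theta}) = \frac{1}{k(k-1)} \sum_{\alpha=1}^{k} (\hat\theta_\alpha - \hat\theta)^2$

Special case: $k = n$, $m = 1$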

Full-sample estimator: $\hat\theta$

Variance estimator: $v(\hat\theta) = \frac{1}{k(k-1)} \sum_{\alpha=1}^{k} (\hat\theta_\alpha - \hat\theta)^2$

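Example: ratio $\theta = Y / X$

- $\hat\theta = \bar y / \bar x$
- $\hat\theta_{(\alpha)} = \bar y_{(\alpha)} / \bar x_{(\alpha)}$
- $\hat\theta_\alpha = k\, \bar y / \bar x - (k - 1)\, \bar y_{(\alpha)} / \bar x_{(\alpha)}$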
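The jackknife for a ratio can be sketched in Python as follows: drop each random group in turn, form the pseudovalues, and compute the two variance estimators centered at Quenouille's estimator and at the full-sample estimator. The helper names (`jackknife`, `ratio`) and the data are hypothetical; the example uses the special case $k = n$, $m = 1$.

```python
from statistics import mean, variance

def jackknife(estimator, groups):
    """groups: list of k random groups, each a list of observations."""
    k = len(groups)
    full = [obs for g in groups for obs in g]
    theta_hat = estimator(full)                       # full-sample estimator
    pseudo = []
    for a in range(k):
        kept = [obs for b, g in enumerate(groups) if b != a for obs in g]
        theta_drop = estimator(kept)                  # estimator with group a dropped out
        pseudo.append(k * theta_hat - (k - 1) * theta_drop)
    theta_bar = mean(pseudo)                          # Quenouille's estimator
    v1 = sum((p - theta_bar) ** 2 for p in pseudo) / (k * (k - 1))
    v2 = sum((p - theta_hat) ** 2 for p in pseudo) / (k * (k - 1))
    return theta_hat, theta_bar, v1, v2

def ratio(pairs):
    """Ratio of sample means ybar / xbar for (y, x) pairs."""
    return sum(y for y, _ in pairs) / sum(x for _, x in pairs)

# Hypothetical (y, x) pairs, one observation per group (k = n = 6, m = 1).
pairs = [(12, 4), (15, 5), (9, 3.5), (20, 6), (14, 5), (11, 4)]
theta_hat, theta_bar, v1, v2 = jackknife(ratio, [[p] for p in pairs])

# For the sample mean the pseudovalues equal the observations, so v1 = s^2 / n.
ys = [12, 15, 9, 20, 14, 11]
_, _, v1_mean, _ = jackknife(mean, [[y] for y in ys])
s2_over_n = variance(ys) / len(ys)
```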

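Usage in Stratified Sampling

Drop out observation(s) from individual strata: $\hat\theta_{(hi)}$, with stratum mean $\hat\theta_{(h\cdot)} = n_h^{-1} \sum_{i=1}^{n_h} \hat\theta_{(hi)}$

$v_1(\hat\theta) = \sum_{h=1}^{L} \frac{n_h - 1}{n_h} \sum_{i=1}^{n_h} (\hat\theta_{(hi)} - \hat\theta_{(h\cdot)})^2$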

Application to Cluster Sampling

Example: $\theta$ = total employed persons

Drop out ultimate clusters

$\hat\theta = \frac{1}{n} \sum_{i=1}^{n} \hat Y_i / p_i = \sum_i \sum_j W_{ij} Y_{ij}$

$\hat\theta_{(\alpha)} = \frac{1}{m(k-1)} \sum_{i=1}^{m(k-1)} \hat Y_i / p_i = \sum_i \sum_j W_{(\alpha)ij} Y_{ij}$

$W_{(\alpha)ij} = \frac{mk}{m(k-1)} W_{ij}$ if the PSU is not dropped out, and $W_{(\alpha)ij} = 0$ if the PSU is dropped out

Chapter 5: The Bootstrap Method

Works with replicates of potentially any size, $n^*$

Original Application

- $Y_1, \ldots, Y_n$ are iid random variables (scalar or vector) from a distribution function $F$
- $\theta$ is to be estimated
- Estimator: $\hat\theta$

A bootstrap sample (or bootstrap replicate) is a simple random sample with replacement (srs wr) of size $n^*$ selected from the original sample: $Y_1^*, \ldots, Y_{n^*}^*$

$\hat\theta^*$ denotes the estimator of the same functional form as $\hat\theta$

Ideal Bootstrap Estimator of $\mathrm{Var}\{\hat\theta\}$

$v_1(\hat\theta) = \mathrm{Var}_*\{\hat\theta^*\}$, where $\mathrm{Var}_*$ signifies the conditional variance, given the original sample

Monte Carlo Bootstrap Estimator of $\mathrm{Var}\{\hat\theta\}$

i. Draw a large number, $A$, of independent bootstrap replicates from the main sample and label the corresponding observations as $Y^*_{\alpha 1}, \ldots, Y^*_{\alpha n^*}$, for $\alpha = 1, \ldots, A$;
ii. For each bootstrap replicate, compute the corresponding estimator $\hat\theta^*_\alpha$ of the parameter of interest; and
iii. Calculate the variance between the $\hat\theta^*_\alpha$ values:

$v_2(\hat\theta) = \frac{1}{A - 1} \sum_{\alpha=1}^{A} (\hat\theta^*_\alpha - \hat{\bar\theta}^*)^2$, $\hat{\bar\theta}^* = \frac{1}{A} \sum_{\alpha=1}^{A} \hat\theta^*_\alpha$.
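Steps i to iii can be sketched in Python with the standard library. The function name `bootstrap_variance`, the seed, the number of replicates, and the data are hypothetical choices for illustration; the estimator here is the sample mean, and the replicate size is set to $n^* = n - 1$.

```python
import random
from statistics import mean, variance

def bootstrap_variance(sample, estimator, A=2000, n_star=None, seed=12345):
    """Monte Carlo bootstrap variance of estimator(sample)."""
    rng = random.Random(seed)                   # fixed seed for reproducibility
    n_star = len(sample) if n_star is None else n_star
    reps = []
    for _ in range(A):
        boot = rng.choices(sample, k=n_star)    # step i: srs wr of size n*
        reps.append(estimator(boot))            # step ii: theta-hat* for this replicate
    return variance(reps)                       # step iii: divisor A - 1

# Hypothetical sample of n = 8 observations.
y = [12, 15, 9, 20, 14, 11, 17, 13]
v2 = bootstrap_variance(y, mean, n_star=len(y) - 1)
textbook = variance(y) / len(y)                 # s^2 / n for comparison
```

Because `v2` is a Monte Carlo approximation, it agrees with `textbook` only up to simulation error, which shrinks as `A` grows.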

Application to the Finite Population: Simple Random Sampling with Replacement (srs wr)

- Data: $y_1, \ldots, y_n$
- Parameter of Interest: $\bar Y$
- Standard Estimator: $\bar y = \sum_i y_i / n$

Bootstrap Sample: $y_1^*, \ldots, y_{n^*}^*$

Estimator: $\bar y^* = \sum_i y_i^* / n^*$

Bootstrap Moments

$E_*\{y_1^*\} = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar y$

$\mathrm{Var}_*\{y_1^*\} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar y)^2 = \frac{n-1}{n} s^2$

Ideal Bootstrap Estimator of Variance

$v_1(\bar y) = \mathrm{Var}_*\{\bar y^*\} = \mathrm{Var}_*\{y_1^*\} / n^* = \frac{n-1}{n} \frac{s^2}{n^*}$

Unbiased Choice: $n^* = n - 1$
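The effect of the replicate size on the ideal bootstrap estimator for the sample mean can be checked with a few lines of Python. The function name and data are hypothetical; the formula is the closed form $\frac{n-1}{n} s^2 / n^*$ stated on the slide.

```python
from statistics import variance

def ideal_bootstrap_variance_of_mean(y, n_star):
    """Closed-form Var*{ybar*} = ((n - 1) / n) * s^2 / n_star."""
    n = len(y)
    s2 = variance(y)                 # s^2 with divisor n - 1
    return (n - 1) / n * s2 / n_star

# Hypothetical sample.
y = [12, 15, 9, 20, 14, 11, 17, 13]
n = len(y)
v_n = ideal_bootstrap_variance_of_mean(y, n)          # n* = n: too small by the factor (n - 1) / n
v_nm1 = ideal_bootstrap_variance_of_mean(y, n - 1)    # n* = n - 1: equals s^2 / n
textbook = variance(y) / n
```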

Multistage Sampling with pps wr Sampling at the First Stage

Observed Data: $y_{ij}$, where $i$ indexes the selected PSU and $j$ indexes the completed interview within the PSU

Parameter of Interest: $Y$

Estimator: $\hat Y = \sum_{i=1}^{n} \sum_j w_{ij} y_{ij} = \frac{1}{n} \sum_{i=1}^{n} \frac{\hat Y_i}{p_i} = \frac{1}{n} \sum_{i=1}^{n} z_i$, where $z_i = \hat Y_i / p_i$

Bootstrap Sample: $z_1^*, z_2^*, \ldots, z_{n^*}^*$

Bootstrap Moments

$E_*\{z_1^*\} = \frac{1}{n} \sum_{i=1}^{n} z_i = \hat Y$

$\mathrm{Var}_*\{z_1^*\} = \frac{1}{n} \sum_{i=1}^{n} (z_i - \hat Y)^2$

Ideal Bootstrap Estimator of Variance

$v_1(\hat Y) = \mathrm{Var}_*\{\hat Y^*\} = \mathrm{Var}_*\{z_1^*\} / n^* = \frac{1}{n^*} \frac{1}{n} \sum_{i=1}^{n} (z_i - \hat Y)^2$.

Unbiased Choice: $n^* = n - 1$

Chapter 6: Taylor Series Methods

- Assume a complex survey design
- $Y = (Y_1, \ldots, Y_p)'$, vector of population totals
- $\hat Y = (\hat Y_1, \ldots, \hat Y_p)'$
- $\theta = g(Y)$, parameter of interest, such as the ratio $Y_1 / Y_2$
- $\hat\theta = g(\hat Y)$

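First-order Taylor series approximation

$\hat\theta - \theta = \sum_{j=1}^{p} \frac{\partial g(Y)}{\partial y_j} (\hat Y_j - Y_j) + R$

MSE

$\mathrm{MSE}\{\hat\theta\} \doteq \mathrm{Var}\left\{ \sum_{j=1}^{p} \frac{\partial g(Y)}{\partial y_j} (\hat Y_j - Y_j) \right\} = d\, \Sigma\, d'$

where $d = (d_1, \ldots, d_p)$, $d_j = \partial g(Y) / \partial y_j$, and $\Sigma = E\{ (\hat Y - Y)(\hat Y - Y)' \}$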

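$v(\hat\theta) = \hat d\, \hat\Sigma\, \hat d'$, with $\hat d_j = \partial g(\hat Y) / \partial y_j$ and $\hat\Sigma$ obtained by a textbook or replication-based method applied to the $y$-data

Alternative algorithm

- $\hat Y_j = \sum_{i \in s} W_i Y_{ji}$
- $U_i = \sum_{j=1}^{p} \frac{\partial g(Y)}{\partial y_j} Y_{ji}$
- $\hat U = \sum_{i \in s} W_i U_i$
- $\mathrm{MSE}\{\hat\theta\} \doteq \mathrm{Var}\{\hat U\}$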
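Both routes to the Taylor series variance estimator can be compared for the ratio $g(Y_1, Y_2) = Y_1 / Y_2$ with a short Python sketch. As an assumption for illustration, the design is srs wr with equal weights $W_i = N/n$, so the textbook covariance of the estimated totals is $N^2 s_{ab} / n$; the function names and data are hypothetical, and the partial derivatives are evaluated at the estimated totals.

```python
from statistics import mean

def cov(a, b):
    """Sample covariance with divisor n - 1."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def taylor_ratio_variance(y, x, N):
    n = len(y)
    W = N / n                                   # equal weights under srs wr (assumption)
    Y_hat, X_hat = W * sum(y), W * sum(x)
    # Partial derivatives of g(Y1, Y2) = Y1 / Y2, evaluated at the estimates.
    d = (1 / X_hat, -Y_hat / X_hat ** 2)
    # Route 1: d-hat * Sigma-hat * d-hat', with Sigma-hat from the textbook formula.
    c = N ** 2 / n
    S = [[c * cov(y, y), c * cov(y, x)],
         [c * cov(x, y), c * cov(x, x)]]
    v_matrix = sum(d[a] * S[a][b] * d[b] for a in range(2) for b in range(2))
    # Route 2: form the single linearized variable U_i and estimate Var{U-hat} directly.
    u = [d[0] * yi + d[1] * xi for yi, xi in zip(y, x)]
    v_linear = c * cov(u, u)
    return Y_hat / X_hat, v_matrix, v_linear

# Hypothetical data: n = 6 units from a population of N = 60.
y = [12, 15, 9, 20, 14, 11]
x = [4, 5, 3.5, 6, 5, 4]
theta_hat, v_matrix, v_linear = taylor_ratio_variance(y, x, N=60)
```

The second route needs only one variance computation on the $U_i$ values, which is why it is convenient when $p$ is large.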

Chapter 7: Generalized Variance Functions

1. Population total, $X$
2. Estimator of the total, $\hat X$
3. Relative variance, $V^2 = \mathrm{Var}\{\hat X\} / X^2$
4. $V^2 = \alpha + \beta / X$
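One way to use the model $V^2 = \alpha + \beta / X$ is to fit $\alpha$ and $\beta$ to a set of estimated totals and their estimated relative variances, then predict standard errors for other totals. The sketch below uses ordinary least squares on $1/X$ purely as an illustration; weighted or iterative fits are often preferred in practice. The function names and the data (generated exactly from $\alpha = 0.0004$, $\beta = 30$) are hypothetical.

```python
def fit_gvf(x_hat, relvar):
    """Ordinary least squares fit of V^2 = alpha + beta / X."""
    z = [1 / x for x in x_hat]                  # regressor 1 / X
    n = len(z)
    z_bar = sum(z) / n
    v_bar = sum(relvar) / n
    beta = (sum((zi - z_bar) * (vi - v_bar) for zi, vi in zip(z, relvar))
            / sum((zi - z_bar) ** 2 for zi in z))
    alpha = v_bar - beta * z_bar
    return alpha, beta

def predict_se(x, alpha, beta):
    """Standard error of an estimated total x implied by the fitted GVF."""
    return x * (alpha + beta / x) ** 0.5

# Hypothetical estimated totals and relative variances lying exactly on the curve.
x_hat = [1000, 5000, 20000, 100000]
relvar = [0.0004 + 30 / x for x in x_hat]
alpha, beta = fit_gvf(x_hat, relvar)
se_10000 = predict_se(10000, alpha, beta)
```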
