Variance Estimation in Complex Surveys


Third International Conference on Establishment Surveys

Montreal, Quebec

June 18-21, 2007

Presented by: Kirk Wolter, NORC and the University of Chicago

Outline of Lecture

- Introduction (Chapter 1)
- Textbook Methods (Chapter 1)
- Replication-Based Methods
  - Random Group (Chapter 2)
  - Balanced Half-Samples (Chapter 3)
  - Jackknife (Chapter 4)
  - Bootstrap (Chapter 5)
- Taylor Series (Chapter 6)
- Generalized Variance Functions (Chapter 7)

Chapter 1: Introduction

Notation and Basic Definitions

1. Finite population, $U = \{1, \ldots, N\}$

- Residents of Canada
- Restaurants in Montreal
- Farms in Quebec
- Schools in Ottawa

2. Sample, $s$

- Simple random sampling, without replacement
- Systematic sampling
- Stratification
- Clustering
- Double sampling

5. Probability sampling design, $\mathcal{P}$

- $\mathcal{P}(s) \ge 0$ and $\sum_{s} \mathcal{P}(s) = 1$

8. Characteristic of interest, $Y_i$

- $Y_i = 1$ if the $i$-th resident is employed; $Y_i = 0$ if not employed
- $Y_i$ = yield in tons of the $i$-th farm

12. Parameter, $\theta$

- Proportion of residents who are employed
- Total production of farms
- Trend in price index for restaurants
- Regression of sales on area for pharmacies

13. Estimator, $\hat{\theta}$

- $\hat{\theta} = \hat{\theta}(s)$

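14. Expectation and variance

- $E\{\hat{\theta}\} = \sum_{s} \mathcal{P}(s)\,\hat{\theta}(s)$
- $\mathrm{Var}\{\hat{\theta}\} = \sum_{s} \mathcal{P}(s)\left[\hat{\theta}(s) - E\{\hat{\theta}\}\right]^2 = E\{\hat{\theta}^2\} - \left[E\{\hat{\theta}\}\right]^2$

16. Estimator of variance, $v(\hat{\theta})$

- $E\{v(\hat{\theta})\} = \mathrm{Var}\{\hat{\theta}\}$
- $P\left\{\left| v(\hat{\theta}) / \mathrm{Var}\{\hat{\theta}\} - 1 \right| > \varepsilon\right\} \to 0$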

Textbook Methods

1. Design: srs wor of size $n$

Estimator: $\hat{Y} = f^{-1} \sum_{i=1}^{n} y_i$, where $f = n/N$

Variance Estimator: $v(\hat{Y}) = N^2 (1 - f)\, s^2 / n$

$s^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 / (n - 1)$

$\bar{y} = \sum_{i=1}^{n} y_i / n$
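The srs wor formulas above translate directly into a few lines of code. The following is a minimal sketch only; the function name `srswor_total` and the sample data are hypothetical and chosen for illustration.

```python
from statistics import variance

def srswor_total(y, N):
    """Estimated total and variance estimate under srs wor (illustrative helper)."""
    n = len(y)
    f = n / N                      # sampling fraction f = n/N
    y_hat = sum(y) / f             # Y-hat = f^{-1} * sum of y_i
    s2 = variance(y)               # s^2 with divisor n - 1
    v = N**2 * (1 - f) * s2 / n    # v(Y-hat) = N^2 (1 - f) s^2 / n
    return y_hat, v

# Hypothetical data: n = 4 units drawn from a population of N = 20
y_hat, v = srswor_total([3.0, 5.0, 4.0, 8.0], N=20)
print(y_hat, v)
```

The finite population correction $(1 - f)$ is what distinguishes this from the with-replacement formula; it vanishes as $n$ approaches $N$.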

2. Design: srs wor at both the first and second stages of sampling

Estimator: $\hat{Y} = f_1^{-1} \sum_{i=1}^{n} f_{2i}^{-1} \sum_{j=1}^{m_i} y_{ij}$, where $f_1 = n/N$ and $f_{2i} = m_i / M_i$

Variance Estimator: $v(\hat{Y}) = N^2 (1 - f_1) \sum_{i=1}^{n} \left(M_i \bar{y}_{i.} - \hat{Y}/N\right)^2 / \{n(n-1)\} + (N/n) \sum_{i=1}^{n} M_i^2 (1 - f_{2i})\, s_i^2 / m_i$

$s_i^2 = \sum_{j=1}^{m_i} (y_{ij} - \bar{y}_{i.})^2 / (m_i - 1)$

$\bar{y}_{i.} = \sum_{j=1}^{m_i} y_{ij} / m_i$
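A small sketch of the two-stage estimator and its variance estimator may help in reading the formula, which has a between-PSU term and a within-PSU term. The function name, the input layout (a list of `(M_i, observations)` pairs), and the data are assumptions made for illustration; each sampled PSU needs at least two observations so that $s_i^2$ is defined.

```python
from statistics import mean, variance

def two_stage_total(clusters, N):
    """clusters: list of (M_i, [y_i1, ..., y_im_i]) for the n sampled PSUs."""
    n = len(clusters)
    f1 = n / N
    # Expanded PSU totals M_i * ybar_i.
    psu_totals = [M * mean(y) for M, y in clusters]
    y_hat = (N / n) * sum(psu_totals)
    # Between-PSU component: deviations of M_i * ybar_i. from Y-hat / N
    between = (N**2 * (1 - f1)
               * sum((t - y_hat / N) ** 2 for t in psu_totals) / (n * (n - 1)))
    # Within-PSU component: (N/n) * sum of M_i^2 (1 - f_2i) s_i^2 / m_i
    within = (N / n) * sum(
        M**2 * (1 - len(y) / M) * variance(y) / len(y) for M, y in clusters
    )
    return y_hat, between + within

# Hypothetical sample: n = 2 PSUs out of N = 10
y_hat, v = two_stage_total([(4, [1.0, 3.0]), (6, [2.0, 2.0, 5.0])], N=10)
print(y_hat, v)
```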

Replication-Based Methods

$v(\hat{\theta}) = C \sum_{\alpha=1}^{k} (\hat{\theta}_\alpha - \hat{\theta})^2$

Chapter 2: The Method of Random Groups

Interpenetrating samples

Replicated samples

Ultimate cluster

Resampling

Random groups


The Case of Independent Random Groups

(i) Draw a sample, $s_1$. No restrictions on the sampling methodology.

(ii) Replace the first sample. Draw a second sample, $s_2$. Use the same sampling methodology.

(iii) Repeat until $k \ge 2$ samples are obtained: $s_1, s_2, \ldots, s_k$.


Common estimation procedure:

Editing procedures

Adjustments for nonresponse

Outlier procedures

Estimator of parameter


Common measurement process:

Field work

Callbacks

Clerical screening and coding

Conversion to machine-readable form


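Estimators of the Parameter of Interest, $\theta$:

Random group estimators: $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_k$

Overall estimators: $\hat{\bar{\theta}} = \frac{1}{k} \sum_{\alpha=1}^{k} \hat{\theta}_\alpha$ and $\hat{\theta}$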

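Two Examples:

Population total

- $\theta = Y = \sum_{i=1}^{N} Y_i$
- $\hat{\theta} = \hat{Y} = \sum_{i \in s} W_i Y_i$
- $\hat{\theta}_\alpha = \hat{Y}_\alpha = \sum_{i \in s_\alpha} W_{\alpha i} Y_i$
- $\hat{\bar{\theta}} = \hat{\bar{Y}} = \frac{1}{k} \sum_{\alpha=1}^{k} \sum_{i \in s_\alpha} W_{\alpha i} Y_i$

Ratio

- $\theta = Y / X$
- $\hat{\theta} = \hat{Y} / \hat{X}$
- $\hat{\theta}_\alpha = \hat{Y}_\alpha / \hat{X}_\alpha$
- $\hat{\bar{\theta}} = \frac{1}{k} \sum_{\alpha=1}^{k} \hat{Y}_\alpha / \hat{X}_\alpha$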

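Estimators of $\mathrm{Var}\{\hat{\bar{\theta}}\}$ or $\mathrm{Var}\{\hat{\theta}\}$:

$v(\hat{\bar{\theta}}) = \sum_{\alpha=1}^{k} (\hat{\theta}_\alpha - \hat{\bar{\theta}})^2 / \{k(k-1)\}$

$v_1(\hat{\theta}) = v(\hat{\bar{\theta}})$

$v_2(\hat{\theta}) = \sum_{\alpha=1}^{k} (\hat{\theta}_\alpha - \hat{\theta})^2 / \{k(k-1)\}$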
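The random group variance estimators can be sketched as follows. This is an illustration under the assumption that the $k$ random group estimates and the full-sample estimate have already been computed; the function name and the numbers are hypothetical.

```python
from statistics import mean

def random_group_variances(theta_groups, theta_full):
    """Return (theta_bar, v1, v2) from k random group estimates."""
    k = len(theta_groups)
    theta_bar = mean(theta_groups)     # mean of the random group estimates
    # v1: squared deviations about the mean of the group estimates
    v1 = sum((t - theta_bar) ** 2 for t in theta_groups) / (k * (k - 1))
    # v2: squared deviations about the full-sample estimate
    v2 = sum((t - theta_full) ** 2 for t in theta_groups) / (k * (k - 1))
    return theta_bar, v1, v2

# Hypothetical estimates from k = 4 random groups and from the full sample
theta_bar, v1, v2 = random_group_variances([10.0, 12.0, 11.0, 15.0], 11.5)
print(theta_bar, v1, v2)
```

Because a sum of squared deviations is smallest when taken about the mean, `v2` can never be smaller than `v1`.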

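Properties:

$E\{v(\hat{\bar{\theta}})\} = \mathrm{Var}\{\hat{\bar{\theta}}\}$

$\mathrm{CV}\{v(\hat{\bar{\theta}})\} = \left[\mathrm{Var}\{v(\hat{\bar{\theta}})\}\right]^{1/2} / \mathrm{Var}\{\hat{\bar{\theta}}\} = \left\{\left[\beta_4(\hat{\theta}_1) - \frac{k-3}{k-1}\right] / k\right\}^{1/2}$

$v_2(\hat{\theta}) \ge v_1(\hat{\theta})$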

Confidence Intervals:

$\left(\hat{\bar{\theta}} - c \sqrt{v(\hat{\bar{\theta}})},\ \hat{\bar{\theta}} + c \sqrt{v(\hat{\bar{\theta}})}\right)$

$c = z_{\alpha/2}$ or $t_{k-1,\,\alpha/2}$
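As an illustration of the interval above, the sketch below uses the normal quantile $z_{\alpha/2}$ from the Python standard library. The function name and inputs are hypothetical. The $t_{k-1,\alpha/2}$ alternative is not shown because the standard library has no Student's t quantile; it would require an external package such as SciPy.

```python
from math import sqrt
from statistics import NormalDist

def normal_interval(theta_bar, v, level=0.95):
    """Interval theta_bar -/+ c * sqrt(v) with c = z_{alpha/2}."""
    alpha = 1 - level
    c = NormalDist().inv_cdf(1 - alpha / 2)   # standard normal quantile
    half_width = c * sqrt(v)
    return theta_bar - half_width, theta_bar + half_width

# Hypothetical point estimate and variance estimate
lower, upper = normal_interval(12.0, 4.0)
print(lower, upper)
```

With a small number of random groups $k$, the t-based multiplier is noticeably larger than the normal one, so the normal interval shown here is the narrower of the two.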

Chapter 3: Variance Estimation Based on Balanced Half-Samples

Description of Basic Techniques

- $L$ strata
- $N_h$ units per stratum
- $N$ size of entire population
- $n_h = 2$ units selected per stratum, srs wr

Example: restaurants in Montreal

$\bar{Y}$ = average number of customers served by Montreal restaurants on a Monday night

$\bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h$

$W_h = N_h / N$

$\bar{y}_h = (y_{h1} + y_{h2}) / 2$

Textbook Estimator of Variance

$v(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2 s_h^2 / 2 = \sum_{h=1}^{L} W_h^2 d_h^2 / 4$

$s_h^2 = \sum_{i=1}^{2} (y_{hi} - \bar{y}_h)^2 / (2 - 1)$

$d_h = y_{h1} - y_{h2}$

Random Group Estimator of Variance

$k = 2$ independent random groups are available:

- $(y_{11}, y_{21}, \ldots, y_{L1})$
- $(y_{12}, y_{22}, \ldots, y_{L2})$

$v_{RG}(\bar{y}_{st}) = \frac{1}{2(2-1)} \sum_{\alpha=1}^{2} (\bar{y}_{st,\alpha} - \bar{y}_{st})^2 = (\bar{y}_{st,1} - \bar{y}_{st,2})^2 / 4$

$\bar{y}_{st,\alpha} = \sum_{h=1}^{L} W_h y_{h\alpha}$

Half-Sample Methodology

$\delta_{h1\alpha} = 1$ if unit $(h, 1)$ is selected for the $\alpha$-th half-sample, and $0$ otherwise; $\delta_{h2\alpha} = 1 - \delta_{h1\alpha}$

$2^L$ possible half-samples

$\bar{y}_{st,\alpha} = \sum_{h=1}^{L} W_h (\delta_{h1\alpha} y_{h1} + \delta_{h2\alpha} y_{h2})$

Choosing a Manageable Number, $k$, of Half-Samples

$v_k(\bar{y}_{st}) = \sum_{\alpha=1}^{k} (\bar{y}_{st,\alpha} - \bar{y}_{st})^2 / k$

- $k$ random
- $k$ balanced

Table 3.2.1. Definition of Balanced Half-Sample Replicates for 5, 6, 7, or 8 Strata

Replicate         Stratum (h)
                   1    2    3    4    5    6    7    8
$\delta_{h1}$     +1   -1   -1   +1   -1   +1   +1   -1
$\delta_{h2}$     +1   +1   -1   -1   +1   -1   +1   -1
$\delta_{h3}$     +1   +1   +1   -1   -1   +1   -1   -1
$\delta_{h4}$     -1   +1   +1   +1   -1   -1   +1   -1
$\delta_{h5}$     +1   -1   +1   +1   +1   -1   -1   -1
$\delta_{h6}$     -1   +1   -1   +1   +1   +1   -1   -1
$\delta_{h7}$     -1   -1   +1   -1   +1   +1   +1   -1
$\delta_{h8}$     -1   -1   -1   -1   -1   -1   -1   -1
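A short sketch can make the balance property concrete. The $\pm 1$ matrix below is an assumed 8 x 8 Hadamard matrix of the kind Table 3.2.1 describes (a cyclic construction plus a row of $-1$); the function names, stratum weights, and data are hypothetical. With a balanced set, the half-sample estimator $v_k(\bar{y}_{st})$ reproduces the textbook estimator $\sum_h W_h^2 d_h^2 / 4$.

```python
# Rows = replicates, columns = strata 1..8 (assumed Hadamard matrix)
H = [
    [+1, -1, -1, +1, -1, +1, +1, -1],
    [+1, +1, -1, -1, +1, -1, +1, -1],
    [+1, +1, +1, -1, -1, +1, -1, -1],
    [-1, +1, +1, +1, -1, -1, +1, -1],
    [+1, -1, +1, +1, +1, -1, -1, -1],
    [-1, +1, -1, +1, +1, +1, -1, -1],
    [-1, -1, +1, -1, +1, +1, +1, -1],
    [-1, -1, -1, -1, -1, -1, -1, -1],
]

def is_balanced(H, L):
    """True if the first L stratum columns are mutually orthogonal."""
    return all(
        sum(row[h] * row[g] for row in H) == 0
        for h in range(L) for g in range(h + 1, L)
    )

def bhs_variance(W, y1, y2, H):
    """v_k: mean squared deviation of half-sample estimates from y_st."""
    L = len(W)
    y_st = sum(W[h] * (y1[h] + y2[h]) / 2 for h in range(L))
    total = 0.0
    for row in H:
        # +1 selects unit (h, 1) for this half-sample, -1 selects unit (h, 2)
        y_alpha = sum(W[h] * (y1[h] if row[h] == 1 else y2[h]) for h in range(L))
        total += (y_alpha - y_st) ** 2
    return total / len(H)

def textbook_variance(W, y1, y2):
    """Sum of W_h^2 d_h^2 / 4 with d_h = y_h1 - y_h2."""
    return sum(W[h] ** 2 * (y1[h] - y2[h]) ** 2 / 4 for h in range(len(W)))

# Hypothetical data for L = 5 equally weighted strata
W = [0.2] * 5
y1 = [10.0, 12.0, 9.0, 15.0, 11.0]
y2 = [8.0, 13.0, 12.0, 14.0, 7.0]
print(is_balanced(H, 5), bhs_variance(W, y1, y2, H), textbook_variance(W, y1, y2))
```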

Properties of the Balanced Half-Sample Methods

$v_k(\bar{y}_{st}) = v(\bar{y}_{st})$

$\frac{1}{k} \sum_{\alpha=1}^{k} \bar{y}_{st,\alpha} = \bar{y}_{st}$, provided $k > L$

Usage with Multistage Designs

$Y$ = total number of employed persons in Canada

$\hat{Y} = \sum_{h=1}^{L} \left( \frac{\hat{Y}_{h1}}{2 p_{h1}} + \frac{\hat{Y}_{h2}}{2 p_{h2}} \right)$

$\hat{Y}_{h1}$ = estimator of the total number of employed persons in housing units of the $(h, 1)$-th PSU

$v(\hat{Y}) = \sum_{h=1}^{L} \left( \hat{Y}_{h1} / p_{h1} - \hat{Y}_{h2} / p_{h2} \right)^2 / 4$

Balanced Half-Sample Methodology

$\hat{Y}_\alpha = \sum_{h=1}^{L} \left( \delta_{h1\alpha} \hat{Y}_{h1} / p_{h1} + \delta_{h2\alpha} \hat{Y}_{h2} / p_{h2} \right)$

$v_k(\hat{Y}) = \sum_{\alpha=1}^{k} (\hat{Y}_\alpha - \hat{Y})^2 / k$

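Alternative Half-Sample Estimators of Variance

Here $\hat{\theta}_\alpha^c$ denotes the estimator computed from the complement of the $\alpha$-th half-sample.

$v_k(\hat{\theta}) = \frac{1}{k} \sum_{\alpha=1}^{k} (\hat{\theta}_\alpha - \hat{\theta})^2$

$v_k^c(\hat{\theta}) = \frac{1}{k} \sum_{\alpha=1}^{k} (\hat{\theta}_\alpha^c - \hat{\theta})^2$

$\bar{v}_k(\hat{\theta}) = \left[ v_k(\hat{\theta}) + v_k^c(\hat{\theta}) \right] / 2$

$v_k^\dagger(\hat{\theta}) = \frac{1}{4k} \sum_{\alpha=1}^{k} (\hat{\theta}_\alpha - \hat{\theta}_\alpha^c)^2$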

 

ˆ 

1

2 k

 k 

1

 ˆ 

  ˆ 2

Estimators are not necessarily equal

Chapter 4: The Jackknife Method

Quenouille (1949) – bias reduction

Tukey (1958) – variance estimation, testing, interval estimation

Resampling

Basic Methodology

Random sample: $y_1, y_2, \ldots, y_n$

Random groups: $n = km$

Parameter: $\theta$ (example: yield per acre of farms in Quebec)

Estimator: $\hat{\theta}$

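Drop out $m$ observations: $\hat{\theta}_{(\alpha)}$, $\alpha = 1, \ldots, k$

Pseudovalue: $\hat{\theta}_\alpha = k \hat{\theta} - (k - 1) \hat{\theta}_{(\alpha)}$

Quenouille's estimator: $\hat{\bar{\theta}} = \frac{1}{k} \sum_{\alpha=1}^{k} \hat{\theta}_\alpha$

Variance estimator:

$v_1(\hat{\bar{\theta}}) = \frac{1}{k(k-1)} \sum_{\alpha=1}^{k} (\hat{\theta}_\alpha - \hat{\bar{\theta}})^2$

$v_2(\hat{\bar{\theta}}) = \frac{1}{k(k-1)} \sum_{\alpha=1}^{k} (\hat{\theta}_\alpha - \hat{\theta})^2$

Special case: $k = n$, $m = 1$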
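The drop-out, pseudovalue, and variance steps above can be sketched in a few lines. This is an illustrative implementation under the assumptions that the data are already in random order, that $n = km$ exactly, and that consecutive blocks of $m$ observations form the groups; the function name and data are hypothetical.

```python
from statistics import mean

def jackknife(y, k, estimator):
    """Grouped jackknife: returns (theta_bar, v1, v2) from k groups of size m."""
    n = len(y)
    m = n // k                                   # group size, assuming n = k * m
    theta_full = estimator(y)
    pseudovalues = []
    for a in range(k):
        kept = y[:a * m] + y[(a + 1) * m:]       # drop out the a-th group
        theta_drop = estimator(kept)
        pseudovalues.append(k * theta_full - (k - 1) * theta_drop)
    theta_bar = mean(pseudovalues)               # Quenouille's estimator
    v1 = sum((p - theta_bar) ** 2 for p in pseudovalues) / (k * (k - 1))
    v2 = sum((p - theta_full) ** 2 for p in pseudovalues) / (k * (k - 1))
    return theta_bar, v1, v2

# Special case k = n, m = 1, with the sample mean as the estimator
theta_bar, v1, v2 = jackknife([3.0, 5.0, 4.0, 8.0], k=4, estimator=mean)
print(theta_bar, v1, v2)
```

For the sample mean with $k = n$, each pseudovalue equals the corresponding observation, so `v1` reduces to the familiar $s^2 / n$.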

Full-sample estimator: $\hat{\theta}$

Variance estimator: $v(\hat{\theta}) = \frac{1}{k(k-1)} \sum_{\alpha=1}^{k} (\hat{\theta}_\alpha - \hat{\theta})^2$

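Example: ratio

$\theta = Y / X$

$\hat{\theta} = \bar{y} / \bar{x}$

$\hat{\theta}_{(\alpha)} = \bar{y}_{(\alpha)} / \bar{x}_{(\alpha)}$

$\hat{\theta}_\alpha = k\, \bar{y} / \bar{x} - (k - 1)\, \bar{y}_{(\alpha)} / \bar{x}_{(\alpha)}$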

Usage in Stratified Sampling

Drop out observation(s) from individual strata: $\hat{\theta}_{(hi)}$

$v_1(\hat{\theta}) = \sum_{h=1}^{L} \frac{n_h - 1}{n_h} \sum_{i=1}^{n_h} (\hat{\theta}_{(hi)} - \hat{\theta})^2$

Application to Cluster Sampling

Example: $\theta$ = total employed persons

Drop out ultimate clusters

$\hat{\theta} = \hat{Y} = \frac{1}{n} \sum_{i=1}^{n} \hat{Y}_i / p_i = \sum_{i} \sum_{j} W_{ij} Y_{ij}$

$\hat{\theta}_{(\alpha)} = \frac{1}{m(k-1)} \sum_{i=1}^{m(k-1)} \hat{Y}_i / p_i = \sum_{i} \sum_{j} W_{(\alpha)ij} Y_{ij}$

$W_{(\alpha)ij} = \frac{mk}{m(k-1)} W_{ij}$ if the PSU is not dropped out; $W_{(\alpha)ij} = 0$ if the PSU is dropped out

Chapter 5: The Bootstrap Method

Works with replicates of potentially any size, $n^*$

Original Application

- $Y_1, \ldots, Y_n$ are iid random variables (scalar or vector) from a distribution function $F$
- $\theta$ is to be estimated
- Estimator: $\hat{\theta}$

A bootstrap sample (or bootstrap replicate) is a simple random sample with replacement (srs wr) of size $n^*$ selected from the original sample: $Y_1^*, \ldots, Y_{n^*}^*$

$\hat{\theta}^*$ denotes the estimator of the same functional form as $\hat{\theta}$

Ideal Bootstrap Estimator of $\mathrm{Var}\{\hat{\theta}\}$

$v_1(\hat{\theta}) = \mathrm{Var}_*\{\hat{\theta}^*\}$, where $\mathrm{Var}_*$ signifies the conditional variance, given the original sample

Monte Carlo Bootstrap Estimator of $\mathrm{Var}\{\hat{\theta}\}$

i. Draw a large number, $A$, of independent bootstrap replicates from the main sample and label the corresponding observations as $Y_{\alpha 1}^*, \ldots, Y_{\alpha n^*}^*$, for $\alpha = 1, \ldots, A$;

ii. For each bootstrap replicate, compute the corresponding estimator $\hat{\theta}_\alpha^*$ of the parameter of interest; and

iii. Calculate the variance between the $\hat{\theta}_\alpha^*$ values:

$v_2(\hat{\theta}) = \frac{1}{A - 1} \sum_{\alpha=1}^{A} (\hat{\theta}_\alpha^* - \hat{\bar{\theta}}^*)^2$, $\hat{\bar{\theta}}^* = \frac{1}{A} \sum_{\alpha=1}^{A} \hat{\theta}_\alpha^*$.
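Steps i to iii of the Monte Carlo bootstrap can be sketched as below. The function name, the default number of replicates, the seed handling, and the data are assumptions made for illustration.

```python
import random
from statistics import mean, variance

def bootstrap_variance(y, estimator, A=1000, n_star=None, seed=0):
    """Monte Carlo bootstrap variance estimate from A srs wr replicates."""
    rng = random.Random(seed)                    # fixed seed for reproducibility
    n_star = len(y) if n_star is None else n_star
    # Steps i and ii: draw each replicate with replacement and re-estimate
    replicates = [estimator(rng.choices(y, k=n_star)) for _ in range(A)]
    # Step iii: variance between replicate estimates, divisor A - 1
    return variance(replicates)

# Hypothetical sample; replicate size n* = n - 1 = 3
v2 = bootstrap_variance([3.0, 5.0, 4.0, 8.0], mean, A=2000, n_star=3, seed=1)
print(v2)
```

The result varies with the seed and with $A$; increasing $A$ brings it closer to the ideal bootstrap estimator.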

Application to the Finite Population

Simple Random Sampling with Replacement (srs wr)

Data: $y_1, \ldots, y_n$

Parameter of Interest: $\bar{Y}$

Standard Estimator: $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$

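Bootstrap Sample: $y_1^*, \ldots, y_{n^*}^*$

Estimator: $\bar{y}^* = \frac{1}{n^*} \sum_{i=1}^{n^*} y_i^*$

Bootstrap Moments

$E_*\{y_1^*\} = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}$

$\mathrm{Var}_*\{y_1^*\} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{n-1}{n} s^2$

Ideal Bootstrap Estimator of Variance

$v_1(\bar{y}) = \mathrm{Var}_*\{\bar{y}^*\} = \frac{\mathrm{Var}_*\{y_1^*\}}{n^*} = \frac{n-1}{n} \frac{s^2}{n^*}$

Unbiased Choice: $n^* = n - 1$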
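A brief numerical check shows why $n^* = n - 1$ is the unbiased choice: with that replicate size the ideal bootstrap estimator equals $s^2 / n$, the usual unbiased estimator of $\mathrm{Var}\{\bar{y}\}$ under srs wr. The function name and data below are hypothetical.

```python
from statistics import variance

def ideal_bootstrap_variance_of_mean(y, n_star):
    """Closed form v1 = ((n - 1) / n) * s^2 / n_star for the sample mean."""
    n = len(y)
    s2 = variance(y)                  # s^2 with divisor n - 1
    return (n - 1) / n * s2 / n_star

y = [3.0, 5.0, 4.0, 8.0]
n = len(y)
v_unbiased_choice = ideal_bootstrap_variance_of_mean(y, n_star=n - 1)
v_same_size = ideal_bootstrap_variance_of_mean(y, n_star=n)
print(v_unbiased_choice, v_same_size, variance(y) / n)
```

Taking $n^* = n$ instead shrinks the estimate by the factor $(n - 1) / n$, which matters mostly when $n$ is small.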

Multistage Sampling with pps wr Sampling at the First Stage

Observed Data: $y_{ij}$, where $i$ indexes the selected PSU and $j$ indexes the completed interview within the PSU

Parameter of Interest: $Y$

Estimator: $\hat{Y} = \frac{1}{n} \sum_{i=1}^{n} \frac{\hat{Y}_i}{p_i} = \frac{1}{n} \sum_{i=1}^{n} z_i$

$\hat{Y}_i = \sum_{j} w_{ij} y_{ij}$

$z_i = \hat{Y}_i / p_i$

Bootstrap Sample: $z_1^*, z_2^*, \ldots, z_{n^*}^*$

Bootstrap Moments

$E_*\{z_1^*\} = \frac{1}{n} \sum_{i=1}^{n} z_i = \hat{Y}$

$\mathrm{Var}_*\{z_1^*\} = \frac{1}{n} \sum_{i=1}^{n} (z_i - \hat{Y})^2$

Ideal Bootstrap Estimator of Variance

$v_1(\hat{Y}) = \mathrm{Var}_*\{\hat{Y}^*\} = \frac{\mathrm{Var}_*\{z_1^*\}}{n^*} = \frac{1}{n^*} \frac{1}{n} \sum_{i=1}^{n} (z_i - \hat{Y})^2$.

Unbiased Choice: $n^* = n - 1$

Chapter 6: Taylor Series Methods

- Assume a complex survey design
- $\mathbf{Y} = (Y_1, \ldots, Y_p)$, vector of population totals
- $\hat{\mathbf{Y}} = (\hat{Y}_1, \ldots, \hat{Y}_p)$
- $\theta = g(\mathbf{Y})$, parameter of interest, such as the ratio $Y_1 / Y_2$
- $\hat{\theta} = g(\hat{\mathbf{Y}})$

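First-order Taylor series approximation

$\hat{\theta} - \theta = \sum_{j=1}^{p} \frac{\partial g(\mathbf{Y})}{\partial y_j} (\hat{Y}_j - Y_j) + R$

MSE

$\mathrm{MSE}\{\hat{\theta}\} \doteq \mathrm{Var}\left\{ \sum_{j=1}^{p} \frac{\partial g(\mathbf{Y})}{\partial y_j} (\hat{Y}_j - Y_j) \right\} = \mathbf{d}\, \boldsymbol{\Sigma}\, \mathbf{d}'$

$d_j = \frac{\partial g(\mathbf{Y})}{\partial y_j}$, $\boldsymbol{\Sigma} = E\{ (\hat{\mathbf{Y}} - \mathbf{Y})' (\hat{\mathbf{Y}} - \mathbf{Y}) \}$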

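- $v(\hat{\theta}) = \hat{\mathbf{d}}\, \hat{\boldsymbol{\Sigma}}\, \hat{\mathbf{d}}'$, with $\hat{d}_j = \frac{\partial g(\hat{\mathbf{Y}})}{\partial y_j}$ and $\hat{\boldsymbol{\Sigma}}$ obtained by a textbook or replication-based method applied to the $y$-data

- Alternative algorithm

$\hat{Y}_j = \sum_{i \in s} W_i Y_{ji}$

$\hat{U} = \sum_{i \in s} W_i U_i$

$U_i = \sum_{j=1}^{p} \frac{\partial g(\mathbf{Y})}{\partial y_j} Y_{ji}$

$\mathrm{MSE}\{\hat{\theta}\} \doteq \mathrm{Var}\{\hat{U}\}$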
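The alternative algorithm can be illustrated for the ratio $\theta = Y_1 / Y_2$. The sketch below assumes an srs wor design with weights $W_i = N/n$ so that the textbook variance formula applies to the linearized variable; the partial derivatives are evaluated at the estimated totals. The function name and data are hypothetical.

```python
from statistics import variance

def ratio_linearized(y1, y2, N):
    """Ratio estimate and Taylor series variance estimate (srs wor assumed)."""
    n = len(y1)
    W = N / n                                    # common weight under srs wor
    Y1_hat = W * sum(y1)
    Y2_hat = W * sum(y2)
    theta_hat = Y1_hat / Y2_hat
    # Partial derivatives of g(y1, y2) = y1 / y2 at the estimated totals
    d1 = 1 / Y2_hat
    d2 = -Y1_hat / Y2_hat**2
    # Linearized variable U_i = d1 * Y_1i + d2 * Y_2i
    u = [d1 * a + d2 * b for a, b in zip(y1, y2)]
    # Textbook variance estimator of U-hat = sum of W * U_i under srs wor
    v = N**2 * (1 - n / N) * variance(u) / n
    return theta_hat, v

# Hypothetical data: n = 4 units from N = 40
theta_hat, v = ratio_linearized([2.0, 5.0, 6.0, 9.0], [1.0, 2.0, 3.0, 5.0], N=40)
print(theta_hat, v)
```

The advantage of this route is that only one derived variable needs a variance calculation, rather than the full $p \times p$ covariance matrix.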

Chapter 7: Generalized Variance Functions

1. Population total, $X$

2. Estimator of the total, $\hat{X}$

3. Relative variance, $V^2 = \mathrm{Var}\{\hat{X}\} / X^2$

4. $V^2 = \alpha + \beta / X$
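Once the model parameters are available, the generalized variance function gives a variance estimate with one line of arithmetic. The sketch below assumes $\alpha$ and $\beta$ have already been fitted (the values shown are hypothetical) and substitutes the estimate $\hat{X}$ for the unknown $X$, which is an assumption of this illustration.

```python
def gvf_variance(x_hat, alpha, beta):
    """Variance estimate implied by the model V^2 = alpha + beta / X."""
    rel_var = alpha + beta / x_hat     # modeled relative variance at X-hat
    return rel_var * x_hat**2          # Var = V^2 * X^2

# Hypothetical fitted parameters and estimated total
v = gvf_variance(1000.0, alpha=0.0001, beta=2.0)
print(v)
```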
