2/1/13

Systematic and Complex Sampling

Professor Ron Fricker

Naval Postgraduate School

Monterey, California

Reading Assignment:

Scheaffer, Mendenhall, Ott, & Gerow

Chapter 7.1-7.4

Goals for this Lecture

•   Define systematic sampling

–   Examples

–   Estimators (assuming SRS equivalence)

•   Discuss examples of complex sampling designs

•   Explain the Kish grid

•   Introduce variance estimation under complex designs

What is Systematic Sampling?

•   Systematic sampling: Given a list of items, select every kth element in the list

–   Start by randomly selecting the first item from the first k elements

•   Basis for how random searches of cars coming onto a base are done

–   Often useful for things like sampling visitors to a web site

–   Recently wrote a sampling methodology for INSURV based on systematic sampling

•   See http://faculty.nps.edu/rdfricke/docs/NPS-OR-12-001.pdf
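The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the INSURV methodology: the function name `systematic_sample`, the frame of 100 items, and the choice k = 10 are all assumptions made for the example.

```python
import random

def systematic_sample(items, k, rng=None):
    """Return every k-th item, starting at a random position among the first k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    rng = rng or random.Random()
    start = rng.randrange(k)      # random 0-based start within the first k elements
    return items[start::k]        # then take every k-th element after it

frame = list(range(1, 101))       # hypothetical frame of N = 100 items
sample = systematic_sample(frame, k=10, rng=random.Random(1))
print(sample)
```

With N = 100 and k = 10 the sample always has 10 items, and only the random start differs from one draw to the next.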

Advantages and Disadvantages of Systematic Sampling

•   Advantages:

–   Can be easier to perform in the field

–   Less subject to selection errors by fieldworkers

–   Can provide more information per unit cost than SRS

•   Potential disadvantages:

–   If list systematically varies in a cycle of approximately every kth item, then can introduce a bias in the result

–   May be harder to estimate variance in some situations
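The cyclic-list problem is easy to see with a small simulation. The sketch below uses made-up data (a hypothetical daily series in which every 7th value is high) and a sampling interval k = 7 that matches the cycle exactly. It computes the sample mean for each possible random start.

```python
from statistics import mean

k = 7
# Hypothetical population: 52 weeks of daily values, every 7th day is high.
population = [200 if day % k == 0 else 100 for day in range(1, 52 * k + 1)]
true_mean = mean(population)

# One systematic sample mean for each possible random start 0..k-1.
sample_means = [mean(population[start::k]) for start in range(k)]

print(round(true_mean, 2))
print(sample_means)
```

Every realized sample here contains either only low days or only high days, so each sample mean is far from the population mean. The values within a sample are also identical, so the SRS-equivalent variance formula would report zero variance, which illustrates the second disadvantage as well.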

When To Use Systematic Sampling

•   If other forms of probability sampling, such as SRS, are too complicated to implement in the field

–   E.g., unreasonable to expect INSURV inspectors to either generate a random list of items to inspect or to run around the ship/submarine to inspect a random set of items

•   When generating a sampling frame list is impossible or too hard

–   Can be more effective and efficient to simply survey every kth item encountered

•   E.g., every kth visitor to a web site
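When no frame exists, the same idea can be applied to a stream of arrivals. The following is a minimal sketch under assumed names: `every_kth` and the `visitor-i` labels are hypothetical, and a real web survey would hook into the site's own request handling.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def every_kth(stream: Iterable[T], k: int, rng: random.Random) -> Iterator[T]:
    """Yield every k-th item of a stream, starting at a random position in 1..k."""
    start = rng.randint(1, k)
    for position, item in enumerate(stream, start=1):
        if position >= start and (position - start) % k == 0:
            yield item

# Hypothetical stream of 25 visitors; no list of visitors is needed in advance.
visitors = (f"visitor-{i}" for i in range(1, 26))
surveyed = list(every_kth(visitors, k=5, rng=random.Random(0)))
print(surveyed)
```

The generator never needs to know how many visitors will arrive, which is the practical appeal when a sampling frame cannot be built.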

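Mean Estimation Summary (Assuming SRS Equivalency)

•   Estimator for the mean:

$$\bar{y}_{sy} = \frac{1}{n} \sum_{i=1}^{n} y_i$$

•   Estimated variance:

$$\widehat{\mathrm{Var}}(\bar{y}_{sy}) = \left(1 - \frac{n}{N}\right) \frac{s^2}{n}$$

•   Bound on the error of estimation (margin of error):

$$2\sqrt{\widehat{\mathrm{Var}}(\bar{y}_{sy})} = 2\sqrt{\left(1 - \frac{n}{N}\right) \frac{s^2}{n}}$$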
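As a worked illustration of the mean estimator, its estimated variance, and the bound, here is a short Python sketch. The data values and N = 80 are invented for the example, and `systematic_mean_estimate` is a hypothetical helper name. It takes s² to be the usual sample variance with an n − 1 divisor.

```python
import math
from statistics import mean, variance

def systematic_mean_estimate(y, N):
    """Mean, estimated variance, and 2-standard-error bound under SRS equivalence."""
    n = len(y)
    y_bar = mean(y)
    s2 = variance(y)                   # sample variance (n - 1 divisor)
    var_hat = (1 - n / N) * s2 / n     # finite population correction times s^2 / n
    bound = 2 * math.sqrt(var_hat)
    return y_bar, var_hat, bound

y = [12, 15, 11, 14, 13, 16, 12, 15]   # hypothetical systematic sample, n = 8
y_bar, var_hat, bound = systematic_mean_estimate(y, N=80)
print(y_bar, round(var_hat, 4), round(bound, 4))
```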

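Estimating Totals (Assuming SRS Equivalency)

•   Estimator for the total:

$$\hat{\tau} = N \times \bar{y}_{sy} = \frac{N}{n} \sum_{i=1}^{n} y_i$$

•   Estimated variance:

$$\widehat{\mathrm{Var}}(\hat{\tau}) = \widehat{\mathrm{Var}}(N \bar{y}_{sy}) = N^2 \left(1 - \frac{n}{N}\right) \frac{s^2}{n}$$

•   Bound on the error of estimation (margin of error):

$$2\sqrt{\widehat{\mathrm{Var}}(\hat{\tau})} = 2N\sqrt{\left(1 - \frac{n}{N}\right) \frac{s^2}{n}}$$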
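The total estimator scales the mean by N, so its variance scales by N². A minimal sketch follows, again with invented data, an assumed N = 80, and a hypothetical helper name `systematic_total_estimate`.

```python
import math
from statistics import mean, variance

def systematic_total_estimate(y, N):
    """Total, estimated variance, and 2-standard-error bound under SRS equivalence."""
    n = len(y)
    tau_hat = N * mean(y)
    var_hat = N**2 * (1 - n / N) * variance(y) / n
    bound = 2 * math.sqrt(var_hat)
    return tau_hat, var_hat, bound

y = [12, 15, 11, 14, 13, 16, 12, 15]   # hypothetical systematic sample, n = 8
tau_hat, var_hat, bound = systematic_total_estimate(y, N=80)
print(tau_hat, round(var_hat, 2), round(bound, 2))
```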

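Estimating Proportions (Assuming SRS Equivalency)

•   Estimator for the proportion, where y_i = 1 if item i has the attribute and 0 otherwise:

$$\hat{p}_{sy} = \bar{y}_{sy} = \frac{1}{n} \sum_{i=1}^{n} y_i$$

•   Estimated variance (for 0/1 data, s^2/n equals \hat{p}_{sy}(1 - \hat{p}_{sy})/(n - 1)):

$$\widehat{\mathrm{Var}}(\hat{p}_{sy}) = \left(1 - \frac{n}{N}\right) \frac{\hat{p}_{sy}\left(1 - \hat{p}_{sy}\right)}{n - 1}$$

•   Bound on the error of estimation (margin of error):

$$2\sqrt{\widehat{\mathrm{Var}}(\hat{p}_{sy})} = 2\sqrt{\left(1 - \frac{n}{N}\right) \frac{\hat{p}_{sy}\left(1 - \hat{p}_{sy}\right)}{n - 1}}$$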
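For a proportion the data are 0/1 indicators, and the calculation can be sketched as below. The indicator values and N = 200 are hypothetical. The n − 1 divisor is an assumption, chosen because it makes the result agree with s²/n computed on 0/1 data.

```python
import math

def systematic_proportion_estimate(y, N):
    """Proportion, estimated variance, and bound for 0/1 data under SRS equivalence."""
    n = len(y)
    p_hat = sum(y) / n
    var_hat = (1 - n / N) * p_hat * (1 - p_hat) / (n - 1)
    bound = 2 * math.sqrt(var_hat)
    return p_hat, var_hat, bound

y = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]     # hypothetical indicators, n = 10
p_hat, var_hat, bound = systematic_proportion_estimate(y, N=200)
print(p_hat, round(var_hat, 5), round(bound, 4))
```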

Complex Sampling for Real-World Surveying

•   Usually, real world requirements and constraints result in complex sampling

–   Some combination of stratification and clustering along with unequal sampling probabilities

•   For example, geographic clustering arises with face-to-face interviewer-based surveys

–   Often it’s multi-stage clustering as well

•   Stratification often also necessary to ensure desired representation in sample

•   When combined, estimation gets much more complicated

NAEP Sampling Scheme

•   First stage: 96 PSUs consisting of metropolitan statistical areas (MSAs), a single non-MSA county, or a group of contiguous non-MSA counties

–   About a third of the PSUs are sampled with certainty

–   Remainder are stratified and one selected from each stratum with probability proportional to size

•   Second stage: selection of public and nonpublic schools within the PSUs

–   For elementary, middle, and secondary samples, independent samples of schools are selected with probability proportional to measures of size

•   Third and final stage: 25 to 30 eligible students are sampled systematically with probabilities designed to make the overall selection probabilities approximately constant

–   Except that students from private schools and schools with high proportions of black or Hispanic students are oversampled

•   In 1996 nearly 150,000 students were tested from just over 2,000 participating schools

Source: Grading the Nation's Report Card: Evaluating NAEP and Transforming the Assessment of Educational Progress, Board on Testing and Assessment (BOTA), National Academy of Science, 1999.
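Selecting one PSU per stratum with probability proportional to size can be sketched as follows. This is only an illustration of the general idea, not NAEP's actual procedure: the strata, PSU names, and size measures are invented, and `select_pps` is a hypothetical helper built on the standard library's `random.choices`.

```python
import random

def select_pps(units, sizes, rng):
    """Select one unit with probability proportional to its size measure."""
    return rng.choices(units, weights=sizes, k=1)[0]

# Hypothetical strata, each mapping PSU name -> size measure.
strata = {
    "stratum A": {"PSU 1": 50_000, "PSU 2": 150_000},
    "stratum B": {"PSU 3": 80_000, "PSU 4": 40_000, "PSU 5": 80_000},
}

rng = random.Random(42)
selected = {
    name: select_pps(list(psus), list(psus.values()), rng)
    for name, psus in strata.items()
}
print(selected)
```

In stratum A, PSU 2 is three times as large as PSU 1, so over many draws it would be chosen about 75% of the time.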

National Survey of Third World Country

•   First step: Stratify sample by state/province proportional to population

–   Oversample any state with fewer than 100 or 200 interviews to allow for state-to-state comparisons

•   Second step: Within state/province, stratify by urban and rural

–   Urban/rural stratification used to make sure that all localities are represented

–   As a general rule, locations of 10,000 or more classified urban, otherwise classified rural

•   Third step: Select PSUs within state/provinces and by urban/rural location

•   Fourth step: Select starting point within each PSU for each interviewer

–   Starting points defined as locations with sufficient public presence to be known by local residents, such as schools, markets, etc.
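The first step, proportional allocation with a floor for small states, can be sketched in Python. The state names, populations, total of 2,000 interviews, and floor of 200 are all invented for illustration, and `allocate_interviews` is a hypothetical helper.

```python
def allocate_interviews(populations, total_interviews, minimum):
    """Allocate interviews proportionally to population, raising small states to a floor."""
    total_pop = sum(populations.values())
    allocation = {}
    for state, pop in populations.items():
        proportional = round(total_interviews * pop / total_pop)
        allocation[state] = max(proportional, minimum)   # oversample small states
    return allocation

populations = {                       # hypothetical state populations
    "State A": 5_000_000,
    "State B": 3_000_000,
    "State C": 1_500_000,
    "State D": 500_000,
}
allocation = allocate_interviews(populations, total_interviews=2000, minimum=200)
print(allocation)
```

State D would get only 100 interviews under strict proportionality and is raised to 200. Oversampling makes the selection probabilities unequal across states, so national estimates would need weights to compensate.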

The World Health Survey Illustration

[Figure not reproduced]

Source: World Health Organization. The World Health Survey (WHS): Sampling Guidelines for Participating Countries. Accessed online at http://www.who.int/entity/healthinfo/survey/whssamplingguidelines.pdf.

House Selection Via Systematic Sampling

[Figure not reproduced]

Selection of Household in Multi-dwelling Structure

[Figure not reproduced]

Respondent Selection in Each House

•   To select the person to interview within a household:

–   List all adult males and females aged 18 years and above in the household on a Kish grid

•   A Kish grid is essentially a pre-assigned table of randomly generated numbers used to find the person to be interviewed

•   Alternative is the next-birthday method

–   One respondent is selected using the grid

–   Once the respondent is selected, the interview is conducted with only that respondent
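The idea of a pre-assigned table of random numbers can be sketched as below. This is a simplified stand-in, not Kish's published tables: the grid is generated at random, the cap of six adults and the oldest-to-youngest listing order are assumptions, and `build_grid` and `select_respondent` are hypothetical names.

```python
import random

MAX_ADULTS = 6  # hypothetical cap on the household sizes the grid covers

def build_grid(num_rows, rng):
    """Pre-assign a selection number for every (row, household size) cell."""
    return [
        [rng.randint(1, size) for size in range(1, MAX_ADULTS + 1)]
        for _ in range(num_rows)
    ]

def select_respondent(adults, questionnaire_number, grid):
    """Pick one listed adult using the row pre-assigned to this questionnaire."""
    if not adults:
        raise ValueError("household has no eligible adults")
    size = min(len(adults), MAX_ADULTS)   # adults beyond the cap cannot be selected
    row = grid[questionnaire_number % len(grid)]
    return adults[row[size - 1] - 1]      # grid entries are 1-based positions

grid = build_grid(num_rows=8, rng=random.Random(2013))
household = ["adult 1 (oldest)", "adult 2", "adult 3 (youngest)"]
respondent = select_respondent(household, questionnaire_number=17, grid=grid)
print(respondent)
```

Because the grid is fixed before fieldwork, the interviewer has no discretion over who is selected once the adults are listed.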

Kish Grid (aka Kish Tables) Example

[Table not reproduced: Kish grid example with overall selection probabilities]

Source: Kish, L. (1949). A Procedure for Objective Respondent Selection Within the Household, Journal of the American Statistical Association, 380-387.

Variance Estimation for Complex Designs

•   Complex sampling methods require nonstandard methods to estimate variances

–   I.e., can’t just plug the data into statistical software and use its default standard errors

–   (Very rare) exception: SRS with large population and low nonresponse

•   Software for (some) complex survey designs:

–   Free: CENVAR, VPLX, CPLX, EpiInfo

–   Commercial: SAS, Stata, SUDAAN, WesVar

•   Two estimation methods: Taylor series expansion and Jackknife

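Variance Estimation (Taylor Series)

•   Taylor series approximation: converts ratios into sums

•   Example: Variance for weighted mean

$$\bar{y}_w = \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i}$$

assuming an SRS can be expressed as

$$\mathrm{Var}(\bar{y}_w) \approx \frac{\mathrm{Var}\left(\sum w_i y_i\right) + \bar{y}_w^2 \, \mathrm{Var}\left(\sum w_i\right) - 2 \, \bar{y}_w \, \mathrm{Cov}\left(\sum w_i y_i, \sum w_i\right)}{\left(\sum w_i\right)^2}$$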
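The linearized variance of the weighted mean can be computed directly from the three pieces in the formula. The sketch below rests on stated assumptions: it treats the sample as an SRS with the finite population correction ignored, and it estimates the variance of each sum as n times the corresponding sample variance or covariance. The helper names and data are hypothetical.

```python
from statistics import mean

def sample_cov(a, b):
    """Sample covariance with an n - 1 divisor (sample variance when a is b)."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)

def weighted_mean_variance(y, w):
    """Weighted mean and its Taylor-series (linearized) variance estimate."""
    n = len(y)
    z = [wi * yi for wi, yi in zip(w, y)]      # z_i = w_i * y_i
    sum_w = sum(w)
    y_w = sum(z) / sum_w
    var_z = n * sample_cov(z, z)               # estimate of Var(sum of w_i y_i)
    var_w = n * sample_cov(w, w)               # estimate of Var(sum of w_i)
    cov_zw = n * sample_cov(z, w)              # estimate of Cov of the two sums
    var_hat = (var_z + y_w**2 * var_w - 2 * y_w * cov_zw) / sum_w**2
    return y_w, var_hat

y = [3.0, 5.0, 4.0, 6.0, 7.0]                  # hypothetical responses
w = [1.0, 2.0, 1.0, 3.0, 2.0]                  # hypothetical weights
y_w, var_hat = weighted_mean_variance(y, w)
print(round(y_w, 4), round(var_hat, 4))
```

With equal weights the two weight-related terms vanish and the estimate reduces to s²/n, a quick sanity check on the implementation.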

Variance Estimation (Jackknife and Balanced Repeated Replication)

•   Jackknife and balanced repeated replication methods rely on empirical methods

–   Basically, resample from data c times

–   Calculate overall mean as

$$\bar{y} = \frac{1}{c} \sum_{\gamma=1}^{c} \bar{y}_{\gamma}$$

and then estimate variance as

$$\widehat{\mathrm{Var}}(\bar{y}) = \frac{1}{c(c-1)} \sum_{\gamma=1}^{c} \left(\bar{y}_{\gamma} - \bar{y}\right)^2$$
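The replicate formula can be sketched as below. The replicates here are formed by randomly splitting the sample into c groups, a random-groups approach chosen because it fits the 1/(c(c − 1)) multiplier. That choice is an assumption for illustration. A delete-one jackknife or BRR would form replicates differently and use a different multiplier. The data and function names are hypothetical.

```python
import random
from statistics import mean

def replicate_variance(replicate_means):
    """Overall mean and variance estimate from c replicate means."""
    c = len(replicate_means)
    overall = mean(replicate_means)
    var_hat = sum((m - overall) ** 2 for m in replicate_means) / (c * (c - 1))
    return overall, var_hat

def random_group_means(y, c, rng):
    """Randomly split the sample into c groups and return each group's mean."""
    shuffled = y[:]
    rng.shuffle(shuffled)
    return [mean(shuffled[g::c]) for g in range(c)]

y = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0, 10.0, 17.0, 13.0, 14.0]
replicates = random_group_means(y, c=4, rng=random.Random(3))
overall, var_hat = replicate_variance(replicates)
print(round(overall, 3), round(var_hat, 4))
```

Because the groups are equal in size, the overall mean of the replicate means equals the ordinary sample mean, and only the variance estimate depends on how the groups were formed.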

What We Have Covered

•   Defined systematic sampling

–   Examples

–   Estimators (assuming SRS equivalence)

•   Discussed examples of complex sampling designs

•   Explained the Kish grid

•   Introduced variance estimation under complex designs
