2/1/13
Professor Ron Fricker
Naval Postgraduate School
Monterey, California
Reading Assignment:
Scheaffer, Mendenhall, Ott, & Gerow
Chapter 7.1-7.4
Goals for this Lecture
• Define systematic sampling
– Examples
– Estimators (assuming SRS equivalence)
• Discuss examples of complex sampling designs
• Explain the Kish grid
• Introduce variance estimation under complex designs
What is Systematic Sampling?
• Systematic sampling: Given a list of items, select every kth element in the list
– Start by randomly selecting the first item from the first k elements
• Basis for how random searches are done of cars coming onto a base
– Often useful for things like sampling visitors to a web site
– Recently wrote a sampling methodology for INSURV based on systematic sampling
• See http://faculty.nps.edu/rdfricke/docs/NPS-OR-12-001.pdf
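The selection rule (random start among the first k elements, then every kth element) can be sketched in a few lines of Python. This is a minimal illustration only; the function name `systematic_sample` and the 100-item frame are assumptions made for the example, not part of the INSURV methodology.

```python
import random

def systematic_sample(items, k, rng=None):
    """Return every k-th item, starting at a random position among the first k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    rng = rng or random.Random()
    start = rng.randrange(k)        # random start index in 0..k-1
    return list(items[start::k])    # then take every k-th element

# Hypothetical frame of 100 numbered items, sampled with k = 10
frame = list(range(1, 101))
sample = systematic_sample(frame, k=10, rng=random.Random(0))
print(sample)
```

Because only the starting point is random, there are just k possible samples, each selected with probability 1/k.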
Advantages and Disadvantages of Systematic Sampling
• Advantages:
– Can be easier to perform in the field
– Less subject to selection errors by fieldworkers
– Can provide more information per unit cost than SRS
• Potential disadvantages:
– If the list varies systematically in a cycle of approximately every kth item, sampling every kth item can introduce a bias in the result
– May be harder to estimate variance in some situations
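The periodicity problem can be seen with a small simulation. The sketch below uses a made-up list with a cycle of length 7 (for example, daily counts with a weekly pattern); the data and the helper name `systematic_means` are assumptions for illustration. When k equals the cycle length, each possible sample sees only one point of the cycle, so any single sample can be far from the true mean.

```python
import statistics

# Hypothetical list with a repeating cycle of length 7
pattern = [10, 12, 11, 13, 12, 30, 28]
population = pattern * 52
true_mean = statistics.mean(population)

def systematic_means(values, k):
    """Sample mean for each of the k possible starting points."""
    return [statistics.mean(values[start::k]) for start in range(k)]

means_k7 = systematic_means(population, 7)   # k equals the cycle length
means_k5 = systematic_means(population, 5)   # k does not line up with the cycle

print("true mean:", true_mean)
print("k = 7:", means_k7)
print("k = 5:", means_k5)
```

With k = 7 the sample means simply reproduce the seven values of the cycle, while with k = 5 every sample mean lands close to the true mean.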
When To Use Systematic Sampling
• If probability sampling is too complicated to implement in the field
– E.g., unreasonable to expect INSURV inspectors to either generate a random list of items to inspect or to run around the ship/submarine to inspect a random set of items
• When generating a sampling frame list is impossible or too hard
– Can be more effective and efficient to simply survey every kth item encountered
• E.g., every kth visitor to a web site
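When no frame exists, the same rule can be applied to units as they arrive. The generator below is a sketch under that assumption; `every_kth` and the visitor stream are hypothetical names used only for illustration.

```python
import random

def every_kth(stream, k, rng=None):
    """Yield every k-th element of an iterable, starting at a random offset in 0..k-1."""
    rng = rng or random.Random()
    start = rng.randrange(k)
    for index, item in enumerate(stream):
        if index % k == start:
            yield item

# Hypothetical stream of 50 visitors; no list of visitors is needed in advance
visitors = (f"visitor-{i}" for i in range(1, 51))
selected = list(every_kth(visitors, k=10, rng=random.Random(1)))
print(selected)
```

The final sample size depends on how many units eventually arrive, which is one practical difference from sampling a fixed list.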
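Mean Estimation Summary (Assuming SRS Equivalency)
• Estimator for the mean:
$$\bar{y}_{sy} = \frac{1}{n} \sum_{i=1}^{n} y_i$$
• Estimated variance of $\bar{y}_{sy}$:
$$\widehat{\mathrm{Var}}(\bar{y}_{sy}) = \left(1 - \frac{n}{N}\right) \frac{s^2}{n}$$
• Bound on the error of estimation (margin of error):
$$2\sqrt{\widehat{\mathrm{Var}}(\bar{y}_{sy})} = 2\sqrt{\left(1 - \frac{n}{N}\right) \frac{s^2}{n}}$$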
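As a worked illustration of the mean estimator, its estimated variance, and the bound, here is a short sketch. The data values, the population size N = 50, and the function name `mean_estimate` are assumptions for the example; s^2 is taken to be the usual sample variance with divisor n - 1.

```python
import math
import statistics

def mean_estimate(y, N):
    """Mean, estimated variance, and bound, treating the systematic sample as an SRS."""
    n = len(y)
    ybar = statistics.fmean(y)
    s2 = statistics.variance(y)          # sample variance, divisor n - 1
    var_hat = (1 - n / N) * s2 / n       # includes the finite population correction
    bound = 2 * math.sqrt(var_hat)
    return ybar, var_hat, bound

y = [4, 6, 5, 7, 8]                      # hypothetical sample of n = 5
ybar, var_hat, bound = mean_estimate(y, N=50)
print(ybar, var_hat, bound)
```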
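Estimating Totals (Assuming SRS Equivalency)
• Estimator for the total:
$$\hat{\tau} = N \times \bar{y}_{sy} = \frac{N}{n} \sum_{i=1}^{n} y_i$$
• Estimated variance of $\hat{\tau}$:
$$\widehat{\mathrm{Var}}(\hat{\tau}) = \widehat{\mathrm{Var}}(N \bar{y}_{sy}) = N^2 \left(1 - \frac{n}{N}\right) \frac{s^2}{n}$$
• Bound on the error of estimation (margin of error):
$$2\sqrt{\widehat{\mathrm{Var}}(\hat{\tau})} = 2\sqrt{N^2 \left(1 - \frac{n}{N}\right) \frac{s^2}{n}}$$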
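The total estimator scales the sample mean by N, and its variance scales by N squared. A minimal sketch follows, with hypothetical data, a hypothetical N = 50, and an illustrative function name `total_estimate`.

```python
import math
import statistics

def total_estimate(y, N):
    """Estimated total, estimated variance, and bound, treating the sample as an SRS."""
    n = len(y)
    tau_hat = N * statistics.fmean(y)
    var_hat = N ** 2 * (1 - n / N) * statistics.variance(y) / n
    bound = 2 * math.sqrt(var_hat)
    return tau_hat, var_hat, bound

y = [4, 6, 5, 7, 8]                      # hypothetical sample of n = 5
tau_hat, var_tau, bound_tau = total_estimate(y, N=50)
print(tau_hat, var_tau, bound_tau)
```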
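Estimating Proportions (Assuming SRS Equivalency)
• Estimator for the proportion:
$$\hat{p}_{sy} = \bar{y}_{sy} = \frac{1}{n} \sum_{i=1}^{n} y_i$$
• Estimated variance of $\hat{p}_{sy}$:
$$\widehat{\mathrm{Var}}(\hat{p}_{sy}) = \left(1 - \frac{n}{N}\right) \frac{\hat{p}_{sy}\left(1 - \hat{p}_{sy}\right)}{n}$$
• Bound on the error of estimation (margin of error):
$$2\sqrt{\widehat{\mathrm{Var}}(\hat{p}_{sy})} = 2\sqrt{\left(1 - \frac{n}{N}\right) \frac{\hat{p}_{sy}\left(1 - \hat{p}_{sy}\right)}{n}}$$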
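For a proportion, each y_i is 0 or 1 and the same calculation applies. The sketch below uses p(1 - p)/n in the variance; some texts divide by n - 1 instead, which matters little for large n. The data, N = 200, and the name `proportion_estimate` are assumptions for illustration.

```python
import math

def proportion_estimate(y, N):
    """Proportion, estimated variance, and bound for 0/1 data, treating the sample as an SRS."""
    n = len(y)
    p_hat = sum(y) / n
    var_hat = (1 - n / N) * p_hat * (1 - p_hat) / n
    bound = 2 * math.sqrt(var_hat)
    return p_hat, var_hat, bound

y = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]       # hypothetical 0/1 responses, n = 10
p_hat, var_p, bound_p = proportion_estimate(y, N=200)
print(p_hat, var_p, bound_p)
```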
Complex Sampling for Real-World Surveying
• Usually, real world requirements and constraints result in complex sampling
– Some combination of stratification and clustering along with unequal sampling probabilities
• For example, geographic clustering arises with face-to-face interviewer-based surveys
– Often it’s multi-stage clustering as well
• Stratification often also necessary to ensure desired representation in sample
• When combined, estimation gets much more complicated
NAEP Sampling Scheme
• First stage: 96 PSUs consisting of metropolitan statistical areas (MSAs), a single non-MSA county, or a group of contiguous non-MSA counties
– About a third of the PSUs are sampled with certainty
– Remainder are stratified and one selected from each stratum with probability proportional to size
• Second stage: selection of public and nonpublic schools within the PSUs
– For elementary, middle, and secondary samples, independent samples of schools are selected with probability proportional to measures of size
• Third and final stage: 25 to 30 eligible students are sampled systematically with probabilities designed to make the overall selection probabilities approximately constant
– Except students from private schools and schools with high proportions of black or Hispanic students, who are oversampled
• In 1996 nearly 150,000 students were tested from just over 2,000 participating schools
Source: Grading the Nation's Report Card: Evaluating NAEP and Transforming the Assessment of Educational Progress ,
Board on Testing and Assessment (BOTA), National Academy of Science, 1999.
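The first-stage idea of selecting one PSU per stratum with probability proportional to size can be sketched as follows. This is not NAEP's actual procedure; the strata, PSU names, sizes, and the function `select_one_pps` are all hypothetical.

```python
import random

def select_one_pps(psus, rng=None):
    """Select one PSU from a list of (name, size) pairs with probability proportional to size."""
    rng = rng or random.Random()
    names = [name for name, _ in psus]
    sizes = [size for _, size in psus]
    return rng.choices(names, weights=sizes, k=1)[0]

# Hypothetical strata, each holding PSUs with a measure of size
strata = {
    "stratum A": [("PSU 1", 5000), ("PSU 2", 15000)],
    "stratum B": [("PSU 3", 8000), ("PSU 4", 2000), ("PSU 5", 10000)],
}
rng = random.Random(42)
selected_psus = {name: select_one_pps(psus, rng) for name, psus in strata.items()}
print(selected_psus)
```

In stratum A, for instance, PSU 2 is three times as likely to be chosen as PSU 1 because its size measure is three times larger.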
National Survey of Third World Country
• First step: Stratify sample by state/province proportional to population
– Oversample any state with less than 100 or 200 interviews to allow for state-to-state comparisons
• Second step: Within state/province, stratify by urban and rural
– Urban/rural stratification used to make sure that all localities are represented
– As a general rule, locations of 10,000 or more classified urban, otherwise classified rural
• Third step: Select PSUs within state/provinces and by urban/rural location
• Fourth step: Select starting point within each PSU for each interviewer
– Starting points defined as locations with sufficient public presence to be known by local residents, such as schools, markets, etc.
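The first step (proportional allocation with a minimum per state) can be illustrated with a small sketch. The populations, the total of 2,000 interviews, the floor of 100, and the function name `allocate_interviews` are assumptions for the example; a real design would also handle rounding and budget constraints more carefully.

```python
def allocate_interviews(populations, total_interviews, minimum):
    """Allocate interviews proportionally to population, raising small states to a minimum."""
    total_pop = sum(populations.values())
    allocation = {}
    for state, pop in populations.items():
        proportional = round(total_interviews * pop / total_pop)
        allocation[state] = max(proportional, minimum)   # oversample small states
    return allocation

# Hypothetical state populations
populations = {"State A": 5_000_000, "State B": 3_000_000, "State C": 200_000}
allocation = allocate_interviews(populations, total_interviews=2000, minimum=100)
print(allocation)
```

Because State C is raised to the minimum, the allocations sum to slightly more than the nominal total, and the oversampled state must be down-weighted in national estimates.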
The World Health Survey Illustration
Source: World Health Organization. The World Health Survey (WHS): Sampling Guidelines for Participating Countries.
Accessed online at http://www.who.int/entity/healthinfo/survey/whssamplingguidelines.pdf.
House Selection Via Systematic Sampling
Selection of Household in Multi-dwelling Structure
Respondent Selection in Each House
• To select the person to interview within a household:
– List all adult males and females aged 18 years and above in the household on a Kish grid
• A Kish grid is essentially a table of randomly generated numbers
• It’s a pre-assigned table of random numbers to find the person to be interviewed
• Alternative is the next-birthday method
– One respondent is selected using the grid
– Once the respondent is selected, the interview is conducted with only that respondent
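The lookup mechanics of a pre-assigned selection table can be sketched as below. This is a simplified, hypothetical grid and not Kish's published tables: rows are indexed by the number of eligible adults, columns by the last digit of a questionnaire serial number, and the entries are filled with a simple cyclic pattern purely for illustration.

```python
def build_grid(max_adults=6, n_columns=10):
    """Hypothetical pre-assigned grid: grid[n][column] is the position of the adult to interview."""
    return {n: [(col % n) + 1 for col in range(n_columns)]
            for n in range(1, max_adults + 1)}

def select_respondent(adults, serial_number, grid):
    """Look up which listed adult to interview; adults must be listed in a fixed order."""
    if not adults:
        raise ValueError("household has no eligible adults")
    n = min(len(adults), max(grid))          # households larger than the grid use the last row
    column = serial_number % len(grid[n])    # e.g., last digit of the questionnaire number
    position = grid[n][column]
    return adults[position - 1]

grid = build_grid()
household = ["adult A", "adult B", "adult C"]   # hypothetical listing, in a fixed order
respondent = select_respondent(household, serial_number=17, grid=grid)
print(respondent)
```

The point of the table is that the interviewer has no discretion: the household listing and the questionnaire number determine the respondent.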
Kish Grid (aka Kish Tables) Example
Overall Selection Probabilities
Source: Kish, L. (1949). A Procedure for Objective Respondent Selection Within the Household,
Journal of the American Statistical Association, 380-387.
Variance Estimation for Complex Designs
• Complex sampling methods require nonstandard methods to estimate variances
– I.e., can’t just plug the data into statistical software and use its default standard errors
– (Very rare) exception: SRS with large population and low nonresponse
• Software for (some) complex survey designs:
– Free: CENVAR, VPLX, CPLX, EpiInfo
– Commercial: SAS, Stata, SUDAAN, WesVar
• Two estimation methods: Taylor series expansion and Jackknife
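• Taylor series approximation: converts ratios into sums
• Example: Variance for weighted mean
$$\bar{y}_w = \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i}$$
assuming an SRS can be expressed as
$$\mathrm{Var}(\bar{y}_w) \approx \frac{\mathrm{Var}\left(\sum w_i y_i\right) + \bar{y}_w^2 \, \mathrm{Var}\left(\sum w_i\right) - 2 \, \bar{y}_w \, \mathrm{Cov}\left(\sum w_i y_i, \sum w_i\right)}{\left(\sum w_i\right)^2}$$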
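To make the linearized variance of a weighted mean concrete, the sketch below plugs sample estimates into the three terms (variance of the weighted sum, variance of the sum of weights, and their covariance). It assumes simple random sampling with replacement and no finite population correction, so the variance of a sample sum is estimated by n times the sample variance; the data and function names are hypothetical.

```python
def sample_cov(a, b):
    """Sample covariance with divisor n - 1 (the sample variance when a is b)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (z - mean_b) for x, z in zip(a, b)) / (n - 1)

def weighted_mean_variance(y, w):
    """Weighted mean and its Taylor-series (linearization) variance estimate."""
    n = len(y)
    u = [wi * yi for wi, yi in zip(w, y)]     # u_i = w_i * y_i
    sum_w = sum(w)
    ybar_w = sum(u) / sum_w
    var_sum_u = n * sample_cov(u, u)          # estimates Var(sum of w_i y_i)
    var_sum_w = n * sample_cov(w, w)          # estimates Var(sum of w_i)
    cov_sums = n * sample_cov(u, w)           # estimates Cov of the two sums
    var_hat = (var_sum_u + ybar_w ** 2 * var_sum_w
               - 2 * ybar_w * cov_sums) / sum_w ** 2
    return ybar_w, var_hat

y = [2, 4, 6, 8]                              # hypothetical responses
w = [1, 2, 1, 2]                              # hypothetical weights
ybar_w, var_w_hat = weighted_mean_variance(y, w)
print(ybar_w, var_w_hat)
```

With equal weights the two weight-related terms vanish and the result reduces to s^2/n, the usual SRS variance of a mean without the finite population correction.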
• Jackknife and balanced repeated replication methods rely on empirical methods
– Basically, resample from data c times
– Calculate overall mean as
$$\bar{y} = \frac{1}{c} \sum_{\gamma=1}^{c} \bar{y}_\gamma$$
and then estimate variance as
$$\widehat{\mathrm{Var}}(\bar{y}) = \frac{1}{c(c-1)} \sum_{\gamma=1}^{c} \left(\bar{y}_\gamma - \bar{y}\right)^2$$
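Given c replicate estimates, the overall mean and the replication-based variance follow directly. The sketch below assumes the replicate means have already been computed by whatever resampling scheme the design calls for, and uses the divisor c(c - 1); the replicate values and the function name are hypothetical.

```python
def replication_variance(replicate_means):
    """Overall mean of c replicate estimates and the variance estimate with divisor c(c - 1)."""
    c = len(replicate_means)
    if c < 2:
        raise ValueError("at least two replicates are needed")
    overall = sum(replicate_means) / c
    squared_deviations = sum((m - overall) ** 2 for m in replicate_means)
    return overall, squared_deviations / (c * (c - 1))

replicates = [10.0, 12.0, 14.0]            # hypothetical replicate means, c = 3
overall_mean, var_rep = replication_variance(replicates)
print(overall_mean, var_rep)
```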
What We Have Covered
• Defined systematic sampling
– Examples
– Estimators (assuming SRS equivalence)
• Discussed examples of complex sampling designs
• Explained the Kish grid
• Introduced variance estimation under complex designs