EMR 6500: Survey Research Dr. Chris L. S. Coryn Lyssa N. Wilson

Spring 2015
Agenda
• Elements of the sampling problem
• Some basic concepts of statistics
• Case Study #1
Elements of the Sampling
Problem
Technical Terms (Again)
• An element is an object on which a
measurement is taken
• A population is a collection of elements
to which an inference is made
• A sample is a collection of sampling
units drawn from a frame or frames
• Sampling units are nonoverlapping
collections of elements from the
population that cover the entire
population
• A frame is a list of sampling units
How to Select the Sample: The
Design of the Survey Sample
How to Select the Sample
• The objective of sampling is to
estimate population parameters such
as the mean, proportion, or total
• The quantity of information is
controlled by the number of units
included in a sample and the method
used to select a sample
How to Select the Sample
• The primary questions addressed by
sampling theory are:
– What sampling procedure should be
used?
– What number of sampling units should
be included in a sample?
• The answer to both depends on how
much information one is willing to
buy
How to Select the Sample
• If $\theta$ is the population parameter of
interest and $\hat{\theta}$ is an estimator of $\theta$,
then a bound on the error of
estimation, B, should be specified
such that $\theta$ and $\hat{\theta}$ differ in
absolute value by less than B
$$\text{Error of estimation} = |\theta - \hat{\theta}| < B$$
How to Select the Sample
• A probability, $(1 - \alpha)$, specifies the
fraction of times in repeated samples
that the error of estimation is less
than B
$$P[\text{Error of estimation} < B] = 1 - \alpha$$
How to Select the Sample
• Typically B is set to $2\sigma_{\hat{\theta}}$ (two standard
deviations of the estimator) and,
therefore, $(1 - \alpha)$ will be approximately
.95
• Once a bound, B, has been specified,
along with its associated probability,
$(1 - \alpha)$, different sampling designs can
be compared to determine which is
most efficient for a particular
purpose
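The repeated-sampling interpretation of the bound can be checked with a small simulation. The sketch below is illustrative only: the population is made up, the estimator is the sample mean, and the estimator's standard deviation is taken empirically from the repeated samples rather than from a formula. The function name `two_sd_coverage` and all parameter values are assumptions, not part of the course material.

```python
import random
import statistics

def two_sd_coverage(population, n, repetitions=2000, seed=1):
    """Share of repeated simple random samples whose sample mean lies
    within B = 2 standard deviations (of the estimator) of the true mean."""
    rng = random.Random(seed)
    theta = statistics.fmean(population)      # population parameter
    # One estimate (sample mean) per repeated sample of size n.
    estimates = [statistics.fmean(rng.sample(population, n))
                 for _ in range(repetitions)]
    sigma = statistics.pstdev(estimates)      # empirical sd of the estimator
    bound = 2 * sigma                         # bound on the error of estimation
    within = sum(abs(est - theta) < bound for est in estimates)
    return within / repetitions

# Hypothetical population: 1,000 roughly normal measurements.
pop_rng = random.Random(0)
population = [pop_rng.gauss(50, 10) for _ in range(1000)]
coverage = two_sd_coverage(population, n=50)
print(f"Share of samples with error below B: {coverage:.3f}")
```

When the estimator is approximately normally distributed, the printed share should land near .95.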
Probability Sampling
• Statistical estimation requires
randomness in sampling designs so
that properties of statistical
estimators can be assessed
probabilistically
• Sampling designs based on planned
randomness are probability samples
Simple Random Sampling
• The basic probability sampling
design, simple random sampling,
consists of selecting a group of n
sampling units in such a way that all
samples of size n have the same
probability of selection
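As a minimal sketch, a simple random sample can be drawn from a frame with Python's standard library. The frame of numbered units and the helper name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n sampling units without replacement so that every
    possible sample of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical frame: sampling units numbered 1 to 500.
frame = list(range(1, 501))
srs = simple_random_sample(frame, n=25, seed=42)
print(sorted(srs))
```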
Stratified Random Sampling
• A stratified random sample is one
obtained by separating the
population elements into discrete,
nonoverlapping groups, called strata,
and then selecting a simple random
sample from each stratum
Stratified Random Sampling
• The principal reasons for using stratified
random sampling rather than simple random
sampling are:
1. Stratification may produce a smaller bound on the
error of estimation than would be produced by a
simple random sample of the same size (this is
particularly true if measurements within strata
are homogeneous)
2. The cost per observation may be reduced by
stratification of the population elements into
convenient groupings
3. Estimates of population parameters may be
desired for subgroups of the population (these
subgroups should then be identifiable strata)
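The stratified design described above can be sketched as follows: group the frame into strata, then take a simple random sample within each stratum. The function name, the `stratum_of` callback, and the per-stratum sample sizes are illustrative assumptions; how to allocate the sample across strata is not addressed here.

```python
import random
from collections import defaultdict

def stratified_random_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Separate units into nonoverlapping strata, then draw a simple
    random sample of the requested size from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    return {label: rng.sample(units, n_per_stratum[label])
            for label, units in strata.items()}

# Hypothetical frame of (region, household id) pairs.
households = ([("urban", i) for i in range(300)]
              + [("rural", i) for i in range(100)])
strat = stratified_random_sample(
    households,
    stratum_of=lambda unit: unit[0],
    n_per_stratum={"urban": 30, "rural": 10},
    seed=7,
)
print({label: len(units) for label, units in strat.items()})
```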
Cluster Sampling
• Cluster sampling is a less costly
alternative to simple or stratified
random sampling if the cost of
obtaining a frame that lists all
population elements is very high or if
the cost of obtaining observations
increases as the distance separating
elements increases
Cluster Sampling
• Cluster sampling is an effective
design for obtaining a specified
amount of information under the
following conditions:
1. A good frame listing all population
elements is not available or is very
costly to obtain, but a frame listing
clusters is easily obtained
2. The cost of obtaining observations
increases as the distance separating
the elements increases
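A minimal sketch of the idea, assuming a one-stage design in which only a frame of clusters is needed and every element of each selected cluster is observed. The herd data and the function name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Draw a simple random sample of m clusters from the cluster frame
    and keep every element inside the selected clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)   # the frame lists clusters only
    return {cluster_id: list(clusters[cluster_id]) for cluster_id in chosen}

# Hypothetical frame: 12 herds of 8 animals each (1 = diseased, 0 = healthy).
data_rng = random.Random(0)
herds = {f"herd{h:02d}": [data_rng.randint(0, 1) for _ in range(8)]
         for h in range(12)}
observed = cluster_sample(herds, m=4, seed=3)
print({herd: sum(animals) for herd, animals in observed.items()})
```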
Cluster Sampling
• Clusters typically consist of herds,
households, or other units of clustering
(e.g., an orange tree forms a cluster of
oranges for investigating insect infestations)
• A farm herd contains a cluster of livestock
for estimating proportions of diseased
animals
• Elements within a cluster are often
physically close together and hence tend to
have similar characteristics; the
measurement on one element within a
cluster may therefore be correlated with the
measurement on another
Stratified Sampling vs. Cluster Sampling
• Membership
– Stratified: each element of the population is in
exactly one stratum
– Cluster: each element of the population is in
exactly one cluster
• Selection
– Stratified: take a simple random sample from
every stratum
– Cluster: take a simple random sample of clusters;
observe all elements within clusters in the sample
• Variance of the estimate
– Stratified: depends on the variability within strata
– Cluster: depends primarily on the variability
between clusters
• Greatest precision
– Stratified: individual elements within each stratum
should have similar values, but stratum means
should differ from each other as much as possible
– Cluster: individual elements within each cluster
should be heterogeneous, and cluster means should
be similar to one another
Systematic Sampling
• Systematic sampling involves
random selection of one element
from the first k elements and then
selecting every kth element
thereafter
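The selection rule can be sketched in a few lines: pick a random start among the first k elements, then step through the frame in increments of k. The frame and the function name `systematic_sample` are hypothetical, and positions are 0-based here.

```python
import random

def systematic_sample(frame, k, seed=None):
    """1-in-k systematic sample: random start among the first k
    elements, then every kth element thereafter."""
    rng = random.Random(seed)
    start = rng.randrange(k)       # 0-based position within the first k
    return frame[start::k]

# Hypothetical ordered frame of 200 units, sampled 1-in-20.
units = list(range(1, 201))
sys_sample = systematic_sample(units, k=20, seed=5)
print(sys_sample)
```

When the frame length is not a multiple of k, the realized sample size depends on the random start.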
Systematic Sampling
• Systematic sampling is a useful alternative
to simple random sampling for the following
reasons:
1. Systematic sampling is easier to perform in
the field and hence is less subject to selection
errors by field-workers than are either simple
random samples or stratified random
samples, especially if a good frame is not
available
2. Systematic sampling can provide greater
information per unit cost than simple random
sampling can provide for certain populations
with certain patterns in the arrangement of
elements
Multi-Stage Sampling
• Sampling conducted in stages, often
taking into account the hierarchical
(nested) structure of a population
– Primary sampling units (PSUs) are sampled
first (e.g., cities)
– Secondary sampling units (SSUs) are
sampled next (e.g., city blocks)
– Ultimate sampling units (actual elements)
are sampled last (e.g., households)
• Especially useful when no frame can be
established for a single-stage sample
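The three stages above can be sketched with a nested structure of cities, blocks, and households. This is an illustration under simplifying assumptions: equal-probability simple random sampling at every stage, fixed sample sizes per stage, and a made-up population. The function name `multistage_sample` is hypothetical.

```python
import random

def multistage_sample(population, n_psu, n_ssu, n_units, seed=None):
    """Sample PSUs (cities), then SSUs (blocks) within each selected PSU,
    then ultimate units (households) within each selected SSU."""
    rng = random.Random(seed)
    selected = []
    for city in rng.sample(sorted(population), n_psu):
        blocks = population[city]
        for block in rng.sample(sorted(blocks), n_ssu):
            for household in rng.sample(blocks[block], n_units):
                selected.append((city, block, household))
    return selected

# Hypothetical population: 5 cities x 4 blocks x 6 households.
nested = {
    f"city{c}": {
        f"block{b}": [f"hh{c}-{b}-{h}" for h in range(6)]
        for b in range(4)
    }
    for c in range(5)
}
stage_sample = multistage_sample(nested, n_psu=2, n_ssu=2, n_units=3, seed=11)
print(len(stage_sample), stage_sample[:3])
```

Only the selected cities need block lists, and only the selected blocks need household lists, which is why no single frame of all households is required.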
Multi-Stage Sampling
• For a fixed sample size of elements,
a multi-stage sampling design is
almost always less efficient than a
simple random sample (though often
more feasible)
• Variance estimation methods for
complex sample designs must be
used to obtain correct standard
errors
Multiple-Frame Sampling
Quota Sampling
• A nonprobability sampling method
(although randomness is sometimes
part of the design) in which a
prespecified number of surveys is
obtained from specific subgroups of a
target population (e.g., Republicans,
Democrats)
• Introduces unknown sampling biases
into survey estimates
Chain-Referral Sampling
• Snowball sampling methods for sampling in
rare/hard-to-reach populations
• One or more persons having the trait of
interest serve as seeds and identify others
• Persons with many connections are likely to
be included, whereas isolated persons may
not be included at all
• Information about network connections in
the sample can be used to weight sample
units (respondent-driven sampling, which is
premised on Markov-chain theory)
Recruitment Network
[Figure: recruitment network diagram with nodes marked + or –]
Equilibrium
[Figure: percentage of population (0% to 100%) plotted against recruitment wave (0 to 10)]
Planning a Survey
1. Statement of objectives
2. Target population
3. The frame
4. Sample design
5. Method of measurement
6. Measurement instrument
7. Selection and training of fieldworkers
8. The pretest (pilot)
9. Organization of fieldwork
10. Organization of data management
11. Data analysis
Some Basic Concepts of
Statistics
Finite Population Correction
• Most statistical theory is premised on an
underlying infinite population
• Sampling theory and practice are founded on
the assumption of sampling from a finite
population
• In the general framework of finite
population sampling, samples of size n
are taken from a population of size N
• In the finite population case, the variance
estimate of a statistical estimator must be
adjusted due to the fact that not all data
from a finite population are observed, using
the finite population correction (fpc)
Finite Population Correction
• For simple random samples (without
replacement) the fpc is expressed as
$$1 - \frac{n}{N} \quad \text{or} \quad 1 - f$$
• Where $f$ is the sampling fraction or
rate
$$f = \frac{n}{N}$$
Finite Population Correction
• The fpc is, therefore, the fraction of a
finite population that is not sampled
• Because the fpc is literally a factor in
the calculation of an estimate of
variance for an estimated finite
population parameter, the estimated
variance is reduced to zero if n = N
Finite Population Correction
• When n is small relative to N, the fpc is
close to unity
• In samples of very large populations f
is very small and the fpc may be
ignored
– Ignore if $1 - n/N > .95$ (that is, if less than 5% of the population is sampled)
• Although the fpc is applicable for
estimation, it often is not necessary for
many inferential uses such as statistical
significance testing (e.g., comparison
between sampled subgroups).
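A short sketch of the fpc and the rule of thumb for ignoring it. The function names are hypothetical; the .95 threshold is the one stated above.

```python
def fpc(n, N):
    """Finite population correction 1 - n/N for a simple random sample
    of size n drawn without replacement from a population of size N."""
    if not 0 < n <= N:
        raise ValueError("require 0 < n <= N")
    return 1 - n / N

def fpc_can_be_ignored(n, N, threshold=0.95):
    """Rule of thumb: ignore the fpc when 1 - n/N exceeds the threshold."""
    return fpc(n, N) > threshold

# Hypothetical sample sizes drawn from a population of 1,000.
for n in (10, 100, 1000):
    print(n, fpc(n, 1000), fpc_can_be_ignored(n, 1000))
```

A census (n = N) gives an fpc of zero, which is why the estimated variance vanishes in that case.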
Estimate of Population Mean
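$$\hat{\mu} = \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$
$$\hat{V}(\bar{y}) = \left(1 - \frac{n}{N}\right)\frac{s^2}{n}$$
$$2\sqrt{\hat{V}(\bar{y})} = 2\sqrt{\left(1 - \frac{n}{N}\right)\frac{s^2}{n}}$$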
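The mean estimator, its estimated variance, and the two-standard-error bound can be computed as in the sketch below. It assumes a simple random sample drawn without replacement and takes $s^2$ to be the usual sample variance with divisor n − 1. The data and the function name `estimate_mean` are made up for illustration.

```python
import statistics

def estimate_mean(sample, N):
    """Return (y_bar, estimated variance of y_bar, bound on the error)
    for a simple random sample from a population of size N."""
    n = len(sample)
    y_bar = statistics.fmean(sample)
    s2 = statistics.variance(sample)       # sample variance, divisor n - 1
    var_hat = (1 - n / N) * s2 / n         # fpc times s^2 / n
    bound = 2 * var_hat ** 0.5             # two estimated standard errors
    return y_bar, var_hat, bound

# Hypothetical sample of n = 4 measurements from N = 40 units.
y_bar, var_hat, bound = estimate_mean([2, 4, 6, 8], N=40)
print(y_bar, var_hat, bound)
```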
Estimate of Population
Proportion
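$$\hat{p} = \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$
$$\hat{V}(\hat{p}) = \left(1 - \frac{n}{N}\right)\frac{\hat{p}\hat{q}}{n - 1}$$
where
$$\hat{q} = 1 - \hat{p}$$
$$2\sqrt{\hat{V}(\hat{p})} = 2\sqrt{\left(1 - \frac{n}{N}\right)\frac{\hat{p}\hat{q}}{n - 1}}$$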
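For a proportion, each observation is coded 1 (has the attribute) or 0 (does not), so the estimator is simply the mean of the 0/1 values. The sketch below uses made-up responses; `estimate_proportion` is a hypothetical helper name.

```python
def estimate_proportion(sample, N):
    """Return (p_hat, estimated variance of p_hat, bound on the error)
    for a simple random sample of 0/1 observations from N units."""
    n = len(sample)
    p_hat = sum(sample) / n
    q_hat = 1 - p_hat
    var_hat = (1 - n / N) * p_hat * q_hat / (n - 1)
    bound = 2 * var_hat ** 0.5             # two estimated standard errors
    return p_hat, var_hat, bound

# Hypothetical survey: 30 of 100 sampled respondents say yes, N = 1,000.
responses = [1] * 30 + [0] * 70
p_hat, p_var, p_bound = estimate_proportion(responses, N=1000)
print(p_hat, p_var, p_bound)
```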
Estimate of Population Total
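$$\hat{\tau} = N\bar{y} = \frac{N\sum_{i=1}^{n} y_i}{n}$$
$$\hat{V}(\hat{\tau}) = \hat{V}(N\bar{y}) = N^2\left(1 - \frac{n}{N}\right)\left(\frac{s^2}{n}\right)$$
$$2\sqrt{\hat{V}(N\bar{y})} = 2\sqrt{N^2\left(1 - \frac{n}{N}\right)\left(\frac{s^2}{n}\right)}$$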
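The total is the sample mean scaled up by N, so its estimated variance is scaled by N squared. A minimal sketch follows, again assuming a simple random sample without replacement and the sample variance with divisor n − 1. The data and the name `estimate_total` are illustrative.

```python
import statistics

def estimate_total(sample, N):
    """Return (tau_hat, estimated variance of tau_hat, bound on the error)
    for a simple random sample from a population of size N."""
    n = len(sample)
    tau_hat = N * statistics.fmean(sample)
    s2 = statistics.variance(sample)           # sample variance, divisor n - 1
    var_hat = N ** 2 * (1 - n / N) * s2 / n
    bound = 2 * var_hat ** 0.5                 # two estimated standard errors
    return tau_hat, var_hat, bound

# Hypothetical sample of n = 4 measurements from N = 40 units.
tau_hat, tau_var, tau_bound = estimate_total([2, 4, 6, 8], N=40)
print(tau_hat, tau_var, tau_bound)
```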
Case Study #1
Case Study Activity
• In small groups, address the
following questions in relation to
Case Study #1 relying only on the
material that was discussed thus far
in the semester
1. Has the surveyor committed any
serious error(s)?
2. If so, what type and why? If not, why?