Design Effects - Population Research Institute

Design Effects: What are they and how do they affect your analysis?
David R. Johnson
Population Research Institute &
Department of Sociology
The Pennsylvania State University
What are Design Effects?
• Applies to the analysis of data gathered in a sample
from a population.
• For Social Science folks, this is survey data.
• Design effects are the ways in which departures of the sampling design from a simple random sample (SRS) affect statistical estimates from the sample.
• These departures from an SRS can affect:
– Standard errors and significance tests
– Estimates of coefficients
Simple Random Sampling
• Much of statistical theory used to develop inferential
statistics assumes a simple random sample.
• SRS assumptions include:
– Equal probability of selection for all elements
– Each element selected at random independently from other
elements in the sample.
• If these assumptions are not met, the estimates are likely to be in error (biased).
• Yet most sample surveys depart from an SRS design.
Why Depart from a Simple Random Sample?
• To reduce data collection costs (increase the efficiency of the sample).
– Cluster sampling
– Stratification
– Disproportionate sampling
• To adjust for bias in the sample.
– Design weights (adjust for disproportionate sampling)
– Post-estimation weights (adjust for non-response and coverage)
Use of Clustering in Sampling Designs
Figure: Example of a cluster sampling design in a multistage area probability sample.
• The sample would include several (5 – 10) housing units in the final segment.
• Violates the SRS assumption that elements are sampled independently.
• Reduces cost by greatly decreasing listing and interviewer costs.
Source: http://ccnmtl.columbia.edu/projects/qmss/samples_and_sampling/types_of_sampling.html
Other common clustered designs
• Students in schools: schools are randomly sampled, and multiple students are surveyed in each selected school.
– Example: Add Health (80 schools; many students in each school)
• Members in organizations.
– Example: A random sample of long-term care providers in which all employees in each organization were surveyed.
The Impact of Clustering
• Because two elements sampled within the same cluster may be more similar than two elements selected from different clusters, adding more elements within clusters gains less information than adding more clusters.
• This can result in higher standard errors than would be found in a simple random sample.
A measure of Design Effect (deff)
• deff is the ratio of the sampling variance of an estimate under the actual sample design to its sampling variance under a simple random sample of the same size.
• For clusters of equal size, deff = 1 + rho (n – 1)
• where rho is the intraclass correlation and n is the number of elements in each cluster.
• rho measures the similarity of two randomly selected elements within a cluster compared to two randomly selected elements from different clusters. The higher the value, the more similar elements are within clusters.
• A deff of 2, for example, would mean that the sample would have to be twice as large to yield the same sampling variability (standard errors) that would have been found with a simple random sample.
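The formula and its "how much larger must the sample be" reading can be sketched in a few lines of Python. This is our own illustration, not part of the original slides: the function names are hypothetical, it assumes equal-size clusters as the formula does, and it uses a hypothetical design of 50 clusters of 10 elements. The effective sample size is the total sample size divided by deff, i.e. the size of the simple random sample that would give the same sampling variance.

```python
def design_effect(rho: float, cluster_size: float) -> float:
    """Approximate design effect for clusters of equal size: 1 + rho * (n - 1)."""
    return 1.0 + rho * (cluster_size - 1.0)


def effective_sample_size(total_n: float, deff: float) -> float:
    """Size of a simple random sample with the same sampling variance as the design."""
    return total_n / deff


# Hypothetical design: 50 clusters of 10 elements each, 500 interviews in total.
for rho in (0.0, 0.05, 0.2, 1.0):
    deff = design_effect(rho, 10)
    n_eff = effective_sample_size(500, deff)
    print(f"rho = {rho:.2f}   deff = {deff:.2f}   effective n = {n_eff:.0f}")
```

With rho = 1 the 500 interviews carry only as much information as 50 independent ones; a deff of 2 halves the effective sample size, which is the "twice as large" interpretation above.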
Example
• A study of rent rates in large apartment complexes.
• Draw a random sample of 50 apartment complexes in the
population.
• Randomly sample 10 apartments in each complex (n = 10).
• If the rent of each apartment were the same within each complex and different between complexes, then rho = 1 and deff = 1 + 1(10 – 1) = 10.
• In this extreme case, each additional apartment surveyed within a cluster adds no new information about the rental cost.
• Surveying only one apartment in each complex would give us the same information (with the same standard error) about the level of rent as surveying 10.
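In practice rho has to be estimated from the data. One common approach, sketched here under the assumption of equal-size clusters, is the one-way ANOVA estimator rho = (MSB - MSW) / (MSB + (n - 1) MSW), where MSB and MSW are the between-cluster and within-cluster mean squares. The function name `anova_icc` and the rent figures are hypothetical.

```python
from statistics import mean


def anova_icc(clusters: list[list[float]]) -> float:
    """One-way ANOVA estimate of the intraclass correlation (equal cluster sizes).

    rho_hat = (MSB - MSW) / (MSB + (n - 1) * MSW)
    """
    k = len(clusters)     # number of clusters
    n = len(clusters[0])  # elements per cluster
    if k < 2 or n < 2 or any(len(c) != n for c in clusters):
        raise ValueError("need at least 2 clusters of equal size n >= 2")
    grand_mean = mean(y for c in clusters for y in c)
    cluster_means = [mean(c) for c in clusters]
    ss_between = n * sum((m - grand_mean) ** 2 for m in cluster_means)
    ss_within = sum((y - m) ** 2 for c, m in zip(clusters, cluster_means) for y in c)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (k * (n - 1))
    return (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)


# Hypothetical rents: 3 complexes, 10 apartments each, identical within a complex.
rents = [[900] * 10, [1200] * 10, [1500] * 10]
print(anova_icc(rents))  # 1.0: every apartment in a complex pays the same rent
```

With identical rents within each complex, MSW is 0 and the estimate is exactly 1, the extreme case described above. The estimator can also come out negative when elements within clusters are less alike than elements chosen at random.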
Example
• Another extreme example…
• If we were studying a variable like “shoe size” among apartment residents, the design effect might be quite different.
• We would not expect “shoe size” to be clustered by apartment complex, so we expect rho = 0.
• deff = 1 + 0(10 – 1) = 1
• The sampling variability in our cluster sample would be the same as in a simple random sample.
An important point!!!
• The design effect is not a fixed characteristic of
the sample but one that differs from variable to
variable.
• This is shown here for clustering, but it is also true of the design effects from stratification and weighting.
• When design effects are present, our estimates and standard errors are likely to be wrong unless we adjust for the sampling design in calculating them.
Stratification
• Stratification can make our sample more accurate than a simple random sample.
• We use prior knowledge about the distribution in the population to reduce variability.
• For example, let’s say we have 1,000 students in a school and we want to draw a representative sample of 100 of them.
• Assume we know the gender of each student in the school and that 50% are male and 50% are female.
• If we randomly sample 50 from among the males and 50 from among the females, the distribution by gender in our sample will be exactly the same as in the population.
• With an SRS this might not have been the case.
• Stratification will improve the estimates for other variables only if they are also related to gender.
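The 1,000-student example can be simulated to show the difference between a stratified draw and an SRS. This is a minimal sketch under stated assumptions: the sampling frame, the `stratified_sample` helper, and the random seed are all hypothetical.

```python
import random

# Hypothetical sampling frame: 1,000 students, the first 500 male, the rest female.
frame = [{"id": i, "gender": "M" if i < 500 else "F"} for i in range(1000)]


def stratified_sample(units, stratum_key, allocation, rng):
    """Draw an independent simple random sample of a fixed size within each stratum."""
    sample = []
    for stratum, size in allocation.items():
        members = [u for u in units if u[stratum_key] == stratum]
        sample.extend(rng.sample(members, size))
    return sample


rng = random.Random(12345)  # fixed seed so the run is reproducible
stratified = stratified_sample(frame, "gender", {"M": 50, "F": 50}, rng)
srs = rng.sample(frame, 100)

print("males in stratified sample:", sum(u["gender"] == "M" for u in stratified))  # always 50
print("males in SRS:", sum(u["gender"] == "M" for u in srs))  # varies with the draw
```

Rerunning with different seeds leaves the stratified count fixed at 50 while the SRS count moves around 50; that removed variation is the gain from stratification.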
Stratification
• The most widely used stratification variables in
large national probability samples are
geographical.
– Census Region
– Metropolitan areas
– Population sizes of geographical subareas
• Census data and census estimates are often
used to define the strata.
What estimates do clustering and
stratification affect?
• These do not affect the point estimates
– Means
– Regression coefficients
• They only affect the standard errors, confidence
intervals, and significance tests.
• Weights, however, affect both the point estimates and the standard errors, confidence intervals, and significance tests.
• The impact of weights on point estimates is widely known, but their effects on inferential statistics are less so.
Weights – The Good and the Bad
• The Good
– Weights are designed to increase the representativeness of our
sample.
– e.g., if our sample is 40% male but the population is 50% male, we assign weights so that each male counts as more than one male and each female as less than one female, yielding the population percentage.
– Weights can adjust for design decisions as well, e.g., most
surveys randomly select only one adult to interview per
household so adults in households with several adults are
underrepresented.
– These can reduce the bias in our sample.
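The 40% versus 50% male example corresponds to the simplest form of this adjustment (often called post-stratification): each cell's weight is its population share divided by its sample share. A minimal sketch; the sample size of 1,000 and the function name are hypothetical.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight for each cell = population share / sample share."""
    n = sum(sample_counts.values())
    return {cell: population_shares[cell] / (count / n)
            for cell, count in sample_counts.items()}


# Hypothetical sample of 1,000 that is 40% male, drawn from a population that is 50% male.
sample_counts = {"male": 400, "female": 600}
population_shares = {"male": 0.5, "female": 0.5}
weights = poststratification_weights(sample_counts, population_shares)
print(weights)  # male about 1.25, female about 0.83

weighted_total = sum(sample_counts[c] * weights[c] for c in sample_counts)
weighted_male_share = sample_counts["male"] * weights["male"] / weighted_total
print(round(weighted_male_share, 3))  # 0.5, the population share
```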
Weights – The Good and the Bad
• The Bad
– Unequal weights always yield a deff > 1
– The size of the design effect will be impacted by the
variability in the weights.
• Large differences in the size of the weights for the cases will
result in larger deff
• Very large weights appear to have more effect on the deff
than very small weights.
– Although weights decrease bias, they do so at the cost of increasing the variability of our estimates.
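A widely used way to quantify this is Kish's approximation for the design effect due to unequal weighting, deff_w = n Σw² / (Σw)², which equals 1 + CV² of the weights. It assumes the weights are unrelated to the variable being analyzed, so it is a rough guide rather than an exact deff. The function name and example weights below are hypothetical.

```python
def kish_weighting_deff(weights):
    """Kish's approximate design effect due to unequal weighting.

    deff_w = n * sum(w**2) / (sum(w))**2, which equals 1 + CV**2 of the weights
    (CV computed with the population variance). It equals 1 only when all weights are equal.
    """
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


print(kish_weighting_deff([1.0] * 10))          # 1.0: equal weights
print(kish_weighting_deff([1.0] * 9 + [10.0]))  # about 3.02: one very large weight
print(kish_weighting_deff([1.0] * 9 + [0.1]))   # about 1.09: one very small weight
```

The last two lines echo the observation above: a single case weighted ten times the others inflates the deff far more than a single case weighted one tenth as much.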
What to do…
• More Bad News:
• Most datasets used in the social sciences have at least one of these features that affect the estimates.
• Most standard statistical software does not adjust the
estimates for these design factors.
• More journals and granting agencies are requiring that statistical findings be adjusted for design effects.
What to do…
• But the Good News is:
• The major statistical packages now have relatively easy-to-use procedures that adjust for design effects in most types of statistical analysis.
• Design effects appear to have substantially less impact on the standard errors of coefficients from multivariate analysis (e.g., regression coefficients) than on descriptive statistics (means, percentages).
• Previously published analytic research findings are not likely to be affected very much by failing to adjust for such effects (especially the effects of clustering and stratification).
How can we adjust for the design effects?
• Documentation for most large datasets contains information on the variables included in the data that can be used to adjust for the design.
• The design data can take several forms, which require different adjustment procedures. The most common are:
– Variables identifying the primary sampling units (psu), the strata, and the weight
– A set of replicate variables (e.g., 40 – 80) that give the structure for a resampling (replication) method of adjusting standard errors and replace the need for information on the psu and strata
– A set of replicate weights (e.g., 40 – 80) that replace the psu, strata, and weight information
• (The replicate methods are used to hide the psu information for confidentiality reasons.)
Software to adjust for Design Effects
• Until recently, specialized software, not an integrated part of standard packages, was required to include design information in the estimates.
– SUDAAN: A separate program, later also available in a version callable from SAS
– WesVar: A program using replicate methods, available to some degree in SPSS but also as a stand-alone program
– IVEware: A public domain software package from the University of Michigan
• Flexible procedures for design effects are now available in:
– SAS: A set of survey analysis procedures separate from SUDAAN
– Stata: A comprehensive set of svy: procedures
– R: A set of survey analysis procedures
– SPSS: A survey analysis module available for extra cost (not part of the SPSS site license at Penn State)
Computational procedures used to create the
adjusted estimates.
• Taylor series expansion (linearization) method. Considered the “gold standard” method.
– Approximates a non-linear statistic by a linear one (a first-order Taylor series) so that variance formulas for linear statistics can be applied. The linearized form is different for different statistics.
– Requires information on the psu and strata to compute.
• Re-sampling or replication methods.
– Use techniques such as the jackknife and bootstrap to form multiple replicate samples from the full sample; the variation among the replicate estimates conveys the sampling variability.
– These methods either use a supplied set of replicates or (in some software) generate them if the psu and strata are available.
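To show how a replication method turns replicate estimates into a standard error, here is a minimal sketch of one jackknife variant, the unstratified delete-one-PSU jackknife (often called JK1), applied to a weighted mean. Stratified designs usually use a stratum-wise variant instead; the function names and data here are hypothetical.

```python
import math


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jk1_standard_error(values, weights, psu):
    """Delete-one-PSU (JK1) jackknife standard error of a weighted mean.

    Unstratified version: each replicate drops one PSU and multiplies the
    remaining weights by G / (G - 1). The variance is (G - 1) / G times the sum
    of squared deviations of the replicate estimates from the full-sample estimate.
    """
    groups = sorted(set(psu))
    G = len(groups)
    if G < 2:
        raise ValueError("need at least two PSUs")
    full = weighted_mean(values, weights)
    sq_dev = 0.0
    for g in groups:
        rep_weights = [0.0 if p == g else w * G / (G - 1) for w, p in zip(weights, psu)]
        sq_dev += (weighted_mean(values, rep_weights) - full) ** 2
    return math.sqrt((G - 1) / G * sq_dev)


# With one element per PSU and equal weights this reproduces the SRS result s / sqrt(n).
print(jk1_standard_error([1, 2, 3, 4, 5], [1] * 5, [1, 2, 3, 4, 5]))  # about 0.707
```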
The National Survey of Families and Households
(NSFH)
• A large national personal interview survey with a complex, multistage area probability sampling design that uses clusters and strata.
• Over 13,000 respondents.
• There were 100 primary sampling units and 1,700
clusters with an average of 7.1 respondents per cluster.
• Provides design information and weights to adjust for
design effects in two ways:
– Variables for the strata and psus
– A set of replicate variables
Replicates in the NSFH
• Includes a set of 52 balanced, half-sample, random replicates instead of case-level information on the sampling units and strata.
• Balanced half-sample replicates require two primary sampling units (or two groups of them) in each stratum.
• For each replicate, one of the two primary sampling units in each
stratum is assigned a value of zero, and the other is assigned a
value of 1.
• The primary sampling units assigned zero are excluded from that
replicate.
• Programs such as Stata or WesVar can use these to adjust for the
design effects.
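The NSFH supplies its replicates ready-made, so you would not construct them yourself; the sketch below only illustrates the mechanics of balanced half-sample replication (BRR) for an estimated total, assuming exactly two PSUs per stratum. The balance comes from the columns of a Hadamard matrix: each replicate keeps one PSU per stratum, doubles its weight, and drops the other. The function names, the power-of-two replicate count, and the PSU totals are hypothetical (the NSFH itself uses 52 replicates).

```python
def sylvester_hadamard(order):
    """Hadamard matrix (entries +1/-1, mutually orthogonal rows and columns) of a
    power-of-two order, built by Sylvester's doubling construction."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    if len(h) != order:
        raise ValueError("order must be a power of two")
    return h


def brr_variance(psu_totals):
    """Balanced repeated replication (BRR) variance of an estimated total.

    psu_totals holds one (first PSU, second PSU) pair of weighted totals per stratum.
    Replicate r keeps one PSU per stratum, chosen by the sign in column h + 1 of row r
    of a Hadamard matrix, doubles its weight and drops the other PSU.
    """
    H = len(psu_totals)
    R = 2
    while R < H + 1:  # need H columns in addition to the all-ones first column
        R *= 2
    had = sylvester_hadamard(R)
    full = sum(a + b for a, b in psu_totals)
    sq_dev = 0.0
    for r in range(R):
        replicate = sum(2 * (a if had[r][h + 1] == 1 else b)
                        for h, (a, b) in enumerate(psu_totals))
        sq_dev += (replicate - full) ** 2
    return sq_dev / R


# Hypothetical weighted PSU totals for three strata.
totals = [(10, 14), (7, 3), (20, 20)]
print(brr_variance(totals))  # 32.0, the sum of squared within-stratum PSU differences
```

Because the Hadamard columns are orthogonal, the cross-stratum terms cancel, and for a total the BRR variance equals the usual two-PSU-per-stratum estimate, here (10 - 14)² + (7 - 3)² + 0 = 32.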
Design Information also available for the NSFH study

id    stratum   psu   newla
3     118       68    17
8     12        3     14
11    117       67    10
16    14        12    14
24    14        12    14
29    117       67    13

(newla = listing area, or cluster)

The stratum and psu variables can be used to convey design information to many software packages.
Stata svyset command for NSFH
• svyset psu [pweight=weight] , strata(stratum)
• To use the replicates in Stata you might want to consult
a PRI programmer.
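Once svyset is given the psu, strata, and weight, Stata by default computes standard errors by Taylor-series linearization. The sketch below shows what that amounts to for a weighted mean, using the standard with-replacement linearization formula. It is our own illustration with hypothetical names and data, not Stata's code; production software adds refinements (finite population corrections, options for strata with a single PSU) that are omitted here.

```python
import math
from collections import defaultdict


def linearized_mean_and_se(values, weights, strata, psus):
    """Weighted mean and its Taylor-series (linearization) standard error.

    Assumes first-stage PSUs sampled with replacement within strata and no finite
    population correction. Every stratum needs at least two PSUs.
    """
    total_w = sum(weights)
    mean = sum(w * y for w, y in zip(weights, values)) / total_w
    # Linearized variable for the ratio mean, summed to PSU totals.
    # Keying on (stratum, psu) allows PSU codes that repeat across strata.
    z_psu = defaultdict(float)
    for y, w, h, p in zip(values, weights, strata, psus):
        z_psu[(h, p)] += w * (y - mean) / total_w
    z_by_stratum = defaultdict(list)
    for (h, _), z in z_psu.items():
        z_by_stratum[h].append(z)
    variance = 0.0
    for zs in z_by_stratum.values():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        z_bar = sum(zs) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in zs)
    return mean, math.sqrt(variance)


# Hypothetical data: one stratum, two PSUs whose members give identical answers.
print(linearized_mean_and_se([1, 1, 5, 5], [1, 1, 1, 1], [1, 1, 1, 1], ["a", "a", "b", "b"]))
# (3.0, 2.0); treating the same four values as an SRS would give a standard error of about 1.15
```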
Design Information in the American Community
Survey (ACS)
• Conducted by the Census Bureau as a substitute for the
long form of the Census
• A large mail survey with telephone and personal
interview follow-ups of non-respondents.
• Considered a complex survey design, but it is not an area probability sample or an SRS.
• Available as a public use dataset.
• Provides design information in a set of 80 replicate weights that incorporate both the design and the weighting adjustments.
Examples of replicate weights for ACS
rw1   rw2   rw3   rw4   rw5   rw6
136    27    83   167   161    77
178   102   113   101   197    90
181   101   103    93   184    97
173   114   132   101   202    89
265   136   132   126   231   139
185    64   290    50   301   303
 86    31   133    27   177   204
Using the ACS weights
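• The documentation suggests the following:
– Conduct your analysis once with the full-sample weight and then 80 more times, substituting each replicate weight in turn.
– Save each parameter estimate in a file.
– Compute the standard error from the deviations of the 80 replicate estimates around the full-sample estimate: SE = sqrt((4/80) × Σ(X_r − X)²), where X_r is the estimate from replicate r and X is the full-sample estimate.
• It may also be possible to do this with a setting in the svyset command in Stata (its sdrweight() option declares successive difference replicate weights).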
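The replicate-weight procedure above can be sketched in Python for a weighted total. This is a minimal illustration under stated assumptions: the function names and the two-person data are hypothetical, and only the 4/R scaling (4/80 for the 80 ACS replicates) follows the Census Bureau's documented formula.

```python
import math


def weighted_total(values, weights):
    return sum(v * w for v, w in zip(values, weights))


def replicate_total_and_se(values, full_weight, replicate_weights):
    """Estimate a weighted total and its ACS-style replicate standard error.

    SE = sqrt(4 / R * sum((X_r - X) ** 2)), with R = 80 for the ACS, where X uses
    the full-sample weight and X_r uses replicate weight r.
    """
    estimate = weighted_total(values, full_weight)
    replicates = [weighted_total(values, rw) for rw in replicate_weights]
    R = len(replicates)
    se = math.sqrt(4 / R * sum((x - estimate) ** 2 for x in replicates))
    return estimate, se


# Hypothetical data: an indicator for two people, a full-sample weight, and 80
# replicate weights that move the first person's weight up or down by 10.
values = [1, 0]
full_weight = [100, 120]
replicate_weights = [[110, 120] if r % 2 == 0 else [90, 120] for r in range(80)]
print(replicate_total_and_se(values, full_weight, replicate_weights))  # estimate 100, SE 20
```

Because the deviations are scaled by 4/80, the resulting standard error is about twice the plain standard deviation of the 80 replicate estimates.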
Setting the design parameters for a dataset.
• Consult the documentation.
– Examples of setting the design for some software packages are often provided.
• May need to consult with a PRI programmer if
in doubt.
• Set the design and forget it!!
• You only need to do this once…
Thank You!!!
• This PowerPoint will be available on the
PRI web site.
• There is also a list of references on the
web site to sources that discuss and
explain design effect issues.