MICS WS1

Multiple Indicator Cluster Surveys
Survey Design Workshop
Sampling:
Advanced Sampling
Major steps in designing MICS sample
• Define objectives
– Key indicators
– Desired level of precision
– Sub-national domains of estimation
• Identify most appropriate sampling
frame
– Most recent census of population and housing
– Master sample or sample for another survey
conducted recently
Major steps in designing MICS sample
• Determine sample size and
allocation
– Determine availability of previous
MICS or DHS results to provide
measures of sampling parameters
Sampling Frame
• Sampling frame:
– Nationally-representative
– Complete coverage
– Measures of size (households or
population) for small area units
• Generally most recent census is the most
effective sampling frame
Sampling Frame
• In some cases more recent pre-census
listing may be available
• When no census is available, identify
most complete geographic frame
available (e.g. list of villages/localities
with estimated population)
Sampling Frame
• Common problems with area frames:
– Coverage issues
– Census maps of poor quality
– Errors and changes in area boundaries
– Inappropriate type and size of area
units
– Lack of auxiliary information
SAMPLE SIZE DETERMINATION
• n is the required sample size (number of
households)
• 4 is a factor (1.96 squared, rounded up) to achieve the
95 percent level of confidence
• r is the predicted or estimated value of the
indicator in target population
• deff is the design effect
• RR is the response rate
• pb is the proportion of the target
subpopulation in total population (upon
which the indicator, r, is based)
• AveSize is the average household size (that is,
average number of persons per household)
• e is the margin of error to be tolerated at the
95% level of confidence
• Currently, e = 0.12r, that is, 12% of r. Because
e = 2 × standard error (r), the relative standard
error of r is 6%
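The formula itself did not survive on this slide. The sketch below assumes the form implied by the definitions above, n = 4 · r · (1 − r) · deff / (e² · pb · AveSize · RR) with e = 0.12r. The function name and the input values are hypothetical and only illustrate the arithmetic; the MICS Excel template remains the reference.

```python
def mics_sample_size(r, deff, pb, ave_size, rr, rel_margin=0.12):
    """Required number of households under the ASSUMED formula
    n = 4 * r * (1 - r) * deff / (e**2 * pb * ave_size * rr),
    where e = rel_margin * r is the margin of error at 95% confidence."""
    if not 0 < r < 1:
        raise ValueError("r must be strictly between 0 and 1")
    e = rel_margin * r  # 12% of r by default
    return 4 * r * (1 - r) * deff / (e ** 2 * pb * ave_size * rr)


# Illustrative (made-up) inputs: 30% prevalence, deff 1.5, target group is
# 10% of the population, 5 persons per household, 90% response rate.
n = mics_sample_size(r=0.30, deff=1.5, pb=0.10, ave_size=5.0, rr=0.90)
print(round(n))
```

Because e is defined relative to r, a lower predicted value of r gives a larger required sample, which is why indicators with low prevalence in a small target group tend to drive the sample size.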
Previously in MICS2
• 2 different values for margin of error
– Margin of error was 5 percentage points for high
values of r (over 25%)
– Margin of error was 3 percentage points for low
values of r (25% or less)
• Difficulty for users in deciding on the sample
size for their surveys.
MICS template for sample size
calculation - EXCEL FILE
Selection of key indicators
• Choose an important indicator that will yield
the largest sample size
• Step 1: Select 2 or 3 target populations,
each representing a small percentage of the
total population (pb); typically
– Children 12-23 months: 2-4% or
– Children under 5 years: 7%-20%
Selection of key indicators
• Step 2: Review important indicators for these
target groups but ignore indicators with very
low or very high prevalence (less than 10% or
over 40%, respectively)
• Among indicators for which low values are
desirable, do not choose one that is
already acceptably low
• Do not choose childhood and maternal
mortality ratios
Explicit Stratification
• Explicit stratification: dividing the sampling
frame into sub-groups (called strata) of
homogeneous (similar) PSUs.
• Advantages:
– Better precision because reduced variance
within stratum given similarity of units
– Flexible design, sub-national estimates for
smaller domains (differential sampling rates)
• Example of stratification: region, urban/rural
Implicit Stratification
• Sort the sampling frame according to certain
characteristics such as regions, urban-rural
residence, sub-regions, districts, etc., then
select a systematic pps sample.
• Ensures a representative sample for each
subgroup
• Automatically provides proportional allocation
by size of subgroup
Allocation of sample to strata/domains
• Proportional allocation
– Effective for precision of estimates at the national level
• Equal allocation to each domain
– Used when each domain requires same level of precision
• Optimum allocation – takes into account differential
variance and costs by stratum
– For example, variability may be higher in urban areas and
enumeration costs may be higher in rural areas – use
higher sampling rate for urban areas
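The three allocation rules can be compared with a small sketch. It assumes the textbook cost-weighted optimum allocation, in which the stratum sample size is proportional to N_h · S_h / sqrt(c_h); the function names, stratum sizes, standard deviations and unit costs are all hypothetical.

```python
def proportional_allocation(n, sizes):
    """Allocate n in proportion to stratum size N_h."""
    total = sum(sizes.values())
    return {h: n * size / total for h, size in sizes.items()}


def equal_allocation(n, sizes):
    """Give every stratum (domain) the same sample size."""
    return {h: n / len(sizes) for h in sizes}


def optimum_allocation(n, sizes, std_devs, unit_costs):
    """Allocate n in proportion to N_h * S_h / sqrt(c_h)."""
    scores = {h: sizes[h] * std_devs[h] / unit_costs[h] ** 0.5 for h in sizes}
    total = sum(scores.values())
    return {h: n * score / total for h, score in scores.items()}


# Made-up frame: urban areas are more variable, rural areas cost more.
sizes = {"urban": 300_000, "rural": 700_000}
std_devs = {"urban": 0.50, "rural": 0.40}
unit_costs = {"urban": 1.0, "rural": 2.0}

prop = proportional_allocation(1000, sizes)
equal = equal_allocation(1000, sizes)
opt = optimum_allocation(1000, sizes, std_devs, unit_costs)
print(prop, equal, opt)
```

With these inputs the optimum rule gives urban areas a larger share than proportional allocation does, matching the example on the slide. The results are fractional and would still need rounding to whole clusters.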
Subnational estimates
• Number of separate areas (domains) for which
separate, equally reliable estimates are wanted
affects sample size
• For example, if 10 regional estimates are wanted,
theoretically the sample should be increased by
factor of 10
• As a compromise, larger sampling errors accepted
for subnational estimates
– One proposal (by Dr. Vijay Verma) – increase national
sample size by factor of D^0.65, where D is the number of
domains
– Results in an average increase in the sampling errors for
domain estimates by a factor of about 1.5
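A rough check of these two factors, under simplifying assumptions that are not stated on the slide: the D domains are of equal size, the inflated sample is split equally among them, and sampling error scales with one over the square root of the sample size. The function name is hypothetical.

```python
def verma_factors(num_domains, exponent=0.65):
    """Return (national sample inflation, domain sampling-error inflation).

    The domain error factor compares one domain's sample, n * D**exponent / D,
    with the original national sample n, assuming error ~ 1 / sqrt(size).
    """
    sample_inflation = num_domains ** exponent
    domain_error_inflation = (num_domains / sample_inflation) ** 0.5
    return sample_inflation, domain_error_inflation


inflation, error_factor = verma_factors(10)
print(round(inflation, 2), round(error_factor, 2))
```

For D = 10 this gives a sample about 4.5 times larger instead of 10 times larger, with domain sampling errors about 1.5 times the error targeted for the national estimate. Other values of D give other error factors.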
Sampling Stages
• Ideal to have two-stage sample design, with
EAs defined as PSUs
• In some countries only frame of larger
administrative units available
– Three-stage sample design: larger area units
selected as PSUs
– Necessary to delineate smaller segments in each
sample PSU
Number of PSUs and Cluster Size
• Survey costs depend not only on number of
households but their distribution among
primary sampling units (PSUs)
• Important to determine effective balance
between number of sample PSUs and number
of sample households per cluster
• In general, the more PSUs the better for
reliability but the greater the cost (mostly
costs of travel and listing)
Number of PSUs and Cluster Size
• Example: 8000 households selected in 400
PSUs of 20 sample households each is a much
more reliable sample than 200 PSUs of 40
households each, but more expensive
• Number of sample households per cluster
should be as small as practical for reliability
• A range of 15-25 households for MICS appears
to be effective
Design Effect (DEFF)
• Deff - ratio of variance of estimate based on
stratified multi-stage sample design and
corresponding variance from simple random
sample of same size
• Measure of the relative efficiency of the
sample design
• Effective stratification reduces the deff
• Cluster sampling increases the deff
Design Effect (DEFF)
• In case of cluster sampling, deff generally measures
effect of clustering
$$\mathrm{deff} = 1 + \delta\,(\bar{m} - 1)$$
• δ = intraclass correlation coefficient, or measure of
homogeneity within cluster
• $\bar{m}$ = average number of households per cluster
• Design effect increases with intraclass correlation
and cluster size
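The clustering formula can be applied to the earlier example of 8000 households spread over 400 or 200 PSUs. The intraclass correlation of 0.05 is an assumed value chosen only for illustration, and the function names are hypothetical.

```python
def design_effect(delta, m_bar):
    """deff = 1 + delta * (m_bar - 1) for a clustered sample."""
    return 1 + delta * (m_bar - 1)


def effective_sample_size(n, delta, m_bar):
    """Size of a simple random sample with the same variance."""
    return n / design_effect(delta, m_bar)


DELTA = 0.05  # assumed intraclass correlation, for illustration only

for psus, per_cluster in [(400, 20), (200, 40)]:
    n = psus * per_cluster
    deff = design_effect(DELTA, per_cluster)
    n_eff = effective_sample_size(n, DELTA, per_cluster)
    print(psus, per_cluster, round(deff, 2), round(n_eff))
```

Under this assumption, the 400 × 20 design has a deff of 1.95 and the 200 × 40 design a deff of 2.95, so the same 8000 households carry considerably less information when concentrated in fewer, larger clusters.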
First Stage Selection of PSUs
• Standard methodology for MICS and other
household surveys – select EAs or clusters
systematically with PPS
• Important to sort frame before selection, in
order to ensure effective implicit stratification
• Traditional procedure – cumulate measures of
size, determine sampling interval and random
start, generate selection numbers
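The traditional procedure can be sketched as follows. This is a minimal illustration, not the MICS template itself; the function name and the frame values are hypothetical, and the frame is assumed to be already sorted for implicit stratification.

```python
import random


def systematic_pps(sizes, n_select, rng=None):
    """Systematic PPS selection from an ordered list of measures of size.

    Returns (hits, interval), where hits holds the index of the selected
    unit for each selection number. An index can repeat when a unit's
    measure of size exceeds the sampling interval.
    """
    rng = rng or random.Random()
    interval = sum(sizes) / n_select
    start = rng.random() * interval  # random start in [0, interval)
    hits = []
    cumulative = 0.0  # cumulated size of all units before unit idx
    idx = 0
    for k in range(n_select):
        target = start + k * interval  # k-th selection number
        # Advance until the target falls inside the current unit's range.
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= target:
            cumulative += sizes[idx]
            idx += 1
        hits.append(idx)
    return hits, interval


# Made-up frame of 7 EAs with household counts as measures of size.
frame = [120, 80, 300, 150, 50, 200, 100]
hits, interval = systematic_pps(frame, 4, rng=random.Random(2024))
print(hits, interval)
```

In this frame the third EA (300 households) is larger than the interval of 250, so it is always selected at least once and can be hit twice, which is the situation the large-PSU options address.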
Large sample PSUs in PPS sampling
• Sometimes a PSU may have a measure of size larger
than the sampling interval
• PSU may be selected more than once in the
systematic PPS selection
• Option 1 – if the PSU is selected two or more times,
multiply the number of households to be selected by
the number of “hits”
• Option 2 – separate the large PSUs and include in
sample with a probability of 1
MICS Sampling Option 1 –
new sample with household listing
• Design new MICS sample
• Two stages with census as frame
• Use of implicit stratification, systematic selection
of census EAs at first stage with pps
• List households in selected EAs/segments
• Select households systematically from listing
• Interview selected households; no replacement
is allowed
Sampling Option 1 - continued
• Advantages of option 1
- simple design
- probability-based
- if possible self-weighting (national level)
• Limitations of option 1
- expense of listing households
- time necessary to list households
[Example, sample size of 5000 households may require 25000 to
50000 households to be listed]
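The listing workload in the example can be reproduced with simple arithmetic. The cluster size of 20 households and the EA sizes of 100 to 200 households are assumptions chosen to match the figures on the slide; the function name is hypothetical.

```python
import math


def listing_workload(sample_households, households_per_cluster, households_per_ea):
    """Return (number of sample EAs, number of households to list)."""
    clusters = math.ceil(sample_households / households_per_cluster)
    return clusters, clusters * households_per_ea


clusters, low = listing_workload(5000, 20, 100)
_, high = listing_workload(5000, 20, 200)
print(clusters, low, high)  # 250 25000 50000
```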
MICS Sampling Option 2 –
use an existing sample
• Design MICS as a rider to another survey if timely
and feasible
• Use sample from a previous survey and re-interview
households for MICS
• Or, use old survey sample EAs and construct new
listing of households to select for MICS
• Old sample must be probability-based, national in
scope
• Possibilities – DHS, other national health survey, recent
labour force survey
• Important: design parameters must be known (such as
selection probability, stratification, etc.)
Sampling option 2 - continued
• Use of existing master sampling frame
• Some countries use master sample design for
intercensal national household surveys
• Master samples generally sufficiently large for
MICS; subsample of PSUs can be selected
• Advantage – updated maps may be available
for master sample of PSUs, and perhaps
updated listing
Sampling option 2 - continued
• Advantages of using previous sample
- cost savings
- maps available for interviewers
- appropriate sampling plan available
- simplicity
• Limitations of using old sample
- burden on respondents
- sample design may need modification
* sample size
* sub-national coverage
* number of PSUs or clusters
• Balance between loss and gain
Listing and Selection of Households
• Household listing manual is available
• Importance of new listing to represent current
population
• Problems with using previous listing (older
than 1 year)
– Does not represent newer households
– Distribution of sample population by age group
distorted, generally with higher median age
– Difficulty of finding households in old list
Listing and Selection of Households
• MICS recommends a separate household
listing operation
– More reliable as listing staff are less likely than
interviewers to bias the sample by excluding
households that are difficult to reach
– Allows household selection to be done in a
single central location using reliable and
uniform procedures
Listing and Selection of Households
• Household selection in the office:
– Advantages – conducted by specialized staff,
possible to avoid selection bias in the field,
possible to control overall sample size
– Disadvantage – increased costs from having two
field visits
• Selection in the field: use household selection table
– Advantage – cost savings of having one integrated
field operation
– Disadvantage - correct sampling may be difficult
for field staff, selection may be biased
Listing and Selection of Households
• Excel template for automatically generating
the sample of households based on the
number of households listed (see spreadsheet)
• Common problems found in listing operations
– Problem with quality of sketch maps – difficult to
determine segment boundaries
– Sometimes large differences found between
number of households in frame (census) and
number listed.
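Selecting households from a completed listing can be sketched as a systematic sample with a fractional interval. This is an assumed method for illustration and may differ from what the Excel template does; the function name and the listing size are hypothetical.

```python
import random


def select_households(num_listed, num_to_select, rng=None):
    """Systematically select listing serial numbers (1-based).

    Uses a fractional interval and a random start, so every listed
    household has the same chance of selection within the cluster.
    """
    rng = rng or random.Random()
    if num_to_select >= num_listed:
        return list(range(1, num_listed + 1))  # take all listed households
    interval = num_listed / num_to_select
    start = rng.random() * interval  # random start in [0, interval)
    return [
        min(int(start + k * interval) + 1, num_listed)
        for k in range(num_to_select)
    ]


# Made-up cluster: 137 households listed, 20 to be interviewed.
selected = select_households(137, 20, rng=random.Random(7))
print(selected)
```

Because the interval is computed from the number actually listed, the sample size per cluster stays fixed even when the listing count differs from the census figure.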
Sampling strategy for low fertility
countries
• In MICS 4 and 5, some low fertility countries
are using second-stage stratification of listing
by households with and without children
under 5
• Higher sampling rate used for households
with children
• Increases number of households with children
in MICS sample, and therefore number of
sample children
Sampling strategy for low fertility
countries (continued)
• Improves the reliability of the child indicators
without increasing the sample size to a very high
level
• This procedure also increases the variability in the
weights and the design effects for the overall sample
• Important to avoid very large variability in the
weights for households with and without children
– Differential weights between households with and without
children generally should not exceed a factor of about 4
Implications of sampling strategy
on sample size calculations
• One parameter in the sample size calculation
template is the proportion of the indicator
subpopulation
• Using a higher sampling rate for households with
children increases the proportion of children under 5
in the sample
• The proportion of children under 5 (or smaller age
groups) should be multiplied by a factor that reflects
the increase in sample households with children
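One way to derive such a factor is sketched below, assuming it equals the share of households with children in the sample divided by their share in the population. This interpretation, the function name and the input values are assumptions rather than the template's documented rule.

```python
def oversampling_factor(share_with_children, rate_ratio):
    """Factor by which oversampling raises the share of households with children.

    share_with_children: population share of households with children under 5
    rate_ratio: sampling rate for households with children divided by the
                rate for households without children
    """
    with_children = rate_ratio * share_with_children
    without_children = 1 - share_with_children
    sample_share = with_children / (with_children + without_children)
    return sample_share / share_with_children


# Made-up values: 25% of households have a child under 5 and are
# sampled at 3 times the rate of other households.
factor = oversampling_factor(0.25, 3)
adjusted_pb = 0.05 * factor  # hypothetical pb of 5% before adjustment
print(factor, adjusted_pb)
```

Here half of the sample households have children instead of a quarter, so the factor is 2 and the proportion entered in the template doubles.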
Implications of sampling strategy
on weighting procedures
• Under normal MICS sample design, weights
vary by sample cluster
• With second stage stratification by
households with and without children, two
weights need to be calculated for each
cluster: for households with and without
children
Survey weighting procedures
• Survey data collected using a complex design
featuring clustering, unequal probabilities of
selection and stratification:
– All analyses must apply survey weights in order to
prevent biased results
• Formulas for calculating weights depend on
the exact sample design used in each country
• MICS has 4 sets of weights: households,
women, men and children
Survey weighting procedures
• Components of MICS survey weights:
– Design weight: inverse of the final probability of
selection for households
– Adjustment factors for nonresponse (cluster,
household, woman, child level)
• Weights are normalized so that the total weighted
number of observations is equal to the total
unweighted number (sample size)
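The components can be combined as in the sketch below. It assumes a two-stage design and a nonresponse adjustment computed within each cluster; actual formulas depend on the country's sample design. All names and probabilities are hypothetical.

```python
def cluster_household_weight(p_psu, p_household, n_selected, n_completed):
    """Household weight for one cluster.

    Design weight is the inverse of the overall selection probability
    (PSU stage times household stage); the adjustment inflates it for
    selected households that were not interviewed.
    """
    design_weight = 1.0 / (p_psu * p_household)
    nonresponse_adjustment = n_selected / n_completed
    return design_weight * nonresponse_adjustment


def normalize(weights):
    """Rescale weights so they sum to the number of observations."""
    mean_weight = sum(weights) / len(weights)
    return [w / mean_weight for w in weights]


# Two made-up clusters: 18 of 20 and 20 of 20 households completed.
w1 = cluster_household_weight(0.02, 0.10, 20, 18)
w2 = cluster_household_weight(0.05, 0.08, 20, 20)
raw = [w1] * 18 + [w2] * 20  # one weight per completed household
normalized = normalize(raw)
print(round(w1, 2), round(w2, 2), round(sum(normalized), 6))
```

Normalization changes only the scale of the weights, so proportions and means are unaffected while weighted counts match the sample size.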
Sampling Error Estimation
• Necessary to evaluate reliability of survey estimates
• Possible only when probability sampling is used
• Should be done for 30-50 important indicators
• Methodology is complex and design-specific
• Several software packages:
– SPSS Complex Samples module – used in MICS
– SAS, Stata, SUDAAN, Clusters, WesVar, CENVAR,
PCCarp, etc.
• Standard error, confidence intervals and DEFF
Sampling Error Estimation
SPSS Complex Samples module
• Advantages:
– Simple to use
– Template syntax available for standard indicators
– Supported by MICS Global and Regional staff
• Steps:
– Set up sampling parameter specifications file
(csplan)
– Define variables for stratum, PSU and weight
Sampling Error Estimation
SPSS Complex Samples module
• Stratum should be lowest level of explicit
stratification (for example, province,
urban/rural)
• Necessary to have minimum of two sample
PSUs per stratum
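Before running variance estimation, the two-PSU requirement can be checked on the data file. The sketch below uses plain Python on hypothetical (stratum, PSU) pairs and is independent of the SPSS syntax.

```python
from collections import defaultdict


def strata_with_too_few_psus(records, min_psus=2):
    """Return the strata that contain fewer than min_psus distinct PSUs.

    records: iterable of (stratum, psu) pairs, one per sample household.
    """
    psus_by_stratum = defaultdict(set)
    for stratum, psu in records:
        psus_by_stratum[stratum].add(psu)
    return sorted(s for s, psus in psus_by_stratum.items() if len(psus) < min_psus)


# Made-up records: the "North rural" stratum has only one sample PSU.
records = [
    ("North urban", 101), ("North urban", 101), ("North urban", 102),
    ("North rural", 201), ("North rural", 201),
    ("South urban", 301), ("South urban", 302),
]
print(strata_with_too_few_psus(records))  # ['North rural']
```

A stratum flagged this way is typically combined with a similar neighbouring stratum for variance estimation purposes.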
Reducing bias
• Accuracy of survey results depends on both variance and bias
(mostly from nonsampling errors)
• Bias should be minimized with quality control for all survey
operations
• Basic data quality determined during enumeration
– Important to have good training and supervision in the field
• Data capture should include 100% or sample verification
• Important to have quality control for editing and coding
procedures
• Computer consistency and range checks