Advanced Sampling

Multiple Indicator Cluster Surveys
Survey Design Workshop
Advanced Sampling
MICS4 Survey Design Workshop
Major steps in designing MICS sample
• Define objectives
– Key indicators
– Desired level of precision
– Subnational domains of estimation
• Identify most appropriate sampling frame
– Sample for another survey conducted recently
– Most recent census of population and housing
• Determine sample size and allocation
– Determine availability of previous MICS or DHS results to
provide measures of sampling parameters
Sampling frame
• Sampling frame should be nationally-representative and have complete
coverage, with measures of size (households or population)
• Most countries conduct Census of Population and Housing every 10 years
– Generally provides most effective sampling frame for household surveys
• Sample for another survey conducted recently
• In case of older frame, geographic areas with substantial changes, such as
peri-urban areas in large cities, may need to be updated
• When no census is available
– Identify most complete geographic frame available
– Example – Southern Sudan – list of villages from WHO immunization program
with estimated population
MICS recommendations on
sample size determinants
FACTOR                                            RECOMMENDATION
1. Expected size estimate of indicators           (next slide)
2. Expected size estimate of target population    12-23 mos [3%]
3. Average household size                         6 persons
4. Relative margin of error wanted                12% of coverage rate
5. Level of confidence wanted                     95 percent
6. Design effect in cluster surveys               1.5
7. Expected non-response rate                     10 percent
8. Number of clusters or PSUs - minimum           [300-400]
9. Cluster size                                   [15-35 households]
10. Number of estimation “domains” wanted         [5 or fewer]
11. Survey budget                                 (country specific)
For items 2, 3, 6, 7 use available country data (recent survey or census);
if not available, use value above.
Indicators for sample size determination
• Sample size is different for each MICS indicator.
• Must choose a key indicator, since only one sample size can be used
in MICS
• Recommendations for choosing key indicator:
– Choose from among main indicators of interest in your country
– Choose the one which will yield largest sample size
– Usually for a single-year age group
– Usually DPT, measles, polio or tuberculosis immunization - or birth
weight below 2.5 kg
• Exceptions: Do not choose infant or maternal mortality rates as the
key indicators. Do not choose a low prevalence indicator that is
desirably low (such as malnutrition prevalence).
Sample size formula
$$n = \frac{4\,r\,(1 - r)\,(\mathit{deff})\,(1.1)}{(0.12\,r)^{2}\,(p)\,(\bar{n})}$$
where
– n is the required sample size, expressed as number of households, for the KEY
indicator,
– 4 is factor to achieve 95 percent level of confidence,
– r is anticipated prevalence rate for key indicator,
– 1.1 is factor to raise sample size by 10 percent for potential nonresponse,
– deff is shortened symbol for design effect,
– 0.12r is margin of error to be tolerated, defined as 12 percent of r (12 percent
thus represents the relative margin of error of r),
– p is proportion of total population that smallest group comprises, and
– \bar{n} is average household size.
Example
• Target group: Children 12 to 23 months old
• Percent of population: 3.5 percent
• Key indicator: DPT immunization coverage
• Prevalence (Coverage): 25 percent
• Deff: 1.6
• Non-response adjustment: 1.05 (response rate 95%)
• Average household size: 6
$$n = \frac{4(.25)(.75)(1.6)(1.05)}{(.12 \times .25)^{2}(.035)(6)} = \frac{1.26}{.000189} \approx 6667$$
Sample size (Households) to estimate coverage
rates for smallest target population
Average household size    Estimated rate   Estimated rate   Estimated rate   Estimated rate
(number of persons)       r = 0.25         r = 0.30         r = 0.35         r = 0.40
4.0                       13,750           10,694           8,512            6,875
4.5                       12,222            9,506           7,566            6,111
5.0                       11,000            8,556           6,810            5,500
5.5                       10,000            7,778           6,191            5,000
6.0                        9,167            7,130           5,675            4,583
Use this table when your
1. Target population is 2.5% of total population; this is generally children 12-23 months
old
2. Sample design effect, deff, is assumed to be 1.5 and nonresponse is expected to be
10 percent
3. Relative margin of error is set at 12 percent of estimate of coverage rate, r
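The table can be regenerated from the sample size formula using the three assumptions listed above (target population 2.5 percent, deff 1.5, nonresponse factor 1.1, relative margin of error 0.12). The sketch below is illustrative only; the function and variable names are assumptions, and the computed values should agree with the printed table to within one household because of rounding.

```python
def households(r, hh_size, deff=1.5, nonresponse=1.1, p=0.025, rme=0.12):
    """Sample size formula with the table's stated assumptions as defaults."""
    return 4 * r * (1 - r) * deff * nonresponse / ((rme * r) ** 2 * p * hh_size)


rates = [0.25, 0.30, 0.35, 0.40]
sizes = [4.0, 4.5, 5.0, 5.5, 6.0]

# One entry per (household size, rate) cell, rounded to whole households.
table = {(s, r): round(households(r, s)) for s in sizes for r in rates}

print("hh size " + "".join(f"r={r:<8.2f}" for r in rates))
for s in sizes:
    print(f"{s:<8.1f}" + "".join(f"{table[(s, r)]:<10,d}" for r in rates))
```

For a country whose target group is a different share of the population, passing another value of `p` gives the corresponding table.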
Note on precision requirements
• In case of MICS2, precision requirements were expressed in terms
of acceptable margin of error (ME), which varied according to
the size of the estimate (5% absolute error for high rate
indicators or 3% for low rate indicators)
• For MICS3 and MICS4, this was simplified to a relative margin
of error (RME) of 0.12
• Follow guidelines in sampling chapter carefully; avoid
indicators with a high rate
• Final criterion for acceptable precision: is the confidence
interval useful?
– If confidence interval is too wide, estimate may not be useful
Stratification and sample allocation
• Stratification is the process of dividing the sampling frame
into sub-groups (strata) of homogeneous (similar) PSUs
• Advantages: better precision, flexible design, sub-national
estimates for smaller domains (differential sampling rates)
– Reduced variance within stratum given similarity of units
• Example of stratification: region, urban/rural
• Existing sampling frame, such as master sample, may have
socioeconomic stratification for large cities
– Should improve statistical efficiency of sample design
• Geographic domains defined as strata
– Possible to use variable sampling rates by domain to ensure sufficient
sample size for each
Implicit stratification
• Sort the sampling frame according to certain
characteristics such as regions, urban-rural
residence, sub-regions, districts, etc., then
select a systematic pps sample.
• Ensures a representative sample for each
subgroup
• Automatically provides proportional allocation
by size of subgroup
Allocation of sample to strata
• Proportional allocation
– Effective for precision of estimates at the national level
• Equal allocation to each domain
– Used when each domain requires same level of precision
• Optimum allocation – takes into account differential
variance and costs by stratum
– For example, variability may be higher in urban areas and
enumeration costs may be higher in rural areas – use
higher sampling rate for urban areas
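The three allocation rules can be compared side by side. The sketch below is an illustration under stated assumptions: the stratum sizes, standard deviations and relative costs are invented numbers, and the optimum rule is written in its usual textbook form, with the stratum sample proportional to N_h * S_h / sqrt(c_h), scaled here to a fixed total sample.

```python
def proportional(n, sizes):
    """Allocate n in proportion to stratum size N_h."""
    total = sum(sizes.values())
    return {h: n * sizes[h] / total for h in sizes}


def equal(n, sizes):
    """Give every stratum (domain) the same sample size."""
    return {h: n / len(sizes) for h in sizes}


def optimum(n, sizes, sd, cost):
    """Allocate n in proportion to N_h * S_h / sqrt(c_h)."""
    score = {h: sizes[h] * sd[h] / cost[h] ** 0.5 for h in sizes}
    total = sum(score.values())
    return {h: n * score[h] / total for h in sizes}


# Hypothetical frame: households per stratum, variability, relative unit cost.
sizes = {"urban": 300_000, "rural": 700_000}
sd = {"urban": 0.5, "rural": 0.4}      # assumed: more variable in urban areas
cost = {"urban": 1.0, "rural": 4.0}    # assumed: rural enumeration costs more

n = 8000
for name, alloc in [("proportional", proportional(n, sizes)),
                    ("equal", equal(n, sizes)),
                    ("optimum", optimum(n, sizes, sd, cost))]:
    print(name, {h: round(v) for h, v in alloc.items()})
```

With these invented inputs the optimum rule samples urban areas at a higher rate than rural areas, which is the pattern described in the example above.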
Number of PSUs and cluster size
• Survey costs depend not only on number of households but
their distribution among primary sampling units (PSUs)
• Important to determine effective balance between number of
sample PSUs and cluster size
• In general, the more PSUs the better for reliability but the
greater the cost (mostly costs of travel and listing)
• At national level, minimum of 300 to 400 PSUs should be
selected
– Subnational domains require larger samples
• Cluster size should be as small as practical for reliability
• Example: 8000 households selected in 400 PSUs of 20 sample
households each is a much more reliable sample than 200 PSUs
of 40 households each, but more expensive
Design effect
• Deff - ratio of variance of estimate based on stratified multi-stage
sample design to corresponding variance from simple
random sample of same size
• Measure of the relative efficiency of the sample design
• Effective stratification reduces the deff
• Cluster sampling increases the deff
• Deft = square root of Deff, expressed as ratio of standard
errors
– Generally presented in tables of standard errors for the DHS
Design effect (continued)
• In case of cluster sampling, deff generally measures
effect of clustering
$$\mathit{deff} = 1 + \delta\,(\bar{m} - 1)$$
• δ = intraclass correlation coefficient, or measure of
homogeneity within cluster
• \bar{m} = average cluster size (households per cluster)
• Design effect increases with intraclass correlation
and cluster size
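The clustering formula, deff = 1 + δ(m − 1) with m the average cluster size, can be applied to the earlier comparison of 400 PSUs of 20 households against 200 PSUs of 40 households. This is a rough sketch: the intraclass correlation of 0.05 is an assumed value, not one given in the workshop, and "effective sample size" here simply means the total sample divided by deff.

```python
def design_effect(delta, cluster_size):
    """Design effect from clustering: 1 + delta * (m - 1)."""
    return 1 + delta * (cluster_size - 1)


delta = 0.05  # assumed intraclass correlation, for illustration only

for psus, cluster_size in [(400, 20), (200, 40)]:
    total = psus * cluster_size              # 8000 households in both designs
    deff = design_effect(delta, cluster_size)
    effective = total / deff                 # equivalent simple random sample
    print(f"{psus} PSUs x {cluster_size}: deff={deff:.2f}, "
          f"effective sample={effective:.0f}")
```

Under this assumed δ the design with 400 smaller clusters is worth roughly 4,100 simple-random-sample households, against roughly 2,700 for the design with 200 larger clusters.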
MICS Sampling Option 1 –
use an existing sample
• Design MICS as a rider to another survey if timely
and feasible
• Use sample from a previous survey and re-interview
households for MICS
• Or, use old survey sample EAs and construct new
listing of households to select for MICS
• Old sample must be probability-based, national in
scope
• Possibilities – DHS, other national health survey, recent
labour force survey
• Important: design parameters must be known (such as
selection probability, stratification, etc.)
Sampling option 1 - continued
• Advantages of using previous sample
- cost savings
- maps available for interviewers
- appropriate sampling plan available
- simplicity
• Limitations of using old sample
- burden on respondents
- sample design may need modification
* sample size
* sub-national coverage
* number of PSUs or clusters
• Balance between loss and gain
MICS Sampling Option 2 –
new sample with household listing
• Design new MICS sample based on prototype
• Two stages with census as frame
• Use of implicit stratification, systematic selection
of census EAs at first stage with pps
• Create standard segments (DHS approach)
• List households in selected segments
• Select households systematically from listing
• Interview selected households; no replacement is
allowed
Sampling Option 2 - continued
• Advantages of option 2
- simple design
- probability-based
- if possible self-weighting (national level)
• Limitations of option 2
- expense of listing households
- time necessary to list households
[Example: a sample size of 5,000 households may require 25,000 to
50,000 households to be listed]
DHS Method - Option 2
• Create “standard” segments
• Divide census population in each EA by 500 to
determine number of standard segments
• Map sketch segments in each EA
• Choose 1 segment at random
• List households in selected segment only (instead of
entire EA)
• Purpose is to reduce listing workload to a
manageable size
MICS Sampling Option 3 –
use “compact clusters” with no listing
• (Modified segment, or cluster, design)
• Design new MICS sample based on prototype
• Two stages with census as frame
• Use of implicit stratification, systematic selection of
census EAs at first stage with pps
• Pre-determine number of segments (measure of size)
based on desired cluster size
• Map sketch segments in each EA
• Choose 1 segment at random
• Interview all households in selected segment
MICS Sampling Option 3 - continued
• Illustration:
• Suppose desired cluster size is 20 households.
• Suppose first sample EA contains 112 census
households (according to frame)
• Divide 112 by 20 = 5.6 (round to 6)
• Map sketch exactly 6 segments based on canvass of
EA
• Select one segment at random
• Interview all households (no matter how many are
currently in the selected segment)
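The arithmetic in this illustration is short enough to script. The sketch below assumes round-half-up for the number of segments (the slide only says "round to 6") and at least one segment per EA; the function name and the fixed random seed are likewise assumptions for illustration.

```python
import math
import random


def number_of_segments(frame_households, desired_cluster_size):
    """Segments to create in an EA: frame count / cluster size, rounded."""
    ratio = frame_households / desired_cluster_size
    # Round half up (assumed rule) and never return fewer than one segment.
    return max(1, math.floor(ratio + 0.5))


segments = number_of_segments(112, 20)   # 112 / 20 = 5.6, rounded to 6

# Choose one segment with equal probability; the seed only makes the
# example reproducible.
rng = random.Random(2024)
chosen = rng.randint(1, segments)
print(segments, chosen)
```

Because every household in the chosen segment is interviewed, the achieved cluster size depends on how many households are found there, which is why overall sample size is harder to control under this option.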
MICS Sampling Option 3 - continued
• Advantages of option 3
– avoids listing completely
– probability-based
– self-weighting (national level)
• Limitations of option 3
– less reliable than option 2 (households are “clustered”
together in compact segments)
– segmentation itself can be time-consuming and
complicated
– difficult to control overall sample size
Common sampling option used by
some countries
• Select EAs systematically with PPS, where
measure of size is based on number of
households (or population)
• In case of large EAs in sample, subdivide into
standard segments, similar to Option 2
• Advantage: measures of size more exact,
easier to implement a self-weighting design
and control sample size
PPS systematic selection of PSUs
• Selection of PSUs with PPS provides a self-weighting
sample when a fairly constant number of sample
households is selected in each PSU at second stage
• Systematic sampling of PSUs from a geographically
ordered list ensures that the sample is
geographically representative, with a proportional
allocation to the different levels of geography
• Examine template for PPS systematic sampling
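Systematic PPS selection from an ordered list can be sketched as follows. This is not the workshop template; it is a minimal illustration of the usual cumulative-size method, with a hypothetical function name and invented EA sizes. The list is assumed to be already sorted geographically, which is what provides the implicit stratification.

```python
import random


def pps_systematic(sizes, n_sample, start=None, rng=None):
    """Return indices of units selected systematically with PPS.

    sizes    -- measure of size for each unit, in frame (sorted) order
    n_sample -- number of selections to make
    start    -- random start in [0, interval); drawn if not supplied
    """
    total = sum(sizes)
    interval = total / n_sample
    if start is None:
        rng = rng or random.Random()
        start = rng.random() * interval
    selected = []
    cumulative = 0.0   # total size of all units before the current one
    idx = 0
    for k in range(n_sample):
        target = start + k * interval
        # Advance until the current unit's size range contains the target.
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= target:
            cumulative += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected


# Hypothetical frame of 7 EAs with household counts as measure of size.
ea_households = [120, 80, 200, 150, 50, 100, 300]
print(pps_systematic(ea_households, n_sample=4, start=100))  # [0, 2, 5, 6]
```

A unit whose size exceeds the sampling interval can be hit more than once; in practice such units are usually split or treated as certainty selections before sampling.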
Listing of households in
sample segments
• Importance of new listing to represent current
population
• Problems with using previous listing (older
than 1 year)
– Does not represent newer households
– Distribution of sample population by age group
distorted, generally with higher median age
– Difficulty of finding households in old list
Listing of households (continued)
• Common problems found in listing operations
– Problem with quality of sketch maps – difficult to
determine segment boundaries
– Sometimes large differences found between
number of households in frame (census) and
number listed
Selection of sample households
from listing
• Selection of households in the office following listing
operation
– Advantages – conducted by specialized staff, possible to avoid
selection bias in the field, possible to control overall sample size
– Disadvantage – increased costs from having two field visits
• Selection of households in field
– Advantage – cost savings of having one integrated field operation
– Disadvantage - correct sampling may be difficult for field staff,
selection may be biased
• Self-weighting samples – cluster sizes somewhat variable
• Selection of fixed number of sample households per cluster
– Controls sample size, allowing weights to vary somewhat by EA
• Use of household selection table in field
– Easy to use, minimizes selection bias
Considerations for designing
self-weighting samples
• Main advantage of self-weighting sample is to simplify the
estimation procedures
– Also effective for national-level estimates
• Disadvantages of self-weighting samples
– May not be possible to obtain reliable estimates for smaller
subnational groups, given proportional allocation of sample
– Difficult to control overall sample size
– Use of SPSS and other software packages that automatically weight
survey tables reduces advantages of self-weighting samples
• Most countries are not using self-weighting samples for MICS
– Prefer selection of fixed number of households per EA
Subnational estimates
• Number of separate areas (domains) for which separate, equally
reliable estimates are wanted affects sample size
• For example, if 10 regional estimates are wanted, theoretically the
sample should be increased by factor of 10
• As a compromise, larger sampling errors accepted for subnational
estimates
– One proposal (by Dr. Vijay Verma) – increase national sample size by
factor of D^0.65, where D is the number of domains
– Results in an average increase in the sampling errors for domain estimates
by a factor of about 1.5
– Minimum number of PSUs required for each domain – for example, 30
clusters
• Allocation of sample to domains
– Equal allocation
– Modified proportional allocation, with a minimum and maximum number
of sample PSUs per domain
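The compromise rule can be explored numerically. The sketch below is one possible reading, under simplifying assumptions that are not stated in the workshop: domains of equal size, equal allocation, the same design effect everywhere, and sampling error proportional to one over the square root of the sample size. The function names are hypothetical.

```python
def national_sample_factor(domains, exponent=0.65):
    """Factor by which the national sample grows: D ** 0.65."""
    return domains ** exponent


def domain_error_inflation(domains, exponent=0.65):
    """Domain sampling error relative to the national error before the increase.

    Each domain gets (D ** exponent) / D of the original national sample,
    and sampling error is assumed proportional to 1 / sqrt(sample size).
    """
    share = domains ** exponent / domains
    return (1 / share) ** 0.5


D = 10
print(round(national_sample_factor(D), 2))   # 4.47
print(round(domain_error_inflation(D), 2))   # 1.5
```

For 10 domains the sample grows by a factor of about 4.5 instead of 10, and under these assumptions the domain sampling errors come out about 1.5 times larger than the national ones would have been.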
Survey weighting procedures
• All analysis based on survey data must apply survey
weights in order to prevent biased results
• Survey weighting is design-specific
– Overall probability of selection has component from each
sampling stage.
– Design weight is inverse of final probability of selection
• Non-response must be taken into account
– Separate non-response adjustment for households, women age
15-49 years and children under 5 years
Survey weighting procedures
• Formulas for calculating weights depend on the exact
sample design used in each country
• Design weights important for validating calculation of
weights and coverage of frame
– Weighted total number of households by region, urban and
rural strata should be compared to corresponding distribution
from census data or projections
• Normalized weights – each weight is divided by the
overall average weight
– Using normalized weights, the weighted and unweighted total
number of sample cases (households, women and children) are
equal
• Review of templates for calculating weights
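The weighting steps described on these slides (design weight as inverse of the overall selection probability, non-response adjustment, normalization) can be sketched in a few functions. This is not the MICS weighting template; the function names, the two-stage probabilities and the response counts are invented for illustration, and the level at which the non-response adjustment is applied is an assumption.

```python
def design_weight(stage_probabilities):
    """Inverse of the overall selection probability across all stages."""
    overall = 1.0
    for prob in stage_probabilities:
        overall *= prob
    return 1.0 / overall


def adjust_for_nonresponse(weight, selected, responded):
    """Inflate the weight by selected / responded within the adjustment cell."""
    return weight * selected / responded


def normalize(weights):
    """Divide each weight by the overall average weight."""
    mean = sum(weights) / len(weights)
    return [w / mean for w in weights]


# Hypothetical households: (EA selection prob., household selection prob.)
probabilities = [(0.02, 0.10), (0.05, 0.20), (0.04, 0.25)]
weights = [design_weight(p) for p in probabilities]

# Assumed: 20 households selected and 19 responding in each cluster.
weights = [adjust_for_nonresponse(w, selected=20, responded=19) for w in weights]

normalized = normalize(weights)
print([round(w, 3) for w in normalized], round(sum(normalized), 6))
```

The normalized weights sum to the number of sample cases, so weighted and unweighted totals match, while the unnormalized design weights are the ones to compare against census totals.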
Sampling error estimation
• Calculation of sampling errors necessary to evaluate reliability of
survey estimates
• Should be done for 30-50 important indicators
• Methodology is complex and design-specific
• There are several software options for sampling error calculations:
– SPSS – Complex Samples add-on – calculation of standard errors,
confidence intervals and design effects
– Other existing software can be used (Stata, Clusters, WesVar, CENVAR,
PCCarp, etc.)
– A variance component will soon be added to CSPro
• Review of SPSS sampling error application
Reducing bias
• Accuracy of survey results depends on both variance and bias
(mostly from nonsampling errors)
• Bias should be minimized with quality control for all survey
operations
• Basic data quality determined during enumeration
– Important to have good training and supervision in the field
• Data capture should include 100% or sample verification
• Important to have quality control for editing and coding
procedures
• Computer consistency and range checks
Country example
• 2008 Mozambique MICS3
• Use of existing survey
• Subsample of EAs from the other survey
• Shared listing with another survey
• Different households selected for each survey