Multiple Indicator Cluster Surveys
Survey Design Workshop

Advanced Sampling

MICS4 Survey Design Workshop

Major steps in designing MICS sample
• Define objectives
  – Key indicators
  – Desired level of precision
  – Subnational domains of estimation
• Identify most appropriate sampling frame
  – Sample for another survey conducted recently
  – Most recent census of population and housing
• Determine sample size and allocation
  – Determine availability of previous MICS or DHS results to provide measures of sampling parameters

Sampling frame
• Sampling frame should be nationally representative and have complete coverage, with measures of size (households or population)
• Most countries conduct a Census of Population and Housing every 10 years
  – Generally provides most effective sampling frame for household surveys
• Sample for another survey conducted recently
• In case of older frame, geographic areas with substantial changes, such as peri-urban areas in large cities, may need to be updated
• When no census is available
  – Identify most complete geographic frame available
  – Example: Southern Sudan, list of villages from WHO immunization program with estimated population

MICS recommendations on sample size determinants

FACTOR                                            RECOMMENDATION
 1. Expected size estimate of indicators          (next slide)
 2. Expected size estimate of target population   12-23 mos [3%]
 3. Average household size                        6 persons
 4. Relative margin of error wanted               12% of coverage rate
 5. Level of confidence wanted                    95 percent
 6. Design effect in cluster surveys              1.5
 7. Expected non-response rate                    10 percent
 8. Number of clusters or PSUs - minimum          [300-400]
 9. Cluster size                                  [15-35 households]
10. Number of estimation "domains" wanted         [5 or fewer]
11. Survey budget                                 (country specific)

For items 2, 3, 6, 7 use available country data (recent survey or census); if not available, use value above.
Indicators for sample size determination
• Sample size is different for each MICS indicator.
• Must choose a key indicator, since only one sample size can be used in MICS
• Recommendations for choosing key indicator:
  – Choose from among main indicators of interest in your country
  – Choose the one which will yield largest sample size
  – Usually for a single-year age group
  – Usually DPT, measles, polio or tuberculosis immunization, or birth weight below 2.5 kg
• Exceptions: Do not choose infant or maternal mortality rates as the key indicators. Do not choose a low prevalence indicator that is desirably low (such as malnutrition prevalence).

Sample size formula

\[ n = \frac{4\,r\,(1-r)\,(\mathrm{deff})\,(1.1)}{(0.12\,r)^{2}\,(p)\,(\bar{n})} \]

where
  – n is the required sample size, expressed as number of households, for the KEY indicator,
  – 4 is factor to achieve 95 percent level of confidence,
  – r is anticipated prevalence rate for key indicator,
  – 1.1 is factor to raise sample size by 10 percent for potential nonresponse,
  – deff is shortened symbol for design effect,
  – 0.12r is margin of error to be tolerated, defined as 12 percent of r (12 percent thus represents the relative margin of error of r),
  – p is proportion of total population that smallest group comprises, and
  – \bar{n} is average household size.
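The sample size formula above can be sketched in a few lines of Python. This is a minimal illustration, not an official MICS tool: the function name and argument names are assumptions made here, and the default values are the ones the slides recommend (deff 1.5, 10 percent nonresponse, relative margin of error 0.12, factor 4 for 95 percent confidence).

```python
def mics_sample_size(r, p, avg_hh_size, deff=1.5, nonresponse_factor=1.1,
                     rme=0.12, confidence_factor=4.0):
    """Required number of households for the key indicator.

    r                  anticipated prevalence (coverage) rate of the key indicator
    p                  proportion of total population in the smallest target group
    avg_hh_size        average household size (persons)
    deff               design effect
    nonresponse_factor inflation for expected nonresponse (1.1 = 10 percent)
    rme                relative margin of error (margin of error = rme * r)
    confidence_factor  4 corresponds to the 95 percent level of confidence
    """
    if not 0 < r < 1:
        raise ValueError("r must be strictly between 0 and 1")
    if p <= 0 or avg_hh_size <= 0:
        raise ValueError("p and avg_hh_size must be positive")
    margin_of_error = rme * r
    numerator = confidence_factor * r * (1 - r) * deff * nonresponse_factor
    denominator = margin_of_error ** 2 * p * avg_hh_size
    return numerator / denominator


# Illustrative inputs: 25 percent coverage, target group is 2.5 percent
# of the population, 4 persons per household, default deff and nonresponse.
n = mics_sample_size(r=0.25, p=0.025, avg_hh_size=4.0)
print(round(n))  # about 13750 households
```

The function returns an unrounded value so that the caller decides how to round; in practice the result is usually rounded up and then adjusted to a whole number of clusters.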
Example
• Target group: Children 12 to 23 months old
• Percent of population: 3.5 percent
• Key indicator: DPT immunization coverage
• Prevalence (Coverage): 25 percent
• Deff: 1.6
• Non-response adjustment: 1.05 (response rate 95%)
• Average household size: 6

\[ n = \frac{4\,(0.25)(0.75)(1.6)(1.05)}{(0.12 \times 0.25)^{2}\,(0.035)(6)} = \frac{1.26}{0.000189} \approx 6667 \]

Sample size (households) to estimate coverage rates for smallest target population

                          Average household size (number of persons)
Estimated rate, r         4.0       4.5       5.0       5.5       6.0
r = 0.25               13,750    12,222    11,000    10,000     9,167
r = 0.30               10,694     9,506     8,556     7,778     7,130
r = 0.35                8,512     7,566     6,810     6,191     5,675
r = 0.40                6,875     6,111     5,500     5,000     4,583

Use this table when your
1. Target population is 2.5% of total population; this is generally children 12-23 months old
2. Sample design effect, deff, is assumed to be 1.5 and nonresponse is expected to be 10 percent
3. Relative margin of error is set at 12 percent of estimate of coverage rate, r

Note on precision requirements
• In case of MICS2, precision requirements were expressed in terms of acceptable margin of error (ME), which varied according to the size of the estimate (5% absolute error for high rate indicators or 3% for low rate indicators)
• For MICS3 and MICS4, this was simplified to a relative margin of error (RME) of 0.12
• Follow guidelines in sampling chapter carefully; avoid indicators with a high rate
• Final criterion for acceptable precision: is the confidence interval useful?
  – If confidence interval is too wide, estimate may not be useful

Stratification and sample allocation
• Stratification is the process of dividing the sampling frame into sub-groups (strata) of homogeneous (similar) PSUs
• Advantages: better precision, flexible design, sub-national estimates for smaller domains (differential sampling rates)
  – Reduced variance within stratum given similarity of units
• Example of stratification: region, urban/rural
• Existing sampling frame, such as master sample, may have socioeconomic stratification for large cities
  – Should improve statistical efficiency of sample design
• Geographic domains defined as strata
  – Possible to use variable sampling rates by domain to ensure sufficient sample size for each

Implicit stratification
• Sort the sampling frame according to certain characteristics such as regions, urban-rural residence, sub-regions, districts, etc., then select a systematic pps sample.
• Ensures a representative sample for each subgroup
• Automatically provides proportional allocation by size of subgroup

Allocation of sample to strata
• Proportional allocation
  – Effective for precision of estimates at the national level
• Equal allocation to each domain
  – Used when each domain requires same level of precision
• Optimum allocation: takes into account differential variance and costs by stratum
  – For example, variability may be higher in urban areas and enumeration costs may be higher in rural areas, so use a higher sampling rate for urban areas

Number of PSUs and cluster size
• Survey costs depend not only on number of households but also on their distribution among primary sampling units (PSUs)
• Important to determine effective balance between number of sample PSUs and cluster size
• In general, the more PSUs the better for reliability but the greater the cost (mostly costs of travel and listing)
• At national level, minimum
of 300 to 400 PSUs should be selected
  – Subnational domains require larger samples
• Cluster size should be as small as practical for reliability
• Example: 8000 households selected in 400 PSUs of 20 sample households each is a much more reliable sample than 200 PSUs of 40 households each, but more expensive

Design effect
• Deff: ratio of variance of estimate based on stratified multistage sample design and corresponding variance from simple random sample of same size
• Measure of the relative efficiency of the sample design
• Effective stratification reduces the deff
• Cluster sampling increases the deff
• Deft = square root of Deff, expressed as ratio of standard errors
  – Generally presented in tables of standard errors for the DHS

Design effect (continued)
• In case of cluster sampling, deff generally measures effect of clustering

\[ \mathrm{deff} = 1 + \delta\,(\bar{m} - 1) \]

• δ = intraclass correlation coefficient, or measure of homogeneity within cluster
• \bar{m} = average cluster size (households per cluster)
• Design effect increases with intraclass correlation and cluster size

MICS Sampling Option 1 – use an existing sample
• Design MICS as a rider to another survey if timely and feasible
• Use sample from a previous survey and re-interview households for MICS
• Or, use old survey sample EAs and construct new listing of households to select for MICS
• Old sample must be probability-based, national in scope
• Possibilities: DHS, other national health survey, recent labour force survey
• Important: design parameters must be known (such as selection probability, stratification, etc.)
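Returning to the design effect slides above, the clustering formula deff = 1 + δ(m − 1) can be used to compare the two layouts from the PSU example (400 PSUs of 20 households versus 200 PSUs of 40 households). The sketch below is illustrative only: the intraclass correlation of 0.05 is an assumed value, not one given in the workshop material, and the helper names are made up for this example.

```python
import math


def design_effect(delta, cluster_size):
    """deff = 1 + delta * (m - 1) for average cluster size m."""
    return 1.0 + delta * (cluster_size - 1)


def effective_sample_size(n_households, deff):
    """Size of the simple random sample with the same variance."""
    return n_households / deff


ASSUMED_DELTA = 0.05  # assumption for illustration; use country data when available

for n_psus, cluster_size in [(400, 20), (200, 40)]:
    n_households = n_psus * cluster_size
    deff = design_effect(ASSUMED_DELTA, cluster_size)
    deft = math.sqrt(deff)  # ratio of standard errors
    n_eff = effective_sample_size(n_households, deff)
    print(f"{n_psus} PSUs x {cluster_size} households: "
          f"deff={deff:.2f}, deft={deft:.2f}, effective n={n_eff:.0f}")
```

Under this assumed δ, both designs interview 8000 households, but the design with smaller clusters has the lower deff and so the larger effective sample size, which is the sense in which it is "more reliable".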
Sampling option 1 - continued
• Advantages of using previous sample
  – cost savings
  – maps available for interviewers
  – appropriate sampling plan available
  – simplicity
• Limitations of using old sample
  – burden on respondents
  – sample design may need modification
    * sample size
    * sub-national coverage
    * number of PSUs or clusters
• Balance between loss and gain

MICS Sampling Option 2 – new sample with household listing
• Design new MICS sample based on prototype
• Two stages with census as frame
• Use of implicit stratification, systematic selection of census EAs at first stage with pps
• Create standard segments (DHS approach)
• List households in selected segments
• Select households systematically from listing
• Interview selected households; no replacement will be allowed

Sampling Option 2 - continued
• Advantages of option 2
  – simple design
  – probability-based
  – if possible, self-weighting (national level)
• Limitations of option 2
  – expense of listing households
  – time necessary to list households
[Example: sample size of 5000 households may require 25000 to 50000 households to be listed]

DHS Method - Option 2
• Create "standard" segments
• Divide census population in each EA by 500 to determine number of standard segments
• Map sketch segments in each EA
• Choose 1 segment at random
• List households in selected segment only (instead of entire EA)
• Purpose is to reduce listing workload to a manageable size

MICS Sampling Option 3 – use "compact clusters" with no listing
• (Modified segment, or cluster, design)
• Design new MICS sample based on prototype
• Two stages with census as frame
• Use of implicit stratification, systematic selection of census EAs at first stage with pps
• Pre-determine number of segments (measure of size) based on desired cluster size
• Map sketch segments in each EA
• Choose 1 segment at
random
• Interview all households in selected segment

MICS Sampling Option 3 - continued
• Illustration:
• Suppose desired cluster size is 20 households.
• Suppose first sample EA contains 112 census households (according to frame)
• Divide 112 by 20 = 5.6 (round to 6)
• Map sketch exactly 6 segments based on canvass of EA
• Select one segment at random
• Interview all households (no matter how many are currently in the selected segment)

MICS Sampling Option 3 - continued
• Advantages of option 3
  – avoids listing completely
  – probability-based
  – self-weighting (national level)
• Limitations of option 3
  – less reliable than option 2 (households are "clustered" together in compact segments)
  – segmentation itself can be time-consuming and complicated
  – difficult to control overall sample size

Common sampling option used by some countries
• Select EAs systematically with PPS, where measure of size is based on number of households (or population)
• In case of large EAs in sample, subdivide into standard segments, similar to Option 2
• Advantage: measures of size more exact, easier to implement a self-weighting design and control sample size

PPS systematic selection of PSUs
• Selection of PSUs with PPS provides a self-weighting sample when a fairly constant number of sample households is selected in each PSU at second stage
• Systematic sampling of PSUs from a geographically ordered list ensures that the sample is geographically representative, with a proportional allocation to the different levels of geography
• Examine template for PPS systematic sampling

Listing of households in sample segments
• Importance of new listing to represent current population
• Problems with using previous listing (older than 1 year)
  – Does not represent newer households
  – Distribution of sample population by age group distorted, generally with higher
median age
  – Difficulty of finding households in old list

Listing of households (continued)
• Common problems found in listing operations
  – Problem with quality of sketch maps: difficult to determine segment boundaries
  – Sometimes large differences found between number of households in frame (census) and number listed

Selection of sample households from listing
• Selection of households in the office following listing operation
  – Advantages: conducted by specialized staff, possible to avoid selection bias in the field, possible to control overall sample size
  – Disadvantage: increased costs from having two field visits
• Selection of households in field
  – Advantage: cost savings of having one integrated field operation
  – Disadvantage: correct sampling may be difficult for field staff, selection may be biased
• Self-weighting samples: cluster sizes somewhat variable
• Selection of fixed number of sample households per cluster
  – Controls sample size, allowing weights to vary somewhat by EA
• Use of household selection table in field
  – Easy to use, minimizes selection bias

Considerations for designing self-weighting samples
• Main advantage of self-weighting sample is to simplify the estimation procedures
  – Also effective for national-level estimates
• Disadvantages of self-weighting samples
  – May not be possible to obtain reliable estimates for smaller subnational groups, given proportional allocation of sample
  – Difficult to control overall sample size
  – Use of SPSS and other software packages that automatically weight survey tables reduces advantages of self-weighting samples
• Most countries are not using self-weighting samples for MICS
  – Prefer selection of fixed number of households per EA

Subnational estimates
• Number of separate areas (domains) for which separate, equally reliable estimates are wanted affects sample size
• For example, if 10
regional estimates are wanted, theoretically the sample should be increased by factor of 10
• As a compromise, larger sampling errors accepted for subnational estimates
  – One proposal (by Dr. Vijay Verma): increase national sample size by factor of D^0.65, where D is the number of domains
  – Results in an average increase in the sampling errors for domain estimates by a factor of about 1.5
  – Minimum number of PSUs required for each domain, for example 30 clusters
• Allocation of sample to domains
  – Equal allocation
  – Modified proportional allocation, with a minimum and maximum number of sample PSUs per domain

Survey weighting procedures
• All analysis based on survey data must apply survey weights in order to prevent biased results
• Survey weighting is design-specific
  – Overall probability of selection has component from each sampling stage.
  – Design weight is inverse of final probability of selection
• Non-response must be taken into account
  – Separate non-response adjustment for households, women age 15-49 years and children under 5 years

Survey weighting procedures
• Formulas for calculating weights depend on the exact sample design used in each country
• Design weights important for validating calculation of weights and coverage of frame
  – Weighted total number of households by region, urban and rural strata should be compared to corresponding distribution from census data or projections
• Normalized weights: each weight is divided by the overall average weight
  – Using normalized weights, the weighted and unweighted total number of sample cases (households, women and children) are equal
• Review of templates for calculating weights

Sampling error estimation
• Calculation of sampling errors necessary to evaluate reliability of survey estimates
• Should be done for 30-50 important indicators
• Methodology is complex and design-specific
• There are several software options
for sampling error calculations:
  – SPSS Complex Samples add-on: calculation of standard errors, confidence intervals and design effects
  – Other existing software can be used (Stata, Clusters, WesVar, CENVAR, PCCarp, etc.)
  – Soon variance component will be added to CSPro
• Review of SPSS sampling error application

Reducing bias
• Accuracy of survey results depends on both variance and bias (mostly from nonsampling errors)
• Bias should be minimized with quality control for all survey operations
• Basic data quality determined during enumeration
  – Important to have good training and supervision in the field
• Data capture should include 100% or sample verification
• Important to have quality control for editing and coding procedures
• Computer consistency and range checks

Country example
• 2008 Mozambique MICS3
• Use of existing survey
• Subsample of EAs from the other survey
• Shared listing with another survey
• Different households selected for each survey
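As a closing illustration of the survey weighting slides, the sketch below computes design weights as the inverse of the overall selection probability for a two-stage design (PSUs selected with PPS, then a fixed number of households from a new listing) and then normalizes them. It is a simplified sketch under stated assumptions: the frame total, PSU sizes and listing counts are invented numbers, the function names are hypothetical, and the non-response adjustments described in the slides are left out.

```python
def design_weight(n_psus, psu_size, total_size, hh_selected, hh_listed):
    """Inverse of the overall selection probability for a two-stage design.

    Stage 1: PSU selected with PPS, probability n_psus * psu_size / total_size.
    Stage 2: households selected from the listing, probability
             hh_selected / hh_listed.
    """
    p_psu = n_psus * psu_size / total_size
    p_household = hh_selected / hh_listed
    return 1.0 / (p_psu * p_household)


def normalize(weights):
    """Divide each weight by the overall average weight."""
    mean_weight = sum(weights) / len(weights)
    return [w / mean_weight for w in weights]


# Invented example: frame of 10,000 households, 4 sample PSUs,
# 20 households selected in each PSU.
TOTAL_HOUSEHOLDS = 10_000
HH_PER_PSU = 20
psus = [  # (measure of size in frame, households found at listing)
    (120, 130),
    (200, 190),
    (80, 80),
    (150, 160),
]

psu_weights = [
    design_weight(len(psus), size, TOTAL_HOUSEHOLDS, HH_PER_PSU, listed)
    for size, listed in psus
]
# One weight per sample household.
household_weights = [w for w in psu_weights for _ in range(HH_PER_PSU)]
normalized = normalize(household_weights)

print([round(w, 1) for w in psu_weights])
print(round(sum(normalized), 6), len(normalized))
```

When the listing count equals the frame measure of size in every PSU, all design weights come out equal, which is the self-weighting case; differences between frame and listing counts are what make the weights vary by EA. After normalization the weighted number of sample households equals the unweighted number, as the slides state.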