Multiple Indicator Cluster Surveys
Survey Design Workshop
Sampling: Advanced Sampling

Major steps in designing MICS sample
• Define objectives
  – Key indicators
  – Desired level of precision
  – Sub-national domains of estimation
• Identify the most appropriate sampling frame
  – Most recent census of population and housing
  – Master sample, or the sample of another recently conducted survey
• Determine sample size and allocation
  – Check whether previous MICS or DHS results are available to provide estimates of sampling parameters

Sampling Frame
• Sampling frame:
  – Nationally representative
  – Complete coverage
  – Measures of size (households or population) for small area units
• Generally, the most recent census is the most effective sampling frame
• In some cases a more recent pre-census listing may be available
• When no census is available, identify the most complete geographic frame available (e.g. a list of villages/localities with estimated population)
• Common problems with area frames:
  – Coverage issues
  – Census maps of poor quality
  – Errors and changes in area boundaries
  – Inappropriate type and size of area units
  – Lack of auxiliary information

SAMPLE SIZE DETERMINATION
$$n = \frac{4\, r\,(1 - r)\,\text{deff}}{e^{2} \cdot RR \cdot pb \cdot \text{AveSize}}$$
• n is the required sample size (number of households)
• 4 (approximately 1.96²) is a factor to achieve the 95 percent level of confidence
• r is the predicted or estimated value of the indicator in the target population
• deff is the design effect
• RR is the response rate
• pb is the proportion of the target subpopulation (the subpopulation on which the indicator r is based) in the total population
• AveSize is the average household size (average number of persons per household)
• e is the margin of error to be tolerated at the 95 percent level of confidence
• Currently, e = 0.12r, i.e. 12 percent of r. Because e = 2 × standard error(r), this corresponds to a relative standard error of 6 percent for r.

Previously in MICS2
• Two different values for the margin of error:
  – 5 percentage points for high values of r (over 25%)
  – 3 percentage points for low values of r (25% or less)
• Users found it difficult to decide on the sample size for their surveys

MICS template for sample size calculation (Excel file)

Selection of key indicators
• Choose an important indicator that will yield the largest sample size
• Step 1: Select 2 or 3 target populations that each represent a small percentage of the total population (pb); typically:
  – Children 12-23 months: 2-4%
  – Children under 5 years: 7-20%
• Step 2: Review important indicators for these target groups, but ignore indicators with very low or very high prevalence (less than 10% or over 40%, respectively)
• Among indicators for which a low value is desirable, do not choose one that is already acceptably low
• Do not choose childhood mortality rates or maternal mortality ratios

Explicit Stratification
• Explicit stratification: dividing the sampling frame into sub-groups (called strata) of homogeneous (similar) PSUs
• Advantages:
  – Better precision, because variance within each stratum is reduced when its units are similar
  – Flexible design: different sampling rates by stratum allow sub-national estimates for smaller domains
• Example of stratification: region, urban/rural

Implicit Stratification
• Sort the sampling frame according to certain characteristics, such as region, urban-rural residence, sub-region, district, etc., then select a systematic PPS sample
• Ensures a representative sample for each subgroup
• Automatically provides proportional allocation by size of subgroup

Allocation of sample to strata/domains
• Proportional allocation
  – Effective for precision of estimates at the national level
• Equal allocation to each domain
  – Used when each domain requires the same level of precision
• Optimum allocation
  – Takes into account differences in variance and costs between strata
  – For example, variability may be higher in urban areas and enumeration costs may be higher in rural areas; in that case, use a higher sampling rate for urban areas

Subnational estimates
• The number of separate areas (domains) for which separate, equally reliable estimates are wanted affects the sample size
• For example, if 10 regional estimates are wanted, the sample would theoretically have to be increased by a factor of 10
• As a compromise, larger sampling errors are accepted for subnational estimates
  – One proposal (by Dr. Vijay Verma): increase the national sample size by a factor of $D^{0.65}$, where D is the number of domains
  – Each domain then receives $D^{0.65}/D = D^{-0.35}$ of the original national sample size, so its sampling error rises by $\sqrt{D^{0.35}} = D^{0.175}$; with D = 10, as in the example above, this is an increase by a factor of about 1.5

Sampling Stages
• A two-stage sample design, with EAs defined as PSUs, is ideal
• In some countries only a frame of larger administrative units is available
  – Three-stage sample design: larger area units selected as PSUs
  – Necessary to delineate smaller segments in each sample PSU

Number of PSUs and Cluster Size
• Survey costs depend not only on the number of households but also on their distribution among primary sampling units (PSUs)
• Important to find an effective balance between the number of sample PSUs and the number of sample households per cluster
• In general, the more PSUs, the better the reliability but the greater the cost (mostly costs of travel and listing)
• Example: 8,000 households selected in 400 PSUs of 20 households each is a much more reliable sample than 200 PSUs of 40 households each, but it is more expensive
• The number of sample households per cluster should be as small as practical for reliability
• A range of 15-25 households per cluster appears to be effective for MICS

Design Effect (DEFF)
• Deff: ratio of the variance of an estimate based on the stratified multi-stage sample design to the corresponding variance from a simple random sample of the same size
• Measure of the relative efficiency of the sample design
• Effective stratification reduces the deff
• Cluster sampling increases the deff
• In the case of cluster sampling, deff generally measures the effect of clustering:
$$\text{deff} = 1 + \delta\,(\bar{m} - 1)$$
  – $\delta$ = intraclass correlation coefficient, or measure of homogeneity within clusters
  – $\bar{m}$ = average number of sample households per cluster
• The design effect increases with the intraclass correlation and the cluster size

First Stage Selection of PSUs
• Standard methodology for MICS and other household surveys: select EAs or clusters systematically with PPS
• Important to sort the frame before selection, to ensure effective implicit stratification
• Traditional procedure: cumulate the measures of size, determine the sampling interval and random start, and generate the selection numbers

Large sample PSUs in PPS sampling
• Sometimes a PSU may have a measure of size larger than the sampling interval
• Such a PSU may be selected more than once in the systematic PPS selection
• Option 1: if the PSU is selected two or more times, multiply the number of households to be selected by the number of "hits"
• Option 2: separate out the large PSUs and include them in the sample with certainty (probability of 1)

MICS Sampling Option 1 – new sample with household listing
• Design a new MICS sample
• Two stages, with the census as frame
• Use implicit stratification and systematic PPS selection of census EAs at the first stage
• List households in the selected EAs/segments
• Select households systematically from the listing
• Interview the selected households; no replacement is allowed

Sampling Option 1 - continued
• Advantages of option 1:
  – Simple design
  – Probability-based
  – Self-weighting at the national level, if possible
• Limitations of option 1:
  – Expense of listing households
  – Time necessary to list households [for example, a sample of 5,000 households may require 25,000 to 50,000 households to be listed]

MICS Sampling Option 2 – use an existing sample
• Design MICS as a rider to another survey, if timely and feasible
• Use the sample from a previous survey and re-interview its households for MICS
• Or use the sample EAs of an old survey and construct a new listing of households from which to select the MICS sample
• The old sample must be probability-based and national in scope
• Possibilities: DHS, another national health survey, a recent labour force survey
• Important: design parameters (such as selection probabilities, stratification, etc.) must be known

Sampling option 2 - continued
• Use of an existing master sampling frame
• Some countries use a master sample design for intercensal national household surveys
• Master samples are generally large enough for MICS; a subsample of PSUs can be selected
• Advantage: updated maps, and perhaps an updated listing, may be available for the master sample PSUs
• Advantages of using a previous sample:
  – Cost savings
  – Maps available for interviewers
  – Appropriate sampling plan available
  – Simplicity
• Limitations of using an old sample:
  – Burden on respondents
  – Sample design may need modification:
    * sample size
    * sub-national coverage
    * number of PSUs or clusters
• Balance the losses against the gains

Listing and Selection of Households
• A household listing manual is available
• A new listing is important to represent the current population
• Problems with using a previous listing (older than 1 year):
  – Does not represent newer households
  – Distribution of the sample population by age group is distorted, generally with a higher median age
  – Difficulty of finding households from the old list
• MICS recommends a separate household listing operation:
  – More reliable, as listing staff are less likely than interviewers to bias the sample by excluding households that are difficult to reach
  – Allows household selection to be done in a single central location using reliable and uniform procedures
• Household selection in the office:
  – Advantages: conducted by specialized staff, avoids selection bias in the field, allows control of the overall sample size
  – Disadvantage: increased costs from having two field visits
• Selection in the field, using a household selection table:
  – Advantage: cost savings from having one integrated field operation
  – Disadvantages: correct sampling may be difficult for field staff; selection may be biased
• Excel template for automatically generating the sample of households from the number of households listed (see spreadsheet)
• Common problems found in listing operations:
  – Poor quality of sketch maps, making segment boundaries difficult to determine
  – Sometimes large differences between the number of households in the frame (census) and the number listed
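The design calculations in the slides above can be sketched in a few lines of Python. This is a minimal illustration, not the MICS Excel template: it assumes the sample size formula takes the form n = 4r(1 − r)·deff / (e²·RR·pb·AveSize) with e = 0.12r, uses the clustering approximation deff = 1 + δ(m̄ − 1), applies Verma's D^0.65 inflation, and implements the traditional systematic PPS procedure (cumulate sizes, sampling interval, random start, selection numbers) and systematic household selection from a listing. All function names and input values are hypothetical.

```python
import math
import random
from collections import Counter

def mics_sample_size(r, deff, rr, pb, ave_size, rel_margin=0.12):
    """Households needed: n = 4 r (1 - r) deff / (e^2 * rr * pb * ave_size), e = rel_margin * r."""
    e = rel_margin * r                      # margin of error, 12% of r by default
    n = 4 * r * (1 - r) * deff / (e ** 2 * rr * pb * ave_size)
    return math.ceil(n)                     # round up to whole households

def design_effect(delta, m_bar):
    """Effect of clustering: deff = 1 + delta * (m_bar - 1)."""
    return 1 + delta * (m_bar - 1)

def verma_inflation(num_domains, exponent=0.65):
    """Factor D**0.65 applied to the national sample size for D domains."""
    return num_domains ** exponent

def systematic_pps(sizes, n_psus, rng):
    """Systematic PPS selection of PSU indices from a frame already sorted for
    implicit stratification. A PSU larger than the sampling interval can be
    selected more than once, so its index is repeated once per hit."""
    interval = sum(sizes) / n_psus          # sampling interval
    start = rng.uniform(0, interval)        # random start between 0 and the interval
    selected, cumulative, i = [], 0.0, 0
    for k in range(n_psus):
        number = start + k * interval       # k-th selection number
        # advance to the PSU whose cumulated size range contains the number
        while i < len(sizes) - 1 and cumulative + sizes[i] <= number:
            cumulative += sizes[i]
            i += 1
        selected.append(i)
    return selected

def systematic_households(num_listed, num_to_select, rng):
    """Systematic selection of 1-based household line numbers from a listing."""
    interval = num_listed / num_to_select
    start = rng.uniform(0, interval)
    return [math.floor(start + k * interval) + 1 for k in range(num_to_select)]

# Hypothetical inputs, for illustration only
rng = random.Random(2024)
n = mics_sample_size(r=0.25, deff=1.5, rr=0.9, pb=0.03, ave_size=5)  # 9260
deff_20 = design_effect(delta=0.05, m_bar=20)                        # about 1.95
domain_se_factor = math.sqrt(10 / verma_inflation(10))               # about 1.5
psus = systematic_pps([100, 50, 500, 80, 120, 150], n_psus=4, rng=rng)
hits = Counter(psus)             # PSU 2 (size 500 > interval 250) has 2 hits
households = systematic_households(num_listed=120, num_to_select=20, rng=rng)
```

Running `mics_sample_size` over the 2 or 3 candidate indicators and keeping the largest result mirrors the "choose the indicator that yields the largest sample size" rule. Under Option 1 for large PSUs, `hits[i]` would multiply the number of households selected in PSU i; under Option 2, such PSUs would instead be removed from the frame and taken with certainty before running the systematic selection.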
Sampling strategy for low fertility countries
• In MICS4 and MICS5, some low fertility countries are using second-stage stratification of the listing into households with and without children under 5
• A higher sampling rate is used for households with children
• This increases the number of households with children in the MICS sample, and therefore the number of sample children
• Improves the reliability of the child indicators without increasing the sample size to a very high level
• This procedure also increases the variability of the weights and the design effects for the overall sample
• Important to avoid very large variability in the weights between households with and without children
  – The weights of households with and without children generally should not differ by more than a factor of about 4

Implications of sampling strategy on sample size calculations
• One parameter in the sample size calculation template is the proportion of the indicator subpopulation (pb)
• Using a higher sampling rate for households with children increases the proportion of children under 5 in the sample
• The proportion of children under 5 (or of smaller age groups) should therefore be multiplied by a factor that reflects the increase in sample households with children

Implications of sampling strategy on weighting procedures
• Under the normal MICS sample design, weights vary by sample cluster
• With second-stage stratification by households with and without children, two weights need to be calculated for each cluster: one for households with children and one for households without

Survey weighting procedures
• Survey data are collected using a complex design featuring clustering, unequal probabilities of selection and stratification:
  – All analyses must apply survey weights to avoid biased results
• Formulas for calculating weights depend on the exact sample design used in each country
• MICS has 4 sets of weights: households, women, men and children
• Components of MICS survey weights:
  – Design weight: inverse of the final probability of selection of households
  – Adjustment factors for nonresponse (at the cluster, household, woman and child levels)
• Weights are normalized so that the total weighted number of observations equals the total unweighted number (sample size)

Sampling Error Estimation
• Necessary to evaluate the reliability of survey estimates
• Possible only when probability sampling is used
• Should be done for 30-50 important indicators
• Methodology is complex and design-specific
• Several software packages:
  – SPSS Complex Samples module (used in MICS)
  – SAS, Stata, SUDAAN, Clusters, WesVar, CENVAR, PCCarp, etc.
• Outputs: standard errors, confidence intervals and DEFF

Sampling Error Estimation: SPSS Complex Samples module
• Advantages:
  – Simple to use
  – Template syntax available for standard indicators
  – Supported by MICS Global and Regional staff
• Steps:
  – Set up the sampling parameter specifications file (csplan)
  – Define the variables for stratum, PSU and weight
• The stratum should be the lowest level of explicit stratification (for example, urban/rural within each province)
• A minimum of two sample PSUs per stratum is necessary

Reducing bias
• Accuracy of survey results depends on both variance and bias (mostly from nonsampling errors)
• Bias should be minimized through quality control of all survey operations
• Basic data quality is determined during enumeration
  – Good training and supervision in the field are important
• Data capture should include 100% or sample verification
• Quality control is important for editing and coding procedures
• Computer consistency and range checks
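The weight components listed under "Survey weighting procedures" can be illustrated with a small Python sketch. It assumes a two-stage design within one explicit stratum: PSUs selected systematically with PPS (first-stage probability = number of sample PSUs × PSU size / stratum total), a fixed number of households selected systematically from the new listing, no PSU selected more than once, and only a household-level nonresponse adjustment. The `Cluster` type, function names and numbers are hypothetical and do not reproduce the MICS weighting programs.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    frame_size: float   # measure of size in the census frame (households)
    listed: int         # households found in the new listing
    selected: int       # households selected for interview
    responded: int      # households with a completed interview

def household_weights(clusters, psus_in_stratum, stratum_frame_total):
    """Unnormalized household weight for each cluster of one explicit stratum."""
    weights = []
    for c in clusters:
        p1 = psus_in_stratum * c.frame_size / stratum_frame_total  # first stage (PPS)
        p2 = c.selected / c.listed                                  # second stage (systematic)
        design_weight = 1 / (p1 * p2)               # inverse of final selection probability
        nonresponse_adj = c.selected / c.responded  # household nonresponse adjustment
        weights.append(design_weight * nonresponse_adj)
    return weights

def normalize(weights, counts):
    """Rescale weights so the weighted number of cases equals the unweighted sample size."""
    n = sum(counts)
    weighted_total = sum(w * k for w, k in zip(weights, counts))
    return [w * n / weighted_total for w in weights]

# Hypothetical stratum: 10,000 households in the frame, 2 sample PSUs
clusters = [Cluster(200, 220, 20, 19), Cluster(150, 140, 20, 20)]
raw = household_weights(clusters, psus_in_stratum=2, stratum_frame_total=10_000)
final = normalize(raw, [c.responded for c in clusters])
```

In a real survey the normalization would be applied across all clusters of the sample, and women's, men's and children's weights would add their own nonresponse factors on top of the household weight. With second-stage stratification by households with and without children, `p2`, and hence the weight, would be computed separately for the two groups within each cluster.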