Introduction to Sampling for the Implementation of PATs
Materials Developed by The IRIS Center at the University of Maryland

Advantages of Sampling

In most cases, we do not want to survey EVERYONE. Why?
– Too costly
– Too time consuming
– Too many resources needed

Advantages of Sampling

To make our work more cost-effective:
– Interview the minimum number needed
– Reduce:
  • time
  • cost
  • human error

Survey Sampling

According to sampling theory, we can get valid results from studying only a fraction (a sample) of our clients, provided:
• the sample is REPRESENTATIVE of the qualities of our client POPULATION, and
• the sample is of sufficient SIZE to satisfy the assumptions of the statistical techniques used in our analysis

Simple Random Sampling

For the sample to be representative, it must be obtained randomly. It is a simple random sample if each item in the population has an equal chance of being selected.

Types of Bias in Survey Process

Poor randomization is not the only cause of biased samples. Bias and error are more often introduced by:
– poor group definition
– interviewer error
– inadequate records (incomplete or outdated client lists)

Longitudinal Design

Longitudinal studies compare multiple clients at two or more points in time. Often there is a baseline (when the client began the program) and an endline (two years later, for example).

Cross Sectional Design

Cross sectional studies compare multiple clients in the program at one point in time.
Ex: On October 1, 2005, the program looks at:
• Incoming clients
• 2-year clients
• 4-year clients

Calculating Sample Size: How Big is Big Enough?

• Sample results are almost never identical to those for the entire population
• The larger the sample of clients, the greater the likelihood that the statistical analysis will yield “significant” results that closely resemble the entire client population.
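As a rough illustration of "how big is big enough", the sketch below uses the standard textbook formula for estimating a proportion, with a finite population correction. The deck does not say which formula its sample size calculator uses, so this is an assumption; with a 95% confidence level (z = 1.96), a confidence interval of 5 points, and the most conservative proportion (p = 0.5), it appears to reproduce the calculator figures quoted in this deck. The function name `sample_size` and its parameters are hypothetical.

```python
def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Approximate sample size for estimating a proportion.

    Assumptions (not stated in the slides): z = 1.96 for 95% confidence,
    p = 0.5 as the most conservative guess, margin = 0.05 for a
    confidence interval of plus or minus 5 points.
    """
    # Size needed for a very large ("infinite") population
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Finite population correction: small populations need fewer interviews
    n = n0 / (1 + (n0 - 1) / population)
    return round(n)


for size in (1_000, 5_000, 10_000, 50_000, 100_000, 1_000_000):
    print(f"{size:>9,} clients -> sample of {sample_size(size)}")
```

Note how the required sample levels off: beyond a few tens of thousands of clients, a larger population barely changes the sample needed.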
Calculating Sample Size

Different views:
• Statistician – maximalist – at least 500
• Field researcher – minimalist – at least 35 to 50 for each subgroup we want to analyze and compare
• USAID PAT – at least 300

Trade-off: A larger sample is more accurate, but costs more in time and money.

To make generalizations about the entire population, we need a total sample size of 200-400 (depending on the total population and the confidence level desired).

Sample Size Calculator

• Creative Research Systems: www.surveysystem.com/sscalc.htm

Population Size   Confidence Interval   Confidence Level   Sample Size
1,000             5                     95%                278
5,000             5                     95%                357
10,000            5                     95%                370
50,000            5                     95%                381
100,000           5                     95%                383
1,000,000         5                     95%                384

How to Sample Randomly?

RANDOM = giving each client an equal chance to be selected

This is done by:
• drawing numbers, as in a lottery
• numbering all clients and selecting numbers from a random number table
• systematically, by selecting every ‘nth’ case from a complete list of clients

DANGER!!! The list may be biased by:
• who is left out (is the list up-to-date?)

Steps in Taking a Simple Random Sample

• Number a copy of the complete client list, and note the total number of clients (the last number)
• Decide on your sample size
• Create a list of random numbers
• Use Excel or a random number table to select the sample, matching the numbers from the table with those on your numbered client list.

Stratified Sampling

To focus on specific subgroups, first classify the population into several subpopulations, called “strata,” then randomly sample from each stratum (subgroup).

Cluster Sampling

Cluster sampling is a way of selecting randomly when the population is geographically dispersed or time is limited. This method can help reduce the time and cost of data collection.

Group the clients into clusters (these could be branches or loan groups). Randomly choose the clusters. Then randomly sample individuals from only the chosen clusters.
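The steps for a simple random sample and the two-stage cluster procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the client numbers, the branch names, and the helper names `simple_random_sample` and `cluster_sample` are all hypothetical, and Python's built-in random number generator stands in for Excel or a printed random number table.

```python
import random


def simple_random_sample(client_numbers, sample_size, seed=None):
    """Pick `sample_size` clients so that each has an equal chance."""
    rng = random.Random(seed)
    return sorted(rng.sample(client_numbers, sample_size))


def cluster_sample(clusters, n_clusters, per_cluster, seed=None):
    """Stage 1: randomly choose clusters. Stage 2: sample clients in each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    result = {}
    for name in chosen:
        members = clusters[name]
        # Never ask for more clients than the cluster contains
        k = min(per_cluster, len(members))
        result[name] = sorted(rng.sample(members, k))
    return result


# Hypothetical numbered client list: clients 1..1000
client_list = list(range(1, 1001))
sample = simple_random_sample(client_list, 300, seed=1)
print(sample[:10])

# Hypothetical clusters: 10 branches of 100 clients each
branches = {
    f"Branch {i + 1}": list(range(i * 100 + 1, i * 100 + 101))
    for i in range(10)
}
picked = cluster_sample(branches, n_clusters=3, per_cluster=35, seed=1)
for branch, clients in picked.items():
    print(branch, len(clients))
```

Fixing the `seed` makes the draw repeatable, which helps when documenting how a sample was selected.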
Stratified Sampling

• Stratified survey sampling enables you to focus on specific groups (for example, women or rural people), ensuring that they will be represented in the sample. Although random survey sampling, done correctly, will give the researcher roughly proportional samples of all groups, disproportional stratified sampling will guarantee that a certain group is adequately represented.

Parametric Statistics

• Assumes that the distributions of values for your variables are normal (bell curve), and also relatively similar to each other.
• In parametric statistics, thirty is a “magic minimum number”, meaning that it is generally accepted as the minimum cell size for each stratum or subgroup of a simple sample.

Minimum for Each Subgroup

• 30 = ‘minimum magic number’ for each subgroup
• To do any statistical analysis between subgroups, you need a minimum of 30 in each subgroup in order to have any chance at all of finding ‘significant’ differences.

BUT, 30 is NOT enough for your total sample.

If you want to compare between subgroups, you need 35 in each cell

• Since the magic minimum number is 30, and you may have some missing values in some of your interview forms, for practical purposes you always need a minimum of 35 completed surveys for each cell of the sampling frame.

Handling Sampling Problems in the Field

If you cannot interview the client who is sampled (not available, refusal, etc.):
• Sample ‘at least’ an extra 40% and have alternates available to be interviewed in each area (subgroup)
• This helps ensure that you complete 35 questionnaires for each subgroup (if you plan to do additional analysis and compare subgroups)
• It also makes better use of the interviewers’ time

Example of a Sampling Frame

Survey Sample                     Region 1   Region 2   Region 3   Total
Clients interviewed               112        100        88         300
Substitute sample (approx. 40%)   45         40         35         120
Total                             157        140        123        420

What if there are not enough with the 40% extra?

A. Check with the sample tracking coordinator to give you new names
B. If there is not time, the field supervisor must adjust in the field
   1) Use a random number table and select clients from the master list who have not already been selected
   2) If you do not have a random number table, you can ask someone to pick a number between # and ## at random. Do NOT introduce bias
   3) Write down the changes that you made and how you did it

An excerpt from a Random Number Table

32 81 45 13 64 38 95 97 50 34 11 56
61 09 76 30 92 70 49 08 65 93 09 78
46 46 20 38 94 01 00 89 24 99 50 43
30 49 24 23 69 27 86 12 17 43 54 44
48 95 16 11 51 06 15 66 93 04 75 01
54 09 04 18 77 69 80 21 45 24 34 71
87 59 55 41 85 42 41 83 47 71 33 13
41 23 58 08 17 30 98 87 22 23 61 21
29 74 93 08 96 21 05 74 36 42 66 47
26 65 09 18 55 36 76 98 64 14 82 91

Can use: www.random.org/nform.html

IF YOU DON’T HAVE A CLIENT LIST

Random walk sampling -- less expensive but more prone to bias
• Watch out for “tarmac bias”: selecting only houses that are easily accessible from the road

Example of BDS Sampling

• Investigative emphasis: final beneficiaries.
  – Will use three subsectors (irrigation, cashews, potable water).
  – Will focus only on end users of the technologies.
  – Will focus on the region surrounding Ziguinchor.
Example of Business Development Services Sampling

• Sample size = 200
• Casamance region is the focus
• Program has three sub-sectors
• Sample in each sub-sector stratified according to major differences between types of clients:
  – Irrigation – individual owners and group owners
  – Cashew processing – shellers and peelers
  – Potable water – tubewells and rope pumps (rural and peri-urban)

Example of BDS Sampling

• Generate a list of the direct clients and divide by subgroup
• The number of clients per stratum or subgroup depends on the percentage the stratum constitutes in the sector
• Select clients using a random number list
• Each direct client will provide information to the interviewers so that they can create a list of end users, from which some will be chosen according to a predetermined random number list.

BDS Sampling Framework Example

Total number of beneficiaries to be interviewed for PAT implementation: based on 300 + 40% extra, or 420.

Subsector       Total Number of   Percentage of Total      Number to be
                Beneficiaries     Beneficiary Population   interviewed
Irrigation      5,500             56%                      235
Cashew          800               9%                       38
Potable Water   3,500             35%                      147
Total           9,800             100%                     420

Subsector       Type of Client   Percentage of Subsector   Number to be interviewed
Irrigation      Individual       65%                       153
Irrigation      Group            35%                       82
Cashew          Shellers         45%                       17
Cashew          Peelers          55%                       21
Potable Water   Tubewell         10%                       15
Potable Water   Rope Pump        90%                       132
Total                                                      420

Rope Pump clients   Percentage   Number to be interviewed
Rural               30%          40
Urban               70%          92
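The arithmetic behind the sampling framework above can be checked with a short sketch. It takes the percentages exactly as given in the table and allocates the 420 interviews (300 plus 40% extra) proportionally, rounding to whole clients. The helper name `allocate` and the dictionary layout are hypothetical, chosen only for illustration.

```python
def allocate(total, percentages):
    """Split `total` interviews across strata by whole-number percentages.

    Each share is rounded to a whole client. With other inputs the rounded
    shares may not add up exactly to `total` and would need a manual check.
    """
    return {name: round(total * pct / 100) for name, pct in percentages.items()}


base_sample = 300
substitutes = round(base_sample * 0.40)      # 40% extra as alternates
total_sample = base_sample + substitutes     # 420

# Stage 1: split across subsectors (percentages as given in the table)
by_subsector = allocate(
    total_sample, {"Irrigation": 56, "Cashew": 9, "Potable Water": 35}
)

# Stage 2: split each subsector by type of client
by_type = {
    "Irrigation": allocate(by_subsector["Irrigation"], {"Individual": 65, "Group": 35}),
    "Cashew": allocate(by_subsector["Cashew"], {"Shellers": 45, "Peelers": 55}),
    "Potable Water": allocate(
        by_subsector["Potable Water"], {"Tubewell": 10, "Rope Pump": 90}
    ),
}

# Stage 3: split rope pump clients into rural and urban
rope_pump = allocate(by_type["Potable Water"]["Rope Pump"], {"Rural": 30, "Urban": 70})

print(by_subsector)
print(by_type)
print(rope_pump)
```

Working from the top level down, as here, keeps each stratum's share tied to the number actually allocated to its parent group.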