SAMPLING TECHNIQUES
Mrs. S. Valarmathi, M.Sc., M.Phil.
Research Officer, Department of Epidemiology
The Tamil Nadu Dr. MGR Medical University

Is it possible to taste the whole sambar and then add salt? NO
Is it possible to work out what 50 million people think by asking only 1000? YES

What exactly IS a population?
• The entire group under study, as defined by the research objectives.
• Sometimes called the “universe.”
• The totality or aggregate of all individuals with the specified characteristics.

TYPES OF POPULATION
• Finite
• Infinite
• Hypothetical

What exactly IS a “sample”?
A subset of the population.

What exactly IS “sampling”?
Selecting and studying a small number of subjects from a specified population in order to draw inferences about the whole population.

Sampling Terminology
• Who do you want to generalize to? THEORETICAL POPULATION
• What population can you get access to? STUDY POPULATION
• How can you get access to them? SAMPLING FRAME
• Who is in your study? THE SAMPLE

Sampling and representativeness
[Figure: the sample nested inside the study population, which is nested inside the theoretical population]
Sampling fraction = n / N (n = sample size, N = population size)

Sample
• Representativeness expresses the degree to which the sample data accurately characterize the population.
• The sample should reflect the characteristics of the population under study.
• The strength of statistical inference also depends on representativeness.
• Confidence levels of 95% or 99% are commonly used when inferring to the population.

Errors
Survey Errors
• Random / Sampling Errors
• Systematic / Non-sampling Errors

Why Sampling Errors?
• When you take a sample from a population, you only have a subset of the population, a piece of what you’re trying to understand.
• Sampling error can be reduced simply by increasing the sample size!

Standard Error
[Figure: the means of samples I to V scattered around the population mean]

The sampling distribution
• The distribution of a statistic (such as the mean) over an infinite number of samples of the same size as the sample in your study is known as the sampling distribution.

Standard Error
• The standard deviation of the sampling distribution.
• It tells us how the results of different samples would be spread.
• A measure of sampling variability.

Systematic / Non-sampling Errors
• Occur whether a census or a sample is being used.
• Result solely from the manner in which the observations are made.
• Cannot be measured.

Types of Non-sampling Errors
• Coverage error: units excluded from the frame.
• Non-response error: follow up on non-responses.
• Measurement error: bad question!

TYPES OF SAMPLING
• Non-Probability Sampling: Convenience, Judgement, Quota, Snowball
• Probability Sampling: Simple Random, Systematic, Stratified, Cluster

Convenience Sampling
The sample is identified primarily by convenience.
Examples:
• “Man on the street”
• Medical student in the library
• Volunteer samples
• Patient coming to OP
Problem: No evidence for representativeness.
HAPHAZARD SAMPLE

Judgment Sampling
A sampling procedure in which an experienced researcher selects the sample based on some appropriate characteristic of the sample members, to serve a purpose. (Also called purposive sampling or deliberate sampling.)

Quota Sampling
An attempt to be representative by selecting sample elements in proportion to their known incidence in the population.

Snowball sampling
• Typically used in qualitative research.
• Used when members of a population are difficult to locate: hidden activity groups, non-cooperative groups.
• Recruit one respondent, who identifies others, who identify others, and so on.
• Primarily used for exploratory purposes.

Respondent Driven Sampling
• Applicable for hidden, hard-to-reach populations (MSM, IDU).
• A systematic form of snowball sampling with a unique identification procedure.
• Depends on the social network of the target population.
• Under certain assumptions, may be treated as a random sample.

Steps involved in RDS
• Begin with a small set of identified seeds.
• Seeds recruit peers, who recruit their peers, and so on, until the required sample size is achieved.
• Recruits are linked by coupons with unique identifying numbers.
• Incentives are provided for participation and for each successful recruit.

[Figure: recruitment chain growing from a seed through Wave 1 to Wave 5]

RDS: Advantages
• No need for a sampling frame / mapping.
• Ease of field operations: target members recruit samples for you.
• Reaches the less visible segments of the population.
[Figure: CCPUR IDU network, with members marked HIV -ve or HIV +ve]

Non Probability Sampling Methods
• Convenience sampling relies upon convenience and access.
• Judgment sampling relies upon the belief that participants fit the required characteristics.
• Quota sampling emphasizes representation of specific characteristics.
• Snowball sampling relies upon respondent referrals of others with like characteristics.

Probability samples
• A sampling method that selects subjects with a known, non-zero probability.
• Guards against bias in the selection of subjects.
• Allows application of statistical theory to the results.
• Important when one wishes to generalize the findings of the sample to the larger population from which the sample is selected.

Simple random sampling
• Applicable when the population is small, homogeneous and readily available.
• The required number of units is selected randomly.
• Each unit of the frame has an equal, non-zero probability of selection.

Simple random sampling
Merits
• Easy to implement if a list frame is available or the population is small.
• Approximately satisfies the sampling model on which conventional statistics is based, so we can carry out complex analyses.
Demerits
• Needs a complete list of units.
• Units may be scattered.

SRS METHODS
1. Lottery method
2. Random numbers table
3. Computer-generated random numbers

Simple random sampling
Example: evaluate the prevalence of hypertension among the 1200 children in the age group 14 to 17 years attending a school.
• List of children attending the school
• Children numbered from 1 to 1200
• Sample size = 100 children
• Random sampling of 100 numbers between 1 and 1200

Simple random sampling
Table of random numbers

57172 33883 77950 11607 56149 80719 93809 40950
12182 13382 38629 60728 01881 23094 15243 53501
07698 22921 68127 55309 92034 50612 81415 38461
07556 60557 42088 87680 67344 11596 55678 65101
19505 86216 59744 48076 94576 32063 99056 29831
21100 58431 24181 25930 00501 10713 90892 84077
98504 44528 24587 50031 70098 28923 10609 01796
38169 77729 82000 48161 65695 73151 48859 12431
46747 95387 48125 68149 01161 79579 37484 36439
69853 41387 32168 30953 88753 75829 11333 15659
87119 24498 47228 83949 79068 17646 83710 48724
75654 23898 08846 23917 05243 25405 01527 43488
99278 65660 06175 54107 17822 08633 71626 05622
26902 09839 15859 17009 49931 83358 45552 24164
41125 35670 17152 23683 01331 07421 16181 23463
17046 13211 28751 72554 61221 09190 49946 08049
64864 30237 29959 45817 74577 67119 94303 75230
86776 35513 14291 38453 66516 10853 88163 97869
39641 49168 31460 71120 80855 77021 76825 74305
37545 68698 54986 77795 43909 89405 42791 00614
67448 56624 48980 94057 74773 63154 78796 04038
74462 88092 36970 02048 91507 91715 02035 46279
18239 68196 47201 08759 38964 41870 49607 70743
75889 49529 31286 27549 56684 51834 66391 58116
73099 75246 14551 72201 99522 31522 16050 49881
10910 22705 47687 75634 85224 45611 83534 26300

Generating Random Numbers
• This is a better and perhaps more efficient way of selecting a simple random sample.
• Computers, and even calculators, can be used to generate random digits. The randomly produced digits can be used to pick your sample.
• However, a complete listing of the members of the population is needed for this type of random selection.
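As a minimal sketch of computer-generated selection for the school example (100 of the 1200 numbered children), the Python snippet below uses the standard library's random.sample. The function name simple_random_sample and the seed value are illustrative assumptions, not part of the original slides.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct unit numbers from 1..population_size."""
    rng = random.Random(seed)  # a fixed seed only makes the draw repeatable
    # random.sample draws without replacement, so no child is picked twice
    # and every child on the list has the same chance of selection.
    chosen = rng.sample(range(1, population_size + 1), sample_size)
    return sorted(chosen)


# Hypothetical run for the example: 100 children out of the list of 1200.
selected_children = simple_random_sample(1200, 100, seed=2024)
print(len(selected_children), "children selected")
print("First ten list numbers:", selected_children[:10])
```

The selected numbers are then matched against the numbered list of children, which is why a complete list of the population is still required.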
Excel: Enter the function =RANDBETWEEN(bottom, top) in any blank cell, for example =RANDBETWEEN(1, 1200). F9 refreshes the random numbers.
Calculator: Press SHIFT, then ·, then = to obtain RAN#.

Systematic random sampling
The defined target population is ordered and the sample is selected according to position using a skip interval.

Systematic random sampling (INTERVAL SAMPLING)
• Systematically spreads the sample through a list of population members.
• In nearly all practical examples, the procedure results in a sample equivalent to SRS.

Systematic random sampling
• N = 1200 and n = 60, so sampling interval = 1200/60 = 20
• List persons from 1 to 1200
• Randomly select a number between 1 and 20 (e.g. 8)
• The 1st person selected = the 8th on the list
• The 2nd person = 8 + 20 = the 28th, etc.

Systematic sampling
1. Take care that there is no systematic rhythm to the flow or list of people.
2. If every Kth person on the list is, say, “rich” or “senior”, or follows some other consistent pattern, avoid this method.

Stratified Random Sampling
A method of probability sampling in which the population is divided into different subgroups (strata) and samples are selected from each.

Stratified Random Sampling Methods
1. Proportional Allocation Method
2. Equal Allocation Method

Proportional Allocation Method
Epidemiological profile of tuberculosis under 12 years of age. Sample size is 120.
• Centre 1: 56% of patients, 67 sampled
• Centre 2: 24% of patients, 29 sampled
• Centre 3: 20% of patients, 24 sampled

Equal Allocation Method
Epidemiological profile of tuberculosis under 12 years of age. Sample size is 120.
• Centre 1: 40
• Centre 2: 40
• Centre 3: 40

[Figure: stratification of Chennai schools. Zones: North, South, Central, East; each zone split into Govt. and Private; 5 schools in each; 10 students from each school (5 girls, 5 boys)]

Cluster Sampling
• The population itself is divided into a number of natural groups known as clusters (geographic or organizational).
• The units are heterogeneous within a cluster but homogeneous between clusters.
• A cluster sample is obtained by selecting the clusters by simple random sampling; all the units in the sampled clusters are included in the sample.

Cluster Sampling
• Advantages
– Sampling frame is not required
– Simple and easy
– Fewer resources required
• Disadvantages
– Imprecise if units within clusters are homogeneous

Cluster Sampling
• Randomly select clusters and include all subjects, or
• Randomly select clusters and then select subjects randomly within them.

Cluster Sampling
• Especially useful for door-to-door personal surveys (significantly reduces costs).
• However, clustering increases sampling error (people who live close together tend to be more similar).

Drawing the clusters
You need:
• Map of the region
• Distribution of population (by Taluks or area)
• Age distribution (population aged 5-12: 3%)

Taluk         Pop.     Pop. 5-12   Clusters
Mettur         53000      1600         5
Sankari         7300       220         1
Salem         106000      3200        10
Edapadi        13000       400         1
Omalur         26500       800         2
Yercaud         6600       200         1
Vazhappadi     40000      1200         4
Attur           6600       200         1
Gangavalli     53000      1600         5
Total                     9420        30

Then compute the sampling interval: K = 9420/30 = 314

Drawing households and children
On the spot:
• Go to the center of the Taluk and choose a direction (at random).
• Number the houses in this direction (e.g. 21).
• Draw a random number (between 1 and 21) to identify the first house to visit.
• From this house, progress until 7 children are found (itinerary rules fixed beforehand).

Multistage Sampling
• Selection of subjects is done stage by stage.
• Any one of the sampling schemes can be applied at each stage.
• Multistage sampling generally ends with unequal selection probabilities for the sampling units.
• The analysis procedure becomes more complex.
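The stage-by-stage selection described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the frame of districts, villages and households is invented, simple random sampling is used at every stage, and the names multistage_sample and frame are hypothetical. It also shows why selection probabilities end up unequal when villages differ in size.

```python
import random


def multistage_sample(frame, n_districts, n_villages, n_households, seed=None):
    """Three-stage sample using simple random sampling at each stage.

    frame maps district -> village -> list of household ids.
    Returns (district, village, household, inclusion_probability) tuples.
    """
    rng = random.Random(seed)
    selected = []
    districts = rng.sample(sorted(frame), n_districts)        # stage 1
    for d in districts:
        villages = rng.sample(sorted(frame[d]), n_villages)   # stage 2
        for v in villages:
            households = frame[d][v]
            take = min(n_households, len(households))
            chosen = rng.sample(households, take)             # stage 3
            # Overall probability = product of the stage-wise probabilities.
            prob = ((n_districts / len(frame))
                    * (n_villages / len(frame[d]))
                    * (take / len(households)))
            selected.extend((d, v, h, prob) for h in chosen)
    return selected


# Invented frame: 6 districts, 4 villages each, with 10, 20, 30 or 40 households.
frame = {
    f"District-{d}": {
        f"Village-{d}.{v}": [f"HH-{d}.{v}.{h}" for h in range(1, 10 * v + 1)]
        for v in range(1, 5)
    }
    for d in range(1, 7)
}

sample = multistage_sample(frame, n_districts=2, n_villages=2,
                           n_households=5, seed=7)
for district, village, household, prob in sample[:3]:
    print(district, village, household, f"p = {prob:.4f}")
```

Because a fixed number of households is taken from villages of different sizes, households in small villages have a higher chance of selection; an analysis would usually weight each household by 1/p to compensate.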
Multistage Sampling
• Advantages
– No sampling frame of the whole population required
– Most feasible approach for large populations
• Disadvantages
– Several sampling lists
– Needs more manpower

Multistage Sampling
• District Level Household & Facility Survey
– Stage 1: Selection of District
– Stage 2: Selection of Villages
– Stage 3: Selection of Households
• Immunization coverage in a state
– Stage 1: Selection of District
– Stage 2: Selection of PHCs
– Stage 3: Selection of Subcenters

Probability Sampling Methods
• Simple random sampling relies upon simple randomization.
• Systematic sampling relies upon the sampling interval.
• Stratified sampling emphasizes dividing into groups and subgroups.
• Cluster sampling relies upon geographical or organizational groups.

Factors Affecting Choice of Sampling Design
• Sampling frame: existence and size
• Costs
• Precision desired
• Sub-population comparisons

TO SUMMARIZE
• Population
• Sample
• Standard Error
• Types of Sampling
• Non-Probability Sampling
• Probability Sampling

• Social actors are not predictable like objects.
• Randomized events are irrelevant to social life.
• Probability sampling is expensive and inefficient.
Therefore… Non-probability sampling is the best approach.

• We want to generalize to the population.
• Random events are predictable.
• We can compare random events to our results.
Therefore… Probability sampling is the best approach.

Conclusions
• Probability samples are the best.
• Beware of: refusals, absentees, “do not know”.
• Ensure validity and precision, within available constraints.

Conclusions
If in doubt… call an experienced person! Or call a statistician!
Professor: I hope you understand the sampling techniques.
Student: It is impossible to draw conclusions about the whole population by drawing samples.
(The professor explained the whole thing again.)
Student: I am not convinced.
Professor: Well, next time you go for a blood test, ask them to extract all the blood from your body.