SAMPLING
Abraham L. (MPHE)
4/30/2022

OBJECTIVES
At the end of the session, participants are expected to:
• Identify and define the population(s) to be studied.
• Identify and describe common methods of sampling.
• Discuss problems of bias that should be avoided when selecting a sample.
• Compute sample size for different study designs.
• Decide on the sampling method(s) and sample size(s) most appropriate for the research design you are developing.

Outline
• Population
• Sampling techniques
• Sample size determination

Brainstorming
Define the following terms:
• Target population
• Source population
• Study population
• Participant population
• Sampling frame
• Sampling unit
• Study unit

Definition of terms
• Population in scientific research refers to the material of the study, whether it is human subjects, animals or inanimate objects.
• Target/Reference population
– The population about which the researcher wants to draw conclusions
– The population of interest for implementing public health actions
– Example: all children in a town
• Source population
– The whole target population, or the accessible subset of it, from which the sample is drawn for a particular study
– Example: all children attending schools, as a proxy for all children living in the town

Definition of terms…
• Study population
– The part of the population from whom you would collect the data
– Sampled individuals who fulfill the inclusion and exclusion criteria
– Example: sampled school children who fulfill the eligibility criteria
• Participant population
– Eligible members of the sample who are actually investigated
– Non-respondents in the sample population are not participants
– Example: sampled students who have been studied

Definition of terms…
• Sampling frame
– List of all the sampling units from which the sample is drawn
• Sampling unit
– Smallest unit from which a sample can be selected, or
– The unit of selection in the sampling process
• Sampling fraction
– The ratio of the number of units in the sample to the number of units in the reference population (n/N)

Definition of terms…
• Study unit or observation unit: the unit from which data are actually collected. Example: a student
• The way we define our study population and our study unit depends on the problem we want to investigate and on the objectives of the study

Problem | Source population | Study population | Study unit
Malnutrition related to weaning in district X | All children 6-24 months of age in district X | Children 6-24 months of age in district X who fulfill the eligibility criteria | One child between 6 and 24 months in district X
High drop-out rates in primary schools in district Y | All primary schools in district Y | Selected primary schools in district Y | One primary school in district Y
Inappropriate record keeping of hypertensive patients registered in hospital Z | All records of hypertensive patients in hospital Z | Records of hypertensive patients in hospital Z | One record of a hypertensive patient registered in hospital Z

What is sampling?
• Sampling is the process of selecting a number of study units from the entire population of interest.
• When we draw a sample from a population we are confronted with the following questions:
– What is the group of people (study population) from which we want to draw a sample?
– How many people do we need in our sample?
– How will these people be selected?

Why sampling?
• Cost in terms of money, time and manpower
• Accessibility
• Utility – e.g. to do a diagnostic laboratory test you don't draw all of the patient's blood.

Advantages of sampling
• Feasibility: Sampling may be the only feasible method of collecting information
• Reduced cost: Sampling reduces demands on resources such as finance, personnel, and material.
• Greater accuracy: Sampling may lead to more accurate data collection
• Greater speed: Data can be collected and summarized more quickly

Disadvantages of sampling
• There is always a sampling error
• Sampling may create a feeling of discrimination within the population
• Not advisable where every unit in the population is legally required to have a record
• Small numbers in minority sub-groups often make the study's findings for those groups suspect
• Sampling bias

Error in sampling
• No sample is an exact mirror image of the population
1. Sampling error (chance/random error)
• Error introduced by the chance selection of one particular sample rather than another
• Cannot be avoided or totally eliminated

1. Sampling error…
• The chance (random) variation in variables that occurs when any sample is selected from the population
• Sampling error is to be expected
• To avoid sampling error altogether, a census of the entire population must be taken
• To control/minimize sampling error, increase the sample size and use an appropriate sampling method.

2. Non-sampling error (systematic error)
• A flaw in the design or conduct of the sampling procedure that distorts the sample, so that it is no longer representative of the reference population. Sources include:
– Observational error
– Respondent error
– Lack of precise definitions
– Errors in editing and tabulation of the data
• It can be eliminated or reduced by careful design and conduct of the study, not by increasing the sample size

Sampling techniques/methods
• Refers to how the sample will be selected from the study population
• Clearly define the study population and the study unit
– Study population – individuals, households, institutions, records
– Study units – one individual, single household, institution or record
• Types: probability and non-probability
– Probability – quantitative studies
– Non-probability – qualitative studies

Probability sampling methods
• Any method of sampling that utilizes some form of random selection
• Every sampling unit has a known and non-zero probability of selection into the sample
• The sample is selected from the population based on chance

Probability sampling…
• Probability sampling is more complex, more time-consuming and usually more costly than non-probability sampling
• However, because study samples are randomly selected and their probability of inclusion can be calculated, reliable estimates can be produced and inferences can be made about the population.

Probability sampling…
• There are several different ways in which a probability sample can be selected
• The method chosen depends on a number of factors, such as:
– the available sampling frame
– how spread out the population is
– how costly it is to survey members of the population
– homogeneity of the target population

Types of probability sampling methods
1. Simple random sampling
2. Systematic random sampling
3. Stratified random sampling
4. Cluster sampling
5. Multi-stage sampling

Simple random sampling (SRS)
• The required number of individuals are selected at random from the sampling frame, a list or a database of all individuals in the population
• Each member of the population has an equal chance of being included in the sample.
• To use the SRS method:
– Make a numbered list of all the units in the population
– Number each unit from 1 to N (where N is the size of the population)
– Select the required number of units at random.

Simple random sampling…
• The randomness of the sample is ensured by:
– Use of "lottery" methods
– Table of random numbers
– Computer programs

Simple random sampling…
• Limitations of SRS
– Requires a sampling frame.
– Difficult if the reference population is dispersed.
– Minority subgroups of interest may not be selected

Systematic random sampling
• Sometimes called interval sampling
• Selection of individuals from the sampling frame systematically rather than randomly
• Individuals are taken at regular intervals down the list
• The starting point is chosen at random

Systematic random sampling…
• Useful when the reference population is arranged in some order:
– Order of registration of patients
– Numerical order of house numbers
– Students' registration books
• Individuals are taken at fixed intervals (every kth) based on the sampling fraction

Steps in systematic random sampling
• Number the units in the population from 1 to N
• Decide on the n (sample size) that you want
• k = N/n = the interval size
• Randomly select an integer between 1 and k
• Then, take every kth unit

Systematic random sampling…
• E.g. to select 100 students from 1200, first calculate the sampling interval: 1200 divided by 100 = 12. Then randomly select the first student from among the first 12, and finally pick every 12th student until 100 students are selected.
• Advantage
– Easier and less time-consuming
• Limitations
– Risk of bias
– Difficult to use when a cyclic repetition is inherent in the sampling frame.

Stratified random sampling
• It is done when the population is known to be heterogeneous with regard to some factors, and those factors are used for stratification
• A method of probability sampling in which the population is divided into different subgroups and samples are selected from each subgroup
• These subgroups are homogeneous and mutually exclusive groups called strata
• A population can be stratified by any variable that is available for all units prior to sampling (e.g., age, sex, province of residence, income, profession, etc.).

Stratified random sampling…
• Divide the population into non-overlapping groups (i.e., strata) of sizes N1, N2, N3, ... Ni, such that N1 + N2 + N3 + ... + Ni = N.
• A separate sample is taken independently from each stratum, depending on the type of allocation
• Elements within each stratum are homogeneous, but heterogeneous across strata.
• A simple random or a systematic sample is taken from each stratum

Why stratification?
• It can make the sampling strategy more efficient
• A larger sample is required to get a more accurate estimate if a characteristic varies greatly from one unit to the other
• For example, if every person in a population had the same salary, then a sample of one individual would be enough to get a precise estimate of the average salary.

Why stratification?
• If you use an SRS approach in the whole population without stratification, the sample would need to be larger than the total of all stratum samples to get an estimate of total income with the same level of precision.
• Stratified sampling ensures an adequate sample size for subgroups in the population of interest
• When a population is stratified, each stratum becomes an independent population and you will need to decide the sample size for each stratum

Stratified random sampling…
• There are different sample allocation methods for selecting the sample from each stratum:
1. Proportional allocation: allocating the sample in proportion to the total population of each stratum using the formula $n_i = n \times \dfrac{N_i}{N}$, where
– n = total sample size to be selected
– N = total population
– Ni = total population of each stratum
– ni = sample size from each stratum
2. Equal allocation: allocating an equal sample to each stratum

Cluster sampling
• Usually, it is too expensive to carry out SRS
– The population may be large and scattered
– A complete list of the study population may be unavailable
– Travel costs can become expensive if interviewers have to survey people from one end of the country to the other (cluster sampling is most widely used to reduce this cost)
• The clusters should be similar to one another, unlike stratified sampling, where the strata differ from one another

Cluster sampling…
• A cluster sample is a simple random sample of groups or clusters of elements
• Useful method when it is difficult or costly to develop a complete list of the population members or when the population elements are widely dispersed geographically
• Cluster sampling may increase sampling error due to similarities among cluster members

Stratification vs Clustering
Stratification
• Divide the population into groups that differ from each other: sex, age, race, residence
• Sample randomly from each group (stratum)
• Less error compared to simple random sampling
• More expensive, because stratification information must be obtained before sampling
Clustering
• Divide the population into comparable groups: schools, cities
• Sample randomly some of the groups (clusters)
• More error compared to simple random sampling
• Reduces costs, because only some areas or organizations are sampled
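The selection steps for simple random sampling, systematic sampling (k = N/n, random start between 1 and k, then every kth unit) and proportional allocation (ni = n × Ni/N) described in the slides above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function names, the `seed` parameter and the example strata are assumptions made for the sketch, and systematic sampling here assumes N is a multiple of n.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame, each with equal chance (SRS)."""
    rng = random.Random(seed)          # seed is only for reproducibility
    return rng.sample(frame, n)        # sampling without replacement


def systematic_sample(frame, n, seed=None):
    """Take every kth unit after a random start between 1 and k."""
    k = len(frame) // n                # sampling interval k = N / n
    rng = random.Random(seed)
    start = rng.randint(1, k)          # random start, 1-based position
    return [frame[start - 1 + i * k] for i in range(n)]


def proportional_allocation(strata_sizes, n):
    """Allocate n across strata as n_i = n * N_i / N (rounded)."""
    total = sum(strata_sizes.values())
    # Rounding can make the allocations sum to slightly more or less than n.
    return {name: round(n * size / total) for name, size in strata_sizes.items()}


# Example from the slides: select 100 students from a list of 1200.
students = list(range(1, 1201))
sample = systematic_sample(students, 100, seed=1)
print(sample[:3])                      # first three selected IDs, 12 apart

# Hypothetical strata, used only to illustrate the allocation formula.
print(proportional_allocation({"urban": 600, "rural": 400}, 100))
```

In practice the frame would be the actual registration list rather than a range of numbers, and a stratified design would apply one of the two selection functions separately within each stratum.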
Multi-stage sampling
• It is the combination of different sampling methods
• Carried out in stages
• Used in very large and diverse populations
• The method used in most community-based studies
• This type of sampling requires at least two stages
• The primary sampling unit (PSU) is the sampling unit in the first sampling stage.
• The secondary sampling unit (SSU) is the sampling unit in the second sampling stage, etc.

Multi-stage sampling…
Woreda (PSU) → Kebele (SSU) → Sub-kebele (TSU) → Household (HH)

Non-probability sampling
• Non-probability sampling does not involve random selection
• Independent of the rationale of probability theory
• In non-probability sampling, every item has an unknown chance of being selected
• In non-probability sampling, there is an assumption that there is an even distribution of a characteristic of interest within the population
• This is what makes the researcher believe that any sample would be representative and, because of that, results will be accurate.

Non-probability sampling…
• Reliability cannot be measured in non-probability sampling; the only way to address data quality is to compare some of the survey results with available information about the population
• They are quick, inexpensive and convenient
• Used when it is not feasible to conduct probability sampling

Types of non-probability sampling
1. Convenience or haphazard sampling
2. Volunteer sampling
3. Judgment sampling
4. Quota sampling
5. Snowball sampling

Convenience sampling
• Convenience sampling is sometimes referred to as haphazard or accidental sampling.
• It is not normally representative of the target population because sample units are only selected if they can be accessed easily and conveniently
• The method is easy to use, but that advantage is greatly offset by the presence of bias
• It can deliver accurate results when the population is homogeneous
• E.g. including all patients visiting the OPD in one day to study their attitude towards family planning
• Drawback: unrepresentative samples

Volunteer sampling
• As the term implies, this type of sampling occurs when people volunteer to be involved in the study.
• In pharmaceutical trials (drug testing), for example, it would be difficult and unethical to enlist random participants from the general public.
• In these instances, the sample is taken from a group of volunteers
• Sampling voluntary participants as opposed to the general population may introduce strong biases.
• Often in opinion polling, only the people who care strongly enough about the subject tend to respond
• The silent majority does not typically respond, resulting in large selection bias

Judgment sampling
• It is used when a sample is taken based on certain judgments about the overall population
• The underlying assumption is that the investigator will select units that are characteristic of the population
• The critical issue here is objectivity: how much can judgment be relied upon to arrive at a typical sample?
• Judgment sampling is subject to the researcher's biases and is perhaps even more biased than haphazard sampling.

Judgment sampling…
• Researchers often use this method in exploratory studies like pre-testing of questionnaires and focus groups.
• They also prefer to use this method in laboratory settings where the choice of experimental subjects (i.e., animal, human) reflects the investigator's pre-existing beliefs about the population.
• One advantage of judgment sampling is the reduced cost and time involved in acquiring the sample

Quota sampling
• This is one of the most common forms of non-probability sampling
• Sampling is done until a specific number of units (quotas) for various sub-populations have been selected
• In many cases where the population has no suitable frame, quota sampling may be the only appropriate sampling method
• E.g. a certain number of patients from each religion to assess their attitude towards family planning

Snowball sampling
• Used in studies involving respondents who are rare or hard to find.
• To start with, the researcher compiles a short list of sample units from various sources
• Each of these respondents is then contacted and asked to provide names of other probable respondents.

Sample size determination
• Sample size is the number of study subjects selected to represent a given study population
• Should be sufficient to represent the characteristics of interest of the study population
• In estimating a certain characteristic of a population, sample size calculations are important to ensure that estimates are obtained with the required precision or confidence

Common questions
• "How many subjects should I include in my study?"
• Which variables should be included in the sample size calculation?
– Should be related to the study's primary outcome variable
– If the study has secondary outcome variables which are considered important, the sample size should also be sufficient for the analysis of these variables

Sample size determination
Depends on:
• Objective of the study
• Design of the study
• Plan for statistical analysis
• Accuracy of the measurement to be made
• Degree of precision required for generalization
• Degree of confidence

Sample size determination
• Sample size determination techniques
– Compute manually using formulae
– Use computer software such as StatCalc of Epi Info, OpenEpi and Stata
– Formulae vary depending on the type of design

Sample size – prevalence studies
• For descriptive cross-sectional designs
– Single population proportion estimation formula
– Decide and enter the value of required parameters
• $n = \dfrac{(Z_{\alpha/2})^2 \, p(1-p)}{d^2}$

Sample size – prevalence studies
• n is the minimum sample size
• p is an estimate of the prevalence rate for the population
– From previous studies
– From a pilot or preliminary sample
– If no estimate is available, set p = 0.5, which gives the largest sample size
• d is the margin of sampling error tolerated; commonly taken to be 5% but should be decreased for rare conditions
• $Z_{\alpha/2}$ is the standard normal value at the (1-α)×100% confidence level; α is mostly 5%, i.e. a 95% confidence level is used, for which $Z_{\alpha/2}$ = 1.96
• N is the population size

Exercise – Sample size in prevalence studies
• What sample size do we need to estimate the prevalence of HIV among residents of Addis Ababa city with 95% confidence so that the error of estimation is within 5% of its actual value?
– Use OpenEpi software to compute the sample size.
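As a cross-check on a software result, the single population proportion formula can also be evaluated directly. The sketch below assumes p = 0.5 because the exercise gives no prior prevalence estimate (the slides suggest this default); the function name and its parameters are illustrative only, and the result is rounded up to the next whole subject.

```python
import math
from statistics import NormalDist


def sample_size_single_proportion(p, d, confidence=0.95):
    """n = (Z_{alpha/2})^2 * p * (1 - p) / d^2, rounded up."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95% confidence
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


# Assumed inputs: p = 0.5 (no prior estimate), d = 0.05, 95% confidence.
print(sample_size_single_proportion(p=0.5, d=0.05))   # 385
```

A smaller assumed prevalence gives a smaller n for the same d, which is why p = 0.5 is the conservative choice when nothing is known.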
Sample size – comparing two proportions
• For comparative cross-sectional and cohort designs (IP)
– Two population proportion (risk ratio or prevalence ratio) estimation formula
– Decide and enter the value of required parameters
• Confidence level – usually taken to be 95%
• Power – usually a power of 80% is used
• Ratio of non-exposed to exposed in sample
– 1:1 is statistically efficient
– If exposure is rare, increase the ratio
• Percent of unexposed with outcome
• Percent of exposed with outcome, or
• Risk or prevalence ratio

Exercise – Sample size for comparing two proportions
• A study is designed to compare the proportion of nurses leaving health services in urban and rural areas. From available literature, 30% and 15% of nurses are estimated to leave services in rural and urban areas, respectively, within three years of graduation. What sample size is required for the study?

Sample size – case-control studies
• For case-control studies
– Formula for case-control (unmatched)
– Decide and enter the value of required parameters
• Confidence level – usually taken to be 95%
• Power – usually a power of 80% is used
• Ratio of controls to cases in sample
– 1:1 is statistically efficient
– If disease is rare, increase the ratio
• Percent of controls exposed
• Percent of cases exposed, or
• Odds ratio (OR)

Exercise
• Suppose you want to compare exposure status between cases and controls at 95% confidence level and with power of 80%, using a 1:1 ratio of cases to controls, while looking for an odds ratio of 2. You assume the prevalence of exposure in controls to be 25%. What sample size do you need?
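The two exercises above can be approximated by hand. The slides do not state which formula to use, so the sketch below assumes one common textbook formula for comparing two proportions with equal group sizes (1:1 ratio) and no continuity correction; other formulae and software options can give somewhat different numbers. For the case-control exercise, the exposure prevalence among cases is derived from the odds ratio and the exposure prevalence among controls. All function names are illustrative.

```python
import math
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, confidence=0.95, power=0.80):
    """Per-group n for comparing two proportions, 1:1 ratio, no continuity correction."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2                      # pooled proportion for equal groups
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


def exposure_in_cases(p_controls, odds_ratio):
    """Exposure prevalence among cases implied by the OR and control exposure."""
    return odds_ratio * p_controls / (1 + p_controls * (odds_ratio - 1))


# Nurses exercise: 30% (rural) vs 15% (urban) leaving within three years.
print(n_per_group_two_proportions(0.30, 0.15))        # 121 per group

# Case-control exercise: OR = 2, 25% of controls exposed, 1:1 ratio.
p_cases = exposure_in_cases(0.25, 2)                  # 0.40
print(n_per_group_two_proportions(p_cases, 0.25))     # 152 cases and 152 controls
```

These figures are before any adjustment for non-response or design effect.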
Other considerations in sample size determination
• Sampling technique
– In complex samples (cluster, multistage), increase the sample size to account for the design effect
– Design effect: the ratio of the variance of an estimate derived from a complex sampling design to the variance of the estimate from a simple random sample
– Usually the sample size is multiplied by 2 (or 1.5) in cluster sampling
• Increase it further for large PSUs, many stages, or a clustered variable

Other considerations in sample size determination
• Non-response
– Add a contingency, usually 10%
• More (up to 30%) for a sensitive topic or a self-administered questionnaire
– Response rate for a cross-sectional survey should be >85%
• More than one item to be measured
– Use the most important one or the one which gives the larger sample

Other considerations in sample size determination
• The finite population correction formula can be used as needed
• If N (the entire population) is less than 10,000, the required sample size will be smaller
• In such cases, calculate the final sample estimate nfinal by using the following formula:
$n_{final} = \dfrac{n}{1 + \dfrac{n}{N}}$
where nfinal = the final sample size, n = initial sample size and N = total population size

Sample size for other designs
• Qualitative methods – estimated, not determined
• Reading
– Matched case-control study
– Survival analysis
– Repeated measurement cohort studies
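The adjustments discussed under "Other considerations" can be chained after the initial calculation. The sketch below is illustrative only: it assumes the commonly used finite population correction nfinal = n / (1 + n/N), a design effect of 2 and a 10% non-response contingency as mentioned in the slides, and an order of application (design effect first, then non-response) that is one convention among several. The function names and the population of 5,000 are hypothetical.

```python
import math


def finite_population_correction(n, population_size):
    """n_final = n / (1 + n / N), rounded up; used when N is small (e.g. < 10,000)."""
    return math.ceil(n / (1 + n / population_size))


def apply_design_effect(n, design_effect=2.0):
    """Multiply by the design effect for cluster or multistage samples."""
    return math.ceil(n * design_effect)


def add_non_response(n, non_response_pct=10):
    """Inflate n by a whole-number percentage contingency for non-response."""
    return math.ceil(n * (100 + non_response_pct) / 100)


n0 = 385                                         # e.g. from the single proportion formula
print(finite_population_correction(n0, 5000))    # 358 for a hypothetical N of 5,000
print(add_non_response(apply_design_effect(n0))) # 385 -> 770 -> 847
```

The non-response percentage is passed as a whole number so the arithmetic stays exact before rounding up.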