CHAPTER THREE
Sampling Design

CENSUS
• All items in any field of inquiry constitute a ‘Universe’ or ‘Population.’
• A complete enumeration of all items in the ‘population’ is known as a census inquiry.
• In principle, such an inquiry leaves no element of chance and yields the highest accuracy.
• In practice this may not hold: even the slightest element of bias in such an inquiry will grow larger and larger as the number of observations increases.
• Moreover, this type of inquiry involves a great deal of time, money and energy.
• Therefore, when the field of inquiry is large, this method becomes difficult to adopt because of the resources involved.
• Further, it is often not possible to examine every item in the population.
• It is better to obtain sufficiently accurate results by studying only a part of the total population.
• However, it needs to be emphasized that when the universe is small, there is no point in resorting to a sample survey.
• When field studies are undertaken in practical life, considerations of time and cost almost invariably lead to a selection of only a few items.
• The respondents selected should be as representative of the total population as possible.
• The selected respondents constitute what is technically called a ‘sample’, and the selection process is called the ‘sampling technique.’
• The survey so conducted is known as a ‘sample survey’.
• Algebraically, let the population size be ‘N’; a group of ‘n’ units selected from it is known as a ‘sample’.
• Thus, the researcher must prepare a sample design for his/her study, i.e., he/she must plan how a sample should be selected and what size it should be.

SAMPLE DESIGN
• A sample design is a definite plan for obtaining a sample from a given population.
• It refers to the technique or the procedure the researcher would adopt in selecting items for the sample.
• The sample design may also lay down the number of items to be included in the sample.
• The sample design is determined before data are collected.
• There are many sample designs from which a researcher can choose.
• The researcher must select/prepare a sample design that is reliable and appropriate for his/her research study.

STEPS IN SAMPLE DESIGN
• While developing a sampling design, the researcher must pay attention to the following points:
• Type of universe: The first step in developing any sample design is to clearly define the set of objects, technically called the Universe, to be studied.
  • The universe can be finite or infinite.
• Sampling unit: A decision has to be taken concerning the sampling unit before selecting the sample.
  • The sampling unit may be a geographical one such as a state, district or village; a construction unit such as a house; a social unit such as a family, club or school; or an individual.
  • The researcher will have to decide on one or more such units to select for his/her study.
• Source list: Also known as the ‘sampling frame’, this is the list from which the sample is to be drawn.
  • It contains the names of all items of the universe (in the case of a finite universe only).
• Size of sample: This refers to the number of items to be selected from the universe to constitute a sample.
  • The size of the sample should be neither excessively large nor too small.
  • An optimum sample is one which fulfills the requirements of efficiency, representativeness, reliability and flexibility.
  • While deciding the size of the sample, the researcher must consider the desired precision, the acceptable confidence level, the population variance, the population size, and costs.
• Parameters of interest: In determining the sample design, one must consider the specific population parameters which are of interest.
• Budgetary constraint: From a practical point of view, cost considerations have a major impact on decisions about not only the size of the sample but also the type of sample.
• Sampling procedure: Finally, the researcher must decide the type of sample he/she will use.
  • He/she must decide on the technique to be used in selecting the items for the sample.

CRITERIA OF SELECTING A SAMPLING PROCEDURE
• In a sample survey, one must remember that two costs are involved in a sampling analysis, viz., the cost of collecting the data and the cost of an incorrect inference resulting from the data.
• The researcher must keep in view the two causes of incorrect inferences, viz., systematic bias and sampling error.
(i) Systematic bias results from errors in the sampling procedures, and it cannot be reduced or eliminated by increasing the sample size.
• Usually a systematic bias is the result of one or more of the following factors:
  • Inappropriate sampling frame: If the sampling frame is inappropriate, i.e., a biased representation of the universe, it will result in a systematic bias.
  • Defective measuring device: If the measuring device is constantly in error, it will result in a systematic bias.
  • Non-respondents: If we are unable to sample all the individuals initially included in the sample, a systematic bias may arise.
  • Indeterminacy principle: Sometimes individuals act differently when kept under observation than they do otherwise.
  • Natural bias in the reporting of data: The natural bias of respondents in reporting data is often a cause of systematic bias.
(ii) Sampling errors are the random variations in the sample estimates around the true population parameters.
• Sampling error decreases as the size of the sample increases.
• It is of smaller magnitude in the case of a homogeneous population.
• Sampling error can be measured for a given sample design and size, and the measurement of sampling error is usually called the ‘precision of the sampling plan’.
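The two claims above, that sampling error shrinks as the sample grows and is smaller for a homogeneous population, can be checked with a small simulation. This is a minimal sketch under stated assumptions: the two normally distributed populations, their size of 10,000, their spreads, and the names `heterogeneous`, `homogeneous` and `standard_error_of_mean` are all illustrative choices, not taken from the text. Sampling error is measured here as the standard deviation of the sample mean over repeated simple random samples.

```python
import random
import statistics

def standard_error_of_mean(population, n, draws=500, seed=0):
    """Empirical sampling error: spread of the sample mean across
    repeated simple random samples of size n (without replacement)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(draws)]
    return statistics.stdev(means)

# Hypothetical populations (assumption): same mean, different variability.
rng = random.Random(42)
heterogeneous = [rng.gauss(50, 20) for _ in range(10_000)]
homogeneous = [rng.gauss(50, 5) for _ in range(10_000)]

for n in (25, 100, 400):
    se_het = standard_error_of_mean(heterogeneous, n)
    se_hom = standard_error_of_mean(homogeneous, n)
    print(f"n={n:4d}  heterogeneous: {se_het:.2f}  homogeneous: {se_hom:.2f}")
```

In a run of this sketch, the printed error should fall as n rises and should be lower in the homogeneous column at every n. Systematic bias, by contrast, would not shrink in this way, because it does not come from random variation.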
• But increasing the size of the sample has its own limitations, viz., a large sample increases the cost of collecting data and also enhances the systematic bias.
• Thus the effective way to increase precision is usually to select a better sampling design which has a smaller sampling error for a given sample size at a given cost.
• Generally, while selecting a sampling procedure, the researcher must ensure that the procedure causes a relatively small sampling error and helps to control the systematic bias in a better way.

TYPES OF SAMPLE DESIGNS
1. On element selection basis
• Unrestricted: When each sample element is drawn individually from the population at large, the sample so drawn is known as an ‘unrestricted sample’.
• Restricted: All other forms of sampling are covered under the term ‘restricted sampling’.
2. On the representation basis
• Probability sampling: Probability sampling is based on the concept of random selection.
• Non-probability sampling: Non-probability sampling is ‘non-random’ sampling.

Non-probability sampling:
• Non-probability sampling is that sampling procedure which does not afford any basis for estimating the probability that each item in the population has of being included in the sample.
• Non-probability sampling is also known as deliberate sampling, purposive sampling and judgement sampling.
• In this type of sampling, items for the sample are selected deliberately by the researcher.
• The investigator may select a sample which yields results favorable to his/her point of view, and if that happens, the entire inquiry may get vitiated.
• Thus, there is always a danger of bias entering into this type of sampling technique.
• But if the investigators are impartial, work without bias and have the necessary experience to exercise sound judgement, the results obtained from an analysis of a deliberately selected sample may be tolerably reliable.
• Sampling error in this type of sampling cannot be estimated, and the element of bias, great or small, is always there.

Probability sampling:
• Probability sampling is also known as ‘random sampling’ or ‘chance sampling’.
• Under this sampling design, every item of the universe has a known chance of inclusion in the sample (an equal chance, in the case of simple random sampling).
• Here it is blind chance alone that determines whether one item or the other is selected.
• The results obtained from probability or random sampling can be assured in terms of probability.
• Random sampling ensures the law of Statistical Regularity, which states that if on average the sample chosen is a random one, the sample will have the same composition and characteristics as the universe.
• This is why random sampling is considered the best technique for selecting a representative sample.

Probability Sampling
• There are four main types of probability sample:
1. Simple random sampling
2. Systematic sampling
3. Stratified sampling
4. Cluster sampling

1. Simple Random Sampling
• In a simple random sample, every member of the population has an equal chance of being selected.
• Your sampling frame should include the whole population.
• To conduct this type of sampling, you can use tools like random number generators or other techniques that are based entirely on chance.

HOW TO SELECT A RANDOM SAMPLE?
Lottery
• Each item of the population is written on a slip of paper and the required number of slips is drawn at random.
• Such a procedure is obviously impractical, if not altogether impossible, in complex problems of sampling.
• In fact, the practical utility of such a method is very much limited.
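As an alternative to drawing lots, the random number generator mentioned above can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the frame of units numbered 1 to 1,000, the sample size of 100, the seed, and the function name `simple_random_sample` are assumptions made for the example. It relies on Python's standard `random.sample`, which draws without replacement so that no unit is selected twice.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame; every unit
    has the same chance of being selected."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(frame, n)

# Hypothetical sampling frame (assumption): units numbered 1..1000.
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 100, seed=7)

print(len(sample))         # number of units drawn
print(sorted(sample)[:5])  # a few of the selected unit numbers
```

Recording the seed alongside the sample is a design choice that lets the selection be audited or repeated later.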
Random number table
2952 6641 3992 9792 7979 5911 3170 5624 4167 9525
1545 1396 7203 5356 1300 2693 2370 7483 3408 2769
3563 6107 6913 7691 0560 5246 1112 9025 6008 8126

Example
You want to select a simple random sample of 100 employees of Company X. You assign a number from 1 to 1000 to every employee in the company database, and use a random number generator to select 100 numbers.

COMPLEX RANDOM SAMPLING DESIGNS
• Probability sampling under restricted sampling techniques may result in complex random sampling designs.
• Such designs may also be called ‘mixed sampling designs’.
• Such designs may represent a combination of probability and non-probability sampling procedures in selecting a sample.
• Some of the popular complex random sampling designs are as follows:

2. Systematic Sampling
• Systematic sampling is similar to simple random sampling, but it is usually slightly easier to conduct.
• Every member of the population is listed with a number, but instead of randomly generating numbers, individuals are chosen at regular intervals.
• In some instances, the most practical way of sampling is to select every i-th item on a list.

Example
All employees of the company are listed in alphabetical order. From the first 10 numbers, you randomly select a starting point: number 6. From number 6 onwards, every 10th person on the list is selected (6, 16, 26, 36, and so on), and you end up with a sample of 100 people.

• If you use this technique, it is important to make sure that there is no hidden pattern in the list that might skew the sample.
• For example, if the HR database groups employees by team, and team members are listed in order of seniority, there is a risk that your interval might skip over people in junior roles, resulting in a sample that is skewed towards senior employees.

3.
Stratified Sampling
• This sampling method is appropriate when the population has mixed characteristics, and you want to ensure that every characteristic is proportionally represented in the sample.
• That is, if the population from which a sample is to be drawn does not constitute a homogeneous group, the stratified sampling technique is generally applied.
• Under stratified sampling the population is divided into several subpopulations that are individually more homogeneous than the total population.
• You divide the population into subgroups (called strata) based on the relevant characteristic (e.g. gender, age range, income bracket, job role).
• From the overall proportions of the population, you calculate how many people should be sampled from each subgroup.
• Then you use random or systematic sampling to select a sample from each subgroup.
• The following three questions are highly relevant in the context of stratified sampling:
  (a) How to form strata?
  (b) How should items be selected from each stratum?
  (c) How many items should be selected from each stratum, i.e., how should the sample size be allocated among the strata?

Example
The company has 800 female employees and 200 male employees. You want to ensure that the sample reflects the gender balance of the company, so you sort the population into two strata based on gender. Then you use random sampling on each group, selecting 80 women and 20 men, which gives you a representative sample of 100 people.

Illustration #1:
• Let us suppose that we want a sample of size n = 30 to be drawn from a population of size N = 8000 which is divided into three strata of size N1 = 4000, N2 = 2400 and N3 = 1600.
Adopting proportional allocation, calculate the sample sizes for the different strata:
  Under proportional allocation each stratum receives $n_i = n \cdot N_i / N$, so
  $n_1 = 30 \times 4000/8000 = 15$, $n_2 = 30 \times 2400/8000 = 9$, $n_3 = 30 \times 1600/8000 = 6$.

• Proportional allocation is considered most efficient and an optimal design when:
  • the cost of selecting an item is equal for each stratum,
  • there is no difference in within-stratum variances, and
  • the purpose of sampling is to estimate the population value of some characteristic.
• But if the purpose is to compare the differences among the strata, then equal sample selection from each stratum would be more efficient even if the strata differ in size.
• Where strata differ not only in size but also in variability, it is considered reasonable to take larger samples from the more variable strata and smaller samples from the less variable strata.
• We can then account for both (differences in stratum size and differences in stratum variability) by using a disproportionate sampling design, requiring:
  $\frac{n_1}{N_1 \sigma_1} = \frac{n_2}{N_2 \sigma_2} = \dots = \frac{n_k}{N_k \sigma_k}$
• where σ1, σ2, ..., σk denote the standard deviations of the k strata, N1, N2, ..., Nk denote the sizes of the k strata and n1, n2, ..., nk denote the sample sizes of the k strata.
• This is called ‘optimum allocation’ in the context of disproportionate sampling.
• The allocation in such a situation results in the following formula for determining the sample sizes of different strata:
  $n_i = \frac{n \, N_i \sigma_i}{N_1 \sigma_1 + N_2 \sigma_2 + \dots + N_k \sigma_k}$ for i = 1, 2, ..., k.

Illustration #2: A population is divided into three strata:
• N1 = 5000,
• N2 = 2000 and
• N3 = 3000
Standard deviations are: [values not shown]
• How should a sample of size n = 84 be allocated to the three strata, if we want optimum allocation using a disproportionate sampling design?

• If, in addition to differences in stratum size and stratum variability, there are differences in stratum sampling cost, we can have a cost-optimal disproportionate sampling design, using the following formula:
  $n_i = \frac{n \, N_i \sigma_i / \sqrt{C_i}}{N_1 \sigma_1 / \sqrt{C_1} + N_2 \sigma_2 / \sqrt{C_2} + \dots + N_k \sigma_k / \sqrt{C_k}}$
  where C_i denotes the cost of sampling one item in stratum i.

4.
Cluster Sampling
• Cluster sampling also involves dividing the population into subgroups, but each subgroup should have similar characteristics to the whole sample.
• Instead of sampling individuals from each subgroup, you randomly select entire subgroups.
• If it is practically possible, you might include every individual from each sampled cluster.
• If the clusters themselves are large, you can also sample individuals from within each cluster using one of the techniques above.
• This method is good for dealing with large and dispersed populations, but there is more risk of error in the sample, as there could be substantial differences between clusters.
• It is difficult to guarantee that the sampled clusters are really representative of the whole population.

Example
The company has offices in 10 cities across the country (all with roughly the same number of employees in similar roles). You don’t have the capacity to travel to every office to collect your data, so you use random sampling to select 3 offices; these are your clusters.

(iv) Area sampling:
• If the clusters happen to be geographic subdivisions, cluster sampling is better known as area sampling.
(v) Multi-stage sampling:
• Multi-stage sampling is a further development of the principle of cluster sampling.

Non-probability Sampling methods
• There are four main types of non-probability sample:
1. Convenience sampling
2. Voluntary response sampling
3. Purposive sampling
4. Snowball sampling

1. Convenience Sampling
• A convenience sample simply includes the individuals who happen to be most accessible to the researcher.
• This is an easy and inexpensive way to gather initial data, but there is no way to tell if the sample is representative of the population, so it can’t produce generalizable results.
Example
You are researching opinions about student support services in your university, so after each of your classes, you ask your fellow students to complete a survey on the topic. This is a convenient way to gather data, but as you only surveyed students taking the same classes as you at the same level, the sample is not representative of all the students at your university.

2. Voluntary Response Sampling
• Similar to a convenience sample, a voluntary response sample is mainly based on ease of access.
• Instead of the researcher choosing participants and directly contacting them, people volunteer themselves (e.g. by responding to a public online survey).
• Voluntary response samples are always at least somewhat biased, as some people will inherently be more likely to volunteer than others.

Example
You send out the survey to all students at your university and a lot of students decide to complete it. This can certainly give you some insight into the topic, but the people who responded are more likely to be those who have strong opinions about the student support services, so you can’t be sure that their opinions are representative of all students.

3. Purposive Sampling
• This type of sampling involves the researcher using their judgment to select a sample that is most useful to the purposes of the research.
• It is often used in qualitative research, where the researcher wants to gain detailed knowledge about a specific phenomenon rather than make statistical inferences.
• An effective purposive sample must have clear criteria and rationale for inclusion.

Example
You want to know more about the opinions and experiences of disabled students at your university, so you purposefully select a number of students with different support needs in order to gather a varied range of data on their experiences with student services.

4.
Snowball Sampling
• If the population is hard to access, snowball sampling can be used to recruit participants via other participants.
• The number of people you have access to “snowballs” as you get in contact with more people.

Example
You are researching experiences of homelessness in your city. Since there is no list of all homeless people in the city, probability sampling isn’t possible. You meet one person who agrees to participate in the research, and she puts you in contact with other homeless people that she knows in the area.

Determining Sample Size
• Perhaps the most frequently asked question concerning sampling is, ‘What sample size do I need?’
• The answer is influenced by a number of factors, such as:
  • the purpose of the study,
  • the population size,
  • the risk of selecting a bad sample, and
  • the allowable sampling error.

Sample Size Criteria
• In addition to the purpose of the study and the size of the population, three criteria usually need to be specified to determine the appropriate sample size:
  • the level of precision,
  • the level of confidence or risk, and
  • the degree of variability in the attributes being measured.

Level of precision
• Sometimes called sampling error.
• It is the range in which the true value of the population is estimated to be.
• This range is often expressed in percentage points (e.g.
±5 percent).

Level of confidence or risk
• The confidence level is based on ideas encompassed under the Central Limit Theorem.
• It means that, if a 95% confidence level is selected, 95 out of 100 samples will have the true population value within the range of precision specified.
• There is always a chance that the sample you obtain does not represent the true population value.

Degree Of Variability
• The third criterion, the degree of variability in the attributes being measured, refers to the distribution of attributes in the population.
• The more heterogeneous a population, the larger the sample size required to obtain a given level of precision, and vice versa.

Strategies For Determining Sample Size
• There are several approaches to determining the sample size. These include:
  • using a census for small populations,
  • imitating a sample size of similar studies,
  • using published tables, and
  • applying formulas to calculate a sample size.

Using published tables
• A third way to determine sample size is to rely on published tables which provide the sample size for a given set of criteria.
• The following tables present sample sizes that would be necessary for given combinations of precision, confidence levels, and variability.
• Please note two things.
• First, these sample sizes reflect the number of obtained responses, and not necessarily the number of surveys mailed or interviews planned (this number is often increased to compensate for nonresponse).
• Second, the sample sizes in Table 2 presume that the attributes being measured are distributed normally or nearly so.
• If this assumption cannot be met, then the entire population may need to be surveyed.

Table 1. Sample size for ±3%, ±5%, ±7% and ±10% Precision Levels Where Confidence Level is 95% and P=.5.

Size of        Sample Size (n) for Precision (e) of:
Population     ±3%      ±5%      ±7%      ±10%
500            a        222      145      83
600            a        240      152      86
700            a        255      158      88
800            a        267      163      89
900            a        277      166      90
1,000          a        286      169      91
2,000          714      333      185      95
3,000          811      353      191      97
4,000          870      364      194      98
5,000          909      370      196      98
6,000          938      375      197      98
7,000          959      378      198      99
8,000          976      381      199      99
9,000          989      383      200      99
10,000         1,000    385      200      99
15,000         1,034    390      201      99
20,000         1,053    392      204      100
25,000         1,064    394      204      100
50,000         1,087    397      204      100
100,000        1,099    398      204      100
>100,000       1,111    400      204      100
a = Assumption of normal population is poor (Yamane, 1967). The entire population should be sampled.

Table 2. Sample size for ±5%, ±7% and ±10% Precision Levels Where Confidence Level is 95% and P=.5.

Size of        Sample Size (n) for Precision (e) of:
Population     ±5%      ±7%      ±10%
100            81       67       51
125            96       78       56
150            110      86       61
175            122      94       64
200            134      101      67
225            144      107      70
250            154      112      72
275            163      117      74
300            172      121      76
325            180      125      77
350            187      129      78
375            194      132      80
400            201      135      81
425            207      138      82
450            212      140      82

Using Formulas To Calculate A Sample Size
• Although tables can provide a useful guide for determining the sample size, you may need to calculate the necessary sample size for a different combination of levels of precision, confidence, and variability.
• The fourth approach to determining sample size is the application of one of several formulas.
• For populations that are large, Cochran (1977) developed the following equation to yield a representative sample for proportions:
  $n_0 = \frac{Z^2 \, p \, q}{e^2}$
• Where:
  • n0 is the sample size,
  • Z is the abscissa of the normal curve that cuts off an area α at the tails (1 − α equals the desired confidence level, e.g., 95%),
  • e is the desired level of precision,
  • p is the estimated proportion of an attribute that is present in the population, and
  • q is 1 − p.
Note: the value for Z is found in statistical tables which contain the area under the normal curve.

Illustration 1: Suppose we wish to evaluate a state-wide Extension program in which farmers were encouraged to adopt a new practice.
• Assume there is a large population but that we do not know the variability in the proportion that will adopt the practice; in that case, assume p = 0.5, which gives maximum variability.
• Furthermore, suppose we desire a 95% confidence level and ±5% precision.
Required: calculate the sample size.

Finite Population Correction For Proportions
• If the population is small then the sample size can be reduced slightly.
• This is because a given sample size provides proportionately more information for a small population than for a large population.
• Thus the sample size (n0) can be adjusted using the following equation:
  $n = \frac{n_0}{1 + \frac{n_0 - 1}{N}}$
• Where n is the sample size and N is the population size.

Illustration 2: Suppose our evaluation of farmers’ adoption of the new practice only affected 2,000 farmers.
Required: What is the sample size that would now be necessary?

A Simplified Formula For Proportions
• Yamane (1967:886) provides a simplified formula to calculate sample sizes:
  $n = \frac{N}{1 + N e^2}$
• Where n is the sample size, N is the population size, and e is the level of precision.

Illustration 3: Given: N = 2000, e = ±5%
Required: calculate the sample size for the given population and sampling error.

Formula For Sample Size For The Mean
• The use of tables and formulas to determine sample size in the above discussion employed proportions that assume a dichotomous response for the attributes being measured.
• There are two methods to determine sample size for variables that are polytomous or continuous.
• One method is to combine responses into two categories and then use a sample size based on proportion (Smith, 1983).
• The second method is to use the formula for the sample size for the mean.
• The formula of the sample size for the mean is similar to that of the proportion, except for the measure of variability.
• The formula for the mean employs σ² instead of (p × q), as shown in the following equation:
  $n_0 = \frac{Z^2 \sigma^2}{e^2}$
• Where n0 is the sample size, Z is the abscissa of the normal curve that cuts off an area α at the tails, e is the desired level of precision (in the same unit of measure as the variance), and σ² is the variance of an attribute in the population.
• The disadvantage of the sample size based on the mean is that a "good" estimate of the population variance is necessary.
• Often, an estimate is not available.
• Furthermore, the sample size can vary widely from one attribute to another because each is likely to have a different variance.
• Because of these problems, the sample size for the proportion is frequently preferred.

OTHER CONSIDERATIONS
• In determining sample size, there are three additional issues:
• First, the above approaches to determining sample size have assumed that a simple random sample is the sampling design.
• More complex designs, e.g., stratified random samples, must take into account the variances of subpopulations, strata, or clusters before an estimate of the variability in the population as a whole can be made.
• Second, another consideration with sample size is the number needed for the data analysis. If descriptive statistics are to be used, e.g., mean, frequencies, then nearly any sample size will suffice.
• On the other hand, a good size sample, e.g., 200-500, is needed for multiple regression, analysis of covariance, or log-linear analysis, which might be performed for more rigorous state impact evaluations.
• The sample size should be appropriate for the analysis that is planned.
• In addition, an adjustment in the sample size may be needed to accommodate a comparative analysis of subgroups (e.g., an evaluation of program participants with nonparticipants).
• Third, the sample size formulas provide the number of responses that need to be obtained.
Many researchers commonly add 10% to the sample size to compensate for persons that the researcher is unable to contact.
• The sample size is also often increased by 30% to compensate for nonresponse.
• Thus, the number of mailed surveys or planned interviews can be substantially larger than the number required for a desired level of confidence and precision.
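The sample size formulas discussed in this chapter, together with the adjustment for non-contact and nonresponse, can be sketched as a few small functions. This is a hedged illustration rather than a definitive implementation. It assumes the standard forms of Cochran's formula for proportions, the finite population correction, Yamane's simplified formula and the formula for the mean. It uses Z = 1.96 as the usual normal-table value for 95% confidence and p = 0.5 as the maximum-variability assumption. The function names are hypothetical, and treating the 10% and 30% inflations as additive is an assumption, since the text does not say how they combine.

```python
import math

def cochran_n0(z, p, e):
    """Cochran's sample size for a proportion, large population: Z^2 * p * q / e^2."""
    return z ** 2 * p * (1 - p) / e ** 2

def finite_population_correction(n0, N):
    """Adjust n0 for a finite population of size N: n0 / (1 + (n0 - 1) / N)."""
    return n0 / (1 + (n0 - 1) / N)

def yamane_n(N, e):
    """Yamane's simplified formula for proportions: N / (1 + N * e^2)."""
    return N / (1 + N * e ** 2)

def mean_n0(z, sigma, e):
    """Sample size for a mean: Z^2 * sigma^2 / e^2 (e in the units of sigma)."""
    return (z * sigma / e) ** 2

def surveys_to_send(n, noncontact_pct=10, nonresponse_pct=30):
    """Inflate the required number of responses n.
    Assumption: the two percentages are simply added together."""
    return math.ceil(n * (100 + noncontact_pct + nonresponse_pct) / 100)

# Inputs taken from the three illustrations: 95% confidence, ±5% precision, N = 2000.
n0 = cochran_n0(z=1.96, p=0.5, e=0.05)
n_fpc = finite_population_correction(n0, N=2000)
n_yamane = yamane_n(N=2000, e=0.05)

print(math.ceil(n0), math.ceil(n_fpc), round(n_yamane, 1))  # 385 323 333.3
print(surveys_to_send(math.ceil(n0)))                       # 539
```

Results are rounded up here as a conservative choice. Published tables may round differently, so small discrepancies of one unit are to be expected.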