Sampling

THE CONTENTS OF THIS DOCUMENT ARE TAKEN VERBATIM FROM:
1. The Research Methods Knowledge Base. Retrieved from http://www.socialresearchmethods.net/kb/index.php on June 20, 2012.
2. The Ahfad University for Women. Methods for Social Researchers in Developing Countries.
3. Online Statistics Education: A Multimedia Course of Study (http://onlinestatbook.com/). Project Leader: David M. Lane, Rice University.

Introduction

The purpose of most research is to learn something about a larger group called the population. One way to learn something about a population is to collect data from all the members of the population. This is known as taking a census or enumerating the population. Generally, because of the time and cost involved, it is impractical to enumerate a population. The only practical alternative is to obtain data from a part of the population, called a sample.

The decision tree below summarizes the decisions involved in sampling:

Sampling Terminology

Before we look at the different methods of sampling, we first need to understand certain concepts and terms used in sampling:
Population: Your research question defines the group or population you want to learn about. Many small populations can be defined precisely. Many others, however, are large and constantly changing, making it hard to define them accurately. In such cases, you have to define your population as clearly as possible by placing specific geographic, time, membership or other limits on the abstract population to obtain a target population that you want your sample to represent.
Sampling element: A single member or unit of the target population about which information will be obtained. It is often an individual but can also be an organization, group or household.
Sample frame: A list of all sampling elements (members) in your target population from which the sample is selected. That is, it is the practical, operational definition of the target population.
Statistic: A finding based on a sample. Statistics are generally reported as percentages, averages, and measures of variation.
Parameter: A finding based on measuring the entire population. Since this is next to impossible, parameters are usually not known and are estimated using statistics.
Sampling/Systematic Error/Bias: Describes how well a statistic estimates a parameter. It can arise in many situations. For instance, even if you are able to identify the population of interest perfectly, you may not have access to all of its members. Even if you do, you may not have a complete and accurate enumeration or sampling frame from which to select. And even if you have such a frame, you may not draw the sample correctly or accurately.

Types of Sampling

There are two types of sampling: probability and nonprobability sampling. Probability sampling methods rely only on random or chance selection. A carefully selected probability sample allows a researcher to generalize from sample results to the population from which the sample was selected. In contrast, nonprobability samples are selected by means other than chance, typically through some form of human choice or judgment. As a result, there is no way to show that a nonprobability sample represents the target population. Researchers therefore typically prefer probability sampling methods over nonprobability ones, as they are considered to be more accurate and rigorous. Nonetheless, nonprobability sampling is important because it has its uses in circumstances where it is not feasible, practical or theoretically sensible to do random sampling.
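As a rough illustration of the contrast above, the following Python sketch compares a statistic from a simple random sample with one from a convenience-style sample, using a simulated population whose parameter is known. Everything in it is an assumption made for illustration: the income figures, the sizes of the two groups, and the idea that the people who are easiest to reach earn more than the rest.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical target population of 10,000 monthly incomes.
# Assumption: the 2,000 "easy to reach" members earn more on average.
easy_to_reach = [rng.gauss(600, 100) for _ in range(2_000)]
hard_to_reach = [rng.gauss(400, 100) for _ in range(8_000)]
population = easy_to_reach + hard_to_reach

# Parameter: the mean of the entire population (known only because we simulated it).
parameter = statistics.mean(population)

# Probability sample: every member has the same chance of selection.
random_sample = rng.sample(population, 200)

# Nonprobability sample: the researcher takes whoever is easiest to reach.
convenience_sample = easy_to_reach[:200]

# Statistics: findings based on each sample.
random_stat = statistics.mean(random_sample)
convenience_stat = statistics.mean(convenience_sample)

print(f"parameter (population mean): {parameter:.1f}")
print(f"random sample mean:          {random_stat:.1f}")
print(f"convenience sample mean:     {convenience_stat:.1f}")
```

Under these assumptions the random sample mean lands close to the parameter, while the convenience sample mean overstates it, because the convenience sample only ever sees the higher-income group.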
With this in mind, we now turn to four simple methods each for selecting probability and nonprobability samples.

Probability Sampling

1. Simple Random Sampling: This is the simplest form of random sampling. The objective is to select n units out of N (the target population) such that each unit has an equal chance of being selected. This can be done, for example, using a table of random numbers in Excel.
   Steps:
   - Define the target population and sampling element
   - Select a sampling frame
   - Select the sample

2. Stratified Random Sampling: Also sometimes called proportional or quota random sampling, it involves dividing your population into homogeneous subgroups and then taking a simple random sample in each subgroup.

3. Systematic Random Sampling [1]: The steps here are similar to those of simple random sampling. However, instead of drawing each element at random, the investigator calculates a sampling interval and uses this interval in selecting the elements to be included in the sample.
   Steps:
   - Number the units in the population from 1 to N
   - Decide on the n (sample size) that you want or need
   - Compute the interval size k = N/n
   - Randomly select an integer between 1 and k as a starting point
   - Take every kth unit as your sample element
   NB: Before using systematic sampling, always check to see if the names/units on the sampling frame are arranged in any systematic order. If some kind of order exists, another sampling method should be used. Why? See footnote [1].

4. Cluster Sampling: This is used when there is no sampling frame or when the target population is very large and/or scattered over a wide area.
   Steps:
   - Divide the population into clusters (usually along geographic boundaries). Each cluster defines some part of the target population
   - Randomly sample clusters
   - Measure all units within sampled clusters

Note, however, that in most real applied social research, we usually combine the simple methods described here through multi-stage sampling to address our sampling needs effectively.

[1] For example, suppose you were studying the levels of job satisfaction among staff of a government ministry. Since the ministry has a complete list of all staff, you decide to use systematic sampling with a sampling interval of 20. Now, suppose that the first name you selected was a supervisor and that a supervisor's name appeared as every 20th name thereafter. In that case, having selected a supervisor as the first sample member, you would go on to select only supervisors for the rest of the sample by using the interval of 20. Thus, the entire sample would be made up of supervisors. Obviously, any results based on the responses of supervisors alone would be biased and could not be used to describe the morale of the staff in the ministry.

Nonprobability Sampling

1. Convenience Sampling: Also called haphazard sampling. The researcher selects some number of persons or other sampling units because they are easily accessible.

2. Quota Sampling: A quota for each subgroup is set in advance, and persons having the right characteristics are selected nonrandomly until that number is met.

3. Expert Sampling: This involves assembling a sample of persons with known or demonstrable experience and expertise in some area. There are two reasons you might do expert sampling. First, it may be the best way to elicit the views of persons who have specific expertise. Second, it can provide evidence for the validity of another sampling approach you have chosen.

4. Network Sampling: Also called snowball sampling. You begin by identifying someone who meets the criteria for inclusion in your study. You then ask them to recommend others they know who also meet the criteria. Although this method is unlikely to produce representative samples, there are times when it may be the best method available, for example when you are trying to reach populations that are inaccessible or hard to find.
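The four probability sampling methods described earlier, together with the ordering problem from the ministry staff example, can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names, the 200-person staff list, the "officer" role label, and the ten departments of 20 people each are all hypothetical, and the interval is taken as the whole-number part of N/n.

```python
import random

def simple_random_sample(frame, n, rng):
    """Select n units so that each unit in the frame has an equal chance."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng, start=None):
    """Pick a random start between 1 and k, then take every k-th unit."""
    k = len(frame) // n                 # interval size, k = N/n rounded down
    if start is None:
        start = rng.randint(1, k)       # 1-based, both ends inclusive
    return frame[start - 1::k][:n]

def stratified_sample(frame, stratum_of, fraction, rng):
    """Take a simple random sample of the same fraction in each subgroup."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose clusters, then measure every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: 200 staff, with a supervisor as every 20th name.
staff = [("supervisor" if i % 20 == 0 else "officer", i) for i in range(1, 201)]

srs = simple_random_sample(staff, 10, rng)
systematic = systematic_sample(staff, 10, rng)
stratified = stratified_sample(staff, lambda unit: unit[0], 0.1, rng)

# Hypothetical clusters: ten departments of 20 consecutive names each.
departments = {f"dept_{d}": staff[d * 20:(d + 1) * 20] for d in range(10)}
clustered = cluster_sample(departments, 2, rng)

# The ordering problem: force the start onto a supervisor (position 20).
periodic = systematic_sample(staff, 10, rng, start=20)
print({role for role, _ in periodic})   # only supervisors are selected
```

With this frame, any systematic sample that happens to start at position 20 contains supervisors only, and any other start contains none, which is why an ordered frame calls for a different method.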
Sampling Biases

Sampling biases are errors that arise due to the method of sampling used, not the sample itself. Thus, there is no guarantee that random sampling will result in a sample representative of the population, just as not every sample obtained using a biased sampling method will be greatly non-representative of the population. Three types of biases will be discussed here:

1. Self-selection Bias
A self-selection bias can occur in three ways.

Firstly, imagine a university newspaper ran an ad asking for students to volunteer for a study in which intimate details of their sex lives would be discussed. Clearly, the sample of students who would volunteer for such a study would not be representative of the students at the university. Similarly, an online survey about computer use is likely to attract people more interested in technology than is typical. In both these examples, people who "self-select" for the experiment are likely to differ in important ways from the population the experimenter wishes to draw conclusions about.

Secondly, self-selection bias can occur when the non-random component arises after the potential subject has enlisted in the experiment. Consider again the hypothetical experiment in which subjects are to be asked intimate details of their sex lives, and assume that the subjects did not know what the experiment was going to be about until they showed up. Many of the subjects would likely leave the experiment, leaving a non-random sample behind.

Thirdly, there can be a non-response bias (a form of self-selection bias) where certain subjects are more likely to respond than others. A commonly cited example is the poll taken by the Literary Digest in 1936, which indicated that Landon would win an election against Roosevelt by a large margin when, in fact, Roosevelt won by a large margin. There was a non-response bias such that those favoring Landon were more likely to return their survey than those favoring Roosevelt.

2. Survivorship (Attrition) Bias
Survivorship bias occurs when the observations recorded at the end of an investigation are a nonrandom set of those present at the beginning of the investigation. For example, gains in stock funds are an area in which survivorship bias often plays a role. The basic problem is that poorly performing funds are often either eliminated or merged into other funds. Suppose one considers a sample of stock funds that exist in the present and then calculates the mean 10-year appreciation of those funds. These results cannot be validly generalized to other stock funds of the same type, because the poorly performing funds that are no longer in existence (did not survive for 10 years) are not included, and there is therefore a bias toward selecting better-performing funds. There is good evidence that this survivorship bias is substantial (Malkiel, 1995) [2].

Another example of attrition bias is when data cannot be obtained from some of the persons or other units selected as members of the sample. Maybe some sample members cannot be located, some are never at home when an interviewer tries to contact them, or some (eventually) refuse to be interviewed. Whatever the reason, the actual number of persons interviewed is less than the number selected to form the sample. As a result, the sample becomes less representative of the target population, and the value of the results for generalizing to the target population is reduced.

3. Sampling Selection Bias
This bias arises through the researcher's choice of sampling method or choice of units sampled. It can pose a large problem in nonprobability sampling methods such as convenience and quota sampling, where samples are chosen non-randomly. Under-coverage bias is a particularly salient form of sampling selection bias in which the researcher samples too few observations from a segment of the population. Again, using the Landon-Roosevelt example from above, a common explanation for why the poll wrongly predicted a Landon victory is that poorer people were under-covered because they were less likely to have telephones and yet more likely to support Roosevelt.

[2] Malkiel, B. G. (1995). Returns from investing in equity mutual funds 1971 to 1991. The Journal of Finance, 50, 549-572.
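The stock fund example of survivorship bias can be made concrete with a small simulation. The Python sketch below is only an illustration under invented assumptions: the return distribution, the number of funds, and the rule that any fund with a 10-year return below a fixed cutoff is closed or merged are all hypothetical and are not taken from Malkiel (1995).

```python
import random
import statistics

rng = random.Random(1995)  # fixed seed so the sketch is repeatable

# Hypothetical 10-year returns for 1,000 funds that existed at the start.
returns = [rng.gauss(0.50, 0.40) for _ in range(1_000)]

# Assumption: funds returning less than 20% over the period are
# eliminated or merged, so they are missing from a present-day sample.
CUTOFF = 0.20
survivors = [r for r in returns if r >= CUTOFF]

all_mean = statistics.mean(returns)          # what we would like to know
survivor_mean = statistics.mean(survivors)   # what a present-day sample shows

print(f"funds at the start: {len(returns)}, survivors: {len(survivors)}")
print(f"mean return, all funds:      {all_mean:.2f}")
print(f"mean return, survivors only: {survivor_mean:.2f}")
```

Because every fund that is dropped performed worse than every fund that remains, the survivors' mean is necessarily higher than the mean over all funds, which is the bias toward better-performing funds described above.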