Sampling

Sampling is the process of selecting units from a population of interest so that, based on the sample, we can reasonably generalize the results of the analysis back to the population from which the sample was chosen. In other words, sampling is a procedure by which characteristics of a large body of people or units (a population) can be inferred from data on only a few (the sample).

Ideally we would be able to accurately generalize the results back to the population from which we drew the sample; however, a number of potential problems can arise. Among them are:

Sampling error – the difference between the sample estimate and what we would have found by measuring the entire population

Sampling biases – nonrandom errors due to inadequate data or clerical mistakes (for example, an interviewer always interviews the person who answers the door, typically the youngest family member)

Non-sampling biases – the kinds of errors that might occur even if we sampled the entire population (for example, the person recording the responses to telephone interviews writes 2s that look like 7s)

There are two general types of samples, probabilistic and non-probabilistic. Probabilistic samples, by definition, meet the requirements of a good sample. A good sample is one in which every member of the population has an equal probability of being selected for the sample. Types of probabilistic samples are:

Simple random sample – In a simple random sample each unit in the population has a known and equal chance of being selected, like drawing names out of a hat without replacement. For example, in a company with 1,000 employees, a randomly picked sample of 50 employees would mean that every employee had a one in 20 chance (50/1,000) of being included in the sample.

Systematic random sample – In systematic random sampling the target population is first ordered in some manner that would not be considered systematically biased.
In other words, there must not be anything of importance in the ordering of the population. For example, a population of employees might be ordered alphabetically by last name. Then, after the target population is arranged according to the ordering scheme, elements at regular intervals through that ordered list are selected. If from a population of 6,000 we wanted a sample of 150, our sampling interval would be 6,000/150 = 40. Systematic random sampling involves selecting a random start within the first sampling interval and then selecting every kth element from then onwards. In this case, k = (population size/sample size). It is important that the starting point is not automatically the first in the list, but is instead randomly chosen from among the first k elements in the list. Thus, the sampling interval is determined by the desired sample size, then a random starting number is selected, then every kth person on our list is selected. In our example, suppose that out of the first 40 on our list we randomly selected number 27; then our next one would be number 67, then number 107, and so on until we obtained our sample of 150. Systematic random sampling may save time and costs with large populations.

Stratified samples – Although perhaps not likely, it is possible to draw a sample of 100 people of the same gender from a population of 1,000 that included the same number of men and women (500 each). While this sample may not be representative of the population, it would still meet our definition above of a good sample. If we know that certain population characteristics are important and we want to make sure they are adequately included in the sample, we might use stratified sampling. In stratified sampling the population is divided into subgroups or layers (strata) and a sample is drawn from each stratum.
In our example above, if we wanted to be certain that both men and women were included in our sample of 100, we could divide our population into the 500 females (the female stratum) and the 500 males (the male stratum), then randomly select 50 from each stratum for a total sample of 100. The benefits of stratified sampling include allowing for the analysis of sub-groups when desired and possibly greater accuracy in statistical estimation.

Cluster samples – In cluster sampling the population is divided geographically into clusters such as census tracts, voting precincts, or counties. Then the clusters are selected randomly, usually in multiple stages on down to the individual or household level. Many national studies are done in this manner.

Although many of the statistical tests we will cover in future modules rely upon the assumption that the data is from probabilistic samples, much business research is in actuality based upon non-probabilistic methods. Types of non-probabilistic samples are:

Purposive samples – In purposive sampling respondents are deliberately sampled for a particular reason; for example, we may sample particular individuals because of their special expertise about a certain topic. One drawback of purposive sampling is that we can’t generalize from our findings to a population.

Quota samples – Quota samples may appear similar to stratified samples in that the population is divided into sub-groups, except that in quota sampling individuals are not selected randomly as they are in stratified sampling. This technique may be useful when time is limited, but because the sample is not randomly selected, unknown biases are not accounted for. An example of a quota sample would be a sample requiring five men and five women under the age of 40 and five men and five women aged 40 or older.

Chunk samples – Chunk samples typically refer to simply including a group who happens to be available when needed.
For example, interviewing five people on the street about some topic doesn’t represent the whole population and probably doesn’t even represent the people on the street.

Volunteer samples (also known as convenience samples) – Volunteer sampling consists of participants becoming part of a study because they volunteer when asked. This technique is typically quick and easy, and much research is done with volunteers. However, the type of participants who volunteer may not be representative of the target population for a number of reasons. It’s also very difficult to determine how volunteers differ from those who did not volunteer and how those differences might systematically affect the results.

Snowball samples – The snowball sample is a technique in which study participants recruit others from among their friends and acquaintances. An obvious problem with this technique is the number of biases that may influence the results.
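
As a rough illustration of three of the probabilistic designs described earlier (simple random, systematic, and stratified sampling), the following Python sketch may help. The function names, the `rng` parameter, and the numbered employee list are assumptions made for this example only, and the systematic version assumes the population size divides evenly by the sample size, as in the 6,000/150 example.

```python
import random


def simple_random_sample(population, n, rng=random):
    """Draw n units without replacement (names out of a hat)."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng=random):
    """Random start within the first interval, then every k-th unit."""
    k = len(population) // n      # sampling interval, e.g. 6000 // 150 = 40
    start = rng.randrange(k)      # 0-based position among the first k units
    return [population[start + i * k] for i in range(n)]


def stratified_sample(strata, n_per_stratum, rng=random):
    """Draw a simple random sample of the same size from each stratum."""
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}


if __name__ == "__main__":
    # Hypothetical ordered list of 6,000 people, numbered 1..6000.
    people = list(range(1, 6001))
    picked = systematic_sample(people, 150)
    print(picked[:3])  # three numbers spaced 40 apart; the start is random

    # Hypothetical strata: 500 women and 500 men, 50 drawn from each.
    strata = {"female": list(range(500)), "male": list(range(500, 1000))}
    by_group = stratified_sample(strata, 50)
    print({name: len(units) for name, units in by_group.items()})
```

Passing a seeded `random.Random` instance as `rng` makes a draw repeatable, which can be useful when a sample needs to be audited later.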