Sampling - E-Training for Social Science Research

Sampling
THE CONTENTS OF THIS DOCUMENT ARE TAKEN VERBATIM FROM:
1. The Research Methods Knowledge Base. Retrieved from
http://www.socialresearchmethods.net/kb/index.php on June 20, 2012.
2. The Ahfad University for Women. Methods for Social Researchers in
Developing Countries.
3. Online Statistics Education: A Multimedia Course of Study
(http://onlinestatbook.com/). Project Leader: David M. Lane, Rice University.
Introduction
The purpose of most research is to learn something about a larger group called the population. One
way to learn something about a population is to collect data from all the members of the population.
This is known as taking a census or enumerating the population. Generally, because of the time and
cost involved, it is impractical to enumerate a population. The only practical alternative is to obtain
data from a part of the population, called a sample.
[Figure: decision tree summarizing the decisions involved in sampling (not reproduced here).]
Sampling Terminology
Before we look at the different methods of sampling, we need to first understand certain concepts and
terms used in sampling:

Population: Your research question defines the group or population you want to learn about.
Many small populations can be defined precisely. Many others, however, are large and constantly
changing, making it hard to define them accurately. In such cases, you have to define your
population as clearly as possible by placing specific geographic, time, membership or other limits
on the abstract population to obtain the target population that you want your sample to represent.

Sampling element: A single member or unit of target population about which information will be
obtained. It is often an individual but can also be an organization, group or household.

Sample frame: This is a list of all sampling elements (members) in your target population from
which the sample is selected. That is, it is the practical, operational definition of the target population.

Statistic: A finding based on a sample. Generally reported as percentages, averages, and measures
of variation.

Parameter: A finding based on measuring the entire population. Since this is next-to-impossible,
parameters are usually not known and are estimated using statistics.

Sampling/Systematic Error/Bias: Describes how well a statistic estimates a parameter. Error can arise
in many situations. For instance, even if you are able to identify perfectly the population of
interest, you may not have access to all of its members. And even if you do, you may not have a complete
and accurate enumeration or sampling frame from which to select. And even if you do, you may
not draw the sample correctly or accurately.
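As a rough illustration of the last three terms, the Python sketch below computes a parameter from an entire population and a statistic from a random sample of it, then reports the gap between them. The population of ages is made up, and the sample size of 200 and all variable names are assumptions chosen only for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: the ages of 10,000 people (made-up data).
population = [rng.randint(18, 90) for _ in range(10_000)]

# Parameter: a finding based on measuring every member of the population.
parameter_mean = statistics.mean(population)

# Statistic: the same finding based on a sample of 200 members.
sample = rng.sample(population, 200)
statistic_mean = statistics.mean(sample)

# Sampling error: how far the statistic falls from the parameter.
sampling_error = statistic_mean - parameter_mean
print(f"parameter={parameter_mean:.2f}  statistic={statistic_mean:.2f}  "
      f"error={sampling_error:+.2f}")
```

In practice the parameter is unknown, so the error cannot be computed directly; the sketch can do so only because the whole population is simulated.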
Types of Sampling
There are two types of sampling – probability and nonprobability sampling. Probability sampling
methods rely only on random or chance selection. A carefully selected probability sample allows a
researcher to generalize from sample results to the population from which the sample was selected. In
contrast, nonprobability samples are selected by means other than chance, typically through some form
of human choice or judgment. As a result, there is no way to show that a nonprobability sample
represents our target population.
Therefore, researchers typically prefer probabilistic sampling methods over nonprobabilistic ones as
they are considered to be more accurate and rigorous. Nonetheless, nonprobability sampling is
important as it has its uses under circumstances where it is not feasible, practical or theoretically
sensible to do random sampling.
With this in mind, we now turn to four simple methods for selecting probability samples and four
for selecting nonprobability samples.
Probability Sampling
1. Simple Random Sampling:
This is the simplest form of random sampling. The objective is to select n units out of N (the target
population) such that each unit has an equal chance of being selected. This can be done, for
example, using a table of random numbers in Excel. Steps:
• Define the target population and sampling element
• Select a sampling frame
• Select the sample
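The final step can be sketched in Python instead of a table of random numbers. The function name `simple_random_sample` and the frame of 500 placeholder names are hypothetical; the sketch assumes the sampling frame is already available as a list.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n elements from the sampling frame without replacement,
    giving every element the same chance of selection."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the frame")
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame of N = 500 people.
frame = [f"person_{i}" for i in range(1, 501)]
sample = simple_random_sample(frame, n=50, seed=1)
print(sample[:5])
```

Passing a seed makes the draw repeatable, which can help when the selection has to be documented or audited.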
2. Stratified Random Sampling:
Also sometimes called proportional or quota random sampling, this involves dividing your
population into homogeneous subgroups and then taking a simple random sample in each
subgroup.
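A minimal Python sketch of this idea follows. It assumes proportional allocation, where the same sampling fraction is applied in every subgroup. That is only one possible choice, and the function name, the urban/rural strata and the 10% fraction are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, fraction, seed=None):
    """Split the frame into strata, then take a simple random sample
    of the same fraction within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in frame:
        strata[stratum_of(element)].append(element)

    sample = []
    for name in sorted(strata):  # sorted so results are repeatable
        members = strata[name]
        k = max(1, round(fraction * len(members)))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical frame: 600 urban and 400 rural households.
frame = [("urban", i) for i in range(600)] + [("rural", i) for i in range(400)]
sample = stratified_sample(frame, stratum_of=lambda e: e[0], fraction=0.1, seed=7)
print(len(sample))
```

Because each stratum is sampled separately, the subgroup shares in the sample match those in the frame by construction.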
3. Systematic Random Sampling [1]:
The steps here are similar to those of simple random sampling. However, instead of drawing each
element at random, the investigator calculates a sampling interval and uses this interval in
selecting the elements to be included in the sample. Steps:
• Number the units in the population from 1 to N
• Decide on the n (sample size) that you want or need
• k = N/n = the interval size
• Randomly select an integer between 1 and k as a starting point
• Take every kth unit as your sample element
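The steps above can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes the interval is rounded down to a whole number when N is not an exact multiple of n, which the text does not specify.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every kth unit from the frame after a random start."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the size of the frame")
    k = N // n                   # interval size (assumption: rounded down)
    rng = random.Random(seed)
    start = rng.randint(1, k)    # random starting point between 1 and k
    # Units are numbered 1..N, so unit i sits at list index i - 1.
    selected = [frame[i - 1] for i in range(start, N + 1, k)]
    return selected[:n]          # trim any extra unit when N is not a multiple of n

# Hypothetical frame: units numbered 1 to 1000, sample of 50 (k = 20).
frame = list(range(1, 1001))
sample = systematic_sample(frame, n=50, seed=3)
print(sample[:5])
```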
[1] NB: Before using systematic sampling, always check to see if the names/units on the sampling frame are arranged in any
systematic order. If some kind of order exists, another sampling method should be used. Why?
For example, suppose you were studying the levels of job satisfaction among staff of a government ministry. Since the ministry
has a complete list of all staff, you decide to use systematic sampling with a sampling interval of 20. Now, suppose that the first
name you selected was a supervisor and that a supervisor's name appeared as every 20th name thereafter. If this were the case,
you would have selected a supervisor for the first sample member. Then, by using the interval of 20 you would have selected
only supervisors for the rest of the sample. Thus, the entire sample would be made up of supervisors. Obviously any results
based on responses of only supervisors would be biased and could not be used to describe the morale of the staff in the ministry.
4. Cluster Sampling:
This is used when there is no sampling frame or when the target population is very large and/or
scattered over a wide area. Steps:
• Divide the population into clusters (usually along geographic boundaries). Each cluster defines
some part of the target population
• Randomly sample clusters
• Measure all units within sampled clusters
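The cluster sampling steps can be sketched in Python as below. The sketch assumes the clusters are already defined as a mapping from cluster name to the units it contains; the village and household names are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then keep every unit inside them.

    clusters: dict mapping a cluster name to the list of its units.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random sample of clusters
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical population: 20 villages of 30 households each.
villages = {
    f"village_{v}": [f"v{v}_household_{h}" for h in range(1, 31)]
    for v in range(1, 21)
}
chosen, households = cluster_sample(villages, n_clusters=4, seed=11)
print(chosen, len(households))
```

Note that only the list of clusters is needed up front. A full list of households is required only for the clusters that end up being selected.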
Note, however, that in most real applied social research, we usually combine the simple methods
described here to effectively address our sampling needs through multi-stage sampling.
Nonprobability Sampling
1. Convenience Sampling:
Also called haphazard sampling. The researcher selects some number of persons or other sampling
units because they are easily accessible.
2. Quota Sampling:
A quota for each subgroup is set in advance and persons having the right characteristics are selected
nonrandomly until that number is met.
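A minimal Python sketch of quota sampling follows. It assumes people are approached in whatever order they happen to arrive, so no random selection is involved. The function name, the stream of passers-by and the quotas of 10 per group are hypothetical.

```python
def quota_sample(stream, group_of, quotas):
    """Accept people in arrival order until every subgroup quota is met."""
    remaining = dict(quotas)          # copy so the caller's quotas are untouched
    sample = []
    for person in stream:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break                     # all quotas filled, stop recruiting
    return sample

# Hypothetical stream of 100 passers-by, in the order they walk past.
passers_by = [("male", i) if i % 3 == 0 else ("female", i) for i in range(100)]
sample = quota_sample(passers_by, group_of=lambda p: p[0],
                      quotas={"female": 10, "male": 10})
print(len(sample))
```

The sample matches the quotas exactly, yet it contains only the earliest arrivals in each group, which is why it cannot be shown to represent the target population.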
3. Expert Sampling:
This involves the assembling of a sample of persons with known or demonstrable experience and
expertise in some area. There are two reasons you might do expert sampling. First, it
may be the best way to elicit the views of persons who have specific expertise. Second, it can
provide evidence for the validity of another sampling approach you have chosen.
4. Network Sampling:
Also called snowball sampling. You begin by identifying someone who meets the criteria for
inclusion in your study. You then ask them to recommend others they know who also
meet the criteria. Although this method is unlikely to produce representative samples, there are
times when it may be the best method available, for example when you are trying to reach
populations that are inaccessible or hard to find.
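The referral process can be sketched in Python as a walk outward from the first contact. The sketch assumes referrals are recorded as a mapping from each person to the people they recommend; all names and the `max_size` limit are hypothetical.

```python
from collections import deque

def snowball_sample(seeds, referrals, max_size):
    """Start from the seed participants and follow referrals wave by wave."""
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:   # do not recruit the same person twice
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral network: who recommends whom.
referrals = {
    "Amal": ["Bakri", "Chol"],
    "Bakri": ["Dina"],
    "Chol": ["Dina", "Eman"],
    "Dina": [],
    "Eman": ["Amal"],
    "Fatima": ["Amal"],
}
sample = snowball_sample(["Amal"], referrals, max_size=10)
print(sample)
```

In this made-up network Fatima meets the criteria but is never recommended by anyone, so she cannot enter the sample. This is one way such samples fall short of being representative.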
Sampling Biases
Sampling biases are errors that arise from the method of sampling used, not from the particular sample
drawn. Thus, there is no guarantee that random sampling will result in a sample representative of the
population, just as not every sample obtained using a biased sampling method will be greatly
non-representative of the population. Three types of biases will be discussed here:
1. Self-selection Bias:
A self-selection bias can occur in three ways.
Firstly, imagine a university newspaper ran an ad asking for students to volunteer for a study in
which intimate details of their sex lives would be discussed. Clearly the sample of students who
would volunteer for such a study would not be representative of the students at the university.
Similarly, an online survey about computer use is likely to attract people more interested in
technology than is typical. In both these examples, people who "self-select" themselves for the
experiment are likely to differ in important ways from the population the experimenter wishes to
draw conclusions about.
Secondly, self-selection bias can occur when the non-random component occurs after the potential
subject has enlisted in the experiment. Consider again the hypothetical experiment in which
subjects are to be asked intimate details of their sex lives, and assume that the subjects did not know
what the experiment was going to be about until they showed up. Many of the subjects would
likely leave the experiment, so that those remaining would form a non-random sample.
Thirdly, there can be a non-response bias (a form of self-selection bias) where certain subjects are
more likely to respond than others. A commonly-cited example is the poll taken by the Literary
Digest in 1936 that indicated that Landon would win an election against Roosevelt by a large
margin when, in fact, Roosevelt won by a large margin. There was a non-response bias such that
those favoring Landon were more likely to return their survey than those favoring Roosevelt.
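The mechanism behind non-response bias can be illustrated with a small Python simulation. All figures below are hypothetical and are not taken from the 1936 poll: the sketch simply assumes that supporters of candidate B return their surveys more often than supporters of candidate A.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical electorate: 60% favour candidate A, 40% favour candidate B.
voters = ["A"] * 6000 + ["B"] * 4000

# Assumed response rates: B supporters are more likely to return the survey.
response_rate = {"A": 0.15, "B": 0.35}

# Surveys are mailed to a random sample, but only some are returned.
mailed = rng.sample(voters, 2000)
returned = [v for v in mailed if rng.random() < response_rate[v]]

true_share_b = voters.count("B") / len(voters)
poll_share_b = returned.count("B") / len(returned)
print(f"true share for B: {true_share_b:.2f}, share among returns: {poll_share_b:.2f}")
```

Even though the surveys were mailed to a random sample, the returned surveys overstate support for B because returning the survey is related to the answer.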
2. Survivorship (Attrition) Bias:
Survivorship bias occurs when the observations recorded at the end of an investigation are a
non-random set of those present at the beginning of the investigation.
For example, gains in stock funds are an area in which survivorship bias often plays a role. The
basic problem is that poorly-performing funds are often either eliminated or merged into other
funds. Suppose one considers a sample of stock funds that exist in the present and then calculates
the mean 10-year appreciation of those funds. These results cannot be validly generalized to other
stock funds of the same type, because poorly-performing funds that no longer exist (did not
survive for 10 years) are excluded, which biases the sample toward better-performing funds.
There is good evidence that this survivorship bias is substantial (Malkiel, 1995). [2]
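The stock fund example can be illustrated with a short Python simulation. The returns are randomly generated, not real fund data, and the rule that every fund with a negative 10-year return is closed or merged is a simplifying assumption.

```python
import random
import statistics

rng = random.Random(5)  # fixed seed so the sketch is repeatable

# Hypothetical 10-year returns (in percent) for 1,000 funds that existed
# at the start of the period.
returns = [rng.gauss(50, 40) for _ in range(1000)]

# Assumption: funds that lost money were closed or merged along the way,
# so only funds with a non-negative return still exist today.
survivors = [r for r in returns if r >= 0]

mean_all = statistics.mean(returns)
mean_survivors = statistics.mean(survivors)
print(f"all funds: {mean_all:.1f}%  surviving funds only: {mean_survivors:.1f}%")
```

Averaging over the survivors alone gives a higher figure than averaging over every fund that started the period, which is the bias described above.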
Another example of attrition bias is when data cannot be obtained from some of the persons or
other units selected as members of the sample. Maybe some sample members cannot be located,
some are never at home when an interviewer tries to contact them, or some (eventually) refuse to
be interviewed. Whatever the reason, the actual number of persons interviewed is less than the
number selected to form the sample. As a result, the sample becomes less representative
of the target population and the value of the results for generalizing to the target population is
reduced.
3. Sampling Selection Bias:
This bias arises through the researcher’s choice of sampling method or choice of units sampled and
could pose a large problem in nonprobability sampling methods such as convenience and quota
sampling where samples are chosen non-randomly.
Under-coverage bias is a particularly salient form of sampling selection bias where the researcher
samples too few observations from a segment of the population. Again, using the Landon-Roosevelt
example from above, a common explanation for why the poll wrongly predicted a Landon victory is
that poorer people were under-covered because they were less likely to have telephones and yet more
likely to support Roosevelt.
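The arithmetic of under-coverage can be sketched in Python as below. The group sizes, frame coverage rates and support levels are hypothetical figures chosen only to show the mechanism, not estimates from the 1936 election.

```python
# Hypothetical population split into two income groups.
# "on_frame" is the assumed share of each group that appears on the
# sampling frame (for example, a telephone list).
groups = [
    {"name": "lower income", "size": 6000, "on_frame": 0.20, "support": 0.70},
    {"name": "higher income", "size": 4000, "on_frame": 0.80, "support": 0.40},
]

def support_share(groups, weight):
    """Weighted average of candidate support across the groups."""
    total = sum(weight(g) for g in groups)
    return sum(weight(g) * g["support"] for g in groups) / total

# Support in the whole population versus support among those on the frame.
true_support = support_share(groups, lambda g: g["size"])
frame_support = support_share(groups, lambda g: g["size"] * g["on_frame"])
print(f"population: {true_support:.2f}  sampling frame: {frame_support:.2f}")
```

Even a perfectly random sample drawn from this frame would understate support, because the under-covered group is the one that supports the candidate most.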
[2] Malkiel, B. G. (1995). Returns from investing in equity mutual funds 1971 to 1991. The Journal of Finance, 50, 549-572.