Rodd_GeoMeth_Samplin.. - University of Colorado Boulder

Sample Design for Surveys
Joshua Rodd
GEOG 5161
Spring 2011
Introduction
A sample survey is implemented in order to provide a description of a population by
studying a smaller sample of that population (Creswell 2009, 145). Sample design is the
process by which the researcher chooses the part of the population that will be included in
the survey (Kalton 1983, 7). These are relatively new technologies of investigation—at the
beginning of the 20th Century, statisticians and social scientists still debated the validity of
surveys that did not attempt to enumerate and collect data from an entire group. It is now
widely recognized that attempts at complete enumeration of a large population (i.e., a
census) are often less accurate and useful than a rigorously designed sample (Bernard
2002, 142-143).
Note the caveat—to be useful, sampling must be a rigorous and thoroughly considered
aspect of the research design. As useful as sampling is, a poorly designed sample usually
results in inaccurate survey results. Therefore, as Creswell notes (2009, 147-149), anyone
attempting to conduct a survey of any sort should pay ample attention to their sampling.
This presentation and summary of terms is oriented towards survey
researchers doing cross-sectional studies. However, regardless of whether their work is
quantitative, qualitative, or mixed, the researcher must think carefully about how he or she
selects those subjects who will represent the group under consideration.
Sample Surveys in Human Geography.
While certainly prominent, survey research has not been a dominant methodology in
human geography. A particular contribution by the discipline has been the integration of
human survey data and GIScience, a technique that is gaining increasing attention in other
disciplines. However, in a review of 613 articles on methods published in Annals of the
Association of American Geographers (Annals) over a 100-year period, Kwan (2010) found
only 12 with the word “survey” in the title. Nor were all 12 articles
oriented towards human research; Kwan’s article suggests many were oriented
towards land surveys. Perhaps this lack of attention derives from skepticism of surveys and
other quantitative methods of human research held by many human geographers since at
least the 1970s. (For an excellent summary of these criticisms, as well as a robust defense
of quantitative methods in human geography, see Kwan and Schwanen (2009) and
Schwanen and Kwan (2009).) Nevertheless, a review of the last three years of publications
in Annals as well as The Professional Geographer demonstrates that many human
geographers continue to employ survey methods.
Key Concepts in Sample Design:
The target population and the survey population: The group of people (or anything
else) that interests a researcher is the target population. The survey population includes
only those who will potentially be included in the sample. It is important to distinguish the
target population from the survey population, as they may not be the same. For example, in
a survey targeting the United States, it is very difficult to successfully sample those who are
enlisted in the military or in prison. Therefore, although a researcher’s target population
may be the inhabitants of the US, the survey population will not include prisoners or
military personnel (Kalton 1983, 6).
Non-probability sampling: In contrast to probability sampling (see below), in a nonprobability sample the researcher does not know the likelihood that any possible
respondent will be selected. In such cases, there is often no sampling frame (see below) or
any practical way to define one. A researcher may pull a convenience sample (also known as
haphazard or accidental sampling), which includes those who are available for the study
but who have not been randomly selected. Examples include volunteer respondents,
interviews conducted on a street corner, respondents in a geographic proximity to the
researcher’s work, or any other situation in which the probability of participation is
unknown. Alternatively, a researcher might conduct a judgment sample (or expert choice
sample), in which the researcher or another expert identifies a series of respondents (or
clusters) judged to be representative. Finally, quota sampling is used by some researchers.
Each interviewer is given a quota of certain categories of respondents (categories might be
ethnicity, gender, age or other relevant characteristics) and the sample is assembled based
on the willingness to talk of each respondent (Kalton 1983, 90-93; Bernard 2002, 180-202).
Probability Sampling: Statistical analysis of survey data depends on knowing the
probability that any respondent was selected out of the survey population. If that
probability of selection is known, the sample is a probability sample. If it is not known, the
sample is a nonprobability sample (see above) (Kalton 1983, 7).
Sampling Frames: Once a researcher has identified a target population and decided to
take a probability sample, he or she must identify a list of possible respondents from that
population. This is the sampling frame, and it is of critical importance because it defines the
survey population. Depending on the type of research one is doing, possible sampling
frames might include telephone books, voter registration lists, motor vehicle records,
company personnel records, association membership lists, or refugee camp feeding rosters.
In places that are either poor or not greatly bureaucratized, sampling frames may be hard
to come by. In such a case, the researcher may need to create the sampling frame himself or
herself (Kalton 1983, 56; Lohr 2010, 3-8).
Sample Size: The size of the target or survey populations is not the most important
question in determining sample size¹. Instead, the critical issue is the degree of precision
needed in order to answer the question the surveyor is asking. If the researcher is trying to
determine the difference between two subpopulations, then the sample size will depend
on the subtlety of the distinction and the level of significance and power. If the distinction
between null and alternative hypotheses being tested is slight, the sample must be large. If
the difference between null and alternative hypotheses is large, the sample can be smaller.
In addition, the more stringent the statistical significance level and the higher the statistical
power sought, the larger the sample must be (Rosner 2000, 236-242). (In addition, as variance
around a variable in the survey population goes up, the sample size must also be
increased.) If the researcher is trying to estimate a proportion based on a sample, then the
sample size depends on the confidence level and the precision with which the researcher is
estimating the proportion (Bernard 2002, 176-179). Beyond these questions, the researcher
must also take into account the nonresponse among sampled respondents.
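The relationships described above can be made concrete with a minimal Python sketch. It assumes the common normal-approximation formulas, which are not spelled out in this summary: n = z²p(1-p)/e² for estimating a proportion with margin of error e, an optional finite population correction for small populations, and n = 2σ²(z_alpha + z_power)²/Δ² per group for detecting a difference Δ between two group means. The function names and default values are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def z_value(prob):
    """Standard normal quantile for a cumulative probability."""
    return NormalDist().inv_cdf(prob)


def n_for_proportion(margin, confidence=0.95, p=0.5, population=None):
    """Sample size to estimate a proportion within +/- margin.

    p = 0.5 is the most conservative guess (largest variance).
    If population is given, apply the finite population correction.
    """
    z = z_value(1 - (1 - confidence) / 2)
    n = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


def n_per_group(difference, sd, alpha=0.05, power=0.80):
    """Sample size per group to detect a difference between two means
    (two-sided test, equal group sizes, common standard deviation sd)."""
    z_alpha = z_value(1 - alpha / 2)
    z_power = z_value(power)
    n = 2 * (sd * (z_alpha + z_power) / difference) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    print(n_for_proportion(0.05))                   # large population
    print(n_for_proportion(0.05, population=2000))  # small population
    print(n_per_group(difference=5, sd=10))         # larger difference
    print(n_per_group(difference=2, sd=10))         # subtler difference
```

Under these assumptions, a subtler difference, a higher variance, a higher confidence level, or more power all push the required sample size up, while the size of a large population does not enter the calculation at all.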
Nonresponse: It is rare that all solicited respondents will be available to respond to a
survey or agree to participate. Therefore, the researcher must estimate a refusal rate and
adjust sample size up accordingly. In addition, if those who are not available or who refuse
to participate are systematically similar to each other and different from those who do
respond, the survey may be biased. Researchers should take this possibility into account
(Kalton 1983, 63-68).
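One simple way to adjust sample size upward is sketched below in Python. It assumes the researcher divides the number of completed responses needed by the expected response rate; this rule and the function name are illustrative assumptions, not a procedure taken from the sources cited here.

```python
import math


def inflate_for_nonresponse(completes_needed, expected_response_rate):
    """Number of people to solicit so that, at the expected response
    rate, roughly completes_needed of them actually respond."""
    if not 0 < expected_response_rate <= 1:
        raise ValueError("expected_response_rate must be in (0, 1]")
    return math.ceil(completes_needed / expected_response_rate)


if __name__ == "__main__":
    # Hypothetical figures: 385 completed surveys needed, 60% expected to respond.
    print(inflate_for_nonresponse(385, 0.6))
```

Inflating the sample addresses only the loss of cases. It does nothing about bias that arises when nonrespondents differ systematically from respondents.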
Census: The simplest form of probabilistic sample is one that includes 100% of the target
population. This kind of sample is also called a census. Although a census can be useful in
some cases, for large populations it is often inaccurate. It is likely that not every member of
the target population can be reached, and those who cannot be reached are often
systematically different from those who can be. In the US, census workers have more
difficulty finding and interviewing the homeless, the poor, and the very rich. These groups
are likely to be under-represented in a pure census.
¹ However, if the population in question is small or the sample size is large proportionate to
the population, adjustments can be made based on population size.
Simple Random Sampling: Once a sample size has been determined (see above), the
simplest way to select a sample is by simple random sampling, or SRS. SRS can be
accomplished by pulling names out of a hat, assigning a number to each individual in the
sampling frame and using a random number generator, or other similar methods. The
probability of selecting any potential respondent is equal to the chance of selecting anyone
else. Although SRS is supposed to be simple, it is in fact often laborious to implement
(Kalton 1983, 8-15).
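When the sampling frame already exists as a list, the random number generator approach to SRS might look like the following Python sketch. The frame contents and the function name are hypothetical; the sketch draws without replacement, so no respondent can be selected twice.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n members of the sampling frame without replacement,
    giving every member the same chance of selection."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)


if __name__ == "__main__":
    # Hypothetical frame: 500 numbered individuals.
    frame = [f"person_{i}" for i in range(1, 501)]
    print(simple_random_sample(frame, 10, seed=2011))
```

Recording the seed lets another researcher reproduce exactly the same draw from the same frame.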
Systematic Sampling: In systematic sampling, the researcher lists all the members of his
or her sampling frame, randomly selects a starting point, and then selects each member of
the frame that falls a set period after the start. The period is defined by the sample size and
the number of individuals in the sampling frame. As an example, if the period is 8 and the
random starting point is the 36th member of the sampling frame, the researcher would then
pick the 44th, 52nd, 60th, etc. members of the frame until the sample was filled (Kalton
1983, 16-19).
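The example above can be reproduced with a short Python sketch. It assumes a circular variant of systematic sampling, in which the count wraps around to the top of the list when it runs past the end; the wrap-around rule, the function name, and the frame of 800 numbered members are assumptions made for illustration.

```python
import random


def systematic_sample(frame, n, start=None, seed=None):
    """Select n members at a fixed period, beginning at a random start.

    start is a 0-based position in the frame; if omitted, it is drawn
    at random. Selection wraps around the end of the frame (circular).
    """
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the size of the frame")
    period = len(frame) // n  # e.g. 800 members / 100 wanted = 8
    if start is None:
        start = random.Random(seed).randrange(len(frame))
    return [frame[(start + i * period) % len(frame)] for i in range(n)]


if __name__ == "__main__":
    frame = list(range(1, 801))  # members numbered 1..800
    # The 36th member sits at 0-based position 35.
    sample = systematic_sample(frame, 100, start=35)
    print(sample[:4])  # the 36th, 44th, 52nd and 60th members
```

Because the period is the frame size divided by the sample size, rounded down, the wrap-around never selects the same member twice.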
Stratification: If a researcher requires additional statistical power in order to study a
sub-population of interest, he or she may draw more respondents from that sub-population, or
stratum, than a random sample would dictate. This requires that the researcher have an
accurate idea of the proportion of the sub-population in the larger population and also that
it be possible to draw a separate sample for each stratum. For example, if a researcher is
surveying undergraduates at CU but is particularly interested in male First Years, he or she
would stratify by class and sex and oversample for male First Years (Kalton 1983, 19-20).
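A minimal Python sketch of this kind of oversampling follows. It assumes the frame records each unit's stratum and that the researcher fixes the number to draw from each stratum in advance. It also attaches a design weight (stratum size divided by stratum sample size) to each selected unit, a standard device for restoring population proportions at the analysis stage that this summary does not itself discuss. All names and numbers are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_of, allocation, seed=None):
    """Draw a separate simple random sample within each stratum.

    frame:      list of units
    stratum_of: function mapping a unit to its stratum label
    allocation: dict of stratum label -> number of units to draw
    Returns a list of (unit, weight) pairs, where weight = N_h / n_h.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for stratum, n_h in allocation.items():
        if n_h == 0:
            continue
        members = strata[stratum]
        weight = len(members) / n_h  # each pick stands for this many units
        for unit in rng.sample(members, n_h):
            sample.append((unit, weight))
    return sample


if __name__ == "__main__":
    # Hypothetical frame of 1,000 students: (id, class, sex).
    frame = [(i, "First Year" if i < 200 else "Other",
              "M" if i % 2 == 0 else "F") for i in range(1000)]
    allocation = {("First Year", "M"): 50, ("First Year", "F"): 20,
                  ("Other", "M"): 20, ("Other", "F"): 20}
    picked = stratified_sample(frame, lambda u: (u[1], u[2]), allocation, seed=1)
    print(len(picked), sum(w for _, w in picked))
```

In this hypothetical allocation, male First Years make up 10% of the frame but nearly half of the sample; the weights sum back to the frame size.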
Cluster Sampling: In cluster sampling, the researcher first selects a grouping within the
larger survey population, and then systematically samples individuals within the selected
group. This technique is often employed when sampling frames are not available, as is
common in many countries of the global South. In such a case, the clusters are often communities, which
can be listed by the researcher even if the populations of these communities cannot be. The
researcher randomly selects communities, then enumerates the inhabitants of each
selected community and then randomly selects from this new sub-frame. If all clusters are
of equal size, then the researcher has a fairly easy time of it. If they are not, then the
researcher must adjust the probability of selecting the cluster accordingly, so that larger
clusters are more likely to be selected than smaller clusters (Kalton 1983, 28-37).
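The two stages can be sketched in Python as follows. The sketch assumes the researcher has at least a rough size estimate for each community, selects communities with probability proportional to that size (with replacement, for simplicity, so a large community may be drawn more than once), and then draws a simple random sample within each selected community. The function names, the `enumerate_cluster` callback standing in for fieldwork enumeration, and the community sizes are all hypothetical.

```python
import random


def two_stage_cluster_sample(cluster_sizes, enumerate_cluster,
                             n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters with probability proportional to size.
    Stage 2: enumerate each picked cluster and sample within it.

    cluster_sizes:     dict of cluster name -> estimated population
    enumerate_cluster: function returning the list of inhabitants
                       of a named cluster (built in the field)
    Returns a list of (cluster name, inhabitant) pairs.
    """
    rng = random.Random(seed)
    names = list(cluster_sizes)
    weights = [cluster_sizes[name] for name in names]
    # Selection WITH replacement: larger clusters are more likely to be
    # chosen, and the same cluster can appear more than once.
    chosen = rng.choices(names, weights=weights, k=n_clusters)

    sample = []
    for name in chosen:
        residents = enumerate_cluster(name)  # the new sub-frame
        take = min(n_per_cluster, len(residents))
        sample.extend((name, person) for person in rng.sample(residents, take))
    return sample


if __name__ == "__main__":
    # Hypothetical communities and estimated populations.
    sizes = {"Village A": 1200, "Village B": 300, "Village C": 500}

    def enumerate_cluster(name):
        return [f"{name} resident {i}" for i in range(sizes[name])]

    picked = two_stage_cluster_sample(sizes, enumerate_cluster,
                                      n_clusters=2, n_per_cluster=25, seed=7)
    print(len(picked))
```

Only the selected communities need to be enumerated, which is what makes the approach workable when no list of individuals exists for the whole survey population.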
Works Cited
Bernard, H. Russell. 2002. Research Methods in Anthropology. 3rd ed. New York: Altamira
Press.
Creswell, John W. 2009. Research Design: Qualitative, Quantitative, and Mixed Methods
Approaches. 3rd ed. Thousand Oaks, CA: SAGE Publications.
Kalton, Graham. 1983. Introduction to Survey Sampling. SAGE University Paper 35.
Thousand Oaks, CA: SAGE Publications.
Kwan, Mei-Po, and Tim Schwanen. 2009. Quantitative Revolution 2: The Critical (Re)Turn.
The Professional Geographer 61, no. 3 (August):283-291.
Lohr, Sharon L. 2010. Sampling: Design and Analysis. 2nd ed. Boston, MA: Brooks/Cole.
Rosner, Bernard. 2000. Fundamentals of Biostatistics. 5th ed. Pacific Grove, CA: Duxbury.
Schwanen, Tim, and Mei-Po Kwan. 2009. “Doing” Critical Geographies with Numbers. The
Professional Geographer 61, no. 4 (November): 459-464.