2.3 Sampling Designs for Surveys

Chapter 2 Using Surveys and Experimental Studies to Gather Data
Sampling Designs for Surveys
A crucial element in any survey is the manner in which the sample is selected from
the population. If the individuals included in the survey are selected based on convenience alone, there may be biases in the sample survey, which would prevent the
survey from accurately reflecting the population as a whole. For example, a marketing graduate student developed a new approach to advertising and, to evaluate
it, asked the students in a large undergraduate business course
to assess whether it is an improvement over standard advertisements. Would the opinions of this class of students be representative of the general
population of people to which the new approach to advertising would be applied?
The income levels, ethnicity, education levels, and many other socioeconomic characteristics of the students may differ greatly from the population of interest. Furthermore, the students may be coerced into participating in the study by their
instructor and hence may not give the most candid answers to questions on a survey. Thus, the manner in which a sample is selected is of utmost importance to the
credibility and applicability of the study’s results.
In order to precisely describe the components that are necessary for a sample
to be effective, the following definitions are required.
Target population: The complete collection of objects whose description is
the major goal of the study. Designating the target population is a crucial
but often difficult part of the first step in an observational or experimental study. For example, in a survey to decide if a new storm-water
drainage tax should be implemented, should the target population be all
persons over the age of 18 in the county, all registered voters, or all persons paying property taxes? The selection of the target population may
have a profound effect on the results of the study.
Sample: A subset of the target population.
Sampled population: The complete collection of objects that have the
potential of being selected in the sample; the population from which
the sample is actually selected. In many studies, the sampled population
and the target population are very different, which may lead to
erroneous conclusions based on the information collected in the sample.
For example, in a telephone survey of people who are on the property
tax list (the target population), a subset of this population may not
answer their telephone if the caller is unknown, as viewed through caller
ID. Thus, the sampled population may be quite different from the target
population with respect to some important characteristics such as income
and opinion on certain issues.
Observation unit: The object upon which data are collected. In studies involving human populations, the observation unit is a specific individual
in the sampled population. In ecological studies, the observation unit
may be a sample of water from a stream or an individual plant on a plot
of land.
Sampling unit: The object that is actually sampled. We may want to sample
the person who pays the property tax but may only have a list of telephone numbers. Thus, the households in the sampled population serve as
the sampling units, and the observation units are the individuals residing
in the sampled household. In an entomology study, we may sample 1-acre
plots of land and then count the number of insects on individual plants
residing on the sampled plot. The sampling unit is the plot of land; the
observation units are the individual plants.
Sampling frame: The list of sampling units. For a mailed survey, it may be a
list of addresses of households in a city. For an ecological study, it may be
a map of areas downstream from power plants.
In a perfect survey, the target population would be the same as the sampled population. This type of survey rarely happens. There are always difficulties in obtaining a sampling frame or in identifying all elements within the target
population. A particular aspect of this problem is nonresponse. Even if the researcher were able to obtain a list of all individuals in the target population, there
may be a distinct subset of the target population that refuses to fill out the survey
or to be observed. Thus, the sampled population becomes a subset of the target population. Characterizing the nonresponders is
crucial when using a sample to describe a population. The group of
nonresponders may have certain demographics or a particular political leaning
that, if not identified, could greatly distort the results of the survey. An excellent discussion of this topic can be found in the textbook Sampling: Design and Analysis
by Sharon L. Lohr (1999), Pacific Grove, CA: Duxbury Press.
The basic design (simple random sampling) consists of selecting a group of n
units in such a way that each sample of size n has the same chance of being selected.
Thus, we can obtain a random sample of eligible voters in a bond-issue poll by drawing
names from the list of registered voters in such a way that each sample of size n has the
same probability of selection. The details of simple random sampling are discussed in
Section 4.11. At this point, we merely state that a simple random sample will contain
as much information on community preference as any other sample survey design,
provided all voters in the community have similar socioeconomic backgrounds.
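As a minimal sketch of the selection step just described, the following Python code draws a simple random sample of n names from a registration list. The voter list, its size, and the function name are hypothetical and chosen only for illustration. The standard-library call random.sample draws without replacement, so every subset of size n has the same chance of selection.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)  # seeded only so the illustration is repeatable
    return rng.sample(frame, n)

# Hypothetical sampling frame: 5,000 registered voters identified by number.
voters = [f"voter_{i}" for i in range(1, 5001)]
sample = simple_random_sample(voters, n=100, seed=1)
print(len(sample), "voters selected; first three:", sample[:3])
```

In practice the frame would be the actual list of registered voters, and the seed would be omitted or recorded for auditing.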
Suppose, however, that the community consists of people in two distinct income brackets, high and low. Voters in the high-income bracket may have opinions
on the bond issue that are quite different from the opinions of low-income bracket
voters. Therefore, to obtain accurate information about the population, we want
to sample voters from each bracket. We can divide the population elements into
two groups, or strata, according to income and select a simple random sample
from each group. The resulting sample is called a stratified random sample. (See
Chapter 5 of Scheaffer et al., 2006.) Note that stratification is accomplished by
using knowledge of an auxiliary variable, namely, personal income. By stratifying
on high and low values of income, we increase the accuracy of our estimator.
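The two-stratum design above can be sketched in Python as follows. The stratum sizes, the labels, and the 2% proportional allocation are assumptions made for illustration; the text does not specify how the sample should be divided between strata.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Take an independent simple random sample from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical frame: voters already classified by income bracket.
strata = {
    "high": [f"high_{i}" for i in range(1000)],
    "low": [f"low_{i}" for i in range(4000)],
}
# Assumed allocation: sample 2% of each stratum (proportional allocation).
sizes = {name: len(units) // 50 for name, units in strata.items()}
strat_sample = stratified_sample(strata, sizes, seed=1)
print({name: len(units) for name, units in strat_sample.items()})
```

Because a sample is drawn within every stratum, both income brackets are guaranteed to be represented, which a single simple random sample cannot promise.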
Ratio estimation is a second method for using the information contained in an auxiliary variable. Ratio estimators use measurements not only on the response of
interest but also on an auxiliary variable. Ratio
estimation can also be used with stratified random sampling.
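The text does not give a formula, so the following is only a sketch of one common form of the ratio estimator: the population total of the response y is estimated by multiplying the sample ratio (sum of y divided by sum of x) by the known population total of the auxiliary variable x. The store sales figures and the population total are hypothetical.

```python
def ratio_estimate_total(y, x, x_population_total):
    """Estimate the population total of y as (sum(y) / sum(x)) * known total of x."""
    if len(y) != len(x) or not y:
        raise ValueError("y and x must be nonempty and of equal length")
    r = sum(y) / sum(x)  # sample ratio of response to auxiliary variable
    return r * x_population_total

# Hypothetical sample of 4 stores: current-year sales (y) and prior-year sales (x).
y = [12.0, 30.0, 18.0, 40.0]
x = [10.0, 25.0, 15.0, 30.0]
# Assumed known prior-year total over all stores in the population.
est = ratio_estimate_total(y, x, x_population_total=1600.0)
print(est)
```

Here the sample ratio is 100/80 = 1.25, so the estimated current-year total is 1.25 times the known prior-year total.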
Although individual preferences are desired in the survey, a more economical procedure, especially in urban areas, may be to sample specific families, apartment buildings, or city blocks rather than individual voters. Individual preferences
can then be obtained from each eligible voter within the unit sampled. This technique is called cluster sampling. Although we divide the population into groups for
both cluster sampling and stratified random sampling, the techniques differ. In
stratified random sampling, we take a simple random sample within each group,
whereas in cluster sampling, we take a simple random sample of groups and then
sample all items within the selected groups (clusters). (See Chapters 8 and 9 of
Scheaffer et al., 2006, for details.)
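To contrast with stratified sampling, the following Python sketch selects a simple random sample of clusters and then keeps every unit inside the selected clusters. The city blocks, the number of voters per block, and the function name are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select a simple random sample of clusters, then keep every unit in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample cluster names
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical frame: 50 city blocks, each containing 8 eligible voters.
blocks = {
    f"block_{b}": [f"block_{b}_voter_{v}" for v in range(8)]
    for b in range(50)
}
selected = cluster_sample(blocks, n_clusters=5, seed=1)
print(sorted(selected), sum(len(v) for v in selected.values()))
```

Note that randomness enters only at the block level; within a selected block no further sampling takes place.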
Sometimes, the names of persons in the population of interest are available
in a list, such as a registration list, or on file cards stored in a drawer. For this situation, an economical technique is to draw the sample by selecting one name near
the beginning of the list and then selecting every tenth or fifteenth name thereafter.
If the sampling is conducted in this manner, we obtain a systematic sample. As you
might expect, systematic sampling offers a convenient means of obtaining sample
information; unfortunately, we do not necessarily obtain the most information for
a specified amount of money. (Details are given in Chapter 7 of Scheaffer et al., 2006.)
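A systematic sample can be sketched in Python with a list slice. Choosing the starting position at random among the first k names is an assumption on our part; the text says only that one name near the beginning of the list is selected. The list of names is hypothetical.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Pick a random start among the first k units, then take every k-th unit."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # index 0 .. k-1
    return frame[start::k]

# Hypothetical registration list of 1,000 names; select every tenth name.
names = [f"name_{i}" for i in range(1, 1001)]
sys_sample = systematic_sample(names, k=10, seed=1)
print(len(sys_sample), sys_sample[:3])
```

With 1,000 names and k = 10, every possible start yields exactly 100 names.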
The following example illustrates how the goal of the study or the information available about the elements of the population determines which type of
sampling design to use in a particular study.
Identify the type of sampling design in each of the following situations.
a. The selection of 200 people to serve as potential jurors in a medical
malpractice trial is conducted by assigning a number to each of 140,000
registered voters in the county. A computer software program is used to
randomly select 200 numbers from the numbers 1 to 140,000. The people
having these 200 numbers are sent a postcard notifying them of their
selection for jury duty.
b. Suppose you are selecting microchips from a production line for inspection for bent probes. As the chips proceed past the inspection point,
every 100th chip is selected for inspection.
c. The Internal Revenue Service wants to estimate the amount of personal
deductions taxpayers made based on the type of deduction: home office,
state income tax, property taxes, property losses, and charitable contributions. The amount claimed in each of these categories varies greatly
depending on the adjusted gross income of the taxpayer. Therefore, a
simple random sample would not be an efficient design. The IRS decides
to divide taxpayers into five groups based on their adjusted gross incomes
and then takes a simple random sample of taxpayers from each of the
five groups.
d. The USDA inspects produce for E. coli contamination. As trucks carrying produce cross the border, each truck is stopped for inspection. A random sample of five containers is selected for inspection from the
hundreds of containers on the truck. Every apple in each of the five containers is then inspected for E. coli.
Solution
a. A simple random sample is selected using the list of registered voters as
the sampling frame.
b. This is an example of systematic random sampling. This type of inspection should provide a representative sample of chips because there is no
reason to presume that there exists any cyclic variation in the production
of the chips. It would be very difficult in this situation to perform simple
random sampling because no sampling frame exists.
c. This is an example of stratified random sampling with the five adjusted gross income groups
serving as the strata. Overall the personal deductions
of taxpayers increase with income. This results in the stratified random
sample having a much smaller total sample size than would be required
in a simple random sample to achieve the same level of precision in its
estimates.
d. This is a cluster sampling design with the clusters being the containers
and the individual apples being the observation units.
The important point to understand is that there are different kinds of surveys
that can be used to collect sample data. For the surveys discussed in this text, we
will deal with simple random sampling and methods for summarizing and analyzing data collected in such a manner. More complicated surveys lead to even more
complicated problems at the summarization and analysis stages of statistics.
The American Statistical Association (http://www.amstat.org) publishes a
series of brochures on surveys: What Is a Survey?; How to Plan a Survey; How to
Collect Survey Data; Judging the Quality of a Survey; How to Conduct Pretesting;
What Are Focus Groups?; and More about Mail Surveys. These describe many
of the elements crucial to obtaining a valid and useful survey. They list many of
the potential sources of errors commonly found in surveys, with guidelines on how
to avoid these pitfalls. A discussion of some of the issues raised in these brochures
follows.
Problems Associated with Surveys
Even when the sample is selected properly, there may be uncertainty about
whether the survey represents the population from which the sample was selected.
Two of the major sources of uncertainty are nonresponse, which occurs when a
portion of the individuals sampled cannot or will not participate in the survey, and
measurement problems, which occur when the respondent’s answers to questions
do not provide the type of data that the survey was designed to obtain.
Survey nonresponse may result in a biased survey because the sample is not
representative of the population. It is stated in Judging the Quality of a Survey that
in surveys of the general population women are more likely to participate than
men; that is, the nonresponse rate for males is higher than for females. Thus, a political poll may be biased if the percentage of women in the population in favor of
a particular issue is larger than the percentage of men in the population supporting
the issue. The poll would overestimate the percentage of the population in favor of
the issue because the sample had a larger percentage of women than their percentage in the population. In all surveys, a careful examination of the nonresponse
group must be conducted to determine whether a particular segment of the population may be either under- or overrepresented in the sample. Some of the remedies
for nonresponse are
1. Offering an inducement for participating in the survey
2. Sending reminders or making follow-up telephone calls to the individuals
who did not respond to the first contact
3. Using statistical techniques to adjust the survey findings to account for
the sample profile differing from the population profile
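The third remedy can be illustrated with a small worked example. The sketch below uses poststratification weighting, which is one possible adjustment technique and not necessarily the one the brochures have in mind. All counts and population shares are hypothetical: women make up half of the population but 60% of the respondents, and they favor the issue at a higher rate than men.

```python
def poststratified_proportion(favor, respondents, population_share):
    """Weight each group's sample proportion by its known share of the population."""
    return sum(
        population_share[g] * favor[g] / respondents[g] for g in respondents
    )

# Hypothetical poll results by group.
respondents = {"women": 600, "men": 400}       # number who responded
favor = {"women": 360, "men": 160}             # number in favor of the issue
population_share = {"women": 0.5, "men": 0.5}  # assumed known population profile

unadjusted = sum(favor.values()) / sum(respondents.values())
adjusted = poststratified_proportion(favor, respondents, population_share)
print(f"unadjusted: {unadjusted:.2f}, adjusted: {adjusted:.2f}")
```

The unadjusted estimate is 520/1000 = 0.52, whereas weighting the group proportions (0.60 for women, 0.40 for men) by their population shares gives 0.50. The difference is the overestimate caused by women being overrepresented among the respondents.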
Measurement problems are the result of the respondents not providing the information that the survey seeks. These problems often are due to the specific wording of questions in a survey, the manner in which the respondent answers the survey