MOL 501: Sampling - Gonzaga University

# 5: Sampling, pg 1
MOL 501: Probability and Nonprobability Sampling
(Revised 11/11/07)
Required Reading: Chs. 8 & 9.
The last lecture ended with a discussion of external validity—the extent to which one can
generalize one’s findings from a study to the larger world. Sometimes it is not enough to know
whether or not a hypothesis is supported for the people studied; we often really want to know if
the hypothesis would be supported if it were tested with other people in other settings. One of the
most important determinants of external validity is the quality of the sample studied. A “good”
sample is one that is so similar in composition to the larger world we are interested in that what
is true for the sample is also true for the larger world. Thus, in many studies, the quality of the
sample studied becomes the major influence on external validity.
In most social science research, external validity is a very important issue. However,
external validity is not always a concern in organizational research. Often, organizational
research is not concerned with people outside of the organization actually studied. For example,
studies of the effectiveness of policies, programs and innovations within a particular organization
are focused entirely on the members of the organization itself. Whether or not the same policies,
programs or innovations would be equally beneficial for other organizations is of no interest to
the researcher. If the organization is not too large and is limited to only one location, it is
possible to study every member of the organization. If every member of the organization
participates in the study, and the researcher is not interested in generalizing to some larger
population, then external validity is not likely to be a problem. However, if the organization is
very large or dispersed geographically, it may be too expensive and time consuming to include
every member of the organization in the study. Whenever some members cannot be studied,
external validity and, therefore, the quality of the sample become important considerations.
I. Key Concepts
The logic of sampling is based on a few key concepts. Before discussing various types of
samples and their strengths and weaknesses, it is best to make sure these concepts make sense.
Population/Universe. All hypotheses refer to some population or group of people or
organizations. The group to which the hypothesis applies is referred to as the population or
universe. If we are doing an evaluation study of the effectiveness of a program within a single
organization, then the population or universe is all of the members of that organization. If the
program only applies to some members, then the population is limited to those members of the
organization to which the program applies. For example, if we are interested in the effect of an
on-site day care facility on the productivity of workers, the population would be limited to
workers with children of the appropriate age for daycare.
Population parameters. A population is defined by certain characteristics, and these
characteristics are the population parameters. For example, the population parameters for the
daycare study example described above would be having responsibility for the care of one or
more children, and being an employee of the organization.
Population element. A single member of the population is referred to as a population
element. The term “population element” is used instead of “member” or “person” because not all
populations consist of individuals. For example, if we are studying the effect of carpeting on the
noise level in classrooms, the population is not people but classrooms, and a population element
would be a single classroom, not each student. Similarly, if we were doing a study of the effects
of different kinds of leadership on the performance of teams, the population size would be the
number of teams, not the number of people in the teams. This is an important distinction because
tests of the statistical significance of findings are often reported in the literature, and sample size
has a large effect on many tests of significance. Pretending that a sample of classes or teams is
really a sample of individuals exaggerates the size of the sample (often by a factor of 10, 30, 50,
or even 100). When this kind of error occurs the author of the article may claim that the
independent variable (cause) produced a “significant difference” in the dependent variable, when
in fact, the difference may be due to random sampling error.
A good example of this kind of error can be found in published studies of the effect of
various factors on college students’ evaluations of their classes and instructors. Many published
studies report that variables like time of day, gender of the teacher, grading practices, and the like
produce “statistically significant” differences in the ratings instructors receive, but their tests of
the “significance” of the differences are calculated using the number of students who completed
the surveys as the sample size. Since the sample is really a sample of classes, not students, they
are exaggerating the size of the sample by a factor equal to the average size of the classes studied.
If, for example, researchers studied student evaluations of 20 classes and the average size of
those classes was 40 students, and they misrepresented the sample as a sample of students rather
than as a sample of classes, the sample size they use to calculate whether or not the effects were
“statistically significant” would be 800, when, in fact the size of the sample was really only 20.
Whether an effect or difference is “statistically significant” depends partly on the size of the
sample, because sample size (or “degrees of freedom”) is entered into the formulas for statistical
significance. Consequently, some very small and trivial results, or results that occurred by chance,
may be reported as “significant.” As a result of such errors, I am sure that some college faculty
are worrying and obsessing over the results of such studies when, in fact, most of the effects
have little or no real impact on their students’ ratings.
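As a rough illustration of why the unit of analysis matters, the Python sketch below compares the standard error of a mean when the sample size is taken to be 800 students and when it is taken to be 20 classes. The standard deviation of 1.0 rating points is a made-up value, and the same spread is assumed in both cases purely to isolate the effect of sample size on the formula.

```python
import math

# Hypothetical spread of ratings; not taken from any real study.
ASSUMED_SD = 1.0

def standard_error(sd, n):
    """Standard error of a mean: sd divided by the square root of n."""
    return sd / math.sqrt(n)

n_classes = 20
avg_class_size = 40
n_students = n_classes * avg_class_size  # the inflated sample size

se_students = standard_error(ASSUMED_SD, n_students)
se_classes = standard_error(ASSUMED_SD, n_classes)

# How many times larger the standard error is when classes are the unit.
inflation = se_classes / se_students
print(n_students, round(inflation, 2))
```

Under these assumptions the standard error based on classes is the square root of 40 (a little over six) times larger than the one based on students, so a difference that looks "significant" with the inflated sample size may not be significant at all.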
Representative or Isomorphic Sample. A sample is representative if it is isomorphic, that is,
if it matches the population in all regards except size. A truly representative sample is a
small microcosm of the population that matches it perfectly in every detail. That is, the make-up
of the sample in regard to types of people (or other elements) exactly matches the make-up of the
population. It will have the same proportion of men and women, the same proportion of rich and
poor people, the same proportion of blue-eyed people, etc. as the population. Achieving a
representative sample is often (but not always) the goal of sampling, but it is never achieved.
Thus, representative sampling refers to strategies used to come as close to this goal as possible.
Probability sample or random sample. A probability sample or random sample is a
sample in which each element of the population has an equal chance of being included, and for
which the probability of including a particular element is known. For example, in a small
probability sample of employees drawn from an organization of 163 members, each population
element would have a one in 163 chance of being the first person chosen. It is important to realize
that the requirements of probability sampling are very strict. For example, selecting students to
complete a survey by going to the student union and asking each student who enters to complete
a survey is not probability sampling. Not all students go to the student union, and of those who
do, some go there more often than others. As a result, not all students have an equal chance of
being chosen for the sample, and there is no way to calculate the probability that a particular
population member will end up in the sample.
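The idea of an equal and known chance of selection can be sketched in a few lines of Python. The organization size of 163 comes from the example above; the sample size of 20, the employee ID numbers, and the random seed are assumptions made only for illustration.

```python
import random

POPULATION_SIZE = 163   # organization size from the example above
SAMPLE_SIZE = 20        # hypothetical; the text does not give a sample size

rng = random.Random(42)  # fixed seed so the draw can be repeated
employee_ids = list(range(1, POPULATION_SIZE + 1))

# random.sample draws without replacement, so no employee appears twice.
sample = rng.sample(employee_ids, SAMPLE_SIZE)

# Every employee has the same, known chance of ending up in the sample.
inclusion_probability = SAMPLE_SIZE / POPULATION_SIZE
print(sorted(sample))
print(round(inclusion_probability, 3))
```

No comparable calculation is possible for the student union example, because nobody knows how often each student walks through the door.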
Nonprobability sample. A nonprobability sample is any sample that involves selecting
members by some method that does not involve random selection. Some nonprobability samples
attempt to achieve representativeness without random selection, while other nonprobability
samples involve selecting people without regard to representativeness. For example, outlier
samples involve selecting people who are known to be atypical, the exact opposite of the goal
of representativeness.
II. Probability Samples
While the best way to achieve external validity is to include every member of the
population in a study, the next best approach is to use a probability sample. There is no guarantee
that a probability sample will be perfectly representative, but it gives us the best odds of
achieving representativeness. Furthermore, using probability math, we can estimate the
likelihood that any given probability sample will be representative. We can also use probability
math to select a sample size that will give us a particular probability of achieving
representativeness. For this reason, whenever representativeness is the goal, it is best to use a
probability sample.
Unfortunately, the requirements for a sample to be a true probability sample (or “random
sample”) are so strict that probability sampling can be quite difficult and expensive. Generally, it is
very difficult to draw a probability sample unless we either have a list of all members of the
population, or we can count on them all being at the same place at the same time. For example,
we could draw a random sample of all employees of a particular organization by selecting every
seventh person off a list of the employees, or we could draw a random sample of the same group
by asking all members to meet in the auditorium and selecting every seventh person by having them
count off by sevens. However, it would probably be impossible to draw a probability sample of
people who engage in employee theft. There is no list of employee thieves (only those who have
been caught could end up on a list), and there is no place where all employee thieves congregate
because they want to blend in with the general population of employees.
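Selecting every seventh person from a list could be sketched as follows. The roster of 140 names is hypothetical, and choosing the starting position at random is an assumption added here so that every person on the list has a chance of being picked; the text itself only specifies taking every seventh name.

```python
import random

def every_kth(roster, k, rng):
    """Pick a random starting position among the first k names, then take every k-th name."""
    start = rng.randrange(k)
    return roster[start::k]

rng = random.Random(7)
roster = [f"employee_{i:03d}" for i in range(1, 141)]  # hypothetical list of 140 names
selected = every_kth(roster, 7, rng)
print(len(selected), selected[:3])
```

The same logic applies to counting off by sevens in the auditorium: the count-off plays the role of the list positions.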
Types of Probability Samples
There are several types of probability samples, and each has its advantages and
disadvantages.
The simple random sample. The simple random sample comes closest to meeting the
strict definition of random sampling, and it is usually the method most likely to achieve
representativeness. Members of the population are usually selected from a list by lot (for
example by putting all names in a hat, shaking it up, and selecting a sample of names), or they
are selected in some other systematic way that gives every member an equal chance of being
selected. In real life, even a simple random sample is not a true probability sample because as
each member is selected, the odds of being included go up. For example, if we have a population
of 100 people and we select our sample by drawing slips of paper with their names out of a hat
containing the names of every member, the odds of being chosen first are one out of 100, but the
odds of being chosen second are “only” one out of 99, the odds of being chosen third are one out
of 98, and so on. The only way to achieve perfect random sampling would be to return each slip
chosen back to the hat so there are always 100 names in it. Of course, random sampling with
replacement is not desirable because we do not want to take the chance that the same person
will be chosen several times. As a consequence, simple random samples actually used in research
are almost always selected by random sampling without replacement.
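The draw-by-draw odds described above, and the practical difference between sampling with and without replacement, can be checked with a short Python sketch. The names, the sample size of 30, and the seed are hypothetical.

```python
import random
from fractions import Fraction

POPULATION_SIZE = 100

# Odds that a given remaining name is picked on the 1st, 2nd and 3rd draw
# when slips are NOT returned to the hat.
odds_without_replacement = [Fraction(1, POPULATION_SIZE - drawn) for drawn in range(3)]
# With replacement the hat always holds 100 slips.
odds_with_replacement = [Fraction(1, POPULATION_SIZE)] * 3

rng = random.Random(1)
names = [f"person_{i}" for i in range(1, POPULATION_SIZE + 1)]
with_replacement = rng.choices(names, k=30)    # may contain repeats
without_replacement = rng.sample(names, k=30)  # never contains repeats

print(odds_without_replacement)
print(len(set(with_replacement)), len(set(without_replacement)))
```

Comparing the number of distinct names in the two samples shows why researchers prefer sampling without replacement: only that method guarantees 30 different people.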
Simple random samples are sometimes too expensive or too time consuming to use. For
example, if we want to compare two groups of employees, one of which is very small, we would
need a huge simple random sample in order to obtain enough members of the smaller group to
make meaningful comparisons. (For example, comparing female CEOs to male CEOs of large
corporations would require a very large sample of CEOs because women make up only a tiny
fraction of CEOs in the population. Making sure we have a subsample of only 50 female
CEOs would probably require drawing a sample of several thousand CEOs.) In general, the
larger the sample, the more time consuming and expensive the study. Similarly, studying a simple
random sample of employees of an organization with offices scattered all over the U.S. would be
extremely time consuming and expensive because we would have to travel to many different
locations to observe or survey the members of our sample. Two modifications of simple random
sampling are often used to reduce the costs and effort required by simple random sampling.
The stratified random sample. If our primary goal is to compare two or more groups to
each other, and if those groups are unequal in size, the size of the sample (and, therefore the
costs and effort required to study it) can be reduced by drawing a stratified random sample.
This technique involves first dividing up the population into the groups we are interested in, and
then drawing a probability sample of each. For example, if we are interested in comparing male
CEOs to female CEOs, we could reduce our costs by first dividing a list of CEOs into two
groups or strata (males and females), and then drawing a probability sample of 50 from each
group (or stratum).
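A minimal Python sketch of the two steps, dividing the list into strata and then sampling each stratum, is shown below. The list of 5,000 CEOs and the 2% share of women are invented figures used only to make the arithmetic concrete.

```python
import random

rng = random.Random(3)

# Hypothetical list of 5,000 CEOs in which 2% (100) are women.
ceos = [{"id": i, "sex": "F" if i < 100 else "M"} for i in range(5000)]

# Step 1: divide the list into strata.
strata = {"F": [], "M": []}
for ceo in ceos:
    strata[ceo["sex"]].append(ceo)

# Step 2: draw a random sample of 50 from each stratum.
PER_STRATUM = 50
stratified_sample = {sex: rng.sample(members, PER_STRATUM)
                     for sex, members in strata.items()}
total_stratified = sum(len(s) for s in stratified_sample.values())

# Rough size of a simple random sample expected to contain 50 women.
expected_srs_size = PER_STRATUM * len(ceos) // len(strata["F"])
print(total_stratified, expected_srs_size)
```

With these made-up numbers the stratified design needs 100 CEOs in total, while a simple random sample would need roughly 2,500 to be expected to include 50 women.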
Using equal sized subgroups to compare the groups is an effective approach so long as
each subgroup sample is selected randomly from the population of members of that subgroup.
However, if we want to accurately represent the population as a whole, stratified random
sampling would not be a good choice because we are purposely oversampling one group (in
our example, women CEOs). As a consequence, any generalizations drawn about the group as a
whole would be misleading because the oversampled group would have too much influence on
our results. Of course, we could get around this problem by making sure the proportion of the
overall sample that belongs to each stratum is equal to the proportion in the population, but that
would negate the money-saving and time-saving benefits of stratified random sampling because
we would end up with a very large sample.
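The distortion caused by oversampling can be illustrated with simple arithmetic. In the Python sketch below, the group averages and the population shares are hypothetical values chosen only to show how far a pooled result can drift from the figure for the population as a whole.

```python
# Hypothetical stratum means on some 0-100 measure; not real data.
stratum_mean = {"men": 60.0, "women": 70.0}
population_share = {"men": 0.98, "women": 0.02}   # assumed make-up of the population
sample_size = {"men": 50, "women": 50}            # equal-sized strata, as in the example

# Pooling the stratified sample treats both groups as half of the population.
pooled_mean = (sum(stratum_mean[g] * sample_size[g] for g in stratum_mean)
               / sum(sample_size.values()))

# Mean for the population as a whole, using each group's actual share.
population_mean = sum(stratum_mean[g] * population_share[g] for g in stratum_mean)

print(pooled_mean, round(population_mean, 1))
```

Under these assumptions the pooled sample average is 65, while the average for the whole population is close to 60, because women make up half of the sample but only a small fraction of the population.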
Cluster samples. Although the stratified random sample is excellent for comparing
unequal sized groups, it can yield misleading results if we try to use it to study the group as a
whole. However, cluster sampling, or multistage sampling, can be used when the goal is to
describe the population as a whole for the least cost, and the cost savings are especially large if
the population is spread out geographically.
Cluster sampling involves identifying clusters in the population, drawing a random sample
of those clusters, and then drawing a random sample of individual population elements from
each cluster selected. For example, if we wanted to study a probability sample of employees of
an organization with offices scattered throughout the U.S., we could save a lot of money and
effort by first getting a list of the various offices of the organization and drawing a random
sample of the offices; then we could go to only the offices selected, and study a sample of
members from each. Similarly, if we wanted to administer surveys to a sample of G.U.
undergraduates living on campus, we could save a lot of time by first drawing a random sample
of dormitories and other campus housing from the list of all G.U. residences, and then going only to
those residences to distribute surveys either to every member of each residence or to a
randomly selected sample from each residence. For example, drawing a cluster sample by
first randomly selecting five residence halls from the 15 G.U. residence halls and two apartments
from the six G.U. apartments would cut the time and effort of obtaining surveys from a random
sample of G.U. students living on campus to a third of the time and effort needed to obtain the same
number of surveys from a simple random sample.
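The two stages of the residence example can be sketched in Python as follows. The counts of residence halls and apartments come from the example; the site names, the 40 residents per site, and the 10 students surveyed per site are hypothetical.

```python
import random
from fractions import Fraction

rng = random.Random(11)

# Site counts come from the example; the names and resident lists are hypothetical.
residence_halls = [f"hall_{i}" for i in range(1, 16)]   # 15 residence halls
apartments = [f"apartment_{i}" for i in range(1, 7)]    # 6 apartment buildings
residents = {site: [f"{site}_student_{j}" for j in range(1, 41)]
             for site in residence_halls + apartments}

# Stage 1: draw a random sample of clusters.
chosen_sites = rng.sample(residence_halls, 5) + rng.sample(apartments, 2)

# Stage 2: draw a random sample of students within each chosen cluster.
STUDENTS_PER_SITE = 10  # hypothetical
cluster_sample = [s for site in chosen_sites
                  for s in rng.sample(residents[site], STUDENTS_PER_SITE)]

share_of_sites_visited = Fraction(len(chosen_sites), len(residents))
print(len(cluster_sample), share_of_sites_visited)
```

Seven of the 21 sites are visited, which is where the "one third" figure in the example comes from.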
III. Nonprobability Samples
While probability samples are the best for obtaining representative samples, they are not
always desirable or necessary. Some populations cannot be studied with probability samples at
all because there is no list of the members, and members may be hard to locate. For example, it is
impossible to study probability samples of deviant groups (like employees who steal from the
company), groups with no permanent location (like the homeless), or even potential customers or
clients (like the population of people who wear hiking boots or the population in need of some
kind of professional services) because there is no list of the population of these groups, and even
if there were such a list, tracking them down could be impossible. As a consequence, a number of
nonprobability sampling techniques have been developed.
As is the case with the various types of probability samples, each type of nonprobability
sample has strengths and weaknesses, and the best choice depends on the kind of research being
conducted.
A. Accidental or Convenience Samples
When lay persons say they selected people “at random” they are often referring to
convenience sampling or accidental sampling (the two terms are interchangeable) rather than true
random sampling. With convenience sampling we simply reach out to whoever is immediately
available and take the first population elements we come across. Convenience samples are the
lowest cost samples in terms of time and effort, but they are also the least likely to be
representative. Taking whoever is within easy reach almost guarantees that the sample will not
be representative of the population. If we simply sample from the people around us at school or
work, there is a good chance that the sample will primarily consist of people who are very
similar to us. If students sample their classmates, they will be oversampling people with
interests, backgrounds, age, and career goals similar to their own. Of course, we could go to
some convenient location that is not a regular stop in our daily routine in an effort to get a wider
range of respondents, but such a sample will probably not be representative. For example, many
marketing studies are done with “intercept samples”—convenience samples drawn by going to a
public place (like a mall) and asking people to complete a survey as they walk by. However, a
mall intercept sample will still not be representative of populations other than the patrons of that
particular mall. Some people are particularly likely to spend a lot of time at malls (for example,
teenagers who go to the mall to “hang out” with their friends), while others will go to great lengths
to avoid shopping at malls. Furthermore, most malls are located in suburban and urban settings,
so people living in rural areas are likely to be underrepresented.
Generally, convenience sampling should be our last choice, but sometimes limitations on
time and money make convenience sampling the only practical choice. One way to improve the
quality of convenience samples is to use time sampling. For example, if we must use a mall
intercept sample, we can make it more representative of the people who use that mall by
sampling at different times of the day and different days of the week. (We could even draw a
probability sample of times and days.) The population that goes to the mall on weekdays while
school is in session and most people are at work is probably quite different from the population
that normally goes to the mall in the evening or on weekends. Similarly, it is possible to make
convenience samples more representative of the population we wish to study by sampling people
from several different locations because different populations sometimes use different locations
for their routine activities. For example, students forced to use a convenience sample drawn from
the student population can improve the quality of the sample by going to different locations (the
student union, several dormitories, campus dining halls frequented by students who live off
campus, the library, etc.) to obtain a more diverse sample. Of course, there is no way to make
such a sample completely representative of the population, but such efforts will help reduce
some aspects of the oversampling and undersampling that inevitably result when we sample the
people most convenient for us to study.
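Drawing a probability sample of times and days, as suggested for the mall intercept survey, might look like the Python sketch below. The three time blocks and the choice of six visits are assumptions made for illustration.

```python
import random
from itertools import product

rng = random.Random(5)

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
time_blocks = ["morning", "afternoon", "evening"]  # hypothetical blocks

# Every day/time combination is one element of the "population" of time slots.
all_slots = list(product(days, time_blocks))

# Draw a probability sample of slots in which to run the intercept survey.
SLOTS_TO_VISIT = 6  # hypothetical
visit_schedule = rng.sample(all_slots, SLOTS_TO_VISIT)
print(len(all_slots), visit_schedule)
```

The people intercepted in each slot are still a convenience sample; only the times are chosen at random.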
B. Quota Samples
One way to improve the quality of a convenience sample is to identify various types of
people, groups, or ways that population elements in the sample normally differ, estimate the
proportion of the population they represent, and then set quotas to ensure that they represent the
same proportion of the sample as they do in the population. For example, if we are studying
students at a particular university with a convenience sample, we can find out what proportion of
the student body is female, and what proportion is male (typically about 55% of most college
populations are women), and take steps to make sure that our convenience sample includes the
same proportion of men and women. This technique is called quota sampling, and the goal is to
obtain a sample that is more representative than could be obtained with a simple convenience
sample.
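One way to picture quota sampling is as a convenience sample with a running tally. The Python sketch below accepts passersby in the order they arrive until each quota is full. The sample size of 20, the stream of passersby, and the `fill_quotas` helper are all hypothetical; only the 55% figure comes from the example above.

```python
def fill_quotas(passersby, quotas):
    """Accept people in arrival order until every group's quota is full."""
    counts = {group: 0 for group in quotas}
    accepted = []
    for person in passersby:
        group = person["sex"]
        if counts.get(group, 0) < quotas.get(group, 0):
            accepted.append(person)
            counts[group] += 1
        if counts == quotas:
            break
    return accepted, counts

# Quotas for a sample of 20 that is 55% women, as in the example above.
SAMPLE_SIZE = 20
quotas = {"F": 11, "M": 9}

# Hypothetical stream of passersby in which men happen to walk by more often.
passersby = [{"id": i, "sex": "M" if i % 3 else "F"} for i in range(200)]
accepted, counts = fill_quotas(passersby, quotas)
print(counts)
```

Once the quota for men is full, additional men are simply passed over until enough women have been recruited.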
Quota samples, like stratified random samples, may be necessary if we wish to compare
two groups that are very unequal in size. For example, at Gonzaga University, the number of
students whose parents divorced while they were in school is very small. As a consequence, a
researcher who wished to compare students whose parents had divorced to those whose parents
did not, would need a huge convenience sample to obtain enough students from families that
experienced divorce to make meaningful comparisons. A less costly approach would be to set a
quota for each group, and after enough students from intact families had been obtained, recruit
only those whose parents had been divorced. Such a quota sample could be obtained using the
intercept technique by beginning the interview or survey with a question designed to identify
people who fit the quota. For example, we could begin with the question, “Are your parents still
married to each other?” If the answer is “no”, then we could interview the respondents or ask
them to complete the questionnaire, but if the answer is “yes”, we could politely inform them
that we are only interested in interviewing people whose parents divorced and move on to the
next person who walks by.
It is important to keep in mind that quota samples are almost never representative. The
people actually sampled are still a convenience sample. In the example above, the
children of divorce sampled are a convenience sample of all such children and the children
whose parents never divorced are a convenience sample of people from intact families.
C. Purposive Samples
Sometimes the goal is not to obtain a representative sample of the population. Purposive
sampling involves hand picking population elements to meet certain predefined criteria. As your
text indicates (see page 136), hand picking cases will inevitably produce a biased sample, but
sometimes we actually want a biased (or at least a nonrepresentative) sample.
Comparative samples. In organizational research we often need to have some basis for
comparison if we are to correctly interpret our data. For example, an organizational researcher
might be asked by management to conduct a study to determine what programs or innovations
would benefit employees with young children. This kind of study, called a needs assessment,
will be successful only if the unique or special needs of the group studied are identified, but
some of the needs of this group will be common among all employees. If we were interested in
identifying the special needs of employees with young children by interviewing them or asking
them to complete a survey, we would have no way to tell which of their needs are unique, and
which are similar to other workers, including those who are single, those with grown children,
and those who are childless. One solution to this problem is to draw two samples, a sample with
the characteristics we are interested in (in this case employees with young children), and another
sample that is similar in as many ways as possible to the group we are interested in except for the
key factor. For example, for the hypothetical needs assessment for employees with young
children, we could draw a second quota sample of workers who do not have young children and
compare the results obtained from the two samples. The presence of this comparison sample will
sensitize us to the unique needs of the group we are interested in.
Outlier analyses. Sometimes studying a sample that is representative of the general
population yields very little information because there is not much variation among most
members of the population. For example, most research designed to measure the characteristics
of successful professionals has yielded little useful information because most members of any
profession are quite similar in both competence and background. Professional schools generally
have high standards, and most professionals must meet certain minimum requirements to be
licensed to practice their profession. As a consequence, most members of a profession are quite
similar in regard to their training, qualifications, and background. Thus while there are some
outstanding performers in every profession (and there are also some who are less able than average),
they are relatively rare. As a result, the factors that account for excellence (or failure) in a
profession are hard to identify by studying a representative sample because the vast majority are
so similar in regard to both competence and background. One approach for identifying the
characteristics of truly outstanding professionals is to select and compare outliers—in this case,
professionals who are much better than average and those who are much less successful than
average. By eliminating the vast group of competent professionals who are close to average
in ability and looking only at the extremes, the differences become much more apparent.
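Selecting outliers amounts to ranking the population on some measure and keeping only the two ends. The Python sketch below assumes a hypothetical performance rating for 200 professionals and an arbitrary group size of 10; how performance would actually be measured is not specified in these notes.

```python
import random

rng = random.Random(9)

# Hypothetical performance ratings for 200 professionals, most of them near average.
ratings = {f"professional_{i}": rng.gauss(50, 5) for i in range(200)}

ranked = sorted(ratings, key=ratings.get)  # lowest rating first
GROUP_SIZE = 10  # hypothetical size of each extreme group

least_successful = ranked[:GROUP_SIZE]
most_successful = ranked[-GROUP_SIZE:]
print(least_successful[0], most_successful[-1])
```

The 180 people in the middle are deliberately left out, which is why such a sample cannot be used to describe the profession as a whole.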
Other types of purposive samples. Comparative samples and outlier samples are among
the more commonly used purposive samples, but many other variations are possible. What all
purposive samples have in common is that population elements are hand picked with some goal
in mind. It is important that purposive samples not be used if the primary goal is to create
a representative sample of the population. Hand picking almost guarantees that the sample will
not be representative. However, a properly designed purposive sample can yield insights that
would not be obtained with other types of samples including probability samples. As a
consequence, purposive samples are particularly useful for exploratory research conducted on
topics that are poorly understood or are new to science.
D. The Snowball Sample
Sometimes we want to study a population that is hidden or includes many people who do
not wish to be part of a study. Studying these populations is often impossible if traditional
probability or nonprobability sampling is used. For example, if an organizational researcher
wished to study the working conditions of illegal immigrants, none of the traditional probability
or nonprobability techniques would be useful. There is no list of illegal immigrants, so
probability sampling is out of the question, and nonprobability sampling techniques such as
convenience sampling in neighborhoods where many illegal immigrants are likely to live are not
likely to yield an adequate sample because most illegal immigrants will avoid strangers asking
questions because they might be immigration officials. Similarly, studies of workers who are
dissatisfied or disgruntled with management would be difficult if traditional sampling techniques
were used because, again, there is no list from which to draw a probability sample, and
convenience sampling may fail because the workers may fear the researcher will report them to
management.
In recent years, sociologists facing this kind of problem have developed a new solution
called snowball sampling. Snowball samples are obtained by first identifying a few people who
have the desired characteristics. Typically the researcher will begin with a few friends or
acquaintances, or a knowledgeable person is contacted and asked to refer the researcher to a few
people with the relevant characteristics. The researcher then interviews, surveys or observes the
few people identified, but at the end of each interview or observation session, the researcher asks
the respondents if they know anyone else with the relevant characteristics. The people identified
by the original respondents are then asked to cooperate, and, after the data are collected, these
new respondents are asked to recommend still others for inclusion in the study. In this way a
large sample can be gradually created from just a few cases.
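The referral process just described can be sketched as a simple wave-by-wave search. In the Python example below, the names, the referral lists, and the `snowball` helper are hypothetical; the sketch only shows how a sample grows from one starting case.

```python
from collections import deque

def snowball(seeds, referrals, max_size):
    """Follow referrals wave by wave, starting from a few known cases."""
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)            # interview, survey or observe this person
        for referred in referrals.get(person, []):
            if referred not in seen:     # skip people already named
                seen.add(referred)
                queue.append(referred)
    return sample

# Hypothetical referral lists: who each respondent names when asked.
referrals = {
    "Ana": ["Ben", "Cruz"],
    "Ben": ["Dee", "Ana"],
    "Cruz": ["Dee", "Eli"],
    "Dee": ["Fay"],
}
sample = snowball(["Ana"], referrals, max_size=5)
print(sample)
```

Starting from a single contact, the sample reaches five people in three waves, and nobody is contacted twice even when several respondents name the same person.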
Snowball sampling is effective because most people know and are most comfortable with
others who are similar to them (social psychologists call this the “principle of homophily”). As a
consequence, the best way to find a particular type of person is to ask someone who shares the
trait in question. Furthermore, when we inform people that a mutual acquaintance has
recommended them to the researcher, they are much more likely to trust the researcher and
cooperate.
IV. Avoiding Sampling Problems in Organizational Settings
Most research methods texts are designed primarily for researchers who want to
generalize from their research to large populations, so they often assume that the researcher will
be studying a sample. But generalizing to large populations is not always the goal of
organizational research. Often, the researcher is concerned only with the organization itself, and
many organizations have fewer than 500 members, all of whom work in the same location. For
example, a small manufacturing company may want to know if a new program has improved the
morale or job satisfaction of its members, or a social service organization with a client base of
several hundred individuals might want to know if they are satisfied with the services they are
receiving. This kind of research does not require a sample at all—smaller organizations often
have the resources to study every member of the organization or every current client. This kind
of study is called a census, and, if possible and affordable, a census is always preferable to a
sample study because there is always some chance that any sample, even a simple random
sample, might not be representative of the population.
In organizational research, it is always best to study the entire population if that is
feasible. So long as there is a high level of participation, a census eliminates the possibility that
the results are biased through sampling errors. When the population of interest is only a few
hundred individuals, and they all are available in one location, it is often less costly and less time
consuming to study the whole population than to try to draw a representative sample for study.
Of course, if the organization is very large and/or its members are scattered over a wide area,
then a sample study may be the only choice.
However, if participation is low because many refuse to answer questions, then sampling
bias becomes an issue even if a census study has been completed. Low return rates are
particularly likely when researchers rely on mailed surveys or internet surveys. For example, if a
researcher uses a mailed survey or an E-mail survey to measure employee satisfaction, it would
not be unusual for less than half of the employees to return a completed survey. Furthermore,
since the most dissatisfied employees may be more highly motivated to let management know
what they think, the results could be somewhat biased by high return rates among the least
satisfied and low return rates among the most satisfied. When return rates are low, it is very
likely that the survey results would not be representative of the entire population. Considerable
research and experimentation have been done to identify strategies for increasing return rates for
mailed and internet surveys. One of the best discussions of these strategies is Dillman, Don A.
(2000), Mail and Internet Surveys: The Tailored Design Method, 2nd Edition, John
Wiley & Sons.
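
The effect of uneven return rates can be sketched with a little arithmetic. The Python example below is only an illustration: the function name respondent_mean, the organization of 400 employees, the satisfaction scores, and the return rates are all hypothetical values chosen to mirror the situation described above, in which the least satisfied employees return surveys at a higher rate than the most satisfied.

```python
def respondent_mean(groups):
    """Return (true_mean, observed_mean) for a census with uneven return rates.

    groups: list of (headcount, mean_score, return_rate) tuples.
    All values passed in below are hypothetical, for illustration only.
    """
    total_people = sum(n for n, _, _ in groups)
    true_mean = sum(n * score for n, score, _ in groups) / total_people

    # Only returned surveys contribute to the mean the researcher actually sees.
    returned = sum(n * rate for n, _, rate in groups)
    observed_mean = sum(n * rate * score for n, score, rate in groups) / returned
    return true_mean, observed_mean


# Hypothetical organization of 400 employees, satisfaction scored 1 to 5.
groups = [
    (300, 4.0, 0.30),  # satisfied majority, low return rate
    (100, 2.0, 0.70),  # dissatisfied minority, high return rate
]
true_mean, observed_mean = respondent_mean(groups)
print(f"true mean: {true_mean:.2f}")
print(f"mean among returned surveys: {observed_mean:.2f}")
```

In this made-up case only 160 of the 400 surveys come back, and the mean among returned surveys falls below the mean for the whole organization, even though every member was invited to participate.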
SAMPLING APPENDIX 1
ESTIMATING DESIRED SAMPLE SIZE
E. F. Vacha
2/20/08
Most people understand that the goal of most (but not all) sampling is to create a sample
that is representative of the population. One of the most frequent questions I am asked by both
clients and research methods students trying to design a good sampling procedure is “how many
people should be in my sample?” Unfortunately the answer to that question is not simple because
it depends on a very nonintuitive aspect of probability statistics.
Sample size estimation is nonintuitive because most of us think of sample size in terms of
what percentage of the population is included in the sample, and we incorrectly assume that the
“representativeness” of a sample depends on its size relative to the population. In reality, the
representativeness of a sample depends on the spread of scores we would obtain if everyone in
the population was included in the study. If the scores that people get on whatever measure we
are using vary widely with lots of people getting low scores and many also getting high scores (a
lot of spread), we will need a large sample. But if the scores people can obtain vary by only a
few points (very little spread), we may be able to get by with a very small sample. If the spread
of scores is quite large, it is conceivable that through sheer bad luck, a small sample could
include way too many people with scores clustered around one extreme. However, if the scores
have very little spread, even a small sample will usually include people at both the high and low
extremes as well as people scoring in the middle of the range.
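
A small simulation can make this point concrete. The Python sketch below is an illustration under assumed conditions, not a formal procedure: it draws many random samples of 10 people from two hypothetical populations, one whose scores vary by only a few points (18 to 22) and one whose scores vary widely (5 to 35), and reports how much the sample averages bounce around from one sample to the next. The function name, score ranges, sample size, and number of trials are all assumptions made for the example.

```python
import random
import statistics


def spread_of_sample_means(low, high, sample_size, trials=2000, seed=1):
    """Draw many random samples and report how much their means vary.

    Scores are assumed (for illustration) to be spread evenly between low and high.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = []
    for _ in range(trials):
        sample = [rng.uniform(low, high) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return statistics.stdev(means)


narrow = spread_of_sample_means(18, 22, sample_size=10)
wide = spread_of_sample_means(5, 35, sample_size=10)
print(f"scores from 18 to 22: sample means vary by about {narrow:.2f} points")
print(f"scores from 5 to 35:  sample means vary by about {wide:.2f} points")
```

With the same sample size, the averages from the widely spread population wander much farther from sample to sample, which is why more spread calls for a larger sample.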
The most common measure of spread of scores is the standard deviation. It is, roughly, a measure of
the average difference between each individual's actual score and the mean (average) score of
the whole group. If the spread of scores is quite large, the average difference between individual
scores and the mean will be large as well, but if the spread of scores is small, most scores will
cluster near the group average, so the average difference between individual scores and the mean
will be quite small. If a graph of the scores is a bell shaped or “normal” curve,
about two-thirds (roughly 68%) of the scores will be within one standard deviation of the mean. That
is, if the average score is 20 and the standard deviation is 2, about two-thirds of the scores will
fall between 18 and 22. If, however, the average score is 20 and the standard deviation is 5, then
about two-thirds of the scores will be spread out between 15 and 25.
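
The "two-thirds" rule can be checked directly. The short Python sketch below uses the NormalDist class from the standard library's statistics module (Python 3.8 or later) and assumes the scores follow an exact normal curve; the helper name share_within_one_sd is made up for this example.

```python
from statistics import NormalDist


def share_within_one_sd(mean, sd):
    """Share of a normal distribution lying within one SD of the mean."""
    curve = NormalDist(mu=mean, sigma=sd)
    return curve.cdf(mean + sd) - curve.cdf(mean - sd)


# The two cases from the text: mean 20 with SD 2, and mean 20 with SD 5.
for sd in (2, 5):
    low, high = 20 - sd, 20 + sd
    share = share_within_one_sd(20, sd)
    print(f"mean 20, SD {sd}: {share:.1%} of scores fall between {low} and {high}")
```

The share is the same for both standard deviations; only the width of the range changes.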
Because the quality of a sample depends on the spread of the scores rather than the size of the
population, a poll of 50,000,000 voters may require a sample size no larger than a study of a
medium sized organization with only a few thousand members. For example, presidential polls
can provide consistently accurate results with random samples of around 1,500 because the
standard deviation in a two choice situation is very small.
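
The polling example can be sketched numerically. The Python code below is a rough illustration that assumes a simple random sample, a population much larger than the sample, and the usual normal approximation; the function name margin_of_error is hypothetical. For a two choice question the standard deviation can be no larger than 0.5, and the size of the population does not appear anywhere in the calculation.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample.

    Assumes the population is much larger than the sample (illustrative sketch).
    """
    # Standard deviation of a two choice (yes/no) score; largest when proportion = 0.5.
    sd = math.sqrt(proportion * (1 - proportion))
    return z * sd / math.sqrt(sample_size)


for n in (100, 500, 1500):
    print(f"n = {n}: about +/- {margin_of_error(n):.1%}")
```

Under these assumptions a sample of 1,500 gives a margin of error of roughly two and a half percentage points, whether the population is a few hundred thousand or 50,000,000.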
Sample Size Estimation for Probability Samples
One of the great advantages of probability sampling is that it allows us to use a mathematical
formula to estimate how large our sample should be. To use this formula we must first decide
how accurately we want our average sample score to represent the population average (the
average score we would obtain if everyone in the population was included in the sample). We
call this value the “degree of accuracy”. We must also decide how confident we want to be that
the sample score is at the level of accuracy we choose (we call this the “confidence level”). For
example, we might decide that we want to try for a sample average score that is within one point
of the actual population score, and we want to be 95% sure that the sample will hit that degree of
accuracy. Then, if we can estimate the population standard deviation, we can plug all three
values into a simple formula to obtain the ideal sample size.
Here is the calculation:
1. To determine the desired size of a sample (or subgroup within a sample) for estimating a
population mean, three values must be known or guessed:
a. The confidence level. Usually a 95% confidence level is used. “We will be 95% sure that
the mean is within a specific range or degree of accuracy.”
b. The degree of accuracy. The degree of accuracy is a range expressed as plus or minus
some value. E.g., “+/- .1” means the true average in the population will be within one
tenth of a point of the value obtained from the sample.
c. The population standard deviation. The range of scores above and below the mean that
includes 2/3 of the population. If the mean is 3 and the population standard deviation is 1,
2/3 of the population would score between 2 and 4.
2. The population standard deviation must be guessed. If respondents can be expected to have
similar scores, the population standard deviation will be small, but if respondents’ scores
vary a lot, the population standard deviation will be large. Most researchers use the sample
standard deviation(s) from previous studies to estimate the population standard deviation.
The table below illustrates the calculation and some typical results. Notice that when the
population standard deviation is small (SD = 1), and we are willing to be 95% sure that our sample’s
average score is within .4 points of the actual population average score, our sample need only
include 24 individuals, but when the population standard deviation is 3.0, we need 216 in our
sample. Of course, we must have a probability sample to use this approach to estimating sample
size.
SOME SAMPLE SIZE CALCULATIONS
Columns: Population Standard Deviation (SD) Assumptions. Rows: Other Assumptions. Cells: required sample size.

| Other Assumptions | SD = 1 | SD = 1.5 | SD = 2 | SD = 2.5 | SD = 3.0 |
|---|---|---|---|---|---|
| Confidence Level = 95%, Degree of Accuracy = .1 | 384 | 864 | 1,537 | 2,401 | 3,457 |
| Confidence Level = 95%, Degree of Accuracy = .2 | 96 | 216 | 384 | 600 | 864 |
| Confidence Level = 95%, Degree of Accuracy = .3 | 43 | 96 | 171 | 267 | 384 |
| Confidence Level = 95%, Degree of Accuracy = .4 | 24 | 54 | 96 | 150 | 216 |

Values: 95% Confidence Level = 1.96; 99% Confidence Level = 2.576
Formula: Sample N = (Conf. Level Value x Est. Pop. St. Dev. / Deg. of Accuracy)^2
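
The formula is easy to compute directly. The Python sketch below is a minimal illustration for a 95% confidence level (value 1.96); the function name sample_size and its parameter names are made up for this example. Results are rounded to the nearest whole person; rounding up instead would be the more cautious choice.

```python
def sample_size(sd, accuracy, z=1.96):
    """Sample N = (confidence level value x estimated population SD / degree of accuracy) squared.

    z defaults to 1.96, the value for a 95% confidence level.
    """
    return (z * sd / accuracy) ** 2


# One row per degree of accuracy, one column per assumed population SD.
for accuracy in (0.1, 0.2, 0.3, 0.4):
    row = [round(sample_size(sd, accuracy)) for sd in (1, 1.5, 2, 2.5, 3)]
    print(f"+/- {accuracy}: {row}")

# The two cases discussed in the text (degree of accuracy = .4).
print(round(sample_size(sd=1, accuracy=0.4)))
print(round(sample_size(sd=3, accuracy=0.4)))
```

Because the standard deviation and the degree of accuracy are both squared, doubling the estimated standard deviation, or cutting the degree of accuracy in half, quadruples the required sample.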
Nonquantitative Approaches To Estimating Sample Size
Of course, if we have a nonprobability sample like a convenience, quota, or snowball sample,
we can’t estimate the ideal sample size mathematically. Also, sometimes we have no basis for
estimating the population standard deviation of a probability sample. In these situations, the
general practice is to use the largest sample we can, and to base sample size decisions on the size
of the subgroups we wish to study.
Estimates based on number and size of subgroups. One approach for estimating sample size
that is useful for both probability and nonprobability sampling is to base the sample size on the
smallest subgroups studied. Whenever we wish to compare groups or types of people to each
other, we need to base our sample size on the smallest of those groups because we need to be
sure there are enough in each group to provide meaningful comparisons. For example, if women
and men in an organization were roughly equal in number, a meaningful comparison might be
generated from a sample of only 100-200. However, if only 15% of the population is women, a
sample of 100 would yield only about 15 women. With a sample as small as 15, if only a few
atypical cases or “outliers” were accidentally included in the sample, the average scores for
women could be very misleading. Similarly, if we want to compare seven or eight subgroups we
would need a large sample to make sure we have enough people from each group.
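
The subgroup arithmetic can be sketched as follows. This Python example is an illustration only: it assumes the sample mirrors the population's makeup, and the target of 50 people per subgroup is a hypothetical figure, not a rule from the text. The function names are also made up for the example.

```python
import math


def expected_in_group(sample_size, group_share):
    """Expected number of subgroup members if the sample mirrors the population."""
    return sample_size * group_share


def total_sample_needed(smallest_share, min_per_group):
    """Total sample size so the smallest subgroup is expected to reach min_per_group."""
    return math.ceil(min_per_group / smallest_share)


# The case from the text: women are 15% of the population, sample of 100.
print(f"expected women in a sample of 100: {expected_in_group(100, 0.15):.0f}")

# Hypothetical target of 50 per subgroup (an assumption for illustration).
print(f"total sample needed at a 15% share: {total_sample_needed(0.15, 50)}")
print(f"total sample needed at a 50% share: {total_sample_needed(0.50, 50)}")
```

The smaller the smallest subgroup's share of the population, the larger the total sample must be to reach the same number of people in that subgroup.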
Current practices. Another approach to estimating sample size is to base one’s decision on
current practices by researchers in the field. For example, researchers could review similar
studies on their topic to discover what size samples are usually used. Rossi, Wright and
Anderson's (1983) review of several hundred published sociological studies found that typical
sample sizes of published research differed depending on whether the studies were national or
regional, and depending on the number of subgroup comparisons that were made. Most
organizational studies are more similar to regional studies than national studies because the focus
of the study is often on a particular kind of organization or even just one organization. Rossi et
al. found that in regional studies, when few or no subgroup analyses were conducted, samples
ranged from 200-500; when an "average" number of subgroup analyses were conducted, sample
sizes ranged from 500-1,000; and when many subgroup analyses were conducted, sample sizes
ranged from 1,000-2,500.
Sample Sizes For Exploratory Qualitative Research
A special situation exists when researchers use qualitative methods like focus groups,
participant observation, and unstructured interviews. All of these approaches to data collection
involve intensive and extended interaction between the researcher and each subject. Unstructured
interviews can consume several hours per subject, focus groups of 5 to 10 members sometimes
require multiple meetings of several hours each, and participant observation often requires
months observing a small number of people.
Qualitative methods are unsuitable for testing
hypotheses because they demand the use of small samples selected from those immediately
available to the researcher. The researcher sacrifices representativeness in order to gain a much
more complete, nuanced, and rich body of data about each subject. In such studies, sample
sizes typically range between 25 and 250. However, such small samples are adequate for creating
hypotheses that can be tested in future research using methods that lend themselves to the study
of larger, more representative samples.