Sample Size

Slide 1
The goal of this lecture on sample size is to discuss the basic issues associated with selecting a
sample size and to discuss the basic approaches that one might take in determining an
appropriate sample size.
Slide 2
Sampling error is related to sample size, and the returns to increasing sample size are diminishing.
With a very small sample, the random sampling error will be quite large. If sample size is
increased, then random sampling error drops quickly. As sample size increases further, random
sampling error does not drop as quickly as it did initially. Hence, there is an opportunity to
identify an optimal sample size relative to error and budget.
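To see the diminishing returns in numbers, here is a minimal sketch. It assumes a simple random sample and the usual margin of error for a mean, Z times the standard deviation divided by the square root of n. That formula is not stated on this slide, and the standard deviation of 1.0 and the sample sizes are hypothetical values chosen only to show the shape of the curve.

```python
import math


def margin_of_error(n, std_dev=1.0, z=1.96):
    """Half-width of a confidence interval for a mean: z * s / sqrt(n)."""
    return z * std_dev / math.sqrt(n)


# Hypothetical sample sizes; each one is four times the previous one.
sizes = [25, 100, 400, 1600]
for n in sizes:
    print(f"n = {n:5d}  margin of error = {margin_of_error(n):.3f}")
```

Under these assumptions, each quadrupling of the sample only halves the margin of error, so the first 100 respondents buy far more precision than the last 1,200.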
Slide 3
As mentioned in the lecture on different types of samples, non-probability or non-scientific
samples have no statistical properties. Therefore, researchers don’t worry about sampling error
for a non-probability sample because they’re uninterested in extrapolating the results from the
sample to a larger population. If they’re working with a probability sample, then they’re
interested in extrapolating results to a larger population.
There are some practical issues regarding the reduction in random sampling error. The practical
issues are threefold: financial, statistical, and managerial.

Financial, in the sense that data is a resource and each completed survey has a cost
associated with it. The previous graph shows that sampling error declines at a declining
rate as sample size increases. The first thing a marketing manager must consider is how
much it will cost for each additional data point and how much reduction in error is
associated with that cost. Every real-world marketing project has a budget.

Statistical, in the sense that point estimates are assumed to contain error. Recall any recent
election; poll results are always stated as plus or minus some percent. For the leading
candidate, newscasters often indicate whether his/her lead is within the margin of error. If it is
within that margin, then the results might have been reversed had a different sample of voters
been queried. From a statistical standpoint, knowing that the point estimate could be lower or
higher than the true value raises the issue of what plus or minus range is acceptable. The
larger the sample, the closer the endpoints of the range are to the point estimate. A point
estimate may be +/- 10% for a small sample but only +/- 1% for a large sample.

Managerial, in the Bayesian sense of the preferred level of confidence about the
outcome. How much does additional data reduce uncertainty? Is a high degree of
confidence in the estimates necessary, or will a ballpark estimate suffice?
All of these considerations (financial, statistical, and managerial) influence the
appropriate sample size for a probability sample.
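The statistical point above, that a small sample might be +/- 10% while a large one is +/- 1%, can be made concrete with a short sketch. It assumes a simple random sample, a 95% confidence level (Z = 1.96), and the most conservative case of an evenly split poll (p = 0.5). The function name and the two sample sizes are illustrative choices, not figures from the lecture.

```python
import math


def poll_margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a sample proportion: z * sqrt(p * (1 - p) / n)."""
    return z * math.sqrt(p * (1.0 - p) / n)


# Hypothetical poll sizes: a small sample and one 100 times larger.
for n in (96, 9604):
    print(f"n = {n:5d}  margin of error = +/- {poll_margin_of_error(n):.1%}")
```

Under these assumptions, cutting the margin of error from about 10% to 1% requires roughly 100 times as many respondents, which is why the financial and statistical considerations have to be weighed together.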
Slide 4
Here are several approaches that a manager might use to determine a sample size for a
probability sample. Some are worse than others. A few especially poor ways:

The worst is the blind guess. Guessing that 300 respondents sounds good is a terrible way to
determine the sample size. Such a guess probably will be wrong.

Managers might also use an available budget. The manager might think he has $10,000
to conduct a study and $6,000 of that $10,000 should be spent on collecting data;
therefore, the sample will be as large as $6,000 permits. That’s a terrible way to set a
sample size. The point of advertising is to accomplish some goal, so advertisers should
spend whatever is necessary to accomplish that goal. Spending too much is a waste and
spending too little won’t accomplish the goal. The same is true in selecting a sample
size. Using the available budget is a sub-optimal decision rule.

A third approach asks how much one needs uncertainty reduced. Such a Bayesian-based
approach may be reasonable but is too complex for most marketing managers.

A fourth approach is to use basic rules of thumb. From a statistical standpoint, one rule
of thumb is 100 cases for every main group and between 20 and 100 cases for every
sub-group. A sample of that size should include enough respondents to avoid major
random sampling error. For example, in a study on gender differences, the main groups
would be males and females, with 100 respondents in each. That same study also might
examine differences by gender and age, so the age categories would be the subgroups.
There are better ways to identify an appropriate sample size.

One easy way is to use conventional wisdom and follow the standards for comparable
studies. Some clever people have already examined the statistical and cost implications
of different sample sizes and have identified the appropriate size for different types of
studies. I recommend this approach because it doesn’t require a great knowledge of
statistics, making assumptions, or doing calculations.

A more sophisticated approach considers statistical precision: the acceptable plus or
minus percent for point estimates. However, this statistical sophistication may be beyond
most managers’ grasp, which makes its use problematic.
Slide 5
Here’s what I mean by typical sample size for studies. Test market penetration studies should
include no fewer than 200 respondents, but preferably between 300 and 500 respondents. A TV
commercial test should include at least 150 respondents, but preferably 200 to 300 respondents
per commercial. Such data is readily available and easy to access and use.
Slide 6
Assuming an appropriate level of statistical sophistication and the availability of certain types of
information, the statistical precision approach would be preferred. Here are the kinds of things
that one must know to use this approach.

The variability of the total population and of the individual strata. To make the most
efficient use of data collection dollars, one should oversample more variable strata and
undersample less variable strata. Overall variability of the population also is important; to
reduce random sampling error, the sample size should be larger if the population is more
variable.

The acceptable level of random sampling error. This level could be high or low,
depending on the needed level of confidence (+/- percent) in the estimates.

The way in which data are distributed. If data are normally distributed, then a sample of
a certain size is needed to ensure a minimal random sampling error. If the data are
non-normally distributed (for example, bi-modally or uniformly distributed), then a larger
sample is needed, relative to normally distributed data, to ensure a minimal random
sampling error.
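The first point above, oversampling the more variable strata, can be sketched with Neyman allocation, which spreads a fixed total sample across strata in proportion to each stratum's size times its standard deviation. The lecture does not name a specific allocation rule, so treat this as one common option rather than the method intended here. The strata names, sizes, and standard deviations are hypothetical.

```python
def neyman_allocation(total_n, strata):
    """Split total_n across strata in proportion to size * standard deviation.

    strata maps a stratum name to a (population_size, std_dev) tuple.
    Returns a dict of unrounded allocations that sum to total_n.
    """
    weights = {name: size * sd for name, (size, sd) in strata.items()}
    total_weight = sum(weights.values())
    return {name: total_n * w / total_weight for name, w in weights.items()}


# Hypothetical strata of equal size but unequal variability in spending.
strata = {
    "light users": (5000, 5.0),
    "heavy users": (5000, 15.0),
}
allocation = neyman_allocation(400, strata)
for name, n in allocation.items():
    print(f"{name}: {n:.0f} respondents")
```

With these made-up inputs, the stratum that is three times as variable receives three times as many of the 400 interviews, even though the two strata are the same size.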
Slide 7
If you’re uncomfortable with performing calculations, then you might consider online sample
calculators like the one linked to in this slide.
Slide 8
The assumption in the statistical precision and sample calculator approach is that there’s one
key variable on which to base sample size. That variable could be the most important question
in a survey. If you would like to do the calculations yourself, the remaining slides suggest the
appropriate formulas for determining sample size. The formula is relatively straightforward: n
(the sample size) is found by taking Z (the standardized value for the chosen confidence level),
multiplying it by S (the estimated standard deviation), dividing by E (the acceptable magnitude
of error), and then squaring the result.
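Written out in notation, the formula for a mean is the one below. The symbols are my labels for the quantities used in the worked example on the following slides: Z is the standardized value for the chosen confidence level, S is the estimated standard deviation, and E is the acceptable magnitude of error.

```latex
n = \left( \frac{Z \, S}{E} \right)^{2}
```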
Slide 9
Here are two examples based on the formula. Suppose a survey researcher, studying
expenditures on lipstick, wishes to have a 95% confidence level and a range of error less than
$2.00. The estimate of the standard deviation is $29.00. That estimated standard deviation is
possibly based on previous studies, or it could be a guess, but the quality of that guesstimate is
critical to properly determining sample size. To apply the statistical precision approach, you
must know certain things and feel confident that you know them.
Slide 10
Plugging in the values from the previous slide, and remembering that the Z score for a 95%
confidence level is 1.96, we run through this calculation and discover that the appropriate
sample size is 808 respondents.
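The arithmetic for the lipstick example, using Z = 1.96, a standard deviation of $29.00, and an acceptable error of $2.00, works out as follows. The result is rounded to the nearest whole respondent.

```latex
n = \left( \frac{1.96 \times 29}{2} \right)^{2} = (28.42)^{2} \approx 807.7 \;\Rightarrow\; n = 808
```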
Slide 11
Let’s take this same example, but let’s double the range of the error from +/- $2 to +/- $4. By
how much is sample size reduced when the acceptable range of error is doubled?
Slide 12
By doubling the acceptable range of error, from +/- $2 to +/- $4, the necessary sample size
shrinks from 808 to 202. Doubling the acceptable range of error cuts the required sample to
one quarter of its original size.
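The one-quarter result is not specific to these numbers. Because E sits in the denominator and the whole ratio is squared, doubling E always divides n by four:

```latex
n(2E) = \left( \frac{Z \, S}{2E} \right)^{2} = \frac{1}{4} \left( \frac{Z \, S}{E} \right)^{2} = \frac{n(E)}{4}, \qquad \frac{808}{4} = 202
```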
Slide 13
Instead of being 95% confident, assume we want to be 99% confident. Instead of a Z score of
1.96, it’s 2.57. Given the same set of calculations, instead of sample sizes of 808 and 202,
they’ve grown to 1389 and 347 respectively. Going from a 95% confidence level to a 99%
confidence level increases the required sample size by roughly 70%.
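If you'd like to check all four cases at once, here is a minimal sketch. It assumes the formula n = (Z * S / E) squared, uses the lecture's Z scores (1.96 and 2.57) and the $29.00 standard deviation, and rounds to the nearest whole respondent. The function and variable names are mine.

```python
def sample_size_for_mean(z, std_dev, error):
    """n = (z * s / e)^2, rounded to the nearest whole respondent."""
    return round((z * std_dev / error) ** 2)


# Z scores as quoted in the lecture for each confidence level.
Z_SCORES = {"95%": 1.96, "99%": 2.57}
ERRORS = [2.0, 4.0]  # acceptable error in dollars
STD_DEV = 29.0       # estimated standard deviation in dollars

results = {
    (level, error): sample_size_for_mean(z, STD_DEV, error)
    for level, z in Z_SCORES.items()
    for error in ERRORS
}
for (level, error), n in results.items():
    print(f"confidence {level}, error +/- ${error:.0f}: n = {n}")
```

A more conservative convention is to round up rather than to the nearest integer, which would give 348 instead of 347 in the last case.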
Slide 14
Suppose the key variable in our statistical precision approach is a proportion. Think about
presidential polls: the share of voters choosing one candidate over another is a
proportion. If that were the key question, then the appropriate sample size formula would be the
one shown here.
Slide 15
For a proportion, the terms in the sample size formula are as follows. Z squared is the
square of the standardized value for the chosen confidence level; at a 95% confidence
level, that’s 1.96 squared. p is the estimated proportion of successes, and q is the estimated
proportion of failures (or 1 – p). E squared is the square of the maximum allowable error
between the true proportion and the sample proportion.
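Putting those terms together gives the formula below. This is my reconstruction from the terms just described, and it is consistent with the numbers in the example on the next slide.

```latex
n = \frac{Z^{2} \, p \, q}{E^{2}}, \qquad q = 1 - p
```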
Slide 16
Here’s an example of a calculation based on that previous formula. Assume that p is 0.6, which
makes q equal to 0.4. Also assume the maximum acceptable difference between the estimated
proportion and the true proportion is 0.035. Plugging those numbers into the formula, at a 95%
confidence level, yields an appropriate sample size of 753 for a probability sample.
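Here is a minimal sketch of that calculation, assuming the formula n = Z squared times p times q, divided by E squared, with Z = 1.96 for a 95% confidence level. The function name is mine, and the result is rounded to the nearest whole respondent.

```python
def sample_size_for_proportion(z, p, error):
    """n = z^2 * p * (1 - p) / e^2, rounded to the nearest whole respondent."""
    q = 1.0 - p
    return round(z ** 2 * p * q / error ** 2)


# Values from the lecture's example: p = 0.6, E = 0.035, 95% confidence.
n_example = sample_size_for_proportion(z=1.96, p=0.6, error=0.035)

# If no estimate of p is available, p = 0.5 gives the largest p * q
# and therefore the most conservative (largest) sample size.
n_worst_case = sample_size_for_proportion(z=1.96, p=0.5, error=0.035)

print(n_example, n_worst_case)
```

The second call shows why a good prior estimate of p matters: with no idea of the split, the same precision would take 784 respondents rather than 753.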
Slide 17
As an alternative to the last formula, one could use the sample size calculator shown here.
Scale A indicates the percent of favorable responses, which is the estimated proportion p. Scale C
indicates the percent error that’s acceptable at a given confidence level, either 95% on the left
side of the scale or 99% on the right side of the scale. All that’s necessary is to take a straight
edge, place one end on the left-hand scale A at the estimated value of p, and place the other end
on the right-hand scale C at the acceptable percent error. The point at which the straight edge
crosses scale B indicates the appropriate sample size.