Chapter 7 Worked Example Problems
The answers to the following examples are written beneath the problems in white font. To help you
practice context recognition, the problems are in a scrambled order and may not follow the order
of the material in the notes.
Example 1: Drug Deaths
The state of Kentucky has a problem with heroin. In 2015, the state averaged 10.4 overdose
deaths per county. Across 120 counties, this means a total of 1,249 deaths. State officials want
to examine if the number of overdose deaths is increasing. They take a random sample of 50
counties and find an average of 11.0 overdose deaths per county with a standard deviation of
3.9. Let 𝑥̅ be the average number of overdose deaths per county.
a. Describe the sampling distribution for 𝑥̅ , including the shape of its distribution, its
center, and an appropriate measure of spread.
Shape: n>30, random sample, so it is bell-shaped. Mean = population mean = 10.4. SE =
3.9/sqrt(50) = .5515.
b. If the state average really is the same as it was in 2015, what was the probability that
the sample mean would have been 11.0 or greater?
In other words, we need to find the probability of getting a sample mean of 11.0 or
greater from a sample of size 50. As with any other situation involving the probability of
getting a sample mean at least so large, we will use our sampling distribution
techniques. Using the values from part a in the normal calculator, we find P(X>11.0)
= .138
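The arithmetic in parts a and b can be checked with a few lines of Python. This is only a sketch that stands in for the normal calculator; the variable names are illustrative, and it assumes the sampling distribution is the normal model described in part a.

```python
from math import sqrt
from statistics import NormalDist

mu = 10.4   # population mean assumed in part b (the 2015 state average)
s = 3.9     # sample standard deviation
n = 50      # number of counties sampled

# Standard error of the sample mean: SD / sqrt(n)
se = s / sqrt(n)

# Normal model for the sampling distribution of the sample mean
sampling_dist = NormalDist(mu, se)

# Upper-tail probability of a sample mean of 11.0 or greater
p_at_least_11 = 1 - sampling_dist.cdf(11.0)

print(f"SE = {se:.4f}")
print(f"P(sample mean >= 11.0) = {p_at_least_11:.3f}")
```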
A look ahead:
c. In 2016, the recorded average number of overdose deaths per county across the 120 counties
rose to 11.7. Suppose that the standard deviation remains 3.9. A politician is adamant
that the true year-to-year average overdose deaths per county remains at 10.4; in other
words, it has not increased. If the true average really were 10.4, what is the probability
of getting a yearly average of 11.7 or more? Assume the data are independent (which is
probably not the case, but we can use more refined methods later in the semester).
The standard error is 3.9/sqrt(120) = .356, so the t-statistic is (11.7 - 10.4)/.356 = 3.65, which
yields a probability of P(T>3.65) = .0001. In other words, this could happen purely by chance
about 1 out of every 10,000 years. The t-model is appropriate because the data are independent
and the ‘sample’ size of 120 is greater than 30. It has 119 DF.
d. Your answer to part c tells us how likely a yearly average at least as large as the one
observed in 2016 would be, purely by chance, if there were no real increase in overdose deaths
from 2015 to 2016. Given the size of this probability, which do you think is more plausible: that
the increase in the observed average for 2016 is merely anomalous, or that the true mean
may have increased from 10.4? In the next unit, we will formalize this probability as a ‘p-value.’
This is the crux of statistical inference: using a calculated probability to make a decision about
what is more likely, that our results are super anomalous or that the initial assumptions were
incorrect. Since the probability of seeing such an increase from sampling variability alone is
so incredibly low, it seems reasonable to conclude that the politician’s underlying assumption is
incorrect and that the average number of deaths per county really has increased from 2015 to
2016.
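The t-statistic and degrees of freedom from part c can be reproduced as follows. This is a minimal sketch with illustrative variable names. Python's standard library has no t-distribution, so the tail probability itself would still come from a t calculator.

```python
from math import sqrt

mu_0 = 10.4    # the politician's claimed true average
x_bar = 11.7   # observed 2016 average per county
s = 3.9        # standard deviation (assumed unchanged, as in the problem)
n = 120        # number of counties, treated as the 'sample' size

se = s / sqrt(n)               # standard error of the mean
t_stat = (x_bar - mu_0) / se   # how many standard errors above 10.4
df = n - 1                     # degrees of freedom for the t-model

print(f"SE = {se:.3f}, t = {t_stat:.2f}, df = {df}")
```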
Example 2: Bank Loans
Based on past experience, a large national bank believes that 8% of people who receive their
loans will not make payments on time. The bank randomly sampled 200 loans from 4300
approved loans. What is the probability that more than 9% of these customers will not make
timely payments?
a. What is the mean of the sampling distribution?
The mean of the sampling distribution will be the population value. “Based on past
experience” tells us that the believed value, 8%, is the parameter. Thus, the mean of
our sampling distribution is .08.
b. What is the standard deviation for the sampling distribution?
The standard deviation of the sampling distribution, which I sometimes call the standard
error, is found with sqrt(p*(1-p)/n) = sqrt(.08 * .92 /200) = .0192. Notice that I’m using the
parameter here, not the sample proportion of .09.
c. Is the sampling distribution approximately normal? Check the conditions to use the normal
model.
The sample is randomly selected. np = 200*.08 = 16 and n(1-p) = 200*.92 = 184 are both
at least 15. Thus, it is approximately normal.
d. Find the probability that more than 9% won’t make payments on time.
Use the mean and SD found in parts a and b in the normal calculator. Find P(X>.09) =
.301
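Parts a through d can be strung together in one short script. This is a sketch with illustrative variable names that uses the normal model in place of the normal calculator. The threshold of 15 is the one stated in part c.

```python
from math import sqrt
from statistics import NormalDist

p = 0.08   # population proportion from the bank's past experience
n = 200    # number of loans sampled

# Part b: standard deviation (standard error) of the sample proportion
se = sqrt(p * (1 - p) / n)

# Part c: expected counts of late and on-time payers must both be at least 15
expected_late = n * p
expected_on_time = n * (1 - p)
normal_ok = expected_late >= 15 and expected_on_time >= 15

# Part d: probability that more than 9% of the sample pays late
p_more_than_9 = 1 - NormalDist(p, se).cdf(0.09)

print(f"SE = {se:.4f}, normal model OK: {normal_ok}")
print(f"P(sample proportion > .09) = {p_more_than_9:.3f}")
```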
Example 3: Caribou
Suppose that the mean number of caribou per km² in a region of Siberia is 12 with standard
deviation 4. Past studies have produced data that were bell-shaped.
a. Can we use a bell-shaped model to represent the sampling distribution of 𝑥̅ , the sample
average number of caribou? Can we use a bell-shaped model to represent the
population number of caribou per km²? Would we use the same models? If not, what
would be different?
We know from the “past studies have produced data that were bell-shaped” that the
population number of caribou per km² should be bell-shaped. Since the population
should be bell-shaped, the sampling distribution for 𝑥̅ should also be bell-shaped
regardless of the sample size. However, we would not want to use the same model for
both distributions. The two distributions will always have the same center, but the
standard deviation of the sampling distribution will be smaller than that of the
population; specifically, it will be equal to the population’s SD divided by the square root
of the sample size.
b. What is the probability that a randomly selected km² contains more than 13 caribou?
There are two ways to approach this problem. The first is knowing that individual
observations (rather than the average of a sample) follow the population distribution.
We use the normal calculator with mean = 12 and SD = 4 and find P(X>13) = .401
For another approach, we can imagine we are working with a sample of size n = 1. For
that, the center of the sampling distribution would be the population mean = 12, and
the standard deviation of the sampling distribution would be 4/sqrt(1) = 4. Then, use the
normal calculator with mean = 12 and SD = 4 and find P(X>13) = .401.
c. In a sample of 49 km², what is the probability that the average number of caribou per
square km exceeds 13?
The big difference between this problem and part b is that we are now forced to approach
this as a sampling distribution problem, and we can determine this by the fact that the
problem asks about the probability associated with an average (rather than just a single
value). We know the center of the sampling distribution must be 12, since that is the
population mean. The standard deviation for the sampling distribution is 4/sqrt(49) = 4/7.
We then use the normal calculator with that mean and SD to find P(X>13) = .040.
d. What is the probability that a sample of 144 km² will have a mean number of caribou
exceeding 13?
We repeat what we did in part c, but we now use n = 144. Once again, since we are asked
about the probability for a sample mean, we use the sampling distribution. Changing the
sample size does not affect the population mean; it is still 12. The standard deviation is
now 4/sqrt(144) = 1/3. We find P(X>13) = .001.
e. What do our answers to b, c, and d tell you?
Notice that in all parts, we were finding the probability of getting a result greater than
13. For a single value, the probability of getting something above 13 was about 40%. For
the average of 49, it was about 4%. For the average of 144, it was about .1%. As we
increased our sample size, the probability got smaller and smaller. This reinforces a
concept demonstrated by the math: sampling distributions are less variable than
populations/samples, and as the sample size increases, the variability further decreases.
In other words, you are much more likely to get anomalous results from a single
observation or a small sample size than you are from a large sample size.
f. Revisit the sample of 49 km². What is the probability that the sample average number of
caribou across 49 km² is between 11 and 14?
P(11<X<14) = .960
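The caribou calculations in parts b, c, d, and f all follow the same pattern, so they can be sketched with one helper function. The function name prob_mean_exceeds is made up for this illustration. It treats a single observation as a sample of size n = 1, as in the second approach to part b.

```python
from math import sqrt
from statistics import NormalDist

MU = 12     # population mean caribou per square km
SIGMA = 4   # population standard deviation

def prob_mean_exceeds(threshold, n):
    """P(sample mean > threshold) for a sample of n square km."""
    se = SIGMA / sqrt(n)   # SD of the sampling distribution
    return 1 - NormalDist(MU, se).cdf(threshold)

# Parts b, c, d: the same threshold of 13 with growing sample sizes
for n in (1, 49, 144):
    print(f"n = {n:3d}: P(mean > 13) = {prob_mean_exceeds(13, n):.3f}")

# Part f: probability the mean of 49 square km falls between 11 and 14
dist_49 = NormalDist(MU, SIGMA / sqrt(49))
p_between = dist_49.cdf(14) - dist_49.cdf(11)
print(f"P(11 < mean < 14) = {p_between:.3f}")
```

The loop makes the point of part e visible: the tail probability shrinks as n grows because the standard deviation of the sampling distribution shrinks.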
Example 4: GMAT scores
Suppose GMAT scores (note: the GMAT is an entrance exam for business schools) are normally
distributed with a mean of 500 and a standard deviation of 110.
a. What would the standard error be for a sample of 100 students?
SE = SD/sqrt(n) = 110/sqrt(100) = 11.
b. As the sample size increases, what happens to the standard error (assuming the
standard deviation does not change)?
As n increases, SE decreases.
c. Which of these would have the most uncertainty regarding its results, and which would
have the least: i) an individual tester’s score; ii) the average of 10 scores; iii) the average
of 100 scores?
Most to least: individual, average of 10, average of 100.
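The ordering in part c follows directly from the standard error formula in part a. A small sketch (illustrative names, with an individual score treated as n = 1) makes the comparison concrete:

```python
from math import sqrt

SIGMA = 110   # population standard deviation of GMAT scores

# Standard error for an individual score (n = 1) and for averages of 10 and 100
standard_errors = {n: SIGMA / sqrt(n) for n in (1, 10, 100)}

for n, se in standard_errors.items():
    print(f"n = {n:3d}: SE = {se:.1f}")
```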
Example 5: Tribe Size
The distribution of family size in a particular tribe of people is skewed to the right with a
population mean of 5.2 and population standard deviation of 2.0. Suppose an anthropologist
samples families in this society to estimate family size. For a random sample of 36 families, the
anthropologist gets a mean of 4.6 and a standard deviation of 3.2.
a. Identify the mean, standard deviation, and shape of the population distribution.
These values are all given in the description of the population. The population mean
(the parameter, μ) is 5.2. The population standard deviation (also a parameter, since it
comes from a population) is 2.0. The population is said to be skewed right.
b. Identify the mean, standard deviation, and shape of the sample.
These are also mostly given in the description of the problem. The sample mean (a
statistic, 𝑥̅ ) is said to be 4.6. The sample standard deviation (also a statistic) is 3.2. The
shape of the sample is not described, but we know that the shape of a sample tends to
resemble that of the population. Therefore, we can infer that the sample is skewed
right.
c. Identify the mean, standard deviation, and shape of the sampling distribution if we were
to construct one.
Here is where we must make some inferences beyond what is given in the problem
description. We know that the mean of the sampling distribution (a.k.a. the center of the
sampling distribution) must be equal to the value of the parameter (the population
mean). Therefore, the mean of the sampling distribution should be 5.2. The standard
deviation of the sampling distribution (sometimes called the standard error) is
calculated as (standard deviation of population or sample)/sqrt(n). We have the option to use
the population SD or sample SD here. As a general rule of thumb in this class, whenever
possible, we prefer to use the population values in our calculations. Therefore, our
standard deviation for the sampling distribution is 2.0/sqrt(36) = 1/3. Finally, we know
that the shape of a sampling distribution for a sample mean will be approximately
bell-shaped if:
i) the population or sample are known to be bell-shaped, or
ii) the sample size is at least 30.
We know the population and sample aren’t bell-shaped, but since the sample size is
larger than 30, the sampling distribution for the sample mean will be approximately bell-shaped
despite the shape of the population.
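A quick simulation can illustrate the claims in part c. The problem does not say which right-skewed distribution family size follows, so this sketch assumes a gamma distribution chosen only because it is right-skewed and can be given mean 5.2 and SD 2.0. The seed, the number of repetitions, and all names are likewise arbitrary choices for the illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

POP_MEAN, POP_SD, N = 5.2, 2.0, 36

# Assumed stand-in population: gamma, with mean = shape*scale
# and variance = shape*scale**2, so it matches 5.2 and 2.0.
scale = POP_SD**2 / POP_MEAN
shape = POP_MEAN / scale

rng = random.Random(0)   # fixed seed so the run is repeatable

def one_sample_mean():
    """Draw one random sample of N families and return its mean."""
    return sum(rng.gammavariate(shape, scale) for _ in range(N)) / N

# Build an approximate sampling distribution from 5,000 sample means
sample_means = [one_sample_mean() for _ in range(5000)]

sim_center = mean(sample_means)    # should land near 5.2
sim_spread = stdev(sample_means)   # should land near 2.0/sqrt(36)
theory_spread = POP_SD / sqrt(N)

print(f"center = {sim_center:.2f}, spread = {sim_spread:.3f}, "
      f"theory spread = {theory_spread:.3f}")
```

A histogram of sample_means would look roughly bell-shaped even though the assumed population is skewed, which is the point of condition ii).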
Example 6: Stock market
Prior studies have shown that 30% of all business students at a university invest in the stock
market. You randomly select 100 students from this university and ask these students whether
or not they invest.
a. How would we construct a sampling distribution?
A sampling distribution is a collection of sample statistics. Thus, to produce one, we
need to collect a bunch of sample statistics. In this case, it would mean taking a bunch
of samples of size 100 and recording their sample proportions.
b. What is the mean of the sampling distribution?
B. .3
C. .7
D. 100
E. 30
The population proportion, indicated by that “prior studies have shown” language, is .30.
c. What is the standard deviation for the sampling distribution?
Sqrt(.30*.7/100) = .046
d. Is the sampling distribution approximately normal? Check the conditions to use the normal
model.
np = 30; nq = 70. These are both at least 15. The samples are random and independent.
e. Find the probability that more than 35% of sampled students invest in the stock market.
Use mean = .30, SD = .046, find P(X>.35) = .139.
f. Find the probability that fewer than 20% of sampled students invest in the stock market.
P(X<.20) = .015
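The sketch below does two things. First, it mimics part a by simulating many samples of 100 students and recording each sample proportion. Second, it reproduces parts c, e, and f with the normal model. The seed, the number of simulated samples, and the variable names are arbitrary choices for illustration. The standard deviation is rounded to .046, as in part c, so the probabilities line up with the answers above.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

p, n = 0.30, 100

# Part a: build an approximate sampling distribution by simulation
rng = random.Random(1)   # fixed seed so the run is repeatable

def one_sample_proportion():
    """Ask n simulated students; return the proportion who invest."""
    return sum(rng.random() < p for _ in range(n)) / n

sample_props = [one_sample_proportion() for _ in range(5000)]
print(f"simulated center = {mean(sample_props):.3f}, "
      f"simulated spread = {stdev(sample_props):.3f}")

# Part c: theoretical standard deviation, rounded as in the answer key
se = round(sqrt(p * (1 - p) / n), 3)

# Parts e and f: tail probabilities from the normal model
model = NormalDist(p, se)
p_more_than_35 = 1 - model.cdf(0.35)
p_fewer_than_20 = model.cdf(0.20)

print(f"SE = {se}")
print(f"P(proportion > .35) = {p_more_than_35:.3f}")
print(f"P(proportion < .20) = {p_fewer_than_20:.3f}")
```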