Uploaded by Sean.bradford23

Chapter 17 Notes and Problem Set

Elementary Statistics, STA 2023
Vocabulary
From the Data at Hand to the World at Large
(Chapter 17)
Inference with Means
Recall that statistical inference provides methods for drawing conclusions about a population
from sample data. The two most common types of statistical inference are confidence
intervals (estimate ± margin of error), which estimate the value of a population parameter, and
tests of significance (Five Step Method), which assess the evidence the data provide about
some claim concerning a population.
Both types of inference are based on the sampling distributions of statistics. In other words,
both report probabilities that state what would happen if we used the inference method many
times. In chapters 18-19, we studied statistical inference in relation to proportions. We’ll now
study statistical inference relating to means (averages).
Sampling distribution of sample means: The idea is to take as many samples as possible of
size n from the population with mean $\mu$ and standard deviation $\sigma$. Then collect the means, $\bar{x}$,
from all the samples, and display the distribution of the sample means. According to the
Central Limit Theorem, as the sample size n increases…
…the sampling distribution model of the sample means will be:
1. approximately normally distributed;
2. centered at the population mean: the mean of the sampling distribution of sample means
equals the population mean, $\mu_{\bar{x}} = \mu$;
3. spread out according to the standard deviation of the distribution of sample means (the
standard error of the mean), $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
…as long as the following assumptions and conditions have been met:
- Randomization (SRS Condition): the sample is randomly selected.
- 10% Condition: the sample is no more than 10% of the population.
- Sample Size Condition (Normality Condition): how close to normal the data must be depends on n.
  o If n < 30, the histogram of the data must be unimodal, symmetric, and very close to normal
    (check a normal probability plot). If the population description indicates unimodality and
    symmetry, the condition is fine. If there is no population description, the data must be
    available to verify it (chapter 3).
  o If n > 30, normality is not necessary (if outliers are present, report results with and
    without the outliers).
The Central Limit Theorem allows us to use normal probability calculations to answer
questions about sample means from many observations even when the population distribution
is not normal.
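To see the Central Limit Theorem at work, here is a small simulation sketch in Python (not part of the original notes). It assumes, purely for illustration, an exponential population with mean 1 and standard deviation 1, which is strongly right-skewed. It draws many samples of size n, collects their means, and compares the spread of those means with $\sigma/\sqrt{n}$; the names `simulate_sample_means` and `skewness` are hypothetical helpers.

```python
import math
import random
import statistics

# Assumed population for illustration only: exponential with mean 1 and SD 1,
# which is strongly right-skewed (skewness 2), i.e. clearly not normal.
MU, SIGMA = 1.0, 1.0

def simulate_sample_means(n, reps=20000, seed=17):
    """Draw `reps` random samples of size n and return their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

def skewness(values):
    """Sample skewness; about 0 for a symmetric (e.g. normal) distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values, m)
    return statistics.fmean((v - m) ** 3 for v in values) / s ** 3

results = {}
for n in (5, 40):
    means = simulate_sample_means(n)
    results[n] = (statistics.fmean(means), statistics.stdev(means), skewness(means))
    m, sd, sk = results[n]
    print(f"n={n:2d}: mean of x-bar = {m:.3f}, SD of x-bar = {sd:.3f} "
          f"(sigma/sqrt(n) = {SIGMA / math.sqrt(n):.3f}), skewness = {sk:.2f}")
```

For both sample sizes the mean of the sample means should sit near $\mu = 1$ and their SD near $\sigma/\sqrt{n}$, while the skewness of the sample means shrinks as n grows (for this particular population it is theoretically $2/\sqrt{n}$), which is the "approximately normal" part of the theorem.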
Problem 17-41
If we are just describing the distribution based on the histogram, we are essentially
treating it as the population, so we describe it with the tools from chapter 3. The
distribution is unimodal and slightly skewed, with a mean of 36 inches and a standard
deviation of 4 inches.
According to the Central Limit Theorem, the sample means will be centered at 36 inches,
and the variability of the sample means, $\sigma_{\bar{y}} = 4/\sqrt{n}$, decreases as the sample size
increases. Because the population is only slightly skewed, the sample means look normal at
smaller sample sizes than the sample size condition (n > 30) would suggest.
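A quick Python sketch of how the spread of the sample means shrinks for this population ($\mu = 36$, $\sigma = 4$ inches). The problem does not fix a particular n here, so the sample sizes below are hypothetical, chosen only to show the pattern; `standard_error` is an illustrative helper name.

```python
import math

MU, SIGMA = 36.0, 4.0  # inches, from Problem 17-41

def standard_error(sigma, n):
    """SD of the sampling distribution of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Hypothetical sample sizes, for illustration only.
for n in (4, 16, 64):
    print(f"n={n:2d}: sample means centered at {MU} in, SD = {standard_error(SIGMA, n):.2f} in")
```

Each fourfold increase in n halves the standard deviation of the sample means (2.00, 1.00, 0.50 inches), because n sits under a square root.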
Problem 17-45
Randomization: assume the students are assigned randomly.
10% Condition: 25 students are less than 10 percent of any incoming freshman class.
Sample Size Condition: n = 25 < 30. However, since the original population was mound-shaped
(unimodal) and only slightly skewed, this condition is probably satisfied, though it usually
deserves closer scrutiny.
Since the conditions check out, the distribution of the sample mean is approximately normal,
with mean equal to the population mean (i.e., a 3.4 GPA) and standard deviation
$$SD(\bar{y}) = \sigma_{\bar{y}} = \frac{0.35}{\sqrt{25}} = 0.07$$
For the 68-95-99.7 Rule, we use the same rule from chapter 5 (I know you all love that rule.
Just kidding!!)
The rule applies the same exact way:
68 percent of the sample means are within 1 standard deviation of the mean.
95 percent of the sample means are within 2 standard deviations of the mean.
99.7 percent of the sample means are within 3 standard deviations of the mean.
For this problem:
68 percent of sample means are between 3.33 and 3.47 (3.4 ± 0.07).
95 percent of sample means are between 3.26 and 3.54 (3.4 ± 0.14).
99.7 percent of sample means are between 3.19 and 3.61 (3.4 ± 0.21).
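The 68-95-99.7 boundaries for Problem 17-45 can be checked with a short Python sketch (not from the original notes). It uses the values given in the problem ($\mu = 3.4$, $\sigma = 0.35$, n = 25); the function name `empirical_rule_bounds` is a hypothetical helper.

```python
import math

def empirical_rule_bounds(mu, sigma, n):
    """68-95-99.7 intervals for the sample mean: mu +/- k * sigma / sqrt(n)."""
    se = sigma / math.sqrt(n)  # standard deviation of the sample mean
    return {pct: (mu - k * se, mu + k * se)
            for pct, k in (("68", 1), ("95", 2), ("99.7", 3))}

# Problem 17-45: mu = 3.4 (GPA), sigma = 0.35, n = 25 students
for pct, (low, high) in empirical_rule_bounds(3.4, 0.35, 25).items():
    print(f"{pct} percent of sample means lie between {low:.2f} and {high:.2f}")
```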
Problem 17-46
This one is yours to practice; reach out if you need help.
Problem 17-49
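Part A is a chapter 5 problem: we want the probability that a single pregnancy lasts between
270 and 280 days. We begin by calculating the z-scores for 270 and 280, using the population
mean of 266 days and standard deviation of 16 days:
$$z_1 = \frac{270 - 266}{16} = \frac{1}{4}, \qquad z_2 = \frac{280 - 266}{16} = \frac{7}{8}$$
Then we use normalcdf in the following way:
$$\text{normalcdf}\left(\tfrac{1}{4}, \tfrac{7}{8}, 0, 1\right) = 0.2105$$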
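As a cross-check on the calculator work, here is a Python sketch using the standard library's `statistics.NormalDist` in place of the TI calculator's normalcdf (a substitution, not the notes' own method). It assumes the Normal model with mean 266 and SD 16 days used in Part A.

```python
from statistics import NormalDist

# Problem 17-49: pregnancy durations modeled as Normal(mean 266, SD 16) days
pregnancy = NormalDist(mu=266, sigma=16)

z1 = (270 - pregnancy.mean) / pregnancy.stdev   # 0.25
z2 = (280 - pregnancy.mean) / pregnancy.stdev   # 0.875
standard = NormalDist()                          # N(0, 1), like normalcdf(..., 0, 1)
p_between = standard.cdf(z2) - standard.cdf(z1)

print(round(p_between, 4))  # 0.2105
```

Calling `pregnancy.cdf(280) - pregnancy.cdf(270)` directly gives the same answer without standardizing by hand.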
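Part B asks how many days the longest 25% of pregnancies last, that is, the cutoff above which
the longest 25% fall. Since 75% of pregnancies are shorter than this cutoff, we find the z-score
with 75% of the area to its left:
$$\text{invNorm}(0.75, 0, 1) = 0.6745$$
And then solve for x below:
$$0.6745 = \frac{x - 266}{16} \quad\Rightarrow\quad x \approx 277 \text{ days}$$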
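A Python sketch of Part B, with `NormalDist.inv_cdf` standing in for the calculator's invNorm (an assumed equivalent, not the notes' tool). The z-score for the 75th percentile is positive, since that percentile lies above the mean.

```python
from statistics import NormalDist

pregnancy = NormalDist(mu=266, sigma=16)  # days

z = NormalDist().inv_cdf(0.75)      # z with 75% of the area to its left (positive)
cutoff = 266 + z * 16               # solve z = (x - 266) / 16 for x
print(round(z, 4), round(cutoff, 1))  # 0.6745 276.8
```

`pregnancy.inv_cdf(0.75)` returns the same cutoff in one step; rounding 276.8 gives the 277 days found above.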
Part C tells us we took a sample of 60 women from this population. We check the conditions
so that the Central Limit Theorem applies.
10% Condition: 60 women are less than 10 percent of all pregnant women in any location.
Randomization: assumed unless told otherwise.
Sample Size Condition: n = 60 > 30, so we are good.
Since the conditions are met, the sampling distribution of the sample mean is approximately
normal with:
$$\mu_{\bar{y}} = \mu = 266, \qquad \sigma_{\bar{y}} = \frac{16}{\sqrt{60}} \approx 2.066$$
Part D asks for the probability that the mean duration for the sample is less than 260 days.
The process is similar to the earlier part where we found a probability. The only difference is
that we now want the probability for a sample average rather than an individual observation,
so we standardize with $\sigma_{\bar{y}} = 16/\sqrt{60}$ instead of $\sigma = 16$. The calculation looks like below:
$$z = \frac{260 - 266}{16/\sqrt{60}} = -2.9047$$
Then $\text{normalcdf}(-10000, -2.9047, 0, 1) = 0.0018$.
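Parts C and D in a short Python sketch (illustrative, using `statistics.NormalDist` as a stand-in for normalcdf). The key step is dividing by the standard error $16/\sqrt{60}$ rather than by 16, because the question is about the mean of 60 pregnancies.

```python
import math
from statistics import NormalDist

mu, sigma, n = 266, 16, 60            # Problem 17-49, Parts C and D
se = sigma / math.sqrt(n)             # sigma_ybar = 16 / sqrt(60)
z = (260 - mu) / se                   # standardize the sample mean, not one pregnancy
p_less = NormalDist().cdf(z)          # left tail, like normalcdf(-10000, z, 0, 1)

print(round(se, 4), round(z, 4), round(p_less, 4))
```

Equivalently, `NormalDist(mu, se).cdf(260)` models the sampling distribution of $\bar{y}$ directly and skips the z-score.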
Problem 17-60
This one is for you to practice; reach out with questions.
Extra Problem
Suppose we would like to know whether there is actually a gallon of milk in a gallon container.
To check this industry standard, suppose we looked at the population and found a distribution
with a mean of 0.99 gallons and a standard deviation of 0.10 gallons. If the distribution of
these containers was unimodal with a very small skew:
What would be the model/distribution to approximate an individual container containing at least
0.98 gallons of milk?
A Normal model, N(0.99, 0.10), is a reasonable approximation because the population
distribution is unimodal with only a very small skew. There will be instances where we are
below target and instances where we are above target.
What would be the probability of an individual container measuring at least 0.98 gallons?
$$z = \frac{0.98 - 0.99}{0.10} = -0.10$$
$$\text{normalcdf}(-0.10, 10000, 0, 1) = 0.5398$$
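The single-container probability as a Python sketch, assuming the Normal model N(0.99, 0.10) from the problem and using `statistics.NormalDist` in place of normalcdf. The "at least" wording means we want the right tail, hence `1 - cdf`.

```python
from statistics import NormalDist

# Extra problem: Normal model for one container, mean 0.99 gal, SD 0.10 gal
container = NormalDist(mu=0.99, sigma=0.10)

z = (0.98 - 0.99) / 0.10                 # about -0.10
p_at_least = 1 - NormalDist().cdf(z)     # right tail, like normalcdf(z, 10000, 0, 1)
print(round(p_at_least, 4))
```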
Now, suppose we take a sample of size 25. Calculate the 68-95-99.7 boundaries for the sample
mean in this context (using $SD(\bar{y}) = 0.10/\sqrt{25} = 0.02$ gallons).
The boundary for 68 percent would be (0.97, 1.01).
The boundary for 95 percent would be (0.95, 1.03).
The boundary for 99.7 percent would be (0.93, 1.05).
10% Condition: 25 gallon containers are less than 10 percent of any population/batch of gallon
milk containers.
Randomization: assumed.
Sample Size Condition: since n < 30, we have to look back to the problem description to see
whether the population is close to normal. In this case it is (unimodal with a very small skew).
Since the conditions are met, the sample mean is approximately normally distributed, and we
can state that:
$$\mu_{\bar{y}} = \mu = 0.99 \quad\text{and}\quad SD(\bar{y}) = \frac{\sigma}{\sqrt{25}} = \frac{0.10}{5} = 0.02$$
What is the probability that a sample of size 25 has a mean of at least 0.96 gallons?
$$z = \frac{0.96 - 0.99}{0.02} = -1.5$$
$$\text{normalcdf}(-1.5, 10000, 0, 1) = 0.9332$$
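The sample-of-25 part of the milk problem as one Python sketch (illustrative; `statistics.NormalDist` again stands in for normalcdf). It computes $SD(\bar{y})$, prints the 68-95-99.7 boundaries for the sample mean, and then finds $P(\bar{y} \ge 0.96)$.

```python
import math
from statistics import NormalDist

mu, sigma, n = 0.99, 0.10, 25            # gallons; sample of 25 containers
se = sigma / math.sqrt(n)                # SD(y-bar) = 0.10 / 5 = 0.02

# 68-95-99.7 boundaries for the sample mean
for pct, k in (("68", 1), ("95", 2), ("99.7", 3)):
    print(f"{pct} percent: ({mu - k * se:.2f}, {mu + k * se:.2f})")

z = (0.96 - mu) / se                     # -1.5
p_at_least = 1 - NormalDist().cdf(z)     # P(y-bar >= 0.96), like normalcdf(z, 10000, 0, 1)
print(round(p_at_least, 4))
```

Comparing this with the single-container result shows the effect of averaging: the sample mean's SD is 0.02 instead of 0.10, so means far below 0.99 become much less likely.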