Chapter 5: Inferences About Means

There are lots of different parameters we might be interested in with a quantitative variable – the
mean, the median, the quartiles or any other percentile, the standard deviation, the interquartile
range. In this chapter, we’ll examine making inferences for a population mean. We should
always remember, however, that the mean does not completely describe the population.
Example: The following is a histogram of the weights of 17 bags of peanut M&M’s. The
package claims a net weight of 49.3 grams. Assuming these bags represent a random sample of
bags of M&M’s produced, how big or small do we believe the mean net weight of all bags
produced to be? Is there evidence that the mean weight of all bags is less than 49.3 grams?
[Histogram of Weight (grams) for the 17 bags, with bins from 46.0 to 54.0 grams: Mean = 50.5, Std. Dev = 2.08, N = 17.]
We want to construct a confidence interval for µ, the mean weight of all 49.3 g bags of peanut M&M's. Our best guess for µ is the sample mean weight $\bar{y} = 50.54$ g. In order to gauge how far $\bar{y}$ might be from µ, we need to know what the sampling distribution of $\bar{y}$ looks like. That is, if we repeatedly took random samples of 17 bags from the population of bags of M&M's and computed the sample mean for each sample, how would those sample means vary around the true mean?
Recall that the sample mean is given by
$$\bar{y} = \frac{1}{n}(y_1 + y_2 + \cdots + y_n)$$
where $y_1, \ldots, y_n$ are independent random variables, each with the same distribution as the population distribution.
The rules for means and variances of sums of independent random variables then give
$$E(\bar{y}) = \mu, \qquad \operatorname{Var}(\bar{y}) = \frac{\sigma^2}{n}, \qquad SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}.$$
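These facts can be checked by simulation. The sketch below (not from the notes; plain Python with an assumed µ and σ) draws many samples of size n = 17 from a normal population and confirms that the sample means center at µ with spread σ/√n:

```python
import random
import statistics

# Simulation sketch (not from the notes): draw many samples of size n = 17
# from a normal population with known mu and sigma, and check that the
# sample means average to mu and have standard deviation sigma / sqrt(n).
random.seed(1)
mu, sigma, n = 50.0, 2.0, 17
means = []
for _ in range(20000):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(sum(sample) / n)

print(round(statistics.mean(means), 2))   # close to mu = 50.0
print(round(statistics.stdev(means), 3))  # close to sigma/sqrt(17) = 0.485
```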
Central Limit Theorem: If n is large, then the sampling distribution of y is approximately
normal.
Hence, for large n, $\bar{y}$ is approximately $N\!\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$. Therefore,
$$\frac{\bar{y} - \mu}{SD(\bar{y})}$$
is approximately N(0,1) for large n, and a confidence interval for the population mean µ is
$$\bar{y} \pm z^* SD(\bar{y}) = \bar{y} \pm z^* \frac{\sigma}{\sqrt{n}}.$$
Unfortunately, we don’t know σ. So, we estimate it by the sample standard deviation
$$s = \sqrt{\frac{1}{n-1} \sum (y_i - \bar{y})^2}.$$
The resulting estimate of $SD(\bar{y})$ is called the standard error of the mean (sometimes abbreviated SEM):
$$SE(\bar{y}) = \frac{s}{\sqrt{n}}.$$
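As a quick check of these formulas, here is a Python sketch (the short data list is hypothetical; 2.080 and 17 are the summary values from the M&M sample):

```python
import math
import statistics

# Sketch: the n-1 formula for s, checked against the library version on a
# small made-up list, then the M&M standard error from the summary values
# reported in the notes (s = 2.080, n = 17).
data = [48.2, 50.1, 51.3, 49.7, 52.0]            # hypothetical weights
ybar = sum(data) / len(data)
s_manual = math.sqrt(sum((y - ybar) ** 2 for y in data) / (len(data) - 1))
assert abs(s_manual - statistics.stdev(data)) < 1e-9

se = 2.080 / math.sqrt(17)                       # standard error of the mean
print(round(se, 4))                              # 0.5045
```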
For the sample mean, however, there is some theory that tells us how we should adjust the
formula for the confidence interval to make up for the fact that we are estimating σ. The
adjustment follows from the fact that if we are sampling from a normal distribution, then
$$\frac{\bar{y} - \mu}{SE(\bar{y})} = \frac{\bar{y} - \mu}{s/\sqrt{n}}$$
follows a t distribution with n − 1 degrees of freedom (sometimes called a Student’s t distribution). The confidence interval for the population mean is then
$$\bar{y} \pm t^*_{n-1} SE(\bar{y}) = \bar{y} \pm t^*_{n-1} \frac{s}{\sqrt{n}}$$
where $t^*_{n-1}$ is the appropriate critical value from a t distribution with n − 1 degrees of freedom.
What is a t distribution? It’s not just one distribution; it’s a whole family of distributions.
There’s a different t distribution for each degrees of freedom (abbreviated as df).
• A t distribution is similar to the standard normal distribution: it’s symmetric and centered at 0, but more spread out than the standard normal. So the critical value $t^*_{n-1}$ will always be bigger than z* for any n; using this bigger value compensates for the fact that we don’t know σ and have to substitute s.
• Table T in the back of the book gives the critical values for various confidence levels (across the bottom) for various degrees of freedom (the rows). The line for df = ∞ is the standard normal distribution (z* values).
• As the degrees of freedom increase, the spread of the t distribution decreases, so that $t^*_{n-1}$ approaches z*.
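These facts can be seen numerically with scipy’s t quantile function (a sketch, not part of the notes; scipy assumed):

```python
from scipy.stats import norm, t

# Two-sided 95% critical values: t* shrinks toward z* as df increases,
# mirroring the rows of Table T (the df = infinity line is z*).
zstar = norm.ppf(0.975)                  # about 1.960
for df in (2, 5, 16, 40, 1000):
    tstar = t.ppf(0.975, df)
    assert tstar > zstar                 # t* always exceeds z*
    print(df, round(tstar, 3))           # e.g. df = 16 gives 2.120
```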
M&M example: For the time being, we’ll assume that the distribution of weights of all M&M
bags follows a normal model. We’ll examine this assumption and the other assumptions for
inference about a population mean after we illustrate the computation of the confidence interval
and a hypothesis test. The sample size is n = 17, so to form a 95% confidence interval for the
mean weight of all bags of M&M’s produced, the critical value is $t^*_{n-1} = t^*_{16} = 2.120$ from Table T. Therefore, the 95% confidence interval is
$$\bar{y} \pm t^*_{16}\frac{s}{\sqrt{n}} = 50.54 \pm 2.120\left(\frac{2.080}{\sqrt{17}}\right) = 50.54 \pm 2.120(0.5045) = 50.54 \pm 1.07 = 49.47 \text{ to } 51.61 \text{ g}.$$
We are 95% confident that the mean weight of M&M bags is between about 49.5 and 51.6
grams.
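This interval can be reproduced in a few lines of Python (scipy assumed; a sketch, not part of the notes):

```python
import math
from scipy.stats import t

# Reproduce the 95% t interval from the notes' summary statistics.
ybar, s, n = 50.54, 2.080, 17
tstar = t.ppf(0.975, n - 1)                      # 2.120
me = tstar * s / math.sqrt(n)                    # margin of error, about 1.07
print(round(ybar - me, 2), round(ybar + me, 2))  # 49.47 51.61
```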
Which of the following are appropriate interpretations of the confidence interval?
• 95% of all bags of M&M’s weigh between 49.5 and 51.6 grams.
• We are 95% confident that a randomly selected bag will weigh between 49.5 and 51.6 grams.
• The mean weight of the bags is 50.54 grams 95% of the time.
• Approximately 95% of all samples of size 17 will have mean weights between 49.5 and 51.6 grams.
The one-sample t-test
As consumers, we might be interested in whether the sample data provide evidence that the mean
weight of M&M bags is less than the claimed weight of 49.3 grams. We are therefore interested
in testing the hypotheses:
$H_0: \mu = 49.3$ grams
$H_A: \mu < 49.3$ grams
where µ is the mean weight of all M&M bags.
• As always, the P-value is the probability of observing a sample result as extreme as or more extreme than (in the direction of the alternative hypothesis) the one we observed, if the null hypothesis were true.
• If we knew the population standard deviation σ, then this calculation would be relatively easy. For the hypothesis test above, we would compute the probability of observing a sample mean as small as or smaller than the one we observed if the population mean were 49.3 grams. That is, we would compute
$$P(\bar{y} \leq 50.54 \mid \mu = 49.3).$$
If H0 is true, then the mean of the sampling distribution of $\bar{y}$ is 49.3 grams and the standard deviation is $SD(\bar{y}) = \sigma/\sqrt{n} = \sigma/\sqrt{17}$. If the population distribution is normal, or if the sample size n is large enough so that the Central Limit Theorem applies, then the sampling distribution of $\bar{y}$ is approximately normal. Hence, $\frac{\bar{y} - 49.3}{SD(\bar{y})}$ has a standard normal distribution and the P-value is
$$P\!\left(Z < \frac{50.54 - 49.3}{SD(\bar{y})}\right).$$
• If we don’t know the population standard deviation σ, as will almost always be the case, then we plug in $SE(\bar{y}) = s/\sqrt{n} = 2.080/\sqrt{17} = 0.5045$ for $SD(\bar{y})$. This changes the standard normal distribution into a t distribution with n − 1 = 16 degrees of freedom. The P-value is then the probability that a t16 is less than the test statistic. The test statistic is
$$t = \frac{\bar{y} - \mu_0}{SE(\bar{y})} = \frac{50.54 - 49.3}{0.5045} = \frac{1.24}{0.5045} = 2.46.$$
• Since the alternative hypothesis is “<”, the P-value is the area to the left of 2.460 in the t16 distribution.
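For those with Python and scipy at hand, the test statistic and its left-tail P-value can be reproduced from the summary statistics (a sketch, not part of the notes):

```python
import math
from scipy.stats import t

# One-sample t statistic for H0: mu = 49.3 vs HA: mu < 49.3, from the
# summary statistics in the notes (ybar = 50.54, s = 2.080, n = 17).
ybar, mu0, s, n = 50.54, 49.3, 2.080, 17
se = s / math.sqrt(n)
tstat = (ybar - mu0) / se
pvalue = t.cdf(tstat, n - 1)       # area to the left in a t with 16 df
print(round(tstat, 2), round(pvalue, 3))   # about 2.46 and 0.987
```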
Computing the P-value
We can compute the P-value in three ways:
• Approximately, from Table T in the text. Table T is not complete enough to calculate the P-value exactly. All we can tell is that the area to the right of 2.460 is between .01 and .025 (because 2.460 is between 2.120 and 2.583 in the 16 df row of the table). The P-value is the area to the left of 2.460, so the P-value is between .975 and .99 (or, we can write .975 < P < .99).
• Exactly, using a t distribution calculator available on the web (see http://www.stat.sc.edu/~ogden/javahtml/pvalcalc.html ).
• Exactly, using SPSS to do the hypothesis test. SPSS gives only the P-value for a two-sided alternative, so we have to derive the P-value for a one-sided test from the two-sided P-value. For the M&M data, SPSS gives the value .026 for the two-sided P-value: this is the area to the right of 2.460 (the test statistic) plus the area to the left of −2.460. This means that the area to the right of 2.460 is .013 and, therefore, the area to the left of 2.460 is .987, which is the exact P-value.
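The conversion from the two-sided to the one-sided P-value is simple arithmetic; a minimal sketch:

```python
# SPSS reports only the two-sided P-value. Its two equal tails are each
# half of .026; the left-tail P-value is everything except the right tail.
p_two_sided = 0.026
right_tail = p_two_sided / 2       # area to the right of 2.460
p_left = 1 - right_tail            # area to the left of 2.460
print(round(right_tail, 3), round(p_left, 3))   # 0.013 0.987
```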
The large P-value means that there is no evidence that the mean weight of all M&M bags is less than 49.3 grams. This makes sense: the sample mean is greater than 49.3 grams, so it is certainly not evidence that the true mean is less than 49.3 grams as opposed to the null hypothesis that it is equal to 49.3 grams.
Suppose the M&M company management wanted to know if the data provided evidence that the
mean weight of the bags was greater than 49.3 grams. Then they would test the hypotheses
$H_0: \mu = 49.3$ grams
$H_A: \mu > 49.3$ grams
The test statistic is unchanged at t = 2.460. The P-value is now the area to the right of 2.460 on a
t distribution with 16 df. Using the table, we have that the P-value is between .01 and .025.
Using SPSS or a t distribution calculator, we have that P-value = .013. Thus, we have fairly
strong evidence that the mean weight of all bags of M&M’s is greater than 49.3 grams. This
makes sense: the weaker the evidence that µ < 49.3 grams, the stronger the evidence that µ > 49.3
grams.
The fact that this one-sided test gives fairly strong evidence that the mean weight is greater than 49.3 grams does not tell us whether the difference is of practical importance. From the manufacturer’s viewpoint, suppose that if the mean weight is within 0.5 grams of the target, it’s not worth the expense of recalibrating the machinery, but a discrepancy of more than 0.5 grams is worth correcting. Then we have to look at a confidence interval. The 95% confidence interval is 49.5 to 51.6 grams. What would our conclusion be based on the confidence interval: should the machinery be shut down or not? Or should more data be collected?
Assumptions of the t procedures
The t procedures (confidence interval and hypothesis test) make two assumptions that we’ve
seen before for proportions:
• Randomization condition: the data are from a simple random sample from the population.
• 10% condition: the sample is no more than 10% of the population.
The second condition is certainly satisfied in the M&M problem. The first condition is not
strictly satisfied. These 17 bags were not randomly selected from all bags of M&M’s produced
over some time period since they were bought at one store on one day. It might be more
reasonable to assume that these are like a random sample from the bags in one shipment. We
could judge this better by buying batches of bags from different stores at different times.
There is a third assumption of the t procedures: that the data come from a normal population.
When σ is replaced by s in the formula for SD( y ), the standard normal distribution becomes a t
distribution only if the population distribution is normal. In the M&M example, that means we
need to assume that the distribution of weights of all bags of M&M’s in the population has a
normal distribution. Of course, hardly any real populations have distributions that are exactly
normal, so what good are the t procedures?
It turns out that the t procedures are quite robust to the assumption of normality if the sample size
is large enough. “Robust” means that the procedures work as advertised even when the
population isn’t quite normal: 95% confidence intervals will still contain the true mean about
95% of the time, and the P-value will still pretty accurately represent the probability of getting a
sample result as extreme or more extreme than the one you got if the null hypothesis were true.
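The robustness claim can be checked by simulation. This sketch (plain Python plus scipy for the critical value; the exponential population is an assumed example of a skewed distribution, not from the notes) estimates the actual coverage of a nominal 95% interval:

```python
import math
import random
from scipy.stats import t

# Repeatedly sample from a skewed (exponential) population and count how
# often the nominal 95% t interval covers the true mean. With n = 40 the
# coverage should be fairly close to 95% despite the skewness.
random.seed(2)
mu, n, reps = 1.0, 40, 4000
tstar = t.ppf(0.975, n - 1)
hits = 0
for _ in range(reps):
    sample = [random.expovariate(1.0) for _ in range(n)]   # true mean 1.0
    ybar = sum(sample) / n
    s = math.sqrt(sum((y - ybar) ** 2 for y in sample) / (n - 1))
    me = tstar * s / math.sqrt(n)
    if ybar - me <= mu <= ybar + me:
        hits += 1
print(hits / reps)   # roughly 0.92 to 0.95
```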
The smaller the sample size, the more important is the assumption of normality. Our text gives
rough guidelines based on sample size, but remember that there is nothing magic about these
exact cutoffs: it’s gradual. The guidelines from our text are:
• n < 15: the population distribution should be pretty normal; at least, mound-shaped and symmetric with no outliers.
• 15 ≤ n < 40: the population distribution should be unimodal and roughly symmetric.
• n > 40: the t procedures work on most distributions, even skewed ones. Outliers are always a potential problem, but they have less influence with larger sample sizes. If the population is skewed, you should ask yourself whether the mean is a useful measure of center for your purposes.
Unfortunately, we only have the sample on which to judge the normality of the whole
population. The smaller the sample, the more important is the normality assumption and the
harder it is to judge from the sample. For small samples, we often simply look for outliers or
evidence of strong skewness. We also rely on our prior knowledge about the distribution of the
variable. In addition to histograms, normal probability plots are a useful graphical tool for
assessing the normality assumption.
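Since normal probability plots are mentioned, here is a sketch of how one can be computed numerically with scipy’s probplot (hypothetical data; with the default fit, the third fitted value r measures how close the points lie to a straight line, and is near 1 for normal-looking data):

```python
import random
from scipy.stats import probplot

# Compute the normal probability plot coordinates without drawing anything.
# osm: theoretical normal quantiles; osr: ordered sample values.
random.seed(3)
data = [random.gauss(50, 2) for _ in range(17)]   # hypothetical weights
(osm, osr), (slope, intercept, r) = probplot(data)
print(round(r, 3))   # near 1: the points fall close to a straight line
```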
Sample size calculations
It is important to be able to estimate the sample size required to get the desired level of accuracy
for a mean. For example, in the M&M example, suppose we wanted to estimate the population
mean weight to within a margin of error of ±0.5 grams with 95% confidence. The equation for the margin of error is
$$ME = t^*_{n-1}\frac{s}{\sqrt{n}}.$$
Solving for n gives
$$n = \left(\frac{t^*_{n-1}\, s}{ME}\right)^2.$$
So we need a guess at s, either from a pilot study or from our prior knowledge based on similar studies. We also don’t know $t^*_{n-1}$ until we know the sample size, so we use z* as an approximation. For the M&M example, if we use the standard deviation of the initial sample as a guess at s for a future sample, then the estimated sample size for a margin of error of ME = 0.5 with 95% confidence is
$$n = \left(\frac{1.96(2.08)}{0.5}\right)^2 = 66.5.$$
So we would guess we need a sample size of around n = 67 bags. Remember that this won’t guarantee that we will achieve our goal, since our actual s could be bigger than our guess. In addition, we used z* instead of $t^*_{n-1}$, which is bigger. We could refine our calculation by plugging in $t^*_{n-1}$ for n = 67 to get a revised estimate of sample size. It only increases the estimated sample size slightly (to 70) in this case.
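The two-step calculation can be scripted as follows (a Python sketch with scipy assumed, not part of the notes). Exact critical values may land a unit or so below the table-based figure of 70, since printed tables round t* for these degrees of freedom:

```python
import math
from scipy.stats import norm, t

# First pass uses z*; then iterate, replacing z* with the t critical value
# for the current estimate of df, until the sample size stabilizes.
s_guess, me, conf = 2.08, 0.5, 0.95
zstar = norm.ppf(0.5 + conf / 2)                  # about 1.96
n = math.ceil((zstar * s_guess / me) ** 2)        # first pass: 67
for _ in range(10):
    tstar = t.ppf(0.5 + conf / 2, n - 1)
    n_new = math.ceil((tstar * s_guess / me) ** 2)
    if n_new == n:
        break
    n = n_new
print(n)   # a couple of bags more than the first-pass 67
```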
Correspondence between confidence intervals and two-sided hypothesis tests
There is an exact correspondence between confidence intervals for a mean and two-sided
hypothesis tests for a mean. A level C confidence interval contains all the null hypothesis values
which would not be rejected at the α = 1-C level. So, for example, the 95% confidence interval
for a mean contains all the null hypothesis values which would not be rejected at the α = .05
level.
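This correspondence can be illustrated for the M&M numbers in a short Python sketch (scipy assumed; the null values tried below are chosen for illustration):

```python
import math
from scipy.stats import t

# Two-sided P-values exceed .05 exactly when the null value lies inside
# the 95% confidence interval (summary statistics from the notes).
ybar, s, n = 50.54, 2.080, 17
se = s / math.sqrt(n)
tstar = t.ppf(0.975, n - 1)
lo, hi = ybar - tstar * se, ybar + tstar * se     # about 49.47 to 51.61

def p_two_sided(mu0):
    """Two-sided P-value for H0: mu = mu0 from the summary statistics."""
    return 2 * t.sf(abs((ybar - mu0) / se), n - 1)

print(p_two_sided(50.0) > 0.05)   # 50.0 is inside the interval: True
print(p_two_sided(49.3) > 0.05)   # 49.3 is outside the interval: False
```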