Chapter 12 - Significance Tests in Practice

AP Statistics
Hamilton/Mann
Introduction
• In Chapter 11, we made the unrealistic assumption that
we knew the population standard deviation σ.
• In this chapter, we drop this assumption.
• Just like with confidence intervals, when we no
longer know the population standard deviation σ,
we must use a t distribution to carry out a
significance test.
• Remember the use and abuse of tests from Section 11.3.
• Also remember that we could be making a Type I or
Type II error whenever we use a significance test.
• Think before you calculate!
CHAPTER 12 SECTION 1
Tests about a Population Mean
HW: 12.2, 12.3, 12.4, 12.5, 12.6, 12.7, 12.9, 12.10,
12.11
Guinness Brewery
• William S. Gosset was involved in experiments and
statistics to understand data for the Guinness
brewery in Dublin, Ireland. He was trying to
determine the best varieties of barley and hops for
brewing. He ran into the problem of not knowing
the population standard deviation σ. He observed
that replacing σ by s and calling the result roughly
Normal wasn’t accurate enough. After much work,
Gosset developed what we now call the t
distribution. Guinness allowed Gosset to publish
his discoveries, but not under his own name. He
used the name “Student,” and, as a result, it is
sometimes referred to as Student’s t distribution.
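• The one-sample t statistic, $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$, with
$n - 1$ degrees of freedom, has the same interpretation as any
standardized statistic: it says how far $\bar{x}$ is from the
hypothesized mean $\mu_0$ in standard error units.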
• We are now going to learn to find P-values for a
significance test about μ using the t table.
Determining P-values
• Suppose we carry out a significance test of
$H_0: \mu = \mu_0$ versus $H_a: \mu > \mu_0$
based on a sample of size 20 and obtain
t = 1.81.
• Since there were 20 observations, we would have
df = 19. So look along the row with df = 19. Our t
statistic falls between 1.729 and 2.093 which
correspond to upper-tail values of 0.05 and 0.025.
• Because the test is one-sided (upper tail), the P-value
is the area to the right of t = 1.81, which is exactly
the upper-tail probability listed in the table. So the
P-value is between 0.025 and 0.05.
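To check a table bracket like this, the upper-tail area can be computed directly. The sketch below is a minimal, standard-library-only Python version that integrates the t density with Simpson's rule. The helper names `t_pdf` and `t_upper_tail` are hypothetical (not from any statistics package); a calculator's tcdf or a library routine would normally do this job.

```python
import math

# Hypothetical helpers (not from any statistics library): density and
# upper-tail area of Student's t distribution, standard library only.
def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) = 0.5 - (area under the density from 0 to t), via Simpson's rule.

    steps must be even; negative t also works (the area is then negative).
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 == 1 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# One-sided test (Ha: mu > mu0) with n = 20, so df = 19, and t = 1.81
p_value = t_upper_tail(1.81, df=19)
print(f"P-value = {p_value:.4f}")
```

The computed value should land inside the 0.025 to 0.05 bracket read from the df = 19 row.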
Determining P-values
• Suppose we carry out a significance test of
$H_0: \mu = \mu_0$ versus $H_a: \mu \ne \mu_0$
based on a sample of size 37 and obtain
t = -3.17.
• Since there were 37 observations, we would have df =
36. Since df = 36 is not in the table, we use the next
smaller row, df = 30, which gives a slightly conservative
(larger) P-value. The absolute value |t| = 3.17 falls
between 3.030 and 3.385, which correspond to upper-tail
values of 0.0025 and 0.001.
• Since it is a two-sided test, we have to find the
probability less than -3.17 or greater than 3.17. By
symmetry, the lower-tail area equals the upper-tail
area, so we double both table values. So we can
conclude that our P-value is between 0.002 and 0.005.
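The doubling step can be sketched the same way. This minimal Python example reuses the hypothetical Simpson's-rule helpers from a standard-library-only implementation (repeated so the block runs on its own) and compares the exact df = 36 answer with the df = 30 row actually read from the table.

```python
import math

# Hypothetical helpers (not from any statistics library).
def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) via Simpson's rule on the density from 0 to t (steps even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 == 1 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

t_stat = -3.17
# Two-sided: double the upper-tail area beyond |t| (symmetry of the t density)
p_exact = 2 * t_upper_tail(abs(t_stat), df=36)   # true degrees of freedom
p_row30 = 2 * t_upper_tail(abs(t_stat), df=30)   # the table row we used
print(f"df = 36: {p_exact:.4f}   df = 30: {p_row30:.4f}")
```

Both values fall between 0.002 and 0.005, and the df = 30 value is the larger one, which is why using the smaller df row is called conservative.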
The One-Sample t-test
• In significance tests as in confidence intervals, we
allow for unknown σ by using the standard error
$s/\sqrt{n}$ in place of $\sigma/\sqrt{n}$ and replacing z by t.
Testing
• Now we can do a realistic analysis of data produced
to test a claim about an unknown population mean.
• Again, we need to follow the steps in our Inference
Toolbox.
1. Hypotheses – What is the population of interest and
what are the hypotheses we are testing?
2. Conditions – SRS, Normality and Independence
3. Calculations – Find P-value
4. Interpretation – Connection, Conclusion, Context
Sweet Cola
• Diet colas use artificial sweeteners to avoid sugar.
These sweeteners gradually lose their sweetness over
time. Manufacturers therefore test new colas for loss
of sweetness before marketing them. Trained tasters
(wouldn’t you love this job) sip the cola along with
drinks of standard sweetness and score the cola on a
“sweetness scale” of 1 to 10. The cola is then stored for
a month at high temperature to imitate the effect of
storing it at room temperature for four months. Each
taster will then score the cola again after storage. Our
data are the differences (score before storage minus
score after storage) in the tasters’ scores. The bigger
these differences, the bigger the loss of sweetness.
Sweet Cola
• Here are the sweetness losses for a new cola, as
measured by 10 different sweetness tasters.
2.0
0.4
0.7
2.0
-0.4
2.2
-1.3
1.2
1.1
2.3
• Notice that most are positive, indicating a loss of
sweetness.
• Is there good evidence that the cola lost sweetness
in storage?
• Step 1 – Hypotheses – Let μ be the mean sweetness loss
(score before storage minus score after storage) in the
population of tasters. We test $H_0: \mu = 0$ versus
$H_a: \mu > 0$.
Sweet Cola
• Step 2 – Conditions – Since we do not know the
standard deviation of sweetness loss in the
population of tasters, we must use a one-sample t test.
– SRS – We must be willing to treat our 10 tasters as an
SRS from the population of tasters if we want to draw
conclusions about tasters in general. The tasters all have
the same training. So even though we don’t have an
actual SRS, we are willing to act as if we did.
– Normality – The sample is too small to check Normality
effectively. A stemplot and a boxplot show some
left skewness, but no gaps or outliers.
– Independence – It is reasonable that different
tasters would have results independent from each other.
Sweet Cola
• Step 3 – Calculations
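• With $\bar{x} = 1.02$ and $s = 1.196$,
$t = \dfrac{1.02 - 0}{1.196/\sqrt{10}} = 2.70$ with df = 10 - 1 = 9.
In the df = 9 row, 2.70 falls between 2.398 and 2.821,
so our P-value is between 0.01 and 0.02. Using the
calculator, we can get the exact value of 0.0123.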
• Step 4 – Interpretation – A P-value between 0.01
and 0.02 is quite small and gives good evidence
against H0. Therefore, we reject H0, and conclude
that the cola has lost sweetness during storage.
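The Step 3 arithmetic can be reproduced from the ten losses. This is a minimal standard-library sketch: `statistics.stdev` gives the sample standard deviation (divisor n - 1), and the tail area comes from the same hypothetical Simpson's-rule helpers used for table checks, repeated so the block is self-contained.

```python
import math
import statistics

# Hypothetical helpers (not from any statistics library).
def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) via Simpson's rule on the density from 0 to t (steps even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 == 1 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Sweetness losses (before minus after storage) for the 10 tasters
losses = [2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3]
n = len(losses)
x_bar = statistics.mean(losses)
s = statistics.stdev(losses)                 # sample standard deviation
t_stat = (x_bar - 0) / (s / math.sqrt(n))    # H0: mu = 0
p_value = t_upper_tail(t_stat, df=n - 1)     # Ha: mu > 0, upper tail only
print(f"x-bar = {x_bar:.2f}, s = {s:.3f}, t = {t_stat:.2f}, P = {p_value:.4f}")
```

The output should agree with the slide: t of about 2.70 and a P-value close to the calculator's 0.0123.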
T-Procedures
• Because t-procedures are so common, all statistical
software packages will do the calculations for you.
• The next two slides have the calculations as
performed by 4 different statistical software
packages:
1. DataDesk
2. Fathom
3. Minitab
4. CrunchIt!
Two-Tailed Example
• An investor with a stock portfolio worth several
hundred thousand dollars sued his broker because a lack
of diversification in his portfolio led to poor
performance. The table below gives the rates of return
for the 39 months the account was managed by the
broker.
 -8.36    1.63   -2.27   -2.93   -2.70   -2.93   -9.14   -2.64    6.82   -2.35
 -3.58    6.13    7.00  -15.25   -8.66   -1.03   -9.16   -1.25   -1.22  -10.27
 -5.11   -0.80   -1.44    1.28   -0.65    4.34   12.22   -7.21   -0.09    7.34
  5.04   -7.24   -2.14   -1.01   -1.41   12.03   -2.56    4.33    2.35
• An arbitration panel compared these returns with the
average of the Standard & Poor’s 500 stock index for
the same period. Consider the 39 monthly returns as a
random sample from the monthly returns the broker
would generate if he managed the account forever. Are
these returns compatible with a population mean of
μ = 0.95%, the S&P 500 average?
Two-Tailed Example
• Step 1 – Hypotheses – $H_0: \mu = 0.95$ versus
$H_a: \mu \ne 0.95$, where μ is the mean return (in
percent) for all possible months that the broker
could manage this account
• Step 2 – Conditions – Since we don’t know σ, we
must use t-procedures.
– SRS – We were told to assume it was a random sample.
– Normality – Since we have 39 observations, the Central
Limit Theorem says that the sampling distribution of
$\bar{x}$ is approximately Normal. A boxplot and a histogram
show no outliers and no strong skewness.
– Independence – This is a matter of judgment. Would
these 39 months represent independent observations?
Two-Tailed Example
• Step 3 – Calculations
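• With $\bar{x} = -1.10$ and $s = 5.99$,
$t = \dfrac{-1.10 - 0.95}{5.99/\sqrt{39}} = -2.14$ with df = 38.
Since df = 38 is not in the table, we use the df = 30
row, where |t| = 2.14 falls between 2.042 and 2.147,
which correspond to upper-tail values of 0.025 and
0.02. Since it is a two-tailed test, we double these,
so our P-value is between 0.04 and 0.05.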
• Step 4 – Interpretation – The mean monthly return
for this client’s account differs significantly from the
S&P 500 average for the same period (t = -2.14, P < 0.05).
• Software outputs are on the next two slides!
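As a stand-in for those software outputs, here is a minimal standard-library sketch of the same two-sided test. It uses the exact df = 38 rather than the df = 30 table row, so its P-value may differ a little from the table bracket; both lead to the same conclusion at α = 0.05. The tail-area helpers are hypothetical Simpson's-rule functions, not a library API.

```python
import math
import statistics

# Hypothetical helpers (not from any statistics library).
def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) via Simpson's rule on the density from 0 to t (steps even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 == 1 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Monthly rates of return (%) for the 39 months
returns = [-8.36, 1.63, -2.27, -2.93, -2.70, -2.93, -9.14, -2.64, 6.82, -2.35,
           -3.58, 6.13, 7.00, -15.25, -8.66, -1.03, -9.16, -1.25, -1.22, -10.27,
           -5.11, -0.80, -1.44, 1.28, -0.65, 4.34, 12.22, -7.21, -0.09, 7.34,
           5.04, -7.24, -2.14, -1.01, -1.41, 12.03, -2.56, 4.33, 2.35]
n = len(returns)
x_bar = statistics.mean(returns)
s = statistics.stdev(returns)
mu0 = 0.95                                     # S&P 500 mean monthly return (%)
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
p_value = 2 * t_upper_tail(abs(t_stat), df=n - 1)   # two-sided alternative
print(f"x-bar = {x_bar:.2f}, s = {s:.2f}, t = {t_stat:.2f}, P = {p_value:.4f}")
```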
Estimating Mean Stock Return – C.I.
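• The mean monthly return on the client’s portfolio was
$\bar{x} = -1.10\%$ and the standard deviation was $s = 5.99\%$.
Using $t^* = 2.042$ (df = 30 row of the table), our
resulting 95% confidence interval is
$-1.10 \pm 2.042\,\dfrac{5.99}{\sqrt{39}} = -1.10 \pm 1.96 = (-3.06\%, 0.86\%)$.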
• Because the S&P 500 return, 0.95%, falls outside of our
interval, we know that μ differs significantly from 0.95%
at the α = 0.05 level. Since the S&P 500 showed a
mean gain of 0.95% during this time period, we can say
with 95% confidence that the underperformance of this
portfolio is between 0.09% and 4.01% per month. This
estimate helps to determine the compensation owed to
the investor.
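The interval and the underperformance range can be checked with a few lines of standard-library Python. This sketch assumes, as the slide does, the df = 30 table value t* = 2.042; software using df = 38 would give a slightly narrower interval.

```python
import math
import statistics

# Monthly rates of return (%) for the 39 months
returns = [-8.36, 1.63, -2.27, -2.93, -2.70, -2.93, -9.14, -2.64, 6.82, -2.35,
           -3.58, 6.13, 7.00, -15.25, -8.66, -1.03, -9.16, -1.25, -1.22, -10.27,
           -5.11, -0.80, -1.44, 1.28, -0.65, 4.34, 12.22, -7.21, -0.09, 7.34,
           5.04, -7.24, -2.14, -1.01, -1.41, 12.03, -2.56, 4.33, 2.35]
n = len(returns)
x_bar = statistics.mean(returns)
s = statistics.stdev(returns)

t_star = 2.042                      # t table, df = 30 row, 95% confidence
margin = t_star * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

sp500 = 0.95                        # S&P 500 mean monthly return (%)
# Underperformance = S&P 500 return minus portfolio mean, for each interval end
shortfall = (sp500 - upper, sp500 - lower)
print(f"95% CI: ({lower:.2f}%, {upper:.2f}%)")
print(f"Underperformance: {shortfall[0]:.2f}% to {shortfall[1]:.2f}% per month")
```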
Paired t Tests
• In the sweet cola example, the same 10 tasters rated
sweetness before and after storage. Since the data were
paired by taster, we performed a one-sample t test
on the differences.
• That is, we used a paired t test.
• We are now going to look at another example of a
paired t test.
Floral Scents and Learning
• We hear that listening to Mozart improves students’
performance on tests. Perhaps pleasant odors have
a similar effect. To test this idea, 21 subjects
worked a paper-and-pencil maze while wearing a
mask. The mask was either unscented or carried a
floral scent. The response variable is their average
time on three trials. Each subject worked the maze
with both masks, in a random order. The
randomization is important because subjects tend
to improve their times as they work a maze
repeatedly. The table on the next slide gives the
subjects’ average times with both masks.
Floral Scents and Learning
Subject  Unscented  Scented  Difference
1        30.60      37.97     -7.37
2        48.43      51.57     -3.14
3        60.77      56.67      4.10
4        36.07      40.47     -4.40
5        68.47      49.00     19.47
6        32.43      43.23    -10.80
7        43.70      44.57     -0.87
8        37.10      28.40      8.70
9        31.17      28.23      2.94
10       51.23      68.47    -17.24
11       65.40      51.10     14.30
12       58.93      83.50    -24.57
13       54.47      38.30     16.17
14       43.53      51.37     -7.84
15       37.93      29.33      8.60
16       43.50      54.27    -10.77
17       87.70      62.73     24.97
18       53.53      58.00     -4.47
19       64.30      52.40     11.90
20       47.37      53.63     -6.26
21       53.67      47.00      6.67
• To analyze these data, subtract each subject’s scented
time from the unscented time (Difference = Unscented -
Scented). A positive value therefore means the subject
was faster, that is, did better, wearing the scented mask.
Floral Scents and Learning
• Step 1: Hypotheses – $H_0: \mu = 0$ versus $H_a: \mu > 0$,
where μ is the mean difference (unscented minus scented)
in the population from which the subjects were drawn.
• Step 2: Conditions – Since we don’t know σ, we must
use t procedures.
– SRS – The data come from a randomized matched pairs
design, which means we can attribute any difference to the
treatment. We can only generalize the results to the
population if the sample is an SRS.
– Normality – Since the sample (n = 21) is not large, we must
look at a stemplot and a histogram of the differences to see
whether they are reasonably Normal. They show no outliers or
gaps and appear roughly Normal.
– Independence – It seems reasonable that one subject’s
difference in average completion time with the two masks
is independent of another subject’s.
Floral Scents and Learning
• Step 3 – Calculations
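– With $\bar{x} = 0.957$ and $s = 12.55$, $t = \dfrac{0.957 - 0}{12.55/\sqrt{21}} = 0.35$
with df = 20. In the df = 20 row, 0.35 is less than 0.687, the value
for an upper-tail probability of 0.25, so our P-value is greater than 0.25.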
• Step 4 – Interpretation
– Since this P-value is large, we fail to reject H0. Therefore,
there is not enough evidence to conclude that scented
masks improve performance.
• The next three slides contain statistical software
printouts for the significance test.
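A minimal standard-library sketch of the same paired t test follows: form the differences, then run a one-sample t test on them. The tail-area helpers are hypothetical Simpson's-rule functions repeated so the block is self-contained, not part of any statistics package.

```python
import math
import statistics

# Hypothetical helpers (not from any statistics library).
def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) via Simpson's rule on the density from 0 to t (steps even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 == 1 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Average maze times for subjects 1-21 with each mask
unscented = [30.60, 48.43, 60.77, 36.07, 68.47, 32.43, 43.70, 37.10, 31.17, 51.23, 65.40,
             58.93, 54.47, 43.53, 37.93, 43.50, 87.70, 53.53, 64.30, 47.37, 53.67]
scented = [37.97, 51.57, 56.67, 40.47, 49.00, 43.23, 44.57, 28.40, 28.23, 68.47, 51.10,
           83.50, 38.30, 51.37, 29.33, 54.27, 62.73, 58.00, 52.40, 53.63, 47.00]

# Paired design: analyze the per-subject differences (unscented - scented)
diffs = [u - sc for u, sc in zip(unscented, scented)]
n = len(diffs)
x_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)
t_stat = (x_bar - 0) / (s_d / math.sqrt(n))     # H0: mu = 0
p_value = t_upper_tail(t_stat, df=n - 1)        # Ha: mu > 0
print(f"x-bar = {x_bar:.3f}, s = {s_d:.2f}, t = {t_stat:.2f}, P = {p_value:.3f}")
```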
One Sample t Test: Robustness and Power
• Recall from Section 10.2 that t procedures are
robust against non-Normality of the population
except when outliers or strong skewness are
present.
• As the sample size increases, the Central Limit
Theorem ensures that the distribution of the
sample mean becomes more nearly Normal and
that the t distribution becomes more accurate for
calculating P-values.
• Review the guidelines in the box “Using the t
Procedures” on p. 655.
One Sample t Test: Robustness and Power
• The power of a statistical test measures its ability to
detect deviations from the null hypothesis.
• In practice, we carry out the test in the hope of
showing that the null hypothesis is false, so higher
power is important.
• The power of the one-sample t test against a
specific alternative value of the population mean μ
is the probability that the test will reject the null
hypothesis when the mean has this alternative
value.
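Power can be estimated by simulation: draw many samples from a population whose mean equals the alternative value and count how often the test rejects. The sketch below is illustrative only. The sample size n = 10, population standard deviation σ = 1.2, alternative μ = 1.0 and random seed are all hypothetical values chosen for the example; only the critical value 1.833 (df = 9, upper-tail 0.05) comes from the t table.

```python
import math
import random
import statistics

N = 10          # hypothetical sample size
T_STAR = 1.833  # t table: df = N - 1 = 9, upper-tail probability 0.05

def simulated_power(mu_alt, sigma, reps=10_000, seed=12):
    """Estimate the power of the one-sided t test of H0: mu = 0 vs Ha: mu > 0
    at alpha = 0.05, when samples really come from Normal(mu_alt, sigma)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu_alt, sigma) for _ in range(N)]
        t_stat = statistics.mean(sample) / (statistics.stdev(sample) / math.sqrt(N))
        if t_stat > T_STAR:
            rejections += 1
    return rejections / reps

power = simulated_power(mu_alt=1.0, sigma=1.2)   # hypothetical alternative
size = simulated_power(mu_alt=0.0, sigma=1.2)    # H0 true: should be near 0.05
print(f"estimated power = {power:.3f}, rejection rate when H0 is true = {size:.3f}")
```

Running it again with a larger `mu_alt` or a larger `N` (and the matching t* for the new df) shows power rising, which is the behavior the definition above describes.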
CHAPTER 12 SECTION 2
Tests about a Population Proportion
HW: 12.23, 12.24, 12.26, 12.29, 12.30, 12.31, 12.32
Tests about a Population Proportion
• When the three important conditions below are met, the
sampling distribution of $\hat{p}$ is approximately Normal
with mean $\mu_{\hat{p}} = p$
and standard deviation $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$
– SRS
– Normality – np and n(1-p) both at least 10
– Independence
• For confidence intervals, since we were trying to
estimate p, we replaced p with $\hat{p}$ in the standard
deviation formula, which gave us the standard error
$SE_{\hat{p}} = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$.
Tests about a Population Proportion
• Now, we are performing a significance test. In a
significance test, our null hypothesis specifies a
value for p which we call p0.
• So we will use p0 to find the standard deviation
since we are assuming that it is correct.
• This means that our test statistic is
$z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$.
Significance Test for a Proportion
Work Stress
• According to the National Institute for Occupational
Safety and Health, job stress poses a major threat to
the health of workers. A national survey of
restaurant employees found that 75% said that
work stress had a negative impact on their personal
lives. A random sample of 100 employees from a
large chain finds that 68 answer “Yes” when asked,
“Does work stress have a negative impact on your
personal life?” Is this good reason to think that the
proportion of all employees in this chain who would
say “Yes” differs from the national proportion of
0.75?
Work Stress
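• Step 1 – Hypotheses – $H_0: p = 0.75$ versus $H_a: p \ne 0.75$,
where p is the proportion of all employees in this chain
who would say “Yes.”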
• Step 2 – We should use a one-proportion z-test.
– SRS – We are told it was an SRS.
– Normality – The expected number of “Yes” and “No”
responses are 75 and 25 which are both larger than 10.
– Independence – It is reasonable to assume that this large
chain has at least 10(100) = 1000 employees, so the sample
is less than 10% of the population.
Work Stress
• Step 3 – Calculations
• $\hat{p} = 68/100 = 0.68$, so $z = \dfrac{0.68 - 0.75}{\sqrt{0.75(0.25)/100}} = -1.62$.
• Since $H_a$ is two-sided, we must find the probability
that Z is less than -1.62 or greater than 1.62:
P-value = 2(0.0526) = 0.1052.
• Step 4 – Interpretation
– Since our P-value of 0.1052 is fairly large, we fail
to reject the null hypothesis. We do not have good
evidence that the proportion of workers at this large
restaurant chain who would say work stress hurts their
personal lives differs from the national survey result of 0.75.
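The z calculation can be reproduced in a few lines of standard-library Python. The Normal CDF is written with `math.erfc`; the helper name `normal_cdf` is our own, not a library function. Note that the unrounded z gives a P-value slightly different from 0.1052, which comes from rounding z to -1.62 before using the table.

```python
import math

def normal_cdf(z):
    """Standard Normal CDF, written with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

n, yes_count, p0 = 100, 68, 0.75
p_hat = yes_count / n
sd = math.sqrt(p0 * (1 - p0) / n)          # uses p0 because H0 is assumed true
z = (p_hat - p0) / sd
p_value = 2 * normal_cdf(-abs(z))          # two-sided alternative
p_value_rounded_z = 2 * normal_cdf(-1.62)  # the table calculation on the slide
print(f"z = {z:.3f}, P = {p_value:.4f} (with z rounded to -1.62: {p_value_rounded_z:.4f})")
```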
Work Stress
• For the work stress example, we arbitrarily chose a
response of “Yes” to be a success and “No” to be a
failure.
• What would happen if we reversed these?
• Let’s repeat the significance test with “No” being a
success. The national comparison value for the
significance test will now be 0.25, the proportion in
the national sample who said “No.”
Work Stress, Again
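• Step 1 – Hypotheses – $H_0: p = 0.25$ versus $H_a: p \ne 0.25$,
where p is the proportion of all employees in this chain
who would say “No.”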
• Step 2 – We should use a one-proportion z-test.
– SRS – We are told it was an SRS.
– Normality – The expected number of “No” and “Yes”
responses are 25 and 75 which are both larger than 10.
– Independence – It is reasonable to assume that this large
chain has at least 10(100) = 1000 employees, so the sample
is less than 10% of the population.
Work Stress, Again
• Step 3 – Calculations
• $\hat{p} = 32/100 = 0.32$, so $z = \dfrac{0.32 - 0.25}{\sqrt{0.25(0.75)/100}} = 1.62$.
• Since $H_a$ is two-sided, we must find the probability
that Z is less than -1.62 or greater than 1.62:
P-value = 2(0.0526) = 0.1052.
• Step 4 – Interpretation
– Since our P-value of 0.1052 is fairly large, we fail
to reject the null hypothesis. We do not have good
evidence that the proportion of workers at this large
restaurant chain who would say “No” differs from the
national survey result of 0.25.
Work Stress, Again
• When we interchanged “Yes” and “No,” we simply
changed the sign of the test statistic z. Our P-value
remained the same.
• These results are true in general. Our conclusion
does not depend on an arbitrary choice of success
and failure.
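The symmetry claim is easy to check numerically. In this minimal sketch, `one_prop_z` is a hypothetical helper (not a library function) that returns the one-proportion z statistic and its two-sided P-value; swapping the success label flips the sign of z and leaves the P-value unchanged.

```python
import math

def normal_cdf(z):
    """Standard Normal CDF, written with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def one_prop_z(successes, n, p0):
    """Hypothetical helper: z statistic and two-sided P-value for H0: p = p0."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z, 2 * normal_cdf(-abs(z))

z_yes, p_yes = one_prop_z(68, 100, 0.75)   # "Yes" counted as success
z_no, p_no = one_prop_z(32, 100, 0.25)     # "No" counted as success
print(f"Yes: z = {z_yes:.3f}, P = {p_yes:.4f}")
print(f"No:  z = {z_no:.3f}, P = {p_no:.4f}")
```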
Significance Test Results
• The results of a significance test will often have
limited use.
• Obviously, we would never expect the experiences
of a sample to be exactly the same as the overall
population.
• If our sample is sufficiently large, however, we will
have sufficient power to detect a very small
difference.
• On the other hand, if our sample is very small, we
may be unable to detect differences that could be
very important.
• This is why we prefer to include a confidence
interval as part of our analysis.
Confidence Intervals Provide More Info
• A confidence interval allows us to see what other
values of p are compatible with the sample results.
• This is why we will calculate a confidence interval.
Estimating Work Stress
• The restaurant worker survey found that 68 out of 100
employees agreed that work stress had a negative
impact on their personal lives. Let’s construct a
95% confidence interval.
• Step 1 – Population
– We want to estimate p, the proportion of all employees in
this restaurant chain who believe that work stress has a
negative impact on their personal lives.
• Step 2 – We should use a one-proportion z-interval.
– SRS – We are told it was an SRS.
– Normality – We have 68 successes and 32 failures in our
sample, both at least 10, so the sampling distribution of
$\hat{p}$ is approximately Normal.
– Independence – It is reasonable to assume that this large
chain has at least 10(100) = 1000 employees, so the sample
is less than 10% of the population.
Estimating Work Stress
• Note: Checking Normality is different for a confidence
interval and for a significance test. For the
confidence interval we check that
$n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$,
while for a significance test we check that
$np_0 \ge 10$ and $n(1-p_0) \ge 10$,
since we have to assume that the null hypothesis is
true.
Estimating Work Stress
• Step 3 – Calculations –
$0.68 \pm 1.96\sqrt{\dfrac{0.68(0.32)}{100}} = 0.68 \pm 0.091 = (0.589, 0.771)$
• Step 4 – Interpretation
– We are 95% confident that between 59% and 77% of the
restaurant chain’s employees feel that work related
stress is damaging their personal lives.
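The Step 3 interval can be checked in a few lines. This minimal standard-library sketch uses z* = 1.96 for 95% confidence and the standard error built from $\hat{p}$, as the slide does.

```python
import math

n, yes_count = 100, 68
p_hat = yes_count / n
z_star = 1.96                                 # 95% confidence
se = math.sqrt(p_hat * (1 - p_hat) / n)       # standard error uses p-hat, not p0
lower, upper = p_hat - z_star * se, p_hat + z_star * se
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```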
Estimating Work Stress
• The confidence interval gives us much more
information than the significance test.
• The confidence interval tells us which values of p are
consistent with the sample results.
• Notice that we use the standard error (based on $\hat{p}$) to
create a confidence interval, while we use the hypothesized
value $p_0$ to compute the standard deviation in the z statistic.
• As a result, we do not have the nice relationship
between a confidence interval and a two-tailed
significance test like we did for means. The results are
still very close, but they do not match exactly as they did
for means.
• Our confidence interval (0.59, 0.77) gives an
approximate range of p0’s that would not be rejected by
a test at the α = 0.05 significance level.
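To see how close "approximate" is, the sketch below scans candidate values p0 = 0.001, 0.002, ..., 0.999, keeps those that a two-sided z test at α = 0.05 would not reject (|z| < 1.96, with the standard deviation built from p0), and prints that range next to the 95% confidence interval built from $\hat{p}$. This is an illustrative standard-library check, not a standard procedure; by our hand calculation the two ranges come out close but not identical.

```python
import math

n, p_hat, z_star = 100, 0.68, 1.96

# Values of p0 that a two-sided test at alpha = 0.05 would NOT reject
not_rejected = []
for k in range(1, 1000):
    p0 = k / 1000
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)   # test uses p0
    if abs(z) < z_star:
        not_rejected.append(p0)

# 95% confidence interval, which uses p-hat in the standard error
se = math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - z_star * se, p_hat + z_star * se)

print("p0 not rejected:", not_rejected[0], "to", not_rejected[-1])
print("95% CI:", tuple(round(x, 3) for x in ci))
```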