16-f11-bgunderson-wb

Author(s): Brenda Gunderson, Ph.D., 2011
License: Unless otherwise noted, this material is made available under the
terms of the Creative Commons Attribution–Non-commercial–Share
Alike 3.0 License: http://creativecommons.org/licenses/by-nc-sa/3.0/
We have reviewed this material in accordance with U.S. Copyright Law and have tried to maximize your
ability to use, share, and adapt it. The citation key on the following slide provides information about how you
may share and adapt this material.
Copyright holders of content included in this material should contact open.michigan@umich.edu with any
questions, corrections, or clarification regarding the use of content.
For more information about how to cite these materials visit http://open.umich.edu/education/about/terms-of-use.
Any medical information in this material is intended to inform and educate and is not a tool for self-diagnosis
or a replacement for medical evaluation, advice, diagnosis or treatment by a healthcare professional. Please
speak to your physician if you have questions about your medical condition.
Viewer discretion is advised: Some medical content is graphic and may not be suitable for all viewers.
Some material may be sourced from:
Mind on Statistics
Utts/Heckard, 3rd Edition, Duxbury, 2006
Text Only: ISBN 0495667161
Bundled version: ISBN 1111978301
Material from this publication used with permission.
Attribution Key
for more information see: http://open.umich.edu/wiki/AttributionPolicy
Use + Share + Adapt
{ Content the copyright holder, author, or law permits you to use, share and adapt. }
Public Domain – Government: Works that are produced by the U.S. Government. (17 USC § 105)
Public Domain – Expired: Works that are no longer protected due to an expired copyright term.
Public Domain – Self Dedicated: Works that a copyright holder has dedicated to the public domain.
Creative Commons – Zero Waiver
Creative Commons – Attribution License
Creative Commons – Attribution Share Alike License
Creative Commons – Attribution Noncommercial License
Creative Commons – Attribution Noncommercial Share Alike License
GNU – Free Documentation License
Make Your Own Assessment
{ Content Open.Michigan believes can be used, shared, and adapted because it is ineligible for copyright. }
Public Domain – Ineligible: Works that are ineligible for copyright protection in the U.S. (17 USC § 102(b)) *laws in
your jurisdiction may differ
{ Content Open.Michigan has used under a Fair Use determination. }
Fair Use: Use of works that is determined to be Fair consistent with the U.S. Copyright Act. (17 USC § 107) *laws in your
jurisdiction may differ
Our determination DOES NOT mean that all uses of this 3rd-party content are Fair Uses and we DO NOT guarantee that
your use of the content is Fair.
To use this content you should do your own independent analysis to determine whether or not your use will be Fair.
Module 5: One Sample t-Test Procedures
Objectives: In this module you will learn an important statistical technique that will allow you to answer
the question, “Was it due to chance, or is there something else?” The objective is to guide you in the
understanding of the ideas behind tests of statistical significance and the statistical language involved.
This module first presents a general overview of testing. The activity discusses the one-sample t test for
a population mean. Module 6 will cover the paired data scenario for a population mean difference and
Module 7 will discuss inference for comparing two population means based on independent random
samples.
Overview of testing: A test of hypotheses or significance test is a procedure designed to assess the
evidence provided by the data in favor of some statement about a population parameter. Elements of a
statistical test include: a null and alternative hypothesis, assumption checking, a test statistic, a p-value,
a decision, and a conclusion. The choice of the test statistic depends on the distribution of the
population from which the data come as well as the hypotheses being considered.
The null hypothesis, H0, represents the status quo or statement of no effect. It is generally the model
that the experimenter would like to replace. The alternative hypothesis, Ha, usually represents the
experimenter's new model, what the experimenter would like to support (some texts will write the
alternative as H1). It may be a denial of the null hypothesis (two-sided test, ≠) or it may specify a
direction of interest (one-sided test; >, <).
The purpose of significance tests is to assess whether or not the observed data are consistent with the
null hypothesis within the reasonable bounds of sampling variability. If the data seem to be unlikely to
occur if the null hypothesis is assumed to be true, then we would reject the statement made in the null
hypothesis. The test statistic is a summary of the data that is used to help make the decision. It is a
random variable related to the hypotheses of interest having a known probability distribution (under
the null hypothesis) and will be examined for evidence for or against H0.
In hypothesis testing, it is often preferable to report the p-value, a number that is used to indicate the
degree of significance of the data. The p-value is the probability of getting a test statistic as extreme or
more extreme than the observed value of the test statistic, assuming the null hypothesis is true. Here
“extreme” means in the direction of Ha or providing more evidence against H0.
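As a rough illustration of how the direction of Ha determines which tail area is the p-value, here is a minimal Python sketch. It assumes the test statistic follows a t distribution under H0 and approximates the tail area numerically with Simpson's rule. The function names (t_pdf, upper_tail, p_value) and the values t = 2.00 and df = 20 are hypothetical choices for illustration, not part of the course materials.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) for T ~ t(df) using Simpson's rule and symmetry."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_a = total * h / 3        # P(0 <= T <= |t|)
    tail = 0.5 - area_0_to_a           # P(T >= |t|)
    return tail if t >= 0 else 1.0 - tail

def p_value(t, df, alternative):
    """p-value for Ha of type 'greater', 'less', or 'two-sided'."""
    if alternative == "greater":       # extreme means large positive t
        return upper_tail(t, df)
    if alternative == "less":          # extreme means large negative t
        return 1.0 - upper_tail(t, df)
    if alternative == "two-sided":     # extreme means far from 0 in either direction
        return 2.0 * upper_tail(abs(t), df)
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

t_obs, df = 2.00, 20                   # hypothetical observed statistic and df
for alt in ("greater", "less", "two-sided"):
    print(alt, round(p_value(t_obs, df, alt), 4))
```

The same observed statistic gives three different p-values, which is why the alternative hypothesis has to be settled before looking at the data.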
Hypothesis Testing Steps:
1. Determine appropriate null and alternative hypotheses.
2. Check assumptions for performing the test and calculate the test statistic.
3. Calculate the p-value under the assumption the null hypothesis is true.
4. Determine if the result is statistically significant.
5. Report a conclusion in the context of the problem.
52
We must decide in advance how much evidence against H0 we will insist on. This designated amount of
evidence is called the level of significance and denoted by α (alpha). Common values of α are 0.01,
0.05, and 0.10. The decision is made to reject H0 if the p-value is less than or equal to α. If we reject the
null hypothesis, the results of the test are said to be statistically significant at level α. A “significant”
result in the statistical sense does not necessarily imply an “important” result in the practical sense. It
means simply that such a difference from the null hypothesis is “not very likely to happen just by
chance”.
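The decision rule can be written in a couple of lines. The sketch below is only an illustration: the helper name is_significant and the p-value of 0.03 are hypothetical.

```python
def is_significant(p_value, alpha):
    """Reject H0 exactly when the p-value is less than or equal to alpha."""
    return p_value <= alpha

p = 0.03  # hypothetical p-value
for alpha in (0.10, 0.05, 0.01):
    decision = "reject H0" if is_significant(p, alpha) else "fail to reject H0"
    print(f"alpha = {alpha:.2f}: {decision}")
```

The same p-value leads to different decisions at different levels, which is one reason the level must be chosen in advance.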
There are two types of errors that can be made in hypothesis testing. If the null hypothesis is true but
the decision is to reject H0, then a Type I error is said to have occurred. However, failing to reject H0
when the alternative hypothesis is true is called a Type II error. If the null hypothesis is true, the level
of significance α is also the probability of a Type I error. The probability of a Type II error is denoted
by β.
The power of a test measures its ability to detect an alternative hypothesis when it is true. Power
against a particular alternative is calculated as the probability that the test will reject H0 when the
alternative hypothesis is true and thus represented by 1 – β.
Truth      Decision Made    Result
H0 True    Reject H0        Type I Error
H0 True    Not Reject H0    Correct Decision
Ha True    Reject H0        Correct Decision
Ha True    Not Reject H0    Type II Error
A few notes:
1. In practice we want to protect the status quo, so we are most concerned with Type I error. We fix
the probability of a Type I error at a constant value by fixing the significance level α.
2. Most tests we describe have the smallest β for a given α.
3. For a fixed sample size n, there is a tradeoff between α and β. Decreasing one type of error
increases the other.
Ideally we want the probabilities of making a mistake to be small.
However, once a decision is made, it is either right or wrong – only prior to looking at the data can we
talk about probabilities of making these two types of errors.
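One way to see these error probabilities as long-run frequencies is a small simulation. The sketch below is a hypothetical setup, not taken from the course materials: samples of size 9 are drawn from a normal population with standard deviation 1, H0: µ = 0 is tested against Ha: µ > 0, and H0 is rejected when t is at least the upper critical value of t(8) from a standard t table (1.860 for α = 0.05, 2.896 for α = 0.01). Rejecting when t reaches the critical value is equivalent to rejecting when the p-value is at most α.

```python
import math
import random
import statistics

def t_statistic(sample, mu0):
    """t = (sample mean - mu0) / (s / sqrt(n))."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - mu0) / se

def rejection_rate(true_mean, mu0, sigma, n, t_crit, reps, rng):
    """Fraction of simulated samples in which the upper-tailed t-test rejects H0."""
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if t_statistic(sample, mu0) >= t_crit:
            rejections += 1
    return rejections / reps

rng = random.Random(2011)                  # fixed seed so the run is repeatable
MU0, SIGMA, N, REPS = 0.0, 1.0, 9, 4000    # hypothetical population and design
T_CRIT = {0.05: 1.860, 0.01: 2.896}        # upper critical values of t(8)

# H0 true: every rejection is a Type I error, so the rate should be near alpha.
type1_rate = rejection_rate(MU0, MU0, SIGMA, N, T_CRIT[0.05], REPS, rng)

# Ha true (true mean one standard deviation above mu0): the rate estimates power.
power_05 = rejection_rate(MU0 + 1.0, MU0, SIGMA, N, T_CRIT[0.05], REPS, rng)
power_01 = rejection_rate(MU0 + 1.0, MU0, SIGMA, N, T_CRIT[0.01], REPS, rng)

print(f"Estimated Type I error rate at alpha = 0.05: {type1_rate:.3f}")
print(f"Estimated power at alpha = 0.05: {power_05:.3f} (beta about {1 - power_05:.3f})")
print(f"Estimated power at alpha = 0.01: {power_01:.3f} (beta about {1 - power_01:.3f})")
```

With n held fixed, moving from α = 0.05 to α = 0.01 should lower the estimated power, that is, raise β, which is the tradeoff described in note 3.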
53
Overview of One Sample t-test: The one sample t-test is used to test whether the mean of a
quantitative variable is significantly different from some value. This value, the test value, is given in the
null hypothesis (H0: µ = µ0) and is often taken to be zero; i.e. H0: µ = 0. The test relies on two key
assumptions: (1) the data are a random sample and (2) the data are observations from a normally
distributed population. Thanks to the central limit theorem (see Module 3) the assumption of normality
can be relaxed if our sample size, n, is large. (Generally, ‘large’ means more than 30 observations,
though it depends somewhat on how seriously the data depart from normality.) To carry out the test we
first calculate the test statistic:
$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} $$
The test statistic, t, tells us how many standard errors the sample mean, x̄, is from the test value, µ0. To
summarize the statistical importance of this distance we can calculate a p-value using the t-distribution
with n-1 degrees of freedom (df). The distribution of the test statistic is t(n-1) rather than normal
because both the sample mean, x̄, and the sample standard deviation, s, are random variables (since
they depend on the data).
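The calculation of the test statistic can be sketched in a few lines of Python. The data values and the test value below are hypothetical and chosen only so the arithmetic is easy to check by hand. The function name one_sample_t is likewise an illustrative choice.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, df) for testing H0: mu = mu0 from a single sample."""
    n = len(sample)
    xbar = statistics.mean(sample)     # sample mean
    s = statistics.stdev(sample)       # sample standard deviation (divides by n - 1)
    se = s / math.sqrt(n)              # standard error of the sample mean
    t = (xbar - mu0) / se              # number of standard errors xbar is from mu0
    return t, n - 1

data = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical observations
t, df = one_sample_t(data, mu0=4)
print(f"t = {t:.3f} on {df} df")       # prints "t = 1.323 on 7 df"
```

Here the sample mean is 5, the standard error is about 0.756, and so the sample mean sits about 1.32 standard errors above the test value of 4.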
Formula card:
54
Activity 1: Is there Salmonella enteritidis in the ice cream?
Background: A massive multistate outbreak of food-borne illness was attributed to Salmonella
enteritidis. Epidemiologists determined that the source of the illness was ice cream. They sampled nine
production runs from the company that had produced the ice cream to determine the level of
Salmonella enteritidis in the ice cream.
The levels of Salmonella enteritidis in Most Probable Number/Gram (MPN/g) are given in the
Salmonella.sav data set (Source: Lyman Ott et al., pg 232). The researchers would like to use these data
to determine, using a 5% significance level, whether the average level of Salmonella enteritidis in the
ice cream is greater than 0.3 MPN/g, since such levels are considered to be very dangerous. Time
ordering of the data is present since the sample is collected from production runs.
Task: Perform a test to assess if the average level of Salmonella enteritidis is significantly larger than
0.3 MPN/g.
Recall: Write out the Five Steps for conducting a test of hypotheses (Reference page 51).
1.
2.
3.
4.
5.
Before conducting any test, here is a set of questions to ask yourself:
• How many populations are there?
  One    Two    More than two
• How many variables are there?
  One    Two
• What is the response variable?
• What type of variable is the response?
  Categorical    Quantitative
• What type of parameter would be useful for summarizing this response?
  Proportion    Mean    Other (see Supplement 3)
Based on the answers to these questions, you should be able to identify the appropriate inference
procedure. You may refer back to Supplement 3 – Name that Scenario for assistance.
The appropriate inference procedure for this scenario is ______________________________
and the specific parameter of interest is ___________________ .
55
1. State the hypotheses: H0: _____ = _____________ Ha: ______ _____________
where _____ represents:
*Your parameter definition should always be a statement about the population(s) under study.
2. Assumption Checks and Computing the Test Statistic
Assumptions:
a. For this scenario, we need to assume that the data are a ___________ sample.
To check this assumption, we would make a _______ plot (if there was time order) of the
observations and look for _____________________________________.
b. We also need to assume that the responses come from a ______________ distributed
_________________ . To check this assumption, we would make a _______ plot.
c. Comment on each plot about whether these assumptions appear satisfied.
[Figure: Normal Q-Q Plot of SAL (Expected Normal Value versus Observed Value)]
[Figure: Sequence plot of SAL versus Sequence number]
d. Is the assumption of normality really that important with a sample of size 9? Why? If the sample
size were larger would this assumption be as important? Why?
Test-statistic:
e. Generate the t-test output. Use Analyze > Compare Means > One-Sample T-Test.
f. The test value is _____ (this is the null value from the null hypothesis).
g. What is the value of the test statistic?
h. Provide an interpretation of the test statistic value. (To guide you, see supplement 6 in your lab
module notebook.)
i. What is the distribution of the test statistic if the null hypothesis is true?
This is the same as asking what model you use to find the p-value.
56
3. Calculate the p-value:
a. What is the SPSS reported p-value? _____________. Is it the p-value we want? _____
b. Draw a picture of the p-value we want. Note you should be able to label the distribution based
on question 2 (i).
c. So, our p-value is _____________________
d. Provide an interpretation of the p-value (see supplement 6).
4. Decision:
What is your decision at a 5% significance level?    Reject H0    Fail to reject H0
Remember:
Reject H0 ⇔ Results statistically significant
Fail to reject H0 ⇔ Results not statistically significant
5. Conclusion: What is your conclusion in the context of the problem?
Conclusions should not be too strong -- i.e. say you have sufficient evidence or equivalent, do NOT
say we have proven.
Conclusions should always include a reference to the population parameter of interest.
57
Activity 2: Testing the Population Mean with p-value practice
For each of the following sets of hypotheses and test statistic values, provide a sketch of the p-value.
You can assume the degrees of freedom are 20 throughout. Then use your computer with the pval()
function in R, your calculator (if it has the capabilities), or the provided partial t distribution table and
compute the p-value (or bounds for the p-value) and circle the corresponding statistical decision (using
a 5% significance level). Note: This partial t distribution table can also be found on your formula card in
Table A.3.
          Absolute Value of t-Statistic
Df    1.28    1.50    1.65    1.80    2.00    2.33    2.58    3.00
20    0.108   0.075   0.057   0.043   0.030   0.015   0.009   0.004
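When a test statistic falls between two tabled values, the table gives only bounds for the p-value. The sketch below shows one way to read off those bounds, assuming (as the numbers suggest) that the tabled entries are one-tailed areas for t(20). The function name one_tail_bounds and the value t = 1.70 are hypothetical.

```python
import bisect

# One-tailed areas for t(20), copied from the partial table.
T_VALUES = [1.28, 1.50, 1.65, 1.80, 2.00, 2.33, 2.58, 3.00]
TAIL_AREAS = [0.108, 0.075, 0.057, 0.043, 0.030, 0.015, 0.009, 0.004]

def one_tail_bounds(t_abs):
    """Return (lower, upper) bounds on the one-tailed area beyond t_abs >= 0."""
    i = bisect.bisect_right(T_VALUES, t_abs)
    # Area at the nearest tabled value at or below t_abs (0.5 if none is smaller).
    upper = TAIL_AREAS[i - 1] if i > 0 else 0.5
    # Area at the nearest tabled value above t_abs (0 if t_abs is off the table).
    lower = TAIL_AREAS[i] if i < len(T_VALUES) else 0.0
    return lower, upper

t_abs = 1.70                                   # hypothetical |t|, not on the table
low, high = one_tail_bounds(t_abs)
print(f"one-sided p-value between {low} and {high}")
print(f"two-sided p-value between {2 * low:.3f} and {2 * high:.3f}")
```

For a one-sided alternative in the direction of the observed statistic, the bounds apply directly. For a two-sided alternative both bounds are doubled.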
1. H0: µ = 20, Ha: µ   20, t = 2.58
Sketch:
Reject H0    Do Not Reject H0
2. H0: µ = 6, Ha: µ   6, t = 2.12
Sketch:
Reject H0    Do Not Reject H0
3. H0: µd = 0, Ha: µd   0, t = 3.20
Sketch:
Reject H0    Do Not Reject H0
4. H0: µ = 6, Ha: µ   6, t = 1.87
Sketch:
Reject H0    Do Not Reject H0
58
Check Your Understanding:
Note that while the alternative hypothesis and significance level must be set in advance, we can think
about what would have happened if the alternative had originally been two-sided.
Complete the following to understand how the analysis of the ice-cream data in Activity 1 would change
assuming the alternative was two-sided (instead of one-sided as you did on page 54).
The test statistic would be:    -2.205    0    2.205
The distribution of the test statistic assuming the null hypothesis is true is:    t(8)    t(9)    t(16)
Your interpretation of the value of the test statistic would change to:
The p-value would be:    0.0295    0.059    1-0.0295    1-0.059
Your new decision at a 5% level would be:    Reject H0    Fail to reject H0
If you were going to repeat the Salmonella study and wanted to increase the power of the test, which of
the following could you do?
• Increase alpha
• Increase beta
• Increase the sample size, n
• Decrease the sample size, n
59
Example Exam Question on One-Sample t-test
The Healthy Life Company produces canned whole pineapples. The production line is supposed to yield
cans with an average weight of 1 kg. In order to assess that the production line is operating as it should,
the quality control statistician at Healthy Life selects a random sample of 36 cans produced over the
course of the shift (keeping track of the order number). The latest sample gave a mean weight of 980
grams (1 kg = 1000 grams). Based on this sample, an estimate of the average distance that possible
sample means are from the true population mean is about 8.33 grams.
a. The value of 8.33 grams described above corresponds to what statistical term or notation?
(be specific)
b. The hypotheses to be tested are H0: µ = 1000 grams versus Ha: µ ≠ 1000 grams. To examine the
data, various graphs were made. One graph is given below. State what assumption can be checked
using the graph and comment on the validity of the assumption.
[Figure: plot of WEIGHT (vertical axis, 850 to 1100) versus Order number (1 to 35)]
Assumption: ____________________________________________.
Based on this graph, this assumption appears (circle one):    valid    not valid    because:
c. The test is to be performed using a 5% significance level. The observed t-test statistic is t = –2.4 and
the corresponding 2-tailed p-value is 0.022. Consider the statements below and clearly circle all
those that are correct statements for this statistical analysis.
The null hypothesis is rejected at the 5% level.
The results are not statistically significant at the 5% level.
The probability that the null hypothesis is true is 0.022.
If this process were operating as it should and repeated random samples of 36 cans
were obtained, we would observe a t statistic of –2.4 or smaller or a t statistic of
2.4 or larger in about 2.2% of the repetitions.
d. What would have been the p-value …
i. if Ha: µ < 1000 grams?
ii. if Ha: µ > 1000 grams?
e. Compute a 95% CI for the population mean weight. Do you expect your 95% CI to contain 1000?
Why?
60