Data Analysis:
Simple Statistical Tests
Modified for AP Biology Statistics Unit Lesson
Sampling a Population


When a study takes a random sample of a
general population, certain characteristics of
that sample need to be determined.
Based on those properties, the conclusion
reached at the end of the study may be
assumed to be representative of that
population.
http://www.real-statistics.com/hypothesis-testing/null-hypothesis/
Why Choose a statistical analysis?


Choose an estimator function for the
characteristic (of the population) to study and
then apply this function to the sample to
obtain an estimate.
Then use the appropriate statistical test to
determine whether this estimate could be
explained by chance alone.
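Here is a minimal sketch of the estimator idea. It assumes the characteristic being studied is the proportion of people who became ill, so the estimator function is simply the sample proportion. The function name `sample_proportion` is hypothetical. The sample reuses the cruise-ship totals from later in this lesson (60 ill out of 300).

```python
def sample_proportion(outcomes: list[bool]) -> float:
    """Estimator: the fraction of sampled individuals with the characteristic."""
    if not outcomes:
        raise ValueError("sample must not be empty")
    # True counts as 1 and False as 0, so sum() counts the ill people.
    return sum(outcomes) / len(outcomes)


# Hypothetical sample: 60 ill people and 240 healthy people (300 total).
sample = [True] * 60 + [False] * 240
estimate = sample_proportion(sample)
print(f"Estimated proportion ill: {estimate:.2f}")  # 0.20
```

A statistical test then asks whether an estimate like this could have arisen by chance alone.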
http://www.real-statistics.com/hypothesis-testing/null-hypothesis/
The Null hypothesis



The hypothesis that the estimate is based
solely on chance is called the null
hypothesis (H0).
Thus, the null hypothesis is true if the
observed data (in the sample) do not differ
from what would be expected on the basis of
chance alone.
The complement of the null hypothesis is
called the alternative hypothesis.
http://www.real-statistics.com/hypothesis-testing/null-hypothesis/
The Alternative hypothesis

The alternative hypothesis, denoted by H1 or
Ha, is the hypothesis that sample observations
are influenced by some non-random cause.
“For example, suppose we wanted to determine whether a coin was fair and
balanced. A null hypothesis might be that half the flips would result in Heads
and half, in Tails. The alternative hypothesis might be that the number of
Heads and Tails would be very different. Symbolically, these hypotheses would
be expressed as
H0: p = 0.5
Ha: p ≠ 0.5”
http://stattrek.com/statistics/dictionary.aspx?definition=Alternative%20hypothesis
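For the coin example, an exact binomial test is one way to test H0: p = 0.5 against Ha: p ≠ 0.5. This is an illustration and is not part of the quoted source. The sketch adds up the probabilities, under H0, of every possible number of heads that is no more likely than the observed count. The function names are hypothetical.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_test_two_sided(k: int, n: int, p0: float = 0.5) -> float:
    """Exact two-sided p-value for H0: p = p0 versus Ha: p != p0.

    Adds the H0 probabilities of every outcome (0..n heads) that is
    no more likely than the observed count k.
    """
    observed = binomial_pmf(k, n, p0)
    # A tiny relative tolerance keeps exact ties from being lost to rounding.
    threshold = observed * (1 + 1e-9)
    probs = [binomial_pmf(i, n, p0) for i in range(n + 1)]
    return min(1.0, sum(q for q in probs if q <= threshold))


print(binomial_test_two_sided(10, 10))  # 0.001953125 (all heads: very unlikely if fair)
print(binomial_test_two_sided(5, 10))   # 1.0 (exactly half heads)
```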
Chi-Square Statistics Example

A common analysis tests whether Disease X
occurs as often among people in Group
A as it does among people in Group B


People are often sorted into groups based
on their exposure to some disease risk
factor
We then perform a test of the association
between exposure and disease in the two
groups
Hypothetical outbreak of
Salmonella on a cruise ship


All 300 people on the cruise ship were
interviewed, and 60 of them had
symptoms consistent with Salmonella
Questionnaires indicated many of the
case-patients ate tomatoes from the
salad bar
The Study and the Tested Population
Research Question: Is there a statistically significant
difference in the rate of illness between those who
ate tomatoes (41/130) and those who did not (19/170)?
Null H0: Salmonella infection occurs as much among people in Group A
(ate tomatoes) as it does among people in Group B (did not eat tomatoes)
Alternative H1: Salmonella infection does not occur at the same rate in
Group A as in Group B (the concern here is that it occurs more often in Group A)
Table 2a. Cohort study:
Exposure to tomatoes and Salmonella infection

                 Salmonella?
                 Yes     No     Total
Tomatoes          41     89      130
No Tomatoes       19    151      170
Total             60    240      300
Characteristics of the Study:

To conduct a chi-square test, the following
conditions must be met:

There must be a total of at least 30 observations
(people) in the table
Each cell must have an expected count of 5 or more
To conduct a chi-square test, we compare the
observed data (from the study results) with the
data we would expect to see if the null hypothesis
were true (the calculated expected values)
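Here is a minimal sketch of checking both conditions on Table 2a. It assumes the "5 or more" rule applies to expected counts, which is the usual form of this rule of thumb. Each expected count is computed as row total × column total / grand total. The function name is hypothetical.

```python
def chi_square_conditions_met(table: list[list[int]]) -> bool:
    """Check the two rules of thumb for a chi-square test on a contingency table.

    1. At least 30 observations in total.
    2. Every cell has an expected count of at least 5
       (expected = row total * column total / grand total).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if grand_total < 30:
        return False
    return all(r * c / grand_total >= 5 for r in row_totals for c in col_totals)


# Table 2a: rows = tomatoes / no tomatoes, columns = ill / not ill.
observed = [[41, 89], [19, 151]]
print(chi_square_conditions_met(observed))  # True
```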
Table 2b. How to calculate the
Expected Values: start from the row and column totals

                 Salmonella?
                 Yes     No     Total
Tomatoes           ?      ?      130
No Tomatoes        ?      ?      170
Total             60    240      300


The totals give the overall distribution of who ate
tomatoes and who became sick
Based on these totals, we can fill in the empty
cells with the values we would expect if illness
were unrelated to eating tomatoes
Calculating the Expected Values:

$$\text{Expected value} = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$$

For the first cell, people who ate tomatoes and
became ill:

$$\text{Expected value} = \frac{130 \times 60}{300} = 26$$

The same formula can be used to calculate the expected
values for each of the other cells
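Here is a minimal sketch that applies the expected-value formula to every cell of Table 2a at once. The function name `expected_counts` is hypothetical. Rows are tomatoes and no tomatoes; columns are ill and not ill.

```python
def expected_counts(table: list[list[int]]) -> list[list[float]]:
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]  # zip(*table) gives the columns
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Table 2a: rows = tomatoes / no tomatoes, columns = ill / not ill.
observed = [[41, 89], [19, 151]]
print(expected_counts(observed))  # [[26.0, 104.0], [34.0, 136.0]]
```

These four numbers are the same expected values that appear in Table 2c.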
Table 2c. Complete Expected values for
exposure to tomatoes

Salmonella?   Tomatoes                 No Tomatoes              Total
Yes           130 × 60 / 300 = 26      170 × 60 / 300 = 34       60
No            130 × 240 / 300 = 104    170 × 240 / 300 = 136    240
Total         130                      170                      300
For each cell of the table, calculate:

$$\frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
Table 2d. Chi-square values, (Observed − Expected)² / Expected, for exposure to tomatoes

                 Salmonella?
                 Yes                        No                           Total people
Tomatoes         (41 − 26)² / 26 = 8.7      (89 − 104)² / 104 = 2.2      130
No Tomatoes      (19 − 34)² / 34 = 6.6      (151 − 136)² / 136 = 1.7     170
Total people     60                         240                          300

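The chi-square (χ²) value for this example is the sum of the four cell values:
8.7 + 2.2 + 6.6 + 1.7 = 19.2
(Adding the unrounded cell values gives about 19.09; the small difference comes from rounding each cell to one decimal place and does not change the conclusion.)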
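Here is a minimal sketch of the whole chi-square calculation on Table 2a. The function name is hypothetical. It adds (observed − expected)² / expected over every cell without rounding, which is why it gives about 19.09 rather than 19.2.

```python
def chi_square_statistic(table: list[list[int]]) -> float:
    """Sum of (observed - expected)^2 / expected over every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total  # expected count
            chi2 += (obs - exp) ** 2 / exp
    return chi2


observed = [[41, 89], [19, 151]]  # Table 2a
print(round(chi_square_statistic(observed), 2))  # 19.09
```

Like the hand calculation, this sketch applies no continuity (Yates) correction. For comparison, `scipy.stats.chi2_contingency` applies Yates' correction to 2 × 2 tables by default (`correction=True`), so it reports a smaller value unless `correction=False` is passed.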
Analyze the Chi-Square Test


In general, the larger the chi-square
value (for a given number of degrees of
freedom), the stronger the evidence that
the difference between the two groups
you are comparing is not due to chance alone
To decide whether it is large enough, look
up the p-value (or critical value) in a chi-square table
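A printed table is not the only option. With 1 degree of freedom, the p-value can be computed directly from the identity P(χ² > x) = erfc(√(x / 2)). This holds because a chi-square variable with 1 degree of freedom is the square of a standard normal variable. The sketch below uses that shortcut, which is valid for df = 1 only. The function name is hypothetical.

```python
import math


def chi_square_p_value_df1(chi2: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    Uses the identity P(X > x) = erfc(sqrt(x / 2)), valid only for df = 1.
    """
    return math.erfc(math.sqrt(chi2 / 2))


print(f"{chi_square_p_value_df1(3.84):.3f}")  # 0.050, the table's cutoff
print(chi_square_p_value_df1(19.09))          # a tiny number, far below 0.05
```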
P-Values

Using our hypothetical cruise ship Salmonella
outbreak:

32% of people who ate tomatoes got Salmonella (41/130)
as compared with 11% of people who did not eat
tomatoes (19/170)
How do we know whether the difference
between 32% and 11% is a “real” difference?
In other words, how do we know that our chi-square value (calculated as 19.2) indicates a
statistically significant difference?
The p-value is our indicator
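The 32% and 11% figures are attack rates: the number ill divided by the number in each group. This short sketch computes them from the Table 2a counts. The variable names are illustrative.

```python
# Table 2a counts
ill_tomatoes, total_tomatoes = 41, 130
ill_no_tomatoes, total_no_tomatoes = 19, 170

rate_tomatoes = ill_tomatoes / total_tomatoes            # about 0.315
rate_no_tomatoes = ill_no_tomatoes / total_no_tomatoes   # about 0.112

print(f"Ate tomatoes:         {rate_tomatoes:.0%}")      # 32%
print(f"Did not eat tomatoes: {rate_no_tomatoes:.0%}")   # 11%
```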
P-Values



Many statistical tests give both a
numeric result (e.g. a chi-square value)
and a p-value
The p-value ranges between 0 and 1
What does the p-value tell you?

The p-value is the probability of getting a
result at least as extreme as the one you got,
assuming that the two groups you are comparing
are actually the same (that is, assuming the
null hypothesis is true)
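A simulation makes this definition concrete. This sketch is a permutation (label-shuffling) test, which is a different technique from the chi-square table lookup.

- It assumes the two groups are the same by randomly reassigning who "ate tomatoes" among the 300 passengers.
- Each shuffle keeps the totals fixed at 130 tomato eaters and 60 ill people.
- It counts how often the number of ill tomato eaters is at least as far from the expected 26 as the observed 41.

In a 2 × 2 table with fixed totals, all four cells differ from their expected values by the same amount. Asking about that one cell is therefore equivalent to asking how often the chi-square value is at least as large as the observed one. The function name, seed, and number of shuffles are arbitrary choices.

```python
import random


def permutation_p_value(ill_exposed: int = 41, n_exposed: int = 130,
                        n_ill: int = 60, n_total: int = 300,
                        n_shuffles: int = 10_000, seed: int = 0) -> float:
    """Estimate a two-sided p-value by shuffling exposure labels under H0."""
    rng = random.Random(seed)
    expected = n_exposed * n_ill / n_total          # 26.0 for Table 2a
    observed_gap = abs(ill_exposed - expected)      # 15.0 for Table 2a
    people = [1] * n_ill + [0] * (n_total - n_ill)  # 1 = ill, 0 = not ill
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(people)
        # Treat the first n_exposed people as the "ate tomatoes" group.
        simulated_ill_exposed = sum(people[:n_exposed])
        if abs(simulated_ill_exposed - expected) >= observed_gap:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles


print(permutation_p_value())  # a very small fraction (often 0.0)
print(permutation_p_value(ill_exposed=26, n_shuffles=1_000))  # 1.0
```

An estimate of 0.0 only means that no shuffle was as extreme as the real data. It does not mean the true p-value is exactly 0. The second call shows why a p-value of 1 corresponds to data that match the null hypothesis exactly: every shuffle is at least that extreme.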
P-Values


Start by assuming there is no difference in outcomes
between the groups (the null hypothesis)
Look at the test statistic and p-value to see if the
data contradict that assumption
• A low p-value means that (assuming the groups
are the same) the probability of observing results
like these by chance is very small
• The difference between the two groups is then
statistically significant
• A high p-value means the data do not provide
convincing evidence that the two groups differ;
it does not prove that they are the same
• A p-value of 1 means the observed results match
exactly what would be expected if there were no
difference between the two groups
P-Values <0.05

Generally, if the p-value is less than
0.05, the observed difference is
considered statistically significant, i.e.,
the difference is unlikely to have
happened by chance alone
1) The chi-square value is calculated as 19.2
2) There are two groups
3) Degrees of freedom = 2 − 1 = 1 (for a table with 2 rows and 2 columns, (rows − 1) × (columns − 1) = 1)
If p-value >0.05 there is not a significant
difference between groups
If p-value < 0.05 there is a significant
difference between groups
Null H0: Salmonella infection occurs as much among people
in Group A as it does among people in Group B
p-value < 0.05
χ² = 19.2
• Reject H0 because 19.2 is greater than 3.84,
the critical chi-square value for 1 degree of freedom at p = 0.05
• There is a statistically significant difference
between the two groups.
• The Salmonella outbreak might have been due
to contaminated tomatoes at the salad bar.
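The decision steps above can be sketched as a lookup-and-compare. The sketch assumes critical values taken from a standard chi-square table at p = 0.05, rounded to two decimals: 3.84, 5.99, and 7.81 for 1, 2, and 3 degrees of freedom. The dictionary and function names are hypothetical.

```python
# Critical values from a standard chi-square table at p = 0.05 (rounded).
CRITICAL_VALUES_P05 = {1: 3.84, 2: 5.99, 3: 7.81}


def degrees_of_freedom(n_rows: int, n_cols: int) -> int:
    """(rows - 1) * (columns - 1); equals 1 for a 2 x 2 table."""
    return (n_rows - 1) * (n_cols - 1)


def decide(chi2: float, df: int) -> str:
    """Compare the chi-square value with the p = 0.05 critical value."""
    critical = CRITICAL_VALUES_P05[df]
    if chi2 > critical:
        return f"Reject H0 ({chi2} > {critical}): significant difference"
    return f"Do not reject H0 ({chi2} <= {critical}): no significant difference"


observed = [[41, 89], [19, 151]]  # Table 2a
df = degrees_of_freedom(len(observed), len(observed[0]))
print(decide(19.2, df))  # Reject H0 (19.2 > 3.84): significant difference
```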