(Hypothesis Testing and CI’s)
We have already looked at confidence intervals as a method of making decisions about the population mean $\mu$. The confidence interval details are summarized below.
$100(1-\alpha)\%$ Confidence Interval for $\mu$ (e.g. $\alpha = .05$ gives 95% confidence)
Recall the basic form of a confidence interval is as follows:
(estimate) ± (table value) × SE(estimate)
For a single population mean, a $100(1-\alpha)\%$ CI for $\mu$ is:
$$\bar{X} \pm t \cdot SE(\bar{X}) \quad \text{where} \quad SE(\bar{X}) = \frac{s}{\sqrt{n}}$$
and $t$ = t-distribution quantile with $df = n - 1$.
Confidence Level
95% ($\alpha = .05$)
90% ($\alpha = .10$)
99% ($\alpha = .01$)
To find the t-table value ($t$), use the two-tail $\alpha$ listed at the top of the columns in the t-table.
Before we look at hypothesis testing for a single population mean we will examine the five basic steps in a hypothesis test and introduce some important terminology and concepts.
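1. State the null and alternative hypotheses.
2. Choose the test statistic.
3. Compute the test statistic.
4. Find the p-value.
5. Make a decision and interpret.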
General Form of Hypotheses for a Population Mean:
Null Hypothesis ($H_0$)      Alternative Hypothesis ($H_a$)      p-value area
$H_0: \mu \le \mu_0$         $H_a: \mu > \mu_0$                  Upper-tail
$H_0: \mu \ge \mu_0$         $H_a: \mu < \mu_0$                  Lower-tail
$H_0: \mu = \mu_0$           $H_a: \mu \ne \mu_0$                Two-tailed (perform test using CI for $\mu$)
$\mu_0$ = hypothesized value for the mean assuming the null hypothesis is true.
1) Suppose in the past the north end of a particular lake had a Secchi depth of 8.0 meters.
Due to recent development around the lake, researchers believe that there is decreased water clarity in that section of the lake. Set up the appropriate hypotheses for this situation.
2) In the community of Morgan Hill, CA there is concern about the perchlorate level found in well water. EPA guidelines suggest that a water supply should have a mean perchlorate level below 4 ppb (parts per billion). Environmental scientists wish to determine if the mean perchlorate level in the Morgan Hill water supply is greater than the safe limit. Set up the appropriate hypotheses for this situation.
In general the basic form of most test statistics is given by:
$$\text{Test Statistic} = \frac{(\text{estimate}) - (\text{hypothesized value})}{SE(\text{estimate})}$$
(think “z-score”), which measures the discrepancy between the estimate from our sample and the hypothesized value under the null hypothesis. Specifically, it gives the number of standard errors above (if positive) or below (if negative) the hypothesized value at which our observed statistic lies.
Intuitively, if our sample-based estimate is “far away” from the hypothesized value assuming the null hypothesis is true, we will reject the null hypothesis in favor of the alternative or research hypothesis. Extreme test statistic values occur when our estimate is a large number of standard errors away from the hypothesized value under the null.
The p-value is the probability that, by chance variation alone, we would get a test statistic as extreme or more extreme than the one observed, assuming the null hypothesis is true.
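If this probability is “small” then we have evidence against the null hypothesis; in other words, we have evidence to support our research hypothesis.
For a single population mean, the test statistic is
$$t = \frac{\bar{X} - \mu_0}{SE(\bar{X})} \quad \text{or} \quad t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \;\sim\; \text{t-distribution with } df = n - 1.$$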
Assumptions:
When making inferences about a single population mean we assume the following:
1. The sample constitutes a random sample from the population of interest.
2. The population distribution is normal. This assumption can be relaxed when our sample size is sufficiently “large”. How large the sample size needs to be depends on how “non-normal” the population distribution is.
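Type I and Type II Errors ($\alpha$ & $\beta$)
Decision                  Truth: $H_0$ true                      Truth: $H_a$ true
Reject $H_0$              Type I error (probability $\alpha$)    Correct decision
Fail to Reject $H_0$      Correct decision                       Type II error (probability $\beta$)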
Type I and II Errors Example:
Testing Wells for Perchlorate in Morgan Hill & Gilroy, CA.
EPA guidelines suggest that drinking water should not have a perchlorate level exceeding 4 ppb (parts per billion). Perchlorate contamination in California water (ground, surface, and well) is becoming a widespread problem. The Olin Corp., a manufacturer of road flares in the Morgan Hill area from 1955 to 1996, is the source of the perchlorate contamination in this area.
Suppose you are a resident of the Morgan Hill area. Which alternative do you want well testers to use, and why? (Which has the more serious Type I error, A or B?)
A: $H_0: \mu \le 4$ ppb vs. $H_a: \mu > 4$ ppb
or
B: $H_0: \mu \ge 4$ ppb vs. $H_a: \mu < 4$ ppb
Example 1: Secchi Disc Readings in Seneca Lake (New York)
In 1997 the mean Secchi depth recorded at the north end of Seneca Lake was 8.0 meters.
In 1999 a depth study was conducted on the north end of the lake on similar dates as in the 1997 study yielding the data below:
9.0 9.0 10.5 8.0 9.2 7.0 8.3 6.0 7.1 4.7 7.8 6.0
7.9 6.5 8.3 7.4
Is there evidence to suggest the Secchi depth has decreased on the north end of the lake?
From the JMP output we see the following:
Normality appears to be satisfied here.
Notice the CI for the mean Secchi depth is (6.89 m, 8.44 m).
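As a cross-check on the interval quoted above, here is a minimal Python sketch that recomputes it from the 16 readings using $\bar{X} \pm t \cdot SE(\bar{X})$. The table value 2.131 (two-tail $\alpha = .05$, df = 15) is an assumption read from a standard t-table, and the variable names are illustrative only; the notes themselves use JMP.

```python
import math
import statistics

# Secchi depth readings (meters) from the 1999 study, as listed above.
secchi = [9.0, 9.0, 10.5, 8.0, 9.2, 7.0, 8.3, 6.0,
          7.1, 4.7, 7.8, 6.0, 7.9, 6.5, 8.3, 7.4]

n = len(secchi)
xbar = statistics.mean(secchi)
s = statistics.stdev(secchi)      # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)             # SE(X-bar) = s / sqrt(n)

# Assumption: two-tail alpha = .05 table value for df = n - 1 = 15.
T_TABLE_95_DF15 = 2.131

lower = xbar - T_TABLE_95_DF15 * se
upper = xbar + T_TABLE_95_DF15 * se
print(f"n = {n}, mean = {xbar:.3f}, s = {s:.3f}, SE = {se:.3f}")
print(f"95% CI for mu: ({lower:.2f}, {upper:.2f})")
```

The endpoints should agree with the JMP interval up to rounding in the last digit.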
Hypothesis Test:
1) $H_0$:
   $H_a$:
2) Choose test statistic
3) Compute test statistic
4) Find p-value (use t-Probability Calculator.JMP)
5) Make decision and interpret
Performing the t-Test in JMP: Select Distribution from the Analyze menu and put the numeric response in the Y columns box. To perform a t-test in JMP, select Test Mean from the Secchi Depth pull-down menu and enter the value for the mean under the null hypothesis, 8.0 in this example.
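For readers without JMP, the same one-sample t-test can be sketched in plain Python. This is only an illustration under stated assumptions: the lower-tail p-value is obtained by numerically integrating the t density with Simpson's rule, and the helper names `t_pdf` and `t_cdf` are hypothetical, not part of JMP or the course materials.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|], then symmetry about 0."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

# Secchi depth readings (meters) from Example 1.
secchi = [9.0, 9.0, 10.5, 8.0, 9.2, 7.0, 8.3, 6.0,
          7.1, 4.7, 7.8, 6.0, 7.9, 6.5, 8.3, 7.4]
mu_0 = 8.0                             # hypothesized mean under the null

n = len(secchi)
se = statistics.stdev(secchi) / math.sqrt(n)
t_stat = (statistics.mean(secchi) - mu_0) / se
p_lower = t_cdf(t_stat, n - 1)         # lower-tail area: "has the depth decreased?"

print(f"t = {t_stat:.3f} with df = {n - 1}, lower-tail p-value = {p_lower:.3f}")
```

The lower tail is used because the research question asks whether the Secchi depth has decreased.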
Conclusion:
Example 2: Testing Wells for Perchlorate in Morgan Hill & Gilroy, CA
Hypothesis Test:
1) $H_0$:
   $H_a$:
2) Choose test statistic
3) Compute test statistic
4) Find p-value (use t-Probability Calculator.JMP)
5) Make decision and interpret
Performing a t-Test in JMP:
To perform a t-test in JMP, select Test Mean from the Perchlorate pull-down menu and enter value for mean under the null hypothesis, 4.0 in this example.
Conclusion:
Should we be using the mean as a measure of the typical perchlorate concentration found in these wells? Why or why not?
Consider the log base 10 of the perchlorate levels instead.
Hypothesis Test in Log Scale:
$H_0$:
$H_a$:
Confidence Interval in the Log Base 10 Scale
Test results from JMP
We have already discussed the confidence interval as a means of making a decision about the value of the population proportion, $p$. The CI results are summarized below.
Confidence Interval for p
$100(1-\alpha)\%$ CI for $p$:
$$\hat{p} \pm z \cdot SE(\hat{p}) = \hat{p} \pm z\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
Here $\hat{p}$ = sample proportion, which is the number of “successes” in our sample divided by the sample size, $\hat{q} = 1 - \hat{p}$, and $z$ = a standard normal table value that corresponds to our desired confidence level.
Confidence Level            z
95% ($\alpha = .05$)        1.96
90% ($\alpha = .10$)        1.645
99% ($\alpha = .01$)        2.576
$H_0: p = p_0$
$H_a: p > p_0$ or $p < p_0$ or $p \ne p_0$
(use the CI for the two-sided alternative, which is rarely of interest for $p$ anyway)
Test Statistic
$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} \;\sim\; \text{standard normal } N(0,1)$$
provided $np_0 \ge 5$ and $n(1 - p_0) \ge 5$.
When our sample size is small or we want an exact test we can use the binomial distribution to calculate the p-value, i.e. use the Binomial Exact Test.
Example: Hypertension During Finals Week
In the college-age population in this country (18 – 24 yr. olds), about 9.2% have hypertension (systolic BP > 140 mmHg and/or diastolic BP > 90 mmHg). Suppose a sample of n = 196 WSU students is taken during finals week and 29 have hypertension.
Do these data provide evidence that the percentage of students who are hypertensive during finals week is higher than 9.2%?
Hypothesis Test:
1) $H_0$:
   $H_a$:
2) Choose test statistic
3) Compute test statistic
4) Find p-value (use Standard Normal Table)
5) Make decision and interpret
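The arithmetic for steps 3 and 4 can be checked with a short Python sketch. It assumes the upper-tail alternative implied by the question ("higher than 9.2%") and uses the standard library's `NormalDist` in place of the printed normal table; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, x, p_0 = 196, 29, 0.092        # sample size, hypertensive students, null value

p_hat = x / n
large_enough = n * p_0 >= 5 and n * (1 - p_0) >= 5   # large-sample condition

se_0 = math.sqrt(p_0 * (1 - p_0) / n)   # SE computed under the null hypothesis
z = (p_hat - p_0) / se_0
p_value = 1 - NormalDist().cdf(z)       # upper-tail area

print(f"p-hat = {p_hat:.4f}, large-sample condition met: {large_enough}")
print(f"z = {z:.2f}, upper-tail p-value = {p_value:.4f}")
```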
Binomial Exact Test: Use n = 196 and p = .092 (the hypothesized value under $H_0$)
Exact p-value =
Find a 95% Confidence Interval for p:
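A hedged sketch of both calculations in Python: the exact p-value is taken as the binomial upper-tail probability $P(X \ge 29)$ with n = 196 and p = .092 (assuming the upper-tail alternative from the question), and the interval uses $\hat{p} \pm 1.96\sqrt{\hat{p}\hat{q}/n}$. The helper `binom_pmf` is a hypothetical name introduced for illustration.

```python
import math

n, x, p_0 = 196, 29, 0.092

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Exact upper-tail p-value: probability of 29 or more "successes" under H0.
exact_p = sum(binom_pmf(k, n, p_0) for k in range(x, n + 1))

# Large-sample 95% CI for p, using the table value z = 1.96.
p_hat = x / n
se = math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - 1.96 * se, p_hat + 1.96 * se

print(f"exact p-value = {exact_p:.4f}")
print(f"95% CI for p: ({lower:.3f}, {upper:.3f})")
```

Note that the CI uses $\hat{p}$ in its standard error, whereas the test statistic uses the hypothesized value $p_0$.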
Basic Idea:
Suppose we have taken a sample of size n from a population and found the values of a particular numeric variable of interest. We would like to know whether those values represent a random sample from a normal population. To do this we could use some of the graphical tools we have already used, namely:
Histograms
Boxplots (look for symmetry, i.e. mean = median, and few outliers)
Smooth curve estimates of the distributional shape
Fit a normal curve to data and check agreement with histogram/smooth curve.
Another graphical tool that can be useful for assessing normality is the normal quantile plot. A normal quantile plot essentially compares the spacing of observed values in our random sample to the spacing we would expect to see if we were indeed sampling from a normal population. The diagrams below illustrate this concept.
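To make the "spacing" comparison concrete, the following Python sketch computes the coordinates of a normal quantile plot for the Secchi depth readings from Example 1. The plotting positions $(i - 0.5)/n$ are an assumption (software packages use slightly different conventions), and `normal_quantile_pairs` is a hypothetical helper name.

```python
from statistics import NormalDist

def normal_quantile_pairs(values):
    """Pair each sorted observation with its expected standard normal quantile."""
    data = sorted(values)
    n = len(data)
    std_normal = NormalDist()          # mean 0, standard deviation 1
    # Assumption: plotting positions (i - 0.5) / n for i = 1, ..., n.
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(data, start=1)]

secchi = [9.0, 9.0, 10.5, 8.0, 9.2, 7.0, 8.3, 6.0,
          7.1, 4.7, 7.8, 6.0, 7.9, 6.5, 8.3, 7.4]

pairs = normal_quantile_pairs(secchi)
for z, x in pairs:
    print(f"expected z = {z:6.2f}   observed = {x:5.1f}")
```

If the points (expected z, observed value) fall close to a straight line, the observed spacing matches what a normal population would produce.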
Examples from data we have seen previously
Maine Mercury Study MN Walleyes (Island Lake only)
Tumor Cell Radii (Benign tumors) Tennessee River DDT Study (all fish)
($\mu_1$ vs. $\mu_2$)
An experiment was conducted to examine the potential effect cadmium oxide might have on the hemoglobin levels of dogs. It is thought that cadmium oxide exposure would lead to decreased hemoglobin levels. Ten dogs were randomly assigned to the control group and 15 were randomly assigned to the cadmium oxide exposure group.
Research Question: Is there evidence to suggest the cadmium oxide exposure lowers the hemoglobin level found in dogs?
To answer the question of interest we need tools for comparing the population mean hemoglobin level for dogs not exposed to cadmium oxide vs. that for dogs that have had cadmium oxide exposure, i.e. how does $\mu_{control}$ compare to $\mu_{exposed}$?
Basic Idea:
[Figure: two population distributions with means $\mu_1$ and $\mu_2$; $\sigma^2$ = common variance to both populations]
Assumptions:
For this case we make the following assumptions
1. The samples from the two populations were drawn independently.
2. The population variances/standard deviations are equal.
3. The populations are both normally distributed. This assumption can be relaxed when the samples from both populations are “large”.
$100(1-\alpha)\%$ Confidence Interval for $(\mu_1 - \mu_2)$
$$(\bar{X}_1 - \bar{X}_2) \pm t \cdot SE(\bar{X}_1 - \bar{X}_2)$$
where
$$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
and
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \;\; \text{if } n_1 \ne n_2, \qquad s_p^2 = \frac{s_1^2 + s_2^2}{2} \;\; \text{if } n_1 = n_2.$$
Rule of Thumb for Checking Variance Equality: If the larger sample variance is more than twice the smaller sample variance, do not assume the variances are equal.
$s_p^2$ is called the “pooled estimate of the common variance ($\sigma^2$)”. The degrees of freedom for the t-distribution is $df = n_1 + n_2 - 2$. The t-quantiles are the same as those for the single population mean case described above.
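The pooled calculation can be sketched in a few lines of Python. The group sizes (10 and 15) match the cadmium study design, but the sample variances below are hypothetical placeholders chosen only to exercise the formulas, since the hemoglobin data are not reproduced in these notes; the function names are likewise illustrative.

```python
import math

def variances_look_equal(s1_sq, s2_sq):
    """Rule of thumb: larger sample variance at most twice the smaller."""
    return max(s1_sq, s2_sq) <= 2 * min(s1_sq, s2_sq)

def pooled_se(n1, s1_sq, n2, s2_sq):
    """Return (pooled variance, SE of the difference in means, df)."""
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return sp_sq, se, n1 + n2 - 2

# Hypothetical sample variances (NOT the study's values), n1 = 10, n2 = 15.
n1, s1_sq, n2, s2_sq = 10, 4.0, 15, 6.0

if variances_look_equal(s1_sq, s2_sq):
    sp_sq, se, df = pooled_se(n1, s1_sq, n2, s2_sq)
    print(f"pooled variance = {sp_sq:.3f}, SE = {se:.3f}, df = {df}")
```

When $n_1 = n_2$ the general formula reduces to the simple average $(s_1^2 + s_2^2)/2$, so one function covers both cases.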
CI Example: Cadmium Exposure and Hemoglobin Levels
($\mu_1$ vs. $\mu_2$)
The general null hypothesis says that the two population means are equal, or equivalently that their difference is zero. The alternative or research hypothesis can be any one of the three usual choices (upper-tail, lower-tail, or two-tailed). For the two-tailed case we can perform the test by using the confidence interval for the difference in the population means discussed above.
$H_0: \mu_1 = \mu_2$ or equivalently $(\mu_1 - \mu_2) = 0$
$H_a: \mu_1 > \mu_2$ or equivalently $(\mu_1 - \mu_2) > 0$ (upper tail) or
$H_a: \mu_1 \ne \mu_2$ or equivalently $(\mu_1 - \mu_2) \ne 0$ (two tailed, USE CI!) etc....
Test Statistic
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{SE(\bar{X}_1 - \bar{X}_2)} \;\sim\; \text{t-distribution with } df = n_1 + n_2 - 2$$
where $SE(\bar{X}_1 - \bar{X}_2)$ is as defined in the confidence interval section above.
Testing Example: Cadmium Exposure and Hemoglobin Levels
Conducting the Test in JMP
Example 2: Lead Exposure and Motor Function of Children in El Paso Texas
In a public health study of children living in close proximity to a lead smelter in El Paso, TX, children were tested for lead levels and given a series of IQ and motor function tests. Children were classified as having high lead levels if their blood level exceeded 40 μg/ml. One of the motor function tests was a finger wrist tapping test.
Did the children with high lead levels perform significantly lower than children who did not have elevated lead levels?
Can we assume equal variances and normality of the two populations?
Two-sample t-Test results
Conclusion:
Assumptions:
For this case we make the following assumptions
1. The samples from the two populations were drawn independently.
2. The population variances/standard deviations are NOT equal.
(This can be formally tested, or use the rule of thumb.)
3. The populations are both normally distributed. This assumption can be relaxed when the samples from both populations are “large”.
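$100(1-\alpha)\%$ Confidence Interval for $(\mu_1 - \mu_2)$
$$(\bar{X}_1 - \bar{X}_2) \pm t \cdot SE(\bar{X}_1 - \bar{X}_2)$$
where
$$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
and
$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} \quad \text{rounded down to the nearest integer.}$$
The t-quantiles are the same as those we have seen previously.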
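Because the degrees-of-freedom formula is tedious by hand, a small Python sketch may help check it. The sample sizes and variances below are hypothetical values chosen purely for illustration, and `welch_se_df` is a name introduced here, not a JMP or library function.

```python
import math

def welch_se_df(n1, s1_sq, n2, s2_sq):
    """Unpooled SE of the difference in means and its df (rounded down)."""
    v1, v2 = s1_sq / n1, s2_sq / n2          # s_i^2 / n_i
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, math.floor(df)

# Hypothetical inputs: n1 = 10, s1^2 = 4 and n2 = 15, s2^2 = 9.
se, df = welch_se_df(10, 4.0, 15, 9.0)
print(f"SE = {se:.3f}, df = {df}")
```

With these inputs the unrounded df is just under 23, so rounding down gives 22, slightly fewer than the $n_1 + n_2 - 2 = 23$ used in the pooled case.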
Hypothesis Testing
Test Statistic
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{SE(\bar{X}_1 - \bar{X}_2)} \;\sim\; \text{t-distribution with } df \text{ given by the formula above}$$
where $SE(\bar{X}_1 - \bar{X}_2)$ is as defined above.
Example: Mean Cell Radii of Malignant vs. Benign Breast Tumors
In your previous work with these data you noticed that the radii of malignant breast tumor cells were generally larger than the radii of benign breast tumor cells. Assuming the researchers initially hypothesized that cancerous breast tumor cells have larger mean radii than non-cancerous cells, conduct a test to see if this is supported by these data.
The cell radii of the malignant tumors certainly appear to be larger than the cell radii of the benign tumors. The summary statistics support this with sample means/medians of roughly 17 and 12 units respectively. The 95% CI’s for the mean cell radius for the two tumor groups do not overlap, which further supports the existence of a significant difference in the cell radii.
Can we assume the population variances are equal?
Formally Testing the Equality of Population Variances
$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \ne \sigma_2^2$
or equivalently
$H_0: \sigma_1 = \sigma_2$
$H_a: \sigma_1 \ne \sigma_2$
In JMP
Test Statistic
$$F = \max\left(\frac{s_1^2}{s_2^2}, \frac{s_2^2}{s_1^2}\right)$$
which has an F-distribution with numerator df = $n_1 - 1$ and denominator df = $n_2 - 1$ if $s_1^2 > s_2^2$; the degrees of freedom are reversed if $s_2^2 > s_1^2$.
If F is large then one variance is several times larger than the other and we should reject the null in favor of the alternative. There is a separate F-table for each level of significance. If our test statistic value exceeds the value in the table for the appropriate level of significance and degrees of freedom, we reject the null hypothesis.
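The statistic and its degrees of freedom are easy to get backwards, so here is a minimal Python sketch of the bookkeeping. It computes only $F$ and the two df values (the comparison against an F-table or software is left out), and the inputs are hypothetical numbers used for illustration.

```python
def variance_ratio_f(n1, s1_sq, n2, s2_sq):
    """Return (F, numerator df, denominator df) with the larger variance on top."""
    if s1_sq >= s2_sq:
        return s1_sq / s2_sq, n1 - 1, n2 - 1
    return s2_sq / s1_sq, n2 - 1, n1 - 1   # df are reversed when s2^2 is larger

# Hypothetical inputs: n1 = 10, s1^2 = 4 and n2 = 15, s2^2 = 9.
f_stat, df_num, df_den = variance_ratio_f(10, 4.0, 15, 9.0)
print(f"F = {f_stat:.2f} with numerator df = {df_num}, denominator df = {df_den}")
```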
BETTER TO JUST USE JMP!!!
Because we conclude that the population variances are unequal, we should use the nonpooled version of the two-sample t-test. No one does this by hand, so we will use JMP.
Conclusion:
When using dependent samples, each observation from population 1 has a one-to-one correspondence with an observation from population 2. One of the most common cases where this arises is when we measure the response on the same subjects before and after treatment. This is commonly called a “pre-test/post-test” situation. However, sometimes we have pairs of subjects in the two populations meaningfully matched on some prespecified criteria. For example, we might match individuals who are the same race, gender, socio-economic status, height, weight, etc. to control for the influence these characteristics might have on the response of interest. When this is done we say that we are “controlling for the effects of race, gender, etc.”. By using matched pairs of subjects we are in effect removing the effect of potential confounding factors, thus giving us a clearer picture of the difference between the two populations being studied.
DATA FORMAT
Matched Pair    $X_{1i}$    $X_{2i}$    $d_i = X_{1i} - X_{2i}$
1               $X_{11}$    $X_{21}$    $d_1$
2               $X_{12}$    $X_{22}$    $d_2$
3               $X_{13}$    $X_{23}$    $d_3$
...             ...         ...         ...
n               $X_{1n}$    $X_{2n}$    $d_n$
For the sample paired differences ($d_i$'s), find the sample mean ($\bar{d}$) and standard deviation ($s_d$).
The hypotheses are
$H_0: \mu_d = 0$
$H_a: \mu_d > 0$ or $H_a: \mu_d < 0$ or $H_a: \mu_d \ne 0$
We actually can hypothesize any size difference for the mean of the paired differences that we want. For example, if we wanted to show a certain diet resulted in at least a 10 lb. decrease in weight, then we could test whether the paired differences $d$ = Initial weight – After diet weight had a mean greater than 10 ($H_a: \mu_d > 10$ lbs.).
Test Statistic for a Paired t-test
$$t = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}} \;\sim\; \text{t-distribution with } df = n - 1$$
Note: $\mu_d$ = the hypothesized value for the mean paired difference under the null.
Assumptions:
1. The samples are random and meaningfully paired, i.e. are dependent.
2. Paired differences from two populations must be normally distributed.
$100(1-\alpha)\%$ CI for $\mu_d$
$$\bar{d} \pm t\,\frac{s_d}{\sqrt{n}}$$
where $t$ comes from the appropriate quantile of the t-distribution with $df = n - 1$.
This interval has a $100(1-\alpha)\%$ chance of covering the true mean paired difference.
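The paired calculations reduce to a one-sample analysis of the differences, which the following Python sketch illustrates. The before/after values are made-up numbers for five hypothetical subjects (they are not data from any study in these notes), and `paired_t` is an illustrative helper name.

```python
import math
import statistics

def paired_t(first, second, mu_d0=0.0):
    """Return (d-bar, s_d, t statistic, df) for differences first - second."""
    d = [a - b for a, b in zip(first, second)]
    n = len(d)
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)
    t = (d_bar - mu_d0) / (s_d / math.sqrt(n))
    return d_bar, s_d, t, n - 1

# Hypothetical pre/post measurements for 5 subjects (illustration only).
pre = [150.0, 160.0, 145.0, 170.0, 155.0]
post = [138.0, 150.0, 140.0, 155.0, 147.0]

# Test H0: mu_d = 5 against Ha: mu_d > 5, i.e. a hypothesized difference other than 0.
d_bar, s_d, t_stat, df = paired_t(pre, post, mu_d0=5.0)
print(f"d-bar = {d_bar:.1f}, s_d = {s_d:.3f}, t = {t_stat:.3f}, df = {df}")
```

The `mu_d0` argument mirrors the point made above that any size difference can be hypothesized, not just zero.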
Example: Effect of Captopril on Blood Pressure
In order to estimate the effect of the drug Captopril on blood pressure (both systolic and diastolic), the drug is administered to a random sample of n = 15 subjects. Each subject's blood pressure was recorded before taking the drug and then 30 minutes after taking the drug. The data are shown below.
Syspre – initial systolic blood pressure
Syspost – systolic blood pressure 30 minutes after taking the drug
Diapre – initial diastolic blood pressure
Diapost – diastolic blood pressure 30 minutes after taking the drug
Research Questions:
Is there evidence to suggest that Captopril results in a mean systolic blood pressure decrease of at least 10 mmHg* on average in patients 30 minutes after taking it?
(* It is decided that a mean change of less than 10 mmHg is of no physiological importance.)
Is there evidence to suggest that Captopril results in a mean diastolic blood pressure decrease of at least 5 mmHg* on average in patients 30 minutes after taking it?
(* It is decided that a mean change of less than 5 mmHg is of no physiological importance.)
For each blood pressure we need to consider paired differences of the form $d_i = BPpre_i - BPpost_i$.
For paired differences defined this way, positive values correspond to a reduction in their blood pressure ½ hour after taking Captopril. To answer the research questions above we need to conduct the following hypothesis tests:
$H_0: \mu_{syspre-syspost} \le 10$ mmHg
$H_a: \mu_{syspre-syspost} > 10$ mmHg
and
$H_0: \mu_{diapre-diapost} \le 5$ mmHg
$H_a: \mu_{diapre-diapost} > 5$ mmHg
Below are the relevant statistical summaries of the paired differences for both blood pressure measurements.
The t-statistics for both tests are given below:
Systolic BP:
Diastolic BP:
We can use the t-Probability Calculator in JMP to find the associated p-values or better yet use JMP to conduct the entire t-test.
[JMP t-test output panels: Systolic Blood Pressure; Diastolic Blood Pressure]
Both tests result in rejection of the null hypotheses. Thus we have sufficient evidence to suggest that taking Captopril will result in a mean decrease in systolic blood pressure exceeding 10 mmHg (p = _______) and a mean decrease in diastolic blood pressure exceeding 5 mmHg (p = _______). Furthermore we estimate that the mean change in systolic blood pressure will be somewhere between _______ mmHg and ______ mmHg, and that the mean change in diastolic blood pressure could be as large as ______ mmHg.
95% CI’s for the Mean Change in Systolic and Diastolic Blood Pressures
Let
We examined these data when we looked at applications of the binomial distribution.
We will now use a paired t-test to answer the research question:
Does the heart rate of a rat increase when it is taken from a solitary cage and placed in a cage with other rats?
($p_1$ vs. $p_2$)
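As we have seen previously,
$100(1-\alpha)\%$ Confidence Interval for $(p_1 - p_2)$
$$(\hat{p}_1 - \hat{p}_2) \pm z \cdot SE(\hat{p}_1 - \hat{p}_2) \quad \text{(provided } n_1 \text{ and } n_2 \text{ are “large”)}$$
where
$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
and $z$ is the standard normal table value for the desired confidence level:
Confidence Level            z
95% ($\alpha = .05$)        1.96
90% ($\alpha = .10$)        1.645
99% ($\alpha = .01$)        2.576
“Large” sample sizes: Both samples should be larger than 25 and both samples should have more than 5 “successes” and more than 5 “failures”.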
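Hypothesis Testing
$H_0: p_1 = p_2$ or equivalently $(p_1 - p_2) = 0$
$H_a: p_1 > p_2$ or equivalently $(p_1 - p_2) > 0$ (upper tail) or
$H_a: p_1 < p_2$ or
$H_a: p_1 \ne p_2$
Test Statistic
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{SE(\hat{p}_1 - \hat{p}_2)} \;\sim\; \text{standard normal distribution}$$
provided $n_1$, $n_2$ are “large” (see above), where
$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\hat{p}\,\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
and
$$\hat{p} = \frac{\text{number of successes in combined sample}}{n_1 + n_2} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}, \qquad \hat{q} = 1 - \hat{p}.$$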
An important enemy of the snail (Cepaea nemoralis) is the song thrush. These birds select snails from snail colonies and take them to nearby rocks. There the birds break open the snails, eat the soft parts, and leave the shells. In a study of natural selection, the proportion of unbanded shells in the rocks was compared to the proportion of unbanded snails in the nearby colony. The background in the bog was fairly uniform. It was felt that, because of their ability to blend into the background, the unbanded snails would be better protected from predators than the banded members of the colony. This would result in the proportion of unbanded snails in the rocks being smaller than that of unbanded snails in the colony.
In a sample of n = 863 broken shells around rocks, 377 were unbanded, which is a sample percentage of 43.7%. Of n = 560 individuals collected in the bog, 296 were unbanded, which is a sample percentage of 52.9%.
Conduct an appropriate test to compare the proportions and construct a 95% confidence interval for the difference in these two population proportions.
Hypothesis Test:
1)
2)
3)
4)
5)
Construct and interpret a 95% CI for $(p_{colony} - p_{rocks})$
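The large-sample calculations for this example can be sketched in Python as a check on hand work. The sketch assumes the upper-tail alternative $p_{colony} > p_{rocks}$ suggested by the researchers' theory; it uses the pooled standard error for the test and the unpooled standard error for the interval, as in the formulas above. Variable names are illustrative.

```python
import math
from statistics import NormalDist

n_colony, x_colony = 560, 296      # bog colony: sample size, unbanded count
n_rocks, x_rocks = 863, 377        # broken shells at rocks: sample size, unbanded count

p_colony = x_colony / n_colony
p_rocks = x_rocks / n_rocks
diff = p_colony - p_rocks

# Test: the pooled proportion estimates the common p under H0: p_colony = p_rocks.
p_pool = (x_colony + x_rocks) / (n_colony + n_rocks)
se_test = math.sqrt(p_pool * (1 - p_pool) * (1 / n_colony + 1 / n_rocks))
z = diff / se_test
p_upper = 1 - NormalDist().cdf(z)  # upper-tail p-value

# 95% CI: unpooled standard error with table value 1.96.
se_ci = math.sqrt(p_colony * (1 - p_colony) / n_colony
                  + p_rocks * (1 - p_rocks) / n_rocks)
lower, upper = diff - 1.96 * se_ci, diff + 1.96 * se_ci

print(f"difference = {diff:.4f}, z = {z:.2f}, upper-tail p-value = {p_upper:.5f}")
print(f"95% CI for (p_colony - p_rocks): ({lower:.3f}, {upper:.3f})")
```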
Enter these data as you would for setting up a 2 X 2 contingency table.
In JMP, select Analyze > Fit Y by X and place Group in the X box and Shell Type in the Y box. JMP then produces the output.
The results of Fisher's Exact Test are always included in the JMP output whenever we are working with a 2 X 2 contingency table.
The three p-values given are for testing the following:
(1) Left, p-value = .0004 is for testing if the proportion of unbanded snails in the bog colony is greater than the proportion of unbanded snails amongst those dead on the rocks. This clearly supports the researchers' natural selection theory that unbanded snails are less likely to end up dead on the rocks.
(2) Right, p-value = .9997 is for testing if the proportion of unbanded snails amongst those dead on the rocks is greater than the proportion of unbanded snails found in the bog colony. This is the opposite of the research hypothesis and is clearly not supported by these data.
(3) 2-Tail, p-value = .0008 is for testing if the proportion of unbanded snails found in the two populations, those living in the bog and those dead on rocks, is different. We have very strong evidence against the equality of these two proportions (p = .0008).
Example 2: Low Birth Weight and Smoking
These data come from a study looking at the effects of smoking during pregnancy on birth weight. Amongst the 381 non-smokers in the study, 13 had babies with low birth weight, while amongst the 299 mothers who smoked during pregnancy, 28 had babies with low birth weight. Is there evidence to suggest that the proportion of babies born with low birth weight is greater for mothers who smoked during pregnancy?
Portion of data table…
Smoking Status    Normal Birth Weight    Low Birth Weight    Row Totals
Nonsmoker         368 (96.59%)           13 (3.41%)          381
Smoker            271 (90.64%)           28 (9.36%)          299
Column Totals     639                    41                  680
One column denotes smoking status and the other the status of the infant's birth weight. There are n = 680 rows in the spreadsheet.
Hypothesis Test:
1)
2)
3)
4)
5)
Construct and interpret a 95% CI for $(p_{smoker} - p_{non\text{-}smoker})$
(see Two Sample Test for Proportions output from JMP below)
Fisher’s Exact Test Results from JMP
Conclusion:
We can also estimate the RR and OR for low birth weight associated with smoking during pregnancy using confidence intervals as shown on pages 69-70 of your earlier notes.
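As a starting point, the point estimates of the RR and OR can be computed directly from the counts in the table above. This Python sketch covers only the point estimates; the confidence interval method from the earlier notes is not reproduced here, and the variable names are illustrative.

```python
# Counts from the low birth weight table above.
low_smoker, n_smoker = 28, 299
low_nonsmoker, n_nonsmoker = 13, 381

risk_smoker = low_smoker / n_smoker            # P(low birth weight | smoker)
risk_nonsmoker = low_nonsmoker / n_nonsmoker   # P(low birth weight | nonsmoker)
relative_risk = risk_smoker / risk_nonsmoker

odds_smoker = low_smoker / (n_smoker - low_smoker)              # 28 / 271
odds_nonsmoker = low_nonsmoker / (n_nonsmoker - low_nonsmoker)  # 13 / 368
odds_ratio = odds_smoker / odds_nonsmoker

print(f"risk (smoker) = {risk_smoker:.4f}, risk (nonsmoker) = {risk_nonsmoker:.4f}")
print(f"RR = {relative_risk:.2f}, OR = {odds_ratio:.2f}")
```

Because low birth weight is fairly rare in both groups, the OR and RR come out close to each other.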