Estimation of the mean Central limit theorem

Hypothesis testing
Summer Program
Brian Healy
Last class

Study design
– What is sampling variability?
– How does our sample affect the questions we can answer?
Basics of probability
• Central limit theorem
• Sample mean

What are we doing today?
• Rare event
• p-value
• Hypothesis test
• t-distribution / sample standard deviation

Big picture
We discussed last week that we could estimate the population mean with the sample mean, and the central limit theorem told us the distribution of the sample mean.
• Now, we are going to use our sample mean to test whether or not the population mean is equal to a hypothesized value. The hypothesis that the mean equals this value is called the null hypothesis. This test allows us to compare our sample to a value in a statistically meaningful way.

Null hypothesis
We set up the null hypothesis so that we can try to reject it. The test is designed to disprove the null.
• Stating the null is the first and most important step in any problem, and it requires knowledge of the problem.
• Notation: H0
• H0: My mother can run a 5-minute mile.
– Not: My mother cannot run a 5-minute mile.
• H0: The probability of heads on the coin is 0.5.
– Not: The probability is not 0.5.
Alternative hypothesis
Notation: HA or H1
• Has two characteristics
– Must cover all values not included in the null
– Must contain the value that we think is going to happen
• HA: My mother runs a mile slower than 5 minutes
• HA: The probability of heads is not 0.5

Hypothesis test
Definition: A statistical test of a null hypothesis
• Completed under the assumption that the null is true (conditional probability)
• Always want to disprove the null hypothesis
Ex. (one-sided):
– H0: Mom’s mean time <= 5:00
– HA: Mom’s mean time > 5:00
Alternatively (two-sided):
– H0: Probability of heads = 0.5
– HA: Probability of heads != 0.5
The most important step is properly defining the null and alternative hypotheses.
How do we test this hypothesis?
Take a sample
• As we have discussed, we want to think carefully about how to collect the sample to ensure that we limit bias and confounding and allow the results to be generalized to the proper population.
• From this sample, we can find a summary statistic and compare it to the null hypothesis
– Mean (t-test, linear regression)
– Median (Wilcoxon tests, quantile regression)
What does this have to do with the CLT?
• To test a hypothesis, we take a sample and find the sample mean
– Ex. Have my mom run a mile 10 times, or flip the coin 20 times
– Determining the proper sample size is next class
• Under the null hypothesis, we know the population mean
• We sometimes may know the population variance
• Under these conditions, the CLT tells us that the distribution of the sample mean is (approximately) normal with known mean and variance

Distribution of test statistic

Under the null hypothesis, we know that the distribution of $\bar{x}$ is (approximately) normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
• Now, we want to find the probability of observing our sample mean or a value more extreme, assuming the null is true (the p-value), to judge whether the data are consistent with the null hypothesis.
• Have we observed a rare event? Is it rare enough to reconsider the null?
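A minimal R sketch of this sampling distribution, assuming normally distributed individual times and borrowing the values from the New Bedford example later in the lecture (mean 8.5, standard deviation 2.6, n = 35). The simulated sample means should center on $\mu$ with spread close to $\sigma/\sqrt{n}$; the variable names are illustrative only.

```r
# Simulate the sampling distribution of the sample mean under H0.
# Assumption: individual times are normal with mu = 8.5, sigma = 2.6.
set.seed(1)
mu <- 8.5
sigma <- 2.6
n <- 35

# Each replicate draws one sample of size n and records its mean
sample_means <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))

# Theoretical standard deviation of the sample mean (standard error)
se_theory <- sigma / sqrt(n)

mean(sample_means)   # should be close to mu
sd(sample_means)     # should be close to se_theory
se_theory
```

With non-normal data the CLT still makes the distribution of the mean approximately normal for large n; the normal draws here just keep the sketch simple.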

What is a rare event?
My mom claims that she runs a mile in 5 minutes.
• I think she can’t.
• How can I test this?
• What happens if she ran a mile in
– 5:15 minutes?
– 6 minutes?
– 10 minutes?
• What if she ran 5 separate miles at 10 minutes on average?
What is a rare event?






You play a game against a friend. In this game, you win a dollar if the coin is heads and you lose a dollar if the coin is tails.
• What is the null hypothesis?
• What if the coin landed on tails 2 consecutive times?
• What if the coin landed on tails 10 consecutive times?
• At what point would you start to get suspicious?
We want to know whether the event we observed could have happened simply by chance or whether something else is more likely going on.
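To put numbers on the coin questions, here is a small R sketch, assuming a fair coin and independent tosses so that k consecutive tails have probability $0.5^k$. The function name `consecutive_tails` is hypothetical.

```r
# Probability of k consecutive tails if the coin is fair (P(tails) = 0.5).
# Tosses are assumed independent, so the probability is p_tails^k.
consecutive_tails <- function(k, p_tails = 0.5) {
  p_tails^k
}

consecutive_tails(2)   # 0.25: happens often by chance
consecutive_tails(10)  # 0.0009765625, about 1 in 1024
```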
P-value
Tells you how rare the event is
• Definition: Given a null hypothesis, the probability of the observed value or something more extreme
• P(event or something more extreme | H0 is true)
• Ex. Coin toss problem
– Null hypothesis: P(tails) = 0.5
– Sample: 9 out of 10 tails
– P(9 or more tails | H0 is true) = P(9 tails | H0 is true) + P(10 tails | H0 is true) = 10/1024 + 1/1024 = 0.011
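The same p-value can be computed in R from the binomial distribution, assuming independent tosses under H0: P(tails) = 0.5. The sketch adds the two point probabilities, then checks the result against the upper tail of the binomial CDF and against `binom.test` with a one-sided alternative.

```r
# One-sided p-value for observing 9 or more tails in 10 tosses under H0.
p_9 <- dbinom(9, size = 10, prob = 0.5)    # 10/1024
p_10 <- dbinom(10, size = 10, prob = 0.5)  # 1/1024
p_value <- p_9 + p_10
p_value                                    # 11/1024, about 0.011

# Same tail probability from the binomial CDF: P(X > 8) = P(X >= 9)
pbinom(8, size = 10, prob = 0.5, lower.tail = FALSE)

# binom.test reports the same exact one-sided p-value
binom.test(9, 10, p = 0.5, alternative = "greater")$p.value
```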

Alpha level: type I error
Definition: probability of rejecting the null hypothesis when the null hypothesis is in fact true (rejection probability).
• Usually 0.05 or 0.1, but set by the investigator
• Compare the p-value to the alpha level to determine whether you have a significant result. This value defines how rare an event needs to be for us to say that the event did not occur by chance.
• It is called an error because, if the null hypothesis is true, we will wrongly conclude that the result was not due to chance $\alpha \times 100\%$ of the time.
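One way to see what "wrong $\alpha \times 100\%$ of the time" means is to simulate many studies in which the null is true. This R sketch assumes normal data with known $\sigma$ and uses a two-sided z-test; the parameter values are borrowed from the later New Bedford example purely for illustration.

```r
# Estimate the type I error rate of a two-sided z-test by simulation.
# Assumptions: data are normal, sigma is known, and H0 (mu = mu0) is true.
set.seed(42)
alpha <- 0.05
mu0 <- 8.5
sigma <- 2.6
n <- 35
n_sims <- 10000

reject <- replicate(n_sims, {
  x <- rnorm(n, mean = mu0, sd = sigma)     # sample generated under H0
  z <- (mean(x) - mu0) / (sigma / sqrt(n))  # z statistic
  p <- 2 * pnorm(-abs(z))                   # two-sided p-value
  p < alpha                                 # TRUE means we (wrongly) reject H0
})

mean(reject)  # proportion of false rejections, should be near alpha
```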
One-sided or two-sided
Steps for hypothesis testing
1) State null and alternative hypotheses
2) State type of test and alpha level
3) Determine and calculate appropriate test statistic
4) Calculate p-value
5) Decide whether to reject or not reject the null hypothesis
• NEVER accept null
6) Write conclusion
Example
A study in New Bedford looked at pregnant teens to see how long after conception each young woman arrived at the physician’s office for her first visit, and how much time passed between the first and second visits.
• Questions: Do teens from a low-income area arrive at a clinic later than the average woman? Is there more time between the first and second visit among these teens?

It is known that the average amount of time from conception until a woman first visits her doctor is 8.5 weeks (this number is an estimate because it is difficult to know exactly when conception occurred) and the average amount of time from first visit to second visit is 4.8 weeks.
• For the moment, let’s assume that we know the population standard deviations of these two times: 2.6 weeks and 2.2 weeks, respectively.
• We have collected a sample of 35 pregnant teens, and we would like to know whether they take longer to get their first visit than the average woman.

Sample data




As with all of the data sets from now on, the data are on the BIO232 website.
Let’s determine the mean for this sample and compare it to the hypothesized value.
preg <- read.table("preg.dat", header = TRUE)
first <- preg[, 1]
mean(first) # This is the sample mean
[1] 9.74
So the sample mean is clearly not equal to the population mean (8.5 weeks), but is it sufficiently different for us to say that these girls differ from the population?
Steps for hypothesis testing
1) Null: $\mu = 8.5$ weeks, Alternative: $\mu \neq 8.5$ weeks
2) One-sample hypothesis test, $\alpha = 0.05$
3) $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{9.74 - 8.5}{2.6/\sqrt{35}} = 2.82$
4) Area in upper tail = 0.0024, p-value = 0.0048
5) Reject null
6) Conclusion: There is a difference in the amount of time from conception to the first visit to a physician. The time is longer for the pregnant teens.
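The z statistic and p-value in steps 3 and 4 can be reproduced in R. This is a sketch that treats the population standard deviation of 2.6 weeks as known, as the example assumes; the variable names are illustrative.

```r
# Worked z-test for the New Bedford example (population sigma assumed known).
xbar <- 9.74   # sample mean time to first visit (weeks)
mu0 <- 8.5     # hypothesized population mean under H0
sigma <- 2.6   # population standard deviation, assumed known
n <- 35        # sample size

se <- sigma / sqrt(n)           # standard error of the sample mean
z <- (xbar - mu0) / se          # test statistic
upper_tail <- 1 - pnorm(z)      # area above the observed z
p_two_sided <- 2 * upper_tail   # two-sided p-value

round(c(z = z, upper_tail = upper_tail, p = p_two_sided), 4)
```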
Picture
[Figure: null distribution of the sample mean centered at 8.5, with the observed mean 9.74 marked and a shaded area of 0.0024 in each tail]
Normal hypothesis test in R
To complete a normal hypothesis test in R, you can simply use the pnorm command with the appropriate mean and standard deviation; for a sample mean, the standard deviation to use is the standard error $\sigma/\sqrt{n}$. Remember, by default pnorm provides the area in the lower tail.
• For the previous problem, to get the appropriate two-sided p-value, use
(1 - pnorm(9.74, 8.5, 2.6/sqrt(35))) * 2
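A short R sketch showing three equivalent ways to get this two-sided p-value. The key point is that the `sd` argument must be the standard error $\sigma/\sqrt{n}$; passing $\sigma = 2.6$ itself would describe a single observation rather than the mean of 35. The `lower.tail` argument is part of base R's `pnorm`.

```r
# Three equivalent two-sided p-values for the New Bedford example.
xbar <- 9.74
mu0 <- 8.5
se <- 2.6 / sqrt(35)   # standard error, not sigma

# 1) Original scale, lower-tail area subtracted from 1
p1 <- 2 * (1 - pnorm(xbar, mean = mu0, sd = se))

# 2) Original scale, asking pnorm for the upper tail directly
p2 <- 2 * pnorm(xbar, mean = mu0, sd = se, lower.tail = FALSE)

# 3) Standardize first, then use the standard normal
z <- (xbar - mu0) / se
p3 <- 2 * pnorm(-abs(z))

c(p1, p2, p3)
```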

Another way to look at the test
Given a specific alpha level, you can find the cut-off beyond which the null hypothesis would be rejected for all more extreme values. The region more extreme than the cut-off is called the rejection region.
For our present problem, the cut-off for the upper rejection region would be
$\frac{\text{cut} - 8.5}{2.6/\sqrt{35}} = 1.96 \;\Rightarrow\; \text{cut} = 8.5 + 1.96 \cdot \frac{2.6}{\sqrt{35}} = 9.36$
[Figure: null distribution centered at 8.5, with upper-tail area 0.025 beyond the cut-off 9.36]
Practice





Here are the times my mom ran in the 10 trials. Test the null hypothesis that she can run a 9:00 mile on average.
mom <- c(9.5, 10, 8.75, 9, 11.2, 8.65, 9.6, 10.2, 8.8, 9.8)
• What are the null and alternative hypotheses?
• What do you conclude?
• What would have happened if we had completed a two-sided test?
Comparison of one-sided and two-sided tests
• For a symmetric test statistic such as z or t, the two-sided p-value is twice the one-sided p-value when the observed effect is in the direction of the one-sided alternative.
• The two-sided test is more conservative because the rejection region is split between the high and low sides. For the one-sided test, the rejection region is only on the side of interest.
• The two-sided test is the most common in the literature, even though people usually know the direction of the effect they are interested in detecting.
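A brief R sketch of this relationship, reusing the z statistic from the New Bedford example. When the alternative points in the direction of the observed effect, the two-sided p-value is exactly twice the one-sided one; when it points the other way, the one-sided p-value is close to 1.

```r
# One-sided vs two-sided p-values for the same z statistic
z <- (9.74 - 8.5) / (2.6 / sqrt(35))

p_upper <- pnorm(z, lower.tail = FALSE)  # HA: mu > 8.5 (direction of the effect)
p_lower <- pnorm(z)                      # HA: mu < 8.5 (opposite direction)
p_two <- 2 * pnorm(-abs(z))              # HA: mu != 8.5

c(upper = p_upper, lower = p_lower, two_sided = p_two)
```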

Wait a minute
Up to now, we have assumed we know the population variance (is this a good assumption?)
• How could we estimate the population variance?
– Sample variance!!!
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
– Is the sample variance exactly equal to the population variance?
– How can we account for the additional uncertainty?
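The sample variance formula can be checked against R's built-in `var()`, which also divides by $n - 1$. This sketch reuses the mile times from the earlier practice problem as example data.

```r
# Sample variance computed from the definition, compared with var()
x <- c(9.5, 10, 8.75, 9, 11.2, 8.65, 9.6, 10.2, 8.8, 9.8)  # mom's mile times
n <- length(x)

s2_manual <- sum((x - mean(x))^2) / (n - 1)  # divide by n - 1, not n
s2_manual
var(x)             # R's var() uses the same n - 1 denominator
sqrt(s2_manual)    # sample standard deviation, same as sd(x)
```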

Now, we need to do a little math
t-distribution
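Assume the $X_i$ are iid normal.
• Normal distribution:
$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$
• Chi-square distribution (proof of this is given in Casella and Berger and in Inference I):
$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$
• t-distribution: ratio of a normal (U) and a chi-square (V), which are independent for normal data:
$\frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{\frac{(n-1)S^2}{\sigma^2}\big/(n-1)}} = \frac{U}{\sqrt{V/(n-1)}} \sim t_{n-1}$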
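A simulation check of this result in R, assuming iid normal data with a small sample (n = 5, so 4 degrees of freedom). Replacing $\sigma$ by $S$ makes the statistic follow $t_{4}$: using the t cut-off gives about a 5% rejection rate under H0, while wrongly using the normal cut-off 1.96 rejects noticeably more often.

```r
# Simulate the t statistic under H0 with sigma replaced by the sample SD.
set.seed(7)
n <- 5
mu <- 0
n_sims <- 20000

t_stats <- replicate(n_sims, {
  x <- rnorm(n, mean = mu, sd = 1)
  (mean(x) - mu) / (sd(x) / sqrt(n))
})

# Rejection rate with the correct t cut-off: close to 0.05
mean(abs(t_stats) > qt(0.975, df = n - 1))

# Rejection rate with the normal cut-off 1.96: clearly above 0.05
mean(abs(t_stats) > qnorm(0.975))
```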
t-distribution

• Heavier tails than the normal distribution
– Accounts for additional variability
– Tails heavier with fewer degrees of freedom (dof)
• As dof goes to infinity, t dist → normal dist
• Can use the t-dist test statistic just as the previous (z) statistic
• Remember the assumption that the underlying data are normal
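The heavier tails show up directly in the critical values. This R sketch compares the 97.5th percentile of the t-distribution for several degrees of freedom with the standard normal value; the chosen dof values are arbitrary.

```r
# 97.5th percentiles: t-distribution for several dof vs the standard normal
dof <- c(2, 5, 10, 30, 100, 1000)
t_crit <- qt(0.975, df = dof)
z_crit <- qnorm(0.975)

round(data.frame(dof = dof, t_crit = t_crit, z_crit = z_crit), 3)
```

The t cut-offs are always larger than 1.96 and shrink toward it as dof grows.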

Example
We can use a t-test to test the second null hypothesis about our pregnant teens, namely that the time from the first visit to the second visit is the same as in the general population.
First, we need to ensure that the underlying distribution is approximately normal.
[Figure: Histogram of second (x-axis: second, 2 to 10; y-axis: Frequency, 0 to 6)]
Steps for hypothesis testing
1) Null: $\mu = 4.8$ weeks, Alternative: $\mu \neq 4.8$ weeks
2) One-sample hypothesis t-test, $\alpha = 0.05$
3) $t_{34} = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{5.97 - 4.8}{2.04/\sqrt{35}} = 3.4$
4) p-value = 0.0017
5) Reject null
6) Conclusion: There is a difference in the amount of time from the first visit to the second visit. The time is longer for the pregnant teens.
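Step 3 can be reproduced in R from the summary values on the slide, using the hypothesized mean of 4.8 that appears in the test statistic and in the `t.test()` call. This is a sketch: because $\bar{x}$ and $s$ are rounded, the result differs slightly from the t = 3.4035 that `t.test()` reports on the full data.

```r
# One-sample t statistic from rounded summary values
xbar <- 5.97  # sample mean time between visits (weeks)
mu0 <- 4.8    # hypothesized mean used in the test
s <- 2.04     # sample standard deviation
n <- 35

t_stat <- (xbar - mu0) / (s / sqrt(n))
df <- n - 1
p_two_sided <- 2 * pt(-abs(t_stat), df = df)

c(t = t_stat, df = df, p = p_two_sided)
```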
One sample t-test in R

To complete a t-test in R, use
> t.test(second,mu=4.8)
One Sample t-test
data: second
t = 3.4035, df = 34, p-value = 0.00172
alternative hypothesis: true mean is not equal to 4.8
95 percent confidence interval:
5.271960 6.670897
sample estimates:
mean of x
5.971429
Practice

Using the class data set, test the following
hypotheses:
– The average age of an incoming student to
the biostat program is 25. Is the mean age of
this year’s class significantly different? Is
there anything we need to consider in this
analysis?
– The average height of an incoming student is
71 inches. Is the mean height of this year’s
class significantly shorter?
More practice

The TV-watching habits of my seventh-grade classes are shown in the dataset TV.dat from the course website. The gender and age of the students are given as well. How did my students’ TV-watching habits compare to the national average for 7th graders of 4 hours/day? Use an alpha level of 0.01.