Hypothesis Testing II

Exam #1
Wednesday, July 18th
Exams will be written and in-class. They will be a
combination of defining terms, solving problems, and
interpreting/discussing results.
Exams will be closed book and closed computer. You may
bring a (non-cell phone) calculator and one double-sided 8 ½
x 11 page of notes. You must prepare this page of notes
yourself and submit it along with your exam.
There will be no make-up exams. If an exam must be
missed, the absence must be officially excused in advance.
Questions about the Assignment
For Part II, only one person calculated the 95% confidence
interval using the standard error method.
Hypothesis Testing II
What is the probability that our observed outcome
could have occurred by random chance?
Randomization distribution
p-value
Statistical significance
Exercise and Gender Study
p-value
www.lock5stat.com/statkey/
We use the randomization sampling distribution to calculate the
p-value of the observed sample statistic.
The p-value is the probability of getting a sample statistic as
extreme as the observed sample statistic, just by random
chance, if the null hypothesis is true.
The smaller the p-value (i.e., the smaller the probability), the
stronger the evidence is against H0 and in favor of Ha.
p-value
Right Tail
Enter 3
The p-value is the proportion of randomization sample statistics
that are as extreme as our observed sample statistic.
You could get the p-value by counting the red dots.
Exercise and Gender Study
p-value
www.lock5stat.com/statkey/
If time spent exercising did not differ by gender, we would see a
difference in sample means as extreme as 3 hours in about 10% of
our studies.
p-value
Example: The observed sample statistic from Study A has a
p-value of 0.002 and the observed sample statistic from
Study B has a p-value of 0.2. Which study provides stronger
evidence against the null hypothesis?
A. Study A
B. Study B
C. Study A and Study B provide equally strong evidence
Answer: A. The lower the p-value, the stronger the evidence against
the null hypothesis.
An Experiment on Cocaine Addiction
Cocaine Addiction
Research Question: Is Desipramine effective at treating cocaine
addiction?
In a randomized experiment on treating cocaine addiction, 48 cocaine
addicts were randomly assigned to take either Desipramine (a new
drug) or a placebo. Then they were followed to see who relapsed.
Sample size (n) = 48 cocaine addicts
Two Variables:
Treatment given: Desipramine or a placebo
Outcome: Relapsed or No Relapse
What are the sample statistics we need for this study?
p̂D: The proportion of people treated with Desipramine who relapsed
p̂P: The proportion of people treated with a placebo who relapsed
The null hypothesis is the claim that there is no effect or no difference.
What would be our null hypothesis for this experiment?
Desipramine is equally effective as a placebo at treating cocaine addiction.
H0: pD = pP (or pD – pP = 0)
The alternative hypothesis is the claim that we seek evidence for.
What would be our alternative hypothesis?
Desipramine is more effective than a placebo at treating cocaine addiction.
Ha: pD < pP (or pD – pP < 0)
Conducting the Experiment
1. Randomly assign participants to treatment groups
2. Carry out the treatment phase for both groups
3. Observe relapse counts in each group
R = Relapsed
N = No Relapse
Desipramine: 24 Participants, 10 relapsed, 14 no relapse
Placebo: 24 Participants, 20 relapsed, 4 no relapse
Observed Sample Statistic
p̂D – p̂P = 10/24 – 20/24 ≈ .417 – .833 = –.416
Cocaine Addiction
Research Question: Is Desipramine effective at treating cocaine
addiction?
Two options:
1. H0 is true (Desipramine and the placebo cause the same proportion of relapses)
2. Ha is true (Desipramine causes a smaller proportion of relapses than a placebo)
If H0 is true, how would you explain the observed difference in the
sample proportion of relapses?
The observed difference in the sample proportion of relapses
could reasonably have happened by chance.
How can we determine the probability that our observed sample
statistic could have occurred by random chance?
Measuring Evidence against H0
To see if an observed sample statistic provides evidence
against H0, we need to see what kind of sample statistics we
would observe, just by random chance, if H0 were true.
Cocaine Addiction
Statistical Test
Research Question: Is Desipramine effective at treating cocaine
addiction?
Observed sample statistic: p̂D – p̂P = –.416 (Diff. in sample proportions)
How unusual would it be to observe this sample statistic by random
chance if the null hypothesis were true (i.e., pD – pP = 0)?
What is the probability that we would observe, by random chance, a
difference in sample proportions as large as .416 if Desipramine is
equally effective as a placebo at treating cocaine addiction?
To answer this question we need a distribution of sample statistics that
would occur if the null hypothesis were true.
We can generate these sample statistics using the randomization process.
Cocaine Addiction
Randomization Process
The sample size of our observed sample is 48.
Imagine having 48 pieces of paper. 30 pieces have an “R” on them and 18
have an “N” on them. This corresponds to the total number of relapsers and
non-relapsers in our experiment (i.e., observed sample).
We want to generate samples where the null hypothesis is true (i.e.,
Desipramine is equally effective as a placebo at treating cocaine addiction).
To do this we can randomly assign each piece of paper to a treatment group.
To be consistent with our observed sample, we’d randomly assign 24 pieces to
the Desipramine group and 24 pieces to the placebo group.
Then we would calculate the sample statistic (i.e., difference in sample
proportions) for this randomization sample.
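The paper-slip procedure can be sketched in a few lines of Python. This is only an illustration, not StatKey's implementation: the function name `randomization_statistic` and the fixed seed are assumptions made here, while the counts (30 "R", 18 "N", 24 slips per group) come from the experiment.

```python
import random

# Counts from the observed sample: 30 relapses and 18 non-relapses in total.
outcomes = ["R"] * 30 + ["N"] * 18

def randomization_statistic(outcomes, rng):
    """Shuffle the outcome slips, deal 24 to each group, and return
    the difference in relapse proportions (Desipramine minus placebo)."""
    slips = outcomes[:]      # copy so the original list is left untouched
    rng.shuffle(slips)       # under H0 the group assignment does not matter
    desipramine, placebo = slips[:24], slips[24:]
    p_hat_d = desipramine.count("R") / len(desipramine)
    p_hat_p = placebo.count("R") / len(placebo)
    return p_hat_d - p_hat_p

rng = random.Random(0)       # fixed seed only so the sketch is repeatable
stat = randomization_statistic(outcomes, rng)
print(round(stat, 3))
```

Each call produces one randomization sample statistic. The outcomes themselves never change, only which group each slip lands in.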
Create a Randomization Sample
Our Observed Sample
Desipramine: 10 relapsed, 14 no relapse
Placebo: 20 relapsed, 4 no relapse
A Randomization Sample
Desipramine: 16 relapsed, 8 no relapse
Placebo: 14 relapsed, 10 no relapse
Randomization Sample Statistic
p̂D – p̂P = 16/24 – 14/24 ≈ .667 – .583 = .084
Create a Randomization Sampling Distribution
Repeat this process 1,000 times to obtain 1,000 randomization sample statistics
to form a randomization sampling distribution.
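A minimal sketch of repeating the shuffle 1,000 times and reading off a left-tail p-value follows. The function name `randomization_distribution`, its parameters, and the seed are assumptions for illustration. Because the samples are random, the resulting p-value will differ slightly from the value StatKey reports.

```python
import random

def randomization_distribution(n_relapse=30, n_no_relapse=18,
                               group_size=24, reps=1000, seed=1):
    """Return `reps` differences in relapse proportions
    (Desipramine minus placebo) generated with H0 assumed true."""
    rng = random.Random(seed)
    slips = ["R"] * n_relapse + ["N"] * n_no_relapse
    other_size = len(slips) - group_size
    stats = []
    for _ in range(reps):
        rng.shuffle(slips)                         # reallocate slips at random
        d_relapses = slips[:group_size].count("R")
        p_relapses = n_relapse - d_relapses        # remaining relapses go to placebo
        stats.append(d_relapses / group_size - p_relapses / other_size)
    return stats

stats = randomization_distribution()
observed = 10 / 24 - 20 / 24                       # observed difference in proportions

# Left-tail p-value: share of randomization statistics at or below the
# observed one (tiny tolerance guards against floating-point ties).
p_value = sum(s <= observed + 1e-12 for s in stats) / len(stats)
print(p_value)
```

The left tail is used because Ha is pD < pP. A different alternative hypothesis would call for a different tail.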
Another Randomization Sample
Desipramine: 17 relapsed, 7 no relapse
Placebo: 13 relapsed, 11 no relapse
Randomization Sample Statistic
p̂D – p̂P = 17/24 – 13/24 ≈ .708 – .542 = .166
www.lock5stat.com/statkey/
The p-value is the area in the tail(s) beyond the observed sample
statistic in the randomization sampling distribution.
Cocaine Addiction
[Figure: StatKey randomization sampling distribution, marking the
observed sample statistic and its p-value (the proportion of
randomization sample statistics as extreme as the observed sample
statistic)]
The probability of getting a sample difference in proportions as low
as -0.416 just by random chance, if Desipramine is equally effective
as a placebo, is 0.003.
Which tail(s) to include (i.e., left-tail, right-tail, or “two-tail”)
depends on the alternative hypothesis.
Exercise and Gender
A Two-Tail Test
Research Question: Among college students, does one gender
spend more time exercising than the other?
What are the parameters of interest?
μM = mean number of hours male students spend exercising
μF = mean number of hours female students spend exercising
What are H0 and Ha?
H0: μM – μF = 0 Time spent exercising does not differ by gender.
Ha: μM – μF ≠ 0 Time spent exercising does differ by gender.
Alternative Hypothesis
The alternative hypothesis is determined by the research
question.
A one-sided Ha contains either > or <
A two-sided Ha contains ≠
For a one-sided Ha, the p-value is the proportion of
randomization sample statistics in the tail specified by Ha
(i.e., < → left-tail and > → right-tail).
For a two-sided Ha, the p-value is the proportion of
randomization sample statistics in both tails.
Exercise and Gender
www.lock5stat.com/statkey
p-value = 2 x .109 = 0.218
→ Little evidence against H0
→ Do not reject H0
Conclusion:
This study does not provide adequate evidence that there is any
association between gender and exercise times among college students.
Think:
A result this extreme would happen about 22% of the time just by
random chance if H0 were true, so this study does not provide
adequate evidence against H0.
Strength of Evidence
Hypothesis Testing
The p-value is the probability of getting results as extreme as our
observed sample statistic, if the null hypothesis is true.
If the p-value is small enough, we reject the null hypothesis
in favor of the alternative hypothesis.
The p-value measures our evidence against the null hypothesis.
How small is small enough?
[Figure: scale of p-values marked at .01, .05, .10, and → 1]
The smaller the p-value, the smaller the proportion of
randomization sample statistics as extreme as our sample statistic.
The smaller the p-value, the stronger the evidence against H0.
Statistical Significance
The significance level (α) is the threshold (e.g., .05, .01) below
which the p-value is deemed small enough to reject the null
hypothesis.
If the p-value is less than the threshold, the results are statistically
significant, and we reject the null hypothesis in favor of the
alternative hypothesis.
When the proportion of randomization sample statistics as extreme
as our observed sample statistic is less than α (e.g., .05, .01), we say
that our observed sample statistic is “statistically significant”.
Saying that our observed sample statistic is statistically significant
means that we have convincing evidence against H0 (and for Ha).
Formal Decisions
A formal hypothesis test has only two possible conclusions:
1. If the p-value is < α:
Reject the null hypothesis in favor of the alternative.
2. If the p-value is ≥ α:
Do not reject the null hypothesis.
Statistical Conclusions
Strength of evidence against H0:
[Figure: scale of p-values marked at .01, .05, .10, and → 1]
Formal decision of hypothesis test [based on α = 0.05]:
statistically significant [p-value < α]
not statistically significant [p-value ≥ α]
Assignment
Part I: 4.52, 4.76, and 4.84
Hint for #4.84: A correlation (r) between two variables is a type of sample statistic.
Part II: See Next Slide
Assignment
Obtaining Proportions from the GSS
Part II: (Type up this assignment in a Word document) [Worth 100 points]
Construct a research question that uses the GSS variables DIVORCE and SEX.
Provide the symbol and value for the sample mean/sample proportion for each variable.
Provide the symbol and value for the sample statistic you’ll be testing.
State your null hypothesis in words and with an equation.
State your alternative hypothesis in words and with an equation.
Indicate whether this will be a left-tail, right-tail or two-tail test.
Use StatKey to generate a randomization sampling distribution where the H0 is true.
(Provide a screen shot of your randomization sampling distribution)
Calculate and interpret the p-value for your observed sample statistic.
Assess the strength of evidence this data provides against H0.
Select a significance level and make a formal decision based on the significance level.
Interpret/explain the results/conclusions of your study.
Hint: This is similar to the Cocaine study (i.e., difference in proportions)
Entering Data into StatKey to Create a
Randomization Sampling Distribution
Click this button to enter your data and this window will pop up.
To compare proportions across two variables, enter the first
variable here and the second variable here.
Uncheck the “Weighted” box
Check the “Unweighted” box
Click this button and the values/statistics needed to calculate the
difference in sample proportions will open up in a new window.
Identifying where your Observed Sample Statistic fits
on the Randomization Sampling Distribution
Once you’ve created your randomization sampling distribution,
check the appropriate tail test box.
To see where your observed sample statistic fits on this
distribution, click here.
This window will pop up and you can enter the value for your
observed sample statistic here.
Summary
A randomization sampling distribution shows the distribution of
statistics that would be observed if H0 were true.
A p-value is the probability of getting a sample statistic as extreme as
the observed sample statistic, just by random chance, if H0 is true.
The p-value measures the strength of evidence against H0.
Results are statistically significant if the p-value is < α (the significance
level).
In making formal decisions, reject H0 if the p-value < α; otherwise do
not reject H0.
Hypothesis Testing
1. Construct research question
2. Define the parameter(s) of interest
3. State H0 and Ha
4. Set significance level (α) [usually 0.05 if unspecified]
5. Collect data
6. Generate descriptive statistics
7. Calculate the appropriate observed sample statistic
8. Create a randomization sampling distribution (where H0 is true)
9. Calculate the p-value of the observed sample statistic
10. Assess the strength of evidence against H0
11. Make a formal decision based on the significance level
12. Interpret the conclusion in context
Randomized Experiments
In randomized experiments the “randomness” is the random
allocation of cases to treatment groups.
If the null hypothesis is true, it doesn’t make any difference
which treatment group a respondent gets placed in.
Generate randomization samples assuming H0 is true by
reallocating units to treatment groups, and keeping the
response values the same.
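The reallocation idea is not limited to proportions. The sketch below shuffles which group each response belongs to and leaves the response values untouched, here with a difference in means as the statistic. The response values are hypothetical, invented only for illustration, and the helper names `reallocate` and `mean` are assumptions.

```python
import random

def reallocate(responses, n_treatment, rng):
    """Randomly split the pooled responses into two groups of the
    original sizes; the response values themselves are unchanged."""
    shuffled = responses[:]
    rng.shuffle(shuffled)
    return shuffled[:n_treatment], shuffled[n_treatment:]

def mean(values):
    return sum(values) / len(values)

# Hypothetical response values, not taken from any study in these slides.
treatment = [12, 9, 15, 7]
control = [8, 11, 6, 10]

rng = random.Random(42)      # fixed seed only so the sketch is repeatable
new_treatment, new_control = reallocate(treatment + control, len(treatment), rng)
diff = mean(new_treatment) - mean(new_control)
print(new_treatment, new_control, diff)
```

Pooling the two groups before shuffling is what encodes H0: if the treatment makes no difference, any unit could have ended up in either group with the same response.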
Formal Decisions
Reject H0 if observing a sample statistic so extreme is
unlikely when H0 is true. This means that the observed
sample data provides strong evidence to support Ha.
Do not reject H0 if observing a sample statistic this extreme is
reasonably likely when H0 is true. This means that the observed sample
data does not provide strong enough evidence to reject H0 (and support Ha).
For a given significance level (α)
p-value < α → Reject H0
p-value ≥ α → Do not reject H0
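The decision rule is mechanical once the p-value and significance level are known. A minimal sketch, assuming a hypothetical helper named `formal_decision` and the default level of 0.05, applied to the two p-values reported earlier in these slides:

```python
def formal_decision(p_value, alpha=0.05):
    """Return the formal decision of a hypothesis test at level alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    return "Reject H0" if p_value < alpha else "Do not reject H0"

print(formal_decision(0.003))   # cocaine study: Reject H0
print(formal_decision(0.218))   # exercise and gender study: Do not reject H0
```

A p-value exactly equal to the significance level falls on the "do not reject" side, because the rule rejects only when the p-value is strictly below the level.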
Elephant Example
The mystery animal X is unknown, so we set up the
following hypothesis test:
H0: X is an elephant
Ha: X is not an elephant
What would you conclude, if you had the following data?
X has four legs
Since it remains plausible that X could be an elephant,
we don’t have enough evidence to reject H0. However,
with this data we also cannot accept H0 and conclude
that X is an elephant.
X walks on two legs
Since it is highly unusual for an elephant to
walk on two legs, we can reject H0 and
conclude that X is probably not an elephant.
Randomization Process
Through the randomization process we can generate a
randomization sampling distribution which is the distribution of
sample statistics we would observe, just by random chance, if the
null hypothesis were true.
1. Simulate many randomization samples, assuming H0 is true.
2. For each randomization sample, calculate the randomization
sample statistic.
3. These randomization sample statistics form a randomization
sampling distribution.
4. Find the proportion of these randomization sample statistics that
are as extreme as our observed sample statistic.
Statistical Significance
www.xkcd.com