Tuesday week 4

Recap of Tuesday 4/19 lecture
From Statistical Intuition
to Formal Inference
[Figure: True Population Differences. Panels: No difference in means; Unknown difference in means.]
Main points from exercise 1
• Suggestive patterns are common in random data (where there are
no differences).
• Having samples from the populations without differences helps us
to interpret our observed difference.
• We can ask “how frequently would I expect to see a sample like this
(or more extreme), given there is no difference?” This is an informal
hypothesis test.
• Sometimes it’s hard to tell a sample difference caused by true
population differences from a purely random one (groups 1, 3, & 4),
so sometimes the results will be ambiguous.
• Only group 5 thought their sample was unlikely, given there was no
difference. So for groups 1-4 we would just have to say “we’re not
sure!” In other words, we could not reject the null hypothesis.
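The informal test in these bullets can be sketched in code. This is only an illustration, not the in-class procedure: it assumes both groups come from the same normal population (mean 0, SD 1), 10 observations per group, and a hypothetical observed difference of 0.9. None of those numbers come from the exercise.

```python
import random
import statistics


def null_differences(n_per_group, n_sims, rng):
    """Differences in sample means when both groups share ONE population."""
    diffs = []
    for _ in range(n_sims):
        # Assumption: a standard normal population for both groups.
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        diffs.append(statistics.mean(a) - statistics.mean(b))
    return diffs


def fraction_as_extreme(diffs, observed):
    """Share of null differences at least as far from zero as `observed`."""
    return sum(abs(d) >= abs(observed) for d in diffs) / len(diffs)


rng = random.Random(1)          # fixed seed so the run is repeatable
diffs = null_differences(n_per_group=10, n_sims=5000, rng=rng)
observed = 0.9                  # hypothetical observed difference in means
print(fraction_as_extreme(diffs, observed))
```

A small fraction means a sample like ours would be rare if there were no difference. A large fraction is the "we're not sure" case.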
Exercise 2 (a permutation test)
• We are interested in effects of climate change on alpine
ecoregions dominated by larch and pine on the eastern slopes of
the Cascades in OR and WA.
• Question: Is there a difference between species in the date at
which ripe seed cones become available?
• Ideally, we would look at trees across several elevation bands.
Today we look only at the lowest band (500-1000 m). N=24
How to do a permutation test
1. Keep 24 data cards together as one “dataset”; don’t mix them.
2. Calculate the difference between mean date for the larch trees
and mean date for the pine trees.
3. Cut off the labels.
4. Randomly resample the data (sample sizes=11/13) (without
replacement).
5. Calculate the mean for one group and the mean for the other
group and then calculate the difference.
6. Put the difference on a sticky note with initials and place on
the board.
7. Repeat steps 4-6 as many times as possible.
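The card procedure above can be sketched in a few lines of Python. The dates are made-up day-of-year values, not the class dataset. Assigning 11 cards to larch and 13 to pine is also an assumption, since the slide gives only the sizes 11/13. Shuffling the labels is equivalent to dealing the unlabeled cards into piles of 11 and 13 without replacement.

```python
import random
import statistics


def mean_difference(values, labels):
    """Mean date for larch minus mean date for pine."""
    larch = [v for v, s in zip(values, labels) if s == "larch"]
    pine = [v for v, s in zip(values, labels) if s == "pine"]
    return statistics.mean(larch) - statistics.mean(pine)


def permutation_test(values, labels, n_shuffles, rng):
    """Return the observed difference, the shuffled differences, and a two-sided p."""
    observed = mean_difference(values, labels)       # step 2
    shuffled = list(labels)                          # step 3: labels cut off
    null = []
    for _ in range(n_shuffles):                      # step 7: repeat
        rng.shuffle(shuffled)                        # step 4: random regrouping
        null.append(mean_difference(values, shuffled))  # steps 5-6
    p = sum(abs(d) >= abs(observed) for d in null) / n_shuffles
    return observed, null, p


# Hypothetical ripe-cone dates (day of year); NOT the data from the exercise.
larch_dates = [248, 251, 255, 243, 260, 252, 249, 257, 246, 254, 250]
pine_dates = [256, 262, 259, 265, 253, 268, 261, 258, 264, 255, 267, 260, 263]
values = larch_dates + pine_dates
labels = ["larch"] * len(larch_dates) + ["pine"] * len(pine_dates)

observed, null, p = permutation_test(values, labels, 2000, random.Random(0))
print(observed, p)
```

The list `null` plays the role of the sticky notes on the board. `p` is the share of notes at least as far from zero as the observed difference.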
Different tests with different assumptions
• Parametric Test (T-test): adds math theory. Gives the probability
of observing a t-statistic this big or larger if means & variances
are the same.
• Non-Parametric Test (U-test): adds a drop of probability theory.
Gives the probability of observing a U-statistic this big or larger
if distributions are equal. (Really a permutation test using ranks.)
• Non-Parametric Test (Permutation): gives the probability of
observing a difference (or other statistic) this big or larger if
the labels were meaningless.
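For the parametric case, a minimal sketch of the equal-variance (pooled) two-sample t-statistic is below. The p-value step is an assumption made to stay within the standard library: it uses a normal approximation, whereas a real t-test would use a t distribution with n1 + n2 - 2 degrees of freedom. The dates are hypothetical.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Two-sample t-statistic using one pooled variance for both groups."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def approx_two_sided_p(t):
    """Normal approximation to the two-sided p-value (not the exact t tail)."""
    return 2 * (1 - statistics.NormalDist().cdf(abs(t)))


# Hypothetical day-of-year values, for illustration only.
larch_dates = [248, 251, 255, 243, 260, 252, 249, 257, 246, 254, 250]
pine_dates = [256, 262, 259, 265, 253, 268, 261, 258, 264, 255, 267, 260, 263]

t = pooled_t_statistic(larch_dates, pine_dates)
print(t, approx_two_sided_p(t))
```

With samples as small as 11 and 13, the normal approximation understates the tail probability somewhat, so treat the printed p-value as a rough guide.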
Main points from exercise 2
• With the permutation test, the trick to generating lots of
samples under the null hypothesis (i.e. there is no difference
in the populations) is to randomly shuffle (permute) the
values many times.
• For a t-test you use the central limit theorem to get an
approximate distribution of the differences by using the
variance from the sample.
• For the Wilcoxon test you rank all of the observations based
on the date (1..24). Then you use probability theory to
generate the null distribution of a rank statistic (which
summarizes how different the ranks of the two groups are).
• All tests use the general null hypothesis that the groups are
not different, but the specifics differ by test.
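The rank-based idea can be sketched in the same spirit. The code ranks all observations together, sums the ranks of one group, and builds the null distribution by shuffling the ranks. Using shuffling here is an assumption made for illustration: the Wilcoxon test normally gets this null distribution from probability theory (exact tables or a normal approximation), not from simulation.

```python
import random


def ranks(values):
    """Ranks 1..n for the pooled values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def rank_sum_permutation(group_a, group_b, n_shuffles, rng):
    """Rank sum of group_a and a two-sided p-value from shuffled ranks."""
    pooled_ranks = ranks(list(group_a) + list(group_b))
    na = len(group_a)
    observed = sum(pooled_ranks[:na])
    expected = na * (len(pooled_ranks) + 1) / 2   # rank sum if labels mean nothing
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled_ranks)
        if abs(sum(pooled_ranks[:na]) - expected) >= abs(observed - expected):
            hits += 1
    return observed, hits / n_shuffles


# Tiny hypothetical example: every value in the first group is the smallest.
rank_sum, p_rank = rank_sum_permutation([1, 2, 3], [4, 5, 6, 7], 2000, random.Random(0))
print(rank_sum, p_rank)
```

Because only the ranks enter the statistic, a single very late or very early date cannot dominate the result the way it can with a difference in means.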