Recap of Tuesday 4/19 lecture

From Statistical Intuition to Formal Inference

[Figure: True Population Differences. Panels: "No difference in means" and "Unknown difference in means".]

Main points from exercise 1
• Suggestive patterns are common in random data (where there are no differences).
• Having samples from populations without differences helps us interpret our observed difference.
• We can ask, “How frequently would I expect to see a sample like this (or more extreme), given that there is no difference?” This is an informal hypothesis test.
• Sometimes it’s hard to tell whether a difference in the sample comes from a true population difference or from random variation (groups 1, 3, & 4). So sometimes the results will be ambiguous.
• Only group 5 said that they thought their sample was unlikely, given there was no difference. So for groups 1-4 we would just have to say “we’re not sure!” In other words, we could not reject the null hypothesis.

Exercise 2 (a permutation test)
• We are interested in the effects of climate change on alpine ecoregions dominated by larch and pine on the eastern slopes of the Cascades in OR and WA.
• Question: Is there a difference between species in the date at which ripe seed cones become available?
• Ideally, we would look at trees across several elevation bands. Today we look only at the lowest band (500-1000 m). N = 24.

How to do a permutation test
1. Keep the 24 data cards together as one “dataset”; don’t mix them.
2. Calculate the difference between the mean date for the larch trees and the mean date for the pine trees.
3. Cut off the labels.
4. Randomly resample the data into two groups (sample sizes = 11 and 13), without replacement.
5. Calculate the mean for one group and the mean for the other group, and then calculate the difference.
6. Put the difference on a sticky note with your initials and place it on the board.
7. Repeat steps 4-6 as many times as possible.
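The card-and-sticky-note procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dates in `larch_days` and `pine_days` are made-up placeholder values (day of year), not the class data, and the assignment of 11 trees to larch and 13 to pine is a guess, since the slides give only the sizes 11/13. The function name `permutation_test`, the two-sided comparison, and the "+1" adjustment to the p-value are also choices made for this sketch.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Shuffle the pooled values and compare shuffled differences to the observed one."""
    rng = random.Random(seed)
    n_a = len(group_a)

    # Step 2: observed difference in means, with the labels still attached.
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)

    # Step 3: "cut off the labels" by pooling all values into one list.
    pooled = list(group_a) + list(group_b)

    null_diffs = []
    for _ in range(n_permutations):
        # Step 4: resample without replacement into groups of the original sizes.
        rng.shuffle(pooled)
        new_a, new_b = pooled[:n_a], pooled[n_a:]
        # Steps 5-6: record the difference in means (the "sticky note").
        null_diffs.append(sum(new_a) / len(new_a) - sum(new_b) / len(new_b))

    # How often is a shuffled difference at least as large (in absolute value)
    # as the observed one? The +1 counts the observed labelling itself.
    extreme = sum(abs(d) >= abs(observed) for d in null_diffs)
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value, null_diffs


# HYPOTHETICAL placeholder data (day of year); NOT the values from the exercise.
larch_days = [244, 251, 238, 260, 247, 255, 249, 242, 258, 246, 253]
pine_days = [262, 249, 270, 256, 265, 259, 273, 251, 268, 261, 257, 266, 254]

observed, p_value, null_diffs = permutation_test(larch_days, pine_days)
print(f"observed difference in means: {observed:.2f} days")
print(f"approximate two-sided p-value: {p_value:.4f}")
```

The list `null_diffs` plays the role of the board full of sticky notes: a histogram of it shows what differences look like when the labels are meaningless.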
Different tests with different assumptions

Parametric Test (T-test)
• Assume: [figure: 2.9] + math theory
• Probability of observing a t-statistic this big or larger if the means & variances are the same.

Non-Parametric Test (U-test)
• Assume: + a drop of probability theory
• Probability of observing a U-statistic this big or larger if the distributions are equal. (Really a permutation test using ranks.)

Non-Parametric Test (Permutation)
• Assume: [figure: 3.0]
• Probability of observing a difference (or other statistic) this big or larger if the labels were meaningless.

Main points from exercise 2
• With the permutation test, the trick to generating lots of samples from the null hypothesis (i.e., there is no difference in the populations) is to randomly shuffle (permute) the values many times.
• For a t-test you use the central limit theorem to get an approximate distribution of the differences by using the variance from the sample.
• For the Wilcoxon test you rank all of the observations based on the date (1..24). Then you use probability theory to generate the null distribution of a rank statistic (which summarizes how different the ranks of the two groups are).
• All tests use the general null hypothesis that the groups are not different, but the specifics differ by test.
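To make the comparison concrete, here is a rough sketch that computes the three statistics (difference in means, a pooled-variance t-statistic, and a rank-based U-statistic) on the same data. It rests on several assumptions: the dates are made-up placeholder values rather than the class data, all function names are hypothetical, and every p-value below comes from shuffling labels. A standard t-test or U-test would instead take its p-value from theory (the t distribution, or the known distribution of the rank statistic); shuffling is used here only to keep the example within the Python standard library and to show the remark that the U-test is "really a permutation test using ranks".

```python
import random
from math import sqrt


def mean_diff(a, b):
    return sum(a) / len(a) - sum(b) / len(b)


def t_statistic(a, b):
    """Two-sample t-statistic with pooled variance (assumes equal variances)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)  # must be > 0 (data not all equal)
    return (ma - mb) / sqrt(pooled_var * (1 / na + 1 / nb))


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def u_statistic(a, b):
    """U for group a: its rank sum minus the smallest possible rank sum."""
    ranks = midranks(list(a) + list(b))
    return sum(ranks[: len(a)]) - len(a) * (len(a) + 1) / 2


def centered_u(a, b):
    # Under "no difference", U is centered on len(a) * len(b) / 2.
    return u_statistic(a, b) - len(a) * len(b) / 2


def shuffle_p_value(a, b, statistic, n_shuffles=5000, seed=0):
    """Two-sided p-value for any statistic, using shuffled labels as the null."""
    rng = random.Random(seed)
    observed = statistic(a, b)
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        shuffled = statistic(pooled[:n_a], pooled[n_a:])
        if abs(shuffled) >= abs(observed) - 1e-9:  # small tolerance for float noise
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


# HYPOTHETICAL placeholder data (day of year); NOT the values from the exercise.
larch_days = [244, 251, 238, 260, 247, 255, 249, 242, 258, 246, 253]
pine_days = [262, 249, 270, 256, 265, 259, 273, 251, 268, 261, 257, 266, 254]

for name, stat in [("mean difference", mean_diff),
                   ("t-statistic", t_statistic),
                   ("centered U", centered_u)]:
    value = stat(larch_days, pine_days)
    p = shuffle_p_value(larch_days, pine_days, stat)
    print(f"{name}: {value:.3f}, shuffle p-value about {p:.4f}")
```

The point of the sketch is that the null hypothesis is the same in spirit for all three, while the statistic being compared, and the assumptions needed to get its null distribution from theory rather than by shuffling, differ by test.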