Permutation Tests


ST3060 – Permutation Tests

Permutation Tests are similar to the Bootstrap in that they are based on the idea of resampling; however, they differ from the Bootstrap in that the samples are drawn without replacement.

Permutation Tests are a form of non-parametric test which can be readily applied in 2-sample unpaired and paired testing environments. They are primarily used to provide a p-value.

For example, we may wish to test $H_0: \mu_1 = \mu_2$ vs $H_a: \mu_1 \neq \mu_2$, where $\mu_1$ and $\mu_2$ are the underlying means of the two populations in question. The two random samples, one from each population, are pooled, and by sampling without replacement from the pooled data we form two replicate samples having the same sizes as the original samples. We next calculate the difference in sample means for the two replicate samples and record this value. Ideally we should continue this procedure and record the replicate sample difference for all possible resamples (this is known as an exact randomisation testing procedure). However, in practice we may just run the replications for 1,000 or 10,000 resamples, and this usually gives sufficient accuracy. The distribution of the statistic from these resamples approximates the sampling distribution of the statistic under the null hypothesis. It is called a permutation distribution. The observed test statistic is the difference in means between the original two samples. The p-value is then the proportion of resamples whose difference in means is at least as large as the observed test statistic in absolute size (since we are carrying out a 2-tailed test). We can easily extend this method to a 1-tailed test and to a paired test.
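The procedure above can be sketched in Python using only the standard library. This is a minimal illustration, not a reference implementation: the function names, the `seed` argument, the small tolerance `TOL` used to guard against floating-point ties, and the toy data are all assumptions made for the example. The first function draws random resamples; the second enumerates every possible split of the pooled data, which is only feasible for small samples.

```python
import itertools
import random
from statistics import mean

TOL = 1e-12  # assumed tolerance so that ties are not lost to rounding


def mean_diff(x, y):
    """Difference in sample means, the test statistic used in the text."""
    return mean(x) - mean(y)


def permutation_test(x, y, n_resamples=10_000, seed=0):
    """Two-sided permutation test using randomly drawn resamples."""
    rng = random.Random(seed)
    observed = mean_diff(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_resamples):
        # Shuffling the pool and splitting it is sampling without replacement.
        rng.shuffle(pooled)
        replicate = mean_diff(pooled[:n_x], pooled[n_x:])
        if abs(replicate) >= abs(observed) - TOL:
            extreme += 1
    return observed, extreme / n_resamples


def exact_permutation_test(x, y):
    """Exact randomisation test: enumerate every split of the pooled data."""
    observed = mean_diff(x, y)
    pooled = list(x) + list(y)
    indices = range(len(pooled))
    extreme = total = 0
    for chosen in itertools.combinations(indices, len(x)):
        chosen_set = set(chosen)
        first = [pooled[i] for i in chosen]
        second = [pooled[i] for i in indices if i not in chosen_set]
        if abs(mean_diff(first, second)) >= abs(observed) - TOL:
            extreme += 1
        total += 1
    return observed, extreme / total


# Toy data (made up for illustration): 20 possible splits, 2 of them as extreme
# as the observed one, so the exact two-sided p-value is 2/20.
sample_1 = [1, 2, 3]
sample_2 = [4, 5, 6]
print(exact_permutation_test(sample_1, sample_2))
print(permutation_test(sample_1, sample_2, n_resamples=2_000, seed=1))
```

With larger samples the number of possible splits grows very quickly, which is why the randomly resampled version is the one normally used in practice.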

Parametric tests, such as the t-test, give accurate p-values if the sampling distribution of the difference in means is at least roughly Normal. The Permutation Test gives accurate p-values even when the sampling distribution is not close to Normal. Additionally we can directly check the Normality of the sampling distribution by looking at the permutation distribution. Permutation Tests exist for any test statistic, regardless of whether or not its distribution is known.
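To illustrate the point that the test statistic need not be a difference in means, the sketch below accepts any statistic as a function and uses a difference in medians as an example. It also returns the list of replicate values, so the permutation distribution can be inspected (for instance with a histogram) to judge how close it is to Normal. The names `permutation_p_value` and `median_diff`, the tolerance, and the data are assumptions for illustration only, and the two-sided p-value here assumes a statistic whose permutation distribution is centred near zero.

```python
import random
from statistics import median


def permutation_p_value(x, y, statistic, n_resamples=10_000, seed=0):
    """Two-sided permutation p-value for an arbitrary two-sample statistic."""
    rng = random.Random(seed)
    observed = statistic(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    replicates = []
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # resample without replacement from the pool
        replicates.append(statistic(pooled[:n_x], pooled[n_x:]))
    # Count replicates at least as large as the observed value in absolute size.
    extreme = sum(abs(r) >= abs(observed) - 1e-12 for r in replicates)
    return observed, extreme / n_resamples, replicates


def median_diff(x, y):
    """Example statistic: difference in sample medians."""
    return median(x) - median(y)


# Made-up data containing one outlier, where a median may be preferred.
observed, p_value, replicates = permutation_p_value(
    [1, 2, 3, 50], [4, 5, 6, 7], median_diff, n_resamples=2_000, seed=2
)
print(observed, p_value, len(replicates))
```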

The Permutation Test completely removes the Normality condition. However it does require the two populations to have identical distributions when the null hypothesis is true—not only the same means, but also the same variances, shapes etc. In practice it is robust against different distributions, except for different spreads when the sample sizes are not similar.

Just as in the case of Bootstrap confidence intervals, Permutation Tests are subject to two sources of random variability: the original sample is chosen at random from the population, and the resamples are chosen at random from the sample. Again as in the case of the Bootstrap, the added variation due to resampling is usually small and can be made as small as we like by increasing the number of resamples.
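As a rough guide to how small the resampling variation becomes, note that when resamples are drawn independently at random, the number that are at least as extreme as the observed statistic is a binomial count. The estimated p-value therefore has a standard error of approximately sqrt(p(1 - p)/B), where B is the number of resamples. The helper name `monte_carlo_se` and the p-value of 0.05 below are illustrative assumptions.

```python
import math


def monte_carlo_se(p_hat, n_resamples):
    """Approximate standard error of a p-value estimated from random resamples."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n_resamples)


# Illustrative p-value of 0.05 with the two resample counts mentioned earlier.
for n in (1_000, 10_000):
    print(n, round(monte_carlo_se(0.05, n), 4))
# prints:
# 1000 0.0069
# 10000 0.0022
```

Increasing the number of resamples tenfold reduces this standard error by a factor of about 3.2 (the square root of 10).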

Notes courtesy of Damian Conway, UCC
