
Statistical Analysis of Genomics Data: Review 3
(Francesca Chiaromonte)
More on inference, tests and random permutations
Null hypothesis on a feature of the population θ =t(F), for instance
Ho : θ = 0
This is what we would like to refute.
Using data from F, we investigate the null in comparison to an alternative, for instance
Ha : θ ≠ 0   (two-sided)
or
Ha : θ > 0   (one-sided, right – could be left)
This is what we would like to show is supported by evidence in the data.
Note in this case the null specifies one value, while the alternative specifies a range (these
are the most common specifications).
As an example, let us consider the mean µ =t(F), and the hypothesis system
Ho : µ = 0
Ha : µ > 0
e.g. we may want to assess if we have enough evidence to conclude, based on our sample
of n=100 observations, that the log length ratio for human vs chicken has a positive mean
(on average, the length of the human sequence is larger than that of the chicken sequence on orthologous regions).
We need to use a test statistic, i.e. a function of the data, whose distribution under Ho
(null distribution) is known and can thus be used as reference.
Here the situation is again as simple as it gets. We know that the quantity:
$\frac{\bar{y} - \mu}{se(\bar{y})} \;\sim_{\text{approx}}\; N(0,1)$   if n is large, regardless of the shape of F (asymptotics)

$\frac{\bar{y} - \mu}{se(\bar{y})} \;\sim\; T_{n-1}$   if F is normal, for any n. Good approximation if F does not depart from normality too markedly (distributional assumption on population).
Thus under Ho (say we use the first result, n=100):
$u = \frac{\bar{y}}{se(\bar{y})} \;\sim_{\text{approx}}\; N(0,1)$
The p-value or achieved significance level associated with the observed u is the probability that, under the null, the statistic would take the observed value, or a value even more extreme in the direction (here, right) defined by the alternative:
$p(u_{obs}) = \Pr(u > u_{obs} \mid \mu = 0)$
(Integral under the right tail, beyond $u_{obs}$.)
For the two-sided alternative, the p-value is:
$p(u_{obs}) = \Pr(u < -|u_{obs}| \text{ or } u > |u_{obs}| \mid \mu = 0) = 2\,\Pr(u > |u_{obs}| \mid \mu = 0)$
here, because of the symmetry of the null distribution.
Basic idea: we can reject Ho in favour of Ha if the observed value of the test statistic is
very extreme with respect to what one would expect under the null distribution; that is, if
the corresponding p-value is small. The smaller the p-value, the stronger the evidence
against Ho provided by the data.
In R:
> #Compute the observed value of u:
> u <- mean(chicken_toy[,"y"])/
+   (sd(chicken_toy[,"y"])/sqrt(length(chicken_toy[,"y"])))
> u
[1] 19.57509
> #this is a very large value. Next, we compute the p-value
> #for the right-sided alternative, i.e. the corresponding
> #right tail prob under N(0,1):
> pnorm(19.57509,mean=0,sd=1,lower.tail=FALSE)
[1] 1.260955e-85
> #0 by all practical means! Very strong evidence that the mean
> #log length ratio is positive.
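As a side note (not in the original transcript), for the two-sided alternative one would simply double the right-tail probability beyond |u|, by the symmetry noted above; a minimal sketch reusing the u computed above:
> #Two-sided p-value: double the right-tail prob beyond |u|
> 2*pnorm(abs(u), mean=0, sd=1, lower.tail=FALSE)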
Rejection rule: reject Ho if the p-value is < a threshold α, say 0.05.
This is called the level, or (target) significance. With this rule, we will ensure that
Pr(rejecting Ho | Ho) < α
i.e. we control the probability of a false positive, or so-called type-I error.
The other error we can make is to fail to reject Ho when Ha holds:
Pr(not rejecting Ho | Ha)
This is the probability of a false negative, or so-called type-II error. One minus this probability is
called the power of the test, and an explicit expression can be given for each point in the
alternative (in our instance Ha is a range).
Type-I and II error probabilities are in trade-off; test statistics are evaluated for their
power behavior, once the level is fixed.
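As an illustration of the last point (an added sketch, not part of the original notes): under the normal approximation used above, the power of the level-α right-sided test can be computed at any specific value µa in the alternative. The inputs mu_a and se below are hypothetical.

# Power of the right-sided level-alpha z-test at a specific alternative mu_a,
# assuming u = ybar/se(ybar) is approximately N(mu_a/se, 1) when the true mean is mu_a.
power_right <- function(mu_a, se, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha)                    # rejection threshold a(alpha) for u under Ho
  pnorm(z_crit - mu_a/se, lower.tail = FALSE)   # Pr(u > z_crit) when u ~ N(mu_a/se, 1)
}
# e.g. power at mu_a = 0.1 when se(ybar) = 0.05 (hypothetical values)
power_right(mu_a = 0.1, se = 0.05)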
Connections between CI and tests of hypothesis
The 1-2α coverage CI for the mean CI (α ) = y ± a(α ) se( y ) is the locus of points that
could not be rejected as nulls at level α (against a two-sided alternative).
Back to the right-sided alternative in the example above, consider the following
reasoning: both for the CI and for the test, we exploit
$\frac{\bar{y} - \mu}{se(\bar{y})} \;\sim_{\text{approx}}\; N(0,1)$
Loosely, when building the CI, we take $N(\bar{y}, se(\bar{y}))$ as rendering the sampling
distribution of the sample mean (µ replaced with $\bar{y}$).
When testing, we take $N(0, se(\bar{y}))$ as rendering the sampling distribution of the sample
mean under the null µ = 0.
With α as tuning parameter, the data supports the conclusion µ > 0 if
• the lower extreme of the CI(α) is > 0 (0 not in the interval)
  … 0 is far enough from the observed sample mean, using $N(\bar{y}, se(\bar{y}))$
• the test rejects the null, $p(u_{obs}) < \alpha$, i.e. $u_{obs} > a(\alpha)$
  … the observed sample mean is far enough from 0, using $N(0, se(\bar{y}))$
These are parallel ways of reasoning.
[Figure: two normal curves, $N(0, se(\bar{y}))$ with the rejection threshold marked in its right tail, and $N(\bar{y}, se(\bar{y}))$ with the CI drawn around the observed mean.]
Note how here the two distributions are symmetric and identical except for their location.
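A small numerical sketch of this parallel under the normal approximation (the values of ybar, se and alpha below are hypothetical, and a(α) is taken as the standard normal quantile):

# Hypothetical observed sample mean, its standard error, and the level
ybar <- 0.12; se <- 0.05; alpha <- 0.05
a <- qnorm(1 - alpha)                  # a(alpha): right-tail threshold of N(0,1)
ci_lower <- ybar - a*se                # lower extreme of CI(alpha)
u_obs <- ybar/se                       # observed test statistic
c(ci_excludes_0 = ci_lower > 0,        # CI reasoning: 0 below the interval
  test_rejects  = u_obs > a)           # test reasoning: u_obs beyond the threshold

The two conditions are algebraically the same: $\bar{y} - a(\alpha)\,se(\bar{y}) > 0$ if and only if $\bar{y}/se(\bar{y}) > a(\alpha)$.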
It is useful to generalize this parallel.
Generic population feature t(F); hypothesis system, say Ho: t(F) = 0 vs Ha: t(F) > 0.
Evidence in the data is summarized through t(Fn).
Now let:
• $P_F^{(n)}$ = sampling distribution of t(Fn)
• $P_o^{(n)}$ = sampling distribution of t(Fn) under the null.
$\int_{t \ge t_{obs}} P_o^{(n)}(dt) = p(t_{obs})$   the p-value we seek
$\int_{t \le 0} P_F^{(n)}(dt) = f$   the parallel construction
If the two distributions are symmetric and not too different in shape, the latter is a good
approximation to the former, having the advantage that we can estimate it numerically by
bootstrap, whatever the t(F) under consideration:
$\hat{f} = \frac{1}{B}\, \#\{\, t^*(b) \le 0 \,\}$   bootstrap-based empirical p-value
In rough terms, this is the logic for bootstrap testing (again, many refinements exist).
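As a rough illustration of this logic, a minimal sketch of a bootstrap-based empirical p-value for Ho: t(F) = 0 vs Ha: t(F) > 0 (the choice of the mean as statistic, the number of replicates B and the simulated data are assumptions made for the example):

# Bootstrap-based empirical p-value via the parallel construction:
# estimate Pr( t(Fn) <= 0 ) under the bootstrap approximation of P_F.
boot_pvalue <- function(y, stat = mean, B = 2000) {
  t_star <- replicate(B, stat(sample(y, length(y), replace = TRUE)))  # bootstrap replicates t*(b)
  sum(t_star <= 0) / B                                                # fraction of replicates at or below 0
}
# Example with simulated data (hypothetical):
set.seed(1)
boot_pvalue(rnorm(100, mean = 0.5, sd = 1))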
This approach breaks down if the nature of the variability presented by the statistic under
the null is substantially different.
However, there is an important class of testing problems for which we can obtain p-values numerically without resorting to the parallel construction and the bootstrap.
This is because we can simulate $P_o^{(n)}$ itself.
Permutation tests
Consider for instance
• F = population jointly representing y and x, both quantitative, and t(F) = ρ(y,x) the
correlation – e.g. the correlation between log length ratio and log large insertion ratio.
• F = population jointly representing y (quantitative) and c = 1, 2 (categorical, indexed),
and t(F) = µ1 − µ2 the difference between the means – e.g. the difference in mean log
length ratio between micro+medium and macro chicken chromosomes (creating two
groups out of three here).
In both cases the selected population features are ways of measuring the association
between two variables. The values representing no association are 0, and we can imagine
testing:
Ho : ρ(y,x)=0 (no linear association)
Ha : ρ(y,x)>0
Ho : µ1−µ2=0 (no location effect of the class)
Ha : µ1−µ2>0
We surely have:
y indep x ⇒ Ho : ρ(y,x) = 0
y indep c ⇒ Ho : µ1 − µ2 = 0
Permutation tests exploit the fact that we can simulate independence, and thus, a fortiori,
null hypotheses concerning these types of features.
Let us consider the difference of means problem:
y          c
y1         1
...        ...      (sub)sample of size n1 from the c = 1 subpopulation
yn1        1
yn1+1      2
...        ...      (sub)sample of size n2 from the c = 2 subpopulation (n1 + n2 = n)
yn1+n2     2
… often called the 2-sample problem.
The plug-in statistic for the difference of means is $d = \bar{y}_1 - \bar{y}_2$. Now, keep the y-column
fixed, and

For b = 1 … B:
1. Create a random permutation of the c-labels in the second column (equivalent to
drawing n times from them without replacement). This gives $(c_1^*(b), \ldots, c_n^*(b))$.
2. Using the permuted labels, recompute the difference between the means:
$d^*(b) = \bar{y}_1^*(b) - \bar{y}_2^*(b)$

Compute the permutation-based empirical p-value as
$p(d_{obs}) = \frac{1}{B}\, \#\{\, d^*(b) \ge d_{obs} \,\}$
Random permutations simulate sampling from a population in which y and c are left
unchanged marginally, but any association existing between them is broken.
At the sample level, all the marginal features of y (overall mean, sd, etc) and c
(frequencies) are preserved, because the y and c numbers stay the same, but the
connection is “scrambled” – thus the means within the groups change.
Although in many practical cases bootstrap- and permutation-based empirical p-values
will be quite close, the latter is, when applicable, a better approach because it simulates
the null distribution directly.
Note: only one of the data columns is permuted (we could permute both, but it would not help).
To implement permutation tests in R, you can look for ready-to-use functions available
on the web (as individual functions or in packages), or you can write your own function.
In doing so, you will need to use the function:
> sample(x, size, replace = FALSE, prob = NULL)
> #A random permutation of x is a sample of size length(x)
> #from x, without repl. Need not specify sampling weights (prob)
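For instance, a minimal sketch of such a function for the 2-sample (difference of means) problem; the grouping column grp used in the commented call is hypothetical, not part of the chicken_toy data shown earlier:

# Permutation test for Ho: mu1 - mu2 = 0 vs Ha: mu1 - mu2 > 0 (2-sample problem).
# y: numeric vector; cl: vector of group labels (two groups); B: number of permutations.
perm_test_diff <- function(y, cl, B = 2000) {
  groups <- unique(cl)
  diff_means <- function(labels) {
    mean(y[labels == groups[1]]) - mean(y[labels == groups[2]])
  }
  d_obs <- diff_means(cl)                          # observed difference of means
  d_star <- replicate(B, diff_means(sample(cl)))   # sample(cl): random permutation of the labels
  list(d_obs = d_obs,
       p_value = sum(d_star >= d_obs) / B)         # permutation-based empirical p-value
}
# Hypothetical usage, assuming a two-level grouping column "grp":
# perm_test_diff(chicken_toy[,"y"], chicken_toy[,"grp"])

The same scheme covers the correlation example: permute one of the two columns and recompute cor(y, x) at each step.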