Statistical Thinking for Hypothesis Testing

Overview
• Fundamentals of statistical hypothesis testing:
• no-one likes a statistics lecture,
but there are some things you have to know
• Likelihood ratio tests in phylogenetics:
• to make inferences about processes of evolution
e.g. patterns of substitution, rates of substitution (Kevin’s section)
• for finding best models of evolution for inferring trees (Kevin)
• for detecting selection (Asif’s section)
• [advanced topic] for comparing trees (topologies),
to give measures of confidence in estimated trees
Likelihood and maximum likelihood
Recall that for model M, parameters θ and data D:
likelihood L(M, θ | D) = Pr(D | M, θ)
and that maximum likelihood inference consists of finding θ̂, the θ that makes
the likelihood as large as possible:
find θ̂ so that L(M, θ̂ | D) ≥ L(M, θ | D) for all other θ
i.e. find the values for parameters θ that make the probability of the data as big as
possible for the model being used — intuitively, it is clear that these are sensible
estimates of the model parameters
Likelihood and hypotheses
Now suppose we have some hypothesis ‘H’ regarding the model and parameters.
Similarly, the likelihood of the hypothesis is:
L(H | D) = Pr(D | H)
Perhaps the hypothesis fully defines the likelihood (no free parameters), or
perhaps there are some free parameters in the hypothesis — in which case we
again maximize the likelihood to find the best value under a hypothesis.
‘Fair coin’ hypothesis
We toss a coin 100 times, and observe 65 Heads and 35 Tails. Our hypothesis ‘H0’
is that each throw is independent, with probability 0.5 of giving Heads.
What is the likelihood of this hypothesis?
L(H0 | D) = Pr(D | H0) = 0.5^65 × 0.5^35 = 7.889 × 10^-31
or ln(L(H0)) = ln(7.889 × 10^-31) = -69.31
This is all very interesting, but what can we do with it?
‘Possibly unfair coin’ hypothesis
We toss a coin 100 times, and observe 65 Heads and 35 Tails. Our hypothesis ‘H1’
is that each throw is independent, with unknown probability p of giving Heads.
What is the likelihood of this hypothesis?
Now we have a free parameter p, the probability of getting Heads. The maximum
likelihood estimate p̂ is exactly the observed proportion of Heads, i.e. 65/100 = 0.65
L(H1 | D) = Pr(D | H1) = 0.65^65 × 0.35^35 = 7.616 × 10^-29
or ln(L(H1)) = ln(7.616 × 10^-29) = -64.74
Comparison of coin hypotheses
L(H0) = 7.889 × 10^-31; ln(L(H0)) = -69.31
L(H1) = 7.616 × 10^-29; ln(L(H1)) = -64.74
L(H1)/L(H0) = 7.616 × 10^-29 / 7.889 × 10^-31 = 96.55
Evidently H1 is better than H0, but is it ‘significantly’ better?
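These numbers are easy to check; a few lines of Python reproduce the two likelihoods and their ratio:

```python
import math

# Likelihoods of the two coin hypotheses for 65 Heads and 35 Tails
L0 = 0.5**65 * 0.5**35    # H0: fair coin
L1 = 0.65**65 * 0.35**35  # H1: p set to its MLE, 0.65

print(L0)       # ~7.889e-31
print(L1)       # ~7.616e-29
print(L1 / L0)  # ~96.5
print(math.log(L0), math.log(L1))  # ~-69.31, ~-64.74
```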
Nested hypotheses
Some terminology: hypothesis H0 is ‘nested’ within hypothesis H1 if forcing a
particular choice of some of the parameters of H1 makes it the same as H0.
For coin tossing, forcing the unknown probability of Heads in H1 to equal 0.5 gives
us exactly H0. H0 is nested in H1.
Many sequence substitution models are nested in others: the more-complicated model H1 ‘contains’ the simpler H0, and must have more parameters to estimate.
Comparison of general hypotheses (I)
Traditional statistical hypothesis testing compares a ‘null hypothesis’ H0 with an
alternative hypothesis H1. Usually H0 is nested in H1, and we will treat H0 as valid
unless the evidence in favour of H1 is much stronger.
The evidence we use for H0 and H1 is their likelihoods, L(H0) and L(H1).
The relative evidence is the ratio of likelihoods:
Λ = L(H1)/L(H0)
or twice the logarithm of this:
2∆ = 2 ln(Λ) = 2[ln(L(H1)) – ln(L(H0))]
Large values of 2∆ mean that the evidence for H1 is greater than for H0.
(Nested models ensure that Λ ≥ 1, i.e. that 2∆ ≥ 0.)
Comparison of general hypotheses (II)
How large a value of 2∆ is big enough? Traditionally, we say that if 2∆ is bigger
than we would expect by chance in 95% (or 99%, or 99.9%...) of cases when H0 is
correct, then we favour H1 over H0.
A useful theorem for doing the necessary calculations:
Suppose H0 is nested in H1, and H0 has d fewer free parameters than H1.
Then, if H0 is correct, 2∆ has a chi-squared distribution with d degrees of freedom:
2∆ = 2[ln(L(H1)) – ln(L(H0))] ~ χ²_d
If 2∆ is greater than the 95% (or 99%, or 99.9%...) point of the χ²_d distribution,
then we reject H0 in favour of H1.
If 2∆ is ‘reasonable’, i.e. less than this, then we have no evidence to reject H0.
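In practice the χ² thresholds come from software rather than printed tables; a minimal sketch using SciPy (assumed available):

```python
from scipy.stats import chi2

# 95% critical value of the chi-squared distribution with 1 degree of freedom
print(round(chi2.ppf(0.95, df=1), 2))  # 3.84

def lrt_reject(two_delta, d, level=0.95):
    """True if 2∆ exceeds the level-point of chi-squared with d degrees
    of freedom, i.e. H0 is rejected in favour of H1."""
    return two_delta > chi2.ppf(level, df=d)
```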
Comparison of general hypotheses (III)
These statistical tests are likelihood ratio tests (LRTs).
This is a very powerful class of statistical hypothesis tests, with very broad
applicability.
Online χ² P-value calculators:
http://www.danielsoper.com/statcalc3/calc.aspx?id=11
http://graphpad.com/quickcalcs/PValue1.cfm
[Figure: scatterplot of C3 vs C1 (axes C1, 0 to 20; C3, -5 to 25), repeated across several slides]
Comparison of coin hypotheses revisited
L(H0) = 7.889 × 10^-31; ln(L(H0)) = -69.31
L(H1) = 7.616 × 10^-29; ln(L(H1)) = -64.74
2∆ = 2[ln(L(H1)) – ln(L(H0))] = 2 x [-64.74 – -69.31] = 2 x [-64.74 + 69.31]
= 9.14
H0 and H1 differ by 1 free parameter (the probability of Heads),
so the degrees of freedom d = 1.
We compare 2∆ = 9.14 with the χ²_1 distribution, and observe a P-value < 0.005
— we conclude that H1 (probability of Heads not necessarily equal to 0.5)
is preferred to H0 (fair coin).
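The same test in code, with the P-value taken from SciPy's upper-tail (survival) function; a sketch assuming SciPy is available:

```python
import math
from scipy.stats import chi2

ln_L0 = 65 * math.log(0.5) + 35 * math.log(0.5)    # ~-69.31
ln_L1 = 65 * math.log(0.65) + 35 * math.log(0.35)  # ~-64.74

two_delta = 2 * (ln_L1 - ln_L0)
p_value = chi2.sf(two_delta, df=1)  # upper-tail probability under H0

print(round(two_delta, 2))  # 9.14
print(p_value < 0.005)      # True: reject the fair-coin hypothesis
```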
Fair dice?
Suppose you roll a die 100 times, and observe the following:
score:   1   2   3   4   5   6
# obs:  15  14  20  13  18  20
L(H0) = (1/6)^15 × (1/6)^14 × (1/6)^20 × (1/6)^13 × (1/6)^18 × (1/6)^20 = 1.531 × 10^-78
L(H1) = (15/100)^15 × (14/100)^14 × (20/100)^20 × (13/100)^13 × (18/100)^18 × (20/100)^20 = 6.376 × 10^-78
2∆ = 2(ln(L(H1)) – ln(L(H0))) = 2 × (-177.75 + 179.18) ≈ 2.85
Comparing this with a χ²_5 distribution, we find the P-value is between 0.7 and 0.8
No statistical evidence to reject H0 in favour of H1
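The dice test follows the same recipe (SciPy assumed for the χ²_5 tail probability):

```python
import math
from scipy.stats import chi2

obs = [15, 14, 20, 13, 18, 20]  # counts for scores 1..6
n = sum(obs)                    # 100 rolls

ln_L0 = sum(c * math.log(1 / 6) for c in obs)  # H0: fair die
ln_L1 = sum(c * math.log(c / n) for c in obs)  # H1: probabilities at their MLEs

two_delta = 2 * (ln_L1 - ln_L0)
p_value = chi2.sf(two_delta, df=5)  # 6 categories, so 5 free parameters

print(round(two_delta, 2))  # 2.85
print(0.7 < p_value < 0.8)  # True: no evidence against H0
```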
Some useful hypothesis tests in phylogenetics:
Comparison of patterns of DNA substitution (I)
e.g.
Jukes-Cantor model vs. Kimura 2-parameter model
J-C: no rate parameters (same rate for all substitutions)
K2P: 1 rate parameter (ratio of transition:transversion rates)
H0: unknown tree relating the sequences; J-C model of substitutions
(parameters: tree shape, branch lengths)
H1: unknown tree relating the sequences; K2P model of substitutions
(parameters: tree shape, branch lengths, ts:tv rate ratio)
The difference in parameters is just the transition:transversion rate ratio, a single
number. Fixing it equal to 1 in K2P gives us the J-C model back. So the models
are nested.
We can perform a hypothesis test between the models by comparing 2∆ with a χ²_1 distribution.
Some useful hypothesis tests in phylogenetics:
Comparison of patterns of DNA substitution (II)
e.g.
Kimura 2-parameter model vs. Hasegawa, Kishino, Yano model
K2P: 1 rate parameter (ratio of transition:transversion rates)
HKY: 4 rate parameters (ratio of transition:transversion rates
AND 3 base frequencies free to vary)
H0: unknown tree relating the sequences; K2P model of substitutions
(parameters: tree shape, branch lengths, ts:tv rate ratio)
H1: unknown tree relating the sequences; HKY model of substitutions
(parameters: tree shape, branch lengths, ts:tv rate ratio, 3 base frequencies)
The difference in parameters is just the 3 base frequencies. Fixing them equal to 1/4
in HKY gives us back the K2P model. So the models are nested.
We can perform a hypothesis test between the models by comparing 2∆ with a χ²_3 distribution.
Some useful hypothesis tests in phylogenetics:
Comparison of patterns of DNA substitution (III)
e.g.
Hasegawa, Kishino, Yano model vs. general time reversible (GTR) model
HKY: 4 parameters (ratio of transition:transversion rates
AND 3 base frequencies free to vary)
GTR: 8 parameters (5 relative rates of change AND 3 base frequencies free to vary)
H0: unknown tree relating the sequences; HKY model of substitutions
(parameters: tree shape, branch lengths, ts:tv rate ratio, 3 base frequencies)
H1: unknown tree relating the sequences; GTR model of substitutions
(parameters: tree shape, branch lengths, 5 relative rates of substitution, 3 base frequencies)
The difference in parameters is the 4 relative rates of change. Fixing them in
appropriate ratios in GTR gives us back HKY. So the models are nested.
We can perform a hypothesis test between the models by comparing 2∆ with a χ²_4 distribution.
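A small helper makes the degrees-of-freedom bookkeeping for this model hierarchy explicit. The parameter counts are those given in the slides; the log-likelihood values in the usage example are purely illustrative, and SciPy is assumed:

```python
from scipy.stats import chi2

# Free substitution-model parameters beyond tree shape and branch lengths,
# as counted in the slides
PARAMS = {"JC": 0, "K2P": 1, "HKY": 4, "GTR": 8}

def lrt_p_value(ln_L0, ln_L1, simpler, richer):
    """P-value for an LRT between two nested models from this hierarchy."""
    d = PARAMS[richer] - PARAMS[simpler]
    two_delta = 2 * (ln_L1 - ln_L0)
    return chi2.sf(two_delta, df=d)

# Hypothetical log-likelihoods: is HKY significantly better than K2P?
p = lrt_p_value(-1234.5, -1230.0, "K2P", "HKY")
print(p < 0.05)  # 2∆ = 9.0 on 3 degrees of freedom
```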
• Studying the models themselves helps us
learn about processes of evolution
• Choosing a good model will also help us make reliable
estimates of phylogenetic trees
• Likelihoods give an intuitive measure of how well models
fit the data we have observed
• so comparing likelihoods is a good way to compare different models
• Likelihood ratio tests (LRTs) enable us to perform robust
statistical tests of which models are better
• test statistic is 2∆ = 2 x log of ratio of likelihoods
• test distribution is a χ2 distribution
• degrees of freedom for the test depend on the difference in
the number of parameters estimated in the models compared
Hypothesis testing in phylogenetics
Further topics: Models that are not nested
We can use the Akaike Information Criterion (AIC):
AIC(model) = 2k – 2[ln(L(model))]
and the model with smallest AIC value is considered to be best
k is the number of parameters estimated in the model
AICc(model) = 2k – 2[ln(L(model))] + 2k(k + 1)/(n – k – 1)
(n is the sample size)
(correction for small sample size or many parameters — good!*)
BIC(model) = k ln(n) – 2[ln(L(model))]
(Bayesian approach; greater penalty for more parameters — bad?*)
*see Burnham & Anderson (2002) Model Selection and Multi-Model Inference. Springer, New York (http://tinyurl.com/3ef8sn7)
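These criteria are simple to compute once the maximized log-likelihoods are known; a sketch in which the log-likelihood, k and n values are illustrative only:

```python
import math

def aic(ln_L, k):
    return 2 * k - 2 * ln_L

def aicc(ln_L, k, n):
    # small-sample correction (Burnham & Anderson 2002)
    return aic(ln_L, k) + 2 * k * (k + 1) / (n - k - 1)

def bic(ln_L, k, n):
    return k * math.log(n) - 2 * ln_L

# Illustrative comparison of two non-nested models on n = 100 sites:
# model A: ln L = -64.74 with k = 1; model B: ln L = -63.90 with k = 3
print(aic(-64.74, 1) < aic(-63.90, 3))  # True: A has the smaller AIC, so A is preferred
```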