Lecture 9: p-value functions and intro to Bayesian thinking
Matthew Fox
Advanced Epidemiology

– If you are a hypothesis tester, which p-value is more likely to avoid a type I error: P = 0.049 or P = 0.005?
– What is the p-value fallacy?
– If you go to the doctor with a set of symptoms, does the doctor develop a hypothesis and test it?
– Has anyone heard of Bayesian statistics?
– After completing a study, would you rather know the probability of the data given the null hypothesis, or the probability of the null hypothesis given the data?

Last session
Randomization
– Leads to zero confounding on average, which gives meaning to p-values
– Provides a known distribution for all possible observed results
– Observational data do not have average-zero confounding
P-values
– The probability, under the null, that a test statistic would be greater than or equal to its observed value, assuming no bias
– Not the probability of the data, not the probability of the null, and not a significance level
Confidence intervals
– Calculated assuming infinite repetitions of the study
– Do not give a probability of containing the true value

Today
The p-value fallacy
P-value functions
– Show p-values for the data over a range of hypotheses
Bayesian statistics
– The difference between frequentists and Bayesians
– Bayesian theory
– How to apply Bayes' theorem in practice

P-value fallacy
The p-value was developed by R.A. Fisher as an informal measure of the compatibility of the data with the null
– It provides no guidance on significance
– It should be interpreted in light of what we already know
Hypothesis testing was developed by Jerzy Neyman and Egon Pearson (not Karl Pearson, his father) to minimize errors in the long run
– Under this framework, P = 0.04 is no more evidence than P = 0.00001; both simply lead to rejection at the chosen significance level
The fallacy is believing the p-value can do both jobs at once

Different goals
The goal of epidemiology:
– To measure precisely and accurately the effect of an exposure on a disease
The goal of policy:
– To make decisions
Given our goal:
– Why hypothesis testing?
– Why compare to the null?

P-value functions (1)
Recall that a p-value is:
– "The probability under the test hypothesis (usually the null) that a test statistic would be greater than or equal to its observed value, assuming no bias."
We can calculate p-values under test hypotheses other than the null
– Particularly easy if we use a normal approximation
– If we assume we can fix the margins with observational data

P-value functions (1): example data

              Exposed   Unexposed
Disease          6          3
No disease      14         17
Total           20         20
Risk             0.30       0.15

Observed RR = 2.00; SE(ln(RR)) = 0.63

P-value functions (2a)
To calculate a test statistic (Z score) for the usual null p-value:

z = \frac{\ln(RR)}{SE(\ln RR)} = \frac{\ln(2.0)}{0.63} \approx 1.1, \qquad P(|z| \ge 1.1) = 0.27

Note that \ln(1) = 0, so the null hypothesis drops out of the numerator.

P-value functions (2b)
For any test hypothesis RR_H:

z = \frac{\ln(RR_o) - \ln(RR_H)}{SE(\ln RR_o)}, \qquad SE(\ln RR) = \sqrt{\frac{c}{a\,N_E} + \frac{d}{b\,N_{\bar{E}}}}

p = 2\left(1 - \int_0^{|z|} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz\right)

where a, c, and N_E are the exposed cases, non-cases, and total, and b, d, and N_Ē are the unexposed cases, non-cases, and total.

P-value functions (3)
P-values for the example data under a range of RR hypotheses (the code sketch at the end of this section reproduces these values):

RR hypothesis       z-value    p-value
 0.10                4.737     0.0000
 0.20                3.641     0.0000
 0.33                2.849     0.0040
 0.50                2.192     0.0280
 1.00 (null)         1.096     0.2730
 2.00 (observed)     0.000     1.0000
 3.00               -0.641     0.5210
 5.00               -1.449     0.1470
10.00               -2.545     0.0110

P-value functions (4)
[Figure: two-sided p-value function for the example data, plotted against RR hypotheses from 0.1 to 10. The curve peaks at p = 1.0 at the point estimate (RR = 2.0); the null p-value is 0.27; the 95% confidence limits, where the curve crosses p = 0.05, are approximately 0.58 and 6.9.]

Case-control study of spermicides and Down syndrome
[Figure: p-value function for a case-control study of spermicides and Down syndrome.]
Interpretation?
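Before turning to Bayesian thinking, here is a minimal sketch in Python of the p-value function calculation for the example 2x2 table above (6/20 exposed cases vs. 3/20 unexposed cases). The counts, the SE formula, and the normal approximation are from the slides; the choice of language, the variable names, and the helper two_sided_p are my own illustrative assumptions.

```python
# Minimal sketch: p-value function for the example 2x2 table.
from math import log, sqrt, erfc

a, c, n_e = 6, 14, 20     # exposed: cases, non-cases, total
b, d, n_u = 3, 17, 20     # unexposed: cases, non-cases, total

rr_obs = (a / n_e) / (b / n_u)                    # observed risk ratio = 2.0
se_ln_rr = sqrt(c / (a * n_e) + d / (b * n_u))    # SE(ln RR), about 0.63

def two_sided_p(rr_hypothesis):
    """Two-sided p-value testing the data against an arbitrary RR hypothesis."""
    z = (log(rr_obs) - log(rr_hypothesis)) / se_ln_rr
    return erfc(abs(z) / sqrt(2))                 # equals 2 * (1 - Phi(|z|))

for rr_h in [0.1, 0.2, 0.33, 0.5, 1, 2, 3, 5, 10]:
    print(f"RR = {rr_h:5.2f}  p = {two_sided_p(rr_h):.3f}")
# RR = 1 gives p of about 0.27 (the usual null p-value); RR = 2 gives p = 1.0
```

Evaluating two_sided_p over a fine grid of RR hypotheses and plotting the result gives the p-value function shown in the figure above.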
Introduction to Bayesian thinking

What is the best estimate of the true effect of E on D?
Given a disease D and an exposure E+ vs. unexposed E-:
– The relative risk associating D with E+ (vs. E-) is 2.0, with 95% confidence interval 1.5 to 2.7
OK, but why did you say what you said?
– We have no other information to go on

What is the best estimate of the true effect of E on D?
Given a disease D and an exposure E+ vs. unexposed E-:
– The relative risk associating D with E+ (vs. E-) is 2.0, with 95% confidence interval 0.5 to 8.0
– Note that the width of the interval does not affect the best estimate in this case

What is the best estimate of the true effect of E on D?
Given a disease D (breast cancer) and an exposure E+ (ever-smoking) vs. unexposed E- (never-smoking):
– The relative risk associating D with E+ (vs. E-) is 2.0, with 95% confidence interval 1.0 to 4.0
– Most previous studies of smoking and breast cancer have shown no association

What is the best estimate of the true effect of E on D?
Given a disease D (lung cancer) and an exposure E+ (ever-smoking) vs. unexposed E- (never-smoking):
– The relative risk associating D with E+ (vs. E-) is 2.0, with 95% confidence interval 1.0 to 4.0
– Most previous studies of smoking and lung cancer have shown much larger effects

What is the best estimate of the true probability of heads?
Given 100 flips of a fair coin flipped in a fair way, with 40 heads observed:
– The estimated probability of heads is 0.40, with 95% confidence interval 0.304 to 0.496
– Given what we know about a fair coin, why should these data override what we know?
So why would we interpret our study data as if they existed in a vacuum?

The Monty Hall problem

An alternative to frequentist statistics
Something important about the data is not being captured by the p-value and confidence interval, or at least not by the way they are used
What is missing is a measure of the evidence provided by the data
– Evidence is a property of data that makes us alter our beliefs

Frequentist statistics fail as measures of evidence
The logical underpinning of frequentist statistics is that:
– "If an observation is rare under a hypothesis, then the observation can be used as evidence against the hypothesis."
But life is full of rare events that we accord little attention
– What makes us react is a plausible competing hypothesis under which the data are more probable

Frequentist statistics fail as measures of evidence
The null p-value provides no information about the probability of alternatives to the null
Measuring evidence requires three things:
– The observations (data)
– Two competing hypotheses (often the null and an alternative)
The p-value incorporates the data and only one hypothesis (usually the null)

The likelihood as a measure of evidence
Likelihood = c * Pr(data | H), where c is an arbitrary constant
The data are fixed and the hypotheses vary
– P-values are calculated with a fixed (null) hypothesis, treating the data as randomly variable
The evidence supporting one hypothesis versus another is the ratio of their likelihoods
– The log of the ratio is an additive measure of support

Evidence versus belief
The hypothesis with the higher likelihood is better supported by the evidence
– But that does not make it more likely to be true
– Belief also depends on prior knowledge, which can be incorporated with Bayes' theorem
– It is the likelihood ratio that represents the data
Priors can be subjective or empirical
– But they should not be arbitrary

Bayesian analysis (1)
Given (1) the prior odds that a hypothesis is true, and (2) data measuring the effect:
– Update the prior odds using the data to calculate the posterior odds that the hypothesis is true
– A formal algorithm for accomplishing what many of us do informally

Bayesian analysis (2)
Prior odds times the likelihood ratio equals the posterior odds:

\frac{p(H_1 \mid D)}{p(H_0 \mid D)} = \frac{p(H_1)}{p(H_0)} \times \frac{p(D \mid H_1)}{p(D \mid H_0)}

Only for people with an ignorant (uniform) prior distribution can we say that the frequentist 95% CI covers the true value with 95% certainty

Bayesian analysis (3): environmental tobacco smoke and breast cancer
H1 = [OR = 2]; H0 = [OR = 1]. Initially, the analyst has no preference between the hypotheses (prior odds = 1); each posterior becomes the prior for the next study. The code sketch after the table shows one way to reproduce this chain of updates.

Study      Observation      Likelihood ratio   Prior odds   Posterior odds
Sandler    1.6 (0.8-3.4)    1.78               1.0          1.78
Hirayama   1.3 (0.8-2.1)    0.38               1.8          0.7
Smith      1.6 (0.8-3.1)    2.12               0.7          1.4
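Here is a minimal sketch of that chain of updates. The slide gives the likelihood ratios directly; as an assumption on my part, the sketch recomputes them with a normal approximation on the log odds-ratio scale, using SE(ln OR) back-calculated from each 95% CI. The results (roughly 1.9, 0.4, and 2.0, with final posterior odds near 1.5) agree only approximately with the rounded values in the table.

```python
# Minimal sketch: prior odds x likelihood ratio = posterior odds, study by study.
# The normal-approximation likelihood and the CI-derived SEs are assumptions,
# not the lecture's exact inputs.
from math import log, exp

studies = [                 # (name, OR, lower CL, upper CL)
    ("Sandler",  1.6, 0.8, 3.4),
    ("Hirayama", 1.3, 0.8, 2.1),
    ("Smith",    1.6, 0.8, 3.1),
]

def likelihood_ratio(or_hat, lcl, ucl, h1=2.0, h0=1.0):
    """Pr(data | OR = h1) / Pr(data | OR = h0) under a normal approximation."""
    se = (log(ucl) - log(lcl)) / (2 * 1.96)        # SE(ln OR) from the 95% CI
    return exp((-(log(or_hat) - log(h1)) ** 2
                + (log(or_hat) - log(h0)) ** 2) / (2 * se ** 2))

odds = 1.0                                         # prior odds: no preference
for name, or_hat, lcl, ucl in studies:
    lr = likelihood_ratio(or_hat, lcl, ucl)
    odds *= lr                                     # posterior becomes the next prior
    print(f"{name}: LR = {lr:.2f}, posterior odds = {odds:.2f}")
```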
Bayesian analysis (4): concepts
Keep these concepts separate:
– The hypotheses under comparison (e.g., RR = 2 vs. RR = 1)
– The prior odds (>1 favors the first hypothesis (RR = 2); <1 favors the second hypothesis (RR = 1))
– The estimate of effect from a study (this is the data used to modify the prior odds)

Bayesian analysis (5): concepts
Keep these concepts separate:
– The likelihood ratio (the probability of the data under the first hypothesis versus under the second; >1 favors the first hypothesis, <1 favors the second)
– The posterior odds (compares the hypotheses after observing the data; >1 favors the first hypothesis, <1 favors the second)

Bayesian analysis: worked problem (Part V, Problem 3, 20 points total)
The following table shows odds ratios, 95% confidence intervals, and the standard error of the ln(OR) from three studies of the association between passive exposure to tobacco smoke and the occurrence of breast cancer. The fourth column shows the likelihood ratio, calculated as the likelihood under the hypothesis that the true relative risk equals 2.0 divided by the likelihood under the hypothesis that the true relative risk equals 1.0.

Study      Observation (95% CI)   SE(ln(OR))   Likelihood ratio   Prior odds   Posterior odds
Sandler    1.6 (0.8-3.4)          0.38         1.8                0.75         1.34
Hirayama   1.3 (0.8-2.1)          0.24         0.4                1.34         0.50
Smith      1.6 (0.8-3.1)          0.34         2.1                0.50         1.07

A. (7 points) Assume that someone favors the hypothesis that the true odds ratio equals 1.0 over the hypothesis that the true odds ratio equals 2.0. The person quantifies this preference by stating that their prior odds for the hypothesis of a relative risk of 2.0 versus the hypothesis of a relative risk of 1.0 are 0.75 (see the first row of the table). Complete the shaded cells of the table using Bayesian analysis. After seeing these three studies, should the person favor (circle one): the hypothesis of 2.0 or the hypothesis of 1.0?

http://statpages.org/bayes.html

Connection between p-values and the likelihood ratio

Bayesian intervals
Bayesian intervals require specification of prior odds for an entire distribution of hypotheses, not just two hypotheses
– Update the distribution with the data
– The prior distribution will look like a p-value function, but it incorporates only prior knowledge
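The figures that follow show full prior distributions over RR hypotheses being updated study by study. As an illustration only, here is one way such an update can be sketched on a discrete grid of hypotheses with normal likelihoods on the ln(RR) scale; the grid, the particular "skeptic" prior, and the interval calculation are my own assumptions rather than the lecture's exact method.

```python
# Illustrative sketch: updating a prior distribution over RR hypotheses.
import numpy as np

ln_rr_grid = np.linspace(np.log(0.1), np.log(10), 201)   # hypotheses from 0.1 to 10

# A "skeptic" prior centered on RR = 1; the SD of 0.3 on the ln(RR) scale
# is an arbitrary illustrative choice.
prior = np.exp(-0.5 * (ln_rr_grid / 0.3) ** 2)
prior /= prior.sum()

def update(dist, or_hat, se_ln_or):
    """Multiply the current distribution by a normal likelihood and renormalize."""
    like = np.exp(-0.5 * ((np.log(or_hat) - ln_rr_grid) / se_ln_or) ** 2)
    post = dist * like
    return post / post.sum()

posterior = prior
for or_hat, se in [(1.6, 0.38), (1.3, 0.24), (1.6, 0.34)]:  # Sandler, Hirayama, Smith
    posterior = update(posterior, or_hat, se)

# Read a central 95% posterior interval off the cumulative distribution.
cdf = np.cumsum(posterior)
lo = np.exp(ln_rr_grid[np.searchsorted(cdf, 0.025)])
hi = np.exp(ln_rr_grid[np.searchsorted(cdf, 0.975)])
print(f"posterior 95% interval for RR: {lo:.2f} to {hi:.2f}")
```

Swapping in a flat prior over the grid gives the "ignorant" analyst, whose posterior is proportional to the likelihood and whose interval approximately tracks the frequentist confidence limits, consistent with the point made above about uniform priors.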
Posterior distribution
– Choose interval limits from the posterior distribution

[Figures: probability density over RR hypotheses (0.1 to 10) for four priors, labeled advocate, skeptic, ignorant, and bimodal, followed by the updated distributions after sequentially adding the Sandler, Hirayama, Smith, Morabia, and Johnson studies.]

Conclusion
The p-value fallacy
– The p-value cannot serve both the long-run decision perspective and the individual-study perspective
P-value functions
– Help us see the p-values for an entire range of hypotheses, not just the null
Bayesian analysis
– Allows us to update our beliefs with new information and to measure the probability of a hypothesis