reporting ‘almost significant’ results: follow-up to PRIMENT stats seminar (27 August 2013)
stats methodologists meeting, 10 September 2013

plan
• brief summary of Priment seminar
  – deconstructing the phrase “…trend towards statistical significance”
• investigating how p-values move
• what p-values tell us
• how best to report near-significant results?

The trap of trends to statistical significance: how likely it really is that a near significant P value becomes more significant with extra data
John Wood, Nick Freemantle, Michael King, Irwin Nazareth

“…a trend towards statistical significance…”
• …is a very popular way of reporting ‘nonsignificant’ results where the p-values weren’t ‘too far’ above some threshold (usually p=0.05)
• (e.g.) “…there was a trend toward a lower risk of any treatment failure … (hazard ratio ... 0.86; 95% CI, 0.73 to 1.01; P = 0.06)”
• is this a reasonable use of words?
• does it make sense to call it a ‘trend’?

‘trends’ imply movement
• we’ve collected data comparing 2 treatments and found the 2-sided p-value (2p) to be just above 0.05 (say)
• if this is a ‘trend towards significance’ then the following should be true:
  – running the experiment longer (k% more data)…
  – then p-value ‘should’ drop (get more significant)
• what are the chances?

(aside) how we might calculate that
• current data {xi}, all ~ N(μ, 1), are 100 (pairs of) observations; each contributes an estimate of the treatment effect
• the overall current estimate x̄ ~ N(μ, 0.01) is greater than 0 with 2-sided significance 2p
  – can express x̄ in terms of p: x̄ = 0.1 × Φ⁻¹(1 − p)
• our current knowledge about μ is reasonably represented by the likelihood, so (loosely) μ ~ N(x̄, 0.01)
• now add in an extra k pairs of observations (k% more data), which will have a mean of ȳ:
  – ȳ | μ ~ N(μ, 1/k)
  – so, averaging over μ, ȳ ~ N(x̄, 0.01 + 1/k)
• significance is unchanged if:
  – (updated mean)/(updated SEM) = (old mean)/(old SEM)
  – [(100·x̄ + k·ȳ)/(100 + k)] × √(100 + k) = 10·x̄
• we have the distribution of ȳ, so can calculate the chance of significance moving ‘backwards’

what is likely to happen if we add 20% more data…

extra data,      current        current        prob. p-value gets    odds (x:1)
% of current     1-tailed       2-tailed       less significant      against
(k)              p-value (p)    p-value (2p)   with more data        that
20               0.005          0.01           0.29                  2.4
20               0.025          0.05           0.34                  2.0
20               0.03           0.06           0.34                  1.9
20               0.04           0.08           0.35                  1.8
20               0.05           0.10           0.36                  1.8
20               0.075          0.15           0.38                  1.6

summary
• a p-value ‘on the brink’ would be quite likely to move the ‘wrong’ way if we were able to add more data
• therefore, talking of ‘trends to significance’ gives a misleading impression
• p-values have much more variability associated with them than we’d like to think (and not just when H0 is true)

investigating how p-values move
• [figure: simple comparative trial; effect size = 0.3; up to n = 250 per group]

what question do p-values answer?
• not: “are the effects of A and B different?” (with “no” as a possible answer)
• but: “can we be confident of the direction from A to B: is it ‘up’, ‘down’ or ‘uncertain’?...”
• …the follow-up question is about ‘how much’
• J. W. Tukey (1991). The Philosophy of Multiple Comparisons. Statistical Science 6, 100-116

how should you report near-significant results?
• not as ‘trends towards significance’
• but this is certainly not an argument for ignoring ‘interesting hints’ (Tukey again)
• so, a word like ‘hint’ perhaps, and always with the CI
• views?
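
(aside) reproducing the calculation
The probability worked out in the earlier aside (the chance that a p-value gets less significant after adding k extra pairs to the current 100) can be sketched in a few lines of Python. This is only a minimal sketch under the same assumptions as the aside: unit-variance observations and μ loosely treated as N(x̄, 1/n). The function name `prob_less_significant` and its arguments are illustrative choices, not something from the seminar; it uses only the standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def prob_less_significant(p_one_tailed, k, n=100):
    """Chance that the p-value gets less significant after adding k pairs.

    Assumes (as in the aside) n current pairs with unit variance, and
    current knowledge about mu loosely represented as N(x_bar, 1/n).
    """
    # Current estimate expressed through the 1-tailed p-value:
    # x_bar / (1/sqrt(n)) = inverse-Phi(1 - p)
    x_bar = STD_NORMAL.inv_cdf(1 - p_one_tailed) / sqrt(n)

    # Mean of the extra data at which the z-statistic is exactly unchanged:
    # (n*x_bar + k*y) / sqrt(n + k) = sqrt(n) * x_bar, solved for y
    y_unchanged = x_bar * (sqrt(n) * sqrt(n + k) - n) / k

    # Predictive distribution of the new mean: y_bar ~ N(x_bar, 1/n + 1/k)
    sd_y = sqrt(1 / n + 1 / k)

    # Significance moves 'backwards' when y_bar falls below y_unchanged
    return STD_NORMAL.cdf((y_unchanged - x_bar) / sd_y)


if __name__ == "__main__":
    # Same rows as the 20%-more-data table: k = 20 extra pairs on n = 100
    for p in (0.005, 0.025, 0.03, 0.04, 0.05, 0.075):
        prob = prob_less_significant(p, k=20)
        odds_against = (1 - prob) / prob
        print(f"p={p:<6} 2p={2 * p:<5.2f} prob={prob:.2f} odds={odds_against:.1f}")
```

Run as a script, this should agree with the table for 20% more data to within rounding; changing `k` shows how the chance of moving ‘backwards’ depends on how much extra data is added.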