Statistical Bite Size 10th September, 2013

reporting ‘almost significant’ results:
follow-up to PRIMENT stats seminar
(27 August 2013)
stats methodologists meeting
10 September 2013
plan
• brief summary of Priment seminar
– deconstructing the phrase “…trend towards
statistical significance”
• investigating how p-values move
• what p-values tell us
• how best to report near-significant results?
The trap of trends to statistical
significance: how likely it really is that a
near significant P value becomes more
significant with extra data
John Wood
Nick Freemantle
Michael King
Irwin Nazareth
“…a trend towards statistical
significance…"
• …is a very popular way of reporting ‘non-significant’ results where the p-values weren’t
‘too far’ above some threshold (usually p=0.05)
• (e.g.) “…there was a trend toward a lower risk of
any treatment failure … (hazard ratio ... 0.86;
95% CI, 0.73 to 1.01; P = 0.06)”
• is this a reasonable use of words?
• does it make sense to call it a ‘trend’?
‘trends’ imply movement
• we’ve collected data comparing 2 treatments and
found the 2-sided p-value (2p) to be just above
0.05 (say)
• if this is a ‘trend towards significance’ then the
following should be true:
– running the experiment longer (k% more data)…
– then p-value ‘should’ drop (get more significant)
• what are the chances?
(aside) how we might calculate that
• current data $\{x_i\}$, all $\sim N(\mu, 1)$, is 100 (pairs of) observations; each contributes an estimate of the treatment effect
• overall current estimate $\bar{x} \sim N(\mu, 0.01)$ is greater than 0 with 2-sided significance 2p
– can express $\bar{x}$ in terms of p: $\bar{x} = 0.1\,\Phi^{-1}(1-p)$
• our current knowledge about $\mu$ is reasonably represented by the likelihood, so (loosely) $\mu \sim N(\bar{x}, 0.01)$
• now add in an extra k pairs of observations (k% more data), which will have a mean of $\bar{y}$:
– $\bar{y} \mid \mu \sim N(\mu, 1/k)$
– $\bar{y} \sim N(\bar{x}, 0.01 + 1/k)$
• significance is unchanged if:
– (updated mean)/(updated SEM) = (old mean)/(old SEM)
– $\left[(100\bar{x} + k\bar{y})/(100+k)\right]\sqrt{100+k} = 10\bar{x}$
• we have the distribution of $\bar{y}$, so can calculate the chance of significance moving ‘backwards’
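(aside, continued) A sketch of the closing step, under the same normal approximation and assuming $\bar{x} > 0$. The symbols $c$ and $z_p$ are shorthand introduced here for illustration; they are not in the original slides.

```latex
\begin{align*}
\frac{100\bar{x} + k\bar{y}}{\sqrt{100+k}} = 10\bar{x}
  &\iff \bar{y} = c\,\bar{x},
  \qquad c = \frac{10\sqrt{100+k} - 100}{k} < 1 \\
\Pr(\text{less significant})
  &= \Pr(\bar{y} < c\,\bar{x})
   = \Phi\!\left(\frac{(c-1)\,\bar{x}}{\sqrt{0.01 + 1/k}}\right)
   = \Phi\!\left(-(1-c)\sqrt{\frac{k}{100+k}}\; z_p\right),
  \qquad z_p = \Phi^{-1}(1-p)
\end{align*}
```

For k = 20 this gives $c \approx 0.477$ and a multiplier of about 0.213, so a current 2p = 0.05 ($z_p \approx 1.96$) corresponds to $\Phi(-0.418) \approx 0.34$.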
what is likely to happen if we add 20% more data…

| extra data, % of current (k) | current 1-tailed p-value (p) | current 2-tailed p-value (2p) | prob p-value gets less significant with more data | odds (x:1) against that |
|---|---|---|---|---|
| 20 | 0.005 | 0.01 | 0.29 | 2.4 |
| 20 | 0.025 | 0.05 | 0.34 | 2.0 |
| 20 | 0.03 | 0.06 | 0.34 | 1.9 |
| 20 | 0.04 | 0.08 | 0.35 | 1.8 |
| 20 | 0.05 | 0.10 | 0.36 | 1.8 |
| 20 | 0.075 | 0.15 | 0.38 | 1.6 |
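The figures for 20% more data can be checked with a few lines of Python. This is a minimal sketch of the calculation in the aside, not the authors' own code: the function name `prob_less_significant` and its parameters are hypothetical, and it assumes unit-variance normal data with 100 current pairs, as in the aside.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, N(0, 1)


def prob_less_significant(p_one_tailed, k, n=100):
    """Chance that adding k pairs to n current pairs makes the p-value larger.

    Assumes unit-variance normal data and a current estimate above zero
    with one-tailed p-value p_one_tailed (hypothetical helper).
    """
    z = STD_NORMAL.inv_cdf(1 - p_one_tailed)  # current z-statistic
    xbar = z / sqrt(n)                        # current mean, SEM = 1/sqrt(n)
    # Mean of the new data at which the z-statistic is exactly unchanged:
    # (n*xbar + k*ybar) / sqrt(n + k) = sqrt(n) * xbar  =>  ybar = c * xbar
    c = (sqrt(n * (n + k)) - n) / k
    # Predictive distribution of the new mean: ybar ~ N(xbar, 1/n + 1/k)
    predictive = NormalDist(mu=xbar, sigma=sqrt(1 / n + 1 / k))
    return predictive.cdf(c * xbar)


for p in (0.005, 0.025, 0.03, 0.04, 0.05, 0.075):
    prob = prob_less_significant(p, k=20)
    odds_against = (1 - prob) / prob
    print(f"2p = {2 * p:.2f}: prob = {prob:.2f}, odds against = {odds_against:.1f}:1")
```

Changing `k` shows how the chance of moving ‘backwards’ depends on the amount of extra data.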
summary
• a p-value ‘on the brink’ would be quite likely to
move the ‘wrong’ way if we were able to add
more data
• therefore, talking of ‘trends to significance’ gives a
misleading impression
• p-values have much more variability associated
with them than we’d like to think (and not just
when H0 is true)
investigating how p-values move
simple comparative trial; effect size = 0.3; up to n = 250 per group
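One way to see how p-values move is to simulate such a trial and recompute the p-value as each pair of patients is added. The sketch below makes assumptions that the slide does not state: the effect size is read as a standardized difference in means, outcomes are unit-variance normal, and the test is a two-sided z-test with known variance. The function name `p_value_path` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def p_value_path(effect_size=0.3, max_n=250, seed=None):
    """Two-sided p-values after 1, 2, ..., max_n patients per group.

    Assumes unit-variance normal outcomes and a z-test with known
    variance (illustrative assumptions, not taken from the slides).
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    sum_a = sum_b = 0.0
    path = []
    for n in range(1, max_n + 1):
        sum_a += rng.gauss(effect_size, 1.0)  # treatment A outcome
        sum_b += rng.gauss(0.0, 1.0)          # treatment B outcome
        diff = (sum_a - sum_b) / n            # difference in group means
        z = diff / sqrt(2.0 / n)              # SE of the difference is sqrt(2/n)
        path.append(2 * (1 - std_normal.cdf(abs(z))))
    return path


path = p_value_path(seed=1)
# Count how often the running p-value crosses the 0.05 threshold
crossings = sum((a < 0.05) != (b < 0.05) for a, b in zip(path, path[1:]))
print(f"final p-value: {path[-1]:.4f}; crossings of 0.05: {crossings}")
```

Running it with several seeds gives a feel for how erratically a p-value can wander above and below 0.05 before it settles, even when the true effect is not zero.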
what question do p-values answer?
• not: “are the effects of A and B different?”
(with “no” as a possible answer)
• but “can we be confident of the direction from
A to B: is it ‘up’, ‘down’ or ‘uncertain’?...”
• …the follow-up question is about ‘how much’
• J. W. Tukey (1991). The philosophy of multiple
comparisons. Statistical Science 6, 100-116.
how should you report near-significant
results?
• not as ‘trends towards significance’
• but this is certainly not an argument for
ignoring ‘interesting hints’ (Tukey again)
• so, a word like ‘hint’ perhaps, and always with
the CI
• views?