INTERPRETING GRAPHS AND DIAGRAMS

1. Correct answer D
L’Abbé plots display dichotomous outcomes in graph form. The size of each circle
corresponds to the sample size, ie the bigger the circle the bigger (and more
precise) the study.
The middle of the circle is the point estimate, so EER – CER for study A is 0.8 –
0.4 = 0.4 and NNT = 1/0.4 = 2.5
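The arithmetic above can be sketched in a few lines of Python. The function names are made up for illustration; only the 0.8 and 0.4 come from the question.

```python
def absolute_risk_difference(eer: float, cer: float) -> float:
    """Experimental event rate minus control event rate."""
    return eer - cer


def number_needed_to_treat(eer: float, cer: float) -> float:
    """NNT is the reciprocal of the absolute difference in event rates."""
    difference = absolute_risk_difference(eer, cer)
    if difference == 0:
        raise ValueError("No difference between groups, so NNT is undefined")
    return 1 / abs(difference)


# Study A: EER = 0.8, CER = 0.4
print(round(number_needed_to_treat(0.8, 0.4), 2))  # 2.5
```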
2. Correct answer E
The circumference of the circle is not a confidence interval, so it doesn’t make
any difference whether it crosses the line.
3. Correct answer E
The x axis of a funnel plot shows effect size, the y axis shows a proxy of precision
which could be any of the other answers, with the scale oriented to show
increasing precision.
4. Correct answer A
The two tests used for analysing funnel plots are Egger’s and Begg’s tests, with the
fail-safe N sometimes getting a mention.
Egger’s and Begg’s are similar to tests of heterogeneity in that they tell you how
likely it is that the diagram represents a random sample, the implication being that
if it doesn’t, publication bias has occurred. Egger’s test is not very good. With low
numbers of studies it’s undersensitive; at high numbers it’s oversensitive.
Fail-safe N tells you how many negative studies would have to have been
unpublished to invalidate the result; it’s intuitively attractive but a bit fraught and
unreliable statistically.
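For the curious, here is a minimal sketch of the regression behind Egger’s test: each study’s effect divided by its standard error is regressed on its precision (1/SE), and an intercept well away from zero suggests funnel plot asymmetry. The data are invented, the function name is hypothetical, and the significance test on the intercept is left out, so treat this as an illustration rather than a full implementation.

```python
def egger_intercept(effects, standard_errors):
    """Intercept from an ordinary least squares fit of effect/SE on 1/SE."""
    x = [1 / se for se in standard_errors]                    # precision
    y = [e / se for e, se in zip(effects, standard_errors)]   # standardised effect
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean


# Invented data: four studies of decreasing size (increasing SE)
ses = [0.1, 0.2, 0.4, 0.5]

# Every study finds the same effect: the intercept is close to 0
symmetric = [0.5, 0.5, 0.5, 0.5]
print(abs(egger_intercept(symmetric, ses)) < 1e-9)

# Smaller studies report bigger effects: the intercept moves away from 0
skewed = [0.2 + se for se in ses]
print(round(egger_intercept(skewed, ses), 3))
```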
5. Correct answer B
A fiendish question to illustrate just how liable to bias funnel plots are.
The plot on the left ‘shows’ that negative studies are missing, the plot on the right
‘shows’ positive findings are missing. As the results directly contradict each other
they could not be taken to suggest publication bias.
Given the marked disparity between sample size and effect size it would be very
likely that clinical (selection bias) or methodological (parallel group vs crossover
studies) heterogeneity is manifesting as statistical heterogeneity.
There is a brilliant account of the problems of funnel plots in
‘The case of the misleading funnel plot’ Lau J et al; BMJ; 2006; 333; 597-600
6. Correct answer C
It’s just Cohen’s d again (mean difference/standard deviation for each study) to
allow reduction to a common outcome and combination of studies
But isn’t doctor Hart’s merry mood e’er more sportive and gamey as we approach
the end.
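Going back to Cohen’s d for a moment, a minimal sketch of the calculation for one study follows. It assumes the standard deviation used is the pooled SD of the two groups (the answer above just says "standard deviation"), and the means, SDs and group sizes are invented.

```python
import math


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation of two groups (an assumption, see lead-in)."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(mean1, sd1, n1, mean2, sd2, n2) -> float:
    """Mean difference divided by the pooled standard deviation."""
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)


# Invented study: treatment mean 20, control mean 15, SD 10 and n = 30 in both arms
print(round(cohens_d(20, 10, 30, 15, 10, 30), 2))  # 0.5
```

Because d is unitless, studies that used different rating scales can be put on the same axis of a forest plot.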
7. Correct answer C
The line of no effect for absolute measures such as mean difference is 0
The line of no effect for relative measures such as RR and OR is 1
There are only two numbers in critical appraisal, one and zero; zero means there’s
no difference in absolute terms; one means there’s no difference in relative terms
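As a rough sketch of the one-and-zero rule, the snippet below checks whether a confidence interval includes the line of no effect. The function name and the example intervals are made up for illustration.

```python
def crosses_line_of_no_effect(ci_lower: float, ci_upper: float, measure: str) -> bool:
    """True if the confidence interval includes the value meaning 'no difference'.

    measure is "absolute" (eg mean difference, risk difference) or
    "relative" (eg RR, OR).
    """
    null_value = {"absolute": 0.0, "relative": 1.0}[measure]
    return ci_lower <= null_value <= ci_upper


# Invented intervals
print(crosses_line_of_no_effect(-0.3, 0.8, "absolute"))  # True: includes 0
print(crosses_line_of_no_effect(1.2, 2.5, "relative"))   # False: excludes 1
```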
8. Correct answer A
Because it’s been given the highest weight in the combined analysis. Weighting
reflects various aspects of research probity, such as reporting of randomisation
method and allocation concealment, blinding and follow up, as well as the likely
precision of the effect as reflected in the sample size and the standard error (SE).
9. Correct answer D
ie an effect size of 1
10. Correct answer E
You can only test for statistical heterogeneity, ie the probability that the spread of
results is consistent with a random sample of studies all measuring the same
effect.
If statistical heterogeneity is present it may be simply a false positive. On the
other hand it could indicate underlying clinical or methodological heterogeneity
with the implication that there is no single underlying effect and that the statistical
combination of the studies is invalid.
There’s no such thing as behavioural heterogeneity but that doesn’t mean we
shouldn’t celebrate our differences
11. Correct answer C
Measurement of heterogeneity is an attempt to gauge the extent to which the
observed spread of results is consistent with chance as opposed to the possibility
that the studies are spread out because they’re derived from different populations
showing different effects with different treatments measured in different ways.
Low heterogeneity means that the observed variation is consistent with chance
and it’s valid to combine the studies. High heterogeneity means that the spread is
larger than you’d expect and one or other intervening variables (clinical,
methodological) is playing a part and statistical combination may not be valid.
There are 4 ways of testing for heterogeneity
o Plumbline, ie if a single vertical line cuts all the confidence intervals
heterogeneity is less likely (I have no idea how valid this is but as with all tests
of heterogeneity it’s probably insensitive with low numbers and oversensitive
with high numbers)
o χ2 (aka Cochran’s Q). The output is a χ2 value and then a p value. If the p value is
less than 0.10 heterogeneity is likely, ie the spread of results is less than 10%
likely to have occurred by chance. If the p value is higher then heterogeneity
is less likely.
This test is unsatisfactory in all sorts of ways.

Firstly you’re interested in the degree to which the result is influenced by
heterogeneity (some sort of effect size) not an arbitrary cut off.

Secondly the test itself is insensitive with meta-analyses of probably fewer
than 10 and definitely fewer than 5 studies, so as to be completely invalid in
the sort of 2 study alleged meta-analyses that populate the psych literature.
This is why a p value of 0.10 is adopted, to make the test more sensitive.

Thirdly if you have lots of studies the test is oversensitive and will pick up
tiny and completely irrelevant degrees of heterogeneity.
o I2 is a relative newcomer that, although based on Cochran’s Q, gives a much
more helpful and understandable output. Basically < 25% means that there’s
not much influence on the result due to heterogeneity; 25% to 50% a bit, 50%
to 75% loads, and >75% truckloads. Cf ‘Measuring inconsistency in
meta-analyses’ Higgins JPT et al; BMJ; 2003; 327; 557-560 for a full explanation
o Comparison of Fixed and Random effect models. Lastly a very good rule
of thumb is that if the fixed effect analysis (which assumes no heterogeneity)
and the random effect analysis (which assumes that there is heterogeneity)
give the same result with the same CI then heterogeneity is excluded. If a
result is significant on fixed effect and non-significant on random effect you
assume heterogeneity.
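To make the last three tests less abstract, here is a minimal sketch of how Cochran’s Q, I2 and the fixed versus random effect comparison can be computed from each study’s effect and standard error. It assumes inverse-variance weighting and the DerSimonian-Laird estimate of between-study variance, neither of which is specified in the answer above, and it skips the p value for Q (that needs a χ2 distribution). The data and function names are invented.

```python
import math


def fixed_effect(effects, ses):
    """Inverse-variance pooled estimate and its standard error."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))


def cochran_q(effects, ses):
    """Weighted sum of squared deviations from the fixed effect estimate."""
    pooled, _ = fixed_effect(effects, ses)
    return sum((e - pooled) ** 2 / se ** 2 for e, se in zip(effects, ses))


def i_squared(effects, ses):
    """Percentage of variation attributed to heterogeneity, floored at 0."""
    q = cochran_q(effects, ses)
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def random_effects(effects, ses):
    """DerSimonian-Laird pooled estimate and its standard error (assumed method)."""
    w = [1 / se ** 2 for se in ses]
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (cochran_q(effects, ses) - df) / c)  # between-study variance
    w_star = [1 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star))


# Invented data: three equally precise studies with very different effects
effects = [0.1, 0.5, 0.9]
ses = [0.1, 0.1, 0.1]

print(round(cochran_q(effects, ses), 2), round(i_squared(effects, ses), 2))
print(fixed_effect(effects, ses))     # narrow standard error
print(random_effects(effects, ses))   # same centre, much wider standard error
```

With these invented numbers the two models agree on the pooled effect but the random effect standard error is about four times larger, which is exactly the sort of disagreement the rule of thumb is looking for.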
As an aside I would have to say that the vast majority of so called meta-analyses
are a waste of time because they include too few studies and purport to investigate
effects that are liable to vary enormously with changes in population both in terms
of the subjects and the people administering the intervention. If it’s ludicrous to
suggest, for example, that you could have a single unitary effect size for an
intervention like hypnotherapy I don’t see why you can’t say the same for any
other psychological intervention.
Rather than delivering some sort of unitary, universal but completely invalid
effect size most meta-analyses are interesting, if at all, in that they reveal
heterogeneity and prompt you to look for the cause rather than explaining it away
as experimental noise.
Have some references if you like….
‘Why sources of heterogeneity in meta-analysis should be investigated’
Thompson, S.G.; BMJ; 1994; 309: 1351-1355
‘Systematic reviews: meta-analysis and its problems’
Eysenck, H.J.; BMJ; 1994; 309: 789-792
…..but the best ever appraisal of a meta-analysis that I’ve ever seen said
something along the lines of ‘this study combines some grapes, 2 oranges and a
banana and comes up with fruit salad’ which probably applies to the majority.
12. Correct answer D
13. Correct answer A
A Kaplan-Meier curve is a way of representing time to outcome that makes best
use of all the information you have.
Imagine you have 3 months to recruit patients and 9 months to follow them up
after that. That means that some patients will be recruited on day 1 but some
patients might be recruited at the end of three months, such that follow-up time
will range from 12 months down to 9 months. It is also the case that some patients
will move away, withdraw from follow up or otherwise leave the study without
experiencing the endpoint.
Because of all of the above, rather than having follow up data for 20 patients
for 12 months what you really have is a range of follow up periods. In the
initial stages of the study you’ll have information for everyone but as time
goes on people drop out. After that time they become censored values and the
fact that they’ve left the study, or were recruited later and were still unrelapsed
at the end of the study, is noted by the little mark. In effect it means that the
further to the right you go on a Kaplan Meier curve the more imprecise the
estimate because it’s based on fewer and fewer people because of a
combination of subjects reaching endpoint and censored individuals who are
no longer available for follow up.
Kaplan Meier curves assume that censored individuals dropped out at random
and are essentially the same as included subjects. Obviously this is a major
assumption and the appraisal of survival curves rests on whether or not there
has been selective attrition
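A minimal sketch of the Kaplan-Meier calculation may help here. At each time an endpoint occurs, the running survival estimate is multiplied by the proportion of those still at risk who got through that time; censored patients simply leave the at-risk pool. The ten patients below are invented and the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each time where at least one endpoint occurs.

    times:  follow-up time for each patient
    events: True if the patient reached the endpoint, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        endpoints = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            endpoints += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if endpoints:
            survival *= (at_risk - endpoints) / at_risk
            curve.append((t, survival))
        # Censored patients and those with endpoints both leave the at-risk pool
        at_risk -= leaving
    return curve


# Invented data: 10 patients, 3 relapses (True), 7 censored (False)
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [True, False, False, False, False, False, True, True, False, False]

for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Note how the relapse at time 7 knocks a quarter off the estimate because only 4 patients are still at risk by then, which is the imprecision at the right-hand end of the curve described above.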
Curves can be compared via the log rank test (which is a non-parametric test with a
bewildering array of alternative names). The test assumes that the ratio of the
hazard functions (ie the rates at which individuals experience the endpoint) of the
curves is constant over time. The rule of thumb is that if the lines cross the
assumption is violated.
Curves can be further compared via Cox proportional hazards regression, which
looks at the relative rates of relapse (the output is a hazard ratio, read much
like an RR) +/- controlling for confounders.
14. Correct answer C
The point at which the cumulative survival function reaches 0.5, ie draw a line
from 0.5 on the y axis over to the curve, drop a perpendicular to the x axis and
you’re there.
NB this is not the point where 10 patients (ie 50% of the original sample) will
have had a documented relapse.
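Reading the median off a curve can be sketched as follows. The curve is a hypothetical list of (time, survival) steps for an invented sample of 10 patients in which only 3 have a documented relapse, the rest being censored; the function name is made up.

```python
def median_survival(curve):
    """First time at which the survival estimate is 0.5 or lower.

    curve is a list of (time, survival) steps in time order.
    Returns None if the curve never gets down to 0.5.
    """
    for time, survival in curve:
        if survival <= 0.5:
            return time
    return None


# Hypothetical Kaplan-Meier steps: 10 patients, relapses at times 1, 7 and 8
curve = [(1, 0.9), (7, 0.675), (8, 0.45)]
print(median_survival(curve))  # 8
```

In this invented example the median is reached after only 3 of the 10 patients have relapsed, which illustrates the NB above.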
15. Correct answer D
The probability of surviving for 200 days is the probability of surviving to 100
days multiplied by the probability of surviving from day 100 to day 200.
Therefore;
the probability of surviving from day 100 to day 200 =
the probability of surviving from time zero to 200 days divided by the
probability of surviving from time zero to 100 days
= 0.7/0.9
= 0.778
(This means that the answer given for Spring 2004 Question B2(b)(iii) in the
college book is wrong)
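The same division as a short sketch, with a made-up function name and the 0.7 and 0.9 read off the curve in the question:

```python
def conditional_survival(s_later: float, s_earlier: float) -> float:
    """Probability of surviving to the later time given survival to the earlier one."""
    return s_later / s_earlier


# S(200 days) = 0.7, S(100 days) = 0.9
print(round(conditional_survival(0.7, 0.9), 3))  # 0.778
```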
Survival analysis is either not covered in critical appraisal books or done very
badly with loads of mistakes. In fact if you’re thinking of buying a textbook
it’s a good thing to look up to see how well it’s explained.
However the best explanation I found was in this article via Google.
‘A primer on survival analysis’ IH Kahn, GJ Prescott.