1. Correct answer D
L’Abbé plots present dichotomous outcomes in graph form. The size of each circle corresponds to the sample size, ie the bigger the circle the larger the study. The middle of the circle is the point estimate, so EER - CER for study A is 0.8 - 0.4 = 0.4 and NNT = 1/0.4 = 2.5

2. Correct answer E
The circumference is not a confidence interval. It doesn’t make any difference if it crosses the line.

3. Correct answer E
The x axis of a funnel plot shows effect size; the y axis shows a proxy of precision, which could be any of the other answers, with the scale oriented to show increasing precision.

4. Correct answer A
The two tests used for analysing funnel plots are Egger’s and Begg’s tests, with the fail-safe N sometimes getting a mention. Egger’s and Begg’s are similar to tests of heterogeneity in that they tell you how likely it is that the diagram represents a random sample; the implication being that if it doesn’t, publication bias has occurred. Egger’s test is not very good. With low numbers of studies it’s undersensitive; at high numbers it’s oversensitive. Fail-safe N tells you how many negative studies would have to have been unpublished to invalidate the result; it’s intuitively attractive but a bit fraught and statistically unreliable.

5. Correct answer B
A fiendish question to illustrate just how liable to bias funnel plots are. The plot on the left ‘shows’ that negative studies are missing; the plot on the right ‘shows’ that positive findings are missing. As the results directly contradict each other they could not be taken to suggest publication bias. Given the marked disparity between sample size and effect size it would be very likely that clinical (selection bias) or methodological (parallel group vs crossover studies) heterogeneity is manifesting as statistical heterogeneity. There is a brilliant account of the problems of funnel plots in ‘The case of the misleading funnel plot’ Lau J et al; BMJ; 2006; 333; 597-600

6.
Correct answer C
It’s just Cohen’s d again (mean difference/standard deviation for each study) to allow reduction to a common outcome and combination of studies. But isn’t doctor Hart’s merry mood e’er more sportive and gamey as we approach the end.

7. Correct answer C
The line of no effect for absolute measures such as mean difference is 0. The line of no effect for relative measures such as RR and OR is 1. There are only two numbers in critical appraisal, one and zero; zero means there’s no difference in absolute terms; one means there’s no difference in relative terms.

8. Correct answer A
Because it’s been given the highest weight in the combined analysis. Weighting reflects various aspects of research probity, such as reporting of randomisation method and allocation concealment, blinding and follow up, as well as the likely precision of the effect as reflected in the sample size and the standard error.

9. Correct answer D
ie an effect size of 1

10. Correct answer E
You can only test for statistical heterogeneity, ie the probability that the spread of results is consistent with a random sample of studies all measuring the same effect. If statistical heterogeneity is present it may simply be a false positive. On the other hand it could indicate underlying clinical or methodological heterogeneity, with the implication that there is no single underlying effect and that the statistical combination of the studies is invalid. There’s no such thing as behavioural heterogeneity but that doesn’t mean we shouldn’t celebrate our differences.

11. Correct answer C
Measurement of heterogeneity is an attempt to gauge the extent to which the observed spread of results is consistent with chance, as opposed to the possibility that the studies are spread out because they’re derived from different populations showing different effects with different treatments measured in different ways. Low heterogeneity means that the observed variation is consistent with chance and it’s valid to combine the studies.
High heterogeneity means that the spread is larger than you’d expect by chance, that one or more intervening variables (clinical, methodological) are playing a part, and that statistical combination may not be valid.

There are 4 ways of testing for heterogeneity:

o Plumbline, ie if a single vertical line cuts all the confidence intervals, heterogeneity is less likely (I have no idea how valid this is but, as with all tests of heterogeneity, it’s probably insensitive with low numbers and oversensitive with high numbers)

o χ² (aka Cochran’s Q). The output is a χ² statistic and then a p value. If the p value is less than 0.10 heterogeneity is likely, ie the spread of results is less than 10% likely to have occurred by chance. If the p value is higher then heterogeneity is less likely. This test is unsatisfactory in all sorts of ways. Firstly, you’re interested in the degree to which the result is influenced by heterogeneity (some sort of effect size), not an arbitrary cut off. Secondly, the test itself is insensitive with meta-analyses of probably fewer than 10 and definitely fewer than 5 studies, so as to be completely invalid in the sort of 2 study alleged meta-analyses that populate the psych literature; this is why a p value of 0.10 rather than 0.05 is adopted, to make the test more sensitive. Thirdly, if you have lots of studies the test is oversensitive and will pick up tiny and completely irrelevant degrees of heterogeneity.

o I² is a relative newcomer that, although based on Cochran’s Q, gives a much more helpful and understandable output. Basically < 25% means that there’s not much influence on the result due to heterogeneity; 25% to 50% a bit, 50% to 75% loads, and > 75% truckloads. Cf ‘Measuring inconsistency in meta-analyses’ Higgins JPT et al; BMJ; 2003; 327; 557-560 for a full explanation

o Comparison of fixed and random effect models.
Lastly, a very good rule of thumb is that if the fixed effect analysis (which assumes no heterogeneity) and the random effect analysis (which assumes that there is heterogeneity) give the same result with the same CI then heterogeneity is excluded. If a result is significant on fixed effect and non-significant on random effect you assume heterogeneity.

As an aside I would have to say that the vast majority of so called meta-analyses are a waste of time because they include too few studies and purport to investigate effects that are liable to vary enormously with changes in population, both in terms of the subjects and the people administering the intervention. If it’s ludicrous to suggest, for example, that you could have a single unitary effect size for an intervention like hypnotherapy, I don’t see why you can’t say the same for any other psychological intervention. Rather than delivering some sort of unitary, universal but completely invalid effect size, most meta-analyses are interesting, if at all, in that they reveal heterogeneity and prompt you to look for the cause rather than explaining it away as experimental noise.

Have some references if you like….
‘Why sources of heterogeneity in meta-analyses should be investigated’ Thompson, S.G.; BMJ; 1994; 309: 1351-1355
‘Systematic reviews: meta-analysis and its problems’ Eysenck, H.J.; BMJ; 1994; 309: 789-792
…..but the best appraisal of a meta-analysis that I’ve ever seen said something along the lines of ‘this study combines some grapes, 2 oranges and a banana and comes up with fruit salad’, which probably applies to the majority.

12. Correct answer D

13. Correct answer A
A Kaplan Meier curve is a way of representing time to outcome that makes best use of all the information you have. Imagine you have 3 months to recruit patients and 9 months to follow them up after that.
That means that some patients will be recruited on day 1 but some might be recruited at the end of the three months, such that the follow up time will range from 12 months down to 9 months. It is also the case that some patients will go away, withdraw from follow up or otherwise leave the study without experiencing the endpoint. Because of all of the above, rather than having follow up data for 20 patients for 12 months, what you really have is a range of follow up periods. In the initial stages of the study you’ll have information for everyone but as time goes on people drop out. After that time they become censored values, and the fact that they’ve left the study, or were recruited later and were still unrelapsed at the end of the study, is noted by the little mark on the curve. In effect it means that the further to the right you go on a Kaplan Meier curve the more imprecise the estimate, because it’s based on fewer and fewer people owing to a combination of subjects reaching the endpoint and censored individuals who are no longer available for follow up.

Kaplan Meier curves assume that censored individuals dropped out at random and are essentially the same as included subjects. Obviously this is a major assumption, and the appraisal of survival curves rests on whether or not there has been selective attrition.

Curves can be compared via the log rank test (which is a non-parametric test with a bewildering array of alternative names). The test assumes proportional hazards, ie that the ratio of the hazard functions (the rates at which individuals experience the endpoint) for the curves stays constant over time. The rule of thumb is that if the lines cross the assumption is violated.

Curves can be further compared via Cox proportional hazards regression, which looks at the relative rates of relapse (output: hazard ratio) with or without controlling for confounders.

14. Correct answer C
The point at which the cumulative survival function reaches 0.5.
ie draw a line from the 0.5 on the y axis over to the curve, drop a perpendicular to the x axis and you’re there. NB this is not the point where 10 patients (ie 50% of the original sample) will have had a documented relapse.

15. Correct answer D
The probability of surviving for 200 days is the probability of surviving to 100 days multiplied by the probability of surviving from day 100 to day 200. Therefore the probability of surviving from day 100 to day 200 = the probability of surviving from time zero to 200 days divided by the probability of surviving from time zero to 100 days = 0.7/0.9 = 0.778 (This means that the answer given for Spring 2004 Question B2(b)(iii) in the college book is wrong)

Survival analysis is either not covered in critical appraisal books or done very badly with loads of mistakes. In fact if you’re thinking of buying a textbook it’s a good thing to look up to see how well it’s explained. However, the best explanation I found was in this article via Google: ‘A primer on survival analysis’ IH Kahn, GJ Prescott.
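
If you want to check the arithmetic in answers 1, 11 and 15 for yourself, a few lines of Python will do it. This is only a rough sketch under stated assumptions: the function names are my own, the Q value and number of studies passed to `i_squared` are made up purely for illustration, and the I² formula is the standard (Q - df)/Q definition from the Higgins paper cited in answer 11 rather than anything taken from the questions themselves.

```python
def nnt(eer: float, cer: float) -> float:
    """Number needed to treat: 1 / absolute difference in event rates."""
    difference = abs(eer - cer)
    if difference == 0:
        raise ValueError("No difference in event rates, so NNT is undefined")
    return 1.0 / difference


def i_squared(q: float, n_studies: int) -> float:
    """I-squared (%) from Cochran's Q, with df = number of studies - 1.

    Negative values are truncated to 0, as in the usual definition.
    """
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def conditional_survival(s_early: float, s_late: float) -> float:
    """Probability of surviving to the later time, given survival to the earlier one.

    Both arguments are cumulative survival probabilities read off the curve.
    """
    if not 0 < s_early <= 1:
        raise ValueError("Earlier survival must be in (0, 1]")
    if not 0 <= s_late <= s_early:
        raise ValueError("Later survival must be between 0 and the earlier value")
    return s_late / s_early


if __name__ == "__main__":
    # Answer 1: study A, EER 0.8 and CER 0.4
    print(round(nnt(0.8, 0.4), 2))                    # 2.5
    # Answer 11: hypothetical Q = 20 from 11 studies (illustrative numbers only)
    print(i_squared(20.0, 11))                        # 50.0
    # Answer 15: 0.7 / 0.9 = 0.7777..., which rounds to 0.778
    print(round(conditional_survival(0.9, 0.7), 3))   # 0.778
```

The conditional survival function is the same multiplication rule that the Kaplan Meier method applies interval by interval, just rearranged to solve for one interval.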