Reporting systematic reviews and meta-analyses (PRISMA) and observational studies (STROBE)
Doug Altman
The EQUATOR Network
Centre for Statistics in Medicine, Oxford, UK
October 2012

Systematic review
A systematic review is a scientific investigation that focuses on a specific question and uses explicit, prespecified scientific methods to identify, select, assess, and summarise the findings of similar but separate studies.
– A study of studies
Objective is to summarise evidence from multiple studies using explicit methods
It may include a quantitative synthesis (meta-analysis), depending on the available data

Key characteristics of SR
Focused, well-defined research question
Clearly stated title and objectives
Comprehensive strategy for identification of all relevant studies (published & unpublished)
Explicit (and justified) predefined inclusion & exclusion criteria
Critical appraisal of studies
Clear analysis of the results of eligible studies
– Quantitative (meta-analysis)
– Qualitative
Structured report

The QUOROM Statement [Moher et al 1999]
Guidance on what information should be included in published reports of meta-analyses of randomized trials
Checklist of items which should be reported
Also recommended a flow diagram showing flow of studies through the review
– to be included in the published report
Evidence-based, whenever possible
QUOROM developed in 1996 and published in 1999
Moher et al. Improving the quality of reporting of meta-analyses of randomised controlled trials: the QUOROM statement.
Lancet 1999;354:1896-1900

Checklist of items

Also recommended a flow diagram showing flow of studies through the review

PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
Update of QUOROM
Developed in 2005, published in 2009
Consists of a 27-item checklist and a flow diagram
Covers reporting of systematic reviews and meta-analyses that evaluate healthcare interventions
Includes a long explanatory document

Flow diagram
Identification
– # of records identified through database searching
– # of additional records identified through other sources
Screening
– Total # of duplicates removed
– # of records screened
– # of records excluded
Eligibility
– Total # of articles assessed for eligibility
– # of articles excluded, with reasons
Included
– Total # of studies included in qualitative synthesis of systematic review
– # of studies included in quantitative synthesis of systematic review

Reporting vs conduct: study methods
METHODS – each aspect of the methods
Conduct: done well; done poorly; not done
Reporting: fully reported (= reproducible); ambiguously or incompletely reported; not reported

PRISMA Practical

Two reviews in one article; focus on progesterone

For each item …
Is there text relating to the item?
Does the text tell us what we need to know?

PRISMA: Item 6, eligibility criteria
METHODS: Eligibility criteria
Specify study characteristics used as criteria for eligibility, giving rationale
– PICOS (participants, interventions, comparisons, outcomes and study design)
– Length of follow-up
Specify report characteristics used as criteria for eligibility, giving rationale
– Years considered
– Languages
– Publication status
Can you locate any text about this issue in the report?
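
One way to see why Item 6 asks for explicit criteria: once study and report characteristics are written down, selection can be repeated and every exclusion comes with a reason. The sketch below is illustrative only. The class names, field names, and example values are hypothetical and are not taken from PRISMA or from any review discussed here.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Eligibility:
    """Hypothetical, simplified eligibility criteria for a review."""
    designs: Set[str]                      # study characteristic: accepted designs
    min_followup_weeks: int                # study characteristic: length of follow-up
    first_year: int                        # report characteristic: years considered
    last_year: int
    languages: Optional[Set[str]] = None   # None means all languages are included


@dataclass
class Report:
    """Hypothetical summary of one candidate report."""
    design: str
    followup_weeks: int
    year: int
    language: str


def exclusion_reasons(report: Report, rule: Eligibility) -> List[str]:
    """Return the criteria a report fails; an empty list means it is eligible."""
    reasons = []
    if report.design not in rule.designs:
        reasons.append("study design")
    if report.followup_weeks < rule.min_followup_weeks:
        reasons.append("length of follow-up")
    if not rule.first_year <= report.year <= rule.last_year:
        reasons.append("year of publication")
    if rule.languages is not None and report.language not in rule.languages:
        reasons.append("language")
    return reasons


# Made-up values, for illustration only.
rule = Eligibility(
    designs={"randomised controlled trial"},
    min_followup_weeks=8,
    first_year=1990,
    last_year=2010,
)
candidate = Report(design="case series", followup_weeks=4, year=2001, language="French")
print(exclusion_reasons(candidate, rule))
```

A report that states its criteria at this level of detail lets a reader answer both practical questions above: there is text for the item, and the text says what we need to know.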
Wyatt et al
No explicit statement of study characteristics
“… clinical trials of progesterone and progestogens in the management of premenstrual syndrome.”
Dates given in search criteria
“All languages were included.”

PRISMA: Item 8, search
“Present full electronic search strategy for at least one database, including any limits used, such that it could be repeated”
Can you locate any text about this issue in the report?

Wyatt et al
Page 2, Methods, Trials
“MeSH terms used were premenstrual syndrome, progesterone, and progestogen, as well as the individual drug names, together with title and abstract searches for keywords progesterone, progestogen, premenstrual syndrome, premenstrual tension (PMT), late luteal phase dysphoric disorder (LLPDD), premenstrual dysphoria (PMD), and premenstrual dysphoric disorder (PMDD).”

PRISMA: Reporting search strategy
We realize that journal restrictions vary and that having the search strategy in the text of the report is not always feasible
– Expensive real estate
We strongly encourage all journals, however, to find ways, such as a “Web extra,” appendix, or electronic link to an archive, to make search strategies accessible to readers
We also advise all authors to archive their searches so that:
– others may access and review them (e.g., replicate them or understand why their review of a similar topic did not identify the same reports)
– future updates of their review are facilitated

PRISMA: Item 9, study selection
METHODS: Study selection
State the process for selecting studies
– Screening
– Eligibility
– Included in systematic review and, if applicable, included in meta-analysis
Can you locate any text about this issue in the report?

Wyatt et al
“We searched medical databases for reports of published clinical trials of progesterone and progestogens in the management of premenstrual syndrome.”
“References cited in all trials were searched iteratively to identify missing studies.
All languages were included.”
“We included trials that investigated the effect of progesterone or progestogens on premenstrual symptoms if they were randomised, placebo controlled, double blind studies that included patients with a pretreatment diagnosis of premenstrual syndrome, for which all data from the trials could be acquired.”

PRISMA: Item 11, data items
METHODS: Data items
List and define all variables for which data were sought (e.g., PICOS, funding sources) and any assumptions and simplifications made.
(PICOS = participants, interventions, comparisons, outcomes and study design)
Can you locate any text about this issue in the report?

Wyatt et al
“We collected data on the dosage and preparation of treatment. The main outcome measure was a reduction in overall symptoms of premenstrual syndrome. Combined or overall symptoms was chosen in an attempt to overcome the clinical heterogeneity associated with the measurement and scoring of symptoms used in individual trials.”

PRISMA: Item 12, risk of bias in individual studies
“Describe methods used for assessing risk of bias of individual studies (including specification of whether this was done at the study or outcome level), and how this information is to be used in any data synthesis”
Can you locate any text about this issue in the report?

Wyatt et al
Page 2, Methods, Quality assessment
“We assessed trial quality using a scale developed by Jadad et al [11], which assesses the randomisation, double blinding, reports of drop outs, and withdrawals for the trials ... our own quality scale, which assesses the quality of the trials for study design, reproducibility, and statistical analysis.
This eight point scale comprised the following: confirmation that no other medications or oral contraceptives were being taken; a power calculation to justify patient numbers or more than 65 participants in each arm (enabling detection of a small effect size of 0.3, see below); a single, clearly stated dose of drug; reproducible measurement of premenstrual symptoms; clear presentation of results; a description of the number and reason for trial withdrawals; exclusion of, or a separate analysis of, participants with a major psychiatric disorder; and whether or not the trial was supported by independent funding.”

Reporting risk of bias
“Authors should report how they assessed risk of bias; whether it was in a blind manner; and if assessments were completed by more than one person, and if so, whether they were completed independently. Similarly, we encourage authors to report any calibration exercises among review team members that were done. Finally, authors need to report how their assessments of risk of bias are used subsequently in the data synthesis (see Item 16).”

Wyatt et al
Page 2, Methods, Quality assessment
“We awarded one point for each category present in the trial. Each trial was independently scored by two investigators and the third investigator arbitrated on any disagreements. We used predetermined criteria for the recognition of the highest quality trials. A score of 3 or more was required in the Jadad score for the trial to be designated ‘high quality’ and included in the meta-analysis [11]; a score of less than 3 meant that the trial was designated ‘low quality.’”

PRISMA: Item 17, trial flow
RESULTS: Trial flow
Give numbers of studies screened, assessed for eligibility, and included in the review, with reasons for exclusions at each stage, ideally with a flow diagram.
Can you locate any text about this issue in the report?
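
The numbers Item 17 asks for must add up: what remains after exclusions at one stage should equal what enters the next. A minimal sketch of that consistency check follows. The `FlowStage` class, the `check_flow` helper, and all counts are hypothetical illustrations, not part of PRISMA and not data from any review discussed here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FlowStage:
    """One stage of a review flow diagram, with exclusions listed by reason."""
    name: str
    entered: int
    excluded: Dict[str, int] = field(default_factory=dict)

    @property
    def remaining(self) -> int:
        # Studies that pass this stage and move on to the next one.
        return self.entered - sum(self.excluded.values())


def check_flow(stages: List[FlowStage]) -> List[str]:
    """Return a list of inconsistencies; an empty list means the counts add up."""
    problems = []
    for stage in stages:
        if stage.remaining < 0:
            problems.append(f"{stage.name}: more studies excluded than entered")
    for prev, nxt in zip(stages, stages[1:]):
        if prev.remaining != nxt.entered:
            problems.append(
                f"{prev.name} leaves {prev.remaining} but {nxt.name} starts with {nxt.entered}"
            )
    return problems


# Made-up counts, for illustration only.
stages = [
    FlowStage("screening", 120, {"not relevant on title/abstract": 90}),
    FlowStage("eligibility", 30, {"not randomised": 12, "data not extractable": 6}),
    FlowStage("included", 12),
]
print(check_flow(stages))
```

Listing exclusions by reason at each stage is what lets a reader verify the totals, whether or not the report also draws the flow diagram.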
Page 2, Results
“We identified 14 published trials that assessed the efficacy of progesterone in the management of premenstrual syndrome. We excluded four: two because of their low quality score on the Jadad scale, one because the data could not be extracted, and one because the trial failed to make a prospective diagnosis of premenstrual syndrome before randomisation. Ten trials remained ........”
No flow diagram

Caughey AB, Sundaram V, Kaimal AJ, Gienger A, Cheng YW, McDonald KM, Shaffer BL, Owens DK, Bravata DM. Systematic Review: Elective Induction of Labor Versus Expectant Management of Pregnancy. Annals of Internal Medicine 2009;151:252-263

PRISMA: Item 18, study characteristics
RESULTS: Study characteristics
For each study, present characteristics for which data were extracted (e.g., study size, PICOS, follow-up period) and provide the citations.
Can you locate any text about this issue in the report?

Wyatt et al

Wyatt et al: Problems in Table 1
Crossover trials – not mentioned in text
Comparator not stated
Variation in outcome measures – how were they combined, or was one chosen?
Some statements disagree with Figure
No details of elements of “quality” scores
Unclear how the trial with 3 arms was handled

PRISMA: Item 15, risk of bias across studies
Specify any assessment of risk of bias that may affect the cumulative evidence (e.g., publication bias, selective reporting within studies).
Can you locate any text about this issue in the report?

Page 2, Methods, Statistical analysis
“We used the method of Egger et al to detect bias (such as publication and location bias) in the included trials with a funnel plot. We assessed the asymmetry of the funnel plot quantitatively by plotting a linear regression of the standard normal deviate (standardised mean difference divided by SE) against precision (inverse of SE).
A regression line that passes through the origin of the plot (within error limits) indicates symmetry and hence the absence of bias.”

PRISMA: Item 20, results of individual studies
RESULTS: Results of individual studies
For all outcomes considered (benefits or harms), present, for each study: (a) simple summary data for each intervention group, (b) effect estimates and confidence intervals, ideally with a forest plot.

Altman & Cates complained about absence of any numerical results (BMJ rapid response)
“There are several aspects of this review that readers cannot assess without summary data from each study. For example, we would wish
– to assess the strange heterogeneity P values for Figures 1 and 2 (the quoted P values of 0.999 are implausible given the clear graphical heterogeneity);
– to gain some insight into how the cross-over trials were included in the analysis (about which the authors say nothing at all) and whether the crossover and parallel group trials differed in their findings;
– to seek an explanation for the apparent discrepancy for three trials (references 19, 31 and 32) between the ‘reported results’ in Table 1 and the results shown in Figures 1 and 2;
– to assess the claim that random and fixed models give the same answer in the face of graphical heterogeneity;”
– … and 5 more points

Altman & Cates complained about the absence of any numerical results
“In addition, we note that the authors make no comment about the varied nature of the outcome measures in these trials, nor do they say which outcome was used for those trials that presented more than one. It is hard to believe that all of the scales can be considered equally valid assessments of symptoms. Also, we wonder if they can clarify the meaning in the figure legends of ‘standardised mean difference … for proportion of patients who showed improvement …’.
We are puzzled by this terminology as the SMD gives no direct information about proportions of patients improving.”

Authors’ reply
“We have found this personal attack unpleasant and upsetting and have to question the use of unsupported attacks in the Rapid Response forum.”
“We have considerable experience in PMS (clinically as well as though our research) and believe ourselves competent to judge the clinical appropriateness of combining trial data.”
The only question they addressed was one we had not asked!
A 2nd request for the data to be provided went unanswered!

Observational studies
Transparent reporting is particularly important for observational studies
– Vulnerable to bias and confounding
– Findings are often over-interpreted
– Findings often generate health scares

Scope of STROBE
Epidemiological research comprises several study designs and multiple topic areas
Initial restriction to three major areas
– cohort, case-control, and cross-sectional studies
Later extensions to other study designs
– STREGA for genetic association studies (published 2009)
– STROBE-ME for molecular epidemiology
– etc

Final STROBE checklist
TITLE and ABSTRACT
INTRODUCTION
– Background/rationale
– Objectives
METHODS
– Study design
– Setting
– Participants
– Variables
– Data sources/measurement
– Bias
– Study size
– Quantitative variables
– Statistical methods
RESULTS
– Participants
– Descriptive data
– Outcome data
– Main results
– Other analyses
DISCUSSION
– Key results
– Limitations
– Interpretation
– Generalisability
OTHER INFORMATION
– Funding
22 (34) items

Design-specific items
Participants
Statistical methods
Descriptive data
Outcome data

STROBE Statement
Guidance on how to report observational studies well
– Focus on 3 main study designs: cohort, case-control, cross-sectional studies
Published in Oct 2007: short paper plus Explanation and Elaboration (E&E) document
Adopted by many journals
Find it on:
www.equator-network.org
www.strobe-statement.org

Case-control studies
Patients with a certain outcome or disease and an
appropriate group of controls without the outcome or disease are selected
– (usually with careful consideration of choice of controls, possibly with matching)
Information is obtained on whether the participants have been exposed to the factor under investigation

STROBE exercise

STROBE Item 5. Setting
Describe the setting, locations, and relevant dates, including periods of recruitment, exposure, follow-up, and data collection
Can you locate any text about this issue in the report?

STROBE Item 5. Setting
Qiu et al
“This case-control study was conducted at the Materno Perinatal Institute of Lima and the Dos de Mayo Hospital in Lima, Peru, from May 2004 through October 2005. Both institutions are operated by the Peruvian government and are primarily responsible for providing maternity services to low income women residing in Lima.”

STROBE Item 6a. Participants
Give the eligibility criteria, and the sources and methods of case ascertainment and control selection. Give the rationale for the choice of cases and controls
Can you locate any text about this issue in the report?

STROBE Item 6a. Participants
Qiu et al
“Cases were selected from those women with a diagnosis of preeclampsia. Potential preeclampsia cases were identified by daily monitoring of all new admissions to antepartum wards, emergency room wards, and labor and delivery wards of the study hospitals. Study subjects were recruited during their hospital stay. Study personnel made periodic visits to each ward in a fixed order for the purposes of identifying potential cases and controls for the present study. Preeclampsia was defined by …”
“Controls were women with pregnancies uncomplicated by pregnancy-induced hypertension or proteinuria. Each day during the enrollment period, controls were numbered in the order in which they were admitted and identified. Subsequently, they were approached in the order in which research personnel identified them.”

STROBE Item 10.
Study size
Explain how the study size was arrived at
Can you locate any text about this issue in the report?

STROBE Item 10. Study size
Qiu et al
?

STROBE Item 13. Participants
(a) Report numbers of individuals at each stage of study, e.g. numbers potentially eligible, examined for eligibility, confirmed eligible, included in the study, completing follow-up, and analysed
(b) Give reasons for non-participation at each stage
(c) Consider use of a flow diagram

What we’d like to see:
What do they give us?

Qiu et al

STROBE Item 19. Limitations
Discuss limitations of the study, taking into account sources of potential bias or imprecision. Discuss both direction and magnitude of any potential bias
Can you locate any text about this issue in the report?

Qiu et al
“First, our analyses are based on cross-sectionally collected data, which may be subject to recall bias. There has been one longitudinal study of Finnish women [6]; however, more longitudinal studies are needed to re-examine the potential causal relation between maternal experience of depression and preeclampsia risk in different populations. Second, we used a depression screening instrument to categorize participants according to depression/depressive symptoms. Participants did not have formal diagnostic examinations. As a result, some misclassification is possible. ... In addition, our assessment of maternal depression and depressive symptoms was limited to the duration of the pregnancy.
Last, although we adjusted for multiple confounding factors, as with all observational studies, we cannot exclude the possibility of some residual confounding.”

Closing Comments on Checklists
They help AUTHORS ensure that they have addressed important issues in the report of their study
They help PEER REVIEWERS and EDITORS by reminding them what issues should be addressed
“Necessary but not sufficient!”