Systematic Review: Grading Strength of Evidence
  Kathleen N. Lohr, PhD
  Distinguished Fellow
  RTI International

Grading Strength of Evidence
  Distinct from rating the quality of individual studies
  Pertains to entire bodies of literature or evidence
  Done later in the process of producing a review
  Generally done only for
    − Major outcomes (benefits and harms)
    − Major comparisons, when relevant

Why Grade Strength of Evidence?
  To facilitate use of systematic reviews by diverse decisionmakers and stakeholders
  To give decisionmakers:
    − Comprehensive evaluation of the evidence
    − Sense of how much confidence they can place in the evidence
  To foster transparency and documentation

Three Steps to Grading SOE
  1. Scoring four required domains
     1. Risk of bias
     2. Consistency
     3. Directness
     4. Precision
  2. Considering, possibly scoring, four additional domains
     1. Dose-response association
     2. Plausible confounders
     3. Strength of association
     4. Publication bias
  3. Combining scores from required domains into a single SOE score, taking scores on additional domains into account as needed

Four Required Domains: Risk of Bias
  Concerns both study design and study conduct for individual studies, rated by usual methods
  Assesses the aggregate quality of studies within each major study design and integrates those assessments into an overall risk-of-bias score

Four Required Domains: Consistency
  Degree of similarity in the effect sizes of different studies within an evidence base
  Consistent: same direction of effect (same side of “no effect”) and narrow range of effect sizes
  Inconsistent: nonoverlapping confidence intervals, significant unexplained clinical or statistical heterogeneity, etc.
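The two consistency signals named on the slide above (same direction of effect, overlapping confidence intervals) can be screened mechanically before reviewers make their judgment. The following is a minimal sketch, not the EPC procedure: the `Effect` type, the function names, and the choice of 0 as the "no effect" value (appropriate for a difference measure) are assumptions made for illustration, and the check ignores clinical and statistical heterogeneity entirely.

```python
from itertools import combinations
from typing import NamedTuple, Sequence


class Effect(NamedTuple):
    """One study's effect estimate with its confidence interval (hypothetical type)."""
    estimate: float
    ci_low: float
    ci_high: float


def same_direction(effects: Sequence[Effect], null: float = 0.0) -> bool:
    """True if every non-null estimate falls on the same side of 'no effect'."""
    sides = {e.estimate > null for e in effects if e.estimate != null}
    return len(sides) <= 1


def intervals_overlap(a: Effect, b: Effect) -> bool:
    """True if the two confidence intervals share at least one point."""
    return a.ci_low <= b.ci_high and b.ci_low <= a.ci_high


def looks_consistent(effects: Sequence[Effect], null: float = 0.0) -> bool:
    """Crude screen: same direction of effect and pairwise-overlapping intervals."""
    return same_direction(effects, null) and all(
        intervals_overlap(a, b) for a, b in combinations(effects, 2)
    )
```

A result of `False` only flags an evidence base for closer inspection; whether the range of effect sizes is "narrow" and whether heterogeneity is explained remain reviewer judgments.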
Four Required Domains: Directness
  Whether evidence reflects a single, direct link between the interventions of interest and the ultimate health outcome under consideration or relies on multiple links
  Using analytic frameworks is important
  If multiple links are involved, SOE can be only as strong as the weakest link

Four Required Domains: Aspects of Indirectness
  Intermediate or surrogate outcomes instead of health or patient-centered outcomes
    − e.g., lab results or radiology findings vs. patient-reported functional outcomes or death
  Indirect comparisons rather than direct, head-to-head comparisons
    − Direct: e.g., A vs. B, A vs. C, and B vs. C
      • Head-to-head studies in the evidence base
      • Generally assumes use of health outcomes, not surrogate/proxy outcomes
      • Better SOE
    − Indirect (e.g., A vs. B, B vs. C, but not A vs. C):
      • No head-to-head studies that cover all interventions or outcomes of interest
      • Problematic situation for all types of comparisons
      • SOE not as strong as with direct evidence

Four Required Domains: Precision
  Degree of certainty for estimate of effect with respect to a specific outcome
  Complicated concept
    − Asks what decisionmakers can conclude about whether one treatment is, clinically speaking, inferior, superior, or equivalent (neither inferior nor superior) to another
    − Includes considerations of statistical significance for effect estimates and confidence intervals for those effect estimates

Additional Domains
  Four “discretionary” domains
    − Dose-response association
    − Plausible confounders
    − Strength of association
    − Publication bias
  Use when they are
    − Applicable
    − Helpful in reaching conclusions about overall grades for SOE

Procedures for Assessing Domains
  Use two or more reviewers with the appropriate clinical and methodological expertise
  Assess separately
    − Each required domain (or each optional domain, as relevant)
    − For each major outcome, including benefits and harms
  Resolve differences by consensus or mediation by an additional expert; consensus scores appear in tables
  Record and maintain records of each reviewer's individual judgments about domains as background documentation

Strength of Evidence Grades
  Global assessment that
    − Takes the required domains directly into account
    − Incorporates judgments about the additional domains, as needed
  For both benefits and harms, focus on outcomes most relevant to patients, clinicians, and policymakers

Strength of Evidence Grades and Definitions
  High: High confidence that the evidence reflects the true effect. Further research is very unlikely to change our confidence in the estimate of effect.
  Moderate: Moderate confidence that the evidence reflects the true effect. Further research may change our confidence in the estimate of effect and may change the estimate.
  Low: Low confidence that the evidence reflects the true effect. Further research is likely to change the confidence in the estimate of effect and is likely to change the estimate.
  Insufficient: Evidence either is unavailable or does not permit a conclusion.
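The weakest-link rule from the directness slide can be combined with the four grades defined above: if the grades are treated as ordered, the ceiling for a chain of evidence is the lowest grade among its links. The sketch below assumes that ordering (with "insufficient" placed below "low", which the slides do not state explicitly); `Grade` and `chain_grade_ceiling` are hypothetical names, and the result is an upper bound for reviewers to consider, not a substitute for the global assessment.

```python
from enum import IntEnum
from typing import Iterable


class Grade(IntEnum):
    """SOE grades as an ordered scale (the ordering is an assumption of this sketch)."""
    INSUFFICIENT = 0
    LOW = 1
    MODERATE = 2
    HIGH = 3


def chain_grade_ceiling(link_grades: Iterable[Grade]) -> Grade:
    """Return the highest SOE a multi-link chain can support: its weakest link."""
    grades = list(link_grades)
    if not grades:
        # No evidence for any link: nothing supports a conclusion.
        return Grade.INSUFFICIENT
    return min(grades)


# Hypothetical chain: intervention -> surrogate outcome -> health outcome.
ceiling = chain_grade_ceiling([Grade.HIGH, Grade.LOW])
print(ceiling.name)
```

Using an ordered enumeration keeps the comparison explicit and makes it harder to mix grade labels with ad hoc numeric scores.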
Scoring and Reporting: General Guidance
  Use different approaches to incorporate multiple domains into an overall strength-of-evidence grade
    − GRADE algorithm
    − Weighting system of the Evidence-based Practice Center (EPC)
    − Some qualitative approach
  Use (at least) two reviewers
  Assess resulting interrater reliability for each domain score, and keep records

Other Grading Systems
  GRADE working group
  EPC and GRADE approaches are quite similar
  EPC approach reflects particular needs for reviews done on a wide variety of topics for AHRQ stakeholders
  Main differences
    − Domain definitions differ slightly (e.g., directness excludes applicability, which is handled separately)
    − Initial grade for evidence about harms based on observational studies can be “moderate”
    − Overall grade definition differs: EPCs emphasize confidence in estimate; GRADE emphasizes effect of future research
    − EPC method permits three different ways to reach overall SOE grade; the GRADE formula has one

Grading Strength of Evidence: An Approach to Presenting Results (Moderate and High)

                                               Domains Pertaining to Strength of Evidence
  Outcome                   Number of Studies  Risk of Bias;    Consistency  Directness  Precision  Magnitude of Effect   Strength of
                            (Subjects)         Design/Quality                                                             Evidence (SOE)
  Severe Diarrhea           4 (256)            RCT/Fair         Consistent   Direct      Imprecise  4 (95% CI –8 to +1)   Moderate
  Severe Diarrhea           14 (28,400)        Cohort/Fair      Consistent   Direct      Precise    5 (95% CI 8 to 2)     Moderate
  Improved Quality of Life  6 (265)            RCTs/Good        Consistent   Direct      Precise    5 (95% CI 1 to 7)     High

  For Severe Diarrhea, magnitude of effect is the absolute risk difference per 100 patients.
  CI = confidence interval; RCT = randomized controlled trial

Summary: Grading Strength of Evidence
  Is a critical last step in analysis and presentation
  Is done after the quality of articles has been rated, and by at least two independent reviewers
  Helps users of systematic reviews understand the body of evidence and how much confidence they can have in making decisions based on that evidence
  Uses scores on four primary (mandatory) domains and four additional (discretionary) domains
  Focuses on major outcomes and comparisons
  Is denoted in terms of high, moderate, or low strength or insufficient evidence
  Presents SOE grades in tabular form

References
  Owens DK, Lohr KN, Atkins D, et al. Grading the strength of a body of evidence when comparing medical interventions—Agency for Healthcare Research and Quality and the Effective Health Care Program. J Clin Epidemiol 2010;63:513-523.
  Owens DK, Lohr KN, Atkins D, et al. Grading the strength of a body of evidence when comparing medical interventions. In: Agency for Healthcare Research and Quality. Methods Guide for Comparative Effectiveness Reviews [posted July 2009]. Rockville, MD. Available at: http://effectivehealthcare.ahrq.gov/healthInfo.cfm?infotype=rr&ProcessID=60.