Systematic Review:
Grading Strength of Evidence
Kathleen N. Lohr, PhD
Distinguished Fellow
RTI International
Grading Strength of Evidence
Distinct from rating the quality of individual
studies
Pertains to entire bodies of literature or
evidence
Done later in the process of producing review
Generally done only for
Major outcomes (benefits and harms)
Major comparisons, when relevant
Why Grade Strength of Evidence?
To facilitate use of systematic reviews by
diverse decisionmakers and stakeholders
To give decisionmakers:
Comprehensive evaluation of the evidence
Sense of how much confidence they can place in the
evidence
To foster transparency and documentation
Three Steps to Grading SOE
1.Scoring four required domains
1. Risk of bias
2. Consistency
3. Directness
4. Precision
2.Considering, possibly scoring, four additional domains
1. Dose-response association
2. Plausible confounders
3. Strength of association
4. Publication bias
3.Combining scores from required domains into a single SOE score,
taking scores on additional domains into account as needed
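The third step can be sketched in code. This is only an illustration, not the EPC or GRADE procedure: the DomainScores record, the starting grade, and the rule of dropping one level per domain with a concern are all assumptions made for the example, loosely modeled on GRADE-style downgrading. "Insufficient" is treated as a separate judgment and is not produced by this rule.

```python
from dataclasses import dataclass

# Ordered weakest to strongest. "Insufficient" is left out on purpose:
# in this sketch it is a separate judgment, not the bottom of the scale.
GRADES = ["Low", "Moderate", "High"]


@dataclass
class DomainScores:
    """Hypothetical record of the four required domains for one outcome."""
    low_risk_of_bias: bool
    consistent: bool
    direct: bool
    precise: bool


def combine(scores, start="High"):
    """Assumed rule: start at `start` and drop one level per domain with a concern."""
    concerns = sum([
        not scores.low_risk_of_bias,
        not scores.consistent,
        not scores.direct,
        not scores.precise,
    ])
    level = max(GRADES.index(start) - concerns, 0)  # never go below "Low"
    return GRADES[level]


print(combine(DomainScores(True, True, True, False)))
```

Scores on the additional domains are not modeled here; in practice they would adjust the result as a matter of reviewer judgment.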
Four Required Domains: Risk of Bias
Concerns both study design and study conduct
for individual studies, rated by usual methods
Assesses the aggregate quality of studies
within each major study design and integrates
those assessments into an overall risk-of-bias
score
Four Required Domains: Consistency
Degree of similarity in the effect sizes of
different studies within an evidence base
Consistent: same direction of effect (same side of
“no effect”) and narrow range of effect sizes
Inconsistent: nonoverlapping confidence intervals,
significant unexplained clinical or statistical
heterogeneity, etc.
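The "same direction, narrow range" part of this definition can be sketched as a simple check. The function name, the no-effect value of 0 (appropriate for differences, not ratios), and the max_range cutoff are assumptions for illustration; the slides do not define what counts as "narrow", and heterogeneity testing is not covered here.

```python
def is_consistent(effects, no_effect=0.0, max_range=5.0):
    """Return True if all effect sizes lie on the same side of `no_effect`
    and their spread (max minus min) is at most `max_range` (assumed cutoff)."""
    if len(effects) < 2:
        raise ValueError("need at least two studies to judge consistency")
    same_direction = (
        all(e > no_effect for e in effects)
        or all(e < no_effect for e in effects)
    )
    narrow = max(effects) - min(effects) <= max_range
    return same_direction and narrow


print(is_consistent([2.0, 3.5, 4.0]))  # same side of zero, spread of 2.0
print(is_consistent([-1.0, 2.0]))      # effects on opposite sides of zero
```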
Four Required Domains: Directness
Whether evidence reflects a single, direct link
between the interventions of interest and the
ultimate health outcome under consideration
or relies on multiple links
Using analytic frameworks is important
If multiple links are involved, SOE can be only
as strong as weakest link
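The weakest-link rule amounts to taking the minimum grade across the links of an analytic framework. A minimal sketch, assuming each link has already been graded on the four-level scale used later in these slides (the function name and the numeric ordering are illustrative choices):

```python
# Assumed ordering of grades, weakest to strongest.
ORDER = {"Insufficient": 0, "Low": 1, "Moderate": 2, "High": 3}


def chain_strength(link_grades):
    """Return the weakest grade among the links in a chain of evidence."""
    return min(link_grades, key=ORDER.__getitem__)


# Example: intervention -> intermediate outcome -> health outcome
print(chain_strength(["High", "Low"]))
```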
Four Required Domains:
Aspects of Indirectness
Intermediate or surrogate outcomes instead of health
or patient-centered outcomes
− e.g., lab results or radiology findings vs. patient-reported functional
outcomes or death
Indirect comparisons rather than direct, head-to-head
comparisons
− Direct: e.g., A vs. B, A vs. C, and B vs. C
• Head-to-head studies in the evidence base
• Generally assumes use of health outcomes, not surrogate/proxy outcomes
• Better SOE
− Indirect (e.g., A vs. B, B vs. C, but not A vs. C):
• No head-to-head studies that cover all interventions or outcomes of
interest
• Problematic situation for all types of comparisons
• SOE not as strong as with direct evidence
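The direct/indirect distinction can be made concrete with a small lookup over the head-to-head comparisons present in an evidence base. This is a sketch only: the function name and the "none" label are assumptions, and it considers only indirect links through a single common comparator.

```python
def comparison_type(a, b, head_to_head):
    """Classify the evidence for `a` vs. `b` given the studied pairs.

    head_to_head: iterable of (x, y) pairs that have head-to-head studies.
    Returns "direct", "indirect" (via one common comparator), or "none".
    """
    pairs = {frozenset(p) for p in head_to_head}
    if frozenset((a, b)) in pairs:
        return "direct"
    interventions = set().union(*pairs) if pairs else set()
    for c in interventions - {a, b}:
        if frozenset((a, c)) in pairs and frozenset((b, c)) in pairs:
            return "indirect"
    return "none"


studies = [("A", "B"), ("B", "C")]
print(comparison_type("A", "B", studies))
print(comparison_type("A", "C", studies))
```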
Four Required Domains: Precision
Degree of certainty for estimate of effect with
respect to a specific outcome
Complicated concept
− Asks what decisionmakers can conclude about
whether one treatment is, clinically speaking,
inferior, superior, or equivalent (neither inferior nor
superior) to another
− Includes considerations of statistical significance for
effect estimates and confidence intervals for those
effect estimates
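One way to make the clinical question concrete is to compare a confidence interval with a threshold for clinical importance. The sketch below is an assumption-laden illustration, not the EPC definition: it supposes the effect is a difference where positive values favor the treatment, and that `mcid` (a smallest clinically important difference) has been chosen in advance.

```python
def clinical_conclusion(ci_low, ci_high, mcid):
    """Classify a confidence interval for a difference (treatment minus comparator).

    mcid: smallest difference regarded as clinically important (assumed, > 0).
    """
    if ci_low > ci_high or mcid <= 0:
        raise ValueError("expected ci_low <= ci_high and mcid > 0")
    if ci_low >= mcid:
        return "superior"      # whole interval beyond the threshold
    if ci_high <= -mcid:
        return "inferior"      # whole interval beyond the negative threshold
    if -mcid < ci_low and ci_high < mcid:
        return "equivalent"    # whole interval inside the threshold band
    return "imprecise"         # interval is compatible with more than one conclusion


print(clinical_conclusion(-3.0, 4.0, 1.0))
```

An interval that spans more than one of these regions does not support a single clinical conclusion, which is what an "imprecise" score signals in this sketch.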
Additional Domains
Four “discretionary” domains
− Dose-response association
− Plausible confounders
− Strength of association
− Publication bias
Use when they are
− Applicable
− Helpful in reaching conclusions about overall grades
for SOE
Procedures for Assessing Domains
Use two or more reviewers with the
appropriate clinical and methodological
expertise
Assess separately
− Each required domain (or each optional domain, as relevant)
− For each major outcome, including benefits and harms
Resolve differences by consensus or mediation
by an additional expert; consensus scores
appear in tables
Keep records of each
reviewer's individual judgments about domains
as background documentation
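A small helper can support this procedure by listing the outcome and domain pairs on which two reviewers disagree, so those items go to consensus or mediation while both original judgments are retained. The data layout (dictionaries keyed by outcome and domain) and the function name are assumptions for illustration.

```python
def find_disagreements(reviewer_a, reviewer_b):
    """Return sorted (outcome, domain) keys where the two reviewers' scores differ.

    Each argument maps (outcome, domain) -> score. A key scored by only one
    reviewer is also reported, since it still needs resolution.
    """
    keys = set(reviewer_a) | set(reviewer_b)
    return sorted(k for k in keys if reviewer_a.get(k) != reviewer_b.get(k))


a = {("Severe diarrhea", "Consistency"): "Consistent",
     ("Severe diarrhea", "Precision"): "Imprecise"}
b = {("Severe diarrhea", "Consistency"): "Consistent",
     ("Severe diarrhea", "Precision"): "Precise"}
print(find_disagreements(a, b))
```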
Strength of Evidence Grades
Global assessment that
Takes the required domains directly into account
Incorporates judgments about the additional domains, as
needed
For both benefits and harms, focus on outcomes
most relevant to patients, clinicians, and
policymakers
Strength of Evidence Grades and Definitions
High: High confidence that the evidence reflects
the true effect. Further research is very unlikely to
change our confidence in the estimate of effect.
Moderate: Moderate confidence that the evidence
reflects the true effect. Further research may
change our confidence in the estimate of effect and
may change the estimate.
Low: Low confidence that the evidence reflects
the true effect. Further research is likely to change
the confidence in the estimate of effect and is likely
to change the estimate.
Insufficient: Evidence either is unavailable or
does not permit a conclusion.
Scoring and Reporting: General Guidance
Use one of several approaches to incorporate
multiple domains into an overall strength-of-evidence grade
GRADE algorithm
Weighting system of the Evidence-based Practice
Center
A qualitative approach
Use (at least) two reviewers
Assess resulting interrater reliability for each
domain score, and keep records
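The slides do not name a reliability statistic. One common choice for two reviewers assigning categorical scores is Cohen's kappa, which corrects observed agreement for agreement expected by chance. A minimal sketch, with the function name and example data as assumptions:

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two reviewers scoring the same items on one domain."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(ratings_a)
    # Proportion of items on which the reviewers gave the same score.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each reviewer's category frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both reviewers used one identical category throughout
    return (observed - expected) / (1 - expected)


a = ["Consistent", "Consistent", "Inconsistent", "Inconsistent"]
b = ["Consistent", "Inconsistent", "Inconsistent", "Inconsistent"]
print(cohens_kappa(a, b))
```

Computing the statistic separately for each domain keeps the results aligned with the per-domain records described above.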
Other Grading Systems
GRADE working group
EPC and GRADE approaches are quite similar
EPC approach reflects particular needs for reviews done on a
wide variety of topics for AHRQ Stakeholders
Main differences
Domain definitions differ slightly (e.g., directness excludes
applicability, which is handled separately)
Initial grade for evidence about harms based on observational
studies can be “moderate”
Overall grade definition differs: EPCs emphasize confidence in
estimate; GRADE emphasizes effect of future research
EPC method permits three different ways to reach overall SOE
grade; the GRADE formula has one
Grading Strength of Evidence: An Approach to
Presenting Results — Moderate and High
Columns 3 to 6: Domains Pertaining to Strength of Evidence
Columns 7 and 8: Magnitude of Effect and Strength of Evidence (SOE)

Outcome                   Number of Studies  Risk of Bias;   Consistency  Directness  Precision  Absolute Risk Difference   SOE
                          (Subjects)         Design/Quality                                      per 100 Patients
Severe Diarrhea           4 (256)            RCT/Fair        Consistent   Direct      Imprecise  4 (95% CI –8 to +1)        Moderate
Severe Diarrhea           14 (28,400)        Cohort/Fair     Consistent   Direct      Precise    5 (95% CI 8 to 2)          Moderate
Improved Quality of Life  6 (265)            RCTs/Good       Consistent   Direct      Precise    5 (95% CI 1 to 7)          High

CI = confidence interval; RCT = randomized controlled trial
Summary: Grading Strength of Evidence
Is a critical last step in analysis and presentation
Is done after rating the quality of individual articles and by at least two
independent reviewers
Helps users of systematic reviews understand the
body of evidence and how much confidence they
can have in making decisions based on that
evidence
Uses scores on four primary (mandatory) domains
and four additional (discretionary) domains
Focuses on major outcomes and comparisons
Is denoted in terms of high, moderate, or low
strength or insufficient evidence
Presents SOE grades in tabular form
References
Owens DK, Lohr KN, Atkins D, et al. Grading the strength
of a body of evidence when comparing medical
interventions —Agency for Healthcare Research and
Quality and the Effective Health Care Program. J Clin
Epidemiol 2010;63:513-523.
Owens DK, Lohr KN, Atkins D, et al. Grading the strength
of a body of evidence when comparing medical
interventions. In: Agency for Healthcare Research and
Quality. Methods Guide for Comparative Effectiveness
Reviews [posted July 2009]. Rockville, MD. Available at:
http://effectivehealthcare.ahrq.gov/healthInfo.cfm?infotype=rr&ProcessID=60.