Systematic Review: Grading Strength of Evidence Kathleen N. Lohr, PhD Distinguished Fellow

Systematic Review:
Grading Strength of Evidence
Kathleen N. Lohr, PhD
Distinguished Fellow
RTI International
Grading Strength of Evidence
Distinct from rating the quality of individual
studies
Pertains to entire bodies of literature or
evidence
Done later in the process of producing review
Generally done only for
− Major outcomes (benefits and harms)
− Major comparisons, when relevant
Why Grade Strength of Evidence?
To facilitate use of systematic reviews by
diverse decisionmakers and stakeholders
To give decisionmakers:
Comprehensive evaluation of the evidence
Sense of how much confidence they can place in the
evidence
To foster transparency and documentation
Three Steps to Grading SOE
1. Scoring four required domains
   1. Risk of bias
   2. Consistency
   3. Directness
   4. Precision
2. Considering, and possibly scoring, four additional domains
   1. Dose-response association
   2. Plausible confounders
   3. Strength of association
   4. Publication bias
3. Combining scores from required domains into a single SOE score,
   taking scores on additional domains into account as needed
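The three steps can be pictured as one record per major outcome. The sketch below is only an illustration of that bookkeeping, not the EPC method itself: the class name `OutcomeGrading`, the field names, and the validation rules are assumptions made for this example, and the overall grade is entered by reviewers rather than computed. The example values mirror the first row of the results table shown later in these slides.

```python
from dataclasses import dataclass, field
from typing import Dict

# Domain and grade names come from the slides; the identifiers are illustrative.
REQUIRED_DOMAINS = ("risk_of_bias", "consistency", "directness", "precision")
ADDITIONAL_DOMAINS = (
    "dose_response_association",
    "plausible_confounders",
    "strength_of_association",
    "publication_bias",
)
SOE_GRADES = ("High", "Moderate", "Low", "Insufficient")


@dataclass
class OutcomeGrading:
    """Hypothetical record of the grading steps for one major outcome."""
    outcome: str
    required: Dict[str, str]                                    # step 1
    additional: Dict[str, str] = field(default_factory=dict)    # step 2
    soe_grade: str = ""                                         # step 3, set by reviewers

    def __post_init__(self):
        missing = [d for d in REQUIRED_DOMAINS if d not in self.required]
        if missing:
            raise ValueError(f"missing required domains: {missing}")
        unknown = [d for d in self.additional if d not in ADDITIONAL_DOMAINS]
        if unknown:
            raise ValueError(f"unknown additional domains: {unknown}")
        if self.soe_grade and self.soe_grade not in SOE_GRADES:
            raise ValueError(f"unknown SOE grade: {self.soe_grade}")

    def is_complete(self) -> bool:
        """True once reviewers have recorded an overall SOE grade."""
        return self.soe_grade in SOE_GRADES


severe_diarrhea = OutcomeGrading(
    outcome="Severe Diarrhea",
    required={
        "risk_of_bias": "RCT/Fair",
        "consistency": "Consistent",
        "directness": "Direct",
        "precision": "Imprecise",
    },
    soe_grade="Moderate",
)
```

Keeping the additional domains in a separate, optional mapping reflects that they are scored only when applicable.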
Four Required Domains: Risk of Bias
Concerns both study design and study conduct
for individual studies, rated by usual methods
Assesses the aggregate quality of studies
within each major study design and integrates
those assessments into an overall risk-of-bias
score
Four Required Domains: Consistency
Degree of similarity in the effect sizes of
different studies within an evidence base
Consistent: same direction of effect (same side of
“no effect”) and narrow range of effect sizes
Inconsistent: nonoverlapping confidence intervals,
significant unexplained clinical or statistical
heterogeneity, etc.
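Two of the consistency signals named above can be checked mechanically. The following is a rough sketch under stated assumptions: the function names are hypothetical, effects are assumed to share one scale, and `no_effect` is 0 for difference measures (it would be 1 for ratio measures). What counts as a "narrow" range remains a reviewer judgment, so the code reports the range and does not apply a threshold.

```python
def same_direction(effects, no_effect=0.0):
    """True when every estimate lies strictly on the same side of no effect."""
    if not effects:
        raise ValueError("need at least one effect estimate")
    return all(e > no_effect for e in effects) or all(e < no_effect for e in effects)


def effect_range(effects):
    """Width of the spread of point estimates across studies."""
    return max(effects) - min(effects)


def has_nonoverlapping_intervals(intervals):
    """True when at least one pair of (lower, upper) intervals does not overlap.

    On a line, some pair fails to overlap exactly when the highest lower
    bound exceeds the lowest upper bound.
    """
    highest_lower = max(lower for lower, _ in intervals)
    lowest_upper = min(upper for _, upper in intervals)
    return highest_lower > lowest_upper
```

These checks do not address unexplained clinical heterogeneity, which still requires judgment.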
Four Required Domains: Directness
Whether evidence reflects a single, direct link
between the interventions of interest and the
ultimate health outcome under consideration
or relies on multiple links
Using analytic frameworks is important
If multiple links are involved, SOE can be only
as strong as weakest link
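The weakest-link rule can be written as a minimum over an ordered scale. This is a minimal sketch, assuming each link in an analytic framework has already been given one of the four SOE grades and that "Insufficient" ranks below "Low"; the function name and the ordering list are illustrative, not part of the published method.

```python
# Assumed ordering from weakest to strongest.
GRADE_ORDER = ["Insufficient", "Low", "Moderate", "High"]


def max_soe_for_chain(link_grades):
    """Upper bound on SOE for a chain of links: the grade of the weakest link."""
    if not link_grades:
        raise ValueError("need at least one link")
    return min(link_grades, key=GRADE_ORDER.index)
```

For example, a chain graded "High", "Low", "Moderate" could be rated no higher than "Low" overall.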
Four Required Domains:
Aspects of Indirectness
Intermediate or surrogate outcomes instead of health
or patient-centered outcomes
− e.g., lab results or radiology findings vs. patient-reported functional
outcomes or death
Indirect comparisons rather than direct, head-to-head
comparisons
− Direct: e.g., A vs. B, A vs. C, and B vs. C
• Head-to-head studies in the evidence base
• Generally assumes use of health outcomes, not surrogate/proxy outcomes
• Better SOE
− Indirect (e.g., A vs. B, B vs. C, but not A vs. C):
• No head-to-head studies that cover all interventions or outcomes of
interest
• Problematic situation for all types of comparisons
• SOE not as strong as with direct evidence
Four Required Domains: Precision
Degree of certainty for estimate of effect with
respect to a specific outcome
Complicated concept
− Asks what decisionmakers can conclude about
whether one treatment is, clinically speaking,
inferior, superior, or equivalent (neither inferior nor
superior) to another
− Includes considerations of statistical significance for
effect estimates and confidence intervals for those
effect estimates
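One way to make the "inferior, superior, or equivalent" question concrete is to compare a confidence interval with a clinically important margin. The sketch below is an assumption-laden illustration, not the EPC scoring rule: it assumes the effect is a difference in which positive values favor the treatment, and `margin` is a hypothetical smallest clinically important difference chosen by the reviewers.

```python
def classify_interval(lower, upper, margin):
    """Classify a confidence interval against a symmetric clinical margin.

    Returns "superior" or "inferior" when the whole interval lies beyond the
    margin, "equivalent" when it lies entirely within it, and "inconclusive"
    when the interval is too wide to support any of those conclusions.
    """
    if margin <= 0 or lower > upper:
        raise ValueError("margin must be positive and lower <= upper")
    if lower > margin:
        return "superior"
    if upper < -margin:
        return "inferior"
    if -margin <= lower and upper <= margin:
        return "equivalent"
    return "inconclusive"
```

Under this reading, an estimate is precise when its interval supports one clinical conclusion and imprecise when the result is "inconclusive".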
Additional Domains
Four “discretionary” domains
− Dose-response association
− Plausible confounders
− Strength of association
− Publication bias
Use when they are
− Applicable
− Helpful in reaching conclusions about overall grades
for SOE
Procedures for Assessing Domains
Use two or more reviewers with the
appropriate clinical and methodological
expertise
Assess separately
− Each required domain (or each optional domain, as relevant)
− For each major outcome, including benefits and harms
Resolve differences by consensus or mediation
by an additional expert; consensus scores
appear in tables
Record and maintain records of each
reviewer's individual judgments about domains
as background documentation
Strength of Evidence Grades
Global assessment that
Takes the required domains directly into account
Incorporates judgments about the additional domains, as
needed
For both benefits and harms, focus on outcomes
most relevant to patients, clinicians, and
policymakers
Strength of Evidence Grades and Definitions
High: High confidence that the evidence reflects
the true effect. Further research is very unlikely to
change our confidence in the estimate of effect.
Moderate: Moderate confidence that the evidence
reflects the true effect. Further research may
change our confidence in the estimate of effect and
may change the estimate.
Low: Low confidence that the evidence reflects
the true effect. Further research is likely to change
the confidence in the estimate of effect and is likely
to change the estimate.
Insufficient: Evidence either is unavailable or
does not permit a conclusion.
Scoring and Reporting: General Guidance
Use different approaches to incorporate
multiple domains into an overall strength-of-evidence grade
− GRADE algorithm
− Weighting system of the Evidence-based Practice Center
− Some qualitative approach
Use (at least) two reviewers
Assess resulting interrater reliability for each
domain score, and keep records
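The slides do not name a reliability statistic. Cohen's kappa is one common choice for two reviewers assigning categorical scores, and the sketch below uses it as an assumption; the function name and the example ratings are illustrative only.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two reviewers' categorical scores on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the two reviewers agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each reviewer's category frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both reviewers used a single, identical category
    return (observed - expected) / (1 - expected)


# Illustrative consistency scores for four outcomes (made-up data).
reviewer_1 = ["Consistent", "Consistent", "Inconsistent", "Inconsistent"]
reviewer_2 = ["Consistent", "Inconsistent", "Inconsistent", "Inconsistent"]
kappa = cohen_kappa(reviewer_1, reviewer_2)
```

Computing the statistic separately for each domain matches the guidance to assess reliability per domain score.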
Other Grading Systems
GRADE working group
EPC and GRADE approaches are quite similar
EPC approach reflects particular needs for reviews done on a
wide variety of topics for AHRQ Stakeholders
Main differences
Domain definitions differ slightly (e.g., directness excludes
applicability, which is handled separately)
Initial grade for evidence about harms based on observational
studies can be “moderate”
Overall grade definition differs: EPCs emphasize confidence in
estimate; GRADE emphasizes effect of future research
EPC method permits three different ways to reach overall SOE
grade; the GRADE formula has one
Grading Strength of Evidence: An Approach to
Presenting Results — Moderate and High
Outcome | Number of Studies (Subjects) | Risk of Bias; Design/Quality | Consistency | Directness | Precision | Absolute Risk Difference per 100 Patients | SOE
Severe Diarrhea | 4 (256) | RCT/Fair | Consistent | Direct | Imprecise | 4 (95% CI –8 to +1) | Moderate SOE
Severe Diarrhea | 14 (28,400) | Cohort/Fair | Consistent | Direct | Precise | 5 (95% CI 8 to 2) | Moderate SOE
Improved Quality of Life | 6 (265) | RCTs/Good | Consistent | Direct | Precise | 5 (95% CI 1 to 7) | High SOE
Risk of bias, consistency, directness, and precision are the domains pertaining to strength of evidence; the last two columns give magnitude of effect and strength of evidence (SOE).
CI = confidence interval; RCT = randomized controlled trial
Summary: Grading Strength of Evidence
Is a critical last step in analysis and presentation
Is done after rating the quality of individual articles
and by at least two independent reviewers
Helps users of systematic reviews understand the
body of evidence and how much confidence they
can have in making decisions based on that
evidence
Uses scores on four primary (mandatory) domains
and four additional (discretionary) domains
Focuses on major outcomes and comparisons
Is denoted in terms of high, moderate, or low
strength or insufficient evidence
Presents SOE grades in tabular form
References
Owens DK, Lohr KN, Atkins D, et al. Grading the strength
of a body of evidence when comparing medical
interventions —Agency for Healthcare Research and
Quality and the Effective Health Care Program.
J Clin Epidemiol 2010;63:513-523.
Owens DK, Lohr KN, Atkins D, et al. Grading the strength
of a body of evidence when comparing medical
interventions. In: Agency for Healthcare Research and
Quality. Methods Guide for Comparative Effectiveness
Reviews [posted July 2009]. Rockville, MD. Available at:
http://effectivehealthcare.ahrq.gov/healthInfo.cfm?infotype=rr&ProcessID=60.