Grades


GRADE background

certainty in evidence (quality of evidence, confidence in
evidence)


evidence profiles
strength of recommendation

exercises in applying GRADE

experience participating in guideline panels?

clin epi methodology course?

is grading recommendations a good idea? If so,
why?

experience with grading
 systems used?

many available
  Australian National Health and Medical Research Council (NHMRC)
  Oxford Centre for Evidence-Based Medicine
  Scottish Intercollegiate Guidelines Network (SIGN)
  US Preventive Services Task Force
 American professional organizations
 AHA/ACC, ACCP, AAP, Endocrine society,
etc....

cause of confusion, dismay


GRADE (Grades of recommendation,
assessment, development and evaluation)
international group
  Australian NHMRC, SIGN, USPSTF, WHO, NICE,
Oxford CEBM, CDC, CC

~35 meetings over last 14 years
▪ (~10 to 70 attendees)

2004 BMJ, first description

2008 BMJ six part series
 for guideline users

2010-13, 21 part series, 15 published
 for systematic review authors, HTA practitioners,
guideline developers

interventions
 management strategy 1 versus 2

what GRADE is not about
 individual studies (body of evidence)

diagnostic accuracy questions
 in patients with a sore leg, what is the accuracy of
a blood test (D-Dimer) in sorting out whether a
deep venous thrombosis is the cause of the pain

prognosis

what it is about: diagnostic impact
 are patients better off (improved outcomes)
when doctors use the d-dimer test
[Figure: GRADE uptake, 2005 to 2011: 80+ organizations]

two components

certainty in estimate of effect adequate to
support decision (quality of body of evidence)
 high, moderate, low, very low
 likelihood of and confidence in an outcome

Quality
 Initial choice, defined as confidence
 natural to clinicians, but confusion with risk of bias

Confidence
 what we actually mean, but confusion with
confidence intervals, and experts always confident

Certainty
 avoids confusion of others, experts might
acknowledge uncertainty - Current preferred term

two components

certainty in evidence adequate to support
decision (quality of body of evidence)
 high, moderate, low, very low

strength of recommendation
 strong and weak
 weak alternatives
 conditional, contingent, discretionary
[Flow diagram: the GRADE process]

Health care question (PICO)
  systematic reviews of studies (S1 to S5)
  outcomes (OC1 to OC4), classified as critical or important

Generate an estimate of effect for each outcome

Rate the quality of evidence for each outcome, across studies
  RCTs start high, observational studies start low
  rate down (-): study limitations, imprecision, inconsistency of results, indirectness of evidence, publication bias likely
  rate up (+): large magnitude of effect, dose response, plausible confounders would ↓ effect when an effect is present or ↑ effect if effect is absent
  final rating of quality for each outcome: high, moderate, low, or very low

Rate overall quality of evidence
  lowest quality among critical outcomes

Decide on the direction (for/against) and grade strength (strong/weak*) of the recommendation, considering:
  quality of the evidence
  balance of desirable/undesirable outcomes
  values and preferences
  *also labeled “conditional” or “discretionary”

Decide if any revision of direction or strength is necessary, considering resource use

patients:
  Males over 50 presenting with fatigue, malaise and erectile
dysfunction with laboratory evidence of decreased
testosterone

intervention, testosterone

comparator no testosterone

outcomes?

Where to start RCTs and observational
studies (High, moderate, low, very low)?

Recall antioxidant vitamins
 Observational studies less cancer, CV outcomes
 RCTs no difference
 Result observed repeatedly
 What went wrong?

RCTs start high

observational studies start low

what can lower confidence?

risk of bias
inconsistency
indirectness
imprecision
publication bias
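
The starting points and the five reasons for rating down can be sketched as a small Python function. This is only an illustration, not an official GRADE tool: the name `rate_certainty` and the idea of summing levels are assumptions made for the example, and in practice each rating decision is a judgment rather than arithmetic.

```python
LEVELS = ["very low", "low", "moderate", "high"]

def rate_certainty(design, levels_down=0, levels_up=0):
    """Return a certainty rating for one outcome across studies.

    design: "rct" starts high, "observational" starts low.
    levels_down: total levels rated down (risk of bias, inconsistency,
        indirectness, imprecision, publication bias).
    levels_up: total levels rated up (large effect, dose response,
        plausible confounding working against the observed effect).
    """
    start = {"rct": 3, "observational": 1}[design]
    # Clamp so the rating never leaves the four-level scale.
    index = max(0, min(3, start - levels_down + levels_up))
    return LEVELS[index]

print(rate_certainty("rct"))                         # high
print(rate_certainty("rct", levels_down=1))          # moderate
print(rate_certainty("observational", levels_up=1))  # moderate
```

A body of RCT evidence rated down once, for example for imprecision, ends at moderate.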





what to consider?

well established





concealment
intention to treat principle observed
blinding
completeness of follow-up
more recent
 selective outcome reporting bias
 Stopping early for benefit

what to consider?

accurate assessment of exposure

adjusted analysis for all important prognostic
factors, accurately measured

accurate assessment of outcome

completeness of follow-up

6 studies, 100 patients each

3 studies low risk of bias, 3 high

rate down for risk of bias?

How did you decide?

Similarity of point estimates
 less similar, less happy

Overlap of confidence intervals
 less overlap, less happy
[Forest plot: RRR (95% CI), axis from -40 to 56]

Homogeneous
  test for heterogeneity: what is the p-value?
  what is the null hypothesis for the test for heterogeneity?
  Ho: RR1 = RR2 = RR3 = RR4
  p=0.99 for heterogeneity

Heterogeneous
  test for heterogeneity: what is the p-value?
  p-value for heterogeneity < 0.001
I² interpretation
  100%: why are we pooling?
  75%: very concerned
  50%: getting concerned
  25%: only a little concerned
  0%: no worries
Homogeneous
  What is the I²?
  p=0.99 for heterogeneity
  I²=0%

Heterogeneous
  What is the I²?
  p-value for heterogeneity < 0.001
  I²=89%
[Forest plots: relative risk with 95% CI for vitamin D, non-vertebral fractures: all doses; dose > 400; dose = 400]

  within-study comparison? No
  unlikely chance? Yes, p = 0.006
  consistent across studies? Yes
  one of small number of a priori hypotheses, with direction? Yes
  biologically compelling? Yes





shall we believe sub-group analysis?
  scale from 0 (no way) to 100 (sure thing)

populations
 older, sicker or more co-morbidity

interventions
 warfarin in trials vs clinical practice

outcomes
 important versus surrogate outcomes
 glucose control versus CV events
Directness
interested in A versus B
available data A vs C, B vs C
Alendronate
Risedronate
Placebo

small sample size
 small number of events

wide confidence intervals
 uncertainty about magnitude of effect


how do you decide what is too wide?
primary criterion:
 would decisions differ at ends of CI

atrial fib at risk of stroke

warfarin increases serious GI bleeding
  3% per year

1,000 patients, 1 fewer stroke
  30 more bleeds for each stroke prevented

1,000 patients, 100 fewer strokes
  3 strokes prevented for each bleed

where is your threshold?
  how many strokes must be prevented per 100 patients to accept 3% bleeding?
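
The trade-off arithmetic on this slide can be reproduced with a short Python sketch. The function name is hypothetical; the 3% per year bleeding risk and the two stroke scenarios are the figures given above.

```python
def bleeds_per_stroke_prevented(strokes_prevented_per_1000, bleed_risk=0.03):
    """Extra serious GI bleeds caused for each stroke prevented."""
    bleeds_per_1000 = bleed_risk * 1000
    return bleeds_per_1000 / strokes_prevented_per_1000

# 1 fewer stroke per 1,000 patients: 30 bleeds per stroke prevented.
print(round(bleeds_per_stroke_prevented(1)))

# 100 fewer strokes per 1,000 patients: invert the ratio to get
# strokes prevented per bleed (about 3).
print(round(1 / bleeds_per_stroke_prevented(100), 1))
```

Where the confidence interval spans both scenarios, the decision would differ at its two ends, which is the primary criterion for rating down for imprecision.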
[Figure: four panels, each with axis marks at 0 and 1.0%]

pts with threatened stroke

RCT of clopidogrel vs ASA
 19,185 patients

ischaemic stroke, MI, or vascular death
compared
 939 events (5·32%) clopidogrel
 1021 events (5·83%) with aspirin

RR 0.91 (95% CI 0.83 – 0.99) (p=0·043)

rate down for precision?

Clopidogrel or ASA for threatened
vascular events

RCT 19,185 patients
absolute difference 0.9% (1.7% to 0.1%)
[Figure: interval plotted on an axis marked at 0 and 1.0%]

RR 0.91 (95% CI 0.83 – 0.99)

small trials, large effect
 likely to be overestimate

analogy to stopping early

lack of prognostic balance

solution: optimal information size
 # of pts from conventional sample size calculation
 specify control group risk, α, β, Δ
Fluoroquinolone prophylaxis in neutropenia: infection-related mortality
  total number of events: 47
  sample size 1,002
  optimal information size: α 0.05, β 0.20, Δ 0.25 RRR, CER 7%, N = 6,000
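
A conventional two-proportion sample size calculation with these inputs lands close to the N = 6,000 quoted above. The sketch below uses the standard normal-approximation formula with unpooled variances; the slide does not say which formula was used, so treat this choice and the function name `optimal_information_size` as assumptions.

```python
from math import ceil
from statistics import NormalDist

def optimal_information_size(cer, rrr, alpha=0.05, beta=0.20):
    """Total patients (both arms) for a two-sided test of two proportions."""
    p1 = cer               # control group event risk (CER)
    p2 = cer * (1 - rrr)   # intervention risk implied by the RRR (Δ)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n_per_arm = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return 2 * ceil(n_per_arm)

# Inputs from the slide: α 0.05, β 0.20, Δ 0.25 RRR, CER 7%.
print(optimal_information_size(cer=0.07, rrr=0.25))
```

The result is a little under 6,000 patients, far more than the 1,002 actually enrolled, which is why the evidence falls short of the optimal information size.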

high likelihood could lower quality

when to suspect
 number of small studies
 industry sponsored


What do you do when certainty is high but there are no RCTs?
common criteria
 everyone used to do badly
 almost everyone does well
 quick action

insulin for diabetic ketoacidosis?

thyroxine for thyroid deficiency?

hydrocortisone for adrenal insufficiency?

childhood lymphoblastic leukemia

risk for CNS malignancies 15 years after cranial
irradiation



no radiation: 1% (95% CI 0% to 2.1%)
12 Gy: 1.6% (95% CI 0% to 3.4%)
18 Gy: 3.3% (95% CI 0.9% to 5.6%).
Certainty assessment criteria

What to do when certainty differs across
outcomes?

options
 ignore all but primary
 previous approach
 least certainty of any outcome
 some blended approach
 least certainty of critical outcomes
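
The last option, taking the least certainty among critical outcomes, can be sketched as follows. The outcome names and the critical/important labels in the example are hypothetical, as is the function name.

```python
ORDER = {"very low": 0, "low": 1, "moderate": 2, "high": 3}

def overall_certainty(outcomes):
    """outcomes: list of (name, certainty, is_critical) tuples."""
    critical = [rating for _, rating, is_critical in outcomes if is_critical]
    if not critical:
        raise ValueError("at least one critical outcome is required")
    # The overall rating is the lowest rating among critical outcomes.
    return min(critical, key=ORDER.__getitem__)

example = [
    ("mortality", "moderate", True),
    ("stroke", "high", True),
    ("nausea", "low", False),  # important but not critical: ignored
]
print(overall_certainty(example))  # moderate
```

An outcome that is important but not critical does not pull the overall rating down, however uncertain it is.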
Trading off desirable and undesirable

what do patients/clinicians need to know
 relative risk reduction?
 absolute risk difference?

Toxic treatment, 50% RRR mortality? OK?

1% to 1/2% OK?

40% to 20%, OK?

body of evidence
 how do we get risk difference?

meta-analysis get pooled relative risk

obtain baseline risk and multiply

BR 10%, RRR 50%, RD 5%

why not get risk difference directly?
[Figure: three panels with the same relative risk]
  RR 0.67, RD 10%
  RR 0.67, RD 3.3%
  RR 0.67, RD 1%
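
The "obtain baseline risk and multiply" step is RD = BR × (1 − RR). A minimal Python sketch follows. The baseline risks of 30%, 10% and 3% are not on the slide; they are assumed here because they approximately reproduce the three RD values shown for RR 0.67.

```python
def risk_difference(baseline_risk, relative_risk):
    """Absolute risk difference implied by a pooled relative risk."""
    return baseline_risk * (1 - relative_risk)

# From the slide: BR 10%, RRR 50% (RR 0.5) gives RD 5%.
print(f"{risk_difference(0.10, 0.50):.1%}")

# Assumed baseline risks (hypothetical) with the same RR of 0.67.
for br in (0.30, 0.10, 0.03):
    print(f"BR {br:.0%}: RD {risk_difference(br, 0.67):.1%}")
```

The same relative risk gives very different absolute effects across baseline risks, which is one reason to pool relative effects and then apply them to the baseline risk rather than pooling risk differences directly.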
High versus low PEEP in ALI and ARDS

Patients with ARDS
  no. of participants (trials) †: 1892 (3)
  higher PEEP: 324/951 (34.1%)
  lower PEEP: 368/941 (39.1%)
  adjusted relative risk (95% CI; P-value) ‡: 0.90 (0.81 to 1.00; 0.049)
  adjusted absolute risk difference (95% CI): -3.9% (-7.4% to -0.04%)
  quality: high

Patients without ARDS
  no. of participants (trials) †: 404 (3)
  higher PEEP: 50/184 (27.2%)
  lower PEEP: 41/220 (18.6%)
  adjusted relative risk (95% CI; P-value) ‡: 1.37 (0.98 to 1.92; 0.065)
  adjusted absolute risk difference (95% CI): 6.9% (-0.4% to 17.1%)
  quality: moderate (imprecision)

strong recommendation
 benefits clearly outweigh risks/hassle/cost
 risk/hassle/cost clearly outweighs benefit

what can downgrade strength?

low confidence in estimates

close balance between up and downsides

aspirin after myocardial infarction
 25% reduction in relative risk
 side effects minimal, cost minimal
 benefit obviously much greater than risk/cost

warfarin in low risk atrial fibrillation
 warfarin reduces stroke vs ASA by 50%
 but if risk only 1% per year, ARR 0.5%
 increased bleeds by 1% per year
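
The warfarin numbers above can be turned into numbers needed to treat and to harm. This is a sketch using only the figures on the slide; the function name is hypothetical.

```python
def number_needed(absolute_risk_change):
    """Patients treated for one year per one event prevented or caused."""
    return 1 / absolute_risk_change

baseline_stroke_risk = 0.01        # 1% per year in low risk atrial fibrillation
rrr = 0.50                         # warfarin reduces stroke vs ASA by 50%
arr = baseline_stroke_risk * rrr   # absolute risk reduction, 0.5% per year
extra_bleed_risk = 0.01            # bleeds increased by 1% per year

print(round(number_needed(arr)))               # 200 treated per stroke prevented
print(round(number_needed(extra_bleed_risk)))  # 100 treated per extra bleed
```

With two extra bleeds for every stroke prevented, the balance is close, which is what pushes this recommendation toward weak.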
Strength of Recommendations

Aspirin after MI – do it



Warfarin rather than ASA in Afib
-- probably do it
-- probably don’t do it

variability in patient preference
 strong, almost all same choice (> 90%)
 weak, choice varies appreciably

interaction with patient
 strong, just inform patient
 weak, ensure choice reflects values

use of decision aid
 strong, don’t bother; weak, use the aid

quality of care criterion
 strong, consider; weak, don’t consider

choice more preference dependent

risk aversion

steroids for pulmonary fibrosis
 low quality evidence in support of benefit
 high quality evidence of toxicity

recommendation to the hopeful patient

I’m likely to deteriorate
 if something might work, let’s try it
 damn the torpedoes

recommendation to the fearful patient
 doctor, you mean you know it’s toxic
 diabetes, skin changes, body habitus, infection, osteoporosis
 you don’t know for sure it works? are you crazy?

weak recommendation mandated

Comparator often not clear

Children with suspected or confirmed tuberculous
meningitis should be treated with a four-drug
regimen (HRZE) for 2 months, followed by a two-drug regimen (HR) for 10 months

Offer and promote postpartum and post-abortion
contraception to adolescents through multiple
home visits and/or clinic visits
Strong recommendations,
Low certainty: Discordant recs
 Experts use often
 Why?
What are the possibilities?

panels don’t believe their own confidence
ratings

personal conviction trumps evidence

believe weak recommendations ignored

influence funders
Discordant recommendations:
What are the possibilities?

good practice

mistaken judgment

inappropriate

exceptional situation they got it right

For patients with congenital adrenal hyperplasia,
we recommend monitoring patients for signs of
glucocorticoid excess

Wealth of indirect linked evidence

High confidence in net benefit
 Benefit clear
 Minimal harms or costs

Poor use of guideline panel time and effort to summarize

symptoms and signs appear not infrequently
 Collect cohort studies of incidence
 Studies of accuracy of symptoms and signs

patients suffer if clinicians fail to recognize
 Reports of untreated glucocorticoid excess

clinical action can ameliorate the problem
 Evidence supporting therapy

describe how evidence is linked







Is the statement clear and actionable?
Is the message really necessary?
Is the net benefit large and unequivocal?
Is the evidence difficult to collect and
summarize?
If a public health guideline, are there specific
issues that should be considered (e.g. equity)
Have you made the rationale explicit?
Is this better to be formally GRADEd?

For patients with congenital adrenal
hyperplasia, we recommend monitoring
patients for signs of glucocorticoid excess

Monitor how often?

Nature of monitoring

What to do if signs of excess found

For patients with congenital adrenal
hyperplasia, we recommend monitoring
patients for signs of glucocorticoid excess

Really plausible that clinicians won’t
monitor?

If not, not necessary

relevant symptoms and signs appear not
infrequently

patients will suffer if clinicians fail to
recognize these signs

clinical action can ameliorate the problem.
1 LQoE in a life-threatening situation
  fresh frozen plasma and intracranial bleed
2 LQoE suggests benefit and HQoE suggests harm
  head-to-toe CT/MRI screening for cancer
3 LQoE suggests equivalence, HQoE less harm for one alternative
  Helicobacter pylori eradication in early stage gastric MALT lymphoma
4 HQoE suggests equivalence, LQoE suggests harm in one alternative
  ACEI in hypertension in women planning conception and in pregnancy
5 HQoE suggests benefit in one outcome, LQoE suggests harm in more highly valued outcome
  testosterone in males with or at risk of prostate cancer
systematic survey of all published ES guidelines
between 2005 and 2011
 screening and extraction in duplicate
 for each recommendation: confidence in estimates,
strength of recommendation


taxonomy of paradigmatic situations applied to
strong recommendations based on LQE
Condition and example

1 Best practice statements
  For patients with congenital adrenal hyperplasia, we recommend monitoring patients for signs of glucocorticoid excess

2 Additional research
  We recommend additional investigation using rodents and primates to further define the specific targets of androgen action

3 Mistaken judgment
  For overweight and obese children and adolescents, intensive lifestyle modification for the patient and entire family

4 Inappropriate strong recommendation
  In patients with primary aldosteronism who are unable or unwilling to undergo laparoscopic adrenalectomy, we recommend medical treatment with mineralocorticoid receptor antagonists
Strong recommendations (n=206), n (%)
  high/moderate confidence in estimates: 85 (41%)
  very low/low confidence in estimates: 121 (59%)
  total: 206 (100%)

Weak recommendations (n=151), n (%)
  high/moderate confidence in estimates: 16 (8%)
  very low/low confidence in estimates: 135 (92%)
  total: 151 (100%)
Appropriate: N = 35 (29%)
  LQE in a life-threatening situation: 1
  LQoE benefit and HQoE suggests harm or a very high cost: 13
  LQoE suggests equivalence, HQoE less harm for one of the competing alternatives: 7
  HQoE suggests equivalence of two alternatives and LQoE suggests harm in one alternative: 5
  HQoE suggests modest benefits and LQoE suggests possibility of catastrophic harm: 9

Inappropriate: N = 86 (71%)
  best practice: 43
  mistaken judgment: 5
  additional research: 5
  inappropriate strong recommendation: 33

majority ES recommendations strong
 121 (59%) discordant

35/121 (29%) of discordant appropriate

of 86 inappropriate, 43 (50%) best practice
statements

33/86 inappropriate, should have been weak
recommendations
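
The percentages in this summary follow directly from the counts. A quick Python check, using only numbers reported above (the variable names are illustrative):

```python
strong_total = 206
discordant = 121                          # strong, low/very low confidence
appropriate = 35
inappropriate = discordant - appropriate  # 86
best_practice = 43
should_be_weak = 33

def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)

print(pct(discordant, strong_total))       # 59
print(pct(appropriate, discordant))        # 29
print(pct(best_practice, inappropriate))   # 50
print(pct(should_be_weak, inappropriate))  # 38
```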

underlying values and preferences always
present

sometimes crucial

important to make explicit

Stroke guideline: patients with TIA
clopidogrel over aspirin (Grade 2B).

Underlying values and preferences: This
recommendation to use clopidogrel over
aspirin places a relatively high value on a
small absolute risk reduction in stroke
rates, and a relatively low value on
minimizing drug expenditures.

peripheral vascular disease: aspirin be used
instead of clopidogrel (Grade 2A).

Underlying values and preferences: This
recommendation places a relatively high
value on avoiding large expenditures to
achieve small reductions in vascular
events.

Consider UpToDate style of values and preferences

Weak recommendation, low certainty evidence, for trial of
testosterone in men with apparent testosterone
deficiency and cardiovascular disease

Men who place a high value on minimizing risk of an
adverse cardiovascular event and a relatively low value on
ameliorating the symptoms of testosterone deficiency
are likely to choose against testosterone use

venotonic agents
 mechanism unclear, increase venous return

popularity
 90 venotonics commercialized in France
 none in Sweden and Norway
 France 70% of world market

possibilities
 French misguided
 rest of world missing out

14 trials, 1432 patients

key outcome
 risk not improving/persistent symptoms
 11 studies, 1002 patients, 375 events
 RR 0.4, 95% CI 0.29 to 0.57

minimal side effects

is France right?

what is the certainty of evidence?

risk of bias
 lack of detail re concealment
 questionnaires not validated

indirectness – no problem

inconsistency, need to look at the results
Review: Phlebotonics for hemorrhoids
Comparison: 01 Venotonics vs placebo
Outcome: 08 Overall improvement: no improvement/some improvement

Study or sub-category       log[RR] (SE)        Weight %   RR (random) [95% CI]

01 Up to seven days
Chauvenet                   -0.8916 (0.2376)    12.67      0.41 [0.26, 0.65]
Cospite                     -2.2073 (0.6117)     5.51      0.11 [0.03, 0.36]
Thanapongsathorn            -0.4308 (0.2985)    11.18      0.65 [0.36, 1.17]
Subtotal (95% CI)                               29.36      0.37 [0.18, 0.77]
Test for heterogeneity: Chi² = 6.92, df = 2 (P = 0.03), I² = 71.1%
Test for overall effect: Z = 2.67 (P = 0.008)

02 Up to four weeks
Annoni F                    -1.6094 (0.7073)     4.50      0.20 [0.05, 0.80]
Clyne MB                    -0.9943 (0.3983)     8.94      0.37 [0.17, 0.81]
Pirard J                    -1.1712 (0.3086)    10.94      0.31 [0.17, 0.57]
Thanapongsathorn            -1.1087 (1.1098)     2.18      0.33 [0.04, 2.91]
Thorp                        0.2624 (0.3291)    10.46      1.30 [0.68, 2.48]
Titapan                     -0.8916 (0.3691)     9.56      0.41 [0.20, 0.85]
Wijayanegara                -0.5978 (0.1375)    14.97      0.55 [0.42, 0.72]
Subtotal (95% CI)                               61.54      0.48 [0.32, 0.72]
Test for heterogeneity: Chi² = 13.87, df = 6 (P = 0.03), I² = 56.7%
Test for overall effect: Z = 3.57 (P = 0.0004)

03 Further than four weeks
Godeberg                    -1.7719 (0.3906)     9.10      0.17 [0.08, 0.37]
Subtotal (95% CI)                                9.10      0.17 [0.08, 0.37]
Test for heterogeneity: not applicable
Test for overall effect: Z = 4.54 (P < 0.00001)

Total (95% CI)                                 100.00      0.40 [0.29, 0.57]
Test for heterogeneity: Chi² = 28.66, df = 10 (P = 0.001), I² = 65.1%
Test for overall effect: Z = 5.14 (P < 0.00001)

[Forest plot axis: RR on a log scale from 0.001 to 1000; left of 1 favours treatment, right favours control]
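
The pooled result and heterogeneity statistics in this forest plot can be approximated from the listed log[RR] and SE values. The sketch below uses the DerSimonian-Laird random-effects method; the slide says only "random", so the choice of estimator is an assumption, and the function name is hypothetical.

```python
from math import exp, sqrt

# (study, log RR, SE) as listed in the forest plot.
STUDIES = [
    ("Chauvenet", -0.8916, 0.2376),
    ("Cospite", -2.2073, 0.6117),
    ("Thanapongsathorn (7 days)", -0.4308, 0.2985),
    ("Annoni F", -1.6094, 0.7073),
    ("Clyne MB", -0.9943, 0.3983),
    ("Pirard J", -1.1712, 0.3086),
    ("Thanapongsathorn (4 weeks)", -1.1087, 1.1098),
    ("Thorp", 0.2624, 0.3291),
    ("Titapan", -0.8916, 0.3691),
    ("Wijayanegara", -0.5978, 0.1375),
    ("Godeberg", -1.7719, 0.3906),
]

def random_effects(studies):
    """DerSimonian-Laird pooling of log relative risks."""
    y = [s[1] for s in studies]
    v = [s[2] ** 2 for s in studies]
    w = [1 / vi for vi in v]                      # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                 # between-study variance
    w_re = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = sqrt(1 / sum(w_re))
    return {
        "Q": q,
        "I2": i2,
        "RR": exp(pooled),
        "low": exp(pooled - 1.96 * se),
        "high": exp(pooled + 1.96 * se),
    }

result = random_effects(STUDIES)
print({key: round(value, 2) for key, value in result.items()})
```

Run on these 11 studies, the sketch should land close to the reported Chi² of 28.66, I² of 65.1% and pooled RR of 0.40 (0.29 to 0.57).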

size of studies
 40 to 234 patients, most around 100

all industry sponsored
[Funnel plot: Phlebotonics for hemorrhoids; comparison 01 Venotonics vs placebo; outcome 08 Overall improvement: no improvement/some improvement; horizontal axis RR (fixed) from 0.001 to 1000, vertical axis from 0.0 to 1.6]





risk of bias
 lack of detail re concealment
 questionnaires not validated
inconsistency
 almost all show positive effect, trend
  heterogeneity p = 0.001; I² 65.1%
indirectness
imprecision
 RR 0.4, 95% CI 0.29 to 0.57
publication bias
 40 to 234 patients, most around 100

recommendation
 yes
 no against use

strength
 strong
 weak
Beta blockers in non-cardiac surgery
Summary of findings and quality assessment

Myocardial infarction
  number of participants (studies): 10,125 (9)
  risk of bias: no serious limitations
  consistency: no serious limitations
  directness: no serious limitations
  precision: no serious limitations
  publication bias: not detected
  quality: high
  relative effect (95% CI): 0.71 (0.57 to 0.86)
  absolute risk difference: 1.5% fewer (0.7% fewer to 2.1% fewer)

Mortality
  number of participants (studies): 10,205 (7)
  risk of bias: no serious limitations
  consistency: no serious limitations
  directness: no serious limitations
  precision: imprecise
  publication bias: not detected
  quality: moderate
  relative effect (95% CI): 1.23 (0.98 to 1.55)
  absolute risk difference: 0.5% more (0.1% fewer to 1.3% more)

Stroke
  number of participants (studies): 10,889 (5)
  risk of bias: no serious limitations
  consistency: no serious limitations
  directness: no serious limitations
  precision: no serious limitations
  publication bias: not detected
  quality: high
  relative effect (95% CI): 2.21 (1.37 to 3.55)
  absolute risk difference: 0.5% more (0.2% more to 1.3% more)

GRADE values and preferences

GRADE diagnosis

Aspirin for primary prevention

Culprit only vs complete revascularization in
STEMI

Management of esophageal varices