Outcomes

The GRADE approach: an introductory
workshop
Holger Schünemann, MD, PhD
Professor and Chair, Dept. of Clinical Epidemiology & Biostatistics
Professor of Medicine
Michael Gent Chair in Healthcare Research
McMaster University, Hamilton, Canada
NTP, Raleigh
June 22, 2011
The Department of Clinical
Epidemiology & Biostatistics at
McMaster
History
- 1967 – Founded by David Sackett
- 6 chairs since
- Instrumental in specialty of Clinical Epidemiology,
origin of “Evidence-Based Medicine”
People
45 full time and joint faculty
~ 120 associate & part time faculty; 19 emeritus
~ 180 staff
~ 200 PhD and Master students
Content
• Guidelines and GRADE
– Background about GRADE
• Quality of evidence
• Going from evidence to recommendations
What is a guideline?
• "Guidelines are recommendations intended to
assist providers and recipients of health care and
other stakeholders to make informed decisions.
Recommendations may relate to clinical
interventions, public health activities, or
government policies."
WHO 2003, 2007
Evidence based healthcare decisions
Population values
and preferences
(Clinical) state and
circumstances
Expertise
Research evidence
Haynes et al. 2002
Confidence in evidence
• There always is evidence
– “When there is a question there is evidence”
• Better research → greater confidence in the
evidence and decisions
Hierarchy of evidence
based on quality
STUDY DESIGN




Randomized Controlled
Trials
Cohort Studies and Case
Control Studies
Case Reports and Case
Series, Non-systematic
observations
Expert Opinion
BIAS
“Everything should be made as simple as
possible but not simpler.”
Explain the following?
• Confounding, effect modification & ext. validity
• Concealment of randomization
• Blinding (who is blinded in a double blinded
study?)
• Intention to treat analysis and its correct
application
• P-values and confidence intervals
BMJ 2003
Relative risk reduction: > 99.9% (1/100,000)
U.S. Parachute Association
reported 821 injuries and 18
deaths out of 2.2 million jumps
in 2007
BMJ 2003
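The parachute figures quoted above can be turned into a rough worked calculation. This is only a sketch: the risk of death when jumping without a parachute is not given on the slide, so it is assumed here to be 1.0 (always fatal), and the variable names are illustrative.

```python
# Figures quoted on the slide (U.S. Parachute Association, 2007).
deaths = 18
jumps = 2_200_000

# Risk of death per jump when a parachute is used.
risk_with_parachute = deaths / jumps

# ASSUMPTION (not from the slide): a jump without a parachute is taken
# to be always fatal, so the comparison risk is 1.0.
risk_without_parachute = 1.0

relative_risk = risk_with_parachute / risk_without_parachute
relative_risk_reduction = 1 - relative_risk

print(f"Deaths per 100,000 jumps: {risk_with_parachute * 100_000:.2f}")
print(f"Relative risk reduction: {relative_risk_reduction:.4%}")
```

Under that assumption the relative risk reduction comes out above 99.9%, in line with the figure on the slide, even though no randomized trial exists.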
Simple hierarchies are (too)
simplistic
STUDY DESIGN


Cohort Studies and Case
Control Studies
Case Reports and Case
Series, Non-systematic
observations
Expert Opinion
Expert Opinion

Randomized Controlled
Trials
BIAS
Schünemann & Bone, 2003
Which hierarchy?
Recommendation for use of oral
anticoagulation in patients with atrial
fibrillation and rheumatic mitral valve disease
Organization    Evidence    Recommendation
AHA             B           Class I
ACCP            A           1
SIGN            IV          C
Oxford Centre for Evidence Based
Medicine
Levels of Evidence and Grades of Recommendations- 23 November 1999.
Grade of
Recommendation
Level of
Evidence
Therapy/Prevention, Aetiology/Harm
Prognosis
Diagnosis
Economic analysis
1a
SR (with homogeneity) of RCTs
SR (with homogeneity*) of Level 1 diagnostic
studies; or a CPG validated on a test set.
SR (with homogeneity*) of Level 1 economic
studies
1b
Individual RCT (with narrow Confidence
Interval)
SR (with homogeneity*) of
inception cohort studies; or a CPG
validated on a test set.
Individual inception cohort study
with > 80% follow-up
Independent blind comparison of an
appropriate spectrum of consecutive patients,
all of whom have undergone both the
diagnostic test and the reference standard.
1c
All or none
All or none case-series
Absolute SpPins and SnNouts
Analysis comparing all (critically-validated)
alternative outcomes against appropriate
cost measurement, and including a
sensitivity analysis incorporating clinically
sensible variations in important variables.
Clearly as good or better, but cheaper.
Clearly as bad or worse but more expensive.
Clearly better or worse at the same cost.
2a
SR (with homogeneity*) of cohort studies
SR (with homogeneity*) of Level >2 diagnostic
studies
SR (with homogeneity*) of Level >2
economic studies
2b
Individual cohort study (including low
quality RCT; e.g., <80% follow-up)
SR (with homogeneity*) of either
retrospective cohort studies or
untreated control groups in RCTs.
Retrospective cohort study or
follow-up of untreated control
patients in an RCT; or CPG not
validated in a test set.
Any of:
· Independent blind or objective comparison;
· Study performed in a set of non-consecutive
patients, or confined to a narrow spectrum of
study individuals (or both) all of whom have
undergone both the diagnostic test and the
reference standard;
· A diagnostic CPG not validated in a test set.
Analysis comparing a limited number of
alternative outcomes against appropriate
cost measurement, and including a
sensitivity analysis incorporating clinically
sensible variations in important variables.
2c
“Outcomes” Research
3a
SR (with homogeneity*) of case-control
studies
Individual Case-Control Study
Independent blind comparison of an
appropriate spectrum, but the reference
standard was not applied to all study patients
Analysis without accurate cost
measurement, but including a sensitivity
analysis incorporating clinically sensible
variations in important variables.
A
B
3b
4
Case-series (and poor quality cohort and
case-control studies)
Case-series (and poor quality
prognostic cohort studies)
Any of:
· Reference standard was unobjective,
unblinded or not
· independent;
· Positive and negative tests were verified
using separate reference standards;
· Study was performed in an inappropriate
spectrum** of patients.
Analysis with no sensitivity analysis
5
Expert opinion without explicit critical
appraisal, or based on physiology,
bench research or “first principles”
Expert opinion without explicit
critical appraisal, or based on
physiology, bench research or
“first principles”
Expert opinion without explicit critical
appraisal, or based on physiology, bench
research or “first principles”
Expert opinion without explicit critical
appraisal, or based on economic theory
C
D
“Outcomes” Research
Oxford Centre for Evidence-Based Medicine (Chris Ball, Dave Sackett, Bob Phillips, Brian Haynes, and
Sharon Straus).
USPSTF - Grade Definitions After May 2007:
Certainty
Level of Certainty and Description
High:
The available evidence usually includes consistent results from well-designed, well-conducted studies in representative primary care populations. These studies assess the
effects of the preventive service on health outcomes. This conclusion is therefore
unlikely to be strongly affected by the results of future studies.
Moderate:
The available evidence is sufficient to determine the effects of the preventive service
on health outcomes, but confidence in the estimate is constrained by such factors as:
• The number, size, or quality of individual studies.
• Inconsistency of findings across individual studies.
• Limited generalizability of findings to routine primary care practice.
• Lack of coherence in the chain of evidence.
As more information becomes available, the magnitude or direction of the observed
effect could change, and this change may be large enough to alter the conclusion.
Low:
The available evidence is insufficient to assess effects on health outcomes. Evidence is
insufficient because of:
• The limited number or size of studies.
• Important flaws in study design or methods.
• Inconsistency of findings across individual studies.
• Gaps in the chain of evidence.
• Findings not generalizable to routine primary care practice.
• Lack of information on important health outcomes.
More information may allow estimation of effects on health outcomes.
The USPSTF defines certainty as "likelihood that the USPSTF assessment
of the net benefit of a preventive service is correct."
• Recommendations for prognosis
– Use prognostic information to determine baseline
risk for healthcare decisions
Centers for Disease Control and
Prevention (CDC)
Evidence of Execution
Effectiveness - Good or
Fair
Design Suitability
—
Greatest,
Moderate, or
Least
Greatest
Number
of Studies
Consistent
Effect
Sized
Expert Opinion
At Least 2
Yes
Sufficient
Not Used
Greatest or
Moderate
Greatest
At Least 5
Yes
Sufficient
Not Used
Good or
At Least 5
Yes
Fair
Meet Design, Execution, Number, and Consistency Criteria for
Sufficient But Not Strong Evidence
Sufficient
Good
Greatest
1
Not
Applicable
Good or
Greatest or
At Least 3
Yes
Fair
Moderate
Good or
Greatest,
At Least 5
Yes
Fair
Moderate, or Least
Expert Opinion Varies
Varies
Varies
Varies
Sufficient
Not Used
Large
Not Used
Sufficient
Not Used
Sufficient
Not Used
Sufficient
Not Used
Sufficient
Insufficient
D. Small
Supports a
Recommendation
E. Not Used
Strong
Good
Good
A.Insufficient Designs or
Execution
B. Too Few
Studies
C.
Inconsistent
Healthcare problem
“Healthy people”
“Herd immunity”
“Long term perspective”
“Disease perception”
“Lots of other things”
recommendation
GRADE
Working Group
Grades of Recommendation
Assessment, Development and
Evaluation
• Aim: to develop a common, transparent and sensible
system for grading the quality of evidence and the
strength of recommendations
• International group of guideline developers,
methodologists & clinicians from around the world (>250
contributors) – since 2000
• International group: ACCP, AHRQ, Australian NHMRC,
BMJ Clinical Evidence, Cochrane Collaboration,
CDC, McMaster, NICE, Oxford CEBM, SIGN,
UpToDate, USPSTF, WHO
CMAJ 2003, BMJ 2004, BMC 2004, BMC 2005,
AJRCCM 2006, Chest 2006, BMJ 2008
GRADE Uptake
• World Health Organization
• CDC-ACIP
• Allergic Rhinitis and its Impact on Asthma Guidelines (ARIA)
• American Thoracic Society
• American College of Physicians
• European Respiratory Society
• European Society of Thoracic Surgeons
• British Medical Journal
• Infectious Diseases Society of America
• American College of Chest Physicians
• UpToDate®
• National Institute for Health and Clinical Excellence (NICE)
• Scottish Intercollegiate Guidelines Network (SIGN)
• Cochrane Collaboration
• Clinical Evidence
• Agency for Healthcare Research and Quality (AHRQ)
• Partner of GIN
• Over 40 major organizations
Guideline
development
Process
Prioritise problems & scoping

Establish guideline panel and develop questions, including outcomes

Find and critically appraise systematic review(s)
and/or
Prepare protocol(s) for systematic review(s)
and
Prepare systematic review(s)
(searches, selection of studies, data collection and analysis)

Prepare an evidence profile

Assess the quality of evidence for each outcome

Prepare a Summary of Findings table

If developing guidelines:
Assess the overall quality of evidence
and
Decide on the direction (which alternative) and strength of the
recommendation

Draft guideline

Consult with stakeholders and/or external peer reviewers

Disseminate guidelines

Update review or guidelines when needed

Adapt guidelines, if needed

Prioritise guidelines/recommendations for implementation

Implement or support implementation of the guidelines

Evaluate the impact of the guidelines and implementation strategies

Update systematic review/guidelines
Case scenario
A 13 year old girl who lives in rural Indonesia presented with
flu symptoms and developed severe respiratory distress over
the course of the last 2 days. She required intubation. The
history reveals that she shares her living quarters with her
parents and her three siblings. At night the family’s chicken
stock shares this room too and several chickens had died
unexpectedly a few days before the girl fell sick.
Potential interventions: antivirals, such as neuraminidase
inhibitors oseltamivir and zanamivir
Types of questions
Background Questions
Definition:
What is Avian Influenza?
Mechanism:
What is the mechanism of
action of oseltamivir?
Foreground Questions
Benefit > harm: In patients with avian
influenza, does oseltamivir
therapy improve survival, …?
Framing a foreground question
Population:
Avian Flu/influenza A (H5N1) patients
Intervention: Oseltamivir
Comparison: No pharmacological intervention
Outcomes:
Mortality, hospitalizations,
resource use, adverse outcomes,
antimicrobial resistance
Schunemann, et al., The Lancet ID, 2007
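The PICO framing above maps naturally onto a small record type. The sketch below is illustrative only: the class name `ForegroundQuestion` and the method `as_sentence` are hypothetical and are not part of GRADE or GRADEpro; the field values are taken from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForegroundQuestion:
    """A foreground question framed as Population, Intervention,
    Comparison and Outcomes (PICO). Hypothetical helper type."""
    population: str
    intervention: str
    comparison: str
    outcomes: tuple

    def as_sentence(self) -> str:
        # Render the four PICO parts as one readable question.
        return (
            f"In {self.population}, does {self.intervention} "
            f"compared with {self.comparison} affect "
            f"{', '.join(self.outcomes)}?"
        )


question = ForegroundQuestion(
    population="Avian Flu/influenza A (H5N1) patients",
    intervention="Oseltamivir",
    comparison="No pharmacological intervention",
    outcomes=(
        "mortality",
        "hospitalizations",
        "resource use",
        "adverse outcomes",
        "antimicrobial resistance",
    ),
)
print(question.as_sentence())
```

Keeping the outcomes as an explicit list makes it easier to rate the quality of evidence for each one separately later on.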
Choosing outcomes
• Desirable outcomes
– lower mortality
– reduced hospital stay
– reduced duration of disease
– reduced resource expenditure
• Undesirable outcomes
– adverse reactions
– the development of resistance
– costs of treatment
• Every decision comes with desirable and
undesirable consequences
Developing recommendations must include a
consideration of desirable and undesirable outcomes
Relative importance of outcomes
• Decision makers (and guideline
authors) need to consider the
relative importance of outcomes
when balancing these outcomes to
make a recommendation
• Relative importance varies across
populations
• Relative importance may vary across
patient groups within the same
population
• When considered critical - evaluate
GRADE: recommendation – quality of
evidence
Clear separation:
1) Recommendation: 2 grades –
weak/conditional/optional or strong (for or
against an intervention)?
– Balance of benefits and downsides, values and
preferences, resource use and quality of evidence
2) 4 categories of quality of evidence: ⊕⊕⊕⊕ (High),
⊕⊕⊕○ (Moderate), ⊕⊕○○ (Low), ⊕○○○ (Very low)?
– methodological quality of evidence
– likelihood of bias
– by outcome and across outcomes
*www.GradeWorking-Group.org
GRADE Quality of Evidence
In the context of a systematic review
• The quality of evidence reflects the extent to which
we are confident that an estimate of effect is
correct.
In the context of making recommendations
• The quality of evidence reflects the extent to which
our confidence in estimates of the effects is
adequate to support a particular recommendation.
Likelihood
of and
confidence
in an
outcome
Definition of grades of evidence
Research
• ⊕⊕⊕⊕/A/High: Further research is very unlikely
to change confidence in the estimate of effect.
• ⊕⊕⊕○/B/Moderate: Further research is likely to
have an important impact on confidence in the
estimate of effect and may change the estimate.
• ⊕⊕○○/C/Low: Further research is very likely to
have an important impact on confidence in the
estimate of effect and is likely to change the
estimate.
• ⊕○○○/D/Very low: Any estimate of effect is
very uncertain.
Confidence in evidence
⊕⊕⊕⊕/A/High: We are very confident that the true effect lies close to
that of the estimate of the effect.
⊕⊕⊕○/B/Moderate: We are moderately confident in the effect
estimate: The true effect is likely to be close to the estimate of the
effect, but there is a possibility that it is substantially different.
⊕⊕○○/C/Low: Our confidence in the effect estimate is limited: The
true effect may be substantially different from the estimate of the
effect.
⊕○○○/D/Very low: We have very little confidence in the effect
estimate: The true effect is likely to be substantially different from the
estimate of effect.
Determinants of quality
• RCTs start high (⊕⊕⊕⊕)
• observational studies start low (⊕⊕○○)
• 5 factors that can lower quality
1. limitations in detailed design and execution (risk of bias
criteria)
2. Inconsistency (or heterogeneity)
3. Indirectness (PICO and applicability)
4. Imprecision (number of events and confidence intervals)
5. Publication bias
• 3 factors can increase quality
1. large magnitude of effect
2. all plausible residual confounding may be working to
reduce the demonstrated effect or increase the effect if
no effect was observed
3. dose-response gradient
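The starting levels and the lowering and raising factors listed above can be sketched as simple bookkeeping. This is a deliberately simplified illustration, not the GRADE method itself: GRADE asks for an overall judgement rather than arithmetic, and the function name `rate_quality` and its parameters are hypothetical.

```python
# Quality levels ordered from lowest to highest.
LEVELS = ["Very low", "Low", "Moderate", "High"]


def rate_quality(randomised: bool, downgrades: int = 0, upgrades: int = 0) -> str:
    """Return a quality level for one outcome.

    randomised: True for RCTs (start at High), False for
                observational studies (start at Low).
    downgrades: total levels removed across the five lowering factors.
    upgrades:   total levels added across the three raising factors.
    """
    start = LEVELS.index("High") if randomised else LEVELS.index("Low")
    index = start - downgrades + upgrades
    # The result can never fall below "Very low" or rise above "High".
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]


# RCT evidence with serious imprecision (one level down).
print(rate_quality(randomised=True, downgrades=1))
# Observational evidence with a large effect (one level up).
print(rate_quality(randomised=False, upgrades=1))
```

Both example calls end at "Moderate", which shows how evidence from different study designs can arrive at the same rating by different routes.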
1. Design and Execution/Risk of Bias
Examples:
• Inappropriate selection of exposed and unexposed groups
• Failure to adequately measure/control for confounding
• Selective outcome reporting
• Failure to blind (e.g. outcome assessors)
• High loss to follow-up
• Lack of concealment in RCTs
• Intention to treat principle violated
Design and Execution/RoB
From Cates, CDSR 2008
Design and Execution/RoB
Overall judgment required
2. Inconsistency of results
(Heterogeneity)
• if inconsistency, look for explanation
– patients, intervention, comparator, outcome
• if unexplained inconsistency lower quality
Reminders for immunization uptake
Indoor air pollution: ALRI
Non-steroidal drug use and risk of
pancreatic cancer
Capurso G, Schünemann HJ, Terrenato I, Moretti A, Koch M, Muti P, Capurso L, Delle Fave G.
Meta-analysis: the use of non-steroidal anti-inflammatory drugs and pancreatic cancer risk for different exposure categories.
Aliment Pharmacol Ther. 2007 Oct 15;26(8):1089-99.
3. Directness of Evidence
• differences in
– populations/patients (children – neonates, women in
general – pregnant women)
– interventions (all vaccines, new - old)
– comparator appropriate (new policy – old or no policy)
– outcomes (important – surrogate: cases prevented –
seroconversion)
• indirect comparisons
– interested in A versus B
– have A versus C and B versus C
– Vaccine A versus Placebo versus Vaccine B
• Possibly. The “high” dose effects of bisphenol A in
laboratory animals that provide clear evidence for adverse
effects on development, i.e., reduced survival, birth weight,
and growth of offspring early in life, and delayed puberty in
female rats and male rats and mice, are observed at levels
of exposure that far exceed those encountered by humans.
However, estimated exposures in pregnant women and
fetuses, infants, and children are similar to levels of
bisphenol A associated with several “low” dose laboratory
animal findings of effects on the brain and behavior,
prostate and mammary gland development, and early
onset of puberty in females. When considered together,
these laboratory animal findings provide limited evidence
that bisphenol A has adverse effects on development.
Hierarchy of outcomes according to their importance to assess the effect of
phosphate lowering drugs in patients with renal failure and hyperphosphatemia
Importance of outcomes (scale 9 to 1)
Critical for decision making (9 to 7): Mortality; Myocardial infarction; Fractures (7)
Important, but not critical for decision making (6 to 4): Pain due to soft tissue calcification / function (6)
Low importance for decision making (3 to 1): Flatulence
Surrogates: relation to important outcomes increasingly uncertain
– Mortality, Myocardial infarction: Coronary calcification, Ca2+/P product
– Fractures: Bone density, Ca2+/P product
– Pain due to soft tissue calcification / function: Soft tissue calcification, Ca2+/P product
4. Publication Bias
• Should always be suspected
– Only small “positive” studies (hypothesis confirming)
– For profit interest
– Various methods to evaluate – none perfect, but
clearly a problem
ISIS-4
Lancet 1995
I.V. Mg in
acute
myocardial
infarction
Meta-analysis
Yusuf S.Circulation 1993
Publication bias
Egger M, Smith DS. BMJ
1995;310:752-54
Funnel plot (standard error against odds ratio)
Symmetrical: No publication bias
Funnel plot (standard error against odds ratio)
Asymmetrical: Publication bias?
File drawer problem: No interest in publishing or being published
Indoor air pollution: ALRI
5. Imprecision
• Small sample size
– small number of events
• Wide confidence intervals
– uncertainty about magnitude of effect
• Extent to which confidence in estimate of
effect adequate to support decision
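The link between the number of events and the width of the confidence interval can be illustrated with a short calculation. The sketch below uses the standard log odds ratio (Wald) interval; the 2×2 counts are hypothetical and chosen only so that both studies have the same odds ratio.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% confidence interval.

    a, b: events and non-events in the treated group
    c, d: events and non-events in the control group
    Uses the Wald interval on the log odds ratio scale.
    """
    odds_ratio = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * standard_error)
    upper = math.exp(math.log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper


# HYPOTHETICAL counts: same odds ratio, 100 times more participants.
small_study = odds_ratio_ci(2, 48, 4, 46)
large_study = odds_ratio_ci(200, 4800, 400, 4600)

for label, (estimate, lower, upper) in (("small", small_study), ("large", large_study)):
    print(f"{label}: OR {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With few events the interval spans both benefit and harm, so the estimate may not support a decision; with many events the same point estimate has an interval that excludes an odds ratio of 1.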
Example: Immunization in children
What can raise quality?
1. large magnitude can upgrade (RRR 50%/RR 2)
– very large two levels (RRR 80%/RR 5)
– criteria
• everyone used to do badly
• almost everyone does well
– parachutes to prevent death when jumping from
airplanes
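The thresholds on this slide (RR 2 or RRR 50% for one level, RR 5 or RRR 80% for two levels) can be written as a small rule. This is a sketch under assumptions: the function name is hypothetical, treating the thresholds as inclusive is a choice made here, and in practice the other criteria on the slide still require judgement.

```python
def upgrade_for_large_effect(relative_risk: float) -> int:
    """Number of levels by which a large effect may raise quality.

    Protective effects (RR below 1) and harmful effects (RR above 1)
    are folded onto one scale, so RR 0.5 counts the same as RR 2.
    ASSUMPTION: thresholds are treated as inclusive.
    """
    if relative_risk <= 0:
        raise ValueError("relative risk must be positive")
    magnitude = max(relative_risk, 1 / relative_risk)
    if magnitude >= 5:
        return 2  # very large effect (RRR 80% / RR 5)
    if magnitude >= 2:
        return 1  # large effect (RRR 50% / RR 2)
    return 0


for rr in (0.1, 0.4, 1.5, 6.0):
    print(rr, upgrade_for_large_effect(rr))
```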
Reminders for immunization uptake
What can raise quality?
2. dose response relation
– (higher INR – increased bleeding)
– childhood lymphoblastic leukemia
• risk for CNS malignancies 15 years after cranial irradiation
• no radiation: 1% (95% CI 0% to 2.1%)
• 12 Gy: 1.6% (95% CI 0% to 3.4%)
• 18 Gy: 3.3% (95% CI 0.9% to 5.6%)
3. all plausible confounding may be working to reduce the
demonstrated effect or increase the effect if no effect was
observed
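The cranial irradiation figures in the dose-response example can be checked with a few lines of code. This is only an illustrative sketch with hypothetical helper names: it tests whether the point estimates rise with dose and whether the confidence intervals of the lowest and highest dose overlap. It is not a formal trend test.

```python
# Figures from the slide: risk of CNS malignancies 15 years after
# cranial irradiation, by dose.
doses_gy = [0, 12, 18]
risk_percent = [1.0, 1.6, 3.3]
ci_percent = [(0.0, 2.1), (0.0, 3.4), (0.9, 5.6)]


def has_monotonic_gradient(values):
    """True if every value is larger than the one before it."""
    return all(earlier < later for earlier, later in zip(values, values[1:]))


def intervals_overlap(first, second):
    """True if two (lower, upper) intervals share any value."""
    return first[0] <= second[1] and second[0] <= first[1]


gradient = has_monotonic_gradient(risk_percent)
overlap = intervals_overlap(ci_percent[0], ci_percent[-1])
print(f"Risk rises with dose: {gradient}")
print(f"CIs of lowest and highest dose overlap: {overlap}")
```

The point estimates increase with dose, but the intervals overlap, so whether the gradient is convincing enough to raise quality remains a judgement.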
All plausible residual confounding
would result in an overestimate of effect
• Hypoglycaemic drug phenformin causes lactic
acidosis
• The related agent metformin is under
suspicion for the same toxicity.
• Large observational studies have failed to
demonstrate an association
– Clinicians would be more alert to lactic acidosis in
the presence of the agent
• Vaccine – adverse effects
Quality assessment criteria
Study design and initial quality of a body of evidence:
– Randomised trials: High
– Observational studies: Low
Lower if:
– Risk of Bias
– Inconsistency
– Indirectness
– Imprecision
– Publication bias
Higher if:
– Large effect
– Dose response
– All plausible residual confounding & bias
  - Would reduce a demonstrated effect
  - Would suggest a spurious effect if no effect was observed
Quality of a body of evidence:
– A/High (four plus: ⊕⊕⊕⊕)
– B/Moderate (three plus: ⊕⊕⊕○)
– C/Low (two plus: ⊕⊕○○)
– D/Very low (one plus: ⊕○○○)
Evidence Profiles/Summaries
Content
• Background
• Quality of evidence
• Moving from evidence to recommendations
Strength of recommendation
“The strength of a recommendation reflects
the extent to which we can, across the range
of patients for whom the recommendations
are intended, be confident that desirable
effects of a management strategy outweigh
undesirable effects.”
• Strong or weak/conditional
Determinants of the strength of
recommendation
Factors that can strengthen a recommendation (with comment)
• Quality of the evidence: The higher the quality of evidence, the
more likely is a strong recommendation.
• Balance between desirable and undesirable effects: The larger the
difference between the desirable and undesirable consequences, the
more likely a strong recommendation is warranted. The smaller the net
benefit and the lower the certainty for that benefit, the more likely a
weak recommendation is warranted.
• Values and preferences: The greater the variability in values and
preferences, or uncertainty in values and preferences, the more likely
a weak recommendation is warranted.
• Costs (resource allocation): The higher the costs of an intervention
– that is, the more resources consumed – the less likely a strong
recommendation is warranted.
Developing recommendations
Case scenario
A 13 year old girl who lives in rural Indonesia presented
with flu symptoms and developed severe respiratory
distress over the course of the last 2 days. She required
intubation. The history reveals that she shares her
living quarters with her parents and her three siblings.
At night the family’s chicken stock shares this room too
and several chickens had died unexpectedly a few days
before the girl fell sick.
Methods – WHO Rapid Advice
Guidelines for management of Avian Flu
• Applied findings of a recent systematic evaluation
of guideline development for WHO/ACHR
• Group composition (including panel of 13 voting
members):
– clinicians who treated influenza A(H5N1) patients
– infectious disease experts
– basic scientists
– public health officers
– methodologists
• Independent scientific reviewers:
• Identified systematic reviews, recent RCTs, case series,
animal studies related to H5N1 infection
Oseltamivir for Avian Flu
Summary of findings:
• No clinical trial of oseltamivir for treatment of
H5N1 patients.
• 4 systematic reviews and health technology
assessments (HTA) reporting on 5 studies of
oseltamivir in seasonal influenza.
– Hospitalization: OR 0.22 (0.02 – 2.16)
– Pneumonia: OR 0.15 (0.03 – 0.69)
• 3 published case series.
• Many in vitro and animal studies.
• No alternative that is more promising at present.
• Cost: 40$ per treatment course
From evidence to recommendation
Factors that can strengthen a recommendation (with comment)
• Quality of the evidence: Very low quality evidence
• Balance between desirable and undesirable effects: Uncertain, but
small reduction in relative risk still leads to large absolute effect
• Values and preferences: Little variability and clear
• Costs (resource allocation): Low cost under non-pandemic
conditions
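The comment that a small reduction in relative risk still leads to a large absolute effect follows from the arithmetic of absolute risk reduction. The sketch below uses hypothetical numbers throughout: neither the baseline risks nor the relative risk reduction come from the guideline, and the function names are illustrative.

```python
def absolute_risk_reduction(baseline_risk: float, relative_risk_reduction: float) -> float:
    """Absolute risk reduction = baseline risk x relative risk reduction."""
    return baseline_risk * relative_risk_reduction


def number_needed_to_treat(arr: float) -> float:
    """Patients treated to prevent one event (reciprocal of the ARR)."""
    return 1 / arr


# HYPOTHETICAL values: the same 10% relative risk reduction applied to
# an illness with high case fatality and to a mild illness.
rrr = 0.10
high_fatality = absolute_risk_reduction(baseline_risk=0.50, relative_risk_reduction=rrr)
mild_illness = absolute_risk_reduction(baseline_risk=0.001, relative_risk_reduction=rrr)

print(f"High case fatality: ARR {high_fatality:.3f}, NNT {number_needed_to_treat(high_fatality):.0f}")
print(f"Mild illness: ARR {mild_illness:.4f}, NNT {number_needed_to_treat(mild_illness):.0f}")
```

The same relative effect prevents far more deaths per patient treated when the baseline risk is high, which is the reasoning behind the balance judgement in the table.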
Example: Oseltamivir for Avian Flu
Recommendation: In patients with confirmed or
strongly suspected infection with avian influenza A
(H5N1) virus, clinicians should administer
oseltamivir treatment as soon as possible (?????
recommendation, very low quality evidence).
Schunemann et al. The Lancet ID, 2007
Example: Oseltamivir for Avian Flu
Recommendation: In patients with confirmed or
strongly suspected infection with avian influenza A
(H5N1) virus, clinicians should administer
oseltamivir treatment as soon as possible (strong
recommendation, very low quality evidence).
Values and Preferences
Remarks: This recommendation places a high
value on the prevention of death in an illness
with a high case fatality. It places relatively low
values on adverse reactions, the development
of resistance and costs of treatment.
Schunemann et al. The Lancet ID, 2007
Implications of
a strong recommendation
• Patients: Most people in this situation would want
the recommended course of action and only a small
proportion would not
• Clinicians: Most patients should receive the
recommended course of action
• Policy makers: The recommendation can be adapted
as a policy in most situations
Implications of
a conditional/weak recommendation
• Patients: The majority of people in this situation
would want the recommended course of action, but
many would not
• Clinicians: Be more prepared to help patients to
make a decision that is consistent with their own
values/decision aids and shared decision making
• Policy makers: There is a need for substantial
debate and involvement of stakeholders
Critical
Outcome
Critical
Outcome
Important
Outcome
Not
High
Moderate
Low
Very low
Summary of findings
& estimate of effect
for each outcome
Systematic review
Grade down
Outcome
Grade up
P
I
C
O
Randomization
increases initial
quality
1. Risk of bias
2. Inconsistency
3. Indirectness
4. Imprecision
5. Publication
bias
1. Large effect
2. Dose
response
3. Confounders
Guideline development
Formulate recommendations:
• For or against (direction)
• Strong or weak (strength)
By considering:
 Quality of evidence
 Balance benefits/harms
 Values and preferences
Revise if necessary by considering:
 Resource use (cost)
Grade
overall quality of evidence
across outcomes based on
lowest quality
of critical outcomes
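The rule stated here, that overall quality is based on the lowest quality among the critical outcomes, can be sketched in a few lines. The function name and the example ratings are hypothetical and serve only to show the rule.

```python
# Quality levels mapped to a rank so they can be compared.
RANK = {"Very low": 0, "Low": 1, "Moderate": 2, "High": 3}


def overall_quality(outcomes):
    """Overall quality across outcomes.

    outcomes: list of (name, quality, is_critical) tuples.
    Only critical outcomes count; the lowest of them decides.
    """
    critical = [quality for _, quality, is_critical in outcomes if is_critical]
    if not critical:
        raise ValueError("at least one critical outcome is required")
    return min(critical, key=RANK.__getitem__)


# HYPOTHETICAL ratings, for illustration only.
ratings = [
    ("mortality", "Low", True),
    ("hospitalizations", "Moderate", True),
    ("adverse reactions", "Very low", False),
]
print(overall_quality(ratings))
```

In this example the non-critical outcome rated "Very low" does not pull the overall rating down, because only critical outcomes are considered.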
• “We recommend using…”
• “We suggest using…”
• “We recommend against using…”
• “We suggest against using…”
Issues in guideline development
in Public Health
• Causation versus effects of intervention
– Causation not equivalent to efficacy of interventions
– Bradford Hill
• Nearly half a century old – tablet from the mountain?
• Harms caused by medications
– Assumption is that removal of exposure leads to NO
adverse effects
• How confident can one be that removal of the
exposure is effective in preventing disease?
– Whether drugs or environmental factors it will depend
on the intervention to remove exposure
Schünemann et al. JECH 2010
Conclusions
• Clinical practice guidelines should be based on the best
available evidence to be evidence based
• GRADE combines what is known in health research
methodology and provides a structured approach to
improve communication
• Criteria for evidence assessment across questions and
outcomes
• Criteria for moving from evidence to recommendations
• Transparent, systematic
• four categories of quality of evidence
• two grades for strength of recommendations
• Transparency in decision making and judgments is key
Formulating Questions and
Choosing Outcomes
Outline
• Type of questions
• Framing a foreground question
• Choosing outcomes
• Relative importance of outcomes
Guidelines and questions
Guidelines are a way of answering questions
about clinical, communication, organisational or
policy interventions, in the hope of improving
health care or health policy.
It is therefore helpful to structure a guideline in
terms of answerable questions.
WHO Guideline Handbook, 2008
Types of questions
Background Questions
Definition:
What is COPD?
Mechanism:
What is the mechanism of
action of mucolytic therapy?
Foreground Questions
Efficacy:
In patients with COPD, does
mucolytic therapy improve
survival?
Framing a foreground question
P
I
C
O
Framing a foreground question
Population:
Intervention:
Comparison:
Outcomes:
Case scenario
A 13 year old girl who lives in rural Indonesia presented with
flu symptoms and developed severe respiratory distress over
the course of the last 2 days. She required intubation. The
history reveals that she shares her living quarters with her
parents and her three siblings. At night the family’s chicken
stock shares this room too and several chickens had died
unexpectedly a few days before the girl fell sick.
Potential interventions: antivirals, such as neuraminidase
inhibitors oseltamivir and zanamivir
What are examples of:
• Background questions
• Foreground questions
•Population:
•Intervention:
•Comparison:
•Outcomes:
Framing a foreground question
Population:
Avian Flu/influenza A (H5N1) patients
Intervention: Oseltamivir (or Zanamivir)
Comparison: No pharmacological intervention
Outcomes:
Mortality, hospitalizations,
resource use, adverse outcomes,
antimicrobial resistance
Schunemann, Hill et al., The Lancet ID,
2007
Choosing outcomes
• Every decision comes with desirable and
undesirable consequences
Developing recommendations must include a
consideration of desirable and undesirable
outcomes
 Outcomes should be patient important
outcomes.
Choosing outcomes
• desirable outcomes
– lower mortality
– reduced hospital stay
– reduced duration of disease
– reduced resource expenditure
• undesirable outcomes
– adverse reactions
– the development of resistance
– costs of treatment
Choosing outcomes
 What if what is important is not measured?
 What if what is measured is not important?
 How do we make sure we’ve covered all
important outcomes?
Relative importance of outcomes
• Decision makers (and guideline
authors) need to consider the
relative importance of outcomes
when balancing these outcomes to
make a recommendation
• Relative importance varies across
populations
• Relative importance may vary across
patient groups within the same
population
• When considered critical - evaluate
Relative importance of outcomes
9, 8, 7: Critical for decision making
6, 5, 4: Important, but not critical for decision making
3, 2, 1: Of low importance
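The 1 to 9 rating scale above translates directly into a lookup. The function name `importance_category` is hypothetical; the category boundaries are the ones shown on the slide.

```python
def importance_category(rating: int) -> str:
    """Map a 1 to 9 importance rating to its decision-making category."""
    if not 1 <= rating <= 9:
        raise ValueError("rating must be between 1 and 9")
    if rating >= 7:
        return "Critical for decision making"
    if rating >= 4:
        return "Important, but not critical for decision making"
    return "Of low importance"


for rating in (9, 6, 2):
    print(rating, importance_category(rating))
```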
Using GRADEpro
Creating a new GRADEpro
file
Profile groups
Profiles
Managing outcomes
Content
• Quality of evidence
• Going from evidence to recommendations
Healthcare problem
recommendation
Strength of recommendation
“The strength of a recommendation reflects
the extent to which we can, across the range
of patients for whom the recommendations
are intended, be confident that desirable
effects of a management strategy outweigh
undesirable effects.”
• Strong or conditional
Strength of recommendation
The degree of confidence that the desirable
effects of adherence to a recommendation
outweigh the undesirable effects.
Desirable effects
•health benefits
•less burden
•savings
Undesirable effects
•harms
•more burden
•costs
Determinants of the strength of
recommendation
Factors that can strengthen a recommendation (with comment)
• Quality of the evidence: The higher the quality of evidence, the
more likely is a strong recommendation.
• Balance between desirable and undesirable effects: The larger the
difference between the desirable and undesirable consequences, the
more likely a strong recommendation is warranted. The smaller the net
benefit and the lower the certainty for that benefit, the more likely a
weak recommendation is warranted.
• Values and preferences: The greater the variability in values and
preferences, or uncertainty in values and preferences, the more likely
a weak recommendation is warranted.
• Costs (resource allocation): The higher the costs of an intervention
– that is, the more resources consumed – the less likely a strong
recommendation is warranted.
Balancing benefits and downsides
For: ↑ herd immunity, ↓ Morbidity, ↓ Death, ↑ QoL
Against: ↑ Resources, ↑ Allergic reactions, ↑ Nausea, ↑ Local skin reactions
Strength: Conditional or Strong
Implications of
a strong recommendation
• Policy makers: The recommendation can
be adapted as a policy in most situations
• Patients: Most people in this situation
would want the recommended course of
action and only a small proportion would
not
• Clinicians: Most patients should receive
the recommended course of action
Implications of
a conditional recommendation
• Policy makers: There is a need for
substantial debate and involvement of
stakeholders
• Patients: The majority of people in this
situation would want the recommended
course of action, but many would not
• Clinicians: Be more prepared to help
patients to make a decision that is
consistent with their own values/decision
aids and shared decision making
Case scenario
A 13-year-old girl who lives in rural
Indonesia presented with flu symptoms and
developed severe respiratory distress over
the course of the last 2 days. She required
intubation. The history reveals that she
shares her living quarters with her parents
and her three siblings. At night the family’s
chickens share this room too, and
several chickens had died unexpectedly a few
days before the girl fell sick.
Methods – WHO Rapid Advice Guidelines for Avian Flu
• Applied findings of a recent systematic evaluation of guideline development for WHO/ACHR
• Group composition (including panel of 13 voting members):
– clinicians who treated influenza A(H5N1) patients
– infectious disease experts
– basic scientists
– public health officers
– methodologists
• Independent scientific reviewers:
• Identified systematic reviews, recent RCTs, case series, animal studies related to H5N1 infection
Oseltamivir for Avian Flu
Summary of findings:
• No clinical trial of oseltamivir for treatment of
H5N1 patients.
• 4 systematic reviews and health technology
assessments (HTA) reporting on 5 studies of
oseltamivir in seasonal influenza.
– Hospitalization: OR 0.22 (0.02 – 2.16)
– Pneumonia: OR 0.15 (0.03 - 0.69)
• 3 published case series.
• Many in vitro and animal studies.
• No alternative that was more promising at
present.
• Cost: $40 per treatment course
From evidence to recommendation
Factors that can strengthen a recommendation, with comments:
• Quality of the evidence: Very low quality evidence
• Balance between desirable and undesirable effects: Uncertain, but small reduction in relative risk still leads to large absolute effect
• Values and preferences: Little variability and clear
• Costs (resource allocation): Low cost under non-pandemic conditions
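The comment on the balance of effects (a small relative reduction can still give a large absolute effect) depends on the baseline risk. The following is a minimal sketch of that arithmetic, not part of the guideline itself: the odds ratio of 0.8 and both baseline risks are hypothetical values chosen only for illustration, and the function names are invented here.

```python
def risk_with_treatment(baseline_risk: float, odds_ratio: float) -> float:
    """Convert a baseline (untreated) risk and an odds ratio into the treated risk."""
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must be strictly between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    treated_odds = baseline_odds * odds_ratio
    return treated_odds / (1.0 + treated_odds)


def absolute_risk_reduction(baseline_risk: float, odds_ratio: float) -> float:
    """Absolute risk reduction: baseline risk minus treated risk."""
    return baseline_risk - risk_with_treatment(baseline_risk, odds_ratio)


# Hypothetical inputs: the same modest odds ratio applied to a low-risk
# and a high-risk population.
HYPOTHETICAL_OR = 0.8
for label, baseline in [("low baseline risk", 0.01), ("high baseline risk", 0.50)]:
    arr = absolute_risk_reduction(baseline, HYPOTHETICAL_OR)
    print(f"{label}: {arr * 1000:.1f} fewer events per 1000 treated")
```

With a high case fatality, as in H5N1 infection, the same relative effect translates into many more events avoided per 1000 patients than it would in a low-risk condition.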
Complex data & decisions: yes/no?
Recommendation
The Guidelines Group recommends that
fluoroquinolones are / not used in the
treatment of all patients with MDR
(Strong (conditional) recommendation /
low (moderate, high) grade of evidence)
Recommendation: In women with histologically confirmed CIN, the expert panel recommends/suggests cryotherapy/LEEP over cryotherapy/LEEP.
Population: Women with histologically confirmed CIN
Intervention: Cryotherapy versus LEEP

Factor: High or moderate evidence (is there high or moderate quality evidence?)
The higher the quality of evidence, the more likely is a strong recommendation.
Decision: Yes / No
Explanation: There is moderate quality evidence from both randomised and observational controlled studies for recurrence rates. However, there is low quality evidence for other outcomes which were considered critical and important for decision making (e.g., severe adverse events, cervical cancer). There is uncertainty for fertility and other obstetrical outcomes, and HIV acquisition/transmission was not measured.

Factor: Certainty about the balance of benefits versus harms and burdens (is there certainty?)
The larger the difference between the desirable and undesirable consequences and the certainty around that difference, the more likely a strong recommendation. The smaller the net benefit and the lower the certainty for that benefit, the more likely is a conditional/weak recommendation.
Decision: Yes / No. Benefits of LEEP were greater, and harms were fewer or similar
Explanation:
• Recurrence rates of CIN I, CIN II-III and all CINs are probably greater with cryotherapy
– CIN II-III, OR 3.3 (1.04 to 10.46)
– CIN I, OR 2.74 (0.62 to 12.07)
– All CIN, OR 2.14 (1.05 to 4.33)
• Cryotherapy may be less acceptable to patients than LEEP
• There may be little difference in serious adverse events between cryotherapy and LEEP, but there may be fewer minor adverse events (such as pain) with cryotherapy
• It is unclear whether there is a difference in fertility/obstetric outcomes

Factor: Certainty in or similar values (is there certainty or similarity?)
The more certainty or similarity in values and preferences, the more likely a strong recommendation.
Decision: Yes / No. Similar values across women
Explanation:
• High value was placed on CIN recurrence, serious adverse events and acceptability to the patient
• Low value was placed on minor adverse events

Factor: Resource implications (are the resources worth the intervention?)
The lower the cost of an intervention compared to the alternative that is considered and other costs related to the decision (that is, the fewer resources consumed), the more likely is a strong recommendation.
Decision: Yes / No. More resources required for LEEP
Explanation:
• Need for more skilled providers to perform LEEP
• Need for more or expensive equipment/supplies for LEEP; electrical supply for LEEP
• Need for local anaesthesia with LEEP

Overall strength of recommendation: Conditional
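The odds ratios reported for CIN recurrence can be sanity-checked with a few lines of arithmetic. The sketch below assumes the intervals are 95% confidence intervals that are symmetric on the log scale (the usual Wald construction); the source does not state how they were computed, so that is an assumption, and the function names are invented for illustration.

```python
import math


def implied_point_estimate(lower: float, upper: float) -> float:
    """Geometric mean of the CI bounds; equals the OR if the CI is log-symmetric."""
    return math.sqrt(lower * upper)


def log_or_standard_error(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error of log(OR) implied by a CI of half-width z standard errors."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


# Values as reported on the slide: (OR, lower bound, upper bound).
reported = {
    "CIN II-III": (3.3, 1.04, 10.46),
    "CIN I": (2.74, 0.62, 12.07),
    "All CIN": (2.14, 1.05, 4.33),
}

for outcome, (odds_ratio, lower, upper) in reported.items():
    implied = implied_point_estimate(lower, upper)
    se = log_or_standard_error(lower, upper)
    includes_one = lower <= 1.0 <= upper  # interval compatible with no difference
    print(f"{outcome}: reported OR {odds_ratio}, implied OR {implied:.2f}, "
          f"SE(log OR) {se:.2f}, CI includes 1: {includes_one}")
```

Wide intervals such as these, and the CIN I interval in particular since it includes 1, indicate imprecision, which is one of the reasons the quality of evidence for an outcome may be graded down.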
Example: Oseltamivir for Avian Flu
Recommendation: In patients with confirmed or
strongly suspected infection with avian influenza A
(H5N1) virus, clinicians should administer
oseltamivir treatment as soon as possible (strong
recommendation, very low quality evidence).
Remarks: This recommendation places a high value
on the prevention of death in an illness with a high
case fatality. It places relatively low values on
adverse reactions, the development of resistance
and costs of treatment.
Schunemann et al. The Lancet ID, 2007
Issues in guideline development
for immunization
• Causation versus effects of intervention
– Causation not equivalent to efficacy of interventions
– Bradford Hill
• Nearly half a century old – tablet from the mountain?
• Harms caused by interventions
– Assumption is that removal of vaccine (or no
exposure) leads to NO adverse effects
• How confident can one be that removal of the
exposure is effective in preventing disease?
– Whether immunization or environmental factors: will
depend on the intervention to remove exposure
Current state of recommendations
• Reviewed 7527 recommendations
– 1275 randomly selected
• Inconsistency across/within
• 31.6% did not present recommendations clearly
– Most of them not written as executable actions
• 52.7% did not indicate strength
Recommendation
• The Guideline Group recommends rapid DST
testing for resistance to INH and RIF or RIF alone
over conventional testing or no testing at the
time of diagnosis of TB (conditional, low
quality evidence).
• Values and preferences: A high value was placed
on outcomes such as preventing death and
transmission of MDR as a result of delayed
diagnosis as well as avoiding spending resources.
Group composition
• Group composition might affect
recommendation
• Common principle: include all affected by the recommendations
(multi-disciplinary groups incl. patients/carers) – Industry?
• Keep a manageable size
The Process:
How to make it constructive?
• Group members are heterogeneous and might have different
objectives
• Chair facilitates rather than leads the group
• Common understanding of goal, tasks and ground rules
• Similar level of required knowhow and skills
• Sufficient technical support
Balanced participation and
formal agreement
• Key task of chair
• Formal consensus processes
– Delphi Method
– Nominal group process
– Voting
Group processes
How to present controversies
• Lay out the controversies
• Describe the evidence
• Ask members to focus on the agreed upon
evidence and the factors leading to a decision
• Ask whether there still is disagreement
• Vote
– Make voting explicit and transparent (ways of
doing this to come tomorrow)
Conclusions - Process
• Success depends on strong chair(s), training of group, good
facilitation and technical support
– Clinical and methods co-chairs
• Formal consensus developing methods might support
agreement on recommendations
– Voting represents forced consensus
• Guideline development will require sufficient resources.
GRADE Grid
[Figure: overview of the GRADE process, from question to recommendation]
• Question: P, I, C, O
• Outcomes rated as critical, important or not important
• Systematic review, giving a summary of findings & estimate of effect for each outcome
• Quality of evidence for each outcome: High, Moderate, Low or Very low
– Randomization increases initial quality
– Grade down: 1. Risk of bias 2. Inconsistency 3. Indirectness 4. Imprecision 5. Publication bias
– Grade up: 1. Large effect 2. Dose response 3. Confounders
• Grade overall quality of evidence across outcomes based on lowest quality of critical outcomes
Guideline development
Formulate recommendations:
• For or against (direction)
• Strong or conditional (strength)
By considering:
• Quality of evidence
• Balance benefits/harms
• Values and preferences
(Revise by considering:)
• Resource use (cost)
• “We recommend using…/should”
• “We suggest using…/might”
• “We recommend against using…/should not”
• “We suggest against using…/might not”
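The rating steps in the grid (initial quality set by randomization, grading down and up, overall quality taken from the lowest-rated critical outcome) can be sketched in code. This is only an illustration of the bookkeeping, since GRADE relies on judgment rather than a formula. The class and function names are invented here, the example outcomes are hypothetical, and starting observational evidence at "low" follows common GRADE practice rather than anything stated on the slide.

```python
from dataclasses import dataclass

# Quality levels ordered from lowest to highest.
LEVELS = ["very low", "low", "moderate", "high"]


@dataclass
class Outcome:
    name: str
    randomized: bool    # evidence from randomized trials?
    critical: bool      # rated critical for decision making?
    downgrades: int = 0  # levels removed (risk of bias, inconsistency,
                         # indirectness, imprecision, publication bias)
    upgrades: int = 0    # levels added (large effect, dose response, confounders)


def rate_outcome(outcome: Outcome) -> str:
    """Return the quality level for one outcome, clamped to the four categories."""
    start = LEVELS.index("high") if outcome.randomized else LEVELS.index("low")
    index = start - outcome.downgrades + outcome.upgrades
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]


def overall_quality(outcomes: list[Outcome]) -> str:
    """Overall quality is the lowest quality among the critical outcomes."""
    critical = [o for o in outcomes if o.critical]
    if not critical:
        raise ValueError("at least one critical outcome is required")
    return min((rate_outcome(o) for o in critical), key=LEVELS.index)


# Hypothetical outcomes, for illustration only.
example = [
    Outcome("mortality", randomized=True, critical=True, downgrades=1),
    Outcome("serious adverse events", randomized=False, critical=True),
    Outcome("minor adverse events", randomized=True, critical=False),
]
for o in example:
    print(f"{o.name}: {rate_outcome(o)}")
print("overall:", overall_quality(example))
```

Keeping the per-outcome ratings separate from the overall rating mirrors the grid: each outcome is judged on its own, and only the critical outcomes determine the overall quality.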
Conclusions
• WHO guidelines should be based on the best available evidence to be evidence based
• GRADE is the approach used by WHO and gaining acceptance internationally
– Combines what is known in health research methodology and provides a structured approach to improve communication
– Does not avoid judgments but provides framework
– Criteria for evidence assessment across questions and outcomes
– Criteria for moving from evidence to recommendations
– Transparent, systematic
– Four categories of quality of evidence
– Two grades for strength of recommendations
• Transparency in decision making and judgments is key
Format
• Mix of seminars/interactive lectures, self
directed learning and simulation
– Large group and smaller group discussion
– Computer work
• Simulate guideline panel work
• Select rapporteur (both for large group and
any small group work)
Format
• Mix of seminars/interactive lectures, self
directed learning and simulation
– Large group and smaller group discussion
– Computer work?
• Simulate guideline panel work
• Select rapporteur (for any small group work)