
Understanding GRADE: an introduction

Journal of Evidence-Based Medicine ISSN 1756-5391
Gabrielle Goldet and Jeremy Howick
Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
Keywords
Evidence-based medicine; GRADE; randomized controlled trial; systematic
Correspondence
Jeremy Howick, Department of Primary Care
Health Sciences, New Radcliffe House,
University of Oxford, Oxford OX2 2GG, UK.
Tel: 01865 289 258
Fax: 01865 289 287
Email: jeremy.howick@phc.ox.ac.uk
Financial aid
J Howick is funded by the National Institute
of Health (UK). G Goldet received no funding
for this work.
Author contributions
This was a collaborative effort. J Howick
(guarantor) wrote the first draft, and the
paper was developed in meetings and
correspondence between J Howick and G Goldet.
Objective: Grading of recommendations, assessment, development, and evaluations (GRADE) is arguably the most widely used method for appraising studies to
be included in systematic reviews and guidelines. In order to use the GRADE system or know how to interpret it when reading reviews, reading several articles and
attending a workshop are required. Moreover, the GRADE system is not covered
in standard medical textbooks. Here, we explain GRADE concisely with the use of
examples so that students and other researchers can understand it.
Background: In order to use or interpret the GRADE system, reading several
articles and attending a workshop is currently required. Moreover, the GRADE
system is not covered in standard medical textbooks.
Methods: We read, synthesized, and digested the GRADE publications and contacted GRADE contributors for explanations where required. We then composed a concise digest of the system that a general medical audience could understand.
Results: We were able to explain the GRADE basics clearly and completely in
under 1500 words.
Conclusions: While advanced critical appraisal requires judgment, training, and
practice, it is possible for a non-specialist to grasp GRADE basics very quickly.
Conflict of interest
The authors declare that they have no
conflicts of interest in the research.
Received 24 November 2012; accepted for
publication 11 January 2013.
doi: 10.1111/jebm.12018
The Grading of recommendations, assessment, development,
and evaluations (GRADE) system is emerging as the dominant method for appraising controlled studies and making recommendations for systematic reviews and guidelines
(1–12). Reading the series of publications dedicated to explaining GRADE (2–12) and perhaps attending a workshop
are usually required to grasp the GRADE system in sufficient detail to be able to use it for critical appraisal. There
are no concise explanations of GRADE intended for a general medical audience and medical textbooks do not include
explanations of GRADE. GRADE may thus elude students,
doctors, and policy makers who do not have the time to study
the literature or attend the workshops. Yet anyone reading
a systematic review or evaluating guideline recommendations needs to understand GRADE in order to appraise the
review or guideline. Taking all appraisal skills out of the
hands of clinicians and placing them in the hands of expert reviewers is
undesirable from an Evidence-Based Medicine perspective.
Therefore, a clear, brief, and accurate description of GRADE
is required.
© 2013 Wiley Publishing Asia Pty Ltd and Chinese Cochrane Center, West China Hospital of Sichuan University
JEBM 6 (2013) 50–54 G. Goldet and J. Howick
What is GRADE?
GRADE is a method used by systematic reviewers and guideline developers to assess the quality of evidence and decide whether to recommend an intervention (1). GRADE differs from other appraisal tools in three ways: (i) it separates quality of evidence from strength of recommendation, (ii) the quality of evidence is assessed for each outcome, and (iii) observational studies can be ‘upgraded’ if they meet certain criteria.
The GRADE method involves five distinct steps that we explain here with examples.
Step 1
Assign an a priori ranking of ‘high’ to randomized controlled trials and ‘low’ to observational studies. Randomized controlled trials are initially assigned a higher grade because they are usually less prone to bias than observational studies (13, 14).
Step 2
‘Downgrade’ or ‘upgrade’ initial ranking. It is common for randomized controlled trials and observational studies to be downgraded because they suffer from identifiable bias (15). Also, observational studies can be upgraded when multiple high-quality studies show consistent results (16–19).
Reasons to ‘downgrade’
• Risk of bias
  - Lack of clearly randomized allocation sequence
  - Lack of blinding
  - Lack of allocation concealment
  - Failure to adhere to intention-to-treat analysis
  - Trial is cut short
  - Large losses to follow-up
There is strong evidence supporting the view that lack of randomization, lack of allocation concealment, and lack of blinding bias results (20). Intention-to-treat analysis is also important to avoid bias arising when those dropping out of trials due to harmful effects are not accounted for (21). Similarly, interim results from trials that are cut short often lead to overestimated effect sizes (22). Large losses to follow-up also lead to exaggerated effect estimates. For example, in a randomized controlled trial comparing weight loss programs in obese participants, 27% of participants were lost to follow-up in a year. This putatively biases the study toward a positive result, because participants benefiting from the treatment were more likely to stay in the trial (23).
• Inconsistency when there is significant and unexplained variability in results from different trials. For example, a meta-analysis of early pharmacotherapy for prevention of type 2 diabetes identified heterogeneity in the results from different trials and could not assert whether treatment benefits outweighed the harms (24).
• Indirectness of evidence, which can take several forms:
  - An indirect comparison of two drugs. If there are no trials comparing drugs A and B directly, we infer a comparison based on separate trials comparing drug A with placebo and drug B with placebo. However, the validity of such an inference depends on the often unjustified assumption that the two trial populations were similar.
  - An indirect comparison of population, outcome, or intervention: for example, the studies in the review investigated one intervention in one population with a certain outcome, but the conclusions of the studies are intended to apply to related interventions, populations, or outcomes. For example, the American College of Chest Physicians (ACCP) downgraded the quality of the evidence for compression stocking use in trauma patients from high to moderate because all the randomized controlled trials had been performed on the general population.
• Imprecision when wide confidence intervals mar the quality of the data.
• Publication bias when studies with ‘negative’ findings remain unpublished. This can bias the review outcome (26). For example, Turner et al examined all antidepressant trials registered by the US Food and Drug Administration. Of 38 trials with positive results, all but one were published, whereas of the 36 trials with negative results, 22 were unpublished (27). A systematic review of the published studies alone would therefore give a biased result.
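The size of the distortion in the Turner et al example can be checked with simple arithmetic. The sketch below is only an illustration: it uses the counts quoted above and nothing else, and the variable names are our own.

```python
# Counts as quoted in the text for the antidepressant trials (27).
positive_total, positive_published = 38, 37
negative_total, negative_unpublished = 36, 22
negative_published = negative_total - negative_unpublished  # 14 trials

registered = positive_total + negative_total          # all registered trials
published = positive_published + negative_published   # trials a reviewer would find

share_positive_registered = positive_total / registered
share_positive_published = positive_published / published

print(f"Positive among all registered trials: {share_positive_registered:.0%}")
print(f"Positive among published trials:      {share_positive_published:.0%}")
```

On these counts roughly half of the registered trials were positive, whereas nearly three quarters of the published trials were, which is why a review restricted to published studies overstates the benefit.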
Reasons to ‘upgrade’
• Large effect when the effect is so large that bias common to observational studies cannot possibly account for the result. For example, observational studies examining the effect of antithrombotic therapy on thromboembolic events in patients who had undergone cardiac valve replacement revealed an 80% relative risk reduction, which was used by the ACCP to rank the quality of evidence as high.
• Dose–response relationship when the result is proportional to the degree of exposure. For example, the Nurses’ Health Study demonstrated a strong negative correlation between physical activity and stroke risk; that is, the more physically active the nurses were, the smaller the risk of stroke (29).
• All plausible biases only reducing an apparent treatment effect when all possible confounders would only diminish the observed effect. It is thus likely that the actual effect is larger than the data suggest. For example, a systematic review of studies in 26,000 hospitals found mortality was higher in for-profit private hospitals compared to non-profit public hospitals. Investigators suspected teaching hospitals might have confounded the study because they are non-profit and were suspected to have low mortality. When teaching hospitals were removed from the analysis, however, the relative mortality of for-profit hospitals actually increased (30).
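For readers unfamiliar with the term, a relative risk reduction such as the 80% figure quoted under ‘Large effect’ is one minus the relative risk. The sketch below shows the calculation; the event counts are hypothetical, chosen only to reproduce an 80% reduction, and do not come from the studies cited.

```python
from fractions import Fraction


def relative_risk_reduction(events_treated, n_treated, events_control, n_control):
    """Return (relative risk, relative risk reduction) as exact fractions."""
    risk_treated = Fraction(events_treated, n_treated)
    risk_control = Fraction(events_control, n_control)
    relative_risk = risk_treated / risk_control
    return relative_risk, 1 - relative_risk


# Hypothetical counts: 20 events per 1000 treated vs 100 events per 1000 untreated.
rr, rrr = relative_risk_reduction(20, 1000, 100, 1000)
print(f"RR = {float(rr):.2f}, RRR = {float(rrr):.0%}")
```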
Step 3
Assign final grade for the quality of evidence as ‘high’,
‘moderate’, ‘low’, or ‘very low’ for all the critically important outcomes (Box 1) (31). Upgrading and downgrading
require careful judgment and thought. A rule of thumb is to
move up or down one category per issue. For example, the
quality of evidence for an outcome studied in randomized
controlled trials that initially starts as ‘high’ might move
to ‘moderate’ because of high risk of bias in the included
studies, and further down to ‘low’ if there was significant
unexplained heterogeneity between the trials, and even further down to ‘very low’ if there were few events leading to
confidence intervals including both important benefits and harms.
Box 1 Final GRADE ranking
High: we are very confident that the effect in the study reflects the actual effect.
Moderate: we are quite confident that the effect in the study is close to the true effect, but it is also possible it is substantially different.
Low: the true effect may differ significantly from the estimate.
Very low: the true effect is likely to be substantially different from the estimated effect.
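Steps 1 to 3 can be summarized as a small sketch for a single outcome. It encodes only the rule of thumb described in the text (one category per issue) and is not a substitute for the judgment GRADE requires; the function name `grade_quality` and its parameters are our own illustrative choices, not part of GRADE.

```python
LEVELS = ["very low", "low", "moderate", "high"]


def grade_quality(study_design, downgrades=(), upgrades=()):
    """Rate the quality of evidence for one outcome.

    study_design: "rct" or "observational" (Step 1, a priori ranking).
    downgrades / upgrades: reasons identified in Step 2; by the rule of
    thumb, each reason moves the rating one category.
    """
    start = {"rct": "high", "observational": "low"}[study_design]
    index = LEVELS.index(start) - len(downgrades) + len(upgrades)
    index = max(0, min(index, len(LEVELS) - 1))  # stay within the four categories
    return LEVELS[index]


# The worked example from Step 3: trials start 'high' and lose one
# category each for risk of bias, inconsistency, and imprecision.
print(grade_quality("rct", downgrades=["risk of bias", "inconsistency", "imprecision"]))
# An observational outcome with a dose-response relationship.
print(grade_quality("observational", upgrades=["dose-response"]))
```

The first call prints "very low" and the second prints "moderate". Because the rating is made per outcome, the function would be called once for each critically important outcome.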
[Figure 1, steps 1 to 3. STEP 1: a priori ranking (trial: HIGH; study: LOW). Downgrade for: risk of bias, publication bias. Upgrade for: large consistent effect, dose response, confounders only reducing size of effect. Assign final grade.]
Step 4
Consider other factors that impact on the strength of recommendation for a course of action. High-quality evidence
does not always imply a strong recommendation. Recommendations must consider factors besides the quality of
evidence (3). The first factor to consider is the balance between desirable and undesirable effects. Some interventions,
such as antibiotics to prevent urinary infections associated
with certain urologic procedures, have clear benefits and few
side effects and therefore a strong recommendation is uncontroversial (32, 33). In cases where the benefit to harm ratio
is less clear, then patient values and preferences, as well as
costs, need to be considered carefully (Figure 1).
Step 5
Make a ‘strong’ or ‘weak’ recommendation (34). When
the net benefit of the treatment is clear, patient values and
circumstances are unlikely to affect the decision to take a
treatment and a strong recommendation is appropriate. For
example, high-dose chemotherapy for testicular cancer is
often highly recommended for the increased likelihood of
survival in spite of chemotherapy’s toxicity (32, 35). Otherwise, if the evidence is weak or the balance of positive
and negative effects is vague, a weak recommendation is
appropriate. Consider the following example (borrowed from
the GRADE literature) concerning whether to recommend
screening for melanoma in primary care in the United States
(36, 37). The baseline risk for this population is 13.3 per
100,000, and there is weak evidence for the accuracy of
screening and the outcome of lethal melanoma; evidence on potential
harms from screening is lacking, yet these harms are likely to include the consequences of false positive tests (38). Based on
this lack of strong evidence, the recommendation to screen
or not to screen will not be strong, and may require input from patients, friends, and family members about the
different treatment options. Some people might choose to recommend not screening because of the lack of strong evidence for benefit and the potential for harm. Others might recommend screening because they value the potential benefits more highly (34).
[Figure 1, steps 4 and 5. STEP 4: consider factors affecting the recommendation (balance of desirable and undesirable effects; preferences). STEP 5: make recommendation (strong for using; weak for using; strong against using; weak against using).]
Figure 1 How GRADE is used to make recommendations; steps 1 to 3 are repeated for each critical outcome.
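Steps 4 and 5 can likewise be sketched as a simple mapping onto the four recommendation categories. This is a deliberate simplification under stated assumptions: the function `recommend`, its three inputs, and the choice to treat ‘low’ and ‘very low’ quality as "weak evidence" are ours, and in practice the decision rests on judgment about values, preferences, and costs rather than on a formula.

```python
def recommend(favours_intervention, net_balance_clear, quality):
    """Return one of the four recommendation categories.

    favours_intervention: True if desirable effects appear to outweigh
        undesirable ones.
    net_balance_clear: True if that balance remains clear after values,
        preferences, and costs are considered (Step 4).
    quality: final rating from Step 3 ("high", "moderate", "low", "very low").
    """
    # Assumption: 'low' and 'very low' quality count as weak evidence.
    strong = net_balance_clear and quality in ("high", "moderate")
    strength = "Strong" if strong else "Weak"
    direction = "for" if favours_intervention else "against"
    return f"{strength} {direction} using"


# Clear net benefit supported by good evidence.
print(recommend(True, True, "high"))
# Melanoma screening: weak evidence and an unclear balance, so the
# recommendation is weak whichever direction a panel chooses.
print(recommend(True, False, "low"))
print(recommend(False, False, "low"))
```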
Conclusion
GRADE is becoming increasingly popular for guideline developers and systematic reviewers. It is thus important for students, clinicians, and policy makers to understand GRADE. Advanced critical appraisal (using GRADE or any other system) requires sound judgment, training, and practice. However, it is possible to grasp GRADE basics using this concise and accurate summary.
Acknowledgments
We are very grateful to Gunn Vist, who provided expert
advice on the GRADE system and editorial suggestions.
References
1. Guyatt GH, Oxman AD, Vist GE, et al. GRADE: an emerging
consensus on rating quality of evidence and strength of
recommendations. BMJ 2008; 336(7650): 924−6.
2. Schünemann HJ, Oxman AD, Vist GE, et al. Interpreting results
and drawing conclusions. Cochrane Handbook for Systematic
Reviews of Interventions 2008; 359−87.
3. Balshem H, Helfand M, Schünemann HJ, et al. GRADE
guidelines: 3. Rating the quality of evidence. J Clin Epidemiol
2011; 64(4): 401−6.
4. Guyatt G, Oxman AD, Akl EA, et al. GRADE guidelines: 1.
Introduction-GRADE evidence profiles and summary of findings
tables. J Clin Epidemiol 2011; 64(4): 383−94.
5. Guyatt GH, Oxman AD, Kunz R, et al. GRADE guidelines: 2.
Framing the question and deciding on important outcomes. J
Clin Epidemiol 2011; 64(4): 395−400.
6. Guyatt GH, Oxman AD, Kunz R, et al. GRADE guidelines 6.
Rating the quality of evidence—imprecision. J Clin Epidemiol
2011; 64(12): 1283−93.
7. Guyatt GH, Oxman AD, Kunz R, et al. GRADE guidelines: 7.
Rating the quality of evidence—inconsistency. J Clin Epidemiol
2011; 64(12): 1294−302.
8. Guyatt GH, Oxman AD, Montori V, et al. GRADE guidelines:
5. Rating the quality of evidence—publication bias. J Clin
Epidemiol 2011; 64(12): 1277−82.
9. Guyatt GH, Oxman AD, Schünemann HJ, Tugwell P, Knottnerus
A. GRADE guidelines: a new series of articles in the Journal of
Clinical Epidemiology. J Clin Epidemiol 2011; 64(4): 380−2.
10. Guyatt GH, Oxman AD, Sultan S, et al. GRADE guidelines: 9.
Rating up the quality of evidence. J Clin Epidemiol 2011;
64(12): 1311−6.
11. Guyatt GH, Oxman AD, Vist G, et al. GRADE guidelines: 4.
Rating the quality of evidence—study limitations (risk of bias). J
Clin Epidemiol 2011; 64(4): 407−15.
12. Guyatt GH, Oxman AD, Kunz R, et al. GRADE guidelines: 8.
Rating the quality of evidence—indirectness. J Clin Epidemiol
2011; 64(12): 1303−10.
13. Odgaard-Jensen J, Vist GE, Timmer A, et al. Randomisation to
protect against selection bias in healthcare trials. Cochrane
database of systematic reviews 2011; (4): MR000012.
14. Howick J. The Philosophy of Evidence-Based Medicine. Oxford:
Wiley-Blackwell, 2011.
15. Ioannidis JP. Why most published research findings are false.
PLoS Med 2005; 2(8): e124.
16. Doll R, Hill AB. Smoking and carcinoma of the lung;
preliminary report. BMJ 1950; 2(4682): 739−48.
17. Doll R, Hill AB. A study of the aetiology of carcinoma of the
lung. Pak J Health 1953; 3(2): 65−94.
18. Doll R, Hill AB. The mortality of doctors in relation to their
smoking habits; a preliminary report. BMJ 1954; 1(4877):
19. Doll R, Hill AB. Lung cancer and other causes of death in
relation to smoking; a second report on the mortality of British
doctors. BMJ 1956; 2(5001): 1071−81.
20. Savović J, Jones HE, Altman DG, et al. Influence of reported
study design characteristics on intervention effect estimates from
randomized, controlled trials. Ann Intern Med 2012; 157(6):
21. Montori VM, Guyatt GH. Intention-to-treat principle. CMAJ
2001; 165(10): 1339−41.
22. Guyatt GH, Briel M, Glasziou P, Bassler D, Montori VM.
Problems of stopping trials early. BMJ 2012; 344:
23. Heshka S, Anderson JW, Atkinson RL, et al. Weight loss with
self-help compared with a structured commercial program: a
randomized trial. JAMA 2003; 289(14): 1792−8.
24. Krentz AJ. Some oral antidiabetic medications may prevent type
2 diabetes in people at high risk for it, but the evidence is
insufficient to determine whether the potential benefits outweigh
the possible risks. Evid Based Med 2011; 28:
25. Guyatt G, Gutterman D, Baumann MH, et al. Grading strength
of recommendations and quality of evidence in clinical
guidelines: report from an American College of Chest Physicians
task force. Chest 2006; 129(1): 174−81.
26. Scherer RW, Langenberg P, von Elm E. Full publication
of results initially presented in abstracts. Cochrane Database of
Systematic Reviews 2007, Issue 2. Available from http://www.
27. Turner EH, Matthews AM, Linardatos E, Tell RA, Rosenthal R.
Selective publication of antidepressant trials and its influence on
apparent efficacy. N Engl J Med 2008; 358(3):
28. Hirsh J, Guyatt G, Albers GW, Harrington R, Schünemann HJ;
American College of Chest Physicians. Antithrombotic and
thrombolytic therapy: American College of Chest Physicians
Evidence-Based Clinical Practice Guidelines (8th Edition). Chest
2008; 133(6 Suppl): 110S−2S.
29. Hu FB, Stampfer MJ, Colditz GA, et al. Physical activity and
risk of stroke in women. JAMA 2000; 283(22):
30. Devereaux PJ, Choi PT, Lacchetti C, et al. A systematic review
and meta-analysis of studies comparing mortality rates of private
for-profit and private not-for-profit hospitals. CMAJ 2002;
166(11): 1399−406.
31. Guyatt GH, Oxman AD, Kunz R, et al. What is “quality of
evidence” and why is it important to clinicians? BMJ 2008;
336(7651): 995−8.
32. Canfield SE, Dahm P. Rating the quality of evidence and the
strength of recommendations using GRADE. World J Urol 2011;
29(3): 311−7.
33. Wolf JS, Bennett CJ, Dmochowski RR, et al. Best practice
policy statement on urologic surgery antimicrobial prophylaxis.
J Urol 2008; 179(4): 1379−90.
34. Guyatt GH, Oxman AD, Kunz R, et al. Going from evidence to
recommendations. BMJ 2008; 336(7652): 1049−51.
35. Feldman DR, Bosl GJ, Sheinfeld J, Motzer RJ. Medical
treatment of advanced testicular cancer. JAMA 2008; 299(6):
36. Atkins D, Best D, Briss PA, et al. Grading quality of evidence
and strength of recommendations. BMJ 2004; 328(7454):
37. Helfand M, Mahon S, Eden K. Screening for Skin Cancer. Am J
Prev Med 2001; 20(3): 47–58.
38. Evans I, Thornton H, Chalmers I. Testing Treatments: Better
Research for Better Healthcare. London: Pinter and Martin,