Journal of Evidence-Based Medicine ISSN 1756-5391

METHODOLOGY

Understanding GRADE: an introduction

Gabrielle Goldet and Jeremy Howick
Department of Primary Care Health Sciences, University of Oxford, Oxford, UK

Keywords: Evidence-based medicine; GRADE; randomized controlled trial; systematic review.

Correspondence: Jeremy Howick, Department of Primary Care Health Sciences, New Radcliffe House, University of Oxford, Oxford OX2 2GG, UK. Tel: 01865 289 258; Fax: 01865 289 287; Email: jeremy.howick@phc.ox.ac.uk

Financial aid: J Howick is funded by the National Institute of Health (UK). G Goldet received no funding for this work.

Author contributions: This was a collaborative effort. J Howick (guarantor) wrote the first draft, and the paper was developed in meetings and correspondence between J Howick and G Goldet.

Abstract
Objective: Grading of recommendations, assessment, development, and evaluations (GRADE) is arguably the most widely used method for appraising studies to be included in systematic reviews and guidelines. Here, we explain GRADE concisely, with examples, so that students and other researchers can understand it.
Background: In order to use or interpret the GRADE system, reading several articles and attending a workshop are currently required. Moreover, the GRADE system is not covered in standard medical textbooks.
Methods: We read, synthesized, and digested the GRADE publications and contacted GRADE contributors for explanations where required. We composed a digest of the system concise enough for a general medical audience to understand.
Results: We were able to explain the GRADE basics clearly and completely in under 1500 words.
Conclusions: While advanced critical appraisal requires judgment, training, and practice, it is possible for a non-specialist to grasp GRADE basics very quickly.

Conflict of interest: The authors declare that they have no conflicts of interest in the research.

Received 24 November 2012; accepted for publication 11 January 2013.
doi: 10.1111/jebm.12018

Introduction

The Grading of recommendations, assessment, development, and evaluations (GRADE) system is emerging as the dominant method for appraising controlled studies and making recommendations for systematic reviews and guidelines (1–12). Reading the series of publications dedicated to explaining GRADE (2–12), and perhaps attending a workshop, is usually required to grasp the GRADE system in sufficient detail to be able to use it for critical appraisal. There are no concise explanations of GRADE intended for a general medical audience, and medical textbooks do not include explanations of GRADE. GRADE may thus elude students, doctors, and policy makers who do not have the time to study the literature or attend the workshops. Yet anyone reading a systematic review or evaluating guideline recommendations needs to understand GRADE in order to appraise the review or guideline. Taking all appraisal skills out of the hands of clinicians and placing them in the hands of expert reviewers is undesirable from an Evidence-Based Medicine perspective. Therefore a clear, brief, and accurate description of GRADE is required.

What is GRADE?
GRADE is a method used by systematic reviewers and guideline developers to assess the quality of evidence and decide whether to recommend an intervention (1). GRADE differs from other appraisal tools for three reasons: (i) it separates the quality of evidence from the strength of recommendation; (ii) the quality of evidence is assessed for each outcome; and (iii) observational studies can be 'upgraded' if they meet certain criteria.

Using GRADE

The GRADE method involves five distinct steps, which we explain here with examples.

Step 1

Assign an a priori ranking of 'high' to randomized controlled trials and 'low' to observational studies. Randomized controlled trials are initially assigned a higher grade because they are usually less prone to bias than observational studies (13, 14).

Step 2

'Downgrade' or 'upgrade' the initial ranking. It is common for randomized controlled trials and observational studies to be downgraded because they suffer from identifiable bias (15). Observational studies can also be upgraded when multiple high-quality studies show consistent results (16–19).

Reasons to 'downgrade'

• Risk of bias
  ◦ Lack of a clearly randomized allocation sequence
  ◦ Lack of blinding
  ◦ Lack of allocation concealment
  ◦ Failure to adhere to intention-to-treat analysis
  ◦ Trial cut short
  ◦ Large losses to follow-up
There is strong evidence supporting the view that lack of randomization, lack of allocation concealment, and lack of blinding bias results (20). Intention-to-treat analysis is also important to avoid the bias that arises when those dropping out of trials because of harmful effects are not accounted for (21). Similarly, interim results from trials that are cut short often lead to overestimated effect sizes (22). Large losses to follow-up also lead to exaggerated effect estimates. For example, in a randomized controlled trial comparing weight loss programs in obese participants, 27% of participants were lost to follow-up within a year; a positive bias was probably conferred on this study, as participants benefiting from the treatment were more likely to stay in the trial (23).

• Inconsistency, when there is significant and unexplained variability in results from different trials. For example, a meta-analysis of early pharmacotherapy for prevention of type 2 diabetes identified heterogeneity in the results from different trials and could not assert whether treatment benefits outweighed the harms (24).

• Indirectness of evidence, which can refer to several things:
  ◦ An indirect comparison of two drugs. If there are no trials comparing drugs A and B directly, we infer a comparison based on separate trials comparing drug A with placebo and drug B with placebo. However, the validity of such an inference depends on the often unjustified assumption that the two trial populations were similar (a numerical sketch follows this list).
  ◦ An indirect comparison of population, outcome, or intervention, for example when the studies in the review investigated one intervention, in one population, with a certain outcome, but the conclusions are intended to apply to related interventions, populations, or outcomes. For example, the American College of Chest Physicians (ACCP) downgraded the quality of the evidence for compression stocking use in trauma patients from high to moderate because all the randomized controlled trials had been performed on the general population (25).

• Imprecision, when wide confidence intervals mar the quality of the data.

• Publication bias, when studies with 'negative' findings remain unpublished. This can bias the review outcome (26). For example, Turner et al examined all antidepressant trials registered with the US Food and Drug Administration: of 38 trials with positive results, all but one were published, whereas of the 36 trials with negative results, 22 were unpublished (27). A systematic review of the published studies alone would therefore give a biased result.
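The first kind of indirectness is easiest to appreciate with numbers. Below is a minimal Python sketch, with entirely hypothetical inputs, of a standard 'adjusted' indirect comparison; it is not part of GRADE itself, but it shows why such inferences are treated as indirect evidence: the A-versus-B effect is obtained by subtracting the two placebo-controlled effects on the log scale, the two standard errors combine, and the whole calculation rests on the assumption that the two trial populations are similar.

```python
import math

def indirect_comparison(log_rr_a, se_a, log_rr_b, se_b):
    """Adjusted indirect comparison of drug A vs drug B via a common
    comparator (placebo), on the log risk-ratio scale."""
    log_rr_ab = log_rr_a - log_rr_b              # inferred A vs B effect
    se_ab = math.sqrt(se_a ** 2 + se_b ** 2)     # uncertainties add
    lo, hi = log_rr_ab - 1.96 * se_ab, log_rr_ab + 1.96 * se_ab
    return math.exp(log_rr_ab), (math.exp(lo), math.exp(hi))

# Hypothetical inputs: A vs placebo RR 0.70 (SE 0.10 on the log scale);
# B vs placebo RR 0.85 (SE 0.12). Neither number comes from a real trial.
rr, ci = indirect_comparison(math.log(0.70), 0.10, math.log(0.85), 0.12)
print(f"Inferred A vs B: RR {rr:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```

Even though each placebo-controlled estimate may look precise on its own, the inferred comparison has a wider confidence interval (here crossing a risk ratio of 1), which is one reason GRADE treats such evidence as indirect.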
Reasons to 'upgrade'

• Large effect, when the effect is so large that the biases common to observational studies cannot plausibly account for the result. For example, observational studies examining the effect of antithrombotic therapy on thromboembolic events in patients who had undergone cardiac valve replacement revealed an 80% relative risk reduction, which the ACCP used to rank the quality of evidence as high (28).

• Dose–response relationship, when the result is proportional to the degree of exposure. For example, the Nurses' Health Study demonstrated a strong negative correlation between physical activity and stroke risk: the more physically active the nurses were, the smaller their risk of stroke (29).

• All plausible biases would only reduce the apparent treatment effect, that is, when all possible confounders would only diminish the observed effect, so the actual effect is likely to be larger than the data suggest. For example, a systematic review of studies in 26,000 hospitals found that mortality was higher in for-profit private hospitals than in non-profit public hospitals. Investigators suspected that teaching hospitals might have confounded the result because they are non-profit and were suspected to have low mortality. When teaching hospitals were removed from the analysis, however, the relative mortality of for-profit hospitals actually increased (30).

Step 3

Assign a final grade for the quality of evidence as 'high', 'moderate', 'low', or 'very low' for all the critically important outcomes (Box 1) (31). Upgrading and downgrading require careful judgment and thought. A rule of thumb is to move up or down one category per issue. For example, the quality of evidence for an outcome studied in randomized controlled trials starts as 'high'; it might move to 'moderate' because of a high risk of bias in the included studies, further down to 'low' if there is significant unexplained heterogeneity between the trials, and even further down to 'very low' if few events leave confidence intervals that include both important benefits and important harms. (A toy sketch of this bookkeeping follows Box 1.)

Box 1 Final GRADE ranking

High: we are very confident that the effect in the study reflects the actual effect.
Moderate: we are quite confident that the effect in the study is close to the true effect, but it is also possible that it is substantially different.
Low: the true effect may differ significantly from the estimate.
Very low: the true effect is likely to be substantially different from the estimated effect.
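The bookkeeping of steps 1 to 3 can be summarized in a few lines of code. The sketch below is only illustrative: it encodes the one-level-per-issue rule of thumb literally, whereas in practice a very serious concern may move a rating by more than one level, and deciding what counts as a serious concern remains a matter of judgment. The function and argument names are ours, not part of GRADE.

```python
# Toy sketch of the GRADE 'ladder' described in steps 1 to 3.
LEVELS = ["very low", "low", "moderate", "high"]

def grade_outcome(randomized, downgrade_reasons, upgrade_reasons=()):
    """Return a quality-of-evidence level for a single outcome."""
    # Step 1: a priori ranking
    level = LEVELS.index("high") if randomized else LEVELS.index("low")
    # Step 2: move down one level per serious concern ...
    level -= len(downgrade_reasons)
    # ... and, for observational studies, up one level per upgrading criterion
    if not randomized:
        level += len(upgrade_reasons)
    # Step 3: final grade, clamped to the four-point scale
    return LEVELS[max(0, min(level, len(LEVELS) - 1))]

# RCT evidence with risk of bias and unexplained inconsistency -> 'low'
print(grade_outcome(True, ["risk of bias", "inconsistency"]))
# Observational evidence with a very large effect -> 'moderate'
print(grade_outcome(False, [], ["large effect"]))
```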
Step 4

Consider other factors that affect the strength of the recommendation for a course of action. High-quality evidence does not always imply a strong recommendation. Recommendations must consider factors beyond the quality of evidence (3). The first factor to consider is the balance between desirable and undesirable effects. Some interventions, such as antibiotics to prevent urinary infections associated with certain urologic procedures, have clear benefits and few side effects, so a strong recommendation is uncontroversial (32, 33). Where the benefit-to-harm ratio is less clear, patient values and preferences, as well as costs, need to be considered carefully (Figure 1).

Figure 1 How GRADE is used to make recommendations; steps 1 to 3 are repeated for each critical outcome. (The flowchart summarizes the five steps: the a priori ranking of step 1; the downgrading and upgrading criteria of step 2; the final grade of step 3; the factors affecting the recommendation in step 4, namely the balance of desirable and undesirable effects, cost-effectiveness, and patient preferences; and the four possible outputs of step 5: strong for, weak for, strong against, or weak against using the intervention.)

Step 5

Make a 'strong' or 'weak' recommendation (34). When the net benefit of a treatment is clear, patient values and circumstances are unlikely to affect the decision to take it, and a strong recommendation is appropriate. For example, high-dose chemotherapy for testicular cancer is often strongly recommended because of the increased likelihood of survival, in spite of chemotherapy's toxicity (32, 35). Otherwise, if the evidence is weak or the balance of positive and negative effects is unclear, a weak recommendation is appropriate. Consider the following example (borrowed from the GRADE literature) concerning whether to recommend screening for melanoma in primary care in the United States (36, 37). The baseline risk for this population is 13.3 per 100,000; the evidence for the accuracy of screening and for its effect on lethal melanoma is weak; and evidence on the potential harms of screening is lacking, although these are likely to include the harmful consequences of false positive tests (38). Given this lack of strong evidence, the recommendation to screen or not to screen will not be strong, and may require input from patients, friends, and family members about the different treatment options. Some people might choose to recommend against screening because of the lack of strong evidence of benefit and the potential harms. Others might recommend screening because they value the potential benefits more highly (34).
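To see why a rare condition combined with uncertain test accuracy pushes toward a weak recommendation, the sketch below works through the screening arithmetic at the baseline risk quoted above. The sensitivity and specificity figures are purely hypothetical assumptions chosen for illustration; they are not estimates from the melanoma screening literature.

```python
# Screening arithmetic at a prevalence of 13.3 per 100,000.
# Sensitivity and specificity are hypothetical, for illustration only.
prevalence = 13.3 / 100_000
sensitivity = 0.95   # hypothetical
specificity = 0.98   # hypothetical

true_pos = prevalence * sensitivity
false_pos = (1 - prevalence) * (1 - specificity)
ppv = true_pos / (true_pos + false_pos)

print(f"Positive predictive value: {ppv:.1%}")                           # about 0.6%
print(f"False positives per true positive: {false_pos / true_pos:.0f}")  # about 158
```

Even under these generous assumed test characteristics, the overwhelming majority of positive screens would be false positives, which is why the potential harms of screening weigh heavily against a strong recommendation when the supporting evidence is weak.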
Summary

GRADE is becoming increasingly popular with guideline developers and systematic reviewers. It is thus important for students, clinicians, and policy makers alike to understand GRADE. Advanced critical appraisal (using GRADE or any other system) requires sound judgment, training, and practice. However, it is possible to grasp the basics of GRADE from a concise and accurate summary such as this one.

Acknowledgement

We are very grateful to Gunn Vist, who provided expert advice on the GRADE system and editorial suggestions.

References

1. Guyatt GH, Oxman AD, Vist GE, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 2008; 336(7650): 924–6.
2. Schünemann HJ, Oxman AD, Vist GE, et al. Interpreting results and drawing conclusions. In: Cochrane Handbook for Systematic Reviews of Interventions. 2008: 359–87.
3. Balshem H, Helfand M, Schünemann HJ, et al. GRADE guidelines: 3. Rating the quality of evidence. J Clin Epidemiol 2011; 64(4): 401–6.
4. Guyatt G, Oxman AD, Akl EA, et al. GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables. J Clin Epidemiol 2011; 64(4): 383–94.
5. Guyatt GH, Oxman AD, Kunz R, et al. GRADE guidelines: 2. Framing the question and deciding on important outcomes. J Clin Epidemiol 2011; 64(4): 395–400.
6. Guyatt GH, Oxman AD, Kunz R, et al. GRADE guidelines: 6. Rating the quality of evidence—imprecision. J Clin Epidemiol 2011; 64(12): 1283–93.
7. Guyatt GH, Oxman AD, Kunz R, et al. GRADE guidelines: 7. Rating the quality of evidence—inconsistency. J Clin Epidemiol 2011; 64(12): 1294–302.
8. Guyatt GH, Oxman AD, Montori V, et al. GRADE guidelines: 5. Rating the quality of evidence—publication bias. J Clin Epidemiol 2011; 64(12): 1277–82.
9. Guyatt GH, Oxman AD, Schünemann HJ, Tugwell P, Knottnerus A. GRADE guidelines: a new series of articles in the Journal of Clinical Epidemiology. J Clin Epidemiol 2011; 64(4): 380–2.
10. Guyatt GH, Oxman AD, Sultan S, et al. GRADE guidelines: 9. Rating up the quality of evidence. J Clin Epidemiol 2011; 64(12): 1311–6.
11. Guyatt GH, Oxman AD, Vist G, et al. GRADE guidelines: 4. Rating the quality of evidence—study limitations (risk of bias). J Clin Epidemiol 2011; 64(4): 407–15.
12. Guyatt GH, Oxman AD, Kunz R, et al. GRADE guidelines: 8. Rating the quality of evidence—indirectness. J Clin Epidemiol 2011; 64(12): 1303–10.
13. Odgaard-Jensen J, Vist GE, Timmer A, et al. Randomisation to protect against selection bias in healthcare trials. Cochrane Database of Systematic Reviews 2011; (4): MR000012.
14. Howick J. The Philosophy of Evidence-Based Medicine. Oxford: Wiley-Blackwell, 2011.
15. Ioannidis JP. Why most published research findings are false. PLoS Med 2005; 2(8): e124.
16. Doll R, Hill AB. Smoking and carcinoma of the lung; preliminary report. BMJ 1950; 2(4682): 739–48.
17. Doll R, Hill AB. A study of the aetiology of carcinoma of the lung. Pak J Health 1953; 3(2): 65–94.
18. Doll R, Hill AB. The mortality of doctors in relation to their smoking habits; a preliminary report. BMJ 1954; 1(4877): 1451–5.
19. Doll R, Hill AB. Lung cancer and other causes of death in relation to smoking; a second report on the mortality of British doctors. BMJ 1956; 2(5001): 1071–81.
20. Savović J, Jones HE, Altman DG, et al. Influence of reported study design characteristics on intervention effect estimates from randomized, controlled trials. Ann Intern Med 2012; 157(6): 429–38.
21. Montori VM, Guyatt GH. Intention-to-treat principle. CMAJ 2001; 165(10): 1339–41.
22. Guyatt GH, Briel M, Glasziou P, Bassler D, Montori VM. Problems of stopping trials early. BMJ 2012; 344: e3863.
23. Heshka S, Anderson JW, Atkinson RL, et al. Weight loss with self-help compared with a structured commercial program: a randomized trial. JAMA 2003; 289(14): 1792–8.
24. Krentz AJ. Some oral antidiabetic medications may prevent type 2 diabetes in people at high risk for it, but the evidence is insufficient to determine whether the potential benefits outweigh the possible risks. Evid Based Med 2011; 28: 948–64.
25. Guyatt G, Gutterman D, Baumann MH, et al. Grading strength of recommendations and quality of evidence in clinical guidelines: report from an American College of Chest Physicians task force. Chest 2006; 129(1): 174–81.
26. Scherer RW, Langenberg P, von Elm E. Full publication of results initially presented in abstracts. Cochrane Database of Systematic Reviews 2007, Issue 2. Available from http://www.mrw.interscience.wiley.com/cochrane/clsysrev/articles/MR000005/frame.html.
27. Turner EH, Matthews AM, Linardatos E, Tell RA, Rosenthal R. Selective publication of antidepressant trials and its influence on apparent efficacy. N Engl J Med 2008; 358(3): 252–60.
28. Hirsh J, Guyatt G, Albers GW, Harrington R, Schünemann HJ; American College of Chest Physicians. Antithrombotic and thrombolytic therapy: American College of Chest Physicians Evidence-Based Clinical Practice Guidelines (8th Edition). Chest 2008; 133(6 Suppl): 110S–2S.
29. Hu FB, Stampfer MJ, Colditz GA, et al. Physical activity and risk of stroke in women. JAMA 2000; 283(22): 2961–7.
30. Devereaux PJ, Choi PT, Lacchetti C, et al. A systematic review and meta-analysis of studies comparing mortality rates of private for-profit and private not-for-profit hospitals. CMAJ 2002; 166(11): 1399–406.
31. Guyatt GH, Oxman AD, Kunz R, et al. What is "quality of evidence" and why is it important to clinicians? BMJ 2008; 336(7651): 995–8.
32. Canfield SE, Dahm P. Rating the quality of evidence and the strength of recommendations using GRADE. World J Urol 2011; 29(3): 311–7.
33. Wolf JS, Bennett CJ, Dmochowski RR, et al. Best practice policy statement on urologic surgery antimicrobial prophylaxis. J Urol 2008; 179(4): 1379–90.
34. Guyatt GH, Oxman AD, Kunz R, et al. Going from evidence to recommendations. BMJ 2008; 336(7652): 1049–51.
35. Feldman DR, Bosl GJ, Sheinfeld J, Motzer RJ. Medical treatment of advanced testicular cancer. JAMA 2008; 299(6): 672–84.
36. Atkins D, Best D, Briss PA, et al. Grading quality of evidence and strength of recommendations. BMJ 2004; 328(7454): 1490–8.
37. Helfand M, Mahon S, Eden K. Screening for skin cancer. Am J Prev Med 2001; 20(3): 47–58.
38. Evans I, Thornton H, Chalmers I. Testing Treatments: Better Research for Better Healthcare. London: Pinter and Martin, 2010.