GRADE detailed series - JCE GRADE Guidelines: Diagnosis II Version 20130912

The GRADE approach for tests and strategies: from test accuracy to patient important outcomes and recommendations

Contributors so far (based on this and the prior version of the single article and in no particular order at this point – others welcome): Holger J Schünemann, Reem Mustafa, Nancy Santesso, Jan Brozek, Patrick Bossuyt, Miranda Langendam, Andrew D Oxman, Karen R Steingart, Tommaso Trenti, Paul Glasziou, Roman Jaeschke, Julia Kreis, Mark Helfand, Rob Scholten, Anne Rutjes, Gordon H Guyatt for the GRADE Working Group

!Attention!: heavy self citation needs to be addressed

Word count:
Tables: 3
Figures: 4

Key points:
• GRADE has developed and applied a comprehensive framework for rating the confidence in estimates from a body of evidence obtained from diagnostic test studies and linking this evidence to health outcomes.
• Preferably, developers of recommendations will evaluate and rate a body of evidence for each of the pieces of evidence that is required for decision-making. Ideally, they will base the rating on a systematic review of the required evidence.
• Further research on linking diagnostic test accuracy evidence to the other evidence needed for judgments about the impact on health outcomes should focus on combining the ratings of the confidence in estimates from the various bodies of evidence.

Abstract:
In the present article we focus on GRADE’s framework for moving from diagnostic test accuracy to health-related outcomes when direct studies evaluating the impact of diagnostic tests or strategies do not provide the best available evidence. We also describe how guideline developers can use information from diagnostic test accuracy to develop a recommendation.

The GRADE approach for tests and strategies: from test accuracy to patient important outcomes and recommendations

1. Introduction

The previous article in this series describes how systematic review authors and guideline developers assess their confidence in the estimates of a body of evidence evaluating tests and testing strategies, i.e. the quality of evidence. In that article we focused on applying GRADE to test accuracy (TA) studies. In the present article we focus on GRADE’s framework for moving from TA to important health outcomes when direct studies evaluating the impact of diagnostic tests or strategies do not provide the best available evidence. We also describe how guideline developers can use information from diagnostic test accuracy to develop a recommendation.

Thus, the first part of this article describes the judgments about directness involved in assessing the link between TA and important health outcomes. In particular, we describe why guideline panels should be cautious when they use evidence of TA as the basis for recommendations: doing so requires a review of, and judgments about, the evidence that links test accuracy to patient or population-important outcomes. The second part focuses on the steps and criteria involved in moving from evidence to a recommendation or decision, using examples from guidelines that have applied this approach to diagnostic tests and strategies. We conclude by summarizing work done, challenges and suggestions for future work.

2.0 What evidence is needed to make assumptions about patient outcomes?

Guideline developers will have to develop a clear idea of what consequences they anticipate from applying a test or strategy.
In fact, applying GRADE requires understanding that a recommendation about, or the use of, a test should result from balancing the desirable and the undesirable consequences (including non-health-related consequences such as resource utilization).1-3 Applying GRADE in the context of making recommendations or decisions about tests means that a division between testing and therapy, treatment or observation is artificial, although it is sometimes practical or pragmatic, e.g. when comparing the diagnostic test accuracy of two competing tests, one of which is already established. Figure 1 emphasizes that testing therefore has consequences that become part of an intervention, including observation when no further action is required or possible, and that these consequences should be considered.4 Developers of recommendations should develop a pathway that follows from applying a test and that allows such consequences to be considered.

Figure 2 describes a pathway developed for a World Health Organization guideline on screening and treatment of cervical intraepithelial neoplasia (CIN), a precursor of cervical cancer. The guideline panel considered different screening options, human papilloma virus (HPV) testing and visual inspection with acetic acid (VIA), and the subsequent treatment that patients may or may not be able to receive. For instance, only some patients, depending on the type of lesion, would be able to receive cryotherapy. If the lesion is not deemed eligible for cryotherapy, other therapeutic interventions, in this case cold knife conization (CKC) or loop electrosurgical excision procedure (LEEP), might be the best alternatives. The panel then considered the possible consequences that can result from each of the possible screen and treat pathways in terms of health outcomes, considering both benefits and harms.
Figure 3 describes an alternative generic analytic framework developed by the United States Preventive Services Task Force (USPSTF) to describe these considerations and relations. Figure 4 describes the sequential steps and considerations that are important when linking evidence about the consequences of testing, treating, observing and missing diagnoses.

Returning to the example of the cervical screening guidelines, step 1 included a systematic review of the body of evidence describing the diagnostic test accuracy (DTA) of the two screening tests against a reference standard. The authors of the systematic review conducted a meta-analysis to obtain summary estimates of the sensitivity and specificity of the two screening tests. Based on five studies that compared both tests against a reference standard, the meta-analysis revealed a pooled sensitivity of 95% (95% CI: 84 to 98) and a pooled specificity of 84% (95% CI: 72 to 91) for HPV, and a pooled sensitivity of 69% (95% CI: 54 to 81) and a pooled specificity of 87% (95% CI: 79 to 92) for VIA. Applying these summary statistics to the pretest probability of the target population (assumed here to be 5%) made it possible to determine the number of test positives (true positives and false positives) and test negatives (true negatives and false negatives). For example, about 4.8% (48 per 1000 women, or 95% of 5%) in the HPV group and about 3.5% (35 per 1000 women, or 69% of 5%) in the VIA group would be true positives.

Step 2 involved linking these test outcomes to the anticipated important health outcomes. Together with a literature review about which outcomes women may experience, the multidisciplinary panel provided information about such outcomes. Women with a positive test, i.e. a test indicating the presence of CIN, would undergo further management with one of the possible therapies to reduce the risk of cervical cancer.
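
The step 1 arithmetic described above can be sketched in a few lines. This is a minimal illustration, not the review authors' code: the function names `round_half_up` and `results_per_n` are hypothetical, and the rounding rule (round true positives and true negatives half up, then obtain false negatives and false positives by subtraction) is an assumption chosen because it reproduces the per-1000 counts reported in Table 1.

```python
from fractions import Fraction
from math import floor


def round_half_up(value: Fraction) -> int:
    """Round a non-negative exact fraction to the nearest integer, halves up."""
    return floor(value + Fraction(1, 2))


def results_per_n(sens_pct: int, spec_pct: int, prev_pct: int, n: int = 1000) -> dict:
    """Expected test results among n people tested.

    Inputs are whole-number percentages so that the arithmetic stays exact.
    Assumption: TP and TN are rounded; FN and FP follow by subtraction.
    """
    diseased = round_half_up(Fraction(n * prev_pct, 100))
    healthy = n - diseased
    tp = round_half_up(Fraction(diseased * sens_pct, 100))
    tn = round_half_up(Fraction(healthy * spec_pct, 100))
    return {"TP": tp, "FN": diseased - tp, "TN": tn, "FP": healthy - tn}


# Pooled estimates from the text, applied to a 5% pretest probability.
print(results_per_n(95, 84, 5))  # HPV: {'TP': 48, 'FN': 2, 'TN': 798, 'FP': 152}
print(results_per_n(69, 87, 5))  # VIA: {'TP': 35, 'FN': 15, 'TN': 827, 'FP': 123}
```

Exact fractions are used instead of floating point so that borderline values such as 47.5 and 34.5 round predictably.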
Treatment will come with a certain chance of cure and a certain risk of side effects. However, women with a false positive test result would also undergo treatment and experience the adverse consequences without experiencing the benefits. Women with a negative test result would not be treated and would be further observed. However, this group will include women with a false negative test result, who will have a certain risk of CIN developing into cervical cancer, following the natural history of the disease. This model ignores the possibility of repeating the screening test, but in some settings, such as low and middle-income countries, women may undergo only a single test, so the assumption is realistic.

The corresponding estimates of treatment efficacy, side effects and natural history should, ideally, be derived from systematic reviews of the relevant evidence. For example, the efficacy of cryotherapy should be evaluated with a systematic review, as should the risk of developing cervical cancer in untreated CIN (the natural history of the disease). In fact, the systematic review determining the efficacy of cryotherapy revealed a 61% relative risk reduction based on observational data,5 and the search for evidence about the natural history suggested an approximate 2% progression to cervical cancer over 30 years (http://globocan.iarc.fr/factsheets/cancers/cervix.asp - April 18, 2012).

For a guideline evaluating the use of testing for cow's milk allergy (CMA), a condition affecting between 2 and 5% of children, the guideline panel was asked to evaluate the possible benefits and downsides of the various test outcomes on the basis of case examples using semi-quantitative information.6,7 For instance, in order to understand the consequences associated with the 264 per 1000 false negative skin prick tests in a population with a high risk (pretest probability) for CMA, guideline panel members were provided with typical case scenarios: the child suspected of CMA will be allowed to return home and will have an allergic reaction (possibly anaphylactic) to cow's milk at home; there will be high parental anxiety and reluctance to introduce future foods, which may lead to a multiple exclusion diet. The real cause of symptoms (i.e. CMA) will be missed, leading to unnecessary investigations and treatments. These case scenarios, the baseline risk and the possible consequences were based on a review of the literature and on information obtained from allergists with experience in caring for affected patients.

In another guideline, a WHO guideline panel considered the consequences of applying serological tests in a population with a 10% risk of pulmonary tuberculosis, where a sensitivity of 59% and specificity of 95% leads to a risk of 81 per 1000 false positives and 36 per 1000 false negatives.8 Guideline panel members applied evidence synthesized in tuberculosis treatment guidelines to link the test results to treatment efficacy and to the possible detrimental effects of delayed diagnosis, of confusing other respiratory diseases (such as pneumonia) with pulmonary TB and consequential death from the other disease, of adverse drug reactions, and of unnecessary consumption of health care and patient resources.

[here other examples]

3.0 How can the confidence in the estimates be graded?

Preferably, developers of recommendations will evaluate and rate a body of evidence for each of the pieces of evidence that is required for decision-making. Ideally, they will base the rating on a systematic review of the required evidence.
This rating will inform how directly diagnostic test accuracy, which GRADE considers a surrogate marker that requires further evaluation of the related consequences, relates to health outcomes.

3.1 Rating the diagnostic test accuracy

As described in the prior article, when direct evidence about important health outcomes is not available or is associated with low confidence, GRADE begins by assessing the confidence in the estimates of the DTA related to the test. The systematic review of HPV and VIA revealed important inconsistency in the specificity estimates across the 5 included studies, yielding a confidence rating of moderate for specificity, while the confidence rating for the sensitivity estimates remained high (Table 1). Layer 1 Summary of Findings (SoF) tables do not consider the directness of the relation between DTA and health outcomes.

[other examples here, e.g. CMA]

3.2 Rating the linked evidence – directness of the health outcomes

To complete an assessment of the confidence in the estimates, in an ideal situation the confidence in the estimates should be rated for the body of evidence informing each key input variable. In other words, assessing the linked evidence completes the assessment of the directness of the outcomes in GRADE’s directness domain. For example, estimates of the baseline risk used to calculate the test results in Table 1 may influence the overall rating of the confidence. Application of GRADE for prognostic studies or prevalence studies will inform the rating of this confidence.9 [ref Falavigna] Similarly, the confidence in the estimates of the treatment effects of cryotherapy and other treatments should influence the overall confidence in the body of evidence supporting a recommendation.
For example, applying GRADE for interventions, the confidence in the estimates for the effects of cryotherapy was very low because the estimates came from observational studies with high risk of bias. Persistence of CIN in false negatives was estimated as approximately 70% based on moderate quality evidence from longitudinal prognostic observational studies. [check]

Thus, step 2 in Figure 4 involves a rating of the confidence in the estimates when going from the test results to important health outcomes for a population. The authors of the cervical cancer guideline had very low confidence in the linked bodies of evidence when they derived the estimates for the various patient important outcomes based on the considerations above. Table 2 describes a layer 3 SoF table for tests based on the best available research evidence and additional information the guideline panel obtained. The explanations in the related text provide the sources of evidence, the assumptions made and explanations.

[other examples here, including layer three table where rating down for directness is part of the overall quality rating]

4. How does the confidence in the estimates of the linked evidence influence the overall rating of the confidence in estimates?

Because the linked evidence often lowers the overall confidence one has in the evidence required to formulate recommendations and make decisions about tests, there are several options for rating the overall confidence in the estimates.

Option 1. Evaluate which bodies of linked evidence are critical for decision-making and base the overall rating of the confidence for population important outcomes on the lowest confidence of these bodies of evidence.
For example, despite high confidence in the estimates of diagnostic test accuracy for TP and FN and moderate confidence in those for TN and FP, the recommendation would be associated with a rating of very low confidence resulting from the uncertainty about several of the linked bodies of evidence (e.g. natural history of the disease, efficacy of cryotherapy). Whether or not linked evidence is critical to decision-making will be influenced by how frequently an outcome occurs and how important it is. This is the approach the guideline panel on cervical cancer screening took by rating the overall confidence as very low.

Option 1b. Base the overall rating on any of the linked evidence without considering what might lead to critical outcomes.

Option 2. Present the evidence from diagnostic test accuracy and linked evidence separately without assigning an overall rating of the confidence.

For further discussion: When can linked evidence from other scenarios be applied without completing a full evidence review for all linked evidence?

5. How can decisions and recommendations be made about tests?

A recommendation associated with a diagnostic question follows from an evaluation of the balance between the desirable and undesirable consequences of the test and of the subsequent therapy, treatment, management or observation after applying the test (Figure 1). When the estimates of the consequences of false positive, false negative and inconclusive results, and of the complication rates with the alternative diagnostic strategies, are quite secure, and those outcomes are important, we can make strong inferences concerning the relative impact of a test on important health outcomes.
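
Options 1 and 1b in section 4 amount to taking the lowest rating across a chosen set of bodies of evidence. The sketch below is only an illustration of that rule under stated assumptions: the function name `overall_confidence`, the data layout and the `is_critical` flags are hypothetical, while the ratings themselves are the ones reported in the text for the cervical cancer example.

```python
LEVELS = ["very low", "low", "moderate", "high"]  # GRADE ratings, lowest first


def overall_confidence(bodies, critical_only=True):
    """Return the lowest rating among the bodies of evidence considered.

    bodies maps a label to (rating, is_critical). With critical_only=True
    this mirrors Option 1; with critical_only=False it mirrors Option 1b.
    """
    considered = [
        rating
        for rating, is_critical in bodies.values()
        if is_critical or not critical_only
    ]
    if not considered:
        raise ValueError("no bodies of evidence to rate")
    return min(considered, key=LEVELS.index)


# Ratings from the text; the is_critical flags are illustrative assumptions.
cervical = {
    "DTA: TP and FN": ("high", True),
    "DTA: TN and FP": ("moderate", True),
    "efficacy of cryotherapy": ("very low", True),
    "persistence of CIN in false negatives": ("moderate", True),
}
print(overall_confidence(cervical))  # very low
```

Option 2 has no counterpart here because it deliberately assigns no overall rating.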
The guideline panel that developed recommendations regarding serological testing in patients with cow's milk allergy (see example 2 in box 1) determined that, for patients with a relatively low probability of the disease (approximately 10%), skin prick testing results in a large number of false positives, leading to unnecessary anxiety and further testing. It also leads to missing about 3% (33/1000 tested patients are false negatives) of patients who suffer from cow's milk allergy, with the risk of severe allergic reaction and death.

Uncertainty regarding the consequences of false positive and false negative results will weaken inferences about the balance between desirable and undesirable consequences. Consider the consequences of false positive and false negative results of diagnostic imaging for patients suspected of acute sinusitis. Since the primary benefit of treatment is shortening of illness duration and symptoms, the balance of the patient important consequences is less clear between a) patients with false negative results, who are deprived of antibiotics and will have a longer duration of symptoms and an increased risk of complications from the infection, but suffer no side effects from antibiotic use, and b) patients with false positive results, who receive antibiotics when they should not but may feel relieved that they have received care and treatment. Furthermore, guideline panels will have to consider the societal consequences (e.g.
antibiotic resistance) of administering antibiotics to false positives.9

GRADE has used decision tables that increase transparency of the decision-making process to document such considerations by a panel.3 Extensive work has informed the selection of criteria that influence the development of health care recommendations about tests. [GIF report, Reem thesis, JCE papers] Formats of these tables, labeled evidence to recommendation frameworks, have been further developed as part of the DECIDE project and are included in GRADE’s guideline development tool.10 The purpose of the frameworks is to help guideline panels developing recommendations about the use of tests to move from evidence to recommendations. They are intended to inform decision makers’ judgments about the desirable and undesirable consequences of the options considered (these may be diagnostic tests used for diagnosis, monitoring or other purposes, which may sometimes be combined with management options). The frameworks also ensure that important factors that determine a recommendation (criteria) are considered by providing a concise summary of the best available evidence. They allow for a structured discussion and identify reasons for agreement and disagreement.

One or more of the three layers of a SoF table for tests should be included in the framework with a link to the full GRADE evidence profile. Of the three layers, a description of the expected health outcomes, ideally in a layer 3 SoF table (Table 2) or as a narrative summary, should be included in the evidence to recommendation framework. Modeling, that is, calculating the anticipated benefits and harms as well as other desirable and undesirable consequences, is often required. The assumptions for these models should be described in the framework or in background information (Table 2).
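
A minimal sketch of such modeling, for two of the harms in Table 2, is shown here. It is a reading of the Table 2 footnotes rather than the guideline authors' model: it assumes that every test-positive woman (TP + FP from Table 1) is treated, that treated women carry the treatment-specific proportion from footnotes 4 and 5, and that untreated women carry the baseline risk. The names `events`, `TREATED_PER_1000` and the dictionary layout are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

POPULATION = 1_000_000

# Treated women per 1000 screened = TP + FP from Table 1 (assumption: all
# test positives are treated).
TREATED_PER_1000 = {"HPV": 48 + 152, "VIA": 35 + 123}

# Proportions of treated women with the event (Table 2, footnotes 4 and 5).
MAJOR_BLEEDING = {
    "CKC": Decimal("0.082728"),
    "LEEP": Decimal("0"),
    "Cryo": Decimal("0.000585"),
}
PREMATURE_DELIVERY = {
    "CKC": Decimal("0.001706"),
    "LEEP": Decimal("0.00123"),
    "Cryo": Decimal("0.001125"),
}

# Footnote 5: 5% risk of premature delivery in the 1% of women becoming pregnant.
BASELINE_PREMATURE = Decimal("0.05") * Decimal("0.01")


def events(test, treatment, treated_risk, untreated_risk=Decimal("0")):
    """Expected events per 1,000,000 women screened with the given strategy."""
    treated = TREATED_PER_1000[test] * POPULATION // 1000
    untreated = POPULATION - treated
    total = treated * treated_risk[treatment] + untreated * untreated_risk
    return int(total.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


print(events("HPV", "CKC", MAJOR_BLEEDING))                          # 16546
print(events("VIA", "CKC", MAJOR_BLEEDING))                          # 13071
print(events("HPV", "CKC", PREMATURE_DELIVERY, BASELINE_PREMATURE))  # 741
```

Under these assumptions the sketch reproduces the corresponding cells of Table 2, which suggests, but does not confirm, that the table was built this way.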
Other information listed in Table 3 can be included when guideline panels intend to achieve complete transparency about the recommendations they make (supplement – evidence to recommendation framework). The cervical cancer guideline panel made a … recommendation for the use of the following tests based on the considerations described in the evidence to recommendation framework (supplement).

6. Conclusions

GRADE has developed and applied a comprehensive framework for rating the confidence in estimates from a body of evidence obtained from DTA studies and linking this evidence to health outcomes when studies directly evaluating the impact of testing on health outcomes are not available or not trustworthy. The framework focuses on explicitly and transparently laying out the bodies of evidence required to make the link. While the framework has facilitated the development of recommendations about diagnostic tests for several guidelines6-8,11 [add other references] and can be readily applied, further examples and future research in several areas, addressing the assessment of the confidence and the degree of modeling required, will move this field forward. Further testing of evidence to recommendation frameworks will facilitate the development of recommendations about tests.

Disclosure Statement

The authors are members of the GRADE Working Group.

Acknowledgment

This work was partially funded by a European Community's Sixth Framework Programme (FP6/2001-2006) “The human factor, mobility and Marie Curie Actions Scientist Reintegration” IGR 42192 – “GRADE” (to Dr. Schünemann), the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 258583 (DECIDE project), the German Insurance Fund and the Cochrane Collaboration (Methods Innovation Fund).
We would like to thank the many individuals and organizations who have contributed to the progress of the GRADE approach through funding of meetings and feedback on the work described in this article.

References

1. Schunemann HJ, Oxman AD, Brozek J, Glasziou P, Bossuyt P, Chang S, et al. GRADE: assessing the quality of evidence for diagnostic recommendations. ACP J Club 2008;149(6):2.
2. Andrews J, Guyatt G, Oxman AD, Alderson P, Dahm P, Falck-Ytter Y, et al. GRADE guidelines: 14. Going from evidence to recommendations: the significance and presentation of recommendations. Journal of clinical epidemiology 2013;66(7):719-25.
3. Andrews JC, Schunemann HJ, Oxman AD, Pottie K, Meerpohl JJ, Coello PA, et al. GRADE guidelines: 15. Going from evidence to recommendation-determinants of a recommendation's direction and strength. Journal of clinical epidemiology 2013;66(7):726-35.
4. Schunemann HJ, Mustafa R, Brozek J. [Diagnostic accuracy and linked evidence--testing the chain]. Zeitschrift fur Evidenz, Fortbildung und Qualitat im Gesundheitswesen 2012;106(3):153-60.
5. Santesso N, Schunemann H, Blumenthal P, De Vuyst H, Gage J, Garcia F, et al. World Health Organization Guidelines: Use of cryotherapy for cervical intraepithelial neoplasia. International journal of gynaecology and obstetrics: the official organ of the International Federation of Gynaecology and Obstetrics 2012;118(2):97-102.
6. Hsu J, Brozek JL, Terracciano L, Kreis J, Compalati E, Stein AT, et al. Application of GRADE: Making evidence-based recommendations about diagnostic tests in clinical practice guidelines. Implementation science : IS 2011;6:62.
7. Fiocchi A, Brozek J, Schunemann H, Bahna SL, von Berg A, Beyer K, et al. World Allergy Organization (WAO) Diagnosis and Rationale for Action against Cow's Milk Allergy (DRACMA) Guidelines. Pediatr Allergy Immunol 2010;21 Suppl 21:1-125.
8. WHO. Commercial Serodiagnostic Tests for Diagnosis of Tuberculosis. 2011. ISBN 978 92 4 150205 4.
9. Spencer FA, Iorio A, You J, Murad MH, Schunemann HJ, Vandvik PO, et al. Uncertainties in baseline risk estimates and confidence in treatment effects. BMJ 2012;345:e7401.
10. Treweek S, Oxman AD, Alderson P, Bossuyt PM, Brandt L, Brozek J, et al. Developing and Evaluating Communication Strategies to Support Informed Decisions and Practice Based on Evidence (DECIDE): protocol and preliminary results. Implementation science : IS 2013;8:6.
11. Bates SM, Jaeschke R, Stevens SM, Goodacre S, Wells PS, Stevenson MD, et al. Diagnosis of DVT: Antithrombotic Therapy and Prevention of Thrombosis, 9th ed: American College of Chest Physicians Evidence-Based Clinical Practice Guidelines. Chest 2012;141(2 Suppl):e351S-418S.
12. Harris RP, Helfand M, Woolf SH, Lohr KN, Mulrow CD, Teutsch SM, et al. Current methods of the US Preventive Services Task Force: a review of the process. American journal of preventive medicine 2001;20(3 Suppl):21-35.

Figure 1. Linkage of testing, interventions and outcomes

[Figure: linked boxes labeled as follows]
Testing/diagnosis (uncertainty due to baseline risk or pretest probability as a result of prognostic studies and imperfect diagnostic accuracy studies)
• symptoms, prognostic factors, tests, other diagnostic tests or strategies
Therapy, treatment, observation, management
• either evaluated directly or indirectly as linked evidence
Outcome
• possibly other actions, monitoring - directly investigated or based on assumptions from indirect evidence
Intervention

Figure 2. Clinical pathway for cervical cancer screen and treat approach.
HPV = human papilloma virus
VIA = visual inspection with acetic acid
Test + = True and false positive tests (not known when test is performed)
Test - = True and false negatives (not known when test is performed)
CKC = Cold knife conization
LEEP = Loop electrosurgical excision procedure
Cryo = cryotherapy
Outcomes shown in the figure: Mortality from cervical cancer*; Cervical Cancer Incidence*; CIN2-3 recurrence*; Undetected CIN2-3 (FN)*; Major bleeding*; Premature delivery*; Infertility*; Major infections*; Minor infections*; Unnecessarily treated (FP)*; Cancers detected at screening*

Figure 3. Generic analytic framework for a test (from reference 12)

Figure 4. Linking diagnostic test accuracy to patient important outcomes

Table 1. Layer 1 SoF Table

HPV compared to VIA for detection of cervical intraepithelial neoplasia in women at risk for cervical cancer
Patients or population: women at risk of cervical cancer
Settings: screening clinics across the world
New Test: HPV; Cut-off value: –
Comparison Test: VIA; Cut-off value: –
Reference Test: conization and biopsy
Number of Participants (Studies): 8921 (5)
Pooled Sensitivity HPV: 95% (95% CI: 84 to 98); Pooled Specificity HPV: 84% (95% CI: 72 to 91)
Pooled Sensitivity VIA: 69% (95% CI: 54 to 81); Pooled Specificity VIA: 87% (95% CI: 79 to 92)

Number of results per 1000 patients tested (baseline risk 5%1)

Test Result | HPV | VIA | Absolute difference | Quality of the Evidence (GRADE)
True positives (TP) | 48 (42 to 49) | 35 (27 to 41) | 13 more | ⊕⊕⊕⊕ high
False negatives (FN) | 2 (1 to 8) | 15 (10 to 23) | 13 less | ⊕⊕⊕⊕ high
True negatives (TN) | 798 (684 to 865) | 827 (751 to 874) | 29 less | ⊕⊕⊕⊝ moderate2,3 due to inconsistency
False positives (FP) | 152 (86 to 266) | 123 (76 to 200) | 29 more | ⊕⊕⊕⊝ moderate2,3 due to inconsistency

Reference: Mustafa, Santesso, Schünemann ….

Footnotes:
1 Prevalence of 5% was assumed to be the average prevalence in a representative population.
2 Estimates of HPV and VIA sensitivity and specificity were variable despite similar cut-off values; inconsistency could not be explained by quality of studies. This was a borderline judgment. We downgraded TN and FP. This decision is considered in the context of other factors, in particular, imprecision.
3 Wide CI for TN and FP that may lead to different decisions depending on which of the confidence limits is assumed.

Table 2. Layer 3 Summary of Findings Table describing population important outcomes

Events in the screen-treat strategies for patient important outcomes (numbers presented per 1,000,000 patients)*

Outcomes | HPV +/CKC | HPV +/LEEP | HPV +/Cryo | VIA +/CKC | VIA +/LEEP | VIA +/Cryo | NO screen
Mortality from cervical cancer1 | 18 | 7 | 7 | 18 | 10 | 10 | 333
Cervical Cancer Incidence2 | 33 | 15 | 15 | 34 | 21 | 21 | 369
CIN2-3 recurrence3 | 125 | 190 | 166 | 565 | 612 | 595 | 35000
Undetected CIN2-3 (FN) | HPV strategies: 2000 | VIA strategies: 15000
Major bleeding4 | 16546 | 0 | 117 | 13071 | 0 | 740 | 0
Premature delivery5 | 741 | 646 | 625 | 691 | 615 | 599 | 500
Infertility6 | - | - | - | - | - | - | -
Major infections7 | 1351 | 0 | 104 | 1068 | 0 | 82 | 0
Minor infections8 | 18487 | 0 | 1826 | 14605 | 0 | 1442 | 0
Unnecessarily treated (FP) | HPV strategies: 152000 | VIA strategies: 123000
Cancers detected at screening | 2259

1 We assume mortality will decrease in true positives due to treatment. It will increase in false negatives due to late diagnosis. No mortality from cervical cancer in true negatives and false positives. Our calculations in the model are based on a 61% RRR for cryotherapy and on mortality being 2.8 times higher in the CKC group and 1.06 times higher for LEEP, based on Kalliala 2007 mortality data. Mortality data were indirect, as this study evaluated all-cause mortality.
Baseline risk of mortality from cervical cancer is 1% per 30 years based on WHO data for low- and middle-income countries (http://globocan.iarc.fr/factsheets/cancers/cervix.asp - April 18, 2012).
2 We assume cervical cancer incidence will decrease in true positives due to treatment. It will increase in false negatives due to late diagnosis. No cervical cancer in true negatives and false positives. Our calculations in the model are based on a 61% RRR for cryotherapy and on incidence being 2.1 times higher with CKC and similar with LEEP, based on Kalliala 2007 cervical cancer data. Baseline risk of cervical cancer is 2% per 30 years based on WHO data for low- and middle-income countries (http://globocan.iarc.fr/factsheets/cancers/cervix.asp - April 18, 2012).
3 We assume CIN2/3 recurrence will decrease in true positives due to treatment. It will be high in false negatives due to the missed diagnosis and natural persistence. No CIN2/3 in true negatives and false positives. Our calculations in the model are based on 70% natural persistence with no treatment, and on recurrence rates of 4% with cryotherapy, 2.3% with CKC and 5% with LEEP.
4 We assumed major bleeding would be 0 in TN and FN as they were not treated. We assumed 0.000585 of the population treated with cryotherapy, 0.082728 of the population treated with CKC and 0 of the population treated with LEEP will have major bleeding, based on reported proportions in single arm studies.
5 We assumed premature delivery would be at baseline risk, as in the general population, in TN and FN as they were not treated. We assumed a 5% risk of premature delivery in the 1% of women becoming pregnant. We assumed 0.001125 of the population treated with cryotherapy, 0.001706 of the population treated with CKC and 0.00123 of the population treated with LEEP will have premature delivery, based on reported proportions in single arm studies.
6 We did not identify any data about the risk of infertility after treatment for CIN2+.
7 We assumed major infection would be 0 in TN and FN as they were not treated. We assumed 0.000518 of the population treated with cryotherapy, 0.006757 of the population treated with CKC and 0 of the population treated with LEEP will have major infection, based on reported proportions in single arm studies.
8 We assumed minor infection would be 0 in TN and FN as they were not treated. We assumed 0.009131 of the population treated with cryotherapy, 0.092437 of the population treated with CKC and 0 of the population treated with LEEP will have minor infection, based on reported proportions in single arm studies.

Table 3. Evidence to recommendation considerations for guideline panels making recommendations about tests

Criteria | Explanation
How common is the problem? | Describe if the health problem is common (i.e. prevalence) and consider this in the context of other problems the panel is considering.
Is the problem severe? | Is the problem so severe that it is a priority when making health care decisions with patients or the population?
What is the diagnostic test accuracy? | Describe the diagnostic test accuracy (DTA) and make a judgment if it appears worth considering (compared to the alternative). That is, if the DTA is inferior and there are no other apparent benefits from using the index test, this judgment supports the upcoming deliberations or makes them unnecessary.
What is the confidence in the diagnostic test accuracy information? | Describe the confidence in the estimates of the DTA based on the GRADE criteria.
Overall, compared to the alternative, are the anticipated benefits large? | Make a judgment about the magnitude of the considered benefits.
Overall, compared to the alternative, are the anticipated harms small? | Make a judgment about the magnitude of the considered harms. Include information about side effects of tests.
Overall, is there certainty about the link between the diagnostic test accuracy information and the linked benefits and harms? | Describe how confident one can be in the evidence linking the DTA information and the ensuing (linked) benefits and harms, i.e. how certain you are in the information about the management, therapy and other consequences.
What is the overall confidence in the estimates of effect for benefits and harms? | Describe how confident you are in the overall benefits and harms after considering the DTA information and the information about the linked evidence.
What is the confidence in the values that patients place on the benefits and harms? | Describe the source and confidence related to the values and preferences and how confident you are in the evidence.
What would be the impact on health inequities? | Describe any impact that is expected on health inequities.
Is the cost small relative to the net benefits of the favored option? | Make a judgment about the cost of the index test relative to its net benefits.

Supplemental information