Breast cancer screening: evidence of benefit depends on the method used

Philippe Autier, MD, and Mathieu Boniol, PhD
International Prevention Research Institute, Lyon (France)

Correspondence: Ph Autier, International Prevention Research Institute, 95 Cours Lafayette, F-69006 Lyon (France). Philippe.autier@i-pri.org; www.i-pri.org.

Conflict of interest: None to declare

Abstract

The ability of mammography screening to decrease the risk of death from breast cancer in general populations (effectiveness) is often assessed by case-control studies that compare the screening history of women dying from breast cancer with that of women still alive. Some authors propose establishing the case-control design as the reference method for evaluating breast screening effectiveness. However, when breast cancer mortality decreases for reasons unrelated to screening, this design leads to results suggesting that screening is the cause of the decrease. Available statistical methods cannot correct the confounding by indication that distorts the results of case-control studies. Incidence-based mortality studies are often flawed because they had no control area or ignored the role of treatment in mortality reductions. Evaluation of effectiveness should rest on monitoring the incidence of advanced cancer (which should decrease if screening is effective), or on mortality trends in areas with contrasting screening policies and where good information is available on non-screening factors involved in mortality.

Background

Four randomised trials conducted in Sweden from 1977 to 1993 obtained results suggesting 20 to 30% reductions in breast cancer mortality associated with regular participation in mammography screening of women 40 to 74 years of age [1-4]. These trials encouraged the implementation of mammography screening services, and by 2012 screening programmes had been in place for 20 years or more in several high-income countries or regions.
Breast screening mobilizes considerable resources and generates harm through false-positive examinations, overdiagnosis and overtreatment, the extent of which depends on factors such as screening intensity, screening ages, radiologist experience and the legal environment (e.g., defensive medicine in the USA). An essential question is thus the effectiveness of screening activities: are the mortality reductions observed under the ideal conditions of randomised trials (efficacy) also found under the real-world conditions of mass screening in the general population (effectiveness)? If screening proved to be weakly effective while causing harm, then resources could be reallocated to more cost-effective, less detrimental health activities.

In their paper, D Puliti and M Zappa [5] posit that case-control studies and studies based on incidence-based mortality (IBM) constitute valid, unbiased evaluations of screening effectiveness. In contrast, they argue, studies based on mortality statistics have limitations that explain why their findings differ from those of case-control and IBM studies. On this basis, the paper proposes establishing the case-control design as the standard method for evaluating breast screening effectiveness.

In this opinion article, we briefly review the main methods used to evaluate the effectiveness of breast screening and show that case-control and IBM studies cannot provide reliable figures on screening effectiveness and may lead to errors. In contrast, methods based on the incidence of advanced cancer and on cancer-specific mortality rates should remain the reference methods for assessing screening effectiveness.

The case-control design

The case-control design compares the exposure to screening of women who died from breast cancer (the cases) with that of women who did not die from breast cancer (the controls).
This comparison is deemed to indicate how far regular participation in screening reduces the risk of dying from breast cancer. The design is appealing because of its low cost and swift execution.

The many problems associated with case-control designs for the evaluation of screening effectiveness have been described at length [6], but the most serious limitation is confounding by indication, for which no adjustment method exists [7]. Compared with women participating in screening mammography, non-participating women have characteristics associated with a higher risk of dying from breast cancer and from other causes, such as higher rates of obesity and chronic disease and lower compliance with treatment [8,9]. So, although a number of non-participants die from breast cancer for reasons unrelated to screening, the case-control design gives the impression that these deaths are due to not having been screened. This type of confounding by indication has been termed "self-selection". As the IARC Handbook on Breast Cancer Screening concluded: "Observational studies of screening, such as cohort and case-control studies, may give biased measures of effect because of self-selection of women for screening. There are no certain ways of eliminating the bias" [10].

In 2002, SW Duffy et al proposed a method to correct for self-selection [11]. This method is based on computation of a quantity "Dr", the ratio of the breast cancer mortality rate in non-participating women at the end of a screening period to the rate in all women before screening introduction. Table 1 summarizes the effects of correcting for self-selection in a situation where screening has no impact on breast cancer mortality. Panel A reflects the fact that the risk of breast cancer death of screening participants is 30 to 60% lower than that of non-participants, and that this difference in risk exists in the absence of screening [12,13].
A case-control study after a period of screening will find a crude relative risk of breast cancer death of 0.61 in participants vs. non-participants, but the correction for self-selection will yield an adjusted relative risk of 1.0, in line with the absence of effectiveness of screening. In Panel B, a 25% reduction in breast cancer mortality occurs that is unrelated to screening (e.g., because of improved treatment). The crude relative risk of breast cancer death is still 0.61, but the quantity Dr is smaller than in Panel A because treatments have improved breast cancer prognosis in non-participants. As a consequence, the corrected relative risk is 0.66 (Table 1), and the study will give the impression that participation in screening is associated with a 34% reduction in breast cancer mortality.

Hence, when breast cancer mortality decreases over time for reasons unrelated to screening, the case-control design will attribute the decrease to screening. Correction for self-selection cannot adjust for confounding by indication when mortality trends are changing for reasons other than screening.

Other biases specific to case-control designs will further distort the true relationship between exposure to screening and breast cancer death. For instance, the life expectancy of women participating in screening is longer than that of non-participating women [13]. Participating women therefore have a greater chance of being selected as controls, which biases the risk of death in favour of screening.

Incidence-based mortality (IBM) studies

IBM studies compare breast cancer mortality in breast cancer patients diagnosed during similar periods before (control period) and after (screening period) screening introduction. The advantage of this method is that it uses the refined mortality obtained by excluding breast cancer deaths due to cancers diagnosed before screening introduction. Some IBM studies are summarized in Table 2.
According to the IARC Handbook on Breast Cancer Screening [10], "Refined mortality should be estimated for screened and unscreened population to ensure comparability. Furthermore, cancer registration with data on treatment is likely to be the only means for differentiating the confounding effect of changes in treatment from the effect of screening."

In Sweden, breast cancer mortality has decreased steadily by 0.9% per year since 1972, that is, well before screening started [14]. The two main IBM studies done in Sweden [15,16] could not assess how these secular trends influenced their results because they had no comparison areas where no screening existed during the entire study period. Furthermore, these studies did not consider that changes in mortality over time could also be due to improved treatments [19]. Mortality trends in control areas may reflect the influence of non-screening factors [17,18], but they cannot replace information on patient management, which is known to vary substantially across regions [20].

Swedish IBM studies corrected their results in various ways (Table 2). The most contentious correction is the adjustment of mortality data for the sharp increases in breast cancer incidence that take place after screening introduction. This correction is based on the assumption that if screening had not existed, breast cancer mortality would have increased in parallel with incidence. The correction is not justified because, first, in the absence of screening, mortality and incidence trends are not correlated, and second, mammography screening itself is the main cause of increasing breast cancer incidence.

Finally, counting breast cancer deaths due to breast cancer diagnosed during a specific period is prone to inaccuracies. In the study of Hellquist et al [17], the ratio of deaths in participants and non-participants was 0.94.
Bearing in mind that women not participating in screening may have a risk of breast cancer death 1.7 to 3 times greater than participating women, a number of breast cancer deaths may have been missing in the screened groups.

Changes in the incidence of advanced breast cancer

Mammographic screening aims to detect cancer at an earlier stage, when the cancer is less life-threatening and easier to cure than if detected clinically. In previous work that followed this logic, we found that in randomized trials of mammographic screening, breast cancer mortality reductions were directly proportional to the fall in the incidence of advanced breast cancer [21]. The incidence of advanced breast cancer is not influenced by subsequent treatments. Therefore, in populations where breast screening has long been widespread (≥7 years), a reduction in advanced cancer incidence should reflect the impact of screening activities alone. There is a longstanding, broad consensus that a decrease in advanced breast cancer incidence is the best early indicator of the impact of screening [1,10,22,23]. This consensus fitted well with cancer registry data showing marked decreases in the incidence of advanced cervical and colorectal cancers over the last decades [24,25], which illustrated the contribution of screening to the reductions in mortality due to these two cancers.

The IARC meeting of 2002 devoted a section to trends in advanced breast cancer incidence [10]. However, at that time, few cancer registries had collected adequate data, and only over too short a period after screening introduction. From 2006 onwards, with accumulating years of screening activities in populations covered by good-quality cancer registries, larger amounts of data on advanced breast cancer incidence became available.
A systematic review showed that in areas of Europe, North America and Australia where screening had long been widespread, no or only small decreases in the incidence of advanced and of very advanced breast cancer were observed [26]. An analysis of breast cancer incidence in the USA reached the same conclusion [27]. A team of radiologists performed an in-depth analysis of screen-detected, interval and all breast cancers diagnosed from 1989 to 2007 in the south-east region of the Netherlands and found no decline in the incidence of advanced breast cancer [28]. In the United Kingdom, cancer registry data from Scotland, Northern Ireland and the West Midlands showed no decline in the incidence of advanced breast cancer after screening introduction in 1989 [26,29].

Changes in breast cancer mortality rates in countries with large differences in the timing of screening introduction

It was the lack of downward trends in the incidence of advanced breast cancer, in areas where such a decrease was expected thanks to screening, that prompted us to compare breast cancer mortality trends in pairs of similar European countries, since the absence of a reduction in the incidence of advanced breast cancer means no impact on breast cancer mortality. Logically, if breast screening were capable of reducing breast cancer mortality by 20 to 30% after 7 to 10 years, such a reduction should become apparent in countries where screening has long been widespread, whereas no or smaller reductions should be observed in countries where screening was not implemented.

The ecological design may be useful for comparative effectiveness research, i.e., comparison of disease-specific trends in countries with similar quality of health systems and access to treatment, but with different prevention policies.
This design has been used when randomized trials were impracticable. For instance, the ban on smoking in public places introduced in Scotland in 2006 was followed within one year by a 17% reduction in hospital admissions for acute myocardial events [30]. In contrast, in England, where no such ban yet existed, hospital admissions decreased by 4% during the same one-year period. In this respect, the IARC Handbook on Breast Cancer Screening stated that "Routine screening programmes can be evaluated most readily by time trends and differential mortality from the disease for which screening is being performed. Probably the best known is screening for cervical cancer. The substantial differences among the Nordic countries in the extent of organized screening were closely matched by the mortality rates from cervical cancer (Läärä et al., 1987)" [10].

We mimicked the Nordic study on cervical cancer screening by selecting three pairs of European countries (the Netherlands and Belgium; Northern Ireland and Ireland; Sweden and Norway) with similar prevalence of risk factors for breast cancer death, access to treatment and health expenditure. By 1993, nationwide screening was in place in the first country of each pair, while screening was implemented 10 to 12 years later in the second country [31]. Of note, in 2005, participation in screening in Belgium was still below 60%. The data showed equivalent reductions in breast cancer mortality from 1989 to 2007 within each country pair. These results agreed with the observation that breast cancer mortality reductions in high-income countries are unrelated to the timing of the introduction of screening mammography [14,32].

A limitation of studies on mortality trends is the contamination of mortality data during the screening era by deaths due to breast cancer diagnosed before screening started. It could also be argued that not enough time had passed to observe the benefits of screening.
However, the epidemiologists evaluating the Dutch breast screening programme considered that decreases in breast cancer mortality due to screening started quite soon after screening implementation [33], and thus the argument about insufficient observation time is not tenable. In 1993, the 8-year survival of deadly (stage III) breast cancer was about 30% [34], and survival has continuously improved since then. Our study extended over 15 to 19 years of screening in Sweden, Northern Ireland and the Netherlands. Hence, most fatal breast cancers diagnosed in these three countries before the screening era weighed little in the breast cancer mortality burden after 2000.

Conclusion

Evaluation of the effectiveness of screening for cervical and colorectal cancer is mainly based on time trends in advanced disease incidence and cancer-specific mortality, with the use of designs enabling comparisons between periods and geographical areas [24,25,35]. Until recently, the method based on incidence trends of advanced breast cancer was supported by many breast screening enthusiasts [1,22,23]. Why is this method no longer favoured, and why is there now a push to establish the case-control design as the reference method for effectiveness evaluation?

We recommend abandoning the case-control design for the evaluation of screening effectiveness because, when cancer mortality is decreasing for any reason, this design will attribute the decrease to screening even if screening had no effect on mortality. IBM studies are likely to obtain biased results when secular trends in mortality and increasing treatment effectiveness are not considered. Furthermore, the gathering of data for IBM analysis may be subject to bias.

We conclude that evaluation of effectiveness should be based on monitoring the incidence of advanced cancer and on mortality trends in areas with contrasting screening policies and where good information is available on non-screening factors involved in mortality.
Indeed, it remains to be explained why the results of studies on the effectiveness of breast screening do not match the results of the Swedish randomized trials.

References

1. Tabar L, Gad A, Holmberg LH, Ljungquist U, Fagerberg CJG, Baldetorp L, et al. Reduction in mortality from breast cancer after mass screening with mammography. Lancet 1985; i: 829-832.
2. Andersson I, Aspegren K, Janzon L, Landberg T, Lindholm K, Linell F, et al. Mammographic screening and mortality from breast cancer: the Malmo mammographic screening trial. BMJ 1988; 297: 943-948.
3. Frisell J, Lidbrink E, Hellstrom L, Rutqvist LE. Follow-up after 11 years: update of mortality results in the Stockholm mammographic screening trial. Breast Cancer Research and Treatment 1997; 45: 263-270.
4. Bjurstam N, Björneld L, Warwick J, Sala E, Duffy SW, Nystrom L, et al. The Gothenburg breast screening trial. Cancer 2003; 97: 2387-2396.
5. Puliti D, Zappa M. Breast cancer screening: are we seeing the benefit? BMC Medicine 2012 (in press).
6. Cronin KA, Weed DL, Connor RJ, Prorok PC. Case-control studies of cancer screening: theory and practice. J Natl Cancer Inst 1998; 90: 498-504.
7. Bosco JLF, Silliman RA, Thwin SS, Geiger AM, Buist DSM, Prout MN, et al. A most stubborn bias: no adjustment method fully resolves confounding by indication in observational studies. J Clin Epidemiol 2010; 63: 64-74.
8. Flamant C, Gauthier E, Clavel-Chapelon F. Determinants of noncompliance to recommendations on breast cancer screening among women participating in the French E3N cohort study. Eur J Cancer Prev 2006; 15: 27-33.
9. Ferrante JM, Chen PH, Crabtree BF, Wartenberg D. Cancer Screening in Women: BMI and Adherence to Physician Recommendations. Am J Prev Med 2007; 32: 525-531.
10. International Agency for Research on Cancer. IARC/WHO handbooks of cancer prevention vol. 7: Breast Cancer Screening. IARC Press, Lyon, 2002.
11. Duffy SW, Cuzick J, Tabar L, Vitak B, Hsiu-Hsi Chen T, Yen MF, Smith RA. Correcting for non-compliance bias in case-control studies to evaluate cancer screening programmes. Appl Statist 2002; 51(Part 2): 235-243.
12. Duffy SW, Tabar L, Fagerberg G, Gad A, Grontoft O, South MC, Day NE. Breast screening, prognostic factors and survival - results from the Swedish two county study. Br J Cancer 1991; 64: 1133-1138.
13. Mook S, Van 't Veer LJ, Rutgers EJ, Ravdin PM, van de Velde AO, van Leeuwen FE, Visser O, Schmidt MK. Independent Prognostic Value of Screen Detection in Invasive Breast Cancer. J Natl Cancer Inst 2011; 103: 1-13.
14. Autier P, Boniol M, LaVecchia C, Vatten L, Gavin A, Héry C, Heanue M. Disparities in breast cancer mortality trends between thirty European countries: retrospective trend analysis of WHO mortality database. BMJ 2010; 341: c3620.
15. Tabar L, Yen MF, Vitak B, Tony Chen HH, Smith RA, Duffy SW. Mammography service screening and mortality in breast cancer patients: 20-year follow-up before and after introduction of screening. Lancet 2003; 361: 1405-1410.
16. SOSSEG. The Swedish Organised Service Screening Evaluation Group. Reduction in Breast Cancer Mortality from Organized Service Screening with Mammography: 1. Further Confirmation with Extended Data. Cancer Epidemiol Biomarkers Prev 2006; 15: 45-51.
17. Hellquist BN, Duffy SW, Abdsaleh S, Bjorneld L, Bordas P, Tabar L, Vitak B, et al. Effectiveness of population-based service screening with mammography for women ages 40 to 49 years. Cancer 2011; 117: 714-722.
18. Kalager M, Zelen M, Langmark F, Adami HO. Effect of screening mammography on breast-cancer mortality in Norway. New Eng J Med 2010; 363: 1203-1210.
19. Wilking U, Jonsson B, Wilking N, Bergh J. Trastuzumab use in breast cancer patients in the six Health Care Regions in Sweden. Acta Oncologica 2010; 49: 844-850.
20. Johansson P, Fohlin H, Arnesson LG, Dufmats M, Nordenskjöld K, Nordenskjöld B, Stål O. Improved survival for women with stage I breast cancer in south-east Sweden: a comparison between two time periods before and after increased use of adjuvant systemic therapy. Acta Oncol 2009; 48: 504-513.
21. Autier P, Héry C, Haukka J, Boniol M, Byrnes G. Advanced breast cancer and breast cancer mortality in randomized controlled trials on mammography screening. J Clin Oncol 2009; 27: 5919-5923.
22. Duffy SW, Tabar L, Vitak B, Day NE, Smith RA, Chen HH, Yen MF. The relative contributions of screen-detected in situ and invasive breast carcinomas in reducing mortality from the disease. Eur J Cancer 2003; 39: 1755-1760.
23. Fracheboud J, Otto SJ, van Dijck JA, Broeders MJ, Verbeek AL, de Koning HJ. Decreased rates of advanced breast cancer due to mammography screening in The Netherlands. Br J Cancer 2004; 91: 861-867.
24. Sigurdsson K, Sigvaldason H. Effectiveness of cervical cancer screening in Iceland, 1964-2002: a study on trends in incidence and mortality and the effect of risk factors. Acta Obstetricia et Gynecologica 2006; 85: 343-349.
25. Edwards BK, Ward E, Kohler BA, et al. Annual report to the nation on the status of cancer, 1975-2006, featuring colorectal cancer trends and impact of interventions (risk factors, screening, and treatment) to reduce future rates. Cancer 2010; 116: 544-573.
26. Autier P, Boniol M, Middleton R, Doré JF, Héry C, Zheng T, Gavin A. Advanced breast cancer incidence following population-based mammographic screening. Ann Oncol 2011; 22: 1726-1735.
27. Esserman L, Shieh Y, Thompson I. Rethinking screening for breast cancer and prostate cancer. JAMA 2009; 302: 1685-1692.
28. Nederend J, Duijm LEM, Voogd AC, Groenewoud JH, Jansen FH, Louwman MWJ. Trends in incidence and detection of advanced breast cancer at biennial screening mammography in The Netherlands: a population based study. Breast Cancer Research 2012; 14: R10. doi:10.1186/bcr3091
29. Autier P, Boniol M.
The incidence of advanced breast cancer in the West Midlands, United Kingdom. Eur J Cancer Prev 2012 [Epub ahead of print].
30. Pell JP, Haw S, Cobbe S, Newby DE, Pell ACH, Fischbacher C, et al. Smoke-free legislation and hospitalizations for acute coronary syndrome. New Eng J Med 2008; 359: 482-491.
31. Autier P, Boniol M, Gavin A, Vatten L. Breast cancer mortality in neighbouring European countries with different levels of screening but similar access to treatment: trend analysis of WHO mortality database. BMJ 2011; 343: d4411.
32. Bleyer A. US breast cancer mortality is consistent with European Data. BMJ 2011; 343: d5630.
33. Otten JDM, Broeders MJM, Fracheboud J, Otto SJ, de Koning HJ, Verbeek ALM. Impressive time-related influence of the Dutch screening programme on breast cancer incidence and mortality, 1975-2006. Int J Cancer 2008; 123: 1929-1934.
34. Thomson CS, Brewster DH, Dewar JA, Twelves CJ. Improvements in survival for women with breast cancer in Scotland between 1987 and 1993: impact of earlier diagnosis and changes in treatment. Eur J Cancer 2004; 40: 743-753.
35. Quinn M, Babb P, Jones J, Allen E. Effect of screening on incidence of and mortality from cancer of cervix in England: evaluation based on routinely collected statistics. BMJ 1999; 318: 904-908.

Table 1 - Relative risk of breast cancer death associated with screening in an area with 75% participation in screening.

Panel A
                                  Participants  Non-          All women†  RR of BC death,
                                                participants              participants vs.
                                                                          non-participants
Before screening
  BC death rate*                  86            142           100         0.61
After years of screening
  Change in mortality due to
    screening                     0%            0%            0%
  Change in mortality unrelated
    to screening                  0%            0%            0%
  BC death rate*                  86            142           100         0.61
  Dr                                                                      1.42
  p*phi*Dr                                                                0.65
  1-[(1-p)*Dr]                                                            0.65
  Corrected RR‡                                                           1.00

Panel B
                                  Participants  Non-          All women†  RR of BC death,
                                                participants              participants vs.
                                                                          non-participants
Before screening
  BC death rate*                  86            142           100         0.61
After years of screening
  Change in mortality due to
    screening                     0%            0%            0%
  Change in mortality unrelated
    to screening                  -25%          -25%          -25%
  BC death rate*                  65            107           75          0.61
  Dr                                                                      1.07
  p*phi*Dr                                                                0.48
  1-[(1-p)*Dr]                                                            0.73
  Corrected RR‡                                                           0.66

RR: relative risk. BC: breast cancer.
* Per 100,000 women-years.
† (rate in participants * 0.75) + (rate in non-participants * 0.25).
‡ After Duffy et al, 2002.

Table 2 - Summary of main incidence-based mortality studies on breast cancer screening.

                                           Tabar et al,          SOSSEG,               Hellquist et al,      Kalager et al,
                                           2003 [15]             2006 [16]             2010 [17]             2010 [18]
Country                                    Sweden                Sweden                Sweden                Norway
Areas in study                             Dalarna and           About half of         All Swedish           4 counties with
                                           Kopparberg counties   Swedish counties      counties              pilot programme
Age of screened women                      40-69                 40(50)-69(74)         40-49                 50-69
Comparison area where no screening
  existed during the entire study period   No                    No                    Yes                   Yes
Effect of changes in treatments            No                    Yes, but only in      Yes (control areas)   Yes (control areas)
  considered                                                     non-participants
Data on cancer management                  No                    No                    No                    No
Effect of screening on risk of BC death    -45%                  -27%                  -26%                  -10% (NS)
Adjustment of results done for:
  Self-selection                           Yes                   Yes                   Yes                   No
  Incidence                                Yes                   Yes                   No                    No
  Lead time                                No                    No                    Yes                   No

BC: breast cancer. NS: not significant.
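The arithmetic behind Table 1 can be checked with a short calculation. The sketch below is only an illustration, not the code of the authors or of Duffy et al. It assumes that the corrected RR is the ratio of the two intermediate rows of Table 1, p*phi*Dr / (1-[(1-p)*Dr]), where p is read as the participation proportion (0.75) and phi as the crude RR in participants vs. non-participants. The function and variable names are hypothetical, and Panel B is obtained by applying the 25% fall to the unrounded Panel A rates.

```python
def corrected_rr(rate_participants, rate_non_participants, rate_all_before, p):
    """Self-selection correction as laid out in the rows of Table 1.

    Assumption: corrected RR = p*phi*Dr / (1 - (1 - p)*Dr), i.e. the ratio
    of the two intermediate rows of Table 1 (after Duffy et al, 2002).
    """
    # Crude RR of breast cancer death, participants vs. non-participants.
    phi = rate_participants / rate_non_participants
    # Dr: rate in non-participants after screening / rate in all women before.
    dr = rate_non_participants / rate_all_before
    return (p * phi * dr) / (1 - (1 - p) * dr)


P = 0.75                 # participation, from the Table 1 caption
RATE_ALL_BEFORE = 100    # all women before screening, per 100,000 women-years

# Panel A: mortality unchanged after screening starts.
panel_a = corrected_rr(86, 142, RATE_ALL_BEFORE, P)

# Panel B: every rate falls by 25% for reasons unrelated to screening.
panel_b = corrected_rr(86 * 0.75, 142 * 0.75, RATE_ALL_BEFORE, P)

print(f"Panel A corrected RR: {panel_a:.2f}")
print(f"Panel B corrected RR: {panel_b:.2f}")
```

Under these assumptions the script prints 1.00 for Panel A and 0.66 for Panel B, matching the corrected RRs in Table 1, although screening has no effect on mortality in either panel.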