TO Dr Don Mackie (Chief Medical Officer)
Jill Lane (Director National Services Purchasing)
FROM Dr Marli Gregory (Clinical Advisor, BreastScreen Aotearoa)
Jacqui Akuhata-Brown (Manager National Screening Unit)
DATE 17th April 2012
SUBJECT BreastScreen Health Care: Investigation into Internal Report and Recommendations for Further Action

EXECUTIVE SUMMARY

A recent internal report (the Report) from BreastScreen Health Care (BSHC) raised concerns about the possibility that: there may have been systemic problems with the quality of reading screening mammograms at BSHC, and a number of individual women may have experienced harm from delays in diagnosis of cancer beyond what might be expected in a screening programme.

Accordingly the National Screening Unit (NSU) carried out a number of initial investigatory steps in order to assess the likelihood that the Report findings reflect actual programme failure.

The purpose of this paper is to: report on the NSU’s investigation findings, provide an assessment of the materiality of the Report findings, and identify options for further investigation and/or action.

There is a specific risk with screening programmes in so far as false negative findings (and therefore the potential for cancers not being detected in individual women) are an accepted aspect of screening. In this context, and from the available information, there can be reasonable confidence that there has not been unacceptable harm to women in this instance.

Various options could be pursued in order to investigate further but these all have a significant risk of not providing any greater certainty than what already exists.

The results of this investigation, the routine audit carried out in 2011 and some of the results from recent independent monitoring reports indicate that ongoing work is required to improve the quality and level of performance at BSHC.

Recommendations
1. No further investigations should be carried out to establish whether there was systemic failure of screening or harm to individual women during the period covered by the Report.
2. Resources should be fully directed at working with Southern DHB in supporting lasting quality improvement at BSHC.
3. Further work should be carried out to refine the formal monitoring parameters for BSA.
4. Communicating the outcome of this investigation should include a clear description of the risks as well as the benefits of population based screening programmes. This should include clarifying that the result of a screening test provides a statement about the probability of disease being present rather than a diagnosis.

Investigation into BSHC Internal Report and Options for Further Action 17th April 2012

On 13th March 2012 the National Screening Unit (NSU) was provided with the report of an ad hoc review (the Report) carried out by a radiologist (Dr Y) formerly employed by BreastScreen Health Care (BSHC). The review findings suggested that 47 out of 108 women (43.5%) who had previous mammograms which could be reviewed may have experienced delays in identification of malignancies. A subsequent reread of the mammograms of the 47 potentially affected women by a second radiologist (Dr Z, the clinical director of another BSA provider) indicated that 28 (25 per cent) of the women identified may have had delays in identification of malignancies.

The implications of the Report findings are that: there may have been systemic problems with the quality of reading screening mammograms at BSHC, and a number of individual women may have experienced harm from delays in diagnosis of cancer beyond what might be expected in a screening programme.

The potential for harm arising from both false positive and false negative findings is a recognised risk of all screening programmes (hence the need for rigorous quality assurance and monitoring regimes).
The benefits of formal screening programmes lie at the population level by reducing mortality and disability rates through early detection and treatment of cancer (or pre-cancerous conditions). The screening test itself provides an individual woman with a probability statement about the likelihood that disease is present – not a definitive diagnosis. As a statement of probability false negatives are inevitable – consequently it is inevitable that women in screening are exposed to the possibility of harm. Quality assurance mechanisms within organised screening programmes are aimed at ensuring that, as far as possible, the benefits of screening are maximised and inevitable harm, minimised. Questions of harm in a screening programme must be framed as unacceptable levels of harm as a consequence of the programme deviating from defined standards. Taken in isolation, the Report findings do not provide sufficient evidence to establish whether there has been systemic under-reporting nor whether individuals may have been harmed. Accordingly the NSU carried out a number of initial investigatory steps in order to assess the likelihood that the Report findings reflect actual programme failure. The investigation included: an external peer review of the Report, a review of BSHC’s performance parameters from routine monitoring, a search of the published literature on findings from reviews similar to that carried out by Dr Y, a re-read of the mammograms in question by an external panel as part of a set including control mammograms, and external review of BSHC radiologists’ performance statistics. The purpose of this paper is to: report on the NSU’s investigation findings, provide an assessment of the materiality of the Report findings, and identify options for further investigation and/or action. 
1) Investigation Findings

a) Peer Review and BSHC Performance

The Chair of the BreastScreen Aotearoa (BSA) Independent Monitoring Group (IMG), Professor Richard Taylor, was asked to review the Report. Professor Taylor’s report is attached as Appendix One. Key points from his report follow.

The standard monitoring indices for BSHC as assessed by the IMG Report for January 2009 to December 2010 including trends (to December 2011) have been satisfactory, and the latest Interval Cancer Report (still in draft) shows no elevated rates in BSHC. The time period for these reports overlaps that of the Report.

Although for some monitoring indices BSHC is worse than other Lead Providers, this would also be the case for some other Lead Providers.

There is a significant issue of observer bias in the Report and its subsequent review by Dr Z, since both radiologists that read the prior mammograms were aware of the definitive diagnosis of cancer, and probably also the localisation of the lesion.

There is evidence that the proportions of small cancers ≤10 mm and ≤15 mm in this subset of 27 women are less than BSA targets, and less than that reported by BSHC for January 2009 to December 2010.

The routine monitoring indices for small cancer detection for BSHC are in the target range based on the established criteria, especially for subsequent screens (which constitute the majority of screens, and were the only screens considered in the Report).

Although not statistically significant, the point estimates for detection of cancers less than 15 mm diameter are very low.

The most recent Independent Monitoring Reports of screening and assessment (for the periods July 2008 to June 2010 and January 2009 to December 2010) identify that BSHC is mostly on target or exceeds targets for most biennial indicators.
However, targets not achieved include (* = statistically significant, ns = not significant):
o a high rate of referral to assessment for initial screens (target <10%): 14.3%* (July ’08 – June ’10), 11.6% ns (Jan ’09 – Dec ’10)
o a low percentage of women being offered assessment within timeframes, and a suggested link between this delay and the high recall rate for first screens (target >90%): 48%* (July ’08 – June ’10), 54%* (Jan ’09 – Dec ’10)
o a low percentage of cancers from referral to assessment for initial screens (target >9%): 4.6%* (July ’08 – June ’10), 5.8% ns (Jan ’09 – Dec ’10)
o a very low rate of detection for invasive cancers <15mm for initial screens (target >30.5/10,000 screens): 5.9 ns (July ’08 – June ’10), 7.5 ns (Jan ’09 – Dec ’10)

b) Literature Review of Mammogram Review Methodologies and Investigations into Breast Screening Programme Failures

It is well recognised that different methods of mammogram review will lead to greater or lesser numbers being classified as false negative or missed. The general principles for these reviews are:
1. the less blinded the reviewers, the higher the proportion of false negatives identified,
2. the more reviewers there are, the lower the proportion of false negatives identified, and
3. the more non-cancer (controls) in the mammogram set, the lower the proportion of false negatives identified.

This leads to huge variation in identification of false negatives, even within the same set. For example, when a group of five radiologists reviewed interval cancer mammograms, the false negative rate varied from 6% (if 5/5 agreement required), through 14% (majority opinion), to 38% (if 1/5 agreement required).¹ The variance in methods of review of mammograms makes it difficult to compare results from different studies. A summary of published findings from broadly similar studies is provided in Appendix Two.
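The dependence of the false negative rate on the agreement rule can be illustrated with a minimal sketch. The vote counts below are hypothetical and are not taken from the Britton study or from BSHC data; the function name and the ten-case set are assumptions made purely for illustration.

```python
from typing import List


def false_negative_rate(votes: List[int], min_agree: int) -> float:
    """Share of cases classed as false negative when at least
    `min_agree` reviewers say the prior mammogram should have been recalled."""
    if not votes:
        raise ValueError("votes must not be empty")
    flagged = sum(1 for v in votes if v >= min_agree)
    return flagged / len(votes)


# Hypothetical review set: for each of 10 prior mammograms, the number of
# reviewers (out of 5) who considered that the woman should have been recalled.
votes = [5, 4, 3, 1, 0, 0, 2, 1, 0, 0]

any_one = false_negative_rate(votes, 1)    # 1/5 agreement required
majority = false_negative_rate(votes, 3)   # majority opinion (3/5)
unanimous = false_negative_rate(votes, 5)  # 5/5 agreement required

print(f"1/5 rule: {any_one:.0%}, majority: {majority:.0%}, 5/5 rule: {unanimous:.0%}")
```

The same set of mammograms yields a lower rate as the agreement requirement tightens, which is the pattern described in the example above.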
In summary, it can be seen that the methods used in radiological review are not standardised, and can make a dramatic difference to recorded rates of false negatives. A range of false negative rates at mammography from 14-50% can be observed in the literature on similar studies. The false negative rates of 25% (Dr Z) and 43.5% (Dr Y) are within this range. c) Seeded Set Review The mammograms of 44 women subsequently diagnosed with cancer (identified by Dr Y) were seeded into a set along with 76 ‘normal’ mammograms chosen at random from the same period (1st January 2007 to 31st December 2008). The ‘case’ mammograms were distributed randomly among the ‘control’ mammograms and the resultant set was read independently by three experienced, accredited BreastScreen radiologists. In assessing the findings the panel then replicated the Interval Cancer Review Process defined in the BSA National Policy and Quality Standards 2008 (Appendix Q, page 157). The consensus finding of the panel was that twenty eight cases would have been recalled for lesions that subsequently proved to be cancer and which were therefore interpreted as having a delayed diagnosis (false negative). This represents 23% false negatives in the seeded set review which compares favourably with findings from published studies using similar methods. d) BSHC Radiologist Performance Statistics De-identified BSHC radiologist screening statistics for the period, and subsequently, have been reviewed by the Clinical Directors of two other BSA providers. 
In summary they found (verbal and email communication to Dr Gregory):
o for the prevalent (initial) screening round – high rates of recall to assessment, low cancer detection rates (low positive predictive value), high false positive rate,
o for incident (subsequent) screening rounds – low rates of recall to assessment, cancer detection rates at or just below targets, high false negative rates with sensitivity rates all in the low 80% range, and
o they did not identify a radiologist who was an outlier in terms of poor performance.

The reviewers’ overall comment was that from these data they would consider the radiologists to be adequate within a larger unit and when paired with radiologists with better performance; but that as a small unit, they should not be reading each other’s films.

2) Assessment and Conclusions

There are two related questions to answer: has there been a systemic failure in the quality of reading screening mammograms (beyond acceptable parameters for a screening programme), and if so, has any individual woman, or women, experienced harm?

¹ Britton, P. D., J. McCann, et al. (2001). "Interval cancer peer review in East Anglia: implications for monitoring doctors as well as the NHS breast screening programme." Clinical Radiology 56(1): 44-49.

Given that there is no ‘gold standard’ test that can provide a definitive answer to these questions all the available information must be weighed in order to come to a determination ‘on the balance of probabilities’. Table One provides a framework for assessing the available information.

Table One: Information Sources for Assessment

Has there been systemic under-reporting?
a) Data from service level benchmarks
o IMG reports
o interval cancer reports
b) Individual radiologist performance
o reviews of radiologist performance data
o external reviews of interval cancers
c) Ad hoc internal reports
o bench mark to published comparable studies
o peer review by Professor Taylor
o seeded set review results
d) Formal quality audits (IANZ audit)

Have any women experienced harm?
a) This would be established on the basis of clear evidence of failure at service or individual practitioner level.

a) Assessment of evidence for systemic failure

Table Two provides a summary of the information currently available in respect to the question of systemic under-reporting (see Section 1 of this paper for details).

Table Two: Information on Systemic Under-Reporting

IMG reports
Commentary: BSHC is a small provider (in terms of the volume of women screened) and so has wide confidence intervals around most performance measures. BSHC performs adequately across standard indices but is a poor performer (relative to other providers) on some indices. Of note in the latest IMG report is a very low small invasive cancer detection rate (7.5 per 10,000 screens, target 30.5 per 10,000).

Interval cancer reports
Commentary: The rate of interval cancers is a proxy outcome measure of programme quality. The most recent (draft) interval cancer report shows no increase in rate for BSHC.

De-identified radiologist statistics
Commentary: Review of statistics by two external radiologists found: relatively high false positive rates for prevalent round screening, relatively high false negative rates for incident round screening (sensitivity in low 80%), and no outliers in terms of poor performance.

Interval cancer external reviews
Commentary: As required by NPQS. Provided for BSHC by BreastScreen Waitemata and North (BSWN) but appears not to have occurred in the last eighteen months.

Ad hoc report and international comparisons
Commentary: The original ad hoc report (Dr Y) suggested that 47 out of 108 women (43.5%) diagnosed with cancer had had previous mammograms ‘misread’. The follow up by Dr Z suggested 25% ‘misreads’. The published literature indicates that, depending on the rigour of the re-read methodology, ‘misread’ rates can range from 14% to 50%.

Seeded-set review
Commentary: The false negative finding from the seeded set review was 23% which is comparable with the results of published studies using similar methods.

IANZ audit
Commentary: The formal audit conducted by IANZ in 2011 identified a number of quality issues – some of which point toward fundamental problems with quality assurance processes at BSHC and some of which were identified in previous audits.

Taken as a whole these data suggest that BSHC is a small provider that performs adequately in regard to national monitoring parameters but for which there are concerns around a number of quality measures. The study method used by Dr Y does not establish definitively whether there has been systemic under-reporting. Of note is the fact that the proportion of ‘misreads’ is within the range of the findings of similar published studies. However the relatively high rate of potentially missed small cancers does align with recent IMG reports and with the suggestion that there is low sensitivity by radiologists in incident round screening.

b) Conclusions

Based on the information available, and on the balance of probabilities, there is insufficient evidence to conclude that there has been a failure in the quality of reading screening mammograms beyond acceptable parameters for a screening programme. Accordingly, and acknowledging the recognised risks inherent in all screening programmes, there can be a reasonable degree of confidence that it is unlikely that women have experienced unacceptable harm.

The caveat on these conclusions is that further in-depth analysis could be carried out to test the questions, which might provide a greater level of confidence, and these are explored in Section 3.
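The point that small numbers leave wide uncertainty around a proportion can be illustrated with a minimal sketch. It applies a Wilson score interval to two counts quoted in this paper (47 of 108 in the Report, 28 of 120 in the seeded set). The choice of the Wilson method and the function name are assumptions of this sketch; the interval is not part of the NSU investigation or of the IMG methodology, and no such intervals were reported by the reviewers.

```python
import math
from typing import Tuple


def wilson_interval(events: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion events/n."""
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("need n > 0 and 0 <= events <= n")
    p = events / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Counts quoted in this paper.
report_low, report_high = wilson_interval(47, 108)   # Dr Y: 47 of 108 (43.5%)
seeded_low, seeded_high = wilson_interval(28, 120)   # seeded set: 28 of 120 (23%)

print(f"Report:     {report_low:.1%} to {report_high:.1%}")
print(f"Seeded set: {seeded_low:.1%} to {seeded_high:.1%}")
```

With samples of this size each interval spans well over ten percentage points, which is consistent with the caution expressed above about drawing firm conclusions from a small provider's figures.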
3) Options for Further Action

Given that there is no gold standard for determining the threshold for when more intensive investigation should be carried out, we looked at the published literature from formal investigations in other countries – the results are provided in Appendix Three. In summary all three published investigations arose as a result of concerns around individual radiologist performance. The two from the NHS were as a result of peers identifying poor performance at assessment clinics. The Canadian investigation resulted from concerns around a screening radiologist working in isolation and outside of a formal quality assurance system.

The circumstances that gave rise to these investigations (failure at assessment versus screening, identification of outlier performance by peers and professional isolation) are not directly comparable to our situation and so cannot provide a guide for decision making on whether further investigation should be carried out here.

In deciding what, if any, further actions should be carried out consideration needs to be given to: the degree of confidence in the findings of the initial investigation, the costs, benefits and impact of further investigation, and the likelihood of conclusive findings from further investigation.

a) Follow-up quality issues from IANZ audit

In this option no further investigations would be taken in regard to the Report. The focus for the NSU and SDHB would be on addressing the quality issues identified in the IANZ audit report. Specific support and monitoring could also be put in place to lift the sensitivity of screening and to raise the small cancer detection rate. Professor Taylor’s recommendations on refining BSA monitoring parameters could be progressed further by the NSU. The strength of this option is that it focusses attention and resources on quality improvement.
The risk is that if there has been a systematic failure and if there has been harm then this will not be identified. b) Detailed investigation of cancer case series This option involves a detailed case notes review of all 108 women with cancer identified in the Report. The information gathered would be for the initial purpose of descriptive epidemiology and hypothesis generation for subsequent investigation. The purpose would be to identify individual women who are ‘outliers’ in respect to any particular ‘exposures’ (such as radiologist or location or time of screening) and hence make a determination of likelihood of individual harm. This would involve additional costs and would take some time to complete. There is also a high probability that the results will again be inconclusive. There would also be an outstanding question of what would be the result if the same sort of analysis was carried out with another BSA provider. The strength of this option is that it lifts the probability (although it does not guarantee) that if there was harm then it will be identified. The risks are that such an investigation will be time consuming, costly and potentially inconclusive while undermining confidence in screening. c) Carry out a larger blinded reread study This involves a thoroughgoing formal reread of the mammograms of all 108 cases identified in the Report. A robust methodology would be based on a formal study protocol with features such as: using three or more control mammograms for every case, being re-read by a panel of 3 or more suitably qualified screening radiologists overseas, with each set merged with the radiologists’ day-to-day work. This would take considerable time to organise and would be very expensive. As there would be no bench-mark to other BSA providers (unless another one was selected as a control) the risk that the findings are inconclusive is high. The strengths and risks for this option are as for option b). 
d) Offer interval screening mammography to concerned women

This option would involve communication to the eligible population in Otago-Southland that the investigation had not demonstrated that there had been a systemic failure of screening. However women would be provided with a free ‘interval’ screening mammogram (that is, before their next scheduled screening) if they had any concerns. This option is based on the fact that it is impossible to be sure that no harm has occurred, and is intended to provide concerned women with reassurance. However there is a risk of this option providing a mixed message to women in saying that the service is considered to be safe enough to continue while offering additional mammography ‘just in case’.

On balance we consider that: there is a specific risk in screening programmes in so far as false negative findings (and therefore the potential for cancers not being detected in individual women) are an inevitable and accepted aspect of screening, in this context, and from the available information, there can be reasonable confidence that there has not been unacceptable harm to women in this instance, and the uncertainty of achieving any clearer outcome as well as the risks involved with further studies mean that additional investigation to determine the materiality of the Report is unwarranted.

Accordingly we make the following recommendations.
1. That no further investigations be carried out to establish whether there was systemic failure of screening or harm to individual women during the period covered by the Report.
2. Resources should be fully directed at working with Southern DHB in supporting lasting quality improvement at BSHC.
3. Further work should be carried out to refine the formal monitoring parameters for BSA.
4. Communicating the outcome of this investigation must include a clear description of the risks as well as the benefits of population based screening programmes. This should include clarifying that the result of a screening test provides a statement about the probability of disease being present rather than a diagnosis.

APPENDIX ONE

BSHC Mammography Screening Issue Taylor R. 10 April 2012

Comments on the BreastScreen Health Care (Southern South Island) mammography screening issue
Dr Richard Taylor MBBS(Syd), DTMH(Lon), FRCP(UK), PhD(Syd), FAFPHM.
Professor, School of Public Health and Community Medicine UNSW, Sydney Australia
r.taylor@unsw.edu.au

Summary
1. The standard monitoring indices for BSHC as assessed by the IMG Report for Jan 2009-Dec 2010 including trends (Dec 2011) have been satisfactory, and the latest Interval Cancer Report (still in draft) shows no elevated rates in BSHC. The time period for these reports overlaps that of the Audit Report [Dr Y 2012]. However, BSHC has shown consistently worse performance, even if not statistically significant, and consistently low rank in relation to other Lead Providers for some indices. For example, the initial screen cancer detection rates.
2. Although for some monitoring indices BSHC is worse than other Lead Providers, this would also be the case for some other Lead Providers. Small numbers make interpretation difficult which is why 95% confidence intervals are supplied. In any comparison between areas there will be a range from best to worst for all indices, but this does not necessarily mean that the worst performers are unsatisfactory according to set criteria, and there may be only small differences between areas.
3. The Audit Report [Dr Y 2012] could have benefited by advice on a more structured methodology and reporting framework, but this may not have been possible in the circumstances.
It would also have benefited by a review of the scientific literature on published findings from similar studies. It is commendable that such an investigation was carried out.
4. There is a significant issue of observer bias in the radiologist assessments, since both radiologists that read the prior mammograms were aware of the definitive diagnosis of cancer, and probably also the localisation of the lesion. The re-readings by Dr Z of cases which Dr Y considered as “possible misses” indicate that she would have recalled around 2/3. Further assessment of the readings of the prior mammograms by inclusion into a Test Set for naive radiologists, unaware of the diagnosis and circumstances, would provide better evidence of whether such women should have been recalled. However, in Test Sets there is a higher expectation of abnormality and lesions for recall than in screening practice, and readers also may be more attentive when reading test sets than in routine practice.
5. The recall to assessment in BSHC is a little below average of 3% (desirable target <4%), but is similar to 2 other Lead Providers, and higher than one other.
6. The information and discussion on tumour size in the Audit Report [Dr Y 2012] is difficult to follow, and mean values can be misleading because of outliers. The revised Spreadsheet data provides information on 28 women who could have been subject to late diagnosis because of possible missed lesions on prior mammograms (of the 47 patients mentioned in the Report by Dr Y); these cases consist of 1 case of DCIS and 27 patients with solid tumours. There is evidence that the proportions of small cancers ≤10 mm and ≤15 mm in this subset of 27 women are less than BSA targets, and less than that reported by BSHC for Jan09-Dec10.
The routine monitoring indices for small cancer detection for BSHC are in the target range based on the set criteria, especially for subsequent screens, which constitute the majority of screens, and were the only screens considered in the Audit Report [Dr Y 2012] (since there is no prior film for initial screens).
7. In the Spreadsheet subset of 27 patients with solid tumours, the node negative rate (approximately 60%) is lower than the BSA target (>75%) and lower than that achieved by BSHC and BSA over Jan09-Dec10, both of which were on target. The proportion of node negative patients from the Spreadsheet data was higher than that reported by Dr Y for the “indeterminate” group.
8. Before conclusions are drawn on whether there is excessive late diagnosis because of possible missed lesions on prior mammograms in BSHC, normative information should be obtained from: (a) published studies from other mammographic screening programs; and (b) expert opinion from New Zealand and international radiologists with extensive experience with mammographic screening. Similar studies of other Lead Providers could be conducted, but this may produce more issues.
9. Increased recall of suspicious changes on screening mammograms for further assessment, which may increase cancer detection, needs to be balanced by the higher recall and investigation of mammographic changes which are subsequently found not to be malignant. Information on this balance can be found in the scientific literature, and can be examined in test sets to some extent.
10. The hypothesised explanation (lack of ‘breast awareness’) for absence of elevated rates of interval cancers, but diagnosis at the next screen, of lesions possibly missed on the prior mammogram, should be investigated further with a properly constructed study.
11.
My opinion of the much higher palpability of lumps by surgeons compared with women is that it is partly related to prior knowledge of surgeons of a possible solid lesion visualised on mammography, and its localisation. Recommendations Further studies to be considered 1. Investigate whether women in this instance at BSHC should have been recalled on the basis of the prior mammogram by constructed studies: (a) further assessment of the readings of the prior mammograms by inclusion into a Test Set for naive radiologists (of differing experience and other predictors of expertise), unaware of the diagnosis and circumstances; (b) obtain information on normative performance of radiologists in similar circumstances from a review of published papers and reports from other screening programs, expert opinion, and possibly a review of other Lead Providers. 2. Investigation of the reason why the possibly missed cancers did not translate into elevated interval cancer rates for BSHC, examining proffered hypotheses. Monitoring interpretation and policy Depending on conclusions from above investigations: 3. In assessment of monitoring indices, weight more heavily in the future evidence of consistently worse performance in relation to targets, even if not statistically significant, and consistently low rank in relation to other Lead Providers. 4. Standards for performance indicators could be reviewed and readjusted. 5. New monitoring indices could be introduced related to radiology reading performance. Part I Comments on: PRELIMINARY FINDINGS, IMG REPORT BREAST CANCER AUDIT Dr Y (2011/12) covers July 2008 to June 2010 (24 months) 1. Page 1, Background. 1.1. 
BSHC have been consistently lowest compared with other Lead Providers for detection rate of DCIS and invasive breast cancer (combined), for initial screens and subsequent screens trend information based on 2 year data, with the upper 95% CIs and lower than the NZ average for subsequent screens in the last few reports [IMG Report for Jan 2009-Dec 2010 (Dec2011)]. There are no targets set for this parameter.
1.2. BSHC have manifested lower point estimates compared with targets for cancer detection based on initial screens from trend data for the following indices, although these rates are not statistically lower than the target based on 95% Confidence Intervals (CIs): proportion invasive cancers <15mm, initial screens, 2 years data (low but not significantly so); invasive cancers <15mm per 10,000 women screened, initial screens, 2 years data (low but not significantly so); proportion invasive cancers <10mm, initial screens, 2 years (low but not significantly so); invasive cancers <10mm per 10,000 women screened, initial screens, 2 years data (low but not significantly so) [IMG Report for Jan 2009-Dec 2010 (Dec2011)]. In a mature program like BSA (NZ), initial screens are mostly concentrated in younger women in the target age range, and lesions are more difficult to detect on mammograms in younger women because of residual breast tissue. Although initial screens become much less numerous as a screening program matures, once the proportion of initial screens stabilises, they may be considered as a sensitive quality indicator of mammography readings, even though they will not affect overall performance because of their small number.
1.3. Other indicators of cancer detection from trend data appear satisfactory in relation to targets and to NZ average [IMG Report for Jan 2009-Dec 2010 (Dec2011)], and this needs to be considered in terms of BSHC performance.
Small numbers are involved in parameter estimates, which is why 95% CIs are calculated. However, consistently low estimates for the same indices, even though not individually statistically significant, ought to be considered. 1.4. I agree that ranking in comparison to other Lead Providers ought to be considered if this is consistent for certain indices and the differences are of reasonable magnitude. 1.5. The most recent BSA Interval Cancer Report (still in draft), which covers screening up to and including 2007, with follow-up of interval cancers to 2009, did not indicate higher interval cancers for BSHC compared to other Lead Providers. In fact, BSHC display lower point estimates for 1st or 2nd year interval cancers, following initial or subsequent screens, compared to other Lead Providers. This report covers interval cancers diagnosed up to and including 2009 (from screens up to and including 2007) and overlaps the period of the Audit Report [Dr Y] and the most recent IMG Report for Jan 2009-Dec 2010 (Dec2011). 2. Page 1, Purpose of the audit 2.1. Although it is commendable that such an audit has been undertaken, it would have been better if there had been some input which would have refined the questions so each could have been addressed by a specific methodology. It would also have benefited by a review of the scientific literature on published findings from similar studies. However, this may not have been possible in the circumstances. 3. Page 2, What I Did 3.1. This is the Methodology section, although it is not labelled as such. It would have been preferable for a more nuanced and explicit rendering of the methodology in relation to the questions, including methods of analysis, but in view of the circumstances, this may not have been possible. 4. Page 3, Results and analyses In general, commentary or discussion of Results should not occur in this section, but rather in a subsequent section. Below I will address the results, as well as the comments on them. 
4.1. Tumour size

Most of the material here is background or explanation of methods, not Results. I find the discussion on DCIS difficult to follow; it appears to be considered because of the effect on average lesion size of solid tumours if DCIS is included. Similarly, I have difficulty understanding the relevance of descriptive material on histological (morphological) type. In any case, despite the implications related to size of tumour of DCIS or morphological type (lobular), these were not excluded from subsequent analyses.

Data are presented on average tumour size for various subsets, and claims are later made that these are larger than desirable. Normally tumour size data are presented as proportions of cancers between certain cut-offs, such as ≤5 mm, >5 mm to ≤10 mm, >10 mm to ≤15 mm, and ≤10 mm, ≤15 mm, and so on. Data are not usually presented as means since this parameter is significantly affected by outliers.

4.2. Page 6, Recalls

Since approximately 40% of mammograms were read by 3 radiologists, this is the proportion in which one of the readers did not flag the lesion for recall. One would need comparative data from other services and the scientific literature to properly evaluate this finding. The recall-to-assessment rate in BSHC is a little below the average of 3% (desirable target <4%), but is similar to BSAL and BSSL and higher than BSCtoC [IMG Report for Jan 2009-Dec 2010 (Dec2011)].

4.3. Page 6, Prior Screens

The table on Prior Screens is a little difficult to read because of inadequate headings. It appears that around 7% had evidence of a lesion suggestive of cancer on prior screens according to Dr Y. However, this increases to 34% of cases if the indeterminate category is included. Dr Y indicates that the ‘indeterminate’ category consists of “potential misses”, but does not indicate that they all should have been recalled, or that she would have recalled all of them.
It is stated in the Audit Report [Dr Y 2012] that a total of 34% of cases in the ‘indeterminate’ and ‘probably malignant’ categories on prior screens had “features of cancer”. This statement could imply that all or some should have been recalled. Dr Y admits to possible observer bias because she was aware of the diagnosis, and had seen the recent mammogram which displayed and localised the tumour. The intention as a follow-up was to use these images as part of a Test Set, along with normal mammograms and those with benign lesions, to determine if other radiologists would have recalled them. It appears that this did not occur. It should be noted that the expectation of malignant lesions in a Test Set is much higher than that found in screening mammography practice.

4.4. Page 8, Interval cancers

The monitoring evidence from the IMG Report for Jan 2009-Dec 2010 (Dec2011) does not suggest that the cancer detection rate in BSHC is ‘poor’ according to the standards set. Most screens are subsequent, and the trend data for subsequent screens indicate that the <15 mm invasive cancer detection rate and proportion of cancers detected <15 mm are satisfactory, with the point estimates on target, and similar to or better than BSCtoC. For the most recent biennium (Jan 2009-Dec 2010), the subsequent screen invasive cancer detection rate ≤10 mm was on target, although lower than other areas, and the proportion of invasive cancers ≤10 mm was on target and the second lowest, slightly higher than BSAL. The subsequent invasive cancer rate (all sizes) was on target, and although the lowest, was almost the same as BSSL.

The most recent BSA Interval Cancer (draft) report, which covers screens to and including 2007, with follow-up to 2009, does not show higher interval cancers for BSHC compared to other Lead Providers; in fact, the point estimates are lower for 1st or 2nd year intervals from initial or subsequent screens.
The period of cancer diagnosis in the most recent BSA Interval Cancer (draft) Report overlaps the Audit Report [Dr Y]. It is hypothesised that cancers which should have appeared as intervals were diagnosed at the next screen because the BSHC area consists of women who are not “breast aware”. This hypothesis would require comparative investigation.

4.5. Page 8, Palpable lesions

It is stated that detected cancers are larger than desirable, but it is unclear upon which data this statement is based. It appears to be based on mean size, but this is not usually used for categorising tumour size. However, the revised Spreadsheet data (see below) indicate a low proportion of ≤10 mm and ≤15 mm tumours compared to targets in the subset of 27 women with solid tumours that should possibly have been recalled on the prior mammogram.

There follows a series of hypotheses concerning why only around 7% of women reported lumps at screening, but a lump was felt by the surgeon in 57%. However, the palpation by the surgeon was presumably informed by prior knowledge that a lesion was seen on the mammogram and in a particular location. The question is whether a palpable lump would have been found in asymptomatic women without prior information from a mammogram. Without knowledge of the distribution of sizes of breast cancers and the sizes of breasts they were in, it is difficult to determine what proportion should have been palpable to women or clinicians.

4.6. Page 9, Nodal involvement

The most recent IMG Report for Jan 2009-Dec 2010 (Dec2011) indicates that BSHC at 76% was better than target (75%) for the proportion of node negative cancers, and ahead of 2 other services. The Audit Report [Dr Y] indicated approximately 70% node negative for the period July 2008 to June 2010 (24 months) which overlaps the period of the Dec 2011 IMG Report.
The Audit Report [Dr Y] states that only around 50% of malignant or indeterminate lesions (47 cases) seen on a prior screen were node negative. This information is contradictory when compared to the Spreadsheet data (below). The revised Spreadsheet data supplied (see below) indicate that, for the subset of cases which may have been missed on prior mammography (27 invasive solid tumours), the node negative proportion was approximately 60%.

5. Overall comments

This states that the purpose of the audit was to indicate that there was a problem.

Part II REPORT BY DR Z, with regard to a visit to BreastScreen Health Care, Dunedin over the weekend of 10/3/2012 and 11/3/201

Dr Z examined mammograms of some of the cases of possible missed diagnosis identified by Dr Y (4/3/12), and would have recalled 7 of 9.

On 9-11/3/12 Dr Z examined 35 prior screens (3 missing) from cancers diagnosed 2009-10 which were categorised by Dr Y as ‘indeterminate’ and were “potential misses”, and she would have recalled 20 (57%). If we assume that Dr Z would have recalled all 9 cases of ‘probably malignant’, then for the categories of ‘indeterminate’ and ‘probably malignant’ together, Dr Z would have recalled 29/44 (66%) of those classified as “potential misses” by Dr Y.

Dr Z also examined 40 prior subsequent screening mammograms from 2011 breast cancer cases (not covered by the Audit Report), and would have recalled 9 (23%).

Of course, in all instances above, Dr Z was aware of the subsequent confirmed diagnosis of breast cancer, as was Dr Y.

Part III SPREADSHEET DATA

The revised Spreadsheet (April 2012) provides information on 28 women, including 1 case of DCIS, which could have been subject to late diagnosis because of possible missed lesions on prior mammograms (of the 47 patients mentioned in the Report by Dr Y). Thus there are 27 women with solid invasive tumours for analysis.
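Both the re-read counts in Part II and the 27-woman subset in Part III rest on small denominators, so the uncertainty around each proportion is large. The following is a minimal sketch of how that uncertainty could be quantified, assuming a Wilson score interval; the reports do not state which interval method was used, so the method, the function name `wilson_interval` and the labels are illustrative assumptions only. The counts are those quoted in this review (20/35, 29/44 and 9/40 from Dr Z's re-read; 7/27 and 12/27 from the tumour-size analysis of the subset).

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (about 95% for the default z)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Counts quoted in the text; the labels are descriptive only.
counts = {
    "Dr Z, 'indeterminate' prior screens recalled": (20, 35),
    "Dr Z, 'indeterminate' plus 'probably malignant' (assumed all 9 recalled)": (29, 44),
    "Dr Z, 2011 prior subsequent screens recalled": (9, 40),
    "Subset of 27, tumours <=10 mm (target >=30%)": (7, 27),
    "Subset of 27, tumours <=15 mm (target >50%)": (12, 27),
}

for label, (k, n) in counts.items():
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

Under this assumed method, the intervals for 7/27 and 12/27 are wide enough to include the 30% and 50% targets respectively, which is in keeping with the caution about small numbers. An exact (Clopper-Pearson) interval would give somewhat wider bounds.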
Tumour size

As set out in the following table, women were classified according to the size (diameter) of the tumour (mm) according to various standard criteria. If there were multiple tumours, the largest lesion was chosen for each woman. On this basis, 17 cases or 63% were T1 according to TNM staging, 8 or 29.6% were T2, and 2 or 7.4% were T3.

Employing the small cancer detection target categories used by screening programs, including BSA, 7 or 25.9% were ≤10 mm (target: ≥ 30%) and 12 or 44.4% were ≤15 mm (target: >50%). That is, in this subset of patients, both small cancer detection proportions are below target, although these figures are based on small numbers, as evidenced by wide 95% CIs.

For women screened by BSHC Jan09-Dec10, the ≤10 mm proportion was 29.8% (95% CIs: 20.3-40.7%) based on 25 cases, which was, with BSAL, the lowest of all Lead Providers, although on target (≥ 30%); the BSA proportion for the same period was higher at 37.2% (95% CIs 34.5-39.9%).

For women screened by BSHC Jan09-Dec10, the ≤15 mm proportion was 52.4% (95% CIs: 41.2-63.4%) based on 44 cases, which was on target (>50%); the BSA proportion for the same period was 56.8% (95% CIs 54.0-59.6%).

There is evidence that the proportions of small cancers ≤10 mm and ≤15 mm in this subset of 27 women which could have been subject to late diagnosis because of possible missed lesions on prior mammograms are less than target and less than that reported by BSHC for Jan09-Dec10.

[Table: Sizes of solid tumours. Revised Spreadsheet data. April 2012-04-06]

APPENDIX TWO. Summary of findings from published studies of screening mammography reviews.

Author, Programme | Cases | Blinding | Reviewers | Proportion of False Negatives
Warren et al. (2003), UK [2] | 602 (both screen detected and interval cancers) | Partial: Aware the set was 100% cancer, unaware of location/history | 3 radiologists | 14% visible on earlier screen
Baines et al (1991), Canada [3] | Mix: 677 (both screen detected and interval cancers) with 5200 randoms | Partial: Aware of the casemix, unaware of location/history | 1 radiologist | 17% visible on earlier screen
Saarenmaa et al (1999), Finland [4] | 131 (both screen detected and interval cancers) | Unblinded | 1 radiologist | 33% (Screen detected 43%, Interval 19%)
Broeders et al (2003), Netherlands [5] | 234 (both screen detected and interval cancers) | Unblinded | 1 radiologist, 1 radiographer | 50% (no significant difference between SDC and IC)
Daly et al. (1998), UK [6] | 100 cancers detected in the incident round | Partial: Aware the set was 100% cancer, unaware of location/history | 1 radiologist | 25%
Daly et al. (1998), UK [6] | 100 cancers detected in the incident round | Unblinded | 1 radiologist | 44%
Jones et al. (1996), UK [7] | Mix: 133 cancers detected in the incident round with ~400 normals | Partial: Aware of casemix, unaware of location/history | 4 radiologists (but only 1/4 required to diagnose a false negative) | 19%

2 Warren, R. M. L., J. R. Young, et al. (2003). "Radiology review of the UKCCCR Breast Screening Frequency Trial: potential improvements in sensitivity and lead time of radiological signs." Clinical Radiology 58(2): 128-132.
3 Baines, C. J., D. V. McFarlane, et al. (1990). "The role of the reference radiologist. Estimates of inter-observer agreement and potential delay in cancer detection in the national breast screening study." Investigative Radiology 25(9): 971-976.
4 Baines, C. J., D. V. McFarlane, et al. (1990). "The role of the reference radiologist. Estimates of inter-observer agreement and potential delay in cancer detection in the national breast screening study." Investigative Radiology 25(9): 971-976.
5 Broeders, M. J. M., N. C. Onland-Moret, et al. (2003). "Use of previous screening mammograms to identify features indicating cases that would have a possible gain in prognosis following earlier detection." European Journal of Cancer 39(12): 1770-1775.
6 Daly, C. A., L. Apthorp, et al. (1998). "Second round cancers: how many were visible on the first round of the UK National Breast Screening Programme, three years earlier?" Clinical Radiology 53(1): 25-28.
7 Jones, R.D., McLean L., et al. (1996). "Proportion of cancers detected at the first incident screen which were false negative at the prevalent screen." The Breast 5: 339-343.

APPENDIX THREE. Summary of published investigations into breast screening false negative issues

Reports examined:
1. Burns, F. G. (2011). An independent external review of the breast screening unit at East Lancashire NHS Trust (Final Version 5).
2. Wilson, R. (2006). Report on a review of breast imaging at Altnagelvin Hospital, Belfast City Hospital and Antrim Area Hospital September 2002 to November 2005. Department of Health Social Services and Public Safety.
3. Bélanger, H. and L. Charbonneau (2012). Rapport d’enquête. Révision des mammographies et des tomodensitométries effectuées dans les cliniques de radiologie Fabreville, Jean-Talon-Bélanger et Domus medica, Collège des médecins du Québec 2008 – 2010

Summary: The two NHS reviews are in relation to failure of individual clinicians to appropriately diagnose breast cancer at assessment. The Quebec review focusses on the accuracy of an individual clinician to read mammograms and the absence of mandated quality standards for private providers there.

Methods: Retrospective review of mammograms/assessment notes is the methodology used in these 3 reports. A panel of expert radiologists were used to determine if the read/assessment was correct.
Issues: Some indicators of benchmarking of the investigated practitioner’s performance are mentioned, but there is little commentary around this. This limited attention to benchmarking risks exposing harms from screening that are not above the normal level. Any review will discover cases that have slipped through; the critical question is whether the number discovered is quantitatively or qualitatively above that which a well performing practitioner/provider would miss.

Contributory factors of relevance to BSHC:
• The limits of performance data.
• The risk of radiologists working in isolation.
• The risk of under staffing / busyness.

Report 1

The Burns report is an investigation into the practice of Dr X, who was identified as missing breast cancer diagnoses at assessment clinic. Dr X was the Clinical Director of the East Lancashire screening unit. The single root cause identified was that Dr X failed to update his practical screening assessment skills in line with guidelines. The overall cancer detection rates for the unit were above national targets, so his practice was not suspected until an interval cancer was identified. This prompted a closer internal look at Dr X’s assessment clinic outcomes, which resulted in the discovery of two false negative assessments in a single clinic.

Process

An incident team was established and decided to get peer-review of all the cases in assessment clinic at that screening Trust (patients seen by Dr X and other radiologists). The initial review confirmed that the error rate of Dr X was significantly higher than that of the other 2 radiologists and overall was outside of acceptable practice. The review period was extended several times further, but only to look at Dr X’s cases.

Comments

This represents a failure of assessment clinic.
The scenario at BSHC is potential failure at mammography, a different stage of screening both epidemiologically and clinically. It is important that Dr X’s performance was benchmarked to make sure the potential misses were in excess of expected. He was also found to be in breach of the National standards for screening practice.

Report 2

The extended review of a single radiologist was prompted by colleagues who raised concerns to their Clinical Director about his clinical competence and performance. This involved assessment clinic performance for both screen detected and symptomatic cases. Of note, he was not performing ultrasound examination or taking biopsies on cases where the standards would require this to be done.

Process

A team of radiologists performed the review of cases. All cases which had been solely reviewed by Dr X during a time period were re-examined to see if the correct diagnosis had been made. Some cases which had been poorly assessed were brought back and re-assessed. The report does not detail what process was used to benchmark Dr X’s performance but does comment that “screening assessment clinics carried out by other radiologists has shown an overall high standard of care with no evidence of general poor practice.”

Contributory factors
• Shortage of consultant breast radiologists in Northern Ireland is considered to be a significant contributory factor.
• A reliance on external locums was believed to increase the rate of referral to assessment. The recall rates to assessment doubled during the year when these deficiencies were observed. It is felt this was due to a reliance on locum radiologists to double read the screening mammograms but who did not have subsequent responsibility for the assessment of cases recalled. It is for this reason that the NHSBSP guidelines for breast screening radiology recommend that those involved in screen reading are also directly involved in the assessment at the same site.
• Otherwise the performance data of the Antrim screening programme had been satisfactory and equivalent to other services. This illustrates the difficulty of relying on standard performance statistics to measure the standard of care being provided.
• The review identified that most errors occurred at sites where the radiologist was working single handed.

Comments

This represents a failure of assessment clinic. The scenario at BSHC is potential failure at mammography, a different stage of screening both epidemiologically and clinically.

Report 3

This report is written in French. Although an approximate translation has been achieved through Google Translate, there remains a limit on our ability to interpret this report accurately.

The report reviews the performance of mammography reading done by a Quebec radiologist between 9 October 2008 and 9 October 2010. These films were reviewed by other radiologists to determine if the appropriate decision was made. It appears that this radiologist was practising in private clinics at least some of the time and these did not have to meet usual quality assurance standards for breast screening. It is not clear if the performance of this radiologist is benchmarked, or if the false negative reads were above that which would be expected.

The report confirms the discrepancies in the readings of the radiologist under investigation and, less significantly, those of other radiologists. It raises issues on quality assurance mechanisms surrounding the practice of imaging in private clinics. The recommendations largely focus on improving quality assurance systems about individual radiologist performance (performance standards, second reads of all mammograms read by outliers).