POSTnote Number 1234 December 2021 Healthcare AI: New Challenges

Artificial Intelligence has a fairly long-standing relationship with healthcare and medicine. It has delivered impressive results in areas of healthcare such as early detection, diagnosis, decision support, and even treatment recommendation. In the past few years, several applications of AI have emerged in which machine intelligence has augmented, and sometimes even outperformed, human decisions. More recently, artificial intelligence has become a very active area of research, inviting researchers to work at the intersection of different domains, with healthcare among the most popular choices. AI research in healthcare is attractive because it has direct implications for human lives.

AI in Healthcare: How effective?
Adoption of Artificial Intelligence (AI) and Computer-Aided Diagnosis (CAD) can support the UK's healthcare system and play a significant role in accomplishing key objectives. However, despite prolific research in both industry and academia, we are still far from large-scale adoption of AI in clinical settings. While a previously published POSTnote provides background on artificial intelligence and its applications in healthcare, this POSTnote extends it by focusing on the challenges to adoption. It provides an overview of the challenges of using AI in healthcare by presenting them under six main categories. It also discusses some implementable solutions and suggests how they can be integrated into current systems. Studying the problems and their accompanying solutions can help bridge the current gap between academic research and clinical demands, which could in turn foster better adoption of AI in the medical domain.

Background
Artificial Intelligence is a field that studies how to design algorithms that can extract patterns from a given dataset.
The algorithms are designed to mimic the learning process of the human brain as closely as possible. Although the underlying algorithms were designed long ago, researchers have only recently started working on innovative use-cases, owing to advances in computational and storage capacity. In the last five years, artificial intelligence has reached superhuman capability in certain complex tasks, such as playing the games of Go and StarCraft, and has begun advancing science (all developed by the UK-based organization DeepMind). Despite continuous research and development, impressive results and innovative use-cases, there remain major issues with adopting AI algorithms in clinical settings. The COVID-19 pandemic was the first major opportunity for researchers to demonstrate the true efficacy of AI by creating tools to alleviate the nationwide pressure on hospitals and clinicians. Academia and industry joined hands to solve a common problem. Tools were created to diagnose COVID-19, predict the number of cases, forecast demand for essential resources, spread awareness through social media, and even search for treatments. Industrial organizations such as Qure.ai (India) and Lunit (South Korea) were among the first vendors to provide algorithms tailored for COVID-19 detection. Many hospitals, such as the Royal Bolton Hospital, UK, adopted these algorithms in an attempt to reduce the increasing pressure on their staff. However, none of these tools could be adopted into routine practice. Roberts et al. performed a comprehensive study surveying 62 papers that used machine learning, deep learning, or both for either diagnosis or prognosis of COVID-19. They found that none of the proposed algorithms were of potential clinical use due to methodological flaws and/or underlying biases.
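One recurring methodological flaw identified in such reviews is data leakage from splitting a dataset at the image level, so that scans from the same patient end up in both the training and test sets and reported performance is inflated. The following is a minimal, illustrative sketch of splitting at the patient level instead; the record format and function name are our own for illustration, not taken from any specific study:

```python
import random

def patient_level_split(records, test_frac=0.25, seed=0):
    """Split (patient_id, image) records so that no patient spans both sets.

    Splitting by patient rather than by image avoids leakage: near-identical
    scans of one patient could otherwise appear in both training and test data.
    """
    patients = sorted({pid for pid, _ in records})
    random.Random(seed).shuffle(patients)
    n_test = max(1, int(len(patients) * test_frac))
    test_ids = set(patients[:n_test])
    train = [r for r in records if r[0] not in test_ids]
    test = [r for r in records if r[0] in test_ids]
    return train, test

# Toy example: 10 hypothetical patients with 3 images each.
records = [(f"patient{i // 3}", f"image{i}") for i in range(30)]
train, test = patient_level_split(records)
# No patient identifier appears in both splits.
assert {p for p, _ in train}.isdisjoint({p for p, _ in test})
```

Checks of this kind are cheap to automate, which is one reason reviewers have argued they should be a standard part of any study claiming clinical relevance.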
Similarly, the Alan Turing Institute, the UK's national institute for data science and artificial intelligence, has published a report on the response of the UK's data science and AI community to the COVID-19 pandemic, noting how AI failed to make the impact it promised. This points towards a major gap between AI research and clinical demands that needs to be addressed if we aim to establish the UK as a world leader in medical AI.

Overview
• Artificial Intelligence has great potential for transforming and augmenting healthcare. However, certain challenges need to be addressed first.
• The COVID-19 pandemic presented the first opportunity to test AI's efficacy on a large scale. However, the technology failed to deliver the expected results.
• Researchers have identified the reasons and made recommendations on steps to improve AI for healthcare.
• AI will be an integral part of healthcare in the future. Hence, investing in AI research is important.

Key Challenges

Medical Imaging Data Curation
Data is the most fundamental requirement for artificial intelligence. The algorithms require large datasets in order to recognize patterns within them and make predictions. Although many imaging datasets are available for research, some either do not involve medical images or are proprietary [JFT300M, JFT300B (Google); IG-3.5B-17k (Facebook)]. Curating medical datasets is hard for multiple reasons. Firstly, unlike the standard datasets used for machine learning research, curating medical datasets cannot be crowdsourced amongst the general population and requires the expertise of medical doctors. Further, such datasets require inter- and intra-observer agreement amongst the experts, which can be very low for specific clinical conditions. The degree of specialization required for curating medical datasets limits the number of individuals who can contribute to the process.
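Inter-observer agreement of this kind is commonly quantified with Cohen's kappa, which corrects raw agreement for the agreement two annotators would reach by chance. A small illustrative sketch; the labels and values below are hypothetical:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label at random.
    expected = sum(freq_a[l] / n * freq_b[l] / n
                   for l in freq_a.keys() | freq_b.keys())
    return (observed - expected) / (1 - expected)

# Two hypothetical radiologists labelling 10 scans as disease (1) or normal (0).
a = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
b = [1, 1, 0, 0, 0, 1, 1, 0, 1, 1]
print(round(cohens_kappa(a, b), 3))  # 80% raw agreement, but kappa ≈ 0.583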
The curation of some of the large-scale, open-sourced medical datasets was enabled by joint collaboration between medical research centers spread across several nations.

Algorithm Design and Measures of Performance
Most of the algorithms and metrics used in machine learning are not particularly tailored for medical problems. There are unique requirements that need to be considered when designing algorithms for use in the medical domain. These algorithms should be designed with the goal of having potential clinical implications rather than providing a proof-of-concept. Similarly, the metrics used to evaluate performance should focus on clinical performance. For instance, an incorrect diagnosis has a much greater implication in the medical domain than in other applications of artificial intelligence. Hence, metrics that take this into account should be designed and adopted for medical applications of AI.

Lack of algorithm explainability and validation
AI algorithms have limited explainability and generalizability. Most of the products on the market were created as input-output systems with little support for explaining their decision-making process. This lack of explainability prevents clinicians from placing their trust in these algorithms and adopting them in their clinical workflow. Another challenge arises during validation and testing of these algorithms. An effective validation strategy should involve studies from multiple centers and demographics in order to better simulate the real-world scenario. However, this is difficult to implement, since different study centers structure their data pipelines differently, while the algorithms are quite rigid about their own input pipeline structure. One way to alleviate this is to standardize the input-output pipeline structure both for study centers and for algorithms.

Ethics, algorithmic bias and generalizability
Patient privacy greatly hinders the curation and open-sourcing of medical datasets. The Protected Health Information (PHI) of a patient is often present in a dataset and could later be used to reveal the patient's identity. This prevents most medical datasets from being shared publicly. The presence of this protected information is mostly due to standard steps in the clinical workflow (the burning of the patient's name onto an X-ray report, for instance). AI algorithms have been accused of making use of this protected information to make predictions and give biased results.

Artificial intelligence algorithms have a long history of being biased. Most algorithmic bias is attributed to a lack of diversity in the training data. The bias in the data is amplified by the algorithms, leading to prejudiced results affecting certain segments of the population. There have been cases in the past where an AI prioritized white patients for receiving treatment for a condition that is much more prevalent in black patients. A series of such decisions by an AI can weaken people's trust in the technology and can also lead to instability. It is important to understand that most of this bias stems from age-old inequalities within society (the exclusion of women or the minimal inclusion of black people in medical studies, for instance) and less from the algorithm's design.

Most AI algorithms also lack strong generalization, i.e. they tend to perform poorly upon encountering data that differs from their training data. In real-world settings, the patient population comprises people from different ethnicities, demographics and backgrounds, whereas algorithms are trained on a very limited subset of this population. As a result, they often fail upon encountering patients from demographics that were either absent from or barely present in the training data.

Performance Drift Over Time
The learning process of algorithms differs from how clinicians learn in one fundamental aspect: clinicians continuously adapt to new problems and cases over time, whereas algorithms, once deployed, cannot change further. This inability to change leads to a performance drift over time, where the algorithms are no longer relevant to current patient cases and have to be either re-trained on new data or completely scrapped. Re-training the models is expensive and requires collecting, processing and structuring the new data. It can also lead to the model forgetting previously acquired knowledge.

Addressing the Challenge

Promote Collaboration
In order to develop effective methods for solving clinical problems, there is a strong need for artificial intelligence researchers to collaborate with clinicians to understand clinical requirements and workflows. Most of the algorithms currently in use are derived from ideas formulated for standard machine learning problems, making them sub-optimal for solving clinical problems. The development of a successful medical AI product requires the involvement of all stakeholders in the development process. Recent collaborations between major pharmaceutical and technology companies are one successful step in this direction. The Government can promote the establishment of collaborative research centers and schemes that incentivize collaboration between different stakeholders.

Recruit and retain talent
The Government can support AI research in healthcare by funding different programs. Universities and educational institutions can use these funds to recruit international talent. The recently created AI research division of the NHS, NHSX, is one successful step towards facilitating and overseeing country-wide adoption of AI in healthcare.
Recently, UK Research and Innovation (UKRI) established Centres for Doctoral Training (CDTs) to train students in areas of special importance, including artificial intelligence in healthcare. This would enable the recruitment of national and international talent and also create incentives for them to stay within the country and contribute. The Government could also provide priority visas to people working in these areas of importance who wish to stay in the country after completing their studies. Removing these barriers should be of prime importance in order to establish the UK's leadership in AI and healthcare.

Enhance explainability and validation
Vendors creating AI algorithms should ensure that explainability is built into the algorithms from the beginning. This would greatly help build clinicians' trust in the technology and support better integration into clinical workflows. It would also enable clinicians to identify novel biomarkers through the lens of artificial intelligence, advancing and augmenting healthcare. In conjunction with this, AI vendors should also place a major focus on extensive validation of their products. Validation and testing studies should be conducted separately from, but in parallel with, the original study and should involve data from multiple medical centers representing different demographics in order to effectively approximate the real-world scenario. In addition, they should account for other factors such as image acquisition protocols and camera systems. Recent research has discovered that AI algorithms have the potential to predict sensitive attributes such as the race and ethnicity of a patient from images. This requires additional checks on these attributes to be performed as part of the validation studies conducted by vendors, researchers and clinicians.

Eliminate bias from studies and algorithms
All AI algorithms should be placed within strict constraints to check for bias in their decisions. A large cause of this bias lies in the way current clinical studies are designed, excluding minority groups in terms of race (the black population), gender (females and non-binary people) and ethnicity. It is essential for participants in these studies to be stratified into well-defined, equal groups. After creating balanced studies, the algorithms should be checked for bias in their decisions against any particular minority group. Such validation checks should also be included in the validation studies performed to win approval for an algorithm in the real world.

References
Lei Rigi Baltazar et al. “Artificial intelligence on COVID-19 pneumonia detection using chest xray images”. In: PLoS ONE 16.10 (2021), e0257884.
Imon Banerjee et al. “Reading Race: AI Recognises Patient’s Racial Identity In Medical Images”. In: arXiv preprint arXiv:2107.10356 (2021).
Kanav Bhagat and Tavpritesh Sethi. “WashKaro”. In: (2020).
Bill Briggs. “Novartis empowers scientists with AI to speed the discovery and development of breakthrough medicines”. In: (2021). URL: https://news.microsoft.com/transform/novartis-empowers-scientists-ai-speed-discovery-development-breakthrough-medicines/.
Alex Davies et al. “Advancing mathematics by guiding human intuition with AI”. In: Nature 600.7887 (2021), pp. 70–74.
Jia Deng et al. “ImageNet: A large-scale hierarchical image database”. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009, pp. 248–255. DOI: 10.1109/CVPR.2009.5206848.
“Fairness and inclusivity: key ingredients in equitable health AI”. In: (2021). URL: https://www.statnews.com/2021/11/30/fairness-inclusivity-key-ingredients-development-health-ai/.
“From oximeters to AI, where bias in medical devices may lurk”. In: (2021). URL: https://www.theguardian.com/society/2021/nov/21/from-oximeters-to-ai-where-bias-in-medical-devices-may-lurk/.
Victor O. Gawriljuk et al.
“Machine Learning Models Identify Inhibitors of SARS-CoV-2”. In: Journal of Chemical Information and Modeling 61.9 (2021), pp. 4224–4235. DOI: 10.1021/acs.jcim.1c00683.
Marzyeh Ghassemi et al. “Practical guidance on artificial intelligence for health-care data”. In: The Lancet Digital Health 1.4 (2019), e157–e159.
Varun Gulshan et al. “Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs”. In: JAMA 316.22 (2016), pp. 2402–2410.
Stephanie A Harmon et al. “Artificial intelligence for the detection of COVID-19 pneumonia on chest CT using multinational datasets”. In: Nature Communications 11.1 (2020), pp. 1–7.
Sepp Hochreiter and Jürgen Schmidhuber. “Long short-term memory”. In: Neural Computation 9.8 (1997), pp. 1735–1780.
Eui Jin Hwang et al. “Development and validation of a deep learning–based automated detection algorithm for major thoracic diseases on chest radiographs”. In: JAMA Network Open 2.3 (2019), e191095.
Jeremy Irvin et al. “CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison”. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. 2019, pp. 590–597.
John Jumper et al. “Highly accurate protein structure prediction with AlphaFold”. In: Nature 596.7873 (2021), pp. 583–589.
Christopher J Kelly et al. “Key challenges for delivering clinical impact with artificial intelligence”. In: BMC Medicine 17.1 (2019), pp. 1–9.
László Róbert Kolozsvári et al. “Predicting the epidemic curve of the coronavirus (SARS-CoV-2) disease (COVID-19) using artificial intelligence: An application on the first and second waves”. In: Informatics in Medicine Unlocked 25 (2021), p. 100691.
Heidi Ledford. “Millions of black people affected by racial bias in health-care algorithms”. In: Nature 574.7780 (2019), pp. 608–610.
Scott Mayer McKinney et al. “International evaluation of an AI system for breast cancer screening”. In: Nature 577.7788 (2020), pp. 89–94.
Ziad Obermeyer et al. “Dissecting racial bias in an algorithm used to manage the health of populations”. In: Science 366.6464 (2019), pp. 447–453.
Deepak Painuli et al. “Forecast and prediction of COVID-19 using machine learning”. In: Data Science for COVID-19 (2021), pp. 381–397. DOI: 10.1016/B978-0-12-824536-1.00027-7.
Seong Ho Park et al. “Ethical challenges regarding artificial intelligence in medicine from the perspective of scientific editing and peer review”. In: Science Editing 6.2 (2019), pp. 91–98.
Emma Pierson et al. “An algorithmic approach to reducing unexplained pain disparities in underserved populations”. In: Nature Medicine 27.1 (2021), pp. 136–140.
Suman Ravuri et al. “Skillful Precipitation Nowcasting using Deep Generative Models of Radar”. In: arXiv preprint arXiv:2104.00954 (2021).
Michael Roberts et al. “Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans”. In: Nature Machine Intelligence 3.3 (2021), pp. 199–217. DOI: 10.1038/s42256-021-00307-0.
David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. “Learning representations by back-propagating errors”. In: Nature 323.6088 (1986), pp. 533–536.
David Silver et al. “Mastering the game of Go with deep neural networks and tree search”. In: Nature 529.7587 (2016), pp. 484–489.
Anuroop Sriram et al. “COVID-19 Prognosis via Self-Supervised Representation Learning and Multi-Image Prediction”. In: arXiv preprint arXiv:2101.04909 (2021).
Oriol Vinyals et al. “Grandmaster level in StarCraft II using multi-agent reinforcement learning”. In: Nature 575.7782 (2019), pp. 350–354.
Linda Wang, Zhong Qiu Lin, and Alexander Wong. “COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images”. In: Scientific Reports 10.1 (2020), pp. 1–12.
Martin J Willemink et al. “Preparing medical imaging data for machine learning”. In: Radiology 295.1 (2020), pp. 4–15.
John R Zech et al. “Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study”. In: PLoS Medicine 15.11 (2018), e1002683.