
AI In Healthcare

Number 1234 December 2021
Healthcare AI: New Challenges
Artificial Intelligence has had a fairly long-standing relationship with healthcare and medicine. It has delivered impressive results in different areas of healthcare, such as early detection, diagnosis, decision-making and even treatment recommendation. In the past few years, several applications of AI have emerged where machine intelligence has augmented, and sometimes even outperformed, human decisions [11] [20]. More recently, artificial intelligence has become a very active area of research, inviting researchers to work at the intersection of different domains, with healthcare being one of the popular choices. AI research in healthcare is attractive since it can have a direct impact on human lives.
AI in Healthcare: How effective?
Adoption of Artificial Intelligence (AI) and Computer-Aided Diagnosis (CAD) can support the UK's healthcare system and play a significant role in accomplishing its key objectives. However, despite prolific research in both industry and academia, we are still far from large-scale adoption of AI in clinical settings. While a previously published POSTnote provides background on artificial intelligence and its applications in healthcare, this POSTnote extends it by focusing on the challenges of clinical adoption. It provides an overview of the obstacles to using AI in healthcare, grouped under six main categories. It also discusses some implementable solutions and suggests how they can be integrated into current systems. Studying these problems and the accompanying solutions can help bridge the current gap between academic research and clinical demands, which could in turn foster better adoption of AI in the medical domain.
Artificial Intelligence is a field that studies how to design algorithms that can extract patterns from a given dataset. The algorithms are designed to mimic the learning process of the human brain as closely as possible. Although the underlying algorithms were designed long ago [27] [13], researchers have only recently begun working on innovative use-cases, enabled by advances in computational and storage capacity. In the last five years, artificial intelligence has reached superhuman capability in certain complex tasks, such as playing the games of Go and StarCraft [28] [30] and advancing science [16] [5] [25] (all developed by the UK-based organization, DeepMind).
Despite continuous research and development, impressive results and innovative use-cases, there are major issues with adopting AI algorithms in clinical settings. The COVID-19 pandemic was the first major opportunity for researchers to demonstrate the true efficacy of AI by creating tools to alleviate the nationwide pressure on hospitals and clinicians. Academia and industry joined hands to solve a common problem. Tools were created to diagnose COVID-19 [31] [12] [1], predict the number of cases [22] [18], forecast demand for essential resources [29], spread awareness through social media [3], and even search for treatments [9]. Industrial organizations such as Qure.ai (India) and Lunit (South Korea) were among the first vendors to provide algorithms tailored for COVID-19 detection, and many hospitals, such as the Royal Bolton Hospital, UK, trialled these algorithms in an attempt to reduce the increasing pressure on their staff. However, none of the proposed tools proved suitable for clinical practice. Roberts et al. [26] performed a comprehensive study surveying 62 papers that used machine learning, deep learning or both for the diagnosis or prognosis of COVID-19. They found that none of the proposed algorithms were of potential clinical use due to methodological flaws and/or underlying biases. Similarly, the Alan Turing Institute, the UK's national institute for data science and artificial intelligence, has published a report on the response of the UK's data science and AI community to the COVID-19 pandemic, noting that AI failed to make the impact it promised. This points towards a major gap between AI research and clinical demands that needs to be addressed if we aim to establish the UK as a world leader in medical AI.
• Artificial Intelligence has great potential for transforming and augmenting healthcare. However, certain challenges need to be addressed first.
• The COVID-19 pandemic presented the first opportunity to test AI's efficacy at scale. However, the technology failed to deliver the expected results.
• Researchers have identified the reasons and made recommendations on how to improve AI for healthcare.
• AI will be an integral part of healthcare in the future. Hence, investing in AI research is important.
Key Challenges
Medical Imaging Data Curation
Data is the most fundamental requirement for artificial intelligence. The algorithms require large volumes of data in order to recognize patterns and make predictions. Although many imaging datasets are available for research, some of them either do not involve medical images [6] or are proprietary (JFT300M and JFT300B (Google), IG-3.5B-17k (Facebook)). Curating medical datasets is a hard task for multiple reasons. Firstly, unlike standard datasets used for machine learning research, curating medical datasets cannot be crowdsourced to the general population and requires the expertise of medical doctors. Further, such datasets require inter- and intra-observer agreement amongst the experts, which can be very low for specific clinical conditions and is therefore measured explicitly, as sketched below. The degree of specialization required for curating medical datasets limits the number of individuals who can contribute to the process. The curation of some of the large-scale, open-sourced medical datasets [15] [31] was enabled by a joint collaboration between medical research centers spread across several nations.
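To make the agreement requirement concrete, the sketch below scores the consistency of two annotators with Cohen's kappa, a standard chance-corrected agreement statistic. The labels are invented for illustration, and scikit-learn's cohen_kappa_score is one widely used implementation.

# A minimal sketch of measuring inter-observer agreement during dataset
# curation. The two label lists are hypothetical; in practice they would be
# the per-image annotations of two independent radiologists.
from sklearn.metrics import cohen_kappa_score

reader_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # radiologist A: 1 = finding present
reader_b = [1, 0, 0, 1, 0, 1, 0, 1, 1, 1]  # radiologist B on the same images

kappa = cohen_kappa_score(reader_a, reader_b)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0.0 = chance level

Batches of labels that fall below an agreed kappa threshold can be routed back for re-annotation or adjudication by a senior reader.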
Algorithm Design and Measures of Performance
Most of the algorithms and metrics used in machine learning are not tailored to medical problems, and there are unique requirements to consider when designing algorithms for the medical domain. These algorithms should be designed with the goal of having potential clinical impact rather than merely providing a proof-of-concept. Similarly, the metrics used to evaluate performance should focus on clinical performance. For instance, an incorrect diagnosis has much greater implications in the medical domain than in other applications of artificial intelligence. Hence, metrics that take this asymmetry into account should be designed and adopted for medical applications of AI.
Lack of algorithm explainability and validation
AI algorithms have limited explainability and generalizability. Most of the products on the market were created as input-output systems with little support for explaining their decision-making process. This lack of explainability prevents clinicians from placing their trust in these algorithms and adopting them into their clinical workflow. Another challenge arises during the validation and testing of these algorithms. An effective validation strategy should involve studies from multiple centers and demographics in order to better simulate the real-world scenario. However, this is difficult to implement since different study centers structure their data pipelines differently, while the algorithms are quite rigid about their own input pipeline structure. One way to alleviate this is to standardize the input-output pipeline structure for study centers and algorithms alike.
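To illustrate one basic form such an explanation can take, the sketch below computes a gradient-based saliency map: a crude estimate of how strongly each pixel influenced the model's top-scoring output. This is only one simple technique among many, and the PyTorch model is a hypothetical stand-in for a vendor's diagnostic network.

# A minimal gradient-saliency sketch. `model` is any trained PyTorch image
# classifier (hypothetical here); `image` is a (C, H, W) tensor. Brighter
# regions of the returned (H, W) map contributed more to the decision.
import torch

def saliency_map(model: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    model.eval()
    image = image.clone().requires_grad_(True)          # track input gradients
    score = model(image.unsqueeze(0)).squeeze().max()   # top-class score
    score.backward()                                    # back-propagate to pixels
    return image.grad.abs().max(dim=0).values           # per-pixel relevance

Overlaying the returned map on the input image lets a clinician check whether the model attended to clinically plausible regions rather than artefacts such as scanner markings.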
Patient privacy
Patient privacy greatly hinders the curation and open-sourcing of medical datasets. The Protected Health Information (PHI) of a patient is often present in a dataset and could later be used to reveal the patient's identity, which prevents most medical datasets from being shared publicly. The presence of this protected information is mostly due to standard steps in the clinical workflow (for instance, the burning of the patient's name onto an X-ray report). AI algorithms have been accused of using this protected information to make predictions and give biased results.
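As a hedged sketch of the metadata side of de-identification, the snippet below blanks common identifying DICOM tags with pydicom before a file is shared. The tag list is illustrative rather than a complete PHI profile, and text burned into the pixel data itself would still need separate detection and masking.

# A minimal DICOM de-identification sketch using pydicom. PHI_TAGS is an
# illustrative subset, not an exhaustive profile; burned-in text in the
# pixel data is NOT handled here and needs separate OCR/masking.
import pydicom

PHI_TAGS = ["PatientName", "PatientID", "PatientBirthDate",
            "ReferringPhysicianName", "InstitutionName"]

def deidentify(in_path: str, out_path: str) -> None:
    ds = pydicom.dcmread(in_path)
    for tag in PHI_TAGS:
        if tag in ds:                 # blank the element if present
            setattr(ds, tag, "")
    ds.remove_private_tags()          # drop vendor-specific private elements
    ds.save_as(out_path)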
Ethics, algorithmic bias and generalizability
Artificial intelligence algorithms have a long history of being biased. Most algorithmic bias is attributed to a lack of diversity in the training data: bias in the data is amplified by the algorithms, leading to prejudiced results that affect certain segments of the population. There have been cases in the past where an AI prioritized white patients for treatment of a condition that is much more prevalent in black patients [24] [21] [19]. An accumulation of such decisions by an AI can weaken people's trust in the technology and can also lead to instability. It is important to understand that most of this bias stems from age-old inequalities within society [7], for instance the exclusion of women or the minimal inclusion of black people in medical studies, and less from the algorithms' design.
Most AI algorithms also lack strong generalization, i.e. they tend to perform poorly upon encountering data that differs from their training data. In the real world, the patient population comprises people from different ethnicities, demographics and backgrounds, whereas algorithms are trained on a very limited subset of this population [17] [10] [23] [32]. As a result, they often fail upon encountering patients from demographics that were absent or barely present during training [8] [33] [14].
Performance Drift Over Time
The learning process of algorithms differs from how clinicians learn in one fundamental respect: clinicians continuously adapt to new problems and cases over time, whereas algorithms, once deployed, cannot change further. This inability to change leads to performance drift over time, where the algorithms are no longer relevant to current patient cases and have to be either re-trained on new data or scrapped completely. Re-training the models is expensive and requires collecting, processing and structuring the new data. It can also lead to the model forgetting previously acquired knowledge.
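One hedged way to catch such drift before it silently degrades care is to compare the distribution of the model's output scores on recent cases against a reference window from deployment time. The population stability index (PSI) below is a common heuristic for this, and the 0.2 threshold is a rule of thumb rather than a clinical standard; all data is synthetic.

# A minimal drift-monitoring sketch: the population stability index (PSI)
# between the model's scores at deployment time and its scores on recent
# patients. Large values suggest the input population has shifted and the
# model may need re-training.
import numpy as np

def psi(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf          # catch out-of-range scores
    p = np.histogram(reference, edges)[0] / len(reference)
    q = np.histogram(current, edges)[0] / len(current)
    p, q = np.clip(p, 1e-6, None), np.clip(q, 1e-6, None)  # avoid log(0)
    return float(np.sum((p - q) * np.log(p / q)))

rng = np.random.default_rng(0)
ref = rng.beta(2, 5, 5000)                         # scores at deployment
now = rng.beta(3, 4, 1000)                         # scores on recent cases
print(f"PSI: {psi(ref, now):.3f}  (> 0.2 is a common 'investigate' threshold)")

When the index exceeds the alert threshold, the monitoring pipeline can flag the model for review and possible re-training on recent data.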
Addressing the Challenge
Promote Collaboration
In order to develop effective methods for solving clinical problems, there is a strong need for artificial intelligence researchers to collaborate with clinicians to understand clinical requirements and workflows. Most of the algorithms currently in use are derived from ideas formulated for standard machine learning problems, making them sub-optimal for solving clinical problems. The development of a successful medical AI product requires the involvement of all stakeholders in the development process. Recent collaborations between major pharmaceutical and technology companies are a successful step in this direction [4]. The Government can promote the establishment of collaborative research centers and schemes that incentivize collaboration between different stakeholders.
Enhance explainability and validation
Vendors creating AI algorithms should ensure that explainability is built into the algorithms from the beginning. This would greatly help in building clinicians' trust in the technology and enable better integration into the clinical workflow. It would also enable clinicians to identify novel biomarkers through the lens of artificial intelligence, which would advance and augment healthcare. In conjunction with this, AI vendors should also place a major focus on extensive validation of their products. Validation and testing studies should be conducted separately from, but in parallel to, the original study and should involve data from multiple medical centers representing different demographics in order to effectively approximate the real-world scenario. They should also account for other factors such as image acquisition protocols and camera systems. Recent research has discovered that AI algorithms have the potential to predict sensitive attributes, such as a patient's race and ethnicity, from images [2]. This calls for additional checks on these attributes to be performed as part of the validation studies conducted by vendors, researchers and clinicians.
Eliminate bias from studies and algorithms
All AI algorithms should be placed within strict constraints that check for bias in their decisions. A large cause of this bias lies in the way current clinical studies are designed, excluding minority groups in terms of race (the black population), gender (women and non-binary people) and ethnicity. It is essential for participants in these studies to be stratified into well-defined, balanced groups. After creating balanced studies, the algorithms should be checked for bias in their decisions against any particular minority group, as sketched below. Such validation checks should also be included in the validation studies performed to win approval for an algorithm in the real world.
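As an illustrative, hedged form such a check could take, the snippet below compares a model's AUC across two patient subgroups and reports the gap. All group labels and scores are synthetic; in practice the stratification variables would come from the study design.

# A minimal bias-audit sketch: compute the model's AUC separately for each
# patient subgroup and flag large gaps. All data is synthetic; real audits
# would stratify by the attributes recorded in the study design.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 1000
group = rng.choice(["group_a", "group_b"], size=n)   # e.g. a demographic split
labels = rng.integers(0, 2, size=n)                  # ground-truth diagnosis
scores = labels * 0.6 + rng.random(n) * 0.5          # hypothetical model scores
scores[group == "group_b"] += rng.normal(0, 0.3, (group == "group_b").sum())

aucs = {g: roc_auc_score(labels[group == g], scores[group == g])
        for g in ("group_a", "group_b")}
gap = abs(aucs["group_a"] - aucs["group_b"])
print(aucs)
print(f"AUC gap between subgroups: {gap:.3f}")       # a large gap warrants investigation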
Recruit and retain talent
The Government can support AI research in healthcare by funding different programs. Universities and educational institutions can use these funds to recruit international talent. The recently created AI division of the NHS, NHSX, is a successful step towards facilitating and overseeing the country-wide adoption of AI in healthcare. Recently, UK Research and Innovation (UKRI) has established Centres for Doctoral Training (CDTs) to train students in areas of special importance, including artificial intelligence in healthcare. This would enable the recruitment of national and international talent and also create incentives for them to stay in the country and contribute. The Government can also provide priority visas to people working in these areas of importance who wish to stay in the country after completing their studies. Removing these barriers should be of prime importance in order to establish the UK's dominance in AI and healthcare.
References
[1] Lei Rigi Baltazar et al. "Artificial intelligence on COVID-19 pneumonia detection using chest x-ray images". In: PLOS ONE 16.10 (2021), e0257884.
[2] Imon Banerjee et al. "Reading Race: AI Recognises Patient's Racial Identity In Medical Images". In: arXiv preprint arXiv:2107.10356 (2021).
[3] Kanav Bhagat and Tavpritesh Sethi. "WashKaro".
[4] Bill Briggs. "Novartis empowers scientists with AI to speed the discovery and development of breakthrough medicines". (2021). URL: https://news.microsoft.com/transform/novartis-empowers-scientists-ai-speed-discovery-development-breakthrough-medicines/.
[5] Alex Davies et al. "Advancing mathematics by guiding human intuition with AI". In: Nature 600.7887 (2021), pp. 70–74.
[6] Jia Deng et al. "ImageNet: A large-scale hierarchical image database". In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009, pp. 248–255. DOI: 10.1109/CVPR.2009.5206848.
[7] "Fairness and inclusivity: key ingredients in equitable health AI". (2021). URL: https://www.
[8] "From oximeters to AI, where bias in medical devices may lurk". (2021). URL: https://www.theguardian.com/society/2021/nov/21/from-oximeters-to-ai-where-bias-in-medical-devices-may-lurk/.
[9] Victor O. Gawriljuk et al. "Machine Learning Models Identify Inhibitors of SARS-CoV-2". In: Journal of Chemical Information and Modeling 61.9 (2021), pp. 4224–4235. DOI: 10.1021/acs.jcim.1c00683.
[10] Marzyeh Ghassemi et al. "Practical guidance on artificial intelligence for health-care data". In: The Lancet Digital Health 1.4 (2019), e157–e159.
[11] Varun Gulshan et al. "Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs". In: JAMA 316.22 (2016), pp. 2402–2410.
[12] Stephanie A Harmon et al. "Artificial intelligence for the detection of COVID-19 pneumonia on chest CT using multinational datasets". In: Nature Communications 11.1 (2020), pp. 1–7.
[13] Sepp Hochreiter and Jürgen Schmidhuber. "Long short-term memory". In: Neural Computation 9.8 (1997), pp. 1735–1780.
[14] Eui Jin Hwang et al. "Development and validation of a deep learning–based automated detection algorithm for major thoracic diseases on chest radiographs". In: JAMA Network Open 2.3 (2019), e191095–e191095.
[15] Jeremy Irvin et al. "CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison". In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. 01. 2019, pp. 590–597.
[16] John Jumper et al. "Highly accurate protein structure prediction with AlphaFold". In: Nature 596.7873 (2021), pp. 583–589.
[17] Christopher J Kelly et al. "Key challenges for delivering clinical impact with artificial intelligence". In: BMC Medicine 17.1 (2019), pp. 1–9.
[18] László Róbert Kolozsvári et al. "Predicting the epidemic curve of the coronavirus (SARS-CoV-2) disease (COVID-19) using artificial intelligence: An application on the first and second waves". In: Informatics in Medicine Unlocked 25 (2021), p. 100691.
[19] Heidi Ledford. "Millions of black people affected by racial bias in health-care algorithms". In: Nature 574.7780 (2019), pp. 608–610.
[20] Scott Mayer McKinney et al. "International evaluation of an AI system for breast cancer screening". In: Nature 577.7788 (2020), pp. 89–94.
[21] Ziad Obermeyer et al. "Dissecting racial bias in an algorithm used to manage the health of populations". In: Science 366.6464 (2019), pp. 447–453.
[22] Deepak Painuli et al. "Forecast and prediction of COVID-19 using machine learning". In: Data Science for COVID-19 (2021), pp. 381–397. DOI: 10.1016/B978-0-12-824536-1.00027-7.
[23] Seong Ho Park et al. "Ethical challenges regarding artificial intelligence in medicine from the perspective of scientific editing and peer review". In: Science Editing 6.2 (2019), pp. 91–98.
[24] Emma Pierson et al. "An algorithmic approach to reducing unexplained pain disparities in underserved populations". In: Nature Medicine 27.1 (2021), pp. 136–140.
[25] Suman Ravuri et al. "Skillful Precipitation Nowcasting using Deep Generative Models of Radar". In: arXiv preprint arXiv:2104.00954 (2021).
[26] Michael Roberts et al. "Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans". In: Nature Machine Intelligence 3.3 (2021), pp. 199–217. DOI: 10.1038/s42256-021-00307-0.
[27] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. "Learning representations by back-propagating errors". In: Nature 323.6088 (1986), pp. 533–536.
[28] David Silver et al. "Mastering the game of Go with deep neural networks and tree search". In: Nature 529.7587 (2016), pp. 484–489.
[29] Anuroop Sriram et al. "COVID-19 Prognosis via Self-Supervised Representation Learning and Multi-Image Prediction". In: arXiv preprint arXiv:2101.04909 (2021).
[30] Oriol Vinyals et al. "Grandmaster level in StarCraft II using multi-agent reinforcement learning". In: Nature 575.7782 (2019), pp. 350–354.
[31] Linda Wang, Zhong Qiu Lin, and Alexander Wong. "COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest x-ray images". In: Scientific Reports 10.1 (2020), pp. 1–12.
[32] Martin J Willemink et al. "Preparing medical imaging data for machine learning". In: Radiology 295.1 (2020), pp. 4–15.
[33] John R Zech et al. "Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study". In: PLoS Medicine 15.11 (2018), e1002683.