Cancer Detection with Machine Learning: A Review

A Review of Cancer Detection using Machine Learning Model
Conference Paper · October 2023
5 authors, including:
Shweta Tushar Kamble
Bharati Vidyapeeth College of Engineering Kolhapur
A Review of Cancer Detection using Machine Learning
Mrs. Pooja R. Patil , Mrs. Shweta T. Kamble
CSE (AIML), Bharati Vidyapeeth's College of Engineering, Kolhapur (MH), India
Skin cancer (SC) is one of the most common cancers worldwide. Although
clinical examination of skin lesions is crucial for identifying the disease's characteristics, it is
time-consuming and open to varying interpretations. Machine learning (ML) and deep learning (DL)
techniques have been developed to assist dermatologists in making an early and accurate diagnosis of SC,
which is crucial for increasing the patient's survival rate. Here, we systematically review the literature on skin
lesion classification using machine learning. Our goal is to provide newcomers to the subject with a solid basis
to develop their future studies and contributions. Several online databases were searched using
inclusion/exclusion criteria. Studies were selected for this review if they gave a
detailed account of their procedures and an accurate account of their outcomes. Sixty-eight studies
were selected, the vast majority of which rely on DL methods for detecting and classifying skin cancer,
particularly convolutional neural networks (CNNs), with a smaller number relying on ML techniques or hybrid
ML/DL approaches. The papers were chosen for their usefulness in diagnosing and categorizing skin cancer.
Several ML and DL methods provide state-of-the-art results in categorizing skin lesions. The promising results
achieved so far bode well for the eventual use of these methods in clinical practice.
Keywords: Deep Learning, Convolutional Neural Networks (CNN), Skin Cancer, Acute
lymphoblastic leukemia, X-rays, CT scans, MRI scans.
Cancer is defined by the uncontrolled growth and spread of abnormal cells throughout the body. In a healthy
organism, cells divide and multiply in a controlled way to replace cells that have died or been injured. In
cancer, cells continue to divide and grow uncontrollably, forming a mass of abnormal cells
known as a tumour. Not every tumour is cancerous, however. Benign tumours are so called
because they do not contain cancer cells and usually pose little danger to the patient's health.
Cancerous (malignant) tumours, by contrast, can invade neighbouring tissues and organs and travel to other
regions of the body through the bloodstream or the lymphatic system [1]. This may result in serious
complications and, in some cases, death. Cancer may arise in any part of the body and can affect individuals
of any age, although the likelihood of developing cancer rises with advancing years. There are many types of
cancer, each with its own traits, symptoms, and approaches to therapy.
Cancer diagnosis is one area where machine learning has shown a great deal of promise. Because machine
learning algorithms learn from large volumes of data, they can recognize patterns and make predictions with a
high degree of accuracy. Machine learning methods can be used to develop models for the early
identification of cancer, which is essential for successful treatment and improved patient outcomes [2].
These models can analyze a wide range of data, including genetic information, patient history, and medical
images, to identify the presence of cancer or the likelihood that a patient will develop cancer in the
future. Moreover, machine learning may help tailor treatment regimens to individual patients by considering
their distinct traits and responses to therapy. In general, applying machine learning to cancer detection has
the potential to significantly increase the accuracy of cancer diagnosis and the efficacy of cancer therapy.
Cancers can be classified by the type of cell in which they originate. Some of the
most common types are [3]:
Carcinomas are the most frequent type of cancer, originating in the cells that line the interior or exterior
surfaces of the body. Examples include cancers of the breast, lung, prostate, and colon.
Sarcomas are cancers originating in connective tissues, including bone, muscle, and cartilage. Examples
include osteosarcoma and chondrosarcoma.
Leukaemias are cancers that develop from blood-forming cells and may damage the blood and the bone
marrow. Acute lymphoblastic leukaemia (ALL) and chronic myeloid leukaemia (CML) are two common
examples.
Lymphomas are cancers that affect the lymphatic system, a component of the immune system.
Hodgkin's lymphoma and non-Hodgkin's lymphoma are two common types.
Tumours of the brain and spinal cord: These cancers may develop in the brain or spinal cord and
are categorized according to the kind of cells they originate from.
Melanoma is a kind of skin cancer that originates in the body's pigment-producing cells, referred to as
melanocytes.
There are several methods for cancer detection, including [4][5]:
Imaging tests: These tests produce images of the interior of the body using a variety of imaging
modalities, including X-rays, CT scans, MRI scans, ultrasonography, and PET scans. They
can reveal abnormalities such as tumours.
Biopsy: A small tissue sample is taken from the body and inspected under a
microscope to determine whether cancer cells are present.
Blood tests: These tests measure specific components of the blood that may be indicative of
cancer, such as tumour markers or abnormal blood cell counts. For example, an elevated
number of white blood cells is one such abnormal count.
Endoscopy: A thin, flexible tube with a camera attached to the end is inserted into
the body to observe the interior of organs and tissues. It may be used either to identify anomalies or to
collect tissue samples for a biopsy.
Genetic screening: These tests examine a person's DNA for genetic mutations or
abnormalities that may raise the person's risk of developing specific forms of cancer.
Routine physical examinations: Regular physical examinations by a healthcare practitioner
help detect abnormalities in the body that may indicate cancer.
Machine learning techniques can be used to develop models for cancer detection, which can analyze various
factors such as genetic information, patient history, and medical images to detect the presence of cancer or the
risk of developing cancer. Here are some common machine-learning techniques used for cancer detection [6][7]:
Supervised learning: This method trains a machine learning model on labelled data, in which
the inputs are associated with known outputs. The trained model can then generate predictions on new data
that has not been labelled. In cancer diagnosis, this might mean training a model to classify images as
either malignant or non-cancerous based on prior images that have been labelled by a medical professional.
Unsupervised learning: This method trains a machine learning model on unlabelled data, with the
objective of discovering patterns or clusters within the data. In cancer detection, this might mean
analyzing genomic data to find patterns of genetic alterations or gene expression characteristic of cancer.
Deep learning: This is a kind of machine learning that trains multi-layer neural networks, which are loosely
modelled on the structure of the human brain. Deep learning algorithms can be trained on large volumes of
data and are widely used for image recognition and classification. In cancer diagnosis, a deep learning
model may be trained to evaluate medical images such as X-rays or CT scans and identify anomalies that may
be indicative of cancer.
Supervised learning methods can be used to diagnose cancer by training a model on labelled data in which the
inputs are paired with known outputs. Popular supervised learning approaches used for cancer detection
include [8][9]:
1. Support Vector Machines (SVMs): SVMs are algorithms used for classification tasks,
where the aim is to separate data into distinct categories. In cancer diagnosis, SVMs may be trained on
labelled medical images to classify images as either malignant or non-cancerous based on the images' features.
2. Random Forest: Random forest is an ensemble learning method that combines the predictions of several
decision trees. In cancer diagnosis, random forest algorithms may be trained on genomic data to predict
cancer risk from the presence or absence of certain genetic markers.
3. Logistic Regression: Logistic regression is a statistical approach for binary classification problems that
estimates the probability that an outcome falls into one of two categories. In cancer detection, it may be
used to estimate the likelihood that a patient has cancer based on several clinical or demographic factors.
4. Artificial Neural Networks (ANNs): ANNs are learning algorithms loosely modelled on the human
brain's structure. Networks with many layers form the basis of deep learning, and convolutional neural
networks (CNNs) are one widely used type of ANN. In cancer detection, ANNs may be trained on large
datasets of medical images or genetic data to categorize tumours and forecast patient outcomes.
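As a minimal sketch of the supervised setting described above, the following pure-Python logistic regression is fitted by gradient descent on a tiny synthetic dataset. The two features (a hypothetical normalized lesion size and marker level), the labels, the learning rate, and all function names are illustrative assumptions and do not come from the reviewed studies.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    n = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss with respect to the linear score
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that sample x belongs to class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Synthetic, hypothetical data: [normalized lesion size, normalized marker level]
X_train = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4],
           [0.7, 0.8], [0.8, 0.6], [0.9, 0.9], [0.6, 0.9]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = benign, 1 = malignant (labels are made up)

weights, bias = fit_logistic_regression(X_train, y_train)
p_low = predict_proba(weights, bias, [0.15, 0.15])
p_high = predict_proba(weights, bias, [0.85, 0.85])
print(f"P(malignant | small feature values) = {p_low:.3f}")
print(f"P(malignant | large feature values) = {p_high:.3f}")
```

The output is a probability rather than a hard label, which is why logistic regression suits risk estimation from clinical or demographic factors. A real study would use validated features and a held-out test set.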
Unsupervised learning methods can be used for cancer diagnosis by examining unlabelled data and identifying
patterns and clusters that may be indicative of cancer. Popular unsupervised learning approaches used for
cancer detection include [10]:
1. Clustering: Clustering groups data points that are similar based on their features. In cancer diagnosis,
clustering may be used to identify subgroups of patients or tumours with similar features, which in turn
can help identify prospective therapeutic targets.
2. Principal Component Analysis (PCA): PCA is a method for reducing the dimensionality of
high-dimensional data, such as genomic data. In cancer detection, PCA may be used to identify patterns of
genetic alterations or gene expression linked to the disease.
3. Autoencoders: Autoencoders are neural networks that are trained on unlabelled data to learn compact
representations of the patterns and features within the data. In cancer diagnosis, autoencoders may be used
to analyze medical images or genetic data and identify patterns that may be suggestive of cancer.
4. Self-Organizing Maps (SOMs): SOMs are neural networks that map high-dimensional data onto a
lower-dimensional grid for visualization. In cancer detection, SOMs may be used to locate clusters of
patients or tumours that exhibit comparable characteristics, which in turn can help identify prospective
therapeutic targets.
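To make the clustering idea above concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in pure Python. The six two-dimensional samples stand in for hypothetical expression levels of two genes; the data, the deterministic initialization, and the function name are assumptions chosen for illustration only.

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic initialization for reproducibility."""
    # Assumption: start from evenly spaced samples instead of random ones.
    step = len(points) // k
    centroids = [list(points[i * step]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical expression levels of two genes for six tumour samples.
samples = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9),
           (5.0, 5.2), (5.3, 4.9), (4.8, 5.1)]
labels, centroids = kmeans(samples, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

No labels are supplied; the two groups emerge from the distances alone. In practice the number of clusters is itself unknown and has to be chosen, for example by comparing several values of k.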
Deep learning has shown significant promise in improving the accuracy and efficiency of
cancer diagnosis. Popular deep learning approaches used for the diagnosis
of cancer include [11]:
1. Convolutional Neural Networks (CNNs): CNNs are deep learning algorithms
designed to analyze images. CNNs may be trained on medical images such as X-rays or CT scans to
recognize patterns and anomalies that may be indicative of cancer.
2. Recurrent Neural Networks (RNNs): RNNs are deep learning models designed for sequential data. In cancer
detection, RNNs may be trained on time-series data, such as gene expression data, to detect patterns and
changes in gene expression that may be related to cancer.
3. Generative Adversarial Networks (GANs): GANs are deep learning
models used for image generation. In cancer diagnosis, GANs may be trained on medical images to produce
synthetic images that augment training datasets and increase the accuracy of cancer detection
models.
4. Transfer Learning: Transfer learning reuses models pre-trained on one task for a different task.
Pre-trained models, such as CNNs, can be fine-tuned on new medical imaging datasets to improve their
accuracy for cancer diagnosis.
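The core operation behind the CNNs mentioned above can be sketched in a few lines. The example below applies one 3x3 filter to a toy 5x5 image using a valid, stride-1 convolution (implemented, as in most deep learning libraries, as cross-correlation) followed by a ReLU. The image and the hand-set edge filter are assumptions for illustration; in a trained CNN the filter weights are learned from data.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D cross-correlation of image with kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return output

def relu(feature_map):
    """Element-wise rectified linear unit: negative responses become zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]

# Toy image: dark on the left, bright on the right (a vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-set vertical-edge filter (hypothetical; real CNN filters are learned).
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = relu(conv2d(image, kernel))
for row in feature_map:
    print(row)  # each row is [3.0, 3.0, 0.0]
```

The filter responds strongly where the intensity changes and not at all in the uniform bright region, which is the sense in which convolutional layers detect local patterns in medical images.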
The study in [12] aims to classify oral cancer into distinct subtypes using intelligent computing
techniques and to test their effectiveness in various contexts. The authors propose a deep learning approach
(FJWO-DCNN) that combines a Fragment Jelly Whale Optimizer with a Deep Convolutional Neural Network to
identify and classify oral cancer. The approach is designed to use the best theoretical features of oral
cancer images and so achieve a high identification rate. The original image and the theoretical
features extracted from it are fed into a higher-level classification stage: a Deep Convolutional
Neural Network (DCNN) classifier trained using the proposed Fragment Jelly Whale Optimization (FJWO)
method. The authors report that the proposed approach improves performance as assessed by precision.
In [13], the CKS Block adaptively switches from standard convolution to deformable convolution in some
layers to better detect irregular objects without consuming excessive processing resources, and the SOA
Block automatically constructs the most efficient anchors based on the size distribution of the objects.
In terms of detection accuracy, the technique outperforms previous algorithms on the HPC dataset (which
comprises over 1800 T2 MRI slices) with a maximum AP50 of 78.90%, whereas other methods generally give
lower detection rates. The experiments suggest that the proposed network could serve as the foundation for
a computer-aided diagnostic tool that supports more accurate and timely diagnostic decisions for HPC.
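AP50 is conventionally the average precision obtained when a predicted bounding box counts as a correct detection only if its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below shows that matching criterion only. The box coordinates are hypothetical, and the exact evaluation protocol of [13] is not described in this review.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A detection is accepted at the AP50 threshold when IoU >= 0.5."""
    return iou(pred_box, gt_box) >= threshold

# Hypothetical boxes in pixel coordinates.
ground_truth = (10, 10, 50, 50)
close_prediction = (12, 12, 52, 52)   # small offset from the ground truth
loose_prediction = (40, 40, 80, 80)   # overlaps only one corner

print(round(iou(ground_truth, close_prediction), 3), is_true_positive(close_prediction, ground_truth))
print(round(iou(ground_truth, loose_prediction), 3), is_true_positive(loose_prediction, ground_truth))
```

Average precision is then computed from the precision-recall curve built over all detections ranked by confidence, a step omitted here for brevity.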
A Pap smear is a clinical test for detecting cervical cancer, although it does not always produce reliable
results. The procedure is also highly sensitive and time-consuming. A multimodal approach that combines
imaging techniques such as MRI and CT with other diagnostic tools may help diagnose cervical cancer. The
study in [14] uses image processing and feature extraction to assess images acquired by these imaging
technologies in order to identify cervical cancer, and examines the methods used to classify textured images
using artificial intelligence, machine learning, and deep learning.
Developments in optical technology and the rapidly expanding field of silicon photonics (SiPh) have led to
integrated circuits for the detection of potentially deadly illnesses such as cancer. Cancer cells are
good targets for optical detection because their optical characteristics differ from those of healthy blood
cells. The article in [15] reviews emerging SiPh technology and how it may exploit these
optical features to identify cancer cells. Biosensor attributes, including sensitivity,
affordability, and ease of application, are also considered, and various SiPh architectures are compared,
including ring architectures, waveguides, photonic crystals (PhC), integrated circuits, and sensor arrays.
Machine learning is the practice of training a computer to learn from its errors and recognize subtle
patterns in complex data using a variety of mathematical, statistical, and optimization approaches. The paper
in [16] gives a brief overview of machine learning technologies that, when paired with cybersecurity, may
help in early breast cancer diagnosis. Its main objective is to identify the most reliable algorithm for
accurately predicting the stages of cancer progression, and it compares previously published
studies in this area to determine the efficacy and strength of each algorithm in classifying data. Logistic
regression, random forest, and decision tree models were trained. The
random forest model delivered the highest accuracy, whereas logistic regression provided the
highest precision. Because the goal is the most effective algorithm for detecting breast cancer
early, the best method was selected by looking for a high F1 score together with strong
accuracy and recall.
The biosensor proposed in [17] offers greater control over its electrical performance characteristics,
namely the source-to-channel tunnelling rate. A charge-equivalent model of the C-erbB-2 interface is also
derived. To test the device's sensitivity, saliva and serum samples with various concentrations of C-erbB-2
were used. According to the findings, a III-V
heterojunction built of In1-xGaxAs/Si with x equal to 0.2, combined with an extended gate shape, enhances
tunnelling probability, gate control (yielding a superior ION/IOFF ratio), and sensitivity. The interface
charges corresponding to varying quantities of C-erbB-2 biomarkers enhance the biosensor sensitivity (as
measured by the ION/IOFF ratio) by a factor of 106.
In [18], the improved loss function is a graph-based implementation of a regularized adaptive variant of the
complement cross entropy loss. Because softmax cross entropy alone cannot overcome the misclassification
problem, the cross-entropy loss is combined with complement entropy. To further improve the network's
learning capacity, the complement cross entropy loss is penalized by adaptively scaling the regularization
term using a spatial graph Laplacian basis. To evaluate
and compare the efficiency of the proposed approach, histopathology image datasets from BACH 2018 and
BreakHis are used. The method outperforms existing state-of-the-art algorithms for the binary
classification of breast cancer image samples from the BreakHis dataset, achieving 99.00% precision,
99.40% recall, 99.20% F1-score, and 99.49% accuracy.
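The idea of combining cross entropy with complement entropy can be illustrated with a small sketch. It assumes one common formulation from the complement-objective literature: ordinary cross entropy on the true class, minus a weighted entropy of the incorrect-class probabilities after renormalization. It is not the loss of [18]; the graph Laplacian regularization and adaptive scaling are omitted, and the weight `gamma`, the three-class probabilities, and the function names are hypothetical.

```python
import math

def cross_entropy(probs, true_idx):
    """Standard cross entropy for one sample: -log p(true class)."""
    return -math.log(probs[true_idx])

def complement_entropy(probs, true_idx):
    """Entropy of the incorrect-class probabilities, renormalized to sum to 1."""
    rest = sum(p for j, p in enumerate(probs) if j != true_idx)
    if rest == 0.0:
        return 0.0
    h = 0.0
    for j, p in enumerate(probs):
        if j != true_idx and p > 0.0:
            q = p / rest
            h -= q * math.log(q)
    return h

def complement_cross_entropy(probs, true_idx, gamma=1.0):
    """Assumed form: CE minus gamma / (K - 1) times the complement entropy."""
    k = len(probs)
    return cross_entropy(probs, true_idx) - gamma / (k - 1) * complement_entropy(probs, true_idx)

# Two hypothetical predictions with the same confidence in the true class (index 0).
flat_errors = [0.6, 0.2, 0.2]      # remaining mass spread evenly over wrong classes
peaked_errors = [0.6, 0.39, 0.01]  # remaining mass concentrated on one wrong class

ce_flat = cross_entropy(flat_errors, 0)
ce_peaked = cross_entropy(peaked_errors, 0)
loss_flat = complement_cross_entropy(flat_errors, 0)
loss_peaked = complement_cross_entropy(peaked_errors, 0)
print(f"cross entropy:            {ce_flat:.3f} vs {ce_peaked:.3f}")
print(f"complement cross entropy: {loss_flat:.3f} vs {loss_peaked:.3f}")
```

Plain cross entropy cannot tell the two predictions apart, whereas the combined loss penalizes the one that concentrates probability on a single wrong class. With only two classes the complement term is always zero, so the example uses three.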
To study the influence of phase noise, the work in [19] adds a noisy local oscillator (LO) signal to a
virtual system, in effect "noising up" the input. This enables a direct comparison of results obtained with
a clean LO signal and those obtained with a noisy LO signal. Both a frequency-modulated continuous-wave
(FMCW) radar system operating at 14-27 GHz and a phase-modulated continuous-wave (PMCW) radar system
operating at 20 GHz are simulated. To the best of the authors' knowledge, this statistical technique is the
first attempt to describe phase noise in the time domain for FMCW and PMCW radar systems.
The suggested biosensor has been demonstrated to be much more sensitive than existing biosensors. Owing to
its simplicity of fabrication and low cost, the device holds promise for use in array-based screening of breast
cancer cell lines and diagnostics [20]. The array-based screening technology might be applied to other
malignancies as well.
With mammography as the imaging modality, the proposed work in [21] is evaluated on the CBIS-DDSM
database. The suggested method obtained 94.12% accuracy, a 93.33% true positive rate (TPR), a 94.74% true
negative rate (TNR), 0.93 precision, 0.93 F-score, 0.94 BCR, and 0.88 Youden's index. The results indicate a
more successful technique for early identification, classification, and localization of breast cancer, with
superior outcomes across performance measures such as accuracy, TPR, TNR, precision, F-score,
BCR, and Youden's index.
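The measures listed above all derive from the four cells of a binary confusion matrix, as the following sketch shows. BCR is assumed here to mean balanced classification rate, (TPR + TNR) / 2. The counts are an illustrative assumption, chosen only because they are consistent with the rounded figures quoted above; the source does not report the underlying confusion matrix or test-set size.

```python
def classification_metrics(tp, fp, tn, fn):
    """Common binary classification measures from confusion-matrix counts."""
    tpr = tp / (tp + fn)          # true positive rate (sensitivity, recall)
    tnr = tn / (tn + fp)          # true negative rate (specificity)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "tpr": tpr,
        "tnr": tnr,
        "precision": precision,
        "f_score": 2 * precision * tpr / (precision + tpr),
        "bcr": (tpr + tnr) / 2,   # assumed definition: balanced classification rate
        "youden": tpr + tnr - 1,  # Youden's index J
    }

# Hypothetical counts (not from the source) that reproduce the quoted values.
metrics = classification_metrics(tp=14, fp=1, tn=18, fn=1)
for name, value in metrics.items():
    print(f"{name:>9}: {value:.4f}")
```

Youden's index and BCR both combine sensitivity and specificity, so they are less affected by class imbalance than accuracy alone.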
The identified gaps are as follows: intelligent algorithms have not been used in the prevention phase of
cancer management and have been applied only sparingly to the early detection phase; private data
sources might be useful in this sort of study, but the difficulty of obtaining them is a barrier to research.
Moreover, although Latin America and the Caribbean have very high breast cancer mortality rates, patients
from these regions have not been the subject of any studies, and researchers' engagement with this issue
has been minimal at best. The healthcare systems of the nations in the region could be much improved, and
more people could live longer, if recommendations were generated by sophisticated algorithms that are both
cost-effective and quick to execute [22]. There is considerable potential here, since low-cost solutions
based on sophisticated algorithms are feasible.
Lung cancer is one of the most common types of cancer and has a relatively high mortality
rate. The chances of survival are much higher if it is discovered early. In [23], a
custom-built deep learning framework based on You Only Look Once (YOLOv3) is used to detect lung nodules
accurately and sensitively in lung CT images. The YOLOv3-based deep neural network achieves a sensitivity
of 0.961 and an accuracy of 0.9589 in identifying the nodules.
A histologist's time is better spent if suspicious areas can be identified quickly and accurately for further
inspection. Convolutional Neural Networks (CNNs) might be used for this detection. The work in [24] uses the
MIAS collection of breast imaging studies, which contains 322 mammograms: 189 examples of healthy breasts and
133 examples of abnormal ones. Computer vision models such as the convolutional neural network are
employed to examine mammary gland samples for indicators of cancer. The project is
ongoing, and further work aims to improve the CNN design and employ trained neural
networks in the hope of producing more precise results.
Despite the significant progress made in cancer detection using machine learning techniques, some research
gaps still need to be addressed. Here are some of the research gaps in cancer detection:
Limited availability of labelled data: One of the major challenges in using machine learning techniques
for cancer detection is the limited availability of labelled data, particularly for rare types of cancer. This can
limit the effectiveness of supervised learning techniques, which rely on labelled data for training.
Lack of interpretability: Many machine learning models used for cancer detection are complex and
difficult to interpret, making it difficult to understand the underlying mechanisms of cancer and develop more
effective treatments.
Generalizability: Machine learning models for cancer detection often have high accuracy on specific
datasets but may not generalize well to other datasets or populations. This can limit their usefulness in
clinical practice.
Integration with clinical workflows: Machine learning models for cancer detection must be integrated
with clinical workflows to be useful in practice. This requires addressing data privacy, security, and
regulatory requirements.
Limited research on rare cancers: Most research on cancer detection using machine learning has
focused on more common types of cancer. There is a need for more research on rare types of cancer, which can
be difficult to diagnose and treat.
There is a clear need for skin lesion diagnostic tools that can be incorporated into eHealth apps to assist
patients and medical professionals as the prevalence of skin cancer continues to climb. Melanoma is the
deadliest form of skin cancer, and the five-year survival rate is poor when it is diagnosed at an advanced
stage. Melanoma detected in its earlier stages has a better chance of being successfully treated. This
article draws on the work of many researchers to explain concisely how skin cancer arises and how it may be
detected. This information is useful for the classification of normal and abnormal skin cells.
