GROUP 28 M&S

THE LIKELINESS OF GLIOMA TO OCCUR
MORE IN MALES THAN IN FEMALES AND
THE SEX DIFFERENCE IN THE TREATMENT
AND OUTCOME OF GLIOMA PATIENTS
Eyo Otoabasi, Abdul Omotoyosi, Adetu Ireoluwatunde, Akalamudo David, Edet Daniel Tioluwani
Department of Computer Science (Group A)
Babcock University
Ilishan-Remo, Nigeria
adetu1021@student.babcock.edu.ng
Abstract— Glioblastoma (GBM) is the most common and most lethal primary brain tumor in adults. When it is detected, it is immediately classified as a grade IV tumor. The standard treatment for glioblastoma includes surgery, radiotherapy, and alkylating chemotherapy. Women are more likely than men to respond to treatment, which gives them a better chance of survival. O6-methylguanine-DNA methyltransferase (MGMT) promoter methylation predicts benefit from alkylating chemotherapy with temozolomide and guides first-line treatment selection in elderly patients. Current research focuses on the molecular traits that drive the malignant phenotype, such as aberrant signal transduction and angiogenesis, and, more recently, on different immunotherapy techniques. This study reviews two works on the subject and uses machine learning to analyze a large dataset that includes additional information on the subject matter. We built a model that predicts patient survival from features selected from our dataset; specifically, we trained the model to predict whether or not a patient will survive. The outcome of this project contributes to the literature by identifying gaps, methods, and approaches in machine learning analysis for prediction and classification. [1]
Keywords— machine learning, glioblastoma, algorithms,
pathogenesis
I. INTRODUCTION
Glioblastoma is one of the most dangerous types of central nervous system (CNS) tumor, and it remains largely incurable despite advances in treatment modalities. Our review's goal is to present a complete picture of glioblastoma pathogenesis, etiology, clinical findings, epidemiology, and treatment options. [2] A literature search on glioblastoma was conducted using PubMed and Google, covering papers published up to 2017, and the dataset was obtained from cBioPortal; the only risk factors identified for glioblastoma were specific genetic syndromes. To date, glioblastoma is the most prominent malignant tumor of the CNS, followed by astrocytoma and lymphoma. Patients may present at the clinic with different symptoms depending on the tumor site. Several invasive and non-invasive imaging techniques must be used to confirm the presence and extent of the tumor. According to the literature review, the pathogenesis involves aberrations in multiple signaling pathways, as well as various mutations in genetic profiles and altered gene expression. Although multiple treatment options are available, such as surgery followed by chemotherapy and radiotherapy, the disease has one of the highest mortality rates among malignant tumors, and patients usually die within 14 months of diagnosis. [3]
II. LITERATURE REVIEW
A. Glioblastoma Affects Men Differently Than Women
Glioblastoma is a severe brain cancer that most often affects people over the age of 50, and it kills almost half of those diagnosed within 14 months of diagnosis. Men are almost twice as likely as women to develop the disease. Typically, the tumor is surgically removed, followed by radiation and chemotherapy. Even this rigorous approach is not always adequate, and within six months new tumor cells frequently replace the ones that have been destroyed. Women, however, benefit more from the therapy than men. The researchers used MRI scans from a cancer research database to determine the growth velocity of each glioblastoma. Tracking how quickly a tumor grows while the patient is being treated makes it possible to judge more carefully whether the medication being given is effective.
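The cited study's exact computation is not given here. As a hedged sketch, assuming tumor size is measured as lesion volume on each MRI scan, one common way to express growth velocity is to convert each volume to the radius of a sphere of equal volume and take the least-squares slope of that radius against time. The function names, scan times, and volumes below are hypothetical.

```python
import numpy as np

def equivalent_radius_mm(volume_mm3):
    """Radius (mm) of a sphere whose volume equals the given volume (mm^3)."""
    return np.cbrt(3.0 * np.asarray(volume_mm3, dtype=float) / (4.0 * np.pi))

def growth_velocity_mm_per_year(days, volumes_mm3):
    """Least-squares slope of equivalent radius versus time, in mm per year."""
    years = np.asarray(days, dtype=float) / 365.25
    radii = equivalent_radius_mm(volumes_mm3)
    slope, _intercept = np.polyfit(years, radii, deg=1)
    return slope

# Hypothetical serial scans for one patient: days since first scan, volume in mm^3
scan_days = [0, 60, 120, 180]
scan_volumes = [4188.79, 5575.28, 7238.23, 9202.77]  # spheres of radius ~10, 11, 12, 13 mm

v = growth_velocity_mm_per_year(scan_days, scan_volumes)
print(f"growth velocity: {v:.2f} mm/year")
```

Comparing such slopes before and after a treatment such as temozolomide is one way to quantify whether the treatment slowed tumor growth.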
The results from 40 men and 23 women who had all received conventional therapy revealed that tumor growth velocities were similar. Only the women, however, had a significant and consistent reduction in tumor growth after receiving temozolomide, the most commonly used glioblastoma chemotherapy medication. [2]
B. Sex-Specific Differences in Glioblastoma
III. DISCUSSION
Sex hormones influence the etiology and course of GBM tumors. Large-scale investigations in women have found a substantial link between estrogens and neuroprotective effects. In comparison, testosterone has recently attracted attention in glioblastoma tumorigenesis because this sex hormone has been proposed to play a key role in the disease's male predominance. Endogenous estrogens have been shown to be neuroprotective in a number of neurologic illnesses, including in the control of brain tumor genesis and growth. Furthermore, in one study, premenopausal women with GBM outlived men, and this difference vanished after menopause.
Glioblastoma (GBM) is more common in postmenopausal women than in premenopausal women. These findings point to a strong protective effect of female hormones in glioblastoma. However, older women have a much higher chance of developing glioblastoma, which leaves them at a disadvantage. One line of research suggests that estrogen, a female sex hormone, protects a woman from developing glioblastoma and, if she does develop it, gives her a better chance of surviving it, because estrogen can cross the blood-brain barrier. This research examined the effect of hormones on treatment and on the risk of developing the disease, although its results are questionable. There were two case studies. The first found that women approaching menopause had a lower chance of developing the disease than women who had already passed menopause. The second suggested that younger menstruating women had a lower chance of developing glioblastoma than older menstruating women.
Metabolism is a fundamental driver of tumor survival and growth. Metabolic differences also help explain the disparity between the sexes in glioblastoma. Diabetes, the most common metabolic illness in humans, causes hyperglycemia, which tends to aid tumor development because the body cannot break down glucose properly. Furthermore, glucose metabolism and lactate production differ between male and female embryos, with male embryos producing twice as much lactate as female embryos.
The human immune system is a major factor that helps explain why glioblastoma occurs more often in men than in women. Women respond better because their immune responses are stronger than men's, owing to hormones and chromosomal makeup. Emerging evidence shows that sex differences in the frequency, clinical features, and prognosis of brain illnesses may be due to immunologic differences. The role of the immune system in the sex difference in GBM incidence is still unclear but has recently received attention. [3]
A. Problem Statement
For decades, researchers have known that men are more likely than women to develop glioblastoma, an aggressive form of brain cancer. There is also evidence that women respond better than men to standard treatment for this condition. Many illnesses, including certain cancers, affect men and women differently or produce distinct symptoms depending on the patient's sex. These disparities are typically connected to sex hormones such as testosterone or estrogen, which contribute to numerous biological differences between men and women. For example, while the female hormone estrogen contributes considerably to breast cancer being more common in women than in men, males are more prone than females to develop malignant brain tumors at all ages, including childhood. Sex hormones, however, did not directly account for the differences between women and men in GBM diagnosis and survival, which shows that the gap cannot be explained entirely by circulating sex hormones. [4]
Glioblastoma multiforme (GBM) is the most frequent primary CNS malignancy, accounting for 45.2 percent of all malignant CNS tumors and 55 percent of all gliomas. Overall, males are 60% more likely than females to develop glioblastoma. Brain cancer grades, like stages, run from 1 to 4; the higher the grade, the more aggressive the malignancy. Glioblastomas are invariably classified as grade 4 because they are a particularly aggressive variety of astrocytoma. Males are diagnosed about twice as frequently as females. Researchers studying people with the disease found that conventional therapy for glioblastoma is more successful in women than in men. [3]
B. Aim and Objectives
This research aims to predict survival, focusing on the survival rate for glioblastoma in males compared to females. Its objectives are as follows:
a) To design and develop a survival prediction system using machine learning algorithms to determine the survival rate in males compared to females.
b) To review and analyze a body of literature, with glioma as the knowledge base, for the integration and implementation of the proposed prediction system.
C. Methodology
This section explains the proposed system, which was employed to predict the survival rate of males compared to females with glioblastoma.
This paper demonstrates that survival prediction is possible using a dataset and machine learning algorithms. The dataset was downloaded from the cBioPortal for Cancer Genomics website. We predict the survival of each patient by choosing specific features in our dataset to train our model, a step that forms part of feature engineering.
We used four machine learning algorithms: K-Nearest Neighbors, Naive Bayes, Random Forest, and Decision Tree. After building each model, we evaluated all four and compared them to find which was best at predicting survival. The model was then optimized by tuning its hyperparameters with GridSearch. Finally, we saved the predictions derived from the dataset to an Excel file and saved the trained model for reuse. [5]
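A minimal sketch of the four-model comparison with scikit-learn, not the authors' notebook. Assumptions: synthetic data from `make_classification` stands in for the prepared glioblastoma features (183 rows, binary survival label), Gaussian Naive Bayes is used because the paper does not name the Naive Bayes variant, default hyperparameters are kept, and each model is scored by mean 5-fold cross-validated accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cleaned dataset: 183 patients, 8 numeric features,
# binary target (1 = deceased, 0 = living). Replace with the real X and y.
X, y = make_classification(n_samples=183, n_features=8, n_informative=4,
                           random_state=42)

models = {
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

# Mean 5-fold cross-validated accuracy for each model
scores = {name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
          for name, model in models.items()}

# Print models from best to worst
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {score:.3f}")

best_model_name = max(scores, key=scores.get)
print("best:", best_model_name)
```

With only 183 rows, cross-validation gives a less noisy ranking than a single train/test split; the held-out test set can then be kept for the final evaluation of the chosen model.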
D. Libraries
a) Pandas: Pandas is a Python library that we used to read our data into a data frame, ds_glioma. It is useful for importing, manipulating, managing, and analyzing datasets. [6]
b) NumPy: NumPy (Numerical Python) is the foundational Python package for scientific computing. It provides fast multidimensional arrays and a wide range of mathematical operations on them. [7]
c) scikit-learn: Scikit-learn (sklearn) is one of Python's most widely used and robust machine learning libraries. Through a consistent Python interface, it provides efficient tools for machine learning and statistical modeling, such as classification, regression, clustering, and dimensionality reduction. [8]
d) warnings: The 'warnings' module is used to issue and control warning messages in Python. Warning categories are classes derived from the built-in Warning class, which is itself a subclass of the built-in Exception class. [9]
e) Seaborn: Seaborn is a Python data visualization package based on matplotlib. It offers a high-level interface for creating attractive and informative statistical graphics. [10]
f) Matplotlib.pyplot: Matplotlib is a Python package for creating static, animated, and interactive visualizations. It makes simple things simple and difficult things possible, produces publication-quality plots, and creates interactive figures that can be zoomed, panned, and updated. [11]
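A minimal sketch of how pandas and the warnings module are typically used at the start of such a notebook. Assumptions: the cBioPortal clinical data is a tab-separated export (a few hypothetical rows are read from an in-memory string so the example runs on its own; with the real file you would pass its path instead), and survival status values look like `1:DECEASED` / `0:LIVING`. The scoped `warnings.catch_warnings()` block silences warnings only where needed and restores the previous filters afterwards.

```python
import io
import warnings

import pandas as pd

# Warning categories are part of the exception class hierarchy.
assert issubclass(UserWarning, Warning) and issubclass(Warning, Exception)

# Hypothetical rows mimicking a cBioPortal clinical export (tab-separated).
raw_tsv = (
    "Study ID\tPatient ID\tSample ID\tDiagnosis Age\tSex\t"
    "Overall Survival (Months)\tOverall Survival Status\tMutation Count\n"
    "gbm_tcga_pub2013\tP01\tP01-01\t62\tMale\t9.8\t1:DECEASED\t45\n"
    "gbm_tcga_pub2013\tP02\tP02-01\t48\tFemale\t21.3\t0:LIVING\t30\n"
    "gbm_tcga_pub2013\tP03\tP03-01\t71\tMale\t4.1\t1:DECEASED\t52\n"
)

# Silence warnings only while loading; earlier filters are restored on exit.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    ds_glioma = pd.read_csv(io.StringIO(raw_tsv), sep="\t")

# Binary target: 1 if the patient is recorded as deceased, else 0 (label coding assumed)
y = (ds_glioma["Overall Survival Status"] == "1:DECEASED").astype(int)

print(ds_glioma.shape)
print(ds_glioma.dtypes)
print(y.tolist())
```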
E. Applications Used
a) Jupyter Notebook
Jupyter Notebook is a free and open-source web application that allows the creation and sharing of documents containing live code, equations, visualisations, other multimedia elements, and explanatory text. Jupyter Notebooks are a spinoff of the IPython project, which used to have its own IPython Notebook project. The name Jupyter is derived from the core programming languages it supports: Julia, Python, and R. [12] Jupyter Notebooks can be used for various data science tasks such as data cleaning and transformation, numerical simulation, exploratory data analysis, data visualisation, statistical modelling, machine learning, deep learning, and much more. One reason we chose Jupyter Notebook is that it lets us work with Python inside a virtual "notebook." Its adaptability is one reason it is gaining popularity among data scientists: it allows code, images, charts, comments, and so on to be integrated following the steps of the "data science process." Furthermore, it supports interactive computing, in which users execute code, observe the results, adjust, and repeat in an iterative interaction between the data scientist and the data. [13]
IV. RESULTS
A. Data collection
The dataset we used contains information about 577 glioblastoma patients [14]. We fed these data to our machine learning algorithms to predict survival and plot the results. Based on the data gathered, a researcher can evaluate their hypothesis. In most cases, data collection is the first and most crucial step regardless of the field of study, and the approach to data collection varies depending on the information required.
Fig. 1. Snapshot of glioblastoma Dataset
B. Description of Dataset
In this project, we imported our dataset from cBioPortal. The dataset was divided into training data and testing data: 75% of the data was used for training and 25% for testing. [14]
C. Data Preprocessing
To achieve the expected accuracy, data must be preprocessed. Data preprocessing is part of data mining and is necessary for cleaning the data and removing outliers and inconsistencies. The primary goal of preprocessing is to make the most of the extracted features, resulting in more contextual features and a normalised dataset. [18]
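The paper does not say which normalisation it applied. As one hedged example, z-score standardisation with scikit-learn's `StandardScaler` rescales each numeric column to zero mean and unit variance; fitting the scaler on the training data only keeps test-set statistics from leaking into training. The feature values below are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features: [Diagnosis Age, Mutation Count]
X_train = np.array([[45, 30], [62, 52], [70, 41], [55, 60]], dtype=float)
X_test = np.array([[50, 35], [68, 48]], dtype=float)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean and std from training data only
X_test_scaled = scaler.transform(X_test)        # reuses the training statistics

print(X_train_scaled.mean(axis=0).round(6))  # each column now has mean ~0
print(X_train_scaled.std(axis=0).round(6))   # and standard deviation 1
```

Scaling matters for distance-based models such as K-Nearest Neighbors; tree-based models such as Random Forest and Decision Tree are largely insensitive to it.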
therapy, Diagnosis Age, Overall Survival (Months), Disease Free (Months), and Mutation Count are the properties of the dataset used for this system. The importance of feature selection becomes clear when dealing with a dataset that has many features. [14]
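Some of these properties, such as Sex and Disease Free Status, hold text categories, while scikit-learn's estimators expect numeric input. A minimal sketch of one-hot encoding with `pandas.get_dummies`, using hypothetical values in the style of a cBioPortal export; the paper does not state which encoding it used.

```python
import pandas as pd

# Hypothetical subset of the selected features
features = pd.DataFrame({
    "Sex": ["Male", "Female", "Male"],
    "Disease Free Status": ["0:DiseaseFree", "1:Recurred/Progressed", "1:Recurred/Progressed"],
    "Diagnosis Age": [62, 48, 71],
    "Mutation Count": [45, 30, 52],
})

# One-hot encode the text columns; numeric columns pass through unchanged.
# drop_first=True keeps a single indicator per two-valued column.
X = pd.get_dummies(features, columns=["Sex", "Disease Free Status"],
                   drop_first=True, dtype=int)
print(X.columns.tolist())
print(X)
```

With `drop_first=True`, a two-valued column such as Sex becomes a single indicator (`Sex_Male`), avoiding a redundant second column. Separately, survival-time columns such as Overall Survival (Months) are measured up to death or last follow-up, so including them as features when predicting survival status can inflate the reported scores.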
Feature engineering is a critical stage in training a machine learning model, particularly for traditional machine learning methods (as opposed to deep learning). It can take up the bulk of the time in the workflow because we may need to return to this stage several times to improve the performance of our model. To begin with, we identified properties in our dataset that are neither usable nor valuable, such as Sample ID, Patient ID, and Study ID, and dropped them. [17]
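Dropping identifier columns is a single pandas call. A minimal sketch on a hypothetical frame; `errors="ignore"` keeps the call from failing if a particular export lacks one of the names.

```python
import pandas as pd

# Hypothetical frame with identifier columns plus two real features
ds_glioma = pd.DataFrame({
    "Study ID": ["gbm_tcga_pub2013"] * 2,
    "Patient ID": ["P01", "P02"],
    "Sample ID": ["P01-01", "P02-01"],
    "Diagnosis Age": [62, 48],
    "Sex": ["Male", "Female"],
})

id_columns = ["Sample ID", "Patient ID", "Study ID"]
# Identifiers are unique per row (or constant), so they carry no generalizable signal.
ds_glioma = ds_glioma.drop(columns=id_columns, errors="ignore")
print(ds_glioma.columns.tolist())
```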
Fig. 2. Snapshot of preprocessed data
D. Data Cleaning
This is the process of finding and fixing flaws in a dataset that may have a detrimental impact on a prediction model. While working with our dataset, we observed that some rows contained null values, so we identified those rows using the isna function. There are many ways to handle null values, such as:
• Remove the entire row if it has missing values.
• Leave the rows with null values as they are.
• Fill the gaps with the mean value.
• Fill the gaps with the mode value.
• Use customised gap-filling algorithms. [15] [16]
Fig. 3. Snapshot of glioblastoma Dataset before Data Cleaning
After identifying them, we dropped the rows with null values, and the number of rows in our dataset fell from 577 to 183. [16]
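The options listed above map directly onto pandas methods. In this sketch on a hypothetical frame, `isna().sum()` counts missing values per column, `dropna()` removes incomplete rows (the option chosen in this project), and `fillna` shows the mean and mode alternatives.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in every column
ds_glioma = pd.DataFrame({
    "Diagnosis Age": [62, np.nan, 71, 55],
    "Sex": ["Male", "Female", None, "Male"],
    "Mutation Count": [45, 30, 52, np.nan],
})

print(ds_glioma.isna().sum())  # missing values per column

# Option used in the paper: drop every row that has at least one missing value
dropped = ds_glioma.dropna()

# Alternatives: fill numeric gaps with the column mean, text gaps with the mode
filled = ds_glioma.copy()
for col in ["Diagnosis Age", "Mutation Count"]:
    filled[col] = filled[col].fillna(filled[col].mean())
filled["Sex"] = filled["Sex"].fillna(filled["Sex"].mode()[0])

print(len(ds_glioma), "->", len(dropped), "rows after dropna")
```

Dropping rows is simple but can discard a large share of the data; imputation keeps every row at the cost of introducing estimated values.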
F. Training and Testing Data
To complete our data preprocessing stages, we divided our data into two sets: training and testing. Because we have adequate data in this scenario, we split the data in a 75:25 ratio for training and testing. As a result, our training data has 137 rows and our testing data has 47 rows. [14]
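A minimal sketch of the 75:25 split with scikit-learn's `train_test_split`, using random stand-in data with 183 rows. `stratify=y` is an assumption (the paper does not mention it) that keeps the proportion of deceased and living patients similar in both parts, and `random_state` makes the split reproducible. For 183 rows, scikit-learn rounds the test share up, so this call yields 137 training rows and 46 test rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical stand-ins: 183 patients, 5 numeric features, binary survival label
X = rng.normal(size=(183, 5))
y = rng.integers(0, 2, size=183)  # 1 = deceased, 0 = living (label coding assumed)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```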
a) Training Data: Machine learning algorithms build a model from sample data, referred to as "training data," in order to make predictions or decisions without being explicitly programmed to do so. A training set is used to build a model, while a test (or validation) set is used to test it. The training data is used to extract features and fit the model. [18]
b) Testing Data: A test set, on the other hand, is a subset of the dataset used to evaluate the machine learning model. Once the model has been trained on the training set, it is used to predict outcomes for the test set. Some test data may be used to confirm that a given input to a given function yields the expected result, while other data may be used to test the program's ability to respond to unusual, extreme, exceptional, or unexpected input. [19]
G. Graph
The graphs below provide a visual representation of our models' performance.
Fig. 5 and Fig. 6 show how well each model performs. To show this, we created two charts: a grouped bar chart displaying the accuracy, precision, recall, F1 score, and kappa score of our models, and a line chart showing the Area Under the Curve (AUC) of all our models.
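All of the metrics named above are available in `sklearn.metrics`, including `cohen_kappa_score` for the kappa score and `roc_auc_score` for AUC. The sketch below computes them for two of the models and draws a grouped bar chart with Matplotlib. The synthetic data, the choice of models shown, the helper name `evaluate`, and the output filename `model_metrics.png` are placeholders; AUC is computed from the predicted probability of the positive class, since ROC analysis needs scores rather than hard labels.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=183, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

def evaluate(model):
    """Fit on the training split and return the metrics used in the charts."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]  # probability of class 1
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "kappa": cohen_kappa_score(y_test, y_pred),
        "auc": roc_auc_score(y_test, y_prob),
    }

results = {
    "Decision Tree": evaluate(DecisionTreeClassifier(random_state=0)),
    "Random Forest": evaluate(RandomForestClassifier(random_state=0)),
}

# Grouped bar chart: one group per metric, one bar per model (AUC charted separately)
metric_names = ["accuracy", "precision", "recall", "f1", "kappa"]
x = np.arange(len(metric_names))
width = 0.8 / len(results)
fig, ax = plt.subplots(figsize=(8, 4))
for i, (name, scores) in enumerate(results.items()):
    ax.bar(x + i * width, [scores[m] for m in metric_names], width, label=name)
ax.set_xticks(x + width * (len(results) - 1) / 2)
ax.set_xticklabels(metric_names)
ax.set_ylabel("score")
ax.legend()
fig.tight_layout()
fig.savefig("model_metrics.png")
plt.close(fig)
```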
Fig. 4. Snapshot of glioblastoma Dataset after Data Cleaning
This protects the integrity of our dataset and keeps it free of missing values.
E. Feature Selection
The columns that are fed into our model (and then used
to make predictions) are referred to as "features." In our case,
they are the columns that will be utilized to calculate the
survival rate. You may utilize all columns except the target
as features at times. Other times, less features are preferable.
Many factors such as Cancer Type, Disease Free Status, Sex,
Fig. 5. Evaluation Metrics for the four algorithms
Fig. 7. Evaluation metrics for Random Forest Model after optimization
Fig. 6. ROC curve (receiver operating characteristic curve)
Fig. 7 and Fig. 8 show the change in accuracy, precision, recall, F1 score, kappa score, and AUC (Area Under the Curve) for the Random Forest model after optimization compared with the baseline model. To show this, we created two charts: a grouped bar chart displaying the change in accuracy, precision, recall, F1, and kappa score, and a line chart showing the AUC for the Random Forest models.
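A minimal sketch of the optimization step using scikit-learn's `GridSearchCV`. The parameter grid is hypothetical because the paper does not list the values it searched, synthetic data stands in for the glioblastoma features, and ROC AUC is assumed as the selection criterion. The tuned model and a default Random Forest baseline are scored on the same held-out split, mirroring the baseline-versus-optimized comparison.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=183, n_features=8, n_informative=4, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

# Baseline: default hyperparameters
baseline = RandomForestClassifier(random_state=1).fit(X_train, y_train)

# Hypothetical search space; each combination is scored by 5-fold CV on the training split
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 3],
}
search = GridSearchCV(RandomForestClassifier(random_state=1), param_grid,
                      cv=5, scoring="roc_auc")
search.fit(X_train, y_train)
tuned = search.best_estimator_  # refit on the whole training split with the best parameters

for name, model in [("baseline", baseline), ("tuned", tuned)]:
    acc = accuracy_score(y_test, model.predict(X_test))
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name:8s} accuracy={acc:.3f} AUC={auc:.3f}")
print("best parameters:", search.best_params_)
```

Because the grid search only sees the training split, the test set stays untouched and the comparison is not biased toward the tuned model; `search.best_params_` records the chosen settings.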
Fig. 8. ROC curve (receiver operating characteristic curve)
VI. SUMMARY AND CONCLUSION
The main goal of this work was to evaluate the survival rate of glioma patients in relation to sex differences. This project showed that, as with many other forms of cancer, glioblastoma occurs less often in women than in men. Our study showed that males receive, on average, more treatment than females, with higher rates of radiotherapy, chemotherapy, and surgery among glioblastoma patients. A large number of digital and behavioral indicators could be refined continuously and used to predict the survival rate. As new data are collected, additional predictive indicators could be used to update the model, contributing to a more significant body of cumulative knowledge in the discipline. The use of ML models will also increase, making it easier to compare new studies with previous research (assuming the precautions mentioned earlier are followed, such as accurate categorization and consistent information content of predictor variables). For example, much work has been done in the fields of human–computer interaction, computer science, and information technology.
Fig. 1. Bar chart showing Mutation count against Diagnosis Age
Fig. 2. Bar chart showing Mutation Count and Sex
Second, we found that males account for a larger proportion of glioblastoma cases than females and, on average, receive higher rates of radiation, chemotherapy, and surgery, yet have a higher risk of mortality. Furthermore, when comparing by sex, we found no statistically significant differences in the various factors associated with time to treatment.
Differences between males and females are important in human health and disease. This study concentrated on recent results suggesting that glioblastoma is a sexually dimorphic disease. Achieving sex-specific targeting in glioblastoma through personalized research will require uncovering and understanding the underlying genetic and molecular mechanisms by which glioblastoma differs between the sexes. As a result, preclinical research should come first and should be carried out separately for males and females before its findings are applied to GBM treatment.
Fig. 3. Bar chart showing Diagnosis Age and Sex
Fig. 4. Bar chart showing Overall Survival Status and Sex
V. LIMITATIONS
This project has several limitations. Our dataset is based on 183 patients, which does not capture the majority of people living with or deceased from glioblastoma. Unfortunately, because of the very low incidence of these tumors, very few patients were included in the analysis; with a larger sample size, we would have obtained a more accurate result.
We also observed that studies using machine learning to predict the survival rate are often not published in accessible journals. This exclusion is therefore a limitation, and it may affect the original outlook of the research. In future research, publications reporting the accuracy of machine learning algorithms will be included. Additional research using different databases is required to evaluate parameters not covered here and to offer more information on sex differences in time to treatment, survival rate, and outcomes for glioblastoma. [18]
References
[1] H.-G. Wirsching et al., "Glioblastoma," NIH. [Online]. Available: https://pubmed.ncbi.nlm.nih.gov/26948367/. [Accessed 9 March 2022].
[2] C. Leitch, "Glioblastoma Affects Men Differently Than Women," Labroots, 7 January 2019. [Online]. Available: https://varnish.labroots.com/trending/genetics-andgenomics/13747/glioblastoma-affectswomen?msclkid=7a3e0841b7ef11ecb23ee665f6d33fbf. [Accessed 10 March 2022].
[3] A. Carrano et al., "Sex-Specific Differences in Glioblastoma," 14 July 2021. [Online]. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8303471/#B1-cells-10-01783. [Accessed 10 March 2022].
[4] NCI Staff, "Glioblastoma Study Highlights Sex Differences in Brain Cancer," NIH, 30 January 2019. [Online]. Available: https://www.cancer.gov/news-events/cancer-currents-blog/2019/glioblastoma-treatment-responsediffers-bysex#:~:text=The%20Biology%20of%20Sex%20Differences&text=These%20differences%20are%20frequently%20linked,all%20ages%2C%20including%20in%20childhood. [Accessed 10 March 2022].
[5] "004_Classification Bank Marketing Dataset (Assignment).ipynb," Colab. [Online]. Available: https://colab.research.google.com/github/rafiag/DTI2020/blob/main/004_Classification_Bank_Marketing_Dataset_(Assignment).ipynb#scrollTo=Xsl1BqVh8whF. [Accessed 10 March 2022].
[6] "Data Analysis with Python and pandas using Jupyter Notebook," Soda Developers, 01 February 2016. [Online]. Available: https://dev.socrata.com/blog/2016/02/01/pandas-andjupyternotebook.html#:~:text=Import%20a%20Dataset%20Into%20Jupyter&text=pandas%20is%20an%20open%20source,structures%20and%20data%20analysis%20tools.%E2%80%9D&text=pandas%20has%20two%20main%20data%20struc. [Accessed 10 March 2022].
[7] "What is Numpy in Python | Python Numpy Tutorial," Great Learning, 11 January 2022. [Online]. Available: https://www.mygreatlearning.com/blog/python-numpytutorial/#:~:text=NumPy%2C%20which%20stands%20for%20Numerical,stands%20for%20%E2%80%99%20Numerical%20Python%E2%80%99. [Accessed 10 March 2022].
[8] Kunal, "Scikit-learn (sklearn) in Python – the most important Machine Learning tool I learnt last year!," Analytics Vidhya, 5 January 2015. [Online]. Available: https://www.analyticsvidhya.com/blog/2015/01/scikitlearn-python-machine-learning-tool/. [Accessed 10 March 2022].
[9] R. Rajsaha, "Warnings in Python," GeeksforGeeks, 23 January 2020. [Online]. Available: https://www.geeksforgeeks.org/warnings-in-python/. [Accessed 10 March 2022].
[10] M. Waskom, "seaborn: statistical data visualization," seaborn. [Online]. Available: https://seaborn.pydata.org/#:~:text=Seaborn%20is%20a%20Python%20data,introductory%20notes%20or%20the%20paper. [Accessed 10 March 2022].
[11] "Matplotlib: Visualization with Python," Matplotlib. [Online]. Available: https://matplotlib.org/#:~:text=Matplotlib%20is%20a%20comprehensive%20library,can%20zoom%2C%20pan%2C%20update. [Accessed 10 March 2022].
[12] "What is the Jupyter Notebook?," Jupyter/IPython Notebook Quick Start Guide 0.1 documentation. [Online]. Available: https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html. [Accessed 10 March 2022].
[13] K. Tran, "5 reasons why you should switch from Jupyter notebook to scripts," 28 September 2020. [Online]. Available: https://towardsdatascience.com/5-reasons-why-you-should-switch-from-jupyter-notebook-to-scriptscb3535ba9c95. [Accessed 28 March 2022].
[14] "Glioblastoma (TCGA, Cell 2013)," cBioPortal for Cancer Genomics. [Online]. Available: https://www.cbioportal.org/study/summary?id=gbm_tcga_pub2013. [Accessed 10 March 2022].
[15] C. Tao, "Hands-on Machine Learning in Python — Decision Tree Classification," Towards Data Science, 12 October 2020. [Online]. Available: https://towardsdatascience.com/hands-on-machinelearning-in-python-decision-tree-classificationeba67a37a39c. [Accessed 10 March 2022].
[16] "Your First Machine Learning Model," Kaggle. [Online]. Available: https://www.kaggle.com/code/dansbecker/your-first-machine-learning-model. [Accessed 10 March 2022].
[17] "Preprocessing data," scikit-learn. [Online]. Available: https://scikit-learn.org/stable/modules/preprocessing.html. [Accessed 10 March 2022].
[18] N. Stabellini et al., "Sex Differences in Time to Treat and Outcomes for Gliomas," Frontiers in Oncology, 19 February 2021. [Online]. Available: https://www.frontiersin.org/articles/10.3389/fonc.2021.630597/full. [Accessed 10 March 2022].
[19] D. Carty, "Training Data vs. Validation Data vs. Test Data for ML Algorithms," Applause, 29 April 2021. [Online]. Available: https://www.applause.com/blog/training-data-validation-data-vs-test-data. [Accessed 10 March 2022].
[20] "Training Data and Test Data," Tutorialspoint. [Online]. Available: https://www.tutorialspoint.com/machine_learning_with_python/machine_learning_with_python_training_test_data.htm. [Accessed 10 March 2022].