THE LIKELINESS OF GLIOMA TO OCCUR MORE IN MALES THAN IN FEMALES AND THE SEX DIFFERENCE IN THE TREATMENT AND OUTCOME OF GLIOMA PATIENTS

Eyo Otoabasi, Abdul Omotoyosi, Adetu Ireoluwatunde, Akalamudo David, Edet Daniel Tioluwani
Department of Computer Science (Group A)
Babcock University
Ilishan-Remo, Nigeria
adetu1021@student.babcock.edu.ng

Abstract— Glioblastoma (GBM) is the most common and deadliest primary brain tumor in adults. Once detected, it is immediately categorised as a grade IV cancer. The standard treatment for glioblastoma includes surgery, radiotherapy, and alkylating chemotherapy. Women are more likely than men to respond to treatment, which gives them a better chance of survival than males. O(6)-methylguanyl DNA methyltransferase (MGMT) promoter methylation predicts benefit from alkylating chemotherapy with temozolomide and guides first-line treatment selection in elderly patients. Current research focuses on addressing the molecular traits that drive the malignant phenotype, such as aberrant signal transduction and angiogenesis, as well as, more recently, different immunotherapy techniques. This study reviews two works and uses machine learning to analyse a large dataset that includes additional information on the subject matter. We built a model that predicts the survival of patients based on features selected from our dataset; that is, we trained the model to predict whether a patient will survive or not. The outcome of this project contributes to the literature by identifying gaps, methods, and approaches in machine learning analysis for prediction and classification. [1]

Keywords— machine learning, glioblastoma, algorithms, pathogenesis

I. INTRODUCTION

Glioblastoma is one of the most dangerous tumors of the central nervous system. It remains largely incurable, despite advances in treatment modalities.
Our review's goal is to present a full picture of glioblastoma pathogenesis, etiology, clinical findings, epidemiology, and treatment options. [2] The dataset was obtained from cBioPortal, and a literature search for glioblastoma was conducted using PubMed and Google, reviewing papers published up until 2017. The only risk factors identified for glioblastoma were specific genetic syndromes. To date, glioblastoma is the most prominent malignant tumor of the central nervous system (CNS), followed by astrocytoma and then lymphoma. Patients may be admitted to the clinic with different symptoms depending on the tumor site. Several imaging techniques must be used to confirm the presence and extent of the tumor. According to the literature review, the pathogenesis involves aberrations in multiple signaling pathways, as well as various mutations in genetic profiles and altered gene expression. Although multiple treatment options are available, such as surgery followed by adjuvant chemotherapy and radiotherapy, the disease remains among the most malignant tumors, and patients usually die within 14 months of diagnosis. [3]

II. LITERATURE REVIEW

A. Glioblastoma Affects Men Differently Than Women

Glioblastoma is a severe brain cancer that often affects people over the age of 50, and it kills almost half of those diagnosed within 14 months of diagnosis. Men are almost twice as likely as women to develop the disease. Typically, the tumor is surgically removed, followed by radiation and chemotherapy. Even that rigorous approach is not always adequate, and within six months new tumor cells frequently replace the ones that were destroyed. Women, however, benefit more from the therapy than men. The researchers used MRIs from a cancer research database to determine the tumor growth velocity of glioblastomas.
Essentially, one can track tumor growth while patients are being treated and calculate how quickly their tumors are growing. This makes it possible to assess more carefully whether the medication given to the patient is effective. Results from 40 males and 23 females, all of whom had received conventional therapy, showed that tumor growth velocities were similar. Only women, however, had a significant and consistent reduction in tumor growth after receiving temozolomide, the most commonly used glioblastoma chemotherapy drug. [2]

B. Sex-Specific Differences in Glioblastoma

III. DISCUSSION

Sex hormones have an impact on the etiology and fate of GBM tumors. Large-scale investigations in women have found a substantial link between estrogens and neuroprotective properties. Testosterone, in comparison, has lately attracted attention in the field of glioblastoma tumorigenesis, since it has been proposed that this sex hormone plays a key role in the disease's male predominance. Endogenous estrogens have been shown to be neuroprotective in a number of neurologic illnesses, including in the control of brain tumor genesis and growth. Furthermore, in one study, premenopausal women with GBM outlived males, and this difference vanished after menopause. Glioblastoma is also more common in postmenopausal women than in premenopausal women. These findings point to female hormones having a strong protective effect against glioblastoma. Older females, however, have a much higher chance of developing glioblastoma, which leaves them at a disadvantage. The research indicates that estrogen, a female sex hormone, protects a woman from developing glioblastoma and, if she does develop it, gives her a better chance of surviving it. This is because estrogen can cross the blood-brain barrier. The research also examined the effect of hormones on the likelihood of developing the disease, although the results are inconclusive.
There were two case studies. The first found that older women approaching menopause had a lower chance of developing the disease than women who had already passed menopause. The second suggests that younger menstruating women had a lower chance of developing glioblastoma than older menstruating women.

Metabolism is a fundamental driver of tumor survival and growth. Changes in metabolism also support the disparity between the sexes when glioblastoma is detected. Diabetes is the most common metabolic illness in humans; it causes hyperglycemia, which tends to aid tumor development because the body cannot break down glucose properly. Furthermore, glucose metabolism and lactate generation differ between male and female embryos, with male embryos producing twice as much lactate as female embryos.

The human immune system is a major factor in why glioblastoma occurs more often in males than in females. Women respond better because their immune systems respond more strongly than men's, owing to hormones and chromosomal makeup. Emerging evidence shows that sex variations in the frequency, clinical features, and prognosis of brain illnesses may be due to immunologic differences. The role of the immune system in sex differences in GBM incidence is still unclear, but it has recently received attention. [3]

A. Problem Statement

For decades, researchers have known that males are more likely than females to develop glioblastoma, an aggressive form of brain cancer. There is also evidence that women respond better than men to standard treatment for this condition. Many illnesses, including certain malignancies, affect men and women differently or produce distinct symptoms depending on the patient's sex. These disparities are typically connected to sex hormones such as testosterone or estrogen, which contribute to numerous biological variations between men and women.
For example, while the female hormone estrogen contributes considerably to more women than men developing breast cancer, males are more prone than females to develop malignant brain tumors at all ages, including childhood. Sex hormones, on the other hand, did not directly contribute to female and male disparities in GBM diagnosis and survival. This shows that the gap cannot be explained entirely by circulating sex hormones. [4]

The most frequent primary CNS malignancy is glioblastoma multiforme (GBM), which accounts for 45.2 percent of all malignant CNS tumors and 55 percent of all gliomas. Males are 60% more likely than females to develop glioblastoma in general. Brain cancer grades, like stages, run from 1 to 4: the higher the grade, the more aggressive the malignancy. Glioblastomas are invariably categorized as grade 4 brain cancer, because this malignancy is a particularly aggressive variety of astrocytoma. Males are diagnosed about twice as frequently as females. When researchers studied people with the disease, they found that conventional therapy for glioblastoma is more successful in women than in men. [3]

B. Aim and Objectives

This research is aimed at survival rate prediction. It centers on the survival rate for glioblastoma in males compared to females. Its objectives are as follows:
a) To design and develop a survival rate prediction system using machine learning algorithms to determine the survival rate in males compared to females.
b) To analyze and review the body of literature on glioma as a knowledge base for the integration and implementation of the proposed prediction system.

C. Methodology

This section explains the proposed system, which has been used to predict the survival rate of males compared to females for glioblastoma. This paper demonstrates that survival rate prediction is possible using a data set and machine learning algorithms.
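The workflow in this section (train several classifiers, compare them, then tune one with a grid search) can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic stand-ins produced by `make_classification`, not the cBioPortal dataset, and the parameter grid, the `evaluate` helper, and the choice to tune the Random Forest are hypothetical choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 183 samples, 6 numeric features and a
# binary label playing the role of "survived" / "did not survive".
X, y = make_classification(n_samples=183, n_features=6, n_informative=4,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# The four classifier families named in this section, with default settings.
models = {
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}


def evaluate(model, X_eval, y_eval):
    """Hypothetical helper: score a fitted binary classifier on held-out data."""
    y_pred = model.predict(X_eval)
    y_prob = model.predict_proba(X_eval)[:, 1]  # probability of class 1
    return {
        "accuracy": accuracy_score(y_eval, y_pred),
        "precision": precision_score(y_eval, y_pred, zero_division=0),
        "recall": recall_score(y_eval, y_pred, zero_division=0),
        "f1": f1_score(y_eval, y_pred, zero_division=0),
        "kappa": cohen_kappa_score(y_eval, y_pred),
        "auc": roc_auc_score(y_eval, y_prob),
    }


# Fit every model on the training split and score it on the test split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = evaluate(model, X_test, y_test)

# Tune one model with an exhaustive grid search (the grid is illustrative).
param_grid = {"n_estimators": [25, 50], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X_train, y_train)
results["Random Forest (tuned)"] = evaluate(search.best_estimator_, X_test, y_test)

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Scoring every model on the same held-out split keeps the comparison fair; the grid search uses cross-validation on the training split only, so the test split stays untouched until the final evaluation.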
The data set was downloaded from the cBioPortal for Cancer Genomics website. We determine the survival of each patient by choosing specific features in our dataset to train our model; this step is part of feature engineering. We used four machine learning algorithms: K-Nearest Neighbors, Naive Bayes, Random Forest, and Decision Tree. After building each model, we evaluated all four and compared them to find which was best at predicting the survival rate. The model was then optimized by tuning its hyperparameters using GridSearch. Finally, we saved the predictions derived from the dataset, and then saved the model, to an Excel file for reusability. [5]

D. Libraries

a) Pandas: "pandas" is a Python library that we used to read our data into a data frame, ds_glioma. It is useful for importing, manipulating, managing, and analysing datasets. [6]
b) NumPy: NumPy (Numerical Python) is the foundational Python package for scientific computation. It supplies the numerical operations used in the code. [7]
c) scikit-learn: Scikit-learn (sklearn) is Python's most useful and robust machine learning library. Through a Python interface, it provides a collection of efficient tools for machine learning and statistical modeling, such as classification, regression, clustering, and dimensionality reduction. [8]
d) warnings: The "warnings" module is used to display warning messages in Python. Warning categories are classes derived from Warning, which is itself a subclass of the built-in Exception class. [9]
e) Seaborn: Seaborn is a matplotlib-based Python data visualization package. It offers a high-level interface for creating visually appealing and informative statistical graphics. [10]
f) Matplotlib.pyplot: Matplotlib is a Python package for creating static, animated, and interactive visualizations. Matplotlib makes simple things simple and difficult things possible. It can produce publication-quality plots.
It can also create interactive figures that can be zoomed, panned, and updated. [11]

E. Applications Used

a) Jupyter Notebook: Jupyter Notebook is a free and open-source web application. It allows the creation and sharing of documents that contain live code, equations, visualisations, explanatory text, and other multimedia elements. Jupyter Notebooks are a spinoff of the IPython project, which used to have its own IPython Notebook project. The name Jupyter is derived from the primary programming languages that it supports: Julia, Python, and R. [12]

Jupyter Notebooks may be used for various data science tasks such as data cleansing and transformation, numerical simulation, exploratory data analysis, data visualisation, statistical modelling, machine learning, deep learning, and much more. One of the reasons we chose Jupyter Notebook is that it allows us to work with Python inside a virtual "notebook." Its adaptability is one of the reasons it is gaining appeal among data scientists. It allows code, images, charts, comments, and so on to be integrated following the steps of the "data science process." Furthermore, it is a form of interactive computing in which users execute code, observe the results, adjust, and repeat, in an iterative exchange between the data scientist and the data. [13]

IV. RESULTS

A. Data collection

The data set we are using contains information about 577 glioblastoma patients [14]. We fed this data to our machine learning algorithms to predict the survival rate and plotted the results. Based on the data gathered, a researcher can evaluate their hypothesis. In most cases, data collection is the first and most crucial step regardless of the field of study. The approach to data collection varies between fields of study depending on the information required.

Fig. 1. Snapshot of glioblastoma Dataset

B. Description of Dataset

In this project, we imported our dataset from cBioPortal. The dataset was divided into training data and testing data.
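A split of this kind can be sketched with scikit-learn's `train_test_split`. The snippet is only an illustration: the miniature data frame, its values, and the `Survived` target column are made up for the example, and the 75:25 ratio and fixed `random_state` are assumptions chosen to match the ratio stated in the Training and Testing Data subsection.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical miniature dataset; the real study uses the cBioPortal export.
df = pd.DataFrame({
    "Diagnosis Age": [45, 60, 52, 71, 38, 66, 59, 49],
    "Mutation Count": [30, 55, 41, 62, 28, 47, 50, 35],
    "Sex": ["Male", "Female", "Male", "Male", "Female", "Male", "Female", "Male"],
    "Survived": [1, 0, 1, 0, 1, 0, 0, 1],  # assumed binary target
})

X = df.drop(columns=["Survived"])  # feature columns
y = df["Survived"]                 # target column

# Hold out 25% of the rows for testing; a fixed seed makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

print(len(X_train), "training rows,", len(X_test), "testing rows")
```

Fixing `random_state` is a design choice for reproducibility; without it, each run would produce a different partition and slightly different scores.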
Of the data, 75% was used for training and 25% for testing. [14]

C. Data Preprocessing

To achieve the expected accuracy, data must be preprocessed. Data mining includes data preprocessing, which is necessary for cleaning the data and removing outliers and inconsistencies. The primary goal of preprocessing is to get the most out of the extracted features, producing more contextual features and normalising the dataset. [18]

Therapy, Diagnosis Age, Overall Survival (Months), Disease Free (Months), and Mutation Count are among the properties of the dataset used for this system. When dealing with a dataset with many features, the significance of feature selection becomes clear. [14]

Feature engineering is a critical stage in training a machine learning model, particularly for traditional machine learning methods (not deep learning). It can take up the bulk of the time in the workflow because we may need to return to this stage several times to improve the performance of our model. To begin with, we identified properties in our dataset that are neither usable nor valuable, such as Sample ID, Patient ID, and Study ID, and dropped them. [17]

Fig. 2. Snapshot of preprocessed data

D. Data Cleaning

Data cleaning is the process of finding and fixing flaws in a dataset that may have a detrimental impact on a prediction model. While working with our dataset, we observed that there were rows with null values, so we identified those rows using the isna function. There are many ways to handle null values in data, such as:
a) Removing the entire row if values are missing.
b) Leaving the rows with null values as they are.
c) Filling the gaps with the mean value.
d) Filling the gaps with the mode value.
e) Using customised gap-filling algorithms. [15] [16]

Fig. 3. Snapshot of glioblastoma Dataset before Data Cleaning

After identifying them, we dropped the rows with null values, and the total number of rows in our dataset dropped from 577 to 183. [16] This protects the integrity of our dataset and makes it error-free.

Fig. 4. Snapshot of glioblastoma Dataset after Data Cleaning

F. Training and Testing Data

To complete our data pre-processing stages, we divide our data into two datasets: training and testing. In this scenario, because we have adequate data, we split the data in a 75:25 ratio for training and testing. As a consequence, our training data has 137 rows and our testing data has 47 rows. [14]

a) Training Data: Machine learning algorithms construct a model from sample data, referred to as "training data," in order to make predictions or decisions without being explicitly programmed to do so. A training set is used to build a model, while a test (or validation) set is used to test it. From the training set, features can be extracted and the model can be fitted before it is validated. [18]

b) Testing Data: A test set, on the other hand, is a subset of the dataset used to validate the machine learning model. Once the model has been fitted on the training set, it is used to predict outcomes for the test set. Some test data may be used in a confirmatory way, typically to ensure that a given set of inputs to a given function yields the expected result. Other data may be used to test the program's ability to respond to unusual, extreme, exceptional, or unexpected input. [19]

G. Graph

The graphs below provide a visual representation of our models. Fig. 5 and Fig. 6 show how well each model performs. We created two charts: first, a grouped bar chart displaying the accuracy, precision, recall, F1, and kappa scores of each model, and second, a line chart showing the area under the curve (AUC) of all our models.

E. Feature Selection

The columns that are fed into our model (and then used to make predictions) are referred to as "features."
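To make the notion of features and a target concrete, the sketch below prepares a tiny, made-up data frame in the way the preprocessing and cleaning steps describe: count nulls with `isna`, drop the identifier columns, drop rows with missing values, and separate features from the target. The rows, the `LIVING`/`DECEASED` labels, the choice of Overall Survival Status as the target, and the numeric encoding of Sex are all assumptions made for illustration, not details confirmed by the paper.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; column names follow those mentioned in the paper,
# but every value here is invented for the example.
df = pd.DataFrame({
    "Study ID": ["gbm"] * 5,
    "Patient ID": ["P1", "P2", "P3", "P4", "P5"],
    "Sample ID": ["P1-01", "P2-01", "P3-01", "P4-01", "P5-01"],
    "Sex": ["Male", "Female", "Male", None, "Female"],
    "Diagnosis Age": [45.0, 60.0, np.nan, 71.0, 38.0],
    "Mutation Count": [30.0, 55.0, 41.0, 62.0, 28.0],
    "Overall Survival Status": ["DECEASED", "LIVING", "DECEASED", "DECEASED", "LIVING"],
})

# Count missing values per column before cleaning.
print(df.isna().sum())

# Drop identifier columns that carry no predictive value, then drop every
# row that still contains a null value.
id_columns = ["Study ID", "Patient ID", "Sample ID"]
clean = df.drop(columns=id_columns).dropna()

# Features: every remaining column except the (assumed) target.
target = "Overall Survival Status"
X = clean.drop(columns=[target]).copy()
X["Sex"] = X["Sex"].map({"Female": 0, "Male": 1})  # assumed numeric encoding

# Target: 1 if the patient is living, 0 otherwise (assumed labels).
y = (clean[target] == "LIVING").astype(int)

print(X.shape, y.tolist())
```

Dropping rows is the simplest of the gap-handling options listed under Data Cleaning; filling with the mean or mode would keep more rows at the cost of introducing estimated values.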
In our case, they are the columns used to estimate the survival rate. Sometimes all columns except the target can be used as features; at other times, fewer features are preferable. Many factors, such as Cancer Type, Disease Free Status, and Sex, are properties of the dataset used for this system.

Fig. 5. Evaluation Metrics for the four algorithms

Fig. 6. ROC curve (receiver operating characteristic curve)

Fig. 7 and Fig. 8 show the change in accuracy, precision, recall, F1 score, kappa score, and AUC (area under the curve) for the Random Forest model after optimization, compared with the baseline model. We created two charts: first, a grouped bar chart displaying the change in the accuracy, precision, recall, F1, and kappa scores of our model, and second, a line chart showing the AUC for the Random Forest models.

Fig. 7. Evaluation metrics for Random Forest Model after optimization

Fig. 8. ROC curve (receiver operating characteristic curve)

VI. SUMMARY AND CONCLUSION

The main goal of this work was to evaluate the survival rate of glioma with respect to sex disparities. This project showed that glioblastoma, like many other forms of cancer, occurs less often in women than in men. Our study showed that males receive, on average, more treatment than females, with higher rates of radiotherapy, chemotherapy, and surgery for glioblastoma patients. A large number of digital and behavioral indicators, refined through a continuous process, could be used to predict the survival rate.
As new data are collected, most of these predictive indicators could be used to update a model, contributing to a larger body of cumulative knowledge in the discipline. The use of ML models will also increase, making it easier to compare new studies with previous research (assuming the precautions mentioned earlier are followed, such as accurate categorization and consistent information content in the predictor variables). For example, much work has been done in the fields of human-computer interaction, computer science, and information technology.

Fig. 1. Bar chart showing Mutation count against Diagnosis Age

Fig. 2. Bar chart showing Mutation Count and Sex

We also found that males account for a larger proportion of glioblastoma cases than females and, on average, receive radiation, chemotherapy, and surgery at higher rates, yet have a higher risk of mortality. Furthermore, we show that, when comparing by sex, there are no statistically significant differences in numerous factors associated with time to treatment. The differences between males and females are important in terms of human health and disease. This study concentrated on recent results suggesting that glioblastoma is a sexually dimorphic disease. Personalized research aimed at sex-specific targeting in glioblastoma will require uncovering and understanding the underlying genetic and molecular mechanisms by which glioblastoma differs between the sexes. As a result, preclinical research should come first and should be carried out separately in men and women before the findings are applied to GBM.

Fig. 3. Bar chart showing Diagnosis Age and Sex

Fig. 4. Bar chart showing Overall Survival Status and Sex

V. LIMITATIONS

This project has several limitations. Our database is based on 183 patients, which does not capture the majority of people living with, or deceased from, glioblastoma.
Unfortunately, because of the very low incidence of these tumors, very few patients were included in the analysis. A larger sample size would have given a more accurate result. We also observed that studies using machine learning to predict the survival rate are not published in accessible journals. Their exclusion is therefore a limitation that may affect the overall outlook of the research. In future research, publications reporting the accuracy achieved by the machine learning algorithm should be included. Additional research using different databases is required to evaluate parameters not covered here, in order to offer more information on sex differences in treatment time, survival rate, and outcomes for glioblastoma. [18]

References

[1] Hans-Georg Wirsching, E. G., M. W., "Glioblastoma," NIH, [Online]. Available: https://pubmed.ncbi.nlm.nih.gov/26948367/. [Accessed 9 March 2022].
[2] C. Leitch, "Glioblastoma Affects Men Differently Than Women," labroots, 7 January 2019. [Online]. Available: https://varnish.labroots.com/trending/genetics-andgenomics/13747/glioblastoma-affectswomen?msclkid=7a3e0841b7ef11ecb23ee665f6d33fbf. [Accessed 10 March 2022].
[3] Anna Carrano, J. J. J., D. I. A., I. H., G. C., "Sex-Specific Differences in Glioblastoma," 14 July 2021. [Online]. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8303471/#B1-cells-10-01783. [Accessed 10 March 2022].
[4] N. Staff, "Glioblastoma Study Highlights Sex Differences in Brain Cancer," NIH, 30 January 2019. [Online]. Available: https://www.cancer.gov/news-events/cancer-currents-blog/2019/glioblastoma-treatment-responsediffers-bysex#:~:text=The%20Biology%20of%20Sex%20Differences&text=These%20differences%20are%20frequently%20linked,all%20ages%2C%20including%20in%20childhood. [Accessed 10 March 2022].
[5] "004_Classification Bank Marketing Dataset (Assignment).ipynb," colab, [Online]. Available: https://colab.research.google.com/github/rafiag/DTI2020/blob/main/004_Classification_Bank_Marketing_Dataset_(Assignment).ipynb#scrollTo=Xsl1BqVh8whF. [Accessed 10 March 2022].
[6] "Data Analysis with Python and pandas using Jupyter Notebook," Soda Developers, 01 February 2016. [Online]. Available: https://dev.socrata.com/blog/2016/02/01/pandas-andjupyternotebook.html#:~:text=Import%20a%20Dataset%20Into%20Jupyter&text=pandas%20is%20an%20open%20source,structures%20and%20data%20analysis%20tools.%E2%80%9D&text=pandas%20has%20two%20main%20data%20struc. [Accessed 10 March 2022].
[7] "What is Numpy in Python | Python Numpy Tutorial," Great Learning, 11 January 2022. [Online]. Available: https://www.mygreatlearning.com/blog/python-numpytutorial/#:~:text=NumPy%2C%20which%20stands%20for%20Numerical,stands%20for%20%E2%80%99%20Numerical%20Python%E2%80%99. [Accessed 10 March 2022].
[8] kunal, "Scikit-learn(sklearn) in Python – the most important Machine Learning tool I learnt last year!," Analytics Vidhya, 5 January 2015. [Online]. Available: https://www.analyticsvidhya.com/blog/2015/01/scikitlearn-python-machine-learning-tool/. [Accessed 10 March 2022].
[9] R. Rajsaha, "Warnings in Python," GeeksforGeeks, 23 January 2020. [Online]. Available: https://www.geeksforgeeks.org/warnings-in-python/. [Accessed 10 March 2022].
[10] M. Waskom, "seaborn: statistical data visualization," seaborn, [Online]. Available: https://seaborn.pydata.org/#:~:text=Seaborn%20is%20a%20Python%20data,introductory%20notes%20or%20the%20paper. [Accessed 10 March 2022].
[11] "Matplotlib: Visualization with Python," Matplotlib, [Online]. Available: https://matplotlib.org/#:~:text=Matplotlib%20is%20a%20comprehensive%20library,can%20zoom%2C%20pan%2C%20update. [Accessed 10 March 2022].
[12] "What is jupyter notebook?, Jupyter/IPython Notebook Quick Start Guide 0.1 documentation," [Online]. Available: https://jupyter-notebook-beginnerguide.readthedocs.io/en/latest/what_is_jupyter.html. [Accessed 10 March 2022].
[13] K. Tran, "5 reasons why you should switch from Jupyter notebook to scripts," 28 September 2020. [Online]. Available: https://towardsdatascience.com/5-reasons-why-you-should-switch-from-jupyter-notebook-to-scriptscb3535ba9c95. [Accessed 28 March 2022].
[14] "Glioblastoma (TCGA, Cell 2013)," cBioPortal for Cancer Genomics, [Online]. Available: https://www.cbioportal.org/study/summary?id=gbm_tcga_pub2013. [Accessed 10 March 2022].
[15] C. Tao, "Hands-on Machine Learning in Python — Decision Tree Classification," Towards Data Science, 12 October 2020. [Online]. Available: https://towardsdatascience.com/hands-on-machinelearning-in-python-decision-tree-classificationeba67a37a39c. [Accessed 10 March 2022].
[16] "Your First Machine Learning Model," Kaggle, [Online]. Available: https://www.kaggle.com/code/dansbecker/your-firstmachine-learning-model. [Accessed 10 March 2022].
[17] "Preprocessing data," scikit-learn, [Online]. Available: https://scikit-learn.org/stable/modules/preprocessing.html. [Accessed 10 March 2022].
[18] Nickolas Stabellini, H. K., N. P., K. W., J. S. B.-S., "Sex Differences in Time to Treat and Outcomes for Gliomas," Frontiers in Oncology, 19 February 2021. [Online]. Available: https://www.frontiersin.org/articles/10.3389/fonc.2021.630597/full. [Accessed 10 March 2022].
[19] D. Carty, "Training Data vs. Validation Data vs. Test Data for ML Algorithms," Applause, 29 April 2021. [Online]. Available: https://www.applause.com/blog/training-data-validation-data-vs-test-data. [Accessed 10 March 2022].
[20] "Training Data and Test Data," tutorialspoint, [Online]. Available: https://www.tutorialspoint.com/machine_learning_with_python/machine_learning_with_python_training_test_data.htm. [Accessed 10 March 2022].