BREAST CANCER PREDICTION USING MACHINE LEARNING

BACHELOR OF TECHNOLOGY IN ELECTRONICS AND COMMUNICATION ENGINEERING

SUBMITTED BY -
A. SAI KRISHNA 320106512001
A. GURU DATTA 320106512002
M. YAKSHITH 320106512030

UNDER THE GUIDANCE OF
PROF. G SASI BHUSHANA RAO

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
ANDHRA UNIVERSITY COLLEGE OF ENGINEERING
ANDHRA UNIVERSITY
VISAKHAPATNAM-530003
2023-2024

CONTENTS :
Abstract
Introduction
Tumor types and differences
Project Methodology
Machine Learning Algorithms
Feature Extraction
Result
Conclusion
References

ABSTRACT :
• Breast cancer is a common cause of female mortality in developing countries. Early detection and treatment are crucial for successful outcomes.
• In this project, we propose the adoption of logistic regression as an alternative to k-nearest neighbors (KNN) for classification tasks.
• We present a comparative analysis of the two methods using real-world datasets, evaluating performance metrics such as accuracy, precision, recall, and F1-score.
• Our findings demonstrate the potential of logistic regression as a powerful alternative to KNN, providing insights for practitioners seeking to improve classification performance in their applications.

INTRODUCTION :
• Breast cancer is one of the most prevalent forms of cancer affecting women globally, making early detection crucial for successful treatment and improved survival rates.
• Manual cancer identification using microscopic biopsy images is subjective; findings can vary from expert to expert depending on their experience and other factors.
• With advances in technology, machine learning (ML) has emerged as a powerful tool in healthcare for predicting and diagnosing various diseases, including breast cancer.
• Automated identification of malignant tissue, by extracting features from microscopic biopsy images using machine learning, helps to alleviate the problems outlined above and gives improved outcomes.
Basic types of tumors :
• Benign and malignant tumors are the two fundamental classifications of tumors, and understanding the distinction between them is crucial for breast cancer diagnosis and prognosis.
• Both benign and malignant tumors arise from abnormal cell growth, but they differ in behavior, impact on surrounding tissues, and potential for spreading to other parts of the body.
• Malignant cells are considered cancerous. Malignant breast cells have the potential to grow uncontrollably, invade surrounding tissues, and spread to other parts of the body, leading to the formation of tumors.

BENIGN
• Slowly growing.
• Regular surface, capsulated.
• No spread or metastasis.
• Not attached to deep structures.
• Slight pressure effect on neighboring organs.

MALIGNANT
• Rapidly growing.
• Irregular surface, non-capsulated.
• Spread or metastasis.
• Attached to deep structures.
• Remarkable pressure effect on neighboring organs.

Symptoms :
• A change in the size, shape or contour of your breast.
• A mass or lump, which may feel as small as a pea.
• A lump or thickening in or near your breast or in your underarm that persists through your menstrual cycle.
• A change in the look or feel of your skin on your breast or nipple.
• A marble-like hardened area under your skin.

PROJECT METHODOLOGY:
Collection of microscopic biopsy images → Feature Extraction → Data processing → Train and Evaluation split → Machine Learning model → Prediction → Classification (Suffering from Breast Cancer / Not Suffering from Breast Cancer)

MACHINE LEARNING :
• Machine learning (ML) is a subfield of artificial intelligence that uses statistical, probabilistic, and optimization techniques to help computers learn from past examples and find patterns in data sets.
• In essence, it is about teaching machines to recognize patterns and make decisions based on data rather than being explicitly programmed to do so.
• Machine learning can be broadly categorized into several types:
Supervised Learning: Algorithms learn from labeled data, making predictions or decisions based on input-output pairs provided during training.
Unsupervised Learning: Algorithms learn from unlabeled data to discover patterns or structures within it, without explicit guidance on what to look for.

EXISTING METHOD FOR BREAST CANCER DETECTION :
K-Nearest Neighbors:
• The k-nearest neighbors (KNN) algorithm is a non-parametric, supervised learning classifier that uses proximity to make classifications or predictions about the grouping of an individual data point.
• The KNN algorithm uses 'feature similarity' to predict the values of new data points. This means that a new point is assigned a value based on how closely it resembles the points in the training set.

Minkowski Distance :
Minkowski distance is a mathematical measure of the distance between two points in a multidimensional space. For two points $x$ and $y$ with $n$ features it is
$D(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$
Generally, $p = 2$ (the Euclidean distance) is used for the k-nearest neighbors classifier.

PROPOSED METHOD FOR BREAST CANCER DETECTION :
Logistic Regression:
• Logistic regression is a statistical method used for binary classification tasks, where the target variable has two possible outcomes (e.g., true/false, yes/no, 0/1).
• Despite its name, logistic regression is a classification algorithm, not a regression algorithm. It models the probability that a given input belongs to a particular category using a logistic function (also known as the sigmoid function).

SIGMOID FUNCTION
The logistic function is of the form:
$p(x) = \frac{1}{1 + e^{-(x - \mu)/s}}$
where $\mu$ is a location parameter (the midpoint of the curve, where $p(\mu) = 1/2$) and $s$ is a scale parameter. This expression may be rewritten as:
$p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$
where $\beta_0 = -\mu/s$ is known as the intercept and $\beta_1 = 1/s$ is the rate parameter.

Feature extraction:
Method used : Gray Level Co-occurrence Matrix
• GLCM stands for Gray-Level Co-occurrence Matrix. It is a technique used in image processing to characterize the texture of an image.
• GLCM organizes this information into a matrix.
Each cell in the matrix represents how often two gray levels appear together at a certain distance and in a certain direction in the image.

Features Extracted :
Contrast
Correlation
Dissimilarity
Homogeneity
Angular Second Moment
Energy

Contrast : Contrast in GLCM refers to how much the gray levels of neighboring pixels in an image differ from each other.
Correlation : Correlation in GLCM is a measure of how much the gray levels in an image are related or vary together in a particular direction.
Dissimilarity : Dissimilarity in GLCM is a measure of how different neighboring pixels are from each other in an image.
Homogeneity : Homogeneity in GLCM is a measure of how uniform or smooth the texture of an image appears.
Angular Second Moment : The Angular Second Moment (ASM) is a measure of orderliness in an image. It is the sum of the squared GLCM entries, so it is high when a few gray-level pairs recur regularly throughout the image.
Energy : Energy is the square root of the ASM, computed from the probabilities of the gray-level pairs in the GLCM.

Results :

Conclusion:
• In conclusion, our comparative analysis demonstrates that logistic regression exhibits higher accuracy in breast cancer detection than k-nearest neighbors (KNN).
• The logistic regression model consistently outperforms KNN across various evaluation metrics, benefiting from its ability to model linear relationships and provide interpretable results.
• KNN suffers from computational complexity, especially in high-dimensional spaces. The superior performance of logistic regression suggests its potential utility as a predictive tool in clinical practice, aiding in early diagnosis and treatment planning.
• This underscores the importance of selecting appropriate machine learning algorithms tailored to specific healthcare tasks, with logistic regression proving to be a reliable choice for breast cancer detection.
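The evaluation metrics used for this kind of comparison (accuracy, precision, recall, and F1-score) can be computed from true and predicted labels as in the minimal sketch below. The names `Metrics` and `binary_metrics` and the toy label lists are assumptions made for illustration only; they are not the project's code, and the numbers are not results from this project.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts, treating `positive` as the malignant class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(accuracy, precision, recall, f1)


# Hypothetical toy labels (1 = malignant, 0 = benign); NOT project data.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

m = binary_metrics(y_true, y_pred)
print(m)
```

Running the same function on the predictions of each classifier (for example, KNN and logistic regression evaluated on the same held-out split) gives directly comparable numbers. In a screening setting, recall on the malignant class is often the metric to watch, since a false negative is a missed cancer.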
Future Scope:
Personalized Medicine:
• Machine learning models can be tailored to individual patient characteristics, such as genetic markers, medical history, and lifestyle factors.
• This personalized approach can lead to more accurate risk assessment and treatment recommendations, improving patient outcomes and reducing unnecessary interventions.

Early Detection and Prevention:
• Machine learning algorithms can analyse large-scale datasets to identify subtle patterns and biomarkers associated with early-stage breast cancer. By detecting cancer at an earlier stage, patients can receive timely interventions, leading to better prognosis and survival rates.

Explainable AI (XAI):
• As machine learning models become more complex, there is a growing need for transparency and interpretability.
• Explainable AI techniques can help clinicians and researchers understand how models make predictions, enabling them to trust and validate the results and identify potential biases or limitations.

References:
• “Comparing Logistic Regression to the K-nearest Neighbors (KNN) technique, A Novel Pattern Discovery Based Human Activity Recognition” by S. Ritesh Reddy, Devi T.
• “Comparison of machine learning models for breast cancer diagnosis” by Rania R. Kadhim, Mohammed Y. Kamil
• “VolcashDB: Volcanic ash particle image and classification database”, January 2023

THANK YOU