Lung Cancer Detection using Python ML

INTRODUCTION
• Lung cancer (lung carcinoma) is one of the most common diseases worldwide and the second most commonly diagnosed cancer. It cannot be prevented outright, but its risk can be reduced, so detecting it as early as possible is crucial for patient survival. The number of people affected by lung cancer rises with the number of chain smokers. In this project I use a machine learning algorithm to detect lung cancer from the provided CT scan image dataset. The project is written entirely in Python. It could support the early detection of lung cancer and so help patients overcome the condition.
• I used Jupyter Notebook as the working platform (IDE) for this project, and the dataset was downloaded from Kaggle. Several libraries and modules were used in the process. The images in the dataset are slices of CT scans. Segmentation, feature extraction and classification were the priorities in this project.

OBJECTIVE
• The main objective of the project is to build a program that detects lung cancer using machine learning in Python.

BACKGROUND
The libraries used in this project are listed below.
• Pandas - a software library written for the Python programming language for data manipulation and analysis.
• NumPy - a library for the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
• OS - used for accessing the directory structure.
• Matplotlib - a plotting library for the Python programming language and its numerical mathematics extension NumPy.
It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits such as Tkinter, wxPython, Qt, or GTK.
Modules used:
1) pyplot - a collection of functions in Matplotlib that manipulate elements of a figure: creating a figure, creating a plotting area, plotting lines, adding plot labels, and so on.
2) mpl_toolkits.axes_grid1 - a collection of helper classes that ease displaying multiple images in Matplotlib.
• CV2 - OpenCV-Python is a library of Python bindings designed to solve computer vision problems. The cv2.imread() method loads an image from the specified file. If the image cannot be read (missing file, improper permissions, unsupported or invalid format), the method returns an empty matrix (None in the Python bindings).
• TensorFlow - a free and open-source software library for machine learning. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks. TensorFlow is a symbolic math library based on dataflow and differentiable programming.
• Keras - a deep learning API written in Python that runs on top of the machine learning platform TensorFlow. It was developed with a focus on enabling fast experimentation: being able to go from idea to result as fast as possible is key to doing good research.
• Pydicom - a pure Python package for working with DICOM files such as medical images, reports, and radiotherapy objects. Pydicom makes it easy to read these complex files into natural Pythonic structures for easy manipulation. Modified datasets can be written back to DICOM format files.
• Seaborn - a Python data visualization library based on Matplotlib.
It provides a high-level interface for drawing attractive and informative statistical graphics.
• Scikit-learn - (formerly scikits.learn, also known as sklearn) a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
Module used:
1) StandardScaler - removes the mean and scales each feature to unit variance. The operation is performed independently for each feature. StandardScaler can be influenced by outliers, if any exist in the dataset, since it relies on the empirical mean and standard deviation of each feature.
• SciPy - a free and open-source Python library used for scientific and technical computing. SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers and other tasks common in science and engineering.
• PIL - the Python Imaging Library adds image processing capabilities to the Python interpreter. It provides extensive file format support, an efficient internal representation, and fairly powerful image processing capabilities. The core image library is designed for fast access to data stored in a few basic pixel formats, and provides a solid foundation for a general image processing tool.
• Skimage - scikit-image (a.k.a. skimage) is a collection of algorithms for image processing and computer vision. The main package of skimage only provides a few utilities for converting between image data types.
Module used:
1) skimage.io - utilities to read and write images in various formats.
• Plotly - Plotly's Python graphing library makes interactive, publication-quality graphs such as line plots, scatter plots, area charts, bar charts, error bars, box plots, histograms, heatmaps, subplots, multiple-axes, polar charts, and bubble charts.
• Glob - the glob module is part of the Python standard library. It returns all file paths that match a specific pattern.
• Imread - a very simple library with three functions: imread reads an image from disk, imread_multi reads multiple images from disk (only for file formats that support multiple images), and imwrite saves an image to disk.
• SciPy - an open-source, BSD-licensed scientific library for Python for mathematics, science and engineering. It depends on NumPy, which provides convenient and fast N-dimensional array manipulation, and it is built to work with NumPy arrays.

BACKGROUND
• This project uses a dataset of CT scan slice images from Kaggle. First, exploratory data analysis was carried out to check the consistency of the data. Guided by the graphs and plots from that analysis, the next step was pre-processing, which covers segmentation, 3D visualisation and volume rendering. Feature extraction from the images followed: features such as modality, image size, pixel spacing, the location of the CT scan slice and other related metadata were obtained. Because the images were in DICOM format, this was much easier than with other formats. Next came classification, for which the data was split into training and test sets with a test size of 0.2 (20%). A CNN was then built with Keras on TensorFlow and its model summary obtained. The accuracy and loss curves were plotted with Matplotlib.
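The 80/20 split mentioned above can be sketched in plain Python. This is only an illustration: the slides do not show the call actually used (scikit-learn's train_test_split with test_size=0.2 is the usual choice), and the helper split_train_test, the fixed seed, and the slice file names are hypothetical.

```python
import random


def split_train_test(items, test_size=0.2, seed=0):
    """Shuffle a copy of items and return (train, test) lists.

    Hypothetical helper for illustration; not the project's own code.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up stand-ins for CT slice file names.
slices = [f"slice_{i:03d}.dcm" for i in range(100)]
train, test = split_train_test(slices, test_size=0.2)
print(len(train), len(test))
```

Shuffling before the cut keeps slices that sit next to each other in the directory from all landing in the same set.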
Finally, features were extracted from the model that was built.

Workflow
1) Load the dataset
2) Exploratory data analysis for checking the consistency in data
3) Data preprocessing and segmentation
4) Feature extraction from the images
5) Classification
6) Model analysis

Hardware and software requirements

HARDWARE TOOLS      MINIMUM REQUIREMENTS
Processor           i5 or above
Hardware            10GB
RAM                 8GB
Monitor             17” Colored
Mouse               Optical
Keyboard            122 keys

SOFTWARE TOOLS      MINIMUM REQUIREMENTS
Platform            Windows, Linux or MacOS
Operating System    Windows, Linux or MacOS
Technology          Machine Learning-Python
Scripting Language  Python
IDE                 Jupyter notebook

FUTURE SCOPE
• Lung cancer detection with machine learning algorithms can support the early diagnosis of cancer. This can help doctors and radiologists deliver better outcomes for the patients who consult them.
• These techniques can support treatment, may help reduce the burden of lung cancer, and could save millions of lives. Machine learning algorithms can bring similar improvements to health care and to other sectors.

CONCLUSION
• This project used a dataset of CT scan slice images from Kaggle. Exploratory data analysis was followed by image pre-processing and segmentation, after which features were extracted from the image files. A CNN was then built with the Keras library as a Sequential model, and graphs of accuracy and loss were plotted.
• model.fit() was run for 40 epochs. The training accuracy averaged about 98%, while the validation accuracy was 50%. This is because the model had already seen the training data, whereas the validation (test) data consisted of new data points that the model had not seen.
The model is in fact overfitted because the dataset was small compared with others, but this is no cause for concern: with a larger dataset there is a good chance of obtaining better validation accuracy. Visualising data is one of the best ways to humanize it and make the concepts easier to understand.
• The segmented CT scan slices can sometimes help doctors identify whether cancer is present, which can help reduce the risk of death.

REFERENCES
I. Dataset: https://www.kaggle.com/kmader/siim-medical-images/code
II. Géron, Aurélien. "The Machine Learning Landscape." Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, O'Reilly, 2019.
III. Monkam, P., et al. "CNN Models Discriminating between Pulmonary Micro-Nodules and Non-Nodules from CT Images." Biomed Eng Online, 26 July 2018.
IV. Wang, Xing, et al. "An Appraisal of Lung Nodules Automatic Classification Algorithms for CT Images." Sensors (Basel), Jan. 2019, doi:10.3390/s19010194.
V. Miah, B.A., and M.A. Yousuf. "Detection of Lung Cancer from CT Image Using Image Processing and Neural Network." 2nd International Conference on Electrical Engineering and Information and Communication Technology (ICEEICT), May 2015.
VI. Ypsilantis, Petros-Pavlos, and Giovanni Montana. "Recurrent Convolutional Networks for Pulmonary Nodule Detection in CT Imaging." (2016): 1-36. https://arxiv.org/pdf/1609.09143.pdf. Web. May 2017.
VII. Kuruvilla, Jinsa, and K. Gunavathi. "Lung Cancer Classification Using Neural Networks for CT Images." Computer Methods and Programs in Biomedicine 113.1 (2014): 202-09. Web.
VIII. Jafar, Iyad, Hao Ying, Anthony F. Shields, and Otto Muzik. "Computerized Detection of Lung Tumors in PET/CT Images." 2006 International Conference of the IEEE Engineering in Medicine and Biology Society (2006): n. pag. Web.
