
Lung Cancer Detection with Python ML

INTRODUCTION
• Lung cancer (or lung carcinoma) is one of the most common diseases around the world and the second most
commonly diagnosed cancer. Lung cancer cannot be entirely prevented, but its risk can be reduced, so detecting it
as early as possible is crucial for patient survival. Smoking is the leading risk factor: the more heavy smokers there
are, the more people are affected by lung cancer. In this project I use a machine learning algorithm to detect lung
cancer from a dataset of CT scan images. The implementation is written entirely in Python. The project is intended
to support early detection of lung cancer, so that patients can be treated sooner and have a better chance of
overcoming the condition.
• I used Jupyter Notebook as my working platform (IDE) for this project, and the dataset was downloaded from
Kaggle. Several libraries and modules were used in the process. The images in the dataset are slices of CT scans.
Segmentation, feature extraction and classification were the priorities in this lung cancer detection project.
OBJECTIVE
• The main objective of the project is to build a program that detects lung cancer using machine learning in
Python.
BACKGROUND
• The libraries used in this project are listed below.
• Pandas - Pandas is a software library written for the Python programming language for data manipulation and analysis.
• NumPy - NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and
matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
• OS - Used for accessing the directory structure.
• Matplotlib - Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy.
It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython,
Qt, or GTK.
• Modules used:
1) Pyplot is a collection of functions in Matplotlib. Its functions manipulate elements of a figure, such as creating a figure,
creating a plotting area, plotting lines, and adding plot labels.
2) The mpl_toolkits.axes_grid1 toolkit is a collection of helper classes that ease displaying multiple images in Matplotlib.
• CV2 - OpenCV-Python is a library of Python bindings designed to solve computer vision problems. The cv2.imread() method loads an
image from the specified file. If the image cannot be read (because of a missing file, improper permissions, or an unsupported or
invalid format), the Python binding returns None instead of raising an error, so the result should be checked before use.
• TensorFlow - TensorFlow is a free and open-source software library for machine learning. It can be used across a range of tasks but
has a particular focus on training and inference of deep neural networks. TensorFlow is a symbolic math library based on dataflow
and differentiable programming.
• Keras - Keras is a deep learning API written in Python, running on top of the machine learning platform TensorFlow. It was
developed with a focus on enabling fast experimentation. Being able to go from idea to result as fast as possible is key to doing good
research.
• Pydicom - Pydicom is a pure Python package for working with DICOM files such as medical images, reports, and radiotherapy
objects. Pydicom makes it easy to read these complex files into natural Pythonic structures for easy manipulation. Modified datasets
can be written again to DICOM format files.
• Seaborn - Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive
and informative statistical graphics.
• Scikit-learn - Scikit-learn (formerly scikits.learn and also known as sklearn) is a free software machine learning library for
the Python programming language. It features various classification, regression and clustering algorithms including support vector
machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and
scientific libraries NumPy and SciPy.
• Module used:
1) StandardScaler removes the mean and scales each feature/variable to unit variance. This operation is performed feature-wise,
independently for each feature. StandardScaler can be influenced by outliers (if they exist in the dataset) since it involves the
estimation of the empirical mean and standard deviation of each feature.
• SciPy - SciPy is a free and open-source Python library used for scientific computing and technical computing. SciPy contains
modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE
solvers and other tasks common in science and engineering.
• PIL - The Python Imaging Library adds image processing capabilities to the Python interpreter. This library provides extensive file
format support, an efficient internal representation, and fairly powerful image processing capabilities. The core image library is
designed for fast access to data stored in a few basic pixel formats. It provides a solid foundation for a general image
processing tool.
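The StandardScaler behaviour described above can be illustrated with a short NumPy sketch. This is not scikit-learn's implementation and not the project's code: the function name standardize and the feature values are made up for illustration. It only reproduces the arithmetic (subtract the per-feature mean, divide by the per-feature population standard deviation) and shows why a single outlier matters.

```python
import numpy as np

def standardize(features):
    """Column-wise z-score: subtract the mean, divide by the standard deviation."""
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)      # population standard deviation (ddof=0)
    std[std == 0] = 1.0             # leave constant columns unscaled, avoid dividing by zero
    return (features - mean) / std

# Hypothetical feature table: rows are samples, columns are two made-up features.
clean = np.array([[1.0, 200.0], [2.0, 220.0], [3.0, 240.0], [4.0, 260.0]])
scaled = standardize(clean)

# One extreme value in the first column shifts its mean and inflates its
# standard deviation, so the other values in that column get squeezed together.
with_outlier = clean.copy()
with_outlier[3, 0] = 400.0
scaled_outlier = standardize(with_outlier)

print(np.ptp(scaled[:3, 0]), np.ptp(scaled_outlier[:3, 0]))
```

After scaling, every column of the clean table has mean 0 and unit variance. In the table with the outlier, the spread of the three ordinary values in the first column is far smaller than before, which is the outlier sensitivity mentioned above.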
• Skimage - scikit-image (a.k.a. skimage) is a collection of algorithms for image processing and computer vision. The main
package of skimage only provides a few utilities for converting between image data types; most functionality lives in its submodules.
• Module used: skimage.io - Utilities to read and write images in various formats.
• Plotly - Plotly's Python graphing library makes interactive, publication-quality graphs, including line plots, scatter plots,
area charts, bar charts, error bars, box plots, histograms, heatmaps, subplots, multiple-axes, polar charts, and bubble
charts.
• Glob - The glob module is part of the Python standard library. It returns all file paths that match a specific pattern.
• Imread - Imread is a very simple library with three functions: imread reads an image from disk, imread_multi reads multiple
images from disk (only for file formats that support multiple images), and imwrite saves an image to disk.
• SciPy and NumPy - SciPy is an open-source, BSD-licensed library for mathematics, science and engineering. It depends on
NumPy, which provides convenient and fast N-dimensional array manipulation, and it is built to work directly with NumPy arrays.
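As a small illustration of how the os and glob modules listed above fit together, the sketch below collects the files in a directory that match a pattern. The helper name list_dicom_files, the .dcm extension and the file names are assumptions made for this example; the actual layout of the Kaggle dataset may differ.

```python
import glob
import os
import tempfile

def list_dicom_files(root_dir, pattern="*.dcm"):
    """Return the sorted paths in root_dir whose names match the pattern."""
    return sorted(glob.glob(os.path.join(root_dir, pattern)))

# Demonstration on a throwaway directory with placeholder file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("slice_002.dcm", "slice_001.dcm", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(path) for path in list_dicom_files(tmp)]

print(found)  # ['slice_001.dcm', 'slice_002.dcm']
```

Sorting the result keeps the slice order stable between runs, since glob itself does not promise any particular order.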
• In this project I used a dataset of CT scan slice images from Kaggle. First, I carried out exploratory data analysis
to check the consistency of the data, and used the resulting graphs and plots to guide the next step. Next came
pre-processing, which covers segmentation, 3D visualisation and volume rendering. After that I extracted features
from the images: modality, image size, pixel spacing, the location of the CT scan slice and other related metadata.
Because the images are in DICOM format, this was much easier than it would have been with other formats. For
classification, I split the data into training and test sets with a test size of 0.2 (20%). I then built a CNN with
Keras on TensorFlow and obtained the model summary, and plotted the accuracy and loss curves with Matplotlib.
Finally, features were extracted from the trained model.
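The 80/20 split mentioned above can be sketched with NumPy alone. This is only an illustration of the idea, not the project's code: a project like this one would more commonly call scikit-learn's train_test_split with test_size=0.2, and the function name split_train_test, the array shapes and the labels below are placeholders.

```python
import numpy as np

def split_train_test(images, labels, test_size=0.2, seed=0):
    """Shuffle once, then hold out a test_size fraction of the samples."""
    images = np.asarray(images)
    labels = np.asarray(labels)
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    n_test = max(1, int(round(test_size * len(images))))  # keep at least one test sample
    test_idx, train_idx = order[:n_test], order[n_test:]
    return images[train_idx], images[test_idx], labels[train_idx], labels[test_idx]

# Placeholder data: 100 fake 64x64 "slices" with made-up binary labels.
X = np.zeros((100, 64, 64), dtype=np.float32)
y = np.arange(100) % 2
X_train, X_test, y_train, y_test = split_train_test(X, y, test_size=0.2)

print(X_train.shape, X_test.shape)  # (80, 64, 64) (20, 64, 64)
```

Shuffling before the cut matters when the files are stored in a meaningful order (for example, grouped by patient or by label); a fixed seed keeps the split reproducible.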
Workflow
1) LOAD THE DATASET
2) EXPLORATORY DATA ANALYSIS FOR CHECKING THE CONSISTENCY IN DATA
3) DATA PREPROCESSING AND SEGMENTATION
4) FEATURE EXTRACTION FROM THE IMAGES
5) CLASSIFICATION
6) MODEL ANALYSIS
Hardware and software requirements
HARDWARE TOOLS     MINIMUM REQUIREMENTS
Processor          i5 or above
Hard disk          10GB
RAM                8GB
Monitor            17” Colored
Mouse              Optical
Keyboard           122 keys
SOFTWARE TOOLS       MINIMUM REQUIREMENTS
Platform             Windows, Linux or macOS
Operating System     Windows, Linux or macOS
Technology           Machine Learning - Python
Scripting Language   Python
IDE                  Jupyter Notebook
FUTURE SCOPE
• Lung cancer detection with machine learning can be used for the early diagnosis of cancer. This can help
doctors and radiologists give better results to the patients who consult them.
• By enabling earlier intervention, these techniques can support treatment and could help save many lives.
Machine learning algorithms can be used for further improvements of this kind in health care and in other
sectors.
CONCLUSION
• In this project I used a dataset of CT scan slice images from Kaggle. I carried out exploratory data analysis,
then image pre-processing and segmentation, and then extracted features from the image files. I then built a
CNN as a Keras sequential model and plotted graphs of its accuracy and loss.
• I ran model.fit() for 40 epochs. The training accuracy averaged about 98%, while the validation accuracy was
about 50%. The model had already seen the training data, whereas the validation (test) data consisted of new
data points that it had never seen. In other words, the model is overfitted, which is expected because the dataset
was small. With a larger dataset, the validation accuracy would likely improve. Visualising the data is also one of
the best ways to make it easier to understand.
• The segmentation step alone can sometimes help doctors identify whether cancer is present in the CT scan
slices, which can help reduce the risk of death.
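The gap between training and validation accuracy described above can be measured directly from the training history. The sketch below assumes a Keras-style history dictionary with "accuracy" and "val_accuracy" lists (the form returned in History.history when the model is compiled with an accuracy metric). The numbers are made up to resemble the reported result, and the function name and the 0.1 threshold are arbitrary illustrations, not values from the project.

```python
def accuracy_gap(history, last_n=5):
    """Mean training accuracy minus mean validation accuracy over the last epochs."""
    train = history["accuracy"][-last_n:]
    val = history["val_accuracy"][-last_n:]
    return sum(train) / len(train) - sum(val) / len(val)

# Made-up curves shaped like the result above: training accuracy climbs
# towards 98% while validation accuracy stays near 50%.
history = {
    "accuracy": [0.60, 0.85, 0.95, 0.98, 0.98],
    "val_accuracy": [0.50, 0.52, 0.49, 0.50, 0.50],
}

gap = accuracy_gap(history, last_n=3)
is_overfitting = gap > 0.1   # arbitrary illustrative threshold

print(f"gap = {gap:.2f}, overfitting = {is_overfitting}")
```

A large positive gap like this one is the usual sign that the model has memorised the training set; more data, augmentation or regularisation are the typical remedies to try.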
REFERENCES
I. Dataset: https://www.kaggle.com/kmader/siim-medical-images/code
II. Géron, Aurélien. “The Machine Learning Landscape.” Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools,
and Techniques to Build Intelligent Systems, O'Reilly, 2019.
III. Monkam, P., et al. “CNN Models Discriminating between Pulmonary Micro-Nodules and Non-Nodules from CT Images.” Biomed Eng
Online, 26 July 2018.
IV. Wang, Xing, et al. “An Appraisal of Lung Nodules Automatic Classification Algorithms for CT Images.” Sensors (Basel), Jan. 2019,
doi:10.3390/s19010194.
V. Miah, B.A., and M.A. Yousuf. “Detection of Lung Cancer from CT Image Using Image Processing and Neural Network.” 2nd International
Conference on Electrical Engineering and Information and Communication Technology (ICEEICT), May 2015.
VI. Ypsilantis, Petros-Pavlos, and Giovanni Montana. “Recurrent Convolutional Networks for Pulmonary Nodule Detection in CT Imaging.”
(2016): 1-36. https://arxiv.org/pdf/1609.09143.pdf. Web. May 2017.
VII. Kuruvilla, Jinsa, and K. Gunavathi. “Lung Cancer Classification Using Neural Networks for CT Images.” Computer Methods and Programs
in Biomedicine 113.1 (2014): 202-09. Web.
VIII. Jafar, Iyad, Hao Ying, Anthony F. Shields, and Otto Muzik. “Computerized Detection of Lung Tumors in PET/CT Images.” 2006
International Conference of the IEEE Engineering in Medicine and Biology Society (2006): n. pag. Web.