A Project Report
on
TRANSFER LEARNING APPROACH FOR MALARIA
PARASITE DETECTION FROM MICROSCOPIC IMAGES
Submitted in the partial fulfillment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY
In
INFORMATION TECHNOLOGY
Submitted by
Ms. Anushka Reddy Enugu (16071A1203)
Mr. G. D. Rithvik Chowdary (16071A1219)
Ms. N. Tharanya (16071A1237)
Ms. Y. Nishitha (16071A1259)
DEPARTMENT OF INFORMATION TECHNOLOGY
VNR Vignana Jyothi Institute of Engineering & Technology
(Autonomous Institute, Accredited by NAAC with ‘A’ grade and NBA)
Bachupally, Nizampet (S.O.) Hyderabad- 500 090.
April 2020
A Project Report
on
TRANSFER LEARNING APPROACH FOR MALARIA
PARASITE DETECTION FROM MICROSCOPIC IMAGES
Submitted in the partial fulfillment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY
In
INFORMATION TECHNOLOGY
Submitted by
Ms. Anushka Reddy Enugu (16071A1203)
Mr. G. D. Rithvik Chowdary (16071A1219)
Ms. N. Tharanya (16071A1237)
Ms. Y. Nishitha (16071A1259)
Under the esteemed guidance of
PROJECT GUIDE
Dr. G. Madhu
Associate Professor,
Dept. of Information Technology,
VNRVJIET
DEPARTMENT OF INFORMATION TECHNOLOGY
VNR Vignana Jyothi Institute of Engineering & Technology
(Autonomous Institute, Accredited by NAAC with ‘A’ grade and NBA)
Bachupally, Nizampet (S.O.) Hyderabad- 500 090.
April 2020
VNR Vignana Jyothi Institute of Engineering & Technology
(Autonomous Institute, Accredited by NAAC with ‘A’ grade and NBA)
Bachupally, Nizampet (S.O.) Hyderabad- 500 090
DEPARTMENT OF INFORMATION TECHNOLOGY
Date: 23 May 2020
CERTIFICATE
This is to certify that the project work entitled “Transfer Learning Approach for Malaria
Parasite Detection from Microscopic Images”, submitted by Ms. Anushka Reddy
Enugu (16071A1203), Mr. G. D. Rithvik Chowdary (16071A1219), Ms. N. Tharanya
(16071A1237), and Ms. Y. Nishitha (16071A1259) in partial fulfillment for the award of the Degree of
Bachelor of Technology in Information Technology to the Jawaharlal Nehru Technological
University, Hyderabad during the academic year 2019-20, is a record of bonafide work carried out by
them under our guidance and supervision.
The results embodied in this report have not been submitted by the students to any other University
or Institution for the award of any degree or diploma.
Project Guide
Dr. G. Madhu
Associate Professor

Head of the Department
Dr. G. Suresh Reddy
External Examiner
VNR Vignana Jyothi Institute of Engineering & Technology
(Autonomous Institute, Accredited by NAAC with ‘A’ grade and NBA)
Bachupally, Nizampet (S.O.) Hyderabad- 500 090.
DEPARTMENT OF INFORMATION TECHNOLOGY
Date: 23 May 2020
DECLARATION
We hereby declare that the project entitled “Transfer Learning Approach for Malaria Parasite
Detection from Microscopic Images” submitted to VNR Vignana Jyothi Institute of Engineering
and Technology in partial fulfillment of the requirement for the award of Bachelor of Technology
in Information Technology is a bonafide report of the work carried out by us under the guidance
and supervision of Dr. G. Madhu, Associate Professor, Department of Information Technology,
Vallurupalli Nageswara Rao Vignana Jyothi Institute of Engineering and Technology.
To the best of our knowledge, this has not been submitted in any form to any university or institution
for the award of any degree or diploma.
Signature of the Student:
Anushka Reddy Enugu
(16071A1203)
G. D. Rithvik Chowdary
(16071A1219)
N. Tharanya
(16071A1237)
Y. Nishitha
(16071A1259)
Place:
Date:
ACKNOWLEDGMENT
We express our deep sense of gratitude to our beloved President, Dr. D. N. Rao, VNR
Vignana Jyothi Institute of Engineering & Technology, for the valuable guidance and for
permitting us to carry out this project.
With immense pleasure, we record our deep sense of gratitude to our beloved Principal, Dr.
C. D. Naidu, for permitting us to carry out this project.
We express our deep sense of gratitude to our beloved professor Dr. G. Suresh Reddy,
Associate Professor and Head, Department of Information Technology, VNR Vignana Jyothi
Institute of Engineering & Technology, Hyderabad-90, for the valuable guidance and suggestions,
keen interest, and encouragement extended throughout the project work.
We take immense pleasure in expressing our deep sense of gratitude to our beloved Guide, Dr.
G. Madhu, Associate Professor in Information Technology, VNR Vignana Jyothi Institute of
Engineering & Technology, Hyderabad, for his valuable suggestions and rare insights, and for being a
constant source of encouragement and inspiration throughout our project work.
We express our thanks to all those who contributed to the successful completion of our
project work.
1. ANUSHKA REDDY ENUGU
_____________________
2. G. D. RITHVIK CHOWDARY
_____________________
3. N. THARANYA
_____________________
4. Y. NISHITHA
_____________________
ABSTRACT
Malaria is a deadly disease transmitted by infected mosquitoes. The most affected regions across the
globe are the tropical regions, and millions of lives are affected by malaria every year. Malaria is
usually diagnosed by microscopic examination in the laboratory, in which the analyst must
distinguish between infected and uninfected Red Blood Cells (RBCs).
This process takes time, and the correctness of the results depends on
the experience and knowledge of the analyst in the laboratory. To overcome the challenges of
the traditional method, the diagnosis can be automated using current
technology. In this project, we use the concept of transfer learning to classify
microscopic images of blood cells as infected or uninfected. In this transfer learning
approach, we combined the Residual Neural Network (ResNet) and Support Vector Machine (SVM).
With this approach we obtained a validation accuracy of 90%, a test accuracy of 89%, and a training
accuracy of 99%. Automating the diagnosis in this way reduces its dependence on the analyst
and the scope for human error.
INDEX
Contents
ABSTRACT
CHAPTER 1: INTRODUCTION
1.1 PURPOSE OF THE PROJECT
1.2 EXISTING METHODOLOGIES AND THEIR DISADVANTAGES
1.3 PROPOSED SYSTEM
1.4 OBJECTIVE
1.5 THESIS ORGANIZATION
CHAPTER 2: LITERATURE SURVEY
CHAPTER 3: ISSUES AND CHALLENGES
CHAPTER 4: METHODOLOGY
4.1 INTRODUCTION
4.2 PROPOSED APPROACH AND ITS FLOW DIAGRAM
CHAPTER 5: IMPLEMENTATION
5.1 DATASET COLLECTION AND UNDERSTANDING
5.2 DATA PIPELINE TO THE MODEL
5.3 COMPOSE THE MODEL
5.4 TRAINING THE MODEL AND CLASSIFICATION
5.5 PERFORMANCE ANALYSIS
5.6 ENVIRONMENT
CHAPTER 6: RESULTS
CHAPTER 7: CONCLUSION
CHAPTER 8: FUTURE WORK
CHAPTER 9: REFERENCES
SOURCE CODE
CHAPTER 1
INTRODUCTION
Malaria is a hazardous disease caused by Plasmodium parasites. These parasites are transmitted
to people through the bites of infected female Anopheles mosquitoes. A mosquito becomes
infected when it takes a blood meal from a person who is already infected. When it takes its
next blood meal from another person, the parasites it carries are transmitted to that person. The
symptoms of malaria typically include fever, tiredness, vomiting, and headaches. The symptoms
usually appear 10-15 days after the mosquito bite.
Five species of malaria parasites infect humans: Plasmodium falciparum,
P. vivax, P. ovale, P. malariae, and P. knowlesi. Plasmodium falciparum is the most dangerous
of the five; it causes severe infections and can lead to death.
Because the parasites are present in the red blood cells of an infected person,
malaria can also be transmitted through blood transfusion,
organ transplantation, or the shared use of needles.
Fig 1.1 Plasmodium falciparum gametocytes
The long-established way of diagnosing malaria is to spread a droplet of the
patient’s blood evenly on a slide and inspect it visually under a microscope. The
specimen is stained before examination so that each parasite has a distinguishing
appearance. The examination is done by qualified and experienced technicians. This
microscopic examination remains the standard technique for confirming
malaria in the laboratory. The technique is easy to use and widely available, but its reliability
depends on the experience of the laboratory analyst and the condition of the
microscope. Not all reports obtained in this way are accurate, which may lead to incorrect
diagnostic results and inappropriate treatment.
To improve the accuracy of the diagnostic results, we propose a model using a transfer
learning approach based on the Residual Neural Network (ResNet) and Support Vector
Machine (SVM). With this approach we are able to classify microscopic images of
blood cells as infected or uninfected.
1.1 Purpose of the Project
Malaria is an infectious disease caused by the transmission of Plasmodium parasites.
It is widespread across the world, and the most affected regions are the tropical zones.
The parasites carried by the mosquito are present in the red blood cells of the person who has
been infected. These deadly parasites can live in the body for up to a year without showing any
symptoms, and a delay in treatment can lead to complications or even the death of the person.
Early detection of the disease saves lives. According to figures released by the WHO, the
incidence of the disease decreased from 2010 to 2014 and increased from 2015 to 2017. About
228 million malaria cases and approximately 405,000 deaths were reported worldwide in 2018,
slightly fewer than in 2017, when 231 million cases and 416,000 deaths were reported.
When the number of cases increases, diagnosis with the traditional laboratory method becomes
difficult, so the process of diagnosis needs to be automated.
Some automated methods have already been proposed by other researchers. We
develop an efficient model using a transfer learning approach, with performance
comparable to the existing ones, that makes the diagnosis easy, fast, and effective.
1.2 Existing methodologies and disadvantages
Microscopic analysis of blood smears is considered the gold standard
technique for detecting malaria. The correctness of the results depends on the capability and
experience of the analysts, so the method can be unreliable when the technicians lack
experience. Other available tests include rapid diagnostic tests,
Polymerase Chain Reaction (PCR), fluorescent microscopy, and some staining methods. There is a
chance of human error in traditional malaria detection. There are also automated methods
for malaria parasite detection using machine learning, Artificial Neural Networks (ANN),
clustering algorithms, and transfer learning. The maximum
accuracy reported for one of these methods is around 93%. Comparisons of
our model with a few other models are given in the results section.
1.3 Proposed System
In the proposed system we use a transfer learning approach to detect malarial parasites
with a Residual Neural Network (ResNet) and a Support Vector Machine (SVM). We
read the dataset using the OpenCV library in Python. No pre-processing of the images is performed;
instead, the images are passed directly to our ResNet and SVM model. We used ResNet50,
of which we used 49 layers. The output of the dense layers (dense-32
+ dense-1) is passed to the SVM. Because the SVM learns a clear decision boundary, it classifies
the images as infected or uninfected.
1.4 Objective
The main objective of this work is to propose an efficient computerized model for detecting
malarial parasites from microscopic images using a transfer learning approach. Other
researchers have already proposed a model using VGG-19 and SVM. We implement a transfer
learning approach using Residual Neural Network (ResNet) and Support Vector Machine
(SVM) with the aim of achieving higher accuracy than the existing models.
1.5 Thesis Organization
Chapter 1: This chapter introduces the project and presents the technical background
needed to implement it.
Chapter 2: This chapter deals with the literature survey and research needed to implement the
project.
Chapter 3: This chapter deals with the challenges and issues that occurred during the
implementation of the project.
Chapter 4: This chapter deals with the methodology design and approach.
Chapter 5: This chapter deals with the implementation of the methodology.
Chapter 6: This chapter deals with the experimental results of the model.
Chapter 7: This chapter deals with the conclusion.
CHAPTER 2
LITERATURE SURVEY
Recently, automated malaria detection has made great progress using current technological
advancements. Many research groups have proposed models using transfer learning, machine
learning, artificial neural networks, and other techniques. The papers we referred to
during the project are discussed below.
Vijayalakshmi A in 2019 proposed a novel transfer learning model to detect malarial parasites. In
this approach they unified the Visual Geometry Group network (VGG-19) and Support Vector Machine
(SVM). They used the strategy of training the top layers and freezing the remaining layers. Their
proposed model performed well and proved to be more efficient
than the existing CNN models.
Christine F. Mark Walter in 2016 proposed a model to detect malarial biomarkers. In this model
they made use of magnetic-based detection to identify the biomarkers. The magnetic
bead-based ELISA format they used shows good performance. They also suggested using
this model for detecting multiple biomarkers.
Purnima Pandit in 2016 proposed a model using Artificial Neural Networks to classify the malarial
parasites in Red Blood Cells. In their model they extract features from the data that is collected from
the images of the Red Blood Cells. These extracted features are given to the Artificial Neural
Network as input and it classifies the images as infected or not.
Meng-Hsim Tsai in 2015 proposed a model to identify malarial parasites in blood
smear images. This model indicates whether or not the patient is infected with malaria. It
makes use of Adaptive histogram, Threshold segmentation and K-means clustering. With the help of
the proposed model, detecting malaria would also become easier for doctors.
S. T. Khot in 2014 proposed a computerized model for detecting malaria parasites. They took
thin blood smear images as input and used different classification algorithms to separate the infected
and uninfected cells. The model made use of some preprocessing techniques and
algorithms such as KNN classification, Linear programming, Artificial Neural Networks and Template
matching to classify the images.
Daniel Maitethia Memeu in 2013 proposed a model to detect the parasites from images. This
model made use of a Square Median filter for image pre-processing, Histogram Thresholding
for image segmentation, and Histogram Segmentation and Artificial Neural Networks for classification
of the images. The model showed good performance in recognizing the parasites causing malaria.
Isha Suwalka in 2012 proposed a model to identify the kind of parasite from the color and
shape of the cell. This model makes use of Digital Image Processing and Curved shape
analysis to detect the parasites. It performs well because it takes less time and does not
require much manpower.
Yashasvi Purwar in 2011 proposed an unsupervised model to detect the parasites causing malaria.
This model used image preprocessing, image segmentation, and clustering algorithms to
classify the microscopic images of the malarial parasites. The method can be improved using
image-based screening methods. The performance of this model is better than that of the laboratory
methods.
Subhamoy Mandal in 2010 proposed a model for segmenting the blood cells and classifying
the images as infected or uninfected. The steps involved in the model are data
gathering, image preprocessing, and segmentation of the images using color space conversion
and the normalized cut algorithm. After segmentation, the cell images are
classified as infected or uninfected.
TRANSFER LEARNING
Transfer learning is a machine learning technique in which a model developed for one task is reused
for another. The knowledge gained while solving one problem can be applied again when solving a
different but related problem. For example, if we develop a model to classify images that
contain a computer, we can use the knowledge of that model to recognize other items. We need not
solve the new problem from scratch, because we can reuse the patterns
learned while solving the earlier problem.
Transfer learning is popular in deep learning because it allows neural networks to be trained
from pre-trained models with a comparatively small amount of data. This is
very useful in real-world situations, because most problems do not have enough data to
train such models from scratch. We take the knowledge learned by one model and make some
modifications when using it in another model to improve that model's performance. Transfer
learning is used in many applications, such as face recognition and malarial parasite
detection. In 1993, Lorien Pratt published a paper on transfer
in machine learning, formulating the discriminability-based transfer (DBT) algorithm.
Because transfer learning reuses the knowledge gained by earlier models,
it is important to know what knowledge is being transferred and how it is transferred. We
use the knowledge gained from previous problems to improve performance on the problem we
are solving.
There are different transfer learning techniques. The choice of technique depends on
the data we are training on. Some of the transfer learning techniques are listed below.
• Inductive transfer learning
• Transductive transfer learning
• Unsupervised transfer learning
There are generally two approaches to implementing transfer learning:
1. Develop model approach
2. Pre-trained model approach
A transfer learning model has a layered structure. The early layers are pre-trained, and their
output is fed to the last layer, which produces the final output. These layers extract features
from the data. We can replace the last layer with any classification algorithm that is
suitable for our model. We can also freeze some layers of the network, that is, keep their weights
fixed while we reuse the network for another model, and train only the new layers on the
supervised objective of the new problem.
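The idea of freezing pre-trained layers and training only a new last layer can be sketched in a few lines. This is a minimal illustration in plain Python, not the code used in this project: the function `frozen_features`, its fixed weights, and the toy data are all hypothetical, and a logistic-regression layer stands in for the new trainable layer.

```python
import math

def frozen_features(x):
    """Stand-in for pre-trained layers: the weights below are fixed and never updated."""
    w = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]  # hypothetical pre-trained weights
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(samples, labels, epochs=200, lr=0.5):
    """Train only the new last layer (logistic regression) on top of the frozen features."""
    weights, bias = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            f = frozen_features(x)
            p = sigmoid(sum(w * fi for w, fi in zip(weights, f)) + bias)
            err = p - y  # gradient of the log-loss with respect to the pre-activation
            weights = [w - lr * err * fi for w, fi in zip(weights, f)]
            bias -= lr * err
    return weights, bias

def predict(x, weights, bias):
    f = frozen_features(x)
    return 1 if sigmoid(sum(w * fi for w, fi in zip(weights, f)) + bias) >= 0.5 else 0
```

Only `weights` and `bias` change during training; the feature extractor is reused as it is, which is why far less data and time are needed than when training every layer.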
The basic principle of transfer learning is the reuse of previously trained models. Several
pre-trained models are available for developing applications. Some of them are
listed below:
• VGG-19
• VGG-16
• ResNet-50
Transfer Learning with Image Data
In general, transfer learning is applied to problems that take images as input and use
pre-trained models. For example, a face recognition model takes as input the images captured by
a webcam or the images that are already on your device.
For such problems it is common to use a model that has been pre-trained on a large dataset for
image classification. Training such a model from scratch may take days or weeks, but the
trained models can be downloaded and reused with your own dataset.
Three examples of models of this type are:
• Oxford VGG Model
• Google Inception Model
• Microsoft ResNet Model
Why we use transfer learning
The main advantages of transfer learning are as follows:
1. It saves training time, because we use a pre-trained model.
2. It gives better performance of neural networks.
3. It does not need a lot of data.
In general, a lot of data is required to train a network from scratch, and such data is not
always available. In such cases transfer learning can be used, because it makes use of
existing pre-trained models.
We make use of transfer learning in problems such as image classification and face recognition.
Training a network from scratch takes a lot of time, so using existing pre-trained models
saves the time required to train the network.
Image Processing
Image classification is the task of recognizing objects or patterns in an image and assigning the
image a category from a pool of given categories/classes. There are many related sub-fields,
such as recognizing multiple objects in a single image or localizing the objects within the image.
The image processing of malaria parasites consists of different stages:
• Image acquisition
• Pre-processing
• Erythrocyte segmentation
• Feature extraction
• Image classification
Image acquisition
In this stage the images are captured by a camera
attached to the microscope.
Pre-processing
Pre-processing removes unwanted noise from the images, which
gives better visualization.
Segmentation
In this stage the red blood cells are isolated by removing all other details, such as white blood
cells and platelets, from the pre-processed image.
Feature extraction
Here we extract features such as colour, geometry, and texture from the parasites. These features
can be used to classify the images using algorithms.
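The pre-processing, segmentation, and feature extraction stages above can be illustrated on a tiny grayscale image. This is a minimal sketch under stated assumptions: the image is a list of rows of intensities from 0 to 255, the 3x3 median filter and the fixed threshold are generic choices (not taken from any specific paper in the survey), and all function names are hypothetical. Our own model skips these stages and passes the images directly to the network.

```python
from statistics import median

def median_filter(img):
    """3x3 median filter on a grayscale image (list of rows); border pixels are left unchanged."""
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            window = [img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = median(window)
    return out

def threshold(img, level):
    """Binary mask: 1 where the pixel is darker than `level` (stained region), else 0."""
    return [[1 if p < level else 0 for p in row] for row in img]

def area_fraction(mask):
    """A simple geometric feature: the fraction of pixels that belong to the mask."""
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total

# A bright 5x5 image with one isolated dark noise pixel at row 1, column 1.
image = [[200] * 5 for _ in range(5)]
image[1][1] = 0
denoised = median_filter(image)
```

On this image the median filter removes the isolated dark pixel, so thresholding the denoised image yields an empty mask, whereas thresholding the raw image would wrongly mark one pixel as stained.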
Applications of transfer learning
1. Real time simulation
2. Gaming
3. Image processing
4. Zero shot translation
5. Sentiment classification
Neural Network
There are multiple kinds of neural networks, such as Feed Forward (FF) Neural Networks, Back
Propagation, Recurrent Neural Network, Markov chain, and Deep Convolutional Network. In this project
we focus on performance and efficiency with a low computational load. We use a two-layer Feed
Forward Network, which reduces complexity when learning the features in a given image.
We have used Feed Forward (FF) Neural Networks for the classification of the input images. A Feed
Forward Neural Network can be viewed as a simple arrangement of nodes in which data flows in one
direction through some intermediate layers. It is the simplest form of connection between
neurons, or nodes. A neural network is a set of algorithms arranged in
layers that are computed one after another to find relationships in a
large set of data, loosely modelled on the learning capabilities of the human brain.
We use a neural network to classify an image.
A neural network consists of layers of interconnected nodes. It has two important components:
the neuron and the network. A neuron is a node that stores data. The
network is the mathematical relationship between the nodes, which is implemented using functions.
The network also contains different layers: the Input Data Layer, multiple
intermediate Hidden Data Layers, and the Output Data Layer. The Input Data Layer holds the input
image, which is represented as an array of RGB values ranging from 0 to 255. Each Hidden Data
Layer is the result of passing the previous layer through a mathematical function.
The Output Layer gives some information about the input. This
information could be an object, a pattern, or a recognized image, depending on the function used in
the network. The reason for using a feed-forward network is to avoid most of the intermediate steps
and loops that increase the load on the CPU and other resources and require a large amount of time
for mathematical computation.
Fig 2.1: Neural Network
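The forward pass of a two-layer feed-forward network, as described above, can be written out directly. This is a minimal sketch in plain Python; the weights, the choice of ReLU and sigmoid activations, and the three-value input are assumptions made only for illustration.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: each output node is activation(w . inputs + b)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(pixels, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden layer -> output layer, with data flowing in one direction only."""
    x = [p / 255.0 for p in pixels]          # scale RGB values from 0..255 to 0..1
    hidden = dense(x, hidden_w, hidden_b, relu)
    return dense(hidden, out_w, out_b, sigmoid)[0]

# Hypothetical weights for a network with 3 inputs, 2 hidden nodes and 1 output node.
score = forward([255, 0, 128],
                hidden_w=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], hidden_b=[0.0, 0.0],
                out_w=[[1.0, -1.0]], out_b=[0.0])
```

The output is a single value between 0 and 1 that can be read as a class score; there are no loops back to earlier layers, which is what distinguishes a feed-forward network from a recurrent one.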
ResNet
ResNet is short for residual network. It was introduced in 2015 and is regarded as leading-edge
work in deep learning for computer vision. With ResNet it is possible to train networks with
hundreds or even thousands of layers and still achieve good performance. Because of its strong
representational ability, ResNet has improved many computer vision applications other than
image classification, such as face recognition and
object detection.
In principle, a single layer is enough to represent any function in a feed-forward network.
However, such a layer may be massive and may lead to overfitting of the data. In such cases the
neural network must go deeper. When a plain network goes deeper, however, its performance can
degrade because of the vanishing gradient problem. ResNet addresses this issue by
introducing the identity shortcut connection, that is, a connection that
bypasses one or more layers in the network.
Fig 2.2: Identity shortcut connection in ResNet
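The identity shortcut connection can be sketched as follows. The block computes y = relu(F(x) + x), where F is the stack of layers that the shortcut bypasses. This is a minimal illustration with plain Python lists; the two small fully connected layers that form F here are an assumption for brevity, whereas the blocks in ResNet-50 use convolutional layers.

```python
def relu_vec(v):
    return [max(0.0, a) for a in v]

def linear(v, weights):
    """Matrix-vector product: one output per row of `weights`."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]

def residual_block(x, w1, w2):
    """y = relu(F(x) + x), with F(x) = W2 . relu(W1 . x); x is added back unchanged."""
    fx = linear(relu_vec(linear(x, w1)), w2)
    return relu_vec([f + xi for f, xi in zip(fx, x)])

zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

passthrough = residual_block([1.0, 2.0], zeros, zeros)       # F(x) = 0, so y = x
doubled = residual_block([1.0, 2.0], identity, identity)     # F(x) = x, so y = 2x
```

When the bypassed layers contribute nothing (all weights zero), the input still passes through unchanged. This is why adding more residual blocks does not have to make the network worse: each block only needs to learn a correction to the identity.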
In this project we have used ResNet-50. ResNet-50 is a convolutional neural network with a
depth of 50 layers. A version of the network pre-trained on more than a million
images can be loaded. The pre-trained network can classify images into different classes. As a
result, the network has learned rich feature representations for a large number of images.
Support Vector Machine
Support Vector Machine is usually abbreviated as SVM. It is a supervised machine learning
algorithm that is mostly used for classification. An SVM finds a hyperplane that acts as a
decision boundary between the classes and classifies the data according to it.
Fig 2.3: Hyperplane of SVM
The data points are plotted in n-dimensional space. The data points closest to the hyperplane
are called support vectors. The data on one side of the hyperplane (the right side in Fig 2.3)
is classified as positive, and the data on the other side is classified as negative.
With the help of the Support Vector Machine (SVM) we classify our microscopic images as
infected or uninfected.
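A linear SVM can be sketched with a few lines of plain Python. This is a minimal illustration, assuming a linear kernel and simple subgradient descent on the regularised hinge loss; this report does not state the kernel or solver used, and in practice a library implementation would normally be preferred. The function names and toy points are hypothetical.

```python
def train_linear_svm(points, labels, epochs=100, lr=0.1, reg=0.01):
    """Subgradient descent on the regularised hinge loss; labels must be +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Shrink w at every step: the regularisation term favours a wide margin.
            w = [wi * (1 - lr * reg) for wi in w]
            if margin < 1:  # the point lies inside the margin or on the wrong side
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def classify(x, w, b):
    """Return +1 for points on the positive side of the hyperplane w . x + b = 0, else -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Toy data: two well-separated groups of 2-D points.
points = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
labels = [1, 1, -1, -1]
w, b = train_linear_svm(points, labels)
```

After training, `classify` assigns a new point to a class according to the side of the hyperplane on which it falls. Only the points closest to the boundary (the support vectors) trigger weight updates once the classes are separated.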
CHAPTER 3
ISSUES AND CHALLENGES
Over the last decade, many research groups and commercial product developers
have proposed different models to detect malarial parasites. Every such model is likely to have
faced certain challenges during development, and our transfer learning approach to detecting
malarial parasites was no exception.
The major issue faced during the development of our model was finding the right hyperparameters.
A hyperparameter is a parameter whose value is set before the learning process begins.
Hyperparameters vary from one training model to another. The time required to train and test a
model depends on the hyperparameters we choose.
Hyperparameters usually cannot be learned using well-known gradient methods. Because they
describe the model representation, they cannot be learned through normal optimization techniques.
Some cannot be learned from the training data because they increase the capacity of the model,
which may lead to overfitting and to picking up noise in the data.
Performance variation can often be attributed to only a few hyperparameters. We tune the algorithm
and its hyperparameters to find out how much performance the model can achieve. Methods that are
not robust to small changes in the hyperparameters or to different implementations of the
algorithm should be avoided, as they reduce the robustness of the model.
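One simple way to search for hyperparameters is a grid search: every combination of candidate values is tried and the one with the best validation score is kept. The sketch below is a generic illustration, not the tuning procedure used in this project. The candidate values and the function `toy_validation_score` are hypothetical stand-ins for training the model and measuring its validation accuracy.

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Try every combination of values in `param_grid`; return the best one and its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_validation_score(params):
    """Hypothetical stand-in for 'train the model, return the validation accuracy'."""
    return (1.0
            - abs(params["learning_rate"] - 0.01) * 10
            - abs(params["batch_size"] - 32) / 1000)

grid = {"learning_rate": [0.1, 0.01, 0.001], "batch_size": [16, 32, 64]}
best_params, best_score = grid_search(grid, toy_validation_score)
```

The cost grows with the product of the number of candidates per hyperparameter (nine runs here), which is one reason why the choice of hyperparameters strongly affects the time needed to train and test a model.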
Another major issue faced during the development of the model was choosing the depth of the
Residual Neural Network (ResNet). As the depth of a neural network increases, there is a chance
that the performance of the model will degrade. ResNet follows the residual approach, which
means that it uses shortcut connections that skip one or more layers of the network. It is
difficult to know the exact depth required for the model to generate optimal results.
CHAPTER 4
METHODOLOGY
4.1 Introduction
In this section we describe the strategy adopted to extract features from the images. We
propose a transfer learning model that combines the Residual Neural Network (ResNet)
and Support Vector Machine (SVM). We used ResNet50 to extract the features. We took 49
layers of ResNet, added our own layers to them, and sent the output of the
output layer to the SVM. Because the SVM is a classifier, it classifies the input images into
different classes based on this input.
4.2 Proposed Approach and its Flow Diagram
There are different stages in the proposed approach:
1. Reading the dataset
2. Building a model where we made use of ResNet and SVM
2.1 Passing the images to 49 layers of ResNet
2.2 Passing the output of the 49 layers to the Dense-32 layer
2.3 Passing the output of the Dense-32 layer to the Dense-1 layer
2.4 Passing this output as input to the SVM for classification
In the proposed system we use a transfer learning approach to detect malarial parasites
with a Residual Neural Network (ResNet) and a Support Vector Machine (SVM). We
read the dataset using the OpenCV library in Python. We do not perform any image
pre-processing; instead, the images are passed directly to our ResNet and SVM model.
We used ResNet-50, of which we use 49 layers, where all the feature extraction
of the images takes place. The output of these layers is passed to the dense layers
(dense-32 + dense-1), and their output is passed to the SVM. Because the SVM learns a clear
decision boundary, it classifies the images as infected or uninfected based on this input.
The flow diagram of the methodology is as shown below:
Fig 4.1: Methodology Flowchart
CHAPTER 5
IMPLEMENTATION
Malarial Parasites Detection
Automated methods for detecting malarial parasites, developed using current technologies,
provide accurate results because they reduce the chance of human error.
We developed one such model using a transfer learning approach.
Malarial parasite detection involves the following steps:
1. Dataset collection and understanding
2. Data pipeline to the model
3. Composing the model
4. Training the model and classification
5.1 Dataset collection and Understanding
The microscopic images used for our model come from a previously existing dataset. The images
are compressed in a zip file, so we need to extract the file and understand which of the images
are infected and which are not. The dataset contains images of red blood cells infected with
malarial parasites as well as images of uninfected cells.
Fig 5.1: Microscopic Images of red blood cells.
5.2 Data Pipeline to the model
A data pipeline is, in general, the transfer of data from one system to another, where the output
of one system is the input of the next. We transfer the data in batches from one system to the other.
As we are working with unstructured data, we cannot load the whole dataset into memory at once.
We need to organize this unstructured data into a structured form, which makes training the model
on the images easier.
We classify the input dataset into 2 classes.
CELL_IMAGES
    UNINFECTED
    PARASITIZED
Fig 5.2: Classification of images
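The folder layout above lends itself to a generator that labels each file by its class folder and yields the data in batches, so the whole dataset never has to sit in memory at once. The following is a minimal sketch rather than the code used in the project; the helper names `list_labeled_files` and `batches`, the demo file names, and the batch size are assumptions.

```python
import os
import tempfile

CATEGORIES = ["Uninfected", "Parasitized"]  # label = index in this list

def list_labeled_files(data_dir, categories=CATEGORIES):
    """Return (file_path, label) pairs, one per file in each class folder."""
    pairs = []
    for label, category in enumerate(categories):
        folder = os.path.join(data_dir, category)
        for name in sorted(os.listdir(folder)):
            pairs.append((os.path.join(folder, name), label))
    return pairs

def batches(items, batch_size):
    """Yield successive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Demo on a throwaway directory holding 3 empty placeholder files per class.
with tempfile.TemporaryDirectory() as demo_dir:
    for category in CATEGORIES:
        os.makedirs(os.path.join(demo_dir, category))
        for i in range(3):
            open(os.path.join(demo_dir, category, f"cell_{i}.png"), "w").close()
    pairs = list_labeled_files(demo_dir)
    batch_sizes = [len(b) for b in batches(pairs, batch_size=4)]

print(len(pairs), batch_sizes)
```

In a real pipeline each batch of paths would be read and resized (for example with OpenCV) just before it is handed to the model.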
5.3 Proposed Transfer Learning Model
The model we propose for detecting malarial parasites follows the transfer learning approach. In
this approach we combine a Residual Neural Network and a Support Vector Machine to classify the
microscopic images. We use ResNet-50 in our model, followed by Dense-32 and Dense-1 layers.
The output obtained from this network is provided as input to the SVM for classification.
Fig 5.3: Proposed Transfer Learning Workflow Model with ResNet-50+SVM.
5.4 Training the model and classification
We do not use the whole dataset to train the model; we split the dataset into training,
validation, and testing sets in the ratio 80:10:10. We read the dataset with the OpenCV library.
In our proposed system we use ResNet and SVM. We take only 49 layers of ResNet as a
pre-trained model and pass images of size (64, 64, 3) through these layers. The extraction of
features from the images is performed in these layers.
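The 80:10:10 split described above can be sketched as follows. This is an illustrative helper, not the project's code (the source listing at the end of the report calls scikit-learn's `train_test_split` with `test_size=0.2`); the function name `split_80_10_10` and the seed value are assumptions.

```python
import random

def split_80_10_10(items, seed=42):
    """Shuffle a copy of items and split it into train/validation/test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    n_train = int(0.8 * len(shuffled))
    n_val = int(0.1 * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]       # remainder goes to the test set
    return train, val, test

train, val, test = split_80_10_10(range(1000))
print(len(train), len(val), len(test))
```

Shuffling before slicing matters here because the images are read class by class, so an unshuffled split would put only one class in the test set.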
The extracted features are then flattened into a one-dimensional array. These features are passed
to the Dense-32 layer, which produces an output vector of length 32. We chose Dense-32 because,
in our experiments, it turned out to produce the best features. This 32-length vector is passed as
input to the Dense-1 layer, which is the output layer and outputs a single value. This value is then
passed to the SVM (Support Vector Machine), a classifier that classifies the images as infected or
uninfected.
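To make the last step concrete, the sketch below trains a linear SVM on single-value scores such as those produced by the Dense-1 layer. It is a simplified stand-in, not the `sklearn.svm.SVC(kernel='linear')` classifier used in the source listing: it minimises the hinge loss by subgradient descent, and the function names, learning rate, regularisation strength, and toy scores are all assumptions. Labels are encoded as -1 (uninfected) and +1 (infected).

```python
def train_linear_svm(xs, ys, lr=0.1, reg=0.01, epochs=200):
    """Fit w, b of a 1-D linear SVM by subgradient descent on the hinge loss.

    xs: one score per image; ys: labels in {-1, +1}.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            if y * (w * x + b) < 1:           # inside the margin or misclassified
                w += lr * (y * x - reg * w)
                b += lr * y
            else:                             # outside the margin: only shrink w
                w -= lr * reg * w
    return w, b

def predict(xs, w, b):
    """Return +1 or -1 depending on the side of the decision boundary."""
    return [1 if w * x + b >= 0 else -1 for x in xs]

# Hypothetical Dense-1 outputs: low scores for uninfected, high for infected.
scores = [0.05, 0.10, 0.20, 0.80, 0.90, 0.95]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(scores, labels)
predictions = predict(scores, w, b)
print(predictions)
```

Because the input is a single number, the learned decision boundary is simply a threshold on the score, located at -b / w.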
Fig 5.4: Training the input dataset
5.5 Performance Analysis
The performance of the proposed model is assessed using the following measures.
Accuracy is obtained using the formula
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP is the number of true positives, TN is the number of true negatives, FN is the number of
false negatives, and FP is the number of false positives.
Recall is obtained using the formula
$$\text{Recall} = \frac{TP}{TP + FN}$$
Precision is calculated using the formula
$$\text{Precision} = \frac{TP}{TP + FP}$$
F-Measure is calculated using the formula
$$\text{F-Measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
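The four measures above can be computed directly from the counts in a confusion matrix. The sketch below is illustrative only: the function name `classification_metrics` and the example counts are hypothetical and are not results from this project.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, recall, precision and F-measure from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
    }

# Hypothetical counts, chosen only to exercise the formulas.
m = classification_metrics(tp=90, tn=85, fp=15, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```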
5.6 Environment
Python is an interpreted, high-level programming language. Its statements resemble simple
English, which makes code understandable without much prior knowledge. Python's design
emphasizes code readability through its use of significant whitespace: blocks are delimited by
indentation, which makes the code easier to analyse and debug. It supports an object-oriented
approach that gives users a clear view of the logical and functional parts of a project, and it is
suitable for large-scale as well as small-scale projects.
Interface: Jupyter Notebook, Anaconda Prompt
The Jupyter Notebook is a free, open-source web application in which we can create and share
documents that contain code, text, and equations. Project Jupyter develops the Jupyter user
interface and supports multiple programming languages.
Jupyter Notebook is a spin-off of a project called IPython. It mainly supports the Julia, Python,
and R programming languages, hence the name Jupyter. Jupyter ships with the IPython kernel,
which enables us to write code in Python. Around 100 other kernels, for languages such as R, are
available in Jupyter and can be used to run programs.
The notebook extends the console-based approach to interactive computing and helps to compute or
visualize data when building an application that requires data analysis. Jupyter also supports
documenting and sharing information, which is useful for maintaining records in a large-scale
project. Like other platforms, it supports the development, execution, and computation of code in
one environment. The Jupyter Notebook combines two main components:
• Web application: A web-based tool for the interactive production of documents that
incorporate explanatory text, mathematics, computations, and their rich media output.
• Notebook documents: A representation of all content visible in the web application, including
computational inputs and outputs, explanatory text, mathematics, images, and rich media
representations of objects.
Python Packages
OpenCV:
OpenCV is an open-source, cross-platform, optimized computer vision library. It is written in the
C++ programming language. It was originally developed by Intel and was later supported by Willow
Garage. It also provides Python and Java interfaces. It offers interfaces for a wide range of image
processing techniques. OpenCV has many applications, such as human-computer interaction,
robotics, segmentation and recognition, motion tracking, and gesture and facial recognition.
NumPy:
NumPy, or Numerical Python, is a library used to perform operations on arrays. It provides an n-dimensional array object that is useful for applying algebra and various other mathematical formulas
to an array as a whole. This object is also referred to as a NumPy array. It stores data in a way
that allows mathematical calculations to be performed efficiently. A two-dimensional NumPy array
stores data in tabular form, i.e. in rows and columns. Before using NumPy, first check whether it is
installed. If it is not, install the package using the steps below.
To install NumPy, open the command prompt, type the command below, and press Enter.
“pip install numpy”
A NumPy array has several advantages over a Python list:
1. Less memory
2. Fast
3. Convenient
We can use different attributes and operations such as ndim, itemsize, dtype, reshape, slicing,
linspace, max/min, square root, and standard deviation.
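The attributes and operations listed above can be tried on a small array. This is a minimal sketch; the array values are arbitrary and chosen only so that the square roots come out as whole numbers.

```python
import numpy as np

a = np.array([[1.0, 4.0, 9.0],
              [16.0, 25.0, 36.0]])

print(a.ndim)                 # number of dimensions
print(a.dtype, a.itemsize)    # element type and size of one element in bytes
print(a.reshape(3, 2))        # same data viewed as 3 rows and 2 columns
print(a[:, 1])                # slicing: the second column
print(np.sqrt(a))             # element-wise square root
print(a.max(), a.min())       # largest and smallest element
print(a.std())                # standard deviation over all elements
print(np.linspace(0, 1, 5))   # 5 evenly spaced values from 0 to 1
```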
OS Package:
The Python os package provides functionality for interacting with the operating system. The os
module is one of Python's standard utility modules. It offers a portable way of using
operating-system-dependent functionality.
The modules os and os.path have several functions for communicating with the filesystem. Some
members of os are:
• os.name: This attribute gives the name of the operating-system-dependent module that was imported (for example 'posix' or 'nt').
• os.getcwd(): Returns the current working directory (CWD) of the process executing the code.
It can vary from one system to another.
• os.rename(): Using this function we can rename a file.
• os.popen(): This function opens a pipe to or from a command. It has two modes, ‘r’ for reading
and ‘w’ for writing.
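A short sketch of the os members listed above follows. The file names are hypothetical, and the demonstration runs inside a temporary directory so that nothing on the real filesystem is renamed. `os.popen()` is left out because its result depends on the commands available on the host system.

```python
import os
import tempfile

print(os.name)       # name of the OS-dependent module, e.g. 'posix' or 'nt'
print(os.getcwd())   # current working directory of this process

with tempfile.TemporaryDirectory() as tmp:
    old_path = os.path.join(tmp, "old_name.txt")
    new_path = os.path.join(tmp, "new_name.txt")
    with open(old_path, "w") as f:
        f.write("sample")
    os.rename(old_path, new_path)   # rename the file
    renamed_ok = os.path.exists(new_path) and not os.path.exists(old_path)
    listing = os.listdir(tmp)

print(renamed_ok, listing)
```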
Keras:
Keras is a high-level neural networks API written in the Python programming language. It can run
on top of TensorFlow, CNTK, or Theano. Keras is a deep learning library that allows easy and fast
prototyping.
Keras does not perform low-level computation itself; it delegates it to another library called the
backend. Keras therefore acts as a high-level API wrapper around the low-level API.
It handles the way we build our models and define their layers. It compiles the model with a few
parameters, namely the loss function and the optimizer, and runs the training process.
TQDM:
TQDM is a library that provides progress bars, with good support for nested loops and Jupyter
notebooks. It supports nested progress bars, so if we have Keras fit and predict loops within an
outer TQDM loop, the nested loops are displayed properly.
CHAPTER 6
RESULTS
This section presents the outputs obtained from the implementation of the proposed system.
Fig 6.1: Accuracy and Loss Curves of the model proposed
We also obtained the confusion matrix, the classification report, and the accuracy of the proposed
model.
Performance evaluation of existing models
Methods      Sensitivity   Precision   F-score   Accuracy
VGG16-SVM    89.90%        84.47%      87.05%    89.21%
VGG19-SVM    93.44%        89.95%      91.66%    93.13%
Accuracy table of the proposed model:
Accuracy     ResNet50-SVM
Test         89.89%
Validation   90%
Train        99%
Fig 6.2: Confusion matrix and Classification Report
CHAPTER 7
CONCLUSION
In this project, we implemented a transfer learning approach, based on a pre-trained model, to
detect malaria parasites in thick blood smears. The combined ResNet and SVM model can detect the
malaria parasite in the cells. ResNet, a residual network, extracts the features from the images,
and these features are then sent to the SVM (Support Vector Machine), which acts as a classifier
and classifies the images into two classes, infected and uninfected. This model obtained a training
accuracy of 99% and a validation accuracy of 90%. Hence, by using this model we can determine
with good accuracy whether a sample shows malaria infection or not. This model can produce more
accurate results than a person diagnosing the cells in the laboratory. It can help in decision
making in many forensic laboratories.
CHAPTER 8
FUTURE WORKS
Our algorithms can be enhanced by applying other deep learning techniques so that better accuracy
and loss can be obtained. Unification of ResNet and SVM provides better performance than VGG19 and CNN models.
The proposed model only classifies cells as infected or uninfected for one kind of malarial parasite.
As there are five different parasites that cause malaria, in the future our project can be extended
to classify all five.
We may improve the performance of the proposed model by using other models such as Random
Forest classifiers or the AlexNet neural network.
The proposed system can be used in the diagnosis of malarial parasites, making it easy, fast,
and effective.
SOURCE CODE:
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
import os
import cv2
import keras
from keras.layers import *
from keras.models import *
from keras.optimizers import *
from keras.preprocessing import image
from keras.utils.vis_utils import plot_model
import warnings
warnings.filterwarnings("ignore")
#DATA CREATION
DATADIR = r"C:\cell_images"
CATEGORIES = ["Uninfected","Parasitized"]
IMG_SIZE = 64
os.chdir(DATADIR)
training_data = []
training_data1 = []
def create_training_data():
    """Read every image, resize it, and return the arrays, labels and image shape."""
    for category in CATEGORIES:
        path = os.path.join(DATADIR, category)
        class_num = CATEGORIES.index(category)  # 0 = Uninfected, 1 = Parasitized
        for img in tqdm(os.listdir(path)):
            try:
                img_array = cv2.imread(os.path.join(path, img))
                new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
                training_data.append([new_array, class_num])
                training_data1.append(new_array)
            except Exception as e:
                # Skip files that OpenCV cannot read as images
                pass
    X = []
    y = []
    for features, label in training_data:
        X.append(features)
        y.append(label)
    X = np.array(X).reshape(-1, IMG_SIZE, IMG_SIZE, 3)
    img_rows = IMG_SIZE
    img_cols = IMG_SIZE
    channels = 3
    img_shape = [img_rows, img_cols, channels]
    return X, y, img_shape
X,y, shape = create_training_data()
print('Dataset shape: {0}, Image shape: {1}'.format(X.shape, shape))
print(len(training_data1))
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# using a pretrained model
from keras.applications import VGG19,ResNet50
#model = VGG19(include_top = False, input_shape = shape)
resnet = ResNet50(include_top = False, weights='imagenet',input_shape = shape)
for layer in resnet.layers:
    layer.trainable = False
#output = resnet.layers[-1].output
#output = keras.layers.Flatten()(output)
#resnet = Model(resnet.input, output=output)
resnet.summary()
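# NOTE: auto-generated layer names such as 'activation_71' depend on the Keras version and on
# how many layers were created earlier in the session; check resnet.summary() for the right name.
resnet = Model(inputs=resnet.input, outputs=resnet.get_layer('activation_71').output)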
from keras.models import Sequential
model = Sequential()
model.add(resnet)
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dense(1, kernel_regularizer='l2', activation='sigmoid'))
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
print(model.summary())
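# Train on the training split and validate on the held-out split, so that
# history.history contains the 'val_accuracy' and 'val_loss' keys plotted below.
history = model.fit(x_train, np.array(y_train), batch_size=32, validation_data=(x_test, np.array(y_test)), epochs=50)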
model.save('model_50_res.h5')
# The sigmoid output is a probability; rounding it gives the predicted class (0 or 1)
y_pred = np.round(np.squeeze(model.predict(x_test))).astype(int)
y_true = np.array(y_test)
num_epoch=50
print(history.history.keys())
# summarize history for accuracy
plt.plot(history.history['accuracy'],color='r')
plt.plot(history.history['val_accuracy'],color='g')
plt.title('model accuracy for {0} epochs'.format(num_epoch))
plt.ylabel('accuracy')
plt.xlabel('epochs')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'],color='r')
plt.plot(history.history['val_loss'],color='g')
plt.title('model loss for {0} epochs'.format(num_epoch))
plt.ylabel('loss')
plt.xlabel('epochs')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
from keras.models import load_model
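# NOTE: 'dense_3' is an auto-generated layer name; check model.summary() for the name in your session.
model1 = Model(inputs=model.input, outputs=model.get_layer('dense_3').output)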
X = model1.predict(X)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
from sklearn import svm
#Create a svm Classifier
clf = svm.SVC(kernel='linear') # Linear Kernel
#Train the model using the training sets
clf.fit(X_train, y_train)
#Predict the response for test dataset
y_pred = clf.predict(X_test)
from sklearn.metrics import accuracy_score,confusion_matrix, classification_report
cm=confusion_matrix(y_test,y_pred)
cr=classification_report(y_test, y_pred)
ac=accuracy_score(y_test, y_pred)
print('confusion matrix:', cm)
print('classification report', cr)
print('accuracy score', ac)