A Project Report on
TRANSFER LEARNING APPROACH FOR MALARIA PARASITE DETECTION FROM MICROSCOPIC IMAGES
Submitted in partial fulfillment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY in INFORMATION TECHNOLOGY

Submitted by
Ms. Anushka Reddy Enugu (16071A1203)
Mr. G. D. Rithvik Chowdary (16071A1219)
Ms. N. Tharanya (16071A1237)
Ms. Y. Nishitha (16071A1259)

Under the esteemed guidance of
PROJECT GUIDE Dr. G. Madhu, Associate Professor, Dept. of Information Technology, VNRVJIET

DEPARTMENT OF INFORMATION TECHNOLOGY
VNR Vignana Jyothi Institute of Engineering & Technology
(Autonomous Institute, Accredited by NAAC with ‘A’ grade and NBA)
Bachupally, Nizampet (S.O.), Hyderabad - 500 090
April 2020

VNR Vignana Jyothi Institute of Engineering & Technology
(Autonomous Institute, Accredited by NAAC with ‘A’ grade and NBA)
Bachupally, Nizampet (S.O.), Hyderabad - 500 090
DEPARTMENT OF INFORMATION TECHNOLOGY
Date: 23 May 2020

CERTIFICATE
This is to certify that the project work entitled “Transfer Learning Approach for Malaria Parasite Detection from Microscopic Images” is being submitted by Ms. Anushka Reddy Enugu (16071A1203), Mr. G. D. Rithvik Chowdary (16071A1219), Ms. N. Tharanya (16071A1237), Ms. Y.
Nishitha (16071A1259) in partial fulfillment for the award of the Degree of Bachelor of Technology in Information Technology to the Jawaharlal Nehru Technological University, Hyderabad during the academic year 2019-20, and is a record of bonafide work carried out by them under our guidance and supervision. The results embodied in this report have not been submitted by the students to any other University or Institution for the award of any degree or diploma.

Project Guide: Dr. G. Madhu, Associate Professor
Head of the Department: Dr. G. Suresh Reddy
External Examiner

VNR Vignana Jyothi Institute of Engineering & Technology
(Autonomous Institute, Accredited by NAAC with ‘A’ grade and NBA)
Bachupally, Nizampet (S.O.), Hyderabad - 500 090
DEPARTMENT OF INFORMATION TECHNOLOGY
Date: 23 May 2020

DECLARATION
We hereby declare that the project entitled “Transfer Learning Approach for Malaria Parasite Detection from Microscopic Images” submitted to VNR Vignana Jyothi Institute of Engineering and Technology in partial fulfillment of the requirement for the award of Bachelor of Technology in Information Technology is a bonafide report of the work carried out by us under the guidance and supervision of Dr. G. Madhu, Associate Professor, Department of Information Technology, Vallurupalli Nageswara Rao Vignana Jyothi Institute of Engineering and Technology. To the best of our knowledge, this has not been submitted in any form to any university or institution for the award of any degree or diploma.

Signature of the Students:
Anushka Reddy Enugu (16071A1203)
G. D. Rithvik Chowdary (16071A1219)
N. Tharanya (16071A1237)
Y. Nishitha (16071A1259)
Place:
Date:

ACKNOWLEDGMENT
We express our deep sense of gratitude to our beloved President, Dr. D. N. Rao, VNR Vignana Jyothi Institute of Engineering & Technology, for the valuable guidance and for permitting us to carry out this project. With immense pleasure, we record our deep sense of gratitude to our beloved Principal, Dr.
C. D. Naidu, for permitting us to carry out this project.

We express our deep sense of gratitude to our beloved professor Dr. G. Suresh Reddy, Associate Professor and Head, Department of Information Technology, VNR Vignana Jyothi Institute of Engineering & Technology, Hyderabad-90, for the valuable guidance and suggestions, keen interest, and thorough encouragement extended throughout the project work.

We take immense pleasure in expressing our deep sense of gratitude to our beloved Guide Dr. G. Madhu, Associate Professor in Information Technology, VNR Vignana Jyothi Institute of Engineering & Technology, Hyderabad, for his valuable suggestions and rare insights, and for being a constant source of encouragement and inspiration throughout our project work.

We express our thanks to all those who contributed to the successful completion of our project work.

1. ANUSHKA REDDY ENUGU _____________________
2. G. D. RITHVIK CHOWDARY _____________________
3. N. THARANYA _____________________
4. Y. NISHITHA _____________________

ABSTRACT
Malaria is a deadly disease caused by Plasmodium parasites, which are transmitted through the bites of infected mosquitoes. The most affected regions across the globe are the tropical regions. Millions of lives are affected by malaria every year. Malaria is usually diagnosed by microscopic examination in the laboratory. In this diagnosis, the analyst must distinguish between infected and uninfected Red Blood Cells (RBCs).
This process of diagnosis is time-consuming, and the correctness of the results also depends on the experience and knowledge of the analyst in the laboratory. To overcome the challenges of diagnosis by traditional methods, the diagnosis process can be automated using current technological advancements. In this project, we use transfer learning to classify red blood cells in microscopic images as infected or uninfected. In this transfer learning approach, we combine the Residual Neural Network (ResNet) and the Support Vector Machine (SVM). With this approach we obtained a training accuracy of 99%, a validation accuracy of 90%, and a test accuracy of 89%. Automating the diagnosis in this way reduces the scope for human error and makes the results more consistent.

INDEX
ABSTRACT
CHAPTER 1: INTRODUCTION
  1.1 PURPOSE OF THE PROJECT
  1.2 EXISTING METHODOLOGIES AND ITS DISADVANTAGES
  1.3 PROPOSED SYSTEM
  1.4 OBJECTIVE
  1.5 THESIS ORGANIZATION
CHAPTER 2: LITERATURE SURVEY
CHAPTER 3: ISSUES AND CHALLENGES
CHAPTER 4: METHODOLOGY
  4.1 INTRODUCTION
  4.2 PROPOSED APPROACH AND ITS FLOW DIAGRAM
CHAPTER 5: IMPLEMENTATION
  5.1 DATASET COLLECTION AND UNDERSTANDING
  5.2 DATA PIPELINE TO THE MODEL
  5.3 COMPOSE THE MODEL
  5.4 TRAINING THE MODEL AND CLASSIFICATION
  5.5 PERFORMANCE ANALYSIS
  5.6 ENVIRONMENT
CHAPTER 6: RESULTS
CHAPTER 7: CONCLUSION
CHAPTER 8: FUTURE WORK
CHAPTER 9: REFERENCES
SOURCE CODE

CHAPTER 1
INTRODUCTION
Malaria is a hazardous disease caused by Plasmodium parasites. These parasites are transmitted to people through the bites of female Anopheles mosquitoes. Only an infected Anopheles mosquito can transmit malaria. A mosquito becomes infected when it takes a blood meal from a person who is already infected. When it takes its next blood meal from another person, the parasites present in the mosquito are transmitted to that person.
The symptoms of malaria typically include fever, tiredness, vomiting, and headaches. The symptoms usually appear 10-15 days after the mosquito bite. Five species of malaria parasites infect humans: Plasmodium falciparum, P. vivax, P. ovale, P. malariae, and P. knowlesi. Plasmodium falciparum is the most dangerous of the five; it leads to severe infections and sometimes to death. The parasites causing malaria are present in the red blood cells of an infected person, so malaria can also be transmitted through blood transfusion, organ transplantation, or the shared use of needles.

Fig 1.1 Plasmodium falciparum gametocytes

The long-established way of diagnosing malaria is to visually inspect, under a microscope, a droplet of the patient’s blood spread evenly on a slide. The specimen is stained before examination so that each parasite has a distinguishing appearance. The examination is done by qualified and experienced technicians. Microscopic examination remains the standard technique for laboratory confirmation of malaria. The technique is easy to use and widely available. Its efficiency depends on the experience of the laboratory analyst and the condition of the microscope. Not all reports obtained this way are accurate, which may lead to incorrect diagnostic results and inappropriate treatment.

To improve the accuracy of the diagnostic results, we propose a model using a transfer learning approach based on the Residual Neural Network (ResNet) and the Support Vector Machine (SVM). With this approach we are able to classify microscopic images of blood cells as infected or uninfected.

1.1 Purpose of the Project
Malaria is an infectious disease caused by the transmission of Plasmodium parasites. Malaria is widespread across the world.
The most affected regions are the tropical zones. The parasites carried by the mosquito are present in the red blood cells of the infected person. These deadly parasites can live in the body for up to a year without showing any symptoms, and a delay in treatment can lead to complications or even death. Early detection of the disease saves lives. According to the statistics released by the WHO, the incidence of the disease decreased from 2010 to 2014 and increased from 2015 to 2017. About 228 million malaria cases and approximately 405,000 deaths were reported worldwide in 2018, fewer than in 2017, when 231 million cases and 416,000 deaths were reported. When the number of cases increases, diagnosis by the traditional laboratory method becomes difficult, so the diagnosis process needs to be automated. Several automated methods have already been proposed by researchers. We develop an efficient model using a transfer learning approach, with significant performance compared to the existing ones, that would make the diagnosis easy, fast, and effective.

1.2 Existing methodologies and disadvantages
Microscopic analysis is considered the gold standard technique for detecting malaria parasites. The correctness of the results depends on the capability and experience of the analysts, so the method can be unreliable when technicians lack skill or experience. Other available tests include rapid diagnostic tests, Polymerase Chain Reaction (PCR), fluorescent microscopy, and some staining methods. There is a chance of human error in traditional malaria detection. There are also automated methods for malaria parasite detection that use machine learning, Artificial Neural Networks (ANN), clustering algorithms, and transfer learning.
The maximum accuracy reported for these methods is around 93%. Our model is compared with a few other models in the results section.

1.3 Proposed System
In the proposed system we use a transfer learning approach to detect malarial parasites using the Residual Neural Network (ResNet) and the Support Vector Machine (SVM). We read the dataset using the OpenCV library in Python. No pre-processing is performed on the images; they are passed directly to our ResNet and SVM model. We use ResNet-50, of which we take 49 layers. The output obtained from the dense layers (Dense-32 + Dense-1) is passed to the SVM. As the SVM is a decision boundary model, it classifies the images as infected or uninfected.

1.4 Objective
The main objective of this project is to propose an efficient computerized model for detecting malarial parasites from microscopic images using a transfer learning approach. Other researchers have already proposed a model using VGG-19 and SVM. We implement a transfer learning approach using the Residual Neural Network (ResNet) and the Support Vector Machine (SVM), with the aim of obtaining higher accuracy than the existing ones.

1.5 Thesis Organization
Chapter 1: This chapter presents the introduction, the basic information, and the technical knowledge necessary for the implementation.
Chapter 2: This chapter deals with the literature survey and the research needed to implement the project.
Chapter 3: This chapter deals with the challenges and issues that occurred during the implementation of the project.
Chapter 4: This chapter deals with the methodology, design, and approach.
Chapter 5: This chapter deals with the implementation of the methodology.
Chapter 6: This chapter deals with the experimental results of the model.
Chapter 7: This chapter deals with the conclusion.
CHAPTER 2
LITERATURE SURVEY
Recently, automated malaria detection has made great progress using current technological advancements. Many research groups have proposed models using transfer learning, machine learning, artificial neural networks, and other techniques. The papers we referred to during the project are discussed below.

Vijayalakshmi A in 2019 proposed a novel transfer learning model to detect malarial parasites. The approach unified the Visual Geometry Group network (VGG-19) and the Support Vector Machine (SVM). They used the strategy of training the top layers and freezing the remaining layers. Their model performed well and proved to be more efficient than the existing CNN models.

Christine F. Mark Walter in 2016 proposed a model to detect malarial biomarkers using magnetic bead-based detection. The magnetic bead-based ELISA format used in that work shows good performance. They also suggested using the model to detect multiple biomarkers.

Purnima Pandit in 2016 proposed a model using Artificial Neural Networks to classify malarial parasites in Red Blood Cells. The model extracts features from data collected from images of the Red Blood Cells. These extracted features are given to the Artificial Neural Network as input, and it classifies the images as infected or not.

Meng-Hsim Tsai in 2015 proposed a model to identify malarial parasites in blood smear images. The model can tell whether or not the patient is infected with malaria. It makes use of adaptive histogram, threshold segmentation, and K-means clustering. With the help of this model, detecting malaria becomes easier for doctors as well.

S. T. Khot in 2014 proposed a computerized model for detecting the parasites of malaria.
They took thin blood smear images as input and used different classification algorithms to separate infected cells from uninfected ones. The model used pre-processing techniques together with KNN classification, linear programming, Artificial Neural Networks, and template matching to classify the images.

Daniel Maitethia Memeu in 2013 proposed a model to detect the parasites from images. The model used a square median filter for image pre-processing, histogram thresholding for image segmentation, and histogram segmentation with Artificial Neural Networks for classification of the images. It showed good performance in recognizing the parasites causing malaria.

Isha Suwalka in 2012 proposed a model that identifies the kind of parasite from the color and shape of the cell. The model uses digital image processing and curved shape analysis to detect the parasites. It performs well in that it takes less time and requires little manpower.

Yashasvi Purwar in 2011 proposed an unsupervised model to detect the parasites causing malaria. The model applies image pre-processing, image segmentation, and clustering algorithms to classify the microscopic images of malarial parasites. The method can be improved using image-based screening methods. Its performance is better than that of the laboratory methods.

Subhamoy Mandal in 2010 proposed a model for segmenting blood cells and classifying the images as infected or uninfected. The steps involved are data gathering, image pre-processing, and segmentation using color space conversion and the normalized cut algorithm. After segmentation, the cell images are classified as infected or uninfected.
TRANSFER LEARNING
Transfer learning is a machine learning technique in which a model developed for one task is reused for another. The knowledge gained while solving one problem can be applied to a different but related problem. For example, if we develop a model to classify images that contain a computer, we can use the knowledge of that model to recognize other items. We need not solve the new problem from scratch, because we can reuse the patterns learned while solving the earlier problem.

Transfer learning is popular in deep learning because pre-trained models can be used to train neural networks with a comparatively small amount of data. It is very useful in real-world situations, because most problems do not have enough data points to train such models from scratch. We examine the knowledge learned by one model and modify it for use in other models to improve their performance. Transfer learning is used in many applications, such as face recognition and malarial parasite detection.

In 1993, Lorien Pratt published an early paper on transfer in machine learning, formulating the discriminability-based transfer (DBT) algorithm. Because transfer learning reuses knowledge gained from previous models, it is important to know what is being transferred and how it is transferred. The knowledge gained from previous problems is used to improve performance on the problem being solved.

There are different transfer learning techniques, and their implementation varies with the data being trained on. Some of the techniques are listed below.
• Inductive transfer learning
• Transductive transfer learning
• Unsupervised transfer learning

There are generally two approaches to implementing transfer learning:
1. Develop model approach
2. Pre-trained model approach

A transfer learning model has a layered structure. The starting layers are trained, and their output is fed to the last layer, which produces the final output. Features are extracted while training in this approach. The last layer can be replaced with any classification algorithm that suits the task. We can freeze the top layers, that is, keep some layers of the network fixed while the network is reused for another model. The layers that are not frozen can be replaced by layers trained for the supervised objective of the new problem. The basic principle of transfer learning is the reuse of previously trained models, so several pre-trained models are available for developing applications. Some of them are listed below.
• VGG-19
• VGG-16
• ResNet-50

Transfer Learning with Image Data
In general, transfer learning is applied to problems that take images as input and can use pre-trained models. For example, in a face recognition model, the images captured by a webcam or already stored on the device are taken as input. The pre-trained models used for such problems are trained on large datasets, and that training may take days or weeks. These models can be downloaded and then trained further on your own dataset. Three examples of models of this type are:
• Oxford VGG Model
• Google Inception Model
• Microsoft ResNet Model

Why we use transfer learning
The main advantages of transfer learning are as follows:
1. It saves training time, as a pre-trained model is used.
2. It gives better performance of neural networks.
3. It does not need a lot of data.
In general, a lot of data is required to train a network from scratch, and such data is not always available. In such cases transfer learning can be used, as it makes use of existing pre-trained models.
Transfer learning is used in image classification and face recognition problems. Training a network from scratch takes a lot of time, so using existing pre-trained models saves the time required to train the network.

Image Processing
Image classification is the task of recognizing objects or patterns in an image and assigning the image a category from a given pool of categories/classes. There are many sub-fields, such as recognizing multiple objects in a single image or localizing the objects within the image. The image processing of malaria parasites consists of the following stages:
• Image acquisition
• Pre-processing
• Erythrocyte segmentation
• Feature extraction
• Image classification

Image acquisition
Images are collected from the microscope by the camera attached to it.

Pre-processing
Pre-processing removes unwanted noise from the images, which gives better visualization.

Segmentation
Red blood cells are isolated by removing all other details, such as white blood cells and platelets, from the pre-processed image.

Feature extraction
Features of the parasites such as colour, geometry, and texture are extracted. These features can be used by algorithms to classify the images.

Applications of transfer learning
1. Real-time simulation
2. Gaming
3. Image processing
4. Zero-shot translation
5. Sentiment classification

Neural Network
There are multiple kinds of neural networks, such as Feed Forward (FF) Neural Networks, Back Propagation, Recurrent Neural Networks, Markov chains, and Deep Convolutional Networks. In this project we focus on performance and efficiency with low resource requirements. We follow a two-layered Feed Forward Network, which reduces complexity while learning the features in a given image.
We have used Feed Forward (FF) Neural Networks for the classification of the input images. A Feed Forward Neural Network can be viewed as a simple arrangement of nodes in which data flows in one direction through some intermediate layers. It can be regarded as the first and simplest form of neural network. A neural network is a combination of algorithms, distributed over different layers and computed one after another, that finds relationships in a large set of data in a way that reflects the learning capabilities of the human brain. We use a neural network to classify an image.

A neural network consists of layers of interconnected nodes. It has two important components: the neuron and the network. A neuron is a node that stores data. The network is the mathematical relationship between the nodes, implemented using functions. The network has different layers: the input layer, multiple intermediate hidden layers, and the output layer. The input layer holds the input image, represented as an array of RGB values ranging from 0 to 255. Each hidden layer is the result of passing the previous layer through a mathematical function. The output layer gives information about the input, such as a recognized object, pattern, or image, depending on the function used in the network. The reason for using a feed forward network is to avoid most of the intermediate steps and loops that increase the load on the CPU and other resources and require a large amount of time for mathematical computation.

Fig 2.1: Neural Network

ResNet
ResNet is short for residual network. It was introduced in 2015 and is regarded as leading-edge work in deep learning for computer vision. With ResNet it is possible to train networks with hundreds or even thousands of layers and still achieve good performance.
ResNet has many advantages over earlier networks, including its representational ability; besides image classification, many computer vision applications such as face recognition and object detection have been enhanced by it. In principle, a single layer is enough to represent a function in a feedforward network. But if that layer is massive, it may overfit the data, so neural networks must go deeper instead. When networks go deeper, however, gradients can vanish during training and performance may get worse. ResNet addresses this gradient issue by introducing the identity shortcut connection, which bypasses one or more layers in the network.

Fig 2.2: Identity shortcut connection in ResNet

In this project we have used ResNet-50, a convolutional neural network with a depth of 50 layers. A version of the network pre-trained on more than a million images can be loaded. The pre-trained network can classify images into different classes; as a result, it has learned rich features from a large number of images.

Support Vector Machine
The Support Vector Machine, usually known as SVM, is a supervised machine learning algorithm mostly used for classification. An SVM classifies the data using a decision boundary: the hyperplane that separates the classes.

Fig 2.3: Hyperplane of SVM

The data points are plotted in n-dimensional space, and the points closest to the hyperplane are called support vectors. Data on the right side of the hyperplane is classified as positive, and data on the left side of the hyperplane is classified as negative. With the help of the Support Vector Machine (SVM) we classify our microscopic images as infected or uninfected.
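The decision rule described above can be illustrated with a minimal sketch of a linear SVM. This is not the implementation used in the project: the toy 2-D points, the function names `train_linear_svm` and `predict`, the learning rate, and the mapping of +1 to "infected" are all assumptions made for illustration. The sketch fits the hyperplane w.x + b = 0 by sub-gradient descent on the hinge loss and classifies a point by the side of the hyperplane on which it falls.

```python
import random

def train_linear_svm(points, labels, lr=0.1, lam=0.01, epochs=100, seed=0):
    """Fit w, b by stochastic sub-gradient descent on the regularised hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(points[0])
    b = 0.0
    order = list(range(len(points)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = points[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # L2 regularisation shrinks the weights a little at every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Point is inside the margin or misclassified: move the hyperplane.
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane w.x + b = 0."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Toy 2-D data: +1 stands in for "infected", -1 for "uninfected" (illustrative labels).
POINTS = [(2, 2), (3, 3), (2, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3), (-3, -2)]
LABELS = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(POINTS, LABELS)
predictions = [predict(w, b, p) for p in POINTS]
```

In the proposed model the input to the SVM would be the value produced by the network rather than raw 2-D coordinates; the toy points only make the geometry of the hyperplane easy to follow.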
CHAPTER 3
ISSUES AND CHALLENGES
Over the last decade, many research groups in academia and commercial product developers have proposed different models to detect malarial parasites. Every model faces certain challenges during development, and our transfer learning approach to detecting malarial parasites was no exception.

The major issue faced during the development of our model was finding the right hyperparameters. A hyperparameter is a parameter whose value is set before the learning process begins. Hyperparameters vary from one model to another, and the time required to train and test a model depends on the hyperparameters chosen. Hyperparameters usually cannot be learned using well-known gradient-based methods: because they describe the model representation itself, they cannot be learned through the normal optimization techniques. Some cannot be learned from the training data at all, because doing so would always increase the capacity of the model, leading to overfitting and to picking up noise in the data. Most of the performance variation can be attributed to only a few hyperparameters. We tune the hyperparameters of the algorithm to find out how much performance the model can achieve. Methods that are not robust to small changes in the hyperparameters, or to different implementations of the algorithm, should not be included in the model, as they decrease its robustness.

Another major issue faced during the development of the model was choosing the depth of the Residual Neural Network (ResNet). As the depth of a neural network increases, the performance of the model may degrade. ResNet follows the residual approach, which means that shortcut connections skip one or more layers of the network. It is difficult to know the exact depth required for the model to generate optimal results.
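The skip over one or more layers mentioned above can be sketched in a few lines. This is a deliberately simplified illustration, not ResNet-50 itself: real residual blocks use convolutions and batch normalisation, whereas the hypothetical `residual_block` below uses two small fully connected layers. It computes relu(F(x) + x), so that when the skipped layers contribute nothing the block simply passes its input through, which is one reason added depth is less harmful in a residual network.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]

def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def residual_block(x, w1, w2):
    """Compute relu(F(x) + x) with F(x) = w2 . relu(w1 . x)."""
    fx = mat_vec(w2, relu(mat_vec(w1, x)))
    # The identity shortcut adds the unchanged input back to the layer output.
    return relu([f + xi for f, xi in zip(fx, x)])

x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

# With zero weights the skipped layers output nothing, so the input passes through.
unchanged = residual_block(x, zeros, zeros)
# With identity weights F(x) = x, so the block returns x + x.
doubled = residual_block(x, identity, identity)
```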
CHAPTER 4
METHODOLOGY
4.1 Introduction
This section describes the strategies adopted to extract features from the images. We propose a transfer learning model that combines the Residual Neural Network (ResNet) and the Support Vector Machine (SVM). We use ResNet-50 to extract the features. We take 49 layers of ResNet, add our own layers on top of them, and send the output of the output layer to the SVM. As the SVM is a classifier, it classifies the input images into different classes.

4.2 Proposed Approach and its Flow Diagram
The proposed approach has the following stages:
1. Reading the dataset
2. Building a model that makes use of ResNet and SVM
   2.1 Passing the images to 49 layers of ResNet
   2.2 Passing the output of the 49 layers to the Dense-32 layer
   2.3 Passing the output of the Dense-32 layer to the Dense-1 layer
   2.4 Giving this output as input to the SVM for classification

In the proposed system we use a transfer learning approach to detect malarial parasites using the Residual Neural Network (ResNet) and the Support Vector Machine (SVM). We read the dataset using the OpenCV library in Python. We do not perform any image pre-processing; instead, the images are passed directly to our ResNet and SVM model. We use ResNet-50, of which we take 49 layers, where all the feature extraction from the images takes place. The output of these layers is passed to the dense layers (Dense-32 + Dense-1), whose output is passed to the SVM. As the SVM is a decision boundary model, it classifies the images as infected or uninfected.

The flow diagram of the methodology is shown below:
Fig 4.1: Methodology Flowchart

CHAPTER 5
IMPLEMENTATION
Malarial Parasites Detection
Detecting malarial parasites with automated methods built on current technologies gives more accurate results, as it reduces the chance of human error.
We developed one such model using a transfer learning approach. Malarial parasite detection involves the following steps:

1. Dataset collection and understanding
2. Data pipeline to the model
3. Compose the model
4. Training the dataset and classification

5.1 Dataset collection and Understanding

The microscopic images used for our model come from a previously existing dataset. The images are compressed in a zip file. We need to extract the file and understand which of the images are infected and which are not. The dataset contains images of cells infected with malarial parasites as well as images of uninfected cells.

Fig 5.1: Microscopic Images of red blood cells.

5.2 Data Pipeline to the model

A data pipeline is generally the transfer of data from one system to another, where the output of one system is the input of the next. We transfer the data in batches from one system to the other. As we are working with unstructured data, we cannot load the dataset into memory directly. We need to convert this unstructured data into structured data, which makes it easier to train the model on the images. We classify the input dataset into 2 classes.

CELL_IMAGES
    UNINFECTED
    PARASITIZED

Fig 5.2: Classification of images

5.3 Proposed Transfer Learning Model

The model we propose for detecting malarial parasites follows the transfer learning approach. In this approach we combine the Residual Neural Network and the Support Vector Machine to classify the microscopic images. We make use of ResNet-50 in our model, followed by the Dense-32 and Dense-1 layers. The output obtained from this network is provided as input to the SVM for classification.

Fig 5.3: Proposed Transfer Learning Workflow Model with ResNet-50+SVM.

5.4 Training the model and classification

We do not use the whole dataset to train the model; we split the dataset into training, validation, and testing sets in the ratio 80:10:10 respectively. We read the dataset using the OpenCV library.
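The 80:10:10 split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the project: the function name `split_80_10_10`, the fixed seed, and the placeholder file names are all hypothetical, and only the Python standard library is used.

```python
import random


def split_80_10_10(items, seed=42):
    # Shuffle a copy so the caller's list is left untouched.
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10   # 80% for training
    n_val = n // 10         # 10% for validation
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # remaining ~10% for testing
    return train, val, test


# Placeholder (file name, label) pairs; label 1 = parasitized, 0 = uninfected.
samples = [("img_%03d.png" % i, i % 2) for i in range(100)]
train, val, test = split_80_10_10(samples)
print(len(train), len(val), len(test))
```

Shuffling before slicing matters here because the images are read one class folder at a time, so an unshuffled split would place almost all images of one class in the training portion.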
In our proposed system we use ResNet and SVM. We take only 49 layers of ResNet as a pre-trained model and pass images of size (64, 64, 3) through these layers. The extraction of features from the images is performed in these layers. The extracted feature maps are then flattened into a one-dimensional array. These features are passed to the Dense-32 layer, which produces a vector of length 32. We chose Dense-32 because, in our experiments, it produced the best features. This vector of length 32 is passed as input to the Dense-1 layer, which is the output layer and produces a single value. The value is then passed to the SVM (support vector machine), a classifier that classifies the images into infected and uninfected.

Fig 5.4: Training the input dataset

5.5 Performance Analysis

The performance of the proposed model is assessed using the following measures, where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives.

Accuracy is obtained using the formula

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]

Recall is obtained using the formula

\[ \text{Recall} = \frac{TP}{TP + FN} \]

Precision is calculated using the formula

\[ \text{Precision} = \frac{TP}{TP + FP} \]

F-Measure is calculated using the formula

\[ \text{F-Measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \]

5.6 Environment

Python is an interpreted, high-level programming language. Python uses simple English-like statements, which allow us to understand the code without much prior knowledge. Python's design emphasizes code readability through its use of significant whitespace. Its consistent indentation also makes the code easier to analyse and debug. It supports an object-oriented approach that gives users a clear view of the logical and functional parts of a project, and it is suitable for projects of any scale, not only small ones.
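As a small Python illustration, the measures defined in Section 5.5 can be computed directly from the four confusion-matrix counts. This is a sketch only: the function name `classification_metrics` and the counts passed to it are hypothetical and are not results of the proposed model. It also assumes that none of the denominators is zero.

```python
def classification_metrics(tp, tn, fp, fn):
    # Each formula mirrors the corresponding definition in Section 5.5.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
    }


# Hypothetical counts chosen only to exercise the formulas.
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
for name, value in metrics.items():
    print("%s: %.4f" % (name, value))
```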
Interface: Jupyter Notebook, Anaconda Prompt

The Jupyter Notebook is a free, open-source web application in which we can create and share documents that contain code, text, and equations. Project Jupyter develops the Jupyter user interface to support multiple programming languages, and it grew out of a project called IPython. It mainly supports the Julia, Python and R programming languages, hence the name Jupyter. Jupyter ships with the IPython kernel, which enables us to write code in Python; kernels for other languages such as R can be installed separately. There are around 100 other kernels available for Jupyter that can be used to run programs. The notebook extends the console-based approach and helps to compute and visualize data when building an application that requires data analysis. Jupyter also has the extra feature of documenting and sharing information, which is useful for maintaining records on a large-scale project. Like other platforms, it supports the development and execution of code in an IDE-like environment.

The Jupyter Notebook integrates two main components:

• Web application: A web-based tool for the interactive production of documents that incorporate explanatory text, mathematics, computations and their rich media output.
• Notebook documents: These contain all the material accessible in the web application, including computational inputs and outputs, explanatory text, mathematics, images and rich media representations of objects.

Python Packages

OpenCV:
OpenCV is an open-source computer vision library that is optimized and cross-platform. It is written in the C++ programming language. It was originally developed by Intel and was later supported by Willow Garage. It also provides Python and Java interfaces. It provides interfaces for a wide range of image processing techniques.
There are many different applications of OpenCV, such as human-computer interaction, robotics, segmentation and recognition, motion tracking, and gesture and facial recognition.

NumPy:
NumPy, or Numerical Python, is a library used to perform operations on arrays. It contains an n-dimensional array object that is useful for applying algebra and various other mathematical formulas to the array as a whole. A NumPy object can also be referred to as a NumPy array. It stores data in such a way that mathematical calculations are performed efficiently. A two-dimensional NumPy array stores the data in tabular format, i.e. rows and columns. Before using NumPy, check whether it is installed. If it is not, install the package by opening a command prompt, typing the command below, and pressing Enter:

pip install numpy

The NumPy array has several advantages when compared with a Python list:

1. Less memory
2. Fast
3. Convenient

We can perform different operations such as ndim, itemsize, dtype, reshape, slicing, linspace, max/min, square root, and standard deviation.

OS Package:
The Python os package provides functionality for interacting with the operating system. The os module is one of the basic utility modules of Python. It offers a portable way of using functionality that depends on the operating system. The modules os and os.path have several functions for communicating with the file system. Some commonly used members of os are:

• os.name: The name of the operating-system-dependent module that has been imported (for example 'posix' or 'nt').
• os.getcwd(): Returns the Current Working Directory (CWD) of the running process. It can vary from one system to another.
• os.rename(): Renames a file.
• os.popen(): Opens a pipe to or from a command. It has two modes, 'r' for reading and 'w' for writing.

Keras
Keras is a high-level neural networks API. It is written in the Python programming language.
It can run on top of TensorFlow, CNTK, or Theano. Keras is a deep learning library that allows easy and fast prototyping. It does not perform low-level computation itself; it uses a library called the backend to do it. Keras therefore acts as a high-level API wrapped around the low-level API. It handles the way we build our models and define their layers. It compiles the model with a few parameters, namely the loss and optimization functions, and runs the training process.

TQDM:
TQDM is a library that provides progress bars, with good support for nested loops and Jupyter notebooks. It supports nested progress bars, so if Keras fit and predict loops run within an outer TQDM loop, the nested loops are displayed properly.

CHAPTER 6
RESULTS

This chapter presents the outputs obtained from the implementation of the proposed system.

Fig 6.1: Accuracy and Loss Curves of the model proposed

We also obtained the confusion matrix, the classification report, and the accuracy of the proposed model.

Performance evaluation of existing models

Methods      Sensitivity   Precision   F-score   Accuracy
VGG16-SVM    89.90%        84.47%      87.05%    89.21%
VGG19-SVM    93.44%        89.95%      91.66%    93.13%

Accuracy table of the proposed model:

Dataset      Accuracy (ResNet50-SVM)
Test         89.89%
Validation   90%
Train        99%

Fig 6.2: Confusion matrix and Classification Report

CHAPTER 7
CONCLUSION

In this project, we implemented a transfer learning approach to detect malaria parasites in thick blood smears, using a pre-trained model. The ResNet and SVM model does not locate the malaria parasite within the cells directly; instead, it extracts features through the layers of ResNet and feeds them to the SVM. ResNet, a residual network, extracts the features from the images, and these features are then sent to the SVM (support vector machine), which acts as a classifier and classifies the images into two classes, infected and uninfected.
This model obtained a training accuracy of 99% and a validation accuracy of 90%. Hence, by using this model we can indicate with good accuracy whether a patient has malaria or not. The model may give more consistent results than manual diagnosis of the cells in the laboratory. It can help in decision making in many forensic laboratories.

CHAPTER 8
FUTURE WORKS

Our algorithms can be enhanced by applying other deep learning techniques so that better accuracy and loss can be obtained. The combination of ResNet and SVM performs competitively with VGG-based and CNN models; in the comparison of Chapter 6 its test accuracy (89.89%) lies between that of VGG16-SVM (89.21%) and VGG19-SVM (93.13%). The proposed model classifies cells only as infected or uninfected for one kind of malarial parasite. Since five different parasites cause malaria, the project can be extended in the future to classify all five. We can also try to improve the performance of the proposed model by using other models such as random forests and neural networks such as AlexNet. The proposed system can be used in the diagnosis of malarial parasites, making it easy, fast, and effective.

CHAPTER 9
REFERENCES

1. "Deep learning approach to detect malaria from microscopic images," Vijayalakshmi A. and Rajesh Kanna B. https://link.springer.com/article/10.1007/s11042-019-7162-y
2. "Image analysis and machine learning for detecting malaria," M. Poostchi, K. Silamut, R. J. Maude, S. Jaeger, and G. Thoma. https://pubmed.ncbi.nlm.nih.gov/29360430/
3. "Improving Malaria Parasite Detection from Red Blood Cell using Deep Convolutional Neural Networks," Aimon Rahman, Hasib Zunair, M Sohel Rahman, Jesia Quader Yuki. https://arxiv.org/ftp/arxiv/papers/1907/1907.10418.pdf
4. "Detecting Malaria with Deep Learning." https://towardsdatascience.com/detecting-malaria-with-deep-learning-9e45c1e34b60
5. "CNN-Based Image Analysis for Malaria Diagnosis," Zhaohui Liang, Andrew Powel. https://www.lhncbc.nlm.nih.gov/system/files/pub9433.pdf
6.
"Deep Learning for Medical Image Processing: Overview, Challenges and Future," Muhammad Imran Razzak, Saeeda Naz and Ahmad Zaib. https://arxiv.org/ftp/arxiv/papers/1704/1704.06825.pdf
7. "Teaching a neural network to see roads." https://towardsdatascience.com/teaching-a-neural-network-to-see-roads-74bff240c3e5
8. "How to visualize a Deep Learning Neural Network Model in Keras." https://machinelearningmastery.com/visualize-deep-learning-neural-network-model-keras/
9. "Higher validation accuracy than training accuracy using TensorFlow and Keras." https://intellipaat.com/community/19943/higher-validation-accuracy-than-training-accurracy-using-tensorflow-and-keras
10. https://www.learnopencv.com/wp-content/uploads/2017/11/cnn-keras-curves-with-aug.jpg
11. https://towardsdatascience.com/the-4-convolutional-neural-network-models-that-can-classify-your-fashion-images-9fe7f3e5399d
12. "Detecting malaria with deep learning." https://opensource.com/article/19/4/detecting-malaria-deep-learning
13. https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035

SOURCE CODE:

import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
import os
import cv2
import keras
from keras.layers import *
from keras.models import *
from keras.optimizers import *
from keras.preprocessing import image
from keras.utils.vis_utils import plot_model
import warnings
warnings.filterwarnings("ignore")

# DATA CREATION
DATADIR = r"C:\cell_images"
CATEGORIES = ["Uninfected", "Parasitized"]
IMG_SIZE = 64
os.chdir(DATADIR)
training_data = []
training_data1 = []

def create_training_data():
    # Read every image of both classes and resize it to IMG_SIZE x IMG_SIZE.
    for category in CATEGORIES:
        path = os.path.join(DATADIR, category)
        class_num = CATEGORIES.index(category)  # 0 = Uninfected, 1 = Parasitized
        for img in tqdm(os.listdir(path)):
            try:
                img_array = cv2.imread(os.path.join(path, img))
                new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
                training_data.append([new_array, class_num])
                training_data1.append(new_array)
            except Exception as e:
                # Skip files that cannot be read or resized as images.
                pass
    X = []
    y = []
    for features, label in training_data:
        X.append(features)
        y.append(label)
    X = np.array(X).reshape(-1, IMG_SIZE, IMG_SIZE, 3)
    y = np.array(y)  # labels as an array so Keras and scikit-learn accept them
    img_rows = IMG_SIZE
    img_cols = IMG_SIZE
    channels = 3
    img_shape = [img_rows, img_cols, channels]
    return X, y, img_shape

X, y, shape = create_training_data()
print('Dataset shape: {0}, Image shape: {1}'.format(X.shape, shape))
print(len(training_data1))

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# using a pretrained model
from keras.applications import VGG19, ResNet50
#model = VGG19(include_top = False, input_shape = shape)
resnet = ResNet50(include_top=False, weights='imagenet', input_shape=shape)
# Freeze the pretrained weights; only the added dense layers are trained.
for layer in resnet.layers:
    layer.trainable = False
#output = resnet.layers[-1].output
#output = keras.layers.Flatten()(output)
#resnet = Model(resnet.input, output=output)
resnet.summary()
# 'activation_71' is an auto-generated layer name; it depends on the Keras
# version and session, so check it against the output of resnet.summary().
resnet = Model(inputs=resnet.input, outputs=resnet.get_layer('activation_71').output)

from keras.models import Sequential
model = Sequential()
model.add(resnet)
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dense(1, kernel_regularizer='l2', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

# Train on the training split and validate on the held-out split, so that
# history contains the 'val_accuracy' and 'val_loss' curves plotted below.
history = model.fit(x_train, y_train, batch_size=32, validation_data=(x_test, y_test), epochs=50)
model.save('model_50_res.h5')

# The sigmoid output is a probability; rounding gives the predicted class (0 or 1).
y_pred = np.round(np.squeeze(model.predict(x_test)))
y_true = np.array(y_test)
pred = y_pred.astype(int)
num_epoch = 50
print(history.history.keys())

# summarize history for accuracy
plt.plot(history.history['accuracy'], color='r')
plt.plot(history.history['val_accuracy'], color='g')
plt.title('model accuracy for {0} epochs'.format(num_epoch))
plt.ylabel('accuracy')
plt.xlabel('epochs')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

# summarize history for loss
plt.plot(history.history['loss'], color='r')
plt.plot(history.history['val_loss'], color='g')
plt.title('model loss for {0} epochs'.format(num_epoch))
plt.ylabel('loss')
plt.xlabel('epochs')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

from keras.models import load_model
# 'dense_3' is an auto-generated layer name; it depends on the Keras version
# and session, so check it against the output of model.summary().
model1 = Model(inputs=model.input, outputs=model.get_layer('dense_3').output)
# Use the network as a feature extractor for the SVM.
X = model1.predict(X)

from sklearn.model_selection import train_test_split
# Same test_size and random_state as the earlier split, so the same samples are held out.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

from sklearn import svm
# Create a svm Classifier
clf = svm.SVC(kernel='linear')  # Linear Kernel
# Train the model using the training sets
clf.fit(X_train, y_train)
# Predict the response for test dataset
y_pred = clf.predict(X_test)

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
cm = confusion_matrix(y_test, y_pred)
cr = classification_report(y_test, y_pred)
ac = accuracy_score(y_test, y_pred)
print('confusion matrix:', cm)
print('classification report', cr)
print('accuracy score', ac)
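The confusion matrix printed at the end of the listing can be cross-checked by hand. The sketch below is a pure-Python illustration, not part of the original program: the helper name `confusion_counts` and the toy label lists are hypothetical. It assumes the label convention of the listing (1 = Parasitized, 0 = Uninfected) and arranges the counts as [[TN, FP], [FN, TP]], which is the layout scikit-learn's `confusion_matrix` uses for labels 0 and 1.

```python
def confusion_counts(y_true, y_pred, positive=1):
    # Count the four outcomes by comparing each prediction with its true label.
    tp = tn = fp = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tn, fp, fn, tp


# Toy labels for illustration only; these are not outputs of the model.
toy_true = [1, 0, 1, 1, 0, 0, 1, 0]
toy_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tn, fp, fn, tp = confusion_counts(toy_true, toy_pred)
matrix = [[tn, fp], [fn, tp]]
print(matrix)
```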