SEMINAR REPORT

Entitled

"Multiclass Image Classification Using Deep Learning"

Submitted to the Department of Electronics Engineering in partial fulfillment of the requirement for the degree of Bachelor of Technology

Presented & Submitted By:
Mr. Nikhil Gupta (Roll No. U19EC136)
B. Tech. III (EC), 5th Semester

Guided By:
Prof. Suman Deb
Assistant Professor, ECED

(Year: 2021-22)

DEPARTMENT OF ELECTRONICS ENGINEERING
Sardar Vallabhbhai National Institute of Technology
Surat-395007, Gujarat, INDIA

Sardar Vallabhbhai National Institute of Technology
Surat-395007, Gujarat, INDIA

ELECTRONICS ENGINEERING DEPARTMENT

This is to certify that the SEMINAR REPORT entitled "Multiclass Image Classification Using Deep Learning" is presented and submitted by candidate Mr. Nikhil Gupta, bearing Roll No. U19EC136, of B. Tech. III, 5th Semester, in partial fulfillment of the requirement for the award of the B. Tech. degree in Electronics & Communication Engineering for the academic year 2021-22. He has successfully and satisfactorily completed his Seminar Exam in all respects. We certify that the work is comprehensive, complete and fit for evaluation.

Prof. Suman Deb
Assistant Professor & Seminar Guide

SEMINAR EXAMINERS:
Name of Examiner / Signature with date
1. Prof. K. P. Upla
2. Prof. M. C. Patel

Dr. P. N. Patel
Associate Professor & Head, ECED, SVNIT

DEPARTMENT SEAL
(October 2021)

Acknowledgements

It was an excellent opportunity for me to work and research on a topic that seems set to shape how the future world solves its problems. The research process helped me realize how deep learning has evolved over the past decades by solving and simplifying complex problems. Hence, I am thankful to everyone who was associated with me through the whole process and guided me, directly or indirectly. I want to extend my gratitude especially to my seminar guide, Prof. Suman Deb, Assistant Professor, Electronics Engineering Department, Sardar Vallabhbhai National Institute of Technology, who acted as a backbone for me and helped me wherever I got stuck. Without his constant support and mentorship, I would not have been able to select and complete such an exciting project. I am also thankful to my friends for helping me understand the different technology stacks.

Nikhil Gupta
U19EC136
Sardar Vallabhbhai National Institute of Technology, Surat

ABSTRACT

Multiclass Image Classification Using Deep Learning

The neural network, as a fundamental classification algorithm, is widely used in many image classification problems. With the rapid development of high-performance and parallel computing devices, the convolutional neural network has also drawn increasing attention from researchers in this area. The term Deep Learning, or Deep Neural Network, refers to Artificial Neural Networks (ANNs) with multiple layers. Over the last few decades, it has been considered one of the most powerful tools and has become very popular in the literature because of its ability to handle huge amounts of data. The interest in deeper hidden layers has recently begun to surpass the performance of classical methods in different fields, especially in pattern recognition. One of the most popular deep neural networks is the Convolutional Neural Network (CNN). It takes its name from the mathematical linear operation between matrices called convolution. A CNN has multiple layers, including convolutional, nonlinearity, pooling and fully-connected layers.
The convolutional and fully-connected layers have parameters, while the pooling and nonlinearity layers do not. The CNN achieves excellent performance on machine learning problems, especially applications that deal with image data, such as the large-scale ImageNet image classification dataset, computer vision, and natural language processing (NLP), where the results achieved have been remarkable.

Nikhil Gupta
U19EC136
Guide: Prof. Suman Deb
Seminar Date & Time: 28th October, 2 pm to 4 pm
Examiners' Names: 1) Prof. K. P. Upla  2) Prof. M. C. Patel

Table of Contents

Acknowledgements
Table of Contents
List of Figures
CHAPTER 1: INTRODUCTION
  1.1 Overview
  1.2 Motivation
  1.3 Applications
CHAPTER 2: LITERATURE REVIEW
  2.1 LeNet
  2.2 AlexNet
  2.3 VGGNet
  2.4 GoogLeNet
  2.5 ResNet
CHAPTER 3: DEEP LEARNING
  3.1 Deep Learning
  3.2 Reasons for the Popularity of Deep Learning
  3.3 Comparison of Classical Machine Learning with Deep Learning
CHAPTER 4: CONVOLUTIONAL NEURAL NETWORKS
  4.1 Convolutional Neural Network
  4.2 Terms Related to Convolutional Neural Networks
    4.2.1 Convolution
    4.2.2 Stride
    4.2.3 Padding
    4.2.4 Pooling
    4.2.5 Fully-Connected Layer
CHAPTER 5: IMAGE CLASSIFICATION USING CNN
  5.1 Image Classification Using CNN
  5.2 Trained Model Output
CHAPTER 6: CONCLUSION
  6.1 Conclusion
  6.2 Future Scope
  6.3 References

List of Figures

2.1 LeNet architecture
2.2 AlexNet architecture
2.3 VGGNet architecture
2.4 GoogLeNet architecture
2.5 ResNet architecture
3.1 AI vs. ML vs. DL: Machine Learning is a subset of Artificial Intelligence, and Deep Learning is a subset of Machine Learning
3.2 A simple neural network diagram showing how the different layers are connected
3.3 How the implementation of Machine Learning differs from Deep Learning
3.4 How performance changes as the amount of input data increases
4.1 Image of a dog, for understanding how the human brain recognizes things
4.2 Image of a complete Convolutional Neural Network
4.3 How the convolution operation works mathematically
4.4 Different filters used for convolution and their outputs
4.5 Diagram showing what the stride feature is
4.6 Diagram showing how padding is used in the convolution operation
4.7 Graphs of different activation functions and their comparison
4.8 Illustration of the pooling operation
4.9 Illustration of the max pooling operation
4.10 The fully-connected layer used in the CNN process
5.1 A glacier image given as input; the model predicted it correctly
5.2 A building image given as input; the model predicted it correctly
5.3 A sea image given as input; the model predicted it correctly

CHAPTER 1
INTRODUCTION

1.1 Overview

The term image processing refers to the use of computers to alter digital images. It is a broad topic that encompasses mathematically complex processes. Certain essential operations, such as image classification, image restoration/rectification, image compression, image enhancement and image fusion, are all part of image processing. Image classification is an important part of image processing.
The purpose of image classification is to assign images to thematic categories automatically. There are two types of classification: supervised and unsupervised. To deal with limited samples and computational units, traditional machine learning algorithms (such as multilayer perceptrons, support vector machines, and so on) usually use shallow structures. Their performance and generalization capability are clearly insufficient for complex classification problems in which the target objects have rich semantics. Because it is good at dealing with image classification and recognition problems and has improved the accuracy of many machine learning tasks, the convolutional neural network (CNN) developed in recent years has been widely used in the field of image processing. It has evolved into a powerful and widely used deep learning model.

1.2 Motivation

Medical imaging, object recognition in satellite images, traffic management systems, brake light detection, machine vision, and many other applications all require image classification software. Convolutional neural networks (CNNs) are the underlying idea underpinning current deep learning advances and improvements. CNNs have shattered the mould and climbed to the throne as the most advanced computer vision technology. CNNs are by far the most prevalent form of neural network (others include recurrent neural networks (RNNs), long short-term memory (LSTM) networks, artificial neural networks (ANNs), and so on). Convolutional neural network models are used on image data, and they perform exceptionally well in computer vision tasks such as image classification, object detection and image recognition.

1.3 Applications

Predicting Consumer Behavior

Of course, the areas of brand advertising, ad targeting, and customer service were bound to benefit from the useful applications of image recognition. By targeting customers' posted photos through image recognition, brands can learn about their interests and consumption behavior. This, in turn, helps a brand streamline and perfect its targeting efforts, since it will have the data required to target a relevant audience and place its ads smartly.

Iris Recognition Improvement

Iris recognition has been improved considerably with the help of image recognition technology that recognizes the unique patterns in the iris. One of the most important applications of iris recognition is biometric identification.

Optimizing Medical Imagery

Given the advent of the photo-centric age, where images, photos and video are preferred, the medical industry is not far behind: around 90 percent of medical data consists of medical images, making them the largest data source in healthcare. By connecting one dot with another, clever image recognition technology trained on these medical photos can change the art of diagnosis, making it easier to detect severe diseases such as cancer.

CHAPTER 2
Literature Review

2.1 LeNet

Before starting, note that we would not have been successful if we had simply used a raw multi-layer perceptron connected to each pixel of an image. On top of quickly becoming intractable, this direct approach is not very efficient, since pixels are spatially correlated. Therefore we first need to extract features that are (1) meaningful and (2) low-dimensional, features that we can actually work with. And that is where convolutional neural networks come into the game!
To tackle this issue, Yann LeCun's idea proceeds in multiple steps. First, an input image is fed to the network. Filters of a given size scan the image and perform convolutions. The obtained features then go through an activation function. Then the output goes through a succession of pooling and further convolution operations. As you can see, features are reduced in dimension as the network goes on. At the end, the high-level features are flattened and fed to fully connected layers, which eventually yield class probabilities through a softmax layer. During training, the network learns to recognize the features that make a sample belong to a given class through backpropagation.

To give an example of what such a network can "see": say we have an image of a horse. The first filters may focus on the animal's overall shape. Then, as we go deeper, we reach a higher level of abstraction where details like eyes and ears can be captured. In this way, ConvNets appear as a way to construct features that we would otherwise have had to handcraft ourselves.

Fig 2.1 LeNet architecture

2.2 AlexNet

You might wonder, then, why ConvNets were not trendy from 1998 onwards. The short answer is: we had not leveraged their full potential back then. AlexNet takes the same top-down approach, where successive filters are designed to capture more and more subtle features, but Krizhevsky's work explored several crucial details.

First, Krizhevsky introduced better non-linearity into the network with the ReLU activation, whose derivative is 0 if the feature is below 0 and 1 for positive values. This proved to be efficient for gradient propagation.

Second, his paper introduced the concept of dropout as regularization. From a representation point of view, you force the network to forget things at random, so that it can see each new input from a better perspective. To give an example: after you finish reading this report, you will most probably have forgotten parts of it, and that is fine, because you will have kept in mind only what was essential.

It also introduced data augmentation. When fed to the network, images are shown with random translations, rotations and crops. This forces the network to be more aware of the attributes of the images rather than of the images themselves.

Finally, another trick used by AlexNet is to go deeper: more convolutional layers are stacked before the pooling operations. The representation consequently captures finer features that turn out to be useful for classification. This network largely outperformed the state of the art in 2012, with a 15.4% top-5 error on the ImageNet dataset.

Fig 2.2 AlexNet Architecture

2.3 VGGNet

Deeper is better. The next big milestone of image classification further explored the last point mentioned above: going deeper. And it works. This suggests that such networks can achieve a better hierarchical representation of visual data with more layers. As you can see, something else is very special about this network: it contains almost exclusively 3x3 convolutions. This is curious, isn't it? In fact, the authors were driven by three main reasons to do so.

First, using small filters induces more non-linearity, which means more degrees of freedom for the network. Second, stacking these layers enables the network to see more than it appears to: with two such layers the network in fact sees a 5x5 receptive field, and when you stack three of these filters, you have in fact a 7x7 receptive field! Therefore, the same feature extraction capabilities as in the previous examples can also be achieved with this architecture. Third, using only small filters limits the number of parameters, which is good when you want to go that deep. Quantitatively speaking, this architecture achieved a 7.3% top-5 error on ImageNet.

Fig 2.3 VGGNet Architecture
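The receptive-field claim above can be checked with a little arithmetic. For a stack of n convolutions with k x k kernels at stride 1, the receptive field grows as

RF(n) = n(k - 1) + 1

With k = 3 this gives RF(2) = 2*2 + 1 = 5 and RF(3) = 3*2 + 1 = 7, matching the 5x5 and 7x7 fields quoted above, while the parameter count of the stack grows only linearly in n.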
2.4 GoogLeNet

Next, GoogLeNet came into the game. It bases its success on its inception modules. As you can see, convolutions with different filter sizes are processed on the same input and then concatenated together. From a representation point of view, this allows the model to take advantage of multi-level feature extraction at each step. For example, general features can be extracted by the 5x5 filters at the same time as more local features are captured by the 3x3 convolutions.

But then, you might say: that's great, but isn't that insanely expensive to compute? And I would say: very good remark! The Google team had a brilliant solution for this: 1x1 convolutions. On the one hand, they reduce the dimensionality of the features; on the other, they combine feature maps in a way that can be beneficial from a representation perspective. Then you might ask why it is called "inception". Well, you can see all of those modules as networks stacked one over another inside a bigger network. For the record, the best GoogLeNet ensemble achieved a 6.7% error on ImageNet.

Fig 2.4 GoogLeNet Architecture

2.5 ResNet

Connect the layers. All the networks discussed so far followed the same trend: going deeper. But at some point we realize that stacking more layers does not lead to better performance; in fact, the exact opposite occurs. Why is that? In one word: the gradient, ladies and gentlemen. But don't worry, researchers found a trick to counter this effect. The key concept developed by ResNet is residual learning: every two layers, there is an identity mapping via an element-wise addition. This proved to be very helpful for gradient propagation, as the error can be backpropagated through multiple paths. Also, from a representation point of view, it helps to combine different levels of features at each step of the network, just as we saw with the inception modules. It remains to this date one of the best performing networks on ImageNet, with a 3.6% top-5 error rate.

Fig 2.5 ResNet Architecture
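To make the identity-mapping idea concrete, here is a minimal sketch of a residual block in Keras. The filter count and layer sizes are illustrative assumptions, not the exact ResNet configuration:

from tensorflow.keras import layers

def residual_block(x, filters=64):
    # F(x): two stacked 3x3 convolutions (assumes x already has `filters` channels,
    # so that the element-wise addition below is shape-compatible).
    fx = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
    fx = layers.Conv2D(filters, (3, 3), padding="same")(fx)
    # Identity shortcut: the element-wise addition gives the gradient an extra path.
    out = layers.Add()([x, fx])
    return layers.Activation("relu")(out)

The Add() call is the identity mapping described above: during backpropagation, the error flows both through the two convolutions and directly through the shortcut.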
CHAPTER 3
Deep Learning

3.1 Deep Learning

Deep Learning is a subset of Machine Learning, which in turn is a subset of Artificial Intelligence. Artificial Intelligence is a general term that refers to techniques that enable computers to mimic human behavior. Machine Learning represents a set of algorithms, trained on data, that make all of this possible.

Fig 3.1 AI vs. ML vs. DL: Machine Learning is a subset of Artificial Intelligence, and Deep Learning is a subset of Machine Learning

Deep Learning, in turn, is a kind of Machine Learning inspired by the structure of the human brain. Deep learning algorithms analyze data with a predetermined logical structure in order to reach conclusions similar to those of humans. Deep learning achieves this by employing a multi-layered structure of algorithms known as neural networks.

Fig 3.2 A simple neural network diagram showing how the different layers are connected

The neural network's design is inspired by the structure of the human brain. Neural networks can be taught to perform, on data, the same pattern-recognition and classification tasks that our brains perform. Individual layers of neural networks can be thought of as filters that work from the most obvious to the most subtle features, improving the likelihood of detecting and producing a correct result. The human brain operates in a similar manner: when we receive new information, the brain tries to compare it to previously encountered objects. Deep neural networks make use of the same notion.

3.2 Reasons for the Popularity of Deep Learning

Why are artificial neural networks and deep learning so strong and unique in today's industry? And above all, why are deep learning models more potent than machine learning models? Allow me to clarify. The primary advantage of deep learning over machine learning is that it removes the need for so-called feature extraction.

Traditional machine learning approaches were widely used long before deep learning. Decision Trees, SVMs, the Naive Bayes classifier and Logistic Regression are some examples. These algorithms are also called flat algorithms: "flat" here means that they cannot normally be applied directly to raw data (such as .csv files, images or text). Instead, we need a preprocessing step called feature extraction. The result of feature extraction is a representation of the raw data that these classic machine learning algorithms can then use to perform a task, for example the classification of the data into several categories or classes. Feature extraction is usually quite complex and requires detailed knowledge of the problem domain. This preprocessing layer must be adapted, tested and refined over several iterations for optimal results.

The artificial neural networks of deep learning, on the other side, do not need the feature extraction step. The layers are able to learn an implicit representation of the raw data directly and on their own: an increasingly abstract and compressed representation of the raw data is produced over the several layers of the network, and this compressed representation of the input data is then used to produce the result, for example the classification of the input data into different classes.

Fig 3.3 How the implementation of Machine Learning differs from Deep Learning

During the training process, this step is also optimized by the neural network to obtain the best possible abstract representation of the input data. This means that deep learning models require little to no manual effort to perform and optimize the feature extraction process.

3.3 Comparison of Classical Machine Learning with Deep Learning

Let us look at a concrete example. If you want to use a machine learning model to determine whether a particular image shows a car or not, we humans first need to identify the unique features of a car (shape, size, windows, wheels, etc.), extract those features and give them to the algorithm as input data. The algorithm would then perform a classification of the images. That is, in machine learning, a programmer must intervene directly for the model to come to a conclusion. In the case of a deep learning model, the feature extraction step is completely unnecessary: the model recognizes these unique characteristics of a car and makes correct predictions on its own, completely without the help of a human. In fact, refraining from hand-extracting characteristics of the data applies to every other task you will ever do with neural networks: just give the raw data to the neural network, and the rest is done by the model.
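As a hedged illustration of this difference, the sketch below contrasts the two pipelines. The feature choice (HOG), the classifier (SVM), the image size and the class count are all assumptions made for this example, not part of the report's model:

# Classical ML: a hand-crafted feature extractor (here HOG) feeds a flat classifier.
from skimage.feature import hog
from sklearn.svm import SVC

def classical_pipeline(train_images, train_labels):
    # The programmer chooses the representation: HOG descriptors of each image.
    features = [hog(img) for img in train_images]  # images assumed grayscale 2-D arrays
    return SVC().fit(features, train_labels)

# Deep learning: raw pixels go in; the network learns its own representation.
from tensorflow.keras import layers, models

def deep_pipeline(train_images, train_labels):
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 1)),  # size assumed
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(2, activation="softmax"),  # e.g. car / not-car
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    model.fit(train_images, train_labels, epochs=5)
    return model

In the first function the representation is fixed by hand before training; in the second, the convolutional layers learn it from raw pixels during training.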
Fig 3.4 How performance changes as the amount of input data increases. Deep learning models tend to increase their accuracy with increasing amounts of training data, whereas traditional machine learning models such as SVMs and the Naive Bayes classifier stop improving after a saturation point.

CHAPTER 4
Convolutional Neural Networks

4.1 Convolutional Neural Network

Convolutional Neural Networks (ConvNets or CNNs) are a type of deep neural network that is often used for image analysis. Convolution layers are the CNN's building blocks. A convolution is the simple application of a filter to an input that results in an activation. Repeated application of the same filter to an input results in a map of activations called a feature map, indicating the locations and strength of a detected feature in the input, such as an image. What makes CNNs so powerful and useful is that they can generate excellent predictions with minimal image preprocessing. CNNs are also largely invariant to spatial translation and are hence able to detect features anywhere in the input images.

Before going deeper, let us discuss how the human brain actually recognizes an image, using an example.

Fig 4.1 Image of a dog, for understanding how the human brain recognizes things

We all know this is an image of a dog, and even with just a glimpse of this image, or of any similar image, we would always know that it is a dog. But how do we know this? How are we so sure and correct all the time? The reason is that, through evolution, our brain has learned to identify certain key features (big ears, hairy face, long muzzle, large teeth, etc.) in an image, and on the basis of these features it recognizes the image above as a dog.

Convolutional layers apply a convolution operation to the input and pass the information on to the next layer. Pooling combines the outputs of clusters of neurons into a single neuron in the next layer. Fully connected layers connect every neuron in one layer to every neuron in the next layer. This is what a CNN tries to mimic when classifying images.

Fig 4.2 Image of a complete Convolutional Neural Network

The CNN flow is as follows (a minimal code sketch of this flow appears after the list):
1. Start with an input image.
2. Apply many different filters to it to create a feature map.
3. Apply a ReLU function to increase non-linearity.
4. Apply a pooling layer to each feature map.
5. Flatten the pooled images into one long vector.
6. Input the vector into a fully connected artificial neural network.
7. Process the features through the network; the final fully connected layer provides the "votes" for the classes we are after.
8. Train through forward propagation and backpropagation for many, many epochs, repeating until we have a well-defined neural network with trained weights and feature detectors.
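A minimal sketch of the flow above in Keras; the filter counts, image size and class count are assumptions for illustration (the 150x150 size and 6 classes anticipate the dataset used in Chapter 5):

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(150, 150, 3)),             # 1. input image (size assumed)
    layers.Conv2D(32, (3, 3), activation="relu"),  # 2 & 3. filters + ReLU non-linearity
    layers.MaxPooling2D((2, 2)),                   # 4. pooling of each feature map
    layers.Flatten(),                              # 5. flatten into one long vector
    layers.Dense(128, activation="relu"),          # 6. fully connected network
    layers.Dense(6, activation="softmax"),         # 7. class "votes" (6 classes assumed)
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# 8. model.fit(...) then runs forward propagation and backpropagation for many epochs.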
A Convolutional Neural Network can be interpreted as two sub-networks, each responsible for a specific task: the Feature Learning net (the "eyes" of our CNN) and the Classification net (the "brain" of our CNN). Together they perform an approximation of how a human brain classifies images.

1) Feature Learning: Convolutional layers and pooling layers make up the majority of the feature learning section. The number of convolutional and pooling layers in a CNN is typically greater than one and is directly proportional to the complexity of the classification problem (a more difficult problem requires more convolutional and pooling layers for feature extraction).

2) Classification: The classification part is responsible for classifying the images into their respective categories based on the features (feature maps) that the feature learning part has extracted from the image. The classification part usually consists of a flatten layer and a network of fully connected hidden layers.

4.2 Terms Related to Convolutional Neural Networks

4.2.1 Convolution

In purely mathematical terms, convolution is a function derived from two given functions by integration which expresses how the shape of one is modified by the other. The convolution function will be familiar to those of you who have worked in any subject that involves signal processing. Let's get into the actual convolution operation in the context of neural networks. The following example provides a breakdown of everything you need to know about this process. Three elements enter into the convolution operation: the input image, the feature detector and the feature map.

The input image here is a simple smiley-face image encoded in 1s and 0s; if you look closely at the sequence of 1s and 0s, you can see the happy face. A 5x5 or 7x7 matrix is sometimes used as a feature detector, but the more common one, and the one we will work with, is a 3x3 matrix. The feature detector is also known as a "kernel" or a "filter", terms you may come across as you read more about the subject. To avoid confusion, it is preferable to remember all of these terms; they mean the same thing and are used interchangeably here as well.

How exactly does the convolution operation work? The feature detector can be thought of as a window with 9 (3x3) cells. You position it over the input image, starting at the top-left corner, and count the number of cells where the feature detector matches the input image. The number of matching cells is then entered in the feature map's top-left cell. After that, you move the feature detector one cell to the right and repeat the process. Because we are moving the feature detector one cell at a time, this movement is referred to as a stride of one pixel. In this example, the feature detector's middle-left cell, with the number 1 inside it, matches the cell it is standing over in the input image. That is the only matching cell, so you write "1" in the next cell of the feature map, and so on. After you have gone through the whole first row, you move down to the next row and go through the same process. It is important not to confuse the feature map with the other two elements: the cells of the feature map can contain any digit, not only 1s and 0s. After going over every pixel in the input image in the example above, we end up with these results:

Fig 4.3 How the convolution operation works mathematically

By the way, just as the feature detector can also be referred to as a kernel or a filter, a feature map is also known as an activation map, and these terms are likewise interchangeable.
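The sliding-window procedure just described can be written in a few lines of NumPy. This is a sketch of a plain "valid" convolution (strictly a cross-correlation, as implemented in CNN practice) at stride 1; the 7x7 input and the particular kernel values are assumptions matching the example:

import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` and record one response per position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out_h = (ih - kh) // stride + 1
    out_w = (iw - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            # For 0/1 images this sum counts the cells where detector and image match.
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# 7x7 binary input and a 3x3 detector, as in the example above (values assumed):
image = np.random.randint(0, 2, (7, 7))
kernel = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]])
print(convolve2d(image, kernel))  # a 5x5 feature map

With a 7x7 input, a 3x3 detector and a stride of 1, the resulting feature map is 5x5, smaller than the input, exactly as discussed next.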
What is the point of the convolution operation? There are several things we gain from deriving a feature map. These are the most important:

Reducing the size of the input image. You should know that the larger your strides (the movements across pixels), the smaller your feature map. In this example we used one-pixel strides, which gave us a fairly large feature map. When dealing with real images you will find it necessary to widen your strides: here we were dealing with a 7x7 input image, but real images tend to be substantially larger and more complex, and wider strides make them easier to process.

Do we lose information when using a feature detector? The answer is yes. The feature map we end up with has fewer cells and therefore less information than the original input image. However, the very purpose of the feature detector is to sift through the information in the input image, keep the parts that are integral to it and exclude the rest; basically, it is meant to separate the wheat from the chaff.

Why do we aim to reduce the input image to its essential features? Think of it this way: when you recognize a person, you detect certain features, say their eyes and their nose, and you immediately know who you are looking at. These are the most revealing features, and they are all your brain needs to see in order to reach its conclusion. Even these features are perceived broadly, not down to their minutiae. If your brain actually had to process every bit of data that enters through your senses at any given moment, you would first be unable to take any action, and soon you would have a mental breakdown. Broad categorization happens to be more practical. Convolutional neural networks operate in exactly the same way.

Fig 4.4 Different filters used for convolution and their outputs

4.2.2 Stride

Stride is a parameter of convolutional neural networks, the neural networks optimized for image and video data. Stride is a filter parameter that controls the amount of movement across the image or video. When the stride of a neural network is set to 1, for example, the filter moves one pixel (or unit) at a time. Because the stride affects the size of the encoded output volume, it is usually set to a whole number rather than a fraction or decimal.

Fig 4.5 Diagram showing what the stride feature is

4.2.3 Padding

Padding is a term used in convolutional neural networks to describe how many pixels are added to an image when it is processed by the CNN kernel. If the padding in a CNN is set to zero, every pixel value added will be zero; if the zero padding is set to one, a one-pixel border with a pixel value of zero is added to the image.

Fig 4.6 Diagram showing how padding is used in the convolution operation
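A worked note tying stride and padding together: for an input of width W, a kernel of width K, padding P and stride S, the output width of a convolution is

output = floor((W - K + 2P) / S) + 1

For the 7x7 input, 3x3 kernel, stride 1 and no padding used in the convolution example above, this gives (7 - 3 + 0)/1 + 1 = 5, i.e. the 5x5 feature map; with one pixel of zero padding (P = 1) the output would remain 7x7.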
Activation Functions

The layer after the convolution is the non-linearity. The non-linearity can be used to adjust or cut off the generated output; this layer is applied in order to saturate or limit the output. For many years, sigmoid and tanh were the most popular non-linearities. Fig 4.7 shows the common types of non-linearity. Recently, however, the Rectified Linear Unit (ReLU) has been used more often, for the following reasons.

Fig 4.7 Graphs of different activation functions and their comparison

1) ReLU has simpler definitions for both the function and its gradient:

ReLU(x) = max(0, x)
dReLU(x)/dx = 1 if x > 0, 0 otherwise

2) Saturating functions such as sigmoid and tanh cause problems in backpropagation. As the neural network design gets deeper, the gradient signal begins to vanish, which is called the "vanishing gradient" problem. This happens because the gradient of those functions is very close to zero almost everywhere except near the center. The ReLU, however, has a constant gradient for positive inputs. Although the function is not differentiable at zero, this can be ignored in actual implementations.

3) ReLU creates a sparser representation, because a zero in the gradient leads to a complete zero in the activation. Sigmoid and tanh, in contrast, always produce non-zero results from the gradient, which may not be desirable for training.

4.2.4 Pooling

A pooling layer is another building block of a CNN. Its function is to progressively reduce the spatial size of the representation, reducing the number of parameters and the amount of computation in the network. The pooling layer operates on each feature map independently. The most common approach used in pooling is max pooling.

Fig 4.8 Illustration of the pooling operation

Fig 4.9 Illustration of the max pooling operation

4.2.5 Fully-Connected Layer

The fully-connected layer is similar to the way neurons are arranged in a traditional neural network: each node in a fully-connected layer is directly connected to every node in both the previous and the next layer, as shown in Fig 4.10. Note that each of the nodes in the last frames of the pooling layer is connected, as a vector, to the first fully-connected layer. Most of a CNN's parameters reside in these layers, and they take a long time to train. The major drawback of a fully-connected layer is that it includes many parameters that require complex computation during training. Therefore, we try to reduce the number of nodes and connections; the removed nodes and connections can be compensated for by using the dropout technique. For example, LeNet and AlexNet designed deep and wide networks while keeping the computational complexity constant [4, 6, 9]. The essence of the CNN, the convolution, comes into its own once the non-linearity and pooling layers are introduced, and most common architectures stack several groups of these three layers.

Fig 4.10 The fully-connected layer used in the CNN process

CHAPTER 5

5.1 Image Classification Using CNN

For compiling the model, we have to specify the following parameters:

Optimizer → the algorithm used for updating the weights of our CNN. "Adam" (a gradient-descent variant) is one of the most popular optimizers.
Loss → the cost function used for calculating the error between the predicted and actual values. In our case we use "categorical_crossentropy", since we are dealing with multiclass classification; for binary classification we would use "binary_crossentropy" as the loss function.
Metrics → the evaluation metric for checking the performance of our model.

For fitting the model, we have to specify the following parameters (a combined compile-and-fit sketch follows the list):

Batch_size → the number of images used to train our CNN model before the weights are updated via backpropagation.
Epochs → an epoch is one pass in which all of the training images are used once to update the weights.
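A minimal sketch of compiling and fitting with these parameters; the training arrays and the exact batch-size and epoch values are assumptions, and the model is taken to be the Keras Sequential CNN built in Chapter 4:

# model: the CNN assembled earlier (a Keras Sequential model is assumed).
model.compile(
    optimizer="adam",                  # weight-update algorithm
    loss="categorical_crossentropy",   # multiclass cost function
    metrics=["accuracy"],              # evaluation metric
)
model.fit(
    train_images, train_labels,        # assumed pre-loaded 150x150 image arrays
    batch_size=32,                     # images per weight update (value assumed)
    epochs=20,                         # passes over the training set (value assumed)
    validation_data=(val_images, val_labels),
)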
So far we have learned about the different components of a Convolutional Neural Network; let us now develop our own CNN for image classification. The problem at hand is image data of nature scenes from around the world. The data contains around 25k images of size 150x150, distributed over 6 categories.

As you can see, our model has an accuracy of ~83% on the validation data, which is not bad. The accuracy can be increased further by playing around with different parameters, such as:
1. Increasing the number of neurons
2. Increasing the number of hidden layers
3. Increasing the number of epochs
4. Tuning the convolutional layer parameters

5.2 Trained Model Output

Fig 5.1 A glacier image given as input; the model predicted it correctly

Fig 5.2 A building image given as input; the model predicted it correctly

Fig 5.3 A sea image given as input; the model predicted it correctly

CHAPTER 6

6.1 Conclusion

With today's rapidly evolving technology in artificial intelligence and computer vision, it is more important than ever for research to be precise and efficient. The sciences of image categorization and image recognition are likewise progressing and reaching new heights, and numerous innovative, complex and efficient algorithms and neural networks are being introduced in this field. In this report I have presented a model for the classification of natural scenes from around the world. In training, the model achieved an accuracy of approximately 83%. From this we can say that natural scenes, too, can be recognized using a convolutional neural network; as the figures above show, the model is working properly.

6.2 Future Scope

Scanning the heavens for other sentient species out in space may be the future of image processing. Advances in image processing applications will also be incorporated into new intelligent, digital systems created by research experts in various countries across the world. In a few decades, developments in image processing and associated technologies may have resulted in millions upon millions of robots, altering the way the world is managed. Spoken instructions, anticipating government information needs, interpreting languages, detecting and tracking people and objects, and diagnosing medical issues will all benefit from advances in image processing and artificial intelligence. Image recognition is an excellent prototype problem for learning about neural networks and a great way to develop more advanced deep learning techniques, which can be added to my model in the future. Multi-label image classification can also be added to these models.
Kanade, "Rotation invariant neural network-based face detection," Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231), 1998 9 (Conference proceeding) P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," Proceedings of the 2001 IEEE Computer Society C