SEMINAR REPORT
Entitled
“Multiclass Image Classification Using Deep Learning
”
Submitted to the Department of Electronics Engineering
In Partial Fulfillment of the Requirement for the Degree of Bachelor of Technology in Electronics & Communication Engineering
Presented & Submitted By :
Mr. Nikhil Gupta
(Roll No. U19EC136)
B. TECH. III (EC), 5th Semester
Guided By :
Prof. Suman Deb
Assistant Professor, ECED.
(Year : 2021-22)
DEPARTMENT OF ELECTRONICS ENGINEERING
Sardar Vallabhbhai National Institute of Technology
Surat-395007, Gujarat, INDIA.
Sardar Vallabhbhai National Institute of Technology
Surat-395 007, Gujarat, INDIA.
ELECTRONICS ENGINEERING DEPARTMENT
This is to certify that the SEMINAR REPORT entitled “Multiclass Image
Classification Using Deep Learning” is presented & submitted by Mr. Nikhil
Gupta, bearing Roll No. U19EC136, of B. Tech. III, 5th Semester, in partial
fulfillment of the requirement for the award of the B. Tech. degree in Electronics &
Communication Engineering for the academic year 2021-22.
He has successfully and satisfactorily completed his Seminar Exam in all
respects. We certify that the work is comprehensive, complete and fit for evaluation.
Prof. Suman Deb
Assistant Professor &
Seminar Guide
SEMINAR EXAMINERS :
Name of Examiner
Signature with date
1. Prof. K. P. Upla
2. Prof. M. C. Patel
Dr. P. N. Patel
Associate Professor &
Head, ECED, SVNIT.
DEPARTMENT SEAL
(October - 2021)
Acknowledgements
It was an excellent opportunity for me to work and research on a topic that seems set to
shape the future by solving real problems. The research process helped me realize how
deep learning has evolved over the past decades by solving and simplifying complex
problems. Hence, I am thankful to everyone who was associated with me through the
whole process and guided me, directly or indirectly. I want to extend my gratitude
especially to my seminar guide, Prof. Suman Deb, Assistant Professor, Electronics
Engineering Department, Sardar Vallabhbhai National Institute of Technology, who
acted as a backbone for me and helped me wherever I got stuck in the whole flow.
Without his constant support and mentorship, I would not have been able to select and
complete such an exciting project. I am also thankful to my friends for helping me
understand the different technology stacks.
Nikhil Gupta
U19EC136
Sardar Vallabhbhai National Institute of Technology
Surat
ABSTRACT
Multiclass Image Classification Using Deep Learning
A neural network, as a fundamental classification algorithm, is widely used in many
image classification problems. With the rapid development of high-performance and
parallel computing devices, the convolutional neural network has also drawn increasing
attention from researchers in this area.

The term Deep Learning, or Deep Neural Network, refers to Artificial Neural Networks
(ANN) with multiple layers. Over the last few decades, it has been considered one of
the most powerful tools and has become very popular in the literature, as it is able to
handle huge amounts of data. The interest in deeper hidden layers has recently begun to
surpass the performance of classical methods in different fields, especially in pattern
recognition. One of the most popular deep neural networks is the Convolutional Neural
Network (CNN). It takes its name from the mathematical linear operation between
matrices called convolution. A CNN has multiple layers, including convolutional,
non-linearity, pooling, and fully-connected layers. The convolutional and fully-connected
layers have parameters, but the pooling and non-linearity layers do not. CNNs perform
excellently in machine learning problems, especially in applications that deal with image
data, such as the largest image classification dataset (ImageNet), computer vision, and
natural language processing (NLP), where the results achieved have been remarkable.
Nikhil Gupta
U19EC136
Guide: Prof. Suman Deb
Seminar Date & Time: 28th October, 2 pm to 4 pm
Examiners’ Names: 1) Prof. K. P. Upla
2) Prof. M. C. Patel
Table of Contents
Acknowledgements .............................................................................................................. 3
Table of Contents .................................................................................................................. 5
List of Figures ....................................................................................................................... 6
CHAPTER 1 ......................................................................................................................... 7
INTRODUCTION ................................................................................................................ 7
1.1 Overview ..................................................................................................................... 8
1.2 Motivation ................................................................................................................... 8
1.3 Applications ................................................................................................................ 8
CHAPTER 2 ......................................................................................................................... 9
LITERATURE REVIEW ..................................................................................................... 9
2.1 LeNet ............................................................................................................................ 9
2.2 AlexNet ....................................................................................................................... 10
2.3 VGGNet ..................................................................................................................... 11
2.4 GoogLeNet ................................................................................................................ 12
2.5 ResNet ....................................................................................................................... 13
CHAPTER 3 ....................................................................................................................... 14
DEEP LEARNING ............................................................................................................. 14
3.1 Deep Learning ............................................................................................................ 14
3.2 Reasons for the Popularity of Deep Learning These Days ........................................ 15
3.3 Comparison of Traditional Machine Learning Algorithms with Deep Learning ...... 17
CHAPTER 4 ........................................................................................................................ 19
4.1 CONVOLUTIONAL NEURAL NETWORK ............................................................ 19
4.2 Terms Related to Convolutional Neural Networks .................................................... 21
4.2.1 Convolution ............................................................................................................. 24
4.2.2 Stride ....................................................................................................................... 24
4.2.3 Padding ................................................................................................................... 25
4.2.4 Pooling .................................................................................................................... 27
4.2.5 Fully-Connected Layer ........................................................................................... 28
CHAPTER 5 ....................................................................................................................... 29
5.1 Image Classification Using CNN ............................................................................... 29
5.2 Trained Model Output ................................................................................................ 30
CHAPTER-6.......................................................................................................................... 33
6.1 Conclusion ................................................................................................................... 33
6.2 Future Scope ................................................................................................................. 33
6.3 References ................................................................................................................... 34
List of Figures
2.1 LeNet Architecture ....................................................................................................... 9
2.2 AlexNet Architecture .................................................................................................. 10
2.3 VGGNet Architecture ................................................................................................. 11
2.4 GoogLeNet Architecture ............................................................................................ 12
2.5 ResNet Architecture ................................................................................................... 13
3.1 AI vs ML vs DL: Machine Learning is a subset of Artificial Intelligence, and Deep
Learning is a subset of Machine Learning ................................................................. 14
3.2 Simple neural network diagram showing how different layers are connected .......... 15
3.3 How the implementation of Machine Learning differs from Deep Learning ............ 16
3.4 How performance changes as the amount of input data increases ............................ 17
4.1 Image of a dog for understanding how the human brain recognizes things .............. 18
4.2 Image of a complete Convolutional Neural Network ................................................ 19
4.3 How the convolution operation works mathematically ............................................. 22
4.4 Different filters used for convolution and their outputs ............................................ 23
4.5 Showing what exactly the stride feature is, with the help of a diagram .................... 24
4.6 Diagram showing how padding is used in the convolution operation ....................... 25
4.7 Graphs of different activation functions and their comparison ................................. 26
4.8 Figure to illustrate the pooling operation ................................................................... 27
4.9 Figure to illustrate the max pooling operation ........................................................... 28
4.10 Figure showing the fully connected layer used in the CNN process ....................... 29
5.1 A glacier image was given as input and the model predicted it correctly ................. 33
5.2 A building image was given as input and the model predicted it correctly .............. 34
5.3 A sea image was given as input and the model predicted it correctly ...................... 34
CHAPTER 1
INTRODUCTION
1.1 Overview
The term image processing refers to the use of computers to alter digital images. It is a
broad topic that encompasses mathematically complex processes. Certain essential
operations, such as image categorization, image restoration/rectification, and image
compression, are all part of image processing, as are image enhancement, image fusion,
and other similar techniques. Image categorization is an important part of image
processing.
The purpose of image classification is to assign images to thematic categories
automatically. There are two types of classification: supervised classification and
unsupervised classification.
To deal with a limited number of samples and computational units, traditional machine
learning algorithms (such as multilayer perceptrons, support vector machines, and so
on) usually use shallow structures. When the target objects have rich meanings, their
performance and generalization capabilities on complex classification problems are
clearly insufficient. Because it is good at dealing with image classification and
recognition problems and has improved the accuracy of many machine learning tasks,
the convolutional neural network (CNN) developed in recent years has been widely used
in the field of image processing. It has evolved into a powerful and widely used deep
learning model.
1.2 Motivation
Medical imaging, item recognition in satellite images, traffic management systems, brake
light detection, machine vision, and other applications all require image classification
software.
Convolutional neural networks (CNNs) are the idea underpinning current deep
learning advances and improvements. CNNs have shattered the mould and climbed to
the throne as the most advanced computer vision technology. CNNs are by far the most
prevalent form of neural network (others include recurrent neural networks (RNN),
long short-term memory (LSTM) networks, artificial neural networks (ANN), and so
on). These convolutional neural network models are used on image data, and they
perform exceptionally well in computer vision tasks such as image categorization,
object identification, image recognition, and so on.
1.3 Applications
Predicting Consumer Behavior
Of course, the areas of brand advertisement, ad targeting, and customer service were
bound to benefit from useful applications of image recognition. By targeting customers’
posted photos through image recognition, brands can learn about their interests and
consumer behavior. This, in turn, helps streamline and perfect the brand’s targeting
efforts, since they will have the data required to target a relevant audience and place
their ads smartly.
Iris Recognition Improvement
Iris recognition has been improved considerably with the help of image recognition
technology that recognizes the unique patterns in the iris. One of the most important and
essential applications of iris recognition is biometric identification.
Optimizing Medical Imagery
Given the advent of the photo-centric age, where images, photos, and video are
preferred, the medical industry isn't far behind: around 90 percent of medical data
consists of medical images, making them the largest data source in healthcare.
By connecting one dot with another, clever image recognition technology can be trained
on these medical images to change the art of diagnosis, making it easier to detect severe
diseases such as cancer.
CHAPTER 2
Literature Review
2.1 LeNet
Before starting, let's note that we would not have been successful if we had simply used
a raw multi-layer perceptron connected to each pixel of an image. On top of quickly
becoming intractable, this direct operation is not very efficient, as pixels are spatially
correlated. Therefore we first need to extract
1. meaningful and
2. low-dimensional features that we can work on.
And that's where convolutional neural networks come into play!
To tackle this issue, Yann LeCun's idea proceeds in multiple steps.
First, an input image is fed to the network. Filters of a given size scan the image and
perform convolutions. The obtained features then go through an activation function.
Then, the output goes through a succession of pooling and other convolution
operations. As you can see, features are reduced in dimension as the network goes on.
At the end, high-level features are flattened and fed to fully connected layers, which
eventually yield class probabilities through a softmax layer. During training, the
network learns through backpropagation to recognize the features that make a sample
belong to a given class.
To give an example of what such a network can 'see': say we have an image of a horse.
The first filters may focus on the animal's overall shape. Then, as we go deeper, we
reach a higher level of abstraction where details like eyes and ears can be captured.
That way, ConvNets appear as a way to construct features that we would otherwise
have had to handcraft ourselves.
Fig 2.1 LeNet Architecture
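To make this pipeline concrete, the listing below is a minimal LeNet-style model
sketched in Keras (the framework choice is my assumption; the filter counts follow the
classic LeNet-5 layout, and the exact 1998 details differ):

# Minimal LeNet-style network in Keras (illustrative sketch, not LeCun's exact model).
from tensorflow.keras import layers, models

model = models.Sequential([
    # Filters scan the image and perform convolutions; tanh was the original activation.
    layers.Conv2D(6, kernel_size=5, activation="tanh", input_shape=(32, 32, 1)),
    layers.AveragePooling2D(pool_size=2),      # pooling reduces feature dimensions
    layers.Conv2D(16, kernel_size=5, activation="tanh"),
    layers.AveragePooling2D(pool_size=2),
    layers.Flatten(),                          # high-level features are flattened
    layers.Dense(120, activation="tanh"),
    layers.Dense(84, activation="tanh"),
    layers.Dense(10, activation="softmax"),    # class probabilities via softmax
])
model.summary()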
2.2 AlexNet
Then you might wonder: why have ConvNets not been trendy since 1998? The short
answer is that we had not leveraged their full potential back then.
AlexNet takes the same top-down approach, where successive filters are designed to
capture more and more subtle features, but Krizhevsky's work explored several crucial
details.
First, Krizhevsky introduced better non-linearity in the network with the ReLU
activation, whose derivative is 0 if the feature is below 0 and 1 for positive values. This
proved to be efficient for gradient propagation.
Second, his paper introduced the concept of dropout as regularization. From a
representation point of view, you force the network to forget things at random, so that
it can see the next input data from a better perspective. To give an example: after you
finish reading this report, you will most probably have forgotten parts of it, and yet this
is OK, because you will have kept in mind only what was essential.
The paper also introduced data augmentation. When fed to the network, images are
shown with random translation, rotation, and crop. That way, the network is forced to
be more aware of the attributes of the images, rather than the images themselves.
Finally, another trick used by AlexNet is to be deeper: more convolutional layers are
stacked before the pooling operations. The representation consequently captures finer
features that turn out to be useful for classification. This network largely outperformed
the state of the art in 2012, with a 15.4% top-5 error on the ImageNet dataset.
Fig 2.2 AlexNet Architecture
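A rough Keras sketch of these three tricks together, ReLU, dropout, and on-the-fly
augmentation (layer sizes here are illustrative placeholders, not AlexNet's actual ones):

# Sketch of AlexNet's key ideas in a toy-sized Keras model.
from tensorflow.keras import layers, models

model = models.Sequential([
    # Random flip/translation augmentation, applied on the fly during training.
    layers.RandomFlip("horizontal", input_shape=(64, 64, 3)),
    layers.RandomTranslation(0.1, 0.1),
    layers.Conv2D(32, 3, activation="relu"),   # ReLU: gradient 1 for x > 0, 0 otherwise
    layers.Conv2D(32, 3, activation="relu"),   # stacking convolutions before pooling
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                       # randomly "forget" half the activations
    layers.Dense(10, activation="softmax"),
])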
2.3 VGGNet
Deeper is better.
The next big milestone of image classification further explored the last point mentioned
above: going deeper. And it works. This suggests that such networks can achieve a
better hierarchical representation of visual data with more layers. Something else is
also very special about this network: it contains almost exclusively 3x3 convolutions.
This is curious, isn't it? In fact, the authors were driven by three main reasons to do so.
First, using small filters induces more non-linearity, which means more degrees of
freedom for the network. Second, stacking these layers enables the network to see more
than it appears to: with two of them, the network in fact sees a 5x5 receptive field, and
when you stack three of these filters, you have a 7x7 receptive field! Therefore, the
same feature extraction capabilities as in the previous examples can be achieved on this
architecture as well. Third, using only small filters also limits the number of
parameters, which matters when you want to go that deep. Quantitatively speaking,
this architecture achieved a 7.3% top-5 error on ImageNet.
Fig 2.3 VGGNet Architecture
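To make the third point concrete, here is a rough parameter count for C input and C
output channels (a worked sketch, bias terms ignored):

one 7x7 convolution:          7^2 * C^2 = 49 C^2 parameters
three stacked 3x3 convolutions: 3 * 3^2 * C^2 = 27 C^2 parameters

Both configurations see a 7x7 receptive field, but the stack uses roughly 45% fewer
parameters and applies three non-linearities instead of one.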
2.4 GoogLeNet
Next, GoogLeNet entered the game. It bases its success on its inception modules.
Convolutions with different filter sizes are processed on the same input and then
concatenated together. From a representation point of view, this allows the model to
take advantage of multi-level feature extraction at each step. For example, general
features can be extracted by the 5x5 filters at the same time that more local features are
captured by the 3x3 convolutions.
But isn't that insanely expensive to compute? Very good remark! The Google team had
a brilliant solution for this: 1x1 convolutions. On the one hand, they reduce the
dimensionality of the features; on the other, they combine feature maps in a way that
can be beneficial from a representation perspective.
Then, why is it called inception? You can see each of these modules as a network
stacked inside a bigger network. And for the record, the best GoogLeNet ensemble
achieved a 6.7% error on ImageNet.
Fig 2.4 GoogLeNet Architecture
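A hedged Keras sketch of one inception module, with 1x1 convolutions reducing depth
before the larger filters (the filter counts follow the commonly cited "inception 3a"
block, but here they are only illustrative):

# Minimal inception-style module: parallel branches concatenated channel-wise.
from tensorflow.keras import Input, Model, layers

inputs = Input(shape=(28, 28, 192))

b1 = layers.Conv2D(64, 1, padding="same", activation="relu")(inputs)

b2 = layers.Conv2D(96, 1, padding="same", activation="relu")(inputs)   # 1x1 reduces depth
b2 = layers.Conv2D(128, 3, padding="same", activation="relu")(b2)      # local 3x3 features

b3 = layers.Conv2D(16, 1, padding="same", activation="relu")(inputs)
b3 = layers.Conv2D(32, 5, padding="same", activation="relu")(b3)       # broader 5x5 features

b4 = layers.MaxPooling2D(3, strides=1, padding="same")(inputs)
b4 = layers.Conv2D(32, 1, padding="same", activation="relu")(b4)

outputs = layers.Concatenate()([b1, b2, b3, b4])   # concatenate along the channel axis
model = Model(inputs, outputs)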
2.5 ResNet
Connect the layers.
All the networks discussed so far followed the same trend: going deeper. But at some
point, we realize that stacking more layers does not lead to better performance; in fact,
the exact opposite occurs. Why is that? In one word: the gradient. But don't worry,
researchers found a trick to counter this effect.
The key concept developed by ResNet is residual learning. Every two layers, there is an
identity mapping via an element-wise addition. This proved to be very helpful for
gradient propagation, as the error can be backpropagated through multiple paths. Also,
from a representation point of view, it helps to combine different levels of features at
each step of the network, just as we saw with the inception modules. It remains to this
date one of the best performing networks on ImageNet, with a 3.6% top-5 error rate.
Fig 2.5 ResNet Architecture
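A minimal sketch of the residual idea in Keras: two convolutions whose output is added
element-wise to an identity shortcut (simplified; it assumes the input already has
`filters` channels, and real ResNets also use projection shortcuts and deeper stacks):

# One residual block: y = ReLU(x + F(x)), where F is two conv layers.
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x                                   # identity mapping
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])                # element-wise addition
    return layers.Activation("relu")(y)            # gradients also flow via the shortcut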
CHAPTER 3
Deep Learning
3.1 Deep Learning
Deep Learning is a subset of Machine Learning, which in turn is a subset of Artificial
Intelligence. Artificial Intelligence is a general term that refers to techniques that enable
computers to mimic human behavior. Machine Learning represents the set of
algorithms, trained on data, that make all of this possible.
Fig 3.1 AI vs ML vs DL. Machine Learning is a subset of Artificial Intelligence and
Deep Learning Is subset of Machine Learning.
Deep Learning, on the other hand, is a sort of Machine Learning that is inspired by the
human brain's structure. Deep learning algorithms analyze data with a predetermined
logical structure in order to reach similar conclusions as humans. Deep learning achieves
this by employing a multi-layered structure of algorithms known as neural networks.
Fig 3.2 A simple Neural Network diagram to show how different layers are connected
The neural network's design is inspired by the structure of the human brain. Neural
networks can be taught to perform the same tasks on data that our brains do when
identifying patterns and classifying different sorts of information.
Individual layers of neural networks can also be thought of as a kind of filter that works
from the most obvious to the most subtle, improving the likelihood of detecting and
producing a correct result.
The human brain operates in a similar manner. When we receive new information, our
brain attempts to compare it to previously encountered objects. Deep neural networks
make use of the same notion.
3.2 Reasons for the Popularity of Deep Learning These Days
Why are artificial neural networks and deep learning so strong and unique in today's
industry? And above all, why are deep learning models more potent than machine
learning models? Allow me to clarify.
The primary advantage of deep learning over machine learning is the lack of need for
so-called feature extraction.
Traditional machine learning approaches were widely utilized long before deep learning.
Decision Trees, SVM, the Naive Bayes Classifier, and Logistic Regression are some
examples.
These algorithms are also called flat algorithms. Flat here means that these algorithms
cannot normally be applied directly to raw data (such as .csv files, images, text, etc.).
We need a preprocessing step called Feature Extraction.
The result of Feature Extraction is a representation of the given raw data that can now be
used by these classic machine learning algorithms to perform a task. For example, the
classification of the data into several categories or classes.
Feature Extraction is usually quite complex and requires detailed knowledge of the
problem domain. This preprocessing layer must be adapted, tested and refined over
several iterations for optimal results.
The artificial neural networks of Deep Learning, on the other hand, do not need the
Feature Extraction step. The layers are able to learn an implicit representation of the
raw data directly and on their own. A more and more abstract and compressed
representation of the raw data is produced over several layers of the artificial neural
network. This compressed representation of the input data is then used to produce the
result, which can be, for example, the classification of the input data into different
classes.
Fig 3.3 How the implementation of Machine Learning differs from Deep Learning
During the training process, this step is also optimized by the neural network to obtain
the best possible abstract representation of the input data. This means that deep
learning models require little to no manual effort to perform and optimize the feature
extraction process.
3.3 Comparison of Traditional Machine Learning Algorithms with Deep Learning
Let us look at a concrete example. If you want to use a machine learning model to
determine whether a particular image shows a car or not, we humans first need to
identify the unique features of a car (shape, size, windows, wheels, etc.), extract those
features, and give them to the algorithm as input data.
In this way, the algorithm performs a classification of the images. That is, in machine
learning, a programmer must intervene directly for the model to come to a conclusion.
In the case of a deep learning model, the feature extraction step is completely
unnecessary. The model would recognize these unique characteristics of a car and make
correct predictions, entirely without the help of a human.
In fact, refraining from extracting the characteristics of data applies to every other task
you'll ever do with neural networks: just give the raw data to the neural network, and
the rest is done by the model. The sketch below contrasts the two approaches.
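A hedged sketch of that contrast (the hand-crafted features, the random data, and the
tiny CNN are all hypothetical placeholders, not a real pipeline):

# Traditional ML: manual feature extraction, then a "flat" classifier.
import numpy as np
from sklearn.svm import SVC
from tensorflow.keras import layers, models

def extract_features(image):
    # Hypothetical hand-crafted features: mean brightness and mean edge strength.
    return [image.mean(), np.abs(np.diff(image, axis=0)).mean()]

images = np.random.rand(100, 32, 32)             # stand-in for real image data
labels = np.random.randint(0, 2, size=100)
X = np.array([extract_features(img) for img in images])
SVC().fit(X, labels)                             # the SVM sees only 2 numbers per image

# Deep learning: raw pixels in, the representation is learned by the layers.
cnn = models.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(32, 32, 1)),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(2, activation="softmax"),
])
cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
cnn.fit(images[..., None], labels, epochs=1, verbose=0)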
Fig 3.4 How performance changes as we increase the amount of input data.
Deep Learning models tend to increase their accuracy with an increasing amount of
training data, whereas traditional machine learning models such as SVM and the Naive
Bayes classifier stop improving after a saturation point.
CHAPTER 4
4.1 CONVOLUTIONAL NEURAL NETWORK
Convolutional Neural Networks (ConvNets or CNNs) are a type of deep neural network
often used for image analysis. Convolution layers are the CNN's building blocks.
A convolution is the simple application of a filter to an input that results in an
activation. Repeated application of the same filter to an input results in a map of
activations called a feature map, indicating the locations and strength of a detected
feature in an input, such as an image.
What makes CNNs so powerful and useful is that they can generate excellent
predictions with minimal image preprocessing. Also, CNNs are robust to spatial
variance and hence are able to detect features anywhere in the input images.
Before going deeper, let's discuss how the human brain actually recognizes an image,
using an example.
Fig 4.1 Image of a dog for understanding how the human brain recognizes things
We all know this is an image of a dog, and even with just a glimpse of this image or any
similar image we would always know that this is a dog. But how do we know this? How
are we so sure and correct all the time?
The reason is that with every evolutionary step of humankind, our brain has learned to
identify certain key features (big ears, hairy face, long mouth, large teeth, etc.) in an
image, and on the basis of these features it recognizes the above image as a dog.
Convolutional layers apply a convolution operation to the input. This passes the
information on to the next layer.
Pooling combines the outputs of clusters of neurons into a single neuron in the next
layer.
Fully connected layers connect every neuron in one layer to every neuron in the next
layer.
This is what a CNN tries to mimic for classifying images.
Fig 4.2 Image of a complete Convolutional Neural Network
CNN FLOW (see the sketch after this list)
1. Start with an input image.
2. Apply many different filters to it to create a feature map.
3. Apply a ReLU function to increase non-linearity.
4. Apply a pooling layer to each feature map.
5. Flatten the pooled images into one long vector.
6. Input the vector into a fully connected artificial neural network.
7. Process the features through the network; the final fully connected layer provides
the voting of the classes we're after.
8. Train through forward propagation and backpropagation for many, many epochs.
9. Repeat until we have a well-defined neural network with trained weights and
feature detectors.
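The sketch below (an illustrative Keras model, not the report's exact network) maps
each step of this flow to a layer:

# The CNN flow, step by step, as a minimal Keras model.
from tensorflow.keras import layers, models

model = models.Sequential([
    # Steps 1-3: input image, many filters -> feature maps, ReLU non-linearity.
    layers.Conv2D(32, 3, activation="relu", input_shape=(150, 150, 3)),
    # Step 4: pooling applied to each feature map.
    layers.MaxPooling2D(2),
    # Step 5: flatten the pooled maps into one long vector.
    layers.Flatten(),
    # Steps 6-7: fully connected network processes the features.
    layers.Dense(64, activation="relu"),
    # Final layer "votes" on the classes (6 nature-scene categories, as in Chapter 5).
    layers.Dense(6, activation="softmax"),
])
# Steps 8-9: forward propagation and backpropagation, epoch after epoch, happen
# inside model.fit(...) once the model is compiled.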
A Convolutional Neural Network can be interpreted as two sub-networks, each
responsible for a specific task: the Feature Learning net (the eyes of our CNN) and the
Classification net (the brain of our CNN). Together they approximate how a human
brain classifies images.
1) Feature Learning — Convolutional layers and pooling layers make up the
majority of the feature learning section. The number of convolutional and pooling
layers in a CNN is typically greater than one and is directly proportional to the
complexity of the classification problem (a more difficult problem requires more
convolutional and pooling layers for feature extraction).
2) Classification — The classification part is responsible for classifying the images
into their respective categories based on the features (feature maps) that the feature
learning part has extracted from the image. The classification part usually consists of a
flatten layer and a network of fully connected hidden layers.
4.2 Terms Related to Convolutional Neural Networks
4.2.1 Convolution
In purely mathematical terms, convolution is a function derived from two given
functions by integration which expresses how the shape of one is modified by the other.
The convolution function is presumably recognizable to those of you who have worked
in any subject that involves signal processing.
Let's get into the actual convolution operation in the context of neural networks. The
following example will provide you with a breakdown of everything you need to know
about this process.
Here are the three elements that enter into the convolution operation:
- Input image
- Feature detector
- Feature map
The input image is the same smiley face image that we used in the previous example, as
you can see. Again, if you look closely at the sequence of 1s and 0s, you can see the
happy face.
A 5x5 or 7x7 matrix is sometimes used as a feature detector, but the more common
one, and the one we will be working with here, is a 3x3 matrix. The feature detector is
also known as a "kernel" or a "filter," terms you may come across as you read more
about the subject.
To avoid confusion, it is best to remember all of these terms; they mean the same thing
and are used interchangeably in this report as well.
How exactly does the convolution operation work?
The feature detector can be thought of as a window with 9 (3x3) cells. Here's what you
do with it:
- Position it over the input image, starting at the top-left corner, and count the
number of cells in which the feature detector matches the input image.
- Enter the number of matching cells in the feature map's top-left cell.
- Move the feature detector one cell to the right and repeat the process. Because we
are moving the feature detector one cell at a time, this movement is referred to as a
stride of one pixel.
- In this example, the feature detector's middle-left cell, with the number 1 inside it,
matches the cell that it is standing over inside the input image. That's the only
matching cell, so you write "1" in the next cell in the feature map, and so on and so
forth.
- After you have gone through the whole first row, move down to the next row and
go through the same process.
It's important not to confuse the feature map with the other two elements. The cells of
the feature map can contain any digit, not only 1s and 0s. After going over every pixel
in the input image in the example above, we would end up with these results:
Fig 4.3 How the convolution operation works mathematically
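A minimal NumPy sketch of this sliding-window operation (strictly speaking a
cross-correlation, which is what CNN libraries actually compute; the example image
and kernel are placeholders):

# Slide a 3x3 feature detector over the input with a stride of one pixel
# and record a matching score in each cell of the feature map.
import numpy as np

def convolve2d(image, kernel, stride=1):
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    fmap = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            fmap[i, j] = np.sum(patch * kernel)   # element-wise match, then sum
    return fmap

image = np.random.randint(0, 2, size=(7, 7))      # a 7x7 binary image, as in the example
kernel = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]])
print(convolve2d(image, kernel))                  # yields a 5x5 feature map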
By the way, just as the feature detector can also be referred to as a kernel or a filter, the
feature map is also known as an activation map, and these terms are likewise
interchangeable.
What is the point of the convolution operation?
There are several benefits we gain from deriving a feature map. The most important is
reducing the size of the input image: the larger your strides (the movements across
pixels), the smaller your feature map. In this example, we used one-pixel strides, which
gave us a fairly large feature map.
When dealing with real images, you will find it necessary to widen your strides. Here we
were dealing with a 7x7 input image, but real images tend to be substantially larger and
more complex. Wider strides make them easier to process.
Do we lose information when using a feature detector?
The answer is yes. The feature map we end up with has fewer cells and therefore less
information than the original input image. However, the very purpose of the feature
detector is to sift through the information in the input image, keep the parts that are
integral to it, and exclude the rest.
Basically, it is meant to separate the wheat from the chaff.
Why do we aim to reduce the input image to its essential features?
Think of how you recognize a person: you detect certain features, say, their eyes and
their nose, and you immediately know who you are looking at. These are the most
revealing features, and that is all your brain needs to see in order to make its
conclusion. Even these features are seen broadly and not down to their minutiae.
If your brain actually had to process every bit of data that enters through your senses at
any given moment, you would first be unable to take any action, and soon you would
have a mental breakdown. Broad categorization happens to be more practical.
Convolutional neural networks operate in exactly the same way.
Fig 4.4 Different filters used for convolution and their outputs
4.2.2 Stride
Stride is a filter parameter in convolutional neural networks that controls the amount of
movement across the image or video. When the stride of a neural network is set to 1,
for example, the filter moves one pixel (or unit) at a time. Because the movement of the
filter affects the size of the encoded output volume, the stride is usually set to a whole
number rather than a fraction or decimal.
Fig 4.5 Showing what exactly the stride feature is, with the help of a diagram
4.2.3 Padding
Padding is a term used in convolutional neural networks to describe how many pixels
are added to an image when it is processed by the CNN kernel. If the padding in a CNN
is set to zero, for example, every pixel value added will have the value zero. If the zero
padding is set to one, a one-pixel border with a pixel value of zero will be added to the
image.
Fig 4.6 Diagram showing how padding is used in the convolution operation.
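The combined effect of stride and padding on the output size can be captured in one
standard formula, sketched below (n is the input size, f the filter size, p the padding,
and s the stride; the example values are illustrative):

# Output size of a convolution: out = floor((n + 2p - f) / s) + 1
def conv_output_size(n, f, p=0, s=1):
    return (n + 2 * p - f) // s + 1

print(conv_output_size(7, 3))        # 5 -> the feature map shrinks without padding
print(conv_output_size(7, 3, p=1))   # 7 -> "same" padding preserves the input size
print(conv_output_size(7, 3, s=2))   # 3 -> a larger stride shrinks it faster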
Activation Functions
The layer after the convolution is the non-linearity. The non-linearity can be used to
adjust or cut off the generated output; this layer is applied in order to saturate or limit
the generated output.
For many years, sigmoid and tanh were the most popular non-linearities. Fig 4.7 shows
the common types of non-linearity. Recently, however, the Rectified Linear Unit
(ReLU) has been used more often, for the following reasons.
Fig 4.7 Different activation functions with their graphs
1) ReLU has simpler definitions for both the function and its gradient:

ReLU(x) = max(0, x)

dReLU(x)/dx = 1 if x > 0; 0 otherwise
2) Saturating functions such as sigmoid and tanh cause problems in backpropagation.
As the neural network design gets deeper, the gradient signal begins to vanish, which is
called the "vanishing gradient" problem. This happens because the gradient of those
functions is very close to zero almost everywhere except at the center. The ReLU, in
contrast, has a constant gradient for positive inputs. Although the function is not
differentiable at zero, this can be ignored in the actual implementation.
3) The ReLU creates a sparser representation, because a zero in the gradient leads to a
complete zero in the activation. Sigmoid and tanh, however, always produce non-zero
results from the gradient, which may not be favorable for training.
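A tiny NumPy sketch of the function and its gradient as defined above:

# ReLU and its gradient; the sample inputs are arbitrary.
import numpy as np

def relu(x):
    return np.maximum(0, x)            # ReLU(x) = max(0, x)

def relu_grad(x):
    return (x > 0).astype(float)       # 1 for x > 0, 0 otherwise

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))        # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))   # [0. 0. 0. 1. 1.]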
4.2.4 Pooling
A pooling layer is another building block of a CNN.
Pooling
Fig 4.8 Figure to illustrate Pooling operation
Its function is to progressively reduce the spatial size of the representation in order to
reduce the number of parameters and the amount of computation in the network. The
pooling layer operates on each feature map independently. The most common approach
used in pooling is max pooling.
Max Pooling
Fig 4.9 Figure to illustrate Max Pooling operation
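A minimal NumPy sketch of 2x2 max pooling with stride 2 (it assumes even input
dimensions; the example feature map is a placeholder):

# Keep only the largest value in each non-overlapping 2x2 window.
import numpy as np

def max_pool(fmap, size=2):
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = fmap[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 0],
                 [7, 2, 9, 8],
                 [4, 1, 3, 5]])
print(max_pool(fmap))   # [[6. 4.]
                        #  [7. 9.]]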
4.2.5 Fully-Connected Layer
The fully-connected layer is similar to the way neurons are arranged in a traditional
neural network. Each node in a fully-connected layer is directly connected to every node
in both the previous and the next layer, as shown in Fig 4.10. From this figure we can
note that each of the nodes in the last frames of the pooling layer is connected as a
vector to the first fully-connected layer. Most of the parameters of a CNN reside in
these layers, and they take a long time to train.
The major drawback of a fully-connected layer is that it includes a lot of parameters
that require complex computation during training. Therefore, we try to reduce the
number of nodes and connections, which can be achieved with the dropout technique.
For example, LeNet and AlexNet designed deep and wide networks while keeping the
computational complexity constant [4,6,9].
The essence of the CNN, which is the convolution, emerges when the non-linearity and
pooling layers are introduced; the most common architectures use all three of them.
Fig 4.10 Figure showing the fully connected layer used in the CNN process.
CHAPTER 5
5.1 IMAGE CLASSIFICATION USING CNN
For compiling the model, we have to specify the following parameters:
Optimizer → Algorithm used for updating the weights of our CNN. "Adam" (a
gradient-descent variant) is one of the popular optimizers used for updating weights.
Loss → Cost function used for calculating the error between the predicted and actual
values. In our case we will be using "categorical_crossentropy", since we are dealing
with multiclass classification. In the case of binary classification, we would use
"binary_crossentropy" as the loss function.
Metrics → Evaluation metric for checking the performance of our model.
For fitting the model, we have to specify the following parameters:
Batch_size → Number of images that will be used to train our CNN model before
updating the weights using backpropagation.
Epochs → An epoch is a measure of the number of times all of the training images are
used once to update the weights.
Till now we have learnt about the different components of a Convolutional Neural
Network. Let us now develop our own CNN for image classification. The problem at
hand is image data of nature scenes from around the world. The data contains around
25k images of size 150x150 distributed under 6 categories. A hedged sketch of compiling
and fitting such a model follows.
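In the sketch below, the architecture and the random placeholder data are stand-ins;
the report's actual model and 25k-image nature-scenes dataset are not reproduced here:

# Compile and fit a small CNN with the parameters described above.
import numpy as np
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(150, 150, 3)),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(6, activation="softmax"),         # 6 scene categories
])

model.compile(
    optimizer="adam",                              # Adam, a gradient-descent variant
    loss="categorical_crossentropy",               # multiclass loss (binary_crossentropy for 2 classes)
    metrics=["accuracy"],                          # evaluation metric
)

# Placeholder data so the sketch runs end to end.
x = np.random.rand(8, 150, 150, 3).astype("float32")
y = np.eye(6)[np.random.randint(0, 6, size=8)]     # one-hot labels

model.fit(x, y, batch_size=4, epochs=2)            # batch_size and epochs as defined above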
As you can see, our model has an accuracy of ~83% on validation data, which is not
bad. The accuracy can be further increased by playing around with different
parameters, such as:
1. Increasing the number of neurons
2. Increasing the number of hidden layers
3. Increasing the number of epochs
4. Playing around with the convolutional layer parameters
5.2 Trained Model Output:
Fig 5.1 A glacier image was given as input and the model predicted it correctly.
Fig 5.2 A building image was given as input and the model predicted it correctly.
Fig 5.3 A sea image was given as input and the model predicted it correctly.
CHAPTER 6
6.1 Conclusion
With today's quickly evolving technology in artificial intelligence and computer vision,
it is more important than ever to be precise and optimal in research. The sciences of
image categorization and image recognition are likewise progressing and reaching new
heights, and numerous innovative, complicated, and efficient algorithms and neural
networks are being introduced in this field.
In my report I have presented a model for the classification of natural scenes around
the world. In training, the model achieved an accuracy of approximately 83%. From
this we can say that natural scenes can also be classified using a convolutional neural
network; as shown in the figures above, the model works properly.
6.2 Future Scope
Scanning the heavens for other sentient species out in space will be part of the future of
image processing. Advances in image processing applications will also be included in
new intelligent, digital species created by research experts in various countries across
the world. In a few decades, developments in image processing and associated
technologies may have resulted in millions upon millions of robots on the planet,
altering the way the world is managed. Spoken instructions, anticipating government
information needs, interpreting languages, detecting and tracking people and things,
and diagnosing medical issues will all benefit from advances in image processing and
artificial intelligence.
Image recognition is an excellent prototype problem for learning about neural networks,
and a great way to develop more advanced deep learning techniques, which can be
added to my model in the future. Multi-label image classification can be a further
addition to these models.
6.3 References
1. https://ieeexplore.ieee.org/document/8308186/figures#figures
2. https://towardsdatascience.com/wtf-is-image-classification8e78a8235acb
3. Yann LeCun et al., 1998, "Gradient-Based Learning Applied to Document
Recognition".
4. Adit Deshpande, 2016, "The 9 Deep Learning Papers You Need To Know About
(Understanding CNNs Part 3)".
5. C.-C. Jay Kuo, 2016, "Understanding Convolutional Neural Networks with A
Mathematical Model".
6. https://www.sciencedirect.com/science/article/pii/S0924271617303660
7. (Conference proceeding) A. Z. Kouzani, F. He and K. Sammut, "Commonsense
knowledge-based face detection," Proceedings of IEEE International Conference on
Intelligent Engineering Systems, 1997.
8. (Conference proceeding) H. A. Rowley, S. Baluja and T. Kanade, "Rotation
invariant neural network-based face detection," Proceedings of the 1998 IEEE Computer
Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231),
1998.
9. (Conference proceeding) P. Viola and M. Jones, "Rapid object detection using a
boosted cascade of simple features," Proceedings of the 2001 IEEE Computer Society
Conference on Computer Vision and Pattern Recognition (CVPR), 2001.