
TARP Assessment 3

CSE 3999
Technical Answers to Real World Problems
Submitted By:
16BCE0412
Deepak Kochhar
16BCE0019
Shivam Agrawal
16BCE0007
Utsav Tulsyan
16BCE0300
Karmendra Chaudhary
Slot: TF2
Project Supervisor:
Prof. Kumaravelu R
Assessment 3
Explain the software/hardware technologies selected for the
identified real-world problem.
Problem Identified: Using Deep Learning for Image-Based Plant Disease Detection
Our solution takes a plant image as input to a trained Convolutional Neural
Network, which classifies the input as normal or diseased. So, the core
technology that drives our work is neural networks, more specifically
Convolutional Neural Networks. On the hardware side, we require an i5/i7
processor, preferably with 8 GB of RAM for faster computation.
Software technologies and concepts used
CNNs, like other neural networks, are made up of neurons with learnable weights
and biases. Each neuron receives several inputs, takes a weighted sum over them,
passes it through an activation function, and responds with an output. The whole
network has a loss function, and all the tips and tricks developed for regular
neural networks still apply to CNNs.
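The weighted-sum-plus-activation behaviour of a single neuron can be sketched in a few lines of NumPy; the input, weight, and bias values below are illustrative, not taken from our model:

```python
import numpy as np

def relu(z):
    """ReLU activation: pass positive values through, clamp negatives to 0."""
    return np.maximum(0.0, z)

def neuron(inputs, weights, bias):
    z = np.dot(inputs, weights) + bias  # weighted sum over the inputs
    return relu(z)                      # respond through the activation

x = np.array([0.5, -1.0, 2.0])   # illustrative inputs
w = np.array([0.8, 0.2, -0.3])   # illustrative learnable weights
b = 0.1                          # illustrative learnable bias
out = neuron(x, w, b)            # weighted sum is -0.3, so ReLU gives 0.0
```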
To teach an algorithm how to recognise objects in images, we use a specific type
of Artificial Neural Network: a Convolutional Neural Network (CNN). Their name
stems from one of the most important operations in the network: convolution.
Convolutional Neural Networks have a different architecture than regular Neural
Networks. Regular Neural Networks transform an input by putting it through a
series of hidden layers. Every layer is made up of a set of neurons, and each
layer is fully connected to all neurons in the layer before. Finally, there is a
last fully-connected layer, the output layer, that represents the predictions.
Convolutional Neural Networks are a bit different. First of all, the layers are
organised in 3 dimensions: width, height and depth. Further, the neurons in one
layer do not connect to all the neurons in the next layer but only to a small region
of it. Lastly, the final output will be reduced to a single vector of probability scores,
organized along the depth dimension.
CNNs have two components:
The Hidden layers/Feature extraction part
In this part, the network will perform a series of convolutions and pooling
operations during which the features are detected. If you had a picture of a zebra,
this is the part where the network would recognise its stripes, two ears, and four
legs.
The Classification part
Here, the fully connected layers will serve as a classifier on top of these extracted
features. They will assign a probability for the object on the image being what the
algorithm predicts it is.
Feature extraction
Convolution is one of the main building blocks of a CNN. The term convolution
refers to the mathematical combination of two functions to produce a third
function. It merges two sets of information.
In the case of a CNN, the convolution is performed on the input data with the use
of a filter or kernel (these terms are used interchangeably) to then produce a
feature map.
We execute a convolution by sliding the filter over the input. At every location,
element-wise multiplication is performed and the result is summed into the
feature map.
To visualise the convolution operation: the filter slides over the input, and
the sum computed at each location goes into the feature map.
The area of the input covered by the filter is also called the receptive field,
a term borrowed from biological neurons. A typical filter size is 3x3.
For the sake of explanation, the operation is described in 2D, but in reality
convolutions are performed in 3D. Each image is represented as a 3D matrix with
a dimension for width, height, and depth. Depth is a dimension because of the
colour channels used in an image (RGB).
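A minimal NumPy sketch of the 2D case described above, using a toy 5x5 input and a 3x3 averaging filter (both are illustrative values):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (strictly, cross-correlation, as in most CNN
    libraries): slide the kernel over the image, multiply element-wise,
    and sum each window into the feature map."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    fmap = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = image[i:i + kh, j:j + kw]
            fmap[i, j] = np.sum(window * kernel)
    return fmap

img = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 "image"
k = np.ones((3, 3)) / 9.0                       # 3x3 averaging filter
out = conv2d(img, k)                            # 3x3 feature map
```

Note how the 5x5 input shrinks to a 3x3 feature map, which is exactly the shrinking that padding (below) counteracts.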
We perform numerous convolutions on our input, where each operation uses a
different filter. This results in different feature maps. In the end, we take all of
these feature maps and put them together as the final output of the convolution
layer.
Just like any other Neural Network, we use an activation function to make our
output non-linear. In the case of a Convolutional Neural Network, the output of the
convolution will be passed through the activation function. This could be the ReLU
activation function.
Stride is the size of the step the convolution filter moves each time. The
stride size is usually 1, meaning the filter slides pixel by pixel. By
increasing the stride size, the filter slides over the input with a larger
interval and thus has less overlap between windows.
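The effect of the stride on the feature map size can be sketched as follows (the 7x7 input and all-ones filter are illustrative values):

```python
import numpy as np

def conv2d_stride(image, kernel, stride=1):
    """Valid convolution where the filter jumps `stride` pixels per step."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    fmap = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride
            fmap[i, j] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return fmap

img = np.ones((7, 7))  # toy input
k = np.ones((3, 3))    # toy filter
fmap1 = conv2d_stride(img, k, stride=1)  # 5x5 feature map
fmap2 = conv2d_stride(img, k, stride=2)  # 3x3 feature map: fewer steps
```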
Because the size of the feature map is always smaller than the input, we have to
do something to prevent our feature map from shrinking. This is where we use
padding.
A layer of zero-value pixels is added to surround the input with zeros so that our
feature map will not shrink. In addition to keeping the spatial size constant after
performing convolution, padding also improves performance and makes sure the
kernel and stride size will fit in the input.
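Zero padding can be sketched with NumPy's pad function; the 5x5 input and 3x3 kernel size below are illustrative:

```python
import numpy as np

# Surround a toy 5x5 input with a border of zero-value pixels so a 3x3
# filter produces a feature map of the same spatial size as the input.
img = np.ones((5, 5))
padded = np.pad(img, pad_width=1, mode="constant", constant_values=0)

kernel_size = 3
out_size = padded.shape[0] - kernel_size + 1  # valid conv on the padded input
# out_size equals the original width of 5: the feature map no longer shrinks
```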
It is common to add a pooling layer after a convolution layer. The function of
pooling is to continuously reduce the dimensionality, which reduces the number
of parameters and the computation in the network. This shortens the training
time and controls overfitting.
The most frequent type of pooling is max pooling, which takes the maximum value
in each window. These window sizes need to be specified beforehand. This
decreases the feature map size while at the same time keeping the significant
information.
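Max pooling over non-overlapping 2x2 windows can be sketched as below (the feature map values are illustrative):

```python
import numpy as np

def max_pool(fmap, size=2):
    """Max pooling with stride equal to the window size: keep only the
    largest value in each non-overlapping window."""
    h, w = fmap.shape
    oh, ow = h // size, w // size
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = fmap[i * size:(i + 1) * size,
                             j * size:(j + 1) * size].max()
    return out

f = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [6, 0, 7, 8],
              [2, 1, 3, 4]], dtype=float)
pooled = max_pool(f)  # 4x4 feature map shrinks to 2x2, keeping the maxima
```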
Thus when using a CNN, the four important hyperparameters we have to
decide on are:
the kernel size
the filter count (that is, how many filters do we want to use)
stride (how big are the steps of the filter)
padding
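Together, these hyperparameters determine the spatial size of the output; a commonly used formula is O = (W - K + 2P) / S + 1, sketched below (the 224-pixel input width is an illustrative choice, not a value from our model):

```python
# O = (W - K + 2P) / S + 1 for input width W, kernel size K,
# padding P, and stride S.
def conv_output_size(input_size, kernel_size, padding, stride):
    return (input_size - kernel_size + 2 * padding) // stride + 1

size_same = conv_output_size(224, 3, 1, 1)     # padding 1 keeps the size: 224
size_strided = conv_output_size(224, 3, 1, 2)  # stride 2 roughly halves it: 112
```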
Classification
After the convolution and pooling layers, our classification part consists of a few
fully connected layers. However, these fully connected layers can only accept
one-dimensional data. To convert our 3D data to 1D, we use a flatten operation,
which rearranges the 3D volume into a 1D vector.
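Flattening can be sketched on a small, illustrative 2x2x3 volume with NumPy:

```python
import numpy as np

# Rearrange a toy 2x2x3 activation volume (height x width x depth)
# into a 1D vector that a fully connected layer can accept.
volume = np.arange(2 * 2 * 3).reshape(2, 2, 3)
flat = volume.flatten()  # 12 values laid out in a single row
```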
The last layers of a Convolutional NN are fully connected layers. Neurons in a fully
connected layer have full connections to all the activations in the previous layer.
This part is in principle the same as a regular Neural Network.
Training
Training a CNN works in the same way as a regular neural network, using
backpropagation and gradient descent. However, the mathematics is a bit more
involved here because of the convolution operations.
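A minimal sketch of the gradient descent loop, shown here on a single linear neuron with squared loss rather than a full CNN (the data is synthetic); the update rule, weights minus learning rate times gradient, is the same idea used when training the full network:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))        # synthetic inputs
true_w = np.array([1.0, -2.0, 0.5])  # weights we hope to recover
y = x @ true_w                       # synthetic targets

w = np.zeros(3)  # start from zero weights
lr = 0.1         # learning rate
for _ in range(200):
    pred = x @ w
    grad = 2 * x.T @ (pred - y) / len(y)  # gradient of the MSE loss
    w -= lr * grad                        # step down along the gradient
# After enough steps, w has converged to true_w.
```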
Summary
In summary, CNNs are especially useful for image classification and recognition.
They have two main parts: a feature extraction part and a classification part. The
main special technique in CNNs is convolution, where a filter slides over the input
and multiplies it with the local input region, summing the result into the
feature map. In the end, our goal is to feed new plant images to our CNN so it
can output a probability for the class it predicts.
Hardware technologies
For the real-world implementation, we require a processing unit and a data
acquisition unit. A mobile camera, or any other camera, can serve as the data
acquisition unit to capture inputs, which are then fed to the processing unit
that runs the CNN algorithm and outputs the classification.
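The capture-preprocess-classify pipeline can be sketched as below; predict_diseased here is a hypothetical stand-in for the trained CNN, not our actual model:

```python
import numpy as np

def preprocess(image):
    """Scale 8-bit pixel values to [0, 1] before feeding the network."""
    return image.astype(float) / 255.0

def predict_diseased(image):
    # Hypothetical stand-in: a real deployment would run the trained
    # CNN here and return its class probability instead.
    score = image.mean()
    return "diseased" if score > 0.5 else "normal"

frame = np.full((64, 64, 3), 200, dtype=np.uint8)  # fake camera capture
label = predict_diseased(preprocess(frame))
```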