
22MECE05 exp7

3EC32D306_Machine Learning for Embedded Systems
Exp No: 7
Title: Design a custom CNN architecture and implement transfer learning / pretrained models
Objective: To build a convolutional neural network model for image classification using
Python and MATLAB
Learning Outcomes:
→ Build a convolutional neural network
→ Classify the given data having multiple parameters using a neural network
→ Build a digit classifier using a convolutional neural network
Task 1: MNIST Digit Classification using CNN in Python
# baseline cnn model for mnist
from numpy import mean
from numpy import std
from matplotlib import pyplot
from sklearn.model_selection import KFold
from keras.datasets import mnist
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten
from keras.optimizers import SGD

# load train and test dataset
def load_dataset():
    # load dataset
    (trainX, trainY), (testX, testY) = mnist.load_data()
    # reshape dataset to have a single channel
    trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
    testX = testX.reshape((testX.shape[0], 28, 28, 1))
    # one hot encode target values
    trainY = to_categorical(trainY)
    testY = to_categorical(testY)
    return trainX, trainY, testX, testY
# scale pixels
def prep_pixels(train, test):
    # convert from integers to floats
    train_norm = train.astype('float32')
    test_norm = test.astype('float32')
    # normalize to range 0-1
    train_norm = train_norm / 255.0
    test_norm = test_norm / 255.0
    # return normalized images
    return train_norm, test_norm

# define cnn model
def define_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(10, activation='softmax'))
    # compile model ('learning_rate' replaces the older 'lr' argument in recent Keras)
    opt = SGD(learning_rate=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# evaluate a model using k-fold cross-validation
def evaluate_model(dataX, dataY, n_folds=5):
    scores, histories = list(), list()
    # prepare cross validation
    kfold = KFold(n_folds, shuffle=True, random_state=1)
    # enumerate splits
    for train_ix, test_ix in kfold.split(dataX):
        # define model
        model = define_model()
        # select rows for train and test
        trainX, trainY, testX, testY = dataX[train_ix], dataY[train_ix], dataX[test_ix], dataY[test_ix]
        # fit model
        history = model.fit(trainX, trainY, epochs=10, batch_size=32, validation_data=(testX, testY), verbose=0)
        # evaluate model
        _, acc = model.evaluate(testX, testY, verbose=0)
        print('> %.3f' % (acc * 100.0))
        # store scores
        scores.append(acc)
        histories.append(history)
    return scores, histories

# plot diagnostic learning curves
def summarize_diagnostics(histories):
    for i in range(len(histories)):
        # plot loss
        pyplot.subplot(2, 1, 1)
        pyplot.title('Cross Entropy Loss')
        pyplot.plot(histories[i].history['loss'], color='blue', label='train')
        pyplot.plot(histories[i].history['val_loss'], color='orange', label='test')
        # plot accuracy
        pyplot.subplot(2, 1, 2)
        pyplot.title('Classification Accuracy')
        pyplot.plot(histories[i].history['accuracy'], color='blue', label='train')
        pyplot.plot(histories[i].history['val_accuracy'], color='orange', label='test')
    pyplot.show()

# summarize model performance
def summarize_performance(scores):
    # print summary
    print('Accuracy: mean=%.3f std=%.3f, n=%d' % (mean(scores)*100, std(scores)*100, len(scores)))
    # box and whisker plots of results
    pyplot.boxplot(scores)
    pyplot.show()
# run the test harness for evaluating a model
def run_test_harness():
    # load dataset
    trainX, trainY, testX, testY = load_dataset()
    # prepare pixel data
    trainX, testX = prep_pixels(trainX, testX)
    # evaluate model
    scores, histories = evaluate_model(trainX, trainY)
    # learning curves
    summarize_diagnostics(histories)
    # summarize estimated performance
    summarize_performance(scores)

# entry point, run the test harness
run_test_harness()
# fit a final model on the full training set and save it for later use
trainX, trainY, testX, testY = load_dataset()
trainX, testX = prep_pixels(trainX, testX)
model = define_model()
model.fit(trainX, trainY, epochs=10, batch_size=32, verbose=0)
# save model
model.save('final_model.h5')
Output:
Fig. 1
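The final-model snippet above only fits the network and saves it; it never scores it on the held-out MNIST test set. A minimal sketch of that extra step is given below, reusing the testX/testY arrays already returned by load_dataset() and prep_pixels(); the printed accuracy is whatever a particular run produces, not a reported result.

# evaluate the saved final model on the held-out MNIST test set
from keras.models import load_model

final_model = load_model('final_model.h5')            # reload the model saved above
_, acc = final_model.evaluate(testX, testY, verbose=0)
print('Test accuracy: %.3f' % (acc * 100.0))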
Task 1.1: Evaluate the model and make a prediction for a sample image
# make a prediction for a new image
from numpy import argmax
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.models import load_model

# load and prepare the image
def load_image(filename):
    # load the image as 28x28 grayscale ('color_mode' replaces the older 'grayscale' flag)
    img = load_img(filename, color_mode='grayscale', target_size=(28, 28))
    # convert to array
    img = img_to_array(img)
    # reshape into a single sample with 1 channel
    img = img.reshape(1, 28, 28, 1)
    # prepare pixel data
    img = img.astype('float32')
    img = img / 255.0
    return img

# load an image and predict the class
def run_example():
    # load the image
    img = load_image('sample_image.png')
    # load model
    model = load_model('final_model.h5')
    # predict the class (predict_classes was removed from recent Keras, so take the argmax of the probabilities)
    digit = argmax(model.predict(img), axis=-1)
    print(digit[0])

# entry point, run the example
run_example()
Output:
• Due to errors while saving the Task 1 model, I was not able to obtain the expected result for Task 1.1, which is to print the predicted digit on the console.
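A likely (but unconfirmed, since the exact error message is not recorded here) cause of the saving problem is the Keras/TensorFlow version: saving to a .h5 file needs the h5py package, and older APIs such as predict_classes, lr and grayscale were removed in recent releases. One possible workaround, assuming a recent Keras version, is to save in the native format instead:

# possible workaround: save/reload in the native Keras format, avoiding the HDF5/h5py dependency
# (assumes a recent Keras/TensorFlow; 'model' is the trained model from Task 1)
model.save('final_model.keras')
from keras.models import load_model
reloaded = load_model('final_model.keras')
reloaded.summary()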
Task 2: MNIST Digit Classification using CNN in MATLAB
clear; close all; clc
digitDatasetPath = fullfile(matlabroot,'toolbox','nnet','nndemos', ...
    'nndatasets','DigitDataset');
imds = imageDatastore(digitDatasetPath, ...
    'IncludeSubfolders',true,'LabelSource','foldernames');

figure;
perm = randperm(10000,20);
for i = 1:20
    subplot(4,5,i);
    imshow(imds.Files{perm(i)});
end

numTrainFiles = 750;
[imdsTrain,imdsValidation] = splitEachLabel(imds,numTrainFiles,'randomize');

layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(3,8,'Padding','same')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2,'Stride',2)
    convolution2dLayer(3,16,'Padding','same')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2,'Stride',2)
    convolution2dLayer(3,32,'Padding','same')
    batchNormalizationLayer
    reluLayer
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];

options = trainingOptions('sgdm', ...
    'InitialLearnRate',0.01, ...
    'MaxEpochs',4, ...
    'Shuffle','every-epoch', ...
    'ValidationData',imdsValidation, ...
    'ValidationFrequency',30, ...
    'Verbose',false, ...
    'Plots','training-progress');

net = trainNetwork(imdsTrain,layers,options);
YPred = classify(net,imdsValidation);
YValidation = imdsValidation.Labels;
accuracy = sum(YPred == YValidation)/numel(YValidation)
Output:
Fig. 2
Fig. 3
Task 3: For the model prepared in Task 1, modify the model to increase the accuracy.
(3.1) Improve learning by changing the learning algorithm: change the learning rate and use batch normalization.
# cnn model with batch normalization for mnist
from numpy import mean
from numpy import std
from matplotlib import pyplot
from sklearn.model_selection import KFold
from keras.datasets import mnist
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten
from keras.optimizers import SGD
from keras.layers import BatchNormalization

# load train and test dataset
def load_dataset():
    # load dataset
    (trainX, trainY), (testX, testY) = mnist.load_data()
    # reshape dataset to have a single channel
    trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
    testX = testX.reshape((testX.shape[0], 28, 28, 1))
    # one hot encode target values
    trainY = to_categorical(trainY)
    testY = to_categorical(testY)
    return trainX, trainY, testX, testY

# scale pixels
def prep_pixels(train, test):
    # convert from integers to floats
    train_norm = train.astype('float32')
    test_norm = test.astype('float32')
    # normalize to range 0-1
    train_norm = train_norm / 255.0
    test_norm = test_norm / 255.0
    # return normalized images
    return train_norm, test_norm
# define cnn model
def define_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(BatchNormalization())
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(BatchNormalization())
    model.add(Dense(10, activation='softmax'))
    # compile model ('learning_rate' replaces the older 'lr' argument in recent Keras)
    opt = SGD(learning_rate=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# evaluate a model using k-fold cross-validation
def evaluate_model(dataX, dataY, n_folds=5):
    scores, histories = list(), list()
    # prepare cross validation
    kfold = KFold(n_folds, shuffle=True, random_state=1)
    # enumerate splits
    for train_ix, test_ix in kfold.split(dataX):
        # define model
        model = define_model()
        # select rows for train and test
        trainX, trainY, testX, testY = dataX[train_ix], dataY[train_ix], dataX[test_ix], dataY[test_ix]
        # fit model
        history = model.fit(trainX, trainY, epochs=10, batch_size=32, validation_data=(testX, testY), verbose=0)
        # evaluate model
        _, acc = model.evaluate(testX, testY, verbose=0)
        print('> %.3f' % (acc * 100.0))
        # store scores
        scores.append(acc)
        histories.append(history)
    return scores, histories
# plot diagnostic learning curves
def summarize_diagnostics(histories):
    for i in range(len(histories)):
        # plot loss
        pyplot.subplot(2, 1, 1)
        pyplot.title('Cross Entropy Loss')
        pyplot.plot(histories[i].history['loss'], color='blue', label='train')
        pyplot.plot(histories[i].history['val_loss'], color='orange', label='test')
        # plot accuracy
        pyplot.subplot(2, 1, 2)
        pyplot.title('Classification Accuracy')
        pyplot.plot(histories[i].history['accuracy'], color='blue', label='train')
        pyplot.plot(histories[i].history['val_accuracy'], color='orange', label='test')
    pyplot.show()

# summarize model performance
def summarize_performance(scores):
    # print summary
    print('Accuracy: mean=%.3f std=%.3f, n=%d' % (mean(scores)*100, std(scores)*100, len(scores)))
    # box and whisker plots of results
    pyplot.boxplot(scores)
    pyplot.show()

# run the test harness for evaluating a model
def run_test_harness():
    # load dataset
    trainX, trainY, testX, testY = load_dataset()
    # prepare pixel data
    trainX, testX = prep_pixels(trainX, testX)
    # evaluate model
    scores, histories = evaluate_model(trainX, trainY)
    # learning curves
    summarize_diagnostics(histories)
    # summarize estimated performance
    summarize_performance(scores)

# entry point, run the test harness
run_test_harness()
Output:
Fig. 4
(3.2) Increase the depth of the model. Two common approaches involve:
▪ Changing the capacity of the feature-extraction part of the model.
▪ Changing the capacity or function of the classifier part of the model.
# deeper cnn model for mnist
from numpy import mean
from numpy import std
from matplotlib import pyplot
from sklearn.model_selection import KFold
from keras.datasets import mnist
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten
from keras.optimizers import SGD

# load train and test dataset
def load_dataset():
    # load dataset
    (trainX, trainY), (testX, testY) = mnist.load_data()
    # reshape dataset to have a single channel
    trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
    testX = testX.reshape((testX.shape[0], 28, 28, 1))
    # one hot encode target values
    trainY = to_categorical(trainY)
    testY = to_categorical(testY)
    return trainX, trainY, testX, testY

# scale pixels
def prep_pixels(train, test):
    # convert from integers to floats
    train_norm = train.astype('float32')
    test_norm = test.astype('float32')
    # normalize to range 0-1
    train_norm = train_norm / 255.0
    test_norm = test_norm / 255.0
    # return normalized images
    return train_norm, test_norm
# define cnn model
def define_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'))
    model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(10, activation='softmax'))
    # compile model ('learning_rate' replaces the older 'lr' argument in recent Keras)
    opt = SGD(learning_rate=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
# evaluate a model using k-fold cross-validation
def evaluate_model(dataX, dataY, n_folds=5):
    scores, histories = list(), list()
    # prepare cross validation
    kfold = KFold(n_folds, shuffle=True, random_state=1)
    # enumerate splits
    for train_ix, test_ix in kfold.split(dataX):
        # define model
        model = define_model()
        # select rows for train and test
        trainX, trainY, testX, testY = dataX[train_ix], dataY[train_ix], dataX[test_ix], dataY[test_ix]
        # fit model
        history = model.fit(trainX, trainY, epochs=10, batch_size=32, validation_data=(testX, testY), verbose=0)
        # evaluate model
        _, acc = model.evaluate(testX, testY, verbose=0)
        print('> %.3f' % (acc * 100.0))
        # store scores
        scores.append(acc)
        histories.append(history)
    return scores, histories
# plot diagnostic learning curves
def summarize_diagnostics(histories):
    for i in range(len(histories)):
        # plot loss
        pyplot.subplot(2, 1, 1)
        pyplot.title('Cross Entropy Loss')
        pyplot.plot(histories[i].history['loss'], color='blue', label='train')
        pyplot.plot(histories[i].history['val_loss'], color='orange', label='test')
        # plot accuracy
        pyplot.subplot(2, 1, 2)
        pyplot.title('Classification Accuracy')
        pyplot.plot(histories[i].history['accuracy'], color='blue', label='train')
        pyplot.plot(histories[i].history['val_accuracy'], color='orange', label='test')
    pyplot.show()
# summarize model performance
def summarize_performance(scores):
    # print summary
    print('Accuracy: mean=%.3f std=%.3f, n=%d' % (mean(scores)*100, std(scores)*100, len(scores)))
    # box and whisker plots of results
    pyplot.boxplot(scores)
    pyplot.show()

# run the test harness for evaluating a model
def run_test_harness():
    # load dataset
    trainX, trainY, testX, testY = load_dataset()
    # prepare pixel data
    trainX, testX = prep_pixels(trainX, testX)
    # evaluate model
    scores, histories = evaluate_model(trainX, trainY)
    # learning curves
    summarize_diagnostics(histories)
    # summarize estimated performance
    summarize_performance(scores)

# entry point, run the test harness
run_test_harness()
Output:
Fig. 5
Exercise:
1. How do you evaluate the performance of a CNN?
→ Tune parameters: To improve CNN model performance, we can tune parameters such as the number of epochs and the learning rate. The number of epochs clearly affects performance: training for more epochs generally improves it, but some experimentation is needed to choose the number of epochs and the learning rate, since after a certain point there is no further reduction in training loss or improvement in training accuracy. We can also add a dropout layer to the CNN model, and the optimizer chosen when compiling the model should suit the application (see the sketch after this list).
→ Image data augmentation: Augmentation parameters commonly used to increase the number of samples include zoom, shear, rotation, and custom pre-processing functions. Using these parameters generates images with those attributes on the fly while the deep learning model is training, typically increasing the effective dataset size by roughly 3x to 4x. Another advantage is that, since a CNN is not rotation invariant, augmentation lets us add rotated versions of the images to the dataset.
→ Deeper network topology: Deeper networks capture the natural “hierarchy” present everywhere in nature. A convnet, for example, captures low-level features in its first layer, slightly richer but still low-level features in the next layer, and object parts and simple structures at higher layers. The advantage of multiple layers is that they can learn features at various levels of abstraction.
→ Handle the overfitting and underfitting problem:
▪ Overfitting refers to a model that models the training data too well: it achieves very high accuracy on the training data but much lower accuracy on the test data. In other words, an overfit model has good memorization ability but poor generalization ability, and does not generalize well from the training data to unseen data.
▪ Underfitting refers to a model that performs poorly on both the training and the test data. In technical terms, a model that overfits has low bias and high variance, while a model that underfits has high bias and low variance. In any modelling there is a trade-off between bias and variance, and when we build models we try to achieve the best balance.
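As a concrete illustration of the first two points, here is a hedged sketch (not part of the original experiment) of how the Task 1 model could add a dropout layer, switch to the Adam optimizer, and train with Keras ImageDataGenerator augmentation. The dropout rate, rotation range, and zoom range are illustrative choices, not tuned values.

# illustrative variant of the Task 1 model: dropout + Adam + image augmentation
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.optimizers import Adam
from keras.preprocessing.image import ImageDataGenerator

def define_regularized_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Dropout(0.25))   # drop 25% of activations to reduce overfitting (illustrative rate)
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(10, activation='softmax'))
    model.compile(optimizer=Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# augment the training images on the fly (small rotations, shifts and zooms; MNIST digits should not be flipped)
datagen = ImageDataGenerator(rotation_range=10, zoom_range=0.1, width_shift_range=0.1, height_shift_range=0.1)
model = define_regularized_model()
# trainX, trainY, testX, testY come from load_dataset()/prep_pixels() in Task 1
model.fit(datagen.flow(trainX, trainY, batch_size=32), epochs=10, validation_data=(testX, testY), verbose=0)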
2. How can you decide the size and number of filters in a convolutional layer?
→ Broadly, kernel sizes can be divided into smaller and larger ones. Smaller kernels are 1x1, 2x2, 3x3 and 4x4, whereas larger kernels are 5x5 and above; in practice 2D convolutions rarely go beyond 5x5, because very large kernels are computationally expensive and make training much slower.
→ One reason to prefer small kernels over a fully connected network is that they reduce computational cost, and weight sharing means fewer weights to update during back-propagation.
→ A 1x1 kernel is used only for dimensionality reduction, i.e. to reduce the number of channels. It captures the interaction of the input channels at a single pixel of the feature map, so it is ruled out for general feature extraction: the features it extracts are extremely fine-grained and local, with no information from neighbouring pixels.
→ 2x2 and 4x4 kernels are generally not preferred because odd-sized filters divide the previous layer's pixels symmetrically around the output pixel. When this symmetry is absent, as with even-sized kernels such as 2x2 and 4x4, distortions appear across the layers, which is why these sizes are avoided.
→ Therefore, 3x3 is the optimal choice.
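To make the parameter-count argument concrete, the hedged sketch below (layer sizes are illustrative, not taken from the experiment) compares one 5x5 convolution with a stack of two 3x3 convolutions, which cover the same 5x5 receptive field with fewer weights:

# compare parameter counts: one 5x5 conv vs two stacked 3x3 convs (same 5x5 receptive field)
from keras.models import Sequential
from keras.layers import Conv2D

# assume 32 input channels and 32 output channels in both cases (illustrative sizes)
single_5x5 = Sequential([Conv2D(32, (5, 5), input_shape=(24, 24, 32))])
stacked_3x3 = Sequential([
    Conv2D(32, (3, 3), input_shape=(24, 24, 32)),
    Conv2D(32, (3, 3)),
])
single_5x5.summary()    # 5*5*32*32 + 32 = 25,632 parameters
stacked_3x3.summary()   # 2*(3*3*32*32 + 32) = 18,496 parameters, plus an extra non-linearity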
3. What is the need of a softmax layer?
→ The softmax function is a function that turns a vector of K real values into a vector of
K real values that sum to 1. The input values can be positive, negative, zero, or greater
than one, but the softmax transforms them into values between 0 and 1, so that they
can be interpreted as probabilities. If one of the inputs is small or negative, the
softmax turns it into a small probability, and if an input is large, then it turns it into a
large probability, but it will always remain between 0 and 1.
→ The softmax function is sometimes called the softargmax function, or multi-class
logistic regression. This is because the softmax is a generalization of logistic
regression that can be used for multi-class classification, and its formula is very
similar to the sigmoid function which is used for logistic regression. The softmax
function can be used in a classifier only when the classes are mutually exclusive.
→ The softmax is defined as softmax(z)_i = exp(z_i) / Σ_j exp(z_j), where the z_i values are the elements of the input vector and can take any real value. The denominator is the normalization term, which ensures that all the output values of the function sum to 1 and thus constitute a valid probability distribution.
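A minimal numerical sketch of this definition (plain NumPy; the input vector is an arbitrary example) shows the outputs landing between 0 and 1 and summing to 1:

# softmax as defined above: exponentiate, then normalize so the outputs sum to 1
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability; the result is unchanged
    return e / e.sum()

z = np.array([2.0, -1.0, 0.5])   # arbitrary example logits
p = softmax(z)
print(p, p.sum())                # approximately [0.79 0.04 0.18], sum = 1.0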
4. Differentiate between a fully connected layer and a convolutional layer.
Fully connected neural network:
→ A fully connected neural network consists of a series of fully connected layers that connect every neuron in one layer to every neuron in the next layer.
→ The major advantage of fully connected networks is that they are “structure agnostic”
i.e. there are no special assumptions needed to be made about the input.
→ While being structure agnostic makes fully connected networks very broadly
applicable, such networks do tend to have weaker performance than special-purpose
networks tuned to the structure of a problem space.
Convolutional Neural Network:
→ CNN architectures make the explicit assumption that the inputs are images, which
allows encoding certain properties into the model architecture.
→ A simple CNN is a sequence of layers, and every layer of a CNN transforms one
volume of activations to another through a differentiable function.
→ Three main types of layers are used to build CNN architecture: Convolutional Layer,
Pooling Layer, and Fully-Connected Layer.
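As a hedged illustration of this difference (layer sizes chosen arbitrarily, not from the experiment), the sketch below compares the parameter count of a fully connected layer applied to a flattened 28x28 image with that of a small convolutional layer on the same input; the convolution shares its 3x3 weights across all positions:

# fully connected vs convolutional layer on a 28x28 grayscale input (illustrative sizes)
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D

dense_net = Sequential([
    Flatten(input_shape=(28, 28, 1)),
    Dense(100),   # every input pixel connects to every unit: 784*100 + 100 = 78,500 parameters
])
conv_net = Sequential([
    Conv2D(32, (3, 3), input_shape=(28, 28, 1)),   # 3*3*1*32 + 32 = 320 parameters, shared across the image
])
dense_net.summary()
conv_net.summary()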
Conclusion:
• From this experiment I learnt about convolutional neural networks, which have one or more convolutional layers and are used mainly for image processing, classification, segmentation, and other auto-correlated data. Between the input and the output of a CNN model, filters (also known as CNN kernels) are applied to produce the results. Here, I built a CNN for MNIST digit classification and observed the output, and I also checked its accuracy by modifying the model in different ways. I also performed the CNN task in MATLAB and observed its accuracy.