3EC32D306_Machine Learning for Embedded Systems
22MECE05_Helina

Exp No: 7
Title: Design a custom CNN architecture and implement transfer learning / pretrained models

Objective:
To build a convolutional neural network model for image classification using Python and MATLAB.

Learning Outcomes:
→ Build a convolutional neural network
→ Classify the given data having multiple parameters using a neural network
→ Build a digit classifier using a convolutional neural network

Task 1: MNIST Digit Classification using CNN in Python

# baseline cnn model for mnist
from numpy import mean
from numpy import std
from matplotlib import pyplot
from sklearn.model_selection import KFold
from keras.datasets import mnist
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten
from keras.optimizers import SGD

# load train and test dataset
def load_dataset():
    # load dataset
    (trainX, trainY), (testX, testY) = mnist.load_data()
    # reshape dataset to have a single channel
    trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
    testX = testX.reshape((testX.shape[0], 28, 28, 1))
    # one hot encode target values
    trainY = to_categorical(trainY)
    testY = to_categorical(testY)
    return trainX, trainY, testX, testY

# scale pixels
def prep_pixels(train, test):
    # convert from integers to floats
    train_norm = train.astype('float32')
    test_norm = test.astype('float32')
    # normalize to range 0-1
    train_norm = train_norm / 255.0
    test_norm = test_norm / 255.0
    # return normalized images
    return train_norm, test_norm

# define cnn model
def define_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(10, activation='softmax'))
    # compile model
    opt = SGD(lr=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
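# optional sanity check of the custom architecture: instantiating the model and printing
# its layer summary shows the conv/pool/dense stack and the output shapes before the full
# cross-validation run below.
# define_model().summary()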
# evaluate a model using k-fold cross-validation
def evaluate_model(dataX, dataY, n_folds=5):
    scores, histories = list(), list()
    # prepare cross validation
    kfold = KFold(n_folds, shuffle=True, random_state=1)
    # enumerate splits
    for train_ix, test_ix in kfold.split(dataX):
        # define model
        model = define_model()
        # select rows for train and test
        trainX, trainY, testX, testY = dataX[train_ix], dataY[train_ix], dataX[test_ix], dataY[test_ix]
        # fit model
        history = model.fit(trainX, trainY, epochs=10, batch_size=32, validation_data=(testX, testY), verbose=0)
        # evaluate model
        _, acc = model.evaluate(testX, testY, verbose=0)
        print('> %.3f' % (acc * 100.0))
        # stores scores
        scores.append(acc)
        histories.append(history)
    return scores, histories

# plot diagnostic learning curves
def summarize_diagnostics(histories):
    for i in range(len(histories)):
        # plot loss
        pyplot.subplot(2, 1, 1)
        pyplot.title('Cross Entropy Loss')
        pyplot.plot(histories[i].history['loss'], color='blue', label='train')
        pyplot.plot(histories[i].history['val_loss'], color='orange', label='test')
        # plot accuracy
        pyplot.subplot(2, 1, 2)
        pyplot.title('Classification Accuracy')
        pyplot.plot(histories[i].history['accuracy'], color='blue', label='train')
        pyplot.plot(histories[i].history['val_accuracy'], color='orange', label='test')
    pyplot.show()

# summarize model performance
def summarize_performance(scores):
    # print summary
    print('Accuracy: mean=%.3f std=%.3f, n=%d' % (mean(scores)*100, std(scores)*100, len(scores)))
    # box and whisker plots of results
    pyplot.boxplot(scores)
    pyplot.show()

# run the test harness for evaluating a model
def run_test_harness():
    # load dataset
    trainX, trainY, testX, testY = load_dataset()
    # prepare pixel data
    trainX, testX = prep_pixels(trainX, testX)
    # evaluate model
    scores, histories = evaluate_model(trainX, trainY)
    # learning curves
    summarize_diagnostics(histories)
    # summarize estimated performance
    summarize_performance(scores)

# entry point, run the test harness
run_test_harness()

# train a final model on the full training set and save it for later prediction
trainX, trainY, testX, testY = load_dataset()
trainX, testX = prep_pixels(trainX, testX)
model = define_model()
# fit model
model.fit(trainX, trainY, epochs=10, batch_size=32, verbose=0)
# save model
model.save('final_model.h5')

Output:
Fig. 1

Task 1.1: Evaluate the model and make a prediction for a sample image

# make a prediction for a new image
from numpy import argmax
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.models import load_model

# load and prepare the image
def load_image(filename):
    # load the image as 28x28 grayscale
    img = load_img(filename, color_mode='grayscale', target_size=(28, 28))
    # convert to array
    img = img_to_array(img)
    # reshape into a single sample with 1 channel
    img = img.reshape(1, 28, 28, 1)
    # prepare pixel data
    img = img.astype('float32')
    img = img / 255.0
    return img

# load an image and predict the class
def run_example():
    # load the image
    img = load_image('sample_image.png')
    # load model
    model = load_model('final_model.h5')
    # predict the class: take the argmax of the softmax output
    # (predict_classes is unavailable in recent Keras)
    digit = argmax(model.predict(img), axis=-1)
    print(digit[0])

# entry point, run the example
run_example()

Output:
• Because of errors while saving the model in Task 1, I was not able to obtain the expected result for Task 1.1, which is to print the predicted digit on the console.

Task 2: MNIST Digit Classification using CNN in MATLAB

clear; close all; clc
digitDatasetPath = fullfile(matlabroot,'toolbox','nnet','nndemos', ...
    'nndatasets','DigitDataset');
imds = imageDatastore(digitDatasetPath, ...
    'IncludeSubfolders',true,'LabelSource','foldernames');
figure;
perm = randperm(10000,20);
for i = 1:20
    subplot(4,5,i);
    imshow(imds.Files{perm(i)});
end
numTrainFiles = 750;
[imdsTrain,imdsValidation] = splitEachLabel(imds,numTrainFiles,'randomize');
layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(3,8,'Padding','same')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2,'Stride',2)
    convolution2dLayer(3,16,'Padding','same')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2,'Stride',2)
    convolution2dLayer(3,32,'Padding','same')
    batchNormalizationLayer
    reluLayer
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];
options = trainingOptions('sgdm', ...
    'InitialLearnRate',0.01, ...
    'MaxEpochs',4, ...
    'Shuffle','every-epoch', ...
    'ValidationData',imdsValidation, ...
    'ValidationFrequency',30, ...
    'Verbose',false, ...
    'Plots','training-progress');
net = trainNetwork(imdsTrain,layers,options);
YPred = classify(net,imdsValidation);
YValidation = imdsValidation.Labels;
accuracy = sum(YPred == YValidation)/numel(YValidation)

Output:
Fig. 2
Fig. 3
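Task 2 computes the validation accuracy explicitly. A minimal sketch of the analogous check for the Python model from Task 1 is given below, assuming that 'final_model.h5' was saved successfully (which was not the case in this run):

# evaluate the saved Task 1 model on the held-out MNIST test set
# (sketch only; assumes 'final_model.h5' exists)
from keras.datasets import mnist
from keras.utils import to_categorical
from keras.models import load_model

(_, _), (testX, testY) = mnist.load_data()
testX = testX.reshape((testX.shape[0], 28, 28, 1)).astype('float32') / 255.0
testY = to_categorical(testY)
model = load_model('final_model.h5')
_, acc = model.evaluate(testX, testY, verbose=0)
print('Test accuracy: %.3f' % (acc * 100.0))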
Task 3: For the model prepared in Task 1, modify the model to increase the accuracy.

(3.1) Improve the learning by changing the learning algorithm – change the learning rate, use batch normalization.

# cnn model with batch normalization for mnist
from numpy import mean
from numpy import std
from matplotlib import pyplot
from sklearn.model_selection import KFold
from keras.datasets import mnist
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten
from keras.optimizers import SGD
from keras.layers import BatchNormalization

# load train and test dataset
def load_dataset():
    # load dataset
    (trainX, trainY), (testX, testY) = mnist.load_data()
    # reshape dataset to have a single channel
    trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
    testX = testX.reshape((testX.shape[0], 28, 28, 1))
    # one hot encode target values
    trainY = to_categorical(trainY)
    testY = to_categorical(testY)
    return trainX, trainY, testX, testY

# scale pixels
def prep_pixels(train, test):
    # convert from integers to floats
    train_norm = train.astype('float32')
    test_norm = test.astype('float32')
    # normalize to range 0-1
    train_norm = train_norm / 255.0
    test_norm = test_norm / 255.0
    # return normalized images
    return train_norm, test_norm

# define cnn model
def define_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(BatchNormalization())
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(BatchNormalization())
    model.add(Dense(10, activation='softmax'))
    # compile model
    opt = SGD(lr=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# evaluate a model using k-fold cross-validation
def evaluate_model(dataX, dataY, n_folds=5):
    scores, histories = list(), list()
    # prepare cross validation
    kfold = KFold(n_folds, shuffle=True, random_state=1)
    # enumerate splits
    for train_ix, test_ix in kfold.split(dataX):
        # define model
        model = define_model()
        # select rows for train and test
        trainX, trainY, testX, testY = dataX[train_ix], dataY[train_ix], dataX[test_ix], dataY[test_ix]
        # fit model
        history = model.fit(trainX, trainY, epochs=10, batch_size=32, validation_data=(testX, testY), verbose=0)
        # evaluate model
        _, acc = model.evaluate(testX, testY, verbose=0)
        print('> %.3f' % (acc * 100.0))
        # stores scores
        scores.append(acc)
        histories.append(history)
    return scores, histories

# plot diagnostic learning curves
def summarize_diagnostics(histories):
    for i in range(len(histories)):
        # plot loss
        pyplot.subplot(2, 1, 1)
        pyplot.title('Cross Entropy Loss')
        pyplot.plot(histories[i].history['loss'], color='blue', label='train')
        pyplot.plot(histories[i].history['val_loss'], color='orange', label='test')
        # plot accuracy
        pyplot.subplot(2, 1, 2)
        pyplot.title('Classification Accuracy')
        pyplot.plot(histories[i].history['accuracy'], color='blue', label='train')
        pyplot.plot(histories[i].history['val_accuracy'], color='orange', label='test')
    pyplot.show()

# summarize model performance
def summarize_performance(scores):
    # print summary
    print('Accuracy: mean=%.3f std=%.3f, n=%d' % (mean(scores)*100, std(scores)*100, len(scores)))
    # box and whisker plots of results
    pyplot.boxplot(scores)
    pyplot.show()
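# optional variant for the learning-rate part of this task: swap the compile step in
# define_model() above for, e.g.,
#     opt = SGD(lr=0.05, momentum=0.9)   # 'learning_rate=0.05' on newer Keras versions
# and rerun the same harness below to compare the two settings fold by fold.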
# run the test harness for evaluating a model
def run_test_harness():
    # load dataset
    trainX, trainY, testX, testY = load_dataset()
    # prepare pixel data
    trainX, testX = prep_pixels(trainX, testX)
    # evaluate model
    scores, histories = evaluate_model(trainX, trainY)
    # learning curves
    summarize_diagnostics(histories)
    # summarize estimated performance
    summarize_performance(scores)

# entry point, run the test harness
run_test_harness()

Output:
Fig. 4

(3.2) Increase the depth of the model. Two common approaches involve:
▪ Changing the capacity of the feature extraction part of the model.
▪ Changing the capacity or function of the classifier part of the model.

# deeper cnn model for mnist
from numpy import mean
from numpy import std
from matplotlib import pyplot
from sklearn.model_selection import KFold
from keras.datasets import mnist
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten
from keras.optimizers import SGD

# load train and test dataset
def load_dataset():
    # load dataset
    (trainX, trainY), (testX, testY) = mnist.load_data()
    # reshape dataset to have a single channel
    trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
    testX = testX.reshape((testX.shape[0], 28, 28, 1))
    # one hot encode target values
    trainY = to_categorical(trainY)
    testY = to_categorical(testY)
    return trainX, trainY, testX, testY

# scale pixels
def prep_pixels(train, test):
    # convert from integers to floats
    train_norm = train.astype('float32')
    test_norm = test.astype('float32')
    # normalize to range 0-1
    train_norm = train_norm / 255.0
    test_norm = test_norm / 255.0
    # return normalized images
    return train_norm, test_norm

# define cnn model
def define_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'))
    model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(10, activation='softmax'))
    # compile model
    opt = SGD(lr=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
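# the two stacked 3x3 convolutions above cover an effective 5x5 receptive field while
# using fewer parameters than a single 5x5 filter bank; define_model().summary() can be
# used to confirm the extra layers and their output shapes.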
# evaluate a model using k-fold cross-validation
def evaluate_model(dataX, dataY, n_folds=5):
    scores, histories = list(), list()
    # prepare cross validation
    kfold = KFold(n_folds, shuffle=True, random_state=1)
    # enumerate splits
    for train_ix, test_ix in kfold.split(dataX):
        # define model
        model = define_model()
        # select rows for train and test
        trainX, trainY, testX, testY = dataX[train_ix], dataY[train_ix], dataX[test_ix], dataY[test_ix]
        # fit model
        history = model.fit(trainX, trainY, epochs=10, batch_size=32, validation_data=(testX, testY), verbose=0)
        # evaluate model
        _, acc = model.evaluate(testX, testY, verbose=0)
        print('> %.3f' % (acc * 100.0))
        # stores scores
        scores.append(acc)
        histories.append(history)
    return scores, histories

# plot diagnostic learning curves
def summarize_diagnostics(histories):
    for i in range(len(histories)):
        # plot loss
        pyplot.subplot(2, 1, 1)
        pyplot.title('Cross Entropy Loss')
        pyplot.plot(histories[i].history['loss'], color='blue', label='train')
        pyplot.plot(histories[i].history['val_loss'], color='orange', label='test')
        # plot accuracy
        pyplot.subplot(2, 1, 2)
        pyplot.title('Classification Accuracy')
        pyplot.plot(histories[i].history['accuracy'], color='blue', label='train')
        pyplot.plot(histories[i].history['val_accuracy'], color='orange', label='test')
    pyplot.show()

# summarize model performance
def summarize_performance(scores):
    # print summary
    print('Accuracy: mean=%.3f std=%.3f, n=%d' % (mean(scores)*100, std(scores)*100, len(scores)))
    # box and whisker plots of results
    pyplot.boxplot(scores)
    pyplot.show()

# run the test harness for evaluating a model
def run_test_harness():
    # load dataset
    trainX, trainY, testX, testY = load_dataset()
    # prepare pixel data
    trainX, testX = prep_pixels(trainX, testX)
    # evaluate model
    scores, histories = evaluate_model(trainX, trainY)
    # learning curves
    summarize_diagnostics(histories)
    # summarize estimated performance
    summarize_performance(scores)

# entry point, run the test harness
run_test_harness()

Output:
Fig. 5

Exercise:

1. How do you evaluate the performance of a CNN?
→ Tune parameters: To improve CNN performance we can tune hyperparameters such as the number of epochs and the learning rate. The number of epochs clearly affects performance: up to a point, more epochs improve it, but some experimentation is needed to choose the number of epochs and the learning rate. Once the training loss stops decreasing and the training accuracy stops improving, further epochs add little, and the epoch count can be fixed accordingly. A dropout layer can also be added to the CNN, and a suitable optimizer should be chosen for the application when the model is compiled.
→ Image data augmentation: Augmentation parameters such as zoom, shear, rotation and custom pre-processing functions are commonly used to increase the number of training samples; images with these variations are generated during training of the deep-learning model, typically enlarging the effective dataset by roughly 3x to 4x. A further advantage is that, since a CNN is not rotation invariant, augmentation lets us add rotated versions of the images to the dataset (see the sketch after this answer).
→ Deeper network topology: Deeper networks capture the natural "hierarchy" present in visual data. A convnet, for example, captures low-level features in its first layer, slightly richer but still low-level features in the next layer, and object parts and simple structures in the higher layers. The advantage of multiple layers is that they learn features at several levels of abstraction.
→ Handle overfitting and underfitting:
▪ Overfitting refers to a model that fits the training data too well: it achieves high accuracy on the training data but much lower accuracy on the test data, meaning it memorizes well but generalizes poorly from the training data to unseen data.
▪ Underfitting refers to a model that performs poorly on both the training and the test data. In technical terms, a model that overfits has low bias and high variance, while a model that underfits has high bias and low variance. In any modelling there is a trade-off between bias and variance, and when we build models we try to achieve the best balance.
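A minimal sketch of such augmentation for the Task 1 data is shown below; the parameter values are illustrative only, and trainX/trainY are the arrays prepared by load_dataset() and prep_pixels() in Task 1.

# on-the-fly image augmentation sketch (illustrative parameter values)
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=10,       # small random rotations
                             zoom_range=0.1,          # mild zoom in/out
                             width_shift_range=0.1,   # horizontal shifts
                             height_shift_range=0.1)  # vertical shifts
it = datagen.flow(trainX, trainY, batch_size=32)      # yields augmented batches
# model.fit(it, epochs=10, validation_data=(testX, testY))  # recent Keras accepts the iterator directly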
2. How can you decide the size and number of filters in a convolutional layer?
→ Broadly, kernel sizes are divided into smaller ones (1x1, 2x2, 3x3 and 4x4) and larger ones (5x5 and above); in practice 2-D convolutions rarely go beyond 5x5, because very large kernels make training far slower and more expensive.
→ One reason to prefer small kernels over a fully connected network is that they reduce computational cost and exploit weight sharing, which ultimately leaves fewer weights for back-propagation.
→ A 1x1 kernel is used only for dimensionality reduction, that is, to reduce the number of channels; it captures the interaction of the input channels within a single pixel of the feature map. It is therefore ruled out here, as the extracted features would be very fine-grained and local, with no information from neighbouring pixels.
→ 2x2 and 4x4 are generally not preferred because odd-sized filters divide the pixels of the previous layer symmetrically around the output pixel; without this symmetry, even-sized kernels (2x2 and 4x4) introduce distortions across the layers, which is why they are not used.
→ Therefore, 3x3 is the optimal choice.

3. What is the need of a softmax layer?
→ The softmax function turns a vector of K real values into a vector of K real values that sum to 1. The input values can be positive, negative, zero, or greater than one, but the softmax transforms them into values between 0 and 1, so that they can be interpreted as probabilities. If an input is small or negative, the softmax turns it into a small probability, and if an input is large it becomes a large probability, but the output always remains between 0 and 1.
→ The softmax function is sometimes called the softargmax function, or multi-class logistic regression, because the softmax is a generalization of logistic regression to multi-class classification and its formula is very similar to the sigmoid function used in logistic regression. The softmax function can be used in a classifier only when the classes are mutually exclusive.
→ For an input vector z = (z_1, ..., z_K), the softmax is defined as
    softmax(z)_i = exp(z_i) / Σ_{j=1}^{K} exp(z_j),
where all the z_i values are the elements of the input vector and can take any real value. The denominator is the normalization term, which ensures that all the output values of the function sum to 1, thus constituting a valid probability distribution.
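A small numeric illustration of this definition (the input values are arbitrary):

# compute the softmax of an example vector and check that it sums to 1
import numpy as np

z = np.array([2.0, 1.0, -1.0, 0.5])
p = np.exp(z) / np.sum(np.exp(z))   # exponentiate, then normalize
print(p)                            # values between 0 and 1
print(p.sum())                      # 1.0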
4. Differentiate a fully connected layer and a convolutional layer.

Fully connected neural network:
→ A fully connected neural network consists of a series of fully connected layers that connect every neuron in one layer to every neuron in the next layer.
→ The major advantage of fully connected networks is that they are "structure agnostic", i.e. no special assumptions need to be made about the input.
→ While being structure agnostic makes fully connected networks very broadly applicable, such networks tend to have weaker performance than special-purpose networks tuned to the structure of a problem space.

Convolutional neural network:
→ CNN architectures make the explicit assumption that the inputs are images, which allows certain properties to be encoded into the model architecture.
→ A simple CNN is a sequence of layers, and every layer of a CNN transforms one volume of activations to another through a differentiable function.
→ Three main types of layers are used to build a CNN architecture: the convolutional layer, the pooling layer, and the fully connected layer.

Conclusion:
• From this experiment I learnt about convolutional neural networks, which have one or more convolutional layers and are used mainly for image processing, classification, segmentation and other auto-correlated data. Between the input and the output, CNN models apply a set of filters, also known as CNN kernels, to obtain the results. I built a CNN for MNIST digit classification, observed its output, and checked its accuracy after modifying the model in different ways. I also performed the CNN task in MATLAB and observed its accuracy.