CNN on CIFAR10 Data set using PyTorch

Shonit Gangoly · Mar 29, 2021

The goal is to apply a Convolutional Neural Network (CNN) model to the CIFAR10 image data set and measure how accurately it classifies the images.

CIFAR10 is a collection of images used to train machine learning and computer vision algorithms. It contains 60K colour images of size 32x32 in ten classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. We train a Convolutional Neural Net (CNN) on this data set.

CNNs are a class of deep learning models that can recognize and classify particular features in images, and they are widely used for analyzing visual data. There are two main parts to a CNN architecture:

1. A convolution tool that separates and identifies the various features of the image for analysis, in a process called Feature Extraction.
2. A fully connected layer that takes the output of the convolution process and predicts the class of the image based on the features extracted in the previous stages.

CNN Architecture

A CNN is mainly built from the following types of layers:

1. Convolutional Layer: Used to extract features from the image. An MxM filter slides over the image and takes the dot product with each patch of the input. This produces a Feature Map that carries information about the image, such as corners and edges. The feature map is then fed to the following layers, which learn further features of the image.
2. Pooling Layer: The main goal of this layer is to reduce the size of the convolved feature map and so reduce computational cost. It does this by decreasing the connections between layers. It acts as a bridge between the Convolutional Layer and the FC layer.
3. Fully Connected Layer: This layer consists of weights and biases along with neurons, and is used to connect the neurons of one layer to those of the next.
The output of the previous layers is flattened and fed to this layer, which then performs the classification.
4. Dropout Layer: When all features are connected to the FC layer, the model can overfit, performing well on the training set but not on the test set. A dropout layer counters this by randomly dropping neurons during training. With a dropout of 0.3, 30% of the neurons are dropped at random.
5. Activation Functions: These let the network learn and approximate continuous, complex relationships between its variables. An activation function decides whether a neuron should fire, and it adds nonlinearity to the model.

In our example we start by importing the required packages.

    import torch
    import numpy as np
    from torchvision import datasets
    import torchvision.transforms as transforms
    from torch.utils.data.sampler import SubsetRandomSampler

Each image is small (32x32x3), but there are 60K of them, so we train on a GPU when one is available because it is much faster.

    # check if CUDA is available
    train_on_gpu = torch.cuda.is_available()

    if not train_on_gpu:
        print('CUDA is not available. Training on CPU ...')
    else:
        print('CUDA is available! Training on GPU ...')

Next we load the CIFAR10 data set.

    # number of subprocesses to use for data loading
    num_workers = 0
    # how many samples per batch to load
    batch_size = 20
    # fraction of the training set to use as validation
    valid_size = 0.2

    # convert data to a normalized torch.FloatTensor
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

    # choose the training and test datasets
    train_data = datasets.CIFAR10('data', train=True,
                                  download=True, transform=transform)
    test_data = datasets.CIFAR10('data', train=False,
                                 download=True, transform=transform)

    # obtain training indices that will be used for validation
    num_train = len(train_data)
    indices = list(range(num_train))
    np.random.shuffle(indices)
    split = int(np.floor(valid_size * num_train))
    train_idx, valid_idx = indices[split:], indices[:split]

    # define samplers for obtaining training and validation batches
    train_sampler = SubsetRandomSampler(train_idx)
    valid_sampler = SubsetRandomSampler(valid_idx)

    # prepare data loaders (combine dataset and sampler)
    train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
        sampler=train_sampler, num_workers=num_workers)
    valid_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
        sampler=valid_sampler, num_workers=num_workers)
    test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
        num_workers=num_workers)

    # specify the image classes
    classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

We visualize one batch of the data set to get a feel for the images the model has to extract features from.

    import matplotlib.pyplot as plt
    # %matplotlib inline   (uncomment when running in a Jupyter notebook)

    # helper function to un-normalize and display an image
    def imshow(img):
        img = img / 2 + 0.5  # unnormalize
        plt.imshow(np.transpose(img, (1, 2, 0)))  # convert from (C, H, W) to (H, W, C)

    # obtain one batch of training images
    dataiter = iter(train_loader)
    images, labels = next(dataiter)  # dataiter.next() fails on recent PyTorch versions
    images = images.numpy()  # convert images to numpy for display

    # plot the images in the batch, along with the corresponding labels
    fig = plt.figure(figsize=(25, 4))
    # display 20 images
    for idx in np.arange(20):
        # integer division: add_subplot expects an integer column count
        ax = fig.add_subplot(2, 20 // 2, idx + 1, xticks=[], yticks=[])
        imshow(images[idx])
        ax.set_title(classes[labels[idx]])

Figure: Snippet of the data set

We can view an image in more detail. Here, we look at the normalized red, green, and blue (RGB) color channels as three separate grayscale intensity images.

    rgb_img = np.squeeze(images[19])
    channels = ['red channel', 'green channel', 'blue channel']

    fig = plt.figure(figsize=(36, 36))
    for idx in np.arange(rgb_img.shape[0]):
        ax = fig.add_subplot(1, 3, idx + 1)
        img = rgb_img[idx]
        ax.imshow(img, cmap='gray')
        ax.set_title(channels[idx])
        width, height = img.shape
        thresh = img.max() / 2.5
        # annotate every pixel with its normalized value
        for x in range(width):
            for y in range(height):
                val = round(img[x][y], 2) if img[x][y] != 0 else 0
                ax.annotate(str(val), xy=(y, x),
                            horizontalalignment='center',
                            verticalalignment='center', size=8,
                            color='white' if img[x][y] < thresh else 'black')

Figure: The vector values of the images

To compute the output size of a given convolutional layer we can perform the following calculation (taken from Stanford's cs231n course):

We can compute the spatial size of the output volume as a function of the input volume size (W), the kernel/filter size (F), the stride with which the filters are applied (S), and the amount of zero padding used on the border (P). The output width is given by (W−F+2P)/S+1. For example, a 7x7 input and a 3x3 filter with stride 1 and pad 0 give a 5x5 output. With stride 2 we would get a 3x3 output.
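As a quick check of the formula above, here is a minimal sketch in plain Python. The helper name `conv_output_size` is made up for this illustration and is not part of PyTorch. It reproduces the 7x7 example and then traces a 32x32 CIFAR10 image through two rounds of 5x5 convolution and 2x2 max pooling, assuming stride 1 and no padding for the convolutions.

```python
def conv_output_size(w, f, s=1, p=0):
    """Output width for input width w, filter size f, stride s, padding p.

    Hypothetical helper that implements (W - F + 2P) / S + 1.
    """
    span = w - f + 2 * p
    if span < 0 or span % s != 0:
        # cs231n treats a non-integer result as an invalid configuration
        raise ValueError("filter does not fit the input with this stride")
    return span // s + 1


# The 7x7 example from the text: 3x3 filter, no padding.
print(conv_output_size(7, 3, s=1, p=0))  # 5
print(conv_output_size(7, 3, s=2, p=0))  # 3

# A 32x32 image through two rounds of 5x5 convolution + 2x2 pooling.
size = 32
for _ in range(2):
    size = conv_output_size(size, 5)       # convolution: 32 -> 28, then 14 -> 10
    size = conv_output_size(size, 2, s=2)  # pooling uses the same formula: 28 -> 14, 10 -> 5
print(size)  # 5
```

The final value of 5 is where a flattened size such as 16 * 5 * 5 comes from when the last convolution has 16 output channels. Note that PyTorch floors the division instead of raising an error when the stride does not divide evenly. The strict check here is a choice made for this sketch.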
    import torch.nn as nn
    import torch.nn.functional as F

    # define the CNN architecture
    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.conv1 = nn.Conv2d(3, 6, 5)
            self.pool = nn.MaxPool2d(2, 2)
            self.conv2 = nn.Conv2d(6, 16, 5)
            self.fc1 = nn.Linear(16 * 5 * 5, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)

        def forward(self, x):
            x = self.pool(F.relu(self.conv1(x)))
            x = self.pool(F.relu(self.conv2(x)))
            x = x.view(-1, 16 * 5 * 5)  # flatten the feature maps
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            return x

    # create a complete CNN
    model = Net()
    print(model)

    # move tensors to GPU if CUDA is available
    if train_on_gpu:
        model.cuda()

Next we define a loss function and an optimizer. The loss function measures the difference between the model output and the target, and the optimizer updates the weights to reduce that difference over time. Here we use the SGD (Stochastic Gradient Descent) optimizer. Its behaviour is heavily influenced by the learning rate, which controls how quickly the model converges to a solution. Here we have chosen a value of 0.01.

    import torch.optim as optim

    # specify loss function
    criterion = nn.CrossEntropyLoss()
    # specify optimizer
    optimizer = optim.SGD(model.parameters(), lr=.01)

Now we train our model and use the validation set to see how it performs before we use the actual test set. We need to take a close look at the validation losses.
If the validation loss starts to increase while the training loss keeps falling, the model is overfitting.

    # number of epochs to train the model
    n_epochs = 30
    # list to store the training loss for plotting
    train_losslist = []
    valid_loss_min = np.inf  # track change in validation loss

    for epoch in range(1, n_epochs + 1):
        # keep track of training and validation loss
        train_loss = 0.0
        valid_loss = 0.0

        ###################
        # train the model #
        ###################
        model.train()
        for data, target in train_loader:
            # move tensors to GPU if CUDA is available
            if train_on_gpu:
                data, target = data.cuda(), target.cuda()
            # clear the gradients of all optimized variables
            optimizer.zero_grad()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # calculate the batch loss
            loss = criterion(output, target)
            # backward pass: compute gradient of the loss with respect to model parameters
            loss.backward()
            # perform a single optimization step (parameter update)
            optimizer.step()
            # update training loss
            train_loss += loss.item() * data.size(0)

        ######################
        # validate the model #
        ######################
        model.eval()
        with torch.no_grad():  # no gradients are needed for validation
            for data, target in valid_loader:
                # move tensors to GPU if CUDA is available
                if train_on_gpu:
                    data, target = data.cuda(), target.cuda()
                # forward pass: compute predicted outputs by passing inputs to the model
                output = model(data)
                # calculate the batch loss
                loss = criterion(output, target)
                # update validation loss
                valid_loss += loss.item() * data.size(0)

        # calculate average losses; both loaders share the same dataset,
        # so divide by the number of samples each sampler actually draws
        train_loss = train_loss / len(train_loader.sampler)
        valid_loss = valid_loss / len(valid_loader.sampler)
        train_losslist.append(train_loss)

        # print training/validation statistics
        print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
            epoch, train_loss, valid_loss))

        # save model if validation loss has decreased
        if valid_loss <= valid_loss_min:
            print('Validation loss decreased ({:.6f} --> {:.6f}). Saving model ...'.format(
                valid_loss_min, valid_loss))
            torch.save(model.state_dict(), 'model_cifar.pt')
            valid_loss_min = valid_loss

    plt.plot(range(1, n_epochs + 1), train_losslist)
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.title("Performance of Model 1")
    plt.show()

Figure: Training Epoch Model 1

Now we load the model with the lowest validation loss.

    model.load_state_dict(torch.load('model_cifar.pt'))

Now we test the model on the test set.

    # track test loss
    test_loss = 0.0
    class_correct = list(0. for i in range(10))
    class_total = list(0. for i in range(10))

    model.eval()
    # iterate over test data
    with torch.no_grad():
        for data, target in test_loader:
            # move tensors to GPU if CUDA is available
            if train_on_gpu:
                data, target = data.cuda(), target.cuda()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # calculate the batch loss
            loss = criterion(output, target)
            # update test loss
            test_loss += loss.item() * data.size(0)
            # convert output scores to predicted class
            _, pred = torch.max(output, 1)
            # compare predictions to true label
            correct_tensor = pred.eq(target.data.view_as(pred))
            correct = np.squeeze(correct_tensor.cpu().numpy())
            # calculate test accuracy for each object class
            for i in range(len(target)):
                label = target.data[i]
                class_correct[label] += correct[i].item()
                class_total[label] += 1

    # average test loss
    test_loss = test_loss / len(test_loader.dataset)
    print('Test Loss: {:.6f}\n'.format(test_loss))

    for i in range(10):
        if class_total[i] > 0:
            print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
                classes[i], 100 * class_correct[i] / class_total[i],
                np.sum(class_correct[i]), np.sum(class_total[i])))
        else:
            print('Test Accuracy of %5s: N/A (no training examples)' % (classes[i]))

    print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
        100. * np.sum(class_correct) / np.sum(class_total),
        np.sum(class_correct), np.sum(class_total)))

Figure: Accuracy of 63%

With the model given in the PyTorch tutorial we get an accuracy of 63%, which is pretty bad. We need to tweak the model as well as the hyper parameters to get a better score.

My first attempt was a CNN with more layers, a larger batch size and an explicit flattening step.

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            # convolutional layers
            self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
            self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
            self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
            # max pooling layer
            self.pool = nn.MaxPool2d(2, 2)
            # fully connected layers
            self.fc1 = nn.Linear(64 * 4 * 4, 512)
            self.fc2 = nn.Linear(512, 64)
            self.fc3 = nn.Linear(64, 10)
            # dropout
            self.dropout = nn.Dropout(p=.5)

        def forward(self, x):
            # add sequence of convolutional and max pooling layers
            x = self.pool(F.relu(self.conv1(x)))
            x = self.pool(F.relu(self.conv2(x)))
            x = self.pool(F.relu(self.conv3(x)))
            # flattening
            x = x.view(-1, 64 * 4 * 4)
            # fully connected layers
            x = self.dropout(F.relu(self.fc1(x)))
            x = self.dropout(F.relu(self.fc2(x)))
            x = self.fc3(x)
            return x

Figure: Training Performance Model 2

After testing this model we can see an increase in our accuracy.

Figure: Accuracy of 72%

The first conv layer takes 3 input channels and produces 16 output channels, the second goes from 16 to 32, and the third from 32 to 64. Each output channel is the feature map produced by one learned filter. Every conv layer is followed by a max pooling layer, which downsamples its input by taking the maximum value over each 2x2 window. Because the 3x3 convolutions use a padding of 1, they keep the spatial size unchanged, so the 32x32 input shrinks to 16x16, 8x8 and finally 4x4, which is why the first fully connected layer has 64 * 4 * 4 inputs. The model also includes 50% dropout to reduce over fitting. These changes increased the accuracy to 72%.

To increase the accuracy further, we need to tweak more hyper parameters, including the learning rate.
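To make the max pooling step described above concrete, here is a minimal sketch in plain Python. The function `max_pool_2x2` and the sample values are made up for this illustration. It assumes a single 2D feature map with even height and width, a 2x2 window and a stride of 2, which is the effect `nn.MaxPool2d(2, 2)` has on each channel.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D list (even height and width assumed)."""
    pooled = []
    for r in range(0, len(feature_map), 2):
        row = []
        for c in range(0, len(feature_map[0]), 2):
            # the four values covered by the current 2x2 window
            window = (
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            row.append(max(window))
        pooled.append(row)
    return pooled


# Illustrative 4x4 feature map (values chosen by hand).
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 4],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 5], [6, 7]]
```

Each pooling step halves the height and width while keeping the strongest response in every window, so three such steps take a 32x32 feature map down to 4x4.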
    class CNN(nn.Module):
        def __init__(self):
            super(CNN, self).__init__()

            self.conv_layer = nn.Sequential(
                # Conv Layer block 1
                nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1),
                nn.BatchNorm2d(32),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=2, stride=2),

                # Conv Layer block 2
                nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1),
                nn.BatchNorm2d(128),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=2, stride=2),
                nn.Dropout2d(p=0.05),

                # Conv Layer block 3
                nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, padding=1),
                nn.BatchNorm2d(256),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=2, stride=2),
            )

            self.fc_layer = nn.Sequential(
                nn.Dropout(p=0.1),
                nn.Linear(4096, 1024),  # 256 channels * 4 * 4 after three poolings
                nn.ReLU(inplace=True),
                nn.Linear(1024, 512),
                nn.ReLU(inplace=True),
                nn.Dropout(p=0.1),
                nn.Linear(512, 10)
            )

        def forward(self, x):
            """Perform forward."""
            # conv layers
            x = self.conv_layer(x)
            # flatten
            x = x.view(x.size(0), -1)
            # fc layer
            x = self.fc_layer(x)
            return x

Here I increased the number of layers and grouped them into convolutional blocks with a kernel size of 3. A kernel is a filter used to extract features from the images. I also changed the channel sizes and added batch normalization.

The next thing I did was change the learning rate from 0.01 to 0.001 so that our model converges more gradually.

    import torch.optim as optim

    # specify loss function
    criterion = nn.CrossEntropyLoss()
    # specify optimizer (assumes model has been re-created as CNN()
    # and moved to the GPU if one is available)
    optimizer = optim.SGD(model.parameters(), lr=.001)

Training the new model:

Figure: Model 3

We get the following accuracy:

Figure: Accuracy of 82%

We can see that our accuracy drastically improved to 82%. One reason is that the batch sizes used for the previous models were too small for a data set of this size.
In addition, the learning rate was previously set too high, so we were not able to reach the minimum loss in 30 epochs. Changing it to 0.001 lets the model converge to a lower loss within the same number of epochs.

Comparison between the models

Figure: Accuracy Chart

Figure: Train Time between models

Challenges faced

1. Feature extraction is an issue. The model struggles when multiple colors are involved in the image. It struggles with images of cats, dogs and birds, which are multicolored compared to the other objects.
2. I tried running the model for a greater number of epochs and with a much lower learning rate. This caused over fitting and drastically reduced the accuracy of the model.

Figure: Dropped accuracy

3. Tuning the hyper parameters is a challenging task since training the models takes a long time.

View Source Code: ShoGan20/CNN-on-CIFAR10-using-PyTorch (github.com)

References

1. https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glrbeginner-blitz-cifar10-tutorial-py
2. https://www.upgrad.com/blog/basic-cnn-architecture/
3. https://www.kaggle.com/datajameson/cifar-10-object-recognition-cnn-explained
4. https://zhenye-na.github.io/2018/09/28/pytorch-cnn-cifar10.html
5. https://www.stefanfiott.com/machine-learning/cifar-10-classifier-using-cnn-inpytorch/
6. https://medium.com/swlh/image-classification-with-cnn-4f2a501faadb
7. https://github.com/martinoywa/cifar10-cnnexercise/blob/master/cifar10_cnn_exercise.ipynb