
Introduction to neural networks - Images - CNNs

Image processing
Nestor Arana Arexolaleiba
narana@mondragon.edu
https://www.linkedin.com/in/nestor-arana-arexolaleiba-26180b8/
2023
Image Loading
Loading Image Dataset – Dataset
• Images: classification, detection or segmentation, custom
• Video: classification and prediction
• Stereo images
• Subtitles
• PyTorch datasets: https://pytorch.org/vision/stable/datasets.html
Loading Image Dataset
• The MS COCO (Microsoft Common Objects in Context) dataset consists of 328K images. It contains annotations for object detection, keypoint detection, panoptic segmentation, stuff segmentation, captioning, and dense human pose estimation.
https://cocodataset.org/
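A minimal sketch of loading the COCO detection annotations through torchvision (the paths below are placeholders for wherever the images and annotation file were downloaded, and pycocotools must be installed):

import torchvision
from torchvision import transforms

# Placeholder paths: point these at the downloaded images and annotation file
coco = torchvision.datasets.CocoDetection(
    root='path/to/coco/val2017',
    annFile='path/to/annotations/instances_val2017.json',
    transform=transforms.ToTensor())

image, targets = coco[0]   # targets: list of annotation dicts (bbox, category_id, ...)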
Loading Image Dataset
• ImageNet is one of the most popular image databases, with more than 14 million hand-annotated images.
https://medium.com/@704/first-summary-imagenet-classification-606c904ecd86
https://www.image-net.org/update-mar-11-2021.php
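A minimal sketch of exposing ImageNet through torchvision; note that torchvision.datasets.ImageNet does not download the data, it only reads the official ILSVRC2012 archives that have already been placed in the root folder (the path is a placeholder):

import torchvision
from torchvision import transforms

# Assumes the ILSVRC2012 devkit and image archives are already in ./data/imagenet
dataset = torchvision.datasets.ImageNet('./data/imagenet', split='val',
                                        transform=transforms.ToTensor())
image, label = dataset[0]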
Loading Image Dataset
• MNIST: https://en.wikipedia.org/wiki/MNIST_database
Loading Image Dataset
• The CIFAR-10 and CIFAR-100 datasets are labeled subsets of the 80 million tiny images dataset collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.
• The CIFAR-10 dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class.
• There are 50,000 training images and 10,000 test images.
• CIFAR-100 consists of 100 classes containing 600 images each. There are 500 training images and 100 testing images per class.
https://www.cs.toronto.edu/~kriz/cifar.html
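A minimal sketch that downloads both datasets with torchvision and checks the figures quoted above:

import torchvision
from torchvision import transforms

transform = transforms.ToTensor()
cifar10_train = torchvision.datasets.CIFAR10('./data/cifar', train=True, download=True, transform=transform)
cifar10_test = torchvision.datasets.CIFAR10('./data/cifar', train=False, download=True, transform=transform)
cifar100_train = torchvision.datasets.CIFAR100('./data/cifar', train=True, download=True, transform=transform)

print(len(cifar10_train), len(cifar10_test))   # 50000 10000
print(len(cifar100_train))                     # 50000 (100 classes x 500 images)
print(cifar10_train[0][0].shape)               # torch.Size([3, 32, 32])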
Loading Image Dataset
• Torchvision provides many built-in datasets in the torchvision.datasets module, as well as utility classes for creating your own datasets.
dataset = torchvision.datasets.MNIST('./data/mnist', train=True, download=True,
                                     transform=transform)
dataset = torchvision.datasets.CIFAR10('./data/cifar', train=True, download=True,
                                       transform=transform)
dataset = torchvision.datasets.ImageFolder('path/to/data', transform=transform)
Exercise
• Check the result with the code M.6.0
• What happens if we use shuffle=True?
Data Augmentation
• A common strategy for training neural networks is to introduce randomness into the input data itself. For example, you can randomly rotate, mirror, scale and/or crop images during training.
• This will help your network generalize, since you are seeing the same images but in different locations, with different sizes, in different orientations, etc.
# Data augmentation
transform = transforms.Compose([
    transforms.Resize(32),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor()])
https://pytorch.org/vision/stable/transforms.html
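A slightly richer pipeline combining the rotate / mirror / scale / crop ideas listed above (a sketch; the parameter values are arbitrary examples):

from torchvision import transforms

transform = transforms.Compose([
    transforms.RandomRotation(10),                        # rotate by up to +/-10 degrees
    transforms.RandomHorizontalFlip(),                    # mirror with probability 0.5
    transforms.RandomResizedCrop(32, scale=(0.8, 1.0)),   # scale and crop back to 32x32
    transforms.ToTensor()])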
Loading Dataset – MNIST
import torchvision
from torchvision import transforms
import torch

transform = transforms.Compose([
    transforms.Resize(32),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor()])

dataset = torchvision.datasets.MNIST('./data/mnist',
                                     train=True,
                                     download=True,
                                     transform=transform)

data_loader = torch.utils.data.DataLoader(dataset,
                                          batch_size=8,
                                          shuffle=True,
                                          num_workers=2)
Exercise
• What happens if we use transforms.RandomHorizontalFlip() with the MNIST images?
• Check the result with the code M.6.0
Loading Dataset – MNIST
import matplotlib.pyplot as plt

batch_size = 8                             # same batch size as the DataLoader above
images, labels = next(iter(data_loader))
plt.figure(figsize=(20, 4))
for index in range(batch_size):
    ax = plt.subplot(2, batch_size, index + 1)
    plt.imshow(images[index].numpy().squeeze())
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
print('GroundTruth: ', ' '.join(f'{labels[j]:10}' for j in range(batch_size)))
[3 3 2 9 6 1 3 6]
Exercise
1. Get a batch of 4 random images from the CIFAR-10 dataset
2. Include the following data augmentation function (one possible starting point is sketched below)
• transforms.RandomHorizontalFlip()
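One possible starting point for this exercise (a sketch; the grid plot is only one way to display the batch):

import torch
import torchvision
from torchvision import transforms
import matplotlib.pyplot as plt

transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor()])

dataset = torchvision.datasets.CIFAR10('./data/cifar', train=True, download=True, transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=4, shuffle=True)

images, labels = next(iter(loader))                        # one random batch of 4 images
plt.imshow(torchvision.utils.make_grid(images).permute(1, 2, 0))
plt.show()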
Loading Dataset – Flowers102
1. What happens when the images do not all have the same size?
train_dataset = torchvision.datasets.Flowers102(
    root="~/torch_datasets", transform=transform, download=True
)
train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=train_batch_size, shuffle=True
)
https://www.robots.ox.ac.uk/~vgg/data/flowers/102/categories.html
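If the images keep their original (different) sizes, the default collate function cannot stack them into a single batch tensor and the DataLoader raises an error. A common fix (a sketch; the sizes are example values) is to make every image the same size inside the transform:

from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize(256),        # shorter side to 256 pixels
    transforms.CenterCrop(224),    # every image becomes 3 x 224 x 224
    transforms.ToTensor()])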
Loading Dataset – Custom
• The easiest way to load your own image data is with torchvision's ImageFolder dataset
dataset = torchvision.datasets.ImageFolder('path/to/data', transform=transform)
ddbb/obj1/1.png
ddbb/obj1/2.png
ddbb/obj1/3.png
ddbb/obj2/1.png
ddbb/obj2/2.png
ddbb/obj2/3.png
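A minimal sketch of loading the folder structure above; ImageFolder takes the class label from the sub-folder name (the resize size is an arbitrary example):

import torch
import torchvision
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor()])

dataset = torchvision.datasets.ImageFolder('ddbb', transform=transform)
print(dataset.class_to_idx)        # e.g. {'obj1': 0, 'obj2': 1}

loader = torch.utils.data.DataLoader(dataset, batch_size=2, shuffle=True)
images, labels = next(iter(loader))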
Exercise
1. Create your own dataset
• 2 categories
• 2-6 images per category
2. Get a batch of 2 random images from your own dataset and show the result
3. Include one relevant transforms function
• Justify why it is relevant
• and show the result
References
• Beginner's Guide to Loading Image Data with PyTorch
• https://towardsdatascience.com/beginners-guide-to-loading-image-data-with-pytorch-289c60b7afec
Convolutional Neural Network - CNN
Convolutional Neural Network
Convolutional filter
Convolutional filter
https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1
Stride
Max Pooling Layer
https://pub.towardsai.net/introduction-to-pooling-layers-in-cnn-dafe61eabe34
Padding
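A quick way to see how kernel size, stride, padding and pooling change the spatial size (a sketch, using the standard formula out = floor((in + 2*padding - kernel) / stride) + 1):

import torch
import torch.nn as nn

x = torch.zeros(1, 3, 32, 32)                                   # one 32x32 RGB image
print(nn.Conv2d(3, 6, kernel_size=5)(x).shape)                  # [1, 6, 28, 28]
print(nn.Conv2d(3, 6, kernel_size=5, stride=2)(x).shape)        # [1, 6, 14, 14]
print(nn.Conv2d(3, 6, kernel_size=5, padding=2)(x).shape)       # [1, 6, 32, 32]
print(nn.MaxPool2d(kernel_size=2)(x).shape)                     # [1, 3, 16, 16]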
Convolutional Neural Network
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
        self.pool = nn.MaxPool2d(kernel_size=2)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5)
        self.fc1 = nn.Linear(in_features=16 * 5 * 5, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=84)
        self.fc3 = nn.Linear(in_features=84, out_features=10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
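A quick sanity check (a sketch): the 16 * 5 * 5 in fc1 matches a 3x32x32 input such as CIFAR-10, since 32 -> conv1 -> 28 -> pool -> 14 -> conv2 -> 10 -> pool -> 5:

net = Net()
x = torch.zeros(1, 3, 32, 32)      # batch of one 32x32 RGB image
print(net(x).shape)                # torch.Size([1, 10])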
Exercise
• Considering that all the convolutional filters are the same as in the previous code, what will be the size of the first convolutional layer's output for the given input image size?
• Design its shape graphically
Example
• Design the shape of the information along the following CNN (a shape-tracing sketch is given after the code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class NN(nn.Module):
    def __init__(self, output_dim):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=5, stride=2)
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=5, stride=2)
        self.conv3 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1)
        self.fc1 = nn.Linear(576, 256)
        self.fc2 = nn.Linear(256, output_dim)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        x = torch.flatten(x, start_dim=1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return x
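A shape-tracing sketch. The input size is an assumption chosen so that the flattened feature map matches fc1 (576 = 64 x 3 x 3); a 3x30x30 input is one size that works:

net = NN(output_dim=10)                  # output_dim=10 is an arbitrary example
x = torch.zeros(1, 3, 30, 30)            # assumed input: one 30x30 RGB image
print(net.conv1(x).shape)                # [1, 32, 13, 13]   ((30 - 5) // 2 + 1 = 13)
print(net.conv2(net.conv1(x)).shape)     # [1, 64, 5, 5]     ((13 - 5) // 2 + 1 = 5)
# conv3: [1, 64, 3, 3] -> flatten: [1, 576] -> fc1: [1, 256] -> fc2: [1, 10]
print(net(x).shape)                      # torch.Size([1, 10])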
Example
• CIFAR-10 tutorial
Exercises
• Update the code to plot the training and validation curves during the training process (a minimal sketch is given below)
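A minimal sketch of how the curves could be tracked; it assumes that net, criterion, optimizer and a train_loader / val_loader split already exist in the tutorial code being updated:

import matplotlib.pyplot as plt
import torch

num_epochs = 10                                    # example value
train_losses, val_losses = [], []

for epoch in range(num_epochs):
    net.train()                                    # net, criterion, optimizer,
    running = 0.0                                  # train_loader and val_loader
    for inputs, labels in train_loader:            # come from the existing code
        optimizer.zero_grad()
        loss = criterion(net(inputs), labels)
        loss.backward()
        optimizer.step()
        running += loss.item()
    train_losses.append(running / len(train_loader))

    net.eval()
    running = 0.0
    with torch.no_grad():
        for inputs, labels in val_loader:
            running += criterion(net(inputs), labels).item()
    val_losses.append(running / len(val_loader))

plt.plot(train_losses, label='training loss')
plt.plot(val_losses, label='validation loss')
plt.xlabel('epoch')
plt.legend()
plt.show()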
Exercise
• Perform a CIFAR-10 image classification
• Load CIFAR-10 using the data loader
• Plot the training and validation curves during the training process
Extra-Exercise
• Perform a custom image classification
Predefined Neural Network
Predefined Neural Network
• Predefined
• Pretrained
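The distinction in code (a sketch using the current torchvision weights API; the slides that follow use the older pretrained=True flag):

from torchvision import models
from torchvision.models import ResNet18_Weights

# Predefined: only the architecture, randomly initialized weights
net_random = models.resnet18(weights=None)

# Pretrained: same architecture, loaded with ImageNet-trained weights
net_pretrained = models.resnet18(weights=ResNet18_Weights.DEFAULT)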
Alexnet Architecture
import torch.nn as nn
from torchvision import models

# Alexnet
model = models.alexnet(pretrained=use_pretrained)
set_parameter_requires_grad(model, feature_extract)
num_ftrs = model.classifier[6].in_features
model.classifier[6] = nn.Linear(num_ftrs, num_classes)
input_size = 224

# Forward
outputs = model(inputs)
VGG-Net Architecture
import torch.nn as nn
from torchvision import models

# VGG
model = models.vgg16(pretrained=use_pretrained)
set_parameter_requires_grad(model, feature_extract)
num_ftrs = model.classifier[6].in_features
model.classifier[6] = nn.Linear(num_ftrs, num_classes)
input_size = 224

# Forward
outputs = model(inputs)
RESNET Architecture
import torch.nn as nn
from torchvision import models

# RESNET
model = models.resnet18(pretrained=use_pretrained)
set_parameter_requires_grad(model, feature_extract)
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, num_classes)
input_size = 224

# Forward
outputs = model(inputs)
Inception Architecture
import torch.nn as nn
from torchvision import models

# INCEPTION
model = models.inception_v3(pretrained=use_pretrained)
set_parameter_requires_grad(model, feature_extract)
# Handle the auxiliary net
num_ftrs = model.AuxLogits.fc.in_features
model.AuxLogits.fc = nn.Linear(num_ftrs, num_classes)
# Handle the primary net
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, num_classes)
input_size = 299

# Forward
outputs = model(inputs)
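One detail worth keeping in mind: when inception_v3 runs in training mode with the auxiliary classifier enabled, the forward pass returns both outputs, so the training loop typically unpacks outputs, aux_outputs = model(inputs) and combines the two losses; in eval mode only the primary output is returned.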
Summary
https://towardsdatascience.com/the-w3h-of-alexnet-vggnet-resnet-and-inception-7baaaecccc96
Segmentation
• DeepLabV3
• FCN
• LRASPP
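A minimal sketch of running one of these pretrained segmentation models from torchvision; DeepLabV3 with a ResNet-50 backbone is used as an example and a dummy image stands in for real data:

import torch
from torchvision.models.segmentation import deeplabv3_resnet50, DeepLabV3_ResNet50_Weights

weights = DeepLabV3_ResNet50_Weights.DEFAULT
model = deeplabv3_resnet50(weights=weights)
model.eval()

x = torch.zeros(1, 3, 224, 224)            # dummy input image batch
with torch.no_grad():
    out = model(x)['out']                  # per-pixel class scores
print(out.shape)                           # [1, 21, 224, 224]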
Object detection
• Faster R-CNN
• FCOS
• RetinaNet
• SSD
• SSDlite
https://jonathan-hui.medium.com/object-detection-speed-and-accuracy-comparison-faster-r-cnn-r-fcn-ssd-and-yolo-5425656ae359
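A minimal sketch of running a pretrained Faster R-CNN detector from torchvision on a dummy image:

import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights)
model.eval()

images = [torch.zeros(3, 480, 640)]        # list of images (C, H, W) with values in [0, 1]
with torch.no_grad():
    predictions = model(images)            # one dict per image
print(predictions[0].keys())               # dict_keys(['boxes', 'labels', 'scores'])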
Skeleton tracking
• Regression problem
• Keypoint R-CNN
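Torchvision also ships a pretrained Keypoint R-CNN (torchvision.models.detection.keypointrcnn_resnet50_fpn); its per-person output dictionaries additionally contain a 'keypoints' tensor with the regressed joint coordinates, which is why skeleton tracking is treated as a regression problem.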
Exercise
• Perform a CIFAR-10 image classification
• Using a ResNet-18 backbone
import torch
import torchvision
from torchvision.models import resnet18, ResNet18_Weights

# RESNET 18
weights = ResNet18_Weights.DEFAULT
net2 = torchvision.models.resnet18(weights=weights, progress=False)
net2.fc = torch.nn.Linear(512, 10)

# Forward
outputs = net2(inputs)
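One practical note: the pretrained ResNet-18 weights were trained on 224x224 ImageNet-normalized images, so CIFAR-10 inputs are usually resized and normalized accordingly, for example with the preprocessing bundled with the weights: preprocess = weights.transforms().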
Thank you
Nestor Arana-Arexolaleiba
narana@mondragon.edu
Loramendi, 4. Apartado 23
20500 Arrasate – Mondragon
T. 943 71 21 85
info@mondragon.edu