UNIVERSITY OF ENGINEERING AND TECHNOLOGY, TAXILA
SOFTWARE ENGINEERING DEPARTMENT
Session 2k20
6th Semester
Digital Image Processing
Assignment 2

Submitted to: Dr. Ali Javed
Submitted by: Sameer Akram (20-SE-70), Muhammad Ali Ejaz (20-SE-40)
We will break the project into these three phases:
Phase 1: Data Preparation

Gather a set of low-resolution images that you want to enhance.

If necessary, crop or resize the images to a consistent size.

Divide the dataset into training, validation, and testing sets.
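As a sketch of the split described in the last step, assuming paired low-/high-resolution images stored under identical file names (the folder names and the 70/15/15 ratios here are illustrative, not mandated by the assignment):

    import os
    import random

    # Hypothetical layout: lr_images/ and hr_images/ hold paired files
    pairs = sorted(os.listdir("lr_images"))
    random.seed(42)              # reproducible shuffle
    random.shuffle(pairs)

    n = len(pairs)
    n_train = int(0.70 * n)      # illustrative 70/15/15 split
    n_val = int(0.15 * n)

    train_set = pairs[:n_train]
    val_set = pairs[n_train:n_train + n_val]
    test_set = pairs[n_train + n_val:]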
Phase 2: Model Development

Choose an image super-resolution method, such as a deep learning-based approach
(e.g., SRCNN) or a traditional interpolation algorithm (e.g., bicubic upscaling).

Train your chosen model on the training set of low-resolution images and their
corresponding high-resolution images.

Validate your model on the validation set and tune its hyperparameters to optimize its
performance.

Test your model on the testing set and evaluate its performance using metrics such as
peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM).
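A minimal sketch of this evaluation step, assuming scikit-image is installed; the file names are placeholders, and the comparison is done in grayscale to keep the example simple:

    import cv2
    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    # Placeholder file names: ground-truth HR image vs. the model's output
    hr = cv2.imread("hr_image.png", cv2.IMREAD_GRAYSCALE)
    sr = cv2.imread("sr_image.png", cv2.IMREAD_GRAYSCALE)

    psnr = peak_signal_noise_ratio(hr, sr, data_range=255)  # higher is better
    ssim = structural_similarity(hr, sr, data_range=255)    # 1.0 means identical
    print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")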
Phase 3: Post-Processing and Visualization

Apply post-processing techniques to the enhanced images, such as denoising or color
correction.

Visualize and compare the enhanced images to the original low-resolution images to
evaluate the effectiveness of the Image Super Resolution method.
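One possible post-processing and comparison step, using OpenCV; non-local-means denoising and a side-by-side layout are one choice among many, and the file names are placeholders:

    import cv2
    import numpy as np

    lr = cv2.imread("low_res.png")      # placeholder input
    sr = cv2.imread("super_res.png")    # placeholder model output

    # Non-local-means denoising as an example post-processing step
    sr_clean = cv2.fastNlMeansDenoisingColored(sr, None, 10, 10, 7, 21)

    # Upscale the low-res input so the two images share a size, then
    # place them side by side for visual comparison
    lr_up = cv2.resize(lr, (sr_clean.shape[1], sr_clean.shape[0]))
    comparison = np.hstack([lr_up, sr_clean])
    cv2.imwrite("comparison.png", comparison)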
Person Detection Pipeline:
1. Data collection: The first step is to collect a large dataset of surveillance footage
containing instances of people walking, running, or performing other activities. The
dataset should also include a variety of lighting conditions, camera angles, and
scenarios.
2. Data pre-processing: The next step is to pre-process the collected data to ensure that it
is of high quality and ready for training. This step can include tasks such as resizing,
cropping, and filtering the images, as well as removing any noise or artifacts.
3. Data labelling: The dataset needs to be labelled with annotations indicating the
location of people in each image or video frame. This step can be done manually or
using automated tools.
4. Model selection: The next step is to select a suitable deep learning model architecture
for detecting people in surveillance footage. Some commonly used models for object
detection include Faster R-CNN, YOLO, and SSD.
5. Training the model: Once the model is selected, it needs to be trained on the labelled
dataset using an appropriate optimization algorithm such as stochastic gradient
descent (SGD). The goal is to adjust the model's parameters to minimize the loss
function and maximize the accuracy of detecting people in surveillance footage (a
minimal training sketch follows this section).
6. Model evaluation: After training, the model needs to be evaluated on a separate test
set to determine its performance on new and unseen data. This step helps to identify
any issues such as overfitting, underfitting, or generalization problems.
7. Model deployment: Finally, the trained model can be deployed to a real-world
surveillance system to detect and track people in real-time. The model's output can be
used to trigger alarms or alerts, or to assist human operators in identifying suspicious
activities.
Overall, training a person-detection model for surveillance requires careful attention to data
quality, labelling, and the selection of appropriate deep learning techniques. It also involves
testing and evaluating the model to ensure that it is reliable and effective in real-world
scenarios.
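As a hedged illustration of the training step (step 5 above), here is a minimal fine-tuning loop using torchvision's Faster R-CNN, one of the architectures named in step 4. The data_loader and its target dictionaries are hypothetical placeholders following the torchvision detection convention, and the hyperparameters are illustrative only:

    import torch
    import torchvision

    # Pre-trained detector; in practice the classification head would be
    # adapted to the classes in the surveillance dataset
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    optimizer = torch.optim.SGD(model.parameters(), lr=0.005,
                                momentum=0.9, weight_decay=0.0005)

    model.train()
    # data_loader is a hypothetical DataLoader yielding a list of image
    # tensors and a list of dicts with "boxes" and "labels" per image
    for images, targets in data_loader:
        loss_dict = model(images, targets)   # per-component training losses
        loss = sum(loss_dict.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

A YOLO model would follow the same outer structure, though YOLOv3 in particular is usually trained through the Darknet framework's own tooling rather than a hand-written loop.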
1. Data collection: You can collect surveillance data from public datasets such as the
COCO (Common Objects in Context) dataset, which includes a large number of
images of people performing various activities. You can also collect data from your
local CCTV cameras if you have permission to use the footage for your project.
2. Data processing: One important data processing task is data augmentation, which
involves generating additional training samples by applying various transformations
to the existing images, such as flipping, rotating, or cropping. This can help to
improve the model's ability to generalize to new and unseen data.
3. Data labelling: The data can be labelled manually or using automated tools such as
LabelImg, which allows you to draw bounding boxes around the people in each image
or video frame. You can also use crowd-sourcing platforms such as Amazon
Mechanical Turk to label the data more efficiently.
4. Model selection: A widely used model for detecting people in surveillance footage is
YOLO (You Only Look Once), a single-stage, real-time object detection system that
can detect multiple objects in an image or video frame in one forward pass. This
speed, combined with good accuracy, makes it well-suited for surveillance
applications.
Overall, your project can involve collecting surveillance data, preprocessing it with data
augmentation, labelling it with bounding boxes, and training a YOLO model to detect people
in the footage. You can then evaluate the model's performance on a separate test set and
deploy it to a real-world surveillance system.
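As a small sketch of the augmentation step described above (the flip direction, rotation angle, and crop margin are arbitrary example choices, not recommendations):

    import cv2

    def augment(img):
        """Return a few simple augmented variants of an image."""
        h, w = img.shape[:2]

        flipped = cv2.flip(img, 1)                   # horizontal flip

        M = cv2.getRotationMatrix2D((w / 2, h / 2), 15, 1.0)
        rotated = cv2.warpAffine(img, M, (w, h))     # 15-degree rotation

        crop = img[h // 10: h - h // 10, w // 10: w - w // 10]
        cropped = cv2.resize(crop, (w, h))           # centre crop, resized back

        return [flipped, rotated, cropped]

Note that when augmenting detection data, the bounding-box annotations must be transformed together with the pixels, otherwise the labels no longer match the images. The detection stage itself can be implemented with OpenCV's cv2.dnn module, as in the sample below.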
import cv2
import numpy as np

# Load the YOLOv3 network from its weights and config files
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")

# Load COCO class names (one per line)
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]

# Resolve the names of the output layers; flattening handles both the
# old (Nx1) and new (flat) return shapes of getUnconnectedOutLayers()
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1]
                 for i in np.array(net.getUnconnectedOutLayers()).flatten()]
input_size = (416, 416)

# One colour per class for drawing, computed once outside the loop
colors = np.random.uniform(0, 255, size=(len(classes), 3))

conf_threshold = 0.5
nms_threshold = 0.4

# Process each image in the dataset
for img_idx in range(1000):
    # Load image and resize to the network input size
    img = cv2.imread(f"images/{img_idx}.jpg")
    if img is None:
        continue  # skip missing or unreadable files
    img = cv2.resize(img, input_size)

    # Apply data augmentation (e.g., random cropping, rotation, flip)
    # ...

    # Run YOLO object detection
    blob = cv2.dnn.blobFromImage(img, 1 / 255.0, input_size,
                                 swapRB=True, crop=False)
    net.setInput(blob)
    outs = net.forward(output_layers)

    # Post-process the detections
    class_ids = []
    confidences = []
    boxes = []
    for out in outs:
        for detection in out:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > conf_threshold:
                # Detections are normalised; scale back to pixel coordinates
                center_x = int(detection[0] * img.shape[1])
                center_y = int(detection[1] * img.shape[0])
                width = int(detection[2] * img.shape[1])
                height = int(detection[3] * img.shape[0])
                left = int(center_x - width / 2)
                top = int(center_y - height / 2)
                class_ids.append(class_id)
                confidences.append(float(confidence))
                boxes.append([left, top, width, height])

    # Non-maximum suppression to drop overlapping boxes
    indices = cv2.dnn.NMSBoxes(boxes, confidences, conf_threshold, nms_threshold)

    # Draw the surviving bounding boxes and class labels
    for i in np.array(indices).flatten():
        box = boxes[i]
        label = f"{classes[class_ids[i]]}: {confidences[i]:.2f}"
        color = colors[class_ids[i]]
        cv2.rectangle(img, (box[0], box[1]),
                      (box[0] + box[2], box[1] + box[3]), color, 2)
        cv2.putText(img, label, (box[0], box[1] - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

    # Save the annotated image (named by img_idx, not the NMS loop index)
    cv2.imwrite(f"result/{img_idx}.jpg", img)