UNIVERSITY OF ENGINEERING AND TECHNOLOGY, TAXILA
SOFTWARE ENGINEERING DEPARTMENT
Session 2k20, 6th Semester
Digital Image Processing, Assignment 2

Submitted to: Dr. Ali Javed
Submitted by: Sameer Akram (20-SE-70), Muhammad Ali Ejaz (20-SE-40)

We will break the project into three phases:

Phase 1: Data Preparation
Gather a set of low-resolution images to enhance. If necessary, crop or resize the images to a consistent size, then divide the dataset into training, validation, and testing sets (a minimal splitting sketch follows the phase outline).

Phase 2: Model Development
Choose an image super-resolution method, such as a deep learning-based model or a traditional algorithm. Train the chosen model on pairs of low-resolution images and their corresponding high-resolution images, validate it on the validation set while tuning its hyperparameters, and finally test it on the held-out testing set using metrics such as PSNR and SSIM (see the metrics sketch below).

Phase 3: Post-Processing and Visualization
Apply post-processing techniques such as denoising or color correction to the enhanced images, then visualize and compare them against the original low-resolution images to evaluate the effectiveness of the super-resolution method (a post-processing sketch follows as well).
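To make Phase 1 concrete, here is a minimal sketch of one way to split a folder of images into training, validation, and testing sets. The dataset/ directory, the *.jpg pattern, and the 80/10/10 ratios are assumptions for illustration, not requirements of the assignment.

import random
import shutil
from pathlib import Path

random.seed(42)  # fixed seed so the split is reproducible

src = Path("dataset")  # assumed folder of low-resolution images
files = sorted(src.glob("*.jpg"))
random.shuffle(files)

n = len(files)
splits = {
    "train": files[: int(0.8 * n)],             # 80% for training
    "val": files[int(0.8 * n): int(0.9 * n)],   # 10% for validation
    "test": files[int(0.9 * n):],               # 10% for testing
}

for name, subset in splits.items():
    out = src / name
    out.mkdir(parents=True, exist_ok=True)
    for f in subset:
        shutil.copy(f, out / f.name)  # copy, keeping the originals intact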
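For the Phase 2 metrics, a minimal sketch of computing PSNR and SSIM between an enhanced image and its high-resolution ground truth, assuming OpenCV and scikit-image are installed. The file names are placeholders, and channel_axis requires a recent scikit-image (older versions used multichannel=True).

import cv2
from skimage.metrics import structural_similarity

# Placeholder file names for one ground-truth / enhanced image pair
gt = cv2.imread("hr_ground_truth.png")
sr = cv2.imread("sr_enhanced.png")
sr = cv2.resize(sr, (gt.shape[1], gt.shape[0]))  # both metrics need equal sizes

# PSNR in decibels: higher is better (roughly 20-40 dB for 8-bit images)
psnr = cv2.PSNR(gt, sr)

# SSIM: 1.0 means structurally identical; channel_axis=-1 treats the
# last axis as colour channels
ssim = structural_similarity(gt, sr, channel_axis=-1)

print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")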
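And for Phase 3, one possible post-processing and comparison step using OpenCV's non-local means denoiser. The filter strengths (h=10, hColor=10) and the file names are illustrative placeholders, not tuned values.

import cv2
import numpy as np

enhanced = cv2.imread("sr_enhanced.png")  # output of the super-resolution model

# Non-local means denoising for colour images; h/hColor set filter strength
denoised = cv2.fastNlMeansDenoisingColored(enhanced, None, 10, 10, 7, 21)

# Side-by-side comparison against the low-resolution input, upscaled to the
# same size so the two images can be stacked horizontally
low_res = cv2.imread("lr_input.png")
low_res = cv2.resize(low_res, (denoised.shape[1], denoised.shape[0]))
comparison = np.hstack([low_res, denoised])
cv2.imwrite("comparison.png", comparison)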
Preprocessing Stage:

1. Data collection: The first step is to collect a large dataset of surveillance footage containing instances of people walking, running, or performing other activities. The dataset should cover a variety of lighting conditions, camera angles, and scenarios.

2. Data pre-processing: Next, pre-process the collected data to ensure it is of high quality and ready for training. This can include resizing, cropping, and filtering the images, as well as removing noise and artifacts.

3. Data labelling: The dataset needs to be labelled with annotations indicating the location of people in each image or video frame. This can be done manually or with automated tools.

4. Model selection: Select a deep learning architecture suited to detecting people in surveillance footage. Commonly used object detection models include Faster R-CNN, YOLO, and SSD.

5. Training the model: Once the model is selected, train it on the labelled dataset using an appropriate optimization algorithm such as stochastic gradient descent (SGD). The goal is to adjust the model's parameters to minimize the loss function and maximize detection accuracy (a hedged training sketch appears after the code listing at the end of this document).

6. Model evaluation: After training, evaluate the model on a separate test set to measure its performance on new, unseen data. This step helps to expose overfitting, underfitting, or other generalization problems.

7. Model deployment: Finally, deploy the trained model to a real-world surveillance system to detect and track people in real time. The model's output can trigger alarms or alerts, or assist human operators in identifying suspicious activity.

Overall, training a model for person surveillance requires careful attention to data quality, labelling, and the choice of deep learning techniques, together with testing and evaluation to ensure the model is reliable and effective in real-world scenarios.

For this project in particular:

1. Data collection: Surveillance data can be drawn from public datasets such as COCO (Common Objects in Context), which includes images of people performing various activities. Footage from local CCTV cameras can also be used, provided you have permission to use it for the project.

2. Data processing: An important processing task is data augmentation, which generates additional training samples by applying transformations such as flipping, rotating, or cropping to the existing images. This helps the model generalize to new and unseen data (a minimal augmentation sketch appears after the code listing below).

3. Data labelling: The data can be labelled manually with tools such as LabelImg, which lets you draw bounding boxes around the people in each image or video frame, or more efficiently through crowd-sourcing platforms such as Amazon Mechanical Turk.

4. Model selection: YOLO (You Only Look Once) is one of the best-suited models for detecting people in surveillance footage: it is a real-time object detection system that finds multiple objects in a single pass over an image or video frame, making it fast as well as accurate.

Overall, the project involves collecting surveillance data, augmenting it, labelling it with bounding boxes, and training a YOLO model to detect people in the footage; the model is then evaluated on a separate test set and deployed to a real-world surveillance system. The script below runs a pre-trained YOLOv3 model through OpenCV's dnn module over a directory of images and saves annotated copies:

import cv2
import numpy as np

# Load the YOLOv3 network definition and pre-trained weights
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")

# Load the COCO class names (one per line)
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]

# Resolve the names of the unconnected output layers; older OpenCV versions
# return Nx1 arrays from getUnconnectedOutLayers(), newer ones a flat array,
# so flatten defensively before indexing
layer_names = net.getLayerNames()
out_ids = np.array(net.getUnconnectedOutLayers()).flatten()
output_layers = [layer_names[i - 1] for i in out_ids]
input_size = (416, 416)

# One drawing colour per class, fixed across all images
colors = np.random.uniform(0, 255, size=(len(classes), 3))

# Process each image in the dataset
for img_idx in range(1000):
    # Load the image and resize it to the network input size
    img = cv2.imread(f"images/{img_idx}.jpg")
    if img is None:
        continue  # skip missing or unreadable files
    img = cv2.resize(img, input_size)

    # Data augmentation (e.g., random cropping, rotation, flipping) would be
    # applied here during dataset preparation
    # ...

    # Run YOLO object detection
    blob = cv2.dnn.blobFromImage(img, 1/255.0, input_size, swapRB=True, crop=False)
    net.setInput(blob)
    outs = net.forward(output_layers)

    # Post-process the detections
    conf_threshold = 0.5
    nms_threshold = 0.4
    class_ids = []
    confidences = []
    boxes = []
    for out in outs:
        for detection in out:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > conf_threshold:
                # YOLO reports box centres and sizes normalized to [0, 1]
                center_x = int(detection[0] * img.shape[1])
                center_y = int(detection[1] * img.shape[0])
                width = int(detection[2] * img.shape[1])
                height = int(detection[3] * img.shape[0])
                left = int(center_x - width / 2)
                top = int(center_y - height / 2)
                class_ids.append(class_id)
                confidences.append(float(confidence))
                boxes.append([left, top, width, height])

    # Non-maximum suppression removes overlapping duplicate boxes
    indices = cv2.dnn.NMSBoxes(boxes, confidences, conf_threshold, nms_threshold)

    # Draw the surviving bounding boxes and class labels
    for i in np.array(indices).flatten():
        box = boxes[i]
        label = f"{classes[class_ids[i]]}: {confidences[i]:.2f}"
        color = colors[class_ids[i]]
        cv2.rectangle(img, (box[0], box[1]), (box[0] + box[2], box[1] + box[3]), color, 2)
        cv2.putText(img, label, (box[0], box[1] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

    # Save the annotated result (using img_idx, not the NMS loop variable,
    # so the original filename index is preserved)
    cv2.imwrite(f"result/{img_idx}.jpg", img)
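As referenced in the data processing step above, a minimal sketch of the augmentations mentioned there (flipping, rotation, random cropping), built with OpenCV and NumPy. The rotation range, crop ratio, and file paths are arbitrary choices for illustration.

import os
import random
import cv2

def augment(img):
    """Return a randomly flipped, rotated, and cropped copy of img."""
    # Horizontal flip with 50% probability
    if random.random() < 0.5:
        img = cv2.flip(img, 1)

    # Small random rotation about the image centre
    angle = random.uniform(-15, 15)
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    img = cv2.warpAffine(img, M, (w, h))

    # Random crop to 90% of the original size, then resize back
    ch, cw = int(0.9 * h), int(0.9 * w)
    y = random.randint(0, h - ch)
    x = random.randint(0, w - cw)
    return cv2.resize(img[y:y + ch, x:x + cw], (w, h))

# Example: generate three augmented variants of one frame
os.makedirs("augmented", exist_ok=True)
frame = cv2.imread("images/0.jpg")
for k in range(3):
    cv2.imwrite(f"augmented/0_{k}.jpg", augment(frame))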
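Finally, for the training step (step 5 of the preprocessing stage): YOLOv3 is usually trained with its original Darknet tooling rather than from a short Python script, so the sketch below instead illustrates SGD-based detector training with torchvision's Faster R-CNN, one of the other architectures mentioned above. The train_loader, the hyperparameters, and the two-class setup (background plus person) are assumptions for illustration; this is a sketch under those assumptions, not a definitive training script, and weights="DEFAULT" requires a recent torchvision.

import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Start from a pre-trained Faster R-CNN and replace the box predictor
# with a two-class head: background + person (an assumed label scheme)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, 2)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device).train()

# SGD as described in step 5; learning rate and momentum are placeholders
optimizer = torch.optim.SGD(model.parameters(), lr=0.005,
                            momentum=0.9, weight_decay=5e-4)

# train_loader is an assumed DataLoader yielding lists of image tensors and
# target dicts ({"boxes": FloatTensor[N, 4], "labels": Int64Tensor[N]}),
# which is torchvision's detection training format
for epoch in range(10):
    for images, targets in train_loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        # In training mode the model returns a dict of losses to minimize
        loss_dict = model(images, targets)
        loss = sum(loss_dict.values())

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()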