Muhammad Faizan
FRKT Department Neural Networks and Neural Computers
Moscow Institute of Physics and Technology

Draft

Thesis Title: Object detection (people, vehicles, animals)

Abstract: Because object detection is closely related to video analysis and image understanding, it has attracted much research attention in recent years. This paper provides a review of deep-learning-based object detection frameworks. It focuses on typical generic object detection architectures, along with some modifications and useful tricks that further improve detection performance. It also analyses some state-of-the-art object detection algorithms, including YOLO and MRCNN, and compares them under various conditions. Finally, several promising directions and tasks are presented as guidelines for future work in both object detection and related neural-network-based learning systems.

General Terms: Deep learning, object detection, neural network, CNN

Background: Because object detection is closely related to video analysis and image understanding, it has attracted much research attention in recent years. To gain a complete understanding of an image, we should not only concentrate on classifying different images, but also try to precisely estimate the concepts and locations of the objects contained in each image. However, due to large variations in viewpoint, pose, occlusion and lighting conditions, it is difficult to accomplish object detection perfectly when localization is required in addition to classification.

Aims and Objectives: Explore state-of-the-art object detection algorithms, compare their performance, and implement one that accurately detects specific objects.
Literature Review: Traditional detection algorithms based on manually extracted features mainly consist of six steps: preprocessing, window sliding, feature extraction, feature selection, feature classification and postprocessing. They are generally designed for specific recognition tasks. Their main disadvantages are small data size, poor portability, lack of pertinence, high time complexity, window redundancy, lack of robustness to diverse changes, and good performance only in specific simple environments. In 2012, the AlexNet image classification model, based on a convolutional neural network (CNN), was proposed by Krizhevsky et al. [1]. In the image classification competition of the image dataset

● R-CNN: In 2014, the R-CNN [6] algorithm was proposed by Girshick. It is the first real target detection model based on convolutional neural networks. The improved R-CNN model achieves 66% mAP. As shown in figure 1, the model first uses Selective Search to extract approximately 2000 region proposals from each image to be detected. Each extracted proposal is then scaled to a uniform size and passed through a CNN to produce a fixed-length feature vector, and these features are input into an SVM classifier for classification. Finally, a linear regression model is trained to refine the bounding box.

● SPP-Net: In 2015, the Spatial Pyramid Pooling (SPP) model was proposed. It addresses two problems of R-CNN: low detection efficiency and the need for image blocks of fixed input size. The algorithm extracts the features of the region proposals from the feature map obtained after the original image has passed through the convolutional layers, so all convolution calculations are performed only once. In addition, a spatial pyramid pooling layer is added after the last convolutional layer, and the features of each region proposal are passed through this layer to produce a feature vector of fixed size.
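The fixed-size output of the spatial pyramid pooling layer described for SPP-Net can be illustrated with a minimal sketch. It assumes a single-channel feature map stored as a list of rows, max pooling inside each bin, and pyramid levels of 1x1, 2x2 and 4x4 bins. The function name spp_max_pool and these level choices are illustrative assumptions, not details taken from the SPP-Net implementation.

```python
import math


def spp_max_pool(feature_map, levels=(1, 2, 4)):
    """Pool one 2-D feature map (a list of rows) into a fixed-length vector.

    For each pyramid level n, the map is split into an n x n grid of bins
    and the maximum of each bin is kept, so the output length is
    sum(n * n for n in levels) regardless of the input height and width.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Row range of bin i; floor/ceil boundaries keep every bin non-empty.
            r0 = (i * height) // n
            r1 = math.ceil((i + 1) * height / n)
            for j in range(n):
                # Column range of bin j, computed the same way.
                c0 = (j * width) // n
                c1 = math.ceil((j + 1) * width / n)
                pooled.append(
                    max(
                        feature_map[r][c]
                        for r in range(r0, r1)
                        for c in range(c0, c1)
                    )
                )
    return pooled


# Two region features of different spatial size (hypothetical values).
small_region = [[float(r * 6 + c) for c in range(6)] for r in range(4)]
large_region = [[float(r * 5 + c) for c in range(5)] for r in range(7)]

# Both yield 1 + 4 + 16 = 21 values per channel.
print(len(spp_max_pool(small_region)), len(spp_max_pool(large_region)))
```

In a real network the same pooling would be applied to every channel of the last convolutional feature map and the results concatenated, which is what allows the following fully connected layers to accept region proposals of arbitrary size.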
● YOLOv3: YOLOv3, proposed by Redmon, is so far the object detection model with the best balance between detection speed and detection accuracy. For category prediction, YOLOv3 changes the original single-label classification into multi-label classification, replacing the softmax layer used for single-label multi-class classification with a logistic regression layer for multi-label multi-class classification. Although YOLOv3 further improves detection speed and significantly improves the detection of small targets, its detection accuracy has not improved significantly, especially when IOU>0.5.

Work Done: I have completed a literature study on object detection through deep learning. I have also explored different datasets for specific object detection and how they can be used together to enhance performance. In addition, different algorithms are being tested on different test data to analyse which of them performs best under most conditions.

Conclusion: As one of the most basic and challenging problems in computer vision, object detection has received great attention in recent years. Detection algorithms based on deep learning have been widely applied in many fields, but deep learning still has some open problems to be explored:
1) Reducing the dependence on data.
2) Achieving efficient detection of small objects.
3) Realizing multi-category object detection.

References:
[1] Krizhevsky, A., Sutskever, I., Hinton, G. ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, 2012, 25: 1097-1105.
[2] Tian, J.X., Liu, G.C., Gu, S.S., Ju, Z.J., Liu, J.G., Gu, D.D. Research and Challenge of Deep Learning Methods for Medical Image Analysis. Acta Automatica Sinica, 2018, 44: 401-424.