Muhammad Faizan Thesis Draft

Muhammad Faizan
FRKT Department
Neural Networks and Neural Computers
Moscow Institute of Physics and Technology
Draft Thesis
Title:
Object detection (people, vehicles, animals)
Abstract:
Because object detection is closely related to video analysis and image understanding, it has
attracted much research attention in recent years. This paper provides a review of deep learning
based object detection frameworks. It focuses on typical generic object detection
architectures, along with some modifications and useful tricks that improve detection performance
further. It also analyses some state-of-the-art object detection algorithms, including YOLO and
MRCNN, and compares them under various conditions. Finally, several promising
directions and tasks are given as guidelines for future work in both object detection
and related neural network based learning systems.
General Terms:
Deep learning, object detection, neural network, CNN
Background:
Object detection is closely related to video analysis and image understanding, which is why it has
attracted much research attention in recent years. To gain a complete understanding of an image, we
should not only classify the image, but also precisely estimate the
categories and locations of the objects it contains. However, due to large variations in
viewpoint, pose, occlusion and lighting conditions, it is difficult to accomplish object
detection perfectly, because it adds a localization task on top of classification.
Aims and Objectives:
Explore state-of-the-art object detection algorithms, compare their performance, and
implement one that accurately detects specific objects.
Literature Review:
Traditional detection algorithms based on manually extracted features mainly include six steps:
preprocessing, window sliding, feature extraction, feature selection, feature classification and
postprocessing. They are generally designed for specific recognition tasks. Their disadvantages
mainly include small data size, poor portability, lack of pertinence, high time complexity, window
redundancy, no robustness to diverse changes, and good performance only in specific simple
environments.
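To make the window redundancy concrete, the following is a minimal sketch of the window sliding step only. The image size, window size and stride are illustrative assumptions and are not taken from any specific detector discussed here.

```python
def sliding_windows(image_width, image_height, window_size, stride):
    """Yield (x, y, w, h) boxes that tile the image on a regular grid."""
    win_w, win_h = window_size
    # Stop early enough that every window lies fully inside the image.
    for y in range(0, image_height - win_h + 1, stride):
        for x in range(0, image_width - win_w + 1, stride):
            yield (x, y, win_w, win_h)


# Hypothetical settings: a 640x480 image, one 64x128 window, stride 16.
windows = list(sliding_windows(640, 480, window_size=(64, 128), stride=16))
print(len(windows))  # 851 candidate windows at a single scale
```

Every one of these windows would have to go through feature extraction and classification, and a real pipeline repeats this over several scales and aspect ratios, which is where the high time complexity comes from.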
In 2012, the AlexNet image classification model, based on a convolutional neural network (CNN), was
proposed by Krizhevsky et al. [1]. In the image classification competition of the image dataset
● R-CNN:
In 2014, the R-CNN [6] algorithm was proposed by Girshick. It is the first real object
detection model based on convolutional neural networks. The improved R-CNN model
achieves 66% mAP. As shown in figure 1, the model first uses Selective Search to
extract approximately 2000 region proposals from each image to be detected. Each
proposal is then scaled to a fixed size and passed through a CNN to obtain a fixed-length
feature vector, and these features are fed into an SVM classifier for classification. Finally, a linear
regression model is trained to refine the bounding box.
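The bounding-box regression step can be illustrated with a small sketch. It assumes the centre/size parameterization commonly described for R-CNN (offsets normalized by the proposal size, log-scale width and height). The text above does not specify the targets, so treat the formulas and function names as assumptions.

```python
import math


def bbox_regression_targets(proposal, ground_truth):
    """Targets that map a proposal onto its ground-truth box.

    Boxes are (center_x, center_y, width, height).
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    return (
        (gx - px) / pw,      # horizontal shift, relative to proposal width
        (gy - py) / ph,      # vertical shift, relative to proposal height
        math.log(gw / pw),   # log-scale change in width
        math.log(gh / ph),   # log-scale change in height
    )


def apply_bbox_deltas(proposal, deltas):
    """Invert the parameterization: move and resize the proposal."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + tx * pw, py + ty * ph, pw * math.exp(tw), ph * math.exp(th))


proposal = (50.0, 50.0, 20.0, 40.0)
ground_truth = (55.0, 60.0, 30.0, 40.0)
targets = bbox_regression_targets(proposal, ground_truth)
refined = apply_bbox_deltas(proposal, targets)
```

During training the regressor would learn to predict `targets` from the proposal's CNN features. At test time the predicted values are applied to the proposal as in `apply_bbox_deltas`.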
● SPP-Net:
In 2015, the Spatial Pyramid Pooling (SPP) model was proposed to solve the problems
of low detection efficiency and the need for fixed-size input image blocks in R-CNN. The
algorithm extracts the features of the region proposals from the feature map obtained after the
original image has passed through the convolutional layers, so all convolution calculations are
performed only once. In addition, a spatial pyramid pooling layer is added after
the last convolutional layer, and the features of each region proposal are passed through
it to extract a feature vector of fixed size.
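A minimal sketch of how spatial pyramid pooling turns feature maps of different sizes into vectors of the same length. It works on a single-channel map stored as nested lists and uses max pooling with pyramid levels 1x1, 2x2 and 4x4. The levels and the bin-boundary rule are illustrative assumptions.

```python
import math


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2D feature map into n x n bins for each pyramid level."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Bin edges scale with the input size, so the number of bins
            # (and therefore the output length) never depends on it.
            row_start = math.floor(i * height / n)
            row_end = math.ceil((i + 1) * height / n)
            for j in range(n):
                col_start = math.floor(j * width / n)
                col_end = math.ceil((j + 1) * width / n)
                pooled.append(max(
                    feature_map[r][c]
                    for r in range(row_start, row_end)
                    for c in range(col_start, col_end)
                ))
    return pooled


# Two region features of different spatial size (hypothetical values).
small = [[r * 5 + c for c in range(5)] for r in range(4)]    # 4 x 5
large = [[r * 9 + c for c in range(9)] for r in range(13)]   # 13 x 9
pooled_small = spatial_pyramid_pool(small)
pooled_large = spatial_pyramid_pool(large)
print(len(pooled_small), len(pooled_large))  # both 1 + 4 + 16 = 21
```

With several channels the same pooling would be applied per channel and the results concatenated, giving a fixed-length vector that a fully connected layer can accept.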
● YOLOv3:
YOLOv3, proposed by Redmon, is so far the object detection model that best balances detection
speed and detection accuracy. For category prediction, YOLOv3 mainly
changes the original single-label classification into multi-label classification, replacing
the original softmax layer used for single-label multi-class classification with a logistic regression
layer for multi-label multi-class classification.
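The difference between the two output layers can be shown with a short sketch. The class names, scores and the 0.5 threshold are made-up values chosen only to illustrate overlapping labels.

```python
import math


def softmax(logits):
    """Single-label output: probabilities compete and sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """Multi-label output: one independent logistic score per class."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


class_names = ["person", "woman", "car"]   # hypothetical, overlapping labels
logits = [2.0, 1.8, -3.0]                  # hypothetical raw class scores

softmax_probs = softmax(logits)
sigmoid_probs = sigmoid(logits)

# With independent logistic outputs, several labels can pass the threshold.
labels = [name for name, p in zip(class_names, sigmoid_probs) if p > 0.5]
print(labels)
```

Under softmax the two overlapping classes split the probability mass between them, whereas the logistic outputs can both be high for the same box.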
Although the YOLOv3 model further improves detection speed and significantly improves
the detection of small targets, its detection accuracy has not improved
significantly, especially when IOU > 0.5.
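For reference, IOU (intersection over union) measures the overlap between a predicted box and a ground-truth box. The sketch below assumes boxes given as (x1, y1, x2, y2) corner coordinates, and the example boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped to zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


ground_truth = (0, 0, 10, 10)
prediction = (2, 0, 12, 10)   # shifted 2 pixels to the right
overlap = iou(ground_truth, prediction)
print(round(overlap, 3))      # 0.667
```

This prediction counts as correct at a threshold of 0.5 but as a miss at 0.75, so a stricter threshold rewards tighter localization.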
Work Done:
I have carried out a literature study on object detection through deep learning. I have also
explored different datasets for specific object detection and how they can be combined to
enhance performance. Moreover, different algorithms are being tested on different test
data to analyse which of them performs best under most conditions.
Conclusion:
As one of the most basic and challenging problems in computer vision, object detection has
received great attention in recent years. Detection algorithms based on deep learning have
been widely applied in many fields, but some problems remain to be explored:
1) Reducing the dependence on data.
2) Achieving efficient detection of small objects.
3) Realizing multi-category object detection.
References:
[1] Krizhevsky, A., Sutskever, I., Hinton, G. ImageNet Classification with Deep Convolutional
Neural Networks. Advances in Neural Information Processing Systems, 2012, 25: 1097-1105.
[2] Tian, J.X., Liu, G.C., Gu, S.S., Ju, Z.J., Liu, J.G., Gu, D.D. Research and Challenge of Deep
Learning Methods for Medical Image Analysis. Acta Automatica Sinica, 2018, 44: 401-424.