Proceedings of the International Conference on Intelligent Computing and Control Systems (ICICCS 2020)
IEEE Xplore Part Number:CFP20K74-ART; ISBN: 978-1-7281-4876-2
Drone Detection and Classification using Deep Learning
Dinesh Kumar Behera
Dept. of Aerospace Engineering
Defence Institute of Advanced Technology
Pune, India
dkbehera36@gmail.com

Arockia Bazil Raj
Dept. of Electronics Engineering
Defence Institute of Advanced Technology
Pune, India
brazilraj.a@diat.ac.in
Abstract— This paper presents a systematic approach to drone detection and classification using deep learning with different modalities. The YOLOv3 object detector is used to detect moving or still objects, providing a reliable computer vision-based solution. The convolutional neural network extracts features from images and detects the object with maximum accuracy. The model is trained on a proper dataset for only 150 epochs to detect various types of drones. A convolutional neural network with modern object detection methods is shown to be an excellent approach for real-time detection of drones.

Keywords— Deep learning, convolutional neural network, object detector
I. INTRODUCTION
The drone industry has expanded exponentially, making this gadget reachable to common citizens at cheaper prices. Loaded with explosives, a drone can easily be converted into a killer weapon, and terror attack attempts using drones have already been reported [1, 17]. As drones are small in size and have a small electromagnetic signature, they are difficult for conventional RADAR to detect. Counter mechanisms are therefore being appraised by industry and the academic world.
An object has a specific structure and texture as well as some specific pattern. In natural environments, it is difficult to differentiate between objects of the same type because of high variation. The performance of an object detector is reduced by lighting conditions, changes of appearance, and the angle at which the object faces the camera. Most object detectors fail when the object deforms or changes in scale [3, 4]. Clutter and background noise add further difficulties. In modern-day object detection, the convolutional neural network (CNN) [14] has performed so well that traditional methods have almost vanished from the picture. The best part of the convolutional neural network is its ability to extract features. Based on the convolutional neural network, many object detectors have come into the picture, such as R-CNN [15], Fast R-CNN [7], Faster R-CNN [9],
YOLO [6], SSD [12], etc. Apart from that, high-performance GPUs and their easy availability through high-performance cloud computing have advanced computational ability and played a crucial role in the success of neural networks. Deep learning architectures offer feature extraction that is easier to understand and more reliable than conventional machine learning approaches [4-5].
In this study, experiments with the latest object detectors are carried out based on the deep learning approach to detect drones. While detecting, the system also classifies the drone type, such as tricopter, quadcopter, or hexacopter. The model is trained with a proper dataset so that, if an orientation or scaling issue prevents it from finding the type, it still detects the drone. In figure 1, some images from the dataset are shown with ground truth annotation [6-8].
Fig 1. Sample images from the dataset
II. LITERATURE REVIEW
In the past few years, many object detectors have been proposed in the field of deep learning, such as R-CNN [15], Fast R-CNN [7], Faster R-CNN [9], YOLO [6], SSD [12], etc. CNN brought a revolutionary change in the field of computer vision and object detection. CNN is a hierarchical model that can be trained to perform a variety of detection, recognition, and segmentation tasks. It conducts a region-based approach in which characteristics are extracted in a layer hierarchy, where lower layers in the network extract
lower-level characteristics such as edges, while middle layers display droplet-like structures [18-20].
R-CNN, Fast R-CNN, and Faster R-CNN are the modern region-based approaches. R-CNN and Fast R-CNN use the selective search method to find object proposals at different scales and positions in the image. For all of these, the image is compressed to a fixed size of 227x227 pixels before feeding it to the CNN model. R-CNN collects 2000 regions using the selective search [21] method, which groups similar-looking sub-regions and merges them to get the final big region. After getting the big region, a classifier has to classify it into one of the classes. But selective search does not fulfill the proper requirements, so a linear regression method is used to map the predicted bounding box to the ground truth bounding box. Then an SVM classifier [8] is used in an offline manner; for every classified region, one SVM classifier is used. R-CNN takes more time for the testing procedure as its training pipeline is very complex. In the case of Fast R-CNN, the feeding approach is the same as in R-CNN. After feeding the image to the CNN, a CNN feature map is created. This feature map is fed to the RoI pooling layer, which compresses the regions into a square size and feeds them to fully connected layers to get image vectors. Then, using the softmax layer, it predicts offset values for the bounding boxes as well as the class to which the bounded region belongs.
In the case of Faster R-CNN, instead of using the selective search method to find regions from the CNN feature map, a separate region proposal network is used. Its proposals are then fed to the RoI pooling layer, and finally a softmax layer is used for classification.
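To make the RoI pooling step concrete, the following minimal sketch uses torchvision's built-in roi_pool; the feature-map size, the region coordinates, and the spatial scale are illustrative values, not the configuration of any detector discussed here.

# Minimal sketch of RoI pooling (illustrative shapes, not a real detector setup).
import torch
from torchvision.ops import roi_pool

# A dummy CNN feature map: batch of 1, 256 channels, 32x32 spatial grid.
feature_map = torch.randn(1, 256, 32, 32)

# One region of interest in (batch_index, x1, y1, x2, y2) format,
# given in the coordinate system of the original image.
rois = torch.tensor([[0, 64.0, 48.0, 192.0, 160.0]])

# spatial_scale maps image coordinates to feature-map coordinates
# (e.g. a 256x256 image reduced to a 32x32 map gives 32/256 = 0.125).
pooled = roi_pool(feature_map, rois, output_size=(7, 7), spatial_scale=0.125)
print(pooled.shape)  # torch.Size([1, 256, 7, 7]) -- fixed-size input for the FC layers

Whatever the size of the proposed region, the output is always 7x7 per channel, which is what lets regions of arbitrary shape feed fixed-size fully connected layers.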
The network generates a class likelihood and offset values for each bounding box. A threshold is set on the predicted class probability of the bounding box; if the value is more than the threshold, that region has the maximum probability of containing the object in the image. SSD is a feed-forward CNN that generates fixed-size bounding boxes, each with a confidence score for every class, to locate the object inside the image. Its architecture is based on the VGG-16 (Visual Geometry Group-16) [13] architecture, with the fully connected layers not taken into consideration. For high-quality image classification, VGG-16 is very accurate performance-wise. To extract features at different scales, a collection of convolutional layers is added along with the VGG-16 layers.
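As a concrete illustration of this thresholding step, the sketch below filters hypothetical detections by a confidence threshold and then applies torchvision's non-maximum suppression to remove duplicate boxes; all box coordinates, scores, and thresholds are made-up example values.

# Sketch of confidence thresholding followed by non-maximum suppression.
import torch
from torchvision.ops import nms

# Hypothetical detector output: boxes in (x1, y1, x2, y2) format with class scores.
boxes = torch.tensor([[10., 10., 110., 110.],
                      [12., 12., 108., 108.],
                      [200., 200., 260., 260.]])
scores = torch.tensor([0.92, 0.85, 0.30])

threshold = 0.5                      # illustrative class-probability threshold
keep_conf = scores > threshold       # drop low-confidence boxes first
boxes, scores = boxes[keep_conf], scores[keep_conf]

# Suppress overlapping boxes that describe the same object (IoU > 0.45).
keep = nms(boxes, scores, iou_threshold=0.45)
print(boxes[keep])                   # only the highest-scoring of the two overlapping boxes survives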
Based on this brief survey of object detectors, YOLO was chosen in this study for the experiments on drone detection and classification.
III. METHODOLOGY
In this work, an approach is proposed to detect a moving or still target using the YOLOv3 [11] object detector to get maximum accuracy. YOLOv3 is the latest model in the YOLO family. YOLO uses only a convolutional neural network to get features from images, with some small changes for better and faster performance. DarkNet-53 is used in YOLOv3, which has 53 convolutional layers for extracting features from the images. The architecture of DarkNet-53 is shown in figure 2 [27, 28].
Fig 2. DarkNet-53 architecture
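As an illustration of the building block behind DarkNet-53, the sketch below reconstructs one residual unit in PyTorch (a 1x1 convolution that halves the channels, a 3x3 convolution that restores them, each with batch normalization and leaky ReLU, plus a shortcut connection); it is a reading aid based on the published architecture, not the exact code used in this experiment.

# Minimal sketch of one DarkNet-53 residual block (reconstruction from the
# published architecture, not the training code used in this work).
import torch
import torch.nn as nn

class DarknetResidual(nn.Module):
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.block = nn.Sequential(
            nn.Conv2d(channels, half, kernel_size=1, bias=False),             # 1x1 squeeze
            nn.BatchNorm2d(half),
            nn.LeakyReLU(0.1),
            nn.Conv2d(half, channels, kernel_size=3, padding=1, bias=False),  # 3x3 expand
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        return x + self.block(x)  # shortcut connection around the two convolutions

x = torch.randn(1, 64, 56, 56)
print(DarknetResidual(64)(x).shape)  # torch.Size([1, 64, 56, 56])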
YOLOv3 follows the same prediction method as YOLO9000 [10] with some small changes. It uses dimension clusters as anchor boxes for predicting bounding boxes. The equations given below show the calculation of the coordinate points of a bounding box [29, 30].
$b_x = \sigma(t_x) + c_x$    (1)
$b_y = \sigma(t_y) + c_y$    (2)
$b_w = p_w e^{t_w}$    (3)
$b_h = p_h e^{t_h}$    (4)
where $t_x$ and $t_y$ give the x and y coordinates of the starting point of the bounding box in the image, $c_x$ and $c_y$ give the top-left coordinate offset of the grid cell in the image, $p_w$ and $p_h$ are the width and height of the prior (anchor) box, and $t_w$ and $t_h$ give the predicted width and height of each bounding box. For predicting an objectness score, YOLOv3 uses logistic regression.
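Equations (1)-(4) translate directly into code. The sketch below decodes one predicted box; the raw network outputs, grid-cell offsets, and anchor dimensions are made-up example values.

# Decoding one bounding box from raw predictions, following Eqs. (1)-(4).
import torch

t_x, t_y, t_w, t_h = torch.tensor([0.2, -0.1, 0.4, 0.3])  # raw network outputs (example)
c_x, c_y = 3.0, 5.0        # top-left offset of the responsible grid cell (example)
p_w, p_h = 4.5, 6.0        # width/height of the anchor (prior) box (example)

b_x = torch.sigmoid(t_x) + c_x   # Eq. (1)
b_y = torch.sigmoid(t_y) + c_y   # Eq. (2)
b_w = p_w * torch.exp(t_w)       # Eq. (3)
b_h = p_h * torch.exp(t_h)       # Eq. (4)

print(b_x.item(), b_y.item(), b_w.item(), b_h.item())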
For class prediction, it uses a binary cross-entropy loss function. It predicts boxes at three different scales. As it uses a convolutional neural network as its main structure, it extracts features from the boxes of different scales using a concept similar to the feature pyramid network. When a bounding box overlaps the ground truth object more than every other bounding box, its objectness value is 1. If a bounding box is not the best but still overlaps a ground truth object by more than the threshold, that prediction is ignored. In DarkNet-53, the last few layers give a 3-D tensor that carries the information about the bounding box, offset value, and class scores [31, 32].
For class prediction, each box predicts the classes that the bounding box may contain using multi-label classification.
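A minimal sketch of this multi-label scheme is given below: each class gets an independent logistic output trained with binary cross-entropy (in PyTorch, BCEWithLogitsLoss combines the sigmoid and the loss in a numerically stable way); the class count and example values are illustrative.

# Sketch of YOLOv3-style multi-label class prediction: independent logistic
# classifiers per class trained with binary cross-entropy (illustrative values).
import torch
import torch.nn as nn

num_classes = 4                                # e.g. tricopter, quadcopter, hexacopter, drone
logits = torch.randn(1, num_classes)           # raw class scores for one predicted box
target = torch.tensor([[0., 1., 0., 0.]])      # ground truth: box contains a quadcopter

loss = nn.BCEWithLogitsLoss()(logits, target)  # sigmoid + BCE, one per class
probs = torch.sigmoid(logits)                  # per-class probabilities (not softmaxed)
print(loss.item(), probs)

Unlike a softmax, the per-class sigmoids do not compete with one another, so a box can carry more than one label when the classes overlap.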
PyTorch, an open-source machine learning library based on the Torch library, is used as the platform. It is developed by Facebook's AI research lab for natural language processing and computer vision applications. PyTorch has two interfaces, Python and C++, but the Python interface is the more refined one, as Python is the most used language for AI projects.
Fig 3. Flowchart of the complete experiment
IV. RESULTS FROM EXPERIMENT
A. Dataset
The dataset was made by collecting images from the internet and extracting frames from videos of different types of drones. There are more than 10000 images of different categories of drones. The drone types are differentiated based on their number of rotors; in this work, all drones are multi-rotor drones: three rotors make a tricopter, four rotors a quadcopter, and six rotors a hexacopter. If, due to some lighting or viewpoint issue, the type cannot be differentiated, those images are assigned to a generic drone category to train the model. The drones appear in the images at different scales, orientations, viewpoints, and illuminations. The annotations give the height, width, top-left (x, y) coordinate, and type of drone for the ground truth bounding box. For this experiment, the annotations are taken in YOLO format.
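Because the ground truth stores the top-left corner, width, and height, while YOLO format stores a class index with center coordinates and dimensions normalized by the image size, a small conversion is needed. The sketch below shows one way to do it; the field order, class numbering, and image size are illustrative assumptions, not the exact tooling used here.

# Sketch: convert a top-left (x, y, width, height) annotation to YOLO format,
# i.e. "class_id x_center y_center width height" normalized to [0, 1].
# Field order, class numbering, and image size are illustrative assumptions.

def to_yolo(x, y, w, h, img_w, img_h, class_id):
    x_center = (x + w / 2.0) / img_w
    y_center = (y + h / 2.0) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {w / img_w:.6f} {h / img_h:.6f}"

# Example: a quadcopter (class 1) at top-left (120, 80), 200x150 px, in a 640x480 image.
print(to_yolo(120, 80, 200, 150, img_w=640, img_h=480, class_id=1))
# -> "1 0.343750 0.322917 0.312500 0.312500"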
Fig 4. Results from training
B. Performance of Dataset Training
The model is trained on an NVIDIA GeForce GTX 1050 Ti GPU with a learning rate of 0.0001 and a batch size of 64. The performance of the model is analyzed at different iterations: every 10 epochs, the model's loss rate, precision, recall, etc. are saved during training, which is kept running for 150 epochs. To understand the training procedure and how good the training is, the loss, precision, and recall values are calculated; the graphs of the model performance are shown in figure 4. To evaluate the detection performance, the mean average precision (mAP) value is calculated. The results show the best performance of the model is an mAP of 0.74 at the 150th epoch.
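The training schedule described above can be sketched as follows; the model, loss, and data are small stand-ins so the example runs on its own, whereas the actual experiment trains the YOLOv3 model on the drone dataset.

# Sketch of the training schedule: 150 epochs, checkpointed every 10 epochs.
# The model, loss, and data below are placeholder stand-ins, not YOLOv3.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 4))  # stand-in network
criterion = nn.CrossEntropyLoss()                               # stand-in loss
loader = [(torch.randn(64, 3, 32, 32),                          # batch size 64, as in the paper
           torch.randint(0, 4, (64,))) for _ in range(2)]

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)       # learning rate 0.0001

for epoch in range(1, 151):                                     # 150 epochs total
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    if epoch % 10 == 0:                                         # save progress every 10 epochs
        torch.save({"epoch": epoch, "loss": loss.item(),
                    "state_dict": model.state_dict()}, f"checkpoint_{epoch}.pt")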
C. Visual Analysis of Test Results
In figure 5, some drone detection results on images are shown. The testing images are different from the training images. The four images show different types of drones from different viewpoints: the first image is detected as a quadcopter, the second as a hexacopter, the third as a generic drone, and the fourth as a tricopter.
Fig 5. Results from test images
V. CONCLUSION
Due to the big architecture of the YOLOv3 model and the small number of classes, the model is trained for only 150 epochs. In some cases, the model is unable to detect the correct drone type. To improve accuracy, a new kind of counter mechanism can be integrated, such as RF signal detection, in which the RF signal between the operator and the drone is analyzed. X-band RADAR and micro-Doppler RADAR are other new methods. A modern acoustic system can detect a drone from its blade sound and is also able to detect the type of drone.
REFERENCES
[1] A. Rohan, M. Rabah, and S.-H. Kim, "Convolutional neural network-based real-time object detection and tracking for Parrot AR Drone 2," School of Electronics and Information Engineering and Dept. of Control and Robotics Engineering, Kunsan National University, Gunsan, South Korea.
[2] J. Lee, J. Wang, D. Crandall, S. Sabanovic, and G. Fox, "Real-time, cloud-based object detection for unmanned aerial vehicles," School of Informatics and Computing, Indiana University.
[3] E. Unlu, E. Zenou, and N. Riviere, "Using shape descriptors for UAV detection," Electronic Imaging 2017, Burlingame, United States, Jan. 2018, pp. 1-5.
[4] D. R. Lee, W. G. La, and H. Kim, "Drone detection and identification system using artificial intelligence," School of Electrical Engineering, Korea University, Seoul, Rep. of Korea.
[5] M. Saqib, N. Sharma, and S. D. Khan, "A study on detecting drones using deep convolutional neural networks."
[6] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 779-788.
[7] R. Girshick, "Fast R-CNN," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2015.
[8] T. Evgeniou and M. Pontil, "Support vector machines: Theory and applications."
[9] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Advances in Neural Information Processing Systems, 2015, pp. 91-99.
[10] J. Redmon and A. Farhadi, "YOLO9000: Better, faster, stronger," University of Washington, Allen Institute for AI.
[11] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," University of Washington.
[12] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single shot multibox detector."
[13] X. Zhang, J. Zou, K. He, and J. Sun, "Accelerating very deep convolutional networks for classification and detection."
[14] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," Dept. of Computer Science, New York University, USA.
[15] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2014, pp. 580-587.
[16] J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders, "Selective search for object recognition."
[17] A. Arockia Bazil Raj and H. C. Kumawat, "Extraction of Doppler signature of micro-to-macro rotations/motions using CW radar assisted measurement system," IET Science, Measurement & Technology, accepted Mar. 2020.
[18] D. V. Thiruvoth, B. Pawan Kumar, V. S. Kumar, A. A. Bazil Raj, and R. D. Gupta, "Dual-band shared-aperture reflectarray antenna element at Ku-band for the TT&C application of a geostationary satellite," in Proc. IEEE Int. Conf. Recent Trends on Electronics, Information, Communication and Technology (RTEICT), 2019.
[19] A. Gupta and A. A. Bazil Raj, "Feature extraction of intra-pulse modulated LPI waveforms using STFT," in Proc. IEEE Int. Conf. Recent Trends on Electronics, Information, Communication and Technology (RTEICT), 2019, pp. 90-95.
[20] S. Batabyal and A. A. Bazil Raj, "Design of ring oscillator based PUF with enhanced challenge response pair and improved reliability," in Proc. IEEE Int. Conf. Recent Trends on Electronics, Information, Communication and Technology (RTEICT), 2019, pp. 1-5.
[21] L. Prasad and A. A. Bazil Raj, "Design of 2D-WH/TS OCDMA PON ONU receiver with FLC technique," in Proc. IEEE Int. Conf. Recent Trends on Electronics, Information, Communication and Technology (RTEICT), 2019, pp. 90-95.
[22] R. Vaishnavi, G. Unnikrishnan, and A. A. Bazil Raj, "Implementation of algorithm for point target detection and tracking in infrared image sequence," in Proc. IEEE Int. Conf. Recent Trends on Electronics, Information, Communication and Technology (RTEICT), 2019, pp. 1-5.
[23] S. R. Nishad and A. A. Bazil Raj, "Sliding mode control of robotic gait simulator," in Proc. IEEE Int. Conf. Intelligent Computing and Control Systems (ICCS), 2019, pp. 1-6.
[24] P. Shakya and A. A. Bazil Raj, "Inverse synthetic aperture radar imaging using Fourier transform technique," in Proc. IEEE Int. Conf. Innovations in Information and Communication Technology (ICIICT), 2019, pp. 1-4.
[25] E. Unlu, E. Zenou, and N. Riviere, "Using shape descriptors for UAV detection," Electronic Imaging 2017, Burlingame, United States, Jan. 2018, pp. 1-5.
[26] U. Garg, A. A. Bazil Raj, and K. P. Ray, "Cognitive radar assisted target tracking: A study," in Proc. IEEE Int. Conf. Communication and Electronics Systems (ICCES), 2018, pp. 427-430.
[27] A. Arockia Bazil Raj et al., "Multi-bit digital receiver design for radar signature estimation," in Proc. IEEE Int. Conf. Recent Trends in Electronics, Communication and Information Technology, 2018, pp. 1-6.
[28] A. Arockia Bazil Raj et al., "Design and evaluation of C-band FMCW radar system," in Proc. IEEE Int. Conf. Trends in Electronics and Informatics, 2018, pp. 1-5.
[29] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 779-788.
[30] A. Arockia Bazil Raj et al., "Prehistoric man's fire to today's free space optical communication: Technology and advancements," IEEE Communications Surveys and Tutorials, R1 submitted, Mar. 2020.
[31] A. A. Bazil Raj, FPGA-Based Embedded System Developer's Guide, 1st ed. USA: CRC Press, 2018.