Proceedings of the International Conference on Intelligent Computing and Control Systems (ICICCS 2020)
IEEE Xplore Part Number: CFP20K74-ART; ISBN: 978-1-7281-4876-2

Drone Detection and Classification using Deep Learning

Dinesh Kumar Behera
Dept. of Aerospace Engineering
Defence Institute of Advanced Technology
Pune, India
dkbehera36@gmail.com

Arockia Bazil Raj
Dept. of Electronics Engineering
Defence Institute of Advanced Technology
Pune, India
brazilraj.a@diat.ac.in

Abstract— This paper presents a systematic approach to drone detection and classification using deep learning with different modalities. The YOLOv3 object detector is used to detect moving or still objects, providing a reliable computer-vision-based solution. The convolutional neural network extracts features from images and detects objects with maximum accuracy. The model is trained on a purpose-built dataset, for only 150 epochs, to detect various types of drones. A convolutional neural network combined with modern object detection methods proves an excellent approach for real-time detection of drones.

Keywords— Deep learning, convolutional neural network, object detector

I. INTRODUCTION

The drone industry is expanding exponentially, making these gadgets reachable to common citizens at low prices. Loaded with explosives, a drone can easily be converted into a lethal weapon, and several attempted terror attacks involving drones have already been reported [1, 17]. Because drones are small in size and have a weak electromagnetic signature, they are difficult for conventional RADAR to detect. Counter mechanisms are therefore being appraised by both industry and the academic world.

An object has a specific structure and texture as well as characteristic patterns. In natural environments, it is difficult to differentiate between objects of the same type because of high variation. The performance of an object detector degrades with lighting conditions, changes of appearance, and the angle at which the object faces the camera. Most object detectors fail when the object is deformed or its scale changes [3, 4]. Background noise and clutter add further difficulty for the object detector.

In modern object detection, the convolutional neural network (CNN) [14] has performed so well that traditional methods have almost vanished from the picture. The best part of the convolutional neural network is its ability to extract features. Many object detectors have been built on CNNs, such as R-CNN [15], Fast R-CNN [7], Faster R-CNN [9], YOLO [6], and SSD [12]. Apart from that, high-performance GPUs, easily available through high-performance cloud computing, have advanced computational ability and played a crucial role in the success of neural networks. Deep learning architectures extract features that are easier to understand and more reliable than those of conventional machine learning approaches [4, 5].

In this study, experiments with the latest object detectors are carried out based on the deep learning approach to detect drones. While detecting, the system also classifies the drone type: tricopter, quadcopter, or hexacopter. The model is trained on a dataset prepared so that, when orientation or scaling prevents the type from being identified, the object is still detected as a drone. In figure 1, some images from the dataset are shown with ground truth annotation.

Fig 1. Sample images from the dataset
II. LITERATURE REVIEW

In the past few years, many object detectors have been proposed in the field of deep learning, such as R-CNN [15], Fast R-CNN [7], Faster R-CNN [9], YOLO [6], and SSD [12]. The CNN brought a revolutionary change to computer vision and object detection. A CNN is a hierarchical model that can be trained to perform a variety of detection, recognition, and segmentation tasks. Characteristics are extracted in a layer hierarchy: lower layers in the network extract low-level characteristics such as edges, while middle layers respond to droplet-like structures [18-20].

R-CNN, Fast R-CNN, and Faster R-CNN are the modern region-based approaches. R-CNN and Fast R-CNN use the selective search method [21] to find object proposals at different scales and positions in the image; in each case the image is resized to a fixed size of 227x227 pixels before being fed to the CNN model. R-CNN collects 2000 regions using selective search, merging similar-looking sub-regions to obtain the final large regions, each of which a classifier then assigns to one of the classes. Because selective search alone does not meet the localization requirements, a linear regression stage maps each predicted bounding box to the ground truth bounding box. An SVM classifier [8] is then applied in an offline manner, with one SVM per class. R-CNN is slow at test time, and its training pipeline is very complex.

Fast R-CNN feeds the image to the CNN in the same way as R-CNN, but computes a single CNN feature map. This feature map is passed to an RoI pooling layer, which compresses each region to a fixed square size and feeds it to fully connected layers to obtain image vectors. A softmax layer then predicts offset values for the bounding boxes as well as the class to which each bounded region belongs.

Faster R-CNN replaces selective search with a separate region proposal network that operates on the feature map from the CNN layers. The proposals are fed to the RoI pooling layer and finally to a softmax layer for classification. The network generates a class likelihood and offset values for each bounding box. A threshold is set on the predicted class probability of the bounding box: if the value exceeds the threshold, that region has the maximum probability of containing the object in the image.

SSD is a feed-forward CNN that generates a fixed-size set of bounding boxes with a confidence score for each class to locate the object inside the image. Its architecture is based on VGG-16 (Visual Geometry Group 16) [13], with the fully connected layers removed. VGG-16 is very accurate for high-quality image classification. To extract features at different scales, a collection of convolutional layers is added on top of the VGG-16 layers.

Based on this brief survey of object detectors, YOLO was chosen in this study for the experiments on drone detection and classification.
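The fixed-size compression performed by the RoI pooling layer can be made concrete with a short PyTorch sketch (PyTorch being the platform used later in this work). This is a minimal illustration, not code from any of the cited papers: the feature-map size and regions are arbitrary values, and torchvision's roi_pool operator stands in for the pooling layer described above.

```python
import torch
from torchvision.ops import roi_pool

# A dummy CNN feature map: batch of 1, 256 channels, 32x32 spatial grid.
feature_map = torch.randn(1, 256, 32, 32)

# Two candidate regions of different sizes, given as
# (batch_index, x1, y1, x2, y2) in feature-map coordinates.
rois = torch.tensor([
    [0.0,  2.0,  2.0, 12.0, 20.0],   # a tall region
    [0.0, 15.0,  5.0, 30.0, 14.0],   # a wide region
])

# RoI pooling compresses every region to the same square size (7x7 here),
# so the following fully connected layers always see a fixed-length vector.
pooled = roi_pool(feature_map, rois, output_size=(7, 7), spatial_scale=1.0)
print(pooled.shape)  # torch.Size([2, 256, 7, 7])
```

However large or oddly shaped the input region is, the pooled output has the same dimensions, which is what lets Fast R-CNN reuse one feature map for all proposals.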
III. METHODOLOGY

In this work, an approach is proposed to detect the target, moving or still, using the YOLOv3 [11] object detector to obtain maximum accuracy. YOLOv3 is the latest model in the YOLO family. YOLO uses only a convolutional neural network to extract features from images, with some small changes for better and faster performance. YOLOv3 uses DarkNet-53, which has 53 convolutional layers, to extract features from the images. The architecture of DarkNet-53 is shown in figure 2 [27, 28].

Fig 2. DarkNet53 Architecture

YOLOv3 follows the same prediction method as YOLO9000 [10] with some small changes. It uses dimension clusters as anchor boxes for predicting bounding boxes. The equations below give the calculation of the coordinate points of a bounding box [29, 30]:

b_x = σ(t_x) + c_x    (1)
b_y = σ(t_y) + c_y    (2)
b_w = p_w e^(t_w)     (3)
b_h = p_h e^(t_h)     (4)

where t_x, t_y, t_w, and t_h are the raw network outputs for the box centre coordinates, width, and height; (c_x, c_y) is the offset of the grid cell from the top-left corner of the image; (p_w, p_h) are the width and height of the bounding box prior (anchor); and σ is the logistic sigmoid function.

For predicting an objectness score, YOLOv3 uses logistic regression, and for class prediction it uses the binary cross-entropy loss function. It predicts boxes at 3 different scales and, since its main structure is a convolutional neural network, it extracts features from these scales using a concept similar to the feature pyramid network. The objectness target is 1 when a bounding box prior overlaps the ground truth object more than every other prior; if a prior is not the best but its overlap with the ground truth object exceeds the threshold value, its prediction is ignored. In DarkNet-53, the last few layers produce a 3-d tensor that encodes the bounding box, offset values, and class scores [31, 32]. For class prediction, each box predicts the classes its bounding box can contain using multi-label classification.

PyTorch, an open-source machine learning library based on the Torch library, is used as the platform. It is developed by Facebook's AI research lab for natural language processing and computer vision applications. PyTorch has two interfaces, Python and C++; the Python interface is the more refined one, as Python is the most widely used language for AI projects.
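As a concrete illustration of Eqs. (1)-(4), the sketch below decodes raw network outputs into box coordinates in PyTorch, the platform used in this work. The function name, the grid-cell offsets, and the anchor prior values are illustrative assumptions; only the transformations themselves follow the equations above.

```python
import torch

def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Decode raw YOLOv3 outputs into a box centre and size, per Eqs. (1)-(4).

    t_* : raw network predictions for one anchor at one grid cell
    c_x, c_y : offset of the grid cell from the image's top-left corner
    p_w, p_h : width and height of the anchor (bounding box prior)
    """
    b_x = torch.sigmoid(t_x) + c_x   # Eq. (1)
    b_y = torch.sigmoid(t_y) + c_y   # Eq. (2)
    b_w = p_w * torch.exp(t_w)       # Eq. (3)
    b_h = p_h * torch.exp(t_h)       # Eq. (4)
    return b_x, b_y, b_w, b_h

# Example with arbitrary values: a prediction at grid cell (3, 5)
# with an anchor prior of 10x13 (in grid units).
t = torch.tensor([0.2, -0.4, 0.1, 0.3])  # t_x, t_y, t_w, t_h
bx, by, bw, bh = decode_box(t[0], t[1], t[2], t[3],
                            c_x=3.0, c_y=5.0, p_w=10.0, p_h=13.0)
print(bx.item(), by.item(), bw.item(), bh.item())
```

The sigmoid keeps the predicted centre inside its grid cell, while the exponential scales the anchor prior to the predicted width and height.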
Fig 3. Flowchart of the complete experiment

IV. RESULTS FROM EXPERIMENT

A. Dataset

The dataset was built by collecting images from the internet and extracting frames from videos of different types of drones. It contains more than 10000 images of different categories of drones. The drone types are differentiated by their number of rotors; all drones here are multirotor drones: a drone with three rotors is a tricopter, one with four rotors is a quadcopter, and one with six rotors is a hexacopter. Images in which lighting or viewpoint makes the type impossible to differentiate are placed in a generic drone category for training. The drones appear in the images at different scales, orientations, viewpoints, and illuminations. For each ground truth bounding box, the annotation gives the height, width, top-left (x, y) coordinate, and drone type. For this experiment, the annotations are taken in YOLO format.
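The conversion from this corner-based annotation into YOLO format can be sketched as follows. This is a hypothetical helper rather than the paper's actual tooling; it assumes pixel-coordinate annotations and the usual YOLO label convention of a class id followed by the normalized box centre, width, and height.

```python
def to_yolo_format(x, y, w, h, img_w, img_h, class_id):
    """Convert a top-left (x, y) / width / height annotation in pixels
    into a YOLO label line: class_id cx cy w h, all normalized to [0, 1]."""
    cx = (x + w / 2) / img_w   # box centre as a fraction of image width
    cy = (y + h / 2) / img_h   # box centre as a fraction of image height
    return f"{class_id} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}"

# Example: a 100x80 px box at (200, 150) in a 640x480 image,
# with class 1 standing for 'quadcopter' (class ids are illustrative).
print(to_yolo_format(200, 150, 100, 80, 640, 480, 1))
# -> "1 0.390625 0.395833 0.156250 0.166667"
```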
B. Performance of Dataset Training

The model is trained on an NVIDIA GeForce GTX 1050 Ti GPU with a learning rate of 0.0001 and a batch size of 64. The performance of the model is analyzed at different iterations: every 10 epochs, the loss rate, precision, recall, etc. are saved during training, and training is kept running for 150 epochs. To understand the training procedure and how good the training is, the loss, precision, and recall values are calculated. The graphs of the model performance are shown in figure 4. To evaluate the detection performance, the mean average precision (mAP) is calculated; the best performance of the model is 0.74, reached at the 150th epoch.

Fig 4. Results from training

C. Visual Analysis of test results

In figure 5, some drone detection results on images are shown. The testing images are different from the training images. The four images show different types of drones from different viewpoints: the first is detected as a quadcopter, the second as a hexacopter, the third as a generic drone, and the fourth as a tricopter.

Fig 5. Results from test images
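For context on how such final boxes are usually obtained: raw YOLOv3 outputs are commonly filtered by thresholding the objectness/confidence score and applying non-maximum suppression before visualization. The paper does not state its post-processing parameters, so the thresholds and detections below are purely illustrative; torchvision's nms operator is used for the suppression step.

```python
import torch
from torchvision.ops import nms

# Illustrative decoded detections: (x1, y1, x2, y2) boxes with
# objectness/confidence scores, e.g. from the decode step in Sec. III.
boxes = torch.tensor([
    [100., 100., 200., 220.],
    [105.,  98., 198., 225.],   # near-duplicate of the first box
    [300.,  50., 360., 120.],
])
scores = torch.tensor([0.92, 0.85, 0.30])

conf_threshold = 0.5            # drop low-confidence boxes
keep = scores > conf_threshold
boxes, scores = boxes[keep], scores[keep]

# Non-maximum suppression removes overlapping duplicates (IoU > 0.45 here).
kept_idx = nms(boxes, scores, iou_threshold=0.45)
print(boxes[kept_idx])          # only the best box per object survives
```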
V. CONCLUSION

Because of the large architecture of the YOLOv3 model and the small number of classes, the model is trained for only 150 epochs. In some cases, the model is unable to identify the correct drone type. To improve accuracy, other kinds of counter mechanisms can be integrated, such as RF signal detection, which monitors the RF link between the operator and the drone. X-band RADAR and micro-Doppler RADAR are newer methods, and a modern acoustic system can detect a drone from its blade sound and also identify the drone type.

REFERENCES

[1] A. Rohan, M. Rabah, and S.-H. Kim, "Convolutional Neural Network-Based Real-Time Object Detection and Tracking for Parrot AR Drone 2," School of Electronics and Information Engineering and Department of Control and Robotics Engineering, Kunsan National University, Gunsan, South Korea.
[2] J. Lee, J. Wang, D. Crandall, S. Sabanovic, and G. Fox, "Real-Time, Cloud-based Object Detection for Unmanned Aerial Vehicles," School of Informatics and Computing, Indiana University.
[3] E. Unlu, E. Zenou, and N. Riviere, "Using Shape Descriptors for UAV Detection," Electronic Imaging 2017, Burlingame, United States, Jan. 2018, pp. 1-5.
[4] D. R. Lee, W. G. La, and H. Kim, "Drone Detection and Identification System using Artificial Intelligence," School of Electrical Engineering, Korea University, Seoul, Republic of Korea.
[5] M. Saqib, N. Sharma, and S. D. Khan, "A Study on Detecting Drones Using Deep Convolutional Neural Networks."
[6] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 779-788.
[7] R. Girshick, "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, 2015.
[8] T. Evgeniou and M. Pontil, "Support Vector Machines: Theory and Applications."
[9] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Advances in Neural Information Processing Systems, 2015, pp. 91-99.
[10] J. Redmon and A. Farhadi, "YOLO9000: Better, Faster, Stronger," University of Washington, Allen Institute for AI.
[11] J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," University of Washington.
[12] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single Shot MultiBox Detector."
[13] X. Zhang, J. Zou, K. He, and J. Sun, "Accelerating Very Deep Convolutional Networks for Classification and Detection."
[14] M. D. Zeiler and R. Fergus, "Visualizing and Understanding Convolutional Networks," Dept. of Computer Science, New York University, USA.
[15] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580-587.
[16] J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders, "Selective Search for Object Recognition."
[17] A. Arockia Bazil Raj and H. C. Kumawat, "Extraction of Doppler Signature of micro-to-macro rotations/motions using CW Radar assisted measurement system," IET Science, Measurement & Technology, accepted Mar. 2020.
[18] D. V. Thiruvoth, B. Pawan Kumar, V. S. Kumar, A. A. Bazil Raj, and R. D. Gupta, "Dual-Band Shared-Aperture Reflectarray Antenna Element at Ku-Band for the TT&C Application of a Geostationary Satellite," IEEE Int. Conf. on Recent Trends on Electronics, Information, Communication and Technology (RTEICT), 2019.
[19] A. Gupta and A. A. Bazil Raj, "Feature Extraction of Intra-Pulse Modulated LPI Waveforms Using STFT," IEEE Int. Conf. on Recent Trends on Electronics, Information, Communication and Technology (RTEICT), 2019, pp. 90-95.
[20] S. Batabyal and A. A. Bazil Raj, "Design of Ring Oscillator Based PUF with Enhanced Challenge Response Pair and Improved Reliability," IEEE Int. Conf. on Recent Trends on Electronics, Information, Communication and Technology (RTEICT), 2019, pp. 1-5.
[21] L. Prasad and A. A. Bazil Raj, "Design of 2D-WH/TS OCDMA PON ONU Receiver with FLC Technique," IEEE Int. Conf. on Recent Trends on Electronics, Information, Communication and Technology (RTEICT), 2019, pp. 90-95.
[22] R. Vaishnavi, G. Unnikrishnan, and A. A. Bazil Raj, "Implementation of Algorithm for Point Target Detection and Tracking in Infrared Image Sequence," IEEE Int. Conf. on Recent Trends on Electronics, Information, Communication and Technology (RTEICT), 2019, pp. 1-5.
[23] S. R. Nishad and A. A. Bazil Raj, "Sliding Mode Control of Robotic Gait Simulator," IEEE Int. Conf. on Intelligent Computing and Control Systems (ICCS), 2019, pp. 1-6.
[24] P. Shakya and A. A. Bazil Raj, "Inverse Synthetic Aperture Radar Imaging Using Fourier Transform Technique," IEEE Int. Conf. on Innovations in Information and Communication Technology (ICIICT), 2019, pp. 1-4.
[25] E. Unlu, E. Zenou, and N. Riviere, "Using Shape Descriptors for UAV Detection," Electronic Imaging 2017, Burlingame, United States, Jan. 2018, pp. 1-5.
[26] U. Garg, A. A. Bazil Raj, and K. P. Ray, "Cognitive Radar Assisted Target Tracking: A Study," IEEE Int. Conf. on Communication and Electronics Systems (ICCES), 2018, pp. 427-430.
[27] A. Arockia Bazil Raj et al., "Multi-Bit Digital Receiver Design for Radar Signature Estimation," IEEE Int. Conf. on Recent Trends in Electronics, Communication and Information Technology, 2018, pp. 1-6.
[28] A. Arockia Bazil Raj et al., "Design and Evaluation of C-band FMCW Radar System," IEEE Int. Conf. on Trends in Electronics and Informatics, 2018, pp. 1-5.
[29] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 779-788.
[30] A. Arockia Bazil Raj et al., "Prehistoric man's fire to today's free space optical communication: technology and advancements," IEEE Communications Surveys and Tutorials, R1 submitted, Mar. 2020.
[31] A. A. Bazil Raj, FPGA-Based Embedded System Developer's Guide, 1st ed., USA: CRC Press, 2018.