DocuSign Envelope ID: 0386690B-F538-4312-8D49-C49D2DEA582B

DEGREE PROJECT IN ELECTRICAL ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2020

Detecting Faulty Tape-around Weatherproofing Cables by Computer Vision

RUIWEN SUN

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Master’s Program in Embedded Systems (120 credits)
Date: February 17, 2020
Supervisor: Yuan Yao
Examiner: Zhonghai Lu
School of Electrical Engineering and Computer Science (EECS)
Host company: Ericsson

Abstract

More cables will be installed as more radio towers are set up for 5G. However, a large proportion of radio units are mounted high up in the open, which makes it difficult for human technicians to maintain the systems. Under these circumstances, automatic detection of faults on radio cabinets is crucial. Cables and connectors are usually covered with weatherproofing tape, and one of the most common problems is that the tape is not wound tightly around the cables and connectors. The loose end then sticks out from the cable and looks like a waving flag, which may seriously damage the radio system. This thesis aims at detecting such flagging tape and addressing the issue. The thesis experiments with two methods of object detection: a convolutional neural network, and image processing with OpenCV. The former uses the YOLO (You Only Look Once) network for training and testing, while in the latter the connected-component method is applied to detect large objects such as cables, and a line segment detector extracts the flagging-tape boundary.
Multiple parameter configurations, structurally and functionally distinct, were developed to find the most suitable way to meet the requirements. Furthermore, precision and recall are used to evaluate the quality of the system output, and larger experiments with different parameter values were performed to improve the results. The results show that the best way of detecting faulty weatherproofing is the image processing method, with which the recall is 71% and the precision reaches 60%. This method performs better than YOLO on flagging-tape detection. It shows the great potential of this kind of object detection, and a detailed discussion of its limitations is also presented in the thesis.

Keywords

Object Detection, Image Processing, OpenCV, YOLO, Line Segment Detector

Sammanfattning (Swedish Abstract)

Fler kablar kommer att installeras i och med att fler radiotorn byggs för 5G. En stor del av radioenheterna är dock monterade högt i det öppna utrymmet, vilket gör det svårt för mänskliga tekniker att underhålla systemen. Under dessa omständigheter är automatisk upptäckt av fel bland radioskåp avgörande. Kablar och kontakter täcks vanligtvis med väderbeständiga band, och ett av de vanligaste problemen är att banden inte är tätt lindade runt kablarna och kontakterna. Detta gör att tejpen går ur kabeln och ser ut som en viftande flagga, vilket allvarligt kan skada radiosystemen. Avhandlingen syftar till att upptäcka detta flaggband och ta itu med problemet. Den här avhandlingen experimenterar med två metoder för objektdetektering: det faltningsbaserade neurala nätverket såväl som OpenCV och bildbehandling.
Den förstnämnda använder YOLO-nätverket (You Only Look Once) för träning och testning, medan i den senare metoden används metoden med sammanhängande komponenter för detektering av stora föremål som kablarna, och linjesegmentdetektorn ansvarar för utvinning av flaggbandsgränsen. Flera parametrar, strukturellt och funktionellt unika, utvecklades för att hitta det mest lämpliga sättet att uppfylla kravet. Dessutom används precision och återkallande för att utvärdera prestandan för systemutgångskvaliteten, och för att förbättra resultaten utfördes större experiment med olika parametrar. Resultaten visar att det bästa sättet att upptäcka felaktig väderbeständighet är med bildbehandlingsmetoden, genom vilken återkallelsen är 71% och precisionen når 60%. Denna metod visar bättre prestanda än YOLO vid detektering av flaggband. Metoden visar den stora potentialen för denna typ av objektdetektering, och en detaljerad diskussion om begränsningarna presenteras också i avhandlingen.

Nyckelord

Objektdetektion, Bildbehandling, OpenCV, YOLO, Linjesegmentdetektor

Acknowledgements

I would like to express my deepest gratitude to Athanasios Karapantelakis, Yifei Jin, and my Ericsson supervisor, Maxim Teslenko, for the opportunity to carry out this project and for taking the time to assist me with my questions throughout the thesis process. I would also like to thank my reviewer Zhonghai Lu for accepting my project and for providing valuable feedback throughout the project and its documentation. Finally, I would like to thank my family, my girlfriend, and my friends for giving me the greatest help throughout my learning process.

Stockholm, January 2020
Ruiwen Sun

Contents

1 Introduction
1.1 Background
1.2 Problem
1.3 Purpose
1.4 Goals
1.5 Research Methodology
1.6 Benefits, Ethics and Sustainability
1.7 Delimitations
1.8 Outline

2 Theoretical Background
2.1 Object Detection Method
2.1.1 R-CNN
2.1.2 YOLO
2.2 Custom Image Processing Function
2.2.1 Greyscale and Binary Thresholding
2.2.2 Erosion and Dilation
2.2.3 Find Contours and Connected Component
2.3 Line Segmentation Methods
2.3.1 Hough Transform
2.3.2 Line Segment Detector (LSD)
2.4 Precision and Recall
2.5 Related Work
2.5.1 Small Object Detection
2.5.2 Line Segmentation
2.5.3 Connected Component
2.6 Summary

3 Methods
3.1 Research Methodology
3.1.1 Qualitative and Quantitative Research Methods
3.1.2 Philosophical Assumptions
3.1.3 Research Methods
3.1.4 Data Collection and Analysis Methods
3.1.5 Quality Assurance
3.1.6 Inference
3.2 Software Environment
3.3 Experimental Design
3.3.1 Method of YOLO
3.3.2 Method of OpenCV and Image Processing
3.4 Summary

4 Design and Implementation
4.1 Implementation on YOLO Network
4.1.1 Data Labelling
4.1.2 Network Configurations
4.1.3 Training Dataset
4.2 Implementation of OpenCV and Image Processing
4.2.1 Image Initialisation
4.2.2 Remove Blobs in the Image
4.2.3 Calculate Cable Information
4.2.4 Line Segment Detection
4.3 Summary

5 Measurements and Result
5.1 Object Detection of Faulty Weatherproofing
5.1.1 Result
5.1.2 Discussion
5.2 Method Based on Computer Vision and Image Processing
5.2.1 Testing Method and Testing Environment
5.2.2 Testing Result Under Fixed Binary Threshold
5.2.3 Testing Result Under Adjustable Binary Threshold
5.2.4 Discussion
5.3 Summary

6 Conclusion and Future Work
6.1 Conclusion
6.2 Limitations
6.3 Future Work

Bibliography
Appendix A Table of Testing Result
Appendix B Testing Result by Image Processing

List of Figures

1.1 Thesis outline: six chapters
2.1 Phases of image identification
2.2 Bounding boxes with the predicted location
2.3 Framework of Darknet-53
2.4 Gradient and level-lines definition
2.5 ERF-YOLO performance comparison
2.6 Performance comparison of EDLines and LSD
3.1 Research methods and methodologies
3.2 Flowchart of the training of the object detection model
3.3 The flowchart of the detection system
4.1 OpenLabeling tool interface
4.2 Sample blobs in the test set
4.3 Connectivity introduction
4.4 Bounding box selection
4.5 Inner reflection sample
4.6 LSD sample
4.7 Positions of endpoints
4.8 Distance calculation
5.1 Sample test image
5.2 Result of the sample image
5.3 Binary threshold versus precision and recall
5.4 Binary threshold versus run time
5.5 Comparison between slider and fixed threshold

List of Tables

2.1 Comparison of backbones
2.2 Contingency table
3.1 Summary of Research Methodology
5.1 Testing Environment
5.2 Test result under different binary thresholds
A.1 Statistics of all the test set with the binary threshold of 30
A.2 Statistics of all the test set with the binary threshold of 40
A.3 Statistics of all the test set with the binary threshold of 50
A.4 Statistics of all the test set with the binary threshold of 60
A.5 Statistics with slider

List of Acronyms and Abbreviations

mAP  Mean average precision
YOLO  You Only Look Once
MIMO  Multiple-input and multiple-output
5G  Fifth-generation wireless technology
AI  Artificial intelligence
OpenCV  Open source computer vision library
FPN  Feature pyramid networks
RGB model  Red, green, and blue color model
GHT  Generalized Hough transform
IoU  Intersection over union
CNN  Convolutional neural network
R-CNN  Regions with CNN features
ROI  Region of interest
RPN  Region proposal networks
GPU  Graphics processing unit
XML  Extensible Markup Language
HOG  Histogram of Oriented Gradients
SVM  Support Vector Machines
LSM  Line Segment Merging
ED  Edge Drawing
VOC  Visual Object Classes
RF  Radio frequency
MNIST database  Modified National Institute of Standards and Technology database

Chapter 1
Introduction

The fifth-generation (5G) mobile networks are known to be high-speed networks, able to reach 10 Gbps theoretically [1]. This means that 5G can be used in various use cases, including intelligent transport systems [2], [3], Internet connectivity for IoT applications [4], telemedicine [5], etc. The radio site, a piece of telecommunication equipment consisting of a radio tower and a base cabinet, plays an important role here [6].
Typically, a radio base station is set up in a radio tower, covering a certain radio coverage area, to communicate with terminals such as mobile phones and wireless routers. For example, when one makes a call on a mobile phone, signals are sent and received by a nearby (usually the nearest) base station. When it comes to 5G, radio towers are expected to be massive in number but smaller in physical size [7]. This is because 5G works at higher frequencies and therefore has a shorter transmission distance. Furthermore, a larger number of devices results in higher bandwidth requirements. Consequently, the stability and performance of the radio antennas are essential. To ensure this, massive Multiple-Input Multiple-Output (MIMO) is applied [8], which uses multiple transmitting and receiving antennas. As a result, more hardware needs to be connected. In short, the radio site plays an even more important role in 5G networks, where the number of radio towers and hardware connections increases at the same time.

This thesis concerns the working status of that hardware, namely the cables. More precisely, this work focuses on detecting weatherproofing (tape-around style) on the cables of the radio tower and checking whether the weatherproofing is good or not. Automatic inspection is therefore highly valuable to meet the demands of the increasing amount of this type of hardware.

1.1 Background

Radio site inspection is particularly expensive when it involves equipment mounted on radio towers (e.g. radio units and/or antennas). This is because field service engineers have to climb the tower to assess the condition of the equipment, which is a dangerous and expensive operation. In order to address safety and cost concerns in radio site inspection, drones are used to assist field engineers with the inspection process.
However, video footage from the drones still needs post-analysis by drone operators or field service engineers. This can result in a lengthy feedback loop: the video is submitted to expert engineers, assessed, and field service engineers revisit the site and climb back up the tower if any issues are found.

1.2 Problem

On a radio tower, cables are used to connect the antennas and the radio units. During long-term use of these cables and connectors, however, especially under harsh weather conditions such as rain and strong wind, the connections may not remain as strong as expected. As a consequence, site inspections of these pieces of equipment are essential. Nevertheless, such site inspections suffer from a variety of problems in terms of human and financial resources. On the one hand, climbing up a radio tower for inspection is extremely dangerous. Tower-structure-related fatalities in America [9], [10] show that a number of technicians have lost their lives, mostly due to falls. On the other hand, the weather affects radio frequency cable installation and inspection. Moisture is a great enemy of the cable: it can become trapped inside the antenna connectors, resulting in corrosion of the shields and conductors, which significantly degrades the system. Furthermore, the inspection video has to be analysed afterwards on a computer by the accompanying field engineer, which is time-consuming. All these situations reduce the efficiency of inspection.

Commonly, site inspection issues are of two types: bending radius and weatherproofing. In terms of bending radius, bending a cable below its minimum bending radius puts stress on it, eventually causing mechanical damage and/or signal loss or attenuation. In the case of weatherproofing, there are two variants: plastic-type weatherproofing and tape-around weatherproofing.
This master thesis looks at one of the specific issues that are common on a radio tower: the correct application of weatherproofing on the cables connecting radio units and antennas. In many developing countries, engineers do not use expensive plastic weatherproofing but instead opt for weather-sealing tape on the cable connectors of the aforementioned equipment. This is also known as “tape-around” weatherproofing. However, there are many cases where the tape is either incorrectly applied or becomes loose or torn due to environmental causes. This thesis deals with the problem of detecting whether tape-around weatherproofing is in a good or bad condition. Several aspects are considered when providing the assessment.

One of the typical badly mounted cases is that the end of the tape is observed sticking out from the cable, in other words, not adhered to it. These tapes look like waving flags on a flagpole, so they are called flagging tapes in the following paragraphs. But how can these situations be precisely detected, given that the shape of flagging tape varies significantly? More concretely, how can flagging tape be detected at the bottom or top of a tape-around shielded RF (radio frequency) cable?

1.3 Purpose

The purpose of this project is to reduce costs and mitigate safety concerns for engineers inspecting radio towers. Therefore, drone-based faulty weatherproofing detection is conducted to avoid tower climbing by humans. This thesis shall present a feasible method despite the interference encountered during detection. Concretely, the research focus is morphological image transformation, but also computer vision and especially machine-learning-based object detection. Both of these sub-research areas and approaches are considered towards the final thesis deliverable.
The final approach is expected to detect faulty tape-around weatherproofing in most cases while minimizing bad performance in all respects.

1.4 Goals

The goal of this project is to build a system for faulty weatherproofing detection. The main goals and deliverables are stated as follows.

• The potential error detected in the image will be highlighted and shown to the operator.
• The AI-based object detection method should ideally be capable of detecting the weatherproofing in more than a certain number of cases, and it should produce as few false positives as possible. If the result does not meet the required performance, the alternative method of image processing will be considered.
• As far as varying lighting and illumination scenarios are concerned, operators can adjust the binary threshold and the erosion kernel size parameter in OpenCV respectively. Furthermore, the background may produce a large amount of noise; the system must therefore be robust against noisy illumination.
• The master thesis report illustrates the relevant literature, describes the solution taken, and evaluates the solution against the state of the art.

1.5 Research Methodology

The faulty weatherproofing detection system prototypes are based on theoretical and experimental work, and were also discussed with experienced engineers in the industry. The corresponding theoretical knowledge is obtained from databases including Google Scholar, IEEE Xplore, the KTH databases, DiVA, etc. The conclusions drawn from this theoretical information are explained in the “Theoretical Background” chapter. This includes both brief and detailed information required for a basic understanding of the background of the thesis, and gives an overview of the issues. Furthermore, possible solutions are developed to address those problems.
The specific focus is on the requirements set by the company. The priority requirement for the solution is relatively accurate detection and full functionality. Based on this, the precision and recall over the whole test set should be as high as possible, and performance should remain stable across different test pictures. However, the requirements do not specify target values for precision and recall, because background noise, illumination differences, and a variety of shooting angles all make a huge difference in the detection result. In contrast, time and power consumption are not the most crucial aspects of this system; rather, a reduction of false positives is critical. The potential solutions are therefore chosen primarily based on these requirements.

The experiments are based on the corresponding theoretical knowledge and are determined by empirical methods, which means they build on the observations and experiences of others [11]. The results of the experiments determine whether a method can be applied. Furthermore, other methods are introduced for comparison. A more detailed discussion of the research methods of this thesis is given in chapter 3.

1.6 Benefits, Ethics and Sustainability

With the wider application of 5G, more radio towers need to be constructed and made compatible with 5G. This creates high demand for radio tower operations including upgrades, cable mounting, etc. Since tower climbing is dangerous and costly in time and money, a better method is highly desirable. The basic benefit of this thesis is to provide more sustainable solutions. The prototype will reduce the company’s time, human resources, and financial costs, helping to increase profits and provide more sustainable solutions for the manufacturing industry.
This thesis does not contain any personal data; in other words, private data cannot be abused. To achieve high credibility, the conclusions of this thesis are supported by repeatable theoretical and experimental results. The theoretical background is informative, and citations are used carefully to make it comprehensive. This gives readers a clear picture of previous research. The results and discussion are based on experimental work and theoretical background, which makes them transparent and trustworthy.

1.7 Delimitations

The thesis is limited by technical constraints and test requirements. Among the technical constraints, it is hard to detect the exact shape of black tape against a dark background. Furthermore, the brightness of the image not only influences the reflection and specularity of the cable but also affects the binary threshold of the image. For this reason, sliders for the threshold and the kernel size are provided to cope with varying brightness and illumination. In terms of test requirements, a test set was created by experienced developers from the company, which includes a number of tape-around cables in various scenes and weather conditions, such as metropolitan areas, city night scenes, open country, etc. However, due to business administration and safety issues, it was not possible to climb a real radio tower to collect data, so there will be differences between a real site and the test set. As a consequence, the system primarily targets the simulated environment.

1.8 Outline

The thesis consists of six chapters, as shown in figure 1.1. In this section, the outline of these chapters is discussed briefly so that readers can easily follow the structure of the whole thesis.
Figure 1.1: Thesis outline: six chapters (Introduction, Theoretical Background, Methods, Implementation, Results, Conclusion)

Chapter 1 introduces the thesis by presenting an overall perspective, including the research topics and questions. Chapter 2 explains the two methods implemented in the program, the object detection method and the OpenCV image processing method, and how they can be applied. Chapter 3 introduces the methodology and tools used in the thesis, briefly explains how the system works, and discusses the risks of the object detection method; it also leads into the implementation chapter. Chapter 4 presents how the two methods are implemented in detail, breaking the system down into steps and functions; some key algorithms are also explained there. Chapter 5 illustrates the results of the two methods and shows why object detection with YOLO fails in this system, which is why the system focuses on the image processing method. For the OpenCV implementation, the result is analysed quantitatively, focusing on precision and recall. Chapter 6 concludes the whole system and its performance, and suggests future work.

Chapter 2
Theoretical Background

Image identification refers to the technique of using a computer to process and analyze images in order to identify targets and objects in various modes. Based on the observed images, the image processing method distinguishes the objects in an image by category and makes reasonable judgments. In this way, the algorithm uses modern information processing and computing techniques to simulate and implement human perception and recognition.
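Two of the image-processing building blocks this thesis later relies on, binary thresholding (section 2.2.1) and connected components (section 2.2.3), can be illustrated with a minimal pure-Python sketch. The toy pixel grid and the threshold value of 100 below are invented for illustration; the thesis itself applies these operations via OpenCV on real photographs.

```python
from collections import deque

def threshold(gray, t):
    """Binarise a 2-D intensity grid: 1 where intensity > t, else 0."""
    return [[1 if v > t else 0 for v in row] for row in gray]

def connected_components(binary):
    """Label 4-connected foreground regions; return a list of blob sizes."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    sizes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                # breadth-first flood fill from this seed pixel
                q, size = deque([(y, x)]), 0
                seen[y][x] = True
                while q:
                    cy, cx = q.popleft()
                    size += 1
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                sizes.append(size)
    return sizes

# toy 4x6 "image": two bright blobs on a dark background
gray = [
    [200, 210,  10,  10,  10,  10],
    [205, 220,  10,  10, 180,  10],
    [ 10,  10,  10,  10, 190,  10],
    [ 10,  10,  10,  10,  10,  10],
]
print(sorted(connected_components(threshold(gray, 100))))  # → [2, 4]
```

OpenCV provides the same operations efficiently on real images through `cv2.threshold` and `cv2.connectedComponents`; the point here is only the logic of separating blobs once the image has been binarised.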
In general, an image identification system is mainly composed of three parts: image segmentation, image feature extraction, and classification, as shown in figure 2.1.

Figure 2.1: Phases of image identification (pre-processing of the image, image segmentation, feature extraction, judgement and matching, result delivery)

Image segmentation divides the image into a number of meaningful regions, features are then extracted from each region, and finally the classifier classifies the images according to the extracted features. In fact, there is no strict boundary between image identification and image segmentation; in a sense, the process of image segmentation is part of the process of image identification. Image segmentation focuses on the relationship between objects and background, concentrating on the overall properties of objects in a particular context, while image identification focuses on the properties of the objects themselves. Image segmentation and identification technologies are widely used in aerospace, medical, communications, industrial automation, robotics, and military fields.

This chapter offers theoretical background information. As this project attempts two basic methods, the chapter introduces the main parts of the technical background for each. Since Convolutional Neural Networks (CNNs) have been widely applied in various fields including image recognition [12], the first part illustrates the deep learning methods attempted in the system. The second part explains and compares line detection methods. Last but not least, other image processing methods are also covered.

2.1 Object Detection Method

Compared with image classification, object detection in images is a more complicated problem in computer vision, because image classification only needs to determine which class an image belongs to.
In object detection, there may be multiple objects in an image, and both class discrimination and position determination are required for every object. Object detection is therefore more challenging than image classification, and the deep learning models applied to it are more complicated. Recent years have witnessed rapid development of object detection methods. Most popular methods fall into two types. The first is based on region proposals and includes R-CNN (Region Convolutional Neural Network) and its derivatives, while the second is based on a single CNN network, like YOLO (You Only Look Once). This section explains and compares these two types of deep-learning-based object detection.

2.1.1 R-CNN

The R-CNN algorithm was proposed in 2014 [13] and essentially established the two-stage approach in the field of object detection. The first stage generates region proposals by means of selective search, while the second applies the best recognition network of the time, AlexNet [14], to classify the object in each region. The R-CNN method works in the following steps. First, the original image is taken as input. Second, the selective search algorithm evaluates the similarity between adjacent regions; similar regions are merged, and the merged blocks are scored. In this way, candidate frames for the regions of interest (sub-graphs) can be selected; approximately 2,000 sub-graphs are selected in this step. After that, a convolutional neural network is applied to each sub-graph separately, performing convolution–ReLU (rectified linear unit)–pooling and fully connected operations to extract features; this step performs the actual object recognition. Finally, the extracted features are classified, and the blocks with high classification scores are treated as the final object positions.
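The final selection step above, keeping high-scoring blocks and discarding overlapping duplicates, is commonly realised as non-maximum suppression (NMS) over the Intersection over Union (IoU) of candidate boxes. The following is a minimal pure-Python sketch, not the thesis code; the boxes and scores are made-up values.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep the best-scoring box, then drop any remaining
    box that overlaps it by more than `thresh`; repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]: the near-duplicate box 1 is suppressed
```

The same greedy idea, with the IoU threshold as a tunable parameter, is used by essentially all detectors in this family to turn thousands of scored candidates into a short list of detections.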
R-CNN achieves a 50% performance improvement over traditional target detection algorithms. Using the VGG-16 model [15] as the object recognition model, 66% accuracy can be achieved on the VOC2007 dataset [16], which is a relatively good result. However, the biggest problem is that the method is very slow and its memory usage is very large. There are two main reasons.

• The candidate bounding boxes are produced by the traditional selective search algorithm, which is slow.

• Object recognition must be run for all 2,000 candidate frames. In other words, 2,000 convolutional network computations are needed, whose amount of calculation is gigantic.

Fast R-CNN

Aiming to address the weaknesses of R-CNN, the Fast R-CNN method was proposed [17]. Fast R-CNN mainly optimized two issues: image rescaling and speed.

• An ROI (Region of Interest) Pooling layer is introduced. It solves the problem that the candidate frame sub-graphs must be cropped and scaled to the same size. Since the input image size of the CNN network must be fixed (otherwise the fully connected layers cannot be computed), the candidate frames of different sizes and shapes in R-CNN are cropped and scaled to a common size. This operation wastes time and can easily lead to loss and deformation of image information. Fast R-CNN inserts the ROI Pooling layer before the fully connected layer, so the image does not need to be cropped, which solves this problem.

• A multi-task loss function is proposed: the classification loss and the bounding-box regression loss are combined into one unified training objective, and finally the corresponding classification and frame coordinates are output together.

Faster R-CNN

Both R-CNN and Fast R-CNN share a similar problem, which is generating candidate frames by selective search.
Nevertheless, this algorithm is very slow. Moreover, all of the 2,000 candidate frames generated in R-CNN must be fed into a convolutional neural network, which is very time-consuming. This causes the slow detection speed of these two algorithms. To address this problem, the RPN (Region Proposal Network) was proposed to obtain candidate frames, thus eliminating the selective search algorithm and requiring only one convolutional pass, which greatly improves the recognition speed. Faster R-CNN is mainly based on four steps.

• Convolution layer. The original image is first fed into a Convolution-ReLU-Pooling multilayer convolutional neural network to extract feature maps for the subsequent region proposal network and fully connected layers. Different from R-CNN, Faster R-CNN only needs to extract features from the entire image once, which greatly reduces the computation time.

• RPN layer. The RPN layer generates candidate frames and applies softmax [18] to determine whether each candidate frame is foreground or background. The RPN selects the foreground candidate frames, which usually contain targets, and uses bounding-box regression to adjust the position of each candidate frame to obtain the feature subgraph.

• ROI layer. This is similar to the layer in Fast R-CNN. It pools the proposals of different sizes into the same size, then feeds them into the subsequent fully connected layers for object classification.

• Classification layer. This layer outputs the result, including the class of the object as well as its precise location.

In general, from R-CNN to Fast R-CNN to Faster R-CNN, the deep-learning-based target detection pipeline has grown increasingly streamlined, with higher precision and faster speed, which meets recent industry and research demands.
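In all of these detectors, the overlap between a predicted box and a ground-truth box is measured by intersection over union (IoU), which also reappears in the YOLO metrics of section 2.1.2. A minimal sketch, assuming boxes are given as (x1, y1, x2, y2) corner tuples:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2×2 boxes overlapping in a 1×1 square give an IoU of 1/7; identical boxes give 1.0 and disjoint boxes give 0.0.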
2.1.2 YOLO

YOLO [19] (You Only Look Once) is a network for object detection built on a single CNN. Typically, target detection consists of two tasks: identifying the locations of the objects in the image, and classifying those objects. Previously, R-CNN and its derivative methods used a multi-step pipeline to complete the detection of objects. This results in slow operation and is difficult to optimize, because each module must be trained separately. YOLO, however, does all of this processing in a single neural network: object detection is recast as a single regression problem that obtains the bounding-box coordinates and classification probabilities directly from the image pixels [19]. So, in a nutshell, an image is taken as input, passed to a neural network similar to a normal CNN, and a vector of bounding boxes and class predictions is produced as output. The following sections introduce the progress of the later versions of YOLO.

YOLOv2 and YOLO9000

Building on YOLOv1 and improved by Joseph Redmon, the YOLO9000 and YOLOv2 algorithms were proposed at CVPR 2017 [20], focusing on the weaknesses of YOLOv1 in recall rate and positioning accuracy. At the time of presentation, YOLOv2 was faster than other detection systems on multiple detection datasets and could trade off between speed and accuracy. The article proposes a new training method, a joint training algorithm, which can mix two datasets together. Objects are categorized using a hierarchical view, and a large classification dataset is used to augment the detection dataset so that two different datasets can be mixed. The main idea is to train the object detector on both the detection dataset and the classification dataset, to learn the exact position of objects from the supervised detection data, and to increase
the number of classification categories by using the data of the classification dataset. YOLO9000 is trained with this joint training algorithm: it learns classification information for 9,000 categories from the ImageNet classification dataset [14], while object position detection is learned from the COCO detection dataset [21], [22]. Compared with the former version, YOLOv2 concentrates on improving recall and positioning accuracy while maintaining classification accuracy. Several methods from [20], implemented in YOLOv2 to improve performance, are introduced below.

• Batch normalization [23]. This approach optimizes the CNN network. Specifically, it improves the convergence of the network while eliminating the dependency on other forms of regularization. Applying batch normalization to every convolutional layer in YOLO increases mAP (mean Average Precision) by 2%. Furthermore, the model achieves regularization: with batch normalization, dropout can be removed from the model without overfitting [20].

• High-resolution classifier. YOLOv1 is pretrained on images at 224×224 resolution [19], while the newer version trains the model at 448×448 resolution. The ImageNet [14] parameters are adjusted accordingly, which results in a 4% improvement in mAP.

• Convolution with anchor boxes. YOLOv1 includes a fully connected layer that directly predicts the coordinates of bounding boxes, while the Faster R-CNN method uses only convolutional layers and the region proposal network to predict anchor box offsets and confidences rather than coordinates directly. The authors of YOLO found that by predicting offsets rather than coordinate values, the problem is simplified so that the neural network can learn more easily.
In other words, if hand-picked prior [24] bounding boxes, which have more accurate dimensions, are supplied to the system, the convolutional neural network can predict locations more easily.

• YOLOv2 removes the fully connected layer and uses anchor boxes to predict bounding boxes. At the same time, one pooling layer in the network is removed, which gives the output of the convolutional layers a higher resolution, and the network is shrunk to run at 416×416 instead of 448×448. Since the objects in a picture tend to appear at its centre, especially the larger objects, a single location at the centre of the feature map can then be responsible for predicting these objects. YOLO's convolutional layers down-sample the picture by a factor of 32, so by selecting 416×416 as the input size, the network finally outputs a 13×13 feature map. Using anchor boxes reduces accuracy slightly, but according to [20], it allows YOLOv2 to predict more than a thousand boxes, with a recall of 88% and an mAP of 69.2%.

Following the previous YOLO method, the network does not predict raw offsets; rather, it predicts location coordinates relative to the position of the grid cell, which constrains the ground-truth values to lie between 0 and 1. In order to make the network outputs fall within this range, a sigmoid function is used to limit the predictions to values between 0 and 1. The network predicts five bounding boxes in each grid cell, and each bounding box has five values tx, ty, tw, th, to. Their relationship is shown in figure 2.2. Suppose the offset of a grid cell from the upper-left corner of the image is (cx, cy), and the width and height of the bounding box prior are pw and ph. Then the prediction is given by equations 2.1 to 2.5 below, where IoU means intersection over union.
The IoU score is a standard performance metric for object segmentation problems: given a set of images, the IoU measures the similarity between the predicted region and the ground-truth region of the objects present in those images.

Figure 2.2: Bounding boxes with the predicted location (grid-cell offsets cx, cy; prior dimensions pw, ph; predicted box bw, bh with centre σ(tx), σ(ty))

bx = σ(tx) + cx    (2.1)
by = σ(ty) + cy    (2.2)
bw = pw · e^(tw)    (2.3)
bh = ph · e^(th)    (2.4)
Pr(object) × IoU(b, object) = σ(to)    (2.5)

Compared to YOLOv1, YOLOv2 is faster and stronger. YOLO uses an architecture derived from GoogLeNet [25], which is faster than VGG-16 [26]: YOLO needs only 8.52 billion operations for a forward pass, while VGG-16 requires 30.69 billion. YOLOv2 is based on a new classification model, Darknet-19 [27]. YOLOv2 uses 3×3 filters and doubles the number of channels after each pooling step. It also uses global average pooling, and by means of batch normalisation, training becomes more stable, convergence is accelerated, and the model is regularized. The final model, Darknet-19, has 19 convolutional layers and 5 max-pooling layers. It takes only 5.58 billion operations to process a single image and reaches 72.9% top-1 accuracy and 91.2% top-5 accuracy on ImageNet. During training, if the entire network is fine-tuned for 10 rounds at the larger 448×448 resolution with the initial learning rate set to 0.001, the network achieves 76.5% top-1 accuracy and 93.3% top-5 accuracy.

YOLOv3

For a long period of time, there has been an open problem in the field of computer vision: how to detect two similar targets, or targets of different types, that are close or adjacent to each other. Most algorithms scale the input image down to a smaller resolution, but generally only one bounding box will be given in this case, because the detection method treats the group of objects as a single object.
In reality, however, these are two identical or different objects. There are many new algorithms for small target detection, like [28], [29], but YOLOv3 achieves relatively good performance. Compared with SSD [30], both YOLOv1 and v2 perform worse, whilst YOLOv3 shows superior performance to the former versions. The following part illustrates the changes and improvements in YOLOv3.

• Multi-label classification prediction. After YOLO9000, the system uses dimension clusters as anchor boxes to predict the bounding boxes, and the network predicts 4 coordinates for each bounding box [20]. In YOLOv3, however, logistic regression is used to predict the objectness confidence for each bounding box [31]. The value should be 1 if the prior bounding box overlaps the ground-truth object more than any other prior bounding box. When a prior bounding box is not the best but overlaps the ground-truth object by more than a certain threshold, the prediction is ignored. Unlike YOLOv2, the system assigns only one prior bounding box to each ground-truth object; if a prior bounding box is not assigned to a ground-truth object, it incurs no loss for coordinates or category predictions.

• Each bounding box uses multi-label classification to predict which classes it might contain. The algorithm does not use softmax, since it is not necessary for high performance; instead, YOLOv3 uses independent logistic classifiers. During training, binary cross-entropy loss is applied for category prediction. In this way, for overlapping labels, the multi-label approach can model the data better.

• Cross-scale prediction is applied in YOLOv3, which uses multiple scales to make predictions. The original YOLOv2 has a layer called the passthrough layer. Assume that the size of the finally extracted feature map is 13×13.
The effect of this layer is to connect the 26×26 feature map of the previous layer with the 13×13 map of this layer, similar to ResNet [32], [33]. This operation also enhances the accuracy of the YOLO algorithm for small target detection. The idea was further refined in YOLOv3 by means of Feature Pyramid Network (FPN)-like upsampling and fusion practices [34]. Eventually, YOLOv3 combines 3 scales; the other two scales are 26×26 and 52×52 respectively. After testing on multi-scale feature maps, the improvement in detection performance for small targets is quite evident. Although each grid cell predicts 3 bounding boxes in YOLOv3 while each grid cell in YOLOv2 predicts 5, YOLOv3 produces more bounding boxes in total, because it blends features at multiple scales.

• Framework changes. YOLOv3 uses a new network for feature extraction. It augments the Darknet-19 design with continuous 3×3 and 1×1 convolutional layers plus shortcut connections, expanding it to 53 layers, called Darknet-53 [31]. Figure 2.3 illustrates the framework of Darknet-53.

Figure 2.3: Framework of Darknet-53 (input 416×416×3; stacked residual blocks producing 52×52, 26×26, and 13×13 feature maps that are upsampled and concatenated for the three prediction scales)

Compared with Darknet-19, this network is more powerful.
Moreover, Darknet-53 is also more effective than ResNet-101 and ResNet-152 [32]. Each network was trained with the same settings and tested at 256×256, with run times measured on a Titan X at 256×256. As table 2.1 shows, Darknet-53 is comparable to state-of-the-art classifiers but with fewer floating-point operations and faster speed.

Table 2.1: Comparison of backbones

Backbone     Top-1  Top-5  Bn Ops  BFLOP/s  FPS
Darknet-19   74.1   91.8    7.29    1246    171
ResNet-101   77.1   93.7   19.7     1039     53
ResNet-152   77.6   93.8   29.4     1090     37
Darknet-53   77.2   93.8   18.7     1457     78

Darknet-53 performs better than ResNet-101 and is 1.5 times faster; it has similar performance to ResNet-152 and is up to 2 times faster. Darknet-53 also achieves the highest measured rate of floating-point operations per second, which means the network structure makes better use of the GPU, making it more efficient and faster to evaluate.

The YOLO detection algorithm achieves both high detection speed and high detection accuracy. The algorithm not only works well on real entities but also generalizes well to other depictions of objects, such as works of art. Compared with other algorithms, the YOLO algorithm is more in line with the industry's real-time requirements for target detection. It is simple, easy to implement, and very friendly to embedded systems. The YOLO series continuously absorbs the advantages of other target detection algorithms, applies them to itself, and keeps improving; it is a growing family of algorithms. In conclusion, YOLOv3 is by far one of the most balanced object detection networks in terms of speed and accuracy. Through the integration of a variety of advanced methods, the weak points of the YOLO series have been addressed, including the speed issues and the precision of small object detection.
In other words, YOLOv3 finally achieves an impressive combination of detection quality and detection speed. Reviewing the framework of the YOLO series, a great performance improvement can be witnessed across the successive YOLO versions.

2.2 Custom Image Processing Function

The traditional computer vision pipeline is: obtain the image, preprocess it, extract hand-crafted features, and finally classify. Most research has focused on the construction and classification of hand-crafted features, and many outstanding works have emerged. The problem, however, is that hand-designed features may not transfer; in other words, their generalization ability is weak. One type of feature may work well for a certain class of problems but perform much worse on others. Nevertheless, for this topic of specific faulty tape-around weatherproofing detection, a very particular case, custom image processing can be implemented. The following sections give a brief introduction to several popular image processing methods.

2.2.1 Greyscale and Binary Thresholding

A colour image consists of three channels, red, green, and blue, which is called RGB. The logarithmic relationship between white and black is divided into several levels, called greyscale. The greyscale is divided into 256 steps, where white is 255 and black is 0. There are several approaches to convert an RGB image to a greyscale image, but the most common one is the weighted average method. Because human vision is most sensitive to green and least sensitive to blue, weighting parameters are introduced. According to [35], the following equation 2.6 is defined.
Y = 0.299 × R + 0.587 × G + 0.114 × B    (2.6)

In a digital image, the histogram counts the number of pixels at each grey value and displays these counts as a graph. Image binary thresholding is the process of setting the grey value of each pixel to 0 (pure black) or 255 (pure white) according to a threshold, rendering the entire image in a distinct black-and-white effect. From a greyscale image with 256 brightness levels, an appropriate threshold yields a binarized image that still reflects the overall and local features of the image. In digital image processing, the binary image plays a very important role: binarization simplifies the image and facilitates its further processing. In other words, the amount of data is reduced, and the target of interest and its contours can be highlighted. If a particular object has a uniform grey value inside and lies on a uniform background with a different grey level, thresholding can produce a good segmentation. If the difference between the object and the background is not in grey value but, for example, in texture, these features can first be converted to a greyscale difference, after which threshold selection is used to segment the image. Dynamically adjusting the threshold for binarization allows the specific results of the segmentation to be observed interactively. In this use case, differences in brightness between images cause different greyscale values in the cable area, which means that a fixed binary threshold may filter the cables into different shapes in different images.

2.2.2 Erosion and Dilation

As the terms suggest, erosion and dilation erode and dilate areas in the image. They are both morphological operations. To be specific, the operations are defined with respect to the white area, because black means 0 in the image, which contributes nothing to the operation.
These morphological operations take a structuring element as input, and the output is based on shapes. Dilation convolves the image with a kernel of arbitrary shape, typically square or circular: the kernel slides over the image, and the maximum pixel value under the kernel replaces the pixel at the anchor position. This maximization causes the bright areas in the image to "expand": the white area grows while the black parts shrink. Erosion is the opposite operation: as the kernel slides over the image, the minimum of the covered pixel values replaces the anchor-position pixel, so the white proportion becomes smaller whilst the black area expands.

2.2.3 Find Contours and Connected Component

Find Contours

Contour lines are defined as lines that join all points of the same intensity along an image boundary. Contours are very convenient in shape analysis, in finding the size of a target object, and in detecting objects [36]. Find Contours [37], as its name indicates, retrieves contours from a binary image. OpenCV provides a find-contour function that helps extract outlines from images. It works best on binary images, which means a thresholding measure, or an edge detector such as Sobel, should be applied first.

Connected Component

Connected component is a term from graph theory. A connected component is a subgraph of an undirected graph [38] inside which any two vertices are connected to each other by paths, and which is connected to no additional vertices in the rest of the graph.
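The definition above translates directly into a labelling procedure. A minimal pure-Python sketch using breadth-first search over a 4-connected binary grid (an illustration of the idea only, not the OpenCV implementation):

```python
from collections import deque

def label_components(binary):
    """Label 4-connected components of a 2D binary grid; 0 stays background."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for sy in range(h):
        for sx in range(w):
            if binary[sy][sx] and not labels[sy][sx]:
                current += 1                      # a new component starts here
                queue = deque([(sy, sx)])
                labels[sy][sx] = current
                while queue:                      # flood the component
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and binary[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return current, labels
```

For example, the grid [[1, 1, 0], [0, 0, 0], [0, 1, 1]] contains two components, and each foreground pixel receives the label of its component.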
Based on this property, every connected component can be labelled with a unique number; in this way, the objects in an image can be located and counted. This algorithm also plays an important role when integrated into image recognition or human-computer interaction systems [39], [40]. In the OpenCV library, compared with findContours, connectedComponents and connectedComponentsWithStats are newly defined in OpenCV 3; furthermore, connectedComponentsWithStats returns statistics for each labelled connected component.

2.3 Line Segmentation Methods

2.3.1 Hough Transform

The Hough transform was invented in 1972 by Richard Duda and Peter Hart [41], and the method was popularized in 1981 by the article [42], which describes a modification of the Hough transform using the template matching principle. The Hough transform was originally developed to analyze defined shapes such as lines, circles, and ellipses, detecting the shape and finding its position and orientation in the image; the modification enables the Hough transform to detect any object described by a model. In this way, with the generalized Hough transform (GHT), the problem of finding the position of the model is transformed into the problem of finding the transformation parameters that map the model onto the image; given the values of the transformation parameters, the position of the model in the image is determined. The Hough transform is usually performed after edge detection. Usually, lines are expressed as y = mx + b. The basic idea of collinearity detection is that a number of points lie on the same line, and any two points determine a straight line; hence, the problem can be transformed into finding all supported (m, b) combinations.
Set up two coordinate systems, one representing (x, y) values and the other representing (m, b) values, the parameters of straight lines. A point (x, y) in image space then corresponds to a line in parameter space, and a straight line in image space corresponds to a point in parameter space. In this way, an intersection point in parameter space indicates that multiple image points pass through the line defined by (m, b). However, this method has the problem that the value range of (m, b) is unbounded. In order to solve this, the normal form x cos θ + y sin θ = p is used instead of the slope-intercept form to represent straight lines; a pixel in image space then becomes a (sinusoidal) curve in parameter space. The Hough line algorithm proceeds as follows:

• Initialize the (θ, p) space with N(θ, p) = 0, where N(θ, p) represents the number of pixels on the line represented by these parameters.

• For each pixel (x, y), find the (θ, p) coordinates satisfying x cos θ + y sin θ = p in the parameter space, and increment N(θ, p) by 1.

• Examine the sizes of all N(θ, p), and take out the parameters for which N(θ, p) exceeds a preset threshold.

2.3.2 Line Segment Detector (LSD)

The core of LSD is pixel merging with error control. LSD is a line segment detection algorithm that achieves subpixel accuracy in linear time [43]. Although LSD claims not to require any manually set parameters, in practice the sampling rate, as well as the tolerated difference in direction between two pixels, can be set. Detecting a line in an image actually amounts to looking for pixels with large gradient changes. The goal of LSD is to detect local straight contours in the image, which is why it is called a line segment detector.
Contours are special areas of the image in which the greyscale varies significantly from black to white or from white to black. On this basis, the gradient and the level-line are defined, as shown in figure 2.4.

Figure 2.4: Gradient and level-line definition

The algorithm first calculates the level-line angle of each pixel to form a level-line field. The field is divided into several connected parts whose directions are approximately the same within a tolerance τ, so that a series of regions is obtained. These regions are called line support regions. Each line support region is a set of pixels and also a candidate for a line segment. For each line support region, its minimum circumscribed rectangle is considered. Intuitively, when a group of pixels is particularly elongated, the set of pixels is more likely to be a straight line segment. The principal inertia axis of the line support region is treated as the main direction of the rectangle, and the size of the rectangle is chosen to cover the entire region. If the angle difference between the level-line angle of a pixel in the rectangle and the main direction of the minimum circumscribed rectangle is within the tolerance τ, that pixel is called an "aligned point". The total number of pixels in the minimum circumscribed rectangle and the number of aligned points inside it are then counted, and these two statistics are used to decide whether the line support region is a straight line segment. The decision criteria are the a contrario approach and the Helmholtz principle [44].

2.4 Precision and Recall

The recall rate is a measure of the ability of a retrieval system (or searcher) to detect relevant information, and precision is a measure of its ability to reject non-relevant information.
Therefore, the precision rate equals the number of correct items retrieved divided by the total number of items retrieved, whilst the recall rate equals the number of correct items retrieved divided by the number of relevant items in the sample. Expressed as formulas, precision and recall are given in equations 2.7 and 2.8, with the parameters TP, FP, and FN as defined in table 2.2.

Precision = TP / (TP + FP)    (2.7)

Recall = TP / (TP + FN)    (2.8)

The contingency table is shown in table 2.2.

Table 2.2: Contingency table

              Actual 1                  Actual 0                  Total
Prediction 1  TP: True Positive         FP: False Positive        TP + FP: predicted positive
Prediction 0  FN: False Negative        TN: True Negative         FN + TN: predicted negative
Total         TP + FN: actual positive  FP + TN: actual negative  TP + FN + FP + TN

A common indicator for evaluating a classifier is classification accuracy, defined as the ratio of the number of samples the classifier correctly classifies to the total number of samples in a given test dataset. But for binary classification problems, especially when we care about the minority class, the accuracy rate largely loses its significance as a judgment standard. For example, suppose a classifier is built for cancer detection with 100 samples, 99 of which are positive (no cancer) and one negative (with cancer). A model that always predicts positive achieves an accuracy of 99%, yet it cannot identify a single cancer patient, in which case the accuracy rate loses its value as an evaluation measure. Therefore, for binary classification, the more commonly used evaluation indicators are precision and recall.
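Formulas 2.7 and 2.8 translate directly into code. A minimal sketch for binary labels (1 = positive class):

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)  # true positives
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)  # false positives
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For instance, with predictions [1, 1, 0, 0] against ground truth [1, 0, 1, 0], one detection out of two is correct and one relevant item is missed, giving precision 0.5 and recall 0.5.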
Generally, the class of interest is the positive class, and the other classes are negative; the classifier's prediction on the test dataset is either correct or incorrect.

2.5 Related Work

So far, there is no related work on this specific faulty weatherproofing detection task. However, there is some partially relevant research on various kinds of small object detection, as well as on the techniques introduced above. These studies deliver good examples of implementations of the algorithms.

2.5.1 Small Object Detection

This part introduces a series of AI-based algorithms applied to several use cases. In [45], the work was implemented with YOLOv2 and CNN classifiers, aiming to detect small objects in a whole image. The target objects were traffic signs, and real-time operation was crucial to the work. The authors explained that their earlier approach, Faster R-CNN, could not meet the execution-time demand; even though HOG (Histogram of Oriented Gradients) [46] with SVM (Support Vector Machines) [47] performs at 20 fps, it cannot meet the requirement for real-time processing either. The report [48] explained a disadvantage of existing object detection methods, namely their low performance on small objects. As a consequence, ERF-YOLO was developed to deal with the problem. The authors used an expanding receptive field block with down-sampling and up-sampling, based on YOLOv2. The results on normal small object detection in their report were significantly improved, as illustrated in figure 2.5. Furthermore, their test samples also included a self-built dataset for detecting aeroplanes in satellite remote sensing imagery.
[Figure 2.5: ERF-YOLO performance comparison (mAP and per-class results for Faster-RCNN [1], SSD300 [4], YOLOv2 544 [3] and ERF-YOLO 544)]

The work in [49] explained the difficulty of small object detection owing to background interference. Thus, the authors proposed a modified network based on YOLOv3, extended with multi-scale convolution kernels and the Google Inception network. Their research improved the mAP slightly, by around 1.5% on small objects. Another group of researchers [50] implemented and compared the target detection algorithms SVM and YOLO. Their use case was detecting small targets using a drone with a pan-tilt-zoom camera. Their crucial concerns were accuracy, environment adaptation and execution time. Comparing the two methods, SVM was not chosen because the images required pre-processing for ROI extraction and it performed poorly on black objects. As a result, YOLO was applied in their research. Furthermore, the network could be optimized because the objects are very small, so only one scale is used in practice. Finally, after improving the network, the model reached 70% recall and 85% accuracy.

2.5.2 Line Segmentation

To date, a series of line segment detection methods have been developed, including the Hough transform, LSD, CannyLines [51], EDLines (Edge Drawing lines) [52], LSM (line segment merging) [53], etc. These algorithms have their pros and cons and suit different use cases. Real-time lane detection on streets and highways was explained in [54]. This system was based on the LSD algorithm, using an inverse perspective mapping to obtain a top view of the image. The system achieved 70 frames per second and was able to detect lanes, distinguishing between dotted and solid lines as well as straight and curved lanes.
The LSD algorithm was also applied to lane detection and tracking fused with a Kalman filter [55]. In that project, line segments were employed as low-level features for lane marking detection, and LSD performed well after filtering, achieving around 95% correct, 4% false and roughly 1% missing detections. The LSD algorithm has played an important role in various applications. In [56], researchers used LSD to produce numerous line segments, and after filtering, airport candidates were extracted, which benefited the subsequent processing. Moreover, LSD was used in [57] to extract all line segments, after which a follow-up algorithm decided whether these regions of interest were text regions or not. Furthermore, the authors in [58] implemented this algorithm to deliver highly accurate building contour segments. EDLines is another powerful algorithm. In [52], the author illustrated that it ran much faster than former methods. The result of [52] is shown in figure 2.6 below. It is clear that EDLines detected slightly fewer lines but improved execution time significantly. As a consequence, researchers have addressed line segment problems with the EDLines algorithm. [59] illustrated runway detection in real time: EDLines was used to extract straight line segments, and the fragmented lines were then linked into long runway lines. Similarly, it was also applied in [60], aiming to develop a system with false lane detection control in real time.

[Figure 2.6: Performance comparison of EDLines and LSD (number of detected lines and execution time in milliseconds on several test images)]

Power line detection was also a special case of line segment detection [61].
The authors compared the performance of LSD and EDLines. The result shows that the two methods achieve similar accuracy, but EDLines (1 ms) ran faster than LSD (3 ms). EDLines has also been used to solve a variety of other problems. A high-speed automatic building extraction from high-resolution images was presented in [62]: EDLines was employed for real-time, accurate extraction of building line segments, which determine the shapes of the buildings. Their method for building extraction is line linking followed by a closed contour search. Furthermore, EDLines was applied to a space target problem [63]. Owing to the rigid bodies in space, the objects have many line features, so feature matching between objects can be performed.

2.5.3 Connected Component

Connected component labelling is one of the most basic steps applied in image processing systems. By assigning a unique label to all pixels that belong to the same object, the technique can distinguish different objects in a single image. In recent years, researchers have mostly been optimising the execution time. Report [64] reviewed state-of-the-art connected component labelling over the past decade, and each of the algorithms was implemented and evaluated. Literature [65] introduced a block-based algorithm for labelling in which the number of neighbourhood operations was reduced. Furthermore, they used two different processes with a binary decision tree to integrate block connection relationships in order to reduce unnecessary memory accesses, which greatly simplifies the pixel positions of the block-based scan mask. In [66], researchers introduced a new "connected component tagging" hardware architecture for high-performance heterogeneous image processing in embedded designs.
Concretely, the most complex part of the connected component labelling was processed in parallel, so memory access became one of the critical issues in their work.

2.6 Summary

This chapter introduces the theoretical background and state-of-the-art technologies that are potential candidates for the algorithm. Concretely, it explains a number of object detection methods based on convolutional neural networks, after which a series of custom image processing methods are presented, including morphology, connected components, etc. Furthermore, this part illustrates two line segmentation algorithms, the Hough transformation and the line segment detector. Precision and recall are also introduced, as they will be used to evaluate the performance of the program. Last but not least, related work is shown in the last section, covering state-of-the-art research on the relevant techniques, including YOLO, line segment detection methods and the connected component algorithm. The literature review gives a clear direction for how this thesis is developed.

Chapter 3

Methods

This chapter describes the methods used in this thesis. The purpose of stating these methods is to make the reasons behind the research decisions clear before the implementation. The chapter is divided into three sections, covering the research methodology of the thesis, the software environment, and the experimental design.

3.1 Research Methodology

To fulfil the scientific objectives, a choice between different methods is required to answer the scientific questions. As this thesis covers a new application area, it is all the more important to decide thoroughly which methods to implement. Thus, a large proportion of the work is focused on finding the best way to achieve higher performance, including precision and recall. In chapter 2, two basic methods for object detection are introduced.
Therefore, these two methods will be implemented and compared. There are several factors to consider when deciding which method to use. Regarding the decision-making process, both quantitative and qualitative analyses were applied. The experiments to be discussed mostly concern precision and recall, and are less related to execution time. However, the CNN-based object detection method may not predict well because of the insufficient size of the training set and the complexity of its parameter adjustment. If the object detection method can hardly predict the faulty weatherproofing correctly, every endeavour must be made for the other approach, the image processing method, to achieve higher performance. In this thesis, the methodology is divided into four layers, and each layer is divided into two types of research, qualitative and quantitative, as shown in figure 3.1 [11], where the layers are illustrated with the terms introduced in the following subsections.

[Figure 3.1: Research methods and methodologies, covering philosophical assumptions (interpretivism, positivism, criticalism), research methods (experimental, empirical), data collection (experiments, interviews, language and text), data analysis (coding, statistics, computational mathematics, narrative) and quality assurance (validity, ethics, reliability, replicability, reproducibility)]

3.1.1 Qualitative and Quantitative Research Methods

A qualitative research paradigm is subordinate to constructivism and hermeneutics [67], and concentrates on the meaningfulness of a certain question. In this thesis, qualitative research is used for the choice of algorithm. The decision is based on reports and theory as a qualitative methodology.
All other experimental implementations follow quantitative research methods, where experiments are conducted with different parameters to analyze the results. The result determines whether the assumed method is valid or invalid. In this quantitative research, it is important to have a large amount of data to draw valid conclusions and prove the performance of the system.

Table 3.1: Summary of Research Methodology

Quantitative research    Qualitative research
Precision                Decision of the image processing method
Recall                   Decision of the OpenCV processing function
Runtime

3.1.2 Philosophical Assumptions

Philosophical assumptions are the theoretical frameworks researchers use to collect, analyze and interpret data in certain fields [68]. This thesis uses positivism, criticalism, and interpretivism [11]. Positivism assumes that, when performing experiments, the real radio base station and the environment simulated in the thesis are very similar, and that the execution time test represents the running time in most cases. Moreover, criticalism is used to explain the strengths and weaknesses of the methods, while the qualitative research uses interpretivism to introduce ideas and experiences.

3.1.3 Research Methods

There are two related methods, named experimental procedures and empirical procedures. The object detection decisions are empirical, whilst the rest are experimental. In experimental research, researchers study the causes and effects of experiments and adjust the code to meet the requirements. In this way, the connections between variables and other relationships are observed in the experiments. This method is commonly used when researchers not only want to obtain the expected delivery but also want to learn more about system performance, such as precision.
The other method is the empirical method, which relies on people's experience and observations and collects data from others' work to verify hypotheses.

3.1.4 Data Collection and Analysis Methods

This section describes how the different data are collected and analyzed. The main way data are collected is during experimentation. The test set is taken from a small-scale radio base station located in the laboratory. Hence, photo shooting via drone or phone camera is accessible at any time; in other words, simple test sets are readily available. However, in order to prove the system robust, enriching the training and test sets is imperative. For this reason, the company supervisor created a dataset covering multiple situations for the thesis, which makes it possible to collect a wide variety of data. The decision on the object detection method is based on text, language and interviews. By reading the theory, and more importantly the text and language of previous reports, the options can be discussed. Interviews are used to gain knowledge from experts and information from their experience with object detection. Narrative analysis, coding, statistics, and computational mathematics are applied for the data analysis. Precision, recall and execution time all involve statistics and computational mathematics, where calculations and algorithms are contained. The raw data derive from measurements, while their processing relies on statistics, including summation and averaging. Coding is essential during the implementation: the outputs of the procedures or functions are observed during coding, and the algorithm is analyzed and debugged, with computational mathematics joining in to adjust parameters. The object detection experiments rely on both coding and narrative analysis. After reading the relevant literature, coding is necessary for reproducibility, as well as for the analysis of observations.
Narrative analysis is mainly used when learning the background and reading previous research.

3.1.5 Quality Assurance

To improve validity, the system should be as general as possible to meet all the requirements. However, after testing the CNN-based object detection method, the system could not perform well on the test sets. For this reason, the computer vision and image processing method was introduced. Although this method has more limitations, its performance is much better than object detection with a CNN. The thesis also explains the algorithms and experiments in detail, so the work is believed to be reproducible. However, owing to the different backgrounds and shooting angles, the level of repeatability is low. This means that, for the same parameters, the output can differ, and the exact same output is barely achievable for other researchers. Nevertheless, a comparison of the methods and their reproducibility should be possible, and similar conclusions can be drawn.

3.1.6 Inference

Some supporting assumptions must be made to validate the research, and some auxiliary hypotheses are adopted in this thesis. Concretely, the environment during execution is presumed to be stable, so that it does not affect the execution time of the system. Furthermore, assumptions about different backgrounds are made. It is important to note that, due to the workload and method limitations, the author tried not to take other influences into account, for instance by considering only limited use cases. One special case was the poor results obtained when experimenting in dark conditions. These results were therefore removed, and it is stated as a limitation that results are not available against a dark background. For "correct results", temporary modifications can be used.
In this case, the temporary modification may be "the system only runs when the background is bright". It is vital to state that the input may change with various backgrounds, but the system should be robust enough to always meet its requirements.

3.2 Software Environment

The environment differs between the two methods. Concretely, since the training is time-consuming and requires a high-performance GPU, the object detection method is trained on a small server in the company research lab, with a desktop Intel i7 CPU and an Nvidia GeForce GTX 1080 graphics card. After training, the weights are fed into the program, which can then run on a local device. For the image processing method, the program runs on a laptop sponsored by the company. A brief introduction is given on setting up the software environment for each method. The environment was set up in advance by researchers at the company. YOLOv3 is already set up under Ubuntu 16.04 with GCC version 5.4.0. The graphics card driver is version 410.48, and CUDA is version 9.0 from Nvidia. All the sources are accessible from the company web page. For labelling data, an open-source tool, OpenLabeling, is used. For the image processing method, the program is implemented on the laptop and written in the Python language. The libraries applied in the program include a line segment detector, the OpenCV library, the numpy library, etc. These libraries are available online and open-sourced for import.

3.3 Experimental Design

3.3.1 Method of YOLO

[Figure 3.2: Flowchart of the training of the object detection model: image labelling, network setup, feeding the training set into the network, then training continues until the loss meets the requirement, after which the trained weights are used to test images]

The steps of this method are illustrated in the flowchart above.
These steps are general instructions for object detection and can be divided into preparation, training and testing phases. First comes the image labelling, followed by the network parameter setup. Then the labelled data are fed into the network and training starts. During this time, the loss changes and is displayed on the console window. Finally, once the loss is stable and relatively small, the training is done. For the testing part, the weights obtained after training are fixed and can be used for testing. The performance of the system depends significantly on the training. Concretely, various conditions affect the final weights: the difference between the training and test sets, the labelling accuracy, and whether the training is overfitted. Considering the training set, it is better when the data cover all conditions in terms of environment, brightness, cable positions, etc. Generally, if the training data do not resemble the test data, the network will find it extremely hard to predict the true positives in the test set. Meanwhile, if the training is based on few conditions or even a single environment, the result will easily overfit, which means that test sets similar to the training set will perform excellently whilst other test sets will perform poorly. Hence, an incorrect selection of modelling samples, such as too few samples, incorrect selection methods or incorrect sample labels, results in sample data insufficient to represent the intended classification rules; this must be avoided during training. In terms of labelling accuracy, the labelled image must contain the cable or the flagging tape as the object. The object is labelled with a rectangle, which is called the bounding box.
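The overlap between a predicted box and a labelled bounding box is commonly measured by IoU (Intersection over Union), which the training console also reports. A minimal sketch for axis-aligned boxes in (xmin, ymin, xmax, ymax) form, written for illustration rather than taken from the thesis code:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes
    given as (xmin, ymin, xmax, ymax)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

An IoU of 1 means the prediction matches the label exactly, while 0 means no overlap at all.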
Nevertheless, the cables are curved objects and flagging tapes are small parts along the cable. Therefore, accurately localizing the object is essential and will significantly reduce the loss during later training. Concerning the overfitting issues, apart from what has been explained above, the average IoU and the loss must be monitored carefully. There may be non-unique classification decisions for the sample data. As the learning progresses, the backpropagation algorithm may make the weights converge to overly complex decisions. Furthermore, if the number of iterations for the weight learning is excessive, which is overtraining, the training will fit noise in the data rather than representative features of the training examples, and the performance will suffer.

3.3.2 Method of OpenCV and Image Processing

Considering that the former work did not achieve a satisfying result in section 2.5, it is important to consider another method, so custom image processing with OpenCV is necessary for this thesis. Custom image processing is generally built on features of the image. In this case, faulty weatherproofing detection, it is critical to analyse the features of the flagging tape. The cables on the cable unit are generally black, thin columns going from the hex nut to the end, and they are pixel-consecutive. Though the weatherproofing type may vary, the cables always run from the cabinet to the other end. For the flagging tape, the feature to detect is the flagging part outside the original body of the cable. So it is possible to detect the cable first and localize potential flagging tape in the following step. As a result, the method can proceed as follows. First of all, image binarization is necessary. After that, a method for removing other blobs should be implemented,
during which it is possible to gain information on the potential cables, which are usually the dark, column-shaped objects. Meanwhile, each cable, represented as a connected component, should be separated and identified in the image, and the other objects are not considered in further processing. After that, the cable contours are described by functions. To generalize the system, a linear fit of the cable contours is implemented for the processing. In this way, the flagging tape can be detected, because the flagging part lies outside the linear-fitted contours of the cable body. The implementation of these methods is introduced in detail in chapter 4.

[Figure 3.3: The flowchart of the detection system: image initialization; connected components (CC) on the full image to remove blobs; image segmentation and CC; indexing cables and removing the remaining blobs; image morphology to fill reflections; finding line segments inside each expanded CC; finding close endpoints within a threshold; if a linear fit gives similar line slopes, connecting new endpoints with a single line, otherwise connecting the endpoints with a polyline; finally computing the cable contours and, by subtraction, storing and drawing the endpoints outside the cable as the flagging tape]

3.4 Summary

The four layers of the methodology are discussed, covering philosophy, research methods, data collection and analysis, and quality assurance. Furthermore, this chapter introduces the two methods and the preparations included in this thesis. The software environment and experimental design are discussed for both methods, YOLO and image processing with computer vision. In the YOLO method, the process can be divided into labelling, training, and testing.
In contrast, removing noise, cable detection and flagging tape detection are the three steps of the image processing algorithm.

Chapter 4

Design and Implementation

This chapter gives a deeper perspective of the thesis. The two algorithms introduced in chapter 3 are implemented in detail.

4.1 Implementation on YOLO Network

The object detection method based on YOLO is simple to implement; the algorithm consists primarily of the five steps described in section 3.3.1.

4.1.1 Data Labelling

The first premise of all machine learning methods is to provide a dataset to train the neural network. Under some circumstances, already collected, open-source datasets can be used, such as the handwritten digit MNIST database (Modified National Institute of Standards and Technology database), the ImageNet dataset, the COCO dataset, etc. Otherwise, a training set has to be created by collecting data oneself. In this thesis, the dataset consists of unique images of cables with or without flagging tape. Thus, the second approach applies: a self-created dataset is fed into the convolutional neural network and trained. One easy-to-use image labelling tool for creating such a dataset is OpenLabeling. The tool lets users input video or images as raw data. After an object is labelled in the image, a file with the object class and location information is generated in XML (Extensible Markup Language). These files are in the format of the PASCAL VOC (Visual Object Classes) dataset, which provides a standard evaluation system. Each XML file has the same name as the corresponding image and contains the category and position (Xmin, Xmax, Ymin, Ymax) of the objects. Furthermore, the program is able to track the object with a deep learning method to predict the label in the next frame.
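To illustrate what such an annotation looks like, a minimal PASCAL VOC file can be parsed with the Python standard library; the file name, class name and coordinates below are hypothetical, not taken from the thesis dataset:

```python
import xml.etree.ElementTree as ET

# A hypothetical PASCAL VOC annotation for one labelled object;
# the filename and box coordinates are illustrative only.
VOC_XML = """<annotation>
  <filename>cable_001.jpg</filename>
  <object>
    <name>flagging_tape</name>
    <bndbox>
      <xmin>120</xmin><ymin>80</ymin><xmax>180</xmax><ymax>140</ymax>
    </bndbox>
  </object>
</annotation>"""

def read_boxes(xml_text):
    """Return (class_name, (xmin, ymin, xmax, ymax)) for each labelled object."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        coords = tuple(int(bb.find(tag).text)
                       for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj.find("name").text, coords))
    return boxes
```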
Thus, this tool converts the objects in the image into the PASCAL VOC format containing the labelling information, and in this way, objects can be labelled efficiently.

[Figure 4.1: OpenLabeling tool interface]

4.1.2 Network Configurations

YOLO provides a relatively complete convolutional neural network, as the configuration file offers various parameters for adjustment. In this use case, one network is implemented to locate the cables, while another is applied to distinguish flagging tapes from well-weatherproofed cables. Thus, the former network has one class, the cable, and the latter also has one class, the flagging tape. The following part gives a brief introduction to several parameters that can be adjusted for different situations.

• Batch: The number of pictures sent to the network per iteration, also called the batch size. If it is increased, the network can complete an epoch in fewer iterations. Under the premise of a fixed maximum number of iterations, increasing the batch increases the training time but better finds the direction of the gradient descent. If the GPU (graphics processing unit) memory is large enough, the batch value can be increased to improve memory utilization. If the number is too small, the training will not converge well, and if it is too large, it may fall into a local optimum. max_batches refers to the maximum number of iterations.

• Subdivision: This parameter prevents the system from feeding the whole batch into the network at once. Each batch is divided into the number of parts given by the subdivision, and after all parts have been run through, one iteration is finished. This reduces the occupation of video memory. If this parameter is set to one, all batch images are fed into the network at once.
When the subdivision is two, half of the batch is fed at a time.

• Angle: the image rotation angle, used to augment the training set by rotating the pictures.

• Saturation, exposure and hue are likewise used to augment the training data.

• Learning_rate: the learning rate. If the training diverges, the learning rate can be reduced; it is also reduced when learning hits a bottleneck or the loss stays constant.

• Steps and scales: These two parameters work together. For example, assume the learning_rate is 0.001, the steps are 100, 25000, 35000, and the scales are 10, .1, .1. During iterations 0-100 the rate is the original 0.001. During iterations 100-25000 it is 10 times the original, i.e. 0.01, and during iterations 25000-35000 it is 0.1 times the current value, i.e. 0.001. From iteration 35000 to the maximum it is 0.1 times that again, i.e. 0.0001. Reducing the learning rate as the iterations increase makes the learning of the model more efficient, that is, it reduces the training loss better.

The number of filters in the last convolutional layer is 3 × (class_number + 5), and the output is constructed as S × S × 3 × (5 + class_number). YOLO divides the image into an S × S grid; when a target centre falls in a certain grid cell, that cell is responsible for the detection, which is the origin of S × S. YOLO has three scales of output. As each grid cell detects 3 anchor boxes, the output for each scale is S × S × 3. Each anchor box contains the coordinate information (x, y, w, h) and the confidence described in section 2.1.2, i.e. 5 values; it also contains one score per category, using one-hot encoding.
For example, suppose there are three classes: person, car and dog. When a detection result is a person, it is encoded as [1,0,0]; all category information is coded this way. In this case there is one category, so each anchor box yields six values. Therefore, the output of each scale is S × S × 3 × (5 + 1) = S × S × 18.

4.1.3 Training Dataset

The training process displays some parameters on the console window, showing the intermediate results during the training. It is necessary to keep monitoring these values and react accordingly. The parameters reflecting the training process include IoU, class, objectness, average recall and loss. The former four are expected to rise towards one ideally, while the loss is expected to decrease to a low, stable value. However, in real situations, when the recall is very high there is a risk that the network is overtrained; empirically speaking, around 90% is a relatively high value approaching the risk boundary.

4.2 Implementation of OpenCV and Image Processing

The object detection implementation with OpenCV and the LSD algorithm follows the steps in figure 3.3, applying various libraries and functions.

4.2.1 Image Initialisation

The first step is image pre-processing, including file path management, resizing, greyscaling and binary thresholding. These steps lay a solid foundation for the further processing. Algorithm 1 is shown below. The reason for inverting the binary image is that the cable to be detected is mostly black, and black is ideally treated as 0 after greyscaling; inverting the greyscale image therefore makes the image easier to process. However, in order to simplify the explanation, the colour of the images will be described by their raw values in the following sections.
Algorithm 1: Initialize the Image
Result: Resized and thresholded image
1 Import libraries;
2 Set all constant parameters;
3 Set the directories of the input and output folders;
4 Binary threshold the image;
5 Dilate the binary-inverted image;
6 Resize the image;

4.2.2 Remove Blobs in the Image

Blobs are primarily areas in the image that differ in brightness, colour or other features compared with the adjacent regions. In this case, the method aims to remove the small areas which are not potentially cable, which has a positive effect on filtering out the background. Figure 4.2 shows the steps of finding blobs in the image: the first image is the original image, which is binary filtered in the following step; after that, the binary image is inverted and dilated; the last step is to run the connected component algorithm and output the small connected components, which are the blobs in this section.

[Figure 4.2: Sample blobs in the test set]

Specifically, the method is implemented by means of connectedComponentsWithStats in the OpenCV library. This function takes two parameters, a greyscale image and the connectivity, and produces four outputs which are processed in later functions: retval, labels, stats and centroids.

• Retval gives the total number of connected components.

• Labels is the map of pixels, each labelled with the serial number of its connected component.

• Stats holds information about each connected component, including the coordinates of the top-left corner, the width, the height, and the total area.

• Centroids are the centroids of the components, but they are not used in this program.

Regarding the input, the connectivity parameter offers a choice of connection scheme.
The connectivity can be either 4-way or 8-way, as illustrated in figure 4.3. 4-way connectivity only considers the 4 adjacent pixels in the vertical and horizontal directions, while 8-way connectivity also includes the 4 pixels in the diagonal directions. In this way, 8-way connectivity is the more complete option, since it merges more pixels into connected components and covers more scenarios.

Figure 4.3: Connectivity Introduction (4-way and 8-way)

Algorithm 2: Remove the blobs in the image
Result: New matrix without blobs
Input: inversed image matrix and its retval, labels, stats
1 for i ← 0 to retval do
2   if connected component area > threshold blob area && height > threshold × height then
3     Store the number of the connected component in list blobs
4   end
5 end
6 for each pixel P ∈ inversed image do
7   if the label of P ∈ blobs then
8     Output matrix ← labelled pixel
9   end
10 end

4.2.3 Calculate Cable Information

Cable Positioning and Width

This section introduces how to obtain information on the potential cables, including the position, the cable width and the total number of cables. In the test images there are usually several cables, so obtaining information about them is vital for further calculation. The information primarily contains the position of each cable as well as its width. Segmentation of the image is introduced to locate the cables. From the stats parameter in section 4.2.2, the width and height of the bounding rectangle are available, and the top-left vertex is also known, so detection proceeds inside each bounding box. However, as the image is large and the background varies in brightness and darkness, errors can be produced during detection.
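The filtering in Algorithm 2 can be sketched with NumPy alone, given the labels and stats arrays produced by the connected-component step (the threshold values here are illustrative assumptions):

```python
import numpy as np

def remove_blobs(labels, stats, min_area=10, min_height_ratio=0.5):
    """Keep only components large and tall enough to be cable candidates;
    every other component (a small 'blob') is zeroed out."""
    img_height = labels.shape[0]
    keep = []
    # stats columns follow OpenCV's layout: x, y, width, height, area
    for i in range(1, stats.shape[0]):           # label 0 is the background
        x, y, w, h, area = stats[i]
        if area > min_area and h > min_height_ratio * img_height:
            keep.append(i)
    out = np.where(np.isin(labels, keep), labels, 0)
    return out, keep

# toy label map: component 1 is a tall cable, component 2 a tiny blob
labels = np.zeros((10, 10), int)
labels[1:9, 2:4] = 1          # 8 x 2 pixels, tall
labels[0:2, 7:9] = 2          # 2 x 2 pixels, small
stats = np.array([[0, 0, 10, 10, 80],
                  [2, 1, 2, 8, 16],
                  [7, 0, 2, 2, 4]])
out, keep = remove_blobs(labels, stats)
```

Only the tall component survives; the small blob is removed from the label map.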
These errors make cable detection difficult, so the processing is concentrated on a part of the image to solve this problem. In these cases, note that the cables range from the cabinet bottom to the bottom of the image. In the binary image, the cable itself may not be exposed completely under an unsuitable binary threshold; concretely, the cable surface has some reflections in many cases, which causes those areas not to be detected as pure black (0 in greyscale).

Algorithm 3 proposed in this section first segments the image and concentrates on the target proportion. In this program, the target range is set from 0.5 times the height to 0.75 times the height, and in the horizontal direction the abscissa runs from the left-most abscissa to the sum of that abscissa and the bounding-box width, as displayed in figure 4.4.

Figure 4.4: Bounding box selection

Within this range, the method detects and gathers statistics on the black pixels of certain rows. Nevertheless, the statistics of a single row are not reliable, owing to the background or even the presence of flagging tape, so various rows should be sampled. Accordingly, the segmented height is divided evenly into 10 sample rows with the same distance between adjacent ones. Ideally, black pixels fill the whole cable; however, owing to the illumination of the cables, some regions are filtered to white. In order to reduce the error, all 10 rows are counted: the black pixels at the rightmost and leftmost edges of the connected component mark the cable edges, and the cable width in each row is their difference. These widths are then sorted, and the middle 6 rows are used for the width calculation. Furthermore, during this data collection, the number of black pixels in each row is counted for filtering.
Concretely, if a row contains few black pixels, it is recorded as zero black pixels for the ordering. The background is also not considered, because it takes the largest proportion of the whole area. The algorithm is given below as Algorithm 3. Note that this cable width calculation yields the intercept of the cable along the horizontal axis, rather than strictly the cable width perpendicular to the cable itself.

Algorithm 3: Get cable information
Result: Cable abscissa range and cable width
Input: label matrix without blobs fullOutLabel
1 Set the segmentation parameters TopOrdinate, BottomOrdinate;
2 halfInversedImage ← fullInversedImage[Top:Bottom];
3 halfLabels ← connectedComponentsWithStats(halfInversedImage);
4 halfOutLabels ← remove blobs in halfLabels;
5 retval, label, stats ← connectedComponentsWithStats(halfOutLabels, 8);
6 Discard the background zeros in stats through area filtering;
7 Set 10 ordinates samplePoint and the relative abscissa range;
8 for i ← 0 to retval do
9   for j ← 0 to 10 do
10    for x-coordinate k ∈ abscissa range do
11      if labels[samplePoint[j], k] = i then
12        Store x coordinate k in list tempRange;
13      end
14    end
15    if length(tempRange) > cable width threshold then
16      Record the maximum and minimum elements of tempRange;
17      cableWidthTemp[j] ← maximum − minimum;
18    end
19  end
20  Obtain XCoordinateMin and XCoordinateMax;
21  Sort cableWidthTemp and take the average of cableWidthTemp[2:8] as the cable width;
22 end

Separate Cables

However, from the perspective of the whole image, these cables can become a single connected component, because the cabinet bottom is dark like the cables, and through it the cables connect to each other. After implementing Algorithm 3, the concentration on a proportion of the image no longer takes the cabinet bottom into consideration.
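The row-sampling rule of Algorithm 3 can be sketched as follows (the sample count of 10 and the middle-6 averaging follow the text; the minimum-pixel threshold is an illustrative assumption):

```python
import numpy as np

def cable_width(labels, comp_id, y_range, x_range, n_rows=10, min_pixels=3):
    """Estimate cable width by sampling rows, as in Algorithm 3.
    The per-row widths are sorted and the middle six averaged, so a few
    rows distorted by reflections or flagging tape do not skew the result."""
    y0, y1 = y_range
    xs = np.arange(*x_range)
    rows = np.linspace(y0, y1 - 1, n_rows).astype(int)
    widths = []
    for r in rows:
        hit = xs[labels[r, xs] == comp_id]        # cable pixels in this row
        # rows with too few cable pixels count as width 0 for the ordering
        widths.append(hit.max() - hit.min() if hit.size > min_pixels else 0)
    widths.sort()
    return float(np.mean(widths[2:8]))            # middle 6 of 10 samples

# toy label map: component 1 is a vertical cable spanning columns 10..15
labels = np.zeros((40, 30), int)
labels[:, 10:16] = 1
w = cable_width(labels, 1, (20, 30), (0, 30))
```

On this toy map every sampled row yields the same edge difference, so the estimate equals the true intercept.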
In this way, the cable x-coordinates and widths are figured out, which helps with the truncation of each cable. The method to cut the cabinet connection is brute-force. Basically, from the former cable information, the x-coordinate range of each cable can be obtained, so the distance between adjacent cables can be calculated. In order to cut off the connection, the middle part between adjacent cables is filled with the pure background value. This no longer influences the connected-component result, since the cables are already cut away from the connection; the cables are therefore observed to be separate.

However, there are still some bigger blobs in the image. In section 4.2.3 the image is processed in a specific area, and non-cables are removed because there are not sufficient black pixels on the sample rows, so these faults are not processed in the loop. Looking at the whole image, however, some faulty objects remain. The output of line 7 in Algorithm 4 below aims to separate the cables rather than to remove the faulty objects.

Algorithm 4: Cable separation
Result: New matrix without blobs
Input: matrix cable information and retval, labels, stats of halfOutLabels
1 for i ← 0 to cable total number − 1 do
2   cableDistance ← abscissa(cable[i + 1]) − abscissa(cable[i]);
3   for j ← abscissa[i] + 1/3 cableDistance to abscissa[i] + 2/3 cableDistance do
4     Fill column j with 0 → fullOutLabels
5   end
6 end
7 connectedComponentsWithStats(fullOutLabels, 8);
8 Index the x-coordinates from fullOutLabels to halfOutLabels

Output Image with Only Cables

In terms of removing other blobs in the whole image, the function in section 4.2.3 is able to select the correct cables and obtain their x-coordinates from Algorithm 3. As the x-coordinate is the same in the two label matrices, it can be used as an index for cable localisation in full-image processing.
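The column-filling step of Algorithm 4 can be sketched directly (the per-cable x-ranges would come from Algorithm 3; here they are supplied as a toy input):

```python
import numpy as np

def separate_cables(labels, cable_ranges):
    """Break the connection between adjacent cables (Algorithm 4):
    fill the middle third of the gap between neighbours with background (0).
    cable_ranges: list of (x_min, x_max) per cable, ordered left to right."""
    out = labels.copy()
    for (l0, r0), (l1, r1) in zip(cable_ranges, cable_ranges[1:]):
        gap = l1 - r0
        a = r0 + gap // 3
        b = r0 + 2 * gap // 3
        out[:, a:b] = 0        # background value severs the bridge
    return out

# toy: two cables joined by a dark cabinet-bottom row at the top
labels = np.zeros((20, 30), int)
labels[:, 5:8] = 1
labels[:, 20:23] = 1
labels[0, 8:20] = 1            # the connecting bridge
sep = separate_cables(labels, [(5, 8), (20, 23)])
```

After the fill, a renewed connected-component pass would see two separate cables instead of one merged component.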
After using this index, Algorithm 4 separates the cables into different connected components. After this section, the output should ideally contain only cables. However, owing to background noise, some false positive detections may also exist if the background contains a column-shaped dark object. The shade or illumination of the cabinet base at the top of the image will remarkably affect the result; however, this thesis does not focus on the detection of the radio cabinet, so the cabinet is segmented out of the image. Specifically, a simple feature extraction method, line segment detection, is implemented to detect the bottom line of the radio cabinet, after which the program fills the part above that line with zeros to eliminate the errors.

Inner reflection of cables

After the filtering and indexing, the potential cables are filtered out, while a few dark, column-shaped objects may still exist. The basic idea of this section is to remove the reflections located inside the cables. A sample picture is shown below in figure 4.5.

Figure 4.5: Inner reflection sample

Therefore, a method for removing these internal blobs is carried out. The basic idea is a morphological transform of the binary image. As mentioned in section 2.2.2, erosion erodes the white area while dilation expands it. As a result, assuming a convolutional kernel of a certain size, if the image is eroded first, the white blobs inside the cables vanish, and when the image is then dilated, these blobs do not reappear. By contrast, the edges of the objects keep the same position as in the original image, since the dilation and erosion are on the same scale.
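This erode-then-dilate sequence is a morphological opening. A minimal NumPy sketch with a 3×3 kernel on a binary image (white = 1) shows the effect: a small reflection blob vanishes while a large region keeps its edges:

```python
import numpy as np

def erode(img):
    """3x3 binary erosion: a pixel stays white only if its whole
    neighbourhood is white (zero padding at the border)."""
    p = np.pad(img, 1)
    out = np.ones_like(img)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out &= p[1 + dy:1 + dy + img.shape[0],
                     1 + dx:1 + dx + img.shape[1]]
    return out

def dilate(img):
    """3x3 binary dilation: a pixel becomes white if any neighbour is white."""
    p = np.pad(img, 1)
    out = np.zeros_like(img)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out |= p[1 + dy:1 + dy + img.shape[0],
                     1 + dx:1 + dx + img.shape[1]]
    return out

def opening(img):
    # erosion removes small white reflections; dilation restores the edges
    return dilate(erode(img))

# a 1-pixel white reflection next to a large white region
img = np.zeros((9, 9), int)
img[4, 4] = 1            # small blob: should vanish
img[:, 0:3] = 1          # large region: edges should be preserved
opened = opening(img)
```

In practice the same result is obtained with OpenCV's morphology functions; this sketch only makes the mechanism explicit.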
4.2.4 Line Segment Detection

The LSD method, line segment detection, is implemented to extract the lines in the whole image, as illustrated in section 2.3.2. This method is fast and efficient for line extraction in most cases. Figure 4.6 shows a sample output of the LSD method; the image is resized initially.

Figure 4.6: LSD sample

After implementing the LSD algorithm, the output contains all the information on the segments, including the coordinates of the two endpoints of each segment as well as the width of the line. From the output of Algorithm 3, the width of the cable is easily obtained and is set as the reference length for filtering. Filtering this way improves the stability of the system when zooming in or out, as the cable transforms on the same scale as the whole image.

Line segments and contour processing

It is necessary to process the line segments in the program, and how to choose suitable segments is crucial. The selection of line segments must be based on the rule that as few segments as possible should indicate the edge of the cable. Therefore, two processing goals are pursued: one is to merge lines and discard lines that lie outside the cable range, while the other is to find suitably long lines. After determination of the cable width, segment endpoints can be filtered and merged under a certain distance threshold. If two line endpoints are relatively close to each other, they can be treated as one, and when the slopes of the two lines are similar, the lines can be merged into a single line; otherwise only the endpoints are unified. After merging similar endpoints and straight lines with similar slopes, the number of lines is significantly reduced.
The next step is to further filter out all line segments inside or at the edge of a cable; in other words, line segments whose endpoints lie within the same cable range. Furthermore, after the morphological transform, the blobs are eliminated, and the endpoints of the remaining line segments are ideally black. These conditions further screen for the line segments that describe the characteristics of the cable. It needs to be pointed out that after the dilation of the colour-inversed image, the area of the cable expands so that it contains more segment endpoints.

The purpose of the next step is to draw the contour of the entire cable from the line segments filtered in the previous step. Therefore, only the line segments at the edge of the cable are meaningful for further processing; these can relatively accurately describe the edge of the relevant cable. Moreover, the number of output lines is expected to be smaller after discarding lines inside the cable. Hence, the line segments are selected by the principle that they lie at the edges and are longer. Generally speaking, the line segments output through the above steps are long and almost perpendicular to the abscissa axis, and the longer they are, the more likely they are to show the outline of the entire cable. These edges are candidates for the grey gradient of the cable. But there are exceptions, such as reflections on the surface of the cable: long mirror-like reflections let the LSD algorithm detect long line segments on the cable surface, which are not the lines to be proposed. Therefore, it must be checked whether the output line segments lie on the cable edge. In terms of the program, every cable has the same characteristics.
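The endpoint-merging rule described above can be sketched as a greedy pass over the segment list (the distance and slope thresholds are illustrative assumptions, not the thesis values):

```python
def merge_segments(segs, d_thresh=5.0, slope_thresh=0.1):
    """If two segments have endpoints within d_thresh of each other and
    similar slopes, replace them by one segment spanning both."""
    def slope(s):
        x1, y1, x2, y2 = s
        return (y2 - y1) / (x2 - x1) if x2 != x1 else float("inf")

    def close(p, q):
        return abs(p[0] - q[0]) <= d_thresh and abs(p[1] - q[1]) <= d_thresh

    segs = [list(s) for s in segs]
    merged = True
    while merged:
        merged = False
        for i in range(len(segs)):
            for j in range(i + 1, len(segs)):
                a, b = segs[i], segs[j]
                ends_a = [(a[0], a[1]), (a[2], a[3])]
                ends_b = [(b[0], b[1]), (b[2], b[3])]
                if any(close(p, q) for p in ends_a for q in ends_b):
                    sa, sb = slope(a), slope(b)
                    if sa == sb == float("inf") or abs(sa - sb) < slope_thresh:
                        # combine into one segment covering the extreme points
                        pts = sorted(ends_a + ends_b, key=lambda p: p[1])
                        segs[i] = [*pts[0], *pts[-1]]
                        del segs[j]
                        merged = True
                        break
            if merged:
                break
    return segs

# two collinear vertical pieces of one cable edge, endpoints 3 px apart
out = merge_segments([(50, 0, 50, 40), (50, 43, 50, 90)])
```

The two fragments of the same vertical edge collapse into a single long segment, which is exactly the reduction the thesis aims for.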
Concretely, each cable runs through the entire picture, so it has a border on the left and right, while the flagging tape exceeds the theoretical boundary of the cable. In theory, the entire cable boundary could be described with a large quantity of straight lines, but such a mass of straight lines increases the difficulty of locating the cable. More importantly, there may be flagging tape at some locations along the cable; a large number of straight lines would directly surround the entire connected component, and if the flagging tape is wrapped inside the contour, the subsequent detection steps cannot be implemented. Therefore, this part of the algorithm has the following requirements:

• The processed line segments can describe the approximate contour of the entire cable.
• The line segments should be adjacent to the edge of the outline; they do not need to closely fit all pixels.
• The number of these line segments should be as small as possible.

In conclusion, because contour drawing requires low accuracy, the longest line segment is selected in this thesis. It is extended from the bottom end (the maximum vertical coordinate) to the radio cabinet bottom (the minimum vertical coordinate). Hence, the contour of the entire cable is drawn approximately.
Algorithm 5: Merge endpoints and linear regression
Result: Image with cable contour
Input: matrix of cable information
1 lsdMat ← LineSegmentDetector(Original Image)
2 if both endpoints ∈ connected component then
3   Save both endpoints in ccLsdMat with the bigger Y coordinate first
4   Record the cable number
5 end
6 for i ← 0 to cable total number do
7   Set the x, y difference thresholds
8   for j ← 0 to number of line segments, k ← j + 1 to number of line segments do
9     if one endpoint of line segment j is within distance thresholdX&Y of line k then
10      if one endpoint in j and k is within distance thresholdX&Y then
11        if the slopes of the two lines are similar then
12          Set the biggest Y and smallest Y as the two line endpoints' Y
            /* combine two lines into a single longer one */
13        else
14          Set the adjacent endpoints' coordinates to the same value
            /* do not combine, only connect */
15        end
16      end
17    end
18  end
19  Quicksort by horizontal coordinate
    /* determine which connected component the line belongs to */
20  for i ← 0 to number of line segments do
21    for j ← 0 to cable total number do
22      Get the line having the maximum Y difference on each side of the connected component
23    end
24  end
25  if any of the Y differences < threshold × height then
26    Discard the result
27  end
28  Extend the line from its height to the cabinet bottom
29 end

If long line segments cannot be filtered out, one option is to extend and connect the collinear line segments, while another is not to connect them. Because line segments may be produced by reflections on the cable surface, connecting can produce false positives, so implementing a complex connection algorithm does not necessarily have a positive effect on the system. Therefore, this experiment uses a simple cable description method with a single line segment on each cable contour.
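The final selection-and-extension step (line 28 of Algorithm 5) can be sketched in a few lines; the segment coordinates below are toy values:

```python
def longest_segment(segs):
    """Pick the longest segment as the cable contour candidate."""
    return max(segs, key=lambda s: ((s[2] - s[0]) ** 2
                                    + (s[3] - s[1]) ** 2) ** 0.5)

def extend_to_range(seg, y_top, y_bottom):
    """Extend a segment (x1, y1, x2, y2) along its own direction so it
    spans from the cabinet bottom (y_top) to the image bottom (y_bottom)."""
    x1, y1, x2, y2 = seg
    if y1 == y2:                 # a horizontal line cannot describe a cable
        return seg
    def x_at(y):
        return x1 + (x2 - x1) * (y - y1) / (y2 - y1)
    return (x_at(y_top), y_top, x_at(y_bottom), y_bottom)

segs = [(50, 20, 52, 80), (10, 30, 11, 45)]
contour = extend_to_range(longest_segment(segs), 0, 100)
```

The longest near-vertical segment is stretched to full height, giving the single-line cable contour used in the next step.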
Distinguish flagging tape

The algorithm of the previous step only produces the contour of the cable, while this part illustrates how to find and highlight the flagging tape based on that contour. Assuming that the contour of the cable has been obtained properly in the previous step, the flagging tape must lie outside the contour. According to Algorithm 5, all potential candidates have been filtered out through the LSD and connected-component algorithms and are represented by the coordinates of two endpoints. With respect to a potential cable, an endpoint may lie in several kinds of positions: inside the contour, on the contour, scarcely outside the contour, or far outside the contour. These situations give different results when multiplying the two horizontal coordinate differences between the endpoint and the two contour lines at the same vertical coordinate. If a point lies inside the two line segments, the product of the two horizontal differences is obviously negative (shown blue in figure 4.7). If the point lies on one of the line segments, one of the distances is zero, so the product is zero (shown orange in figure 4.7). If the point is scarcely outside the contour, its distance to one contour line is approximately the cable width while the distance to the other is very small, so a small product results (also shown orange in figure 4.7). Only when the filtered endpoint is far from the nearest cable contour is the product of the two abscissa differences large; then the set threshold is satisfied and the line is output as flagging tape (shown yellow in figure 4.7). The following figure and algorithm show the different situations and the method.
Figure 4.7: Positions of endpoints

Figure 4.8 shows the conditions of the points. The black shape is the real cable, whilst the purple dashes indicate the cable contour obtained by Algorithm 5. For example, the orange point shows the on-contour situation; parameters o1 and o2 are the horizontal differences between the orange point and the left/right contour, and the determination of the flagging tape, as explained above, is based on the product of o1 and o2. The blue point introduces the case of being inside the contour, with parameters b1 and b2, while the yellow one lies further outside the contour, with r1 and r2 as the distances.

Figure 4.8: Distance calculation

Algorithm 6: Flagging tape detection
Result: Detected flagging tape points
Input: cable contour matrix; line endpoints m, n: xm, ym, xn, yn
1 Get the linear functions of the line segments and contours
2 for i ← 0 to number of line segments do
3   x1, x2 ← substitute the endpoint's ordinate (ym or yn) into the 2 contour functions
4   DetectM ← (x1 − xm) × (x2 − xm)
5   if DetectM > threshold × cableWidth² then
6     Store point M as potential flagging tape
7   end
8   DetectN ← (x1 − xn) × (x2 − xn)
9   if DetectN > threshold × cableWidth² then
10    Store point N as potential flagging tape
11  end
12 end

In this way, the flagging tape is figured out and can be highlighted on the image; the image is then multiplied by the scaler and finally output at the original scale.

4.3 Summary

This chapter introduces the concrete implementations of the two methods, YOLO and image processing with OpenCV. The two methods are distinct in concept and infrastructure. The training of YOLO is based on a convolutional neural network; the method uses the OpenLabeling tool for labelling the data, which are then fed into the network.
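The product test of Algorithm 6 can be sketched as follows; the contours are modelled as linear functions x = f(y), and the threshold factor is an illustrative assumption:

```python
def is_flagging_tape(x, y, contour_left, contour_right, cable_width,
                     thresh=1.5):
    """Algorithm 6 endpoint test: a large positive product of the two
    horizontal differences means the point lies far outside the contour."""
    x1 = contour_left(y)       # contour abscissas at the point's ordinate
    x2 = contour_right(y)
    product = (x1 - x) * (x2 - x)
    return product > thresh * cable_width ** 2

# toy contours: vertical lines at x = 40 and x = 60, cable width 20
left = lambda y: 40.0
right = lambda y: 60.0

inside = is_flagging_tape(50, 10, left, right, 20)    # (-10)*(10) < 0
on_edge = is_flagging_tape(60, 10, left, right, 20)   # 20*0 = 0
far_out = is_flagging_tape(120, 10, left, right, 20)  # (-80)*(-60) = 4800
```

Only the far-outside endpoint exceeds the threshold, matching the yellow case of figure 4.7.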
This implementation depends on the network parameters used during training, which are essential for the performance. The other method, which is also the crucial part of the thesis, is the image processing method based on computer vision. It is implemented as a series of steps, including initialisation, blob removal, cable localisation, and flagging tape detection, all of which are illustrated in detail. Compared with former work and literature, this program has several advantages: the connected components algorithm delivers high reliability in detecting cables, and the line segment detector extracts the lines in the image. The combination of the two techniques yields a dependable algorithm for the whole system.

Chapter 5

Measurements and Result

The test is based on two methods: one is the AI object detection method, while the other combines computer vision and typical image processing.

5.1 Object Detection of Faulty Weatherproofing

5.1.1 Result

As mentioned in section 2.5, the object detection method may perform poorly in this system, and this turned out to be the case: there were many problems during object detection. Object detection using AI is basically separated into several steps: labelling, training, and testing, and the latter two have big issues with detection. After labelling the original shooting videos, the images are fed into the network and trained. The images are labelled for both the cable and the faulty weatherproofing (the flagging tape), which means the system is targeted at detecting faulty weatherproofing directly. During training, several images are fed into graphic memory simultaneously as batches, and the console window shows the output of the system, including execution time, precision, etc. In general, the system uses gradient descent for fitting.
Generally, when the precision of the system reaches 95% in normal object detection, the system tends to become overfit. Normally, though the precision can sometimes linger at a low value, the system adjusts and refits for better performance; the precision then rises and eventually reaches a high level (such as 90%). However, if the model cannot reduce its loss for many loops, the model or the dataset might be unsuitable for the algorithm.

The result of object detection is not satisfying. However the environment or the shape of the flagging tape changes, the precision displayed in the output window cannot reach 70% in most training sets with normal labelling. To make matters worse, the detections in the images are filled with false positives, so that the flagging tape part is scarcely observable. Although some network parameters were changed, the requirement is barely met, and the reason is still the generation of too many false positives. The confidence of the bounding box (object) is less than 60%, while the IoU is less than 50%.

5.1.2 Discussion

The cables are curved, and the flagging tapes are only small parts stuck on the cables. However, the labelling tool labels objects with rectangular boxes, so the labelling inevitably includes some background in each box. Furthermore, the cable and the flagging tape themselves do not have many features, and the flagging tape is too small an object for detection. Most importantly, the shape of the flagging tape shifts in different situations, which increases the difficulty of detection. In addition, the cable itself has some reflections on its surface.
5.2 Method Based on Computer Vision and Image Processing

From the result of object detection, it can be noticed that, due to the variability of the cable shapes and the poor visibility of their colour (mostly black), the IoUs on the cable are very poor. On the other hand, it is difficult to detect the flagging tape owing to its shape variance. As a result, the detection generates a large number of false positives in many cases. Basically, object detection, as an artificial intelligence method, suffers from the common issue of overfitting. Concretely, the number of training samples is possibly too small compared to the variety of conditions in the test set, so the model is easily overfitted, resulting in the above situation. As a consequence, YOLO and other object detection methods are not the best choice for this thesis. The following implementation therefore focuses on the characteristics of the cable itself. Specifically, these characteristics are analysed and extracted; the position of each cable is detected by computer vision, and then the various situations of the flagging tape are further analysed so as to find the most suitable and more general method for detecting it.

5.2.1 Testing Method and Testing Environment

Test set creation

The testing method is based on the test set, which derives from the responsible research of the sponsor company. For the test set, a radio base station is set up in the company laboratory. Specifically, it is composed of a radio cabinet and a metal skeleton, with the former mounted on the latter. There are four cables mounted on the four connectors. One of them has tape-around weatherproofing with flagging tape at a certain position on the cable.
The other three have the plastic type of weatherproofing, which is not considered part of the testing so far. Moreover, the backgrounds of the test images are open-source images from Google, displayed on a full-scale screen behind the radio station. In this way, the test set of images is created. A sample from the test set is shown below in figure 5.1: the rightmost cable has the tape-around type of weatherproofing with flagging tape at the top, which is the detection target. The other three cables have the plastic type of weatherproofing; their detection is outside the scope of the thesis and will not be discussed as part of the result. In total, the test set contains 38 images created under the preceding rules.

Figure 5.1: Sample test image

Among the images in the test set, the differences are the backgrounds, the positions of the flagging tape, and the shooting angle. In this test set, all images include the faulty weatherproofing, i.e. the flagging tape to be detected. If the program detects any faulty weatherproofing in an image, the possible problems are displayed in any case, even though there might be false positives among the highlighted lines.

Performance of the system

The performance of the system is primarily judged on two aspects, precision and recall. The other indexes, including the run time, are not crucial under this circumstance, but they are still measured and analysed. The precision of the test set is relatively complex to calculate. Table 2.2 shows the four conditions of binary prediction. The precision shows the fraction of relevant instances among the retrieved instances [69]; in this way, the precision equals the true positives divided by the predicted positives.
In every image there are a number of predicted flagging tapes, some of which are the target object whilst some are not. Every image carries the same weight, so the precision of every image is added together and divided by the total number of images with predictions. Images in which the system predicted no flagging tape are not included, since their precision would require division by zero. However, calculating all the actual faulty lines divided by all the predicted lines over the whole set is not appropriate, because the quantity of predicted lines varies per image; the difference in predictions would make the weights unequal, images with more predictions would have higher weights, and the precision would become unbalanced and unascertainable. The formula is shown in equation 5.1.

precision = True Positive / Predicted Positive = Correctly Predicted Lines / Total Predicted Lines   (5.1)

The recall calculation of the system is significantly simpler than the precision. Recall generally shows the sensitivity of the model, i.e. the fraction of the total amount of relevant instances that were actually retrieved [69]. According to table 2.2, recall is the ratio of the true positives to the actual positives. Since all the pictures include flagging tape, the recall is the number of images with correctly detected faulty weatherproofing divided by the size of the whole test set, as shown in equation 5.2.

recall = True Positive / Actual Positive = Predicted Flagging Tape Images / Total Test Images   (5.2)

Although running time is not critical in this system, its measurement is still meaningful for reducing the complexity of processing. In order to reduce the deviation of processing each image, a large number of iterations are used to test the processing time.
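Equations 5.1 and 5.2 can be sketched directly; the example counts below are illustrative numbers, not the thesis measurements:

```python
def image_precision(per_image_predictions):
    """Equation 5.1, averaged per image with equal weight: each entry is
    (correctly_predicted_lines, total_predicted_lines); images with no
    predictions are skipped to avoid division by zero."""
    ratios = [tp / total for tp, total in per_image_predictions if total > 0]
    return sum(ratios) / len(ratios)

def recall(images_with_detected_tape, total_test_images):
    """Equation 5.2: every test image contains flagging tape, so recall is
    the fraction of images where it was found."""
    return images_with_detected_tape / total_test_images

p = image_precision([(2, 3), (1, 2), (0, 0), (3, 4)])   # third image skipped
r = recall(27, 38)
```

Note how the per-image averaging keeps an image with many predicted lines from dominating the overall precision.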
In this way, the whole test set is processed 100 times, meaning the system processes 3,800 images in total, which is expected to reduce the error significantly.

Nevertheless, it needs to be pointed out that this thesis focuses only on tape-around style weatherproofing. For other mounting methods, such as plastic-type weatherproofing, no performance calculation is performed in this experiment. Other forms of weatherproofing prediction can be implemented by other methods: as their shape is very consistent, their detection can be better implemented by AI methods such as deep learning, CNNs, YOLO and similar approaches. Therefore, false positives generated by detecting other forms of weatherproofing in the picture are not considered. In other words, the conclusions calculated in this paper are based only on whether each picture itself has detected faulty tape-around weatherproofing, and on the precision of the predicted lines that constitute faulty tape-around weatherproofing.

Moreover, to show better performance of the system, a slider is implemented in the window. The slider controls the binary threshold for processing the image and is a built-in function of OpenCV. Generally, the control rule is to use a bigger threshold when the picture is too bright, so that more nearly-black pixels are processed, and vice versa. Furthermore, with a slider, different threshold values can adjust to images beyond this test set, which improves the flexibility of the system. Although the slider is controlled by a human, it is still a meaningful reference for the algorithm when an advanced brightness detection method is implemented later.

Testing environment

The testing is based on the following testing environment, a relatively high-end configuration.
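The slider idea can be sketched as follows. The re-thresholding itself is a pure function; the commented lines show roughly how it would be hooked to an OpenCV trackbar (the window and trackbar names are illustrative assumptions):

```python
import numpy as np

def apply_threshold(grey, t):
    """Binary-inverse threshold: pixels at or below t (the dark cable)
    become foreground (255); everything brighter becomes 0."""
    return np.where(grey <= t, 255, 0).astype(np.uint8)

# GUI wiring with OpenCV would look roughly like this:
# import cv2
# cv2.namedWindow("result")
# cv2.createTrackbar("threshold", "result", 40, 255,
#                    lambda t: cv2.imshow("result", apply_threshold(grey, t)))

grey = np.array([[10, 100], [200, 30]], np.uint8)
low = apply_threshold(grey, 40)      # only the two dark pixels survive
high = apply_threshold(grey, 120)    # a brighter pixel is also captured
```

Raising the threshold admits brighter pixels into the foreground, which is why a larger value suits brighter pictures.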
Table 5.1: Testing Environment

Processor: Intel i7
Memory: 32GB
Operating System: Windows 10, version 1809
Programming Language: Python 3.7

5.2.2 Testing Result Under Fixed Binary Threshold

Sample output

The system highlights all the predicted output in the images. A sample output is shown below in figure 5.2. The red lines shown in the image are the predicted flagging tape. As mentioned above, the rightmost cable is the object cable (tape-around weatherproofing), while the other cables are not. Hence, only the number of predicted lines on the right affects the precision and recall. The detailed results are shown in appendix table A.1.

Figure 5.2: Result of the sample image

System performance

Table 5.2 below presents the performance of the system regarding precision, recall and run time. With a binary threshold of 40, the system precision reaches 59.51% while the recall reaches 71.05%. The average run time for each image is less than 0.27 s.

Table 5.2: Test result under different binary thresholds

Binary Threshold | Precision (%) | Recall (%) | Run time (s) | Run time per image (s)
30 | 58.14 | 68.42 | 986.34 | 0.260
40 | 59.51 | 71.05 | 992.28 | 0.261
50 | 54.84 | 76.32 | 1012.47 | 0.266
60 | 55.93 | 73.68 | 1014.97 | 0.267

The trend and the influence of the threshold are easier to see when the data are plotted, as shown in the two separate graphs 5.3 and 5.4.
Figure 5.3: Binary threshold and precision and recall

Figure 5.4: Binary threshold and run time

5.2.3 Testing Result Under Adjustable Binary Threshold

With the slider, the performance of the system is better than with a single fixed binary threshold, because the threshold can be adjusted manually under different brightness conditions. The slider system is thus able to detect most of the test set. Two situations cannot be predicted properly: first, complex, dark backgrounds that almost blend into the cable; second, cases where the grey-gradient difference between the flagging tape and the background is too small to recognise. In the latter case, the LSD algorithm is not able to detect the relevant edges of the flagging tape. The detailed result for each image is shown in table A.5 in the appendix. The result shows that the correct flagging-tape location is predicted in 31 of the 38 images, which means the recall reaches 82%. 68.5% of the predicted lines are true positives while the rest are false positives, so the precision is 68.5%. A comparison with the fixed thresholds is shown in figure 5.5.

Figure 5.5: Comparison between slider and fixed threshold

5.2.4 Discussion

Graphs 5.3 and 5.4 and table 5.2 illustrate how the binary threshold influences the performance of the system. Regarding the run time, a larger threshold increases the execution time of the system.
When the binary threshold becomes larger, some pixels that used to be classified as white become black. After the inversion step, these black pixels are treated as white ones, so there are more white pixels to process and the calculation time increases. However, the number of such pixels is relatively small, which is why the execution time only increases slightly.

Considering precision and recall, the graph shows that within the appropriate range of the binary threshold, precision and recall are not positively correlated: an increase in recall often comes at the cost of precision. The test data show that 30 is not a good choice for a fixed threshold, because at 40 both precision and recall are better; in other words, 30 is too small for this test set. With the slider, the performance of the system improves significantly, because the binary threshold can be adapted manually to the brightness of each image, and the connected component algorithm then discards the background noise more effectively. However, manual adjustment of the threshold is difficult, because the adjustment can easily overshoot and fail to produce the actual positive result.

5.3 Summary

In conclusion, the experiments with the two methods show completely different results. YOLO, as an AI-based object detection method, cannot produce a satisfying result, while the image processing method built on OpenCV reaches roughly 60% precision and 71% recall with about 0.26 s execution time per image. With the YOLO method, the loss could not be reduced to a low level during training, so the system cannot deliver a good result. The custom image processing method, however, is more reliable.
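The threshold behaviour discussed above (a larger binary threshold turns more pixels black, and the inversion step then turns them white) and the manual slider can be sketched as follows. This is a minimal illustration, not the thesis code: the function and window names are assumptions, the thresholding logic is written with NumPy so it is self-contained, and only the interactive part at the bottom uses OpenCV's built-in trackbar:

```python
import numpy as np

def binarize_and_invert(gray, thresh):
    """Binary threshold (pixel > thresh -> 255, else 0) followed by inversion,
    mirroring the threshold and inversion steps described in the text."""
    binary = np.where(gray > thresh, 255, 0).astype(np.uint8)
    return 255 - binary  # formerly black pixels become white

# A synthetic grey ramp: with a larger threshold, more pixels fall below it,
# so the inverted image contains more white (255) pixels.
gray = np.arange(0, 256, dtype=np.uint8).reshape(16, 16)
low = binarize_and_invert(gray, 40)
high = binarize_and_invert(gray, 60)
assert (high == 255).sum() > (low == 255).sum()

if __name__ == "__main__":  # interactive demo, requires OpenCV (cv2)
    import cv2
    cv2.namedWindow("result")
    cv2.createTrackbar("threshold", "result", 40, 255,
                       lambda t: cv2.imshow("result", binarize_and_invert(gray, t)))
    cv2.waitKey(0)
```

The callback simply re-binarizes with the slider value, which matches the role the slider plays in the system: a human picks the threshold, the rest of the pipeline stays unchanged.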
It should be pointed out that the calculation of precision and recall disregards the plastic type of weatherproofing and considers only the tape-around style; the evaluation therefore concentrates on that proportion of each image.

Chapter 6

Conclusion and Future Work

This chapter summarises the work and findings of the project. It also gives recommendations for future work based on the results presented in this report.

6.1 Conclusion

With the progression of 5G, there is a high demand for setting up radio towers, and hence a growing need for maintenance of the whole radio base station system. To meet this requirement, deeper research and further development are urgently needed to find better solutions that reduce labour costs and protect human workers from danger. The purpose of this work is a system that can predict faulty tape-around weatherproofing, specifically the flagging-tape type of fault. Aiming to detect the faulty weatherproofing, the work was first carried out through extensive literature research, studying and attempting to reproduce some published conclusions. Then the self-designed program or network was implemented and compared. The final prototype applies and combines the line segment detector, connected components, image processing, etc. for the tests and experiments. With this method, faulty weatherproofing (flagging tape) can be identified. Therefore, this thesis can answer the following three key questions, which make it possible for the company to create a specific target detection system for flagging tape: Which methods cannot detect the flagging tape on a radio tower? Which method can detect the flagging tape on the radio tower? How are the results tested and evaluated? Two tracks were attempted during the thesis: AI-based object detection (YOLO) and image processing with computer vision.
In the former method, the first step is to label the data. VOC-format data, which the network can read, is generated with the OpenLabeling tool by labelling each cable or flagging tape. The network is then configured, and the data is fed into the network to train the model.

In the image processing method, resizing is implemented first to improve the execution speed, and a filtering step eliminates the impact of small objects on the system. For the narrow or small reflections inside the cable, the close operation of the morphological transform is applied. To improve reliability, the system applies the LSD algorithm to extract line segments, and the connected component method ensures that the detected line segments lie reliably on the cable. Finally, the detection of flagging tape is based on the cable features: the cable contours are drawn and the lines outside the contour are found.

The test set consists of 38 images with flagging tape at a certain place and with different backgrounds, and it was created by researchers at the company. The evaluation is based on precision, recall, and execution time. For YOLO, precision is the proportion of true positives among the predicted positives, recall is the ratio of true positives to actual positives, and IoU is the ratio of the intersection area to the union area. For the image processing method, the precision is the ratio of correctly predicted lines to total predicted lines, concentrating on the area of the tape-around weatherproofing, while the recall is the ratio of correctly predicted flagging-tape images to total test images. A comparison is made between the two methods, but the evaluation is not compared with former work since no identical use case exists.
With the image processing method, the performance of the system reaches (precision, recall) = (59.5%, 71.1%) and (54.8%, 76.3%) under different binary threshold parameters. Furthermore, the execution time of the program is less than 0.267 s per image. The object detection approach, however, does not work well by means of the state-of-the-art algorithm YOLO. Even though the parameters of the network were changed and training on a single background was tried, the system could not reduce the loss to a satisfying value. Therefore the IoU and the confidence of the bounding box are relatively low (50% and 60% respectively), and the precision and recall are poor, which makes the result unreliable.

6.2 Limitations

The final system is based on the OpenCV method and a custom image processing algorithm. Compared with the artificial intelligence method, the algorithm presented in this thesis is better at detecting faulty tape-around weatherproofing. However, some weaknesses and limitations still exist and are unavoidable.

• The shooting direction cannot be parallel with the cables. A drone is used to obtain the photographs and is designed to shoot horizontally, but the radio cabinet may not be installed horizontally. The program is not adjusted for larger shooting angles, such as 45° to the horizontal or even vertical.

• The brightness and the complexity of the background strongly affect the prediction. Although the parameters can be adjusted through the slider, adjusting them is relatively unreliable for non-technical personnel, because there are no specific numerical values for the adaptation and the adjustment can easily overshoot.
For example, given an image with a certain brightness, when the flagging tape cannot be detected with the original parameters, it is difficult for non-technical personnel to determine which parameter value is suitable for detecting the faulty weatherproofing, even if the binary image is displayed on the same screen.

• Algorithmically, the entire boundary of the cable is fitted by only two rough line segments. The predicted contour therefore does not adhere closely to the actual boundary, which affects the distance measurement between the potential flagging tape and the fitted contours and may generate many false positives.

• There are bright specks or blobs inside the cables owing to light reflection, and different brightness levels produce different amounts of these blobs. The blobs may break the consistency of the cable pixels and cause errors in the image. They are filled by means of morphological transformation (erosion and dilation), implemented by sliding a fixed-size kernel. The blobs, which sit inside the cable like small islands, are not filled automatically; the system applies the operation with an empirically chosen kernel size.

• Limitations of the LSD algorithm. The LSD algorithm performs well at detecting all boundaries with large grey-gradient differences as candidate segmentation lines. The prediction of the faulty weatherproofing lines in this thesis is based on filtering these candidates from the LSD output, so whether all relevant lines can be detected limits the performance. The test results show that the LSD algorithm is less effective when the background and cable colours are similar, which makes the prediction more likely to fail.

• There are only positive samples in the test set.
In other words, all the images in the test set contain faulty weatherproofing at a certain location, while negative samples, i.e. images without such faults, are not included. This means that some of the possible situations are not covered, so the delivered precision and recall describe only part of the situations. Theoretically, another precision can also be defined: different from the per-image precision, the precision over the whole test set is given in equation 6.1. This precision is the percentage of true flagging-tape images among all images predicted to contain faulty weatherproofing, which reflects the degree of correct prediction.

precision = True Positive / Predicted Positive = Correctly Predicted Images / Total Predicted Images    (6.1)

• The execution time per image is less than 0.267 s on the PC. If the system runs on an extremely low-end computer or migrates to mobile devices, the execution time will be longer, making the system less efficient.

6.3 Future Work

Given these limitations, several parts of the system can be improved. In this section, the critical future work is discussed.

• The results of this experiment show that the YOLO algorithm cannot detect the labelled tapes well, so a better target detection method needs to be found or designed, i.e. a network more suitable for this use case that can detect small objects without fixed features and shapes.

• The results show that the brightness of the light greatly affects the test results, because in an excessively dark environment the background is also taken for a cable. With the resulting increase in false positives, the problem is difficult to handle in traditional image detection.
However, if the lightness and darkness of the background can be quantified, or the lightness of the approximate cable object and its surface reflection can be detected, then the cable position can be found more accurately.

• The current method of fitting contour curves simply uses one long line segment to represent each side of the object, because a multi-curve fit might wrap the flagging tape inside the contour and make it undetectable. Better fitting methods can be applied to fit the contour in the future.

• A better line segmentation method can be used. The line detector is the foundation of the image processing, and it must be acknowledged that the LSD algorithm is very powerful. However, when the background and object colours are similar and the grey gradient is not obvious, the detection results are not clear enough, and some straight lines belonging to the flagging tape are missed. With a better algorithm, more straight lines could be passed to the subsequent processing.

• Although the program does not consume much time on a well-performing PC, it still needs to be optimised when it is ported to a mobile terminal in the future. Both time complexity and space complexity need to be reduced. Hence, it is essential to reduce loops, the use of large arrays, and, as far as possible, the use of complex algorithms such as connected components.

• Add a gyroscope sensor to measure the shooting angle. The shooting angle may be non-horizontal; additional sensors for correcting the angle can be applied to ensure that the cable runs vertically through the entire picture during the shooting.

Bibliography

[1] X. Liu, S. Wu, Y. Guo, and C. Chen, “The demand and development of internet of things for 5G: A survey,” in 2018 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-TW). IEEE, 2018, pp.
1–2.

[2] A. Karapantelakis and J. Markendahl, “Challenges for ICT business development in intelligent transport systems,” in 2017 Internet of Things Business Models, Users, and Networks. IEEE, 2017, pp. 1–6.

[3] R. Inam, A. Karapantelakis, K. Vandikas, L. Mokrushin, A. V. Feljan, and E. Fersman, “Towards automated service-oriented lifecycle management for 5G networks,” in 2015 IEEE 20th Conference on Emerging Technologies & Factory Automation (ETFA). IEEE, 2015, pp. 1–8.

[4] A. V. Feljan and Y. Jin, “A simulation framework for validating cellular V2X scenarios,” in IECON 2018 - 44th Annual Conference of the IEEE Industrial Electronics Society. IEEE, 2018, pp. 4078–4083.

[5] W. D. de Mattos and P. R. Gondim, “M-health solutions using 5G networks and M2M communications,” IT Professional, vol. 18, no. 3, pp. 24–29, 2016.

[6] L. Grcev, A. van Deursen, and J. van Waes, “Lightning current distribution to ground at a power line tower carrying a radio base station,” IEEE Transactions on Electromagnetic Compatibility, vol. 47, no. 1, pp. 160–170, 2005.

[7] S. Talwar, D. Choudhury, K. Dimou, E. Aryafar, B. Bangerter, and K. Stewart, “Enabling technologies and architectures for 5G wireless,” in 2014 IEEE MTT-S International Microwave Symposium (IMS2014). IEEE, 2014, pp. 1–4.

[8] L. Dai, B. Wang, Y. Yuan, S. Han, I. Chih-Lin, and Z. Wang, “Non-orthogonal multiple access for 5G: solutions, challenges, opportunities, and future research trends,” IEEE Communications Magazine, vol. 53, no. 9, pp. 74–81, 2015.

[9] “US tower fatality tracker,” http://wirelessestimator.com/content/fatalities, accessed: 2019-12-07.

[10] “In race for better cell service, men who climb towers pay with their lives,” https://www.propublica.org/article/cell-tower-fatalities, accessed: 2019-12-07.

[11] A.
Håkansson, “Portal of research methods and methodologies for research projects and degree projects,” in The 2013 World Congress in Computer Science, Computer Engineering, and Applied Computing WORLDCOMP 2013; Las Vegas, Nevada, USA, 22-25 July. CSREA Press USA, 2013, pp. 67–73.

[12] Q. Chen, Y. Fu, W. Song, K. Cheng, Z. Lu, C. Zhang, and L. Li, “An efficient streaming accelerator for low bit-width convolutional neural networks,” Electronics, vol. 8, no. 4, p. 371, 2019.

[13] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587.

[14] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[15] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.

[16] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, “The PASCAL visual object classes challenge 2007 (VOC2007) results,” 2007.

[17] R. Girshick, “Fast R-CNN,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.

[18] “Softmax function,” https://en.wikipedia.org/wiki/Softmax_function, accessed: 2019-11-14.

[19] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.

[20] J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7263–7271.

[21] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in European Conference on Computer Vision.
Springer, 2014, pp. 740–755.

[22] A. Veit, T. Matera, L. Neumann, J. Matas, and S. Belongie, “COCO-Text: Dataset and benchmark for text detection and recognition in natural images,” arXiv preprint arXiv:1601.07140, 2016.

[23] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.

[24] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems, 2015, pp. 91–99.

[25] J. Du, “Understanding of object detection based on CNN family and YOLO,” in Journal of Physics: Conference Series, vol. 1004, no. 1. IOP Publishing, 2018, p. 012029.

[26] W. Yu, K. Yang, Y. Bai, T. Xiao, H. Yao, and Y. Rui, “Visualizing and comparing AlexNet and VGG using deconvolutional layers,” in Proceedings of the 33rd International Conference on Machine Learning, 2016.

[27] A. Bobrovsky, M. Galeeva, A. Morozov, V. Pavlov, and A. Tsytsulin, “Automatic detection of objects on star sky images by using the convolutional neural network,” in Journal of Physics: Conference Series, vol. 1236, no. 1. IOP Publishing, 2019, p. 012066.

[28] Y. Bai, Y. Zhang, M. Ding, and B. Ghanem, “SOD-MTGAN: Small object detection via multi-task generative adversarial network,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 206–221.

[29] J. Li, X. Liang, Y. Wei, T. Xu, J. Feng, and S. Yan, “Perceptual generative adversarial networks for small object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1222–1230.

[30] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “SSD: Single shot multibox detector,” in European Conference on Computer Vision. Springer, 2016, pp. 21–37.

[31] J. Redmon and A.
Farhadi, “YOLOv3: An incremental improvement,” arXiv, 2018.

[32] Z. Wu, C. Shen, and A. Van Den Hengel, “Wider or deeper: Revisiting the ResNet model for visual recognition,” Pattern Recognition, vol. 90, pp. 119–133, 2019.

[33] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, “Inception-v4, Inception-ResNet and the impact of residual connections on learning,” in Thirty-First AAAI Conference on Artificial Intelligence, 2017.

[34] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117–2125.

[35] C. Saravanan, “Color image to grayscale image conversion,” in 2010 Second International Conference on Computer Engineering and Applications, vol. 2. IEEE, 2010, pp. 196–199.

[36] “Find and draw contours using OpenCV | Python,” https://www.geeksforgeeks.org/find-and-draw-contours-using-opencv-python/, accessed: 2019-11-14.

[37] S. Suzuki et al., “Topological structural analysis of digitized binary images by border following,” Computer Vision, Graphics, and Image Processing, vol. 30, no. 1, pp. 32–46, 1985.

[38] “Component (graph theory),” https://en.wikipedia.org/wiki/Component_(graph_theory), accessed: 2019-11-14.

[39] W. Chen, M. L. Giger, and U. Bick, “A fuzzy c-means (FCM)-based approach for computerized segmentation of breast lesions in dynamic contrast-enhanced MR images,” Academic Radiology, vol. 13, no. 1, pp. 63–72, 2006.

[40] K. Wu, W. Koegler, J. Chen, and A. Shoshani, “Using bitmap index for interactive exploration of large datasets,” in 15th International Conference on Scientific and Statistical Database Management, 2003. IEEE, 2003, pp. 65–74.

[41] R. O. Duda and P. E. Hart, “Use of the Hough transformation to detect lines and curves in pictures,” SRI International, Menlo Park, CA, Artificial Intelligence Center, Tech.
Rep., 1971.

[42] D. H. Ballard, “Generalizing the Hough transform to detect arbitrary shapes,” Pattern Recognition, vol. 13, no. 2, pp. 111–122, 1981.

[43] R. G. Von Gioi, J. Jakubowicz, J.-M. Morel, and G. Randall, “LSD: A line segment detector,” Image Processing On Line, vol. 2, pp. 35–55, 2012.

[44] A. Desolneux, L. Moisan, and J.-M. Morel, “The Helmholtz principle,” in From Gestalt Theory to Image Analysis: A Probabilistic Approach, pp. 31–45, 2008.

[45] A. M. Algorry, A. G. García, and A. G. Wofmann, “Real-time object detection and classification of small and similar figures in image processing,” in 2017 International Conference on Computational Science and Computational Intelligence (CSCI). IEEE, 2017, pp. 516–519.

[46] M. Mathias, R. Timofte, R. Benenson, and L. Van Gool, “Traffic sign recognition—how far are we from the solution?” in The 2013 International Joint Conference on Neural Networks (IJCNN). IEEE, 2013, pp. 1–8.

[47] J. Greenhalgh and M. Mirmehdi, “Real-time detection and recognition of road traffic signs,” IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 4, pp. 1498–1506, 2012.

[48] Z. Du, J. Yin, and J. Yang, “Expanding receptive field YOLO for small object detection,” in Journal of Physics: Conference Series, vol. 1314, no. 1. IOP Publishing, 2019, p. 012202.

[49] P. Du, X. Qu, T. Wei, C. Peng, X. Zhong, and C. Chen, “Research on small size object detection in complex background,” in 2018 Chinese Automation Congress (CAC). IEEE, pp. 4216–4220.

[50] J. Wang, S. Jiang, W. Song, and Y. Yang, “A comparative study of small object detection algorithms,” in 2019 Chinese Control Conference (CCC). IEEE, 2019, pp. 8507–8512.

[51] X. Lu, J. Yao, K. Li, and L. Li, “CannyLines: A parameter-free line segment detector,” in 2015 IEEE International Conference on Image Processing (ICIP). IEEE, 2015, pp. 507–511.

[52] C. Akinlar and C.
Topal, “EDLines: A real-time line segment detector with a false detection control,” Pattern Recognition Letters, vol. 32, no. 13, pp. 1633–1642, 2011.

[53] N. Hamid and N. Khan, “LSM: Perceptually accurate line segment merging,” Journal of Electronic Imaging, vol. 25, no. 6, p. 061620, 2016.

[54] A. Mahmoud, L. Ehab, M. Reda, M. Abdelaleem, H. A. El Munim, M. Ghoneima, M. S. Darweesh, and H. Mostafa, “Real-time lane detection-based line segment detection,” in 2018 New Generation of CAS (NGCAS). IEEE, 2018, pp. 57–61.

[55] S. Liu, L. Lu, X. Zhong, and J. Zeng, “Effective road lane detection and tracking method using line segment detector,” in 2018 37th Chinese Control Conference (CCC). IEEE, 2018, pp. 5222–5227.

[56] Ü. Budak, U. Halıcı, A. Şengür, M. Karabatak, and Y. Xiao, “Efficient airport detection using line segment detector and Fisher vector representation,” IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 8, pp. 1079–1083, 2016.

[57] H. El Bahi and A. Zatni, “Document text detection in video frames acquired by a smartphone based on line segment detector and DBSCAN clustering,” Journal of Engineering Science and Technology, vol. 13, no. 2, pp. 540–557, 2018.

[58] J. Wang, Q. Qin, L. Chen, X. Ye, X. Qin, J. Wang, and C. Chen, “Automatic building extraction from very high resolution satellite imagery using line segment detector,” in 2013 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). IEEE, 2013, pp. 212–215.

[59] L. Zhang, Y. Cheng, and Z. Zhai, “Real-time accurate runway detection based on airborne multi-sensors fusion,” Defence Science Journal, vol. 67, no. 5, pp. 542–550, 2017.

[60] I. Gamal, A. Badawy, A. M. Al-Habal, M. E. Adawy, K. K. Khalil, M. A. El-Moursy, and A. Khattab, “A robust, real-time and calibration-free lane departure warning system,” Microprocessors and Microsystems, vol. 71, p. 102874, 2019.

[61] Ö. E. Yetgin and Ö. N.
Gerek, “PLD: Power line detection system for aircrafts,” in 2017 International Artificial Intelligence and Data Processing Symposium (IDAP). IEEE, 2017, pp. 1–5.

[62] J. Wang, X. Yang, X. Qin, X. Ye, and Q. Qin, “An efficient approach for automatic rectangular building extraction from very high resolution optical satellite imagery,” IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 3, pp. 487–491, 2014.

[63] L. Huang, Q. Chang, S. Chen, and H. Dai, “Line segment matching of space target image sequence based on optical flow prediction,” in 2015 IEEE International Conference on Progress in Informatics and Computing (PIC). IEEE, 2015, pp. 148–152.

[64] L. He, X. Ren, Q. Gao, X. Zhao, B. Yao, and Y. Chao, “The connected-component labeling problem: A review of state-of-the-art algorithms,” Pattern Recognition, vol. 70, pp. 25–43, 2017.

[65] W.-Y. Chang, C.-C. Chiu, and J.-H. Yang, “Block-based connected-component labeling algorithm using binary decision trees,” Sensors, vol. 15, no. 9, pp. 23763–23787, 2015.

[66] F. Spagnolo, F. Frustaci, S. Perri, and P. Corsonello, “An efficient connected component labeling architecture for embedded systems,” Journal of Low Power Electronics and Applications, vol. 8, no. 1, p. 7, 2018.

[67] B. Peck and J. Mummery, “Hermeneutic constructivism: An ontology for qualitative research,” Qualitative Health Research, vol. 28, no. 3, pp. 389–407, 2018.

[68] K. Yilmaz, “Comparison of quantitative and qualitative research traditions: Epistemological, theoretical, and methodological differences,” European Journal of Education, vol. 48, no. 2, pp. 311–325, 2013.

[69] “Precision and recall,” https://en.wikipedia.org/wiki/Precision_and_recall, accessed: 2019-12-04.

Appendix A

Table of Testing Result

Tables A.1 to A.4 show the test statistics and the result of each image under the different binary thresholds.
The results include the precision of each image and the overall recall and precision, etc. Table A.5 shows the result after using a slider to adjust the binary threshold manually.

Table A.1: Statistics of all the test set with the binary threshold of 30

Columns: Cables = predicted number of cables; Errors = lines of errors; Pred = all predicted lines in the interested area; Actual = actual faulty lines in the interested area; Det = if error is detected (1 is detected); Prec = precision of the image.

Image | Cables | Errors | Pred | Actual | Det | Prec
1 | 4 | 2 | 2 | 2 | 1 | 1
2 | 3 | 6 | 0 | 0 | 0 |
3 | 4 | 6 | 1 | 0 | 0 | 0
4 | 4 | 9 | 2 | 1 | 1 | 0.5
5 | 4 | 8 | 4 | 2 | 1 | 0.5
6 | 4 | 7 | 3 | 1 | 1 | 0.33
7 | 4 | 7 | 4 | 2 | 1 | 0.5
8 | 4 | 5 | 3 | 2 | 1 | 0.67
9 | 4 | 8 | 6 | 2 | 1 | 0.33
10 | 4 | 11 | 6 | 0 | 0 | 0
11 | 4 | 6 | 1 | 0 | 0 | 0
12 | 4 | 9 | 6 | 0 | 0 | 0
13 | 4 | 7 | 6 | 2 | 1 | 0.33
14 | 4 | 2 | 1 | 1 | 1 | 1
15 | 4 | 8 | 1 | 1 | 1 | 1
16 | 3 | 14 | 4 | 2 | 1 | 0.5
17 | 3 | 4 | 2 | 0 | 0 | 0
18 | 4 | 8 | 5 | 3 | 1 | 0.6
19 | 4 | 5 | 3 | 0 | 0 | 0
20 | 4 | 12 | 6 | 2 | 1 | 0.33
21 | 4 | 19 | 4 | 2 | 1 | 0.5
22 | 2 | 0 | 0 | 0 | 0 |
23 | 4 | 13 | 0 | 0 | 0 |
24 | 4 | 17 | 6 | 2 | 1 | 0.33
25 | 3 | 3 | 0 | 0 | 0 |
26 | 4 | 5 | 5 | 2 | 1 | 0.4
27 | 4 | 3 | 1 | 1 | 1 | 1
28 | 2 | 0 | 0 | 0 | 0 |
29 | 4 | 6 | 5 | 3 | 1 | 0.6
30 | 4 | 7 | 5 | 3 | 1 | 0.6
31 | 4 | 10 | 4 | 3 | 1 | 0.75
32 | 4 | 5 | 4 | 3 | 1 | 0.75
33 | 4 | 4 | 4 | 3 | 1 | 0.75
34 | 4 | 5 | 4 | 1 | 1 | 0.25
35 | 4 | 5 | 3 | 1 | 1 | 0.33
36 | 4 | 5 | 2 | 0 | 0 | 0
37 | 4 | 9 | 2 | 1 | 1 | 0.5
38 | 4 | 8 | 4 | 3 | 1 | 0.75
Total: | | | | | 26 | 15.12

Total time (s): 986.34
Average time (s): 0.260
Recall: 68.42%
Precision: 58.14%
Table A.2: Statistics of the full test set with a binary threshold of 40

| Image | Predicted number of cables | Lines of errors | Predicted lines in the interested area | Actual faulty lines in the interested area | Error detected (1 = correct) | Precision of the image |
|---|---|---|---|---|---|---|
| 1 | 4 | 2 | 2 | 2 | 1 | 1 |
| 2 | 4 | 11 | 1 | 0 | 0 | 0 |
| 3 | 4 | 8 | 2 | 0 | 0 | 0 |
| 4 | 4 | 9 | 2 | 2 | 1 | 1 |
| 5 | 4 | 7 | 3 | 2 | 1 | 0.67 |
| 6 | 4 | 5 | 2 | 1 | 1 | 0.5 |
| 7 | 4 | 7 | 4 | 2 | 1 | 0.5 |
| 8 | 4 | 7 | 4 | 2 | 1 | 0.5 |
| 9 | 4 | 8 | 5 | 2 | 1 | 0.4 |
| 10 | 4 | 13 | 6 | 0 | 0 | 0 |
| 11 | 3 | 4 | 1 | 0 | 0 | 0 |
| 12 | 4 | 7 | 3 | 1 | 1 | 0.33 |
| 13 | 4 | 9 | 6 | 2 | 1 | 0.33 |
| 14 | 4 | 1 | 1 | 1 | 1 | 1 |
| 15 | 4 | 8 | 4 | 1 | 1 | 0.25 |
| 16 | 4 | 22 | 4 | 3 | 1 | 0.75 |
| 17 | 3 | 6 | 2 | 0 | 0 | 0 |
| 18 | 4 | 8 | 5 | 3 | 1 | 0.6 |
| 19 | 5 | 8 | 6 | 0 | 0 | 0 |
| 20 | 4 | 13 | 5 | 2 | 1 | 0.4 |
| 21 | 4 | 19 | 4 | 2 | 1 | 0.5 |
| 22 | 1 | 0 | 0 | 0 | 0 | |
| 23 | 4 | 11 | 0 | 0 | 0 | |
| 24 | 4 | 13 | 5 | 2 | 1 | 0.4 |
| 25 | 3 | 0 | 0 | 0 | 0 | |
| 26 | 4 | 1 | 1 | 1 | 1 | 1 |
| 27 | 4 | 6 | 2 | 1 | 1 | 0.5 |
| 28 | 2 | 0 | 0 | 0 | 0 | |
| 29 | 4 | 6 | 4 | 3 | 1 | 0.75 |
| 30 | 4 | 7 | 5 | 3 | 1 | 0.6 |
| 31 | 4 | 10 | 4 | 3 | 1 | 0.75 |
| 32 | 4 | 5 | 4 | 3 | 1 | 0.75 |
| 33 | 4 | 5 | 4 | 3 | 1 | 0.75 |
| 34 | 4 | 5 | 4 | 1 | 1 | 0.25 |
| 35 | 4 | 8 | 4 | 2 | 1 | 0.5 |
| 36 | 4 | 2 | 2 | 0 | 0 | 0 |
| 37 | 4 | 9 | 3 | 1 | 1 | 0.33 |
| 38 | 4 | 15 | 4 | 3 | 1 | 0.75 |
| Total | | | | | 27 | 16.07 |

Total time: 992.28 s; average time per image: 0.261 s. Recall: 71.05%; Precision: 59.51%.
Table A.3: Statistics of the full test set with a binary threshold of 50

| Image | Predicted number of cables | Lines of errors | Predicted lines in the interested area | Actual faulty lines in the interested area | Error detected (1 = correct) | Precision of the image |
|---|---|---|---|---|---|---|
| 1 | 4 | 3 | 2 | 2 | 1 | 1 |
| 2 | 4 | 9 | 0 | 0 | 0 | |
| 3 | 4 | 9 | 4 | 2 | 1 | 0.5 |
| 4 | 4 | 6 | 3 | 2 | 1 | 0.67 |
| 5 | 4 | 7 | 3 | 2 | 1 | 0.67 |
| 6 | 4 | 2 | 0 | 0 | 0 | |
| 7 | 4 | 9 | 5 | 2 | 1 | 0.4 |
| 8 | 4 | 8 | 4 | 2 | 1 | 0.5 |
| 9 | 4 | 8 | 5 | 2 | 1 | 0.4 |
| 10 | 4 | 17 | 6 | 0 | 0 | 0 |
| 11 | 3 | 11 | 6 | 1 | 1 | 0.17 |
| 12 | 4 | 7 | 3 | 1 | 1 | 0.33 |
| 13 | 4 | 11 | 7 | 2 | 1 | 0.29 |
| 14 | 4 | 6 | 4 | 2 | 1 | 0.5 |
| 15 | 4 | 9 | 5 | 1 | 1 | 0.2 |
| 16 | 4 | 16 | 4 | 3 | 1 | 0.75 |
| 17 | 1 | 0 | 0 | 0 | 0 | |
| 18 | 4 | 8 | 5 | 3 | 1 | 0.6 |
| 19 | 5 | 16 | 10 | 0 | 0 | 0 |
| 20 | 4 | 4 | 3 | 2 | 1 | 0.67 |
| 21 | 4 | 21 | 3 | 2 | 1 | 0.67 |
| 22 | 3 | 29 | 4 | 2 | 1 | 0.5 |
| 23 | 4 | 0 | 0 | 0 | 0 | |
| 24 | 3 | 5 | 3 | 1 | 1 | 0.33 |
| 25 | 2 | 0 | 0 | 0 | 0 | |
| 26 | 4 | 2 | 1 | 1 | 1 | 1 |
| 27 | 4 | 7 | 2 | 1 | 1 | 0.5 |
| 28 | 2 | 0 | 0 | 0 | 0 | |
| 29 | 4 | 6 | 4 | 3 | 1 | 0.75 |
| 30 | 4 | 7 | 5 | 3 | 1 | 0.6 |
| 31 | 4 | 11 | 4 | 3 | 1 | 0.75 |
| 32 | 4 | 7 | 4 | 3 | 1 | 0.75 |
| 33 | 4 | 5 | 4 | 3 | 1 | 0.75 |
| 34 | 4 | 7 | 4 | 1 | 1 | 0.25 |
| 35 | 4 | 5 | 3 | 1 | 1 | 0.33 |
| 36 | 1 | 0 | 2 | 0 | 0 | 0 |
| 37 | 4 | 6 | 3 | 1 | 1 | 0.33 |
| 38 | 4 | 7 | 4 | 3 | 1 | 0.75 |
| Total | | | | | 29 | 15.90 |

Total time: 1012.47 s; average time per image: 0.266 s. Recall: 76.32%; Precision: 54.84%.
Table A.4: Statistics of the full test set with a binary threshold of 60

| Image | Predicted number of cables | Lines of errors | Predicted lines in the interested area | Actual faulty lines in the interested area | Error detected (1 = correct) | Precision of the image |
|---|---|---|---|---|---|---|
| 1 | 4 | 4 | 2 | 2 | 1 | 1 |
| 2 | 4 | 6 | 1 | 1 | 1 | 1 |
| 3 | 4 | 8 | 4 | 2 | 1 | 0.5 |
| 4 | 4 | 6 | 3 | 2 | 1 | 0.67 |
| 5 | 4 | 6 | 3 | 2 | 1 | 0.67 |
| 6 | 3 | 2 | 0 | 0 | 0 | |
| 7 | 4 | 10 | 7 | 2 | 1 | 0.29 |
| 8 | 5 | 8 | 4 | 2 | 1 | 0.5 |
| 9 | 4 | 7 | 5 | 2 | 1 | 0.4 |
| 10 | 4 | 19 | 10 | 0 | 0 | 0 |
| 11 | 4 | 14 | 10 | 1 | 1 | 0.1 |
| 12 | 4 | 7 | 6 | 1 | 1 | 0.17 |
| 13 | 4 | 5 | 3 | 2 | 1 | 0.67 |
| 14 | 4 | 8 | 4 | 2 | 1 | 0.5 |
| 15 | 5 | 10 | 7 | 0 | 0 | 0 |
| 16 | 4 | 9 | 4 | 3 | 1 | 0.75 |
| 17 | 3 | 14 | 8 | 0 | 0 | 0 |
| 18 | 4 | 13 | 8 | 3 | 1 | 0.38 |
| 19 | 4 | 2 | 0 | 0 | 0 | |
| 20 | 4 | 10 | 5 | 2 | 1 | 0.4 |
| 21 | 4 | 28 | 4 | 2 | 1 | 0.5 |
| 22 | 3 | 25 | 4 | 1 | 1 | 0.25 |
| 23 | 4 | 1 | 0 | 0 | 0 | |
| 24 | 3 | 4 | 2 | 0 | 0 | 0 |
| 25 | 2 | 0 | 0 | 0 | 0 | |
| 26 | 4 | 2 | 1 | 1 | 1 | 1 |
| 27 | 4 | 9 | 2 | 1 | 1 | 0.5 |
| 28 | 1 | 0 | 0 | 0 | 0 | |
| 29 | 4 | 8 | 5 | 3 | 1 | 0.6 |
| 30 | 4 | 7 | 4 | 3 | 1 | 0.75 |
| 31 | 4 | 11 | 6 | 3 | 1 | 0.5 |
| 32 | 4 | 7 | 4 | 3 | 1 | 0.75 |
| 33 | 4 | 5 | 3 | 3 | 1 | 1 |
| 34 | 4 | 4 | 2 | 1 | 1 | 0.5 |
| 35 | 4 | 5 | 3 | 1 | 1 | 0.33 |
| 36 | 1 | 0 | 0 | 0 | 0 | |
| 37 | 4 | 7 | 4 | 1 | 1 | 0.25 |
| 38 | 4 | 7 | 4 | 3 | 1 | 0.75 |
| Total | | | | | 28 | 15.66 |

Total time: 1014.97 s; average time per image: 0.267 s. Recall: 73.68%; Precision: 55.93%.
Table A.5: Statistics with the binary threshold adjusted manually by a slider

| Image | Error detected (1 = correct) | Precision of the image |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 1 | 0.5 |
| 4 | 1 | 1 |
| 5 | 1 | 0.666667 |
| 6 | 1 | 0.5 |
| 7 | 1 | 0.5 |
| 8 | 1 | 0.666667 |
| 9 | 1 | 0.4 |
| 10 | 0 | 0 |
| 11 | 1 | 0.166667 |
| 12 | 1 | 0.333333 |
| 13 | 1 | 0.666667 |
| 14 | 1 | 1 |
| 15 | 1 | 1 |
| 16 | 1 | 0.75 |
| 17 | 0 | 0 |
| 18 | 1 | 0.6 |
| 19 | 0 | 0 |
| 20 | 1 | 0.666667 |
| 21 | 1 | 0.666667 |
| 22 | 1 | 0.5 |
| 23 | 0 | 0 |
| 24 | 1 | 0.4 |
| 25 | 0 | 0 |
| 26 | 1 | 1 |
| 27 | 1 | 1 |
| 28 | 0 | 0 |
| 29 | 1 | 0.75 |
| 30 | 1 | 0.75 |
| 31 | 1 | 0.75 |
| 32 | 1 | 0.75 |
| 33 | 1 | 1 |
| 34 | 1 | 0.5 |
| 35 | 1 | 0.5 |
| 36 | 0 | 0 |
| 37 | 1 | 0.5 |
| 38 | 1 | 0.75 |
| Total | 31 | 21.23333 |

Recall: 81.58%; Precision: 68.49%.

Appendix B

Testing Result by Image Processing

The following images are the complete test set, with potentially detected flagging tapes marked by red lines. In each image, the three cables on the left are covered with plastic weatherproofing and are not counted in the result statistics; the rightmost cable carries the flagging tape, which is the detection target.

[Test-set images with detection markings, pages 75–78]

TRITA-EECS-EX-2020:38
www.kth.se