DocuSign Envelope ID: 0386690B-F538-4312-8D49-C49D2DEA582B
DEGREE PROJECT IN ELECTRICAL ENGINEERING,
SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2020
Detecting Faulty Tape-around
Weatherproofing Cables by
Computer Vision
RUIWEN SUN
KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE
Master’s Program in Embedded Systems (120 credits)
Date: February 17, 2020
Supervisor: Yuan Yao
Examiner: Zhonghai Lu
School of Electrical Engineering and Computer Science (EECS)
Host company: Ericsson
Abstract
With the roll-out of 5G, more radio towers will be erected and consequently more cables will be installed. However, a large proportion of radio units are mounted high up in the open, which makes it difficult for human technicians to maintain the systems. Under these circumstances, automatic detection of faults around radio cabinets is crucial. Cables and connectors are usually covered with weatherproofing tape, and one of the most common problems is that the tape is not wound tightly around the cable or connector. The loose end then sticks out from the cable and looks like a waving flag, which may seriously damage the radio system. This thesis aims at detecting such flagging tape and addressing these issues.
The thesis experiments with two methods for object detection: a convolutional neural network, and image processing with OpenCV. The former uses the YOLO (You Only Look Once) network for training and testing, while in the latter the connected-component method is applied to detect large objects such as the cables, and a line segment detector extracts the flagging-tape boundaries. Multiple parameter configurations, structurally and functionally distinct, were evaluated to find the most suitable way to meet the requirements. Furthermore, precision and recall are used to evaluate the quality of the system output, and larger experiments with different parameters were performed in order to improve these metrics.
The results show that faulty weatherproofing is best detected with the image processing method, which reaches a recall of 71% and a precision of 60%. This method performs better than YOLO on flagging-tape detection. The method shows the great potential of this kind of object detection, and a detailed discussion of its limitations is also presented in the thesis.
Keywords
Object Detection, Image Processing, OpenCV, YOLO, Line Segment Detector
Swedish Abstract
Sammanfattning
More cables will be installed as more radio towers are erected for 5G. However, a large proportion of the radio units are mounted high up in the open, which makes it difficult for human technicians to maintain the systems. Under these circumstances, automatic detection of faults around radio cabinets is crucial. Cables and connectors are usually covered with weatherproofing tape, and one of the most common problems is that the tape is not wound tightly around the cables and connectors. This makes the tape come loose from the cable and look like a waving flag, which may seriously damage the radio systems. The thesis aims at detecting this flagging tape and addressing these issues.
This thesis experiments with two methods for object detection: the convolutional neural network, and OpenCV with image processing. The former uses the YOLO (You Only Look Once) network for training and testing, while in the latter method the connected-component method is used to detect large objects such as the cables, and the line segment detector is responsible for extracting the flagging-tape boundaries. Several parameter configurations, structurally and functionally distinct, were developed to find the most suitable way to meet the requirement. In addition, precision and recall are used to evaluate the quality of the system output, and larger experiments with different parameters were performed to improve these metrics.
The results show that the best way to detect faulty weatherproofing is the image processing method, with which recall is 71% and precision reaches 60%. This method performs better than YOLO on flagging-tape detection. The method shows the great potential of this kind of object detection, and a detailed discussion of the limitations is also presented in the thesis.
Keywords
Object Detection, Image Processing, OpenCV, YOLO, Line Segment Detector
Acknowledgements
I would like to express my deepest gratitude to Athanasios Karapantelakis, Yifei Jin, and my Ericsson supervisor, Maxim Teslenko, for the opportunity to carry out this project and for taking the time to assist me with my questions throughout the thesis process.
I would also like to thank my examiner Zhonghai Lu for accepting my project and for providing valuable feedback throughout the project and its documentation.
Finally, I would like to thank my family, my girlfriend, and my friends for their great support throughout my studies.
Stockholm, January 2020
Ruiwen Sun
Contents
1 Introduction
   1.1 Background
   1.2 Problem
   1.3 Purpose
   1.4 Goals
   1.5 Research Methodology
   1.6 Benefits, Ethics and Sustainability
   1.7 Delimitations
   1.8 Outline

2 Theoretical Background
   2.1 Object Detection Method
       2.1.1 R-CNN
       2.1.2 YOLO
   2.2 Custom Image Processing Function
       2.2.1 Greyscale and Binary Thresholding
       2.2.2 Erosion and Dilation
       2.2.3 Find Contours and Connected Component
   2.3 Line Segmentation Methods
       2.3.1 Hough Transform
       2.3.2 Line Segment Detector (LSD)
   2.4 Precision and Recall
   2.5 Related Work
       2.5.1 Small Object Detection
       2.5.2 Line Segmentation
       2.5.3 Connected Component
   2.6 Summary

3 Methods
   3.1 Research Methodology
       3.1.1 Qualitative and Quantitative Research Methods
       3.1.2 Philosophical Assumptions
       3.1.3 Research Methods
       3.1.4 Data Collection and Analysis Methods
       3.1.5 Quality Assurance
       3.1.6 Inference
   3.2 Software Environment
   3.3 Experimental Design
       3.3.1 Method of YOLO
       3.3.2 Method of OpenCV and Image Processing
   3.4 Summary

4 Design and Implementation
   4.1 Implementation on YOLO Network
       4.1.1 Data Labelling
       4.1.2 Network Configurations
       4.1.3 Training Dataset
   4.2 Implementation of OpenCV and Image Processing
       4.2.1 Image Initialisation
       4.2.2 Remove Blobs in the Image
       4.2.3 Calculate Cable Information
       4.2.4 Line Segment Detection
   4.3 Summary

5 Measurements and Result
   5.1 Object Detection of Faulty Weatherproofing
       5.1.1 Result
       5.1.2 Discussion
   5.2 Method Based on Computer Vision and Image Processing
       5.2.1 Testing Method and Testing Environment
       5.2.2 Testing Result Under Fixed Binary Threshold
       5.2.3 Testing Result Under Adjustable Binary Threshold
       5.2.4 Discussion
   5.3 Summary

6 Conclusion and Future Work
   6.1 Conclusion
   6.2 Limitations
   6.3 Future Work

Bibliography

Appendix A Table of Testing Result

Appendix B Testing Result by Image Processing
List of Figures
1.1 Thesis outline: six chapters
2.1 Phases of image identification
2.2 Bounding boxes with the predicted location
2.3 Framework of Darknet-53
2.4 Gradient and level-lines definition
2.5 ERF-YOLO performance comparison
2.6 Performance comparison of EDLines and LSD
3.1 Research methods and methodologies
3.2 Flowchart of the training of object detection model
3.3 The flowchart of the detection system
4.1 OpenLabeling tool interface
4.2 Sample blobs in the test set
4.3 Connectivity Introduction
4.4 Bounding box selection
4.5 Inner reflection sample
4.6 LSD sample
4.7 Positions of endpoints
4.8 Distance calculation
5.1 Sample test image
5.2 Result of the sample image
5.3 Binary threshold and precision and recall
5.4 Binary threshold and run time
5.5 Comparison between slider and fixed threshold
List of Tables
2.1 Comparison of backbones
2.2 Contingency table
3.1 Summary of Research Methodology
5.1 Testing Environment
5.2 Test result under different binary thresholds
A.1 Statistics of all the test set with the binary threshold of 30
A.2 Statistics of all the test set with the binary threshold of 40
A.3 Statistics of all the test set with the binary threshold of 50
A.4 Statistics of all the test set with the binary threshold of 60
A.5 Statistics with slider
List of Acronyms and Abbreviations
mAP: Mean average precision
YOLO: You only look once
MIMO: Multiple-input and multiple-output
5G: Fifth-generation wireless technology
AI: Artificial intelligence
OpenCV: Open computer vision library
FPN: Feature pyramid networks
RGB model: Red, green, and blue color model
GHT: Generalized Hough transform
IoU: Intersection over union
CNN: Convolutional neural network
R-CNN: Regions with CNN features
ROI: Region of interest
RPN: Region proposal networks
GPU: Graphics processing unit
XML: Extensible Markup Language
HOG: Histogram of Oriented Gradients
SVM: Support Vector Machines
LSM: Line Segment Merging
ED: Edge Drawing
VOC: Visual Object Classes
RF: Radio frequency
MNIST database: Modified National Institute of Standards and Technology database
Chapter 1
Introduction
Fifth-Generation (5G) mobile networks are known as high-speed networks, able to reach 10 Gbps theoretically [1]. This means that 5G can serve various use cases, including intelligent transport systems [2], [3], Internet connectivity for IoT applications [4], telemedicine [5], etc.
The radio site, a piece of telecommunication equipment consisting of a radio tower and a base cabinet, plays an important role [6]. Typically, the radio base station is set in the radio tower, covering a certain radio coverage area, and communicates with terminals such as mobile phones, wireless routers, etc. For example, when one makes a call by mobile phone, signals are sent and received by a nearby (usually the nearest) base station.
When it comes to 5G, radio towers are expected to be massive in number but smaller in physical size [7]. This is because 5G works at higher frequencies and therefore has a shorter transmission distance. Furthermore, a larger number of devices results in higher bandwidth requirements, so the stability and performance of the radio antenna are essential. To ensure this, massive Multiple-Input Multiple-Output (MIMO) is applied [8], which uses multiple transmitting and receiving antennas. As a result, more hardware needs to be connected.
In short, the radio site plays an even more important role in 5G networks, where the number of radio towers and hardware connections increases at the same time. This thesis concerns the working status of one kind of hardware: the cables. More precisely, this work focuses on detecting weatherproofing nuts (tape-around style) sticking to the cables on the radio tower and checking whether the weatherproofing is good or not. Automatic inspection is thus very meaningful for meeting the demand created by the increasing amount of this type of hardware.
1.1 Background
Radio site inspection is particularly expensive when involving equipment which is mounted on
radio towers (e.g. radio units and/or antennas). This is due to field service engineers having
to climb the tower to assess the condition of equipment, which is a dangerous and expensive
operation.
In order to address safety and cost concerns in radio site inspection, drones are used to assist field engineers with the inspection process. However, video footage from the drones still needs post-analysis by drone operators or field service engineers. This can result in a lengthy feedback loop: the video is submitted to expert engineers, gets assessed, and field service engineers revisit the site to climb back up the tower in case any issues are found.
1.2 Problem
During installation of a radio tower, cables are used to connect antennas and radio units. However, during long-term use of these cables and connectors, especially under natural weather conditions including rain and strong wind, the connections may not remain as strong as expected. As a consequence, site inspections of these pieces of equipment are essential. Nevertheless, such site inspections raise a variety of problems in terms of human and financial resources.
On the one hand, climbing up a radio tower to inspect it is extremely dangerous. Tower-structure-related fatalities in America [9], [10] show that a number of technicians have lost their lives, mostly due to falls. On the other hand, the weather affects radio frequency cable installations and inspection. Moisture is a great enemy of the cable: it can become trapped inside the antenna connectors, resulting in corrosion of the shields and conductor, which significantly degrades the system. Furthermore, the inspection video calls for analysis by an accompanying field engineer on the computer afterwards, which is time-consuming. These situations all affect the efficiency of the inspection.
Commonly, site inspection issues are of the following two types: bending radius and weatherproofing. In terms of bending radius, bending a cable too sharply puts stress on it, finally causing mechanical damage and/or signal loss or attenuation. In the case of weatherproofing connections, there are two types: one is plastic-type weatherproofing, the other is tape-around weatherproofing.
This master thesis looks at one specific issue that is common on a radio tower: the correct application of weatherproofing on the cables connecting radio units and antennas. In many developing countries, engineers do not use expensive plastic weatherproofing but instead opt for the application of weather-sealing tape on the cable connectors of the aforementioned equipment. This is also known as "tape-around" weatherproofing. However, there are many cases where the tape is either incorrectly applied or becomes loose or torn due to environmental causes.
This thesis deals with the problem of detecting whether a tape-around weatherproofing is in good or bad condition. Several different aspects are considered when providing the assessment.
One typical badly mounted case is that the end of the tape sticks out from the cable; in other words, it is not glued to the cable. These tapes look like waving flags on a flagpole, so they are called flagging tapes in the following paragraphs. But how can these situations be precisely detected, given that the shape of flagging tape varies significantly? More concretely, how can flagging tape be detected at the bottom or top of a tape-around shielded RF (radio frequency) cable?
1.3 Purpose
The purpose of this project is to reduce costs and mitigate safety concerns for engineers inspecting radio towers. Therefore, drone-based faulty weatherproofing detection is conducted to avoid tower inspection by human beings. This thesis shall present a method that remains feasible despite interference during detection. Concretely, the research focus is on morphological image transformation, but also on computer vision, and especially machine-learning-based object detection. Both of these sub-research areas and approaches are considered for the final thesis deliverable. The final approach is expected to detect faulty tape-around weatherproofing in most cases while minimising bad performance in all aspects.
1.4 Goals
The goal of this project is to build up a system for faulty weatherproofing detection. The main
goals and deliveries are stated as follows.
• The potential error detected in the image will be highlighted and shown to the operator.
• The AI-based object detection method should ideally be capable of detecting the weatherproofing in more than a certain number of cases, and it should produce as few false positives as possible. If the result does not meet the required performance, another method based on image processing will be considered.
• As far as various lighting and illumination scenarios are concerned, operators are able to change the values of the binary threshold and the erosion kernel size parameter in OpenCV respectively. Furthermore, the background may produce a huge amount of noise. This means that the system must be robust enough against illumination noise.
• The master thesis report is provided to illustrate the relevant literature, describe the solution taken, and evaluate the solution against the state of the art.
1.5 Research Methodology
The faulty weatherproofing detection system prototypes are based on theoretical and experimental work. The work was also discussed with experienced engineers in industry. Corresponding theoretical knowledge was obtained from databases including Google Scholar, IEEE Xplore, the KTH databases, DiVA, etc. The conclusions regarding this theoretical information are explained in the "Theoretical Background" chapter. It includes both brief and detailed information required for a basic understanding of the background of the thesis, and it introduces an overview of the issues. Furthermore, possible solutions are developed to address those problems, with specific focus on the requirements set by the company.
The priority requirement for the solution is relatively accurate detection with full functionality. Based on this, the precision and recall over the whole test set should be as high as possible, and performance should hold up even though the system's detections vary across different test pictures. However, the requirement does not specify target values for precision and recall, because background noise, illumination differences, and the variety of shooting angles all make a huge difference to the detection result. In contrast, time and power consumption are not the most crucial aspects of this system; rather, a reduction of false positives is critical. The potential solutions are therefore chosen primarily on the basis of these requirements.
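The precision and recall criteria referred to above can be made concrete with a small sketch. The function below is illustrative rather than the thesis's actual evaluation code; only the definition of the two metrics is taken from the text:

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from detection counts.

    tp: correctly detected flagging tapes (true positives)
    fp: detections that are not real flagging tapes (false positives)
    fn: flagging tapes the system missed (false negatives)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Example: 3 correct detections, 2 spurious ones, 1 missed tape.
p, r = precision_recall(tp=3, fp=2, fn=1)  # p = 0.6, r = 0.75
```

Note how the two metrics pull in different directions: suppressing false positives raises precision, while missing fewer tapes raises recall, which is why both are tracked over the whole test set.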
The different kinds of experiments are based on the corresponding theoretical knowledge. They are designed by empirical methods, which means that they use the observations and experiences of others [11]. The results of the experiments determine whether a method can be applied. Furthermore, other methods are introduced for comparison. A more detailed discussion of the various research methods of this thesis is given in Chapter 3.
1.6 Benefits, Ethics and Sustainability
With the wider application of 5G, more radio towers need to be constructed and made compatible with 5G. This creates high demand for radio tower operations including upgrades, cable mounting, etc. Since tower climbing is dangerous and costly in time and money, a better method is highly desirable.
The basic benefit of this thesis is to provide more sustainable solutions. The prototype will reduce the company's time, human resources, and financial costs, helping it increase profits and provide more sustainable solutions for the manufacturing industry.
This thesis does not contain any personal data; in other words, private data cannot be abused. In order to obtain higher credibility, the conclusions of this thesis are supported by repeatable theoretical and experimental results.
The theoretical background is informative, and citations are used carefully to make it comprehensive. This ensures credibility and lets readers learn clearly about previous research. The results and discussion are based on the experimental and theoretical background, which ensures that they are transparent and trustworthy.
1.7 Delimitations
The thesis is limited by technical constraints and test requirements. Within the technical limitations, it is hard to detect the exact shape of black tape surrounded by a dark background. Furthermore, the brightness of the image not only influences the reflection and specularity of the cable but also affects the binary threshold of the image. Therefore, sliders for the binary threshold and the erosion kernel are provided to cope with varying brightness and illumination environments.
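The two adjustable parameters mentioned above, the binary threshold and the erosion kernel size, can be illustrated with a minimal NumPy sketch that mirrors the semantics of OpenCV's `cv2.threshold` (with `THRESH_BINARY`) and `cv2.erode` with a rectangular kernel. The function names and the toy image are invented for illustration and are not the thesis code:

```python
import numpy as np

def binary_threshold(gray, thresh, maxval=255):
    # Pixels strictly above `thresh` become maxval, the rest 0,
    # matching cv2.threshold(..., cv2.THRESH_BINARY) semantics.
    return np.where(gray > thresh, maxval, 0).astype(np.uint8)

def erode(binary, ksize=3):
    # Morphological erosion with a ksize x ksize rectangular kernel:
    # a pixel stays white only if every pixel under the kernel is white.
    pad = ksize // 2
    padded = np.pad(binary, pad, mode="constant", constant_values=0)
    out = np.zeros_like(binary)
    h, w = binary.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = padded[y:y + ksize, x:x + ksize].min()
    return out

# A bright 3x3 blob on a dark background shrinks to its centre pixel.
img = np.zeros((5, 5), dtype=np.uint8)
img[1:4, 1:4] = 200
mask = binary_threshold(img, 30)   # slider-controlled threshold
eroded = erode(mask, 3)            # slider-controlled kernel size
```

Raising the threshold suppresses dim background clutter at the risk of losing dark tape, while a larger erosion kernel removes more small noise blobs; this trade-off is exactly why the operator is given sliders.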
In terms of test requirements, a test set was put together by experienced developers from the company, which includes a number of tape-around cables in various scenes and weather conditions: metropolitan areas, city night scenes, open country, etc. However, due to business administration and safety issues, it was not possible to climb a real radio tower to collect data, so there will be differences between the real site and the test set. As a consequence, the system will primarily target the simulated environment.
1.8 Outline
The thesis consists of six chapters, as shown in figure 1.1. In this section, the outline of these chapters is discussed briefly so that readers can easily follow the structure of the whole thesis.
Figure 1.1: Thesis outline: six chapters (Introduction → Theoretical Background → Methods → Implementation → Results → Conclusion)
Chapter 1 introduces the thesis by presenting an overall perspective, including the research topics and questions.
Chapter 2 explains the two methods to be implemented in the program, the object detection method and the OpenCV image processing method, and how they can be applied in the program.
Chapter 3 introduces the methodology and tools used during the thesis. This chapter briefly explains how the system works. The risks of the object detection method are also discussed. Moreover, this chapter leads to the next, the implementation chapter.
Chapter 4 presents how these two methods are implemented in detail. It breaks the processes in the system down into steps and functions. Furthermore, the main algorithms are also explained in this chapter.
Chapter 5 illustrates the results of the two methods and shows why object detection with YOLO fails in this system. Consequently, the system focuses on the image processing method. For the OpenCV implementation, the result is analysed quantitatively, focusing on precision and recall.
Chapter 6 concludes the whole system and its performance, and suggests future work.
Chapter 2
Theoretical Background
Image identification refers to the technique of using a computer to process and analyse images in order to identify targets and objects of various kinds. Based on the observed images, the image processing method distinguishes the objects in the image by category and makes reasonable judgments. In this way, the algorithm uses modern information processing and computing techniques to simulate and implement human perception and recognition processes. In general, an image identification system is mainly composed of three parts: image segmentation, image feature extraction, and classification, as shown in figure 2.1.
Figure 2.1: Phases of image identification (pre-processing of the image → image segmentation → feature extraction → judgement and matching → result delivery)
Image segmentation divides the image into a number of meaningful regions, then the features of each region are extracted, and finally the classifier classifies the images according to the extracted image features. In fact, there is no strict boundary between image identification and image segmentation; in a sense, the process of image segmentation is part of the process of image identification. Image segmentation focuses on the relationship between objects and backgrounds, concentrating on the overall properties of objects in a particular context, while image identification focuses on the properties of the objects themselves. Image segmentation and identification technologies are widely used in aerospace, medical, communications, industrial automation,
robotics, and military fields.
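As a toy illustration of the segmentation, feature extraction, and classification chain described above, the 1-D sketch below segments a signal into above-threshold runs, extracts a single feature per region (its length), and classifies each region. All names and the classification rule are invented for illustration, not taken from the thesis:

```python
def segment(signal, thresh):
    # Image segmentation (1-D analogue): split the signal into runs
    # of consecutive samples above the threshold.
    regions, run = [], []
    for v in signal:
        if v > thresh:
            run.append(v)
        elif run:
            regions.append(run)
            run = []
    if run:
        regions.append(run)
    return regions

def extract(region):
    # Feature extraction: one feature per region, here its length.
    return len(region)

def classify(length):
    # Judgement and matching: toy rule separating objects from noise.
    return "object" if length >= 3 else "noise"

# Full pipeline over a toy signal: segment -> extract -> classify.
labels = [classify(extract(r)) for r in segment([0, 5, 6, 7, 0, 9, 0], 1)]
```

The same three-stage shape carries over to 2-D images, where segmentation yields pixel regions and the feature vector and classifier are correspondingly richer.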
This chapter offers theoretical background information. As this project is attempted with two basic methods, the chapter introduces the main parts of the technical background. Since Convolutional Neural Networks (CNNs) have been widely applied in various fields including image recognition [12], the first part illustrates the deep learning methods attempted in the system. The second part explains and compares line detection methods. Last but not least, other image processing methods are also covered.
2.1 Object Detection Method
Compared with image classification, object detection in images is a more complicated problem in computer vision, because image classification only needs to judge which category an image belongs to. In object detection, there may be multiple objects in the image, and both class discrimination and position determination are required for all of them. In this way, object detection is more challenging than image classification, and the deep learning models applied to object detection are correspondingly more complicated.
Recent years have witnessed rapid development of object detection methods. Most popular methods can be classified into two types. The former is based on region proposals and includes R-CNN (Region Convolutional Neural Network) and its derived methods, while the latter is based on a single CNN network, like YOLO (You Only Look Once). This section explains and compares these two types of deep-learning-based object detection methods.
2.1.1 R-CNN
In 2014, the R-CNN algorithm was proposed [13], which laid the foundation for the two-stage approach in the field of object detection. The first stage generates region proposals by means of selective search, while the second applies the best recognition network at the time, AlexNet [14], to classify the object in each region.
The R-CNN method works in the following steps. First of all, the original image is taken as input. Secondly, the selective search algorithm is used to evaluate the similarity between adjacent image regions; similar regions are then merged, and the merged blocks are scored. In this way, the candidate frames of the regions of interest, the sub-graphs, can be selected. This step requires approximately 2,000 sub-graphs to be selected. After that, a convolutional neural network is applied separately to each sub-graph, performing convolution-ReLU (rectified linear unit)-pooling and full connection to extract features. This step essentially covers the object recognition part. Last but not least, the extracted features are classified into objects, and the blocks with high classification confidence are treated as the final object positioning blocks.
R-CNN achieves a 50% performance improvement over traditional object detection algorithms. In the case of using the VGG-16 model [15] as the object recognition model, 66% accuracy can be achieved on the VOC2007 dataset [16], which is a relatively good result. However, the biggest problem is that the speed is very slow and the memory usage is very large. There are two main reasons.
• The candidate bounding box is completed by the traditional selective search algorithm,
which is slow.
• For all of the 2,000 candidate frames, object recognition is required. In other words, 2,000 convolutional network computations are needed, which is a gigantic amount of calculation.
Fast R-CNN
Aiming at addressing the weaknesses of R-CNN, the Fast R-CNN method was proposed [17]. Fast R-CNN mainly optimizes two issues: image rescaling and speed.
• ROI (Region of Interest) Pooling infrastructure is introduced. It solves the problem that the candidate frame sub-graphs must be cropped and scaled to the same size. Since the input image size of the CNN network must be fixed (otherwise the fully connected layers cannot be computed), the candidate frames of different sizes and shapes in R-CNN are cropped and scaled so that they reach the same size. This operation is a waste of time; moreover, it can easily lead to loss and deformation of image information. Fast R-CNN inserts the ROI Pooling Layer before the fully connected layer so that the image does not need to be cropped, which solves this problem.
• The multi-task loss function idea is proposed: the classification loss and the bounding-box regression loss are combined into a unified training objective, and finally the corresponding classification and frame coordinates are output.
Faster R-CNN
Both R-CNN and Fast R-CNN share a similar problem: candidate frames are generated by selective search, which is a very slow algorithm. Moreover, all of the 2,000 candidate frames generated in R-CNN are required to be fed into a convolutional neural network, which makes the calculation very time-consuming. This causes the slow detection speed of these two algorithms. To address this problem, the RPN (Region Proposal Network) is proposed to obtain candidate frames, thus eliminating the selective search algorithm and requiring only one convolutional pass, which greatly improves the recognition speed. Faster R-CNN is mainly based on four steps.
• Convolution layer. The original image is first fed into a Convolution-ReLU-Pooling multi-layer convolutional neural network to extract feature maps for the subsequent region proposal network and fully connected layers. Different from R-CNN, Faster R-CNN only needs to extract features from the entire image once, which greatly reduces the computation time.
• RPN layer. The RPN layer is implemented to generate candidate frames and applies softmax [18] to determine whether a candidate frame is foreground or background. RPN selects the foreground candidate frames, which usually contain targets, and uses bounding-box regression to adjust the position of the candidate frame to obtain the feature subgraph.
• ROI layer. This is similar to the corresponding layer in Fast R-CNN. It pools the proposals of different sizes into the same size, then feeds them into the successive fully connected layers for object classification.
• Classification layer. This layer outputs the result, including the class of each object as well as its precise location.
In general, from R-CNN to Fast R-CNN to Faster R-CNN, the pipeline of deep-learning-based target detection has grown increasingly streamlined, with higher precision and faster speed, which meets recent industry and research demands.
2.1.2 YOLO
YOLO [19], You Only Look Once, is an object detection network based on a single CNN. Typically, target detection consists of two tasks, i.e. identifying the locations of the objects in the image, and classifying those objects. Previously, R-CNN and its derivative methods used a multi-step pipeline to complete the detection of objects. This results in slow operation and is difficult to optimize, because each module must be trained separately. However, YOLO does all of this processing in a single neural network. In other words, object detection is reconstructed as a single regression problem that obtains the bounding-box coordinates and classification probabilities directly from the image pixels [19]. So, in a nutshell, an image is taken as input and passed to a neural network similar to a normal CNN, which outputs a vector of bounding boxes and class predictions.
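As a rough numeric illustration of this single-vector output, the length of YOLOv1's output per image follows from the grid and box settings reported in [19] (S = 7, B = 2, C = 20 for PASCAL VOC). The helper below is our own sketch, not code from the paper:

```python
def yolo_output_size(S=7, B=2, C=20):
    """Length of YOLOv1's output vector: an S x S grid where each cell
    predicts B boxes (x, y, w, h, confidence) and C class probabilities."""
    return S * S * (B * 5 + C)

print(yolo_output_size())  # 7 x 7 x 30 = 1470 for the PASCAL VOC settings
```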
The following section introduces the progress of later versions of YOLO.
YOLOv2 and YOLO9000
Based on YOLOv1 and after improvements by Joseph Redmon, the YOLO9000 and YOLOv2 algorithms were proposed at CVPR 2017 [20], focusing on the shortcomings of YOLOv1 in recall rate and positioning accuracy. At the time of publication, YOLOv2 was faster than other detection systems on multiple detection datasets and offered a trade-off between speed and accuracy.
The article proposes a new training method, the joint training algorithm, which can mix two datasets together. Objects are categorized using a hierarchical view, and a large classification dataset is applied to augment the detection dataset, so that two different datasets are mixed. The prime idea is to train the object detector on both a detection dataset and a classification dataset: the detection data is used to learn the exact positions of objects, while the classification data is used to increase the number of categories the detector can recognize. YOLO9000 is trained with this joint training algorithm. It has 9,000 categories of classification information learned from the ImageNet classification dataset [14], while object position detection is learned from the COCO detection dataset [21], [22].
Compared with the former version, YOLOv2 concentrates on improving recall and positioning accuracy while maintaining the accuracy of the classification. Several methods from [20] that are implemented in YOLOv2 to improve its performance are introduced below.
• Batch normalization [23]. This approach optimizes the CNN network. Specifically, it speeds up the convergence of the network while eliminating the dependence on other forms of regularization. By applying batch normalization to every convolutional layer in YOLO, mAP (mean Average Precision) is eventually increased by 2%. Furthermore, the model achieves regularization: with batch normalization, dropout can be removed from the model without causing overfitting [20].
• Use a high-resolution classifier. YOLOv1 is pretrained on images at 224×224 resolution [19], while the newer version trains the model at 448×448 resolution. In correspondence with the new resolution, the parameters pretrained on ImageNet [14] are also fine-tuned, which results in a 4% improvement in mAP.
• Convolution with anchor boxes. YOLOv1 includes a fully connected layer that directly predicts the coordinates of bounding boxes, while the Faster R-CNN method uses only convolutional layers and the region proposal network to predict anchor-box offsets and confidences, rather than predicting the coordinates directly. The authors of YOLO found that by predicting offsets rather than coordinate values, the problem can be simplified so that the neural network learns more easily. In other words, if hand-picked prior bounding boxes [24], which are more accurately dimensioned bounding boxes, are supplied to the system, the convolutional neural network predicts the locations more easily.
• YOLOv2 removes the fully connected layer and uses anchor boxes to predict bounding boxes. At the same time, a pooling layer in the network is removed, which allows the output of the convolutional layers to have a higher resolution, and the network is shrunk to run at 416×416 instead of 448×448. Since the objects in a picture, especially the larger ones, tend to appear in the centre of the picture, it helps to have a single location at the centre of the feature map to predict these objects. YOLO's convolutional layers down-sample the picture by a factor of 32, so by selecting 416×416 as the input size, the network finally outputs a 13×13 feature map. Using anchor boxes reduces the accuracy slightly, but according to [20], it allows YOLOv2 to predict more than a thousand boxes, with a recall of 88% and an mAP of 69.2%.
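The 416-to-13 relationship in the bullet above follows directly from the total down-sampling factor of 32; a one-line sketch (the function name is ours):

```python
def feature_map_size(input_size=416, stride=32):
    """Spatial side length of the final feature map after the network
    down-samples the input by a total stride (32 in YOLOv2)."""
    return input_size // stride

print(feature_map_size(416))  # 13, an odd size with a single centre cell
print(feature_map_size(448))  # 14, which has no single centre cell
```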
Following the approach of the previous YOLO version, the network does not predict raw offsets as in Faster R-CNN; instead it predicts the coordinates relative to the position of the grid cell, which constrains the ground-truth values to lie between 0 and 1. To make the network outputs fall within this range, a sigmoid function is used to limit the network predictions to values between 0 and 1. The network predicts five bounding boxes in each grid cell. Each bounding box has five coordinate values: tx, ty, tw, th, to. Their relationship is shown in figure 2.2. Suppose the offset of a grid cell from the upper left corner of the image is cx, cy, and the width and height of the bounding-box prior are pw, ph. Then the predicted result is given below in equations 2.1 to 2.5, where IoU means intersection over union. The IoU score is a standard performance metric for object segmentation problems: given a set of images, the IoU measurement gives the similarity between the predicted area and the ground-truth area of the objects present in the images.
Figure 2.2: Bounding boxes with the predicted location (the box centre (bx, by) is offset from the grid-cell corner (cx, cy) by σ(tx) and σ(ty), and the box size (bw, bh) rescales the prior size (pw, ph))
bx = σ(tx) + cx    (2.1)

by = σ(ty) + cy    (2.2)

bw = pw · e^(tw)    (2.3)

bh = ph · e^(th)    (2.4)

Pr(object) × IoU(b, object) = σ(to)    (2.5)
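Equations 2.1 to 2.4 can be checked with a short sketch; an axis-aligned IoU, as used in equation 2.5, is included for boxes given as (x1, y1, x2, y2). The function names are our own illustration, not code from [20]:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Decode raw YOLOv2 predictions into a box (eq. 2.1-2.4): the centre
    is offset from the grid-cell corner (cx, cy) by sigmoid(tx), sigmoid(ty),
    and the size rescales the prior (pw, ph)."""
    bx = sigmoid(tx) + cx
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)
    bh = ph * math.exp(th)
    return bx, by, bw, bh

def iou(a, b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))   # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))   # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```

With all-zero raw predictions, decode_box(0, 0, 0, 0, cx, cy, pw, ph) returns a box centred half a cell past (cx, cy) with exactly the prior's size, since σ(0) = 0.5 and e^0 = 1.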
Compared to YOLOv1, YOLOv2 is faster and stronger. YOLO uses an architecture based on GoogLeNet [25], which is faster than VGG-16 [26]: YOLO uses only 8.52 billion operations for a forward pass, while VGG-16 requires 30.69 billion.
YOLOv2 is based on a new classification model, Darknet-19 [27]. YOLOv2 uses 3×3 filters and doubles the number of channels after each pooling step, and it uses global average pooling. By means of batch normalization, training becomes more stable, convergence is accelerated, and the model is regularized. The final model, Darknet-19, has 19 convolutional layers and 5 max-pooling layers. It takes only 5.58 billion operations to process a single image, and reaches 72.9% top-1 accuracy and 91.2% top-5 accuracy on ImageNet. During training, after 10 epochs of fine-tuning at the larger 448×448 resolution with the learning rate initialized to 0.001, the network achieves 76.5% top-1 accuracy and 93.3% top-5 accuracy.
YOLOv3
For a long period of time, there has been an open problem in the field of computer vision: how to detect two similar targets, or targets of different types, that are close or adjacent to each other. Most algorithms scale the input image data to a smaller resolution, and generally only one bounding box is produced in this case, because the detection method treats the group of objects as a single whole, although in reality there are two identical or different objects.
There are many new algorithms for small target detection, like [28], [29]; however, YOLOv3 achieves relatively good performance. Compared with SSD [30], YOLOv1 and v2 both perform worse, whilst YOLOv3 shows performance superior to the former versions. The following part illustrates the changes and improvements of YOLOv3.
• Multi-label classification prediction. Since YOLO9000, the system uses dimension clusters as anchor boxes to predict the bounding boxes, and the network predicts 4 coordinates for each bounding box [20]. In YOLOv3, logistic regression is additionally used to predict an objectness score for each bounding box [31]. The score should be 1 if the bounding-box prior overlaps the ground-truth object more than any other prior does. When a prior is not the best one but overlaps the ground-truth object by more than a certain threshold, its prediction is ignored. Unlike YOLOv2, the system assigns only one bounding-box prior to each ground-truth object. If a prior is not assigned to a ground-truth object, it incurs no loss for coordinate or class predictions.
• Each bounding box uses multi-label classification to predict which classes the bounding box might contain. The algorithm does not use softmax, since it is not necessary for good performance; instead, YOLOv3 uses independent logistic classifiers. During training, binary cross-entropy loss is applied for the class predictions. In this way, for overlapping labels, the multi-label approach models the data better.
• Cross-scale prediction is applied in YOLOv3, which uses multiple scales to make predictions. The original YOLOv2 has a layer called the passthrough layer. Assuming the size of the finally extracted feature map is 13×13, the effect of this layer is to concatenate the 26×26 feature map of the previous layer with the 13×13 map of this layer, similar to ResNet [32], [33]. This operation also enhances the accuracy of the YOLO algorithm for small target detection. The idea was further refined in YOLOv3 by means of Feature Pyramid Network (FPN)-like upsampling and fusion practices [34]. Eventually, YOLOv3 combines 3 scales; the other two scales are 26×26 and 52×52 respectively. After testing on multiple-scale feature maps, the improvement in detection performance for small targets is still relatively obvious. Although each grid cell predicts 3 bounding boxes in YOLOv3, while each grid cell in YOLOv2 predicts 5, the total number of bounding boxes in YOLOv3 is larger, because YOLOv3 blends features at multiple scales.
• Framework changes. YOLOv3 uses a new network for feature extraction. It is a hybrid network derived from Darknet-19, using successive 3×3 and 1×1 convolutional layers, but now with some shortcut connections; YOLOv3 expands it to 53 layers and calls it Darknet-53 [31]. Figure 2.3 illustrates the framework of Darknet-53.
Figure 2.3: Framework of Darknet-53 (the diagram shows the Darknet-53 backbone of Conv2D and residual blocks down-sampling a 416×416×3 input to a 13×13×1024 feature map, together with three detection branches that, via Conv2D, upsampling and concatenation, produce 75-channel output maps at 13×13, 26×26 and 52×52)
Compared with Darknet-19, this network is more powerful. Moreover, Darknet-53 is also
more effective than ResNet-101 and ResNet-152 [32].
Each network is trained with the same settings and tested at 256×256 in single precision. The run time is measured on a Titan X at 256×256.

Table 2.1: Comparison of backbones.

Backbone      Top-1   Top-5   Bn Ops   BFLOP/s   FPS
Darknet-19    74.1    91.8     7.29     1246     171
ResNet-101    77.1    93.7    19.7      1039      53
ResNet-152    77.6    93.8    29.4      1090      37
Darknet-53    77.2    93.8    18.7      1457      78

Therefore, Darknet-53 is comparable to state-of-the-art classifiers, but with fewer floating-point operations and faster speed. Darknet-53 is better than ResNet-101 and is 1.5 times faster.
Darknet-53 has similar performance to ResNet-152 and is up to 2 times faster. Darknet-53 also achieves the highest rate of floating-point operations per second, which means that the network structure makes better use of the GPU, making it more efficient and faster to evaluate.
The YOLO detection algorithm achieves both high detection speed and high detection accuracy. The algorithm not only works well on real entities but also generalizes to other kinds of imagery, such as works of art. Compared with other algorithms, the YOLO algorithm is more in line with the industry's real-time requirements for target detection. It is simple and easy to implement and is very friendly to embedded systems. The YOLO series continuously absorbs the advantages of other target detection algorithms, applies them to itself, and continues to improve; it is a growing family of algorithms.
In conclusion, YOLOv3 is by far one of the most balanced object detection networks in terms of speed and accuracy. Through the integration of a variety of advanced methods, the weak points of the YOLO series are addressed, including the speed issues and the precision of small object detection. In other words, YOLOv3 finally achieves impressive object detection quality together with high detection speed. Reviewing the framework of the YOLO series, a great performance improvement can be witnessed with each upgrade of the YOLO versions.
2.2 Custom Image Processing Function
The traditional computer vision pipeline is to obtain the image, preprocess it, extract hand-crafted features, and finally classify. Most of the research has focused on the construction and classification of hand-crafted features, and many outstanding works have emerged. However, the problem is that manually designed features may not be broadly applicable; in other words, their generalization ability is weak. One type of feature may suit a certain type of problem well but perform much worse on others. Nevertheless, for this specific topic of detecting faulty tape-around weatherproofing, a very particular use case, custom image processing can be implemented.
The following sections deliver a brief introduction to several popular image processing methods.
2.2.1 Greyscale and Binary Thresholding
A colour image consists of three channels, red, green and blue, known as RGB. The logarithmic relationship between white and black is divided into several levels, called greyscale. The greyscale is divided into 256 steps, where white is 255 and black is 0. There are several approaches to converting an RGB image to a greyscale image, but the most common one is the weighted average method. Because human vision is most sensitive to green and least sensitive to blue, the weighting parameters are chosen accordingly. According to [35], the following equation 2.6 is defined:

Y = 0.299 × R + 0.587 × G + 0.114 × B    (2.6)
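Equation 2.6 can be applied per pixel; a minimal sketch (the function name is ours):

```python
def rgb_to_grey(r, g, b):
    """Weighted-average greyscale value of one pixel (eq. 2.6);
    green dominates because human vision is most sensitive to it."""
    return 0.299 * r + 0.587 * g + 0.114 * b
```

Since the three weights sum to 1, a pure-white pixel (255, 255, 255) maps to 255, up to floating-point rounding.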
In a digital image, the histogram counts the number of pixels at each grey value, and these counts are displayed as a graph.
Image binary thresholding is the process of setting the grey value of each pixel in an image to 0 (pure black) or 255 (pure white) by setting a threshold, which renders the entire image in a distinct black-and-white effect.

A greyscale image with 256 brightness levels is, by choosing an appropriate threshold, converted to a binarized image that still reflects the overall and local features of the image. In digital image processing, the binary image plays a very important role. Binarization facilitates further processing and simplifies the image: the amount of data is reduced, and the target of interest and its contours can be highlighted.

If a particular object has a uniform greyscale value inside and lies in a uniform background with a different grey level, the thresholding method can be used to obtain a clean segmentation. If the difference between the object and the background is not in the grey value but in other features such as texture, these features can first be converted into a greyscale difference, after which the threshold selection technique is used to segment the image. Dynamically adjusting the threshold makes it possible to observe the specific results of the segmentation interactively. In this use case, the varying brightness of different images causes different greyscale values on the cable area, which means a fixed binary threshold may filter the cables into different shapes in another image.
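The fixed-threshold binarization described above can be sketched with NumPy; the function name and the example threshold are our choices, and the behaviour mirrors OpenCV's cv2.threshold with the THRESH_BINARY flag:

```python
import numpy as np

def binary_threshold(grey, thresh=128):
    """Set every pixel above `thresh` to 255 (white) and the rest
    to 0 (black), producing a distinct black-and-white image."""
    return np.where(grey > thresh, 255, 0).astype(np.uint8)

grey = np.array([[10, 200], [130, 90]], dtype=np.uint8)
print(binary_threshold(grey))  # only the two pixels above 128 become white
```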
2.2.2 Erosion and Dilation
As the terms suggest, erosion and dilation erode and dilate areas in the image. They are both morphological operations. To be specific, the operations act on the white area, because black means 0 in the image and therefore makes no difference to the operation.
These morphological operations take a structuring element as input, and the output is based on shapes.

In terms of dilation, the image is convolved with a kernel of arbitrary shape, typically square or circular. The kernel slides over the image, and within the covered image area the maximum pixel value is extracted to replace the pixel at the anchor position. Obviously, this maximization operation causes the bright areas in the image to "expand". In this way, dilation of the image is implemented: the white area expands while the black part erodes.

By contrast, erosion is the opposite operation. The difference is that when the kernel slides over the whole image, the minimum of the covered pixel values is selected to replace the anchor-position pixel. Therefore, the white proportion becomes smaller, whilst the black area expands.
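The sliding-window descriptions above can be sketched naively for a binary image, assuming a square kernel; OpenCV's erode and dilate are the optimized equivalents of this illustration:

```python
import numpy as np

def morph(img, ksize=3, op="erode"):
    """Naive morphology on a binary image: slide a ksize x ksize window
    and take the minimum (erosion) or maximum (dilation) under it."""
    pad = ksize // 2
    padded = np.pad(img, pad, mode="edge")   # replicate the border pixels
    reduce_fn = np.min if op == "erode" else np.max
    out = np.empty_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = reduce_fn(padded[y:y + ksize, x:x + ksize])
    return out

dot = np.zeros((5, 5), dtype=np.uint8)
dot[2, 2] = 255                               # a single white pixel
print(morph(dot, op="dilate").sum())          # grows to a 3x3 white block
print(morph(dot, op="erode").sum())           # the lone pixel is removed
```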
2.2.3 Find Contours and Connected Component
Find Contours
Contour lines are defined as curves joining all the points along an object boundary that have the same intensity. Contours are very convenient in shape analysis, in finding the size of the target object, and in detecting the object [36].

Find Contours [37], as its name indicates, is the method which retrieves contours from a binary image. OpenCV provides a find-contour function that helps extract outlines from images. It works best on binary images, which means a thresholding measure or an edge detector such as Sobel should be applied first.
Connected Component
Connected component is a term from graph theory: a connected component is a subgraph of an undirected graph [38] in which any two vertices are connected to each other by paths, and which is connected to no additional vertices in the rest of the graph. Based on this notion, every connected component can be labelled with a unique number. In this way, the objects in an image can be located and counted. This algorithm also plays an important role when it is integrated into an image recognition system or a human-computer interaction system [39], [40].

In the OpenCV library, compared with findContours, connectedComponents and connectedComponentsWithStats are newly defined in OpenCV 3. Furthermore, connectedComponentsWithStats is able to return statistics for each labelled connected component.
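A minimal 4-connectivity labelling in the spirit of connectedComponents can be sketched with a breadth-first flood fill; this is our illustration, not OpenCV's implementation:

```python
from collections import deque
import numpy as np

def label_components(binary):
    """Label 4-connected foreground regions of a binary image with
    unique numbers 1..n; returns the label map and the region count."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=np.int32)
    count = 0
    for sy in range(h):
        for sx in range(w):
            if binary[sy, sx] and labels[sy, sx] == 0:
                count += 1                      # found a new component
                labels[sy, sx] = count
                queue = deque([(sy, sx)])
                while queue:                    # flood-fill the component
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and binary[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = count
                            queue.append((ny, nx))
    return labels, count

img = np.array([[1, 1, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 1]], dtype=np.uint8)
labels, n = label_components(img)
print(n)  # 2: the top-left pair and the bottom-right blob
```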
2.3 Line Segmentation Methods

2.3.1 Hough Transform
The Hough transform in its widely used form was introduced in 1972 by Richard Duda and Peter Hart [41]. In 1981, the method was popularized by the article [42], which describes a modification of the Hough transform based on the template matching principle. It is important to know that the Hough transform was originally developed to analyze analytically defined shapes such as lines, circles and ellipses. By modelling the shape and aiming to find its position and orientation in the image, this modification enables the Hough transform to detect any object described by a model: after applying the generalized Hough transform (GHT), the problem of finding the position of the model is transformed into the problem of finding the transformation parameters which map the model to the image. Given the values of the transformation parameters, the position of the model in the image can be determined.
The Hough transform is usually performed after edge detection. Usually, lines are expressed as y = mx + b. The basic idea of collinearity detection is that a number of points lie on the same line; as long as there are two points, one straight line is determined. Hence, the problem can be transformed into finding all (m, b) combinations. Set up two coordinate systems, one representing the (x, y) values and the other representing the (m, b) values, the parameters of the straight line. Then a point (x, y) in image space corresponds to a line in parameter space, and a straight line in image space corresponds to a point in parameter space. In this way, an intersection point in parameter space indicates that multiple image points pass through the line defined by (m, b). However, this method has a problem: the value range of (m, b) is unbounded.
In order to solve the problem that the value range of (m, b) is unbounded, the normal form x cos θ + y sin θ = ρ is used instead of the slope-intercept form to represent straight lines. In this way, a pixel in image space becomes a curve (a sinusoidal curve) in parameter space.
The Hough line algorithm is expressed as follows:
• Initialize the (θ, ρ) space with N (θ, ρ) = 0, where N (θ, ρ) represents the number of pixels on the line represented by these parameters.
• For each edge pixel (x, y), find the (θ, ρ) coordinates satisfying x cos θ + y sin θ = ρ in the parameter space, and increment N (θ, ρ) by 1.
• Scan all the values of N (θ, ρ), and take out the parameters for which N (θ, ρ) exceeds a preset threshold.
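The three steps above can be sketched directly as a vote accumulator over a discretized (θ, ρ) space; the discretization and parameter names are our choices:

```python
import math
import numpy as np

def hough_lines(points, shape, n_theta=180, threshold=2):
    """Vote in (theta, rho) space for each edge pixel (x, y) and return
    the (theta, rho) parameters whose vote count exceeds the threshold."""
    h, w = shape
    max_rho = int(math.hypot(h, w))              # largest possible |rho|
    acc = np.zeros((n_theta, 2 * max_rho + 1), dtype=int)
    thetas = np.linspace(0.0, math.pi, n_theta, endpoint=False)
    for x, y in points:
        for ti, theta in enumerate(thetas):
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[ti, rho + max_rho] += 1          # shift so negative rho fits
    return [(thetas[ti], r - max_rho)
            for ti, r in zip(*np.nonzero(acc > threshold))]

# Four collinear pixels on the horizontal line y = 2:
peaks = hough_lines([(0, 2), (1, 2), (2, 2), (3, 2)], shape=(5, 5), threshold=3)
print(any(abs(t - math.pi / 2) < 1e-9 and r == 2 for t, r in peaks))  # True
```

OpenCV's HoughLines implements the same accumulator idea with a configurable ρ and θ resolution.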
2.3.2 Line Segment Detector (LSD)
The core of LSD is pixel merging with error control. LSD is known as a line segment detection algorithm that achieves subpixel accuracy in linear time [43]. Although LSD claims not to require manual parameter tuning, in actual cases the sampling rate as well as the tolerated difference in direction between two pixels can be set. Detecting a line in an image essentially means looking for pixels with large gradient changes. The goal of LSD is to detect local straight contours in the image; that is the reason why it is called a line segment detector. Contours are special areas of the image in which the greyscale varies significantly from black to white or from white to black. On this basis, the gradient and level-line are defined as shown in figure 2.4.
Figure 2.4: Gradient and level-lines definition (the level-line at a pixel is perpendicular to its gradient)
The algorithm first calculates the level-line angle of each pixel to form a level-line field. The field is divided into several connected parts whose directions are approximately the same within a tolerance τ, so that a series of regions can be obtained. These regions are called line support regions.

Each line support region is a set of pixels and also a candidate for a line segment. For each line support region, its minimum circumscribed rectangle is considered. Intuitively, when a group of pixels is particularly elongated, the set of pixels is more likely to be a straight line segment. The main inertia axis of the line support region is treated as the main direction of the rectangle, and the size of the rectangle is selected to cover the entire region.

A pixel in the rectangle whose level-line angle differs from the main direction of the minimum circumscribed rectangle by no more than the tolerance τ is called an "aligned point". The total number of pixels in the minimum circumscribed rectangle and the number of aligned points inside it are then counted, and these two statistics are used to determine whether the line support region is a straight line segment. The criterion for the decision is an a contrario approach based on the Helmholtz principle [44].
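The aligned-point test can be sketched as follows: given the level-line angles of the pixels inside a candidate rectangle, count the fraction that lies within the tolerance τ of the rectangle's main direction. The function is our illustration; angles are compared modulo π because level-lines are undirected:

```python
import math

def aligned_fraction(angles, main_dir, tau):
    """Fraction of pixels whose level-line angle is within tau of the
    rectangle's main direction (LSD's aligned-point statistic)."""
    def angle_diff(a, b):
        d = abs(a - b) % math.pi          # level-lines are undirected
        return min(d, math.pi - d)
    aligned = sum(1 for a in angles if angle_diff(a, main_dir) <= tau)
    return aligned / len(angles)

# Two of three pixels agree with the main direction within tau = 0.2 rad:
print(aligned_fraction([0.0, 0.1, 1.5], main_dir=0.0, tau=0.2))
```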
2.4 Precision and Recall
Recall is a measure of the ability of a retrieval system to detect relevant information, and precision is a measure of its ability to reject non-relevant information. Therefore, precision equals the number of correct items extracted divided by the total number of items extracted, whilst recall equals the number of correct items extracted divided by the number of relevant items in the sample. Expressed as formulas, precision and recall are given in the following equations 2.7 and 2.8, where the parameters TP, FP and FN are indicated in table 2.2.
Precision = TP / (TP + FP)    (2.7)

Recall = TP / (TP + FN)    (2.8)

Moreover, the contingency table is shown in table 2.2.
Table 2.2: Contingency table

            Prediction 1               Prediction 0               Total
Actual 1    TP: True Positive          FN: False Negative         TP+FN: Actual Positive
Actual 0    FP: False Positive         TN: True Negative          FP+TN: Actual Negative
Total       TP+FP: Predicted Positive  FN+TN: Predicted Negative  TP+FN+FP+TN
A general indicator for evaluating a classifier is classification accuracy, defined as the ratio of the number of samples the classifier classifies correctly to the total number of samples in a given test dataset. But for binary classification problems, especially when the minority class is the one of interest, accuracy basically loses its significance as a judgment standard. For example, suppose a classifier is built for cancer detection with 100 samples, 99 of which are positive (no cancer) and one negative (with cancer). A model that always predicts positive achieves an accuracy of 99%, yet it cannot distinguish a single cancer patient, so accuracy loses its value as an evaluation measure. Therefore, for binary classification, the more commonly used evaluation indicators are precision and recall. Generally, the class of interest is the positive class and the other classes are negative, and each prediction of the classifier on the test dataset is either correct or incorrect.
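As an illustration, the two formulas can be computed with a small Python helper (a minimal sketch, not part of the thesis code; the counts are taken from the cancer example above):

```python
def precision_recall(tp, fp, fn):
    """Precision (eq. 2.7) and recall (eq. 2.8) from contingency-table counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Cancer example: a model that always predicts "positive" (no cancer).
# Taking "cancer" as the positive class of interest, the model retrieves
# nothing: TP = 0, FP = 0, FN = 1, so recall is 0 despite 99% accuracy.
p, r = precision_recall(tp=0, fp=0, fn=1)   # (0.0, 0.0)
```

This makes concrete why recall, not accuracy, exposes the always-positive model's failure.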
2.5 Related Work
So far, there is no related work on faulty weatherproofing detection specifically. However, there is partially relevant research on various kinds of small object detection, as well as on the techniques introduced above. This research provides good examples of how the algorithms can be implemented.
2.5.1 Small Object Detection
This part introduces a series of AI-based algorithms applied to several use cases. The work in [45] was implemented with YOLOv2 and CNN classifiers and aimed to detect small objects in a whole image. The target objects were traffic signs, and real-time performance was crucial. The authors explained that their former approach, Faster R-CNN, could not meet the execution-time demand, and even though HOG (Histogram of Oriented Gradients) [46] with SVM (Support Vector Machines) [47] runs at 20 fps, it could not meet the requirement for real-time processing either.
The report in [48] explained a disadvantage of existing object detection methods: low performance on small objects. As a consequence, ERF-YOLO was proposed to deal with the problem. The authors added an expanding receptive field block, down-sampling and up-sampling to YOLOv2. The reported results on common small-object detection improved significantly, as illustrated in figure 2.5. Furthermore, their test samples also included a self-built dataset for detecting aeroplanes in remote-sensing satellite imagery.
Figure 2.5: ERF-YOLO performance comparison (bar chart, 0-100 scale, of mAP and per-class AP for aero, bird, boat, bottle, car, cat, chair, table, plant and sofa, comparing Faster-RCNN [1], SSD300 [4], YOLOv2 544 [3] and ERF-YOLO 544)
[49] explained the difficulty of small object detection owing to background interference. Thus, they proposed a modified YOLOv3 network with added multi-scale convolution kernels and a GoogLeNet Inception module. Their research improved the mAP slightly, by around 1.5% on small objects.
Another group of researchers [50] implemented and compared the target detection algorithms SVM and YOLO. Their use case was detecting small targets with a drone carrying a pan-tilt-zoom camera, and their crucial concerns were accuracy, environmental adaptation and execution time. Comparing the two methods, SVM was not chosen because the images required pre-processing for ROI extraction and it performed poorly on black objects. As a result, YOLO was applied in their research. Furthermore, the network could be optimized because the objects are small, so only one scale was used in practice. Finally, after improving the network, the model reached 70% recall and 85% accuracy.
2.5.2 Line Segmentation
To date, a series of line segment detection methods have been developed, including the Hough transform, LSD, CannyLines [51], EDLines (Edge Drawing lines) [52], LSM (line segment merging) [53], etc. These algorithms have their pros and cons and suit different use cases.
Real-time lane detection on streets and highways was presented in [54]. The system was based on the LSD algorithm and used an inverse perspective mapping to obtain a top view of the image. It achieved 70 frames per second and was able to detect lanes, distinguishing dotted from solid lines and straight from curved lanes.
The LSD algorithm was also applied to lane detection and tracking fused with a Kalman filter [55]. In that project, line segments were employed as low-level features for lane-marking detection, and LSD performed well after filtering, achieving around 95% correct, 4% false and roughly 1% missed detections.
The LSD algorithm has played important roles in different applications. In [56], researchers used LSD to produce numerous line segments; after filtering, the airport candidates were extracted, which benefited subsequent processing. Moreover, LSD extracted all the line segments in [57], after which a subsequent algorithm decided whether these regions of interest were text regions or not. Furthermore, the authors in [58] implemented the algorithm to deliver highly accurate building contour segments.
EDLines is another powerful algorithm. In [52], the author showed that it ran much faster than former methods; the result is shown in figure 2.6 below. EDLines detected slightly fewer lines but improved the execution time significantly. As a consequence, researchers addressed the line segment problem with the EDLines algorithm. Runway detection in real time was illustrated in [59], where EDLines was used to extract straight line segments, and the fragmented lines could then be linked into long runway lines. Similarly, it was applied in [60], aiming to develop a system with false lane detection control in real time.
Figure 2.6: Performance comparison of EDLines and LSD (bar charts of the number of detected lines and of the execution time in ms for the test images Office, Man and bicycle, Zebra, Boy and girl, Chairs and House)
Power line detection is another special case of line segment detection [61]. The authors compared the performance of LSD and EDLines. The results show that the two methods have similar accuracy, but EDLines (1 ms) ran faster than LSD (3 ms).
EDLines has also been used to solve a variety of problems. High-speed automatic building extraction from high-resolution images was presented in [62]: EDLines performed real-time, accurate extraction of building line segments, which determined the shape of the buildings, and their building extraction method relies on line linking and closed-contour search. Furthermore, EDLines was applied to a space-target problem [63]; owing to the rigid bodies in space, the objects have many line features, so feature matching between objects can be performed.
2.5.3 Connected Component
Connected component labelling is one of the most basic steps in image processing systems. By assigning a unique label to all pixels that belong to the same object, the technique can distinguish different objects in a single image. Researchers in recent years have mostly been optimising the execution time of such systems.
Report [64] reviewed the state-of-the-art connected component labelling algorithms of the past decade, and each of the algorithms was implemented and evaluated.
The literature in [65] introduced a block-based labelling algorithm that reduces the number of neighbourhood operations. Furthermore, they used two different processes with a binary decision tree to integrate the block connection relationships in order to reduce unnecessary memory accesses. This greatly simplifies the pixel positions of the block-based scan mask.
In [66], researchers introduced a new "connected component tagging" hardware architecture for high-performance heterogeneous image processing in embedded designs. Concretely, the most complex part of the connected component labelling was processed in parallel, so memory access became one of the critical issues in their work.
2.6 Summary
This chapter introduces the theoretical background and the state-of-the-art technologies that could potentially be implemented in the algorithm. Concretely, it explains a number of object detection methods based on convolutional neural networks, after which a series of custom image processing methods are presented, including morphology, connected components, etc. Furthermore, this part illustrates two line segmentation algorithms, the Hough transform and the line segment detector. Precision and recall are also discussed, and will be used to evaluate the performance of the program.
Last but not least, related work is presented in the final section, covering the state-of-the-art research on the relevant techniques, including YOLO, line segment detection methods and the connected component algorithm. The literature review gives a clear direction for how this thesis is to be developed.
Chapter 3
Methods
This chapter describes the methods used in this thesis. The purpose of stating these methods is to clearly understand the reasoning behind the research-method decisions before the implementation. The chapter is divided into three sections, which cover the research methodology of the thesis, an introduction to the software environment, and the experimental methods.
3.1 Research Methodology
To fulfil the scientific objectives, a choice between different methods is required to answer the scientific questions. As this thesis covers a new application area, it is all the more important to decide thoroughly which methods to implement. Thus, a large proportion of the work is focused on finding the better way to achieve higher performance, in terms of precision and recall. In chapter 2, two basic approaches to object detection are introduced; these two methods will be implemented and compared.
There are several aspects to consider when deciding which method to use. Regarding the decision-making process, both quantitative and qualitative analyses were applied. The experiments to be discussed mostly concern precision and recall, and less the execution time. However, the CNN-based object detection method may not predict well because of the insufficient size of the training and test sets and the complexity of parameter adjustment. In that case, if the object detection method can hardly predict the faulty weatherproofing correctly, every endeavour must be made with the other method, image processing, to achieve higher performance.
In this thesis, the method is divided into four layers, and each layer is divided into two types of research, qualitative and quantitative, as in figure 3.1 [11]. In figure 3.1, these layers are illustrated with terms that will be introduced in the following sections.
Figure 3.1: Research methods and methodologies (the four layers: philosophical assumptions (positivism, interpretivism, criticalism), research methods (experimental, empirical), data collection (experiments, coding, statistics, computational mathematics, interviews, language & text, narrative), and quality assurance (validity, ethics, reliability, replicability, reproducibility), each split into a qualitative and a quantitative branch)
3.1.1 Qualitative and Quantitative Research Methods
The qualitative research paradigm is subordinate to constructivism and hermeneutics [67] and concentrates on the meaningfulness of a certain question. In this thesis, qualitative research is applied to the choice of the algorithm: the decision rests on reports and theory, as a qualitative methodology.
All other experiments are developed with quantitative research methods, where experiments are conducted with different parameters to analyse the results. The results determine whether the assumed method is valid or invalid. In this quantitative research, it is important to have a large amount of data to draw valid conclusions and prove the performance of the system.
Table 3.1: Summary of Research Methodology

  Quantitative research | Qualitative research
  Precision             | Decision of the image processing method
  Recall                | Decision of the OpenCV processing function
  Runtime               |

3.1.2 Philosophical Assumptions
Philosophical assumptions are the theoretical frameworks researchers use to collect, analyse and interpret data in a certain field [68]. This thesis uses positivism, criticalism and interpretivism [11]. Positivism assumes that, when performing the experiments, the real radio base station and the environment simulated in the thesis are very similar, and that the execution-time test represents the running time in most cases. Moreover, criticalism is used to explain the strengths and weaknesses of the methods, while the qualitative research uses interpretivism to introduce ideas and experiences.
3.1.3 Research Methods
There are two related methods: experimental procedures and empirical procedures. The object detection methods are empirical, whilst the rest are experimental. In experimental research, researchers study causes and effects in experiments and adjust the code to meet the requirements; in this way, the connections between variables and other relationships are observed in the experiment. This method is commonly used when researchers not only want to obtain the expected deliverable but also want to learn more about system performance, such as precision. The empirical method, in contrast, relies on people's experience and observations and collects data from others to verify hypotheses.
3.1.4 Data Collection and Analysis Methods
This section describes how the different data are collected and analysed. The main data collection happens during experimentation. The test set is taken from a small-scale radio base station located in the laboratory, so photo shooting via drone or phone camera is accessible at any time; in other words, simple test sets are readily available. However, in order to prove the system robust, enriching the training and test sets is imperative. For this reason, the company supervisor created a dataset covering multiple situations for the thesis, which provides varied data in sufficient numbers. The decision on the object detection method is based on text, language and interviews: by reading the theory, text and language from previous reports can be discussed, and interviews are used to gain knowledge from experts and draw on their experience with object detection.
Narrative analysis, coding, statistics and computational mathematics are applied to the data analysis in this work. Precision, recall and execution time all involve statistics and computational mathematics, where calculations and algorithms are contained: the raw data derive from measurements, while their processing relies on statistics, including summation and averaging. Coding is essential during the implementation; the output of the procedures or functions is observed during coding, the algorithm is analysed and debugged, and computational mathematics joins in to adjust parameters. The object detection experiments draw on both coding and narrative analysis: after reading the relevant literature, coding is necessary for reproducibility and for the analysis of observations, while narrative analysis is mainly used when learning the background and reading previous research.
3.1.5 Quality Assurance
To improve validity, the system should be as general as possible so as to meet all the requirements. However, after tests of the CNN-based object detection method, the system could not perform well on the test sets; therefore, the computer vision and image processing method was introduced. Although this method has more limitations, its performance is much better than that of CNN-based object detection. The thesis also explains the algorithms and experiments in detail, so it is believed to be reproducible. However, owing to varying backgrounds and shooting angles, the level of repeatability is low: for the same parameters, the output can differ, and the exact same output is hardly achievable for other researchers. Nevertheless, a comparison of the methods and their reproducibility should be possible, and similar conclusions can be drawn.
3.1.6 Inference
Some supporting assumptions must be made for the validation of the research, and some auxiliary hypotheses are used in this thesis. Concretely, the environment during execution is presumed to be stable, so that it does not affect the execution time of the system. Furthermore, assumptions about different backgrounds are made. It is important to note that, owing to the workload and method limitations, the author tried to exclude other influences, for instance by considering limited use cases. One special case is the poor results obtained when experimenting in dark conditions; these results were therefore removed, and it is stated as a limitation that results are not available against a dark background. To obtain "correct results", temporary modifications can be used; in this case, the temporary modification may be "the system only runs when the background is bright". It is vital to state that the input may change with the background, but the system should be robust enough to always meet its requirements.
3.2 Software Environment
The environment differs between the two methods. Concretely, since the training is time-consuming and requires high GPU performance, the object detection method is trained on a small server in the company's research lab, with a desktop Intel i7 and an Nvidia GeForce GTX 1080 graphics card; after training, the weights are fed into the program, which can run on a local device. The image processing method runs on a laptop sponsored by the company. A brief introduction is given to setting up the software environment for each method.
The environment was set up in advance by researchers at the company. YOLOv3 is already set up under Ubuntu 16.04 with GCC version 5.4.0. The graphics card driver is version 410.48, with CUDA version 9.0 from Nvidia. All the sources are accessible from the company web page. For labelling the data, an open-source tool, OpenLabeling, is used.
For the image processing method, the program is implemented on the laptop and is written in Python. The libraries applied include a line segment detector, the OpenCV library, the NumPy library, etc. These libraries are available online and open-sourced for import.
3.3 Experimental Design

3.3.1 Method of YOLO
Figure 3.2: Flowchart of the training of the object detection model (image labelling, network setup, feeding the training set into the network, then start/continue training until the loss meets the requirement, stop training, and test images with the trained weight)
The steps of this method are illustrated in the flowchart above. These steps are general instructions for object detection and can be divided into preparation, training and testing sections.
First comes the image labelling, followed by the network parameter setup. Then the labelled data are fed into the network and the training starts. During this time, the training loss changes and is displayed in the console window. Finally, once the loss is stable and relatively small, the training is done. For the testing part, the weights resulting from the training are fixed and can be used for testing.

The performance of the system depends significantly on the training. Concretely, various conditions affect the final weights: the difference between the training and test sets, the labelling accuracy, and whether the training is overfitted.
Considering the training set, it is better when the data cover all conditions in terms of environment, brightness, cable poses, etc. Generally, if the training data do not cover the test conditions, the network will find it extremely hard to predict the true positives in the test set. Meanwhile, if the training is based on few conditions or even a single environment, the prediction will easily overfit: the performance on test sets similar to the training set will be excellent, whilst other test sets will perform poorly. Such an incorrect selection of modelling samples (too few samples, incorrect selection methods, incorrect sample labels, etc.) results in sample data insufficient to represent the intended classification rules, and needs to be avoided during the training.
In terms of labelling accuracy, the labelled image must contain the cable or the flagging tape as the object. The object is labelled with a rectangle, called the bounding box. Nevertheless, the cables are curved objects and flagging tapes are small parts of the cable. Therefore, accurately localizing the object is essential and will significantly reduce the loss during later training.
On account of the overfitting issue, apart from what has been explained above, the average IoU and the loss must be monitored carefully. There may be non-unique classification decisions for the sample data, and as the learning progresses, the backpropagation algorithm may make the weights converge to overly complex decisions. Furthermore, if the number of iterations for the weight learning is excessive, i.e. overtraining, the training will fit noise in the data rather than representative features of the training examples, and the performance will suffer.
3.3.2 Method of OpenCV and Image Processing
Considering that the former work did not achieve satisfying results in section 2.5, it is important to consider another method, so custom image processing with OpenCV is necessary for this thesis. Custom image processing is generally built on features of the image; in this case, faulty weatherproofing detection, it is critical to analyse the features of the flagging tape.
The cables on the cable unit are generally black, thin columns running from the hex nut to the other end, and they are pixel-consecutive. Though the weatherproofing type may vary, the cables always run from the cabinet to the other end. As for the flagging tape, the distinguishing feature is the flagging part outside the original outline of the cable. It is therefore possible to detect the cable first and localize potential flagging tape in a following step.
As a result, the method can be delivered in the following way. First of all, image binarization is necessary. After that, a method for removing other blobs should be implemented, during which information on the potential cables can be gained; these are usually the dark, column-shaped objects. Meanwhile, each cable, represented as a connected component, should be separated and identified in the image, and the other objects are excluded from further processing.

After that, the cable contours are described as functions. To generalize the system, a linear fit of the cable contours is implemented. In this way, the flagging tape can be detected, because the flagging-tape part lies outside the linearly fitted contours of the cable body. The implementation of these methods will be introduced specifically in chapter 4.
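The linear-fit idea described above can be sketched in plain Python (an illustrative least-squares fit, not the thesis implementation; the sample points and the distance threshold are invented):

```python
def fit_line(points):
    """Least-squares fit y = a*x + b through contour points (x, y)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    b = my - a * mx
    return a, b

def outliers(points, threshold):
    """Points lying farther than `threshold` from the fitted line:
    candidates for flagging tape sticking out of the cable body."""
    a, b = fit_line(points)
    norm = (a * a + 1) ** 0.5
    return [(x, y) for x, y in points if abs(a * x - y + b) / norm > threshold]

a, b = fit_line([(0, 1), (1, 3), (2, 5)])   # a = 2.0, b = 1.0
```

Points whose perpendicular distance to the fitted cable line exceeds the threshold are kept as potential flagging-tape locations.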
Figure 3.3: The flowchart of the detection system (image initialization; full-image connected components (CC) to remove blobs; image segmentation and CC; indexing cables and removing the remaining blobs; image morphology to fill reflections; finding line segments inside each expanded CC; finding close endpoints within a threshold; linear fit: if the line slopes are similar, the new endpoints are connected with a single line, otherwise with a polyline; finally, the cable contours are obtained, the subtraction is calculated, and endpoints outside the cable are stored and drawn as the flagging tape)
3.4 Summary
The four layers of the methodology are discussed: philosophy, research methods, data collection and analysis, and quality assurance. Furthermore, this chapter introduces the two methods of this thesis and the preparations they require. The software environment and the experimental design are discussed for both methods, YOLO and image processing with computer vision. In the YOLO method, the process can be divided into labelling, training and testing, whereas noise removal, cable detection and flagging-tape detection are the three steps of the image processing algorithm.
Chapter 4
Design and Implementation
This chapter gives a deeper perspective on the thesis. The two algorithms introduced in chapter 3 are implemented in detail.
4.1 Implementation on YOLO Network
The object detection method based on YOLO is simple to implement; the algorithm consists primarily of the five steps described in section 3.3.1.
4.1.1 Data Labelling
Basically, the first premise of any machine learning method is a dataset to train the neural network. Under some circumstances, already collected open-source datasets can be used, such as the handwritten digits of the MNIST database (Modified National Institute of Standards and Technology database), the ImageNet dataset, the COCO dataset, etc. Otherwise, one's own training set must be created by collecting data oneself. In this thesis, the dataset consists of unique images of cables with or without flagging tape, so the second approach applies: a self-created dataset is fed into the convolutional neural network and trained.
One easy-to-use image labelling tool that can create one's own dataset is OpenLabeling. The tool lets users input video or images as raw data. After an object is labelled in an image, a file containing the object class and location is generated in XML (Extensible Markup Language). These files follow the format of the PASCAL VOC (Visual Object Classes) dataset, which provides a standard evaluation system. Each XML file has the same name as the corresponding image and contains the category and position (Xmin, Xmax, Ymin, Ymax) of the objects. Furthermore, the program can track the object with a deep learning method to predict its position in the next frame. Thus, the tool converts the objects in the image into the PASCAL VOC format with the labelling information, so objects can be labelled efficiently.
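An annotation file of this kind can be read with the Python standard library; the snippet below is a minimal sketch (the file content, class name and coordinates are invented for illustration, not taken from the thesis dataset):

```python
import xml.etree.ElementTree as ET

# A hypothetical VOC-style annotation, one per labelled image.
VOC_XML = """<annotation>
  <filename>cable_001.jpg</filename>
  <object>
    <name>flagging_tape</name>
    <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>180</xmax><ymax>140</ymax></bndbox>
  </object>
</annotation>"""

def read_boxes(xml_text):
    """Return (class, (xmin, ymin, xmax, ymax)) for every labelled object."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        coords = tuple(int(bb.findtext(t)) for t in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name, coords))
    return boxes

boxes = read_boxes(VOC_XML)   # [("flagging_tape", (120, 80, 180, 140))]
```

A training pipeline would apply such a reader to every XML file next to its image before converting the boxes to the network's input format.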
Figure 4.1: OpenLabeling tool interface
4.1.2 Network Configurations
YOLO provides a relatively complete convolutional neural network, and the configuration file offers various parameters for adjustment. In this use case, one network is implemented to locate the cables, while another distinguishes flagging tape from good weatherproofing. The former network thus has one class, the cable, and the latter also has one class, the flagging tape. The following part gives a brief introduction to several parameters that can be adjusted for different situations.
• Batch: the number of pictures sent to the network per iteration, also called the batch size. If this is increased, the network completes an epoch in fewer iterations. With a fixed maximum number of iterations, increasing the batch size increases the training time but better finds the direction of the gradient descent. If the GPU (graphics processing unit) memory is large enough, the batch size can be increased to improve memory utilization. If the number is too small, the training will not converge well; if it is too large, the training may fall into a local optimum. max_batches refers to the maximum number of iterations.
• Subdivision: this parameter keeps the system from feeding the whole batch into the network at once. Each batch is divided into as many parts as the subdivision specifies, and after all parts have been run through, one iteration is finished. This reduces the video-memory footprint. If this parameter is set to one, all batch images are fed into the network at once; when the subdivision is two, half are fed at a time.
• Angle: the image rotation angle, used to enhance the training effect by augmenting the training set with rotated pictures.

• Saturation, exposure and hue are likewise used to enhance the training effect.
• Learning_rate: the learning rate. If the training diverges, the learning rate can be reduced; it is also reduced when the learning hits a bottleneck or the loss stays constant.
• Steps and scales: these two parameters work together. For example, assume the learning_rate is 0.001, the steps are 100, 25000, 35000 and the scales are 10, .1, .1. This means that during iterations 0-100 the rate is the original 0.001; during iterations 100-25000 it is 10 times the original, i.e. 0.01; during iterations 25000-35000 it is 0.1 times the current value, i.e. 0.001; and from iteration 35000 to the maximum it is 0.0001, again 0.1 times the current value. Reducing the learning rate as the iterations increase allows more efficient learning of the model, that is, a better reduction of the training loss.
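The steps/scales example above can be expressed as a small function (an illustrative sketch of how a Darknet-style piecewise schedule behaves, using the example values from the text; the function name is made up):

```python
def learning_rate(iteration, base_rate=0.001,
                  steps=(100, 25000, 35000), scales=(10, 0.1, 0.1)):
    """Piecewise schedule: multiply the rate by each scale whose
    step boundary the iteration has passed."""
    rate = base_rate
    for step, scale in zip(steps, scales):
        if iteration >= step:
            rate *= scale
    return rate

learning_rate(50)      # the base rate, before the first step
learning_rate(200)     # 10x the base rate
learning_rate(40000)   # 10 x 0.1 x 0.1 times the base rate
```

Each scale is applied cumulatively once its step is reached, reproducing the 0.001, 0.01, 0.001, 0.0001 progression described above.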
The number of filters in the last convolutional layer is 3 × (class_number + 5), and the output is constructed as S × S × 3 × (5 + class_number). YOLO divides the image into an S × S grid; when a target centre falls in a certain grid cell, that cell is responsible for the detection, which is the origin of S × S. YOLO has three scales of output, and each grid cell detects 3 anchor boxes, so for each scale the output is S × S × 3. Each anchor box contains the coordinate information (x, y, w, h) and a confidence, described in section 2.1.2, which makes 5 values; it also contains the class information, using one-hot encoding. For example, with three classes (person, car, dog), a detection of a person is encoded as [1, 0, 0], so all category information is coded. In this case there is one category, so each anchor box carries six values. Therefore, for each scale, the output is S × S × 3 × (5 + 1) = S × S × 18.
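This filter-count rule can be checked with a one-line helper (illustrative only; the parameter names are made up):

```python
def last_layer_filters(num_classes, anchors_per_scale=3, box_params=5):
    """Filters in the last convolutional layer: anchors x (x, y, w, h,
    confidence, plus one one-hot score per class)."""
    return anchors_per_scale * (box_params + num_classes)

last_layer_filters(1)    # 18, matching S x S x 3 x (5 + 1) per scale
last_layer_filters(80)   # 255, the value for an 80-class COCO model
```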
4.1.3 Training Dataset
The training process displays some parameters in the console window, showing intermediate results. It is necessary to keep monitoring these results and react accordingly. The parameters that reflect the training process include IoU, class, objectness, average recall and loss. The former four are ideally expected to rise towards one, while the loss is supposed to keep decreasing. However, in real situations, when the recall grows too large there is a risk that the network is overtrained; empirically speaking, 90% is a relatively high value approaching the risk boundary.
CHAPTER 4. DESIGN AND IMPLEMENTATION
4.2 Implementation of OpenCV and Image Processing
The object detection implementation with OpenCV and the LSD algorithm is based on the steps
in figure 3.3, applying various libraries and functions.
4.2.1 Image Initialisation
The first step is image pre-processing, including file path management, resizing, greyscaling
and binary thresholding. These steps lay the foundation for the further processing later.
Algorithm 1 is shown below. The reason for inverting the binary image is that the cable to be
detected is mostly black, and black pixels are ideally 0 after greyscaling; inverting the binary
image therefore makes the cable the foreground to be processed. However, in order to simplify
the explanation, the colours of the images will be described with their raw (non-inverted)
values in the following sections.
Algorithm 1: Initialize the Image
Result: Resize and threshold image
1 Import libraries;
2 Set all constant parameters;
3 Set directories of input and output folder;
4 Binary threshold the image;
5 Dilation of the binary-inversed image ;
6 Image resizing;
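The thresholding and inversion step of Algorithm 1 can be sketched with NumPy as a stand-in for OpenCV's `cv2.threshold` with `THRESH_BINARY_INV`; the threshold value of 100 is an assumption for illustration, not the thesis's actual constant:

```python
import numpy as np

def binary_inverse(grey, thresh=100):
    """Inverse binary threshold: pixels darker than `thresh` (e.g. the black
    cable) become foreground 255, bright pixels become background 0."""
    return np.where(grey <= thresh, 255, 0).astype(np.uint8)

grey = np.array([[10, 200],
                 [30, 250]], dtype=np.uint8)
print(binary_inverse(grey))  # dark pixels (10, 30) map to 255, bright to 0
```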
4.2.2 Remove Blobs in the Image
Blobs are primarily areas in the image which differ in brightness, colour or other features
from the adjacent regions. In this case, the method aims to remove the small areas which are
not potentially cables. This operation has a positive effect on filtering the background.
Figure 4.2 shows steps of finding blobs in the image. The first image is the original image,
and it will be binary filtered in the following step. After that, the binary image is inversed and
dilated. The last step is to implement the connected component algorithm and output small
connected components, which are the blobs in this section.
Specifically, the method is implemented by means of connectedComponentsWithStats in the
OpenCV library. This function takes two parameters, a greyscale image and the connectivity,
and produces four outputs which are processed by later functions. Through the
connectedComponentsWithStats function, a variety of information about the connected
components is obtained, namely retval, labels, stats and centroids.
• Retval is the total number of connected components.
Figure 4.2: Sample blobs in the test set
• Labels is a map of the pixels, in which each pixel is labelled with the serial number of its
connected component.
• Stats gives information about each connected component, including the coordinate of the
top-left corner, the maximum width, the maximum height, and the total area.
• Centroids are the centroids of each connected component, but they are not used in this
program.
In terms of the input, the parameter connectivity offers a choice of connection rule. The
connectivity can be either 4-way or 8-way, as illustrated below in figure 4.3. 4-way
connectivity only considers the 4 adjacent pixels in the vertical and horizontal directions,
while 8-way connectivity also includes the 4 pixels in the diagonal directions. In this way,
8-way connectivity is the more complete option, since each connected component can contain
more pixels and more scenarios are covered.
Figure 4.3: Connectivity Introduction (left: 4-way connectivity, right: 8-way connectivity)
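The difference between the two connectivity options can be demonstrated with a small pure-Python labelling sketch (a flood-fill stand-in for OpenCV's connected-component labelling, not the thesis's code): two blobs that touch only diagonally form one component under 8-way connectivity but two under 4-way.

```python
def count_components(grid, connectivity=8):
    """Count foreground (1) components in a 2-D list under 4- or 8-way rules."""
    if connectivity == 4:
        nbrs = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:
        nbrs = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0)]
    h, w = len(grid), len(grid[0])
    seen, count = set(), 0
    for y in range(h):
        for x in range(w):
            if grid[y][x] == 1 and (y, x) not in seen:
                count += 1
                stack = [(y, x)]          # iterative flood fill
                while stack:
                    cy, cx = stack.pop()
                    if (cy, cx) in seen:
                        continue
                    seen.add((cy, cx))
                    for dy, dx in nbrs:
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] == 1:
                            stack.append((ny, nx))
    return count

grid = [[1, 0],
        [0, 1]]                            # two pixels touching diagonally
print(count_components(grid, 4))  # 2: diagonals are not connected
print(count_components(grid, 8))  # 1: diagonals are connected
```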
Algorithm 2: Remove the blobs in the image
Result: New matrix without blobs
Input: matrix of the inversed image and its retval, labels, stats
1 for i ← 0 to retval do
2     if connected component area > threshold blob area && height > threshold × height then
3         Store the number of the connected component in list blobs
4     end
5 end
6 for pixels P ∈ inversed image do
7     if label of pixel P ∈ blobs then
8         Output matrix ← labelled pixel
9     end
10 end
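A minimal sketch of this filtering step, assuming synthetic `labels`/`stats` structures that mirror what connectedComponentsWithStats returns (the data layout and thresholds here are illustrative, not the thesis's actual values):

```python
def remove_blobs(labels, stats, min_area, min_height):
    """Keep only components large enough to be cable candidates (cf. Algorithm 2).

    `labels` is a 2-D list of component ids (0 = background); `stats` maps a
    component id to (x, y, w, h, area), mirroring connectedComponentsWithStats.
    """
    keep = {i for i, (x, y, w, h, area) in stats.items()
            if area > min_area and h > min_height}
    # zero out every pixel whose component is too small to be a cable
    return [[v if v in keep else 0 for v in row] for row in labels]

labels = [[1, 1, 0, 2],
          [1, 1, 0, 0]]
stats = {1: (0, 0, 2, 2, 4),   # big component: a cable candidate
         2: (3, 0, 1, 1, 1)}   # small blob to be removed
print(remove_blobs(labels, stats, min_area=2, min_height=1))
# [[1, 1, 0, 0], [1, 1, 0, 0]]
```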
4.2.3 Calculate Cable Information
Cable Positioning and Width
This section introduces how to obtain information about the potential cables, including the
position, the cable width and the total number of cables.
In the test images there are usually several cables, so obtaining information about them is
vital for further calculation. The information primarily consists of the position of the cable
and the cable width.
Segmentation of the image is introduced to locate the cable. From the stats parameter in
section 4.2.2, the width and height of the bounding rectangle can be obtained, and the
top-left vertex is also known. In this way, the detection runs inside each bounding box.
However, as the image is large, with backgrounds of varying brightness, errors can be produced
during the detection. These errors make cable detection difficult, so the detection is
concentrated on a part of the image to solve this problem.
Note that in these cases the cables range from the cabinet bottom to the bottom of the image.
In the binary image, the cable itself may not be exposed completely under an unsuitable binary
threshold. Concretely, the cable surface has reflections in many cases, which causes those
areas not to be detected as pure black (0 in greyscale).
Algorithm 3, proposed in this section, first segments the image, concentrating on the target
proportion. In this program, the target range is set from 0.5 times the height to 0.75 times
the height, and in the horizontal direction the abscissa runs from the left-most abscissa to
the sum of that abscissa and the bounding box width, as displayed in figure 4.4.
Figure 4.4: Bounding box selection
In this range the method counts the black pixels on certain rows. Nevertheless, the statistics
of a single row are not reliable, owing to the background or even the presence of the flagging
tape, so several rows should be sampled. Accordingly, the segmented height is divided evenly
into 10 sample rows with the same distance between adjacent ones.
Ideally, black pixels fill the whole cable. However, owing to the illumination of the cables,
some regions are filtered to white. In order to reduce the error, all of the 10 rows are
counted: the black pixels at the rightmost and leftmost of the connected component are the
edges, and the width of the cable in each row is calculated by their subtraction. These widths
are then ordered, and the middle 6 rows are used for the width calculation. Furthermore, during
this data collection, the number of black pixels in each row is counted for filtering;
concretely, if a row has few black pixels, it is recorded as zero for the ordering. The
background is also not considered, because it takes the largest proportion of the whole area.
The algorithm is given below as algorithm 3. This cable width calculation will
figure out the horizontal intercept of the cable, rather than, strictly, the cable width
perpendicular to the cable itself.
Algorithm 3: Get cable information
Result: Cable abscissa range and cable width
Input: label matrix without blobs fullOutLabel
1 Set segmentation parameters TopOrdinate, BottomOrdinate;
2 halfInversedImage ← fullInversedImage[Top:Bottom];
3 halfLabels ← connectedComponentsWithStats(halfInversedImage);
4 halfOutLabels ← Remove blobs in halfLabels;
5 retval, label, stats ← connectedComponentsWithStats(halfOutLabels, 8);
6 Discard the background zeros in stats through area filtering;
7 Set 10 ordinates samplePoint and the relative abscissa range;
8 for i ← 0 to retval do
9     for j ← 0 to 10 do
10        for x-coordinate k ∈ abscissa range do
11            if labels[samplePoint[j], k] = i then
12                Store x-coordinate k in list tempRange;
13            end
14        end
15        if length(tempRange) > cable width threshold then
16            Record the maximum and minimum elements in tempRange;
17            List cableWidthTemp = maximum − minimum;
18        end
19    end
20    Obtain XCoordinateMin and XCoordinateMax;
21    Sort cableWidthTemp and obtain the average width of cableWidthTemp[2:8];
22 end
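The row-sampling idea above can be sketched for a single cable as follows; the binary image is a plain 2-D list with 1 marking colour-inverted cable pixels, and the minimum-run threshold of 3 is an assumed stand-in for the thesis's cable width threshold:

```python
def cable_width(binary, top, bottom, min_run=3):
    """Estimate cable width from 10 rows sampled between `top` and `bottom`.

    Rows with fewer than `min_run` cable pixels count as width 0; the middle
    six of the ten sorted widths are averaged, as in Algorithm 3.
    """
    rows = [top + k * (bottom - top) // 10 for k in range(10)]
    widths = []
    for r in rows:
        xs = [x for x, v in enumerate(binary[r]) if v == 1]
        # rightmost minus leftmost cable pixel gives the row's width
        widths.append(max(xs) - min(xs) if len(xs) >= min_run else 0)
    widths.sort()
    middle = widths[2:8]          # discard the two extremes on each side
    return sum(middle) / len(middle)

# synthetic cable occupying columns 5..9 of a 20-row image
binary_img = [[1 if 5 <= x <= 9 else 0 for x in range(15)] for _ in range(20)]
print(cable_width(binary_img, 0, 20))  # 4.0
```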
Separate Cables
However, from the perspective of the whole image, these cables can become a single connected
component, because the cabinet bottom is as dark as the cables and connects them to each other.
After implementing algorithm 3, the concentration on a proportion of the image excludes the
cabinet bottom from consideration. In this way the cable x-coordinates and widths are figured
out, which helps with the truncation of each cable.
The method to cut off the cabinet connection is brute-force. Basically, from the former cable
information result, the x-coordinate range can be obtained for each cable, so the distance
between adjacent cables can be calculated. In order to cut off the connection, the middle part
between adjacent cables is filled with the pure background value. This will not influence the result
of the connected components any more, since the cables are already cut away from the
connection. Therefore, the cables are observed to be separate. However, there are still some
bigger blobs in the image. In section 4.2.3 the image is processed in a specific area and
non-cables are removed because there are not sufficient black pixels on the sample rows, so
these faults are not processed in that loop; looking at the whole image, however, some faulty
objects remain. The output of line 7 in algorithm 4 below aims to separate the cables rather
than remove these faulty objects.
Algorithm 4: Cable separation
Result: New matrix without blobs
Input: matrix cableInformation and retval, labels, stats of halfOutLabels
1 for i ← 0 to cable total number − 1 do
2     cableDistance = abscissa: (cable[i + 1] − cable[i]);
3     for j ← abscissa[i] + 1/3 cableDistance to abscissa[i] + 2/3 cableDistance do
4         Fill column j with 0 → fullOutLabels
5     end
6 end
7 connectedComponentsWithStats(fullOutLabels, 8);
8 Index the x-coordinate from fullOutLabels to halfOutLabels
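The column-filling step can be sketched as follows; note this interpretation measures the gap between the right edge of one cable and the left edge of the next, which is an assumption about how the thesis defines the cable distance:

```python
def separate_cables(labels, cable_ranges):
    """Cut dark connections between adjacent cables (cf. Algorithm 4).

    `cable_ranges` is a list of (x_min, x_max) per cable, left to right; the
    middle third of the gap between adjacent cables is filled with background 0.
    """
    for (l_min, l_max), (r_min, r_max) in zip(cable_ranges, cable_ranges[1:]):
        gap = r_min - l_max
        start = l_max + gap // 3
        stop = l_max + 2 * gap // 3
        for row in labels:                     # fill whole columns, as in Alg. 4
            for x in range(start, stop + 1):
                row[x] = 0
    return labels

# one foreground row connecting two cables at columns 0-2 and 7-9
print(separate_cables([[1] * 10], [(0, 2), (7, 9)]))
# [[1, 1, 1, 0, 0, 0, 1, 1, 1, 1]]
```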
Output Image with Only Cables
In terms of removing the other blobs in the whole image, the function in section 4.2.3 is able
to select the correct cables and obtain the x-coordinate from algorithm 3. As the x-coordinate
is the same in the two label matrices, it can be used as an index for cable localization in the
full-image processing. After using this index, algorithm 4 separates the cables into different
connected components. After this section the output should ideally contain only cables.
However, owing to background noise, some false positive detections may also exist if the
background contains a column-shaped dark object.
The shade and illumination of the cabinet base, which is at the top of the image, remarkably
affect the result. However, this thesis does not focus on the detection of the radio cabinet,
so the cabinet is segmented out of the image. Specifically, a simple feature extraction method,
line segment detection, is implemented to detect the bottom line of the radio cabinet, after
which the program fills the part above the bottom line with zeros to eliminate the errors.
Inner reflection of cables
After the filtering and indexing, the potential cables are mostly filtered out, while a few
dark, column-shaped objects may still exist. The basic idea of this section is to remove the
reflections which are located inside the cables. A sample image is shown below in figure 4.5.
Figure 4.5: Inner reflection sample
Therefore, a method for removing these internal blobs is carried out. The basic idea is a
morphology transform of the binary image. As mentioned in section 2.2.2, erosion erodes the
white area while dilation expands it. As a result, assuming the convolutional kernel is of a
certain size, if the image is eroded first, the white blobs inside the cables vanish, and when
the image is then dilated, these blobs do not reappear. By contrast, the edges of the objects
keep the same position as in the original image, since the dilation and erosion are on the
same scale.
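This erosion-then-dilation combination is a morphological opening; a pure-Python sketch with a 3 × 3 kernel (a stand-in for OpenCV's `cv2.erode`/`cv2.dilate`, not the thesis's code) shows the effect: an isolated white pixel disappears, while a 3 × 3 white block survives with its edges in place:

```python
def _morph(img, op):
    """Apply a 3x3 min (erosion) or max (dilation) filter to a binary image."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[ny][nx]
                      for ny in range(max(0, y - 1), min(h, y + 2))
                      for nx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = op(window)
    return out

def opening(img):
    """Erosion followed by dilation: removes small white blobs, keeps edges."""
    return _morph(_morph(img, min), max)

dot = [[0] * 5 for _ in range(5)]
dot[2][2] = 1                       # an isolated reflection-like blob
print(opening(dot) == [[0] * 5 for _ in range(5)])  # True: the blob is gone
```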
4.2.4 Line Segment Detection
The LSD method, line segment detection, is implemented to extract the lines in the whole image,
as illustrated in section 2.3.2. This line extraction method is fast and efficient in most
cases. The following image is a sample output of the LSD method; the image is resized
initially.
Figure 4.6: LSD sample
After implementing the LSD algorithm, the output contains all the information about the
segments, including the coordinates of the two endpoints of each segment as well as the width
of the line.
From the output of algorithm 3, the width of the cable can be obtained easily and set as the
standard length for the filtering. Using it improves the stability of the system when zooming
in or out, as the cable transforms at the same scale as the whole image.
Line segments and contour processing
It is necessary to process the line segments in the program, and how to choose suitable
segments is crucial. The selection must follow the rule that as few segments as possible should
indicate the edge of the cable. Therefore, the processing has two purposes: one is to merge
lines and discard lines which are out of the cable range, whilst the other is to find suitably
long lines.
After determination of the cable width, segment endpoints can be filtered and merged under a
certain distance threshold. If the vertices of two lines are relatively close to each other,
the two endpoints can be treated as one, and when the slopes of the two lines are similar they
can be merged into a single line; otherwise, only the endpoints are made the same.
After merging similar endpoints and straight lines with similar slopes, the number of lines is
significantly reduced. The next step is to further filter out all line segments inside or at
the edge of the cable; in other words, the endpoints of these line segments lie in the same
cable range. Furthermore, after the morphology transform, the blobs are eliminated, and the
endpoints of the line segments are ideally black. These conditions can further screen out the line
segments that describe the characteristics of the cable. What needs to be pointed out is that
after the dilation of the colour-inversed image, the area of the cable expands so that it
contains more segment endpoints.
The purpose of the next step is to draw the contour of the entire cable through the line
segments filtered in the previous step. Therefore, only the line segments at the edge of the
cable are meaningful for further processing, as they can relatively accurately describe the
edge of the relevant cable. Moreover, the number of output lines is expected to be smaller
after discarding the lines internal to the cable. Hence, these line segments are selected by
the principle that they are at the edges and are longer. Generally speaking, the line segments
output by the above steps are long and almost perpendicular to the abscissa axis, and they are
more likely to be potential cable outlines, because longer lines are better able to show the
outline of the entire cable. These edges are candidates for the grey gradient of the cable.
But there are exceptions, such as reflections on the surface of the cable. These long
mirror-like reflections make the LSD algorithm detect long line segments on the cable surface,
which are not actually the lines sought. Therefore, it must be checked whether the output line
segments are on the cable. In the program, every cable has the same characteristics:
concretely, a cable runs through the entire picture, so it has a border on the left and right,
while the flagging tape exceeds this theoretical boundary of the cable. In theory, the entire
cable boundary could be described with a large number of straight lines, but massive numbers of
straight lines increase the difficulty of finding locations. More importantly, there may be
flagging tapes at some locations along the cable, and a large number of straight lines would
directly surround the entire connected component. Indeed, if the flagging tape is wrapped
inside the contour, the subsequent detection steps cannot be implemented. Therefore, this part
of the algorithm has the following requirements:
• The processed line segments can describe the approximate contour of the entire cable.
• The line segments should be adjacent to the edge of the outline; they do not need to fit all
pixels closely.
• It is necessary to ensure that the number of these line segments is as small as possible.
In conclusion, because the contour drawing requires only low accuracy, the longest line segment
is selected in the thesis. It is extended from its bottom end (the maximum vertical coordinate)
to the radio cabinet bottom (the minimum vertical coordinate). Hence, the contour
of the entire cable is drawn approximately.
Algorithm 5: Merge endpoints and linear regression
Result: Image with cable contour
Input: matrix of cable information
1 lsdMat ← LineSegmentDetector(OriginalImage)
2 if both endpoints ∈ connected component then
3     Save both endpoints in ccLsdMat with the bigger Y coordinate first
4     Record the cable number
5 end
6 for i ← 0 to cable total number do
7     Set the x, y difference thresholds
8     for j ← 0 to number of line segments, k ← j + 1 to number of line segments do
9         if one endpoint of line segment j is within distance thresholdX&Y of line k then
10            if one endpoint each in j and k are within distance thresholdX&Y then
11                if the slopes of the two lines are similar then
12                    Take the biggest Y and the smallest Y as the two endpoints' Y
                      /* Combine the two lines into a single longer one */
13                else
14                    Set the adjacent endpoints' coordinates to the same value
                      /* Do not combine, rather connect */
15                end
16            end
17        end
18    end
19    Quicksort by horizontal coordinate
      /* Determine which connected component each line belongs to */
20    for j ← 0 to number of line segments do
21        for k ← 0 to cable total number do
22            Get the line having the maximum Y subtraction on each side of the connected component
23        end
24    end
25    if any of the Y subtractions < threshold × height then
26        Discard the result
27    end
28    Extend the line from the height to the cabinet bottom
29 end
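The merge step of Algorithm 5 can be sketched for a single pair of segments; the thresholds, the dx/dy slope convention (suited to near-vertical cables), and the function name are assumptions for illustration:

```python
def try_merge(seg_a, seg_b, dist_thresh=5.0, slope_thresh=0.1):
    """Merge two near-collinear segments ((x1, y1), (x2, y2)) into one, else None.

    Segments merge when a pair of endpoints is close in both x and y and the
    slopes are similar; the merged segment spans the smallest and largest y.
    """
    def slope(seg):
        # dx/dy: cables are near-vertical, so this stays finite for them
        (x1, y1), (x2, y2) = seg
        return (x2 - x1) / (y2 - y1) if y2 != y1 else float("inf")

    close = any(abs(pa[0] - pb[0]) <= dist_thresh and
                abs(pa[1] - pb[1]) <= dist_thresh
                for pa in seg_a for pb in seg_b)
    if not close or abs(slope(seg_a) - slope(seg_b)) > slope_thresh:
        return None
    points = list(seg_a) + list(seg_b)
    lo = min(points, key=lambda p: p[1])   # smallest y
    hi = max(points, key=lambda p: p[1])   # biggest y
    return (lo, hi)

# two vertical fragments of the same cable edge, 3 pixels apart
print(try_merge(((10, 0), (10, 40)), ((10, 43), (10, 100))))
# ((10, 0), (10, 100))
```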
If long line segments cannot be filtered out, one option is to extend and connect the collinear
line segments, while another is not to connect them. Because line segments may be produced by
reflections on the cable surface, which yield false positives, the implementation of complex
connection algorithms may not necessarily have a positive effect on the system. Therefore, this
experiment uses a simple cable description method, which indicates a single line segment on
each cable contour.
Distinguish flagging tape
The algorithm of the previous step only draws the contour of the cable, while this part
illustrates how to figure out and highlight the flagging tape based on that cable contour.
Assuming that the contour of the cable has been obtained properly in the previous step, the
flagging tape must be outside the contour. According to algorithm 5, all potential candidates
have been filtered out through the LSD and connected component algorithms, and each is
represented by the coordinates of two points. A point may lie in one of several positions
relative to a potential cable: inside the contour, on the contour, scarcely outside it, or far
outside it. These situations show different features when the horizontal coordinate differences
between the endpoint and the two contour lines, taken at the same vertical coordinate, are
multiplied together.
If a point is inside the two line segments, the product of the two horizontal subtractions is
obviously negative, shown in blue in figure 4.7.
If the point is on one of the line segments, one of the distances is zero, so the product is
zero, shown in orange in figure 4.7.
If the point is near the outside of the contour, the distance from one contour line segment is
approximately the cable width while the distance from the other is very small, so a small
product results, also shown in orange in figure 4.7.
Only when the filtered endpoint is far away from the nearest cable contour is the product of
the two abscissa differences large. In this way the set threshold is exceeded and the line is
output as flagging tape, shown in yellow in figure 4.7.
The following figure and algorithm show different situations and the method.
Figure 4.7: Positions of endpoints (radio cabinet and cable)
Figure 4.8 shows the conditions of the points. The black shape is the real cable, whilst the
purple dashes indicate the cable contour obtained by algorithm 5. The orange point is the
on-contour case; parameters o1 and o2 give the horizontal differences between the orange point
and the left/right contour, and the determination of the flagging tape, as explained above, is
based on the product of o1 and o2. The blue point shows the case inside the contour, with
parameters b1 and b2, while the yellow one is far outside the contour, with r1 and r2 as the
distances.
Figure 4.8: Distance calculation
Algorithm 6: Flagging tape detection
Result: Detected flagging tape points
Input: cable contour matrix; line endpoints m, n: xm, ym, xn, yn
1 Get the linear functions of the line segments and contours
2 for i ← 0 to number of line segments do
3     x1, x2 = substitute ym and yn into the 2 contour functions
4     DetectM ← (x1 − xm) × (x2 − xm)
5     if DetectM > threshold × cableWidth² then
6         Store point M as potential flagging tape
7     end
8     DetectN ← (x1 − xn) × (x2 − xn)
9     if DetectN > threshold × cableWidth² then
10        Store point N as potential flagging tape
11    end
12 end
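The product test can be sketched for a single endpoint as follows; the contour functions, the scale factor k = 1.5, and the category labels are illustrative assumptions, while the sign of the product follows the reasoning above (inside → negative, on contour → zero, far outside → large positive):

```python
def classify_endpoint(x, y, left_contour, right_contour, cable_width, k=1.5):
    """Classify an endpoint by the product of its horizontal offsets from the
    two contour lines at the same vertical coordinate (cf. Algorithm 6).
    Each contour is a callable y -> x; k scales the width-squared threshold."""
    x1, x2 = left_contour(y), right_contour(y)
    product = (x1 - x) * (x2 - x)
    if product > k * cable_width ** 2:
        return "flagging tape"      # far outside the contour
    elif product < 0:
        return "inside"             # x lies between the two contours
    elif product == 0:
        return "on contour"
    else:
        return "near outside"       # small positive product

left = lambda y: 100.0              # vertical contours for simplicity (assumed)
right = lambda y: 110.0             # cable width 10
print(classify_endpoint(140, 50, left, right, 10))  # flagging tape
```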
In this way, the flagging tape is figured out and can be highlighted on the image; the image is
then multiplied by the scaler and finally output at the original scale.
4.3 Summary
This chapter introduces the concrete implementations of the two methods, YOLO and image
processing with OpenCV. The two methods are distinct in approach and infrastructure. The
training of YOLO is based on a convolutional neural network; the method uses the Openlabeling
tool for labelling the data, which are fed into the network. This implementation involves the
network parameters during training, which are essential for the performance. The other method,
which is also the crucial part of the thesis, is the image processing method with computer
vision. It is implemented in a series of steps, including initialisation, blob removal, cable
localization, and flagging tape detection, all of which are illustrated in detail. Compared
with former work and literature, this program has several advantages: the connected components
algorithm delivers high reliability in detecting cables, and the line segment detector extracts
the lines in the image. The combination of the two methods validates a dependable algorithm for
the whole system.
Chapter 5
Measurements and Result
The tests are based on two methods: one is the AI object detection method, while the other is
the combination of computer vision and classical image processing.
5.1 Object Detection of Faulty Weatherproofing
5.1.1 Result
As mentioned in section 2.5, the object detection method may perform poorly in this system, and
this turned out to be the case.
There were many problems in the object detection part. Object detection using AI is basically
separated into several steps: labelling, training, and testing. The latter two steps had big
issues with detection.
After labelling the original shooting videos, the images are fed into the network and trained.
The images are labelled for both the cable and the faulty weatherproofing, the flagging tape,
which means the system is targeted to detect faulty weatherproofing directly. During the
training, several images are fed into the graphics memory simultaneously, which is known as a
batch, and the console window shows the output of the system, including execution time,
precision, etc. In general, the system uses gradient descent for the fitting. When the
precision reaches about 95% in normal object detection, the system tends to become overfit.
Normally, though the precision can sometimes struggle at a low value, the system will adjust
and refit for better performance; the precision then rises and eventually reaches a high value
(such as 90%). However, if the model cannot reduce its loss for many loops, the model or the
dataset might be unsuitable for the algorithm.
The result of the object detection is not satisfying. However the environment or the shape of
the flagging tape changes, the precision displayed in the output window cannot reach 70% on
most training sets with normal labelling. To make matters worse, the detections on the images
are filled with false positives, so that it is scarcely possible to observe the
flagging tape part. Although the network was changed with some parameters, it barely meets the
requirement, and the reason is still the generation of too many false positives. The confidence
of the bounding box (object) is less than 60%, while the IoU is less than 50%.
5.1.2 Discussion
The cables are curved, and the flagging tapes are only small parts stuck on the cables.
However, the labelling tool labels objects with rectangular boxes, so the labels inevitably
include some background from the image. Furthermore, the cable and the flagging tape themselves
do not have many features, and the flagging tape is too small an object for detection. Most
important of all, the shape of the flagging tape shifts in different situations, which
increases the difficulty of detection. In addition, the cable itself has some reflections on
its surface.
5.2 Method Based on Computer Vision and Image Processing
From the result of the object detection, it can be noticed that due to the variability of the
cable shapes and the poor visibility of the colours, which are mostly black, the IoUs on the
cables are very poor. On the other hand, it is difficult to detect the flagging tape owing to
its shape variance. As a result, the detection may generate a large number of false positives
in many cases. Basically, object detection, as an artificial intelligence method, has the
common issue of overfitting. Concretely, the number of training images is possibly too small
compared to the test set under various conditions, so the model easily overfits the training
set, resulting in the above situation.
As a consequence, YOLO and other object detection methods are not the best choice for this
thesis.
The following thesis implementation focuses on the characteristics of the cable itself.
Specifically, these characteristics are analyzed and extracted: the position of each cable is
detected by computer vision methods, and then the various situations of the flagging tape are
further analyzed, so as to find the most suitable and more general method for detecting the
flagging tape.
5.2.1 Testing Method and Testing Environment
Test set creation
The testing method is based on the test set, which derives from the research at the host
company. For the test set, the radio base station stands in the company laboratory. To be
specific, it is composed of a radio cabinet and a metal skeleton, with the former mounted on
the latter. There are four cables mounted on the four connectors. One of them has tape-around
weatherproofing with flagging tape at a certain position on the cable, whilst the other three
have the plastic type of weatherproofing, which is not considered part of the testing so far.
Moreover, the backgrounds of the test images are open source images from Google, which are
displayed on a full-scale screen behind the radio station. In this way the test set of images
is created. A sample of the test set is shown below in figure 5.1; the rightmost cable has the
tape-around type of weatherproofing with flagging tape at the top, which is the detection
target. The other three cables have the plastic type of weatherproofing; their detection is
outside the scope of the thesis and will not be discussed as part of the result. In total, the
test set contains 38 images created under the preceding rules.
Figure 5.1: Sample test image
Among all the images in the test set, the differences are the backgrounds, the positions of the
flagging tape, and the shooting angle. All the images in this test set include the faulty
weatherproofing, i.e., the flagging tape to be detected. If the program detects any faulty
weatherproofing in an image, the possible problems are displayed anyhow, even though there
might be false positives among the highlighted lines.
Performance of the system
The performance of the system is primarily based on two aspects, precision and recall. The
other indices, including the run time, are not crucial under this circumstance, but they will
still be measured and analyzed.
The precision on the test set is relatively complex to calculate. Table 2.2 shows the four
conditions of binary prediction. The precision is the fraction of relevant instances among the
retrieved instances [69]; in other words, the precision equals the true positives divided by
the predicted positives.
In every image there are a number of predicted flagging tapes, some of which are the target
object whilst some are not. Every image carries the same weight, so the precision of every
image is summed and divided by the total number of images with predictions. Images in which no
flagging tape is predicted are not included, since their precision would require division by
zero. Simply dividing all actual faulty lines by all predicted lines is not appropriate,
because the number of predicted lines varies per image; that difference would give images with
more predictions higher weight, making the precision unbalanced and unascertainable. The
formula is shown in equation 5.1.
precision = True Positive / Predicted Positive = Correctly Predicted Lines / Total Predicted Lines    (5.1)
The recall calculation is significantly simpler than the precision. Recall generally shows the sensitivity of the model: the fraction of the total amount of relevant instances that were actually retrieved [69]. According to table 2.2, recall is the ratio of true positives to actual positives. Since every picture in the test set contains flagging tape, the recall is the number of images with detected faulty weatherproofing divided by the size of the whole test set, as shown in equation 5.2.
recall = True Positive / Actual Positive = Predicted Flagging-Tape Images / Total Test Images    (5.2)
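A minimal sketch of these two metrics, assuming each test image's result is available as a (correctly predicted lines, total predicted lines) pair; the function names and data layout here are illustrative, not taken from the thesis code:

```python
def macro_precision(predictions):
    """Average per-image precision (equation 5.1), skipping images with no
    predicted lines, whose precision would require division by zero."""
    per_image = [correct / total for correct, total in predictions if total > 0]
    return sum(per_image) / len(per_image) if per_image else 0.0

def recall(predictions, num_test_images):
    """Fraction of test images in which flagging tape was correctly found
    (equation 5.2); valid because every test image contains the fault,
    so all images are actual positives."""
    detected = sum(1 for correct, total in predictions if correct > 0)
    return detected / num_test_images

preds = [(3, 5), (0, 0), (2, 2), (0, 4)]
print(macro_precision(preds))  # (0.6 + 1.0 + 0.0) / 3 images with predictions
print(recall(preds, 4))        # 2 of 4 images detected -> 0.5
```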
Although running time is not critical for this system, measuring it is still meaningful for reducing the complexity of the image processing. To reduce the deviation between individual runs, a large number of iterations is used: the whole test set is processed 100 times, i.e. 3,800 images in total, which is expected to reduce the measurement error significantly.
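The timing procedure above can be sketched as follows, where `process_image` is a hypothetical placeholder for the detection pipeline:

```python
import time

def average_runtime_per_image(process_image, images, iterations=100):
    """Process the whole test set `iterations` times and return the mean
    time per image; repetition averages out per-run deviations."""
    start = time.perf_counter()
    for _ in range(iterations):
        for img in images:
            process_image(img)
    elapsed = time.perf_counter() - start
    return elapsed / (iterations * len(images))

# 38 test images, 100 iterations -> 3,800 runs in total
avg = average_runtime_per_image(lambda img: None, list(range(38)), iterations=100)
print(f"{avg:.6f} s per image")
```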
Nevertheless, it should be pointed out that this thesis focuses only on tape-around weatherproofing. For other mounting methods, such as plastic-type weatherproofing, no performance calculation is performed in this experiment. Other forms of weatherproofing can be predicted by other means; since their shapes are well defined, their detection is better implemented by AI methods such as deep learning with CNNs, YOLO, and similar approaches. Therefore, false positives generated by detecting other forms of weatherproofing in a picture are not considered. In other words, the calculations in this paper are based only on whether faulty tape-around weatherproofing has been detected in each picture, and on the precision of the predicted lines that constitute that faulty weatherproofing.
Moreover, to improve the performance of the system, a slider is implemented in the window. The slider controls the binary threshold used when processing the image and is based on a built-in OpenCV function. The general control rule is to use a bigger threshold when the picture is too bright, so that more of the bright pixels are thresholded, and vice versa. Furthermore, with a slider, the threshold can be adapted to images beyond this test set, which improves the flexibility of the system. Although the slider is controlled by a human, it remains a meaningful reference for the algorithm once an automatic brightness detection method is implemented.
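The threshold-and-invert step the slider controls can be sketched in plain Python as follows. This is a simplified stand-in for the OpenCV calls (with OpenCV it would be a `cv2.createTrackbar` callback re-running `cv2.threshold` with `cv2.THRESH_BINARY_INV`); the names and data layout are illustrative, not the thesis code:

```python
def binarize_and_invert(gray, threshold):
    """Binary-threshold a grayscale image (nested lists of 0-255 values) and
    invert it: pixels below `threshold` become white (255) foreground, the
    rest black (0). Raising the threshold turns more pixels into foreground,
    matching the slider's control rule for bright pictures."""
    return [[255 if px < threshold else 0 for px in row] for row in gray]

img = [[10, 100, 200]]
print(binarize_and_invert(img, 50))   # [[255, 0, 0]]
print(binarize_and_invert(img, 150))  # [[255, 255, 0]] - bigger threshold, more foreground
```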
Testing environment
The tests are based on the following environment, a relatively high-end configuration.
Table 5.1: Testing Environment

Processor: Intel i7
Memory: 32GB
Operating System: Windows 10, version 1809
Programming Language: Python 3.7

5.2.2 Testing Result Under Fixed Binary Threshold
Sample output
The system highlights all predicted output in the images; a sample is shown below in figure 5.2. The red lines in the image are the predicted flagging tape. As mentioned above, the rightmost cable is the target cable (with tape-around weatherproofing), while the other cables are not. Thus, only the predicted lines on the rightmost cable affect the precision and recall. The detailed results are shown in appendix table A.1.
Figure 5.2: Result of the sample image
System performance
Table 5.2 below presents the performance of the system in terms of precision, recall, and run time. Under a suitable binary threshold, the precision reaches 59.51% and the recall 71.05%, and the average run time per image is less than 0.27 s.
Table 5.2: Test result under different binary thresholds

Binary Threshold | Precision (%) | Recall (%) | Run time (s) | Run time per image (s)
-----------------|---------------|------------|--------------|-----------------------
30               | 58.14         | 68.42      |  986.34      | 0.260
40               | 59.51         | 71.05      |  992.28      | 0.261
50               | 54.84         | 76.32      | 1012.47      | 0.266
60               | 55.93         | 73.68      | 1014.97      | 0.267
The trend and the influence of the threshold become more distinct when plotted, so the data are presented below in two separate graphs, figures 5.3 and 5.4.
[Chart: precision (%) and recall (%) versus binary threshold 30–60]
Figure 5.3: Binary threshold and precision and recall
[Chart: average run time (s) versus binary threshold 30–60]
Figure 5.4: Binary threshold and run time
5.2.3 Testing Result Under Adjustable Binary Threshold
With the slider, the performance of the system is better than with any single fixed binary threshold, since the threshold can be adjusted manually to the brightness conditions of each image. The slider system is therefore able to handle most of the test set. Two situations still cannot be predicted properly: complex, dark backgrounds that almost blend into the cable, and a grey-gradient difference between the flagging tape and the background that is too small to recognize, in which case the LSD algorithm cannot extract the relevant edges of the flagging tape. The concrete result for each image is shown in table A.5 in the appendix. The flagging-tape location is predicted correctly in 31 of the 38 images, so the recall reaches 82%. 68.5% of the predicted lines are true positives while the rest are false positives, so the precision is 68.5%; the comparison with the fixed thresholds is shown in figure 5.5.
[Chart: precision (%) and recall (%) for the slider versus fixed thresholds 30, 40, 50, and 60]
Figure 5.5: Comparison between slider and fixed threshold
5.2.4 Discussion
Figures 5.3 and 5.4 and table 5.2 illustrate how one parameter, the binary threshold, influences the performance of the system. Regarding run time, a bigger threshold increases the execution time: when the binary threshold grows, some pixels that used to be filtered as white become black, and after the inversion step these pixels are treated as white. There are consequently more white pixels to process, so the calculation time increases. The number of affected pixels is relatively small, however, which is why the increase in execution time is slight.
Considering precision and recall, the graph shows that within the appropriate range of the binary threshold, precision and recall are not positively correlated: an increase in recall often comes at the cost of precision. The test data show that 30 is not a good choice for a fixed threshold, because at 40 both precision and recall are better; 30 is simply too small for this test set.
After introducing the slider, the performance of the system improves significantly, because the binary threshold can be adjusted manually to fit the brightness of each image. The connected-component algorithm then discards the background noise more effectively. However, manual adjustment is difficult, because it is easy to overshoot and lose the actual positive result.
5.3 Summary
In conclusion, the experiments with the two methods show completely different results. YOLO, the AI-based object detection method, cannot produce a satisfying result, while the OpenCV image processing method reaches roughly 60% precision and 71% recall with an execution time of about 0.26 s per image.
With the YOLO method, the loss cannot be reduced to a low level during training, so the system cannot deliver a good result. The custom image processing method is more reliable. It should be pointed out that the precision and recall calculations disregard plastic-type weatherproofing and consider only the tape-around style; the evaluation thus concentrates on that proportion of the whole image.
Chapter 6
Conclusion and Future Work
This chapter summarizes the work and findings of the project and gives recommendations for future work in these areas, based on the results presented in this report.
6.1 Conclusion
With the progression of 5G, there is high demand for setting up radio towers and, consequently, a greater need for maintenance of the whole radio base station system. To meet this requirement, deeper research and further development are urgently needed to find better solutions that reduce human costs and prevent danger to people. The goal of this work is a system that can predict faulty tape-around weatherproofing, specifically the flagging-tape kind of fault.
Aiming to detect the faulty weatherproofing, this work first carried out extensive literature research, studying and attempting to reproduce earlier conclusions. Self-designed programs and networks were then implemented and compared. The final prototype applies and combines the line segment detector, connected components, and other image processing techniques; with this method, faulty weatherproofing (flagging tape) can be identified.
This report can therefore answer three key questions that make it possible for the company to create a detection system for this specific target (flagging tape): which methods cannot detect the flagging tape on a radio tower, which method can detect it, and how the results are tested and evaluated.
Two tracks were attempted during the thesis: AI-based object detection (YOLO) and image processing with computer vision.
In the former method, the first step is to label the data. VOC-format data is generated with the OpenLabeling tool by labelling each cable or flagging tape so that the network can recognize it. The network is then configured, and the data is fed in to train the model.
In the image processing method, resizing is implemented first to improve execution speed, and a filtering step eliminates the impact of small objects. The narrow or small reflections inside the cable are removed with the close operation of the morphological transform. To improve reliability, the system applies the LSD algorithm to extract line segments, and the connected-component method ensures that the detected segments reliably lie on the cable. Finally, the detection of flagging tape is based on the cable features: the cable contours are drawn, and the lines outside the contour are selected.
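As a rough illustration of the connected-component step in this pipeline, the following is a minimal pure-Python 4-connectivity labelling on a binary grid. It is a simplified stand-in for OpenCV's `cv2.connectedComponents`, not the thesis implementation, and the function name is ours:

```python
from collections import deque

def connected_components(binary):
    """Label 4-connected components in a binary image (nested 0/1 lists).
    In the pipeline, large components correspond to cables, so line
    segments can be kept only if they lie on such a component."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not labels[y][x]:
                current += 1                      # start a new component
                labels[y][x] = current
                queue = deque([(y, x)])
                while queue:                      # breadth-first flood fill
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return current, labels

n, lab = connected_components([[1, 1, 0],
                               [0, 0, 0],
                               [0, 1, 1]])
print(n)  # 2 separate components
```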
The test set consists of 38 images with flagging tape at a certain place and with different backgrounds, created by researchers at the company. The evaluation is based on precision, recall, and execution time. For YOLO, precision is the proportion of true positives among the predicted positives, recall is the ratio of true positives to actual positives, and IoU is the ratio of the intersection area to the union area. For the image processing method, precision is the ratio of correctly predicted lines to total predicted lines, concentrating on the tape-around weatherproofing area, while recall is the ratio of images with correctly predicted flagging tape to the total number of test images.
A comparison is made between the two methods, but the evaluation is not compared with former work, since no identical use case exists.
With the image processing program, the system reaches (precision, recall) = (59.5%, 71.1%) and (54.8%, 76.3%) under different binary threshold parameters, and the execution time is less than 0.267 s per image. The object detection approach, however, does not work well even with a state-of-the-art algorithm, YOLO: although the network parameters were changed and training was tried on a single background, the loss could not be reduced to a satisfying value. Therefore, the IoU and the confidence of the bounding box are relatively low (50% and 60% respectively), and the precision and recall are poor, which makes the result unreliable.
6.2 Limitations
The final system is based on the OpenCV method and a custom image processing algorithm. Compared with the artificial intelligence method, the algorithm presented in this thesis detects faulty tape-around weatherproofing better. However, some weaknesses and limitations remain.
• The shooting direction cannot be parallel with the cables. A drone is used to obtain the photographs and is designed to shoot horizontally, but the radio cabinet may not be installed horizontally. The program does not adjust for larger shooting angles, such as 45° to the horizontal or even vertical.
• The brightness and complexity of the background strongly affect the prediction. Although the parameters can be adjusted through the slider, doing so is relatively unreliable for non-technical personnel, because there are no specific numerical values to aim for and the adjustment easily overshoots. For example, given an image of a certain brightness in which the flagging tape is not detected with the original parameter, it is difficult for non-technical personnel to determine which parameter value is suitable for detecting the faulty weatherproofing, even if the binary image is displayed on the same screen.
• Algorithmically, the entire boundary of the cable is fit by only two rough line segments. The predicted contour therefore does not cling to the actual boundary, which affects the distance measurement between a potential flagging tape and the fitted contours and may generate many false positives.
• There are bright specks or blobs inside the cables owing to light reflection, and different brightness levels produce blobs of varying severity. These blobs can break the consistency of the cable pixels and cause errors in the image. They are filled by morphological transformation (erosion and dilation), implemented by sliding a fixed-size kernel over the image. The blobs, which lie inside the cable like small islands, are not filled adaptively; the system applies the operation with an empirically chosen kernel value.
• Limitations of the LSD algorithm. The LSD algorithm performs well at detecting all boundaries with large grey-gradient differences as potential segmentation lines, the candidates for an image. The prediction of the faulty weatherproofing lines in this thesis is based on filtering these candidates from the LSD output, so the performance is limited by whether all relevant lines are detected. The test results show that the LSD algorithm is not very effective when the background and cable colours are similar, which makes the prediction more likely to fail.
• There are only positive samples in the test set: every image contains the faulty weatherproofing at a certain location, while negative samples, images without this kind of fault, are not included. The reported precision and recall therefore cover only part of the possible situations. In principle, there is another precision measure: different from the per-image precision, the precision over the whole test set, explained in formula 6.1, is the percentage of images with correctly detected flagging tape among all images predicted to contain faulty weatherproofing, reflecting the degree of correct prediction.
precision = True Positive / Predicted Positive = Correctly Predicted Images / Total Predicted Images    (6.1)
• The execution time per image is less than 0.267 s on the PC. If the system runs on a very low-end computer or migrates to a mobile device, the execution time will be longer, making the system less efficient.
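The morphological close mentioned in the blob limitation above (dilation followed by erosion with a fixed-size kernel) can be sketched in pure Python. This toy version uses a square kernel with zero padding, so it also erodes the outer border; the real system would use something like OpenCV's `cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)`:

```python
def _window(binary, y, x, k):
    """Yield the k x k neighbourhood of (y, x), treating outside pixels as 0."""
    h, w = len(binary), len(binary[0])
    r = k // 2
    for ny in range(y - r, y + r + 1):
        for nx in range(x - r, x + r + 1):
            yield binary[ny][nx] if 0 <= ny < h and 0 <= nx < w else 0

def dilate(binary, k=3):
    """Pixel becomes 1 if any pixel in its k x k window is 1."""
    return [[1 if any(_window(binary, y, x, k)) else 0
             for x in range(len(binary[0]))] for y in range(len(binary))]

def erode(binary, k=3):
    """Pixel stays 1 only if every pixel in its k x k window is 1."""
    return [[1 if all(_window(binary, y, x, k)) else 0
             for x in range(len(binary[0]))] for y in range(len(binary))]

def close(binary, k=3):
    """Morphological close: dilation then erosion, filling holes ('islands'
    inside the cable) up to roughly the kernel size."""
    return erode(dilate(binary, k), k)

blob = [[1] * 5 for _ in range(5)]
blob[2][2] = 0               # one-pixel reflection hole in the middle
print(close(blob)[2][2])     # 1: the close fills the hole
```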
6.3 Future Work
Given these limitations, several parts of the system can be improved. In this section, the critical future work is discussed.
• The results of this experiment show that the YOLO algorithm cannot detect the labelled tapes well, so a better target detection method needs to be found or designed: a network more suitable for this use case, able to detect small objects without fixed features and shapes.
• The results show that the brightness of the light greatly affects the test results, because in an excessively dark environment the background is also taken for a cable. The resulting false positives are difficult to handle with traditional image detection. However, if the lightness of the background can be quantified, or the lightness and surface reflection of the approximate cable object can be detected, the cable position can be found more accurately.
• The current contour fitting simply uses one longer line segment to represent each side of the object, because multi-curve fitting may wrap the flagging tape inside the contour and make it undetectable. Better fitting methods can be applied in the future.
• A better line segment detection method can be used. The line detector is the foundation of the image processing, and it must be acknowledged that the LSD algorithm is very powerful. However, when background and object colours are similar and the grey gradient is not obvious, its detections are not clear enough, and some straight lines belonging to the flagging tape go undetected. A better algorithm would feed more straight lines into the subsequent processing.
• Although the program consumes little time on a well-performing PC, it still needs to be optimized before being ported to a mobile terminal. Both time complexity and space complexity need to be reduced; hence it is essential to reduce loops, the use of large arrays, and, as far as possible, the use of complex algorithms such as connected components.
• Add a gyroscope sensor to measure the shooting angle, which may be non-horizontal. Additional sensors for correcting the angle can ensure that the cable runs vertically through the entire picture during shooting.
Bibliography
[1] X. Liu, S. Wu, Y. Guo, and C. Chen, “The demand and development of internet of things for
5G: A survey,” in 2018 IEEE International Conference on Consumer Electronics-Taiwan
(ICCE-TW). IEEE, 2018, pp. 1–2.
[2] A. Karapantelakis and J. Markendahl, “Challenges for ict business development in intelligent transport systems,” in 2017 Internet of Things Business Models, Users, and Networks.
IEEE, 2017, pp. 1–6.
[3] R. Inam, A. Karapantelakis, K. Vandikas, L. Mokrushin, A. V. Feljan, and E. Fersman, “Towards automated service-oriented lifecycle management for 5G networks,” in 2015 IEEE
20th Conference on Emerging Technologies & Factory Automation (ETFA). IEEE, 2015,
pp. 1–8.
[4] A. V. Feljan and Y. Jin, “A simulation framework for validating cellular v2x scenarios,” in
IECON 2018-44th Annual Conference of the IEEE Industrial Electronics Society. IEEE,
2018, pp. 4078–4083.
[5] W. D. de Mattos and P. R. Gondim, “M-health solutions using 5G networks and m2m
communications,” IT Professional, vol. 18, no. 3, pp. 24–29, 2016.
[6] L. Grcev, A. van Deursen, and J. van Waes, “Lightning current distribution to ground at
a power line tower carrying a radio base station,” IEEE transactions on electromagnetic
compatibility, vol. 47, no. 1, pp. 160–170, 2005.
[7] S. Talwar, D. Choudhury, K. Dimou, E. Aryafar, B. Bangerter, and K. Stewart, “Enabling
technologies and architectures for 5G wireless,” in 2014 IEEE MTT-S International Microwave Symposium (IMS2014). IEEE, 2014, pp. 1–4.
[8] L. Dai, B. Wang, Y. Yuan, S. Han, I. Chih-Lin, and Z. Wang, “Non-orthogonal multiple access for 5G: solutions, challenges, opportunities, and future research trends,” IEEE
Communications Magazine, vol. 53, no. 9, pp. 74–81, 2015.
[9] “US tower fatality tracker,” http://wirelessestimator.com/content/fatalities, accessed:
2019-12-07.
[10] “In race for better cell service, men who climb towers pay with their lives,” https://www.propublica.org/article/cell-tower-fatalities, accessed: 2019-12-07.
[11] A. Håkansson, “Portal of research methods and methodologies for research projects and
degree projects,” in The 2013 World Congress in Computer Science, Computer Engineering, and Applied Computing WORLDCOMP 2013; Las Vegas, Nevada, USA, 22-25 July.
CSREA Press USA, 2013, pp. 67–73.
[12] Q. Chen, Y. Fu, W. Song, K. Cheng, Z. Lu, C. Zhang, and L. Li, “An efficient streaming
accelerator for low bit-width convolutional neural networks,” Electronics, vol. 8, no. 4, p.
371, 2019.
[13] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate
object detection and semantic segmentation,” in Proceedings of the IEEE conference on
computer vision and pattern recognition, 2014, pp. 580–587.
[14] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp.
1097–1105.
[15] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image
recognition,” arXiv preprint arXiv:1409.1556, 2014.
[16] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, “The pascal
visual object classes challenge 2007 (voc2007) results,” 2007.
[17] R. Girshick, “Fast r-cnn,” in Proceedings of the IEEE international conference on computer
vision, 2015, pp. 1440–1448.
[18] “Softmax function,” https://en.wikipedia.org/wiki/Softmax_function, accessed: 2019-11-14.
[19] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, realtime object detection,” in Proceedings of the IEEE conference on computer vision and
pattern recognition, 2016, pp. 779–788.
[20] J. Redmon and A. Farhadi, “Yolo9000: better, faster, stronger,” in Proceedings of the IEEE
conference on computer vision and pattern recognition, 2017, pp. 7263–7271.
[21] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft coco: Common objects in context,” in European conference on computer
vision. Springer, 2014, pp. 740–755.
[22] A. Veit, T. Matera, L. Neumann, J. Matas, and S. Belongie, “Coco-text: Dataset
and benchmark for text detection and recognition in natural images,” arXiv preprint
arXiv:1601.07140, 2016.
[23] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by
reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.
[24] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection
with region proposal networks,” in Advances in neural information processing systems,
2015, pp. 91–99.
[25] J. Du, “Understanding of object detection based on cnn family and yolo,” in Journal of
Physics: Conference Series, vol. 1004, no. 1. IOP Publishing, 2018, p. 012029.
[26] W. Yu, K. Yang, Y. Bai, T. Xiao, H. Yao, and Y. Rui, “Visualizing and comparing alexnet
and vgg using deconvolutional layers,” in Proceedings of the 33 rd International Conference on Machine Learning, 2016.
[27] A. Bobrovsky, M. Galeeva, A. Morozov, V. Pavlov, and A. Tsytsulin, “Automatic detection
of objects on star sky images by using the convolutional neural network,” in Journal of
Physics: Conference Series, vol. 1236, no. 1. IOP Publishing, 2019, p. 012066.
[28] Y. Bai, Y. Zhang, M. Ding, and B. Ghanem, “Sod-mtgan: Small object detection via multitask generative adversarial network,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 206–221.
[29] J. Li, X. Liang, Y. Wei, T. Xu, J. Feng, and S. Yan, “Perceptual generative adversarial
networks for small object detection,” in Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, 2017, pp. 1222–1230.
[30] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “Ssd: Single
shot multibox detector,” in European conference on computer vision. Springer, 2016, pp.
21–37.
[31] J. Redmon and A. Farhadi, “Yolov3: An incremental improvement,” arXiv, 2018.
[32] Z. Wu, C. Shen, and A. Van Den Hengel, “Wider or deeper: Revisiting the resnet model
for visual recognition,” Pattern Recognition, vol. 90, pp. 119–133, 2019.
[33] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, “Inception-v4, inception-resnet and
the impact of residual connections on learning,” in Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[34] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid
networks for object detection,” in Proceedings of the IEEE conference on computer vision
and pattern recognition, 2017, pp. 2117–2125.
[35] C. Saravanan, “Color image to grayscale image conversion,” in 2010 Second International
Conference on Computer Engineering and Applications, vol. 2. IEEE, 2010, pp. 196–199.
[36] “Find and draw contours using OpenCV | python,” https://www.geeksforgeeks.org/find-and-draw-contours-using-opencv-python/, accessed: 2019-11-14.
[37] S. Suzuki et al., “Topological structural analysis of digitized binary images by border following,” Computer vision, graphics, and image processing, vol. 30, no. 1, pp. 32–46, 1985.
[38] “Component (graph theory),” https://en.wikipedia.org/wiki/Component_(graph_theory),
accessed: 2019-11-14.
[39] W. Chen, M. L. Giger, and U. Bick, “A fuzzy c-means (fcm)-based approach for computerized segmentation of breast lesions in dynamic contrast-enhanced mr images1,” Academic
radiology, vol. 13, no. 1, pp. 63–72, 2006.
[40] K. Wu, W. Koegler, J. Chen, and A. Shoshani, “Using bitmap index for interactive exploration of large datasets,” in 15th International Conference on Scientific and Statistical
Database Management, 2003. IEEE, 2003, pp. 65–74.
[41] R. O. Duda and P. E. Hart, “Use of the hough transformation to detect lines and curves in
pictures,” Sri International Menlo Park Ca Artificial Intelligence Center, Tech. Rep., 1971.
[42] D. H. Ballard, “Generalizing the hough transform to detect arbitrary shapes,” Pattern
recognition, vol. 13, no. 2, pp. 111–122, 1981.
[43] R. G. Von Gioi, J. Jakubowicz, J.-M. Morel, and G. Randall, “LSD: a line segment detector,” Image Processing On Line, vol. 2, pp. 35–55, 2012.
[44] A. Desolneux, L. Moisan, and J.-M. Morel, “The helmholtz principle,” From Gestalt Theory to Image Analysis: A Probabilistic Approach, pp. 31–45, 2008.
[45] A. M. Algorry, A. G. García, and A. G. Wofmann, “Real-time object detection and classification of small and similar figures in image processing,” in 2017 International Conference
on Computational Science and Computational Intelligence (CSCI). IEEE, 2017, pp. 516–
519.
[46] M. Mathias, R. Timofte, R. Benenson, and L. Van Gool, “Traffic sign recognition—how far
are we from the solution?” in The 2013 international joint conference on Neural networks
(IJCNN). IEEE, 2013, pp. 1–8.
[47] J. Greenhalgh and M. Mirmehdi, “Real-time detection and recognition of road traffic
signs,” IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 4, pp. 1498–
1506, 2012.
[48] Z. Du, J. Yin, and J. Yang, “Expanding receptive field yolo for small object detection,” in
Journal of Physics: Conference Series, vol. 1314, no. 1. IOP Publishing, 2019, p. 012202.
[49] P. Du, X. Qu, T. Wei, C. Peng, X. Zhong, and C. Chen, “Research on small size object
detection in complex background,” in 2018 Chinese Automation Congress (CAC). IEEE,
pp. 4216–4220.
[50] J. Wang, S. Jiang, W. Song, and Y. Yang, “A comparative study of small object detection
algorithms,” in 2019 Chinese Control Conference (CCC). IEEE, 2019, pp. 8507–8512.
[51] X. Lu, J. Yao, K. Li, and L. Li, “Cannylines: A parameter-free line segment detector,”
in 2015 IEEE International Conference on Image Processing (ICIP). IEEE, 2015, pp.
507–511.
[52] C. Akinlar and C. Topal, “Edlines: A real-time line segment detector with a false detection
control,” Pattern Recognition Letters, vol. 32, no. 13, pp. 1633–1642, 2011.
[53] N. Hamid and N. Khan, “Lsm: perceptually accurate line segment merging,” Journal of
Electronic Imaging, vol. 25, no. 6, p. 061620, 2016.
[54] A. Mahmoud, L. Ehab, M. Reda, M. Abdelaleem, H. A. El Munim, M. Ghoneima, M. S.
Darweesh, and H. Mostafa, “Real-time lane detection-based line segment detection,” in
2018 New Generation of CAS (NGCAS). IEEE, 2018, pp. 57–61.
[55] S. Liu, L. Lu, X. Zhong, and J. Zeng, “Effective road lane detection and tracking method
using line segment detector,” in 2018 37th Chinese Control Conference (CCC). IEEE,
2018, pp. 5222–5227.
[56] Ü. Budak, U. Halıcı, A. Şengür, M. Karabatak, and Y. Xiao, “Efficient airport detection
using line segment detector and fisher vector representation,” IEEE Geoscience and Remote
Sensing Letters, vol. 13, no. 8, pp. 1079–1083, 2016.
[57] H. El Bahi and A. Zatni, “Document text detection in video frames acquired by a smartphone based on line segment detector and dbscan clustering,” Journal of Engineering Science and Technology, vol. 13, no. 2, pp. 540–557, 2018.
[58] J. Wang, Q. Qin, L. Chen, X. Ye, X. Qin, J. Wang, and C. Chen, “Automatic building
extraction from very high resolution satellite imagery using line segment detector,” in 2013
IEEE International Geoscience and Remote Sensing Symposium-IGARSS. IEEE, 2013,
pp. 212–215.
[59] L. Zhang, Y. Cheng, and Z. Zhai, “Real-time accurate runway detection based on airborne
multi-sensors fusion,” Defence Science Journal, vol. 67, no. 5, pp. 542–550, 2017.
[60] I. Gamal, A. Badawy, A. M. Al-Habal, M. E. Adawy, K. K. Khalil, M. A. El-Moursy,
and A. Khattab, “A robust, real-time and calibration-free lane departure warning system,”
Microprocessors and Microsystems, vol. 71, p. 102874, 2019.
[61] Ö. E. Yetgin and Ö. N. Gerek, “Pld: Power line detection system for aircrafts,” in 2017
International Artificial Intelligence and Data Processing Symposium (IDAP). IEEE, 2017,
pp. 1–5.
[62] J. Wang, X. Yang, X. Qin, X. Ye, and Q. Qin, “An efficient approach for automatic rectangular building extraction from very high resolution optical satellite imagery,” IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 3, pp. 487–491, 2014.
[63] L. Huang, Q. Chang, S. Chen, and H. Dai, “Line segment matching of space target image
sequence based on optical flow prediction,” in 2015 IEEE International Conference on
Progress in Informatics and Computing (PIC). IEEE, 2015, pp. 148–152.
[64] L. He, X. Ren, Q. Gao, X. Zhao, B. Yao, and Y. Chao, “The connected-component labeling
problem: A review of state-of-the-art algorithms,” Pattern Recognition, vol. 70, pp. 25–43,
2017.
[65] W.-Y. Chang, C.-C. Chiu, and J.-H. Yang, “Block-based connected-component labeling
algorithm using binary decision trees,” Sensors, vol. 15, no. 9, pp. 23 763–23 787, 2015.
[66] F. Spagnolo, F. Frustaci, S. Perri, and P. Corsonello, “An efficient connected component
labeling architecture for embedded systems,” Journal of Low Power Electronics and Applications, vol. 8, no. 1, p. 7, 2018.
[67] B. Peck and J. Mummery, “Hermeneutic constructivism: An ontology for qualitative research,” Qualitative health research, vol. 28, no. 3, pp. 389–407, 2018.
[68] K. Yilmaz, “Comparison of quantitative and qualitative research traditions: Epistemological, theoretical, and methodological differences,” European Journal of Education, vol. 48,
no. 2, pp. 311–325, 2013.
[69] “Precision and recall,” https://en.wikipedia.org/wiki/Precision_and_recall, accessed:
2019-12-04.
DocuSign Envelope ID: 0386690B-F538-4312-8D49-C49D2DEA582B
Appendix A
Table of Testing Result
Tables A.1 to A.4 show the test statistics and the per-image results for each binary threshold, including the precision of each image and the overall recall and precision.
Table A.5 shows the results when the binary threshold is instead adjusted manually with a slider.
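The aggregate figures at the bottom of each table can be reproduced from the two rightmost columns. As a reading aid, here is a minimal sketch of how recall and overall precision appear to be computed (the function and variable names are illustrative, not taken from the thesis code):

```python
def summarize(per_image):
    """per_image: one (correctly_detected, image_precision) pair per test
    image, as in the two rightmost columns of Tables A.1 to A.5."""
    detected = sum(d for d, _ in per_image)
    # Recall: fraction of images whose error status was correctly detected.
    recall = detected / len(per_image)
    # Overall precision: mean per-image precision over the correctly
    # detected images only (sum of the precision column / detected count).
    precision = sum(p for d, p in per_image if d) / detected if detected else 0.0
    return recall, precision
```

For Table A.1, for example, 26 of 38 images are correctly detected, giving a recall of 26/38 ≈ 68.42%; the precision column sums to 15.12, giving an overall precision of about 58%.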
Table A.1: Statistics of all the test set with the binary threshold of 30

Image | Predicted number of cables | Predicted lines of errors | All predicted lines in the interested area | Actual faulty lines in the interested area | Error correctly detected (1 = correct) | Precision of the image
1 | 4 | 2 | 2 | 2 | 1 | 1
2 | 3 | 6 | 0 | 0 | 0 | 0
3 | 4 | 6 | 1 | 0 | 0 | 0
4 | 4 | 9 | 2 | 1 | 1 | 0.5
5 | 4 | 8 | 4 | 2 | 1 | 0.5
6 | 4 | 7 | 3 | 1 | 1 | 0.33
7 | 4 | 7 | 4 | 2 | 1 | 0.5
8 | 4 | 5 | 3 | 2 | 1 | 0.67
9 | 4 | 8 | 6 | 2 | 1 | 0.33
10 | 4 | 11 | 6 | 0 | 0 | 0
11 | 4 | 6 | 1 | 0 | 0 | 0
12 | 4 | 9 | 6 | 0 | 0 | 0
13 | 4 | 7 | 6 | 2 | 1 | 0.33
14 | 4 | 2 | 1 | 1 | 1 | 1
15 | 4 | 8 | 1 | 1 | 1 | 1
16 | 3 | 14 | 4 | 2 | 1 | 0.5
17 | 3 | 4 | 2 | 0 | 0 | 0
18 | 4 | 8 | 5 | 3 | 1 | 0.6
19 | 4 | 5 | 3 | 0 | 0 | 0
20 | 4 | 12 | 6 | 2 | 1 | 0.33
21 | 4 | 19 | 4 | 2 | 1 | 0.5
22 | 2 | 0 | 0 | 0 | 0 | 0
23 | 4 | 13 | 0 | 0 | 0 | 0
24 | 4 | 17 | 6 | 2 | 1 | 0.33
25 | 3 | 3 | 0 | 0 | 0 | 0
26 | 4 | 5 | 5 | 2 | 1 | 0.4
27 | 4 | 3 | 1 | 1 | 1 | 1
28 | 2 | 0 | 0 | 0 | 0 | 0
29 | 4 | 6 | 5 | 3 | 1 | 0.6
30 | 4 | 7 | 5 | 3 | 1 | 0.6
31 | 4 | 10 | 4 | 3 | 1 | 0.75
32 | 4 | 5 | 4 | 3 | 1 | 0.75
33 | 4 | 4 | 4 | 3 | 1 | 0.75
34 | 4 | 5 | 4 | 1 | 1 | 0.25
35 | 4 | 5 | 3 | 1 | 1 | 0.33
36 | 4 | 5 | 2 | 0 | 0 | 0
37 | 4 | 9 | 2 | 1 | 1 | 0.5
38 | 4 | 8 | 4 | 3 | 1 | 0.75
Total | | | | | 26 | 15.12

Total time (s): 986.34
Average time (s): 0.260
Recall: 68.42%
Precision: 58.14%
Table A.2: Statistics of all the test set with the binary threshold of 40

Image | Predicted number of cables | Predicted lines of errors | All predicted lines in the interested area | Actual faulty lines in the interested area | Error correctly detected (1 = correct) | Precision of the image
1 | 4 | 2 | 2 | 2 | 1 | 1
2 | 4 | 11 | 1 | 0 | 0 | 0
3 | 4 | 8 | 2 | 0 | 0 | 0
4 | 4 | 9 | 2 | 2 | 1 | 1
5 | 4 | 7 | 3 | 2 | 1 | 0.67
6 | 4 | 5 | 2 | 1 | 1 | 0.5
7 | 4 | 7 | 4 | 2 | 1 | 0.5
8 | 4 | 7 | 4 | 2 | 1 | 0.5
9 | 4 | 8 | 5 | 2 | 1 | 0.4
10 | 4 | 13 | 6 | 0 | 0 | 0
11 | 3 | 4 | 1 | 0 | 0 | 0
12 | 4 | 7 | 3 | 1 | 1 | 0.33
13 | 4 | 9 | 6 | 2 | 1 | 0.33
14 | 4 | 1 | 1 | 1 | 1 | 1
15 | 4 | 8 | 4 | 1 | 1 | 0.25
16 | 4 | 22 | 4 | 3 | 1 | 0.75
17 | 3 | 6 | 2 | 0 | 0 | 0
18 | 4 | 8 | 5 | 3 | 1 | 0.6
19 | 5 | 8 | 6 | 0 | 0 | 0
20 | 4 | 13 | 5 | 2 | 1 | 0.4
21 | 4 | 19 | 4 | 2 | 1 | 0.5
22 | 1 | 0 | 0 | 0 | 0 | 0
23 | 4 | 11 | 0 | 0 | 0 | 0
24 | 4 | 13 | 5 | 2 | 1 | 0.4
25 | 3 | 0 | 0 | 0 | 0 | 0
26 | 4 | 1 | 1 | 1 | 1 | 1
27 | 4 | 6 | 2 | 1 | 1 | 0.5
28 | 2 | 0 | 0 | 0 | 0 | 0
29 | 4 | 6 | 4 | 3 | 1 | 0.75
30 | 4 | 7 | 5 | 3 | 1 | 0.6
31 | 4 | 10 | 4 | 3 | 1 | 0.75
32 | 4 | 5 | 4 | 3 | 1 | 0.75
33 | 4 | 5 | 4 | 3 | 1 | 0.75
34 | 4 | 5 | 4 | 1 | 1 | 0.25
35 | 4 | 8 | 4 | 2 | 1 | 0.5
36 | 4 | 2 | 2 | 0 | 0 | 0
37 | 4 | 9 | 3 | 1 | 1 | 0.33
38 | 4 | 15 | 4 | 3 | 1 | 0.75
Total | | | | | 27 | 16.07

Total time (s): 992.28
Average time (s): 0.261
Recall: 71.05%
Precision: 59.51%
Table A.3: Statistics of all the test set with the binary threshold of 50

Image | Predicted number of cables | Predicted lines of errors | All predicted lines in the interested area | Actual faulty lines in the interested area | Error correctly detected (1 = correct) | Precision of the image
1 | 4 | 3 | 2 | 2 | 1 | 1
2 | 4 | 9 | 0 | 0 | 0 | 0
3 | 4 | 9 | 4 | 2 | 1 | 0.5
4 | 4 | 6 | 3 | 2 | 1 | 0.67
5 | 4 | 7 | 3 | 2 | 1 | 0.67
6 | 4 | 2 | 0 | 0 | 0 | 0
7 | 4 | 9 | 5 | 2 | 1 | 0.4
8 | 4 | 8 | 4 | 2 | 1 | 0.5
9 | 4 | 8 | 5 | 2 | 1 | 0.4
10 | 4 | 17 | 6 | 0 | 0 | 0
11 | 3 | 11 | 6 | 1 | 1 | 0.17
12 | 4 | 7 | 3 | 1 | 1 | 0.33
13 | 4 | 11 | 7 | 2 | 1 | 0.29
14 | 4 | 6 | 4 | 2 | 1 | 0.5
15 | 4 | 9 | 5 | 1 | 1 | 0.2
16 | 4 | 16 | 4 | 3 | 1 | 0.75
17 | 1 | 0 | 0 | 0 | 0 | 0
18 | 4 | 8 | 5 | 3 | 1 | 0.6
19 | 5 | 16 | 10 | 0 | 0 | 0
20 | 4 | 4 | 3 | 2 | 1 | 0.67
21 | 4 | 21 | 3 | 2 | 1 | 0.67
22 | 3 | 29 | 4 | 2 | 1 | 0.5
23 | 4 | 0 | 0 | 0 | 0 | 0
24 | 3 | 5 | 3 | 1 | 1 | 0.33
25 | 2 | 0 | 0 | 0 | 0 | 0
26 | 4 | 2 | 1 | 1 | 1 | 1
27 | 4 | 7 | 2 | 1 | 1 | 0.5
28 | 2 | 0 | 0 | 0 | 0 | 0
29 | 4 | 6 | 4 | 3 | 1 | 0.75
30 | 4 | 7 | 5 | 3 | 1 | 0.6
31 | 4 | 11 | 4 | 3 | 1 | 0.75
32 | 4 | 7 | 4 | 3 | 1 | 0.75
33 | 4 | 5 | 4 | 3 | 1 | 0.75
34 | 4 | 7 | 4 | 1 | 1 | 0.25
35 | 4 | 5 | 3 | 1 | 1 | 0.33
36 | 1 | 0 | 2 | 0 | 0 | 0
37 | 4 | 6 | 3 | 1 | 1 | 0.33
38 | 4 | 7 | 4 | 3 | 1 | 0.75
Total | | | | | 29 | 15.90

Total time (s): 1012.47
Average time (s): 0.266
Recall: 76.32%
Precision: 54.84%
Table A.4: Statistics of all the test set with the binary threshold of 60

Image | Predicted number of cables | Predicted lines of errors | All predicted lines in the interested area | Actual faulty lines in the interested area | Error correctly detected (1 = correct) | Precision of the image
1 | 4 | 4 | 2 | 2 | 1 | 1
2 | 4 | 6 | 1 | 1 | 1 | 1
3 | 4 | 8 | 4 | 2 | 1 | 0.5
4 | 4 | 6 | 3 | 2 | 1 | 0.67
5 | 4 | 6 | 3 | 2 | 1 | 0.67
6 | 3 | 2 | 0 | 0 | 0 | 0
7 | 4 | 10 | 7 | 2 | 1 | 0.29
8 | 5 | 8 | 4 | 2 | 1 | 0.5
9 | 4 | 7 | 5 | 2 | 1 | 0.4
10 | 4 | 19 | 10 | 0 | 0 | 0
11 | 4 | 14 | 10 | 1 | 1 | 0.1
12 | 4 | 7 | 6 | 1 | 1 | 0.17
13 | 4 | 5 | 3 | 2 | 1 | 0.67
14 | 4 | 8 | 4 | 2 | 1 | 0.5
15 | 5 | 10 | 7 | 0 | 0 | 0
16 | 4 | 9 | 4 | 3 | 1 | 0.75
17 | 3 | 14 | 8 | 0 | 0 | 0
18 | 4 | 13 | 8 | 3 | 1 | 0.38
19 | 4 | 2 | 0 | 0 | 0 | 0
20 | 4 | 10 | 5 | 2 | 1 | 0.4
21 | 4 | 28 | 4 | 2 | 1 | 0.5
22 | 3 | 25 | 4 | 1 | 1 | 0.25
23 | 4 | 1 | 0 | 0 | 0 | 0
24 | 3 | 4 | 2 | 0 | 0 | 0
25 | 2 | 0 | 0 | 0 | 0 | 0
26 | 4 | 2 | 1 | 1 | 1 | 1
27 | 4 | 9 | 2 | 1 | 1 | 0.5
28 | 1 | 0 | 0 | 0 | 0 | 0
29 | 4 | 8 | 5 | 3 | 1 | 0.6
30 | 4 | 7 | 4 | 3 | 1 | 0.75
31 | 4 | 11 | 6 | 3 | 1 | 0.5
32 | 4 | 7 | 4 | 3 | 1 | 0.75
33 | 4 | 5 | 3 | 3 | 1 | 1
34 | 4 | 4 | 2 | 1 | 1 | 0.5
35 | 4 | 5 | 3 | 1 | 1 | 0.33
36 | 1 | 0 | 0 | 0 | 0 | 0
37 | 4 | 7 | 4 | 1 | 1 | 0.25
38 | 4 | 7 | 4 | 3 | 1 | 0.75
Total | | | | | 28 | 15.66

Total time (s): 1014.97
Average time (s): 0.267
Recall: 73.68%
Precision: 55.93%
Table A.5: Statistics with slider

Image | Error correctly detected (1 = correct) | Precision of the image
1 | 1 | 1
2 | 1 | 1
3 | 1 | 0.5
4 | 1 | 1
5 | 1 | 0.67
6 | 1 | 0.5
7 | 1 | 0.5
8 | 1 | 0.67
9 | 1 | 0.4
10 | 0 | 0
11 | 1 | 0.17
12 | 1 | 0.33
13 | 1 | 0.67
14 | 1 | 1
15 | 1 | 1
16 | 1 | 0.75
17 | 0 | 0
18 | 1 | 0.6
19 | 0 | 0
20 | 1 | 0.67
21 | 1 | 0.67
22 | 1 | 0.5
23 | 0 | 0
24 | 1 | 0.4
25 | 0 | 0
26 | 1 | 1
27 | 1 | 1
28 | 0 | 0
29 | 1 | 0.75
30 | 1 | 0.75
31 | 1 | 0.75
32 | 1 | 0.75
33 | 1 | 1
34 | 1 | 0.5
35 | 1 | 0.5
36 | 0 | 0
37 | 1 | 0.5
38 | 1 | 0.75
Total | 31 | 21.23

Recall: 81.58%
Precision: 68.49%
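The slider lets the operator choose the binary threshold per image instead of fixing it at one of the values 30 to 60. The thresholding step itself can be sketched with NumPy as below; in an OpenCV pipeline this would correspond to cv2.threshold re-run from a cv2.createTrackbar callback (an assumption, since the thesis code is not shown):

```python
import numpy as np

def binarize(gray, threshold):
    """Binarize a grayscale image: pixels brighter than the threshold
    become foreground (255), the rest become background (0)."""
    return np.where(gray > threshold, 255, 0).astype(np.uint8)

# Moving the slider simply re-runs binarize with the new threshold value.
```

Because the best threshold varies between images, the manually tuned run reaches a higher recall and precision than any single fixed threshold in Tables A.1 to A.4.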
Appendix B
Testing Result by Image Processing
The following images in Appendix B are all the test-set images, with the potentially detected flagging tapes marked by red lines. The three cables on the left are wrapped with plastic-type weatherproofing and are not counted in the result statistics. The rightmost cable carries the flagging tape, which is the detection target.
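The red marking of a detected line segment can be sketched as follows. This is an illustrative stand-in using plain NumPy; the marking in the thesis pipeline would more likely use an OpenCV drawing routine such as cv2.line:

```python
import numpy as np

def mark_segment(img, p0, p1, color=(255, 0, 0)):
    """Draw a straight segment from p0 to p1 (row, col) onto an RGB image
    by sampling one point per pixel step along it."""
    n = int(max(abs(p1[0] - p0[0]), abs(p1[1] - p0[1]))) + 1
    for t in np.linspace(0.0, 1.0, n):
        r = int(round(p0[0] + t * (p1[0] - p0[0])))
        c = int(round(p0[1] + t * (p1[1] - p0[1])))
        img[r, c] = color  # paint the pixel with the marking color
    return img
```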
TRITA-EECS-EX-2020:38
www.kth.se