A weld seam feature real-time extraction method of three typical welds
based on target detection
Liangyuan Deng 1, Ting Lei 1, Chaoqun Wu *, Yibo Liu, Shiyu Cao, Song Zhao
School of Mechanical and Electrical Engineering, Wuhan University of Technology, Wuhan 430070, Hubei Province, China
Hubei Province Engineering Research Center of Robot & Intelligent Manufacturing, Wuhan 430070, Hubei Province, China
ARTICLE INFO

Keywords:
Weld seam feature point extraction
Laser stripe feature point
Target detection
CenterNet

ABSTRACT
To realize robust, widely adaptable and highly precise weld seam feature detection, an extraction method based on the improved target detection model CenterNet is proposed. In this target detection method, the weld feature points on the laser stripe are taken as the center points of the bounding boxes, which eliminates the need for an initial positioning frame. When dealing with multiple welds, an independent classifier is used to predict the weld type, which can avoid false detection. In post-processing, this result is used to determine the number of feature points so as to filter out suspicious targets. The method was tested with a welding image dataset of three typical welds. The classification accuracy for the three welds reaches 99.359 %, with an average extraction error of 1.754 pixels and an average processing time of 32.186 ms. The proposed method exhibits high performance in dealing with common welding images with time-varying noise.
1. Introduction
With the development of automation, welding robots are increas­
ingly used in industrial manufacturing [1,2]. Traditional robot welding
based on teach-playback usually ignores thermal deformation, assem­
bling clearance and workpiece size errors, thus leading to the deviation
of welding trajectory [3–5]. To solve this problem, laser vision sensors
are applied to welding robots for real-time welding guidance [6,7].
Such robot technology has significantly improved the accuracy of
automatic welding, owing to the application of laser vision with ad­
vantages of easy realization and high precision [8]. The key of this
technology lies in the detection algorithm of laser vision, which can
locate the weld seam through image processing during welding [9,10].
Li et al. [11] used a median filtering method to reduce image noise, which enabled extraction of the center line of the laser stripe; the feature points could then be extracted according to a gradient analysis method. After that, Muhammad et al. [12] proposed to enhance the laser stripes in the image to strengthen the effect of median filtering, so that the weld feature points could be obtained by a feature extraction algorithm. To further reduce the noise, Li et al. [13] captured the weld contour from the image through Kalman filtering and least squares fitting, and then located the specific position of the weld by matching it with a defined model. Xiao et al. [14] improved the snake model by designing an energy function suitable for the laser stripe characteristics, which was used as a stripe extractor; the feature points could then be identified according to the curvature of the stripes. To improve robustness, Wang et al. [15] proposed to use the information from the previous frame to guide the processing of the current one, resulting in a NURBS-snake model to detect the
center of the laser line. Considering that the images in the welding
process are continuous, the target tracking method has also been applied
to the extraction of weld feature points [16,17]. Yang et al. [18]
developed a seam tracking algorithm based on kernel correlation filtering, which realized fast and accurate tracking
of multiple welds. Zou et al. [19] combined convolution filters and
morphology to realize the tracking of weld feature points.
Although morphological methods can be used to extract feature points from welding images, a specific design is necessary for each application, due to their poor generalization ability. Moreover, the morphological methods have three shortcomings, namely poor extraction robustness under strong interference, weak adaptability to multiple weld types and difficulty in improving accuracy [20].
In recent years, the vision detection algorithm based on the deep
learning neural network has attracted more and more attention, due to
* Corresponding author at: School of Mechanical and Electrical Engineering, Wuhan University of Technology, Wuhan 430070, Hubei Province, China.
E-mail address: chaoqunwu@whut.edu.cn (C. Wu).
1 The first authors are Liangyuan Deng and Ting Lei.
https://doi.org/10.1016/j.measurement.2022.112424
Received 16 September 2022; Received in revised form 28 December 2022; Accepted 29 December 2022
Available online 4 January 2023
0263-2241/© 2023 Elsevier Ltd. All rights reserved.
Fig. 1. Robot welding experimental platform with laser vision sensor.
Fig. 2. Examples of three typical welds and their corresponding welding images included in the dataset: (a), (b) and (c) are the fillet weldment, half V-groove weldment and V-groove weldment, (d), (e) and (f) are the schematic views of the three weldments, and (g), (h) and (i) are the corresponding weld images for the three types of welds.
the development of computer technology. Compared with the tradi­
tional morphological methods, the data-driven deep learning algorithm
has higher extraction robustness, stronger generalization capability and
higher accuracy in processing images against strong noises [21].
Meanwhile, the number of the weld types to be detected can be easily
increased by expanding the types of images. Therefore, deep learning
algorithms have been widely employed to extract weld seam feature
points. For example, Wu et al. [22] built a semantic segmentation
network by using the improved VGGNet (Visual Geometry Group
Network) to extract the edge features of laser stripes. Then, they used the
gray centroid method and B-spline method to extract the feature points.
Zhao et al. [23] also adopted the task model of semantic segmentation
and chose the ERFNet network. Instead of using morphological pro­
cessing to obtain feature points, they obtained them through semantic
segmentation. Similarly, Wang et al. [24] reported a new semantic
segmentation model and proposed an attention mechanism. Compared
with other segmentation results, the accuracy and speed have been
greatly increased. Considering the influence of strong noise, Xu et al. [25] proposed a method to repair welding image results by using a GAN
network, leading to DCFnet that could be used to track feature points. In
addition to the semantic segmentation method, target detection method
is another strategy for the feature point extraction of welds. For instance,
Xiao et al. [26] proposed an adaptive method for a variety of typical
welds, in which morphological methods are used to detect feature points
in the region of interest extracted by target detection. Zou et al. [27]
combined convolutional filter and deep learning methods to design a
two-stage tracker to solve the drift problem in tracking. In this case,
rough positions of feature points are first obtained through the convo­
lution filter, and then the results are optimized using the designed deep
reinforcement learning network.
Among the methods, the semantic segmentation and object detection
methods usually need to cooperate with morphological processing to get
the weld feature points. As a result, there are two sources of error, from the deep learning model and from the morphological processing. In this regard, it is necessary to design two processing algorithms, which is complicated and time-consuming. The target tracking method requires high precision in processing the first frame and a correct time sequence of the processed images. Considering these problems, in this article, we propose a feature point extraction model for weld seams based on CenterNet [28,29]. The key point detection technology can be used to directly process welding images with strong noise and obtain the exact positions of the weld feature points. More importantly, temporal ordering of the input images is not required.
The content is arranged as follows. In Section 2, the robotic automatic welding platform is first introduced (Section 2.1) and then the construction of the dataset is described (Section 2.2). The proposed model is explained in Section 3, covering the key point detection method (Section 3.1), the network structure (Section 3.2), and the loss function for training (Section 3.3). In Section 4, the proposed model is trained and tested (Section 4.1) and then a feature point extraction method based on semantic segmentation from the literature is reproduced as a comparison (Section 4.2). Section 5 summarizes the work and Section 6 briefly introduces future work.
2 Experimental platform and dataset

2.1 Experimental platform

In this paper, an experimental platform, consisting of a robot with its controller, welding equipment, a structured light sensor, and a computer terminal, was set up, as shown in Fig. 1. In this system, the computer terminal is used as the upper computer to control the image acquisition of the vision sensor and to process the welding images. At the same time, it also communicates with the robot control cabinet. The vision sensor is composed of an industrial camera, a laser, a protective glass, and a shell, with the first three installed in the shell. The robot is responsible for the movement task, while the welding action is completed by the welding machine and the welding gun at the end of the robot arm. An ABB IRB6700 industrial robot was adopted and the welding equipment is from Aotai. The industrial camera is a Daheng Mercury series camera with a resolution of 2448 × 2048 pixels.

2.2 Welding image dataset

Deep learning algorithms are primarily data-driven, and model training requires a large amount of annotated data. In this work, the three types of weldments shown in Fig. 2(a), (b) and (c) were chosen as test objects, and images of them during the welding process were acquired (Fig. 2(g), (h) and (i)). These images are all grayscale images gathered by using the structured light sensor. After the image acquisition is completed, the welding image needs to be labeled with two kinds of information: the weld type and the coordinates of the weld feature points. These feature points are defined as the turning points of the laser line on the weldment contour. The feature points of the three types of welds are illustrated in Fig. 2(d), (e) and (f); they are marked with red circles in the schematic cross-sectional view of each weldment. The fillet weld seam has only one feature point at the position of the weld, while the half V-groove and V-groove weld seams also include the endpoints of the groove as feature points. When marking, these feature points are only marked with coordinates and not classified; all feature points of the three types of weld seam are regarded as the same type. The position labeling of the weld feature points is completed with the open-source labeling software Labelme. The captured weld image was imported into the software and the position of the weld features in the image to be labeled was zoomed in to the pixel level. The center line of the laser stripe is then manually located and the feature points are marked using the mouse. Eventually, all annotation information, including image information and annotation data, is transcribed consistently into a text file, which is convenient for subsequent training and testing. The final dataset contains a total of 2370 pictures, including 919 fillet weld seams, 702 half V-groove weld seams, and 749 V-groove weld seams.

3 Extraction method of weld feature points

In order to extract weld feature points in welding images without relying on morphological methods and image continuity, a single-stage detection network is designed based on the target detection network CenterNet. CenterNet is a powerful anchor-free target detection network, which differs from the anchor-box mode used in traditional target detection networks [28]. The method does not need to generate a large number of anchor boxes as candidate targets; instead, it directly predicts the center point of the bounding box and then regresses the size of the bounding box. Therefore, in this method, the weld feature point is taken as the center point of the target and the prediction of the size of the bounding box is eliminated.
3.1 Key point detection method based on Gaussian heat map
The Gaussian heat map is widely used to detect the target positions of key points in an image. This method was initially applied to human pose estimation [30,31]. Then, it was introduced for target detection by Law et al. [32] and Zhou et al. [33]. Zhou et al. [28] proposed a more concise detection method that can be used to directly detect the center point coordinates of the bounding box.

Fig. 4. Overall layout of the proposed network and specific structure of each module: (a) overall network structure, (b) encoder, (c) decoder, (d) prediction branch and (e) legend.
Compared with other commonly used regression methods, the label and output of the Gaussian heat map method are not the coordinates of the key points. Instead, the heat map reflects the probability of the position of the feature points, given by $\hat{Y}_{(x,y)} \in [0,1]^{W \times H \times C}$, where W × H is the size of the output feature map and C is the number of keypoint types. These pixels are filtered by a set probability threshold, and the final filtering result gives the predicted feature point coordinates. Nevertheless, it is challenging to accurately predict the position of a pixel in a feature map of H × W size. In most cases, the target point cannot be accurately defined by a single pixel position, which means it is difficult to mark the feature point. Since the points near the target point are generally highly similar to the target point, marking them directly as negative samples would interfere with the network training. Therefore, in the method that detects key points through the Gaussian heat map, an additional heat map generated by the Gaussian function is included as a soft label. These soft labels offer directional guidance to the training of the network: the shorter the distance to the target point, the greater the activation value. In this way, the network can reach the target point quickly and converge faster.

Fig. 3. Three types of labels used when detecting key points by heat maps: (a) original image, (b) ground truth, (c) soft label, (d) offset label.

The visualization of the two types of labels needed when training the heat map is shown in Fig. 3. Fig. 3(a) shows the original image that is labeled, with size H × W, while Fig. 3(b) and Fig. 3(c) show its corresponding ground truth and soft label visualization. For a feature point with coordinates (x, y), its coordinates on the output heat map of size $H_{heatmap} \times W_{heatmap}$ can be calculated as $\left(x_{heatmap} + \Delta x_{offset},\; y_{heatmap} + \Delta y_{offset}\right)$, where $x_{heatmap}$ and $y_{heatmap}$ are the integer parts and $\Delta x_{offset}$ and $\Delta y_{offset}$ are the fractional parts:

$$\begin{cases} x_{heatmap} + \Delta x_{offset} = \dfrac{x}{W} \times W_{heatmap} \\ y_{heatmap} + \Delta y_{offset} = \dfrac{y}{H} \times H_{heatmap} \end{cases} \tag{1}$$
The ground truth label corresponding to this feature point is a tensor,
whose shape is consistent with the heat map. This tensor has a value of 1 at $\left(x_{heatmap}, y_{heatmap}\right)$, while the rest of the tensor is 0. It is visualized as a mask map, as shown in Fig. 3(b), with only bright spots at the feature points. The corresponding soft label is similar, as shown in Fig. 3(c), but the difference is that the mask part of this label is a Gaussian circle with radius r centered at $\left(x_{heatmap}, y_{heatmap}\right)$, whereas the activation value of
this mask region is the maximum at the center of the circle and decays to
the boundary along the radius. The function that generates the Gaussian
circle is [28]:
$$Y_{xyc} = \exp\left(-\frac{\left(x-\tilde{p}_x\right)^{2} + \left(y-\tilde{p}_y\right)^{2}}{2\sigma_p^{2}}\right) \tag{2}$$
where x and y are the coordinates of the target point, $\tilde{p}_x$ and $\tilde{p}_y$ are the coordinates of the center of the circle, and $\sigma_p$ is set to the radius of the Gaussian circle. In general target detection methods, the radius is calculated from the size of the bounding box. However, in weld inspection, it is not necessary to predict the size of the bounding box. Therefore, a uniform value of 3 is used, which is half of the estimated thickness of the laser line in the input image.

Fig. 5. Results of two image processing methods for welding images with strong noise: (a) original image, (b) laser stripe centerline, (c) feature maps.

Table 1
Parameters of each layer of the three detection branches.

Branch      Layer        Kernel size   Channels
Weld type   Conv1        1 × 1         C
            Global pool  H × W         –
Heat map    Conv1        3 × 3         64
            BN           –             64
            ReLU         –             –
            Conv2        1 × 1         1
Offset      Conv1        3 × 3         64
            BN           –             64
            ReLU         –             –
            Conv2        1 × 1         2
Because the detection method of the Gaussian heat map is not an end-to-end model, accuracy is lost if the output feature point coordinates are restricted to integers. In addition, the feature maps are smaller than the original input image because they are compressed by feature extraction,
which also leads to a large coordinate deviation. Therefore, an extra
coordinate offset is predicted outside the heat map, so that an accurate
prediction of the coordinate of the weld feature points can be achieved
through the combination. This offset is the compensation for the decimal
places of the coordinates of the feature points after the image is scaled.
The label used for the offset detection branch in model training is a
tensor with two channels of the same size as the heat map, where the two channels correspond to the x axis and the y axis. Similar to the labels of the heat map, the tensor representing the offset labels has values only at the integer coordinates $\left(x_{heatmap}, y_{heatmap}\right)$ of the feature points and the rest is 0. The difference is that the values on the two channels of the offset labels are $\Delta x_{offset}$ and $\Delta y_{offset}$. It is visualized as shown in Fig. 3(d).

Fig. 6. Training and testing loss curves.
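As an illustration of how the ground truth, soft label and offset label described above can be generated, the following sketch builds them for one image following Eqs. (1) and (2). It is a minimal example: the function name, argument names and default radius handling are illustrative, not taken from the authors' code.

```python
import numpy as np

def make_heatmap_labels(points, in_h, in_w, heat_h, heat_w, radius=3):
    """Build the hard ground truth, Gaussian soft label (Eq. (2)) and
    offset label (fractional parts from Eq. (1)) for one welding image."""
    ground_truth = np.zeros((heat_h, heat_w), dtype=np.float32)
    soft_label = np.zeros((heat_h, heat_w), dtype=np.float32)
    offset = np.zeros((2, heat_h, heat_w), dtype=np.float32)

    ys, xs = np.mgrid[0:heat_h, 0:heat_w]      # coordinate grid of the heat map
    sigma = float(radius)                      # sigma_p taken as the Gaussian radius

    for x, y in points:                        # (x, y) in input-image coordinates
        # Eq. (1): map to heat-map resolution, split integer / fractional parts
        hx, hy = x / in_w * heat_w, y / in_h * heat_h
        ix, iy = int(hx), int(hy)
        ground_truth[iy, ix] = 1.0             # hard label: 1 only at the keypoint
        offset[0, iy, ix] = hx - ix            # delta x
        offset[1, iy, ix] = hy - iy            # delta y

        # Eq. (2): Gaussian circle centred on the keypoint, keep element-wise maximum
        gauss = np.exp(-((xs - ix) ** 2 + (ys - iy) ** 2) / (2.0 * sigma ** 2))
        soft_label = np.maximum(soft_label, gauss)

    return ground_truth, soft_label, offset
```

Consistent with Section 2.2, a fillet weld image would pass a single point to this routine, a half V-groove image two points, and a V-groove image three.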
3.2 Feature point detection network based on Gaussian heat map
The proposed deep learning model refers to the classical target
detection method CenterNet, with the overall structure of the model shown in Fig. 4(a). The input image accepted by the
network is a welding image with a size of 3 × 512 × 512. The image is
encoded and decoded to extract a feature map with 64 channels and a
size of 128 × 128. Then, the results are sent to three independent branches to predict the weld type, the coordinates and the offset, thus obtaining the coordinates of the weld feature points on the image.

Fig. 7. Feature point extraction results of three types of welds by using the model: (a) fillet weld seam, (b) half V-groove weld seam and (c) V-groove weld seam.

Table 2
MAE, ME and RMSE of different types of welds on the X-axis and Y-axis.

Type            Accuracy    X-MAE     Y-MAE     X-ME      Y-ME      X-RMSE    Y-RMSE
                            (pixel)   (pixel)   (pixel)   (pixel)   (pixel)   (pixel)
Fillet          99.456 %    1.781     1.472     26.965    10.811    3.856     2.461
Half V-groove   99.288 %    1.997     1.439     9.129     7.636     2.379     1.819
V-groove        99.332 %    2.611     1.220     12.169    11.327    3.190     1.623
Total           99.359 %    2.130     1.377     26.965    11.327    3.221     2.021

In the method utilizing morphology, when processing an image with strong noise, as shown in Fig. 5(a), it is usually necessary to use various denoising methods to obtain an image containing only laser stripes, as shown in Fig. 5(b); the positions of the feature points can then be calculated. In contrast, in image processing employing deep learning, the input image is commonly processed directly through the feature extraction network to obtain a set of low-resolution feature maps, as shown in Fig. 5(c), which are then transferred to subsequent modules for processing. Because the feature extraction network plays an important role in excluding noise and extracting key information, it is necessary to select a good and fast feature extraction network.

In our proposed model, in order to ensure that the Gaussian heat map detection has the desired accuracy, an encoding and decoding structure has been used for feature extraction. When using a Gaussian heat map to predict the coordinates of feature points, a low resolution of the feature map will lead to large errors, so that large compensation values are required. However, if the resolution is too high, the model becomes difficult to converge and the accuracy is affected; at the same time, the number of model parameters increases and the running speed drops. Therefore, the adopted encoding and decoding structure first down-samples the input image by a factor of 32, obtaining a feature map whose size is 1/32 of the original image. Then, the resolution is restored to 1/4 of the original image by 8× up-sampling. In this process, the encoding part is the backbone of the feature extraction, while the decoding part is mainly used to restore the resolution.

To meet the requirements of accuracy and real-time performance, DenseNet was selected as the encoder. The network has a good effect on feature extraction through the use of a dense connection mode, and can effectively reduce the number of network parameters and increase the running speed [34]. The main structure of DenseNet is shown in Fig. 4(b), which is mainly composed of dense blocks and bottlenecks. In a dense block module, the input to each convolutional layer is the output of all the layers before it plus the original input. This operation requires the feature maps to have a consistent size within each dense block. The function of the bottleneck layer between dense blocks is to reduce the number of feature maps and the computation of the network. Through this dense connection mode, DenseNet successfully addressed the problem of vanishing gradients in network training, improved the performance of the network and reduced the number of parameters. In the decoding part, because the main function is to upgrade the resolution, there is no need for too many operations. Therefore, 8× up-sampling can be achieved by three deconvolution operations, as shown in Fig. 4(c).

After obtaining the feature map output of the encoding and decoding structure, the weld type, heat map, and offset need to be predicted through three independent branches, as shown in Fig. 4(d). The purpose of designing the three branches is to obtain the feature points in the welding image more accurately. When the feature points are located by the Gaussian heat map, generally, the size of the output heat map is H × W × C, and the feature points are filtered from the heat map by setting a threshold value. This method can ensure maximum screening of possible targets, but in the feature point detection of welds, there will not be multiple types of welds in one picture, and the number of feature points of each weld type is determined. To ensure accurate detection of the required number of feature point coordinates, the prediction of the weld type is included as a constraint. At the same time, the feature points of different weld types can be regarded as having the same feature, that is, the inflection point of the laser stripe. Under these two conditions, it can be considered that the feature points in all weld images belong to the same category. Therefore, the number of channels in the predicted output heat map can be set to 1 with a size of H × W, which only represents the probability that each point on the feature map is a feature point of the weld, but does not include the prediction of the category of the feature point. Such heat maps are no longer used to predict the target category, but only to obtain the location information of the target. Then a global offset prediction is used to improve the accuracy of feature point coordinate detection. Although this offset is a global prediction, only the one corresponding to the feature point will be selected according to the result of the heat map in the post-processing.

The three branches are mainly composed of convolution layers. The specific parameters of each branch are listed in Table 1. The weld type prediction branch is composed of a convolution layer and a global pooling layer [35]. Global pooling is a pooling calculation in which the size of the sliding window is the same as the size of the input feature map. By using this pooling method, the input H × W × C sized feature map can be converted to a C × 1 sized output, where H and W are the height and width of the feature maps and C represents the number of weld types.

Fig. 8. Test error curves of the feature points extracted with the improved CenterNet: (a) and (b) are the errors of fillet weld on the X-axis and Y-axis, respectively; (c) and (d) are the errors of half V-groove weld on the X-axis and Y-axis, respectively; (e) and (f) are the errors of V-groove weld on the X-axis and Y-axis, respectively.

The convolution layer combined
with the global pooling operation can replace the fully connected layer,
which can allow the size of the input image to be unlimited. The pre­
dicted output of the weld type is a vector with a size of C × 1, which
represents the one-hot encoding of the weld type. The branch of the
feature point heat map consists of two convolution layers, with which
the input feature maps are calculated to obtain a heat map with the
shape of 1 × H × W. The structure of the coordinate offset branch is the
same as that of the heat map branch, with a difference in the parameters
of the convolution layer. The output of the coordinate offset branch is a
tensor with a shape of 2 × H × W. This result includes the compensation
value for each point in the figure. The convolution operations used in the
three branches have the same step size. However, in the branches of the
heat map and offset, in order to ensure that the size of the convolved
feature map remains unchanged, it is necessary to use padding. In comparison, the weld type branch has no such
requirement.
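To make the decoder and the three branches of Fig. 4 and Table 1 concrete, a minimal PyTorch sketch is given below. The kernel sizes and channel counts of the branches follow Table 1, while the input channel width and the intermediate widths of the three deconvolutions are assumptions, since they are not specified in the text.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Three deconvolutions restore the 1/32 encoder output to 1/4 resolution (8x)."""
    def __init__(self, in_ch=1024, out_ch=64):
        super().__init__()
        layers, ch = [], in_ch
        for nxt in (256, 128, out_ch):                      # assumed intermediate widths
            layers += [nn.ConvTranspose2d(ch, nxt, 4, stride=2, padding=1),
                       nn.BatchNorm2d(nxt), nn.ReLU(inplace=True)]
            ch = nxt
        self.up = nn.Sequential(*layers)

    def forward(self, x):
        return self.up(x)

class PredictionBranches(nn.Module):
    """Weld type, heat map and offset heads with the parameters of Table 1."""
    def __init__(self, in_ch=64, num_types=3):
        super().__init__()
        self.cls = nn.Conv2d(in_ch, num_types, kernel_size=1)   # 1x1 conv to C channels
        self.pool = nn.AdaptiveAvgPool2d(1)                     # global pooling, C x 1 output
        self.heat = nn.Sequential(                              # 3x3 conv + BN + ReLU, then 1x1 to 1 channel
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.BatchNorm2d(64),
            nn.ReLU(inplace=True), nn.Conv2d(64, 1, 1))
        self.offset = nn.Sequential(                            # same structure, 1x1 to 2 channels
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.BatchNorm2d(64),
            nn.ReLU(inplace=True), nn.Conv2d(64, 2, 1))

    def forward(self, feat):
        weld_type = self.pool(self.cls(feat)).flatten(1)        # B x C class logits
        heat = torch.sigmoid(self.heat(feat))                   # B x 1 x H x W probabilities
        return weld_type, heat, self.offset(feat)               # offset: B x 2 x H x W
```

With a 64-channel, 128 × 128 feature map from the decoder, the branches return the weld-type logits, the single-channel heat map and the global offset map described above.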
3.3 Selection of loss function for training
The loss function is used to calculate the error of the detection result
of the training model and the ground truth. When training the deep
learning model, selecting an appropriate loss function can effectively
accelerate the convergence speed and improve the accuracy of the
model. Since the weld key point detection has three outputs, different
loss functions are selected according to their characteristics. The loss of
the entire network is calculated as the sum of the losses of the three outputs. Considering that the number of positive samples in the heat map is much smaller than that of negative samples, the focal loss function is used to obtain better results [32]:

$$L_{heatmap} = -\frac{1}{N}\sum_{c=1}^{C}\sum_{i=1}^{H}\sum_{j=1}^{W}\begin{cases}\left(1-p_{cij}\right)^{\alpha}\log\left(p_{cij}\right) & \text{if } y_{cij}=1\\ \left(1-y_{cij}\right)^{\beta}\left(p_{cij}\right)^{\alpha}\log\left(1-p_{cij}\right) & \text{otherwise}\end{cases} \tag{3}$$

where N is the number of samples, C is the number of categories in the sample, and H and W are the dimensions of the heat map. $p_{cij}$ and $y_{cij}$ respectively represent the predicted score and the ground truth at position (i, j), while $\alpha$ and $\beta$ are the parameters used to control the weights.

The offset prediction is a regression task, using the commonly used L1 loss function [36]:

$$L_{offset} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{L1Loss}\left(y_i, \hat{y}_i\right) = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right| \tag{4}$$

where N is the number of samples, and $y_i$ and $\hat{y}_i$ are the predicted value and the labeled value, respectively.

The multi-category cross-entropy function is used to predict the welding category, given by:

$$L_{class} = -\sum_{i=0}^{C-1} y_i \log\left(p_i\right) = -\log\left(p_{true}\right) \tag{5}$$

where C is the number of categories in the sample, $y_i$ is the label value indicating whether the target belongs to category i (1 or 0), and $p_i$ is the probability predicted by the model that the target belongs to category i. Expanding this equation reduces to the right-hand side: the loss is essentially the negative logarithm of the probability predicted by the model for the category ($p_{true}$) to which the target really belongs.

The loss values of the three parts together constitute the loss of the whole network:

$$L = L_{class} + L_{heatmap} + L_{offset} \tag{6}$$

where $L_{class}$ is the predicted loss of the weld type output by the network, $L_{heatmap}$ is the loss of the output heat map, and $L_{offset}$ is the loss of the offset of the feature points.

Fig. 9. Time-consuming curve of image processing.

Table 3
Statistical results of the 5-fold cross-validation test.

Fold                 Accuracy    X-MAE     Y-MAE     X-RMSE    Y-RMSE
                                 (pixel)   (pixel)   (pixel)   (pixel)
1                    99.359 %    2.130     1.377     3.221     2.021
2                    99.789 %    2.090     1.437     3.209     2.204
3                    99.367 %    2.506     1.517     3.887     2.271
4                    99.577 %    2.152     1.263     3.519     1.973
5                    99.788 %    2.722     1.472     3.810     2.200
Mean                 99.576 %    2.320     1.411     3.529     2.134
Standard deviation   0.0019      0.250     0.088     0.284     0.11551
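The three loss terms of Eqs. (3)–(6) can be combined as in the sketch below. The focal-loss weights α and β are left as arguments because their values are not reported here (2 and 4 are the typical CenterNet defaults), and the tensor shapes follow the branch outputs of Section 3.2.

```python
import torch
import torch.nn.functional as F

def total_loss(pred_heat, pred_offset, pred_cls,
               gt_heat, soft_label, gt_offset, gt_cls,
               alpha=2.0, beta=4.0, eps=1e-6):
    """L = L_class + L_heatmap + L_offset (Eqs. (3)-(6)), written for one batch."""
    pos = gt_heat.eq(1.0).float()                       # hard positives (B x 1 x H x W)
    n = pos.sum().clamp(min=1.0)                        # number of feature points

    p = pred_heat.clamp(eps, 1.0 - eps)
    pos_term = ((1.0 - p) ** alpha) * torch.log(p) * pos
    neg_term = ((1.0 - soft_label) ** beta) * (p ** alpha) * torch.log(1.0 - p) * (1.0 - pos)
    l_heat = -(pos_term + neg_term).sum() / n           # Eq. (3): focal loss on the heat map

    l_offset = torch.abs(pred_offset - gt_offset)       # Eq. (4): L1, supervised only at keypoints
    l_offset = (l_offset * pos).sum() / n

    l_class = F.cross_entropy(pred_cls, gt_cls)         # Eq. (5): -log(p_true)

    return l_class + l_heat + l_offset                  # Eq. (6)
```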
4 Experiment and analysis

In this section, the dataset created in Section 2.2 was divided into 5 equally sized groups for 5-fold cross-validation, following the K-fold cross-validation method. In total, five training and testing sessions were conducted, i.e., the five groups were rotated as the test set while the other four groups were used for training. Because the results of the 5 folds are similar, only the first training and testing results are presented in detail in this section to save space; for the remaining four folds, only the statistical results are shown.
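The fold rotation described above can be arranged, for example, as follows; the dataset path and the helper that would run one training/testing session are placeholders.

```python
import glob
from sklearn.model_selection import KFold

image_paths = sorted(glob.glob("weld_dataset/images/*.png"))   # hypothetical layout

kfold = KFold(n_splits=5, shuffle=True, random_state=0)        # 5 equally sized groups
for fold, (train_idx, test_idx) in enumerate(kfold.split(image_paths), start=1):
    train_files = [image_paths[i] for i in train_idx]          # 4 groups for training
    test_files = [image_paths[i] for i in test_idx]            # 1 group rotated as the test set
    # run_one_session(train_files, test_files)                 # placeholder training/testing call
    print(f"fold {fold}: {len(train_files)} training / {len(test_files)} test images")
```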
Fig. 10. The weld seam images and corresponding semantic segmentation labels: (a) welding images of three types of welds, (b) corresponding ground truth.
Fig. 11. Test results with DeepLabV3plus and YOLOv5: (a), (b) and (c) are results of DeepLabV3plus separating the laser stripes in the weld images, and (d), (e) and (f) are false detections of YOLOv5.
Table 4
Statistical results of feature point extraction errors of the two methods.

Method          Type            X-MAE     Y-MAE     X-ME      Y-ME      X-RMSE    Y-RMSE
                                (pixel)   (pixel)   (pixel)   (pixel)   (pixel)   (pixel)
DeepLabV3plus   Fillet          10.758    7.001     91.941    97.836    20.398    16.866
                Half V-groove   5.734     6.850     78.8453   80.171    13.931    18.581
                V-groove        17.428    8.948     97.590    80.171    30.706    16.968
                Total           10.028    7.313     97.590    97.836    20.147    17.687
YOLOv5 a        Fillet          1.947     2.503     50.499    14.000    5.99      3.097
                Half V-groove   1.271     1.836     8.000     6.001     1.826     2.152
                V-groove        1.100     2.209     9.015     7.170     1.642     2.332
                Total           1.439     2.183     50.499    14.000    2.971     2.432

a In the test results with YOLOv5, the data of false detection are removed.
4.1 Model training and testing
The network is implemented using the PyTorch deep learning framework based on Python. The dataset collected and produced with the robot welding experimental platform described above was used for the training and testing of the model. The machine used to train the model was a computer equipped with two Nvidia 2080Ti GPUs with 12G RAM and an Intel I9-10 CPU. With this machine, the designed model was trained and tested using the weld image dataset. For the initialization of the model parameters, the pre-trained DenseNet121 network model from the PyTorch framework library was selected to initialize the feature extraction part of the network structure, in order to accelerate the training convergence of the model. In actual operation, because only DenseNet121 was used as the feature extraction structure, the decision-making (classification) layers were removed from the imported DenseNet121 parameters. In the key point detection part, the initial parameters were given by the He initialization method, with the biases initialized to 0. The Adam optimizer was used for training, with default settings for the learning rate and weight decay instead of manual tuning.
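A rough reproduction of this setup is sketched below, under the assumption that the torchvision DenseNet121 feature extractor stands in for the encoder; the detection part is represented by a small placeholder module, and only the initialization and optimizer choices described above are shown.

```python
import torch
import torch.nn as nn
from torchvision import models

# Pre-trained DenseNet121: keep the feature extractor, discard the classification layer
backbone = models.densenet121(pretrained=True).features

# Placeholder for the decoder and the three prediction branches of Section 3.2
detector = nn.Sequential(
    nn.ConvTranspose2d(1024, 64, 4, stride=2, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 1, kernel_size=1))

# He (Kaiming) initialization for the key point detection part, biases set to 0
for m in detector.modules():
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
        if m.bias is not None:
            nn.init.zeros_(m.bias)

# Adam with default learning rate and weight decay, as stated above
optimizer = torch.optim.Adam(list(backbone.parameters()) + list(detector.parameters()))
```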
The model was trained for 150 epochs on the produced training dataset; the training loss, which reflects the variation of the model's loss on the training set, is shown in Fig. 6. During training, validation on the test set was performed every 10 epochs, thus obtaining the test loss curve, also illustrated in Fig. 6. Because these tests were not involved in back-propagation, the training of the model was not affected. The loss curve of the test set was used as an aid to determine the convergence of the model, while the model with the lowest test loss is generally considered to be successfully trained. According to Fig. 6, after the 100th epoch, the training loss tended to be constant while the test loss started to increase. It can be considered that the model had undergone enough iterations and started overfitting. Therefore, the model of the 100th training epoch was selected for the subsequent tests.

To evaluate the performance of the model, it was tested with the test dataset. Fig. 7 shows the inspection results for each of the three weld seams obtained by using the model. The areas where the feature points are located are marked with red bounding boxes and the types of weld are indicated. The locations of the feature points are marked with red crosses. In addition, the X-axis and Y-axis in the figure indicate the directions along which the two errors, i.e., the subsequently reported feature point coordinate detection errors, are measured.

The trained model was tested on the test set, and the results of weld type and feature point extraction were analyzed. The detection results of the model were compared with the labels, and the prediction accuracy of the weld type was calculated as listed in Table 2. At the same time, the extraction error of the weld feature points relative to the labeled coordinates at the detection image resolution (2448 × 2048) was calculated. The mean absolute error (MAE), the maximum error (ME) and the root mean square error (RMSE) were statistically obtained, as summarized in Table 2. The MAE is calculated as:
$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right| \tag{7}$$
where N is the number of test samples, and $y_i$ and $\hat{y}_i$ are the labeled true coordinates of the feature points and the coordinates predicted by the model, respectively. This equation was used to calculate the average error of the model in the detection of feature point coordinates. Meanwhile, the RMSE was used to evaluate the error that may be generated by the model, which is calculated as:

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2}} \tag{8}$$

Fig. 12. Test error curves of feature points extracted by using the semantic segmentation method: (a) and (b) are the errors of fillet weld on the X-axis and Y-axis, respectively; (c) and (d) are the errors of half V-groove weld on the X-axis and Y-axis, respectively; (e) and (f) are the errors of V-groove weld on the X-axis and Y-axis, respectively.

Fig. 13. Test error curves of feature points extracted by YOLOv5 with false detections removed: (a) and (b) are the errors of fillet weld on the X-axis and Y-axis, respectively; (c) and (d) are the errors of half V-groove weld on the X-axis and Y-axis, respectively; (e) and (f) are the errors of V-groove weld on the X-axis and Y-axis, respectively.

Fig. 8 shows the error curves. The three groups respectively correspond to the coordinate error curves of the feature points extracted from the test images of fillet weld, half V-groove weld and V-groove weld, on the X-axis and Y-axis. The absolute error is the absolute difference in the x or y direction between the coordinates of the feature points extracted in the test and their corresponding labeled ones. The MAE is the average of the absolute errors of all feature points for a given type of weld in the test, which is drawn as a red horizontal line and labeled with the average value. The points marked in the plots are the locations with the largest error for that type of image in the test. According to the definition used in labeling, the welding images of fillet weld, half V-groove weld and V-groove weld contain 1, 2 and 3 feature points, respectively. These feature points were all calculated and counted to identify the errors. It is found that sufficiently high performance on the test set has been achieved using the trained model. The prediction accuracy of weld types is 99.359 %, thus ensuring accurate classification of the three types of welds. In the extraction of weld feature points, the
overall MAE on the X-axis and Y-axis is 2.130 pixels and 1.377 pixels, respectively. Also, the MAE of each type of weld is less than 3 pixels.
In robot real-time automatic welding, time consumption is an
important evaluation index. In real-time welding trajectory tracking, to
ensure real-time performance, it is necessary to reduce the time of
inference as much as possible. To calculate the processing time of the
model, it was tested for 100 times with a single image, resulting in the
time-consuming curve, as shown in Fig. 9. The maximum time con­
sumption in the whole reasoning process is 65.470 ms, the minimum
time consumption is 26.072 ms and the average time consumption is
32.186 ms, showing promising real-time performance, which meets the requirement for real-time weld seam tracking.
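Per-image inference time of this kind can be measured with a loop such as the following; the model and input are placeholders, and `torch.cuda.synchronize()` would be needed around the timed call when running on a GPU.

```python
import time
import torch

model = torch.nn.Conv2d(3, 1, 3, padding=1).eval()   # placeholder for the trained network
image = torch.randn(1, 3, 512, 512)                  # one welding image, 3 x 512 x 512

times = []
with torch.no_grad():
    for _ in range(100):                             # repeat the single-image test 100 times
        start = time.perf_counter()
        _ = model(image)
        times.append((time.perf_counter() - start) * 1000.0)

print(f"max {max(times):.3f} ms, min {min(times):.3f} ms, mean {sum(times)/len(times):.3f} ms")
```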
The final results of the 5-fold cross-validation experiment are listed
in Table 3, in terms of the values of MAE and RMSE on the X-axis and Y-
axis for each fold of the tested model. The mean and standard deviation
of the results were also calculated. In the X and Y directions, the MAE of the 5-fold cross-validation is 2.320 pixels and 1.411 pixels, while the RMSE is 3.529 pixels and 2.134 pixels. These values fluctuate within a narrow range and satisfy the needs of the application. The results of the five tests have a low standard deviation, and the variations of MAE and RMSE are less than 1 pixel, indicating that the model is also stable. Overall, the model showed good stability and satisfactory performance.
4.2 Comparative experiment of feature points extraction
In this section, two methods are used for comparison. The first one is
a combination of semantic segmentation with morphological processing
to extract weld feature points [24]. In this case, the semantic
Fig. 14. Histograms of MAE and RMSE for the three methods: (a) Mean absolute error, (b) Root mean square error.
segmentation network (DeepLabV3plus [37]) is used to separate the
laser stripes from the weld image and then a morphological approach is
employed to obtain the coordinates of the feature points. In the second
method, the target detection network YOLOv5 is utilized, with detection achieved by using the weld feature points as the centers of the bounding boxes. The deep learning models used in the two methods were trained and tested. The first method requires new labels for the welded images because of the semantic segmentation network. Fig. 10 shows the semantic segmentation labels of the welding images, with subfigure (a) showing the original images and subfigure (b) their corresponding semantic segmentation labels.
The two methods were trained and tested in the same environment,
with representative test results shown in Fig. 11. Fig. 11(a), (b) and (c) show the results of the semantic segmentation model for separating laser stripes. It is observed that the segmented laser stripes are incomplete, which affects the subsequent morphological processing to a certain degree. As seen in Fig. 11(d), (e) and (f), YOLOv5 produces false detections. In Fig. 11(d), the edge of the weldment is misidentified as a feature point, in Fig. 11(e), a spatter is misidentified as a V-groove weld, and in Fig. 11(f), a redundant target is misidentified near a feature point. In
other words, other features similar to feature points in the weld image
might be incorrectly detected as feature points. This is because the two
types of features may have similar confidence levels, making it difficult
to eliminate them by setting a threshold. Statistical results of the two methods
are listed in Table 4. For clear observation, only the correct detection
results are retained. The average processing times of semantic segmen­
tation and YOLOv5 are 58.993 ms and 40.271 ms, respectively.
Error curves of the test data with the two methods are shown in
Fig. 12 and Fig. 13. In Fig. 13, the false detection error of YOLOv5 is not
included. It is observed that the comprehensive performance of the se­
mantic segmentation method is the lowest among the three methods.
The MAE of this method is the largest, while large deviations are often
present. The two methods based on target detection have relatively high
performance. In the method proposed in this work, the error on the Y-axis is smaller than that on the X-axis, while the opposite result is observed
when using YOLOv5. Therefore, in comparison with YOLOv5, the
method proposed in this work is weaker on the X-axis and stronger on
the Y-axis. Meanwhile, CenterNet and YOLOv5 are only slightly
different, when the error of false detection is not included.
To demonstrate the impact of false detection on accuracy, a statis­
tical histogram was drawn, as shown in Fig. 14, in which the false detection data were not removed. YOLOv5 has no false detections for the half V-groove weld, which is consistent with the result shown in Fig. 13(b). However, for fillet and V-groove weld seams, the accuracy is seriously affected by the false detections. In particular, the MAE reached
32.369 pixels, while the RMSE is 233.538 pixels. Therefore, false
detection should be avoided as much as possible.
The reason for the poor performance of the semantic segmentation
method could be attributed to the fact that the process of extracting
feature points is not sufficiently direct. The two steps of the method
introduced errors twice, i.e., the error of separating laser stripes and the
error of locating feature points. Moreover, the difficulty in separating
the laser stripes inevitably led to large errors in localizing the feature
points. Therefore, this method is ineffective in dealing with the data in
this work. In contrast to YOLOv5, the proposed method in this work has
no false detections, owing to the addition of a weld type constraint. This
constraint limited the number of detected feature points, thus ensuring
that only the closest feature points are retained when there are similar
features.
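In code, the weld-type constraint amounts to keeping only as many heat-map peaks as the predicted type allows before applying the probability threshold. The following decoding sketch assumes the point counts of Section 2.2 and an illustrative threshold value; it is not the authors' implementation.

```python
import torch

POINTS_PER_TYPE = {"fillet": 1, "half_v_groove": 2, "v_groove": 3}   # from Section 2.2

def decode(heat, offset, weld_type, threshold=0.3):
    """Keep the N strongest heat-map peaks for the predicted weld type.

    heat   : (H, W) tensor of keypoint probabilities
    offset : (2, H, W) tensor of sub-pixel compensations (dx, dy)
    Returns a list of (x, y) coordinates at heat-map resolution."""
    n = POINTS_PER_TYPE[weld_type]
    scores, idx = heat.flatten().topk(n)              # strongest candidate locations
    points = []
    for score, i in zip(scores.tolist(), idx.tolist()):
        if score < threshold:                         # suspicious target: filtered out
            continue
        y, x = divmod(i, heat.shape[1])
        points.append((x + offset[0, y, x].item(),    # add the predicted fractional offset
                       y + offset[1, y, x].item()))
    return points
```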
5 Conclusions
(1) An experimental platform consisting of a robot, a laser vision
sensor, welding equipment and a computer terminal, has been
built to collect welding images of three types of welds. The ac­
quired images were labeled by an annotation tool and welding
image dataset was constructed.
(2) A method of extracting weld feature points was proposed based
on CenterNet, in which the weld feature points were regarded as
the center points of the bounding box for direct detection. It can
be used to deal with disordered images without the assistance of
morphological methods and prior frames.
(3) The method was trained and tested with a dataset that was built
using the three types of weld images. The classification accuracy
for the three types of welds reached 99.359 %. Mean absolute
errors for fillet, half V-groove and V-groove weld seams, are
1.627 pixels, 1.718 pixels, and 1.9155 pixels, respectively. In addition, the corresponding root mean square errors are 3.159, 2.099, and 2.407 pixels, respectively. Moreover, the processing time for
a single image is 32.186 ms. Therefore, the model has sufficiently
high accuracy, stability, and real-time performance.
6 Future development
Firstly, the welding image dataset should be expanded by adding
new weld types. Secondly, this method could be expanded to multi-layer
and multi-pass welds. Thirdly, it is desirable to add new constraints to
further increase accuracy and stability of the model used for continuous
images. Last but not least, real application of the method for weld
tracking should be explored.
CRediT authorship contribution statement
Liangyuan Deng: Methodology, Writing – original draft, Software.
Ting Lei: Writing – review & editing, Conceptualization, Project
administration. Chaoqun Wu: Writing – review & editing, Resources,
Project administration, Supervision. Yibo Liu: Data curation. Shiyu
Cao: Writing – review & editing. Song Zhao: Resources.
Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The authors do not have permission to share data.

Acknowledgments

This research is supported by the National Natural Science Foundation of China (52275506) and the Fund Project of the Technology Transfer Center of Wuhan University of Technology in Jingmen (WHUTJMZX-2022JJ-10).

References
[1] C. Liu, H. Wang, Y. Huang, Y. Rong, J. Meng, G. Li, G. Zhang, Welding seam recognition and tracking for a novel mobile welding robot based on multi-layer sensing strategy, Meas. Sci. Technol. 33 (2022), 055109.
[2] T. Lei, Y. Rong, H. Wang, Y. Huang, M. Li, A review of vision-aided robotic welding, Comput. Ind. 123 (2020), 103326.
[3] T. Lei, W. Wang, Y. Rong, P. Xiong, Y. Huang, Cross-lines laser aided machine vision in tube-to-tubesheet welding for welding height control, Opt. Laser Technol. 121 (2020), 105796.
[4] C. Wu, P. Yang, T. Lei, D. Zhu, Q. Zhou, S. Zhao, A teaching-free welding position guidance method for fillet weld based on laser vision sensing and EGM technology, Optik 262 (2022), 169291.
[5] T. Lei, C. Wu, H. Yu, Electric Arc Length Control of Circular Seam in Welding Robot Based on Arc Voltage Sensing, IEEE Sens. J. 22 (2022) 3326–3333.
[6] P. Yang, T. Lei, C. Wu, S. Zhao, J. Hu, A Fast Calibration of Laser Vision Robotic Welding Systems Using Automatic Path Planning, IEEE Trans. Instrum. Meas. 71 (2022) 1–10.
[7] L. Yang, Y. Liu, J. Peng, Advances techniques of the structured light sensing in intelligent welding robots: a review, Int. J. Adv. Manuf. Technol. 110 (2020) 1027–1046.
[8] W. Huang, R. Kovacevic, Development of a real-time laser-based machine vision system to monitor and control welding processes, Int. J. Adv. Manuf. Technol. 63 (2012) 235–248.
[9] X. Lu, D. Gu, Y. Wang, Y. Qu, C. Qin, F. Huang, Feature Extraction of Welding Seam Image Based on Laser Vision, IEEE Sens. J. 18 (2018) 4715–4724.
[10] C. Wu, J. Hu, T. Lei, P. Yang, S. Gu, Research on robust laser vision feature extraction method for fillet welds with different reflective materials under uncertain interference, Opt. Laser Technol. 158 (2023), 108866.
[11] L. Li, L. Fu, X. Zhou, X. Li, Image processing of seam tracking system using laser vision, in: Robotic Welding, Intelligence and Automation, Springer, 2007, pp. 319–324.
[12] J. Muhammad, H. Altun, E. Abo-Serie, A robust butt welding seam finding technique for intelligent robotic welding system using active laser vision, Int. J. Adv. Manuf. Technol. 94 (2016) 13–29.
[13] X. Li, X. Li, S.S. Ge, M.O. Khyam, C. Luo, Automatic Welding Seam Tracking and Identification, IEEE Trans. Ind. Electron. 64 (2017) 7261–7271.
[14] R. Xiao, Y. Xu, Z. Hou, C. Chen, S. Chen, A feature extraction algorithm based on improved Snake model for multi-pass seam tracking in robotic arc welding, J. Manuf. Process. 72 (2021) 48–60.
[15] N. Wang, K. Zhong, X. Shi, X. Zhang, A robust weld seam recognition method under heavy noise based on structured-light vision, Rob. Comput. Integr. Manuf. 61 (2020), 101821.
[16] Y. Zou, Y. Wang, W. Zhou, X. Chen, Real-time seam tracking control system based on line laser visions, Opt. Laser Technol. 103 (2018) 182–192.
[17] Y. Zou, X. Chen, G. Gong, J. Li, A seam tracking system based on a laser vision sensor, Measurement 127 (2018) 489–500.
[18] L. Yang, E. Li, T. Long, J. Fan, Z. Liang, A High-Speed Seam Extraction Method Based on the Novel Structured-Light Sensor for Arc Welding Robot: A Review, IEEE Sens. J. 18 (2018) 8631–8641.
[19] Y. Zou, T. Chen, Laser vision seam tracking system based on image processing and continuous convolution operator tracker, Opt. Lasers Eng. 105 (2018) 141–149.
[20] A. Singh, V. Kalaichelvi, R. Karthikeyan, Application of Convolutional Neural Network for Classification and Tracking of Weld Seam Shapes for TAL Brabo Manipulator, Mater. Today: Proc. 28 (2020) 491–497.
[21] Y. Zou, W. Zhou, Automatic seam detection and tracking system for robots based on laser vision, Mechatronics 63 (2019), 102261.
[22] K. Wu, T. Wang, J. He, Y. Liu, Z. Jia, Autonomous seam recognition and feature extraction for multi-pass welding based on laser stripe edge guidance network, Int. J. Adv. Manuf. Technol. 111 (2020) 2719–2731.
[23] Z. Zhao, J. Luo, Y. Wang, L. Bai, J. Han, Additive seam tracking technology based on laser vision, Int. J. Adv. Manuf. Technol. 116 (2021) 197–211.
[24] B. Wang, F. Li, R. Lu, X. Ni, W. Zhu, Weld Feature Extraction Based on Semantic Segmentation Network, Sensors (Basel) 22 (2022) 4130.
[25] F. Xu, H. Zhang, R. Xiao, Z. Hou, S. Chen, Autonomous weld seam tracking under strong noise based on feature-supervised tracker-driven generative adversarial network, J. Manuf. Process. 74 (2022) 151–167.
[26] R. Xiao, Y. Xu, Z. Hou, C. Chen, S. Chen, An adaptive feature extraction algorithm for multiple typical seam tracking based on vision sensor in robotic arc welding, Sens. Actuators A 297 (2019), 111533.
[27] Y. Zou, T. Chen, X. Chen, J. Li, Robotic seam tracking system combining convolution filter and deep reinforcement learning, Mech. Syst. Sig. Process. 165 (2022), 108372.
[28] X. Zhou, D. Wang, P. Krähenbühl, Objects as points, arXiv preprint arXiv:1904.07850, 2019.
[29] X. Zhou, V. Koltun, P. Krähenbühl, Tracking Objects as Points, European Conference on Computer Vision (ECCV), 2020, pp. 474–490.
[30] J. Tompson, A. Jain, Y. LeCun, C. Bregler, Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation, Adv. Neural Inf. Proces. Syst. (2014) 1799–1807.
[31] A. Newell, K. Yang, J. Deng, Stacked hourglass networks for human pose estimation, European Conference on Computer Vision (ECCV), Springer, 2016, pp. 483–499.
[32] H. Law, J. Deng, CornerNet: Detecting objects as paired keypoints, European Conference on Computer Vision (ECCV), 2018, pp. 734–750.
[33] X. Zhou, J. Zhuo, P. Krähenbühl, Bottom-up Object Detection by Grouping Extreme and Center Points, Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 850–859.
[34] G. Huang, Z. Liu, L. Van Der Maaten, K.Q. Weinberger, Densely connected convolutional networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.
[35] M. Lin, Q. Chen, S. Yan, Network in network, arXiv preprint arXiv:1312.4400, 2013.
[36] R. Girshick, Fast R-CNN, Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.
[37] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 801–818.