Measurement 207 (2023) 112424

A weld seam feature real-time extraction method of three typical welds based on target detection

Liangyuan Deng 1, Ting Lei 1, Chaoqun Wu *, Yibo Liu, Shiyu Cao, Song Zhao
School of Mechanical and Electrical Engineering, Wuhan University of Technology, Wuhan 430070, Hubei Province, China
Hubei Province Engineering Research Center of Robot & Intelligent Manufacturing, Wuhan 430070, Hubei Province, China

* Corresponding author at: School of Mechanical and Electrical Engineering, Wuhan University of Technology, Wuhan 430070, Hubei Province, China. E-mail address: chaoqunwu@whut.edu.cn (C. Wu).
1 The first authors are Liangyuan Deng and Ting Lei.
https://doi.org/10.1016/j.measurement.2022.112424
Received 16 September 2022; Received in revised form 28 December 2022; Accepted 29 December 2022; Available online 4 January 2023.

Keywords: Weld seam feature point extraction; Laser stripe feature point; Target detection; CenterNet

ABSTRACT

To realize strongly robust, widely adaptable and highly precise weld seam feature detection, an extraction method based on the improved target detection model CenterNet is proposed. In this target detection method, the weld feature points on the laser stripe are taken as the center points of the bounding box, which eliminates the need for an initial positioning frame. When dealing with multiple welds, an independent classifier is used to predict the weld type, which can avoid false detection. In post-processing, this result is used to determine the number of feature points to filter out suspicious targets. The method was tested with a welding image dataset of three typical welds. The classification accuracy for the three welds reaches 99.359 %, with an average extraction error of 1.754 pixel and an average processing time of 32.186 ms. The proposed method exhibits high performance in dealing with common welding images with time-varying noise.

1. Introduction

With the development of automation, welding robots are increasingly used in industrial manufacturing [1,2]. Traditional robot welding based on teach-playback usually ignores thermal deformation, assembling clearance and workpiece size errors, thus leading to deviation of the welding trajectory [3–5]. To solve this problem, laser vision sensors are applied to welding robots for real-time welding guidance [6,7]. Such robot technology has significantly improved the accuracy of automatic welding, owing to the application of laser vision with its advantages of easy realization and high precision [8]. The key to this technology lies in the detection algorithm of laser vision, which locates the weld seam through image processing during welding [9,10].

Li et al. [11] used a median filtering method to reduce image noise, thus enabling extraction of the center line of the laser stripe; the feature points could then be extracted according to the gradient analysis method. After that, Muhammad et al. [12] proposed to increase the effect of median filtering by enhancing the laser stripes in the image, so as to obtain the weld feature points with the feature extraction algorithm. To further reduce the noise, Li et al. [13] captured the weld contour from the image through Kalman filtering and least squares fitting, and then obtained the specific position of the weld by matching it with the defined model. Xiao et al. [14] improved the snake model by designing an energy function suited to the laser stripe characteristics, which was used as a stripe extractor; the feature points could then be identified according to the curvature of the stripes. To improve the robustness, Wang et al. [15] proposed to use the information from the previous frame to guide the processing of the current one, resulting in a NURBS-snake model to detect the center of the laser line.
Considering that the images in the welding process are continuous, the target tracking method has also been applied to the extraction of weld feature points [16,17]. Yang et al. [18] developed a seam tracking algorithm based on the kernel correlation filtering algorithm, which realized fast and accurate tracking of multiple welds. Zou et al. [19] combined convolution filters and morphology to realize the tracking of weld feature points. Although morphological methods can be used to extract feature points from welding images, a specific design is necessary for each application, owing to their poor generalization ability. Moreover, the morphological methods have three shortcomings, namely poor extraction robustness under strong interference, weak capability in handling multiple welds, and difficulty in improving accuracy [20].

In recent years, vision detection algorithms based on deep learning neural networks have attracted more and more attention, owing to the development of computer technology. Compared with the traditional morphological methods, the data-driven deep learning algorithm has higher extraction robustness, stronger generalization capability and higher accuracy in processing images against strong noise [21]. Meanwhile, the number of weld types to be detected can be easily increased by expanding the types of images. Therefore, deep learning algorithms have been widely employed to extract weld seam feature points. For example, Wu et al. [22] built a semantic segmentation network by using an improved VGGNet (Visual Geometry Group Network) to extract the edge features of laser stripes, and then used the gray centroid method and the B-spline method to extract the feature points. Zhao et al. [23] also adopted the semantic segmentation task model and chose the ERFNet network; instead of using morphological processing to obtain feature points, they obtained them through semantic segmentation. Similarly, Wang et al. [24] reported a new semantic segmentation model and proposed an attention mechanism; compared with other segmentation results, the accuracy and speed were greatly increased. Considering the influence of strong noise, Xu et al.
[25] proposed a method to repair welding images by using a GAN network, leading to DCFnet, which could be used to track feature points. In addition to the semantic segmentation method, the target detection method is another strategy for the feature point extraction of welds. For instance, Xiao et al. [26] proposed an adaptive method for a variety of typical welds, in which morphological methods are used to detect feature points in the region of interest extracted by target detection. Zou et al. [27] combined a convolutional filter and deep learning methods to design a two-stage tracker to solve the drift problem in tracking. In this case, rough positions of the feature points are first obtained through the convolution filter, and then the results are optimized using the designed deep reinforcement learning network.

Among these methods, the semantic segmentation and object detection methods usually need to cooperate with morphological processing to get the weld feature points. As a result, there could be two sources of error, from the deep learning model and from the morphological processing. In this regard, it is necessary to design two processing algorithms, which is complicated and time-consuming. The target tracking method requires high precision for the first frame image processing and a suitable time sequence of the processed images. Considering these problems, in this article we propose a feature point extraction model for weld seams based on CenterNet [28,29]. The key point detection technology can be used to directly process welding images with strong noise and obtain the exact position of the weld feature points. More importantly, input timing of the images is not required.

The content is arranged as follows. In Section 2, the robotic automatic welding platform is first introduced (Section 2.1) and then the dataset is constructed (Section 2.2). The proposed model is explained in Section 3, covering the key point detection method (Section 3.1), the network structure (Section 3.2), and the loss function for training (Section 3.3). In Section 4, the proposed model is trained and tested (Section 4.1) and then the feature point extraction method based on semantic segmentation in the literature is reproduced as a comparison (Section 4.2). Section 5 summarizes the work and Section 6 gives a brief introduction to future work.

2 Experimental platform and dataset

2.1 Experimental platform

Fig. 1. Robot welding experimental platform with laser vision sensor.

In this paper, an experimental platform consisting of a robot with its controller, welding equipment, a structured light sensor, and a computer terminal was set up, as shown in Fig. 1. In this system, the computer terminal is used as the upper computer to control the image acquisition of the vision sensor and process the welding image. At the same time, it also communicates with the robot control cabinet. The vision sensor is composed of an industrial camera, a laser, a protective glass, and a shell; the first three are installed on the shell. The robot is responsible for the movement task, while the welding action is completed by the welding machine and the welding gun at the end of the robot arm. ABB's IRB6700 industrial robot was adopted, and the welding equipment is from Aotai. The industrial camera is Daheng's Mercury series, with a resolution of 2448 × 2048 pixel.
2.2 Welding image dataset

Fig. 2. Examples of three typical welds and their corresponding welding images included in the dataset: (a), (b) and (c) are the fillet weldment, half V-groove weldment and V-groove weldment, (d), (e) and (f) are the schematic views of the three weldments, and (g), (h) and (i) are the corresponding weld images for the three types of welds.

Deep learning algorithms are primarily data-driven, and model training requires a large amount of annotated data. In this work, the three types of weldments shown in Fig. 2(a), (b) and (c) were chosen as test objects, and images of them during the welding process were acquired (Fig. 2(g), (h) and (i)). These images are all grayscale images gathered by using the structured light sensor. After the image acquisition is completed, the welding image needs to be labeled with two contents: the weld type and the coordinates of the weld feature points on it. These feature points are defined as the turning points of the laser line on the weldment contour. The feature points of the three types of welds are illustrated in Fig. 2(d), (e) and (f); they are marked with red circles in the schematic cross-sectional view of each weldment. The fillet weld seam has only one feature point at the position of the weld, while the half V-groove and V-groove weld seams also include the endpoints of the groove as feature points. When marking, these feature points are only marked with coordinates and not classified; all feature points of the three types of weld seam are regarded as the same type.

The position labeling of the weld feature points is completed by applying the open-source labeling software Labelme. The captured weld image was imported into the software, and the region of the weld features to be labeled was zoomed in to the pixel level. The center line of the laser stripe is then manually located and the feature points are marked using the mouse. Eventually, all annotation information is transcribed consistently into a text file, including the image information and the annotation data, which is convenient for subsequent training and testing. The final dataset contains a total of 2370 pictures, including 919 fillet weld seams, 702 half V-groove weld seams, and 749 V-groove weld seams.
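The paper does not specify the exact layout of this annotation text file, so the following minimal loading sketch assumes a hypothetical one-line-per-image format (image path, weld type, then the labeled points); the WeldSample container and field order are illustrative only, not the authors' format.

```python
# Minimal sketch of loading the annotation text file described above.
# Assumed (hypothetical) line format: "image_path;weld_type;x1,y1;x2,y2;..."
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WeldSample:
    image_path: str                      # path of the captured welding image
    weld_type: str                       # "fillet", "half_v_groove" or "v_groove"
    points: List[Tuple[float, float]]    # labeled feature points in pixels


def load_annotations(txt_path: str) -> List[WeldSample]:
    samples = []
    with open(txt_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            fields = line.split(";")
            image_path, weld_type = fields[0], fields[1]
            points = [tuple(map(float, p.split(","))) for p in fields[2:]]
            samples.append(WeldSample(image_path, weld_type, points))
    return samples
```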
3 Extraction method of weld feature points

In order to extract weld feature points in welding images without relying on morphological methods or image continuity, a single-stage detection network is designed based on the target detection network CenterNet. CenterNet is a powerful anchor-free target detection network, which differs from the anchor-box mode used in traditional target detection networks [28]. The method does not need to generate a large number of anchor boxes as candidate targets; instead, it directly predicts the center point of the bounding box and then regresses the size of the bounding box. Therefore, in this method, the weld feature point is taken as the center point of the target and the prediction of the size of the bounding box is eliminated.

3.1 Key point detection method based on Gaussian heat map

The Gaussian heat map is widely used to detect the target positions of key points in an image. This method was initially applied to human pose estimation [30,31]. Then, it was introduced for target detection by Law et al. [32] and Zhou et al. [33]. Zhou et al. [28] proposed a more concise detection method that directly detects the center point coordinates of the bounding box. Compared with other commonly used regression methods, the label and output of the Gaussian heat map method are not the coordinates of the key points. Actually, the heat map reflects the probability of the position of the feature points, given by \(\hat{Y} \in [0,1]^{W \times H \times C}\), where \(W \times H\) is the size of the output feature map and \(C\) is the number of keypoint types. These pixels are filtered by a set probability threshold, and the final filtering result is the predicted feature point coordinates.

Nevertheless, it is challenging to accurately predict the position of a single pixel in a feature map of size \(H \times W\). In most cases, the target point cannot be accurately defined by one pixel position, which means it is difficult to mark the feature point. Since the points near the target point are generally highly similar to the target point, marking them directly as negative samples would interfere with the network training. Therefore, in the method of detecting key points through the Gaussian heat map, an additional heat map generated by a Gaussian function is included as a soft label. These soft labels offer directional guidance to the training of the network: the shorter the distance to the target point, the greater the activation value. In this way, the network can reach the target point quickly and converge faster.

Fig. 3. Three types of labels used when detecting key points by heat maps: (a) original image, (b) ground truth, (c) soft label, (d) offset label.

The visualization of the two types of labels needed when training the heat map is shown in Fig. 3. Fig. 3(a) shows the original labeled image with size \(H \times W\), while Fig. 3(b) and Fig. 3(c) show its corresponding ground truth and soft label visualization. For a feature point with coordinates \((x, y)\), its coordinates on the output heat map of size \(H_{heatmap} \times W_{heatmap}\) can be calculated as \((x_{heatmap} + \Delta x_{offset},\ y_{heatmap} + \Delta y_{offset})\), where \(x_{heatmap}\) and \(y_{heatmap}\) are the integer parts and \(\Delta x_{offset}\) and \(\Delta y_{offset}\) are the fractional parts:

\[ x_{heatmap} + \Delta x_{offset} = \frac{x}{W} \times W_{heatmap}, \qquad y_{heatmap} + \Delta y_{offset} = \frac{y}{H} \times H_{heatmap} \tag{1} \]

The ground truth label corresponding to this feature point is a tensor whose shape is consistent with the heat map. This tensor has a value of 1 at \((x_{heatmap}, y_{heatmap})\), while the rest of the tensor is 0. It is visualized as a mask map, as shown in Fig. 3(b), with bright spots only at the feature points. The corresponding soft label is similar, as shown in Fig. 3(c), but the difference is that the mask part of this label is a Gaussian circle with radius \(r\) centered at \((x_{heatmap}, y_{heatmap})\), whose activation value is maximal at the center of the circle and decays toward the boundary along the radius. The function that generates the Gaussian circle is [28]:

\[ Y_{xyc} = \exp\left( -\frac{(x - \tilde{p}_x)^2 + (y - \tilde{p}_y)^2}{2\sigma_p^2} \right) \tag{2} \]

where \(x\) and \(y\) are the coordinates of a point on the heat map, \(\tilde{p}_x\) and \(\tilde{p}_y\) are the coordinates of the center of the circle, and \(\sigma_p\) is set according to the radius of the Gaussian circle. In general target detection methods, the radius is calculated from the size of the bounding box. However, in weld inspection it is not necessary to predict the size of the bounding box, so a uniform radius of 3 is used, which is half of the estimated thickness of the laser line in the input image.
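To make Eqs. (1) and (2) concrete, a minimal label-generation sketch is given below. It is not the authors' code: it assumes the 512 × 512 input and 128 × 128 heat map described in this paper, takes σ_p equal to the radius of 3, and keeps the element-wise maximum where the Gaussian circles of neighbouring points overlap.

```python
import numpy as np


def make_labels(points, in_w=512, in_h=512, out_w=128, out_h=128, radius=3):
    """Build the ground truth, soft (Gaussian) and offset labels of Eqs. (1)-(2).

    points : list of (x, y) feature point coordinates in the input image.
    Returns (hard, soft, offset) with shapes (out_h, out_w), (out_h, out_w)
    and (2, out_h, out_w).
    """
    hard = np.zeros((out_h, out_w), dtype=np.float32)
    soft = np.zeros((out_h, out_w), dtype=np.float32)
    offset = np.zeros((2, out_h, out_w), dtype=np.float32)
    sigma = radius                     # sigma_p taken equal to the radius (3 pixels)
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    for x, y in points:
        # Eq. (1): map the point to heat-map scale, then split into the
        # integer cell and the fractional offset.
        fx, fy = x / in_w * out_w, y / in_h * out_h
        cx, cy = int(fx), int(fy)
        hard[cy, cx] = 1.0
        offset[0, cy, cx] = fx - cx
        offset[1, cy, cx] = fy - cy
        # Eq. (2): Gaussian circle centred on the feature point; keep the
        # maximum where circles of different points overlap.
        g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
        soft = np.maximum(soft, g)
    return hard, soft, offset
```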
Because the detection method of the Gaussian heat map is not an end-to-end model, accuracy is lost if the output feature point coordinates are rounded to integers. In addition, the feature maps are smaller than the original input image after feature extraction, which also leads to a large coordinate deviation. Therefore, an extra coordinate offset is predicted in addition to the heat map, so that an accurate prediction of the coordinates of the weld feature points can be achieved through their combination. This offset is the compensation for the decimal part of the coordinates of the feature points after the image is scaled. The label used for the offset detection branch in model training is a tensor with two channels of the same size as the heat map, the two channels corresponding to the x and y axes. Similar to the labels of the heat map, the tensor representing the offset labels has values only at the integer coordinates \((x_{heatmap}, y_{heatmap})\) of the feature points, and the rest is 0. The difference is that the values on the two channels of the offset labels are \(\Delta x_{offset}\) and \(\Delta y_{offset}\). It is visualized as shown in Fig. 3(d).

3.2 Feature point detection network based on Gaussian heat map

Fig. 4. Overall layout of the proposed network and specific structure of each module: (a) overall network structure, (b) encoder, (c) decoder, (d) prediction branch and (e) legend.

The proposed deep learning model refers to the classical target detection method CenterNet, with the overall structure of the model shown in Fig. 4(a). The input accepted by the network is a welding image with a size of 3 × 512 × 512. The image is encoded and decoded to extract a feature map with 64 channels and a size of 128 × 128. Then, the results are sent to three independent branches to predict the weld type, the coordinates and the offset, thus obtaining the coordinates of the weld feature points on the image.

Fig. 5. Results of two image processing methods for welding images with strong noise: (a) original image, (b) laser stripe centerline, (c) feature maps.

In the method utilizing morphology, when processing an image with strong noise, as shown in Fig. 5(a), it is usually necessary to use various denoising methods to obtain an image containing only laser stripes, as shown in Fig. 5(b); the positions of the feature points can then be calculated from it. In contrast, in image processing employing deep learning, the input image is commonly processed directly by the feature extraction network to obtain a set of low-resolution feature maps, as shown in Fig. 5(c), which are transferred to subsequent modules for processing. Because the feature extraction network plays an important role in excluding noise and extracting key information, it is necessary to select a good and fast feature extraction network.

In our proposed model, in order to ensure that the Gaussian heat map detection has the desired accuracy, an encoding and decoding structure has been used for the feature extraction. When using a Gaussian heat map to predict the coordinates of feature points, a low resolution of the feature map will lead to large errors, so that large compensation values are required. However, if the resolution is too high, the model becomes difficult to converge and the accuracy is affected; at the same time, the number of model parameters increases, which affects the running speed. Therefore, the adopted encoding and decoding structure first down-samples the input image by a factor of 32, obtaining a feature map with a size of 1/32 of the original one. Then, the resolution is restored to 1/4 of the original image by up-sampling by a factor of 8. In this process, the encoding part is the backbone of feature extraction, while the decoding part is mainly used to restore the resolution.

To meet the requirements of accuracy and real-time performance, DenseNet was selected as the encoder. This network achieves good feature extraction through its dense connection mode, and can effectively reduce the number of network parameters and increase the running speed [34]. The main structure of DenseNet is shown in Fig. 4(b); it is mainly composed of dense blocks and bottlenecks. In a dense block module, the input to each convolutional layer is the output of all the layers before it plus the original input. This operation requires the feature maps to have a consistent size within each dense block. The function of the bottleneck layer between dense blocks is to reduce the number of feature maps and the computation of the network. Through this dense connection mode, DenseNet successfully addressed the problem of vanishing gradients in network training, improved the performance of the network and reduced the number of parameters. In the decoding part, because the main function is to upgrade the resolution, there is no need for many operations; the 8-times up-sampling can be achieved by three deconvolution operations, as shown in Fig. 4(c).
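A possible PyTorch rendering of this 32×-down/8×-up backbone is sketched below; it reuses torchvision's DenseNet121 feature extractor (its classifier is unused) followed by three stride-2 deconvolutions. The intermediate decoder widths of 256 and 128 channels are assumptions; only the 64-channel, 128 × 128 output follows from the text.

```python
import torch.nn as nn
from torchvision import models


class EncoderDecoder(nn.Module):
    """Sketch of the encoder-decoder of Fig. 4(b)-(c): DenseNet121 encoder
    (1/32 resolution, 1024 channels) plus three 2x deconvolution stages."""

    def __init__(self, out_channels=64, pretrained=True):
        super().__init__()
        # torchvision's densenet121.features ends at 1/32 resolution with
        # 1024 channels; the classification head is simply not used.
        self.encoder = models.densenet121(pretrained=pretrained).features
        layers, in_ch = [], 1024
        for ch in (256, 128, out_channels):   # three 2x deconvolutions = 8x up
            layers += [
                nn.ConvTranspose2d(in_ch, ch, kernel_size=4, stride=2,
                                   padding=1, bias=False),
                nn.BatchNorm2d(ch),
                nn.ReLU(inplace=True),
            ]
            in_ch = ch
        self.decoder = nn.Sequential(*layers)

    def forward(self, x):                       # x: (B, 3, 512, 512)
        return self.decoder(self.encoder(x))    # (B, 64, 128, 128)
```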
After obtaining the feature map output by the encoding and decoding structure, the weld type, heat map and offset need to be predicted through three independent branches, as shown in Fig. 4(d). The purpose of designing the three branches is to obtain the feature points in the welding image more accurately. When feature points are located by a Gaussian heat map, the size of the output heat map is generally H × W × C, and the feature points are filtered from the heat map by setting a threshold value. This method ensures maximum screening of possible targets; however, in the feature point detection of welds, there will not be multiple types of welds in one picture, and the number of feature points of each weld type is fixed. To ensure an accurate detection of the required number of feature point coordinates, the prediction of the weld type is included as a constraint. At the same time, the feature points of different weld types can be regarded as sharing the same feature, that is, the inflection point of the laser stripe. Under these two conditions, it can be considered that the feature points in all weld images belong to the same category. Therefore, the number of channels of the predicted output heat map can be set to 1 with a size of H × W, which only represents the probability that each point on the feature map is a feature point of the weld, and does not include a prediction of the category of the feature point. Such heat maps are no longer used to predict the target category, but only to obtain the location information of the target. A global offset prediction is then used to improve the accuracy of the feature point coordinate detection. Although this offset is a global prediction, only the value corresponding to the feature point is selected according to the result of the heat map in the post-processing.

The three branches are mainly composed of convolution layers. The specific parameters of each branch are listed in Table 1.

Table 1. Parameters of each layer of the three detection branches.
Branch      Layer        Kernel size  Channels
Weld type   Conv1        1 × 1        C
            Global pool  H × W        –
Heat map    Conv1        3 × 3        64
            BN           –            64
            ReLU         –            –
            Conv2        1 × 1        1
Offset      Conv1        3 × 3        64
            BN           –            64
            ReLU         –            –
            Conv2        1 × 1        2

The weld type prediction branch is composed of a convolution layer and a global pooling layer [35]. Global pooling is a pooling calculation in which the size of the sliding window equals the size of the input feature map. By using this pooling method, an input feature map of size H × W × C can be converted to a C × 1 output, where H and W are the width and height of the feature maps and C represents the number of weld types. The convolution layer combined with the global pooling operation can replace the fully connected layer, which removes the restriction on the size of the input image. The predicted output of the weld type is a vector with a size of C × 1, which represents the one-hot encoding of the weld type. The feature point heat map branch consists of two convolution layers, with which the input feature maps are processed to obtain a heat map with the shape of 1 × H × W. The structure of the coordinate offset branch is the same as that of the heat map branch, with a difference in the parameters of the convolution layers; its output is a tensor with a shape of 2 × H × W, which includes the compensation value for each point in the map. The convolution operations used in the three branches have the same stride. However, in the heat map and offset branches, padding is necessary to ensure that the size of the convolved feature map remains unchanged, whereas the weld type branch has no such requirement.
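The sketch below mirrors the branch layers of Table 1 together with the post-processing rule just described (one peak kept for a fillet weld, two for a half V-groove, three for a V-groove). It is an illustrative reconstruction rather than the authors' code; the class-index order, the sigmoid on the heat map output and the stride-4 mapping back to the 512 × 512 network input are assumptions.

```python
import torch
import torch.nn as nn


class WeldHeads(nn.Module):
    """Sketch of the three branches of Table 1 (C = 3 weld types here)."""

    def __init__(self, in_ch=64, num_types=3):
        super().__init__()
        # Weld type branch: 1x1 convolution + global average pooling.
        self.cls = nn.Sequential(nn.Conv2d(in_ch, num_types, 1),
                                 nn.AdaptiveAvgPool2d(1))

        def head(out_ch):
            return nn.Sequential(nn.Conv2d(in_ch, 64, 3, padding=1),
                                 nn.BatchNorm2d(64), nn.ReLU(inplace=True),
                                 nn.Conv2d(64, out_ch, 1))

        self.heatmap = head(1)    # 1 x H x W feature point probability map
        self.offset = head(2)     # 2 x H x W (dx, dy) compensation values

    def forward(self, feat):
        return (self.cls(feat).flatten(1),            # (B, C) weld type logits
                torch.sigmoid(self.heatmap(feat)),    # (B, 1, H, W)
                self.offset(feat))                    # (B, 2, H, W)


# Post-processing: the predicted weld type fixes how many peaks are kept
# (fillet 1, half V-groove 2, V-groove 3); only the offsets at those peaks
# are read out. The class index order below is an assumption.
POINTS_PER_TYPE = {0: 1, 1: 2, 2: 3}


def decode(cls_logits, heatmap, offset, stride=4):
    """Batch-of-one sketch: returns the weld type and the feature points
    mapped back to the 512 x 512 network input (stride 4)."""
    weld_type = int(cls_logits.argmax(1))
    k = POINTS_PER_TYPE[weld_type]
    h, w = heatmap.shape[-2:]
    scores, idx = heatmap.view(-1).topk(k)
    ys, xs = idx // w, idx % w
    points = [((x + offset[0, 0, y, x]).item() * stride,
               (y + offset[0, 1, y, x]).item() * stride)
              for x, y in zip(xs, ys)]
    return weld_type, points
```

Written this way, a spurious response such as the spatter case discussed in Section 4.2 cannot add an extra point, because the number of peaks read from the heat map is fixed by the predicted weld type.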
3.3 Selection of loss function for training

The loss function is used to calculate the error between the detection result of the training model and the ground truth. When training a deep learning model, selecting an appropriate loss function can effectively accelerate the convergence and improve the accuracy of the model. Since the weld key point detection has three outputs, different loss functions are selected according to their characteristics, and the loss of the entire network is calculated as the sum of the three outputs. Considering that the number of positive samples in the heat map is much smaller than that of negative samples, the focal loss function is used to obtain better results [32]:

\[ L_{heatmap} = -\frac{1}{N} \sum_{c=1}^{C} \sum_{i=1}^{H} \sum_{j=1}^{W}
\begin{cases}
\left(1 - p_{cij}\right)^{\alpha} \log\left(p_{cij}\right), & \text{if } y_{cij} = 1 \\
\left(1 - y_{cij}\right)^{\beta} \left(p_{cij}\right)^{\alpha} \log\left(1 - p_{cij}\right), & \text{otherwise}
\end{cases} \tag{3} \]

where N is the number of samples, C is the number of categories in the sample, and H and W are the dimensions of the heat map. \(p_{cij}\) and \(y_{cij}\) respectively represent the predicted score and the ground truth at position (i, j), while α and β are the parameters used to control the weights.

The offset prediction is a regression task, using the commonly used L1 loss function [36]:

\[ L_{offset} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{L1Loss}(y_i, \hat{y}_i) = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right| \tag{4} \]

where N is the number of samples, and \(y_i\) and \(\hat{y}_i\) are the labeled value and the predicted value, respectively.
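A compact sketch of Eqs. (3) and (4) follows (an illustration, not the published implementation); normalizing by the number of positive cells and evaluating the offset L1 loss only at labeled feature point cells follow the usual CenterNet convention and are assumptions here.

```python
import torch


def heatmap_focal_loss(pred, soft_gt, alpha=2.0, beta=4.0, eps=1e-6):
    """Penalty-reduced focal loss of Eq. (3); soft_gt is the Gaussian label
    and cells with soft_gt == 1 are the positive samples."""
    pred = pred.clamp(eps, 1 - eps)
    pos = (soft_gt == 1).float()
    neg = 1.0 - pos
    pos_loss = ((1 - pred) ** alpha) * torch.log(pred) * pos
    neg_loss = ((1 - soft_gt) ** beta) * (pred ** alpha) * torch.log(1 - pred) * neg
    num_pos = pos.sum().clamp(min=1.0)          # avoid division by zero
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos


def offset_l1_loss(pred_offset, gt_offset, pos_mask):
    """L1 loss of Eq. (4), evaluated only at the labeled feature point cells.
    pos_mask: (B, H, W) binary mask of the integer feature point positions."""
    mask = pos_mask.unsqueeze(1).expand_as(pred_offset)   # (B, 2, H, W)
    n = mask.sum().clamp(min=1.0)
    return torch.abs(pred_offset * mask - gt_offset * mask).sum() / n
```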
The multi-category cross-entropy function is used to predict the welding category, given by:

\[ L_{class} = -\sum_{i=0}^{C-1} y_i \log(p_i) = -\log(p_{true}) \tag{5} \]

where C is the number of categories in the sample, \(y_i\) is the label value indicating whether the target belongs to category i (1 or 0, corresponding to belonging or not belonging), and \(p_i\) is the probability predicted by the model that the target belongs to category i. Expanding the equation reduces it to the right-hand side: the loss is essentially the negative logarithm of the probability \(p_{true}\) that the model predicts for the category to which the target really belongs.

The loss values of the three parts together constitute the loss of the whole network:

\[ L = L_{class} + L_{heatmap} + L_{offset} \tag{6} \]

where \(L_{class}\) is the loss of the weld type predicted by the network, \(L_{heatmap}\) is the loss of the output heat map, and \(L_{offset}\) is the loss of the offset of the feature points.

4 Experiment and analysis

In this section, the dataset created in Section 2.2 was divided into 5 equally sized groups for 5-fold cross-validation, with reference to the K-fold cross-validation method. In total, five training and testing sessions were conducted, i.e., the five groups were rotated as the test set, while the other four groups were used for training. Because the results of the five folds are similar, only the first training and testing results are reported in detail in order to save space; for the remaining four folds, only the statistical results are shown.
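The splitting protocol can be sketched as follows; scikit-learn's KFold is used purely for illustration, and whether the authors shuffled the images (and with which seed) is not stated in the paper.

```python
from sklearn.model_selection import KFold


def five_fold_splits(samples, seed=0):
    """Yield (fold, train_set, test_set) for the 2370 labeled images:
    each of the five equal groups serves once as the test set while the
    remaining four groups are used for training."""
    kfold = KFold(n_splits=5, shuffle=True, random_state=seed)
    for fold, (train_idx, test_idx) in enumerate(kfold.split(samples), start=1):
        train_set = [samples[i] for i in train_idx]
        test_set = [samples[i] for i in test_idx]
        yield fold, train_set, test_set
```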
4.1 Model training and testing

The network is implemented using the PyTorch deep learning framework based on Python. The dataset collected with the robot welding experimental platform described above was used for training and testing of the model. The machine used to train the model was a computer equipped with two Nvidia 2080Ti 12 GB graphics cards and an Intel i9 (10th generation) CPU. With this machine, the designed model was trained and tested using the weld image dataset. For the initialization of the model parameters, the DenseNet121 pre-trained network model in the PyTorch framework library was selected to initialize the feature extraction part of the network structure, in order to accelerate the training convergence of the model. In actual operation, because only DenseNet121 was used as the feature extraction structure, the decision-making (classification) layers were removed from the imported DenseNet121 parameters. In the key point detection part, the initial parameters were given by the He initialization method, with the bias initialized to 0. The Adam optimizer was used in training, and default parameters were used for the learning rate and weight decay instead of manual settings. The model was trained for 150 epochs on the training dataset; the resulting training loss, which reflects the variation of the model's loss on the training set, is shown in Fig. 6. During training, validation on the test set was inserted every 10 epochs, thus producing the test loss curve, also illustrated in Fig. 6.

Fig. 6. Training and testing loss curves.

Because these validation passes were not involved in back-propagation, the training of the model was not affected. The loss curve of the test set was used as an aid to determine the convergence of the model, while the model with the lowest test loss is generally considered to be successfully trained. According to Fig. 6, after the 100th epoch the training loss tended to be constant while the test loss started to increase; it can be considered that the model had enough iterations and started overfitting. Therefore, the model of the 100th training epoch was selected for the subsequent tests.
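The training configuration described above can be summarized in the following sketch, which reuses the EncoderDecoder and WeldHeads classes from the earlier sketches; the loop body is elided, and everything beyond the stated settings (pretrained encoder, He-initialized heads with zero bias, default Adam parameters, 150 epochs with validation every 10) is an assumption.

```python
import torch
import torch.nn as nn

# Encoder keeps its ImageNet-pretrained DenseNet121 weights (classifier unused).
model = nn.Sequential(EncoderDecoder(pretrained=True), WeldHeads())


def init_head(m):
    """He (Kaiming) initialization with zero bias for the detection heads."""
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
        if m.bias is not None:
            nn.init.zeros_(m.bias)


model[1].apply(init_head)                          # heads only
optimizer = torch.optim.Adam(model.parameters())   # default lr and weight decay

for epoch in range(1, 151):                        # 150 epochs
    model.train()
    # ... iterate over the training loader, compute L = Lclass + Lheatmap + Loffset
    #     (Eq. (6)), backpropagate and step the optimizer ...
    if epoch % 10 == 0:                            # test-set validation every 10 epochs
        model.eval()
        # ... compute the test loss without backpropagation ...
```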
Fig. 7. Feature point extraction results of three types of welds by using the model: (a) fillet weld seam, (b) half V-groove weld seam and (c) V-groove weld seam.

To evaluate the performance of the model, it was tested with the test dataset. Fig. 7 shows the inspection results for each of the three weld seams using the model. The areas where the feature points are located are marked with red bounding boxes and the weld types are indicated; the locations of the feature points are marked with red crosses. In addition, the X-axis and Y-axis in the figure indicate the directions of the two errors, i.e., of the feature point coordinate detection errors reported below.

The trained model was tested on the test set, and the results of weld type prediction and feature point extraction were analyzed. The detection results of the model were compared with the labels, and the prediction accuracy of the weld type was calculated as listed in Table 2. At the same time, the extraction error of the weld feature points with respect to the labeled coordinates at the detection image resolution (2448 × 2048) was calculated. The mean absolute error (MAE), the maximum error (ME) and the root mean square error (RMSE) were statistically obtained, as summarized in Table 2. The MAE is calculated as:

\[ \mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right| \tag{7} \]

where N is the number of test samples, and \(y_i\) and \(\hat{y}_i\) are the true (labeled) coordinates of the feature points and the coordinates predicted by the model, respectively. This equation was used to calculate the average error of the model in the detection of the feature point coordinates. Meanwhile, the RMSE was used to evaluate the error that may be generated by the model, which is calculated as:

\[ \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2} \tag{8} \]

Table 2. MAE, ME and RMSE of different types of welds on the X-axis and Y-axis.
Type           Accuracy   X-MAE (pixel)  Y-MAE (pixel)  X-ME (pixel)  Y-ME (pixel)  X-RMSE (pixel)  Y-RMSE (pixel)
Fillet         99.456 %   1.781          1.472          26.965        10.811        3.856           2.461
Half V-groove  99.288 %   1.997          1.439          9.129         7.636         2.379           1.819
V-groove       99.332 %   2.611          1.220          12.169        11.327        3.190           1.623
Total          99.359 %   2.130          1.377          26.965        11.327        3.221           2.021

Fig. 8. Test error curves of the feature points extracted with the improved CenterNet: (a) and (b) are the errors of the fillet weld on the X-axis and Y-axis, respectively; (c) and (d) are the errors of the half V-groove weld on the X-axis and Y-axis, respectively; (e) and (f) are the errors of the V-groove weld on the X-axis and Y-axis, respectively.

Fig. 8 shows the error curves. The three groups respectively correspond to the coordinate error curves, on the X-axis and Y-axis, of the feature points extracted from the test images of the fillet weld, half V-groove weld and V-groove weld. The absolute error is the difference in absolute value, in the x or y direction, between the coordinates of the feature points extracted in the test and their corresponding labeled ones. The MAE is the average of the absolute errors of all feature points for a given type of weld in the test, which is drawn as a red horizontal line and labeled with the average value. The points marked in the plots are the locations with the largest error for that type of image in the test. According to the labeling definition, the welding images of the fillet weld, half V-groove weld and V-groove weld contain 1, 2 and 3 feature points, respectively; all of these feature points were counted when computing the errors. It is found that sufficiently high performance on the test set has been achieved using the trained model. The prediction accuracy of the weld types is 99.359 %, thus ensuring accurate classification of the three types of welds. In the extraction of the weld feature points, the overall MAE on the X-axis and Y-axis is 2.130 pixel and 1.377 pixel, and the MAE of each type of weld is less than 3 pixel.

In robot real-time automatic welding, time consumption is an important evaluation index. In real-time welding trajectory tracking, to ensure real-time performance, it is necessary to reduce the inference time as much as possible. To evaluate the processing time of the model, it was tested 100 times with a single image, resulting in the time-consumption curve shown in Fig. 9. The maximum time consumption in the whole inference process is 65.470 ms, the minimum is 26.072 ms and the average is 32.186 ms, showing a promising real-time effect, which meets the requirement for real-time weld seam track tracking.

Fig. 9. Time-consuming curve of image processing.
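Applied per axis at the 2448 × 2048 detection resolution, Eqs. (7) and (8) amount to the following small NumPy illustration (not the authors' evaluation script):

```python
import numpy as np


def mae_rmse(pred, gt):
    """Eqs. (7)-(8) per axis: pred and gt are (N, 2) arrays of feature point
    coordinates (x, y) at the detection image resolution (2448 x 2048)."""
    err = np.abs(pred - gt)
    mae = err.mean(axis=0)                            # (MAE_x, MAE_y)
    rmse = np.sqrt(((pred - gt) ** 2).mean(axis=0))   # (RMSE_x, RMSE_y)
    me = err.max(axis=0)                              # maximum error (ME_x, ME_y)
    return mae, rmse, me
```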
The final results of the 5-fold cross-validation experiment are listed in Table 3, in terms of the MAE and RMSE on the X-axis and Y-axis for each fold of the tested model; the mean and standard deviation of the results were also calculated. In the X and Y directions, the 5-fold cross-validation MAE is 2.320 pixel and 1.411 pixel, while the RMSE is 3.529 pixel and 2.134 pixel. These values are within a narrow error fluctuation and satisfy the needs of the application. The results of the five tests have a low standard deviation, and the variations of MAE and RMSE are less than 1 pixel, indicating that they are also stable. Overall, the model showed good stability and satisfactory performance.

Table 3. Statistical results of the 5-fold cross-validation test.
Fold                Accuracy   X-MAE (pixel)  Y-MAE (pixel)  X-RMSE (pixel)  Y-RMSE (pixel)
1                   99.359 %   2.130          1.377          3.221           2.021
2                   99.789 %   2.090          1.437          3.209           2.204
3                   99.367 %   2.506          1.517          3.887           2.271
4                   99.577 %   2.152          1.263          3.519           1.973
5                   99.788 %   2.722          1.472          3.810           2.200
Mean                99.576 %   2.320          1.411          3.529           2.134
Standard deviation  0.0019     0.250          0.088          0.284           0.11551

4.2 Comparative experiment of feature point extraction

In this section, two methods are used for comparison. The first is a combination of semantic segmentation and morphological processing to extract weld feature points [24]. In this case, the semantic segmentation network (DeepLabV3plus [37]) is used to separate the laser stripes from the weld image, and then a morphological approach is employed to obtain the coordinates of the feature points. In the second method, a target detection network, YOLOv5, is utilized, with detection achieved by using the weld feature points as the centers of the bounding boxes. The deep learning models used in the two methods were trained and tested. The first method requires new labels for the welded images because of the semantic segmentation network. Fig. 10 shows the semantic segmentation labels of the welding images, with subfigure (a) showing the original images and subfigure (b) their corresponding semantic segmentation labels.

Fig. 10. The weld seam images and corresponding semantic segmentation labels: (a) welding images of three types of welds, (b) corresponding ground truth.

The two methods were trained and tested in the same environment, with representative test results shown in Fig. 11. Fig. 11(a), (b) and (c) show the results of the semantic segmentation model for separating laser stripes. It is observed that the segmented laser stripes are incomplete, which affects the subsequent morphological processing to a certain degree. As seen in Fig. 11(d), (e) and (f), YOLOv5 produces false detections. In Fig. 11(d), the edge of the weldment is misidentified as a feature point; in Fig. 11(e), a spatter is misidentified as a V-groove weld; and in Fig. 11(f), a redundant target is detected near a feature point. In other words, other features similar to feature points in the weld image may be incorrectly detected as feature points. This is because the two types of features may have similar confidence levels, making them difficult to eliminate by setting a threshold.

Fig. 11. Test results with DeepLabV3plus and YOLOv5: (a), (b) and (c) are results of DeepLabV3plus separating the laser stripes in the weld image, (d), (e) and (f) are false detections of YOLOv5.

Statistical results of the two methods are listed in Table 4. For clear observation, only the correct detection results are retained. The average processing times of semantic segmentation and YOLOv5 are 58.993 ms and 40.271 ms, respectively.

Table 4. Statistical results of feature point extraction errors of the two methods.
Method          Type           X-MAE (pixel)  Y-MAE (pixel)  X-ME (pixel)  Y-ME (pixel)  X-RMSE (pixel)  Y-RMSE (pixel)
DeepLabV3plus   Fillet         10.758         7.001          91.941        97.836        20.398          16.866
                Half V-groove  5.734          6.850          78.8453       80.171        13.931          18.581
                V-groove       17.428         8.948          97.590        80.171        30.706          16.968
                Total          10.028         7.313          97.590        97.836        20.147          17.687
YOLOv5 (a)      Fillet         1.947          2.503          50.499        14.000        5.99            3.097
                Half V-groove  1.271          1.836          8.000         6.001         1.826           2.152
                V-groove       1.100          2.209          9.015         7.170         1.642           2.332
                Total          1.439          2.183          50.499        14.000        2.971           2.432
(a) In the test results with YOLOv5, the data of false detections are removed.

Error curves of the test data with the two methods are shown in Fig. 12 and Fig. 13; in Fig. 13, the false detection errors of YOLOv5 are not included. It is observed that the comprehensive performance of the semantic segmentation method is the lowest among the three methods: its MAE is the largest, and large deviations are often present. The two methods based on target detection have relatively high performance.

Fig. 12. Test error curves of feature points extracted by using the semantic segmentation method: (a) and (b) are the errors of the fillet weld on the X-axis and Y-axis, respectively; (c) and (d) are the errors of the half V-groove weld on the X-axis and Y-axis, respectively; (e) and (f) are the errors of the V-groove weld on the X-axis and Y-axis, respectively.

Fig. 13. Test error curves of feature points extracted by YOLOv5 with false detections removed: (a) and (b) are the errors of the fillet weld on the X-axis and Y-axis, respectively; (c) and (d) are the errors of the half V-groove weld on the X-axis and Y-axis, respectively; (e) and (f) are the errors of the V-groove weld on the X-axis and Y-axis, respectively.
In the method proposed in this work, the error on the Y-axis is smaller than that on the X-axis, while the opposite is observed for YOLOv5. Therefore, in comparison with YOLOv5, the method proposed in this work is weaker on the X-axis and stronger on the Y-axis. Meanwhile, CenterNet and YOLOv5 differ only slightly when the false detection errors are not included.

To demonstrate the impact of false detection on accuracy, a statistical histogram was drawn, as shown in Fig. 14, in which the false detection data were not removed. YOLOv5 has no false detections for the half V-groove weld, which is consistent with the result shown in Fig. 13(b). However, for the fillet and V-groove weld seams, the accuracy is seriously affected by the false detections; in particular, the MAE reached 32.369 pixels and the RMSE 233.538 pixels. Therefore, false detection should be avoided as much as possible.

Fig. 14. Histograms of MAE and RMSE for the three methods: (a) mean absolute error, (b) root mean square error.

The reason for the poor performance of the semantic segmentation method can be attributed to the fact that its process of extracting feature points is not sufficiently direct. The two steps of the method introduce errors twice, i.e., the error of separating the laser stripes and the error of locating the feature points. Moreover, the difficulty in separating the laser stripes inevitably leads to large errors in localizing the feature points. Therefore, this method is ineffective in dealing with the data in this work. In contrast to YOLOv5, the method proposed in this work has no false detections, owing to the addition of the weld type constraint. This constraint limits the number of detected feature points, thus ensuring that only the closest feature points are retained when similar features are present.

5 Conclusions

(1) An experimental platform consisting of a robot, a laser vision sensor, welding equipment and a computer terminal has been built to collect welding images of three types of welds. The acquired images were labeled with an annotation tool and a welding image dataset was constructed.

(2) A method of extracting weld feature points was proposed based on CenterNet, in which the weld feature points are regarded as the center points of the bounding box for direct detection. It can be used to deal with disordered images without the assistance of morphological methods or prior frames.

(3) The method was trained and tested with the dataset built from the three types of weld images. The classification accuracy for the three types of welds reached 99.359 %. The mean absolute errors for the fillet, half V-groove and V-groove weld seams are 1.627 pixel, 1.718 pixel and 1.9155 pixel, respectively, and the corresponding root mean square errors are 3.159, 2.099 and 2.407. Moreover, the processing time for a single image is 32.186 ms. Therefore, the model has sufficiently high accuracy, stability and real-time performance.

6 Future development

Firstly, the welding image dataset should be expanded by adding new weld types. Secondly, this method could be extended to multi-layer and multi-pass welds. Thirdly, it is desirable to add new constraints to further increase the accuracy and stability of the model for continuous images. Last but not least, real applications of the method for weld tracking should be explored.

CRediT authorship contribution statement

Liangyuan Deng: Methodology, Writing – original draft, Software. Ting Lei: Writing – review & editing, Conceptualization, Project administration.
Chaoqun Wu: Writing – review & editing, Resources, Project administration, Supervision. Yibo Liu: Data curation. Shiyu Cao: Writing – review & editing. Song Zhao: Resources.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The authors do not have permission to share data.

Acknowledgments

This research is supported by the National Natural Science Foundation of China (52275506) and the Fund Project of the Technology Transfer Center of Wuhan University of Technology in Jingmen (WHUTJMZX-2022JJ-10).

References

[1] C. Liu, H. Wang, Y. Huang, Y. Rong, J. Meng, G. Li, G. Zhang, Welding seam recognition and tracking for a novel mobile welding robot based on multi-layer sensing strategy, Meas. Sci. Technol. 33 (2022) 055109.
[2] T. Lei, Y. Rong, H. Wang, Y. Huang, M. Li, A review of vision-aided robotic welding, Comput. Ind. 123 (2020) 103326.
[3] T. Lei, W. Wang, Y. Rong, P. Xiong, Y. Huang, Cross-lines laser aided machine vision in tube-to-tubesheet welding for welding height control, Opt. Laser Technol. 121 (2020) 105796.
[4] C. Wu, P. Yang, T. Lei, D. Zhu, Q. Zhou, S. Zhao, A teaching-free welding position guidance method for fillet weld based on laser vision sensing and EGM technology, Optik 262 (2022) 169291.
[5] T. Lei, C. Wu, H. Yu, Electric Arc Length Control of Circular Seam in Welding Robot Based on Arc Voltage Sensing, IEEE Sens. J. 22 (2022) 3326–3333.
[6] P. Yang, T. Lei, C. Wu, S. Zhao, J. Hu, A Fast Calibration of Laser Vision Robotic Welding Systems Using Automatic Path Planning, IEEE Trans. Instrum. Meas. 71 (2022) 1–10.
[7] L. Yang, Y. Liu, J. Peng, Advances techniques of the structured light sensing in intelligent welding robots: a review, Int. J. Adv. Manuf. Technol. 110 (2020) 1027–1046.
[8] W. Huang, R. Kovacevic, Development of a real-time laser-based machine vision system to monitor and control welding processes, Int. J. Adv. Manuf. Technol. 63 (2012) 235–248.
[9] X. Lu, D. Gu, Y. Wang, Y. Qu, C. Qin, F. Huang, Feature Extraction of Welding Seam Image Based on Laser Vision, IEEE Sens. J. 18 (2018) 4715–4724.
[10] C. Wu, J. Hu, T. Lei, P. Yang, S. Gu, Research on robust laser vision feature extraction method for fillet welds with different reflective materials under uncertain interference, Opt. Laser Technol. 158 (2023) 108866.
[11] L. Li, L. Fu, X. Zhou, X. Li, Image processing of seam tracking system using laser vision, in: Robotic Welding, Intelligence and Automation, Springer, 2007, pp. 319–324.
[12] J. Muhammad, H. Altun, E. Abo-Serie, A robust butt welding seam finding technique for intelligent robotic welding system using active laser vision, Int. J. Adv. Manuf. Technol. 94 (2016) 13–29.
[13] X. Li, X. Li, S.S. Ge, M.O. Khyam, C. Luo, Automatic Welding Seam Tracking and Identification, IEEE Trans. Ind. Electron. 64 (2017) 7261–7271.
[14] R. Xiao, Y. Xu, Z. Hou, C. Chen, S. Chen, A feature extraction algorithm based on improved Snake model for multi-pass seam tracking in robotic arc welding, J. Manuf. Process. 72 (2021) 48–60.
[15] N. Wang, K. Zhong, X. Shi, X. Zhang, A robust weld seam recognition method under heavy noise based on structured-light vision, Rob. Comput. Integr. Manuf. 61 (2020) 101821.
[16] Y. Zou, Y. Wang, W. Zhou, X. Chen, Real-time seam tracking control system based on line laser visions, Opt. Laser Technol. 103 (2018) 182–192.
[17] Y. Zou, X. Chen, G. Gong, J. Li, A seam tracking system based on a laser vision sensor, Measurement 127 (2018) 489–500.
[18] L. Yang, E. Li, T. Long, J. Fan, Z. Liang, A High-Speed Seam Extraction Method Based on the Novel Structured-Light Sensor for Arc Welding Robot: A Review, IEEE Sens. J. 18 (2018) 8631–8641.
[19] Y. Zou, T. Chen, Laser vision seam tracking system based on image processing and continuous convolution operator tracker, Opt. Lasers Eng. 105 (2018) 141–149.
[20] A. Singh, V. Kalaichelvi, R. Karthikeyan, Application of Convolutional Neural Network for Classification and Tracking of Weld Seam Shapes for TAL Brabo Manipulator, Mater. Today: Proc. 28 (2020) 491–497.
[21] Y. Zou, W. Zhou, Automatic seam detection and tracking system for robots based on laser vision, Mechatronics 63 (2019) 102261.
[22] K. Wu, T. Wang, J. He, Y. Liu, Z. Jia, Autonomous seam recognition and feature extraction for multi-pass welding based on laser stripe edge guidance network, Int. J. Adv. Manuf. Technol. 111 (2020) 2719–2731.
[23] Z. Zhao, J. Luo, Y. Wang, L. Bai, J. Han, Additive seam tracking technology based on laser vision, Int. J. Adv. Manuf. Technol. 116 (2021) 197–211.
[24] B. Wang, F. Li, R. Lu, X. Ni, W. Zhu, Weld Feature Extraction Based on Semantic Segmentation Network, Sensors (Basel) 22 (2022) 4130.
[25] F. Xu, H. Zhang, R. Xiao, Z. Hou, S. Chen, Autonomous weld seam tracking under strong noise based on feature-supervised tracker-driven generative adversarial network, J. Manuf. Process. 74 (2022) 151–167.
[26] R. Xiao, Y. Xu, Z. Hou, C. Chen, S. Chen, An adaptive feature extraction algorithm for multiple typical seam tracking based on vision sensor in robotic arc welding, Sens. Actuators A 297 (2019) 111533.
[27] Y. Zou, T. Chen, X. Chen, J. Li, Robotic seam tracking system combining convolution filter and deep reinforcement learning, Mech. Syst. Sig. Process. 165 (2022) 108372.
[28] X. Zhou, D. Wang, P. Krähenbühl, Objects as points, arXiv preprint arXiv:1904.07850, 2019.
[29] X. Zhou, V. Koltun, P. Krähenbühl, Tracking objects as points, in: European Conference on Computer Vision (ECCV), 2020, pp. 474–490.
[30] J. Tompson, A. Jain, Y. LeCun, C. Bregler, Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation, Adv. Neural Inf. Proces. Syst. (2014) 1799–1807.
[31] A. Newell, K. Yang, J. Deng, Stacked hourglass networks for human pose estimation, in: European Conference on Computer Vision, Springer, 2016, pp. 483–499.
[32] H. Law, J. Deng, CornerNet: detecting objects as paired keypoints, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 734–750.
[33] X. Zhou, J. Zhuo, P. Krähenbühl, Bottom-up Object Detection by Grouping Extreme and Center Points, in: Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 850–859.
[34] G. Huang, Z. Liu, L. Van Der Maaten, K.Q. Weinberger, Densely connected convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.
[35] M. Lin, Q. Chen, S. Yan, Network in network, arXiv preprint arXiv:1312.4400, 2013.
[36] R. Girshick, Fast R-CNN, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.
[37] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 801–818.