Adaptive Multi-threshold Object Selection in Remote Sensing Images

Vladimir Yu. Volkov
Radioengineering dept., Saint-Petersburg Electrotechnical University "LETI"; Radioengineering dept., Saint-Petersburg State University of Aerospace Instrumentation
Saint-Petersburg, Russia
vl_volk@mail.ru

Abstract: Algorithms for the detection and selection of objects of interest in remote sensing observations based on multi-threshold processing are investigated. The studied algorithms convert monochrome images into a set of binary layers, which are then subjected to simple morphological analysis that allows isolated objects to be selected in each layer. By analyzing the location and the geometric characteristics of objects in neighboring layers, one can generalize the selection procedure by applying objective geometric criteria to the multi-layer scene reconstruction based on the percolation effect. In this way an adaptive threshold can be selected individually for each object of interest, leading to a significant reduction in the false alarm rate during detection, especially at lower thresholds where high hit rates can be achieved. The efficacy and performance of the approach are demonstrated using simulated random fields as well as television and radar remote sensing observations.

Keywords: object detection and selection, multi-threshold processing, image segmentation, percolation effect

I. INTRODUCTION

The problems of effective detection, selection and localization of the objects of interest in observational images are of immense importance for the performance of television, infrared, laser and radar remote sensing systems [1-4]. The key properties that distinguish objects from the background are their compactness and isolation. Segmentation of an image into individual objects is usually based on such characteristics as homogeneity of intensity and color matching.
Regional methods are mainly based on the assumption that neighboring pixels within the object area have more or less similar values [3]. For a detailed overview of object segmentation methods that are sufficiently versatile for various image analysis applications, not limited to remote sensing systems but also including microscopic and biomedical imaging, we refer to [4] and references therein.

Mikhail I. Bogachev
Radioengineering dept., Saint-Petersburg Electrotechnical University "LETI"
5 Professor Popov street, 197376 Saint-Petersburg, Russia
mibogachev@etu.ru

The proposed approach links isolated objects in neighboring binary layers to obtain a 3D hierarchical structure for subsequent segmentation. The percolation effect is associated with the spread of empty pixels through the area occupied by an isolated object as the threshold value increases, which ultimately leads to its destruction and the emergence of new objects from the separated fragments [5,6]. After 3D reconstruction, all the objects of interest can be selected using various criteria, such as their geometric characteristics or texture parameters [7,8].

II. SELECTION OF OBJECTS IN MULTI-THRESHOLDED IMAGE PROCESSING

To implement object selection, one needs to know the expected properties of the object. Typically, there is a severe shortage of information about objects, except for their typical sizes and some assumptions about their location, perimeter, shape and orientation. The original idea is to set the optimal threshold value based on the results of a preliminary selection of objects at a multitude of thresholds, so that the best result is achieved using a posteriori information. This approach was originally proposed in [7] for the extraction of small-scale objects in remote sensing images. The object area proves very effective for the selection of similar small-scale objects by multi-threshold processing, as has been shown recently for a biomedical image analysis example [8].
It is usually possible to exclude from consideration small objects that arise from the background or are fragments separated from larger objects that have already disintegrated at this threshold. The drawback of this approach is the need to specify the absolute object areas in pixels, which limits multiscale processing; the latter requires scale-invariant features of the objects of interest.

Multi-thresholding transforms the original monochrome image into a set of binary layers. For a sufficiently large number of thresholds one can assume that this transformation does not lead to a loss of information. At the same time, binary image processing is simpler and faster than grayscale image processing. Ideally, each object of interest requires its own threshold value, and such local thresholds can be obtained by using local (sliding) windows, within which the background can be assumed homogeneous [9]. However, these methods also require a priori knowledge of the object sizes, and using a background window results in a loss of resolution for nearby objects and in the suppression of one object by neighboring objects that fall within the area of this window.

This article develops an original approach to the analysis of monochrome images based on their multi-threshold processing followed by scene reconstruction utilizing the percolation effect. A specific feature of the approach is that both image segmentation and object selection are based on the a posteriori information obtained for a series of multiple binary layers. The main advantage is that no prior training is required, and the algorithm parameters are adjusted for each image and then for each object individually.

Multi-threshold processing provides an alternative, namely setting a threshold for each category of objects of interest, which in turn are selected according to given criteria [10,11].
In this case various parameters may be used to describe the category of objects, such as the object size or its orientation. Invariant parameters, such as the ratio of the squared perimeter to the area, the elongation coefficient of the fitting ellipse, and other geometric or textural characteristics, are more convenient for the analysis of multi-scale images. In each binary layer, those objects that satisfy the specified criteria are selected, and the binarization threshold for such objects is chosen so that the maximum number of selected objects of this category (or of their pixels) is obtained, taking into account the required preservation of the shape of the objects. This process can be automated, resulting in an adaptive threshold setting method and algorithm.

The use of invariant geometric metrics helps to largely overcome the limitations of the area-based selection method, which is not suitable for objects that differ considerably in their area and intensity. A common example of an invariant metric is the normalized quotient of the squared object perimeter $P$ and the object area $S$, $P_S = P^2/(4\pi S)$, which characterizes the compactness of the object [10]. The normalizing coefficient $4\pi$ provides a unit value for the circle, which is the most compact planar object possible. The background of observational images commonly contains "fractal-shaped" speckle structures with $P_S$ values significantly above one. In marked contrast, objects of interest are typically compact, as indicated by lower $P_S$ values than those of the background objects, which makes it possible to distinguish them from a noisy background.

Another geometric invariant is the relative elongation of the main axis of the fitting ellipse, $P_L = \pi L^2/(4S)$, where $L$ is the length of the main axis, normalized to the object area such that it equals one for a circle [10]. In contrast to $P_S$, the $P_L$ coefficient is better suited for the selection of elongated objects.
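As a quick numerical check of the two invariants defined above, the following minimal sketch evaluates them for a few ideal shapes. The function names `compactness` and `elongation` are illustrative and do not come from the paper, and taking the long side of a thin strip as the main axis length L is a simplifying assumption rather than a true ellipse fit.

```python
import math


def compactness(perimeter: float, area: float) -> float:
    """P_S = P^2 / (4*pi*S); equals 1 for a circle."""
    return perimeter ** 2 / (4.0 * math.pi * area)


def elongation(length: float, area: float) -> float:
    """P_L = pi*L^2 / (4*S), with L the main axis length; equals 1 for a circle."""
    return math.pi * length ** 2 / (4.0 * area)


# Circle of radius r: P = 2*pi*r, S = pi*r^2, main axis L = 2*r.
r = 5.0
circle_ps = compactness(2 * math.pi * r, math.pi * r ** 2)
circle_pl = elongation(2 * r, math.pi * r ** 2)

# Square with side a: P = 4*a, S = a^2, so P_S = 4/pi regardless of a.
a = 16.0
square_ps = compactness(4 * a, a * a)

# Thin 100 x 2 strip; its long side is used as L (assumption, no ellipse fit).
strip_ps = compactness(2 * (100 + 2), 100 * 2)
strip_pl = elongation(100, 100 * 2)

print(f"circle: P_S={circle_ps:.3f}, P_L={circle_pl:.3f}")
print(f"square: P_S={square_ps:.3f}")
print(f"strip:  P_S={strip_ps:.3f}, P_L={strip_pl:.3f}")
```

Both coefficients depend only on ratios of squared lengths to areas, so rescaling the shape leaves them unchanged, which is what makes them usable for multi-scale selection.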
By minimizing the geometric invariant(s), the threshold level can be chosen adaptively for each of the selected objects. As we show below, this approach leads to a significant reduction of the false alarm rate during detection, allowing the use of lower thresholds characterized by higher hit rates for the objects of interest.

III. THE HIERARCHICAL SCENE RECONSTRUCTION BASED ON MULTI-THRESHOLD PROCESSING RESULTS TAKING INTO ACCOUNT THE PERCOLATION EFFECT

To select a local threshold, one must establish the relationship between the binary layers and decide, each time the threshold level increases, whether each pixel still belongs to the same object or to a new object that has formed in the new binary layer by detaching from a larger one. To this end, the links between pixels with the same coordinates in different binary layers should be supplemented by relationships between objects in different layers, determined by a certain algorithm and objective criteria. The percolation-based approach reconstructs a 3D hierarchical structure representing all binary layers and sets up relations between the isolated objects in them according to a rather universal yet tunable algorithm [11].

Let a monochrome image $I(x, y)$, where $I$ is the intensity and $x, y$ are the pixel coordinates, be subjected to a fixed global threshold $T$. The result is a binary layer $B_T$:

$$B_T(x, y) = \begin{cases} 1, & I(x, y) \ge T \\ 0, & I(x, y) < T \end{cases}$$

where $B_T = 1$ represents the objects of interest (foreground), such as buildings, structures, vehicles and coastlines, while $B_T = 0$ refers to the background represented by the landscape of the observation area. Let us start with zero threshold $T = 0$, when $I(x, y) \ge 0$ is satisfied for the entire image, thus forming a binary layer containing a single global isolated object with an area $S_0$ that occupies the entire image.
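The conversion of a monochrome image into binary layers can be sketched as follows. This is a minimal illustration in plain Python, assuming a small image stored as a list of rows with non-negative integer intensities; the names `binary_layers`, `layer_area` and the toy image are hypothetical and not taken from the paper.

```python
def binary_layers(image, thresholds):
    """Return {T: B_T}, where B_T[y][x] = 1 if image[y][x] >= T, else 0."""
    return {
        t: [[1 if value >= t else 0 for value in row] for row in image]
        for t in thresholds
    }


def layer_area(layer):
    """Total number of foreground pixels in a binary layer."""
    return sum(sum(row) for row in layer)


# Toy image with one bright blob on a dark background (illustrative values).
image = [
    [0, 1, 1, 0, 0],
    [1, 5, 6, 1, 0],
    [1, 6, 9, 2, 0],
    [0, 1, 2, 1, 0],
]

delta_t = 2                          # threshold step, an arbitrary choice here
thresholds = range(0, 10, delta_t)   # T = 0, 2, 4, 6, 8
layers = binary_layers(image, thresholds)
areas = [layer_area(layers[t]) for t in thresholds]

for t, s in zip(thresholds, areas):
    print(f"T={t}: foreground area S_T={s}")
```

At T = 0 the single object fills the whole image, and the foreground area can only shrink as the threshold grows, which is the starting point for tracking objects from layer to layer.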
Next let the threshold $T$ be increased by $\Delta T$, so that some pixels fall below the new threshold $\Delta T$, leading to the formation of a new binary layer satisfying $I(x, y) \ge \Delta T$. If $\Delta T$ is relatively small and only a small fraction of pixels is excluded from the object, then the global isolated object remains intact, while its area decreases slightly, $S_T < S_0$. When the threshold is further increased, the fraction of pixels with an intensity below the threshold becomes large enough that these pixels merge together to form gaps in the image. Ultimately, this leads to the formation of gaps in the original object and to the separation of isolated fragments from it. In this case, one should decide whether there is a successor to the original object among the separated fragments at all, or whether the original object is destroyed and all the fragments appear as new objects. Usually the selection of the successor(s) is based on the analysis of the areas of the separated fragments relative to the area of the original object. This kind of phase transition is known as percolation [5,6].

For each pair of adjacent binary layers, the ratio $S_{T+\Delta T}/S_T$ characterizes the fraction of connected pixels that still belong to the isolated object. A further increase in the threshold leaves an increasing number of pixels that initially belonged to the object below the new thresholds $T + k\Delta T$, where $k = 0, 1, \ldots, K$ is the integer layer index. Thus $K_p = S_{T+\Delta T}/S_T$ can be introduced as a persistence parameter of the object that depends on its topological and textural characteristics. As long as $K_p$ exceeds one-half, the object in the upper layer can be considered an unambiguous successor of the object in the lower layer, which is its only predecessor. If the object in the lower layer is divided precisely into two halves, then, since some pixels were spent on forming the gap, both fragments are smaller than $S_T/2$, and thus two new objects appear.
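The successor decision between two adjacent layers can be sketched as below. This is only one possible reading of the rule, under stated assumptions: objects are 4-connected (the text does not fix the connectivity), the largest fragment is accepted as the successor when its area is at least Kp times the parent area, and the helper names `label_objects` and `find_successor` are hypothetical.

```python
from collections import deque


def label_objects(layer):
    """Return the 4-connected foreground objects of a binary layer as pixel sets."""
    height, width = len(layer), len(layer[0])
    seen = [[False] * width for _ in range(height)]
    objects = []
    for y in range(height):
        for x in range(width):
            if layer[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, pixels = deque([(y, x)]), set()
                while queue:  # breadth-first flood fill of one object
                    cy, cx = queue.popleft()
                    pixels.add((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and layer[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                objects.append(pixels)
    return objects


def find_successor(parent, upper_layer, kp=0.5):
    """Return (successor or None, fragments) for one lower-layer object."""
    # The upper layer is nested in the lower one, so each upper object
    # lies entirely inside exactly one lower-layer object.
    fragments = [f for f in label_objects(upper_layer) if f <= parent]
    if not fragments:
        return None, []              # the object disappeared completely
    largest = max(fragments, key=len)
    if len(largest) / len(parent) >= kp:
        return largest, fragments    # persistence is high enough: successor
    return None, fragments           # percolation: all fragments are new objects


lower = [[1, 1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1, 1]]
upper_shrunk = [[1, 1, 1, 1, 1, 1, 0],
                [1, 1, 1, 1, 1, 1, 0]]
upper_split = [[1, 1, 1, 0, 1, 1, 1],
               [1, 1, 1, 0, 1, 1, 1]]

parent = label_objects(lower)[0]
successor_a, fragments_a = find_successor(parent, upper_shrunk)
successor_b, fragments_b = find_successor(parent, upper_split)
print(len(parent), successor_a is not None, successor_b is None, len(fragments_b))
```

In the second case the gap consumes two pixels, so each half holds 6 of the original 14 pixels, which is below one-half, and both fragments start new objects.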
The initial binary layer for the new object is formed at the moment of percolation of its predecessor and represents its base. Next the new object accumulates further binary representations for various threshold values $T_k$ as long as the condition $I(x, y) \ge T_k$ is met. Let the object arise at the threshold value $T$. The base area of the object $S_T$ decreases over $K$ layers until one of two events occurs: 1) the object disappears completely, i.e. $S_{T+(K+1)\Delta T} = 0$; 2) the object fragments, i.e. on the $(K+1)$-th layer the ratio $S_{T+(K+1)\Delta T}/S_{T+K\Delta T}$ is less than the specified $K_p$ value. In the latter case, all the fragments of the initial object appear as new objects.

The ratio of the object area in its last layer to its base area, $P_C = S_{T+K\Delta T}/S_T$, can be considered as its percolation coefficient, while the threshold level $T_K$ that leads to the disintegration of the object is known as the percolation threshold $T_c$. The percolation coefficient partly characterizes the texture of the object's surface. If an object has a flat top in its 3D representation with a constant intensity value $I(x, y) = \mathrm{const}$, it disappears completely after a single threshold increment, and its percolation coefficient is $P_C = 1$. In this case, the intensity value itself does not affect the value of the percolation coefficient, i.e. it turns out to be invariant to transformations such as shifting or scaling of the image. If the object changes its intensity smoothly, i.e. it has a small intensity gradient, then the inheritance between adjacent layers with an increasing threshold is maintained as long as $S_{T+K\Delta T} \ge K_p S_T$. If the object is "long-lived", its percolation coefficient is usually close to zero (see example in Fig. 1).

Fig. 1. Example of two Gaussian objects (a, b) and the percolation effect: c – decreasing areas in pixels (Npix) of the two objects (curves 1 and 2) with increasing threshold T; (d, e) – binary slices at the moment of percolation at threshold T = 17.

As a result of this approach, a three-dimensional hierarchical structure is formed, containing all the selected objects, in which each pixel no longer belongs to a single binary layer but corresponds to several binary layers $k = 0, \ldots, K$. Based on this reconstruction, a generalization of the image segmentation and object selection procedures can be proposed. First, restrictions are introduced on the area of the selected objects as well as on other geometric parameters that characterize their compactness and shape. Then, for each selected object, the optimal threshold $T_{opt}$ corresponding to the layer with the best planar representation can be selected from its multiple representations in the series of binary layers by various geometric or textural criteria, as indicated above.

To analyze finer details, one can set $K_p = 1$, which is the most stringent requirement: the loss of even one pixel from the original object leads to the formation of a new object. In this case every increase of the threshold yields a new collection of objects. In contrast, choosing $K_p = 0.5$ results in a significant reduction in the total number of three-dimensional objects spanning multiple binary layers.

If the image contains only random noise, i.e. the intensity is randomly distributed throughout the image, the location of pixels that appear below a certain threshold is also random. Theoretically, it is well known that for an infinite image size and pure white noise, the percolation coefficient is invariant to the noise distribution and is equal to $P_C \approx 0.593$.

IV. OBJECT SELECTION USING ADAPTIVE GLOBAL THRESHOLD

The simplest selection is based on the area of objects. In some cases, there is a priori information about the typical size or area of objects. Objects smaller than the minimum expected object area $S_{min}$ are eliminated, which significantly reduces the computational cost of the algorithms.

The global threshold works well when detecting and selecting a group of similar compact (or extended) objects against a homogeneous noisy background. Consider the case when the image contains a number of similar objects that need to be selected. In each binary layer, objects that satisfy the specified properties are selected, and the selection threshold for such objects is chosen to obtain the maximum number of objects of this category (or of their pixels), taking into account the required preservation of the object's shape.

Figure 2 shows simulation results for 49 square objects of 16×16 pixels each against a standard Gaussian noise background (Fig. 2a). The signal-to-noise ratio (relative expectation shift, or deflection) in each pixel is d = 1.163. The dependence of the total number of connected objects on the threshold value is shown in Fig. 2b. When selecting objects by area, an acceptable distortion of object boundaries is achieved at the threshold value T = 133 (Fig. 2c).

At lower threshold values, the shape of objects is distorted by background noise, which significantly fragments the boundaries. At higher threshold values some objects may be lost. A simple detector with the Neyman-Pearson threshold provides a false alarm rate F = 0.01 for each pixel, as shown in Fig.
2d, but the hit rate is only D = 0.12, so the objects are highly fragmented and none retains its square shape. Fig. 4 shows the corresponding detection characteristic (curve 1), where the x-axis shows the signal-to-noise ratio (deflection). The widely used Otsu detector gives too many false alarms, as shown in Fig. 2e.

The false alarm rate depends on the minimal area $S_{min}$ of the detected objects. Figure 3 shows the corresponding simulation results on a logarithmic scale, with $S_{min}$ on the x-axis. For a given threshold and for each area $S_{min}$, the fraction of threshold exceedances outside the objects of interest (caused by background noise) has been calculated and normalized to the entire image size, giving the false alarm rate (y-axis, log scale). Curves 1 to 5 correspond to threshold levels T = 150, 155, 160, 165, 170.

The consequence of selecting and removing small objects is the ability to reduce the threshold to levels where useful objects are better detected while maintaining a low false alarm rate. The larger the fraction of noise objects to be removed, the lower the detection threshold can be set for the same false alarm rate. For $S_{min} = 150$ this leads to a normalized threshold $t_{NP} = 0.47$ instead of 2.326 (see Fig. 4, curve 1), while without selection the same threshold would lead to a significantly higher false alarm rate F = 0.32 (see Fig. 4, curve 2). A similar curve showing per-pixel detection under the object area based preselection scenario lies in between (see Fig. 4, curve 3). The results are obtained by simulation, so the y-axis presents estimates of the hit rate. The threshold deflection is now equal to 0.5, providing about 6 dB benefit in terms of the signal-to-noise ratio. Once there is information about the shape, the results could be further improved by averaging.

It is worth noting that adapting the threshold level to the maximum number of selected objects makes sense in those situations when the scene contains a sufficient number of rather similar objects of interest. In addition, this method gives a slightly lower value of the threshold level. To overcome these limitations, geometric invariants are used [11].

Fig. 2. Detection of square objects in background noise. The intensity scale in Fig. 2c shows the object area. Panel titles: "T = 133, Object number i = 2870, LENGTH = 180" and "Otsu Threshold".

Fig. 3. Fraction of threshold exceedances by background noise (false alarm rate, lg10 F) as a function of the minimum object area Smin. Curves 1 to 5 correspond to threshold levels T = 150, 155, 160, 165, 170, respectively.

Fig. 4. Detection characteristics (hit rate D versus signal-to-noise ratio, or deflection) for object selection by area for Smin = 150: 1 – Neyman-Pearson (NP) detector with high threshold; 2 – NP detector with low threshold; 3 – detector for low threshold with object selection.

V. OBJECT SELECTION USING ADAPTIVE LOCAL THRESHOLD BASED ON GEOMETRIC CRITERIA

Let us first consider a white noise field subjected to threshold $T$, followed by area preselection with $S_{min} = 10$, which can substantially reduce the number of isolated objects in the binary layer. The dependence of the number of selected objects $N_{obj}$ on the threshold value $T$ is shown in Fig. 5a, curve 1, where the total number of objects equals 500. Curve 2 in Fig. 5a indicates the dependence of the maximum (over all objects in the given layer) perimeter elongation coefficient $P_{Smax}$ on the threshold $T$. Obviously this coefficient reaches large values (up to 90) for noisy objects with a fractured structure. Area based preselection reduces the rate of noise based exceedances, as shown in Fig. 5b, curve 2, compared to simple threshold based analysis without preselection (Fig. 5b, curve 1), while it has virtually no effect on the values of the perimeter elongation coefficient $P_S$ for the remaining objects.
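The per-object measurements used in this kind of preselection can be sketched as follows. The sketch assumes 4-connected objects and measures the perimeter as the number of pixel edges facing the background, which is only one of several possible perimeter estimates; an optional upper bound on the coefficient is included alongside the minimum area. The names `label_objects`, `perimeter` and `select_objects` and the toy layer are hypothetical.

```python
import math


def label_objects(layer):
    """Return the 4-connected foreground objects of a binary layer as pixel sets."""
    height, width = len(layer), len(layer[0])
    seen, objects = set(), []
    for y in range(height):
        for x in range(width):
            if layer[y][x] and (y, x) not in seen:
                seen.add((y, x))
                stack, pixels = [(y, x)], set()
                while stack:  # iterative flood fill of one object
                    cy, cx = stack.pop()
                    pixels.add((cy, cx))
                    for n in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        ny, nx = n
                        if (0 <= ny < height and 0 <= nx < width
                                and layer[ny][nx] and n not in seen):
                            seen.add(n)
                            stack.append(n)
                objects.append(pixels)
    return objects


def perimeter(pixels):
    """Count pixel edges that face the background or the image border."""
    return sum(
        (ny, nx) not in pixels
        for (y, x) in pixels
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
    )


def select_objects(layer, s_min, ps_max):
    """Keep objects with area >= s_min and P_S <= ps_max; return (area, P_S) pairs."""
    selected = []
    for pixels in label_objects(layer):
        area = len(pixels)
        ps = perimeter(pixels) ** 2 / (4 * math.pi * area)
        if area >= s_min and ps <= ps_max:
            selected.append((area, ps))
    return selected


# Toy binary layer: a 4x4 square, a single noise pixel and a thin 1x8 line.
layer = [
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
]

selected = select_objects(layer, s_min=4, ps_max=2.0)
print(selected)
```

With these settings the single pixel is rejected by area and the thin line by its coefficient, so only the compact square survives; relaxing either bound lets the corresponding object back in.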
It is interesting to investigate the influence of limitations on the coefficient $P_S$. Obviously, restricting the maximum value of $P_S$ results in additional elimination of noise objects characterized by high elongation coefficients. These objects appear to have a fractured structure and do not represent the objects of interest. The value $P_{Smax}$ is the second important parameter of the adaptive object selection: objects with $P_S > P_{Smax}$ are rejected. Figure 6 shows that limiting the elongation coefficient $P_S$ can decrease the false alarm rate during detection, resulting in more efficient detection of objects that satisfy the $P_S$ limit, although the gain substantially depends on the object shape. As a result, the curves exhibit a decline not only at high but also at low threshold values (see Fig. 6).

Fig. 5. Results of Gaussian noise processing: a – number of objects Nobj and maximum perimeter elongation coefficient PSmax after area selection with Smin = 10; b – probabilities of noise exceedances for simple binarization (1) and after preselection by area with Smin = 10 (2).

Fig. 6. Logarithm of the rate of exceedances (false alarm rate), lg10 F, as a function of the normalized threshold tNP for Smin = 10 and PSmax = 1000, 300, 100, 10, 3 (from top to bottom).

VI. PRACTICAL EXAMPLES OF PROCESSING WITH REMOTE OBSERVATIONS

Next let us consider an aerial television image shown in Fig. 7a. The task is to select and localize buildings in this scene. The proposed method is applied with $S_{min} = 150$ and $P_{Smax} = 50$. The selection results are presented in Fig. 7b, where the brightness of each object corresponds to the individually chosen threshold value. The image contains 82 objects, each selected using its own local adaptive threshold. The corresponding variation of threshold values is shown in Fig. 8a, where the x-axis is the isolated object number. A typical U-shaped optimization curve is presented in Fig. 8b.

Fig. 7. Object selection from a television image with Smin = 150 and PSmax = 50. Brightness scale indicates the adaptive threshold values.

Fig. 8. The dependence of the optimal threshold values Topt on the object number nobj (a) and a typical U-shaped optimization curve of PS versus T (b).

Next let us consider an aerial radar image depicted in Fig. 9a, and a sketch based on an aerial photographic image of the same area shown in Fig. 9b. A common problem is to combine or match fragments obtained by different remote sensing tools. One possible solution to this problem is to select the same objects in different images, which are then used to obtain reference points, thus performing matching not at the pixel level but at the object level.

Fig. 9. Radar image (a), sketch (b) and their Otsu analysis results (c and d).

In this case, the global threshold does not work well, leading to the destruction of the majority of objects, as indicated by Figs. 9c,d showing binary layers obtained using the Otsu threshold. In contrast, using an adaptive local threshold with selection of objects by area, taking into account the $P_S$ coefficient, allows one to obtain several representations for each of the selected objects, which differ in the displayed parameter (see results in Fig. 10).

This displayed parameter can be the object area (Fig. 10a,b), the minimal value of its $P_S$ coefficient (Fig. 10c,d) achieved at the optimal threshold $T_{opt}$, as well as the value of this optimal threshold $T_{opt}$ itself. Fig. 10e,f shows the difference $T_{max} - T_{opt}$. The choice of one criterion or another largely depends on the particular problem, as objects with different topology and texture are better selected using different parameters or criteria based on their combination.

Fig. 10. Different views of selected objects with respect to the displayed parameter: (a, b) – object area; (c, d) – minimal coefficient Psmin; (e, f) – difference between thresholds Tmax – Topt. Panel titles: "Bright = Length, Psmax = 30" (a, b); "Bright = OptThr, Psmax = 30" (c, d); "Bright = Psmax-Vopt, Psmax = 30" with nobj = 200 (e) and nobj = 151 (f).

It is important that the objects of interest can be selected separately in each image source, thus making it possible to match them based on their properties in each of the original images. To illustrate this, Fig. 11 shows the same detached object in the radar image (a) and in the sketch (b) after the percolation of the base object. In both images the detached object exhibits a characteristic contour curve following the river bank that can be easily used for image matching.

Fig. 11. Objects which are selected separately on the radar image (a) and on the sketch (b).

VII. CONCLUSIONS

To summarize, a method and an algorithm for the adaptive selection of compact and elongated objects in remote sensing images are proposed. The methodology is based on the initial multi-threshold processing of the raw image, resulting in the creation of several binary layers. After selecting isolated objects in each of the binary layers, the layers containing the best object representation in terms of its geometric criteria are used for further analysis. The key idea of the algorithm is to use the a posteriori information about object representations in each binary layer and to find the best layer in terms of the object properties, depending on requirements such as the preservation of each object's shape despite the non-stationary background. Based on this approach, automated adaptive selection algorithms can be easily implemented. In a test detection problem, the elimination of small objects by preselection provides a benefit of at least 6 dB in terms of the signal-to-noise ratio. We believe that this approach is particularly suitable for the analysis of objects with characteristic edges, such as the detection and monitoring of the progression of cracks in ice shields.

ACKNOWLEDGEMENT

We acknowledge the financial support of this work by the Russian Science Foundation (grant No. 16-19-00172).

REFERENCES

[1] G. Cheng, J. Han, “A survey on object detection in optical remote sensing images,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 117, 2016, pp. 11–28.
[2] G. P. Patil, C. Taillie, “Upper level set scan statistic for detecting arbitrarily shaped hotspots,” Environmental and Ecological Statistics, vol. 11, 2004, pp. 183–197.
[3] W. Zhou, A. Troy, “An Object-Oriented Approach for Analyzing and Characterizing Urban Landscape at the Parcel Level,” International Journal of Remote Sensing, vol. 29(11), 2008, pp. 3119–3135.
[4] H. Gu, Y. Han, Y. Yang, H. Li, Z. Liu, U. Soergel, T. Blaschke, S. Cui, “An efficient parallel multi-scale segmentation method for remote sensing imagery,” Remote Sensing, vol. 10(4), 2018, pp. 590–608.
[5] M. Langovoy, O. Wittich, “Randomized algorithms for statistical image analysis and site percolation on square lattices,” Statistica Neerlandica, vol. 67, 2013, pp. 337–353.
[6] E. Arias-Castro, G. R. Grimmett, “Cluster detection in networks using percolation,” Bernoulli, vol. 19(2), 2013, pp. 676–719.
[7] V. Volkov, “Extraction of extended small-scale objects in digital images,” Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., vol. XL-5/W6, 2015, pp. 87–93.
[8] M.I. Bogachev, V.Y. Volkov, O.A. Markelov, E.Y. Trizna, D.R. Baydamshina, V. Melnikov, R.R. Murtazina, P.V. Zelenikhin, I.S. Sharafutdinov, A.R. Kayumov, “Fast and simple tool for the quantification of biofilm-embedded cells sub-populations from fluorescent microscopic images,” PLoS One, vol. 13(5), 2018, e0193267.
[9] H. Rohling, “Ordered statistic CFAR technique - an overview,” in Proc. 12th International Radar Symposium (IRS), 2011, pp. 631–638.
[10] M.I. Bogachev, V.Y. Volkov, G. Kolaev, L. Chernova, I. Vishnyakov, A. Kayumov, “Selection and Quantification of Objects in Microscopic Images: from Multi-Criteria to Multi-Threshold Analysis,” BioNanoScience, vol. 9, 2019, pp. 59–65.
[11] V. Yu. Volkov, M. I. Bogachev, A. R. Kayumov, “Object Selection in Computer Vision: from Multi-Thresholding to Percolation Based Scene Representation,” in: Computer Vision in Advanced Control Systems-5, Intelligent Systems Reference Library, vol. 175. Springer, Cham, 2020, pp. 161–194.