Detection and Extraction of Objects in Digital Images

Vladimir Yu. Volkov
Radio Systems Dept., Saint-Petersburg Electrotechnical University "LETI"; Saint-Petersburg State University of Aerospace Instrumentation, Saint-Petersburg, Russia
vl_volk@mail.ru

Mikhail I. Bogachev
Radio Systems Dept., Saint-Petersburg Electrotechnical University "LETI", Saint-Petersburg, Russia
rogex@yandex.com

Abstract—Detecting and extracting objects of interest from monochrome images on the basis of their compactness and isolation is essential for analyzing remote sensing data. The algorithms under consideration build on multi-threshold processing, which produces a set of binary layers. Isolated objects in each binary layer can then be processed morphologically to analyze their geometric characteristics and select them by geometric criteria. As a result, an adaptive detection threshold can be set individually for each selected object. Selection significantly reduces the number of false alarms during detection and allows lower thresholds to be used, which increases the probability of correctly detecting the objects of interest. Results on synthetic test images and on object detection in remote observational imagery demonstrate the effectiveness of the considered algorithms.

Keywords: object detection and extraction; multi-threshold processing; image segmentation; percolation

I. INTRODUCTION

Detection, extraction and localization of objects of interest are important problems in analyzing images from various remote sensing systems and have therefore been studied intensively over recent decades [1–5]. There are various approaches to the image segmentation problem. Historically, the earliest and simplest methods compared an image with a certain threshold, segmenting the source image into more intense (above the threshold) and less intense (below the threshold) areas.

In threshold-based approaches, one of the key problems is choosing an appropriate criterion for threshold selection. Several approaches are widely used in applied computer vision. One is the Otsu method, which maximizes the variance between the segments while minimizing the variance within each segment. Others optimize corresponding entropy metrics [6]. Other relevant approaches include the following:
- clustering-based methods, with k-means clustering being the most representative and widespread technique [7];
- histogram-based methods, which look for local peaks and valleys in the histogram computed from all pixels of the image; the balanced histogram algorithm is a representative example [8];
- compression-based segmentation, which looks for regular patterns in an image in order to minimize the coding length of the data, a common step in image compression algorithms [9];
- edge-detection-based methods, which join neighboring sharp edges to form closed region boundaries [10].

Among more complex alternatives, several nonlinear approaches are of interest. These include fuzzy-logic-based nonlinear adaptive thresholds [6] and specialized methods that estimate threshold values adaptively from the raw data even before image formation, as used in some scanning and tomographic systems [11]. Segmentation is typically followed by the selection of certain segments that represent potential objects of interest.
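The Otsu criterion mentioned above is also used as a comparison detector in Section IV. It can be sketched directly from the gray-level histogram. This is a minimal NumPy illustration that assumes an 8-bit integer image. The name `otsu_threshold` is hypothetical. Library implementations may differ in tie-breaking and in whether the returned level belongs to the lower or the upper class.

```python
import numpy as np


def otsu_threshold(image):
    """Return T such that splitting pixels into I < T and I >= T maximizes
    the between-class variance (equivalently, minimizes the within-class
    variance). An 8-bit integer image is assumed (hypothetical helper)."""
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                  # gray-level probabilities
    levels = np.arange(p.size)
    omega = np.cumsum(p)                   # weight of the class I <= k
    mu = np.cumsum(p * levels)             # partial first moment of that class
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    valid = denom > 1e-12                  # skip splits with an empty class
    sigma_b2 = np.zeros_like(omega)
    sigma_b2[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    k = int(np.argmax(sigma_b2))           # last level of the lower class
    return k + 1


img = np.array([[50, 50, 200, 200]] * 4, dtype=np.uint8)
print(otsu_threshold(img))  # 51: the first level that separates the two modes
```

In this two-level example, every split between 50 and 199 gives the same between-class variance, and `argmax` returns the first one.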
In general, performance optimization encourages integrating the image segmentation and object selection procedures. Some of the segment properties calculated at the segmentation stage then become available later for object selection. Conversely, additional segment parameters required for selection can be calculated at the object selection stage, which provides additional information for choosing a preferable segmentation scenario. Running the entire selection procedure for several segmentation scenarios may improve decision making further, since decisions are then based on a posteriori as well as a priori information. This always comes at a cost, because calculating additional scenarios considerably increases the overall computational complexity of the algorithm. The approach may nevertheless be attractive for implementation on the highly parallel hardware architectures that are common nowadays [12]. In threshold-based approaches, this scenario is represented by multi-threshold processing. Numerous papers have considered applications of multi-threshold processing to image segmentation; a brief overview can be found, for example, in [13, 14]. Multi-threshold segmentation is often based on the histogram properties of the original image, and in most cases the last step is selecting a single optimal threshold. The properties of the objects of interest and the results of their selection are not taken into account.
Here we apply the a posteriori decision-making concept discussed above to multi-threshold image processing. The statistics required for image segmentation and object selection are first calculated for a series of test thresholds. Final decisions on segmentation and selection scenarios are then based on analyzing these statistics over the entire set of tested thresholds [15]. When a group of homogeneous objects is to be selected, a simple approach can be used. The best global threshold is the one that maximizes the total area occupied by objects of a certain size or shape, optimized over the entire range of test thresholds [16]. For dissimilar objects, an alternative approach has been developed that uses the percolation effect to reconstruct a three-dimensional hierarchical structure of objects from multi-threshold processing [17]. This approach links the properties of object cuts in neighboring binary layers and builds a hierarchical structure for subsequent segmentation [18]. The methodology involves easily implementable computational operations and a small number of adjustable parameters, and it is particularly efficient on highly parallel hardware architectures. The goal of this paper is to estimate its effectiveness in detecting and selecting compact objects in noisy synthetic and observational images.

II. SELECTION OF OBJECTS IN MULTI-THRESHOLD IMAGE PROCESSING

To implement this idea in a working algorithm, one must find quantities that characterize segmentation results with respect to the selection of objects of interest.
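The global-threshold strategy for homogeneous objects mentioned above [16] can be illustrated with a short sketch. It assumes NumPy and SciPy, 4-connected components, and an area window [s_min, s_max] as the only selection criterion. The function name is hypothetical, and this is an illustration of the idea rather than the implementation used in [16].

```python
import numpy as np
from scipy import ndimage


def global_threshold_by_area(image, thresholds, s_min, s_max):
    """Hypothetical helper: choose one global threshold for homogeneous objects.

    For every test threshold T the image is binarized as I >= T, 4-connected
    components are labeled, and the areas of components whose size lies in
    [s_min, s_max] are summed. Returns the T with the largest total area
    together with the per-threshold totals.
    """
    totals = []
    for t in thresholds:
        labels, _ = ndimage.label(image >= t)     # default 2-D structure: 4-connectivity
        areas = np.bincount(labels.ravel())[1:]   # drop background label 0
        keep = (areas >= s_min) & (areas <= s_max)
        totals.append(int(areas[keep].sum()))
    totals = np.asarray(totals)
    return thresholds[int(np.argmax(totals))], totals
```

To obtain a shape-based variant, replace the area window with a test on a shape coefficient; the rest of the loop stays the same.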
Quantities of this kind should generally reflect the expected properties of the objects of interest. These properties may vary considerably between application areas, so the quantities are largely problem-oriented. Nevertheless, some universal strategies should apply in a general context with only minor problem-specific adjustments. The first requirement is that the metrics be scale invariant: to a first approximation, they should not depend on the image scale. For the moment we neglect the inevitable discreteness effects.

One invariant geometric metric is the ratio of an object's squared perimeter to its area,

$$P_S = \frac{P^2}{4\pi S},$$

which characterizes the compactness of the object [19]. The normalizing coefficient $4\pi$ makes $P_S$ equal to one for the most compact planar object of a given area, which is a circle. Noise and background objects usually have fractured edges, so their $P_S$ values significantly exceed one. Compact objects of interest have lower values of this coefficient than noise-induced segments, which makes it possible to detect and select them against a noisy background. Another geometric invariant is the elongation coefficient of the main axis of the describing ellipse,

$$P_L = \frac{\pi L^2}{4S},$$

where $L$ is the length of the main axis. This coefficient is also normalized to equal one for a circle [20]. With the $P_S$ and $P_L$ coefficients, the threshold for each selected object can be set at the binary layer where the corresponding coefficient reaches its minimum. This yields adaptive local thresholds. These methods can significantly reduce the number of false alarms during detection and allow lower thresholds to be used, which increases the probability of correctly detecting the objects of interest.

III. THE HIERARCHICAL STRUCTURE BASED ON MULTI-THRESHOLD PROCESSING TAKING INTO ACCOUNT THE EFFECT OF PERCOLATION

To select a local threshold, one must first determine the relationship between individual layers. For each pixel, one must establish whether it belongs to the same object or to a new object in the next binary layer above the previous one. Accordingly, links must be established between pixels with the same coordinates in different binary layers. Once a parameter defining this relationship is introduced, a three-dimensional hierarchical structure can be built. It starts from a single binary object obtained at zero threshold, which occupies the entire image area. To solve this problem, we use an algorithm based on the percolation principle [18].

The original grayscale image $I(x, y)$ is subjected to a set of thresholds $T$, resulting in a series of binary layers

$$B_T(x, y) = \begin{cases} 1, & I(x, y) \ge T, \\ 0, & I(x, y) < T, \end{cases}$$

in which the subset of ones represents objects of interest and the subset of zeros represents the background. At the starting threshold $T = 0$, the binary layer contains a single global isolated object with area $S_0$ occupying the full image. Now let the threshold be increased by $\Delta T$, creating a new binary layer above the previous one in which $I(x, y) \ge \Delta T$. For small $\Delta T$, only a few pixels are excluded from the object, so its area $S_T < S_0$ decreases slightly at each step. Raising the threshold further increases the proportion of pixels outside the single global object. At a certain point these pixels merge and form gaps in the image. Finally, the gaps merge and split the single object into several isolated fragments. This kind of phase transition is known as percolation [17].
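A minimal NumPy/SciPy sketch of the two invariants from Section II follows. It also implements the rule that sets an object's threshold at the layer where $P_S$ is smallest. The following assumptions are ours and do not come from the paper:
- the perimeter $P$ is the number of pixel edges between object and background;
- $L$ is the major-axis length of the moment-equivalent ellipse;
- objects are 4-connected;
- the object is identified by a seed pixel and followed upward through the layers without the persistence test described below.

All function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def perimeter_edges(mask):
    """Count pixel edges between object (True) and background (False).
    A crude estimate that overestimates oblique and curved boundaries."""
    m = np.pad(mask.astype(np.int8), 1)
    return int(np.abs(np.diff(m, axis=0)).sum() + np.abs(np.diff(m, axis=1)).sum())


def shape_coefficients(mask):
    """Return (P_S, P_L) for one binary object.
    P_S = P**2 / (4*pi*S); P_L = pi*L**2 / (4*S), with L taken as the
    major-axis length of the moment-equivalent ellipse (assumption)."""
    ys, xs = np.nonzero(mask)
    s = ys.size
    coords = np.stack([ys, xs]).astype(float)
    centered = coords - coords.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / s                # 2x2 second central moments
    major = 4.0 * np.sqrt(np.linalg.eigvalsh(cov)[-1])
    p = perimeter_edges(mask)
    return p**2 / (4 * np.pi * s), np.pi * major**2 / (4 * s)


def local_threshold_min_ps(image, seed, thresholds, s_min=1):
    """Hypothetical helper: follow the component containing `seed` through the
    layers I >= T (thresholds in ascending order) and return the T at which
    P_S is smallest, together with that P_S value."""
    best_t, best_ps = None, np.inf
    for t in thresholds:
        labels, _ = ndimage.label(image >= t)
        k = labels[seed]
        if k == 0:                                 # seed pixel fell below the threshold
            break
        obj = labels == k
        if obj.sum() < s_min:
            continue
        ps, _ = shape_coefficients(obj)
        if ps < best_ps:
            best_t, best_ps = t, ps
    return best_t, best_ps
```

For an axis-aligned square this edge count gives exactly $P_S = 4/\pi \approx 1.27$. For digitized disks it overestimates the perimeter, which is one of the discreteness effects set aside in Section II.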
When the original object is destroyed, a set of fragments forms in its place, each a potential successor. One must then decide whether the original object has a successor at all. Alternatively, the original object is eliminated and all emerging fragments are treated as new objects. This decision requires a parameter that characterizes the stability of each isolated object as the threshold changes. The parameter is related to the rate at which the area of an isolated object decreases as the threshold increases. For each pair of consecutive layers, the area ratio $S_{T+\Delta T}/S_T$ gives the proportion of connected pixels that remain inside the isolated object. More pixels leave the object each time the threshold is incremented by $\Delta T$, giving the value $T + k\Delta T$ for the $k$th layer. The fraction of pixels remaining within the object,

$$K_p = \frac{S_{T+\Delta T}}{S_T},$$

can therefore serve as a measure of the object's persistence (area stability) as the threshold increases. As long as $K_p \ge 1/2$, the object in the upper layer is the unambiguous successor of the object in the lower layer. That lower-layer object is its only predecessor, or base object. If the base object splits exactly into two halves, both fragments have area smaller than $S_T/2$, because the pixels forming the gap are excluded. In that case, two new objects appear.

This approach produces a three-dimensional hierarchical structure containing all selected objects. In this structure, a pixel no longer belongs to a single binary layer but can correspond to several binary layers $k = 0, \ldots, K$. This construction makes it possible to generalize image segmentation and object selection.
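The persistence rule can be sketched for one pair of consecutive layers. This is a minimal NumPy/SciPy illustration under our own assumptions: 4-connected components, boolean layers, and a hypothetical function name. It is not the authors' code. Because thresholding is monotone, every upper-layer object lies inside exactly one lower-layer object. Fragments of one base object are disjoint and separated by at least one gap pixel, so with $K_p \ge 1/2$ at most one fragment can qualify as the successor.

```python
import numpy as np
from scipy import ndimage


def link_layers(lower, upper, kp=0.5):
    """Link objects of the upper binary layer (threshold T + dT) to objects of
    the lower layer (threshold T). Hypothetical helper.

    Returns {upper_label: (base_label or None, ratio)}, where ratio is
    S_fragment / S_base. A fragment with ratio >= kp is the successor of its
    base object; otherwise it starts a new object.
    """
    lower = np.asarray(lower, dtype=bool)
    upper = np.asarray(upper, dtype=bool)
    # Thresholding is monotone: every upper pixel is also a lower pixel.
    assert not np.any(upper & ~lower), "upper layer must be nested in lower"
    lab_lo, n_lo = ndimage.label(lower)            # default 4-connectivity
    lab_up, n_up = ndimage.label(upper)
    area_lo = np.bincount(lab_lo.ravel(), minlength=n_lo + 1)
    area_up = np.bincount(lab_up.ravel(), minlength=n_up + 1)
    # All pixels of one upper object carry the same lower label, so the
    # assignment below records its base object (entry 0 is unused).
    base = np.zeros(n_up + 1, dtype=int)
    base[lab_up.ravel()] = lab_lo.ravel()
    links = {}
    for u in range(1, n_up + 1):
        ratio = area_up[u] / area_lo[base[u]]
        links[u] = (int(base[u]) if ratio >= kp else None, float(ratio))
    return links
```

Applying this function to every pair of consecutive layers yields the hierarchy. Following a successor chain until no fragment qualifies gives the object's lifetime, which is used in Section IV to define the percolation coefficient and the percolation threshold.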
To generalize segmentation and selection in this way, restrictions are introduced on the area of the selected objects and on other geometric features that characterize their compactness and shape. Each selected object passes through multiple binary layers in the three-dimensional representation. For each such object, the optimal threshold $T_{opt}$ corresponds to the layer with the object's best cut and can be chosen by various geometric or textural criteria. Examples of selection quantities include the areas or area ranges of the selected objects and the invariant geometric coefficients $P_S$ and $P_L$ discussed above.

Fig. 1 gives an example. The image is binarized with a discrete threshold. The original object at $T = 0$ slowly loses area as the threshold increases (Fig. 1c). Its persistence coefficient $K_p$ remains above one half until the object splits into two parts at threshold $T_1 = 17$ (Fig. 1d). The first of the two objects then disappears after its percolation threshold $T_{c1} = 21$. Its percolation coefficient is very small, below 0.01. The second object has a slightly higher percolation coefficient of 0.12.

Figure 1. Example of two Gaussian objects (a, b) and the percolation effect: (c) decrease of the object areas (in pixels) with increasing threshold, with the total area shown by the dashed line; (d, e) binary slices at the moment of percolation at threshold T = 17.

IV. OBJECT SELECTION USING ADAPTIVE GLOBAL THRESHOLD

The simplest selection is based on object area. In some cases, there is a priori information about the typical size or area of the objects. Objects that are too small should be removed from consideration, which also significantly reduces the computational complexity of the algorithms. The minimum area $S_{\min}$ of an object of interest is therefore one of the algorithm parameters.

Let an object first appear through the percolation of a larger object at threshold $T$ and occupy the area $S_T$, which we call its base area. As $T$ increases, the object's area decreases monotonically. This continues until the object either disappears entirely, $S_{T+(K+1)\Delta T} = 0$, or disintegrates, $S_{T+(K+1)\Delta T}/S_{T+K\Delta T} < K_p$. The relative change in area over the object's lifetime,

$$P_C = \frac{S_{T+K\Delta T}}{S_T},$$

is its percolation coefficient. The threshold $T_K = T + K\Delta T$ is its percolation threshold $T_c$.

Fig. 2 shows the simulation results. The test image contains 49 square objects of 16×16 pixels on a standard Gaussian noise background (Fig. 2a). The signal-to-noise ratio (relative shift of the expectation, or deflection) in each pixel is $d = 1.163$. Fig. 2b shows the total number of connected objects as a function of the threshold. When objects are selected by area with $K_p = 0.75$, the distortion of object boundaries becomes acceptable at threshold $T = 133$ (Fig. 2c). At lower thresholds, fractal noise distorts the object shapes and severely fragments their boundaries. A simple detector with a Neyman-Pearson threshold gives a per-pixel false alarm probability $F = 0.01$ (Fig. 2d). However, its detection probability is only $D = 0.12$, so the objects are highly fragmented and none of them retains its square shape. The right curve in Fig. 4 shows the corresponding detection characteristic, with the signal-to-noise ratio (deflection) on the x-axis. The widely used Otsu detector gives too many false alarms (Fig. 2e).
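The per-pixel figures quoted for the Neyman-Pearson detector can be reproduced from a simple Gaussian model. We assume that the noise is N(0, 1) and that an object shifts the mean by the deflection d; this reading is consistent with the quoted numbers. Under this model, the threshold t for F = 0.01 is the upper 1% point of N(0, 1), and D = P(N(0, 1) >= t - d). The same model gives the false alarm probability for the lowered normalized threshold 0.47 discussed later in this section. A sketch using SciPy follows; the helper name is hypothetical.

```python
from scipy.stats import norm


def np_detector(f_alarm, d):
    """Per-pixel Neyman-Pearson detector for a mean shift d in N(0, 1) noise.

    Returns the normalized threshold t with P(noise >= t) = f_alarm and the
    detection probability D = P(noise + d >= t).
    """
    t = norm.isf(f_alarm)            # inverse survival function: upper quantile
    return t, norm.sf(t - d)


d = 1.163                            # per-pixel deflection in the simulation
t_high, d_high = np_detector(0.01, d)
print(f"t = {t_high:.3f}, D = {d_high:.2f}")       # t = 2.326, D = 0.12

t_low = 0.47                         # lowered threshold used with area selection
print(f"F at t = {t_low}: {norm.sf(t_low):.2f}")   # 0.32 without selection
```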
Figure 2. Detection of square objects in noise: (a) test image; (b) number of selected objects versus threshold value; (c) percolation-based adaptive multi-threshold selection at T = 133; (d) Neyman-Pearson detector; (e) Otsu detector.

The brightness scale in Fig. 2c shows the object area. The false alarm probability depends on the minimum area $S_{\min}$ of the detected objects, as Fig. 3 shows on a logarithmic scale. The curves were obtained by simulation. For a given threshold and each value of $S_{\min}$ (in pixels, on the horizontal axis), the number of noise exceedances was counted and normalized to the field size. The y-axis shows the decimal logarithm of this normalized value, which estimates the order of magnitude of the false alarm probability. Curves 1 to 5 correspond to increasing threshold levels T = 150, 155, 160, 165, 170, respectively.

Figure 3. Degree of threshold exceedance by noise (decimal logarithm of the false alarm probability) versus the area $S_{\min}$. Curves 1 to 5 correspond to increasing threshold levels T = 150, 155, 160, 165, 170, respectively.

Selecting and removing small objects makes it possible to lower the threshold while keeping the false alarm probability low when useful objects are detected. The larger the noise objects that are removed, the lower the detection threshold can be set for the same false alarm probability. In this task, $S_{\min} = 150$ gives the normalized threshold $t_{NP} = 0.47$ instead of the value 2.326 corresponding to curve 1 in Fig. 4. Without selection, such a threshold would give a much higher false alarm probability of 0.32 (curve 2 in Fig. 4). The per-pixel detection curve with object selection lies between these two (curve 3 in Fig. 4). Since the results were obtained by simulation, the y-axis shows an estimate of the detection probability. The threshold deflection is now 0.5, which provides a signal-to-noise gain of about 6 dB. If information about the shape of the signal area is available, the characteristic can be improved further by accumulation.

Figure 4. Detection characteristics for object selection by area with $S_{\min} = 150$; the x-axis is the signal-to-noise ratio (deflection). Curve 1 (right): Neyman-Pearson (NP) detector with a high threshold; curve 2 (left): NP detector with a low threshold; curve 3 (middle): detector with a low threshold and object selection.

Adapting the threshold level to the maximum number of selected objects makes sense only when the scene contains enough such objects. In addition, this method gives a slightly low threshold value. Geometric invariants can eliminate these shortcomings [19].

V. OBJECT SELECTION USING ADAPTIVE LOCAL THRESHOLD BASED ON GEOMETRIC CRITERIA

Consider first the binarization of a white noise field with the restriction $S_{\min} = 10$, which substantially reduces the number of isolated objects after binarization. Fig. 5a (curve 1) shows the number of objects $N_{obj}$ versus the threshold $T$; the maximum number of objects is 500. The bottom curve 2 shows the maximum perimeter elongation coefficient $P_{S\max}$ over all objects in a slice as a function of the threshold. As expected, this coefficient reaches large values (here about 90) for noise objects with fractal structure. Area preselection reduces the probability of noise emissions (Fig. 5b, left bottom curve) compared with simple binarization (right bottom curve). It has virtually no effect on the perimeter elongation coefficient $P_S$ of the remaining objects.

Figure 5. Results of Gaussian noise processing: (a) number of selected objects $N_{obj}$ (1) and maximum perimeter elongation coefficient $P_{S\max}$ (2) after area selection with $S_{\min} = 10$; (b) probabilities of noise emissions for simple binarization (1) and after preselection by area with $S_{\min} = 10$ (2).

We next examine how restrictions on the coefficient $P_S$ affect selection. An upper bound on $P_S$ additionally removes noise objects whose elongation coefficients are excessively large. Such objects appear to have fractal structures and do not represent objects of interest. The value $P_{S\max}$ is therefore the second important parameter of adaptive object selection: the algorithm removes objects with $P_S > P_{S\max}$. The analysis shows that restricting $P_S$ can decrease the false alarm probability in the detection task. This can make detection of objects with matching $P_S$ values more efficient. The gain depends on $P_S$, which in turn depends on the threshold, so the gain depends substantially on the object shape. As a result, the curves show a dip at low threshold values (Fig. 6).

Figure 6. Decimal logarithm of the exceedance rate (false alarm rate) versus the normalized threshold for $S_{\min} = 10$ and $P_{S\max}$ = 1000, 300, 100, 10, 3 (from top to bottom).

VI. PRACTICAL EXAMPLES OF PROCESSING REMOTE OBSERVATION IMAGES

Consider the remote sensing observational image shown in Fig. 7a. The task is to select and localize compact objects (e.g., buildings) in this scene. The proposed method is applied with $S_{\min} = 150$ and $P_{S\max} = 50$. Fig. 7b shows the selection results, with brightness corresponding to the threshold values. The image contains 82 objects, each selected with its own local adaptive threshold. Fig. 8a shows these threshold values, with the indices of the isolated objects on the x-axis. The adaptive local threshold for each object is set by an optimization process; Fig. 8b shows a typical U-shaped optimization curve.

Figure 7. Object selection in a television image with $S_{\min} = 150$ and $P_{S\max} = 50$. Brightness corresponds to the adaptive threshold values.

Figure 8. Adaptive threshold values versus object number (a) and a typical U-shaped optimization curve (b).

VII. CONCLUSIONS

We have investigated a recently proposed algorithm for the adaptive selection of compact and elongated objects in remote observation images. Using the results of initial multi-threshold processing of the raw image, the algorithm selects isolated objects in each binary layer. It then considers a series of binary layers and chooses, for each object, the layer that contains the object's best representation according to the applied geometric criterion. The key idea is that the decision is based on a posteriori information about the properties of the objects in each binary layer. The algorithm then finds the best layer in terms of these properties. Our quantitative analysis indicates that this information enables successful adaptive selection that preserves the shape of each object of interest despite a non-stationary background. In the test detection problem considered, object selection provides a benefit equivalent to a gain of approximately 6 dB in signal-to-noise ratio.
Both computer simulations and the analysis of empirical imagery from remote observation applications support the effectiveness of the algorithm. We believe that a similar strategy could solve other relevant remote observation problems. Beyond the direct benefits of using a posteriori information from the initial multi-threshold processing, the proposed approach also yields a generalized hierarchical object reconstruction. This reconstruction in turn generalizes the segmentation and selection problems relative to conventional image analysis.

VIII. ACKNOWLEDGMENT

We would like to thank the Russian Science Foundation (project No. 16-19-00172) for the financial support of this work.

REFERENCES

[1] G. Cheng, J. Han, "A survey on object detection in optical remote sensing images," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 117, pp. 11–28, 2016.
[2] E. Arias-Castro, G. R. Grimmett, "Cluster detection in networks using percolation," Bernoulli, vol. 19(2), pp. 676–719, 2013.
[3] G. P. Patil, C. Taillie, "Upper level set scan statistic for detecting arbitrarily shaped hotspots," Environmental and Ecological Statistics, vol. 11, pp. 183–197, 2004.
[4] W. Zhou, A. Troy, "An Object-Oriented Approach for Analyzing and Characterizing Urban Landscape at the Parcel Level," International Journal of Remote Sensing, vol. 29(11), pp. 3119–3135, 2008.
[5] H. Gu, Y. Han, Y. Yang, H. Li, Z. Liu, U. Soergel, T. Blaschke, S. Cui, "An efficient parallel multi-scale segmentation method for remote sensing imagery," Remote Sensing, vol. 10(4), pp. 590–608, 2018.
[6] A. Kashanipour, N. Milani, H. Eghrary, "Robust Classification Using Fuzzy Rule-Based Particle Swarm Optimization," IEEE Congress on Image and Signal Processing, vol. 2, pp. 110–114, 2008.
[7] L. Barghout, J. Sheynin, "Real-world scene perception and perceptual organization: Lessons from Computer Vision," Journal of Vision, vol. 13(9), p. 709, 2013.
[8] A. Anjos, H. Shahbazkia, "Bi-Level Image Thresholding – A Fast Method," Biosignals, vol. 2, pp. 70–76, 2008.
[9] H. Mobahi, S. R. Rao, A. Y. Yang, S. S. Sastry, Y. Ma, "Segmentation of Natural Images by Texture and Boundary Compression," International Journal of Computer Vision, vol. 95, pp. 86–98, 2011.
[10] R. Kimmel, A. M. Bruckstein, "Fast Edge Integration," International Journal of Computer Vision, vol. 53(3), pp. 225–243, 2003.
[11] K. J. Batenburg, J. Sijbers, "Optimal Threshold Selection for Tomogram Segmentation by Projection Distance Minimization," IEEE Transactions on Medical Imaging, vol. 28(5), pp. 676–686, 2009.
[12] D. Bogayevskiy, S. Ezhov, D. Kaplun, D. Minenko, S. Aryashev, K. Petrov, "Study of Vector Processor Architectures for Image Processing Using Model Profiling," Proceedings of the 8th Mediterranean Conference on Embedded Computing, art. 8760039, 2019.
[13] B. D. Shivahare, S. K. Gupta, "Multilevel Thresholding based Image Segmentation using Whale Optimization Algorithm," International Journal of Innovative Technology and Exploring Engineering (IJITEE), vol. 8, iss. 12, Oct. 2019.
[14] E. Cuevas, A. González, F. Fausto, D. Zaldívar, M. Pérez-Cisneros, "Multithreshold Segmentation by Using an Algorithm Based on the Behavior of Locust Swarms," Mathematical Problems in Engineering, Article ID 805357, 2015.
[15] V. Volkov, "Extraction of extended small-scale objects in digital images," Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., vol. XL-5/W6, pp. 87–93, 2015. https://doi.org/10.5194/isprsarchives-XL-5-W6-872015
[16] M. Bogachev, V. Volkov, O. Markelov, E. Trizna, D. Baydamshina, V. Melnikov, R. Murtazina, P. Zelenikhin, I. Sharafutdinov, A. Kayumov, "Fast and simple tool for the quantification of biofilm-embedded cells sub-populations from fluorescent microscopic images," PLoS One, vol. 13, iss. 3, p. e0192022, 2018.
[17] M. Langovoy, O. Wittich, "Randomized algorithms for statistical image analysis and site percolation on square lattices," Statistica Neerlandica, vol. 67, pp. 337–353, 2013. doi:10.1111/stan.12010
[18] V. Yu. Volkov, M. I. Bogachev, A. R. Kayumov, "Object Selection in Computer Vision: from Multi-Thresholding to Percolation Based Scene Representation," in: Computer Vision in Advanced Control Systems-5, Intelligent Systems Reference Library, Springer, pp. 161–194, 2019. https://www.springer.com/gp/book/9783030337940
[19] M. I. Bogachev, V. Y. Volkov, G. Kolaev, L. Chernova, I. Vishnyakov, A. Kayumov, "Selection and quantification of objects in microscopic images: from multi-criteria to multi-threshold analysis," Bionanoscience, vol. 9(1), pp. 59–65.
[20] V. Melnikov, M. I. Bogachev, V. Y. Volkov, O. A. Markelov, "Selection and Analysis of Objects in Multi-Threshold Image Processing," Proceedings of the IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering, pp. 1202–1205, 2019.