Proc. IEEE Int. Conf. on Image Processing ICIP2001, October 7-10, 2001, Thessaloniki, Greece, Vol. I, pp. 858-861.

FINDING OBJECTS IN A 3D ENVIRONMENT BY COMBINING DISTANCE MEASUREMENT AND COLOR INDEXING

Andreas Koschan, SunHo Lee, and Mongi A. Abidi
Imaging, Robotics, and Intelligent Systems Laboratory
Department of Electrical and Computer Engineering
The University of Tennessee
koschan@iristown.engr.utk.edu

ABSTRACT

In this paper, a new method is presented for the localization and recognition of three-dimensional objects using color information. In the first processing step, we estimate depth information either by applying a chromatic block matching method to color stereo images or by acquiring a range image from a laser scanner. Second, the computed depth maps are segmented to distinguish between the image background and the objects to be recognized. Assuming that the segmented regions represent single objects in the three-dimensional scene, feature vectors are generated based on color histograms. The Euclidean distance as well as the scalar product is used to measure the similarity between the feature vectors computed from the color image and the feature vectors stored in a database.

1. INTRODUCTION

Color indexing [1] is a very efficient method for the recognition of colored objects in color images. It is based on a comparison of the color distributions, or color histograms, of objects in a color space. In recent years, several approaches have been proposed to make this method approximately color constant. Funt and Finlayson [2] suggested comparing objects by ratios of color histograms instead of the color histograms themselves. In this way, the object recognition results obtained under varying illumination can be improved in comparison to the results of a "direct" color indexing method. However, the method of Funt and Finlayson is sensitive to noise, especially in sparsely illuminated areas of the scene. Healey and Slater [3] suggested representing objects by a small set of moments of their color histograms. Assuming that the change in illumination can be described by a linear model, they prove that some moments of the color distribution are invariant to changes in illumination. Other approaches to color constancy represent an object, for example, by six angles of the color distribution instead of a color histogram [4], or apply comprehensive color image normalization [5]. The robustness, or lack thereof, of color constancy algorithms is still being evaluated and discussed [6].

In this paper, we focus on the efficient and robust recognition of colored objects combined with their localization in the three-dimensional environment of the camera. An object recognition method based on color indexing is combined with a distance measurement technique that localizes the objects in the scene. Since different applications impose different requirements, we use two alternative techniques to measure the distance between the camera system (mounted, for example, on a mobile robot) and the objects to be recognized and located. The first technique employs passive stereo sensing, which has proved very effective for color stereo [7]. The second technique employs active sensing based on the principle of 3D laser scanners. The computed depth maps or range images are used to segment the color images into objects and background. Therefore, we can recognize and localize single objects in color images showing multiple objects. We assume that all objects are opaque and that no object occludes another in the color images. The processing steps and the results are detailed in the following sections.
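As a point of reference for the indexing step described later, the basic color indexing operation of [1] can be sketched in a few lines of Python; the 8-bin-per-channel RGB histogram and the function names are illustrative choices, not taken from [1] or from our method.

    import numpy as np

    def rgb_histogram(image, bins=8):
        # Normalized 3D color histogram of an RGB image (H x W x 3, uint8).
        # The 8 bins per channel are an illustrative choice.
        hist, _ = np.histogramdd(image.reshape(-1, 3),
                                 bins=(bins, bins, bins),
                                 range=((0, 256), (0, 256), (0, 256)))
        return hist / hist.sum()

    def histogram_intersection(h_model, h_image):
        # Swain-Ballard style match value in [0, 1];
        # 1 means identical color distributions.
        return np.minimum(h_model, h_image).sum()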
2. DISTANCE MEASUREMENT WITH COLOR STEREO IMAGES

Finding corresponding pixels in the images is still one of the main problems in stereo matching (the correspondence problem). In this approach, we use a chromatic block matching technique [7] that is based on a similarity check between the color distributions of two blocks (of size n × m) in the left and the right image. One of the images (for example, the right one) is partitioned into blocks of equal size. For each of these blocks, a corresponding block is searched for in the other image. The mean square error MSE between the color vectors of the pixels inside the respective blocks defines a measure for the similarity or dissimilarity of two blocks. We measure the color distance in the RGB color space by the Euclidean distance D_E. In the RGB color space, the left color image F_L and the right color image F_R may be represented as F_L(x, y) = (R_L(x, y), G_L(x, y), B_L(x, y)) and F_R(x, y) = (R_R(x, y), G_R(x, y), B_R(x, y)). For a block size of n × m pixels and an offset δ, the mean square error MSE is defined by

\[
MSE(x, y, \delta) = \frac{1}{n \cdot m} \sum_{i=0}^{n-1} \sum_{j=0}^{m-1} D_E\bigl(F_R(x+i,\, y+j),\; F_L(x+i+\delta,\, y+j)\bigr)
\]
\[
= \frac{1}{n \cdot m} \sum_{i=0}^{n-1} \sum_{j=0}^{m-1} \Bigl[ \bigl(R_R(x+i, y+j) - R_L(x+i+\delta, y+j)\bigr)^2 + \bigl(G_R(x+i, y+j) - G_L(x+i+\delta, y+j)\bigr)^2 + \bigl(B_R(x+i, y+j) - B_L(x+i+\delta, y+j)\bigr)^2 \Bigr].
\]

The block (of size n × m) is shifted pixel by pixel inside the search area. Using standard stereo geometry, the epipolar lines coincide with the image lines. Furthermore, the search area in the left image is limited by a predefined maximum disparity d_max. The disparity D between the blocks in the two images is defined as the distance between the positions (the difference in the columns) of the blocks showing the minimum mean square error:

\[
D = \arg\min_{\delta \le d_{max}} \{ MSE(x, y, \delta) \}.
\]

If more than one minimum exists, the disparity with the smallest difference to the disparity of the neighboring block is selected (disparity smoothness constraint). As opposed to the approach presented in [7], single disparities are not computed for each pixel, in order to reduce computation time. The computation of block disparities is sufficient for this approach because only a coarse localization of the objects in the three-dimensional environment is needed instead of a surface reconstruction. An average distance between the colored object and the camera system is computed (for example, 1.25 m for the whole object).

If the image background shows one single color or if the image background is nearly homogeneous, then the method described above computes disparity values by chance for the pixels representing the image background. Fortunately, these (mostly false) disparity values are not considered in the subsequent processing steps; only the values corresponding to an object are used. For this purpose, a coarse segmentation is applied to the disparity map or depth map.
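The following Python sketch illustrates the block matching step under the assumptions stated above (rectified images, search along the image line). The block size of 8 × 8 pixels, the search limit, and the function name are illustrative choices, not the exact implementation of [7].

    import numpy as np

    def block_disparity(right, left, x, y, n=8, m=8, d_max=32):
        # Disparity of the n x m block whose top-left corner is (x, y) in the
        # right image, searched along the same image line in the left image.
        ref = right[y:y + m, x:x + n].astype(float)
        best_delta, best_mse = 0, np.inf
        for delta in range(d_max + 1):            # offsets with delta <= d_max
            if x + delta + n > left.shape[1]:
                break                             # end of the search area
            cand = left[y:y + m, x + delta:x + delta + n].astype(float)
            # mean over the block of the squared RGB differences (the MSE above)
            mse = np.mean(np.sum((ref - cand) ** 2, axis=2))
            if mse < best_mse:
                best_mse, best_delta = mse, delta
        # The disparity smoothness constraint used to resolve ties between
        # several minima is omitted here for brevity.
        return best_delta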
3. SEGMENTING DISPARITY IMAGES

The segmentation of disparity maps or depth maps into objects and image background constitutes a problem that cannot be solved without additional constraints on the scene. A basic assumption is that objects located nearer to the camera show higher disparity values than the pixels representing the image background, whose disparity values should be close to zero. This does not hold if the image background is nearly homogeneous; in this case, (false) disparity values are computed by chance for the image background, and a segmentation assuming zero disparities for the background produces false results. Another problem in disparity segmentation occurs when shadows are visible in the stereo images. The shadow of an object shows nearly the same disparity values as the object itself. Therefore, the shadow would be classified as part of the object if no information about the shadows in the images is available.

An optional two-stage criterion is applied to solve the problems mentioned above, at least in part. We assume that the image background either shows a known single color or is structured in a way that corresponding pixels can be found by the stereo matcher. A pixel is segmented as part of an object if two criteria are fulfilled: first, the disparity values are relatively constant along a scan line over a larger neighborhood (whose size has to be established); second, the corresponding color vectors differ from the background color (which is black in our stereo images). As opposed to an exclusive evaluation of the image values, this technique has the advantage that it can be applied to images with uniform background colors as well as to images with textured backgrounds. For each object, this technique yields at least a "coarse" segmentation that does not contain all pixels representing the object. Due to the robustness of the color indexing technique, the missing pixels do not influence the recognition results.
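A minimal sketch of this two-stage criterion might look as follows; the window width, the disparity tolerance, and the threshold separating object colors from the black background are hypothetical parameters that would have to be established for the images at hand.

    import numpy as np

    def segment_object_pixels(disparity, color, win=9, tol=1.0, bg_thresh=30.0):
        # Binary object mask from a block disparity map and the registered
        # color image. A pixel is kept when (1) the disparity is nearly
        # constant inside a horizontal window on its scan line and (2) its
        # color differs from the (black) background color.
        rows, cols = disparity.shape
        mask = np.zeros((rows, cols), dtype=bool)
        half = win // 2
        for r in range(rows):
            for c in range(half, cols - half):
                window = disparity[r, c - half:c + half + 1]
                nearly_constant = window.max() - window.min() <= tol   # criterion 1
                non_background = (np.linalg.norm(color[r, c].astype(float))
                                  > bg_thresh)                         # criterion 2
                mask[r, c] = nearly_constant and non_background
        return mask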
4. DISTANCE MEASUREMENT BY A LASER RANGE SCANNER

High-quality color stereo images can easily be obtained with a binocular camera system at low cost. However, the depth resolution obtained with a stereo matcher is in general lower than the resolution obtained with a laser range scanner. If an application requires high accuracy in the distance measurement, we employ laser range techniques. Currently, we acquire range images and true color images with the 3D laser mirror scanner LMS-Z210 by RIEGL. The system is based on accurate distance measurement by means of electro-optical range measurement and a two-axis beam scanning mechanism. It is equipped with an additional three-color passive receiver that facilitates the simultaneous capture of registered range, intensity, and true color images. Since the system provides registered range and color images, we can easily combine results obtained by techniques analyzing either range images or color images. However, we have to segment the range images to distinguish between objects and background. While we have to deal with mismatches when segmenting depth maps obtained by stereo techniques, we have to consider noise when segmenting range images obtained by laser techniques. Based on a comparison of range image segmentation algorithms [8], we developed a scheme to segment noisy range images (see [9] for further details). However, the problem of how to handle complex backgrounds and occlusions is still the subject of further investigation.

Figure 1: Left and right stereo image SCENE_5.

5. GENERATION OF FEATURE VECTORS

Feature vectors were generated for each object in the database and for each "coarsely" segmented object in the image. We use the HSV color space since it enables a simple investigation of the hue (H), saturation (S), and intensity (V) values. These investigations should answer at least two questions. Question 1: Does a more detailed subdivision of the hue range enhance the recognition ratio? Question 2: Does an additional analysis of the saturation values and the intensity values show any benefit?

Six different versions of feature vectors were generated to answer these questions. In version 1, all colors are mapped onto the plane V = 1 in the HSV color space. Furthermore, all hue values are combined into bins of size 5° starting at 0°; all values lying inside such a bin are mapped to one entry of the feature vector. The feature vectors have a length of 72 entries. Version 2 is identical to version 1 with the exception that the hue values are combined into bins of size 3°; the feature vectors have a length of 120 entries. In version 3, the saturation values are additionally divided into two ranges, one for values larger than 0.5 and one for values smaller than 0.5. The hue values are combined into bins of size 5°, and the feature vectors have a length of 144 entries. Version 4 is identical to version 3 with the exception that the hue values are combined into bins of size 3°; the feature vectors have a length of 240 entries. In version 5, the saturation values are not taken into consideration; instead, the intensity values are divided into two ranges, one for intensity values larger than 0.5 and one for intensity values smaller than 0.5. The hue values are combined into bins of size 5°, and the feature vectors have a length of 144 entries. Version 6 is identical to version 5 except that the hue values are combined into bins of size 3°; the feature vectors have a length of 240 entries.

Figure 2: Color and range image pair CARS_6.

The sizes of the different feature vectors are mentioned above because they determine the memory space needed to represent an object in the database. Furthermore, the sizes of the vectors determine the computing time needed to measure the similarity between the feature vectors computed from the color image and the feature vectors stored in the database.

6. OBJECT RECOGNITION USING COLOR INDEXING

We used the Euclidean distance as well as the scalar product to measure the similarity between the feature vectors computed from the color image and the feature vectors stored in the database. If the Euclidean distance is used, a value close to zero indicates a high similarity between the feature vectors. In contrast, if the scalar product is applied, a value close to one indicates a high similarity between the feature vectors.
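As an illustration, the following sketch generates version 1 feature vectors (72 hue bins of 5°) and, with split_saturation=True, version 3 vectors (144 entries), and implements the two similarity measures. The unit-length normalization of the vectors is our assumption, made so that the scalar product of two non-negative vectors lies in [0, 1] as described above.

    import colorsys
    import numpy as np

    def feature_vector(rgb_pixels, hue_bin_deg=5, split_saturation=False):
        # Hue histogram of the segmented object pixels (an N x 3 uint8 array).
        # hue_bin_deg=5 yields the 72-entry vectors of version 1;
        # split_saturation=True counts S > 0.5 and S <= 0.5 separately and
        # yields the 144-entry vectors of version 3.
        n_hue = 360 // hue_bin_deg
        vec = np.zeros(2 * n_hue if split_saturation else n_hue)
        for r, g, b in rgb_pixels:
            h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            k = min(int(h * n_hue), n_hue - 1)     # 360 degrees -> n_hue bins
            if split_saturation and s > 0.5:
                k += n_hue
            vec[k] += 1
        # Unit-length normalization (our assumption) so that the scalar
        # product of two vectors lies in [0, 1].
        return vec / np.linalg.norm(vec)

    def scalar_similarity(a, b):
        return float(np.dot(a, b))            # close to 1 indicates similarity

    def euclidean_distance(a, b):
        return float(np.linalg.norm(a - b))   # close to 0 indicates similarity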
7. RESULTS

Different versions of a method for the recognition and localization of colored objects were discussed above. For the evaluation of these versions, a database was established that consists in total of 109 different views of 30 colored objects of different complexity. 45 color images showing two objects on average were investigated. The objects were rotated, translated, and distorted (non-rigid objects). One example of a color stereo image pair (named SCENE_5) showing two beverage cans is presented in Fig. 1. The color and range image pair CARS_6, obtained with the 3D laser mirror scanner for some cars in a parking lot, is shown in Fig. 2.

Some results are presented in TAB. 1 and TAB. 2, namely the object recognition ratios (on average for 20 color images) obtained when applying versions 1 to 6. Furthermore, the results are distinguished by the similarity measure used for the feature vectors: "scalar" denotes the scalar product and "Euclid." denotes the Euclidean distance.

             Version 1   Version 2   Version 3
    scalar   96 %        90 %        93 %
    Euclid.  90 %        90 %        90 %

TAB. 1: Object recognition ratios in percent obtained when applying feature vectors of versions 1 to 3 (on average for 20 images).

             Version 4   Version 5   Version 6
    scalar   96 %        93 %        83 %
    Euclid.  90 %        86 %        86 %

TAB. 2: Object recognition ratios in percent obtained when applying feature vectors of versions 4 to 6 (on average for 20 images).

In summary, good recognition results are obtained in images with a relatively uniform illumination of the scene. Using a finer subdivision of 3° instead of 5° for the hue values tends to degrade the results rather than improve them. Furthermore, the quality of the results does not improve with an additional analysis of the intensity values or the saturation values. The object recognition ratio decreases significantly if the images are taken under varying illumination conditions.

8. SUMMARY AND CONCLUSIONS

We presented a new method for the recognition and localization of three-dimensional objects using color information. Depending on the requirements of an application, we employ either a binocular stereo technique or a laser range technique to obtain distance measurements for locating the objects in the environment of the camera system. The stereo matcher is based on efficient chromatic block matching. After a coarse segmentation of the disparity image or the range image, feature vectors are generated for the recognition of the colored objects. Several versions of the feature vectors were implemented and tested. Using a finer subdivision for the hue values in the feature vectors tended to degrade the results rather than improve them. Furthermore, the quality of the results did not improve with an additional analysis of the intensity and saturation values. In our investigations, slightly better object recognition ratios were obtained when using the scalar product instead of the Euclidean distance to measure the similarity between the feature vectors computed from the color image and the feature vectors stored in the database.

9. ACKNOWLEDGMENTS

The authors acknowledge the support of the U.S. Army TACOM Project. Furthermore, we thank Dirk Stürmer for implementing the object recognition approach at the Technical University Berlin, Germany.

10. REFERENCES

[1] M.J. Swain and D.H. Ballard, "Color indexing," International Journal of Computer Vision, Vol. 7, pp. 11-32, 1991.
[2] B.V. Funt and G.D. Finlayson, "Color constant color indexing," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 17, pp. 522-529, 1995.
[3] G. Healey and D. Slater, "Global color constancy: recognition of objects by use of illumination-invariant properties of color distributions," Journal of the Optical Society of America A, Vol. 11, pp. 3003-3010, 1994.
[4] G.D. Finlayson, S.S. Chatterjee, and B.V. Funt, "Color angular indexing," Proceedings of the 4th European Conference on Computer Vision, Vol. II, pp. 16-27, Cambridge, England, April 1996.
[5] G.D. Finlayson, B. Schiele, and J.L. Crowley, "Comprehensive colour image normalization," Proceedings of the 5th European Conference on Computer Vision, Vol. I, pp. 475-490, Freiburg, Germany, June 1998.
[6] B. Funt, K. Barnard, and L. Martin, "Is machine colour constancy good enough?," Proceedings of the 5th European Conference on Computer Vision, Vol. I, pp. 445-459, Freiburg, Germany, June 1998.
[7] A. Koschan, "Chromatic block matching for dense stereo correspondence," Proceedings of the 7th International Conference on Image Analysis and Processing 7ICIAP, pp. 641-648, Capitolo, Monopoli, Italy, September 1993.
[8] A. Hoover, G. Jean-Baptiste, X. Jiang, P.J. Flynn, H. Bunke, D.B. Goldgof, K.W. Bowyer, D.W. Eggert, A. Fitzgibbon, and R.B. Fisher, "An experimental comparison of range image segmentation algorithms," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 18, pp. 643-649, 1996.
[9] Y. Zhang, Y. Sun, H. Sari-Sarraf, and M.A. Abidi, "Impact of intensity edge map on segmentation of noisy range images," Proceedings of SPIE Conference on Three-Dimensional Image Capture and Applications III, Vol. 3958, pp. 260-269, San Jose, CA, January 2000.