Proc. IEEE Int. Conf. on Image Processing ICIP2001, October 7-10, 2001, Thessaloniki, Greece, Vol. I, pp. 858-861.
FINDING OBJECTS IN A 3D ENVIRONMENT BY COMBINING
DISTANCE MEASUREMENT AND COLOR INDEXING
Andreas Koschan, SunHo Lee, and Mongi A. Abidi
Imaging, Robotics, and Intelligent Systems Laboratory
Department of Electrical and Computer Engineering
The University of Tennessee
koschan@iristown.engr.utk.edu
ABSTRACT
In this paper, a new method is presented for the
localization and recognition of three-dimensional objects
using color information. In the first processing step, we
estimate depth information by either applying a chromatic
block matching method to color stereo images or acquiring
a range image from a laser scanner. Second, the computed
depth maps are segmented to distinguish between the
image background and the objects that should be
recognized. Assuming that the segmented regions
represent single objects in the three-dimensional scene,
feature vectors are generated based on color histograms.
Both the Euclidean distance and the scalar product are used to measure the similarity between the feature vectors computed from the color image and the feature vectors stored in a database.
1. INTRODUCTION
Color indexing [1] is a very efficient method for the
recognition of colored objects in color images. It is based
on a comparison of color distributions or color histograms
of objects in a color space. In recent years, several extensions have been proposed to make this method approximately color constant. Funt and Finlayson [2] suggested comparing ratios of color histograms instead of the histograms themselves. As a result, object recognition under varying illumination can be improved in comparison with a "direct" color indexing method. However, the method of Funt and Finlayson is sensitive to noise, especially in sparsely illuminated areas of the scene.
The idea of representing the objects by a small set of
moments of color histograms was suggested by Healey and
Slater [3]. Assuming that the change in illumination can be
described by a linear model, they prove that some
moments of the color distribution are invariant to changes
in illumination. Other approaches to color constancy represent an object, for example, by six angles of the color distribution instead of a color histogram [4], or apply comprehensive color image normalization [5].
The robustness or non-robustness of color constancy
algorithms is still being evaluated and discussed [6]. In this paper, we focus on the efficient and robust recognition of colored objects and on their localization in the three-dimensional environment of the camera. To this end, an object recognition method based on color indexing is combined with a distance measurement technique that locates the objects in the scene. Since different applications impose different requirements, we use two alternative techniques to measure the distance between the camera system (for example, mounted on a mobile robot) and the objects to be recognized and located.
The first technique employs passive stereo sensing, which has proved very effective for color stereo [7]. The second technique employs active sensing based on the principle of 3D laser scanning. The computed depth maps or range images are used to segment the color images into objects and background. Thus, we can recognize and localize single objects in color images containing multiple objects. We assume that all objects are opaque and that no object occludes another in the color images. The processing steps and the results are detailed in the following sections.
2. DISTANCE MEASUREMENT WITH COLOR
STEREO IMAGES
Finding corresponding pixels in the images is still one of
the main problems in stereo matching techniques
(correspondence problem). In our approach, we use a chromatic block matching technique [7] that is based on a similarity check between the color distributions of two blocks (of size $n \times m$) in the left and in the right image.
One of the images (for example the right one) is
segmented into blocks of equal size. For each of these
blocks a corresponding block is searched for in the other
image. The mean square error (MSE) between the color vectors of the pixels inside the respective blocks defines a measure of the similarity or dissimilarity of two blocks.
We measure the color distance in the RGB color space by the Euclidean distance $D_\varepsilon$. In the RGB color space, the left color image $F_L$ and the right color image $F_R$ may be represented as $F_L(x,y) = (R_L(x,y), G_L(x,y), B_L(x,y))$ and $F_R(x,y) = (R_R(x,y), G_R(x,y), B_R(x,y))$. For a block size of $n \times m$ pixels and an offset $\delta$, the mean square error MSE is defined by

$$
\begin{aligned}
MSE(x,y,\delta) &= \frac{1}{n \cdot m} \sum_{i=0}^{n-1} \sum_{j=0}^{m-1} D_\varepsilon\big(F_R(x+i,\,y+j),\, F_L(x+i+\delta,\,y+j)\big) \\
&= \frac{1}{n \cdot m} \sum_{i=0}^{n-1} \sum_{j=0}^{m-1} \Big( \big(R_R(x+i,y+j) - R_L(x+i+\delta,y+j)\big)^2 \\
&\qquad\quad + \big(G_R(x+i,y+j) - G_L(x+i+\delta,y+j)\big)^2 \\
&\qquad\quad + \big(B_R(x+i,y+j) - B_L(x+i+\delta,y+j)\big)^2 \Big).
\end{aligned}
$$
The block (of size $n \times m$) is shifted pixel by pixel inside the search area. With standard stereo geometry, the epipolar lines coincide with the image lines. Furthermore, the search area in the left image is limited by a predefined maximum disparity $d_{\max}$. The disparity $D$ between corresponding blocks is defined as the distance between their positions (the difference in columns) at the minimum mean square error:

$$
D = \arg\min_{\delta \le d_{\max}} \, MSE(x, y, \delta).
$$
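The following minimal sketch in Python/NumPy illustrates this block-based disparity search. The function name, default block size, and default $d_{\max}$ are illustrative assumptions, not the authors' original implementation; the disparity smoothness constraint described next is omitted for brevity.

```python
import numpy as np

def block_disparities(right, left, n=8, m=8, d_max=32):
    """Coarse block disparities by chromatic block matching (sketch).

    right, left: rectified RGB images as float arrays of shape (H, W, 3).
    The right image is tiled into n x m blocks; for each block the
    offset delta in [0, d_max] minimizing the color MSE against the
    left image is selected.
    """
    H, W, _ = right.shape
    disp = np.zeros((H // n, W // m), dtype=int)
    for bi in range(H // n):
        for bj in range(W // m):
            y, x = bi * n, bj * m
            block_r = right[y:y + n, x:x + m]
            best_mse, best_delta = np.inf, 0
            # Shift the block pixel by pixel along the epipolar
            # (image) line, up to the maximum disparity d_max.
            for delta in range(min(d_max, W - x - m) + 1):
                block_l = left[y:y + n, x + delta:x + delta + m]
                # Squared Euclidean color distance, averaged over the block.
                mse = np.mean(np.sum((block_r - block_l) ** 2, axis=-1))
                if mse < best_mse:
                    best_mse, best_delta = mse, delta
            disp[bi, bj] = best_delta
    return disp
```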
If more than one minimum exists, the disparity with the smallest difference to the disparity of the neighboring block is selected (disparity smoothness constraint). In contrast to the approach presented in [7], disparities are not computed for every single pixel, which reduces computation time. Computing block disparities is sufficient for this approach because only a coarse localization of the objects in the three-dimensional environment is needed rather than a surface reconstruction. An average distance between the colored object and the camera system is computed (for example, 1.25 m for the whole object).
If the image background consists of one single color or is nearly homogeneous, then the method described above computes arbitrary disparity values for the pixels representing the background. Fortunately, these (mostly false) disparity values are not considered in the subsequent processing steps; only the values corresponding to an object are regarded. For this purpose, a coarse segmentation is applied to the disparity map or depth map.
3. SEGMENTING DISPARITY IMAGES
The segmentation of disparity maps or depth maps into
objects and image background constitutes a problem that
cannot be solved without additional constraints about the
scene. A basic assumption is that objects located nearer to
the camera show higher disparity values than the disparity
values for the pixels representing the image background.
The disparity values computed for the image background should be close to zero. However, this does not hold if the background is nearly homogeneous: in that case, arbitrary (false) disparity values are computed for the background, and a segmentation assuming zero disparities produces false results. Another problem in
disparity segmentation occurs when shadows are seen in
the stereo images. The shadow of an object shows nearly
the same disparity values as the object itself. Therefore,
the shadow would be categorized as part of the object if no
information about the shadows in the images is known.
A two-stage criterion is applied to solve, at least in part, the problems mentioned above. We assume that the image background either has a known single color or is structured in such a way that corresponding pixels can be found by the stereo matcher. A pixel is segmented as part of an object if two criteria are fulfilled: first, the disparity values are relatively constant along a scan line over a larger area (whose size has to be determined); second, the corresponding color vectors differ from the background color (which is black in our stereo images). A minimal sketch of this criterion is given below.
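As a concrete illustration, the two-stage criterion might be implemented as sketched here; the window size, disparity tolerance, and background threshold are assumed parameters that would have to be tuned.

```python
import numpy as np

def segment_object_pixels(disp, image, win=5, disp_tol=1.0, bg_thresh=0.1):
    """Coarse object/background segmentation (sketch).

    disp:  disparity map at pixel resolution, shape (H, W).
    image: color image, shape (H, W, 3), values in [0, 1].
    A pixel is labeled as object if (1) the disparity is roughly
    constant along the scan line within a window of 2*win+1 pixels
    and (2) its color differs from the (black) background color.
    """
    H, W = disp.shape
    mask = np.zeros((H, W), dtype=bool)
    for y in range(H):
        for x in range(win, W - win):
            row = disp[y, x - win:x + win + 1]
            constant = row.max() - row.min() <= disp_tol       # criterion 1
            colored = np.linalg.norm(image[y, x]) > bg_thresh  # criterion 2
            mask[y, x] = constant and colored
    return mask
```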
In contrast to an exclusive evaluation of the image values, this technique has the advantage that it can be applied to images with uniform background colors as well as to images with textured backgrounds. It yields at least a "coarse" segmentation of the objects that does not include all pixels representing them. Due to the robustness of the color indexing technique, the missing pixels do not influence the recognition results.
4. DISTANCE MEASUREMENT BY A LASER
RANGE SCANNER
High quality color stereo images can be easily obtained by
a binocular camera system at low cost. However, the
resolution in depth obtained when applying a stereo
matcher is in general lower than the resolution obtained
when applying a laser range scanner. If an application requires high accuracy in distance measurement, we employ laser range techniques. Currently, we acquire range images and true color images with the RIEGL LMS-Z210 3D laser mirror scanner. The system is based on accurate electro-optical range measurement and a two-axis beam scanning mechanism. It is equipped with an additional three-color passive receiver that facilitates the simultaneous capture of registered range, intensity, and true color images.
Since the system provides registered range and color
images, we can easily combine results obtained by
techniques analyzing either range images or color images.
However, we have to segment the range images to
distinguish between objects and background. While we
have to deal with mismatches when segmenting depth
maps obtained by stereo techniques, we have to consider
noise when segmenting range images obtained by laser
techniques. Based on a comparison of range image
segmentation algorithms [8], we developed a scheme to
segment noisy range images (see [9] for further details).
However, the problem of how to handle complex
backgrounds and occlusions is still part of further
investigations.
Figure 1: Left and right stereo image SCENE_5.
5. GENERATION OF FEATURE VECTORS
Feature vectors were generated for each object in the
database and for each "coarsely" segmented object in the
image. We use the HSV color space since this color space
enables a simple investigation of the hue (H), saturation
(S), and intensity (V) values. The investigations should
answer at least two questions. Question 1: Does a more
detailed subdivision of the hue areas enhance the
recognition ratio? Question 2: Does an additional analysis
of the saturation values and the intensity values show any
benefits? Six different versions of feature vectors were
generated to answer these questions.
In version 1, all colors are mapped onto the plane V=1 in the HSV color space. Furthermore, all hue values are combined into regions of size 5°, starting at 0°. All values lying inside such a region are mapped to one entry of the feature vector; the feature vectors thus have a length of 72 entries. Version 2 is identical to version 1 except that the colors are combined into regions of size 3°, giving feature vectors of length 120. In version 3, the saturation values are combined into two areas, one for values greater than 0.5 and one for values smaller than 0.5. The hue values are combined into regions of size 5°, and the feature vectors have a length of 144 entries.
Version 4 is identical to version 3 except that the colors are combined into regions of size 3°; the feature vectors have a length of 240 entries. In version 5, the saturation values are not taken into consideration. Instead, the intensity values are combined into two areas, one for intensity values greater than 0.5 and one for values smaller than 0.5. The hue values are combined into regions of size 5°, and the feature vectors have a length of 144 entries. Version 6 is identical to version 5 except that the colors are combined into regions of size 3°; the feature vectors have a length of 240 entries.

Figure 2: Color and range image pair CARS_6.
The sizes of the different feature vectors are given above because they determine the memory space needed to represent an object in the database and the computing time needed to compare a feature vector computed from the color image with the feature vectors stored in the database. A sketch of the histogram construction is given below.
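A minimal sketch covering all six versions through two parameters might look as follows; the function signature and the final normalization are our assumptions, since the paper does not specify them.

```python
import numpy as np
import colorsys

def feature_vector(pixels, hue_step=5, split=None):
    """Build an HSV hue-histogram feature vector (sketch).

    pixels:   iterable of (r, g, b) values in [0, 1] belonging to
              one segmented object.
    hue_step: hue bin size in degrees (5 -> 72 bins, 3 -> 120 bins).
    split:    None, 'saturation', or 'value'; if set, the hue bins
              are duplicated for channel values above/below 0.5,
              doubling the vector length (144 or 240 entries).
    """
    n_hue = 360 // hue_step
    vec = np.zeros(2 * n_hue if split else n_hue)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        bin_ = min(int(h * 360) // hue_step, n_hue - 1)
        if split == 'saturation' and s > 0.5:
            bin_ += n_hue
        elif split == 'value' and v > 0.5:
            bin_ += n_hue
        vec[bin_] += 1
    # Normalization to unit sum is an assumption, added so that
    # objects of different sizes can be compared.
    return vec / max(vec.sum(), 1)
```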
6. OBJECT RECOGNITION USING COLOR
INDEXING
We used both the Euclidean distance and the scalar product to measure the similarity between the feature vectors computed from the color image and the feature vectors stored in the database. A value close to zero indicates high similarity if the Euclidean distance is used, whereas a value close to one indicates high similarity if the scalar product is applied. Both measures are sketched below.
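Both measures reduce to a few lines. Normalizing the vectors in the scalar product (so that a value near one indicates high similarity) is our assumption; the paper does not state the normalization explicitly.

```python
import numpy as np

def euclidean_similarity(a, b):
    """Euclidean distance between feature vectors: 0 = identical."""
    return np.linalg.norm(a - b)

def scalar_similarity(a, b):
    """Normalized scalar product: 1 = maximally similar (assumed)."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical usage: match a query vector against the database.
# best = min(database, key=lambda entry: euclidean_similarity(query, entry))
```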
7. RESULTS
Different versions of a method for the recognition and
localization of colored objects were discussed above. For
the evaluation of these different versions a database was
established that consists in total of 109 different views of
30 colored objects of different complexity. Forty-five color images, each containing on average two objects, were investigated. The objects were rotated, translated, and distorted (non-rigid objects). One example of a color stereo image pair (named SCENE_5) that shows two beverage cans is presented in Fig. 1. The color and range image pair CARS_6, obtained with the 3D laser mirror scanner for some cars in a parking lot, is shown in Fig. 2.

            Version 1   Version 2   Version 3
  scalar      96 %        90 %        93 %
  Euclid.     90 %        90 %        90 %

TAB. 1: Object recognition ratios in percent, obtained when applying feature vectors of versions 1 to 3 (on average for 20 images).

            Version 4   Version 5   Version 6
  scalar      96 %        93 %        83 %
  Euclid.     90 %        86 %        86 %

TAB. 2: Object recognition ratios in percent, obtained when applying feature vectors of versions 4 to 6 (on average for 20 images).
Some results are presented in TAB. 1 and in TAB. 2.
The object recognition ratios (on average for 20 color
images) obtained when applying version 1 to version 6 are
presented. Furthermore, the results can be distinguished by
the similarity measures for the feature vectors. Here
"scalar" denotes the scalar product and "Euclid" denotes
the Euclidean distance.
In summary, good recognition results are obtained in images with relatively uniform illumination of the scene. Rather than improving, the results tend to deteriorate when a finer subdivision of 3° instead of 5° is used for the hue values. Furthermore, the quality of the results does not improve with an additional analysis of the intensity values or the saturation values. The object recognition ratio decreases significantly if the images are taken under varying illumination conditions.
8. SUMMARY AND CONCLUSIONS
We presented a new method for the recognition and the
localization of three-dimensional objects using color
information. Dependent on the requirements of an
application, we either employ a binocular stereo technique
or a laser range technique to obtain distance measurements
for locating the objects in the environment of the camera
system. The stereo matcher is based on efficient chromatic
block matching. After a coarse segmentation of the
disparity image or the range image, feature vectors were
generated for the recognition of the colored objects.
Several different versions of the feature vectors were implemented and tested. Rather than improving, the results tended to deteriorate when a finer subdivision was used for the hue values in the feature vectors. Furthermore, the quality of the results did not improve with an additional analysis of the intensity and saturation values. In our investigations, slightly better object recognition ratios were obtained when using the scalar product instead of the Euclidean distance to measure the similarity between the feature vectors computed from the color image and the feature vectors stored in the database.
9. ACKNOWLEDGMENTS
The authors acknowledge the support of the U.S. Army
TACOM Project. Furthermore, we thank Dirk Stürmer for
implementing the object recognition approach at the
Technical University Berlin, Germany.
10. REFERENCES
[1] M.J. Swain and D.H. Ballard, "Color indexing,"
International Journal of Computer Vision, Vol. 7, pp. 11-32,
1991.
[2] B.V. Funt and G.D. Finlayson, "Color constant color
indexing," IEEE Trans. on Pattern Analysis and Machine
Intelligence, Vol. 17, pp. 522-529, 1995.
[3] G. Healey and D. Slater, "Global color constancy:
recognition of objects by use of illumination-invariant properties
of color distributions," Journal Optical Society of America A,
Vol. 11, pp. 3003-3010, 1994.
[4] G.D. Finlayson, S.S. Chatterjee, and B.V. Funt, "Color
angular indexing," Proceedings of the 4th European Conference
on Computer Vision, Vol. II, pp. 16-27, Cambridge, England,
April 1996.
[5] G.D. Finlayson, B. Schiele, and J.L. Crowley,
"Comprehensive colour image normalization," Proceedings of
the 5th European Conference on Computer Vision, Vol. I, pp.
475-490, Freiburg, Germany, June 1998.
[6] B. Funt, K. Barnard, and L. Martin, "Is machine colour
constancy good enough?," Proceedings of the 5th European
Conference on Computer Vision, Vol. I, pp. 445-459, Freiburg,
Germany, June 1998.
[7] A. Koschan, "Chromatic Block Matching for Dense Stereo
Correspondence," Proceedings of the 7th International
Conference on Image Analysis and Processing (7ICIAP), pp. 641-648, Capitolo, Monopoli, Italy, September 1993.
[8] A. Hoover, G. Jean-Baptiste, X. Jiang, P.J. Flynn, H. Bunke,
D.B. Goldgof, K.W. Bowyer, D.W. Eggert, A. Fitzgibbon, and
R.B. Fisher, “An experimental comparison of range image
segmentation algorithms,” IEEE Trans. on Pattern Analysis and
Machine Intelligence, Vol. 18, pp. 673-689, 1996.
[9] Y. Zhang, Y. Sun, H. Sari-Sarraf, and M.A. Abidi, "Impact of Intensity Edge Map on Segmentation of Noisy Range Images," Proceedings of the SPIE Conference on Three-Dimensional Image Capture and Applications III, Vol. 3958, pp. 260-269, San Jose, CA, January 2000.