Paper Number: 021102
An ASAE Meeting Presentation
Measurement of 3-D Locations of Fruit by Binocular Stereo
Vision for Apple Harvesting in an Orchard
Teruo Takahashi, Dr.
Hirosaki University, 3 Bunkyo-cho Hirosaki Japan, teruo@cc.hirosaki-u.ac.jp
Shuhuai Zhang, Dr.
Hirosaki University, 3 Bunkyo-cho Hirosaki Japan, zhang@cc.hirosaki-u.ac.jp
Hiroshi Fukuchi
Hirosaki University, 3 Bunkyo-cho Hirosaki Japan, fukuchi@cc.hirosaki-u.ac.jp
Written for presentation at the
2002 ASAE Annual International Meeting / CIGR XVth World Congress
Sponsored by ASAE and CIGR
Hyatt Regency Chicago
Chicago, Illinois, USA
July 28-July 31, 2002
Abstract. This paper describes the results of measuring the 3-D locations of fruit by binocular stereo vision for apple harvesting in an orchard. In the image-processing method, a 3-D space is divided into a number of cross sections at intervals based on the disparity calculated from a gaze distance, and the space is reconstructed by integrating central composite images derived from stereo pairs. Three measures to restrict false images were proposed: (1) setting a narrow search range, (2) comparing the amount of the color feature on each half side of a common area, and (3) central composition of the half sides. Experiments with a trial stereo system were conducted in an orchard on ripe red apples (search distances of 1.5m to 5.5m) and on yellow-green apples (search distances of 2m to 4m). The results showed that measures (1) and (3) were effective, whereas measure (2) was effective only when the background contained little color similar to that of the objects. The rate of fruit discrimination was about 90% or higher in images with 20 to 30 red fruits, and from 65% to 70% in images dense with red fruit and in images of yellow-green apples. The errors of distance measurement were about ±5%.
Keywords. Fruit harvesting, Binocular stereo vision, Image processing, Correspondence problem,
Central composite image
The authors are solely responsible for the content of this technical presentation. The technical presentation does not necessarily
reflect the official position of the American Society of Agricultural Engineers (ASAE), and its printing and distribution does not
constitute an endorsement of views which may be expressed. Technical presentations are not subject to the formal peer review
process by ASAE editorial committees; therefore, they are not to be presented as refereed publications. Citation of this work should
state that it is from an ASAE meeting paper. EXAMPLE: Author's Last Name, Initials. 2002. Title of Presentation. ASAE Meeting
Paper No. 02xxxx. St. Joseph, Mich.: ASAE. For information about securing permission to reprint or reproduce a technical
presentation, please contact ASAE at hq@asae.org or 616-429-0300 (2950 Niles Road, St. Joseph, MI 49085-9659 USA).
Introduction
An automated machine or robot for apple harvesting must be able to discriminate an apple from its surroundings, to sense the apple's relative location, and to measure its three-dimensional (3-D) coordinates from the base point of the machine. Binocular stereo vision is an available measuring method for obtaining such 3-D information in an outdoor field. However, practical solutions to the correspondence problem of stereopsis have not yet been found, so the use of a stereo vision system on an automated machine remains impracticable. Fruit images obtained in apple orchards frequently exhibit the correspondence problem or related image problems: similar features, as in a row of similar fruits or overlapping fruits, and occluded or deformed images in which the fruit is hidden by leaves and branches. It is necessary to find a solution to, or take measures against, this type of problem in order to apply binocular stereo vision to such images.
The main step of conventional methods for measuring distance by binocular stereo vision is the search for corresponding points of the same object on a pair of images from a left camera and a right camera, in order to calculate distance by the triangulation principle. Since each image of the pair is taken by an independent camera, it is difficult to determine the corresponding points, especially when several similar objects lie on an epipolar plane, and the results therefore tend to lack reliability.
The premise of the method used in the present study is to establish a form of monocular vision, consisting of two cameras used in cooperation under the triangulation principle, in order to improve reliability. A central image of this monocular vision is obtained by composing a stereo pair of images on a cross section of the search space in the direction of depth (Takahashi et al., 1998 and 2000a). Several central images are therefore prepared by dividing the space into slices, and these 2-D slices are integrated to reconstruct a 3-D version of the object. The correspondence problem is reduced if the central composite images are obtained under the above-mentioned condition from the outset. However, a stereo vision system built from currently available cameras, which are not designed to serve this function, is apt to create false images even when the method of composing central images is adopted.
We previously proposed three measures to restrict the occurrence of these false images (Takahashi et al., 2000b). This paper describes the discrimination performance and measurement accuracy obtained by applying these measures to images containing many red and yellow-green apples.
Central composite image and measures to restrict the occurrence of false images
A typical example of images demonstrating the correspondence problem is shown in Figure 1, where images of four circular plates (P11 to P44) in space were obtained in the same order from the left and right cameras, as shown in Figure 1(a). In the top view of Figure 1(b), the four genuine images are illustrated by thick bars, and twelve false images are created regularly at the positions indicated by thin bars under combinations of the object images. Using this case, the central composite images and the measures to restrict the occurrence of false images are explained as follows.
(a) A stereo pair of object images (plates P11 to P44; lines l1 to l4 on the left image and r1 to r4 on the right image).
(b) A top view in the X-Z plane showing the left and right eye-lines and the cross sections Z33 and Z32; measures, introduced herein, to address the correspondence problem.
Figure 1. A typical example of the correspondence problem in binocular stereo vision.
Central composite image based on a stereo pair of images
An image of monocular vision is obtained on a central screen through a visual point C, and the relationships among the coordinates of the central image, (xc, yc), the left image, (xl, yl), the right image, (xr, yr), and the disparity, s, are represented by the following equations:

xl = xc + s/2,   xr = xc - s/2,   yl = yr = yc.     (1)
The relationships between the coordinates of the central image and those of the space, (X, Y, Z), are represented by the equations:

X = ab * xc / s,   Y = ab * yc / s,   Z = Cd * f * ab / s,     (2)
where ab is the optical interval of the two cameras, f is the focal length, and Cd is a scale factor of the central screen. When the coordinate Z33 is given as a gaze distance, a disparity value, s33, is calculated backward from equation (2) and is constant. The cross section at Z33 is projected onto the central screen by the relations of equation (1). In the images of P11 and P33 on the central screen, the whole area is clear because their left and right images overlap completely. In the images of P22 and P44, the central parts of their areas are clear, but the sides are vague because their left and right images overlap with slight discrepancies. Therefore, if clarity is detected on a section, information regarding its color, position, and distance is obtained. The search space is divided into a number of cross sections calculated from disparities in front of and behind s33, and the space image is reconstructed from the central composite images in which the clear sections have been integrated. The distance resolution is determined by the disparity interval, whose minimum value is one pixel.
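As a rough illustration of equations (1) and (2), the following Python sketch converts a gaze distance to its disparity, lists the cross-section distances for a given disparity interval, and maps a left/right coordinate pair onto the central screen. The function names, and the idea of passing ab, f, and Cd as plain numbers, are illustrative assumptions rather than the implementation used in the trial system.

```python
def disparity_from_distance(Z, ab, f, Cd):
    """Invert equation (2): s = Cd * f * ab / Z."""
    return Cd * f * ab / Z

def distance_from_disparity(s, ab, f, Cd):
    """Equation (2): Z = Cd * f * ab / s."""
    return Cd * f * ab / s

def cross_section_distances(gaze_Z, ab, f, Cd, n_sections=11, step_px=3):
    """Divide the search space into cross sections whose disparities are
    spaced step_px pixels apart around the disparity of the gaze distance."""
    s_gaze = disparity_from_distance(gaze_Z, ab, f, Cd)
    half = n_sections // 2
    disparities = [s_gaze + k * step_px for k in range(-half, half + 1)]
    return [distance_from_disparity(s, ab, f, Cd) for s in disparities]

def central_coordinates(xl, xr, yl):
    """Equation (1): a left/right pair maps to the central screen; the pair
    belongs to the current cross section when xl - xr equals its disparity."""
    s = xl - xr
    xc = (xl + xr) / 2.0
    return xc, yl, s
```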
To enhance clarity, composite images were made by alternately selecting even lines of the left image and odd lines of the right image. An index of clarity, Ch_{i,j}, of a pixel (i, j) is represented by the variance of the HSI density values, i.e., hue, hu, saturation, sa, and intensity, it, as follows:

Ch_{i,j} = sqrt{ Σ_{k=j-m}^{j+m} [ (hu_{i,k} - hu_a)^2 + (sa_{i,k} - sa_a)^2 + (it_{i,k} - it_a)^2 ] },     (3)
where the subscript a represents the average over k from j-m to j+m. When the left and right images of an object overlap well, the color difference in the vertical direction is smallest, and the average value, Cha, over the whole area of the object reaches a minimum.
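The clarity index of equation (3) can be sketched as follows, assuming the interlaced composite has already been converted to an HSI array indexed as (row, column, channel); the function names and array layout are illustrative, not the authors' code.

```python
import numpy as np

def clarity_index(hsi, i, j, m):
    """Clarity of pixel (i, j): vertical spread of hue, saturation, and
    intensity over rows j-m..j+m of column i (equation 3)."""
    window = hsi[j - m : j + m + 1, i, :]   # shape (2m+1, 3)
    mean = window.mean(axis=0)              # hu_a, sa_a, it_a
    return float(np.sqrt(np.sum((window - mean) ** 2)))

def average_clarity(hsi, region, m):
    """Average Ch over all pixels (i, j) of an object region; a genuine
    overlap at the correct cross section minimizes this value."""
    values = [clarity_index(hsi, i, j, m) for (i, j) in region]
    return float(np.mean(values))
```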
Measures to restrict the occurrence of false images
In Figure 1(b), if a cross section is located at Z32, three clear false images appear on the central screen because the image of P33 on the left screen and the image of P22 on the right screen overlap on that cross section, as is the case with the other false images. There is no difference between the genuine images and the false images if the color and shape of their objects are equal to each other. Based on how these false images occur, three measures to restrict them are as follows.
If the search range includes the cross section at Z33 but does not include that at Z32, the false images at Z32 do not appear on the central screen onto which the range is projected. Therefore, the first measure is to set the search range in the direction of depth narrow enough to prevent false images from occurring, and to move the range gradually from near to far.
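A minimal sketch of this first measure is given below, under the assumption that a helper such as process_range performs the central composition for one narrow depth window; the sweep simply steps the window from near to far with a small overlap so objects near a window border are not missed. The helper names are hypothetical.

```python
def sweep_search_range(z_near, z_far, range_width, overlap, process_range):
    """Step a narrow search window [z0, z0 + range_width] from z_near to
    z_far; each window is narrow enough that false matches between
    neighbouring cross sections do not occur within it."""
    results = []
    z0 = z_near
    while True:
        z1 = min(z0 + range_width, z_far)
        results.append(process_range(z0, z1))   # e.g. central composition
        if z1 >= z_far:
            break
        z0 = z1 - overlap                       # slight overlap between windows
    return results
```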
The second measure is to compare the amount of the color feature on an object. In Figure 1(b), the amount of the color feature to the left of a line, l3, of the left camera through the genuine image P33 is equal to that to the left of a line, r3, of the right camera. The same condition exists on the right side of both lines. In the case of a line r2 through a false image at Z32, the amounts of the color feature on the left sides of both lines, and on the right sides of both lines, do not agree with each other. Here, the symbol Sl_ij represents the difference between the amount of the color feature to the left of li and that to the left of rj, and the symbol Sr_ij represents the same quantity for the right sides of both lines. The symbol Sa_ij is the sum of these amounts, as follows:

Sa_ij = |Sl_ij| + |Sr_ij|.     (4)

Thus, the value of Sa_ij becomes a minimum when li and rj intersect at a genuine image. This relation is useful for judging genuine central images when the object images on the left and right screens are rows of objects in the same order.
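The second measure and equation (4) can be sketched as follows, assuming binary masks of the color feature (for example, red hue) are available for the left and right screens; the mask and function names are hypothetical, and xl and xr are the column positions of the candidate lines li and rj.

```python
import numpy as np

def color_amount_sides(mask, x):
    """Amount of the color feature to the left and right of column x."""
    return mask[:, :x].sum(), mask[:, x:].sum()

def sa_value(mask_left, xl, mask_right, xr):
    """Sa_ij = |Sl_ij| + |Sr_ij| (equation 4); a minimum indicates that
    li and rj intersect at a genuine image rather than a false one."""
    left_l, right_l = color_amount_sides(mask_left, xl)
    left_r, right_r = color_amount_sides(mask_right, xr)
    sl = left_l - left_r     # difference of the left-side amounts
    sr = right_l - right_r   # difference of the right-side amounts
    return abs(sl) + abs(sr)
```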
The third measure involves composition on each half side of an object image. In Figure 1(b), the dotted area on the left side of P33 is the common area between the left side of l3 and that of r3, and the dotted area on the right side of P33 is the common area between the right sides of those lines. When the left side of a central image is composed from the respective left half sides of the left and right images, and its right side is composed from the respective right sides, the whole image becomes clear only in these common areas. On the borders and outside of these common areas, the image becomes unclear, and the average value of the variance of the color feature increases, because the overlap of the left and right images cannot help but be incomplete. In the case of Figure 1(b), the four genuine images and only two of the false images among the total number of created images are clear, and the others are restricted. However, if this method is applied to a false image, the number of clear genuine images decreases and, conversely, the number of false images increases.
Experiments
Instrumentation
The main components of the trial binocular stereo vision system were two CCD color video cameras (SONY EVI-310, 1/3" CCD element, interline type, 768x494 pixels, focal length of 5.9mm to 47.2mm, F1.4), a video capture device (RGB 24bit, 640x480 pixels maximum), and a laptop computer (Pentium 266 MHz, 64MB RAM). The zoom lens mechanism of the cameras was controlled by the computer through a Video System Control Architecture (VISCA) network on an RS-232C interface. The convergence angle of the two cameras was 0.2 degrees. In image processing, correction of lens aberration was conducted for each image.
Experimental method
The objects were ripe red 'Fuji' apples and yellow-green 'Orin' apples in an orchard. Images of these objects were obtained under front-lighting conditions in sunny weather. The distance from the cameras to the objects ranged from about 1.5m to 5.5m. Image size was 320x240 pixels. Focus, exposure, and white balance were adjusted automatically. In the image processing, composite images were produced at a size of 260x220 pixels in RGB at 24-bit depth. The average variance of the color feature at each pixel of the composite images was calculated over a range of pixels equivalent to an actual length of 8cm, which corresponds to the average diameter of the fruit.
Results and discussion
The results of processing red apple images
Figure 2 shows a pair of original images of the red variety 'Fuji'. The fruits were located 3.4m to 4.5m from the cameras. The focal length of the cameras was 13.7 mm, and the optical interval was 300 mm. The fruit was about 80 mm to 85 mm in diameter. The image processing was carried out at a gaze distance of 3.4m with a disparity interval of 3 pixels and 11 cross sections by disparity (a search range of 2.9m to 4.2m); hue values in the red range and the average variance were used for discrimination. The central images were composed by overlapping the right images on the left images at each cross section of distance by disparity. In the frame of the left image, the total number of fruits was 33; the number of horizontal rows was 3; the number of overlapping fruits was 7; the number of fruits occluded by other fruits, leaves, or branches was 10; and the number of fruits with complete shape was 8.
Figure 3 shows the process of composing central images from the same half sides of the left and right images. The images in Figure 3(a) correspond to cross sections at 3,201 mm, 3,429 mm, and 3,691 mm, respectively. The images of the fruits became clear at the cross section of 3,429 mm, near the actual distances of the objects. The central vertical lines in the images at 3,201 mm and 3,691 mm show the borders between the left half side and the right half side in the common area. The values of Sa are plotted against distance by disparity in Figure 3(b). They decreased in front of the actual distance and fell to a minimum at a distance about 0.2m shorter than the actual distance of the object. This tendency is useful for determining a gaze distance. The average variance, Cha, of color density became a minimum at a distance near the actual distance. Because the minimum value of the average variance increases as the gaze distance moves away from the actual distance, comparison of this value is useful for measuring distance.
Figure 2. A pair of original images of red variety 'Fuji' apples.
(a) Central composite images in processing.
(b) Relationships among distance, Sa, and Cha.
Figure 3. A process of central images composed from the same-half sides of left and right
images: (a) central images of three cross sections of 3,201 mm, 3,429 mm, and 3,691 mm; (b)
relationships among distance by disparity, Sa, and Cha.
The final composite image, shown in Figure 4(a), was obtained by gathering the pixels whose average variance was minimum among the images of the cross sections. The color and shape of the fruit composite images in the search range were approximately clear. The shape of the gazed-at fruit image in the center of the central composite screen became gradually deformed as the gaze distance moved farther from the actual distance. Figure 4(b) shows the distribution of the average variances over the areas of the fruit images. The divided values are represented by the symbols 'A' to 'M' and 'n' to 'z' in ascending order. In the results, twenty-seven fruit images were marked, and eight images had two or more symbols because a single fruit image was divided into several areas that were measured at different distances. Three images in which fruit images overlapped were not separated. The interval of the cross sections and their positions are related to the separation of these images. Figure 4(c) shows the fruit images marked with the symbols 'A' to 'z', which represent 52 divisions of the distance from 1.3m to 5.0m. Twenty-six fruit images were marked, each with a single symbol. There were two areas in which fruit images overlapped (one area contained two fruit images and the other three), and each is represented by one symbol. No image was marked with two or more symbols. There were five cases in which the marked point was far from the center of the fruit area, because the integration of areas on the relevant fruit images was unsuitable. Thus, the composite images other than the gazed-at image may partially contain false images.
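The gathering of minimum-variance pixels described above can be sketched as follows, assuming the per-cross-section central images and their average-variance maps have already been stacked into arrays; the variable names are illustrative only.

```python
import numpy as np

def final_composite(composites, cha_maps, distances):
    """composites: (n, H, W, 3) central images, one per cross section
       cha_maps:   (n, H, W) average variance of the color feature
       distances:  (n,) distance of each cross section by disparity
    For each pixel, keep the color from the clearest cross section and
    record that section's distance as the pixel's depth."""
    best = np.argmin(cha_maps, axis=0)            # (H, W) index of clearest section
    h, w = best.shape
    rows, cols = np.indices((h, w))
    image = composites[best, rows, cols]          # (H, W, 3) final composite
    depth = np.asarray(distances)[best]           # (H, W) distance map
    return image, depth
```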
Figure 4. The results of image processing by central composition: (a) the final composite image,
(b) symbol divisions of average variances in the area of fruit images, and (c) fruit images
marked by symbols of 'A' to 'z' that represent the distance division in ascending order.
An example of images densely containing red fruits is shown in Figure 5. Figure 5(a) is a pair of original images of the red variety 'Fuji'. The fruits were located 4.7m to 5.5m from the cameras. The focal length of the cameras was 13.7 mm, and the optical interval was 250 mm. The image processing was carried out at a gaze distance of 4.8m with a disparity interval of one pixel and 11 cross sections (a search range of 4.4m to 5.4m). In the white frame area, the total number of fruits was 46; the number of horizontal rows was 7; the number of overlapping fruit images was 8; the number of fruits occluded by other fruits, leaves, or branches was 4; and the number of fruits with complete shape was 14. Figure 5(b) shows the distribution of the average variances over the areas of the fruit images. Forty fruit images were marked, and twenty-one of them had two or more symbols. Three images in which fruit images overlapped each other were not separated and are represented by a single symbol. Since the distance resolution of the stereo vision system decreases as the objects become more distant, the difficulty in separating the overlapped areas increases. Nevertheless, discrimination of fruit images by the present method tended to be better than that by simple composition. The tendency of Sa with respect to the amount of the color feature differed from expectation because of the influence of background color similar to that of the objects. The calculation of distance was conducted on the areas of fruit images that were marked with the symbols of average variance. Figure 5(c) shows the fruit images marked with the symbols 'A' to 'L', which represent distance divisions from 4.4m to 5.5m. Thirty-one fruit images were marked. Three areas in which fruit images overlapped each other were each represented by one symbol. Two fruit images were marked with two or more symbols. There were 9 cases in which the marked point was far from the center of the fruit area.
Figure 5. An example of processing images densely containing fruits: (a) a pair of original
images of red variety 'Fuji' apples, (b) the distribution of average variances in the area of fruit
images, and (c) the fruit images marked by symbols of 'A' to 'L' that represent the distance
division in ascending order.
The results of images of a yellow-green apple
In Figure 6(a), an example of original images of yellow-green variety 'Orin' apples is shown. The
fruits were located at a range of 2.3m to 4.5m from the cameras. The focal length of the
cameras was 8.6 mm, and the optical interval was 250 mm. The image processing was carried
out at a gaze distance of 2.8m, a disparity interval of 2 pixels, and with 11 cross sections (in a
search range of 2.3m to 3.6m in distance). The total number of fruits was 20 in the white frame
area; the number of horizontal rows was one; the number of overlapping fruits was 3; the
number of fruits occluded by other fruits, leaves, or branches was 8; and the number of fruits
with complete shape was 10. Figure 6(b) shows the distribution of average variances in the area
of fruit images. Sixteen fruit images were marked, and no fruit image had two or more symbols.
8
Two areas of fruit images that overlapped each other were not separated, and were represented
by a symbol. Four area of leaves were marked with symbols. Figure 6(c) shows the fruit images
marked by the same symbols as those in Figure 4(c). Thirteen fruit images were marked. One
area in which fruit images overlapped each other is represented by one symbol. Four area of
leaves are marked by the symbols. The number of cases in which the marked point was far from
the center of fruit area was 5.
Figure 6. An example of processing images of yellow-green variety 'Orin' apples: (a) a pair of
original images, (b) the distribution of average variances in the area of fruit images, and (c) the
fruit images marked by symbols of 'A' to 'z' that represent the distance division in ascending
order.
(Vertical axis: error, %; horizontal axis: distance, mm. Data series: Fuji-2.3, Fuji-3.4, Fuji-4.7, and Orin-3.0.)
Figure 7. Relationship between photographing distance and error of distance.
Errors of measured distances
A laser range meter with an accuracy of 3 mm was used to calibrate the measurements of the trial stereo system. As shown in Figure 7, the results for the red apples showed that the error in determining distance was about ±5% (an average of -1.3% and a standard deviation of 3.5%) over a distance range of 1.7m to 5.2m. In cases in which fruit images overlapped each other, the error was larger because false images appeared from unsuitable correspondences within the stereo pair. The number of false images tended to decrease when the search range was narrowed in the direction of depth. The results for the yellow-green apples showed that the distance errors were about ±5% (an average of 1.6% and a standard deviation of 3.6%) over a distance range of 2.3m to 4.3m, provided the color variance of the fruit image differed from those of the surrounding leaves.
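For reference, error statistics of this kind can be computed from paired stereo and laser readings with a few lines such as the following sketch; the sample distances in the usage line are hypothetical and are not the measured data.

```python
import numpy as np

def error_statistics(measured_mm, reference_mm):
    """Percentage error of each stereo measurement against the laser
    reference, plus the mean and standard deviation of those errors."""
    measured = np.asarray(measured_mm, dtype=float)
    reference = np.asarray(reference_mm, dtype=float)
    errors = 100.0 * (measured - reference) / reference
    return errors, errors.mean(), errors.std(ddof=1)

# Example with three hypothetical fruit distances (mm):
errs, mean_err, std_err = error_statistics([3350, 4560, 5210], [3400, 4500, 5250])
```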
Conclusion
The results of the three measures to address the correspondence problem showed that two of them, namely the narrow search range and the central composition of the same half sides of the common area, were effective, whereas comparison of the amount of the color feature on the same half sides was not always effective owing to the influence of background color. The rate of fruit discrimination was about 90% or higher in images with 20 to 30 fruits, and from 65% to 70% in dense red fruit images and in yellow-green apple images; the errors of distance measurement were about ±5%. Based on these results, the image-processing method utilizing these measures will be useful in research on the application of a stereo vision system.
References
Takahashi, T., S. Zhang, M. Sun, and H. Fukuchi. 1998. New Method of Image Processing for
Distance Measurement by a Passive Stereo Vision. ASAE Meeting Paper No. 983031.
St. Joseph, Mich.: ASAE.
Takahashi, T., S. Zhang, H. Fukuchi and E. Bekki. 2000a. Binocular Stereo Vision System for
Measuring Distance of Apples in Orchard (Part 1) - Method due to composition of left
and right images -. JSAM 62(1): 89-99 (in Japanese).
_____. 2000b. Binocular Stereo Vision System for Measuring Distance of Apples in Orchard
(Part 2) - Analysis of and solution to the correspondence problem -. JSAM 62(3): 94-102
(in Japanese).