Vision and Navigation of Marsokhod Rover
Marina Kolesnik
Space Research Institute
84/32 Profsoyuznaya St., Moscow, 117810 Russia
mkolesni@iki3.iki.rssi.ru
Abstract
The exploration of the Martian surface with the help of a long-duration rover is planned
as a part of the international space project "Mars-98". In order to provide rover
autonomy, we have developed a path generation algorithm that makes use of 3-D
stereo reconstruction. Two main tasks are solved for successful obstacle avoidance:
(1) recognition of the visible terrain in front of the rover; and (2) safe path generation
and execution. The area-based stereo reconstruction algorithm [11], which combines
a pyramidal data structure with a dynamic programming technique, is used for
the recognition of the local environment. Prohibited areas are identified on the
elevation map according to the rover's locomotion capability to overcome them. The safest
path is generated by applying Dijkstra's algorithm to the non-prohibited
areas. The computational complexity and memory requirements of the developed algorithm
meet the constraints of onboard real-time processing.
We also present the results of tests carried out at sandy and rocky
sites (Kamchatka, Russia, 1993; Mojave desert, California, 1994; Tarusa, Russia,
1994) to demonstrate the robustness of the vision-guided system.
Keywords: stereo vision system, image pyramid, dynamic programming technique,
elevation map, directed graph, Dijkstra algorithm.
Introduction
The heart of a vision-based navigation system is the stereo reconstruction of the surface
relief. Extensive research experience in the field of stereo analysis has been
accumulated worldwide. The known algorithms for passive stereo matching can be
classified into two basic categories:
1. Feature-based algorithms. These algorithms [8, 10, 12] extract features from the
images, such as edges, segments, contours and separate points (markers), and then match
the corresponding features in the left and right images. The matching stage of all
these algorithms is computationally fast, because only a small subset of the image
pixels is used; however, the feature extraction itself is generally time
consuming. Another drawback is that algorithms of this class may fail if the
primitives cannot be reliably determined in the image pair. In particular, edge
segment extraction is quite sensitive to brightness distortions, as well as to
imbalance of the vision system parameters. Furthermore, feature-based methods
usually yield only a sparse depth map, which is unacceptable for the path planning
task.
2. Area-based algorithms. Assuming that the left and right images of a pair are locally similar to
each other, one can find a strong correlation between small gray-level areas in the
two images [5, 9]. The underlying assumption appears to be valid for
relatively textured areas; however, it may prove wrong at occlusion boundaries and
within featureless regions.
The algorithms of both categories often use special methods to improve the
matching reliability, such as:
- image hierarchies [13] and pyramidal structures of the images [9];
- a multiresolution coarse-to-fine strategy [7] to overcome match ambiguities;
- interpolation methods [7] to expand the depth values from the edges into the domain interior;
- dynamic programming techniques [6, 1, 15] to produce a smooth depth map.
A particular navigation algorithm based on stereo vision was developed by
a French research group [16]. Their algorithm for stereo reconstruction makes use of
an area-based correlation process [14]. The path planning step comprises the
following substeps:
1. distance map reconstruction using the geometrical parameters of the vision system;
2. interpolation of these 3-D values to get a regular surface grid of heights (Digital
Terrain Model);
3. elaboration of a navigation map of the terrain with several classes (flat, traversable,
obstacle, unknown) using two local thresholds for slope and height discontinuity;
4. path generation by observing a distance margin from the rover to the obstacles.
This interesting solution includes, however, the computationally expensive step (2),
which is not strictly necessary for the motion. The last step (4) could also be simplified
by minimizing the local risk for the rover.
A special issue in vision-guided navigation is the design of a relatively stable
and fast stereo reconstruction algorithm that does not require large memory
resources. The accuracy of the reconstructed 3-D shape need not be very high, because we
only need to detect obstacles whose vertical dimension is larger than 30 cm.
The time consumption of our stereo reconstruction algorithm is quite small due to the
simple features being used and the combination of image-pyramid and dynamic-programming
techniques. The path is calculated in the spatial domain based on a
map describing the distance to visible points, instead of using a Digital Terrain
Model (heights) on a real surface grid. We emphasize visibility, because it is useless
to investigate the invisible (unknown) regions in front of the rover, as the rover will
not enter these areas. Dijkstra's algorithm [3] is applied to minimize the local risk for the
rover along the path. These steps are believed to be much faster than other widely used
methods.
This paper consists of three parts. In the first part we describe the stereo
reconstruction and navigation algorithms. A brief description of the onboard computer
and an analysis of the rover stereo vision system parameters are given in the second part.
The experiments and processing results are presented in the third part. The algorithm's
performance compared to a known hardware-based solution is discussed in the conclusion.
1. Navigation Algorithm
The autonomous rover progression consists of the following steps, repeated
in a cycle (a minimal sketch of the cycle is given after the list):
- stereo image acquisition;
- stereo reconstruction in order to estimate the visible terrain;
- path planning that follows a principal direction and avoids the obstacles;
- path execution by traversing along the generated path.
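For illustration, this cycle can be written as a short control loop. The sketch below is only a schematic paraphrase of the steps listed above: the callables are hypothetical placeholders for the onboard acquisition, reconstruction, planning and locomotion routines (which, as noted in the conclusions, are actually written in OCCAM), and the loop bound is arbitrary.

```python
# Illustrative sketch of the navigation cycle; the callables are supplied by the caller
# and stand in for the onboard routines, which are not reproduced here.
from typing import Any, Callable, Optional


def navigation_cycle(acquire_stereo_pair: Callable[[], Any],
                     reconstruct_terrain: Callable[[Any, Any], Any],
                     plan_path: Callable[[Any], Optional[list]],
                     execute_path: Callable[[list], None],
                     max_cycles: int = 100) -> None:
    """Repeat acquisition -> reconstruction -> planning -> execution in a cycle."""
    for _ in range(max_cycles):
        left, right = acquire_stereo_pair()          # stereo image acquisition
        terrain = reconstruct_terrain(left, right)   # estimate the visible terrain
        path = plan_path(terrain)                    # follow principal direction, avoid obstacles
        if path is None:
            break                                    # no safe path: fall back to remote control
        execute_path(path)                           # traverse along the generated path
```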
The general principle applied to match points in the right and left images is
correlation. It consists of comparing the gray-level values within a small
(3x3 pixels) local window centered on each point of the left image in order to find the
most similar window in the right image. The disparities (parallaxes: the pixel shift from the
left to the right image) thus obtained are then used to reconstruct the distance
between the rover and the points of the terrain surface, based on the given camera
geometry.
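A minimal sketch of such window-based matching is given below, assuming rectified 8-bit images stored as NumPy arrays and a simple sum-of-absolute-differences score over the 3x3 window, searched along the same scan line. It omits the feature extraction, pyramid and dynamic-programming stages of the actual algorithm [11] and is not the onboard implementation.

```python
import numpy as np


def match_scanlines(left: np.ndarray, right: np.ndarray, max_disp: int = 20) -> np.ndarray:
    """Brute-force 3x3 window matching along horizontal lines (illustrative only).

    Returns an integer parallax map of the same size as `left`; border pixels and
    pixels whose search window would leave the image keep disparity 0.
    """
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    L = left.astype(np.int32)
    R = right.astype(np.int32)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ref = L[y - 1:y + 2, x - 1:x + 2]              # 3x3 window in the left image
            best_d, best_cost = 0, None
            for d in range(0, min(max_disp, x - 1) + 1):   # candidate shift along the scan line
                cand = R[y - 1:y + 2, x - d - 1:x - d + 2]
                cost = int(np.abs(ref - cand).sum())       # sum of absolute differences
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

The paper's algorithm avoids this exhaustive search by restricting it with the pyramid and dynamic-programming machinery described next.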
The stereo matching process [11] is implemented under the following
assumptions concerning the stereo images and the surface to be reconstructed:
- The original images are always noisy due to geometrical and photometric distortions;
- The reconstructed surface is mainly smooth ("continuity constraint");
- The stereo pair is taken as a perspective projection of the scene, which means that the disparity
values generally decrease upwards.
The search for correspondences along the horizontal lines is based on:
- a fast procedure to extract brightness features that are more stable with
respect to brightness distortions;
- a data pyramid constructed for both stereo images: the original image is considered
the zero pyramid layer, and each next layer is obtained from the previous one by scaling it
down by a factor of 1/2.
A local correlation analysis is applied iteratively on the image pyramid. The
identification process starts from the top pyramid layer and continues over the
pyramid layers from top to bottom. The parallax values obtained are stored and then
used as the initial shift values when passing to the higher resolution layer. The local
correlation analysis is combined with a dynamic programming method to
reconstruct the relief along the scan lines at each pyramid layer. The method satisfies the
continuity principle and also introduces regularization to extract a smooth relief.
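The coarse-to-fine use of the pyramid can be sketched as follows. The snippet only shows how the parallax estimate of a coarser layer is propagated as the initial shift on the next, higher-resolution layer; `local_refinement` is a hypothetical stand-in for the local correlation and dynamic-programming step of [11], and image sides are assumed divisible by 2^(levels-1) (e.g. 128x128).

```python
import numpy as np


def build_pyramid(image: np.ndarray, levels: int) -> list:
    """Layer 0 is the original image; each next layer is scaled down by a factor of 1/2
    (here by 2x2 block averaging)."""
    pyramid = [image.astype(np.float32)]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        h, w = prev.shape
        pyramid.append(prev.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return pyramid


def coarse_to_fine_disparity(left: np.ndarray, right: np.ndarray, levels: int = 3) -> np.ndarray:
    """Start at the top (coarsest) layer and refine the parallax map from top to bottom."""
    pl, pr = build_pyramid(left, levels), build_pyramid(right, levels)
    disp = np.zeros(pl[-1].shape, dtype=np.float32)            # no prior shift at the top layer
    for lvl in range(levels - 1, -1, -1):
        if lvl < levels - 1:
            # propagate the stored parallax values to the higher-resolution layer:
            # upsample by 2 and double the pixel shifts
            disp = 2.0 * np.kron(disp, np.ones((2, 2), dtype=np.float32))
        disp = local_refinement(pl[lvl], pr[lvl], disp)        # placeholder matching step
    return disp


def local_refinement(left_layer, right_layer, init_disp):
    """Hypothetical stand-in for the local correlation + dynamic-programming step of [11];
    a real implementation would search a small disparity range around `init_disp`."""
    return init_disp
```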
Finally, the parallax map is converted into a real-scale distance map
according to the geometry of the acquisition system. The value of each point on the
map defines the distance from the rover to the corresponding point on the surface.
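For a parallel-axis rig, this recalculation reduces per pixel to the standard triangulation relation Z = f·B/d. The sketch below is an illustration only: it uses the rig parameters quoted in Section 2 (baseline 50 cm, focal length 12 mm, 7 mm CCD at 128x128 resolution) and ignores the camera height and pitch that the onboard recalculation also takes into account.

```python
import numpy as np

# Rig parameters as quoted in Section 2; pixel pitch assumes the 7 mm CCD is used
# at 128x128 resolution. The exact onboard recalculation is not reproduced here.
BASELINE_M = 0.5          # stereo basis, meters
FOCAL_MM = 12.0           # focal length, millimeters
PIXEL_MM = 7.0 / 128.0    # approximate pixel size on the CCD, millimeters


def parallax_to_range(disparity_px: np.ndarray) -> np.ndarray:
    """Convert a parallax map (pixels) to a range map (meters) by triangulation,
    Z = f * B / d. Zero disparities are mapped to infinity."""
    d_m = disparity_px.astype(np.float64) * PIXEL_MM * 1e-3
    with np.errstate(divide="ignore"):
        return np.where(d_m > 0.0, (FOCAL_MM * 1e-3) * BASELINE_M / d_m, np.inf)
```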
The path planning algorithm is implemented in two steps. During the first step,
obstacles are detected at the scale of the original images in the following way.
The distance map is recalculated to create an elevation map with respect to the horizontal
plane under the rover. The actual values of the rover inclinations (roll and pitch
angles) are taken into account. Obstacles are detected on the elevation map
according to the rover's locomotion capacity to get over them. The regions
detected as obstacles represent prohibited zones in the rover's field of view. In
the second step, the path from the start point to the feasible destination points is
generated by applying Dijkstra's algorithm to the elevation map. To implement this
step, we consider the pixels that do not belong to obstacles as the nodes of a
directed graph. The start point of the path (see Fig. 1) is considered the graph origin.
The length of the graph edge connecting node i with node j is a non-negative
number defined as follows:
W(i, j) = |H_i - H_j|,
where H_i and H_j are the values of the elevation map at nodes i and j.
Fig. 1. Path planning task: directed graph on the image field (start point, feasible target points, and a prohibited obstacle area; rows Y and Y+1 illustrate the node connectivity).
As usual, the length of a particular path joining any two given vertices of the
graph is defined as the sum of the weights of the edges composing the path. Because the
real path must run continuously through the image field, the graph edges are
connected in the following way: each node in image row Y can be connected only
with the three nodes of the previous row Y+1 (see Fig. 1). Under this restriction, the
number of operations for searching the shortest path from the start point to the destination
points is O(N^2), where N is the image size.
The number of operations required in the calculation described above is strictly less than the number of operations used in the
Voronoi diagram method implemented in [16].
Finally, a virtual destination is selected as the target point, keeping the global
rover displacement within the direction defined by the mission task. The whole path is
then reconstructed from the target back to the start position according to the best direction
stored for each graph node.
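A sketch of this search is given below under stated assumptions: the elevation map is a NumPy array whose row index grows with the distance from the rover (the mirror of the row numbering in Fig. 1), each node of row y is connected to the three nearest nodes of row y+1, the edge weight is |H_i - H_j|, and cells whose local elevation jump exceeds a 0.3 m threshold (the obstacle size quoted in Section 2) are treated as prohibited. The helper names and the obstacle criterion are illustrative, not the onboard implementation.

```python
import heapq
import numpy as np


def prohibited_mask(elevation: np.ndarray, max_step_m: float = 0.3) -> np.ndarray:
    """Mark cells whose local elevation jump exceeds the assumed climbing capability."""
    jump = np.maximum(np.abs(np.diff(elevation, axis=0, prepend=elevation[:1])),
                      np.abs(np.diff(elevation, axis=1, prepend=elevation[:, :1])))
    return jump > max_step_m


def safest_path(elevation: np.ndarray, start: tuple, targets: set):
    """Dijkstra on the image grid: from node (y, x) the edges go to
    (y+1, x-1), (y+1, x), (y+1, x+1) with weight |H_i - H_j| (cf. Fig. 1)."""
    blocked = prohibited_mask(elevation)
    h, w = elevation.shape
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, (y, x) = heapq.heappop(heap)
        if d > dist.get((y, x), np.inf):
            continue                              # stale queue entry
        if (y, x) in targets:
            path = [(y, x)]
            while path[-1] in prev:               # backtrace from target to start
                path.append(prev[path[-1]])
            return path[::-1]
        for nx in (x - 1, x, x + 1):              # three nodes of the next row
            ny = y + 1
            if 0 <= nx < w and ny < h and not blocked[ny, nx]:
                nd = d + abs(float(elevation[ny, nx]) - float(elevation[y, x]))
                if nd < dist.get((ny, nx), np.inf):
                    dist[(ny, nx)] = nd
                    prev[(ny, nx)] = (y, x)
                    heapq.heappush(heap, (nd, (ny, nx)))
    return None                                   # no feasible target reachable
```

Dijkstra's algorithm requires non-negative edge weights, which the absolute elevation difference satisfies by construction.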
2. Rover Stereo Vision System
The rover stereo vision system consists of two cameras and an onboard computer
providing image capture and processing facilities.
Fig. 2. Rover stereo vision system (camera height 1 m, initial pitch 10°, vertical visual angle 32°; the viewed strip of terrain extends from about 2 m to about 11 m in front of the rover).
The stereo cameras were installed on the rover in such a way as to enable the
analysis of the nearest rover environment (Fig. 2). The blind area extends
approximately 2 m from the rover. The cameras are mounted on a vertical rack of
1 m; their inclination toward the horizon is 10°. A rather large stereo basis (50
cm) makes it possible to process stereo pairs with a resolution of 128x128 pixels. This
is enough to recognize major obstacles during the rover motion: the difference
between the parallax values corresponding to the top and the bottom of a stone of 30
cm height at a distance of 14 m from the rover's center equals 3 pixels.
In order to provide autonomous movement, control and timing of
experiments, data collection and storage, etc., the rover is equipped with an onboard
computer based on the 32-bit T805 transputer from INMOS Corporation [17], which can
be regarded both as a special-purpose (i.e. image processing) and a general-purpose processor.
The major characteristics of the IMS-T805 are:
- 32-bit internal and external architecture;
- 30 MIPS (peak) instruction rate;
- 4 Kbytes of directly addressable on-chip RAM;
- internal timers;
- 4 fast serial links (10 Mbit/sec);
- less than 1 watt power consumption at 30 MHz.
The heart of the onboard computer is four transputer modules, which are exact
copies of each other both electrically and mechanically. None of them is privileged
as far as access to the peripheral blocks is concerned, but, and this is a
substantial point, only two of the four transputer modules are powered at a time.
Which two is determined by the actual state of the overswitch logic [2]. The
computer system has access to 256 Kbytes of local memory (upgradable).
Since the stereo matching process is based on the assumption of epipolar
geometry of the original stereo images, the optical axes of the cameras must be
strictly parallel to each other. Let us estimate the accuracy required for the alignment of the
cameras, assuming that the vertical parallax of corresponding points must stay within one
pixel. Suppose that the world coordinate system X, Y, Z coincides with the left
camera position (Fig. 3). Denote the camera viewing angles as a pan angle α
related to the Y-direction, a tilt angle β related to the X-direction, and a roll angle γ related to
the Z-direction (Δα, Δβ and Δγ denote the variations of these angles with respect to the parallel
optical axes). The pixel mismatch due to camera imbalance can be calculated by
the following formulas:
Δx = x' - x = (f + x²/f)·Δα + (xy/f)·Δβ + y·Δγ
Δy = y' - y = (xy/f)·Δα + (f + y²/f)·Δβ + x·Δγ
where f is the camera focal length, (x, y) the ideal projection, and (x', y') the real projection
of the point P (Fig. 3).
The focal length of the cameras installed on the rover equals 12 mm, and the size of the
CCD matrix is 7 mm (|x|, |y| ≤ 3.5 mm). As follows from the equations above, the most
significant contribution to Δy (which is crucial for the matching process) comes from the
variation of the angle β. If Δy ≤ 1 pixel, then Δβ ≤ 5 arcsec.
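To make the formulas easy to evaluate, the small helper below simply restates them in code (angles in radians, coordinates and focal length in millimeters). It is a restatement of the reconstruction above, not a calibration procedure, and any numeric values passed to it are the caller's own assumptions.

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)   # one arcsecond in radians


def pixel_mismatch(x_mm: float, y_mm: float, f_mm: float,
                   d_alpha: float, d_beta: float, d_gamma: float):
    """Image-plane displacement (in mm) of the ideal projection (x, y) caused by small
    variations of the pan (alpha), tilt (beta) and roll (gamma) angles; a direct
    restatement of the formulas above."""
    dx = (f_mm + x_mm**2 / f_mm) * d_alpha + (x_mm * y_mm / f_mm) * d_beta + y_mm * d_gamma
    dy = (x_mm * y_mm / f_mm) * d_alpha + (f_mm + y_mm**2 / f_mm) * d_beta + x_mm * d_gamma
    return dx, dy
```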
It is still a question whether it is possible to keep the camera axes parallel with an
accuracy of 5 arcsec after the rover landing.
Fig. 3. The scheme of the camera orientation in the stereo vision system.
In addition, temperature variations on the planetary surface will lead to thermal deformation of the imaging system. That is
why an additional calibration of the stereo vision system may be necessary after the rover
landing. One way to calibrate the system is based on matching a set of points
marked on a target; such points should be placed uniformly within the viewing field of
the cameras. A search for correspondences is performed along both the horizontal and
vertical directions on Earth. Later on, the calculated map of vertical parallaxes can be
used onboard to compensate for the vertical (y-direction) divergence of every stereo pair
taken. Such preprocessing of the original images prevents local
area corruption in the matching process. The errors in the recalculated distance map
due to the remaining uncertainty in the camera orientation are negligible with respect to the rover
size.
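A minimal sketch of how such a precomputed map of vertical parallaxes could be applied onboard is given below. It assumes the map stores, for every pixel of the right image, the vertical offset (in pixels) measured during the on-Earth calibration, and it simply resamples the right image row-wise before matching; this data layout is an assumption, not the flight implementation.

```python
import numpy as np


def compensate_vertical_parallax(right: np.ndarray, v_parallax: np.ndarray) -> np.ndarray:
    """Shift each pixel of the right image vertically by the calibrated parallax value
    (nearest-neighbour resampling) so that corresponding points lie on the same scan line.
    `v_parallax` is assumed to hold the per-pixel vertical offset measured on Earth."""
    h, w = right.shape
    ys, xs = np.indices((h, w))
    src_rows = np.clip(np.rint(ys + v_parallax).astype(int), 0, h - 1)
    return right[src_rows, xs]
```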
3. Experiment and Processing Results
Marsokhod consists of a chassis with six wheels, each of which is articulated so
that it can be turned forwards or backwards on each side of a central joint. The chassis
holds the on-board power and computing capabilities and is equipped with stereo
cameras mounted on a central mast.
Two basic principles underlie the autonomous locomotion:
- On average, the rover displacement should be in the direction required by the
mission tasks;
- The position of the rover during the motion should be risk-free at every moment of
the operation. In an unexpected situation the autonomous movement can be terminated to
return the rover to remote control.
Usually, the rover operates within a normal path range of 10-12 m. If, while
traversing along the generated path, an obstacle is detected
by the rover sensors (meaning that the actual rover inclination exceeds
the threshold), the rover stops and then performs a one-step rotation away from
the obstacle. At the next step the system searches for a path of
about 3-4 m to avoid the obstacle. If the path generation is unsuccessful, the rover
rotates one step further. If no path can be found during a complete turn-around,
the guidance returns to manual handling (i.e. to remote control). Every time
a path is successfully executed, the rover restarts the execution of the motion
scheme from the beginning, i.e. returns to the normal path range operations.
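This avoidance behaviour amounts to a simple reactive loop, sketched below with hypothetical callables for the one-step rotation, the short-path search and the path execution; the number of rotation steps per full turn is likewise an assumed discretization.

```python
def avoid_obstacle(rotate_one_step, plan_short_path, execute_path,
                   steps_per_turn: int = 12) -> bool:
    """Reactive avoidance after the inclination threshold is exceeded:
    rotate away from the obstacle one step at a time and try to generate a short
    (3-4 m) path; return False (remote control) if a full turn-around fails.
    `steps_per_turn` is a hypothetical discretization of a complete turn-around."""
    for _ in range(steps_per_turn):
        rotate_one_step()                 # one-step rotation away from the obstacle
        path = plan_short_path()          # try to find a 3-4 m avoidance path
        if path is not None:
            execute_path(path)            # on success, normal operation resumes afterwards
            return True
    return False                          # complete turn-around failed: back to manual handling
```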
Fig. 4 shows the successive steps of the image processing used to generate
a path, which is highlighted in the left image (4c). In Fig. 4b the prohibited area is
shown in white; it is actually a deep pit in the ground. Areas shaded in black
are safe for the rover motion. The start point of the generated path is located 2.9
m in front of the rover. Fig. 5 demonstrates the processing result at another test site. A
relatively flat sandy terrain with a smaller pit in the lower right corner of the image is
shown in Fig. 5a. The brightness of the pixels within the traversable area (5b) is
proportional to the elevation values at the corresponding points of the scene. The
beginning of the path shown is at a distance of 2.2 m from the rover.
Figs. 6 and 7 illustrate the capability of the navigation system to work stably under
different illumination. The stereo pair in Fig. 6 shows a lava field in Kamchatka
close to the Tolbachik volcano. Both stereo images look gray because of the wet black
volcanic dust (it was raining during the test). The stereo pair in Fig. 7 was
taken during the test in the Mojave desert (California, 1994). These images look rather
different from the previous ones because of the bright sun. In the overilluminated sandy
areas it is somewhat hard to detect correspondences; nevertheless, a safe path is
generated.
Conclusions
In this paper we have discussed the concept of a stereo vision system and the
navigation algorithm for an autonomous planetary rover.
Both the good quality of the test results and the high performance of the software implementation have demonstrated the
feasibility of real-time automatic navigation. The robustness of the developed algorithm
has been proved in a number of experiments covering different types of terrain
(sandy and rocky scenes) and a wide range of illumination. All navigation
software is written in the OCCAM language, taking into account the onboard computer
architecture and its limitations. The navigation software has also been tested on its own on a
commercially available add-on transputer card for a PC. The overall processing time on
such a card equals 10 sec (image resolution 128x128 pixels) versus 1.3 min for
onboard processing. The difference is due to the rather small local memory installed in the
onboard computer. A known hardware solution for real-time
correlation-based stereo is the MD96 board, which emerged from a European ESPRIT project [4]. The MD96 board
is based on eight Motorola 96002 Digital Signal Processors and has a peak processing
power of 240 MFLOPS, enabling stereo reconstruction of 128x128
images in 0.9 sec. In our configuration (one transputer module, the same image
resolution) the stereo reconstruction algorithm takes 8 sec. This result is comparable
to the French hardware-based solution, but it does not require additional costs.
References
1. Baker, H.H. & Binford, T.O. (August 1981). Depth from Edge and Intensity Based
Stereo. Seventh Int. Joint Conf. on Art. Intel. Vancouver, 631-636.
2. Balazs, A., Biro, J. & Szalai, S. (15-21 May 1994). Onboard computer for Mars-96 rover. Proc. 2nd Int. Sympos. on Missions, Technologies and Design of Planetary
Rovers. Centre National d'Etudes Spatiales (CNES, France) - Russian Space Agency
(RKA), Moscow - St. Petersburg (Russia).
3. Dijkstra, E.W. (1959). A note on two problems in connection with graphs. Numer.
Math., 1, 269-271.
4. Faugeras, O.D., et al. (1993). “Real-time correlation based stereo: algorithm
implementations and applications”. The Intern. Jour. of Comp. Vision.
5. Fua, P. (1993). A Parallel Stereo Algorithm that Produces Dense Depth Maps and
Preserves Image Features. Machine Vision and Applications. 6(1), 35-49.
6. Gimel'farb, G.L. (1991). Intensity-based computer binocular stereo vision: signal
models and algorithms. Int. J. of Imaging Systems and Technology. Vol. 3, No. 3,
189-200.
7. Grimson, W.E.L. (1981). From Images to Surfaces: A Computational Study of the
Human Early Visual System. MIT Press, MA.
8. Grimson, W.E.L. (Jan. 1985). Computational Experiments with a Feature Based
Stereo Algorithm. IEEE Trans. on Patt. Anal. and Mach. Intell. PAMI-7(1), 17-33.
9. Hannah, M.J. (Dec. 1989). A System for Digital Stereo Image Matching. Phot. Eng.
and Rem. Sens. 55(12), 1765-1770.
10. Kim, N.H. & A.C. Bovik (1988). A Contour-Based Stereo Matching Algorithm
Using Disparity Continuity. Pattern Recognition. 21(5), 505-514.
11. Kolesnik, M.I. (1993). Fast algorithm for the stereo pair matching with parallel
computation. Lecture Notes in Computer Science. 719, 533-537.
12. Medioni, G. & R. Nevatia (1984). Matching Images Using Linear Features. IEEE
Trans. on Patt. Anal. and Mach. Intell. PAMI-6(6), 675-685.
13. Moravec, H.P. (Sept. 1980). Obstacle avoidance and Navigation in the Real World
by Seeing Robot Rover. Ph.D. Thesis, Stanford Univ., Comp. Sc. Dept., Report
STAN-CS-80-813.
14. Nishihara, H.K. (Sept. 1984). Practical Real-Time Imaging Stereo Matcher.
Optical Engineering. Vol. 23, No. 5.
15. Ohta, Y. & T. Kanade (March 1985). Stereo by Intra- and Inter-Scanline Search
Using Dynamic Programming. IEEE Trans. on Pattern Anal. Machine Intell.
Vol. PAMI-7(2).
16. Proy, C., et al. (16-22 Oct. 1993). Improving autonomy of Marsokhod 96. 44th
Cong. of the Int. Astronautical Federation. Graz, Austria.
17. Transputer Data Book (1992). 2nd Edition. London, UK: Prentice Hall.
Fig. 4a. Original stereo pair (left, right). Resolution: 256x256 pixels. Rover's inclinations: pitch = 0, roll = 0.
Fig. 4b. Prohibited / traversable zones; the black area is suitable for the rover's motion.
Fig. 4c. Left image with the generated path. The start point of the path is 2.90 m from the rover's center; the fracture point is at 4.60 m and the target point at 12.70 m.
Fig. 5a. Original stereo pair (left, right). Resolution: 256x256 pixels. Rover's inclinations: pitch = 0, roll = 0.
Fig. 5b. Prohibited / traversable zones; the gray area, which is the part of the elevation map shown (practically flat here), is suitable for the rover's motion.
Fig. 5c. Left image with the generated path. The start point of the path is 2.2 m from the rover's center; the distance to the target point is 9.35 m.
Fig. 6. Original stereo pair (left with the generated path, right). Resolution: 256x256 pixels.
Fig. 7. Original stereo pair (right, left with the generated path). Resolution: 256x256 pixels. Rover's inclinations: pitch = 0, roll = -4.