Vision and Navigation of the Marsokhod Rover

Marina Kolesnik
Space Research Institute
84/32 Profsoyuznaya St., Moscow, 117810 Russia
mkolesni@iki3.iki.rssi.ru

Abstract

The exploration of the Martian surface with the help of a long-duration rover is planned as part of the international space project "Mars-98". To provide rover autonomy, we have developed a path generation algorithm that makes use of 3-D stereo reconstruction. Two main tasks are solved for successful obstacle avoidance: (1) recognition of the visible terrain in front of the rover; and (2) safe path generation and execution. The area-based stereo reconstruction algorithm [11], which combines a pyramidal data structure with a dynamic programming technique, is used for recognition of the local environment. Prohibited areas are identified on the elevation map according to the rover's locomotion capability to overcome them. The safest path is generated by applying the Dijkstra algorithm to the non-prohibited areas. The computational complexity and memory requirements of the algorithm meet the constraints of onboard real-time processing. We also present the results of tests carried out on sandy and rocky sites (Kamchatka, Russia, 1993; Mojave Desert, California, 1994; Tarusa, Russia, 1994) to prove the robustness of the vision-guided system.

Keywords: stereo vision system, image pyramid, dynamic programming technique, elevation map, directed graph, Dijkstra algorithm.

Vision and Navigation of Marsokhod Rover, ACCV'95

Introduction

The heart of a vision-based navigation system is stereo reconstruction of the surface relief. Extensive research experience in the field of stereo analysis has been accumulated worldwide. The known algorithms for passive stereo matching fall into two basic categories:

1. Feature-based algorithms.
These algorithms [8, 10, 12] extract features from the images, such as edges, segments, contours and separate points (markers), and then match the corresponding features in the left and right images. The matching stage of these algorithms is computationally fast, because only a small subset of the image pixels is used; however, the feature extraction process is in general time consuming. Another drawback is that algorithms of this class may fail if the primitives cannot be reliably determined in the image pair. In particular, edge segment extraction is quite sensitive to any brightness distortion, as well as to imbalance of the vision system parameters. Furthermore, feature-based methods usually yield only sparse depth maps, which is unacceptable for the path planning task.

2. Area-based algorithms.

Assuming that the left and right images of a pair are locally similar to each other, one can find a strong correlation between small gray-level areas in the two images [5, 9]. The underlying assumption is valid for relatively textured areas; however, it may prove wrong at occlusion boundaries and within featureless regions.

Algorithms of both categories often use special methods to improve matching reliability, such as: image hierarchies [13] and pyramidal image structures [9]; a multiresolution coarse-to-fine strategy [7] to overcome match ambiguities; interpolation methods [7] to expand the depth values from the edges into the domain interior; and dynamic programming techniques [6, 1, 15] to produce a smooth depth map.

A particular navigation algorithm based on stereo vision was developed by a French research group [16]. Their algorithm for stereo reconstruction makes use of an area-based correlation process [14]. The path planning step comprises the following substeps:

1. distance map reconstruction using the geometrical parameters of the vision system;
2.
interpolation of these 3-D values to get a regular surface grid of heights (Digital Terrain Model);
3. elaboration of a navigation map of the terrain with several classes (flat, traversable, obstacle, unknown) using two local thresholds for slope and height discontinuity;
4. path generation observing a distance margin from the rover to the obstacles.

This interesting solution includes, however, the computationally expensive step (2), which is not really necessary for the motion. The last step (4) could also be simplified by minimizing the local risk for the rover.

The special issue in vision-guided navigation is the design of a relatively stable and fast algorithm for stereo reconstruction that does not need large memory resources. The accuracy of the reconstructed 3-D shape need not be high, because we only need to detect those obstacles whose vertical dimension is bigger than 30 cm. The time consumption of our stereo reconstruction algorithm is quite small due to the simple features used and the combination of image-pyramid and dynamic-programming techniques. The path is calculated in the spatial domain based on a map describing the distance to visible points, instead of using a Digital Terrain Model (heights) on the real surface grid. We emphasize visibility, because it is useless to investigate the invisible (unknown) regions in front of the rover: the rover will not enter these areas. The Dijkstra algorithm [3] is applied to minimize the local risk for the rover along the path. These steps are believed to be much faster than the other methods widely used.

This paper consists of three parts. In the first part we describe the stereo reconstruction and navigation algorithms. A brief description of the onboard computer and an analysis of the rover stereo vision system parameters are given in the second part. The experiments and processing results are presented in the third part.
The algorithm performance compared to a known hardware-based solution is given in the conclusion.

1. Navigation Algorithm

The autonomous rover progression consists of the following steps, repeated in a cycle: stereo image acquisition; stereo reconstruction in order to estimate the visible terrain; path planning that follows a principal direction and avoids the obstacles; an execution step traversing along the path.

The general principle applied to match points in the right and left images is correlation. It consists of comparing the gray-level values of the images within a small (3x3 pixel) local window centered on each point of the left image, in order to find the most similar window in the right image. The disparities (parallaxes: pixel shifts from the left to the right image) thus obtained are then used for reconstruction of the distance between the rover and the points of the terrain surface, based on the given camera geometry.

The stereo matching process [11] is implemented under the following assumptions concerning the stereo images and the surface to be reconstructed: the original images are always noisy due to geometrical and photometric distortions; the reconstructed surface is mainly smooth ("continuity constraint"); the stereo pair is taken as a perspective projection of the scene, which means the disparity values generally decrease upwards.

The search for correspondences along the horizontal lines is based on: a fast procedure to extract brightness features that are more stable with respect to brightness distortions; construction of a data pyramid for both stereo images, where the original image is considered the zero pyramid layer and each next layer is obtained from the previous one by downsampling it by a factor of 1/2; and a local correlation analysis implemented iteratively on the image pyramid. The identification process starts from the top pyramid layer and continues over the pyramid layers from top to bottom.
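The pyramid construction and the per-scanline correlation search can be sketched as follows. This is a minimal illustration, not the implemented OCCAM code: the 2x2 averaging, SAD cost, window size and search ranges are simplifying assumptions, and the stabilized brightness features and dynamic-programming smoothing of the paper are omitted.

```python
import numpy as np

def build_pyramid(img, levels=3):
    """Each next layer is the previous one downsampled by a factor of 1/2
    (here: simple 2x2 block averaging)."""
    pyr = [img.astype(np.float32)]
    for _ in range(levels):
        a = pyr[-1]
        h, w = a.shape[0] // 2 * 2, a.shape[1] // 2 * 2
        a = a[:h, :w]
        pyr.append((a[0::2, 0::2] + a[1::2, 0::2] +
                    a[0::2, 1::2] + a[1::2, 1::2]) / 4.0)
    return pyr  # pyr[0] = full resolution, pyr[-1] = coarsest layer

def match_scanline(left, right, y, max_disp, init=None, win=1):
    """Window correlation (SAD) along one scanline; 'init' holds per-pixel
    initial shifts propagated from the coarser pyramid layer, so only a
    small range around them is searched."""
    h, w = left.shape
    disp = np.zeros(w, dtype=np.int32)
    for x in range(win, w - win):
        base = init[x] if init is not None else 0
        best, best_d = np.inf, 0
        for d in range(max(0, base - 2), min(max_disp, base + 3)):
            if x - d < win:
                continue
            l = left[y - win:y + win + 1, x - win:x + win + 1]
            r = right[y - win:y + win + 1, x - d - win:x - d + win + 1]
            cost = np.abs(l - r).sum()
            if cost < best:
                best, best_d = cost, d
        disp[x] = best_d
    return disp
```

In a coarse-to-fine driver, the disparities found on one layer are doubled and passed as `init` when descending to the next, higher-resolution layer, which keeps the per-pixel search range small.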
The parallax values obtained are stored and then used as the initial shift values when passing to the higher-resolution layer. Local correlation analysis is combined with the dynamic programming method to reconstruct the relief along the scan lines at each pyramid layer. The method meets the continuity principle; it also introduces regularization to extract a smooth relief. Finally, the parallax map is recalculated to a real-scale distance map according to the geometry of the acquisition system. The value of each point on the map defines the distance from the rover to the corresponding point on the surface.

The path planning algorithm is implemented in two steps. During the first one, the obstacles are detected at the scale of the original images in the following way. The distance map is recalculated to create an elevation map with respect to the horizontal plane under the rover. The actual values of the rover inclinations (roll and pitch angles) are taken into account. The obstacles are detected on the elevation map according to the rover's locomotion capacity to get over them. The regions detected as obstacles represent the prohibited zones in the rover's field of view.

In the second step, the path from the start point to the feasible destination points is generated by applying the Dijkstra algorithm to the elevation map. To implement this step, we consider the pixels that do not belong to the obstacles as nodes of a directed graph. The start point of the path (see Fig. 1) is taken as the graph origin. The length of the graph edge connecting node i with node j is a non-negative number defined as

    W(i, j) = |H_i - H_j|

where H_i, H_j are the values of the elevation map.

Fig. 1. Path planning task: directed graph on the image field (start point, feasible target points, prohibited obstacle area, image rows Y and Y+1).
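The first path-planning step (elevation map and obstacle masking) can be sketched as follows. This is a hypothetical layout, not the flight code: the per-pixel viewing-ray array, the roll/pitch rotation convention and the exact 30-cm step limit are illustrative assumptions.

```python
import numpy as np

# Assumed locomotion limit: the paper targets obstacles taller than ~30 cm.
STEP_LIMIT_M = 0.30

def elevation_map(distance_map, pixel_dirs, roll, pitch, cam_height=1.0):
    """Convert a distance map into heights relative to the horizontal plane
    under the rover. 'pixel_dirs' is an assumed precomputed (H, W, 3) array
    of unit viewing rays per pixel in the camera frame."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])  # pitch about X
    Rz = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])  # roll about Z
    # Rotate the rays by the measured inclinations so that heights are
    # taken with respect to the true horizontal plane.
    rays = pixel_dirs @ (Rz @ Rx).T
    pts = distance_map[..., None] * rays        # 3-D points, camera origin
    return cam_height + pts[..., 2]             # height above the plane

def prohibited_mask(elev):
    """Mark pixels whose height deviation exceeds the rover's capability."""
    return np.abs(elev) > STEP_LIMIT_M
```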
As usual, the length of a particular path joining any two given vertices of the graph is defined as the sum of the weights of the edges composing the path. Because the real path must run continuously through the image field, the graph edges are connected in the following way: each node in image row Y can be connected only with the three nodes of the previous row Y+1 (see Fig. 1). Under this restriction, the number of operations for searching the shortest path from the start point to the destination points is O(N^2), where N is the image size. The number of operations required in the calculation described above is strictly less than the number of operations used in the Voronoi diagram method implemented in paper [16]. Finally, a virtual destination is selected as a target point keeping the global rover displacement within the direction defined by the mission task. The whole path is then reconstructed from the target back to the start position according to the best direction stored for each graph node.

2. Rover Stereo Vision System

The rover stereo vision system consists of two cameras and an onboard computer providing image capturing and processing facilities.

Fig. 2. Rover stereo vision system (vertical visual angle = 32 deg, camera height = 1 m, initial pitch = 10 deg; the optical axis and distances of 2 m and 11 m are marked in the original drawing).

The stereo cameras were installed on the rover in such a way as to enable analysis of the nearest rover environment (Fig. 2). The blind area extends approximately 2 m from the rover. The cameras are mounted on a vertical rack 1 m high; their inclination toward the horizon is 10 deg. A rather large stereo basis (50 cm) made it possible to process stereo pairs with a resolution of 128x128 pixels. This is enough to recognize major obstacles during the rover motion: the difference between the parallax values corresponding to the top and the bottom of a stone 30 cm high is equal to 3 pixels at a distance of 14 m from the rover's center.
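With the row-restricted connectivity just described, the Dijkstra search degenerates into a single dynamic-programming sweep over the image rows, which the following sketch illustrates (array layout and the bottom-row start are our assumptions; the edge weight is the |H_i - H_j| of the text):

```python
import numpy as np

def safest_path(elev, prohibited, start_col):
    """Shortest-path search on the row-restricted graph: a node in row y
    connects only to the three nearest nodes of row y+1, with edge weight
    |H_i - H_j|. The sweep is linear in the number of pixels."""
    h, w = elev.shape
    INF = np.inf
    cost = np.full((h, w), INF)
    back = np.zeros((h, w), dtype=np.int32)   # predecessor column per node
    cost[h - 1, start_col] = 0.0              # start row: nearest the rover
    for y in range(h - 2, -1, -1):            # sweep away from the rover
        for x in range(w):
            if prohibited[y, x]:
                continue
            for px in (x - 1, x, x + 1):      # the 3 allowed predecessors
                if 0 <= px < w and cost[y + 1, px] < INF and not prohibited[y + 1, px]:
                    c = cost[y + 1, px] + abs(elev[y, x] - elev[y + 1, px])
                    if c < cost[y, x]:
                        cost[y, x], back[y, x] = c, px
    return cost, back

def reconstruct(back, cost, target_row, target_col):
    """Walk back from a chosen (reachable) target to the start position,
    as the paper does from the virtual destination."""
    path = [(target_row, target_col)]
    y, x = target_row, target_col
    h = back.shape[0]
    while y < h - 1:
        x = back[y, x]
        y += 1
        path.append((y, x))
    return path
```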
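The quoted figure of 3 pixels can be checked with a rough occlusion-based calculation. The pixel pitch (7 mm CCD over 128 pixels) and the grazing-ray geometry below are our assumptions about the setup; under them the parallax step at the top of a 30-cm stone comes out at roughly 2-3 pixels, the same order as the value quoted.

```python
f = 0.012          # focal length, m
b = 0.50           # stereo basis, m
pix = 0.007 / 128  # assumed pixel pitch: 7 mm CCD over 128 pixels, m
cam_h = 1.0        # camera height above ground, m
stone_h = 0.30     # obstacle height, m
d_stone = 14.0     # distance to the stone, m

# The ray grazing the stone top hits the ground behind it at:
d_ground = d_stone * cam_h / (cam_h - stone_h)   # 20 m here

# Disparity in pixels at each range (d = f*b/Z, small-angle approximation);
# the parallax "step" is the jump across the stone's top edge.
disp = lambda z: f * b / z / pix
step = disp(d_stone) - disp(d_ground)
print(round(disp(d_stone), 1), round(disp(d_ground), 1), round(step, 1))
```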
In order to provide autonomy of movement, control and timing of experiments, data collection and storage, etc., the rover is equipped with an onboard computer based on the 32-bit T805 transputer from INMOS Corporation [17], which can be regarded both as a special-purpose (i.e. image processing) and a general-purpose processor. Major characteristics of the IMS-T805 are:
32-bit internal and external architecture;
30 MIPS (peak) instruction rate;
4 Kbytes of directly addressable on-chip RAM;
internal timers;
4 fast serial links (10 Mbit/sec);
less than 1 W power consumption at 30 MHz.

The heart of the onboard computer is four transputer modules, which are exact copies of each other, both electrically and even mechanically. None of them is distinguished as far as access to the peripheral blocks is concerned; but, and this is a substantial point, only two of the four transputer modules are powered at a time. Which two is determined by the actual state of the switch-over logic [2]. The computer system has access to 256 Kbytes of local memory (upgradable).

Since the stereo matching process is based on the assumption of epipolar geometry of the original stereo images, the optical axes of the cameras must be strictly parallel to each other. Let us estimate the accuracy required for the alignment of the cameras, assuming that the vertical parallax of corresponding points must stay within one pixel. Suppose that the world coordinate system X, Y, Z coincides with the left camera position (Fig. 3). Denote the camera viewing angles as the pan angle related to the Y-direction, the tilt angle related to the X-direction, and the roll angle related to the Z-direction (with Da, Db and Dg denoting the variations of these angles with respect to the parallel optical axes).
The pixel mismatch due to camera misalignment can be calculated by the following formulas:

    Dx = x' - x = Da (f + x^2/f) + Db (xy/f) + Dg y
    Dy = y' - y = Da (xy/f) + Db (f + y^2/f) + Dg x

where f is the camera focal length, (x, y) is the ideal projection and (x', y') is the real projection of the point P (Fig. 3). The focal length of the cameras installed on the rover equals 12 mm; the size of the CCD matrix is 7 mm (x, y <= 3.5 mm). As follows from the equations above, the most significant contribution to Dy (which is crucial for the matching process) comes from the variation Db of the tilt angle. If Dy is to stay within 1 pixel, then Db <= 5 arcsec.

It remains a question whether it is possible to keep the camera axes parallel with an accuracy of 5 arcsec after the rover landing. In addition, the temperature variations on the planetary surface will lead to thermal deformation of the imaging system. That is why additional calibration of the stereo vision system may be necessary after the rover landing.

Fig. 3. The scheme of the camera orientations in the stereo vision system (ideal and real projections of a point P, stereo baseline b).

One way to calibrate the system is based on matching a set of points placed on a calibration target. Such points should be placed uniformly within the viewing field of the cameras. A search for the correspondences along both the horizontal and the vertical directions is performed on Earth. Later on, the calculated map of vertical parallaxes can be used onboard to compensate the vertical (y-direction) divergence of every stereo pair taken. Such preprocessing of the original images prevents the occurrence of local area corruption in the matching process. The errors in the recalculated distance map due to uncertainty in the camera orientation are negligible with respect to the rover size.

3. Experiment and Processing Results

The Marsokhod consists of a chassis with 6 wheels, each of which is articulated so that it can be turned forwards or backwards on each side of the central joint.
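The onboard compensation step can be sketched as follows: a per-pixel map of vertical parallaxes, measured once on the calibration target, is applied to every incoming right image so that corresponding points land on the same scanline. The per-pixel y-offset layout of `vpar_map` is a hypothetical format; the paper does not specify it.

```python
import numpy as np

def compensate_vertical_parallax(right, vpar_map):
    """Resample the right image using the calibrated vertical-parallax map
    (y-offsets in pixels), removing the y-direction divergence before the
    epipolar scanline matching. Nearest-neighbour resampling for brevity."""
    h, w = right.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(ys + vpar_map).astype(int), 0, h - 1)
    return right[src_y, xs]
```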
The chassis holds onboard power and computing capabilities, and is equipped with stereo cameras mounted on a central mast. Two basic principles underlie the autonomous locomotion: on average, the rover displacement should be in the direction required by the mission tasks; and the position of the rover during the motion should be risk-free at each moment of the operation. In an unexpected situation the autonomous movement can be terminated to return the rover to remote control.

Usually the rover operates with a normal path range 10-12 m long. In an emergency while traversing the generated path, if an obstacle is detected by the rover sensors (meaning that the actual rover inclination exceeds the threshold), the rover stops and then performs a one-step rotation in the opposite direction, away from the obstacle. At the next step the system searches for a path about 3-4 m long to avoid the obstacle. If the path generation is unsuccessful, the rover rotates one step further. If a path cannot be found during a complete turn-around, guidance returns to manual handling (i.e. to remote control). Every time the path is successfully executed, the rover restarts the execution of the motion scheme from the beginning, i.e. returns to normal path range operations.

Fig. 4 shows the successive steps of the image processing used to generate a path, which is highlighted in the left image (4c). In Fig. 4b the prohibited area is shown in white; this is actually a deep pit in the ground. Areas shaded in black are safe for the rover motion. The start point of the generated path is located 2.9 m in front of the rover. Fig. 5 demonstrates the processing result at another test site. A relatively flat sandy terrain with a smaller pit in the lower right corner of the image is shown in Fig. 5a.
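The emergency behaviour above can be summarized as a small state machine. This is a schematic reading of the text, not the flight software: the callback interfaces, the 8-step turn-around and the exact range constants are illustrative assumptions (the paper gives 10-12 m normal paths and 3-4 m avoidance paths).

```python
from enum import Enum, auto

# Illustrative constants within the ranges quoted in the text.
NORMAL_RANGE_M, AVOID_RANGE_M = 11.0, 3.5

class Mode(Enum):
    NORMAL = auto()
    MANUAL = auto()

def drive_cycle(plan_path, execute, rotate_step, steps_per_turn=8):
    """One cycle of the recovery scheme: try the normal-range path; on
    failure, rotate away from the obstacle one step at a time and try
    short avoidance paths; after a full unsuccessful turn-around, fall
    back to remote control. plan_path(range_m) -> path or None;
    execute(path) -> True if the traverse completed without tripping the
    inclination sensors."""
    path = plan_path(NORMAL_RANGE_M)
    if path is not None and execute(path):
        return Mode.NORMAL                  # restart the scheme from the top
    for _ in range(steps_per_turn):
        rotate_step()                       # one-step rotation, away from obstacle
        path = plan_path(AVOID_RANGE_M)
        if path is not None and execute(path):
            return Mode.NORMAL              # obstacle avoided
    return Mode.MANUAL                      # complete turn failed: remote control
```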
The brightness of the pixels within the traversable area (5b) is proportional to the elevation values at the corresponding points of the scene. The beginning of the path shown is at a distance of 2.2 m from the rover.

Figs. 6 and 7 illustrate the capability of the navigation system to work stably under different illumination conditions. The stereo pair (Fig. 6) shows a lava field in Kamchatka close to the Tolbachik volcano. Both stereo images look gray because of the black volcanic wet dust (it was raining during the test). The stereo pair (Fig. 7) was taken during the test in the Mojave Desert (California, 1994). These images look rather different from the previous ones because of the bright sun. Correspondences are somewhat hard to detect in the over-illuminated sandy areas; nevertheless, a safe path is generated.

Conclusions

In this paper we have discussed the concept of a stereo vision system and the navigation algorithm for an autonomous planetary rover. Both the good quality of the test results and the high performance of the software implementation have demonstrated the feasibility of real-time automatic navigation. The robustness of the algorithm developed has been proved in a number of experiments covering different types of terrain (sandy and rocky scenes) along with a wide range of illumination.

All navigation software is written in the OCCAM language, taking into account the onboard computer architecture and its limitations. The navigation software has also been tested alone on a commercially available add-on transputer card for a PC. The overall processing time on such a card equals 10 sec (image resolution 128x128 pixels), versus 1.3 min for onboard processing. The difference is due to the rather small local memory installed in the onboard computer. A known hardware solution for real-time correlation-based stereo is the MD96 board, which emerged from a European ESPRIT project [4].
The MD96 board is based on eight Motorola 96002 Digital Signal Processors and has a peak processing power of 240 MFLOPS, enabling it to perform stereo reconstruction of 128x128 images in 0.9 sec. In our configuration (one transputer module, the same image resolution) the stereo reconstruction algorithm takes 8 sec. This result is comparable to the French hardware-based solution, but requires no additional costs.

References

1. Baker, H.H. & Binford, T.O. (August 1981). Depth from Edge and Intensity Based Stereo. Seventh Int. Joint Conf. on Artificial Intelligence, Vancouver, 631-636.
2. Balazs, A., Biro, J. & Szalai, S. (15-21 May 1994). Onboard computer for Mars-96 rover. Proc. 2nd Int. Symposium on Missions, Technologies and Design of Planetary Rovers. Centre National d'Etudes Spatiales (CNES, France) and Russian Space Agency (RKA), Moscow and St. Petersburg (Russia).
3. Dijkstra, E.W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1, 269-271.
4. Faugeras, O.D., et al. (1993). Real-time correlation-based stereo: algorithm, implementations and applications. The International Journal of Computer Vision.
5. Fua, P. (1993). A Parallel Stereo Algorithm that Produces Dense Depth Maps and Preserves Image Features. Machine Vision and Applications, 6(1), 35-49.
6. Gimel'farb, G.L. (1991). Intensity-based computer binocular stereo vision: signal models and algorithms. Int. J. of Imaging Systems and Technology, 3(3), 189-200.
7. Grimson, W.E.L. (1981). From Images to Surfaces: A Computational Study of the Human Early Visual System. MIT Press, Cambridge, MA.
8. Grimson, W.E.L. (Jan. 1985). Computational Experiments with a Feature Based Stereo Algorithm. IEEE Trans. on Pattern Analysis and Machine Intelligence, PAMI-7(1), 17-33.
9. Hannah, M.J. (Dec. 1989). A System for Digital Stereo Image Matching. Photogrammetric Engineering and Remote Sensing, 55(12), 1765-1770.
10. Kim, N.H. & Bovik, A.C. (1988). A Contour-Based Stereo Matching Algorithm Using Disparity Continuity.
Pattern Recognition, 21(5), 505-514.
11. Kolesnik, M.I. (1993). Fast algorithm for the stereo pair matching with parallel computation. Lecture Notes in Computer Science, 719, 533-537.
12. Medioni, G. & Nevatia, R. (1984). Matching Images Using Linear Features. IEEE Trans. on Pattern Analysis and Machine Intelligence, PAMI-6(6), 675-685.
13. Moravec, H.P. (Sept. 1980). Obstacle Avoidance and Navigation in the Real World by a Seeing Robot Rover. Ph.D. Thesis, Stanford Univ., Computer Science Dept., Report STAN-CS-80-813.
14. Nishihara, H.K. (September 1984). Practical Real-Time Imaging Stereo Matcher. Optical Engineering, 23(5).
15. Ohta, Y. & Kanade, T. (March 1985). Stereo by Intra- and Inter-Scanline Search Using Dynamic Programming. IEEE Trans. on Pattern Analysis and Machine Intelligence, PAMI-7(2).
16. Proy, C., et al. (16-22 Oct. 1993). Improving autonomy of Marsokhod 96. 44th Congress of the International Astronautical Federation, Graz, Austria.
17. Transputer Data Book (1992). 2nd Edition. London, UK: Prentice Hall.

Fig. 4a. Original stereo pair (left, right); resolution 256x256 pixels. Rover inclinations: pitch = 0, roll = 0.
Fig. 4b, 4c. Prohibited/traversable zones (4b) and the left image with the generated path (4c). The black area in Fig. 4b is suitable for the rover's motion. The start point of the path (Fig. 4c) is at a distance of 2.90 m from the rover's center; the distance of the fracture point is 4.60 m and the distance of the target point is 12.70 m.
Fig. 5a. Original stereo pair (left, right); resolution 256x256 pixels. Rover inclinations: pitch = 0, roll = 0.
Fig. 5b, 5c. Prohibited/traversable zones (5b) and the left image with the generated path (5c). The gray area in Fig. 5b, which is part of the elevation map (practically flat here), is suitable for the rover's motion. The start point of the path (Fig. 5c) is at a distance of 2.2 m from the rover's center.
The distance to the target point is 9.35 m.

Fig. 6. Original stereo pair and generated path (left + path, right); resolution 256x256 pixels.
Fig. 7. Original stereo pair and generated path (left + path, right); resolution 256x256 pixels. Rover inclinations: pitch = 0, roll = -4.