Side-scan Sonar Image Registration for AUV Navigation

P. Vandrish, A. Vardy, D. Walker, and O. A. Dobre
Memorial University of Newfoundland
Faculty of Engineering and Applied Science
St. John's, NL A1B 3X5 Canada

Abstract - As part of the Responsive Autonomous Underwater Vehicle Localization and Mapping (REALM) project at Memorial University (MUN), a side-scan sonar based qualitative navigation system is currently being developed for the AUV MUN Explorer. The qualitative navigation system depends critically on the registration of side-scan sonar images obtained at navigation time. Image registration is the process of computing a transformation which maps features observed from one perspective to the same features observed from a second perspective. However, feature matching in side-scan sonar images presents a number of challenges due to the nature of acoustic sensors, the dynamic properties of the AUV, and the perpetually changing acoustic medium. As such, feature extraction and matching techniques must be robust enough to handle a variety of image distortions simultaneously. This paper discusses some of the difficulties involved in the registration of side-scan sonar images and presents preliminary registration results on side-scan images captured by a sonar sensor affixed to a small boat executing approximately parallel tracks.

Index Terms - AUV, Navigation, Side-scan Sonar, SIFT, Image Registration

INTRODUCTION

Today, AUVs can provide an effective alternative to manned vessels for a variety of routine marine applications, such as surveying and monitoring. AUVs may be deployed for use in mine countermeasures, underwater equipment inspection, or research activities. The REALM project led by Memorial University focuses on expanding the capabilities of the Explorer AUV platform in particular, and on developing novel techniques for AUV navigation and surveying in general.
Navigation plays a very important role in enabling the autonomous operation of an AUV: it serves not only the path planning subsystem, but also ensures the safety and recovery of the vehicle. Qualitative navigation is a concept that was developed for mobile robotics. A qualitative navigation system (QNS) would provide an AUV with the ability to find its way to a home location without making reference to a global coordinate system. In other words, a QNS would use uniquely identifiable landmarks to derive a homing direction, without necessarily establishing an estimate of each landmark's geographical location. This is a significant advantage over navigation systems which attempt to keep track of landmark and vehicle locations. Navigation systems which try to maintain a pose vector become plagued with drift errors over time, and these errors cannot be corrected in the absence of geographical positioning feedback. Most navigation systems in use today are of this type [1]. They obtain a current estimate of the AUV's location and orientation through the use of inertial sensors and must periodically resurface to receive Global Positioning System (GPS) feedback.

A side-scan sonar system can provide an AUV with a framework for developing highly sophisticated navigation systems through the use of techniques borrowed from computer vision. An AUV carrying a side-scan sonar system within its payload can extract an enormous amount of information about its current surroundings in the form of sonar images. Side-scan sonar images tend to contain features such as textures, markings, and blobs. These features are usually unique enough that, over the course of an operation, they can be distinctly identified and matched if observed multiple times. If numerous matches are found within a given frame, image registration can be performed and the transformation matrix computed which maps the frame's feature set to a previously observed feature set.
The transformation matrix can then be used to directly infer the homing direction for the vehicle. Realistically, however, the procedure described here is not as straightforward as it seems. The nature of side-scan sonar sensors, the acoustic medium, and the dynamics of the vehicle introduce a variety of image distortions which degrade the effectiveness of this scheme.

Applying image processing and computer vision techniques to side-scan sonar images is not new. Automatic target recognition has been studied [2] for the purpose of identifying foreign and potentially dangerous objects on the sea floor. Texture segmentation techniques were studied in [3]. Feature extraction techniques were also developed in [4] and [5]. Image registration of side-scan sonar images through phase correlation, mutual information, and the correlation ratio is presented in [6]. Some interesting work was undertaken in [7] on the derivation of a statistical technique for the correction of geometric distortions in side-scan sonar images. Among all previous work, however, no robust and unifying technique for feature extraction and matching applied to side-scan sonar images has emerged.

This paper discusses the nature of the problem in further detail, presents the Scale Invariant Feature Transform (SIFT) as a potentially robust technique for feature extraction and matching, and examines an image registration technique based on RANdom SAmple Consensus (RANSAC). A preliminary evaluation of the performance of this combination of operations is presented. Section I introduces some of the mathematics behind side-scan sonar and the associated distortions. Section II follows with a brief look at what makes a good image feature and the reasons why we chose SIFT for our preliminary experiments. Section III discusses the use of RANSAC to estimate transformation model parameters.
Section IV applies the techniques presented here to sample side-scan sonar data obtained off the coast of Foxtrap, NL. In conclusion, we take a step back and review the challenges which need to be overcome in the near future to produce a highly robust image registration technique.

I. SIDE-SCAN SONAR

Side-scan sonar systems produce high resolution image representations (or sonograms) of acoustic backscatter. They operate by emitting a narrow (in the fore-aft direction) acoustic pulse perpendicular to the along-track direction vector and out each side of the sonar. The pulse spreads out over a wide angle in the across-track plane to ensonify the seabed, as shown in Figure 1.

Figure 1. A typical side-scan sonar depiction, showing the along-track direction, slant-range, and swath.

The distance from the sonar to a point on the seafloor is called the slant-range. A typical side-scan sonar image which has not been slant-range corrected is shown in Figure 2. The width along the seafloor covered by the acoustic pulse (ping) is referred to as the swath width. In the unprocessed received signal, regions along the swath which produce strong backscatter show up darker within the image scanline. Protruding objects on the seabed produce dark regions followed by a light shadow.

Figure 2. A side-scan sonar image which has not been corrected for slant-range distortion.

As mentioned above, numerous factors impact the accuracy of image interpretation. They include changes to the velocity, orientation (roll, pitch, and yaw), and sway of the vehicle, acoustic noise, the beam pattern, and the absorptivity and structure of the reflecting surfaces, for example. As a ping propagates away from the sonar, the beam width in the along-track direction expands. This has the effect of decreased along-track image resolution at points sufficiently far from the sonar.
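As a concrete illustration of the flat-seafloor slant-range correction mentioned above, the following sketch projects each slant-range sample of a scanline onto a uniform ground-range grid. The function names, the nearest-neighbour resampling, and the assumption of a known, constant sonar altitude are our own simplifications, not the correction procedure of any particular sonar system:

```python
import math

def ground_range(slant_range, altitude):
    """Project a slant-range sample onto the seafloor, assuming a flat
    bottom and a known sonar altitude."""
    if slant_range < altitude:
        return 0.0  # sample lies in the water column; no bottom return yet
    return math.sqrt(slant_range**2 - altitude**2)

def slant_range_correct(scanline, altitude, sample_spacing):
    """Resample one scanline of per-sample intensities onto a uniform
    ground-range grid using nearest-neighbour lookup."""
    n = len(scanline)
    max_ground = ground_range((n - 1) * sample_spacing, altitude)
    out = []
    for i in range(n):
        g = i / max(n - 1, 1) * max_ground      # target ground range
        r = math.sqrt(g**2 + altitude**2)       # corresponding slant range
        j = min(int(round(r / sample_spacing)), n - 1)
        out.append(scanline[j])
    return out
```

In practice the altitude is typically estimated per ping from the first bottom return rather than assumed constant.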
Multiple collinear landmarks running parallel to the along-track direction at long range from the sonar could appear merged together in the final image. One might think that this problem can be sidestepped by simply accepting data only from within a shorter range. However, a similar problem affects the across-track resolution: depending on the pulse duration, multiple collinear landmarks along the scanline closer to the sonar could appear merged in the final image. This is because the pulse's "footprint" on the seabed is longer nearer the sonar and becomes shorter as the pulse propagates outward. Hence, there is a tradeoff between the resolutions in the across-track and along-track directions. This, as will be illustrated shortly, is one challenge which could affect the reliability of image feature descriptions.

Acoustic noise can be modeled as a random process. Some types of noise, such as white noise, can be easily reduced through the use of filters in a preprocessing stage. Side-scan sonar image preprocessing is beyond the scope of this paper.

In terms of backscatter, large protruding objects cast a light "shadow" adjacent to the object, since no acoustic energy is capable of reaching this region. The horizontal length of this shadow varies depending on where the object lies in the across-track direction relative to the sonar. The shadow may also appear differently depending on the vantage point of the sonar. Thus, the shadow may be a misleading indication of the height and shape of the object. In addition, areas of seabed which produce a very high specular reflection could be mistakenly interpreted as a shadow. It is therefore necessary to avoid deriving features which may have arisen from these shadow regions, as they may be unstable. However, the protruding object itself may provide a very useful feature on account of its shape.
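The resolution tradeoff described above can be quantified with standard first-order approximations (these formulas and names are textbook approximations of our own choosing, not parameters of the sonar used in this work): the along-track footprint grows roughly linearly with range, while the across-track ground footprint of the pulse is largest near nadir.

```python
import math

SOUND_SPEED = 1500.0  # m/s, nominal speed of sound in seawater (assumed)

def along_track_footprint(slant_range, beamwidth_rad):
    """Along-track extent of the beam on the seafloor; grows linearly
    with range (small-angle approximation)."""
    return slant_range * beamwidth_rad

def across_track_footprint(ground_range, altitude, pulse_s):
    """Horizontal extent of the pulse footprint on a flat seafloor.
    The slant-range resolution c*tau/2 projects onto the bottom with a
    factor r/x, so the footprint is longest near nadir (small x).
    ground_range must be > 0."""
    r = math.sqrt(ground_range**2 + altitude**2)
    return (SOUND_SPEED * pulse_s / 2.0) * (r / ground_range)
```

For a 0.1 ms pulse and 10 m altitude, the across-track footprint shrinks from roughly 0.17 m at 5 m ground range to near the 0.075 m slant-range limit at 100 m, while the along-track footprint grows by an order of magnitude over the same span, illustrating the tradeoff.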
Another aspect which influences the amount of backscatter seen at the sonar is the structure and absorptivity of the seabed. Regions which are coarse tend to reflect more energy back to the acoustic sensor and therefore show up darker. Areas which absorb much of the energy show up lighter in intensity. These effects depend on the seabed material, the degree of coarseness, the frequency of the transmitted pulse, and the angle of incidence, for instance. One consequence is that these regions show up as texture-like features in the resulting image. Whether such textures can provide highly stable features will be investigated further.

Setting slant-range distortion aside, geometric image distortions can also arise on account of the dynamics of the vehicle. Most sonar transducers, whether contained within an AUV's payload or part of a towfish, will usually undergo changes in orientation, velocity, and sway. Some of these dynamics are more pronounced in a towfish configuration, since surge and heave of the towing ship have a large impact on the towfish. For simplification, in this work we consider the AUV to be constrained to 3 degrees of freedom: translatory motion in the x-y directions and rotation within a 2D plane parallel to the surface at a constant depth (i.e., yaw). It is important for the vehicle to maintain a constant forward velocity, since changes in velocity can make a target on the seabed appear differently across multiple passes. If changes do occur, it is reasonable to assume that the along-track image lines can be resampled and interpolated using information from an onboard inertial navigation system. Changes in the direction of the vehicle can also impact the ability of a qualitative navigation system to correctly extract and match observed features. An example image resulting from orientation changes to a towfish is shown in Figure 3.
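Under this 3-DOF model, the mapping between two observations of the same scene is a 2D rigid transform (one rotation, two translations), which two point correspondences fully determine. A minimal sketch, with helper names of our own:

```python
import math

def rigid_transform_from_two_pairs(p1, p2, q1, q2):
    """Recover the yaw angle and x-y translation mapping points p1, p2
    to their matches q1, q2, under a rotation-plus-translation model."""
    # Rotation: difference between the orientations of the two segments.
    theta = (math.atan2(q2[1] - q1[1], q2[0] - q1[0])
             - math.atan2(p2[1] - p1[1], p2[0] - p1[0]))
    c, s = math.cos(theta), math.sin(theta)
    # Translation: whatever remains after rotating p1 onto q1.
    tx = q1[0] - (c * p1[0] - s * p1[1])
    ty = q1[1] - (s * p1[0] + c * p1[1])
    return theta, tx, ty

def apply_rigid(point, theta, tx, ty):
    """Apply the recovered rotation and translation to a point."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * point[0] - s * point[1] + tx,
            s * point[0] + c * point[1] + ty)
```

With noisy keypoint matches, many such two-pair hypotheses are generated and vetted; this is the role RANSAC plays later in the paper.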
In order to compensate for this effect, the scanlines must be reoriented within the resulting image. The method to accomplish this accurately will continue to be explored as part of this project.

Figure 3. A side-scan sonar image exhibiting geometric distortion due to orientation changes.

II. FEATURE EXTRACTION AND MATCHING

Robust image features have been a subject of ongoing computer vision research for decades. Image feature extraction and matching has applications in camera motion estimation, 3D scene reconstruction, image mosaicing, and optic flow calculation, for example. In particular, visual homing algorithms have been developed [8] to enable autonomous robots to find their way to a home location using a current reference image and a previously observed image alone; that is, without making reference to a global coordinate system.

A good feature is regarded as being robust to affine or homographic transformations of the image or scene objects. It is in part the description of the feature which must also be invariant to these transformations if the feature is to be considered stable. In general, a good feature within an image is a point at the center of an area of high spatial information. There are many different methods for obtaining these image keypoints. Some classical methods include the Moravec corner detector [9], the Harris corner detector [10], the Kanade-Lucas-Tomasi interest point detector [11], and the Laplacian of Gaussian (LoG) feature detector. The LoG will be discussed in more detail as it is used by SIFT.

SIFT is a widely used computer vision algorithm developed by David Lowe at the University of British Columbia [12]. It was initially chosen for this work because it has a proven track record in image stitching applications, and many efficient implementations of the algorithm are available. SIFT extracts image keypoints which are rotation and scale invariant and robust to a range of affine transformations. SIFT consists of the following processing stages: scale-space extrema detection, keypoint localization, orientation assignment, and keypoint description. In this stage of the project, the performance of SIFT applied to side-scan sonar images is being investigated; some preliminary results are presented in this paper.

The scale-space function is defined by

L(x, y, σ) = G(x, y, σ) * I(x, y),

where G is a Gaussian kernel of standard deviation σ, I is the original image, (x, y) are the coordinates of an image point, and * denotes convolution. The Difference of Gaussian (DoG) function closely approximates the scale-normalized LoG function and, as such, can be used to identify interest points within the image. The DoG function can be written as

D(x, y, σ) = L(x, y, kσ) - L(x, y, σ),

where k is some empirically determined constant. It has been shown [13] that the maxima and minima of the LoG provide highly stable image features when compared to other keypoint extraction methods. A search is performed over the image at multiple scales to identify these extrema. Lowe has also empirically determined the choices of σ which produce the most reliable keypoints. Once a keypoint is identified, its location is refined by fitting a 3D quadratic to the DoG in the neighbourhood of the keypoint using a Taylor expansion. The assignment of an orientation to the keypoint is necessary to ensure rotation invariance. This is accomplished by measuring the magnitude and orientation of the intensity gradient at the keypoint location. Using the scale, location, and orientation information for the keypoint, a description is then produced. The orientation is used to assign a local coordinate system to the keypoint. The gradients at points within a region surrounding the origin of this coordinate system are sampled, weighted, and stored in a 128-element descriptor vector. Once a keypoint descriptor is produced, it can be used to match the keypoint to a previously observed keypoint stored in a database.
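The scale-space and DoG constructions described above can be illustrated directly. The following is a minimal pure-Python sketch with function names of our own — not SIFT's full octave-pyramid implementation:

```python
import math

def gaussian_kernel1d(sigma):
    """Normalized 1D Gaussian kernel truncated at roughly 3 sigma."""
    radius = max(1, int(3 * sigma))
    k = [math.exp(-(i * i) / (2.0 * sigma * sigma))
         for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def convolve1d(row, kernel):
    """1D convolution with clamp-to-edge border handling."""
    r = len(kernel) // 2
    n = len(row)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - r, 0), n - 1)
            acc += w * row[idx]
        out.append(acc)
    return out

def gaussian_blur(img, sigma):
    """Separable 2D Gaussian blur: blur rows, then columns."""
    k = gaussian_kernel1d(sigma)
    rows = [convolve1d(row, k) for row in img]
    cols = [convolve1d(list(col), k) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def difference_of_gaussians(img, sigma, k=1.6):
    """D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)."""
    a = gaussian_blur(img, sigma)
    b = gaussian_blur(img, k * sigma)
    return [[bb - aa for aa, bb in zip(ra, rb)] for ra, rb in zip(a, b)]
```

Applied to an isolated bright point, the DoG responds with a strong negative extremum at the point's location, which is exactly the blob-like response the scale-space extrema search exploits.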
A common way to do this is to perform a nearest-neighbour search over the set of keypoint descriptors in the database. Various heuristics can be used to improve matching effectiveness.

III. IMAGE REGISTRATION

Image registration is a procedure which transforms images viewed from different perspectives into a single coordinate system. In our case, we wish to obtain the transformation matrix which maps a currently viewed side-scan sonar image to one which was previously seen and stored in a database. Since we assume that the AUV is constrained to transformations in 3 degrees of freedom, the image transformation model is kept simple. Using keypoint matches, our objective is to compute a transformation consisting of two translation components and a single rotation component which maps a currently observed feature set to a subset of the feature database. Two correctly matched keypoint pairs are required to completely resolve the transformation parameters.

The RANSAC algorithm was first proposed by Fischler and Bolles [14]. It is a parameter estimation algorithm capable of handling a large proportion of outliers. The RANSAC algorithm proceeds as follows:

RANSAC Algorithm
1) Among all matches, randomly select the minimum number of matched pairs required to compute the transformation model.
2) Compute the transformation model using these matched pairs.
3) Count the number of matched pairs, other than those selected for the model, which agree with the model to within a certain tolerance ε.
4) If the ratio of correctly matched pairs (according to the previously computed model) to the total number of matched pairs exceeds a certain threshold τ, then recompute the model considering all correctly matched pairs and terminate the algorithm.
5) If the ratio from step 4 is below the threshold, repeat steps 1-4.
6) Reiterate the algorithm up to a maximum number of times, chosen to ensure a specified probability of correctly identifying the model parameters.

The final parameters tell us how to transform our currently observed image to the same image section we had observed previously. They can be used to infer the direction and magnitude of a homing vector for the AUV. Thus, navigation may be possible without mapping the observed features and localizing the AUV relative to a global coordinate system.

IV. RESULTS

The results presented here are preliminary. They are used to illustrate the potential of these techniques when applied to AUV navigation. Figure 5 shows two independent images stacked vertically for illustrative purposes. These images were obtained during field work in Foxtrap, NL using a side-scan towfish. They correspond to two independent passes over the same region of seabed.

Figure 5. Feature matching of images from two independent passes over a region of seabed in Foxtrap, NL.

It appears, in the lower image of Figure 5, that the predominant structural features have been compressed along the vertical axis. This may be due to velocity or pitch changes experienced by the towfish. Due to the robustness of the SIFT descriptor, a few matches were identified regardless of the distortions present. Using RANSAC to compute the transformation parameters indicates that the two images are roughly 180 degrees apart and separated by a displacement of magnitude 60.5 pixels. For comparison, Figure 6 shows the upper image of Figure 5 transformed according to the computed parameters. It can be observed that the two images appear quite similar in orientation, displacement aside.

Figure 6. Resulting image (right) after applying the computed transformation.

Numerous other sets of images from the same field data have been tested against the SIFT-based image registration technique presented here. The results appear promising for image sets exhibiting a tolerable level of distortion. The future objectives of this project involve improving the robustness of this method to handle a wider variety of image distortion.

CONCLUSIONS

AUVs are now being deployed more frequently for research, monitoring, and surveying applications. A qualitative navigation system based on the registration of side-scan sonar images from a sensor affixed to the AUV may provide a failsafe homing system for the vehicle. In this paper, a registration technique based on SIFT and RANSAC has been presented. The challenges faced by the system appear as geometric distortions within the images, occurring mainly on account of the dynamics of the vehicle. Preliminary results obtained by applying this registration technique to various side-scan sonar image sets appear positive. However, in future work, methods to improve this technique will be investigated. Some indicators for the future direction of this project have been emphasized throughout this paper. In general, they are concerned with identifying characteristics of stable side-scan sonar image features. In turn, the registration of these images can be improved by carefully taking these characteristics into account.

REFERENCES

[1] J. J. Leonard, A. A. Bennett, C. M. Smith, and H. J. S. Feder, "Autonomous underwater vehicle navigation," Dept. Ocean Eng., MIT Marine Robot. Lab., Cambridge, MA, Tech. Memorandum, Jan. 1998.
[2] G. J. Dobeck, J. C. Hyland, and L. Smedley, "Automated detection/classification of sea mines in sonar imagery," Proc. SPIE, vol. 3079, pp. 90-110, 1997.
[3] M. Lianantonakis and Y. R. Petillot, "Sidescan sonar segmentation using active contours and level set methods," IEEE Oceans '05 Europe, Brest, France, 20-23 June 2005.
[4] S. Daniel, F. Le Léannec, C. Roux, B. Solaiman, and E. P. Maillard, "Side-scan sonar image matching," IEEE J. Oceanic Eng., vol. 23, pp. 245-259, July 1998.
[5] S. Stalder, H. Bleuler, and T. Ura, "Terrain-based navigation for underwater vehicles using side-scan sonar images," IEEE Oceans '08, pp. 1-3, Sept. 2008.
[6] C. Chailloux and B. Zerr, "Non symbolic methods to register sonar images," IEEE Oceans '05 Europe, Brest, France, 2005.
[7] D. T. Cobra, A. V. Oppenheim, and J. S. Jaffe, "Geometric distortions in side-scan sonar images: A procedure for their estimation and correction," IEEE J. Oceanic Eng., vol. 17, no. 3, pp. 252-268, July 1992.
[8] D. Churchill and A. Vardy, "Homing in scale space," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1307-1312, 2008.
[9] H. Moravec, "Obstacle avoidance and navigation in the real world by a seeing robot rover," Tech. Rep. CMU-RI-TR-3, Carnegie Mellon University, Robotics Institute, Sept. 1980.
[10] C. Harris and M. Stephens, "A combined corner and edge detector," in Fourth Alvey Vision Conference, Manchester, UK, pp. 147-151, 1988.
[11] C. Tomasi and T. Kanade, "Detection and tracking of point features," Carnegie Mellon Univ., Pittsburgh, PA, Tech. Rep. CMU-CS-91-132, Apr. 1991.
[12] D. Lowe, "Distinctive image features from scale-invariant keypoints," Int'l J. Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[13] K. Mikolajczyk, "Detection of local features invariant to affine transformations," Ph.D. thesis, Institut National Polytechnique de Grenoble, France, 2002.
[14] M. A. Fischler and R. C. Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography," Commun. ACM, vol. 24, pp. 381-395, June 1981.