Map-Enhanced UAV Image Sequence Registration

Yuping Lin, Qian Yu, Gerard Medioni
Computer Science Department, University of Southern California, Los Angeles, CA 90089-0781
{yupingli, qianyu, medioni}@usc.edu

Abstract

Registering consecutive images from an airborne sensor into a mosaic is an essential tool for image analysts. Strictly local methods tend to accumulate errors, resulting in distortion. We propose here to use a reference image (such as a high resolution map image) to overcome this limitation. In our approach, we register a frame in an image sequence to the map using both frame-to-frame registration and frame-to-map registration iteratively. In frame-to-frame registration, a frame is registered to its previous frame. With its previous frame having been registered to the map in the previous iteration, we can derive an estimated transformation from the frame to the map. In frame-to-map registration, we warp the frame to the map by this transformation to compensate for the scale and rotation difference, and then perform an area-based matching using mutual information to find correspondences between this warped frame and the map. From these correspondences, we derive a transformation that further registers the warped frame to the map. With this two-step registration, the errors between consecutive frames do not accumulate. We present results on real image sequences from a hot air balloon.

1. Introduction

Geo-registration is a very useful application: it can be widely used by a UAV (Unmanned Aerial Vehicle) to navigate, to geo-locate a target, or even to refine a map. Feature-based registration [1][5] has made good progress in recent years. Based on image registration, mosaicing of image sequences can be done by computing the transformations between consecutive frames. To take the accumulated error into account, bundle adjustment [6] is usually employed as a global error minimization approach. However, for long sequences with thousands of frames, bundle adjustment is not feasible in terms of computation. Moreover, offline bundle adjustment is not appropriate for many tasks.

To perform image mosaicing in a progressive manner while still preserving accuracy, we propose to use an associated map image as a global reference. A two-step procedure is applied to register a UAV image sequence to the global map. In the first step, we register consecutive frames by estimating the best homography to align the feature points in each frame. Using the homography obtained from the first step, we roughly align the UAV image with the global map. The first step thus provides an initialization that compensates for the scale and rotation difference between the UAV image and the map. In the second step, we register the roughly aligned UAV image to the map. A similar scenario has been presented in [8].

In area-based matching, MSE [12] or normalized correlation [13] is used to determine correspondences between the UAV image and the reference image. However, the UAV images are captured at different times and from different views with respect to the satellite image. The color, illumination, and the dynamic content (such as vehicles, trees, shadows, and so on) can be very different, and MSE or normalized correlation is not robust enough in such cases. We propose an approach that applies mutual information [4] to establish correspondences.
Mutual information has been successfully applied to establishing correspondences between images of different modalities, especially in medical image processing. Our experiments show that mutual information does provide strong enough correspondences after roughly compensating for scale and rotation. Given the correspondences between the roughly aligned UAV image and the map, we derive a homography that further registers the roughly aligned UAV image to the map. By chaining this homography with the initial homography from the first step, we can register the UAV images to the map without accumulating registration errors.

This paper is organized as follows. In section 2, we formulate our problem and define our notation. In section 3, we present the two-step procedure for geo-registration. In section 4, we compare our geo-registration results with and without refinement; experiments show that the refinement procedure significantly reduces the accumulated error. Discussion and future work are presented in section 5.

2. Problem Formulation and Issues

We start by defining the symbols used in this paper. We are given a sequence of UAV images I_0, I_1, ..., I_n, and a map (usually a satellite image) M. We assume the scene depth is small with respect to the distance from the UAV camera, so the transformation between two UAV images can be represented by a homography. The transformation between a UAV image and the map is also represented as a homography. Let H_{i,j} denote the homography from I_i to I_j, and H_{i,M} the homography from I_i to M, namely H_{i,j} I_i = I_j and H_{i,M} I_i = M_i, where M_i is the image region that I_i projects to in M. Note that H_{j,i} = H_{i,j}^{-1}. Our goal is to derive accurate estimates of H_{0,M}, ..., H_{n,M} so that I_0, ..., I_n are registered to M and form a mosaic without distortion (Figure 1).

Figure 1. For each I_i, we derive H_{i,M} so that all frames register to the map M and form a seamless mosaic.

However, the map and the images are taken at different times, from different sensors, from different viewpoints, and may have different dynamic content (such as vehicles or shadows). As a result, it is difficult to simply match each incoming image to the map. Instead, we build a partial local mosaic, then register it to the map in an iterative manner.

3. Approach

Figure 2 illustrates the flow chart of our approach. Each frame I_i in the UAV image sequence is first registered to the previous frame to derive H_{i,i−1}. In the second step, we estimate H_{i,M} as H_{i−1,M} H_{i,i−1}, denoted Ĥ_{i,M}. This estimated homography warps I_i to a partial local mosaic M̂_i in the map, namely M̂_i = Ĥ_{i,M} I_i. Then we register M̂_i to the map at M_i and derive H′, namely M_i = H′ M̂_i. Finally, the actual homography H_{i,M} that registers I_i to M_i on the map is derived as H_{i,M} = H′ Ĥ_{i,M}.

Figure 2. Flow chart of our approach.

In the following sections, we first describe the method we use to register I_i to the previous image I_{i−1}. Then we introduce our method to fine-tune Ĥ_{i,M} so that I_i is mapped to M more accurately and the registration error does not accumulate along the registration process.
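To make the two-step update concrete, the following is a minimal Python sketch of the per-frame loop of Figure 2. It assumes two hypothetical callables not given in the paper, register_consecutive (section 3.1) and refine_to_map (section 3.2), each returning a 3×3 homography as a NumPy array.

    import numpy as np

    def mosaic_sequence(frames, map_img, H0_M, register_consecutive, refine_to_map):
        """Register every frame of a UAV sequence to the map M.

        H0_M registers frames[0] to M (given manually in the paper);
        register_consecutive and refine_to_map are hypothetical callables
        standing in for the steps of sections 3.1 and 3.2.
        """
        homographies = [np.asarray(H0_M)]
        for i in range(1, len(frames)):
            # Step 1: frame-to-frame registration, H_{i,i-1}
            H_i_prev = register_consecutive(frames[i], frames[i - 1])
            # Step 2: chain to get the estimate Hhat_{i,M} = H_{i-1,M} H_{i,i-1}
            Hhat_i_M = homographies[-1] @ H_i_prev
            # Step 3: register the warped frame to the map, deriving H'
            H_prime = refine_to_map(frames[i], map_img, Hhat_i_M)
            # Step 4: the actual frame-to-map homography, H_{i,M} = H' Hhat_{i,M}
            homographies.append(H_prime @ Hhat_i_M)
        return homographies

In the paper the refinement (steps 3 and 4) is only run every 50 frames because of the cost of the mutual information search; the sketch refines every frame to keep the loop short.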
3.1. Registration of Consecutive Images

To compute H_{i,i−1}, we match features and then perform RANSAC [3] outlier filtering. After trying many kinds of features, we selected SIFT (Scale Invariant Feature Transform) [1] features. SIFT features are invariant to image scale and rotation, and provide robust descriptions across changes in 3D viewpoint.

In the feature matching step, we use nearest neighbor matching [2]. Since the translation and rotation of the UAV camera between consecutive frames are small, we can assume that matched features lie within a small window, which adds one more constraint to feature matching. Usually, at a resolution of 720 × 480, we can generate 2000 correspondence pairs. Finally, we use RANSAC to filter outliers among the set of correspondences (with an inlier tolerance of 1 pixel) and derive H_{i,i−1}.

Having H_{i,i−1} and H_{0,M}, we can roughly register the UAV image to the map by estimating H_{i,M} as:

    Ĥ_{i,M} = H_{i−1,M} H_{i,i−1} = H_{0,M} ∏_{k=1}^{i} H_{k,k−1}    (1)

This shows that if there is a subtle transformation error in each H_{k,k−1}, these errors are multiplied and result in a significant error, which means that later UAV images could be registered to a very wrong area on the map. As shown in Figure 3, the registration is not perfect. Thus, we need to find a way to establish correspondences between the UAV image and the map, and to refine the homography using these correspondences.

Figure 3. Initial registration between the UAV images and the map.
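A minimal OpenCV sketch of this consecutive-frame step is given below. The 1-pixel RANSAC reprojection tolerance is the paper's; the Lowe ratio of 0.8 and the 30-pixel spatial window are illustrative stand-ins for values the paper does not specify, and BGR input frames are assumed.

    import cv2
    import numpy as np

    def register_consecutive(img_i, img_prev, window=30, ratio=0.8):
        """Estimate H_{i,i-1}: the homography mapping frame I_i onto I_{i-1}."""
        sift = cv2.SIFT_create()
        kp_i, des_i = sift.detectAndCompute(cv2.cvtColor(img_i, cv2.COLOR_BGR2GRAY), None)
        kp_p, des_p = sift.detectAndCompute(cv2.cvtColor(img_prev, cv2.COLOR_BGR2GRAY), None)

        # Nearest-neighbor matching [2] with Lowe's ratio test
        matcher = cv2.BFMatcher(cv2.NORM_L2)
        pairs = []
        for m, n in matcher.knnMatch(des_i, des_p, k=2):
            if m.distance < ratio * n.distance:
                p_i = np.array(kp_i[m.queryIdx].pt)
                p_p = np.array(kp_p[m.trainIdx].pt)
                # Small camera motion between frames: keep only matches
                # whose positions lie within a small window of each other
                if np.linalg.norm(p_i - p_p) < window:
                    pairs.append((p_i, p_p))

        src = np.float32([p for p, _ in pairs]).reshape(-1, 1, 2)
        dst = np.float32([q for _, q in pairs]).reshape(-1, 1, 2)
        # RANSAC with a 1-pixel inlier tolerance, as in the paper
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=1.0)
        return H

The same cv2.findHomography call, applied to the filtered UAV-to-map correspondences of section 3.2, would serve to derive H′ there as well.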
3.2. UAV to Map Registration

Registering an aerial image to a map is a challenging problem [10][11]. Due to significant differences in lighting conditions, resolution, and 3D viewpoint between the UAV image and the map, the same point may yield quite different SIFT descriptors in each, so poor feature matching and poor registration can be expected. Since it is difficult to register a UAV image to the map directly, we make use of H_{i,i−1} derived from UAV-to-UAV registration, estimate H_{i,M} as Ĥ_{i,M} = H_{i−1,M} H_{i,i−1}, and then fine-tune it. Let M̂_i denote the warped image of I_i by Ĥ_{i,M} (Figure 2, Step 2). Our goal is to derive a homography H′ that registers M̂_i to the map at M_i (Figure 2, Step 3), so that the image is accurately aligned to the map. The advantage of this approach is that with M̂_i roughly aligned to the map, we can perform a local search for correspondences at the same scale. Therefore the ambiguity of matching and the computation time are far less than when directly registering I_i to the map.

3.2.1. Finding Correspondences between UAV Image and Map

To derive H′, we try to find correspondences between M̂_i and the map area that M̂_i spans. However, M̂_i is usually a smaller region than I_i (the map has lower resolution), which means M̂_i preserves less information than I_i. Hence we proceed in the reverse direction. As shown in Figure 4, let U_i be the map image transformed back from the area that M̂_i spans, using Ĥ_{M,i}. Instead of finding correspondences between M̂_i and the map area that M̂_i spans, we find correspondences between I_i and U_i.

Figure 4. U_i denotes the map image transformed back from the region that M̂_i spans, using Ĥ_{M,i}. P_I and P_U are points located at the same coordinates in I_i and U_i respectively. S_{P_I} and S_{P_U} are two image patches of the same size centered at P_I and P_U respectively, where P_I′ is the point corresponding to P_U.

Let P_I and P_U be points located at the same coordinates in I_i and U_i respectively. With a good enough Ĥ_{i,M}, P_U should have its correspondence P_I′ in I_i close to P_I. P_I′ is determined as the point whose UAV image patch is most similar to the map image patch centered at P_U.

We use mutual information [4] as the similarity measure. The mutual information of two random variables measures the dependence between them. Taking two images of the same size as the random variables, it measures how much information the two images share, or how much one image tells us about the other. It is a more meaningful criterion than measures such as cross-correlation or grey value differences. Let S_{P_i} and S_{P_j} be two image patches of the same size centered at points P_i and P_j respectively, and let MI(S_{P_i}, S_{P_j}) be their mutual information. We find P_I′ by looking for the pixel P_i in P_I's neighborhood that yields the greatest MI(S_{P_U}, S_{P_i}).

Figure 5. The correspondences in the UAV image (a) with respect to the feature points in the map image (b). Blue dots and red dots represent good and poor correspondences respectively.

Figure 6. The correspondences in the UAV image (a) with respect to the feature points in the map image (b). Green dots and orange dots represent RANSAC inliers and outliers respectively.

3.2.2. Defining Good Correspondences

It may happen that "all" or "none" of the image patches centered on the pixels in P_I's neighborhood are similar to the image patch centered on P_U. In either case, the maximum mutual information is meaningless, since the mutual information at other locations could be only slightly smaller. We need to filter out these unreliable correspondences so that the derived homography is accurate. Let P_k be the pixel in P_I's neighborhood with the smallest mutual information value. We consider a correspondence good when MI(S_{P_U}, S_{P_I′}) is significantly larger than MI(S_{P_U}, S_{P_k}); we use MI(S_{P_U}, S_{P_I′}) > 2 MI(S_{P_U}, S_{P_k}). Intuitively, this means that the image patch S_{P_I′} must be significantly more similar to S_{P_U} than S_{P_k} is. Figure 5 shows the results of extracting good correspondences: blue dots and red dots represent good and poor correspondences respectively.

We can generate as many correspondences as we want by performing this operation on feature points in U_i. Here we use the Harris corner detector [5] to extract features instead of SIFT, because our purpose is only to obtain the locations of interest points in U_i; the Harris corner detector satisfies this need and is computationally cheaper than SIFT. Once we have enough correspondences, RANSAC is performed to filter outliers, and H′ is derived. As shown in Figure 6, the colored dots in 6(b) are feature points extracted by the Harris corner detector, the colored dots in 6(a) are their correspondences, and the green dots are the RANSAC inliers used to derive H′. Finally, H_{i,M} is derived as H_{i,M} = H′ Ĥ_{i,M}, and I_i is registered to the map at M_i (Figure 2, Step 4).
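The following is a minimal, unoptimized Python sketch of the MI-based correspondence search with the good-correspondence test. The 100 × 100 patch and 40 × 40 search window come from section 4; the 32-bin joint-histogram MI estimator, the 8-bit greyscale inputs, and the assumption that sample points lie far enough from the image border are assumptions of the sketch, not details from the paper.

    import numpy as np

    def mutual_info(patch_a, patch_b, bins=32):
        """Histogram-based MI estimate between two equal-size 8-bit grey patches."""
        joint, _, _ = np.histogram2d(patch_a.ravel(), patch_b.ravel(),
                                     bins=bins, range=[[0, 256], [0, 256]])
        pxy = joint / joint.sum()
        px = pxy.sum(axis=1, keepdims=True)      # marginal of patch_a
        py = pxy.sum(axis=0, keepdims=True)      # marginal of patch_b
        nz = pxy > 0                             # avoid log(0)
        return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

    def patch(img, y, x, half):
        # Assumes (y, x) is at least `half` pixels away from every border
        return img[y - half:y + half, x - half:x + half]

    def find_correspondence(I_i, U_i, pt, patch_size=100, search=40):
        """Search P_I's neighborhood in I_i for the P_I' maximizing MI with S_{P_U}.
        Returns (P_I', accepted), where accepted applies the MI_max > 2 * MI_min test."""
        y, x = pt
        half, s = patch_size // 2, search // 2
        S_PU = patch(U_i, y, x, half)
        scores = {}
        for dy in range(-s, s + 1):
            for dx in range(-s, s + 1):
                scores[(y + dy, x + dx)] = mutual_info(S_PU, patch(I_i, y + dy, x + dx, half))
        best = max(scores, key=scores.get)
        # Good correspondence: the maximum MI must clearly dominate the weakest
        # MI in the window, otherwise the peak is uninformative (section 3.2.2)
        accepted = scores[best] > 2.0 * min(scores.values())
        return best, accepted

The exhaustive search evaluates MI at roughly 1,600 offsets per sample point, which makes it plain why the authors run the UAV-to-map refinement only every 50 frames.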
4. Experimental Results

We show results on two data sets. The UAV image sequences are provided with latitude and longitude information. The satellite images are acquired from Google Earth. The size of each UAV image is 720 × 480. We manually register the first frame of each UAV sequence to its corresponding satellite image; that is, H_{0,M} is given. In each UAV-to-map registration step, we select 200 Harris corners in the UAV image as samples, requiring the distance between any two features to be at least 10 pixels. For each sample, an image patch of size 100 × 100 is used to compute the mutual information, and the neighborhood region where we search for the best match is a window of size 40 × 40. We found that a window size of 100 × 100 is appropriate for a discriminative local feature in our UAV image registration. Since the mutual information computation is very costly, we only perform a UAV-to-map registration every 50 frames.

The results of case 1 without and with UAV-to-map registration are shown in 7(a) and 7(b) respectively; the results of case 2 without and with UAV-to-map registration are shown in 7(c) and 7(d) respectively. Table 1 compares registration with and without UAV-to-map registration for the two examples.

Table 1. Experimental results of the two examples.

5. Discussion and Future Work

We have proposed a new method to improve the accuracy of mosaicing. An additional map image is provided as a global reference to prevent accumulated error in the mosaic. We use mutual information as a similarity measure between two images to generate correspondences between an image and the map.

The main limitation of our approach is the assumption that the scene structure is planar compared with the height of the camera. When the UAV camera is not high enough, the parallax between the UAV image and the map is strong, and the similarity measured by mutual information becomes meaningless. Moreover, even if all correspondences are accurate, they may not lie on the same plane, and a homography cannot represent the transformation between the UAV image and the map. In our test cases, case 1 has stronger parallax than case 2. As shown in Figure 7, whenever a UAV image is registered to the map, case 1 is more likely to have images registered to a slightly off location, while case 2 has images registered correctly.

Our future work aims at classifying features by plane. With correspondences of features lying on the same plane, our assumption is more valid and the UAV-to-map registration should be more accurate. In addition, we are studying faster algorithms to speed up the mutual information computation in the UAV-to-map registration step, so that the overall mosaicing process can be done in reasonable time.

Acknowledgments

This work was supported by grants from Lockheed Martin. We thank Mark Pritt for providing the data.

References

[1] David G. Lowe, "Distinctive image features from scale-invariant keypoints", International Journal of Computer Vision, Vol. 60, No. 2, pp. 91-110, 2004.
[2] Matthew Brown and David G. Lowe, "Recognising panoramas", International Conference on Computer Vision (ICCV 2003), pp. 1218-1225, 2003.
[3] M. A. Fischler and R. C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography", Comm. of the ACM, 24, pp. 381-395, 1981.
[4] P. A. Viola, "Alignment by Maximization of Mutual Information", International Journal of Computer Vision, 24(2), pp. 137-154, 1997.
[5] C. Harris and M. J. Stephens, "A combined corner and edge detector", Alvey Vision Conference, pp. 147-152, 1988.
[6] B. Triggs, P. McLauchlan, R. Hartley, and A. Fitzgibbon, "Bundle Adjustment: A Modern Synthesis", in Vision Algorithms: Theory and Practice, number 1883 in LNCS, pp. 298-373, Springer-Verlag, Corfu, Greece, September 1999.
[7] H. S. Sawhney and R. Kumar, "True multi-image alignment and its application to mosaicing and lens distortion correction", IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(3), pp. 235-243, 1999.
[8] L. G. Brown, "A survey of image registration techniques", ACM Computing Surveys, 24(4), pp. 325-376, 1992.
[9] R. Wildes, D. Horvonen, S. Hsu, R. Kumar, W. Lehman, B. Matei, and W. Zhao, "Video Georegistration: Algorithm and Quantitative Evaluation", Proc. ICCV, pp. 343-350, 2001.
[10] G. Medioni, "Matching of a Map with an Aerial Image", Proceedings of the 6th International Conference on Pattern Recognition, pp. 517-519, Munich, Germany, October 1982.
[11] Xiaolei Huang, Yiyong Sun, Dimitris Metaxas, Frank Sauer, and Chenyang Xu, "Hybrid Image Registration based on Configural Matching of Scale-Invariant Salient Region Features", IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04), Volume 11, p. 167, 2004.
[12] S. Hsu, "Geocoded Terrestrial Mosaics Using Pose Sensors and Video Registration", IEEE Conf. on Computer Vision and Pattern Recognition, Kauai, Hawaii, USA, Dec. 2001.
[13] R. W. Cannata, M. Shah, S. G. Blask, and J. A. Van Workum, "Autonomous video registration using sensor model parameter adjustments", Applied Imagery Pattern Recognition Workshop, Proceedings, 29th, pp. 215-222, 2000.
[14] D. Hirvonen, B. Matei, R. Wildes, and S. Hsu, "Video to Reference Image Alignment in the Presence of Sparse Features and Appearance Change", IEEE Conf. on Computer Vision and Pattern Recognition, Kauai, Hawaii, USA, Dec. 2001.

Figure 7. (a), (c): Results of case 1 and case 2, respectively, with only registration of consecutive UAV images. (b), (d): Results of case 1 and case 2, respectively, with additional UAV-to-map registration every 50 frames.