Map-Enhanced UAV Image Sequence Registration

Yuping Lin
Qian Yu
Gerard Medioni
Computer Science Department
University of Southern California
Los Angeles, CA 90089-0781
{yupingli, qianyu, medioni}@usc.edu
Abstract
Registering consecutive images from an airborne sensor
into a mosaic is an essential tool for image analysts. Strictly
local methods tend to accumulate errors, resulting in distortion. We propose here to use a reference image (such as a
high-resolution map image) to overcome this limitation. In
our approach, we register each frame of an image sequence to
the map using both frame-to-frame registration and frame-to-map registration iteratively. In frame-to-frame registration, a frame is registered to its previous frame. Since the
previous frame was registered to the map in the previous iteration, we can derive an estimated transformation from the
current frame to the map. In frame-to-map registration, we warp the
frame to the map by this transformation to compensate for
scale and rotation differences, and then perform an area-based
matching using mutual information to find correspondences
between the warped frame and the map. From these correspondences, we derive a transformation that further registers the warped frame to the map. With this two-step registration, the errors between consecutive frames do not
accumulate. We present results on real image sequences
from a hot air balloon.
1. Introduction
Geo-registration is a very useful application: it can be
widely used by a UAV (Unmanned Aerial Vehicle) for navigation, for geo-locating a target, or even for refining a map.
Feature-based registration [1][5] has made good
progress in recent years. Building on image registration,
mosaicing of an image sequence can be done by
computing the transformations between consecutive frames.
To take the accumulated error into account, bundle adjustment [6] is usually employed as a global error-minimization
approach. However, for long sequences with thousands of
frames, bundle adjustment is not computationally feasible. Moreover, offline bundle adjustment is not appropriate
for many tasks.
To perform image mosaicing in a progressive manner
while still preserving accuracy, we propose to use an associated map image as a global reference. A two-step procedure
is applied to register a UAV image sequence to the global
map. In the first step, we register consecutive frames by estimating the best homography aligning the feature points in
each frame. Using the homography obtained from the
first step, we roughly align the UAV image with the global
map. This step provides an initialization that roughly compensates for the scale and rotation differences between the
UAV image and the map.
In the second step, we register the roughly aligned
UAV image to the map. A similar scenario has been presented in [8]. In area-based matching, MSE [12] or normalized correlation [13] is used to determine correspondences
between the UAV image and the reference image. However, the UAV images are captured at different times and
from different viewpoints with respect to the satellite image. The
color, illumination, and dynamic content (such as vehicles, trees, and shadows) can be very different, and
MSE or normalized correlation is not robust enough in such cases. We propose an approach that applies mutual
information [4] to establish correspondences. Mutual information has been successfully applied to establishing correspondences between images of different modalities, especially in medical
image processing. Our experiments show that mutual
information does provide strong enough correspondences after roughly compensating for scale and rotation. Given the
correspondences between the roughly aligned UAV image
and the map, we derive a homography that further registers
the roughly aligned UAV image to the map. By composing this
homography with the initial homography from the first
step, we can register the UAV images with the map without
accumulating registration errors.
This paper is organized as follows. In section 2, we formulate our problem and define our notation. In section 3,
we present the two-step procedure for geo-registration. In
section 4, we compare our results with and without refinement in geo-registration. Experiments show that the refinement procedure significantly reduces the accumulated error.
Discussion and future work are presented in section 5.
2. Problem Formulation and Issues
We start by giving definitions of the symbols used in
this paper. We are given a sequence of UAV images
I0 , I1 , . . . , In , and a map (usually a satellite image) M .
Here, we assume the scene depth is small with respect to
the distance from the UAV camera, so the transformation
between two UAV images can be represented by a homography. The transformation between a UAV image and the
map is also represented as a homography. Let Hi,j denote
the homography from Ii to Ij, and Hi,M the homography from Ii to M, namely Hi,j Ii = Ij and Hi,M Ii = Mi,
where Mi is the region of M onto which Ii projects. Note that
Hj,i = Hi,j⁻¹.
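As a reminder of how such a transformation acts (a standard identity, not notation specific to this paper), a homography maps image points in homogeneous coordinates:

```latex
H = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix},
\qquad
\begin{pmatrix} x' \\ y' \\ w \end{pmatrix} = H \begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
\qquad
(x, y) \mapsto \left( \tfrac{x'}{w}, \tfrac{y'}{w} \right),
```

so writing Hi,j Ii = Ij means applying this point mapping to every pixel of Ii.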
Our goal is to derive accurate estimates of H1,M, . . . ,
Hn,M so that I1, . . . , In are registered to M and form a mosaic without distortion (Figure 1).
Figure 1. For each Ii, derive Hi,M so that all frames register to the map M and form a seamless mosaic.
However, the map and images are taken at different
times, from different sensors, from different viewpoints,
and may have different dynamic content (such as vehicles
or shadows). As a result, it is difficult to simply match each
incoming image to the map. Instead, we build a
partial local mosaic and then register it to the map in an iterative
manner.
3. Approach
Figure 2 illustrates the flow chart of our approach. Each
frame Ii in the UAV image sequence is first registered to
the previous frame to derive Hi,i−1. In the second step, we
estimate Hi,M as Hi−1,M Hi,i−1, and denote this estimate H̃i,M. This
estimated homography warps Ii to a partial local mosaic
M̃i in the map, namely M̃i = H̃i,M Ii. Then we register
M̃i to the map at Mi and derive H′, namely Mi = H′ M̃i.
Finally, the actual homography Hi,M that registers Ii to Mi
on the map is derived as Hi,M = H′ H̃i,M.
Figure 2. Flow chart of our approach
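To make the chaining concrete, here is a minimal sketch of the overall loop in Python with NumPy (an assumption; the paper specifies no implementation). The callables register_consecutive and refine_on_map are hypothetical stand-ins for the steps of Figure 2:

```python
import numpy as np

def mosaic_to_map(frames, H0_M, register_consecutive, refine_on_map):
    """Register frames I_0..I_n to the map M (sketch of Figure 2).

    H0_M: the given registration of I_0 to the map (set manually, Sec. 4).
    register_consecutive(I_i, I_prev) -> H_{i,i-1}          (Step 1)
    refine_on_map(I_i, H_est)        -> refined H_{i,M}     (Steps 3-4)
    """
    H_to_map = [np.asarray(H0_M)]
    for i in range(1, len(frames)):
        H_i_prev = register_consecutive(frames[i], frames[i - 1])
        # Step 2: rough estimate H~_{i,M} = H_{i-1,M} H_{i,i-1}
        H_est = H_to_map[i - 1] @ H_i_prev
        # Steps 3-4: frame-to-map refinement, H_{i,M} = H' H~_{i,M}
        H_to_map.append(refine_on_map(frames[i], H_est))
    return H_to_map
```

Because every frame is re-anchored to the map, a small error in one H_{i,i−1} does not propagate to the rest of the sequence.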
In the following sections, we first describe the method
we use to register Ii to the previous image Ii−1 . Then we
introduce our method to further fine-tune H̃i,M so that Ii is
mapped to M more accurately and the registration error is
not accumulated along the registration process.
3.1. Registration of Consecutive Images
To compute Hi,i−1, we match features and then
perform RANSAC [3] outlier filtering. After trying several
kinds of features, we selected SIFT (Scale-Invariant Feature Transform) [1] features. SIFT features are invariant to
image scale and rotation, and provide robust descriptions
across changes in 3D viewpoint.
In the feature matching step, we use nearest-neighbor
matching [2]. Since the translation and rotation of the UAV
camera between consecutive frames are small, we can assume that matched features lie within a small window,
which adds one more constraint on feature matching. Typically,
at a resolution of 720 × 480, we can generate 2000 correspondence pairs. Finally, we use RANSAC to filter outliers (we
use an inlier tolerance of 1 pixel) among the set of correspondences and derive Hi,i−1.

Having Hi,i−1 and H0,M, we can roughly register
the UAV image to the map by estimating H̃i,M as:

H̃i,M = Hi−1,M Hi,i−1 = H0,M ∏_{k=1}^{i} Hk,k−1    (1)

This shows that if there exists a subtle transformation
error in each Hk,k−1, these errors are multiplied and result
in a significant error, meaning that later UAV images
could be registered to a very wrong area on the map. As
shown in Figure 3, the registration is not perfect. Thus, we
need to establish correspondences between
the UAV image and the map and refine the homography
using these correspondences. A sketch of this consecutive-frame registration step appears below.

Figure 3. Initial registration between the UAV images and the map.
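The following is a hedged sketch of the consecutive-frame step with OpenCV and NumPy (assumptions; the paper names no library). The exact window size for the small-motion constraint is a guess, since the paper only says the window is small; the 1-pixel RANSAC tolerance is from the text:

```python
import cv2
import numpy as np

def register_consecutive(I_i, I_prev, window=100, ransac_tol=1.0):
    """Estimate H_{i,i-1} by SIFT matching plus RANSAC (sketch of Sec. 3.1)."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(I_i, None)
    kp2, des2 = sift.detectAndCompute(I_prev, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.match(des1, des2)      # nearest-neighbor matching [2]

    # Keep only matches whose displacement is small, since camera motion
    # between consecutive frames is small (the extra constraint in Sec. 3.1).
    src, dst = [], []
    for m in matches:
        p1 = np.array(kp1[m.queryIdx].pt)
        p2 = np.array(kp2[m.trainIdx].pt)
        if np.linalg.norm(p1 - p2) <= window:
            src.append(p1)
            dst.append(p2)

    # RANSAC with a 1-pixel inlier tolerance, as reported in the paper.
    H, inliers = cv2.findHomography(np.float32(src), np.float32(dst),
                                    cv2.RANSAC, ransac_tol)
    return H
```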
3.2. UAV-to-Map Registration
Registering an aerial image to a map is a challenging
problem [10][11]. Due to significant differences in lighting
conditions, resolution, and 3D viewpoint between the UAV
image and the map, the same point may yield quite different SIFT descriptors in the two images. Therefore, poor feature
matching and poor registration can be expected.

Since it is difficult to register a UAV image to the map
directly, we make use of Hi,i−1 derived from UAV-to-UAV
registration, estimate H̃i,M = Hi−1,M Hi,i−1,
and then fine-tune it. Let M̃i denote the
warped image of Ii under H̃i,M (Figure 2, Step 2). Our goal
is to derive a homography H′ that registers M̃i to the map
at Mi (Figure 2, Step 3), so that the image is accurately
aligned to the map.

The advantage of this approach is that with M̃i roughly
aligned to the map, we can perform a local search for correspondences at the same scale. Therefore, the ambiguity
of matching and the computation time are far less than when directly registering Ii to the map.
3.2.1. Finding Correspondences between the UAV Image and the Map

To derive H′, we try to find correspondences between M̃i
and the map area that Mi spans. However, Mi is usually
a smaller region than Ii (the map has lower resolution), which
means Mi preserves less information than Ii.
Hence we proceed in the reverse direction. As shown in Figure 4, let
Ui be the map image transformed back from the area
that Mi spans, using HM,i. Instead of finding correspondences between M̃i and the map area that Mi spans, we
find correspondences between Ii and Ui. A sketch of this back-warping step follows.
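A minimal sketch of constructing Ui, assuming OpenCV and NumPy (variable names are ours, not the paper's): the map is warped back into Ii's frame using HM,i = H̃i,M⁻¹.

```python
import cv2
import numpy as np

def back_warp_map(map_img, H_i_M_est, frame_shape):
    """Build U_i: the map content that M_i spans, warped into I_i's frame.

    H_i_M_est: the rough estimate H~_{i,M}; its inverse is H_{M,i}.
    frame_shape: (height, width) of the UAV frame I_i.
    """
    H_M_i = np.linalg.inv(H_i_M_est)         # H_{M,i} = H~_{i,M}^{-1}
    h, w = frame_shape[:2]
    # Output pixel (x, y) in I_i's frame samples the map at H~_{i,M}(x, y),
    # so only the region that M_i spans lands inside the (w, h) canvas.
    return cv2.warpPerspective(map_img, H_M_i, (w, h))
```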
Figure 4. Ui denotes the map image transformed back from the region that Mi spans, using HM,i. PI and PU are points located at the same coordinates in Ii and Ui respectively. SPI′ and SPU are two image patches of the same size centered at PI′ and PU respectively, where PI′ is the corresponding point to PU.
Let PI and PU be points located at the same coordinates
in Ii and Ui respectively. With a good enough H̃i,M, PU
should have its correspondence PI′ in Ii close to PI.
PI′ is determined as the point whose UAV image patch, centered at it, is most similar to the map image patch centered at PU.
We use mutual information [4] as the similarity measure.
The mutual information of two random variables is a quantity
that measures the dependence between the two variables. Taking
two images of the same size as the random variables, it measures
how much information the two images share, or how much one
image depends on the other. It is a more meaningful criterion than measures such as cross-correlation or
grey-value differences.
Let SPi and SPj be two image patches of the same size centered at points Pi and Pj respectively, and let MI(SPi, SPj) be the
mutual information of SPi and SPj. We find PI′ by looking
for the pixel Pi in PI's neighborhood that yields the greatest
MI(SPU, SPi).
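As an illustration, here is a minimal sketch of this search in Python with NumPy (an assumption; the paper names no implementation). The MI estimate uses a joint grey-level histogram, a common estimator that the paper does not specify (bins=32 is our guess); the function also reports the weakest MI in the window, which section 3.2.2 needs:

```python
import numpy as np

def mutual_information(patch_a, patch_b, bins=32):
    """Estimate MI(S_Pa, S_Pb) from the joint grey-level histogram."""
    joint, _, _ = np.histogram2d(patch_a.ravel(), patch_b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)      # marginal over patch_a
    py = pxy.sum(axis=0, keepdims=True)      # marginal over patch_b
    nz = pxy > 0                             # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def search_correspondence(I_i, U_i, p, patch=100, search=40):
    """Find P_I' near p in I_i maximizing MI against U_i's patch at p.

    Returns the best location plus the max and min MI in the window
    (the min is P_k's score, used by the goodness test in Sec. 3.2.2).
    """
    r, s = patch // 2, search // 2
    y, x = p
    target = U_i[y - r:y + r, x - r:x + r]   # S_{P_U}, the map-side patch
    best, mi_max, mi_min = p, -np.inf, np.inf
    for dy in range(-s, s + 1):
        for dx in range(-s, s + 1):
            cand = I_i[y + dy - r:y + dy + r, x + dx - r:x + dx + r]
            if cand.shape != target.shape:   # candidate falls off the image
                continue
            mi = mutual_information(cand, target)
            if mi > mi_max:
                best, mi_max = (y + dy, x + dx), mi
            mi_min = min(mi_min, mi)
    return best, mi_max, mi_min
```

This exhaustive 40 × 40 search with 100 × 100 patches is what makes the step costly, which is why the paper applies it only every 50 frames (Section 4).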
Figure 5. The correspondences in the UAV image (a) with respect to the feature points in the map image (b). Blue dots and red dots represent good and poor correspondences respectively.

Figure 6. The correspondences in the UAV image (a) with respect to the feature points in the map image (b). Green dots and orange dots represent RANSAC inliers and outliers respectively.
3.2.2. Defining Good Correspondences
It may happen that "all" or "none" of the image patches centered on PI's neighborhood pixels are similar to the image
patch centered on PU. In either case, the maximum mutual
information is meaningless, since the mutual information
at other places could be only slightly smaller. We need to
filter out these unreliable correspondences so that the derived
homography is accurate.

Let Pk be the pixel in PI's neighborhood that has
the smallest mutual information value. We consider PI′ a
good correspondence when MI(SPU, SPI′) is significantly
larger than MI(SPU, SPk); we use MI(SPU, SPI′) >
2 MI(SPU, SPk). Intuitively, this means that the image patch
SPI′ must be significantly more similar to SPU than SPk is.

Figure 5 shows the results of extracting good correspondences. Blue dots and red dots represent good and poor
correspondences respectively.
We can generate as many correspondences as we want by
performing this operation on feature points in Ui. Here
we use the Harris corner detector [5] to extract features instead of SIFT, because our purpose is simply to obtain the locations
of interest points in Ui; the Harris corner detector satisfies this need and is computationally cheaper than SIFT.
Once we have enough correspondences, RANSAC is performed to filter outliers, and H′ is then derived. As shown
in Figure 6, the colored dots in 6(b) are the extracted feature points,
the colored dots in 6(a) are their correspondences,
and the green dots are the RANSAC inliers used to derive H′.
Finally, Hi,M is derived as Hi,M = H′ H̃i,M, and Ii is
registered to the map at Mi (Figure 2, Step 4). The sketch below assembles this refinement step.
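Assembling sections 3.2.1 and 3.2.2, here is a hedged sketch of the full refinement step (OpenCV and NumPy assumed; search_correspondence is the helper from the earlier MI sketch, passed in to keep the block self-contained; qualityLevel and other unreported Harris parameters are guesses). This is one plausible reading, not the authors' code:

```python
import cv2
import numpy as np

def refine_on_map(I_i, map_img, H_est, search_correspondence,
                  n_corners=200, min_dist=10, ratio=2.0):
    """Refine H~_{i,M} using MI correspondences between I_i and U_i."""
    h, w = I_i.shape[:2]
    U_i = cv2.warpPerspective(map_img, np.linalg.inv(H_est), (w, h))  # Sec. 3.2.1

    # Harris corners in U_i: we only need interest-point locations, and Harris
    # is cheaper than SIFT. 200 samples, at least 10 px apart (Sec. 4).
    pts = cv2.goodFeaturesToTrack(U_i, maxCorners=n_corners, qualityLevel=0.01,
                                  minDistance=min_dist, useHarrisDetector=True)
    if pts is None:
        return H_est
    src, dst = [], []
    for x, y in pts.reshape(-1, 2).astype(int):
        (yy, xx), mi_max, mi_min = search_correspondence(I_i, U_i, (y, x))
        if mi_max > ratio * mi_min:          # goodness test of Sec. 3.2.2
            src.append((xx, yy))             # P_I' in I_i
            dst.append((x, y))               # P_U in U_i
    if len(src) < 4:                         # a homography needs >= 4 matches
        return H_est
    # C maps P_I' to P_U in frame-i coordinates; in map coordinates this is
    # H' = H~ C H~^{-1}, so the refined homography H_{i,M} = H' H~ = H~ C.
    C, _ = cv2.findHomography(np.float32(src), np.float32(dst), cv2.RANSAC, 1.0)
    return H_est @ C
```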
4. Experimental Results
We show results on two data sets. The UAV image sequences are provided with latitude and longitude information. The satellite images are acquired from Google Earth.
Each UAV image is 720 × 480. We manually
register the first frame of each UAV sequence to its corresponding satellite image, i.e., H0,M is given.
In each UAV-to-Map registration step, we select 200 Harris corners in the UAV image as samples, requiring the
distance between any two features to be at least 10
pixels. For each sample, an image patch of size 100 × 100 is
used to compute the mutual information, and the neighborhood region in which we search for the best match is a window
of size 40 × 40. We found that 100 × 100 is
a proper patch size for a discriminative local feature in our UAV
image registration.
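For quick reference, the settings reported above, collected as constants (a convenience sketch, not code from the paper):

```python
# Parameter settings reported in Section 4 of the paper.
N_HARRIS_SAMPLES = 200      # Harris corners sampled per UAV-to-Map step
MIN_FEATURE_DIST = 10       # minimum spacing between samples, in pixels
MI_PATCH_SIZE = (100, 100)  # patch size for mutual-information comparison
SEARCH_WINDOW = (40, 40)    # neighborhood searched for the best match
REFINE_EVERY = 50           # frames between UAV-to-Map refinements
```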
Since the mutual information computation is very costly, we
only perform a UAV-to-Map registration every 50 frames.
The results of case 1 without and with UAV-to-Map registration are shown in 7(a) and 7(b) respectively; the results of
case 2 without and with UAV-to-Map registration are shown
in 7(c) and 7(d) respectively.
Table 1 compares registration with and without UAV-to-Map registration in the two examples.
Table 1. Experimental results of the two examples.
5. Discussion and Future Work
We have proposed a new method to improve the accuracy
of mosaicing. An additional map image is provided as a
global reference to prevent accumulated error in the mosaic.
We use mutual information as a similarity measure between
two images to generate correspondences between an image
and the map.
The main limitation of our approach is the assumption
that the scene structure is planar relative to the height
of the camera. When the UAV camera is not high enough, the parallax between the UAV image and the map is strong, and the
similarity measured by mutual information becomes meaningless. Moreover, even if all correspondences are accurate,
they may not lie on the same plane, and a homography then cannot represent the transformation between the UAV
image and the map. In our test cases, case 1 has stronger
parallax than case 2. As shown in Figure 7, whenever a
UAV image is registered to the map, case 1 is more likely to
have images registered to a slightly offset location, while case
2 has images registered correctly.
Our future work aims at grouping features that lie on the
same plane. With correspondences restricted to features on the same
plane, our assumption holds better and the UAV-to-Map
registration should be more accurate. In addition, we are
studying faster algorithms to speed up the mutual information computation in the UAV-to-Map registration step so
that the overall mosaicing process can be done in reasonable time.
Acknowledgments
This work was supported by grants from Lockheed Martin. We thank Mark Pritt for providing the data.
References

[1] D. G. Lowe, "Distinctive image features from scale-invariant keypoints", International Journal of Computer Vision, 60(2), pp. 91-110, 2004.

[2] M. Brown and D. G. Lowe, "Recognising panoramas", International Conference on Computer Vision (ICCV 2003), pp. 1218-1225, 2003.

[3] M. A. Fischler and R. C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography", Comm. of the ACM, 24, pp. 381-395, 1981.

[4] P. A. Viola, "Alignment by Maximization of Mutual Information", International Journal of Computer Vision, 24(2), pp. 137-154, 1997.

[5] C. Harris and M. J. Stephens, "A combined corner and edge detector", Alvey Vision Conference, pp. 147-152, 1988.

[6] B. Triggs, P. McLauchlan, R. Hartley, and A. Fitzgibbon, "Bundle Adjustment: A Modern Synthesis", in Vision Algorithms: Theory and Practice, LNCS 1883, pp. 298-373, Springer-Verlag, Corfu, Greece, September 1999.

[7] H. S. Sawhney and R. Kumar, "True multi-image alignment and its application to mosaicing and lens distortion correction", IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(3), pp. 235-243, 1999.

[8] L. G. Brown, "A survey of image registration techniques", ACM Computing Surveys, 24(4), pp. 325-376, 1992.

[9] R. Wildes, D. Hirvonen, S. Hsu, R. Kumar, W. Lehman, B. Matei, and W. Zhao, "Video Georegistration: Algorithm and Quantitative Evaluation", Proc. ICCV, pp. 343-350, 2001.

[10] G. Medioni, "Matching of a Map with an Aerial Image", Proceedings of the 6th International Conference on Pattern Recognition, pp. 517-519, Munich, Germany, October 1982.

[11] X. Huang, Y. Sun, D. Metaxas, F. Sauer, and C. Xu, "Hybrid Image Registration based on Configural Matching of Scale-Invariant Salient Region Features", IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04), p. 167, 2004.

[12] S. Hsu, "Geocoded Terrestrial Mosaics Using Pose Sensors and Video Registration", IEEE Conf. on Computer Vision and Pattern Recognition, Kauai, Hawaii, USA, Dec. 2001.

[13] R. W. Cannata, M. Shah, S. G. Blask, and J. A. Van Workum, "Autonomous video registration using sensor model parameter adjustments", Applied Imagery Pattern Recognition Workshop (AIPR 2000), pp. 215-222, 2000.

[14] D. Hirvonen, B. Matei, R. Wildes, and S. Hsu, "Video to Reference Image Alignment in the Presence of Sparse Features and Appearance Change", IEEE Conf. on Computer Vision and Pattern Recognition, Kauai, Hawaii, USA, Dec. 2001.
Figure 7. (a),(c): results of case 1 and case 2 respectively with only registration of consecutive UAV images. (b),(d): results of case 1 and case 2 respectively with additional UAV-to-Map registration every 50 frames.