
Autonomous Visual Navigation
for Planetary Exploration Rovers
M. Lourakis, G. Chliveros and X. Zabulis
Institute of Computer Science
Foundation for Research and Technology – Hellas
Heraklion, Crete, Greece
ASTRA 2013, 15 - 17 May 2013, Noordwijk, The Netherlands
Introduction
• This talk concerns ongoing work pursued in the context of the
SEXTANT activity
• SEXTANT is funded by ESA and aims to develop visual
navigation algorithms suitable for use by Martian rovers
• Low-performance target CPU (150 MIPS)
• Vision algorithms should require as little computing power,
memory footprint and communication overhead as possible
• Computationally most intensive parts are implemented on
FPGA (not part of this talk)
SEXTANT – CDR
2012/11/28
Terrain mapping & visual odometry
• Two key elements of visual navigation are terrain mapping &
visual odometry
• Terrain mapping uses dense binocular stereo on a set of
image pairs to produce a 3D representation of the viewed
scene that will be utilized for obstacle avoidance
• Visual odometry (VO) is the process of estimating the
egomotion of a mobile system using as input only the images
acquired by its cameras
• VO will be used as a building block for a complete vSLAM
system (the latter may include loop closure and possibly
global optimization for the map)
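Although not part of the talk, the chaining that makes VO a building block for vSLAM can be illustrated with a toy sketch (the helper name and the use of 4×4 homogeneous transforms are assumptions, not SEXTANT code): per-frame relative motions (R, t) are composed into a global trajectory.

```python
import numpy as np

def accumulate_poses(relative_motions):
    """Chain per-frame relative motions (R, t) into a global trajectory:
    pose_k = pose_{k-1} @ T_k, with T_k the 4x4 homogeneous transform
    estimated by VO between consecutive frames."""
    pose = np.eye(4)
    trajectory = [pose.copy()]
    for R, t in relative_motions:
        T = np.eye(4)
        T[:3, :3], T[:3, 3] = R, t  # rotation block and translation column
        pose = pose @ T             # compose with the accumulated pose
        trajectory.append(pose.copy())
    return trajectory
```

Because errors in each (R, t) also compound multiplicatively, drift grows with distance traveled, which is why a full vSLAM system adds loop closure and global optimization on top.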
Camera setup
• Prototype rover is equipped with two pairs of stereo cameras
on a mast
• High pair is used for mapping, low for VO
• Stereo ⇒ no depth/scale ambiguity
Dense 3D reconstruction: plane-sweeping
• A local stereo method that sweeps a hypothetical plane along
depth, optimizing visual similarity along the viewing rays
• Simple yet effective
• Amenable to efficient implementation
• Inherently parallelizable, detail can be modulated
• Has been successfully used for large-scale urban reconstruction
(e.g. UrbanScape project)
• Expandable to more than two cameras for greater accuracy
• Limitations:
• Textureless or non-Lambertian surfaces
• Illumination specularities
• The result can be a depth map or point cloud
R. Collins. A space-sweep approach to true multi-image matching, CVPR 1996.
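For a rectified binocular pair, each fronto-parallel sweeping plane reduces to an integer disparity hypothesis, so the method can be sketched in a few lines. The following is a minimal toy version (box-filtered absolute-difference cost, winner-takes-all selection), not the SEXTANT implementation:

```python
import numpy as np

def plane_sweep_depth(left, right, num_planes, kernel=5):
    """Toy plane sweep for a rectified stereo pair: each plane is an
    integer disparity hypothesis; the photometric cost is aggregated
    with a separable box filter and the best plane is picked per pixel."""
    h, w = left.shape
    costs = np.empty((num_planes, h, w))
    k1 = np.ones(kernel) / kernel
    for d in range(num_planes):
        # hypothesize the plane at disparity d: warp the right image so
        # that corresponding pixels align with the left image
        shifted = np.empty_like(right)
        shifted[:, d:] = right[:, :w - d] if d else right
        shifted[:, :d] = left[:, :d]  # invalid border: zero cost there
        diff = np.abs(left - shifted)
        # aggregate the cost over a kernel x kernel window (separable)
        agg = np.apply_along_axis(lambda r: np.convolve(r, k1, 'same'), 1, diff)
        agg = np.apply_along_axis(lambda c: np.convolve(c, k1, 'same'), 0, agg)
        costs[d] = agg
    return costs.argmin(axis=0)  # winner-takes-all plane index per pixel
```

The inherent parallelism is visible here: every plane's cost volume slice can be computed independently, and detail is modulated by the number of planes.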
SEXTANT – TConf1
2012/09/24
Plane-sweeping: sample result
512×348 images
101 depth planes
441×351 plane resolution
15×15 correlation kernel
Visual odometry: processing pipeline
• Harris corner detection
• BRIEF descriptor extraction
• BRIEF descriptor matching
• Sparse stereo triangulation
• Pose determination from 2D-3D matches in two views
• No temporal smoothing, e.g. local bundle adjustment
Harris (Plessey) corner detection: formulation
• Corner points exhibit significant intensity change in all directions
• The intensity change within a shifted image patch is captured by the
2×2 autocorrelation matrix M involving image derivatives:

M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}
• The eigenvalues 1, 2 of M quantify the intensity change in a
shifting window
• Large intensity change in all directions implies that both eigenvalues
should be large
• Eigenvalues are not computed explicitly, thus avoiding costly
calculation of square roots
Harris corner detection: formulation (cont’d)
• A “cornerness” measure R determines if λ1, λ2 are sufficiently large:

R = \det M - k \, (\mathrm{trace}\, M)^2, \quad \det M = \lambda_1 \lambda_2, \quad \mathrm{trace}\, M = \lambda_1 + \lambda_2
• k is an empirical constant, typically chosen in the range 0.04 to 0.06.
Alternative cornerness measures that avoid arbitrary constants are also available
• Subpixel corner approximation by locating the minimum of a
quadratic surface fitted to R
• Harris has high detection and repeatability rates but poor localization
• Involves moderate computational cost (mostly separable
convolutions)
• Not rotation or scale-invariant
C. Harris and M. Stephens. A Combined Corner and Edge Detector. Alvey Vision Conf. 1988.
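The formulation above can be sketched directly from the formulas. The snippet below is a simplified illustration (central-difference gradients and a uniform box window stand in for the smoothed derivatives and Gaussian window typically used), computing R without ever extracting eigenvalues:

```python
import numpy as np

def harris_response(img, k=0.05, window=7):
    """Harris cornerness R = det(M) - k * trace(M)^2, evaluated from the
    windowed autocorrelation entries without eigendecomposition (and so
    without square roots)."""
    Iy, Ix = np.gradient(img.astype(float))  # image derivatives

    def box(a, s):  # separable box filter standing in for w(x, y)
        k1 = np.ones(s) / s
        a = np.apply_along_axis(lambda r: np.convolve(r, k1, 'same'), 1, a)
        return np.apply_along_axis(lambda c: np.convolve(c, k1, 'same'), 0, a)

    Sxx = box(Ix * Ix, window)  # windowed entries of the 2x2 matrix M
    Syy = box(Iy * Iy, window)
    Sxy = box(Ix * Iy, window)
    det = Sxx * Syy - Sxy * Sxy      # det M = lambda1 * lambda2
    trace = Sxx + Syy                # trace M = lambda1 + lambda2
    return det - k * trace * trace
```

On an ideal step corner, R is strongly positive at the corner, negative along the edges (one large eigenvalue), and near zero in flat regions.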
Harris corner selection: ANMS
• Improved spatial distribution with the ANMS (Adaptive Non-Maximal Suppression) scheme of Brown, Szeliski and Winder
• Only corners whose cornerness is locally maximal are retained
Strongest corners using plain Harris (left) and Harris with ANMS (right)
M. Brown et al. Multi-Image Matching Using Multi-scale Oriented Patches. CVPR 2005.
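The idea behind ANMS can be sketched as follows (an O(N²) toy version in the spirit of Brown et al., without their robustness factor): each corner gets a suppression radius equal to the distance to its nearest stronger corner, and the n corners with the largest radii are kept, yielding a spatially even selection.

```python
import math

def anms(corners, n):
    """Adaptive non-maximal suppression sketch. `corners` is a list of
    (x, y, response) tuples; returns the n corners whose suppression
    radii (distance to the nearest stronger corner) are largest."""
    scored = []
    for x, y, r in corners:
        radius = math.inf  # the globally strongest corner keeps inf
        for x2, y2, r2 in corners:
            if r2 > r:
                radius = min(radius, math.hypot(x - x2, y - y2))
        scored.append((radius, x, y, r))
    scored.sort(reverse=True)  # largest suppression radius first
    return [(x, y, r) for _, x, y, r in scored[:n]]
```

A strong but isolated corner thus survives even when a cluster of slightly stronger corners exists elsewhere, which plain top-N selection would not guarantee.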
BRIEF descriptor (BRIEF: Binary Robust Independent Elementary Features)
• Performs several pair-wise intensity comparisons on a Gaussian-smoothed image patch and encodes the outcomes in a bit vector
• The pattern of pixels to be compared by BRIEF in each patch is
selected randomly (same for all patches)
• BRIEF is less discriminative than SIFT but much faster to
compute and match, and more compact to store
M. Calonder et al. BRIEF: Binary Robust Independent Elementary Features. ECCV 2010.
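The descriptor itself is only a few lines. The sketch below (helper names are illustrative; patch smoothing is omitted) builds the fixed random comparison pattern once and encodes each patch as a bit vector:

```python
import random

def make_pattern(n_bits, patch_size, seed=0):
    """Random sampling pattern, generated once and shared by all patches."""
    rng = random.Random(seed)
    pick = lambda: (rng.randrange(patch_size), rng.randrange(patch_size))
    return [(pick(), pick()) for _ in range(n_bits)]

def brief_descriptor(patch, pairs):
    """Encode pairwise intensity comparisons inside a patch as the bits
    of an integer: bit i is set iff intensity(p1_i) < intensity(p2_i)."""
    desc = 0
    for i, ((x1, y1), (x2, y2)) in enumerate(pairs):
        if patch[y1][x1] < patch[y2][x2]:
            desc |= 1 << i
    return desc
```

Because only the ordering of intensities matters, the descriptor is unchanged by uniform brightness offsets, and a 256-bit descriptor occupies just 32 bytes, which explains the compactness advantage over SIFT's 128 floats.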
BRIEF descriptor matching
• Standard distance ratio test (Lowe):
• Matches between images I1 and I2 are identified by finding
the two nearest neighbors of each keypoint from I1 among
those in I2, and only accepting a match if the distance to the
closest neighbor is less than a fixed fraction of that to the
second closest neighbor
• Non-symmetric
• Nearest neighbors are determined with the Hamming distance
which counts the number of positions at which two bit strings
differ
• Maximum disparity and epipolar constraints are also imposed
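A minimal version of this matcher (ratio value and integer-descriptor representation are illustrative; the disparity and epipolar checks are omitted for brevity) looks like this:

```python
def hamming(a, b):
    """Hamming distance between two descriptors stored as integers:
    the number of bit positions where they differ."""
    return bin(a ^ b).count("1")

def ratio_match(desc1, desc2, ratio=0.8):
    """Lowe-style ratio test: keypoint i of the first image matches its
    nearest neighbour j in the second only if the best Hamming distance
    is below `ratio` times the second best. Non-symmetric: roles of the
    two images are not interchangeable."""
    matches = []
    for i, d in enumerate(desc1):
        dists = sorted((hamming(d, d2), j) for j, d2 in enumerate(desc2))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches
```

The ratio test discards ambiguous keypoints whose two best candidates are nearly equidistant, which is where most mismatches would otherwise arise.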
Sparse stereo
• Given some corner matches in a calibrated stereo pair, the 3D
points they originate from can be recovered via triangulation
• Triangulating rays will not exactly intersect; approximating
their point of intersection with least squares has no physical
meaning
[Figure: skew rays L1, L2 from camera centers C1, C2 through image points m1, m2 toward the 3D point M]
• Skewness of L1, L2 is dealt with by correcting m1, m2 so that
they are guaranteed to comply with the epipolar geometry
Sparse stereo (cont’d)
• Given an epipolar plane, we seek the optimal 3D point for (m1, m2)
[Figure: image points m1, m2 corrected to points m1´, m2´ lying on the epipolar lines l1, l2]
• The solution is to select the closest points (m1´, m2´) on the epipolar
lines and obtain the 3D point through exact triangulation
• This is achieved by minimizing the distances to epipolar lines with a
non-iterative method involving the roots of a sixth degree polynomial
• An approximate but much cheaper alternative we use is to rely on the
Sampson approximation of the distance error
Pose estimation: overview
• Concerns the determination of position and orientation of a
camera given its intrinsic parameters and a set of N
correspondences between 3D points and their 2D projections
• Has been extensively studied due to its diverse applicability in
computer vision, robotics, augmented reality, HCI, …
• Our solution:
• A preliminary pose is estimated using an analytic P3P solver
combined with RANSAC for coping with mismatches
• The preliminary pose is refined by minimizing the total image
reprojection error pertaining to inliers
• Extended to the binocular case for better accuracy by jointly
minimizing the reprojection error in two images
• Motion parameters covariance computed as byproduct
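The hypothesize-and-verify structure of the preliminary pose stage is generic and can be sketched independently of the minimal solver. Below, a plain RANSAC loop with a pluggable `solve`/`residual` pair (in the talk the solver is analytic P3P and the residual is image reprojection error; the demo plugs in 2D line fitting purely so the skeleton is testable):

```python
import random

def ransac(data, min_samples, solve, residual, thresh, iters=100, seed=0):
    """Generic RANSAC: draw a minimal sample, run the minimal solver
    (which may return several candidate models, as P3P does), count
    inliers by residual, and keep the hypothesis with the most inliers
    together with its inlier set for subsequent refinement."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        sample = rng.sample(data, min_samples)
        for model in solve(sample):
            inliers = [d for d in data if residual(model, d) < thresh]
            if len(inliers) > len(best_inliers):
                best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

The returned inlier set is exactly what the refinement stage needs: minimizing the total reprojection error over inliers only keeps mismatches from biasing the pose.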
Pose estimation: 3D-2D correspondences
• A stereo rig moving freely in space
[Figure: stereo rig at times t and t+1. Blue: projection rays, green: spatial matches, red: temporal matches]
• Temporal matches associate 3D points triangulated at time t
with their 2D image projections at time t+1
Quantitative evaluation of VO pipeline
• A 512×348 synthetic stereo sequence with ground truth. Motion is
primarily forward with a shallow right turn
• Moderate numbers of matches across images
• VO was run for 363 frames (total traveled distance ~ 22m)
• Naming convention: X-Y denotes detector X with descriptor Y, e.g.
HARRIS-BRIEF refers to Harris corners and BRIEF descriptors
Sequence courtesy of
Marcos Aviles, GMV
Accuracy of VO pipeline
Left: translational error w.r.t. ground truth; right: rotational error
Summary & conclusions
• VO performance tested against ground truth associated with
simulated data
• HARRIS+BRIEF detector/descriptor achieves lower accuracy
compared to SIFT+SIFT
• HARRIS+BRIEF has considerably lower computational
requirements (32 times faster than SIFT+SIFT)
• HARRIS+BRIEF yields a relative translational error < 2%,
hence provides a good accuracy / performance tradeoff
• Binocular HARRIS+BRIEF is more accurate compared to
monocular HARRIS+BRIEF
• Binocular HARRIS+BRIEF runs at 4 fps on a 3 GHz Intel Core
CPU
Thank you
Any questions?