Introduction to Robot Vision
Ziv Yaniv
Computer Aided Interventions and Medical Robotics,
Georgetown University
Vision
The special sense by which the qualities of an
object (as color, luminosity, shape, and size)
constituting its appearance are perceived
through a process in which light rays entering
the eye are transformed by the retina into
electrical signals that are transmitted to the brain
via the optic nerve.
[Merriam-Webster dictionary]
The Sensor
endoscope
Single Lens Reflex
(SLR) Camera
webcam
C-arm X-ray
The Sensor
Model: Pin-hole Camera, Perspective Projection
[Figure: pin-hole camera geometry; a point is projected through the focal point onto the image plane, with camera axes x, y, z.]
Machine Vision
Goal:
Obtain useful information about the 3D world from 2D
images.
Model:
images → features (regions, textures, corners, lines, …) → interpretation (3D geometry, object identification, activity detection, …) → actions
Machine Vision
Goal:
Obtain useful information about the 3D world from 2D
images.
• Low level (image processing)
– image filtering (smoothing, histogram modification, …)
– feature extraction (corner detection, edge detection, …)
– stereo vision
– shape from X (shading, motion, …)
– …
• High level (machine learning/pattern recognition)
– object detection
– object recognition
– clustering
– …
Machine Vision
• How hard can it be?
Robot Vision
1. Simultaneous Localization and Mapping
(SLAM)
2. Visual Servoing.
Robot Vision
1. Simultaneous Localization and Mapping
(SLAM) – create a 3D map of the world and
localize within this map.
NASA stereo vision image processing, as used by the MER Mars rovers
Robot Vision
“Simultaneous Localization and Mapping with Active Stereo Vision”, J.
Diebel, K. Reuterswärd, S. Thrun, J. Davis, R. Gupta, IROS 2004.
Robot Vision
2. Visual Servoing – Using visual feedback to
control a robot:
a) image-based systems: desired motion directly from
image.
“An image-based visual servoing scheme for
following paths with nonholonomic mobile
robots” A. Cherubini, F. Chaumette, G. Oriolo,
ICARCV 2008.
Robot Vision
2. Visual Servoing – Using visual feedback to
control a robot:
b) Position-based systems: desired motion from 3D
reconstruction estimated from image.
System Configuration
• Difficulty of similar tasks in different settings
varies widely:
– How many cameras?
– Are the cameras calibrated?
– What is the camera-robot configuration?
– Is the system calibrated (hand-eye calibration)?
Common configurations:
[Figure: coordinate frames (x, y, z) for several common camera-robot configurations.]
System Characteristics
• The greater the control over the system
configuration and environment the easier it is to
execute a task.
• System accuracy is directly dependent upon
model accuracy – what accuracy does the task
require?
• All measurements and derived quantitative
values have an associated error.
Stereo Reconstruction
• Compute the 3D location of a point in the stereo rig’s coordinate system:
• Rigid transformation between the two cameras is known.
• Cameras are calibrated – given a point in the world coordinate system we
know how to map it to the image.
• Same point localized in the two images.
[Figure: stereo rig; the rigid transformation T between camera 1 and camera 2, both calibrated with respect to the world.]
Commercial Stereo Vision
Polaris Vicra infra-red system
(Northern Digital Inc.)
MicronTracker visible light system
(Claron Technology Inc.)
Commercial Stereo Vision
Images acquired by the Polaris Vicra infra-red stereo system:
left image
right image
Stereo Reconstruction
• Wide or short baseline – reconstruction accuracy vs. difficulty of point matching
[Figure: camera 1 paired with camera 2 at several increasing baselines.]
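This tradeoff can be made concrete with the standard rectified-stereo relation Z = f·b/d (depth from focal length, baseline, and disparity) — a minimal sketch, with hypothetical numbers, showing that a fixed one-pixel matching error costs less depth accuracy as the baseline grows:

```python
import numpy as np

def depth_from_disparity(f_pix, baseline_m, disparity_pix):
    """Depth of a point seen by a rectified stereo pair: Z = f * b / d."""
    return f_pix * baseline_m / disparity_pix

# Hypothetical rig: 800-pixel focal length, point at 2 m depth.
f, Z = 800.0, 2.0
for b in (0.05, 0.20):                      # short vs. wide baseline (meters)
    d = f * b / Z                           # true disparity in pixels
    Z_err = abs(depth_from_disparity(f, b, d + 1.0) - Z)  # 1-pixel match error
    print(f"baseline {b:.2f} m: disparity {d:.1f} px, depth error {Z_err*100:.1f} cm")
```

The wider baseline yields larger disparities, so the same matching error perturbs the depth less — at the cost of harder point matching between the more dissimilar views.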
Camera Model
• Points P = [X,Y,Z], its projection p = [x,y,f], and the focal point O, given in the camera coordinate system, are collinear.
• Since O is the origin, there is a number a for which aP = p.
• a = f/Z, therefore x = fX/Z and y = fY/Z.
[Figure: perspective projection of P = [X,Y,Z] through O onto p = [x,y,f] on the image plane at distance f.]
In homogeneous coordinates:
\[
\begin{bmatrix} u \\ v \\ w \end{bmatrix} =
\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix},
\qquad x = u/w, \quad y = v/w.
\]
Camera Model
Transform the pixel coordinates from the camera coordinate system to the image coordinate system:
• The image origin (principal point) is at [x0, y0] relative to the camera coordinate system.
• Need to change from metric units to pixels, with scaling factors kx, ky.
[Figure: image plane with the principal point and a pixel location [x',y'].]
\[
\begin{bmatrix} u' \\ v' \\ w' \end{bmatrix} =
\begin{bmatrix} f k_x & 0 & x_0 & 0 \\ 0 & f k_y & y_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\]
• Finally, the image coordinate system may be skewed, resulting in:
\[
\begin{bmatrix} u' \\ v' \\ w' \end{bmatrix} =
\begin{bmatrix} f k_x & s & x_0 & 0 \\ 0 & f k_y & y_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\]
Camera Model
• As our original assumption was that points are given in the camera coordinate
system, a complete projection matrix is of the form:
\[
M_{3\times4} = K_{3\times3}\,[\,I_{3\times3} \,|\, 0_{3\times1}\,]
\begin{bmatrix} R & -RC \\ 0 & 1 \end{bmatrix}
= KR\,[\,I \,|\, -C\,]
\]
where
\[
K = \begin{bmatrix} f k_x & s & x_0 \\ 0 & f k_y & y_0 \\ 0 & 0 & 1 \end{bmatrix},
\qquad
M_{3\times4} =
\begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \end{bmatrix}
= \begin{bmatrix} M_1^T \\ M_2^T \\ M_3^T \end{bmatrix}
\]
and C is the camera origin in the world coordinate system.
• How many degrees of freedom does M have?
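A sketch composing M = KR[I | −C] from hypothetical intrinsics and pose, and checking that the homogeneous camera center [C; 1] lies in the right null space of M:

```python
import numpy as np

K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 810.0, 240.0],
              [  0.0,   0.0,   1.0]])   # hypothetical intrinsics
theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [            0,              0, 1]])  # world-to-camera rotation
C = np.array([0.5, 0.2, -1.0])           # camera origin in world coordinates

# Complete projection matrix M = K R [I | -C].
M = K @ R @ np.hstack([np.eye(3), -C.reshape(3, 1)])

# The camera center projects to the zero vector: M [C; 1] = KR(C - C) = 0.
print(M @ np.append(C, 1.0))             # ~ [0, 0, 0]
```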
Camera Calibration
• Given pairs of points, piT = [x, y, w] and PiT = [X, Y, Z, W], in homogeneous coordinates, we have:
\[ p = MP \]
Our goal is to estimate M.
[Figure: calibration object defining the world coordinate system, the camera coordinate system, and the image coordinate system with its principal point.]
• As the points are in homogeneous coordinates, the vectors p and MP are not
necessarily equal; they have the same direction but may differ by a non-zero scale
factor:
\[ p \times MP = 0 \]
Camera Calibration
• After a bit of algebra we have:
\[
\begin{bmatrix}
0^T & -w_i P_i^T & y_i P_i^T \\
w_i P_i^T & 0^T & -x_i P_i^T \\
-y_i P_i^T & x_i P_i^T & 0^T
\end{bmatrix}
\begin{bmatrix} M_1 \\ M_2 \\ M_3 \end{bmatrix} = 0,
\qquad Am = 0
\]
• The three equations are linearly dependent:
\[ A_3 = -\frac{x_i}{w_i} A_1 - \frac{y_i}{w_i} A_2 \]
• Each point pair contributes two equations.
• Exact solution: M has 11 degrees of freedom, requiring a minimum of n = 6 pairs.
• Least squares solution: for n > 6, minimize ||Am|| s.t. ||m|| = 1.
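One standard way to minimize ||Am|| subject to ||m|| = 1 is the right singular vector of A with the smallest singular value. A sketch of this direct linear transform; the camera and points below are synthetic, made up purely to exercise the estimate:

```python
import numpy as np

def dlt_calibrate(world_pts, image_pts):
    """Estimate the 3x4 projection matrix from >= 6 point pairs,
    using the two independent equations each pair contributes."""
    A = []
    for (X, Y, Z), (x, y) in zip(world_pts, image_pts):
        P = [X, Y, Z, 1.0]
        A.append([0, 0, 0, 0] + [-p for p in P] + [y * p for p in P])
        A.append(P + [0, 0, 0, 0] + [-x * p for p in P])
    # ||Am|| is minimized s.t. ||m|| = 1 by the right singular vector
    # corresponding to the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 4)

# Synthetic, noise-free test: project known points, then recover M.
M_true = np.array([[800.0,     0, 320, 10],
                   [    0, 810.0, 240, 20],
                   [    0,     0,   1,  2.0]])
rng = np.random.default_rng(0)
world = rng.uniform(-1, 1, (8, 3))
homog = np.column_stack([world, np.ones(8)]) @ M_true.T
image = homog[:, :2] / homog[:, 2:]

M_est = dlt_calibrate(world, image)
M_est *= M_true[2, 3] / M_est[2, 3]      # fix the arbitrary scale of m
print(np.abs(M_est - M_true).max())      # small residual on noise-free data
```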
Obtaining the Rays
• Camera location in the calibration object's coordinate system, C, is
given by the one-dimensional right null space of the matrix M
(MC = 0).
• A 3D homogeneous point P = M+p is on the ray defined by p and the
camera center [it projects onto p, since MM+p = Ip = p].
• These two points define our ray in the world coordinate system.
• As both cameras were calibrated with respect to the same
coordinate system, the rays will be in the same system too.
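A sketch of both steps — the center from the null space of M, a second ray point from the pseudoinverse — using a hypothetical calibrated camera with identity rotation:

```python
import numpy as np

def camera_ray(M, pixel):
    """Ray of a pixel in world coordinates: the camera center C is the
    right null space of M (MC = 0); a second homogeneous point on the
    ray is P = M^+ p (it projects onto p since M M^+ p = p)."""
    _, _, Vt = np.linalg.svd(M)
    Ch = Vt[-1]                         # null vector of M
    C = Ch[:3] / Ch[3]                  # dehomogenized camera center
    p = np.array([pixel[0], pixel[1], 1.0])
    P = np.linalg.pinv(M) @ p           # homogeneous point on the ray
    if abs(P[3]) > 1e-9:
        n = P[:3] / P[3] - C            # finite point: direction from C
    else:
        n = P[:3]                       # point at infinity encodes the direction
    return C, n / np.linalg.norm(n)

# Hypothetical calibrated camera: intrinsics K, identity rotation, center C.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1.0]])
C_true = np.array([0.5, 0.2, -1.0])
M = K @ np.hstack([np.eye(3), -C_true.reshape(3, 1)])

C, n = camera_ray(M, (400.0, 300.0))
print(C, n)
```

With identity rotation the returned direction is the normalized back-projected pixel, K⁻¹[400, 300, 1].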
Intersecting the Rays
\[
r_1(t_1) = a_1 + t_1 n_1, \qquad r_2(t_2) = a_2 + t_2 n_2
\]
\[
t_1 = \frac{((a_2 - a_1) \times n_2)^T (n_1 \times n_2)}{\| n_1 \times n_2 \|^2},
\qquad
t_2 = \frac{((a_2 - a_1) \times n_1)^T (n_1 \times n_2)}{\| n_1 \times n_2 \|^2}
\]
The reconstructed point is the midpoint of the shortest segment between the two rays:
\[
\tfrac{1}{2}\,[\,r_1(t_1) + r_2(t_2)\,]
\]
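The midpoint construction translates directly to code; the two rays below are hypothetical, chosen to intersect exactly at [1, 1, 1]:

```python
import numpy as np

def intersect_rays(a1, n1, a2, n2):
    """Midpoint of the shortest segment between two (generally skew) rays
    r1(t1) = a1 + t1*n1 and r2(t2) = a2 + t2*n2."""
    cross = np.cross(n1, n2)
    denom = np.dot(cross, cross)                      # ||n1 x n2||^2
    t1 = np.dot(np.cross(a2 - a1, n2), cross) / denom
    t2 = np.dot(np.cross(a2 - a1, n1), cross) / denom
    return 0.5 * ((a1 + t1 * n1) + (a2 + t2 * n2))

# Two rays that meet at [1, 1, 1] (hypothetical numbers).
a1, n1 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 1.0, 1.0]) / np.sqrt(3)
a2, n2 = np.array([2.0, 0.0, 1.0]), np.array([-1.0, 1.0, 0.0]) / np.sqrt(2)
print(intersect_rays(a1, n1, a2, n2))   # ~[1, 1, 1]
```

In practice the two localized image points carry noise, so the rays are skew and the midpoint is the reconstructed 3D point.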
World vs. Model
• Actual cameras most often don't follow the ideal pin-hole model, and usually exhibit
some form of distortion (barrel, pin-cushion, S).
• Sometimes the world changes to fit your model: improvements in camera/lens
quality can improve model performance.
[Figure: old image-intensifier x-ray (pin-hole + distortion) replaced by flat-panel x-ray (pin-hole).]
Additional Material
• Code:
– Camera calibration toolbox for MATLAB (Jean-Yves Bouguet):
http://www.vision.caltech.edu/bouguetj/calib_doc/
• Machine Vision:
– "Multiple View Geometry in Computer Vision", Hartley and Zisserman, Cambridge University Press.
– "Machine Vision", Jain, Kasturi, Schunck, McGraw-Hill.
• Robot Vision:
– "Simultaneous Localization and Mapping: Part I", H. Durrant-Whyte, T. Bailey, IEEE Robotics and Automation Magazine, Vol. 13(2), pp. 99-110, 2006.
– "Simultaneous Localization and Mapping (SLAM): Part II", T. Bailey, H. Durrant-Whyte, IEEE Robotics and Automation Magazine, Vol. 13(3), pp. 108-117, 2006.
– "Visual Servo Control Part I: Basic Approaches", F. Chaumette, S. Hutchinson, IEEE Robotics and Automation Magazine, Vol. 13(4), pp. 82-90, 2006.
– "Visual Servo Control Part II: Advanced Approaches", F. Chaumette, S. Hutchinson, IEEE Robotics and Automation Magazine, Vol. 14(1), pp. 109-118, 2007.