Introduction

advertisement
3D Computer Vision
3D Vision
and Video Computing
CSC I6716
Spring2011
Topic 1 of Part II
Camera Models
Zhigang Zhu, City College of New York zhu@cs.ccny.cuny.edu
3D Computer Vision
and Video Computing

Closely Related Disciplines





Image Processing – images to mages
Computer Graphics – models to images
Computer Vision – images to models
Photogrammetry – obtaining accurate measurements from images
What is 3-D ( three dimensional) Vision?




3D Vision
Motivation: making computers see (the 3D world as humans do)
Computer Vision: 2D images to 3D structure
Applications : robotics / VR /Image-based rendering/ 3D video
Lectures on 3-D Vision Fundamentals




Camera Geometric Models (1 lecture)
Camera Calibration (2 lectures)
Stereo (2 lectures)
Motion (2 lectures)
3D Computer Vision
and Video Computing

Geometric Projection of a Camera




Pinhole camera model
Perspective projection
Weak-Perspective Projection
Camera Parameters


Intrinsic Parameters: define mapping from 3D to 2D
Extrinsic parameters: define viewpoint and viewing direction


Basic Vector and Matrix Operations, Rotation
Camera Models Revisited

Linear Version of the Projection Transformation Equation





Lecture Outline
Perspective Camera Model
Weak-Perspective Camera Model
Affine Camera Model
Camera Model for Planes
Summary
3D Computer Vision
and Video Computing

Camera Geometric Models




Lecture Assumptions
Knowledge about 2D and 3D geometric transformations
Linear algebra (vector, matrix)
This lecture is only about geometry
Goal
Build up relation between 2D images and 3D scenes
-3D Graphics (rendering): from 3D to 2D
-3D Vision (stereo and motion): from 2D to 3D
-Calibration: Determning the parameters for mapping
3D Computer Vision
and Video Computing
Image Formation
3D Computer Vision
Image Formation
and Video Computing
Light (Energy) Source
Surface
Imaging Plane
Camera:
Spec &
Pose
3D
Scene
Pinhole Lens
World
Optics
Sensor
Signal
2D
Image
3D Computer Vision
Pinhole Camera Model
and Video Computing
Image Plane
Optical Axis
f
Pinhole lens



Pin-hole is the basis for most graphics and vision
 Derived from physical construction of early cameras
 Mathematics is very straightforward
3D World projected to 2D Image
 Image inverted, size reduced
 Image is a 2D plane: No direct depth information
Perspective projection
 f called the focal length of the lens
 given image size, change f will change FOV and figure sizes
3D Computer Vision
Focal Length, FOV
and Video Computing

Consider case with object on the optical axis:
Image plane
f
z
viewpoint





Optical axis: the direction of imaging
Image plane: a plane perpendicular to the optical axis
Center of Projection (pinhole), focal point, viewpoint, nodal point
Focal length: distance from focal point to the image plane
FOV : Field of View – viewing angles in horizontal and vertical
directions
3D Computer Vision
Focal Length, FOV
and Video Computing

Consider case with object on the optical axis:
Image plane
z
f
Out of view






Optical axis: the direction of imaging
Image plane: a plane perpendicular to the optical axis
Center of Projection (pinhole), focal point, viewpoint, , nodal point
Focal length: distance from focal point to the image plane
FOV : Field of View – viewing angles in horizontal and vertical
directions
Increasing f will enlarge figures, but decrease FOV
3D Computer Vision
Equivalent Geometry
and Video Computing

Consider case with object on the optical axis:
f
z

More convenient with upright image:
z
f
Projection plane z = f

Equivalent mathematically
3D Computer Vision
Perspective Projection
and Video Computing

Compute the image coordinates of p in terms of the
world (camera) coordinates of P.
y
Y
p(x, y)
x
P(X,Y,Z )
X
0
Z
Z=f



Origin of camera at center of projection
Z axis along optical axis
Image Plane at Z = f; x // X and y//Y
X
x f
Z
Y
y f
Z
3D Computer Vision
Reverse Projection
and Video Computing

Given a center of projection and image coordinates of a
point, it is not possible to recover the 3D depth of the point
from a single image.
P(X,Y,Z) can be anywhere along this line
p(x,y)
All points on this line
have image coordinates (x,y).
In general, at least two images of the same point taken
from two different locations are required to recover depth.
3D Computer Vision
and Video Computing
Pinhole camera image
Amsterdam : what do you see in this picture?
straight
line
size
parallelism/angle
shape
shape
of planes
depth
Photo by Robert Kosara, robert@kosara.net
http://www.kosara.net/gallery/pinholeamsterdam/pic01.html
3D Computer Vision
and Video Computing
Pinhole camera image
Amsterdam
straight line
size
parallelism/angle
shape
shape
of planes
depth
Photo by Robert Kosara, robert@kosara.net
http://www.kosara.net/gallery/pinholeamsterdam/pic01.html
3D Computer Vision
and Video Computing
Pinhole camera image
Amsterdam
straight line
size
parallelism/angle
shape
shape
of planes
depth
Photo by Robert Kosara, robert@kosara.net
http://www.kosara.net/gallery/pinholeamsterdam/pic01.html
3D Computer Vision
and Video Computing
Pinhole camera image
Amsterdam
straight line
size
parallelism/angle
shape
shape
of planes
depth
Photo by Robert Kosara, robert@kosara.net
http://www.kosara.net/gallery/pinholeamsterdam/pic01.html
3D Computer Vision
and Video Computing
Pinhole camera image
Amsterdam
straight line
size
parallelism/angle
shape
shape
of planes
depth
Photo by Robert Kosara, robert@kosara.net
http://www.kosara.net/gallery/pinholeamsterdam/pic01.html
3D Computer Vision
and Video Computing
Pinhole camera image
Amsterdam
straight line
size
parallelism/angle
shape
shape

of planes
parallel to image
depth
Photo by Robert Kosara, robert@kosara.net
http://www.kosara.net/gallery/pinholeamsterdam/pic01.html
3D Computer Vision
and Video Computing
Pinhole camera image
Amsterdam: what do you see?
straight line
size
parallelism/angle
shape
shape

of planes
parallel to image
Depth
?
stereo
motion
size
structure
…
- We see spatial shapes rather than individual pixels
- Knowledge: top-down vision belongs to human
- Stereo &Motion most successful in 3D CV & application
- You can see it but you don't know how…
3D Computer Vision
and Video Computing
Yet
other pinhole camera images
Rabbit or Man?
Markus Raetz, Metamorphose II, 1991-92, cast iron, 15 1/4 x 12 x 12 inches
Fine Art Center University Gallery, Sep 15 – Oct 26
3D Computer Vision
and Video Computing
Yet
other pinhole camera images
2D projections are not the “same” as the real
object as we usually see everyday!
Markus Raetz, Metamorphose II, 1991-92, cast iron, 15 1/4 x 12 x 12 inches
Fine Art Center University Gallery, Sep 15 – Oct 26
3D Computer Vision
and Video Computing
It’s real!
3D Computer Vision
and Video Computing

Weak Perspective Projection
Average depth Z is much larger than the relative distance
between any two scene points measured along the optical axis
y
Y
p(x, y)
x
P(X,Y,Z )
X
0
Z
Z=f

A sequence of two transformations



Orthographic projection : parallel rays
Isotropic scaling : f/Z
Linear Model

Preserve angles and shapes
X
x f
Z
Y
y f
Z
3D Computer Vision
Camera Parameters
and Video Computing
Pose / Camera
xim
Image
frame
(xim,yim)
yim

y
x
p
Coordinate Systems





Frame
Grabber
O
Frame coordinates (xim, yim) pixels
Image coordinates (x,y) in mm
Camera coordinates (X,Y,Z)
World coordinates (Xw,Yw,Zw)
Camera Parameters


Object / World
Zw
P
Pw
Xw
Yw
Intrinsic Parameters (of the camera and the frame grabber): link the
frame coordinates of an image point with its corresponding
camera coordinates
Extrinsic parameters: define the location and orientation of the
camera coordinate system with respect to the world coordinate
system
3D Computer Vision
and Video Computing
y
(0,0)
x
p (x,y,f)
O




Image center
Directions of axes
Pixel size
ox
xim
o
y
yim
Pixel
(xim,yim)
x  ( xim  ox ) s x
y  ( yim  o y ) s y
From 3D to 2D


Size:
(sx,sy)
From image to frame

Intrinsic Parameters (I)
Perspective projection
Intrinsic Parameters



(ox ,oy) : image center (in pixels)
(sx ,sy) : effective size of the pixel (in mm)
f: focal length
X
x f
Z
Y
y f
Z
3D Computer Vision
and Video Computing
(x, y)


Intrinsic Parameters (II)
k1 , k2
(xd, yd)
Lens
Distortions
Modeled as simple radial distortions




r2 = xd2+yd2
x  xd (1  k1r 2  k2r 4 )
(xd , yd) distorted points
2
4
y

y
(
1

k
r

k
r
d
1
2 )
k1 , k2: distortion coefficients
A model with k2 =0 is still accurate for a
CCD sensor of 500x500 with ~5 pixels
distortion on the outer boundary
3D Computer Vision
Extrinsic Parameters
and Video Computing
xim
(xim,yim)
yim

O
y
x
p
From World to Camera
Zw
P  R Pw  T

Extrinsic Parameters


P
Pw
Xw
T
Yw
A 3-D translation vector, T, describing the relative locations of the
origins of the two coordinate systems (what’s it?)
A 3x3 rotation matrix, R, an orthogonal matrix that brings the
corresponding axes of the two systems onto each other
3D Computer Vision
and Video Computing
Linear

A point as a 2D/ 3D vector p   x   ( x, y)T




Algebra: Vector and Matrix
Image point: 2D vector
Scene point: 3D vector
Translation: 3D vector
 y
 
T: Transpose
P  ( X , Y , Z )T
T  (Tx , Ty , Tz )T
Vector Operations

Addition:



Translation of a 3D vector
Dot product ( a scalar):

P  Pw  T  ( X w  Tx , Yw  Ty , Z w  Tz )T
a.b = |a||b|cosq
c  a  b  aT b
Cross product (a vector)

Generates a new vector that is orthogonal to both of them
c  ab
a x b = (a2b3 - a3b2)i + (a3b1 - a1b3)j + (a1b2 - a2b1)k
3D Computer Vision
and Video Computing
Linear

Rotation: 3x3 matrix

Orthogonal :
R 1  RT , i.e. RRT  RT R  I



Algebra: Vector and Matrix
 r11
R  rij
 r21
33
 r31
 
r12
r22
r32
r13  R1T 
 T

r23   R 2 
r33  RT3 
 
9 elements => 3+3 constraints (orthogonal/cross ) => 2+2 constraints
(unit vectors) => 3 DOF ? (degrees of freedom, orthogonal/dot)
How to generate R from three angles? (next few slides)
Matrix Operations

R Pw +T= ? - Points in the World are projected on three new axes
(of the camera system) and translated to a new origin
 r11 X w  r12Yw  r13 Z w  Tx   R1T Pw  Tx 

 

T
P  RPw  T   r21 X w  r22Yw  r23 Z w  T y   R 2 Pw  T y 

  T

R
P

T
r
X

r
Y

r
Z

T
z 
 31 w 32 w 33 w z   3 w
3D Computer Vision
and Video Computing

Rotation: from Angles to Matrix
Rotation around the Axes
 Result of three consecutive
rotations around the
coordinate axes
O

Zw
R  R R  R

Notes:



Only three rotations
Every time around one axis
Bring corresponding axes to each other


Xw = X, Yw = Y, Zw = Z
First step (e.g.) Bring Xw to X
Xw

Yw

3D Computer Vision
and Video Computing
Rotation: from Angles to Matrix

cos 
R   sin 
 0

cos 
0
0
0
1
Zw
O
Yw
Rotation  around the Zw Axis





 sin 
Rotate in XwOYw plane
Goal: Bring Xw to X
But X is not in XwOYw
Xw
YwX X in XwOZw (Yw XwOZw)
 Yw in YOZ ( X YOZ)
Next time rotation around Yw
3D Computer Vision
and Video Computing
Rotation: from Angles to Matrix

cos 
R   sin 
 0

cos 
0
0
0
1
Zw
Xw
Rotation  around the Zw Axis



 sin 
Rotate in XwOYw plane so that
YwX X in XwOZw (YwXwOZw)
 Yw in YOZ (  XYOZ)
Zw does not change
O
Yw
3D Computer Vision
and Video Computing
cos 
R    0
 sin 

Zw
Xw
Rotation  around the Yw Axis



0  sin  
1
0 
0 cos  
Rotation: from Angles to Matrix
O

Rotate in XwOZw plane so that
Xw = X  Zw in YOZ (& Yw in YOZ)
Yw does not change
Yw
3D Computer Vision
and Video Computing
cos 
R    0
 sin 

0  sin  
1
0 
0 cos  
Rotation  around the Yw Axis



Rotation: from Angles to Matrix
O

Rotate in XwOZw plane so that
Xw = X  Zw in YOZ (& Yw in YOZ)
Yw does not change
Yw
Xw
Zw
3D Computer Vision
and Video Computing
0
1
R  0 cos 
0 sin 


 sin  
cos  
0
Rotation  around the Xw(X) Axis



Rotation: from Angles to Matrix
Rotate in YwOZw plane so that
Yw = Y, Zw = Z (& Xw = X)
Xw does not change

O
Yw
Xw
Zw
3D Computer Vision
and Video Computing
0
1
R  0 cos 
0 sin 
Rotation: from Angles to Matrix

 sin  
cos  
Yw
0

O
Xw
Zw

Rotation  around the Xw(X) Axis



Rotate in YwOZw plane so that
Yw = Y, Zw = Z (& Xw = X)
Xw does not change
3D Computer Vision
and Video Computing
Rotation: from Angles to Matrix
Appendix A.9 of the textbook

Rotation around the Axes

Result of three consecutive
rotations around the coordinate
axes
O
R  R R  R


Zw
Notes:






Rotation directions
The order of multiplications matters:
,,
Same R, 6 different sets of ,,
R Non-linear function of ,,
R is orthogonal
It’s easy to compute angles from R
cos  cos 

R   sin  sin  cos   cos  sin 
 cos  sin  cos   sin  sin 
 cos  sin 
sin  sin  sin   cos  cos 
 cos  sin  sin   sin  cos 
Xw

Yw

 sin 

 sin  cos  
cos  cos  
3D Computer Vision
and Video Computing
Rotation- Axis and Angle
Appendix A.9 of the textbook


According to Euler’s Theorem, any 3D rotation can be
described by a rotating angle, q, around an axis
defined by an unit vector n = [n1, n2, n3]T.
Three degrees of freedom – why?
 n12

R  I cos q  n2 n1
 n3 n1

n1n2
n22
n3 n2
n1n3 
 0

n2 n3  (1  cos q )   n3
 n2
n32 
 n3
0
n1
n2 
 n1  sin q
0 
3D Computer Vision
Linear
Version
and Video
Computing

World to Camera






Camera: P = (X,Y,Z)T
Image: p = (x,y)T
Not linear equations
Image to Frame



 r11 X w  r12Yw  r13 Z w  Tx   R1T Pw  Tx 

 

P  RPw  T   r21 X w  r22Yw  r23 Z w  T y   RT2 Pw  T y 

  T

r
X

r
Y

r
Z

T
 31 w 32 w 33 w z   R 3 Pw  Tz 
Camera to Image


Camera: P = (X,Y,Z)T
World: Pw = (Xw,Yw,Zw)T
Transform: R, T
of Perspective Projection
Neglecting distortion
Frame (xim, yim)T
World to Frame


(Xw,Yw,Zw)T -> (xim, yim)T
Effective focal lengths


fx = f/sx, fy=f/sy
Three are not independent
( x, y )  ( f
X
Y
, f )
Z
Z
x  ( xim  ox ) s x
y  ( yim  o y ) s y
r X  r Y  r Z T
xim  ox   f x 11 w 12 w 13 w x
r31 X w  r32Yw  r33 Z w  Tz
yim  o y   f y
r21 X w  r22Yw  r23 Z w  Ty
r31 X w  r32Yw  r33 Z w  Tz
3D Computer Vision
and Video Computing

Projective Space

Add fourth coordinate





Only extrinsic parameters
World to camera
3x3 Matrix Mint





X
 x1 
w


 
 x2   M int M ext  Yw 
Z 
x 
 3
 w 
1 
 xim   x1 / x3 

  

 yim   x2 / x3 
x1/x3 =xim, x2/x3 =yim
3x4 Matrix Mext


Pw =
(Xw,Yw,Zw, 1)T
Define (x1,x2,x3)T such that


Linear Matrix Equation of
perspective projection
Only intrinsic parameters
Camera to frame
 r11
M ext  r21
 r31
r12
r22
r32
 f x
M int   0
 0
0
 fy
0
r13 Tx  R1T

r23 T y   RT2
r33 Tz  RT3

Tx 

Ty 

Tz 
ox 
o y 
1 
Simple Matrix Product! Projective Matrix M= MintMext



(Xw,Yw,Zw)T -> (xim, yim)T
Linear Transform from projective space to projective plane
M defined up to a scale factor – 11 independent entries
3D Computer Vision
and Video Computing

Perspective Camera Model

Making some assumptions




Known center: Ox = Oy = 0
Square pixel: Sx = Sy = 1
  fr11
M   fr21
 r31
 fr12
 fr22
r32
 fr13  fTx 
 fr23  fTy 
r33
Tz 




11 independent entries <-> 7 parameters
Weak-Perspective Camera Model


Average Distance Z >> Range dZ
Define centroid vector Pw
Z  Z  RT
3 Pw  Tz


Three Camera Models
8 independent entries
Affine Camera Model



Mathematical Generalization of Weak-Pers
Doesn’t correspond to physical camera
But simple equation and appealing geometry




M wp  


Doesn’t preserve angle BUT parallelism
8 independent entries
fr11
fr21
0
fr12
fr22
0
 a11
M af  a21
 0
fr13
 fTx 

fr23
 fTy 

0
RT
3 Pw  Tz 
a12
a22
0
a13 b1 
a23 b2 
0 b3 
3D Computer Vision
and Video Computing

Planes are very common in the Man-Made World
nx X w  n yYw  nz Z w  d




One more constraint for all points: Zw is a function of Xw and Yw
Zw=0
Pw =(Xw, Yw,0, 1)T
3D point -> 2D point
Projective Model of a Plane


n T Pw  d
Special case: Ground Plane


Camera Models for a Plane
8 independent entries
General Form ?

8 independent entries
 x1    fr11
  
 x2    fr21
x   r
 3   31
 fr12
 fr22
r32
 Xw
 fr13  fTx 
 Yw

 fr23  fTy 
Z 0
r33
Tz  w
1
3D Computer Vision
and Video Computing

Camera Models for a Plane
A Plane in the World
nx X w  n yYw  nz Z w  d




Zw=0
Pw =(Xw, Yw,0, 1)T
3D point -> 2D point
 x1    fr11
  
 x2    fr21
x   r
 3   31
 fr12
 fr22
r32
Projective Model of Zw=0


One more constraint for all points: Zw is a function of Xw and Yw
Special case: Ground Plane


n T Pw  d
8 independent entries
General Form ?

 Xw
 fr13  fTx 
 Yw

 fr23  fTy 
Z 0
r33
Tz  w
1
8 independent entries
 x1    fr11
  
 x2    fr21
x   r
 3   31
 fr12
 fr22
r32
 fTx  X w 



 fT23  Yw 
Tz 1 
3D Computer Vision
and Video Computing

Camera Models for a Plane
A Plane in the World
nx X w  n yYw  nz Z w  d




 x1    fr11
  
 x2    fr21
x   r
 3   31
Zw=0
Pw =(Xw, Yw,0, 1)T
3D point -> 2D point
Projective Model of Zw=0


One more constraint for all points: Zw is a function of Xw and Yw
Special case: Ground Plane


n T Pw  d
 fr12
 fr22
r32
 Xw

 fr13  fTx 
 Yw 

 fr23  fTy 
Zw 

r33
Tz 

1 
8 independent entries
General Form ?

 x1    f (r11  nx r13 )  f (r12  n y r13 )  f (dr13  Tx )  X w 

  

x


f
(
r

n
r
)

f
(
r

n
r
)

f
(
dr

T
)
Y
nz = 1
 2 
21
x 23
22
y 23
23
y  w 
(r32  n y r33 )
(dr33  Tz ) 1 
Z w  d  nx X w  n yYw  x3   (r31  nx r33 )

8 independent entries

2D (xim,yim) -> 3D (Xw, Yw, Zw) ?
3D Computer Vision
and Video Computing

Graphics /Rendering

From 3D world to 2D image




Changing viewpoints and directions
Changing focal length
Fast rendering algorithms
Vision / Reconstruction

From 2D image to 3D model





Applications and Issues



 xim   x1 / x3 

  

y
x
/
x
 im   2 3 
Inverse problem
Much harder / unsolved
Robust algorithms for matching and parameter estimation
Need to estimate camera parameters first
Calibration





X
 x1 
w


 
 x2   M int M ext  Yw 
Z 
x 
 3
 w 
1 
Find intrinsic & extrinsic parameters
Given image-world point pairs
Probably a partially solved problem ?
11 independent entries

<-> 10 parameters: fx, fy, ox, oy, ,,, Tx,Ty,Tz
 f x
M int   0
 0
 r11
M ext  r21
r31
0
 fy
0
r12
r22
r32
ox 
o y 
1 
r13 Tx 
r23 Ty 
r33 Tz 
3D Computer Vision
and Video Computing

Geometric Projection of a Camera




Pinhole camera model
Perspective projection
Weak-Perspective Projection
Camera Parameters (10 or 11)



Camera Model Summary
Intrinsic Parameters: f, ox,oy, sx,sy,k1: 4 or 5 independent
parameters
Extrinsic parameters: R, T – 6 DOF (degrees of freedom)
Linear Equations of Camera Models (without distortion)





General Projection Transformation Equation : 11 parameters
Perspective Camera Model: 11 parameters
Weak-Perspective Camera Model: 8 parameters
Affine Camera Model: generalization of weak-perspective: 8
Projective transformation of planes: 8 parameters
3D Computer Vision
Next
and Video Computing

Determining the value of the extrinsic and intrinsic parameters of
a camera
Calibration
(Ch. 6)




X
 x1 
w


 
 x2   M int M ext  Yw 
Z 
x 
 3
 w 
1 
 f x
M int   0
 0
 r11
M ext  r21
r31
0
 fy
0
r12
r22
r32
ox 
o y 
1 
r13 Tx 
r23 Ty 
r33 Tz 
Download