CV: methods of 3D sensing
Structured light;
Shape-from-shading;
Photometric stereo;
Depth-from-focus;
Structure from motion.
MSU CSE 803 Stockman
Alternate projection models





orthographic
weak perspective
simpler mathematical models
approximations often very good in
center of the FOV
can use as a first approximation and
then switch to full perspective
Perspective vs orthographic
projection
Orthographic is often
used in design and
blueprints. True
(scaled) dimensions
can be taken from
the image
Orthographic projection
Weak perspective is
orthographic and scaling
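The relation between the three projection models can be sketched in a few lines of Python. This is a minimal illustration under assumed values, not code from the course: the focal length, the reference depth `z_ref`, the sample points, and all function names are hypothetical.

```python
def perspective(point, f):
    """Full perspective: each point is divided by its own depth z."""
    x, y, z = point
    return (f * x / z, f * y / z)


def weak_perspective(point, f, z_ref):
    """Weak perspective: orthographic projection followed by one scale f / z_ref."""
    x, y, _ = point
    s = f / z_ref
    return (s * x, s * y)


def orthographic(point):
    """Orthographic: drop z and keep the true x and y dimensions."""
    x, y, _ = point
    return (x, y)


def approx_error(point, f, z_ref):
    """Image-plane distance between the perspective and weak-perspective images."""
    px, py = perspective(point, f)
    wx, wy = weak_perspective(point, f, z_ref)
    return ((px - wx) ** 2 + (py - wy) ** 2) ** 0.5


f = 50.0        # focal length in mm (assumed)
z_ref = 1000.0  # reference depth of the object in mm (assumed)
near_axis = (10.0, 5.0, 1020.0)    # close to the optical axis
off_axis = (400.0, 200.0, 1020.0)  # same depth, far from the axis

for p in (near_axis, off_axis):
    print(p, "weak-perspective error (mm):", approx_error(p, f, z_ref))
```

With these values the error for the point near the optical axis is much smaller than for the off-axis point, which matches the remark that the approximations are often very good in the center of the FOV.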
Study of approximation
P3P problem: solve for pose of object
relative to camera using 3
corresponding points (Pi, Qi)
3 points in 3D
3 corresponding
2D image points
What is the “pose” of an object?




“pose” means “position and orientation”
work in 3D camera frame defined by a
known camera with known parameters
common problem: given the image of a
known model of an object, compute the pose
of that object in the camera frame
needed for object recognition by alignment
and for robot manipulation
Recognition by alignment



Have CAD model
of objects
Detect image
features of
objects
Compute object
pose from 3D-2D
point matches
P3P solution approach
General PnP problem





“perspective n-point problem”
Given: n 3D points from some model
Given: n 2D image points known to
correspond to the 3D model points
Given: perspective transformation with known
camera parameters (not pose)
Solve for the location of all n model points in
terms of camera coordinates, or the relative
rotation and translation of the object model
Formal definition of PnP problem
Solutions exist for P3P: in most cases there are 2 solutions; in rare cases
there are 4 (see Fischler and Bolles, 1981). An iterative
solution, well suited to continuous tracking, is given below.
A simpler closed-form solution using weak perspective was given by Huttenlocher
and Ullman (1988).
Deriving 3 quadratic equations
in 3 unknowns
We know qi; by solving for the 3 ai we
will know where each Pi is located
qi are unit vectors
We know the interpoint distances
from the model
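The slide's equations did not survive extraction. A standard way to write them, assuming each model point lies along its viewing ray as P_i = a_i q_i and writing d_ij for the known model distance between P_i and P_j (the names f_12, f_23, f_13 and d_ij are notation introduced here), is:

```latex
\begin{align}
f_{12}(a_1, a_2, a_3) &= a_1^2 + a_2^2 - 2\,a_1 a_2\,(q_1 \cdot q_2) - d_{12}^2 = 0 \\
f_{23}(a_1, a_2, a_3) &= a_2^2 + a_3^2 - 2\,a_2 a_3\,(q_2 \cdot q_3) - d_{23}^2 = 0 \\
f_{13}(a_1, a_2, a_3) &= a_1^2 + a_3^2 - 2\,a_1 a_3\,(q_1 \cdot q_3) - d_{13}^2 = 0
\end{align}
```

Each equation is the law of cosines applied to the triangle formed by the camera center and two model points, since \lVert a_i q_i - a_j q_j \rVert^2 = d_{ij}^2 and the q_i are unit vectors.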
Iteratively solving 3 equations
in 3 unknowns
Want these
all to be 0
Approximate via Taylor series
Start with some guessed a1, a2, a3 and move along gradient toward 0,0,0
Solution using Newton’s Method
Our functions have simple
partial derivatives
Iteration can be very fast
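A minimal Python sketch of the Newton iteration is given here, assuming the three functions take the law-of-cosines form a_i^2 + a_j^2 - 2 a_i a_j (q_i . q_j) - d_ij^2. The synthetic test points, the starting guess, and all function names are assumptions made for illustration; the slides' own formulation may differ in detail.

```python
import math

PAIRS = ((0, 1), (1, 2), (0, 2))  # point pairs whose model distances are known


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, b):
    """Solve the 3x3 system m x = b by Cramer's rule."""
    d = det3(m)
    x = []
    for k in range(3):
        mk = [[b[r] if c == k else m[r][c] for c in range(3)] for r in range(3)]
        x.append(det3(mk) / d)
    return x


def residuals(a, cosines, dists):
    """The three functions that should all be 0 at a solution."""
    return [a[i] ** 2 + a[j] ** 2 - 2 * a[i] * a[j] * cosines[k] - dists[k] ** 2
            for k, (i, j) in enumerate(PAIRS)]


def jacobian(a, cosines):
    """Partial derivatives of each function with respect to a1, a2, a3."""
    J = [[0.0] * 3 for _ in range(3)]
    for k, (i, j) in enumerate(PAIRS):
        J[k][i] = 2 * a[i] - 2 * a[j] * cosines[k]
        J[k][j] = 2 * a[j] - 2 * a[i] * cosines[k]
    return J


def p3p_newton(q, dists, guess, iterations=20):
    """Newton's method: repeatedly solve J * step = -f and update a."""
    cosines = [sum(x * y for x, y in zip(q[i], q[j])) for i, j in PAIRS]
    a = list(guess)
    for _ in range(iterations):
        f = residuals(a, cosines, dists)
        step = solve3(jacobian(a, cosines), [-v for v in f])
        a = [ai + si for ai, si in zip(a, step)]
    return a


def norm(v):
    return math.sqrt(sum(c * c for c in v))


# Synthetic example: three points in the camera frame (assumed values).
P_true = [(0.0, 0.0, 10.0), (2.0, 0.0, 11.0), (0.0, 3.0, 12.0)]
a_true = [norm(P) for P in P_true]
q = [tuple(c / n for c in P) for P, n in zip(P_true, a_true)]
dists = [norm([u - v for u, v in zip(P_true[i], P_true[j])]) for i, j in PAIRS]

a = p3p_newton(q, dists, guess=(10.1, 11.1, 12.4))
recovered = [tuple(ai * c for c in qi) for ai, qi in zip(a, q)]
print("recovered points:", recovered)
```

The starting guess matters: Newton's method converges to whichever root is nearby, so in tracking the pose from the previous frame is a natural guess.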
Notes on this P3P method



the equations actually have 8 solutions:
4 are behind the camera (-ai = ai’);
4 are possible, but rare;
2 are common – how to get both solutions?
method used by Ohmura et al (1988) to track a
human face at workstation using points outside
the eyes and one under the nose
any 3 model points can align with any 3 image
points – can match a ship to the image of a face
Using weak perspective




algorithm by Huttenlocher and Ullman
is in closed form – no iterations
it produces 2 solutions
these solutions can be used as starting
points for the iterative perspective
method
additional point correspondences can
be used to choose correct starting point
Shape from shading methods
Computing surface normals of
diffuse objects from the intensity
of surface pixels
Surface normals in C
orthographic projection
Information used by such
algorithms





Typically use weak perspective
projection model
Brightest surface element points toward the light
Normal determined to be perpendicular
at object limb
Use differential equations to propagate
z from boundary using surface normal.
Smooth using neighbor information.
Results from Tsai-Shah Alg.
Left: from computer generated image of a vase; right:
from a bust of Mozart
Constraint on surface normals
There is a “cone of constraint” for a
normal N relative to the light source.
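The cone can be made concrete under a Lambertian (diffuse) model, which is an assumption here: intensity I = albedo * (N . L), so one intensity value fixes only the angle between N and L. In the sketch below the light direction, albedo, and function names are hypothetical.

```python
import math


def cone_half_angle(intensity, albedo=1.0):
    """Angle between N and L implied by I = albedo * cos(theta)."""
    ratio = max(0.0, min(1.0, intensity / albedo))
    return math.acos(ratio)


def lambertian_intensity(normal, light, albedo=1.0):
    """Diffuse reflection: proportional to the cosine between N and L."""
    dot = sum(n * l for n, l in zip(normal, light))
    return albedo * max(0.0, dot)


light = (0.0, 0.0, 1.0)         # light along the z axis (assumed)
theta = cone_half_angle(0.5)    # observed intensity 0.5, albedo 1

# Every unit normal on the cone around the light reproduces the same intensity.
cone_normals = [
    (math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi), math.cos(theta))
    for phi in (0.0, 1.0, 2.5, 4.0)
]
cone_intensities = [lambertian_intensity(n, light) for n in cone_normals]
print(math.degrees(theta), cone_intensities)
```

A single pixel therefore leaves one degree of freedom in the normal, which is why extra information (boundary normals, smoothness, or more lights) is needed.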
How to use the constraints?
Photometric stereo: calibrate by
lighting a sphere, get tables
Photometric stereo: 3 lights
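With three lights, the three cones of constraint intersect in a single normal. The sketch below uses the textbook linear Lambertian formulation with known light directions rather than the sphere-calibrated lookup tables mentioned in the slides, so it should be read as a swapped-in illustration; the light directions, albedo, and function names are assumed.

```python
import math


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, b):
    """Solve the 3x3 system m x = b by Cramer's rule."""
    d = det3(m)
    x = []
    for k in range(3):
        mk = [[b[r] if c == k else m[r][c] for c in range(3)] for r in range(3)]
        x.append(det3(mk) / d)
    return x


def photometric_stereo(lights, intensities):
    """Solve L g = I for g = albedo * N, then split g into albedo and unit normal."""
    g = solve3(lights, intensities)
    albedo = math.sqrt(sum(c * c for c in g))
    normal = tuple(c / albedo for c in g)
    return normal, albedo


# Three non-coplanar unit light directions (assumed).
lights = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8), (0.0, 0.6, 0.8)]

# Synthetic pixel: render intensities from a known normal and albedo.
true_normal = (0.2, -0.3, math.sqrt(1.0 - 0.2 ** 2 - 0.3 ** 2))
true_albedo = 0.8
intensities = [true_albedo * sum(l * n for l, n in zip(L, true_normal)) for L in lights]

normal, albedo = photometric_stereo(lights, intensities)
print("normal:", normal, "albedo:", albedo)
```

The linear solve assumes the pixel is lit by all three lights (no shadows) and that the surface is diffuse; specular surfaces need the more careful treatment noted in the comments slide.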
Photometric stereo: online
Comments




Photometric stereo is a brilliant idea
Rajarshi Ray got it to work well even on
specular objects, such as metal parts
Requires careful set up and calibration
Not a replacement for structured light,
which has better precision and flexibility
as evidenced by many applications.
Depth from focus
Humans and machine vision
devices can use focus in a single
image to estimate depth
Use model of thin lens
World point P
is “in focus”
at image
point p’
Automatic focus technique




Consumer camera autofocus – many methods
One method requires user to frame object in
a small window (face?)
Focus is changed automatically until the
contrast is the best
Search over focal length until small window
has the sharpest features (most energy)
Depth map from focus: concept
for an entire range of focus settings fi
    set focal plane at fi and take image
    for all pixels (x,y) in the image,
        compute contrast[fi, x, y]
for all pixels (x,y) in the image,
    set Depth[x,y] = the fi that maximizes contrast[fi, x, y]
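The concept can be sketched in pure Python on a tiny synthetic focus stack. The contrast measure (sum of squared differences to the 4-neighbours), the synthetic images, and the focus settings in mm are all assumptions for illustration; real systems use larger windows and more robust focus measures.

```python
def contrast(img, x, y):
    """Sum of squared differences to the 4-neighbours (an assumed focus measure)."""
    h, w = len(img), len(img[0])
    total = 0.0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < w and 0 <= ny < h:
            total += (img[y][x] - img[ny][nx]) ** 2
    return total


def depth_from_focus(stack, settings):
    """For each pixel, keep the focus setting whose image has the highest contrast."""
    h, w = len(stack[0]), len(stack[0][0])
    depth = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best = max(range(len(stack)), key=lambda k: contrast(stack[k], x, y))
            depth[y][x] = settings[best]
    return depth


def synthetic_image(k, width=6, height=4):
    """Checkerboard that is sharp on the left half at k == 0 and on the right at k == 2.

    Out-of-focus regions are modelled as uniform gray (0.5).
    """
    img = []
    for y in range(height):
        row = []
        for x in range(width):
            sharp_at = 0 if x < width // 2 else 2
            row.append(float((x + y) % 2) if k == sharp_at else 0.5)
        img.append(row)
    return img


settings = [500.0, 1000.0, 2000.0]  # focus distances in mm (assumed)
stack = [synthetic_image(k) for k in range(len(settings))]
depth = depth_from_focus(stack, settings)
for row in depth:
    print(row)
```

Note that the depth map stores the setting at which contrast peaks (an argmax), not the peak contrast value itself.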
A look at blur vs focal length
Can define resolution limit in line
pairs per inch; can define depth-of-field of sensing
Points P create a blurred image
on non optimal image planes
Point P is in focus on plane S, but out of focus on
planes S’ and S”
How many line pairs can be resolved?



imagine a target that is just a set of
parallel black lines on white paper
if lines are far apart relative to the blur
radius b, then their image will be a set
of lines
if the lines are close relative to blur
radius b, then a gray image without
clear lines will be observed
Thin lens equation relates object
depth to image plane via f
For a world point P in focus, the thin lens equation is:
1/f = 1/u + 1/v
Derivation of thin lens
equation from geometry
To compute depth-of-field




the blur changes for different locations
via simple geometry
move image forward – get blur
move image backward – get blur
move image plane to extremes within
limiting blur b and compute depth of
field
extreme locations of v set
the extremes of u
a is the aperture.
By similar triangles, b/a = (v’ - v)/v,
so v’/v = (a + b)/a
Compute near extreme of u
Apply thin lens
equation with v’
Note that if b=0, we
obtain Un = U
Compute far extreme of u
DEF: The depth of field is the difference between the far and near
object planes (Ur – Un) for the given imaging parameters and blur b.
Smaller focal lengths f yield larger DOF.
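The closed-form extremes were on slide images that did not survive extraction. They can be reconstructed from v'/v = (a + b)/a and the thin lens equation 1/f = 1/u + 1/v; the far-side relation v''/v = (a - b)/a is assumed here by symmetry, and v'' is notation introduced for this sketch.

```latex
\begin{align}
\frac{1}{U_n} &= \frac{1}{f} - \frac{1}{v'} = \frac{1}{f} - \frac{a}{(a+b)\,v},
  \qquad \frac{1}{v} = \frac{u - f}{u f} \\
U_n &= \frac{u\,f\,(a + b)}{a f + u b} = \frac{u\,(a + b)}{a + u b / f} \\
U_r &= \frac{u\,f\,(a - b)}{a f - u b} = \frac{u\,(a - b)}{a - u b / f}
\end{align}
```

Setting b = 0 gives U_n = U_r = u, and the depth of field U_r - U_n grows as f shrinks because the term u b / f grows.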
Example computation



assume f = 50 mm, u = 1000 mm,
b = 0.025mm, a = 5 mm
Un = 1000 (5 + 0.025) / (5 + 25/50)
= 1000 (5.025)/5.5 = 914
Ur = 1000 (5 – 0.025) / (5 – 25/50)
= 1000 (4.975)/4.5 = 1106
Example computation
assume f = 25 mm, u = 1000 mm,
b = 0.025mm, a = 5 mm
Un = 1000 (5 + 0.025) / (5 + 25/25)
= 1000 (5.025)/6.0 = 838
Ur = 1000 (5 – 0.025) / (5 – 25/25)
= 1000 (4.975)/4.0 = 1244
A smaller f gives larger DOF

Large a needed to pinpoint u
changing the aperture to 10 mm (f = 50 mm, u = 1000 mm, b = 0.025 mm as before)
Un = 955mm
Ur = 1050mm
changing the aperture to 20 mm
Un = 977mm
Ur = 1024mm
(See work of Murali Subbarao)
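The example computations can be reproduced with a short Python function. It assumes the closed forms Un = u(a + b)/(a + ub/f) and Ur = u(a - b)/(a - ub/f), which agree with the numbers on the slides; the function name is hypothetical and all lengths are in mm.

```python
def depth_of_field_limits(f, u, a, b):
    """Near and far object distances that stay within blur b.

    f: focal length, u: focused object distance, a: aperture, b: limiting blur.
    """
    shift = u * b / f
    near = u * (a + b) / (a + shift)
    far = u * (a - b) / (a - shift)
    return near, far


cases = [
    (50.0, 1000.0, 5.0, 0.025),
    (25.0, 1000.0, 5.0, 0.025),
    (50.0, 1000.0, 10.0, 0.025),
    (50.0, 1000.0, 20.0, 0.025),
]
for f, u, a, b in cases:
    near, far = depth_of_field_limits(f, u, a, b)
    print(f"f={f} a={a}: Un={near:.1f} Ur={far:.1f} DOF={far - near:.1f}")
```

A larger aperture narrows the interval around u, which is what makes focus useful for pinpointing depth.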

Structure from Motion
A moving camera/computer
computes the 3D structure of the
scene and its own motion
Sensing 3D scene structure via
a moving camera
We now have two views
over time/space
compared to stereo
which has multiple
views at the same time.
Assumptions for now



The scene is rigid.
The scene may move or the camera
may move giving a sequence of 2 or
more 2D images
Corresponding 2D image points (Pi, Pj)
are available across the images
What can be computed


The 3D coordinates of the scene points
The motion of the camera
Camera sees
many frames
of 2D points
Rigid scene with
many 3D
interest points
From Jebara, Azarbayejani, Pentland
From 2D point correspondences,
compute 3D points WP and TR
applications



We can compute a 3D model of a
landmark from a video
We can create 3D television!
We can compute the trajectory of the
sensor relative to the 3D object points
Use only 2D correspondences, SfM
can compute 3D jig pts
… up to one scale factor.
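The scale ambiguity is easy to check numerically: scaling the whole scene and the camera positions by the same factor leaves every image point unchanged. The sketch below assumes unrotated pinhole cameras looking along +z; the scene points, camera positions, and function names are hypothetical.

```python
def project(point, camera_center, f=1.0):
    """Pinhole projection for a camera at camera_center looking along +z (no rotation)."""
    x, y, z = (p - c for p, c in zip(point, camera_center))
    return (f * x / z, f * y / z)


def scale(v, s):
    return tuple(s * c for c in v)


scene = [(0.5, 0.2, 5.0), (-1.0, 0.8, 7.0), (0.3, -0.6, 4.0)]
cameras = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.2)]  # two views of the rigid scene
s = 3.0                                        # arbitrary global scale factor

original = [[project(P, C) for P in scene] for C in cameras]
scaled = [[project(scale(P, s), scale(C, s)) for P in scene] for C in cameras]
print(original)
print(scaled)
```

Because the two sets of images agree, 2D correspondences alone cannot tell a small nearby scene from a large distant one; one known length fixes the scale.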
http://www1.cs.columbia.edu/~jebara/htmlpapers/SFM/sfm.html
Jebara, Azarbayejani, Pentland
a) Two video frames with
corresponding 2D
interest points. 3D
points can be computed
from SfM method.
b) Some edges detected
from 2D gradients.
c) Texture mapping from
2D frames onto 3D
polyhedral model.
d) 3D model can be
viewed arbitrarily!
Virtual museums; 3D TV?




Much work, and software, from about 10
years ago.
3D models, including shape and texture can
be made of famous places (Notre Dame, Taj
Mahal, Titanic, etc.) and made available to
those who cannot travel to see the real
landmark.
Theoretically, only quality video is required.
Usually, some handwork is needed.
Shape from Motion methods





Typically require careful mathematics
EX: from 5 matched points, get 10
equations to estimate 10 unknowns;
also a more popular 8 pt linear method
Effects of noise imply many matches
needed, still can have large errors
Methods can run in real time
Rich literature still evolving
Special mathematics



Epipolar geometry is modeled
Fundamental matrix: computed from a
pair of cameras and point matches
Essential matrix: specialization of
fundamental matrix when calibration is
available
Epipolar constraint on view pair
A) Relative orientation
of cameras C1 and C2
can be computed from
many point matches
B) 3D point positions (P) can also be computed from many point
matches. Fundamental matrix represents the constraints.
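For calibrated cameras the constraint can be checked directly with the essential matrix. The sketch assumes the convention P2 = R P1 + t between the two camera frames, so that E = [t]x R and matched normalized image points satisfy x2^T E x1 = 0; the rotation, translation, 3D points, and function names are assumed values for illustration.

```python
import math


def cross_matrix(t):
    """Skew-symmetric matrix [t]x such that [t]x v equals the cross product t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def normalized(p):
    """Homogeneous image point of a 3D point for a camera with unit focal length."""
    return [p[0] / p[2], p[1] / p[2], 1.0]


def epipolar_residual(E, x1, x2):
    """Value of x2^T E x1, which is 0 for a correct match."""
    return sum(a * b for a, b in zip(x2, matvec(E, x1)))


theta = 0.1  # rotation about the y axis in radians (assumed)
R = [[math.cos(theta), 0.0, math.sin(theta)],
     [0.0, 1.0, 0.0],
     [-math.sin(theta), 0.0, math.cos(theta)]]
t = [1.0, 0.0, 0.1]
E = matmul(cross_matrix(t), R)

points_cam1 = [[0.5, 0.2, 5.0], [-1.0, 0.8, 7.0], [0.3, -0.6, 4.0]]
points_cam2 = [[r + ti for r, ti in zip(matvec(R, P), t)] for P in points_cam1]

matched = [epipolar_residual(E, normalized(P1), normalized(P2))
           for P1, P2 in zip(points_cam1, points_cam2)]
mismatched = epipolar_residual(E, normalized(points_cam1[0]), normalized(points_cam2[1]))
print("matched:", matched)
print("mismatched:", mismatched)
```

Correct matches give residuals at rounding-error level, while the deliberately wrong pairing does not, which is how the constraint helps reject bad correspondences.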
Revisit: Internal parameters of
the camera: 5,6,7 ?






Properties of actual
camera, not its pose
Actual focal length f
Actual pixel size Sx, Sy
Actual location Ix, Iy of
optical axis on image
array
Can have skew Sk
Can have radial distortion
of the lens r.
6 Extrinsic/external parameters




Define the pose of the camera in the
world
3 rotation parameters relative to W
3 translation parameters
Projection of world to image
IP = Mi Me WP
where Me has 6 parameters and Mi has 5
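The composition of the two matrices can be sketched as follows. The intrinsic matrix is built from f, Sx, Sy, Ix, Iy and skew in one common layout, which is an assumption here, and radial distortion is ignored; the numeric values and function names are hypothetical.

```python
def intrinsic_matrix(f, sx, sy, ix, iy, skew=0.0):
    """Mi: focal length over pixel size, skew, and optical-axis location (5 parameters)."""
    return [[f / sx, skew, ix],
            [0.0, f / sy, iy],
            [0.0, 0.0, 1.0]]


def project(world_point, R, t, K):
    """Apply Me (rotation R, translation t), then Mi (K), then divide by depth."""
    cam = [sum(R[i][k] * world_point[k] for k in range(3)) + t[i] for i in range(3)]
    h = [sum(K[i][k] * cam[k] for k in range(3)) for i in range(3)]
    return (h[0] / h[2], h[1] / h[2])


# Assumed camera: f = 8 mm, 0.01 mm pixels, optical axis at pixel (320, 240).
K = intrinsic_matrix(f=8.0, sx=0.01, sy=0.01, ix=320.0, iy=240.0)
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # camera aligned with W
t = [0.0, 0.0, 1.0]                                       # W origin is 1 unit ahead

on_axis = project((0.0, 0.0, 3.0), R, t, K)
off_axis = project((0.5, 0.0, 3.0), R, t, K)
print(on_axis, off_axis)
```

A point on the optical axis lands at (Ix, Iy), and moving the point sideways shifts its image in proportion to f/Sx divided by the depth.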
Fundamental matrix F





Represents epipolar structure of 2 views of
scene
Depends only on the internal parameters of the
camera and the relative pose of the two views
Not dependent on the scene
Can compute F, and E, and more from many
correspondences: lots of literature and public
software
What actual mathematical methods? What point
detection and point correspondence methods?
Summary of shape-from methods



each uses a simple source of information;
math model often uses minimal information
Psychologist J.J. Gibson, and others, were
aware of information used by humans
David Marr, around 1980, proposed study of
Type-I AI research
* study information processing problem
* identify what information is used
* develop/study algorithm choices
* favor algorithm suited for human architecture
Recent years



Trend is away from minimal models;
minimal models are fragile
Multiple channels cooperate and
compete (see experiments by
Ramachandran at UCSD)
Human brain is more plastic than
formerly believed; many things are
learned, and new neurons and connections form