Classification with Invariants

Invariants (concluded); Lowe and Biederman
Announcements
• No class Thursday. Attend Rao lecture.
• Double-check your paper assignments.
Key Points
• Rigid rotation is a 3x3 orthonormal matrix.
• 3-D translation is a 3x4 matrix.
• 3-D translation + rotation is a 3x4 matrix.
• Scaled orthographic projection: remove row three and allow scaling.
• Planar object: remove column 3.
• Projective transformations:
– Rigid rotation of a planar object is represented by a 3x3 matrix.
– When we write in homogeneous coordinates, projection is implicit.
– When we drop rigidity, the 3x3 matrix becomes arbitrary.
Projective
For a planar object (z = 0), rigid rotation and translation followed by projection reduce to a 3x3 matrix acting on homogeneous coordinates; dropping rigidity gives an arbitrary 3x3 matrix:

$$
\begin{pmatrix} u \\ v \\ w \end{pmatrix}
=
\begin{pmatrix}
r_{1,1} & r_{1,2} & r_{1,3} & t_x \\
r_{2,1} & r_{2,2} & r_{2,3} & t_y \\
r_{3,1} & r_{3,2} & r_{3,3} & t_z
\end{pmatrix}
\begin{pmatrix} x \\ y \\ 0 \\ 1 \end{pmatrix}
=
\begin{pmatrix}
r_{1,1} & r_{1,2} & t_x \\
r_{2,1} & r_{2,2} & t_y \\
r_{3,1} & r_{3,2} & t_z
\end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
\;\longrightarrow\;
\begin{pmatrix}
a & b & c \\
d & e & f \\
g & h & 1
\end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
=
\begin{pmatrix} u \\ v \\ w \end{pmatrix},
$$

with the image point given by (u/w, v/w).
Rigid Rotation and Translation vs. Projective Transformation
Rigid rotation and translation (planar object):

$$
\begin{pmatrix}
r_{1,1} & r_{1,2} & t_x \\
r_{2,1} & r_{2,2} & t_y \\
r_{3,1} & r_{3,2} & t_z
\end{pmatrix}
$$

The notation suggests that the first two columns are orthonormal, and the transformation has 6 degrees of freedom.

Projective transformation:

$$
\begin{pmatrix}
a & b & c \\
d & e & f \\
g & h & 1
\end{pmatrix}
$$

The notation suggests an unconstrained linear transformation. Points in homogeneous coordinates are equivalent up to scale, so the transformation has 8 degrees of freedom, because its overall scale is arbitrary.
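A small check of the scale-ambiguity point, assuming numpy (the matrix and point are invented): H and any nonzero multiple of H map every point to the same image location, so only 8 of the 9 entries matter.

```python
import numpy as np

H = np.array([[1.1, 0.2, 0.5],
              [0.0, 0.9, -0.3],
              [0.01, 0.02, 1.0]])

p = np.array([1.0, 2.0, 1.0])       # point in homogeneous coordinates

def dehomogenize(q):
    # Divide by the last coordinate to recover (u/w, v/w).
    return q[:2] / q[2]

# Scaling H by any nonzero factor leaves the image point unchanged.
assert np.allclose(dehomogenize(H @ p), dehomogenize((3.7 * H) @ p))
```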
Lines: Parameterization
• Equation for a line: ax + by + c = 0.
• Parameterize the line as l = (a, b, c)^T.
• p = (x, y, 1)^T is on the line iff <p, l> = 0 (see the sketch below).
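A minimal sketch of this parameterization, assuming numpy (the line and points are chosen arbitrarily):

```python
import numpy as np

# Line ax + by + c = 0 with (a, b, c) = (1, -1, 0), i.e. the line y = x.
l = np.array([1.0, -1.0, 0.0])

p_on  = np.array([2.0, 2.0, 1.0])   # (2, 2) lies on y = x
p_off = np.array([2.0, 3.0, 1.0])   # (2, 3) does not

# p is on l exactly when the inner product <p, l> is zero.
assert np.isclose(p_on @ l, 0.0)
assert not np.isclose(p_off @ l, 0.0)
```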
Line Intersection
• The intersection of l and l' is l × l' (where × denotes the cross product).
• This follows from the fact that the cross product is orthogonal to both l and l', so the resulting point p satisfies <p, l> = <p, l'> = 0 and hence lies on both lines (see the sketch below).
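A small numeric check of the cross-product construction above, assuming numpy (the two lines are arbitrary examples):

```python
import numpy as np

# Two lines in (a, b, c) form: x = 1 and y = 2.
l1 = np.array([1.0, 0.0, -1.0])   # x - 1 = 0
l2 = np.array([0.0, 1.0, -2.0])   # y - 2 = 0

p = np.cross(l1, l2)               # homogeneous intersection point

# The cross product is orthogonal to both lines, so p lies on each.
assert np.isclose(p @ l1, 0.0) and np.isclose(p @ l2, 0.0)

print(p[:2] / p[2])                # -> [1. 2.], the Euclidean intersection
```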
Intersection of Parallel Lines
• Suppose l and l' are parallel. We can write l = (a, b, c), l' = (a, b, c'). Then l × l' = (c' − c)(b, −a, 0), which is equivalent, in homogeneous coordinates, to (b, −a, 0) (see the sketch below).
• This point corresponds to a line through the focal point that doesn't intersect the image plane.
• We can think of the real plane as the points (x, y, w) with w ≠ 0. When w = 0, we say these points lie on the ideal line at infinity.
• Note that a projective transformation can map this line to another line, the horizon, which we do see in the image.
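The same computation on two parallel lines, assuming numpy (the lines are arbitrary examples): the cross product has last coordinate 0, i.e., it is an ideal point on the line at infinity.

```python
import numpy as np

# Two parallel lines y = x and y = x + 1, in (a, b, c) form.
l1 = np.array([1.0, -1.0, 0.0])
l2 = np.array([1.0, -1.0, 1.0])

p = np.cross(l1, l2)
print(p)        # -> [-1. -1.  0.], i.e. (c' - c)(b, -a, 0) up to scale

# The last coordinate is zero: the intersection is an ideal point on the
# line at infinity, not a point of the ordinary plane.
assert np.isclose(p[2], 0.0)
```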
Invariants of Lines
• Notice that affine transformations are
the subgroup of projective
transformations in which the last row is
(0, 0, 1).
• These map the line at infinity to itself.
• So parallelism is an affine invariant, since parallel lines continue to intersect at infinity (see the sketch below).
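A quick check, assuming numpy (the matrices are invented): an affine map, with last row (0, 0, 1), keeps ideal points ideal, while a general projective map can send them to finite points.

```python
import numpy as np

# An affine map: last row is (0, 0, 1).
A = np.array([[2.0, 0.5, 3.0],
              [0.1, 1.5, -1.0],
              [0.0, 0.0, 1.0]])

# An ideal point (a direction) has last coordinate 0.
d = np.array([1.0, 1.0, 0.0])

# Affine maps send ideal points to ideal points, so the line at infinity
# maps to itself and parallel lines stay parallel.
assert np.isclose((A @ d)[2], 0.0)

# A general projective map need not do this: the same direction can be
# sent to a finite point (on the horizon).
H = np.array([[2.0, 0.5, 3.0],
              [0.1, 1.5, -1.0],
              [0.2, 0.0, 1.0]])
print(H @ d)    # last coordinate is nonzero -> a finite image point
```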
Invariance in 3D to 2D
• 3D to 2D “Invariance” isn’t captured by
mathematical definition of invariance because
3D to 2D transformations don’t form a group.
– You can’t compose or invert them.
• Definition: Let f be a function on images.
We say f is an invariant iff for every Object O,
if I1 and I2 are images of O, f(I1)=f(I2).
• This means we can define f(O) as f(I) for I any
image of O. O and I match only if f(O)=f(I).
• f is a non-trivial invariant if there exist two images I1 and I2 such that f(I1) ≠ f(I2).
Non-Invariance in 3D to 2D
• Theorem: Assume valid objects are any
3D point sets of size k, for some k.
Then there are no non-trivial invariants
of the images of these objects under
perspective projection.
Proof Strategy
• Let f be an invariant.
• Suppose two objects, A and B, have a common image. Then f(I) = f(J) whenever I and J are images of either A or B.
• Given any O0 and Ok, we construct a sequence of objects O1, …, O(k−1), so that Oi and O(i+1) have a common image for each i.
• So for any pair of images I, J from any two objects, f(I) = f(J); that is, f is trivial.
Constructing O1 … Ok-1
• Oi has its first i points identical to the first i
points of Ok, and the remaining points
identical to the remaining points of O0.
• If two objects are identical except for one point, they produce the same image when the center of projection lies on the line joining the two differing points (see the sketch below).
– Along that line, the two differing points look the same.
– The remaining points always look the same, since they are identical.
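A small numerical illustration of this step, assuming numpy and a pinhole camera with center of projection at the origin (all coordinates are invented): sliding one point along its viewing ray leaves the perspective image unchanged, so the two objects share an image.

```python
import numpy as np

def project(P):
    # Perspective projection with center of projection at the origin:
    # (x, y, z) -> (x/z, y/z).
    return P[:, :2] / P[:, 2:3]

# Object O0: three 3D points.
O0 = np.array([[1.0, 0.0, 4.0],
               [0.0, 1.0, 5.0],
               [2.0, 2.0, 6.0]])

# Object O1: identical except the first point is slid along its viewing ray
# (the line through that point and the center of projection).
O1 = O0.copy()
O1[0] = 2.5 * O0[0]

# From this viewpoint the two objects produce the same image, so any
# invariant f must give them the same value.
assert np.allclose(project(O0), project(O1))
```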
Summary
• Planar objects give rise to a rich set of invariants.
• 3-D objects have no invariants.
– We can deal with this by focusing on planar
portions of objects.
– Or special restricted classes of objects.
– Or by relaxing notion of invariants.
• However, invariants have become less
popular in computer vision due to these
limitations.
Lowe and Biederman
• Background
• Viewpoint Invariant Non-Accidental Properties.
– Lowe sees these as probabilistic.
– Biederman drops this.
– Primitive properties.
– Composing them into units/geons.
• Use in Recognition.
– Speed up search.
– Geons: analogy to speech.
• Evidence for Value.
– Computational speed.
– Human psychology: parts; qualitative descriptions; view invariance.
Background
• Computational
– 2D approach to recognition.
• Lowe is reacting to Marr.
• Partly due to Lowe, recognition rarely involves reconstruction now. (But also, 3D models are now rarer.)
– State of the art:
– Little recognition of 3D objects, grouping implicit.
– Speed, robustness a big concern.
– 2D recognition through search.
• Psychology
– Much more ambitious and specific than any prior theory of
recognition (I believe).
– Perceptual organization (P.O.) widely studied, but rarely related to other tasks.
• Contrast.
– CS must account for low-level processing.
– Psych must account for categorization.
Viewpoint Invariant NAPs
• Non-Accidental Property
– Happens rarely by chance
– More frequently by scene structure.
– p = property, c = chance, s = structure.
$$
P(s \mid p) \;=\; \frac{P(p \mid s)\,P(s)}{P(p)} \;=\; \frac{P(p \mid s)\,P(s)}{P(p \mid s)\,P(s) + P(p \mid c)\,P(c)}
$$

– P(p | s) is high due to viewpoint invariance.
– Jepson and Richards consider P(s).
– Lowe focuses on P(p | c).
• Biederman downplays probabilistic inference.
• Not concerned with background, feature detection.
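A toy numerical version of this inference in plain Python (all probabilities are invented for illustration): when P(p | s) is high (viewpoint invariance) and P(p | c) is small (rarely accidental), observing p makes structure likely even with a modest prior.

```python
# Invented numbers for illustration only.
P_p_given_s = 0.95   # property almost always visible when scene structure causes it
P_p_given_c = 0.02   # property rarely arises by chance (accidental alignment)
P_s = 0.1            # prior probability of the structure
P_c = 1.0 - P_s

# Bayes' rule: P(s | p) = P(p | s) P(s) / P(p).
P_p = P_p_given_s * P_s + P_p_given_c * P_c
P_s_given_p = P_p_given_s * P_s / P_p
print(round(P_s_given_p, 3))   # -> 0.841: observing p strongly suggests structure
```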
Examples
(Copied from Lowe)
Issues with Non-Accidental
Properties
• Is it “just” Bayesian inference?
– Then why not model all information?
• This may fit Lowe
• Biederman relies more on certain inference.
• See also Feldman, Jepson, Richards.
Viewpoint Invariance
• Match properties that are invariant to viewing
conditions.
– Parallelism, symmetry, collinearity, cotermination,
straightness.
– Lowe picks one side of property, Biederman
stresses contrast. Why?
• How used?
– Lowe: correspondence of geometric features (speed up search).
– Description of parts for indexing (geons).
– Biederman: description of geons. Are they still view invariant when describing a geon?
• A 3D shape's occluding contour depends on viewpoint. It may be straight from one view, curved from another.
• Metric properties are not truly invariant.
• Maybe more like quasi-invariants.
Geons for Recognition
• Analogy to speech.
– 36 different geons.
– Different relations between them.
– Millions of ways of putting a few geons
together.
Empirical Support for Geons
• First, separate the predictions of geon theory:
– Part structure is important in recognition.
– Perceptual grouping can be used for filling in.
– NAPs are used for indexing.
• View invariant descriptions.
• Qualitative descriptions.
• Second, what is alternative?
– View-based recognition with many examples.
Empirical Support
• Recognition is fast. Fine metric
judgments are slow.
– Does this disqualify other approaches?
• Recognition is view-invariant.
– Does this disqualify other approaches?
• Number of geon descriptions sufficient
for number of categories we recognize.
– Argues plausibility, but no more.
Empirical Support (2)
• 2-4 geons are needed for recognition. Complex objects are no harder than simple ones.
• Line drawings vs. colored images: color gives similar recognition speed.
Empirical Support (3):
Degraded Objects
• Deleting contours that disrupt geon structure interferes more with recognition.
• Deleting components is worse than deleting midsections.
• This argues for perceptual organization for
interpolation/reconstruction. But for geons?
• Should we measure information deleted
rather than contour length?
Conclusions
• Maybe helpful to separate:
– Perceptual organization/completion.
– View Invariance
– Part Structure.
• All three widely used in computer vision.
• Biederman’s paper probably addresses
view-invariance least.
– This later became the subject of much research.