Invariants (concluded); Lowe and Biederman

Announcements
• No class Thursday. Attend the Rao lecture.
• Double-check your paper assignments.

Key Points
• Rigid rotation is a 3x3 orthonormal matrix.
• 3-D translation is a 3x4 matrix.
• 3-D translation + rotation is a 3x4 matrix.
• Scaled orthographic projection: remove row three and allow scaling.
• Planar object: remove column three.
• Projective transformations
– Rigid rotation of a planar object is represented by a 3x3 matrix.
– When we write in homogeneous coordinates, projection is implicit.
– When we drop rigidity, the 3x3 matrix is arbitrary.

Projective
Rigid rotation and translation (projection implicit in homogeneous coordinates):
$$ w \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \begin{pmatrix} r_{1,1} & r_{1,2} & r_{1,3} & t_1 \\ r_{2,1} & r_{2,2} & r_{2,3} & t_2 \\ r_{3,1} & r_{3,2} & r_{3,3} & t_3 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} $$
Projective transformation:
$$ w \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} $$

Projective Transformation
For a planar object (points written in homogeneous coordinates), compare the rigid case with the general projective case:
$$ \begin{pmatrix} r_{1,1} & r_{1,2} & t_1 \\ r_{2,1} & r_{2,2} & t_2 \\ r_{3,1} & r_{3,2} & t_3 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} \qquad \text{vs.} \qquad \begin{pmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} $$
• On the left, the notation suggests that the first two columns are orthonormal, and the transformation has 6 degrees of freedom.
• On the right, the notation suggests an unconstrained linear transformation. Points in homogeneous coordinates are equivalent up to scale, so the transformation has 8 degrees of freedom, because its scale is arbitrary.

Lines: Parameterization
• Equation for a line: ax + by + c = 0.
• Parameterize the line as l = (a, b, c)^T.
• p = (x, y, 1)^T is on the line if <p, l> = 0.

Line Intersection
• The intersection of l and l' is l x l' (where x denotes the cross product).
• This follows from the fact that the cross product is orthogonal to both lines.

Intersection of Parallel Lines
• Suppose l and l' are parallel. We can write l = (a, b, c) and l' = (a, b, c'). Then l x l' = (c' - c)(b, -a, 0), which is equivalent to the point (b, -a, 0).
• This point corresponds to a line through the focal point that doesn't intersect the image plane.
• We can think of the real plane as the points (a, b, c) where c is not equal to 0. When c = 0, we say these points lie on the ideal line, the line at infinity.
• Note that a projective transformation can map this line to another line, the horizon, which we do see.

Invariants of Lines
• Notice that affine transformations are the subgroup of projective transformations in which the last row is (0, 0, 1).
• These map the line at infinity to itself.
• So parallelism is an affine invariant, since parallel lines continue to intersect at infinity.

Invariance in 3D to 2D
• 3D-to-2D "invariance" isn't captured by the mathematical definition of invariance, because 3D-to-2D transformations don't form a group.
– You can't compose or invert them.
• Definition: Let f be a function on images. We say f is an invariant iff, for every object O, if I1 and I2 are images of O, then f(I1) = f(I2).
• This means we can define f(O) as f(I) for I any image of O. O and I match only if f(O) = f(I).
• f is a non-trivial invariant if there exist two images I1 and I2 such that f(I1) != f(I2).

Non-Invariance in 3D to 2D
• Theorem: Assume valid objects are any 3-D point sets of size k, for some k. Then there are no non-trivial invariants of the images of these objects under perspective projection.

Proof Strategy
• Let f be an invariant.
• Suppose two objects A and B have a common image. Then f(I) = f(J) whenever I and J are images of either A or B.
• Given any two objects O0 and Ok, we construct a series of objects O1, ..., O(k-1), so that Oi and O(i+1) have a common image for every i.
• So for any pair of images I, J from any two objects, f(I) = f(J), and f is trivial.

Constructing O1, ..., O(k-1)
• Oi has its first i points identical to the first i points of Ok, and the remaining points identical to the remaining points of O0.
• If two objects are identical except for one point, they produce the same image when viewed along the line joining those two points (a numerical sketch follows below).
– Along that line, the two differing points look the same.
– The remaining points always look the same.
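A minimal numerical sketch of that last claim, not from the lecture: it assumes a pinhole camera at the origin projecting onto the plane z = 1, and the helper `project` and the specific point coordinates are illustrative only. Sliding one point along its viewing ray produces a second object with the same image.

```python
import numpy as np

def project(points):
    """Perspective projection of 3D points onto the image plane z = 1."""
    points = np.asarray(points, dtype=float)
    return points[:, :2] / points[:, 2:3]   # (x/z, y/z) for each point

# Object O0: an arbitrary 3-D point set of size k = 4 (illustrative values).
O0 = np.array([[ 1.0,  2.0, 4.0],
               [-1.0,  0.5, 3.0],
               [ 0.5, -1.0, 5.0],
               [ 2.0,  1.0, 6.0]])

# Object O1: identical to O0 except that its first point is slid along the
# ray from the focal point (the origin) through O0's first point.
O1 = O0.copy()
O1[0] = 2.5 * O0[0]   # same viewing ray, different depth

print(project(O0))
print(project(O1))
print(np.allclose(project(O0), project(O1)))   # True: the two objects share an image
```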
Summary
• Planar objects give rise to a rich set of invariants.
• 3-D objects have no invariants.
– We can deal with this by focusing on planar portions of objects,
– or on special restricted classes of objects,
– or by relaxing the notion of invariants.
• However, invariants have become less popular in computer vision due to these limitations.

Lowe and Biederman
• Background.
• Viewpoint-invariant non-accidental properties.
– Lowe sees these as probabilistic.
– Biederman drops this.
– Primitive properties.
– Composing them into units/geons.
• Use in recognition.
– Speed up search.
– Geons: analogy to speech.
• Evidence for value.
– Computational speed.
– Human psychology: parts; qualitative descriptions; view invariance.

Background
• Computational
– 2-D approach to recognition.
• Lowe is reacting to Marr.
• Partly due to Lowe, recognition rarely involves reconstruction now (but 3-D models are also rarer).
– State of the art:
– Little recognition of 3-D objects; grouping is implicit.
– Speed and robustness are a big concern.
– 2-D recognition through search.
• Psychology
– Much more ambitious and specific than any prior theory of recognition (I believe).
– Perceptual organization widely studied, rarely related to other tasks.
• Contrast
– CS must account for low-level processing.
– Psychology must account for categorization.

Viewpoint-Invariant NAPs
• Non-accidental property:
– happens rarely by chance,
– more frequently due to scene structure.
• Let p = property, c = chance, s = structure. Bayes' rule gives
$$ P(s \mid p) = \frac{P(p \mid s)\,P(s)}{P(p)} $$
– P(p | s) is high due to viewpoint invariance.
– Jepson and Richards consider the posterior P(s | p).
– Lowe focuses on the likelihood ratio P(p | s) / P(p | c).
• Biederman downplays probabilistic inference.
• Not concerned with background or feature detection.

Examples (copied from Lowe)

Issues with Non-Accidental Properties
• Is it "just" Bayesian inference?
– If so, why not model all the information?
• This may fit Lowe.
• Biederman relies more on certain inference.
• See also Feldman, Jepson, and Richards.

Viewpoint Invariance
• Match properties that are invariant to viewing conditions:
– parallelism, symmetry, collinearity, cotermination, straightness.
– Lowe picks one side of a property; Biederman stresses the contrast. Why?
• How are they used?
– Lowe: correspondence of geometric features, to speed up search.
– Biederman: description of parts (geons) for indexing.

Are they still view-invariant when describing a geon?
• A 3-D shape's occluding contour depends on viewpoint. It may be straight from one view and curved from another.
• Metric properties are not truly invariant.
• Maybe they are more like quasi-invariants.

Geons for Recognition
• Analogy to speech:
– 36 different geons.
– Different relations between them.
– Millions of ways of putting a few geons together.

Empirical Support for Geons
• First, divide the theory's predictions:
– Part structure is important in recognition.
– Perceptual grouping can be used for filling in.
– NAPs are used for indexing:
• view-invariant descriptions;
• qualitative descriptions.
• Second, what is the alternative?
– View-based recognition with many examples.

Empirical Support
• Recognition is fast; fine metric judgments are slow.
– Does this disqualify other approaches?
• Recognition is view-invariant.
– Does this disqualify other approaches?
• The number of geon descriptions is sufficient for the number of categories we recognize.
– This argues plausibility, but no more (a rough count is sketched below).
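The plausibility point can be made concrete with a back-of-the-envelope count. This is not Biederman's own calculation: the number of pairwise relations (N_RELATIONS) is an assumed placeholder, and the counting scheme is deliberately crude; it only shows how quickly a small part alphabet plus qualitative relations generates descriptions.

```python
N_GEONS = 36       # Biederman's proposed number of geon types
N_RELATIONS = 57   # assumed placeholder for the number of qualitative pairwise relations

# Ordered two-geon descriptions: a geon, a second geon, and one relation
# describing how the second attaches to the first.
two_geon = N_GEONS * N_GEONS * N_RELATIONS

# Three-geon descriptions: attach one more geon with one more relation
# (ignoring symmetries and multiple attachment points).
three_geon = two_geon * N_GEONS * N_RELATIONS

print(f"two-geon descriptions:   {two_geon:,}")     # 73,872
print(f"three-geon descriptions: {three_geon:,}")   # 151,585,344
```

Even under these rough assumptions, a few geons yield far more structural descriptions than the number of object categories people readily name.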
Empirical Support (2)
• 2-4 geons are needed for recognition; complex objects are no harder than simple ones.
• Line drawings vs. colored images: recognition speed is similar with or without color.

Empirical Support (3): Degraded Objects
• Deleting contours in a way that interferes with geon structure hurts recognition more.
• Deleting whole components is worse than deleting midsections.
• This argues for perceptual organization for interpolation/reconstruction. But does it argue for geons?
• Should we measure the information deleted rather than the contour length?

Conclusions
• It may be helpful to separate:
– perceptual organization/completion,
– view invariance,
– part structure.
• All three are widely used in computer vision.
• Biederman's paper probably addresses view invariance least.
– This became the subject of much research.