Computational Theories & Low-level
Pixels To Percepts
A. Efros, CMU, Spring 2009

Four Stages of Visual Perception
[Diagram: Light --> Image-Based Processing --> Surface-Based Processing --> Object-Based Processing --> Category-Based Processing. Other labels in the diagram: Sound, Audition, Odor (etc.), Vision, STM, LTM, Motor, Movement]
Ceramic cup on a table
David Marr, 1982
© Stephen E. Palmer, 2002

Four Stages of Visual Perception
The Retinal Image
An Image (blowup); Receptor Output

Four Stages of Visual Perception
Retinal Image --> Image-based processes --> Image-based Representation
Image-based processes: edges, lines, blobs, etc.
An Image (Line Drawing); Primal Sketch (Marr)

We likely throw away a lot
Line drawings are universal

Four Stages of Visual Perception
Image-based Representation (Primal Sketch) --> Surface-based processes --> Surface-based Representation (2.5-D Sketch)
Surface-based processes: stereo, shading, motion, etc.

Single Surface (Koenderink’s trick)

Figure/Ground Organization
A contour belongs to one of the two (but not both) abutting regions.
Figure (face), Ground (shapeless)
Figure (goblet), Ground (shapeless)
Important for the perception of shape

Figure-Ground Organization (15.18)
Properties of figures vs. grounds
Figure        Ground
Thing-like    Not thing-like
Closer        Farther
Shaped        Extends behind

Figure-Ground Organization (15.19)
Principles of figure-ground organization: Surroundedness
Surrounded region --> Figure
Surrounding region --> Ground

Figure-Ground Organization (15.20)
Principles of figure-ground organization: Size
Smaller region --> Figure
Larger region --> Ground

Figure-Ground Organization (15.21)
Principles of figure-ground organization: Orientation
Horizontal/vertical region --> Figure
Oblique region --> Ground

Figure-Ground Organization (15.22)
Principles of figure-ground organization: Contrast
Higher contrast region --> Figure
Lower contrast region --> Ground

Figure-Ground Organization (15.23)
Principles of figure-ground organization: Symmetry
Symmetrical region --> Figure
Asymmetrical region --> Ground

Figure-Ground Organization (15.24)
Principles of figure-ground organization: Convexity
More convex region --> Figure
Less convex region --> Ground

Figure-Ground Organization (15.25)
Principles of figure-ground organization: Parallelism
More parallel region --> Figure
Less parallel region --> Ground

Figure-Ground Organization (15.26)
Principles of figure-ground organization: Lower region
Lower region --> Figure
Upper region --> Ground

Figure-Ground Organization (15.27)
Principles of figure-ground organization: Meaningfulness
More meaningful region --> Figure
Less meaningful region --> Ground

Figure-Ground Organization (15.28)
Relation to Depth Factors
Figure-ground organization as edge assignment: To which side does the edge belong? To the closer side.
This fact connects figure-ground organization with depth perception.
Depth cues can also be figure-ground factors, and figure-ground factors can be depth cues.

Figure-Ground Organization (15.29)
Principles of figure-ground organization: Occlusion
Occluding region --> Figure
Occluded region --> Ground

Figure-Ground Organization (15.30)
Principles of figure-ground organization: Cast Shadows
Shadowing region --> Figure
Shadowed region --> Ground

Figure-Ground Organization (15.32)
Principles of figure-ground organization: Shading
Shaded region --> Figure
Nonshaded region --> Ground

Line Labeling [Clowes 1971; Huffman 1971; Waltz 1972; Malik 1986]
Constraint Propagation
> : contour direction
+ : convex edge
- : concave edge
possible junctions (constraints)

Four Stages of Visual Perception
Surface-based Representation (2.5-D Sketch) --> Object-based processes --> Object-based Representation (Volumetric Sketch)
Object-based processes: grouping, parsing, completion, etc.

Geons (Biederman '87)

Four Stages of Visual Perception
Object-based Representation (Volumetric Sketch) --> Category-based processes --> Category-based Representation (Basic-level Category)
Category-based processes: pattern recognition, spatial description
Category: cup
Color: light-gray
Size: 6”
Location: table

We likely throw away a lot
Line drawings are universal

However, things are not so simple…
● Problems with feed-forward model of processing…

Junctions in Real Images
Are junctions local evidence?
J. McDermott, 2004

Early vs. Late Grouping (14.38)
Is grouping an early or late process?
[Diagram: Light --> Image-Based Processing --> Surface-Based Processing --> Object-Based Processing --> Category-Based Processing, with a "?" at each of the four stages]

Early vs. Late Grouping (14.39)
Before or after stereoscopic depth?
(Rock & Brosgole, 1964)

Early vs. Late Grouping (14.40)
Before or after lightness constancy?
Opaque paper strip
(Rock, Nijhawan, Palmer & Tudor, 1992)

Early vs. Late Grouping (14.41)
Before or after visual completion?
(Palmer, Neff & Beck, 1996)

Early vs. Late Grouping (14.42)
Before or after illusory contours?
(Palmer & Nelson, 2000)

Early vs. Late Grouping (14.43)
Conclusion: Grouping can occur “late”
Question: Can grouping also occur “early”?
(Palmer & Brooks, in preparation)

Early vs. Late Grouping (14.44)
Grouping affects shape constancy
Ambiguous; Circle in depth; Flat oval
(Palmer & Brooks, in preparation)

Early vs. Late Grouping (14.45)
Proximity effects
Biased toward oval; Biased toward circle

Early vs. Late Grouping (14.46)
Color similarity effects
Biased toward oval; Biased toward circle

Early vs. Late Grouping (14.47)
Common fate effects
Biased toward oval; Biased toward circle

Early vs. Late Grouping (14.48)
Conclusion: Grouping occurs both “early” and “late” -- possibly everywhere!
[Diagram: Light --> Image-Based Processing --> Surface-Based Processing --> Object-Based Processing --> Category-Based Processing, with "Grouping" at each of the four stages]

Two-tone images
[Figure labels: “attached shadow” contour; hair (not shadow!); “cast shadow” contour; inferred external contours]

Cavanagh's argument
Finding 3D structure in two-tone images requires distinguishing cast shadows, attached shadows, and areas of low reflectivity.
The images do not contain this information a priori (at low level).

A Classical View of Vision
High-level: Object and Scene Recognition
Mid-level: Figure/Ground Organization; Grouping / Segmentation
Low-level: pixels, features, edges, etc.

A Contemporary View of Vision
High-level / Mid-level: Object and Scene Recognition; Figure/Ground Organization; Grouping / Segmentation
But where do we draw this line?
Low-level: pixels, features, edges, etc.

Question #1: What (if anything) should be done at the “Low-Level”?
N.B. I have already told you everything that is known. From now on, there aren’t any answers. Only questions…

Who cares? Why not just use pixels?
Pixel differences vs. perceptual differences

Eye is not a photometer!
"Every light is a shade, compared to the higher lights, till you come to the sun; and every shade is a light, compared to the deeper shades, till you come to the night."
- John Ruskin, 1879

Cornsweet Illusion

Sine wave

Campbell-Robson contrast sensitivity curve

Metamers

Question #1: What (if anything) should be done at the “Low-Level”?
i.e. What input stimulus should we be invariant to?

Invariant to:
• Brightness / Color changes?
low-frequency changes
small brightness / color changes
But one can be too invariant

Invariant to:
• Edge contrast / reversal?
I shouldn’t care what background I am on!
But be careful of exaggerating noise

Representation choices
• Raw pixels
• Gradients
• Gradient magnitude
• Thresholded gradients (edge + sign)
• Thresholded gradient mag. (edges)

Spatial invariance
• Rotation, Translation, Scale
• Yes, but not too much…
• In brain: complex cells (partial invariance)
• In Comp. Vision: histogram-binning methods (SIFT, GIST, Shape Context, etc.) or, equivalently, blurring (e.g. Geometric Blur); will discuss later

Many lives of a boundary
Often, context-dependent…
[Figure panels: input, canny, human]
Maybe low-level is never enough?

1/f amplitude spectra for natural images
There are statistical regularities in the natural world, and image statistics reflect that.
(Burton & Moorhead 1987; Field 1987; Tolhurst et al. 1992)
[Figure: (Field 1987)]

Why 1/f?
Scale invariance
Edges have 1/f structure
Object distribution in real world
(Ruderman 1997; Lee & Mumford 1999)
(Image source: smokiesguidebook.com. Slide content: Simoncelli & Olshausen 2001)

A closer look at amplitude spectra
(Torralba & Oliva 2003)

Do natural image statistics matter?
Sensory coding might exploit statistical regularities of our world according to various criteria:
Representational efficiency: decorrelate input responses, make them independent, sparse, information-theoretic metrics, etc.
Metabolic efficiency: spike efficiency, minimal wiring.
Learning efficiency: sparseness, invariance, overcompleteness, etc.
Lots and lots of work; see reviews Graham & Field (2007), Simoncelli & Olshausen (2001)
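
As a rough illustration of the 1/f amplitude-spectrum regularity described in the slides above, the sketch below synthesizes a test image with a known spectral fall-off, radially averages its Fourier amplitude, and fits the slope on log-log axes. Everything here is an assumption made for illustration and is not from the slides: the function names (`make_one_over_f_image`, `radial_amplitude_spectrum`, `fit_spectral_slope`), the square 128-pixel image, and the use of filtered Gaussian noise in place of a real photograph.

```python
import numpy as np


def make_one_over_f_image(size=128, alpha=1.0, seed=0):
    """Filter white noise so its amplitude spectrum falls off as 1 / f**alpha.

    Hypothetical helper: a stand-in for a natural image, not a model of one.
    """
    rng = np.random.default_rng(seed)
    fy = np.fft.fftfreq(size)[:, None]
    fx = np.fft.fftfreq(size)[None, :]
    f = np.hypot(fx, fy)
    f[0, 0] = 1.0                      # placeholder to avoid dividing by zero at DC
    gain = 1.0 / f ** alpha
    gain[0, 0] = 0.0                   # drop the DC term (zero-mean image)
    noise = rng.standard_normal((size, size))
    # The gain is real and symmetric in frequency, so the result is real
    # up to floating-point error; np.real discards that residue.
    return np.real(np.fft.ifft2(np.fft.fft2(noise) * gain))


def radial_amplitude_spectrum(image):
    """Average the Fourier amplitude over rings of equal spatial frequency.

    Assumes a square 2-D image. Returns frequencies in cycles per image
    (1 .. size/2 - 1) and the mean amplitude in each ring.
    """
    if image.ndim != 2 or image.shape[0] != image.shape[1]:
        raise ValueError("expected a square 2-D image")
    size = image.shape[0]
    amplitude = np.abs(np.fft.fft2(image))
    fy = np.fft.fftfreq(size)[:, None] * size
    fx = np.fft.fftfreq(size)[None, :] * size
    ring = np.rint(np.hypot(fx, fy)).astype(int)   # integer ring index per pixel
    sums = np.bincount(ring.ravel(), weights=amplitude.ravel())
    counts = np.bincount(ring.ravel())
    max_ring = size // 2                           # stay below the Nyquist ring
    freqs = np.arange(1, max_ring)
    return freqs, sums[1:max_ring] / counts[1:max_ring]


def fit_spectral_slope(freqs, amplitudes):
    """Least-squares slope of log(amplitude) against log(frequency)."""
    slope, _intercept = np.polyfit(np.log(freqs), np.log(amplitudes), 1)
    return slope


if __name__ == "__main__":
    for alpha in (0.0, 1.0):
        img = make_one_over_f_image(alpha=alpha)
        freqs, amps = radial_amplitude_spectrum(img)
        print(f"alpha={alpha}: fitted slope = {fit_spectral_slope(freqs, amps):.2f}")
```

A fitted slope near -1 corresponds to a 1/f amplitude spectrum, while white noise (alpha = 0) should give a slope near 0. Running the same two analysis functions on a grayscale photograph loaded as a square array would be the natural next step; the exact slope will vary from image to image.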