Computational Vision
Jitendra Malik
University of California, Berkeley

What is in an image?
The input is just an array of brightness values; humans perceive structure in it.

From Pixels to Perception
[Figure labels: Water, Tiger, outdoor wildlife, Grass, Sand, back, Tiger, head, eye, tail, legs, shadow, mouse]

If visual processing was purely feedforward…(it isn’t)
Pixels → Local Neighborhoods → Contours → Surfaces → Objects → Scenes
[Figure labels: Water, Tiger, Grass, Sand]
• Low-level: Image Processing
• Mid-level: Grouping, Figure/Ground, Surface Attributes
• High-level: Recognition

Boundaries of image regions defined by a number of attributes
• Brightness/color
• Texture
• Motion
• Binocular disparity
• Familiar configuration

Grouping is hierarchical
Perceptual organization forms a tree:
[Figure: segmentations A, B, C and a segmentation tree with nodes Image, BG, grass, bush, far, L-bird, R-bird, beak, body, eye, head]
• A,C are refinements of B
• A,C are mutual refinements
• A,B,C represent the same percept
Two segmentations are consistent when they can be explained by the same segmentation tree.

Humans assign a depth ordering to surfaces across a contour
• R1 appears in front of R2
• R2 appears in front of R3

This can be done for images of natural scenes …
Figure-Ground Labeling: red is near; blue is far

Figure/Ground Organization
A contour belongs to one of the two (but not both) abutting regions.
• Figure (face), Ground (shapeless)
• Ground (shapeless), Figure (goblet)
Important for the perception of shape

Some other aspects of perceptual organization
• Good continuation
• Amodal completion
• Modal completion

What do we see here?
And here?

Some Pictorial Cues
• Support, Size
• Cast Shadows
• Shading

Measuring Surface Orientation

Binocular Stereopsis

Optical flow for a pilot

Object Category Recognition

Shape variation within a category
D’Arcy Thompson, On Growth and Form (1917), studied transformations between shapes of organisms.

Attneave’s Cat (1954)
Line drawings convey most of the information

Objects are in Scenes

Human stick figure from single image
[Figure panels: Input image, Stick figure, Support masks]

This is hard…
• Variety of poses
• Clothing
• Missing parts
• Small support for parts
• Background clutter

Taxonomy and Partonomy
• Taxonomy: e.g., cats are in the family Felidae, which in turn is in the class Mammalia. Recognition can be at multiple levels of categorization, or be identification at the level of specific individuals, as in faces.
• Partonomy: Objects have parts, which have subparts, and so on. The human body contains the head, which in turn contains the eyes.
• These notions apply equally well to scenes and to activities.
• Psychologists have argued that there is a “basic level” at which categorization is fastest (Eleanor Rosch et al.).
• In a partonomy, each level contributes useful information for recognition.

Visual Control of Action
• Locomotion: Navigation/Way-finding, Obstacle Avoidance
• Manipulation: Grasping, Pick and Place, Tool use

Camera Obscura (Reinerus Gemma-Frisius, 1544)
Camera Obscura (Angelo Sala, 1576-1637)