Object Recognition
Computer Vision CIS-581, Fall 2009, Jianbo Shi

Human vision: recognition
Slides taken from Bart Rypma

“What & Where” Visual Pathways
• Established with electrophysiology, lesion, neuropsychology and neuroimaging data

Monkey Lesion Data
• Two types of Delayed Response Task
  – Landmark Discrimination Task
  – Object Discrimination Task
• Monkeys trained to criterion on one of these tasks
• Then task was reversed
• After learning, either temporal or parietal lobe lesioned

Effects of Lesion on Landmark Task
• Unoperated monkeys show no impairment
• Temporal-lobe lesion monkeys show minimal impairment
• Parietal-lobe lesion monkeys show marked impairment

Effects of Lesion on Object Task
• Temporal-lobe lesion monkeys show marked impairment
• Parietal-lobe lesion monkeys show minimal impairment

Monkey Lesion Data
• Subsequent lesion work supports the “what-where” distinction
• Object discrimination: Ventral lesion deficits restricted to visual modality
• Posterior/Anterior Ventral Lobe distinction:
  – Posterior: Visual discrimination
  – Anterior: Visual memory

The What-Where Distinction: Human Neuroimaging
• Object task: Same objects?
• Spatial task: Same locations?
• Data provide evidence for the what-where distinction

Human Neuropsychological Data
• Agnosia
• Term coined by Sigmund Freud
• From the Greek word for “lack of knowledge”
• The inability to recognize objects when using a given sense, even though that sense is basically intact (Nolte, 1999)

Agnosia
• Usually involves damage to the occipito-parietal pathway

Patient GS
• Sensory abilities intact
• Language normal
• Unable to name objects

Agnosia
• Two Types
• Apperceptive – Object recognition failure due to perceptual processing
• Associative – Perceptual processing intact but subject cannot use information to recognize objects

Agnosia: Apperceptive vs. Associative
• Depends on the availability of the object representation to consciousness

Apperceptive agnosias (also known as visual space agnosias) refer to a condition in which a person fails to recognize objects due to a functional impairment of the occipito-temporal vision areas of the brain. Other elementary visual functions such as acuity, colour vision, and brightness discrimination are still intact. Apperceptive agnosics are unable to distinguish visual shapes and so have trouble recognizing, copying, or discriminating between different visual stimuli. When patients are able to identify objects, they do so based on inferences, using colour, size, texture and/or reflective cues to piece the object together. For example, in the image below, an apperceptive patient may not be able to distinguish a poker chip from a Scrabble tile despite their clear difference in shape and surface features.

[Figure not recovered]

This would be a problem for an apperceptive agnosia patient:

[Figure not recovered]

• They also have trouble with object constancy under changes of view
• Right hemisphere lesions

Associative agnosias are also known as visual object agnosias.
Although they can present with a variety of symptoms, the main impairment is failure to recognize visually presented objects despite intact perception of those objects. A patient with an associative agnosia may be able to replicate a drawing of the object but still fail to recognize it. Errors in which an object is misidentified as one that looks similar are common. Three specific criteria are associated with a diagnosis of associative agnosia (Farah, 1990):
1) Difficulty recognizing a variety of visually presented objects (e.g., naming or grouping objects together according to their semantic categories).
2) Normal recognition of objects from a verbal description or when using a sense other than vision, such as touch, smell, or taste.
3) Elementary visual perception intact enough to copy an object, as exemplified in the original and copied pictures below.
Overall, this loss can be thought of as "recognition without meaning".

[Figure not recovered]

Prosopagnosia
• Specific inability to recognize faces
• Are faces and other objects in the world represented in fundamentally different ways in memory?
• Does face-memory depend on fundamentally different brain systems?

Are Faces Special?
• Subjects presented with a face and asked to represent a face-part
• Subjects presented with a house and asked to represent a house-part

Are Faces Special?
• Houses represented in parts
• Faces represented as wholes

Are Faces Special?
• Objects represented in parts and holistically
• Faces represented holistically

Object Recognition, Computer Vision
• Three distinct approaches:
  1) Alignment, prototype
  2) Part-based classification
  3) Invariance (geometrical & photometrical), hashing

Hypothesis-Test: Alignment Method

Recognition by Hypothesize and Test
• General idea
  – Hypothesize object identity and pose
  – Render object in camera
  – Compare to image
• Issues
  – Where do the hypotheses come from?
  – How do we compare to the image (verification)?

Step 1: correspondence
What are the features?
• They have to project like points
  – Lines
  – Conics
  – Other fitted curves
  – Regions (particularly the center of a region, etc.)

Step 2: Shape deformation and matching

Pose consistency
• Strategy:
  – Generate hypotheses using small numbers of correspondences (e.g. triples of points for a calibrated perspective camera, etc.)
  – Backproject and verify
• Appropriate groups are “frame groups”

Figure from “Object recognition using alignment,” D.P. Huttenlocher and S. Ullman, Proc. Int. Conf. Computer Vision, 1986, copyright IEEE, 1986

Models

Body Recognition
G. Mori, X. Ren, A. Efros, and J. Malik, Recovering Human Body Configurations: Combining Segmentation and Recognition, IEEE Computer Vision and Pattern Recognition, 2004.

Problem with Alignment algorithm:
• Example 1: View-point variations, many examples are needed (T. Sebastian)
• Example 2: Partial occlusion (T. Sebastian)

Part-based Object Recognition
Binford ’78

Computing part-decomposition
• Shocks (or medial axis or skeleton) are the locus of centers of maximal circles that are bitangent to the shape boundary
Figure labels: Shape boundary, Shocks (T. Sebastian)

• Complexity-increasing shape deformation paths are not optimal
• Represent a deformation path by a pair of simplifying deformation paths from A, B to a simpler shape C (T. Sebastian)
• A shock graph edit operation transforms a shape to an adjacent transition shape (T. Sebastian)
• Edit distance is defined as the sum of the costs of the edits in the optimal edit sequence (T. Sebastian)
• Shock graphs represent object parts and part hierarchy
• Edit distance is robust in the presence of part-based changes (T. Sebastian)

Invariance + hashing
Figure from “Efficient model library access by projectively invariant indexing functions,” by C.A. Rothwell et al., Proc. Computer Vision and Pattern Recognition, 1992, copyright 1992, IEEE

Invariant Local Features
• Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters
SIFT Features (David Lowe)
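
The "invariance + hashing" idea can be illustrated with a minimal sketch. It assumes the simplest projective invariant, the cross-ratio of four collinear points, and is not the specific set of invariants used by Rothwell et al. or by SIFT. The model names, point coordinates, and the transformation H are all hypothetical. The invariant is computed once per model and used as a lookup key, so a projectively distorted observation retrieves its model without any search over pose.

```python
from fractions import Fraction


def cross_ratio(a, b, c, d):
    """Cross-ratio (a, b; c, d) of four collinear points given as 1-D coordinates."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


def projective_map(x, h):
    """Apply a 1-D projective transformation x -> (h00*x + h01) / (h10*x + h11)."""
    (h00, h01), (h10, h11) = h
    return (h00 * x + h01) / (h10 * x + h11)


# Hypothetical model library: each model is four collinear feature points.
models = {
    "model_A": [Fraction(0), Fraction(1), Fraction(3), Fraction(7)],
    "model_B": [Fraction(0), Fraction(2), Fraction(3), Fraction(5)],
}

# Indexing step: the invariant value is the hash key, the model name is the entry.
index = {cross_ratio(*pts): name for name, pts in models.items()}

# Hypothetical imaging transformation (nonzero determinant), unknown to the recognizer.
H = ((Fraction(2), Fraction(1)), (Fraction(1), Fraction(5)))
observed = [projective_map(x, H) for x in models["model_A"]]

# Recognition step: compute the invariant on the observation and look it up.
key = cross_ratio(*observed)
match = index.get(key)
print(key, match)  # prints "9/7 model_A"
```

Exact rational arithmetic makes the lookup key match exactly here. With measured image coordinates the invariant would be noisy, so a practical index would quantize the key into bins and verify each candidate afterwards, as in the hypothesize-and-test step.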