Object Recognition Through Reasoning About Functionality: A Survey of Related Work and Open Problems

Louise Stark, University of the Pacific, Stockton, California
Melanie Sutton, University of West Florida, Pensacola, Florida
Dagstuhl Oct09

Function-Based Research
Dr. Louise Stark, University of the Pacific, Stockton, CA

[Photos: University of the Pacific; University of West Florida, pre/post-hurricane season]

Seminar Goals
• This seminar brings together scientists from disciplines such as computer science, neuroscience, robotics, developmental psychology, and cognitive science
• We hope to further knowledge of:
  • how the perception of form relates to object function
  • how intention and task knowledge (and hence function) aid in the recognition of relevant objects

Overview
• Recognition based on functionality
• Overview of the GRUFF approach
• Functionality in related disciplines
• Open problem areas

Function-based Approaches
• Cognitive Psychology/Human Perception: representations of object categories; human-robot interaction strategies; wayfinding
• Artificial Intelligence: formal representations of knowledge; machine learning techniques to automate reasoning
• Computer Vision: document/aerial image analysis; interpreting human motion; object recognition/categorization
• Robotics: mapping of indoor environments; object detection; navigation/interaction plans; formalisms for autonomous robot control

Computer Vision?
• Deriving meaningful descriptions of the environment from images
• Descriptions are needed for:
  • recognition
  • manipulation
  • reasoning about objects

Generic Object Recognition
Minsky (1991)
• Argued for the necessity of representing knowledge about functionality
• “… rarely use a representation in an intentional vacuum, but we always have goals…”
• “… we must classify things… according to what they can be used for.”

Motivation
[Figures: a parameterized model and a structural model]
Could these be recognized?

GRUFF: Generic Recognition Using Form and Function
chair (cher) n. - a piece of furniture for one person to sit on

What is the goal?
• Develop alternative approaches to generic object recognition and manipulation, concentrating on man-made objects (artifacts)
• For human artifacts, the existence or non-existence of properties can be deduced by analyzing the shape of an object
• For any particular object category, there is some set of functional properties shared by ALL objects in that category

Approach to the Problem
• Derive the format of my function-based representation
• Confirm feasibility of the approach in a test domain with perfect input: planar-face models
• Expand the domains
• Test real data
• Interact to confirm functionality
• Exploit contextual information

Knowledge in GRUFF is of three types:
• A category hierarchy, which specifies superordinate / basic / subordinate categories (e.g., furniture > chair > arm chair)
• Functional properties that define each category (provides_sittable_surface, provides_stability, ...)
• Knowledge primitives used to reason about shape (dimensions, relative orientation, ...)
All are organized into a "category definition tree," which is GRUFF's knowledge about the world.

Category Representation Tree
[Figure: a Conventional Chair node with child nodes Provides Sittable Surface and Provides Stable Support]

We imagine the definition of a generic object category to be something like...
straight_back_chair ::= provides_seating_surface + provides_stability + provides_back_support_surface

and recognition is conceptualized as ...
[Figure: a chair annotated with provides_back_support, provides_arm_support, provides_sittable_surface, and provides_stable_support]

Shape-based Knowledge Primitives
A functional requirement such as provides_sittable_surface is implemented as a sequence of calls to shape-based operators:
dimensions(shape_element, dimensions_type, range_parameters)
relative_orientation(normal_1, normal_2, range_parameters)
clearance(shape_element, clearance_volume)

Knowledge Primitives: Abstract Shape Reasoning
• Metric dimensions (width, depth, height, area, contiguous surface, volume)
• Proximity
• Relative orientation
• Clearance
• Stability
• Enclosure

Knowledge Primitives: Physical Interaction Reasoning
• Change orientation
• Apply force
• Observe deformation

Evaluation Measures
[Figure: evaluation measure (0.0 to 1.0) versus the value of the shape property returned by a knowledge-primitive invocation, with breakpoints at least, ideal low, ideal high, and greatest]

Combining Evidence
• Combine required measurements using probabilistic AND (values in the range 0-1)
• Combine descendant subcategory node measures using probabilistic OR

Recognition Process
• The category representation graph is the control structure
• Structural constraint propagation: subcategory nodes are constrained by what was found for the parent

Recognition Stage
Two approaches:
1. Check all known categories in the knowledge base
2. Confirm/deny that the object can/cannot function as a specified (sub)category

Valid Chairs Recognized by GRUFF

History of GRUFF Project

Context-based Reasoning
• GRUFF: a generic object recognition system that reasons about and generates plans for understanding 3D scenes of objects
• Extension to context-based reasoning: determine the significance of accumulated functional evidence to infer the existence of scene concepts

Functionality in the Large
What makes an "office" an office? A desk with at least one chair in close proximity.
We categorize areas or workspaces by the functional configuration of the objects in the area.

Context-based Reasoning
[Figure: representation of Office (Type: Category) with Function, Verification Plan, Realized by, and Potential Results fields; the context-based reasoning nodes "Provides potential seating" and "Provides potential worksurfaces" are realized by the shape-based reasoning nodes "Infer Seating Areas," "Infer Back Support," and "Infer worksurfaces"]

What Did Change?
• Multiple objects in the scene
• Relax functional requirements
• Allow partial evidence

What Did Not Change?
• Basic set of functional primitives
• Organization of the representation
• Categorization, not identification

Test Data
• Simulated data
  • complete 3D models, evaluated with no occluded surfaces
  • partial 3D models derived from a laser range finder simulation tool
• Real data
  • stereo camera system generating range data (SRI's Small Vision System software)

Test Scenes Used in Context-based Reasoning

Context-based Reasoning System
Infer contextual relationships from accumulated functional evidence:
• Provides potential worksurfaces
• Provides potential seating (back support and/or seating area)

What is the goal?
Question: How do we recognize objects we have never previously encountered?
- we don't have a model (or do we?)
Essentially, we categorize objects using some type of "model."

Earlier Work
Roberts, “Machine perception of three-dimensional solids,” 1965
• Analyze the intensity image
• Extract edge information
• Match against a library of geometric models
Paradigms:
• “Model-based vision”
• “Single arbitrary view 3-D object recognition”

Earlier Work
Binford, “Survey of model-based image analysis systems,” 1982
“The essential definition of object class is functional. … Object classes have an associated 3-D form: form equals function. …”
“An object’s function is often a geometric function. The function of a room is to be an enclosing volume. … The function of a chair… is to be a flat surface at a comfortable height for sitting….”

Earlier Work
Winston, Binford, Katz and Lowry, “Learning physical descriptions from functional definitions, examples and precedents,” 1984
• Discussed the use of function-based definitions of object categories
• There is an infinity of individual physical descriptions of objects in a category…
• A single functional description represents them all (cup example)

Earlier Work
Brady, Agre, Braunegg and Connell, “The mechanic's mate,” 1985
Connell and Brady, “Generating and generalizing models of visual objects,” 1987
• Discussed the relation between geometric structure and functional significance
• A generalized structural description is learned from a sequence of examples

Earlier Work
Minsky, “The Society of Mind,” 1985
“… The solution is that we need to combine at least two different kinds of descriptions. On one side, we need structural descriptions for recognizing chairs when we see them.”
“… On the other side we need functional descriptions in order to know what we can do with them… we need connections between parts of the chair structure and the requirements of the human body that those parts are supposed to serve.”

Background
DiManzo, Trucco, Giunchiglia, Ricci, “FUR: Understanding Functional Reasoning,” 1989
• Utilized functional knowledge within an expert system framework
• Primitives defined as individual expert systems that evaluate 3D information

Background
Rivlin and Rosenfeld, “Navigational Functionalities,” 1995
• Explored functionality as it relates to mobile robots
• A navigating agent may classify objects in its environment in functional terms such as “threat,” “landmark,” and so on

Artificial Intelligence
Two areas within AI impact function-based research:
• Work on formal representations of knowledge about functionality
• Application of machine learning techniques to automate the process of constructing function-based systems

Artificial Intelligence
• The AI approach developed greater formalism and depth than that in computer vision
• This is an advantage as the complexity of system requirements increases

Robotics
• Incorporate best practices from other fields
• Evolution
• Service robots (controlled environment)
• Interaction to confirm function
• General navigational systems

Human Perception Theories
• Klatsky et al. (2005)
  • observe how children interact with objects associated with a specific function
  • use this information in the design of algorithms for robotic interaction with objects to reason about their function

Functional Knowledge Representation
• Barsalou et al. (2005)
  • HIPE (History, Intentional perspective, Physical environment, and Event sequences)
• Raubal and Moratz (2007)
  • expanded on the theory
  • representation of affordance-based attributes

Affordances?
The goal is object recognition using function.
According to Webster… Affordance (graphics): A visual clue to the function of an object.
Yes, GRUFF uses affordances.

Affordances
Some interpretations of Gibson's affordances:
• Automatic
• Pop out: no processing necessary
Have to admit: there were (are) different camps.

Affordances
According to Gibson: “If you know what can be done with… an object, what it can be used for, you can call it whatever you please.”

Affordances
• Is it considered an error if an object is misclassified? Yes or no?
[Image: www.businesssupply.com]

Affordances
According to Gibson: “If a surface of support is knee-high above the ground, it affords sitting on. We call it a seat in general. If it can be discriminated as having just these properties, it should look sit-on-able. If it does, the affordance is perceived visually.”
Yes, GRUFF uses affordances.

Affordances
Yes, it is a chair.

Gibson’s Theory of Affordances
Properties noted by Gibson (physical properties, measured relative to the animal) and the GRUFF knowledge primitives (shape properties) that address them:
  Property      Knowledge primitive
  horizontal    Relative orientation
  flat          Planar
  extended      Metric dimensions
  rigid         Requires interaction
Source: J.J. Gibson, The Ecological Approach to Visual Perception

Open Problems: Across Disciplines
Work to ensure:
• scalability
• efficiency
• accuracy
• ability to learn

What we learned from GRUFF

Open Problem Areas
• Data flow
• End goals

End Goals
[Figure: provides potential containment; provides potential table area; infer contextual relationships from accumulated functional evidence (provides potential worksurfaces; provides potential seating, i.e., back support and/or seating area); infer affordances “in the large” (in scale-space)]

Factors Influencing System Complexity
• Degree of interaction
• Feedback from interaction
• Complexity of interaction
From: Function from Visual Analysis and Physical Interaction. M. Sutton, L. Stark, & K. Bowyer. Image and Vision Computing 16 (1998) 745-763.

Knowledge Representation
The internal architecture utilized for reasoning about affordances:
[Figures: representation; action/observation sequence]

Interaction Tests for a Cup Object
[Figures: action/observation sequences]

Results: Furniture-like objects

Results: Dish-like objects

Representative OPUS Models

Representative Image Sequences

Results: Segmentation Issues

Summary of Unpredicted Subsystem Failures
  Category   Model Building Subsystem   Shape-based Reasoning Subsystem   Interaction-based Reasoning Subsystem
  Chairs     13/45 (29%)                8/32 (25%)                        3/18 (17%)
  Cups       -                          0/27 (0%)                         7/27 (26%)

Task/Affordance Driven Data Flow
Use task information -> capture image pair -> calculate disparity and range data (and evaluate) -> perform segmentation (and evaluate) -> perform function-based reasoning (and evaluate) -> reset parameters?
Data flow from function-based reasoning to refinement of image acquisition and range segmentation parameters.
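The task-driven data flow above can be sketched as a loop in Python. This is a minimal, hypothetical sketch, not the implementation described in this talk. It assumes the evaluation measure is a trapezoid with linear ramps between the least, ideal-low, ideal-high, and greatest breakpoints. It reads "probabilistic AND/OR" as the product and probabilistic-sum operators familiar from fuzzy logic. It also replaces real stereo capture and range segmentation with made-up stand-ins (`toy_segment`, `toy_reason`, `toy_adjust`, and a `merge_tol` segmentation parameter).

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

Params = Dict[str, float]
Seg = TypeVar("Seg")


def evaluation_measure(value: float, least: float, ideal_low: float,
                       ideal_high: float, greatest: float) -> float:
    """Map a shape-property value to [0, 1] (assumed trapezoid shape).

    1.0 inside [ideal_low, ideal_high], linear ramps down to 0.0 at `least`
    and `greatest`. Requires least < ideal_low <= ideal_high < greatest.
    """
    if value <= least or value >= greatest:
        return 0.0
    if value < ideal_low:
        return (value - least) / (ideal_low - least)
    if value > ideal_high:
        return (greatest - value) / (greatest - ideal_high)
    return 1.0


def prob_and(measures: Sequence[float]) -> float:
    """Probabilistic AND (product) of required measurements."""
    result = 1.0
    for m in measures:
        result *= m
    return result


def prob_or(measures: Sequence[float]) -> float:
    """Probabilistic OR (a + b - a*b) over descendant subcategory measures."""
    result = 0.0
    for m in measures:
        result = result + m - result * m
    return result


def feedback_loop(segment: Callable[[Params], Seg],
                  reason: Callable[[Seg], float],
                  adjust: Callable[[Params, float], Params],
                  params: Params,
                  threshold: float = 0.8,
                  max_iters: int = 5) -> Tuple[Params, float, List[Tuple[Params, float]]]:
    """Segment, score with function-based reasoning, and reset parameters
    until the score reaches `threshold` or `max_iters` (>= 1) runs out.
    Returns the best-scoring parameters, their score, and the full history."""
    history: List[Tuple[Params, float]] = []
    for _ in range(max_iters):
        score = reason(segment(params))
        history.append((dict(params), score))
        if score >= threshold:
            break
        params = adjust(params, score)
    best_params, best_score = max(history, key=lambda entry: entry[1])
    return best_params, best_score, history


# Hypothetical stand-ins: a chair seat 0.45 m high with 0.20 m^2 of true area.
# Too small a merge tolerance fragments the seat, so less area is recovered.
def toy_segment(params: Params) -> Dict[str, float]:
    area = 0.20 * min(1.0, params["merge_tol"] / 0.04)
    return {"seat_height": 0.45, "seat_area": area}


def toy_reason(seg: Dict[str, float]) -> float:
    # A provides_sittable_surface-style check: probabilistic AND of two
    # made-up ranges (height in m, area in m^2).
    return prob_and([
        evaluation_measure(seg["seat_height"], 0.30, 0.40, 0.50, 0.60),
        evaluation_measure(seg["seat_area"], 0.05, 0.15, 0.40, 1.00),
    ])


def toy_adjust(params: Params, score: float) -> Params:
    # Reset parameters: double the (hypothetical) merge tolerance.
    return {**params, "merge_tol": params["merge_tol"] * 2}


best, best_score, history = feedback_loop(toy_segment, toy_reason, toy_adjust,
                                          {"merge_tol": 0.01})
print(best, best_score, [round(s, 2) for _, s in history])
```

With these toy numbers, the first two tolerances fragment the seat and score 0.0 and about 0.5. The third recovers the full seat area, scores 1.0, and ends the loop. Returning the best-scoring parameter set, rather than the last one tried, is a design choice of this sketch. It ensures that a loop which never reaches the threshold still returns its most functional segmentation.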
Implementation Level: Parameter Sets

Implementation Level: Metrics / Error Calculations

Real Data: SVS

Real Data: Parameter Variations

Surface Extraction and Use of Affordances
Capture image pair -> calculate disparity and range -> evaluate range data -> perform/evaluate range segmentation -> perform/evaluate object recognition
Question: How can the use of affordances be incorporated into feedback loops?

Guiding Questions AND ANSWERS! (from previous Dagstuhl seminar)
• What could or should a robot control architecture look like that makes use of affordances as first-class items in perceiving the environment?
• How could or should such an architecture make use of affordances for action and reasoning?
• Is there more to affordances than function-oriented perception, action and reasoning?
• Should affordances in a robot be programmed or learned? (Can they be programmed in the first place?)
• What about an affordance needs to be represented in a robot, and how?
• How and where in the architecture would attention, intention, or other internal states filter affordances that were perceived on a low level?
• How would affordance-based control go together with behavior-based and plan-based control? Is it complementary? Redundant? Inconsistent?
• How can affordances be used for reasoning and action?

Affordances: …in space and time…

Affordances: …within subsystems… …supervisors, specialists, agents…

Affordances: …in scale-space…

“In a similar vein, trying to understand perception by studying only neurons is like trying to understand bird flight by studying only feathers: It just cannot be done. In order to understand bird flight, we have to understand aerodynamics; only then do the structure of feathers and the different shapes of birds’ wings make sense.”
David Marr (1982)

QUESTIONS?
Thank You!