Utility = f(Vision) - A Review

Perception
“To perceive is also about how to approach and what to do with an object …”
“Perception/cognition is determined by aspects and form of the agent (Embodiment) …”

Affordances
“An affordance is an intrinsic property of an object, allowing an action to be performed with the object. It also depends on the embodiment of the agent performing the action …”
“Objects which are cars for residents of Lilliput are merely toys for Gulliver …”

A Condition for Survival
“One of the most basic functions of all organisms is the cutting up of the environment into classifications by which nonidentical stimuli can be treated as equivalent …”

Clustering Visual Input
• Tremendous variation in shape! (Hard for state-of-the-art appearance-based algorithms to recognize them.)
• BUT all are sittable surfaces (for humans)!
• Or: dimensionality = 1 in affordance space.

So, the question to ask is:
What affordances can an object support, given its visual features such as shape, texture and color?

Why answer this question?
• Obtaining a semantic clustering of objects: generalization!
• Building visual perception for robotic platforms.
• Generating scene descriptions in a utilitarian framework: visual aid devices for the blind!
• For the sake of science!

Points to Note
• The mapping between shape and affordances is not one to one.
“The proposition is to use appearance cues as a supplement to affordance learning and not to totally ignore them…”

Points to Note, continued: Implicit and Explicit Knowledge
“Shapes can only represent explicit knowledge ..”
“Knowledge about hooks/fixture is implicit in (b)…”

A Survey of Efforts in the Past
“If I have seen further it is by standing on ye sholders of Giants” - Isaac Newton

Affordance Learning
• From Shape
  • Global Features
  • Local Features
• From Activity
  • Body Activity
  • Hand Activity
  • Interactive Robot
• From Simulation

Freeman & Newell [1971]
• A structure is a unit that provides a set of functions.
• Laid down a formalism for when and how structures can be combined to provide required functions.

The First Efforts! (Winston, Binford et al [1983])
• Functional description of an object, e.g. a cup
  • ako: a kind of
  • hq: has quality
• Input to System:

First Vision System Using Functional Information (Connell & Brady [1987])
• Describe functional concepts geometrically.
• Generalize!

Understanding Functional Reasoning ([Di Manzo, Ricci et al 1989])
• Knowledge representation: semantic networks
• Objects: 3D octree models (synthetic)
• Try to account for real-world noise
• Functional elements: Support, Grasp, Hang, Cut, Equilibrium, Enter, Contain, Pierce, Stop

More Attempts ([Stark et al. 1991-1994])
• Concept of knowledge primitives
  • Dimensions: length or area of a surface
  • Relative orientation: between surfaces
  • Proximity: between surfaces/faces
  • Clearance: lack of obstacles in a defined area
  • Stability: being at rest in a certain orientation
• Pre-defined categories and sub-categories
• CAD and range-sensor data
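
The knowledge primitives listed above can be made concrete with a small sketch. This is only an illustration, not the formulation used by Stark et al.: the `Surface` class, the function names, and every threshold below are assumptions introduced here, and the clearance primitive is left out for brevity.

```python
import math
from dataclasses import dataclass


@dataclass
class Surface:
    """A planar rectangular patch (hypothetical representation)."""
    center: tuple   # (x, y, z) in metres
    normal: tuple   # unit normal vector
    length: float   # metres
    width: float    # metres


def area(s):
    """Dimensions primitive: area of the surface."""
    return s.length * s.width


def relative_orientation(a, b):
    """Relative orientation primitive: angle between normals, in degrees."""
    dot = sum(p * q for p, q in zip(a.normal, b.normal))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding error
    return math.degrees(math.acos(dot))


def proximity(a, b):
    """Proximity primitive: distance between surface centres."""
    return math.dist(a.center, b.center)


def is_stable_support(s, tol_deg=10.0):
    """Stability primitive (simplified): the surface faces up within tol_deg."""
    cos_up = max(-1.0, min(1.0, s.normal[2]))
    return math.degrees(math.acos(cos_up)) <= tol_deg


def provides_sittable_surface(s, min_area=0.1, min_h=0.3, max_h=0.6):
    """Combine primitives into a functional test; all thresholds are assumed."""
    return (is_stable_support(s)
            and area(s) >= min_area
            and min_h <= s.center[2] <= max_h)


# Two surfaces of a toy chair: a seat and a back rest.
seat = Surface((0.0, 0.0, 0.45), (0.0, 0.0, 1.0), 0.4, 0.4)
back = Surface((0.0, -0.2, 0.7), (0.0, 1.0, 0.0), 0.4, 0.5)

print(provides_sittable_surface(seat))   # seat is horizontal and at sitting height
print(provides_sittable_surface(back))   # back rest is vertical
print(relative_orientation(seat, back))  # angle between seat and back normals
```

Note that the test asks whether a surface is sittable, not whether the object is a chair, which is the distinction the later criticism slide draws.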

More Attempts, continued ([Stark et al. 1991-1994])
• Categories considered: Chairs, Tables, Bench, Bookshelf, Bed, Not Known

A Part-Based Approach ([Rivlin et al 1995])
• Extract 4 parts: Sticks, Blobs, Plate, Strip
• Reason about their relative configuration

Criticism
• Highlight the importance of knowledge representation
• Hard-coded definitions
• Almost no testing on real-world data
• Instead of trying to recognize surfaces for sitting, sleeping and keeping objects, they ended up recognizing chairs, beds and tables!
• Pseudo-functional space

Using Affordance Cues for Object Detection

Use of Coarse Features ([Dillman et al. ICRA 2011])
• Objects: 2 Oranges, 1 Apple, Can, Tissue Packs, Beaker, Bottle
• Coarse features generalize the most.
• Active stereo, multiple viewpoints

Affordance Learning: Learning by Actions

Human Actions and Object Context (Moore et al. [ICCV 1999])
• Jointly model actions and image features
• Pre-defined object model
• Shape: pixel area, size of bounding box, L2-distance from known classes
• Action: HMM-based hand pose estimation

Results

Interaction Signatures ([Venkatesh et al ICCV-05])
• Consider only printer, chair, keyboard and paper!

Observing Humans ([Veloso et al. ICRA 2005])

Objects in Action ([Gupta et al CVPR 2007])
• HOG: initial guess on the probability of an object in a window
• Reach (Mr)
• Manipulation (Mm)
• Reaction (Or)

Interactive Learning ([Leonardis et al 2009])
• Object shape: ellipses (curvature, area, etc.)
• Action features: color and edge histograms
• SVM maps object features to clustered action features.

Object-Action Recognition ([Kragic 2011])
• Consider
  • Objects: Book, Magazine, Hammer, Pitcher, Box, Cup
  • Actions: hammering, opening, pouring
• Video data
• Object recognition: HOG
• Hand pose (velocity, angle between joints, orientation): SVM
• Learn a joint model using a factorial CRF.

Affordance Learning: Learning by Simulation

Learning Spatial Relations Using Functional Simulation (Sjoo et al [IROS 2011])
• Learn the relation between 2 objects
  • Support
  • Protection
  • Constraint
  • Move Together
• Features
  • Pose, closest separation, area, distance, contact patch area, etc.
• Predict the relation given the features.

What makes a chair a chair?
• Discussed!

Indoor Scenes
• Highly structured!
• Surface orientations: mainly vertical and horizontal
• Components
  • Boundaries: Walls, Floors, Doors
  • Furniture: Tables, Chairs, Beds, Shelves, Cabinets
  • Actions: Cups, Bottles, Glasses, Books, Pens, Kitchen Appliances, etc.
• Current proposition: discover the first 2 categories of scene components

Scene Interpretation

Most Relevant Work (Rusu et al [2010])

Framework
• Kitchen environment
• Co-register 16 scans from laser and TOF cameras.
• Bottom and topmost regions: floor and roof
• Determine X and Y axes
• Use heuristics on the remaining vertical surfaces to get walls.
• Label other vertical surfaces as furniture.
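
The labelling steps of the Framework slide can be sketched as below. The slide does not say which heuristics separate walls from furniture, so the rule used here (a vertical surface is a wall if it spans most of the floor-to-roof height) is an assumption, as are the `Region` class, the function name, and all thresholds.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A planar region from a segmented scan (hypothetical structure)."""
    name: str
    normal_z: float  # z component of the unit normal
    z_min: float     # lowest point of the region, metres
    z_max: float     # highest point of the region, metres


def label_regions(regions, horiz_thresh=0.9, vert_thresh=0.1, wall_span=0.9):
    """Label regions as floor, roof, wall, furniture or horizontal surface."""
    labels = {r.name: "unknown" for r in regions}
    horizontal = [r for r in regions if abs(r.normal_z) >= horiz_thresh]
    vertical = [r for r in regions if abs(r.normal_z) <= vert_thresh]
    if len(horizontal) < 2:
        return labels  # cannot identify both floor and roof

    # Bottom and topmost horizontal regions are taken as floor and roof.
    floor = min(horizontal, key=lambda r: r.z_min)
    roof = max(horizontal, key=lambda r: r.z_max)
    room_height = roof.z_max - floor.z_min
    if room_height <= 0:
        return labels

    for r in horizontal:
        labels[r.name] = "horizontal surface"
    labels[floor.name] = "floor"
    labels[roof.name] = "roof"

    # Assumed heuristic: walls span (almost) the full room height;
    # every other vertical surface is a furniture candidate.
    for r in vertical:
        spans_room = (r.z_max - r.z_min) >= wall_span * room_height
        labels[r.name] = "wall" if spans_room else "furniture"
    return labels


scan = [
    Region("p0", 1.0, 0.00, 0.02),  # lowest horizontal plane
    Region("p1", 1.0, 2.48, 2.50),  # highest horizontal plane
    Region("p2", 0.0, 0.00, 2.50),  # tall vertical plane
    Region("p3", 0.0, 0.00, 0.90),  # short vertical plane
    Region("p4", 1.0, 0.88, 0.90),  # mid-height horizontal plane
]
labels = label_regions(scan)
print(labels)
```

Hand-set thresholds like these are exactly the kind of heuristic the next slides propose to replace with learned features.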

Segmentation

Furniture Labelling

Moving Ahead (Replacing Heuristics)
• Horizontal L-1 features
  • Z-coordinate, length and width
• Vertical L-1 features
  • Height, floor distance, roof distance, width
• L-2 features
  • Height, width
  • Number of handles, knobs
• Learn using a CRF.

Some Results
• Legend
  • Horizontal planes: Floors, Tables, Ceilings
  • Vertical planes: Walls, Furniture Candidates
  • Furniture: Cupboards, Drawers, Kitchen Appliances

Leftover Objects
• Like cups, bottles, etc.
• Application: grasping, manipulation

Geometrical Primitives
• Planes, Spheres, Cylinders, Cones, Tori, Edges and Corners
• Use local point features for primitive labelling with a CRF.
• Using the point labels, an SVM that captures shape then identifies the class of the object (4 object classes).

Proposition

Pipeline
• Point Cloud
• Observe Clusters
• Surface Normal Clustering
• Identify floor, roof, Z axis
• Segmentation
• Normal Edges
• Walls, X and Y Axes
• Compute Features
• Identify Horizontal and Vertical Surfaces

Features
• Defined for each horizontal/vertical surface
  • Orientation of surface
  • Area of surface
  • Volume of object
  • Distance from floor
  • Distance from walls
• At a 2nd level
  • Relation with other surfaces in the object
• Metrics
  • Human height

Hopeful Objects
• Identification
  • Walls, Floors, Navigable Spaces
• Emerge by unsupervised clustering (pure geometrical features)
  • Tables/Desks
  • Chairs
  • Beds
  • Shelves
  • Almirahs
  • Doors
  • Cabinets
  • Windows
  • Dustbins

Further Extensions
• Poselet-driven affordance learning:
  • Human moving around in an environment.
  • Vision system tracks humans, associates poses and objects.
  • Supplement object detection by using poses.
  • E.g. recognizing bean bags for sitting.
  • Predict the affordance pose given the object.
• fMRI study:
  • Learn a model by showing common tools.
  • Use, say, a screwdriver for hammering; it would be interesting to see whether it is predicted as a hammer or a screwdriver.
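
The per-surface features of the Features slide and the unsupervised clustering of the Hopeful Objects slide can be illustrated with a minimal sketch. Everything specific here is an assumption: the slides do not name a clustering algorithm, so plain k-means stands in for it; the human height value, the toy measurements, and the function names are invented for illustration; and the volume feature is left out.

```python
import numpy as np

HUMAN_HEIGHT = 1.7  # metres; assumed value for the "human height" metric


def surface_features(is_horizontal, area, dist_floor, dist_wall):
    """Feature vector for one surface, with lengths scaled by human height."""
    return np.array([
        1.0 if is_horizontal else 0.0,  # orientation of surface
        area / HUMAN_HEIGHT ** 2,       # area of surface
        dist_floor / HUMAN_HEIGHT,      # distance from floor
        dist_wall / HUMAN_HEIGHT,       # distance from walls
    ])


def kmeans(X, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(k - 1):
        # Distance of every point to its nearest already-chosen centre.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return assign


# Toy measurements (metres): two seat-like, two table-like, two door-like surfaces.
X = np.array([
    surface_features(True, 0.16, 0.45, 1.0),
    surface_features(True, 0.20, 0.42, 1.2),
    surface_features(True, 1.00, 0.75, 0.5),
    surface_features(True, 1.20, 0.72, 0.6),
    surface_features(False, 1.80, 0.0, 0.0),
    surface_features(False, 2.00, 0.0, 0.0),
])
labels = kmeans(X, 3)
print(labels)  # surfaces with similar geometry share a cluster index
```

Scaling every length by human height keeps the features tied to what a person can do with a surface, which is the point of measuring in affordance space instead of appearance space.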