Utility = f(Vision) - A Review

Perception
“To perceive is also about how to approach and what to do
with an object …”
“Perception/cognition is determined by aspects and form of
the agent (Embodiment) …”
Affordances
“An affordance is an intrinsic property of an object, allowing
an action to be performed with the object. It also depends
on the embodiment of the agent performing the action …”
“Objects which are cars for residents of Lilliput, are merely
toys for Gulliver… ”
A Condition for Survival
“One of the most basic functions of all organisms is the
cutting up of the environment into classifications by which nonidentical stimuli can be treated as equivalent …”
Clustering Visual Input
Tremendous variation in shape!
(Hard for state-of-the-art appearance-based algorithms to recognize them)
BUT
All are sittable surfaces! (for humans)
Or, dimensionality = 1 in affordance space.
So, the question to ask is:
What are the affordances an object
can support given its visual features
such as shape, texture and color ?
Why answer this question?
• Obtaining semantic clustering of objects → Generalization!
• Building vision perception for robotic platforms.
• Generating scene descriptions in a utilitarian framework → Visual aid devices for the blind!
• For the sake of science !
Points to Note
• The mapping between Shape and Affordances is not one to one.
“The proposition is to use appearance cues as a supplement to
affordance learning and not to totally ignore them…”
Continued… Implicit and Explicit Knowledge
“Shapes can only represent explicit knowledge ..”
“Knowledge about hooks/fixture is implicit in (b)…”
A Survey of efforts in the past
“If I have seen further it is by standing
on ye sholders of Giants”
-Isaac Newton
Affordance Learning
• From Activity: Body Activity, Hand Activity, Interactive Robot
• From Simulation
• From Shape: Global Features, Local Features
Freeman & Newell [1971]
• Structure is a unit that provides a set of functions.
• Laid down a formalism for when and how structures can be combined to provide required functions.
The first efforts! (Winston, Binford et al. [1983])
• Functional description of an object cup
• ako: A kind of
• hq: Has quality
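To make the ako/hq notation concrete, a functional description can be stored as relation triples. The sketch below is only an illustration: the specific facts about the cup and the helper names `FACTS`, `query` and `kinds_of` are assumptions made for this example, not the representation used by Winston, Binford et al.

```python
# Illustrative facts only: (subject, relation, value) triples.
# "ako" = a kind of, "hq" = has quality. The entries are assumed for the example.
FACTS = [
    ("cup", "ako", "open-vessel"),
    ("cup", "hq", "stable"),
    ("cup", "hq", "liftable"),
    ("open-vessel", "ako", "vessel"),
]


def query(subject, relation, facts=FACTS):
    """Return the values directly linked to `subject` by `relation`."""
    return [v for s, r, v in facts if s == subject and r == relation]


def kinds_of(subject, facts=FACTS):
    """Follow ako links transitively; the membership check avoids revisiting a kind."""
    result = []
    frontier = query(subject, "ako", facts)
    while frontier:
        kind = frontier.pop(0)
        if kind not in result:
            result.append(kind)
            frontier.extend(query(kind, "ako", facts))
    return result
```

With these toy facts, `query("cup", "hq")` lists the qualities attached to the cup and `kinds_of("cup")` walks up the ako chain.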
Input to System:
First Vision System using functional information (Connell & Brady [1987])
• Describe functional concepts geometrically.
• Generalize !
Understanding Functional Reasoning ([Di Manzo, Ricci et al. 1989])
• Knowledge representation → Semantic Networks
• Objects → 3D octree models (Synthetic).
• Try to account for real-world noise
• Functional Elements: Support, Grasp, Hang, Cut, Equilibrium, Enter, Contain, Pierce, Stop
More Attempts ([Stark et al. 1991-1994])
• Concept of Knowledge Primitives
• Dimensions (length or area of surface)
• Relative orientation: between surfaces
• Proximity: between surfaces/faces
• Clearance: Lack of obstacles in a defined area
• Stability: being at rest in a certain orientation.
• Pre-define Categories and Sub-Categories
• CAD and Range-sensor data.
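A minimal sketch of how the knowledge primitives listed above could be combined into a functional test such as "provides a sittable surface". It is not Stark et al.'s implementation: the `Surface` fields, the function `is_sittable` and every threshold are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Surface:
    area_m2: float       # dimensions primitive
    tilt_deg: float      # relative orientation: angle to the ground plane
    height_m: float      # proximity: distance between the surface and the floor
    clearance_m: float   # clearance: free space above the surface
    stable: bool         # stability: object rests in this orientation


def is_sittable(s: Surface) -> bool:
    """All thresholds are hypothetical, picked for the example only."""
    return (
        s.area_m2 >= 0.1
        and s.tilt_deg <= 15.0
        and 0.3 <= s.height_m <= 0.6
        and s.clearance_m >= 0.5
        and s.stable
    )
```

The test never mentions a category name: a chair seat, a low bench and a sturdy crate would all pass as long as the primitives hold.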
Continued… ([Stark et al. 1991-1994])
• Categories Considered
• Chairs, Tables, Bench, Bookshelf, Bed, Not Known
A Part-Based Approach ([Rivlin et al. 1995])
• Extract 4 parts → Reason about their relative configuration
• Sticks, Blobs, Plate, Strip
Criticism
• Highlighted the importance of knowledge representation
• Hard-coded definitions
• Almost no testing on real-world data
• Instead of recognizing surfaces for sitting, sleeping and keeping objects, these systems ended up recognizing chairs, beds and tables!
• Pseudo-functional Space
Using Affordance Cues for Object Detection
Continued..
Use of Coarse Features ([Dillman et al. ICRA 2011])
• 2 Oranges, 1 Apple, Can, Tissue Packs, Beaker, Bottle
• Coarse features generalize the most.
• Active Stereo, Multiple Viewpoints
Affordance Learning
Learning by Actions
Human Actions and Object Context (Moore et al. [ICCV 1999])
• Jointly Model actions and Image features
• Pre-defined object model
• Shape: Pixel area, size of bounding box, L2-distance from known classes
• Action: HMM based hand pose estimation
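The shape cues named above (pixel area, bounding-box size, L2 distance from known classes) are simple to compute from a binary object mask. The sketch below shows one way to do it; the functions `shape_features` and `nearest_class` and the prototype values in the usage note are assumptions for illustration, not the code or data of Moore et al.

```python
import math


def shape_features(mask):
    """mask: list of rows of 0/1. Returns (pixel_area, bbox_width, bbox_height)."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return (0, 0, 0)
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (len(coords), max(cols) - min(cols) + 1, max(rows) - min(rows) + 1)


def nearest_class(features, prototypes):
    """Return (label, L2 distance) for the closest prototype feature vector."""
    return min(
        ((label, math.dist(features, proto)) for label, proto in prototypes.items()),
        key=lambda pair: pair[1],
    )
```

For example, with the made-up prototypes `{"cup": (6, 2, 3), "book": (20, 5, 4)}`, a mask whose features are `(5, 2, 3)` is matched to `"cup"`. In a joint model this distance would only be one cue, combined with the observed action.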
Results
Interaction Signatures ([Venkatesh et al. ICCV-05])
• Considers only printer, chair, keyboard and paper!
Observing Humans ([Veloso et al. ICRA 2005])
Objects in Action ([Gupta et al. CVPR 2007])
• HOG → Initial guess on probability of object in a window.
• Reach (Mr)
• Manipulation (Mm)
• Reaction (Or)
Interactive Learning ([Leonardis et al. 2009])
• Object Shape → Ellipses (Curvature, area, etc.)
• Action features → Color and Edge histograms
• SVM → maps object features to clustered action features.
Object-Action Recognition ([Kragic 2011])
• Consider
• Book, Magazine, Hammer, Pitcher, Box, Cup
• Hammering, opening, pouring
• Video Data
• Object Recognition → HOG
• Hand pose (velocity, angle b/w joints, orientation) → SVM
• Learn a joint model using Factorial CRF.
Affordance Learning
Learning by Simulation
Learning Spatial Relations Using Functional Simulation (Sjoo et al. [IROS 2011])
• Learn relation between 2 objects
• Support
• Protection
• Constraint
• Move Together
• Features
• Pose, closest separation, area, distance, contact patch area etc.
• Predict relation given features.
What makes a chair a chair?
• Discussed !
Indoor Scenes
• Highly Structured !
• Surface Orientations: Mainly Vertical and Horizontal
• Components
• Boundaries
• Walls, Floors, Doors
• Furniture
• Tables, Chairs, Beds, Shelves, Cabinets
• Actions
• Cups, Bottles, Glasses, Books, Pens, Kitchen Appliances etc.
• Current Proposition → Discover the first 2 categories of scene components
Scene Interpretation
Most Relevant Work
(Rusu et al [2010])
Framework
• Kitchen Environment
• Co-Register 16 scans → Laser and TOF Cameras.
• Bottom and Topmost regions → Floor and Roof
• Determine X and Y axes
• Use heuristics on remaining vertical surfaces to get walls.
• Label other vertical surfaces as furniture.
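The labelling steps above can be sketched on a list of already-segmented planar regions. This is a minimal illustration, not the pipeline of Rusu et al.: the slide does not say which heuristic separates walls from furniture, so the rule used here (vertical extent of at least `wall_min_height`) and all parameter values are assumptions.

```python
def label_planes(planes, wall_min_height=2.0, vertical_tol=0.2):
    """planes: list of dicts with 'normal' (unit 3-vector), 'z' (centroid height
    in metres) and 'height' (vertical extent in metres).
    Returns one label per plane, in the same order."""
    horizontal = [i for i, p in enumerate(planes)
                  if abs(p["normal"][2]) > 1.0 - vertical_tol]
    vertical = [i for i, p in enumerate(planes)
                if abs(p["normal"][2]) < vertical_tol]
    labels = ["other"] * len(planes)
    if horizontal:
        # Bottom-most and top-most horizontal regions become floor and roof.
        lowest = min(horizontal, key=lambda i: planes[i]["z"])
        highest = max(horizontal, key=lambda i: planes[i]["z"])
        labels[lowest] = "floor"
        if highest != lowest:
            labels[highest] = "roof"
    for i in vertical:
        # Assumed heuristic: tall vertical planes are walls, the rest furniture.
        tall = planes[i]["height"] >= wall_min_height
        labels[i] = "wall" if tall else "furniture"
    return labels
```

Horizontal planes that are neither floor nor roof (a table top, for instance) stay labelled `"other"` and are left for a later stage.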
Segmentation
Furniture Labelling
Moving Ahead (Replacing Heuristics)
• Horizontal L-1 features
• Z-Coordinate, Length and Width
• Vertical L-1 features
• Height, Floor Distance, Roof Distance, Width
• L2 features
• Height, Width
• Num Handles, Knobs
• Learn using CRF.
Some Results
• Legend
• Horizontal Planes: Floors, Tables, Ceilings
• Vertical Planes: Walls, Furniture Candidates
• Furniture: Cupboards, Drawers, Kitchen Appliances
Leftover Objects
• Like cups, bottles etc.
• Application: Grasping, Manipulation
Geometrical Primitives
• Planes, Spheres, Cylinders, Cones, Tori, Edges and Corners
• Use local point features for primitive labeling using CRF.
• Then, using the point labels, an SVM model capturing shape identifies the object class (4 object classes).
Proposition
Pipeline
• Point Cloud
• Observe Clusters
• Surface Normal Clustering
• Identify floor, roof, Z axis
• Segmentation (Normal, Edges)
• Walls, X and Y Axes
• Compute Features
• Identify Horizontal and Vertical Surfaces
Features
• Defined for each horizontal/vertical surface
• Orientation of Surface
• Area of Surface
• Volume of Object
• Distance from floor
• Distance from walls
• At a 2nd level
• Relation with other surfaces in the object
• Metrics
• Human Height
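A minimal sketch of how some of the per-surface features above (orientation, area, distance from floor) might be computed for a planar surface given as an ordered list of 3D corner points. The function name `surface_features`, the dictionary keys and the default `human_height` of 1.7 m used to normalise distances are assumptions for this example.

```python
import math


def surface_features(vertices, floor_z=0.0, human_height=1.7):
    """vertices: ordered 3D corners of a planar polygon, in metres."""
    nx = ny = nz = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1, z1 = vertices[i]
        x2, y2, z2 = vertices[(i + 1) % n]
        # Newell's method: accumulate the polygon's area-weighted normal.
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    area = 0.5 * norm
    # Angle between the surface normal and the vertical (Z) axis, in degrees:
    # 0 for a horizontal surface, 90 for a vertical one.
    tilt = math.degrees(math.acos(min(1.0, abs(nz) / norm))) if norm else 0.0
    mean_z = sum(v[2] for v in vertices) / n
    return {
        "area": area,
        "tilt_deg": tilt,
        "floor_distance": mean_z - floor_z,
        # Distance expressed relative to the assumed human height.
        "floor_distance_rel": (mean_z - floor_z) / human_height,
    }
```

Expressing distances relative to human height keeps the features tied to what a person can reach or sit on rather than to absolute scene scale.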
Hopeful Objects
• Identification
• Walls, Floors, Navigable Spaces
• Emerge by unsupervised clustering (pure geometrical features)
• Tables/Desks
• Chairs
• Beds
• Shelves
• Almirahs
• Doors
• Cabinets
• Windows
• Dustbins
Further Extensions
• Poselet driven affordance learning:
• Human moving around in an environment.
• Vision system → Tracks humans, associates poses and objects.
• Supplement object detection by using poses.
• E.g.: Recognizing bean bags for sitting.
• Predict the affordance pose given the object.
• fMRI Study:
• Learn a model by showing common tools.
• Use, say, a screwdriver for hammering → it would be interesting to see whether it is predicted as a hammer or a screwdriver.