Quick Intro

IBM Research
Extensible Language Interface for Robot Manipulation
Jonathan Connell – Exploratory Computer Vision Group
Etienne Marcheret – Speech Algorithms & Engines Group
Sharath Pankanti (IBM Yorktown)
Michiharu Kudoh (IBM Tokyo)
Risa Nishiyama (IBM Tokyo)
Much of “Intelligence” Based on Two Illusions
▪ Animal part = mobility, perception, and reaction
• People flock around robots and readily anthropomorphize them
• Real-world action seems to convey a feeling of “aliveness”
• Responsiveness to changes in the environment conveys a sense of “mind”
• Key point in the embodied / situated agents viewpoint
▪ Human part = learning by being told
• Bulk of human knowledge contained in culture, largely passed verbally
• No one discovers how to cook macaroni and cheese – someone explains
• Lack of communication makes even people (e.g. foreigners) seem less “human”
Goal is to “fuse” these two parts into a harmonious whole

Analogy to a Turing machine
• Core is a simple finite state machine controller (= language interpreter)
• Addition of tape vastly increases computational power (= learning from language)
Required Innate Mechanisms
▪ Segmentation
• Division of the world into spatial regions (partial segmentation okay)
• Positive space regions are objects, people, and surfaces
• Negative space regions are places and passages
▪ Comparison
• Objects have properties like color and size that can be compared
• Objects have relations to other objects such as position
▪ Actions
• Operators can be indexed to operate on certain objects
• Most have expected continuation and / or end conditions
▪ Time
• Physical motions have expected durations
• Actions can be sequenced based on completion
• More complex actions can be built from simpler ones (sketched below)
▪ Language interpretation ties into all these pre-existing (animal) abilities
• Nouns, adjectives, prepositions, verbs, adverbs, conjunctions
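A minimal Python sketch of the last two mechanisms, with every name hypothetical (this is not code from the actual system): primitive actions carry an expected duration and an end condition, and a sequencer advances on completion.

    import time

    class Action:
        """A primitive motion with an expected duration and an end condition."""
        def __init__(self, name, step_fn, done_fn, timeout=5.0):
            self.name = name
            self.step_fn = step_fn   # drives the motion one tick at a time
            self.done_fn = done_fn   # end condition: True once motion completes
            self.timeout = timeout   # expected duration bound

        def run(self):
            start = time.time()
            while not self.done_fn():
                if time.time() - start > self.timeout:
                    return False     # ran past its expected duration -> failure
                self.step_fn()
            return True

    def run_sequence(actions):
        """Sequence actions on completion; all() stops at the first failure."""
        return all(act.run() for act in actions)

A more complex action is then just a named list of such primitives, which is the sense in which bigger actions are built from simpler ones.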
ELI: A Fetch-and-Carry Robot
▪ Use speech, language, and vision to learn objects & actions
• But not from the lowest level, like “what is a word” or “what visual properties signal an object”
• Build in as much as is practical
▪ Save learning for terms not knowable a priori
• Names for particular items or rooms in a house
• How to perform special tasks like “clean up”
Example dialog (the dispatch pattern is sketched below):
User: “Round up my mug.” (command following)
Robot: “I don’t know how to ‘round up’ your mug.”
User: “Walk around the house and look for it. When you find it bring it back to me.” (verb learning)
Robot: “I don’t know what your ‘mug’ looks like.”
User: “It is like this <shows another mug> but sort of orange-ish.” (noun learning)
Robot: “OK … I could not find your mug.”
User: “Try looking on the table in the living room.” (advice taking)
Robot: “OK … Here it is!”
Potential use in eldercare scenario – a service dog with less slobber
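The dialog amounts to a simple dispatch: an unknown verb or noun triggers a request for a definition instead of a failure. A hypothetical Python sketch of that control flow (names and phrasing are illustrative):

    def handle_command(verb, noun, verbs, nouns):
        """Interpret a parsed command; unknown terms trigger learning, not failure.

        verbs maps known verb names to procedures; nouns maps known noun
        names to visual models.
        """
        if verb not in verbs:
            # verb learning: ask the user to define the action in known terms
            return f'I don\'t know how to "{verb}" your {noun}.'
        if noun not in nouns:
            # noun learning: ask for an example object to build a model from
            return f'I don\'t know what your "{noun}" looks like.'
        return verbs[verb](nouns[noun])  # both known: run the stored procedure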
Capabilities Illustrated Through 4-Part Video
▪ Arm and camera removed from robot and mounted on a table
▪ Simplifies the problem by reducing the degrees of freedom
[Photo: tabletop arm and camera with OTC medications (Advil & Gaviscon)]
Multi-Modal Interaction (video part 1)
Features:
▪ Automatically finds objects
▪ Selects by position, size, color (selection sketched below)
▪ Grabs selected object
▪ Understands pronoun reference
▪ Can ask clarifying questions
▪ Handles user pointing
▪ Robot points for emphasis
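Selection by attributes can be pictured as filtering the detected objects and asking a clarifying question when more than one candidate survives; a hypothetical sketch, not the system's actual code:

    def select_object(objects, **attrs):
        """Pick the one detected object matching the requested attributes.

        objects: list of dicts such as {"color": "red", "size": "big",
        "position": "left"} produced by the vision front end.
        """
        matches = [o for o in objects
                   if all(o.get(k) == v for k, v in attrs.items())]
        if len(matches) == 1:
            return matches[0]
        if not matches:
            return "I don't see one like that."
        return "Which one do you mean?"  # ambiguity -> clarifying question

    # hypothetical usage
    scene = [{"color": "red", "size": "big", "position": "left"},
             {"color": "red", "size": "small", "position": "right"}]
    print(select_object(scene, color="red"))   # ambiguous -> asks which one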
Noun Learning Scenario (video part 2)
Features:
▪ Builds visual models
▪ Adds new nouns to grammar
▪ Identifies objects from models
▪ Passes object to/from user
Model = size + shape + colors
Matching = nearest neighbor
dist = Σ w[i] * | v[i] – m[i] |
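In code, this matching rule is a weighted L1 nearest neighbor over the model features; a short Python sketch with made-up feature vectors and weights:

    def nn_match(v, models, w):
        """Identify an object: nearest neighbor under dist = sum w[i]*|v[i]-m[i]|."""
        def dist(m):
            return sum(wi * abs(vi - mi) for wi, vi, mi in zip(w, v, m))
        name = min(models, key=lambda n: dist(models[n]))
        return name, dist(models[name])

    # hypothetical usage: v holds measured size/shape/color features
    models = {"mug": [0.8, 0.3, 0.6], "bottle": [0.2, 0.9, 0.1]}
    print(nn_match([0.7, 0.4, 0.5], models, w=[1.0, 0.5, 2.0]))  # -> ("mug", 0.35)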
Once objects have names, more properties are available
▪ Oversee operation of the physical robot to provide more intelligent action
[Diagram: the Eli robot at Watson (Vision, ASR, Objects, Parser, Vocabulary, Visual models, Kinematics, Sequencer, Talk) is linked over the network to a brainy response system at Tokyo (Reasoning, Semantic memory, Action models, Lifelog, Archive); the robot sends context updates and retrieves vetoes and recommendations (round trip sketched below)]
Could envision a similar extension using the RoboEarth online resource
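The round trip could look like the Python sketch below; the endpoint URL, message format, and field names are all invented for illustration, and only the context-update / veto-and-recommendation pattern comes from the diagram:

    import json
    import urllib.request

    def consult_backend(url, context):
        """Send a context update to the remote reasoner and retrieve any
        vetoes or recommendations before acting."""
        req = urllib.request.Request(
            url,
            data=json.dumps(context).encode("utf-8"),
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req, timeout=2.0) as resp:
            reply = json.load(resp)
        return reply.get("veto", False), reply.get("recommendation")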
Manipulation with Intelligent Backend (video part 3)
Features:
▪ Vetoes actions based on DB
▪ Picks alternates using ontology
▪ Checks for valid dose interval (see the sketch below)
▪ Real-time cloud connection
[Diagram: “Alice” requests aspirin; her lifelog history (7:14 AM xxxxx, 8:39 AM zzzzz, 9:01 AM took Tylenol) causes the DB to answer NO; the ontology maps the requested Rolaids to the Tums actually present, both antacids; the “gavagai problem” of word reference is noted]
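A plausible Python sketch of the veto and substitution logic; the rules, thresholds, and data shapes are illustrative, not the actual backend:

    from datetime import datetime, timedelta

    def check_request(item, lifelog, ontology, inventory, min_hours=4):
        """Decide whether to hand over a medication (logic is illustrative).

        lifelog:   list of (datetime, item) events
        ontology:  maps an item to its drug category (e.g., "Tums" -> "antacid")
        inventory: set of items actually present on the table
        """
        now = datetime.now()
        # veto if anything in the same category was taken too recently
        for when, what in lifelog:
            if (ontology.get(what) == ontology.get(item)
                    and now - when < timedelta(hours=min_hours)):
                return ("veto", what)
        if item in inventory:
            return ("give", item)
        # otherwise offer an alternate from the same category (Tums for Rolaids)
        for other in inventory:
            if ontology.get(other) == ontology.get(item):
                return ("substitute", other)
        return ("not found", None)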
Verb Learning Scenario (video part 4)
Features:
▪ Learns action sequences (storage sketched below)
▪ Handles relative motion commands
▪ Responds to incremental positioning
▪ Applies new actions to other objects
[Diagram: the learned “poke” sequence stored as point 1.0, out 1.0, out −1.0]
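A learned verb can be stored simply as a sequence of (primitive, amount) steps and replayed against a new target; in this Python sketch the "poke" steps mirror the diagram while everything else is hypothetical:

    # A learned verb is stored as a sequence of (primitive, amount) steps.
    learned_verbs = {}

    def teach(verb, steps):
        learned_verbs[verb] = steps             # built up during verbal training

    def perform(verb, target, execute):
        """Replay a learned sequence, re-indexed to a new target object."""
        for primitive, amount in learned_verbs[verb]:
            execute(primitive, amount, target)  # primitive motion w.r.t. target

    teach("poke", [("point", 1.0), ("out", 1.0), ("out", -1.0)])
    perform("poke", "red bottle", lambda p, a, t: print(p, a, t))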
ELI Arm Demos Video
Also available on YouTube:
http://www.youtube.com/watch?v=M2RXDI3QYNU
Summary of Abilities
▪ Perception
• Automatically detects and counts visual objects
• Understands colors, sizes, and overall positions
▪ Action
• Can successfully reach for seen objects
• Can grasp and deposit objects in the real world
▪ Language
• Parses and responds appropriately to speech commands
• Understands pointing and uses pointing itself
• Properly interprets object-passing interactions
▪ Reasoning
• Knows the limits of what it can see, reach, and grab
• Asks clarifying questions when there are ambiguities
• Can alter actions based on known facts, histories, and ontologies
▪ Learning
• Acquires new visual object models and corresponding words
• Can be verbally taught a named sequence of indexical actions
▪ Differences from some AGI work
• Complete approach attacking the core problem (language as tape)
• Concrete, physical, and implemented system (all integrated)
Extensions
▪ What is still missing?
• Acquiring new data by observation & interaction
• Filling in holes in learned representations & procedures
• Fixing inaccuracies in taught knowledge
▪ Free the robot from top-down imperatives!
• Add initiative – a smart assistant will look for answers itself
• Improvisation – if something does not match perfectly, try a variation
• Experiential learning – better to pick up a cup by the rim instead of the base