Using Computational Cognitive Models
for Better Human-Robot Collaboration
Alan C. Schultz
J. Gregory Trafton
Nick Cassimatis
Navy Center for Applied Research in Artificial Intelligence
Naval Research Laboratory
Peer-to-peer collaboration in
Human-Robot Teams
• Not interested in a general, unified grand theory of
cognition for solving the whole problem
– We already know how to be mobile, avoid collisions, etc.
• Approach: be informed by cognitive psychology
– Study human-human collaboration
– Determine important high-level cognitive skills
– Build computational cognitive models of these skills
• ACT-R, SOAR, EPIC, Polyscheme…
– Use computational models as reasoning
mechanism on robot for high-level cognition
Cognitive Science as Enabler
Cognitive Robotics
• Hypothesis:
– A system using human-like representations and
processes will enable better collaboration with people
than a computational system that does not
• Similar representations and reasoning mechanisms make it
easier for humans to work with the system (more compatible)
– For close collaboration, systems should act “naturally”
• i.e., not do or say something in a way that detracts
from the interaction/collaboration with the human
• The robot should accommodate humans, not the other way around
– Rather than solving tasks from “first principles”:
• Humans are good at solving some tasks; let’s
leverage humans’ abilities
Cognitive Skills
• Appropriate knowledge representations
– Spatial representation for spatial reasoning
– Adapting representation to problem solving method
• Problem solving
– Navigation routing with constraints (e.g., remaining hidden)
• Learning
– Learning to recognize and anticipate others’ behaviors
– Learning characteristics of others’ capabilities
• Vision
– Object permanence and tracking (Cassimatis et al., 04)
– Recognizing gestures
• Natural language/gestures (Perzanowski et al., 01)
Cognitive Skills
• Perspective-Taking
– Spatial (Trafton et al., 2005)
– Social (Breazeal et al., 2006)
• Spatial reasoning
– People use metric information implicitly, but use and think
about space qualitatively much more frequently (Trafton et al., 2006)
– Spatial referencing/language (Skubic et al., 04)
• Temporal reasoning
– Predicting how long something will take
• Anticipation
– What does a person need and why?
Hide and Seek
(Trafton & Schultz, 2004, 2006)
• Lots of knowledge about space is required
• A “good” hider needs visual and spatial perspective
taking to find the good hiding places (a large
amount of spatial knowledge is needed)
Development of Perspective-Taking
• Children start developing (very very basic)
perspective-taking ability around age 3-4
– Huttenlocher & Presson, 1979; Newcombe &
Huttenlocher, 1992; Wallace, Alan, & Tribol, 2001
• In general, 3-4 year old children do not have a
particularly well developed sense of
perspective taking
Case Study: Hide and Seek
Age 3½

Game   Hiding Location                Hiding Type
1      eyes-closed                    can’t see me if I can’t see you
2      out-in-open (suggestion:       understanding rules of game
       don’t hide out in the open)
3      under piano                    under
4      in laundry room                containment (room)
(break)
5      under piano                    under
6      in laundry room                containment (room)
7      in bathroom                    containment (room)
8      in her room                    containment (room)
9      under chair                    under
10     behind bedroom door            containment or behind
11     under chair                    under
12     under covers                   under or containment
13     under covers                   under or containment
14     in bathroom                    containment
15     under glass coffee table       under

Elena did not have perspective-taking ability:
– Left/right errors
– She played hide and seek by learning pertinent qualitative features of objects
– She constructed knowledge about hiding that is object-specific
Hide and Seek Cognitive Model
• Created a cognitive model of Elena learning to play hide and
seek using ACT-R (Anderson et al., 93, 95, 98, 05)
• Correctly models Elena’s behavior at 3½ years of age
• Learns and refines hiding behavior based on interactions
with a “teacher”
– Learns production strengths based on the success and failure of
hiding behavior
– Learns ontological or schematic knowledge about hiding
• It’s bad to hide behind something that’s clear
• It’s good to hide behind something that is big enough
– Knows the relative locations of objects (behind, in front of) and
adds knowledge about these relationships; the model has only a
syntactic notion of spatial relationships
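The production-strength learning above can be sketched concretely. The update rule below is the standard ACT-R utility-learning equation, U ← U + α(R − U); the production names, rewards, and parameter values are hypothetical, chosen only to mirror the hide-and-seek setting.

```python
import random

# Minimal sketch of ACT-R-style utility learning: U <- U + alpha * (R - U).
# Production names and reward values are hypothetical illustrations.
ALPHA = 0.2   # learning rate
NOISE = 0.5   # noise added to utilities during selection

utilities = {
    "hide-under-object": 0.0,
    "hide-in-room": 0.0,
    "hide-behind-clear-object": 0.0,  # e.g., the glass coffee table
}

def choose_production():
    """Pick the production with the highest noisy utility."""
    return max(utilities, key=lambda p: utilities[p] + random.gauss(0, NOISE))

def learn(production, reward):
    """Move the chosen production's utility toward the reward it earned."""
    utilities[production] += ALPHA * (reward - utilities[production])

# One simulated game: a transparent hiding place fails, so its utility drops.
p = choose_production()
learn(p, reward=0.0 if p == "hide-behind-clear-object" else 1.0)
print(p, utilities)
```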
Hybrid Cognitive/Reactive Architecture
Robot Hide and Seek
Computational cognitive model of hiding makes deliberative
(high-level cognitive) decisions and models learning.
The cognitive model of hiding (after learning) is used to
reason about what makes a good hiding place in order to seek.
• Reactive layer of hybrid model for
mobility and sensor processing
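One way to read this architecture is as a thin interface between the deliberative cognitive model and the reactive layer. The sketch below is a hypothetical rendering of that split; the class and method names are ours, not the actual NRL implementation.

```python
# Hypothetical sketch of the hybrid split: the cognitive model makes the
# high-level decision, the reactive layer handles sensing and mobility.
class CognitiveModel:
    def decide_hiding_place(self, percepts):
        """Deliberative choice, standing in for the ACT-R hide-and-seek model."""
        opaque = [o for o in percepts["objects"] if not o["transparent"]]
        return max(opaque, key=lambda o: o["size"])  # prefer big occluders

class ReactiveLayer:
    def percepts(self):
        # Stand-in for real sensor processing (vision, mapping, ...).
        return {"objects": [
            {"name": "piano", "size": 3.0, "transparent": False},
            {"name": "glass table", "size": 1.0, "transparent": True},
        ]}

    def go_to(self, target):
        # Stand-in for collision-free navigation to the chosen object.
        print("navigating to", target["name"])

model, robot = CognitiveModel(), ReactiveLayer()
robot.go_to(model.decide_hiding_place(robot.percepts()))
```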
How important is perspective taking?
(Trafton et al., 2005)
• Analyzed a corpus of NASA training tapes
– Space Station Mission 9A
– Two astronauts working in full suits in a neutral-buoyancy
facility; a third, remote person participates
– Standard protocol analysis techniques; transcribed 8
hours of utterances and gestures (~4000 instances)
• Use of spatial language (up, down, forward, in
between, my left, etc.) and commands
– Research questions:
• What frames of reference are used?
• How often do people switch frames of reference?
• How often do people take another person’s perspective?
Spatial Language in Space: Results

Frame of Reference    Example                        % Utterances
Exocentric            Go straight zenith (“up”)      63%
Egocentric            Turn to my left                15%
Addressee-centered    Turn to your left              10%
Deictic               Put it over there [Points]     7%
Object-centered       Put it on top of the box       5%
• How frequently do people switch their frame of reference?
– 45% of the time (Consistent with Franklin, Tversky, & Coon, 1992)
• How often do people take other people’s perspective (or
force others to take theirs)?
– 25% of the time
Perspective Taking and Changing
Frames of Reference
“Bob, if you come straight down from where you are, uh, and uh
kind of peek down under the rail on the nadir side, by your right
hand, almost straight nadir, you should see the uh…”
• Notice the mixing of perspectives: exocentric (down),
object-centered (down under the rail), addressee-centered
(right hand), and exocentric again (nadir), all in one instruction!
• Notice the “new” term developed collaboratively:
“mystery hand rail”
Perspective Taking
• Perspective taking is critical for collaboration.
• How do we model it? (ACT-R, Polyscheme…)
• I’ll show several demos of our current
progress on spatial perspective taking
• But first a scenario:
“Please hand me the wrench”
Perspective taking in
human interactions
• How do people usually resolve ambiguous
references that involve different spatial
perspectives? (Clark, 96)
– Principle of least effort (which implies least joint effort)
• All things being equal, agents try to minimize their effort
– Principle of joint salience
• The ideal solution to a coordination problem among two or more
agents is the solution that is the most salient, prominent, or
conspicuous with respect to their current common ground.
• In more complex contexts, agents may have to work harder to
resolve ambiguous references
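A minimal sketch of joint salience as a tie-breaker: among candidate referents, pick the one most prominent with respect to the agents’ common ground. The scoring terms and weights below are invented for illustration; they are not from Clark.

```python
# Sketch of reference resolution by joint salience: score each candidate
# against the current common ground and pick the most salient one.
# The features and weights here are illustrative assumptions.
def resolve(reference_type, candidates, common_ground):
    def salience(obj):
        score = 0.0
        if obj["type"] == reference_type:        # matches "the wrench"
            score += 1.0
        if obj["name"] in common_ground:          # already jointly known
            score += 1.0
        score += 1.0 / (1.0 + obj["distance"])    # nearer = more prominent
        return score
    return max(candidates, key=salience)

candidates = [
    {"name": "wrench-1", "type": "wrench", "distance": 0.5},
    {"name": "wrench-2", "type": "wrench", "distance": 2.0},
]
# Only wrench-1 has been jointly discussed, so it wins on joint salience.
print(resolve("wrench", candidates, common_ground={"wrench-1"}))
```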
Perspective Taking:
A tale of two systems
• ACT-R/S (Schunn & Harrison, 2001)
– Our perspective-taking system using ACT-R/S is
described in Hiatt et al., 2003
– Three integrated visuospatial buffers:
• Focal: object identification; non-metric geon parts
• Manipulative: grasping/tracking; metric geons
• Configural: navigation; bounding boxes
• Polyscheme (Cassimatis)
– Computational cognitive architecture where:
• Mental simulation is the primitive
• Many AI methods are integrated
– Our perspective-taking using Polyscheme is
described in Trafton et al., 2005
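The three buffers differ mainly in what they represent. A hypothetical rendering as data types (the field choices are ours, for illustration; this is not the ACT-R/S code):

```python
from dataclasses import dataclass

# Hypothetical data-type rendering of the three ACT-R/S visuospatial buffers.
@dataclass
class FocalChunk:            # object identification
    object_id: str
    geon_parts: list         # non-metric part structure, e.g. ["handle", "jaw"]

@dataclass
class ManipulativeChunk:     # grasping and tracking
    object_id: str
    geon_dimensions: tuple   # metric geons, e.g. (0.2, 0.05, 0.02) meters

@dataclass
class ConfiguralChunk:       # navigation
    object_id: str
    bounding_box: tuple      # e.g. (x, y, width, depth) in room coordinates

wrench = FocalChunk("wrench-1", ["handle", "jaw"])
print(wrench)
```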
Robot Perspective Taking
Human can see one cone
Robot can sense two cones
(Fong et al., 06)
Summary
• Having representations and reasoning similar or
compatible to a human’s facilitates human-robot
collaboration
• We’ve developed computational cognitive
models of high-level human cognitive skills as
reasoning mechanisms for robots
• Open questions:
– Scaling up: combining many such skills
– What are the important skills?
– Which skills are built upon others?
Shameless Advertisement
• ACM/IEEE Second International Conference
on Human-Robot Interaction
– Washington DC, March 9-11, 2007
– With HRI 2007 Young Researchers Workshop,
March 8, 2007
– Single track, highly multi-disciplinary
– Robotics, Cognitive Science, HCI, Human factors,
Cognitive Psychology…
– Submission deadline: August 31, 2006
– www.hri2007.org
A Dynamic Auditory Scene
• Everyday Auditory Scenes are
VERY Noisy
– Fans
– Alarms/Telephones
– Traffic
– Weather
– People
Auditory Perspective Taking
Allow a robot to use its knowledge of the
environment, both a priori and sensed, to predict
what a human can hear and effectively understand.
• Information Kiosk
– Robot uses speech to relay information to an
interested human listener.
– Given the auditory scene, can the person
understand what the robot is saying?
– If not, what actions can the robot take to improve
intelligibility and knowledge transfer?
• Stealth Bot
– Robot uses its awareness of the auditory
environment to hide from people and/or machines.
– The robot knows its own acoustic signature.
– It predicts how each action or location will be heard
by the listener, and selects the best choice.
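The “predict what a human can hear” step can be framed as estimating the signal-to-noise ratio at the listener. A minimal sketch, assuming free-field attenuation (6 dB per doubling of distance, which is standard acoustics) and a hypothetical +10 dB intelligibility threshold:

```python
import math

# Sketch: is the robot's speech intelligible at the listener's position?
# Free-field falloff is standard; the SNR threshold is an assumed stand-in.
def level_at(source_db, source_pos, listener_pos):
    """Speech level at the listener, referenced to 1 m from the source."""
    d = max(0.1, math.dist(source_pos, listener_pos))
    return source_db - 20 * math.log10(d)

def intelligible(robot_db, robot_pos, listener_pos, noise_db_at_listener,
                 snr_threshold_db=10.0):
    speech = level_at(robot_db, robot_pos, listener_pos)
    return speech - noise_db_at_listener >= snr_threshold_db

# If this returns False, the robot can raise its volume, move closer, or wait.
print(intelligible(70.0, (0, 0), (3, 0), noise_db_at_listener=55.0))
```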
An Example of Adaptation:
Robot Speech Interface
• Adjust word usage depending on noise levels
– Use smaller words with higher recognition rates.
– Ask questions to verify understanding; repeat yourself.
• Change the quality of the speech sounds
– Adapt voice volume and pitch to overcome local noise
levels (Lombard Speech).
– Emphasize difficult words.
– Don’t talk during loud noises.
• Reposition Oneself
– Vary the proximity to the listener
– Face the listener as much as possible
– Move to a different location if all else fails.
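These adaptations can be ordered into a simple escalation policy: try cheap fixes (wording, volume) before expensive ones (moving). A hypothetical sketch; the noise thresholds and distance cutoff are invented:

```python
# Hypothetical escalation policy over the adaptations listed above.
# Noise thresholds (dB) and the distance cutoff are illustrative.
def adapt_speech(noise_db, loud_transient, distance_m):
    if loud_transient:
        return "pause until the loud noise passes"
    if noise_db > 60 and distance_m > 1.5:
        return "move closer and face the listener"
    if noise_db > 60:
        return "raise volume and pitch (Lombard speech); emphasize hard words"
    if noise_db > 50:
        return "use shorter, high-recognition words; verify understanding"
    return "speak normally"

print(adapt_speech(noise_db=63, loud_transient=False, distance_m=2.0))
```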
Information Kiosk
• Overhead Microphone Array
– Tracks local sound levels
– Localizes interfering sources
– Guides the vision system to new users
• Stereo Vision
– Tracks the user’s position in real time
• Actions
– Raise speaking volume relative to the user’s
distance and the level of ambient noise
– Pause during loud sounds or speech
interruptions
– Rotate the robot to face users
– Reposition the robot if noise levels become
too high
Acoustic Perspective
• Noise maps: combine knowledge of sound
sources to build maps
– Measured volume/frequency levels
– Source locations/directionality
– Walls and environmental features
– Multiple maps can be built and
combined in real time
[Figure: after exploring the area inside the square,
3 air vents are localized by the robot]
[Figure: 4 sources are combined as omnidirectional
sources, without environmental reflections]
• Modifying action based on the noise map
– Seeking noisy hiding places so that the robot can best
observe its target without being detected
• Masking its particular acoustic signature
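A noise map of this kind can be sketched by summing each source’s power over a grid with inverse-square attenuation, matching the omnidirectional, no-reflections simplification in the caption above. Grid size and source levels are illustrative:

```python
import math

# Sketch of a noise map: sum each source's power at every grid cell using
# inverse-square falloff, no reflections. Source values are illustrative
# (e.g., three localized air vents).
def build_noise_map(sources, width, height, cell=0.5):
    grid = [[0.0] * width for _ in range(height)]
    for gy in range(height):
        for gx in range(width):
            x, y = gx * cell, gy * cell
            power = 0.0
            for sx, sy, db in sources:
                d = max(0.1, math.hypot(x - sx, y - sy))
                power += 10 ** (db / 10) / (d * d)   # inverse-square falloff
            grid[gy][gx] = 10 * math.log10(power)    # back to dB
    return grid

vents = [(1.0, 1.0, 60.0), (4.0, 1.0, 55.0), (2.5, 3.0, 58.0)]
noise_map = build_noise_map(vents, width=12, height=10)
# A stealth bot would prefer cells whose level exceeds its own acoustic signature.
```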