Gaze Perception and Theory of Mind

Attention and Perception of Attentive Behaviours for Emotive Interaction
Christopher Peters
LINC, University of Paris 8

Virtual Humans
- Computer models of people
- Can be used as…
  - substitutes for the real thing in ergonomic evaluations
  - conversational agents

Display and Animation
- Two layers: a skeletal layer and a skin layer
- The skeleton is a hierarchy of positions and orientations
- The skin layer provides the visual appearance

Animating Characters
- Animation methods
  - Low level: rotate leye01 by 0.3 degrees around axis (0.707, 0.707, 0.0) at 0.056 seconds into the animation
  - High level: 'walk to the shops'
    - The character must know where the shop is, avoid obstacles on the way there, etc. It must also be able to walk…
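
As a rough sketch (type and field names are hypothetical, not from the talk), the low-level command above reduces to timed joint-rotation data, which a high-level command such as 'walk to the shops' would have to be decomposed into:

```python
from dataclasses import dataclass

@dataclass
class JointRotation:
    """One low-level animation command: rotate a joint at a given time."""
    joint: str        # e.g. "leye01"
    angle_deg: float  # rotation amount in degrees
    axis: tuple       # unit rotation axis
    time_s: float     # seconds into the animation

# The slide's low-level example, expressed as data:
cmd = JointRotation("leye01", 0.3, (0.707, 0.707, 0.0), 0.056)
```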

Autonomy
- Direct animation → automatic generation
- Autonomy requires the character to animate itself based on simulation models
- Models should result in plausible behaviour

Our Focus
1. Attention and related behaviours
   (a) Where to look
   (b) How to generate gaze behaviour
2. Perception of attentive behaviours and emotional significance
   - How to interpret the attention behaviours of others
   - Conversation initialisation in virtual environments…

Why VE?
- Cheap!
  - No need for expensive equipment (facilities, robots, etc.)
  - Duplication at the click of a mouse
- Quick
  - Changes to the environment can be made quickly and easily, at no extra cost

But…
- Things we take for granted in real life need to be programmed into the virtual environment
  - e.g. physics
- And they will only ever be approximations of reality

1. Attention and Gaze
- Our character is walking down a virtual street
- Where should the character look, and how should the looking behaviours be generated?

Foundation
- Humans need to look around
  - Our eyes are in the front of our heads
  - We have poor acuity over most of the visual field
  - Even for places where we have been before, memory is far from perfect
  - An ecological approach to visual perception, J.J. Gibson, 1979.
  - (image: iLab, University of Southern California)
- Virtual humans should look around too!

Significance to Virtual Humans
- Viewer perception
  - Plausibility: "If they don't look around, then how do they know where things are?"
  - The impact of eye gaze on communication using humanoid avatars, Garau et al., 2001.

Significance to Virtual Humans
- Functional purposes
  - Autonomy: if they don't look around, then they won't know where things are
  - Navigation for digital actors based on synthetic vision, memory and learning, Noser et al., 1995.

Our Focus
- General looking behaviours
  - Gaze shifts versus saccadic eye movements
- Where to Look? Automating Certain Visual Attending Behaviors of Human Characters, Chopra-Khullar, 1999.
- Practical Behavioural Animation Based On Vision and Attention, Gillies, 2001.
- Two problems:
  1. Where to look
  2. How to look

Approach
- Use appropriately simplified models from areas such as psychology, neuroscience and artificial intelligence…
  - Appropriate = fast, allowing real-time operation
  - Capture the high-level salient aspects of such models without the intricate detail

Components
- Where to look
  - Sensing
  - Attention
  - Memory
- How to look
  - Gaze generator

System Overview
1. Input the environment through the synthetic vision component
2. Process the visual field using the spatial attention model
3. Modulate attended object details using the memory component
4. Generate gaze behaviours towards target locations
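
A minimal sketch of that update loop, assuming each module is supplied as a callable (these are placeholders, not the author's actual API):

```python
import numpy as np

def update_gaze(render, attend, remember, generate_gaze, scene):
    """One visual update following the four steps above; each callable
    stands in for the corresponding module."""
    full_img, false_colour = render(scene)            # 1. synthetic vision
    saliency = attend(full_img)                       # 2. spatial attention model
    attention_map = remember(saliency, false_colour)  # 3. memory modulation
    target = np.unravel_index(np.argmax(attention_map), attention_map.shape)
    generate_gaze(target)                             # 4. gaze behaviour
    return target
```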

Visual Sensing
- Three renderings are taken per visual update
  - One full-scene rendering (→ attention module)
  - Two false-colour renderings (→ memory module)

False-colour Renderings
- Approximate the acuity of the eye with two renderings
  - Fovea
  - Periphery

Renderings
- Renderings allow both spatial/image-based and object-based operations to take place

1(a) Where to Look
- Model of visual attention
  - Two-component theory of attention
- "Bottom-up" (exogenous)
  - The environment appears to 'grab' our attention
  - Colour, intensity, orientation, motion, texture, sudden onset, etc.
- "Top-down" (endogenous)
  - Voluntary, task driven
  - 'Look for the coke can'

Bottom-up Attention
- Orientation, intensity and colour contrast

Bottom-up Attention
- Model
  - Cognitive engineering, biologically inspired
  - Itti et al. 2000, http://ilab.usc.edu/bu/
  - Inputs an image, outputs an encoding of attention allocation
  - Peters and O'Sullivan 2003
- (figure: input image decomposed into intensity, RG colour and BY colour channels)

Gaussian Pyramid
- Each channel acts as the first level in a Gaussian or Gabor pyramid
- Each subsequent level is a blurred and decimated version of the previous level
- Image processing techniques simulate early visual processing
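
A minimal pyramid-construction sketch (numpy/scipy assumed; the level count and blur width are illustrative, not the model's values):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_pyramid(channel, levels=6, sigma=1.0):
    """Build a Gaussian pyramid: each level is a blurred and decimated
    (half-resolution) copy of the previous one."""
    pyramid = [channel.astype(np.float64)]
    for _ in range(levels - 1):
        blurred = gaussian_filter(pyramid[-1], sigma)  # low-pass filter
        pyramid.append(blurred[::2, ::2])              # decimate by 2 per axis
    return pyramid
```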

Center-Surround Processing
- Early visual processing
  - Ganglion cells respond to light in a center-surround pattern
  - Contrast a central area with its neighbours
  - Contrast is important, not amplitude (context)
- Simulated by comparing different levels in the image pyramids
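
A hedged sketch of one across-scale difference, reusing the pyramid from the previous sketch; nearest-neighbour upsampling stands in for whatever interpolation the real model uses:

```python
import numpy as np

def center_surround(pyramid, center=2, surround=4):
    """Across-scale difference: contrast a fine (center) pyramid level with
    a coarse (surround) level. The coarse level is upsampled back to the
    center's resolution first, so the response measures local contrast
    rather than raw amplitude."""
    c = pyramid[center]
    s = pyramid[surround]
    # Nearest-neighbour upsampling of the surround level to the center's shape.
    ys = np.arange(c.shape[0]) * s.shape[0] // c.shape[0]
    xs = np.arange(c.shape[1]) * s.shape[1] // c.shape[1]
    return np.abs(c - s[np.ix_(ys, xs)])
```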

Saliency Map
- Conspicuity maps
  - Result of the center-surround calculations for each feature type (intensity, colour, orientation)
  - Define the 'pop-out' for each feature type
- Integrated into the saliency map
  - Attention is directed preferentially to lighter areas
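
A simplified integration sketch: each conspicuity map is range-normalised and averaged. Itti et al. use a more elaborate normalisation operator; a plain mean is an assumption here.

```python
import numpy as np

def saliency_map(conspicuity_maps):
    """Range-normalise each feature's conspicuity map to [0, 1], then
    average them into a single saliency map (lighter = more salient)."""
    normalised = []
    for m in conspicuity_maps:  # e.g. intensity, colour, orientation
        span = m.max() - m.min()
        normalised.append((m - m.min()) / span if span > 0 else np.zeros_like(m))
    return np.mean(normalised, axis=0)
```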

Memory
- Differentiate between what an agent has and hasn't observed
  - Agents should only know about objects that they have witnessed
  - Agents won't have exact knowledge of the world
- Used to modulate the output of the attention module (the saliency map)
- Object-based, taking input from the synthetic vision module

Stage Theory
- The further information goes, the longer it is retained
- Attention acts as a filter

Stimulus Representations
- Two levels of detail representation for objects
- Proximal stimuli
  - Early representation of the stimulus
  - Data discernible only from the retinal image
- Observations
  - Later representation of stimuli, after resolution with the world database

Stage Theory
- Short-term sensory storage (STSS)
  - From distal to proximal stimuli
  - Objects have not yet been resolved with the world database

Stage Theory
- Short-term memory (STM) and long-term memory (LTM)
  - Object-based; contain resolved object information
  - From proximal stimuli to observations
- Observations store information for attended objects
  - Object pointer
  - World-space transform
  - Timestamp
- Virtual humans are not completely autonomous from the world database
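
A minimal sketch of such an observation record (field names are illustrative):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Observation:
    """An object-based memory entry for an attended object."""
    object_ptr: int        # pointer/identifier into the world database
    transform: np.ndarray  # world-space transform when the object was seen
    timestamp: float       # simulation time at which the object was attended
```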

Memory Uncertainty Map
- We can now create a memory uncertainty map for any part of the scene the agent is looking at
  - The agent is uncertain of parts of the scene it has not looked at before
  - Depends on scene object 'granularity'

Attention Map
- Determines where attention will be allocated
  - Bottom-up components
  - Top-down (see part 2)
  - Memory: modulating the saliency map by the uncertainty map
- Here, the sky and road have low uncertainty levels
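
The modulation itself can be as simple as an elementwise product (a sketch, assuming both maps share a resolution and a [0, 1] range):

```python
def attention_map(saliency, uncertainty):
    """Weight bottom-up saliency by memory uncertainty: well-observed
    regions (e.g. the sky and road above) are damped, while unexplored
    regions stay salient."""
    return saliency * uncertainty  # elementwise product of two [0, 1] maps
```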

Human Scanpaths
- Eye movements and fixations

Inhibition of Return
- The focus of attention must change
  - Inhibit attended parts of the scene from being revisited soon
- Image-based IOR
  - Problem: a moving viewer or a dynamic scene
- Solution: object-based memory
  - Object-based IOR
  - Store an uncertainty level with each object
  - Modulate the saliency map by the uncertainty levels (see the combined sketch after the next slide)

Artificial Regions of Interest
- The attention map is at a lower resolution than the visual field
- Generate AROIs from the highest values of the current attention map to create a scanpath
  - Assume a simple one-to-one mapping from the attention map to overt attention
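
A combined sketch of object-based IOR and AROI extraction: repeatedly take the attention-map peak, then lower the attended object's uncertainty so it is not revisited soon. The inhibition factor and step count are assumptions.

```python
import numpy as np

def generate_scanpath(attention_map, object_ids, uncertainty, steps=5,
                      inhibition=0.2):
    """Produce a sequence of AROIs: repeatedly pick the peak of the
    uncertainty-modulated attention map, then apply object-based inhibition
    of return by lowering the attended object's uncertainty."""
    path = []
    for _ in range(steps):
        modulated = attention_map * uncertainty[object_ids]  # per-pixel lookup
        y, x = np.unravel_index(np.argmax(modulated), modulated.shape)
        path.append((y, x))                # one-to-one map to overt attention
        uncertainty[object_ids[y, x]] *= inhibition  # inhibit this object
    return path
```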

1(b) How to Look
- Generate gaze animation given a target location
- Gaze shifts
  - Combined eye-head gaze shifts to visual and auditory targets in humans, Goldring et al., 1996.
  - Targets beyond the oculomotor range

Gaze Shifts
- Contribution of head movements
  - Head Movement Propensity, J. Fuller, 1992.
  - 'Head movers' vs. 'eye movers'
  - ±40 degree orbital threshold
  - Innate behavioural tendency for subthreshold head movement
  - Midline attraction and resetting
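
An illustrative (uncalibrated) split of a gaze shift between eyes and head, using the ±40 degree orbital threshold and a per-character head-movement propensity:

```python
def split_gaze_shift(amplitude_deg, propensity=0.3, orbital_limit=40.0):
    """Split a gaze shift between eyes and head. Beyond the ~±40 degree
    oculomotor range the head must contribute; 'head movers' (high
    propensity) also recruit the head for subthreshold shifts."""
    mandatory = max(0.0, amplitude_deg - orbital_limit)  # eyes alone can't reach
    voluntary = propensity * min(amplitude_deg, orbital_limit)
    head = mandatory + voluntary
    return amplitude_deg - head, head  # (eye contribution, head contribution)
```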

Blinking
- Subtle and often overlooked
  - Not looking while leaping: the linkage of blinking and saccadic gaze shifts, Evinger et al., 1994.
- Gaze-evoked blinking
  - The amplitude of a gaze shift influences blink probability and magnitude
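
The amplitude dependence could be modelled, for illustration only, as a clamped linear probability; the functional form and both constants below are assumptions, not values from Evinger et al.:

```python
def blink_probability(gaze_amplitude_deg, base=0.2, gain=0.01):
    """Gaze-evoked blinking: larger gaze shifts make a blink more likely.
    The clamped-linear form and constants are illustrative assumptions."""
    return min(1.0, base + gain * gaze_amplitude_deg)
```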

2. Perception of Attention
- Attention behaviours may elicit attention from others
  - Predator-prey
  - Gaze-following
  - Goals
  - Intentions

Gaze in Infants
- Infants notice gaze direction as early as 3 months
- Gaze-following
  - Infants are faster at looking at targets that are being looked at by a central face
  - They respond even to circles that look like eyes
- (image: www.mayoclinic.org)

Theory of Mind
- Baron-Cohen (1994)
  - Eye Direction and Intentionality Detectors
  - Theory of Mind Module
- Perrett and Emery (1994)
  - More general Direction of Attention Detector
  - Mutual Attention Mechanism

Our Model
- ToM for conversation initiation
  - Based on attention behaviours
- Key metrics in our system are Attention Levels and Level of Interest
  - The metrics represent the amount of attention perceived to have been paid by another
  - Based primarily on gaze
    - Also body direction, locomotion, directed gestures and facial expressions
  - Emotional significance of gaze

Implementation (in progress)
- Torque game engine: http://www.garagegames.com
  - Proven engine used for a number of AAA titles
- Useful basis providing fundamental functionality
  - Graphics exporters
  - In-simulation editing
  - Basic animation
  - Scripting
  - Terrain rendering
  - Special effects

Overview

Synthetic Vision
- Approximated human vision for computer agents
- Why?
  - Inexpensive: no special hardware required
  - Bypasses many computer vision complexities
    - Segmentation of images, recognition
  - Enables characters to receive visual information in a way analogous to humans
- How?
  - Updated in a snapshot manner
  - Small, simplified images rendered from the agent's perspective
    - Textures, lighting and special effects disabled

False-colouring
- False colours provide a lookup scheme for acquiring objects from the database
- A false colour is defined as (r, g, b), where
  - red is the object type identifier
  - green is the object instance identifier
  - blue is the sub-object identifier
- Allows quick retrieval of objects
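
A sketch of the lookup scheme, assuming one byte per channel and a database keyed by the identifier triple (the packing width is an assumption):

```python
def encode_false_colour(obj_type, instance, sub_object):
    """Pack the three identifiers into an (r, g, b) false colour,
    one byte per channel."""
    return (obj_type & 0xFF, instance & 0xFF, sub_object & 0xFF)

def decode_false_colour(rgb, database):
    """Recover the scene object referenced by a rendered pixel's false
    colour via a direct database lookup."""
    return database[tuple(rgb)]

# Example: a dictionary stands in for the world database.
db = {(1, 7, 0): "lamp-post #7"}
assert decode_false_colour(encode_false_colour(1, 7, 0), db) == "lamp-post #7"
```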

Intentionality Detector (ID)
- Represents behaviour in terms of volitional states (goal and desire)
- Based on visual, auditory and tactile cues
  - Our version is based only on vision
- Attributes an intentionality characteristic to objects based on the presence of certain cues
- Implemented as a filter on objects from the visual system
  - Only "agent" objects can pass the filter

Direction of Attention
- Direction of Attention Detector (DAD)
  - More useful than the EDD alone
- Eye, head, body and locomotion direction are read from the database after false-colour lookup
- Used to derive the Attention Level metric from filtered stimuli

Direction of Attention
- What happens when the eyes aren't visible?
- Hierarchy of other cues:
  - Head direction > body direction > locomotion direction
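
A minimal fallback through that hierarchy (eyes first, when they are visible; cue names are illustrative):

```python
def attention_direction(cues):
    """Fall back through the cue hierarchy when better cues are unavailable.
    `cues` maps cue name -> direction vector, or None if not visible."""
    for cue in ("eyes", "head", "body", "locomotion"):
        if cues.get(cue) is not None:
            return cues[cue], cue
    return None, None  # no directional cue visible at all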

Mutual Attention
- Comparison between:
  - The eye direction read from the other agent
  - The focus of attention of this agent (see 1. Generating Attention Behaviours)
- If the agents are the focus of each other's attention, the Mutual Attention Mechanism (MAM) is activated
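
A sketch of the MAM check, assuming both agents' current gaze targets are available as identifiers:

```python
def mutual_attention(my_focus_id, other_id, others_eye_target_id, my_id):
    """The MAM fires when I attend to the other agent and the eye direction
    read from them (after false-colour lookup) targets me."""
    return my_focus_id == other_id and others_eye_target_id == my_id
```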

Attention Levels
- Perception of the attention paid by another
  - At an instant of time
- Based on the orientation of body parts
  - Eyes, head, body, locomotion direction

Attention Levels
- Direction is weighted for each segment
  - The eyes provide the largest contribution; other segments convey progressively less attention
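
A sketch of the weighted combination; the weights are illustrative, chosen only so that the eyes dominate:

```python
import numpy as np

def attention_level(alignments, weights=(0.4, 0.3, 0.2, 0.1)):
    """Instantaneous attention level from how directly each segment points
    at me. `alignments` = (eyes, head, body, locomotion), each in [0, 1]
    with 1 meaning aimed straight at me."""
    return float(np.dot(weights, alignments))
```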

Attention Levels
- Also support for
  - Locomotion direction
  - Directed gestures
  - Directed facial expressions
- Gestures and expressions convey emotion

Attention Profiles
- Attention levels over a time period are stored in memory
- Attention profile
  - Consideration of all attention levels of another agent over a certain time frame
  - Analysis can provide a higher-level description of their amount of interest in me

Level of Interest
- General descriptor of the amount of interest that another is perceived to be paying…
  - …over a time period
- An error-prone process due to its reliance on perception
  - Were they actually looking at you? Human gaze perception is good, but not perfect
- Links between gaze and interest
  - Gaze does not necessarily mean attention, e.g. a blank stare
- Inherently probabilistic
  - Theories of Mind are theories
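
One hedged way to reduce an attention profile to a Level of Interest: a plain mean over a recent time window (both the window length and the mean itself are assumptions, not the system's actual analysis):

```python
import numpy as np

def level_of_interest(profile, window_s=5.0):
    """Reduce an attention profile (a list of (timestamp, attention_level)
    samples) to a single interest score over the most recent window."""
    if not profile:
        return 0.0
    now = profile[-1][0]
    recent = [level for t, level in profile if now - t <= window_s]
    return float(np.mean(recent))
```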

Application
- Conversation initiation scenarios
  - A subtle negotiation involving gaze
  - Avoid the social embarrassment of engaging in discourse with an unwilling participant

Our ToMM
- Stores simple, high-level theories
- Useful for conversation initialisation behaviours
  - Have they seen me? (ID, DAD and MAM modules)
  - Have they seen me looking? (ID, DAD and MAM modules)
  - Are they interested in interacting? (Level of Interest metric)

Future Work
- Finish implementation of the model
- Further links between attention/memory and emotion
- Hardware-based bottom-up attention implementation
- Integration of facial expressions and gestures from Greta

Thank you!
Questions