Attention and Perception of Attentive Behaviours for Emotive Interaction
Christopher Peters, LINC, University of Paris 8

Virtual Humans
• Computer models of people
• Can be used as…
– substitutes for the real thing in ergonomic evaluations
– conversational agents

Display and Animation
• Two layers: a skeletal layer and a skin layer
• The skeleton is a hierarchy of positions and orientations
• The skin layer provides the visual appearance

Animating Characters
• Animation methods:
– Low level: rotate leye01 by 0.3 degrees around axis (0.707, 0.707, 0.0) at 0.056 seconds into the animation
– High level: 'walk to the shops'
• For high-level methods, the character must know where the shop is, avoid obstacles on the way there, etc.
• It must also be able to walk…

Autonomy
• From direct animation to automatic generation
• Autonomy requires the character to animate itself based on simulation models
• The models should result in plausible behaviour

Our Focus
1. Attention and related behaviours
– (a) Where to look
– (b) How to generate gaze behaviour
2. Perception of attentive behaviours and their emotional significance
– How to interpret the attention behaviours of others
– Conversation initialisation in virtual environments…

Why Virtual Environments?
• Cheap! Quick!
• No need for expensive equipment (facilities, robots, etc.)
• Duplication at the click of a mouse
• Changes to the environment can be made quickly and easily, at no extra cost
• But… things we take for granted in real life, such as physics, need to be programmed into the virtual environment, and will only ever be approximations of reality

1. Attention and Gaze
• Our character is walking down a virtual street
• Where should the character look, and how should the looking behaviours be generated?

Foundation
• Humans need to look around (An Ecological Approach to Visual Perception, J.J. Gibson, 1979; image: iLab, University of Southern California)
• Our eyes are in the front of our heads, and acuity is poor over most of the visual field
• Even for places where we have been before, memory is far from perfect
• Virtual humans should look around too!

Significance to Virtual Humans
• Viewer perception (The Impact of Eye Gaze on Communication Using Humanoid Avatars, Garau et al., 2001)
• Plausibility: "If they don't look around, then how do they know where things are?"

Significance to Virtual Humans
• Functional purposes (Navigation for Digital Actors Based on Synthetic Vision, Memory and Learning, Noser et al., 1995)
• Autonomy: if they don't look around, then they won't know where things are

Our Focus
• Gaze shifts rather than individual saccadic eye movements
• General looking behaviours

Where to Look?
• Automating Certain Visual Attending Behaviors of Human Characters, Chopra-Khullar, 1999
• Practical Behavioural Animation Based on Vision and Attention, Gillies, 2001
• Two problems:
1. Where to look
2. How to look

Approach
• Use appropriately simplified models from areas such as psychology, neuroscience and artificial intelligence…
• Appropriate = fast, allowing real-time operation
• Capture the high-level salient aspects of such models without the intricate detail

Components
• Where to look: Sensing, Attention, Memory
• How to look: Gaze Generator

System Overview
1. Input the environment through the synthetic vision component
2. Process the visual field using the spatial attention model
3. Modulate attended object details using the memory component
4. Generate gaze behaviours towards target locations
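A minimal sketch of this four-stage update loop follows. It is illustrative only: the component interfaces (render, compute_saliency, modulate, shift_towards) and the snapshot fields are hypothetical names, not an API described in the talk.

```python
# Hypothetical sketch of the per-frame update loop outlined above.
# Each stage stands in for one system component: synthetic vision,
# spatial attention, object-based memory and gaze generation.

class AttentiveAgent:
    def __init__(self, vision, attention, memory, gaze):
        self.vision = vision        # synthetic vision component
        self.attention = attention  # bottom-up spatial attention model
        self.memory = memory        # object-based memory store
        self.gaze = gaze            # eye-head gaze (and blink) generator

    def update(self, environment, dt):
        # 1. Input the environment through synthetic vision: one
        #    full-scene rendering plus two false-colour renderings.
        snapshot = self.vision.render(environment)

        # 2. Process the visual field with the spatial attention model
        #    to produce a saliency map.
        saliency = self.attention.compute_saliency(snapshot.full_scene)

        # 3. Modulate the saliency map with memory: regions the agent
        #    has already observed (low uncertainty) become less salient.
        attention_map = self.memory.modulate(saliency, snapshot.false_colour)

        # 4. Choose a target from the attention map and generate a gaze
        #    shift towards it.
        target = attention_map.argmax_location()
        self.gaze.shift_towards(target, dt)
```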
Visual Sensing
• Three renderings are taken per visual update:
– One full-scene rendering (to the attention module)
– Two false-colour renderings (to the memory module)

False-colour Renderings
• Approximate the acuity of the eye with two renderings: fovea and periphery
• (Figure: a viewpoint with its corresponding fovea and periphery renderings)
• The renderings allow both spatial/image-based and object-based operations to take place

1(a) Where to Look: A Model of Visual Attention
• Two-component theory of attention:
– Endogenous ("top-down"): voluntary and task-driven, e.g. 'look for the coke can'
– Exogenous ("bottom-up"): the environment appears to 'grab' our attention via colour, intensity, orientation, motion, texture, sudden onset, etc.

Bottom-up Attention
• Orientation, intensity and colour contrast

Bottom-up Attention Model
• Cognitive engineering: Itti et al., 2000 (http://ilab.usc.edu/bu/)
• Biologically inspired
• Inputs an image, outputs an encoding of attention allocation
• Peters and O'Sullivan, 2003

Gaussian Pyramids
• The input image is split into channels: intensity, red-green (RG) colour and blue-yellow (BY) colour
• Each channel acts as the first level in a Gaussian or Gabor pyramid
• Each subsequent level is a blurred and decimated version of the previous level
• Image processing techniques simulate early visual processing

Center-Surround Processing
• In early visual processing, ganglion cells respond to light in a center-surround pattern, contrasting a central area with its neighbours
• Contrast is what matters, not absolute amplitude: context is important
• Simulated by comparing different levels in the image pyramids

Conspicuity Maps
• The result of the center-surround calculations for each feature type
• Define the 'pop-out' for each feature type: intensity, colour, orientation
• Integrated into the saliency map

Saliency Map
• Attention is directed preferentially to the lighter areas of the map (a minimal single-channel sketch follows at the end of this part)

Memory
• Differentiates between what an agent has and hasn't observed
• Agents should only know about objects that they have witnessed
• Agents won't have exact knowledge of the world
• Used to modulate the output of the attention module (the saliency map)
• Object-based, taking input from the synthetic vision module

Stage Theory
• The further information progresses through the stages, the longer it is retained
• Attention acts as a filter

Stimulus Representations
• Two levels of detail for object representations:
– Proximal stimuli: an early representation of the stimulus; data discernible only from the retinal image
– Observations: a later representation of stimuli, after resolution with the world database

Stage Theory: Short-term Sensory Storage (STSS)
• From distal to proximal stimuli
• Objects have not yet been resolved with the world database

Stage Theory: Short-term Memory (STM) and Long-term Memory (LTM)
• Object-based: contains resolved object information
• Observations store information for attended objects (from proximal stimuli to observations):
– Object pointer
– World-space transform
– Timestamp
• Virtual humans are thus not completely autonomous from the world database

Memory Uncertainty Map
• We can now create a memory uncertainty map for any part of the scene the agent is looking at
• The agent is uncertain about parts of the scene it has not looked at before
• Depends on scene object 'granularity'

Attention Map
• Determines where attention will be allocated
• Combines bottom-up components, top-down components (see 2) and memory
• Formed by modulating the saliency map by the uncertainty map
• (In the example shown, the sky and road have low uncertainty levels)

Human Scanpaths
• Eye movements and fixations

Inhibition of Return (IOR)
• The focus of attention must change: inhibit attended parts of the scene from being revisited too soon
• Image-based IOR breaks down for a moving viewer or a dynamic scene
• Solution: object-based memory

Object-based IOR
• Store an uncertainty level with each object
• Modulate the saliency map by the uncertainty levels
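The centre-surround pipeline above can be sketched compactly. The following is an illustrative reconstruction for a single intensity channel, in the spirit of Itti et al. (2000), not the implementation from the talk; the pyramid depth, the centre-surround level pairs and the function names are all assumptions.

```python
# A minimal sketch of bottom-up, centre-surround saliency for a single
# intensity channel, in the spirit of Itti et al. (2000). Illustrative
# reconstruction only; pyramid depth and level pairs are assumptions.
import numpy as np
from scipy import ndimage


def gaussian_pyramid(channel, levels=6):
    """Each level is a blurred and decimated copy of the previous one."""
    pyramid = [channel.astype(float)]
    for _ in range(levels - 1):
        blurred = ndimage.gaussian_filter(pyramid[-1], sigma=1.0)
        pyramid.append(blurred[::2, ::2])  # decimate by a factor of two
    return pyramid


def resample_to(img, shape):
    """Nearest-neighbour resampling to a target shape (sketch only)."""
    rows = np.linspace(0, img.shape[0] - 1, shape[0]).astype(int)
    cols = np.linspace(0, img.shape[1] - 1, shape[1]).astype(int)
    return img[np.ix_(rows, cols)]


def conspicuity_map(channel, center_surround_pairs=((1, 3), (2, 4))):
    """Centre-surround contrast: compare fine and coarse pyramid levels.

    Contrast, not absolute amplitude, drives the response, so each map
    is the absolute difference between a 'centre' (fine) level and a
    'surround' (coarse) level, accumulated at a common resolution.
    """
    pyramid = gaussian_pyramid(channel)
    out_shape = pyramid[1].shape
    result = np.zeros(out_shape)
    for c, s in center_surround_pairs:
        center = resample_to(pyramid[c], out_shape)
        surround = resample_to(pyramid[s], out_shape)
        result += np.abs(center - surround)
    return result


# Usage with the intensity channel only (the full model would also build
# RG/BY colour and orientation conspicuity maps, normalise them, and sum
# them into the saliency map):
image = np.random.rand(256, 256)  # stand-in for a rendered frame
saliency = conspicuity_map(image)
target = np.unravel_index(saliency.argmax(), saliency.shape)
```

Modulating this map by per-object uncertainty levels, as in the object-based IOR described above, would then be a simple element-wise multiply of the saliency map with the memory uncertainty map.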
Artificial Regions of Interest (AROIs)
• The attention map is at a lower resolution than the visual field
• Generate AROIs from the highest values of the current attention map to create a scanpath
• Assume a simple one-to-one mapping from the attention map to overt attention

1(b) How to Look
• Generate gaze animation given a target location
• Gaze shifts: combined eye-head movements for targets beyond the oculomotor range (Combined Eye-Head Gaze Shifts to Visual and Auditory Targets in Humans, Goldring et al., 1996)

Gaze Shifts
• Contribution of head movements (Head Movement Propensity, J. Fuller, 1992)
• 'Head movers' vs. 'eye movers'
• ±40 degree orbital threshold
• Innate behavioural tendency for subthreshold head movement
• Midline attraction and resetting

Blinking
• Subtle and often overlooked (Not Looking While Leaping: The Linkage of Blinking and Saccadic Gaze Shifts, Evinger et al., 1994)
• Gaze-evoked blinking: the amplitude of a gaze shift influences blink probability and magnitude

2. Perception of Attention
• Attention behaviours may elicit attention from others
• Predator-prey, gaze-following, goals, intentions

Gaze in Infants
• Infants notice gaze direction as early as 3 months
• Gaze-following: infants are faster at looking at targets that are being looked at by a central face
• They respond even to circles that look like eyes

Theory of Mind
• Baron-Cohen (1994):
– Eye Direction Detector (EDD) and Intentionality Detector (ID)
– Theory of Mind Module (ToMM)
• Perrett and Emery (1994):
– A more general Direction of Attention Detector (DAD)
– Mutual Attention Mechanism (MAM)

Our Model
• A theory of mind for conversation initiation, based on attention behaviours
• The key metrics in our system are Attention Levels and Level of Interest
• These metrics represent the amount of attention perceived to have been paid by another
• Based primarily on gaze, but also on body direction, locomotion, directed gestures and facial expressions
• Emotional significance of gaze

Implementation (in progress)
• Torque game engine (http://www.garagegames.com)
• A proven engine used for a number of AAA titles
• A useful basis providing fundamental functionality: graphics exporters, in-simulation editing, basic animation, scripting, terrain rendering, special effects

Synthetic Vision
• Approximated human vision for computer agents
• Why? Inexpensive, with no special hardware required; bypasses many computer vision complexities such as image segmentation and recognition; enables characters to receive visual information in a way analogous to humans
• How? Updated in a snapshot manner; small, simplified images are rendered from the agent's perspective, with textures, lighting and special effects disabled

False-colouring
• False colours provide a lookup scheme for acquiring objects from the database (a minimal sketch follows at the end of this part)
• A false colour is defined as (r, g, b), where:
– Red is the object type identifier
– Green is the object instance identifier
– Blue is the sub-object identifier
• Allows quick retrieval of objects

Intentionality Detector (ID)
• Represents behaviour in terms of volitional states (goal and desire)
• Based on visual, auditory and tactile cues; our version is based on vision only
• Attributes an intentionality characteristic to objects based on the presence of certain cues
• Implemented as a filter on objects from the visual system: only 'agent' objects can pass the filter

Direction of Attention
• The Direction of Attention Detector (DAD) is more useful than the EDD alone
• Eye, head, body and locomotion directions are read from the database after false-colour lookup
• Used to derive the Attention Level metric from the filtered stimuli

Direction of Attention
• What happens when the eyes aren't visible?
• Fall back on a hierarchy of other cues: head direction > body direction > locomotion direction
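Referring back to the false-colouring slide, here is a minimal sketch of such a lookup scheme. The function names and the dictionary-backed world database are hypothetical, and 8-bit colour channels are assumed, which limits each identifier to the range 0..255.

```python
# Minimal sketch of the false-colour lookup scheme described above,
# where each rendered pixel's (r, g, b) encodes object type, instance
# and sub-object identifiers. Hypothetical names; 8 bits per channel
# assumed.

def encode_false_colour(type_id: int, instance_id: int, sub_object_id: int):
    """Pack identifiers into an (r, g, b) triple for rendering."""
    for value in (type_id, instance_id, sub_object_id):
        if not 0 <= value <= 255:
            raise ValueError("identifier out of 8-bit range")
    return (type_id, instance_id, sub_object_id)


def decode_false_colour(pixel, world_database):
    """Recover the object a pixel belongs to from the world database.

    `world_database` is assumed to map (type, instance) keys to object
    records, so a single dictionary lookup replaces image segmentation
    and recognition.
    """
    r, g, b = pixel
    obj = world_database[(r, g)]  # type + instance identify the object
    return obj, b                 # b selects the sub-object (e.g. head)


# Usage: a pixel sampled from the foveal false-colour rendering.
world_database = {(1, 7): {"name": "agent_07", "kind": "agent"}}
pixel = encode_false_colour(type_id=1, instance_id=7, sub_object_id=3)
obj, sub_object = decode_false_colour(pixel, world_database)
```

The design point is that one table lookup per pixel stands in for segmentation and recognition, which is what keeps the snapshot renderings cheap.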
Mutual Attention
• Comparison between:
– The eye direction read from the other agent
– The focus of attention of this agent (see Part 1, generating attention behaviours)
• If the agents are the focus of each other's attention, the Mutual Attention Mechanism (MAM) is activated

Attention Levels
• The perception of the attention paid by another, at an instant in time
• Based on the orientation of body parts: eyes, head, body, locomotion direction
• The direction of each segment is weighted; the eyes provide the largest contribution
• Also supported: locomotion direction, directed gestures and directed facial expressions
• Gestures and expressions convey emotion

Attention Profiles
• Attention levels over a time period are stored in memory
• An attention profile is the consideration of all attention levels of another agent over a certain time frame
• Analysing the profile can provide a higher-level description of their amount of interest in me

Level of Interest
• A general descriptor of the amount of interest that another is perceived to be paying… over a time period (a rough sketch of both metrics follows at the end of this part)
• An error-prone process, due to its reliance on perception:
– Links between gaze and interest: were they actually looking at you? Human gaze perception is good, but not perfect
– Gaze does not necessarily mean attention, e.g. a blank stare
• Inherently probabilistic: theories of mind are theories

Application
• Conversation initiation scenarios
• A subtle negotiation involving gaze
• Avoid the social embarrassment of engaging in discourse with an unwilling participant

Our ToMM
• Stores simple, high-level theories
• Useful for conversation initialisation behaviours:
– Have they seen me? (ID, DAD and MAM modules)
– Have they seen me looking? (ID, DAD and MAM modules)
– Are they interested in interacting? (Level of Interest metric)

Future Work
• Finish the implementation of the model
• Further links between attention/memory and emotion
• Hardware-based bottom-up attention implementation
• Integration of facial expressions and gestures from Greta

Thank you! Questions?
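Finally, the Attention Level and Level of Interest metrics referred to above can be sketched as follows. The cue weights, the cosine scoring and the windowed mean are assumptions made for the example; the talk states only that the cues are weighted, with the eyes contributing most, and that attention levels are accumulated over a time period.

```python
# Illustrative sketch of the Attention Level and Level of Interest
# metrics. Weights, scoring and window length are assumptions.
from collections import deque
import math

# Hypothetical weights: eyes dominate, then head, body, locomotion.
CUE_WEIGHTS = {"eyes": 0.5, "head": 0.25, "body": 0.15, "locomotion": 0.1}


def cue_score(direction, to_observer):
    """1.0 when the cue points straight at the observer, 0.0 when away."""
    dx, dy = direction
    ox, oy = to_observer
    norm = math.hypot(dx, dy) * math.hypot(ox, oy)
    if norm == 0.0:
        return 0.0
    return max(0.0, (dx * ox + dy * oy) / norm)


def attention_level(cues, to_observer):
    """Weighted instantaneous attention. Missing cues (e.g. eyes not
    visible) contribute nothing, so the cue hierarchy degrades gracefully."""
    return sum(weight * cue_score(cues[name], to_observer)
               for name, weight in CUE_WEIGHTS.items() if name in cues)


class AttentionProfile:
    """Attention levels over a sliding time window, stored in memory."""

    def __init__(self, window=120):
        self.levels = deque(maxlen=window)

    def record(self, level):
        self.levels.append(level)

    def level_of_interest(self):
        """Higher-level descriptor over the window (here, the mean)."""
        if not self.levels:
            return 0.0
        return sum(self.levels) / len(self.levels)


# Usage: another agent turns its head, then its eyes, towards us.
profile = AttentionProfile()
for frame_cues in [{"head": (1, 0)}, {"eyes": (1, 0), "head": (1, 0)}]:
    profile.record(attention_level(frame_cues, to_observer=(1, 0)))
interest = profile.level_of_interest()
```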