The Effect of Affect: Modeling the Impact of Emotional State on the Behavior of Interactive Virtual Humans

Stacy Marsella, Jonathan Gratch, and Jeff Rickel

USC Information Sciences Institute
4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292
marsella@isi.edu, rickel@isi.edu

USC Institute for Creative Technologies
13274 Fiji Way, Suite 600, Marina del Rey, CA 90292
gratch@ict.usc.edu

ABSTRACT

A person’s behavior provides significant information about their emotional state, attitudes, and attention. Our goal is to create virtual humans that convey such information to people while interacting with them in virtual worlds. The virtual humans must respond dynamically to the events surrounding them, which are fundamentally influenced by users’ actions, while providing an illusion of human-like behavior. A user must be able to interpret the dynamic cognitive and emotional state of the virtual humans using the same nonverbal cues that people use to understand one another. Towards these goals, we are integrating and extending components from three prior systems: a virtual human architecture with a range of cognitive and motor capabilities, a model of emotional appraisal, and a model of the impact of emotional state on physical behavior. We describe the key research issues, our approach, and an initial implementation in an Army peacekeeping scenario.

1. INTRODUCTION

A person’s emotional state influences them in many ways. It impacts their decision making, actions, memory, attention, voluntary muscles, etc., all of which may subsequently impact their emotional state (e.g., see [4]). This pervasive impact is reflected in the fact that a person will exhibit a wide variety of nonverbal behaviors consistent with their emotional state, behaviors that can serve a variety of functions both for the person exhibiting them as well as for people observing them. For example, shaking a fist at someone plays an intended role in communicating information.
On the other hand, behaviors such as rubbing one’s thigh, averting gaze, or a facial expression of fear may have no explicitly intended role in communication. Nevertheless, these actions may suggest considerable information about a person’s emotional arousal, their attitudes, and their focus of attention. Our goal is to create virtual humans that convey these types of information to humans while interacting with them in virtual worlds. We are interested in virtual worlds that offer human users an engaging scenario through which they will gain valuable experience. For example, a young Army lieutenant could be trained for a peacekeeping mission by putting him in virtual Bosnia and presenting him with the sorts of situations and dilemmas he is likely to face. In such scenarios, virtual humans can play a variety of roles, such as an experienced sergeant serving as a mentor, soldiers serving as his teammates, and the local populace. Unless the lieutenant is truly drawn into the scenario, his actions are unlikely to reflect the decisions he will make under stress in real life. The effectiveness of the training depends on our success in creating engaging, believable characters that convey rich inner dynamics that unfold in response to the scenario. Thus, our design of the virtual humans must satisfy three requirements. First, they must be believable; that is, they must provide a sufficient illusion of human-like behavior that the human user will be drawn into the scenario. Second, they must be responsive; that is, they must respond to the events surrounding them, which will be fundamentally influenced by the user’s actions. Finally, they must be interpretable; the user must be able to interpret their response to situations, including their dynamic cognitive and emotional state, using the same nonverbal cues that people use to understand one another.
Thus, our virtual humans cannot simply create an illusion of life through cleverly designed randomness in their behavior; their inner behavior must respond appropriately to a dynamically unfolding scenario, and their outward behavior must convey that inner behavior accurately and clearly. This paper describes our progress towards a model of the outward manifestations of an agent’s emotional state. Our work integrates three previously implemented systems. The first, Steve [18, 20, 19], provides an architecture for virtual humans in 3D simulated worlds that can interact with human users as mentors and teammates. Although Steve did not include any emotions, his broad capabilities provide a foundation for the virtual humans towards which we are working. The second, Émile [8], focuses on emotional appraisal: how emotions arise from the relationship between environmental events and an agent’s plans and goals. The third, IPD [11], contributes a complementary model of emotional appraisal as well as a model of the impact of emotional state on physical behavior. The integration of these three systems provides an initial model of virtual humans for experiential learning in engaging virtual worlds. Our work is part of a larger effort to add a variety of new capabilities to Steve, including more sophisticated support for spoken dialog and a more human-like model of perception [17].

Figure 1: An interactive peacekeeping scenario featuring (left to right) a sergeant, a mother, and a medic

2. EXAMPLE SCENARIO

To illustrate our vision for virtual humans that can interact with people in virtual worlds, we have implemented an Army peacekeeping scenario, which has been viewed by several hundred people and was favorably viewed by many domain experts [21]. As the simulation begins, a human user, playing the role of a U.S. Army lieutenant, finds himself in the passenger seat of a simulated vehicle speeding towards a Bosnian village to help a platoon in trouble.
Suddenly, he rounds a corner to find that one of his platoon’s vehicles has crashed into a civilian vehicle, injuring a local boy (Figure 1). The boy’s mother and an Army medic are hunched over him, and a sergeant approaches the lieutenant to brief him on the situation. Urgent radio calls from the platoon downtown, as well as occasional explosions and weapons fire from that direction, suggest that the lieutenant send his troops to help them. Emotional pleas from the boy’s mother, as well as a grim assessment by the medic that the boy needs a medevac immediately, suggest that the lieutenant instead use his troops to secure a landing zone for the medevac helicopter. The lieutenant carries on a dialog with the sergeant and medic to assess the situation, issue orders (which are carried out by the sergeant through four squads of soldiers), and ask for suggestions. His decisions influence the way the situation unfolds, culminating in a glowing news story praising his actions or a scathing news story exposing the flaws in his decisions and describing their sad consequences. While the current implementation of the scenario offers little flexibility to the lieutenant, it provides a rich test bed for our research. Currently, the mother, medic and sergeant are implemented as Steve agents. All other virtual humans (a crowd of locals and four squads of soldiers) are scripted characters. We will illustrate elements of our design using the mother, since her emotions are both the most dramatic in the scenario and the most crucial for influencing the lieutenant’s decisions and emotional state. While our current implementation of the mother includes a preliminary integration of Steve, Émile, and IPD, the focus of this paper is on a more general integration that goes beyond that implementation.

3. RELATED WORK

Several researchers have considered computational models of emotion to drive the behavior of synthetic characters.
These approaches can be roughly characterized as being either communication driven or appraisal driven. Communication driven means that one selects a communication act and then chooses emotional expression based on the desired impact it will have on the user. For example, Cosmo [10] makes a one-to-one mapping between speech acts and emotional behaviors that reinforce the act (congratulatory acts trigger an admiration emotive intent that is conveyed with applause). Ball and Breese [2] intend to convey a sense of empathy by expressing behaviors that mirror the assessed emotional state of the user. Poggi and Pelachaud [16] use facial expressions to convey the performative of a communication act, showing “potential anger” to communicate that the agent will be angry if a request is not fulfilled. In contrast, appraisal theories focus on the apparent evaluative function emotions play in human reasoning. Appraisal theories view emotion as arising from some assessment of an agent’s internal state vis-à-vis its external environment (e.g., is this event contrary to my desire?). Such appraisals can be used to guide decision-making and behavior selection. For example, Beaudoin [3] uses them to set goal priorities and guide meta-planning. They can also serve as a basis for communicating information about the agent’s assessment, though not in the intentional way viewed by communicative models. Appraisal methods are useful for giving coherence to an agent’s emotional dynamics that can be lacking in purely communicative models. This coherence is essential for conveying a sense of believability and realism [13]. These approaches are complementary, though little work has considered how to combine them into a coherent whole. An exception is IPD [11], which represents both appraised emotional state and communicative intent and selectively expresses one or the other based on a simple threshold model.
In the discussion that follows we will focus on the problem of emotional appraisal.

4. MODELING EMOTIONS

4.1 Emotional State

Our primary design goal is to create general mechanisms that are not tied to the details of any specific domain. Thus, emotional appraisals do not refer to features of the environment directly, but reference abstract features of the agent’s information processing. Appraisals and re-appraisals are triggered reflexively by changes in an agent’s mental representations, which are in turn changed by general reasoning mechanisms. External events can indirectly impact these internal representations. For example, the agent may perceive an event (e.g., soldiers are leaving the accident scene), and then a general reasoning mechanism (e.g., Steve’s task reasoning) infers the consequences of this event (e.g., the departure of troops violates the precondition of them treating my child). Alternatively, these representations may change as the result of internal decision-making (e.g., forming an intention to violate a social obligation). Appraisal frames key off features of these representations (e.g., the negative impact of a precondition violation on goal probability or the violation of the social norm of meeting your obligations). The dynamics of behavior and emotional expression are thereby driven by the mechanisms that update an agent’s mental representations. The Émile system [8] illustrates this approach by using domain-independent planning to represent mental state and make inferences, but Émile is therefore largely restricted to appraisals related to an agent’s goals and changes to the probability of goal attainment. IPD [11] explores a much richer notion of appraisal including a variety of emotional coping strategies and a richer notion of social relationships and ego identity, but is implemented in terms of a less general inference mechanism.
In our joint work, we begin to bridge the gap between the two by buttressing Émile-style appraisals with a broader repertoire of inference strategies. Initially, we are augmenting plan-based appraisals with those related to dialog, incorporating methods for representing and updating the discourse state and social obligations that arise from discourse [15]. These structures can then serve as the basis for a more uniform representation of social commitments and obligations beyond those strictly related to dialog, and that interact with plan-based reasoning (e.g., one is obligated to redress wrongs). This will allow the system to model some of the rich dynamics of emotions relating to discourse that IPD supports, but do so in a more structured way. Émile contains a set of recognition rules that scan an agent’s internal representations and generate an appraisal frame whenever certain features are recognized. For example, whenever the agent adopts a new goal (or is informed of a goal of some other agent), frames are created to track the status of that goal. Each appraisal frame describes the appraised situation in terms of a number of specific features proposed by the OCC model [14]. These include the point of view from which the appraisal is formed, the desirability of the situation, whether the situation has come to pass or is only a possibility, and whether the situation merits praise or blame (we do not model the appealingness of domain objects, an important factor in OCC). For example, if someone threatens my goal, from my perspective this is an undesirable possibility that merits blame. Based on the setting of the various features, one or more “emotion instances” are generated according to OCC. Émile incorporates a decision-theoretic method for assigning intensities to these instances. Intensity values decay over time but are “re-energized” whenever the underlying representational structures are manipulated.
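The frame-and-instance bookkeeping just described can be illustrated with a small sketch. The class and function names below are our own hypothetical rendering of the ideas, not Émile’s actual data structures or API; the emotion-type mapping is a greatly simplified reading of OCC:

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical sketch of an OCC-style appraisal frame and the two-level
# emotion representation (instances and buckets). All names are our
# illustration, not Émile's actual representation.
@dataclass
class AppraisalFrame:
    perspective: str      # point of view from which the appraisal is formed
    desirability: float   # negative values mark undesirable situations
    realized: bool        # has the situation come to pass, or is it a possibility?
    blameworthy: bool     # does the situation merit praise or blame?

@dataclass
class EmotionInstance:
    emotion_type: str     # e.g. "fear"; retains a semantic link to its frame
    frame: AppraisalFrame
    intensity: float

    def decay(self, dt, rate=0.1):
        # Intensity decays over time unless the underlying representational
        # structures are manipulated again ("re-energized").
        self.intensity *= (1.0 - rate) ** dt

def classify(frame):
    """Greatly simplified mapping from frame features to OCC emotion types."""
    if frame.desirability < 0:
        if not frame.realized:
            return "fear"     # an undesirable possibility, e.g. a threatened goal
        return "anger" if frame.blameworthy else "distress"
    return "joy" if frame.realized else "hope"

def aggregate(instances):
    """Sum instances of the same type into 'buckets' of overall state."""
    buckets = defaultdict(float)
    for inst in instances:
        buckets[inst.emotion_type] += inst.intensity
    return buckets
```

In this sketch, two threatened goals would each yield a fear instance (answerable individually, as in “Why are you so angry?”), while the summed fear bucket approximates the overall assessment that drives outward behavior.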
Individual emotion instances are aggregated into “buckets” corresponding to emotions of the same type. Thus, threats to multiple goals will be aggregated into an overall level of fear. In this sense, Émile contains a two-level representation of emotional state. Emotion instances are the level at which some semantic connection is still retained between the emotion and the mental structures underlying the appraisal. Thus, the agent has the capability of answering questions about its emotion state (e.g., “Why are you so angry?”). The aggregate buckets roughly correspond to the overall assessment of the agent’s emotional state and are used to drive the physical focus modes discussed below. This model does not yet support some of the subtle dynamics displayed by IPD, which can convey an overall tone yet make agile transitions between conveyed emotional states. We are exploring several strategies for achieving this effect, including aggregating subsets of appraisal frames on demand depending on the current discourse state, or adding another aggregate layer that incorporates a different decay rate into appraisal intensities.

4.2 The Effect of Emotions on Behavior

The agents we design incorporate a wide range of outward behaviors in order to interact believably with the environment as well as other agents and humans. Their bodies have fully articulated limbs, facial expressions, and sensory apparatus. They can move in the environment, manipulate objects and direct their gaze in appropriate ways. They are capable of rich, multi-modal communication that incorporates both verbal behaviors as well as nonverbal behaviors. In addition, they have facial expressions, body postures and the ability to perform various kinds of gestures. The key challenge for the agent design is to manage this flexibility in the agent’s physical presence in a way that conveys consistent emotional state and individual differences.
To address these challenges, we rely on a wide range of work in human emotion, social behavior, clinical psychology, animation and the performance arts. For example, psychological research on emotion reveals its pervasive impact on physical behavior such as facial expressions, gaze and gestures [1, 5, 6]. These behaviors suggest considerable information about emotional arousal, attitudes and attention. For example, depressed individuals may avert their gaze and direct their gestures inward towards their bodies in self-touching behaviors. Note that such movements also serve to mediate the information available to the individual. For example, if a depressed individual’s head is lowered, this limits what they can perceive of the scene around them. Orienting on an object of fear or anger brings the object to the focus of perceptual mechanisms, which may have indirect influences on cognition and cognitive appraisal by influencing the content of working memory. Even a soothing behavior like rubbing an arm may serve to manage what a person attends to [7]. These findings provide a wealth of data to inform agent design but often leave open many details as to how alternative behaviors are mediated. The agent technology we use allows one to create rich physical bodies for intelligent characters with many degrees of physical movement. This forces one to directly confront the problem of emotional consistency. For example, an “emotionally depressed” agent might avert gaze, be inattentive, and perhaps hug themselves. However, if in subsequent dialog the agent used strong communicative gestures such as beats [12], then the behavior might not “read” correctly. This issue is well known among animators.
For example, Frank Thomas and Ollie Johnston, two of Disney’s “grand old men,” observe that “the expression must be captured throughout the whole body as well as in the face” and “any expression will be weakened greatly if it is limited only to the face, and it can be completely nullified if the body or shoulder attitude is in any way contradictory” [22]. Implicit in these various concerns is that the agent has what amounts to a resource allocation problem. The agent has limited physical assets, e.g., two hands, one body, etc. At any point in time, the agent must allocate these assets according to a variety of demands, such as performing a task, communicating, or emotionally soothing themselves. For instance, the agent’s dialog may be suggestive of a specific gesture for the agent’s arms and hands while the emotional state is suggestive of another. The agent must mediate between these alternative demands in a fashion consistent with their goals and their emotional state. From the human participant’s perspective, we expect this consistency in the agent’s behavior to support believability and interpretability. To address this problem, we rely on the Physical Focus model [11], a computational technique inspired by work on nonverbal behavior in clinical settings [7] and Lazarus’s [9] delineation of emotion-directed versus problem-directed coping strategies. The Physical Focus model grounds an agent’s physical behavior in what the character attends to and how they relate to themselves and the world around them, specifically whether they are focusing on themselves and thereby withdrawing from the world or whether they are focusing on the world, engaging it. In other words, the model bases physical behavior on how the agent is choosing to cope with the world. The model organizes possible behaviors around a set of modes. Behaviors can be initiated via requests from the planner/executor or started spontaneously when the body is not otherwise engaged.
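The resource-allocation view sketched above, in which limited physical assets are pulled between communicative and self-directed demands depending on the current focus mode, might be arbitrated roughly as follows. The mode names follow the Physical Focus model, but the function, its arguments, and its return values are purely our hypothetical illustration:

```python
# Hypothetical sketch of allocating one physical resource (the hands)
# between planner-requested communicative gestures and spontaneous
# self-directed adaptors, conditioned on the current focus mode.
def select_hand_behavior(mode, planner_request, idle):
    """Return the behavior that wins the agent's hands this cycle.
    `planner_request` is a communicative gesture requested by the
    planner/executor (or None); `idle` means the body is unoccupied."""
    if mode == "communicative":
        # Communicative gestures dominate; adaptors occur only when the
        # agent is listening or otherwise not occupied.
        if planner_request:
            return planner_request
        return "adaptor" if idle else None
    if mode == "body":
        # Self-touching adaptors dominate; communicative gestures are
        # muted in number and dynamic extent.
        return "muted:" + planner_request if planner_request else "adaptor"
    # Transitional focus: hand-to-hand / hand-to-object fidgeting, with
    # communicative gestures present but still stilted.
    return "stilted:" + planner_request if planner_request else "fidget"
```

For instance, a beat gesture requested during dialog would be performed outright in Communicative Focus but damped in Body Focus, keeping the body’s overall attitude from contradicting the conveyed emotional state.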
At any point in time, the agent will be in a unique mode based on the current emotional state. In the current work, we model three modes of physical focus: Body Focus, Transitional Focus and Communicative Focus (as opposed to the five modes discussed in [11] and identified in Freedman’s work). The focus mode impacts behavior along several key dimensions, including which behaviors are available, how behaviors are performed, how attentive the agent is to external events, which actions the agent selects, and how it speaks. The Body Focus mode models a self-focused attention, away from the world, and is also designed to reveal considerable depression or guilt. The prevalent gestures available to the agent in this mode are self-touching gestures, or adaptors [5], which involve hand-to-body gestures that appear soothing (e.g., rhythmic stroking of forearm) or self-punitive (e.g., squeezing or scratching of forearm). Conversely, communicative gestures such as deictic or beat gestures [12] are muted in this mode both in terms of their numbers and dynamic extent. Indeed, overall gesturing in this mode is muted in its dynamics. Also, the agent is less attentive to the environment and in particular exhibits considerable gaze aversion and downward looking gaze. The agent’s verbal behavior is inhibited and marked by pauses. In terms of action selection, the agent has a reduced preference for communicative acts. Transitional Focus indicates a less divided attention, less depression or guilt, a burgeoning willingness to take part in the conversation, milder conflicts with problem solving and a closer relation to the listener. This mode’s gesturing is marked by hand-to-hand gestures (such as rubbing hands or hand fidgetiness) and hand-to-object gestures, such as playing with a pen. There are more communicative gestures in this mode but they are still muted or stilted.
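The claim that the agent always occupies a unique mode driven by emotional state can also be made concrete. The sketch below is our hypothetical illustration of mode selection from the aggregate emotion buckets; the mode names come from the model, while the thresholds, bucket keys, and step logic are invented for illustration:

```python
# Hypothetical sketch of "sticky" focus-mode selection from aggregate
# emotion buckets. Mode names follow the Physical Focus model; the
# thresholds and bucket keys are our own illustrative choices.
MODES = ["body", "transitional", "communicative"]

def next_mode(current, buckets, threshold=0.6):
    """Step toward Body Focus on strong sadness/guilt (relative to hope),
    toward Communicative Focus on strong hope or anger. Sub-threshold
    signals leave the mode unchanged, so the agent does not readily pop
    into and out of a mode; an extreme withdrawal signal bypasses
    Transitional Focus entirely."""
    withdraw = (max(buckets.get("sadness", 0.0), buckets.get("guilt", 0.0))
                - buckets.get("hope", 0.0))
    engage = max(buckets.get("hope", 0.0), buckets.get("anger", 0.0))
    i = MODES.index(current)
    if withdraw > 2 * threshold:
        return "body"                       # extreme state skips Transitional
    if withdraw > threshold and i > 0:
        return MODES[i - 1]                 # one step toward Body Focus
    if engage > threshold and i < len(MODES) - 1:
        return MODES[i + 1]                 # one step toward Communicative Focus
    return current                          # "sticky": weak signals do nothing
```

Under this sketch, moderate sadness moves a communicative agent only as far as Transitional Focus, while an extreme reading drops it straight into Body Focus.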
Finally, Communicative Focus is designed to indicate a full willingness to engage the world in dialog and action. In this mode, the agent’s full range of communicative gestures is available. On the other hand, adaptors are muted and occur only when the agent is listening or otherwise not occupied. In contrast, the communicative gestures are more exaggerated, both in terms of numbers and physical dynamics. The agent is also more attentive and responsive to events in the environment. Transitions between modes are currently based on emotional state derived from the appraisal model. High levels of sadness, decreased hope, or guilt, in absolute terms and relative to other emotional buckets, induce transitions away from Communicative Focus and towards Body Focus. Increased hope or anger induces transitions towards Communicative Focus. The transitions are designed to be “sticky” so that the agent does not readily pop into and then out of a mode. Transitional Focus, true to its name, lies between the other two modes. However, extreme emotional state can bypass it. Grouping behaviors into modes attempts to (a) mediate competing demands on an agent’s physical resources and (b) coalesce behaviors into groups that provide a consistent interpretation about how the agent is relating to its environment. It is designed to do this in a fashion consistent with emotional state, with the intent that it be general across agents. However, realism also requires that specific behaviors within each mode incorporate individual differences, as in human behavior. For example, we would not expect a mother’s repertoire of gestures to be identical to that of an army sergeant. Indeed, the current implementation is a step en route to a fuller rendering of a coping model: a model of how the agent chooses to cope with emotional stress that includes both emotional appraisal as well as individual differences (personality).

5. CONCLUSION

The challenge of this research is not an easy one.
We seek to create virtual humans whose behavior is a believable rendition of human-like behavior, is responsive to dynamically unfolding scenarios and is interpretable using the same cues that people use to understand each other. A key factor for success will be a faithful rendering of the cause of emotion and its outward manifestation. To address that challenge, we have begun to integrate components from three previous systems: Steve’s virtual human architecture, Émile’s emotional appraisal and IPD’s complementary model of emotional appraisal and its impact on physical behavior. An initial integration has already been realized and implemented in the mother character of our Army peacekeeping scenario. However, our integration efforts are ongoing. Our current efforts focus on generalizing that integration and incorporating more of the IPD social-based appraisals into the general inferencing mechanisms in Steve and Émile.

6. ACKNOWLEDGMENTS

This research was funded by the Army Research Institute under contract TAPC-ARI-BR, the National Cancer Institute under grant R25 CA65520-04, and the U.S. Army Research Office under contract DAAD19-99-C-0046.

7. REFERENCES

[1] M. Argyle and M. Cook. Gaze and Mutual Gaze. Cambridge University Press, Cambridge, 1976.
[2] G. Ball and J. Breese. Emotion and personality in a conversational agent. In J. Cassell, J. Sullivan, S. Prevost, and E. Churchill, editors, Embodied Conversational Agents. MIT Press, Cambridge, MA, 2000.
[3] L. Beaudoin. Goal Processing in Autonomous Agents. PhD thesis, University of Birmingham, 1995. CSRP-95-2.
[4] L. Berkowitz. Causes and Consequences of Feelings. Cambridge University Press, 2000.
[5] P. Ekman and W. V. Friesen. The repertoire of nonverbal behavior: Categories, origins, usage, and coding. Semiotica, 1:49–97, 1969.
[6] P. Ekman and W. V. Friesen. Constants across cultures in the face and emotion. Journal of Personality and Social Psychology, 17(2), 1971.
[7] N. Freedman. The analysis of movement behavior during the clinical interview. In A. Siegman and B. Pope, editors, Studies in Dyadic Communication, pages 177–210. Pergamon Press, New York, 1972.
[8] J. Gratch. Émile: Marshalling passions in training and education. In Proceedings of the Fourth International Conference on Autonomous Agents, pages 325–332, New York, 2000. ACM Press.
[9] R. Lazarus. Emotion and Adaptation. Oxford University Press, 1991.
[10] J. C. Lester, J. L. Voerman, S. G. Towns, and C. B. Callaway. Deictic believability: Coordinating gesture, locomotion, and speech in lifelike pedagogical agents. Applied Artificial Intelligence, 13:383–414, 1999.
[11] S. C. Marsella, W. L. Johnson, and C. LaBore. Interactive pedagogical drama. In Proceedings of the Fourth International Conference on Autonomous Agents, pages 301–308, New York, 2000. ACM Press.
[12] D. McNeill. Hand and Mind: What Gestures Reveal about Thought. University of Chicago Press, 1992.
[13] W. S. Neal Reilly. Believable Social and Emotional Agents. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 1996. Technical Report CMU-CS-96-138.
[14] A. Ortony, G. Clore, and A. Collins. The Cognitive Structure of Emotions. Cambridge University Press, 1988.
[15] M. Poesio and D. Traum. Towards an axiomatization of dialogue acts. In J. Hulstijn and A. Nijholt, editors, Proceedings of the Twente Workshop on the Formal Semantics and Pragmatics of Dialogues (13th Twente Workshop on Language Technology), pages 207–222, Enschede, 1998.
[16] I. Poggi and C. Pelachaud. Emotional meaning and expression in performative faces. In International Workshop on Affect in Interactions: Towards a New Generation of Interfaces, Siena, Italy, 1999.
[17] J. Rickel, J. Gratch, R. Hill, S. Marsella, and W. Swartout. Steve goes to Bosnia: Towards a new generation of virtual humans for interactive experiences. In AAAI Spring Symposium on Artificial Intelligence and Interactive Entertainment, March 2001.
[18] J. Rickel and W. L. Johnson. Animated agents for procedural training in virtual reality: Perception, cognition, and motor control. Applied Artificial Intelligence, 13:343–382, 1999.
[19] J. Rickel and W. L. Johnson. Virtual humans for team training in virtual reality. In Proceedings of the Ninth International Conference on Artificial Intelligence in Education, pages 578–585. IOS Press, 1999.
[20] J. Rickel and W. L. Johnson. Task-oriented collaboration with embodied agents in virtual worlds. In J. Cassell, J. Sullivan, S. Prevost, and E. Churchill, editors, Embodied Conversational Agents. MIT Press, Cambridge, MA, 2000.
[21] W. Swartout, R. Hill, J. Gratch, W. Johnson, C. Kyriakakis, C. LaBore, R. Lindheim, S. Marsella, D. Miraglia, B. Moore, J. Morie, J. Rickel, M. Thiébaux, L. Tuch, R. Whitney, and J. Douglas. Toward the holodeck: Integrating graphics, sound, character and story. In Proceedings of the Fifth International Conference on Autonomous Agents, 2001.
[22] F. Thomas and O. Johnston. The Illusion of Life: Disney Animation. Walt Disney Productions, New York, 1981.