Running head: EMOTIONAL EXPRESSIONS OF VIRTUAL AGENTS

Title: Relevance of the modality of virtual agents' emotional expressions for the recognition of emotional states and user evaluations

Benny Liebold
Chemnitz University of Technology

Author Note
Institute for Media Research, Chair of Media Psychology, Chemnitz University of Technology, Germany
Corresponding author contact information:
Benny Liebold, Institute for Media Research, Chair of Media Psychology, Chemnitz University of Technology, Thüringer Weg 11, 09126 Chemnitz, Germany
Benny.Liebold@phil.tu-chemnitz.de

Abstract
...
Keywords: …

Relevance of the modality of virtual agents' emotional expressions for the recognition of emotional states and user evaluations

Due to their ability to use naturalistic means of communication, virtual agents (VAs) are considered a promising technology in the effort to create more naturalistic interfaces (ref) for Human-Computer Interaction (HCI). The emotional expressiveness of virtual agents is believed to further enrich the interaction between user and agent by increasing the agent's believability (ref). In line with this assumption, developers of virtual agent platforms often integrate the possibility to display the agent's emotional state. However, current technology provides the means to create realistic-looking synthetic facial expressions and gestures, but it cannot yet produce emotionally expressive synthetic speech. As a result, developers of virtual agent platforms often implement visual emotion cues (ref) but no auditory ones, which results in a disparity of emotionality across communication channels. Consequently, research on the effects of virtual agents' emotional expressiveness has focused on single information channels, with visual cues (facial expressions, gestures) representing the major part of this research. For example, de Melo, Carnevale, and Gratch (2012) demonstrated that a VA's visual displays of basic emotions changed participants' decision making while negotiating with the VA. In addition, the timing of emotional expressions is believed to strongly moderate the effects of emotional expressions on the interaction partner's perception of the VA (Asendorpf & Schönbrodt, 2011). However, research on single information channels does not reflect the fact that natural expressions of emotion are dynamic and situationally specific multimodal arrangements of expressive behaviors (Scherer & Ellgring, 2007) that involve at least visual and auditory cues. Even more importantly, emotion expressions in which only one modality carries emotionally relevant information while the other remains neutral are relatively unusual in face-to-face communication. This paper investigates the effect of such unimodal vs. multimodal emotion expressions on identifying a virtual agent's emotional state. We argue that incongruent emotion expressions influence the user's evaluation of virtual agents by impairing the user's ability to correctly recognize the virtual agent's emotional state. Identifying emotional states is a necessary precursor of the assumed effects of a virtual agent's emotion expressions on the user's evaluation.

Prior research on multimodal emotion recognition in human expressions indicates that our ability to recognize a person's emotional state depends on which information channels are available (ref). Studies presenting single information channels in isolation (e.g., facial expressions vs. prosody) indicate that visual emotion cues are recognized best, but auditory cues are also recognized well above chance level (ref, ref). In the case of virtual agents, the presence or absence of expressive behavior in different information channels can conflict with respect to its relevance for the agent's emotional state: The user implicitly has to decide whether, for example, the virtual agent's neutral voice is a relevant expression of the agent's emotional state when it is presented together with emotional facial expressions. If the flow of information in the emotion recognition process is conceived of as a modified Brunswikian lens model, as suggested by Scherer (ref - 1978), perceivers integrate the emotional information conveyed in different modalities into a coherent judgment of the sender's emotional state. Because human emotions are typically expressed as multimodal arrangements, users should tend to interpret the neutral expression in a single information channel of a virtual agent as a relevant component of the agent's emotional state. We therefore assume that neutral information channels presented in conjunction with emotionally relevant information channels reduce the user's ability to recognize the virtual agent's emotional state compared with the presentation of the emotional information alone. In addition, the reaction time up to the decision should be shorter for single emotion displays without neutral information channels, because users do not have to integrate conflicting information into their judgment. According to the Brunswikian lens model, multimodally consistent emotional information should increase the user's ability to identify the respective emotional state correctly and decrease the reaction time in the decision process as a result of the increased naturalness of the expression.
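
For illustration only, the reasoning above can be sketched as a weighted integration of cues. This formalization is not part of Scherer's lens model and was not used in the present study; the symbols J, w, and s are introduced here purely for exposition:

J(e) = w_f \, s_f(e) + w_v \, s_v(e), \qquad w_f + w_v = 1,

where J(e) is the perceived support for emotion e, w_f and w_v are the weights given to the presented facial and vocal channels, and s_f(e) and s_v(e) denote how strongly each channel's expression points to e. Under this sketch, a neutral channel contributes s(e) ≈ 0 while still receiving weight, diluting J(e) relative to a unimodal display, in which the absent channel carries no weight at all.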

Method

To test our hypotheses, we conducted an experiment in which participants were asked to identify a virtual agent's emotional state presented in a series of short video clips. We recruited 84 students (f = XX, m = XX) with a mean age of M = 22 years (SD = XX). They received course credit for their participation in the study. To identify the effect of the coherence of expressive behavior across modalities, we varied the modality of the virtual agent's expressive behavior in a modified 2×2 within-subjects design with the presence or absence of emotional cues in facial expression and prosody as within-subjects factors. A similar approach was used in the Multimodal Emotion Recognition Test (MERT) by Bänziger, xxx, and Scherer (ref). They presented short video clips of real actors with facial (no voice), vocal (no video), or multimodal (both) emotion expressions as well as still pictures of the respective video's apex of the emotion expression. The vocal expressions did not contain any meaningful content in order to isolate the effect of vocal emotion expressions from contextual influences. MERT further differentiates five emotions at two intensity levels, resulting in a 4 (modality) × 5 (emotion) × 2 (intensity) within-subjects design. Because each combination of conditions is enacted by two different encoders, MERT contains 80 stimuli that have to be rated according to their emotion quality and intensity by choosing one out of ten emotion descriptions.

To compare multimodal emotion recognition performance for virtual actors with that for real actors, we used a similar design but modified the modality conditions slightly: We changed the picture condition to a context condition (denoted C) that contained neutral facial and neutral vocal expressions but used speech samples that, only in this condition, contained emotionally relevant content. Furthermore, for each condition in which only one modality contained emotionally relevant cues, we added an additional condition: Each of the two unimodal emotion expressions was presented either together with neutral expressions in the other modality (An, Vn) or without the other modality (i.e., no video or no sound; A0, V0). The multimodal condition with both facial and vocal expressions (AV) remained unchanged. An overview of the conditions is presented in Table 1.

Stimulus Materials

The video clips were based on an animated virtual agent from the computer game Half-Life 2 (Valve, 200X). The game engine's ability to display authentic facial expressions builds on the Facial Action Coding System (FACS) by Ekman, Friesen, and Hager (ref, 2002), which allows facial parameters to be manipulated in line with published research findings. We used one of the game's female main characters (Alyx) as the enacting virtual agent, because she is an attractive character and the game's most sophisticated animated model in terms of facial expressiveness. Facial animation parameters for the different emotions were derived from the same FACS-coded video clips of actors that were used in MERT (ref). We then created identical facial expressions at different intensity levels and selected appropriate video clips for low and high emotion intensities in a pretest. Because current technology does not provide the means to synthesize emotional voice samples in an appropriate and authentic way, we recorded real actors for the voice samples. We recorded the two meaningless sentences that were also used in MERT and have been reported to be perceived as a foreign language (ref). Four female actors were recruited to perform both sentences according to the respective affective state, as well as ten context sentences that implied attributions consistent with specific emotions according to cognitive emotion theories (e.g., Weiner, OCC). The sentences are presented in Table 2. We then selected the most suitable speaker in a pretest. Afterwards, the facial animations and the voice samples were integrated and the lip movements were synchronized manually. The resulting video clips had an average length of M = 2 s (SD = XX) and presented only the apex of a facial expression (see Figure 1). The final test employed two randomized orders of the video clips in which no emotion or modality was presented in two subsequent clips. The design resulted in 6 (modality) × 5 (emotion) × 2 (intensity) = 60 video clips.
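
As a purely illustrative sketch of the constrained randomization described above (the study itself used two pre-generated orders administered in E-Prime; the function and variable names below are hypothetical and not part of the original materials), the 60-clip stimulus set and an admissible presentation order could be generated as follows:

import itertools
import random

# 6 (modality) x 5 (emotion) x 2 (intensity) = 60 clips
MODALITIES = ["AV", "A0", "An", "V0", "Vn", "C"]
EMOTIONS = ["anger", "disgust", "sadness", "fear", "happiness"]
INTENSITIES = ["low", "high"]

def build_stimuli():
    """Enumerate all 60 modality x emotion x intensity combinations."""
    return [
        {"modality": m, "emotion": e, "intensity": i}
        for m, e, i in itertools.product(MODALITIES, EMOTIONS, INTENSITIES)
    ]

def constrained_order(stimuli, max_restarts=1000, seed=None):
    """Randomized greedy order in which no two consecutive clips share
    the same emotion or the same modality; restarts when it gets stuck."""
    rng = random.Random(seed)
    for _ in range(max_restarts):
        remaining = stimuli[:]
        rng.shuffle(remaining)
        order = [remaining.pop()]
        while remaining:
            candidates = [
                c for c in remaining
                if c["emotion"] != order[-1]["emotion"]
                and c["modality"] != order[-1]["modality"]
            ]
            if not candidates:
                break  # dead end: restart with a new shuffle
            pick = rng.choice(candidates)
            remaining.remove(pick)
            order.append(pick)
        if not remaining:
            return order
    raise RuntimeError("No admissible order found; increase max_restarts.")

clips = build_stimuli()                      # 60 stimuli
order_a = constrained_order(clips, seed=1)   # first randomized order
order_b = constrained_order(clips, seed=2)   # second randomized order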

Measures

The test we developed, as well as MERT, was administered in E-Prime 2.0, which allowed us to record participants' responses and reaction times. We further administered the short form of the Trait Emotional Intelligence Questionnaire (TEIQue-SF; ref) and the short version of the Big Five Inventory (BFI-K; ref).

Results

text

Discussion

text

References

Asendorpf, J. B., & Schönbrodt, F. D. (2011). The challenge of constructing psychologically believable agents. Journal of Media Psychology: Theories, Methods, and Applications, 23(2), 100-107. doi: 10.1027/1864-1105/a000040

de Melo, C. M., Carnevale, P., & Gratch, J. (2012). The effect of virtual agents' emotion displays and appraisals on people's decision making in negotiation. In Y. Nakano, M. Neff, A. Paiva, & M. Walker (Eds.), Intelligent Virtual Agents: 12th International Conference, IVA 2012, Santa Cruz, CA, USA (pp. 53-66). New York: Springer.

Scherer, K. R., & Ellgring, H. (2007). Multimodal expression of emotion: Affect programs or componential appraisal patterns? Emotion, 7(1), 158-171. doi: 10.1037/1528-3542.7.1.158

Figure 1: Facial expressions used in the study

Table 1: Employed conditions of the modality of emotion expressions

                          facial cues
                    yes             no
vocal cues   yes    AV              A0, An
             no     V0, Vn          C

Note. AV = emotional expression in face and voice; V0 = emotional facial expression, no sound; Vn = emotional facial expression, neutral speech; A0 = emotional speech, no picture; An = emotional speech, neutral facial expression; C = neutral expression in face and voice, but emotionally meaningful speech content.

Table 2: Translation of the German context sentences and the meaningless sentences used in the study

Emotion         Low intensity                                       High intensity
anger           He has been unfriendly to me.                       Tim intentionally broke my cellphone.
disgust         There is rotten meat in the freezer.                He has been sentenced to prison for lifetime.
sadness         The storm demolished my garden.                     My father died.
fear            There are said to be wolves in the forest again.    The plane's engines failed.
happiness       I received a present.                               I won a lot of money in the lottery.
meaningless 1   Haett sandig pron you venzy.
meaningless 2   Fee goett laich jonkill goster.