
Running head: EMOTIONAL EXPRESSIONS OF VIRTUAL AGENTS
Title: Relevance of modality of virtual agents’ emotional expressions on the recognition of
emotional states and user evaluations
Benny Liebold
Chemnitz University of Technology
Author Note
Institute for Media Research, Chair of Media Psychology
Chemnitz University of Technology, Germany
Corresponding author contact information:
Benny Liebold
Institute for Media Research, Chair of Media Psychology
Chemnitz University of Technology
Thüringer Weg 11, 09126 Chemnitz, Germany
Benny.Liebold@phil.tu-chemnitz.de
Abstract
...
Keywords: …
Relevance of modality of virtual agents’ emotional expressions on the recognition of
emotional states and user evaluations
Due to their ability to utilize naturalistic means of communication, virtual agents (VA) are
considered a promising technology in the effort to create more naturalistic interfaces (ref) for
Human-Computer Interaction (HCI). The emotional expressiveness of virtual agents is
believed to further enrich the interaction between user and agent by increasing the agent's
believability (ref). In line with this assumption, developers of virtual agent platforms often
integrate the ability to display the agent's emotional state. However, current technology only
provides the means to create realistic-looking synthetic facial expressions and gestures; it
does not yet allow for the creation of emotionally expressive synthetic voices. As a result,
developers of virtual agent platforms often implement visual emotion cues (ref) but no
auditory ones, resulting in a disparity of emotionality across communication channels.
Consequently, research on the effects of virtual agents' emotional expressiveness has focused
on single information channels, with visual cues (facial expressions, gestures) representing
the major part of this research. For example, de Melo, Carnevale, and Gratch (2012)
demonstrated that a VA's visual displays of basic emotions changed participants' decision
making while negotiating with the VA. Additionally, the timing of emotional expressions is
believed to strongly moderate their effects on the interaction partner's perception of the VA
(Asendorpf & Schönbrodt, 2011). However, studying single information channels in isolation
does not reflect the fact that natural expressions of emotion are dynamic and situationally
specific multimodal arrangements of expressive behaviors (Scherer & Ellgring, 2007) that
utilize at least visual and auditory cues. Even more importantly, emotion expressions in
which only one modality contains emotionally relevant information while the other remains
neutral are relatively unusual in face-to-face
communication. This paper investigates the effect of such unimodal vs. multimodal emotion
expressions on identifying a virtual agent's emotional state. We argue that incongruent
emotion expressions influence the user's evaluation of virtual agents by impairing the user's
ability to correctly recognize the virtual agent's respective emotional state. Identifying
emotional states is a necessary precursor of the assumed effects of a virtual agent's emotion
expressions on the user's evaluation.
Prior research on multimodal emotion recognition in human expressions indicates that our
ability to recognize a person's emotional state depends on which information channels are
available (ref). Studies presenting single information channels in isolation (e.g., facial
expressions vs. prosody) indicate that visual emotion cues are recognized best, but auditory
cues are also recognized well above chance level (ref, ref). In the case of virtual agents, the
presence or absence of expressive behavior in different information channels can create
conflicting evidence about the agent's emotional state: the user would implicitly have to
decide whether, for example, the virtual agent's neutral voice is a relevant expression of the
agent's emotional state when it is presented together with emotional facial expressions.
Considering the flow of information in the emotion recognition process as a modified
Brunswikian lens model, as suggested by Scherer (ref - 1978), we integrate emotional
information conveyed in different modalities into a coherent judgment of the sender's
emotional state. Because human emotions are typically expressed as multimodal
arrangements, we should tend to interpret the neutral expression in a single information
channel of a virtual agent as a relevant component of the agent's emotional state. We
therefore assume that neutral information channels presented in conjunction with
emotionally relevant information channels reduce the user's ability to recognize the virtual
agent's emotional state compared to the presentation of emotional information alone.
Additionally, the time users need to reach a decision should be shorter for single
emotion displays without neutral information channels, because users would not have to
integrate conflicting information into their judgment.
According to the Brunswikian lens model, multimodally consistent emotional information
should increase the user's ability to correctly identify the respective emotional state and,
owing to the increased naturalness of the expression, decrease reaction times in the decision
process.
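To make the assumed cue-integration process concrete, the following minimal Python sketch treats the judgment as a weighted combination of per-emotion evidence from both channels. All evidence values and channel weights are hypothetical illustration values, not estimates from the lens model literature or from our data.

    # Minimal sketch of lens-model-style cue integration (illustrative values only).
    EMOTIONS = ["anger", "disgust", "sadness", "fear", "happiness", "neutral"]

    def judge(facial_evidence, vocal_evidence, w_face=0.6, w_voice=0.4):
        """Combine per-emotion evidence from both channels into one judgment.
        The weights stand in for the perceiver's cue utilization."""
        combined = [w_face * f + w_voice * v
                    for f, v in zip(facial_evidence, vocal_evidence)]
        return EMOTIONS[combined.index(max(combined))]

    angry_face    = [0.9, 0.1, 0.0, 0.1, 0.0, 0.1]
    angry_voice   = [0.8, 0.1, 0.1, 0.2, 0.0, 0.2]
    neutral_voice = [0.1, 0.1, 0.1, 0.1, 0.1, 0.9]

    judge(angry_face, angry_voice)    # congruent channels -> "anger"
    judge(angry_face, neutral_voice)  # still "anger" here, but with a smaller
                                      # margin; with higher w_voice, "neutral" wins

Under this reading, a neutral channel is not ignored but enters the judgment as counter-evidence, which is why recognition accuracy should drop and decision times should rise in the mixed conditions.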
Method
To test our hypotheses, we conducted an experiment in which participants were asked to
indicate a virtual agent's emotional state as presented in a series of video clips. We recruited
84 students (f=XX, m=XX) with a mean age of M=22 (SD=XX). They received course credit
for their participation in the study.
To identify the effect of the coherence of expressive behavior across modalities, we varied
the virtual agent's modality of expressive behavior in a modified 2×2 within-subjects design
with the presence or absence of emotional cues in facial expressions and prosody as within-
subjects factors. A similar approach was used in the Multimodal Emotion Recognition Test
(MERT) by Bänziger, xxx, and Scherer (ref). They presented short video clips of real actors
with facial (no voice), vocal (no video), or multimodal (both) emotion expressions, as well as
still pictures of the respective video's apex of the emotion expression. The vocal expressions
did not contain any meaningful content in order to isolate the effect of vocal emotion
expressions from contextual influences. MERT further differentiates five emotions at two
intensity levels, resulting in a 4(modality)×5(emotion)×2(intensity) within-subjects design.
Because each combination of conditions is enacted by two different encoders, MERT
contains 80 stimuli that have to be rated according to their emotion quality and intensity by
choosing one out of ten emotion descriptions.
To compare multimodal emotion recognition performance for virtual actors with that for real
actors, we used a similar design but modified the modality conditions slightly: We changed
the picture condition to a context condition (denoted C) that contained neutral facial and
neutral vocal expressions but used speech samples that, in this condition only, contained
emotionally relevant content. Further, we presented each unimodal emotion expression in
two variants: either together with a neutral expression in the other modality (An, Vn) or
without the other modality altogether (i.e., no video or no sound; A0, V0). The multimodal
condition with both facial and vocal expressions (AV) remained unchanged. An overview of
the conditions is presented in Table 1 and, as a data structure, in the sketch below.
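The following Python sketch mirrors Table 1; the labels "emotional", "neutral", and None (for an absent channel) are our shorthand for this illustration, not terminology from MERT.

    # The six modality conditions of the modified design (cf. Table 1).
    # Condition C additionally uses speech with meaningful emotional content.
    CONDITIONS = {
        "AV": {"face": "emotional", "voice": "emotional"},
        "V0": {"face": "emotional", "voice": None},
        "Vn": {"face": "emotional", "voice": "neutral"},
        "A0": {"face": None,        "voice": "emotional"},
        "An": {"face": "neutral",   "voice": "emotional"},
        "C":  {"face": "neutral",   "voice": "neutral"},
    }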
Stimulus Materials
Video clips were based on an animated virtual agent from the computer game Half Life 2
(Valve, 200X). The game engine's ability to display authentic facial expressions is based on
the Facial Action Coding System (FACS) by Ekman, Friesen, and Hager (ref, 2002), which
allowed us to manipulate facial parameters in line with reported research results. We used
one of the game's female main characters (Alyx) as the enacting virtual agent, because she is
attractive and, in terms of facial expressiveness, the most sophisticated animated model in
the game. Facial animation parameters for the different emotions were drawn from the same
FACS-coded video clips of actors that were used in MERT (ref). We then created identical
facial expressions at different intensity levels and chose appropriate video clips for low and
high emotion intensities via a pretest.
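To illustrate what such FACS-based parameterization looks like, the following sketch maps each emotion to prototypical action units (AUs) as commonly reported in the FACS literature. These are illustrative prototypes only, not the exact animation parameters we derived from the MERT clips.

    # Prototypical FACS action units per emotion (illustrative prototypes from
    # the literature, not the study's exact animation parameters).
    PROTOTYPICAL_AUS = {
        "happiness": [6, 12],           # cheek raiser, lip corner puller
        "sadness":   [1, 4, 15],        # inner brow raiser, brow lowerer, lip corner depressor
        "anger":     [4, 5, 7, 23],     # brow lowerer, upper lid raiser, lid tightener, lip tightener
        "fear":      [1, 2, 4, 5, 20],  # brow raisers, brow lowerer, upper lid raiser, lip stretcher
        "disgust":   [9, 15],           # nose wrinkler, lip corner depressor
    }

    def animation_parameters(emotion, intensity):
        """Map an emotion and an intensity in [0, 1] to per-AU activations."""
        return {f"AU{au}": intensity for au in PROTOTYPICAL_AUS[emotion]}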
Because current technology does not provide the means to synthesize emotional voice
samples in an appropriately authentic way, we used real actors to record the necessary voice
samples. We recorded two meaningless sentences that were also used in MERT and have
been reported to be recognized as a foreign language (ref). Four female actors were recruited
to perform both sentences according to the respective affective state, as well as ten context
sentences that implied attributions consistent with specific emotions according to cognitive
emotion theories (e.g., Weiner, OCC). The sentences are presented in Table 2. We then
selected the most appropriate speaker via a pretest. Afterwards, the facial animations and the
voice samples were integrated and manual lip synchronization was applied. The resulting
video clips had an average length of M=2s (SD=XX) and presented only the apex of a facial
expression (see Figure 1). The final test employed two randomized orders of the video clips
in which the same emotion or modality never appeared in two consecutive clips. The test
design resulted in 6(modality)×5(emotion)×2(intensity) = 60 video clips.
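The full stimulus set and a constraint-respecting randomization can be sketched as follows. The study's actual randomization procedure is not specified beyond the constraint itself; a greedy construction with restarts is one simple way to satisfy it.

    # Full 6x5x2 stimulus set and a randomized order in which no two
    # consecutive clips share an emotion or a modality.
    import random
    from itertools import product

    MODALITIES = ["AV", "A0", "An", "V0", "Vn", "C"]
    EMOTIONS = ["anger", "disgust", "sadness", "fear", "happiness"]
    INTENSITIES = ["low", "high"]

    stimuli = [{"modality": m, "emotion": e, "intensity": i}
               for m, e, i in product(MODALITIES, EMOTIONS, INTENSITIES)]
    assert len(stimuli) == 60  # as stated above

    def constrained_order(clips, max_restarts=1000):
        """Greedy randomized construction: each clip must differ from its
        predecessor in both emotion and modality; restart on dead ends."""
        for _ in range(max_restarts):
            pool = list(clips)
            random.shuffle(pool)
            order = [pool.pop()]
            while pool:
                candidates = [c for c in pool
                              if c["emotion"] != order[-1]["emotion"]
                              and c["modality"] != order[-1]["modality"]]
                if not candidates:
                    break  # dead end; restart with a new shuffle
                pick = random.choice(candidates)
                pool.remove(pick)
                order.append(pick)
            if len(order) == len(clips):
                return order
        raise RuntimeError("no valid order found")

    order_a = constrained_order(stimuli)  # the final test used two fixed orders
    order_b = constrained_order(stimuli)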
Measures
Both the newly developed test and MERT were administered via E-Prime 2.0, which allowed
us to record participants' responses and reaction times. We further administered the short
form of the Trait Emotional Intelligence Questionnaire (TEIQue-SF; ref) and the short form
of the Big Five Inventory (BFI-K; ref).
Results
text
Discussion
text
References
Asendorpf, J. B., & Schönbrodt, F. D. (2011). The challenge of constructing psychologically
believable agents. Journal of Media Psychology: Theories, Methods, and Applications,
23(2), 100-107. doi: 10.1027/1864-1105/a000040
de Melo, C. M., Carnevale, P., & Gratch, J. (2012). The effect of virtual agents’ emotion
displays and appraisals on people’s decision making in negotiation. In Y. Nakano, M.
Neff, A. Paiva & M. Walker (Eds.), Intelligent Virtual Agents, 12th International
Conference, IVA 2012, Santa Cruz, CA, USA (pp. 53-66). New York: Springer.
Scherer, K. R., & Ellgring, H. (2007). Multimodal expression of emotion: Affect programs or
componential appraisal patterns? Emotion, 7(1), 158-171. doi: 10.1037/1528-3542.7.1.158
Figure 1: Facial expressions used in the study
Table 1: Employed conditions of the modality of emotion expressions

                         vocal cues
facial cues      emotional    neutral    absent
emotional        AV           Vn         V0
neutral          An           C          –
absent           A0           –          –

Explanation: AV = emotional expression in face and voice; V0 = emotional facial expression and no sound; Vn =
emotional facial expression and neutral speech; A0 = emotional speech and no picture; An = emotional speech
and neutral facial expression; C = neutral expression in face and voice, but meaningful speech content
Table 2: Translation of German context sentences and meaningless sentences used in the study

Emotion          Low intensity                                        High intensity
anger            He has been unfriendly to me.                        Tim intentionally broke my cellphone.
disgust          There is rotten meat in the freezer.                 He has been sentenced to life in prison.
sadness          The storm demolished my garden.                      My father died.
fear             There are said to be wolves in the forest again.     The plane's engines failed.
happiness        I received a present.                                I won a lot of money in the lottery.
meaningless 1    Haett sandig pron you venzy.
meaningless 2    Fee goett laich jonkill goster.