Learning to Influence Emotional Responses for Interactive Storytelling

David L. Roberts and Harikrishna Narayanan and Charles L. Isbell
School of Interactive Computing
College of Computing
Georgia Institute of Technology
robertsd@cc.gatech.edu, harikrishna@gatech.edu, isbell@cc.gatech.edu
Abstract
We present an architecture for interactive storytelling. The
system interleaves pre-authored text with pre-selected videos
to generate a story. Between iterations, the player is given
an opportunity to answer questions that help to drive the narrative. Videos are used to affect the emotional responses of the players. The system models both videos and players to better adapt the narrative progression in response to the player's answers to questions.
The system is designed to serve two purposes: 1) to label natural language utterances and passages for use in a text classification system; and 2) to serve as a test environment for
computational models of influence and persuasion. We motivate the approach we have taken by describing research efforts on classifying emotion in natural language and on the
use of influence to affect decision making. The architecture
and prototype system are described in detail. A summary of the planned human-subject experiments is included as well.
Introduction
In this paper we present an architecture for authoring and
interactively telling stories using a combination of videos
and text. The system is designed to serve two purposes: 1)
to create and observe emotions in humans while generating
a labeled data set for a machine learning/natural language
processing system; and 2) to test the effect of computational
models of influence based on theories from social psychology and behavioral economics.
Stories play a significant role in the way we communicate. On a personal level, we use stories to inform others of
important events in our lives and as a window into our emotional state. We are a culture of story tellers. In fact, we are
so practiced at storytelling that we can adjust our approach
in reaction to our audience. We can read reactions like eye
contact, facial tics or expressions, body posture, and responses
to questions as we tell our stories. All of these signs that we
interpret enable us to adapt the details of our story to try to
engage the listener, establish a rapport with them, and ideally ensure they experience the exact reaction to our story
that we intend for them to experience.
Copyright © 2009, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
In addition, artists and authors (and an increasing number of game designers) use story as a tool to create an alternate reality for people. A skilled storyteller has the ability
to cause the present to fade and construct a fictional world
that the listener(s) become immersed in. In doing so, the
storyteller can be very influential to the listener, causing a
range of emotions—some of which may extend long after
the telling of the story has completed.
In traditional storytelling mediums such as print and film,
the listener is a mostly passive participant in the experience.
With the exception of the minor adjustments they can make
by using their own imagination, listeners have no effect on
the outcome or style in which the story is told. In recent
years, computer hardware technology has reached a point
that enables real-time 3D graphics, complex virtual characters, sophisticated AI algorithms, and other approaches to be
used in the design of interactive virtual experiences where
the listener takes on a more active role in the storytelling process. This active role, in effect, changes the passive listener into an active player.
It is important to note that when a listener becomes a player, a set of challenges arises that can pose significant hurdles for the story author. First, the author must take
steps to create “setting” and “characters”, which we refer to
as the model. Given the choice of model, the author must
specify what, out of everything consistent with the model, is
consistent with their aesthetic or artistic goals for the story.
For example, if the author is creating an interactive mystery
drama, the model would consist of the location (perhaps a
town), structures in the environment (perhaps a few buildings, streets, cars, etc.) and characters that populate the environment (some that are important the plot and some that
may not be). The choice of model alone is not particularly restrictive; in practice the author's goals for the experience represent a subset of the possible interactions that are consistent with the model.
the model. The ability for the player to interact, or more
specifically exercise their sense of self-agency, allows for
goal-driven exploration on the part of the player. In the case
of live storytelling this is tantamount to allowing listeners to
ask questions and direct the storyteller. The degree to which
the player’s self-agency can lead them to parts of the space
that are not consistent with the author’s goals is the degree
to which threats to the narrative exist.
Threats in an interactive storytelling environment necessitate the use of what has been referred to as a drama manager
(DM) in the literature (Laurel 1986). A drama manager is an
agent of the author that is tasked with shaping the player’s
experience. In part, the drama manager functions like a human storyteller would if they were adapting their story based
on their observations of the listeners. Throughout the remainder of this paper we will present an interactive storytelling architecture designed to study the effect of influence,
emotions, and the adaptation of stories based on emotions.
Why Emotions?
Emotions are vital to what makes us human. They enhance
and augment our ability to experience our environment and
our relationships with others. A well told story can have
a profound effect on us. It can shape our perspective on
important issues or influence our opinion of people. The
most effective storytellers create an emotional connection
with their audience. They do so by describing characters or
events in their stories that listeners can identify with. In part,
listeners latch onto the emotional response that characters
have to story events or to other characters. There are two
flavors of research related to emotion in a virtual setting:
generation and recognition.
There has been a lot of work on modeling emotion for virtual characters in the intelligent virtual agents community.
When embodied virtual characters are used to tell a story, it
is important that they appear believable. Subtle verbal and
non-verbal behaviors in virtual characters can convey deep
emotion. Thus, for embodied characters generating emotion
is a critical task (Gratch and Marsella 2001). A character’s
emotional state has an effect on their decision-making, actions, memory, attention, voluntary muscles, etc. (Berkowitz
2000). Because emotion is portrayed by such a wide array
of behaviors, we can learn a lot from the actions people take
without hearing their words.
On the other end of the spectrum, and more closely related to this paper, is work on recognizing emotion in humans. There are a variety of approaches to this task using a number of modalities. Some approaches make use
of techniques from computer vision that are based on nonverbal gestures such as gaze aversion (Morency, Christoudias, and Darrell 2006) and head pose (Stiefelhagen
2002). In more recent work, the vision algorithms are augmented with conversational context to improve the recognition accuracy (Morency et al. 2007). Using context that is
available by analyzing natural language (either recognized
or generated) or simply world state information can aid significantly in the interpretation of recognized gestures.
In addition to recognition of non-verbal behaviors, recognizing prosody is a key element to interpreting emotion.
Prosody is the quality of speech as observed by pitch, intonation, duration, rhythm, volume, etc. Prosodic cues can be
equally informative when classifying utterances according
to emotional state. Dellaert, Polzin, and Waibel present an
approach based on smoothing pitch in voice recordings of
utterances (1996). Interactive voice response systems (such
as automated telephone call centers) have been used to study
speech classification based on anger or neutrality. A number of machine learning methods such as artificial neural
networks, support vector machines, and decision trees have
been applied to that domain (Yacoub et al. 2003); however,
it has been shown that prosodic cues alone do not yield accurate results in real world scenarios (Huber et al. 2000). The
additional use of context similar to Morency et al. (2007)
has been demonstrated to be warranted in these situations.
Lastly, there have been a number of research efforts to
recognize emotion using physiological measurements such
as heart rate, skin conductance, and facial electromyogram, which are summarized by Lisetti and Nasoz (2004).
Those studies were conducted using a “noninvasive” wearable computer. They achieved classification accuracy in the
80%–90% range for most of the six emotions they were
attempting to classify: sadness, anger, surprise, fear, frustration, and amusement. A relatively high success rate such
as this is more than adequate for a majority of interactive
entertainment settings.
There have been some efforts to analyze emotion from
text (as opposed to speech). Most characterize emotion with
low granularity into five categories such as: high arousal,
negative; low arousal, negative; neutral; low arousal, positive; and high arousal, positive (Osherenko 2008). These
methods tend to rely on an “affect dictionary” that provides
relationships between words and emotions (Read 2004).
There has been some effort to learn an affect dictionary
from general dictionaries, but the results of experiments indicate that domain-specific affect dictionaries tailored to the
corpora used results in higher classification accuracy (Osherenko and Andr 2007).
To avoid the problems of affect dictionaries, some approaches have leveraged commonsense reasoning based on
ontologies of real world facts combined with lexical models (Liu, Lieberman, and Selker 2003). Neviarouskaya,
Prendinger, and Ishizuka present an approach also based on
ontological knowledge for characterizing one of nine emotions in “journal style” blog posts (2007). To measure success, the authors have three human participants hand classify
each sentence in the corpus into one of the nine emotions
they are trying to identify.
The idea of using humans playing a game to label data
for machine use is similar to von Ahn’s notion of human
computation (2006). von Ahn and his collaborators have created web games such as Peekaboom (von Ahn, Liu,
and Blum 2006) and Verbosity (von Ahn, Kedia, and Blum
2006) to get humans to label images based on their contents and to acquire common-sense knowledge—two things
machines are not good at. Similarly, we hope to leverage
interactive story participants to perform similar tasks. We
will discuss in more detail below the ways in which our system can help to create labeled corpora for these types of affect/emotion recognition experiments without the need for
explicit human coders.
Why Influence?
As described above, the task of effectively balancing an author's goals for an interactive story against the player's goals is very important. This task amounts to the
minimization of potential threats. There have been numerous representations of story and associated algorithms for
solving this problem collectively referred to as experience
or drama managers (Roberts and Isbell 2008; Mateas 1999).
A majority of the work on drama management has been on
abstract representations of story (with a few exceptions) and
results presented are of experiments conducted in simulation.
The use of simulation to evaluate drama managers has led to some criticism of the results' applicability to real-world interactive experiences. For example, the Declarative Optimization-based Drama Manager (DODM) first proposed by Bates (1992) and studied more extensively later
by Weyhrauch (1997) uses drama manager actions to affect the plot progression in a story. These actions are modeled as abstract operators on significant plot events such as
cause, deny, or hint. A variety of algorithms have been used
to solve a DODM instance including search (Weyhrauch
1997), Reinforcement Learning (Nelson et al. 2006), and
a variant of Markov decision processes known as TTD-MDPs (Roberts et al. 2006; 2007). Recently these results have come under scrutiny for relying heavily on the drama manager's cause action and therefore being overly manipulative and reducing the player's choices—in effect limiting
their self-agency (Nelson and Mateas 2008).
While Nelson and Mateas target their critique at the TTD-MDP-based DODM solution technique, it is likely the
case that their criticism applies to many of the approaches to
drama management that rely on proactive actions. They argue that one way to address the issue is to change the objective of the drama manager to explicitly include a representation of player choice (and therefore self-agency) and present
the results of a brief simulation experiment to support this
claim.1
Another approach to addressing the issue of drama managers being overly manipulative (regardless of the particular
approach) is to look more carefully at the implementation
of the drama manager's actions, specifically the types of actions that cause an event in the story world. Roberts et
al. (2008) propose the use of computational models based
on the theories of influence and persuasion from social psychology (Cialdini 1998) and behavioral economics (Ariely
2008) as a tool for automatically refining drama manager actions. They argue that a significant feature of actions based
on those theories is the strict preservation of the player’s
sense of self-agency. That argument is summarized here.
Because interactive experiences are marked by a strong
social context, in order to fully engage in the management
of these experiences it is important to move beyond simple
physical manipulation of the environment. The goal in developing computational models of influence is to: 1) benefit
authors by providing tools designed to influence players to
buy into the adoption of goals consistent with the author’s;
2) reduce the burden on authors by enabling them to specify
goals abstractly, relying on the principles of influence and
1 One piece of the argument not discussed by Nelson and Mateas is the availability of non-manipulative actions, such as hints, to the drama manager. It is unclear if their results can be duplicated when the drama manager is given a sufficient number of appropriately modeled hints.
persuasion to bridge the gap to a concrete implementation in
the virtual environment; and 3) accomplish (1) and (2) without the player perceiving any decrease in (and preferably an
increase in) self-agency.
It should be noted that the traditional approach of physical
manipulation of the environment can put players in situations that may change their mental or emotional state provided the experiential content has been appropriately authored. It is relatively straightforward to author for the transfer of knowledge from an NPC to the player; however, simply imparting knowledge to the player is not sufficient for
increasing the likelihood that they will choose to adopt a specific goal. Further, using computational models of influence,
the drama manager will be able to decide how to change
the player’s mental or emotional state without using detailed
pre-authored content. The main benefits of this are twofold:
1) With the notable exception of the beat-based drama manager (Mateas and Stern 2005), no other approach to drama
management has been designed to influence a player’s goals
which means new doors to creating narrative experiences
may open; and 2) The drama manager can try to align the
player’s goals with the author’s goals without them realizing it is doing so—thus preserving their perception of selfagency while increasing the affordance for authorial control
of the narrative experience.
There are six major principles of influence to focus on.
These principles have been identified by years of research in
the field of social psychology and behavioral economics and
are frequently employed as sales tactics by savvy marketers.
They are:
• Reciprocation: give and take; when someone does something for us we feel obligated to return in kind.
• Consistency: we have a near obsessive desire to be (and
appear) consistent with what we have already done or
said.
• Social Proof: we look to others like us to determine the
appropriate action to take.
• Liking: the more we like someone, the more likely we
are to abide by her requests.
• Authority: we have a deep-seated sense of duty to authority.
• Scarcity: something that, on its own merits, holds little
appeal to us will become decidedly more enticing if it will
soon become unavailable to us.
The principles provide the foundation for understanding
how to create the powerful tools used on a daily basis by
sales professionals. Used properly by themselves or in combination with each other these principles can greatly increase
the likelihood of someone complying with a request. The
fact that these principles can only increase the likelihood of
compliance is an important feature of an approach to drama
management that uses these ideas. The player can always
decide not to do what the drama manager is trying to get her
to do. Thus, while the careful application of these principles of influence can greatly increase the chances of a player
choosing to act in a manner the author prescribes, the affordance for self-agency is strictly preserved.
Figure 1: A schematic overview of the YouTube narrative
system.
In order to use these principles effectively for interactive
experiences, it is only necessary for the drama manager to
hit upon the triggers that cause humans to behave in a predictable manner. Perhaps the most powerful thing about using the principles of influence to the DM’s advantage is that
to do so requires minimal effort. Therefore, a player willingly complying with the DM’s wishes will tend to see her
actions as a result of either her own choices or of natural
forces rather than the influence of an exploiter.
Note that each principle can potentially be used in more
than one way. For example, scarcity can be used to entice
a player to obtain a particular object or to convince her that
certain information she has obtained (or should obtain) is
more important. Liking can actually be employed as a function of friendship, reputation, or physical attractiveness. Authority can be merely a title like Dr. or can be a function of
celebrity.
Architecture
Our narrative system was designed specifically to meet two goals: 1) to tell stories interactively that will
cause participants to experience specifically targeted emotions and leverage those emotions to create a labeled data set
for use in classifying free text based on the emotional state
of the writer; and 2) to provide a test platform for analyzing
the effectiveness of using computational models constructed
upon the theories of influence and persuasion as the basis for
drama manager actions that guide the player’s experience in
the interactive story.
To achieve those goals, our system uses a web-based interface that displays a sequence of authored text and videos that
are interspersed with specific decision points for the player.
The videos have been obtained from YouTube,2 a free online
repository for streaming video. The architecture is presented
in Figure 1. A player’s experience will advance according to
the following procedure (depicted in the figure):
1) Consent information: Upon arriving at the landing
page, the player is presented with the study’s consent information and asked to acknowledge it.
2 http://www.youtube.com/
Figure 2: A screenshot of our prototype system during the
question asking phase.
2) Demographics: The player is asked to answer a set of
basic demographic questions which will enable us to compare the groups of people in different treatments. Upon
completion, the player is presented with brief instructions
for continuing.
3a) Story: The player sees author-provided text that describes the beginning of a story event.
3b) Video: The player is presented with a video from
YouTube presenting information supplemental to the
story. Sights, sounds, and action can be conveyed much more efficiently by video than by text. The video does not start playing automatically; viewing it is completely optional for the player.
3c) Story and selection: After the video, the player is presented with a short bit of text and a multiple choice question. The question solicits a decision from the player that
will drive the narrative. At this point, the system cycles
back to Step (3a) and displays another text-video-text set
or, once the story has reached a conclusion, moves on to
the final step.
4) Exit survey: The player is presented with a set of exit
questions and, depending on the study, is asked to summarize the story they just experienced.
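The procedure above can be sketched as a simple loop over story nodes. This is our own minimal rendering for illustration, not the system's actual implementation; the class, field names, and toy story content are all assumptions:

```python
# Minimal sketch of the text-video-text cycle described in steps 1-4;
# the Node class and all story content here are our own illustration.
class Node:
    def __init__(self, intro_text, video_embed, question, transitions):
        self.intro_text = intro_text      # step 3a: authored story text
        self.video_embed = video_embed    # step 3b: optional YouTube embed
        self.question = question          # step 3c: multiple-choice question
        self.transitions = transitions    # answer -> next Node (None = end)

# A toy two-node story.
end = Node("Thomas apologizes.", "<iframe B></iframe>", "Forgive him?",
           {"yes": None, "no": None})
start = Node("Thomas kicks the dog.", "<iframe A></iframe>",
             "What do you do?", {"punch Thomas": end, "hug Thomas": None})

shown = []                               # everything presented to the player
answers = iter(["punch Thomas", "yes"])  # canned player choices for the demo

def run_session(node):
    while node is not None:
        shown.append(node.intro_text)    # 3a: story text
        shown.append(node.video_embed)   # 3b: video (player may skip it)
        choice = next(answers)           # 3c: the answer drives the narrative
        node = node.transitions[choice]
    # step 4 (exit survey) would follow here

run_session(start)
print(len(shown))  # 4 items shown: two text passages and two video embeds
```

The consent and demographics steps are omitted since they do not affect the story loop itself.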
All of the text, question-response sets, and videos are
pre-authored—not generated. We have opted to obtain the
videos for our system from YouTube to ease the authoring
process. Information about each video is stored in a database
including: a unique identifier; the HTML code to embed the
video in a webpage; any meta-data the author associates with
it; and information on how often it is played. Additionally,
the database is populated with the questions the player will
answer, the set of potential responses, and the text authored
for transitioning between events. The story structure is encoded in the database as well.
A majority of the responses to questions are categorical in
nature. We have questions that are both explicit and subtle
in the elicitation of emotional response categories. For example, to be explicit we may ask the player: “How do you
feel about the actions of Thomas in the last video?” To be
more subtle, we may ask the player: “What would you like
to do in response to Thomas’ actions in the last video?” In
the former case, the set of responses will correspond to emotional labels such as “satisfied” or “disgusted.” In the latter
case, the actions available as responses to the player will be
at the level of discourse acts such as “punch Thomas” or
“hug Thomas.” These discourse acts can be correlated to
emotional responses as well. In this case, “punch” would
correspond to anger or disgust whereas “hug” would correspond to empathy or happiness. The choice of discourse
acts will determine the narrative progress as they correspond
directly to player actions. Note that depending on the particular experiment we are running, both types of questions
need not be asked of the player; however, in all cases a subtle question that leads to discourse acts will be presented to
the player in order to determine how to continue the story.
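The correlation between discourse acts and emotional labels described above can be represented as a weighted mapping. The acts and emotions below come from the paper's example, but the numeric weights are our own illustration:

```python
# Hypothetical correlation between discourse acts and emotion labels;
# the weights are our own illustration, not the system's actual values.
ACT_TO_EMOTION = {
    "punch Thomas": {"anger": 0.6, "disgust": 0.4},
    "hug Thomas":   {"empathy": 0.5, "happiness": 0.5},
}

def emotion_weights(act):
    """Return the weighted emotional labels correlated with a discourse act."""
    return ACT_TO_EMOTION.get(act, {})

# Each act's weights form a distribution over emotions.
for act, weights in ACT_TO_EMOTION.items():
    assert abs(sum(weights.values()) - 1.0) < 1e-9

print(emotion_weights("punch Thomas"))  # {'anger': 0.6, 'disgust': 0.4}
```

Keeping the weights normalized anticipates the author-defined function over actions and emotions introduced later in the paper.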
The Use of Videos
As we will discuss further in the concluding section of this
paper, we are not just interested in the creation of interactive
stories, but also the authoring process. As skilled as we are
at telling stories in general, very few of us are capable writers. We can retell stories that we have experienced or heard
before, but putting those stories down on paper for others
to read is something very few of us can do effectively. We
believe that the limitations many of us have in disseminating our stories through text apply similarly to an interactive
setting. Thus, an authoring paradigm that alleviates most of
the burden associated with creating a written narrative is desirable. Further, with the increasing availability of personal
video recording devices (even many cell phones can record
video clips now) many people have vast collections of small
snippets of their life’s story. Weaving those videos together
effectively is a much easier task than telling the entire story
from scratch. Thus, one benefit of using videos for our interactive storytelling system is that we believe many more
people will be able to use it to tell stories.
Research into the creation of virtual humans is an ongoing area that, while it has made much progress in recent years, is still far from the photo-realism of actual humans on video. Because an emotional
connection with characters is one way in which listeners (or
players) can really engage in a storytelling event, using characters that may not appear realistic can adversely affect the
quality of experience. Further, because virtual human technology is still very new, the engineering effort to create such
a character is massive in comparison to that of creating a
corpus of videos. Creating a corpus of videos is something
that an average computer user can likely do effectively while
building a virtual human is better left for technical experts
well-versed in state of the art technology and algorithms.
Lastly, the emotionally persuasive power of video is well
documented. As early as 1962, researchers were using video
in lab experiments to study emotions such as stress (Lazarus
et al. 1962). More recently, film clips have been used to
study a range of emotions. To examine the effectiveness
of film at causing emotion, psychology researchers have
compared “self-reports” among study participants to determine the consistency of emotional responses across subjects to particular video clips (McHugo, Smith, and Lanzetta
1982). Philippot has specifically studied the ability of
videos to induce differentiated emotional states in lab experiments (1993). Others have studied the process by which
an effective corpus of videos can be constructed specifically for use in emotion experiments involving human subjects (Gross and Levenson 1995). Finally, most recently,
the effectiveness of self-reports for a constructed corpus of
emotionally persuasive videos has been studied using classification of physiological responses such as heart rate, skin
temperature, and pupil dilation (Lisetti and Nasoz 2004).
Thus, the use of videos in our system is a strategic choice
that serves three important purposes: 1) they provide a simple authoring paradigm; 2) we can avoid the creation of virtual human characters; and 3) videos are emotionally persuasive. These three benefits to using videos in our storytelling
system will help to enable a range of experiments on player
experience, collection of labeled emotional utterances, and
enable us to examine how unskilled authors can take advantage of the tools we provide.
The Use of Influence
Note that while videos are emotionally “persuasive,” we are
in fact interested in a different type of persuasion as well.
A separate but equally important interest that drives the development of
our system is to study the effect of computational models of
influence and persuasion as tools for the automatic refinement of drama manager actions. We are specifically interested in the type of influence often used in marketing that
can affect consumer behavior.
There are two questions associated with the use of influence in an interactive setting: 1) What specific tool of influence is appropriate to use given the current goal? and
2) What is the best way to refine the tool given the current
state of the story? Thus, the drama manager’s job is to select a goal for the story. Given that goal, a range of action
“types” are available for the system to take. These action types represent a model based on a particular theory of influence. Given an action type, the system then must choose
how to implement (or refine) an instance of that action type
given the current state of the story.
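The two-stage selection just described can be sketched as follows. The six principles come from the paper, but the scoring scheme and all data here are our own assumptions:

```python
# Illustrative two-stage selection: pick an influence principle for the
# goal, then refine it into a concrete action. Scores and candidate
# actions are invented for the sketch.
PRINCIPLES = ["reciprocation", "consistency", "social proof",
              "liking", "authority", "scarcity"]

def select_action_type(goal, applicability):
    """Question 1: which influence principle suits the current goal?
    `applicability` maps (goal, principle) to a hand-authored score."""
    return max(PRINCIPLES, key=lambda p: applicability.get((goal, p), 0.0))

def refine_action(action_type, story_state):
    """Question 2: refine the chosen principle into a concrete action
    given the current state of the story (here, a simple lookup)."""
    candidates = story_state.get(action_type, [])
    return candidates[0] if candidates else None

applicability = {("obtain_key", "scarcity"): 0.9,
                 ("obtain_key", "liking"): 0.4}
story_state = {"scarcity": ["note that the key will soon be lost"]}

atype = select_action_type("obtain_key", applicability)
print(atype)                              # scarcity
print(refine_action(atype, story_state))  # note that the key will soon be lost
```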
The YouTube storytelling architecture described in this
paper enables experiments to be run that will test the effectiveness of solutions to both influence selection questions.
In the concluding section of this paper we will describe the
human-subject experiments we have planned in more detail.
In short, we are developing specific models of a number of
methods of influence. Each of these methods of influence
will be tested against a baseline to determine to what degree
their use can affect the action selection by players.
It should be noted that models of influence for virtual characters have been implemented before. PsychSim, a
social simulation system developed by Marsella, Pynadath,
and Read, uses theory of mind for virtual agents in a partially
(a) Clear
(b) Ambiguous
Figure 3: The dogs pictured here are Shiba Inus, a breed known for its hectic play style. Few would confuse the action on the left, but some might assume the shot on the right is of dogs fighting when in fact it is play. The difference in interpretation is likely to manifest itself in different emotional responses. Photos courtesy of BW Pet Photography (http://bwpetphotography.blogspot.com).
observable Markov decision process framework to model
the beliefs and goals of other agents in the simulation (2004).
To maintain an agent’s beliefs about the world and other
agents, rules based on Cialdini’s methods of influence are
used to help explain the observed actions of agents and as a
heuristic to select from alternative explanations of those observations. PsychSim has also been used in the Thespian interactive narrative system (Si, Marsella, and Pynadath 2005;
2006). The models of influence that we will test using our
YouTube storytelling system are different from those of PsychSim. PsychSim's models are designed for agent-to-agent interactions and are used to describe behavior observed in other agents. The models we will test using our system are designed for agent-to-human interaction and are used to refine agent behaviors that will affect, rather than explain, human behavior.
Player Modeling and Adaptation
One key feature of an interactive storytelling system is its
ability to adapt to the user. The YouTube storytelling architecture is no exception to this rule. The author’s goals
in this setting are represented as a set of emotional changes
over time. The task for the system is to present videos and
text in sequence to induce those affect changes. Our system will use the Declarative Optimization-based Drama Manager (DODM) approach described above. A detailed description of
the DODM approach to managing the experience is beyond
the scope of this paper.
For the DODM to be effective, it is necessary
for the system to model the player in a way that allows
it to reason about the author’s goals and the player’s behavior in relation to those goals. Since the author’s goals
are represented as emotional change over time, a model of the player's emotional reactions is vital to proper function.
Therefore, the YouTube storytelling system maintains a cumulative model of player responses to videos over time.
Note, it is possible for players to interpret certain videos in
different ways, and therefore experience different emotional
reactions. For example, since we cannot show a video in
this paper, consider the still photos of dogs in Figure 3. The
Shiba Inus in those photos are a breed known for their apparently aggressive play style. In Figure 3(a) the lack of action
and calmness is likely to elicit a positive, or happy, response
from viewers; however, if the viewer has a fear of dogs,
they may respond very differently. Further, in Figure 3(b)
the dogs are at play. The commotion conveyed in the image
and the visible teeth would be construed by most as a sign
of fighting and generally cause a concerned or upset reaction. On the other hand, those familiar with the breed and its
play style might recognize it as playing and react happily.
The player’s reactions will be tracked over time in two
ways. First, for each video and bit of authored text in
the system, a cumulative reaction count is maintained to
model the average emotional response of players. Assume we have $k$ emotions $e_1, \ldots, e_k$ and $l$ actions $a_1, \ldots, a_l$. Let $c(e_i, v_j)$ be the number of times emotion $e_i$ is reported in association with video $v_j$ in response to an explicit question, and $c(a_i, v_j)$ be the number of times action $a_i$ is taken in response to a subtle question about video $v_j$. Note that a player's actions need not correspond one-to-one with a particular emotion. For example, a "punch" action could indicate fear or anger. Thus, let $f(a_i, e_j) \in [0, 1]$, with $\sum_{m=1}^{k} f(a_i, e_m) = 1.0$, be an author-defined function that represents the emotional responses associated with action $a_i$. This function captures the multiple-interpretation effect that can occur in response to videos. Then, assuming video $v_i$ has been viewed $n$ times, the model of video $v_i$ is given by the vector:

$$\vec{V}_i = \left[\; \cdots,\ \frac{c(e_j, v_i) + \sum_{m=1}^{l} f(a_m, e_j) \cdot c(a_m, v_i)}{2n},\ \cdots \right]$$
This vector represents the average emotional responses for the video across all players and all episodes. By itself, this does not provide a model of the player, but rather a model of how the average player will respond to the video.
To construct the player model, a comparison of the player's particular emotional responses to the average across all players' responses is necessary. Thus, a player's self-reported emotional response and action define a player response vector for a video $v_i$:

$$\vec{P}_i = \left[\; \cdots,\ \frac{s(e_j, v_i) + \sum_{m=1}^{l} f(a_m, e_j) \cdot s(a_m, v_i)}{2},\ \cdots \right]$$
where $s$ is a selection function that evaluates to 1.0 when the player has self-reported the emotion or taken the action passed as the first argument in response to the video passed as the second argument, and evaluates to 0.0 otherwise. $\vec{P}_i$ represents the player's emotional response to video $v_i$ and can easily be compared to the model $\vec{V}_i$ for the video.
Simple vector subtraction provides a dissimilarity measure $\vec{D}_i = \vec{V}_i - \vec{P}_i$ of the player model from the video model. Thus, when selecting a video to present to the player, the system can reason about how the player's emotional reactions are likely to differ from those expected. If the system has a goal of creating an emotional response corresponding to emotion $e_j$, it can examine videos the player has already seen that, on average, cause emotion $e_j$ and determine whether the player reacts similarly to the general population of players. If so, it will show that video. If not, it will search through the dissimilarity vectors it has computed thus far to determine whether another type of video in the system may have better success causing emotion $e_j$.
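A minimal sketch of this player-side computation, with illustrative names and an assumed threshold parameter (neither specified in the text):

```python
def player_vector(emotions, actions, reported_emotion, taken_action, f):
    """Per-video player response vector P_i from one self-reported emotion
    and one chosen action; f is the author-defined action-to-emotion map."""
    return [
        ((1.0 if e == reported_emotion else 0.0)
         + sum(f.get((a, e), 0.0) for a in actions if a == taken_action))
        / 2
        for e in emotions
    ]


def dissimilarity(video_model_vec, player_vec):
    """D_i = V_i - P_i: how the player's reaction differs from the average."""
    return [v - p for v, p in zip(video_model_vec, player_vec)]


def reacts_typically(d_vec, goal_index, threshold=0.25):
    """True if the player's reaction on the goal emotion is close enough to
    the population average for the video to be expected to work as usual.
    (The threshold is an assumed tuning parameter.)"""
    return abs(d_vec[goal_index]) <= threshold


# Hypothetical episode over emotions ["fear", "anger"]: the player
# self-reported "fear" and took the "punch" action.
f = {("punch", "fear"): 0.8, ("punch", "anger"): 0.2}
V = [0.325, 0.425]  # assumed average-response model for the video
P = player_vector(["fear", "anger"], ["punch"], "fear", "punch", f)
D = dissimilarity(V, P)
```

When the check fails for the goal emotion, the selection step would fall back to searching other video types whose dissimilarity vectors suggest a better chance of inducing that emotion.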
Concluding Thoughts
In this paper, we have described our YouTube storytelling
system. The system uses a combination of pre-authored
text, question-response pairs, and a corpus of videos from
YouTube to tell stories. The system will enable two orthogonal directions of future research: analyzing free text to classify the emotional state of the writer, and using computational models of influence to affect the choice of actions by players.
We have a functioning prototype of the system that includes the majority of the features described in this paper.
We plan to begin evaluating the system in the near future.
With only one minor exception, all of the evaluation of the
YouTube interactive storytelling system will involve human
subjects. To begin, we plan to run “baseline” studies with a fixed story structure. These baseline studies will serve three purposes. First, we will collect data to build the video models described in the previous section; once we have meaningful models, we can begin to perform the adaptation that is characteristic of interactive storytelling systems. Second, we will have a small set of sentences labeled with the emotional state of the writer. Eventually, this data set will be used to train a classifier that operates on natural language input to the YouTube narrative system in place of the multiple-choice questions we currently employ. Third, it will provide a data set free from the type of marketing influence we will test in subsequent studies.
The overlap between evaluation of our work on emotion
and on influence will end after the first study. The second
study we plan to conduct will include a DODM drama manager, player modeling and adaptation, and a branching story structure. This study will be conducted specifically to
characterize the ability of the drama manager to cause emotional change over time in the player. It will further serve to
gather more data for our natural language classifier.
Next, we will perform a series of experiments to test various influence techniques (or combinations of techniques).
These influence techniques will be constructed as actions
available to the drama manager that will change the text (and
potentially questions depending on the particular technique)
that is presented to the player in a principled manner. The
intended result of these experiments is to demonstrate a measurable and statistically significant change in the frequency
of actions taken by the player given the same fixed story
structure presented to the players in the baseline study.
Lastly, we plan to study how authors can leverage the tools we have developed for this system. We plan to conduct a study in which participants are asked to compose text and question/response sets as well as select videos for inclusion in an interactive story they author. We are interested in getting authors' impressions of how useful it is to have influence as a tool available to them. The authors participating in
the study will be given access to “influence tools” they can
associate with explicit goals they have for their interactive
story. These influence tools will be directives to the system
about a technique it should use to realize the author’s goal,
the details of which will be generated automatically.
References
Ariely, D. 2008. Predictably Irrational: The Hidden
Forces That Shape Our Decisions. Harper Collins.
Bates, J. 1992. Virtual reality, art, and entertainment. Presence: Teleoperators and Virtual Environments 1(1):133–138.
Berkowitz, L. 2000. Causes and Consequences of Feelings.
Cambridge University Press.
Cialdini, R. B. 1998. Influence: The Psychology of Persuasion. Collins.
Dellaert, F.; Polzin, T.; and Waibel, A. 1996. Recognizing emotion in speech. In Proceedings of the CMC, 1970–
1973.
Gratch, J., and Marsella, S. 2001. Tears and fears: Modeling emotions and emotional behaviors in synthetic agents.
In Proceedings of the Fifth International Conference on
Autonomous Agents.
Gross, J. J., and Levenson, R. W. 1995. Emotion elicitation using films. Cognition and Emotion 9(1):87–108.
Huber, R.; Batliner, A.; Buckow, J.; Noth, E.; Warnke, V.;
and Niemann, H. 2000. Recognition of emotion in a realistic dialogue scenario. In Proceedings of INTERSPEECH
(ICSLP), 665–668.
Laurel, B. 1986. Toward the Design of a Computer-Based
Interactive Fantasy System. Ph.D. Dissertation, Drama department, Ohio State University.
Lazarus, R. S.; Speisman, J. C.; Mordkoff, A. M.; and
Davison, L. A. 1962. A laboratory study of psychological stress produced by motion picture film. Psychological
Monographs 76.
Lisetti, C. L., and Nasoz, F. 2004. Using noninvasive wearable computers to recognize human emotions from physiological signals. EURASIP Journal on Applied Signal Processing 2004(11):1672–1687.
Liu, H.; Lieberman, H.; and Selker, T. 2003. A model
of textual affect sensing using real-world knowledge. In
Proceedings of Intelligent User Interfaces (IUI) 2003, 125–
132.
Marsella, S. C.; Pynadath, D. V.; and Read, S. J. 2004. PsychSim: Agent-based modeling of social interactions and influence. In Proceedings of the International Conference on Cognitive Modeling, 243–248.
Mateas, M., and Stern, A. 2005. Structuring content in the Façade interactive drama architecture. In Proceedings of Artificial Intelligence and Interactive Digital Entertainment (AIIDE-05).
Mateas, M. 1999. An Oz-centric review of interactive drama and believable agents. In Wooldridge, M., and Veloso, M., eds., AI Today: Recent Trends and Developments. Lecture Notes in AI 1600. Berlin, NY: Springer. First appeared in 1997 as Technical Report CMU-CS-97-156, Computer Science Department, Carnegie Mellon University.
McHugo, G. J.; Smith, C. A.; and Lanzetta, J. T. 1982.
The structure of self-reports of emotional responses to film
segments. Motivation and Emotion 6:365–385.
Morency, L. P.; Sidner, C.; Lee, C.; and Darrell, T. 2007.
Head gestures for perceptual interfaces: The role of context
in improving recognition. Artificial Intelligence 171(8–
9):568–585.
Morency, L. P.; Christoudias, C. M.; and Darrell, T. 2006.
Recognizing gaze aversion gestures in embodied conversational discourse. In International Conference on Multimodal Interactions (ICMI06).
Nelson, M. J., and Mateas, M. 2008. Another look at
search-based drama management. In Proceedings of the
23rd AAAI Conference on Artificial Intelligence (AAAI-08).
Nelson, M. J.; Roberts, D. L.; Isbell, C. L.; and Mateas, M. 2006. Reinforcement learning for declarative optimization-based drama management. In Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-06).
Neviarouskaya, A.; Prendinger, H.; and Ishizuka, M. 2007.
Textual affect sensing for sociable and expressive online
communication. In ACII, 218–229.
Osherenko, A., and André, E. 2007. Lexical affect sensing: Are affect dictionaries necessary to analyze affect? In Proceedings of Affective Computing and Intelligent Interaction (ACII). Springer.
Osherenko, A. 2008. Towards semantic affect sensing in
sentences. In Proceedings of AISB 2008 Convention on
Communication, Interaction and Social Intelligence.
Philippot, P. 1993. Inducing and assessing differentiated
emotion-feeling states in the laboratory. Cognition and
Emotion 7:171–193.
Read, J. 2004. Recognising affect in text using pointwise mutual information. Master's thesis, University of Sussex.
Roberts, D. L., and Isbell, C. L. 2008. A survey and qualitative analysis of recent advances in drama management.
International Transactions on Systems Science and Applications, Special Issue on Agent Based Systems for Human
Learning 4(2).
Roberts, D. L.; Nelson, M. J.; Isbell, C. L.; Mateas, M.; and Littman, M. L. 2006. Targeting specific distributions of trajectories in MDPs. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06).
Roberts, D. L.; Bhat, S.; Clair, K. S.; and Isbell, C. L. 2007. Authorial idioms for target distributions in TTD-MDPs. In Proceedings of the 22nd Conference on Artificial Intelligence (AAAI-07).
Roberts, D. L.; Isbell, C. L.; Riedl, M. O.; Bogost, I.; and
Furst, M. L. 2008. On the use of computational models of
influence for interactive virtual experience management. In
Proceedings of the First International Conference on Interactive Digital Storytelling (ICIDS-08).
Si, M.; Marsella, S. C.; and Pynadath, D. V. 2005. Thespian: An architecture for interactive pedagogical drama. In
International Conference on Artificial Intelligence in Education.
Si, M.; Marsella, S. C.; and Pynadath, D. V. 2006. Thespian: Modeling socially normative behavior in a decision-theoretic framework. In Proceedings of the 6th International Conference on Intelligent Virtual Agents.
Stiefelhagen, R. 2002. Tracking focus of attention in meetings. In Proceedings of the International Conference on Multimodal Interfaces.
von Ahn, L.; Kedia, M.; and Blum, M. 2006. Verbosity: A game for collecting common-sense knowledge.
In ACM Conference on Human Factors in Computing Systems (CHI), 75–78.
von Ahn, L.; Liu, R.; and Blum, M. 2006. Peekaboom: A
game for locating objects in images. In ACM Conference
on Human Factors in Computing Systems (CHI), 55–64.
von Ahn, L. 2006. Games with a purpose. In IEEE Computer Magazine, 96–98.
Weyhrauch, P. 1997. Guiding Interactive Drama. Ph.D. Dissertation, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA. Technical Report CMU-CS-97-109.
Yacoub, S.; Simske, S.; Lin, X.; and Burns, J. 2003.
Recognition of emotions in interactive voice response systems. In Proceedings of Eurospeech, 1–4.