Buzz: Telling Compelling Stories

Sara H. Owsley, Kristian J. Hammond, David A. Shamma, Sanjay Sood
Intelligent Information Laboratory
Northwestern University
2133 Sheridan Road, Room 3-320
Evanston, Illinois 60208
+1 (847) 467-6924
{sowsley, hammond, ayman, sood}@cs.northwestern.edu
ABSTRACT
This paper describes a digital theater installation called Buzz. Buzz
consists of virtual actors who express the collective voice generated by weblogs (blogs). These actors find compelling stories from
blogs and perform them. In this paper, we explore what it means
for a story to be compelling and describe a set of techniques for retrieving compelling stories. We also outline an architecture for high
level direction of a performance using Adaptive Retrieval Charts
(ARCs), allowing a director-level of interaction with the performance system. Our overall goal in this work is to build a model of
human behavior on a new foundation of query formation, information retrieval and filtering.
Categories and Subject Descriptors
J.5 [Arts and Humanities]: Arts, fine and performing; H.3.3 [Information Search and Retrieval]: Information filtering
Figure 1: An installation of Buzz in the Ford Engineering Design Center at Northwestern University.
General Terms
Human Factors
Keywords
Network Arts, Emotion, Blogs, Media Arts, Culture, World Wide Web, Software Agents, Story Generation

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MM'06, October 23–27, 2006, Santa Barbara, California, USA.
Copyright 2006 ACM 1-59593-447-2/06/0010 ...$5.00.

1. INTRODUCTION
Buzz is a multimedia installation that exposes the buzz generated by blogs. Buzz finds the weblogs (blogs) that are compelling: those in which someone is laying their feelings on the table, exposing a dream or a nightmare they had, making a confession or apology to a close friend, or regretting an argument with their mother or spouse. It embodies the author (blogger) in virtual actors who externalize these monologues by reading them aloud. The focal point of the installation displays the most emotional and evocative words from the monologue, shown as falling text.

As an example, Table 1 shows three stories read in a Buzz performance. The actors contribute to the performance by reading these discovered stories aloud, in turn. The actors are attentive to each other, turning to face the actor currently speaking. The central screen (shown up close in Figure 2) displays the emotionally evocative words extracted from the story currently being performed.

To find compelling stories, Buzz mines the blogosphere (the collection of all blogs as a community), collecting blogs in which the author describes an emotionally compelling situation: a dream, a nightmare, a fight, an apology, a confession, etc. After retrieving these blogs, Buzz performs affective classification to focus on blogs with a heightened emotional state. Other techniques, including syntax filtering and colloquial filtering, are used to ensure retrieval of appropriate content for the performance. After passing through these filters, the resulting story selections are compelling and emotional.

Several techniques are used to give Buzz a realistic feel and to make performances engaging to an audience. Dramatic ARCs provide a higher level of control over the performance, similar to that of a director. The actors are attentive to one another, turning to face the actor currently speaking. Gender classification is used to
Table 1: Three stories discovered by Buzz.
I have a confession – beneath my cynical, sarcastic facade beats a heart of pure mush. Before you
snort milk through your nose, think about it – despite Connie’s best efforts, my favorite movie in the
world is THE SOUND OF MUSIC. What’s not to like?
Great songs, fabulous scenery, incorrigible children, a
charming nun/governess and a stern, handsome frozen-hearted captain who slowly melts under the spell of the
songs, the scenery, his kids and Julie Andrews. When
I start that movie and the mountain scenery comes on
the scene with the birds twittering and the first chords
of music play ... I’m in heaven.
Figure 2: A close up view of the central screen of an installation of Buzz. The screen displays emotionally amplified words
extracted from the blog currently being performed by one of
the virtual actors.
Ever sense I got into a fight with my dad I have started
to drink beer. Friday night I stole 5 beers from my dad
and saturday night i stole about 5 and last night I only
stole 1. I feel as if alcohol is the only thing that can
help me. I feel like its the only thing there for me. I
dont know whats wrong with me. I dont know why I
feel like this. I think another reason why I am so upset
is because I never get to talk to Billy one on one. We
are never alone. I am making him stay at my house this
weekend. I need to spend some alone time with him.
And if that means Saturday night then that means no
Rachael.
ensure that gender-specific stories are performed by virtual actors of the appropriate gender. A model of speech emphasis is employed to enhance the cadence and prosody of text-to-speech technology.
The adoption of blogs by millions of users has resulted in much more than the mere presence of millions of online journals; it has created a new kind of communication [15]. We can expose and give
voice to such communication in installations like Buzz. This work
is part of a greater effort in an area called “Network Arts” [24],
which uses information found in the world, via the network, to create artistic installations.
Last night for instance, I dreamed that we were having the rehearsal dinner at an aquarium for some reason this aquarium had a killer whale and I was dumb enough to dip my feet in the tank. Well, it attacked, and in the dream I was clearly bummed out due to having a major foot surgery instead of a wedding. There was also a debacle with a scorpion that I won't go into. And also the cake melted.

2. RELATED WORK
Owsley et al. [18] created an installation called the Association Engine, composed of a troupe of five virtual improvisational actors. The actors, with animated faces [22] and voice generation [16], began a performance by taking a single word or phrase suggested by the audience through keyboard input. They used
this word as a seed to an improvisational warm-up game called the
Pattern Game, where the actors free associate to create a collective
context, getting themselves on the same contextual page.
Following this warm-up game, the actors would generate a One
Word Story, from the context of the warm-up. A One Word Story is
a common game in improvisational theater where actors each contribute one word at a time to create a collective story. See Table 2
for a sample pattern game and generated One Word Story from the
Association Engine.
Using a template-based approach, the Association Engine was able to generate stories that were coherent but did not engage the audience, as the sample One Word Story in Table 2 shows. The stories lacked character development and any larger purpose.
Looking at the stories generated by the Association Engine, it is clear that the system faced problems that have persisted through years of Artificial Intelligence research in story generation. TaleSpin [14] used a world simulation model and planning approach
for story generation. To generate stories, TaleSpin triggered one of
the characters with a goal and used natural language generation to
narrate the plan for reaching that goal. The stories were simplistic
in their content (using a limited amount of encoded knowledge) as
well as their natural language generation.
Klein’s Automatic Novel Writer [8] uses a similar approach in
Table 2: Discovered Word Chain and One Word Story from the
Association Engine
Pattern Game
music → fine art → art → creation
→ creative → inspiration → brainchild
→ product → production → magazine
→ newspaper → issue → exit
→ outlet → out
One Word Story
An artist named Colleen called her friend Alicia.
Colleen wanted to go to the production at the music
hall. Colleen and Alicia met up at the music hall. To
their surprise there was no production at the music hall.
Instead the women decided to go to the stage.
order to produce murder novels from a set of locations, characters,
and motivating personality traits. The stories follow a planning system's output as the characters search for clues. The system does not seem to capture the qualities of a good murder story, including plot twists, foreshadowing, and erroneous clues.
Dehn’s Author system [4] was driven by the need for an “author view” to tell stories, as opposed to the “character view” found
in world simulation models. Dehn’s explanation was that “the author’s purpose is to make up a good story” whereas the “character’s
purpose is to have things go well for himself and for those he cares
about” [4].
In general, previous story generation systems faced a trade-off between scalability and the ability to generate coherent stories. Besides Dehn's Author, previous research in this area has employed a weak model of the aesthetic constraints of storytelling.
In response to the shortcomings of story generation, we chose to
explore story discovery. We found an incredible corpus of existing
stories of people’s life experiences. These stories exist within a
subset of blogs [11, 6] found on the Internet. We then used what we
learned from other systems doing story generation to inform story
discovery. We define a stronger model for the aesthetic elements of storytelling and use it to drive the retrieval of stories and to filter and evaluate results.
Artistically, storytelling and online communication have been externalized in several installations. Among the better known,
Listening Post [7] exposes content from thousands of chat rooms
and other online forums through an audio and visual display. In
a very real sense, Listening Post exposes the real-time ‘buzz’ of
chat rooms on the web. Similar to Listening Post, Buzz externalizes
online communication, through a context refined via search and extraction. While Listening Post demonstrates online communication
as a whole, Buzz focuses on singular voices from the blogosphere,
grounded in current popular topics.
Mateas’s Terminal Time [13] also tells stories extracted from
real-world sources. Terminal Time produces its stories by traversing a common-sense knowledge base, a verified information source, and steers its narrative arc by audience applause [12]. Buzz likewise follows a dramatic arc, but one controlled at the level of a director and built from web-based, unverified information.
3. COMPELLING STORIES
A first pass at building Buzz revealed that the content of blogs is incredibly wide-ranging, but unfortunately often very dull. Buzz succeeded in finding stories that were on point for any provided topic, but the results were not compelling.

We found that people blogged about topics including their class schedule, what they are eating for lunch, how to install a wireless router, what they wore today, and a list of their 45 favorite ice cream flavors. While this was interesting to observe from a sociological point of view, it did not make for a compelling performance. Not only were the blogs on these topics boring, but the lengths of the stories varied widely, from one sentence to pages upon pages.

We needed to give the system strategies for finding stories that were compelling and engaging to an audience. To do so, we define a simple model for the aesthetic qualities of a compelling story. These qualities include but are not limited to:

1. an interesting topic
2. emotionally charged
3. complete and of a length that holds the audience's attention
4. content at the right level of familiarity to an audience
5. involving dramatic situations
6. comprised of developed characters

We designed Buzz to find stories with all of these qualities.

3.1 Topics Of Interest
A compelling story is generally about a compelling topic, one that interests the audience. For this reason, we chose the day's most popular searches from Yahoo (provided by Yahoo Buzz [27]) as topics. Search engines recently started providing logs of their most frequently used query topics. This feed worked well as a seed for story discovery, as we were using the topics that people were searching for most and discovering people's thoughts and opinions on those topics.

We found Wikipedia [26] to be another source for topics of interest, as the site maintains a list of "controversial topics". The list shows topics that are in "edit wars" on Wikipedia because contributors are unable to agree on the subject matter. This list includes topics such as apartheid, overpopulation, ozone depletion, and censorship. These topics, by their nature, are ones that people are passionate about.

Using these two sites as sources for topics, finding compelling stories began with a simple web search restricted to the domain of www.livejournal.com [11], a popular blog hosting site, with each focal topic as a search query. Of the first 100 results for each topic, about 60 tend to be actual blog entries rather than bloggers' profile pages (though this varies greatly by topic).

After discarding profile pages, the remaining blog entries are analyzed phrasally, eliminating posts that do not contain at least one of the two-word phrases (non-stopwords) from the topic. For example, given the topic 'Star Wars: Revenge of the Sith,' entries that contained the phrase 'star wars' were acceptable, but not entries that merely had the word 'star' or 'wars.' The remaining blog entries were known to be relevant to the current popular topic.

After realizing the limited results of searching only for blogs from Live Journal, we moved to finding blogs using Google Blog Search [6]. This move involved creating a generalized algorithm for finding the blog text within a blog entry in any format (previously we knew the format of all blogs hosted on Live Journal). We found these results to be more wide-ranging and varied in type.

Using topics of interest as the source of topic keywords and blogs as the target, we were able to discover what was being said about the things people were most interested in.

3.2 Filtering Retrieval by Affect
Given that our initial version of Buzz was reading blogs that were boring, and since such a large volume of blogs exists on the web, we strove to filter the retrieved blog entries by affect, giving us the ability to portray the strongest affective stories. Beyond purely showing the most affective stories, we also wanted to be able to juxtapose happy stories on a topic with angry or fearful stories on the same topic.

To build such a tool, we used a combination of case-based reasoning and machine learning approaches [19, 25]. We created a case base of 106,000 product reviews labeled with a star rating between one and five (one being negative and five being positive). We omitted reviews with a score of three, as those were seen as neutral. We built a Naïve Bayes statistical representation of these documents, separating them into positive (four or five stars) and negative (one or two stars).

Given a target document, the system creates an "affect query" as a representation of the document. The query is created by selecting
the words with the greatest statistical variance between positive and negative documents in the Naïve Bayes model. The system uses this query to retrieve "affectively similar" documents from the case base. The labels from the retrieved documents are used to derive an affect score between -2 and 2 for the target document. This tool was found to be 73.39% accurate.
For Buzz, blogs which scored from -1 to 1 were seen as neutral and not good candidates for a performance. When using the
emotional filtering tool, Buzz was considerably more compelling.
The actors were also able to retrieve stories from the Web based on
emotional stance, enabling the theatrical agents to juxtapose positive and negative stories on the same topic.
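The mechanics of this case-based scoring can be sketched in a few lines. The snippet below is a toy illustration under stated assumptions: the case base is four hand-written "reviews" rather than the 106,000 used by the system, the Naïve Bayes statistics are reduced to raw per-class word counts, and the function names are ours, not the system's.

```python
from collections import Counter

def train_counts(docs):
    """docs: list of (text, stars). Build per-class word counts,
    skipping neutral (3-star) reviews as the text describes."""
    pos, neg = Counter(), Counter()
    for text, stars in docs:
        if stars == 3:
            continue
        (pos if stars >= 4 else neg).update(text.lower().split())
    return pos, neg

def affect_query(text, pos, neg, k=5):
    """Select the k words whose positive/negative counts differ most --
    a crude proxy for 'greatest statistical variance'."""
    words = set(text.lower().split())
    return sorted(words, key=lambda w: -abs(pos[w] - neg[w]))[:k]

def affect_score(text, docs, pos, neg):
    """Score the target from -2 to 2 using the labels of case-base
    documents that share affect-query words with it."""
    query = set(affect_query(text, pos, neg))
    labels = [stars for t, stars in docs
              if stars != 3 and query & set(t.lower().split())]
    if not labels:
        return 0.0
    return sum(labels) / len(labels) - 3.0  # shift 1..5 onto -2..2

case_base = [
    ("great wonderful product love it", 5),
    ("terrible awful broken waste", 1),
    ("love the wonderful design", 4),
    ("awful support broken promises", 2),
]
pos, neg = train_counts(case_base)
print(affect_score("what a wonderful day i love it", case_base, pos, neg))  # prints 1.5
```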
Future work on this classification tool includes creating a model of affect based on Ekman's six-emotion model (happiness, sadness, anger, disgust, fear, surprise) [10, 17, 5]. This would allow for
greater control of the flow of the performance through emotional
states.
3.3 Dramatic Situations
Through experiencing Buzz in the world and watching audiences' reactions and responses to stories, we discovered more generalized
traits of compelling stories. The most compelling stories to watch
were those where someone is laying their feelings on the table, exposing a dream or a nightmare that they had, making a confession
or apology to a close friend, or regretting an argument that they had
with their mother or spouse.
Codifying these qualities, we built our story discovery engine to
seek out these types of stories. While still making use of multiple
retrieval filters described in the previous section, we added a component to the retrieval that found stories that began with a cue that
the writer was about to describe a dream, nightmare, fight, apology,
confession, or any other emotionally fraught situation. Such cues
include phrases such as “I had a dream last night,” “I must confess,”
“I had a terrible fight,” “I feel awful,” “I’m so happy that,” and “I’m
so sorry.”
This realization was an important turning point in our system’s
capabilities with regard to retrieving compelling stories. The newest instance of Buzz no longer focuses on popular or contentious topics, but instead on stories about different types of emotion-laden situations (dreams, fights, confessions, etc.).
These stories are more interesting because the blogger isn't talking about a popular product or ranting about a movie; they are relaying a personal experience from their own life, which typically makes the stories emotionally charged. The experiences they describe are often frightening, funny, touching, or surprising. They
describe situations which have a common element in all of our
lives [20], giving the audience a way to relate to the content and
live through the experiences of the writer, whereas the topically
based approach excluded the portion of the audience that was not
familiar with the topic at hand (a popular actress, story in the news,
etc.).
Including dramatic situations as a filter and search parameter not
only gets us to more interesting story topics and content, but we
also tend to see more character depth and development in the stories. As writers describe dramatic situations in their lives, they reveal more of their personality and of their personal issues with themselves and those around them.
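A minimal sketch of this cue matching might look as follows; the cue list is the sample quoted above, not the system's full inventory, and the function name is ours.

```python
import re

# Sample cue phrases signaling a dream, fight, confession, or apology.
CUES = [
    r"i had a dream last night",
    r"i must confess",
    r"i had a terrible fight",
    r"i feel awful",
    r"i'm so happy that",
    r"i'm so sorry",
]
CUE_RE = re.compile("|".join(CUES), re.IGNORECASE)

def has_dramatic_cue(entry: str) -> bool:
    """True if the blog entry contains a cue phrase indicating the
    writer is about to describe an emotionally fraught situation."""
    return CUE_RE.search(entry) is not None

print(has_dramatic_cue("I must confess, beneath my cynical facade..."))  # True
print(has_dramatic_cue("Here is my class schedule for fall."))           # False
```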
3.4 Filtering Retrieval by Syntax
In our first pass at retrieving stories from blogs, we noticed that
we often found lists or surveys instead of text in paragraph form.
For example, one blogger posted an exhaustive list of lip balm flavors. Others posted answers to a survey about themselves (their
favorite vacation spot, favorite color, favorite band and actor, etc.).
These are clearly not good candidates for stories to be presented in
a performance.
To solve this problem, we chose to filter the retrieved blog entries
by syntax. Blog entries that met any of the following criteria were
removed:
1. too many newline characters (more than six in an entry of four hundred characters)
2. too many commas (more than three in a sentence)
3. too many numbers (more than one number in a sentence)
This method successfully filtered blog entries that contained a list or survey of some sort. While the precision of removing blogs based on syntax was low, we optimized for recall so that all potential lists and surveys were removed from the corpus. Given the large volume of blogs on the web, updated every minute, letting some potentially good blogs fall through the cracks sufficed for our purposes.
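The three syntax rules translate directly into code. This sketch applies the thresholds quoted above; the sentence splitting is deliberately naive, and the helper name is ours.

```python
import re

def fails_syntax_filter(entry: str) -> bool:
    """Return True if the entry looks like a list or survey under the
    three syntax rules described in the text."""
    # Rule 1: too many newlines (more than six per 400 characters).
    for start in range(0, len(entry), 400):
        if entry[start:start + 400].count("\n") > 6:
            return True
    for sentence in re.split(r"[.!?]+", entry):
        # Rule 2: too many commas (more than three in a sentence).
        if sentence.count(",") > 3:
            return True
        # Rule 3: too many numbers (more than one in a sentence).
        if len(re.findall(r"\d+", sentence)) > 1:
            return True
    return False

survey = "favorite color: blue\nband: X\nmovie: Y\nfood: Z\nplace: A\nsong: B\nbook: C\n"
print(fails_syntax_filter(survey))                                 # True
print(fails_syntax_filter("Ever since the fight I have felt lost."))  # False
```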
3.5 Complete Passages
Given the blog entries that remain after passing through the five filters described in this section (relevance, affect, syntax, colloquial, and dramatic situations), the system must choose which pieces of the blog entries to present to the audience. This involves finding complete thoughts or stories of a length that can keep the audience engaged.
For the most part, we found that blog authors format their entries
in a way such that each paragraph contains one distinct thought.
Given this, the paragraph where the dramatic situation is mentioned
with the greatest frequency will suffice as a complete story for our
system. If this paragraph is of an ideal length (between a minimum
and maximum threshold), which we determined by viewing Buzz
with stories at many different lengths, then it is posted as a candidate story. For our system, we found that stories between 150 and
400 characters long were ideal. Again, given the large volume of
blogs on the web, letting many blogs fall through the cracks because they are too long or too short is fine for our purposes.
An example of three stories discovered by Buzz can be seen in
Table 1. The stories shown were retrieved and passed through all
above mentioned filters. Notice the differing emotional stances of
the first and second stories. This was a deliberate and automatic
juxtaposition of positive and negative passages.
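The selection step can be sketched as follows, assuming paragraphs are separated by blank lines. The 150-400 character window is the one reported above, while the function name and cue-counting details are illustrative.

```python
MIN_LEN, MAX_LEN = 150, 400  # the ideal story-length window from the text

def select_passage(entry: str, cue: str):
    """Pick the paragraph mentioning the dramatic cue most often;
    keep it only if its length falls in the ideal window."""
    paragraphs = [p.strip() for p in entry.split("\n\n") if p.strip()]
    if not paragraphs:
        return None
    best = max(paragraphs, key=lambda p: p.lower().count(cue.lower()))
    if best.lower().count(cue.lower()) == 0:
        return None
    return best if MIN_LEN <= len(best) <= MAX_LEN else None

entry = ("Short intro paragraph.\n\n"
         "I had a dream last night about the wedding. In the dream the "
         "aquarium had a killer whale, and in the dream I was bummed out "
         "about foot surgery instead of a wedding. The dream ended with "
         "the cake melting, which felt about right.")
print(select_passage(entry, "dream"))
```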
3.6 Colloquial Filtering
Shamma et al. [24] began exploring the use of Csikszentmihalyi's Flow State [3] as a method of keeping the audience engaged through audiovisual interaction. In Buzz, for an audience to stay engaged, they must understand the content of the stories that they are hearing. That is, a story can't involve topics the audience is unfamiliar with or contain jargon particular to some field. The story must be colloquial. The story must also not be too familiar, as the audience could get bored.
To determine how colloquial a story is, we built a classifier that
makes use of page frequencies on the web. For each word in the
story, we look at the number of pages in which this word appears on
the web, a frequency that is obtained through a simple web search.
Applying Zipf’s Law [28], we can determine how colloquial each
word is [23]. A story is then classified to be as colloquial as the language used in it. Given a set of possible stories, colloquial thresholds (high and low) are generated dynamically based on the distribution of scores.
Figure 3: Results of a study in which participants judged how interesting the stories chosen and rejected by Buzz were.
3.7 Evaluation
Figure 4: An architecture diagram of the Buzz system.
To evaluate the effectiveness of our filters in finding compelling stories, we conducted a user study with twelve participants. Each participant was given five stories to score on a scale from one to ten (uninteresting to interesting). The stories were chosen at random from a set of stories selected by Buzz as good candidates for a performance and a set of stories retrieved by Buzz but removed because they did not pass one of the five filters.

On the one-to-ten scale, participants gave Buzz-selected stories an average score of 7.13 and Buzz-rejected stories an average of 4.3. A graph of the frequencies of participant scores across Buzz-accepted and Buzz-rejected stories can be seen in Figure 3.
4. CREATING A PERFORMANCE
While finding compelling stories is an important aspect of Buzz, conveying them to an audience in an engaging way is just as crucial. We found several aspects of the presentation to be critical. The performance must follow a dramatic arc that keeps the audience engaged. Text-to-speech technology and graphics must be believable and evocative. Gender-specific stories must be presented by virtual actors of the appropriate gender. While these issues are a subset of those critical to an engaging performance, we chose to address them directly, as we feel that our findings can generalize to other performance systems.

4.1 Director Level Control
Given the above classifiers and filters, we are able to retrieve a set of compelling stories. These filters and classifiers also give us a level of control over the performance similar to that of a director. Having information about each story, such as its "emotional point of view" and its "familiarity", we can plan the structure of the performance from a high level before retrieving the performance content, giving the performance a flow based not only on content, but on emotion, familiarity, on-point vs. tangential, etc. Given a topic, we can juxtapose stories with different emotional stances, different levels of familiarity, and on-point vs. off-point. These affordances give a meaningful structure to the performance.

To provide high-level control of the performance, we created an architecture for driving the retrieval of performance content. The structures, called Adaptive Retrieval Charts (ARCs), provide high-level instructions to the Buzz engine as to what is needed, where to find it, how to find it, how to evaluate it, how to modify queries if needed, and how to adapt the results to fit the current goal set. Figure 4 shows how the ARCs interact with the blog search and filters.

An example of an ARC used in Buzz is shown in Figure 5. The pictured ARC defines a point/counterpoint/dream interaction between agents. The three modules define three different information needs, as well as the sources for retrieval to fulfill those needs. The first module specifies that we want a blog entry that is on point to a specified topic, has passed through the syntax and colloquial filters, and is generally happy about the topic. The module specifies Google Blog Search [6] as a source. The source node specifies forming queries from single words as well as phrases related to the topic. If too few results are returned from this source, queries are to be continually modified by lexical expansion and stemming.

Figure 5: A sample ARC from the Buzz system, defining a point/counterpoint interaction between agents.

The extensible ARC framework allows for interaction from directors with no knowledge of the underlying system. In a future system, we will accomplish this via a range of possible interfaces, from storyboarding and affect manipulation to a natural language interface.

4.2 The Display
The current Buzz installations include five flat-panel monitors in the shape of an 'x'. The four outer monitors display actors represented by different adaptations of the graphics from Ken Perlin's Responsive Face technology [22]. These faces are synchronized with voice generation technology [16] controlled through the Microsoft Speech API, matching mouth positions on the faces to viseme events, the lip-position cues output by the MSAPI. Within this configuration, the actors are able to read stories and turn to face the actor currently speaking.

The central screen (shown in Figure 2) displays emotionally evocative words, pulled from the text currently being spoken, falling in constant motion. These words are extracted using the emotion classification technology described in the section on "Filtering Retrieval by Affect." The most emotional words are those with the largest disparity between positive and negative probabilities in the Naïve Bayes statistical model.

We've found this display to be a good addition to the actors, as it gives the audience more context in the performance and amplifies the impact of the emotional words.
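An ARC module of the kind described under Director Level Control might be represented roughly as below. This is a hypothetical schema: the field names and module contents are illustrative stand-ins, not the system's actual format.

```python
from dataclasses import dataclass, field

@dataclass
class ArcModule:
    """One node of a (hypothetical) point/counterpoint/dream ARC."""
    need: str                       # what kind of entry is required
    source: str                     # where to retrieve it from
    filters: list = field(default_factory=list)   # syntax, colloquial, ...
    affect: str = "neutral"         # desired emotional stance
    query_mods: list = field(default_factory=list)  # used when results run low

point_counterpoint_dream = [
    ArcModule(need="entry on point to the topic", source="Google Blog Search",
              filters=["syntax", "colloquial"], affect="happy",
              query_mods=["lexical expansion", "stemming"]),
    ArcModule(need="entry on point to the topic", source="Google Blog Search",
              filters=["syntax", "colloquial"], affect="angry"),
    ArcModule(need="a dream narrative", source="Google Blog Search",
              filters=["syntax", "colloquial", "dramatic situation"]),
]
print(len(point_counterpoint_dream))  # three modules: point, counterpoint, dream
```

The affect labels on the second and third modules are our guesses at how a counterpoint and a dream module would be configured, not values taken from the system.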
4.3 Compelling Speech
While text-to-speech systems have made great strides in improving the believability of generated speech, these systems are not perfect [1]. Their focus has been on telephony systems, where the length of spoken output is limited. In watching a Buzz performance, we found that the voices tended to drone monotonously during stories longer than one to two sentences. An additional problem we encountered using text-to-speech systems to read blogs was caused by the stream-of-consciousness nature of some blogs, which results in casual formatting with poor or limited punctuation. Text-to-speech systems rely on punctuation to provide natural pauses in the speech. In blogs where limited punctuation was present, we found that the voices tended to drone on even more.

In response to these issues, we created a model for speech emphasis. In recent work, others have created models for how to emphasize words [21] and which words to emphasize. While these models are successful, we strove to create a simple model that would scale to our needs. To select words to emphasize, we first used emotional word extraction, using the Naïve Bayes statistical model discussed in the section on "Filtering Retrieval by Affect" to find the words with the largest disparity between positive and negative probabilities.

As we were using the Microsoft Speech API to control the NeoSpeech voices, we were able to use the XML markup provided by the MSAPI to control the volume, rate, and pitch of the voices, as well as to insert pauses of different durations (specified in milliseconds) into the speech. Using emotional words for emphasis, we found the top two emotional words in each sentence. We emphasized these words by increasing the volume of the voice (from 70% to 100%) and slowing the rate (from an absolute rate of 0 to a rate of 2) while speaking them. While this method did break the monotony of the speech, it did not preserve the flow of the speech, resulting in choppy-sounding output. It also did not solve the more prevalent problem of the limited punctuation of blogs.

Table 3: Precision and Recall Scores for detection of gender-specific stories.

Document Type    Precision    Recall
female-specific  92.59%       86.21%
male-specific    100%         84.62%
gender-neutral   89.66%       96.30%
overall          91.67%       91.67%
To smooth the choppiness of this emphasis, we found that emphasizing the entire noun phrase where emotional words appeared
tended to sound smoother than just emphasizing the emotional word
itself. To accomplish this, we used a part of speech tagger [2], extracting all noun phrases from a passage. We chose to emphasize
the most emotional noun phrase in each sentence. To solve the
problem of limited punctuation, we chose to insert a pause following each emphasized noun phrase, serving as a natural breaking
point.
While our model of speech emphasis is simplistic, we’ve found it
to be effective in enhancing the Buzz experience. We expect to further tweak our emphasis model in response to audience feedback.
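As an illustration of the markup approach in this section, the sketch below wraps a chosen noun phrase in SAPI 5 XML volume and rate tags and appends a silence tag to serve as the pause. The attribute values and the helper function are ours, not the system's exact settings, and the noun phrase is assumed to have been chosen already by the part-of-speech step described above.

```python
def emphasize(sentence: str, noun_phrase: str,
              volume: int = 100, rate: int = -2, pause_ms: int = 300) -> str:
    """Wrap the chosen noun phrase in SAPI 5 volume/rate markup and
    insert a pause after it as a natural breaking point."""
    markup = (f'<volume level="{volume}"><rate absspeed="{rate}">'
              f'{noun_phrase}</rate></volume><silence msec="{pause_ms}"/>')
    # Emphasize only the first occurrence of the phrase.
    return sentence.replace(noun_phrase, markup, 1)

print(emphasize("I had a terrible fight with my dad", "a terrible fight"))
```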
4.4 Detecting Gender-Specific Stories
One problem encountered in a first pass of building Buzz was
that gender-specific stories were occasionally read by actors of the
incorrect gender. For example, if a blog author describes their experiences during pregnancy, it is awkward to have this story performed by a male actor. Conversely, if a blogger talks about their
day at work as a steward, having this read by a female could also
be slightly distracting.
As a solution to this problem, we sought to detect and classify
gender-specific stories. Unlike previous gender classification systems [9], it was not necessary for our system to classify all stories as
either male or female. Rather, it was only important for us to detect
stories where the author’s gender is evident, thus classifying stories
as male, female, neutral (in the case where gender-specificity is not
evident in the passage), or ambiguous (in the case where both male
and female indicators are present).
To do this, we look for specific indicators that the story is written
by a male or a female. These indicators include roles (family and
jobs), relationships, and physical states.
To detect self-referential roles in a blog, the system looks for
‘I’ references including “I am”, “I was”, “I’m”, “being”, and “as
a.” These phrases indicate gender-specificity if they are followed
within five words (if none of these five words are pronouns) by a
female-only or male-only role such as wife, mother, groom, aunt,
waitress, mailman, sister, etc. Such roles were collected from various
sources and enumerated by gender. Excluding pronouns
between the self-reference and the role eliminates false positives
such as “I was close to his girlfriend.”
To detect physical states that carry gender connotations, the system again looks for ‘I’ references, as above, followed within five
words by a gender-specific physical state such as pregnant. As in
detecting roles, we also ignore cases with extraneous pronouns between the ‘I’ reference and the physical state. This eliminates false
positives such as “I was amazed by her pregnancy.”
To detect male or female-only relationships, the system looks for
use of the word ‘my’ followed within five words by a male or female only relationship such as husband, ex-girlfriend, etc. Again,
cases with extraneous pronouns are ignored to eliminate false positives
such as “my feelings towards his girlfriend.” In this first
pass at a gender-specific story classification system, we assume
heterosexual relationships, an assumption we hope to relax
in a future version.
If any of these indicators are present and they agree on a male/female
classification, the story is classified accordingly. If they disagree,
it is classified as ‘ambiguous.’ If no indicators are present, it is classified as ‘neutral.’
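The rules above can be sketched as follows. The cue phrases and the five-word, pronoun-free window come from the description above; the word lists are small illustrative samples, not the system's full vocabularies:

```python
import re

PRONOUNS = {"he", "she", "his", "her", "him", "hers", "they", "their", "them"}
# Illustrative (not exhaustive) gendered roles and physical states:
FEMALE_SELF = {"wife", "mother", "mom", "bride", "aunt", "waitress", "sister", "pregnant"}
MALE_SELF = {"husband", "father", "dad", "groom", "uncle", "waiter", "mailman", "brother"}
# Under the first-pass heterosexual-relationship assumption:
FEMALE_BY_RELATION = {"husband", "boyfriend"}  # "my husband" -> female author
MALE_BY_RELATION = {"wife", "girlfriend"}      # "my wife"    -> male author

SELF_CUES = [("i", "am"), ("i", "was"), ("i'm",), ("being",), ("as", "a")]
RELATION_CUES = [("my",)]

def _hits(tokens, cues, vocab):
    """Find vocab words within five tokens after a cue, skipping matches
    preceded by an intervening pronoun ("I was close to his girlfriend")."""
    found = set()
    for i in range(len(tokens)):
        for cue in cues:
            if tuple(tokens[i:i + len(cue)]) != cue:
                continue
            for w in tokens[i + len(cue): i + len(cue) + 5]:
                if w in PRONOUNS:
                    break  # likely refers to someone other than the author
                if w in vocab:
                    found.add(w)
                    break
    return found

def classify_gender(text):
    """Classify a story as 'male', 'female', 'neutral', or 'ambiguous'."""
    tokens = re.findall(r"[a-z']+", text.lower())
    female = (_hits(tokens, SELF_CUES, FEMALE_SELF)
              | _hits(tokens, RELATION_CUES, FEMALE_BY_RELATION))
    male = (_hits(tokens, SELF_CUES, MALE_SELF)
            | _hits(tokens, RELATION_CUES, MALE_BY_RELATION))
    if female and male:
        return "ambiguous"
    return "female" if female else "male" if male else "neutral"
```

Note how the pronoun check handles the false-positive examples from the text: in "I was close to his girlfriend", the pronoun "his" is encountered before "girlfriend", so no indicator fires and the story stays neutral.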
This gender detection tool was evaluated using a corpus of 96
stories drawn from an indexed corpus of stories found by Buzz.
They were selected by queries for words that often indicate
gender-specificity (‘pregnant’, ‘mom’, ‘mother’, ‘dad’, ‘father’,
‘girlfriend’, ‘boyfriend’, ‘husband’, ‘wife’, and ‘daughter’). The
stories were sorted into three groups: written by females, written
by males, or neutral (where the author could be either). This sorting
was based on textual cues that gave a clear indication of gender
and was verified unanimously by 5 participants.
While our gender-classification system is still simple, it performs
well, as seen in the precision and recall scores in Table 3. Overall
precision and recall were both approximately 91.67%. Enabling
Buzz to detect and handle gender-specific stories has created a
more realistic performance, without the distraction of an actor
performing a gender-mismatched story.
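For reference, the per-class scores in such an evaluation reduce to standard precision/recall counts over the labeled corpus (the labels below are hypothetical examples, not the paper's data):

```python
def precision_recall(predicted, actual, label):
    """Precision and recall for one class over parallel label lists."""
    tp = sum(p == label and a == label for p, a in zip(predicted, actual))
    fp = sum(p == label and a != label for p, a in zip(predicted, actual))
    fn = sum(p != label and a == label for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```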
5. CONCLUSION
Initially, story discovery within Buzz was based on popular topics. As we approached the task of engaging the user, it became
more important that the stories themselves were compelling, as opposed to topical. Using filters and information retrieval strategies
that focused on finding the interesting and not the topical has resulted in an engaging theatrical installation. In the future, we will
turn our focus back to topics, discovered within the scope of interesting stories.
While finding compelling stories to present is a very important
part of the Buzz performance, presenting these stories in a meaningful
and engaging way is equally important. We found gender-specificity,
voice prosody, and presentation order to be the aspects of a Buzz
performance where improvement yielded the greatest gains. Future
work on the presentation of Buzz will include more realistic-looking
avatars and continued enhancement of the voice prosody.
6. BUZZ IN THE WORLD
Enabling Buzz with the ability to discover compelling stories on
a popular topic has produced great results. Buzz has changed from
an installation that was unbearably dull, exposing the boring nature
of many blogs, to a system that engages its viewers. The performance
is now driven not simply by the relevance of on-line content,
but by the blogger's emotional state. The highly emotional content
engages the audience and creates a high-visibility installation.
Buzz was exhibited last year at the Athenaeum Theater as a part
of the 8th Annual Chicago Improv Festival. It was well received
by actors, writers, producers, and theater-goers alike during this
ten-day installation.
Buzz was installed in the lobby of Chicago's Second City theater
at 1616 N. Wells St. in Chicago on August 24th, 2005 for a long-term
installation, which is still running. Buzz will also be exhibited
at Wired NextFest in New York City from September 29th to
October 1st, 2006.
7. FUTURE WORK
Our current and future work in this area involves expanding Buzz
into a full-length improvisational performance on stage, interacting
with human actors. We are building a full-body projected avatar
host with voice generation and voice recognition to take audience
suggestions and interact with human actors. Mindful of the current
state of voice recognition technology, we are enabling the host to
drive her conversations with actors and the audience, to recover
from mistakes, and to express and expose her shortcomings.
This production will make use of the ARC architecture to allow
high-level control of the flow of the performance. Our research in
story discovery will serve as a platform for character development
for the host, as she can relate to and participate in discussions by
telling stories discovered from blogs related to the current conversation
topic or audience suggestion.
8. ACKNOWLEDGMENTS
This material is based upon work supported by the National Science
Foundation under Grant No. 0535231.
9. REFERENCES
[1] A. Black. Perfect synthesis for all of the people all of the time. In IEEE TTS Workshop, Santa Monica, CA, 2002.
[2] E. Brill. Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics, 21:543–565, 1995.
[3] M. Csikszentmihalyi. Flow: The Psychology of Optimal Experience. Harper & Row, New York, NY, USA, 1990.
[4] N. Dehn. Story generation after TALE-SPIN. In Proceedings of the Seventh International Joint Conference on Artificial Intelligence, University of British Columbia, 1981.
[5] P. Ekman. Emotions Revealed: Recognizing Faces and Feelings to Improve Communication and Emotional Life. Henry Holt and Company, New York, NY, 2003.
[6] GoogleBlogSearch. http://blogsearch.google.com, 2006.
[7] M. Hansen and B. Rubin. Listening Post: Giving voice to online communication. In Proceedings of the 2002 International Conference on Auditory Display. ACM Press, 2002.
[8] S. Klein, J. Aeschlimann, D. Balsiger, S. Converse, C. Court, M. Foster, R. Lao, J. Oakely, and J. D. Smith. Automatic novel writing. Technical report, University of Wisconsin Madison, 1973.
[9] M. Koppel, S. Argamon, and A. R. Shimoni. Automatically categorizing written texts by author gender. Literary and Linguistic Computing, 2003.
[10] H. Liu, H. Lieberman, and T. Selker. A model of textual affect sensing using real-world knowledge. In Proceedings of the 8th International Conference on Intelligent User Interfaces, pages 125–132. ACM Press, 2003.
[11] LiveJournal. http://www.livejournal.com, 2004.
[12] M. Mateas. Expressive AI: A hybrid art and science practice. Leonardo: Journal of the International Society for Arts, Sciences, and Technology, 34(2):147–153, 2001.
[13] M. Mateas, S. Domike, and P. Vanouse. Terminal Time: An ideologically-biased history machine. In Proceedings of the 1999 AISB Symposium on Artificial Intelligence and Creative Language: Stories and Humor. ACM Press, 1999.
[14] J. R. Meehan. TALE-SPIN, an interactive program that writes stories. In Proceedings of the 5th IJCAI, pages 91–98, 1977.
[15] J. Murray. Hamlet on the Holodeck: The Future of Narrative in Cyberspace. The Free Press, 1997.
[16] NeoSpeech. http://www.neospeech.com/, 2005.
[17] A. Ortony, G. L. Clore, and M. A. Foss. The referential structure of the affective lexicon. Cognitive Science, 11:341–362, 1987.
[18] S. Owsley, D. A. Shamma, K. J. Hammond, S. Bradshaw, and S. Sood. The Association Engine: A free associative digital improviser. In Proceedings of the 12th International Conference on Multi-Media. ACM Press, 2004.
[19] S. Owsley, S. Sood, and K. Hammond. Domain specific affective classification of documents. In Proceedings of the AAAI Spring Symposium on Computational Approaches to Analysing Weblogs, March 2006.
[20] G. Polti and L. Ray. The Thirty-Six Dramatic Situations. Writer, Boston, 1940.
[21] A. Raux and A. Black. A unit selection approach to F0 modeling and its application to emphasis. In ASRU, St. Thomas, US Virgin Islands, 2003.
[22] Responsive Face Project, NYU Media Research Lab. http://www.mrl.nyu.edu/perlin/facedemo/, 2000.
[23] D. A. Shamma, S. Owsley, S. Bradshaw, and K. J. Hammond. Using web frequency within multi-media exhibitions. In Proceedings of the 12th International Conference on Multi-Media. ACM Press, 2004.
[24] D. A. Shamma, S. Owsley, K. J. Hammond, S. Bradshaw, and J. Budzik. Network Arts: Exposing cultural reality. In Alternate Track Papers & Posters of the 13th International Conference on World Wide Web, pages 41–47. ACM Press, 2004.
[25] S. Sood, S. Owsley, and K. Hammond. Reasoning through search: A novel approach to sentiment classification. Submitted to EMNLP, July 2006.
[26] Wikipedia. http://www.wikipedia.org, 2005.
[27] Yahoo Buzz Index. http://buzz.yahoo.com/, 2005.
[28] G. Zipf. Human Behavior and the Principle of Least-Effort. Addison-Wesley, Cambridge, MA, USA, 1949.