Buzz: Telling Compelling Stories

Sara H. Owsley, Kristian J. Hammond, David A. Shamma, Sanjay Sood
Intelligent Information Laboratory, Northwestern University
2133 Sheridan Road, Room 3-320, Evanston, Illinois 60208
+1 (847) 467-6924
{sowsley, hammond, ayman, sood}@cs.northwestern.edu

ABSTRACT
This paper describes a digital theater installation called Buzz. Buzz consists of virtual actors who express the collective voice generated by weblogs (blogs). These actors find compelling stories in blogs and perform them. In this paper, we explore what it means for a story to be compelling and describe a set of techniques for retrieving compelling stories. We also outline an architecture for high-level direction of a performance using Adaptive Retrieval Charts (ARCs), allowing a director's level of interaction with the performance system. Our overall goal in this work is to build a model of human behavior on a new foundation of query formation, information retrieval, and filtering.

Categories and Subject Descriptors
J.5 [Arts and Humanities]: Arts, fine and performing; H.3.3 [Information Search and Retrieval]: Information filtering

General Terms
Human Factors

Keywords
Network Arts, Emotion, Blogs, Media Arts, Culture, World Wide Web, Software Agents, Story Generation

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MM'06, October 23-27, 2006, Santa Barbara, California, USA.
Copyright 2006 ACM 1-59593-447-2/06/0010 ...$5.00.

Figure 1: An installation of Buzz in the Ford Engineering Design Center at Northwestern University.

1. INTRODUCTION
Buzz is a multimedia installation that exposes the buzz generated by blogs. Buzz finds the weblogs (blogs) that are compelling: those where someone is laying their feelings on the table, exposing a dream or a nightmare they had, making a confession or apology to a close friend, or regretting an argument with their mother or spouse. It embodies the author (blogger) in virtual actors who externalize these monologues by reading them aloud. The focal point of the installation displays the most emotional and evocative words from the monologue, shown as falling text.

As an example, Table 1 shows three stories read in a Buzz performance. The actors contribute to the performance by reading these discovered stories aloud, in turn. The actors are attentive to each other, turning to face the actor currently speaking. The central screen (shown up close in Figure 2) displays the emotionally evocative words extracted from the story currently being performed.

To find compelling stories, Buzz mines the blogosphere (the collection of all blogs as a community), collecting blogs where the author describes an emotionally compelling situation: a dream, a nightmare, a fight, an apology, a confession, etc. After retrieving these blogs, Buzz performs affective classification to focus on blogs with a heightened emotional state. Other techniques, including syntax filtering and colloquial filtering, are used to ensure the retrieval of content appropriate for the performance. After passing through these filters, the resulting story selections are compelling and emotional.

Several techniques are used to give Buzz a realistic feel and to make performances engaging to an audience. Dramatic ARCs are used to provide a higher level of control of the performance, similar to that of a director. The actors are attentive to one another, turning to face the actor currently speaking. Gender classification is used to ensure that gender-specific stories are performed by virtual actors of the appropriate gender. A model of speech emphasis is employed to enhance the cadence and prosody of text-to-speech technology.

The adoption of blogs by millions of users has resulted in much more than the mere presence of millions of online journals; it has created a new kind of communication [15]. We can expose and give voice to such communication in installations like Buzz. This work is part of a greater effort in an area called "Network Arts" [24], which uses information found in the world, via the network, to create artistic installations.

Table 1: Three stories discovered by Buzz.

I have a confession – beneath my cynical, sarcastic facade beats a heart of pure mush. Before you snort milk through your nose, think about it – despite Connie's best efforts, my favorite movie in the world is THE SOUND OF MUSIC. What's not to like? Great songs, fabulous scenery, incorrigible children, a charming nun/governess and a stern, handsome frozenhearted captain who slowly melts under the spell of the songs, the scenery, his kids and Julie Andrews. When I start that movie and the mountain scenery comes on the scene with the birds twittering and the first chords of music play ... I'm in heaven.

Ever sense I got into a fight with my dad I have started to drink beer. Friday night I stole 5 beers from my dad and saturday night i stole about 5 and last night I only stole 1. I feel as if alcohol is the only thing that can help me. I feel like its the only thing there for me. I dont know whats wrong with me. I dont know why I feel like this. I think another reason why I am so upset is because I never get to talk to Billy one on one. We are never alone. I am making him stay at my house this weekend. I need to spend some alone time with him. And if that means Saturday night then that means no Rachael.

Last night for instance, I dreamed that we were having the rehearsal dinner at an aquarium for some reason this aquarium had a killer whale and I was dumb enough to dip my feet in the tank. Well, it attacked, and in the dream I was clearly bummed out due to having a major foot surgery instead of a wedding. There was also a debacle with a scorpion that I won't go into. And also the cake melted.

Figure 2: A close-up view of the central screen of an installation of Buzz. The screen displays emotionally amplified words extracted from the blog currently being performed by one of the virtual actors.

2. RELATED WORK
Owsley, et al. [18], created an installation called the Association Engine, composed of a troupe of virtual improvisational actors. A troupe of five actors, with animated faces [22] and voice generation [16], began a performance by taking a single word or phrase suggestion from the audience through keyboard input. They used this word as a seed for an improvisational warm-up game called the Pattern Game, in which the actors free-associate to create a collective context, getting themselves on the same contextual page. Following this warm-up game, the actors would generate a One Word Story from the context of the warm-up. A One Word Story is a common game in improvisational theater where actors each contribute one word at a time to create a collective story. See Table 2 for a sample Pattern Game and generated One Word Story from the Association Engine.

Table 2: Discovered word chain and One Word Story from the Association Engine.

Pattern Game: music → fine art → art → creation → creative → inspiration → brainchild → product → production → magazine → newspaper → issue → exit → outlet → out

One Word Story: An artist named Colleen called her friend Alicia. Colleen wanted to go to the production at the music hall. Colleen and Alicia met up at the music hall. To their surprise there was no production at the music hall. Instead the women decided to go to the stage.

Using a template-based approach, the Association Engine was able to generate stories that were coherent but did not engage the audience, as can be seen from the sample One Word Story in Table 2. The stories lacked character development and a general purpose. Looking at the stories generated by the Association Engine, it is clear that the system faced problems that have persisted through years of Artificial Intelligence research in story generation.

TaleSpin [14] used a world simulation model and a planning approach for story generation. To generate stories, TaleSpin triggered one of its characters with a goal and used natural language generation to narrate the plan for reaching that goal. The stories were simplistic in their content (using a limited amount of encoded knowledge) as well as in their natural language generation. Klein's Automatic Novel Writer [8] used a similar approach to produce murder novels from a set of locations, characters, and motivating personality qualities. The stories follow a planning system's output as the characters search for clues. The system does not seem to capture the qualities of a good murder story, such as plot twists, foreshadowing, and erroneous clues. Dehn's Author system [4] was driven by the need for an "author view" in telling stories, as opposed to the "character view" found in world simulation models. Dehn's explanation was that "the author's purpose is to make up a good story," whereas the "character's purpose is to have things go well for himself and for those he cares about" [4].

In general, previous story generation systems faced a trade-off between a scalable system and one that can generate coherent stories. Besides Dehn's Author, previous research in this area employed a weak model of the aesthetic constraints of storytelling. In response to the shortcomings of story generation, we chose to explore story discovery.
We found an incredible corpus of existing stories of people's life experiences. These stories exist within a subset of the blogs [11, 6] found on the Internet. We then used what we learned from other systems' story generation to inform story discovery: we define a stronger model for the aesthetic elements of storytelling and use it to drive the retrieval of stories and to filter and evaluate the results.

Artistically, storytelling and online communication have been externalized in several installations. Among the more well-known, Listening Post [7] exposes content from thousands of chat rooms and other online forums through an audio and visual display. In a very real sense, Listening Post exposes the real-time 'buzz' of chat rooms on the web. Like Listening Post, Buzz externalizes online communication, though through a context refined via search and extraction. While Listening Post presents online communication as a whole, Buzz focuses on singular voices from the blogosphere, grounded in currently popular topics. Mateas's Terminal Time [13] also tells stories extracted from real-world sources. Storytelling in Terminal Time is produced by traversing a common-sense knowledge base, a verified information source, steering a narrative arc by audience applause [12]. Buzz also follows a dramatic arc, though at the level of the control of a director, through web-based, unverified information.

3. COMPELLING STORIES
A first pass at building Buzz revealed that the content of blogs is incredibly wide-ranging, but unfortunately often very dull. Buzz succeeded in finding stories that were on point for any provided topic, but the results were not compelling. We found that people blogged about topics including their class schedule, what they ate for lunch, how to install a wireless router, what they wore today, and a list of their 45 favorite ice cream flavors. While this was interesting to observe from a sociological point of view, it did not make for a compelling performance. Not only were the blogs on these topics boring, but the lengths of the stories varied widely, from one sentence to pages upon pages. We needed to give the system strategies for finding stories that were compelling and engaging to an audience. To do so, we define a simple model for the aesthetic qualities of a compelling story. These qualities include, but are not limited to:

1. an interesting topic
2. emotionally charged content
3. completeness, at a length that holds the audience's attention
4. content at the right level of familiarity for an audience
5. dramatic situations
6. developed characters

We designed Buzz to find stories with all of these qualities.

3.1 Topics Of Interest
A compelling story is generally about a compelling topic, one that interests the audience. For this reason, we chose the day's most popular searches from Yahoo (provided by Yahoo Buzz [27]) as topics. Search engines recently started providing logs of their most frequent query topics. This feed worked well as a seed for story discovery, as we were using the topics that people were searching for most and discovering people's thoughts and opinions on those topics. We found Wikipedia [26] to be another source of topics of interest, as the site maintains a list of "controversial topics": topics that are in "edit wars" on Wikipedia because contributors are unable to agree on the subject matter. The list includes topics such as apartheid, overpopulation, ozone depletion, and censorship. These topics, by their nature, are ones that people are passionate about.

Using these two sites as sources of topics, finding compelling stories began with a simple web search restricted to the domain of www.livejournal.com [11], a popular blog hosting site, with each focal topic as a search query. Of the first 100 results for each topic, about 60 tend to be actual blog entries rather than bloggers' profile pages (this differs greatly per topic). After discarding profile pages, the remaining blog entries are analyzed phrasally, eliminating posts that do not contain at least one of the two-word (non-stopword) phrases from the topic. For example, given the topic 'Star Wars: Revenge of the Sith,' entries that contained the phrase 'star wars' were acceptable, but not entries that merely contained the word 'star' or 'wars.' The remaining blog entries were known to be relevant to the current popular topic. After realizing the limitations of searching only for blogs from LiveJournal, we moved to finding blogs using Google Blog Search [6]. This move involved creating a generalized algorithm for finding the blog text in a blog entry of any format (previously, we knew the format of all blogs hosted on LiveJournal). We found these results to be more wide-ranging and varied in type. Using topics of interest as the source of topic keywords and blogs as the target, we were able to discover what was being said about the things people were most interested in.

3.2 Filtering Retrieval by Affect
Given that our initial version of Buzz was reading blogs that were boring, and since such a large volume of blogs exists on the web, we strove to filter the retrieved blog entries by affect, giving us the ability to portray the strongest affective stories. Beyond purely showing the most affective stories, we also wanted to be able to juxtapose happy stories on a topic with angry or fearful stories on the same topic. To build such a tool, we used a combination of case-based reasoning and machine learning approaches [19, 25]. We created a case base of 106,000 product reviews labeled with a star rating between one and five (one being negative and five being positive). We omitted reviews with a score of three, as those were seen as neutral. We built a Naïve Bayes statistical representation of these documents, separating them into positive (four or five stars) and negative (one or two stars). Given a target document, the system creates an "affect query" as a representation of the document. The query is created by selecting the words with the greatest statistical variance between positive and negative documents in the Naïve Bayes model. The system uses this query to retrieve "affectively similar" documents from the case base, and the labels of the retrieved documents are used to derive an affect score between -2 and 2 for the target document. This tool was found to be 73.39% accurate. For Buzz, blogs that scored between -1 and 1 were seen as neutral and not good candidates for a performance.

With the emotional filtering tool, Buzz was considerably more compelling. The actors were also able to retrieve stories from the web based on emotional stance, enabling the theatrical agents to juxtapose positive and negative stories on the same topic. Future work on this classification tool includes creating a model of affect based on Ekman's six-emotion model (happiness, sadness, anger, disgust, fear, surprise) [10, 17, 5]. This would allow for greater control of the flow of the performance through emotional states.

3.3 Dramatic Situations
Through experiencing Buzz in the world and watching audiences' reactions and responses to stories, we discovered more generalized traits of compelling stories. The most compelling stories to watch were those where someone is laying their feelings on the table: exposing a dream or a nightmare they had, making a confession or apology to a close friend, or regretting an argument with their mother or spouse. Codifying these qualities, we built our story discovery engine to seek out these types of stories.
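Cue-based discovery of this kind can be sketched as a simple phrase match over the opening of a blog entry. The cue list below is an illustrative subset of the cues used by Buzz, and the function itself is our sketch rather than the production implementation:

```python
# Sketch of cue-based story discovery: flag blog entries that open with
# a cue signaling an emotionally fraught situation (a dream, confession,
# fight, or apology). The cue list is an illustrative subset.
CUES = [
    "i had a dream last night",
    "i must confess",
    "i had a terrible fight",
    "i feel awful",
    "i'm so happy that",
    "i'm so sorry",
]

def begins_with_cue(entry: str, window: int = 200) -> bool:
    """Return True if a cue phrase appears near the start of the entry."""
    opening = entry.lower()[:window]
    return any(cue in opening for cue in CUES)
```

A matcher this simple trades precision for recall, which fits the setting: with the volume of blogs published every minute, discarding entries that phrase their situation differently is an acceptable loss.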
While still making use of the retrieval filters described above, we added a retrieval component that finds stories beginning with a cue that the writer is about to describe a dream, nightmare, fight, apology, confession, or other emotionally fraught situation. Such cues include phrases like "I had a dream last night," "I must confess," "I had a terrible fight," "I feel awful," "I'm so happy that," and "I'm so sorry." This realization was an important turning point in our system's ability to retrieve compelling stories. The newest instance of Buzz no longer focuses on popular or contentious topics, but instead on stories about different types of emotion-laden situations (dreams, fights, confessions, etc.). These stories are more interesting because the blogger isn't talking about a popular product on the market or ranting about a movie; they are relaying a personal experience from their life, which typically makes the story emotionally charged. The experiences they describe are often frightening, funny, touching, or surprising. They describe situations that have a common element in all of our lives [20], giving the audience a way to relate to the content and live through the experiences of the writer, whereas the topically based approach excluded the portion of the audience that was not familiar with the topic at hand (a popular actress, a story in the news, etc.). Including dramatic situations as a filter and search parameter not only leads to more interesting story topics and content; we also tend to see more character depth and development in the stories. As writers describe dramatic situations in their lives, more pieces of their personality, and of their personal issues with themselves and those around them, are revealed.

3.4 Filtering Retrieval by Syntax
In our first pass at retrieving stories from blogs, we noticed that we often found lists or surveys instead of text in paragraph form. For example, one blogger posted an exhaustive list of lip balm flavors. Others posted answers to a survey about themselves (their favorite vacation spot, favorite color, favorite band and actor, etc.). These are clearly not good candidates for stories to be presented in a performance. To solve this problem, we chose to filter the retrieved blog entries by syntax. Blog entries that met any of the following criteria were removed:

1. too many newline characters (more than six in an entry of four hundred characters)
2. too many commas (more than three in a sentence)
3. too many numbers (more than one number in a sentence)

This method successfully filtered out blog entries that contained a list or survey of some sort. While the precision of removing blogs based on syntax was low, we optimized for recall so that all potential lists and surveys were removed from the corpus. Given the large volume of blogs on the web, updated every minute, letting some potentially good blogs fall through the cracks sufficed for our purposes.

3.5 Complete Passages
Given the blog entries that remain after passing through the five filters described in this section (relevance, affect, dramatic situations, syntax, and colloquial filtering), the system must choose which pieces of those entries to present to the audience. This involves finding complete thoughts or stories of a length that can keep the audience engaged. For the most part, we found that blog authors format their entries such that each paragraph contains one distinct thought. Given this, the paragraph in which the dramatic situation is mentioned with the greatest frequency will suffice as a complete story for our system. If this paragraph is of an ideal length (between a minimum and maximum threshold, which we determined by viewing Buzz with stories of many different lengths), it is posted as a candidate story. For our system, we found that stories between 150 and 400 characters long were ideal.
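The passage-selection step just described can be sketched as follows; the helper below is our illustration (not the production code), splitting an entry into paragraphs, picking the one that mentions the dramatic cue most often, and keeping it only if its length falls in the 150-400 character window:

```python
def select_passage(entry: str, cue: str,
                   min_len: int = 150, max_len: int = 400):
    """Pick the paragraph that mentions `cue` most often, if its length
    falls inside the window that holds an audience's attention."""
    paragraphs = [p.strip() for p in entry.split("\n\n") if p.strip()]
    if not paragraphs:
        return None
    best = max(paragraphs, key=lambda p: p.lower().count(cue.lower()))
    if min_len <= len(best) <= max_len:
        return best
    return None  # too long or too short: let it fall through the cracks
```

Returning None rather than trimming an over-long paragraph matches the stance taken throughout this section: with so many blogs available, it is cheaper to discard a candidate than to repair it.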
Again, given the large volume of blogs on the web, letting many blogs fall through the cracks because they are too long or too short is fine for our purposes. Three examples of stories discovered by Buzz can be seen in Table 1. The stories shown were retrieved and passed through all of the filters mentioned above. Notice the differing emotional stances of the first and second stories: this is a deliberate, automatic juxtaposition of positive and negative passages.

3.6 Colloquial Filtering
Shamma, et al. [24], began exploring the use of Csikszentmihalyi's flow state [3] as a method of keeping the audience engaged through audiovisual interaction. In Buzz, for an audience to stay engaged, they must understand the content of the stories they are hearing. That is, a story can't involve topics the audience is unfamiliar with, or contain jargon particular to some field; the story must be colloquial. The story must also not be too familiar, as the audience could get bored. To determine how colloquial a story is, we built a classifier that makes use of page frequencies on the web. For each word in the story, we look at the number of pages on the web in which the word appears, a frequency obtained through a simple web search. Applying Zipf's Law [28], we can determine how colloquial each word is [23]. A story is then classified as being as colloquial as the language used in it. Given a set of possible stories, colloquial thresholds (high and low) are generated dynamically based on the distribution of scores.

Figure 3: Results of a study where participants judged how interesting stories chosen by and rejected by Buzz were.

Figure 4: An architecture diagram of the Buzz system.

3.7 Evaluation
To evaluate the effectiveness of our filters in finding compelling stories, we conducted a user study with twelve participants. Each participant was given five stories to score on a scale from one to ten (uninteresting to interesting).
The stories were chosen at random from a set of stories selected by Buzz as good candidates for a performance, and a set of stories retrieved by Buzz but removed because they did not pass one of the five filters. On the one-to-ten scale, the study participants scored Buzz-selected stories at an average of 7.13 and Buzz-rejected stories at an average of 4.3. A graph of the frequencies of participant scores across Buzz-accepted and Buzz-rejected stories can be seen in Figure 3.

4. CREATING A PERFORMANCE
While finding compelling stories is an important aspect of Buzz, conveying them to an audience in an engaging way is just as crucial. We found several aspects of the presentation to be critical. The performance must follow a dramatic arc that keeps the audience engaged. Text-to-speech technology and graphics must be believable and evocative. Gender-specific stories must be presented by virtual actors of the appropriate gender. While these issues are a subset of those critical to an engaging performance, we chose to address them directly, as we feel that our findings can generalize to other performance systems.

4.1 The Display
The current Buzz installations include five flat-panel monitors in the shape of an 'x'. The four outer monitors display actors represented by different adaptations of the graphics from Ken Perlin's Responsive Face technology [22]. These faces are synchronized with voice generation technology [16] controlled through the Microsoft Speech API (MSAPI), matching mouth positions on the faces to viseme events, the lip-position cues output by the MSAPI. Within this configuration, the actors are able to read stories and turn to face the actor currently speaking. The central screen (shown in Figure 2) displays emotionally evocative words, pulled from the text currently being spoken, falling in constant motion. These words are extracted using the emotion classification technology described in the section on "Filtering Retrieval by Affect": the most emotional words are those with the largest disparity between positive and negative probabilities in the Naïve Bayes statistical model. We've found this display to be a good addition to the actors, as it gives the audience more context in the performance and amplifies the impact of the emotional words.

4.2 Director Level Control
Given the above classifiers and filters, we are able to retrieve a set of compelling stories. These filters and classifiers also give us a level of control over the performance similar to that of a director. Having information about each story, such as its "emotional point of view" and its "familiarity," we can plan the structure of the performance from a high level before retrieving the performance content, giving the performance a flow based not only on content, but on emotion, familiarity, on-point vs. tangential material, and so on. Given a topic, we can juxtapose stories with different emotional stances, different levels of familiarity, and on-point vs. off-point content. These affordances give a meaningful structure to the performance.

To provide this high-level control, we created an architecture for driving the retrieval of performance content. The structures, called Adaptive Retrieval Charts (ARCs), provide high-level instructions to the Buzz engine as to what is needed, where to find it, how to find it, how to evaluate it, how to modify queries if needed, and how to adapt the results to fit the current goal set. To get an idea of how the ARCs interact with the blog search and filters, see Figure 4. An example of an ARC used in Buzz is shown in Figure 5. The pictured ARC defines a point/counterpoint/dream interaction between agents. The three modules define three different information needs, as well as the sources for retrieval to fulfill those needs. The first module specifies that we want a blog entry that is on point for a specified topic, has passed through the syntax and colloquial filters, and is generally happy on the topic. The module specifies Google Blog Search [6] as a source. The source node specifies forming queries from single words as well as phrases related to the topic. If too few results are returned from this source, we have specified that queries are to be continually modified by lexical expansion and stemming. The extensible ARC framework allows for interaction from directors with no knowledge of the underlying system. In a future system, we will support this via a range of possible interfaces, from storyboarding and affect manipulation to a natural language interface.

Figure 5: A sample ARC from the Buzz system, defining a point/counterpoint interaction between agents.

Table 3: Precision and recall scores for detection of gender-specific stories.

Document Type     Precision   Recall
female-specific   92.59%      86.21%
male-specific     100%        84.62%
gender-neutral    89.66%      96.30%
overall           91.67%      91.67%

4.3 Compelling Speech
While text-to-speech systems have made great strides in improving the believability of generated speech, these systems are not perfect [1]. Their focus has been on telephony systems, where the length of spoken speech is limited. In watching a Buzz performance, we found that the voices tended to drone monotonously during stories longer than one to two sentences. An additional problem we encountered in using text-to-speech systems to read blogs was caused by the stream-of-consciousness nature of some blogs, which results in casual formatting with poor or limited punctuation. Text-to-speech systems rely on punctuation to provide natural pauses in the speech. In blogs where limited punctuation was present, we found that the voices tended to drone on even more. In response to these issues, we created a model for speech emphasis.

In recent work, others have created models for how to emphasize words [21] and which words to emphasize. While these models are successful, we strove to create a simple model that would scale to our needs. To select words to emphasize, we first used emotional word extraction, using the Naïve Bayes statistical model discussed in the section on "Filtering Retrieval by Affect" to find the words with the largest disparity between positive and negative probabilities. As we were using the Microsoft Speech API to control the NeoSpeech voices, we were able to use the XML markup provided by the MSAPI to control the volume, rate, and pitch of the voices, as well as to insert pauses of different lengths (specified in milliseconds) into the speech. Using emotional words for emphasis, we found the top two emotional words in each sentence. We emphasized these words by increasing the volume of the voice (from 70% to 100%) and slowing the rate (from an absolute rate of 0 to a rate of 2) while speaking them. While this method did break the monotony of the speech, it did not preserve the flow of the speech, resulting in choppy-sounding output. It also did not solve the more prevalent problem of the limited punctuation of blogs.

To smooth this choppiness, we found that emphasizing the entire noun phrase in which an emotional word appeared tended to sound smoother than emphasizing the emotional word alone. To accomplish this, we used a part-of-speech tagger [2] to extract all noun phrases from a passage, and chose to emphasize the most emotional noun phrase in each sentence. To address the problem of limited punctuation, we insert a pause following each emphasized noun phrase, which serves as a natural breaking point. While our model of speech emphasis is simplistic, we've found it to be effective in enhancing the Buzz experience.
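As a sketch of this markup step: the tag names follow SAPI 5-style TTS XML (`volume`, `rate`, `silence`), but the wrapper function, its parameters, and the default rate value are our illustration; engines vary in how they interpret rate values, so treat the numbers as placeholders rather than the production settings.

```python
def emphasize(sentence: str, noun_phrase: str,
              volume: int = 100, rate: int = 2, pause_ms: int = 300) -> str:
    """Wrap the most emotional noun phrase in SAPI 5-style XML markup that
    raises volume and changes rate, then insert a silence after it to act
    as a natural breaking point in sparsely punctuated text."""
    markup = (f'<volume level="{volume}"><rate absspeed="{rate}">'
              f'{noun_phrase}</rate></volume><silence msec="{pause_ms}"/>')
    # Only the first occurrence is emphasized; one phrase per sentence.
    return sentence.replace(noun_phrase, markup, 1)
```

The inserted `<silence>` element is what compensates for missing punctuation: even when a blog entry runs on without commas or periods, each emphasized phrase is followed by an explicit pause.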
We expect to further tweak our emphasis model in response to audience feedback. 4.4 Detecting Gender-Specific Stories One problem encountered in a first pass of building Buzz was that gender-specific stories were occasionally read by actors of the incorrect gender. For example, if a blog author describes their experiences during pregnancy, it is awkward to have this story performed by a male actor. Conversely, if a blogger talks about their day at work as a steward, having this read by a female could also be slightly distracting. As a solution to this problem, we sought to detect and classify gender-specific stories. Unlike previous gender classification systems [9], it was not necessary for our system to classify all stories as either male or female. Rather, it was only important for us to detect stories where the author’s gender is evident, thus classifying stories as male, female, neutral (in the case where gender-specificity is not evident in the passage), or ambiguous (in the case where both male and female indicators are present). To do this, we look for specific indicators that the story is written 266 6. by a male or a female. These indicators include roles (family and jobs), relationships, and physical states. To detect self-referential roles in a blog, the system looks for ‘I’ references including “I am”, “I was”, “I’m”, “being”, and “as a.” These phrases indicate gender-specificity if they are followed within five words (if none of these five words are pronouns) by a female-only or male-only role such as wife, mother, groom, aunt, waitress, mailman, sister, etc. Such roles were collected from various sources and enumerated as such. Excluding extra pronouns between the self reference and the role eliminates false positives such as “I was close to his girlfriend.” To detect physical states that carry gender connotations, the system again looks for ‘I’ references, as above, followed within five words by a gender-specific physical state such as pregnant. 
As in detecting roles, we ignore cases with extraneous pronouns between the 'I' reference and the physical state, eliminating false positives such as "I was amazed by her pregnancy." To detect male-only or female-only relationships, the system looks for the word 'my' followed within five words by a male-only or female-only relationship term such as husband or ex-girlfriend. Again, cases with extraneous pronouns are ignored to eliminate false positives such as "my feelings towards his girlfriend." In this first pass at a gender-specific story classification system, we assume heterosexual relationships, an assumption we hope to relax in a future system. If any of these indicators are present and they agree on a male/female classification, the story is classified accordingly. If they disagree, it is classified as 'ambiguous.' If no indicators are present, it is classified as 'neutral.'

This gender detection tool was evaluated on a corpus of 96 stories retrieved by Buzz. The stories were drawn from an indexed corpus of stories found by Buzz, selected by queries for words that often indicate gender-specificity ('pregnant', 'mom', 'mother', 'dad', 'father', 'girlfriend', 'boyfriend', 'husband', 'wife', and 'daughter'). They were sorted into three groups: stories written by females, stories written by males, and neutral stories (which could have been written by either). This sorting was based on textual cues that gave a clear indication of gender, and was verified unanimously by 5 participants. While our gender classification system is still simple, it does an admirable job: as the precision and recall scores in Table 3 show, overall precision and recall were both approximately 91.67%. Enabling Buzz to detect and handle gender-specific stories has created a more realistic performance, without the distraction of an actor performing a gender-mismatched story.

5.
CONCLUSION

Initially, story discovery within Buzz was based on popular topics. As we approached the task of engaging the user, it became more important that the stories themselves be compelling, rather than merely topical. Using filters and information retrieval strategies focused on finding the interesting rather than the topical has resulted in an engaging theatrical installation. In the future, we will turn our focus back to topics, discovered within the scope of interesting stories. While finding compelling stories to present is a very important part of the Buzz performance, presenting these stories in a way that is meaningful and engaging is equally important. We found gender-specificity, voice prosody, and presentation order to be the aspects of a Buzz performance where we could make the greatest improvements. Future work on the presentation of Buzz will include more realistic-looking avatars and continued enhancement of voice prosody.

6. BUZZ IN THE WORLD

Enabling Buzz with the ability to discover compelling stories on a popular topic has produced great results. Buzz has changed from an installation that was unbearably dull, exposing the boring nature of many blogs, to a system that engages its viewers. The performance is now driven not simply by the relevance of on-line content, but by the blogger's emotional state. The highly emotional content engages the audience and creates a high-visibility installation. Buzz was exhibited last year at the Athenaeum Theater as part of the 8th Annual Chicago Improv Festival. It was well received by actors, writers, producers and theater-goers alike during this ten-day installation. Buzz was installed in the lobby of Chicago's Second City theater at 1616 N. Wells St. in Chicago on August 24th, 2005 for a long-term installation that is still running. Buzz will also be exhibited at Wired NextFest in New York City from September 29th to October 1st, 2006.

7. FUTURE WORK

Our current and future work in this area involves expanding Buzz into a full-length improvisational performance on stage, interacting with human actors. We are building a full-body projected avatar host with voice generation, as well as voice recognition to take audience suggestions and interact with human actors. Understanding the state of current voice recognition technology, we are enabling the host to drive her conversations with actors and the audience, to recover from mistakes, and to express and expose her shortcomings. This production will make use of the ARC architecture to allow high-level control of the flow of the performance. Our research in story discovery will serve as a platform for character development for the host, as she can relate to and participate in discussions by telling stories discovered from blogs related to the current conversation topic or audience suggestion.

8. ACKNOWLEDGMENTS

This material is based upon work supported by the National Science Foundation under Grant No. 0535231.

9. REFERENCES

[1] A. Black. Perfect synthesis for all of the people all of the time. In IEEE TTS Workshop, Santa Monica, CA, 2002.
[2] E. Brill. Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging. Computational Linguistics, 21:543–565, 1995.
[3] M. Csikszentmihalyi. Flow: The Psychology of Optimal Experience. Harper & Row, New York, NY, USA, 1990.
[4] N. Dehn. Story generation after TALE-SPIN. In Proceedings of the Seventh International Joint Conference on Artificial Intelligence, University of British Columbia, 1981.
[5] P. Ekman. Emotions Revealed: Recognizing Faces and Feelings to Improve Communication and Emotional Life. Henry Holt and Company, New York, NY, 2003.
[6] Google Blog Search. http://blogsearch.google.com, 2006.
[7] M. Hansen and B. Rubin. Listening Post: Giving voice to online communication. In Proceedings of the 2002 International Conference on Auditory Display. ACM Press, 2002.
[8] S. Klein, J. Aeschlimann, D. Balsiger, S. Converse, C. Court, M. Foster, R. Lao, J. Oakely, and J. D. Smith. Automatic novel writing. Technical report, University of Wisconsin-Madison, 1973.
[9] M. Koppel, S. Argamon, and A. R. Shimoni. Automatically categorizing written texts by author gender. Literary and Linguistic Computing, 2003.
[10] H. Liu, H. Lieberman, and T. Selker. A model of textual affect sensing using real-world knowledge. In Proceedings of the 8th International Conference on Intelligent User Interfaces, pages 125–132. ACM Press, 2003.
[11] LiveJournal. http://www.livejournal.com, 2004.
[12] M. Mateas. Expressive AI: A hybrid art and science practice. Leonardo: Journal of the International Society for Arts, Sciences, and Technology, 34(2):147–153, 2001.
[13] M. Mateas, S. Domike, and P. Vanouse. Terminal Time: An ideologically-biased history machine. In Proceedings of the 1999 AISB Symposium on Artificial Intelligence and Creative Language: Stories and Humor. ACM Press, 1999.
[14] J. R. Meehan. TALE-SPIN, an interactive program that writes stories. In Proceedings of the 5th IJCAI, pages 91–98, 1977.
[15] J. Murray. Hamlet on the Holodeck: The Future of Narrative in Cyberspace. The Free Press, 1997.
[16] NeoSpeech. http://www.neospeech.com/, 2005.
[17] A. Ortony, G. L. Clore, and M. A. Foss. The referential structure of the affective lexicon. Cognitive Science, 11:341–362, 1987.
[18] S. Owsley, D. A. Shamma, K. J. Hammond, S. Bradshaw, and S. Sood. The Association Engine: A free associative digital improviser. In Proceedings of the 12th International Conference on Multimedia. ACM Press, 2004.
[19] S. Owsley, S. Sood, and K. Hammond. Domain specific affective classification of documents. In Proceedings of the AAAI Spring Symposium on Computational Approaches to Analysing Weblogs, March 2006.
[20] G. Polti and L. Ray. The Thirty-Six Dramatic Situations.
Writer, Boston, 1940.
[21] A. Raux and A. Black. A unit selection approach to F0 modeling and its application to emphasis. In ASRU, St. Thomas, US Virgin Islands, 2003.
[22] Responsive Face Project, NYU Media Research Lab. http://www.mrl.nyu.edu/perlin/facedemo/, 2000.
[23] D. A. Shamma, S. Owsley, S. Bradshaw, and K. J. Hammond. Using web frequency within multimedia exhibitions. In Proceedings of the 12th International Conference on Multimedia. ACM Press, 2004.
[24] D. A. Shamma, S. Owsley, K. J. Hammond, S. Bradshaw, and J. Budzik. Network Arts: Exposing cultural reality. In Alternate Track Papers & Posters of the 13th International Conference on World Wide Web, pages 41–47. ACM Press, 2004.
[25] S. Sood, S. Owsley, and K. Hammond. Reasoning through search: A novel approach to sentiment classification. Submitted to EMNLP, July 2006.
[26] Wikipedia. http://www.wikipedia.org, 2005.
[27] Yahoo! Buzz Index. http://buzz.yahoo.com/, 2005.
[28] G. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, Cambridge, MA, USA, 1949.