Synthetic Interviews

This document describes the basic Synthetic Interview® technology; features of the rule-base-driven Intelligent Synthetic Interview®; communications with simulated objects and virtual worlds (multimode input to SI); SI subsystems for automatically improving the quality of Synthetic Interviews; and potential applications enabled by these technologies.

Basic Synthetic Interview® Process and Technology Description

Synthetic Interviews permit experts to scale and individuals to span time. A Synthetic Interview is a technology and technique that creates an anthropomorphic interface into multimedia data of a particular kind: video of a person responding to questions (interacting with another person). The responses of the interviewee are presented in such a way as to simulate the experience of interacting with a live person. Thus, Synthetic Interviews provide a means of conversing in depth with an individual or character, permitting users to ask questions in a conversational manner (just as they would if they were interviewing the figure face-to-face) and receive relevant, pertinent answers to the questions asked.

The process of creating a Synthetic Interview is split into four principal phases: Preproduction/Production (domain/biographical analysis, video pre-production and production); Language Analysis (indexing and the creation of language models relevant to the domain of discourse); Integration (of video, HTML, and other media with the SI index); and Testing.

Pre-production and production are similar to those of a traditional video project. Tasks include scripting, assembly of crew, casting, location selection, video format selection, special effects, interface design, and scheduling. The principal difference is the domain analysis and “pool” capture. In a Synthetic Interview, it is necessary to develop and anticipate the questions likely to be asked by the target audience. Still, it is impossible to predict every possible question.
It is important that the interface, script, and overall experience are designed to set user expectations. That is, if users believe they are interacting with an astronaut, they are unlikely to ask questions about cardiology. Equally important is the ability to deal with unexpected questions. We have developed a series of pool topics and associated questions to handle events such as out-of-bounds questions and statements, follow-on questions, exceptions, transitions, and transformations. Transitions include phrases like “I disagree with you.” Transformations change invalid statements to valid ones, such as “I don’t really know about that, but let me discuss something else of interest.” Follow-on statements are handled by specific transitions such as “That’s really all I have to say” or “As I was saying.” Out-of-bounds questions are recognized but not answered (e.g., admonishments for obscene questions). Exceptions handle unrecognized questions with responses such as “I don’t have anything to say” or “Please repeat yourself.”

For indexing and retrieval, the basic Synthetic Interview uses a context-free grammar similar to a Bayesian Optimal Classifier. Early Synthetic Interviews required significant effort. New automatic subsystems (discussed below) greatly reduce this effort.

Prior to indexing, we apply a combination of manual and automatic language expansion to the base set of interview questions. Manual techniques are used for semantic expansion and automatic techniques for syntactic expansion. For example, assume a base question/answer pair of

Q1: When were you born?
A1: I was born on April First, 1968, in Chicago, Illinois.

Manual semantic expansion would include generating a set of questions mapping to this answer, including

Q1a: How old are you?
Q1b: What’s your age?
Q1c: Where were you born?
Q1d: What’s your birthday?

Depending on the content of the full interview, one might even map “Where did you grow up?” to A1.
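The expansion step described above can be illustrated with a minimal Python sketch. It is not the system's actual indexer: the contraction and synonym tables, the function names, and the exact-match lookup are all assumptions made for illustration, and a real retriever would score partial matches rather than require an exact hit.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical expansion tables; a real system would use far larger ones.
CONTRACTIONS = {"what's": "what is", "where's": "where is"}
SYNONYMS = {"birthday": "date of birth"}


def normalize(question: str) -> str:
    """Lowercase, straighten apostrophes, drop end punctuation, collapse spaces."""
    text = question.lower().replace("\u2019", "'").strip(" ?!.")
    return " ".join(text.split())


def syntactic_variants(question: str) -> Set[str]:
    """Automatic expansion: spell out contractions, then swap in synonyms."""
    variants = {normalize(question)}
    for table in (CONTRACTIONS, SYNONYMS):
        for old, new in table.items():
            variants |= {v.replace(old, new) for v in variants}
    return variants


def build_index(semantic_set: Iterable[str], answer_id: str) -> Dict[str, str]:
    """Map every variant of every manually written question to one answer."""
    index: Dict[str, str] = {}
    for question in semantic_set:
        for variant in syntactic_variants(question):
            index[variant] = answer_id
    return index


def lookup(index: Dict[str, str], question: str) -> Optional[str]:
    """Exact-match stand-in for retrieval (an assumption of this sketch)."""
    return index.get(normalize(question))


# The manually written (semantic) question set for answer A1 from the text.
A1_QUESTIONS = [
    "When were you born?",
    "How old are you?",
    "What's your age?",
    "Where were you born?",
    "What's your birthday?",
]
index = build_index(A1_QUESTIONS, "A1")
print(lookup(index, "What is your date of birth?"))  # A1
```

In this sketch the manual step supplies the question list and the automatic step multiplies each entry, so a typed question that no one wrote down verbatim can still reach A1.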
Since listeners fill in much in natural conversation, A1 will typically be acceptable if no specific response is available, and it is likely better than responding with a pool answer such as “I don’t have an answer for that question.” Simple automatic syntactic expansion would include Q1d -> Q1d′: What is your birthday? (from grammatical expansion) and Q1d -> Q1d″: What’s [What is] your date of birth? (from grammatical and synonymic expansion).

The indexer/retrieval system takes any typed sentence and retrieves what it believes to be the most relevant response. As a consequence of this design, there are two principal error types to test for: 1) indexing and retrieval errors, wherein incorrect responses are presented for proper sentences; and 2) sentences or topics that were not covered during pre-production domain analysis.

Intelligent Synthetic Interview

To better model appropriate discourse and the persona's personality, a rule base mediates all interactions between the user and the basic Synthetic Interview. In practice, rule-base development and language analysis occur simultaneously and are interdependent.

We have developed personality attributes that can change over the course of the interaction with the user. Initially we identified four basic attributes: dissatisfaction, unhappiness, frustration, and skepticism. Other attributes can easily be added to create more complex behavioral reactions.

Further, a complete history of the interactions is kept. This permits the context of the conversation to be better understood and the experience to be dynamically tailored to the current user and situation. At any point in the dialog, the system tracks stages, topics, discourse specificity within topics, and “knowledge points.” Stages are used to keep track of where the user is in the course of the conversation (e.g., the first 5 minutes of a conversation vs. 10 minutes of discourse on one subject 60 minutes into a conversation).
Topics are clusters of questions, at varying levels of specificity, about a subject (e.g., the number of seats in the Space Shuttle, their shape, their upholstery, etc.). Questions within a topic are assigned specificity levels to differentiate levels of detail; more detailed questions have higher specificity levels and more detailed answers. “Knowledge points” are the most detailed responses (highest specificity level) in a topic, providing the user with the greatest information on a topic and guiding the user through the interactions.

Topics¹ are integrated with rules of behavior. The combination of topics and rules guides the iterative development of question-answer pairs. Behavioral changes drive the adaptation and addition of both topics and discourse within topics.

As important as scripting anticipated questions is the ability to deal with unexpected questions. Such events are managed by a series of pool topics and associated questions (e.g., I don't understand, I've already answered that, I'm leaving). The following are examples from each of the pool categories. The ‘Don't Understand Pool’ includes phrases like “I’m sorry, I don’t understand the question” or “I'm not sure.” A ‘Storm out answer’ would be “I've had enough. I’m leaving.” If the system recognizes that the same question has been asked twice in a row, the ‘I’ve already answered that’ pool will respond with an answer such as “I gave you an answer already” or “Didn't we just talk about that?”

Three classes of rules have been developed for the Synthetic Interview: administrative, state-changing, and state-effect rules. Administrative rules keep track of the current state as well as items such as which topic was active, a list of closed topics, etc. State-changing rules deal with modifying the emotional state of the customer based on the question asked and/or the history of questions. State-effect rules use the current customer state to determine the “correct” answer to the asked question.
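The three rule classes above can be sketched roughly as follows. This is a minimal Python illustration, not the project's rule base: the class and function names, the rule that switching topics closes the previous one, the answer-level names ("happy", "neutral", "annoyed"), and the numeric thresholds are all assumptions chosen to make the idea concrete.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CustomerState:
    # The four personality attributes named in the text.
    dissatisfaction: int = 0
    unhappiness: int = 0
    frustration: int = 0
    skepticism: int = 0
    # Bookkeeping maintained by the administrative rule.
    active_topic: Optional[str] = None
    closed_topics: List[str] = field(default_factory=list)
    history: List[str] = field(default_factory=list)


def administrative_rule(state: CustomerState, question: str, topic: str) -> bool:
    """Record the question and topic; return True if the question was just asked."""
    repeated = bool(state.history) and state.history[-1] == question
    state.history.append(question)
    # Assumption: moving to a different topic closes the previous one.
    previous = state.active_topic
    if previous is not None and previous != topic and previous not in state.closed_topics:
        state.closed_topics.append(previous)
    state.active_topic = topic
    return repeated


def state_changing_rule(state: CustomerState, deltas: Dict[str, int]) -> None:
    """Apply an event's attribute changes, keeping each attribute at or above zero."""
    for name, delta in deltas.items():
        setattr(state, name, max(0, getattr(state, name) + delta))


def state_effect_rule(state: CustomerState, answers: Dict[str, str], repeated: bool) -> str:
    """Choose the answer variant that fits the current state."""
    if repeated:
        return "I gave you an answer already."  # 'already answered' pool
    mood = (state.dissatisfaction + state.unhappiness
            + state.frustration + state.skepticism)
    if mood >= 6:  # hypothetical threshold
        return answers["annoyed"]
    if mood >= 3:  # hypothetical threshold
        return answers["neutral"]
    return answers["happy"]


state = CustomerState()
answers = {"happy": "Gladly!", "neutral": "Fine.", "annoyed": "If you insist."}
repeated = administrative_rule(state, "How many seats are there?", "seats")
state_changing_rule(state, {"frustration": 2, "skepticism": 1})
print(state_effect_rule(state, answers, repeated))
```

Keeping the three rule classes as separate functions mirrors the division in the text: bookkeeping, state updates, and answer selection can each be changed without touching the others.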
Finally, project members performed manual semantic and syntactic expansion (permutations). Two main techniques were used in developing permutations: 1) each team member was asked to provide five different forms of each question; and 2) all terms used in the questions were run through a thesaurus to increase vocabulary. Results from each process were integrated. After integration, the resulting question sets were reviewed and added to as necessary. MBUK training experts reviewed the questions for both coverage and idiomatic form. Alpha testing provided a second-order expansion of the question forms.

Five tables were created in the database: question, topic, stage, answer, and event. The question table contains the questions themselves, tracking information on the questions, and the code for the appropriate answer for each of the three response levels. The topic table contains tracking information for all of the topics. The stage table contains tracking information for all of the stages. The answer table contains all of the possible answers that can be given (pointers to the video clips), including pool responses. The event table contains all the possible ways the customer can change state.

¹ We were not exhaustive in our topic list. We felt the number of topics was sufficient to provide a reasonable test of the SI. More time will need to be spent on this stage of development to make a more complete list.

Multimode Input to Synthetic Interviews

Besides text input, Synthetic Interviews can now monitor users' interactions with other objects in the browser. This permits the SI character to respond to users' interactions with simulated objects. For example, a user could rotate a VR representation of the International Space Station. At each instant the SI would know what the user was looking at and could respond appropriately.
If the user manipulated a section, say moved into the living quarters, the SI would similarly understand what the user was doing. At any point the SI can answer questions, present relevant video, stills, and audio, or even manipulate the VR simulation to emphasize the spoken explanation. The combination of the Intelligent SI with multimode input will even permit multiple characters to appear simultaneously, respond to the same question, or even "talk" to one another.

Learning

The indexer/retrieval system takes any typed sentence and retrieves what it believes to be the most relevant response. Often the selected response will be good enough even if it is not a precise answer to the question. However, indexing and retrieval errors do occur when the system mistakenly returns an answer that is inappropriate. For example, if there is an answer to "Was driving the lunar rover like driving a car?" the system will likely return that answer to "What kind of car do you drive?" However, if we could know that the second question has no answer, we could return a comment like "I don't have an answer for that question."

New subsystems cache all user interactions for any combination of Synthetic Interviews and share data. This is particularly important because it permits us to "know what we don't know." For example, the two questions above are recognized as distinct, and the system will understand that there is no answer to "What kind of car do you drive?" Our current indices have knowledge of tens of thousands of questions. As more Synthetic Interviews are accessed by large numbers of users, this data will grow to millions of questions and provide an invaluable resource for future Synthetic Interview development.

Applications

Virtual Chat

Virtually hosted - A virtual talk show host asks questions of the celebrities and moves the discussion along. The host can even bring up film clips. ("We have a clip from your first cameo, let's look at it."
It plays and the host may ask for a comment or the talent may volunteer one.) The user (audience of one) may interject a question at any time. The host may respond (especially to inappropriate questions, freeing the talent from recording extensive pools of generic responses to manage obscenities) or the celebrity may respond. Any response may include other multimedia (video clips, music clips, pictures, scripts, links to other sites or pages).

Multiple Celebrities - The site may offer multiple celebrities at the same time, or users may ask to have several celebrities on at once. Each is actually a separate Synthetic Interview and could also have been played in isolation. The system manages the discussion and knows when multiple people have comments on the same question.

Virtual chat participants - The user sees a chat window in which other questions are continually appearing. These questions are generated by the system. As always, the user can ask questions whenever he or she wishes.

Real chat participants - From the user's perspective, this is the same as the virtual chat participant version, but the questions are submitted by other users in real time.

Real chat rooms - Real chat rooms devoted to particular celebrities, their Synthetic Interviews, or special events. Participants can present answers from celebrities to illustrate points. (Did you hear what she said about that nude scene? Listen.)

Build your own Interview - Record your ideal interview with your favorite celebrity and prove you're his or her biggest fan. This interview is then published on our site and played in a linear fashion. Premium (paid) services allow users to have their picture shown when the text is played. For an extra fee, the audio of the user asking the question can be played. And for more money we will host video of the user asking their question. The last two versions will need manual monitoring to ensure no obscene or offensive language is being used.
For the text-only version, this can be automated.

Create your own SI - Our host can ask questions that premium members answer and we host.

The virtual set - Sets include scrapbooks of pictures; jukeboxes of songs and music videos for musicians; movie clips, outtakes, scripts, and filmographies for actors; and training tips, game highlights, and equipment suggestions for athletes.

Special Requests - Users can request an autographed picture and have it printed on the spot.

Shopping - Either in response to user questions, prompts from the host, or available links on the virtual set, celebrities let users know where they buy their clothes, gadgets, cars, etc., and can take users to online shops. They can also describe what they do for leisure and link to travel agents, Ticketron, golf courses, etc.

Appendix A
Database Table Structure

DataAnswer
  sAID            non-unique identifier
  bNugget         answer is a "final answer"
  sKeywords       keywords for the answer (unused in this prototype)
  sTranscript     transcript of the answer video
  sVideo          pointer to the video file

DataEvent
  sEID            unique identifier
  sDescription    description of the event (used for debugging)
  iDeltaU         change in unhappiness index
  iDeltaD         change in dissatisfaction index
  iDeltaS         change in skepticism index
  iDeltaF         change in frustration index

DataGeneral
  sPKIndex        path to index files
  sClipExt        extension of the video clips
  sClipServerDir  absolute path to video clips

DataPool
  sPID            unique identifier
  sDescription    description of the pool
  sBackupAID      AID for backup answer
  sNeutralAID     AID for neutral answer
  sHappyAID       AID for happy answer

DataQuestion
  sQID            unique identifier
  sDescription    question description (single permutation)
  iSpecificity    specificity of the question
  sTID            TID for this question
  sEID            EID for the event associated with this question
  sBackupAID      AID for backup answer
  sNeutralAID     AID for neutral answer
  sHappyAID       AID for happy answer

DataRules
  iMaxTopics      maximum topics to be open at one time
  sMaxTopicsEID   EID for event if more than the maximum are open
  iPrecentCutoff  cutoff for "do not understand" response to a question

DataStage
  sSID            unique identifier
  sDescription    stage description
  iMinTopics      minimum number of topics to be covered in this stage
  sMinTopicsEID   EID for event if less than the minimum are opened
  iMaxTopics      maximum number of topics for this stage
  sMaxTopicsEID   EID for event if more than the maximum are open
  iOrder          order for this stage (unused in this prototype)
  sGoBackEID      EID for event if this stage is returned to

DataTopic
  sTID            unique identifier
  sDescription    description of this topic
  sSID            SID of stage associated with this topic
  bNugget         is this a stage-ending topic

Appendix B
Coding Manual

KEY CODES

A = Legitimate question asked, given an incorrect Response when there was a correct Response available.
B = Legitimate question asked, given an incorrect Response when there was NO correct Response available.
C = Legitimate question asked, given a "temporary answer" when there was a correct Response available.
D = Legitimate question asked, given a "temporary answer" when there was NO correct Response available.
E = Non-Legitimate (nonsense) question asked, given an incorrect Response when there was a correct Response available.
F = Non-Legitimate (nonsense) question asked, given an incorrect Response when there was NO correct Response available.
G = Non-Legitimate (nonsense) question asked, given a "temporary answer" when there was a correct Response available.
H = Non-Legitimate (nonsense) question asked, given a "temporary answer" when there was NO correct Response available.
I = Legitimate question asked, given a correct Response.
J = Legitimate question asked, no Response available, but a sufficient answer is found.
X = This code is added to a letter when the Response that was given was incorrect, but sufficient.

[Temporary Answer: Could you please rephrase that question? OR Let me remind you that we are here to talk about .... ]
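For testers who log results electronically, the key codes above can be applied mechanically. The Python sketch below is one possible reading of the manual, not part of it: the function name, the three response labels ("correct", "incorrect", "temporary"), and the precedence between J and X are assumptions. In particular, the manual does not say how J relates to a B response that is judged sufficient; this sketch assigns J in that case and appends X in the other incorrect-but-sufficient cases.

```python
from typing import Dict, Tuple

# (legitimate question?, kind of response given, correct response available?) -> code
KEY_CODES: Dict[Tuple[bool, str, bool], str] = {
    (True, "incorrect", True): "A",
    (True, "incorrect", False): "B",
    (True, "temporary", True): "C",
    (True, "temporary", False): "D",
    (False, "incorrect", True): "E",
    (False, "incorrect", False): "F",
    (False, "temporary", True): "G",
    (False, "temporary", False): "H",
}


def key_code(legitimate: bool, response: str, correct_available: bool,
             sufficient: bool = False) -> str:
    """Return the coding-manual letter for one logged interaction."""
    if response == "correct":
        if not legitimate:
            # The manual defines no code for this combination.
            raise ValueError("no code for a correct response to a nonsense question")
        return "I"
    # Assumed precedence: J takes the place of "BX".
    if legitimate and response == "incorrect" and not correct_available and sufficient:
        return "J"
    code = KEY_CODES[(legitimate, response, correct_available)]
    if response == "incorrect" and sufficient:
        code += "X"  # incorrect, but sufficient
    return code


print(key_code(True, "incorrect", True))
print(key_code(True, "incorrect", True, sufficient=True))
```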