Carolyn Penstein Rosé
Language Technologies Institute and Human-Computer Interaction Institute

New York Times Article
What strikes you about the agent’s style of speaking?

June 24, 2010
Computers Learn to Listen, and Some Talk Back
By STEVE LOHR and JOHN MARKOFF

“Hi, thanks for coming,” the medical assistant says, greeting a mother with her 5-year-old son. “Are you here for your child or yourself?”

The boy, the mother replies. He has diarrhea.

“Oh no, sorry to hear that,” she says, looking down at the boy.

The assistant asks the mother about other symptoms, including fever (“slight”) and abdominal pain (“He hasn’t been complaining”).

She turns again to the boy. “Has your tummy been hurting?” Yes, he replies.

After a few more questions, the assistant declares herself “not that concerned at this point.” She schedules an appointment with a doctor in a couple of days. The mother leads her son from the room, holding his hand. But he keeps looking back at the assistant, fascinated, as if reluctant to leave.

Maybe that is because the assistant is the disembodied likeness of a woman’s face on a computer screen - a no-frills avatar. Her words of sympathy are jerky, flat and mechanical. But she has the right stuff - the ability to understand speech, recognize pediatric conditions and reason according to simple rules - to make an initial diagnosis of a childhood ailment and its seriousness. And to win the trust of a little boy.

Not all so rosy…

Are we missing something?
- Sociolinguists and Discourse Analysts have been studying social aspects of language since the 1920s and 1930s!

Ask yourself this: Where do I sound like I’m from?
- Actually from California, but picked up some accent from my dad from New York...
- Did you notice the “a” in Carolyn? But not the back-open “r”.
- And if you heard me say “daughter…” But how often do I say that in class?
- Note that context is everything: the “a” in “sat” doesn’t have the same significance as the “a” in “Carolyn”.
- What information are we throwing away or ignoring that would allow us to distinguish meaningful variation from meaningless variation?

What will you get out of this class?
- Learn to read the primary literature in sociolinguistics, discourse analysis, and pragmatics
- Get a more intimate familiarity with the state of the art in language processing applied to analysis of social media, especially conversation and narrative
- Explore what insights these fields of linguistics can contribute to language technologies
- Explore what language technologies might be able to do to advance these fields of linguistics
- Get hands-on experience working on both

Please Introduce Yourself…
- What experience do you have with discourse analysis?
- What do you most want to get out of this class?

Discourse and Identity
- Identity is reflected in the way we present ourselves in conversational interactions
  - Reflects who we are, how we think, and where we belong
  - Also reflects how we think of our audience
- Examples
  - Regional dialect: shows my identification with where I am from, but also shows I am comfortable letting you identify me that way
  - Jargon and technical terms: shows my identification with a work community, but also shows I expect you to be able to relate to that part of my life
  - Level of formality: shows where we stand in relation to one another
  - Explicitness in reference: shows whether I am treating you like an insider or an outsider

Discourse and Identity
- Discourse is text above the clause level (Martin & Rose, 2007)
- A Discourse is an ongoing conversation [type]
  - Socialization is the process of joining a Discourse (Lave & Wenger, 1991; Sfard, 2010)
  - We join Discourses that match our core identity (de Fina, Schiffrin, & Bamberg, 2006)
  - In moving from the periphery to the core of a Discourse community, we sound more and more like the community (Arguello et al., 2006)
- A discourse is one instance of it [token]
  - All discourses contain echoes of previous discourses (Bakhtin, 1983)
- Also referenced on this slide: Lakoff & Johnson, 1980; Lave & Wenger, 1991

Metaphors Structure our Experience
- We describe arguments using terms related to war
  - Using a typical war ‘script’ to structure a story about an argument
- We orient towards arguments as though they were wars
  - Our conversational partner is our opponent
  - We may feel that we won or lost
  - We may feel wounded as a result

Discourses, Frames, and Metaphors
- Frame: a portion of a discourse belonging to a distinct Discourse
- Metaphor: one linguistic device that can be used to define a set of discourse practices that constitute a frame
- Topic models: a technical approach that makes sense for identifying frames within a discourse
- A discourse could be drawn from a mixture of Discourses
  - Within the same conversation, we may wear a variety of “hats”
  - E.g., the same discourse with a co-worker may contain some exchanges pertaining to our relationship as colleagues and others pertaining to our relationship as friends

http://video.google.com/videoplay?docid=-6547777336881961043&hl=en#

Discussion Questions
- What other stories/movies/genres does this remind you of?
- What is the message being communicated about Hummers?
- What is communicated about the company that makes them?
- What is communicated about the assumed audience?
- What are other messages? E.g., are any political statements being made?
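
Returning to the suggestion on the "Discourses, Frames, and Metaphors" slide that topic models are a reasonable way to identify frames within a discourse: the following is a minimal sketch of that idea, not a method taken from the course materials. It assumes each conversational turn is treated as a small document and each topic as a candidate frame, and it uses a toy collapsed Gibbs sampler for LDA. The function names (`fit_lda`, `top_words`), the hyperparameter values, and the example turns are all invented for illustration.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA.

    docs: list of token lists (here, one list per conversational turn).
    Returns (theta, topic_word): per-turn topic proportions and
    per-topic word counts. All names and defaults are illustrative.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    # Resample each token's topic given all other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed topic proportions for each turn.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    return theta, topic_word


def top_words(topic_word, topic, n=3):
    """Most frequent words currently assigned to one topic."""
    return [w for w, c in topic_word[topic].most_common(n) if c > 0]


if __name__ == "__main__":
    # Made-up turns from a conversation with a co-worker who is also a friend.
    turns = [
        "the report deadline moved so the meeting is friday".split(),
        "can you review the report before the meeting".split(),
        "want to grab a beer and watch the game this weekend".split(),
        "the game was great we should watch again next weekend".split(),
    ]
    theta, topic_word = fit_lda(turns, num_topics=2)
    for turn, proportions in zip(turns, theta):
        print([round(p, 2) for p in proportions], " ".join(turn))
    for k in range(2):
        print("candidate frame", k, top_words(topic_word, k))
```

With so little data the resulting topics are unstable, and a real experiment would need stop-word handling and a much larger corpus. The sketch only illustrates the mapping the slide proposes: a single discourse modeled as a mixture, with each turn assigned proportions over candidate frames.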
Semester Plan
- Unit 1: Theoretical Foundation
- Unit 2: Linguistic Structure
- Unit 3: Sentiment
- Unit 4: Identity and Personality
- Unit 5: Social Positioning
- In each Unit:
  - Readings from Discourse Analysis and Sociolinguistics
  - Readings from Language Technologies
  - Hands-on assignment
  - Implementation and corpus-based experiment
  - Competitive error analysis
  - Student presentations

Grading
People who make a good-faith effort always do well in my courses…
- 15% for each of 5 Unit assignments
  - The first one is a discourse analysis
  - The others are corpus-based experiments
    - We provide the corpus
    - You implement a feature extractor, test it, do an error analysis, and present your well-motivated idea and evaluation in class
- 10% for class participation
  - Doing readings (will be posted to the course Drupal)
  - Posting to the Drupal discussion by 10pm the night before class
  - Actively contributing to class discussions
- 15% for a final critique of a technical paper

Corpora for experimentation
- Unit 2: Maptask data (Negotiation coding)
  - Possibly other chat corpora with the same coding as well
- Unit 3: Product Reviews (Sentiment)
- Unit 4: Blog corpus (Age and Gender)
- Unit 5: AMI meeting corpus (Dialogue Acts)
- Other corpora
  - Email discussion list (Social Support coding)

SIDE: Workbench for Experimentation
http://www.cs.cmu.edu/~cprose/SIDE.html

SIDE: Two Options
- Create your own feature extractor plugins
  - We will provide documented abstract classes that you create specializations of
  - Programmed in Java
  - Elijah is the developer and can answer your questions
- Use SIDE’s feature creation functionality to create novel functions
- Grades will be based on:
  - The extent to which your features are theory-motivated or data-motivated
  - The depth of your error analysis

Setting up the course Drupal
- If you are not registered, please do so
- If you don’t have an Andrew account, make sure I have your email address
- We will manage the course through Drupal
  - All materials, including PDFs for required readings, will be posted to Drupal
  - Slides for all lectures will be posted to Drupal after class
  - Discussion threads in preparation for each lecture will be found on Drupal

Assignment 1 (not due until Jan 26)
- Transcribe a scene from a favorite movie, play, or TV show
  - As a shortcut, you can find a script online
  - The excerpt should be no more than one page of text
- Select one of the methodologies we are discussing in Unit 1 (e.g., from Gee, Martin & Rose, or Levinson)
- Do a qualitative analysis of the script and write it up
  - Use the readings from Unit 1 as a collection of models to choose from
- Due on Week 3 Lecture 2
  - Turn in the transcript, the raw analysis (can be annotations added to the transcript), and the write-up (your interpretation of the analysis)
  - Prepare a PowerPoint presentation for class (no more than 5 minutes of material)

For next time…
- You will receive login information for Drupal
  - http://kanagawa.lti.cs.cmu.edu/11719/
- Read excerpts from James Gee’s book (linked to the syllabus entry for Wednesday’s lecture)
- Post to Drupal (in response to the discussion question posted for Week 1 Lecture 2)

Carolyn Penstein Rosé
http://www.cs.cmu.edu/~cprose
cprose@cs.cmu.edu
Gates-Hillman Center 5415
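
For the corpus-based Unit assignments described under Grading (implement a feature extractor, test it, do an error analysis), the workflow might look roughly like the sketch below. It is only an illustration. SIDE plugins are written in Java against abstract classes provided in the course, and this Python does not use or imitate SIDE's API. The feature names, pronoun lists, and labels are hypothetical, loosely motivated by the earlier slides on how formality and audience are reflected in language.

```python
import re
from collections import Counter

# Hypothetical word lists; a real feature set should be theory- or data-motivated.
FIRST_PERSON = {"i", "me", "my", "we", "us", "our"}
SECOND_PERSON = {"you", "your", "yours"}


def tokenize(text):
    """Lowercase and split into alphabetic tokens, keeping straight apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def extract_features(text):
    """Map one text to a dictionary of named numeric features."""
    tokens = tokenize(text)
    n = max(len(tokens), 1)  # avoid division by zero on empty texts
    return {
        "first_person_rate": sum(t in FIRST_PERSON for t in tokens) / n,
        "second_person_rate": sum(t in SECOND_PERSON for t in tokens) / n,
        "contraction_rate": sum("'" in t for t in tokens) / n,
        "num_tokens": len(tokens),
    }


def error_analysis(texts, gold, predicted):
    """Return confusion counts and the misclassified examples.

    confusion maps (gold_label, predicted_label) to a count; errors lists
    (text, gold_label, predicted_label) for every disagreement so that
    each one can be inspected by hand.
    """
    confusion = Counter(zip(gold, predicted))
    errors = [(t, g, p) for t, g, p in zip(texts, gold, predicted) if g != p]
    return confusion, errors


if __name__ == "__main__":
    print(extract_features("I think you're right about our plan"))

    # Made-up labels standing in for a classifier's output on three texts.
    texts = ["great product", "not worth it", "works fine"]
    gold = ["pos", "neg", "pos"]
    predicted = ["pos", "pos", "pos"]
    confusion, errors = error_analysis(texts, gold, predicted)
    print(confusion)
    for text, g, p in errors:
        print(f"gold={g} predicted={p}: {text}")
```

Reading through the `errors` list by hand, and asking which linguistic cue the features missed, is the part of this workflow that the grading criteria emphasize.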