Displayed Bias as a Reflection of Both Speaker and Intended

Carolyn Penstein Rosé
Language Technologies Institute
and Human-Computer Interaction Institute
New York Times Article
What strikes you about the agent’s style of speaking?
June 24, 2010
Computers Learn to Listen, and Some Talk Back
By STEVE LOHR and JOHN MARKOFF
“Hi, thanks for coming,” the medical assistant says, greeting a mother with her 5-year-old
son. “Are you here for your child or yourself?”
The boy, the mother replies. He has diarrhea.
“Oh no, sorry to hear that,” she says, looking down at the boy.
The assistant asks the mother about other symptoms, including fever (“slight”) and
abdominal pain (“He hasn’t been complaining”).
She turns again to the boy. “Has your tummy been hurting?” Yes, he replies.
After a few more questions, the assistant declares herself “not that concerned at this
point.” She schedules an appointment with a doctor in a couple of days. The mother leads
her son from the room, holding his hand. But he keeps looking back at the assistant,
fascinated, as if reluctant to leave.
Maybe that is because the assistant is the disembodied likeness of a woman’s face on a
computer screen — a no-frills avatar. Her words of sympathy are jerky, flat and
mechanical. But she has the right stuff — the ability to understand speech, recognize
pediatric conditions and reason according to simple rules — to make an initial diagnosis
of a childhood ailment and its seriousness. And to win the trust of a little boy.
Not all so rosy…
Are we missing something?
Sociolinguists
and Discourse Analysts
have been studying
social aspects of language
since the 20s and 30s!!!
Ask yourself this:
Where do I sound
like I’m from?
Actually from California, but picked up some accent from my dad from New
York...
Did you notice the “a” in “Carolyn”? But not the back-open “r”. And if you heard me
say “daughter…” But how often do I say that in class?
Note that context is everything: the “a” in “sat” doesn’t have the same significance as the “a” in “Carolyn”.
What information
are we throwing away
or ignoring
that would allow us to
distinguish meaningful variation
from meaningless variation?
What will you get out of this class?
 Learn to read the primary literature in sociolinguistics,
discourse analysis, and pragmatics
 Get a more intimate familiarity with the state of the art in language processing applied to analysis of social
media, especially conversation and narrative
 Explore what insights these fields of linguistics can
contribute to language technologies
 Explore what language technologies might be able to
do to advance these fields of linguistics
 Get hands-on experience working on both
Please Introduce Yourself…
 What experience do you have with discourse analysis?
 What do you most want to get out of this class?
Discourse and Identity
 Identity is reflected in the way we present ourselves
in conversational interactions
 Reflects who we are, how we think, and where we
belong
 Also reflects how we think of our audience
 Examples
 Regional dialect: shows my identification with where
I am from, but also shows I am comfortable letting you
identify me that way
 Jargon and technical terms: shows my identification
with a work community, but also shows I expect you
to be able to relate to that part of my life
 Level of formality: shows where we stand in relation
to one another
 Explicitness in reference: shows whether I am
treating you like an insider or an outsider
Discourse and Identity
 Discourse is text above the clause
level (Martin & Rose, 2007)
 A Discourse is an ongoing
conversation [type]
 Socialization is the process of joining a
Discourse (Lave & Wenger, 1991; Sfard,
2010)
 We join Discourses that match our core
identity (de Fina, Schiffrin, & Bamberg,
2006)
 In moving from the periphery to the
core of a Discourse community, we
sound more and more like the
community (Arguello et al., 2006)
 A discourse is one instance of it
[token]
 All discourses contain echoes of
previous discourses (Bakhtin, 1983)
[Images: Lakoff & Johnson, 1980; Lave & Wenger, 1991]
Metaphors Structure our
Experience
 We describe arguments using
terms related to war
 Using a typical war ‘script’ to
structure a story about an
argument
 We orient towards arguments
as though they were wars
 Our conversational partner is
our opponent
 We may feel that we won or
lost
 We may feel wounded as a
result
Discourses, Frames, and
Metaphors
 Frame: A portion of a discourse belonging to
a distinct Discourse
 Metaphor: One linguistic device that can be
used to define a set of discourse practices that
constitute a frame
 Topic models: a technical approach that makes
sense for identifying frames within a discourse
 A discourse could be drawn from a mixture of
Discourses
 Within the same conversation, we may wear a
variety of “hats”
 E.g., the same discourse with a co-worker may
contain exchanges pertaining to our relationship
as colleagues and others to our relationship as
friends
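The idea that one discourse mixes several Discourses can be pictured with a very rough sketch. The code below is not a topic model: it only counts hand-picked cue words per frame and normalizes the counts into proportions, whereas a topic model would learn the word groupings from a corpus. The frame names, cue words, and the function `frame_mixture` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical cue words for two frames; invented for illustration only.
FRAME_CUES = {
    "colleague": {"deadline", "meeting", "report", "budget"},
    "friend": {"weekend", "dinner", "movie", "kids"},
}

def frame_mixture(turn):
    """Return the share of cue-word hits that fall in each frame for one turn."""
    tokens = re.findall(r"[a-z]+", turn.lower())
    counts = {
        frame: sum(token in cues for token in tokens)
        for frame, cues in FRAME_CUES.items()
    }
    total = sum(counts.values())
    if total == 0:
        # No cue words at all: report zero for every frame.
        return {frame: 0.0 for frame in FRAME_CUES}
    return {frame: count / total for frame, count in counts.items()}

if __name__ == "__main__":
    print(frame_mixture("The report is due before the meeting, then dinner?"))
```

A turn that mentions both work and social cue words comes out as a mixture, which is the "variety of hats" point in miniature.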
http://video.google.com/videoplay?docid=-6547777336881961043&hl=en#
Discussion Questions
 What other stories/movies/genres does this remind
you of?
 What is the message being communicated about
Hummers?
 What is communicated about the company that makes
them?
 What is communicated about the assumed audience?
 What are other messages?
 E.g., are any political statements being made?
Semester Plan
 Unit 1: Theoretical
Foundation
 Unit 2: Linguistic Structure
 Unit 3: Sentiment
 Unit 4: Identity and
Personality
 Unit 5: Social Positioning
 In each Unit:
 Readings from
Discourse Analysis and
Sociolinguistics
 Readings from
Language Technologies
 Hands-on assignment
 Implementation and
corpus-based experiment
 Competitive error analysis
 Student Presentations
Grading
People who make a good-faith effort always do well in my courses…
 15% for each of 5 Unit assignments
 First one is a discourse analysis
 Others are corpus-based experiments
 We provide the corpus
 You implement a feature extractor, test it, do an error analysis, and
present your well-motivated idea and evaluation in class
 10% for class participation
 Doing readings (will be posted to course Drupal)
 Posting to Drupal discussion by 10pm the night before class
 Actively contributing to class discussions
 15% for final critique of a technical paper
Corpora for experimentation
 Unit 2: Maptask data (Negotiation coding)
 Possibly other chat corpora with same coding as well
 Unit 3: Product Reviews (Sentiment)
 Unit 4: Blog corpus (Age and Gender)
 Unit 5: AMI meeting corpus (Dialogue Acts)
 Other corpora
 Email discussion list (Social Support coding)
SIDE: Workbench for
Experimentation
 http://www.cs.cmu.edu/~cprose/SIDE.html
Two Options
 Create your own feature extractor plugins
 We will provide documented abstract classes that you
create specializations of
 Programmed in Java
 Elijah is the developer and can answer your questions
 Use SIDE’s feature creation functionality to create
novel functions
 Grades will be based on:
 The extent to which your features are theory-motivated
or data-motivated
 The depth of your error analysis
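What a theory-motivated feature extractor and the start of an error analysis might look like can be sketched independently of SIDE's actual Java classes, which are not reproduced here. The sketch below is in Python rather than Java, and every name in it (`extract_features`, `error_breakdown`, the word lists) is a hypothetical chosen for illustration. The features are loosely motivated by the formality and explicitness-in-reference cues from the Discourse and Identity slide.

```python
import re
from collections import Counter

# Hypothetical word lists, invented for illustration; not part of SIDE.
FIRST_SECOND_PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your"}
DEICTIC_WORDS = {"this", "that", "these", "those", "it"}

def tokenize(text):
    """Lowercase the text and split it into word tokens, keeping contractions whole."""
    normalized = text.lower().replace("\u2019", "'")  # treat curly apostrophes as straight
    return re.findall(r"[a-z]+(?:'[a-z]+)?", normalized)

def extract_features(text):
    """Map a text to named numeric features, each a rate per token."""
    tokens = tokenize(text)
    n = max(len(tokens), 1)  # avoid dividing by zero on empty input
    return {
        # Personal pronouns: a rough cue for informality and involvement.
        "pronoun_rate": sum(t in FIRST_SECOND_PRONOUNS for t in tokens) / n,
        # Contractions: a rough cue for level of formality.
        "contraction_rate": sum("'" in t for t in tokens) / n,
        # Deictic words: a rough cue for insider-style, less explicit reference.
        "deictic_rate": sum(t in DEICTIC_WORDS for t in tokens) / n,
    }

def error_breakdown(gold, predicted):
    """Count (gold, predicted) label pairs for the misclassified examples only."""
    return Counter((g, p) for g, p in zip(gold, predicted) if g != p)

if __name__ == "__main__":
    print(extract_features("I can't find it"))
    print(error_breakdown(["formal", "casual", "formal"],
                          ["formal", "formal", "casual"]))
```

Grouping errors by label pair is only a starting point; the depth of an error analysis comes from reading the misclassified examples in each group and asking which linguistic cue the features missed.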
Setting up the course Drupal
 If you are not registered, please do so
 If you don’t have an Andrew account, make sure I have
your email address
 We will manage the course through Drupal
 All materials, including pdfs for required readings, will
be posted to Drupal
 Slides for all lectures will be posted to Drupal after class
 Discussion threads in preparation for each lecture will
be found on Drupal
Assignment 1 (not due until Jan 26)
 Transcribe a scene from a favorite movie, play, or TV show
 As a shortcut, you can find a script online
 Excerpt should be no more than one page of text
 Select one of the methodologies we are discussing in Unit 1
(e.g., from Gee, Martin & Rose, or Levinson)
 Do a qualitative analysis of the script and write it up
 Use readings from Unit 1 as a collection of models to choose
from
 Due on Week 3 lecture 2
 Turn in transcript, raw analysis (can be annotations added to
the transcript), and write up (your interpretation of the
analysis)
 Prepare a PowerPoint presentation for class (no more than 5
minutes of material)
For next time….
 You will receive login information for Drupal
 http://kanagawa.lti.cs.cmu.edu/11719/
 Read excerpts from James Gee’s book (linked to
syllabus entry for Wednesday’s lecture)
 Post to Drupal (in response to discussion question
posted for Week 1 Lecture 2)
Carolyn Penstein Rosé
http://www.cs.cmu.edu/~cprose
cprose@cs.cmu.edu
Gates-Hillman Center 5415