Models of
Discourse Analysis
Carolyn Penstein Rosé
Language Technologies Institute/
Human-Computer Interaction Institute
Read the posts and be
ready to discuss what you
see as the takeaways for
computationalization of
discourse analysis from
today’s readings…
What are the computational
implications of the debate
between DA and CA?
Note: These preparatory
activities were rated as
least beneficial to
students, so…
We will start the
lecture/discussion at
exactly 12:05pm.
Please be on time and
ready to discuss!
Early Course Evaluation
Good news: everyone rated
the lectures/discussions as
valuable and engaging
Things to improve:
Decrease preparation time
Change: only one discussion thread
per week, but continue to use it
throughout the week, and include response
options that require different
amounts of time
Change: front-load readings for
Monday, and further divide readings into
required, extra, and supplemental
More focus on fewer concepts for the
remainder of the semester
This week:
Sections 7.2-7.7 are most important
* Not required!!!
Chicken and Egg…
Main issue for this week:
Exploring sequencing and
linking between speech acts in …
* Where do the ordering constraints come from? Is it the language? Or is it what is behind the language
(e.g., intentions, task structure)? If the latter, how do we computationalize that?
Reminder from last time RE:
Constraint from Ordering
Inform is the most common class
Next most frequent is Assess (18.5%)
With bigrams, if we look for conditional
probabilities above 25%:
The only case where the most likely
next class is not Inform is ElicitAssessment, which is followed by
Assess 36% of the time
and by Inform 33% of the time
ElicitAssessment itself only occurs about 1% of the time
Trigrams might be better, but this
makes ordering information look pretty …
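A minimal Python sketch of how bigram conditional probabilities like these could be estimated, assuming each dialogue is available as a list of dialogue-act label strings. The function names (next_act_probs, strong_transitions, most_likely_next) and the toy dialogues are hypothetical; the label names follow the slide, but the sequences are invented and do not reproduce the percentages above. Setting order=3 gives the trigram version.

```python
from collections import Counter, defaultdict


def next_act_probs(dialogues, order=2):
    """Maximum-likelihood estimate of P(next act | previous order-1 acts).

    dialogues: list of dialogues, each a list of dialogue-act labels.
    order: 2 for bigrams, 3 for trigrams.
    Returns {context_tuple: {next_label: probability}}.
    """
    counts = defaultdict(Counter)
    for acts in dialogues:
        for i in range(order - 1, len(acts)):
            context = tuple(acts[i - order + 1:i])
            counts[context][acts[i]] += 1
    return {
        context: {label: n / sum(nexts.values()) for label, n in nexts.items()}
        for context, nexts in counts.items()
    }


def strong_transitions(probs, threshold=0.25):
    """Keep only transitions whose conditional probability exceeds the threshold."""
    kept = {}
    for context, nexts in probs.items():
        above = {label: p for label, p in nexts.items() if p > threshold}
        if above:
            kept[context] = above
    return kept


def most_likely_next(probs):
    """Map each context to its single most probable next act."""
    return {context: max(nexts, key=nexts.get) for context, nexts in probs.items()}


# Hypothetical toy dialogues: label names follow the slide, sequences are invented.
toy = [
    ["Inform", "Inform", "Inform", "Assess", "Inform", "ElicitAssessment", "Assess", "Inform"],
    ["Inform", "ElicitAssessment", "Assess", "Inform", "Inform"],
]

bigrams = next_act_probs(toy, order=2)
trigrams = next_act_probs(toy, order=3)
print(strong_transitions(bigrams))
print(most_likely_next(bigrams))
```

Maximum-likelihood estimates from small counts are noisy, and a rare class such as ElicitAssessment (about 1% of acts) contributes few contexts, so some form of smoothing or backoff would likely be needed before trusting trigram estimates.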
More on what was least valuable
(student quotes)
• Nice job on the homeworks!!!
• I saw SO much improvement
over the several posts and finally
the assignment.
The forum prompt
seem unbalanced in
proportion to the
homework - by the
time the "real"
homework came
along, I felt I had done
ten times more work
on my posts already.
The Homework
Assignment 2 (not due until Feb 23)
Look at the Maptask dataset and Negotiation
coding that is provided
Think about what distinguishes the codes at a
linguistic level
Do an error analysis on the dataset using a simple
unigram baseline, and from that propose one or a
few new types of features motivated by your
linguistic understanding of the Negotiation
Due on Week 7 lecture 2
Turn in your data, your feature extractors (documented code),
and a formal write-up of your experimentation
Have a 5-minute PowerPoint presentation ready for
class on Week 7, lecture 2
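One way the unigram baseline and error analysis might look, sketched under several assumptions: the assignment does not name a classifier, so a from-scratch multinomial Naive Bayes with add-one smoothing stands in here; K1, K2, and A2 are placeholder labels in the style of Negotiation moves; and the Map Task-like utterances are invented, not taken from the provided dataset. UnigramNB, unigrams, and error_analysis are hypothetical names.

```python
import math
import re
from collections import Counter, defaultdict


def unigrams(text):
    """Lowercased word tokens: the simple unigram feature set."""
    return re.findall(r"[a-z']+", text.lower())


class UnigramNB:
    """Multinomial Naive Bayes over unigram counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(unigrams(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(count / n_docs)  # log prior
            for w in unigrams(text):
                if w not in self.vocab:  # ignore words never seen in training
                    continue
                likelihood = (self.word_counts[label][w] + 1) / (total + len(self.vocab))
                score += math.log(likelihood)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def error_analysis(model, texts, gold_labels):
    """Return (confusion counts keyed by (gold, predicted), list of misclassified items)."""
    confusion = Counter()
    errors = []
    for text, gold in zip(texts, gold_labels):
        pred = model.predict(text)
        confusion[(gold, pred)] += 1
        if pred != gold:
            errors.append((text, gold, pred))
    return confusion, errors


# Hypothetical toy data: invented utterances with placeholder Negotiation-style labels.
train_texts = [
    "where is the lighthouse", "the lighthouse is above the lake",
    "go left around the lake", "do you have a lake",
    "i have a lake", "go down to the bridge",
]
train_labels = ["K2", "K1", "A2", "K2", "K1", "A2"]
test_texts = ["go to the bridge", "where is the lake", "is there a lake"]
test_labels = ["A2", "K2", "K2"]

model = UnigramNB().fit(train_texts, train_labels)
confusion, errors = error_analysis(model, test_texts, test_labels)
for text, gold, pred in errors:
    print(f"{text!r}: gold={gold}, predicted={pred}")
```

In this toy run the question "is there a lake" comes out as K1, because its content words also appear in statements; an error of that kind is what might motivate a linguistically grounded feature, such as one capturing word order or question form.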
Interesting Observation!
Responses can address either illocutions or perlocutions
Perlocutions are much less constrained
This accounts for some of the difficulty in imposing ordering
It argues in favor of thinking about conversation as
organized around intentions and tasks rather than
linguistic categories
Wednesday’s readings will argue just the opposite!!
Are illocutions just the wrong categories??
Discourse Analysis vs Conversation Analysis
(according to Levinson)
Discourse Analysis:
• Rules, formulas, more typical of linguistics and …
• Categories, contingencies, grammars
• Use of a small but strategic amount of data
• Accused of “premature” theory construction
• Martin & Rose, Levinson
Conversation Analysis:
• More rigorously empirical and inductive
• Focus on what is found in data, not on what is expected to be found or would sound odd
• Hesitant to make generalizations; accused of being atheoretical
• Questions about whether the rules “work” on real …
* Is it a question about the nature of language (is there a fundamental segmentation
difference between utterances and acts?), or is it a question about research
methodology? Are these linked?
The nature of what we are modeling
What we can know about it, and how certain we can be
How we learn what we know
Rules, like speech …
Anthropology style
… And now for
Elijah’s SIDE