
slp21-handout

Computational Discourse
Speech and Language Processing
Chapter 21
http://clg.wlv.ac.uk/demos/similarity/index.html
Terminology
• Discourse: anything longer than a single utterance or
sentence
– Monologue
– Dialogue:
• May be multi-party
• May be human-machine
Is this text coherent?
“Consider, for example, the difference between
passages (18.71) and (18.72). Almost certainly not.
The reason is that these utterances, when juxtaposed,
will not exhibit coherence. Do you have a discourse?
Assume that you have collected an arbitrary set of
well-formed and independently interpretable
utterances, for instance, by randomly selecting one
sentence from each of the previous chapters of this
book.”
Or, this?
“Assume that you have collected an arbitrary set of
well-formed and independently interpretable
utterances, for instance, by randomly selecting one
sentence from each of the previous chapters of this
book. Do you have a discourse? Almost certainly not.
The reason is that these utterances, when juxtaposed,
will not exhibit coherence. Consider, for example, the
difference between passages (18.71) and (18.72).”
What makes a text coherent?
• Discourse structure
– In a coherent text the parts of the discourse exhibit a
sensible ordering and hierarchical relationship
• Rhetorical structure
– The elements in a coherent text are related via
meaningful relations (“coherence relations”)
• Entity structure (“Focus”)
– A coherent text is about some entity or entities, and the
entity/entities is/are referred to in a structured way
throughout the text.
Outline
• Discourse Structure
– TextTiling (unsupervised)
– Supervised approaches
• Coherence
– Hobbs coherence relations
– Rhetorical Structure Theory
– Entity Structure
• Pronouns and Reference Resolution
Conventions of Discourse Structure
• Differ for different genres
– Academic articles:
• Abstract, Introduction, Methodology, Results,
Conclusion
– Newspaper stories:
• Inverted Pyramid structure:
– Lead followed by expansion, least important last
– Textbook chapters
– News broadcasts
– NB: We can take advantage of this to ‘parse’
discourse structures
Discourse Segmentation
• Simpler task: Separating document into linear
sequence of subtopics
• Applications
– Information retrieval
• Automatically segmenting a TV news broadcast or a
long news story into sequence of stories
• Audio browsing (e.g. of voicemail)
– Text summarization
– Information extraction
• Extract information from a coherent segment or topic
– Question Answering
Unsupervised Segmentation
• Hearst (1997): 21-paragraph science news article on
“Stargazers”
• Goal: produce the following subtopic segments:
Intuition: Cohesion
• Halliday and Hasan (1976): “The use of certain linguistic
devices to link or tie together textual units”
• Lexical cohesion:
– Indicated by relations between words in the two units (identical
word, synonym, hypernym)
• Before winter I built a chimney, and shingled the sides of my house.
I thus have a tight shingled and plastered house.
• Peel, core and slice the pears and the apples. Add the fruit to the
skillet.
Intuition: Cohesion
• Non-lexical: anaphora
– The Woodhouses were first in consequence there. All
looked up to them.
• Cohesion chain:
– Peel, core and slice the pears and the apples. Add the
fruit to the skillet. When they are soft…
• Note: cohesion is not coherence
Cohesion-Based Segmentation
• Sentences or paragraphs in a subtopic are cohesive
with each other
• But not with paragraphs in a neighboring subtopic
• So, if we measured the cohesion between every
neighboring sentences
– We might expect a ‘dip’ in cohesion at subtopic
boundaries.
TextTiling (Hearst ’97)
1. Tokenization
– Each space-delimited word
– Converted to lower case
– Throw out stop list words
– Stem the rest
– Group into pseudo-sentences (windows) of length w=20
2. Lexical Score Determination: cohesion score
Three part score including
• Average similarity (cosine measure) between gaps
• Introduction of new terms
• Lexical chains
3. Boundary Identification
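A minimal sketch of the pipeline above, assuming a toy stop list and skipping stemming: pseudo-sentence blocks are compared with a cosine score at each gap (only the first of the three score components), and boundaries are placed at sufficiently deep dips. Parameter values are illustrative, not Hearst's.

# Minimal TextTiling-style segmenter (sketch): block-comparison cosine
# scores at each pseudo-sentence gap, boundaries at sufficiently deep dips.
# Omits stemming and the new-terms / lexical-chain score components.
import re
from collections import Counter
from math import sqrt

STOP = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that"}  # toy stop list

def pseudo_sentences(text, w=20):
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP]
    return [Counter(tokens[i:i + w]) for i in range(0, len(tokens), w)]

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def gap_scores(blocks, k=6):
    """Cohesion score across each gap, comparing k blocks on either side."""
    scores = []
    for gap in range(1, len(blocks)):
        left = sum(blocks[max(0, gap - k):gap], Counter())
        right = sum(blocks[gap:gap + k], Counter())
        scores.append(cosine(left, right))
    return scores

def boundaries(scores, depth_cutoff=0.1):
    """A gap is a subtopic boundary if its 'dip' below neighboring peaks is deep enough."""
    bounds = []
    for i, s in enumerate(scores):
        depth = (max(scores[:i + 1]) - s) + (max(scores[i:]) - s)
        if depth > depth_cutoff:
            bounds.append(i)
    return bounds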
TextTiling Method
Cosine Similarity
Vector Space Model
(recall distributional semantics, following IR)
• In the vector space model for TextTiling, both
segments are represented as vectors
• Numbers are derived from the words that occur in the
collection
– Entries could be binary indicators, raw frequencies, or
weighted counts (e.g. tf-idf)
• But that favors long documents over shorter ones
– Cosine to normalize dot product by vector lengths
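As a worked example, here is the cosine computed over raw term counts for the two sentences from the cohesion slide; without stop-word removal or stemming the only overlap is “the”, which is why TextTiling normalizes the tokens first.

# Cosine similarity of two term-count vectors: dot product normalized by
# vector lengths, so segments of different lengths become comparable.
from collections import Counter
from math import sqrt

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

seg1 = Counter("peel core and slice the pears and the apples".split())
seg2 = Counter("add the fruit to the skillet".split())
print(cosine(seg1, seg2))  # ~0.39, and only because both contain "the":
                           # surface overlap misses the pears/apples -> fruit tie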
Supervised Segmentation
• Lexical features as before
• Discourse markers or cue words
– Broadcast news
• Good evening, I’m <PERSON>
• …coming up….
– Science articles
• “First,….”
• “The next topic….”
• Supervised machine learning
– Label segment boundaries in training and test set
• Easy to get automatically for very clear boundaries
(paragraphs, news stories)
– Extract features in training
– Learn a (sequence) classifier
– In testing, apply features to predict boundaries
• Evaluation: WindowDiff (Pevzner and Hearst 2002)
assigns partial credit
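A sketch of a WindowDiff-style score, assuming segmentations are given as 0/1 boundary indicator lists; near-miss boundaries are penalized less than complete misses, which is the sense in which the metric assigns partial credit. The published metric's exact window convention differs in small details.

# WindowDiff sketch: slide a window of size k over the reference and
# hypothesis boundary sequences and count windows whose boundary counts
# disagree; lower is better, and an off-by-one boundary costs less than
# a missed or spurious one.
def window_diff(reference, hypothesis, k):
    assert len(reference) == len(hypothesis)
    n = len(reference)
    errors = sum(sum(reference[i:i + k]) != sum(hypothesis[i:i + k])
                 for i in range(n - k))
    return errors / (n - k)

ref = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]   # gold boundaries after positions 2 and 6
hyp = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]   # first boundary off by one
print(window_diff(ref, hyp, k=3))      # 2/7: partial, not full, penalty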
Local Work
• Representation and application of hierarchical discourse segmentation
– Mihai Rotaru, Computer Science, PhD 2008: Applications of Discourse
Structure for Spoken Dialogue Systems (Litman)
• Measuring and applying lexical cohesion (Ward & Litman, 2008)
• Cue phrase disambiguation (Hirschberg & Litman, 1993; Litman, 1996)
• Supervised linear discourse segmentation (Passonneau & Litman, 1997)
• Swapna Somasundaran, Computer Science, PhD 2010: Discourse-level relations for
Opinion Analysis (Wiebe)
• Your name here!
Part II of: What makes a text coherent?
• Appropriate sequencing of subparts of the discourse - discourse/topic structure
• Appropriate use of coherence relations
between subparts of the discourse - rhetorical structure
• Appropriate use of referring expressions
Text Coherence, again
The reason is that these utterances, when juxtaposed, will
not exhibit coherence. Almost certainly not. Do you have a
discourse? Assume that you have collected an arbitrary set
of well-formed and independently interpretable utterances,
for instance, by randomly selecting one sentence from each
of the previous chapters of this book.
Or….
Assume that you have collected an arbitrary set of well-formed and independently interpretable utterances, for
instance, by randomly selecting one sentence from each of
the previous chapters of this book. Do you have a
discourse? Almost certainly not. The reason is that these
utterances, when juxtaposed, will not exhibit coherence.
Coherence
• John hid Bill’s car keys. He was drunk.
• ??John hid Bill’s car keys. He likes spinach.
• Again, not the same as cohesion.
Why Coherence? Summarization
Slide from Mirella Lapata and Regina Barzilay
Applications of Coherence Metrics
• Text Generation:
– concept-to-text generation
– summarization, text simplification
– question answering, machine translation
– essay grading
• Text Understanding
– Improving coreference resolution
• Potential uses
– automatic evaluation tool for text quality
– during system development (avoids repeated human evaluations)
• Software
– Brown Coherence Toolkit
– Coh-Metrix: cohmetrix.memphis.edu/CohMetrixDemo/demo.htm
Hobbs ’79: Coherence Relations
• Result
– Infer that the state or event asserted by S0 causes
or could cause the state or event asserted by S1.
The Tin Woodman was caught in the rain. His joints
rusted.
• Explanation
– Infer that the state or event asserted by S1 causes
or could cause the state or event asserted by S0.
John hid Bill’s car keys. He was drunk.
• Parallel
– Infer p(a1, a2..) from the assertion of S0 and
p(b1,b2…) from the assertion of S1, where ai and
bi are similar, for all i.
The Scarecrow wanted some brains. The Tin
Woodman wanted a heart.
• Elaboration
– Infer the same proposition P from the assertions of
S0 and S1.
Dorothy was from Kansas. She lived in the midst of
the great Kansas prairies.
Coherence Relations
Rhetorical Structure Theory
• Another theory of discourse structure, based on
identifying relations between segments of the text
– Nucleus/satellite notion encodes asymmetry
• Nucleus is thing that if you deleted it, text wouldn’t
make sense.
– Some rhetorical relations:
• Elaboration: (set/member, class/instance, whole/part…)
• Contrast: multinuclear
• Condition: Sat presents precondition for N
• Purpose: Sat presents goal of the activity in N
One Rhetorical Relation
• A sample definition
– Relation: Evidence
– Constraints on N: H might not believe N as much as S thinks s/he should
– Constraints on Sat: H already believes or will believe Sat
– Effect: H’s belief in N is increased
• An example:
Kevin must be here. (Nucleus)
His car is parked outside. (Satellite)
Some Problems with RST
• How many Rhetorical Relations are there?
• How can we use RST in dialogue as well as
monologue?
• RST does not model overall structure of the
discourse.
• Difficult to get annotators to agree on labeling the
same texts
• Trees versus directed graphs?
Automatic Labeling / Discourse Parsing
• Supervised machine learning
– Get a group of annotators to assign a set of relations
to a text, and/or tree structure
– Extract a set of surface features from the text that
might signal the presence of the coherence relations
– Train a supervised ML system based on the training
set
– Some publicly available resources (RST treebank,
Penn Discourse Treebank) exist
• Semi and unsupervised approaches
• Inferential / Abduction (my dissertation!)
Penn Discourse Treebank
• The city’s Campaign Finance Board has refused to
pay Mr. Dinkins $95,142 in matching funds because
his campaign records are incomplete.
– Causal relation
• So much of the stuff poured into Motorola’s offices
that its mail rooms there simply stopped delivering
it. [Implicit = so] Now, thousands of mailers,
catalogs and sales pitches go straight into the
trash.
– Consequence relation
Shallow Features
• Explicit markers / cue phrases: because, however,
therefore, then, etc.
• Tendency of certain syntactic structures to signal
certain relations:
– Infinitives are often used to signal purpose
relations: Use rm to delete files.
• Ordering
• Tense/aspect
• Intonation
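As a toy illustration of the first feature type, a tiny connective-to-relation lookup can already label many explicit relations; the table below is an illustrative assumption, far coarser than the Penn Discourse Treebank sense inventory, and implicit relations (no connective) are exactly the hard residual case.

# Toy cue-phrase baseline for explicit relation labeling; the mapping is a
# crude illustrative assumption, not the PDTB sense hierarchy.
CUE_TO_RELATION = {
    "because": "Cause",
    "therefore": "Result",
    "however": "Contrast",
    "then": "Temporal",
}

def explicit_relation(sentence):
    tokens = sentence.lower().replace(",", "").split()
    for cue, relation in CUE_TO_RELATION.items():
        if cue in tokens:
            return cue, relation
    return None, "implicit (no explicit connective)"

print(explicit_relation("The Board refused to pay because his records are incomplete."))
# ('because', 'Cause')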
Part III of: What makes a text coherent?
Entity-based Coherence
• Appropriate sequencing of subparts of the discourse - discourse/topic structure
• Appropriate use of coherence relations between
subparts of the discourse -- rhetorical structure
• Appropriate use of referring expressions
Centering Theory:
Grosz, Joshi and Weinstein, 1995
• The way entities are introduced and discussed influences coherence
• Entities in an utterance are ranked according to salience.
– Is an entity pronominalized or not?
– Is an entity in a prominent syntactic position?
• Each utterance has one center (≈topic or focus).
– Coherent discourses have utterances with common centers.
• Entity transitions capture degrees of coherence
– (e.g., in Centering theory CONTINUE > SHIFT).
Claim: Entity coherence:
Discourses without a clear ‘central entity’ feel less coherent
Concepts and definitions, I
• Every UTTERANCE U in a discourse (segment) DS updates the LOCAL
FOCUS - a PARTIALLY RANKED set of discourse entities, or
FORWARD-LOOKING CENTERS (CFs)
• An utterance U in discourse segment DS updates the existing CF set by
replacing it with the set of CFs REALIZED in U, CF(U,DS) (usually
simplified to CF(U))
• The most highly ranked CF realized in utterance U is CP(U)
(1) u1. Susan gave James a pet hamster.
    CF(u1) = [Susan, James, pet hamster]. CP(u1) = Susan
(2) u2. She gave Peter a nice scarf.
    CF(u2) = [Susan, Peter, nice scarf]. CP(u2) = Susan
Concepts and Definitions, II:
The CB
• The BACKWARD-LOOKING CENTER of
utterance Ui,
• CB(Ui),
• is the highest-ranked element of CF(Ui-1) that
is realized in Ui
The CB: Examples
(1) u1. Susan gave James a pet hamster.
    CF(u1) = [Susan, James, pet hamster]. CB = undefined. CP = Susan
(2) u2. She gave Peter a nice scarf.
    CF(u2) = [Susan, Peter, nice scarf]. CB = Susan. CP = Susan
NB: The CB is not always the most highly ranked entity of the PREVIOUS utterance
(2’) u2. He loves hamsters.
    CF(u2) = [James, hamsters]. CB = James. CP = James
… or the most highly ranked entity of the CURRENT one
(2’’) u2. Peter gave her a nice scarf.
    CF(u2) = [Peter, Susan, nice scarf]. CB = Susan. CP = Peter
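The CB definition translates directly into code. A minimal sketch, assuming each CF list is ordered by rank with the CP first; it reproduces the three cases above.

# CB(Ui) per the definition: the highest-ranked element of CF(Ui-1)
# that is realized in Ui. CF lists are ranked, most salient (the CP) first.
def backward_center(cf_prev, cf_curr):
    realized = set(cf_curr)
    for entity in cf_prev:           # cf_prev is ranked, highest first
        if entity in realized:
            return entity
    return None                      # undefined, e.g. for a segment-initial utterance

cf_u1 = ["Susan", "James", "pet hamster"]
print(backward_center(cf_u1, ["Susan", "Peter", "nice scarf"]))   # Susan  (2)
print(backward_center(cf_u1, ["James", "hamsters"]))              # James  (2')
print(backward_center(cf_u1, ["Peter", "Susan", "nice scarf"]))   # Susan  (2''), though CP is Peter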
Constraint 1
CONSTRAINT 1 (STRONG): All utterances of a
segment except for the first have exactly one CB
CB UNIQUENESS: Utterances have at most one CB
ENTITY CONTINUITY: For all utterances of a segment except
for the first, CF(Ui) ∩ CF(Ui-1) ≠ Ø
CONSTRAINT 1 (WEAK): All utterances of a segment
except for the first have AT MOST ONE CB
Claims of the theory: Local salience and
pronominalization
• Grosz et al (1995): the CB is also the most salient
entity. Texts in which other entities (but not the CB)
are pronominalized are less felicitous
(1) a. Something must be wrong with John.
    b. He has been acting quite odd.
    c. He called up Mike yesterday.
    d. John wanted to meet him quite urgently.
(2) a. Something must be wrong with John.
    b. He has been acting quite odd.
    c. He called up Mike yesterday.
    d. He wanted to meet him quite urgently.
Rule 1
RULE 1: if any CF is pronominalized, the CB
is.
Claims of the theory:
Preserving the ranking
• Discourses without a clear ‘central entity’ feel less
coherent
(1) a. John went to his favorite music store to buy a piano.
    b. He had frequented the store for many years.
    c. He was excited that he could finally buy a piano.
    d. He arrived just as the store was closing for the day.
(2) a. John went to his favorite music store to buy a piano.
    b. It was a store John had frequented for many years.
    c. He was excited that he could finally buy a piano.
    d. It was closing just as John arrived.
Transitions
• Grosz et al.: utterances are easier to process
– if they preserve CB of previous utterance or
– if CB(U) is also CP(U).
CONTINUE: Ui is a continuation if CB(Ui) = CB(Ui-1),
and CB(Ui) = CP(Ui)
RETAIN: Ui is a retain if CB(Ui) = CB(Ui-1), but
CB(Ui) is different from CP(Ui)
SHIFT: Ui is a shift if CB(Ui) ≠ CB(Ui-1)
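The three transition types reduce to a two-way test, sketched below; CB and CP values are assumed to be computed already, as in the definitions above.

# Transition classification for utterance Ui given CB(Ui-1), CB(Ui), CP(Ui).
def transition(cb_prev, cb_curr, cp_curr):
    if cb_curr != cb_prev:
        return "SHIFT"
    return "CONTINUE" if cb_curr == cp_curr else "RETAIN"

# u1: Susan gave James a pet hamster.   CB(u1) = Susan, CP(u1) = Susan
print(transition("Susan", "Susan", "Susan"))   # u2: She gave Peter a nice scarf.   CONTINUE
print(transition("Susan", "Susan", "Peter"))   # u2: Peter gave her a nice scarf.   RETAIN
print(transition("Susan", "James", "James"))   # u2: He loves hamsters.             SHIFT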
Utterance classification
(0) u0. Susan is a generous person.
    CF(u0) = [Susan]. CB = undefined. CP = Susan
(1) u1. She gave James a pet hamster.
    CF(u1) = [Susan, James, pet hamster]. CB = Susan. CP = Susan
(2) u2. She gave Peter a nice scarf.
    CF(u2) = [Susan, Peter, nice scarf]. CB = Susan. CP = Susan. CONTINUE
Utterance classification, II
(0) u0. Susan is a generous person.
    CF(u0) = [Susan]. CB = undefined. CP = Susan
(1) u1. She gave James a pet hamster.
    CF(u1) = [Susan, James, pet hamster]. CB = Susan. CP = Susan
(2’) u2. He loves hamsters.
    CF(u2) = [James, hamsters]. CB = James. CP = James. SHIFT
Utterance classification, III
(0) u0. Susan is a generous person.
    CF(u0) = [Susan]. CB = undefined. CP = Susan
(1) u1. She gave James a pet hamster.
    CF(u1) = [Susan, James, pet hamster]. CB = Susan. CP = Susan
(2’’) u2. Peter gave her a nice scarf.
    CF(u2) = [Peter, Susan, nice scarf]. CB = Susan. CP = Peter. RETAIN
Rule 2
RULE 2: (Sequences of) continuations
are preferred over (sequences of)
retains, which are preferred over
(sequences of) shifts.
Summary of the claims
CONSTRAINT 1: All utterances of a segment except for the
first have exactly one CB
RULE 1: if any CF is pronominalized, the CB is.
RULE 2: (Sequences of) continuations are preferred over
(sequences of) retains, which are preferred over (sequences
of) shifts.
Original centering theory
• Grosz et al do not provide algorithms for computing any of the
notions used in the basic definitions:
– UTTERANCE (clause? finite clause? sentence?)
– PREVIOUS UTTERANCE
– REALIZATION
– RANKING
– What counts as a ‘PRONOUN’ for the purposes of Rule 1? (Only
personal pronouns? Or demonstrative pronouns as well? What about
second person pronouns?)
• One of the reasons for the success of the theory is that it
provides plenty of scope for theorizing …
Barzilay and Lapata: The Entity Grid
(extensions/software, e.g. Brown Coherence Toolkit
http://www.cs.brown.edu/~melsner/manual.html)
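A minimal sketch of the entity-grid idea: rows are sentences, columns are entities, cells record the entity's grammatical role (S, O, X) or '-' when absent, and the column-wise role-transition distribution becomes the feature vector for coherence ranking. Here the per-sentence (entity, role) mentions are supplied by hand purely for illustration; Barzilay and Lapata derive them from parsing and coreference.

# Entity-grid sketch: build the grid from per-sentence {entity: role} dicts
# and compute the distribution of length-2 role transitions per column.
from collections import defaultdict

def entity_grid(sentences):
    """sentences: list of {entity: role} dicts, one per sentence (roles: S, O, X)."""
    entities = sorted({e for sent in sentences for e in sent})
    grid = [[sent.get(e, "-") for e in entities] for sent in sentences]
    return entities, grid

def transition_probs(grid, history=2):
    """Relative frequency of each column-wise role transition of length `history`."""
    counts = defaultdict(int)
    for column in zip(*grid):
        for i in range(len(column) - history + 1):
            counts[tuple(column[i:i + history])] += 1
    total = sum(counts.values()) or 1
    return {t: c / total for t, c in counts.items()}

# Hand-made mentions for three sentences (illustrative only).
sents = [
    {"Microsoft": "S", "market": "O"},
    {"Microsoft": "O", "earnings": "S"},
    {"earnings": "S"},
]
entities, grid = entity_grid(sents)
print(entities)                      # ['Microsoft', 'earnings', 'market']
for row in grid:
    print(row)                       # ['S', '-', 'O'] / ['O', 'S', '-'] / ['-', 'S', '-']
print(transition_probs(grid))        # e.g. ('S', 'O') -> 1/6, ('O', '-') -> 2/6, ...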
Final topic: Pronouns and Reference Resolution
A Reference Joke
Gracie: Oh yeah ... and then Mr. and Mrs. Jones were having
matrimonial trouble, and my brother was hired to watch Mrs.
Jones.
George: Well, I imagine she was a very attractive woman.
Gracie: She was, and my brother watched her day and night for
six months.
George: Well, what happened?
Gracie: She finally got a divorce.
George: Mrs. Jones?
Gracie: No, my brother's wife.
Reference Resolution: Vocabulary
• Process of associating Bloomberg/he/his with
particular person and big budget problem/it with a
concept
Giuliani left Bloomberg to be mayor of a city with
a big budget problem. It’s unclear how he’ll be
able to handle it during his term.
• Referring exprs.: Giuliani, Bloomberg, he, it, his
• Presentational it, there: non-referential
• Referents: the person named Bloomberg, the concept
of a big budget problem
• Co-referring referring expressions: Bloomberg, he,
his
• Antecedent: Bloomberg
• Anaphors: he, his
Discourse Models
• Needed to model reference because referring
expressions (e.g. Giuliani, Bloomberg, he, it, budget
problem) encode information about beliefs about the
referent
• When a referent is first mentioned in a discourse, a
representation is evoked in the model
– Information predicated of it is stored also in the
model
– On subsequent mention, it is accessed from the
model
Types of Referring Expressions
• Entities, concepts, places, propositions, events, ...
According to John, Bob bought Sue an Integra, and
Sue bought Fred a Legend.
– But that turned out to be a lie. (a speech act)
– But that was false. (proposition)
– That struck me as a funny way to describe the
situation. (manner of description)
– That caused Sue to become rather poor. (event)
– That caused them both to become rather poor.
(combination of multiple events)
Reference Phenomena:
5 Types of Referring Expressions
• Indefinite NPs
A homeless man hit up Bloomberg for a dollar.
Some homeless guy hit up Bloomberg for a dollar.
This homeless man hit up Bloomberg for a dollar.
• Definite NPs
The poor fellow only got a lecture.
• Demonstratives
This homeless man got a lecture but that one got
carted off to jail.
• Names
Prof. Litman teaches on Monday.
• Pronouns
A large tiger escaped from the Central Park zoo chasing
a tiny sparrow. It was recaptured by a brave
policeman.
– Referents of pronouns usually require some degree
of salience in the discourse (as opposed to definite
and indefinite NPs, e.g.)
– How do items become salient in discourse?
Salience vs. Recency
E: So you have the engine assembly finished. Now
attach the rope. By the way, did you buy the gas can
today?
A: Yes.
E: Did it cost much?
A: No.
E: OK, good. Have you got it attached yet?
Reference Phenomena: Information Status
• Givenness hierarchy / accessibility scales …
• But complications
Inferables
• I almost bought an Acura Integra today, but a door
had a dent and the engine seemed noisy.
• Mix the flour, butter, and water. Knead the dough
until smooth and shiny.
Discontinuous Sets
• Entities evoked together but mentioned in different
sentence or phrases
John has a St. Bernard and Mary has a Yorkie. They
arouse some comment when they walk them in the
park.
Generics
I saw two Corgis and their seven puppies today. They
are the funniest dogs.
Constraints on Pronominal Reference
• Number agreement
John’s parents like opera. John hates it/John hates
them.
• Person agreement
George and Edward brought bread. They shared it.
• Gender agreement
John has a Porsche. He/it/she is attractive.
• Syntactic constraints
John bought himself a new Volvo. (himself =
John)
John bought him a new Volvo (him = not John).
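A sketch of applying the agreement constraints as hard filters over candidate antecedents; the tiny pronoun lexicon and the candidate feature dictionaries are toy assumptions, not a real resource.

# Hard agreement filter: a candidate survives only if its number, person,
# and gender are compatible with the pronoun (unknown values are compatible).
PRONOUNS = {
    "he":   {"number": "sg", "person": 3, "gender": "masc"},
    "she":  {"number": "sg", "person": 3, "gender": "fem"},
    "it":   {"number": "sg", "person": 3, "gender": "neut"},
    "they": {"number": "pl", "person": 3, "gender": None},
}

def compatible(pronoun, candidate):
    p = PRONOUNS[pronoun]
    for feature in ("number", "person", "gender"):
        if p[feature] is not None and candidate.get(feature) is not None \
                and p[feature] != candidate[feature]:
            return False
    return True

john = {"number": "sg", "person": 3, "gender": "masc"}
porsche = {"number": "sg", "person": 3, "gender": "neut"}
candidates = [("John", john), ("Porsche", porsche)]
print([name for name, feats in candidates if compatible("he", feats)])    # ['John']
print([name for name, feats in candidates if compatible("it", feats)])    # ['Porsche']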
Preferences in Pronoun Interpretation
• Recency
John bought a new boat. Bill bought a bigger one.
Mary likes to sail it.
• But…grammatical role raises its ugly head…
John went to the Acura dealership with Bill. He
bought an Integra.
Bill went to the Acura dealership with John. He
bought an Integra.
?John and Bill went to the Acura dealership. He
bought an Integra.
• And so does…repeated mention
– John needed a car to go to his new job. He decided that he
wanted something sporty. Bill went to the dealership with
him. He bought a Miata.
– Who bought the Miata?
– What about grammatical role preference?
• Parallel constructions
Saturday, Mary went with Sue to the farmer’s market.
Sally went with her to the bookstore.
Sunday, Mary went with Sue to the mall.
Sally told her she should get over her shopping obsession.
• Selectional restriction
John left his plane in the hangar.
He had flown it from Memphis this morning.
• Verb semantics/thematic roles
John telephoned Bill. He’d lost the directions to
his house.
John criticized Bill. He’d lost the directions to
his house.
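These preferences are often folded into a single salience score per candidate. The sketch below combines grammatical role, recency, and repeated mention with hand-set weights that are purely illustrative assumptions (selectional restrictions and parallelism would be extra filters or features).

# Illustrative preference combination: score candidates by role, recency,
# and repeated mention; the weights are arbitrary, for demonstration only.
ROLE_WEIGHT = {"subject": 80, "object": 50, "other": 40}

def salience(candidate, current_sentence):
    recency = -10 * (current_sentence - candidate["sent_id"])   # older mentions decay
    repetition = 15 * (candidate["mentions"] - 1)               # repeated mention boosts
    return ROLE_WEIGHT[candidate["role"]] + recency + repetition

# "John needed a car ... He decided ... Bill went to the dealership with him. He bought a Miata."
candidates = [
    {"name": "John", "role": "subject", "sent_id": 1, "mentions": 3},
    {"name": "Bill", "role": "subject", "sent_id": 3, "mentions": 1},
]
best = max(candidates, key=lambda c: salience(c, current_sentence=4))
print(best["name"])   # John: repeated mention outweighs Bill's recency advantage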
Summary: What Affects Reference Resolution?
• Lexical factors
– Reference type: Inferability, discontinuous set, generics,
one anaphora, pronouns,…
• Discourse factors:
– Recency
– Focus/topic structure, digression
– Repeated mention
• Syntactic factors:
– Agreement: gender, number, person, case
– Parallel construction
– Grammatical role
– Selectional restrictions
• Semantic/lexical factors
– Verb semantics, thematic role
Reference Resolution Algorithms
• Given these types of features, can we construct an
algorithm that will apply them such that we can
identify the correct referents of anaphors and other
referring expressions?
Reference Resolution Task
• Finding in a text all the referring expressions that
have one and the same denotation
– Pronominal anaphora resolution
– Anaphora resolution between named entities
– Full noun phrase anaphora resolution
Issues
• Which constraints/features can/should we make use
of?
• How should we order them? I.e. which override
which?
• What should be stored in our discourse model? I.e.,
what types of information do we need to keep track
of?
• How to evaluate?
Some Algorithms
• Hobbs ‘78: syntax tree-based referential search
• Centering: recall entity-coherence
• Supervised learning approaches
Hobbs: Syntax-Based Approach
• Search for antecedent in parse tree of current
sentence, then prior sentences in order of recency
– For current S, search for NP nodes to the left of a
path p from the pronoun up to the first NP or S
node (X) above it, left-to-right, breadth-first
• Propose as pronoun’s antecedent any NP you find as
long as it has an NP or S node between itself and X
• If X is highest node in sentence, search prior sentences,
L2R breadth-first, for candidate NPs
• Otherwise, continue searching current tree by going to next S
or NP above X before going to prior sentences
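A simplified sketch of the search just described (not the full nine-step Hobbs algorithm, and without the "intervening NP or S" condition): walk up from the pronoun and, at each NP or S node, collect NP nodes in the left context breadth-first, left-to-right, then fall back to earlier sentences in order of recency. Trees are hand-built (label, children) tuples; a real system would take parser output.

from collections import deque

def bfs_nps(node):
    """All NP subtrees under node, breadth-first, left-to-right."""
    found, queue = [], deque([node])
    while queue:
        current = queue.popleft()
        label, children = current
        if label == "NP":
            found.append(current)
        queue.extend(c for c in children if isinstance(c, tuple))
    return found

def path_to(root, target, path=()):
    """Nodes from root down to target (inclusive), or None if absent."""
    if root is target:
        return path + (root,)
    for child in root[1]:
        if isinstance(child, tuple):
            p = path_to(child, target, path + (root,))
            if p:
                return p
    return None

def candidate_antecedents(sentence_trees, pronoun_node):
    """Pronoun is assumed to be in the last sentence of sentence_trees."""
    path = path_to(sentence_trees[-1], pronoun_node)
    on_path = {id(n) for n in path}
    cands = []
    for ancestor in reversed(path[:-1]):            # walk up from the pronoun
        if ancestor[0] not in ("NP", "S"):
            continue
        for child in ancestor[1]:                   # left context only:
            if isinstance(child, tuple) and id(child) in on_path:
                break                               # stop at the branch holding the pronoun
            if isinstance(child, tuple):
                cands.extend(bfs_nps(child))
    for tree in reversed(sentence_trees[:-1]):      # then prior sentences, most recent first
        cands.extend(bfs_nps(tree))
    return cands

# "John went to the Acura dealership with Bill. He bought an Integra."
he = ("NP", ["He"])
s1 = ("S", [("NP", ["John"]),
            ("VP", [("V", ["went"]),
                    ("PP", [("P", ["to"]), ("NP", ["the", "Acura", "dealership"])]),
                    ("PP", [("P", ["with"]), ("NP", ["Bill"])])])])
s2 = ("S", [he, ("VP", [("V", ["bought"]), ("NP", ["an", "Integra"])])])
for np in candidate_antecedents([s1, s2], he):
    print(" ".join(w for w in np[1] if isinstance(w, str)))
# John comes out first, matching the subject preference in the earlier examples.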
Example
• Hobbs versus centering
Supervised Anaphora Resolution
• Input: pronoun plus current and preceding sentences
• Training: hand-labeled corpus of positive examples
plus inferred negative examples
• Features
• Strict number
• Compatible number
• Strict gender
• Compatible gender
• Sentence distance
• Hobbs distance (noun groups between pronoun and
candidate antecedent)
• Grammatical role
• Linguistic form (proper name, definite, indefinite,
pronominal antecedent)
– From pronouns to general coreference
• Edit distance for full NPs
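A sketch of turning one (pronoun, candidate antecedent) pair into a feature vector of the kinds listed above; the field names and mention dictionaries are hypothetical stand-ins for whatever representation the corpus and preprocessing provide.

# Feature extraction for one (pronoun, candidate) pair; each labeled pair
# (positive if coreferent) becomes a training instance for a classifier.
def pair_features(pronoun, candidate):
    return {
        "strict_number":     pronoun["number"] == candidate["number"],
        "compatible_gender": candidate["gender"] in (pronoun["gender"], None),
        "sentence_distance": pronoun["sent_id"] - candidate["sent_id"],
        "hobbs_distance":    pronoun["np_index"] - candidate["np_index"],  # rough proxy: NPs in between
        "candidate_role":    candidate["role"],   # subject / object / other
        "candidate_form":    candidate["form"],   # name / definite / indefinite / pronoun
    }

he = {"number": "sg", "gender": "masc", "sent_id": 2, "np_index": 5}
bloomberg = {"number": "sg", "gender": "masc", "sent_id": 1, "np_index": 2,
             "role": "object", "form": "name"}
print(pair_features(he, bloomberg))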
Summary: Reference Resolution
• Many approaches to reference resolution
• Use similar information/features but different
methods
– Hobbs’ Syntax
– Centering coherence
– Supervised and shallow methods use simple
techniques with reasonable success
– Also “deep” inferential approaches