Natural Language Processing
COMPSCI 423/723
Rohit Kate
Discourse Processing
Reference: Jurafsky & Martin book, Chapter 21
Basic Steps of Natural
Language Processing
Sound waves → [Phonetics] → Words → [Syntactic processing] → Parses → [Semantic processing] → Meaning → [Discourse processing] → Meaning in context
• This is a conceptual pipeline; humans or computers may process multiple stages simultaneously
Discourse
• So far we have always analyzed one sentence in
isolation, syntactically and/or semantically
• Natural languages are spoken or written as a
collection of sentences
• In general, a sentence cannot be understood
in isolation:
– Today was Jack's birthday. Penny and Janet went
to the store. They were going to get presents.
Janet decided to get a kite. "Don't do that," said
Penny. "Jack has a kite. He will make you take it
back.”
Discourse
• Discourse is a coherent structured group of sentences
– Example: monologues (including reading passages), dialogues
• Very little work has been done on understanding beyond a
sentence, i.e., understanding a whole paragraph or an entire
document together
• Important tasks in processing a discourse
– Discourse Segmentation
– Determining Coherence Relations
– Anaphora Resolution
• Ideally, deep understanding is needed to do well on these
tasks, but so far shallow methods have been used
Discourse Segmentation
• Discourse Segmentation: Separating a document into
a linear sequence of subtopics
– For example: scientific articles are segmented into
Abstract, Introduction, Methods, Results, Conclusions
– This is often a simplification of a higher level structure of a
discourse
• Applications of automatic discourse segmentation:
– For Summarization: Summarize each segment separately
– For Information Retrieval or Information Extraction: Apply
to an appropriate segment
• Related task: Paragraph segmentation, for example
of a speech transcript
Unsupervised Discourse
Segmentation
• Given raw text, segment it into multiple
paragraph subtopics
• Unsupervised: No training data is given for the
task
• Cohesion-based approach: Segment into
subtopics in which sentences/paragraphs are
cohesive with each other; a dip in cohesion marks
subtopic boundaries
Cohesion
• Cohesion: Links between text units due to
linguistic devices
• Lexical Cohesion: Use of same or similar words
to link text units
– Today was Jack's birthday. Penny and Janet went
to the store. They were going to get presents.
Janet decided to get a kite. "Don't do that," said
Penny. "Jack has a kite. He will make you take it
back.”
• Non-lexical Cohesion: For example, using
anaphora
Cohesion-based Unsupervised
Discourse Segmentation
• TextTiling algorithm (Hearst, 1997)
– compare adjacent blocks of text
– look for shifts in vocabulary
• Do pre-processing: Tokenization, remove stop
words, stemming
• Divide text into pseudo-sentences of equal
length (say 20 words)
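A minimal sketch of this pre-processing in Python. The tiny stop-word list and the crude suffix-stripping stemmer below are illustrative stand-ins for real components (e.g., a Porter stemmer); the 20-word pseudo-sentence length follows the slide.

```python
import re

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "was", "it"}

def crude_stem(word):
    # Crude suffix stripping, standing in for a real stemmer (e.g., Porter).
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def preprocess(text, pseudo_len=20):
    """Tokenize, drop stop words, stem, and group the surviving tokens
    into pseudo-sentences of pseudo_len words each (20 per the slide)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    return [tokens[i:i + pseudo_len] for i in range(0, len(tokens), pseudo_len)]
```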
TextTiling Algorithm contd.
• Compute lexical cohesion score at each gap
between pseudo-sentences
• Lexical cohesion score: Similarity of words
before and after the gap (take, say, 10 pseudo-sentences
before and 10 pseudo-sentences after)
• Similarity: Cosine similarity between the word
vectors (high if words co-occur)
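A sketch of this gap scoring, using the block size of 10 pseudo-sentences from the slide and cosine similarity over raw word counts:

```python
import math
from collections import Counter

def cosine(u, v):
    # Cosine similarity between two word-count vectors (Counters).
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def gap_scores(pseudo_sents, block_size=10):
    """Lexical cohesion score at each gap between pseudo-sentences:
    cosine similarity of the word counts in the block_size
    pseudo-sentences before the gap vs. those after it."""
    scores = []
    for gap in range(1, len(pseudo_sents)):
        before = Counter(w for ps in pseudo_sents[max(0, gap - block_size):gap] for w in ps)
        after = Counter(w for ps in pseudo_sents[gap:gap + block_size] for w in ps)
        scores.append(cosine(before, after))
    return scores
```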
TextTiling Algorithm contd.
• Plot the similarity and compute the depth
scores of the “similarity valleys”: (a − b) + (c − b),
where b is the similarity at the bottom of a valley and
a and c are the similarity peaks on either side of it
• Assign a segment boundary if the depth score is
larger than a threshold (e.g. one standard
deviation deeper than the mean valley depth)
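A sketch of the depth scoring and boundary assignment. Walking uphill from each valley to find the peaks a and c follows the usual TextTiling formulation; the mean-plus-one-standard-deviation cutoff is the threshold suggested above.

```python
import statistics

def depth_scores(scores):
    """Depth score (a - b) + (c - b) at each gap, where b is the
    similarity at the gap and a, c are the nearest peaks found by
    walking uphill to the left and to the right."""
    depths = []
    for i, b in enumerate(scores):
        a, j = b, i
        while j > 0 and scores[j - 1] >= a:            # climb to the left peak
            j -= 1
            a = scores[j]
        c, j = b, i
        while j < len(scores) - 1 and scores[j + 1] >= c:  # climb to the right peak
            j += 1
            c = scores[j]
        depths.append((a - b) + (c - b))
    return depths

def boundaries(scores):
    """Place a boundary at gaps whose depth score exceeds the mean
    valley depth by one standard deviation (the threshold above)."""
    depths = depth_scores(scores)
    cutoff = statistics.mean(depths) + statistics.stdev(depths)
    return [gap for gap, d in enumerate(depths) if d > cutoff]
```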
TextTiling Algorithm contd.
[Figure from (Hearst, 1994)]
Supervised Discourse
Segmentation
• Easy to get supervised data for some segmentation
tasks
– For example, paragraph segmentation
– Useful to find paragraphs in speech recognition output
• Model as a classification task: Classify if the sentence
boundary is a paragraph boundary
– Use any classifier: SVM, Naïve Bayes, Maximum Entropy,
etc.
• Or model as a sequence labeling task: Label each
sentence boundary with a “paragraph boundary” or
“not a paragraph boundary” label
Supervised Discourse
Segmentation
• Features:
– Use cohesion features: word overlap, word cosine
similarity, anaphora, etc.
– Additional features: discourse markers or cue words
• Discourse marker or cue phrase/word: A word or
phrase that signals discourse structure
– For example, “good evening”, “joining us now” in
broadcast news
– “Coming up next” at the end of a segment, “Company
Incorporated” at the beginning of a segment etc.
– Either hand-code or automatically determine by feature
selection
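A sketch combining the classification formulation above with a couple of these features. The feature set and cue-phrase list are illustrative, and scikit-learn's LogisticRegression stands in for a Maximum Entropy classifier.

```python
from sklearn.linear_model import LogisticRegression

# Illustrative cue phrases of the broadcast-news kind mentioned above.
CUE_PHRASES = ("coming up next", "joining us now", "good evening")

def boundary_features(prev_sent, next_sent):
    """Features for one candidate paragraph boundary: lexical cohesion
    (word overlap) plus cue-phrase indicators. Illustrative only."""
    prev_w = set(prev_sent.lower().split())
    next_w = set(next_sent.lower().split())
    overlap = len(prev_w & next_w)
    return [
        overlap,
        overlap / max(1, len(prev_w | next_w)),                 # Jaccard overlap
        int(any(c in prev_sent.lower() for c in CUE_PHRASES)),  # cue ends a segment
        int(any(c in next_sent.lower() for c in CUE_PHRASES)),  # cue starts a segment
    ]

# X = [boundary_features(s1, s2) for s1, s2 in candidate_boundaries]
# y = [...]  # 1 if the boundary is a paragraph boundary, else 0
# clf = LogisticRegression().fit(X, y)  # a Maximum Entropy classifier
```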
Discourse Segmentation
Evaluation
• Not a good idea to measure precision, recall and
F-measure because they are not sensitive to near
misses
• One good metric: WindowDiff (Pevzner & Hearst,
2002)
• Slide a window of length k across the reference
(correct) and the hypothesized segmentation and
count the number of segmentation boundaries in
each
• WindowDiff metric: Average difference in the
number of boundaries in the sliding window
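A sketch of WindowDiff. Here each segmentation is represented as a sequence of 0/1 flags, one per gap between sentences, with 1 marking a boundary; following Pevzner & Hearst, the metric averages how often the window counts disagree.

```python
def window_diff(reference, hypothesis, k):
    """WindowDiff (Pevzner & Hearst, 2002). Slide a window of k gaps
    across both segmentations and average how often the two disagree
    on the number of boundaries inside the window."""
    assert len(reference) == len(hypothesis)
    windows = len(reference) - k + 1
    disagreements = sum(
        sum(reference[i:i + k]) != sum(hypothesis[i:i + k])
        for i in range(windows))
    return disagreements / windows

# A near miss (boundary off by one gap) is penalized only in the few
# windows that straddle it, unlike exact-match precision/recall:
# window_diff([0, 1, 0, 0, 1, 0], [0, 0, 1, 0, 1, 0], k=2)  -> 0.4
```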
Text Coherence
• A collection of independent sentences does not make a
discourse because it lacks coherence
• Coherence: Meaning relation between two units of
text; explains how the meaning of different units of
text combine to build meaning of the larger unit (to
contrast, cohesion is links between units)
Explanation
John hid Bill’s car keys. He was drunk.
???
John hid Bill’s car keys. He likes spinach.
• Humans try to find coherence between sentences all
the time
Coherence Relations
• Coherence Relations: Set of connections between
units in a discourse.
• A few more such relations, from Hobbs (1979):
Result
The Tin Woodman was caught in the rain. His joints rusted.
Parallel
The scarecrow wanted some brains. The Tin Woodman wanted a heart.
Elaboration
Dorothy was from Kansas. She lived in the midst of the great Kansas prairies.
Occasion
Dorothy picked up the oil-can. She oiled the Tin Woodman’s joints.
Discourse Structure
• Discourse Structure: The hierarchical structure of a
discourse according to the coherence relations.
John went to the bank to deposit his paycheck. He then took
a train to Bill’s car dealership. He needed to buy a car. The
company he works for now isn’t near any public
transportation. He also wanted to talk to Bill about their
softball league.
Discourse Structure
• Discourse Structure: The hierarchical structure of a
discourse according to the coherence relations.
Occasion
├─ John went to the bank to deposit his paycheck.
└─ Explanation
   ├─ He then took a train to Bill’s car dealership.
   └─ Parallel
      ├─ Explanation
      │  ├─ He needed to buy a car.
      │  └─ The company he works for now isn’t near any public transportation.
      └─ He also wanted to talk to Bill about their softball league.
• Analogous to syntactic tree structure
• A node in the tree represents a group of locally coherent
sentences: a discourse segment (hierarchical, not linear)
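One way to write this structure down in code, sketched as nested Python tuples: the relation name comes first, followed by the two children, and the leaves are the sentences.

```python
# Discourse structure of the example above: internal nodes name the
# coherence relation joining their children; leaves are segments.
tree = ("Occasion",
        "John went to the bank to deposit his paycheck.",
        ("Explanation",
         "He then took a train to Bill's car dealership.",
         ("Parallel",
          ("Explanation",
           "He needed to buy a car.",
           "The company he works for now isn't near any public transportation."),
          "He also wanted to talk to Bill about their softball league.")))
```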
Discourse Structure
• What are the uses of discourse structure?
– Summarization systems may skip or merge the segment
connected with Elaboration relation
– Question-answering systems can search in segments with
Explanation relations
– An information extraction system need not merge information
from segments not linked by relations
– A semantic parser may build a larger meaning
representation of the whole discourse
Discourse Parsing
• Coherence Relation Assignment: Automatically
determining the coherence relations between units
of a discourse
• Discourse Parsing: Automatically finding the
discourse structure of an entire discourse
• Both are largely unsolved problems, but some
shallow methods work to some degree, for example,
using cue phrases (or discourse markers)
Automatic Coherence
Assignment
Shallow cue-phrase-based algorithm:
1. Identify cue phrases in a text
2. Segment text into discourse segments using cue
phrases
3. Assign coherence relations between consecutive
discourse segments
1. Identify Cue Phrases
• Phrases that signal discourse structure, e.g. “joining
us now”, “coming up next” etc.
• Connectives: “because”, “although”, “example”,
“with”, “and”
• However, their occurrence is not always indicative of
discourse relation: they are ambiguous
– With its distant orbit, Mars exhibits frigid weather
conditions
– We can see Mars with an ordinary telescope
• Use some simple heuristics (e.g., capitalization of
“with”), but in general use techniques similar to
word sense disambiguation
2. Segment Text into Discourse
Segments
• Segments are usually sentences, so sentence
segmentation may suffice
• However, often clauses are more appropriate
Explanation
– With its distant orbit, Mars exhibits frigid weather
conditions
• Use hand-written rules or utilize syntactic parses to
get such segments
3. Classify Relation between
Neighboring Segments
• Use rules based on the cue phrases and connectives
– For example, a sentence beginning with “Because”
indicates Explanation relation with the next segment
• Train classifiers using appropriate features
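A sketch of steps 2–3 over already-identified segments. The cue-to-relation table is illustrative, and this ignores the cue-phrase ambiguity problem from step 1.

```python
# Illustrative cue-to-relation table; real connectives are ambiguous
# and would need the disambiguation discussed in step 1.
CUE_RELATIONS = {
    "because": "Explanation",
    "although": "Contrast",
    "then": "Occasion",
    "for example": "Elaboration",
}

def assign_relations(segments):
    """Step 3: for each pair of consecutive discourse segments, assign
    the relation signaled by the first matching cue phrase, if any."""
    labeled = []
    for left, right in zip(segments, segments[1:]):
        text = (left + " " + right).lower()
        relation = next(
            (rel for cue, rel in CUE_RELATIONS.items() if cue in text), None)
        labeled.append((left, right, relation))  # None: no cue found
    return labeled
```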
Drawback of Cue-phrase-based
Algorithm
• Sometimes relations are not signaled by cue phrases
but are implicit through syntax, words, negation etc.:
Contrast
– I don’t want a truck. I’d prefer a convertible.
• Difficult to encode such rules manually or to
get labeled training examples
• One solution: Automatically find easy examples with
cue phrases then remove the cue phrases to
generate difficult supervised training examples
– With cue phrase: I don’t want a truck although I’d prefer a convertible.
– Cue phrase removed: I don’t want a truck. I’d prefer a convertible.
• Train using words, word pairs, POS tags, etc. as
features
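A sketch of this example-mining idea. The two connectives and their relation labels are illustrative.

```python
# Connectives whose relation label we trust when they appear explicitly;
# the mapping is illustrative.
CUES = {"although": "Contrast", "because": "Explanation"}

def mine_training_examples(sentences):
    """Find 'easy' sentences containing an unambiguous connective,
    strip the connective, and keep the two clauses as a labeled
    training example of an implicit relation."""
    examples = []
    for sent in sentences:
        for cue, relation in CUES.items():
            left, sep, right = sent.lower().partition(" " + cue + " ")
            if sep:  # cue found between two clauses
                examples.append((left.strip(), right.strip(), relation))
    return examples

# mine_training_examples(["I don't want a truck although I'd prefer a convertible."])
# -> [("i don't want a truck", "i'd prefer a convertible.", "Contrast")]
```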
Penn Discourse Treebank
• Recently released corpus that is likely to lead to
better systems for discourse processing
• Encodes coherence relations associated with
discourse connectives
• Linked to the Penn Treebank
http://www.seas.upenn.edu/~pdtb/
Reference Resolution
• Reference Resolution: The task of determining what
entities are referred to by which linguistic
expressions
• To understand any discourse it is necessary to know
which entities are being talked about at which point
Mr. Obama visited the city. The president talked about
Milwaukee’s economy. He mentioned new jobs.
– “Mr. Obama”, “The president” and “He” are referring
expressions for the referent “Barack Obama”, and they corefer
– Anaphora: When a referring expression refers to a
previously introduced entity (antecedent), the referring
expression is called anaphoric, e.g. “The president”, “He”
– Cataphora: When a referring expression refers to an entity
which is introduced later, the referring expression is called
cataphoric, e.g. “the city”
Two Reference Resolution Tasks
• Coreference Resolution: The task of finding referring
expressions that refer to the same entity, i.e., finding
coreference chains
– In the previous example the coreference chains are: {Mr.
Obama, The president, he}, {the city, Milwaukee’s}
• Pronominal Anaphora Resolution: The task of finding
the antecedent for a single pronoun
– In the previous example, “he” refers to “Mr. Obama”
• A lot of work has been done on these tasks in the last
15 or so years [Ng, 2010]
Supervised Pronominal Anaphora
Resolution
• Given a pronoun and an entity mentioned earlier,
classify whether the pronoun refers to that entity or
not given the surrounding context
Mr. Obama visited the city. The president talked about Milwaukee’s economy. He mentioned new jobs.
• First filter out pleonastic pronouns (e.g., “It” in “It is
raining.”) using hand-written rules
• Use any classifier; obtain positive examples from
training data, and generate negative examples by pairing
each pronoun with other (incorrect) entities
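A sketch of how training pairs might be generated under this scheme, using the Obama example above:

```python
def make_training_pairs(pronoun, correct_antecedent, candidate_mentions):
    """One positive pair for the annotated antecedent, and a negative
    pair for every other earlier mention (the scheme described above)."""
    pairs = []
    for mention in candidate_mentions:
        label = 1 if mention == correct_antecedent else 0
        pairs.append(((pronoun, mention), label))
    return pairs

# Candidates for "He" in the third sentence of the example:
pairs = make_training_pairs(
    "He", "Mr. Obama",
    ["Mr. Obama", "the city", "The president", "Milwaukee"])
```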
Features for Pronominal
Anaphora Resolution
• Constraints:
– Number agreement
• Singular pronouns (it/he/she/his/her/him) refer to
singular entities and plural pronouns
(we/they/us/them) refer to plural entities
– Person agreement
• He/she/they etc. must refer to a third person entity
– Gender agreement
• He -> John; she -> Mary; it -> car
– Certain syntactic constraints
• John bought himself a new car. [himself -> John]
• John bought him a new car. [him cannot be John]
Features for Pronominal
Anaphora Resolution
• Preferences:
– Recency: More recently mentioned entities are
more likely to be referred to
• John went to a movie. Jack went as well. He was not
busy.
– Grammatical Role: Entities in the subject position
are more likely to be referred to than entities in the
object position
• John went to a movie with Jack. He was not busy.
– Parallelism:
• John went with Jack to a movie. Joe went with him to a
bar.
Features for Pronominal
Anaphora Resolution
• Preferences:
– Verb Semantics: Certain verbs seem to bias
whether a subsequent pronoun refers to their
subject or their object
• John telephoned Bill. He lost the laptop.
• John criticized Bill. He lost the laptop.
– Selectional Restrictions: Restrictions because of
semantics
• John parked his car in the garage after driving it around
for hours.
• Encode all of these, and maybe more, as features
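A sketch of a few of these constraints and preferences as feature functions. The pronoun lists are partial, and the mention attributes (number, gender, grammatical role, distance) are assumed to come from earlier processing; here they are passed in directly for illustration.

```python
SINGULAR_PRONOUNS = {"it", "he", "she", "him", "her", "his"}
PLURAL_PRONOUNS = {"we", "they", "us", "them"}
MASCULINE = {"he", "him", "his", "himself"}
FEMININE = {"she", "her", "hers", "herself"}

def anaphora_features(pronoun, mention, sentence_distance, mention_is_subject):
    """Encode some of the constraints and preferences above as features
    for the pronoun/antecedent classifier. Illustrative, not a full set."""
    p = pronoun.lower()
    number_ok = ((p in SINGULAR_PRONOUNS and mention["number"] == "singular") or
                 (p in PLURAL_PRONOUNS and mention["number"] == "plural"))
    gender_ok = ((p in MASCULINE and mention["gender"] == "masc") or
                 (p in FEMININE and mention["gender"] == "fem") or
                 (p == "it" and mention["gender"] == "neuter"))
    return {
        "number_agreement": int(number_ok),        # constraint
        "gender_agreement": int(gender_ok),        # constraint
        "recency": 1.0 / (1 + sentence_distance),  # preference: closer is better
        "is_subject": int(mention_is_subject),     # preference: grammatical role
    }
```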
Coreference Resolution
• Can be done analogously to pronominal
anaphora resolution: Given an anaphor and a
potential antecedent, classify as true or false
• Some approaches also do clustering on the
referring expressions instead of doing binary
classification
• Additional features to incorporate aliases,
variations in names etc., e.g. Mr. Obama,
Barack Obama; Megabucks, Megabucks Inc.
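A sketch of a rough alias test that such features could be built on; the title and suffix lists are illustrative.

```python
CORPORATE_SUFFIXES = {"inc", "inc.", "incorporated", "corp", "corp.", "co."}
TITLES = {"mr.", "mrs.", "ms.", "dr.", "president"}

def is_alias(mention_a, mention_b):
    """Rough alias test for names: strip titles and corporate suffixes,
    then check whether one remaining name is contained in the other, so
    "Mr. Obama" matches "Barack Obama" and "Megabucks" matches
    "Megabucks Inc.". Purely illustrative."""
    def core(mention):
        return [w for w in mention.lower().split()
                if w not in TITLES and w not in CORPORATE_SUFFIXES]
    a, b = core(mention_a), core(mention_b)
    return bool(a) and bool(b) and (set(a) <= set(b) or set(b) <= set(a))
```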