Reference Resolution

Natural Language Processing
January 22, 2008
Agenda
• Reference resolution
– Knowledge-rich, deep analysis approaches
– Centering
– Knowledge-based, shallow analysis: CogNIAC (’95)
– Learning approaches: fully, weakly, and unsupervised
• Cardie & Ng ’99-’04
Centering
• Identify the local “center” of attention
– Pronominalization focuses attention; appropriate use establishes coherence
• Identify entities available for reference
• Describe shifts in what discourse is about
– Some types of shift are preferred as more coherent
Centering: Structures
• Each utterance (Un) has:
– List of forward-looking centers: Cf(Un)
• Entities realized/evoked in Un
• Rank by likelihood of focus of future discourse
• Highest ranked element: Cp(Un)
– Backward looking center (focus): Cb(Un)
Centering: Transitions
                   Cb(Un) = Cb(Un-1)    Cb(Un) != Cb(Un-1)
Cb(Un) = Cp(Un)    Continuing           Smooth Shift
Cb(Un) != Cp(Un)   Retaining            Rough Shift
Centering: Constraints and Rules
• Constraints:
– Exactly ONE backward-looking center
– Everything in Cf(Un) realized in Un
– Cb(Un): highest ranked item of Cf(Un-1) that is realized in Un
• Rules:
– If any item in Cf(Un-1) is realized as a pronoun in Un, Cb(Un) must be realized as a pronoun
– Transitions are ranked:
• Continuing > Retaining > Smooth Shift > Rough Shift
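The transition table and ranking can be sketched in a few lines of Python. This is a minimal illustration, not a full centering implementation: the function names are made up for this example, and treating an undefined Cb(Un-1) as "same Cb" is an assumption.

```python
from typing import Optional

def classify_transition(cb_prev: Optional[str], cb_cur: str, cp_cur: str) -> str:
    """Classify a centering transition from Cb(Un-1), Cb(Un) and Cp(Un)."""
    # Assumption: an undefined Cb(Un-1) (first utterance) counts as "same Cb".
    same_cb = cb_prev is None or cb_cur == cb_prev
    if cb_cur == cp_cur:
        return "Continuing" if same_cb else "Smooth Shift"
    return "Retaining" if same_cb else "Rough Shift"

# Preference order from the slide: earlier in the list is more coherent.
TRANSITION_RANK = ["Continuing", "Retaining", "Smooth Shift", "Rough Shift"]

def preferred(transitions):
    """Return the most preferred transition among the candidates."""
    return min(transitions, key=TRANSITION_RANK.index)

print(classify_transition("John", "John", "John"))   # Continuing
print(classify_transition("John", "John", "Bill"))   # Retaining
print(classify_transition("John", "Bill", "Bill"))   # Smooth Shift
print(classify_transition("John", "Bill", "Mary"))   # Rough Shift
```

A resolver could enumerate candidate pronoun assignments, classify the transition each one produces, and keep the assignment that `preferred` selects.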
Centering: Example
• John saw a beautiful Acura Integra at the dealership.
– Cf: (John, Integra, dealership); No Cb
• He showed it to Bill.
– Cf: (John/he, Integra/it*, Bill); Cb: John/he
• He bought it.
– Cf: (John/he, Integra/it); Cb: John/he
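The Cb values in this example can be recomputed mechanically, taking Cb(Un) to be the highest ranked element of Cf(Un-1) that is also realized in Un. The sketch below assumes the pronouns have already been resolved to entities and that each Cf list is given in rank order; `backward_center` is a hypothetical helper name.

```python
def backward_center(cf_prev, cf_cur):
    """Cb(Un): highest-ranked entity of Cf(Un-1) that is also realized in Un."""
    realized = set(cf_cur)
    for entity in cf_prev:           # cf_prev is ordered by rank, best first
        if entity in realized:
            return entity
    return None                      # no Cb, e.g. for the first utterance

utterances = [
    ["John", "Integra", "dealership"],   # John saw a beautiful Acura Integra ...
    ["John", "Integra", "Bill"],         # He showed it to Bill.
    ["John", "Integra"],                 # He bought it.
]

cf_prev = []
for cf in utterances:
    print(cf, "Cb:", backward_center(cf_prev, cf))
    cf_prev = cf
```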
Reference Resolution: Differences
• Different structures to capture focus
• Different assumptions about:
– # of foci, ambiguity of reference
• Different combinations of features
Reference Resolution: Agreements
• Knowledge-based
– Deep analysis: full parsing, semantic analysis
– Enforce syntactic/semantic constraints
– Preferences:
• Recency
• Grammatical Role Parallelism (ex. Hobbs)
• Role ranking
• Frequency of mention
• Local reference resolution
• Little/No world knowledge
• Similar levels of effectiveness
Alternative Strategies
• Knowledge-based, but
– Shallow processing, simple rules!
• CogNIAC (Baldwin ’95)
• Data-driven
– Fully or weakly supervised learning
• Cardie & Ng (’02-’04)
Questions
• 80% on (clean) text. What about…
– Conversational speech?
• Ill-formed, disfluent
– Dialogue?
• Multiple speakers introduce referents
– Multimodal communication?
• How else can entities be evoked?
• Are all equally salient?
More Questions
• 80% on (clean) (English) text: What about...
– Other languages?
• Salience hierarchies the same?
– Other factors
• Syntactic constraints?
– E.g. reflexives in Chinese, Korean, ...
• Zero anaphora?
– How do you resolve a pronoun if you can’t find it?
CogNIAC
• Goal: Resolve with high precision
– Identify where ambiguous, use no world knowledge, simple syntactic analysis
– Precision: # correct labelings/# of labelings
– Recall: # correct labelings/# of anaphors
• Uses simple set of ranked rules
– Applied incrementally left-to-right
• Designed to work on newspaper articles
– Tune/rank rules
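The two measures differ only in the denominator, which is why a system that declines to resolve ambiguous pronouns can have high precision at lower recall. A minimal sketch, with made-up anaphor identifiers and a hypothetical `precision_recall` helper:

```python
def precision_recall(proposed, gold):
    """proposed: anaphor -> antecedent chosen by the system (unresolved anaphors omitted).
    gold: anaphor -> correct antecedent, for every anaphor in the text."""
    correct = sum(1 for a, ante in proposed.items() if gold.get(a) == ante)
    precision = correct / len(proposed) if proposed else 0.0   # correct / # of labelings
    recall = correct / len(gold) if gold else 0.0              # correct / # of anaphors
    return precision, recall

gold = {"he_1": "John", "it_1": "Integra", "he_2": "John", "it_2": "Integra"}
proposed = {"he_1": "John", "he_2": "John", "it_2": "Integra"}   # it_1 left unresolved
print(precision_recall(proposed, gold))   # (1.0, 0.75)
```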
CogNIAC: Rules
• Only resolve reference if unique antecedent
• 1) Unique in prior discourse
• 2) Reflexive: nearest legal in same sentence
• 3) Unique in current & prior
• 4) Possessive Pro: single exact poss in prior
• 5) Unique in current
• 6) Unique subject/subject pronoun
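The control structure, ranked rules tried in order with a resolution made only when a rule yields exactly one candidate, can be sketched as follows. This is a simplified illustration rather than CogNIAC itself: only three of the rules are approximated, the `Mention` type and the gender-only compatibility check are assumptions, and real candidate filtering would need syntactic analysis.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mention:
    text: str
    sentence: int    # index of the sentence containing the mention
    gender: str      # "m", "f" or "n" (hypothetical agreement feature)

def compatible(anaphor, candidates):
    return [c for c in candidates if c.gender == anaphor.gender]

def unique_in_discourse(anaphor, prior):
    return compatible(anaphor, prior)

def unique_in_current_and_prior(anaphor, prior):
    window = [m for m in prior if m.sentence >= anaphor.sentence - 1]
    return compatible(anaphor, window)

def unique_in_current(anaphor, prior):
    window = [m for m in prior if m.sentence == anaphor.sentence]
    return compatible(anaphor, window)

# Ranked rules, tried in order (a subset, for illustration only).
RULES = [unique_in_discourse, unique_in_current_and_prior, unique_in_current]

def resolve(anaphor, prior):
    """Return (antecedent, rule name), or (None, None) if no rule gives a unique candidate."""
    for rule in RULES:
        candidates = rule(anaphor, prior)
        if len(candidates) == 1:
            return candidates[0], rule.__name__
    return None, None

prior = [Mention("John", 0, "m"), Mention("Acura Integra", 0, "n"),
         Mention("dealership", 0, "n")]
print(resolve(Mention("He", 1, "m"), prior))   # resolved to John
print(resolve(Mention("it", 1, "n"), prior))   # (None, None): two candidates
```

Leaving "it" unresolved instead of guessing is what trades recall for precision.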
CogNIAC: Example
• John saw a beautiful Acura Integra in the dealership.
• He showed it to Bill.
– He = John: Rule 1; it -> ambiguous (Integra)
• He bought it.
– He = John: Rule 6; it = Integra: Rule 3
Data-driven Reference Resolution
• Prior approaches
– Knowledge-based, hand-crafted
• Data-driven machine learning approach
– Cast coreference as classification problem
• For each pair NPi,NPj, do they corefer?
• Cluster to form equivalence classes
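One simple way to turn pairwise "corefer" decisions into equivalence classes is to take their transitive closure with a union-find structure. The sketch below assumes the classifier output is already available as a list of index pairs; the pairs shown are illustrative, not actual system output.

```python
def equivalence_classes(num_nps, coreferent_pairs):
    """Group NP indices into classes given pairwise coreference decisions."""
    parent = list(range(num_nps))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, j in coreferent_pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    classes = {}
    for i in range(num_nps):
        classes.setdefault(find(i), []).append(i)
    return sorted(classes.values())

nps = ["Queen Elizabeth", "her", "husband", "King George VI", "the King", "his"]
pairs = [(0, 1), (2, 3), (3, 4), (4, 5)]     # hypothetical positive decisions
for cls in equivalence_classes(len(nps), pairs):
    print([nps[i] for i in cls])
```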
NP Coreference Examples
• Link all NPs that refer to the same entity
Queen Elizabeth set about transforming her husband,
King George VI, into a viable monarch. Logue,
a renowned speech therapist, was summoned to help
the King overcome his speech impediment...
Example from Cardie&Ng 2004
Training Instances
• 25 features per instance: 2 NPs, features, class
– lexical (3)
• string matching for pronouns, proper names, common nouns
– grammatical (18)
• pronoun_1, pronoun_2, demonstrative_2, indefinite_2, …
• number, gender, animacy
• appositive, predicate nominative
• binding constraints, simple contra-indexing constraints, …
• span, maximalnp, …
– semantic (2)
• same WordNet class
• alias
– positional (1)
• distance between the NPs in terms of # of sentences
– knowledge-based (1)
• naïve pronoun resolution algorithm
Classification & Clustering
• Classifiers:
– C4.5 (Decision Trees), RIPPER
• Cluster: Best-first, single link clustering
– Each NP in own class
– Test preceding NPs
– Select highest confidence coref, merge classes
• Tune: Training sample skew: class, type
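The best-first clustering step can be sketched as below. The `confidence` function stands in for the trained classifier and is an assumption, as are the threshold of 0.5 and the toy scores.

```python
def best_first_cluster(nps, confidence, threshold=0.5):
    """Link each NP to its most confident preceding antecedent, if any."""
    cluster_of = list(range(len(nps)))        # each NP starts in its own class
    for j in range(len(nps)):
        best, best_score = None, threshold
        for i in range(j):                    # test every preceding NP
            score = confidence(i, j)
            if score > best_score:
                best, best_score = i, score
        if best is not None:                  # merge with the best antecedent's class
            cluster_of[j] = cluster_of[best]
    classes = {}
    for idx, c in enumerate(cluster_of):
        classes.setdefault(c, []).append(nps[idx])
    return list(classes.values())

nps = ["Queen Elizabeth", "her", "King George VI", "the King", "his"]
# Hypothetical classifier confidences for (antecedent index, anaphor index) pairs.
scores = {(0, 1): 0.9, (2, 3): 0.8, (2, 4): 0.6, (3, 4): 0.7, (0, 4): 0.2}
print(best_first_cluster(nps, lambda i, j: scores.get((i, j), 0.0)))
```

Choosing the highest-confidence antecedent, rather than the closest one that passes the threshold, is what distinguishes best-first from closest-first linking.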
ALIAS = C: +
ALIAS = I:
| SOON_STR_NONPRO = C:
| | ANIMACY = NA: -
| | ANIMACY = I: -
| | ANIMACY = C: +
| SOON_STR_NONPRO = I:
| | PRO_STR = C: +
| | PRO_STR = I:
| | | PRO_RESOLVE = C:
| | | | EMBEDDED_1 = Y: -
| | | | EMBEDDED_1 = N:
| | | | | PRONOUN_1 = Y:
| | | | | | ANIMACY = NA: -
| | | | | | ANIMACY = I: -
| | | | | | ANIMACY = C: +
| | | | | PRONOUN_1 = N:
| | | | | | MAXIMALNP = C: +
| | | | | | MAXIMALNP = I:
| | | | | | | WNCLASS = NA: -
| | | | | | | WNCLASS = I: +
| | | | | | | WNCLASS = C: +
| | | PRO_RESOLVE = I:
| | | | APPOSITIVE = I: -
| | | | APPOSITIVE = C:
| | | | | GENDER = NA: +
| | | | | GENDER = I: +
| | | | | GENDER = C: -
Classifier for MUC-6 Data Set
Unsupervised Clustering
• Analogous features to supervised
• Distance measure: weighted sum of features
– Positive infinite weights: block clustering
– Negative infinite weights: cluster, unless blocked
– Others, heuristic
• If distance > r (cluster radius), non-coref
• Clustering:
– Each NP in own class
– Test each preceding NP for dist < r
• If so, cluster, UNLESS incompatible NP
• Performance: middling, between best and worst
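The distance measure and the radius test can be sketched as follows. The feature functions, weights, NP dictionaries, and the radius value are all hypothetical, chosen only to show how infinite weights block or force a merge; skipping features that do not fire avoids computing inf * 0.

```python
import math

INF = math.inf

# Hypothetical features: each returns 1.0 when it "fires" for the pair, else 0.0.
def number_mismatch(a, b): return float(a["number"] != b["number"])
def gender_mismatch(a, b): return float(a["gender"] != b["gender"])
def head_mismatch(a, b):   return float(a["head"] != b["head"])
def appositive(a, b):      return float(b.get("appositive_of") == a["id"])

WEIGHTS = [
    (number_mismatch, INF),    # +inf: blocks clustering
    (gender_mismatch, INF),
    (appositive, -INF),        # -inf: forces clustering unless blocked
    (head_mismatch, 1.0),      # finite, heuristic weight
]

def distance(a, b):
    total, force = 0.0, False
    for feature, weight in WEIGHTS:
        value = feature(a, b)
        if value == 0.0:
            continue                 # feature did not fire
        if weight == INF:
            return INF               # a block always wins
        if weight == -INF:
            force = True
        else:
            total += weight * value
    return -INF if force else total

def cluster(nps, r):
    cls = list(range(len(nps)))              # each NP in its own class
    for j in range(len(nps)):
        for i in range(j - 1, -1, -1):       # test each preceding NP
            if cls[i] == cls[j] or distance(nps[i], nps[j]) >= r:
                continue
            members_i = [k for k in range(len(nps)) if cls[k] == cls[i]]
            members_j = [k for k in range(len(nps)) if cls[k] == cls[j]]
            # Merge UNLESS some pair across the two classes is incompatible.
            if all(distance(nps[a], nps[b]) != INF
                   for a in members_i for b in members_j):
                for k in members_j:
                    cls[k] = cls[i]
    groups = {}
    for k, c in enumerate(cls):
        groups.setdefault(c, []).append(nps[k]["id"])
    return sorted(groups.values())

nps = [
    {"id": "Queen Elizabeth", "head": "Elizabeth", "number": "sg", "gender": "f"},
    {"id": "her", "head": "her", "number": "sg", "gender": "f"},
    {"id": "King George VI", "head": "George", "number": "sg", "gender": "m"},
    {"id": "the King", "head": "King", "number": "sg", "gender": "m"},
]
print(cluster(nps, r=1.5))
```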
Problem 1
• Coreference is a rare relation
– skewed class distributions (2% positive instances)
– remove some negative instances
[Figure: sequence of noun phrases NP1 … NP9, with the farthest antecedent marked]
Problem 2
• Coreference is a discourse-level problem
– different solutions for different types of NPs
• proper names: string matching and aliasing
– inclusion of “hard” positive training instances
– positive example selection: selects easy positive training instances (cf. Harabagiu et al. (2001))
Queen Elizabeth set about transforming her husband,
King George VI, into a viable monarch. Logue,
the renowned speech therapist, was summoned to help
the King overcome his speech impediment...
Problem 3
• Coreference is an equivalence relation
– loss of transitivity
– need to tighten the connection between classification and clustering
– prune learned rules w.r.t. the clustering-level coreference scoring function
[Figure: pairwise decisions among the NPs in "[Queen Elizabeth] set about transforming [her] [husband], ...": two pairs labeled "coref ?", one labeled "not coref ?"]
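Because each pair is classified independently, the decisions can be mutually inconsistent. A small check for such triples, using hypothetical decisions over the three NPs in the example (the specific pair labels are assumed for illustration):

```python
from itertools import combinations

def transitivity_violations(mentions, coref):
    """coref: set of frozenset pairs that the classifier judged coreferent."""
    def linked(a, b):
        return frozenset((a, b)) in coref

    violations = []
    for a, b, c in combinations(mentions, 3):
        links = [linked(a, b), linked(b, c), linked(a, c)]
        if sum(links) == 2:          # two links imply the third, which is missing
            violations.append((a, b, c))
    return violations

mentions = ["Queen Elizabeth", "her", "husband"]
decisions = {
    frozenset(("Queen Elizabeth", "her")),   # coref
    frozenset(("her", "husband")),           # coref (a classifier error)
}                                            # Queen Elizabeth / husband: not coref
print(transitivity_violations(mentions, decisions))
```

Any clustering step has to break such a triple one way or the other, which is why the classifier's pairwise accuracy and the final clustering score can diverge.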
Weakly Supervised Learning
• Exploit small pool of labeled training data
– Larger pool unlabeled
• Single-View Multi-Learner Co-training
– 2 different learning algorithms, same feature set
– each classifier labels unlabeled instances for the other classifier
– data pool is flushed after each iteration
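The co-training loop can be sketched with two toy learners that share one feature (a single number) but use different algorithms. Everything here is an illustrative assumption: the learners, the confidence measure, the pool size, and the choice to simply drop unpicked pool instances when the pool is flushed. The helper assumes at least two classes in the training data.

```python
def _nearest_label(x, points):
    """Return (label, confidence) from the closest point of each class."""
    best = {}
    for px, y in points:
        d = abs(x - px)
        best[y] = min(d, best.get(y, d))
    ranked = sorted((d, y) for y, d in best.items())
    (d0, label), (d1, _) = ranked[0], ranked[1]
    return label, (d1 - d0) / (d1 + d0 + 1e-9)

class NearestCentroid:
    """Learner 1: assigns the label of the closest class mean."""
    def fit(self, data):
        totals = {}
        for x, y in data:
            s, n = totals.get(y, (0.0, 0))
            totals[y] = (s + x, n + 1)
        self.points = [(s / n, y) for y, (s, n) in totals.items()]
    def predict(self, x):
        return _nearest_label(x, self.points)

class NearestNeighbor:
    """Learner 2: assigns the label of the closest training instance."""
    def fit(self, data):
        self.points = list(data)
    def predict(self, x):
        return _nearest_label(x, self.points)

def label_pool(learner, pool, k):
    """The learner's k most confident labelings of the pool, as (x, label) pairs."""
    scored = sorted(((learner.predict(x), x) for x in pool),
                    key=lambda item: item[0][1], reverse=True)
    return [(x, label) for (label, _), x in scored[:k]]

def co_train(a, b, labeled, unlabeled, pool_size=2, per_round=1):
    train_a, train_b = list(labeled), list(labeled)
    unlabeled = list(unlabeled)
    while unlabeled:
        pool, unlabeled = unlabeled[:pool_size], unlabeled[pool_size:]
        a.fit(train_a)
        b.fit(train_b)
        new_for_b = label_pool(a, pool, per_round)   # a labels instances for b
        new_for_a = label_pool(b, pool, per_round)   # b labels instances for a
        train_a += new_for_a
        train_b += new_for_b
        # Pool is flushed here: unpicked instances are dropped (an assumption).
    a.fit(train_a)
    b.fit(train_b)
    return a, b

labeled = [(0.0, 0), (10.0, 1)]                      # small labeled pool
unlabeled = [1.0, 9.0, 2.0, 8.0, 4.0, 6.0]           # larger unlabeled pool
a, b = co_train(NearestCentroid(), NearestNeighbor(), labeled, unlabeled)
print(a.predict(3.0)[0], b.predict(9.0)[0])
```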
Effectiveness
• Supervised learning approaches
– Comparable performance to knowledge-based
• Weakly supervised approaches
– Decent effectiveness, still lags supervised
– Dramatically less labeled training data
• 1K vs 500K
Reference Resolution: Extensions
• Cross-document co-reference (Baldwin & Bagga 1998)
– Break “the document boundary”
– Question: “John Smith” in A = “John Smith” in B?
– Approach:
• Integrate within-document co-reference with Vector Space Model similarity
Cross-document Co-reference
• Run within-document co-reference (CAMP)
– Produce chains of all terms used to refer to entity
• Extract all sentences with reference to entity
– Pseudo per-entity summary for each document
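Comparing two per-entity summaries with a vector space model can be sketched as a cosine similarity over word counts. This is a simplification: raw term counts are used instead of any term weighting scheme, and the threshold and example summaries are made up.

```python
import math
from collections import Counter

def vectorize(summary):
    """Bag-of-words vector for a per-entity summary."""
    return Counter(summary.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

def same_entity(summary_a, summary_b, threshold=0.5):
    """Hypothetical decision rule: link the two chains if similarity exceeds the threshold."""
    return cosine(vectorize(summary_a), vectorize(summary_b)) > threshold

doc_a = "john smith chairman of general motors announced record profits"
doc_b = "general motors chairman john smith said profits rose"
doc_c = "john smith the pitcher threw a shutout on sunday"
print(same_entity(doc_a, doc_b))   # True
print(same_entity(doc_a, doc_c))   # False
```

The shared name alone contributes some similarity to every pair, so the threshold has to be set above what the name tokens produce by themselves.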
Cross-document Co-reference
• Experiments:
– 197 NYT articles referring to “John Smith”
• 35 different people, 24 of them in only 1 article each
• With CAMP: Precision 92%; Recall 78%
• Without CAMP: Precision 90%; Recall 76%
• Pure Named Entity: Precision 23%; Recall 100%
Conclusions
• Co-reference establishes coherence
• Reference resolution depends on coherence
• Variety of approaches:
– Syntactic constraints, Recency, Frequency, Role
• Similar effectiveness - different requirements
• Co-reference can enable summarization
Coherence & Coreference
• Cohesion: Establishes semantic unity of discourse
– Necessary condition
– Different types of cohesive forms and relations
– Enables interpretation of referring expressions
• Reference resolution
– Syntactic/Semantic Constraints/Preferences
– Discourse, Task/Domain, World knowledge
• Structure and semantic constraints
Challenges
• Alternative approaches to reference resolution
– Different constraints, rankings, combination
• Different types of referent
– Speech acts, propositions, actions, events
– “Inferrables” - e.g. car -> door, hood, trunk,..
– Discontinuous sets
– Generics
– Time
Discourse Structure Theories
Natural Language Processing
CMSC 35100-1
January 22, 2008
Roadmap
• Goals of Discourse Structure Models
– Limitations of early approaches
• Models of Discourse Structure
– Attention & Intentions (Grosz & Sidner 86)
– Rhetorical Structure Theory (Mann & Thompson 87)
• Contrasts, Constraints & Conclusions
Why Model Discourse Structure? (Theoretical)
• Discourse: not just constituent utterances
– Create joint meaning
– Context guides interpretation of constituents
– How????
– What are the units?
– How do they combine to establish meaning?
• How can we derive structure from surface forms?
– What makes discourse coherent vs not?
– How do they influence reference resolution?
Why Model Discourse Structure? (Applied)
• Design better summarization, understanding
• Improve speech synthesis
– Influenced by structure
• Develop approach for generation of discourse
• Design dialogue agents for task interaction
• Guide reference resolution
Early Discourse Models
• Schemas & Plans (McKeown, Reichman, Litman & Allen)
– Task/Situation model = discourse model
• Specific->General: “restaurant” -> AI planning
• Topic/Focus Theories (Grosz 76, Sidner 76)
– Reference structure = discourse structure
• Speech Act
– Single-utterance intentions vs extended discourse
Discourse Models: Common Features
• Hierarchical, Sequential structure applied to subunits
– Discourse “segments”
– Need to detect, interpret
• Referring expressions provide coherence
– Explain and link
• Meaning of discourse more than that of component utterances
• Meaning of units depends on context
Earlier Models
• Issues:
– Conflate different aspects of discourse
• Task plan, discourse plan
– Ignore aspects of discourse
• Goals & intentions vs focus
– Overspecific
• Fixed plan, schema, relation inventory
Attention, Intentions and the Structure of Discourse
• Grosz & Sidner (1986)
• Goals:
– Integrate approaches for focus (reference res.), plan/task structure, discourse structure, goals
• Three part model:
– Linguistic structure (utterances)
– Attentional structure (focus, reference)
– Intentional structure (plans, purposes)
Linguistic Structure
• Utterances group into discourse segments
– Hierarchical, not necessarily contiguous
– Not strictly decompositional
• 2-way interactions
– Utterances define structure
• Cue phrases mark segment boundaries (but, okay, fine, incidentally)
– Structure guides interpretation
• Reference
Intentional Structure
• Discourse & participants: overall purpose
– Discourse segments have purposes (DP/DSP)
• Contribute to overall purpose
• Main DP/DSP intended to be recognized
Intentional Structure: Relations
• Two relations between purposes
– Dominance
• DSP1 dominates DSP2 if doing DSP2 contributes to achieving DSP1
– Satisfaction-Precedence
• DSP1 must be satisfied before DSP2
• Purposes:
– Intend that someone know something, do something, believe something, etc.
– Open-ended
Attentional State
• Captures focus of attention in discourse
– Incremental
– Focus Spaces
• Include entities salient/evoked in discourse
• Include a current DSP
• Stack-structured:
– higher->more salient, lower still accessible
– Push: segment contributes to previous DSP
– Pop: segment contributes to more dominant DSP
» Tied to intentional structure
Attentional State (cont.)
• Focusing structure depends on the intentional structure: the relationships between DSPs determine pushes and pops from the stack
• Focusing structure coordinates the linguistic and intentional structures during processing
• Like the other 2 structures, focusing structure evolves as discourse proceeds
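The way dominance relations between DSPs drive pushes and pops can be sketched with a small stack class. This is a minimal sketch under stated assumptions: the class and method names are made up, dominance is supplied as explicit (dominating, dominated) pairs, and the entity names are only illustrative.

```python
class FocusStack:
    """Stack of focus spaces, each holding a DSP and its salient entities."""

    def __init__(self, dominates):
        # dominates: set of (dsp1, dsp2) pairs meaning dsp1 dominates dsp2
        self.dominates = dominates
        self.stack = []                      # last item is the top (most salient)

    def open_segment(self, dsp, entities):
        # Pop focus spaces until the top DSP dominates the new segment's DSP,
        # i.e. until the new segment contributes to the purpose on top.
        while self.stack and (self.stack[-1][0], dsp) not in self.dominates:
            self.stack.pop()
        self.stack.append((dsp, set(entities)))

    def accessible(self):
        """Entities from the top focus space down to the bottom one."""
        return [e for _, ents in reversed(self.stack) for e in sorted(ents)]

dominance = {("DSP0", "DSP1"), ("DSP0", "DSP2")}
focus = FocusStack(dominance)
focus.open_segment("DSP0", ["flywheel"])
focus.open_segment("DSP1", ["screw"])         # push: contributes to DSP0
focus.open_segment("DSP2", ["wheelpuller"])   # pops DSP1, then pushes under DSP0
print([dsp for dsp, _ in focus.stack])        # ['DSP0', 'DSP2']
print(focus.accessible())                     # ['wheelpuller', 'flywheel']
```

After the pop, "screw" is no longer in any focus space on the stack, so under this model it is less available as a pronoun antecedent than "wheelpuller" or "flywheel".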
Discourse examples
• Essay
• Task-oriented dialog
– Intentional structure is neither identical nor isomorphic to the general plan
0
The "movies" are so attractive to the great American public, especially to young
people, that it is time to take careful thought about their effect on mind and morals.
1
Ought any parent to permit his children to attend a moving picture show often or
without being quite certain of the show he permits them to see?
2
3
No one can deny, of course, that great educational and ethical gains may be made
through the movies because of their astonishing vividness.
But the important fact to be determined is the total result of continuous and
indiscriminate attendance on shows of this kind. Can it be other than harmful?
4
5
In the first place the character of the plays is seldom of the best.
One has only to read the ever-present "movie" billboard to see how cheap,
melodramatic and vulgar most of the photoplays are.
6
Even the best plays, moreover, are bound to be exciting and over-emotional.
Without spoken words, facial expression and gesture must carry the meaning: but
only strong emotion or buffoonery can be represented through facial expression and
gesture. The more reasonable and quiet aspects of life are necessarily neglected.
How can our young people drink in through their eyes a continuous spectacle of
intense and strained activity and feeling without harmful effects?
Parents and teachers will do well to guard the young against overindulgence in the
taste for the "movie".
H:1. First you have to remove the flywheel.
R:2. How do I remove the flywheel?
H:3. First, loosen the screw, then pull it off.
R:4. OK.
5. The tool I have is awkward. Is there another tool that I could use instead?
H:6. Show me the tool you are using.
R:7. OK.
H:8. Are you sure you are using the right size key?
R:9. I’ll try some others.
10. I found an angle I can get at it.
11. The screw is loose, but I’m having trouble getting the flywheel off.
H:12. Use the wheelpuller. Do you know how to use it?
R:13. No.
H:14. Do you know what it looks like?
R:15. Yes.
H:16. Show it to me please.
R:17. OK.
H:18. Good. Loosen the screw in the center and place the jaws around the hub of the flywheel, then tighten the screw onto the center of the shaft. The flywheel should slide off.
Processing issues
• Intention recognition
– What info can be used to recognize an intention?
– At what point does this info become available?
• Overall processing module has to be able to operate on partial information
• It must allow for incrementally constraining the range of possibilities on the basis of new info that becomes available as the segment progresses
• Info constraining DSP:
– Specific linguistic markers
– Utterance-level intentions
– General knowledge about actions and objects in the domain of discourse
• Applications of the theory:
– Interruptions
• Weak – not linked to immediate DSP
• Strong - not linked to any DSP
– Cue words
Interruption
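• John came by and left the groceries
• Stop that you kids
• And I put them away after he left
[Figure: two focus-stack snapshots: one with the focus space "kids" (DSP2) on top of "John, groceries" (DSP1), and one with only "John, groceries" (DSP1)]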
Conclusions
• Generalizes approaches to task-oriented dialogue
– Goal: Domain-independence
– Broad, general, abstract model
• Accounts for interesting phenomena
– Interruptions, returns, cue phrases
More conclusions
• Asks more questions than it answers.
• How do we implement these aspects of dialog?
– Is it remotely feasible????