Shallow Semantics

Semantics and Pragmatics
High-level Linguistics (the good stuff!)
Semantics: the study of meaning that can be determined from a sentence, phrase, or word.
Pragmatics: the study of meaning as it depends on context (speaker, situation, dialogue history).
Language to (Simplistic) Logic
• John went to the book store.
  go(John, store1)
• John bought a book.
  buy(John, book1)
• John gave the book to Mary.
  give(John, book1, Mary)
• Mary put the book on the table.
  put(Mary, book1, on table1)
What’s missing?
• Word sense disambiguation
• Quantification
• Coreference
• Interpreting within a phrase
• Many, many more issues …
• But it’s still more than you get from parsing!
Some problems in shallow semantics
1. Identifying entities
   – noun-phrase chunking
   – named-entity recognition
   – coreference resolution (involves discourse/pragmatics too)
2. Identifying relationship names
   – Verb-phrase chunking
   – Predicate identification (step 0 of semantic role labeling)
   – Synonym resolution (e.g., get = receive)
3. Identifying arguments to predicates
   – Argument identification (step 1 of semantic role labeling)
   – Assigning semantic roles (step 2 of semantic role labeling)
4. Information extraction
5. Sentiment classification
   – That is, does the relationship express an opinion?
   – If so, is the opinion positive or negative?
1. Identifying Entities
Named Entity Tagging: Identify all the proper names in a text
Sally went to see Up in the Air at the local theater.
  [Sally → Person]   [Up in the Air → Film]
Noun Phrase Chunking: Find all base noun phrases
(that is, noun phrases that don’t have smaller noun phrases
nested inside them)
Sally went to see Up in the Air at the local theater on Elm Street.
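One common way to find base noun phrases is a regular-expression chunker over POS tags. Below is a minimal sketch using NLTK; the tool choice, the tag pattern, and the slightly simplified sentence are illustrative choices, not something the slides prescribe.

import nltk

# Assumes the NLTK tokenizer and POS-tagger data packages have been downloaded.
grammar = "NP: {<DT>?<JJ>*<NN.*>+}"   # optional determiner, adjectives, then one or more nouns
chunker = nltk.RegexpParser(grammar)

sentence = "Sally went to see the film at the local theater on Elm Street."
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
tree = chunker.parse(tagged)          # base NP chunks appear as subtrees labeled 'NP'

for subtree in tree.subtrees(lambda t: t.label() == "NP"):
    print(" ".join(word for word, tag in subtree.leaves()))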
1. Identifying Entities (2)
Parsing: Identify all phrase constituents, which
will of course include all noun phrases.
One bracketed parse of "Sally saw Up in the Air at the theater on Elm St.":

(S (NP (N Sally))
   (VP (V saw)
       (NP Up in the Air)
       (PP (P at)
           (NP (NP the theater)
               (PP (P on) (NP Elm St.))))))
1. Identifying Entities (3)
Coreference Resolution: Identify all references
(aka ‘mentions’) of people, places and things
in text, and determine which mentions are
‘co-referential’.
John stuck his foot in his mouth.
2. Identifying relationship names
Verb phrase chunking: the most common approach

Some issues:
1. Often, prepositions/particles "belong" with the relation name:
   You're ticking me off.
2. Many relationships are expressed without a verb:
   Jack Welch, CEO of GE, …
3. Some verbs don't really express a meaningful relationship by themselves:
   Jim is the father of 12 boys.
4. Verb sense disambiguation
5. Synonymy:
   ticking off = bothering
2. Identifying relationship names (2)
Synonym Resolution: Discovery of Inference Rules from Text (DIRT) (Lin and Pantel, 2001)
1. They collect millions of examples of Subject–Verb–Object triples by parsing a Web corpus.
2. For a pair of verbs, v1 and v2, they compute mutual information scores between
   – the vector space model (VSM) for the subjects of v1 and the VSM for the subjects of v2
   – the VSM for the objects of v1 and the VSM for the objects of v2
3. They cluster verbs with high MI scores between them.
[Figure: web-text contexts illustrating that "give" and "donate" share argument fillers such as money, blood, gift, car, and members.]
See (Yates and Etzioni, JAIR 2009)
for a more recent approach
using probabilistic models.
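DIRT's actual similarity computation uses mutual-information-weighted vectors; the rough sketch below (toy counts, plain cosine similarity instead of MI weighting) only illustrates the idea of comparing two verbs by the distributions of their subject and object fillers.

from collections import Counter
import math

def cosine(c1, c2):
    # Cosine similarity between two sparse count vectors.
    dot = sum(c1[w] * c2[w] for w in c1 if w in c2)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Toy subject/object filler counts standing in for triples parsed from a Web corpus.
subjects = {"give":   Counter({"member": 3, "donor": 5, "you": 7}),
            "donate": Counter({"member": 2, "donor": 4, "you": 5})}
objects  = {"give":   Counter({"money": 9, "blood": 2, "gift": 4}),
            "donate": Counter({"money": 8, "blood": 3, "car": 1})}

similarity = (cosine(subjects["give"], subjects["donate"])
              + cosine(objects["give"], objects["donate"]))
print(similarity)   # a high combined score suggests give ≈ donate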
5. Sentiment Classification
Given a review (about a movie, hotel, Amazon product, etc.), a
sentiment classification system tries to determine what
opinions are expressed in the review.
Coarse-level objective: is the review positive, negative, or
neutral overall?
Fine-grained objective: what are the positive aspects
(according to the reviewer), and what are the negative
aspects?
Question: what technique(s) would you use to solve these
two problems?
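For the coarse-level question, one standard answer is a bag-of-words classifier. Below is a minimal scikit-learn sketch; the tiny training set is made up for illustration, and a real system would use thousands of labeled reviews.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = ["great movie, loved the acting",
           "terrible plot and boring characters",
           "the hotel room was clean and quiet",
           "awful service, would not return"]
labels = ["pos", "neg", "pos", "neg"]

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(reviews, labels)                          # learn word weights from labeled reviews
print(model.predict(["clean room and great acting"]))   # likely 'pos' given the toy data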
Semantic Role Labeling
a.k.a., Shallow Semantic Parsing
Semantic Role Labeling
Semantic role labeling is the computational task of
assigning semantic roles to phrases
It’s usually divided into three subtasks:
1. Predicate identification
2. Argument Identification
3. Argument Classification -- assigning semantic roles
       John    broke   the      window   with    a       hammer
Tag:   B-Arg   Pred    B-Arg    I-Arg    B-Arg   I-Arg   I-Arg
Role:  Agent           Patient           Means (or instrument)
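The B-Arg/I-Arg tags above are a BIO-style encoding of argument spans. The small helper below (an illustrative utility, not from the slides) recovers (start, end) token spans from such a tag sequence:

def spans_from_bio(tags):
    # Collect [start, end) spans of tokens tagged B-Arg / I-Arg.
    spans, start = [], None
    for i, tag in enumerate(tags + ["O"]):   # sentinel tag flushes the final span
        if tag == "B-Arg":
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag != "I-Arg":
            if start is not None:
                spans.append((start, i))
            start = None
    return spans

# "John broke the window with a hammer."
tags = ["B-Arg", "Pred", "B-Arg", "I-Arg", "B-Arg", "I-Arg", "I-Arg"]
print(spans_from_bio(tags))   # [(0, 1), (2, 4), (4, 7)] = John / the window / with a hammer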
Same event - different sentences
John broke the window with a hammer.
John broke the window with the crack.
The hammer broke the window.
The window broke.
Same event - different syntactic frames
John broke the window with a hammer.     (SUBJ  VERB  OBJ  MODIFIER)
John broke the window with the crack.    (SUBJ  VERB  OBJ  MODIFIER)
The hammer broke the window.             (SUBJ  VERB  OBJ)
The window broke.                        (SUBJ  VERB)
Semantic role example
break(AGENT, INSTRUMENT, PATIENT)

John broke the window with a hammer.     (AGENT  PATIENT  INSTRUMENT)
The hammer broke the window.             (INSTRUMENT  PATIENT)
The window broke.                        (PATIENT)

(Fillmore 1968, "The Case for Case")
John broke the window with a hammer.     (SUBJ = AGENT,  OBJ = PATIENT,  MODIFIER = INSTRUMENT)
The hammer broke the window.             (SUBJ = INSTRUMENT,  OBJ = PATIENT)
The window broke.                        (SUBJ = PATIENT)
Semantic roles
Semantic roles (or just roles) are slots, belonging to a predicate, which arguments can fill.
– There are different naming conventions, but one common set of names for semantic roles is agent, patient, means/instrument, ….

Some constraints:
1. Only certain kinds of phrases can fill certain kinds of semantic roles:
   "with a crack" will never be an agent.
   But many phrases are ambiguous: is "hammer" a patient or an instrument?
2. Syntax provides a clue, but it is not the full answer:
   Subject → Agent? Patient? Instrument?
Slot Filling (Argument Classification)

Phrases              Slots
John            →    Agent
broke           →    Pred
the window      →    Patient
with a hammer   →    Means (or instrument)
Slot Filling (Argument Classification)

Phrases              Slots
The hammer      →    Means (or instrument)
broke           →    Pred
the window      →    Patient
(no phrase)     →    Agent
Slot Filling (Argument Classification)

Phrases              Slots
The window      →    Patient
broke           →    Pred
(no phrase)     →    Agent
(no phrase)     →    Means (or instrument)
Slot Filling and Shallow Semantics

Phrases              Slots
John            →    Agent
broke           →    Pred
the window      →    Patient
with a hammer   →    Means (or instrument)

Shallow semantics:
broke(John, the window, with a hammer)
Pred  Agent  Patient     Means (or instrument)
Slot Filling and Shallow Semantics

Phrases              Slots
The window      →    Patient
broke           →    Pred
(no phrase)     →    Agent
(no phrase)     →    Means (or instrument)

Shallow semantics:
broke( ?x , the window, ?y )
Pred   Agent  Patient    Means (or instrument)
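A toy illustration of this last step: turning filled slots into a shallow logical form, with variables for unfilled slots. The role names and ordering are simply the ones used on these slides.

def shallow_form(pred, slots, roles=("Agent", "Patient", "Means")):
    # Unfilled roles become fresh variables, as in broke(?x, the window, ?y).
    args, n = [], 0
    for role in roles:
        if role in slots:
            args.append(slots[role])
        else:
            n += 1
            args.append("?x%d" % n)
    return "%s(%s)" % (pred, ", ".join(args))

print(shallow_form("broke", {"Agent": "John", "Patient": "the window",
                             "Means": "with a hammer"}))
print(shallow_form("broke", {"Patient": "the window"}))   # broke(?x1, the window, ?x2)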
Semantic Role Labeling Techniques
We'll cover three approaches to SRL:
1. Basic (Gildea and Jurafsky, Comp. Ling. 2002)
2. Joint inference for argument structure (Toutanova et al., Comp. Ling. 2008)
3. Open-domain (Huang and Yates, ACL 2010)
1. Gildea and Jurafsky
Main idea: start with parse tree, and try to identify constituents that are arguments.
G&J (1)
Build a (probabilistic) classifier that predicts, for each constituent, which role it fills (a sketch follows the feature list).
– Essentially a maximum-entropy classifier, although it's not described that way.

Features for Argument Classification:
1. Phrase type of constituent
2. Governing category of NPs – S or VP (differentiates between subjects and objects)
3. Position w.r.t. predicate (before or after)
4. Voice of predicate (active or passive verb)
5. Head word of constituent
6. Parse tree path between predicate and constituent
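A minimal sketch of a feature-based argument classifier in this style, using scikit-learn's LogisticRegression as the maximum-entropy model. The feature values and the path encoding below are made up for illustration and are not G&J's implementation.

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each candidate constituent is described by a dictionary of the features above.
X = [{"phrase_type": "NP", "gov_cat": "S",  "position": "before", "voice": "active",
      "head": "John",   "path": "VB↑VP↑S↓NP"},
     {"phrase_type": "NP", "gov_cat": "VP", "position": "after",  "voice": "active",
      "head": "window", "path": "VB↑VP↓NP"},
     {"phrase_type": "PP", "gov_cat": "VP", "position": "after",  "voice": "active",
      "head": "hammer", "path": "VB↑VP↓PP"}]
y = ["Agent", "Patient", "Instrument"]

clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(X, y)                 # in practice: many thousands of labeled constituents
print(clf.predict([X[0]]))    # should recover the training label 'Agent'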
G&J (2) – Parse Tree Path Feature
Parse tree path (or just path) feature: determines the syntactic relationship between the predicate and the current constituent.
In this example, the path feature is: VB ↑ VP ↑ S ↓ NP
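A sketch of how the path feature could be computed from an NLTK parse tree (illustrative code, not G&J's implementation): walk from the predicate's node up to the lowest common ancestor, then down to the constituent.

from nltk import Tree

def parse_tree_path(tree, pred_pos, arg_pos):
    # pred_pos / arg_pos are NLTK tree positions (tuples) of the predicate's
    # preterminal node and of the candidate constituent.
    k = 0
    while k < min(len(pred_pos), len(arg_pos)) and pred_pos[k] == arg_pos[k]:
        k += 1                 # longest common prefix = lowest common ancestor (LCA)
    lca = pred_pos[:k]
    up = [tree[pred_pos[:i]].label() for i in range(len(pred_pos), len(lca) - 1, -1)]
    down = [tree[arg_pos[:i]].label() for i in range(len(lca) + 1, len(arg_pos) + 1)]
    return "↑".join(up) + "".join("↓" + label for label in down)

tree = Tree.fromstring("(S (NP (NNP John)) (VP (VB broke) (NP (DT the) (NN window))))")
print(parse_tree_path(tree, (1, 0), (0,)))   # VB↑VP↑S↓NP, i.e. VB ↑ VP ↑ S ↓ NP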
G&J (3)
4086 possible values of the Path feature in training data.
A sparse feature!
G&J (4)
Build a (probabilistic) classifier that predicts, for each constituent, whether it is an argument of the predicate.
– Essentially a maximum-entropy classifier, although it's not described that way.

Features for Argument Identification:
1. Predicate word
2. Head word of constituent
3. Parse tree path between predicate and constituent
G&J (5): Results
Task                              Best Result
Argument Identification (only)    92% prec., 86% rec., 0.89 F1
Argument Classification (only)    78.5% assigned correct role
2. Toutanova, Haghighi, and Manning
A Global Joint Model for SRL (Comp. Ling., 2008)
Main idea(s):
Include features that depend on multiple
arguments
Use multiple parsers as input, for robustness
THM (1): Motivation
1. “The day that the ogre cooked the children is still remembered.”
2. “The meal that the ogre cooked the children is still remembered.”
Both sentences have identical syntax.
They differ in only 1 word (day vs. meal).
If we classify arguments 1 at a time, “the children” will be labeled the same
thing in both cases.
But in (1), “the children” is the Patient (thing being cooked).
And in (2), “the children” is the Beneficiary (people for whom the cooking is
done).
Intuitively, we can’t classify these arguments independently.
THM(2): Features
Features:
1. Whole label sequence
1. [voice:active, Arg1, pred, Arg4, ArgM-TMP]
2. [voice:active, lemma:accelerated, Arg1, pred, Arg4, ArgM-TMP]
3. [voice:active, lemma:accelerated, Arg1, pred, Arg4] (no adjuncts)
4. [voice:active, lemma:accelerated, Arg, pred, Arg] (no adjuncts, no #s)
2. Syntax and semantics in the label sequence
1. [voice:active, NP-Arg1, pred, PP-Arg4]
2. [voice:active, lemma:accelerated, NP-Arg1, pred, PP-Arg4]
3. Repetition features: whether Arg1 (for example) appears multiple times
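An illustrative sketch (not the authors' code) of how such whole-label-sequence features can be built as single string-valued features from a candidate labeling:

def label_sequence_features(voice, lemma, labels):
    # labels is the candidate argument labeling in sentence order,
    # e.g. ["Arg1", "pred", "Arg4", "ArgM-TMP"].
    core = [l for l in labels if not l.startswith("ArgM")]           # drop adjuncts
    unnumbered = ["Arg" if l.startswith("Arg") else l for l in core]  # drop argument numbers
    return {
        "seq":              "|".join(["voice:" + voice] + labels),
        "seq+lemma":        "|".join(["voice:" + voice, "lemma:" + lemma] + labels),
        "core+lemma":       "|".join(["voice:" + voice, "lemma:" + lemma] + core),
        "core+lemma+nonum": "|".join(["voice:" + voice, "lemma:" + lemma] + unnumbered),
    }

print(label_sequence_features("active", "accelerated", ["Arg1", "pred", "Arg4", "ArgM-TMP"]))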
THM(3): Classifier
• First, for each sentence, obtain the top-10
most likely parse tree/semantic role label
outputs from G&J
• Build a max-ent classifier to select from these
10, using the features above
• Also, include top-10 parses from the Charniak
parser
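The overall decision is then a reranking step. A schematic sketch of that step, where the scoring function stands in for the trained max-ent model:

def rerank(candidates, feature_fn, score_fn):
    # candidates: (parse tree, semantic-role labeling) pairs from the base system
    # feature_fn: maps a candidate to a feature dictionary (e.g., the features above)
    # score_fn:   the trained model's score for a feature dictionary
    return max(candidates, key=lambda cand: score_fn(feature_fn(cand)))

# Toy usage with a stand-in scoring function:
best = rerank([("parse1", ["Arg0", "pred", "Arg1"]),
               ("parse2", ["Arg1", "pred", "Arg0"])],
              feature_fn=lambda cand: {"seq": "|".join(cand[1])},
              score_fn=lambda feats: 1.0 if feats["seq"].startswith("Arg0") else 0.0)
print(best)   # ('parse1', ['Arg0', 'pred', 'Arg1'])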
THM(4): Results
These are on a different data set from G&J, so
results not directly comparable. But the local
model is similar to G&J, so think of that as the
comparison.
Model                   WSJ (ID & CLS)    Brown (ID & CLS)
Local                   78.00             65.55
Joint (1 parse)         79.71             67.79
Joint (top 5 parses)    80.32             68.81
Results show F1 scores for IDentification and CLaSsification of arguments together.
WSJ is the Wall Street Journal test set, a collection of approximately 4,000 news
sentences.
Brown is a smaller collection of fiction stories.
The system is trained on a separate set of WSJ sentences.
3. Huang and Yates
Open-Domain SRL by Modeling Word Spans, ACL 2010
Main Idea:
One of the biggest problems for SRL systems is that
they need lexical features to classify arguments, but
lexical features are sparse.
We build a simple SRL system that outperforms the
previous state-of-the-art on out-of-domain data, by
learning new lexical representations.
Simple, open-domain SRL – Baseline Features

                   Chris        broke   the    window   with    a      hammer
POS tag            Proper Noun  Verb    Det.   Noun     Prep.   Det.   Noun
Chunk tag          B-NP         B-VP    B-NP   I-NP     B-PP    B-NP   I-NP
Dist. from pred.   -1           0       +1     +2       +3      +4     +5
SRL label          Breaker      Pred    [ Thing Broken ]        [ Means ]
Simple, open-domain SRL – Baseline + HMM

                   Chris        broke   the    window   with    a      hammer
HMM label          (latent state assigned to each word by an HMM trained on unlabeled text; values not shown)
POS tag            Proper Noun  Verb    Det.   Noun     Prep.   Det.   Noun
Chunk tag          B-NP         B-VP    B-NP   I-NP     B-PP    B-NP   I-NP
Dist. from pred.   -1           0       +1     +2       +3      +4     +5
SRL label          Breaker      Pred    [ Thing Broken ]        [ Means ]
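A sketch of what "+HMM" means in feature terms: each token gets one extra feature giving the state an HMM trained on a large unlabeled corpus assigns to it. Here the state lookup is a hard-coded stub with hypothetical state names; the real system learns these representations.

# Hypothetical word-to-HMM-state lookup; in the real system these states come
# from an HMM trained on unlabeled text, so similar words tend to share states.
hmm_state = {"Chris": "s17", "broke": "s3", "the": "s1", "window": "s8",
             "with": "s5", "a": "s1", "hammer": "s8"}

def token_features(words, pos_tags, chunk_tags, pred_idx, i):
    # Baseline features plus the HMM-label feature for the i-th token.
    return {"word": words[i], "pos": pos_tags[i], "chunk": chunk_tags[i],
            "dist_from_pred": i - pred_idx,
            "hmm": hmm_state.get(words[i], "UNK")}

words  = ["Chris", "broke", "the", "window", "with", "a", "hammer"]
pos    = ["NNP", "VBD", "DT", "NN", "IN", "DT", "NN"]
chunks = ["B-NP", "B-VP", "B-NP", "I-NP", "B-PP", "B-NP", "I-NP"]
print(token_features(words, pos, chunks, pred_idx=1, i=6))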
The importance of paths
Chris [predicate broke] [thing broken a hammer]
Chris [predicate broke] a window with [means a hammer]
Chris [predicate broke] the desk, so she fetched [not an arg a hammer] and nails.
Simple, open-domain SRL – Baseline + HMM + Paths

             Chris     broke   the     window   with         a                  hammer
Word path    None      None    None    the      the-window   the-window-with    the-window-with-a
SRL label    Breaker   Pred    [ Thing Broken ] [ ------------- Means ------------- ]
Simple, open-domain SRL – Baseline + HMM + Paths

             Chris     broke   the     window   with         a                  hammer
POS path     None      None    None    Det      Det-Noun     Det-Noun-Prep      Det-Noun-Prep-Det
Word path    None      None    None    the      the-window   the-window-with    the-window-with-a
SRL label    Breaker   Pred    [ Thing Broken ] [ ------------- Means ------------- ]
Simple, open-domain SRL – Baseline + HMM + Paths

             Chris     broke   the     window   with         a                  hammer
HMM path     None      None    None    (HMM-state analogs of the word paths; values not shown)
POS path     None      None    None    Det      Det-Noun     Det-Noun-Prep      Det-Noun-Prep-Det
Word path    None      None    None    the      the-window   the-window-with    the-window-with-a
SRL label    Breaker   Pred    [ Thing Broken ] [ ------------- Means ------------- ]
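A sketch of the word-path feature itself: the words strictly between the predicate and the candidate token, joined into one string. Swapping in POS tags or HMM states for the words gives the POS-path and HMM-path variants.

def word_path(tokens, pred_idx, tok_idx):
    # Words strictly between the predicate and the token; 'None' if they are adjacent.
    lo, hi = sorted((pred_idx, tok_idx))
    between = tokens[lo + 1:hi]
    return "-".join(between) if between else "None"

tokens = ["Chris", "broke", "the", "window", "with", "a", "hammer"]
print(word_path(tokens, 1, 6))   # the-window-with-a
print(word_path(tokens, 1, 0))   # None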
Experimental results – F1
[Bar chart: F1 scores on the WSJ and Brown test sets; per-system values omitted here.]
All systems were trained on newswire text from the Wall Street Journal
(WSJ), and tested on WSJ and fiction texts from the Brown corpus (Brown).
Experimental results – F1
[Bar chart (continued): F1 scores on the WSJ and Brown test sets; per-system values omitted here.]
All systems were trained on newswire text from the Wall Street Journal
(WSJ), and tested on WSJ and fiction texts from the Brown corpus (Brown).
Span-HMMs
Span-HMM features

[Figure: for "Chris broke the window with a hammer", the Span-HMM feature for a candidate word (e.g., "a" or "hammer") is derived from the HMM's analysis of the entire span of words between the predicate "broke" and that word, rather than from the word in isolation. The resulting word-level labeling is the same as in the earlier tables: Chris = Breaker, broke = Pred, the window = Thing Broken, with a hammer = Means.]
Experimental results – SRL F1
[Bar chart: SRL F1 on the WSJ and Brown test sets, now including the Span-HMM features; per-system values omitted here.]
All systems were trained on newswire text from the Wall Street Journal
(WSJ), and tested on WSJ and fiction texts from the Brown corpus (Brown).
Experimental results – feature sparsity
Benefit grows with distance from predicate