SRL via Generalized Inference
Vasin Punyakanok, Dan Roth,
Wen-tau Yih, Dav Zimak, Yuancheng Tu
Department of Computer Science
University of Illinois at Urbana-Champaign
Page 1
Semantic Role Labeling
For each verb in a sentence
1. identify all constituents that fill a semantic role
2. determine their roles
  • Agent, Patient or Instrument, …
  • Their adjuncts, e.g., Locative, Temporal or Manner
PropBank project [Kingsbury & Palmer 02] provides
a large human-annotated corpus of semantic
verb-argument relations.
CoNLL-2004 shared task [Carreras & Marquez 04]
Page 2
Example
A0 represents the leaver,
A1 represents the thing left,
A2 represents the benefactor,
AM-LOC is an adjunct indicating the location of the action,
V determines the verb.
Page 3
Argument Types
A0-A5 and AA have different semantics for each
verb as specified in the PropBank Frame files.
13 types of adjuncts labeled as AM-XXX where
XXX specifies the adjunct type.
C-XXX is used to specify the continuity of the
argument XXX.
In some cases, the actual agent is labeled as the
appropriate argument type, XXX, while the
relative pronoun is instead labeled as R-XXX.
Page 4
Examples
C-XXX
R-XXX
Page 5
Outline
Find potential argument candidates
Classify arguments to types
Inference for Argument Structure
Cost Function
Constraints
Integer linear programming (ILP)
Results & Discussion
Page 6
Find Potential Arguments
An argument can be any sequence of consecutive words
Restrict potential arguments
BEGIN(word) = 1 “word begins argument”
END(word) = 1 “word ends argument”
[Figure: candidate argument brackets over the sentence “I left my nice pearls to her”]
Argument
$(w_i, \ldots, w_j)$ is a potential argument iff BEGIN($w_i$) = 1 and END($w_j$) = 1
Reduce set of potential arguments
Page 7
Details – Word-level Classifier
BEGIN(word)
Learn a function $B(\text{word}, \text{context}, \text{structure}) \to \{0,1\}$
END(word)
Learn a function $E(\text{word}, \text{context}, \text{structure}) \to \{0,1\}$
POTARG = {arg | BEGIN(first(arg)) and END(last(arg))}
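The POTARG definition can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `potential_arguments` and the 0/1 predictions in `begin` and `end` are hypothetical stand-ins for the outputs of the learned BEGIN and END classifiers.

```python
from typing import List, Tuple

def potential_arguments(begin: List[int], end: List[int]) -> List[Tuple[int, int]]:
    """Return every span (i, j), i <= j, where word i begins and word j ends an argument."""
    if len(begin) != len(end):
        raise ValueError("begin and end must have one entry per word")
    return [
        (i, j)
        for i, b in enumerate(begin) if b == 1
        for j in range(i, len(end)) if end[j] == 1
    ]

words = "I left my nice pearls to her".split()
# Hypothetical classifier outputs, one 0/1 decision per word (not from the slides).
begin = [1, 0, 1, 0, 0, 1, 0]
end = [1, 0, 0, 0, 1, 0, 1]

potarg = potential_arguments(begin, end)
spans = [" ".join(words[i:j + 1]) for i, j in potarg]
```

With these made-up predictions only 6 of the 28 possible spans of the seven-word sentence remain candidates, which is the sense in which the word-level classifiers reduce the set of potential arguments.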
Page 8
Arguments Type Likelihood
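Assign type-likelihood
How likely is it that arg a is type t?
For all $a \in \text{POTARG}$, $t \in T$: $P(\text{argument } a = \text{type } t)$
[Figure: candidate argument brackets over the sentence “I left my nice pearls to her”, with types A0, C-A1, A1, Ø]
Example likelihoods (each row sums to 1, one row per candidate argument):
0.3 0.2 0.2 0.3
0.6 0.0 0.0 0.4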
Page 9
Details – Phrase-level Classifier
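Learn a classifier
ARGTYPE(arg): $P(\text{arg}) \to \{\text{A0}, \text{A1}, \ldots, \text{C-A0}, \ldots, \text{AM-LOC}, \ldots\}$
$\arg\max_{t \in \{\text{A0}, \text{A1}, \ldots, \text{C-A0}, \ldots, \text{AM-LOC}, \ldots\}} w_t \cdot P(\text{arg})$
Estimate Probabilities
Softmax: $P(a = t) = \exp(w_t \cdot P(a)) / Z$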
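The softmax step can be illustrated with a short sketch. The per-type scores below are hypothetical values standing in for the products of the weight vector for type t with the representation of the argument; `NULL` stands in for the Ø label. Subtracting the maximum score before exponentiating is a standard numerical-stability trick and does not change the result.

```python
import math
from typing import Dict

def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn per-type scores into probabilities: exp(score) / Z."""
    m = max(scores.values())                       # shift for numerical stability
    exps = {t: math.exp(s - m) for t, s in scores.items()}
    z = sum(exps.values())                         # normalizer Z
    return {t: e / z for t, e in exps.items()}

# Hypothetical scores for one candidate argument (not from the slides).
scores = {"A0": 2.0, "A1": 1.0, "C-A1": 0.0, "NULL": 1.0}
probs = softmax(scores)
predicted = max(scores, key=scores.get)            # the argmax over types
```

Because the exponential is monotone, the most probable type under the softmax is the same as the argmax over raw scores; the probabilities are what the later inference step consumes.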
Page 10
What is a Good Assignment?
Likelihood of being correct
$P(\text{Arg } a = \text{Type } t)$ if t is the correct type for argument a
For a set of arguments $a_1, a_2, \ldots, a_n$
Expected number of arguments that are correct: $\sum_i P(a_i = t_i)$
We search for the assignment with the maximum
expected number of correct arguments.
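The sum of probabilities equals the expected number of correct arguments by linearity of expectation. A short derivation, writing $\mathbf{1}[\cdot]$ for an indicator (notation assumed here, not taken from the slides):

```latex
\mathbb{E}\Big[\sum_{i=1}^{n} \mathbf{1}[a_i \text{ has type } t_i]\Big]
  = \sum_{i=1}^{n} \mathbb{E}\big[\mathbf{1}[a_i \text{ has type } t_i]\big]
  = \sum_{i=1}^{n} P(a_i = t_i)
```

No independence between arguments is needed for this step.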
Page 11
Inference
Maximize expected number correct
$T^* = \arg\max_T \sum_i P(a_i = t_i)$
Subject to some constraints
Structural and Linguistic (R-A1 ⇒ A1)
[Figure: the sentence “I left my nice pearls to her” with four candidate arguments and their type likelihoods]
0.3 0.2 0.2 0.3
0.6 0.0 0.0 0.4
0.1 0.3 0.5 0.1
0.1 0.2 0.3 0.4
Independent Max: Cost = 0.3 + 0.6 + 0.5 + 0.4 = 1.8
Non-Overlapping: Cost = 0.3 + 0.4 + 0.5 + 0.4 = 1.6
Blue ⇒ Red & N-O: Cost = 0.3 + 0.4 + 0.3 + 0.4 = 1.4
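The first two costs on this slide (1.8 for the independent maximum, 1.6 with non-overlapping) can be reproduced by brute force over the four likelihood rows. This is a sketch under two assumptions that the extracted slide does not state: the last column is the null label Ø, and the second and third candidates overlap. The linguistic constraint behind the 1.4 figure is not modeled, since it depends on the lost figure.

```python
from itertools import product

# Likelihood rows copied from the slide, one row per candidate argument.
P = [
    [0.3, 0.2, 0.2, 0.3],
    [0.6, 0.0, 0.0, 0.4],
    [0.1, 0.3, 0.5, 0.1],
    [0.1, 0.2, 0.3, 0.4],
]
NULL = 3             # assumption: the last column is the null label
OVERLAPS = [(1, 2)]  # assumption: candidates 2 and 3 overlap

def score(assignment):
    """Expected number of correct arguments for one type per candidate."""
    return sum(P[i][t] for i, t in enumerate(assignment))

def non_overlapping(assignment):
    """At least one candidate of every overlapping pair must be null."""
    return all(assignment[i] == NULL or assignment[j] == NULL for i, j in OVERLAPS)

def best(feasible=lambda a: True):
    """Exhaustively search all assignments that satisfy `feasible`."""
    candidates = [a for a in product(range(4), repeat=len(P)) if feasible(a)]
    return max(candidates, key=score)

independent = best()
constrained = best(non_overlapping)
print(round(score(independent), 2))  # 1.8
print(round(score(constrained), 2))  # 1.6
```

Exhaustive search is only workable for a toy example like this one (4^4 = 256 assignments); the slides that follow express the same problem as an integer linear program.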
Page 12
LP Formulation – Linear Cost
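Cost function
$\sum_{a \in \text{POTARG}} P(a = t) = \sum_{a \in \text{POTARG},\, t \in T} P(a = t)\, x_{\{a = t\}}$
Indicator variables
$x_{\{a_1 = \text{A0}\}}, x_{\{a_1 = \text{A1}\}}, \ldots, x_{\{a_4 = \text{AM-LOC}\}}, x_{\{a_4 = \emptyset\}} \in \{0,1\}$
Total Cost
$= p(a_1 = \text{A0}) \cdot x_{\{a_1 = \text{A0}\}} + p(a_1 = \emptyset) \cdot x_{\{a_1 = \emptyset\}} + \ldots + p(a_4 = \emptyset) \cdot x_{\{a_4 = \emptyset\}}$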
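To make the linear cost concrete, here is a minimal sketch that evaluates it for a given setting of the indicator variables. The probabilities, the two-candidate setup, and the name `total_cost` are hypothetical; `NULL` stands in for Ø.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (argument, type)

def total_cost(p: Dict[Key, float], x: Dict[Key, int]) -> float:
    """Linear objective: sum over (a, t) of p(a = t) * x{a = t}."""
    return sum(p[k] * x.get(k, 0) for k in p)

# Hypothetical likelihoods for two candidates and two types (not from the slides).
p = {("a1", "A0"): 0.7, ("a1", "NULL"): 0.3,
     ("a2", "A0"): 0.2, ("a2", "NULL"): 0.8}
# One indicator per (argument, type); exactly one is set to 1 per argument.
x = {("a1", "A0"): 1, ("a1", "NULL"): 0,
     ("a2", "A0"): 0, ("a2", "NULL"): 1}

cost = total_cost(p, x)  # 0.7 * 1 + 0.8 * 1
```

Because each probability is a fixed coefficient and each indicator is a 0/1 variable, the objective is linear in x, which is what allows an ILP solver to optimize it.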
Page 13
Linear Constraints (1/2)
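Binary values
$\forall a \in \text{POTARG},\ t \in T:\ x_{\{a = t\}} \in \{0,1\}$
Unique labels
$\forall a \in \text{POTARG}:\ \sum_{t \in T} x_{\{a = t\}} = 1$
No overlapping or embedding
$a_1$ and $a_2$ overlap $\Rightarrow x_{\{a_1 = \emptyset\}} + x_{\{a_2 = \emptyset\}} \ge 1$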
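These three constraints can be checked directly on a candidate assignment. The sketch below assumes spans are inclusive word-index pairs and uses `NULL` for Ø; the helper names `overlap` and `feasible` and the example spans are illustrative, not from the slides.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # inclusive (first word, last word)
NULL = "NULL"           # stands in for the null label

def overlap(a: Span, b: Span) -> bool:
    """True if two spans share a word; this covers both overlap and embedding."""
    return a[0] <= b[1] and b[0] <= a[1]

def feasible(x: Dict[Tuple[Span, str], int], spans: List[Span], types: List[str]) -> bool:
    # Binary values: every indicator is 0 or 1.
    if any(v not in (0, 1) for v in x.values()):
        return False
    # Unique labels: each candidate gets exactly one type.
    for a in spans:
        if sum(x.get((a, t), 0) for t in types) != 1:
            return False
    # No overlapping or embedding: of two overlapping candidates, at least one is null.
    for i, a1 in enumerate(spans):
        for a2 in spans[i + 1:]:
            if overlap(a1, a2) and x.get((a1, NULL), 0) + x.get((a2, NULL), 0) < 1:
                return False
    return True

spans = [(2, 4), (3, 4), (5, 6)]
types = ["A1", "A2", NULL]
good = {((2, 4), "A1"): 1, ((3, 4), NULL): 1, ((5, 6), "A2"): 1}
bad = {((2, 4), "A1"): 1, ((3, 4), "A1"): 1, ((5, 6), "A2"): 1}
```

Here `good` satisfies all three constraints, while `bad` labels two overlapping spans as real arguments and is rejected.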
Page 14
Linear Constraints (2/2)
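No duplicate argument classes
$\sum_{a \in \text{POTARG}} x_{\{a = \text{A0}\}} \le 1$
R-XXX
$\forall a_2 \in \text{POTARG}:\ \sum_{a \in \text{POTARG}} x_{\{a = \text{A0}\}} \ge x_{\{a_2 = \text{R-A0}\}}$
C-XXX
$\forall a_2 \in \text{POTARG}:\ \sum_{a \in \text{POTARG},\ a \text{ is before } a_2} x_{\{a = \text{A0}\}} \ge x_{\{a_2 = \text{C-A0}\}}$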
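Read as checks on a finished labeling, the three constraints look like this. The sketch assumes the labels of the non-null arguments are listed in sentence order; the function names are illustrative, and the constraints are shown for a generic argument class rather than only A0.

```python
from typing import List

def no_duplicates(labels: List[str], arg: str = "A0") -> bool:
    """At most one argument carries the class `arg`."""
    return labels.count(arg) <= 1

def reference_ok(labels: List[str], arg: str = "A0") -> bool:
    """An R-arg label requires some argument labeled `arg` in the sentence."""
    return ("R-" + arg) not in labels or arg in labels

def continuation_ok(labels: List[str], arg: str = "A0") -> bool:
    """Every C-arg label requires an argument labeled `arg` earlier in the sentence."""
    seen = False
    for lab in labels:
        if lab == arg:
            seen = True
        elif lab == "C-" + arg and not seen:
            return False
    return True

# Labels in sentence order (hypothetical examples).
ok = ["A0", "V", "A1", "C-A1"]
broken = ["C-A1", "V", "A1"]
```

In `broken`, the continuation C-A1 appears before any A1, so `continuation_ok(broken, "A1")` is false, mirroring the inequality in which the sum over earlier arguments is 0 while the right-hand side is 1.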
Page 15
Results on Perfect Boundaries
Assume the boundaries of arguments
(in both training and testing) are given.
Development Set      Precision   Recall   F1
without inference    86.95       87.24    87.10
with inference       88.03       88.23    88.13
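As a sanity check on these numbers, F1 is the harmonic mean of precision and recall. Recomputing it from the rounded precision and recall values agrees with the reported F1 to within rounding (about 0.01); the helper name `f1` is ours.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

without_inference = f1(86.95, 87.24)  # reported F1: 87.10
with_inference = f1(88.03, 88.23)     # reported F1: 88.13
gain = with_inference - without_inference
```

The gain from inference on perfect boundaries is roughly one F1 point.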
Page 16
Results
[Bar chart: F1 on the Development Set after the 1st Phase and the 2nd Phase, for two settings (non-overlap, all const.); plotted values: 65.46, 65.71, 67.13, 68.26]
Overall F1 on Test Set: 66.39
Page 17
Discussion
Data analysis is important!
F1: ~45% → ~65%
Feature engineering, parameter tuning, …
Global inference helps!
Using all constraints gains more than 1% F1
compared to just using non-overlapping constraints
Easy and fast: 15~20 minutes
Performance difference?
Not from word-based vs. chunk-based
Page 18
Thank you
yih@uiuc.edu
Page 19