SRL via Generalized Inference
Vasin Punyakanok, Dan Roth, Wen-tau Yih, Dav Zimak, Yuancheng Tu
Department of Computer Science
University of Illinois at Urbana-Champaign
Page 1

Semantic Role Labeling
For each verb in a sentence:
1. Identify all constituents that fill a semantic role
2. Determine their roles
   • Agent, Patient or Instrument, …
   • Their adjuncts, e.g., Locative, Temporal or Manner
The PropBank project [Kingsbury & Palmer 02] provides a large human-annotated corpus of semantic verb-argument relations.
CoNLL-2004 shared task [Carreras & Marquez 04]
Page 2

Example
A0 represents the leaver, A1 represents the thing left, A2 represents the benefactor, AM-LOC is an adjunct indicating the location of the action, V determines the verb.
Page 3

Argument Types
• A0-A5 and AA have different semantics for each verb, as specified in the PropBank Frame files.
• 13 types of adjuncts, labeled as AM-XXX, where XXX specifies the adjunct type.
• C-XXX is used to specify the continuity of the argument XXX.
• In some cases, the actual agent is labeled as the appropriate argument type, XXX, while the relative pronoun is instead labeled as R-XXX.
Page 4

Examples
• C-XXX
• R-XXX
Page 5

Outline
• Find potential argument candidates
• Classify arguments to types
• Inference for Argument Structure
   - Cost Function
   - Constraints
   - Integer linear programming (ILP)
• Results & Discussion
Page 6

Find Potential Arguments
• An argument can be any sequence of consecutive words
• Restrict potential arguments
   - BEGIN(word): BEGIN(word) = 1 ⇔ "word begins argument"
   - END(word): END(word) = 1 ⇔ "word ends argument"
• Argument $(w_i, \ldots, w_j)$ is a potential argument iff BEGIN($w_i$) = 1 and END($w_j$) = 1
• Reduce set of potential arguments
[Figure: the sentence "I left my nice pearls to her" with brackets marking candidate argument boundaries]
Page 7

Details – Word-level Classifier
• BEGIN(word): learn a function B(word, context, structure) → {0,1}
• END(word): learn a function E(word, context, structure) → {0,1}
• POTARG = {arg | BEGIN(first(arg)) and END(last(arg))}
Page 8

Arguments Type Likelihood
• Assign type-likelihood: how likely is it that arg a is type t?
• For all $a \in \text{POTARG}$, $t \in T$: $P(\text{argument } a = \text{type } t)$
[Figure: the sentence "I left my nice pearls to her" with candidate brackets and type likelihoods over the labels A0, C-A1, A1, Ø; rows shown: 0.3 0.2 0.2 0.3 and 0.6 0.0 0.0 0.4]
Page 9

Details – Phrase-level Classifier
• Learn a classifier ARGTYPE(arg): $P(\text{arg}) \to \{\text{A0}, \text{A1}, \ldots, \text{C-A0}, \ldots, \text{AM-LOC}, \ldots\}$
   $\arg\max_{t \in \{\text{A0}, \text{A1}, \ldots, \text{C-A0}, \ldots, \text{AM-LOC}, \ldots\}} w_t \cdot P(\text{arg})$
• Estimate probabilities with softmax: $P(a = t) = \exp(w_t \cdot P(a)) / Z$
Page 10

What is a Good Assignment?
• Likelihood of being correct: $P(\text{Arg } a = \text{Type } t)$ if t is the correct type for argument a
• For a set of arguments $a_1, a_2, \ldots, a_n$, the expected number of arguments that are correct is $\sum_i P(a_i = t_i)$
• We search for the assignment with the maximum expected number of correct arguments.
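The candidate-generation, softmax, and expected-correct steps from the slides above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the BEGIN/END vectors and the raw label scores are hand-set, hypothetical values standing in for the learned classifiers B, E, and $w_t \cdot P(a)$, and the function names are not from the original work.

```python
import math

def potential_arguments(begin, end):
    """All spans (i, j), i <= j, with begin[i] == 1 and end[j] == 1 (POTARG)."""
    return [(i, j)
            for i, b in enumerate(begin) if b
            for j in range(i, len(end)) if end[j]]

def softmax(scores):
    """Map raw label scores {label: score} to probabilities P(a = t)."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {t: math.exp(s - m) for t, s in scores.items()}
    z = sum(exps.values())    # normalizer Z
    return {t: e / z for t, e in exps.items()}

def expected_correct(probs, assignment):
    """Sum of P(a_i = t_i) over the chosen labels."""
    return sum(probs[a][t] for a, t in assignment.items())

# Hypothetical word-level classifier outputs for the slide's sentence.
words = "I left my nice pearls to her".split()
begin = [1, 0, 1, 0, 0, 1, 0]  # assumed BEGIN(word) values
end = [1, 0, 0, 0, 1, 0, 1]    # assumed END(word) values
potarg = potential_arguments(begin, end)

# Hypothetical raw scores for one candidate, (2, 4) = "my nice pearls".
label_probs = softmax({"A0": 0.2, "A1": 1.5, "Ø": 0.3})
```

With these assumed inputs, `potarg` holds six candidate spans instead of all 28 possible spans of a seven-word sentence, which is the reduction the BEGIN/END filter is meant to provide.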
Page 11

Inference
• Maximize expected number correct: $T^* = \arg\max_T \sum_i P(a_i = t_i)$
• Subject to some constraints: structural and linguistic (R-A1 ⇒ A1)
[Figure: the sentence "I left my nice pearls to her" with candidate arguments and their type likelihoods (rows shown: 0.3 0.2 0.2 0.3; 0.6 0.0 0.0 0.4; 0.1 0.3 0.5 0.1; 0.1 0.2 0.3 0.4). Cost of the best assignment: Independent Max = 1.8; Non-Overlapping = 1.6; Blue ⇒ Red & N-O = 1.4]
Page 12

LP Formulation – Linear Cost
• Cost function: $\sum_{a \in \text{POTARG}} P(a = t) = \sum_{a \in \text{POTARG},\, t \in T} P(a = t)\, x_{\{a = t\}}$
• Indicator variables: $x_{\{a_1 = \text{A0}\}}, x_{\{a_1 = \text{A1}\}}, \ldots, x_{\{a_4 = \text{AM-LOC}\}}, x_{\{a_4 = \emptyset\}} \in \{0, 1\}$
• Total Cost = $P(a_1 = \text{A0}) \cdot x_{\{a_1 = \text{A0}\}} + P(a_1 = \emptyset) \cdot x_{\{a_1 = \emptyset\}} + \ldots + P(a_4 = \emptyset) \cdot x_{\{a_4 = \emptyset\}}$
Page 13

Linear Constraints (1/2)
• Binary values: $\forall a \in \text{POTARG},\ t \in T:\ x_{\{a = t\}} \in \{0, 1\}$
• Unique labels: $\forall a \in \text{POTARG}:\ \sum_{t \in T} x_{\{a = t\}} = 1$
• No overlapping or embedding: if $a_1$ and $a_2$ overlap, then $x_{\{a_1 = \emptyset\}} + x_{\{a_2 = \emptyset\}} \ge 1$
Page 14

Linear Constraints (2/2)
• No duplicate argument classes: $\sum_{a \in \text{POTARG}} x_{\{a = \text{A0}\}} \le 1$
• R-XXX: $\forall a_2 \in \text{POTARG}:\ \sum_{a \in \text{POTARG}} x_{\{a = \text{A0}\}} \ge x_{\{a_2 = \text{R-A0}\}}$
• C-XXX: $\forall a_2 \in \text{POTARG}:\ \sum_{a \in \text{POTARG},\ a \text{ is before } a_2} x_{\{a = \text{A0}\}} \ge x_{\{a_2 = \text{C-A0}\}}$
Page 15

Results on Perfect Boundaries
Assume the boundaries of arguments (in both training and testing) are given.

Development Set      Precision   Recall   F1
without inference    86.95       87.24    87.10
with inference       88.03       88.23    88.13
Page 16

Results
[Figure: bar chart of F1 on the development set (y-axis: F1, 64 to 69) comparing non-overlap and all-constraint inference for the 1st Phase and 2nd Phase; bar values shown: 65.46, 65.71, 67.13, 68.26]
Overall F1 on Test Set: 66.39
Page 17

Discussion
• Data analysis is important!
   - F1: ~45% → ~65%
   - Feature engineering, parameter tuning, …
• Global inference helps!
   - Using all constraints gains more than 1% F1 compared to using only non-overlapping constraints
   - Easy and fast: 15-20 minutes
• Performance difference?
   - Not from word-based vs. chunk-based
Page 18

Thank you
yih@uiuc.edu
Page 19
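Supplementary sketch for the Inference and Linear Constraints slides: the constrained maximization can be illustrated on a toy instance by enumerating every label assignment and keeping the best feasible one. This brute-force search is a stand-in for an ILP solver, not the authors' implementation. The spans, probabilities, and function names are hypothetical, and applying the no-duplicate constraint to A0 through A5 is an assumption (the slide shows it for A0 only).

```python
from itertools import product

NULL = "Ø"
CORE = {"A0", "A1", "A2", "A3", "A4", "A5"}  # assumed no-duplicate classes

def overlaps(a, b):
    """True if spans a = (i, j) and b = (k, l) share a word (overlap or embed)."""
    return a[0] <= b[1] and b[0] <= a[1]

def feasible(assign):
    """Check the slide's constraints on a full assignment {span: label}.

    'Unique labels' holds by construction: each span maps to one label.
    """
    spans = sorted(assign)
    # No overlapping or embedding: of two overlapping spans, one must be Ø.
    for i, a in enumerate(spans):
        for b in spans[i + 1:]:
            if overlaps(a, b) and assign[a] != NULL and assign[b] != NULL:
                return False
    # No duplicate argument classes.
    used = [t for t in assign.values() if t in CORE]
    if len(used) != len(set(used)):
        return False
    for a2, t in assign.items():
        # R-XXX requires some span labeled XXX.
        if t.startswith("R-") and not any(assign[a] == t[2:] for a in spans):
            return False
        # C-XXX requires a span labeled XXX that ends before a2 starts.
        if t.startswith("C-") and not any(
                assign[a] == t[2:] and a[1] < a2[0] for a in spans):
            return False
    return True

def best_assignment(probs):
    """Maximize sum_i P(a_i = t_i) over feasible assignments (exhaustive)."""
    spans = list(probs)
    best, best_score = None, float("-inf")
    for labels in product(*(probs[a] for a in spans)):
        assign = dict(zip(spans, labels))
        if not feasible(assign):
            continue
        score = sum(probs[a][t] for a, t in assign.items())
        if score > best_score:
            best, best_score = assign, score
    return best, best_score

# Hypothetical type likelihoods for four candidate spans (word indices).
probs = {
    (0, 0): {"A0": 0.6, "A1": 0.1, "C-A1": 0.0, NULL: 0.3},
    (2, 4): {"A0": 0.1, "A1": 0.5, "C-A1": 0.1, NULL: 0.3},
    (2, 6): {"A0": 0.4, "A1": 0.3, "C-A1": 0.0, NULL: 0.3},
    (5, 6): {"A0": 0.1, "A1": 0.2, "C-A1": 0.4, NULL: 0.3},
}
independent_max = sum(max(p.values()) for p in probs.values())
best, best_score = best_assignment(probs)
```

On this toy instance the independent per-span maximum labels two overlapping spans and uses A0 twice, so it is infeasible. The constrained search instead sets the embedding span (2, 6) to Ø and accepts a lower total. Enumeration grows exponentially with the number of candidates, which is why the slides cast the problem as an ILP.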