Global Inference via Linear Programming Formulation
Presenter: Natalia Prytkova
Tutor: Maximilian Dylla
14.07.2011

Outline
• Motivation
• Naïve Algorithm
• LP Formulation
  – Constraints
  – Objective Function
• Applications of LP
• Experiments
• Discussion

Inference with Classifiers
• Recognize entities
• Recognize relations
• Inference

Example
[Figure: Book, Author]

Properties of Extracted Items
• Entity types: Author, Composer, Book, Ballet, WritersUnion, Conservatory, Theater, Publisher
• Relation types:
  – BookWrittenBy (Book, Author)
  – BalletWrittenBy (Ballet, Composer)
  – MemberOfUnion (Author, WritersUnion)
  – GraduatedFrom (Composer, Conservatory)
  – ShownInTheater (Ballet, Theater)
  – BookPublishedBy (Book, Publisher)

Example
[Figure: BalletWrittenBy, Ballet, Composer]

Properties of Extracted Items
• many relation types
• many entity types
• entities and relations are mutually dependent

Key Idea
• Recognize relations
• Inference
• Recognize entities

Naïve Algorithm
P(Book BalletWrittenBy Composer) = 0.07
P(Book BalletWrittenBy Author) = 0.07
P(Book BookWrittenBy Composer) = 0.12
P(Book BookWrittenBy Author) = 0.03
P(Ballet BalletWrittenBy Composer) = 0.28
P(Ballet BalletWrittenBy Author) = 0.28
P(Ballet BookWrittenBy Composer) = 0.12
P(Ballet BookWrittenBy Author) = 0.12
…
• n entities – $O(n^2)$ binary relations
• l labels – $l^{n^2}$ assignments

Some Useful Properties
• Relations impose restrictions on entities
• Each entity or relation can be labeled with only one label
• Relations can be directed (BookWrittenBy) or undirected (SpouseOf)

Key Idea
• Obtain a set of possible labels for entities/relations
• Optimize the global decision given a set of constraints

Definitions
• Sentence S – a linked list of words and entities. Entity boundaries are given, e.g. "Piotr Ilyich Tchaikovsky" is one entity.
• Entity ε – observed variables $E_1, E_2, \ldots, E_n$
  – $E_1$ = The Nutcracker, $L_{E_1}$ = ballet
  – $E_2$ = Piotr Ilyich Tchaikovsky, $L_{E_2}$ = composer
• Relation – binary relations between entities
  – $R_{12} = (E_1, E_2)$, $L_{R_{12}}$ = BalletWrittenBy
• Class – predefined sets of entity and relation labels.
  – $L_r$ = {BalletWrittenBy, BookWrittenBy}
  – $L_e$ = {Composer, Author, Book, Ballet}

Constraints
Indicator variables
• $x_{\{E_i, l\}} = 1$ iff entity $E_i$ was labeled as $l$
• $x_{\{R_{ij}, l\}} = 1$ iff relation $R_{ij}$ was labeled as $l$
• $x_{\{R_{ij}, l_{ij}, E_i, e_i\}} = 1$ iff relation $R_{ij}$ was labeled as $l_{ij}$ and it takes entity $E_i$ with the label $e_i$ as its first argument
• otherwise $x = 0$, for all $x$

Constraints
For $L_{E_1}$ = ballet, $L_{E_2}$ = composer, $L_{R_{12}}$ = BalletWrittenBy:
• $x_{\{R_{12}, \text{BalletWrittenBy}\}} = 1$
• $x_{\{R_{12}, \text{BookWrittenBy}\}} = 0$
• $x_{\{R_{12}, \text{BalletWrittenBy}, E_1, \text{ballet}\}} = 1$
• $x_{\{R_{12}, \text{BalletWrittenBy}, E_1, \text{book}\}} = 0$
• $x_{\{E_1, \text{ballet}\}} = 1$
• $x_{\{E_1, \text{book}\}} = 0$

Constraints
• Each entity or relation can be labeled with only one label
• The assignment to each entity or relation variable is consistent with the assignments to its neighboring variables

Objective Function
• Assignment cost: $c_v(l) = -\log(p)$
  – e.g. $c_{E_1}(\text{ballet}) = -\log(0.8)$
  – Cost of deviating from the assignments given by the classifiers
• Constraint cost: $d^1(f_{R_{ij}}, f_{E_i}) = 0$ if $(f_{R_{ij}}, f_{E_i}) \in C^1$, $\infty$ otherwise
  – e.g. $d^1(\text{BalletWrittenBy}, \text{ballet}) = 0$, $d^2(\text{BalletWrittenBy}, \text{author}) = \infty$
  – Cost of breaking constraints between two neighboring variables

$\min C(f) = \min \sum_{v \in V} c_v(f_v) + \sum_{R_{ij}} \left[ d^1(f_{R_{ij}}, f_{E_i}) + d^2(f_{R_{ij}}, f_{E_j}) \right]$

Useful Property
ILP is NP-hard in general, but some instances can be solved in polynomial time.

Viterbi
Viterbi decoding can be posed as a shortest-path problem:

$\min \sum_{i \in [0, n-1],\ y, y' \in [0, m-1]} -\log M_i(y, y') \cdot x_{i, yy'}$

s.t.
• $\sum_{y' \in [0, m-1]} x_{i-1, y'y} - \sum_{y'' \in [0, m-1]} x_{i, yy''} = 0 \quad \forall i \in [0, n-1],\ y \in [0, m-1]$
• $\sum_{y \in [0, m-1]} x_{\text{start}, 0y} = 1$
• $\sum_{y \in [0, m-1]} x_{\text{end}, y0} = 1$
• $x \in \{0, 1\}$

$x_{i, yy'}$ indicates the edge between $v_{i-1, y}$ and $v_{i, y'}$.

Phrases Identification
$x_i$: $(t_1, t_3), (t_1, t_5), (t_1, t_6), (t_2, t_3), (t_2, t_5), (t_2, t_6), (t_4, t_5), (t_4, t_6)$

$\min \sum_{i=1}^{n} -p_i x_i$

s.t.
• shortest path constraints
• $x_i \in \{0, 1\}$

$p_i$ is the probability that the pair $i$ is a phrase.

Experiments
[Figure: configurations compared: Separate, E -> R, R -> E, E <-> R, Omniscient]

Experiments
• 5 336 entities
• 19 048 pairs of entities
• 1 437 sentences
• running time < 30 sec on a Pentium III 800 MHz

Discussion
• Guarantees optimality
• Supports correct decisions by imposing limitations
• LP solvers are available
• Not scalable
  – CPLEX accepts at most $2^{31}$ variables and constraints (~ 46 000 entities)
  – student edition accepts only 500 =) (~ 20 entities)
• No feedback to extractors

References
• Dan Roth and Wen-tau Yih: A Linear Programming Formulation for Global Inference in Natural Language Tasks, CoNLL'04
• Dan Roth and Wen-tau Yih: Global Inference for Entity and Relation Identification via a Linear Programming Formulation, Introduction to Statistical Relational Learning, 2007
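
Worked sketch: objective function on a toy instance
The objective from the Objective Function slide can be illustrated on the Nutcracker example with two entities and one relation. The sketch below is not the LP formulation itself: it enumerates all label assignments by brute force (the Naïve Algorithm) and scores each with the assignment cost -log(p) plus the constraint costs d^1 and d^2. Only P(E1 = ballet) = 0.8 comes from the slides; all other probabilities, the constraint sets C1 and C2, and all function names are assumptions made up for illustration.

```python
import itertools
import math

ENTITY_LABELS = ["ballet", "book", "composer", "author"]
RELATION_LABELS = ["BalletWrittenBy", "BookWrittenBy"]

# Assumed constraint sets: allowed (relation, first argument) and
# (relation, second argument) label pairs.
C1 = {("BalletWrittenBy", "ballet"), ("BookWrittenBy", "book")}
C2 = {("BalletWrittenBy", "composer"), ("BookWrittenBy", "author")}

# Hypothetical classifier outputs. Only P_E1["ballet"] = 0.8 is from the slides.
P_E1 = {"ballet": 0.8, "book": 0.1, "composer": 0.05, "author": 0.05}
P_E2 = {"ballet": 0.05, "book": 0.05, "composer": 0.4, "author": 0.5}
P_R12 = {"BalletWrittenBy": 0.7, "BookWrittenBy": 0.3}


def assignment_cost(p):
    """c_v(l) = -log(p): cost of deviating from the classifier's preference."""
    return -math.log(p)


def constraint_cost(pair, allowed):
    """d(f_R, f_E) = 0 if the pair is allowed, infinity otherwise."""
    return 0.0 if pair in allowed else math.inf


def total_cost(e1, e2, r):
    """C(f) for one relation R12 = (E1, E2)."""
    return (
        assignment_cost(P_E1[e1])
        + assignment_cost(P_E2[e2])
        + assignment_cost(P_R12[r])
        + constraint_cost((r, e1), C1)
        + constraint_cost((r, e2), C2)
    )


def best_assignment():
    """Brute-force search over all (E1, E2, R12) label assignments."""
    candidates = itertools.product(ENTITY_LABELS, ENTITY_LABELS, RELATION_LABELS)
    return min(candidates, key=lambda f: total_cost(*f))


if __name__ == "__main__":
    print("local best label for E2:", max(P_E2, key=P_E2.get))
    print("global assignment:", best_assignment())
```

With these assumed numbers the entity classifier alone would label E2 as author, while the global objective picks composer, because (BalletWrittenBy, author) has infinite constraint cost. Enumeration is only feasible for toy inputs; the deck's point is that an ILP solver replaces this search.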