Global Inference via Linear Programming Formulation
Presenter: Natalia Prytkova
Tutor: Maximilian Dylla
14.07.2011
Outline
• Motivation
• Naïve Algorithm
• LP Formulation
– Constraints
– Objective Function
• Applications of LP
• Experiments
• Discussion
2
Inference with Classifiers
Recognize entities
Recognize relations
Inference
3
Example
Book
Author
4
Properties of Extracted Items
• Entity types: Composer, Author, Ballet, Book
• Relation types:
– BookWrittenBy (Book, Author)
– BalletWrittenBy (Ballet, Composer)
6
Properties of Extracted Items
• Entity types: Composer, Author, Ballet, Book, Conservatory, WritersUnion, Theater, Publisher
• Relation types:
– BookWrittenBy (Book, Author)
– BalletWrittenBy (Ballet, Composer)
– MemberOfUnion (Author, WritersUnion)
– GraduatedFrom (Composer, Conservatory)
– ShownInTheater (Ballet, Theater)
– BookPublishedBy (Book, Publisher)
7
Example
BalletWrittenBy
Ballet
Composer
8
Properties of Extracted Items
• many relation types
• many entity types
• entities and relations are mutually dependent
10
Outline
• Motivation
• Naïve Algorithm
• ILP Formulation
– Constraints
– Objective Function
• Applications of ILP
• Experiments
• Discussion
11
Key Idea
Recognize relations
Inference
Recognize entities
13
Naïve Algorithm
14
Naïve Algorithm
P(Book BalletWrittenBy Composer) = 0.07
P(Book BalletWrittenBy Author) = 0.07
P(Book BookWrittenBy Composer) = 0.12
P(Book BookWrittenBy Author) = 0.03
P(Ballet BalletWrittenBy Composer) = 0.28
P(Ballet BalletWrittenBy Author) = 0.28
P(Ballet BookWrittenBy Composer) = 0.12
P(Ballet BookWrittenBy Author) = 0.12
…
15
Naïve Algorithm
P(Book BalletWrittenBy Composer) = 0.07
P(Book BalletWrittenBy Author) = 0.07
P(Book BookWrittenBy Composer) = 0.12
P(Book BookWrittenBy Author) = 0.03
P(Ballet BalletWrittenBy Composer) = 0.28
P(Ballet BalletWrittenBy Author) = 0.28
P(Ballet BookWrittenBy Composer) = 0.12
P(Ballet BookWrittenBy Author) = 0.12
…
• n entities: $O(n^2)$ binary relations
• l labels: $l^{n^2}$ assignments
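The naïve algorithm can be sketched as an exhaustive search over joint labelings. The snippet below is a minimal illustration, not the presenter's code: the probabilities, the `allowed_args` table and the function name `naive_inference` are assumptions made for the example, and the joint score is assumed to be the product of the individual classifier probabilities.

```python
from itertools import product

# Hypothetical classifier outputs for one sentence (illustrative numbers only).
entity_probs = {
    "E1": {"Ballet": 0.8, "Book": 0.2},
    "E2": {"Composer": 0.6, "Author": 0.4},
}
relation_probs = {
    ("E1", "E2"): {"BalletWrittenBy": 0.7, "BookWrittenBy": 0.3},
}
# Assumed constraint table: relation label -> (first argument label, second argument label).
allowed_args = {
    "BalletWrittenBy": ("Ballet", "Composer"),
    "BookWrittenBy": ("Book", "Author"),
}


def naive_inference(entity_probs, relation_probs, allowed_args):
    """Enumerate every joint labeling and keep the most probable consistent one."""
    entities = list(entity_probs)
    relations = list(relation_probs)
    label_spaces = [list(entity_probs[e]) for e in entities]
    label_spaces += [list(relation_probs[r]) for r in relations]
    best, best_score = None, 0.0
    for labels in product(*label_spaces):
        e_labels = dict(zip(entities, labels[:len(entities)]))
        r_labels = dict(zip(relations, labels[len(entities):]))
        # Skip labelings whose relation label contradicts its argument labels.
        if any(allowed_args[r_labels[(a, b)]] != (e_labels[a], e_labels[b])
               for (a, b) in relations):
            continue
        score = 1.0
        for e in entities:
            score *= entity_probs[e][e_labels[e]]
        for r in relations:
            score *= relation_probs[r][r_labels[r]]
        if score > best_score:
            best, best_score = (e_labels, r_labels), score
    return best, best_score


best, score = naive_inference(entity_probs, relation_probs, allowed_args)
```

The loop visits every combination of labels, so its running time grows exponentially with the number of entity and relation variables, which is the blow-up the slide points at.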
16
Some Useful Properties
• Relations impose restrictions on entities
• Each entity or relation can take only one label
• Relations can be directed (BookWrittenBy) or undirected (SpouseOf)
18
Outline
• Motivation
• Naïve Algorithm
• ILP Formulation
– Constraints
– Objective Function
• Applications of ILP
• Experiments
• Discussion
19
Key Idea
• Obtain a set of possible labels for
entities/relations
• Optimize the global decision given a set of
constraints
20
Definitions
• Sentence S
– Linked list of words and entities. Boundaries of entities are given, e.g. "Piotr Ilyich Tchaikovsky" is one entity.
• Entity
– Observed variables $\mathcal{E} = \{E_1, E_2, \dots, E_n\}$
– e.g. $E_1$ = The Nutcracker, $L_{E_1}$ = ballet; $E_2$ = Piotr Ilyich Tchaikovsky, $L_{E_2}$ = composer
• Relation
– Binary relations between entities
– e.g. $R_{12} = (E_1, E_2)$, $L_{R_{12}}$ = BalletWrittenBy
• Class
– Predefined sets of entity and relation labels
– $\mathcal{L}_r$ = {BalletWrittenBy, BookWrittenBy}
– $\mathcal{L}_e$ = {Composer, Author, Book, Ballet}
21
Constraints
Indicator variables
• $x_{\{E_i, l\}} = 1$ iff entity $E_i$ was labeled as $l$
• $x_{\{R_{ij}, l\}} = 1$ iff relation $R_{ij}$ was labeled as $l$
• $x_{\{R_{ij}, l_{ij}, E_i, e_i\}} = 1$ iff relation $R_{ij}$ was labeled as $l_{ij}$ and it takes entity $E_i$ with the label $e_i$ as its first argument
• otherwise $x = 0$, for all $x$
22
Constraints
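$L_{E_1}$ = ballet, $L_{E_2}$ = composer, $L_{R_{12}}$ = BalletWrittenBy
$x_{\{R_{12},\,\text{BalletWrittenBy}\}} = 1$, $x_{\{R_{12},\,\text{BookWrittenBy}\}} = 0$
$x_{\{R_{12},\,\text{BalletWrittenBy},\,E_1,\,\text{ballet}\}} = 1$, $x_{\{R_{12},\,\text{BalletWrittenBy},\,E_1,\,\text{book}\}} = 0$
$x_{\{E_1,\,\text{ballet}\}} = 1$, $x_{\{E_1,\,\text{book}\}} = 0$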
23
Constraints
• Each entity or relation can take only one label
• The assignment to each entity or relation variable is consistent with the assignments to its neighboring variables
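The two properties above can be written as linear equalities over the indicator variables. The block below is a sketch in the notation of the indicator-variable slide. The label sets $\mathcal{L}_e$ (entity labels) and $\mathcal{L}_r$ (relation labels) are the classes from the Definitions slide. The exact form is an assumption modeled on the Roth and Yih formulation listed in the references, not a transcription of these slides.

```latex
\begin{align}
\sum_{l \in \mathcal{L}_e} x_{\{E_i, l\}} &= 1
  && \forall E_i \\
\sum_{l \in \mathcal{L}_r} x_{\{R_{ij}, l\}} &= 1
  && \forall R_{ij} \\
x_{\{E_i, e\}} &= \sum_{l \in \mathcal{L}_r} x_{\{R_{ij}, l, E_i, e\}}
  && \forall R_{ij},\ \forall e \in \mathcal{L}_e \\
x_{\{R_{ij}, l\}} &= \sum_{e \in \mathcal{L}_e} x_{\{R_{ij}, l, E_i, e\}}
  && \forall R_{ij},\ \forall l \in \mathcal{L}_r
\end{align}
```

The first two lines state that every entity and every relation takes exactly one label. The last two tie each relation variable to its first argument, and analogous equalities would be needed for the second argument $E_j$.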
24
Objective Function
• Assignment cost $c_v(l) = -\log(p)$
– e.g. $c_{E_1}(\text{ballet}) = -\log(0.8)$
– Cost of deviating from the assignments given by classifiers
• Constraint cost $d^1(f_{R_{ij}}, f_{E_i}) = 0$ if $(f_{R_{ij}}, f_{E_i}) \in C^1$, otherwise $\infty$
– e.g. $d^1(\text{balletWrittenBy}, \text{ballet}) = 0$, $d^2(\text{balletWrittenBy}, \text{author}) = \infty$
– Cost of breaking constraints between two neighboring entities
$$\min C(f) = \min \Big[ \sum_{v \in V} c_v(f_v) + \sum_{R_{ij}} \big[ d^1(f_{R_{ij}}, f_{E_i}) + d^2(f_{R_{ij}}, f_{E_j}) \big] \Big]$$
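To make the objective concrete, the cost $C(f)$ of a single assignment $f$ can be evaluated directly. The following is a small sketch under stated assumptions: the probabilities, the sets `C1` and `C2`, and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical classifier posteriors (illustrative numbers only).
probs = {
    "E1": {"ballet": 0.8, "book": 0.2},
    "E2": {"composer": 0.6, "author": 0.4},
    "R12": {"BalletWrittenBy": 0.7, "BookWrittenBy": 0.3},
}
# Assumed allowed (relation label, argument label) pairs
# for the first argument (C1) and the second argument (C2).
C1 = {("BalletWrittenBy", "ballet"), ("BookWrittenBy", "book")}
C2 = {("BalletWrittenBy", "composer"), ("BookWrittenBy", "author")}
# Each relation variable with its first and second argument entity.
relations = {"R12": ("E1", "E2")}


def assignment_cost(var, label):
    """c_v(l) = -log(p), with p the classifier probability of label l for v."""
    return -math.log(probs[var][label])


def constraint_cost(pair, allowed):
    """d(f_R, f_E) = 0 if the pair is allowed, infinity otherwise."""
    return 0.0 if pair in allowed else math.inf


def total_cost(f):
    """C(f): assignment costs of all variables plus constraint costs of all relations."""
    cost = sum(assignment_cost(v, label) for v, label in f.items())
    for r, (e_i, e_j) in relations.items():
        cost += constraint_cost((f[r], f[e_i]), C1)
        cost += constraint_cost((f[r], f[e_j]), C2)
    return cost


consistent = {"E1": "ballet", "E2": "composer", "R12": "BalletWrittenBy"}
inconsistent = {"E1": "ballet", "E2": "author", "R12": "BalletWrittenBy"}
```

Here `total_cost(consistent)` is finite, while `total_cost(inconsistent)` is infinite because the second argument violates `C2`, so a minimizer can never choose it.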
25
Naïve Algorithm
P(Book BalletWrittenBy Composer) = 0.07
P(Book BalletWrittenBy Author) = 0.07
P(Book BookWrittenBy Composer) = 0.12
P(Book BookWrittenBy Author) = 0.03
P(Ballet BalletWrittenBy Composer) = 0.28
P(Ballet BalletWrittenBy Author) = 0.28
P(Ballet BookWrittenBy Composer) = 0.12
P(Ballet BookWrittenBy Author) = 0.12
…
• n entities: $O(n^2)$ binary relations
• l labels: $l^{n^2}$ assignments
26
Useful Property
ILP is NP-hard in general, but some instances can be solved in polynomial time, e.g. when the LP relaxation already has an integral optimal solution.
27
Outline
• Motivation
• Naïve Algorithm
• ILP Formulation
– Constraints
– Objective Function
• Applications of ILP
• Experiments
• Discussion
28
Viterbi
Shortest path
29
Viterbi
$$\min \sum_{i \in [0, n-1],\; y, y' \in [0, m-1]} -\log M_i(y, y') \cdot x_{i, yy'}$$
s.t.
$$\sum_{y' \in [0, m-1]} x_{i, y'y} - \sum_{y'' \in [0, m-1]} x_{i+1, yy''} = 0 \quad \forall i \in [0, n-1],\ y \in [0, m-1]$$
$$\sum_{y \in [0, m-1]} x_{\text{start}, 0y} = 1$$
$$\sum_{y \in [0, m-1]} x_{\text{end}, y0} = 1$$
$$x \in \{0, 1\}$$
$x_{i, yy'} = 1$: there is an edge between $v_{i-1, y}$ and $v_{i, y'}$
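The same shortest-path view can be illustrated without an LP solver: with edge weights $-\log M_i(y, y')$, the cheapest path through the trellis is the Viterbi sequence. The sketch below uses plain dynamic programming. The function name, the argument layout and the example scores in the test are assumptions made for illustration, and all scores are assumed to be strictly positive so that the logarithm is defined.

```python
import math


def viterbi_shortest_path(start, trans, end):
    """Shortest path through a trellis whose edge weights are -log scores.

    start[y]        : score of the edge from the start node to label y at position 0
    trans[i][y][y2] : score of moving from label y at position i to label y2 at position i + 1
    end[y]          : score of the edge from label y at the last position to the end node
    All scores are assumed to be strictly positive.
    """
    m = len(start)
    dist = [-math.log(s) for s in start]  # cheapest cost to reach each label so far
    back = []                             # back-pointers, one list per transition
    for M in trans:
        new_dist, pointers = [], []
        for y2 in range(m):
            costs = [dist[y] - math.log(M[y][y2]) for y in range(m)]
            best = min(range(m), key=costs.__getitem__)
            new_dist.append(costs[best])
            pointers.append(best)
        dist = new_dist
        back.append(pointers)
    final = [dist[y] - math.log(end[y]) for y in range(m)]
    y = min(range(m), key=final.__getitem__)
    total = final[y]
    path = [y]
    # Follow the back-pointers from the last position to the first.
    for pointers in reversed(back):
        y = pointers[y]
        path.append(y)
    return path[::-1], total
```

The returned `path` lists one label index per position, and `total` is the summed $-\log$ weight of the chosen edges, i.e. the value of the objective at the selected $x$.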
30
Phrases Identification
31
Phrases Identification
32
Phrases Identification
$x_i$: $(t_1, t_3), (t_1, t_5), (t_1, t_6), (t_2, t_3), (t_2, t_5), (t_2, t_6), (t_4, t_5), (t_4, t_6)$
$$\min \sum_{i=1}^{n} p_i \cdot x_i$$
s.t. shortest path constraints
$x_i \in \{0, 1\}$
$p_i$ is the probability that the pair $i$ is a phrase
33
Outline
• Motivation
• Naïve Algorithm
• ILP Formulation
– Constraints
– Objective Function
• Applications of ILP
• Experiments
• Discussion
34
Experiments
[Figure: the compared settings, each drawn as a diagram over the nodes E, R and I: Separate, E -> R, R -> E, E <-> R, Omniscient]
35
Experiments
36
Experiments
• 5 336 entities
• 19 048 pairs of entities
• 1 437 sentences
• running time < 30 sec on Pentium III 800 MHz
37
Outline
• Motivation
• Naïve Algorithm
• ILP Formulation
– Constraints
– Objective Function
• Applications of ILP
• Experiments
• Discussion
38
Discussion
• Guarantees optimality
• Supports correct decisions by imposing
limitations
• LP solvers are available
• Not scalable
– CPLEX accepts at most $2^{31}$ variables and constraints
• ~ 46 000 entities
– student edition accepts only 500 =)
• ~ 20 entities
• No feedback to extractors
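The entity counts in the "Not scalable" bullet match a variable count that grows quadratically with the number of entities. A rough check follows, under the assumption (not stated on the slide) of one variable per pair of entities and ignoring the number of labels. The helper name `max_entities` is hypothetical.

```python
import math


def max_entities(variable_limit):
    """Largest n with n * n <= variable_limit (assumes one variable per entity pair)."""
    return math.isqrt(variable_limit)


full_limit = max_entities(2 ** 31)  # limit quoted for CPLEX on the slide
student_limit = max_entities(500)   # limit quoted for the student edition
```

This gives 46 340 and 22 entities respectively, in line with the slide's "~ 46 000" and "~ 20".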
39
References
• Dan Roth and Wen-tau Yih: A Linear Programming Formulation for Global Inference in Natural Language Tasks, CoNLL'04
• Dan Roth and Wen-tau Yih: Global Inference for Entity and Relation Identification via a Linear Programming Formulation, Introduction to Statistical Relational Learning, 2007
40