Learning Markov Logic Networks
Using Structural Motifs
Stanley Kok
Dept. of Computer Science and Eng.
University of Washington
Seattle, USA
Joint work with Pedro Domingos
Outline
- Background
- Learning Using Structural Motifs
- Experiments
- Future Work
Markov Logic Networks
[Richardson & Domingos, MLJ'06]
- A logical KB is a set of hard constraints on the set of possible worlds
- Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible
- Give each formula a weight (higher weight ⇒ stronger constraint)

2.7  Teaches(p,c) ⇒ Professor(p)
Markov Logic
- A Markov logic network (MLN) is a set of pairs (F, w)
  - F is a formula in first-order logic
  - w is a real number

$P(X{=}x) = \frac{1}{Z}\exp\big(\sum_i w_i\, n_i(x)\big)$

where x is a vector of truth assignments to the ground atoms, Z is the partition function, w_i is the weight of the i-th formula, and n_i(x) is the number of true groundings of the i-th formula.
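To make the distribution concrete, here is a minimal Python sketch (not from the talk) that evaluates P(X = x) for a toy MLN containing only the weighted formula above; the two ground atoms and the brute-force partition function are illustrative.

```python
import itertools
import math

# One weighted formula: 2.7  Teaches(p,c) => Professor(p).
# A world x assigns truth values to the two ground atoms
# Teaches(Pete,CS1) and Professor(Pete), so the formula
# has a single grounding.
w = 2.7

def n(x):
    """Number of true groundings of the formula in world x."""
    teaches, professor = x
    return 1 if (not teaches or professor) else 0

def phi(x):
    """Unnormalized weight exp(w * n(x)) of world x."""
    return math.exp(w * n(x))

# Partition function Z: brute force over all 2^2 worlds.
Z = sum(phi(x) for x in itertools.product([False, True], repeat=2))

x = (True, True)
print(f"P(x) = {phi(x) / Z:.3f}")  # ~0.326
```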
MLN Structure Learning
Input: relational data → MLN Structure Learner → Output: MLN

Advises        Teaches        TAs
Pete   Sam     Pete   CS1     Sam    CS1
Pete   Saul    Pete   CS2     Sam    CS2
Paul   Sara    Paul   CS2     Sara   CS1
…      …       …      …       …      …

2.7  Teaches(p,c) ∧ TAs(s,c) ⇒ Advises(p,s)
1.4  Advises(p,s) ∧ Teaches(p,c) ⇒ TAs(s,c)
1.1  ¬TAs(s,c) ∨ ¬Advises(s,p)
…
Previous Systems
- Generate-and-test or greedy
  - MSL [Kok & Domingos, ICML'05]
  - BUSL [Mihalkova & Mooney, ICML'07]
- Computationally expensive; large search space
- Susceptible to local maxima
LHL System
[Kok & Domingos, ICML'09]
[Figure: LHL 'lifts' the ground network of Advises, Teaches, and TAs atoms (professors Pete, Paul, Pat, Phil; students Sam, Sara, Saul, Sue; courses CS1–CS8) into a graph over the clusters Professor, Student, and Course, then traces paths in the lifted graph and converts the paths to first-order clauses.]
Outline
- Background
- Learning Using Structural Motifs
- Experiments
- Future Work
Learning Using Structural Motifs (LSM)
- First MLN structure learner that can learn long clauses
  - Captures more complex dependencies
  - Explores a larger space of clauses
LHL Recap
[Figure: a large ground network of Advises, Teaches, and TAs atoms over professors P1–P11, students S1–S41, and courses C1–C16, with LHL's clusters Prof1–Prof8, Student1–Student13, and Course1–Course13 overlaid.]
Repeated Patterns
[Figure: the clustered network from the previous slide; the same connected pattern of Prof, Student, and Course clusters recurs throughout the graph.]
Repeated Patterns (cont.)
[Figure: each recurring pattern is an instance of one abstract structure: Prof -Teaches-> Course, Student -TAs-> Course, Prof -Advises-> Student.]
Learning Using Structural Motifs (LSM)
- Finds literals that are densely connected
  - Random walks & hitting times
- Clusters nodes into high-level concepts
  - Symmetrical paths & nodes
- Groups literals into structural motifs
- A structural motif is a set of literals that defines a set of clauses, i.e., a subspace of clauses:

{ Teaches(p,c), TAs(s,c), Advises(p,s) }  →  { ¬Teaches(p,c) ∨ TAs(s,c) ∨ Advises(p,s),  Teaches(p,c) ∨ ¬TAs(s,c),  … }
LSM's Three Components
Input: Relational DB → [Identify Motifs] → [Find Paths] → [Create MLN] → Output: MLN

Advises        Teaches        TAs
Pete   Sam     Pete   CS1     Sam    CS1
Pete   Saul    Pete   CS2     Sam    CS2
Paul   Sara    Paul   CS2     Sara   CS1
…      …       …      …       …      …

2.7  Teaches(p,c) ∧ TAs(s,c) ⇒ Advises(p,s)
1.4  Advises(p,s) ∧ Teaches(p,c) ⇒ TAs(s,c)
-1.1  TAs(s,c) ⇒ Advises(s,p)
…
Random Walk
- Begin at node A
- Randomly pick a neighbor n
- Move to node n
- Repeat
[Figure: a six-node graph (A–F); at each step the walk moves from the current node to a uniformly chosen neighbor. A sketch of the procedure follows.]
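A minimal Python sketch of this procedure; the node names follow the figure, but the edge set is an assumption, since the slide's drawing is not fully recoverable.

```python
import random

# Toy graph from the figure (nodes A-F; the edge set is assumed).
graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "E"],
    "C": ["A"],
    "D": ["A", "E", "F"],
    "E": ["B", "D", "F"],
    "F": ["D", "E"],
}

def random_walk(start, steps):
    """Begin at `start`; repeatedly move to a uniformly random neighbor."""
    node, path = start, [start]
    for _ in range(steps):
        node = random.choice(graph[node])
        path.append(node)
    return path

print("".join(random_walk("A", 5)))  # e.g. "ADEDFE"
```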
Hitting Time from Node i to j
- Expected number of steps, starting from node i, before node j is visited for the first time
- Smaller hitting time → node j is 'closer' to the start node i
- Truncated hitting time [Sarkar & Moore, UAI'07]
  - Random walks are limited to T steps
  - Computed efficiently and with high probability by sampling random walks [Sarkar, Moore & Prakash, ICML'08]
Finding Truncated Hitting Times by Sampling
[Figure: with T = 5, one sampled walk from A visits A → D → E → D → F → E. The first-visit steps give h_AA = 0, h_AD = 1, h_AE = 2, h_AF = 4, and, since B and C are never reached within T steps, h_AB = h_AC = 5. Averaging over many sampled walks yields the estimates. A sketch of the estimator follows.]
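The estimator can be sketched in a few lines of Python: run many T-step walks from the source, record each node's first-visit step, and charge the truncation value T to any node a walk never reaches. The graph is the same illustrative one as above, not the talk's exact figure.

```python
import random
from collections import defaultdict

graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "E"],
    "C": ["A"],
    "D": ["A", "E", "F"],
    "E": ["B", "D", "F"],
    "F": ["D", "E"],
}

def truncated_hitting_times(graph, start, T=5, num_walks=10_000):
    """Estimate h_{start,j} for all j by sampling T-step random walks."""
    totals = defaultdict(float)
    for _ in range(num_walks):
        first_visit = {start: 0}   # step at which each node is first seen
        node = start
        for step in range(1, T + 1):
            node = random.choice(graph[node])
            first_visit.setdefault(node, step)
        for j in graph:
            totals[j] += first_visit.get(j, T)  # never reached -> truncated at T
    return {j: totals[j] / num_walks for j in graph}

hts = truncated_hitting_times(graph, "A")
print({j: round(h, 2) for j, h in sorted(hts.items())})
```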
Symmetrical Paths
[Figure: a ground network with a Physics group (professor P1, students S1–S8, courses C1, C2) and a History group (professor P2, students S9–S13, courses C3, C4), linked by Advises, Teaches, and TAs edges.]
The paths P1 → S2 and P1 → S3 are symmetrical: both are described by the same relation sequence, (P1, 0, Advises, 1).
Symmetrical Paths (cont.)
P1 → S2:  (P1, 0, Advises, 1)   and   (P1, 0, Advises, S1, 1, TAs, C1, 2, TAs, S2, 3)
P1 → S3:  (P1, 0, Advises, 1)   and   (P1, 0, Advises, S4, 1, TAs, C1, 2, TAs, S3, 3)
Symmetrical Nodes
- Symmetrical nodes have identical truncated hitting times
- Symmetrical nodes have identical path distributions in a sample of random walks
[Figure: in the Physics/History network, S2 and S3 are symmetrical nodes, witnessed by the symmetrical paths P1 → S2 and P1 → S3 above.]
Learning Using Structural Motifs
[The three-component pipeline from before; the following slides walk through the Identify Motifs step.]
Sample Random Walks
[Figure: truncated random walks are sampled from source node P1 over the ground network; each walk is recorded as a relation path, e.g., (0, Advises, 1, TAs, 2).]
Estimate Truncated Hitting Times
[Figure: estimated hitting times from P1 (= 0): nodes in P1's own Physics group are close (C1 = 3.2, C2 = 3.21, students ≈ 3.52–3.55, others up to 3.99), while every node in the History group has the maximum value 4.]
Prune 'Faraway' Nodes
[Figure: nodes with large truncated hitting times (the entire History group, all at 4) are pruned.]
Group Nodes with Similar Hitting Times
[Figure: the remaining nodes are grouped by similar hitting times, e.g., the students at ≈ 3.5 and the courses at ≈ 3.2; each group holds candidate symmetrical nodes.]
Cluster Nodes
- Cluster nodes with similar path distributions (a sketch follows)
[Figure: each node in a candidate group carries a distribution over the paths that reach it, e.g., (0, Advises, 1): 0.5, (0, Advises, 2, …, 1): 0.1, …; students S1–S8 cluster together, as do courses C1 and C2.]
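A sketch of this clustering under simple assumptions: each node is represented by the empirical frequency of the relation paths that reach it in the sampled walks, and nodes whose distributions are within a small L1 distance of a cluster's representative are merged. The distance measure, threshold, and numbers are all illustrative; the paper's actual criterion may differ.

```python
def l1(p, q):
    """L1 distance between two sparse path distributions."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))

def cluster(nodes, dists, eps=0.2):
    """Greedily merge nodes whose path distributions are within eps."""
    clusters = []
    for n in nodes:
        for c in clusters:
            if l1(dists[n], dists[c[0]]) < eps:  # compare to representative
                c.append(n)
                break
        else:
            clusters.append([n])
    return clusters

# Hypothetical path distributions for candidate symmetrical nodes.
dists = {
    "S1": {("Advises",): 0.50, ("Advises", "TAs", "TAs"): 0.10},
    "S2": {("Advises",): 0.48, ("Advises", "TAs", "TAs"): 0.12},
    "S3": {("Advises",): 0.51, ("Advises", "TAs", "TAs"): 0.09},
    "C1": {("Teaches",): 0.60},
}
print(cluster(list(dists), dists))  # [['S1', 'S2', 'S3'], ['C1']]
```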
Create 'Lifted' Graph
[Figure: the clusters become high-level nodes Professor {P1}, Student {S1–S8}, and Course {C1, C2}, connected by Teaches, TAs, and Advises edges.]
Extract Motif with DFS
[Figure: a depth-first search over the lifted graph extracts the connected pattern Professor -Teaches-> Course, Student -TAs-> Course, Professor -Advises-> Student. A sketch of the search follows.]
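A sketch of the extraction step: an iterative depth-first search from the source cluster that collects every lifted edge reachable from it. The lifted-graph encoding is an assumption for illustration.

```python
# Lifted graph: cluster -> list of (relation, neighboring cluster).
lifted = {
    "Professor": [("Teaches", "Course"), ("Advises", "Student")],
    "Student":   [("TAs", "Course"), ("Advises", "Professor")],
    "Course":    [("Teaches", "Professor"), ("TAs", "Student")],
}

def extract_motif(start):
    """Iterative DFS from `start`, collecting every reachable lifted edge."""
    seen, motif, stack = {start}, set(), [start]
    while stack:
        node = stack.pop()
        for rel, nbr in lifted[node]:
            motif.add((rel, frozenset((node, nbr))))  # undirected lifted edge
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return motif

for rel, ends in sorted(extract_motif("Professor")):
    print(rel, sorted(ends))
# Advises ['Professor', 'Student']
# TAs ['Course', 'Student']
# Teaches ['Course', 'Professor']
```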
Create Motif
Motif: { Teaches(p,c), TAs(s,c), Advises(p,s) }
{ Teaches(P1,C1), TAs(S1,C1), Advises(P1,S1) } is a true grounding of the motif.
Restart from Next Node
[Figure: the process restarts with S1 as the source node on the same ground network.]
Restart from Next Node (cont.)
[Figure: starting from S1 yields a different motif over the same set of constants, with finer clusters (Professor, Student1, Student, Course1, Course2).]
Select Motifs
- Choose motifs with a large estimated number of true groundings (see the sketch after the table)

Motif                                        Est. #True Groundings
{ Teaches(p,c), TAs(s,c), Advises(p,s) }     100
{ Teaches(p,c), … }                          20
…                                            …
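A sketch of how motifs could be ranked by true groundings: count the variable bindings under which all of a motif's literals hold in the database. LSM estimates this count by sampling; the brute-force count, tiny database, and type domains here are illustrative.

```python
from itertools import product

# Tiny illustrative database of true ground atoms and type domains.
db = {("Teaches", "Pete", "CS1"), ("TAs", "Sam", "CS1"),
      ("Advises", "Pete", "Sam"), ("Teaches", "Paul", "CS2")}
profs, students, courses = {"Pete", "Paul"}, {"Sam"}, {"CS1", "CS2"}

def true_groundings(motif):
    """Count bindings of (p, s, c) under which every literal is in db."""
    count = 0
    for p, s, c in product(profs, students, courses):
        binding = {"p": p, "s": s, "c": c}
        if all((rel, *[binding[v] for v in args]) in db for rel, args in motif):
            count += 1
    return count

motif = [("Teaches", ("p", "c")), ("TAs", ("s", "c")), ("Advises", ("p", "s"))]
print(true_groundings(motif))  # 1: (Pete, Sam, CS1)
```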
LSM
- Pass the selected motifs to FindPaths and CreateMLN
[The pipeline figure again: Relational DB → Identify Motifs → Find Paths → Create MLN → MLN.]
FindPaths
Motif: { Teaches(p,c), TAs(s,c), Advises(p,s) }
Paths found (see the sketch below):
  Advises(p,s)
  Advises(p,s), Teaches(p,c)
  Advises(p,s), Teaches(p,c), TAs(s,c)
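The shape of FindPaths' output can be sketched simply: as a path is grown one literal at a time within a motif, every non-empty prefix is itself a path. The variabilized-literal strings are illustrative.

```python
def paths_from_walk(literals):
    """Every non-empty prefix of the literal sequence is a path."""
    return [literals[:i] for i in range(1, len(literals) + 1)]

walk = ["Advises(p,s)", "Teaches(p,c)", "TAs(s,c)"]
for path in paths_from_walk(walk):
    print(", ".join(path))
# Advises(p,s)
# Advises(p,s), Teaches(p,c)
# Advises(p,s), Teaches(p,c), TAs(s,c)
```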
Clause Creation
Path: Advises(p,s), Teaches(p,c), TAs(s,c)
Clauses (disjunctions over the path's literals, with combinations of negations; a sketch follows):
  ¬Advises(p,s) ∨ ¬Teaches(p,c) ∨ TAs(s,c)
  Advises(p,s) ∨ ¬Teaches(p,c) ∨ ¬TAs(s,c)
  Advises(p,s) ∨ Teaches(p,c) ∨ ¬TAs(s,c)
  …
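A sketch of this step: disjoin the path's literals under every combination of negations, giving 2^n candidate clauses for n literals. In the ASCII output, `!` stands in for ¬ and `v` for ∨.

```python
from itertools import product

def clauses_from_path(literals):
    """Disjoin the literals under every combination of negations."""
    for signs in product([False, True], repeat=len(literals)):
        yield " v ".join(("!" if neg else "") + lit
                         for lit, neg in zip(literals, signs))

path = ["Advises(p,s)", "Teaches(p,c)", "TAs(s,c)"]
for clause in clauses_from_path(path):
    print(clause)  # 2^3 = 8 clauses, e.g. !Advises(p,s) v !Teaches(p,c) v TAs(s,c)
```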
Clause Pruning
Clause                                        Score
¬Advises(p,s) ∨ ¬Teaches(p,c) ∨ TAs(s,c)      -1.15
Advises(p,s) ∨ ¬Teaches(p,c) ∨ TAs(s,c)       -1.17
…                                             …
¬Advises(p,s) ∨ ¬Teaches(p,c)                 -2.21
¬Advises(p,s) ∨ TAs(s,c)                      -2.23
¬Teaches(p,c) ∨ TAs(s,c)                      -2.03
…                                             …
¬Advises(p,s)                                 -3.13
¬Teaches(p,c)                                 -2.93
TAs(s,c)                                      -3.93
…                                             …
Clause Pruning (cont.)
- Compare each clause's score against the scores of its sub-clauses (taken individually); a clause is kept only if it outscores them (see the sketch below).
[Same score table as on the previous slide.]
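A sketch of the pruning rule, assuming (per the slide) that a clause survives only if its score beats each of its sub-clauses taken individually; the scores are the numbers from the table above.

```python
from itertools import combinations

# Clause scores from the table ("!" = negation, tuples = disjunctions).
scores = {
    ("!Advises(p,s)", "!Teaches(p,c)", "TAs(s,c)"): -1.15,
    ("!Advises(p,s)", "!Teaches(p,c)"): -2.21,
    ("!Advises(p,s)", "TAs(s,c)"): -2.23,
    ("!Teaches(p,c)", "TAs(s,c)"): -2.03,
    ("!Advises(p,s)",): -3.13,
    ("!Teaches(p,c)",): -2.93,
    ("TAs(s,c)",): -3.93,
}

def keep(clause):
    """Keep a clause only if it outscores every scored sub-clause."""
    subs = (s for r in range(1, len(clause))
            for s in combinations(clause, r) if s in scores)
    return all(scores[clause] > scores[s] for s in subs)

print(keep(("!Advises(p,s)", "!Teaches(p,c)", "TAs(s,c)")))  # True
```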
MLN Creation
- Add all clauses to an empty MLN
- Train the clause weights
- Remove clauses whose absolute weights fall below a threshold (see the sketch below)
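A sketch of the final filtering step; the clauses, learned weights, and threshold value are illustrative.

```python
# Clauses with weights after training (values illustrative).
learned = [
    ("!Advises(p,s) v !Teaches(p,c) v TAs(s,c)", 2.7),
    ("Advises(p,s) v !Teaches(p,c) v TAs(s,c)", 0.02),
]
THRESHOLD = 0.1  # hypothetical cutoff

mln = [(clause, w) for clause, w in learned if abs(w) >= THRESHOLD]
print(mln)  # only the first clause survives
```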
Outline
- Background
- Learning Using Structural Motifs
- Experiments
- Future Work
Datasets
- IMDB
  - Created from the IMDB.com database
  - Movies, actors, etc., and their relationships
  - 17,793 ground atoms; 1,224 true ones
- UW-CSE
  - Describes an academic department
  - Students, faculty, etc., and their relationships
  - 260,254 ground atoms; 2,112 true ones
Datasets (cont.)
- Cora
  - Citations to computer science papers
  - Papers, authors, titles, etc., and their relationships
  - 687,422 ground atoms; 42,558 true ones
Methodology
- Five-fold cross-validation
- Inferred the probability of being true for the groundings of each predicate
  - Groundings of all other predicates given as evidence
  - For Cora, also inferred four predicates jointly: SameCitation, SameTitle, SameAuthor, SameVenue
- MCMC to evaluate test atoms: 10^6 samples or 24 hrs
- Evaluation measures: CLL (conditional log-likelihood), AUC
Methodology (cont.)
- Systems compared: LSM, LHL, BUSL, MSL
- Two maximum clause lengths per system
  - Short: length 4
  - Long: length 10
AUC & CLL
[Bar charts: AUC (top) and CLL (bottom) for LSM, LHL, BUSL, and MSL, with short clauses (left) and long clauses (right). Annotations mark LSM up 5% in AUC and up 45% in CLL over the next-best system.]
Runtimes
[Bar charts: runtimes in hours for LSM, LHL, BUSL, and MSL with short and long clauses. With short clauses all systems finish within roughly 10 hrs (LSM 1.33); with long clauses LSM takes 1.83 hrs while the slowest system is projected at ≈ 230,000 hrs, about 10,000× longer. An annotation notes 'same local maxima'.]
Long Rules Learned on Cora
- VenueOfCit(v,c) ∧ VenueOfCit(v,c') ∧ AuthorOfCit(a,c) ∧ AuthorOfCit(a',c') ∧ SameAuthor(a,a') ∧ TitleOfCit(t,c) ∧ TitleOfCit(t',c') ⇒ SameTitle(t,t')
- SameCitation(c,c') ∧ TitleOfCit(t,c) ∧ TitleOfCit(t',c') ∧ HasWordTitle(t,w) ∧ HasWordTitle(t',w) ∧ AuthorOfCit(a,c) ∧ AuthorOfCit(a',c') ∧ SameAuthor(a,a')
Outline
- Background
- Learning Using Structural Motifs
- Experiments
- Future Work
Future Work
- Discover motifs at multiple granularities
- Combine LSM with generate-and-test approaches
- Apply LSM to larger, richer domains, e.g., the Web
Conclusion
- LSM finds structural motifs in data using random walks and hitting times
- It accurately and efficiently learns long clauses by searching within motifs
- It outperforms state-of-the-art structure learners
Long Rules Learned (Cora)
- AuthorOfCit(a,c) ∧ AuthorOfCit(a',c') ∧ SameAuthor(a,a') ∧ TitleOfCit(t,c) ∧ TitleOfCit(t',c') ∧ SameTitle(t,t') ⇒ SameCitation(c,c')
- AuthorHasWord(a,w) ∧ AuthorHasWord(a',w') ∧ AuthorHasWord(a'',w) ∧ AuthorHasWord(a'',w') ⇒ SameAuthor(a,a')