Learning Markov Logic Networks Using Structural Motifs
Stanley Kok, Dept. of Computer Science and Eng., University of Washington, Seattle, USA
Joint work with Pedro Domingos

Outline
- Background
- Learning Using Structural Motifs
- Experiments
- Future Work

Markov Logic Networks [Richardson & Domingos, MLJ'06]
- A logical KB is a set of hard constraints on the set of possible worlds
- Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible
- Give each formula a weight (higher weight ⇒ stronger constraint)
  e.g., 2.7  Teaches(p,c) ⇒ Professor(p)

Markov Logic
- A Markov logic network (MLN) is a set of pairs (F, w):
  - F is a formula in first-order logic
  - w is a real number
- The probability of a world x (a vector of truth assignments to the ground atoms) is
  P(X = x) = (1/Z) exp( Σ_i w_i n_i(x) )
  where Z is the partition function, w_i is the weight of the ith formula, and n_i(x) is the number of true groundings of the ith formula

MLN Structure Learning
- Input: relational data, e.g.
  Advises: (Pete, Sam), (Pete, Saul), (Paul, Sara), …
  Teaches: (Pete, CS1), (Pete, CS2), (Paul, CS2), …
  TAs: (Sam, CS1), (Sam, CS2), (Sara, CS1), …
- Output: MLN, e.g.
  2.7  Teaches(p,c) ∧ TAs(s,c) ⇒ Advises(p,s)
  1.4  Advises(p,s) ∧ Teaches(p,c) ⇒ TAs(s,c)
  1.1  ¬TAs(s,c) ∨ ¬Advises(s,p)
  …

Previous Systems
- Generate-and-test or greedy:
  - MSL [Kok & Domingos, ICML'05]
  - BUSL [Mihalkova & Mooney, ICML'07]
- Computationally expensive; large search space
- Susceptible to local maxima

LHL System [Kok & Domingos, ICML'09]
[Figure: LHL 'lifts' the ground graph by clustering constants into high-level concepts (Professor, Student, Course), then traces paths in the lifted graph and converts the paths to first-order clauses.]

Outline
- Background
- Learning Using Structural Motifs
- Experiments
- Future Work

Learning Using Structural Motifs (LSM)
- First MLN structure learner that can learn long clauses
  - Captures more complex dependencies
  - Explores a larger space of clauses

LHL Recap
[Figure: the ground graph of an academic domain, with Advises, Teaches, and TAs edges linking professors (P1–P11), students (S1–S41), and courses (C1–C16) across several departments.]

Repeated Patterns
[Figure: the same Teaches/TAs/Advises pattern recurs for every professor, student, and course in the graph; the recurring pattern can be described with the high-level concepts Prof, Student, and Course.]

Learning Using Structural Motifs (LSM)
- Finds literals that are densely connected, using random walks & hitting times
  - Clusters nodes into high-level concepts
  - Uses symmetrical paths & nodes
- Groups literals into structural motifs
- Structural motif = a set of literals → a set of clauses = a subspace of clauses, e.g.
  { Teaches(p,c), TAs(s,c), Advises(p,s) } →
  ¬Teaches(p,c) ∨ TAs(s,c) ∨ Advises(p,s)
  Teaches(p,c) ∨ ¬TAs(s,c) ∨ …
  …

LSM's Three Components
- Input: relational DB (Advises, Teaches, and TAs tables)
- Pipeline: Identify Motifs → Find Paths → Create MLN
- Output: MLN, e.g.
  2.7  Teaches(p,c) ∧ TAs(s,c) ⇒ Advises(p,s)
  1.4  Advises(p,s) ∧ Teaches(p,c) ⇒ TAs(s,c)
  -1.1  TAs(s,c) ⇒ Advises(s,p)
  …

Random Walk
- Begin at node A
- Randomly pick a neighbor n
- Move to node n
- Repeat

Hitting Time from Node i to j
- Expected number of steps, starting from node i, before node j is visited for the first time
- Smaller hitting time → closer to start node i
- Truncated hitting time [Sarkar & Moore, UAI'07]: random walks are limited to T steps
- Computed efficiently & with high probability by sampling random walks [Sarkar, Moore & Prakash, ICML'08]

Finding Truncated Hitting Times by Sampling
[Figure: with T = 5, a sampled walk A→D→E→D→F→E contributes first-visit times to the estimates h_AA = 0, h_AD = 1, h_AE = 2, h_AF = 4, h_AB = 5, h_AC = 5; nodes not visited within T steps are assigned T.]

Symmetrical Paths
[Figure: two departments (Physics, History) with professors P1 and P2, students S1–S13, and courses C1–C4.]
- Paths P1→S2 and P1→S3 are symmetrical:
  P1→S2: ⟨0, Advises, 1⟩
  P1→S3: ⟨0, Advises, 1⟩
  P1→S2: ⟨0, Advises, 1, TAs, 2, TAs, 3⟩ (via S1 and C1)
  P1→S3: ⟨0, Advises, 1, TAs, 2, TAs, 3⟩ (via S4 and C1)
  …

Symmetrical Nodes
[Figure: the same two-department graph.]
- Symmetrical nodes have identical truncated hitting times
- Symmetrical
nodes have identical path distributions in a sample of random walks

Learning Using Structural Motifs
- Pipeline recap: relational DB → Identify Motifs → Find Paths → Create MLN

Sample Random Walks
[Figure: random walks from P1 record each path and its count, e.g. ⟨0, Advises, 1, TAs, 2⟩: 1.]

Estimate Truncated Hitting Times
[Figure: estimated truncated hitting times from P1 (h = 0 at P1); nodes in P1's department score about 3.2–3.99, while nodes in the other department score about 4.]

Prune 'Faraway' Nodes
[Figure: the nodes with hitting time ≈ 4 (P2's department) are pruned.]

Group Nodes with Similar Hitting Times
- Nodes with similar hitting times are candidate symmetrical nodes

Cluster Nodes
- Cluster nodes with similar path distributions, e.g.
  ⟨0, Advises, 1⟩: 0.5
  ⟨0, Advises, 2, …, 1⟩: 0.1
  …

Create 'Lifted' Graph
[Figure: Professor (P1), Student (S1–S8), and Course (C1, C2) clusters linked by Advises, TAs, and Teaches edges.]

Extract Motif with DFS
[Figure: a depth-first search over the lifted graph extracts the motif's literals.]

Create Motif
- { Teaches(P1,C1), TAs(S1,C1), Advises(P1,S1) } is a true grounding of the motif
  { Teaches(p,c), TAs(s,c), Advises(p,s) }

Restart from Next Node
[Figure: the random-walk process restarts from the next node, S1.]

Restart from Next Node
- A restart can produce a different motif over the same set of constants
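The truncated-hitting-time estimation step above can be sketched in Python. This is a hedged illustration of sampling-based estimation (after Sarkar & Moore), not the authors' implementation; the adjacency-list representation and parameter names are assumptions:

```python
import random

def truncated_hitting_times(graph, start, T=5, num_walks=1000):
    """Estimate truncated hitting times h(start, j) by sampling
    random walks of at most T steps. A node never reached within
    T steps contributes T (the truncation) to the average."""
    first_visit_sums = {node: 0.0 for node in graph}
    for _ in range(num_walks):
        visited = {start: 0}          # first-visit step per node
        node = start
        for step in range(1, T + 1):
            node = random.choice(graph[node])  # uniform neighbor
            if node not in visited:
                visited[node] = step
        for j in graph:
            first_visit_sums[j] += visited.get(j, T)
    return {j: s / num_walks for j, s in first_visit_sums.items()}
```

On a small chain graph A–B–C–D–E–F, the start node gets hitting time 0, its unique neighbor gets exactly 1, and nodes near the far end stay close to the truncation bound T, matching the intuition that smaller hitting time means closer to the start.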
[Figure: restarting from S1 groups the same constants into different clusters (e.g. Course1, Course2, Student1, Student, Professor), yielding a different motif.]

Select Motifs
- Choose motifs with a large estimated number of true groundings, e.g.
  { Teaches(p,c), TAs(s,c), Advises(p,s) }: 100
  { Teaches(p,c), … }: 20
  …

LSM
- Pass the selected motifs to FindPaths & CreateMLN

FindPaths
- Given the motif { Teaches(p,c), TAs(s,c), Advises(p,s) }, paths found:
  Advises(p,s)
  Advises(p,s), Teaches(p,c)
  Advises(p,s), Teaches(p,c), TAs(s,c)

Clause Creation
- Each path is converted into clauses by disjoining its literals under all combinations of signs, e.g.
  ¬Advises(p,s) ∨ ¬Teaches(p,c) ∨ TAs(s,c)
  Advises(p,s) ∨ ¬Teaches(p,c) ∨ TAs(s,c)
  …

Clause Pruning
- Compare each clause against its sub-clauses (taken individually), e.g. (scores shown):
  ¬Advises(p,s) ∨ ¬Teaches(p,c) ∨ TAs(s,c)   -1.15
  Advises(p,s) ∨ ¬Teaches(p,c) ∨ TAs(s,c)    -1.17
  …
  ¬Advises(p,s) ∨ ¬Teaches(p,c)              -3.13
  ¬Advises(p,s) ∨ TAs(s,c)                   -2.93
  ¬Teaches(p,c) ∨ TAs(s,c)                   -3.93
  …
  ¬Advises(p,s)                              -2.21
  ¬Teaches(p,c)                              -2.23
  TAs(s,c)                                   -2.03
  …

MLN Creation
- Add all clauses to an empty MLN
- Train the weights of the clauses
- Remove clauses with absolute weights below a threshold

Outline
- Background
- Learning Using Structural Motifs
- Experiments
- Future Work

Datasets
- IMDB
  - Created from the IMDB.com database
  - Movies, actors, etc., and their relationships
  - 17,793 ground atoms; 1,224 true ones
- UW-CSE
  - Describes an academic department
  - Students, faculty,
etc., and their relationships
  - 260,254 ground atoms; 2,112 true ones
- Cora
  - Citations to computer science papers
  - Papers, authors, titles, etc., and their relationships
  - 687,422 ground atoms; 42,558 true ones

Methodology
- Five-fold cross-validation
- Inferred the probability of being true for the groundings of each predicate, with the groundings of all other predicates as evidence
- For Cora, also inferred four predicates jointly: SameCitation, SameTitle, SameAuthor, SameVenue
- MCMC to evaluate test atoms: 10^6 samples or 24 hrs
- Evaluation measures: CLL, AUC

Methodology
- Systems: LSM, LHL, BUSL, MSL
- Two maximum clause lengths per system: short (4) and long (10)

AUC & CLL
[Figure: AUC and CLL bar charts for LSM, LHL, BUSL, and MSL with short and long clauses; LSM's AUC is about 5% higher and its CLL about 45% higher than the next-best system's.]

Runtimes
[Figure: runtimes in hours for short and long clauses; LSM is up to 10,000× faster than the other systems.]

Long Rules Learned on Cora
VenueOfCit(v,c) ∧ VenueOfCit(v,c') ∧ AuthorOfCit(a,c) ∧ AuthorOfCit(a',c') ∧ SameAuthor(a,a') ∧ TitleOfCit(t,c) ∧ TitleOfCit(t',c') ⇒ SameTitle(t,t')

SameCitation(c,c') ∧ TitleOfCit(t,c) ∧ TitleOfCit(t',c') ∧ HasWordTitle(t,w) ∧ HasWordTitle(t',w) ∧ AuthorOfCit(a,c) ∧ AuthorOfCit(a',c') ∧ SameAuthor(a,a')

Outline
- Background
- Learning Using Structural Motifs
- Experiments
- Future Work

Future Work
- Discover motifs at multiple granularities
- Combine LSM with generate-and-test approaches
- Apply LSM to larger, richer domains, e.g., the Web

Conclusion
- LSM finds structural motifs in data using random walks & hitting times
- Accurately and efficiently learns long clauses by searching within motifs
- Outperforms state-of-the-art structure learners

Long Rules Learned (Cora)
AuthorOfCit(a,c) ∧ AuthorOfCit(a',c') ∧ SameAuthor(a,a') ∧ TitleOfCit(t,c) ∧ TitleOfCit(t',c') ∧ SameTitle(t,t') ⇒ SameCitation(c,c')

AuthorHasWord(a,w) ∧ AuthorHasWord(a',w') ∧
AuthorHasWord(a'',w) ∧ AuthorHasWord(a'',w') ⇒ SameAuthor(a,a')
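The Clause Creation step described earlier (disjoining a path's literals under all combinations of signs) can be sketched as a short enumeration. This is an illustrative reconstruction, not LSM's actual code; the string encoding of literals (with "!" standing for ¬) and the `max_len` parameter are assumptions:

```python
from itertools import combinations, product

def clauses_from_motif(literals, max_len=3):
    """Enumerate candidate clauses as tuples of possibly negated
    literals, for every subset of the motif's literals up to
    max_len and every assignment of signs ("!" marks negation)."""
    clauses = []
    for n in range(1, max_len + 1):
        for subset in combinations(literals, n):
            for signs in product([True, False], repeat=n):
                clause = tuple(lit if positive else "!" + lit
                               for lit, positive in zip(subset, signs))
                clauses.append(clause)
    return clauses
```

For the running motif { Advises(p,s), Teaches(p,c), TAs(s,c) } this yields 6 unit clauses, 12 two-literal clauses, and 8 three-literal clauses (26 in all), which the pruning step would then score against their sub-clauses.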