Unifying MAX SAT, Local Consistency Relaxations, and Soft Logic with Hinge-Loss MRFs
Stephen H. Bach (Maryland), Bert Huang (Virginia Tech), Lise Getoor (UC Santa Cruz)

Modeling Relational Data with Markov Random Fields

Markov Random Fields
Probabilistic model for high-dimensional data:
- The random variables represent the data, such as whether a person has an attribute or whether a link exists
- The potentials score different configurations of the data
- The weights scale the influence of different potentials

Markov Random Fields
Variables and potentials form a graphical structure.
[Figure: graph of variables connected through potentials]

MRFs with Logic
One way to compactly define MRFs is with first-order logic, e.g., Markov logic networks. Use first-order logic to define templates for potentials:
- Ground out weighted rules over graph data
- The truth table of each ground rule is a potential
- Each first-order rule has a weight that becomes the potential's weight
[Richardson and Domingos, 2006]

MRFs with Logic
Let \{(C_j, w_j)\}_{j=1}^m be a set of weighted logical rules, where rule C_j has weight w_j. Each rule has an equivalent clausal form
    C_j \equiv \bigvee_{i \in I_j^+} x_i \vee \bigvee_{i \in I_j^-} \neg x_i
where the sets I_j^+ and I_j^- index the variables that appear unnegated and negated in clause C_j.

MRFs with Logic
Probability distribution:
    P(\mathbf{x}) \propto \exp\left(\sum_{j=1}^m w_j \phi_j(\mathbf{x})\right)
where \phi_j(\mathbf{x}) = 1 if clause C_j is satisfied by \mathbf{x}, and 0 otherwise.

MAP Inference
MAP (maximum a posteriori) inference seeks a most-probable assignment to the unobserved variables. MAP inference is
    \arg\max_{\mathbf{x} \in \{0,1\}^n} \sum_{j=1}^m w_j \phi_j(\mathbf{x})
i.e., weighted MAX SAT. This MAX SAT problem is combinatorial and NP-hard!

MAX SAT Relaxation

Approximate Inference
View MAP inference as optimizing rounding probabilities y_i = P(x_i = 1). The expected score of a clause is a weighted noisy-or function:
    w_j \left(1 - \prod_{i \in I_j^+} (1 - y_i) \prod_{i \in I_j^-} y_i\right)
Then the expected total score is the sum of these terms over all clauses. But this expected total score is highly non-convex!

Approximate Inference
It is the products in the objective that make it non-convex. The expected score can be lower bounded using the relationship between arithmetic and geometric means:
    w_j \left(1 - \prod_{i \in I_j^+} (1 - y_i) \prod_{i \in I_j^-} y_i\right) \ge w_j \left(1 - \left(1 - \tfrac{1}{|C_j|}\right)^{|C_j|}\right) \min\left\{1,\ \sum_{i \in I_j^+} y_i + \sum_{i \in I_j^-} (1 - y_i)\right\}
This leads to a concave lower bound on the expected total score.
[Goemans and Williamson, 1994]

Approximate Inference
So, we solve the linear program
    \max_{\mathbf{y} \in [0,1]^n} \sum_{j=1}^m w_j \min\left\{1,\ \sum_{i \in I_j^+} y_i + \sum_{i \in I_j^-} (1 - y_i)\right\}
If we round each variable to true with probability y_i taken from the solution, a greedy rounding method will find a (1 - 1/e)-optimal discrete solution. If we instead round with an appropriate function of y_i, it improves to ¾-optimal.
[Goemans and Williamson, 1994]

Local Consistency Relaxation

Local Consistency Relaxation
LCR is a popular technique for approximating MAP in MRFs:
- Often simply called linear programming (LP) relaxation
- Dual decomposition solves the dual of the LCR objective
Idea: relax the search over consistent marginals to a simpler set. LCR admits fast message-passing algorithms, but no quality guarantees in general.
[Wainwright and Jordan, 2008]

Local Consistency Relaxation
The relaxation optimizes over pseudomarginals: one distribution over each variable's states and one over each potential's joint states, constrained only to be locally consistent with each other.
[Wainwright and Jordan, 2008]

Unifying the Relaxations

Analysis
[Figure: the LCR objective decomposes into independent subproblems, one per potential, j = 1, 2, 3, and so on]
[Bach et al., AISTATS 2015]

Analysis
We can now analyze each potential's parameterized subproblem in isolation. Using the KKT conditions, we can find a simplified expression for each subproblem's solution in terms of its parameters, the variable pseudomarginals.
[Bach et al., AISTATS 2015]

Analysis
Substituting these solutions back into the outer objective leads to a simplified, projected LCR over the variable pseudomarginals alone.
[Bach et al., AISTATS 2015]

Analysis
Result: for logical MRFs, the Local Consistency Relaxation and the MAX SAT Relaxation are equivalent.
[Bach et al., AISTATS 2015]

Consequences
The MAX SAT relaxation can be solved with a choice of algorithms, and the rounding guarantees apply to LCR!
[Bach et al., AISTATS 2015]

Soft Logic and Continuous Values

Continuous Values
Continuous values can also be interpreted as similarities, or as degrees of truth. Łukasiewicz logic is a fuzzy logic for reasoning about imprecise concepts.
[Bach et al., in preparation]
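To make the soft-logic reading concrete, here is a minimal Python sketch (the function names and truth values are hypothetical, chosen for illustration) of how Łukasiewicz logic evaluates a disjunctive clause over truth values in [0, 1]: a clause's value is the bounded sum min(1, sum of its literal values), the same capped linear term that appears in the relaxed MAX SAT objective above.

# Illustrative sketch: Lukasiewicz semantics for a disjunctive clause
# over soft truth values in [0, 1].

def luk_not(x):
    """Lukasiewicz negation: 1 - x."""
    return 1.0 - x

def luk_or(values):
    """Lukasiewicz (strong) disjunction: min(1, sum of the values)."""
    return min(1.0, sum(values))

def clause_value(pos, neg):
    """Truth value of a clause with unnegated literals `pos` and
    negated literals `neg`, each given as truth values in [0, 1]."""
    return luk_or(list(pos) + [luk_not(x) for x in neg])

# The clause (x1 v x2 v ~x3) under a soft assignment:
print(clause_value(pos=[0.2, 0.4], neg=[0.9]))  # min(1, 0.2 + 0.4 + 0.1) = 0.7

With 0/1 inputs this reduces to ordinary Boolean disjunction; with fractional inputs it reproduces the min{1, ...} clause terms above, which is the connection the next slides make precise.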
All Three are Equivalent
[Figure: the Local Consistency Relaxation, exact MAX SAT for Łukasiewicz logic, and the MAX SAT Relaxation are all equivalent]
[Bach et al., in preparation]

Consequences
Exact MAX SAT for Łukasiewicz logic is equivalent to relaxed Boolean MAX SAT and to the local consistency relaxation for logical MRFs. So these scalable message-passing algorithms can also be used to reason about similarity, imprecise concepts, etc.!
[Bach et al., in preparation]

Hinge-Loss Markov Random Fields

Generalizing Relaxed MRFs
Relaxed, logic-based MRFs can reason about both discrete and continuous relational data scalably and accurately. Define a new distribution over continuous variables, using the relaxed objective as its energy:
    P(\mathbf{y}) \propto \exp\left(\sum_{j=1}^m w_j \min\left\{1,\ \sum_{i \in I_j^+} y_i + \sum_{i \in I_j^-} (1 - y_i)\right\}\right), \quad \mathbf{y} \in [0,1]^n
We can generalize this inference objective to be the energy of a new type of MRF that does even more.
[Bach et al., NIPS 12; Bach et al., UAI 13]

Generalizations
- Arbitrary hinge-loss functions (not just those derived from logical clauses)
- Hard linear constraints
- Squared hinge losses
[Bach et al., NIPS 12; Bach et al., UAI 13]

Hinge-Loss MRFs
Define hinge-loss MRFs by using this generalized objective as the energy function:
    P(\mathbf{y}) \propto \exp\left(-\sum_{j=1}^m w_j \max\{0, \ell_j(\mathbf{y})\}^{p_j}\right)
where \mathbf{y} \in [0,1]^n, each \ell_j is a linear function, and p_j \in \{1, 2\}.
[Bach et al., NIPS 12; Bach et al., UAI 13]

HL-MRF Inference and Learning
MAP inference for HL-MRFs is always a convex optimization:
- Highly scalable ADMM algorithm for MAP
Supervised learning:
- No need to hand-tune weights
- Learn from training data
- Also highly scalable
Unsupervised and semi-supervised learning:
- New learning algorithm that interleaves inference and parameter updates to cut learning time by as much as 90% (under review)
- More in Bert's talk
[Bach et al., NIPS 12; Bach et al., UAI 13]

Probabilistic Soft Logic

Probabilistic Soft Logic (PSL)
Probabilistic programming language for defining HL-MRFs. PSL components:
- Predicates: relationships or properties
- Atoms: (continuous) random variables
- Rules: potentials

Example: Voter Identification
5.0 : Donates(A, "Republican") -> Votes(A, "Republican")
0.3 : Mentions(A, "Affordable Care") -> Votes(A, "Democrat")
Votes(A, "Republican") + Votes(A, "Democrat") = 1.0 .
[Figure: a person with donation records, status updates, and tweets as evidence]

Example: Voter Identification
0.3 : Votes(A,P) && Friend(B,A) -> Votes(B,P)
0.8 : Votes(A,P) && Spouse(B,A) -> Votes(B,P)
[Figure: a social network with friend, spouse, and colleague links]

Example: Voter Identification
/* Predicate definitions */
Votes(Person, Party)
Donates(Person, Party)     (closed)
Mentions(Person, Term)     (closed)
Colleague(Person, Person)  (closed)
Friend(Person, Person)     (closed)
Spouse(Person, Person)     (closed)

/* Local rules */
5.0 : Donates(A, P) -> Votes(A, P)
0.3 : Mentions(A, "Affordable Care") -> Votes(A, "Democrat")
0.3 : Mentions(A, "Tax Cuts") -> Votes(A, "Republican")
...

/* Relational rules */
1.0 : Votes(A,P) && Spouse(B,A) -> Votes(B,P)
0.3 : Votes(A,P) && Friend(B,A) -> Votes(B,P)
0.1 : Votes(A,P) && Colleague(B,A) -> Votes(B,P)

/* Range constraint */
Votes(A, "Republican") + Votes(A, "Democrat") = 1.0 .

PSL Defines HL-MRFs
[Figure: the same voter-identification program, grounded over the relational data, defines an HL-MRF]
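To illustrate how such a program defines an HL-MRF, here is a minimal Python sketch (the helper name, atom values, and groundings are hypothetical; this is not PSL's implementation): each ground rule of the form w : Body1 && Body2 -> Head contributes a hinge-loss potential, its weighted distance to satisfaction under Łukasiewicz semantics, and the MAP energy is the sum of these potentials.

# Illustrative sketch: a ground PSL rule as a hinge-loss potential.

def hinge_potential(weight, body, head, squared=False):
    """Weighted distance to satisfaction of the ground rule
    body_1 && ... && body_k -> head under Lukasiewicz semantics:
    weight * max(0, sum(body) - (k - 1) - head)."""
    distance = max(0.0, sum(body) - (len(body) - 1) - head)
    return weight * (distance ** 2 if squared else distance)

# Hypothetical ground atoms from the voter-identification program.
donates_ann = 0.8   # Donates(Ann, "Republican"), observed
votes_ann = 0.9     # Votes(Ann, "Republican"), free
votes_bob = 0.3     # Votes(Bob, "Republican"), free
spouse_ba = 1.0     # Spouse(Bob, Ann), observed

energy = (
    hinge_potential(5.0, body=[donates_ann], head=votes_ann)             # local rule
    + hinge_potential(1.0, body=[votes_ann, spouse_ba], head=votes_bob)  # spouse rule
)
print(energy)  # approximately 0.6

MAP inference minimizes this convex energy over the free atoms, subject to hard constraints such as the range constraint Votes(A, "Republican") + Votes(A, "Democrat") = 1.0, which restricts the feasible set rather than adding to the energy.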
Open Source Implementation
PSL is implemented as an open source library and programming-language interface. It's ready to use for your next project. Some of the other groups already using PSL:
- Jure Leskovec (Stanford) [West et al., TACL 14]
- Dan Jurafsky (Stanford) [Li et al., ArXiv 14]
- Ray Mooney (UT Austin) [Beltagy et al., ACL 14]
- Kevin Murphy (Google) [Pujara et al., BayLearn 14]

Other PSL Topics
Not discussed here:
- Smart grounding
- Lazy inference
- Distributed inference and learning
Future work:
- Lifted inference
- Generalized rounding guarantees

psl.cs.umd.edu

Applications

PSL Empirical Highlights
Compared with discrete MRFs:

              Collective Classification    Trust Prediction
    PSL       81.8%, 0.7 sec               .482 AuPR, 0.32 sec
    Discrete  79.7%, 184.3 sec             .441 AuPR, 212.36 sec

Predicting MOOC outcomes via latent engagement (AuPR):

                  Tech     Women-Civil  Genes
    Lecture Rank  0.333    0.508        0.688
    PSL-Direct    0.393    0.565        0.757
    PSL-Latent    0.546    0.816        0.818

[Bach et al., UAI 13; Ramesh et al., AAAI 14]

PSL Empirical Highlights
Improved activity recognition in video:

               5 Activities      6 Activities
    HOG        47.4%, .481 F1    59.6%, .582 F1
    PSL + HOG  59.8%, .603 F1    79.3%, .789 F1
    ACD        67.5%, .678 F1    83.5%, .835 F1
    PSL + ACD  69.2%, .693 F1    86.0%, .860 F1

Compared on drug-target interaction prediction:

                      AuPR          P@130
    Perlman's Method  .564 ± .05    .594 ± .04
    PSL               .617 ± .05    .616 ± .04

[London et al., CVPR WS 13; Fakhraei et al., TCBB 14]

Conclusion
HL-MRFs unite and generalize different ways of viewing fundamental AI problems, including MAX SAT, probabilistic graphical models, and fuzzy logic. PSL's mix of expressivity and scalability makes it a general-purpose tool for network analysis, bioinformatics, NLP, computer vision, and more. Many exciting open problems remain, from algorithms, to theory, to applications.