Markov Logic Networks: A Step Towards a Unified Theory of Learning and Cognition
Pedro Domingos
Dept. of Computer Science & Eng., University of Washington
Joint work with Jesse Davis, Stanley Kok, Daniel Lowd, Hoifung Poon, Matt Richardson, Parag Singla, Marc Sumner

One Algorithm
- Observation: the cortex has the same architecture throughout
- Hypothesis: a single learning/inference algorithm underlies all cognition
- Let's discover it!

The Neuroscience Approach
- Map the brain and figure out how it works
- Problem: we don't know nearly enough

The Engineering Approach
- Pick a task (e.g., object recognition)
- Figure out how to solve it
- Generalize to other tasks
- Problem: any one task is too impoverished

The Foundational Approach
- Consider all the tasks the brain does
- Figure out what they have in common
- Formalize, test and repeat
- Advantage: plenty of clues

Where to Start?
"The grand aim of science is to cover the greatest number of experimental facts by logical deduction from the smallest number of hypotheses or axioms." (Albert Einstein)

Recurring Themes
- Noise, uncertainty, incomplete information → probability / graphical models
- Complexity, many objects & relations → first-order logic

Examples

Field                       | Logical approach             | Statistical approach
Knowledge representation    | First-order logic            | Graphical models
Automated reasoning         | Satisfiability testing       | Markov chain Monte Carlo
Machine learning            | Inductive logic programming  | Neural networks
Planning                    | Classical planning           | Markov decision processes
Natural language processing | Definite clause grammars     | Prob. context-free grammars

Research Plan
- Unify graphical models and first-order logic
- Develop learning and inference algorithms
- Apply to a wide variety of problems
- This talk: Markov logic networks

Markov Logic Networks
- MLN = set of first-order formulas with weights
- Formula = feature template (variables → objects)
- E.g., Ising model: Up(x) ^ Neighbor(x,y) => Up(y)
- P(x) = (1/Z) exp(Σ_i w_i n_i(x)), where w_i is the weight of formula i and n_i(x) is the number of true instances of formula i in x
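A minimal brute-force sketch of this distribution (the two-object domain, the single formula, and all names are illustrative assumptions, not from the talk): it counts true groundings of the Ising-style formula in each possible world and normalizes by Z.

```python
import itertools, math

# Toy MLN with one weighted formula: Up(x) ^ Neighbor(x,y) => Up(y).
# A "world" assigns a truth value to every ground atom.
domain = ["A", "B"]
w = 1.5  # formula weight (illustrative)

def formula_count(world):
    """n(world): number of true groundings of the implication."""
    n = 0
    for x, y in itertools.product(domain, repeat=2):
        body = world[("Up", x)] and world[("Neighbor", x, y)]
        head = world[("Up", y)]
        n += (not body) or head  # an implication is true unless body holds and head fails
    return n

atoms = [("Up", x) for x in domain] + \
        [("Neighbor", x, y) for x, y in itertools.product(domain, repeat=2)]

def all_worlds():
    for bits in itertools.product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, bits))

# P(world) = (1/Z) exp(w * n(world)); Z sums over all 2^|atoms| worlds.
Z = sum(math.exp(w * formula_count(wld)) for wld in all_worlds())

def prob(world):
    return math.exp(w * formula_count(world)) / Z
```

Enumerating worlds is exponential, of course; the point is only to make the semantics of the weight concrete. The inference algorithms on the following slides exist precisely to avoid this enumeration.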
- Most graphical models are special cases
- First-order logic is the infinite-weight limit

MLN Algorithms: The First Three Generations

Problem            | First generation        | Second generation | Third generation
MAP inference      | Weighted satisfiability | Lazy inference    | Cutting planes
Marginal inference | Gibbs sampling          | MC-SAT            | Lifted inference
Weight learning    | Pseudo-likelihood       | Voted perceptron  | Scaled conj. gradient
Structure learning | Inductive logic progr.  | ILP + PL (etc.)   | Clustering + pathfinding

Weighted Satisfiability
- SAT: find a truth assignment that makes all formulas (clauses) true
- Huge amount of research on this problem
- State of the art: millions of vars/clauses in minutes
- MaxSAT: make as many clauses true as possible
- Weighted MaxSAT: clauses have weights; maximize satisfied weight
- MAP inference in MLNs is just weighted MaxSAT
- Best current solver: MaxWalkSAT

MC-SAT
- Deterministic dependences break MCMC; in practice, even strong probabilistic ones do
- Swendsen-Wang: introduce auxiliary variables u to represent constraints among x; alternately sample u | x and x | u
- But Swendsen-Wang only works for Ising models
- MC-SAT: generalize S-W to arbitrary clauses
- Uses a SAT solver to sample x | u
- Orders of magnitude faster than Gibbs sampling, etc.
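A simplified sketch of the MaxWalkSAT procedure named above (the clause encoding, parameter defaults, and helper names are my assumptions; the real solver is far more optimized): repeatedly pick an unsatisfied clause and flip one of its variables, either at random or greedily.

```python
import random

# Clause = list of int literals: positive i means variable i, negative its
# negation. Each clause carries a weight; we minimize unsatisfied weight.

def clause_sat(clause, assign):
    return any(assign[abs(l)] == (l > 0) for l in clause)

def cost(clauses, assign):
    """Total weight of unsatisfied clauses (lower is better)."""
    return sum(w for clause, w in clauses if not clause_sat(clause, assign))

def max_walk_sat(clauses, n_vars, max_flips=1000, p_noise=0.5, rng=random):
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    best, best_cost = dict(assign), cost(clauses, assign)
    for _ in range(max_flips):
        unsat = [c for c, w in clauses if not clause_sat(c, assign)]
        if not unsat:
            return assign                  # everything satisfied
        clause = rng.choice(unsat)         # pick an unsatisfied clause
        if rng.random() < p_noise:
            v = abs(rng.choice(clause))    # random-walk flip
        else:                              # greedy flip: minimize resulting cost
            def cost_if_flipped(var):
                assign[var] = not assign[var]
                c = cost(clauses, assign)
                assign[var] = not assign[var]
                return c
            v = min({abs(l) for l in clause}, key=cost_if_flipped)
        assign[v] = not assign[v]
        if cost(clauses, assign) < best_cost:
            best, best_cost = dict(assign), cost(clauses, assign)
    return best
```

For MAP inference in an MLN, the clauses would be the ground formulas and the weights the formula weights (negative weights require a standard preprocessing step, omitted here).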
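A sketch of one MC-SAT step, under a strong simplifying assumption: the domain is small enough to sample x | u by enumerating all assignments, where the real algorithm calls a SAT-based near-uniform sampler (SampleSAT). The clause encoding matches the toy format above and is my assumption.

```python
import itertools, math, random

def clause_sat(clause, assign):
    return any(assign[abs(l)] == (l > 0) for l in clause)

def mc_sat_step(clauses, assign, n_vars, rng):
    # Auxiliary variables u: each clause satisfied by the current state is
    # kept as a hard constraint with probability 1 - exp(-w).
    kept = [c for c, w in clauses
            if clause_sat(c, assign) and rng.random() < 1 - math.exp(-w)]
    # Sample x | u: uniform over assignments satisfying all kept clauses
    # (brute-force enumeration here; SampleSAT in the real algorithm).
    candidates = [dict(zip(range(1, n_vars + 1), bits))
                  for bits in itertools.product([False, True], repeat=n_vars)]
    candidates = [a for a in candidates if all(clause_sat(c, a) for c in kept)]
    return rng.choice(candidates)

# Example: one clause (x1) with weight 1.0. The stationary distribution
# has P(x1) = e / (1 + e) ≈ 0.73, which the running estimate should approach.
rng = random.Random(0)
clauses = [([1], 1.0)]
state, hits, n_steps = {1: False}, 0, 4000
for _ in range(n_steps):
    state = mc_sat_step(clauses, state, 1, rng)
    hits += state[1]
estimate = hits / n_steps
```

Note how hard (infinite-weight) clauses are handled for free: 1 - exp(-w) → 1, so they are always kept as constraints, which is exactly the deterministic-dependence case that breaks plain Gibbs sampling.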
Lifted Inference
- Consider belief propagation (BP)
- In large problems, many nodes are often interchangeable: they send and receive the same messages throughout BP
- Basic idea: group them into supernodes, forming a lifted network
- Smaller network → faster inference
- Akin to resolution in first-order logic

Belief Propagation
- Messages pass between nodes (x) and features (f):
  μ_{x→f}(x) = Π_{h ∈ n(x)\{f}} μ_{h→x}(x)
  μ_{f→x}(x) = Σ_{~{x}} e^{w f(x)} Π_{y ∈ n(f)\{x}} μ_{y→f}(y)

Lifted Belief Propagation
- Same message equations, applied to supernodes and superfeatures
- Message updates are modified by edge counts (how many identical ground edges each lifted edge represents)

Weight Learning
- Pseudo-likelihood + L-BFGS is fast and robust but can give poor inference results
- Voted perceptron: gradient descent + MAP inference
- Problem: multiple modes
  - Not alleviated by contrastive divergence
  - Alleviated by MC-SAT
- Start each MC-SAT run at the previous end state

Weight Learning (contd.)
- Problem: extreme ill-conditioning
- Solvable by quasi-Newton, conjugate gradient, etc.
- But line searches require exact inference
- Stochastic gradient not applicable because data are not i.i.d.
- Solution: scaled conjugate gradient
  - Use the Hessian to choose the step size
  - Compute the quadratic form inside MC-SAT
  - Use the inverse diagonal Hessian as a preconditioner

Structure Learning
- Standard inductive logic programming optimizes the wrong thing
- But can be used to overgenerate for L1 pruning
- Our approach: ILP + pseudo-likelihood + structure priors
- For each candidate structure change:
  - Start from current weights and relax convergence
  - Use subsampling to compute sufficient statistics
- Search methods: beam, shortest-first, etc.

Applications to Date
- Natural language processing
- Information extraction
- Entity resolution
- Link prediction
- Collective classification
- Social network analysis
- Robot mapping
- Activity recognition
- Scene analysis
- Computational biology
- Probabilistic Cyc
- Personal assistants
- Etc.
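The BP message equations can be traced on a toy two-variable model (the model, weights, and names are illustrative assumptions): a variable sends the product of its other incoming messages, and a factor sums e^{w f(x)} times the product of messages from its other neighbors. On a tree, BP is exact, so the result can be checked by brute force.

```python
import itertools, math

# Model: two binary variables; one pairwise feature f(x1,x2) = [x1 == x2]
# with weight w, one unary feature g(x1) = [x1] with weight u.
w, u = 1.2, 0.7

def factor_pair(x1, x2): return math.exp(w * (x1 == x2))
def factor_unary(x1):    return math.exp(u * x1)

# Messages on the tree: unary g -> x1 -> pairwise f -> x2.
m_g_x1 = [factor_unary(v) for v in (0, 1)]      # factor g to node x1
m_x1_f = m_g_x1                                 # x1 to f: product of other incoming msgs
m_f_x2 = [sum(factor_pair(v1, v2) * m_x1_f[v1]  # f to x2: sum out x1
              for v1 in (0, 1)) for v2 in (0, 1)]

z = sum(m_f_x2)
marg_x2 = [m / z for m in m_f_x2]               # marginal of x2 (exact on a tree)

# Brute-force check of the same marginal.
weights = {(a, b): factor_unary(a) * factor_pair(a, b)
           for a, b in itertools.product((0, 1), repeat=2)}
Z = sum(weights.values())
brute_x2 = [sum(v for (a, b), v in weights.items() if b == t) / Z for t in (0, 1)]
```

Lifted BP runs exactly these updates, but once per supernode/superfeature instead of once per ground node, with the edge counts supplying the missing multiplicities.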
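A sketch of why pseudo-likelihood is cheap (toy chain model and data are illustrative, not from the talk): each conditional P(atom | rest) needs only the formula counts with that one atom flipped on or off, never the partition function Z.

```python
import math

# Toy model: a chain of Up(i) atoms with the Ising-style rule
# Up(i) ^ Neighbor(i, i+1) => Up(i+1) from the earlier slides.
chain = [True, True, False, True]  # observed truth values of Up(0..3)

def count(world):
    """Number of true groundings of the implication along the chain."""
    return sum((not world[i]) or world[i + 1] for i in range(len(world) - 1))

def log_pl(world, w):
    """Log pseudo-likelihood: sum over atoms of log P(atom | all others)."""
    lp = 0.0
    for i in range(len(world)):
        on, off = list(world), list(world)
        on[i], off[i] = True, False
        # Only the two local counts are needed; Z never appears.
        log_denom = math.log(math.exp(w * count(on)) + math.exp(w * count(off)))
        lp += w * count(world) - log_denom
    return lp
```

Maximizing log_pl over the weights (e.g., with L-BFGS, as on the slide) is the first-generation weight-learning method: fast and robust because it needs no inference, but optimizing a surrogate of the true likelihood, which is why it can give poor inference results downstream.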
Unsupervised Semantic Parsing
- Goal: Microsoft buys Powerset. → BUYS(MICROSOFT, POWERSET)
- Challenge: many surface forms for the same fact
  - Microsoft buys Powerset
  - Microsoft acquires semantic search engine Powerset
  - Powerset is acquired by Microsoft Corporation
  - The Redmond software giant buys Powerset
  - Microsoft's purchase of Powerset, …
- USP: recursively cluster expressions composed of similar subexpressions
- Evaluation: extract knowledge from biomedical abstracts and answer questions
  - Substantially outperforms the state of the art
  - Three times as many correct answers; accuracy 88%

Research Directions
- Compact representations
  - Deep architectures
  - Boolean decision diagrams
  - Arithmetic circuits
- Unified inference procedure
- Learning MLNs with many latent variables
- Tighter integration of learning and inference
- End-to-end NLP system
- Complete agent

Resources
- Open-source software/Web site: Alchemy
  - Learning and inference algorithms
  - Tutorials, manuals, etc.
  - MLNs, datasets, etc.
  - Publications
- alchemy.cs.washington.edu
- Book: Domingos & Lowd, Markov Logic, Morgan & Claypool, 2009.