Markov Logic Networks:
A Step Towards a Unified Theory of
Learning and Cognition
Pedro Domingos
Dept. of Computer Science & Eng.
University of Washington
Joint work with Jesse Davis, Stanley Kok, Daniel Lowd,
Hoifung Poon, Matt Richardson, Parag Singla, Marc Sumner

One Algorithm
Observation: The cortex has the same architecture throughout.
Hypothesis: A single learning/inference algorithm underlies all cognition.
Let’s discover it!

The Neuroscience Approach
Map the brain and figure out how it works.
Problem: We don’t know nearly enough.

The Engineering Approach
Pick a task (e.g., object recognition).
Figure out how to solve it.
Generalize to other tasks.
Problem: Any one task is too impoverished.

The Foundational Approach
Consider all tasks the brain does.
Figure out what they have in common.
Formalize, test, and repeat.
Advantage: Plenty of clues.
Where to start?

"The grand aim of science is to cover the greatest number of experimental facts by logical deduction from the smallest number of hypotheses or axioms."
Albert Einstein

Recurring Themes
Noise, uncertainty, incomplete information → Probability / graphical models
Complexity, many objects & relations → First-order logic

Examples

Field                         Logical approach              Statistical approach
Knowledge representation      First-order logic             Graphical models
Automated reasoning           Satisfiability testing        Markov chain Monte Carlo
Machine learning              Inductive logic programming   Neural networks
Planning                      Classical planning            Markov decision processes
Natural language processing   Definite clause grammars      Prob. context-free grammars

Research Plan
Unify graphical models and first-order logic.
Develop learning and inference algorithms.
Apply to a wide variety of problems.
This talk: Markov logic networks.

Markov Logic Networks
MLN = set of first-order formulas with weights
Formula = feature template (grounded by substituting objects for variables)
E.g., Ising model:  Up(x) ^ Neighbor(x,y) => Up(y)

P(x) = \frac{1}{Z} \exp\left( \sum_i w_i \, n_i(x) \right)

where w_i is the weight of formula i and n_i(x) is the number of true groundings of formula i in x.
Most graphical models are special cases.
First-order logic is the infinite-weight limit.
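
A minimal sketch of this definition, assuming a toy three-object domain, hypothetical Neighbor facts, and a single weighted formula (an illustration of the equation above, not Alchemy's API):

```python
# Score a world under a one-formula MLN: Up(x) ^ Neighbor(x,y) => Up(y), weight w.
# The domain, facts, and weight below are assumptions for illustration.
import math
from itertools import product

objects = ["A", "B", "C"]                      # hypothetical constants
neighbor = {("A", "B"), ("B", "C")}            # hypothetical Neighbor(x,y) facts
w = 1.0                                        # assumed formula weight

def n_true(up):
    """Number of true groundings of Up(x) ^ Neighbor(x,y) => Up(y) in world `up`."""
    count = 0
    for x, y in product(objects, repeat=2):
        body = up[x] and (x, y) in neighbor
        count += (not body) or up[y]           # an implication is true unless body holds and head fails
    return count

def unnormalized(up):
    return math.exp(w * n_true(up))            # exp(sum_i w_i n_i(x)), here with a single formula

# P(x) = unnormalized(x) / Z, with Z summing over all 2^|objects| possible worlds
Z = sum(unnormalized(dict(zip(objects, bits)))
        for bits in product([False, True], repeat=len(objects)))
world = {"A": True, "B": True, "C": False}
print(unnormalized(world) / Z)
```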

MLN Algorithms: The First Three Generations

Problem              First generation          Second generation   Third generation
MAP inference        Weighted satisfiability   Lazy inference      Cutting planes
Marginal inference   Gibbs sampling            MC-SAT              Lifted inference
Weight learning      Pseudo-likelihood         Voted perceptron    Scaled conj. gradient
Structure learning   Inductive logic progr.    ILP + PL (etc.)     Clustering + pathfinding

Weighted Satisfiability
SAT: Find a truth assignment that makes all formulas (clauses) true.
Huge amount of research on this problem.
State of the art: Millions of variables/clauses in minutes.
MaxSAT: Make as many clauses true as possible.
Weighted MaxSAT: Clauses have weights; maximize the total weight of satisfied clauses.
MAP inference in MLNs is just weighted MaxSAT.
Best current solver: MaxWalkSAT.
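
A minimal MaxWalkSAT-style sketch (illustrative; the clause encoding and parameter defaults are assumptions, not the actual solver implementation):

```python
import random

def clause_sat(lits, assign):
    """A clause is a list of literals (variable, is_positive)."""
    return any(assign[v] == pos for v, pos in lits)

def unsat_weight(clauses, assign):
    return sum(w for w, lits in clauses if not clause_sat(lits, assign))

def maxwalksat(variables, clauses, max_flips=100000, p=0.5):
    """clauses: list of (weight, literals). Returns the best assignment found."""
    assign = {v: random.random() < 0.5 for v in variables}
    best, best_cost = dict(assign), unsat_weight(clauses, assign)
    for _ in range(max_flips):
        unsat = [(w, lits) for w, lits in clauses if not clause_sat(lits, assign)]
        if not unsat:
            return assign                       # all clauses satisfied
        _, lits = random.choice(unsat)          # pick a random unsatisfied clause
        if random.random() < p:
            var = random.choice(lits)[0]        # random-walk move
        else:                                   # greedy move: flip the variable that
            def cost_if_flipped(v):             # most reduces the unsatisfied weight
                assign[v] = not assign[v]
                c = unsat_weight(clauses, assign)
                assign[v] = not assign[v]
                return c
            var = min((v for v, _ in lits), key=cost_if_flipped)
        assign[var] = not assign[var]
        cost = unsat_weight(clauses, assign)
        if cost < best_cost:
            best, best_cost = dict(assign), cost
    return best
```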

MC-SAT
Deterministic dependences break MCMC; in practice, even strong probabilistic ones do.
Swendsen-Wang:
  Introduce auxiliary variables u to represent constraints among x.
  Alternately sample u | x and x | u.
But Swendsen-Wang only works for Ising models.
MC-SAT: Generalize Swendsen-Wang to arbitrary clauses.
Uses a SAT solver to sample x | u.
Orders of magnitude faster than Gibbs sampling, etc.
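
One MC-SAT iteration as a minimal sketch (illustrative; `sample_sat` stands in for a near-uniform satisfying-assignment sampler such as SampleSAT and is an assumption here):

```python
import math
import random

def satisfied(lits, state):
    return any(state[v] == pos for v, pos in lits)

def mc_sat_step(clauses, state, sample_sat):
    """clauses: list of (weight, literals) ground clauses; state: current assignment;
    sample_sat(M): returns a (near-)uniform sample satisfying every clause in M."""
    M = []
    for w, lits in clauses:
        # Each clause satisfied by the current state is kept as a hard constraint
        # with probability 1 - exp(-w) (the auxiliary variable u for that clause).
        if satisfied(lits, state) and random.random() < 1 - math.exp(-w):
            M.append(lits)
    return sample_sat(M)
```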

Lifted Inference
Consider belief propagation (BP).
Often, in large problems, many nodes are interchangeable: they send and receive the same messages throughout BP.
Basic idea: Group them into supernodes, forming a lifted network.
Smaller network → faster inference.
Akin to resolution in first-order logic.
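
A minimal sketch of the supernode idea: repeatedly group nodes whose local neighborhoods look identical, so they would exchange identical BP messages (an illustration of lifted-network construction, not the exact algorithm used in Alchemy):

```python
from collections import defaultdict

def lift(nodes, factors, evidence, n_rounds=5):
    """nodes: node ids; factors: list of (factor_type, [member node ids]);
    evidence: dict node -> observed value. Returns groups of interchangeable nodes."""
    color = {v: str(evidence.get(v)) for v in nodes}     # initial colors from evidence
    for _ in range(n_rounds):
        signature = defaultdict(list)
        for ftype, members in factors:
            for pos, v in enumerate(members):
                # A node's signature: which factor types it touches, at which argument
                # position, and the current colors of that factor's members.
                signature[v].append((ftype, pos, tuple(color[u] for u in members)))
        raw = {v: (color[v], tuple(sorted(signature[v]))) for v in nodes}
        ids = {}                                          # canonicalize colors to small ints
        color = {v: ids.setdefault(raw[v], len(ids)) for v in nodes}
    supernodes = defaultdict(list)
    for v in nodes:
        supernodes[color[v]].append(v)
    return list(supernodes.values())                      # one group per supernode
```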

Belief Propagation
[Factor graph: nodes (x) connected to features (f)]

Node-to-feature messages:
  \mu_{x \to f}(x) = \prod_{h \in n(x) \setminus \{f\}} \mu_{h \to x}(x)

Feature-to-node messages:
  \mu_{f \to x}(x) = \sum_{\sim \{x\}} \left( e^{w f(\mathbf{x})} \prod_{y \in n(f) \setminus \{x\}} \mu_{y \to f}(y) \right)

Lifted Belief Propagation
The same message equations, now passed between supernodes (x) and superfeatures (f); the message updates become functions of the edge counts between them.

Weight Learning
Pseudo-likelihood + L-BFGS is fast and robust but can give poor inference results.
Voted perceptron: gradient descent + MAP inference.
Problem: multiple modes.
  Not alleviated by contrastive divergence.
  Alleviated by MC-SAT.
Start each MC-SAT run at the previous end state.
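
The underlying gradient is the difference between the data counts and the expected counts under the current weights, with the expectation approximated by samples (e.g., from MC-SAT). A minimal sketch, where `sample_counts_fn` is an assumed helper:

```python
def weight_update(weights, data_counts, sample_counts_fn, lr=0.01):
    """Gradient step on the log-likelihood: d/dw_i = n_i(data) - E_w[n_i(x)].
    data_counts[i]: true groundings of formula i in the data;
    sample_counts_fn(weights): average n_i over samples drawn at the current weights."""
    expected = sample_counts_fn(weights)
    return [w + lr * (data_counts[i] - expected[i]) for i, w in enumerate(weights)]
```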

Weight Learning (contd.)
Problem: extreme ill-conditioning.
  Solvable by quasi-Newton, conjugate gradient, etc.
  But line searches require exact inference.
  Stochastic gradient is not applicable because the data are not i.i.d.
Solution: scaled conjugate gradient.
  Use the Hessian to choose the step size.
  Compute the quadratic form inside MC-SAT.
  Use the inverse diagonal Hessian as a preconditioner.
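
A minimal sketch of a step whose size comes from the local quadratic model instead of a line search, roughly in the spirit of scaled conjugate gradient (the Hessian-vector helper and damping term are assumptions for illustration):

```python
import numpy as np

def scg_step(w, g, d, hess_vec, lam=1.0):
    """w: weights; g: gradient at w; d: (preconditioned) search direction;
    hess_vec(d): Hessian-vector product H d, estimable from MC-SAT samples;
    lam: damping that keeps the quadratic model trustworthy."""
    Hd = hess_vec(d)
    alpha = -g.dot(d) / (d.dot(Hd) + lam * d.dot(d))   # minimizer of the damped quadratic model
    return w + alpha * d
```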

Structure Learning
Standard inductive logic programming optimizes the wrong thing, but it can be used to overgenerate clauses for L1 pruning.
Our approach: ILP + pseudo-likelihood + structure priors.
For each candidate structure change:
  Start from the current weights and relax convergence.
  Use subsampling to compute sufficient statistics.
Search methods: beam, shortest-first, etc.
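
A minimal beam-search sketch over clause refinements (illustrative; `refine` and `score`, e.g. pseudo-log-likelihood minus a structure-prior penalty, are assumed callbacks, not Alchemy's API):

```python
def beam_search(initial_mln, refine, score, beam_width=5, n_steps=10):
    """refine(mln): candidate MLNs obtained by adding or changing one clause.
    score(mln): higher is better. Returns the best structure found."""
    beam = [(score(initial_mln), initial_mln)]
    best_score, best_mln = beam[0]
    for _ in range(n_steps):
        candidates = [(score(c), c) for _, mln in beam for c in refine(mln)]
        if not candidates:
            break
        candidates.sort(key=lambda sc: sc[0], reverse=True)
        beam = candidates[:beam_width]          # keep the top-scoring candidates
        if beam[0][0] > best_score:
            best_score, best_mln = beam[0]
    return best_mln
```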

Applications to Date
Natural language processing
Information extraction
Entity resolution
Link prediction
Collective classification
Social network analysis
Robot mapping
Activity recognition
Scene analysis
Computational biology
Probabilistic Cyc
Personal assistants
Etc.

Unsupervised Semantic Parsing
Goal: "Microsoft buys Powerset." → BUYS(MICROSOFT, POWERSET)
Challenge: many surface forms express the same fact:
  Microsoft buys Powerset
  Microsoft acquires semantic search engine Powerset
  Powerset is acquired by Microsoft Corporation
  The Redmond software giant buys Powerset
  Microsoft’s purchase of Powerset, …
USP: Recursively cluster expressions composed of similar subexpressions.
Evaluation: Extract knowledge from biomedical abstracts and answer questions.
  Substantially outperforms the state of the art: three times as many correct answers, with 88% accuracy.

Research Directions
Compact representations:
  Deep architectures
  Boolean decision diagrams
  Arithmetic circuits
Unified inference procedure
Learning MLNs with many latent variables
Tighter integration of learning and inference
End-to-end NLP system
Complete agent

Resources
Open-source software / web site: Alchemy
  Learning and inference algorithms
  Tutorials, manuals, etc.
  MLNs, datasets, etc.
  Publications
  alchemy.cs.washington.edu
Book: Domingos & Lowd, Markov Logic, Morgan & Claypool, 2009.