Unifying MAX SAT,
Local Consistency Relaxations,
and Soft Logic with Hinge-Loss MRFs
Stephen H. Bach
Maryland
Bert Huang
Virginia Tech
Lise Getoor
UC Santa Cruz
Modeling Relational Data with Markov Random Fields
Markov Random Fields
 Probabilistic model for high-dimensional data:
 The random variables represent the data, such as whether a person has an attribute or whether a link exists
 The potentials score different configurations of the data
 The weights scale the influence of different potentials
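For reference, the resulting distribution in standard MRF notation (a generic form, not copied from the slide), with potentials $\phi_j$ and weights $w_j$:

$$P(\mathbf{x}) \;=\; \frac{1}{Z(\mathbf{w})} \exp\!\Big( \sum_{j} w_j \,\phi_j(\mathbf{x}) \Big), \qquad Z(\mathbf{w}) \;=\; \sum_{\mathbf{x}} \exp\!\Big( \sum_{j} w_j \,\phi_j(\mathbf{x}) \Big)$$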
Markov Random Fields
 Variables and potentials form a graphical structure:
MRFs with Logic
 One way to compactly define MRFs is with first-order logic, e.g., Markov logic networks
 Use first-order logic to define templates for potentials
- Ground out weighted rules over graph data
- The truth table of each ground rule is a potential
- Each first-order rule has a weight that becomes the ground potential's weight
Richardson and Domingos, 2006
MRFs with Logic
 Let $\{(C_j, w_j)\}_{j=1}^{m}$ be a set of weighted logical rules, where each rule has the general form
$$w_j : \bigwedge_{i \in I_j^-} y_i \;\rightarrow\; \bigvee_{i \in I_j^+} y_i$$
- Weights $w_j \ge 0$, and sets $I_j^+$ and $I_j^-$ index variables
 Equivalent clausal form:
$$\bigvee_{i \in I_j^+} y_i \;\vee\; \bigvee_{i \in I_j^-} \neg y_i$$
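A hypothetical grounding to make the index sets concrete (the rule and constants are made up for illustration): the rule $w_j : \text{Friend}(a,b) \wedge \text{Votes}(a,p) \rightarrow \text{Votes}(b,p)$ has clausal form $\neg\text{Friend}(a,b) \vee \neg\text{Votes}(a,p) \vee \text{Votes}(b,p)$; with $y_1 = \text{Friend}(a,b)$, $y_2 = \text{Votes}(a,p)$, $y_3 = \text{Votes}(b,p)$, we have $I_j^+ = \{3\}$ and $I_j^- = \{1, 2\}$.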
MRFs with Logic
 Probability distribution:
$$P(\mathbf{y}) \;=\; \frac{1}{Z(\mathbf{w})} \exp\!\Big( \sum_{j=1}^{m} w_j \, \mathbf{1}\!\left[\mathbf{y} \text{ satisfies } C_j\right] \Big)$$
MAP Inference
 MAP (maximum a posteriori) inference seeks a most-probable assignment to the unobserved variables
 MAP inference is equivalent to weighted MAX SAT:
$$\arg\max_{\mathbf{y} \in \{0,1\}^n} \;\sum_{j=1}^{m} w_j \, \mathbf{1}\!\left[\mathbf{y} \text{ satisfies } C_j\right]$$
 This MAX SAT problem is combinatorial and NP-hard!
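A minimal Python sketch of MAP inference as weighted MAX SAT by exhaustive search; the clause encoding (weight, positive index set, negative index set) and the tiny example clauses are assumptions made up for illustration, not the deck's implementation.

```python
from itertools import product

# Each clause: (weight, positive_vars, negative_vars); all variables are Boolean.
clauses = [
    (5.0, [1], [0]),   # e.g. a grounding of "Donates -> Votes":  ~y0 v y1
    (0.3, [2], []),    # a unit clause on y2
]

def total_score(y, clauses):
    """Total weight of clauses satisfied by assignment y."""
    return sum(w for (w, pos, neg) in clauses
               if any(y[i] for i in pos) or any(not y[i] for i in neg))

def map_by_exhaustion(n_vars, clauses):
    """MAP = argmax over all 2^n assignments (exponential; illustration only)."""
    return max(product([0, 1], repeat=n_vars), key=lambda y: total_score(y, clauses))

print(map_by_exhaustion(3, clauses))
```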
MAX SAT Relaxation
Approximate Inference
 View MAP inference as optimizing rounding probabilities $p_i = P(y_i = 1)$
 Expected score of a clause is a weighted noisy-or function:
$$w_j \Big( 1 - \prod_{i \in I_j^+} (1 - p_i) \prod_{i \in I_j^-} p_i \Big)$$
 Then the expected total score is the sum of these terms over all clauses
 But this expected-score objective is highly non-convex!
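A small Python sketch of the expected-score computation under independent rounding, reusing the clause representation from the brute-force sketch above (an assumption for illustration):

```python
def expected_clause_score(w, pos, neg, p):
    """Weighted noisy-or: w * P(clause satisfied) when each y_i ~ Bernoulli(p_i)."""
    prob_all_literals_false = 1.0
    for i in pos:
        prob_all_literals_false *= (1.0 - p[i])   # positive literal fails if y_i = 0
    for i in neg:
        prob_all_literals_false *= p[i]           # negated literal fails if y_i = 1
    return w * (1.0 - prob_all_literals_false)

def expected_total_score(clauses, p):
    """Expected weighted satisfaction: the non-convex objective on this slide."""
    return sum(expected_clause_score(w, pos, neg, p) for (w, pos, neg) in clauses)
```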
Approximate Inference
 It is the products in the objective that make it non-convex
 The expected score can be lower bounded using the
relationship between arithmetic and harmonic means:
 This leads to the lower bound
11
Goemans and Williamson, 1994
Approximate Inference
 So, we solve the linear program
$$\hat{\mathbf{y}} \;\in\; \arg\max_{\mathbf{y} \in [0,1]^n} \;\sum_{j=1}^{m} w_j \min\Big\{ \sum_{i \in I_j^+} y_i + \sum_{i \in I_j^-} (1 - y_i),\; 1 \Big\}$$
 If we set the rounding probabilities to the LP solution, $p_i = \hat{y}_i$, a greedy rounding method will find a $(1 - 1/e)$-optimal discrete solution
 If we instead set $p_i = \hat{y}_i/2 + 1/4$, it improves to ¾-optimal
Goemans and Williamson, 1994
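A minimal sketch of the greedy rounding step (the method of conditional expectations), assuming the clause representation and the expected_total_score helper from the sketches above; the guarantees on this slide apply when the starting probabilities come from the LP solution as described.

```python
def greedy_round(n_vars, clauses, p):
    """Fix one variable at a time to whichever value does not decrease the
    expected total score of rounding the remaining variables at random."""
    p = list(p)
    for i in range(n_vars):
        p[i] = 1.0
        score_if_true = expected_total_score(clauses, p)
        p[i] = 0.0
        score_if_false = expected_total_score(clauses, p)
        p[i] = 1.0 if score_if_true >= score_if_false else 0.0
    return [int(v) for v in p]
```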
Local Consistency
Relaxation
Local Consistency Relaxation
 LCR is a popular technique for approximating MAP in MRFs
- Often simply called linear programming (LP) relaxation
- Dual decomposition solves the dual of the LCR objective
 Idea: relax the search over consistent marginals to a simpler set
 LCR admits fast message-passing algorithms, but no quality guarantees in general
Wainwright and Jordan 2008
Local Consistency Relaxation
- Variable pseudomarginals: distributions over each variable's states
- Potential pseudomarginals: distributions over each potential's joint states
Wainwright and Jordan 2008
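For reference, the first-order local consistency relaxation in the style of Wainwright and Jordan (the symbols $\mu_i$ and $\theta_j$ are my choice here, not necessarily the deck's):

$$\max_{\boldsymbol{\mu}, \boldsymbol{\theta} \,\ge\, 0} \;\sum_{j} w_j \sum_{\mathbf{x}_j} \theta_j(\mathbf{x}_j)\, \phi_j(\mathbf{x}_j)
\quad \text{s.t.} \quad
\sum_{\mathbf{x}_j : x_i = k} \theta_j(\mathbf{x}_j) = \mu_i(k) \;\;\forall j, i, k,
\qquad \sum_{k} \mu_i(k) = 1 \;\;\forall i$$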
Unifying the
Relaxations
Analysis
[Figure: the relaxed objective decomposes into independent subproblems, one per potential j = 1, 2, 3, and so on]
Bach et al. AISTATS 2015
Analysis
 We can now analyze each potential's parameterized subproblem in isolation
 Using the KKT conditions, we can find a simplified expression for each subproblem's solution in terms of its parameters
Bach et al. AISTATS 2015
Analysis
Substituting the simplified solutions back into the outer objective…
Bach et al. AISTATS 2015
Analysis
 …leads to a simplified, projected local consistency relaxation over the variable pseudomarginals alone
Bach et al. AISTATS 2015
Analysis
[Figure: the projected local consistency relaxation coincides with the MAX SAT relaxation]
Bach et al. AISTATS 2015
Consequences
 The MAX SAT relaxation can be solved with a choice of algorithms
 Rounding guarantees apply to LCR!
Bach et al. AISTATS 2015
Soft Logic and
Continuous Values
Continuous Values
 Continuous values can also be interpreted as similarities or degrees of truth
 Łukasiewicz logic is a fuzzy logic for reasoning about imprecise concepts
Bach et al. In Preparation
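For reference, the standard Łukasiewicz relaxations of the logical operators (not copied from the slide):

$$x \,\tilde{\wedge}\, y = \max\{x + y - 1,\; 0\}, \qquad x \,\tilde{\vee}\, y = \min\{x + y,\; 1\}, \qquad \tilde{\neg} x = 1 - x$$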
All Three are Equivalent
[Figure: three equivalent problems: the local consistency relaxation, the MAX SAT relaxation, and exact MAX SAT for Łukasiewicz logic]
Bach et al. In Preparation
Consequences
 Exact MAX SAT for Łukasiewicz logic is equivalent to relaxed Boolean MAX SAT and to the local consistency relaxation for logical MRFs
 So these scalable message-passing algorithms can also be used to reason about similarity, imprecise concepts, etc.!
Bach et al. In Preparation
Hinge-Loss
Markov Random Fields
Generalizing Relaxed MRFs
 Relaxed, logic-based MRFs can reason about both discrete and continuous relational data scalably and accurately
 Define a new distribution over continuous variables whose energy is the relaxed objective (see the sketch after this slide)
 We can generalize this inference objective to be the energy of a new type of MRF that does even more
Bach et al. NIPS 12, Bach et al. UAI 13
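A sketch of the form of this distribution, reconstructed from the relaxed MAX SAT objective above rather than copied from the slide:

$$P(\mathbf{y}) \;\propto\; \exp\!\Big( \sum_{j=1}^{m} w_j \min\Big\{ \sum_{i \in I_j^+} y_i + \sum_{i \in I_j^-} (1 - y_i),\; 1 \Big\} \Big), \qquad \mathbf{y} \in [0,1]^n$$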
Generalizations
 Arbitrary hinge-loss functions (not just logical clauses)
 Hard linear constraints
 Squared hinge losses
Bach et al. NIPS 12, Bach et al. UAI 13
Hinge-Loss MRFs
 Define hinge-loss MRFs by using this generalized objective
as the energy function
Bach et al. NIPS 12, Bach et al. UAI 13
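For reference, my rendering of the hinge-loss MRF density from the cited papers, with continuous variables $\mathbf{y} \in [0,1]^n$, linear functions $\ell_j$, weights $w_j \ge 0$, and exponents $p_j \in \{1, 2\}$:

$$P(\mathbf{y}) \;=\; \frac{1}{Z(\mathbf{w})} \exp\!\Big( -\sum_{j=1}^{m} w_j \,\big(\max\{\ell_j(\mathbf{y}),\, 0\}\big)^{p_j} \Big)$$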
HL-MRF Inference and Learning
 MAP inference for HL-MRFs is always a convex optimization
- Highly scalable ADMM algorithm for MAP
 Supervised learning
- No need to hand-tune weights
- Learn from training data
- Also highly scalable
 Unsupervised and semi-supervised learning
- New learning algorithm that interleaves inference and parameter updates to cut learning time by as much as 90% (under review; more in Bert's talk)
Bach et al. NIPS 12, Bach et al. UAI 13
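The ADMM algorithm itself is not shown in the deck. As an illustration that HL-MRF MAP is just a convex program, here is a minimal sketch using the off-the-shelf solver cvxpy; the three variables, the two toy potentials, and the constraint are assumptions made up for this example.

```python
import cvxpy as cp

# Toy HL-MRF MAP problem over three continuous variables in [0, 1].
y = cp.Variable(3)

# Weighted hinge-loss potentials w_j * max{l_j(y), 0}^{p_j} (toy weights).
potentials = [
    5.0 * cp.pos(y[0] - y[1]),             # linear hinge: penalized when y0 > y1
    1.0 * cp.square(cp.pos(y[1] - y[2])),  # squared hinge
]

# Hard linear constraints, including the [0, 1] box.
constraints = [y >= 0, y <= 1, y[0] + y[2] == 1.0]

# MAP inference = minimize the total hinge-loss energy subject to the constraints.
problem = cp.Problem(cp.Minimize(sum(potentials)), constraints)
problem.solve()
print(y.value)
```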
Probabilistic Soft Logic
Probabilistic Soft Logic (PSL)
 Probabilistic programming language for defining HL-MRFs
 PSL components
- Predicates: relationships or properties
- Atoms: (continuous) random variables
- Rules: potentials
Example: Voter Identification
5.0 : Donates(A, “Republican”) -> Votes(A, “Republican”)
[Figure: observed evidence for a person (donations, status updates, tweets) and an unknown vote]
0.3 : Mentions(A, “Affordable Care”) -> Votes(A, “Democrat”)
Votes(A, “Republican”) + Votes(A, “Democrat”) = 1.0 .
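To make the compilation to hinge-loss potentials concrete (my illustration using the Łukasiewicz relaxation, not a slide from the deck): grounding the first rule for a person $a$ yields the potential

$$5.0 \cdot \max\{\, \text{Donates}(a, \text{“Republican”}) - \text{Votes}(a, \text{“Republican”}),\; 0 \,\}$$

which is the rule's distance to satisfaction and is zero exactly when the relaxed implication holds.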
Example: Voter Identification
0.3 : Votes(A,P) && Friend(B,A) -> Votes(B,P)
[Figure: a social network of people connected by friend, spouse, and colleague links]
0.8 : Votes(A,P) && Spouse(B,A) -> Votes(B,P)
Example: Voter Identification
/* Predicate definitions */
Votes(Person, Party)
Donates(Person, Party)      (closed)
Mentions(Person, Term)      (closed)
Colleague(Person, Person)   (closed)
Friend(Person, Person)      (closed)
Spouse(Person, Person)      (closed)

/* Local rules */
5.0 : Donates(A, P) -> Votes(A, P)
0.3 : Mentions(A, “Affordable Care”) -> Votes(A, “Democrat”)
0.3 : Mentions(A, “Tax Cuts”) -> Votes(A, “Republican”)
...

/* Relational rules */
1.0 : Votes(A,P) && Spouse(B,A) -> Votes(B,P)
0.3 : Votes(A,P) && Friend(B,A) -> Votes(B,P)
0.1 : Votes(A,P) && Colleague(B,A) -> Votes(B,P)

/* Range constraint */
Votes(A, “Republican”) + Votes(A, “Democrat”) = 1.0 .
PSL Defines HL-MRFs
The PSL program above compiles to an HL-MRF: each ground rule contributes a weighted hinge-loss potential, and the range constraint becomes a hard linear constraint over the ground atoms.
Open Source Implementation
 PSL is implemented as an open source library and
programming language interface
 It’s ready to use for your next project
 Some of the other groups already using PSL:
- Jure Leskovec (Stanford) [West et al., TACL 14]
- Dan Jurafsky (Stanford) [Li et al., ArXiv 14]
- Ray Mooney (UT Austin) [Beltagy et al., ACL 14]
- Kevin Murphy (Google) [Pujara et al., BayLearn 14]
Other PSL Topics
 Not discussed here:
- Smart grounding
- Lazy inference
- Distributed inference and learning
 Future work:
- Lifted inference
- Generalized rounding guarantees
psl.cs.umd.edu
Applications
PSL Empirical Highlights
 Compared with discrete MRFs:
                Collective Classification      Trust Prediction
  PSL           81.8%      0.7 sec             .482 AuPR    0.32 sec
  Discrete      79.7%      184.3 sec           .441 AuPR    212.36 sec
 Predicting MOOC outcomes via latent engagement (AuPR):
                Tech       Women-Civil    Genes
  Lecture Rank  0.333      0.508          0.688
  PSL-Direct    0.393      0.565          0.757
  PSL-Latent    0.546      0.816          0.818
Bach et al. UAI 13, Ramesh et al. AAAI 14
PSL Empirical Highlights
 Improved activity recognition in video:
               5 Activities            6 Activities
  HOG          47.4%    .481 F1        59.6%    .582 F1
  PSL + HOG    59.8%    .603 F1        79.3%    .789 F1
  ACD          67.5%    .678 F1        83.5%    .835 F1
  PSL + ACD    69.2%    .693 F1        86.0%    .860 F1
 Compared on drug-target interaction prediction:
                      AuPR          P@130
  Perlman’s Method    .564 ± .05    .594 ± .04
  PSL                 .617 ± .05    .616 ± .04
London et al. CVPR WS 13, Fakhraei et al. TCBB 14
Conclusion
Conclusion
 HL-MRFs unite and generalize different ways of viewing
fundamental AI problems, including MAX SAT, probabilistic
graphical models, and fuzzy logic
 PSL’s mix of expressivity and scalability makes it a
general-purpose tool for network analysis,
bioinformatics, NLP, computer vision, and more
 Many exciting open problems, from algorithms to theory to applications
Download: psl.cs.umd.edu