Probabilistic Inference in PRISM

Taisuke Sato
Tokyo Institute of Technology
Problem
• Statistical machine learning is a labor-intensive process:
a trial-and-error cycle of {modeling → learning → evaluation}*
• The pain of deriving and implementing a model-specific learning
algorithm and model-specific probabilistic inference for each model
(Figure: Model 1, Model 2, …, Model n, each requiring its own model-specific learning algorithm: EM1, EM2, …, EMn, VB, MCMC, …)
Our solution
• Develop a high-level modeling language that offers universal
learning and inference methods applicable to every model
(Figure: Model 1, Model 2, …, Model n are all written in a single modeling language, behind which generic learning and inference routines EM, VB, MCMC, … are provided)
• The user concentrates on modeling; the rest (learning
and inference) is taken care of by the system
PRISM (http://sato-www.cs.titech.ac.jp/prism/)
• Logic-based high-level modeling language
(Figure: probabilistic models such as Bayesian networks, HMMs, PCFGs and new models are written as PRISM programs; the PRISM system supplies the learning methods EM/MAP, VT, VB, VB-VT and MCMC)
• Its generic inference/learning methods subsume standard algorithms
such as forward-backward (FB) for HMMs and belief propagation (BP) for Bayesian networks (see the HMM sketch below)
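As an illustration, an HMM becomes a few lines of PRISM, and exact probability computation on such a program coincides with the forward computation. The following two-state HMM is only a sketch modeled after the standard example in the PRISM distribution; the switch names, the alphabet {a,b} and the string length 5 are illustrative choices:
values(init,[s0,s1]).            % initial state distribution
values(tr(_),[s0,s1]).           % state transition distributions
values(out(_),[a,b]).            % symbol emission distributions
hmm(Os):-                        % observe a string Os
   msw(init,S),                  % choose an initial state
   hmm(1,5,S,Os).                % run 5 stochastic steps
hmm(T,N,_,[]):- T>N, !.          % stop after N steps
hmm(T,N,S,[O|Os]):-
   msw(out(S),O),                % emit a symbol at state S
   msw(tr(S),Next),              % choose the next state
   T1 is T+1,
   hmm(T1,N,Next,Os).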
Basic ideas
• Semantics
• program = Turing machine + probabilistic choice
+ Dirichlet prior
• denotation = a probability measure over possible worlds
• Propositionalized probability computation (PPC)
• programs written at predicate logic level
• probability computation at propositional logic level
• Dynamic programming for PPC
• proof search generates a directed graph (explanation graph)
• probabilities are computed from bottom to top in the graph
• Discriminative use
• generatively define a model by a PRISM program and
discriminatively use it for better prediction performance
ABO blood type program
values(abo,[a,b,o],[0.5,0.2,0.3]).   % msw(abo,a) is true with prob. 0.5
btype(X):- gtype(Gf,Gm), pg_table(X,[Gf,Gm]).
pg_table(X,GT):- ( (X=a;X=b),(GT=[X,o];GT=[o,X];GT=[X,X])
                 ; X=o, GT=[o,o]
                 ; X=ab, (GT=[a,b];GT=[b,a]) ).
gtype(Gf,Gm):- msw(abo,Gf),msw(abo,Gm).
(Figure: the probabilistic primitives simulate gene inheritance, one gene from the father (left) and one from the mother (right); the inherited genotype determines the child's blood type A, B, AB or O)
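Besides exact probability computation, the loaded program can be run forward as a sampler (forward sampling is listed in the Summary). A possible session sketch, assuming the standard PRISM built-ins sample/1 and get_samples/3; the printed answer is illustrative:
| ?- prism(blood).
| ?- sample(btype(X)).                 % draw one blood type from the model
X = a ?
| ?- get_samples(100,btype(X),Gs).     % a list Gs of 100 sampled btype/1 goals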
Propositionalized probability computation
btype(a)      <=> gtype(a,a) v gtype(a,o) v gtype(o,a)
gtype(a,a)    <=> msw(abo,a) & msw(abo,a)
gtype(a,o)    <=> msw(abo,a) & msw(abo,o)
gtype(o,a)    <=> msw(abo,o) & msw(abo,a)
Explanation graph for btype(a): it explains how btype(a) is
proved from the probabilistic choices made by msw atoms
Probabilities annotated in the graph: with P(msw(abo,a)) = 0.5 and
P(msw(abo,o)) = 0.3, we get P(gtype(a,a)) = 0.25 and P(gtype(a,o)) =
P(gtype(o,a)) = 0.15, hence P(btype(a)) = 0.25 + 0.15 + 0.15 = 0.55
• PPC + DP subsumes forward-backward, belief propagation and inside-outside computation
• Sum-product computation of probabilities in a bottom-up manner, using the probabilities assigned to msw atoms (see the Prolog sketch below)
• The explanation graph is acyclic, so dynamic programming (DP) is possible
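To make the sum-product idea concrete, the computation over this explanation graph can be written in a few lines of plain Prolog. This is only a sketch of the idea, not PRISM's internals: expl/2, sw_prob/2 and node_prob/2 are hypothetical helper predicates, and the memoization that makes it true dynamic programming is omitted for brevity.
% hypothetical encoding of the explanation graph for btype(a)
expl(btype(a),   [[gtype(a,a)],[gtype(a,o)],[gtype(o,a)]]).
expl(gtype(a,a), [[msw(abo,a),msw(abo,a)]]).
expl(gtype(a,o), [[msw(abo,a),msw(abo,o)]]).
expl(gtype(o,a), [[msw(abo,o),msw(abo,a)]]).
sw_prob(msw(abo,a),0.5).
sw_prob(msw(abo,b),0.2).
sw_prob(msw(abo,o),0.3).
% node_prob(+A,-P): sum over A's explanations of the product of
% the probabilities of their atoms (bottom-up sum-product)
node_prob(A,P):- sw_prob(A,P), !.
node_prob(A,P):- expl(A,Es), sum_expls(Es,P).
sum_expls([],0.0).
sum_expls([E|Es],P):- prod_conj(E,PE), sum_expls(Es,PR), P is PE+PR.
prod_conj([],1.0).
prod_conj([A|As],P):- node_prob(A,PA), prod_conj(As,PR), P is PA*PR.
% ?- node_prob(btype(a),P).   gives P = 0.55, matching the value above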
Learning
• A program defines a joint distribution P(x,y|θ) where x is hidden
and y is observed
• P(msw(abo,a),…, btype(a),… | θa,θb,θo) where θa+θb+θo = 1
• Learning θ from observed data y by maximizing
• P(y|θ)  →  MLE/MAP
• P(x*,y|θ) where x* = argmax_x P(x,y|θ)  →  VT
• From a Bayesian point of view, a program defines the marginal
likelihood ∫P(x,y|θ,α) dθ, where α is the Dirichlet hyperparameter
• We wish to compute
• the predictive distribution ∫P(x|y,θ,α) dθ
• the marginal likelihood P(y|α) = Σx ∫P(x,y|θ,α) dθ
• Both need approximation
• Variational Bayes (VB)  →  VB, VB-VT
• MCMC  →  Metropolis-Hastings
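In practice the same program is switched between these learning methods through PRISM's flag mechanism. A rough sketch, assuming PRISM 2.x's built-ins set_prism_flag/2 (with the learn_mode flag) and set_sw_a/2 for Dirichlet pseudo counts; the exact flag values should be checked against the manual:
| ?- D = [btype(a),btype(a),btype(ab),btype(o)],
     set_prism_flag(learn_mode,ml), learn(D).      % MLE by EM, as in Sample session 2
| ?- set_sw_a(abo,[1.0,1.0,1.0]),                  % Dirichlet pseudo counts for switch abo
     D = [btype(a),btype(a),btype(ab),btype(o)],
     set_prism_flag(learn_mode,vb), learn(D).      % variational Bayes instead of EM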
Sample session 1
- Expl. graph and prob. computation with the built-in predicates probf/1 and prob/2
| ?- prism(blood).
loading::blood.psm.out
| ?- show_sw.
Switch abo: unfixed_p: a (p: 0.500000000) b (p: 0.200000000) o (p: 0.300000000)
| ?- probf(btype(a)).
btype(a)
  <=> gtype(a,a) v gtype(a,o) v gtype(o,a)
gtype(a,a)
  <=> msw(abo,a) & msw(abo,a)
gtype(a,o)
  <=> msw(abo,a) & msw(abo,o)
gtype(o,a)
  <=> msw(abo,o) & msw(abo,a)
| ?- prob(btype(a),P).
P = 0.55
Sample session 2
- MLE and Viterbi inference
| ?- D=[btype(a),btype(a),btype(ab),btype(o)], learn(D).
Exporting switch information to the EM routine ... done
#em-iters: 0(4) (Converged: -4.965121886)
Statistics on learning:
Graph size: 18
Number of switches: 1
Number of switch instances: 3
Number of iterations: 4
Final log likelihood: -4.965121886
| ?- prob(btype(a),P).
P = 0.598211
| ?- viterbif(btype(a)).
btype(a) <= gtype(a,a)
gtype(a,a) <= msw(abo,a) & msw(abo,a)
Sample session 3
- Bayes inference by MCMC
| ?- D=[btype(a), btype(a), btype(ab), btype(o)],
marg_mcmc_full(D,[burn_in(1000),end(10000),skip(5)],[VFE,ELM]), marg_exact(D,LogM).
VFE = -5.54836
ELM = -5.48608
LogM = -5.48578
| ?- D=[btype(a), btype(a), btype(ab), btype(o)], predict_mcmc_full(D,[btype(a)],[[_,E,_]]),
print_graph(E,[lr('<=')]).
btype(a) <= gtype(a,a)
gtype(a,a) <= msw(abo,a) & msw(abo,a)
Summary
• PRISM = Probabilistic Prolog for statistical machine learning
• Forward sampling
• Exact probability computation
• Parameter learning
• MLE/MAP
• VT
• Bayesian inference
• VB
• VB-VT
• MCMC
• Viterbi inference
• model score (BIC, Cheeseman-Stutz, VFE)
• smoothing
• Current version 2.1, available for download at http://sato-www.cs.titech.ac.jp/prism/