Probabilistic Inference in PRISM
Taisuke Sato
Tokyo Institute of Technology

Problem
• Statistical machine learning is a labor-intensive process: a trial-and-error cycle of {modeling, learning, evaluation}*
• Pain of deriving and implementing model-specific learning algorithms and model-specific probabilistic inference
[Figure: Model 1, Model 2, ..., Model n, each paired with its own model-specific learning algorithm (EM1, EM2, ..., EMn, VB, MCMC, ...)]

Our solution
• Develop a high-level modeling language that offers universal learning and inference methods applicable to every model
[Figure: Model 1, Model 2, ..., Model n all served by the modeling language's generic methods (EM, VB, MCMC, ...)]
• The user concentrates on modeling; the rest (learning and inference) is taken care of by the system

PRISM (http://sato-www.cs.titech.ac.jp/prism/)
• Logic-based high-level modeling language
[Figure: probabilistic models (Bayesian networks, HMMs, PCFGs, new models, ...) handled by the PRISM system's learning methods (EM/MAP, VT, VB, VB-VT, MCMC)]
• Its generic inference/learning methods subsume standard algorithms such as forward-backward (FB) for HMMs and belief propagation (BP) for Bayesian networks

Basic ideas
• Semantics
  • program = Turing machine + probabilistic choice + Dirichlet prior
  • denotation = a probability measure over possible worlds
• Propositionalized probability computation (PPC)
  • programs are written at the predicate logic level
  • probabilities are computed at the propositional logic level
• Dynamic programming for PPC
  • proof search generates a directed graph (explanation graph)
  • probabilities are computed from bottom to top in the graph (a plain-Prolog sketch of this computation appears after the Learning section below)
• Discriminative use
  • generatively define a model by a PRISM program and discriminatively use it for better prediction performance

ABO blood type program

  values(gene,[a,b,o],[0.5,0.2,0.3]).  % msw(gene,a) is true with prob. 0.5

  btype(X) :-                          % X is the child's blood type
      gtype(Gf,Gm),
      pg_table(X,[Gf,Gm]).
  gtype(Gf,Gm) :-                      % probabilistic primitives simulate gene
      msw(gene,Gf),                    % inheritance from the father (left)
      msw(gene,Gm).                    % and the mother (right)
  pg_table(X,GT) :-                    % genotype-to-phenotype table
      ( (X=a ; X=b), (GT=[X,o] ; GT=[o,X] ; GT=[X,X])
      ; X=o,  GT=[o,o]
      ; X=ab, (GT=[a,b] ; GT=[b,a]) ).

[Figure: gene inheritance; e.g. gene a from the father and gene b from the mother give a child of blood type AB]

Propositionalized probability computation
• Explanation graph for btype(a), explaining how btype(a) is proved via the probabilistic choices made by msw atoms:

  btype(a)   <=> gtype(a,a) v gtype(a,o) v gtype(o,a)
  gtype(a,a) <=> msw(gene,a) & msw(gene,a)
  gtype(a,o) <=> msw(gene,a) & msw(gene,o)
  gtype(o,a) <=> msw(gene,o) & msw(gene,a)

• Sum-product computation of probabilities in a bottom-up manner, using the probabilities assigned to msw atoms: P(gtype(a,a)) = 0.5 x 0.5 = 0.25, P(gtype(a,o)) = P(gtype(o,a)) = 0.5 x 0.3 = 0.15, hence P(btype(a)) = 0.25 + 0.15 + 0.15 = 0.55
• The explanation graph is acyclic, so dynamic programming (DP) is possible
• PPC + DP subsumes forward-backward, belief propagation and inside-outside computation

Learning
• A program defines a joint distribution P(x,y|θ), where x is hidden and y is observed
  • e.g. P(msw(gene,a),...,btype(a),... | θ_a,θ_b,θ_o) where θ_a + θ_b + θ_o = 1
• θ is learned from observed data y by maximizing
  • MLE/MAP: maximize P(y|θ)
  • VT (Viterbi training): maximize P(x*,y|θ), where x* = argmax_x P(x,y|θ)
• From a Bayesian point of view, a program defines the marginal likelihood ∫ P(x,y|θ,α) dθ
• We wish to compute
  • the predictive distribution ∫ P(x|y,θ,α) dθ
  • the marginal likelihood P(y|α) = Σ_x ∫ P(x,y|θ,α) dθ
• Both need approximation
  • variational Bayes: VB, VB-VT
  • MCMC: Metropolis-Hastings
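The bottom-up sum-product computation over the explanation graph can be pictured in a few lines of plain Prolog. The sketch below is not PRISM code: it hard-codes the switch probabilities and the explanation graph of the blood type example, and the predicate names (sw_prob/2, node/2, graph_prob/2) are illustrative assumptions. In PRISM itself this computation happens internally when prob/2 or probf/1 is called.

  % Plain-Prolog sketch of bottom-up sum-product over the explanation
  % graph for btype(a) (illustrative; not part of the PRISM system).
  sw_prob(msw(gene,a),0.5).              % probabilities of the msw atoms
  sw_prob(msw(gene,b),0.2).
  sw_prob(msw(gene,o),0.3).

  % node(Head,Expls): Head is true iff some explanation in Expls holds;
  % each explanation is a conjunction (list) of sub-goals and msw atoms.
  node(btype(a),  [[gtype(a,a)], [gtype(a,o)], [gtype(o,a)]]).
  node(gtype(X,Y),[[msw(gene,X), msw(gene,Y)]]).

  graph_prob(A,P) :- sw_prob(A,P), !.    % leaves: msw probabilities
  graph_prob(A,P) :- node(A,Es), sum_expls(Es,P).

  sum_expls([],0.0).                     % sum over alternative explanations
  sum_expls([E|Es],P) :-
      prod_atoms(E,P1), sum_expls(Es,P2), P is P1 + P2.

  prod_atoms([],1.0).                    % product within one explanation
  prod_atoms([A|As],P) :-
      graph_prob(A,P1), prod_atoms(As,P2), P is P1 * P2.

  % ?- graph_prob(btype(a),P).   gives P = 0.55 (= 0.25 + 0.15 + 0.15)

This naive sketch recomputes shared sub-goals; the dynamic programming part consists in computing each node's probability once, bottom-up over the acyclic graph (equivalently, memoizing graph_prob/2), which is what makes forward-backward, belief propagation and inside-outside computation special cases.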
Sample session 1 - Explanation graph and probability computation by built-in predicates

  | ?- prism(blood)
  loading::blood.psm.out
  | ?- show_sw
  Switch gene: unfixed_p: a (p: 0.500000000) b (p: 0.200000000) o (p: 0.300000000)
  | ?- probf(btype(a))
  btype(a) <=> gtype(a,a) v gtype(a,o) v gtype(o,a)
  gtype(a,a) <=> msw(gene,a) & msw(gene,a)
  gtype(a,o) <=> msw(gene,a) & msw(gene,o)
  gtype(o,a) <=> msw(gene,o) & msw(gene,a)
  | ?- prob(btype(a),P)
  P = 0.55

Sample session 2 - MLE and Viterbi inference

  | ?- D=[btype(a),btype(a),btype(ab),btype(o)], learn(D)
  Exporting switch information to the EM routine ... done
  #em-iters: 0(4) (Converged: -4.965121886)
  Statistics on learning:
    Graph size: 18
    Number of switches: 1
    Number of switch instances: 3
    Number of iterations: 4
    Final log likelihood: -4.965121886
  | ?- prob(btype(a),P)
  P = 0.598211
  | ?- viterbif(btype(a))
  btype(a) <= gtype(a,a)
  gtype(a,a) <= msw(gene,a) & msw(gene,a)

Sample session 3 - Bayesian inference by MCMC

  | ?- D=[btype(a),btype(a),btype(ab),btype(o)],
       marg_mcmc_full(D,[burn_in(1000),end(10000),skip(5)],[VFE,ELM]),
       marg_exact(D,LogM)
  VFE = -5.54836
  ELM = -5.48608
  LogM = -5.48578
  | ?- D=[btype(a),btype(a),btype(ab),btype(o)],
       predict_mcmc_full(D,[btype(a)],[[_,E,_]]),
       print_graph(E,[lr('<=')])
  btype(a) <= gtype(a,a)
  gtype(a,a) <= msw(gene,a) & msw(gene,a)

Summary
• PRISM = Probabilistic Prolog for statistical machine learning
  • Forward sampling
  • Exact probability computation
  • Parameter learning: MLE/MAP, VT
  • Bayesian inference: VB, VB-VT, MCMC
  • Viterbi inference
  • Model scores (BIC, Cheeseman-Stutz, VFE)
  • Smoothing
• Current version: 2.1
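The built-in predicates used in these sessions (learn/1, prob/2, viterbif/1) apply unchanged to sequence models, where PRISM's generic probability computation and EM learning specialize to the forward-backward algorithm, as noted above. The following is a minimal sketch in PRISM syntax along the lines of the standard HMM example; the switch names (init, tr/1, out/1), the two states, the alphabet {a,b} and the fixed string length 5 are illustrative assumptions, not part of this poster.

  % Two-state HMM emitting symbols a/b over strings of fixed length 5
  values(init,[s0,s1]).      % choice of the initial state
  values(tr(_),[s0,s1]).     % state transition, one switch per state
  values(out(_),[a,b]).      % symbol emission, one switch per state

  hmm(Os) :-                 % Os: an output string of length 5
      msw(init,S),
      hmm(1,5,S,Os).
  hmm(T,N,_,[]) :- T > N, !.
  hmm(T,N,S,[O|Os]) :-
      msw(out(S),O),         % emit a symbol in state S
      msw(tr(S),Next),       % choose the next state
      T1 is T + 1,
      hmm(T1,N,Next,Os).

With such a program, learn([hmm([a,b,b,a,a]),hmm([b,a,a,b,b])]) would estimate the transition and emission parameters by EM, prob(hmm([a,b,b,a,a]),P) would compute a string probability, and viterbif(hmm([a,b,b,a,a])) would return the most probable state sequence, exactly as in the blood type sessions above.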