Turning Probabilistic
Reasoning into Programming
Avi Pfeffer
Harvard University
Uncertainty
Uncertainty is ubiquitous
Partial information
Noisy sensors
Non-deterministic actions
Exogenous events
Reasoning under uncertainty is a central challenge for building intelligent systems
Probability
Probability provides a mathematically sound basis for dealing with uncertainty
Combined with utilities, provides a basis for decision-making under uncertainty
Probabilistic Reasoning
Representation: creating a probabilistic model of the world
Inference: conditioning the model on observations and computing probabilities of interest
Learning: estimating the model from training data
The Challenge
How do we build probabilistic models of large, complex systems that are
easy to construct and understand
support efficient inference
can be learned from data
(The Programming Challenge)
How do we build programs for interesting problems that are
easy to construct and maintain
do the right thing
run efficiently
Lots of Representations
Plethora of existing models
Bayesian networks, hidden Markov models, stochastic context-free grammars, etc.
Lots of new models
Object-oriented Bayesian networks, probabilistic relational models, etc.
Goal
A probabilistic representation language that
captures many existing models
allows many new models
provides programming-language-like solutions to building and maintaining models
IBAL
A high-level “probabilistic programming” language for representing
Probabilistic models
Decision problems
Bayesian learning
Implemented and publicly available
Outline
Motivation
The IBAL Language
Inference Goals
Probabilistic Inference Algorithm
Lessons Learned
Stochastic Experiments
A programming language expression describes a process that generates a value
An IBAL expression describes a process that stochastically generates a value
Meaning of expression is probability distribution over generated value
Evaluating an expression = computing the probability distribution
Simple expressions
Constants
Variables
Conditionals
Stochastic Choice

x = 'hello
y = x
z = if x == 'bye then 1 else 2
w = dist [ 0.4: 'hello, 0.6: 'world ]
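To make these semantics concrete, here is a minimal Python sketch (a stand-in, not IBAL itself) in which a stochastic choice evaluates directly to a distribution over values:

    # Evaluate a stochastic choice to its distribution (a dict value -> prob).
    def dist(choices):
        d = {}
        for p, v in choices:
            d[v] = d.get(v, 0.0) + p  # merge duplicate values
        return d

    w = dist([(0.4, 'hello'), (0.6, 'world')])
    # w == {'hello': 0.4, 'world': 0.6}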
Functions

fair() = dist [0.5: 'heads, 0.5: 'tails]
x = fair()
y = fair()

x and y are independent tosses of a fair coin
Higher-order Functions

fair() = dist [0.5: 'heads, 0.5: 'tails]
biased() = dist [0.9: 'heads, 0.1: 'tails]
pick() = dist [0.5: fair, 0.5: biased]
coin = pick()
x = coin()
y = coin()

x and y are conditionally independent given coin
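A Python sketch of the same program, treating functions as first-class sampled values (random.random is the only primitive assumed):

    import random

    def fair():
        return 'heads' if random.random() < 0.5 else 'tails'

    def biased():
        return 'heads' if random.random() < 0.9 else 'tails'

    def pick():
        # a stochastic choice whose value is itself a function
        return fair if random.random() < 0.5 else biased

    coin = pick()  # sample which coin we have
    x = coin()     # both tosses use the same (unknown) coin,
    y = coin()     # so x and y are dependent until coin is observed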
Data Structures and Types
IBAL provides a rich type system
tuples and records
algebraic data types
IBAL is strongly typed
automatic ML-style type inference
Bayesian Networks

[Figure: Bayesian network with nodes Smart, Diligent, Understands, Good Test Taker, Exam Grade, HW Grade]

P(U | S, D):
S   D   | u     ¬u
s   d   | 0.9   0.1
s   ¬d  | 0.6   0.4
¬s  d   | 0.3   0.7
¬s  ¬d  | 0.01  0.99

nodes = domain variables
edges = direct causal influence
Network structure encodes conditional independencies:
I( HW-Grade, Smart | Understands )
BNs in IBAL

smart = flip 0.8
diligent = flip 0.4
understands =
  case <smart, diligent> of
    # <true,true> : flip 0.9
    # <true,false> : flip 0.6
    …
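Read as a Python sampling program (the two case branches elided on the slide are filled in from the CPT above, so treat those numbers as reconstructed):

    import random

    def flip(p):
        return random.random() < p

    smart = flip(0.8)
    diligent = flip(0.4)
    if smart and diligent:
        understands = flip(0.9)
    elif smart:
        understands = flip(0.6)
    elif diligent:
        understands = flip(0.3)    # reconstructed from the CPT above
    else:
        understands = flip(0.01)   # reconstructed from the CPT above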
First-Order HMMs

[Figure: chain of hidden states H_1, H_2, ..., H_{t-1}, H_t, each emitting an observation O_1, O_2, ..., O_{t-1}, O_t]

Initial distribution P(H_1)
Transition model P(H_i | H_{i-1})
Observation model P(O_i | H_i)

What if the hidden state is an arbitrary data structure?
HMMs in IBAL

init : () -> state
trans : state -> state
obs : state -> obsrv

sequence(current) = {
  state = current
  observation = obs(state)
  future = sequence(trans(state))
}
hmm() = sequence(init())
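A Python sketch of the same process, with placeholder init/trans/obs models; a generator stands in for IBAL's lazy recursion, so the conceptually infinite sequence is unrolled only as far as a query demands:

    import random

    def init():                 # placeholder initial-state model
        return 0

    def trans(state):           # placeholder transition model
        return state + random.choice([-1, 1])

    def obs(state):             # placeholder observation model
        return state + random.gauss(0.0, 1.0)

    def sequence(current):
        state = current
        while True:             # infinite, but lazily unrolled
            yield state, obs(state)
            state = trans(state)

    hmm = sequence(init())
    first_three = [next(hmm) for _ in range(3)]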
SCFGs
S -> AB (0.6)
S -> BA (0.4)
A -> a (0.7)
A -> AA (0.3)
B -> b (0.8)
B -> BB (0.2)
Non-terminals are data generating functions
SCFGs in IBAL

append(x,y) =
  if null(x) then y
  else cons(first(x), append(rest(x), y))
production(x,y) = append(x(), y())
terminal(x) = cons(x, nil)
s() = dist [0.6: production(a,b), 0.4: production(b,a)]
a() = dist [0.7: terminal('a), …
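The same grammar as a Python sampler, with each non-terminal a function returning a list of terminal symbols (the helper choose is illustrative, not IBAL's dist):

    import random

    def choose(choices):
        # choices: list of (probability, thunk); sample one branch
        r, acc = random.random(), 0.0
        for p, thunk in choices:
            acc += p
            if r < acc:
                return thunk()
        return choices[-1][1]()  # guard against rounding

    def a():
        return choose([(0.7, lambda: ['a']),
                       (0.3, lambda: a() + a())])

    def b():
        return choose([(0.8, lambda: ['b']),
                       (0.2, lambda: b() + b())])

    def s():
        return choose([(0.6, lambda: a() + b()),
                       (0.4, lambda: b() + a())])

    # s() might return ['a', 'b'], or ['b', 'b', 'a'], etc.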
Probabilistic Relational Models
[Figure: PRM schema with classes Actor (attribute Gender), Movie (attribute Genre), and Appearance (attribute Role-Type), where Role-Type depends on Actor.Gender and Movie.Genre. Example instances: actor Chaplin, movie Modern Times]
PRMs in IBAL

movie() = { genre = dist ... }
actor() = { gender = dist ... }
appearance(a,m) = {
  role_type = case (a.gender, m.genre) of
    (male, western) : dist ...
}
mod_times = movie()
chaplin = actor()
a1 = appearance(chaplin, mod_times)
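A Python sketch of the same skeleton; the slide elides the dist ... bodies, so the concrete value sets and probabilities below are stand-ins:

    import random

    def movie():
        return {'genre': random.choice(['western', 'comedy'])}   # stand-in dist

    def actor():
        return {'gender': random.choice(['male', 'female'])}     # stand-in dist

    def appearance(a, m):
        if (a['gender'], m['genre']) == ('male', 'western'):
            role_type = random.choice(['cowboy', 'villain'])     # stand-in dist
        else:
            role_type = random.choice(['lead', 'supporting'])    # stand-in dist
        return {'role_type': role_type}

    mod_times = movie()
    chaplin = actor()
    a1 = appearance(chaplin, mod_times)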
Other IBAL Features
Observations can be inserted into programs
they condition the probability distribution over values
Probabilities in programs can be learnable parameters, with Bayesian priors
Utilities can be associated with different outcomes
Decision variables can be specified
influence diagrams, MDPs
Outline
Motivation
The IBAL Language
Inference Goals
Probabilistic Inference Algorithm
Lessons Learned
Goals
Generalize many standard frameworks for inference
e.g. Bayes nets, HMMs, probabilistic CFGs
Support parameter estimation
Support decision making
Take advantage of language structure
Avoid unnecessary computation
Desideratum #1: Exploit Independence

[Figure: the student Bayesian network again — Smart, Diligent, Understands, Good Test Taker, Exam Grade, HW Grade]
Use Bayes net-like inference algorithm
Desideratum #2: Exploit Low-Level Structure

Causal independence (noisy-or)

x = f()
y = g()
z = x & flip(0.9) | y & flip(0.8)
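The noisy-or structure means P(z) never needs a full four-row conditional table; a Python sketch of the closed form:

    # z = x & flip(0.9) | y & flip(0.8): each true parent independently
    # fails to trigger z with probability 0.1 and 0.2 respectively.
    def p_z_true(x, y):
        p_not_z = 1.0
        if x:
            p_not_z *= 0.1
        if y:
            p_not_z *= 0.2
        return 1.0 - p_not_z

    # p_z_true(True, True) -> 0.98; two parameters describe the whole CPT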
Desideratum #2: Exploit Low-Level Structure

Context-specific independence

x = f()
y = g()
z = case <x,y> of
  <false,false> : flip 0.4
  <false,true> : flip 0.6
  <true> : flip 0.7
Desideratum #3: Exploit Object Structure

A complex domain often consists of weakly interacting objects
Objects share a small interface
Objects are conditionally independent given the interface
[Figure: Student 1 and Student 2 interacting only through a shared Course Difficulty node]
Desideratum #4: Exploit Repetition
Domain often consists of many of the same kinds of objects
Can inference be shared between them?
f() = complex
x1 = f()
x2 = f()
…
x100 = f()
Desideratum #5: Use the Query
Only evaluate required parts of model
Can allow finite computation on an infinite model

f() = f()
x = let y = f() in true
A query on x does not require f
Lazy evaluation is required
Particularly important for probabilistic languages, e.g. stochastic grammars
Desideratum #6: Use Support

The support of a variable is the set of values it can take with positive probability
Knowing the support of subexpressions can simplify computation

f() = f()
x = false
y = if x then f() else true
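A Python sketch of the idea (names illustrative): support computation only explores branches whose guard can actually take the required value, so the divergent f is never touched:

    def support_f():
        # f() = f(): naively computing its support would never terminate
        return support_f()

    def support_if(cond_support, then_support, else_support):
        # supports are sets of values; thunks delay the branch computations
        result = set()
        if True in cond_support:
            result |= then_support()
        if False in cond_support:
            result |= else_support()
        return result

    # x = false, so support(x) = {False}; the then-branch is pruned:
    y_support = support_if({False}, support_f, lambda: {True})
    # y_support == {True}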
Desideratum #7: Use Evidence

Evidence can restrict the possible values of a variable
It can be used like support to simplify computation

f() = f()
x = flip 0.6
y = if x then f() else true
observe x = false
Outline
Motivation
The IBAL Language
Inference Goals
Probabilistic Inference Algorithm
Lessons Learned
Two-Phase Inference
Phase 1: decide what computations need to be performed
Phase 2: perform the computations
Natural Division of Labor
Responsibilities of phase 1:
utilizing query, support, and evidence
taking advantage of repetition
Responsibilities of phase 2:
exploiting conditional independence, low-level structure, and inter-object structure
Phase 1

IBAL Program → Computation Graph
Computation Graph
Nodes are subexpressions
Edge from X to Y means “Y needs to be computed in order to compute X”
Graph, not tree
different expressions may share subexpressions
memoization used to make sure each subexpression occurs once in graph
Construction of Computation Graph

1. Propagate evidence throughout program
2. Compute support for each node
Evidence Propagation
Backwards and forwards

let x = <a: flip 0.4, b: 1> in
observe x.a = true in
if x.a then 'a else 'b
Construction of Computation Graph

1. Propagate evidence throughout program
2. Compute support for each node
   • this is an evaluator for a non-deterministic programming language
   • lazy evaluation
   • memoization
Gotcha!
Laziness and memoization don’t go together
Memoization: when a function is called, look up arguments in cache
But with lazy evaluation, arguments are not evaluated before function call!
Lazy Memoization
Speculatively evaluate function without evaluating arguments
When argument is found to be needed
abort function evaluation
store in cache that argument is needed
evaluate the argument
speculatively evaluate function again
When function evaluates successfully
cache mapping from evaluated arguments to result
Lazy Memoization

let f(x,y,z) = if x then y else z in f(true, 'a, 'b)

f(_,_,_)      → need x
f(true,_,_)   → need y
f(true,'a,_)  → 'a
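A Python sketch of this mechanism (all names illustrative): unevaluated arguments are thunks, and touching one aborts the speculative attempt with an exception:

    CACHE = {}

    class NeedArg(Exception):
        def __init__(self, index):
            self.index = index

    class Thunk:
        def __init__(self, index, expr):
            self.index, self.expr, self.forced = index, expr, False
        def force(self):
            if not self.forced:
                raise NeedArg(self.index)  # abort: this argument is needed
            return self.value

    def lazy_memo_call(f, exprs):
        thunks = [Thunk(i, e) for i, e in enumerate(exprs)]
        while True:
            try:
                result = f(*thunks)        # speculative evaluation
                # cache only the arguments that were actually evaluated
                key = tuple(t.value if t.forced else None for t in thunks)
                CACHE[key] = result
                return result
            except NeedArg as n:
                t = thunks[n.index]
                t.value, t.forced = t.expr(), True  # evaluate needed arg,
                                                    # then speculate again

    # f(x,y,z) = if x then y else z
    f = lambda x, y, z: y.force() if x.force() else z.force()
    r = lazy_memo_call(f, [lambda: True, lambda: 'a', lambda: 'b'])
    # attempts: f(_,_,_) needs x; f(true,_,_) needs y; f(true,'a,_) -> 'a'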
Phase 2

Computation Graph → Microfactors → Solution
e.g. P(Outcome=true) = 0.6
Microfactors
Representation of a function from variables to reals

X      Y      | Value
true   *     | 1
false  true  | 1
false  false | 0

This is the indicator function of X ∨ Y
More compact than complete tables
Can represent low-level structure
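One possible Python encoding of a microfactor, using None as a wildcard so the X ∨ Y indicator takes three rows instead of four (the encoding is illustrative, not IBAL's internal one):

    # A microfactor: variable names plus pattern rows; None matches anything.
    OR_INDICATOR = (('X', 'Y'), [
        ((True,  None),  1.0),   # X true -> X v Y is true, whatever Y is
        ((False, True),  1.0),
        ((False, False), 0.0),
    ])

    def lookup(factor, assignment):
        variables, rows = factor
        values = tuple(assignment[v] for v in variables)
        for pattern, weight in rows:
            if all(p is None or p == v for p, v in zip(pattern, values)):
                return weight
        raise KeyError(values)

    # lookup(OR_INDICATOR, {'X': False, 'Y': True}) == 1.0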
Producing Microfactors
Goal: Translate an IBAL program into a set of microfactors F and a set of variables X such that
P(Output) = Σ_X Π_{f ∈ F} f
Similar to Bayes net
Can solve by variable elimination
exploits independence
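A brute-force Python sketch of the sum-product identity above (plain enumeration over X rather than true variable elimination, which would sum variables out one at a time; the example numbers are made up):

    from itertools import product

    def sum_product(factors, hidden, domains, out):
        # P(out) = sum over hidden assignments of the product of all factors
        result = {}
        for out_val in domains[out]:
            total = 0.0
            for vals in product(*(domains[v] for v in hidden)):
                asg = dict(zip(hidden, vals))
                asg[out] = out_val
                w = 1.0
                for f in factors:
                    w *= f(asg)
                total += w
            result[out_val] = total
        return result

    fX = lambda a: 0.4 if a['X'] else 0.6            # X ~ flip 0.4
    fY = lambda a: 0.5                               # Y ~ flip 0.5
    fO = lambda a: 1.0 if a['Out'] == (a['X'] or a['Y']) else 0.0
    domains = {'X': [True, False], 'Y': [True, False], 'Out': [True, False]}
    p = sum_product([fX, fY, fO], ['X', 'Y'], domains, 'Out')
    # p[True] ≈ 0.7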
Producing Microfactors
Accomplished by recursive descent on computation graph
Use production rules to translate each expression type into microfactors
Introduce temporary variables where necessary
Producing Microfactors

Example rule: if e1 then e2 else e3
  introduce a temporary variable X for the value of the test e1
  X = true  selects the factors produced for e2
  X = false selects the factors produced for e3
Phase 2

Computation Graph → Microfactors → Structured Variable Elimination → Solution
e.g. P(Outcome=true) = 0.6
Learning and Decisions
Learning uses EM
like BNs, HMMs, SCFGs etc.
Decision making uses backward induction
like influence diagrams
Memoization provides dynamic programming
simulates value iteration for MDPs
Lessons Learned
Stochastic programming languages are more complex than they appear
Single mechanism is insufficient for inference in a complex language
Different approaches may each contribute ideas to solution
Beware of unexpected interactions
Conclusion
IBAL is a very general language for constructing probabilistic models
captures many existing frameworks, and allows many new ones
Building an IBAL model = writing a program describing how values are generated
Probabilistic reasoning is like programming
Future Work
Approximate inference
loopy belief propagation
likelihood weighting
Markov chain Monte Carlo
special methods for IBAL?
Ease of use
Reading formatted data
Programming interface
Obtaining IBAL
www.eecs.harvard.edu/~avi/ibal