# Inference in Graphical Models: Variable Elimination and Message Passing Algorithms

Le Song

CSE 8803ML, Spring 2012
## Conditional Independence Assumptions

Local Markov Assumption (directed): $X \perp \mathrm{Nondescendants}_X \mid \mathrm{Pa}_X$

Global Markov Assumption (undirected): $A \perp B \mid C$ whenever $\mathrm{sep}_G(A, B; C)$

[Figure: a node $X$ with its parents $\mathrm{Pa}_X$ and its nondescendants; and the relationship between the BN and MN families, where undirected trees and chordal graphs lie in the intersection, a BN is converted to an MN by moralization, and an MN to a BN by triangulation.]
## Distribution Factorization

Bayesian Networks (Directed Graphical Models)

$I$-map: $I_\ell(G) \subseteq I(P) \Leftrightarrow P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}_{X_i})$

The factors $P(X_i \mid \mathrm{Pa}_{X_i})$ are conditional probability tables (CPTs).

Markov Networks (Undirected Graphical Models)

For strictly positive $P$, $I$-map: $I(G) \subseteq I(P) \Leftrightarrow P(X_1, \ldots, X_n) = \frac{1}{Z} \prod_{i=1}^{m} \Psi_i(D_i)$

The factors $\Psi_i(D_i)$ are potentials over maximal cliques $D_i$, and

$Z = \sum_{x_1, x_2, \ldots, x_n} \prod_{i=1}^{m} \Psi_i(D_i)$

is the normalization constant (partition function).
## Inference in Graphical Models

Graphical models give compact representations of probability distributions $P(X_1, \ldots, X_n)$ (n-way tables reduced to much smaller tables).

We use *inference* as a name for the process of computing answers to queries about such distributions.
## Query Type 1: Likelihood

Most queries involve evidence:
- Evidence $e$ is an assignment of values to a set $E$ of variables.
- Evidence is an observation of the values of some variables.
- Without loss of generality, $E = \{X_{k+1}, \ldots, X_n\}$.

Simplest query: compute the probability of the evidence,

$P(e) = \sum_{x_1} \cdots \sum_{x_k} P(x_1, \ldots, x_k, e)$

This is often referred to as computing the likelihood of $e$: the variables outside $E$ are summed over.
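As a minimal sketch of this query, the snippet below computes a likelihood by brute-force summation over the non-evidence variables on a tiny chain $X_1 \to X_2 \to E$. All the CPT numbers are made up purely for illustration.

```python
import itertools

# Made-up CPTs for a chain X1 -> X2 -> E of binary variables.
p_x1 = {0: 0.6, 1: 0.4}
p_x2_x1 = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}   # (x2, x1): P(x2|x1)
p_e_x2 = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.25, (1, 1): 0.75}  # (e, x2): P(e|x2)

def likelihood(e):
    """P(e) = sum over x1, x2 of P(x1, x2, e): the naive summation."""
    return sum(p_x1[x1] * p_x2_x1[(x2, x1)] * p_e_x2[(e, x2)]
               for x1, x2 in itertools.product((0, 1), (0, 1)))

print(likelihood(0) + likelihood(1))  # ~1.0, since e is binary
```

The cost of this naive sum grows exponentially with the number of hidden variables, which is what the elimination algorithms later in the deck avoid.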
## Query Type 2: Conditional Probability

Often we are interested in the conditional probability distribution of a variable given the evidence:

$P(X \mid e) = \frac{P(X, e)}{P(e)} = \frac{P(X, e)}{\sum_x P(X = x, e)}$

This is also called the a posteriori belief in $X$ given evidence $e$.

We usually query a subset $Y$ of all variables $\mathcal{X} = \{Y, Z, e\}$ and "don't care" about the remaining $Z$:

$P(Y \mid e) = \sum_z P(Y, Z = z \mid e)$

This takes all possible configurations of $Z$ into account. The process of summing out the unwanted variables $Z$ is called marginalization.
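The marginalize-then-renormalize pattern above can be sketched directly on a toy joint table. The joint $P(X, Z, E)$ below is made up for illustration.

```python
import itertools

# A tiny made-up joint P(X, Z, E) over three binary variables.
assignments = list(itertools.product([0, 1], repeat=3))   # ordered as (x, z, e)
probs = [0.10, 0.15, 0.05, 0.20, 0.08, 0.12, 0.06, 0.24]  # sums to 1
joint = dict(zip(assignments, probs))

def posterior(e):
    """P(X | e): marginalize out Z, then renormalize by P(e)."""
    unnorm = {x: sum(joint[(x, z, e)] for z in (0, 1)) for x in (0, 1)}
    p_e = sum(unnorm.values())                 # the likelihood of e
    return {x: v / p_e for x, v in unnorm.items()}

belief = posterior(e=1)
print(belief)  # a normalized a posteriori belief over X
```

Note that the denominator is exactly the Query Type 1 likelihood, so a conditional query subsumes a likelihood query.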
## Query Type 2: Conditional Probability Example

[Figure: two example networks; in each, the evidence set $E$ is shaded, the nodes whose conditionals we are interested in are marked, and the remaining variables are summed over.]
## Applications of a Posteriori Belief

Prediction: what is the probability of an outcome given the starting condition?
- The query node is a descendant of the evidence (e.g., in the chain $A \to B \to C$, query $C$ given $A$).

Diagnosis: what is the probability of a disease/fault given symptoms?
- The query node is an ancestor of the evidence (e.g., in the chain $A \to B \to C$, query $A$ given $C$).

Learning under partial observations (fill in the unobserved variables).

Information can flow in either direction:
- Inference can combine evidence from all parts of the network.
## Query Type 3: Most Probable Assignment

We want to find the most probable joint assignment for some variables of interest.

Such reasoning is usually performed under some given evidence $e$, ignoring the values of the other variables $Z$. It is also called the maximum a posteriori (MAP) assignment for $Y$:

$\mathrm{MAP}(Y \mid e) = \operatorname{argmax}_y P(Y = y \mid e) = \operatorname{argmax}_y \sum_z P(Y = y, Z = z \mid e)$

[Figure: the evidence set $E$ is shaded, we seek the most probable values for the query variables, and the remaining variables are summed over.]
## Applications of the MAP Assignment

Classification: find the most likely label, given the evidence.

Explanation: what is the most likely scenario, given the evidence?

Cautionary note: the MAP assignment of a variable depends on its context, i.e., the set of variables being jointly queried.

Example: the MAP of $(X, Y)$ is $(0, 0)$, but the MAP of $X$ alone is $1$:

| X | Y | P(X,Y) |
|---|---|--------|
| 0 | 0 | 0.35   |
| 0 | 1 | 0.05   |
| 1 | 0 | 0.3    |
| 1 | 1 | 0.3    |

| X | P(X) |
|---|------|
| 0 | 0.4  |
| 1 | 0.6  |
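The cautionary note can be checked in a few lines, using exactly the joint table from the slide: maximizing the joint and maximizing the marginal give different answers for $X$.

```python
# The joint table P(X, Y) from the slide.
p_xy = {(0, 0): 0.35, (0, 1): 0.05, (1, 0): 0.3, (1, 1): 0.3}

# MAP of (X, Y) jointly: the single most probable cell of the joint table.
map_joint = max(p_xy, key=p_xy.get)

# MAP of X alone: first marginalize out Y, then maximize.
p_x = {x: sum(p for (xi, y), p in p_xy.items() if xi == x) for x in (0, 1)}
map_x = max(p_x, key=p_x.get)

print(map_joint)  # (0, 0)
print(map_x)      # 1, so marginalization changes the winner
```

The marginal $P(X)$ here is $(0.4, 0.6)$, so $X = 1$ wins alone even though $(0, 0)$ is the most probable joint cell.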
## Complexity of Inference

Computing the a posteriori belief $P(X \mid e)$ in a GM is NP-hard in general.

Hardness implies we cannot find a general procedure that works efficiently for arbitrary GMs. However:
- For particular families of GMs, we can have provably efficient procedures (e.g., trees).
- For other families of GMs, we need to design efficient approximate inference algorithms (e.g., grids).
## Approaches to inference

Exact inference algorithms:
- Variable elimination algorithm
- Message-passing algorithm (sum-product / belief propagation)
- The junction tree algorithm

Approximate inference algorithms:
- Sampling methods / stochastic simulation
- Variational algorithms
## Marginalization and Elimination

A metabolic pathway: what is the likelihood that protein $E$ is produced?

[Figure: chain $A \to B \to C \to D \to E$]

Query: $P(E)$

$P(E) = \sum_d \sum_c \sum_b \sum_a P(a, b, c, d, E)$

Using the graphical model factorization, we get

$P(E) = \sum_d \sum_c \sum_b \sum_a P(a) P(b \mid a) P(c \mid b) P(d \mid c) P(E \mid d)$

Naïve summation needs to enumerate over an exponential number of terms.
## Elimination in Chains

[Figure: chain $A \to B \to C \to D \to E$]

Rearranging the terms and the summations:

$P(E) = \sum_d \sum_c \sum_b \sum_a P(a) P(b \mid a) P(c \mid b) P(d \mid c) P(E \mid d)$

$\phantom{P(E)} = \sum_d \sum_c \sum_b P(c \mid b) P(d \mid c) P(E \mid d) \sum_a P(a) P(b \mid a)$
## Elimination in Chains (cont.)

[Figure: chain with $A$ eliminated, leaving $P(b)$ attached to node $B$]

Now we can perform the innermost summation efficiently:

$P(E) = \sum_d \sum_c \sum_b P(c \mid b) P(d \mid c) P(E \mid d) \sum_a P(a) P(b \mid a)$

$\phantom{P(E)} = \sum_d \sum_c \sum_b P(c \mid b) P(d \mid c) P(E \mid d) P(b)$

The innermost summation eliminates one variable from our summation argument at a local cost: it is equivalent to a matrix-vector multiplication, costing $|\mathrm{Val}(A)| \times |\mathrm{Val}(B)|$ operations.
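The whole chain elimination can be sketched as a sequence of matrix-vector products. The CPT numbers below are made up for illustration; each step costs $|\mathrm{Val}(\text{parent})| \times |\mathrm{Val}(\text{child})|$ multiplications.

```python
# Chain A -> B -> C -> D -> E with binary variables.
# All numbers are illustrative; cpt[j][i] = P(child = j | parent = i).
p_a = [0.6, 0.4]
p_b_a = [[0.7, 0.2], [0.3, 0.8]]
p_c_b = [[0.9, 0.4], [0.1, 0.6]]
p_d_c = [[0.5, 0.3], [0.5, 0.7]]
p_e_d = [[0.8, 0.1], [0.2, 0.9]]

def eliminate(cpt, msg):
    """One elimination step is a matrix-vector product:
    new_msg[j] = sum_i P(child = j | parent = i) * msg[i]."""
    return [sum(cpt[j][i] * msg[i] for i in range(len(msg)))
            for j in range(len(cpt))]

msg = p_a
for cpt in (p_b_a, p_c_b, p_d_c, p_e_d):
    msg = eliminate(cpt, msg)   # P(b), then P(c), P(d), and finally P(E)

print(sum(msg))  # ~1.0, probability mass is conserved at every step
```

Four small matrix-vector products replace one sum over $2^4$ joint configurations, which is exactly the $O(kn^2)$ versus $O(n^k)$ comparison made two slides below.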
## Elimination in Chains (cont.)

[Figure: chain with $A$ and $B$ eliminated, leaving $P(c)$ attached to node $C$]

Rearranging and then summing again, we get

$P(E) = \sum_d \sum_c \sum_b P(c \mid b) P(d \mid c) P(E \mid d) P(b)$

$\phantom{P(E)} = \sum_d \sum_c P(d \mid c) P(E \mid d) \sum_b P(c \mid b) P(b)$

$\phantom{P(E)} = \sum_d \sum_c P(d \mid c) P(E \mid d) P(c)$

This is again equivalent to a matrix-vector multiplication, costing $|\mathrm{Val}(B)| \times |\mathrm{Val}(C)|$ operations. For example, with $P(b) = (0.35, 0.65)$ and $P(c \mid b)$ given by:

| $P(c \mid b)$ | $b = 0$ | $b = 1$ |
|---------------|---------|---------|
| $c = 0$       | 0.15    | 0.25    |
| $c = 1$       | 0.85    | 0.75    |
## Elimination in Chains (cont.)

[Figure: chain with $A$, $B$, and $C$ eliminated, leaving $P(d)$ attached to node $D$]

Eliminate the nodes one by one all the way to the end:

$P(E) = \sum_d P(E \mid d) P(d)$

Computational complexity for a chain of length $k$:
- Each step $\Psi(X_i) = \sum_{x_{i-1}} P(X_i \mid x_{i-1}) P(x_{i-1})$ costs $O(|\mathrm{Val}(X_{i-1})| \times |\mathrm{Val}(X_i)|)$ operations: $O(k n^2)$ in total when each variable takes $n$ values.
- Compare to naïve summation $\sum_{x_1} \cdots \sum_{x_{k-1}} P(x_1, \ldots, x_{k-1}, X_k)$, which is $O(n^k)$.
## Undirected Chains

[Figure: undirected chain $A - B - C - D - E$]

Rearrange terms and perform local summations:

$P(E) = \sum_d \sum_c \sum_b \sum_a \frac{1}{Z} \Psi(b, a) \Psi(c, b) \Psi(d, c) \Psi(E, d)$

$\phantom{P(E)} = \frac{1}{Z} \sum_d \sum_c \sum_b \Psi(c, b) \Psi(d, c) \Psi(E, d) \sum_a \Psi(b, a)$

$\phantom{P(E)} = \frac{1}{Z} \sum_d \sum_c \sum_b \Psi(c, b) \Psi(d, c) \Psi(E, d) \Psi(b)$
## The Sum-Product Operation

During inference, we try to compute an expression in sum-product form: $\sum_{\mathcal{Z}} \prod_{\Psi \in \mathcal{F}} \Psi$

- $\mathcal{X} = \{X_1, \ldots, X_n\}$, the set of variables
- $\mathcal{F}$, a set of factors such that for each $\Psi \in \mathcal{F}$, $\mathrm{Scope}(\Psi) \subseteq \mathcal{X}$
- $\mathcal{Y} \subset \mathcal{X}$, a set of query variables
- $\mathcal{Z} = \mathcal{X} - \mathcal{Y}$, the variables to eliminate

The result of eliminating the variables in $\mathcal{Z}$ is a factor

$\tau(\mathcal{Y}) = \sum_{z} \prod_{\Psi \in \mathcal{F}} \Psi$

This factor does not necessarily correspond to any probability or conditional probability in the network. To obtain one, renormalize:

$P(\mathcal{Y}) = \frac{\tau(\mathcal{Y})}{\sum_{\mathcal{Y}} \tau(\mathcal{Y})}$
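A minimal sketch of the sum-product operation, assuming binary variables: a factor is a (scope, table) pair, and elimination is "multiply all factors, then sum out a variable". The factor names and values below are invented for illustration.

```python
import itertools

def multiply(f1, f2):
    """Factor product: the result's scope is the union of the two scopes."""
    (s1, t1), (s2, t2) = f1, f2
    scope = tuple(dict.fromkeys(s1 + s2))          # order-preserving union
    table = {}
    for assign in itertools.product([0, 1], repeat=len(scope)):
        a = dict(zip(scope, assign))
        table[assign] = (t1[tuple(a[v] for v in s1)] *
                         t2[tuple(a[v] for v in s2)])
    return scope, table

def marginalize(factor, var):
    """Sum out one variable from a factor."""
    scope, table = factor
    new_scope = tuple(v for v in scope if v != var)
    new_table = {}
    for assign, p in table.items():
        key = tuple(x for v, x in zip(scope, assign) if v != var)
        new_table[key] = new_table.get(key, 0.0) + p
    return new_scope, new_table

# tau(Y) for F = {Psi(A,B), Psi(B,C)}, query Y = {B}, eliminating Z = {A, C}:
psi_ab = (('A', 'B'), {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0})
psi_bc = (('B', 'C'), {(0, 0): 5.0, (0, 1): 6.0, (1, 0): 7.0, (1, 1): 8.0})
tau = marginalize(marginalize(multiply(psi_ab, psi_bc), 'A'), 'C')
print(tau)  # an (unnormalized) factor over ('B',) only
```

As the slide notes, `tau` is not itself a probability; dividing by the sum of its entries would give $P(\mathcal{Y})$.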
## Inference via Variable Elimination

General idea: write the query in the form

$P(X_1, e) = \sum_{x_n} \cdots \sum_{x_3} \sum_{x_2} \prod_i P(x_i \mid \mathrm{pa}_{X_i})$

The sums are ordered to suggest an elimination order. Then iteratively:
- Move all irrelevant terms outside of the innermost sum.
- Perform the innermost sum, getting a new term.
- Insert the new term into the product.

Finally, renormalize:

$P(X_1 \mid e) = \frac{\tau(X_1, e)}{\sum_{x_1} \tau(x_1, e)}$
## A more complex network

A food web:

[Figure: DAG over $A, B, C, D, E, F, G, H$ with edges $B \to C$, $A \to D$, $A \to F$, $C \to E$, $D \to E$, $E \to G$, $E \to H$, $F \to H$]

What is the probability $P(A \mid H)$ that hawks are leaving given that the grass condition is poor?
## Example: Variable Elimination

Query: $P(A \mid h)$; need to eliminate $B, C, D, E, F, G, H$.

Initial factors:

$P(a) P(b) P(c \mid b) P(d \mid a) P(e \mid c, d) P(f \mid a) P(g \mid e) P(h \mid e, f)$

Choose an elimination order: $H, G, F, E, D, C, B$.

Step 1: Eliminate $H$ by conditioning (fix the evidence node on its observed value):

$m_h(e, f) = P(H = h \mid e, f)$

[Figure: the network after conditioning on $H$]
## Example: Variable Elimination

Query: $P(A \mid h)$; need to eliminate $B, C, D, E, F, G$.

Current factors:

$P(a) P(b) P(c \mid b) P(d \mid a) P(e \mid c, d) P(f \mid a) P(g \mid e) m_h(e, f)$

Step 2: Eliminate $G$. Compute

$m_g(e) = \sum_g P(g \mid e) = 1$

$\Rightarrow P(a) P(b) P(c \mid b) P(d \mid a) P(e \mid c, d) P(f \mid a) m_g(e) m_h(e, f)$

$\Rightarrow P(a) P(b) P(c \mid b) P(d \mid a) P(e \mid c, d) P(f \mid a) m_h(e, f)$

[Figure: the network after eliminating $G$]
## Example: Variable Elimination

Query: $P(A \mid h)$; need to eliminate $B, C, D, E, F$.

Current factors:

$P(a) P(b) P(c \mid b) P(d \mid a) P(e \mid c, d) P(f \mid a) m_h(e, f)$

Step 3: Eliminate $F$. Compute

$m_f(a, e) = \sum_f P(f \mid a) m_h(e, f)$

$\Rightarrow P(a) P(b) P(c \mid b) P(d \mid a) P(e \mid c, d) m_f(a, e)$

[Figure: the network after eliminating $F$]
## Example: Variable Elimination

Query: $P(A \mid h)$; need to eliminate $B, C, D, E$.

Current factors:

$P(a) P(b) P(c \mid b) P(d \mid a) P(e \mid c, d) m_f(a, e)$

Step 4: Eliminate $E$. Compute

$m_e(a, c, d) = \sum_e P(e \mid c, d) m_f(a, e)$

$\Rightarrow P(a) P(b) P(c \mid b) P(d \mid a) m_e(a, c, d)$

[Figure: the network after eliminating $E$]
## Example: Variable Elimination

Query: $P(A \mid h)$; need to eliminate $B, C, D$.

Current factors:

$P(a) P(b) P(c \mid b) P(d \mid a) m_e(a, c, d)$

Step 5: Eliminate $D$. Compute

$m_d(a, c) = \sum_d P(d \mid a) m_e(a, c, d)$

$\Rightarrow P(a) P(b) P(c \mid b) m_d(a, c)$

[Figure: the network after eliminating $D$]
## Example: Variable Elimination

Query: $P(A \mid h)$; need to eliminate $B, C$.

Current factors:

$P(a) P(b) P(c \mid b) m_d(a, c)$

Step 6: Eliminate $C$. Compute

$m_c(a, b) = \sum_c P(c \mid b) m_d(a, c)$

$\Rightarrow P(a) P(b) m_c(a, b)$

[Figure: the network after eliminating $C$]
## Example: Variable Elimination

Query: $P(A \mid h)$; need to eliminate $B$.

Current factors:

$P(a) P(b) m_c(a, b)$

Step 7: Eliminate $B$. Compute

$m_b(a) = \sum_b P(b) m_c(a, b)$

$\Rightarrow P(a) m_b(a)$

[Figure: the network after eliminating $B$]
## Example: Variable Elimination

Query: $P(A \mid h)$; finally, renormalize over $A$.

Current factor: $P(a) m_b(a)$, which is exactly the unnormalized $P(a, h)$.

Step 8: Renormalize. Compute $P(h) = \sum_a P(a) m_b(a)$, so

$P(A \mid h) = \frac{P(A) m_b(A)}{\sum_a P(a) m_b(a)}$
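The eight steps above can be transcribed directly into code. The CPT numbers below are invented (the slides give only the network structure), but the message computations follow the elimination order $H, G, F, E, D, C, B$ exactly.

```python
import itertools

B2 = (0, 1)  # all variables are binary

# Made-up CPTs, keyed by (child, *parents); numbers are illustrative only.
P_a = {0: 0.3, 1: 0.7}
P_b = {0: 0.6, 1: 0.4}
P_c_b = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.3, (1, 1): 0.7}
P_d_a = {(0, 0): 0.5, (1, 0): 0.5, (0, 1): 0.1, (1, 1): 0.9}
P_f_a = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.4, (1, 1): 0.6}
P_e_cd = {(e, c, d): ((0.2 if e == c else 0.8) if d else (0.7 if e == c else 0.3))
          for e, c, d in itertools.product(B2, repeat=3)}
P_h_ef = {(h, e, f): ((0.9 if h == e else 0.1) if f else (0.6 if h == e else 0.4))
          for h, e, f in itertools.product(B2, repeat=3)}
h = 1  # the observed evidence value

# Messages, in the elimination order H, G, F, E, D, C, B:
m_h = {(e, f): P_h_ef[(h, e, f)] for e, f in itertools.product(B2, B2)}
# (eliminating G gives m_g(e) = 1, so that step is skipped)
m_f = {(a, e): sum(P_f_a[(f, a)] * m_h[(e, f)] for f in B2)
       for a, e in itertools.product(B2, B2)}
m_e = {(a, c, d): sum(P_e_cd[(e, c, d)] * m_f[(a, e)] for e in B2)
       for a, c, d in itertools.product(B2, repeat=3)}
m_d = {(a, c): sum(P_d_a[(d, a)] * m_e[(a, c, d)] for d in B2)
       for a, c in itertools.product(B2, B2)}
m_c = {(a, b): sum(P_c_b[(c, b)] * m_d[(a, c)] for c in B2)
       for a, b in itertools.product(B2, B2)}
m_b = {a: sum(P_b[b] * m_c[(a, b)] for b in B2) for a in B2}

p_ah = {a: P_a[a] * m_b[a] for a in B2}   # unnormalized P(a, h)
p_h = sum(p_ah.values())                  # the likelihood P(h)
posterior = {a: p / p_h for a, p in p_ah.items()}
print(posterior)  # P(A | h)
```

Each `m_*` dictionary is one intermediate factor; its key arity is the size of the corresponding elimination clique, which is what drives the complexity analysis on the next slide.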
## Complexity of variable elimination

Suppose in one elimination step we compute

$m_x(y_1, \ldots, y_k) = \sum_x m'_x(x, y_1, \ldots, y_k)$

where

$m'_x(x, y_1, \ldots, y_k) = \prod_{i=1}^{k} m_i(x, y_{c_i})$

[Figure: node $X$ connected to neighbors $y_1, \ldots, y_i, \ldots, y_k$]

This requires:
- $k \times |\mathrm{Val}(X)| \times \prod_i |\mathrm{Val}(Y_{c_i})|$ multiplications: for each value of $x, y_1, \ldots, y_k$, we do $k$ multiplications.
- $|\mathrm{Val}(X)| \times \prod_i |\mathrm{Val}(Y_{c_i})|$ additions: for each value of $y_1, \ldots, y_k$, we do $|\mathrm{Val}(X)|$ additions.

The complexity is exponential in the number of variables in the intermediate factor.
## From Variable Elimination to Message Passing

Recall that the induced dependency during marginalization is captured in elimination cliques:
- Summation → elimination
- Intermediate term → elimination clique

Can this lead to a generic inference algorithm?
## Tree Graphical Models

Undirected tree: there is a unique path between any pair of nodes.

Directed tree: all nodes except the root have exactly one parent.
## Equivalence of directed and undirected trees

Any undirected tree can be converted to a directed tree by choosing a root node and directing all edges away from it.

A directed tree and the corresponding undirected tree make the same conditional independence assertions.

The parameterizations are essentially the same:
- Undirected tree: $P(X) = \frac{1}{Z} \prod_{i \in V} \Psi(X_i) \prod_{(i,j) \in E} \Psi(X_i, X_j)$
- Directed tree: $P(X) = P(X_r) \prod_{(i,j) \in E} P(X_j \mid X_i)$
- Equivalence: take $\Psi(X_r) = P(X_r)$, $\Psi(X_i, X_j) = P(X_j \mid X_i)$, $Z = 1$, and $\Psi(X_i) = 1$ for $i \neq r$.
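The equivalence can be verified numerically on a tiny directed tree $r \to u$, $r \to v$. The CPT values are made up for illustration; the check is that the directed and undirected parameterizations assign identical joint probabilities.

```python
import itertools

# Made-up CPTs for a directed tree r -> u, r -> v over binary variables.
p_r = {0: 0.4, 1: 0.6}
p_u_r = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # (u, r): P(u|r)
p_v_r = {(0, 0): 0.5, (1, 0): 0.5, (0, 1): 0.9, (1, 1): 0.1}  # (v, r): P(v|r)

# Equivalence: Psi(X_r) = P(X_r), Psi(X_i, X_j) = P(X_j | X_i), Z = 1,
# and all other unary potentials equal to 1.
psi_r = p_r
psi_ru = {(r, u): p_u_r[(u, r)] for r, u in itertools.product((0, 1), (0, 1))}
psi_rv = {(r, v): p_v_r[(v, r)] for r, v in itertools.product((0, 1), (0, 1))}

for r, u, v in itertools.product((0, 1), repeat=3):
    directed = p_r[r] * p_u_r[(u, r)] * p_v_r[(v, r)]
    undirected = psi_r[r] * psi_ru[(r, u)] * psi_rv[(r, v)]  # with Z = 1
    assert abs(directed - undirected) < 1e-12
print("joint distributions match")
```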
## From Variable Elimination to Message Passing

Recall the variable elimination algorithm:
- Choose an ordering in which the query node $f$ is the final node.
- Eliminate node $i$ by removing all potentials containing $i$ and taking the sum/product over $x_i$.
- Place the resultant factor back into the pool.

For a tree graphical model:
- Choose the query node $f$ as the root of the tree.
- View the tree as a directed tree with edges pointing towards $f$.
- Elimination of each node can then be considered as message passing directly along the tree branches, rather than on some transformed graph.
- Thus, we can use the tree itself as a data structure for inference.
## Message passing for trees

Let $m_{ji}(X_i)$ denote the factor resulting from eliminating the variables in the subtree below $j$, up to and including $j$; it is a function of $X_i$:

$m_{ji}(X_i) = \sum_{x_j} \Psi(x_j) \Psi(X_i, x_j) \prod_{k \in N(j) \setminus i} m_{kj}(x_j)$

This is like a message sent from $j$ to $i$.

[Figure: nodes $k$ and $l$ send messages $m_{kj}(X_j)$ and $m_{lj}(X_j)$ to node $j$; node $j$ sends $m_{ji}(X_i)$ to node $i$, which in turn sends its message to the root $f$.]

$P(x_f) \propto \Psi(x_f) \prod_{e \in N(f)} m_{ef}(x_f)$

$m_{ef}(x_f)$ represents a belief about $x_f$ from $x_e$.
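The recursion above can be sketched on a small tree. The node names, tree shape, and potential values below are all invented for illustration; the `message` function is a direct transcription of the $m_{ji}$ formula.

```python
from math import prod

# A small undirected tree (all variables binary):
#       f
#      / \
#     i   e
#    / \
#   k   l
edges = [('k', 'i'), ('l', 'i'), ('i', 'f'), ('e', 'f')]
neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, []).append(b)
    neighbors.setdefault(b, []).append(a)

psi_node = {v: {0: 1.0, 1: 1.0} for v in neighbors}  # unary potentials
psi_node['k'] = {0: 2.0, 1: 1.0}                     # bias node k towards 0
psi_edge = {e: {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}
            for e in edges}                          # smoothness potentials

def edge_pot(a, b, xa, xb):
    """Look up the pairwise potential regardless of edge orientation."""
    return (psi_edge[(a, b)][(xa, xb)] if (a, b) in psi_edge
            else psi_edge[(b, a)][(xb, xa)])

def message(j, i):
    """m_ji(x_i) = sum_{x_j} psi(x_j) psi(x_i, x_j) prod_{k in N(j)\\i} m_kj(x_j)"""
    return {xi: sum(psi_node[j][xj] * edge_pot(j, i, xj, xi) *
                    prod(message(k, j)[xj] for k in neighbors[j] if k != i)
                    for xj in (0, 1))
            for xi in (0, 1)}

# Belief at the root f: psi(x_f) times all incoming messages, renormalized.
belief = {xf: psi_node['f'][xf] * prod(message(nb, 'f')[xf] for nb in neighbors['f'])
          for xf in (0, 1)}
z = sum(belief.values())
belief = {x: b / z for x, b in belief.items()}
print(belief)  # the bias at leaf k propagates up to the root
```

This naive recursion recomputes messages; the usual two-pass schedule (leaves to root, then root to leaves) computes every message once, which is the subject of the belief propagation algorithm.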