Inference in Graphical Models: Variable Elimination and Message Passing
Le Song
Machine Learning II: Advanced Topics
CSE 8803ML, Spring 2012

Conditional Independence Assumptions

Local Markov Assumption (directed): X ⊥ NonDescendants_X | Pa_X
Global Markov Assumption (undirected): A ⊥ B | C whenever sep_G(A, B; C)

[Figure: a Bayesian network is converted to an undirected model by moralizing, and then to an undirected chordal graph by triangulating; an undirected tree is already chordal.]

Distribution Factorization

Bayesian networks (directed graphical models):
I-map: I_l(G) ⊆ I(P) ⇔ P(X_1, …, X_n) = ∏_{i=1}^n P(X_i | Pa_{X_i})
The local factors P(X_i | Pa_{X_i}) are conditional probability tables (CPTs).

Markov networks (undirected graphical models), for strictly positive P:
I-map: I(G) ⊆ I(P) ⇔ P(X_1, …, X_n) = (1/Z) ∏_{i=1}^m Ψ_i(D_i)
The Ψ_i are potentials over (maximal) cliques D_i, and the normalization constant Z (the partition function) is
Z = ∑_{x_1, …, x_n} ∏_{i=1}^m Ψ_i(D_i)

Inference in Graphical Models

Graphical models give compact representations of probability distributions P(X_1, …, X_n) (from n-way tables to much smaller tables).
How do we answer queries about P? We use "inference" as a name for the process of computing answers to such queries.

Query Type 1: Likelihood

Most queries involve evidence. Evidence e is an assignment of values to a set E of variables, i.e., observations on some variables. Without loss of generality, E = {X_{k+1}, …, X_n}.
The simplest query is the probability of the evidence:
P(e) = ∑_{x_1} … ∑_{x_k} P(x_1, …, x_k, e)
This is often referred to as computing the likelihood of e; we sum over the non-evidence variables X_1, …, X_k.

Query Type 2: Conditional Probability

Often we are interested in the conditional probability distribution of a variable given the evidence:
P(X | e) = P(X, e) / P(e) = P(X, e) / ∑_x P(X = x, e)
This is also called the a posteriori belief in X given evidence e.
We usually query a subset Y of all variables and "don't care" about the remaining variables Z:
P(Y | e) = ∑_z P(Y, Z = z | e)
This takes all possible configurations of Z into account; the process of summing out the unwanted variables Z is called marginalization.

Query Type 2: Conditional Probability Example

[Figure: two example networks, each annotated with the variables whose conditionals we are interested in and the set of variables E to sum over.]

Applications of a Posteriori Belief

Prediction: what is the probability of an outcome given the starting condition? The query node is a descendant of the evidence (e.g., in the chain A → B → C, query C given A).
Diagnosis: what is the probability of a disease/fault given symptoms? The query node is an ancestor of the evidence (query A given C).
Learning under partial observation: fill in the unobserved variables.
Information can flow in either direction, and inference can combine evidence from all parts of the network.

Query Type 3: Most Probable Assignment

We want to find the most probable joint assignment for some variables of interest. Such reasoning is usually performed under some given evidence e, and ignoring (the values of) the other variables Z. This is also called the maximum a posteriori (MAP) assignment for Y:
MAP(Y | e) = argmax_y P(y | e) = argmax_y ∑_z P(y, Z = z | e)

Applications of MAP Assignment

Classification: find the most likely label, given the evidence.
Explanation: what is the most likely scenario, given the evidence?
Cautionary note: the MAP assignment of a variable depends on its context, i.e., the set of variables being jointly queried.
Example: MAP of (X, Y)? (0, 0). MAP of X alone? 1.

X  Y  P(X, Y)        X  P(X)
0  0  0.35           0  0.4
0  1  0.05           1  0.6
1  0  0.30
1  1  0.30
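To see the cautionary note concretely, here is a minimal Python check that recovers both answers from the joint table above (the table values are from the slide; the code itself is only illustrative):

    import numpy as np

    # Joint distribution P(X, Y) from the table: rows index X, columns index Y.
    P_XY = np.array([[0.35, 0.05],
                     [0.30, 0.30]])

    # MAP of (X, Y) jointly: the single most probable cell of the joint table.
    x_star, y_star = np.unravel_index(np.argmax(P_XY), P_XY.shape)
    print("MAP of (X, Y):", (x_star, y_star))   # (0, 0), with probability 0.35

    # MAP of X alone: marginalize Y out first, then maximize.
    P_X = P_XY.sum(axis=1)                      # [0.4, 0.6]
    print("MAP of X:", int(np.argmax(P_X)))     # 1, with probability 0.6

The two answers disagree on X, which is exactly the point: maximizing the joint is not the same as maximizing a marginal.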
Complexity of Inference

Computing the a posteriori belief P(X | e) in a GM is NP-hard in general. Hardness implies we cannot find a general procedure that works efficiently for arbitrary GMs.
For particular families of GMs, we have provably efficient procedures, e.g., trees.
For other families, we need to design efficient approximate inference algorithms, e.g., grids.

Approaches to Inference

Exact inference algorithms:
  the variable elimination algorithm
  the message-passing algorithm (sum-product, belief propagation)
  the junction tree algorithm
Approximate inference algorithms:
  sampling methods / stochastic simulation
  variational algorithms

Marginalization and Elimination

A metabolic pathway: what is the likelihood that protein E is produced? [Figure: the chain A → B → C → D → E.]
Query: P(E).
P(E) = ∑_d ∑_c ∑_b ∑_a P(a, b, c, d, E)
Using the graphical model, the joint factorizes, so
P(E) = ∑_d ∑_c ∑_b ∑_a P(a) P(b|a) P(c|b) P(d|c) P(E|d)
Naive summation needs to enumerate an exponential number of terms.

Elimination in Chains

Rearranging the terms and the summations:
P(E) = ∑_d ∑_c ∑_b ∑_a P(a) P(b|a) P(c|b) P(d|c) P(E|d)
     = ∑_d P(E|d) ∑_c P(d|c) ∑_b P(c|b) ∑_a P(a) P(b|a)

Elimination in Chains (cont.)

Now we can perform the innermost summation efficiently:
P(E) = ∑_d P(E|d) ∑_c P(d|c) ∑_b P(c|b) ∑_a P(a) P(b|a)
     = ∑_d P(E|d) ∑_c P(d|c) ∑_b P(c|b) p(b)
This step is equivalent to a matrix-vector multiplication, costing |Val(A)| × |Val(B)| operations. The innermost summation eliminates one variable from the summation at a local cost.

Elimination in Chains (cont.)

Rearranging and summing again:
P(E) = ∑_d P(E|d) ∑_c P(d|c) ∑_b P(c|b) p(b)
     = ∑_d P(E|d) ∑_c P(d|c) p(c)
Again a matrix-vector multiplication, costing |Val(B)| × |Val(C)| operations. For example, with

p(b):   P(B=0) = 0.35, P(B=1) = 0.65
P(C|B):        B=0    B=1
        C=0    0.15   0.25
        C=1    0.85   0.75

we get p(C=0) = 0.15 · 0.35 + 0.25 · 0.65 and p(C=1) = 0.85 · 0.35 + 0.75 · 0.65.

Elimination in Chains (cont.)

Eliminating nodes one by one all the way to the end:
P(E) = ∑_d P(E|d) p(d)
Computational complexity for a chain of length k, with n values per variable: each step
Ψ(x_i) = ∑_{x_{i-1}} P(x_i | x_{i-1}) p(x_{i-1})
costs O(|Val(X_{i-1})| × |Val(X_i)|) operations, so the total is O(k n²). Compare this to naive summation,
∑_{x_1} … ∑_{x_{k-1}} P(x_1, …, x_k),
which is O(n^k).

Undirected Chains

For the undirected chain A - B - C - D - E, rearrange terms and perform local summations in the same way:
P(E) = ∑_d ∑_c ∑_b ∑_a (1/Z) Ψ(a, b) Ψ(b, c) Ψ(c, d) Ψ(E, d)
     = (1/Z) ∑_d Ψ(E, d) ∑_c Ψ(c, d) ∑_b Ψ(b, c) ∑_a Ψ(a, b)

The Sum-Product Operation

During inference, we repeatedly compute an expression in sum-product form:
∑_Z ∏_{Ψ ∈ F} Ψ
where
  X = {X_1, …, X_n} is the set of variables,
  F is a set of factors such that for each Ψ ∈ F, Scope[Ψ] ⊆ X,
  Y ⊆ X is the set of query variables, and
  Z = X − Y is the set of variables to eliminate.
The result of eliminating the variables in Z is a factor
τ(Y) = ∑_Z ∏_{Ψ ∈ F} Ψ
This factor does not necessarily correspond to any probability or conditional probability in the network. To obtain one, renormalize:
P(Y) = τ(Y) / ∑_y τ(y)

Inference via Variable Elimination

General idea: write the query in the form
P(X_1, e) = ∑_{x_n} … ∑_{x_3} ∑_{x_2} ∏_i P(x_i | Pa_i)
The sum is ordered to suggest an elimination order. Then, iteratively:
  move all irrelevant terms outside of the innermost sum,
  perform the innermost sum, getting a new term,
  insert the new term into the product.
Finally, renormalize:
P(X_1 | e) = P(X_1, e) / ∑_{x_1} P(x_1, e)

A More Complex Network

A food web: [Figure: a network over A, …, H; from the factorization below, the edges are B → C, A → D, A → F, C → E, D → E, E → G, E → H, F → H.]
What is the probability P(A | H) that hawks are leaving given that the grass condition is poor?
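Before moving to the food web, here is a minimal NumPy sketch of chain elimination. P_C_B and the resulting p(b) match the slide's numbers; the remaining tables (p_a, P_B_A, P_D_C, P_E_D) are made-up placeholders:

    import numpy as np

    # Chain A -> B -> C -> D -> E, all variables binary.
    # Convention: P_B_A[b, a] = P(B=b | A=a), so each column sums to one.
    p_a   = np.array([0.5, 0.5])       # placeholder prior, chosen so p(b) = [0.35, 0.65]
    P_B_A = np.array([[0.6, 0.1],
                      [0.4, 0.9]])     # placeholder CPT
    P_C_B = np.array([[0.15, 0.25],
                      [0.85, 0.75]])   # P(C|B) from the slide
    P_D_C = np.array([[0.6, 0.2],
                      [0.4, 0.8]])     # placeholder CPT
    P_E_D = np.array([[0.9, 0.3],
                      [0.1, 0.7]])     # placeholder CPT

    # Each elimination step is one matrix-vector product, O(n^2):
    p_b = P_B_A @ p_a     # sum_a P(b|a) p(a)  -> [0.35, 0.65]
    p_c = P_C_B @ p_b     # sum_b P(c|b) p(b)
    p_d = P_D_C @ p_c     # sum_c P(d|c) p(c)
    p_e = P_E_D @ p_d     # sum_d P(E|d) p(d)
    print("P(E):", p_e)   # a proper distribution: entries sum to 1

Four O(n²) products replace the O(n^k) naive sum over all joint configurations (here k = 5).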
Example: Variable Elimination

Query: P(A | h); we need to eliminate B, C, D, E, F, G, H.
Initial factors:
P(a) P(b) P(c|b) P(d|a) P(e|c,d) P(f|a) P(g|e) P(h|e,f)
Choose an elimination order: H, G, F, E, D, C, B.

Step 1: eliminate H by conditioning (fix the evidence node to its observed value):
m_h(e, f) = P(H = h | e, f)
⇒ P(a) P(b) P(c|b) P(d|a) P(e|c,d) P(f|a) P(g|e) m_h(e, f)

Step 2: eliminate G. Compute
m_g(e) = ∑_g P(g|e) = 1
⇒ P(a) P(b) P(c|b) P(d|a) P(e|c,d) P(f|a) m_h(e, f)

Step 3: eliminate F. Compute
m_f(e, a) = ∑_f P(f|a) m_h(e, f)
⇒ P(a) P(b) P(c|b) P(d|a) P(e|c,d) m_f(e, a)

Step 4: eliminate E. Compute
m_e(a, c, d) = ∑_e P(e|c,d) m_f(e, a)
⇒ P(a) P(b) P(c|b) P(d|a) m_e(a, c, d)

Step 5: eliminate D. Compute
m_d(a, c) = ∑_d P(d|a) m_e(a, c, d)
⇒ P(a) P(b) P(c|b) m_d(a, c)

Step 6: eliminate C. Compute
m_c(a, b) = ∑_c P(c|b) m_d(a, c)
⇒ P(a) P(b) m_c(a, b)

Step 7: eliminate B. Compute
m_b(a) = ∑_b P(b) m_c(a, b)
⇒ P(a) m_b(a)

Step 8: renormalize over A. The remaining factor is p(a, h) = P(a) m_b(a); computing p(h) = ∑_a P(a) m_b(a) gives
P(A | h) = P(a) m_b(a) / ∑_a P(a) m_b(a)

[Figure: after each step, the graph with the eliminated node removed.]
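The walkthrough above mechanizes directly. Below is a compact, generic variable-elimination sketch in Python: a factor is a pair (variable names, table), and eliminating a variable multiplies the factors that mention it and sums it out. Only the graph structure matches the food web; all CPT values are random placeholders, and the observed value H = 1 is an arbitrary stand-in for h:

    import numpy as np
    from functools import reduce

    def multiply(f1, f2):
        # Factor product: align shared variables, keep the union of scopes.
        (v1, a1), (v2, a2) = f1, f2
        out = tuple(dict.fromkeys(v1 + v2))
        letter = {v: chr(97 + i) for i, v in enumerate(out)}
        sub = lambda vs: ''.join(letter[v] for v in vs)
        return out, np.einsum(f"{sub(v1)},{sub(v2)}->{sub(out)}", a1, a2)

    def marginalize(f, var):
        # Sum a variable out of a factor.
        vs, a = f
        return tuple(v for v in vs if v != var), a.sum(axis=vs.index(var))

    def eliminate(factors, order):
        # Variable elimination in the given order; returns the final factor tau.
        for var in order:
            involved = [f for f in factors if var in f[0]]
            rest = [f for f in factors if var not in f[0]]
            factors = rest + [marginalize(reduce(multiply, involved), var)]
        return reduce(multiply, factors)

    rng = np.random.default_rng(0)
    def cpt(*shape):
        # Random CPT with the child variable on axis 0 (normalized over the child).
        t = rng.random(shape)
        return t / t.sum(axis=0, keepdims=True)

    P_H_EF = cpt(2, 2, 2)                    # P(h|e,f), variables ordered (H, E, F)
    factors = [
        (('A',), cpt(2)), (('B',), cpt(2)),
        (('C', 'B'), cpt(2, 2)),             # P(c|b)
        (('D', 'A'), cpt(2, 2)),             # P(d|a)
        (('E', 'C', 'D'), cpt(2, 2, 2)),     # P(e|c,d)
        (('F', 'A'), cpt(2, 2)),             # P(f|a)
        (('G', 'E'), cpt(2, 2)),             # P(g|e)
        (('E', 'F'), P_H_EF[1]),             # Step 1: conditioning, m_h(e,f) = P(H=1|e,f)
    ]
    scope, tau = eliminate(factors, order=['G', 'F', 'E', 'D', 'C', 'B'])
    print(scope, tau / tau.sum())            # P(A | h), after renormalizing

The elimination order matters only for cost, not correctness: any order yields the same tau up to the scopes of the intermediate factors it creates.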
Complexity of Variable Elimination

Suppose in one elimination step we compute
m_x(y_1, …, y_k) = ∑_x m'_x(x, y_1, …, y_k), where m'_x(x, y_1, …, y_k) = ∏_{i=1}^k m_i(x, y_{c_i})
This requires
  k · |Val(X)| · ∏_i |Val(Y_{c_i})| multiplications (for each value of x, y_1, …, y_k, we do k multiplications), and
  |Val(X)| · ∏_i |Val(Y_{c_i})| additions (for each value of y_1, …, y_k, we do |Val(X)| additions).
The complexity is exponential in the number of variables in the intermediate factor.

From Variable Elimination to Message Passing

Recall that the dependencies induced during marginalization are captured in elimination cliques:
  summation ↔ elimination
  intermediate terms ↔ elimination cliques
Can this lead to a generic inference algorithm?

Tree Graphical Models

Undirected tree: a unique path exists between any pair of nodes.
Directed tree: all nodes except the root have exactly one parent.

Equivalence of Directed and Undirected Trees

Any undirected tree can be converted to a directed tree by choosing a root node and directing all edges away from it. A directed tree and the corresponding undirected tree make the same conditional independence assertions, and their parameterizations are essentially the same.
Undirected tree: P(x) = (1/Z) ∏_{i ∈ V} Ψ(x_i) ∏_{(i,j) ∈ E} Ψ(x_i, x_j)
Directed tree: P(x) = P(x_r) ∏_{(i,j) ∈ E} P(x_j | x_i)
Equivalence: Ψ(x_r) = P(x_r), Ψ(x_i, x_j) = P(x_j | x_i), Z = 1, and Ψ(x_i) = 1 for i ≠ r.

From Variable Elimination to Message Passing

Recall the variable elimination algorithm:
  choose an ordering in which the query node f is the final node,
  eliminate node i by removing all potentials containing i and taking the sum/product over x_i,
  place the resulting factor back into the pool.
For a tree graphical model:
  choose the query node f as the root of the tree,
  view the tree as a directed tree with edges pointing towards f.
Elimination of each node can then be viewed as message passing directly along the tree branches, rather than on some transformed graph. Thus we can use the tree itself as a data structure for inference.

Message Passing for Trees

Let m_{ji}(x_i) denote the factor resulting from eliminating all the variables below j, up to i; it is a function of x_i:
m_{ji}(x_i) = ∑_{x_j} Ψ(x_j) Ψ(x_i, x_j) ∏_{k ∈ N(j)∖i} m_{kj}(x_j)
This is like a message sent from j to i. At the root,
p(x_f) ∝ Ψ(x_f) ∏_{e ∈ N(f)} m_{ef}(x_f)
m_{ji}(x_i) represents a belief about x_i coming from x_j.
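Finally, a minimal recursive sketch of this message-passing scheme on a small tree (binary variables; the tree shape and all potential values are made-up placeholders):

    import numpy as np

    # A star-shaped tree: node 0 joined to nodes 1, 2, 3 (any tree works).
    neighbors = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

    # Singleton potentials Psi(x_i) and pairwise potentials Psi(x_i, x_j).
    psi = {i: np.array([0.6, 0.4]) for i in neighbors}
    psi_pair = {(i, j): np.array([[0.8, 0.2],
                                  [0.2, 0.8]])
                for i in neighbors for j in neighbors[i] if i < j}

    def pair(i, j):
        # Pairwise potential as a table indexed [x_i, x_j], whichever way it is stored.
        return psi_pair[(i, j)] if (i, j) in psi_pair else psi_pair[(j, i)].T

    def message(j, i):
        # m_ji(x_i) = sum_{x_j} Psi(x_j) Psi(x_i, x_j) prod_{k in N(j)\i} m_kj(x_j)
        belief = psi[j].copy()
        for k in neighbors[j]:
            if k != i:
                belief *= message(k, j)
        return pair(i, j) @ belief

    def marginal(f):
        # p(x_f) ∝ Psi(x_f) prod_{e in N(f)} m_ef(x_f)
        b = psi[f].copy()
        for e in neighbors[f]:
            b *= message(e, f)
        return b / b.sum()

    print(marginal(0))   # treat node 0 as the root/query
    print(marginal(2))   # any node can serve as the query

Each call recomputes messages from scratch; a full sum-product implementation would cache them with two sweeps (leaves to root, then root to leaves) so that all marginals come from a single pass. This sketch simply mirrors the elimination view above.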