Representation for Undirected Graphical Models
Le Song
Machine Learning II: Advanced Topics
CSE 8803ML, Spring 2012

Summary of Directed Graphical Models
Local Markov assumptions: X ⊥ NonDescendants_X | Pa_X
I-map: I_l(G) ⊆ I(P) ⇔ P(X_1, ..., X_n) factorizes as ∏_i P(X_i | Pa_{X_i})
(topological ordering, chain rule, local Markov assumptions)
Fewer edges = stronger assumptions
If deleting any edge of G makes it no longer an I-map, then G is a minimal I-map for P
P-map: I(G) = I(P)
D-separation: active trails capture longer-range dependence
No active trail between X_i and X_j given Z ⇒ X_i ⊥ X_j | Z
Examples (Flu, Allergy, Sinus, Headache, Nose network):
¬(A ⊥ H) but A ⊥ H | S; ¬(N ⊥ H) but N ⊥ H | S; A ⊥ F but ¬(A ⊥ F | S)

How about other derived relations?
(Example network over nodes A, B, C, D, E, F, G, H, I, J, K)
F ⊥ {B, E, G, J}?            Yes: F ⊥ {B, E, G, J}
{A, C, F, I} ⊥ {B, E, G, J}? Yes: {A, C, F, I} ⊥ {B, E, G, J}
B ⊥ J | E?                   Yes: B ⊥ J | E
E ⊥ F | K?                   No: ¬(E ⊥ F | K)
E ⊥ F | {K, I}?              Yes: E ⊥ F | {K, I}
F ⊥ G | D?                   No: ¬(F ⊥ G | D)
F ⊥ G | H?                   No: ¬(F ⊥ G | H)
F ⊥ G | {H, K}?              No: ¬(F ⊥ G | {H, K})
F ⊥ G | {H, A}?              Yes: F ⊥ G | {H, A}

Generate Samples from Bayesian Networks
A BN describes a generative process for observations.
First, sort the nodes in topological order; then generate samples in this order according to the CPTs.
Example (Flu, Allergy, Sinus, Nose, Headache): to generate one sample of (A, F, S, N, H):
sample a_i ~ P(A)
sample f_i ~ P(F)
sample s_i ~ P(S | a_i, f_i)
sample n_i ~ P(N | s_i)
sample h_i ~ P(H | s_i)
(A Python sketch of this procedure is given below.)

Reduction in representation
Easy case: one parent each, D → C → {X_1, ..., X_n}:
P(X_1, ..., X_n, C, D) = P(D) P(C | D) ∏_i P(X_i | C)
Only small two-way tables are needed.
Difficult case: multiple parents for C, {X_1, ..., X_n} → C → D:
P(X_1, ..., X_n, C, D) = P(X_1) ... P(X_n) P(C | X_1, ..., X_n) P(D | C)
The factor P(C | X_1, ..., X_n) still needs an (n+1)-way table, about 2^n parameters.

Additional notation
Observed variable: filled circle. Hidden variable: open circle.
(Example: hidden H_1, H_2, H_3 with observed children X_1, X_2, X_3.)
A plate repeats a variable n times with the same CPT, e.g. X_1, ..., X_n inside a plate when all P(X_i | C) are identical.

Nested plate notation
E.g. Latent Dirichlet Allocation (LDA):
θ_d: topic mixing proportion for document d
z_dn: topic indicator variable for word n of document d
w_dn: word n in document d
α: prior for the mixing proportions
β: topic parameters
P(α, θ_d, z_dn, w_dn, β) = P(α) P(β) ∏_d P(θ_d | α) ∏_n P(z_dn | θ_d) P(w_dn | z_dn, β)
Edges drawn with the same color share the same parameter.

Inexistence of P-maps for Bayesian Networks
XOR example: A = B XOR C
A ⊥ B but ¬(A ⊥ B | C); B ⊥ C but ¬(B ⊥ C | A); C ⊥ A but ¬(A ⊥ C | B)
A minimal I-map exists, but it is not a P-map: it cannot encode all of A ⊥ B, B ⊥ C, C ⊥ A.
Swinging couples of variables (X_1, Y_1) and (X_2, Y_2):
¬(X_1 ⊥ Y_1), ¬(X_2 ⊥ Y_2), but X_1 ⊥ X_2 | {Y_1, Y_2} and Y_1 ⊥ Y_2 | {X_1, X_2}
No BN P-map exists; we need a new representation!

Undirected Graphical Models (UGM)
E.g. grid models (image processing, physics): each node is a pixel, or an atom.
The values of adjacent variables are dependent due to pattern continuity, electromagnetic forces, etc.
The most likely joint configuration corresponds to a "low-energy" state:
P(X_1, ..., X_n) = (1/Z) exp( Σ_{(i,j)∈E} θ_ij X_i X_j + Σ_{i∈V} θ_i X_i )
Three questions:
1. How to read conditional independence from the graph
2. How to factorize the distribution
3. Representation power
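Returning to the ancestral-sampling procedure above, here is a minimal Python sketch for the Flu/Allergy/Sinus/Nose/Headache network. Only the sampling order (parents before children) comes from the slides; the CPT numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def bernoulli(p):
    """Draw one 0/1 sample with P(1) = p."""
    return int(rng.random() < p)

def sample_one():
    """One ancestral sample from the Flu/Allergy/Sinus/Nose/Headache BN.
    Nodes are visited in topological order (A, F, S, N, H); each node is
    drawn from its CPT given the already-sampled values of its parents.
    All CPT numbers below are made up for illustration."""
    a = bernoulli(0.2)                                   # a_i ~ P(A)
    f = bernoulli(0.1)                                   # f_i ~ P(F)
    p_s = {(0, 0): 0.05, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.9}
    s = bernoulli(p_s[(a, f)])                           # s_i ~ P(S | a_i, f_i)
    n = bernoulli(0.8 if s else 0.1)                     # n_i ~ P(N | s_i)
    h = bernoulli(0.7 if s else 0.05)                    # h_i ~ P(H | s_i)
    return dict(A=a, F=f, S=s, N=n, H=h)

samples = [sample_one() for _ in range(5)]
print(samples)
```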
Read conditional independence from UGM
Global Markov independence: A ⊥ B | C, independence based on separation.
Local Markov independence: X ⊥ rest | {A, B, C, D}, where {A, B, C, D} is the Markov blanket of X.

Global Markov Independence
sep_G(A; B | C): C separates A and B if every path from a node in A to a node in B passes through a node in C.
A distribution satisfies the global Markov property if for any disjoint A, B, C such that sep_G(A; B | C), A is independent of B given C:
I(G) = {A ⊥ B | C : sep_G(A; B | C)}

Soundness of the separation criterion
The independencies in I(G) are precisely those that are guaranteed to hold for every MN distribution P over G.
In other words, the separation criterion is sound for detecting independence properties in MN distributions over G.
In a sense, reading conditional independence from an MN is simpler than from a BN:
BN with v-structure A → C ← B: C lies between A and B in the graph, but ¬(A ⊥ B | C).
MN chain A − C − B (or any MN in which C separates A and B): (A ⊥ B | C).

Local Markov Independence
For each node X_i ∈ V there is a unique Markov blanket MB_{X_i}: the set of immediate neighbors of X_i in the graph.
The local Markov independencies associated with G are
I_l(G) = {X_i ⊥ V − {X_i} − MB_{X_i} | MB_{X_i} : ∀i}
In other words, X_i is independent of the rest given its immediate neighbors.

Pairwise Markov independence
Given a graph G = (V, E), the pairwise Markov independencies associated with G are
I_p(G) = {X ⊥ Y | V ∖ {X, Y} : {X, Y} ∉ E}
E.g. for the chain X_1 − X_2 − X_3 − X_4 − X_5: X_1 ⊥ X_5 | {X_2, X_3, X_4}.
In a BN we need active trails to judge this; e.g. for X_1 → X_2 ← X_3, ¬(X_1 ⊥ X_3 | X_2).

Markov blanket example
Note: the local Markov independencies in an MN and a BN can be quite different!
MN: X ⊥ V − {X} − {A, B, C, D} | {A, B, C, D}.
BN with the same skeleton: ¬(X ⊥ V − {X} − {A, B, C, D} | {A, B, C, D}); e.g. ¬(X ⊥ E | {A, B, C, D}), because a BN Markov blanket must also include the co-parents of X's children (such as E).

Read conditional independence from UGM (continued)
Global Markov: A ⊥ B | C based on separation. Local Markov: X ⊥ rest given its Markov blanket.
Edges express dependence between variables, but carry no causal relations, and generating samples is more complicated than in a BN.
How do we factorize the distribution?

Maximal Cliques
For G = (V, E), a complete subgraph (clique) is a subgraph G' = (V' ⊆ V, E' ⊆ E) such that the nodes in V' are fully connected.
A maximal clique is a complete subgraph such that no superset V'' ⊃ V' is fully connected.
Example: maximal cliques = {A, B, C}, {A, B, D}; sub-cliques = {A}, {B}, {A, B}, ... (all edges and singletons).

Distribution Factorization in Markov Networks
Given an undirected graph G over variables X = {X_1, ..., X_n}, a distribution P factorizes over G if there exist
subsets of variables D_1 ⊆ X, ..., D_m ⊆ X (the D_i are the maximal cliques in G), and
non-negative potentials (factors/functions) Ψ_1(D_1), ..., Ψ_m(D_m),
such that
P(X_1, X_2, ..., X_n) = (1/Z) ∏_{i=1}^m Ψ_i(D_i), where Z = Σ_{x_1, x_2, ..., x_n} ∏_{i=1}^m Ψ_i(D_i).
Also known as Gibbs distributions, Markov random fields, and undirected graphical models.
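As a minimal sketch of this factorization, the snippet below builds the Gibbs distribution for a three-node chain X − Y − Z from two pairwise potentials, normalizes by the partition function, and numerically checks the separation-based independence X ⊥ Z | Y that the graph encodes. The potential values are arbitrary non-negative numbers chosen for illustration.

```python
import numpy as np

# Arbitrary non-negative potentials for the chain X - Y - Z
# (the values are made up; any non-negative tables would do).
psi_xy = np.array([[1.0, 3.0],
                   [2.0, 0.5]])       # psi_xy[x, y]
psi_yz = np.array([[4.0, 1.0],
                   [0.5, 2.0]])       # psi_yz[y, z]

# Joint table: P(x, y, z) proportional to psi_xy[x, y] * psi_yz[y, z]
joint = psi_xy[:, :, None] * psi_yz[None, :, :]
joint /= joint.sum()                   # divide by the partition function Z

# Check the independence X _|_ Z | Y implied by separation in the graph:
# P(x, z | y) should equal P(x | y) * P(z | y) for every y.
for y in (0, 1):
    p_xz_given_y = joint[:, y, :] / joint[:, y, :].sum()
    p_x_given_y = p_xz_given_y.sum(axis=1, keepdims=True)
    p_z_given_y = p_xz_given_y.sum(axis=0, keepdims=True)
    assert np.allclose(p_xz_given_y, p_x_given_y * p_z_given_y)
print("X is independent of Z given Y for any such chain factorization")
```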
Interpretation of Potential Functions
Consider the chain X − Y − Z with P(X, Y, Z) = (1/Z) Ψ_1(X, Y) Ψ_2(Y, Z).
The undirected graph implies X ⊥ Z | Y. This independence statement implies that the joint must factorize as
P(X, Y, Z) = P(Y) P(X, Z | Y) = P(Y) P(X | Y) P(Z | Y).
So we can write P(X, Y, Z) = P(X, Y) P(Z | Y) or P(X | Y) P(Z, Y), but we cannot have all potentials be marginals, and we cannot have all potentials be conditionals.
The clique potentials can be thought of as general "compatibility", "goodness" or "happiness" functions over their variables, but not as distributions or conditionals.

Another example
P(A, B, C, D) = (1/Z) Ψ_1(A, B, C) Ψ_2(A, B, D), with Z = Σ_{a,b,c,d} Ψ_1(A, B, C) Ψ_2(A, B, D).
For discrete nodes, we can represent P(A, B, C, D) with two 3-way tables instead of one 4-way table.
Compare the BN factorization P(A, B, C, D) = P(C) P(A | C) P(B | A, C) P(D | A, B): two 3-way tables + one 2-way table + one vector, and each table has a meaning as a conditional distribution.

Conditional Independence in the Problem
World, data, reality: the true distribution P contains conditional independence assertions I(P).
BN: local Markov assumptions, I_l(G) ⊆ I(P).
MN: global Markov assumptions, I(G) ⊆ I(P).

I-map: Bayesian Networks
A BN encodes the local Markov assumptions I_l(G).
If the local conditional independencies of the BN are a subset of the conditional independencies of P, i.e. I_l(G) ⊆ I(P) (the BN is an I-map of P), then the joint probability can be written as
P(X_1, ..., X_n) = ∏_i P(X_i | Pa_{X_i}),
i.e. P factorizes according to the BN. Every P has at least one BN structure G.
Conversely, if the joint probability can be written as P(X_1, ..., X_n) = ∏_i P(X_i | Pa_{X_i}), then the local conditional independencies of the BN are a subset of the conditional independencies of P, I_l(G) ⊆ I(P): we can read independencies of P from the BN structure G.

I-map: Markov Networks
An MN encodes the global Markov assumptions I(G).
If the global conditional independencies of the MN are a subset of the conditional independencies of P, i.e. I(G) ⊆ I(P) (the MN is an I-map of P), then the joint probability can be written as
P(X_1, ..., X_n) = (1/Z) ∏_i Ψ(D_i),
i.e. P factorizes according to the MN. Every P has at least one MN structure G.
Conversely, if the joint probability can be written as P(X_1, ..., X_n) = (1/Z) ∏_i Ψ(D_i), then the global conditional independencies of the MN are a subset of the conditional independencies of P, I(G) ⊆ I(P): we can read independencies of P from the MN structure G.

Counter example
X_1, ..., X_4 are binary, and only eight assignments have positive probability (each with probability 1/8):
(0,0,0,0), (1,0,0,0), (1,1,0,0), (1,1,1,0), (0,0,0,1), (0,0,1,1), (0,1,1,1), (1,1,1,1)
The graph is the 4-cycle X_1 − X_2 − X_3 − X_4 − X_1.
The global Markov independencies hold, e.g. X_1 ⊥ X_3 | {X_2, X_4}: P(X_1, X_3 | X_2, X_4) = P(X_1 | X_2, X_4) P(X_3 | X_2, X_4) for every value of (X_2, X_4). [The slide shows the corresponding conditional probability tables.]
But the distribution does not factorize over the cycle! E.g. P(0, 0, 1, 0) = 0 cannot equal (1/Z) Ψ_12(0,0) Ψ_23(0,1) Ψ_34(1,0) Ψ_14(0,0), since each of these potential entries must be positive to produce the positive-probability assignments above.
So global Markov independence alone does not guarantee factorization; positivity is needed (next slide).

Markov Network Representation
If the global conditional independencies of the MN are a subset of the conditional independencies of a strictly positive P (for all x, P(X = x) > 0), i.e. I(G) ⊆ I(P) (the MN is an I-map of P), then the joint probability can be written as
P(X_1, ..., X_n) = (1/Z) ∏_i Ψ(D_i),
i.e. P factorizes according to the MN. Every strictly positive P has at least one MN structure G.
This result is known as the Hammersley-Clifford theorem.

Minimal I-maps and Markov networks
A fully connected graph is an I-map for every distribution.
Recall minimal I-maps: deleting any edge makes the graph no longer an I-map.
In a Bayesian network there is no unique minimal I-map.
For strictly positive distributions and Markov networks, the minimal I-map is unique!
There are many ways to find the minimal I-map, e.g.: take each pairwise Markov assumption X_i ⊥ X_j | rest; if P does not entail it, add the edge {X_i, X_j}.
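The pairwise construction just described can be made concrete. Below is a minimal Python sketch (not from the slides) that, given a strictly positive joint table over binary variables, tests each pairwise Markov assumption X_i ⊥ X_j | rest and adds an edge whenever it fails; the example distribution and its potential values are made up.

```python
import itertools
import numpy as np

def conditionally_independent(joint, i, j, tol=1e-9):
    """Test X_i _|_ X_j | rest for a strictly positive joint table over
    binary variables. joint is an ndarray with one axis per variable."""
    n = joint.ndim
    rest = [k for k in range(n) if k not in (i, j)]
    # Iterate over all assignments of the remaining variables.
    for assignment in itertools.product([0, 1], repeat=len(rest)):
        index = [slice(None)] * n
        for axis, value in zip(rest, assignment):
            index[axis] = value
        block = joint[tuple(index)]               # 2x2 table over (X_i, X_j)
        block = block / block.sum()               # P(X_i, X_j | rest)
        outer = np.outer(block.sum(axis=1), block.sum(axis=0))
        if not np.allclose(block, outer, atol=tol):
            return False
    return True

def minimal_imap_edges(joint):
    """Pairwise-Markov construction of the (unique) minimal I-map for a
    strictly positive distribution: add the edge {i, j} whenever
    X_i _|_ X_j | rest does NOT hold."""
    n = joint.ndim
    return [(i, j) for i, j in itertools.combinations(range(n), 2)
            if not conditionally_independent(joint, i, j)]

# Example: a strictly positive distribution that factorizes over the chain
# X0 - X1 - X2 (the potential values are made up).
psi01 = np.array([[2.0, 1.0], [1.0, 3.0]])
psi12 = np.array([[1.0, 4.0], [2.0, 1.0]])
joint = psi01[:, :, None] * psi12[None, :, :]
joint /= joint.sum()
print(minimal_imap_edges(joint))   # expected: [(0, 1), (1, 2)]
```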
How about a perfect map?
A perfect map: the independencies in the graph are exactly those in P.
For Bayesian networks a perfect map does not always exist; counter example: swinging couples of variables.
How about for Markov networks? Counter example: the v-structure.
The BN A → S ← F encodes A ⊥ F and ¬(A ⊥ F | S).
The minimal I-map MN is the fully connected triangle over {A, S, F}; it is not a P-map, since it cannot encode A ⊥ F.

Pairwise Markov Networks
All factors are over single variables or pairs of variables:
node potentials Ψ_i(X_i) and edge potentials Ψ_ij(X_i, X_j).
Factorization: P(X) = (1/Z) ∏_{i∈V} Ψ_i(X_i) ∏_{(i,j)∈E} Ψ_ij(X_i, X_j)
Note that there may be bigger cliques in the graph, but we only consider pairwise potentials.

An example
Maximal clique specification: P(A, B, C, D) = (1/Z) Ψ_1(A, B, C) Ψ_2(A, B, D), with Z = Σ_{a,b,c,d} Ψ_1(A, B, C) Ψ_2(A, B, D); uses two 3-way tables.
Pairwise Markov network: P'(A, B, C, D) = (1/Z') Ψ(A, C) Ψ(B, C) Ψ(A, B) Ψ(A, D) Ψ(B, D); uses five 2-way tables.
What is the relation between I(P) and I(P')?
I(fully connected graph) ⊆ I(P) ⊆ I(P') ⊆ I(disconnected graph)

Applications of Pairwise Markov Networks
Image segmentation: separate foreground (fg) from background (bg).
Graph structure: a grid with one node per pixel; a pairwise Markov network.
Node potentials compare the pixel color z_i to the foreground and background colors:
Ψ(X_i = fg, z_i) = exp(−||z_i − μ_fg||²), Ψ(X_i = bg, z_i) = exp(−||z_i − μ_bg||²)
Edge potentials encode that neighbors likely have the same label, e.g.
Ψ(X_i, X_j) = 10 if X_i = X_j and 1 otherwise (a 2×2 table: 10 on the diagonal, 1 off it).
P(X_1, ..., X_n) = (1/Z) ∏_i Ψ(X_i) ∏_{(i,j)} Ψ(X_i, X_j)
(A code sketch of this model appears at the end of these notes.)

Exponential Form
Standard model: P(X_1, ..., X_n) = (1/Z) ∏_i Ψ(D_i).
Assuming strictly positive potentials:
P(X_1, ..., X_n) = (1/Z) ∏_i Ψ(D_i) = (1/Z) ∏_i exp( log Ψ(D_i) ) = (1/Z) exp( Σ_i log Ψ(D_i) ) = (1/Z) exp( Σ_i Φ(D_i) )
We can maintain the tables Φ(D_i) (which can have negative entries) rather than the tables Ψ(D_i) (strictly positive entries).

Exponential Form: Log-linear Models
Features are functions f(D) of a subset of variables D.
A log-linear model over a Markov network G consists of
a set of features f_1(D_1), ..., f_k(D_k), where each D_i is a subset of a clique in G (e.g. in a pairwise model D_i = {X_i, X_j}); two features may be over the same variables (D_i = D_j is allowed), and
a set of weights w_1, ..., w_k, usually learned from data.
P(X_1, ..., X_n) = (1/Z) exp( Σ_{i=1}^k w_i f_i(D_i) )

Factor Graph
Maximal clique specification: P(A, B, C, D) = (1/Z) Ψ_1(A, B, C) Ψ_2(A, B, D).
Pairwise Markov network: P'(A, B, C, D) = (1/Z') Ψ(A, C) Ψ(B, C) Ψ(A, B) Ψ(A, D) Ψ(B, D).
Both have the same undirected graph, so we cannot look at the graph alone and tell which potentials are used.
The factor graph makes this explicit in graphical form.

Factor Graph (continued)
Make factor dependency explicit; useful for later inference.
A factor graph is a bipartite graph with
variable nodes (circles) for X_1, ..., X_n, and
factor nodes (squares) for Ψ_1, ..., Ψ_m,
with an edge X_i − Ψ_j whenever X_i ∈ D_j (the scope of Ψ_j(D_j)).
Example: (1/Z) Ψ_1(A, B, C) Ψ_2(A, B, D) has two factor nodes; (1/Z) Ψ_1(A, B) Ψ_2(A, C) Ψ_3(B, C) Ψ_4(A, D) Ψ_5(B, D) has five.

Conditional Random Fields
Focus on the conditional distribution P(Y_1, ..., Y_n | X_1, ..., X_n, X).
Do not explicitly model the dependence among X_1, ..., X_n, X; only model the Y−Y and Y−X relations.
E.g. for a chain of three labels:
P(Y_1, Y_2, Y_3 | X_1, X_2, X_3, X) = (1/Z(X_1, X_2, X_3, X)) Ψ(Y_1, Y_2, X_1, X_2, X) Ψ(Y_2, Y_3, X_2, X_3, X)
with Z(X_1, X_2, X_3, X) = Σ_{y_1, y_2, y_3} Ψ(y_1, y_2, X_1, X_2, X) Ψ(y_2, y_3, X_2, X_3, X).
Note that the normalization constant is a function of the observations.
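Returning to the image segmentation model, here is a minimal Python sketch of the pairwise network above on a toy 2x2 "image". The node and edge potentials follow the forms on the slide; the values μ_fg, μ_bg and the pixel intensities are made-up numbers, and the most probable labelling is found by brute-force enumeration rather than by a proper inference algorithm.

```python
import itertools
import numpy as np

# Toy 2x2 "image": pixels 0, 1 are bright (foreground-ish), 2, 3 are dark.
# All numbers (mu values, intensities) are made up for illustration.
image = [7.5, 8.5, 2.5, 1.5]
edges = [(0, 1), (2, 3), (0, 2), (1, 3)]            # 4-neighbour grid edges
mu = {0: 8.0, 1: 2.0}                               # label 0 = fg, label 1 = bg

edge_pot = np.array([[10.0, 1.0],                   # neighbours prefer to agree
                     [1.0, 10.0]])

def node_pot(label, z):
    """Node potential from the slide: exp(-||z - mu_label||^2)."""
    return np.exp(-(z - mu[label]) ** 2)

def score(labels):
    """Unnormalized probability of one labelling (product of all potentials).
    Dividing by the sum over all labellings would give the joint P(X)."""
    p = np.prod([node_pot(l, z) for l, z in zip(labels, image)])
    p *= np.prod([edge_pot[labels[i], labels[j]] for i, j in edges])
    return p

labellings = list(itertools.product([0, 1], repeat=len(image)))
best = max(labellings, key=score)
print(best)   # expected (0, 0, 1, 1): bright pixels foreground, dark ones background
```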
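Finally, a brute-force sketch of the chain CRF above: the conditional P(Y_1, Y_2, Y_3 | X_1, X_2, X_3, X) is computed by enumerating all label sequences and normalizing per input, which makes the input-dependent partition function explicit. The specific potential function below is invented for illustration and is not from the slides.

```python
import itertools
import numpy as np

def crf_conditional(xs, x_global, psi, n_labels=2):
    """Brute-force P(Y_1, ..., Y_n | X_1, ..., X_n, X) for a small chain CRF.
    psi(y_a, y_b, x_a, x_b, x_global) is any non-negative potential over
    neighbouring label pairs and their observations."""
    n = len(xs)
    label_seqs = list(itertools.product(range(n_labels), repeat=n))
    scores = np.array([
        np.prod([psi(y[t], y[t + 1], xs[t], xs[t + 1], x_global)
                 for t in range(n - 1)])
        for y in label_seqs])
    z = scores.sum()                   # partition function Z(X_1, ..., X_n, X)
    return dict(zip(label_seqs, scores / z))

def psi(ya, yb, xa, xb, xg):
    """A made-up potential: neighbouring labels like to agree, and each
    label y in {0, 1} prefers observations whose sign matches 2y - 1."""
    agree = 2.0 if ya == yb else 1.0
    fit = np.exp((2 * ya - 1) * xa + (2 * yb - 1) * xb + 0.1 * xg * (ya + yb))
    return agree * fit

dist = crf_conditional(xs=[0.8, -0.3, 1.2], x_global=0.5, psi=psi)
print(max(dist, key=dist.get))         # most probable label sequence given the input
```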