7. Max-product algorithm

  - MAP elimination algorithm
  - Max-product algorithm on a tree
  - Max-product algorithm

Max-product algorithm                                                        7-1

Inference tasks on graphical models

consider an undirected graphical model (a.k.a. Markov random field)

    µ(x) = (1/Z) ∏_{c∈C} ψ_c(x_c)

where C is the set of all maximal cliques in G. We want to

  - calculate marginals: µ(x_A) = Σ_{x_{V\A}} µ(x)
  - calculate conditional distributions: µ(x_A | x_B) = µ(x_A, x_B) / µ(x_B)
  - calculate maximum a posteriori (MAP) estimates: arg max_{x̂} µ(x̂)
  - calculate the partition function Z
  - sample from this distribution

Max-product algorithm                                                        7-2

MAP estimation = Mode computation

[figure: an undirected graph on the twelve variables x_1, ..., x_12]

    x* ∈ arg max_{x∈X^V} µ(x) = arg max_{x∈X^V} ∏_{c∈C} ψ_c(x_c)

Max-product algorithm                                                        7-3

MAP elimination algorithm

for computing the mode; the elimination algorithm is exact but can require O(|X|^{|V|}) operations

[figure: a 5-node graph with pairwise factors on {1,2}, {1,3}, {2,5} and a factor on {3,4,5}]

    µ(x) ∝ e^{x_1 x_2} e^{-x_1 x_3} e^{x_2 x_5} e^{-x_3 x_4 x_5},    x_i ∈ {+1, -1}

we want to find x* ∈ arg max_x µ(x)

brute force combinatorial search

    max_{x∈{+1,-1}^5} µ(x) ∝ max_{x_1,...,x_5} e^{x_1 x_2} e^{-x_1 x_3} e^{x_2 x_5} e^{-x_3 x_4 x_5}

requires O(|C| · |X|^5) operations

Max-product algorithm                                                        7-4

consider the elimination ordering (5, 4, 3, 2, 1):

    max_x µ(x) ∝ max_{x_1,x_2,x_3,x_4,x_5} e^{x_1 x_2} e^{-x_1 x_3} e^{x_2 x_5} e^{-x_3 x_4 x_5}
               = max_{x_1,x_2,x_3,x_4} e^{x_1 x_2} e^{-x_1 x_3} max_{x_5} e^{(x_2 - x_3 x_4) x_5}
                   where m_5(x_2, x_3, x_4) ≡ max_{x_5} e^{(x_2 - x_3 x_4) x_5} = e^{|x_2 - x_3 x_4|},  with x_5* = sign(x_2 - x_3 x_4)
               = max_{x_1,x_2,x_3,x_4} e^{x_1 x_2} e^{-x_1 x_3} m_5(x_2, x_3, x_4)
               = max_{x_1,x_2,x_3} e^{x_1 x_2} e^{-x_1 x_3} max_{x_4∈X} m_5(x_2, x_3, x_4)
                   where m_4(x_2, x_3) ≡ max_{x_4} m_5(x_2, x_3, x_4) = e^2,  with x_4* = -x_2 x_3
               = max_{x_1,x_2,x_3} e^{x_1 x_2} e^{-x_1 x_3} m_4(x_2, x_3)
               = max_{x_1,x_2} e^{x_1 x_2} max_{x_3∈X} e^{-x_1 x_3} m_4(x_2, x_3)
                   where m_3(x_1, x_2) ≡ max_{x_3} e^{-x_1 x_3} m_4(x_2, x_3) = e^3,  with x_3* = -x_1
               = max_{x_1,x_2} e^{x_1 x_2} m_3(x_1, x_2)

in the final step, (x_1*, x_2*) ∈ arg max e^{3 + x_1 x_2}, that is, x_2* = x_1* and x_1* = +1 or -1

Max-product algorithm                                                        7-5

from x_1* ∈ {±1}, x_2* = x_1*, x_3* = -x_1*, x_4* = -x_2* x_3*, x_5* = sign(x_2* - x_3* x_4*), we can
find a MAP assignment by backtracking through the elimination ordering (5, 4, 3, 2, 1) in reverse:

    x* = (+1, +1, -1, +1, +1)  or  (-1, -1, +1, +1, -1)

either of which maximizes e^{x_1 x_2} e^{-x_1 x_3} e^{x_2 x_5} e^{-x_3 x_4 x_5}

  - since we are only searching for one maximizing configuration, we are free to break ties arbitrarily
  - the computational complexity depends on the elimination ordering
  - how do we know which ordering is better?

Max-product algorithm                                                        7-6

Computational complexity of elimination algorithm

    µ̃(x_1) ∝ max_{x_2,x_3,x_4,x_5∈X} e^{x_1 x_2} e^{-x_1 x_3} e^{x_2 x_5} e^{-x_3 x_4 x_5}
            = max_{x_2,x_3,x_4∈X} e^{x_1 x_2} e^{-x_1 x_3} · max_{x_5∈X} e^{x_2 x_5} e^{-x_3 x_4 x_5}
                ≡ m_5(x_{S_5}),  S_5 = {x_2, x_3, x_4},  Ψ_5 = {e^{x_2 x_5}, e^{-x_3 x_4 x_5}}
            = max_{x_2,x_3∈X} e^{x_1 x_2} e^{-x_1 x_3} · max_{x_4∈X} m_5(x_2, x_3, x_4)
                ≡ m_4(x_{S_4}),  S_4 = {x_2, x_3},  Ψ_4 = {m_5}
            = max_{x_2∈X} e^{x_1 x_2} · max_{x_3∈X} e^{-x_1 x_3} m_4(x_2, x_3)
                ≡ m_3(x_{S_3}),  S_3 = {x_1, x_2},  Ψ_3 = {e^{-x_1 x_3}, m_4}
            = max_{x_2∈X} e^{x_1 x_2} m_3(x_1, x_2)  =  m_2(x_1)
                ≡ m_2(x_{S_2}),  S_2 = {x_1},  Ψ_2 = {e^{x_1 x_2}, m_3}

total complexity: Σ_i O(|Ψ_i| · |X|^{1+|S_i|}) = O(|V| · max_i |Ψ_i| · |X|^{1+max_i |S_i|})

use the induced graph for graph-theoretic complexity analysis

Max-product algorithm                                                        7-7

MAP elimination algorithm

input: compatibility functions {ψ_c}_{c∈C}, alphabet X, elimination ordering I
output: x* ∈ arg max_x µ(x)

1. initialize the active set Ψ to be the set of input compatibility functions
2. for each node i in the order I do
       let S_i be the set of nodes, not including i, that share a compatibility function with i
       let Ψ_i be the set of compatibility functions in Ψ involving x_i
       compute
           m_i(x_{S_i})  =  max_{x_i} ∏_{ψ∈Ψ_i} ψ(x_i, x_{S_i})
           x_i*(x_{S_i}) ∈  arg max_{x_i} ∏_{ψ∈Ψ_i} ψ(x_i, x_{S_i})
       remove the elements of Ψ_i from Ψ
       add m_i to Ψ
   end
3. produce x* by traversing I in reverse order:  x*_{I(j)} = x*_{I(j)}(x*_{I(k)} : k > j)

Max-product algorithm                                                        7-8
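The following is a minimal Python sketch of the MAP elimination procedure above (slide 7-8), run on the five-variable example of slides 7-4 to 7-6 with elimination ordering (5, 4, 3, 2, 1). The factor representation, the helper names, and the first-value tie-breaking rule are illustrative choices, not part of the lecture; the point is only to make the bookkeeping of S_i, Ψ_i, m_i, x_i* and the final backtracking pass concrete.

```python
# Sketch of the MAP elimination algorithm (slide 7-8) on the example of slide 7-4:
# mu(x) ∝ exp(x1 x2) exp(-x1 x3) exp(x2 x5) exp(-x3 x4 x5), x_i in {+1, -1}.
import itertools, math

X = (+1, -1)                                         # alphabet

# a factor is (scope, table): `scope` is a tuple of variables and `table` maps
# each assignment of the scope to a nonnegative value
def factor(scope, fn):
    return (scope, {v: fn(*v) for v in itertools.product(X, repeat=len(scope))})

psis = [factor((1, 2), lambda a, b: math.exp(a * b)),
        factor((1, 3), lambda a, b: math.exp(-a * b)),
        factor((2, 5), lambda a, b: math.exp(a * b)),
        factor((3, 4, 5), lambda a, b, c: math.exp(-a * b * c))]

def eliminate_map(factors, order):
    active = list(factors)                           # the active set Psi
    argmax_tables = []                               # (i, S_i, table x_{S_i} -> best x_i)
    for i in order:
        Psi_i = [f for f in active if i in f[0]]     # compatibility functions involving x_i
        active = [f for f in active if i not in f[0]]
        S_i = tuple(sorted({v for scope, _ in Psi_i for v in scope} - {i}))
        m_i, x_i_star = {}, {}
        for s in itertools.product(X, repeat=len(S_i)):
            ctx = dict(zip(S_i, s))
            best, best_xi = -math.inf, None
            for xi in X:                             # m_i(x_{S_i}) = max_{x_i} prod of Psi_i
                ctx[i] = xi
                val = math.prod(tab[tuple(ctx[v] for v in scope)] for scope, tab in Psi_i)
                if val > best:
                    best, best_xi = val, xi
            m_i[s], x_i_star[s] = best, best_xi
        active.append((S_i, m_i))                    # add the new message m_i to Psi
        argmax_tables.append((i, S_i, x_i_star))
    # step 3: backtrack by traversing the ordering in reverse
    x_star = {}
    for i, S_i, x_i_star in reversed(argmax_tables):
        x_star[i] = x_i_star[tuple(x_star[v] for v in S_i)]
    return x_star

print(eliminate_map(psis, order=(5, 4, 3, 2, 1)))
# -> {1: 1, 2: 1, 3: -1, 4: 1, 5: 1}, one of the two MAP configurations of slide 7-6
```

Because the example has two MAP configurations, which of the two is returned depends only on how the tie at x_1 is broken.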
Max-product algorithm on a tree

the max-product algorithm computes the max-marginals

    µ̃_i(x_i) ≡ max_{x_{V\{i}}} µ(x)

  - when each max-marginal is uniquely achieved, that is, x_i* ∈ arg max_{x_i} µ̃_i(x_i) is unique for
    all i ∈ V, then x* = (x_1*, ..., x_n*) is the global MAP configuration
  - when there is a tie, we can keep track of 'backpointers' recording the maximizing values of the
    neighbors in order to recover a global MAP configuration
  - note that in general arg max_{x_i} µ̃_i(x_i) ≠ arg max_{x_i} µ(x_i)

Max-product algorithm                                                        7-9

most generic form of the max-product algorithm, on a factor graph:

    ν^{(t+1)}_{i→a}(x_i) = ∏_{b∈∂i\{a}} ν̃^{(t)}_{b→i}(x_i)

    ν̃^{(t)}_{a→i}(x_i) = max_{x_{∂a\{i}}} ψ_a(x_{∂a}) ∏_{j∈∂a\{i}} ν^{(t)}_{j→a}(x_j)

    δ^{(t)}_{a→i}(x_i) = arg max_{x_{∂a\{i}}} ψ_a(x_{∂a}) ∏_{j∈∂a\{i}} ν^{(t)}_{j→a}(x_j)

    µ̃^{(t+1)}_i(x_i) = ∏_{a∈∂i} ν̃^{(t)}_{a→i}(x_i)

for pairwise MRFs there are two ways to reduce these updates to a single update of a single type
of message:

    ν^{(t+1)}_{i→j}(x_i) = ∏_{k∈∂i\{j}} max_{x_k} { ψ(x_k, x_i) ν^{(t)}_{k→i}(x_k) }
    δ^{(t+1)}_{i→j}(x_i) = arg max_{x_{∂i\{j}}} ∏_{k∈∂i\{j}} ψ(x_k, x_i) ν^{(t)}_{k→i}(x_k)

or

    ν̃^{(t+1)}_{i→j}(x_j) = max_{x_i} ψ(x_i, x_j) ∏_{k∈∂i\{j}} ν̃^{(t)}_{k→i}(x_i)
    δ^{(t+1)}_{i→j}(x_j) = arg max_{x_i} ψ(x_i, x_j) ∏_{k∈∂i\{j}} ν̃^{(t)}_{k→i}(x_i)

Max-product algorithm                                                       7-10

(parallel) max-product algorithm for pairwise MRFs

1. initialize ν^{(0)}_{i→j}(x_i) = 1/|X| for all directed edges (i, j) ∈ D
2. for t ∈ {0, 1, ..., t_max}, for all (i, j) ∈ D compute

       ν^{(t+1)}_{i→j}(x_i) = ∏_{k∈∂i\j} max_{x_k} { ψ_{ik}(x_i, x_k) ν^{(t)}_{k→i}(x_k) }
       δ^{(t+1)}_{i→j}(x_i) = arg max_{x_{∂i\j}} ∏_{k∈∂i\j} { ψ_{ik}(x_i, x_k) ν^{(t)}_{k→i}(x_k) }

3. for each i ∈ V compute the max-marginal

       µ̃_i(x_i) = ∏_{k∈∂i} max_{x_k} { ψ_{ik}(x_i, x_k) ν^{(t_max+1)}_{k→i}(x_k) }
       δ_i(x_i) = arg max_{x_{∂i}} ∏_{k∈∂i} { ψ_{ik}(x_i, x_k) ν^{(t_max+1)}_{k→i}(x_k) }

4. backtrack to find a MAP configuration

  - related to dynamic programming
  - exact when t_max ≥ diam(T)
  - computes the max-marginals in O(n |X|^2 · diam(T)) operations

Max-product algorithm                                                       7-11
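Before turning to numerical issues, here is a minimal Python sketch of the parallel updates of slide 7-11 on a small six-node tree (the same shape as the tree used on slides 7-13 to 7-16), checked against brute force. The random edge potentials, the dictionary-based message storage, and the brute-force comparison are assumptions added for illustration; the update rule, the max-marginal formula, and the t_max ≥ diam(T) stopping rule come from the slide.

```python
# Sketch of the (parallel) max-product algorithm of slide 7-11 for a pairwise MRF on a tree.
import itertools, math, random

random.seed(0)
X = (0, 1)                                           # alphabet
edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6)]     # a 6-node tree, diam(T) = 4
psi = {e: {(a, b): random.uniform(0.5, 2.0) for a in X for b in X} for e in edges}

def pot(i, j, xi, xj):                               # psi_{ij}, stored once per undirected edge
    return psi[(i, j)][(xi, xj)] if (i, j) in psi else psi[(j, i)][(xj, xi)]

nbrs = {}
for i, j in edges:
    nbrs.setdefault(i, []).append(j)
    nbrs.setdefault(j, []).append(i)
V = sorted(nbrs)

# 1. uniform initialization nu_{i->j}(x_i) = 1/|X| on every directed edge
nu = {(i, j): {x: 1.0 / len(X) for x in X} for i in V for j in nbrs[i]}

# 2. parallel updates: nu_{i->j}(x_i) = prod_{k in di\j} max_{x_k} psi_{ik}(x_i, x_k) nu_{k->i}(x_k)
for t in range(5):                                   # t in {0, ..., t_max} with t_max = 4 = diam(T)
    nu = {(i, j): {xi: math.prod(max(pot(i, k, xi, xk) * nu[(k, i)][xk] for xk in X)
                                 for k in nbrs[i] if k != j)
                   for xi in X}
          for (i, j) in nu}

# 3. max-marginals mu~_i(x_i) = prod_{k in di} max_{x_k} psi_{ik}(x_i, x_k) nu_{k->i}(x_k)
mm = {i: {xi: math.prod(max(pot(i, k, xi, xk) * nu[(k, i)][xk] for xk in X) for k in nbrs[i])
          for xi in X}
      for i in V}

# brute-force max-marginals for comparison (messages are only defined up to a
# positive constant, so compare after normalizing over x_i)
def normalized(d):
    s = sum(d.values())
    return {k: v / s for k, v in d.items()}

for i in V:
    brute, others = {}, [v for v in V if v != i]
    for xi in X:
        best = 0.0
        for vals in itertools.product(X, repeat=len(others)):
            x = dict(zip(others, vals)); x[i] = xi
            best = max(best, math.prod(pot(a, b, x[a], x[b]) for (a, b) in edges))
        brute[xi] = best
    assert all(abs(normalized(mm[i])[xi] - normalized(brute)[xi]) < 1e-9 for xi in X)
print("max-product max-marginals agree with brute force:", {i: normalized(mm[i]) for i in V})
```

Since the uniform initialization leaves every message defined only up to a positive constant, the comparison is made after normalizing each max-marginal over x_i.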
Numerical stability and min-sum algorithm

multiplying many compatibility functions can lead to numerical issues (e.g. when the values are
close to zero); one remedy is to work in the log domain

min-sum algorithm:

    ν_{i→j}(x_i) = Σ_{k∈∂i\j} min_{x_k} { -log ψ_{ki}(x_k, x_i) + ν_{k→i}(x_k) }

    µ̃_i(x_i) = exp( - Σ_{k∈∂i} min_{x_k} { -log ψ_{ki}(x_k, x_i) + ν_{k→i}(x_k) } )

Max-product algorithm                                                       7-12

[figure: a tree rooted at node 1 with children 2 and 3; node 2 has children 4 and 5, node 3 has
 child 6; the message ν_{2→1} flows along the edge (2, 1)]

define a graphical model on the sub-tree T_{2→1} = (V_{2→1}, E_{2→1}), the sub-tree rooted at 2 and
excluding 1:

    µ_{2→1}(x_{V_{2→1}}) ≡ (1 / Z(T_{2→1})) ∏_{(u,v)∈E_{2→1}} ψ(x_u, x_v)

    ν_{2→1}(x_2) ≡ max_{x_{V_{2→1}}\{2}} µ_{2→1}(x_{V_{2→1}})

Max-product algorithm                                                       7-13

[figure: the same tree, with messages ν_{2→1} and ν_{3→1} arriving at the root 1]

the messages from the neighbors are sufficient to compute the max-marginal µ̃(x_1):

    µ̃(x_1) ∝ max_{x_{V\{1}}} ∏_{(u,v)∈E} ψ(x_u, x_v)
            = max_{x_{V_{2→1}}, x_{V_{3→1}}} ψ(x_1, x_2) ψ(x_1, x_3) { ∏_{(u,v)∈E_{2→1}} ψ(x_u, x_v) } { ∏_{(u,v)∈E_{3→1}} ψ(x_u, x_v) }
            = ∏_{k∈∂1} max_{x_{V_{k→1}}} { ψ(x_1, x_k) ∏_{(u,v)∈E_{k→1}} ψ(x_u, x_v) }
            = ∏_{k∈∂1} max_{x_k} { ψ(x_1, x_k) max_{x_{V_{k→1}}\{k}} ∏_{(u,v)∈E_{k→1}} ψ(x_u, x_v) }

where the inner maximization max_{x_{V_{k→1}}\{k}} ∏_{(u,v)∈E_{k→1}} ψ(x_u, x_v) ∝ ν_{k→1}(x_k)

Max-product algorithm                                                       7-14

[figure: the same tree]

recursion on sub-trees to compute the messages ν:

    µ_{2→1}(x_{V_{2→1}}) = (1 / Z(T_{2→1})) ∏_{(u,v)∈E_{2→1}} ψ(x_u, x_v)
                         ∝ ψ(x_2, x_4) ψ(x_2, x_5) { ∏_{(u,v)∈E_{4→2}} ψ(x_u, x_v) } { ∏_{(u,v)∈E_{5→2}} ψ(x_u, x_v) }
                         ∝ ψ(x_2, x_4) ψ(x_2, x_5) µ_{4→2}(x_{V_{4→2}}) µ_{5→2}(x_{V_{5→2}})

Max-product algorithm                                                       7-15

    ν_{2→1}(x_2) = max_{x_{V_{2→1}}\{2}} µ_{2→1}(x_{V_{2→1}})
                 ∝ max_{x_{V_{2→1}}\{2}} ψ(x_2, x_4) ψ(x_2, x_5) µ_{4→2}(x_{V_{4→2}}) µ_{5→2}(x_{V_{5→2}})
                 = { max_{x_{V_{4→2}}} ψ(x_2, x_4) µ_{4→2}(x_{V_{4→2}}) } { max_{x_{V_{5→2}}} ψ(x_2, x_5) µ_{5→2}(x_{V_{5→2}}) }
                 = { max_{x_4} ψ(x_2, x_4) max_{x_{V_{4→2}}\{4}} µ_{4→2}(x_{V_{4→2}}) } { max_{x_5} ψ(x_2, x_5) max_{x_{V_{5→2}}\{5}} µ_{5→2}(x_{V_{5→2}}) }
                 ∝ { max_{x_4} ψ(x_2, x_4) ν_{4→2}(x_4) } { max_{x_5} ψ(x_2, x_5) ν_{5→2}(x_5) }

update the messages via this recursion, with uniform initialization ν_{i→j}(x_i) = 1/|X| for all
leaves i:

    ν_{2→1}(x_2) = ∏_{k∈∂2\{1}} max_{x_k} { ψ_{2k}(x_2, x_k) ν_{k→2}(x_k) }

to obtain a global configuration, we need to store the arg max of each message for backtracking:

    δ_{2→1}(x_2) = arg max_{x_{∂2\{1}}} ∏_{k∈∂2\{1}} { ψ_{2k}(x_2, x_k) ν_{k→2}(x_k) }

Max-product algorithm                                                       7-16
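As a closing illustration, here is a minimal Python sketch that ties slides 7-12 and 7-16 together: min-sum messages are computed from the leaves toward a chosen root, the arg max (backpointer) of each message is stored, and a MAP configuration is recovered by backtracking from the root. The tree, the random couplings, and the sequential rooted schedule (rather than the parallel schedule of slide 7-11) are illustrative assumptions.

```python
# Sketch of the min-sum recursion (slide 7-12) with backtracking (slide 7-16) on a rooted tree.
import itertools, random

random.seed(1)
X = (+1, -1)
edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6)]       # the same 6-node tree, rooted at 1
theta = {e: random.uniform(-1.0, 1.0) for e in edges}  # psi_ij(xi, xj) = exp(theta_ij * xi * xj)

def cost(i, j, xi, xj):                                # -log psi_ij(xi, xj)
    t = theta[(i, j)] if (i, j) in theta else theta[(j, i)]
    return -t * xi * xj

children = {1: [2, 3], 2: [4, 5], 3: [6], 4: [], 5: [], 6: []}

nu, back = {}, {}                                      # nu[i][xi]: min cost of the subtree below i
def upward(i):                                         # leaves-to-root pass (recursion of slide 7-16)
    nu[i] = {xi: 0.0 for xi in X}
    for k in children[i]:
        upward(k)
        back[(k, i)] = {}
        for xi in X:
            # min-sum update: min_{x_k} [ -log psi_ik(x_i, x_k) + nu_k(x_k) ]
            best_xk = min(X, key=lambda xk: cost(i, k, xi, xk) + nu[k][xk])
            back[(k, i)][xi] = best_xk                 # backpointer, playing the role of delta
            nu[i][xi] += cost(i, k, xi, best_xk) + nu[k][best_xk]

def backtrack(i, xi, out):                             # root-to-leaves pass
    out[i] = xi
    for k in children[i]:
        backtrack(k, back[(k, i)][xi], out)

upward(1)
x_star = {}
backtrack(1, min(X, key=lambda x1: nu[1][x1]), x_star)

# sanity check against brute-force minimization of the total cost (= -log mu up to a constant)
def total_cost(x):
    return sum(cost(i, j, x[i], x[j]) for (i, j) in edges)

brute = min((dict(zip(sorted(children), vals))
             for vals in itertools.product(X, repeat=len(children))), key=total_cost)
assert abs(total_cost(x_star) - total_cost(brute)) < 1e-9
print("MAP configuration from min-sum + backtracking:", x_star)
```

Working with -log ψ turns the products of slide 7-16 into the sums of slide 7-12, which is exactly what keeps the computation numerically stable when the potential values are small.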