Max-product algorithm

7. Max-product algorithm
MAP elimination algorithm
Max-product algorithm on a tree
Inference tasks on graphical models
consider an undirected graphical model (a.k.a. Markov random field)
$$\mu(x) = \frac{1}{Z} \prod_{c \in C} \psi_c(x_c)$$
where $C$ is the set of all maximal cliques in $G$
we want to
calculate marginals: $\mu(x_A) = \sum_{x_{V \setminus A}} \mu(x)$
calculate conditional distributions: $\mu(x_A \mid x_B) = \dfrac{\mu(x_A, x_B)}{\mu(x_B)}$
calculate maximum a posteriori (MAP) estimates: $\arg\max_{\hat{x}} \mu(\hat{x})$
calculate the partition function $Z$
sample from this distribution
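The first four tasks can be illustrated by exhaustive enumeration on a tiny model. The sketch below assumes a hypothetical 3-node chain with binary variables and a made-up pairwise potential `psi`; none of these choices come from the slides, and enumeration is only feasible for very small models.

```python
import itertools

# Hypothetical toy model: a 3-node chain with binary variables and two
# pairwise clique potentials that favor equal neighbors.
X = (0, 1)

def psi(a, b):
    return 2.0 if a == b else 1.0

def unnormalized(x):
    return psi(x[0], x[1]) * psi(x[1], x[2])

configs = list(itertools.product(X, repeat=3))

# Partition function Z: sum of the unnormalized weights.
Z = sum(unnormalized(x) for x in configs)

def mu(x):
    return unnormalized(x) / Z

# Marginal of x_0: sum out the other variables.
marginal_0 = {a: sum(mu(x) for x in configs if x[0] == a) for a in X}

# Conditional mu(x_0 | x_2 = 1) = mu(x_0, x_2 = 1) / mu(x_2 = 1).
joint = {a: sum(mu(x) for x in configs if x[0] == a and x[2] == 1) for a in X}
p_b = sum(joint.values())
conditional = {a: joint[a] / p_b for a in X}

# MAP estimate: a configuration with the largest probability
# (ties are broken by enumeration order).
x_map = max(configs, key=mu)
```

Each quantity costs a pass over all $|\mathcal{X}|^{|V|}$ configurations, which is what the elimination and message-passing algorithms are designed to avoid.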
MAP estimation = Mode computation
[Figure: graphical model on nodes x1, ..., x12]
$$x^* \in \arg\max_{x \in \mathcal{X}^V} \mu(x) = \arg\max_{x \in \mathcal{X}^V} \prod_{c \in C} \psi_c(x_c)$$
MAP elimination algorithm for computing the mode
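MAP elimination algorithm is exact but can require $O(|\mathcal{X}|^{|V|})$ operations
[Figure: graph on nodes 1, ..., 5]
$$\mu(x) \propto e^{x_1 x_2}\, e^{-x_1 x_3}\, e^{x_2 x_5}\, e^{-x_3 x_4 x_5}, \qquad x_i \in \{+1, -1\}$$
we want to find $x^* \in \arg\max_x \mu(x)$
brute force combinatorial search:
$$\max_{x \in \{+1,-1\}^5} \mu(x) \propto \max_{x_1, x_2, x_3, x_4, x_5} e^{x_1 x_2}\, e^{-x_1 x_3}\, e^{x_2 x_5}\, e^{-x_3 x_4 x_5}$$
requires $O(|C| \cdot |\mathcal{X}|^5)$ operations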
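The brute-force search for this example can be written out directly. This is a minimal sketch; the helper name `weight` is an assumption made for illustration.

```python
import itertools
import math

def weight(x1, x2, x3, x4, x5):
    # Unnormalized mu(x) for the example: exp(x1 x2 - x1 x3 + x2 x5 - x3 x4 x5).
    return math.exp(x1 * x2 - x1 * x3 + x2 * x5 - x3 * x4 * x5)

# Enumerate all 2^5 configurations and keep every maximizer.
configs = list(itertools.product((+1, -1), repeat=5))
best_value = max(weight(*x) for x in configs)
maximizers = [x for x in configs if math.isclose(weight(*x), best_value)]
```

The search evaluates all 32 configurations; the list `maximizers` shows that the maximum is attained more than once.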
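consider an elimination ordering (5, 4, 3, 2, 1)
$$\begin{aligned}
\max_x \mu(x) &\propto \max_{x_1, x_2, x_3, x_4, x_5} e^{x_1 x_2}\, e^{-x_1 x_3}\, e^{x_2 x_5}\, e^{-x_3 x_4 x_5} \\
&= \max_{x_1, x_2, x_3, x_4} e^{x_1 x_2}\, e^{-x_1 x_3}\, \underbrace{\max_{x_5} e^{(x_2 - x_3 x_4) x_5}}_{\equiv\, m_5(x_2, x_3, x_4) \,=\, e^{|x_2 - x_3 x_4|},\ \text{with } x_5^* = \mathrm{sign}(x_2 - x_3 x_4)} \\
&= \max_{x_1, x_2, x_3, x_4} e^{x_1 x_2}\, e^{-x_1 x_3}\, m_5(x_2, x_3, x_4) \\
&= \max_{x_1, x_2, x_3} e^{x_1 x_2}\, e^{-x_1 x_3}\, \underbrace{\max_{x_4 \in \mathcal{X}} m_5(x_2, x_3, x_4)}_{\equiv\, m_4(x_2, x_3) \,=\, e^{2},\ \text{with } x_4^* = -(x_2 x_3)} \\
&= \max_{x_1, x_2, x_3} e^{x_1 x_2}\, e^{-x_1 x_3}\, m_4(x_2, x_3) \\
&= \max_{x_1, x_2} e^{x_1 x_2}\, \underbrace{\max_{x_3 \in \mathcal{X}} e^{-x_1 x_3}\, m_4(x_2, x_3)}_{\equiv\, m_3(x_1, x_2) \,=\, e^{3},\ \text{with } x_3^* = -x_1} \\
&= \max_{x_1, x_2} e^{x_1 x_2}\, m_3(x_1, x_2)
\end{aligned}$$
in the final step, let $(x_1^*, x_2^*) \in \arg\max e^{3 + x_1 x_2}$, that is, $x_2^* = x_1$ and $x_1^* = +1$ or $-1$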
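The closed forms of the messages in this derivation can be checked numerically. The sketch below defines each message by its maximization, exactly as in the elimination steps; the function names mirror the symbols $m_5, m_4, m_3$, and `m2` is an extra helper (an assumption of this sketch) for the final maximization over $x_2$.

```python
import math

S = (+1, -1)  # the alphabet of every variable

def m5(x2, x3, x4):
    # Eliminate x5 from exp(x2 x5) * exp(-x3 x4 x5).
    return max(math.exp((x2 - x3 * x4) * x5) for x5 in S)

def m4(x2, x3):
    # Eliminate x4 from m5.
    return max(m5(x2, x3, x4) for x4 in S)

def m3(x1, x2):
    # Eliminate x3 from exp(-x1 x3) * m4.
    return max(math.exp(-x1 * x3) * m4(x2, x3) for x3 in S)

def m2(x1):
    # Eliminate x2 from exp(x1 x2) * m3.
    return max(math.exp(x1 * x2) * m3(x1, x2) for x2 in S)

max_value = max(m2(x1) for x1 in S)
```

Evaluating these functions over all arguments reproduces $m_5 = e^{|x_2 - x_3 x_4|}$, $m_4 = e^2$, $m_3 = e^3$, and a maximum unnormalized weight of $e^4$.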
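from
$$x_1^* \in \{\pm 1\},\quad x_2^* = x_1,\quad x_3^* = -x_1,\quad x_4^* = -x_2 x_3,\quad x_5^* = \mathrm{sign}(x_2 - x_3 x_4),$$
we can find a MAP assignment by backtracking through the elimination ordering (5, 4, 3, 2, 1) in reverse, assigning $x_1$ first and $x_5$ last:
$$x^* = (+1, +1, -1, +1, +1) \quad \text{or} \quad (-1, -1, +1, +1, -1)$$
each of which maximizes $e^{x_1 x_2}\, e^{-x_1 x_3}\, e^{x_2 x_5}\, e^{-x_3 x_4 x_5}$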
since we are only searching for one maximizing configuration, we are
free to break ties arbitrarily
computational complexity depends on the elimination ordering
how do we know which ordering is better?
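The backtracking step for the example can be sketched as a direct substitution of the stored maximizers. The helper names `sign`, `backtrack`, and `exponent` are assumptions made for illustration.

```python
def sign(v):
    # Returns +1, 0, or -1; the 0 case (a tie) does not arise on the path below.
    return (v > 0) - (v < 0)

def backtrack(x1):
    # Apply the stored maximizers in reverse elimination order: 1, 2, 3, 4, 5.
    x2 = x1
    x3 = -x1
    x4 = -(x2 * x3)
    x5 = sign(x2 - x3 * x4)
    return (x1, x2, x3, x4, x5)

def exponent(x):
    # Exponent of the unnormalized weight of configuration x.
    x1, x2, x3, x4, x5 = x
    return x1 * x2 - x1 * x3 + x2 * x5 - x3 * x4 * x5

# The two choices of x1 give the two MAP configurations.
solutions = [backtrack(+1), backtrack(-1)]
```

Both entries of `solutions` reach the exponent 4, the largest possible value for a sum of four terms in {+1, -1}.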
Computational complexity of elimination algorithm
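$$\begin{aligned}
\tilde\mu_1(x_1) &\propto \max_{x_2, x_3, x_4, x_5 \in \mathcal{X}} e^{x_1 x_2}\, e^{-x_1 x_3}\, e^{x_2 x_5}\, e^{-x_3 x_4 x_5} \\
&= \max_{x_2, x_3, x_4 \in \mathcal{X}} e^{x_1 x_2}\, e^{-x_1 x_3} \cdot \underbrace{\max_{x_5 \in \mathcal{X}} e^{x_2 x_5}\, e^{-x_3 x_4 x_5}}_{\equiv\, m_5(S_5),\ S_5 = \{x_2, x_3, x_4\},\ \Psi_5 = \{e^{x_2 x_5},\, e^{-x_3 x_4 x_5}\}} \\
&= \max_{x_2, x_3 \in \mathcal{X}} e^{x_1 x_2}\, e^{-x_1 x_3} \cdot \underbrace{\max_{x_4 \in \mathcal{X}} m_5(x_2, x_3, x_4)}_{\equiv\, m_4(S_4),\ S_4 = \{x_2, x_3\},\ \Psi_4 = \{m_5\}} \\
&= \max_{x_2 \in \mathcal{X}} e^{x_1 x_2} \cdot \underbrace{\max_{x_3 \in \mathcal{X}} e^{-x_1 x_3}\, m_4(x_2, x_3)}_{\equiv\, m_3(S_3),\ S_3 = \{x_1, x_2\},\ \Psi_3 = \{e^{-x_1 x_3},\, m_4\}} \\
&= \underbrace{\max_{x_2 \in \mathcal{X}} e^{x_1 x_2}\, m_3(x_1, x_2)}_{\equiv\, m_2(S_2),\ S_2 = \{x_1\},\ \Psi_2 = \{e^{x_1 x_2},\, m_3\}} \\
&= m_2(x_1)
\end{aligned}$$
(here $\tilde\mu_1$ is the max-marginal of $x_1$, obtained by maximizing over all other variables)
total complexity:
$$\sum_i O\big(|\Psi_i| \cdot |\mathcal{X}|^{1 + |S_i|}\big) = O\big(|V| \cdot \max_i |\Psi_i| \cdot |\mathcal{X}|^{1 + \max_i |S_i|}\big)$$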
use induced graph for graph theoretic complexity analysis
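The cost formula can be evaluated for a given ordering by tracking only the scopes of the active functions, without computing any message values. The sketch below is one way to do this; the function name `elimination_cost` and the representation of scopes as sets of node ids are assumptions of this example.

```python
def elimination_cost(scopes, ordering, alphabet_size):
    """Return (sum_i |Psi_i| * |X|^(1 + |S_i|), max_i |S_i|) for an ordering."""
    active = [frozenset(s) for s in scopes]
    total, widths = 0, []
    for i in ordering:
        # Psi_i: active functions whose scope contains node i.
        psi_i = [s for s in active if i in s]
        # S_i: every other node appearing in those scopes.
        s_i = frozenset().union(*psi_i) - {i}
        total += len(psi_i) * alphabet_size ** (1 + len(s_i))
        widths.append(len(s_i))
        # Replace Psi_i by the new message m_i, whose scope is S_i.
        active = [s for s in active if i not in s] + [s_i]
    return total, max(widths)

# Scopes of the four factors in the example on nodes 1, ..., 5.
scopes = [{1, 2}, {1, 3}, {2, 5}, {3, 4, 5}]
cost_a = elimination_cost(scopes, (5, 4, 3, 2), 2)
cost_b = elimination_cost(scopes, (4, 5, 3, 2), 2)
```

For the ordering (5, 4, 3, 2) used above, the largest set $S_i$ has three nodes; eliminating node 4 before node 5 keeps every $S_i$ at two nodes, which illustrates how the ordering changes the cost.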
MAP elimination algorithm
input: $\{\psi_c\}_{c \in C}$, alphabet $\mathcal{X}$, elimination ordering $I$
output: $x^* \in \arg\max \mu(x)$
1. initialize active set $\Psi$ to be the set of input compatibility functions
2. for each node $i$ in $I$ do
     let $S_i$ be the set of nodes, not including $i$, that share a compatibility function with $i$
     let $\Psi_i$ be the set of compatibility functions in $\Psi$ involving $x_i$
     compute
     $$m_i(x_{S_i}) = \max_{x_i} \prod_{\psi \in \Psi_i} \psi(x_i, x_{S_i})$$
     $$x_i^*(x_{S_i}) \in \arg\max_{x_i} \prod_{\psi \in \Psi_i} \psi(x_i, x_{S_i})$$
     remove elements of $\Psi_i$ from $\Psi$
     add $m_i$ to $\Psi$
   end
3. produce $x^*$ by traversing $I$ in the reverse order
   $$x^*_{I(j)} = x^*_{I(j)}\big(x^*_{I(k)} : j < k\big)$$
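The three steps of the algorithm can be sketched in a few lines for small discrete models. This is an illustrative implementation under stated assumptions, not a reference one: factors are represented as hypothetical `(scope, function)` pairs, messages are stored as lookup tables, and ties are broken in favor of the first alphabet value.

```python
import itertools
import math

def map_elimination(factors, alphabet, ordering):
    """factors: list of (scope, fn), scope a tuple of node ids, fn taking the
    values in scope order. Returns one maximizing assignment as a dict."""
    active = list(factors)
    decisions = []  # (node, S_i, argmax table), in elimination order
    for i in ordering:
        psi_i = [f for f in active if i in f[0]]
        s_i = tuple(sorted({v for scope, _ in psi_i for v in scope} - {i}))
        m_table, arg_table = {}, {}
        for xs in itertools.product(alphabet, repeat=len(s_i)):
            local = dict(zip(s_i, xs))
            best_val, best_xi = -math.inf, None
            for xi in alphabet:
                local[i] = xi
                val = math.prod(fn(*(local[v] for v in scope))
                                for scope, fn in psi_i)
                if val > best_val:
                    best_val, best_xi = val, xi
            m_table[xs], arg_table[xs] = best_val, best_xi
        # Remove Psi_i from the active set and add the message m_i.
        active = [f for f in active if i not in f[0]]
        active.append((s_i, lambda *xs, t=m_table: t[xs]))
        decisions.append((i, s_i, arg_table))
    # Backtrack: traverse the ordering in reverse and read off the argmaxes.
    x_star = {}
    for i, s_i, arg_table in reversed(decisions):
        x_star[i] = arg_table[tuple(x_star[v] for v in s_i)]
    return x_star

# The five-variable example from the earlier slides.
factors = [
    ((1, 2), lambda a, b: math.exp(a * b)),
    ((1, 3), lambda a, b: math.exp(-a * b)),
    ((2, 5), lambda a, b: math.exp(a * b)),
    ((3, 4, 5), lambda a, b, c: math.exp(-a * b * c)),
]
x_star = map_elimination(factors, (+1, -1), (5, 4, 3, 2, 1))
```

Each message table has $|\mathcal{X}|^{|S_i|}$ entries, so the memory and time of this sketch follow the complexity bound stated for the elimination algorithm.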
Max-product algorithm on a tree
max-product algorithm computes max-marginal:
$$\tilde\mu_i(x_i) \equiv \max_{x_{V \setminus \{i\}}} \mu(x)$$
when each max-marginal is uniquely achieved, that is, $x_i^* \in \arg\max_{x_i} \tilde\mu_i(x_i)$ is unique for all $i \in V$, then $x^* = (x_1^*, \ldots, x_n^*)$ is the global MAP configuration
when there is a tie, we can keep track of 'backpointers' that record the maximizing values of the neighbors, and use them to recover a global MAP configuration
note that in general $\arg\max_{x_i} \tilde\mu_i(x_i) \neq \arg\max_{x_i} \mu(x_i)$
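Two small examples make the last two remarks concrete. The distributions below are hypothetical, chosen only so that the mode of the marginal differs from the mode of the max-marginal, and so that a tie occurs.

```python
# Hypothetical joint distribution over two binary variables (x1, x2).
mu = {(0, 0): 0.4, (0, 1): 0.0, (1, 0): 0.3, (1, 1): 0.3}

def marginal(mu, i):
    # Sum out every variable except the i-th.
    return {a: sum(p for x, p in mu.items() if x[i] == a) for a in (0, 1)}

def max_marginal(mu, i):
    # Maximize over every variable except the i-th.
    return {a: max(p for x, p in mu.items() if x[i] == a) for a in (0, 1)}

def argmax(table):
    return max(table, key=table.get)

mode_of_marginal = argmax(marginal(mu, 0))          # 1
mode_of_max_marginal = argmax(max_marginal(mu, 0))  # 0, consistent with the MAP (0, 0)

# A tie: two MAP configurations, so every max-marginal is flat.
tied = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
flat = [max_marginal(tied, i) for i in (0, 1)]
```

In the tied case, choosing each coordinate independently from its flat max-marginal could produce (0, 1), which has probability zero; this is why backpointers are needed.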
most generic form of max-product algorithm on a Factor Graph
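$$\nu^{(t+1)}_{i \to a}(x_i) = \prod_{b \in \partial i \setminus \{a\}} \tilde\nu^{(t)}_{b \to i}(x_i)$$
$$\tilde\nu^{(t)}_{a \to i}(x_i) = \max_{x_{\partial a \setminus \{i\}}} \psi_a(x_{\partial a}) \prod_{j \in \partial a \setminus \{i\}} \nu^{(t)}_{j \to a}(x_j)$$
$$\delta^{(t)}_{a \to i}(x_i) = \arg\max_{x_{\partial a \setminus \{i\}}} \psi_a(x_{\partial a}) \prod_{j \in \partial a \setminus \{i\}} \nu^{(t)}_{j \to a}(x_j)$$
$$\tilde\mu^{(t+1)}_i(x_i) = \prod_{a \in \partial i} \tilde\nu^{(t)}_{a \to i}(x_i)$$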
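for pairwise MRFs there are two ways to reduce these updates into a single update of a single type of message:
in terms of $\nu$:
$$\nu^{(t+1)}_{i \to j}(x_i) = \prod_{k \in \partial i \setminus \{j\}} \Big\{ \max_{x_k} \psi(x_k, x_i)\, \nu^{(t)}_{k \to i}(x_k) \Big\}$$
$$\delta^{(t+1)}_{i \to j}(x_i) = \arg\max_{x_{\partial i \setminus \{j\}}} \prod_{k \in \partial i \setminus \{j\}} \psi(x_k, x_i)\, \nu^{(t)}_{k \to i}(x_k)$$
in terms of $\tilde\nu$:
$$\tilde\nu^{(t+1)}_{i \to j}(x_j) = \max_{x_i} \psi(x_i, x_j) \prod_{k \in \partial i \setminus \{j\}} \tilde\nu^{(t)}_{k \to i}(x_i)$$
$$\delta^{(t+1)}_{i \to j}(x_j) = \arg\max_{x_i} \psi(x_i, x_j) \prod_{k \in \partial i \setminus \{j\}} \tilde\nu^{(t)}_{k \to i}(x_i)$$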
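The generic factor-graph updates can be sketched on a small tree-structured factor graph. Everything concrete here is an assumption for illustration: the three factors `a`, `b`, `c`, their values, and the choice of four iterations. The backpointers $\delta$ are omitted to keep the sketch short, and the messages are initialized to 1 rather than normalized.

```python
import itertools
import math

alphabet = (0, 1)
# Hypothetical factor graph: factor name -> (scope, function).
factors = {
    "a": ((0, 1), lambda x0, x1: 3.0 if x0 == x1 else 1.0),
    "b": ((1, 2), lambda x1, x2: 2.0 if x1 != x2 else 1.0),
    "c": ((0,), lambda x0: 2.0 if x0 == 1 else 1.0),
}
variables = (0, 1, 2)
nbrs = {i: [a for a, (scope, _) in factors.items() if i in scope]
        for i in variables}

# nu[(i, a)]: variable-to-factor message; nu_t[(a, i)]: factor-to-variable.
nu = {(i, a): {x: 1.0 for x in alphabet} for i in variables for a in nbrs[i]}
nu_t = {(a, i): {x: 1.0 for x in alphabet} for i in variables for a in nbrs[i]}

def factor_to_var(a, i, nu):
    # nu~_{a->i}(x_i): maximize psi_a times incoming messages over x_{da \ i}.
    scope, fn = factors[a]
    out = {}
    for xi in alphabet:
        best = 0.0
        for xs in itertools.product(alphabet, repeat=len(scope)):
            if xs[scope.index(i)] != xi:
                continue
            val = fn(*xs) * math.prod(
                nu[(j, a)][xj] for j, xj in zip(scope, xs) if j != i)
            best = max(best, val)
        out[xi] = best
    return out

for t in range(4):  # enough rounds for this small tree
    nu_t = {(a, i): factor_to_var(a, i, nu) for (a, i) in nu_t}
    nu = {(i, a): {x: math.prod(nu_t[(b, i)][x] for b in nbrs[i] if b != a)
                   for x in alphabet}
          for (i, a) in nu}

# Max-marginal estimate: product of all incoming factor-to-variable messages.
max_marginals = {i: {x: math.prod(nu_t[(a, i)][x] for a in nbrs[i])
                     for x in alphabet}
                 for i in variables}
```

On this tree the result agrees with the unnormalized max-marginals obtained by enumerating all eight configurations.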
(parallel) max-product algorithm for pairwise MRFs
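1. initialize $\nu^{(0)}_{i \to j}(x_i) = 1/|\mathcal{X}|$ for all $(i, j) \in D$
2. for $t \in \{0, 1, \ldots, t_{\max}\}$
     for all $(i, j) \in D$ compute
     $$\nu^{(t+1)}_{i \to j}(x_i) = \prod_{k \in \partial i \setminus j} \Big\{ \max_{x_k} \psi_{ik}(x_i, x_k)\, \nu^{(t)}_{k \to i}(x_k) \Big\}$$
     $$\delta^{(t+1)}_{i \to j}(x_i) = \arg\max_{x_{\partial i \setminus j}} \prod_{k \in \partial i \setminus j} \Big\{ \psi_{ik}(x_i, x_k)\, \nu^{(t)}_{k \to i}(x_k) \Big\}$$
3. for each $i \in V$ compute max-marginal
     $$\tilde\mu_i(x_i) = \prod_{k \in \partial i} \Big\{ \max_{x_k} \psi_{ik}(x_i, x_k)\, \nu^{(t_{\max}+1)}_{k \to i}(x_k) \Big\}$$
     $$\delta_i(x_i) = \arg\max_{x_{\partial i}} \prod_{k \in \partial i} \Big\{ \psi_{ik}(x_i, x_k)\, \nu^{(t_{\max}+1)}_{k \to i}(x_k) \Big\}$$
4. backtrack to find a MAP configuration
related to dynamic programming
exact when $t_{\max} \geq \mathrm{diam}(T)$
computes max-marginals in $O(n |\mathcal{X}|^2 \cdot \mathrm{diam}(T))$ operations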
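Steps 1 to 3 of the parallel algorithm can be sketched on a six-node tree. The edge list and the potential `psi` are assumptions for illustration, chosen so that the MAP configuration is unique and step 4 reduces to taking the argmax of each max-marginal; with ties, the backpointers $\delta$ would be needed instead.

```python
import math

alphabet = (0, 1)
edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6)]  # hypothetical tree

def psi(xi, xk):
    # Hypothetical symmetric potential, the same on every edge:
    # agreement is rewarded, and agreeing on 1 is rewarded most.
    if xi != xk:
        return 1.0
    return 3.0 if xi == 1 else 2.0

nbrs = {}
for i, j in edges:
    nbrs.setdefault(i, []).append(j)
    nbrs.setdefault(j, []).append(i)
directed = [(i, j) for i in nbrs for j in nbrs[i]]

# Step 1: uniform initialization of all directed messages.
nu = {(i, j): {x: 1.0 / len(alphabet) for x in alphabet} for (i, j) in directed}

# Step 2: parallel updates; every new message is built from the old ones.
t_max = 4  # the diameter of this tree
for t in range(t_max + 1):
    nu = {
        (i, j): {
            xi: math.prod(
                max(psi(xi, xk) * nu[(k, i)][xk] for xk in alphabet)
                for k in nbrs[i] if k != j)
            for xi in alphabet
        }
        for (i, j) in directed
    }

# Step 3: max-marginals from all incoming messages.
max_marginal = {
    i: {xi: math.prod(max(psi(xi, xk) * nu[(k, i)][xk] for xk in alphabet)
                      for k in nbrs[i])
        for xi in alphabet}
    for i in nbrs
}

# With a unique maximizer at every node, the per-node argmax is the MAP.
x_map = {i: max(alphabet, key=max_marginal[i].get) for i in nbrs}
```

Each round touches every directed edge once and costs $|\mathcal{X}|^2$ work per edge, in line with the operation count stated above.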
Numerical stability and min-sum algorithm
multiplying many compatibility functions can lead to numerical issues
(e.g. when the values are close to zero)
one remedy is to work in log domain
min-sum algorithm
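$$\nu_{i \to j}(x_i) = \sum_{k \in \partial i \setminus j} \min_{x_k} \Big\{ -\log \psi_{ki}(x_k, x_i) + \nu_{k \to i}(x_k) \Big\}$$
$$\tilde\mu_i(x_i) = \exp\Big\{ -\sum_{k \in \partial i} \min_{x_k} \Big\{ -\log \psi_{ki}(x_k, x_i) + \nu_{k \to i}(x_k) \Big\} \Big\}$$
where each min-sum message $\nu$ is the negative logarithm of the corresponding max-product message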
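The numerical issue and the log-domain remedy can be seen on a long chain, where each node has a single upstream neighbor so the product (or sum) over $\partial i \setminus j$ has one term. The chain length and the small-valued potential below are assumptions chosen to force underflow.

```python
import math

alphabet = (0, 1)
n = 500  # chain length (hypothetical)

def psi(a, b):
    # Hypothetical small-valued potential: neighbors prefer to agree.
    return 1e-3 if a == b else 1e-4

# Max-product forward pass: message into each node from its predecessor.
prod_msg = {x: 1.0 for x in alphabet}
for _ in range(n - 1):
    prod_msg = {xi: max(psi(xk, xi) * prod_msg[xk] for xk in alphabet)
                for xi in alphabet}

# Min-sum forward pass: the same recursion on negative logarithms.
sum_msg = {x: 0.0 for x in alphabet}
for _ in range(n - 1):
    sum_msg = {xi: min(-math.log(psi(xk, xi)) + sum_msg[xk] for xk in alphabet)
               for xi in alphabet}
```

The max-product message underflows to exactly zero in double precision, so all information about the maximizer is lost, while the min-sum message stays a moderate finite number.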
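[Figure: tree on nodes 1, ..., 6, with the message ν2→1 passed from node 2 to node 1]
define graphical model on sub-trees
$$T_{2 \to 1} = (V_{2 \to 1}, E_{2 \to 1}) \equiv \text{subtree rooted at 2 and excluding 1}$$
$$\mu_{2 \to 1}(x_{V_{2 \to 1}}) \equiv \frac{1}{Z(T_{2 \to 1})} \prod_{(u,v) \in E_{2 \to 1}} \psi(x_u, x_v)$$
$$\nu_{2 \to 1}(x_2) \equiv \max_{x_{V_{2 \to 1} \setminus \{2\}}} \mu_{2 \to 1}(x_{V_{2 \to 1}})$$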
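the messages from neighbors are sufficient to compute the max-marginal $\tilde\mu(x_1)$
$$\begin{aligned}
\tilde\mu(x_1) &\propto \max_{x_{V \setminus \{1\}}} \prod_{(u,v) \in E} \psi(x_u, x_v) \\
&= \max_{x_{V_{2 \to 1}},\, x_{V_{3 \to 1}}} \psi(x_1, x_2)\, \psi(x_1, x_3) \Big\{ \prod_{(u,v) \in E_{2 \to 1}} \psi(x_u, x_v) \Big\} \Big\{ \prod_{(u,v) \in E_{3 \to 1}} \psi(x_u, x_v) \Big\} \\
&= \max_{x_{V_{2 \to 1}},\, x_{V_{3 \to 1}}} \prod_{k \in \partial 1} \Big\{ \psi(x_1, x_k) \prod_{(u,v) \in E_{k \to 1}} \psi(x_u, x_v) \Big\} \\
&= \prod_{k \in \partial 1} \Big\{ \max_{x_{V_{k \to 1}}} \Big\{ \psi(x_1, x_k) \prod_{(u,v) \in E_{k \to 1}} \psi(x_u, x_v) \Big\} \Big\} \\
&= \prod_{k \in \partial 1} \Big\{ \max_{x_k} \psi(x_1, x_k) \underbrace{\Big\{ \max_{x_{V_{k \to 1} \setminus \{k\}}} \prod_{(u,v) \in E_{k \to 1}} \psi(x_u, x_v) \Big\}}_{\nu_{k \to 1}(x_k)} \Big\}
\end{aligned}$$
(the identification with $\nu_{k \to 1}(x_k)$ holds up to the constant $1/Z(T_{k \to 1})$, which the leading $\propto$ absorbs)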
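The factorization of the max-marginal over the neighbors of node 1 can be checked numerically on a small tree. The edge list below is an assumption (in particular, node 6 is taken to hang off node 3), as are the potential `psi` and the helper names; the messages are computed by enumeration over each subtree and left unnormalized.

```python
import itertools
import math

alphabet = (0, 1)
edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6)]  # assumed tree structure
subtree_nodes = {2: (2, 4, 5), 3: (3, 6)}         # V_{2->1}, V_{3->1}
subtree_edges = {2: [(2, 4), (2, 5)], 3: [(3, 6)]}

def psi(a, b):
    # Hypothetical symmetric pairwise potential.
    if a != b:
        return 1.0
    return 3.0 if a == 1 else 2.0

def nu(k, xk):
    # Unnormalized nu_{k->1}(x_k): maximize the subtree's edge product
    # over all subtree variables except x_k.
    nodes = subtree_nodes[k]
    best = 0.0
    for xs in itertools.product(alphabet, repeat=len(nodes)):
        x = dict(zip(nodes, xs))
        if x[k] != xk:
            continue
        best = max(best, math.prod(psi(x[u], x[v]) for u, v in subtree_edges[k]))
    return best

def max_marginal_from_messages(x1):
    # Product over neighbors k of max_{x_k} psi(x1, x_k) * nu_{k->1}(x_k).
    return math.prod(max(psi(x1, xk) * nu(k, xk) for xk in alphabet)
                     for k in (2, 3))

def max_marginal_brute(x1):
    # Direct maximization of the full edge product over x_2, ..., x_6.
    best = 0.0
    for xs in itertools.product(alphabet, repeat=5):
        x = dict(zip((2, 3, 4, 5, 6), xs))
        x[1] = x1
        best = max(best, math.prod(psi(x[u], x[v]) for u, v in edges))
    return best
```

The two functions agree for both values of $x_1$, although the message-based version only ever enumerates one subtree at a time.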
recursion on sub-trees to compute the messages ν
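$$\begin{aligned}
\mu_{2 \to 1}(x_{V_{2 \to 1}}) &= \frac{1}{Z(T_{2 \to 1})} \prod_{(u,v) \in E_{2 \to 1}} \psi(x_u, x_v) \\
&\propto \psi(x_2, x_4)\, \psi(x_2, x_5) \Big\{ \prod_{(u,v) \in E_{4 \to 2}} \psi(x_u, x_v) \Big\} \Big\{ \prod_{(u,v) \in E_{5 \to 2}} \psi(x_u, x_v) \Big\} \\
&\propto \psi(x_2, x_4)\, \psi(x_2, x_5)\, \mu_{4 \to 2}(x_{V_{4 \to 2}})\, \mu_{5 \to 2}(x_{V_{5 \to 2}})
\end{aligned}$$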
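$$\begin{aligned}
\nu_{2 \to 1}(x_2) &= \max_{x_{V_{2 \to 1} \setminus \{2\}}} \mu_{2 \to 1}(x_{V_{2 \to 1}}) \\
&\propto \max_{x_{V_{2 \to 1} \setminus \{2\}}} \psi(x_2, x_4)\, \psi(x_2, x_5)\, \mu_{4 \to 2}(x_{V_{4 \to 2}})\, \mu_{5 \to 2}(x_{V_{5 \to 2}}) \\
&\propto \Big\{ \max_{x_{V_{4 \to 2}}} \psi(x_2, x_4)\, \mu_{4 \to 2}(x_{V_{4 \to 2}}) \Big\} \Big\{ \max_{x_{V_{5 \to 2}}} \psi(x_2, x_5)\, \mu_{5 \to 2}(x_{V_{5 \to 2}}) \Big\} \\
&= \Big\{ \max_{x_4} \psi(x_2, x_4) \max_{x_{V_{4 \to 2} \setminus \{4\}}} \mu_{4 \to 2}(x_{V_{4 \to 2}}) \Big\} \Big\{ \max_{x_5} \psi(x_2, x_5) \max_{x_{V_{5 \to 2} \setminus \{5\}}} \mu_{5 \to 2}(x_{V_{5 \to 2}}) \Big\} \\
&\propto \Big\{ \max_{x_4} \psi(x_2, x_4)\, \nu_{4 \to 2}(x_4) \Big\} \Big\{ \max_{x_5} \psi(x_2, x_5)\, \nu_{5 \to 2}(x_5) \Big\}
\end{aligned}$$
update messages via recursion with uniform initialization $\nu_{i \to j}(x_i) = \frac{1}{|\mathcal{X}|}$ for all leaves $i$
$$\nu_{2 \to 1}(x_2) = \prod_{k \in \partial 2 \setminus \{1\}} \Big\{ \max_{x_k} \psi_{2k}(x_2, x_k)\, \nu_{k \to 2}(x_k) \Big\}$$
to obtain a global configuration, we need to store the arg max for each message for backtracking
$$\delta_{2 \to 1}(x_2) = \arg\max_{x_{\partial 2 \setminus \{1\}}} \prod_{k \in \partial 2 \setminus \{1\}} \Big\{ \psi_{2k}(x_2, x_k)\, \nu_{k \to 2}(x_k) \Big\}$$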
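The recursion and the stored arg max can be combined into a short upward pass followed by backtracking from the root. The sketch below assumes a hypothetical rooted version of the six-node tree and a potential with a deliberate tie (all-0 and all-1 are both MAP), to show that the backpointers still return a single consistent configuration.

```python
alphabet = (0, 1)
# Hypothetical tree rooted at node 1, given as node -> list of children.
children = {1: [2, 3], 2: [4, 5], 3: [6], 4: [], 5: [], 6: []}

def psi(xi, xk):
    # Hypothetical potential with a tie: neighbors simply prefer to agree.
    return 2.0 if xi == xk else 1.0

nu, delta = {}, {}

def compute(i):
    # Post-order pass: nu[i][xi] is the message from i toward its parent,
    # delta[i][xi] stores the maximizing value of each child of i.
    for k in children[i]:
        compute(k)
    nu[i], delta[i] = {}, {}
    for xi in alphabet:
        # Leaves start from the uniform message 1/|X|.
        value = 1.0 if children[i] else 1.0 / len(alphabet)
        choice = {}
        for k in children[i]:
            xk = max(alphabet, key=lambda v: psi(xi, v) * nu[k][v])
            choice[k] = xk
            value *= psi(xi, xk) * nu[k][xk]
        nu[i][xi], delta[i][xi] = value, choice

def backtrack(root):
    compute(root)
    # Pick any maximizer at the root, then follow the backpointers down.
    x = {root: max(alphabet, key=lambda v: nu[root][v])}
    stack = [root]
    while stack:
        i = stack.pop()
        for k, xk in delta[i][x[i]].items():
            x[k] = xk
            stack.append(k)
    return x

x_map = backtrack(1)
```

Because the children's values are read from the backpointers rather than chosen independently, the tie at the root is resolved once and propagated consistently through the tree.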