Belief Propagation for Min-cost Network Flow: Convergence & Correctness
David Gamarnik∗        Devavrat Shah†        Yehua Wei‡

∗ Operations Research Center and Sloan School of Management, MIT, Cambridge, MA 02139. E-mail: gamarnik@mit.edu
† EECS, MIT, Cambridge, MA 02139. E-mail: devavrat@mit.edu
‡ Operations Research Center, MIT, Cambridge, MA 02139. E-mail: y4wei@MIT.EDU
Abstract
We formulate a Belief Propagation (BP) algorithm in the
context of the capacitated minimum-cost network flow problem (MCF). Unlike most of the instances of BP studied in
the past, the messages of BP in the context of this problem
are piecewise-linear functions. We prove that BP converges
to the optimal solution in pseudo-polynomial time, provided
that the optimal solution is unique and the problem input is
integral. Moreover, we present a simple modification of the
BP algorithm which gives a fully polynomial-time randomized approximation scheme (FPRAS) for MCF. This is the
first instance where BP is proved to have fully-polynomial
running time.
1 Introduction
The Markov random field or graphical model provides
a succinct representation for capturing the dependency
structure between a collection of random variables. In
recent years, the need for large-scale statistical inference has made graphical models the representation of choice in many statistical applications. There are two key inference problems of interest for a graphical model.
The first problem is the computation of marginal distribution of a random variable. This problem is (computationally) equivalent to the computation of partition
function in statistical physics or counting the number
of combinatorial objects (e.g., independent sets) in computer science. The second problem is that of finding the
mode of the distribution, i.e., the assignment with maximum likelihood (ML). For a constrained optimization
problem, when the constraints are modeled through a
graphical model and probability is proportional to the
cost of the assignment, an ML assignment is the same as
an optimal solution. Both of these questions, in general,
are computationally hard.
Belief Propagation (BP) is an “umbrella” heuristic
designed for these two problems. Its version for the
first problem is known as the “sum-product algorithm”
and for the second problem as the “max-product algorithm”. Both versions of the BP algorithm
are iterative, simple and message-passing in nature.
In this paper, our interest lies in the correctness and
convergence properties of the max-product version of
BP when applied to the minimum-cost network flow
problems (MCF), an important class of linear (or more
generally convex) optimization problems. In the rest
of the paper, we will use the term BP to refer to the
max-product version unless specified otherwise.
The BP algorithm is essentially an approximation of dynamic programming that is exact when the underlying graphical model is a tree [8], [22], [16]. Specifically, assuming that the graphical model is a tree, one can obtain a natural parallel iterative version of dynamic programming in which variable nodes pass messages between each other along edges of the graphical
model. Somewhat surprisingly, this seemingly naive BP
heuristic has become quite popular in practice [2], [9],
[11], [17]. In our opinion, there are two primary reasons
for the popularity of BP. First, like dynamic programming, it is generically applicable, easy to understand
and implementation-friendly due to its iterative, simple
and message-passing nature. Second, in many practical
scenarios, the performance of BP is surprisingly good
[21], [22]. On one hand, for an optimist, this unexpected
success of BP provides hope that it is a genuinely much more powerful algorithm than what we know thus far (e.g., better than primal-dual methods). On the other hand, a skeptic would demand a systematic understanding of the limitations (and strengths) of BP, in order to caution a practitioner in using it. Thus, irrespective of one's perspective as an algorithmic theorist, a rigorous understanding of BP is very important.
Prior work. Despite these compelling reasons, only
recently have we witnessed an explosion of research
for theoretical understanding of the performance of the
BP algorithm in the context of various combinatorial
optimization problems, both tractable and intractable
(NP-hard) versions. To begin with, Bayati, Shah and
Sharma [6] considered the performance of BP for the
problem of finding a maximum weight matching in a bipartite graph. They established that BP converges in pseudo-polynomial time to the optimal solution when the optimal solution is unique [6]. Bayati et al. [4] as well as Sanghavi et al. [18] generalized this result
by establishing correctness and convergence of the BP algorithm for the b-matching problem when the linear programming (LP) relaxation corresponding to the node constraints has a unique integral optimal solution. Note that the LP relaxation corresponding to the node constraints is not tight in general, as inclusion of the odd-cycle elimination constraints [20] is essential. Furthermore, [4] and [18] established that BP does not converge if this LP relaxation does have a non-integral solution. Thus, for a b-matching problem BP finds an optimal answer when the LP relaxation can find an optimal solution. In the context of the maximum weight independent set problem, a one-sided relation between the LP relaxation and BP is established in [19]: if BP converges, then it is correct and the LP relaxation is tight. We also take note of the work on BP for the Steiner tree problem [5], special instances of quadratic/convex optimization problems [13], [12], Gaussian graphical models [10], and the relation between primal message-passing and LP decoding [3].

2 BP for Linear Programming Problem

Suppose we are given an LP problem in the standard form:

    min cᵀx    (LP)
    s.t. Ax = b,
         x ≥ 0, x ∈ Rⁿ

where A is a real m × n matrix, b is a real vector of dimension m, and c is a real vector of dimension n. We define its factor graph, F_LP, to be a bipartite graph with factor nodes v1, v2, ..., vm and variable nodes e1, e2, ..., en, such that ej and vi are adjacent if and only if aij ≠ 0. For example, the graph shown in Figure 1 is the factor graph of the LP problem:

    min 4x1 + x2 − 3x4 + x5 + 2x6 − x7
    s.t. x1 + 9x3 − x4 + 5x7 = 10
         −2x1 + x2 + x3 − 3x4 + x5 − x6 + x7 = 3
         x1 + x6 = 2
         x ≥ 0, x ∈ R⁷.

Figure 1: An example of a factor graph, with factor nodes v1, v2, v3 (one per equality constraint) and variable nodes e1, ..., e7 (one per variable). [figure omitted]

In F_LP, for every 1 ≤ i ≤ m, let Ei = {ej | aij ≠ 0}, and let Vj = {vi | aij ≠ 0}. For each factor node vi, define the factor function ψi : R^|Ei| → R̄ (R̄ = R ∪ {∞}) as:

    ψi(z) = 0 if Σ_{j∈Ei} aij zj = bi, and ψi(z) = ∞ otherwise.

Essentially, ψi represents the i-th equality constraint in LP. If we let x_{Ei} be the subvector of x with indices in Ei, then ψi(x_{Ei}) < ∞ if and only if x does not violate the i-th equality constraint in LP. Also, for any variable node ej, define the variable function φj : R → R̄ as:

    φj(z) = cj z if z ≥ 0, and φj(z) = ∞ otherwise.

Notice that problem LP is equivalent to min_{x∈Rⁿ} Σ_{i=1}^m ψi(x_{Ei}) + Σ_{j=1}^n φj(xj), which is defined on F_LP. As BP is defined on graphical models, we can now formulate the BP algorithm as an iterative message-passing algorithm on F_LP. It is formally presented below as Algorithm 1. The idea of the algorithm is that at the t-th iteration, every factor vertex vi sends a message function m^t_{vi→ej}(z) to each of its neighbors ej, where m^t_{vi→ej}(z) is vi's estimate of the cost if xj takes value z, based on the messages vi received at time t from its neighbors. Additionally, every variable vertex ej sends a message function m^t_{ej→vi}(z) to each of its neighbors vi, where m^t_{ej→vi}(z) is ej's estimate of the cost if xj takes value z, based on the messages ej received at time t − 1 from all of its neighbors except vi. After N iterations, the algorithm calculates the "belief" at variable node ej using the messages sent to vertex ej at time N, for each j.

In Algorithm 1, one basic question is the computation procedure for message functions of the form m^t_{ej→vi} and m^t_{vi→ej}. We claim that every message function is a convex piecewise-linear function, which allows us to encode it in terms of a finite vector describing the break points and slopes of its linear pieces. In subsection 3.1, we describe in detail how m^t_{vi→ej}(z) can be computed.
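For illustration, this factor-graph encoding is easy to tabulate; the following Python sketch (ours, not part of the paper's presentation; the names E, V, psi, phi are our own) builds the neighborhoods and factor/variable functions for the Figure 1 example:

import math

# Constraint matrix A (3 x 7), right-hand side b, and cost c of the Figure 1 LP.
A = [[1, 0, 9, -1, 0, 0, 5],
     [-2, 1, 1, -3, 1, -1, 1],
     [1, 0, 0, 0, 0, 1, 0]]
b = [10, 3, 2]
c = [4, 1, 0, -3, 1, 2, -1]

# E[i] = variables touching constraint i; V[j] = constraints touching variable j.
E = [{j for j in range(7) if A[i][j] != 0} for i in range(3)]
V = [{i for i in range(3) if A[i][j] != 0} for j in range(7)]

def psi(i, z):
    """Factor function: 0 if assignment z (a dict j -> value) meets constraint i, else +inf."""
    return 0.0 if math.isclose(sum(A[i][j] * z[j] for j in E[i]), b[i]) else math.inf

def phi(j, zj):
    """Variable function: cost c_j * z_j for z_j >= 0, else +inf."""
    return c[j] * zj if zj >= 0 else math.inf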
Algorithm 1 BP for LP
1: Initialize t = 0, and for each factor node vi, create factor-to-variable messages m⁰_{vi→ej}(z) = 0, ∀ej ∈ Ei.
2: for t = 1, 2, ..., N do
3:   ∀ej, vi with vi ∈ Vj, set m^t_{ej→vi}(z) to

         φj(z) + Σ_{vi′∈Vj\vi} m^{t−1}_{vi′→ej}(z),    ∀z ∈ R.

4:   ∀ej, vi with ej ∈ Ei, set m^t_{vi→ej}(z) to

         min_{z̄∈R^|Ei|, z̄j=z} { ψi(z̄) + Σ_{ej′∈Ei\ej} m^t_{ej′→vi}(z̄j′) },    ∀z ∈ R.

5:   t := t + 1
6: end for
7: Set the "belief" function as b^N_j(z) = φj(z) + Σ_{vi∈Vj} m^N_{vi→ej}(z), ∀1 ≤ j ≤ n. Calculate the "belief" by finding x̂^N_j = arg min_z b^N_j(z) for each j.
8: Return x̂^N.

In general, the max-product algorithm behaves badly when solving an LP instance without a unique optimal solution [6], [4]. Yet, even with the assumption that the LP problem has a unique optimal solution, in general x̂^N in Algorithm 1 does not converge to the unique optimal solution as N grows large. One such instance (the LP-relaxation of an instance of the maximum-weight independent set problem), which was earlier described in [19]:

    min −Σ_{i=1}^3 2xi − Σ_{j=1}^3 3yj
    s.t. xi + yj + zij = 1,  ∀1 ≤ i ≤ 3, 1 ≤ j ≤ 3
         x, y, z ≥ 0

3 BP Algorithm for Min-Cost Network Flow Problem

We start off this section by defining the capacitated min-cost network flow problem (MCF). Given a directed graph G = (V, E), let V, E denote the set of vertices and arcs respectively, and let |V| = n, |E| = m. For any vertex v ∈ V, let Ev be the set of arcs incident to v, and for any e ∈ Ev, let Δ(v, e) = 1 if e is an out-arc of v (i.e., arc e = (v, w) for some w ∈ V), and Δ(v, e) = −1 if e is an in-arc of v (i.e., arc e = (w, v) for some w ∈ V). MCF on G is formulated as follows [7], [1]:

    min Σ_{e∈E} ce xe    (MCF)
    s.t. Σ_{e∈Ev} Δ(v, e) xe = bv,  ∀v ∈ V    (demand constraints)
         0 ≤ xe ≤ ue,  ∀e ∈ E    (flow constraints)

where ce ≥ 0, ue ≥ 0, ce ∈ R, ue ∈ R̄ for each e ∈ E, and bv ∈ R for each v ∈ V. As MCF is a special class of LP, we can view each arc e ∈ E as a variable vertex and each v ∈ V as a factor vertex, and define functions ψv, φe, ∀v ∈ V, ∀e ∈ E:

    ψv(z) = 0 if Σ_{e∈Ev} Δ(v, e) ze = bv, and ψv(z) = ∞ otherwise

    φe(z) = ce z if 0 ≤ z ≤ ue, and φe(z) = ∞ otherwise

Algorithm 2 BP for Network Flow
1: Initialize t = 0. For each e ∈ E with e = (v, w), initialize messages m⁰_{e→v}(z) = 0 and m⁰_{e→w}(z) = 0, ∀z ∈ R.
2: for t = 1, 2, 3, ..., N do
3:   ∀e ∈ E with e = (v, w), update messages:

         m^t_{e→v}(z) = φe(z) + min_{z̄∈R^|Ew|, z̄e=z} { ψw(z̄) + Σ_{ẽ∈Ew\e} m^{t−1}_{ẽ→w}(z̄ẽ) },    ∀z ∈ R

         m^t_{e→w}(z) = φe(z) + min_{z̄∈R^|Ev|, z̄e=z} { ψv(z̄) + Σ_{ẽ∈Ev\e} m^{t−1}_{ẽ→v}(z̄ẽ) },    ∀z ∈ R.

4:   t := t + 1
5: end for
6: ∀e ∈ E with e = (v, w), set the "belief" function:

         n^N_e(z) = m^N_{e→v}(z) + m^N_{e→w}(z) − φe(z)

7: Calculate the "belief" by finding x̂^N_e = arg min_z n^N_e(z) for each e ∈ E.
8: Return x̂^N, which is the guess of the optimal solution of MCF.
Clearly, solving MCF is equivalent to solving min_{x∈R^|E|} { Σ_{v∈V} ψv(x_{Ev}) + Σ_{e∈E} φe(xe) }. We can then apply the BP algorithm for LP (Algorithm 1) to MCF. Because of the special structure of MCF, each variable node is adjacent to exactly two factor nodes. This allows us to skip the step of updating the messages m^t_{v→e}, and to present a simplified version of the BP algorithm for MCF, which we refer to as Algorithm 2.
To understand Algorithm 2 at a high level, each arc can be thought of as an agent which is trying to figure out its own flow while meeting the conservation constraints at its endpoints. Each arc maintains an estimate of its "local cost" as a function of its flow (thus this estimate is a function, not a single number). At each time step an arc updates its function as follows: the cost of assigning x units of flow to arc e is the cost of pushing x units of flow through e plus the minimum-cost way of assigning flow to the neighboring arcs (with respect to the functions they computed in the previous time step) so as to restore flow conservation at the endpoints of e.
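To make this update concrete, here is a minimal brute-force sketch (ours; the data layout and the name update_message are hypothetical). It evaluates the Algorithm 2 update on an integer grid, assuming small integral capacities so the minimization over neighboring flows can be enumerated directly:

import itertools, math

# Tiny MCF instance: arcs as (tail, head), integral capacities u, costs c, demands b.
arcs = [(0, 1), (1, 2), (0, 2)]
u = {0: 2, 1: 2, 2: 2}
c = {0: 1.0, 1: 1.0, 2: 3.0}
b = {0: 2, 1: 0, 2: -2}          # node 0 supplies 2 units, node 2 demands 2 units

def delta(v, e):
    """+1 if arc e leaves v, -1 if it enters v."""
    tail, head = arcs[e]
    return 1 if v == tail else -1

def incident(v):
    return [e for e, (t, h) in enumerate(arcs) if v in (t, h)]

def phi(e, z):
    return c[e] * z if 0 <= z <= u[e] else math.inf

def update_message(prev, e, to_v):
    """m^t_{e->to_v}(z): minimize over integral flows on the arcs at the *other*
    endpoint w of e, subject to flow conservation at w (the factor psi_w)."""
    tail, head = arcs[e]
    w = head if to_v == tail else tail
    others = [f for f in incident(w) if f != e]
    msg = {}
    for z in range(u[e] + 1):
        best = math.inf
        for vals in itertools.product(*(range(u[f] + 1) for f in others)):
            flow = {e: z, **dict(zip(others, vals))}
            if sum(delta(w, f) * flow[f] for f in incident(w)) != b[w]:
                continue  # psi_w = infinity: conservation at w violated
            best = min(best, sum(prev[(f, w)][flow[f]] for f in others))
        msg[z] = phi(e, z) + best
    return msg

# Messages are dicts {integer flow value: cost}; m^0 = 0 everywhere.
msgs = {(e, v): {z: 0.0 for z in range(u[e] + 1)}
        for e, (t, h) in enumerate(arcs) for v in (t, h)}
for _ in range(10):  # N iterations
    msgs = {(e, v): update_message(msgs, e, v) for (e, v) in msgs}

# Belief n^N_e = m^N_{e->tail} + m^N_{e->head} - phi_e; its minimizer is BP's guess.
for e, (t, h) in enumerate(arcs):
    belief = {z: msgs[(e, t)][z] + msgs[(e, h)][z] - phi(e, z) for z in range(u[e] + 1)}
    print(e, min(belief, key=belief.get))   # expected optimal flow: 2, 2, 0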
Before formally stating the theorem on convergence of BP for MCF, we first give the definition of a residual network [1]. Define G(x), the residual network of G with respect to a flow x, as follows: G(x) has the same vertex set as G, and for every e = (v, w) ∈ E, if xe < ue then e is an arc in G(x) with cost c^x_e = ce; also, if xe > 0 then there is an arc e′ = (w, v) in G(x) with cost c^x_{e′} = −ce. Now, let

    δ(x) = min_{C∈C} c^x(C),  where c^x(C) = Σ_{e∈C} c^x_e

and C is the set of directed cycles in G(x). Note that if x∗ is the unique optimal solution of MCF on the directed graph G, then δ(x∗) > 0 in G(x∗).
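For small instances, G(x) and δ(x) can be computed directly; the following brute-force sketch (our illustration; residual and delta_min_cycle are hypothetical names) enumerates simple directed cycles, which is only viable on tiny graphs:

import math
from itertools import permutations

def residual(arcs, u, c, x):
    """Arcs of G(x) as (tail, head, cost): forward copies where capacity remains,
    reverse copies for arcs carrying positive flow."""
    res = []
    for e, (v, w) in enumerate(arcs):
        if x[e] < u[e]:
            res.append((v, w, c[e]))
        if x[e] > 0:
            res.append((w, v, -c[e]))
    return res

def delta_min_cycle(nodes, res_arcs):
    """delta(x): minimum total cost over simple directed cycles of G(x), by enumeration."""
    cost = {}
    for v, w, cc in res_arcs:                 # keep the cheapest of parallel arcs
        cost[(v, w)] = min(cost.get((v, w), math.inf), cc)
    best = math.inf
    for k in range(2, len(nodes) + 1):
        for cyc in permutations(nodes, k):
            pairs = list(zip(cyc, cyc[1:] + cyc[:1]))
            if all(p in cost for p in pairs):
                best = min(best, sum(cost[p] for p in pairs))
    return best

# Optimal flow of the 3-arc example above is x = (2, 2, 0); delta(x*) = 3 - 1 - 1 = 1 > 0.
arcs = [(0, 1), (1, 2), (0, 2)]
print(delta_min_cycle([0, 1, 2],
                      residual(arcs, {0: 2, 1: 2, 2: 2}, {0: 1, 1: 1, 2: 3}, {0: 2, 1: 2, 2: 0})))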
Theorem 3.1. Suppose MCF has a unique optimal solution x∗. Define L to be the maximum cost of a simple directed path in G(x∗). Then for any N ≥ (L/(2δ(x∗)) + 1)n, we have x̂^N = x∗.

Thus, by Theorem 3.1, the BP algorithm finds the unique optimal solution of MCF in at most (L/(2δ(x∗)) + 1)n iterations.

3.1 Computing/Encoding Message Functions First, we formally define a convex piecewise-linear function:

Definition 3.1. A function f is a convex piecewise-linear function if for some finite set of reals a0 < a1 < ... < an (allowing a0 = −∞ and an = ∞), we have:

    f(z) = c1(z − a1) + f(a1)        if z ∈ [a0, a1]
    f(z) = c_{i+1}(z − ai) + f(ai)    if z ∈ (ai, ai+1], 1 ≤ i ≤ n − 1
    f(z) = ∞                          otherwise

where c1 < c2 < ... < cn and f(a1) ∈ R. We define a0, a1, ..., an as the vertices of f, p(f) = n (p(f) denotes the number of linear pieces of f), and ci(z − a_{i−1}) + f(a_{i−1}) for z ∈ [a_{i−1}, ai] the i-th linear piece of f. Clearly, if f is a convex piecewise-linear function, then we can store all the "information" about f in a finite vector of size O(p(f)).

Property 3.1. Let f1(x), f2(x) be two convex piecewise-linear functions. Then f1(ax + b) and cf1(x) + df2(x) are also convex piecewise-linear functions, for any real numbers a, b, c and d with c ≥ 0, d ≥ 0.

Definition 3.2. Let S = {f1, f2, ..., fk} be a set of convex piecewise-linear functions, and let Ψt : R^k → R̄ be:

    Ψt(x) = 0 if Σ_{i=1}^k xi = t, and Ψt(x) = ∞ otherwise.

We say I_S(t) = min_{x∈R^k} { Ψt(x) + Σ_{i=1}^k fi(xi) } is an interpolation of S.

Figure 2: Functions f1 and f2. [figure omitted]

Figure 3: Interpolation of f1 and f2. [figure omitted]

Lemma 3.1. Let f1, f2 be two convex piecewise-linear functions, and suppose z1∗, z2∗ are vertices of f1, f2 with z1∗ = arg min f1(z), z2∗ = arg min f2(z). Let S = {f1, f2}; then I_S(t) (the interpolation of S) is a convex piecewise-linear function and it can be computed in O(p(f1) + p(f2)) operations.

Proof. We prove this lemma by describing a procedure to construct I_S(t). The idea behind the construction of I_S(t) is essentially to "stitch" together the linear pieces
of f1 and f2. Specifically, let g(t) be the function which is defined only at z1∗ + z2∗, with g(z1∗ + z2∗) = f1(z1∗) + f2(z2∗), and let L1 = z1∗, L2 = z2∗, U1 = z1∗, U2 = z2∗. We will construct g iteratively and eventually have g(t) = I_S(t). The construction is described as follows: at each iteration, let X1 (resp. X2) be the linear piece of f1 (resp. f2) on the left side of L1 (resp. L2). Choose the linear piece with the larger slope from {X1, X2}, and "stitch" this piece onto the left side of the left endpoint of g. If Xi is chosen, update Li to the vertex at the left end of Xi. As an example, consider f1 and f2 in Figure 2; z1∗ = 1 and z2∗ = 0 are vertices of f1 and f2 such that z1∗ = arg min f1(z), z2∗ = arg min f2(z). The linear piece X1 in the procedure is labeled as P1 on the graph, while X2 does not exist (since there is no linear piece of f2 on the left side of z2∗). Hence, we "stitch" P1 to the left side of g, and update L1 to 0.

Similarly, let Y1 (resp. Y2) be the linear piece of f1 (resp. f2) on the right side of U1 (resp. U2); choose the linear piece with the smaller slope and "stitch" this piece onto the right side of the right endpoint of g. If Yi is chosen, update Ui to the vertex at the right end of Yi. Again, we use f1 and f2 in Figure 2 as an illustration. The linear piece Y1 in the procedure is labeled as P2, while Y2 is labeled as P3. As P2 has a smaller slope than P3, we "stitch" P2 to the right side of g and update U1 to 2.

Repeat this procedure until L1 (resp. L2) and U1 (resp. U2) are the leftmost (resp. rightmost) endpoints of f1 (resp. f2), or both endpoints of g are infinite. See Figure 2 and Figure 3 for an illustration of the interpolation of two functions.

Note that the total number of iterations is bounded by O(p(f1) + p(f2)), and each iteration takes at most a constant number of operations. By construction, it is clear that g is a convex piecewise-linear function. Also, g(z1∗ + z2∗) = f1(z1∗) + f2(z2∗), and by the way we constructed g, we must have g(t) ≤ Ψt(x) + f1(x1) + f2(x2) for any t ∈ R and any x = (x1, x2). Therefore, we have g = I_S, and we are done.
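In code, the stitching amounts to merging the two sorted slope sequences around the minimizers. The sketch below (our illustration; stitch is a hypothetical name) represents a convex piecewise-linear function on an integer grid as a dict z → f(z), matching the integral setting used later in the paper:

from heapq import merge

def stitch(f1, f2):
    """Interpolation I_S(t) = min_{x1+x2=t} f1(x1) + f2(x2) for convex
    piecewise-linear f1, f2 given as dicts over consecutive integers."""
    z1 = min(f1, key=f1.get)  # arg min of f1
    z2 = min(f2, key=f2.get)  # arg min of f2
    g = {z1 + z2: f1[z1] + f2[z2]}
    # Unit-piece slopes to the right of each minimizer, sorted by convexity.
    r1 = [f1[z + 1] - f1[z] for z in range(z1, max(f1))]
    r2 = [f2[z + 1] - f2[z] for z in range(z2, max(f2))]
    t = z1 + z2
    for s in merge(r1, r2):       # stitch the piece with the smaller slope first
        g[t + 1] = g[t] + s
        t += 1
    # Increments to the left of each minimizer, from the minimizer outward.
    l1 = [f1[z - 1] - f1[z] for z in range(z1, min(f1), -1)]
    l2 = [f2[z - 1] - f2[z] for z in range(z2, min(f2), -1)]
    t = z1 + z2
    for s in merge(l1, l2):       # increments grow as we move left, by convexity
        g[t - 1] = g[t] + s
        t -= 1
    return g

# Sanity check against brute force on a small example:
f1 = {z: (z - 1) ** 2 for z in range(-2, 4)}   # convex samples, minimum at z = 1
f2 = {z: 2 * abs(z) for z in range(0, 3)}      # minimum at z = 0
g = stitch(f1, f2)
brute = {t: min(f1[a] + f2[b] for a in f1 for b in f2 if a + b == t) for t in g}
assert g == brute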
Remark. In the case where either f1 or f2 does not have a global minimum, the interpolation can still be obtained in O(p(f1) + p(f2)) operations, but it involves a tedious analysis of different cases. As the reader will notice, the implementation of Algorithm 2 does not require this case, and hence we do not discuss it here.

Theorem 3.2. Given a set S consisting of convex piecewise-linear functions, I_S(t) is a convex piecewise-linear function. Moreover, suppose |S| = k and P = Σ_{f∈S} p(f). Then I_S(t) can be computed in O(P · log k) operations.

Proof. For the sake of simplicity, assume k is divisible by 2. Let S1 = {f1, f2}, S2 = {f3, f4}, ..., S_{k/2} = {f_{k−1}, f_k}, and S′ = {I_{S1}, I_{S2}, ..., I_{S_{k/2}}}. Then one can observe that I_{S′} = I_S, by the definition of I_S. By Lemma 3.1, each function in S′ is convex piecewise-linear, and S′ can be computed in O(P) operations. Consider changing S to S′ as a procedure for decreasing the number of convex piecewise-linear functions: this procedure reduces the number by a factor of 2 each time, and consumes O(P) operations. Hence, it takes O(log k) such procedures to reduce the set S to a single convex piecewise-linear function, and computing I_S(t) takes O(P · log k) operations.

Definition 3.3. Let S = {f1, f2, ..., fk} be a set of convex piecewise-linear functions, a ∈ R^k, and let Ψt : R^k → R̄ be:

    Ψt(x) = 0 if Σ_{i=1}^k ai xi = t, and Ψt(x) = ∞ otherwise.

We say I_S^a(t) = min_{x∈R^k} { Ψt(x) + Σ_{i=1}^k fi(xi) } is a scaled interpolation of S.

Corollary 3.1. Given any set of convex piecewise-linear functions S, I_S^a(t) is a convex piecewise-linear function. Suppose |S| = k and P = Σ_{f∈S} p(f); then the corners and slopes of I_S^a(t) can be found in O(P · log k) operations.

Proof. Let S = {f1(x), f2(x), ..., fk(x)}, and let S′ = {f1′(x) = f1(a1 x), f2′(x) = f2(a2 x), ..., fk′(x) = fk(ak x)}. Then we have I_S^a(t) = I_{S′}(t), and the result follows immediately from Theorem 3.2.

Corollary 3.2. For any nonnegative integer t and e ∈ E with e = (v, w) (or e = (w, v)), the message function m^t_{e→v} at the t-th iteration of Algorithm 2 is a convex piecewise-linear function.

Proof. We show this by induction on t. For t = 0, m^t_{e→v} is a convex piecewise-linear function by definition. For t > 0, let us remind ourselves that

    m^t_{e→v}(z) = φe(z) + min_{z̄∈R^|Ew|, z̄e=z} { ψw(z̄) + Σ_{ẽ∈Ew\e} m^{t−1}_{ẽ→w}(z̄ẽ) },  for z ∈ R.

By the induction hypothesis, every m^{t−1}_{ẽ→w} is a convex piecewise-linear function. Let S = {m^{t−1}_{ẽ→w} | ẽ ∈ Ew \ e} and let a ∈ R^{Ew\e} with aẽ = Δ(w, ẽ) for each ẽ ∈ Ew \ e. Then g(z) = min_{z̄∈R^|Ew|, z̄e=z} { ψw(z̄) + Σ_{ẽ∈Ew\e} m^{t−1}_{ẽ→w}(z̄ẽ) } is equal to I_S^a(cz + d) for some real numbers c and d (i.e., a shifted scaled interpolation of S). By Corollary 3.1, g(z) is a convex piecewise-linear function, and φe is a convex piecewise-linear function. Therefore, m^t_{e→v} = g + φe is a convex piecewise-linear function.
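The halving scheme in the proof of Theorem 3.2 is a straightforward pairwise reduction; a sketch reusing the stitch function above (interpolate_set is a hypothetical name):

def interpolate_set(fs):
    """I_S for a list of convex piecewise-linear functions (as integer-grid dicts),
    by stitching pairs until one function remains: O(P log k) total work."""
    fs = list(fs)
    while len(fs) > 1:
        # One halving pass: stitch consecutive pairs (an odd function is carried over).
        nxt = [stitch(fs[i], fs[i + 1]) for i in range(0, len(fs) - 1, 2)]
        if len(fs) % 2:
            nxt.append(fs[-1])
        fs = nxt
    return fs[0]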
Corollary 3.3. Suppose the cost vector c in MCF is integral. Then at iteration t, for any defined message function m^t_{e→v}(z), let {s1, s2, ..., sk} be the slopes of m^t_{e→v}(z); we have −t·cmax ≤ s1 ≤ ... ≤ sk ≤ t·cmax, and si is integral for each 1 ≤ i ≤ k.

Proof. We prove the corollary by induction. For t = 0, the statement is clearly true. For t ≥ 1, we recall that m^t_{e→v}(z) − φe(z) = min_{z̄∈R^|Ew|, z̄e=z} { ψw(z̄) + Σ_{ẽ∈Ew\e} m^{t−1}_{ẽ→w}(z̄ẽ) } is a (possibly shifted) scaled interpolation of message functions of the form m^{t−1}_{ẽ→w}. Since Δ(v, e) = ±1, the absolute values of the slopes of the linear pieces of m^t_{e→v} − φe are the same as the absolute values of the slopes of the linear pieces of the message functions m^{t−1}_{ẽ→w} used in the interpolation. Therefore, by the induction hypothesis, the absolute values of the slopes of m^t_{e→v} − φe are integral and bounded by (t − 1)cmax. This implies that the absolute values of the slopes of m^t_{e→v} are integral and bounded by t·cmax.

Corollary 3.4. Suppose the vectors b and u are integral in MCF. Then at iteration t, for any message function m^t_{e→v}, the vertices of m^t_{e→v} are also integral.

Proof. Again, we use induction. For t = 0, the statement is clearly true. For t ≥ 1, m^t_{e→v} − φe is a piecewise-linear function obtained from a shifted scaled interpolation of the functions m^{t−1}_{ẽ→w}. Since Δ(v, e) = ±1 and b is integral, the shift of the interpolation is integral; and as Δ(v, e) = ±1, the scaling factor is ±1. As all vertices of every function m^{t−1}_{ẽ→w} are integral, we deduce that all vertices of m^t_{e→v} − φe must also be integral. Since u is integral, all vertices of φe are integral, and therefore all vertices of m^t_{e→v} are integral.

Corollary 3.1 and Corollary 3.2 show that at each iteration, every message function can be encoded in terms of a finite vector describing the corners and slopes of its linear pieces. Using a similar argument, we can see that the message functions in Algorithm 1 are also convex piecewise-linear functions, and thus can also be encoded in terms of a finite vector describing the corners and slopes of their linear pieces. We note that the fact that the message functions m^t_{e→v} are convex piecewise-linear can also be shown by sensitivity analysis of LP (see Chapter 5 of [7]).

3.2 BP on Special Min-cost Flow Problems Although we have shown in Section 3.1 that each message function can be computed as a convex piecewise-linear function, those functions tend to become complicated to compute and store as the number of iterations grows. When an instance of MCF has b, u integral, then m^t_{e→v} (for some e = (u, v), t ∈ N+) is a piecewise-linear function with at most ue linear pieces. Thus, if ue is bounded by some constant for all e, while b and u are integral, then the message functions at each iteration are much simpler. Next, we present a subclass of MCF which can be solved quickly using BP compared to the more general MCF. Given a directed graph G = (V, E), this subclass of problems is defined as follows:

    min Σ_{e∈E} ce xe    (MCF′)
    s.t. Σ_{e∈Ev} Δ(v, e) xe = bv,  ∀v ∈ V
         Σ_{e∈δ(v)} xe ≤ ũv,  ∀v ∈ V, where δ(v) = {(u, v) ∈ E}
         0 ≤ xe ≤ ue,  ∀e ∈ E

where b, c, u, and ũ are all integral, and all of their entries are bounded by a constant K.

To see that MCF′ is indeed an MCF, observe that for each v ∈ V we may split v into two vertices vin and vout, where vin is incident to all in-arcs of v with b_{vin} = 0, and vout is incident to all out-arcs of v with b_{vout} = bv, while creating an arc from vin to vout whose capacity is ũv and whose cost is 0 (see the sketch after this section). Let the new graph be G′; then the MCF on G′ is equivalent to MCF′. Although we could use Algorithm 2 to solve the MCF on the new graph, we instead define functions ψ′v, φe, ∀v ∈ V, ∀e ∈ E:

    ψ′v(z) = 0 if Σ_{e∈Ev} Δ(v, e) ze = bv and Σ_{e∈δ(v)} ze ≤ ũv, and ψ′v(z) = ∞ otherwise

    φe(z) = ce z if 0 ≤ z ≤ ue, and φe(z) = ∞ otherwise

and apply Algorithm 2 with the functions ψ′ and φ. When we update the message functions m^t_{e→v} for all e ∈ Ew, the inequality Σ_{e∈δ(w)} xe ≤ ũw allows us to only look at ũw linear pieces from the message functions m^{t−1}_{ẽ→w}, for all but a constant number of e ∈ Ew. This allows us to further trim the running time of BP for MCF′, and we will present this more formally in Section 6.1.
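The node-splitting reduction above can be sketched as a small graph transformation (our illustration; split_nodes is a hypothetical name):

def split_nodes(arcs, b, u, c, u_tilde):
    """Reduce an MCF' instance to a plain MCF by splitting each vertex v into
    (v, 'in') and (v, 'out'), joined by a zero-cost arc of capacity u_tilde[v]."""
    new_arcs, new_u, new_c = [], [], []
    nodes = set(b)
    for (v, w), ue, ce in zip(arcs, u, c):
        new_arcs.append(((v, 'out'), (w, 'in')))   # original arc now enters w via w_in
        new_u.append(ue)
        new_c.append(ce)
    for v in nodes:
        new_arcs.append(((v, 'in'), (v, 'out')))   # bundles all in-flow of v: <= u_tilde[v]
        new_u.append(u_tilde[v])
        new_c.append(0)
    new_b = {}
    for v in nodes:
        new_b[(v, 'in')] = 0       # in-flow is forwarded through the splitter arc
        new_b[(v, 'out')] = b[v]   # original demand/supply sits at v_out
    return new_arcs, new_b, new_u, new_c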
4 Convergence of BP for MCF
Before proving Theorem 3.1, we define the computation trees, and then establish the connection between
computation trees and BP. Using this connection, we
proceed to prove the main result of our paper.
4.1 Computation Tree and BP Let T^N_e, the N-computation tree corresponding to variable xe, be defined as follows:

1. Let V(T^N_e) be the vertex set of T^N_e and E(T^N_e) be the arc set of T^N_e. Let Γ(·) be the map which maps each vertex v′ ∈ V(T^N_e) to a vertex v ∈ V, where Γ preserves the arcs (i.e., for each e′ = (v′α, v′β) ∈ E(T^N_e), Γ(e′) = (Γ(v′α), Γ(v′β)) ∈ E).

2. Divide V(T^N_e) into N + 1 levels. On the 0-th level, we have a "root" arc r with Γ(r) = e, and a vertex v′ is on the t-th level of T^N_e if v′ is t arcs apart from either node of r.

3. For any v′ ∈ V(T^N_e), let E_{v′} be the set of all arcs incident to v′. If v′ is on the t-th level of T^N_e and t < N, then Γ(E_{v′}) = E_{Γ(v′)} (i.e., Γ preserves the neighborhood of v′).

4. Every vertex v′ on the N-th level of T^N_e is incident to exactly one arc in T^N_e.

In other literature, T^N_e is known as the "unwrapped tree" of G rooted at e. Figure 4 gives an example of a computation tree. We note that the definition of computation trees we have introduced is slightly different from the definition in other papers [4], [6], [18], although the analysis and insight for computation trees is very similar.

Figure 4: Computation tree of G rooted at e3 = (1, 3). [figure omitted]

Let V^o(T^N_e) be the set of all the vertices which are not on the N-th level of T^N_e. Consider the problem

    min Σ_{f∈E(T^N_e)} c_{Γ(f)} x_f    (MCF^N_e)
    s.t. Σ_{f∈E_{v′}} Δ(v′, f) x_f = b_{Γ(v′)},  ∀v′ ∈ V^o(T^N_e)
         0 ≤ x_f ≤ u_{Γ(f)},  ∀f ∈ E(T^N_e)

Loosely speaking, MCF^N_e is almost an MCF on T^N_e. There is a flow constraint for every arc of T^N_e, and a demand/supply constraint for every node v′ ∈ V^o(T^N_e). Now, we state the lemma which exhibits the connection between BP and the computation trees.

Lemma 4.1. MCF^N_e has an optimal solution y∗ satisfying y∗_r = x̂^N_e (r is the root of T^N_e, Γ(r) = e).

This result is not too surprising, as BP is exact on trees. A thorough analysis can show that the message functions in BP are essentially the implementation of an algorithm using the "bottom up" (a.k.a. dynamic programming) approach, starting from the bottom level of T^t_e. A more formal, combinatorial proof of this lemma also exists, which is a rather technical argument using induction. Proofs for similar statements can be found in [4] and [18].

4.2 Proof of the Main Theorem

Proof. [Proof of Theorem 3.1] Suppose there exist e0 ∈ E and N ≥ (L/(2δ(x∗)) + 1)n such that x̂^N_{e0} ≠ x∗_{e0}. By Lemma 4.1, we can find an optimal solution y∗ for MCF^N_{e0}, where y∗_r = x̂^N_{e0}. Now, without loss of generality, we assume y∗_r > x∗_{e0}. Let r = (vα, vβ). Because y∗ is feasible for MCF^N_{e0} and x∗ is feasible for MCF, we have

    b_{Γ(vα)} = Σ_{e∈E_{vα}} Δ(vα, e) y∗_e = y∗_r + Σ_{e∈E_{vα}\r} Δ(vα, e) y∗_e

    b_{Γ(vα)} = Σ_{e∈E_{vα}} Δ(vα, e) x∗_{Γ(e)} = x∗_{e0} + Σ_{e∈E_{vα}\r} Δ(vα, e) x∗_{Γ(e)}

Then, we can find an arc e1 incident to vα, e1 ≠ r, such that y∗_{e1} > x∗_{Γ(e1)} if e1 has the same orientation as r, and y∗_{e1} < x∗_{Γ(e1)} if e1 has the opposite orientation as r. Similarly, we can find some arc e−1 incident to vβ satisfying the same condition. Let vα1, vα−1 be the vertices on e1, e−1 which are one level below vα, vβ on T^N_{e0}; then, we can find arcs e2, e−2 satisfying the same condition. If we continue this all the way down to the leaf nodes of T^N_{e0}, we can find a set of arcs {e−N, e−N+1, ..., e−1, e1, ..., eN} such that

    y∗_{ei} > x∗_{Γ(ei)} ⟺ ei has the same orientation as r
    y∗_{ei} < x∗_{Γ(ei)} ⟺ ei has the opposite orientation as r

Let X = {e−N, e−N+1, ..., e−1, e0 = r, e1, ..., eN}. For any e′ = (vp, vq) ∈ X, let Aug(e′) = (vp, vq) if y∗_{e′} > x∗_{Γ(e′)}, and Aug(e′) = (vq, vp) if y∗_{e′} < x∗_{Γ(e′)}. Then each Γ(Aug(e′)) is an arc on the residual graph G(x∗), and W = (Aug(e−N), Aug(e−N+1), ..., Aug(e0), ..., Aug(eN)) is a directed path on T^N_{e0}. We call W the augmenting path of y∗ with respect to x∗. Also, Γ(W) is a directed
walk on G(x∗). Now we can decompose Γ(W) into a simple directed path P(W) and a set of simple directed cycles Cyc(W). Because each simple directed cycle/path on G(x∗) can have at most n arcs, and W has 2N + 1 arcs, where N ≥ (L/(2δ(x∗)) + 1)n, we have that |Cyc(W)| > L/δ(x∗). Then, we have

    c∗(W) = Σ_{C∈Cyc(W)} c∗(C) + c∗(P(W))
          ≥ Σ_{C∈Cyc(W)} c∗(C) − L
          > (L/δ(x∗)) · δ(x∗) − L
          = 0

Let Forw = {e | e ∈ X, y∗_e > x∗_{Γ(e)}} and Back = {e | e ∈ X, y∗_e < x∗_{Γ(e)}}. Since both Forw and Back are finite, we can find λ > 0 such that y∗_e − λ ≥ x∗_{Γ(e)}, ∀e ∈ Forw, and y∗_e + λ ≤ x∗_{Γ(e)}, ∀e ∈ Back. Let ỹ ∈ R^{|E(T^N_{e0})|} be such that

    ỹe = y∗_e − λ if e ∈ Forw;  ỹe = y∗_e + λ if e ∈ Back;  ỹe = y∗_e otherwise.

We can think of ỹ as pushing λ units of flow along W for y∗. Since for each e ∈ Forw, y∗_e − λ ≥ x∗_{Γ(e)} ≥ 0, and for each e ∈ Back, y∗_e + λ ≤ x∗_{Γ(e)} ≤ u_{Γ(e)}, ỹ satisfies all the flow constraints. Also, because Forw = {e | e ∈ X, e has the same orientation as r} and Back = {e | e ∈ X, e has the opposite orientation as r}, we have that for any v′ ∈ V^o(T^N_{e0}), Σ_{e∈E_{v′}} Δ(v′, e) ỹe = Σ_{e∈E_{v′}} Δ(v′, e) y∗_e = b_{Γ(v′)}, which implies that ỹ satisfies all the demand/supply constraints. Therefore, ỹ is a feasible solution for MCF^N_{e0}. But

    Σ_{e∈E(T^N_{e0})} c_{Γ(e)} y∗_e − Σ_{e∈E(T^N_{e0})} c_{Γ(e)} ỹe = Σ_{e∈Forw} c_{Γ(e)} λ − Σ_{e∈Back} c_{Γ(e)} λ = c∗(W) λ > 0.

This contradicts the optimality of y∗, and completes the proof of Theorem 3.1.

Corollary 4.1. Given an MCF with integral data c, b and u, suppose cmax = max_{e∈E} {ce}. Let Algorithm 2 run N = n²cmax + n iterations, and let z∗_e = arg min_z n^N_e(z). Then n^N_e(z∗_e − 1) − n·cmax > n^N_e(z∗_e) and n^N_e(z∗_e + 1) − n·cmax > n^N_e(z∗_e) (for every e ∈ E) if and only if MCF has a unique optimal solution.

Proof. The proof of this corollary uses the same idea of finding an augmenting path on the computation tree as in Theorem 3.1.

(⇒): Suppose MCF has a unique optimal solution. Let y be the optimal solution of MCF^N_{e0} when the variable at the root r is fixed to z∗_e − 1. Then we can find an augmenting path W of y with respect to x∗ of length at least 2n²cmax, and W can be decomposed into more than 2ncmax disjoint cycles and one path. Each cycle has a cost of at least δ(x∗), which is at least 1 as MCF has integral data, while the path has cost at least −L ≥ −ncmax. Hence, once we push 1 unit of flow of y along W, we decrease the cost of y by more than ncmax, obtaining a feasible solution with value z∗_e at the root. Hence n^N_e(z∗_e − 1) − ncmax > n^N_e(z∗_e). Similarly, we can show that n^N_e(z∗_e + 1) − ncmax > n^N_e(z∗_e).

(⇐): Suppose MCF does not have a unique optimal solution. Let x∗ be an optimal solution of MCF, and let y be the optimal solution of MCF^N_{e0}. Again, we can find an augmenting path W of y with respect to x∗ of length at least 2n²cmax, and we can decompose W into at least 2ncmax disjoint cycles and one path P. The cost of P is at most (n − 1)cmax, while each cycle has a non-positive cost. Hence, when we push 1 unit of flow of y along W, we increase the cost of y by at most (n − 1)cmax < ncmax. Therefore, depending on the orientation of r on W, we either have n^N_e(z∗_e − 1) − ncmax ≤ n^N_e(z∗_e) or n^N_e(z∗_e + 1) − ncmax ≤ n^N_e(z∗_e).

The corollary above shows that BP can be used to detect the existence of a unique optimal solution for MCF.

5 Extension of BP to Convex-Cost Network Flow Problems

In this section, we discuss the extension of Theorem 3.1 to the convex-cost network flow problem (or convex flow problem). The convex flow problem is defined on a graph G = (V, E) as follows:

    min Σ_{e∈E} ce(xe)    (CP)
    s.t. Σ_{e∈Ev} Δ(v, e) xe = bv,  ∀v ∈ V
         0 ≤ xe ≤ ue,  ∀e ∈ E

where each ce is a convex piecewise-linear function. Notice that if we define ψ exactly the same as we did for MCF, and for each e ∈ E define

    φe(z) = ce(z) if 0 ≤ z ≤ ue, and φe(z) = ∞ otherwise

then we can apply Algorithm 2 on the graph G with the functions ψ and φ, and hence obtain BP for the convex flow problem.
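Since φe here is just a convex piecewise-linear function, the integer-grid representation used in the sketches above carries over directly. For instance (our illustration; convex_arc_cost is a hypothetical name), a congestion-like arc cost can be tabulated as a maximum of affine pieces, which is convex by construction:

def convex_arc_cost(lines, u_e):
    """Tabulate a convex arc cost c_e(z) = max over affine pieces (slope, intercept)
    on the integer grid {0, ..., u_e}."""
    return {z: max(s * z + b for s, b in lines) for z in range(u_e + 1)}

# Example: cost 1 per unit up to 2 units, then 4 per additional unit:
# c_e(z) = max(z, 4z - 6) on [0, 5].
phi_e = convex_arc_cost([(1, 0), (4, -6)], u_e=5)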
Now, we give the definition of a residual graph for the convex cost flow problem. Suppose x is a feasible solution for (CP). Let G(x) be the residual graph for G and x, defined as follows: for every e = (vα, vβ) ∈ E, if xe < ue, then e is also an arc in G(x) with cost c^x_e = lim_{t→0+} (ce(xe + t) − ce(xe))/t (the value of the slope of ce on the right side of xe); if xe > 0, then there is an arc e′ = (vβ, vα) in G(x) with cost c^x_{e′} = lim_{t→0+} (ce(xe − t) − ce(xe))/t (the negative of the value of the slope of ce on the left side of xe). Finally, let

    δ(x) = min_{C∈C} Σ_{e∈C} c^x_e

where C is the set of directed cycles in G(x).

Theorem 5.1. Suppose x∗ is the unique optimal solution for CP. Let L be the maximum cost of a simple directed path in G(x∗); by uniqueness of the optimal solution, δ(x∗) > 0. Then, for any N ≥ (L/(2δ(x∗)) + 1)m, x̂^N = x∗. Namely, the BP algorithm converges to the optimal solution in at most (L/(2δ(x∗)) + 1)m iterations.

The proof of Theorem 5.1 is almost identical to the proof of Theorem 3.1. Theorem 5.1 is an illustration of the power of BP: not only can it solve MCF, but possibly many other variants of MCF as well.

6 Running Time of BP on MCF

In the next two sections, we will always assume the vectors c, u and b in problem MCF are integral. This assumption is not restrictive in practice, as the MCF instances solved in practice usually have rational data [1]. We define cmax = max_{e∈E} {ce}, and provide an upper bound for the running time of BP on MCF in terms of m (the number of arcs in G), n (the number of nodes in G), and cmax.

Lemma 6.1. Suppose MCF has integral data. In Algorithm 2, the number of operations to update all the messages at iteration t is O(nm log n · t·cmax).

Proof. At iteration t, fix a valid message function m^t_{e→v}, which we update as:

    m^t_{e→v}(z) = φe(z) + min_{z̄∈R^|Ew|, z̄e=z} { ψw(z̄) + Σ_{ẽ∈Ew\e} m^{t−1}_{ẽ→w}(z̄ẽ) }

As MCF has integral data, by Corollary 3.3, any message of the form m^{t−1}_{ẽ→w} has integral slopes, and the absolute values of the slopes are bounded by (t − 1)cmax. This implies that for each m^{t−1}_{ẽ→w}, the number of linear pieces of m^{t−1}_{ẽ→w} is bounded by 2(t − 1)cmax. As the size of Ew is at most n, by Corollary 3.1, g(z) = min_{z̄∈R^|Ew|, z̄e=z} { ψw(z̄) + Σ_{ẽ∈Ew\e} m^{t−1}_{ẽ→w}(z̄ẽ) } can be calculated in O(log n · n·t·cmax) operations. As m^t_{e→v}(z) = g(z) + φe(z), we have that m^t_{e→v} can also be calculated in O(log n · n·t·cmax) operations. Since at each iteration we update 2m message functions, the total number of computations at iteration t can be bounded by O(nm log n · t·cmax).

Theorem 6.1. Given that MCF has a unique optimal solution x∗ and integral data, the BP algorithm finds the unique optimal solution of MCF in O(n⁵m log n · c³max) operations.

Proof. Because c is integral, the value δ(x∗) in Theorem 3.1 is also integral, hence δ(x∗) ≥ 1. Recall that L is the maximum cost of a simple directed path in G(x∗). Since a simple directed path has at most n − 1 arcs, L is bounded by n·cmax. Thus, by Theorem 3.1, the BP algorithm (Algorithm 2) converges to the optimal solution of MCF after O(n²cmax) iterations. Combining this with Lemma 6.1, we have that BP converges to the optimal solution in O(n²cmax · nm log n · n²cmax · cmax) = O(n⁵m log n · c³max) operations.

We would like to point out that for an MCF with a unique optimal solution, Algorithm 2 can take an exponential number of iterations to converge. Consider the MCF on the directed graph G shown in Figure 5. Take a large positive integer D, set c_{e1} = c_{e2} = D, c_{e3} = 2D − 1, b_{v1} = 1, b_{v2} = 0 and b_{v3} = −1. It can be checked that x̂^N_1 alternates between 1 and −1 when 2N + 1 < 2D/3. This means that the BP algorithm takes at least Ω(D) iterations to converge. Since the input size of a large D is just log(D), we have that Algorithm 2 for MCF does not converge to the unique optimal solution in polynomial time.

Figure 5: A three-vertex directed graph (vertices v1, v2, v3; arcs e1, e2, e3) on which BP takes Ω(D) iterations. [figure omitted]

6.1 Runtime of BP on MCF′ Here we analyze the running time of BP on MCF′, which is defined in Section 3.2. We show that BP performs much better on this special class of MCF. Specifically, we state the following theorem:

Theorem 6.2. If MCF′ has a unique optimal solution, Algorithm 2 for MCF′ finds the unique optimal solution in O(n⁴) operations.
First, we state the convergence result of BP for MCF′, which is reminiscent of Theorem 3.1.

Theorem 6.3. Suppose MCF′ has a unique optimal solution x∗. Define L to be the maximum cost of a simple directed path in G(x∗). Then Algorithm 2 for MCF′ converges to the optimal solution in at most (L/(2δ(x∗)) + 1)n iterations.

The proof of this theorem is exactly the same as the proof of Theorem 3.1, and is hence omitted. Now, we apply this result to compute the running time of Algorithm 2 for MCF′.

Proof. [Proof of Theorem 6.2] For the simplicity of the proof, we assume that every linear piece in a message function has unit length. This assumption is not restrictive, as each linear piece in general has integral vertices (from Corollary 3.4), so we can always break a longer linear piece into many unit-length linear pieces with the same slopes. At iteration t ≥ 1, recall that m^t_{e→v}(z) is defined as:

    φe(z) + min_{z̄∈R^|Ew|, z̄e=z} { ψ′w(z̄) + Σ_{ẽ∈Ew\e} m^{t−1}_{ẽ→w}(z̄ẽ) }

We claim that each m^t_{e→v} can be constructed using the following information:

(1) the ũw smallest linear pieces in {m^{t−1}_{ẽ→w} | ẽ ∈ δ(w) \ e}, where δ(w) denotes all in-arcs of w;
(2) the (ũw − bw) smallest linear pieces in {m^{t−1}_{ẽ→w} | ẽ ∈ γ(w) \ e}, where γ(w) denotes all out-arcs of w;
(3) the value of Σ_{ẽ∈Ew} m^{t−1}_{ẽ→w}(0).

Observe that once all three sets of information are given, for any integer 0 ≤ z ≤ ue, one can find m^t_{e→v}(z) in a constant number of operations. As ue is also bounded by a constant, we have that m^t_{e→v} can be computed in constant time when information (1), (2) and (3) is given.

Now, since |Ew| is bounded by n, obtaining each one of information (1), (2) or (3) for a specific message m^t_{e→v} takes O(n) operations. When we update all the messages in the set {m^t_{e→v} | e ∈ Ew}, note that for every e ∈ Ew except for a constant number of them, information (1) for updating m^t_{e→v} is equivalent to the ũw smallest pieces in {m^{t−1}_{ẽ→w} | ẽ ∈ δ(w)}, and information (2) for updating m^t_{e→v} is equivalent to the (ũw − bw) smallest linear pieces in {m^{t−1}_{ẽ→w} | ẽ ∈ γ(w)}. In fact, the exceptions are precisely those e such that m^{t−1}_{e→w} contains at least one of the ũw smallest linear pieces in {m^{t−1}_{ẽ→w} | ẽ ∈ δ(w)}, or one of the (ũw − bw) smallest linear pieces in {m^{t−1}_{ẽ→w} | ẽ ∈ γ(w)}. Moreover, every message in the set {m^t_{e→v} | e ∈ Ew} has information (3) equal to exactly Σ_{ẽ∈Ew} m^{t−1}_{ẽ→w}(0). Hence, to update all messages in the set {m^t_{e→v} | e ∈ Ew}, we can obtain information (1), (2), (3) for all the messages in {m^t_{e→v} | e ∈ Ew} in O(n) operations, and hence can compute all messages in {m^t_{e→v} | e ∈ Ew} in O(n) operations.

Therefore, updating all the message functions at every iteration takes O(n²) operations. By Theorem 6.3, since L is bounded by nK and δ(x∗) ≥ 1, the iteration count (L/(2δ(x∗)) + 1)n is bounded by O(n²). This concludes that Algorithm 2 finds the unique optimal solution in O(n⁴) operations.

Note that both the shortest-path problem and the bipartite matching problem belong to the class MCF′, where ũ, b, u are all bounded by 2.

7 FPRAS Implementation

In this section, we provide a fully polynomial-time randomized approximation scheme (FPRAS) for MCF using BP as a subroutine. First, we describe the main idea of our approximation scheme.

As Theorem 6.1 indicated, in order to come up with an efficient approximation scheme using BP (Algorithm 2), we need to get around the following difficulties:

1. The convergence of BP requires MCF to have a unique optimal solution.
2. The running time of BP is polynomial in m, n and cmax; note that cmax may be exponential in the size of the input.

For an instance of MCF, our strategy for finding an efficient (1 + ε) approximation scheme is to find a modified cost vector c̄ (we write MCF̄ for the problem with the modified cost vector) which satisfies the following properties:

1. MCF̄ has a unique optimal solution.
2. c̄max is polynomial in m, n and 1/ε.
3. The optimal solution of MCF̄ is a "near optimal" solution of MCF. The term "near optimal" is rather fuzzy, and we will address this question later in the section.

First, in order to find a modified MCF with a unique optimal solution, we state a result which is a variant of the Isolation Lemma introduced in [15].

7.1 Variation of Isolation Lemma

Theorem 7.1. Let MCF̄ be an instance of the min-cost flow problem with underlying graph G = (V, E), demand vector b, and capacity vector u. Let its cost vector c̄ be generated as follows: for each e ∈ E, c̄e is chosen
independently and uniformly over Ne, where Ne is a discrete set of 4m positive numbers (m = |E|). Then, the probability that MCF̄ has a unique optimal solution is at least 1/2.

Proof. Fix an arc e1 ∈ E, and fix c̄e for all e ∈ E \ e1. Suppose that there exists α ≥ 0 such that when c̄_{e1} = α, MCF̄ has optimal solutions x∗, x∗∗ with x∗_{e1} = 0 and x∗∗_{e1} > 0. Then, if c̄_{e1} > α, for any feasible solution x of MCF̄ with x_{e1} > 0, we have

    Σ_{e∈E} c̄e x∗_e = Σ_{e≠e1} c̄e x∗_e ≤ Σ_{e≠e1} c̄e xe + α·x_{e1} < Σ_{e∈E} c̄e xe.

And if c̄_{e1} < α, for any feasible solution x of MCF̄ with x_{e1} = 0, we have

    Σ_{e∈E} c̄e x∗∗_e < Σ_{e≠e1} c̄e x∗∗_e + α·x∗∗_{e1} ≤ Σ_{e≠e1} c̄e xe + α·x_{e1} = Σ_{e∈E} c̄e xe.

This implies there exists at most one value of α such that if c̄_{e1} = α, then MCF̄ has optimal solutions x∗, x∗∗ with x∗_{e1} = 0 and x∗∗_{e1} > 0. Similarly, we can also deduce that there exists at most one value of β such that if c̄_{e1} = β, then MCF̄ has optimal solutions x∗, x∗∗ with x∗_{e1} < u_{e1} and x∗∗_{e1} = u_{e1}.

Let OS denote the set of all optimal solutions of MCF̄, and let D(e) be the condition that either 0 = xe, ∀x ∈ OS, or 0 < xe < ue, ∀x ∈ OS, or xe = ue, ∀x ∈ OS. Since c̄_{e1} has 4m possible values, each chosen with equal probability, we conclude that the probability that D(e1) is satisfied is at least (4m − 2)/4m = (2m − 1)/2m. By the union bound, the probability that D(e) is satisfied for all e ∈ E is at least 1 − Σ_{e∈E} 1/(2m) = 1/2. Now, we state the following lemma:

Lemma 7.1. If condition D(e) is satisfied ∀e ∈ E, then MCF̄ has a unique optimal solution.

Proof. [Proof of Lemma 7.1] Suppose x∗ and x∗∗ are two distinct optimal solutions of MCF̄. Let d = x∗∗ − x∗; then x∗ + λd is an optimal solution of MCF̄ iff 0 ≤ (x∗ + λd)e ≤ ue, ∀e ∈ E. As c̄e > 0 for every e ∈ E and c̄ᵀd = c̄ᵀx∗∗ − c̄ᵀx∗ = 0, there exists some e′ such that d_{e′} < 0. Let λ∗ = sup{λ | x∗ + λd is an optimal solution of MCF̄}; since d_{e′} < 0, λ∗ is bounded, and since x∗ + d = x∗∗, λ∗ > 0. By the maximality of λ∗, there must exist some e′′ with d_{e′′} ≠ 0 such that (x∗ + λ∗d)_{e′′} equals either 0 or u_{e′′}. Since λ∗ > 0 and d_{e′′} ≠ 0, x∗_{e′′} ≠ (x∗ + λ∗d)_{e′′}; as both x∗ and x∗ + λ∗d are optimal, this contradicts the assumption that D(e′′) is satisfied. Thus, MCF̄ must have a unique optimal solution.

Then by Lemma 7.1, we conclude that the probability that MCF̄ has a unique optimal solution is at least 1/2.

We note that Theorem 7.1 can be easily modified for LP in standard form.

Corollary 7.1. Let LP̄ be an LP problem with constraints Ax = b, x ≥ 0, where A is an m × n matrix and b ∈ R^m. The cost vector c̄ of LP̄ is generated as follows: for each coordinate j, c̄j is chosen independently and uniformly over Nj, where Nj is a discrete set of 2n elements. Then, the probability that LP̄ has a unique optimal solution is at least 1/2.

7.2 Finding the Correct Modified Cost Vector c̄

Next, we provide a randomly generated c̄ with the desired properties stated at the beginning of this section. Let X : E → {1, 2, ..., 4m} be a random function where for each e ∈ E, X(e) is chosen independently and uniformly over the range. Let t = ε·cmax/(4mn), and generate c̄ as follows: for each e ∈ E, let c̄e = 4m·⌊ce/t⌋ + X(e). Then, for any c̄ generated at random, c̄max is polynomial in m, n and 1/ε, and by Theorem 7.1, the probability of MCF̄ having a unique optimal solution is at least 1/2.

Now, we introduce the algorithm APRXMT(MCF, ε), which works as follows: select a random c̄ and try to solve MCF̄ using BP. If BP discovers that MCF̄ has no unique optimal solution, then we restart the procedure by selecting another c̄ at random; otherwise, we return the unique optimal solution found by BP. Formally, we present APRXMT(MCF, ε) as Algorithm 3.

Algorithm 3 APRXMT(MCF, ε)
1: Let t = ε·cmax/(4mn). For each e ∈ E, assign c̄e = 4m·⌊ce/t⌋ + pe, where pe is an integer chosen independently and uniformly at random from {1, 2, ..., 4m}.
2: Let MCF̄ be the problem with modified cost c̄.
3: Run Algorithm 2 on MCF̄ for N = 2c̄max·n² iterations.
4: Use Corollary 4.1 to determine whether MCF̄ has a unique optimal solution.
5: if MCF̄ does not have a unique solution then
6:   Restart the procedure APRXMT(MCF, ε).
7: else
8:   Terminate and return x² = x̂^N, where x̂^N is the "belief" vector found in Algorithm 2.
9: end if

Corollary 7.2. The expected number of operations for APRXMT(MCF, ε) to terminate is O(n⁸m⁷ log n / ε³).

Proof. By Theorem 7.1, when we call APRXMT(MCF, ε), the expected number of MCF̄ instances BP tries to solve is bounded by 2. For each selection of c̄, we run Algorithm 2 for 2c̄max·n² iterations. As c̄max = O(m²n/ε), by Lemma 6.1, the expected number of operations for APRXMT(MCF, ε) to terminate is O(c̄³max·n⁵·m·log n) = O(n⁸m⁷ log n / ε³).

Now let c̄ be a randomly chosen vector such that MCF̄ has a unique optimal solution x². The next thing we want to show is that x² is a "near optimal" solution of MCF. To accomplish this, let e′ = arg max_{e∈E} {ce} and define
    min Σ_{e∈E} ce xe    (MCF̃)
    s.t. Σ_{e∈Ev} Δ(v, e) xe = bv,  ∀v ∈ V    (demand constraints)
         x_{e′} = x²_{e′}
         0 ≤ xe ≤ ue,  ∀e ∈ E    (flow constraints)

Suppose x³ is an optimal solution for MCF̃ and x¹ is an optimal solution of MCF. Then:

Lemma 7.2. cᵀx³ − cᵀx¹ ≤ |x²_{e′} − x¹_{e′}|·n·t.

Proof. Consider d = x² − x¹. We call a vector k ∈ {−1, 0, 1}^|E| a synchronous cycle vector of d if for any e ∈ E, ke = 1 only if de > 0, ke = −1 only if de < 0, and the set {(i, j) | kij = 1 or kji = −1} forms exactly one directed cycle in G. Since d is an integral circulation vector (i.e., d sends 0 net units of flow to every vertex v ∈ V), we can decompose d so that Σ_{k∈K} k = d, for some set K consisting of synchronous cycle vectors.

Now, let K′ ⊆ K be the subset of K containing all vectors k ∈ K such that k_{e′} ≠ 0. For any k ∈ K′, observe that x² − k is a feasible solution for MCF̄. Because x² is the optimal solution for MCF̄, we have that c̄ᵀk ≤ 0. And

    c̄e = 4m·⌊ce/t⌋ + pe, 1 ≤ pe ≤ 4m
    ⟹ |c̄e − 4m·ce/t| ≤ 4m
    ⟹ |Σ_e (4m·ce/t − c̄e)·ke| ≤ Σ_e 4m·|ke| ≤ 4mn
    ⟹ 4m·cᵀk/t ≤ (4m·cᵀk/t − c̄ᵀk) ≤ Σ_e |4m·ce/t − c̄e|·|ke| ≤ 4mn

thus, we have cᵀk ≤ nt.

Clearly, x¹ + Σ_{k∈K′} k satisfies the demand/supply constraints of MCF. Since min{x¹_e, x²_e} ≤ x¹_e + Σ_{k∈K′} ke ≤ max{x¹_e, x²_e} for all e ∈ E, and (x¹ + Σ_{k∈K′} k)_{e′} = x²_{e′}, we have that x¹ + Σ_{k∈K′} k is a feasible solution for MCF̃. Since x³ is the optimal solution of MCF̃,

    cᵀx³ ≤ cᵀx¹ + Σ_{k∈K′} cᵀk ≤ cᵀx¹ + |K′|·n·t.

Since |K′| ≤ |x²_{e′} − x¹_{e′}|, we have cᵀx³ − cᵀx¹ ≤ |x²_{e′} − x¹_{e′}|·n·t.

Corollary 7.3. cᵀx³ ≤ (1 + ε/(2m))·cᵀx¹, for any ε ≤ 2.

Proof. By Lemma 7.2,

    (cᵀx³ − cᵀx¹)/cᵀx³ ≤ |x²_{e′} − x¹_{e′}|·n·t / cᵀx³ ≤ |x²_{e′} − x¹_{e′}|·n·t / (|x²_{e′} − x¹_{e′}|·c_{e′}) = ε/(4m),

as t = ε·cmax/(4mn) and c_{e′} = cmax. Thus,

    (cᵀx³ − cᵀx¹)/cᵀx³ ≤ ε/(4m)
    ⟹ cᵀx³ ≤ (1/(1 − ε/(4m)))·cᵀx¹
    ⟹ cᵀx³ ≤ (1 + ε/(2m))·cᵀx¹

where the last inequality holds because (1 − ε/(4m))(1 + ε/(2m)) = 1 + ε/(4m) − ε²/(8m²) ≥ 1, as ε/(2m) ≤ 1.

7.3 A (1 + ε) Approximation Scheme Loosely speaking, Corollary 7.3 shows that x² at arc e′ is "near optimal", since fixing the flow at arc e′ to x²_{e′} still allows a feasible solution of MCF whose cost is close to optimal. This leads us to the approximation algorithm AS(MCF, ε) (see Algorithm 4); at each iteration, it uses APRXMT (Algorithm 3) and iteratively fixes the flow at the arc with the largest cost.
Algorithm 4 AS(MCF, ε)
1: Let G = (V, E) be the underlying directed graph of MCF, m = |E|, and n = |V|.
2: while MCF contains at least 1 arc do
3:   Run APRXMT(MCF, ε); let x² be the solution returned.
4:   Find (i′, j′) = e′ = arg max_{e∈E} {ce}. Modify MCF by fixing the flow on arc e′ to x²_{e′}, and change the demands/supplies on nodes i′, j′ accordingly.
5: end while
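A compact sketch of the perturb-solve-fix loop follows (ours; bp_solve_unique stands in for the BP subroutine of Algorithms 2-3, and the instance attributes are hypothetical):

import random

def perturb_costs(c, eps, m, n):
    """Modified costs per Section 7.2: cbar_e = 4m * floor(c_e / t) + p_e,
    with t = eps * c_max / (4 m n) and p_e uniform in {1, ..., 4m}."""
    t = eps * max(c) / (4 * m * n)
    return [4 * m * int(ce / t) + random.randint(1, 4 * m) for ce in c]

def approximate(instance, eps, bp_solve_unique):
    """AS(MCF, eps): repeatedly perturb costs, solve by BP, and 'decimate' by
    fixing the flow on the currently most expensive arc."""
    while instance.arcs:
        x2 = None
        while x2 is None:  # APRXMT: retry until the perturbed problem has a unique optimum
            cbar = perturb_costs(instance.c, eps, len(instance.arcs), instance.n)
            x2 = bp_solve_unique(instance, cbar)  # returns None if the optimum is not unique
        e = max(range(len(instance.c)), key=instance.c.__getitem__)
        instance.fix_arc(e, x2[e])  # fix flow on the max-cost arc and adjust demands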
Theorem 7.2. The expected number of operations for AS(MCF, ε) to terminate is O(n⁸m⁸ log n / ε³). Suppose x∗ is the answer returned by AS(MCF, ε); then cᵀx∗ ≤ (1 + ε)·cᵀx¹, for any ε ≤ 1.

Proof. By Corollary 7.2, the expected number of operations for APRXMT(MCF, ε) to terminate is O(n⁸m⁷ log n / ε³). Since AS(MCF, ε) calls the method APRXMT(MCF, ε) m times, the expected number of operations for AS(MCF, ε) to terminate is O(n⁸m⁸ log n / ε³). And by Corollary 7.3,

    cᵀx∗ ≤ (1 + ε/(2m))^m · cᵀx¹ ≤ e^{ε/2} · cᵀx¹ ≤ (1 + ε)·cᵀx¹

where the last two inequalities follow using standard analysis arguments.

Thus, AS(MCF, ε) is a (1 + ε) approximation scheme for MCF with polynomial running time.

8 Discussions

In this paper, we formulated belief propagation (BP) for MCF, and proved that BP solves MCF exactly when the optimal solution is unique. This result generalizes the result from [6], and provides new insights for understanding BP as an optimization solver. Although the running time of BP for MCF is slower than that of other existing algorithms for MCF, BP is a rather generic and flexible heuristic, and one should be able to modify it to solve other variants of network flow problems. One such example, the convex-cost flow problem, was analyzed in Section 5. Moreover, the distributed nature of BP can be taken advantage of using parallel computing.

This paper also presents the first fully-polynomial running time scheme for optimization problems using BP. Since the correctness of BP often relies on the optimization problem having a unique optimal solution, Corollary 7.1 can be used for modifying BP in general to work in the absence of a unique optimal solution. In the approximation scheme, the heuristic of fixing values of variables while running BP is commonly known as 'decimation' (see [14]). To the best of our knowledge, this is the first disciplined, provable instance of a decimation procedure. The 'decimation' in our scheme is extremely conservative, and a natural question is whether there exists a more aggressive 'decimation' heuristic which can improve the running time.

Acknowledgements

While working on this paper, D. Gamarnik was partially supported by NSF Project CMMI-0726733; D. Shah was supported in parts by NSF EMT Project CCF 0829893 and NSF CAREER Project CNS 0546590; and Y. Wei was partially supported by an NSERC PGS M award. The authors would also like to thank the anonymous referees for their helpful comments.

References

[1] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows: Theory, Algorithms, and Applications. Prentice Hall, 1993.
[2] S. M. Aji and R. J. McEliece. The generalized distributive law. IEEE Transactions on Information Theory, 46(2):325–343, March 2000.
[3] S. Arora, C. Daskalakis, and D. Steurer. Message passing algorithms and improved LP decoding. 2009.
[4] M. Bayati, C. Borgs, J. Chayes, and R. Zecchina. Belief-propagation for weighted b-matchings on arbitrary graphs and its relation to linear programs with integer solutions. Technical report, 2008. arXiv:0709.1190v2.
[5] M. Bayati, A. Braunstein, and R. Zecchina. A rigorous analysis of the cavity equations for the minimum spanning tree. Journal of Mathematical Physics, 49(12):125206, 2008.
[6] M. Bayati, D. Shah, and M. Sharma. Max-product for maximum weight matching: Convergence, correctness, and LP duality. IEEE Transactions on Information Theory, 54(3):1241–1251, March 2008.
[7] D. Bertsimas and J. Tsitsiklis. Introduction to Linear Optimization, pages 289–290. Athena Scientific, third edition, 1997.
[8] R. Gallager. Low Density Parity Check Codes. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, 1963.
[9] G. B. Horn. Iterative Decoding and Pseudo-codewords. PhD thesis, California Institute of Technology, Pasadena, CA, 1999.
[10] D. M. Malioutov, J. K. Johnson, and A. S. Willsky. Walk-sums and belief propagation in Gaussian graphical models. Journal of Machine Learning Research, 7:2031–2064, 2006.
[11] M. Mezard, G. Parisi, and R. Zecchina. Analytic and algorithmic solution of random satisfiability problems. Science, 297:812, 2002.
[12] C. Moallemi and B. V. Roy. Convergence of min-sum message passing for convex optimization. In 45th Allerton Conference on Communication, Control and Computing, 2008.
[13] C. C. Moallemi and B. V. Roy. Convergence of the min-sum message passing algorithm for quadratic optimization. CoRR, abs/cs/0603058, 2006.
[14] A. Montanari, F. Ricci-Tersenghi, and G. Semerjian. Solving constraint satisfaction problems through belief propagation-guided decimation.
[15] K. Mulmuley, U. Vazirani, and V. Vazirani. Matching is as easy as matrix inversion. Combinatorica, 7(1):105–113, March 1987.
[16] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.
[17] T. Richardson and R. Urbanke. The capacity of low-density parity check codes under message-passing decoding. IEEE Transactions on Information Theory, 47(2):599–618, February 2001.
[18] S. Sanghavi, D. Malioutov, and A. Willsky. Linear programming analysis of loopy belief propagation for weighted matching. In Proc. NIPS, Vancouver, Canada, 2007.
[19] S. Sanghavi, D. Shah, and A. Willsky. Message-passing for maximum weight independent set. July 2008.
[20] A. Schrijver. Combinatorial Optimization. Springer, 2003.
[21] Y. Weiss and W. Freeman. On the optimality of solutions of the max-product belief-propagation algorithm in arbitrary graphs. IEEE Transactions on Information Theory, 47(2), February 2001.
[22] J. Yedidia, W. Freeman, and Y. Weiss. Understanding belief propagation and its generalizations. Technical Report TR-2001-22, Mitsubishi Electric Research Lab, 2002.