Parallel and Distributed Graph Cuts by Dual Decomposition

Petter Strandmark, Fredrik Kahl
Lund University
Image denoising
Shape fitting from point clouds
Stereo estimation
Segmentation
[Figure: example s-t graph with source S, sink T and integer edge capacities. Minimum cut: 4]
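Several of the methods compared below build on augmenting paths. The slides do not name the sequential solver used, so as an illustration only, here is a minimal Edmonds-Karp sketch (shortest augmenting paths found by breadth-first search) on a hypothetical four-node graph, not the graph in the figure. By the max-flow/min-cut theorem the flow value equals the minimum cut, and the nodes still reachable from the source afterwards form its source side. The function name `max_flow_min_cut` and the dense-matrix representation are assumptions made for brevity; real vision graphs need specialized solvers.

```python
from collections import deque


def max_flow_min_cut(capacity, s, t):
    """Edmonds-Karp: augment along shortest s-t paths in the residual graph.

    capacity is a dense n x n matrix of non-negative integer capacities.
    Returns the max-flow value (= min-cut value) and the source side of a min cut.
    """
    n = len(capacity)
    residual = [row[:] for row in capacity]
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            # No augmenting path left: nodes reached by the last search are the source side.
            source_side = {v for v in range(n) if parent[v] != -1}
            return flow, source_side
        # Walk back from t to s to collect the path edges.
        path = []
        v = t
        while v != s:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        # Push the bottleneck amount and update residual capacities both ways.
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck


# Hypothetical graph: node 0 = S, nodes 1 and 2 = pixels, node 3 = T.
capacity = [
    [0, 3, 2, 0],
    [0, 0, 1, 2],
    [0, 0, 0, 3],
    [0, 0, 0, 0],
]
value, source_side = max_flow_min_cut(capacity, s=0, t=3)
print(value, source_side)  # 5 {0}
```

Here the cut {S} versus the rest has capacity 3 + 2 = 5, matching the flow found.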

Delong and Boykov, CVPR 2008
- Implementation of push-relabel
- Excellent speed-up for 2-8 processors
- Method of choice for dense 3D graphs

CUDA-cuts: Vineet and Narayanan, CVGPU CVPR 2008
- Push-relabel on GPU
- Not clear what range of regularization can be used

L1-norm: Bhusnurmath and Taylor, PAMI 2008
- Solves continuous problem on GPU
- Not faster than augmenting paths on single processor

Liu and Sun, CVPR 2010
- "Parallel Graph-cuts by Adaptive Bottom-up Merging"
- Splits large graph into several pieces
- Augmenting paths found separately
- Pieces merged together and search trees reused

Our approach
- Graph split into several pieces
- Solutions constrained to be equal via dual variables
- Shared memory not required
See Komodakis et al. in ICCV 2007 for dual decomposition
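The optimization problem
$$\min_{x_1, x_2, y} \; f_1(x_1, y) + f_2(x_2, y)$$
is converted into
$$\min_{x_1, x_2, y_1, y_2} \; f_1(x_1, y_1) + f_2(x_2, y_2) \quad \text{such that } y_1 = y_2.$$
Dualize the constraint!
The dual function is
$$
\begin{aligned}
g(\lambda) &= \min_{x_1, y_1, x_2, y_2} \; f_1(x_1, y_1) + f_2(x_2, y_2) + \lambda^T (y_1 - y_2) \\
&= \min_{x_1, y_1} \big[ f_1(x_1, y_1) + \lambda^T y_1 \big] + \min_{x_2, y_2} \big[ f_2(x_2, y_2) - \lambda^T y_2 \big].
\end{aligned}
$$
Two separate problems!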
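The slides do not state how λ is updated. A standard choice in dual decomposition, assumed here rather than taken from the slides, is supergradient ascent, which only needs the minimizers $(x_1^*, y_1^*)$ and $(x_2^*, y_2^*)$ of the two separate problems at the current $\lambda$. For every $\mu$,

$$
\begin{aligned}
g(\mu) &\le f_1(x_1^*, y_1^*) + f_2(x_2^*, y_2^*) + \mu^T (y_1^* - y_2^*) \\
&= g(\lambda) + (\mu - \lambda)^T (y_1^* - y_2^*),
\end{aligned}
$$

so $y_1^* - y_2^*$ is a supergradient of the concave function $g$ at $\lambda$, giving the update

$$\lambda \leftarrow \lambda + \tau \, (y_1^* - y_2^*), \qquad \tau > 0.$$

If $y_1^* = y_2^*$, the point $(x_1^*, x_2^*, y_1^*)$ is feasible for the original problem and its cost equals $g(\lambda)$; since $g(\lambda)$ is a lower bound on the original minimum for every $\lambda$ (weak duality), that point is optimal.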
[Figure: the example graph split into two subgraphs, one with variables x1, y1 and one with x2, y2, coupled through dual variables λ; some edge weights in the overlap are halved (½).]
[Diagram relating the original min-cut problem, its linear program, the decomposed linear program, the decomposed min-cut problem and the dual linear program]
- Zero duality gap
- Dual function has a maximum such that the constraints are met
- Global solution guaranteed!
Theorem: If the graph weights are even
integers, there exists an integer vector
maximizing the dual function.
This means that the dual problem can be solved
without floating point arithmetic.
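Because the dual problem can be solved without floating point arithmetic, an integer-only dual ascent is possible. A minimal sketch, assuming supergradient ascent with a fixed integer step (a standard dual-decomposition update; the slides do not give the update rule) and two hypothetical toy energies `f1`, `f2` with even integer weights. Each separate problem is solved by brute force over binary labels as a stand-in for a min-cut on one subgraph; `solve_piece` and `dual_ascent` are illustrative names. When the copies y1 and y2 agree, the combined solution is feasible and costs exactly g(λ), so it is optimal. A fixed step can oscillate in general; it happens to converge on this toy instance.

```python
from itertools import product


# Hypothetical toy energies with even integer weights. Each piece has one
# private binary variable x_i and a copy y_i of one shared (overlap) variable.
def f1(x1, y1):
    return 2 * x1[0] + 4 * (x1[0] != y1[0])


def f2(x2, y2):
    return 8 * (1 - x2[0]) + 6 * (x2[0] != y2[0])


def solve_piece(f, n_x, n_y, lam, sign):
    """Brute-force min over binary (x, y) of f(x, y) + sign * lam^T y.

    Stands in for a min-cut solver on one subgraph; ties keep the first minimum.
    """
    best = None
    for x in product((0, 1), repeat=n_x):
        for y in product((0, 1), repeat=n_y):
            cost = f(x, y) + sign * sum(li * yi for li, yi in zip(lam, y))
            if best is None or cost < best[0]:
                best = (cost, x, y)
    return best


def dual_ascent(n_x1=1, n_x2=1, n_y=1, step=1, max_iter=100):
    lam = [0] * n_y  # integer dual variables, one per overlap variable
    for it in range(1, max_iter + 1):
        g1, x1, y1 = solve_piece(f1, n_x1, n_y, lam, +1)
        g2, x2, y2 = solve_piece(f2, n_x2, n_y, lam, -1)
        if y1 == y2:
            # Copies agree: a feasible point whose cost equals g(lam), hence optimal.
            return lam, x1, x2, y1, g1 + g2, it
        # Integer supergradient step on g: lam += step * (y1 - y2).
        lam = [li + step * (a - b) for li, a, b in zip(lam, y1, y2)]
    raise RuntimeError("copies did not agree within max_iter iterations")


lam, x1, x2, y, value, iters = dual_ascent()
print(lam, y, value, iters)  # [-3] (1,) 2 4
```

The copies disagree for λ = 0, -1 and -2 and agree at λ = -3, where g(λ) = 2 equals the minimum of f1 + f2 with a shared y; every quantity along the way is an integer.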

Begin with a graph.
Split it into two parts.
Constrain the parts to be equal on the overlap.
Independent problems!
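The slides show the split only as a picture. A minimal sketch of the splitting step, assuming the graph comes from a pairwise binary energy E(x) = Σ unary_i(x_i) + Σ w·[x_u ≠ x_v] (terminal edges folded into the unary costs) and that each part is a set of node indices; `energy`, `split_energy` and the chain example are hypothetical. Terms lying entirely inside the overlap would otherwise be counted twice, so their weights are divided between the two pieces. Any division that sums to the original weight keeps E_a + E_b = E; an even split is used here, which is how we read the ½ labels in the slide's split figure. The split is only valid if every edge lies inside one of the two parts.

```python
from itertools import product


def energy(unary, edges, labels):
    """E(x) = sum_i unary[i][x_i] + sum over edges (u, v, w) of w * [x_u != x_v]."""
    cost = sum(unary[i][labels[i]] for i in unary)
    cost += sum(w for u, v, w in edges if labels[u] != labels[v])
    return cost


def split_energy(unary, edges, part_a, part_b):
    """Split a pairwise binary energy over two overlapping node sets.

    Terms involving only overlap nodes are halved between the pieces, so that
    E_a(x) + E_b(x) == E(x) whenever both pieces use the same labels on the overlap.
    """
    if part_a | part_b != set(unary):
        raise ValueError("the two parts must cover every node")
    weights = [c for pair in unary.values() for c in pair] + [w for _, _, w in edges]
    if any(c % 2 for c in weights):
        raise ValueError("expects even integer weights so halves stay integral")
    overlap = part_a & part_b
    un_a = {i: unary[i] for i in part_a}
    un_b = {i: unary[i] for i in part_b}
    for i in overlap:  # shared unary term: each piece gets half
        un_a[i] = un_b[i] = (unary[i][0] // 2, unary[i][1] // 2)
    ed_a, ed_b = [], []
    for u, v, w in edges:
        in_a = u in part_a and v in part_a
        in_b = u in part_b and v in part_b
        if in_a and in_b:  # edge inside the overlap: shared, so halve it
            ed_a.append((u, v, w // 2))
            ed_b.append((u, v, w // 2))
        elif in_a:
            ed_a.append((u, v, w))
        elif in_b:
            ed_b.append((u, v, w))
        else:
            raise ValueError(f"edge {(u, v)} crosses between the non-overlapping parts")
    return (un_a, ed_a), (un_b, ed_b)


# Hypothetical chain 0-1-2-3-4 split into {0, 1, 2, 3} and {2, 3, 4}; overlap {2, 3}.
unary = {0: (0, 4), 1: (2, 2), 2: (4, 0), 3: (2, 2), 4: (6, 0)}
edges = [(0, 1, 4), (1, 2, 2), (2, 3, 6), (3, 4, 2)]
piece_a, piece_b = split_energy(unary, edges, {0, 1, 2, 3}, {2, 3, 4})

# For every labeling that agrees on the overlap, the two pieces sum to the original.
for bits in product((0, 1), repeat=5):
    labels = dict(enumerate(bits))
    assert energy(unary, edges, labels) == energy(*piece_a, labels) + energy(*piece_b, labels)
```

Each piece can then be minimized independently, with the dual variables added to the unary costs of the overlap nodes.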
Berkeley segmentation database: 301 images
[Figure: results with 2 processors and 4 processors]
Iteration   Differences   Time (ms)
1           108           245
2           105           1.5
3           30            1.2
4           33            0.1
5           16            0.08
...         ...           ...
10          9             0.07
11          0             0.47
Image size 1152 × 1536
Easy problem: 230 ms
Hard problem: 4 s
This choice of split severs all possible s-t paths.
The parallel approach is still 30% faster.

LUNARC cluster
 401 × 396 × 312
7 seconds
4 computers
 95 × 98 × 30 × 19
80-connectivity
4 computers
 512 × 512 × 2317
6-connectivity
36 computers
12.3 GB
131 GB
Not much data need to be exchanged, 54kB in the first example
4D
MRI
data
3D
CT
data

Dual decomposition allows:
- Faster processing
- Solving larger graphs

Open source
- C++/Matlab
- Python