Petter Strandmark, Fredrik Kahl
Lund University

- Image denoising
- Shape fitting from point clouds
- Stereo estimation
- Segmentation

[Figure: example graph with source S, sink T and edge capacities. Minimum cut: 4]

Delong and Boykov, CVPR 2008
- Implementation of push-relabel
- Excellent speed-up for 2-8 processors
- Method of choice for dense 3D graphs

CUDA-cuts: Vineet and Narayanan, CVGPU, CVPR 2008
- Push-relabel on GPU
- Not clear what range of regularization can be used

L1-norm: Bhusnurmath and Taylor, PAMI 2008
- Solves continuous problem on GPU
- Not faster than augmenting paths on a single processor

Liu and Sun, CVPR 2010, "Parallel Graph-cuts by Adaptive Bottom-up Merging"
- Splits large graph into several pieces
- Augmenting paths found separately
- Pieces merged together and search trees reused

Our approach
- Graph split into several pieces
- Solutions constrained to be equal with dual variables
- Shared memory not required
- See Komodakis et al. in ICCV 2007 for dual decomposition

Optimization problem

$$\min_{x_1, x_2, y} \; f_1(x_1, y) + f_2(x_2, y)$$

Is converted into

$$\min_{x_1, x_2, y_1, y_2} \; f_1(x_1, y_1) + f_2(x_2, y_2) \quad \text{such that } y_1 = y_2.$$

Dualize the constraint! The dual function is

$$\begin{aligned}
g(\lambda) &= \min_{x_1, x_2, y_1, y_2} \; f_1(x_1, y_1) + f_2(x_2, y_2) + \lambda^T (y_1 - y_2) \\
&= \min_{x_1, y_1} \left( f_1(x_1, y_1) + \lambda^T y_1 \right) + \min_{x_2, y_2} \left( f_2(x_2, y_2) - \lambda^T y_2 \right)
\end{aligned}$$

Two separate problems!

[Figure: the example graph split into two parts, with variables x_1, y_1 in one part and x_2, y_2 in the other; the copies y_1 and y_2 on the overlap are compared node by node]

[Diagram relating: Original min-cut problem, Linear program, Decomposed linear program, Dual linear program, Decomposed min-cut problem]

- Zero duality gap
- Dual function has a maximum such that the constraints are met
- Global solution guaranteed!

Theorem: If the graph weights are even integers, there exists an integer vector maximizing the dual function.
This means that the dual problem can be solved without floating point arithmetic.

- Begin with a graph
- Split into two parts
- Constrained to be equal on the overlap
- Independent problems!

Berkeley segmentation database
- 301 images
[Figure: results for 2 processors and 4 processors]

Iteration     1     2     3     4     5      ...   10     11
Differences   108   105   30    33    16     ...   9      0
Time (ms)     245   1.5   1.2   0.1   0.08   ...   0.07   0.47

1152 × 1536
- Easy problem: 230 ms
- Hard problem: 4 s

[Figure: graph with source S and sink T, split into two parts]
- This choice of split severs all possible s/t paths
- Parallel approach still 30% faster

LUNARC cluster
- 401 × 396 × 312: 7 seconds, 4 computers
- 4D MRI data: 95 × 98 × 30 × 19, 80-connectivity, 4 computers, 12.3 GB
- 3D CT data: 512 × 512 × 2317, 6-connectivity, 36 computers, 131 GB
- Not much data needs to be exchanged: 54 kB in the first example

Dual decomposition allows:
- Faster processing
- Solving larger graphs

Open source
- C++/Matlab
- Python
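
The split-and-dualize idea in this deck can be tried out on a toy problem. The block below is only a minimal sketch under stated assumptions, not the released C++/Matlab/Python code. The three-node chain, its weights, and the names `solve_part` and `dual_decomposition` are all made up for illustration. A brute-force search stands in for a real max-flow solver, the cost of the overlap node is split evenly between the two parts, and the multiplier is updated with a fixed integer step of 1, which is a simplification that is not guaranteed to reach agreement on every graph.

```python
from itertools import product


def solve_part(nodes, unary, edges, bias):
    """Brute-force minimiser of one sub-problem (stand-in for a max-flow solver).

    unary[i] = (cost if x_i = 0, cost if x_i = 1)
    edges[(i, j)] = weight paid when x_i != x_j
    bias[i] = extra cost paid when x_i = 1 (the +lambda or -lambda term)
    """
    best_val, best_x = None, None
    for labels in product((0, 1), repeat=len(nodes)):
        x = dict(zip(nodes, labels))
        val = sum(unary[i][x[i]] for i in nodes)
        val += sum(w * (x[i] != x[j]) for (i, j), w in edges.items())
        val += sum(b * x[i] for i, b in bias.items())
        if best_val is None or val < best_val:
            best_val, best_x = val, x
    return best_val, best_x


def dual_decomposition(part_a, part_b, overlap, max_iter=50, step=1):
    """Maximise g(lambda) by supergradient ascent with a fixed step (assumption)."""
    lam = {i: 0 for i in overlap}
    best_bound = float("-inf")
    for _ in range(max_iter):
        # The two sub-problems are independent and could run on separate machines.
        val_a, x_a = solve_part(*part_a, bias=lam)
        val_b, x_b = solve_part(*part_b, bias={i: -l for i, l in lam.items()})
        best_bound = max(best_bound, val_a + val_b)  # g(lambda) is a lower bound
        diff = {i: x_a[i] - x_b[i] for i in overlap}
        if not any(diff.values()):
            # The copies agree on the overlap, so the merged labelling is optimal.
            return {**x_a, **x_b}, best_bound
        for i in overlap:
            lam[i] += step * diff[i]  # supergradient of g is y1 - y2
    return None, best_bound


# Hypothetical chain 0 - 1 - 2 with even integer weights; node 1 is the overlap.
full = ([0, 1, 2], {0: (0, 6), 1: (4, 0), 2: (4, 0)}, {(0, 1): 4, (1, 2): 2})
# The unary cost of the overlap node is halved in each part, so f = f1 + f2.
part_a = ([0, 1], {0: (0, 6), 1: (2, 0)}, {(0, 1): 4})
part_b = ([1, 2], {1: (2, 0), 2: (4, 0)}, {(1, 2): 2})

labels, bound = dual_decomposition(part_a, part_b, overlap=[1])
optimum, _ = solve_part(*full, bias={})
print(labels, bound, optimum)
```

On this toy graph the two parts disagree on node 1 for the first three iterations and agree on the fourth, at which point the lower bound equals the brute-force optimum of the unsplit problem. Only the labels on the overlap and the multipliers would need to be exchanged between workers.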