Toru Tamaki, Miho Abe, Bisser Raytchev, Kazufumi Kaneda 19th Nov. 2010 Contribution of this talk Fast GPU implementations of registration algorithms for 3D point sets. Softassign [Gold et al., 1998] EM-ICP [Granger et al., 2002] (Weighted) Horn’s method [Horn, 1987] So, what is “registartion” ? What is “Registration” or “Alignment” ? Image registration A set of images 3D registration algorithm Input Two point sets: 𝑋 and 𝑌 Output Rotation matrix 𝑅 Translation vector 𝒕 𝑅 and 𝒕 X Y Algorithms for registration Horn’s method ICP (Iterative closest point) • Corresponding point sets are given. • Estimate R and t. • • • • Unknown correspondence. Fast, standard. Easily fail due to local minimum. A lot of variants follow. Registration algorithm Softassign EM-ICP • Unknown correspondence. • Robust. • Very slow because of iterations. • Unknown correspondence. • Robust. • Very slow because of iterations. Algorithms for registration Horn’s method ICP (Iterative closest point) • Corresponding point sets are given. • Estimate R and t. • • • • Unknown correspondence. Fast, standard. Easily fail due to local minimum. A lot of variants follow. Registration algorithm Softassign EM-ICP • Unknown correspondence. • Robust. • Very slow because of iterations. • Unknown correspondence. • Robust. • Very slow because of iterations. Horn’s method: correspondence is known. 𝒙1𝑇 𝑇 𝒙2 ⋮ 𝒚1𝑇 𝑇 𝒚2 ⋮ 𝑋 𝑌 Known correspondence X Y 𝒙1 = (𝑥1𝑥 , 𝑥1𝑦 , 𝑥1𝑧 )𝑇 Unknown correspondence ? X Y Horn’s method: correspondence is known. 𝒙1 𝑇 𝒙2 ⋮ 𝑇 3 𝒚1 𝑇 𝒚2 ⋮ 𝑇 𝑆 𝑋 𝑌 𝑋 = 𝑋 𝑌 𝑌 4 𝐾 = 𝑋−𝒙 𝒙 𝑌−𝒚 𝒚 Compute centers 1 Centering 2 5 Computer 1st Eigenvector 𝒒 : quaternion 𝑞 Convert 𝑞 to 𝑅 𝒕 = 𝒙 − 𝑅𝒚 Algorithms for registration Horn’s method ICP (Iterative closest point) • Corresponding point sets are given. • Estimate R and t. • • • • Unknown correspondence. Fast, standard. Easily fail due to local minimum. A lot of variants follow. Registration algorithm Softassign EM-ICP • Unknown correspondence. • Robust. • Very slow because of iterations. • Unknown correspondence. • Robust. • Very slow because of iterations. ICP: correspondence is unknown. 𝒙1𝑇 𝑇 𝒙2 ⋮ 𝒚1𝑇 𝑇 𝒚2 ⋮ 𝒚𝑖 𝑋 𝑌 𝑌∗ 𝒚𝑖 Find closest (nearest) point to 𝒙1 in 𝑌 Put the point to 𝑌 ∗ ICP: correspondence is unknown. 𝒙1𝑇 𝑇 𝒙2 ⋮ 𝒚1𝑇 𝑇 𝒚2 ⋮ 𝒚𝑖 𝒚𝑗 ⋮ Horn’s method with 𝑋 and 𝑌 ∗ 𝒚𝑗 𝑋 𝑌 𝑌∗ Estimate 𝑅 and 𝒕 Find closest (nearest) point to 𝒙1 in 𝑌 Put the point to 𝑌 ∗ ICP: correspondence is unknown. 𝒙1𝑇 𝑇 𝒙2 ⋮ 𝒚1𝑇 𝑇 𝒚2 ⋮ 𝒚𝑖 𝒚𝑗 Horn’s method with 𝑋 and 𝑌 ∗ ⋮ 𝒚𝑗 𝑋 𝑅𝑌 + 𝒕 𝑌∗ Estimate 𝑅 and 𝒕 Repeat Find closest (nearest) point to 𝒙1 in 𝑌 Put the point to 𝑌 ∗ Fast, but easy to fail due to hard correspondence. Algorithms for registration Horn’s method ICP (Iterative closest point) • Corresponding point sets are given. • Estimate R and t. • • • • Unknown correspondence. Fast, standard. Easily fail due to local minimum. A lot of variants follow. Registration algorithm Softassign EM-ICP • Unknown correspondence. • Robust. • Very slow because of iterations. • Unknown correspondence. • Robust. • Very slow because of iterations. Softassign: soft correspondence. 𝒚𝑗 𝑌 Weighted Horn’s method with 𝑋 and 𝑌 𝑀 𝒙𝑖 𝑋 𝑚𝑖𝑗 Each row and column should be normalized to 1 by Shinkhorn iterations 𝑚𝑖𝑗 = ||𝒙𝑖 − 𝑅𝒚𝑗 + 𝒕 || Estimate 𝑅 and 𝒕 Repeat Shinkhorn iterations 𝑀 𝑚𝑖𝑗 sum up to 1 sum up to 1 sum up to 1 ⋮ Each row and column should be normalized to 1 by Shinkhorn iterations sum up to 1 Repeat row and column normalization until converge. Shinkhorn iterations 𝑀 𝑚𝑖𝑗 Each row and column should be normalized to 1 by Shinkhorn iterations sum up to 1 ⋮ sum up to 1 sum up to 1 sum up to 1 Repeat row and column normalization until converge. Shinkhorn.GPU (row normalization) Using sgemv of CUBLAS 𝑀 1 1 1 ⋮ Each row and column should be normalized to 1 by Shinkhorn iterations 𝑹𝑀 𝟏 Shinkhorn.GPU (row normalization) Using CUDA kernel 𝑀 Row-wise division Each row and column should be normalized to 1 by Shinkhorn iterations 𝑹𝑀 Column normalization is done by the same way. Weighted Horn’s method Normal version Weighted version 3 3 𝑆 = 𝑋 𝑌 𝑆 = 𝑋 𝑀 Using CUBLAS sgemv twice. 𝑌 Centering.GPU (weighted version) CUDA kernel ∗ ∗ 𝑋 1 1 1 ⋮ CUBLAS sasum Weighted sum 𝑋 𝟏 𝑹𝑀 CUBLAS sasum ∗ 𝑹𝑀 Weighted center 𝒙 𝟏 Same as for 𝒚 Pipeline of Softassing.GPU 𝑌 𝑌 𝑋 𝑋 𝑀 Compute 𝑀 with CUDA kernel Shinkhorn.GPU 𝒙 ,𝒚 𝑅 and 𝒕 Solve Eigenvalue problem 𝐾 𝑆 𝑆= Centering.GPU 𝑋 𝑀 𝑌 Weighted Horn’s method Algorithms for registration Horn’s method ICP (Iterative closest point) • Corresponding point sets are given. • Estimate R and t. • • • • Unknown correspondence. Fast, standard. Easily fail due to local minimum. A lot of variants follow. Registration algorithm Softassign EM-ICP • Unknown correspondence. • Robust. • Very slow because of iterations. • Unknown correspondence. • Robust. • Very slow because of iterations. EM-ICP: soft correspondence. Pseudo correspondence 𝑋′ 𝒙𝑗 𝑋 Weighted Horn’s method with 𝑋′ and 𝑌 𝐴 𝒚𝑖 𝑑𝑖𝑗 𝒙′𝑖 𝑋′ 𝑌 Estimate 𝑅 and 𝒕 Each row is normalized once. 𝑑𝑖𝑗 = ||𝒙𝑗 − 𝑅𝒚𝑖 + 𝒕 || Repeat Row normalization on GPU Using sgemv of CUBLAS 𝐴 1 1 1 ⋮ Not normalized yet. 𝑪 𝟏 Row normalization on GPU Using CUDA kernel 𝐴 Row-wise division + sqrt Now normalized. 𝑪 Computing weights Using sgemv of CUBLAS 𝐴 1 1 1 ⋮ Now normalized. 𝝀 𝟏 Pseudo correspondence CUBLAS sgemv 𝐴 𝑋 Now normalized. Centering: same with Softassing.GPU 𝑋′ Weighted Horn’s method Weighted version (not efficient) 3 𝜆1 𝑆 = 0 𝜆2 𝑋′ 0 ⋱ 𝑌 Weighted version (2 steps) 3 CUDA kernel CUBLAS sgemm ∗ 𝑋′ 𝑆 𝝀 𝑋 = 𝑋’ 𝑌 Pipeline of EM-ICP.GPU 𝑌 𝑌 𝑋 𝑋 𝐴 Compute 𝐴with CUDA kernel Row normalization on GPU 𝒙 ,𝒚 𝑅 and 𝒕 Solve Eigenvalue problem 𝐾 𝑆 ∗ 𝑋′ Centering.GPU 𝑆 = 𝑋′ 𝑌 𝑋 𝝀 2 step weighted Horn’s method Computing time over different number of points GPU: GeForce8800GT CPU: Intel Core2 Quad + OpenMP (4 cores) Successfully aligned 5000 points less than 7 seconds. Slightly fast, but failed. Summary Implemented 3D registration algorithms on a GPU are: Softassign, EM-ICP, Weighted Horn’s method. EM-ICP.GPU is able to align 5000 points within 7 seconds, 60 times faster than EM-ICP.CPU, more robust than ICP.CPU. Code, binary, and movies are available at: http://home.hiroshima-u.ac.jp/tamaki/study/cuda_softassign_emicp/ Limitations Number of points Should be less than 8000 for GeForce8800GT with 512MB memory. More memory, more points. Stopping condition requires to store whole matrix 𝑀 or 𝐴, and compare with previous ones: inefficient. Hence, currently, number of iterations is fixed. Algorithms for registration Horn’s method ICP (Iterative closest point) • Corresponding point sets are given. • Estimate R and t. • • • • Unknown correspondence. Fast, standard. Easily fail due to local minimum. A lot of variants follow. Registration algorithm Softassign EM-ICP • Unknown correspondence. • Robust. • Very slow because of iterations. • Unknown correspondence. • Robust. • Very slow because of iterations.