Adam Wagner, Kevin Forbes

Motivation
- Take advantage of the GPU architecture for highly parallel, data-intensive applications
- Enhance image segmentation using Microsoft Kinect IR depth images
- Reduce frame-to-frame segmentation overhead with optical flow and iterative simulated annealing

Reference paper: "Depth-supported real-time video segmentation with the Kinect"
- The algorithm uses the Potts model and the Metropolis method for segmentation on the GPU

Implementation
- No base source code
- Software frameworks:
  - OpenCV: image capture, transformations, optical flow
  - OpenNI: Kinect middleware
  - CUDA: NVIDIA's GPGPU architecture
- Testbed: rcl1.engr.arizona.edu
  - CPU: quad-core Intel Xeon 5160, 3.0 GHz
  - GPU: NVIDIA GeForce GTX 480
    - 480 CUDA cores, GDDR5 memory
    - 1024 threads per block, 48 KB shared memory per block

Methodology
- Primary effort focused on parallelizing the segmentation algorithm
- Without source code, the algorithm was first written from scratch for the CPU and then parallelized
- Memory indexing was rearranged to improve coalescing of global loads and stores
- Some code became available from the paper's authors much later in the semester
- On the GPU, the image is divided among thread blocks: each block loads its image data from global memory into shared memory, and each thread performs a state update on a single pixel (a CUDA sketch of such a kernel appears at the end of this document)

Results

[Figure: "Metropolis Iteration Time vs Image Size" - time per Metropolis iteration (ms) for the CPU and GPU versions against image size (pixels), 512 x 384 input image]

RGB to HSV conversion (a conversion sketch appears at the end of this document):

Dimensions   Size (pixels)   CPU rate (ms)   GPU rate (ms)
128 x 96     12288           10.5            0.054
256 x 192    49152           41              0.157
512 x 384    196608          169             0.54
1024 x 768   786432          860             2.115

2000 Metropolis iterations

Conclusions
- The parallelized algorithm shows a vast improvement over the CPU version, making real-time video processing a possibility
- The implementation does not match the paper
- Further improvement is possible through simpler data types, more finely tuned memory arrangement, and increasing the work done by each thread
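
Illustrative CUDA sketches

The Methodology section describes dividing the image among thread blocks, staging each block's data in shared memory, and having every thread perform a Metropolis state update on one pixel. Below is a minimal sketch of such a kernel for a Q-state Potts model over an image of integer labels. It is not the authors' implementation: the names (TILE, NUM_LABELS, BETA, metropolis_sweep), the hash-based random number generator, and the checkerboard update schedule are illustrative assumptions, and the color/depth similarity weighting used in the paper is omitted.

```cuda
// Minimal sketch (not the project's code): one Metropolis half-sweep of a
// Q-state Potts model over an image of integer labels. Each block covers a
// 16x16 tile, the tile's labels are staged in shared memory, and each thread
// updates one pixel. TILE, NUM_LABELS, BETA, and the RNG are assumptions.
#include <cuda_runtime.h>
#include <math.h>

#define TILE        16
#define NUM_LABELS  8      // Q in the Potts model (assumed)
#define BETA        2.0f   // neighbour coupling strength (assumed)

// Tiny hash-based PRNG so the sketch stays self-contained; a real
// implementation would more likely use cuRAND.
__device__ unsigned int hash_rng(unsigned int x)
{
    x ^= x >> 16; x *= 0x7feb352dU;
    x ^= x >> 15; x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

__global__ void metropolis_sweep(int *labels, int width, int height,
                                 int parity, unsigned int seed)
{
    // One tile of labels plus a one-pixel halo for the 4-neighbourhood.
    __shared__ int tile[TILE + 2][TILE + 2];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    bool inside = (x < width) && (y < height);

    int tx = threadIdx.x + 1;
    int ty = threadIdx.y + 1;

    if (inside) {
        // Coalesced load of the tile interior from global memory.
        tile[ty][tx] = labels[y * width + x];

        // Edge threads also fetch the halo, clamped at the image border.
        if (threadIdx.x == 0)
            tile[ty][0] = labels[y * width + max(x - 1, 0)];
        if (threadIdx.x == TILE - 1 || x == width - 1)
            tile[ty][tx + 1] = labels[y * width + min(x + 1, width - 1)];
        if (threadIdx.y == 0)
            tile[0][tx] = labels[max(y - 1, 0) * width + x];
        if (threadIdx.y == TILE - 1 || y == height - 1)
            tile[ty + 1][tx] = labels[min(y + 1, height - 1) * width + x];
    }
    __syncthreads();

    // Checkerboard schedule: only half the pixels update per call so that
    // neighbouring sites are never updated at the same time.
    if (!inside || ((x + y) & 1) != parity)
        return;

    int current = tile[ty][tx];
    unsigned int r = hash_rng(seed ^ (unsigned int)(y * width + x));
    int proposal = (int)(r % NUM_LABELS);

    // Pure Potts energy: every disagreeing 4-neighbour costs BETA.
    int n[4] = { tile[ty][tx - 1], tile[ty][tx + 1],
                 tile[ty - 1][tx], tile[ty + 1][tx] };
    float e_cur = 0.0f, e_new = 0.0f;
    for (int i = 0; i < 4; ++i) {
        if (n[i] != current)  e_cur += BETA;
        if (n[i] != proposal) e_new += BETA;
    }

    // Metropolis acceptance: take downhill moves, otherwise accept with
    // probability exp(-dE).
    float dE = e_new - e_cur;
    float u  = (hash_rng(r) & 0xffffffU) / 16777216.0f;  // uniform in [0, 1)
    if (dE <= 0.0f || u < expf(-dE))
        labels[y * width + x] = proposal;
}
```

One full Metropolis iteration would launch this kernel twice (parity 0, then parity 1) over a grid of ceil(width/TILE) x ceil(height/TILE) blocks of TILE x TILE threads.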
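
The Results table reports CPU and GPU rates for RGB-to-HSV conversion. A per-pixel kernel of the kind being timed might look like the sketch below; the data layout (packed uchar3 RGB in, float3 HSV out) and the name rgb_to_hsv are assumptions rather than details taken from the project.

```cuda
// Minimal sketch of a per-pixel RGB-to-HSV conversion kernel; one thread
// converts one pixel. The uchar3/float3 layout is an assumption.
#include <cuda_runtime.h>

__global__ void rgb_to_hsv(const uchar3 *rgb, float3 *hsv, int num_pixels)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= num_pixels) return;

    float r = rgb[i].x / 255.0f;
    float g = rgb[i].y / 255.0f;
    float b = rgb[i].z / 255.0f;

    float maxc  = fmaxf(r, fmaxf(g, b));
    float minc  = fminf(r, fminf(g, b));
    float delta = maxc - minc;

    // Hue in degrees [0, 360), saturation and value in [0, 1].
    float h = 0.0f;
    if (delta > 0.0f) {
        if (maxc == r)      h = 60.0f * fmodf((g - b) / delta, 6.0f);
        else if (maxc == g) h = 60.0f * ((b - r) / delta + 2.0f);
        else                h = 60.0f * ((r - g) / delta + 4.0f);
        if (h < 0.0f) h += 360.0f;
    }
    float s = (maxc > 0.0f) ? delta / maxc : 0.0f;
    float v = maxc;

    hsv[i] = make_float3(h, s, v);
}
```

For a 512 x 384 frame (196,608 pixels) and 256 threads per block, such a kernel would be launched as rgb_to_hsv<<<(num_pixels + 255) / 256, 256>>>(d_rgb, d_hsv, num_pixels).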