Adam Wagner
Kevin Forbes
Motivation
Take advantage of GPU architecture for a highly parallel, data-intensive application
Enhance image segmentation using Microsoft Kinect IR depth images
Reduce frame-to-frame segmentation overhead with optical flow and iterative simulated annealing
Source paper: “Depth-supported real-time video segmentation with the Kinect”
Algorithm uses the Potts model and the Metropolis method for segmentation on the GPU
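
To make the Potts model and Metropolis step concrete, here is a minimal CPU-side sketch of a single-pixel update. It is not the paper's or the authors' code: the constant `coupling`, the 4-neighborhood, and the function names are assumptions chosen for illustration.

```python
import math
import random


def local_energy(labels, x, y, label, coupling=1.0):
    """Potts energy of giving `label` to pixel (x, y).

    Each 4-neighbor that carries the same label contributes -coupling.
    A constant coupling is an assumption made for this sketch.
    """
    height, width = len(labels), len(labels[0])
    energy = 0.0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and labels[ny][nx] == label:
            energy -= coupling
    return energy


def metropolis_update(labels, x, y, num_labels, temperature, rng):
    """Propose a random label for one pixel and apply the Metropolis rule.

    The proposal is always accepted if it does not raise the energy,
    otherwise it is accepted with probability exp(-delta / temperature).
    """
    old = labels[y][x]
    new = rng.randrange(num_labels)
    delta = local_energy(labels, x, y, new) - local_energy(labels, x, y, old)
    if delta <= 0 or rng.random() < math.exp(-delta / temperature):
        labels[y][x] = new
    return labels[y][x]


# Usage: one sweep over a small random label image.
rng = random.Random(0)
image = [[rng.randrange(3) for _ in range(8)] for _ in range(6)]
for row in range(6):
    for col in range(8):
        metropolis_update(image, col, row, num_labels=3, temperature=0.5, rng=rng)
```

Lowering `temperature` over successive sweeps turns this into simulated annealing: energy-raising proposals are accepted less and less often.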
Implementation
No base source code
Software frameworks:
  OpenCV – image capture, transformations, optical flow
  OpenNI – Kinect middleware
  CUDA – NVIDIA GPGPU computing architecture
Testbed: rcl1.engr.arizona.edu
  CPU: Quad-core Intel Xeon 5160, 3.0 GHz
  GPU: NVIDIA GeForce GTX 480
    480 CUDA cores
    GDDR5 memory
    Threads/block = 1024
    Shared memory / block = 48 KB
Methodology

Primary effort focused on parallelizing the segmentation algorithm
Without source, code was written from scratch for the CPU, then parallelized
Memory indexing rearranged to improve coalescing of global loads/stores
Much later in the semester, some code became available from the paper's authors
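
The slides do not say which rearrangement was made. As one common illustration of the idea, the sketch below compares an interleaved (per-pixel) layout with a planar (per-channel) layout. Global loads coalesce best when neighboring threads touch neighboring addresses, so the quantity of interest is the address stride between consecutive threads. Both layouts and both function names are assumptions.

```python
def interleaved_index(pixel, channel, num_channels=3):
    """Element index when channels are stored together per pixel (e.g. HSVHSV...)."""
    return pixel * num_channels + channel


def planar_index(pixel, channel, num_pixels):
    """Element index when each channel is stored as its own contiguous plane."""
    return channel * num_pixels + pixel


# Thread t handles pixel t and reads channel 0.
num_pixels = 8
interleaved = [interleaved_index(t, 0) for t in range(4)]
planar = [planar_index(t, 0, num_pixels) for t in range(4)]
print("interleaved:", interleaved)  # stride of 3 elements between threads
print("planar:     ", planar)       # stride of 1 element between threads
```

With the planar layout, consecutive threads read consecutive elements, which is the access pattern that coalescing rewards.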

Image divided into thread blocks on the GPU
Image data loaded into block shared memory from global memory
Each thread performs a state update on a single pixel
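
The one-thread-per-pixel mapping can be sketched with plain index arithmetic. The 32 x 32 block shape is an assumption, chosen because it uses exactly the 1024 threads per block listed for the testbed; the helper names and the 12 bytes per pixel are also hypothetical.

```python
def grid_dims(width, height, block_w, block_h):
    """Number of blocks needed in x and y (ceiling division)."""
    return (-(-width // block_w), -(-height // block_h))


def pixel_for_thread(block_x, block_y, thread_x, thread_y, block_w, block_h):
    """Image coordinates of the pixel owned by one thread of one block."""
    return (block_x * block_w + thread_x, block_y * block_h + thread_y)


def tile_bytes(block_w, block_h, bytes_per_pixel):
    """Shared memory needed to hold one block's tile of the image."""
    return block_w * block_h * bytes_per_pixel


blocks = grid_dims(512, 384, 32, 32)
print("blocks:", blocks)
print("pixel:", pixel_for_thread(1, 2, 3, 4, 32, 32))
# Assuming three 4-byte values per pixel, one tile fits in 48 KB of shared memory.
print("tile fits:", tile_bytes(32, 32, 12) <= 48 * 1024)
```

When the image size is not a multiple of the block size, the last row and column of blocks contain threads that fall outside the image and must skip their update.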
Results
[Figure: Metropolis Iteration Time vs Image Size. Axes: Image Size (pixels) vs Time per Metropolis Iteration (ms); series: CPU rate (ms), GPU rate (ms). Input Image: 512 x 384]
RGB to HSV Conversion
Dimensions    Size (pixels)   CPU rate (ms)   GPU rate (ms)
128 x 96      12288           10.5            0.054
256 x 192     49152           41              0.157
512 x 384     196608          169             0.54
1024 x 768    786432          860             2.115
2000 Metropolis Iterations
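
Taking the CPU and GPU timings listed above at face value, the speedup at each image size is simply the ratio of the two. The short script below does that arithmetic; the variable names are illustrative.

```python
# (dimensions, CPU time in ms, GPU time in ms), copied from the timings above.
timings = [
    ("128 x 96", 10.5, 0.054),
    ("256 x 192", 41.0, 0.157),
    ("512 x 384", 169.0, 0.54),
    ("1024 x 768", 860.0, 2.115),
]

speedups = {dims: cpu_ms / gpu_ms for dims, cpu_ms, gpu_ms in timings}
for dims, ratio in speedups.items():
    print(f"{dims}: {ratio:.0f}x")
```

The ratio grows with image size, from roughly 194x at 128 x 96 to roughly 407x at 1024 x 768.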
Conclusions

Parallelized algorithm runs roughly 190x to 410x faster than the CPU version in the measured timings
Makes real-time video processing a possibility

Implementation does not match paper
More improvement possible through use of simpler data types
Memory arrangement could be tuned further
Work done by each thread could be increased