Junpeng Wang Fei Yang Yong Cao Virginia Tech

advertisement
Cache-Aware Iso-Surface Volume Rendering with CUDA
Junpeng Wang
Virginia Tech
junpeng@vt.edu
Fei Yang
Chinese Academy of Sciences
yangfei09@mails.gucas.ac.cn
Yong Cao
Virginia Tech
yongcao@vt.edu
Introduction & Motivation
Warp Marching
Most of the existing GPU-based ray casting algorithms map
volume data to 3D texture of GPUs. However texture cache is
optimized in 2D space. As a result, the viewing direction has
significant effects on the texture cache hit rate. This paper
mitigates the cache penalty by presenting a new sampling
strategy, i.e. warp marching.
3D texture of the GPU
is organized as many 2D
16 17 20 21
18 19 22 23
slices and inside each
24 25 28 29
slice, z-order curves are
26 27 30 31
used to optimize the 2D
spatial locality.
(b): Warp Marching
0
1
4
5
16 17 20 21
2
3
6
7
18 19 22 23
8
9
12 13 24 25 28 29
cycle 1
10 11 14 15 26 27 30 31
cycle 1 cycle 2
32 33 36 37 48 49 52 53
34 35 38 39 50 51 54 55
(a): Standard
40 41 44 45 56 57 60 61
42 43 46 47 58 59 62 63
Standard Sampling: existing ray casting algorithms map one
ray to one GPU thread. At any given instance, the samples of
all rays are fitted into a plane. The algorithm accesses all
these samples on the plane in parallel. It then marches this
plane from front to back along a given camera orientation.
cycle 2
Standard Sampling: one thread per ray and marches one
sample on four rays in one warp cycle.
Warp Marching: four threads per ray and marches four
samples on one ray in one warp cycle.
Y
Y
Z
a
b
c
d
e
f
Z
X
X
facing XY
Samples are on the same slice
facing ZY
Samples cover multiple slices
Result & Analysis
beetle (16 bit, 0.65GB)
832x832x494
Standard
Texture
Cache Hit Warp Marching
Rate
Speedup
Standard
FPS
Warp Marching
Speedup
bat (8 bit, 1.12GB)
906x911x1466
Facing XY
beetle
bat
94.67%
94.13%
78.58%
75.33%
0.83
0.80
39.62
30.13
31.92
25.81
0.81
0.86
Facing ZY
beetle
Bat
56.32%
33.13%
95.83%
97.38%
1.70
2.94
9.80
2.10
35.99
29.49
3.67
14.04
GPU used: NVIDIA GeForce GTX TITAN. Sampling distance along rays
is 0.3 voxel length. Thread block size: 256 threads per block.
Three General Cases
a. Densities of the warp of samples < the iso-value;
b. Densities of the warp of samples >= the iso-value;
c. Some density values are lesser than the iso-value, while
others are larger than the iso-value.
Two Special Cases
d. Iso-surface is located in the gap between two warp cycles.
f. Boundary condition: warp marching cannot detect the isovalue if it is on the first sample of the warp.
Conclusion & Future Work
We introduce a new sampling strategy for the texture-based
iso-surface volume rendering. The strategy takes advantage
of cache coherence and increases rendering performance at
certain camera orientations. Applying warp marching to
other types of GPUs, even those with varying warp sizes, is a
promising direction for future exploration. Also, a hybrid
approach that adjusts sampling strategy based on viewing
direction is worth trying.
Download