HistoPyramid stream compaction and expansion
Christopher Dyken (1)* and Gernot Ziegler (2)
(1) University of Oslo
(2) Max-Planck-Institut für Informatik, Saarbrücken
Advanced Computer Graphics / Vision Seminar
TU Graz 23/10-2007
Page 1
GPUs are highly parallel and well suited to data-local operations.
However, re-arranging or selectively deleting data is difficult on GPUs.
Stream compaction and expansion are such operations:
Stream compaction
- For each element in the input stream, let a predicate determine if the element should be discarded.
- Produce a compact output stream of the remaining elements.
Generalization: Stream compaction and expansion
- For each element in the input stream, let a predicate determine the element's multiplicity, i.e. how many times the element should be present in the output stream (0 ⇒ discard).
- Produce a compact output stream in which each element with a multiplicity greater than 0 appears that many times.
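The two definitions above can be pinned down with a sequential reference version. This is a minimal sketch of the intended input/output behaviour only, not of the GPU algorithm; the function names `compact_expand` and `compact` are hypothetical and chosen here for illustration.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def compact_expand(stream: Iterable[T], multiplicity: Callable[[T], int]) -> List[T]:
    """Sequential reference: emit each element multiplicity(element) times."""
    out: List[T] = []
    for element in stream:
        count = multiplicity(element)
        if count < 0:
            raise ValueError("multiplicity must be non-negative")
        out.extend([element] * count)  # count == 0 discards the element
    return out


def compact(stream: Iterable[T], keep: Callable[[T], bool]) -> List[T]:
    """Plain stream compaction is the special case with multiplicity 0 or 1."""
    return compact_expand(stream, lambda e: 1 if keep(e) else 0)


values = [3, 8, 1, 6, 5]
evens = compact(values, lambda v: v % 2 == 0)        # keeps 8 and 6
expanded = compact_expand(values, lambda v: v % 3)   # multiplicities 0, 2, 1, 0, 2
```

The loop carries a running output position from one element to the next, which is the dependency a parallel version has to resolve.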
Page 2
Why bother with stream compaction and expansion?
- Feature extraction:
  - get a compact list of all locations satisfying a criterion.
  - Save GPU → CPU bandwidth.
- Point cloud generation:
  - Generate a set of points on an implicitly defined surface.
- Emulate/Offload geometry-shader:
  - Our HP-based Marching Cubes implementation does not need GS, currently even outperforms GS-based approaches.
- Compaction of intermediate results:
  - Sparse matrix extraction.
  - ...
Data re-organization is an active field of research (see GPU Gems II and III, etc.)
Page 3
The HistoPyramid algorithm
- The input stream is laid out over an N × N grid (= tex2D).
- Each input element is subjected to a predicate function:
  - count = 0 ⇒ discard from output stream.
  - count = 1 ⇒ keep in output stream.
  - count > 1 ⇒ repeat in output stream.
- The output of the predicate function forms the HP base layer.
  - the HP is a mipmap-pyramid of partial sums.
  - the HP pyramid is built in log2(N) passes.
  - top element of HP: number of elements in output.
- Then, iterate over output elements:
  - extraction of an element is done in log2(N) texture lookups.
  - Each input element can have multiple copies in output.
  - No data transfer from GPU to CPU.
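The pyramid of partial sums can be written compactly. The notation below is an assumption introduced here for illustration: $m$ is the count returned by the predicate, $h^{(l)}$ is HP level $l$ (with $l = 0$ the base layer), and $N$ is assumed to be a power of two.

```latex
\begin{align*}
h^{(0)}_{i,j}   &= m(x_{i,j}), \qquad 0 \le i, j < N \\
h^{(l+1)}_{i,j} &= h^{(l)}_{2i,\,2j} + h^{(l)}_{2i+1,\,2j}
                 + h^{(l)}_{2i,\,2j+1} + h^{(l)}_{2i+1,\,2j+1} \\
h^{(\log_2 N)}_{0,0} &= \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} m(x_{i,j})
\end{align*}
```

Each level halves the side length, so the recurrence reaches the single top cell after $\log_2 N$ reduction passes, and that cell holds the length of the output stream.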
Page 4
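Overview
[Figure: processing pipeline, with the labels Input image, Predicate, Bucket count, HP-builder, HistoPyramid, Extractor, Point list. In the example, the 2 × 2 level holds 2 2 / 4 4 and the top element is 12, so the point list has 12 entries:]
(2,1) (2,2) (5,1) (5,2) (0,4) (1,5) (2,6) (3,6) (7,4) (6,5) (4,6) (5,6)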
Page 5
Predicate function
- For each input element, determine output stream count.
  - count is often binary (1/0)
- For different predicates, several base layers can be built:
  - per base layer, one HP will be built, in parallel.
  - predicates may overlap if needed!
  - NV40: 4×RGBA = 16 predicates, G80: 8×RGBA = 32 predicates.
- Example: extract list of edge pixels (Laplace + threshold):
[Figure: Input data → predicate (Laplace + threshold) → HP base level]
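A binary edge predicate of the Laplace + threshold kind could look like the following CPU-side sketch. The 4-neighbour stencil, the clamped border handling, the threshold value and the name `laplace_threshold_predicate` are all assumptions made for illustration; the slides do not specify them.

```python
from typing import List

Grid = List[List[float]]


def laplace_threshold_predicate(image: Grid, threshold: float) -> List[List[int]]:
    """Return a base layer of 0/1 counts: 1 where |Laplacian| > threshold."""
    height, width = len(image), len(image[0])

    def pixel(x: int, y: int) -> float:
        # Clamp coordinates to the image border (an assumed boundary rule).
        x = min(max(x, 0), width - 1)
        y = min(max(y, 0), height - 1)
        return image[y][x]

    base = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # 4-neighbour discrete Laplacian.
            lap = (pixel(x - 1, y) + pixel(x + 1, y)
                   + pixel(x, y - 1) + pixel(x, y + 1)
                   - 4.0 * pixel(x, y))
            base[y][x] = 1 if abs(lap) > threshold else 0
    return base


# A 4 x 4 image with a vertical step edge between columns 1 and 2.
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
base_layer = laplace_threshold_predicate(image, threshold=0.5)
```

Each output cell depends only on a small neighbourhood of the input, so on the GPU this step maps to one fragment per element.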
Page 6
HistoPyramid builder
- mipmap-generation, but with sum instead of average:
- in effect, each cell counts elements in its sub-pyramid:
[Figure: example pyramid. Base level, 4 × 4 (cell counts summing to 9); Level 1, 2 × 2: 3 2 / 3 1; Level 2, 1 × 1: 9.]
- top element: total number of elements in base layer ⇒ output size.
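The reduction can be sketched on the CPU as repeated 2 × 2 summation. This is a minimal illustration, not the shader implementation; `build_histopyramid` is a hypothetical name. The exact cell layout of the 4 × 4 base layer is an assumption: it is reconstructed so that it reproduces the level sums 3 2 / 3 1 and the top value 9 shown on the slide, and the point list shown on the extractor slide.

```python
from typing import List

Level = List[List[int]]


def build_histopyramid(base: Level) -> List[Level]:
    """Return [base, level 1, ..., top]; each cell sums a 2 x 2 block below."""
    n = len(base)
    if n == 0 or n & (n - 1) or any(len(row) != n for row in base):
        raise ValueError("base layer must be square with a power-of-two side")
    levels = [base]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        levels.append([
            [prev[2 * y][2 * x] + prev[2 * y][2 * x + 1]
             + prev[2 * y + 1][2 * x] + prev[2 * y + 1][2 * x + 1]
             for x in range(half)]
            for y in range(half)
        ])
    return levels


# Assumed layout of the slide's example base layer (rows are y, columns are x).
base = [
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 2, 0, 1],
    [1, 0, 0, 0],
]
pyramid = build_histopyramid(base)
```

For this 4 × 4 input the loop runs twice, matching the log2(N) passes stated earlier.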
- Example: HP of the Lena edge-pixels (red = nonzero count) [figure not reproduced]
Page 7
Pointlist extractor
Input: output index used as key index.
Input: key indices 0 1 2 3 4 5 6 7 8
Output: texcoords & clone ix
[0,0],0 [1,0],0 [0,1],0 [3,0],0 [2,1],0 [1,2],0 [1,2],1 [0,3],0 [3,2],0
[Figure: top-down traversal of the HP for example keys. L2 (top): 9. L1: cells 3 2 / 3 1 with key intervals [0,3) [3,5) [5,8) [8,9). L0: within the selected 2 × 2 block the cells again partition the local key range, e.g. ∅ [0,1) [1,2) ∅ and ∅ [0,2) [2,3) ∅.]
Notice: multiplicity from base layer:
=1 ⇒ copy once.
>1 ⇒ copy multiple times.
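The traversal can be sketched as follows: at each level the key is compared against the running intervals of the four children, and the remainder at the base layer becomes the clone index. This is an illustrative CPU version; the function name `extract`, the child visiting order and the layout of the base layer are assumptions, chosen so that the result matches the texcoords listed on this slide.

```python
from typing import List, Tuple

Level = List[List[int]]


def extract(pyramid: List[Level], key: int) -> Tuple[int, int, int]:
    """Map an output index to (x, y, clone index) by top-down traversal.

    pyramid[0] is the base layer and pyramid[-1] the 1 x 1 top level.
    """
    if not 0 <= key < pyramid[-1][0][0]:
        raise IndexError("key outside output stream")
    x = y = 0
    # Walk from the level just below the top down to the base layer.
    for level in reversed(pyramid[:-1]):
        x, y = 2 * x, 2 * y
        # Visit the four children in the order (0,0), (1,0), (0,1), (1,1).
        for dx, dy in ((0, 0), (1, 0), (0, 1), (1, 1)):
            count = level[y + dy][x + dx]
            if key < count:
                x, y = x + dx, y + dy
                break
            key -= count  # skip this child's interval
    return x, y, key  # the remainder is the clone index


# Assumed layout of the example pyramid (rows are y, columns are x).
pyramid = [
    [[1, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 2, 0, 1],
     [1, 0, 0, 0]],
    [[3, 2],
     [3, 1]],
    [[9]],
]
point_list = [extract(pyramid, k) for k in range(9)]
```

Every key is resolved independently of the others, which is why extraction parallelizes over output elements.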
Page 8
The Marching Cubes Algorithm
The input to the algorithm is an M × M × M grid of scalar values.
Examine groups of 2 × 2 × 2 voxels (MC cell).
Check if MC cell's corners are inside/outside iso-level.
8 corners, inside/outside ⇒ 256 classes.
Each MC class: combination of edges that pierce iso-surface.
Use table with geometry for MC classes, with all possible triangulations of the edge intersections (figure).
Determine exact edge-surface intersections and emit corresponding triangles.
Notice: Effectively a stream compaction/expansion process!
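The 256 classes follow from packing one inside/outside bit per corner into an 8-bit index. The sketch below illustrates this; the corner ordering and the convention that values below the iso-level count as "inside" are assumptions, and `mc_class` is a hypothetical name.

```python
from typing import Sequence


def mc_class(corner_values: Sequence[float], iso_level: float) -> int:
    """Pack the inside/outside state of 8 cell corners into a class index 0..255."""
    if len(corner_values) != 8:
        raise ValueError("an MC cell has exactly 8 corners")
    index = 0
    for bit, value in enumerate(corner_values):
        if value < iso_level:  # assumed convention: below iso-level means "inside"
            index |= 1 << bit
    return index


all_outside = mc_class([1.0] * 8, iso_level=0.5)
all_inside = mc_class([0.0] * 8, iso_level=0.5)
one_corner = mc_class([0.0] + [1.0] * 7, iso_level=0.5)
```

The class index is then the lookup key into the triangulation table; the table itself is not reproduced here.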
Page 9
HistoPyramid Marching Cubes
[Figure: per-frame pipeline. Steps: Start new frame → Update scalar field → Build HP base → HP reduce (for each level) → Vertex count readback → Render geometry. Inputs and buffers: Iso-level, Scalar field texture, Vertex count texture, HistoPyramid texture, Triangulation table texture, Enumeration VBO.]
Input: A stream of (M − 1)³ MC-cells (2x2x2 voxels grouped).
Predicate: Samples and determines MC class via inside/outside-state of MC cell corners, then writes number of required vertices for MC geometry to base layer.
HistoPyramid: Top element gives total number of vertices in the iso-surface (3× the number of triangles).
Extraction: Use output index to traverse HP, determine corresponding input element (i.e. which MC cell), remainder tells which edge intersection this vertex corresponds to, determine edge intersection and emit position.
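For the last extraction step, a common way to place a vertex on a cell edge is linear interpolation of the scalar field between the two edge endpoints. The slides do not state which interpolation is used, so the sketch below is an assumption, and `edge_intersection` is a hypothetical name.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def edge_intersection(p0: Vec3, p1: Vec3, v0: float, v1: float, iso_level: float) -> Vec3:
    """Linearly interpolate the point on edge p0-p1 where the field equals iso_level."""
    if v0 == v1:
        raise ValueError("field is constant along the edge; no unique crossing")
    t = (iso_level - v0) / (v1 - v0)
    if not 0.0 <= t <= 1.0:
        raise ValueError("iso-level does not cross this edge")
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))


# Field rises from 0.0 to 2.0 along a unit edge on the x axis.
vertex = edge_intersection((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                           v0=0.0, v1=2.0, iso_level=0.5)
```

In the pipeline above, the MC cell and the remainder from the HP traversal would select which edge, and therefore which pair of endpoints, is passed to such a function.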
Page 10
Datasets used in the performance analysis
[Figure: one rendering each of Bunny, CThead, MRbrain, Bonsai, Aneurism, Cayley]
Page 11
Performance of HistoPyramid Marching Cubes

Model     Size         MC cells  Density  6600GT HP-VS  7800GT HP-VS  8800GTX HP-VS   8800GTX NV-SDK10
Bunny     255x255x255  16581375  3.2%     –             –             538.6 (32.5)    –
Bunny     127x127x127  2048383   5.6%     5.4 (2.6)     11.8 (5.8)    309.5 (151.1)   –
Bunny     63x63x63     250047    9.1%     4.0 (16.1)    8.5 (34.1)    163.4 (653.5)   28.3 (113.2)
Bunny     31x31x31     29791     13.6%    2.5 (82.8)    5.0 (167.9)   25.5 (857.0)    21.9 (734.0)
CThead    255x255x128  8323200   3.7%     –             16.3 (2.0)    437.6 (53.0)    –
CThead    127x127x63   1016127   6.3%     5.4 (5.3)     11.6 (11.5)   288.1 (283.6)   –
CThead    63x63x31     123039    9.6%     3.7 (29.9)    7.7 (62.2)    97.3 (791.0)    25.3 (205.9)
CThead    31x31x15     14415     14.5%    2.3 (161.3)   4.5 (311.5)   12.9 (896.4)    17.1 (1187.0)
MRbrain   255x255x128  8323200   5.8%     –             10.5 (1.3)    309.0 (37.4)    –
MRbrain   127x127x63   1016127   7.4%     4.6 (4.5)     9.9 (9.7)     257.7 (263.6)   –
MRbrain   63x63x31     123039    10.0%    3.5 (28.6)    7.4 (60.0)    96.8 (786.5)    26.4 (214.9)
MRbrain   31x31x15     14415     14.9%    2.2 (155.0)   4.3 (300.9)   12.7 (879.7)    18.2 (1257.4)
Bonsai    255x255x255  16581375  3.0%     –             –             560.8 (33.8)    –
Bonsai    127x127x127  2048383   5.1%     5.9 (2.9)     13.0 (6.3)    329.8 (161.0)   –
Bonsai    63x63x63     250047    6.7%     5.4 (21.5)    11.4 (45.5)   186.5 (745.9)   28.9 (115.6)
Bonsai    31x31x31     29791     8.2%     4.1 (136.8)   8.0 (268.8)   25.1 (843.0)    24.0 (804.6)
Aneurism  255x255x255  16581375  1.6%     –             –             892.5 (53.8)    –
Aneurism  127x127x127  2048383   2.1%     12.6 (6.1)    29.1 (14.2)   557.6 (272.2)   –
Aneurism  63x63x63     250047    3.7%     9.1 (36.2)    19.2 (76.7)   190.5 (761.9)   32.9 (131.5)
Aneurism  31x31x31     29791     6.8%     4.5 (149.7)   8.6 (289.1)   25.0 (839.3)    25.5 (856.6)
Cayley    255x255x255  16581375  0.9%     –             –             1112.3 (67.1)   –
Cayley    127x127x127  2048383   1.9%     13.5 (6.6)    31.2 (15.2)   581.3 (283.8)   –
Cayley    63x63x63     250047    3.9%     8.5 (33.9)    17.9 (71.6)   198.0 (791.9)   32.1 (128.5)
Cayley    31x31x31     29791     8.1%     3.7 (123.8)   7.3 (245.8)   25.8 (866.2)    24.7 (827.9)

Numbers are in million voxels processed per second; parentheses give MC runs per second (frame rate). A dash (–) marks entries with no reported value.
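The two numbers in each entry appear to be related by throughput = MC cells × runs per second. This relation is inferred here from the reported values rather than stated on the slide, and the small deviations are presumably due to rounding of the frame rates. A quick check, with the hypothetical helper `mvoxels_per_second`:

```python
def mvoxels_per_second(mc_cells: int, runs_per_second: float) -> float:
    """Throughput in million voxels per second for a given cell count and frame rate."""
    return mc_cells * runs_per_second / 1e6


# 16581375 MC cells at 32.5 runs per second (reported as 538.6).
large_grid = mvoxels_per_second(16_581_375, 32.5)
# 250047 MC cells at 113.2 runs per second (reported as 28.3).
small_grid = mvoxels_per_second(250_047, 113.2)
```

Read this way, the first number measures how well an implementation scales with grid size, while the bracketed frame rate is what an interactive application would observe.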
Page 12
References:
- C. Dyken, G. Ziegler, C. Theobalt, H.-P. Seidel, "GPU Marching Cubes on Shader Model 3.0 and 4.0", MPI-I-2007-4-006, Max-Planck-Institut für Informatik, 2007.
- C. Dyken, J. Seland, and M. Reimers, "Real-Time GPU Silhouette Refinement using adaptively blended Bézier patches", to appear in Graphics Forum, 2007.
- I. Ihrke, G. Ziegler, A. Tevs, C. Theobalt, M. Magnor, H.-P. Seidel, "Eikonal Rendering: Efficient Light Transport in Refractive Objects", to appear in ACM Trans. on Graphics (Siggraph'07), 2007.
- G. Ziegler, A. Tevs, C. Theobalt, H.-P. Seidel, "GPU Point List Generation through Histogram Pyramids", MPI-I-2006-4-002, Max-Planck-Institut für Informatik, 2006.
Page 13