“QSplat: A Multiresolution Point Rendering System
for Large Meshes”
Authors:
Szymon Rusinkiewicz
Marc Levoy
Presentation:
Nathaniel Fout
Motivation
• A quick review…
- Rendering time is a very strong function of scene complexity
- Which class of rendering algorithms is this not true for ?
- Does this pose a problem for rendering the Stanford
Bunny in real time? What about 100 Bunnies?
What about 1000 Bunnies?
( ~ 4.5 billion tri/sec)
- Current graphics hardware:
nVidia Quadro at 17 million tri/sec
Motivation
• Who would want to render 1000 Bunnies in real time?
• Practical Applications:
- rendering complex terrain (games, simulators)
- rendering sampled models of physical objects
• Advances in scanning technology have enabled the creation of very
large meshes with hundreds of millions of polygons
• Conventional rendering will not work. Why not?
Obviously insufficient triangle throughput, but what about storage?
1,000,000 triangles → 36 MB
100,000,000 triangles → 3600 MB
etc…
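The storage figures above are consistent with 36 bytes per triangle. A minimal sketch of that arithmetic follows; the breakdown into three unshared vertices of three 4-byte floats each is an assumption, since the slide gives only the totals, and `mesh_storage_mb` is a hypothetical helper name.

```python
# Assumption: each triangle stores 3 vertices x 3 coordinates x 4-byte floats,
# with no vertex sharing. The slide states only the resulting totals.
BYTES_PER_TRIANGLE = 3 * 3 * 4  # 36 bytes


def mesh_storage_mb(num_triangles: int) -> float:
    """Raw storage in MB (10**6 bytes) for an unindexed triangle list."""
    return num_triangles * BYTES_PER_TRIANGLE / 1e6


for n in (1_000_000, 100_000_000):
    print(f"{n:>11,} triangles -> {mesh_storage_mb(n):,.0f} MB")
```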
Rendering Large Data Sets
• Methodologies for dealing with this problem
1) Visibility Culling – includes frustum culling, backface culling, occlusion culling
2) LOD Control – discrete or fine-grained control
3) Geometric Compression – saves on storage costs, but must be decoded to render
4) Point Rendering – use a simpler primitive, the point, instead of triangles
• Many algorithms use some of these techniques; QSplat
uses all of them
What is QSplat?
QSplat is a point-based rendering system
that uses visibility culling, LOD control,
and geometric compression to render very
large triangular meshes. The core of the
renderer is a hierarchical data structure
consisting of bounding spheres.
General Description
• Basic Idea – instead of rendering all those
polygons, let’s approximate the mesh with points
along the surface
• We can then splat these points on the image
plane; z-buffer takes care of visibility as usual
• Point samples are organized in a hierarchical
fashion using bounding spheres – this facilitates
easy visibility culling, LOD control, and rendering
• Hierarchy construction is a preprocessing step –
it is done only once and the result is saved to disk
Rendering
• The rendering algorithm:
TraverseHierarchy(node) {
    if (node not visible)
        skip this branch of the tree
    else if (node is a leaf node)
        draw a splat
    else if (benefit of recursing further is low)
        draw a splat
    else
        for each child in children(node)
            TraverseHierarchy(child)
}
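The pseudocode above can be sketched as runnable Python. This is an illustration under stated assumptions, not the authors' implementation: the `Node` class, the precomputed `visible` flag, and the use of a size threshold as the "benefit of recursing further is low" test are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    radius: float                 # projected size of the bounding sphere (assumed precomputed)
    visible: bool = True          # stand-in for the frustum / backface tests
    children: List["Node"] = field(default_factory=list)


def traverse(node: Node, threshold: float, draw: Callable[[Node], None]) -> None:
    """Walk the hierarchy, drawing one splat per accepted node."""
    if not node.visible:
        return                    # skip this branch of the tree
    if not node.children:
        draw(node)                # leaf node: always draw
    elif node.radius <= threshold:
        draw(node)                # assumed "low benefit" test: node is already small
    else:
        for child in node.children:
            traverse(child, threshold, draw)


# Usage: one culled leaf, one interior node small enough to draw directly.
leaves = [Node(1.0), Node(1.0, visible=False)]
root = Node(8.0, children=[Node(4.0, children=leaves),
                           Node(2.0, children=[Node(1.0)])])
drawn: List[Node] = []
traverse(root, threshold=3.0, draw=drawn.append)
print([n.radius for n in drawn])
```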
Rendering: Visibility Culling
• Frustum culling is performed by testing the bounding sphere against all six planes of the viewing frustum
• Each node stores a normal cone, which is a collective representation of the normals of the subtree for that node – this cone is used for back face culling
• Occlusion culling is not used
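Both culling tests reduce to a few dot products. The sketch below is a simplified illustration: the plane representation, the function names, and the use of a single direction toward the eye for the whole node (ignoring how that direction varies across the sphere under perspective) are assumptions, not details from the slides.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Plane = Tuple[Vec3, float]  # (inward unit normal n, offset d); inside when n.p + d >= 0


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def sphere_outside_frustum(center: Vec3, radius: float,
                           planes: Sequence[Plane]) -> bool:
    """True if the sphere lies entirely behind at least one frustum plane."""
    return any(dot(n, center) + d < -radius for n, d in planes)


def cone_is_backfacing(axis: Vec3, sin_half_angle: float, to_eye: Vec3) -> bool:
    """True if every normal in the cone faces away from the viewer.

    With phi the angle between the cone axis and the direction to the eye,
    all normals are backfacing when phi >= 90 degrees + half-angle, which is
    equivalent to dot(axis, to_eye) <= -sin(half-angle).
    """
    return dot(axis, to_eye) <= -sin_half_angle


# Usage: a box-shaped "frustum" covering [-1, 1]^3 (six inward-facing planes).
UNIT_BOX = [((1.0, 0.0, 0.0), 1.0), ((-1.0, 0.0, 0.0), 1.0),
            ((0.0, 1.0, 0.0), 1.0), ((0.0, -1.0, 0.0), 1.0),
            ((0.0, 0.0, 1.0), 1.0), ((0.0, 0.0, -1.0), 1.0)]
print(sphere_outside_frustum((3.0, 0.0, 0.0), 1.0, UNIT_BOX))
print(cone_is_backfacing((0.0, 0.0, -1.0), 0.25, (0.0, 0.0, 1.0)))
```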
Rendering: LOD Control
• LOD control is accomplished by adjusting the depth of recursion when traversing the tree
• There are two factors which control the depth of recursion:
- projected screen space area of the bounding sphere
- user-selected frame rate
• If the projected area of the sphere exceeds a threshold value then we descend to the next level
• A feedback adjustment takes place to keep the frame rate at a user-specified value; this adjustment is based simply on the ratio of actual to desired frame rate
• Progressive refinement is initiated once the user stops moving – the area threshold is successively reduced until it is the size of a pixel
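One way the feedback loop and the progressive refinement could look in code is sketched below. The slides say only that the adjustment uses the ratio of actual to desired frame rate and that the threshold shrinks to one pixel; the exact update rule, the halving factor, and both function names are assumptions made for illustration.

```python
from typing import List


def adjust_threshold(threshold: float, actual_fps: float, desired_fps: float,
                     min_threshold: float = 1.0) -> float:
    """Scale the area threshold by desired/actual frame rate.

    Running too slowly (actual < desired) raises the threshold, so the
    traversal stops earlier and draws fewer, larger splats.
    """
    return max(min_threshold, threshold * desired_fps / actual_fps)


def refinement_schedule(threshold: float, factor: float = 0.5,
                        min_threshold: float = 1.0) -> List[float]:
    """Thresholds used once the user stops moving, ending at one pixel."""
    steps = []
    while threshold > min_threshold:
        threshold = max(min_threshold, threshold * factor)
        steps.append(threshold)
    return steps


print(adjust_threshold(10.0, actual_fps=5.0, desired_fps=10.0))
print(refinement_schedule(8.0))
```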
LOD Control
Michelangelo’s statue of St. Matthew:
Threshold | Points | Rendering Time
15 pixels | 130,712 | 132 ms
1 pixel | 14,835,967 | 8308 ms
Preprocessing
• Building the hierarchy tree…
What do the nodes look like?
- Interior nodes will have at most 4 children
- Leaf nodes correspond to vertices
Preprocessing
• Building the hierarchy tree…
- we begin with a list of vertices
- next we find a bounding box which contains the vertices
- find the midpoint vertex along the longest axis of the bounding box
- split the set of vertices into two parts
- this creates the two children (left and right) of the current node
- the current node corresponds to the bounding sphere of the two child nodes
- continue recursively…
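The steps above can be sketched as a short recursive build. This is a minimal illustration, not the authors' code: it reads "midpoint vertex" as the median vertex along the longest axis, uses a fixed `leaf_radius` for every vertex, and the names `SphereNode`, `enclose`, and `build` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


@dataclass
class SphereNode:
    center: Point
    radius: float
    left: Optional["SphereNode"] = None
    right: Optional["SphereNode"] = None


def enclose(a: SphereNode, b: SphereNode) -> Tuple[Point, float]:
    """Smallest sphere containing both child spheres."""
    d = math.dist(a.center, b.center)
    if d + b.radius <= a.radius:      # b lies inside a
        return a.center, a.radius
    if d + a.radius <= b.radius:      # a lies inside b
        return b.center, b.radius
    r = (d + a.radius + b.radius) / 2
    t = (r - a.radius) / d            # move from a's center toward b's center
    center = tuple(pa + (pb - pa) * t for pa, pb in zip(a.center, b.center))
    return center, r


def build(points: List[Point], leaf_radius: float) -> SphereNode:
    if len(points) == 1:
        return SphereNode(points[0], leaf_radius)
    lo = [min(p[i] for p in points) for i in range(3)]
    hi = [max(p[i] for p in points) for i in range(3)]
    axis = max(range(3), key=lambda i: hi[i] - lo[i])  # longest box axis
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                            # assumed: split at median vertex
    left = build(ordered[:mid], leaf_radius)
    right = build(ordered[mid:], leaf_radius)
    center, radius = enclose(left, right)
    return SphereNode(center, radius, left, right)


root = build([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], leaf_radius=0.5)
print(root.center, root.radius)
```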
Preprocessing
• Preprocessing Issues:
- to ensure that there are no holes in the rendering,
we set the leaf node spheres to be a certain size:
if two vertices are joined by an edge, then the
spheres for those vertices are made large
enough to touch each other.
In practice, the sphere at a vertex is set to
the size of the largest bounding sphere of the
triangles that touch that vertex
- to decrease the size of the tree, nodes are combined to
increase the average branching factor to ~4
- after the tree is created the properties of the nodes are
calculated
Design Overview
Design Details
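tree node layout (48 bits per node):
position and radius (13 bits) | tree structure (3 bits) | normal (14 bits) | width of normal cone (2 bits) | color (16 bits)
Field discussed here: position and radius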
• Position and radius of sphere are encoded as offsets relative to the parent
and quantized to 13 values
• Not all of the 13^4 combinations are valid – in fact, only 7621 are valid
• Incremental encoding of geometry essentially spreads out the bits of
information among the levels of the hierarchy
• Note that connectivity information is discarded
• Encoding saves space but increases rendering time due to the
necessity of decoding on-the-fly
• Quantization saves space but pays for it by sacrificing accuracy
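A quick check of the numbers above shows why the count of valid combinations matters. One plausible reading, offered here as an inference rather than something the slide spells out, is that the valid combinations are indexed through a table so that the index fits the 13-bit field; the variable names are illustrative.

```python
LEVELS = 13       # quantization levels per component (from the slide)
COMPONENTS = 4    # x, y, z offsets plus radius
VALID = 7621      # valid combinations (from the slide)

all_combos = LEVELS ** COMPONENTS

# Bits needed to index n distinct values is (n - 1).bit_length().
bits_all = (all_combos - 1).bit_length()
bits_valid = (VALID - 1).bit_length()

print(f"13^4 = {all_combos} combinations need {bits_all} bits")
print(f"{VALID} valid combinations need {bits_valid} bits")
```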
Design Details
tree node layout (bits): 13 | 3 | 14 | 2 | 16
Field discussed here: tree structure (3 bits)
• Information as to the structure of the tree is necessary for traversal
since the number of children may vary
• Normally a pointer is kept for each child; however, if we store the tree
in breadth-first order then we only need one pointer for each group of
siblings
• This one pointer (along with the tree structure bits) is enough for
traversal
• The first two bits represent the number of children: 0, 2, 3, or 4
• The last bit indicates whether or not all children are leaf nodes
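The breadth-first layout can be illustrated with a small flattening routine. For clarity this sketch records a first-child index for every node, then shows that a single pointer per sibling group is enough to recover the rest; the class and function names are hypothetical and the array representation is an assumption.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class TreeNode:
    name: str
    children: List["TreeNode"] = field(default_factory=list)


def flatten_breadth_first(root: TreeNode) -> Tuple[List[str], List[int], List[int]]:
    """Return (names, first_child, num_children) in breadth-first order."""
    names, first_child, num_children = [], [], []
    queue = deque([root])
    next_free = 1  # index where the next group of siblings will land
    while queue:
        node = queue.popleft()
        names.append(node.name)
        num_children.append(len(node.children))
        first_child.append(next_free if node.children else -1)
        next_free += len(node.children)
        queue.extend(node.children)
    return names, first_child, num_children


def child_start(group_pointer: int, sibling_counts: Sequence[int], j: int) -> int:
    """First-child index of sibling j, derived from the group's single pointer.

    Children of consecutive siblings are stored back to back, so only the
    pointer for the first sibling plus the child counts are needed.
    """
    return group_pointer + sum(sibling_counts[:j])


a = TreeNode("A", [TreeNode("A1"), TreeNode("A2"), TreeNode("A3")])
b = TreeNode("B", [TreeNode("B1"), TreeNode("B2")])
names, first_child, num_children = flatten_breadth_first(TreeNode("R", [a, b]))
print(names)
print(first_child)
```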
Design Details
tree node layout (bits): 13 | 3 | 14 | 2 | 16
Field discussed here: normal (14 bits)
• Normals are quantized to 14 bits
• These bits hold an encoded direction: a virtual cube with each face
sub-divided into a 52 x 52 grid represents the possible values
• Grid positions are warped to sample normal space more uniformly
• Unlike the range of positions, normal space is bounded – this makes it
efficient to use a single look-up table for rendering
• Incremental encoding is more expensive to decode and is not used for
normals
• Banding artifacts can be seen in specular highlights
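A cube with six faces of 52 x 52 cells gives 16,224 codes, which fits in 14 bits. The sketch below encodes and decodes a normal on such a grid. It deliberately omits the warping step mentioned above, the code ordering is an assumption, and the function names are illustrative; the input is assumed to be a nonzero vector.

```python
import math
from typing import Sequence, Tuple

GRID = 52  # cells per side on each cube face (from the slide)


def encode_normal(n: Sequence[float]) -> int:
    """Map a nonzero direction to a code in [0, 6 * 52 * 52)."""
    ax = max(range(3), key=lambda i: abs(n[i]))      # dominant axis picks the face
    face = 2 * ax + (1 if n[ax] < 0 else 0)
    u_ax, v_ax = [i for i in range(3) if i != ax]

    def cell(c: float) -> int:
        # Project onto the face ([-1, 1]) and bucket into GRID cells.
        return min(GRID - 1, int((c / abs(n[ax]) + 1.0) / 2.0 * GRID))

    return (face * GRID + cell(n[u_ax])) * GRID + cell(n[v_ax])


def decode_normal(code: int) -> Tuple[float, float, float]:
    """Return the unit vector through the center of the encoded cell."""
    code, v = divmod(code, GRID)
    face, u = divmod(code, GRID)
    ax, sign = face // 2, (-1.0 if face % 2 else 1.0)
    u_ax, v_ax = [i for i in range(3) if i != ax]
    n = [0.0, 0.0, 0.0]
    n[ax] = sign
    n[u_ax] = (u + 0.5) / GRID * 2.0 - 1.0
    n[v_ax] = (v + 0.5) / GRID * 2.0 - 1.0
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)


code = encode_normal((0.0, 0.0, 1.0))
print(code, decode_normal(code))
```

In a renderer, `decode_normal` would typically be evaluated once for every code to fill the look-up table the slide refers to.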
Design Details
tree node layout (bits): 13 | 3 | 14 | 2 | 16
Field discussed here: width of normal cone (2 bits)
• Width of normal cone is quantized to four values:
cones whose half-angles have sines of 1/16, 4/16, 9/16, or 16/16
• On typical data sets, back face culling with these quantized cone
values discards over 90% of nodes which would be discarded were
exact normal cone widths to be used
• Again, incremental encoding could be used, but with a penalty in
rendering time
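The four sine values are the squares of 1/4, 2/4, 3/4, and 4/4. A minimal sketch of quantizing a cone width to a 2-bit code follows; rounding up to the next wider cone, so that culling stays conservative, is an assumption about how the rounding would be done, and `quantize_cone` is a hypothetical name.

```python
# Sines of the four representable half-angles: 1/16, 4/16, 9/16, 16/16.
CONE_SINES = [(k / 4) ** 2 for k in range(1, 5)]


def quantize_cone(sin_half_angle: float) -> int:
    """Return the 2-bit code of the narrowest stored cone that still
    contains the exact cone (assumed conservative rounding)."""
    for code, s in enumerate(CONE_SINES):
        if sin_half_angle <= s:
            return code
    return len(CONE_SINES) - 1  # widest cone: never culled as backfacing


for s in (0.05, 0.1, 0.5, 0.9):
    code = quantize_cone(s)
    print(f"sin = {s:.2f} -> code {code} (stored sin = {CONE_SINES[code]})")
```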
Design Details
tree node layout (bits): 13 | 3 | 14 | 2 | 16
Field discussed here: color (16 bits)
• Colors stored using 16 bits (RGB as 5-6-5)
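Taken together, the five fields add up to 48 bits per node. The sketch below packs and unpacks such a node, including the 5-6-5 color. The field names, their order within the word, and the helper names are assumptions for illustration; only the bit widths come from the slides.

```python
from typing import Dict

# (name, width in bits), in the left-to-right order shown on the slides.
FIELDS = [("pos_radius", 13), ("structure", 3), ("normal", 14),
          ("cone", 2), ("color", 16)]


def rgb_to_565(r: int, g: int, b: int) -> int:
    """Pack 8-bit RGB into 16 bits: 5 bits red, 6 green, 5 blue."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def pack_node(**values: int) -> int:
    """Concatenate the fields into one 48-bit integer."""
    word = 0
    for name, bits in FIELDS:
        v = values[name]
        if not 0 <= v < (1 << bits):
            raise ValueError(f"{name}={v} does not fit in {bits} bits")
        word = (word << bits) | v
    return word


def unpack_node(word: int) -> Dict[str, int]:
    """Split a packed 48-bit word back into its fields."""
    out = {}
    for name, bits in reversed(FIELDS):
        out[name] = word & ((1 << bits) - 1)
        word >>= bits
    return out


node = dict(pos_radius=7620, structure=0b101, normal=12194, cone=2,
            color=rgb_to_565(255, 128, 0))
packed = pack_node(**node)
print(f"{packed:012x}", unpack_node(packed) == node)
```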
Design Details
• file layout
- as a consequence of storing the tree in breadth-first order, the
information necessary to render at low resolution is located
in the first part of the file
- therefore only a working set needs to be loaded into memory;
wait to load in a tree level until it is needed
- this progressive loading may slow frame rates temporarily
when zooming in for the first time, but greatly decreases initial
load time
- speculative prefetching could help to mitigate this problem
Design Details: Splatting
• Splat Shape:
- OpenGL point (rendered as a square)
- opaque circle
- fuzzy spot which decays radially as Gaussian using
alpha blending
• In order to render points as fuzzy spots we need to make sure splats
are drawn in the correct order.
We can accomplish this with multi-pass rendering:
1. Offset depth values by some amount z0
2. Render only into the depth buffer
3. Unset depth offset and render additively
into the color buffer using depth
comparison but not depth update
Design Details: Splatting
• Comparison of splats with a constant size:
• Gaussian kernel exhibits less aliasing
• Relative rendering times for square, circle, and Gaussian are 1, 2,
and 4 respectively
• Constant threshold of 20 pixels
Design Details: Splatting
• Based on this comparison it is better to
use Gaussian kernels, right?
• Not all splats are rendered in the same
amount of time
• What if we allow the threshold to fluctuate,
but constrain the rendering times to be the
same?
• Sample Rate vs. Reconstruction Quality
Design Details: Splatting
• Comparison of splats with constant rendering time:
• Based on rendering time the square is the best splat shape to use
• Note that results will be hardware dependent
Design Details: Splatting
• Another consideration is whether the
splats are always round or if they can be
elliptical (perspectively correct)
• Can use the node normal to determine
eccentricity of ellipse
• Using elliptical splats reduces noise and
enhances the smoothness of silhouettes
• Using ellipses can cause holes to occur
Design Details: Splatting
• A visual comparison of circles vs. ellipses:
Performance
• Typical preprocessing times:
Performance
• Preprocessing timing comparisons:
- Hoppe reports 10 hours for 200,000 vertices
- Luebke and Erikson report 121 seconds for
281,000 vertices
- QSplat can process 200,000 vertices in
under 5 seconds
- Comparisons with mesh simplification for a bunny
with 35,000 vertices:
* Lindstrom and Turk report 30 s to 45 min
* Rossignac and Borrel report less than 1 s
* QSplat takes 0.6 s
Performance
• Rendering Performance:
Performance
• Rendering Performance:
- QSplat can render between 1.5 and 2.5
million points per second
- Hoppe reports 480,000 polygons per second
with progressive meshes
- ROAM system (a terrain rendering system)
reports 180,000 polygons per second
- QSplat can render 250 – 400 thousand points
per second on a laptop with no 3D graphics
hardware
Performance
• Rendering Performance:
- Comparison with polygon rendering…
a) Points
b) Polygons – same number of
primitives and same rendering
time as a)
c) Polygons – same number of
vertices as a) but twice the
rendering time
Conclusions
• QSplat accomplishes its goal of interactive
rendering of very large data sets
• QSplat’s performance both in preprocessing and
rendering is competitive with the fastest
progressive display algorithms and mesh
simplification algorithms
• Geometric compression achieved by QSplat is
close to that of current geometric compression
techniques
• QSplat can be implemented independent of 3D
graphics hardware
Future Work…
• Huffman coding could be used to achieve even greater
compression, but would require further decompression
prior to rendering
• For cases when rendering speed is more important and
storage is not a problem, the incremental encoding could
be removed
• The rendering algorithm could be parallelized by
distributing different parts of the tree to different
processors
• Exploration into using the data structure for ray tracing
acceleration
• Exploration into instancing for scenes
• Exploration into storing additional items in the nodes
such as transparency, etc.
Some final pictures…
QSplat on display at a museum in Florence – some kids kept crashing
the program by zooming in too close.