“QSplat: A Multiresolution Point Rendering System for Large Meshes”
Authors: Szymon Rusinkiewicz, Marc Levoy
Presentation: Nathaniel Fout

Motivation
• A quick review…
  - Rendering time is a very strong function of scene complexity
  - Which class of rendering algorithms is this not true for?
  - Does this pose a problem for rendering the Stanford Bunny in real time? What about 100 Bunnies? What about 1000 Bunnies? (~4.5 billion tri/sec)
  - Current graphics hardware: nVidia Quadro at 17 million tri/sec

Motivation
• Who would want to render 1000 Bunnies in real time?
• Practical applications:
  - rendering complex terrain (games, simulators)
  - rendering sampled models of physical objects
• Advances in scanning technology have enabled the creation of very large meshes with hundreds of millions of polygons
• Conventional rendering will not work. Why not? Obviously insufficient triangle throughput, but what about storage?
    1,000,000 triangles: 36 MB
    100,000,000 triangles: 3600 MB
    etc…

Rendering Large Data Sets
• Methodologies for dealing with this problem:
  1) Visibility Culling – includes frustum culling, backface culling, occlusion culling
  2) LOD Control – discrete or fine-grained control
  3) Geometric Compression – saves on storage costs, but must be decoded to render
  4) Point Rendering – use a simpler primitive, the point, instead of triangles
• Many algorithms use some of these techniques; QSplat uses all of them

What is QSplat?
QSplat is a point-based rendering system that uses visibility culling, LOD control, and geometric compression to render very large triangular meshes. The core of the renderer is a hierarchical data structure consisting of bounding spheres.
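The storage figures quoted on the Motivation slide are consistent with about 36 bytes per triangle. A minimal sketch of that arithmetic follows; the per-triangle breakdown (three vertices of three 4-byte floats, no vertex sharing) and the name `mesh_storage_mb` are assumptions for illustration, not something the slides state.

```python
# Assumption: each triangle stores 3 vertices x 3 coordinates x 4-byte floats,
# with no vertices shared between triangles.
BYTES_PER_TRIANGLE = 3 * 3 * 4  # 36 bytes


def mesh_storage_mb(num_triangles: int) -> float:
    """Approximate raw mesh storage in MB (1 MB taken as 10**6 bytes)."""
    return num_triangles * BYTES_PER_TRIANGLE / 1e6


for n in (1_000_000, 100_000_000):
    print(f"{n:>11,} triangles: {mesh_storage_mb(n):,.0f} MB")
```

Under this assumption the cost grows linearly with triangle count, which is why storage becomes a concern alongside triangle throughput.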
General Description
• Basic idea – instead of rendering all those polygons, let’s approximate the mesh with points along the surface
• We can then splat these points on the image plane; the z-buffer takes care of visibility as usual
• Point samples are organized hierarchically using bounding spheres – this facilitates easy visibility culling, LOD control, and rendering
• Hierarchy construction is a preprocessing step – it is done only once and saved to disk

Rendering
• The rendering algorithm:
    TraverseHierarchy(node)
    {
        if (node not visible)
            skip this branch of the tree
        else if (node is a leaf node)
            draw a splat
        else if (benefit of recursing further is low)
            draw a splat
        else
            for each child in children(node)
                TraverseHierarchy(child)
    }

Rendering: Visibility Culling
• Frustum culling is performed by testing the bounding sphere against all six planes of the viewing frustum
• Each node stores a normal cone, which is a collective representation of the normals of the subtree for that node – this cone is used for back face culling
• Occlusion culling is not used

Rendering: LOD Control
• LOD control is accomplished by adjusting the depth of recursion when traversing the tree
• There are two factors which control the depth of recursion:
  - projected screen-space area of the bounding sphere
  - user-selected frame rate
• If the projected area of the sphere exceeds a threshold value, then we descend to the next level
• A feedback adjustment takes place to keep the frame rate at a user-specified value; this adjustment is based simply on the ratio of actual to desired frame rate
• Progressive refinement is initiated once the user stops moving – the area threshold is successively reduced until it is the size of a pixel

LOD Control
(Figure: Michelangelo’s statue of St. Matthew)
    Threshold    Points        Rendering time
    15 pixels    130,712       132 ms
    1 pixel      14,835,967    8308 ms

Preprocessing
• Building the hierarchy tree… What do the nodes look like?
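The rendering traversal and the LOD control described in the Rendering slides can be sketched as below. This is a simplified illustration, not the authors' implementation: the `Node` class, `is_visible`, `projected_diameter_px`, and `adjust_threshold` are hypothetical names; the visibility test is a placeholder standing in for the frustum and normal-cone tests; the sketch compares projected diameter (rather than area) against the threshold; and the exact feedback formula is an assumption based on the "ratio of actual to desired frame rate" wording.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    # Sphere center in camera space; z is the distance along the view axis.
    center: Tuple[float, float, float]
    radius: float
    children: List["Node"] = field(default_factory=list)


def is_visible(node: Node, near: float = 0.1) -> bool:
    # Placeholder: rejects only spheres entirely behind the near plane.
    # A full version would test all six frustum planes and the normal cone.
    return node.center[2] + node.radius > near


def projected_diameter_px(node: Node, focal_px: float) -> float:
    # Pinhole-camera approximation; clamp depth so spheres that straddle
    # the camera project as very large and are always refined.
    depth = max(node.center[2], 1e-6)
    return 2.0 * node.radius * focal_px / depth


def traverse(node: Node, threshold_px: float, focal_px: float,
             splats: List[Node]) -> None:
    if not is_visible(node):
        return                                  # skip this branch
    if not node.children:
        splats.append(node)                     # leaf: draw a splat
    elif projected_diameter_px(node, focal_px) <= threshold_px:
        splats.append(node)                     # recursing gains little
    else:
        for child in node.children:
            traverse(child, threshold_px, focal_px, splats)


def adjust_threshold(threshold_px: float, actual_fps: float,
                     target_fps: float) -> float:
    # Assumed feedback rule: coarsen when running slower than the target,
    # refine when running faster; never go below one pixel.
    return max(1.0, threshold_px * target_fps / actual_fps)
```

Calling `traverse` with a large threshold stops at coarse nodes, while a threshold of one pixel descends until spheres are pixel-sized or leaves are reached, which mirrors the progressive refinement step.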
  - Interior nodes will have at most 4 children
  - Leaf nodes correspond to vertices

Preprocessing
• Building the hierarchy tree…
  - we begin with a list of vertices
  - next we find a bounding box which contains the vertices
  - find the midpoint vertex along the longest axis of the bounding box
  - split the set of vertices into two parts – this creates the two children (left and right) of the current node
  - the current node corresponds to the bounding sphere of the two child nodes
  - continue recursively…

Preprocessing
• Preprocessing issues:
  - to ensure that there are no holes in the rendering, the leaf node spheres are set to a certain size: if two vertices are joined by an edge, then the spheres for those vertices are made large enough to touch each other. In practice, the sphere at a vertex is sized to the largest bounding sphere of the triangles that touch that vertex
  - to decrease the size of the tree, nodes are combined to increase the average branching factor to ~4
  - after the tree is created, the properties of the nodes are calculated

Design Overview

Design Details: position and radius
Tree node layout (bits): position and radius 13 | tree structure 3 | normal 14 | width of normal cone 2 | color 16
• Position and radius of each sphere are encoded as offsets relative to the parent and quantized to 13 values
• Not all 13^4 combinations are valid – in fact, only 7621 are valid, which is why 13 bits suffice (2^13 = 8192)
• Incremental encoding of geometry essentially spreads out the bits of information among the levels of the hierarchy
• Note that connectivity information is discarded
• Encoding saves space but increases rendering time due to the necessity of decoding on the fly
• Quantization saves space but sacrifices accuracy

Design Details: tree structure
• Information about the structure of the tree is necessary for traversal since the number of children may vary
• Normally a pointer is kept for each child; however, if we store the tree in breadth-first order then we only need one pointer for each group of siblings
• This one pointer
(along with the tree structure bits) is enough for traversal
• The first two bits represent the number of children: 0, 2, 3, or 4
• The last bit indicates whether or not all children are leaf nodes

Design Details: normal
• Normals are quantized to 14 bits
• These bits hold an encoded direction: a virtual cube with each face subdivided into a 52 × 52 grid represents the possible values
• Grid positions are warped to sample normal space more uniformly
• Unlike the range of positions, normal space is bounded – this makes it efficient to use a single look-up table for rendering
• Incremental encoding is more expensive to decode and is not used for normals
• Banding artifacts can be seen in specular highlights

Design Details: width of normal cone
• Width of the normal cone is quantized to four values: cones whose half-angles have sines of 1/16, 4/16, 9/16, or 16/16
• On typical data sets, back face culling with these quantized cone values discards over 90% of the nodes that would be discarded if exact normal cone widths were used
• Again, incremental encoding could be used, but with a penalty in rendering time

Design Details: color
• Colors are stored using 16 bits (RGB as 5-6-5)

Design Details: file layout
  - as a consequence of storing the tree in breadth-first order, the information necessary to render at low resolution is located in the first part of the file
  - therefore only a working set needs to be loaded into memory; wait to load in a tree level until it is needed
  - this progressive loading may slow frame rates temporarily when zooming in for the first time, but greatly reduces initial load time
  - speculative prefetching could help to mitigate this problem

Design Details: Splatting
• Splat shape:
  - OpenGL point (rendered as a square)
  - opaque circle
  - fuzzy spot which decays radially as a Gaussian, using alpha blending
• In order to render points as fuzzy spots we need to make sure
splats are drawn in the correct order. We can accomplish this with multi-pass rendering:
  1. Offset depth values by some amount z0
  2. Render only into the depth buffer
  3. Unset the depth offset and render additively into the color buffer using depth comparison but not depth update

Design Details: Splatting
• Comparison of splats with a constant size:
  - Gaussian kernel exhibits less aliasing
  - Relative rendering times for square, circle, and Gaussian are 1, 2, and 4 respectively
  - Constant threshold of 20 pixels

Design Details: Splatting
• Based on this comparison it is better to use Gaussian kernels, right?
• Not all splats are rendered in the same amount of time
• What if we allow the threshold to fluctuate, but constrain the rendering times to be the same?
• Sample rate vs. reconstruction quality

Design Details: Splatting
• Comparison of splats with constant rendering time:
  - Based on rendering time, the square is the best splat shape to use
  - Note that results will be hardware dependent

Design Details: Splatting
• Another consideration is whether the splats are always round or if they can be elliptical (perspectively correct)
• Can use the node normal to determine the eccentricity of the ellipse
• Using elliptical splats reduces noise and enhances the smoothness of silhouettes
• Using ellipses can cause holes to occur

Design Details: Splatting
• A visual comparison of circles vs.
ellipses:

Performance
• Typical preprocessing times:

Performance
• Preprocessing timing comparisons:
  - Hoppe reports 10 hours for 200,000 vertices
  - Luebke and Erikson report 121 seconds for 281,000 vertices
  - QSplat can process 200,000 vertices in under 5 seconds
  - Comparisons with mesh simplification for a bunny with 35,000 vertices:
    * Lindstrom and Turk report 30 s to 45 min
    * Rossignac and Borrel report less than 1 s
    * QSplat takes 0.6 s

Performance
• Rendering performance:

Performance
• Rendering performance:
  - QSplat can render between 1.5 and 2.5 million points per second
  - Hoppe reports 480,000 polygons per second with progressive meshes
  - ROAM system (a terrain rendering system) reports 180,000 polygons per second
  - QSplat can render 250 – 400 thousand points per second on a laptop with no 3D graphics hardware

Performance
• Rendering performance:
  - Comparison with polygon rendering…
    a) Points
    b) Polygons – same number of primitives and same rendering time as a)
    c) Polygons – same number of vertices as a) but twice the rendering time

Conclusions
• QSplat accomplishes its goal of interactive rendering of very large data sets
• QSplat’s performance in both preprocessing and rendering is competitive with the fastest progressive display algorithms and mesh simplification algorithms
• Geometric compression achieved by QSplat is close to that of current geometric compression techniques
• QSplat can be implemented independently of 3D graphics hardware

Future Work…
• Huffman coding could be used to achieve even greater compression, but would require further decompression prior to rendering
• For cases when rendering speed is more important and storage is not a problem, the incremental encoding could be removed
• The rendering algorithm could be parallelized by distributing different parts of the tree to different processors
• Exploration into using the data structure for ray tracing acceleration
• Exploration into instancing for scenes
• Exploration into storing
additional items in the nodes such as transparency, etc.

Some final pictures…
QSplat on display at a museum in Florence – some kids kept crashing the program by zooming in too close.
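As a recap of the Design Details slides, the node layout of 13 + 3 + 14 + 2 + 16 = 48 bits can be illustrated with a small bit-packing sketch. The field names, the choice to pack fields most-significant-first in slide order, and the helper names `pack_node`, `unpack_node`, and `rgb565` are all assumptions for illustration; the slides give only the field widths and the 5-6-5 color split.

```python
# Field widths in bits, in the order shown on the slides.
# The names and the most-significant-first packing order are assumptions.
FIELDS = (
    ("position_radius", 13),
    ("tree_structure", 3),
    ("normal", 14),
    ("cone_width", 2),
    ("color", 16),
)
NODE_BITS = sum(width for _, width in FIELDS)  # 48


def pack_node(**values: int) -> int:
    """Pack the five quantized fields into one 48-bit integer."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_node(word: int) -> dict:
    """Inverse of pack_node: peel fields off from the least significant end."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out


def rgb565(r: int, g: int, b: int) -> int:
    """Quantize 8-bit RGB channels to a 16-bit 5-6-5 color."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)
```

At 6 bytes per node the structure is far smaller than a raw triangle, at the cost of decoding each field during traversal.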