Visibility Culling
Roger A. Crawfis
CIS 781
The Ohio State University

Interactive Frame Rates Are Difficult To Achieve

The Problem
• Two keys for an interactive system
  – Interactive rendering speed: too many polygons – difficult!!
  – Uniform frame rate: varied scene complexity – difficult!!

Possible Solutions
• Visibility Culling
  – back face culling, frustum culling, occlusion culling (might not be sufficient)
• Levels of Detail (LOD)
  – hierarchical structures and choose one to satisfy the frame rate requirement

LOD Selections
How to pick the optimal ones?

Occlusion Culling
• Hidden Surface Removal methods are not fast enough for massive models on current hardware
• Occlusion Culling avoids rendering primitives that are occluded by another part of the scene
• Occlusion Culling techniques are ideally output sensitive
  – runtime is proportional to the size of the exact visibility set

Related Work
• Hierarchical Z-Buffer
  – Image space occlusion culling method [Greene ’93]
  – Build a layered Z-pyramid with a different resolution of the Z-buffer at each level
  – Allows quick accept/reject
• Hierarchical LODs
  – Simplification Culling: approximate an entire branch of the scene graph by an HLOD
  – Can we use HLODs as occluders/occludees?

Visibility in Games
• What do we need it for?
  – Increase of rendering speed by removing unseen scene data from the rendering pipeline as early as possible
  – Reduction of data transfers to the graphics hardware
  – Current games would not be possible without visibility calculations

Visibility Methods
• 2 very different categories:
  – Visibility from a region (Portals, PVS)
    • Quake, Unreal, Severance and co.
  – Visibility from a point (Z-Buffer, BFC, ...)
    • Racing games, outdoor scenes, sports games etc.
Point-Visibility Occlusion
• Traditionally used:
  – Back-face culling
  – Z-buffering
  – View-frustum culling
    • Octree
    • Quadtree

A PSX Example
• Iron Soldier 3 on PSX:
  – View-frustum culling based on a quadtree
  – Back-face culling
  – Painter's algorithm
Only culling to the left and right sides of the viewing frustum.

New Occlusion Methods
• Image-space occlusion culling
  – Hierarchical Z-Buffering
  – Hierarchical Occlusion Maps
• Object-space occlusion culling
  – Hierarchical View Frustum culling
  – Hierarchical Back-Face culling

Visibility Culling
• We will look at these:
  – Hierarchical back-face culling
  – View-frustum culling
  – Occlusion culling
  – Detail culling

Hierarchical Back-Face Culling
• Partitions each model into clusters
• Primitives in one cluster:
  – Face in similar directions
  – Lie close to each other
• If the cluster fails the visibility test, all primitives in this cluster are culled

Hierarchical Back-Face Culling: Normal Maps
• Create a data structure that places each polygon in a space according to its normal direction (phi, theta).
• Partition this space and then look only at those partitions that might have visible polygons.

View-Frustum Culling
• Remove objects that are outside the viewing frustum
  1. Construct bounding volumes (BVs)
  2. Create a hierarchy
  3. BV/view-frustum intersection tests
• Mostly done in the “Application Stage”

View-Frustum Culling
• Culling against bounding volumes to save time
• Bounding volumes
  – AABB, OBB, spheres, etc.
  – Easy to compute, as tight as possible
[Figure: AABB, sphere, and OBB bounding volumes]

View-Frustum Culling
• Often done hierarchically to save time
• In-order, top-down traversal and test

View-Frustum Culling
• Two popular hierarchical data structures: BSP tree and octree
[Figure: axis-aligned BSP and polygon-aligned BSP; label: “Intersecting?”]
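The three steps listed for view-frustum culling (bounding volumes, hierarchy, BV/view-frustum tests) can be sketched as follows. This is only an illustration under stated assumptions: the frustum is given as a list of planes whose normals point inward, the bounding volumes are AABBs, and the names `Plane`, `AABB`, `Node`, `classify_aabb`, and `cull` are hypothetical and not taken from the slides. The test checks only two box corners per plane, and it is conservative: a box near a frustum corner may be reported as intersecting although it is outside.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Plane:
    """Plane n.p + d = 0 whose normal points into the frustum (assumed convention)."""
    normal: Vec3
    d: float


@dataclass
class AABB:
    lo: Vec3
    hi: Vec3


@dataclass
class Node:
    """Hypothetical bounding-volume hierarchy node."""
    box: AABB
    objects: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def signed_distance(pl: Plane, p: Vec3) -> float:
    return sum(n * c for n, c in zip(pl.normal, p)) + pl.d


def classify_aabb(box: AABB, planes: List[Plane]) -> str:
    """Return 'outside', 'inside', or 'intersecting' using two corners per plane."""
    result = "inside"
    for pl in planes:
        # Corner farthest along the plane normal, and the opposite corner.
        far = tuple(box.hi[i] if pl.normal[i] >= 0 else box.lo[i] for i in range(3))
        near = tuple(box.lo[i] if pl.normal[i] >= 0 else box.hi[i] for i in range(3))
        if signed_distance(pl, far) < 0:
            return "outside"          # even the best corner is behind this plane
        if signed_distance(pl, near) < 0:
            result = "intersecting"   # the box straddles this plane
    return result


def cull(node: Node, planes: List[Plane], visible: List[str],
         parent_inside: bool = False) -> None:
    """Top-down traversal; children of a fully inside node skip the plane tests."""
    state = "inside" if parent_inside else classify_aabb(node.box, planes)
    if state == "outside":
        return                        # the whole subtree is culled
    visible.extend(node.objects)
    for child in node.children:
        cull(child, planes, visible, parent_inside=(state == "inside"))
```

Skipping the tests for children of a fully inside node is one simple way to exploit the hierarchy; a production implementation would typically also cache per-plane results.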
View-Frustum Culling
• Octree
• A parent has 8 children
• Subdivide the space until the number of primitives within each leaf node is less than a threshold
• In-order, top-down traversal

Hierarchical Z-Buffer
• The Z-buffer is arranged in an image pyramid.
• The scene is partitioned in an octree.
• Octree nodes are tested against the Z-pyramid where pixels have the same size.
• Visible nodes serve as input for the next frame.
• Relies on a HW visibility query.

HZB/Hierarchical Occlusion Maps

Hierarchical Occlusion Maps
• Potential occluders are pre-selected
• These occluders are rendered to the occlusion map. The hierarchy can be built with MIP-mapping HW
• Depth test after occlusion test
• Separate depth estimation buffer

Hierarchical View Frustum Culling
• Speeds up VFC by testing only 2 box corners of a bounding box first.
• Plane coherency during frame advancing
• Test against VF-octants.
• BB-child masking

Detail Culling
• A technique that sacrifices quality for speed
• Based on the size of the projected BV: if it is too small, discard it.
• Also often done hierarchically.
It always helps to create a hierarchical structure, or scene graph.

Occlusion Culling
• Discard objects that are occluded
• The Z-buffer is not the smartest algorithm in the world (particularly for scenes with high depth complexity)
• We want to avoid processing invisible objects

Occlusion Culling
OcclusionCulling (G)
  Or = empty
  For each object g in G
    if (isOccluded(g, Or))
      skip g
    else
      render (g)
      update (Or)
    end
  End
G: input graphics data
Or: occlusion representation
The problems:
1. Algorithms for isOccluded()
2. Fast update of Or

Hierarchical Visibility
• Object-space octree
  – Primitives in an octree node are hidden if the octree node (cube) is hidden
  – An octree cube is hidden if its 6 faces are hidden polygons
  – Hierarchical visibility test:

Hierarchical Visibility (obj-sp.)
From the root of the octree:
• View-frustum culling
• Scan-convert each of the 6 faces and perform z-buffering
• If all 6 faces are hidden, discard the entire node and its sub-branches
• Otherwise, render the primitives here and traverse the children recursively in front-to-back order
A conservative algorithm. Why?

Hierarchical Visibility (obj-sp.)
• Scan-converting the octree faces can be expensive: they cover a large number of pixels (overhead)
• How can we reduce the overhead?
• Goal: quickly conclude that a large polygon is hidden
• Method: use a hierarchical z-buffer!

Hierarchical Z-buffer
An image-space approach
• Create a Z-pyramid
[Figure: Z-pyramid levels: original Z-buffer, ½ resolution, ¼ resolution, 1 value]

Hierarchical Z-buffer (2)
[Figure: a 4x4 block of depth values is reduced to 2x2 and then to a single value; at each step keep the maximum value (9 at the top)]

Hierarchical Z-buffer update
Visibility (OctreeNode N)
  if (isOccluded (N, Zp)) then return;
  for each primitive p in N
    render and update Zp
  end
  for each child node C of N in front-to-back order
    Visibility (C)
  end

Some Practical Issues
• A fast software algorithm
• Lack of hardware support
  – Scan conversion
  – Efficient query of whether a polygon is visible (without rendering it)
  – Z feedback

Combining with hardware
• Utilizing frame-to-frame coherence
  – First frame: regular HZ algorithm (software)
    • Remember the visible octree nodes
  – Second frame (view changes slightly)
    • Render the previously visible nodes using OpenGL
    • Read back the Z-buffer and construct the Z-pyramid
    • Perform regular HZ (software)
  – What about the third frame?
  – Utilizing hardware to perform rendering and Z-buffering is considerably faster

Hierarchical Occlusion Map
Zhang et al., SIGGRAPH 97

Basic Ideas
• Choose a set of graphics objects from the scene as occluders
• Use the occluders to define an occlusion map (hierarchically)
• Compare the rest of the scene against the occlusion map

Example
Blue: occluders
Red: occludees

Algorithm Pipeline
[Figure: pipeline. Occluder database: viewing frustum culling, occluder selection, rendering, build occlusion map hierarchy. Real scene: viewing frustum culling, occlusion test.]

2-Step Occlusion Test
1. Overlap test
2. Depth test
Overlap + Depth = Occlusion

Why Decomposition?
• The occlusion test is done approximately (conservatively)
• We can afford to be more conservative in the depth test than in the overlap test

Overlap Test – Occlusion Map
• Representation of projection for the overlap test: the occlusion map
• A gray-scale image: each pixel represents one block of a screen region
• Generated by rendering the occluders

Occlusion Map (OM)
• Each pixel of the occlusion map has an opacity p, which represents the ratio of the sum of the opaque areas in the block to the total area.
• If fully covered, p = 1; for an anti-aliased pixel, p < 1
• Occlusion map: the alpha channel of an image

Overlap Test using OM
• For each potential occludee, we can scan-convert it and compare against the opacity of the pixels it overlaps. Expensive!
• Conservative approximation: use the screen-space bounding box of the occludee (a superset of the actual covered pixels)
• If all the pixels inside the bounding box are opaque, the object is occluded.

Hierarchical Occlusion Map
• Like the hierarchical Z-buffer, we can create a hierarchy to speed up the comparison (for large objects)
• Each low-resolution pixel is the average of the corresponding high-resolution pixels

Overlap Test using HOM
Basic algorithm
1. Start from the lowest resolution
2. If the pixel covering the bounding rectangle has a value of 1, the object is occluded
3. Otherwise traverse down the hierarchy:
   • If all children = 1: occluded
   • If all children = 0: not occluded
   • Otherwise, traverse down further

Approximate Overlap Test
• Instead of concluding that an object is occluded only when the bounding box is within pixels with opacity 1, we can use a threshold in [0,1]
• Early termination at the high levels of the hierarchy
• What does it mean when a block has high opacity but not one?
This is the unique feature of HOM!

Depth Test
Approximate Z (depth) test:
• A single Z plane
  – A single Z plane separates the occluders from the occludees.

Depth Test
• Break the screen into small regions
• Build at each frame
• Instead of using the Z-buffer, use the farthest Z of the occluders' bounding volumes
• Compare against each potential occludee's nearest Z (conservative test)

Occluder Selection
• Ideal occluders: the visible objects (it's a joke)
• View-dependent occluders: too expensive
• Solution: estimate and build an occluder database
  – Discard objects that do not serve as good occluders

Occluder Selection
• Size: not too small
• Redundancy: detail polygons (e.g. a clock on the wall)
• Complexity: complex polygons are not preferred (why?)
• Done at run time: sort the occluders by depth and add them in order until the polygon count is reached.
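The run-time occluder selection described above (discard small objects, sort by depth, add until the polygon count is reached) can be sketched as follows. The `Candidate` fields, the `min_area` cutoff, and the choice to stop at the first candidate that would exceed the budget are assumptions made for illustration; the slides do not specify them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical entry of the occluder database."""
    name: str
    depth: float        # distance from the viewpoint
    polygons: int       # polygon count of the occluder
    screen_area: float  # estimated projected size (assumed unit: pixels)


def select_occluders(cands: List[Candidate], polygon_budget: int,
                     min_area: float) -> List[Candidate]:
    """Pick near, sufficiently large occluders until the polygon budget is used up."""
    chosen: List[Candidate] = []
    used = 0
    for c in sorted(cands, key=lambda c: c.depth):   # nearest first
        if c.screen_area < min_area:
            continue      # too small to be a good occluder (e.g. a clock on the wall)
        if used + c.polygons > polygon_budget:
            break         # budget reached: stop adding occluders
        chosen.append(c)
        used += c.polygons
    return chosen
```

Sorting nearest-first reflects that nearby objects tend to cover more of the screen; the polygon budget bounds the cost of rendering the occluders each frame.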
OPS – View-independent Occluders
[Figure: view-independent occluders; axes X, Z]

OPS – View-dependent Occluders

Occluders
– In practice, use traditional, static LODs
  • More restrictive view-independent OPS
  • Well-studied and available
  • Low run-time overhead
  • Shared with final rendering, no extra memory
  • Area-preserving [Erikson 98]

Occluder Selection
• At run time
  – Distance-based selection with a polygon budget
  – Temporal coherence
• Visibility sampling
  – Pre-compute visible objects on a 3-D grid
  – Facilitates run-time selection

Implementation
• A two-pass framework
[Figure: two-pass framework; stages: scene database, view frustum culling, occluder selection, LOD, rendering, build occlusion representation, occlusion culling, LOD]

Results
• The city model
  – 312,524 polygons
  – Single CPU
  – 5,000 occluder polygons
  – Depth estimation buffer
  – Opacity thresholds 1.0
  – Lighting; display lists; no triangle strips
[Plot: frame rate (fps, 0 to 90) versus frame # (1 to 601); series: OC+VFC, VFC only]
[Plot: number of remaining triangles (0 to 320,000) versus frame # (1 to 601); series: total, after VFC, after VFC+OC, ideal]

Results
• Auxiliary Machine Room (AMR)
  – 632,252 polygons
  – 3 CPUs
  – 25,000 occluder polygons
  – No-background z-buffer
  – Approximate culling (0.85 for level 64x64)
  – LOD
  – Lighting; display lists; no triangle strips
[Plot: frame rate (fps, 0 to 8) versus frame # (1 to 401); series: LOD+VFC+OC, LOD+VFC]
[Plot: number of remaining triangles (0 to 700,000) versus frame # (1 to 401); series: original model, after LOD, after LOD+VFC, after LOD+VFC+OC, ideal]
[Plot: number of triangles culled by OC (60,000 to 180,000) versus frame # (1 to 401); series: OT=1.0, OT=0.8]

Results
• The power plant model
  – 15 million triangles
  – 3 CPUs
  – Visibility pre-processing on a 20x20 grid (~15 min)
  – No-background z-buffer
  – 18,000 occluder polygons
  – Opacity thresholds from 0.85 and up
  – LOD
[Plot: frame rate (fps, 0 to 18) versus frame # (1 to 401); series: LOD+VFC+OC, LOD+VFC]
[Plot: number of remaining polygons versus frame # (1 to 401), with a broken axis (100,000 to 800,000 and 11,400,000 to 14,400,000); series: original model, after LOD, after LOD+VFC, after LOD+VFC+OC, ideal]

Conclusion
• Goals achieved
  – Generality
    • Any model, any occluder
    • Occluder fusion
  – Speed-up
    • Accelerate interactive graphics
  – Ease of implementation
    • Configurability
    • Robustness

HP Hardware Occlusion
• Extend OpenGL: add an OCCLUSION_MODE
• The bounding box of an object is scan-converted
• A flag is set if any pixel of the BB faces is visible
• Only need to read back one flag, instead of the entire frame buffer
• Tradeoff: valuable rendering time is used to render useless BB faces (needs to be used wisely)
• Speedups of 25%-100% were reportedly observed

The Real World
• Scientific approaches are often too complicated
• Science often uses models with hundreds of thousands of vertices; games don't. (LOD)
• Game developers “pick” ideas from different algorithms
• Research has an impact on hardware design!

Gaming Industry
• Parts of the Hierarchical Z-Buffer (HZB) are used sometimes
• Runtime LOD is used as input for a simple HZB
• View Frustum Culling (VFC) is almost always used.
• Hierarchical Occlusion Maps introduce too much overhead for games, and the z-buffer is there anyway

The Real World (3)
• PSX-One doesn't even have a z-buffer
• ATI's Radeon has parts of an HZB (called Hyper-Z)
• GeForce2 only has a z-buffer
• GeForce3 is similar to Radeon, but supports an HZB visibility query
• Dreamcast's PowerVR2 works quite differently (infinite planes)

Conclusions
• Visibility algorithms are used in many different applications
  – Occlusion culling
  – Shadow calculations
  – Radiosity
  – Volumetric lights
• All these fields benefit from advances in visibility techniques

Recap
• Visibility culling: don't render what can't be seen
  – Off-screen: view-frustum culling
  – Z-buffered away: occlusion culling
• Cells and portals
  – Works well for architectural models
  – Teller: accurate, complex, a bit slow
  – pfPortals: fast, cheap, easy

Hierarchical Z-Buffer
• Q: What do you think this is?
• Replace the Z-buffer with a Z-pyramid
  – Lowest level: full-resolution Z-buffer
  – Higher levels: each pixel represents what?
• A: The maximum distance of geometry visible to the four pixels “underneath” it
• Q: How is this going to help?

Hierarchical Z-Buffer
• Idea: test the polygon against the highest level first
  – If the polygon is farther than the distance recorded in the pixel, stop: it's occluded
  – If the polygon is closer, recursively check against the next lower level
• Amounts to hierarchical rasterization of the polygon, with early termination
  – Must update higher levels as we go

Hierarchical Z-Buffer
• The Z-pyramid exploits image-space coherence: a polygon occluded in one pixel is probably occluded nearby
• HZB also exploits object-space coherence: polygons near an occluded polygon are probably occluded
• Q: How might you use object-space coherence?
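The Z-pyramid described in these slides can be sketched in a few lines. This is a minimal sketch under assumptions not stated in the slides: the Z-buffer is a square, power-of-two grid stored as nested lists, larger z means farther away, and the object being tested is reduced to a screen rectangle plus its nearest depth. The function names and the sample depth values are illustrative only.

```python
from typing import List

Grid = List[List[float]]


def build_z_pyramid(zbuf: Grid) -> List[Grid]:
    """levels[0] is the full-resolution Z-buffer; levels[-1] is a single value.

    Each coarser pixel keeps the maximum (farthest) depth of the 2x2 block below it.
    """
    levels = [zbuf]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        n = len(prev) // 2
        levels.append([[max(prev[2 * y][2 * x], prev[2 * y][2 * x + 1],
                            prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1])
                        for x in range(n)] for y in range(n)])
    return levels


def is_occluded(levels: List[Grid], x0: int, y0: int, x1: int, y1: int,
                z_near: float, level: int = -1, px: int = 0, py: int = 0) -> bool:
    """Test a screen rectangle (inclusive full-resolution pixel coordinates)."""
    if level < 0:
        level = len(levels) - 1           # start at the coarsest level
    size = 1 << level                      # full-res pixels covered per pyramid pixel
    if (px * size > x1 or (px + 1) * size - 1 < x0 or
            py * size > y1 or (py + 1) * size - 1 < y0):
        return True                        # block does not touch the rectangle
    if z_near > levels[level][py][px]:
        return True                        # farther than everything in this block
    if level == 0:
        return False                       # a full-resolution pixel may be visible
    return all(is_occluded(levels, x0, y0, x1, y1, z_near,
                           level - 1, 2 * px + dx, 2 * py + dy)
               for dy in (0, 1) for dx in (0, 1))


# Illustrative 4x4 depth values (not taken from the slides' figure layout).
zbuf = [[7, 1, 0, 3],
        [0, 1, 6, 2],
        [7, 6, 3, 9],
        [1, 2, 9, 2]]
pyramid = build_z_pyramid(zbuf)
```

An object whose nearest depth is greater than the single top value is rejected with one comparison; closer objects descend only into the blocks their rectangle overlaps.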
Hierarchical Z-Buffer
• Subdivide the scene with an octree
• All geometry in an octree node is contained by a cube
• Before rendering the contents of a node, “render” the faces of its cube
• If the cube faces are occluded, ignore the entire node
• Query the Z-pyramid to “render” cubes

Hierarchical Z-Buffer
• Exploit temporal coherence (What?)
• HZB operates at max efficiency when the Z-pyramid is already built
• Idea: most polygons affecting the Z-buffer (“nearest polygons”) are the same from frame to frame
• So start by rendering the polygons (octree nodes) visible last frame

Hierarchical Occlusion Maps
stolen by Dave Luebke from the Ph.D. defense presentation of:
Hansong Zhang
Department of Computer Science
UNC-Chapel Hill

Visibility Culling
• Discard objects not visible to the viewer
[Figure: view point and view frustum, illustrating view-frustum culling, back-face culling, and occlusion culling]

Hierarchical Occlusion Maps: Overview
Blue parts: occluders
Red parts: occludees

Effective Algorithms
• Generality
  – Arbitrary models
• Speed-up
  – Significant, fast culling for interactive graphics
• Portability
  – Few hardware assumptions
• Robustness

Thesis Statement
• By properly decomposing the occlusion-culling problem and efficiently representing occlusion, we can obtain effective algorithms and systems for occlusion culling.
Observations
• Want to handle cumulative occlusion
[Figure: view point with objects A and B]

Observations
• Want an occlusion representation (OR)
  – Fast to compute
  – Fast to use
[Figure: view point with objects A and B]

Observations
• Progressive occlusion culling
  Initialize OR to null
  for each object
    Occlusion test against OR
    if culled
      Discard object
    else
      Render object
      Update OR

Observations
• Multi-pass occlusion culling
  Initialize OR to null; initialize PO to empty   (PO: the set of potential occluders)
  for each object
    Occlusion test against OR
    if culled
      Discard object
    else
      Render object
      Add object to PO
    if PO is large enough
      Update OR with objects in PO
  #passes = #updates

Observations
• Special case: one-pass occlusion culling
  – Select occluders until PO is large enough
  – Update (build) the occlusion representation
  – Occlusion culling & final rendering

Problem Decomposition
• Occlusion = depth + overlap
[Figure: view point; axes X, Y, Z]

Problem Decomposition
• Verifying occlusion
  – Overlap tests
    • Based on representations for projection
  – Depth tests
    • Based on representations for depth

Occlusion Maps
[Figure: rendered image and the corresponding occlusion map]

Occlusion Maps
– An occlusion map
  • Corresponds to a screen subdivision
  • Records average opacity for each partition
– Can be generated by rendering occluders
  • Record pixel opacities (pixel coverage)
– Merge projections of occluders
– Represent occlusion in image space

Occlusion Map Pyramid
[Figure: occlusion maps at 64 x 64, 32 x 32, and 16 x 16]

Occlusion Map Pyramid
• Analyzing cumulative projection
  – A hierarchy of occlusion maps (HOM)
  – Made by recursive averaging (low-pass filtering)
  – Record average opacities for blocks of pixels
  – Represent occlusion at multiple resolutions
  – Construction accelerated by hardware

Overlap Tests
– Problem: is the projection of the tested object inside the cumulative projection of the occluders?
– Cumulative projection of occluders: the pyramid
– Projection of the tested object
  • Conservative overestimation
    – Bounding boxes (BB)
    – Bounding rectangles (BR) of BBs

Overlap Tests
• The basic algorithm
  Given: HOM pyramid; the object to be tested
  • Compute BR and the initial level in the pyramid
  • for each pixel touched by the BR
      if pixel is fully opaque
        continue
      else if level = 0
        return FALSE
      else
        descend...

Overlap Tests
• Evaluating opacity: early termination
  – Conservative rejection
  – Aggressive approximate culling
  – Predictive rejection

Conservative Rejection
– A low-opacity pixel does not correspond to many high-opacity pixels at finer levels
– The transparency threshold
[Figure: example opacity values: 1 1 1 1 1 0.8 1 1 0.9 0.9 0.1 0 0.2 0.3 0 0]

Aggressive Approximate Culling
• Ignoring barely-visible objects
  – Small holes in or among objects
  – To ignore the small holes
    • LPF suppresses noise: holes “dissolve”
    • Thresholding: regard “very high” opacity as fully opaque
  – The opacity threshold: the opacity above which a pixel is considered to be fully opaque

Aggressive Approximate Culling
[Figure: pyramid levels 0 1 2 3 4]

Aggressive Approximate Culling
– Further descent not necessary when fully opaque
  • Tests terminated before holes are reached
– Need different opacity thresholds for each level

Predictive Rejection
– Terminate the test knowing it must fail later...
[Figure: example opacity values: 1 1 1 1 1 0.8 1 1 1 1 1 0 0.2 0.3 0 0]

Summary: Levels of Visibility
• The continuum between being visible and non-visible
  – Occlusion maps: almost transparent (low opacity) to almost opaque (high opacity)
  – Potential occludees: almost visible to almost non-visible

Resolving Depth
• What’s left of the occlusion test?
  “A occludes B” = “A’s projection contains B’s” + ?
[Figure: objects A and B; B does not occlude any part of A]
Another interpretation...
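The basic overlap test and the opacity threshold from the slides above can be sketched as follows. Assumptions made for illustration: the finest occlusion map is a square, power-of-two grid of opacities in [0, 1] stored as nested lists, a single opacity threshold is used for all levels (the slides note that each level needs its own), and the test starts at the coarsest level rather than at a computed initial level. The function names and sample values are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]


def build_hom(opacity: Grid) -> List[Grid]:
    """levels[0] is the finest occlusion map; each coarser pixel averages a 2x2 block."""
    levels = [opacity]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        n = len(prev) // 2
        levels.append([[(prev[2 * y][2 * x] + prev[2 * y][2 * x + 1] +
                         prev[2 * y + 1][2 * x] + prev[2 * y + 1][2 * x + 1]) / 4.0
                        for x in range(n)] for y in range(n)])
    return levels


def overlap_test(levels: List[Grid], rect: Tuple[int, int, int, int],
                 opacity_threshold: float = 1.0,
                 level: int = -1, px: int = 0, py: int = 0) -> bool:
    """True if the bounding rectangle (inclusive finest-level pixels) is covered."""
    x0, y0, x1, y1 = rect
    if level < 0:
        level = len(levels) - 1
    size = 1 << level
    if (px * size > x1 or (px + 1) * size - 1 < x0 or
            py * size > y1 or (py + 1) * size - 1 < y0):
        return True                      # block not touched by the rectangle
    if levels[level][py][px] >= opacity_threshold:
        return True                      # regarded as fully opaque: stop descending
    if level == 0:
        return False                     # a hole at the finest level
    return all(overlap_test(levels, rect, opacity_threshold,
                            level - 1, 2 * px + dx, 2 * py + dy)
               for dy in (0, 1) for dx in (0, 1))


# Illustrative 4x4 opacities: one partially covered pixel and two empty ones.
om = [[1.0, 1.0, 1.0, 1.0],
      [1.0, 1.0, 1.0, 1.0],
      [1.0, 1.0, 1.0, 0.8],
      [0.0, 0.0, 1.0, 1.0]]
hom = build_hom(om)
```

With a threshold of 1.0 the test is conservative for the overlap part; lowering the threshold gives the aggressive approximate culling described above, where the rectangle covering the 0.8 pixel is accepted as occluded at the coarser level.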
Resolving Depth
• Depth representations
  – Define a boundary beyond which an object overlapping occluders is definitely occluded
  – Conservative estimates:
    • A single plane
    • Depth estimation buffer
  – No-background z-buffer

A Single Plane
• … at the farthest vertex of the occluders
[Figure: image plane, the plane, occluders, viewing direction, object A; labels: “The point with nearest depth”, “This object passes the depth test”]

Depth Estimation Buffer
• Like a low-res depth buffer
  – Uniform subdivision of the screen
  – A plane for each partition
  – Defines the far boundary
  – Updates (i.e. computing the depth representation)
    • Occluder bounding rectangle at farthest depth
  – Depth tests
    • Occludee bounding rectangle at nearest depth

Depth Estimation Buffer
[Figure: transformed view frustum, image plane, viewing direction, D. E. B., occluders with bounding rectangle at farthest depth, objects A and B with bounding rectangle at nearest depth]

Depth Estimation Buffer
• Trade-off
  – Advantages
    • Removes need for strict depth sorting
    • Speed
    • Portability
  – Disadvantages
    • Conservative far boundary
    • Requires good bounding volumes

No-Background Z-Buffer
– The z-buffer from occluder rendering...
  • is by itself a full occlusion representation
  • has to be modified to support our depth tests
– “Removing” background depth values
  • Replace them with the “foreground” depth values
– Captures the near boundary

No-Background Z-Buffer
[Figure: transformed view frustum, image plane, viewing direction, D. E. B., occluders, N. B. Z, object A; label: “Objects passing the depth tests”]

No-Background Z-Buffer
• Trade-off
  – Advantages
    • Captures the near boundary
    • Less sensitive to bounding boxes
  – Disadvantages
    • Assumes quickly accessible z-buffer
    • Resolution same as occlusion maps (however…)

Occluder Selection
– Occlusion-preserving simplification (OPS)
– Run-time selection
– Visibility pre-processing

OPS
– View-independent OPS
[Figure: axes X, Z]

OPS
– View-dependent OPS

OPS
– In practice, use traditional, static LODs
  • More restrictive view-independent OPS
  • Well-studied and available
  • Low run-time overhead
  • Shared with final rendering, no extra memory
  • Area-preserving [Erikson 98]
– Conservative OPS (COPS)...

Occluder Selection
• At run time
  – Distance-based selection with a polygon budget
  – Temporal coherence
• Visibility sampling
  – Pre-compute visible objects on a 3-D grid
  – Facilitates run-time selection

Implementation
• A two-pass framework
[Figure: two-pass framework; stages: scene database, view frustum culling, occluder selection, LOD, rendering, build occlusion representation, occlusion culling, LOD]

Implementation
• Pipelining
[Figure: pipelined stages OccSel, OccDraw, Cull, and FinalDraw overlapping across frames N, N+1, N+2, N+3]

Implementation
– Uses bounding volume hierarchy
– Active layers of the pyramid: 4x4 - 64x64
– Resolutions
  • Occluder rendering - 256x256
  • D. E. B. - 64x64
– Test platforms
  • SGI Onyx II, 4 x 195 MHz R10000, InfiniteReality
  • SGI Onyx I, 4 x 250 MHz R4400, InfiniteReality

Results
• The city model
  – 312,524 polygons
  – Single CPU
  – 5,000 occluder polygons
  – Depth estimation buffer
  – Opacity thresholds 1.0
  – Lighting; display lists; no triangle strips
[Plot: frame rate (fps, 0 to 90) versus frame # (1 to 601); series: OC+VFC, VFC only]
[Plot: number of remaining triangles (0 to 320,000) versus frame # (1 to 601); series: total, after VFC, after VFC+OC, ideal]

Results
• Auxiliary Machine Room (AMR)
  – 632,252 polygons
  – 3 CPUs
  – 25,000 occluder polygons
  – No-background z-buffer
  – Approximate culling (0.85 for level 64x64)
  – LOD
  – Lighting; display lists; no triangle strips
[Plot: frame rate (fps, 0 to 8) versus frame # (1 to 401); series: LOD+VFC+OC, LOD+VFC]
[Plot: number of remaining triangles (0 to 700,000) versus frame # (1 to 401); series: original model, after LOD, after LOD+VFC, after LOD+VFC+OC, ideal]
[Plot: number of triangles culled by OC (60,000 to 180,000) versus frame # (1 to 401); series: OT=1.0, OT=0.8]

Results
• The power plant model
  – 15 million triangles
  – 3 CPUs
  – Visibility pre-processing on a 20x20 grid (~15 min)
  – No-background z-buffer
  – 18,000 occluder polygons
  – Opacity thresholds from 0.85 and up
  – LOD
[Plot: frame rate (fps, 0 to 18) versus frame # (1 to 401); series: LOD+VFC+OC, LOD+VFC]
[Plot: number of remaining polygons versus frame # (1 to 401), with a broken axis (100,000 to 800,000 and 11,400,000 to 14,400,000); series: original model, after LOD, after LOD+VFC, after LOD+VFC+OC, ideal]

Conclusion
• Goals achieved
  – Generality
    • Any model, any occluder
    • Occluder fusion
  – Speed-up
    • Accelerate interactive graphics
  – Ease of implementation
    • Configurability
    • Robustness

Conclusion
Main contributions:
– Problem decomposition
  • Overlap tests and depth tests
– Occlusion representations
  • Occlusion maps
  • Depth Estimation Buffer
  • No-Background Z-Buffer

Conclusion
• Main contributions
  – Hierarchical occlusion maps
    • Analysis of occlusion at multiple resolutions
    • High-level opacity estimation
    • Aggressive approximate culling
    • Levels of visibility
  – The first occlusion culling algorithm for general models and interactive 3-D graphics

Future Work
• Other implementations...
  – PCs and games
    • How much can be done in software?
  – Integration into hardware
    • More progressive updates to occlusion representation
    • Less conservative culling
  – Wide-spread use of occlusion culling

Early Splat Elimination
• Need: splat visibility test
  – a voxel is only visible if the volume material in front of it is not opaque
[Figure: screen, wall of occluding voxels, and an occluded voxel that does not pass the visibility test; occlusion map = opacity image]

Visibility Test - Naive
• Check the opacity of every pixel within the footprint
  – the number of pixels to be checked is large
[Figure: voxel footprint, voxel kernel, opaque area]

Visibility Test - Efficient
IEEE Trans. Vis. and Comp. Graph. ’99
• Compute the occlusion map after each sheet-buffer compositing
[Figure labels: project; do not project; opacity threshold; occlusion map; opacity < threshold; opacity = 0]

Early Splat Elimination - Results
Dataset           Standard   Early elim.   Voxels splatted
Head (256^3)      54.6 s     12.7 s        3 %
Transp. head      35.8 s     14.8 s        21 %
Nerve (512x76)    17.6 s     8.4 s         10 %
Tomato (256^3)    36.8 s     15.5 s        37 %
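The difference between the naive and the efficient splat visibility test can be illustrated with a small sketch. The construction below is an assumption made for illustration and not necessarily the one used in the cited paper: each occlusion-map pixel stores the minimum opacity found in the footprint-sized window around it, so one lookup at the voxel's projected center replaces a scan of the whole footprint. Pixels outside the image count as transparent, and `radius`, `threshold`, and the function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def build_occlusion_map(opacity: Grid, radius: int) -> Grid:
    """Per pixel, the minimum opacity in the (2*radius+1)^2 window around it.

    Rebuilt after each sheet-buffer compositing step in this sketch.
    """
    h, w = len(opacity), len(opacity[0])
    occ = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [opacity[j][i] if 0 <= j < h and 0 <= i < w else 0.0
                      for j in range(y - radius, y + radius + 1)
                      for i in range(x - radius, x + radius + 1)]
            occ[y][x] = min(window)
    return occ


def should_splat(occ: Grid, x: int, y: int, threshold: float = 0.95) -> bool:
    """A voxel projecting to (x, y) is splatted only if its footprint is not opaque."""
    return occ[y][x] < threshold


# 5x5 opacity image that is opaque except for one pixel in a corner.
image = [[1.0] * 5 for _ in range(5)]
image[4][4] = 0.2
occ_map = build_occlusion_map(image, radius=1)
```

The cost moves from every voxel (a footprint scan per splat) to one pass over the image per sheet, which pays off when many voxels project onto already opaque regions.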