Experiences with Streaming Construction of SAH KD Trees
Stefan Popov, Johannes Günther, Hans-Peter Seidel, Philipp Slusallek

Motivation
- Large speed-up of ray tracing in recent years
  - Better algorithms (packet tracing [Wald04, Reshetov05])
  - Optimized spatial index structures; best known: KD trees [Havran00]
  - Faster hardware
- Research concentrated mainly on static scenes
- Dynamic scenes: building SAH-based KD trees is slow and is normally done in a pre-processing step

Stefan Popov Streaming Construction of KD Trees

Dynamic Scenes Approaches
- Embed the dynamics in the index structure
  - Two-level approach [Wald03]
  - Fuzzy KD trees [Günther06]
- Update the index structure
  - Grids, BVHs, and KD tree hybrids
  - Faster build/update, but lower traversal performance
  - No efficient approach for KD trees
- Rebuild the entire KD tree
  - Needs to be fast
  - Lazy build

SAH Algorithm
- Abstract objects by their axis-aligned bounding boxes (AABBs)
- Events are given by the AABB boundaries; extract and sort them in advance
- Recursive top-down construction:
  - Find the split plane using the SAH: compute the minimum cost over the events
  - Distribute the objects to the children by distributing the events, keeping them sorted
[Figure: eight objects (1 to 8) with their sorted events along X; the minimum-cost split hyper-plane lies at X = 68; left child: 1, 2, 3, 4; right child: 4, 5, 6, 7, 8]

SAH Cost Function
- Piecewise linear, with discontinuities at object boundaries
- It therefore only needs to be evaluated just before opening events and just after closing events
[Figure: SAH cost (Y) along the split axis (X) for two objects, 1 and 2]

Distribution Along the Split Axis
- Given: the event list and the split position
- Sweep the event list and classify each object:
  - Open event before the split: label the object "both"
  - Open event after the split: label the object "right"
  - Close event before the split: re-label the object "left"
  - Close event after the split: keep the current label ("right" or "both")
- Copy each event to the corresponding child's list; objects labeled "both" might require inserting new events
- Updating the object labels causes random memory accesses
[Figure: open "[" and close "]" events along X, labeled left, both, and right relative to the split]
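The incremental cost evaluation from the SAH Algorithm and SAH Cost Function slides can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses the common SAH form C_TRAV + C_ISECT · (P_left · N_left + P_right · N_right) with hypothetical constants, represents boxes as (lo, hi) corner tuples, assumes every box has positive extent along the split axis, and sorts the event positions on the fly instead of once in advance. Opening and closing events at the same position are counted rather than ordered, in the spirit of the Improvements slide.

```python
from collections import Counter

# Hypothetical SAH constants; the slides do not give values.
C_TRAV = 1.0
C_ISECT = 1.5


def surface_area(lo, hi):
    dx, dy, dz = (hi[i] - lo[i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def sah_cost(node_lo, node_hi, axis, pos, n_left, n_right):
    """Common SAH form: C_TRAV + C_ISECT * (P_left * N_left + P_right * N_right)."""
    left_hi = list(node_hi)
    left_hi[axis] = pos
    right_lo = list(node_lo)
    right_lo[axis] = pos
    inv_area = 1.0 / surface_area(node_lo, node_hi)
    p_left = surface_area(node_lo, left_hi) * inv_area
    p_right = surface_area(right_lo, node_hi) * inv_area
    return C_TRAV + C_ISECT * (p_left * n_left + p_right * n_right)


def best_split(boxes, node_lo, node_hi, axis):
    """Incremental sweep over the sorted event positions; returns (cost, position)."""
    opens, closes = Counter(), Counter()
    for lo, hi in boxes:
        assert lo[axis] < hi[axis], "planar boxes are not handled in this sketch"
        opens[lo[axis]] += 1
        closes[hi[axis]] += 1
    n_left, n_right = 0, len(boxes)
    best_cost, best_pos = float("inf"), None
    for pos in sorted(set(opens) | set(closes)):
        n_right -= closes[pos]  # after the closing events: those objects are left only
        if node_lo[axis] < pos < node_hi[axis]:
            cost = sah_cost(node_lo, node_hi, axis, pos, n_left, n_right)
            if cost < best_cost:
                best_cost, best_pos = cost, pos
        n_left += opens[pos]  # before the opening events: those objects are right only
    return best_cost, best_pos
```

For three boxes spanning x in [0, 2], [1, 3], and [8, 10] inside a 10 × 10 × 10 node, the sweep picks x = 3, just after the second closing event, with cost 3.8 under these placeholder constants.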
Distribution Along the Other Axes
- Sweep the event lists and copy each event to the
  - left child's list if the corresponding object is labeled "left" or "both"
  - right child's list if it is labeled "right" or "both"
- Looking up the label in the object array causes random memory accesses
[Figure: Y-axis event list split into the left and right children's lists according to the labels left, right, and both]

Problems of KD Tree Construction
- Random memory accesses
- Expensive cost function evaluation
- Initial sorting is inefficient for lazy builds

Streaming Algorithm Overview
- Work with unsorted lists of AABBs, avoiding the initial sort
- Sweep the list once to locate the initial split plane
- In a single sweep:
  - Distribute the objects (straightforward)
  - Determine the split positions of the children
- Once the data fits in the caches, switch to the conventional build
[Figure: parent list streamed into a left list and a right list]

SAH Cost Estimation
- The cost function typically varies only slowly
- No need to evaluate the SAH at every event: use sampling
[Figure: SAH cost along the split axis, marking the real minimum and the minimum found by sampling]
- Naïve approach: for every event, check all k samples, giving O(kN)
- How to sample efficiently?
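The labeling rules of the Distribution Along the Split Axis slide and the copy rule of the Distribution Along the Other Axes slide can be sketched together. This is a minimal sketch, assuming events are (position, kind, object id) tuples sorted per axis and that objects labeled "both" are simply copied to both children without clipping, so no new events are inserted. The names make_events, label_objects, and distribute are hypothetical.

```python
from typing import Dict, List, Tuple

OPEN, CLOSE = 0, 1
Event = Tuple[float, int, int]  # (position, OPEN or CLOSE, object id)


def make_events(boxes, axis) -> List[Event]:
    """Sorted list of AABB boundary events along one axis."""
    events = []
    for obj, (lo, hi) in enumerate(boxes):
        events.append((lo[axis], OPEN, obj))
        events.append((hi[axis], CLOSE, obj))
    events.sort()
    return events


def label_objects(split_events: List[Event], split_pos: float) -> Dict[int, str]:
    """Sweep the split-axis events once and label every object."""
    label = {}
    for pos, kind, obj in split_events:
        if kind == OPEN:
            # Opens before the split are tentatively "both"; the others are "right".
            label[obj] = "both" if pos < split_pos else "right"
        elif pos <= split_pos:
            # Closed before (or at) the split: the object lies on the left only.
            label[obj] = "left"
        # A close event after the split keeps the current label.
    return label


def distribute(events: List[Event], label: Dict[int, str]):
    """Copy events to the children's lists; filtering keeps them sorted."""
    left = [e for e in events if label[e[2]] in ("left", "both")]
    right = [e for e in events if label[e[2]] in ("right", "both")]
    return left, right
```

Because each child's list is produced by filtering an already sorted list, the children stay sorted without re-sorting; the `label[...]` lookup is the random access into the object array that the slides point out.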
Efficient Sampling
- Two-step approach:
  - #objects to the left of a sample = #opening events to its left
  - #objects to the right of a sample = #closing events to its right
- Count the opening/closing events between samples
  - Regular sampling makes the index computation O(1)
- Reconstruct the left/right object counts at the samples using two partial sums, one from the left and one from the right
- Total cost: O(k + N)
[Figure: opening and closing events with per-interval counts and the resulting partial sums at the samples]

Refining of Samples
- The SAH is the sum of two monotone functions, $C_l$ and $C_r$
- Between two samples $a < b$ the cost is bounded from below, since $C_l$ increases and $C_r$ decreases along the axis:
  $C_{\min} = \min_{[a,b]} C_l + \min_{[a,b]} C_r = C_l(a) + C_r(b)$
- Re-sample the intervals where $C_{\min}$ is below the current minimum
- Typically only a few intervals need to be re-sampled (< 5%)
[Figure: $C_l$, $C_r$, and $C = C_l + C_r$ along the split axis, with the current minimum]

Algorithm Properties
- Streaming memory accesses
- SAH cost function estimated by sampling
- No initial sorting required
- Refining of samples

Improvements to the Conventional Algorithm
- Use radix sort: O(N)
  - Fastest algorithm if the data set fits into the caches
- No need to order events at the same position
  - Count opening/closing events instead
  - Removes one radix sort pass
- Use multiple cores to parallelize the build
  - Most time is spent in the lower tree levels
  - One sub-tree per core

Results
- Speed-up of up to 50%
  - Only effective in the upper levels
  - Limited by the copying of objects/events
  - The larger the scene, the higher the speed-up
- Performance independent of triangle order
- Small decrease in traversal performance (< 2%) with 1024 samples
- Multi-threading: 2.43× speed-up on 4 cores (no local memory management)

Future Work
- Fully multi-threaded implementation
- Careful memory management on NUMA architectures
[Figure: NUMA system with four CPUs, each attached to its own memory]
- Extend to other spatial index
  structures: BVHs, BKD trees, SKD trees, …

Conclusion
- Streaming construction algorithm
  - 50% speed-up
- Cost function sampling
  - Very low quality degradation
- Refining of samples

Thank you!

Advantages
- Sequential memory access in the upper levels
- Small data footprint in the conventional build
  - Fits in the caches
  - Radix sort is efficient
- Fewer computations needed to estimate the split plane position
- But what about the tree cost?

Memory Management
- Use two arrays and alternate between them
- Object count for node n = $i_{n+1} - i_n$; its objects occupy $[i_n, i_{n+1})$ of the index array
- Sift the objects into the second array: the left child's objects occupy $[i_m, i_{m+1})$, the right child's objects $[i_{m+1}, i_{m+2})$
- Object count += SP, since the SP objects that fall into both children are stored twice (SP × 2)
[Figure: index array segments "Left only", "SP × 2", "Right only"]

SAH Tree Cost
- Optimal KD tree for ray tracing: SAH based
- Minimizes the average expected traversal cost of an arbitrary ray

SAH Computation
- Efficient computation: extract and sort the events in advance
- Compute incrementally, keeping track of the number of objects on the left/right
- Evaluate after closing events and before opening events
[Figure: SAH cost (Y) along the split axis (X) for two objects, 1 and 2]

Alternative Multi-Threading (required on NUMA architectures)
- One sub-tree per core is not suitable for the first log(#cores) levels
- Also unsuitable for some architectures (Cell)
- Alternative:
  - Bring data to the cores from sequential memory pages
  - Gather event counts in bins at each core
  - Merge the counts before the actual cost evaluation
[Figure: four CPUs reading sequential memory pages]

Extension: Multi-Threading
- Use multiple cores to parallelize the build
  - Most time is spent in the lower tree levels
  - One sub-tree per core
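The Efficient Sampling and Refining of Samples slides, together with the per-core bin merging from the alternative multi-threading scheme, can be sketched as below. This is a hedged sketch, not the authors' code: the SAH constants are hypothetical, the k samples are assumed to sit on the inner boundaries of k + 1 equal-width bins, only intervals between neighbouring samples are checked for refinement (the two outer intervals are skipped for brevity), and all function names are illustrative.

```python
import itertools
import math

# Hypothetical SAH constants; the slides do not give values.
C_TRAV = 1.0
C_ISECT = 1.5


def surface_area(lo, hi):
    dx, dy, dz = (hi[i] - lo[i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def bin_event_counts(boxes, node_lo, node_hi, axis, k):
    """One streaming pass over unsorted AABBs: per-bin open/close counts, O(N).

    Sample j sits at s_j = lo + (j + 1) * w, so each bin index is O(1).
    """
    w = (node_hi[axis] - node_lo[axis]) / (k + 1)
    opens, closes = [0] * (k + 1), [0] * (k + 1)
    for lo, hi in boxes:
        b = math.floor((lo[axis] - node_lo[axis]) / w)     # opens at x < s_j land in bins <= j
        c = math.ceil((hi[axis] - node_lo[axis]) / w) - 1  # closes at x > s_j land in bins > j
        opens[min(max(b, 0), k)] += 1
        closes[min(max(c, 0), k)] += 1
    return opens, closes


def merge_counts(per_core):
    """Merge the per-core bin counts before the cost evaluation."""
    opens = [sum(col) for col in zip(*(o for o, _ in per_core))]
    closes = [sum(col) for col in zip(*(c for _, c in per_core))]
    return opens, closes


def sample_costs(opens, closes, node_lo, node_hi, axis):
    """C_l and C_r at every sample from two partial sums, O(k)."""
    k = len(opens) - 1
    w = (node_hi[axis] - node_lo[axis]) / (k + 1)
    n_left = list(itertools.accumulate(opens))[:k]                 # opens left of s_j
    n_right = list(itertools.accumulate(closes[::-1]))[::-1][1:]  # closes right of s_j
    inv_area = 1.0 / surface_area(node_lo, node_hi)
    samples, c_left, c_right = [], [], []
    for j in range(k):
        s = node_lo[axis] + (j + 1) * w
        left_hi = list(node_hi)
        left_hi[axis] = s
        right_lo = list(node_lo)
        right_lo[axis] = s
        samples.append(s)
        c_left.append(C_ISECT * surface_area(node_lo, left_hi) * inv_area * n_left[j])
        c_right.append(C_ISECT * surface_area(right_lo, node_hi) * inv_area * n_right[j])
    return samples, c_left, c_right


def intervals_to_refine(samples, c_left, c_right):
    """Intervals [a, b] whose lower bound C_TRAV + C_l(a) + C_r(b) beats the
    current sampled minimum; only these need re-sampling."""
    current_min = min(C_TRAV + l + r for l, r in zip(c_left, c_right))
    return [(samples[j], samples[j + 1])
            for j in range(len(samples) - 1)
            if C_TRAV + c_left[j] + c_right[j + 1] < current_min]
```

With boxes spanning x in [0, 2], [1, 5], and [8, 10] and k = 4 samples at x = 2, 4, 6, 8, the best sampled cost is 4.1 at x = 6, and only [4, 6] and [6, 8] have a lower bound below it. Re-sampling [4, 6] would reach the cheaper split at x = 5 (cost 4.0), the closing event of the second box.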