Experiences with fast KD tree construction

Experiences with Streaming
Construction of SAH KD Trees
Stefan Popov, Johannes Günther,
Hans-Peter Seidel, Philipp Slusallek
Motivation
 Large speed-up of ray tracing lately
 Better algorithms (packet tracing [Wald04, Reshetov05])
 Optimized spatial index structures
 Best known: KD trees [Havran00]
 Faster hardware
 Research concentrated mainly on static scenes
 Dynamic scenes
 Building – slow for SAH based KD trees
 Done in a pre-processing step
Stefan Popov
Streaming Construction of KD Trees
Dynamic Scenes Approaches
 Embed dynamics in the index structure
 Use a two level approach [Wald03]
 Fuzzy KD trees [Günther06]
 Update index structure
 Grids, BVHs and KD tree hybrids
 Faster build/update
 Lower traversal performance
 No efficient approach for KD trees
 Rebuild entire KD tree
 Need to make it fast
 Lazy build
SAH Algorithm
 Extract & sort events in advance
 Abstract objects with AABBs
 Events given by AABB boundaries
 Recursive top-down construction
 Find split plane using SAH
 Compute minimum cost
 Distribute objects to children
 By distributing the events
 Keep them sorted
[Figure: eight objects (1 to 8) cut by the split hyper-plane X: 68. Root node: 1, 2, 3, 4, 5, 6, 7, 8. Left child: 1, 2, 3, 4. Right child: 4, 5, 6, 7, 8.]
SAH Cost Function

 Piecewise linear
 Discontinuities at object boundaries
 Evaluate only before opening and after closing event
[Figure: objects 1 and 2 in the X-Y plane (X from -2 to 98, Y from 129 to 179).]
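The conventional sweep over sorted events can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the function name `best_split_exact`, the cost constants `c_trav` and `c_isect`, and the restriction to one axis with fixed node extents `dy` and `dz` are assumptions made for the example. It also assumes every object has non-zero extent along the axis.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (min, max) of an object's AABB along the split axis


def surface_area(dx: float, dy: float, dz: float) -> float:
    """Surface area of an axis-aligned box with the given extents."""
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def best_split_exact(intervals: List[Interval], node_min: float, node_max: float,
                     dy: float, dz: float,
                     c_trav: float = 1.0, c_isect: float = 1.0
                     ) -> Tuple[float, Optional[float]]:
    """Evaluate the SAH at every event position; return (best cost, best position)."""
    # Extract one open (True) and one close (False) event per object, then sort.
    events = [(lo, True) for lo, _ in intervals] + [(hi, False) for _, hi in intervals]
    events.sort(key=lambda e: e[0])

    parent_area = surface_area(node_max - node_min, dy, dz)
    n_left, n_right = 0, len(intervals)
    best_cost, best_pos = float("inf"), None

    i = 0
    while i < len(events):
        pos = events[i][0]
        opens = closes = 0
        # Count all events that share this position.
        while i < len(events) and events[i][0] == pos:
            if events[i][1]:
                opens += 1
            else:
                closes += 1
            i += 1
        # Evaluate after the closing events and before the opening events.
        n_right -= closes
        if node_min < pos < node_max:
            area_l = surface_area(pos - node_min, dy, dz)
            area_r = surface_area(node_max - pos, dy, dz)
            cost = c_trav + c_isect * (area_l * n_left + area_r * n_right) / parent_area
            if cost < best_cost:
                best_cost, best_pos = cost, pos
        n_left += opens
    return best_cost, best_pos
```

Because the cost is piecewise linear between events, checking only the event positions is enough to find the minimum in this simplified setting.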
Distribution Along the Split Axis
 Given: event list & split position
 Sweep event list and classify
 Open event
 Before split → label object “both”
 After split → label object “right”
 Close event
 Before split → re-label object “left”
 Copy event to corresponding child’s list
 Might have to insert new events
 Random memory access
[Figure: sweep along X over open "[" and close "]" events on both sides of the split plane, annotated with the actions "label both", "label right", "re-label left", "keep both", and "keep right".]
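A minimal Python sketch of this labeling sweep, under stated assumptions: events are tuples `(position, is_open, object_id)` already sorted by position, objects have non-zero extent, and the name `distribute_split_axis` is hypothetical. Objects that straddle the plane get one new close event in the left list and one new open event in the right list, both at the split position.

```python
from typing import List, Optional, Tuple

Event = Tuple[float, bool, int]  # (position, is_open, object_id)


def distribute_split_axis(events: List[Event], split: float, num_objects: int
                          ) -> Tuple[List[Optional[str]], List[Event], List[Event]]:
    """Label objects and copy split-axis events to the children's lists."""
    labels: List[Optional[str]] = [None] * num_objects  # the "object array"
    left: List[Event] = []
    right: List[Event] = []
    new_right_opens: List[Event] = []  # inserted events; must precede all right events

    for pos, is_open, obj in events:
        if is_open:
            if pos < split:
                labels[obj] = "both"      # tentative until the close event is seen
                left.append((pos, True, obj))
            else:
                labels[obj] = "right"
                right.append((pos, True, obj))
        elif labels[obj] == "both" and pos <= split:
            labels[obj] = "left"          # object ends before the plane: re-label
            left.append((pos, False, obj))
        else:
            right.append((pos, False, obj))
            if labels[obj] == "both":     # straddles the plane: insert new events
                left.append((split, False, obj))
                new_right_opens.append((split, True, obj))

    # Inserted open events sit at the split position, so they go first.
    return labels, left, new_right_opens + right
```

Both output lists stay sorted without a second sort, because inserted close events are appended only after every event at or before the plane has been copied.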
Distribution Along the Other Axes
 Sweep event lists. Copy event to
 Left, if corresponding object labeled “left” or “both”
 Right, if corresponding object labeled “right” or “both”
 Look up in object array → Random memory access
[Figure: the Y-axis event list is swept and copied into the left child’s list and the right child’s list according to the labels left, right, and both.]
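The copy step for the non-split axes can be sketched in a few lines of Python. The event layout `(position, is_open, object_id)` and the name `distribute_other_axis` are assumptions for illustration; `labels` stands for the per-object label array filled during the sweep along the split axis.

```python
from typing import List, Tuple

Event = Tuple[float, bool, int]  # (position, is_open, object_id)


def distribute_other_axis(events: List[Event], labels: List[str]
                          ) -> Tuple[List[Event], List[Event]]:
    """Copy each event to the left and/or right child's list based on its object's label."""
    left: List[Event] = []
    right: List[Event] = []
    for event in events:
        label = labels[event[2]]  # lookup by object id: the random memory access
        if label in ("left", "both"):
            left.append(event)
        if label in ("right", "both"):
            right.append(event)
    return left, right
```

The input order is preserved, so both child lists remain sorted. The lookup `labels[event[2]]` is the access pattern the slide flags as cache-unfriendly, since consecutive events refer to unrelated objects.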
Problems of KD Tree Construction
 Random memory accesses
 Expensive cost function evaluation
 Initial sorting – inefficient for lazy builds
Streaming Algorithm Overview
 Work with unsorted lists of AABBs
 Avoid initial sorting
 Sweep list once to locate initial split plane
 In a single sweep
 Distribute objects (straightforward)
 Determine split positions of children
 Once data fits in caches, switch to conventional build
[Figure: the parent list is streamed into a left list and a right list.]
SAH Cost Estimation
 Cost function typically varies only slowly
 No need to evaluate SAH at every event
 Use sampling!
[Figure: SAH cost plotted over split position (x from -0.4 to 0.5, cost from 6000 to 18000), marking the real minimum and the minimum found by sampling.]
 Naïve approach
 For every event: check all samples → O(kN)
 How to sample efficiently?
Efficient Sampling
 Two step approach
 #Objects to left of sample = #Opening events to its left
 #Objects to right of sample = #Closing events to its right
 Count opening/closing events between samples
 Regular sampling → index computation in O(1)
 Reconstruct left/right object counts at samples
 Using two partial sums from left and right → O(k+N)
[Figure: open "[" and close "]" events counted in the bins between samples, with the reconstructed (left, right) object counts (0, 3), (1, 3), (2, 2), (3, 1), (3, 0).]
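The two-step counting scheme can be sketched as below. This is an illustrative Python version under stated assumptions: `k` regularly spaced samples inside the node interval `[lo, hi]`, one `(min, max)` pair per object along the axis, and a hypothetical function name `sample_counts`. An open event exactly on a sample is treated as lying to its right and a close event exactly on a sample as lying to its left.

```python
import math
from typing import List, Tuple

Interval = Tuple[float, float]  # (min, max) of an object's AABB along the axis


def sample_counts(intervals: List[Interval], lo: float, hi: float, k: int
                  ) -> Tuple[List[float], List[int], List[int]]:
    """Return sample positions and the object counts left/right of each sample."""
    width = (hi - lo) / (k + 1)   # k samples delimit k + 1 bins
    opens = [0] * (k + 1)
    closes = [0] * (k + 1)

    # Step 1: one unsorted pass; the bin index is computed in O(1) per event.
    for a, b in intervals:
        i_open = min(k, max(0, math.floor((a - lo) / width)))
        i_close = min(k, max(0, math.ceil((b - lo) / width) - 1))
        opens[i_open] += 1
        closes[i_close] += 1

    # Step 2: partial sums. Left count = opening events left of the sample.
    n_left: List[int] = []
    running = 0
    for j in range(k):
        running += opens[j]
        n_left.append(running)

    # Right count = closing events right of the sample.
    n_right = [0] * k
    running = 0
    for j in range(k - 1, -1, -1):
        running += closes[j + 1]
        n_right[j] = running

    positions = [lo + (j + 1) * width for j in range(k)]
    return positions, n_left, n_right
```

With the counts in hand, the SAH cost at every sample follows from the node geometry, so the total work is one pass over the N objects plus two passes over the k samples.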
Refining of Samples
 SAH – sum of two monotone functions – Cl and Cr
 Cost between two samples a < b is bounded from below
 C ≥ Cmin = min(Cl) + min(Cr) = Cl(a) + Cr(b)
 Resample areas where Cmin < current minimum
 Typically only few intervals need to be re-sampled (< 5%)
[Figure: C = Cl + Cr plotted over split position (x from -0.4 to 0.5, cost from 0 to 18000), together with Cl, Cr, and the current minimum.]
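The refinement test can be sketched as follows, assuming `cl` is non-decreasing and `cr` is non-increasing across the sorted sample positions (which is what makes Cl(a) and Cr(b) the minima on an interval). The function name `intervals_to_refine` and the list-based inputs are hypothetical.

```python
from typing import List, Tuple


def intervals_to_refine(positions: List[float], cl: List[float], cr: List[float]
                        ) -> Tuple[float, List[Tuple[float, float]]]:
    """Return the current minimum and the sample intervals that could still beat it."""
    # Current minimum: best cost seen at any sample.
    current_min = min(l + r for l, r in zip(cl, cr))

    refine: List[Tuple[float, float]] = []
    for i in range(len(positions) - 1):
        # Lower bound on the cost anywhere between samples a = i and b = i + 1.
        lower_bound = cl[i] + cr[i + 1]
        if lower_bound < current_min:
            refine.append((positions[i], positions[i + 1]))
    return current_min, refine
```

Intervals whose lower bound already exceeds the current minimum cannot contain a better split, so only the returned intervals need another round of sampling.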
Algorithm properties
 Streaming memory accesses
 SAH cost function estimated by sampling
 No initial sorting required
 Refining of Samples
Improvements
 Conventional Algorithm
 Use radix sort – O(N)
 Fastest algorithm if data set fits into caches
 No need to order events at same position
 Count opening/closing events instead
 Removes one radix sort pass
 Multiple cores → parallelize build
 Most time spent in the lower tree levels
 One sub-tree → one core
Results
 Speed-up of up to 50%
 Only effective in the upper levels
 Limited by copying of objects/events
 The larger the scene, the higher the speedup
 Performance independent of triangle order
 Small decrease in traversal performance (< 2%)
 With 1024 samples
 Multi-threading
 2.43x @ 4 cores (no local memory management)
Future Work
 Fully multi-threaded implementation
 Careful memory management on NUMA architectures
[Figure: NUMA layout with four CPUs and four memory banks.]
 Extend to other spatial index structures
 BVHs, BKD trees, SKD trees, …
Conclusion
 Streaming construction algorithm
 50% speedup
 Cost function sampling
 Very low quality degradation
 Refining of samples
Thank you!
Advantages
 Sequential memory access in the upper levels
 Small data footprint in conventional build
 Fits in caches
 Radix sort is efficient
 Fewer computations needed for split plane position estimation
 But, what about the tree cost?
Memory Management
 Use two arrays and alternate them
 Object count for node n = i_{n+1} - i_n
[Figure: an objects array addressed through an index array (i_n, i_{n+1}). Objects are sifted to the second array in the order "left only", "SP x 2", "right only" (object count += SP), giving the left child’s objects (from i_m) and the right child’s objects (up to i_{m+2}), with i_{m+1} between them in the new index array.]
SAH tree cost
 Optimal KD tree for ray tracing
 SAH based
 Minimize average expected traversal cost of an arbitrary ray
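For reference, the surface area heuristic is commonly written in the form below. The symbols are notation assumed here, not taken from the slides: V is the node's volume, V_L and V_R are the child volumes produced by split position p, N_L and N_R are the object counts overlapping each child, and C_trav and C_isect are the traversal and intersection cost constants.

```latex
C(p) = C_{\mathrm{trav}} + C_{\mathrm{isect}}
       \left( \frac{SA(V_L)}{SA(V)}\, N_L + \frac{SA(V_R)}{SA(V)}\, N_R \right)
```

Read this way, the left and right terms inside the parentheses would correspond to the Cl and Cr components used in the sample refinement bound.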
SAH computation
 Efficient computation – extract & sort events in advance
 Compute incrementally. Keep track of objects on left/right
 Evaluate after a close event and before an open event
[Figure: objects 1 and 2 in the X-Y plane (X from -2 to 98, Y from 129 to 179).]
Alternative Multi-Threading
 required on NUMA architectures)
 Sub-tree → core mapping not suitable for the first log(#cores) levels
 Also unsuitable for some architectures (Cell)
 Alternative
 Bring data to cores from sequential pages
 Gather event counts in bins at each core
 Merge counts before actual cost evaluation
[Figure: four CPUs around a shared memory.]
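The gather-then-merge idea can be sketched structurally in Python. The names `bin_counts` and `merged_bin_counts` are hypothetical, each worker receives a contiguous chunk of the object list as a stand-in for sequential pages, and the thread pool only illustrates the data flow: CPython threads do not run this loop in parallel.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Interval = Tuple[float, float]  # (min, max) of an object's AABB along the axis


def bin_counts(chunk: List[Interval], lo: float, width: float, num_bins: int
               ) -> Tuple[List[int], List[int]]:
    """Per-worker step: count open and close events per bin for one chunk."""
    opens = [0] * num_bins
    closes = [0] * num_bins
    for a, b in chunk:
        opens[min(num_bins - 1, max(0, math.floor((a - lo) / width)))] += 1
        closes[min(num_bins - 1, max(0, math.ceil((b - lo) / width) - 1))] += 1
    return opens, closes


def merged_bin_counts(intervals: List[Interval], lo: float, hi: float,
                      num_bins: int, num_workers: int
                      ) -> Tuple[List[int], List[int]]:
    """Split the list into contiguous chunks, bin each chunk, then merge the bins."""
    width = (hi - lo) / num_bins
    chunk_size = max(1, math.ceil(len(intervals) / num_workers))
    chunks = [intervals[i:i + chunk_size] for i in range(0, len(intervals), chunk_size)]

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial = list(pool.map(lambda c: bin_counts(c, lo, width, num_bins), chunks))

    # Merge: element-wise sum of the per-worker bins before any cost evaluation.
    opens = [sum(p[0][j] for p in partial) for j in range(num_bins)]
    closes = [sum(p[1][j] for p in partial) for j in range(num_bins)]
    return opens, closes
```

Since each worker writes only to its own bins, no synchronization is needed until the merge, which touches just `num_bins` counters per worker.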
Extension: Multi-Threading
 Multiple cores → parallelize build
 Most time spent in the lower tree levels
 One sub-tree → one core