GPU-Assisted Path Tracing

Matthias Boindl
Christian Machacek
Institute of Computer Graphics and Algorithms
Vienna University of Technology
Motivation: Why Path Tracing?
Physically based
Nature provides the reference image
Parallelizable
Sublinear in #objects
Conceptually simple
Can lead to a clean implementation
But: fast implementation on GPUs not trivial
Outline
Path tracing intro
Main steps of the algorithm
Mapping the algorithm to the GPU
How to organize code into kernels
When to launch kernels
How to pass data between kernels
Acceleration structures
Focus on bounding volume hierarchies
Path Tracing Intro
Like ray tracing, except it…
…supports arbitrary BRDFs
…is stochastic: at each bounce, the new direction is decided randomly
Convergence video
From Pharr, Humphreys: PBRT, 2nd ed. (2010)
Path Tracing Pseudocode
while image not converged
    r = new ray from eye through next pixel
    c = 0, throughput = 1
    do
        i = closest intersection of r with scene
        if no i: break
        if i is on a light source: c = c + throughput * emission
        randomly pick new direction and create reflected ray r
        evaluate BRDF at i
        update throughput
    while path throughput high enough
    accumulate c into the pixel
From Pharr, Humphreys: PBRT, 2nd ed. (2010)
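The loop above can be tried out on a scene simple enough to check by hand. The following is a minimal sketch, not the presenters' implementation: it assumes a closed unit sphere whose inner surface is both a Lambertian reflector and a uniform emitter, a scalar throughput, and cosine-weighted direction sampling. All function names and parameters (`trace_path`, `albedo`, `min_throughput`, ...) are made up for illustration. Because the scene is closed, the "if no i: break" case never occurs here.

```python
import math
import random

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def hit_unit_sphere_from_inside(o, d):
    """Closest hit of the ray o + t*d with the unit sphere, for o inside it."""
    b = sum(x * y for x, y in zip(o, d))
    c = sum(x * x for x in o) - 1.0
    t = -b + math.sqrt(max(0.0, b * b - c))      # positive root of |o + t*d|^2 = 1
    return normalize(tuple(x + t * y for x, y in zip(o, d)))

def cosine_sample(n, rng):
    """Cosine-weighted random direction around the unit normal n."""
    u1, u2 = rng.random(), rng.random()
    r, phi = math.sqrt(u1), 2.0 * math.pi * u2
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    t = normalize(cross(n, helper))
    b = cross(n, t)
    x, y, z = r * math.cos(phi), r * math.sin(phi), math.sqrt(1.0 - u1)
    return tuple(x * t[i] + y * b[i] + z * n[i] for i in range(3))

def trace_path(rng, albedo=0.5, emission=1.0, min_throughput=1e-3):
    """One path: the inner do/while loop of the pseudocode."""
    origin = (0.0, 0.0, 0.0)                     # "eye" at the sphere centre
    direction = normalize(tuple(rng.gauss(0.0, 1.0) for _ in range(3)))
    c, throughput = 0.0, 1.0
    while True:
        hit = hit_unit_sphere_from_inside(origin, direction)
        c += throughput * emission               # every surface point emits here
        normal = tuple(-x for x in hit)          # normal points into the sphere
        origin, direction = hit, cosine_sample(normal, rng)
        # Lambertian BRDF with cosine-weighted sampling: BRDF * cos / pdf = albedo
        throughput *= albedo
        if throughput < min_throughput:          # "while path throughput high enough"
            break
    return c

def render_pixel(samples=64, seed=1):
    rng = random.Random(seed)
    return sum(trace_path(rng) for _ in range(samples)) / samples
```

In this scene the exact radiance is emission / (1 - albedo) = 2, and the throughput cutoff truncates the geometric series slightly below that value. Cutting paths off at a fixed threshold is biased; Russian roulette is the usual unbiased alternative.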
Execution Time
Share of execution time per stage of the pseudocode above:
ray cast: 56%
materials: 25%
logic: 15%
new path: 4%
Megakernel Execution Divergence
From Bikker (2013)
Solution: Wavefront Path Tracing
Separate, specialized kernels
Keep a pool of ~1 million paths alive
Work for next stage goes into kernel-specific, compact queues (= 4 MB index arrays)
https://mediatech.aalto.fi/~samuli/
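The queue idea can be sketched on the CPU as follows. This is an illustration under stated assumptions, not the actual GPU code: path state is a list of dictionaries, queues are plain Python lists of path indices, and the names `logic_stage`, `material_stage`, and `queue_size_mb` are hypothetical. The 4 MB figure follows if each queue holds one 32-bit index per path in a pool of 2^20 paths, which is assumed here.

```python
POOL_SIZE = 1 << 20        # ~1 million paths kept alive
BYTES_PER_INDEX = 4        # assumption: 32-bit path indices

def queue_size_mb(pool_size=POOL_SIZE, bytes_per_index=BYTES_PER_INDEX):
    """Memory for one queue that can hold every path index once."""
    return pool_size * bytes_per_index / (1 << 20)

def logic_stage(path_state, num_materials):
    """Sort every path index into exactly one compact queue."""
    new_path_queue = []
    material_queues = [[] for _ in range(num_materials)]
    for idx, state in enumerate(path_state):
        if state["terminated"]:
            new_path_queue.append(idx)            # slot gets a fresh path
        else:
            material_queues[state["material"]].append(idx)
    return new_path_queue, material_queues

def material_stage(path_state, queue, albedo):
    """Specialized stage: touches only the paths listed in its own queue."""
    for idx in queue:
        path_state[idx]["throughput"] *= albedo
```

Each stage iterates over a dense list of indices, so all work items in a launch run the same code. That is the property that avoids the divergence of a single megakernel.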
Results
Performance
[Table: execution times in ms per 1M path segments]
Limitations and Possible Improvements
Higher memory requirements (+200 MB)
Kernel launch overhead
Dynamic parallelism on GK110
Use an outer scheduling kernel
No CPU round trip
Launch independent stages side-by-side
CUDA streams
So kernels with little work don’t hog the GPU
Acceleration Structures
Find nearest intersection in O(log N)
Space partitioning vs. object partitioning
Hybrid methods exist
Performance
For interactive rendering, compromise between
Traversal performance (build quality)
Construction/Update time
Update or rebuild from scratch
Adapt to GPU environment
Memory architecture
Parallel execution
State of the Art
Tero Karras and Timo Aila. 2013. Fast parallel construction of high-quality bounding volume hierarchies. In Proceedings of the 5th High-Performance Graphics Conference (HPG '13). ACM, New York, NY, USA, 89-99.
Close the Performance Gap
Basic Idea
Fast construction of simple BVH
Generate leaf for each triangle
Reduce SAH cost by modifying tree
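To make "SAH cost" concrete, here is a small sketch of how the surface area heuristic cost of a binary BVH might be evaluated. It assumes axis-aligned boxes given as (min, max) corner tuples and a simplified cost model with one constant per internal node and one per triangle in a leaf. The constants `c_inner` and `c_tri`, the `Node` class, and the function names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]            # (min corner, max corner)

def surface_area(box: Box) -> float:
    (x0, y0, z0), (x1, y1, z1) = box
    dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def union(a: Box, b: Box) -> Box:
    lo = tuple(min(p, q) for p, q in zip(a[0], b[0]))
    hi = tuple(max(p, q) for p, q in zip(a[1], b[1]))
    return (lo, hi)

@dataclass
class Node:
    box: Box
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    num_tris: int = 0              # only meaningful for leaves

def sah_cost(root: Node, c_inner: float = 1.2, c_tri: float = 1.0) -> float:
    """Expected traversal cost; areas are taken relative to the root's area."""
    root_area = surface_area(root.box)
    total, stack = 0.0, [root]
    while stack:
        node = stack.pop()
        rel_area = surface_area(node.box) / root_area
        if node.left is None:      # leaf: pay per triangle test
            total += c_tri * rel_area * node.num_tris
        else:                      # internal node: pay for one traversal step
            total += c_inner * rel_area
            stack += [node.left, node.right]
    return total
```

Two trees over the same triangles can differ a lot in this cost: grouping neighbouring triangles yields smaller internal boxes and therefore a lower value, which is what the tree modifications aim for.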
Treelets
Allow local tree modification
A, B, C, F are leaves; D, E, G are internal nodes
Treelet Construction
Find root: parallel bottom-up traversal
Start with leaves
Use atomic counter at internal nodes (where two paths meet)
Ensures all children have been processed
Build treelet
Add both children
Pick children with highest surface area
Fixed size: 7 leaf nodes
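The "build treelet" step can be sketched as a greedy expansion. This is a sequential illustration under assumptions, not the parallel GPU version: nodes carry a precomputed surface area, and `Node` and `form_treelet` are hypothetical names. Starting from the treelet root and its two children, the treelet leaf with the largest surface area is repeatedly replaced by its two children until the treelet has 7 leaves.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(eq=False)               # compare nodes by identity, not by value
class Node:
    area: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None

def form_treelet(root: Node, max_leaves: int = 7) -> Tuple[List[Node], List[Node]]:
    """Return (internal nodes, leaves) of the treelet rooted at an internal node."""
    internal = [root]
    leaves = [root.left, root.right]           # add both children of the root
    while len(leaves) < max_leaves:
        # Only treelet leaves that are internal BVH nodes can be expanded.
        candidates = [n for n in leaves if not n.is_leaf()]
        if not candidates:
            break                              # subtree too small for a full treelet
        best = max(candidates, key=lambda n: n.area)
        leaves.remove(best)                    # the chosen leaf becomes internal
        internal.append(best)
        leaves += [best.left, best.right]
    return internal, leaves
```

A treelet with 7 leaves always has 6 internal nodes. Expanding the largest node first spends the fixed budget where a bad split costs the most.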
Rearrange Treelet
Minimize SAH cost of the treelet (surface area of its internal nodes; the root bounds stay fixed)
Naive implementation: test each permutation
Better: dynamic programming
Caching of best intermediate results
Start with leaves, then pairs, then triplets, …
Suboptimal subtree construction avoided
Parallelizable as well
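The dynamic programming step can be sketched with bitmasks over the treelet leaves. This is a sequential sketch under assumptions: leaves are given as axis-aligned boxes with a fixed per-leaf cost, areas are not normalized by the root area (a constant factor), and `optimize_treelet` and `c_inner` are illustrative names. For every subset of leaves, the cheapest way to split it into two smaller subsets is cached, so each sub-result is computed once instead of once per permutation.

```python
import math

def surface_area(box):
    (x0, y0, z0), (x1, y1, z1) = box
    dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def union_all(boxes):
    lo = tuple(min(b[0][k] for b in boxes) for k in range(3))
    hi = tuple(max(b[1][k] for b in boxes) for k in range(3))
    return (lo, hi)

def optimize_treelet(boxes, leaf_costs, c_inner=1.2):
    """Return (optimal cost, split table) for a treelet over the given leaves."""
    n = len(boxes)
    full = (1 << n) - 1
    # Surface area of the bounding box of every non-empty subset of leaves.
    area = [0.0] * (full + 1)
    for s in range(1, full + 1):
        area[s] = surface_area(union_all([boxes[i] for i in range(n) if (s >> i) & 1]))
    cost = [0.0] * (full + 1)
    split = [0] * (full + 1)           # best left-hand subset for each subset
    for i in range(n):
        cost[1 << i] = leaf_costs[i]
    # Every proper subset of s is numerically smaller than s, so increasing
    # order guarantees that all needed sub-results are already cached.
    for s in range(1, full + 1):
        if (s & (s - 1)) == 0:
            continue                   # single leaf, already initialized
        best, best_part = math.inf, 0
        part = (s - 1) & s
        while part:                    # enumerate all proper non-empty subsets of s
            candidate = cost[part] + cost[s ^ part]
            if candidate < best:
                best, best_part = candidate, part
            part = (part - 1) & s
        cost[s] = c_inner * area[s] + best
        split[s] = best_part
    return cost[full], split
```

The optimal tree can be rebuilt by following `split` from the full set downwards. With 7 leaves there are only 127 subsets, so the whole table fits in fast memory.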
Results
Gap closed
Results
Speed/Quality tradeoff
Conclusion
Use specialized kernels
Lower execution divergence
(Better use of instruction cache)
(Fewer registers used simultaneously)
Construct acceleration structures quickly
But not too quickly
Thanks for your attention!
Logic Kernel
Does not need a queue, operates on all paths
If shadow ray was unblocked, add light contribution
Find material or light source the ray hits
Place path into proper material queue
Russian roulette
If path terminated, accumulate to image
Place path into new path queue
Sample light sources (aka next event estimation)
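The Russian roulette step in the list above can be sketched like this. It is a minimal illustration assuming a scalar throughput (for RGB, a common choice is its largest component) and a survival probability equal to the clamped throughput. The function name and the `min_survival` parameter are assumptions.

```python
import random

def russian_roulette(throughput, rng, min_survival=0.05):
    """Return the reweighted throughput, or None if the path is terminated."""
    survival = max(min_survival, min(1.0, throughput))
    if rng.random() >= survival:
        return None                    # terminated: accumulate to image, start new path
    return throughput / survival       # survivors are boosted to keep the mean unchanged

def mean_weight(throughput, trials=200_000, seed=7):
    """Empirical mean of the post-roulette throughput (terminated paths count as 0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        result = russian_roulette(throughput, rng)
        total += 0.0 if result is None else result
    return total / trials
```

Unlike a hard throughput cutoff, this terminates dim paths at random while leaving the expected contribution unchanged. `mean_weight(0.3)` should come out close to 0.3.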
New Path Kernel
Generate a new image-space sample
Generate camera ray
Place it into extension ray cast queue
Initialize path state
Throughput
Pixel position
etc.
Material Kernels
Generate incoming direction
Evaluate light contribution based on light sample generated in the logic kernel
We haven’t cast the shadow ray yet!
For MIS: p(light sample) from the BSDF
Discard BSDF stack
Queue extension ray (shadow ray)
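The MIS remark can be made concrete with a small sketch. When a light sample is weighted, the material stage needs both the density with which the light sampler chose the direction and the density with which the BSDF would have chosen the same direction. The code below assumes scalar radiometric quantities and uses the standard balance and power heuristics. All function and parameter names are illustrative.

```python
def balance_heuristic(pdf_this, pdf_other):
    """MIS weight for a sample drawn with pdf_this when pdf_other could also produce it."""
    total = pdf_this + pdf_other
    return pdf_this / total if total > 0.0 else 0.0

def power_heuristic(pdf_this, pdf_other, beta=2.0):
    a, b = pdf_this ** beta, pdf_other ** beta
    return a / (a + b) if a + b > 0.0 else 0.0

def light_sample_contribution(throughput, bsdf_value, cos_theta,
                              emission, pdf_light, pdf_bsdf):
    """Contribution to add later, once the shadow ray is known to be unblocked."""
    if pdf_light <= 0.0:
        return 0.0
    weight = power_heuristic(pdf_light, pdf_bsdf)
    return throughput * bsdf_value * cos_theta * emission * weight / pdf_light
```

The value is only a candidate at this point. It is stored with the path and added to the image by the logic stage if the shadow ray turns out to be unblocked.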
Ray Cast Kernels
Extension rays
Find first intersection against scene geometry
Store hit data into path state
Shadow rays
Blocked or not?