GDC 2005

Zen of multi core rendering

» Corrinne Yu

» Halo team Principal engine programmer

» Corrinne.Yu@microsoft.com

Zen of multi core rendering

» Take away

» Compilation and survey of effective rendering techniques for current generation multi core console hardware

Rendering equation

Rendering equation

» Radiance leaving a point

» Integral of radiance in all directions

Rendering equation

» Radiance leaving a point

» Integral of radiance in all directions

» Reflectance distribution function

Rendering equation

» Radiance leaving a point

» Integral of radiance in all directions

» Reflectance distribution function

» Light coming inward to surface position

Rendering equation

» Radiance leaving a point

» Integral of radiance in all directions

» Reflectance distribution function

» Light coming inward to surface position

» Visibility of light to surface position

Rendering equation

» Integral of radiance in all directions

» Reflectance distribution function

» Light coming inward to surface position

» Visibility of light to surface position

» Attenuation of inward light due to incident angle with surface normal
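The terms listed on these slides can be assembled into one common form of the rendering equation. The notation is an assumption for this sketch, since the slide's own equation did not survive extraction: x is the surface position, \omega_o and \omega_i are the outgoing and incoming directions, f_r is the reflectance distribution function, L_i is the light coming inward, V is the visibility term, and n is the surface normal.

```latex
L_o(x, \omega_o) = \int_{\Omega} f_r(x, \omega_i, \omega_o)\, L_i(x, \omega_i)\, V(x, \omega_i)\, (n \cdot \omega_i)\, d\omega_i
```

Reading left to right: radiance leaving a point, the integral over all directions, the reflectance distribution function, the inward light, its visibility, and the attenuation by incident angle.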

Compromise and cheats

» This is computed per surface element

» This is infeasibly expensive

» In the past, we made quality compromises throughout to make run time rendering possible

First generation

» 1 to 4 dynamic lights

» Simple point lights

» Lambertian

» Blinn-Phong approximation

» Pre-computed diffuse radiosity

» Shadow map optional

Hardware

» 117 million triangles per second

» 0.933 gigapixels per second

» 1.86 giga texels per second

» 6.4 gigabytes of bandwidth per second

» 64 megabytes of video memory

Second generation

» 500 million triangles per second

» 4 gigapixels per second

» 8 giga texels per second

» 256 gigabytes of bandwidth per second

» 512 megabytes of video memory

Second generation

» 4.27x triangle throughput

» 4.29x pixel fill rate

» 4.29x texel rate

» 40x bandwidth

» 8x video memory
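The multipliers above follow from dividing the second generation figures by the first generation "Hardware" figures. A minimal sketch of that arithmetic is below; the dictionary keys are names made up for this example. From the rounded inputs the texel ratio comes out nearer 4.30 than 4.29.

```python
# Figures copied from the "Hardware" and "Second generation" slides.
# The key names are hypothetical labels chosen for this sketch.
first_gen = {
    "triangles_per_s": 117e6,
    "pixels_per_s": 0.933e9,
    "texels_per_s": 1.86e9,
    "bandwidth_bytes_per_s": 6.4e9,
    "video_memory_bytes": 64e6,
}
second_gen = {
    "triangles_per_s": 500e6,
    "pixels_per_s": 4e9,
    "texels_per_s": 8e9,
    "bandwidth_bytes_per_s": 256e9,
    "video_memory_bytes": 512e6,
}

def generation_ratios(old, new):
    """Return new/old for every metric present in the old generation."""
    return {name: new[name] / old[name] for name in old}

ratios = generation_ratios(first_gen, second_gen)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}x")
```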

Second generation

» Large number of lights of precomputed radiance transfer

» Environment and area lights

» Realistic reflectance models

» Cook Torrance, Ward

» Shadow map

Large lights integral

» Large number of lights integral

» Static geometry

» Precomputed visibility

» Spatially non-varying BRDF's

» Low-frequency illumination

» Image-space resolution limited

Multi core generation

» 70x triangle throughput

» 450x pixel fill rate

» 390x texel rate

» 110x bandwidth

» 16x video memory

Amdahl’s law
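As a reminder of what Amdahl's law says, here is a minimal sketch of the standard formula: speedup = 1 / ((1 - p) + p / n), where p is the fraction of work that can run in parallel and n is the number of cores. The function and parameter names are made up for this example.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work is parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# 95% parallel work on 16 cores gives only about 9.1x, not 16x.
print(amdahl_speedup(0.95, 16))
```

The serial fraction caps the speedup no matter how many cores are added, which is why the later slides stress keeping every core busy with local data.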

Multi core insight

» Fill rate is achieved by completely asynchronous out of order VPU (Vector Processing Unit) computation

» My experience with CUDA is that there are intentionally no synchronization primitives

Multi core insight

» On Larrabee, each core has 4 hardware threads

» Each thread is out of order

» But for one thread’s execution, the vertices and pixels are synchronized

Multi core insight

» So there are essentially 256 out of order processes

» Each consisting of a batch of about 16 synchronized pixels or vertices in flight at any one time

Multi core insight

» Expectation is shader flops will grow the most

» Speed not from higher clock rate

» Speed from larger number of low power cores

» Memory is not expected to catch up to shader flops

Multi core insight

» ALU's or VPU's to increase by 300x

» Future is tfetch bound, not ALU bound

» Homogeneous computing

» Keep ALU's or VPU's very busy with cache coherent local data

Multi core generation

» Occlusion from static geometry

» Precomputed visibility

» Spatially non-varying BRDF's

» Low-frequency illumination

» Image-space resolution limited

Multi core generation

» Occlusion from dynamic geometry

» Precomputed visibility

» Spatially non-varying BRDF's

» Low-frequency illumination

» Image-space resolution limited

Multi core generation

» Occlusion from dynamic geometry

» Dynamic visibility computation

» Spatially non-varying BRDF's

» Low-frequency illumination

» Image-space resolution limited

Multi core generation

» Occlusion from dynamic geometry

» Dynamic visibility computation

» Spatially varying BRDF's

» Low-frequency illumination

» Image-space resolution limited

Multi core generation

» Occlusion from dynamic geometry

» Dynamic visibility computation

» Spatially varying BRDF's

» High-frequency illumination

» Image-space resolution limited

Multi core generation

» Occlusion from dynamic geometry

» Dynamic visibility computation

» Spatially varying BRDF's

» High-frequency illumination

» High quality resolution

Multi core generation

» Occlusion from dynamic geometry

» Dynamic visibility computation

» Spatially varying BRDF's

» High-frequency illumination

» High quality resolution

» Remove remaining compromises

Practical techniques

» Directional light map basis

» Zonal harmonics

» Screen space ambient occlusion

» Shadow map

Directional light map

» Proposed by Valve's G McTaggart for Half Life 2

» Used in many games, such as Half Life and Unreal

Directional light map

» Spatial axial basis

» (- 1 / sqrt(6), - 1 / sqrt(2), 1 / sqrt(3) )

» ( - 1 / sqrt(6), 1 / sqrt(2), 1 / sqrt(3) )

» ( sqrt(2 / 3), 0, 1 / sqrt(3) )
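The three basis vectors above are mutually orthogonal and unit length, which the sketch below checks numerically. It also shows one plausible way to blend three light map samples with a per pixel normal. The clamped, normalized weighting is an assumption made for illustration and is not claimed to match any shipped engine; `blend_light_maps` and its parameters are hypothetical names.

```python
import math

# Tangent-space basis copied from the slide.
BASIS = (
    (-1 / math.sqrt(6), -1 / math.sqrt(2), 1 / math.sqrt(3)),
    (-1 / math.sqrt(6), 1 / math.sqrt(2), 1 / math.sqrt(3)),
    (math.sqrt(2 / 3), 0.0, 1 / math.sqrt(3)),
)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def blend_light_maps(normal, light_maps):
    """Blend three radiance samples, one per basis direction.

    normal: unit tangent-space normal (for example from a normal map).
    light_maps: three scalar samples, one stored per basis vector.
    The weights are clamped dot products, normalized to sum to 1
    (an assumed weighting for this sketch).
    """
    weights = [max(0.0, dot(normal, b)) for b in BASIS]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * l for w, l in zip(weights, light_maps)) / total

# An unperturbed normal weights all three light maps equally.
print(blend_light_maps((0.0, 0.0, 1.0), (1.0, 2.0, 3.0)))
```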

Analysis

» Static radiance can interact with directional changes of reflectance surface

» Per pixel normal reflectance of radiosity

» Per pixel normal specularity

Analysis

» Basis and precision are not uniformly distributed

» Radiance is correct at exactly 3 clamped directions

» Radiance undersampling occurs for wide ranges of directions

» Only for hemisphere

Pre-computed radiance transfer

» Zonal harmonics

» R Ramamoorthi and P Hanrahan came up with an efficient representation for irradiance environment maps

Irradiance environment map

» Only the first 2 orders of zonal harmonics

» Only use 9 terms

» Average errors only 1% against raytracing

» Much less error prone than directional light maps
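A sketch of how the 9 terms can be evaluated at run time, using the quadratic form and constants published by Ramamoorthi and Hanrahan in "An Efficient Representation for Irradiance Environment Maps". The dictionary layout keyed by (l, m) is an assumption made for this example, and the constants are worth checking against the paper before use.

```python
# Constants from the Ramamoorthi and Hanrahan irradiance formula.
C1, C2, C3, C4, C5 = 0.429043, 0.511664, 0.743125, 0.886227, 0.247708

def irradiance(normal, L):
    """Irradiance for a unit normal from 9 lighting coefficients.

    L maps (l, m) to a scalar coefficient for l = 0, 1, 2
    (a hypothetical layout; use one value per color channel).
    """
    x, y, z = normal
    return (
        C1 * L[(2, 2)] * (x * x - y * y)
        + C3 * L[(2, 0)] * z * z
        + C4 * L[(0, 0)]
        - C5 * L[(2, 0)]
        + 2.0 * C1 * (L[(2, -2)] * x * y + L[(2, 1)] * x * z + L[(2, -1)] * y * z)
        + 2.0 * C2 * (L[(1, 1)] * x + L[(1, -1)] * y + L[(1, 0)] * z)
    )

# Constant (ambient-only) lighting gives the same irradiance for any normal.
ambient = {(l, m): 0.0 for l in range(3) for m in range(-l, l + 1)}
ambient[(0, 0)] = 1.0
print(irradiance((0.0, 0.0, 1.0), ambient))
```

The whole evaluation is a handful of multiply-adds per pixel, which is why this representation is cheap enough for current hardware.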

Analysis

» Completely feasible in current hardware

» Better than directional light maps

Analysis

» Completely feasible in current hardware

» Better than directional light maps

» Only the lowest of frequencies

» Incapable of representing dynamic local lights

Screen space ambient occlusion

» Developed by V Kajalin

» Used first in Crysis

» Used by games like Crysis and Unreal

» Sample depth difference between screen space neighbors as occlusion factor

Optimization

» Too many samples in reality

» In practice read small number of samples from a randomly rotated kernel

» Results are filtered to reduce noise

Analysis

» Too many samples in reality

» In practice read small number of samples from a randomly rotated kernel

» Results are filtered to reduce noise

» A low number of samples leads to a low impact visual effect
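The idea can be sketched on the CPU as follows: compare each pixel's depth with a few neighbors taken from a randomly rotated kernel, count the neighbors that sit closer to the camera, then filter the result. This is an illustrative simplification, not the Crysis implementation; the kernel, `bias`, and `max_range` values are assumptions.

```python
import math
import random

# Hypothetical 8-tap kernel of pixel offsets; shipped kernels differ.
KERNEL = [(1, 0), (0, 1), (-1, 0), (0, -1), (2, 2), (-2, 2), (-2, -2), (2, -2)]

def ssao(depth, bias=0.01, max_range=1.0, seed=0):
    """Return per-pixel ambient visibility in [0, 1] from a depth buffer."""
    rng = random.Random(seed)
    height, width = len(depth), len(depth[0])
    result = [[1.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Rotate the kernel by a random per-pixel angle.
            angle = rng.uniform(0.0, 2.0 * math.pi)
            c, s = math.cos(angle), math.sin(angle)
            occluders = 0
            for kx, ky in KERNEL:
                sx = min(width - 1, max(0, x + round(kx * c - ky * s)))
                sy = min(height - 1, max(0, y + round(kx * s + ky * c)))
                # Positive difference: the neighbor is closer to the camera.
                diff = depth[y][x] - depth[sy][sx]
                if bias < diff < max_range:
                    occluders += 1
            result[y][x] = 1.0 - occluders / len(KERNEL)
    return result

def box_blur(image):
    """3x3 box filter with clamped borders, to reduce the sampling noise."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(height - 1, max(0, y + dy))
                    xx = min(width - 1, max(0, x + dx))
                    total += image[yy][xx]
            out[y][x] = total / 9.0
    return out
```

A pixel at the bottom of a pit is surrounded by closer neighbors and comes out fully occluded, while a flat depth buffer stays fully visible.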

Shadow map

» Xbox 360 has several hardware bilinear weight fetch instructions

» Performance boosters

» Use them for hardware accelerated percentage closer filtering

» getWeights1D, getWeights2D, getWeights3D, getWeightsCube

Shadow map

» Poisson filter with rotating kernel is shipped in many games, including Fable 2, Brothers in Arms, and so on

Poisson distribution

Poisson filter

» Generate random numbers with this distribution

» Rotate them

» Offset source sample by the jitters

» Render weighted accumulation
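The four steps above can be sketched on the CPU like this. The dart-throwing generator, the equal sample weights, and all function names are assumptions made for illustration; a GPU version would do the same lookups in a pixel shader.

```python
import math
import random

def poisson_disk(count, min_dist, seed=1, max_tries=10000):
    """Dart-throw points in the unit disk, at least min_dist apart."""
    rng = random.Random(seed)
    points = []
    for _ in range(max_tries):
        if len(points) == count:
            break
        p = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        if math.hypot(p[0], p[1]) > 1.0:
            continue
        if all(math.hypot(p[0] - q[0], p[1] - q[1]) >= min_dist for q in points):
            points.append(p)
    return points

def rotate(points, angle):
    """Rotate the whole kernel, typically by a per-pixel angle."""
    c, s = math.cos(angle), math.sin(angle)
    return [(x * c - y * s, x * s + y * c) for x, y in points]

def pcf(shadow_map, u, v, receiver_depth, samples, radius, angle, bias=0.005):
    """Fraction of jittered shadow map lookups that see the light."""
    height, width = len(shadow_map), len(shadow_map[0])
    lit = 0.0
    for ox, oy in rotate(samples, angle):
        # Offset the source sample by the rotated jitter.
        x = min(width - 1, max(0, round(u + ox * radius)))
        y = min(height - 1, max(0, round(v + oy * radius)))
        if receiver_depth - bias <= shadow_map[y][x]:
            lit += 1.0  # equal weights; a real filter may weight by distance
    return lit / len(samples)
```

Lookups deep inside the shadow return 0, lookups far from it return 1, and lookups near the edge return a fraction that forms the soft border.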

Analysis

» Shadow map itself has no soft edge

» Soft shadow map is created from jitters and filters

» Shadow map is an image based technique of finite resolution

Analysis

» Still a fast technique for high frequency local lighting

» Even 10000 spherical harmonics terms will not give you the occlusion a shadow map will give you

» Still useful for a very long time

Multi core generation

» Occlusion from dynamic geometry

» Dynamic visibility computation

» Spatially varying BRDF's

» High-frequency illumination

» High quality resolution

» Remove remaining compromises

Dynamic radiance

» Haar wavelet radiance caches

» Radiance transfer factorization

» Dimensionality reduction

» Linear discriminant analysis

» BRDF factorization

Dynamic radiance

[Diagram; surviving labels: linear discriminant analysis, BRDF factorization, factorized BRDF, wavelet radiance caches, distance cube (or hemi cube), rasterization, radiance factorization, factorized radiance caches]

Wavelet radiance caches

» Haar wavelet basis

» Visibility

» Radiance factorization

Haar wavelet basis

» Spherical harmonics is not the only basis available for radiance transfer

» Radiance and sum of area lights can also be represented by Haar wavelets

Haar wavelet radiance

» What is exciting about Haar wavelet is that its radiance visibility triple integral is fast enough to run on GPU in real time

Haar wavelet

2D Haar wavelet and visibility

» The visibility function V(x, theta) is also a binary function

» Multiplying visibility with wavelet radiance spatially and physically turns parts of the wavelet equation on and off

Wavelet and integrals

» The integral of the product of wavelet radiance and visibility also simplifies the run-time equation
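A small 1D sketch of why the integral simplifies: in an orthonormal Haar basis, the integral of radiance times visibility equals the dot product of their coefficient vectors. This covers only the two-function product of radiance and binary visibility, not the full triple integral with the BRDF, and the sample values are made up for the example.

```python
import math

def haar(signal):
    """Orthonormal 1D Haar transform; len(signal) must be a power of two."""
    out = [float(v) for v in signal]
    n = len(out)
    while n > 1:
        half = n // 2
        averages = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        details = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = averages + details
        n = half
    return out

# Hypothetical radiance samples over 8 directions and a binary visibility mask.
radiance = [0.2, 0.4, 3.0, 2.5, 0.1, 0.1, 0.3, 0.9]
visibility = [1, 1, 1, 0, 0, 0, 1, 1]

direct = sum(l * v for l, v in zip(radiance, visibility))
in_wavelet_space = sum(a * b for a, b in zip(haar(radiance), haar(visibility)))
print(direct, in_wavelet_space)
```

Because many coefficients are zero or negligible for typical lighting and visibility, the dot product can skip them, which is where the run-time saving comes from.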

Wavelet visibility insights

» In some ways, spherical harmonics is the frequency corrected distribution of the basis in directional light map

» Zonal harmonics correctly samples and stores radiance contribution without a preference to a direction

Wavelet visibility insights

» “Simulating soft shadows with graphics hardware”, Heckbert and Herf, 1997

» Heckbert rendered soft shadows by rendering shadows from 100 lights to create shadow penumbra
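The averaging idea can be sketched in a 2D cross-section: sample an area light with many point lights, compute a hard shadow test for each, and average. The scene layout (light at height 2, occluder at height 1, receiver on the ground) and all names are assumptions for illustration, not the setup from the paper.

```python
def soft_shadow(x_receiver, num_lights=100,
                light_min=-0.5, light_max=0.5,
                occluder_min=-1.0, occluder_max=0.0):
    """Fraction of point lights visible from a ground point.

    The area light spans [light_min, light_max] at height 2, the occluder
    spans [occluder_min, occluder_max] at height 1, the receiver is at
    height 0, so each ray crosses the occluder plane at its midpoint.
    """
    lit = 0
    for i in range(num_lights):
        x_light = light_min + (light_max - light_min) * (i + 0.5) / num_lights
        x_crossing = 0.5 * (x_receiver + x_light)
        if not (occluder_min <= x_crossing <= occluder_max):
            lit += 1
    return lit / num_lights

# Umbra, middle of the penumbra, and fully lit.
print(soft_shadow(-1.0), soft_shadow(0.0), soft_shadow(1.0))
```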

Analysis

» No BRDF and inter-reflection

» No radiance transfer

» No specular reflectance

Analysis

» No BRDF and inter-reflection

» No radiance transfer

» No specular reflectance

» It was GPU accelerated for its time!

Multi core rendering

» What is the modern multi core shader / homogeneous function pipeline version of this technique?

Multi core rendering

» Not just shadows, the full radiance illumination model

Multi core rendering

» Not just shadows, the full radiance illumination model

» Not one light per pass, sample sparse wavelet data efficiently in tfetchCube

Wavelet radiance caches

» Haar wavelet basis

» Visibility

» Radiance factorization

Dynamic radiance

» For dynamic geometry, convolution of the visibility changes with the radiance wavelet coefficients must be performed before the radiance is applied

» Still challenging to perform at run time

Ray tracing or radiosity

» Capture only occlusion

» Capture the full transport and full reflectance distribution

» GPU occlusion through rasterization

» GPU kd-tree line trace

Capture only occlusion

» Feasible with current hardware

» Fast

» GPU side, hardware occlusion

» CPU side, line trace into kd-tree

» Visually unsophisticated

Capture full reflectance transport

» Visually much more complex than GPU occlusion

» More expensive

» Fill out wavelet probes on different threads across multiple frames

» Unfinished wavelet probes still useful for radiance

Radiosity

» “The hemi-cube: a radiosity solution for complex environments”, Cohen and Greenberg, 1985

» Use GPU to rasterize radiance

Radiosity

» Great for low frequency spherical harmonics

» First pass has direct lighting only

» For high frequency wavelets, needs excessively high resolution

» No caustics, subsurface scattering

Radiosity

» Low resolution first pass with GPU hemi-cube

» Higher frequency passes with direction cube kd-tree line tracing

Raytracing

» Direction cube techniques and ray tracer caches can take up too much memory

» Reyes ray tracing may be more parallelizable, but be careful of bucket load balancing

Bounding volume hierarchy

» Kd-tree can be 15x faster than BSP for ray tracing

» SAH (surface area heuristic) only necessary in deeper nodes

» For nodes close to the root, dividing by the number of objects in the boxes is good enough
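One way to read the two bullets above is sketched below: near the root, split the objects into equal counts; deeper in the tree, pick the split with the lowest surface area heuristic cost. This is an object-partition sketch over axis-aligned boxes rather than a full kd-tree builder, and the `sah_depth` threshold and function names are assumptions.

```python
def surface_area(lo, hi):
    dx, dy, dz = (hi[i] - lo[i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def bounds(boxes):
    """Bounding box of a list of (min_corner, max_corner) boxes."""
    lo = tuple(min(b[0][i] for b in boxes) for i in range(3))
    hi = tuple(max(b[1][i] for b in boxes) for i in range(3))
    return lo, hi

def split(boxes, axis, depth, sah_depth=3):
    """Split at least two boxes into (left, right) along one axis."""
    ordered = sorted(boxes, key=lambda b: 0.5 * (b[0][axis] + b[1][axis]))
    if depth < sah_depth:
        # Close to the root: equal object counts are good enough.
        k = len(ordered) // 2
    else:
        # Deeper nodes: minimize area-weighted object counts (SAH).
        # Assumes the parent box is not degenerate (non-zero area).
        parent_area = surface_area(*bounds(ordered))

        def cost(k):
            left, right = ordered[:k], ordered[k:]
            return (surface_area(*bounds(left)) * len(left)
                    + surface_area(*bounds(right)) * len(right)) / parent_area

        k = min(range(1, len(ordered)), key=cost)
    return ordered[:k], ordered[k:]
```

With five boxes clustered together and one far away, the count-based split cuts 3 and 3, while the SAH split isolates the outlier.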

Wavelet radiance analysis

» It takes about 18 to 20 terms to represent all frequencies well

» This is twice the number of terms for SH irradiance maps (9 terms)

Wavelet radiance analysis

» Memory is much less because the probes are not pre-computed across the level

» Fetching the terms to synthesize the radiance is twice or more the pixel ALU cost

Wavelet radiance analysis

» 18 wavelet terms, on the other hand, capture high frequency quality not captured by 10000 term spherical harmonics

» Not exactly a 1:1 trade-off for high frequency or all frequency solution

Wavelet radiance caches

» Haar wavelet basis

» Visibility

» Radiance factorization

Radiance factorization

» Radiance factorization is important to dynamic radiance transfer

» Decompose radiance transfer

Radiance factorization

» Spatial contribution

Radiance factorization

» Spatial contribution

» Angular contribution

Radiance factorization

» Spatial contribution

» Angular contribution

» Temporal contribution

Radiance factorization

» Spatial contribution

» Angular contribution

» Temporal contribution

» Visibility contribution

Dimensionality reduction

» Exponential growth with dimensionality and contribution factors

» Dimensionality reduction to factorize the radiance triple integral

Dimensionality reduction

» In reality, the top factors impact output more than less relevant factors

Dimensionality reduction

» Principal components analysis

» Linear discriminant analysis

Principal components

» Principal

» Orthogonal linear combinations with the largest variance

» Secondary

» Linear combination with the second largest variance and orthogonal to principal

Principal components

» Use principal components to select important factors in the original radiance equation

» Keep separating until factors are separated into components

» Equation factored out into dynamic factors

Principal components

» We can see how factoring principal components can factor out the primary impact of dynamic variables in the radiance equation

Principal component

» PCA remaps an apparently complex function into feature or factor separable distribution
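A minimal 2D sketch of what "principal" and "secondary" mean: the principal axis is the direction of largest variance, and the secondary axis is orthogonal to it. It uses the closed-form angle for a 2x2 covariance matrix; real radiance data has far more dimensions and would use a general eigen or SVD solver. The function names are made up for this example.

```python
import math

def pca_2d(points):
    """Return (mean, principal_axis, secondary_axis) for 2D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Angle of the largest-variance direction of [[cxx, cxy], [cxy, cyy]].
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    principal = (math.cos(theta), math.sin(theta))
    secondary = (-math.sin(theta), math.cos(theta))
    return (mx, my), principal, secondary

def variance_along(points, mean, axis):
    """Variance of the points after projecting onto a unit axis."""
    n = len(points)
    return sum(((p[0] - mean[0]) * axis[0] + (p[1] - mean[1]) * axis[1]) ** 2
               for p in points) / n

points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
mean, principal, secondary = pca_2d(points)
print(principal, variance_along(points, mean, principal))
```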

Principal components

Dimensionality reduction

» PCA works best with purely orthogonal data

» Unfortunately, radiance transfer is not very orthogonal at all

» For better results, a dimensionality reduction algorithm should find separation even when there is none

Linear discriminant analysis

» Works best for Gaussian distribution clusters

» Finds separation even when there is (almost) none

» LDA has potential to out-perform PCA in factorization of the rendering triple integral

Linear discriminant analysis

» Same idea as PCA

» Maximize separation by classification

» Minimize variance within the classification after projection

» Principal, secondary, …
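A minimal two-class, 2D sketch of the idea using Fisher's linear discriminant: the projection direction is w = Sw^-1 (mean_b - mean_a), where Sw is the within-class scatter. The toy data and function names are assumptions; the data is built so the largest variance runs along x (the axis PCA would pick) while the classes only separate along y.

```python
import math

def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter(points, m):
    """Return (sxx, sxy, syy) of the points about their mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy

def fisher_direction(class_a, class_b):
    """Unit vector maximizing between-class over within-class variance."""
    ma, mb = mean(class_a), mean(class_b)
    sxx, sxy, syy = (a + b for a, b in zip(scatter(class_a, ma), scatter(class_b, mb)))
    det = sxx * syy - sxy * sxy
    if det == 0.0:
        raise ValueError("within-class scatter is singular")
    dx, dy = mb[0] - ma[0], mb[1] - ma[1]
    # Apply the inverse of [[sxx, sxy], [sxy, syy]] to the mean difference.
    wx = (syy * dx - sxy * dy) / det
    wy = (-sxy * dx + sxx * dy) / det
    norm = math.hypot(wx, wy)
    return (wx / norm, wy / norm)

class_a = [(-2.0, 0.0), (0.0, 0.1), (2.0, 0.0), (0.0, -0.1)]
class_b = [(x + 0.5, y + 1.0) for x, y in class_a]
w = fisher_direction(class_a, class_b)
print(w)
```

The result points almost straight along y, so projecting onto it separates the two classes cleanly even though most of the variance lies along x.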

D* for rendering?

» B Guenter at MSR

» Developed a compiler and declarative meta language D*

» Creates optimized source code

» Solve for dynamics of an equivalent system and no constraints

D* for rendering?

» With fewer degrees of freedom

» Uses analytic / symbolic approaches based on Lagrangian dynamics

» Coordinate reduction and projection

D* for rendering?

» Derive optional equations to solve for forward dynamics of the system

» Necessary derivatives to linearize the system’s equations of motion at any given configuration

D* for analytical models

» Is there potential for D* to reduce dimension symbolically for the render equation?

Factorization technology

» LDA and D* can be applied to factorize the triple integral

» Factorization is essential to dynamic radiance

Dynamic radiance

» Haar wavelet radiance caches

» Radiance transfer factorization

» Dimensionality reduction

» Linear discriminant analysis

» BRDF factorization

Dynamic radiance

[Diagram; surviving labels: linear discriminant analysis, BRDF factorization]

Dynamic scenes

» Before light reaches the eye, light undergoes a huge number of physical interactions with many objects

» When these objects deform, animate, move, change, or get destroyed, the reflectance distribution should update accordingly

Dynamic radiance

» Factored dynamic radiance requires BRDF cooperation

» Factored spatial radiance transfer, factored specular radiance transfer, needs to be evaluated with only the BRDF lobes that are affected

BRDF factorization

» Efficiency and compression

» Specular lobes require higher order basis for fidelity

» Factorization keeps the basis cost down

BRDFs

» Cook Torrance

» Oren Nayar

» Ward

» Linear combination of measured BRDFs

BRDF factorization prior work

» BRDF factorization

» “Interactive relighting with dynamic BRDFs”, MSRA: Sun, Zhou, Chen, Lin, Shi, Guo, 2007

» They used PCA, not LDA.

» I learned good BRDF factorization practices from this paper.

Factorization

» The challenge of dynamic scenes is that, even given a static world, the radiance inter-reflectance is determined by the configuration of the objects

» We need factorization that takes deformation into account

Haar and factorization

» Another reason I became interested in Haar wavelet representation of radiance is that it adapts very well to factorized tensors generated by LDA

Summary

» Occlusion from dynamic geometry

» Dynamic visibility computation

» Spatially varying BRDF's

» High-frequency illumination

» High quality resolution

» Remove remaining compromises

Long tail Xbox 360

Long tail Xbox 360

» Use LDA at build time to reduce dimensionality

» Combine classifications

» Reduce number of run time variables to principal components

» Speed optimization

Future work

Future work

» Spherical wavelet instead of 2D Haar wavelet?

Future work

» Spherical wavelet instead of 2D Haar wavelet?

» Nonlinear and kernel dimensionality reduction instead of LDA?

Future work

» Spherical wavelet instead of 2D Haar wavelet?

» Nonlinear and kernel dimensionality reduction instead of LDA?

» Dimensionality reduction on a symbolic level?

Summary

» Rally effort to develop symbolic kernels for dynamic radiance transfer

» Rally effort to factorize the rendering equation triple integral with mathematical techniques or human manual optimization

Thank you

» Corrinne.Yu@microsoft.com

» Continue our discussion and future work to implement dynamic radiance at corrinnesdotplan.blogspot.com

» Please fill in the survey.
