Implementing the Render Cache and the Edge-and-Point Image on Graphics Hardware

Edgar Velázquez-Armendáriz

Eugene Lee

Bruce Walter

Kavita Bala

GI 2006, Québec, June 9th 2006

Motivation

• High quality shading is still too slow.

– Not ready for interactivity.

– It is slow even on the GPU.

• Potential applications.

– Architecture.

– Modeling.

– Movies.

Overview

• GPU acceleration of the Render Cache and the Edge-and-Point Image (EPI).

Render Cache overview

[Figure: Render Cache stages: Projection → Depth cull → Interpolation]
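The three stages named above can be sketched on the CPU as follows. This is a minimal illustration, not the authors' implementation: the pinhole camera, the names `project_points` and `fill_holes`, and the 3x3 averaging rule are assumptions made for the example, and a plain nearest-point test stands in for the depth cull.

```python
import math


def project_points(points, width, height, focal):
    """Project cached points (x, y, z, shade) in camera space onto the image.

    Returns (color, depth) as lists of rows; empty pixels hold None / inf.
    """
    color = [[None] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]
    for x, y, z, shade in points:
        if z <= 0.0:
            continue  # point is behind the camera
        # Projection: pinhole model centred on the image.
        px = math.floor(focal * x / z + width / 2)
        py = math.floor(focal * y / z + height / 2)
        if 0 <= px < width and 0 <= py < height:
            # Depth cull (simplified): the nearest point wins the pixel.
            if z < depth[py][px]:
                depth[py][px] = z
                color[py][px] = shade
    return color, depth


def fill_holes(color):
    """Interpolation: fill each empty pixel with the mean of its valid 3x3 neighbours."""
    height, width = len(color), len(color[0])
    out = [row[:] for row in color]
    for y in range(height):
        for x in range(width):
            if color[y][x] is not None:
                continue
            vals = [color[j][i]
                    for j in range(max(0, y - 1), min(height, y + 2))
                    for i in range(max(0, x - 1), min(width, x + 2))
                    if color[j][i] is not None]
            if vals:
                out[y][x] = sum(vals) / len(vals)
    return out
```

Pixels with no valid neighbour stay empty after `fill_holes`; in a sample-driven system these are natural candidates for new shading requests.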

Edge-and-Point Image overview

• Edge-constrained interpolation preserves sharp features

• Fast anti-aliasing

Presented work

• Mapping to the hardware

– The algorithm’s components differ from standard hardware rendering.

– Overcome GPU limitations.

• Results

– GPU strategies.

– Better interactivity.

Related Work

• Interactive.

– Shading cache. [Tole02]

– Corrective texturing. [Stamminger00]

– Tapestry. [Simmons00]

– Adaptive Frameless Rendering. [Dayal05]

– Distance impostors. [Szirmay-Kalos05]

• Non-interactive.

– Irradiance caching. [Smyk05]

• Pure Hardware implementations.

– Ray tracing. [Purcell02, Carr06]

– Photon mapping. [Purcell03]

Talk overview

• Algorithm overview.

• Mapping to the hardware: strategies and challenges.

• Results.

• Discussion.

Overview

[Figure: system diagram, built up over three slides. Components: shader, shadow edge finder, silhouette edge finder, point manager, point projector, edge raster, edge-constrained interpolation. Data exchanged: sample requests, shading samples, 3D points, 3D edges, 2D points, 2D edges, feedback, output image. Legend: asynchronous, CPU, GPU.]

Public availability

• The complete Cg source of the shaders is available online: http://www.cs.cornell.edu/~kb/projects/epigpu/

Mapping to the hardware

• Stages are grouped by computational similarity:

– Point processing

– Edge finding

– Edge constrained interpolation

• Most of the processing has been moved to the GPU.

[Figure: stages discussed in this part: silhouette edge finder (3D edges), point projector (2D points), edge raster (2D edges), edge-constrained interpolation.]

Point processing

• Point Cloud as Vertex Buffer Object (VBO) and Texture.

• Multiple Render Targets (MRT) used to write all information in a single pass.

• Simplified predicted projection.

– Not as accurate as the regular projection.

quarter of the point cloud

Point processing: Update

• The Render Cache's data structures are complex to map to the GPU.

• We cannot modify pipelined GPU data.

– Use additional passes.

[Figure: point projector (vertex and pixel shaders), point cloud → point image.]

Point processing: Bandwidth issues

• Point projection is bandwidth limited.

– Point cloud update.

– New samples request.

• Write only the new samples to the point cloud.

– We use vertex scatter.

– Faster than replacing the whole point cloud.

• A static VBO is projected three times faster than a constantly modified one.
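The difference between the two update strategies can be sketched with plain lists. This is only an illustration under stated assumptions: `replace_all` and `scatter_update` are hypothetical names, and the count of slots written is used as a rough stand-in for the bandwidth each strategy consumes.

```python
def replace_all(cloud, new_samples):
    """Rebuild the whole buffer; every slot is written, changed or not.

    new_samples is a list of (index, sample) pairs.
    Returns (new buffer, number of slots written).
    """
    updated = list(cloud)  # full copy, as when the entire buffer is re-sent
    for index, sample in new_samples:
        updated[index] = sample
    return updated, len(updated)


def scatter_update(cloud, new_samples):
    """Write only the slots that received a new sample, in place.

    Returns (same buffer, number of slots written).
    """
    for index, sample in new_samples:
        cloud[index] = sample
    return cloud, len(new_samples)
```

Both functions leave the buffer with the same contents; only the amount of data touched differs, which is the point of the scatter approach.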

Silhouette detection

• The original EPI uses hierarchical trees.

– Does not map well to GPU.

• Brute force method on the GPU.

– Avoids transferring the edges every frame.

– Faster than hierarchical structures!

• Shadow edge detection left on the CPU.

[Figure: model edges and the corresponding edge texture.]
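A brute-force silhouette pass tests every model edge independently. The slides do not state the exact test used, so the sketch below assumes the standard one: an edge is a silhouette when one adjacent face points toward the viewer and the other points away. The function name and the edge layout (midpoint plus two face normals) are hypothetical.

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def sub(a, b):
    """Component-wise difference a - b of two 3-vectors."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def silhouette_flags(edges, eye):
    """Return one boolean per edge: True if the edge is a silhouette from eye.

    edges is a list of (midpoint, normal_a, normal_b), where the normals
    belong to the two faces sharing the edge. No edge depends on any other,
    so each test could be evaluated independently and written to one slot
    of a results buffer.
    """
    flags = []
    for midpoint, normal_a, normal_b in edges:
        view = sub(eye, midpoint)  # direction from the edge to the viewer
        facing_a = dot(normal_a, view) > 0.0
        facing_b = dot(normal_b, view) > 0.0
        flags.append(facing_a != facing_b)
    return flags
```

Because the per-edge work is uniform and branch-free apart from the final comparison, no hierarchy has to be traversed or kept up to date.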

Silhouette detection: Limitations

• GPU silhouette detection is limited by the fill rate.

• Texture memory constraints.

– We need to keep all vertices as VBO.

– Vertices and normals as textures.

– One results texture.

• Normals stored as fp16 to reduce space.

Edge Raster

• Raster edges with subpixel precision.

• Depends on model complexity.

• Extended lines as described in [Sen03].

• Filtered depth as read-only depth buffer.

– Free occlusion culling!

[Figure: edge raster output without the depth texture and with the depth texture.]
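The occlusion culling mentioned above amounts to a depth test against a buffer that is read but never written. A minimal sketch, assuming edge fragments arrive as `(x, y, z)` tuples and using a hypothetical `bias` parameter to tolerate small depth differences:

```python
def visible_edge_pixels(edge_pixels, depth_buffer, bias=1e-3):
    """Keep the edge fragments that are not hidden behind the stored depth.

    edge_pixels is an iterable of (x, y, z); depth_buffer is a list of rows.
    The depth buffer is only read, so the edges never occlude each other
    and the filtered point depths stay untouched.
    """
    visible = []
    for x, y, z in edge_pixels:
        if z <= depth_buffer[y][x] + bias:
            visible.append((x, y))
    return visible
```

Edges behind the reprojected points are dropped without any extra visibility pass.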

Edge Constrained Interpolation

• Multi-pass pixel shaders.

– Very long.

– A lot of texture accesses.

• Image resolution dependent.

• Use look-up tables encoded as textures.

– Avoid control code in shaders.

– Encode original EPI operations.
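The look-up table idea can be illustrated with a small example. This is a sketch under assumptions, not the tables used in the shaders: each pixel is assumed to carry a 4-bit mask saying which of its four neighbours (left, right, up, down) lie across an edge, and a 16-entry table precomputed once maps that mask to interpolation weights.

```python
def build_weight_table():
    """Precompute, for each 4-bit edge mask, the weights of the four neighbours.

    Bit i set means an edge separates the pixel from neighbour i, so that
    neighbour gets weight 0. The remaining neighbours share the weight equally.
    """
    table = []
    for mask in range(16):
        usable = [0 if (mask >> i) & 1 else 1 for i in range(4)]
        total = sum(usable)
        table.append(tuple(u / total if total else 0.0 for u in usable))
    return table


WEIGHTS = build_weight_table()


def interpolate(neighbours, mask):
    """Blend four neighbour values; one table fetch replaces per-neighbour tests."""
    weights = WEIGHTS[mask]
    return sum(w * v for w, v in zip(weights, neighbours))
```

All bit decoding happens when the table is built, so the per-pixel code is a fetch followed by a fixed weighted sum, with no control flow.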

Future trends

• Branching granularity.

– Some filters require fine granularity to take advantage of dynamic branching.

– This issue is being addressed by newer cards, beginning with the ATI X1000 series.

• Bit operations not directly supported.

– DirectX 10 will support them.

• Bottom line: GPU implementation will get better and faster.

Limitations

• Fill rate and texture access.

– These characteristics constantly improve with newer hardware with more pipelines and faster clock frequencies.

• Improve by reducing shader length.

– The number of registers used is still important.

– A 180-instruction shader with 25 registers performs 50% slower than a 215-instruction shader with 24 registers on our GPU.

Test platform

• Test environment.

– Software written in C++, Cg 1.4rc, and Java through JNI under Windows XP.

– Pentium 4 EE 3.2 GHz dual core, 2 GB RAM, dual Nvidia GeForce 7800 GTX (81.85).

• Test scenes.

– Cornell Box

– Chains

– Mackintosh Room

– David Head

– Dragon

Results: FPS

• GPU version is 60–110% faster than the original.

– Speed up increases along with scene complexity.

[Chart: frames per second, CPU only vs. GPU, for the Cornell Box, Chains, Mack Room, David Head and Dragon scenes; axis from 0 to 30.]

Results: Speed increase from CPU

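[Chart: speed increase over the CPU implementation for each stage (point projection, predicted projection, depth cull, silhouette detection, edge raster, image filters); axis from 0.0% to 700.0%. Data labels as extracted, with the assignment to stages not recoverable: 278.6%, 13.6%, 90.6%, 665.3%, 317.2%, 45.4%.]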

Results: Rendering times

[Chart: stacked rendering times for the Dragon scene, CPU vs. GPU, broken down into image filters, edge raster, silhouette detection, depth cull, predicted projection and point projection; axis from 0 to 140.]

Discussion

• Point projection, even though it maps straightforwardly to the GPU, is the bottleneck.

• Image filters are very fast in spite of their multiple texture accesses and multiple passes.

• We originally thought the opposite would be true!

Discussion

• Projection is not optimal.

• We wanted to use Vertex Texture Fetch (VTF) for mapping the point cloud update, but it was slower than Render to Vertex Array (RTV).

• Dual GPU rendering with Scalable Link Interface (SLI) showed marginal gains.

Future performance

• Texture accesses are very fast and efficient.

• Transferring vertex data on the GPU is too slow to be fully useful.

• Scatter write on pixel shaders and geometry shaders may allow complete data management on the GPU.

Conclusions

• We presented a hybrid GPU/CPU system for the Render Cache and the EPI using commodity graphics hardware.

• Our implementation is 60–110% faster than a pure CPU implementation and frees the CPU up for other operations.

• The system's performance is likely to improve with the current trend of GPUs.

Questions?
