CS 446: Real-Time Rendering & Game Technology
David Luebke
University of Virginia
• Billboards
– Screen-aligned, world-aligned
• Point sprites
• Imposters
– Trees, buildings, portal textures, billboard clouds
– Dynamic imposters for “caching” rendering results
• Depth textures
• Multitexturing
– Low-res light maps, hi-res decals, etc
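As a rough illustration of the screen-aligned billboard idea above, here is a minimal Python sketch (not from the lecture) that builds a camera-facing quad from the camera's right and up vectors. The function name and argument layout are assumptions made for this example.

```python
def screen_aligned_billboard(center, right, up, width, height):
    """Return the four world-space corners of a quad that faces the camera.

    center: world-space position of the billboard (3-tuple)
    right, up: the camera's unit right and up vectors in world space (3-tuples)
    """
    hw, hh = width / 2.0, height / 2.0

    def corner(sr, su):
        # Offset the center along the camera axes, so the quad stays
        # parallel to the image plane however the camera is oriented.
        return tuple(c + sr * hw * r + su * hh * u
                     for c, r, u in zip(center, right, up))

    # Counter-clockwise, starting at the lower-left corner.
    return [corner(-1, -1), corner(1, -1), corner(1, 1), corner(-1, 1)]


corners = screen_aligned_billboard((0.0, 0.0, -5.0),
                                   (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
                                   2.0, 4.0)
```

A world-aligned billboard would instead fix one axis (typically world up) and rotate only about it.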
Real-Time Rendering 2 David Luebke
• Render to texture – framebuffer objects (FBOs)
– Multiple render targets
• Environment maps
– Sphere map, cube maps (hardware supported)
• Shadow maps
– A depth texture rendered from light source (more later)
• Relief textures
– Demo now, details later
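For the sphere-map bullet above, the lookup coordinates can be sketched in a few lines of Python. This assumes the classic OpenGL sphere-map formula and a unit reflection vector in eye space; the function name is made up for the example.

```python
import math


def sphere_map_coords(r):
    """Map a unit eye-space reflection vector r = (rx, ry, rz) to (s, t) in [0, 1].

    Uses the classic OpenGL sphere-map mapping. It is singular for
    r = (0, 0, -1), where m below becomes zero.
    """
    rx, ry, rz = r
    m = 2.0 * math.sqrt(rx * rx + ry * ry + (rz + 1.0) ** 2)
    return (rx / m + 0.5, ry / m + 0.5)


center = sphere_map_coords((0.0, 0.0, 1.0))   # reflection straight back at the viewer
side = sphere_map_coords((1.0, 0.0, 0.0))     # reflection pointing to the right
```

A cube map avoids this singularity by indexing six faces directly with the 3D direction.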
• Normal maps – especially for bump mapping
– Gloss maps, reflectance maps, etc
• Generally:
– Think of textures as global memory for fragment programs, with built-in filtering
– Just starting to be able to access textures in vertex programs too (NVIDIA hardware only, today)
• Deferred shading
• Projective texture mapping
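To make the normal-map bullet above concrete, here is a small Python sketch of decoding a stored texel into a unit normal. It assumes the common convention of mapping 8-bit RGB in [0, 255] to components in [-1, 1]; the lecture does not specify an encoding, so treat this as one plausible choice.

```python
import math


def decode_normal(rgb):
    """Decode an 8-bit RGB texel into a unit-length normal vector.

    Assumes the usual n = 2 * c - 1 encoding with c in [0, 1].
    """
    n = [2.0 * c / 255.0 - 1.0 for c in rgb]
    length = math.sqrt(sum(x * x for x in n))
    return tuple(x / length for x in n)


flat = decode_normal((128, 128, 255))    # the typical "straight up" texel
tilted = decode_normal((255, 128, 128))  # a normal leaning along +x
```

A fragment program would then use the decoded normal in its lighting computation instead of the interpolated vertex normal.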
• Many of the techniques we discuss in this class do not depend on programmable graphics hardware
– But even those are often easier to implement with programmable hardware!
• And programmable graphics opens up an endless number of tricks and techniques that could not have been efficiently implemented before
• So, the next topic is a brief intro to Cg
– My apologies to those of you who’ve seen this
– My apologies to those of you who haven’t
• Much of this lecture comes from Bill Mark’s SIGGRAPH 2002 course talk on NVIDIA’s programmable graphics technology
• For this reason, and because the lab is outfitted with NVIDIA cards, we will focus on NVIDIA tech
• I try to mention similarities and differences with ATI, the other main GPU vendor, in lecture and slides
• Note: many/most images are from NVIDIA as well
• A simplified graphics pipeline
– Note that pipe widths vary
– Many caches, FIFOs, and so on not shown
[Figure: simplified graphics pipeline. CPU: Application. GPU: Transform & Light, Assemble Primitives, Rasterize, Shade; Graphics State; Video Memory (Textures), with a render-to-texture path.]
• Transform & light (a.k.a. vertex processor)
– Transform from “world space” to “image space”
– Compute per-vertex lighting
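The two jobs of the transform & light stage can be sketched on the CPU as follows. This is an illustrative Python example, not the hardware's implementation: it assumes a row-major 4x4 matrix, homogeneous 4-vectors, and a simple clamped diffuse (N dot L) lighting term.

```python
def transform_point(m, p):
    """Multiply a row-major 4x4 matrix m by a homogeneous 4-vector p."""
    return tuple(sum(m[i][j] * p[j] for j in range(4)) for i in range(4))


def diffuse(normal, light_dir, color):
    """Per-vertex diffuse lighting: scale color by max(0, N . L).

    normal and light_dir are assumed to be unit 3-vectors.
    """
    n_dot_l = max(0.0, sum(n * l for n, l in zip(normal, light_dir)))
    return tuple(c * n_dot_l for c in color)


# A translation by (1, 2, 3), used here only as a stand-in transform.
translate = [[1.0, 0.0, 0.0, 1.0],
             [0.0, 1.0, 0.0, 2.0],
             [0.0, 0.0, 1.0, 3.0],
             [0.0, 0.0, 0.0, 1.0]]

moved = transform_point(translate, (0.0, 0.0, 0.0, 1.0))
lit = diffuse((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (1.0, 0.5, 0.0))
unlit = diffuse((0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (1.0, 0.5, 0.0))
```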
Courtesy Mark Harris
• Rasterizer
– Convert geometric rep. (vertex) to image rep. (fragment)
• Fragment = image fragment
– Pixel + associated data: color, depth, stencil, etc.
– Interpolate per-vertex quantities across pixels
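The interpolation step can be sketched with barycentric weights. The Python below is a simplified illustration that assumes 2D screen-space vertices and plain linear interpolation; real rasterizers also apply perspective correction, which is omitted here.

```python
def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def interpolate(p, verts, values):
    """Interpolate a per-vertex scalar at pixel p, or return None if p is outside.

    verts: three 2D screen-space vertices; values: one scalar per vertex.
    """
    a, b, c = verts
    area = edge(a, b, c)
    w0 = edge(b, c, p) / area   # weight of vertex a
    w1 = edge(c, a, p) / area   # weight of vertex b
    w2 = edge(a, b, p) / area   # weight of vertex c
    if min(w0, w1, w2) < 0.0:
        return None             # pixel not covered: no fragment is generated
    return w0 * values[0] + w1 * values[1] + w2 * values[2]


triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
inside = interpolate((1.0, 1.0), triangle, [0.0, 1.0, 2.0])
outside = interpolate((5.0, 5.0), triangle, [0.0, 1.0, 2.0])
```

Each covered pixel, together with its interpolated values, is what the slide calls a fragment.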
Courtesy Mark Harris
• Fragment processors (multiple in parallel)
– Compute a color for each pixel
– Optionally read colors from textures (images)
• Programmable vertex processor!
• Programmable pixel processor!
[Figure: pipeline diagram with programmable vertex and pixel processors. CPU: Application. GPU: Assemble Primitives, Rasterize; Graphics State; Video Memory (Textures), with a render-to-texture path.]
• Programmable primitive assembly!
• More flexible memory access!
[Figure: pipeline diagram. CPU: Application. GPU: Vertex Processor, Rasterize, Fragment Processor; Graphics State; Video Memory (Textures), with a render-to-texture path.]
• 32-bit IEEE floating-point throughout pipeline
– Framebuffer
– Textures
– Fragment processor
– Vertex processor
– Interpolants
• Can support 32-bit IEEE floating point throughout pipeline
– Vertices, interpolants, framebuffer, textures, computations
• Fragment processor also supports:
– 16-bit “half” floating point, 12-bit fixed point
– These may be faster than 32-bit
• Framebuffer/textures also support:
– Large variety of fixed-point formats
• E.g., classical 8-bit per component RGBA, BGRA, etc.
– These formats use less memory bandwidth than FP32
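The precision and bandwidth trade-off above can be demonstrated with Python's standard `struct` module, which can round-trip values through IEEE half (`e`) and single (`f`) formats. This is only a CPU-side illustration; the 12-bit fixed-point format is not modeled, and the helper names are assumptions for the example.

```python
import struct


def to_half(x):
    """Round x to the nearest 16-bit 'half' float and return it as a Python float."""
    return struct.unpack('<e', struct.pack('<e', x))[0]


def to_float32(x):
    """Round x to the nearest 32-bit float and return it as a Python float."""
    return struct.unpack('<f', struct.pack('<f', x))[0]


# Bytes needed for one 4-component texel in each format.
texel_bytes = {
    "RGBA8": struct.calcsize('<4B'),
    "RGBA FP16": struct.calcsize('<4e'),
    "RGBA FP32": struct.calcsize('<4f'),
}

half_error = abs(to_half(0.1) - 0.1)
single_error = abs(to_float32(0.1) - 0.1)
```

Half floats lose integer precision above 2048, which matters if they are used to hold, say, texel indices into a large texture.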
• 4-vector FP32 operations
• Condition codes + true data-dependent control flow
– Conditional branches, subroutine calls, jump table
– Useful for avoiding extra work, e.g.:
• Don’t do animation, skinning if vertex will be clipped
• Do displacement mapping only for vertices near silhouette
– Transcendental arithmetic instructions (e.g. COS)
• User clip-plane support
• Texture reads (up to 4 textures, unlimited lookups)
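The "avoid extra work" use of data-dependent branching might look like the following Python sketch. It assumes the OpenGL clip-space convention (-w <= x, y, z <= w); the `expensive_work` callback is a hypothetical stand-in for animation or skinning.

```python
def inside_clip_volume(clip_pos):
    """True if a clip-space position lies inside the view volume (OpenGL convention)."""
    x, y, z, w = clip_pos
    return all(-w <= c <= w for c in (x, y, z))


def vertex_program(clip_pos, expensive_work):
    """Skip the expensive per-vertex work when the vertex will be clipped."""
    if not inside_clip_volume(clip_pos):
        return clip_pos, None              # early out: the branch saves the work
    return clip_pos, expensive_work(clip_pos)


calls = []


def fake_skinning(pos):
    # Records each call so the saved work is observable.
    calls.append(pos)
    return "skinned"


visible = vertex_program((0.0, 0.0, 0.0, 1.0), fake_skinning)
clipped = vertex_program((5.0, 0.0, 0.0, 1.0), fake_skinning)
```

Without true branching, both paths would have to be evaluated and one result discarded via condition codes.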
• No arbitrary memory write
• No “vertex kill”
– Can put vertex off-screen
– Can make degenerate primitives
• Only 32-bit texture formats supported
• 65535 instructions per program
• Other statistics (NV30, not sure about NV40-G70):
– 16 temporary 4-vector registers
– 256 “uniform” parameter registers
– 2 address registers (4-vector)
– 6 clip-distance outputs
• Texture reads are just another instruction
• Allows computed texture coordinates, nested to arbitrary depth
– This is a big difference between NVIDIA and ATI right now
• Allows multiple uses of a single texture unit
• Optional LOD control – can specify filter extent
• Think of it as a memory-read instruction, with optional user-controlled filtering
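A computed (dependent) texture read, where the result of one lookup supplies the coordinates for the next, can be sketched in Python as below. The nearest-neighbour, clamp-to-edge `tex2d` helper and the tiny example textures are assumptions for illustration; real hardware adds filtering and LOD selection.

```python
def tex2d(texture, u, v):
    """Nearest-neighbour lookup with clamp-to-edge; texture is a list of rows."""
    height, width = len(texture), len(texture[0])
    x = min(width - 1, max(0, int(u * width)))
    y = min(height - 1, max(0, int(v * height)))
    return texture[y][x]


# A 2x1 texture whose texels are themselves texture coordinates.
offset_tex = [[(0.75, 0.25), (0.25, 0.75)]]

# A 2x2 texture of colors.
color_tex = [["red", "green"],
             ["blue", "white"]]


def dependent_lookup(u, v):
    """First read yields coordinates; the second read uses them (one level of nesting)."""
    du, dv = tex2d(offset_tex, u, v)
    return tex2d(color_tex, du, dv)


left = dependent_lookup(0.1, 0.5)
right = dependent_lookup(0.9, 0.5)
```

Nesting "to arbitrary depth" simply means the second result could feed a third lookup, and so on.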
• Dynamic branching
• Conditional fragment-kill instruction
• Read access to window-space position
• Read/write access to fragment Z (but not stencil)
• Multiple render targets
• Built-in derivative instructions
– Partial derivatives w.r.t. screen-space x or y
– Useful for anti-aliasing shaders
• FP32, FP16, and fixed-point data
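The derivative instructions can be approximated on the CPU by differencing neighbouring pixels. The Python sketch below assumes the commonly described scheme of finite differences within a 2x2 pixel quad; the lecture does not say how the hardware computes them, so this is an assumption.

```python
def ddx(f, x, y):
    """Approximate df/dx at pixel (x, y) by differencing across its 2x2 quad."""
    x0 = x - (x % 2)            # left column of the quad containing x
    return f(x0 + 1, y) - f(x0, y)


def ddy(f, x, y):
    """Approximate df/dy at pixel (x, y) by differencing across its 2x2 quad."""
    y0 = y - (y % 2)            # top row of the quad containing y
    return f(x, y0 + 1) - f(x, y0)


def shaded_value(x, y):
    # Hypothetical per-pixel quantity, e.g. a texture coordinate.
    return 3 * x + 5 * y


dx = ddx(shaded_value, 7, 2)
dy = ddy(shaded_value, 7, 2)
```

A procedural shader can use these rates of change to estimate its filter width and fade out detail finer than a pixel.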
• Dynamic branching less efficient than vertex proc.
– Especially for non-coherent branching (branches that vary within regions smaller than roughly 30x30 pixels)
– Can do a lot with condition codes
• No indexed reads from registers
– I.e., no indexed arrays
– Must use texture reads instead
• No arbitrary memory write
• 65535+ instructions
• Nearly unlimited constants
– Each constant counts as one instruction
• 16 texture units (NV30, still?), reuse as often as desired
• 10 FP32 x 4 perspective-correct inputs (e.g. tex coords)
• Up to 4 128-bit framebuffer “color” outputs
– Can pack as 4 x FP32, 8 x FP16, etc.
• Can also set the depth output
– 24 or 32 bits, depending on stencil
– Changing depth in fragment program may disable Z-optimizations
• Note: this slide will be dated almost instantly
• NVIDIA: as described in previous slides
• ATI hardware today (Radeon X1900 XT is the current high-end part):
– No vertex texture fetch (but good render-to-vertex-array)
– Far fewer levels of computed texture coordinates
– Better at fine-grained (less coherent) dynamic branching
• ATI Xenos (Xbox 360 chip):
– Unified shader model: vertex proc == pixel proc
– Scatter support: shaders can write arbitrary memory loc
• Cg is a high-level GPU programming language
• Designed by NVIDIA and Microsoft
• Competes with the (quite similar) GL Shading Language, a.k.a. GLslang
Assembly
…
FRC R2.y, C11.w;
ADD R3.x, C11.w, -R2.y;
MOV H4.y, R2.y;
ADD H4.x, -H4.y, C4.w;
MUL R3.xy, R3.xyww, C11.xyww;
ADD R3.xy, R3.xyww, C11.z;
TEX H5, R3, TEX2, 2D;
ADD R3.x, R3.x, C11.x;
TEX H6, R3, TEX2, 2D;
…
Cg
…
L2weight = timeval - floor(timeval);
L1weight = 1.0 - L2weight;
ocoord1 = floor(timeval)/64.0 + 1.0/128.0;
ocoord2 = ocoord1 + 1.0/64.0;
L1offset = f2tex2D(tex2, float2(ocoord1, 1.0/128.0));
L2offset = f2tex2D(tex2, float2(ocoord2, 1.0/128.0));
…
• Easier to read and modify
• Cross-platform
• Combine pieces
• etc.
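To see what the Cg fragment above computes, its arithmetic can be replayed on the CPU. The Python below mirrors the four assignments and omits the two texture lookups; the function name is invented, and reading the 64 and 128 constants as "64 texels wide, sampled at texel centers" is an interpretation rather than something the slide states.

```python
import math


def blend_params(timeval):
    """Mirror the Cg arithmetic: blend weights and lookup coordinates for two samples."""
    l2weight = timeval - math.floor(timeval)    # fractional part of the time value
    l1weight = 1.0 - l2weight                   # complementary weight
    ocoord1 = math.floor(timeval) / 64.0 + 1.0 / 128.0
    ocoord2 = ocoord1 + 1.0 / 64.0              # the next entry along the row
    return l1weight, l2weight, ocoord1, ocoord2


params = blend_params(2.25)
```

The two weights always sum to one, so the result is a linear blend between adjacent entries.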
• CPU languages
– C – close to the hardware; general purpose
– C++, Java, lisp – require memory management
– RenderMan – specialized for shading
• Real-time shading languages
– Stanford shading language
– Creative Labs shading language
• Start with C (and a bit of C++)
– Minimizes number of decisions
– Gives you known mistakes instead of unknown ones
• Allow subsetting of the language
• Add features desired for GPUs
– To support GPU programming model
– To enable high performance
• Tweak to make it fit together well
1. GPU is a stream processor
– Multiple programmable processing units
– Connected by data flows
[Figure: data-flow diagram with Application, Vertex Processor, Fragment Processor, and Textures. A second version attaches a Program to each processor.]
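The stream-processor view can be sketched with Python generators: each processing unit runs a small program over every element flowing through it. This is a deliberately simplified model that skips rasterization and treats each transformed vertex as one fragment; all names are assumptions for the example.

```python
def vertex_processor(vertices, program):
    """Apply the vertex program to each element of the incoming stream."""
    for v in vertices:
        yield program(v)


def fragment_processor(fragments, program):
    """Apply the fragment program to each element of the incoming stream."""
    for f in fragments:
        yield program(f)


def run_pipeline(vertices, vertex_program, fragment_program):
    # Rasterization is omitted: transformed vertices flow straight on
    # as if each one produced exactly one fragment.
    transformed = vertex_processor(vertices, vertex_program)
    return list(fragment_processor(transformed, fragment_program))


result = run_pipeline([1, 2, 3],
                      lambda v: v * 2,     # stand-in vertex program
                      lambda f: f + 1)     # stand-in fragment program
```

Because each element is processed independently, the hardware is free to run many copies of each program in parallel.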
• Varying inputs (streaming data)
– e.g. normal vector – comes with each vertex
– This is the default kind of input
• Uniform inputs (a.k.a. graphics state)
– e.g. modelview matrix
• Note: Outputs are always varying
vout MyVertexProgram( float4 normal, uniform float4x4 modelview ) {
    …
}
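The varying/uniform distinction can be mimicked in Python: the varying input changes with every call, while the uniform input is shared by all calls in a batch. The body of the slide's `MyVertexProgram` is elided, so the matrix multiply below is purely a hypothetical stand-in.

```python
def my_vertex_program(normal, modelview):
    """normal is varying (one per vertex); modelview is uniform (same for all vertices).

    The matrix-vector product is a hypothetical body, not the slide's.
    """
    return tuple(sum(modelview[i][j] * normal[j] for j in range(4))
                 for i in range(4))


def run_vertex_stream(normals, modelview):
    # The uniform is set once; the program then runs once per streamed vertex.
    return [my_vertex_program(n, modelview) for n in normals]


scale2 = [[2.0, 0.0, 0.0, 0.0],
          [0.0, 2.0, 0.0, 0.0],
          [0.0, 0.0, 2.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]

outputs = run_vertex_stream([(1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)], scale2)
```

The output list has one entry per input vertex, which is why outputs are always varying.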
a) Let compiler do it
– Define a single structure
– Use it for vertex-program output
– Use it for fragment-program input
struct vout {
    float4 color;
    float4 texcoord;
    …
};
b) Do it yourself
– Specify register bindings for VP outputs
– Specify register bindings for FP inputs
– May introduce HW dependence
– Necessary for mixing Cg with assembly
struct vout {
    float4 color : TEX3;
    float4 texcoord : TEX5;
    …
};
• E.g. the position output from vert prog
– This output drives the rasterizer
– It must be marked
struct vout {
    float4 color;
    float4 texcoord;
    float4 position : HPOS;
};
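The role of the shared output structure can be sketched in Python with a dataclass standing in for `vout`. This is an analogy only: the `position` field plays the part of the HPOS-bound member that drives the rasterizer, and the program bodies are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VOut:
    color: tuple
    texcoord: tuple
    position: tuple      # stands in for the field bound to HPOS


def vertex_program(position, color, texcoord):
    """Fill in the shared structure; one definition serves both programs."""
    return VOut(color=color, texcoord=texcoord, position=position)


def rasterizer_position(v):
    """The rasterizer consumes only the specially marked position output."""
    return v.position


def fragment_program(v):
    """Hypothetical body: modulate color by the first texture coordinate."""
    return tuple(c * v.texcoord[0] for c in v.color)


v = vertex_program((0.0, 0.0, 0.0, 1.0), (1.0, 0.5, 0.0), (0.5, 0.0))
shaded = fragment_program(v)
```

Using one structure for both sides is what lets the compiler choose the register assignments in option (a).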