Programming GPUs CS 446: Real-Time Rendering & Game Technology David Luebke

University of Virginia
Demo
• Today: Jiayuan Meng, ATI demos
• Mar 14: Sean Arietta, Super Mario Sunshine
Assignments
• Assn 3 due tomorrow at midnight
• Assignment 4 now out on web page
– Individual assignment w/ group “integration” deadline
– Due Mar 21 (individual), Mar 23 (group)
– Next 2 assignments will also follow this pattern
The Modern Graphics Pipeline
[Figure: pipeline diagram. CPU: Application → Vertices (3D) → GPU: Vertex Processor (Transform & Light) → Xformed, Lit Vertices (2D) → Assemble Primitives → Screenspace triangles (2D) → Rasterize → Fragments (pre-pixels) → Fragment Processor (Shade) → Final Pixels (Color, Depth) → Video Memory (Textures). Other labels: Graphics State, Render-to-texture.]
• Programmable vertex processor!
• Programmable pixel processor!
GPU vendor differences
• Note: this slide (March 2006) will be dated quickly!
• NVIDIA: as described in previous lecture
• ATI GPUs (X1900 XT current high-end part):
– No vertex texture fetch (but good render-to-vertex-array)
– Far fewer levels of computed texture coordinates
– Better at fine-grained (less coherent) dynamic branching
• ATI Xenos (Xbox 360) has some cool design features:
– Unified shader model: vertex proc == pixel proc
– Scatter support: shaders can write to arbitrary memory locations
Cg – “C for Graphics”
• Cg is a high-level GPU programming language
• Designed by NVIDIA and Microsoft
• Competes with the (quite similar) GL Shading Language, a.k.a. GLslang
Programming in assembly is painful
Assembly:
    …
    FRC R2.y, C11.w;
    ADD R3.x, C11.w, -R2.y;
    MOV H4.y, R2.y;
    ADD H4.x, -H4.y, C4.w;
    MUL R3.xy, R3.xyww, C11.xyww;
    ADD R3.xy, R3.xyww, C11.z;
    TEX H5, R3, TEX2, 2D;
    ADD R3.x, R3.x, C11.x;
    TEX H6, R3, TEX2, 2D;
    …
Cg:
    …
    L2weight = timeval - floor(timeval);
    L1weight = 1.0 - L2weight;
    ocoord1 = floor(timeval)/64.0 + 1.0/128.0;
    ocoord2 = ocoord1 + 1.0/64.0;
    L1offset = f2tex2D(tex2, float2(ocoord1, 1.0/128.0));
    L2offset = f2tex2D(tex2, float2(ocoord2, 1.0/128.0));
    …
• Easier to read and modify
• Cross-platform
• Combine pieces
• etc.
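The arithmetic in the Cg fragment can be checked on the CPU. The following is a minimal Python sketch, not the slide author's code. The function name `blend_lookup` is hypothetical, and reading the 1/64 step and 1/128 offset as texel-center addressing into a 64-entry table is an assumption; the slide does not say what `tex2` holds. The texture fetches themselves are left out.

```python
import math

def blend_lookup(timeval):
    """Mirror the slide's Cg arithmetic (hypothetical helper, CPU-side only)."""
    l2_weight = timeval - math.floor(timeval)            # fractional part of timeval
    l1_weight = 1.0 - l2_weight                          # complementary blend weight
    ocoord1 = math.floor(timeval) / 64.0 + 1.0 / 128.0   # first lookup coordinate
    ocoord2 = ocoord1 + 1.0 / 64.0                       # next entry, one step over
    return l1_weight, l2_weight, ocoord1, ocoord2
```

For example, `blend_lookup(2.25)` yields weights 0.75 and 0.25 and coordinates 5/128 and 7/128.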
Some points in the design space
• CPU languages
– C – close to the hardware; general purpose
– C++, Java, Lisp – require memory management
– RenderMan – specialized for shading
• Real-time shading languages
– Stanford shading language
– Creative Labs shading language
Design strategy
• Start with C (and a bit of C++)
– Minimizes number of decisions
– Gives you known mistakes instead of unknown ones
• Allow subsetting of the language
• Add features desired for GPUs
– To support GPU programming model
– To enable high performance
• Tweak to make it fit together well
How are GPUs different from CPUs?
1. GPU is a stream processor
– Multiple programmable processing units
– Connected by data flows
[Figure: stream pipeline. Application → Vertex Processor → Assembly & Rasterization → Fragment Processor → Framebuffer Operations → Framebuffer. Textures are also shown.]
Cg separates vertex & fragment programs
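[Figure: the same pipeline (Application → Vertex Processor → Assembly & Rasterization → Fragment Processor → Framebuffer Operations → Framebuffer, plus Textures), with one Program attached to the Vertex Processor and one Program attached to the Fragment Processor.]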
Cg programs have two kinds of inputs
• Varying inputs (streaming data)
– e.g. normal vector – comes with each vertex
– This is the default kind of input
• Uniform inputs (a.k.a. graphics state)
– e.g. modelview matrix
• Note: Outputs are always varying
vout MyVertexProgram(float4 normal,
                     uniform float4x4 modelview) {
    …
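The split between varying and uniform inputs can be mimicked on the CPU. This Python sketch is only an analogy for the stream model, under stated assumptions: the slide elides the body of `MyVertexProgram`, so the matrix-times-normal body below is a made-up placeholder, and `run_stream` is a hypothetical helper, not part of Cg or its runtime.

```python
def my_vertex_program(normal, modelview):
    """Called once per vertex: `normal` is varying, `modelview` is uniform."""
    # Hypothetical body (the slide shows only "…"): 4x4 matrix times 4-vector.
    return tuple(sum(row[i] * normal[i] for i in range(4)) for row in modelview)

def run_stream(program, varying_stream, **uniforms):
    """Apply `program` to every streamed element with the same uniform values."""
    return [program(element, **uniforms) for element in varying_stream]

# Uniform input: set once, shared by every vertex.
scale2 = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 1]]
# Varying input: a different value arrives with each vertex.
normals = [(1, 0, 0, 0), (0, 1, 0, 0)]
outputs = run_stream(my_vertex_program, normals, modelview=scale2)
```

The outputs form a stream of the same length as the inputs, which matches the note that outputs are always varying.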
Binding VP outputs to FP inputs
a) Let compiler do it
– Define a single structure
– Use it for vertex-program output
– Use it for fragment-program input
struct vout {
float4 color;
float4 texcoord;
…
};
Binding VP outputs to FP inputs
b) Do it yourself
– Specify register bindings for VP outputs
– Specify register bindings for FP inputs
– May introduce HW dependence
– Necessary for mixing Cg with assembly
struct vout {
float4 color : TEX3;
float4 texcoord : TEX5;
…
};
Some inputs and outputs are special
• E.g. the position output from vert prog
– This output drives the rasterizer
– It must be marked
struct vout {
float4 color;
float4 texcoord;
float4 position : HPOS;
};
How are GPUs different from CPUs?
2. Greater variation in basic capabilities
– Most processors don’t yet support branching
– Vertex processors don’t support texture mapping
– Some processors support additional data types
• Compiler can’t hide these differences
• Least-common-denominator is too restrictive
• Cg exposes differences via language profiles (list of capabilities and data types)
• Over time, profiles will converge
How are GPUs different from CPUs?
3. Optimized for 4-vector arithmetic
– Useful for graphics – colors, vectors, texcoords
– Easy way to get high performance/cost
• C philosophy says: expose these HW data types
• Cg has vector data types and operations, e.g. float2, float3, float4
• Makes it obvious how to get high performance
• Cg also has matrix data types, e.g. float3x3, float3x4, float4x4
Some vector operations
//
// Clamp components of 3-vector to [minval,maxval] range
//
float3 clamp(float3 a, float minval, float maxval) {
    a = (a < minval.xxx) ? minval.xxx : a;
    a = (a > maxval.xxx) ? maxval.xxx : a;
    return a;
}
• ? : is per-component for vectors
• Comparisons between vectors are per-component, and produce a vector result
• Swizzle – replicate and/or rearrange components
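The per-component behaviour of the clamp routine can be spelled out step by step. This is a Python sketch of the semantics only, using tuples for vectors. The helper names `select` and `clamp3` are hypothetical and do not come from Cg.

```python
def select(mask, if_true, if_false):
    """Per-component ?: picks from if_true where mask is True, else from if_false."""
    return tuple(t if m else f for m, t, f in zip(mask, if_true, if_false))

def clamp3(a, minval, maxval):
    """Clamp each component of a 3-vector to [minval, maxval]."""
    lo = (minval,) * 3   # plays the role of minval.xxx (scalar replicated three times)
    hi = (maxval,) * 3   # plays the role of maxval.xxx
    # A vector comparison yields a vector of booleans, one per component.
    a = select(tuple(x < l for x, l in zip(a, lo)), lo, a)
    a = select(tuple(x > h for x, h in zip(a, hi)), hi, a)
    return a
```

For instance, `clamp3((-1.0, 0.5, 2.0), 0.0, 1.0)` clamps the first and last components and leaves the middle one alone.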
Cg has arrays too
• Declared just as in C
• But, arrays are distinct from
built-in vector types:
float4 != float[4]
• Language profiles may restrict array usage
vout MyVertexProgram( float3 lightcolor[10],
…) {
…
How are GPUs different from CPUs?
4. No support for pointers
– Arrays are first-class data types in Cg
5. No integer data type
– Cg adds “bool” data type for boolean operations
– This change isn’t obvious except when declaring vars
Cg basic data types
• All profiles:
– float
– bool
• All profiles with texture lookups:
– sampler1D, sampler2D, sampler3D, samplerCUBE
• NV_fragment_program profile:
– half -- half-precision float
– fixed -- fixed point [-2,2)
Other Cg capabilities
• Function overloading
• Function parameters are value/result
– Use “out” modifier to declare return value
void foo (float a, out float b) {
b = a;
}
• “discard” statement – fragment kill
if (a > b)
discard;
Cg Built-in functions
• Texture lookups (in fragment profiles)
• Math
– Dot product
– Matrix multiply
– Sin/cos/etc.
– Normalize
• Misc
– Partial derivative (when supported)
• See spec for more details
Cg Example
• The following fragment program implements a
(very) simple toon shader
– Flat 3-tone shading
• Highlight
• Base color
• Shadow
– Black silhouettes
Cg Example – part 1
// In:
//   eye_space position = TEX7
//   eye space T = (TEX4.x, TEX5.x, TEX6.x) denormalized
//   eye space B = (TEX4.y, TEX5.y, TEX6.y) denormalized
//   eye space N = (TEX4.z, TEX5.z, TEX6.z) denormalized
fragout frag program main(vf30 In) {
    float m = 30;                               // power
    float3 hiCol = float3( 1.0, 0.1, 0.1 );     // lit color
    float3 lowCol = float3( 0.3, 0.0, 0.0 );    // dark color
    float3 specCol = float3( 1.0, 1.0, 1.0 );   // specular color
    // Get eye-space eye vector.
    float3 e = normalize( -In.TEX7.xyz );
    // Get eye-space normal vector.
    float3 n = normalize(float3(In.TEX4.z, In.TEX5.z, In.TEX6.z));
Cg Example – part 2
    float edgeMask = (dot(e, n) > 0.4) ? 1 : 0;
    float3 lpos = float3(3,3,3);
    float3 l = normalize(lpos - In.TEX7.xyz);
    float3 h = normalize(l + e);
    float specMask = (pow(dot(h, n), m) > 0.5) ? 1 : 0;
    float hiMask = (dot(l, n) > 0.4) ? 1 : 0;
    float3 ocol1 = edgeMask *
        (lerp(lowCol, hiCol, hiMask) + (specMask * specCol));
    fragout O;
    O.COL = float4(ocol1.x, ocol1.y, ocol1.z, 1);
    return O;
}
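To see what the toon shader computes for a single fragment, the same math can be run on the CPU. This is a Python reference sketch under stated assumptions, not the course's code. The function `toon_shade` and its helpers are hypothetical names. It takes the eye-space position and normal directly instead of reading `TEX4` to `TEX7`, and it clamps `dot(h, n)` at zero before raising it to the power `m`, which the Cg source does not do.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def lerp(a, b, t):
    return tuple(x + t * (y - x) for x, y in zip(a, b))

def toon_shade(position, normal, light_pos=(3.0, 3.0, 3.0)):
    """Return RGBA for one fragment, given eye-space position and normal."""
    m = 30                        # specular power
    hi_col = (1.0, 0.1, 0.1)      # lit color
    low_col = (0.3, 0.0, 0.0)     # dark color
    spec_col = (1.0, 1.0, 1.0)    # specular color

    e = normalize(tuple(-p for p in position))     # eye vector
    n = normalize(normal)
    edge_mask = 1.0 if dot(e, n) > 0.4 else 0.0    # 0 gives a black silhouette

    l = normalize(tuple(lp - p for lp, p in zip(light_pos, position)))
    h = normalize(tuple(a + b for a, b in zip(l, e)))   # half vector
    # Assumption: clamp at 0 so a negative base is never raised to an even power.
    spec_mask = 1.0 if max(dot(h, n), 0.0) ** m > 0.5 else 0.0
    hi_mask = 1.0 if dot(l, n) > 0.4 else 0.0

    base = lerp(low_col, hi_col, hi_mask)
    rgb = tuple(edge_mask * (b + spec_mask * s) for b, s in zip(base, spec_col))
    return rgb + (1.0,)
```

A fragment at (0, 0, -5) facing the eye and lit from (3, 3, 3) comes out in the lit color, the same fragment with a normal perpendicular to the eye vector comes out black, and a normal turned away from the light gives the dark color. The result is not clamped to [0, 1], so a specular hit can exceed 1.0 per channel.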
New vector operators
• Swizzle – replicate/rearrange elements
a = b.xxyy;
• Write mask – selectively over-write
a.w = 1.0;
• Vector constructor builds vector
a = float4(1.0, 0.0, 0.0, 1.0);
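The three operators can be modelled with plain tuples. This Python sketch only illustrates the semantics. The helpers `swizzle` and `write_masked` are hypothetical names, since in Cg these are built-in syntax rather than functions.

```python
_COMPONENT = {"x": 0, "y": 1, "z": 2, "w": 3}

def swizzle(v, pattern):
    """Replicate/rearrange components, e.g. pattern "xxyy" models b.xxyy."""
    return tuple(v[_COMPONENT[c]] for c in pattern)

def write_masked(dst, mask, src):
    """Overwrite only the components named in mask, e.g. mask "w" models a.w = ..."""
    out = list(dst)
    for c, value in zip(mask, src):
        out[_COMPONENT[c]] = value
    return tuple(out)

b = (1.0, 2.0, 3.0, 4.0)            # vector constructor: float4(1.0, 2.0, 3.0, 4.0)
a = swizzle(b, "xxyy")              # a = b.xxyy;
a = write_masked(a, "w", (1.0,))    # a.w = 1.0;
```

After these two steps `a` holds x, x, y of `b` followed by 1.0 in the w component.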
Change to constant-typing mechanism
• In C, it’s easy to accidentally use high precision
half x, y;
x = y * 2.0; // Double-precision multiply!
• Not in Cg
x = y * 2.0; // Half-precision multiply
• Unless you want to
x = y * 2.0f; // Float-precision multiply
Dot product, matrix multiply
• Dot product
– dot(v1,v2); // returns a scalar
• Matrix multiplications:
– matrix-vector: mul(M, v); // returns a vector
– vector-matrix: mul(v, M); // returns a vector
– matrix-matrix: mul(M, N); // returns a matrix
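The three forms of `mul` differ in how the vector is treated: as a column in `mul(M, v)` and as a row in `mul(v, M)`. This Python sketch spells that out with matrices stored as lists of rows. The names `mul_mv`, `mul_vm` and `mul_mm` are hypothetical stand-ins for the one overloaded Cg function.

```python
def dot(v1, v2):
    """Dot product: returns a scalar."""
    return sum(a * b for a, b in zip(v1, v2))

def mul_mv(M, v):
    """Matrix-vector: v is a column vector; each result entry is row . v."""
    return [dot(row, v) for row in M]

def mul_vm(v, M):
    """Vector-matrix: v is a row vector; each result entry is v . column."""
    return [dot(v, col) for col in zip(*M)]

def mul_mm(M, N):
    """Matrix-matrix: each result row is (row of M) times N."""
    return [mul_vm(row, N) for row in M]
```

With `M = [[1, 2], [3, 4]]` and `v = [1, 1]`, `mul_mv(M, v)` gives `[3, 7]` while `mul_vm(v, M)` gives `[4, 6]`, so the order of the arguments matters.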
Demos and Examples
Cg runtime API
• Compile a program
• Select active programs for rendering
• Pass “uniform” parameters to program
• Pass “varying” (per-vertex) parameters
• Load vertex-program constants
• Other housekeeping
Runtime is split into three libraries
• API-independent layer – cg.lib
– Compilation
– Query information about object code
• API-dependent layer – cgGL.lib and cgD3D.lib
– Bind to compiled program
– Specify parameter values
– etc.
More Information…
• The Cg Toolkit (includes Cg user’s manual):
– http://developer.nvidia.com/object/cg_toolkit.html
• NVIDIA’s GPU Programming Guide:
– http://developer.nvidia.com/object/gpu_programming_guide.html
• Cg Tutorial Book
• The CgShader class
– See http://www.cs.virginia.edu/~rs5ea/tools/
• Hello, Cg (haven’t checked out myself):
– http://developer.nvidia.com/object/hello_cg_tutorial.html