GRAPHICS PIPELINE PERFORMANCE OPTIMIZATION

Sumantra Dasgupta
Summary of work to date: The theoretical background of the problem has been established.
The identification and optimization of bottlenecks can be summarized as follows:
A. Locate Bottlenecks: Vary the workload of one stage at a time; if the frame
rate changes noticeably in response, that stage is the bottleneck.
1. Raster Operations: Detected by changing the number of bits used for the
color or depth buffer.
2. Texture Bandwidth: Detected by changing the mipmap LOD bias.
3. Fragment Shading: Detected by changing the screen resolution or the length
of the fragment program.
4. Vertex Processing: Detected by changing the length of the vertex program.
5. Vertex and Index Transfer: Detected by changing the vertex format size.
6. CPU Bound: Detected by underclocking the CPU.
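The tests in section A share one procedure: change the load on a single stage and check whether the frame time responds. A minimal sketch of that comparison follows, assuming the frame times have already been measured by some external means. The function name, the stage names, the 10% threshold, and the sample timings are all illustrative assumptions, not measured values.

```python
def locate_bottlenecks(baseline_ms, varied_ms, threshold=0.10):
    """Return stages whose frame time moved by more than `threshold`.

    baseline_ms: frame time of the unmodified program, in milliseconds.
    varied_ms:   maps a stage name to the frame time measured after
                 changing only that stage's workload.
    threshold:   relative change treated as significant (assumed 10%).
    """
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    sensitive = []
    for stage, ms in varied_ms.items():
        change = abs(ms - baseline_ms) / baseline_ms
        if change > threshold:
            sensitive.append((change, stage))
    # Largest response first: the most likely bottleneck leads the list.
    sensitive.sort(reverse=True)
    return [stage for _, stage in sensitive]


# Illustrative (made-up) timings: only the fragment test moves the frame time.
timings = {
    "raster_operations": 16.1,   # after reducing color/depth bits
    "texture_bandwidth": 16.0,   # after changing the LOD bias
    "fragment_shading": 11.0,    # after lowering the resolution
    "vertex_processing": 15.9,   # after shortening the vertex program
}
print(locate_bottlenecks(16.0, timings))
```

With these sample timings only fragment_shading is reported, since it is the only test that changes the frame time by more than the threshold.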
B. Optimize:
1. Optimization on CPU:
a. Reduce resource locking.
b. Maximize batch size.
2. Reducing the cost of vertex transfer:
a. Use the minimum number of bytes per vertex.
b. Generate derivable vertex attributes inside vertex programs.
c. Use 16-bit indices.
d. Access vertex data sequentially to reduce cache misses.
3. Optimizing vertex processing:
a. Optimize for the post-T&L vertex cache.
b. Reduce the number of vertices processed.
c. Use vertex processing LOD.
d. Move per-object computations onto the CPU.
e. Use the correct coordinate space.
f. Use vertex branching to early out of computations.
4. Speeding up fragment shading:
a. Render depth first.
b. Render in roughly front-to-back order.
c. Store complex functions in textures.
d. Move per-fragment work to the vertex shader where possible.
e. Use the lowest precision necessary.
f. Avoid excessive normalization of vectors.
g. Use fragment shader LOD.
h. Disable trilinear filtering where it is not needed.
i. Use the simplest shader type possible.
5. Reducing texture bandwidth:
a. Reduce texture size.
b. Compress color textures.
c. Avoid expensive texture formats.
d. Use mipmapping on any surface that may be minified.
6. Optimizing frame buffer bandwidth:
a. Render depth first.
b. Reduce alpha blending.
c. Turn off depth writes when possible.
d. Avoid extraneous color buffer clears.
e. Render front-to-back.
f. Optimize skybox rendering.
g. Use floating-point frame buffers only when needed.
h. Use a 16-bit depth buffer where precision permits.
i. Use 16-bit color where quality permits.
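Item 3a (the post-T&L vertex cache) can be evaluated offline by counting how many indexed vertices would miss the cache for a given triangle order. The sketch below assumes a plain FIFO cache of 16 entries. Real hardware differs in both cache size and replacement policy, so both are assumptions, and the function name acmr is hypothetical.

```python
from collections import deque


def acmr(indices, cache_size=16):
    """Average cache miss ratio: vertex transforms per triangle.

    indices:    flat triangle-list index buffer (three indices per triangle).
    cache_size: number of entries in the simulated FIFO cache (assumed 16).
    """
    if len(indices) % 3 != 0:
        raise ValueError("index count must be a multiple of 3")
    triangles = len(indices) // 3
    if triangles == 0:
        return 0.0
    # A deque with maxlen drops its oldest entry on append, giving FIFO eviction.
    cache = deque(maxlen=cache_size)
    misses = 0
    for idx in indices:
        if idx not in cache:
            misses += 1          # vertex must be transformed again
            cache.append(idx)
    return misses / triangles


# Two triangles sharing an edge: 4 unique vertices, 2 triangles.
quad = [0, 1, 2, 2, 1, 3]
print(acmr(quad))                # shared vertices hit the cache
print(acmr(quad, cache_size=1))  # a tiny cache forces extra transforms
```

A lower value means fewer vertex program invocations per triangle. An ordering with no vertex reuse gives 3.0, so comparing the value before and after reordering the index buffer shows whether the reordering helped under this model.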
Analysis of work to date: The common practices used to identify and remove
bottlenecks in modern GPUs have been studied. This background is sufficient to
begin optimizing a real, unoptimized graphics program.
Plan for completion: The remaining work is planned as follows:
1. Further literature survey: next week.
2. Develop a graphics program in which the user can navigate through a scene:
next 2 weeks.
3. Identify the bottlenecks in the rendering program from step 2: by the next update.
4. Optimize the program and complete the project (including the write-up and
benchmarking against the unoptimized program): by the final submission date.
5. Derive a statistical model for bottleneck optimization: if time permits, after
the project is complete.