Introduction to direct3D 11 graphics pipleline

Introduction to the
Direct3D 11 Graphics Pipeline
Allison Klein
Senior Lead Program Manager
Direct3D
Microsoft
Executive Summary: D3D 11
Direct3D 11 focuses on scalability and
performance, a creating a better development
experience, and extending the reach of the GPU
Direct3D 11 is a strict superset of D3D 10 & 10.1
D3D 11 adds support for new features to D3D 10.1
The fastest way to move to Direct3D 11 is to start
developing on Direct3D 10/10.1 today
Direct3D 11 will be available on Windows Vista &
future Windows operating systems
Direct3D 11 will run on down-level hardware
You can all go back to sleep now.
Outline
Overview
Drilldown
Summary
Direct3D 10
Cleaner API  Easier coding than Direct3D 9
More efficient DDI  Driver Optimization
A more consistent experience across
hardware!
Tighter specification
Elimination of caps
Direct3D 10.1
Improved multisampling
MSAA depth access in shader
Expose sample positions
Explicit coverage control
4-sample MSAA required
Improved fixed-function blending
Per-MRT blend mode
16-bit integer blending
Arrays of cube maps
Direct3D 10.1 (Cont’d)
Improved performance over Direct3D 10
6-10% for common cases
20-30% for applications relying on MSAA such
as deferred shading engines
Algorithms closer to Direct3D 11 and
future APIs
Direct3D Issues/Opportunities
Scalability
Performance
Cross-Platform Content and Techniques
General-Purpose Data-Parallel Computing
Outline
Overview
Drilldown
Summary
Outline
Overview
Drilldown
Tessellation
Compute Shader
Multithreading
Dynamic Shader Linkage
Improved Texture Compression
Quick Glance at Other Features
Summary
Current Authoring Pipeline
(Rocket Frog Taken From Loop &Schaefer, "Approximating Catmull-Clark Subdivision Surfaces with Bicubic Patches“)
Sub-D Modeling
Polygon Mesh
Animation
Displacement Map
Generate LODs
Character Authoring (Cont’d)
Trends
Denser meshes, more detailed characters
~5K triangles -> 30-100K triangles
More complex animations
Animations on polygon mesh vertices more costly
Result
Indirection in authoring pipeline more painful
Painful I/O issues
Solution
Use higher-level surface representation longer
Animate control cage (~5K vertices)
Generate displacement & normal maps
Direct3D 11 Pipeline
Input Assembler
Direct3D 10 pipeline
Plus
Three new stages for
Tessellation
Vertex Shader
Hull Shader
Tessellator
Domain Shader
Geometry Shader
Rasterizer
Pixel Shader
Output Merger
Stream Output
Hull Shader (HS)
HS input:
patch control pts
One Hull Shader
invocation per
patch
Hull Shader
HS output:
Patch control pts after
Basis conversion
HS output:
• TessFactors (how much to tessellate)
• fixed tessellator mode declarations
Tessellator
Domain
Shader
Fixed-Function Tessellator (TS)
Note: Tessellator
does not see
control points
Hull Shader
TS input:
• TessFactors (how much to tessellate)
• fixed tessellator mode declarations
Tessellator
TS output:
• U V {W} domain
points
Tessellator
operates per
patch
Domain
Shader
TS output:
• topology
(to primitive assembly)
Domain Shader (DS)
Hull Shader
DS input:
• control points
• TessFactors
Tessellator
DS input:
• U V {W} domain points
Domain Shader
One Domain Shader
invocation per
point from
Tessellator
DS output:
• one vertex
Direct3D 11 Pipeline
Input Assembler
Vertex Shader
Hull Shader
Tessellator
Domain Shader
Geometry Shader
Rasterizer
Pixel Shader
Output Merger
Stream Output
D3D11 HW Feature
D3D11 Only
Fundamental
primitive is patch
(not triangle)
Superset of Xbox 360
tessellation
Example Surface Processing Pipeline
Single-pass process!
vertex shader
hull shader
Animate/skin
Control
Points
Transform basis,
Determine how
much to tessellate
patch
control points
transformed
control points
Tess
Factors
control points
in Bezier patch
tessellator
domain shader
Tessellate!
Evaluate
surface
including
displacement
U V {W}
domain points
displacement
map
Sub-D Patch
Bezier Patch
New Authoring Pipeline
(Rocket Frog Taken From Loop &Schaefer, "Approximating Catmull-Clark Subdivision Surfaces with Bicubic Patches“)
Animation
Sub-D Modeling
Displacement Map
Optimally Tessellated Mesh
GPU
Tessellation: Summary
Helps us get closer to eliminating “pointy heads”
Scales visual quality across PC hardware configurations
Supports performance increases
Coarse model = compression, faster I/0 to GPU
Rendering tailored to each end user’s hardware
Better cross-platform (Windows + Xbox 360)
development experience
Xbox 360 has a subset of D3D11’s tessellation
Parity = ease of cross-platform development
Extra features = innovation for Windows gaming
Render content as the artist created it!
Want to Know More?
“Direct3D 11 Tessellation”
Tuesday, 4:00-4:55pm (Next)
Kev Gee (Microsoft)
“Advanced Topics in GPU Tessellation”
Wednesday, 10:15-11:10am
Natasha Tatarchuk (AMD)
“Water-Tight, Textured, Displaced Subdivision
Surface Tessellation Using Direct3D 11”
Wednesday, 1:30-2:25pm
Ignacio Castano (NVIDIA)
Outline
Overview
Drilldown
Tessellation
Compute Shader
Multithreading
Dynamic Shader Linkage
Improved Texture Compression
Quick Glance at Other Features
Summary
GPGPU = Data Parallel Computing
GPU performance continues to grow
Many applications scale well to massive
parallelism without tricky code changes
Direct3D is the API for talking to GPU
How do we expand Direct3D to GPGPU?
Direct3D 11 Pipeline
Input Assembler
Vertex Shader
Hull Shader
Tessellator
Domain Shader
Geometry Shader
Stream Output
Direct3D 10 pipeline
Plus
Three new stages for
Tessellation
Plus
Compute Shader
Rasterizer
Pixel Shader
Output Merger
Data Structure
Compute
Shader
Integration with Direct3D
Fully supports all Direct3D resources
Targets graphics/media data types
Evolution of DirectX HLSL
Graphics pipeline updated to emit general
data structures…
…which can then be manipulated by
compute shader…
And then rendered by Direct3D again
Example Scenario
Render scene
Write out scene image
Use Compute for
image post-processing
Output final image
Input Assembler
Vertex Shader
Hull Shader
Tessellator
Domain Shader
Geometry Shader
Stream Output
Rasterizer
Pixel Shader
Output Merger
Data Structure
Compute
Shader
Target Applications
Image/Post processing:
Image
Image
Image
Image
Reduction
Histogram
Convolution
FFT
A-Buffer/OIT
Ray-tracing, radiosity, etc.
Physics
AI
Compute Shader: Summary
Enables much more general algorithms
Transparent parallel processing model
Full cross-vendor support
Broadest possible installed base
Want to Know More?
“Direct3D 11 Compute Shader—
More Generality for Advanced Techniques”
Wednesday, 4:00-4:55pm
Chas Boyd (Microsoft)
Outline
Overview
Drilldown
Tessellation
Compute Shader
Multithreading
Dynamic Shader Linkage
Improved Texture Compression
Quick Glance at Other Features
Summary
Multithreading Today
Physics
Graphics
GPU
AI
Multithreading Today
Physics
CPU-Bound Graphics
GPU
AI
D3D11 Multithreading Usage
Enables distribution across threads of
Application code
Runtime
Driver
Device: free threaded resource creation
Immediate Context: your single primary
device for state & draws
Deferred Contexts: your per-thread
devices for state & draws
Display Lists: Recorded sequence of
graphics commands
Direct3D 11 Multithreading
Now, the following can be distributed across
threads:
Application
Direct3D 11 Runtime
Direct3D 11 Drivers
Updated Direct3D 10 and 10.1 Drivers
Direct3D 11 Multithreading
Application
Direct3D 11 Runtime
Existing 10/10.1 Drivers
Direct3D 11 Driver
Direct3D 10/10.1 HW
Direct3D 11 HW
Direct3D 11 Multithreading
Application
Direct3D 11 Runtime
New 10/10.1 Drivers
Direct3D 11 Driver
Direct3D 10/10.1 HW
Direct3D 11 HW
Multithreading: Summary
Improves performance
Scalable across hardware configurations in
two ways:
# of CPUs
Graphics cards/drivers
Better cross-platform (Windows+Xbox 360)
development experience
Want to Know More?
“Multithreaded Rendering for Games”
Wednesday, 1:30-2:25pm
Matt Lee (Microsoft)
Outline
Overview
Drilldown
Tessellation
Compute Shader
Multithreading
Dynamic Shader Linkage
Improved Texture Compression
Quick Glance at Other Features
Summary
Shader Issues Today
Shaders getting bigger, more complex
Shaders need to target wide range of hardware
Two approaches today:
Write specialized shaders
Good: Build optimal shaders as specializations
Bad: Generates lots of shaders
Write “one shader to rule them all”
Combines multiple shaders
Good: Reduces shader binding changes
Bad: Code is complex
Answer: Subroutines
Shader Subroutines
Über-shader
Dynamic Subroutine
foo (…) {
if (m == 1) {
// do material 1
} else if (m == 2) {
// do material 2
}
if (l == 1) {
// do light model 1
} else if (l == 2) {
// do light model 2
}
}
Material1(…) { … }
Material2(…) { … }
Light1(…) { … }
Light2(…) { … }
foo(…) {
(*material)(…);
(*light)(…);
}
Application binds appropriate
*material, *light
Shader Subroutines
Details
Calls must be fast
Binding applies to all primitives in a Draw call
Binding operation must be fast
Need parameter passing mechanism
Need access to textures, samplers, etc.
Advantages
Reduce register usage in Über-shaders
Not worst case of all if statements
Allows specialization of subroutines
Want to Know More?
“High Level Shader Language (HLSL)
Update—Introducing Version 5.0”
Tuesday, 5:05-6:00pm
Michael Oneppo (Microsoft)
Outline
Overview
Drilldown
Tessellation
Compute Shader
Multithreading
Dynamic Shader Linkage
Improved Texture Compression
Quick Glance at Other Features
Summary
Why New Texture Formats?
Existing block palette interpolations too
simple
Results often rife with blocking artifacts
No high dynamic range (HDR) support
NB: All are issues we heard from
developers
Two New BC’s for Direct3D11
BC6 (aka BC6H)
High dynamic range
6:1 compression (16 bpc RGB)
Targeting high (not lossless) visual quality
BC7
LDR with alpha
3:1 compression for RGB or 4:1 for RGBA
High visual quality
New BC’s: Compression
Block compression (unchanged)
Each block independent
Fixed compression ratio
Multiple block types (new)
Tailored to different types of content
Smooth gradients vs. noisy normal maps
Varied alpha vs. constant alpha
Also new: decompression results must be bit-accurate with spec
Multiple Block Types
Different numbers of color interpolation lines
Less variance in one block means:
1 color line
Higher-precision endpoints
More variance in one block means:
2 (BC6 & 7) or 3 (BC7 only) color lines
Lower-precision endpoints and interpolation bits
Different numbers of index bits
2 or 3 bits to express position on color line
Alpha
Some blocks have implied 1.0 alpha
Others encode alpha
Partitions
When using multiple color lines, each pixel
needs to be associated with a color line
Individual bits to choose is expensive
For a 4x4 block with 2 color lines
162 possible partition patterns
16 to 64 well-chosen partition patterns give a
good approximation of the full set
BC6H: 32 partitions
BC7: 64 partitions, shares first 32 with BC6H
Example Partition Table
A 32-partition table for 2 color lines
Comparisons
Orig
BC3
Orig
BC7
Abs Error
Comparisons
Orig
BC3
Orig
BC7
Abs Error
Comparisons
HDR Original at
given exposure
Abs Error
BC6 at
given exposure
Outline
Overview
Drilldown
Tessellation
Compute Shader
Multithreading
Dynamic Shader Linkage
Improved Texture Compression
Quick Glance at Other Features
Summary
A Plethora of Other Features
Addressable Stream Out
Draw Indirect
Pull-model attribute eval
Improved Gather4
Min-LOD texture clamps
16K texture limits
Required 8-bit subtexel,
submip filtering precision
Conservative oDepth
2 GB Resources
Geometry shader instance
programming model
Optional double support
Read-only depth or
stencil views
Outline
Overview
Drilldown
Tessellation
Compute Shader
Multithreading
Dynamic Shader Linkage
Improved Texture Compression
Quick Glance at Other Features
Summary
Direct3D 11
Direct3D 11 is strict superset of Direct3D 10 & 10.1
Direct3D 11 adds support for features like
multithreading, tessellation, compute to Direct3D 10.1
The fastest way to move to Direct3D 11 is to start
developing on Direct3D 10/10.1 today
Direct3D 11 will be available on Windows Vista and
future Windows operating systems
Direct3D 11 will run on down-level hardware
Multithreading!
Direct3D 10.1, 10, and 9 hardware/drivers
Full functionality (for example, tessellation) will
require Direct3D 11 hardware
When Can I Get It?
Preview bits will be in November 2008 SDK
Will work on Windows Vista
Will run on Direct3D10/10.1 hardware
Full documentation, samples, etc.
Questions?
www.xnagamefest.com
© 2008 Microsoft Corporation. All rights reserved.
This presentation is for informational purposes only.
Microsoft makes no warranties, express or implied, in this summary.