DirectX and Streaming Video Drivers

Jeff Noyle, Development Lead
Gary Sullivan, Software Design Engineer
William Messmer, Software Design Engineer
Eric Rudolph, Software Design Engineer
Microsoft Corporation
Speakers




“DirectX Graphics Drivers,” Jeff Noyle,
Lead Developer, DirectDraw®/Direct3D®,
Microsoft Corporation
“DirectX VA Video Acceleration Drivers,”
Gary Sullivan, Software Design Engineer,
DMD Video Services Group,
Microsoft Corporation
“Writing AVStream Minidrivers for
Windows® XP,” William Messmer,
Software Design Engineer,
Digital Audio-Video, Microsoft Corporation
“Testing Your WDM Driver with DirectShow®,”
Eric Rudolph, SDE, DirectShow Editing
Services, Microsoft
DirectX Graphics Drivers
Jeff Noyle
Development Lead
DirectDraw/Direct3D
Microsoft Corporation
Prerequisites

I’m assuming

Basic familiarity with DirectDraw and
Direct3D concepts:




System Architecture
Surfaces
Page flipping
The DDK can be hard to read
Agenda






Single-source issues
Windows 9x issues
OS-independent issues
DirectX 7.0 implementation details
Changes in DirectX 8.0
What can you do next?
Single-Source Issues
Stuff you should know if you want
one code-base to support
Windows 9x OS versions and
Windows NT® OS versions
Allocating System Memory
Per-Surface


(Do NOT use this process to allocate
surface memory itself...See later)
Normally system memory is charged
against a particular process


Can’t free it in some other process
(as in ctrl-alt-del mechanism)
Use EngAllocPrivateUserMem and
EngFreePrivateUserMem

Uses DirectDraw object to locate
proper process context
YUV/FOURCC Surfaces


System memory YUV/FOURCC
surfaces on NT systems


DirectDraw Kernel-mode “pretends”
that these surfaces are 8bpp RGB for the
purposes of allocating memory
DXTn:



Height: height in 4x4 blocks
Width: width in blocks * sizeof(block)
You must undo these transformations at
CreateSurface time
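A minimal sketch of undoing the DXTn transformation at CreateSurface time. It assumes the usual DXTn block sizes (8 bytes per 4x4 block for DXT1, 16 bytes for DXT2 through DXT5). The type and function names are hypothetical, not DDK definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type for illustration; not a DDK definition. */
typedef struct {
    uint32_t width;   /* in pixels, a multiple of 4 */
    uint32_t height;  /* in pixels, a multiple of 4 */
} SketchDxtSize;

/* Bytes per 4x4 block: 8 for DXT1, 16 for DXT2..DXT5 (n = 1..5). */
static uint32_t sketch_dxt_block_bytes(int n)
{
    assert(n >= 1 && n <= 5);
    return (n == 1) ? 8u : 16u;
}

/* Undo the kernel-mode view, where
   Width = blocks across * sizeof(block) and Height = rows of blocks. */
static SketchDxtSize sketch_undo_dxt(uint32_t kernelWidth,
                                     uint32_t kernelHeight, int n)
{
    SketchDxtSize s;
    s.width  = (kernelWidth / sketch_dxt_block_bytes(n)) * 4u;
    s.height = kernelHeight * 4u;
    return s;
}
```

The recovered dimensions are block-aligned, so a surface whose true size is not a multiple of 4 comes back rounded up.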
YUV/FOURCC Surfaces

NT kernel mode doesn’t understand
any FOURCC formats, so:


The driver must handle video memory
allocation for these types
The driver must handle Lock for
these types
Windows 2000 Issue
(Fixed In Windows XP)


During allocation of an AGP surface...
If the driver fails to allocate and:





returns DDHAL_DRIVER_HANDLED
AND sets an error code in ddRVal
AND sets the surface’s lpVidMemHeap
to non-zero
Then the system will ignore the error
So NULL the lpVidMemHeap on error!
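The failure path above might look roughly like this. The structures and constants are simplified stand-ins for the DDK definitions (only the field names lpVidMemHeap and ddRVal come from the slide), and the allocator is a hypothetical placeholder.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for DDK definitions; values and layouts are placeholders. */
#define SKETCH_DRIVER_HANDLED   1
#define SKETCH_OK               0L
#define SKETCH_ERR_OUTOFVIDMEM  (-1L)

typedef struct { void *lpVidMemHeap; } SketchSurface;
typedef struct { long ddRVal; SketchSurface *surface; } SketchCreateData;

/* Hypothetical AGP allocator: returns NULL when the heap is exhausted. */
static void *sketch_agp_alloc(void *heap, int simulateFailure)
{
    return simulateFailure ? NULL : heap;
}

static int sketch_create_agp_surface(SketchCreateData *d, void *heap,
                                     int simulateFailure)
{
    assert(d != NULL && d->surface != NULL);
    d->surface->lpVidMemHeap = heap;          /* tentatively record the heap */
    if (sketch_agp_alloc(heap, simulateFailure) == NULL) {
        d->ddRVal = SKETCH_ERR_OUTOFVIDMEM;
        /* Without this, Windows 2000 ignores the error code above. */
        d->surface->lpVidMemHeap = NULL;
        return SKETCH_DRIVER_HANDLED;
    }
    d->ddRVal = SKETCH_OK;
    return SKETCH_DRIVER_HANDLED;
}
```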
Atomic Surface Creation


On Windows 9x, drivers are given
a list of surfaces
On Windows NT, drivers are given
surfaces one-at-a-time, unless:


Driver reports GUID_NTPrivateDriverCaps
and sets DDHAL_PRIVATECAP_ATOMICSURFACECREATION
Windows NT Extra

You can use the
GUID_NTPrivateDriverCaps to request
notification of primary surface:

Set DDHAL_PRIVATECAP_NOTIFYPRIMARYCREATION
Windows 9x Issues
System-To-Video Blts


To speed up some titles, implement
system-to-video blts
All you need to implement is SRCCOPY,
no stretch


But you should implement sub-rects
DirectDraw assumes your driver
requires system memory to be
pagelocked during Blt

If this is not true, set
DDCAPS2_NOPAGELOCKREQUIRED
OS-Independent Issues
HeapVidmemAllocAligned





It’s an “Eng” function in
Windows NT versions
It’s a ddraw.dll export in Windows 9x
You can use this to allocate
surface memory
You must have passed the heap to
DirectDraw previously
You must fill in the fpHeapOffset,
fpVidmem and lpVidmemHeap
of the surface
Heap Offsets Explained
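Return values from
HeapVidmemAllocAligned
are these offsets:
(Note fpStart is set to 0x1000
by DirectDraw for AGP heaps)
[Figure: a Heap running from fpStart to fpEnd (fpEnd points TO the last byte) with a Surface inside it. The diagram marks an origin “0” and labels the Surface’s position as the return value from HVMAA and fpHeapOffset.]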
DDSCAPS_VIDEOMEMORY


Remember that this includes AGP
unless combined with
DDSCAPS_LOCALVIDMEM
At GetAvailDriverMem time,
a request that specifies
DDSCAPS_VIDEOMEMORY (and not any
explicit type: local or non-local) should
include both types in the total
GetScanLine



Implement this, if you can!
DirectX 8.0 uses it a lot for
presentation-Blt timing
Set DDCAPS_READSCANLINE,
so DirectX 8.0 knows
CreateSurfaceEx


More on this later
NEVER fail CreateSurfaceEx for system
memory surfaces, even if you don’t
understand the pixel format


Just return DDHAL_DRIVER_HANDLED
and DD_OK
(Otherwise new system-memory formats
used by the reference rasterizer
can’t be created)
Alpha-In-The-Primary

If your driver can do this in 32bpp:


Create an A8R8G8B8 render target
Blt that to the primary surface IGNORING
the alpha channel


(And stretch/shrink (please))
Then you should set:


DDHALINFO.vmiData.ddpfDisplay.dwFlags
|= DDPF_ALPHAPIXELS
DDHALINFO.vmiData.ddpfDisplay.dwRGBAlphaBitMask
= 0xFF000000
Windowed Applications
And Blt Queuing

Don’t allow “many” presentation-blts
in your queue


WHQL enforces low latency for
DirectX 8.0 drivers


That is, don’t allow a large latency between
scheduling and retiring a presentation-blt
Check DDBLT_PRESENTATION, and don’t
allow more than three
More info in ddraw.h
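One way to cap the number of queued presentation-blts is a simple counter, sketched below. The flag value is a placeholder (the real DDBLT_PRESENTATION is defined in ddraw.h), and the queue structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit for illustration; use DDBLT_PRESENTATION from ddraw.h. */
#define SKETCH_BLT_PRESENTATION     0x1u
#define SKETCH_MAX_QUEUED_PRESENTS  3

typedef struct { int queuedPresents; } SketchBltQueue;

/* Returns 1 if the blt may be queued now, 0 if the driver should report
   that it is still busy so the runtime can retry later. */
static int sketch_try_queue_blt(SketchBltQueue *q, uint32_t flags)
{
    if (flags & SKETCH_BLT_PRESENTATION) {
        if (q->queuedPresents >= SKETCH_MAX_QUEUED_PRESENTS)
            return 0;
        q->queuedPresents++;
    }
    return 1;   /* non-presentation blts are not limited by this counter */
}

/* Called when the hardware retires a presentation-blt. */
static void sketch_retire_present(SketchBltQueue *q)
{
    assert(q->queuedPresents > 0);
    q->queuedPresents--;
}
```

Refusing the fourth presentation-blt, rather than spinning in the driver, leaves the waiting to the runtime.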
DDBLT_WAIT And
DDBLT_DONOTWAIT



Drivers should never look at these
They are set by the application/
DirectDraw runtime
They are handled by the
DirectDraw runtime


Sometimes DirectDraw spins, and wants
to do that in user-mode
Applies to DDFLIP_WAIT as well
DDBLT_ASYNC


Ignore this flag
Always perform your blts
asynchronously, if possible
What Are DDROPS?



We don’t know either
An idea of the original designer of
DirectDraw, but never implemented
or specified
In short: ignore!
Blt And YUV Surfaces



DirectShow can gain performance
benefits if it knows it can use Blt to
copy Overlay surfaces
Check to see if you can support
DDCAPS2_COPYFOURCC
This means you can SRCCOPY, no subrects, no stretch, no overlap between
two FOURCC surfaces of the same type
Update Overlay, Etc.

If multiple overlays are created, but you
have hardware for only one:


Succeed all CreateSurface calls
Fail the UpdateOverlay call
Flip Flags

DDFLIP_NOVSYNC



This means: flip immediately; do not wait
for vertical blank
The hardware must be capable of re-latching
the new primary surface address
immediately, or at least on the
next scanline
In other words, don’t allow the remaining
raster scans to read from the old
back buffer
Flip Flags

DDFLIP_INTERVALn


Please don’t implement by busy-waiting
in the driver
But please do implement if your hardware
can defer flips for n frames
Gamma Ramps



DirectDraw and Direct3D’s gamma
ramps are passed through the GDI DDI
call SetDeviceGammaRamp
This call is poorly prototyped
This is the struct you will be passed:
struct
{
WORD red[256]; //WORDs not BYTEs
WORD green[256];
WORD blue[256];
};
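Because the entries are WORDs, a driver whose hardware takes 8-bit ramp values has to reduce each entry. The sketch below assumes such 8-bit hardware and keeps the high byte. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t SKETCH_WORD;   /* stand-in for the Windows WORD type */

typedef struct {
    SKETCH_WORD red[256];       /* WORDs, not BYTEs */
    SKETCH_WORD green[256];
    SKETCH_WORD blue[256];
} SketchGammaRamp;

/* Reduce one 16-bit ramp entry to an 8-bit value by keeping the high byte. */
static uint8_t sketch_ramp_to_dac8(SKETCH_WORD entry)
{
    return (uint8_t)(entry >> 8);
}

/* Fill an identity ramp: index i maps to i * 257, so 255 maps to 0xFFFF. */
static void sketch_identity_ramp(SketchGammaRamp *r)
{
    assert(r != 0);
    for (int i = 0; i < 256; i++) {
        SKETCH_WORD v = (SKETCH_WORD)(i * 257);
        r->red[i] = v;
        r->green[i] = v;
        r->blue[i] = v;
    }
}
```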
DirectX 7.0 Implementation
Details
Overview Of DirectX 7.0 Model



Direct3D refers to surfaces
via “handles”
Driver keeps a look-up table indexed
by handle
Driver keeps everything it needs to
know about a surface in this table
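A handle-indexed table of this kind could be as simple as a growable array. This is a sketch only: the record fields are illustrative guesses at what a driver needs, and none of the names come from the DDK.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-surface record: the driver's own copy of what it needs. */
typedef struct {
    int      inUse;
    uint32_t width, height;
    uint32_t vidMemOffset;
} SketchSurfaceInfo;

typedef struct {
    SketchSurfaceInfo *entries;
    uint32_t           count;
} SketchHandleTable;

/* Grow the table so 'handle' is a valid index; returns 0 on failure. */
static int sketch_table_reserve(SketchHandleTable *t, uint32_t handle)
{
    if (handle < t->count) return 1;
    uint32_t newCount = handle + 16;
    SketchSurfaceInfo *p = realloc(t->entries, newCount * sizeof *p);
    if (!p) return 0;
    for (uint32_t i = t->count; i < newCount; i++) p[i].inUse = 0;
    t->entries = p;
    t->count = newCount;
    return 1;
}

/* Record a surface under its handle (what CreateSurfaceEx time would do). */
static int sketch_table_set(SketchHandleTable *t, uint32_t handle,
                            uint32_t w, uint32_t h, uint32_t offset)
{
    if (!sketch_table_reserve(t, handle)) return 0;
    SketchSurfaceInfo *e = &t->entries[handle];
    e->inUse = 1; e->width = w; e->height = h; e->vidMemOffset = offset;
    return 1;
}

/* Look a surface up by handle; NULL if the handle is unknown. */
static const SketchSurfaceInfo *sketch_table_get(const SketchHandleTable *t,
                                                 uint32_t handle)
{
    if (handle >= t->count || !t->entries[handle].inUse) return NULL;
    return &t->entries[handle];
}
```

The table stores copies of surface data rather than pointers to DirectDraw structures, so a lookup never touches runtime-owned memory.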
CreateSurfaceEx



Called after CreateSurface
Assigns a Direct3D-allocated handle
to the surface(s)
Driver runs attachment lists, creates
internal structures for each
surface in list
CreateSurfaceEx Is Hard



Driver has to run surface
attachment list
Z buffer might be attached, or
separate surface
Cubic Environment Maps are
the hardest...
Cubemap Attachments
(Abstract View)
[Figure: the cube faces (Positive X, Negative X, Positive Y, ...) are linked by attachments, and each face has its own chain of Mip SubLevels.]
Cubemaps (Struct View)
[Figure: the same attachments shown as structures. The Positive X, Positive Y, and Negative X faces and their mip levels (+ X Mip, - X Mip) each carry an lpAttachList whose entries are chained through lpLink and refer to the attached surface through lpAtt..]
Drivers Cannot



Keep pointers to DirectDraw’s surface
structures in their own structures
Flip confusion (explained later)
Overhead


Under DirectX 8.0, we don’t keep the
DirectDraw structure
...So DirectX 8.0 drivers CAN’T store
pointers – they will crash
Flip Confusion Explained
Before Flip:
User Mode Front Buffer: Handle A
User Mode Back Buffer: Handle B
Driver Surface A, Driver Surface B
After Flip:
User Mode Front Buffer: Handle B
User Mode Back Buffer: Handle A
Driver Surface A, Driver Surface B
The user-mode structures now refer to different
pieces of memory.
=> You cannot store pointers to the user-mode
structs in the driver structs.
Aliasing: What It Is




Video memory is a shared resource
On mode switch, all must be given up
But the application may be writing
directly to video memory
We re-map the application’s view of
video memory to a dummy page, then
allow the mode switch to proceed

Only done at app’s request:
DDLOCK_NOSYSLOCK
Aliasing: How It’s Done

When the driver returns a pointer to
video memory at CreateSurface time:


The offset into the frame buffer is
calculated, and then an equivalent
aliased pointer is returned to
the application
If the pointer lies outside of video memory,
no aliasing is done (we don’t know
enough to do so)
Aliasing: How To Break It

On Windows NT systems, the driver
must NOT return a pointer outside of
video memory at Lock time



This pointer will not be aliased
The application will crash if a mode
switch happens
Drivers should allocate system memory
at CreateSurface time
(PLEASE_ALLOC_USERMEM)
Changes For DirectX 8.0
Driver Capabilities Are
Constant Across Modes



This means everything in D3DCAPS8
The caps are allowed to be “nothing”
in some modes, e.g., 24bpp
You are allowed to support different
back buffer formats

That is, the one that matches the
front buffer
Pixel Formats In DirectX 8.0


Goodbye DDPIXELFORMAT
Hello D3DFORMAT


All FOURCCs are D3DFORMATs
D3DFMT has this form
Bytes 3 and 2: Vendor ID (0=Microsoft)
(Use your PCI Vendor ID)
Byte 1: Nonzero => FOURCC
Byte 0: Format Number
D3DFORMAT Examples

D3DFMT_A1R5G5B5
0x00000019
IHV-defined Format
0xACAT0001
(PCI ID 0xACAT, not FOURCC, format 1)
FOURCC “UYVY”
0x59565955
(Byte 2 is non-zero)
IHV-Def’d Texture Formats

Since Direct3D doesn’t understand IHV-defined formats:



These formats cannot be “managed”
Applications can lock these
surfaces directly
(In fact this is the only way to fill such
surfaces with data)
DirectX 8.0 Format Op-list


The format op-list tells DirectX 8.0
everything about capabilities that
vary with surface format
For each format, the driver sets bits
that indicate:




Can Texture from this format
Render to this format
Switch display mode to this format
Has caps in modes of this format
Format Op-List Tricks



The runtime searches for the first entry
that has all required capabilities
Example: Application wishes to render
to 565 texture
Runtime will search for an Op-List
entry with:

D3DFORMAT_OP_TEXTURE |
D3DFORMAT_OP_OFFSCREEN_RENDERTARGET
Format Op-List Tricks


Driver A can render to 565 texture
Sets this entry:


Format = D3DFMT_R5G6B5
Ops = D3DFORMAT_OP_TEXTURE |
D3DFORMAT_OP_OFFSCREEN_RENDERTARGET
Format Op-List Tricks


Driver B can NOT render and texture
from the same surface, but can do
both operations individually
Sets TWO entries




Format1 = D3DFMT_R5G6B5
Ops1 = D3DFORMAT_OP_TEXTURE
Format2 = D3DFMT_R5G6B5
Ops2 = D3DFORMAT_OP_OFFSCREEN_RENDERTARGET
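The search rule explains why Driver B’s two entries behave differently from Driver A’s single entry. The sketch below models that rule only. The constant values are placeholders rather than the DDK’s, and the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values for illustration; real constants are in the DDK headers. */
#define SKETCH_OP_TEXTURE                 0x1u
#define SKETCH_OP_OFFSCREEN_RENDERTARGET  0x2u
#define SKETCH_FMT_R5G6B5                 23u

typedef struct { uint32_t format; uint32_t ops; } SketchFormatOp;

/* Return the first entry for 'format' that has every bit in 'required',
   or NULL if no single entry has all of them. */
static const SketchFormatOp *sketch_find_op(const SketchFormatOp *list,
                                            size_t n, uint32_t format,
                                            uint32_t required)
{
    for (size_t i = 0; i < n; i++)
        if (list[i].format == format && (list[i].ops & required) == required)
            return &list[i];
    return NULL;
}

/* Driver A: can texture from and render to the same 565 surface. */
static const SketchFormatOp sketch_driver_a[] = {
    { SKETCH_FMT_R5G6B5,
      SKETCH_OP_TEXTURE | SKETCH_OP_OFFSCREEN_RENDERTARGET },
};

/* Driver B: supports each operation, but not both on one surface. */
static const SketchFormatOp sketch_driver_b[] = {
    { SKETCH_FMT_R5G6B5, SKETCH_OP_TEXTURE },
    { SKETCH_FMT_R5G6B5, SKETCH_OP_OFFSCREEN_RENDERTARGET },
};
```

A request for a renderable 565 texture matches Driver A’s entry but neither of Driver B’s, while a request for either operation alone succeeds on both drivers.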
What Can You Do Next?

If you develop DX Graphics Drivers:

You need a relationship with Microsoft’s
DirectX team, and should contact IHV
Program Manager:


Michele Boland (MBoland@microsoft.com)
Install and run against DEBUG runtimes


Available in the DirectX SDK
Will output debug messages for
common errors
DirectX VA
Video Acceleration
Drivers
Gary Sullivan
GarySull@microsoft.com
Software Design Engineer
DMD Video Services Group
Microsoft Corporation
Agenda




DirectX VA design and status
Current and future requirements
and tests
Future plans and potential extensions
What can you do next?
DirectX VA
Prime Directive
Decouple software decoder operation
from hardware accelerator design to
achieve full interoperability
[Figure: DirectX VA sits between the video standards (MPEG-1, MPEG-2, MPEG-4, H.261, H.263++, any other) and the accelerator functions (Motion Comp, Inverse DCT, VLD).]
What Is DXVA?
What Can It Achieve?






Interoperable interface between video
decoding software and advanced-capability
graphics accelerators
Increases video capability for the
consumer’s PC
Increases the demand for advanced graphics
accelerators and video applications
Decreases implementation effort for
software decoder writers
Decreases support burden for graphics
accelerator companies
Decreases testing burden for OEMs
DirectX VA
General Status





Spec went 1.0 with DirectX 8.0 Beta 2 (October ’00)
See http://www.microsoft.com/hwdev/DirectX_VA
OEMs love it – it enables separate WHQL qualification
of decoders and drivers
Software decoder companies are developing with it
(Mediamatics, Intervideo, Ravisent, Cyberlink,
MGI/Zoran, MbyN, …)
Hardware accelerator companies are supporting it in
drivers (ATI, Nvidia, Intel, SiS, S3, SiliconMotion, …)
DirectX VA Capabilities


Emphasis on MPEG-2 and DVD “sub-picture”
Support of all important video coding standards
(H.261, H.263, MPEG-1,
MPEG-2, MPEG-4)
And some non-standard variations on
the standards
Alpha graphic blending (e.g., DVD subpicture)
Three basic degrees of decoding configuration
capability:
Motion compensation on accelerator with host residual
difference decoding
Motion compensation and IDCT on accelerator
Full raw bitstream decoding
Externally-defined encryption support
How Does DXVA Operate?





Operation with Windows 2000 Overlay Mixer
(OVM) or new Windows XP Video Mixing
Renderer (VMR)
Requires DirectX 8.0 or Windows XP
Decoders use it through existing Windows
2000 “IAMVideoAccelerator” API
Drivers use it through corresponding
Windows 2000 “MoComp” DDI
DirectX VA specifies payload content of data
buffers that previously had accelerator-specific
formats
Host Versus Accelerator
Functional Split





Bitstream processing either on host
or accelerator
Accelerator handles the primary data
flow and performs the intensive
signal processing
PCI/AGP is the bridge between the two
Reconstruction loop maintained in
graphics Accelerator memory
Host processing converts standard-specific
streams into generic
Accelerator work units
Today’s DirectX VA
Compressed Video
Source
Variable-Length
Decoding
Residual Difference
Decoding (IDCT)
Motion
Compensation
Sum & Clip
Frame Storage
OVM/VMR/3D
Graphic
Source
Graphic Decoder
Graphic Blending
(Content Protection Supported Outside of Scope)
Constrained Parameter
Profiles


Strategy is to define a general interface
and a number of constrained-parameter
profiles, with decoder data structure
configuration settings
Profiles defined:




MPEG-2 Main Profile with and without
DVD Subpicture
Several H.263/MPEG-4 profiles
MPEG-1
H.261 with and without deblocking
post-processing
Defined Buffer Types


Picture-level decoding parameter buffers
Buffers for bitstream decoding:
Bitstream data buffers
Bitstream slice control buffers
Inverse quantization matrix buffers
Buffers for macroblock-level decoding:
Macroblock control buffers
Residual difference data buffers
Buffers for graphic blending:
Alpha+YUV graphic buffers
AI44 graphic buffers
DVD DPXD graphic buffers
DVD highlight definition buffers
DVD display control command buffers
Alpha blend combination buffers
Deblocking filter control buffers
Picture resampling buffers
Read-back data buffers
DXVA Requirement Plans
Primary Goals



Clear specification for MPEG-2
interoperability (and front-end DVD
subpicture) is the primary goal
Driver and decoder that claim video
acceleration must support DXVA
Specific “minimal interoperability set”
for each defined profile
July ’01 Stated
Requirements








MPEG2_A and MPEG2_C required
MPEG1_A required
H263_A required (?!)
Arithmetic accuracy required
IDCT accuracy required
Picture resolutions up to 720x576
Uncompressed surface types must
include NV12 in supported list
Must have “front end” capability to
convert to YUY2 from format in use
July ’01 Actual Tests







StRowe test decoder developed
Test driver also developed
Released DCT400 driver tests cover
MPEG2_A, _B, _C, _D profiles
Pass/Fail based on MPEG2_A and _B
Tests are currently of functional
operation and visual performance
Contact us (?!) if any test problems
Don’t ship untested features (?!)
Structure Of Motion Comp Data




All standards send only luma motion
vectors, deriving chroma vectors
from luma vectors
Each standard derives chroma vectors
in its own way
Switches for configuring the motion
comp are provided to minimize host
“translation” requirements
MPEG-2 Dual-Prime motion vectors
derived on host
DXVA Macroblock Control
Example
/* Basic form for P and B pictures */
typedef struct _DXVA_MBctrl_P_OffHostIDCT_1
{
WORD wMBaddress;
WORD wMBtype;
DWORD dwMB_SNL;
WORD wPatternCode;
UINT8 NumCoef[6];
DXVA_MVvalue MVector[4];
} DXVA_MBctrl_P_OffHostIDCT_1;
Structure Of Residual Data
Background (1 of 2)

Things that vary within and across standards:







Coefficient scan schemes
Intra Coefficient prediction schemes
VLC schemes
Inverse quantization schemes
Mismatch-control schemes
…
These things need lots of logic – not always
justified for accelerator implementation
Structure Of Residual Data
Background (2 of 2)

Things that do not vary within and
across standards

IDCT definition




Conformance rules may slightly differ – but
multi-standard conformance not a big problem
Many zero-valued coefficients
Predicted-versus-Intra operation
Only a few currently-specified
inverse scans
Structure Of Residual Data
The Chosen Method




Keep standard-specific issues on the
host to the extent possible
Support host-based or accelerator-based IDCT
Send only non-zero coefficients
Send index or run-length for coefficients
Residual Difference Example
(Off-Host IDCT 16b TCOEFF)
typedef struct _DXVA_TCoefSingle
{
    WORD  wIndexWithEOB;
    SHORT TCoefValue;
} DXVA_TCoefSingle, *LPDXVA_TCoefSingle;
/* Macros for Reading EOB and Index Values */
#define readDXVA_TCoefSingleIDX(ptr) ((ptr)->wIndexWithEOB >> 1)
#define readDXVA_TCoefSingleEOB(ptr) ((ptr)->wIndexWithEOB & 1)
/* Macros for Writing EOB and Index Values */
#define writeDXVA_TCoefSingleIndexWithEOB(ptr, idx, eob) ((ptr)->wIndexWithEOB = ((idx) << 1) | (eob))
#define setDXVA_TCoefSingleIDX(ptr, idx) ((ptr)->wIndexWithEOB |= ((idx) << 1))
#define setDXVA_TCoefSingleEOB(ptr) ((ptr)->wIndexWithEOB |= 1)
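As an illustration of “send only non-zero coefficients”, a host could pack one 8x8 block with these macros roughly as follows. The helper function is hypothetical, the WORD and SHORT typedefs stand in for the Windows types, and the index used is simply the position in the array passed in (whether that is scan order or raster order depends on the inverse-scan configuration).

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t WORD;    /* stand-ins for the Windows types on the slide */
typedef int16_t  SHORT;

typedef struct _DXVA_TCoefSingle {
    WORD  wIndexWithEOB;
    SHORT TCoefValue;
} DXVA_TCoefSingle;

#define readDXVA_TCoefSingleIDX(ptr) ((ptr)->wIndexWithEOB >> 1)
#define readDXVA_TCoefSingleEOB(ptr) ((ptr)->wIndexWithEOB & 1)
#define writeDXVA_TCoefSingleIndexWithEOB(ptr, idx, eob) \
    ((ptr)->wIndexWithEOB = (WORD)(((idx) << 1) | (eob)))
#define setDXVA_TCoefSingleEOB(ptr) ((ptr)->wIndexWithEOB |= 1)

/* Hypothetical helper: pack the non-zero entries of a 64-coefficient block.
   Returns the number of entries written; 'out' must hold up to 64 entries. */
static int sketch_pack_block(const SHORT coef[64], DXVA_TCoefSingle *out)
{
    int n = 0;
    assert(coef != 0 && out != 0);
    for (int i = 0; i < 64; i++) {
        if (coef[i] != 0) {
            writeDXVA_TCoefSingleIndexWithEOB(&out[n], i, 0);
            out[n].TCoefValue = coef[i];
            n++;
        }
    }
    if (n > 0)
        setDXVA_TCoefSingleEOB(&out[n - 1]);   /* last entry ends the block */
    return n;
}
```

A block with no non-zero coefficients produces no entries here. How such a block is signaled is outside this sketch.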
Decoding Configurations
(Part 1 of 2)


Bitstream decoding vs. Host VLD
Encryption:




Bitstream data if bitstream decoding
Macroblock control commands and/or residual
difference data if Host VLD
Type of encryption protocol supported
For Host VLD:


Host-based residual difference decoding versus
Accelerator-based residual difference
decoding versus both
Macroblock control commands in raster-scan
order versus arbitrary order
Decoding Configurations
(Part 2 of 2)

For host-based residual difference decoding







8b vs. 16b differences
If 8b differences, overflow supported, or not
If 8b differences, subtract second pass, or not
Interleaved chroma or not
Host clips range of data, or not
Intra residuals unsigned, or not
For accelerator-based difference decoding



Specific IDCT support
Inverse scan on host or accelerator
Coefficients sent in groups of four, or singly
Alpha Blending
Configurations

AYUV alpha blend graphic loading


AI44 or IA44 +palette or DPXD+Highlight
or AYUV
Alpha blend combination operation:





Front-end versus back-end
Picture resizing or not
Only use picture destination area or not
Graphic resizing or not
Whole plane alpha or not
Longer Term Requirements






Include H263_A, _B, _C in tested
requirements
Include mathematical motion comp and
IDCT accuracy in tests
Add speed performance testing
Picture resolutions up to 1920x1088
Six or more uncompressed surfaces
Specific FOURCC surface types for
uncompressed surfaces
Kill Superfluous Configs







bConfigRasterOrder = 0
bConfigResidDiffHost = 1
(bConfigResid8Subtraction = 1 with
bConfigSpatialResid8 = 1) or
(bConfigResidDiffHost = 1 with
(bConfigSpatialResid8 = 0 and
bConfigSpatialHost8or9Clipping = 0))
bConfigIntraResidUnsigned = 0
bConfigSpatialResidInterleaved = 0
bConfigHostInverseScan = 0
bConfig4GroupedCoefs = 0
Enhance Blending Configs







Eliminate duplication of AI44 & IA44
(bConfigDataType = 0 & 1)
Require both AYUV and AI44/IA44
(bConfigDataType = 3 and 0/1)
Require front-end blend
(bConfigBlendType = 0)
bConfigPictureResizing = 1
bConfigOnlyUsePicDestRectArea = 0
bConfigGraphicResizing = 1
bConfigWholePlaneAlpha = 1
Hot Issue:
WMV/H.263/MPEG-4


Codecs beyond MPEG-2 need support
H263_A profile needs:


H263_B profile needs:





Different derivation of chroma motion
Rounding control
Motion vectors over picture boundaries
8x8 motion vectors
Alternative inverse scan (or host inverse scan)
H263_C profile needs:

Deblocking filter support (also in H263_B?!)
Desirable Future Extensions










De-interlacing
Interoperable encryption / DRM
Compressed-video encoding (including
ME, DCT, and so on)
Inverse-telecine
Hue/contrast/brightness/gamma/color
corrections
Future decoding methods (MPEG-4v2,
WMV, H.26L)
Frame rate conversion
Precise separable re-sampling
Gen lock/frame rate synchronization
TV out control
New GUIDs
Reducing Memory Use




Add three new GUIDs to parallel MPEG2_A,
MPEG2_B, and MPEG2_D
New GUID adds raw bitstream decoding to the
“minimal interoperability set” of the
corresponding existing GUID
Driver with raw bitstream support then need
not allocate buffers for macroblock-level
processing with these GUIDs
Drivers could also not expose bitstream
processing with existing GUIDs to
save memory
Interoperable Encryption





Define an interoperable encryption
scheme
Much like the old draft DXVA scheme
Certificates for establishing trust
(perhaps X.509 or something else
rather than old draft scheme)
RSA key exchange
AES (RIJNDAEL) content encryption
Other In-Scope Additions

Add new features for other codecs –
WMV, H.26L, MPEG-4v2, etc.







1/4-sample motion comp
Added motion comp sizes and shapes
New inverse transforms (e.g., 4x4)
Fine granularity scalability
Global motion comp
Studio profile features
More possible GUIDs for precise
codec/configuration needs
New Video Building Blocks






Deinterlacing
Inverse Telecine
Frame rate conversion
Contrast/Brightness/Gamma/Color
Precisely-specified resampling
Video compression encoding
Deinterlace/
Inverse Telecine





Deinterlace is crucial
Becoming a standard feature of
high-end consumer TVs
1080i in weave can look awful
1080i in bob can look wrong too
Deinterlace can be useful for either
decoding or encoding
Hypothetical DXVA Structure
Today’s Scope of
DirectX VA
De-interlace /
Inverse Telecine
Frame Rate
Conversion
?
OVM/VMR/3D
Color Conversions
& Adjustments
??
Scaling
???
Interoperable DRM/Conditional Access/Content Protection/Encryption
Video Encoding
?
Uncompressed
Video Source
Motion Estimation
Inverse Telecine /
De-interlace
Color Conversions
And Adjustments
Frame Storage
Motion
Compensation
Mode & Motion
Vector Decision
Sum and Clip
Residual Difference
Transform (DCT)
Quantization
Residual Difference
Decoding (IDCT)
Variable Length
Encoding
?
What Can You Do Next?
(To All) Give Us Your Proposals




About any difficulties/problems in design
About encryption design
About new in-scope feature needs
About how to support new features





Deinterlace/inverse telecine
Encoding
Frame rate conversion
Contrast/Brightness/Gamma/Color
Resampling
What Can You Do Next?
(For Graphic Accelerator Designers)


Make your MPEG-2 and DVD subpicture DXVA
solution rock-solid, fully-tested with every
available decoder, and frighteningly fast
Fully support YUV surfaces as textures
for input to 3-D
Conversion to RGB, and so on
Design maximal WMV/H.263/MPEG-4 feature
support into your next generation
But don’t expose them unless fully tested
Move to the preferred configurations and
uncompressed surface types
Support new memory-conserving GUIDs
Writing AVStream
Minidrivers For Windows XP
William Messmer, SDE
Digital Audio-Video
Microsoft Corporation
Agenda

AVStream minidriver architecture




Data processing
Writing a minidriver: key issues
and pitfalls




When and why to use AVStream
Exposing minidriver functionality
Walk through sample code
Common problems and mistakes
DirectX 8.0 versus Windows XP
What can you do next?
Why AVStream

THE next generation class driver





More efficient streaming
Reduces the amount of minidriver code
Simplifies development; faster to market
One minidriver, one model – no more confusion
over stream class versus port class
New features, new technologies will only be
supported in AVStream; stream and port class,
however, are still supported!
When To Use AVStream


BDA Drivers
New Device Types



Combined A/V devices
Kernel Software Transforms


Which are not already written to stream
class or port class
Audio Global Effects (GFX) Filters
No necessity to port existing stream
or port class drivers
Minidriver Architecture

Functionality is exposed as a tree
hierarchy described through
static descriptors






Device – described by Device Descriptor
Filter Factory – creates a type of Filter
Filter – described by Filter Descriptor
Pin Factory – creates a type of Pin
Pin – described by Pin Descriptor
Functionality provided through static
dispatch and automation tables
Minidriver Architecture
[Figure: the Device (Device Descriptor, Device Dispatch, Add Device) has >= 1 Filter Factory; each Filter Factory creates >= 1 Filter (Filter Descriptor, Filter Dispatch with Create, Filter Automation); each Filter has >= 1 Pin Factory; each Pin Factory creates >= 1 Pin (Pin Descriptor, Pin Dispatch with Create, Pin Automation). Key: Minidriver Provided Table, Public AVStream Construct, Minidriver Dispatch Routine, Private AVStream Construct.]
Exposing Minidrivers

Expose your driver to AVStream




Call KsInitializeDriver in DriverEntry
passing your Device Descriptor
Return the status from KsInitializeDriver
AVStream handles PnP to get your
driver set up; minidriver gets calls
through device dispatch
Filter Factories set up by AVStream
during Add Device and Start Device
Exposing Minidrivers

AVStream creates filters/pins
based on descriptors




Minidriver receives creation dispatch
Creation dispatch associates minidriver
specific context with object
Object bags available as containers for
dynamic memory like contexts
AVStream handles cleanup of objects
based on bags

No forgetting to free dynamic memory
Minidriver Architecture

Sample Code (Exposing Functionality)
Data Processing

AVStream queues data/buffers



Minidriver queues not necessary
Cancellation handled in the queue
Data exposed through two abstractions:
stream pointers and process pins


Stream pointers are robust and allow versatile
queue management; typically used in
hardware drivers
Process pins work purely at a single buffer
level making for very simple software
transforms
Design Issues

Two distinct ways to handle data
processing

Filter-Centric processing
Specify filter process dispatch
Pin-Centric processing
Specify pin process dispatches
The choice of which to use will
influence design greatly
Filter-Centric Processing




Filter is called to process data in a
context where data is available on
all required pins
Typically used for software transforms
Stream pointer use not required
Processing based on an index of
process pins


Index/pins stable during processing
Minidriver does transform, specifies
how many bytes of each buffer used
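In user-mode terms, a filter-centric process routine boils down to the shape below. This is a model only: the structure is a simplified stand-in rather than the AVStream process pin layout, and the “transform” is a plain copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a process pin; not the AVStream structure. */
typedef struct {
    unsigned char *data;          /* buffer virtual address */
    size_t         bytesAvailable;
    size_t         bytesUsed;     /* set by the minidriver */
} SketchProcessPin;

/* Hypothetical filter process routine: transform (here, copy) as much as
   both pins allow, then report how many bytes of each buffer were used. */
static void sketch_filter_process(SketchProcessPin *in, SketchProcessPin *out)
{
    size_t n;
    assert(in != NULL && out != NULL);
    n = in->bytesAvailable < out->bytesAvailable
      ? in->bytesAvailable : out->bytesAvailable;
    memcpy(out->data, in->data, n);
    in->bytesUsed = n;
    out->bytesUsed = n;
}
```

When one side still has bytes left after a pass, the routine would be called again with the next buffer on the other side.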
Process Pins




One per pin – points back to the pin
Contains a stream pointer if needed
Contains a buffer virtual address and
size for data manipulation
Informs the process routine of the pin’s
relationships with other pins



InPlaceCounterpart – other pin in an
in-place transform pair
CopySource – pin data is copied from
DelegateBranch – pin that delegates
frames (in the same pipe)
Transform Example
1. Frame(s) arrive
2. Filter is called to process.
Filter sees two process pins: IN and OUT
3. Process Pins Point to Buffers
4. Filter performs transform;
Sets 1920 bytes used on input and output
5. Filter is called back;
more data to transform
6. Process Repeats Similarly
[Figure: INPUT and OUTPUT queues of Frames at each step, with completed frames marked Gone. Labels: 1920 Bytes Data, 1100 Bytes Data, 2880 Bytes Buffer, 960 Bytes Buffer; counts shown in parentheses: (1920), (1100), (140), (2880), (960).]
Pin-Centric Processing



Each pin called to process data in a
context independent of other pins
Typically used for hardware drivers
Data accessed through stream pointer
abstraction
Stream Pointers



Reference a single frame in a queue
Hold that frame in the queue
Can be in multiple states




Locked – referenced data is safe to access;
Irp cannot be cancelled
Unlocked – not guaranteed to even
reference data; Irp can be cancelled
Can be cloned to create new pointers
into the data stream
Can schedule time-outs
Stream Pointers


Contain two offsets into the data
stream for ease of in-place use
Address data at one of two
granularities:


Byte – access via virtual address
Mapping – access via logical DMA
address


KSPIN_FLAG_GENERATE_MAPPINGS
Minidriver usable context available per
stream pointer
Stream Pointers And Queues
[Figure: three example queues, each running from Oldest Frames to Newest Frames. Every Frame shows a count in parentheses, from (0) to (3). Leading Edge, Trailing Edge, and Clone stream pointers point at individual Frames.]
Direct DMA Example
1. Frame(s) arrive; minidriver called to process
2. Processing routine acquires leading edge
KsPinGetLeadingEdgeStreamPointer
3. Leading edge is cloned
KsStreamPointerClone
4. DMA Hardware is programmed
5. Leading edge is advanced
6. Process may repeat for more frames
7. Hardware interrupts for DMA completion
8. ISR Schedules a DPC
9. DPC releases the associated frames
KsStreamPointerDelete
10. May need to continue processing
KsPinAttemptProcessing
[Figure: the QUEUE of Frames at successive steps, each Frame showing a count in parentheses, (1) or (2); released Frames are marked Gone.]
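The bookkeeping behind the Direct DMA steps can be modeled in user mode as a per-frame count of stream pointers. This is a toy model under that assumption: it does not call the AVStream functions named in the steps, and every name in it is hypothetical.

```c
#include <assert.h>

#define SKETCH_QUEUE_LEN 4

/* Toy queue: each frame carries a count of stream pointers referencing it. */
typedef struct {
    int refs[SKETCH_QUEUE_LEN];
    int leading;                 /* index of the frame under the leading edge */
} SketchQueue;

/* Clone the leading edge, program DMA, then advance the leading edge.
   Returns the frame index held by the clone, or -1 if there is no frame. */
static int sketch_clone_and_advance(SketchQueue *q)
{
    int frame;
    if (q->leading >= SKETCH_QUEUE_LEN) return -1;
    frame = q->leading;
    q->refs[frame]++;            /* the clone now holds the frame in the queue */
    /* ...DMA hardware would be programmed with this frame here... */
    q->refs[frame]--;            /* the leading edge leaves this frame... */
    q->leading++;
    if (q->leading < SKETCH_QUEUE_LEN)
        q->refs[q->leading]++;   /* ...and references the next one */
    return frame;
}

/* DPC side: delete the clone. Returns 1 if nothing references the frame
   any more, meaning it can be completed and leave the queue. */
static int sketch_delete_clone(SketchQueue *q, int frame)
{
    assert(frame >= 0 && frame < SKETCH_QUEUE_LEN);
    assert(q->refs[frame] > 0);
    q->refs[frame]--;
    return q->refs[frame] == 0;
}
```

The frame stays queued between the advance and the DPC only because the clone still references it.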
Data Frame Control

Held non-cancelable for a period
Use locked stream pointers
Consider stream pointer timeouts
Can relinquish claim with callback
Use unlocked stream pointers with
a cancel callback
Periodic access where frame can
disappear between accesses
Use unlocked stream pointers and
lock periodically
Processing Decisions

Filter-Centric




All pins are involved in the decision
Each pin type can have separate
requirements
One pin not fulfilling requirements will
veto processing for the entire filter
Pin-Centric


Only one pin is involved in the decision
Each pin type can have separate
requirements which do not influence
other pins
When Processing Happens

Default case (no pin flags)
Attempt made when frame arrives and
leading edge points to no frame
Attempt will succeed if
Involved pin(s) are >= KSSTATE_PAUSE
Involved pin(s) all have data
Continuing processing
STATUS_SUCCESS returned from
dispatch and conditions still met
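The default-case conditions can be read as a simple predicate over the involved pins, sketched here in user-mode C. The enum mirrors the ordering of the KSSTATE values (stop, acquire, pause, run) but is a stand-in, and the pin structure is hypothetical.

```c
#include <assert.h>

/* Stand-in for the KSSTATE ordering; the real enum is defined in ks.h. */
typedef enum {
    SKETCH_STATE_STOP,
    SKETCH_STATE_ACQUIRE,
    SKETCH_STATE_PAUSE,
    SKETCH_STATE_RUN
} SketchState;

typedef struct { SketchState state; int hasData; } SketchPin;

/* Default case (no pin flags): every involved pin must be at least
   paused and must have data. Returns 1 if processing may proceed. */
static int sketch_can_process(const SketchPin *pins, int count)
{
    assert(count > 0);
    for (int i = 0; i < count; i++)
        if (pins[i].state < SKETCH_STATE_PAUSE || !pins[i].hasData)
            return 0;
    return 1;
}
```

For a pin-centric filter, count is 1. For a filter-centric filter, one pin failing the test vetoes processing for the whole filter.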
Adjusting Processing

KSPIN_FLAG_
_INITIATE_PROCESSING_ON_EVERY…
Every frame arrival initiates
_DO_NOT_INITIATE_PROCESSING
No frame arrival initiates
_PROCESS_IN_RUN_STATE_ONLY
Pin must be in KSSTATE_RUN
_FRAMES_NOT_REQUIRED…
Data is not required on this pin
Adjusting Processing




Some mentioned flags useful
for pin-centric
Most flags useful for filter-centric
where all pins are involved in the
decision as to when to process data
See the DDK for a complete
description of flags
Understand when processing
happens based on your flags!
Adjusting Processing

Processing can happen in a DPC!
KSFILTER_FLAG_DISPATCH_LEVEL_PROCESSING
KSPIN_FLAG_DISPATCH_LEVEL_PROCESSING
Dispatch level processing still synchronized
Processing mutex still held during dispatch
level processing
Can still be used to synchronize with processing
Data manipulation (stream pointer) API fully
dispatch level ready!
Walkthrough Sample Code

Pin-centric sample code
Common Problems

Internal mutexes are exposed

Three mutex types in a hierarchy




Device Mutex
Filter Control Mutex
Processing Mutex
Some calls require mutexes held


Sometimes AVStream holds the mutex for
you; sometimes you must hold the mutex!
See the DDK for this!
Common Problems

Mutex Rules



Do NOT take mutexes out of order:
device then control then processing
Do NOT take a mutex and call out –
not for properties, not for anything!
Walking the object hierarchy requires
mutexes held:


Device Mutex – device down to filter
Filter Control Mutex – filter down to pins
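The ordering rule (device, then control, then processing) can be sketched as a small user-mode Python helper that refuses out-of-order acquisition. `OrderedMutexes` and `RANK` are hypothetical names for this illustration. The real mutexes are kernel objects taken through AVStream calls, which this sketch does not attempt to reproduce.

```python
import threading

# Lower rank must be acquired first: device, then control, then processing.
RANK = {"device": 0, "control": 1, "processing": 2}


class OrderedMutexes:
    """Hypothetical helper that rejects out-of-order acquisition."""

    def __init__(self):
        self._locks = {name: threading.Lock() for name in RANK}
        self._per_thread = threading.local()

    def held(self):
        # Names held by the calling thread, in acquisition order.
        if not hasattr(self._per_thread, "names"):
            self._per_thread.names = []
        return self._per_thread.names

    def acquire(self, name):
        held = self.held()
        if held and RANK[name] <= max(RANK[h] for h in held):
            raise RuntimeError(
                f"cannot take {name} mutex while holding {held}"
            )
        self._locks[name].acquire()
        held.append(name)

    def release(self, name):
        self.held().remove(name)
        self._locks[name].release()
```

Taking `"processing"` and then asking for `"device"` raises `RuntimeError` instead of risking a deadlock against a thread that took the two in the documented order.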
Common Problems

Do not traverse the object tree (filters
and pins) during processing!



KsFilterGetFirstChildPin
KsPinGetNextSiblingPin
Pin-centric filters should not need to
do this; filter-centric filters have the
process pins index
DirectX 8.0 Versus Windows XP

Mutexes in DirectX 8.0 are fast mutexes



Certain APIs require mutexes held
Client must be careful of when to
acquire mutexes!
Mutexes in Windows XP are
full mutexes



Completely backwards compatible
with DirectX 8.0 drivers
Fewer APIs require mutex acquisition
Mutex acquisition more lenient
DirectX 8.0 Versus Windows XP

New flags in Windows XP

_SOME_FRAMES_REQUIRED…



One or more pin instances of this type
require frames
Can be done programmatically in DirectX 8.0
_PROCESS_IF_ANY_IN_RUN_STATE


One or more pin instances of this type must be
>= KSSTATE_RUN; others must be >=
KSSTATE_PAUSE
Processing routine must check in DirectX 8.0
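The two conditions behind the new flags are simple enough to state as predicates. The sketch below shows the kind of check a DirectX 8.0 processing routine might perform itself. The function names and the list-based inputs are assumptions made for the illustration, not AVStream calls.

```python
from enum import IntEnum


class State(IntEnum):
    STOP = 0
    ACQUIRE = 1
    PAUSE = 2
    RUN = 3


def some_frames_available(queued_per_instance):
    """At least one pin instance of the type has a frame queued."""
    return any(count > 0 for count in queued_per_instance)


def any_running_rest_paused(states):
    """At least one instance is running and none is below pause."""
    return (
        any(state >= State.RUN for state in states)
        and all(state >= State.PAUSE for state in states)
    )
```

A routine would bail out early when either predicate is false for the pin instances it cares about.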
What Can You Do Next?



Install the DirectX 8.0 or
Windows XP DDK
Try out the samples in the DDK
Write AVStream minidrivers for
new hardware!
Testing your WDM Driver
with DirectShow
Eric Rudolph
Software Design Engineer
DirectShow Editing Services
Microsoft Corporation
Agenda

DirectShow supports capture from
1394, USB, analog video/audio, TV
tuner, and custom devices
Demonstrate the use of the
DirectShow-based generic graph editor,
GraphEdt, as a WDM driver test tool
Walk through sample code that uses the
GraphBuilder COM object
What tools exist to test your driver?
Included in DX8: GraphEdt.exe, a generic
graph editor
Also in DX8: AmCap.exe, a simple
capture application
New for Windows XP: Still Image devices
show up in the shell (Explorer)
New for Windows XP: Movie Maker (on
Start Menu)
GraphEdt
Overview
Ships with DX8
Provides UI to build dataflow graphs
and then uses DirectShow to run,
pause, and stop the data
Views different filter categories:
capture, compressor, crossbar, DMO, and
so on
Connects different filters together
Accesses property pages
Writes out files
Controls 1394 devices
GraphEdt
Filter Categories





Categories enable you to
easily find a particular type
of DirectShow filter
Many categories predefined
in ksuuids.h & uuids.h
WDM drivers have many of
their own categories
Capture devices can show up
in both non-WDM and WDM
categories
As you add/remove WDM
devices, if they send device
notifications, they will auto
show/hide from category
lists
GraphEdt
Property Pages




The filter itself can expose
multiple property pages
Each pin can expose 1 or
more property pages
When you query an output
pin’s property pages, you
will see 1 extra page per
pin which lists available
output media connection
types
Capture property pages are
often exposed by capture
applications (using
standard DirectShow
methods), so make them
look nice!
Example property page
GraphEdt Property Pages
And Media Types






Output pins provide one or
more media types
Input pins normally do not
provide a list of types, but
instead accept types
When you render a pin,
DirectShow will try to find
appropriate filters to render
When you try to connect two
pins, DirectShow will try to
find intermediate filters
The media types must agree
between any output pin and
its connected input pin
Buffers are also negotiated
The different media
types Indeo 5.11
decompressor provides
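The connection rules on this slide can be pictured with a small Python model: an output pin offers an ordered list of types, an input pin only answers "accept" or "reject", and a connection with no common type may still succeed through an intermediate filter. Everything here is an assumption for illustration, including the tuple representation of a media type and the `negotiate` and `connect` names. DirectShow's real connection logic is considerably richer and also negotiates buffers.

```python
def negotiate(offered_types, accepts):
    """Return the first offered type the input pin accepts, else None."""
    for media_type in offered_types:
        if accepts(media_type):
            return media_type
    return None


def connect(offered_types, accepts, intermediates=()):
    """Return the agreed chain of types and filters, or None if none fits.

    Each intermediate is (name, accepts_function, offered_output_types).
    """
    direct = negotiate(offered_types, accepts)
    if direct is not None:
        return [direct]
    for name, filter_accepts, filter_offers in intermediates:
        upstream = negotiate(offered_types, filter_accepts)
        downstream = negotiate(filter_offers, accepts)
        if upstream is not None and downstream is not None:
            return [upstream, name, downstream]
    return None


def renderer_accepts(media_type):
    # Hypothetical renderer that only takes two RGB formats.
    return media_type in {("video", "RGB24"), ("video", "RGB32")}
```

A source offering `("video", "YUY2")` and `("video", "RGB24")` connects directly on the second type. A source offering only a compressed type connects only if a decoder-style intermediate that accepts it and offers an RGB type is available.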
Common Problems








Hot unplug while streaming
Device add/remove while streaming
Enter hibernation while streaming
Multiple camera enumeration
Multiple camera streaming (one
driver, multiple devices)
Video shows up black or wrong
Changing display props while
streaming
Overlay and DDraw issues
GraphEdt Demos
Part 1



Capture from USB, both with 1 pin
and with 2 pins (capture & preview)
DV capture and device control
Device Insertion / Removal and how
the Graph refreshes
GraphEdt Demos
Part 2





How to write AVI, WAV, and WM files
New Video Mixing Renderer has
slightly different connection model
than old Video Renderer
How to force a filter to produce a
media type with a Type Enforcer
Timestamps are important!
Using .GRF files
Sample Code
Using the GraphBuilder COM Object



CaptureGraphBuilder makes
connecting capture devices
easy
See the AmCap sample code in
the DX8/DirectShow SDK
directory
Sample code walkthrough
What Can You Do Next?
Test your WDM drivers! Under many
different conditions!
Read up on the DX8 docs; they’re great!
DirectShow contact:
stanpenn@microsoft.com
Get on the DirectX A/V list
