DirectX And Streaming Video Drivers

Jeff Noyle, Development Lead
Gary Sullivan, Software Design Engineer
William Messmer, Software Design Engineer
Eric Rudolph, Software Design Engineer
Microsoft Corporation

Speakers
- “DirectX Graphics Drivers,” Jeff Noyle, Lead Developer, DirectDraw®/Direct3D®, Microsoft Corporation
- “DirectX VA Video Acceleration Drivers,” Gary Sullivan, Software Design Engineer, DMD Video Services Group, Microsoft Corporation
- “Writing AVStream Minidrivers for Windows® XP,” William Messmer, Software Design Engineer, Digital Audio-Video, Microsoft Corporation
- “Testing Your WDM Driver with DirectShow®,” Eric Rudolph, SDE, DirectShow Editing Services, Microsoft

DirectX Graphics Drivers
Jeff Noyle, Development Lead, DirectDraw/Direct3D, Microsoft Corporation

Prerequisites
- I’m assuming basic familiarity with DirectDraw and Direct3D concepts:
  - System architecture
  - Surfaces
  - Page flipping
- The DDK can be hard to read

Agenda
- Single-source issues
- Windows 9x issues
- OS-independent issues
- DirectX 7.0 implementation details
- Changes in DirectX 8.0
- What can you do next?

Single-Source Issues
- Stuff you should know if you want one code base to support both Windows 9x and Windows NT® OS versions

Allocating System Memory Per-Surface
- (Do NOT use this process to allocate surface memory itself; see later)
- Normally, system memory is charged against a particular process
  - You can’t free it in some other process (as in the ctrl-alt-del mechanism)
- Use EngAllocPrivateUserMem and EngFreePrivateUserMem
  - These use the DirectDraw object to locate the proper process context

YUV/FOURCC Surfaces
- System memory YUV/FOURCC surfaces on NT systems:
  - DirectDraw kernel mode “pretends” that these surfaces are 8bpp RGB for the purposes of allocating memory
  - DXTn:
    - Height: height in 4x4 blocks
    - Width: width in blocks * sizeof(block)
- You must undo these transformations at CreateSurface time

YUV/FOURCC Surfaces
- NT kernel mode doesn’t understand any FOURCC formats, so:
  - The driver must handle video memory allocation for these types
  - The driver must handle Lock for these types

Windows 2000 Issue (Fixed In Windows XP)
- During allocation of an AGP surface, if the driver fails to allocate and:
  - returns DDHAL_DRIVER_HANDLED, AND
  - sets an error code in ddRVal, AND
  - sets the surface’s lpVidMemHeap to non-zero
- then the system will ignore the error
- So NULL the lpVidMemHeap on error!
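
The failure path described on the Windows 2000 slide can be sketched as follows. This is a minimal illustration, not DDK code: every type, constant, and function name prefixed with Sketch or sketch_ is a hypothetical stand-in invented here, and only the field names lpVidMemHeap and ddRVal come from the slide. The point it shows is that the error code and the cleared heap pointer are set together before returning "handled".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for DDK constants; the real values live in the
   DDK headers and are NOT reproduced here. */
#define SKETCH_DRIVER_HANDLED   1L
#define SKETCH_OK               0L
#define SKETCH_ERR_OUTOFVIDMEM  (-1L)

/* Hypothetical, heavily simplified stand-ins for the DDK structures. */
typedef struct SketchHeap { int unused; } SketchHeap;

typedef struct SketchSurface {
    SketchHeap   *lpVidMemHeap;  /* heap the surface memory came from */
    unsigned long fpVidMem;      /* address of the surface memory */
} SketchSurface;

typedef struct SketchCreateData {
    SketchSurface *surface;
    long           ddRVal;       /* result code reported to DirectDraw */
} SketchCreateData;

/* allocated_address == 0 models "the AGP allocation failed". */
static long sketch_create_agp_surface(SketchCreateData *data,
                                      SketchHeap *heap,
                                      unsigned long allocated_address)
{
    SketchSurface *surf = data->surface;

    /* A driver may record the heap before it knows the outcome. */
    surf->lpVidMemHeap = heap;

    if (allocated_address == 0) {
        /* Failure: report the error AND clear the heap pointer, so the
           error cannot be ignored because of a stale non-zero heap. */
        surf->lpVidMemHeap = NULL;
        surf->fpVidMem = 0;
        data->ddRVal = SKETCH_ERR_OUTOFVIDMEM;
        return SKETCH_DRIVER_HANDLED;
    }

    surf->fpVidMem = allocated_address;
    data->ddRVal = SKETCH_OK;
    return SKETCH_DRIVER_HANDLED;
}
```

Clearing the pointer in the same branch that sets the error code keeps the two from drifting apart when the allocation path is later modified.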

Atomic Surface Creation
- On Windows 9x, drivers are given a list of surfaces
- On Windows NT, drivers are given surfaces one at a time, unless:
  - The driver reports GUID_NTPrivateDriverCaps and sets DDHAL_PRIVATECAP_ATOMICSURFACECREATION

Windows NT Extra
- You can use GUID_NTPrivateDriverCaps to request notification of primary surface creation:
  - Set DDHAL_PRIVATECAP_NOTIFYPRIMARYCREATION

Windows 9x Issues

System-To-Video Blts
- To speed up some titles, implement system-to-video blts
- All you need to implement is SRCCOPY, no stretch
  - But you should implement sub-rects
- DirectDraw assumes your driver requires system memory to be pagelocked during Blt
  - If this is not true, set DDCAPS2_NOPAGELOCKREQUIRED

OS-Independent Issues

HeapVidMemAllocAligned
- It’s an “Eng” function in Windows NT versions
- It’s a ddraw.dll export in Windows 9x
- You can use this to allocate surface memory
- You must have passed the heap to DirectDraw previously
- You must fill in the fpHeapOffset, fpVidMem and lpVidMemHeap of the surface

Heap Offsets Explained
- Return values from HeapVidMemAllocAligned (HVMAA) are these offsets:
  [Diagram: a heap containing a surface. Labels: “0”, fpStart, “Return value from HVMAA and fpHeapOffset”, fpEnd (points TO last byte)]
- (Note: fpStart is set to 0x1000 by DirectDraw for AGP heaps)

DDSCAPS_VIDEOMEMORY
- Remember that this includes AGP unless combined with DDSCAPS_LOCALVIDMEM
- At GetAvailDriverMem time, a request that specifies DDSCAPS_VIDEOMEMORY (and not any explicit type: local or non-local) should include both types in the total

GetScanLine
- Implement this, if you can!
- DirectX 8.0 uses it a lot for presentation-Blt timing
- Set DDCAPS_READSCANLINE, so DirectX 8.0 knows

CreateSurfaceEx
- More on this later
- NEVER fail CreateSurfaceEx for system memory surfaces, even if you don’t understand the pixel format
  - Just return DDHAL_DRIVER_HANDLED and DD_OK
  - (Otherwise new system-memory formats used by the reference rasterizer can’t be created)

Alpha-In-The-Primary
- If your driver can do this in 32bpp:
  - Create an A8R8G8B8 render target
  - Blt that to the primary surface IGNORING the alpha channel
  - (And stretch/shrink (please))
- Then you should set:
  - DDHALINFO.vmiData.ddpfDisplay.dwFlags |= DDPF_ALPHAPIXELS
  - DDHALINFO.vmiData.ddpfDisplay.dwRGBAlphaBitMask = 0xFF000000

Windowed Applications And Blt Queuing
- Don’t allow “many” presentation-blts in your queue
- WHQL enforces low latency for DirectX 8.0 drivers
  - That is, don’t allow a large latency between scheduling and retiring a presentation-blt
- Check DDBLT_PRESENTATION, and don’t allow more than three
- More info in ddraw.h

DDBLT_WAIT And DDBLT_DONOTWAIT
- Drivers should never look at these
  - They are set by the application/DirectDraw runtime
  - They are handled by the DirectDraw runtime
- Sometimes DirectDraw spins, and wants to do that in user mode
- Applies to DDFLIP_WAIT as well

DDBLT_ASYNC
- Ignore this flag
- Always perform your blts asynchronously, if possible

What Are DDROPS?
- We don’t know either
- An idea of the original designer of DirectDraw, but never implemented or specified
- In short: ignore!

Blt And YUV Surfaces
- DirectShow can gain performance benefits if it knows it can use Blt to copy overlay surfaces
- Check to see if you can support DDCAPS2_COPYFOURCC
- This means you can SRCCOPY (no subrects, no stretch, no overlap) between two FOURCC surfaces of the same type

Update Overlay, Etc.
- If multiple overlays are created, but you have hardware for only one:
  - Succeed all CreateSurface calls
  - Fail the UpdateOverlay call

Flip Flags: DDFLIP_NOVSYNC
- This means: flip immediately; do not wait for vertical blank
- The hardware must be capable of relatching the new primary surface address immediately, or at least on the next scanline
- In other words, don’t allow the remaining raster scans to read from the old back buffer

Flip Flags: DDFLIP_INTERVALn
- Please don’t implement by busy-waiting in the driver
- But please do implement if your hardware can defer flips for n frames

Gamma Ramps
- DirectDraw and Direct3D’s gamma ramps are passed through the GDI DDI call SetDeviceGammaRamp
- This call is poorly prototyped
- This is the struct you will be passed:

    struct {
        WORD red[256];    // WORDs not BYTEs
        WORD green[256];
        WORD blue[256];
    };

DirectX 7.0 Implementation Details

Overview Of DirectX 7.0 Model
- Direct3D refers to surfaces via “handles”
- Driver keeps a look-up table indexed by handle
- Driver keeps everything it needs to know about a surface in this table

CreateSurfaceEx
- Called after CreateSurface
- Assigns a Direct3D-allocated handle to the surface(s)
- Driver runs attachment lists, creates internal structures for each surface in the list

CreateSurfaceEx Is Hard
- Driver has to run the surface attachment list
- Z buffer might be attached, or a separate surface
- Cubic environment maps are the hardest...

Cubemap Attachments (Abstract View)
[Diagram labels: Positive X, Negative X, Positive Y, ...; each face followed by Mip SubLevel entries]

Cubemaps (Struct View)
[Diagram labels: Positive X, Negative X, Positive Y, +X Mip, -X Mip; each surface shown with an lpAttachList of lpLink / lpAtt.. entries]

Drivers Cannot
- Keep pointers to DirectDraw’s surface structures in their own structures
  - Flip confusion (explained later)
  - Overhead
- Under DirectX 8.0, we don’t keep the DirectDraw structure
  - ...So DirectX 8.0 drivers CAN’T store pointers: they will crash

Flip Confusion Explained
- Before flip: User Mode Front Buffer = Handle A; User Mode Back Buffer = Handle B
- After flip: User Mode Front Buffer = Handle B; User Mode Back Buffer = Handle A
- Driver structures shown in both cases: Driver Surface A, Driver Surface B
- The user-mode structures now refer to different pieces of memory
  => You cannot store pointers to the user-mode structs in the driver structs

Aliasing: What It Is
- Video memory is a shared resource
  - On mode switch, all must be given up
  - But the application may be writing directly to video memory
- We re-map the application’s view of video memory to a dummy page, then allow the mode switch to proceed
- Only done at app’s request: DDLOCK_NOSYSLOCK

Aliasing: How It’s Done
- When the driver returns a pointer to video memory at CreateSurface time:
  - The offset into the frame buffer is calculated, and then an equivalent aliased pointer is returned to the application
  - If the pointer lies outside of video memory, no aliasing is done (we don’t know enough to do so)

Aliasing: How To Break It
- On Windows NT systems, the driver must NOT return a pointer outside of video memory at Lock time
  - This pointer will not be aliased
  - The application will crash if a mode switch happens
- Drivers should allocate system memory at CreateSurface time (PLEASE_ALLOC_USERMEM)

Changes For DirectX 8.0

Driver Capabilities Are Constant Across Modes
- This means everything in D3DCAPS8
- The caps are allowed to be “nothing” in some modes, e.g., 24bpp
- You are allowed to support different back buffer formats
  - That is, the one that matches the front buffer

Pixel Formats In DirectX 8.0
- Goodbye DDPIXELFORMAT
- Hello D3DFORMAT
- All FOURCCs are D3DFORMATs
- D3DFMT has this form:
  [Diagram: the four bytes of a D3DFORMAT value, Byte 3 through Byte 0. Labels: Vendor ID (0 = Microsoft; use your PCI Vendor ID), Nonzero => FOURCC, Format Number]

D3DFORMAT Examples
- D3DFMT_A1R5G5B5: 0x00000019
- IHV-defined format: 0xACAT0001 (PCI ID 0xACAT, not FOURCC, format 1)
- FOURCC “UYVY”: 0x59565955 (Byte 2 is non-zero)

IHV-Defined Texture Formats
- Since Direct3D doesn’t understand these formats, they cannot be “managed”
- Applications can lock these surfaces directly
  - (In fact this is the only way to fill such surfaces with data)

DirectX 8.0 Format Op-List
- The format op-list tells DirectX 8.0 everything about capabilities that vary with surface format
- For each format, the driver sets bits that indicate:
  - Can texture from this format
  - Can render to this format
  - Can switch display mode to this format
  - Has caps in modes of this format

Format Op-List Tricks
- The runtime searches for the first entry that has all required capabilities
- Example: application wishes to render to a 565 texture
  - Runtime will search for an op-list entry with: D3DFORMAT_OP_TEXTURE | D3DFORMAT_OP_OFFSCREEN_RENDERTARGET

Format Op-List Tricks
- Driver A can render to a 565 texture
- Sets this entry:
  - Format = D3DFMT_R5G6B5
  - Ops = D3DFORMAT_OP_TEXTURE | D3DFORMAT_OP_OFFSCREEN_RENDERTARGET

Format Op-List Tricks
- Driver B can NOT render and texture from the same surface, but can do both operations individually
- Sets TWO entries:
  - Format1 = D3DFMT_R5G6B5, Ops1 = D3DFORMAT_OP_TEXTURE
  - Format2 = D3DFMT_R5G6B5, Ops2 = D3DFORMAT_OP_OFFSCREEN_RENDERTARGET

What Can You Do Next?
- If you develop DirectX graphics drivers:
  - You need a relationship with Microsoft’s DirectX team, and should contact the IHV Program Manager: Michele Boland (MBoland@microsoft.com)
- Install and run against DEBUG runtimes
  - Available in the DirectX SDK
  - Will output debug messages for common errors

DirectX VA Video Acceleration Drivers
Gary Sullivan (GarySull@microsoft.com), Software Design Engineer, DMD Video Services Group, Microsoft Corporation

Agenda
- DirectX VA design and status
- Current and future requirements and tests
- Future plans and potential extensions
- What can you do next?

DirectX VA Prime Directive
- Decouple software decoder operation from hardware accelerator design to achieve full interoperability
[Diagram labels: MPEG-1, MPEG-2, MPEG-4, H.261, H.263++, Any other; DirectX VA; Motion Comp, Inverse DCT, VLD]

What Is DXVA? What Can It Achieve?
- Interoperable interface between video decoding software and advanced-capability graphics accelerators
- Increases video capability for the consumer’s PC
- Increases the demand for advanced graphics accelerators and video applications
- Decreases implementation effort for software decoder writers
- Decreases support burden for graphics accelerator companies
- Decreases testing burden for OEMs

DirectX VA General Status
- Spec went 1.0 with DirectX 8.0 Beta 2 (October ’00)
  - See http://www.microsoft.com/hwdev/DirectX_VA
- OEMs love it: it enables separate WHQL qualification of decoders and drivers
- Software decoder companies are developing with it (Mediamatics, Intervideo, Ravisent, Cyberlink, MGI/Zoran, MbyN, …)
- Hardware accelerator companies are supporting it in drivers (ATI, Nvidia, Intel, SiS, S3, SiliconMotion, …)

DirectX VA Capabilities
- Emphasis on MPEG-2 and DVD “sub-picture”
- Support of all important video coding standards (H.261, H.263, MPEG-1, MPEG-2, MPEG-4)
  - And some non-standard variations on the standards
- Alpha graphic blending (e.g., DVD subpicture)
- Three basic degrees of decoding configuration capability:
  - Motion compensation on accelerator with host residual difference decoding
  - Motion compensation and IDCT on accelerator
  - Full raw bitstream decoding
- Externally-defined encryption support

How Does DXVA Operate?
- Operation with Windows 2000 Overlay Mixer (OVM) or new Windows XP Video Mixing Renderer (VMR)
- Requires DirectX 8.0 or Windows XP
- Decoders use it through the existing Windows 2000 “IAMVideoAccelerator” API
- Drivers use it through the corresponding Windows 2000 “MoComp” DDI
- DirectX VA specifies payload content of data buffers that previously had accelerator-specific formats

Host Versus Accelerator Functional Split
- Bitstream processing either on host or accelerator
- Accelerator handles the primary data flow and performs the intensive signal processing
- PCI/AGP is the bridge between the two
- Reconstruction loop maintained in graphics accelerator memory
- Host processing converts standard-specific streams into generic accelerator work units

Today’s DirectX VA
[Block diagram labels: Compressed Video Source; Variable-Length Decoding; Residual Difference Decoding (IDCT); Motion Compensation; Sum & Clip; Frame Storage; Graphic Source; Graphic Decoder; Graphic Blending; OVM/VMR/3D. Note on diagram: Content Protection Supported Outside of Scope]

Constrained Parameter Profiles
- Strategy is to define a general interface and a number of constrained-parameter profiles, with decoder data structure configuration settings
- Profiles defined:
  - MPEG-2 Main Profile with and without DVD subpicture
  - Several H.263/MPEG-4 profiles
  - MPEG-1
  - H.261 with and without deblocking post-processing

Defined Buffer Types
- Picture-level decoding parameter buffers
- Buffers for macroblock-level decoding:
  - Macroblock control buffers
  - Residual difference data buffers
- Buffers for bitstream decoding:
  - Bitstream data buffers
  - Bitstream slice control buffers
  - Inverse quantization matrix buffers
- Buffers for graphic blending:
  - Alpha+YUV graphic buffers
  - AI44 graphic buffers
  - DVD DPXD graphic buffers
  - DVD highlight definition buffers
  - DVD display control command buffers
  - Alpha blend combination buffers
- Deblocking filter control buffers
- Picture resampling buffers
- Read-back data buffers

DXVA Requirement Plans: Primary Goals
- Clear specification for MPEG-2 interoperability (and front-end DVD subpicture) is the primary goal
- Driver and decoder that claim video acceleration must support DXVA
- Specific “minimal interoperability set” for each defined profile

July ’01 Stated Requirements
- MPEG2_A and MPEG2_C required
- MPEG1_A required
- H263_A required (?!)
- Arithmetic accuracy required
  - IDCT accuracy required
- Picture resolutions up to 720x576
- Uncompressed surface types must include NV12 in supported list
- Must have “front end” capability to convert to YUY2 from format in use

July ’01 Actual Tests
- StRowe test decoder developed
- Test driver also developed
- Released DCT400 driver tests cover MPEG2_A, _B, _C, _D profiles
- Pass/Fail based on MPEG2_A and _B
- Tests are currently of functional operation and visual performance
- Contact us (?!) if any test problems
- Don’t ship untested features (?!)

Structure Of Motion Comp Data
- All standards send only luma motion vectors, deriving chroma vectors from luma vectors
- Each standard derives chroma vectors in its own way
- Switches for configuring the motion comp are provided to minimize host “translation” requirements
- MPEG-2 Dual-Prime motion vectors derived on host

DXVA Macroblock Control Example

    /* Basic form for P and B pictures */
    typedef struct _DXVA_MBctrl_P_OffHostIDCT_1 {
        WORD         wMBaddress;
        WORD         wMBtype;
        DWORD        dwMB_SNL;
        WORD         wPatternCode;
        UINT8        NumCoef[6];
        DXVA_MVvalue MVector[4];
    } DXVA_MBctrl_P_OffHostIDCT_1;

Structure Of Residual Data: Background (1 of 2)
- Things that vary within and across standards:
  - Coefficient scan schemes
  - Intra coefficient prediction schemes
  - VLC schemes
  - Inverse quantization schemes
  - Mismatch-control schemes
  - …
- These things need lots of logic, which is not always justified for accelerator implementation

Structure Of Residual Data: Background (2 of 2)
- Things that do not vary within and across standards:
  - IDCT definition
    - Conformance rules may slightly differ, but multi-standard conformance is not a big problem
  - Many zero-valued coefficients
  - Predicted-versus-Intra operation
  - Only a few currently-specified inverse scans

Structure Of Residual Data: The Chosen Method
- Keep standard-specific issues on the host to the extent possible
- Support host-based or accelerator-based IDCT
- Send only non-zero coefficients
- Send index or run-length for coefficients

Residual Difference Example (Off-Host IDCT 16b TCOEFF)

    typedef struct _DXVA_TCoefSingle {
        WORD  wIndexWithEOB;
        SHORT TCoefValue;
    } DXVA_TCoefSingle, *LPDXVA_TCoefSingle;

    /* Macros for Reading EOB and Index Values */
    #define readDXVA_TCoefSingleIDX(ptr) ((ptr)->wIndexWithEOB >> 1)
    #define readDXVA_TCoefSingleEOB(ptr) ((ptr)->wIndexWithEOB & 1)

    /* Macros for Writing EOB and Index Values */
    #define writeDXVA_TCoefSingleIndexWithEOB(ptr, idx, eob) ((ptr)->wIndexWithEOB = ((idx) << 1) | (eob))
    #define setDXVA_TCoefSingleIDX(ptr, idx) ((ptr)->wIndexWithEOB |= ((idx) << 1))
    #define setDXVA_TCoefSingleEOB(ptr) ((ptr)->wIndexWithEOB |= 1)

Decoding Configurations (Part 1 of 2)
- Bitstream decoding vs. host VLD
- Encryption:
  - Bitstream data if bitstream decoding
  - Macroblock control commands and/or residual difference data if host VLD
  - Type of encryption protocol supported
- For host VLD:
  - Host-based residual difference decoding versus accelerator-based residual difference decoding versus both
  - Macroblock control commands in raster-scan order versus arbitrary order

Decoding Configurations (Part 2 of 2)
- For host-based residual difference decoding:
  - 8b vs. 16b differences
  - If 8b differences, overflow supported, or not
  - If 8b differences, subtract second pass, or not
  - Interleaved chroma or not
  - Host clips range of data, or not
  - Intra residuals unsigned, or not
- For accelerator-based difference decoding:
  - Specific IDCT support
  - Inverse scan on host or accelerator
  - Coefficients sent in groups of four, or singly

Alpha Blending Configurations
- Alpha blend graphic loading:
  - AI44 or IA44 + palette, or DPXD + Highlight, or AYUV
- Alpha blend combination operation:
  - Front-end versus back-end
  - Picture resizing or not
  - Only use picture destination area or not
  - Graphic resizing or not
  - Whole plane alpha or not

Longer Term Requirements
- Include H263_A, _B, _C in tested requirements
- Include mathematical motion comp and IDCT accuracy in tests
- Add speed performance testing
- Picture resolutions up to 1920x1088
- Six or more uncompressed surfaces
- Specific FOURCC surface types for uncompressed surfaces

Kill Superfluous Configs
- bConfigRasterOrder = 0
- bConfigResidDiffHost = 1
- (bConfigResid8Subtraction = 1 with bConfigSpatialResid8 = 1) or (bConfigResidDiffHost = 1 with (bConfigSpatialResid8 = 0 and bConfigSpatialHost8or9Clipping = 0))
- bConfigIntraResidUnsigned = 0
- bConfigSpatialResidInterleaved = 0
- bConfigHostInverseScan = 0
- bConfig4GroupedCoefs = 0

Enhance Blending Configs
- Eliminate duplication of AI44 & IA44 (bConfigDataType = 0 & 1)
- Require both AYUV and AI44/IA44 (bConfigDataType = 3 and 0/1)
- Require front-end blend (bConfigBlendType = 0)
- bConfigPictureResizing = 1
- bConfigOnlyUsePicDestRectArea = 0
- bConfigGraphicResizing = 1
- bConfigWholePlaneAlpha = 1

Hot Issue: WMV/H.263/MPEG-4
- Codecs beyond MPEG-2 need support
- H263_A profile needs:
  - Different derivation of chroma motion
  - Rounding control
  - Motion vectors over picture boundaries
- H263_B profile needs:
  - 8x8 motion vectors
  - Alternative inverse scan (or host inverse scan)
- H263_C profile needs:
  - Deblocking filter support (also in H263_B?!)
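
Returning to the DXVA_TCoefSingle format from the Residual Difference Example slide: the sketch below shows one way a host might pack an 8x8 block under the "send only non-zero coefficients, with an index" approach. The struct and the read/write macros follow the slide. The WORD and SHORT typedefs are stand-ins for the Windows types, the (WORD) cast in the write macro is added here to keep the assignment explicit, and sketch_pack_block is a hypothetical helper that walks the array in storage order. Which scan order the index actually refers to, and how an all-zero block is signalled, are left open; this sketch simply emits nothing for an all-zero block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t WORD;   /* stand-in for the Windows WORD type */
typedef int16_t  SHORT;  /* stand-in for the Windows SHORT type */

typedef struct _DXVA_TCoefSingle {
    WORD  wIndexWithEOB;  /* index in bits 15..1, end-of-block flag in bit 0 */
    SHORT TCoefValue;
} DXVA_TCoefSingle;

#define readDXVA_TCoefSingleIDX(ptr) ((ptr)->wIndexWithEOB >> 1)
#define readDXVA_TCoefSingleEOB(ptr) ((ptr)->wIndexWithEOB & 1)
#define writeDXVA_TCoefSingleIndexWithEOB(ptr, idx, eob) \
    ((ptr)->wIndexWithEOB = (WORD)(((idx) << 1) | (eob)))
#define setDXVA_TCoefSingleEOB(ptr) ((ptr)->wIndexWithEOB |= 1)

/* Hypothetical helper: copies the non-zero entries of a 64-coefficient
   block into out[], tags the last emitted entry with EOB, and returns
   the number of entries written (0 for an all-zero block). */
static size_t sketch_pack_block(const SHORT coeffs[64],
                                DXVA_TCoefSingle out[64])
{
    size_t count = 0;

    for (unsigned idx = 0; idx < 64; ++idx) {
        if (coeffs[idx] != 0) {
            writeDXVA_TCoefSingleIndexWithEOB(&out[count], idx, 0);
            out[count].TCoefValue = coeffs[idx];
            ++count;
        }
    }
    if (count > 0) {
        setDXVA_TCoefSingleEOB(&out[count - 1]);
    }
    return count;
}
```

A block with non-zero values only at positions 0 and 5 produces two entries, the second carrying the EOB bit, which is where the saving over sending all 64 coefficients comes from.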

Desirable Future Extensions
- De-interlacing
- Interoperable encryption / DRM
- Compressed-video encoding (including ME, DCT, and so on)
- Inverse-telecine
- Hue/contrast/brightness/gamma/color corrections
- Future decoding methods (MPEG-4v2, WMV, H.26L)
- Frame rate conversion
- Precise separable re-sampling
- Gen lock/frame rate synchronization
- TV out control

New GUIDs: Reducing Memory Use
- Add three new GUIDs to parallel MPEG2_A, MPEG2_B, and MPEG2_D
- New GUID adds raw bitstream decoding to the “minimal interoperability set” of the corresponding existing GUID
- Driver with raw bitstream support then need not allocate buffers for macroblock-level processing with these GUIDs
- Drivers could also not expose bitstream processing with existing GUIDs to save memory

Interoperable Encryption
- Define an interoperable encryption scheme
- Much like the old draft DXVA scheme
  - Certificates for establishing trust (perhaps X.509 or something else rather than the old draft scheme)
  - RSA key exchange
  - AES (RIJNDAEL) content encryption

Other In-Scope Additions
- Add new features for other codecs (WMV, H.26L, MPEG-4v2, etc.)
  - 1/4-sample motion comp
  - Added motion comp sizes and shapes
  - New inverse transforms (e.g., 4x4)
  - Fine granularity scalability
  - Global motion comp
  - Studio profile features
- More possible GUIDs for precise codec/configuration needs

New Video Building Blocks
- Deinterlacing
- Inverse telecine
- Frame rate conversion
- Contrast/Brightness/Gamma/Color
- Precisely-specified resampling
- Video compression encoding

Deinterlace/Inverse Telecine
- Deinterlace is crucial
  - Becoming a standard feature of high-end consumer TVs
  - 1080i in weave can look awful
  - 1080i in bob can look wrong too
- Deinterlace can be useful for either decoding or encoding

Hypothetical DXVA Structure
[Block diagram labels: Today’s Scope of DirectX VA; De-interlace / Inverse Telecine; Frame Rate Conversion ?; Color Conversions & Adjustments ??; Scaling ???; OVM/VMR/3D; Interoperable DRM/Conditional Access/Content Protection/Encryption]

Video Encoding ?
[Block diagram labels: Uncompressed Video Source; Inverse Telecine / De-interlace; Color Conversions And Adjustments; Motion Estimation; Mode & Motion Vector Decision; Motion Compensation; Frame Storage; Sum and Clip; Residual Difference Transform (DCT); Quantization; Residual Difference Decoding (IDCT); Variable Length Encoding ?]

What Can You Do Next? (To All)
- Give us your proposals:
  - About any difficulties/problems in design
  - About encryption design
  - About new in-scope feature needs
  - About how to support new features:
    - Deinterlace/inverse telecine
    - Encoding
    - Frame rate conversion
    - Contrast/Brightness/Gamma/Color
    - Resampling

What Can You Do Next? (For Graphic Accelerator Designers)
- Make your MPEG-2 and DVD subpicture DXVA solution rock-solid, fully tested with every available decoder, and frighteningly fast
- Fully support YUV surfaces as textures for input to 3-D
  - Conversion to RGB, and so on
- Design maximal WMV/H.263/MPEG-4 feature support into your next generation
  - But don’t expose them unless fully tested
- Move to the preferred configurations and uncompressed surface types
- Support new memory-conserving GUIDs

Writing AVStream Minidrivers For Windows XP
William Messmer, SDE, Digital Audio-Video, Microsoft Corporation

Agenda
- AVStream minidriver architecture
- Data processing
- Writing a minidriver: key issues and pitfalls
- When and why to use AVStream
- Exposing minidriver functionality
- Walk through sample code
- Common problems and mistakes
- DirectX 8.0 versus Windows XP
- What can you do next?

Why AVStream
- THE next generation class driver
- More efficient streaming
- Reduces the amount of minidriver code
- Simplifies development; faster to market
- One minidriver, one model: no more confusion over stream class versus port class
- New features, new technologies will only be supported in AVStream; stream and port class, however, are still supported!

When To Use AVStream
- BDA drivers
- New device types
  - Combined A/V devices
- Kernel software transforms
  - Which are not already written to stream class or port class
- Audio Global Effects (GFX) filters
- No necessity to port existing stream or port class drivers

Minidriver Architecture
- Functionality is exposed as a tree hierarchy described through static descriptors
  - Device: described by Device Descriptor
  - Filter Factory: creates a type of Filter
  - Filter: described by Filter Descriptor
  - Pin Factory: creates a type of Pin
  - Pin: described by Pin Descriptor
- Functionality provided through static dispatch and automation tables

Minidriver Architecture
[Diagram labels: Device, Device Descriptor, Device Dispatch, Add Device; >= 1 Filter Factory, Filter Descriptor, Create; >= 1 Filter, Filter Dispatch, Filter Automation; >= 1 Pin Factory, Pin Descriptor, Create; >= 1 Pin, Pin Dispatch, Pin Automation. Key: Minidriver Provided Table; Public AVStream Construct; Minidriver Dispatch Routine; Private AVStream Construct]

Exposing Minidrivers
- Expose your driver to AVStream
  - Call KsInitializeDriver in DriverEntry, passing your Device Descriptor
  - Return the status from KsInitializeDriver
- AVStream handles PnP to get your driver set up; the minidriver gets calls through device dispatch
- Filter Factories are set up by AVStream during Add Device and Start Device

Exposing Minidrivers
- AVStream creates filters/pins based on descriptors
- Minidriver receives creation dispatch
  - Creation dispatch associates minidriver-specific context with the object
- Object bags are available as containers for dynamic memory like contexts
  - AVStream handles cleanup of objects based on bags
  - No forgetting to free dynamic memory

Minidriver Architecture Sample Code (Exposing Functionality)

Data Processing
- AVStream queues data/buffers
  - Minidriver queues not necessary
  - Cancellation handled in the queue
- Data exposed through two abstractions: stream pointers and process pins
  - Stream pointers are robust and allow versatile queue management; typically used in hardware drivers
  - Process pins work purely at a single-buffer level, making for very simple software transforms

Design Issues
- Two distinct ways to handle data processing:
  - Filter-centric processing: specify filter process dispatch
  - Pin-centric processing: specify pin process dispatches
- The choice of which to use will influence design greatly

Filter-Centric Processing
- Filter is called to process data in a context where data is available on all required pins
- Typically used for software transforms
- Stream pointer use not required
- Processing based on an index of process pins
  - Index/pins stable during processing
- Minidriver does the transform, specifies how many bytes of each buffer were used

Process Pins
- One per pin; points back to the pin
- Contains a stream pointer if needed
- Contains a buffer virtual address and size for data manipulation
- Informs the process routine of the pin’s relationships with other pins:
  - InPlaceCounterpart: other pin in an in-place transform pair
  - CopySource: pin data is copied from
  - DelegateBranch: pin that delegates frames (in the same pipe)

Transform Example
1. Frame(s) arrive
2. Filter is called to process. Filter sees two process pins (IN and OUT)
3. Process pins point to buffers
4. Filter performs transform; sets 1920 bytes used on input and output
5. Filter is called back; more data to transform
6. Process repeats similarly
[Diagram: INPUT and OUTPUT queues of frames; labels include 1920 Bytes Data, 1100 Bytes Data, 2880 Bytes Buffer, 960 Bytes Buffer, (140), Gone]

Pin-Centric Processing
- Each pin is called to process data in a context independent of other pins
- Typically used for hardware drivers
- Data accessed through the stream pointer abstraction

Stream Pointers
- Reference a single frame in a queue
- Hold that frame in the queue
- Can be in multiple states:
  - Locked: referenced data is safe to access; Irp cannot be cancelled
  - Unlocked: not guaranteed to even reference data; Irp can be cancelled
- Can be cloned to create new pointers into the data stream
- Can schedule time-outs

Stream Pointers
- Contain two offsets into the data stream for ease of in-place use
- Address data at one of two granularities:
  - Byte: access via virtual address
  - Mapping: access via logical DMA address (KSPIN_FLAG_GENERATE_MAPPINGS)
- Minidriver-usable context available per stream pointer

Stream Pointers And Queues
[Diagram: a queue of frames running from Oldest Frames to Newest Frames, each frame shown with a count in parentheses; labels: Leading Edge, Trailing Edge, Clone, Clones]

Direct DMA Example
1. Frame(s) arrive; minidriver called to process
2. Processing routine acquires leading edge (KsPinGetLeadingEdgeStreamPointer)
3. Leading edge is cloned (KsStreamPointerClone)
4. DMA hardware is programmed
5. Leading edge is advanced
6. Process may repeat for more frames
7. Hardware interrupts for DMA completion
8. ISR schedules a DPC
9. DPC releases the associated frames (KsStreamPointerDelete)
10. May need to continue processing (KsPinAttemptProcessing)
[Diagram: QUEUE of frames shown with counts (2) and (1); released frames marked Gone]

Data Frame Control
- Held non-cancelable for a period:
  - Use locked stream pointers
  - Consider stream pointer timeouts
- Can relinquish claim with callback:
  - Use unlocked stream pointers with a cancel callback
- Periodic access where frame can disappear between accesses:
  - Use unlocked stream pointers and lock periodically

Processing Decisions
- Filter-centric:
  - All pins are involved in the decision
  - Each pin type can have separate requirements
  - One pin not fulfilling requirements will veto processing for the entire filter
- Pin-centric:
  - Only one pin is involved in the decision
  - Each pin type can have separate requirements which do not influence other pins

When Processing Happens
- Default case (no pin flags):
  - Attempt made when frame arrives and leading edge points to no frame
  - Attempt will succeed if:
    - Involved pin(s) are >= KSSTATE_PAUSE
    - Involved pin(s) all have data
- Continuing processing:
  - STATUS_SUCCESS returned from dispatch and conditions still met

Adjusting Processing
- KSPIN_FLAG_...
  - _INITIATE_PROCESSING_ON_EVERY…: every frame arrival initiates
  - _DO_NOT_INITIATE_PROCESSING: no frame arrival initiates
  - _PROCESS_IN_RUN_STATE_ONLY: pin must be in KSSTATE_RUN
  - _FRAMES_NOT_REQUIRED…: data is not required on this pin

Adjusting Processing
- Some mentioned flags are useful for pin-centric processing
- Most flags are useful for filter-centric processing, where all pins are involved in the decision as to when to process data
- See the DDK for a complete description of flags
- Understand when processing happens based on your flags!

Adjusting Processing
- Processing can happen in a DPC!
  - KSFILTER_FLAG_DISPATCH_LEVEL_PROCESSING
  - KSPIN_FLAG_DISPATCH_LEVEL_PROCESSING
- Dispatch-level processing is still synchronized
  - Processing mutex still held during dispatch-level processing
  - Can still be used to synchronize with processing
- Data manipulation (stream pointer) API fully dispatch-level ready!

Walkthrough Sample Code
- Pin-centric sample code

Common Problems
- Internal mutexes are exposed
- Three mutex types in a hierarchy:
  - Device Mutex
  - Filter Control Mutex
  - Processing Mutex
- Some calls require mutexes held
  - Sometimes AVStream holds the mutex for you; sometimes you must hold the mutex!
  - See the DDK for this!

Common Problems: Mutex Rules
- Do NOT take mutexes out of order: device, then control, then processing
- Do NOT take a mutex and call out: not for properties, not for anything!
- Walking the object hierarchy requires mutexes held:
  - Device Mutex: device down to filter
  - Filter Control Mutex: filter down to pins

Common Problems
- Do not traverse the object tree (filters and pins) during processing!
  - KsFilterGetFirstChildPin
  - KsPinGetNextSiblingPin
- Pin-centric filters should not need to do this; filter-centric filters have the process pins index

DirectX 8.0 Versus Windows XP
- Mutexes in DirectX 8.0 are fast mutexes
  - Certain APIs require mutexes held
  - Client must be careful of when to acquire mutexes!
- Mutexes in Windows XP are full mutexes
  - Completely backwards compatible with DirectX 8.0 drivers
  - Fewer APIs require mutex acquisition
  - Mutex acquisition is more lenient

DirectX 8.0 Versus Windows XP
- New flags in Windows XP:
  - _SOME_FRAMES_REQUIRED…
    - One or more pin instances of this type requires frames
    - Can be done programmatically in DirectX 8.0
  - _PROCESS_IF_ANY_IN_RUN_STATE
    - One or more pin instances of this type must be >= KSSTATE_RUN; others must be >= KSSTATE_PAUSE
    - Processing routine must check in DirectX 8.0

What Can You Do Next?
- Install the DirectX 8.0 or Windows XP DDK
- Try out the samples in the DDK
- Write AVStream minidrivers for new hardware!
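
The mutex ordering rule from the Common Problems slides (device, then filter control, then processing) can be illustrated with a small rank check. This is a debugging-style sketch only: it uses no AVStream calls, every name in it is hypothetical, and it assumes the state is tracked per thread. It models the rule by refusing any acquisition whose rank is not strictly higher than the highest rank already held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranks mirroring the hierarchy on the slide; these are
   illustration-only names, not AVStream API. */
typedef enum {
    SKETCH_RANK_NONE           = 0,
    SKETCH_RANK_DEVICE         = 1,
    SKETCH_RANK_FILTER_CONTROL = 2,
    SKETCH_RANK_PROCESSING     = 3
} SketchRank;

/* Which of the three mutexes the current thread holds (index = rank). */
typedef struct {
    bool held[4];
} SketchLockState;

/* Returns the highest rank currently held, or SKETCH_RANK_NONE. */
static SketchRank sketch_highest_held(const SketchLockState *state)
{
    for (int rank = SKETCH_RANK_PROCESSING; rank >= SKETCH_RANK_DEVICE; --rank) {
        if (state->held[rank]) {
            return (SketchRank)rank;
        }
    }
    return SKETCH_RANK_NONE;
}

/* Records an acquisition; returns false if it would be out of order. */
static bool sketch_acquire(SketchLockState *state, SketchRank rank)
{
    if (rank <= sketch_highest_held(state)) {
        return false;  /* out of order (or SKETCH_RANK_NONE): refuse */
    }
    state->held[rank] = true;
    return true;
}

/* Records a release; returns false if the mutex was not held. */
static bool sketch_release(SketchLockState *state, SketchRank rank)
{
    if (rank == SKETCH_RANK_NONE || !state->held[rank]) {
        return false;
    }
    state->held[rank] = false;
    return true;
}
```

Skipping a level (for example taking only the processing rank) is allowed by this check; only taking a lower rank while a higher one is held is rejected, which is the out-of-order case the slide warns about.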

Testing Your WDM Driver With DirectShow
Eric Rudolph, System Design Engineer, DirectShow Editing Services, Microsoft Corporation

Agenda
- DirectShow supports capture from 1394, USB, analog video/audio, TV tuner, and custom devices
- Demonstrate the use of the DirectShow-based generic graph editor, GraphEdt, as a WDM driver test tool
- Walk through sample code that uses the GraphBuilder COM object

What Tools Exist To Test Your Driver?
- Included in DX8: GraphEdt.exe, a generic graph editor
- Also in DX8: AmCap.exe, a simple capture application
- New for Windows XP: Still Image devices show up in the shell (Explorer)
- New for Windows XP: Movie Maker (on Start Menu)

GraphEdt Overview
- Ships with DX8
- Provides UI to build dataflow graphs and then uses DirectShow to run, pause, and stop the data
- Views different filter categories
  - Capture, compressor, crossbar, DMO, and so on
- Connects different filters together
- Accesses property pages
- Writes out files
- Controls 1394 devices

GraphEdt Filter Categories
- Categories enable you to easily find a particular type of DirectShow filter
- Many categories are predefined in ksuuids.h & uuids.h
- WDM drivers have many of their own categories
- Capture devices can show up in both non-WDM and WDM categories
- As you add/remove WDM devices, if they send device notifications, they will automatically show/hide in the category lists

GraphEdt Property Pages
- The filter itself can expose multiple property pages
- Each pin can expose one or more property pages
- When you query an output pin’s property pages, you will see one extra page per pin, which lists available output media connection types
- Capture property pages are often exposed by capture applications (using standard DirectShow methods), so make them look nice!

[Screenshot: example property page]

GraphEdt Property Pages And Media Types
- Output pins provide one or more media types
- Input pins normally do not provide a list of types, but instead accept types
- When you render a pin, DirectShow will try to find appropriate filters to render it
- When you try to connect two pins, DirectShow will try to find intermediate filters
- The media types must agree between any output pin and its connected input pin
- Buffers are also negotiated

[Screenshot: the different media types the Indeo 5.11 decompressor provides]

Common Problems
- Hot unplug while streaming
- Device add/remove while streaming
- Enter hibernation while streaming
- Multiple camera enumeration
- Multiple camera streaming (one driver, multiple devices)
- Video shows up black or wrong
- Changing display props while streaming
- Overlay and DDraw issues

GraphEdt Demos Part 1
- Capture from USB, both with 1 pin and with 2 pins (capture & preview)
- DV capture and device control
- Device insertion/removal and how the graph refreshes

GraphEdt Demos Part 2
- How to write AVI, WAV, and WM files
- New Video Mixing Renderer has a slightly different connection model than the old Video Renderer
- How to force a filter to produce a media type with a Type Enforcer
- Timestamps are important!
- Using .GRF files

Sample Code Using the GraphBuilder COM Object
- CaptureGraphBuilder makes connecting capture devices easy
- See the AmCap sample code in the DX8/DirectShow SDK directory
- Sample code walkthrough

What Can You Do Next?
- Test your WDM drivers!
- Under many different conditions!
- Read up on the DX8 docs, they’re great!
- DirectShow contact: stanpenn@microsoft.com
- Get on the DirectX A/V list