Windows audio architecture
  Diagram, top to bottom:
    Applications: WinMM application, DirectSound application
    User mode: WinMM.DLL, DSound.DLL
    Kernel mode: SysAudio.SYS, Kmixer.SYS
    Device driver container: ISA card driver, PCI card driver, USB device driver, IEEE 1394 device driver
    Labels: Windows components; by hardware vendor
  Windows Driver Model
    – supported by Win 98, Win ME, Win 2K and Win XP
    – a single audio driver works for multiple Windows versions
  User mode APIs
    – DirectSound
    – WinMM
  Kernel streaming
    – multiple audio streams can be played at the same time
    – SysAudio.SYS decides the optimal audio format and sample rate conversion
    – Kmixer.SYS performs the actual conversion

WinMM API
  Simple, but
    – high latency
    – inability to take advantage of hardware acceleration
    – no easy way to implement features, e.g. 3-D positioning, effect processing
  Play audio
    – waveOutOpen(…) - open the output audio device
    – waveOutWrite(…) - write the waveform audio data
    – waveOutClose(…) - close the output audio device
    – need to use callback or polling to check the result
  Not very interesting to real-time applications

DirectSound API - overview
  Audio component of DirectX package
    – low latency
    – use hardware acceleration
    – direct access to sound device
    – support capturing sound
  Two programming interfaces
    – COM (Component Object Model) in C++
    – .NET in C++, C#, Visual Basic, etc.
  Important objects
    – secondary buffers: write/read audio data
    – buffer cursors: point to current captured/played audio data
    – buffer notifications: send events when buffer cursors reach a position

DirectSound API - COM interfaces
  IDirectSound8
    – CreateSoundBuffer(descriptor, bufferPointer, …) - create a sound buffer object to manage audio samples
      fields of descriptor:
        – buffer size
        – audio format: commonly 16-bit linear PCM
        – buffer features
    – SetCooperativeLevel(windowHandle, level) - set the priority of the sound buffer

DirectSound API - COM interfaces
  IDirectSoundBuffer8
    – Lock(offset, size, addr1, size1, addr2, size2, flag) - ready all or part of the buffer for a data write and return pointers to which data can be written
    – Unlock(addr1, size1, addr2, size2) - release a locked sound buffer
    – Play(reserved, priority, flags) - cause the sound buffer to play, starting from the play cursor
    – Stop() - cause the sound buffer to stop playing

DirectSound API - COM interfaces
  IDirectSoundNotify8
    – SetNotificationPositions(NumberOfNotifyStructure, ArrayofNotifyStructure) - set the notification positions; during playback, whenever the play cursor reaches one of the specified offsets, the associated event is signaled
      fields of NotifyStructure:
        – buffer offset
        – notify event
  Sound capturing is similar

DirectSound API - code example
  1. Streaming audio in an event-driven thread

      while (true) {
          // wait for the notification that the buffer needs refilling
          DWORD r = WaitForSingleObject(event, INFINITE);
          if (r == WAIT_OBJECT_0) {
              Buffer.Lock(offset, size, &addr1, &size1, &addr2, &size2, 0);
              // copy audio to the buffer addresses returned by DirectSound;
              // there can be two addresses because of buffer wrap-around
              memcpy(addr1, audio, size1);
              if (size2 != 0) {
                  // second region: the rest of the audio block
                  memcpy(addr2, left, size2);
              }
              Buffer.Unlock(addr1, size1, addr2, size2);
          }
      } // while

Windows audio architecture revisited
  Applications: WinMM application, DirectSound application
  Can we achieve lower latency?
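The wrap-around case in the streaming example above (Lock returning two addresses) can be sketched without DirectSound. The block below is only an illustration under stated assumptions: RingBuffer, Region, lock and writeBlock are hypothetical names that mimic the shape of Lock(offset, size, addr1, size1, addr2, size2); they are not part of the DirectSound API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a circular sound buffer (not the DirectSound API).
struct Region {
    unsigned char* addr;
    std::size_t size;
};

struct RingBuffer {
    std::vector<unsigned char> data;

    explicit RingBuffer(std::size_t capacity) : data(capacity) {}

    // Same shape as Lock(offset, size, addr1, size1, addr2, size2):
    // r1 runs from offset towards the end of the buffer; r2 is the
    // wrapped-around remainder at the start, or {nullptr, 0} if none.
    void lock(std::size_t offset, std::size_t size, Region& r1, Region& r2) {
        assert(offset < data.size() && size <= data.size());
        const std::size_t first = std::min(size, data.size() - offset);
        r1 = Region{data.data() + offset, first};
        r2 = Region{size > first ? data.data() : nullptr, size - first};
    }
};

// Copy one block of audio into the ring, splitting it at the wrap point.
void writeBlock(RingBuffer& rb, std::size_t offset,
                const unsigned char* audio, std::size_t size) {
    Region r1{}, r2{};
    rb.lock(offset, size, r1, r2);
    std::memcpy(r1.addr, audio, r1.size);
    if (r2.size != 0) {
        // the part that did not fit before the end of the buffer
        std::memcpy(r2.addr, audio + r1.size, r2.size);
    }
}
```

With an 8-byte buffer, writing 6 bytes at offset 5 places the first 3 bytes at positions 5 to 7 and the remaining 3 bytes at positions 0 to 2, which is the two-address case the example handles with its second memcpy.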
  Layers below the applications:
    User mode: WinMM.DLL, DSound.DLL
    Kernel mode: SysAudio.SYS, Kmixer.SYS
    Device driver container: ISA card driver, PCI card driver, USB device driver, IEEE 1394 device driver
    Labels: Windows components; by hardware vendor
  kernel mixing introduces at least 30 ms of delay
  kernel mixing is not necessary if I’m the only application generating audio streams
  How about interacting with device drivers directly?

DirectKS - the unofficial audio API
  Diagram, top to bottom:
    Applications: WinMM application, DirectSound application, DirectKS application
    User mode: WinMM.DLL, DSound.DLL, DirectKS
    Kernel mode: SysAudio.SYS, Kmixer.SYS
    Device driver container: ISA card driver, PCI card driver, USB device driver, IEEE 1394 device driver
    Labels: Windows components; by hardware vendor
  Pros
    – very low latency
  Cons
    – only one application can play sound at one time
    – applications need to handle audio format and sample rate conversion
    – might not work in future versions of Windows

The next-generation Windows audio
  None of the current audio interfaces satisfies real-time applications
    – transition between user mode and kernel mode for each I/O request
    – blocking upon completion of an I/O request
    – CPU cycles for copying data
  WaveRT (wave real-time) drivers in the next version of Windows - “Longhorn”
    – data flow directly between the client and the audio hardware

Learn more – URLs
  overview
    – http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnwmt/html/audiooverview.asp
  Windows Driver Model (WDM)
    – http://www.microsoft.com/whdc/hwdev/tech/audio/wdmaudio.mspx#wdm1
  DirectKS
    – http://www.microsoft.com/whdc/hwdev/tech/audio/DirectKS.mspx
  WaveRT
    – http://www.microsoft.com/whdc/hwdev/tech/audio/WaveRTport.mspx

Audio library overview
  Transmit audio over the internet
  use low latency audio APIs
    – DirectSound or DirectKS
  pluggable codecs
    – G.711, GSM, Speex, iLBC
  modular playout buffer
  integrated with rtplib++
  System requirements
    – Windows XP or Windows 2K
    – DirectSound 9.x runtime libraries
    – Visual C++ runtime libraries

Audio library architecture
  SIP user agent (one endpoint)
  SIP user agent (other endpoint)
  Each of the two endpoints also contains: audio tool GUI, DirectSound/DirectKS, playout buffer, decoder, encoder, Rtplib++, socket
  Network (connects the two endpoints)

Audio library API – Initialization
  setUserName(name) - set the local user name
  setRemoteAddress(host/IP, port) - send audio to this address
  setLocalAddress(host/IP, port) - receive audio from this address
  setPlayerAudioFormat(audioFormat) - play audio in this format
  setCapturerAudioFormat(audioFormat) - capture audio in this format

Audio library API – Initialization (Cont.)
  setEncoder(encoder) - use this encoder to encode audio
    – encoder can be created by
        encoder = SpeexEncoder - create a Speex encoder instance
        encoder.setPayloadType(payLoadType) - set RTP payload type
        encoder.setOutputAudioFormat(audioFormat) - set the encoded format
  setDecoder(decoder)
    – … (similar to encoder)
  Start
    – startReceiver()/startSender() - start to receive/send audio

Audio library delay
  One-way mouth-to-ear delay measurement of audio library using DirectSound and DirectKS
  DirectKS shows close to 30 ms improvement over DirectSound

                   Min      Max      Avg.
    DirectSound    68 ms    195 ms   121 ms
    DirectKS       42 ms    162 ms   111 ms
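The "close to 30 ms" claim can be checked against the measured values with a little arithmetic. The sketch below assumes the flattened table is read row by row (DirectSound: 68/195/121 ms; DirectKS: 42/162/111 ms for min/max/avg); DelayStats and saving are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// One-way mouth-to-ear delay statistics, in milliseconds.
struct DelayStats {
    int minMs;
    int maxMs;
    int avgMs;
};

// Assumption: values taken from the slide's table, read row by row.
const DelayStats kDirectSound{68, 195, 121};
const DelayStats kDirectKS{42, 162, 111};

// Delay saved by `fast` relative to `slow`, per statistic
// (a positive value means `fast` has less delay).
DelayStats saving(const DelayStats& slow, const DelayStats& fast) {
    assert(slow.minMs <= slow.maxMs && fast.minMs <= fast.maxMs);
    return DelayStats{slow.minMs - fast.minMs,
                      slow.maxMs - fast.maxMs,
                      slow.avgMs - fast.avgMs};
}
```

Under that reading, saving(kDirectSound, kDirectKS) gives 26 ms for the minimum, 33 ms for the maximum and 10 ms for the average, so the 30 ms figure matches the minimum and maximum delays more closely than the average.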