Innovations in teaching OS concepts using native NT
Arkady Retik, Program Manager, Source Asset Management
Dave Probert, Architect, Windows Core Kernel
Microsoft Corporation

Windows Academic Shared Source Program
• Integrate Windows internals into Operating Systems courses
• Give students more real-world illustrations of the principles being taught
• Achieve a better concept-to-effort ratio for OS projects
• Include examples from the Windows kernel source code

Agenda
• Program Overview
• Windows OS Internals Curriculum Resource Kit
• ProjectOZ
• Windows Research Kernel
• Q&A

Working below ground in Windows
[Figure: Windows layer diagram. Upper layers: Web, GUI, applications, LOB services, middleware, WinFX, Win32, POSIX, system runtime libraries. Below them: system services, net interfaces, protocol stacks, devices, file systems, and the kernel (system call interface, I/O manager, processes, threads, object manager, data cache, registry, inter-process communication, virtual memory, scheduler, synchronization, security, interrupts). Callouts mark where ProjectOZ, lecture materials, textbooks, and source code apply.]

Partnership with Higher Education
• We believe Microsoft technologies are important to Computer Science education:
  – ubiquitous
  – empowering
  – scalable
  – innovative
  – customer-driven features
• We know Computer Science education is important to Microsoft:
  – it is a source of the human and intellectual resources that drive our industry
  – the quality of education determines the technical capabilities of our customers, our partners, and our employees

Windows Academic Shared Source Program
• Windows Operating Systems Internals Curriculum Resource Kit (CRK): presentation slides, experiments, labs, quizzes, and assignments for introducing case studies from the Windows kernel into operating systems courses. Available now.
• Windows Research Kernel (WRK): the core kernel sources and binaries, integrated with an environment for building and testing experimental versions of the Windows kernel for use in teaching and research. Available soon.
• ProjectOZ: an operating systems project environment that uses the native kernel interfaces of Windows to provide simple, clean, user-mode abstractions of the CPU, MMU, trap mechanism, and physical memory, which can be used to perform experiments in operating systems principles. Pilots this year.

CRK Authors
Industry
Mark Russinovich is chief software architect and cofounder of Winternals Software (www.winternals.com), a company that specializes in advanced systems software for Microsoft Windows. Mark is co-author, with David Solomon, of Inside Windows 2000, 3rd Edition (Microsoft Press) and its successor, Windows Internals, 4th Edition (Microsoft Press). Mark is a Microsoft Most Valuable Professional (MVP) and serves as senior contributing editor for Windows IT Pro magazine, where he contributes to the Windows Power Tools column. He is also a frequent speaker at major industry conferences such as Microsoft TechEd, IT Forum, Windows IT Pro Magazine's Connections, and Redmond Magazine's TechMentor. Mark has a B.S. from Carnegie Mellon University and an M.S. from Rensselaer Polytechnic Institute, both in computer engineering. In 1994, he earned a Ph.D. from Carnegie Mellon University, also in computer engineering.
David Solomon (www.solsem.com) teaches classes on Windows kernel internals to developers and IT professionals at companies worldwide, including Microsoft. He is the co-author of Windows Internals, 4th Edition, the official Microsoft Press book on Windows kernel internals, as well as the previous edition, Inside Windows 2000. David also wrote Inside Windows NT, 2nd Edition, and Windows NT for OpenVMS Professionals. He also co-created the Windows Internals COMPLETE video series, which Microsoft licensed for worldwide internal training. David has served as technical chair for three past Windows NT conferences and has spoken at many TechEds and PDCs. He received the Microsoft Support Most Valuable Professional (MVP) award in 1993 and 2005.
Academia
Andreas Polze is the Operating Systems and Middleware Professor at the Hasso-Plattner-Institute for Software Engineering at the University of Potsdam, Germany. He received a doctoral degree from Freie University Berlin, Germany, in 1994 and a habilitation degree from Humboldt University Berlin in 2001, both in computer science. His habilitation thesis investigates Predictable Computing in Multicomputer Systems. His current research interests include Interconnecting Middleware and Embedded Systems, Mobility and Adaptive System Configuration, and End-to-End Service Availability for standard middleware platforms. At the University of Potsdam, his teaching focuses on operating system architecture, component-based middleware, and predictable distributed computing. The curriculum there includes lectures that discuss operating system issues based on standard platforms (Windows 2000/XP, Mac OS X (BSD Unix), and Solaris) as well as on embedded systems (Windows CE, Embedded Linux). Prof. Polze was a visiting scientist with the Dynamic Systems Unit at the Software Engineering Institute, Carnegie Mellon University, Pittsburgh, USA, where he worked on real-time computing on standard middleware (CORBA), and with the Real-Time Systems Laboratory at the University of Illinois, Urbana-Champaign.

What about CRK content?
• Covers all OS BOK units and more (based on Windows XP/Server 2003)
• Scalable to multiple levels
• Modular (can be used in whole or in part)
• Case studies / compare & contrast
Basic module: provides materials to incorporate into a complete basic-level OS course one semester in length. The module covers the Windows-specific OS topics in the core and elective units of the OS Body of Knowledge (BOK) of Computing Curricula 2001 (CC2001).
Advanced module: provides materials to incorporate into an advanced-level OS course one semester in length.
The module covers the Windows-specific OS topics in the core and elective units of the CC2001 OS BOK, as well as three supplementary units.

What OS topics does the CRK cover?
a. Core topics (available now!)
   OS1. Overview of operating systems
   OS2. Operating system principles
   OS3. Concurrency
   OS4. Scheduling and dispatch
   OS5. Memory management
b. Elective topics
   OS6. Device management
   OS7. Security and protection
   OS8. File systems
   OS9. Real-time and embedded systems
   OS10. Fault tolerance
   OS11. System performance evaluation & troubleshooting
   OS12. Scripting
c. Supplementary topics
   13. Windows networking
   14. Comparing the Linux and Windows kernels
   15. Windows-Unix interoperability
Note: labs and exercises to reinforce the topics are available now at http://www.msdnaa.net/curriculum
Anything we missed?

ProjectOZ

ProjectOZ Background
• Collaboration with MSR University Relations, the Windows Kernel & Architecture Team, and the Source Asset Team
• Goal is to provide better support for OS instruction and research using Windows
• Part of a larger program:
  – Windows Research Kernel
  – Curriculum Resource Kit
  – Textbooks and other resources
• Based on observations from the SPACE research project at UC Santa Barbara (Probert & Bruno)
• Provides an alternative to Nachos
• Alpha version of ProjectOZ implemented by Paul Turner, a summer intern from the University of Waterloo

OS model of processor
[Figure: CPU, MMU, and memory, with external interrupts entering through the trap handler and control returning via RETI.]
• The OS can only control:
  – the MMU (memory management unit)
  – the trap vector
  – scheduling of external interrupts when it does an RETI (Return from Interrupt)
• The OS only regains control through a trap/interrupt

SPACE
• Systems Programming using Address-spaces and Capabilities for Extensibility
  – a reaction to distributed shared virtual memory research
• Key observation: extending core OS functionality is difficult because existing kernel abstractions (threads, processes, inter-process communication) get in the way
• SPACE uses lower-level abstractions: control flow, address spaces/domains, portals
  – these represent hardware abstractions (CPU, MMU, trap vectors)
  – threads, processes, and IPC are then built on top
• A monolithic kernel is not necessary => fundamental extensibility

Kernel Abstractions
[Figure: processes, each with one or more threads and a page table, implemented by the kernel on top of the CPUs and their MMUs.]

SPACE Abstractions
• Space: a mapping of addresses from logical to physical
• Domain: a permission bit-vector on each address mapping in a Space
  – each bit-vector is indexed by the current protection mode
  – (Space, mode) → Domain
• Portal: an entry point in a Domain
  – (currDomain, trap/interrupt) → (newDomain, newPC)
  – each portal traversal saves state and associates a token with it
  – the SPACE implementation maintains a stack of tokens corresponding to nested portal traversals on a particular CPU
  – Resume reverses a portal traversal, restoring the state at the top of the token stack
• Two portal operations:
  – Suspend:
    • save the state token at the top of the current token stack
    • create an empty token stack, to be used at the next portal traversal
    • pass a handle on the token for the previous stack to the routine at newPC in newDomain
  – Unsuspend(token):
    • takes a handle to a previous token stack
    • discards the current token stack (if any)
    • resumes the token at the top of the previous stack

Kernels out of spaces & domains
• Kernel-mode memory mappings are (mostly) shared in all spaces
• Spaces are used to build processes
[Figure: spaces 0, 1, and 2, each containing a kernel-mode domain and a user-mode domain.]

Following the CPU
[Figure: CPU 0 traversing portals through domains a, b, c and d, e, f, with suspend, unsuspend, and resume switching between token stacks T0 and T1.]

Redrawing the picture
[Figure: the same trace redrawn around a SCHEDULER in domain d; sleep1/wakeup1 and start2/sleep2/wakeup2 events use suspend and unsuspend to switch CPU 0 between token stacks T0 and T1.]

Building SPACE on top of NT
• Spaces
  – use NT processes
• Domains
  – use a Space for each domain, but
  – other than the page permissions, the logical-to-‘physical’ mappings are identical for domains in the same space
• Physical memory
  – create an NT section, and selectively create single-‘page’ views onto the section from each Space/Domain (64K page size)
• CPUs
  – each domain has an NT thread corresponding to each configured logical CPU, with only one thread per CPU runnable at a time
• Space implementation
  – space.exe controls the simulation and provides the space primitives, such as portal traversal, implementing CPUs and MMUs

Building SPACE on top of NT (continued)
• Exceptions
  – space.exe establishes an exception port for each domain, which it uses to detect exceptions (e.g., page faults) and to implement portal traversal
• Traps
  – programmatic traps in a domain are forwarded to space.exe for portal traversal, using either NT LPC or the exception mechanism
• Interrupts
  – space.exe interrupts CPUs by suspending the running NT thread and doing get/set thread context
• MMU
  – simulated by space.exe by modifying the views each domain has of the ‘physical memory’ section (using NT memory management APIs)

SPACE Multi-computer
[Figure: several space.exe instances, each managing a set of NT processes, connected by a network simulator.]

Teaching Objectives
SPACE mission:
• An exciting, innovative, productive environment for OS instruction & research
Goals:
• Use SPACE to abstract hardware
• Let students focus on OS data structures and algorithms
• Provide a non-simulated environment for normal execution
• Build models for I/O devices, timers, and DMA
• Support both project-level and lab-level experiments
• Provide an experimental apparatus for exploring the OS literature

Approach to OS experiments
• Provide the BasicOZ environment:
  – SPACE core implementing the SPACE abstractions
  – small vanilla OS implementation on top of SPACE
  – system described by an XML configuration file
  – development/measurement environment
  – tools for tracing/profiling/analyzing
  – workload/test library
  – access to native NT APIs (?)
• Student experiments improve on BasicOZ
• Experiments are selected to complement lectures
• Some experiments are progressive, others independent

Approach to OS experiments (continued)
Multiple types of experiments can be assigned:
• Lab-level experiments to implement different algorithms, make small extensions, and explore performance
• Medium-level projects that do major work on a particular subsystem
• Competitive projects, where different groups implement different algorithms and compare the resulting performance
• Literature-based projects, where students implement algorithms/solutions from published papers
• Investigations into novel algorithms and new solutions (open-ended)

BasicOZ Environment
System calls
• implementation of basic system calls, using dynamic allocation of stacks in the 'kernel'
• token chains provide trap frames for returning to user mode
User-mode threads
• no preemption, no guard pages on stacks
System devices
• timer, clock, console, disk simulator, network simulator (with fault injection)

BasicOZ Environment (continued)
Input/output
• I/O device simulation framework:
  – DMA, interrupts
  – simple device register operations
  – simulation of IRQLs
  – simulation of real device properties
File system
• trivial file system:
  – one directory
  – assumes infinite storage, contiguous allocation
  – no delete or other namespace operations, no extending of existing files
  – populated as part of the system specification

BasicOZ Environment (continued)
Processes
• single thread
• static executable images (no libraries or relocation)
• simple create/loadimage model (not fork/exec)
• simple virtual address management with a linear freelist
Virtual memory
• no shared virtual memory
• simple pagefile management
• page-fault handler always goes to disk
• management of physical memory with a linear freelist
• artificial forcing of low-memory conditions
• random page replacement, blocking on page writes
• fetch-from-previous-space for the kernel implementation

BasicOZ Environment (continued)
Boot loader
• loads the kernel configuration and images
Image library
• loads the segments of an executable image into an address space
• accesses symbols, relocation information, headers, import/export tables, profiling support, stack-trace support, disassembly
Build environments
• environment for producing the 'kernel' (server)
• environment for building test programs (client)

BasicOZ Environment (continued)
Debug, test, instrumentation
• execution statistics and timing
• profiling information
• tracing (flight-data recorder)
Tests & workloads
• library of individual applets, applications, and entire workloads for testing, evaluation, and demonstration, e.g.:
  – multi-process, multi-thread, multi-computer loads
  – demonstrations of synchronization, priority inversion, scheduling characteristics
  – IPC, shared memory
  – asynchronous I/O
  – client/server applications
  – etc.

Project Areas: multi-threading
Multi-threading and synchronization primitives
• use the timer to make user-mode threads preemptive
• implement a pluggable scheduler, with several different scheduling algorithms (including priority-based)
• demonstrate race conditions and priority inversion
• implement basic kernel-mode blocking synchronization primitives, like semaphores and reader/writer locks
• use synchronization primitives to eliminate race conditions

Project Areas: handles
Implement handles and a file table
• provide a user-mode mechanism for referencing kernel-mode objects
• implement a way of referring to open files in the trivial file system
• implement open/read/write/close on the file system
• experiment with ways of detecting bad closes, and test with a poorly synchronized multi-processor workload

Project Areas: virtual memory
Virtual memory
• improve algorithms for managing:
  – physical memory
  – pagefile space
  – virtual addresses
• implement shared memory between processes
• implement distributed shared virtual memory across a SPACE multi-computer

Project Areas: processes
Process management
• create/destroy processes:
  – using fork
  – using other algorithms
• build a capability-based sandbox
• build a process pool for isolating hosted code

Project Areas: I/O drivers
I/O driver
• implement IRQL-based protection of data structures
• write a traditional top-half/bottom-half I/O driver for a simple simulated device
• add DMA
• implement asynchronous completion of I/O

Project Areas: IPC
Inter-process (i.e., cross-domain) communication
• simple reader/writer synchronization
• basic message-based IPC between processes:
  – copy-based
  – shared-memory
• named IPC ports
• named pipes
• mailboxes

Project Areas: objects
Build a simple kernel-level object model
• cross-domain invocation of object methods, with simple marshalling
• build a name server
• recover from cross-domain failures
• persist objects across reboots

Project Areas: file system
File system (and volumes)
• build a more complex file system (on the simulated disk or a USB thumb drive)
• implement block management and directory hierarchies
• build a log-based file system
• implement namespace operations (like rename, link/unlink) and test for race conditions
• implement a cache (of either blocks or files)
• implement memory-mapped files
• implement get/put file protocols (including memory mapping)
• build a RAID layer below the file system, evaluating robustness and performance

Project Areas: security
Investigate security features
• give processes identities
• add ACLs to files/objects
• demonstrate a buffer overflow
• implement ‘applications’
• implement client/server impersonation
• implement a client/server capability mechanism

Project Areas: signals/exceptions
Signals and exceptions
• deliver signals to threads
• test for race conditions
• use signals for delivery of asynchronous events (like I/O completion)
• exception notification using signals
• exception notification using unwinding

Project Areas: networking
Networking
• using the SPACE multi-computer, build a simple network stack
• implement sockets
• packetize streams and send them between computers
• use the network-unreliability feature of the simulation to:
  – implement reliable streams
  – explore techniques to minimize network latencies

Project Areas: basic debugging
Implement a basic debugger
• run/stop/step
• examine/modify memory
• disassemble
• set breakpoints

2005/2006 academic plans
• Initial version nearing completion (thanks Paul!)
• Start building a community
• Pilot projects in China:
  – building on the Chinese OS principles textbook by faculty at Peking, Tsinghua, and Behai
  – considering a follow-on project book
• Will use in short courses in Japan this year
• Talking with some U.S. schools about special-topics courses this year
• Working with faculty on a proposal for an internals book using ProjectOZ as the basis for experiments
• Lots of interest in Europe

That’s as far as our travels have taken us so far

Windows Research Kernel

WRK Goals
• Make it easier for faculty and students to compare & contrast Windows with other operating systems
• Students can study the source, and modify and build projects
• Better support for research & publication based on Windows internals
• Encourage more OS textbooks and university-oriented internals books on the Windows kernel
• Simplified licensing

NTOS Kernel Sources
Based on Windows XP/SP2 and Windows x64 NTOS
• Processes, threads, LPC, VM, scheduler, object manager, I/O manager, synchronization, worker threads, kernel memory manager, …
  – almost everything in NTOS except plug-and-play, power management, and specialized code such as the driver verifier, splash screen, branding, timebomb, etc.
  – non-kernel kernel-mode code (drivers, file systems, networking) comes from the DDK and IFSKIT
• Simplified in a few places, cleaned up comments, improved spelling
• Non-source is encapsulated in a binary library
• Build and setup utilities and tools
• Tools for tracing, performance monitoring, logging, debugging, etc.
• Packaged with:
  – DDK subset and documentation for working with drivers
  – file system sources from the IFSKIT
  – Virtual PC product
  – kernel regression tests
  – documentation for the native NT API
• Something over 500K lines of source

WRK licensing
• Improvements over the current MSR UR license:
  – faculty feel comfortable agreeing to its conditions
  – students can use it in a classroom environment
• License type:
  – noncommercial, academic use only; derivative works allowed for noncommercial purposes
• Eligibility criteria:
  – available to faculty and students in colleges/universities worldwide
• Usage scenarios:
  – view, copy, reproduce, and distribute within the institution
  – modify for teaching and experimentation purposes
  – produce teaching and research publications including relevant snippets of source
    • can be used in textbooks, academic publications, and community forums
    • must perpetuate MS copyright notices
  – share derivatives within the academic community

Status
• CRK: core & security topics are available now; elective & supplementary topics will be available by the end of 2005
• ProjectOZ and WRK: we will be looking for participants in pilots and trials in AY05/06
• If you are interested, contact us at compsci@microsoft.com
More information on this and related topics
• Shared Source: http://www.microsoft.com/resources/sharedsource
• Curriculum Repository on MSDNAA: http://www.msdnaa.net/curriculum
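The SPACE Abstractions slide describes portal traversal in terms of a per-CPU stack of state tokens and two operations, Suspend and Unsuspend. The following is a minimal Python model of those rules for a single CPU, meant only to make the token-stack bookkeeping concrete. It is a sketch under stated assumptions, not the ProjectOZ or space.exe interface: the names `SpaceCPU`, `traverse`, `resume`, `suspend`, and `unsuspend`, the integer handles, and the representation of a token as a (domain, PC) pair are all hypothetical, and Space/Domain permission checks are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """State saved at a portal traversal (hypothetical: just domain and PC)."""
    domain: str
    pc: int


class SpaceCPU:
    """Toy single-CPU model of SPACE portals and token stacks.

    Illustrative names only; this is not the ProjectOZ/space.exe interface.
    """

    def __init__(self, domain: str, pc: int, portals: dict):
        self.domain, self.pc = domain, pc
        # (currDomain, trap) -> (newDomain, newPC)
        self.portals = portals
        # Tokens for nested portal traversals on this CPU.
        self.stack: list[Token] = []
        # Handle -> token stack set aside by suspend().
        self.suspended: dict[int, list[Token]] = {}
        self.next_handle = 0
        # Value passed to the routine at newPC (a handle after suspend()).
        self.arg = None

    def _enter(self, trap: str) -> None:
        """Push a token for the current state and jump through the portal."""
        new_domain, new_pc = self.portals[(self.domain, trap)]
        self.stack.append(Token(self.domain, self.pc))
        self.domain, self.pc = new_domain, new_pc

    def traverse(self, trap: str) -> None:
        """Ordinary portal traversal: nests one level deeper."""
        self._enter(trap)
        self.arg = None

    def resume(self) -> None:
        """Reverse the most recent traversal using the token at the top."""
        token = self.stack.pop()
        self.domain, self.pc = token.domain, token.pc

    def suspend(self, trap: str) -> int:
        """Traverse a portal, then set the whole token stack aside.

        The next traversal starts on a fresh, empty stack, and the routine
        at newPC receives a handle on the stack that was set aside.
        """
        self._enter(trap)
        handle = self.next_handle
        self.next_handle += 1
        self.suspended[handle] = self.stack
        self.stack = []
        self.arg = handle
        return handle

    def unsuspend(self, handle: int) -> None:
        """Discard the current stack and resume the top of a saved one."""
        self.stack = self.suspended.pop(handle)
        self.resume()
```

In this model, a scheduler domain would keep the handles returned by `suspend` and later pass one to `unsuspend`, which is roughly the pattern the "Redrawing the picture" slide depicts with token stacks T0 and T1.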
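The BasicOZ virtual memory slide lists a deliberately simple baseline: physical memory managed with a linear freelist, a page-fault handler that always goes to disk, random page replacement, and blocking page writes. A toy Python model of that baseline follows, assuming a single address space and one integer of data per page. `ToyVM` and its methods are hypothetical names, and the "disk" and "physical memory" are plain dictionaries rather than anything BasicOZ provides. Because victims are chosen at random, only the fault and write-back counts for a cold start are deterministic.

```python
import random
from typing import Optional


class ToyVM:
    """Toy BasicOZ-style VM: linear freelist plus random replacement.

    Hypothetical model: one address space, one integer of data per page,
    and dictionaries standing in for physical memory and the pagefile.
    """

    def __init__(self, nframes: int, seed: int = 0):
        self.freelist = list(range(nframes))   # linear freelist of frames
        self.page_table: dict[int, int] = {}   # virtual page -> frame
        self.frame_owner: dict[int, int] = {}  # frame -> virtual page
        self.memory: dict[int, int] = {}       # frame -> contents
        self.pagefile: dict[int, int] = {}     # virtual page -> contents on disk
        self.dirty: set[int] = set()
        self.rng = random.Random(seed)
        self.disk_reads = 0
        self.disk_writes = 0

    def _alloc_frame(self) -> int:
        """Take the first free frame, or evict a random resident page."""
        if self.freelist:
            return self.freelist.pop(0)
        frame = self.rng.choice(sorted(self.frame_owner))
        victim = self.frame_owner.pop(frame)
        del self.page_table[victim]
        if victim in self.dirty:
            # Write-back is synchronous here, i.e. the fault blocks on it.
            self.pagefile[victim] = self.memory[frame]
            self.dirty.discard(victim)
            self.disk_writes += 1
        return frame

    def access(self, vpage: int, value: Optional[int] = None) -> int:
        """Read a page, or write `value` to it; fault it in if needed."""
        frame = self.page_table.get(vpage)
        if frame is None:
            frame = self._alloc_frame()
            # The fault handler always reads from "disk" (0 if never written).
            self.memory[frame] = self.pagefile.get(vpage, 0)
            self.disk_reads += 1
            self.page_table[vpage] = frame
            self.frame_owner[frame] = vpage
        if value is not None:
            self.memory[frame] = value
            self.dirty.add(vpage)
        return self.memory[frame]
```

A student experiment in the virtual memory project area could replace the random choice in `_alloc_frame` with, for example, FIFO or clock replacement and compare `disk_reads` and `disk_writes` on the same access trace.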
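One way to structure the "pluggable scheduler" exercise from the multi-threading project area is to hide the policy behind a two-method interface and drive it from a simulated timer tick. The Python sketch below assumes that interface (`Scheduler.add` and `pick_next`) and a one-tick quantum; neither is taken from BasicOZ. Round-robin ignores priority, while the strict-priority policy always runs the highest-priority ready thread, which also shows how lower-priority threads can starve.

```python
import heapq
import itertools
from abc import ABC, abstractmethod
from collections import deque


class Scheduler(ABC):
    """Pluggable policy: the simulated kernel only calls these two methods."""

    @abstractmethod
    def add(self, thread: str, priority: int) -> None:
        """Make a thread ready to run."""

    @abstractmethod
    def pick_next(self):
        """Remove and return the next thread to run, or None if idle."""


class RoundRobin(Scheduler):
    def __init__(self):
        self.ready = deque()

    def add(self, thread, priority):
        self.ready.append(thread)  # priority is ignored

    def pick_next(self):
        return self.ready.popleft() if self.ready else None


class StrictPriority(Scheduler):
    def __init__(self):
        self.heap = []
        self.seq = itertools.count()  # FIFO order among equal priorities

    def add(self, thread, priority):
        # heapq is a min-heap, so negate the priority.
        heapq.heappush(self.heap, (-priority, next(self.seq), thread))

    def pick_next(self):
        return heapq.heappop(self.heap)[2] if self.heap else None


def run(scheduler: Scheduler, threads: dict, ticks: int) -> list:
    """Simulate `ticks` timer interrupts with a one-tick quantum.

    `threads` maps thread name -> priority. On each tick the running
    thread is preempted and put back on the ready queue.
    """
    for name, priority in threads.items():
        scheduler.add(name, priority)
    trace = []
    for _ in range(ticks):
        current = scheduler.pick_next()
        if current is None:
            break
        trace.append(current)
        scheduler.add(current, threads[current])
    return trace
```

For instance, `run(RoundRobin(), {"A": 1, "B": 2, "C": 1}, 6)` returns `['A', 'B', 'C', 'A', 'B', 'C']`, whereas the same call with `StrictPriority()` runs only `B`.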
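The handles project area asks for a user-mode way to reference kernel-mode objects and for ways of detecting bad closes. A common technique (not necessarily the one BasicOZ or Windows uses) is to fold a per-slot generation counter into the handle value, so that closing a slot invalidates every outstanding copy of the old handle even after the slot is reused. The `HandleTable` class and its 16-bit index split below are hypothetical choices for this sketch.

```python
class HandleTable:
    """Per-process table mapping integer handles to kernel-mode objects.

    Hypothetical design: each handle packs a slot index (low bits) and that
    slot's generation (high bits). Closing a slot bumps its generation, so
    stale or double-closed handles are rejected even after slot reuse.
    """

    INDEX_BITS = 16

    def __init__(self):
        self.slots = []  # each slot is [generation, object or None]
        self.free = []   # indices of free slots

    def _split(self, handle: int):
        mask = (1 << self.INDEX_BITS) - 1
        return handle & mask, handle >> self.INDEX_BITS

    def open(self, obj) -> int:
        if self.free:
            index = self.free.pop()
            self.slots[index][1] = obj
        else:
            index = len(self.slots)
            if index >= 1 << self.INDEX_BITS:
                raise OverflowError("handle table full")
            self.slots.append([0, obj])
        return (self.slots[index][0] << self.INDEX_BITS) | index

    def lookup(self, handle: int):
        index, generation = self._split(handle)
        if index < len(self.slots):
            slot_generation, obj = self.slots[index]
            if slot_generation == generation and obj is not None:
                return obj
        raise KeyError(f"bad handle {handle:#x}")

    def close(self, handle: int):
        obj = self.lookup(handle)  # rejects stale handles and double closes
        index, _ = self._split(handle)
        self.slots[index][0] += 1  # invalidate all copies of this handle
        self.slots[index][1] = None
        self.free.append(index)
        return obj
```

Opening a file, closing it, and opening another file reuses slot 0 with generation 1, so a second `close` on the first handle raises `KeyError` instead of silently closing the new file; that is the kind of bad close the poorly synchronized multi-processor workload is meant to provoke.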