Operating System Architectures [Silberschatz ch2]

OS as a virtual machine presented to users

Mechanisms and Policies

  Mechanisms                                 Policies
  how basic facilities should be provided    what specifically is to be done (flexibility)
  timer construct for CPU protection         how long is the timer for a given user?
  no change over place and time              change (only redefine certain parameters)
  giving priority                            priority values, I/O-bound vs. CPU-bound

  * Microkernel: mechanism/policy separation taken to the extreme
  * A small, basic set of primitive building blocks (policy free)

Two Key Architectures: Monolithic / Microkernel

  Monolithic                                 Microkernel
  massive, undifferentiated, intractable     only basic abstractions (mechanism):
                                             addr. map, drivers, IPC, interrupts, context switch
  single addr. space, efficiency             extensibility, emulation (VM), security
  lack of structure -> layering, OO          modularity, open distributed systems
  Unix, Windows (XP)                         Mach, Chorus

  [Note: XP embodies a basic OO mechanism with defined access interfaces]

Hybrid Approaches

  Mach and Chorus (microkernels)
    Initially developed by running servers as user processes that communicate
    with the kernel via interrupts/messages.
    Improved versions, for performance: servers can be loaded dynamically into
    either the kernel or a user address space; clients interact with them via
    the same IPC calls in both cases.
    A server can be debugged at user level and later run inside the kernel
    addr. space for performance (but it then threatens kernel integrity if it
    still has bugs).

  SPIN (’95)
    Finesses the trade-off between efficiency and protection by employing
    type-safe language facilities (Modula-3) for protection.
    Kernel and dynamically loaded modules are grafted onto the kernel (single addr.
    space), using the type-safe language for mutual protection (access is
    granted only through a reference).
    Minimizes dependencies between system modules by choosing an event-based
    model (network, timer, page fault) for interactions between modules.
    System components register themselves as handlers for events.

  L4: 2nd-generation microkernel design
    Forces dynamically loaded system modules to execute in user space (but
    optimizes IPC to offset the cost).
    Offloads kernel complexity: address-space management is done in
    user-level servers.

  Exokernel
    Employs user-level libraries (instead of user-level servers) to supply
    functional extensions.
    Provides only protected allocation of extremely low-level resources such
    as disk blocks.
    Expects all other resource management functionality (even the file system)
    to be linked into applications as libraries.

Virtualization

  Machine hardware is allocated among multiple virtual machines (virtual
  hardware images), each running a separate OS instance.
  IBM VM/370.
  Project Xen (2003) developed a virtual machine monitor for the PC:
    a software layer between the HW (with host OS) and a commodity OS
    such as Linux or Windows.
  Commercial: VMware.

Dynamically loadable kernel modules: Disadvantages
  Size: module management (uses non-pageable kernel memory space)
  Complexity of kernel bootstrap
    Modules are loaded from disk (the driver needed to access the disk must
    be loaded first)
    Managing the bootstrap requires extra work

Loadable kernel modules: Choices
  How often will the modules (drivers) be used: load on demand?
  Is the kernel built to be usable on a large variety of machines or on a
  single machine?

Loadable kernel modules: access to kernel symbols
  Only a fixed set of entry points is made available to kernel modules.
  Ensures that a kernel module cannot invoke arbitrary code within the kernel
  and interfere with kernel execution.
  Guarantees that kernel interactions occur at controlled points (invariants).
  Drawback: only a limited number of symbols are accessible to a kernel module
  (loss of flexibility and performance, because some details of the kernel
  are hidden from the module)

Shared Libraries: advantages
  Reliability: protection keeps a bug in a library from becoming a bug in the kernel
  Performance: less kernel memory used; calls are faster, since calling the
  kernel involves a syscall
  Manageability: loaded as needed; upgrades are easy, with no system downtime

Shared Libraries: drawbacks
  Some code is unsuitable for a shared library
    Low-level device drivers or file systems
    Services shared across the entire system are better implemented in the
    kernel if they are performance critical. Otherwise the service runs as a
    separate process reached through IPC (requiring two context switches for
    each request).
  Security/privilege limitation
    A shared library runs with the privilege of the process calling it
    It cannot directly access resources inaccessible to the calling process
    What if the service requires privileges beyond those of a normal process,
    or the data managed by the library needs to be protected from normal user
    processes?

Virtual Machines
  The layered approach taken to its logical conclusion
  Treats the HW and the OS kernel as though they were all hardware
  Provides an interface identical to the underlying bare hardware
  Complete protection of system resources (isolation); useful for OS development
  Examples: OS, JVM, VMware

Introduction to Linux Kernel
  Monolithic (but borrows much from microkernels)
  Dynamically loadable kernel modules
  SMP support
  Kernel is preemptive and schedulable
  Thread support: process/thread, shared/partially shared (clone)
  Processor affinity
  Implements only what makes real sense / has real applications
  Kernel memory is not pageable
  Kernel version notation: 2.5 development / 2.6 stable (2.6.1 patch)

Kernel different from user-space applications
  No libc (not linked): speed and size issue
    many functions are implemented inside the kernel, e.g. printk
  Source in GNU C (not ANSI C): some extensions
    inline functions, inline assembly
  No memory protection; kernel memory is not pageable
  No (easy) use of floating point (no easy way for the kernel to trap itself)
  Small fixed-size stack (the user stack can be large and dynamic)
  More susceptible to race conditions
  (synchronization/concurrency): preemptive multitasking, SMP, interrupts
  Portability matters more
    Architecture-independent C and an architecture-dependent part in the
    kernel source tree
    Remain endian neutral, be 64-bit clean, make no assumptions about
    word/page size

Linux runs on various hardware platforms and needs to be portable to different
CPU and memory architectures. How is the kernel designed and implemented?
  (a) Keep as much as possible in common code
  (b) Provide a clean way of defining architecture-specific properties and code
  Two separate subdirectory hierarchies for each hardware architecture:
    Code for the architecture (syscall interface, interrupts)
    C header files (descriptive of the architecture): type definitions and
    macros hiding architecture differences (word/page size, endian order
    conversion)

Process Management

Manipulating Current Process States
  Process context
    User space: system calls -> kernel space (on behalf of the process),
    i.e. process context [the current macro is valid]
  Interrupt context

Process Family Tree
  init: init_task, pid=1 // starts in the last step of boot
    reads the system initscripts, completing the boot process
  Tree: parent, children, siblings (in task_struct)
  The task list is a circular doubly linked list
    macros: next_task(task), prev_task(task), for_each_process(task)

Process creation: fork, vfork, clone, exec
  fork: a copy of the parent, but with its own pid, ppid, and pending
  signals; uses copy-on-write

Copy On Write (COW)
  Share addr.
  space (data marked: if written, duplicate)
  Delay copying until a page is actually written (in the exec case, never)
  Only overhead of fork: duplicating the parent's page tables and creating
  the process descriptor

Fork
  fork is implemented via the clone syscall
    fork, vfork, __clone library calls: all invoke the clone syscall
  do_fork (in kernel/fork.c) -> copy_process
    dup_task_struct: new kernel stack, thread_info, task_struct
    check resource limits
    process descriptor fields (shared, or adjusted to initial values)
    TASK_UNINTERRUPTIBLE (ensures the child is not running yet)
    copy_flags (update flags in task_struct)
    new PID
    based on the flags passed to clone, duplicate or share:
      open files, filesystem info, signal handlers, addr. space
    remaining time slice is split between parent and child

Vfork
  Same as fork except: page table entries are not copied
  Implemented via a special flag to the clone syscall
  Child executes as the sole thread in the parent's addr. space
  Parent is blocked until the child calls exec or exits
  Child is not allowed to write to the addr. space
  Because of COW and child-runs-first semantics, the only benefit of vfork
  is not copying the parent's page table entries

Implementation of Threads
  No concept of a thread: implemented as a process
  Other OSes: threads are an abstraction providing a lighter/quicker execution unit
    One process descriptor (shared), in turn pointing to the different
    threads (individual)
  Linux: threads are simply a manner of sharing resources between processes
    Simply several processes and several task_struct structures (sharing
    some resources)

Process Termination
  exit() syscall, called explicitly or implicitly -> do_exit()
    task_struct flags: set PF_EXITING
    remove kernel timers, release mm_struct, exit_sem
    accounting
    exit_files, fs, namespace, sighand (decrement ref. counts)
    set exit code
    exit_notify: signal to parent;
    reparent any of the task’s children to another thread in their thread
    group or to the init process.
    set state to TASK_ZOMBIE, call schedule()
  The only memory the zombie still occupies: kernel stack, thread_info,
  task_struct (freed once the parent has waited for its completion)
  The wait family is implemented via a single wait4() syscall issued by the parent
  At the end: release_task (frees all remaining resources)

Parentless Task
  Parent exits before its children -> reparent the child tasks
    to another process in the current thread group or,
    if that fails, to the init process
  do_exit -> notify_parent -> forget_original_parent
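
User-space illustration (a sketch, not kernel code)
  The termination sequence above can be observed from user space. The
  following minimal Python sketch assumes a Unix-like system; Python's os
  module wraps the fork and wait4 calls directly. The helper names
  spawn_and_reap and already_reaped are hypothetical, chosen only for this
  illustration. The child exits, stays a zombie until the parent calls
  wait4, and after that a second wait fails because the kernel has released
  the process descriptor.

```python
import os


def spawn_and_reap(exit_code):
    """Fork a child that exits with exit_code, then reap it with wait4.

    Returns (child_pid, reaped_pid, observed_exit_code).
    Hypothetical helper for illustration; exit_code must be in 0..255.
    """
    pid = os.fork()
    if pid == 0:
        # Child: terminate at once. os._exit() skips Python-level cleanup,
        # so nothing buffered in the parent is flushed a second time.
        os._exit(exit_code)
    # Parent: between the child's exit and this call the child is a zombie;
    # only its minimal bookkeeping remains so the exit code can be collected.
    reaped_pid, status, _rusage = os.wait4(pid, 0)
    if not os.WIFEXITED(status):
        raise RuntimeError("child did not exit normally")
    return pid, reaped_pid, os.WEXITSTATUS(status)


def already_reaped(pid):
    """Return True if pid is no longer a waitable child of this process."""
    try:
        os.waitpid(pid, os.WNOHANG)
    except ChildProcessError:
        # ECHILD: the descriptor was released when the child was reaped.
        return True
    return False


if __name__ == "__main__":
    child, reaped, code = spawn_and_reap(7)
    print("child", child, "reaped", reaped, "exit code", code)
    print("second wait fails:", already_reaped(child))
```

  The sketch does not show reparenting; observing that would need a
  grandchild that outlives its parent and then inspects its own ppid.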