Operating System Design
General Design Principles
Dr. C. C. Lee
Ref: Operating System Concepts by Silberschatz…

OS Design and Structure
- OS vs. kernel (the privileged part)
- Operating system services (shared/protected)
  - Internal services: CPU execution scheduling; memory/I/O/file management
  - Interfaces: GUI, commands, system calls or library calls
  - System programs: invoke system calls
  - Others: utilities, compilers, editors, shell, misc. tools

Operating System Design Goals
- Goals vary from the users to the system itself
  - Users want: easy to use, reliable, fast
  - Systems want: easy to implement/maintain, flexible, reliable, efficient
- Goals vary from system to system
  - Embedded systems, server systems, etc.
- Main goals for general-purpose systems
  - Define abstractions
    - Data structures for processes, files, threads, signals, the I/O model
  - Implement primitive operations on the abstractions
    - e.g., read/write files, implemented as system calls
  - Ensure isolation/protection
    - Users, processes, files, virtualization
  - Manage hardware
    - Support low-level chips, interrupt controllers
    - Device drivers

Main Design Issues
- Start with the interfaces the OS provides
- Internal system design
  - Mechanism vs. policy
  - System structure and architecture

Mechanism and Policy Separation
- Purpose: minimize change, maximize flexibility
- Example: the mechanism defines a timer structure; the policy sets the time-quantum value
- Example: the mechanism defines a priority array with a scheduler that searches for the highest priority to run; the policy assigns different priorities to different users or processes

System Structure and Architecture
- Layered approach
  - Unix: system programs in user mode; kernel mode handles CPU/memory/process/device management, protection, etc.
  - Windows: system programs in user mode; kernel mode contains the kernel layer, the executive layer (which manages kernel-layer objects), and a hardware abstraction layer over different hardware
- Monolithic kernel (Unix, Windows…)
  - Single address space: direct calls, fast
  - Problems: maintenance, flexibility, expandability, reliability, security
- Microkernel (Mach)
  - Move as much as possible from the kernel into user space: reliable, secure, extensible
  - Problems: slow, due to IPC between separate user/kernel address spaces
- Loadable kernel modules
  - Each core component is separate and loadable as needed within the kernel
  - Concerns: kernel size due to module management, and the complexity of kernel bootstrap
  - Choice of modules: how often will a module be used? Will the kernel be built for many architectures?
- Hybrid systems
  - Linux kernel: monolithic (runs in kernel address space), plus modules for dynamic loading of functionality
  - Windows: mostly monolithic, plus a microkernel-style layer for different subsystem "personalities"
  - Mac OS X: layered; Aqua UI, Cocoa programming environment, Mach microkernel, BSD Unix, dynamically loadable modules (called kernel extensions)
  - Apple iOS: structured on Mac OS X, with added functionality and layers
  - Android: based on a (modified) Linux kernel; layers and run-time environments: Dalvik VM, libraries, frameworks

Processes
Process Concept
- A program in execution (text, data, stack, heap)
- Process Control Block (PCB); context switch
- Process states (ready, blocked, running)
Process Scheduling
- Long-term scheduling: job/batch admission
- Short-term scheduling: CPU scheduling
- Medium-term scheduling: swap in/out
Operations on Processes
- Create/terminate/abort
IPC
- Shared-memory approach
  - Processes share a memory mapping
  - shmget, shmat, shmdt, shmctl
- Message passing (mailboxes, ports)
  - Various message system calls: msgget, msgctl, msgsnd, msgrcv

Threads
Processes vs. Threads
- Child processes vs. multiple threads
- Multithreading, multitasking
- Concurrency, multiple CPUs
- Threads share the process (and its PCB/address space), but each has its own program counter, stack, register state, and thread ID
- Context-switch overhead (within the same process) is minimal
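The sharing described above can be sketched with Python's threading module as an analogue of the Pthreads model: all threads of one process see the same global data, while each thread has its own stack for locals. (Python is used here purely for illustration; a real Pthreads version would use pthread_create and pthread_mutex_lock.)

```python
import threading

counter = 0                # shared: lives in the process's data segment
lock = threading.Lock()    # protects the shared counter against races

def worker(n):
    global counter
    local = 0              # private: lives on this thread's own stack
    for _ in range(n):
        local += 1
    with lock:             # serialize the update to shared state
        counter += local

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 4000: every thread updated the same shared variable
```

Without the lock, the `counter += local` updates could interleave and lose increments — exactly the race condition the synchronization section below addresses.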
User/Kernel Threads
- User-managed vs. kernel-managed
  - User: Green threads, GNU Portable Threads
  - Kernel: Windows, Linux, Mac OS X, Solaris
Multithreading Models
- Many-to-one (many user threads onto one kernel thread): one blocking thread blocks all (rare now)
  - Solaris Green threads, GNU Portable Threads
- One-to-one: more concurrency, more overhead
  - Windows, Linux…
- Many-to-many: more concurrency
  - Windows with the ThreadFiber package
- Two-level model (M:M, but a user thread can be bound to a kernel thread)
Thread Library (API)
- Entirely in user space (local calls) OR kernel-supported (system calls)
- Three main thread libraries:
  - POSIX Pthreads: either a user- or kernel-level library; a specification, not an implementation; common on Unix systems and Mac OS X
  - Windows threads: Win32 kernel-level library
  - Java threads: managed by the JVM, implemented by the underlying OS (Windows: Win32 API; Unix: POSIX Pthreads)
Scheduler Activations
- A scheme for communication between the user thread library and the kernel
- The kernel makes an upcall to run a new user thread (on a new LWP) when the current thread blocks

CPU Scheduling
CPU Scheduling Basics
- CPU utilization with multiprogramming
- CPU and I/O burst cycles
CPU Scheduler
- CPU scheduling occurs when a process changes state:
  - Running to waiting, or termination (nonpreemptive)
  - Running to ready, or waiting to ready (preemptive)
CPU Dispatcher
- Switches context
- Starts the newly selected process
CPU Scheduling Optimization Criteria
- CPU utilization, throughput, turnaround time, waiting time, response time
CPU (Short-Term) Scheduling Algorithms
- FCFS
- SJF: minimum average waiting time
- Shortest-remaining-time-first
- Priority-based
- RR (quantum size? context-switch overhead?)
- Multilevel feedback queue (feedback via aging, quantum expiry, I/O return): I/O-bound processes favored over CPU-bound. Why? How?
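The FCFS/SJF comparison above can be checked with a small simulation. This is an illustrative sketch using the classic burst times (24, 3, 3) from Silberschatz-style examples, all processes arriving at time 0:

```python
def waiting_times(bursts):
    """Waiting time of each process when run in the given order."""
    waits, elapsed = [], 0
    for b in bursts:
        waits.append(elapsed)  # a process waits while earlier ones run
        elapsed += b
    return waits

bursts = [24, 3, 3]                  # arrival order P1, P2, P3

fcfs = waiting_times(bursts)         # FCFS: run in arrival order
sjf = waiting_times(sorted(bursts))  # SJF: run shortest job first

print(sum(fcfs) / len(fcfs))  # 17.0 average waiting time under FCFS
print(sum(sjf) / len(sjf))    # 3.0 under SJF
```

The gap (17.0 vs. 3.0) shows the convoy effect of FCFS: one long CPU burst makes every later process wait, which is exactly why SJF minimizes average waiting time.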
Multiple-Processor Scheduling
- Asymmetric multiprocessing vs. SMP (common)
- Processor affinity: soft/hard
- Load balancing
Real-Time Scheduling
- Soft real-time vs. hard real-time
- POSIX real-time scheduling: SCHED_FIFO, SCHED_RR, specified via pthread_attr_setschedpolicy(&attr, SCHED_FIFO)
OS Scheduling Algorithm Examples
- Solaris scheduling (classes)
  - Interrupt threads: priority 160-169
  - Real-time threads: 100-159
  - System threads: 60-99
  - Timeshare and others: 0-59 (dispatch table)
- Windows scheduling (priority classes)
  - A process has a priority class: REALTIME_PRIORITY_CLASS, HIGH_PRIORITY_CLASS, …
  - A thread within a given priority class has a relative priority: TIME_CRITICAL, HIGHEST, ABOVE_NORMAL, …
- Linux scheduling: O(1)/CFS (covered later)
Priority-Inversion Problem
- Priority inheritance as the solution (L, M1, M2, …, H priorities)
Disk Scheduling
- Cylinders, tracks, sectors
- Seek time, rotational delay, transfer time
- Disk-arm movement (seek time) is what we minimize
- Elevator algorithms
  - SCAN
  - C-SCAN (more uniform waiting time; returns to the beginning)
  - LOOK (more common; goes only as far as the last request)
  - C-LOOK
- Algorithm selection
  - SSTF is common and natural
  - Elevator algorithms suit heavily loaded systems
  - Choice can be influenced by the file-allocation method: contiguous allocation keeps requests nearby
  - SSTF or LOOK is usually the default
Process (Thread) Synchronization
Why and What?
- Parallelism/concurrency and IPC
- Synchronization and coordination
  - Mutual exclusion
  - Cooperation
- Race conditions
- Producer/consumer problem
- Critical section (shared access)
Requirements for a Critical-Section Solution
- Mutual exclusion
- Progress (entry cannot be delayed indefinitely if no other process is in its critical section)
- Bounded waiting (a bound on how many times others may enter first)
Some Software Solutions (and Their Difficulties)
- Peterson's solution: turn and flag variables
  - Only two processes? Are all conditions met?
- The key difficulty is interleaving due to interrupts and preemption
- Atomicity is guaranteed only at the instruction level
Hardware/Architecture Solutions
- Uniprocessors: disable interrupts? Usable at user level? Does it scale across processors?
- Atomic hardware instructions (test-and-set, swap)
Locks
- Implemented using hardware instructions
- Busy-waiting vs. non-busy-waiting
- Mutexes, spinlocks
Semaphores
- What are they? P/V, down/up, wait/signal
- A synchronization tool by Dijkstra for mutual exclusion and process cooperation
- Implemented by lower-level primitives, i.e., machine instructions
- Binary vs. counting semaphores: purpose? differences?
- Potential deadlock/starvation with semaphores; example?
Classic Problems
- Bounded-buffer problem: the traditional producer/consumer problem (which binary/counting semaphores are used?)
- Readers-writers problem: examples; are readers favored? which binary/counting semaphores are used?
Condition Variables, Monitors
Pthread Examples
- Mutex locks: pthread_mutex_init/lock/unlock
- Condition variables: pthread_cond_init/signal/wait
- Read-write locks: pthread_rwlock_init/rdlock/wrlock
Solaris, Windows, Linux Examples
- All of these synchronization tools are used
- Solaris: adaptive mutexes
- Windows: spinlocks (kernel), dispatcher objects (executive)
- Linux: sequential locks (seqlocks)
Deadlocks
Four Necessary Conditions
- Mutual exclusion
- Hold and wait
- No preemption
- Circular wait
Detection and Recovery
- Detection
  - Cycle detection (single-instance resources)
  - Check allocations to see whether all processes can complete
- Recovery
  - Kill/abort processes in the cycle
Prevention
- Violate any one of the necessary conditions:
  - Mutual exclusion -> share resources
  - Hold and wait -> all-or-none allocation
  - No preemption -> preempt or abort (transactions, databases)
  - Circular wait -> resource ordering
Avoidance
- Safe state; safety algorithm
- Banker's algorithm, built on the safety algorithm
- Example: PPTs (7.27-7.33)
Memory Management
Memory Allocation and Relocation Problems
- Contiguous allocation -> (external) fragmentation; compaction?
Paging and Page Tables
- Fixed-size blocks called pages are mapped to page frames
- No external fragmentation (only minor internal fragmentation in the last page frame)
- Table lookup plus memory access: effective access time?
- A TLB speeds this up given a good hit ratio (program locality)
- Why multilevel page tables?
- (Hierarchical) multilevel tables avoid allocating large contiguous page tables
Segmentation and Segmentation with Paging
- Matches the user's (logical) view
- With paging underneath, same internal fragmentation as paging
- MULTICS, Intel Pentium
Demand Paging (vs. Prepaging)
- Pages are loaded only as they are needed
- Page faults and their handling
- Process creation and copy-on-write
Page Replacement (Policies)
- FIFO (Belady's anomaly: more page faults even as memory increases)
- Optimal (not realizable in practice)
- LRU (processing overhead: clock counter or linked list)
- Second chance (clock): approximates LRU
  - If the reference bit is 0, replace the page
  - Else (reference bit is 1), give the page a second chance: reset its reference bit to 0 and move on to the next page
  - A page used often enough to keep its reference bit set will never be replaced
  - Implementation: the clock algorithm, using a circular queue
    - A pointer (the hand of the clock) indicates which page is to be replaced next
    - When a frame is needed, the pointer advances until it finds a page with reference bit 0
    - As it advances, it clears reference bits (the "second chance")
    - Once a victim is found, it is replaced and the new page is inserted at that position in the circular queue
    - Degenerates to FIFO replacement if all bits are set
- Enhanced second chance (reference bit plus modified bit)
  - <0,0>: neither recently referenced nor modified; best victim
  - <0,1>: modified but not recently referenced; will need to be written back
  - <1,0>: recently referenced but clean; likely to be used again
  - <1,1>: recently referenced and modified; likely to be used again and will need to be written back
  - Three steps (up to four loops) through the circular buffer:
    1. Cycle through looking for <0,0>. If one is found, use that page.
    2. Cycle through looking for <0,1>, clearing the reference bit of every frame bypassed. Afterwards <1,0> becomes <0,0> and <1,1> becomes <0,1>.
    3. If step 2 failed, all reference bits are now zero, so repeating steps 1 and 2 is guaranteed to find a frame for replacement.
Allocation of Frames (by size, by priority, …)
Thrashing
- Excessive paging when a process has less than its minimum required memory
- Working set, working-set size, locality (kept in memory)
Memory-Mapped Files
- mmap() call: file I/O performed as memory access (efficient)
- Enables sharing
Allocating Kernel Memory
- Often allocated from a (contiguous) free-memory pool
- Kernel data structures vary in size, often less than a page
- Goal: minimize internal fragmentation; kernel memory is therefore typically not subject to the paging system
- Contiguous allocation is often required (for device-to-kernel transfers)
- Implementations: buddy system, slab allocation (see Linux)
Other Considerations
- Page-size selection: table space, fragmentation, I/O time
- I/O interlock (lock a page in memory while it is involved in I/O)
Operating System Examples
- Windows
  - Demand paging with clustering
  - Replacement policy uses per-process working-set minimum/maximum
  - LRU-style working-set trimming (triggered at WS-max or under system memory pressure)
  - The VM manager periodically passes through each process's working set and increments the age of pages not marked referenced in the PTE since the last pass
  - An LRU heuristic then removes the oldest pages during working-set trimming
- Solaris
  - Demand paging with a modified (two-handed) clock algorithm
  - Parameters: lotsfree (start paging), desfree, minfree (start swapping)
File-System Essentials
Virtual File Systems (Layered File System)
In-Memory File-System Structures
- Open and read (inode cached in memory)
Design Criteria of Allocation Methods
- Contiguous allocation: simple, random access; fragmentation; files cannot grow
- Linked allocation (FAT): simple, no fragmentation; no random access
- Indexed allocation: random access, no fragmentation; extra space for the index table
- Combined scheme (Unix)
Performance Considerations
- File accessed mostly sequentially, and small? Contiguous
- File accessed mostly sequentially, and large? Linked
- File accessed randomly, and large?
  Indexed
Implementation of Free-Space Management
- Bit vector
  - Extra space needed for the bitmap; easy to find contiguous blocks
  - Protect the bitmap on disk; accessing an in-memory copy raises a consistency problem
- Linked list
  - No wasted space, but contiguous space cannot be found easily
  - Protect the pointer to the free list
Efficiency and Performance of File-System Design
- Efficiency issues (disk space)
  - Disk-allocation and directory algorithms
  - Types of data kept in a file's directory entry
- Performance issues
  - Even after the basic file-system algorithms have been selected, performance can still be improved in several ways:
    - Disk cache (memory for frequently used blocks)
    - Free-behind and read-ahead to optimize sequential access
    - Memory as a virtual disk (RAM disk)
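The bit-vector scheme above can be sketched in a few lines. This is a hypothetical in-memory model (the function names and the sample bitmap are made up for illustration): bit i set means block i is free, and a linear scan finds either the first free block or the first run of n contiguous free blocks, the latter being exactly what makes bitmaps convenient for contiguous allocation.

```python
free = [1, 1, 0, 1, 1, 1, 0, 1]   # hypothetical bitmap: blocks 2 and 6 allocated

def first_free(bitmap):
    """Index of the first free block, or -1 if none."""
    for i, bit in enumerate(bitmap):
        if bit:
            return i
    return -1

def find_contiguous(bitmap, n):
    """Start index of the first run of n free blocks, or -1."""
    run_start, run_len = 0, 0
    for i, bit in enumerate(bitmap):
        if bit:
            if run_len == 0:
                run_start = i   # a new run of free blocks begins here
            run_len += 1
            if run_len == n:
                return run_start
        else:
            run_len = 0         # an allocated block breaks the run
    return -1

print(first_free(free))          # 0
print(find_contiguous(free, 3))  # 3: blocks 3, 4, 5 form the first free run of 3
```

A linked free list would avoid the bitmap's space overhead, but finding such a contiguous run would require chasing pointers block by block, which is why the notes describe contiguous allocation as hard under that scheme.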