CS 346 – Chapter 1

• Operating system – definition
• Responsibilities
• What we find in computer systems
• Review of
  – Instruction execution
  – Compile – link – load – execute
• Kernel versus user mode

Questions
• What is the purpose of a computer?
• What if all computers became fried or infected?
• How did Furman function before 1967 (the year we bought our first computer)?
• Why do people not like computers?

Definition
• How do you define something? Possible approaches:
  – What it consists of
  – What it does (a functional definition) – its purpose
  – What if we didn't have it
  – What else it's similar to
• OS = set of software between user and HW
  – Provides "environment" for user to work
  – Convenience and efficiency
  – Manages the HW / resources
  – Ensures correct and appropriate operation of machine
• 2 kinds of software: application and system
  – Distinction is blurry; no universal definition for "system"

Some responsibilities
• Can glean from table of contents
  – Book compares an OS to a government
  – Don't worry about details for now
• Security: logins
• Manage resources
  – Correct and efficient use of CPU
  – Disk
  – "Memory management"
  – Network access
• File management
• I/O, terminal, devices
• Kernel vs. shell

Big picture
• Computer system has: CPU, main memory, disk, I/O devices
• Turn on computer:
  – Bootstrap program already in ROM comes to life
  – Tells where to find the OS on disk; loads the OS
  – Transfers control to OS once loaded
• From time to time, control is "interrupted"
  – Examples?
• Memory hierarchy
  – Several levels of memory in use, from registers to tape
  – Closer to CPU: smaller, faster, more expensive
  – OS must decide who belongs where

Big picture (2)
• von Neumann program execution
  – Fetch, decode, execute, data access, write result
  – OS usually not involved unless there is a problem
• Compiling
  – 1 source file → 1 object file
  – "Link" object files to produce executable: 1 entire program → 1 executable file
  – Code may be optimized to please the OS
  – When you invoke a program, the OS calls a "loader" program that precedes execution
• I/O
  – Each device has a controller, a circuit containing registers and a memory buffer
  – Each controller is managed by a device driver (software)

2 modes
• When the CPU is executing instructions, it's nice to know if the instruction is on behalf of the OS
• OS should have the highest privileges → kernel mode
  – Some operations only available to OS
  – Examples?
• Users should have some restrictions → user mode
• A hardware bit can be set if a program is running in kernel mode
• Sometimes the user needs the OS to help out, so we perform a system call

Management topics
• What did we ask the OS to do during lab?
• File system
• Program vs. process
  – "Job" and "task" are synonyms of process
  – Starting, destroying processes
  – Process communication
  – Make sure 2 processes don't interfere with each other
• Multiprogramming
  – CPU should never be idle
  – Multitasking: give each job a short quantum of time to take turns
  – If a job needs I/O, give CPU to another job

More topics
• Scheduling: deciding the order to do the jobs
  – Detect system "load"
  – In a real-time system, jobs have deadlines. OS should know the worst-case execution time of jobs
• Memory hierarchy
  – Higher levels "bank" the lower levels
  – OS manages the RAM/disk decision
  – Virtual memory: actual size of RAM is invisible to user; allow the programmer to think memory is huge
• Allocate and deallocate heap objects
• Schedule disk ops and backups of data
CS 346 – Chapter 2

• OS services
• OS user interface
• System calls
• System programs
• How to make an OS
  – Implementation
  – Structure
  – Virtual machines
• Commitment
  – For next day, please finish chapter 2.

OS services
2 types:
• For the user's convenience
  – Shell
  – Running user programs
  – Doing I/O
  – File system
  – Detecting problems
• Internal/support
  – Allocating resources
  – System security
  – Accounting
    • Infamous KGB spy ring uncovered due to a discrepancy in billing of computer time at a Berkeley lab

User interface
• Command line = shell program
  – Parses commands from user
  – Supports redirection of I/O (stdin, stdout, stderr)
• GUI
  – Pioneered by Xerox PARC, made famous by Mac
  – Utilizes additional input devices such as mouse
  – Icons or hotspots on screen
• Hybrid approach
  – GUI allowing several terminal windows
  – Window manager

System calls
• "An interface for accessing an OS service within a computer program"
• A little lower level than an API, but similar
• Looks like a function call
• Examples
  – Performing any I/O request, because these are not defined by the programming language itself, e.g. read(file_ptr, str_buf_ptr, 80);  (see the C sketch below)
  – Assembly languages typically have a "syscall" instruction. When is it used? How?
• If many parameters, they may be put on the runtime stack

Types of system calls
• Controlling a process
• File management
• Device management
• Information
• Communication between processes
• What are some specific examples you'd expect to find?

System programs
• Also called system utilities
• Distinction between "system call" and "system program"
• Examples
  – Shell commands like ls, lp, ps, top
  – Text editors, compilers
  – Communication: e-mail, talk, ftp
  – Miscellaneous: cal, fortune
  – What are your favorites?
• Higher level software includes:
  – Spreadsheets, text formatters, etc.
  – But the boundary between "application" and "utility" software is blurry. A text formatter is a type of compiler!

OS design ideas
• An OS is a big program, so we should consider principles of systems analysis and software engineering
• In the design phase, need to consider policies and mechanisms
  – Policy = what should we do; should we do X
  – Mechanism = how to do X
  – Example: a way to schedule jobs (policy) versus what input is needed to produce a schedule, and how the schedule decision is specified (mechanism)

Implementation
• Originally in assembly
• Now usually in C (C++ if object-oriented)
• Still, some code needs to be in assembly
  – Some specific device driver routines
  – Saving/restoring registers
• We'd like to use a HLL as much as possible – why?
• Today's compilers produce very efficient code – what does this tell us?
• How to improve performance of an OS:
  – More efficient data structures, algorithms
  – Exploit HW and memory hierarchy
  – Pay attention to CPU scheduling and memory management

Kernel structure
• Possible to implement a minimal OS with a few thousand lines of code → monolithic kernel
  – Modularize like any other large program
  – After about 10K LOC, difficult to prove correctness
• Layered approach to managing the complexity
  – Layer 0 is the HW
  – Layer n is the user interface
  – Each layer makes use of routines and data structures defined at lower levels
  – # of layers difficult to predict: many subtle dependencies
  – Many layers → lots of internal system call overhead
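To make the system-call discussion above concrete, here is a minimal C sketch that performs I/O through raw UNIX system calls (open, read, write, close) rather than library routines. The file name and buffer size are arbitrary choices for illustration.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[80];
        int fd = open("notes.txt", O_RDONLY);   /* system call: open a file */
        if (fd < 0) {
            perror("open");                     /* the kernel reported failure */
            return 1;
        }

        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)  /* system call: read  */
            write(STDOUT_FILENO, buf, n);            /* system call: write */

        close(fd);                                   /* system call: close */
        return 0;
    }

Each of these calls traps into the kernel; the C library wrapper places the parameters in registers or on the runtime stack and then executes the "syscall" instruction mentioned above.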
Kernel structure (2)
• Microkernel
  – Kernel = minimal support for processes and memory management
  – (The rest of the OS is at user level)
  – Adding OS services doesn't require changing the kernel, so it's easier to modify the OS
  – The kernel must manage communication between user programs and the appropriate OS services (e.g. file system)
  – Microsoft gave up on the microkernel idea for Windows XP
• OO module approach
  – Components isolated (OO information hiding)
  – Used by Linux, Solaris
  – Like a layered approach with just 2 layers: a core and everything else

Virtual machine
• How to make 1 machine behave like many
• Give users the illusion they have access to real HW, distinct from other users
• Figure 2.17 levels of abstraction:
  – Processes / kernels / VMs / VM implementation / host HW
  As opposed to:
  – Processes / kernels / different machines
• Why do it?
  – To test multiple OSs on the same HW platform
  – Host machine's real HW protected from a virus in a VM bubble

VM implementation
• It's hard!
  – Need to painstakingly replicate every HW detail, to avoid giving away the illusion
  – Need to keep track of what each guest OS is doing (whether it's in kernel or user mode)
  – Each VM must interpret its assembly code – why? Is this a problem?
• Very similar concept: simulation
  – Often, all we are interested in is changing the HW, not the OS; for example, adding/eliminating the data cache
  – Write a program that simulates every HW feature, providing the OS with the expected behavior

CS 346 – Chapter 3

• What is a process
• Scheduling and life cycle
• Creation
• Termination
• Interprocess communication: purpose, how to do it
• Client-server: sockets, remote procedure call
• Commitment
  – Please read through section 3.4 by Wednesday and 3.6 by Friday.

Process
• Goal: to be able to run > 1 program concurrently
  – We don't have to finish one before starting another
  – Concurrent doesn't mean parallel
  – CPU often switches from one job to another
• Process = a program that has started but hasn't yet finished
• States:
  – New, Ready, Running, Waiting, Terminated
  – What transitions exist between these states?

Contents
• A process consists of:
  – Code ("text" section)
  – Program counter
  – Data section
  – Run-time stack
  – Heap-allocated memory
• A process is represented in the kernel by a Process Control Block, containing:
  – State
  – Program counter
  – Register values
  – Scheduling info (e.g. priority)
  – Memory info (e.g. bounds)
  – Accounting (e.g. time)
  – I/O info (e.g. which files open)
  – What is not stored here?

Scheduling
• Typically many processes are ready, but only 1 can run at a time.
  – Need to choose who's next from the ready queue
  – Can't stay running for too long!
  – At some point, a process needs to be switched out temporarily, back to the ready queue (Fig. 3.4)
• What happens to a process? (Fig. 3.7)
  – New process enters ready queue. At some point it can run.
  – After running awhile, a few possibilities:
    1. Time quantum expires. Go back to ready queue.
    2. Need I/O. Go to I/O queue, do I/O, re-enter ready queue!
    3. Interrupted. Handle interrupt, and go to ready queue.
  – Context switch overhead

Creation
• Processes can spawn other processes.
  – Parent / child relationship → tree
  – Book shows Solaris example: in the beginning, there was sched, which spawned init (the ancestor of all user processes), the memory manager, and the file manager.
  – Process IDs are unique integers (up to some max, e.g. 2^15)
• What should happen when a process is created?
  – OS policy on what resources the baby gets: system default, or copy parent's capabilities, or specify at its creation
  – What program does the child run? Same as parent, or a new one?
  – Does the parent continue to execute, or does it wait (i.e. block)?

How to create
• Unix procedure is typical…
• Parent calls fork( )
  – This creates a duplicate process.
  – fork( ) returns 0 for the child; a positive number for the parent; a negative number if error. (How could we have an error?)
• Next, we call exec( ) to tell the child what program to run.
  – Do this immediately after fork
  – Do it inside the if clause that corresponds to the case that we are inside the child!
• Parent can call wait( ) to go to sleep.
  – Not executing, not in ready queue
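A minimal C sketch of the fork/exec/wait pattern just described. Running "ls -l" is an arbitrary program choice, and error handling is abbreviated.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();          /* duplicate this process */

        if (pid < 0) {               /* negative: fork failed */
            perror("fork");
            return 1;
        } else if (pid == 0) {       /* zero: we are the child */
            execlp("ls", "ls", "-l", (char *)NULL);  /* replace child's program */
            perror("execlp");        /* only reached if exec failed */
            exit(1);
        } else {                     /* positive: we are the parent */
            int status;
            wait(&status);           /* sleep until the child terminates */
            printf("child %d exited with status %d\n", pid, WEXITSTATUS(status));
        }
        return 0;
    }

Note how the exec call sits inside the child's branch of the if, exactly as the slide advises.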
Termination
• Assembly programs end with a system call to exit( ).
  – An int value is returned to the parent's wait( ) function. This lets the parent know which child has just finished.
• Or, a process can be killed prematurely
  – Why?
  – Only the parent (or an ancestor) can kill another process – why this restriction?
• When a process dies, 2 possible policies:
  – OS can kill all descendants (rare)
  – Allow descendants to continue, but set the parent of the dead process to init

IPC examples
• Allowing concurrent access to information
  – Producer / consumer is a common paradigm
• Distributing work, as long as spare resources (e.g. CPU) are around
• A program may need the result of another program
  – IPC more efficient than running serially and redirecting I/O
  – A compiler may need the result of timing analysis in order to know which optimizations to perform
• Note: ease of programming is based on what the OS and programming language allow

2 techniques
• Shared memory
  – 2 processes have access to an overlapping area of memory
  – Conceptually easier to learn, but be careful!
  – OS overhead only at the beginning: get kernel permission to set up the shared region
• Message passing
  – Uses system calls, with the kernel as middle man – easier to code correctly
  – System call overhead for every message → we'd want the amount of data to be small
  – Definitely better when processes are on different machines
• Often, both approaches are possible on the system

Shared memory
• Usually forbidden to touch another process' memory area
• Each program must be written so that the shared memory request is explicit (via system call)
  – An overlapping "buffer" can be set up: a range of addresses. But there is no need for the buffer to be contiguous in memory with the existing processes.
  – Then the buffer can be treated like an array (of char)
• Making use of the buffer (p. 122)
  – insert( ) function
  – remove( ) function
  – Circular array… does the code make sense to you?

Shared memory (2)
• What could go wrong?... How to fix?
• Trying to insert into a full buffer
• Trying to remove from an empty buffer
• Sound familiar?
• Also: both trying to insert. Is this a problem?

Message passing
• Make continual use of system calls:
  – send( )
  – receive( )
• Direct or indirect communication?
  – Direct: send(process_num, the_message) – hard-coding the process we're talking to
  – Indirect: send(mailbox_num, the_message) – assuming we've set up a "mailbox" inside the kernel
• Flexibility: can have a communication link with more than 2 processes, e.g. 2 producers and 1 consumer
• Design issues in case we have multiple consumers
  – We could forbid it
  – Could be first-come-first-served

Synchronization
• What should we do when we send/receive a message?
• Block (or "wait"):
  – Go to sleep until counterpart acts.
  – If you send, sleep until the message is received by a process or mailbox.
  – If you receive, block until a message is available. How do we know?
• Don't block
  – Just keep executing. If they drop the baton it's their fault.
  – In case of receive( ), return null if there is no message (where do we look?)
• We may need some queue of messages (set up in the kernel) so we don't lose messages!

Buffer messages
• The message passing may be direct (to another specific process) or indirect (to a mailbox – no process explicitly stated in the call).
• But either way, we don't want to lose messages.
• Zero capacity: sender blocks until recipient gets the message
• Bounded capacity (common choice): sender blocks if the buffer is full
• Unbounded capacity: assume the buffer is infinite; never block when you send

Socket
• Can be used as an "endpoint of communication"
• Attach to a (software) port on a "host" computer connected to the Internet
  – 156.143.143.132:1625 means port # 1625 on the machine whose IP number is 156.143.143.132
  – Port numbers < 1024 are pre-assigned for "well known" tasks. For example, port 80 is for a Web server.
• With a pair of sockets, you can communicate between them.
• Generally used for remote I/O

Implementation
• Syntax depends on language.
• Server
  – Create socket object on some local port.
  – Wait for client to call. Accept connection.
  – Set up output stream for client.
  – Write data to client.
  – Close client connection. Go back to wait.
• Client
  – Create socket object to connect to server
  – Read input, analogous to file input or stdin
  – Close connection to server
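A minimal C sketch of the server steps just listed, using the BSD socket API. The port number (6013) and the message are arbitrary, and error checks are omitted.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void)
    {
        int server = socket(AF_INET, SOCK_STREAM, 0);   /* create socket */

        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(6013);                    /* some local port */

        bind(server, (struct sockaddr *)&addr, sizeof addr);
        listen(server, 5);                              /* wait for clients */

        for (;;) {
            int client = accept(server, NULL, NULL);    /* accept connection */
            const char *msg = "hello from the server\n";
            write(client, msg, strlen(msg));            /* write data to client */
            close(client);                              /* close; back to wait */
        }
    }

A client would follow the mirror-image steps: create a socket, connect( ) to the server's IP and port, read the data like file input, and close.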
Remote procedure call
• Useful application of inter-process communication (the message-passing version)
• Systematic way to make a procedure call between processes on the network
  – Reduce implementation details for the user
• Client wants to call a foreign function with some parameters
  – Tell kernel the server's IP number and the function name
  – 1st message: ask server which port corresponds with the function
  – 2nd message: send the function call with "marshalled" parameters
  – Server daemon listens for function call requests, and processes them
  – Client receives the return value
• OS should ensure the function call is successful (once)

CS 346 – Chapter 4

• Threads
  – How they differ from processes
  – Definition, purpose
  – Types
  – Support by kernel and programming language
  – Issues such as signals
  – User thread implementation: C and Java
• Commitment
  – For next day, please read chapter 4

Thread intro
• Also called "lightweight process"
• One process may have multiple threads of execution
• Allows a process to do 2+ things concurrently
  – Games
  – Simulations
• Even better: if you have 2+ CPUs, you can execute in parallel
• Multicore architecture → demand for multithreaded applications for speedup
• More efficient than using several concurrent processes

Threads
• A process contains:
  – Code, data, open files, registers, memory usage (stack + heap), program counter
• Threads of the same process share:
  – Code, data, open files
• What is unique to each thread?
• Can you think of an example of a computational algorithm where threads would be a great idea?
  – Splitting up the code
  – Splitting up the data
• Any disadvantages?

2 types of threads
• User threads
  – Can be managed / controlled by user
  – Need existing programming language API support: POSIX threads in C, Java threads
• Kernel threads
  – Management done by the kernel
• Possible scenarios
  – OS doesn't support threading
  – OS supports threads, but only at kernel level – you have no direct control, except possibly by system call
  – User can create thread objects and manipulate them. These objects map to "real" kernel threads.

Multithreading models
• Many-to-one: user can create several thread objects, but in reality the kernel only gives you one. Multithreading is an illusion.
• One-to-one: each user thread maps to 1 real kernel thread. Great, but costly to the OS. There may be a hard limit on the # of live threads.
• Many-to-many: a happy compromise. We have multithreading, but the number of true threads may be less than the # of thread objects we created.
  – A variant of this model, "two-level", allows the user to designate a thread as being bound to one kernel thread.

Thread issues
• What should the OS do if a thread calls fork( )?
  – Can duplicate just the calling thread
  – Can duplicate all threads in the process
• exec( ) is designed to replace the entire current process
• Cancellation – kill a thread before it's finished
  – "Asynchronous cancellation" = kill now. But the thread may be in the middle of an update, or it may have acquired resources. You may have noticed that Windows sometimes won't let you delete a file because it thinks it's still open.
  – "Deferred cancellation": the thread periodically checks to see if it's time to quit. Graceful exit.

Signals
• Reminiscent of exceptions in Java
• Occurs when the OS needs to send a message to a process
  – Some defined event generates a signal
  – OS delivers the signal
  – Recipient must handle the signal. The kernel defines a default handler – e.g. kill the process. Or the user can write a specific handler.
• Types of signals
  – Synchronous: something in this program caused the event
  – Asynchronous: the event was external to my program

Signals (2)
• But what if the process has multiple threads? Who gets the signal? For a given signal, choose among 4 possibilities:
  – Deliver the signal to the 1 appropriate thread
  – Deliver the signal to all threads
  – Have the signal indicate which threads to contact
  – Designate a thread to receive all signals
• Rules of thumb…
  – Synchronous event → just deliver to the 1 thread
  – User hit ctrl-C → kill all threads

Thread pool
• Like a motor pool
• When a process starts, it can create a set of threads that sit around and wait for work
• Motivation – overhead in creating/destroying
  – We can set a bound on the total number of threads, and avoid overloading the system later
• How many threads?
  – User can specify
  – Kernel can base it on available resources (memory and # of CPUs)
  – Can dynamically change if necessary

POSIX threads
• aka "Pthreads"
• C language
• Commonly seen in UNIX-style environments:
  – Mac OS, Linux, Solaris
• POSIX is a set of standards for OS system calls
  – Thread support is just one aspect
• POSIX provides an API for thread creation and synchronization
• The API specifies the behavior of the thread functionality, but not the low-level implementation

Pthread functions
• pthread_attr_init
  – Initialize thread attributes, such as:
  – Schedule priority
  – Stack size
  – State
• pthread_create
  – Start a new thread inside the process.
  – We specify what function to call when the thread starts, along with the necessary parameter
  – The thread is due to terminate when its function returns
• pthread_join
  – Allows us to wait for a child thread to finish
Example code

    #include <pthread.h>
    #include <stdio.h>

    int sum;                      /* shared; computed by the thread */

    void *fun(void *param);      /* the thread begins in this function */

    int main(int argc, char *argv[])
    {
        pthread_t tid;           /* thread identifier */
        pthread_attr_t attr;     /* thread attributes */

        pthread_attr_init(&attr);                   /* default attributes */
        pthread_create(&tid, &attr, fun, argv[1]);  /* start the thread   */
        pthread_join(tid, NULL);                    /* wait for it        */
        printf("%d\n", sum);
        return 0;
    }

    void *fun(void *param)
    {
        // compute a sum; store it in the global variable
        ...
    }

Java threads
• Managed by the Java virtual machine
• Two ways to create threads:
  1. Create a class that extends the Thread class
     – Put code inside public void run( )
  2. Implement the Runnable interface
     – public void run( )
• Parent thread (e.g. in main( ) …)
  – Create thread object – just binds the name of the thread
  – Call start( ) – creates the actual running thread, which goes to run( )
• See book example

Skeletons

    class Worker extends Thread {
        public void run() {
            // do stuff
        }
    }

    public class Driver {
        // in main method:
        Worker w = new Worker();
        w.start();
        // ... continue/join
    }

    class Worker2 implements Runnable {
        public void run() {
            // do stuff
        }
    }

    public class Driver2 {
        // in main method:
        Runnable w2 = new Worker2();
        Thread t = new Thread(w2);
        t.start();
        // ... continue/join
    }

Java thread states
• This will probably sound familiar!
• New
  – From here, go to "runnable" at the call to start( )
• Runnable
  – Go to "blocked" if we need I/O or are going to sleep
  – Go to "dead" when we exit run( )
  – Go to "waiting" if we call join( ) for a child thread
• Blocked
  – Go to "runnable" when I/O is serviced
• Waiting
• Dead

CS 346 – Sect. 5.1-5.2

• Process synchronization
  – What is the problem?
  – Criteria for a solution
  – Producer / consumer example
  – General problems difficult because of subtleties

Problem
• It's often desirable for processes/threads to share data
  – Can be a form of communication
  – One may need data being produced by the other
• Concurrent access → possible data inconsistency
• Need to "synchronize"…
  – HW or SW techniques to ensure orderly execution
• Bartender & drinker
  – Bartender takes empty glass and fills it
  – Drinker takes full glass and drinks contents
  – What if drinker is overeager and starts drinking too soon?
  – What if drinker is not finished when bartender returns?
  – Must ensure we don't spill on the counter.

Key concepts
• Critical section = code containing access to shared data
  – Looking up a value or modifying it
• Race condition = situation where the outcome of code depends on the order in which processes take turns
  – The correctness of the code should not depend on scheduling
• Simple example: producer / consumer code, p. 204
  – Producer adds data to buffer and executes ++count;
  – Consumer grabs data and executes --count;
  – Assume count is initially 5.
  – Let's see what could happen…

Machine code
• Producer's ++count becomes:
    1  r1 = count
    2  r1 = r1 + 1
    3  count = r1
• Consumer's --count becomes:
    4  r2 = count
    5  r2 = r2 – 1
    6  count = r2
• Does this code work? Yes, if we execute in order 1,2,3,4,5,6 or 4,5,6,1,2,3 – see why? The scheduler may have other ideas!

Alternate schedules
• Schedule A: 1, 2, 4, 5, 3, 6
    r1 = count;  r1 = r1 + 1;  r2 = count;  r2 = r2 – 1;  count = r1;  count = r2
• Schedule B: 1, 2, 4, 5, 6, 3
    r1 = count;  r1 = r1 + 1;  r2 = count;  r2 = r2 – 1;  count = r2;  count = r1
• What are the final values of count?
• How could these situations happen?
• If the updating of a single variable is nontrivial, you can imagine how critical the general problem is!
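This race is easy to reproduce with Pthreads. In the sketch below (the iteration count is arbitrary), the final total is usually well short of 2,000,000, because each ++count is the three machine steps shown above and the threads' steps interleave.

    #include <pthread.h>
    #include <stdio.h>

    long count = 0;                /* shared, unprotected */

    void *worker(void *arg)
    {
        for (int i = 0; i < 1000000; i++)
            count++;               /* load, add 1, store: can interleave */
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("count = %ld (expected 2000000)\n", count);
        return 0;
    }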
Solution criteria
• How do we know we have solved a synchronization problem? 3 criteria:
• Mutual exclusion
  – Only 1 process may be inside its critical section at any one time.
  – Note: for simplicity we're assuming there is one zone of shared data, so each process using it has 1 critical section.
• Progress
  – Don't hesitate to enter your critical section if no one else is in theirs.
  – Avoid an overly conservative solution
• Bounded waiting
  – There is a limit on the # of times you may access your critical section if another process is still waiting to enter theirs.
  – Avoid starvation

Solution skeleton

    while (true) {
        Seek permission to enter critical section
        Do critical section
        Announce done with critical section
        Do non-critical code
    }

• BTW, an easy solution is to forbid preemption.
  – But this power can be abused.
  – Identifying the critical section can avoid preemption for a shorter period of time.

CS 346 – Sect. 5.3-5.7

• Process synchronization
  – A useful example is the "producer-consumer" problem
  – Peterson's solution
  – HW support
  – Semaphores
  – "Dining philosophers"
• Commitment
  – Compile and run semaphore code from os-book.com

Peterson's solution
… to the 2-process producer/consumer problem (p. 204):

    while (true) {
        ready[ me ] = true
        turn = other
        while (ready[ other ] && turn == other)
            ;
        Do critical section
        ready[ me ] = false
        Do non-critical code
    }

    // Don't memorize, but think: Why does this ensure mutual exclusion?
    // What assumptions does this solution make?

HW support
• As we mentioned before, we can disable interrupts
  – No one can preempt me.
  – Disadvantages?
• The usual way to handle synchronization is by careful programming (SW)
• We require some atomic HW operations
  – A short sequence of assembly instructions guaranteed to be non-interruptable
  – This keeps the non-preemption duration to an absolute minimum
  – Access to "lock" variables visible to all threads
  – e.g. swapping the values in 2 variables
  – e.g. get and set some value (aka "test and set")

Semaphore
• Dijkstra's solution to the mutual exclusion problem
• Semaphore object
  – Integer value attribute (> 0 means resource is available)
  – acquire and release methods
• Semaphore variants: binary and counting
  – Binary semaphore aka "mutex" or "mutex lock"

    acquire() {
        if (value <= 0)
            wait/sleep
        --value
    }

    release() {
        ++value    // wake a sleeper
    }

Deadlock / starvation
• After we solve a mutual exclusion problem, we also need to avoid other problems
  – Another way of expressing our synchronization goals
• Deadlock: 2+ processes waiting for an event that can only be performed by one of the waiting processes
  – The opposite of progress
• Starvation: being blocked for an indefinite or unbounded amount of time
  – e.g. potentially stuck on a semaphore wait queue forever

Bounded-buffer problem
• aka "producer-consumer". See figures 5.9 – 5.10
• Producer class
  – run( ) to be executed by a thread
  – Periodically calls insert( )
• Consumer class
  – Also to be run by a thread
  – Periodically calls remove( )
• BoundedBuffer class
  – Creates semaphores (mutex, empty, full): why 3? Initial values: mutex = 1, empty = SIZE, full = 0
  – Implements insert( ) and remove( ). These methods contain calls to the semaphore operations acquire( ) and release( ).

Insert & delete

    public void insert(E item) {
        empty.acquire();
        mutex.acquire();
        // add an item to the buffer...
        mutex.release();
        full.release();
    }

    public E remove() {
        full.acquire();
        mutex.acquire();
        // remove item ...
        mutex.release();
        empty.release();
    }

• What are we doing with the semaphores?
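For comparison with the Java methods above, here is a C sketch of the same three-semaphore pattern using POSIX semaphores. SIZE, the int buffer, and the function names are illustrative; thread creation and error checks are omitted.

    #include <semaphore.h>

    #define SIZE 8

    int buffer[SIZE];
    int in = 0, out = 0;

    sem_t mutex;   /* initialized to 1: protects the buffer itself */
    sem_t empty;   /* initialized to SIZE: counts free slots       */
    sem_t full;    /* initialized to 0: counts filled slots        */

    void init(void)
    {
        sem_init(&mutex, 0, 1);
        sem_init(&empty, 0, SIZE);
        sem_init(&full, 0, 0);
    }

    void insert(int item)
    {
        sem_wait(&empty);            /* block if no free slot   */
        sem_wait(&mutex);            /* enter critical section  */
        buffer[in] = item;           /* circular-array insert   */
        in = (in + 1) % SIZE;
        sem_post(&mutex);
        sem_post(&full);             /* one more item available */
    }

    int remove_item(void)
    {
        sem_wait(&full);             /* block if buffer empty   */
        sem_wait(&mutex);
        int item = buffer[out];
        out = (out + 1) % SIZE;
        sem_post(&mutex);
        sem_post(&empty);            /* one more free slot      */
        return item;
    }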
Readers/writers problem
• More general than producer-consumer
• We may have multiple readers and writers of shared info
• Mutual exclusion requirement: must ensure that writers have exclusive access
• It's okay to have multiple readers reading
• See example solution, Fig. 5.10 – 5.12
• Reader and Writer threads periodically want to execute.
  – Operations guarded by semaphore operations
• Database class (analogous to BoundedBuffer earlier)
  – readerCount
  – 2 semaphores: one to protect the database, one to protect the updating of readerCount

Solution outline

    Reader:
        mutex.acquire();
        ++readerCount;
        if (readerCount == 1)    // first reader locks out writers
            db.acquire();
        mutex.release();
        // READ NOW
        mutex.acquire();
        --readerCount;
        if (readerCount == 0)    // last reader lets writers back in
            db.release();
        mutex.release();

    Writer:
        db.acquire();
        // WRITE NOW
        db.release();

Example output

    writer 0 wants to write.
    writer 0 is writing.
    writer 0 is done writing.
    reader 2 wants to read.
    writer 1 wants to write.
    reader 0 wants to read.
    reader 1 wants to read.
    Reader 2 is reading. Reader count = 1
    Reader 0 is reading. Reader count = 2
    Reader 1 is reading. Reader count = 3
    writer 0 wants to write.
    Reader 1 is done reading. Reader count = 2
    Reader 2 is done reading. Reader count = 1
    Reader 0 is done reading. Reader count = 0
    writer 1 is writing.
    reader 0 wants to read.
    writer 1 is done writing.

CS 346 – Sect. 5.7-5.8

• Process synchronization
  – "Dining philosophers" (Dijkstra, 1965)
  – Monitors

Dining philosophers
• Classic OS problem
  – Many possible solutions, depending on how foolproof you want the solution to be
• Simulates a synchronization situation with several resources and several potential consumers.
• What is the problem?
• Model chopsticks with semaphores – available or not.
  – Initialize each to 1
• Achieve mutual exclusion:
  – Acquire left and right chopsticks (numbered i and i+1)
  – Eat
  – Release left and right chopsticks
• What could go wrong?

DP (2)
• What can we say about this solution? (A C sketch after this section shows another fix.)

    mutex.acquire();
    Acquire 2 neighboring forks
    Eat
    Release the 2 forks
    mutex.release();

• Other improvements:
  – Ability to see if either neighbor is eating
  – May make more sense to associate semaphores with the philosophers, not the forks. A philosopher should block if it cannot acquire both forks.
  – When done eating, wake up either neighbor if necessary.

Monitor
• Higher level than a semaphore
  – Semaphore coding can be buggy
• Programming language construct
  – Special kind of class / data type
  – Hides implementation detail
• Automatically ensures mutual exclusion
  – Only 1 thread may be "inside" the monitor at any one time
  – Attributes of the monitor are the shared variables
  – Methods in the monitor deal with a specific synchronization problem. This is where you access shared variables.
  – Constructor can initialize the shared variables
• Supported by a number of HLLs
  – Concurrent Pascal, Java, C#

Condition variables
• With a monitor, you get mutual exclusion
• If you also want to ensure against deadlock or starvation, you need condition variables
• Special data type associated with monitors
• Declared with the other shared attributes of the monitor
• How to use them:
  – No attribute value to manipulate. 2 functions only:
  – Wait: if you call this, you go to sleep. (Enter a queue)
  – Signal: means you release a resource, waking up a thread waiting for it.
  – Each condition variable has its own queue of waiting threads/processes.
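As one concrete remedy for the dining-philosophers deadlock above (not the book's monitor solution, which follows), here is a minimal Pthreads sketch in which each philosopher grabs the lower-numbered chopstick first. Imposing a total order on the forks breaks the circular wait; N = 5 and the function names are illustrative.

    #include <pthread.h>

    #define N 5
    pthread_mutex_t chopstick[N];   /* one lock per chopstick */

    void setup(void)
    {
        for (int i = 0; i < N; i++)
            pthread_mutex_init(&chopstick[i], NULL);
    }

    void philosopher(int i)
    {
        int left   = i;
        int right  = (i + 1) % N;
        int first  = left < right ? left : right;   /* lower number first */
        int second = left < right ? right : left;

        for (;;) {
            /* think... */
            pthread_mutex_lock(&chopstick[first]);
            pthread_mutex_lock(&chopstick[second]);
            /* eat... */
            pthread_mutex_unlock(&chopstick[second]);
            pthread_mutex_unlock(&chopstick[first]);
        }
    }

Because every philosopher acquires in ascending order, no cycle of hold-and-wait can form.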
Signal( )
• A subtle issue for signal…
• In a monitor, only 1 thread may be running at a time.
• Suppose P calls x.wait( ). It's now asleep.
• Later, Q calls x.signal( ) in order to yield the resource to P.
• What should happen? 3 design alternatives:
  – "Blocking signal" – Q immediately goes to sleep so that P can continue.
  – "Nonblocking signal" – P does not actually resume until Q has left the monitor.
  – Compromise – Q immediately exits the monitor.
• Whoever gets to continue running may have to go to sleep on another condition variable.

CS 346 – Sect. 5.9

• Process synchronization
  – "Dining philosophers" monitor solution
  – Java synchronization
  – Atomic operations

Monitor for DP
• Figure 5.18 on page 228
• Shared variable attributes:
  – state for each philosopher
  – "self" condition variable for each philosopher
• takeForks( )
  – Declare myself hungry
  – See if I can get the forks. If not, go to sleep.
• returnForks( )
  – Why do we call test( )?
• test( )
  – If I'm hungry and my neighbors are not eating, then I will eat and leave the monitor.

Synch in Java
• "Thread safe" = data remain consistent even if we have concurrently running threads
• If waiting for a (semaphore) value to become positive:
  – Busy waiting loop
  – Better: Java provides Thread.yield( ): "block me"
• But even "yielding" ourselves can cause livelock
  – Continually attempting an operation that fails
  – e.g. you wait for another process to run, but the scheduler keeps scheduling you instead because you have higher priority

Synchronized
• Java's answer to synchronization is the keyword synchronized
  – A qualifier for a method, as in public synchronized void funName(params) { …
• When you call a synchronized method belonging to an object, you obtain a "lock" on that object, e.g. sem.acquire();
• The lock is automatically released when you exit the method.
• If you try to call a synchronized method, and the object is already locked by another thread, you are blocked and sent to the object's entry set.
  – Not quite a queue. The JVM may arbitrarily choose who gets in next.

Avoid deadlock
• Producer/consumer example
  – Suppose the buffer is full. Producer now running.
  – Producer calls insert( ). Successfully enters the method → has the lock on the buffer. Because the buffer is full, it calls Thread.yield( ) so that the consumer can eat some data.
  – Consumer wakes up, but cannot enter the remove( ) method because the producer still has the lock → we have deadlock.
• The solution is to use wait( ) and notify( ).
  – When you wait, you release the lock, go to sleep (blocked), and enter the object's wait set. Not to be confused with the entry set.
  – When you notify, the JVM picks a thread T from the wait set and moves it to the entry set. T is now eligible to run, and continues from the point after its call to wait( ).

notifyAll
• Put every waiting thread into the entry set.
  – Good idea if you think > 1 thread is waiting.
  – Now all these threads compete for the next use of the synchronized object.
• Sometimes just calling notify can lead to deadlock
  – Book's doWork example
  – Threads are numbered
  – doWork has a shared variable turn. You can only do work here if it's your turn: if turn == your number.
  – Thread 3 is doing work, sets turn to 4, and then leaves.
  – But thread 4 is not in the wait set. All other threads will go to sleep.

More Java support
• See: java.util.concurrent
• Built-in ReentrantLock class
  – Create an object of this class; call its lock and unlock methods to access your critical section (p. 282)
  – Allows you to give priority to waiting threads
• Condition interface (condition variable)
  – Meant to be used with a lock. What is the goal?
  – await( ) and signal( )
• Semaphore class
  – acquire( ) and release( )
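The same wait/signal protocol is available in C through POSIX condition variables, which pair with a mutex exactly the way a monitor's condition variables pair with the monitor lock. A minimal sketch; the "ready" flag is an illustrative condition.

    #include <pthread.h>

    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    int ready = 0;                       /* the shared condition */

    void consumer_side(void)
    {
        pthread_mutex_lock(&lock);
        while (!ready)                   /* re-check after every wakeup */
            pthread_cond_wait(&cond, &lock);  /* releases lock while asleep */
        /* ... use the shared data ... */
        pthread_mutex_unlock(&lock);
    }

    void producer_side(void)
    {
        pthread_mutex_lock(&lock);
        ready = 1;
        pthread_cond_signal(&cond);      /* wake one waiter */
        pthread_mutex_unlock(&lock);
    }

Note that the wait sits in a while loop: this is the "nonblocking signal" design above, where the signaled thread does not resume until the signaler has released the lock, so the condition must be re-tested.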
Atomic operations
• Behind the scenes, need to make sure instructions are performed in an appropriate order
• "Transaction" = 1 single logical function performed by a thread
  – In this case, involving shared memory
  – We want it to run atomically
• As we perform individual instructions, things might go smoothly or not
  – If all OK, then commit
  – If not, abort and "roll back" to an earlier state of the computation
• This is easier if we have fewer instructions in a row to do

Keeping the order

    Schedule 1 (serial):
      Transaction 1: read(A) write(A) read(B) write(B)
      Transaction 2:                                   read(A) write(A) read(B) write(B)

    Schedule 2 (interleaved):
      Transaction 1: read(A) write(A)
      Transaction 2:                  read(A) write(A)
      Transaction 1:                                   read(B) write(B)
      Transaction 2:                                                    read(B) write(B)

• Are these two schedules equivalent? Why?

CS 346 – Chapter 6

• CPU scheduling
  – Characteristics of jobs
  – Scheduling criteria / goals
  – Scheduling algorithms
  – System load
  – Implementation issues
  – Real-time scheduling

Schedule issues
• Multiprogramming is good! → better CPU utilization
• CPU burst concept
  – Jobs typically alternate between work and wait
  – Fig. 6.2: the distribution has a long tail on the right.

General questions
• How or when does a job enter the ready queue?
• How much time can a job use the CPU?
• Do we prioritize jobs?
• Do we pre-empt jobs?
• How do we measure overall performance?

Scheduler
• Makes short-term decisions
  – When? Whenever a job changes state (becomes ready, needs to wait, finishes)
  – Selects a job on the ready queue
  – The dispatcher can then do the "context switch" to give the CPU to the new job
• Should we preempt?
  – Non-preemptive = job continues to execute until it has to wait or finishes
  – Preemptive = job may be removed from the CPU while doing work!
  – When you preempt: need to leave the CPU "gracefully". The job may be in the middle of a system call or modifying shared data. Often we let that operation complete.

Scheduling criteria
• CPU utilization = what % of time the CPU is executing instructions
• Throughput = # or rate of jobs completed in some time period
• Turnaround time = (finish time) – (request time)
• Waiting time = how long spent in the ready state
  – Confusing name!
• Response time = how long after the request a job begins to produce output
• Usually we want to optimize the "average" of each measure, e.g. reduce average turnaround time.

Some scheduling algorithms
• First-come, first-served
• Round robin
  – Like FCFS, but each job has a limited time quantum
• Shortest job next
• We use a Gantt chart to view and evaluate a schedule
  – e.g. compute average turnaround time
• Often the key question is: in what order do we execute the jobs?
• Let's compare FCFS and SJN…

Example 1

    Process number   Time of request   Execution time needed
          1                 0                   20
          2                 5                   30
          3                10                   40
          4                20                   10

• First-come, first-served:
  – Process 1 can execute from t=0 to t=20
  – Process 2 can execute from t=20 to t=50
  – Process 3 can execute from t=50 to t=90
  – Process 4 can execute from t=90 to t=100
• We can enter this info as extra columns in the table.
• What is the average turnaround time? (The sketch below computes it.)
• What if we tried Shortest Job Next?

Example 2

    Process number   Time of request   Execution time needed
          1                 0                   10
          2                30                   30
          3                40                   20
          4                50                    5

• Note that it's possible to have idle time.

System load
• A measure of how "busy" the CPU is
• At an instant: how many tasks are currently running or ready.
  – If load > 1, the system is "overloaded", and work is backing up.
• Typically reported as an average over the last 1, 5, or 15 minutes.
• Based on the schedule, we can calculate average load as well as maximum (peak) load.
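The FCFS numbers in Example 1 can be computed mechanically. A small C sketch, with the job data copied from the table above:

    #include <stdio.h>

    int main(void)
    {
        int request[] = {0, 5, 10, 20};    /* arrival times   */
        int exec[]    = {20, 30, 40, 10};  /* CPU time needed */
        int n = 4, t = 0;
        double total_turnaround = 0;

        for (int i = 0; i < n; i++) {      /* FCFS: run in arrival order */
            if (t < request[i])
                t = request[i];            /* CPU idle until job arrives */
            int start = t, finish = t + exec[i];
            int turnaround = finish - request[i];
            printf("job %d: start %2d  finish %3d  turnaround %3d\n",
                   i + 1, start, finish, turnaround);
            total_turnaround += turnaround;
            t = finish;
        }
        printf("average turnaround = %.2f\n", total_turnaround / n);
        return 0;
    }

Its output reproduces the start/finish times listed above; the average turnaround works out to (20 + 45 + 80 + 80) / 4 = 56.25.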
Example 1

    Job #   Request   Exec   Start   Finish
      1        0       20      0       20
      2        5       30     20       50
      3       10       40     50       90
      4       20       10     90      100

• "Request time" aka "arrival time"
• The FCFS schedule can also be depicted this way (one cell per 5 time units; X = executing, R = ready, . = not yet arrived):

    Job 1:  X X X X
    Job 2:  . R R R X X X X X X
    Job 3:  . . R R R R R R R R X X X X X X X X
    Job 4:  . . . . R R R R R R R R R R R R R R X X

• What can we say about the load?

Example 2

    Job #   Request   Exec   Start   Finish
      1        0       10      0       10
      2       30       30     30       60
      3       40       20     65       85
      4       50        5     60       65

• The SJN schedule can be depicted this way:

    Job 1:  X X
    Job 2:  . . . . . . X X X X X X
    Job 3:  . . . . . . . . R R R R R X X X X
    Job 4:  . . . . . . . . . . R R X

• Load?

Preemptive SJN
• If a new job arrives with a shorter execution time (CPU burst length) than the currently running process, preempt!
• Could also call it "shortest remaining job next"
• Let's redo the previous example allowing preemption
  – Job #1 is unaffected.
  – Job #2 would have run from 30 to 60, but …

    Job #   Request   Exec   Start   Finish
      1        0       10      0       10
      2       30       30
      3       40       20
      4       50        5

  – Does preemption reduce average turnaround time? Load?

Estimating time
• Some scheduling algorithms like SJN need a job's expected CPU time
• We're interested in scheduling bursts of CPU time, not literally the entire job.
• The OS doesn't really know in advance how much of a "burst" will be needed. Instead, we estimate.
• Exponential averaging method. We predict the next CPU burst will take this long:
    p(n+1) = a*t(n) + (1 – a)*p(n),   where t(n) = actual time of the nth burst
• The formula allows us to weight recent vs. long-term history.
  – What if a = 0 or 1?

Estimating time (2)
• p(n+1) = a*t(n) + (1 – a)*p(n)
• Why is it called "exponential"? It becomes clearer if we substitute all the way back to the first burst.
• p(1) = a*t(0) + (1 – a)*p(0)
• p(2) = a*t(1) + (1 – a)*p(1)
       = a*t(1) + (1 – a)*[a*t(0) + (1 – a)*p(0)]
       = a*t(1) + (1 – a)*a*t(0) + (1 – a)^2 * p(0)
• A general formula for p(n+1) will eventually contain terms of the form (1 – a) raised to various powers.
  – In practice, we just look at the previous actual vs. the previous prediction
• Book's example, Figure 6.3: the prediction eventually converges to correct recent behavior.
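The predictor is one line of code. A sketch (the variable names are illustrative):

    /* p(n+1) = a*t(n) + (1-a)*p(n). With a = 0 the new burst is ignored;
       with a = 1 only the most recent burst counts. */
    double next_prediction(double a, double actual_burst, double prev_prediction)
    {
        return a * actual_burst + (1.0 - a) * prev_prediction;
    }

For example, with a = 0.5, p(0) = 10, and actual bursts 6, 4, 6, the successive predictions are p(1) = 8, p(2) = 6, p(3) = 6 – converging toward recent behavior, as in Figure 6.3.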
Priority scheduling
• SJN is a special case of a whole class of scheduling algorithms that assign priorities to jobs.
• Each job has a priority value:
  – Convention: low number = "high" priority
• SJN: priority = next predicted burst time
• Starvation: some "low priority" jobs may never execute
  – How could this happen?
• Aging: modify SJN so that while a job waits, it gradually "increases" its priority so it won't starve.

Round robin
• Each job takes a short turn at the CPU
• Commonly used, easy for the OS to handle
• Time quantum typically 10-100 ms – it's constant
• Choice of time quantum has a minor impact on turnaround time (Figure 6.5)
  – Can re-work an earlier example
• Questions to think about:
  – If there are N jobs, what is the maximum wait time before you can start executing?
  – What happens if the time quantum is very large?
  – What happens if the time quantum is very short?

Implementation issues
• Multi-level ready queue
• Threads
• Multi-processor scheduling

Multi-level queue
• We can assign jobs to different queues based on their purpose or priority
• Foreground / interactive jobs may deserve high priority to please the user
  – Also: real-time tasks
• Background / routine tasks can be given lower priority
• Each queue can have its own scheduling regime, e.g. round robin instead of SJN
  – Interactive jobs may have unpredictable burst times
• Key issue: need to schedule among the queues themselves. How?

Scheduling among queues
• Classify jobs according to purpose → priority
• Priority-based queue scheduling
  – Can't run any Priority 2 job until all Priority 1 jobs are done.
  – While running a Priority 2 job, can preempt if a Priority 1 job arrives.
  – Starvation
• Round robin with a different time quantum for each queue
• Time share for each queue
  – Decreasing % of time for lower priorities
• Or… multi-level feedback queue (pp. 275-277)
  – All jobs enter at Priority 0. Given a short time quantum.
  – If not done, enter the queue for Priority 1 jobs. Longer quantum next time.

Thread scheduling
• The OS schedules "actual" kernel-level threads
• The thread library must handle user threads
  – One-to-one model – easy: each user thread is already a kernel thread. Direct system call.
  – Many-to-many or many-to-one models:
    • The thread library has 1 or a small number of kernel threads available.
    • The thread library must decide when a user thread should run on a true kernel thread.
    • The programmer can set a priority for the thread library to consider. In other words, threads of the same process are competing among themselves.

Multi-processing
• More issues to address, more complex overall
• Homogeneous system = identical processors; a job can run on any of them
• Asymmetric approach = allocate 1 processor for the OS, all others for user tasks
  – This "master server" makes decisions about what jobs run on the other processors
• Symmetric approach (SMP) = 1 scheduler for each processor, usually a separate ready queue for each (but could have a common queue for all)
• Load balancing: periodically see if we should "pull" or "push" jobs

Affinity
• When switched out, a job may want to return next time to the same processor as before
  – Why desirable?
  – An affinity policy may be "soft" or "hard".
  – Soft = OS will try, but not guarantee. Why might an OS prefer to migrate a process to a different processor?
• Generalized concept: processor set
  – For each process, maintain a list of processors it may run on
• The memory system can exploit affinity, allocating more memory that is closer to the favorite CPU. (Fig. 6.9)

Multicore processor
• Conceptually similar to multiprocessors
  – Place multiple "processor cores" on the same chip
  – Faster, consume less power
  – OS treats each core like a unique CPU
• However, the cores often share cache memory
  – Leads to more "cache misses" → jobs spend more time stalled waiting for instructions or data to arrive
  – OS can allocate 2 threads to the same core, to increase processor utilization
  – Fig. 6.11 shows an idealized situation. What happens in general?

Real-time scheduling
• Real-time scheduling
  – Earliest Deadline First
  – Rate Monotonic
• What is this about?
  – Primary goal is to avoid missing deadlines. Other goals may include having response times that are low and consistent.
  – We're assuming jobs are periodic, and the deadline of a job is the end of its period

Real-time systems
• Specialized operating system
• All jobs potentially have a deadline
  – Correctness of operation depends on meeting deadlines, in addition to a correct algorithm
  – Often jobs are periodic; some may be aperiodic/sporadic
• Hard real-time = missing a deadline is not acceptable
• Soft real-time = a deadline miss is not the end of the world, but try to minimize misses
  – Number of acceptable deadline misses is a design parameter
  – We try to measure Quality of Service (QoS)
  – Examples?
• Used in defense, factories, communications, multimedia; embedded in appliances

Features
• A real-time system may be used to control a specific device
  – Opening a bomb bay door
  – When to release chocolate into a vat
• Host device typically very small and lacks features of a PC, greatly simplifying OS design
  – Single user or no user
  – Little or no memory hierarchy
  – Simple instruction set (or not!)
  – No disk drive, monitor
  – Cheap to manufacture, mass produce

Scheduling
• Most important issue in real-time systems is CPU scheduling
• System needs to know the WCET (worst-case execution time) of jobs
• Jobs are given priority based on their timing/deadline needs
• Jobs may be pre-empted
• Kernel jobs (implemented system calls) contain many possible preemption points at which they may be safely suspended
• Want to minimize latency
  – System needs to respond quickly to an external event, such as a change in temperature
  – Interrupt must have minimum overhead – how to measure it?

EDF
• Given a set of jobs
  – Need to know the period and execution time of each
  – Each job contributes to the CPU's utilization: execution time divided by the period
  – If the total utilization of all jobs > 1, no schedule is possible!
• At each scheduling checkpoint, choose the job with the earliest deadline.
  – A scheduling checkpoint occurs at t = 0, when a job begins a period or is finished, or when a new job arrives in the system
  – If no new jobs enter the system, EDF is non-preemptive
  – Sometimes the CPU is idle
• Need to compute the schedule one dynamic job at a time until you reach the LCM of the job periods
  – Can predict a deadline miss, if any

EDF example
• Suppose we have 2 jobs, A and B, with periods 10 and 15, and execution times 5 and 6.
• At t = 0, we schedule A because its deadline is earlier (10 < 15).
• At t = 5, A is finished. We can now schedule B.
• At t = 11, B is finished. A has already started a new period, so we can schedule it immediately.
• At t = 16, A is finished. B has already started a new period, so schedule it.
• At t = 22, B is finished. Schedule A.
• At t = 27, A is finished, and the CPU is idle until t = 30.
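The walkthrough above can be checked mechanically. Below is a small C sketch (not from the book) that simulates EDF one time unit at a time for these two jobs, up to the LCM of the periods. One assumption is the tie-breaking rule: on equal deadlines the currently running job is not preempted, which matches the trace above.

    #include <stdio.h>

    int main(void)
    {
        int period[]   = {10, 15};
        int exec[]     = {5, 6};
        int left[]     = {5, 6};       /* remaining work this period      */
        int deadline[] = {10, 15};     /* current period's deadline       */
        int running    = -1;           /* tie-break: don't preempt on tie */

        for (int t = 0; t < 30; t++) { /* 30 = LCM of the periods         */
            int pick = -1;
            for (int j = 0; j < 2; j++)          /* earliest deadline     */
                if (left[j] > 0 && (pick < 0 || deadline[j] < deadline[pick]
                       || (deadline[j] == deadline[pick] && j == running)))
                    pick = j;

            printf("t=%2d: %s\n", t, pick < 0 ? "idle" : (pick ? "B" : "A"));
            if (pick >= 0)
                left[pick]--;
            running = pick;

            for (int j = 0; j < 2; j++)
                if ((t + 1) % period[j] == 0) {  /* job begins new period */
                    if (left[j] > 0)
                        printf("  job %c missed its deadline!\n", 'A' + j);
                    left[j] = exec[j];
                    deadline[j] = t + 1 + period[j];
                }
        }
        return 0;
    }

Running it reproduces the slide's schedule: A from 0-5, B from 5-11, A from 11-16, B from 16-22, A from 22-27, then idle until 30, with no misses.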
EDF: be careful
• At certain scheduling checkpoints, you need to schedule the job with the earliest deadline.
  – As long as that job has started its period.
  – Do each cycle iteratively until the LCM of the job periods.
• Checkpoints include:
  – t = 0
  – Whenever a job finishes executing
  – Whenever a job begins its period (this condition is important when we have maximum utilization)
• Example with 2 jobs
  – Job A has period 6 and execution time 3.
  – Job B has period 14 and execution time 7.
  – U = 1. We should be able to schedule these jobs with EDF.

Example: EDF
• Wrong way: ignoring the beginnings of job periods
  – At t = 0, we see jobs A (period 0-6) and B (period 0-14). Since A has the sooner deadline, schedule A for its 3 cycles.
  – At t = 3, we see jobs A (period 6-12) and B (period 0-14). Since A hasn't started its period, our only choice is B, for its 7 cycles.
  – At t = 10, we have jobs A (period 6-12) and B (period 14-28). A has the sooner deadline. Schedule A for its 3 cycles.
  – At t = 13, A is finished, but it missed its deadline (t = 12). We don't want this to happen!

continued
• Job A = (per 6, exec 3), Job B = (per 14, exec 7)
• Correct EDF schedule that takes into account the start of a job period as another scheduling checkpoint (cycles 1-42, the hyperperiod):

    cycles:  1-3  4-6  7-9  10-13  14-16  17-18  19-21  22-26  27-29  30  31-33  34-36  37-39  40-42
    job:      A    B    A     B      A      B      A      B      A    B     A      B      A      B

• Notice:
  – At t = 12 and t = 24, we don't preempt job B, because B's deadline is sooner. In the other cases when A's period begins, A takes the higher priority.

RM
• Most often used because it's easy
• Inherently preemptive
• Assign each job a fixed priority based on its period
  – The shorter the period, the more often this job must execute, the more deadlines it has → the higher the priority
• Determine in advance the schedule of the highest priority job
  – Continue for the other jobs in descending order of priority
  – Be sure not to "schedule" a job before its period begins
• Less tedious than EDF to compute the entire schedule
  – For the highest priority job, you know exactly when it will execute
  – Other jobs may be preempted by higher priority jobs that were scheduled first

RM (2)
• Sometimes it's not possible to find a schedule
  – Our ability to schedule is more limited than with EDF.
• There is a simple mathematical check to see if an RM schedule is possible:
  – We can schedule if the total utilization is ≤ n(2^(1/n) – 1). Proved by Liu and Layland in 1973. (A C sketch of this test appears after the examples below.)
  – If n(2^(1/n) – 1) < U ≤ 1, the test is inconclusive. Must compute the schedule to find out.
  – Ex. if n = 2, we are guaranteed to find an RM schedule if U < 82%, but for 90% it gets risky.
  – Large experiments using random job parameters show that RM is reliable up to about 88% utilization.

    n:                1      2      3      4      5      6      7      8      ∞
    n(2^(1/n) – 1):   1.000  0.828  0.780  0.757  0.743  0.735  0.729  0.724  0.693

RM example
• Suppose we have 2 jobs, C and D, with periods of 2 and 3, both with execution time 1.
• U = 1/2 + 1/3 > 82%, so RM is risky. Let's try it…
• Schedule the more frequent job first (C, at the start of each of its periods), then schedule job D into the free cycles:

    cycle:  1  2  3  4  5  6
            C  D  C  D  C  –

• Looks okay! Both jobs meet every deadline in the hyperperiod.

RM example 2
• Let's look at the earlier set of tasks, A and B, with periods of 10 and 15, and execution times of 5 and 6.
• U = 5/10 + 6/15 = 0.9, also risky.
• Schedule task A first: cycles 1-5, 11-15, 21-25.
• Schedule task B into the available spaces: B gets only cycles 6-10, i.e. 5 of its 6 units, by its deadline at t = 15.

    cycle:  1-5  6-10  11-15  16  …
             A    B      A    B (late)

Comparison
• Consider this set of jobs:

    Job #   Period   Execution time
      1       10           3
      2       12           4
      3       15           5

• What is the total utilization ratio? Are EDF and RM schedules feasible?
• Handout

RM vs. EDF
• EDF
  – A job's priority is dynamic, hard to predict in advance
  – Too democratic / egalitarian? Maybe we are trying to execute too many jobs.
• RM
  – Fixed priority is often desirable
  – A higher priority job will have better response times overall, not bothered by a lower priority job that luckily has an upcoming deadline.
  – RM cannot handle utilization up to 1 unless the periods are in sync, as in 1 : n1 : n1n2 : n1n2n3 : … (Analogy: telling time is easy until you get to months/years.)

RM example
• Let's return to the previous example, this time using RM.
  – Job A = (per 6, exec 3), Job B = (per 14, exec 7)
  – Hyperperiod is 42
  – First, we must schedule job A, because it has the shorter period: cycles 1-3, 7-9, 13-15, 19-21, …
  – Next, schedule job B into the remaining cycles:

    cycle:  1-3  4-6  7-9  10-12  13-15  16  …
             A    B    A     B      A    B (late)

  – Uh-oh! During B's period 0-14, it is only able to execute for 6 cycles. Deadline miss → this job set cannot be scheduled by RM. But it could be if either job's execution time were less → reduce U.
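The Liu and Layland test just described is a one-line computation. A C sketch (the function name and return codes are illustrative; link with -lm for pow):

    #include <math.h>
    #include <stdio.h>

    /* Returns 1 if RM is guaranteed feasible, 0 if no schedule can exist,
       and -1 if the test is inconclusive (must build the schedule). */
    int rm_guaranteed(const int period[], const int exec[], int n)
    {
        double u = 0.0;
        for (int i = 0; i < n; i++)
            u += (double)exec[i] / period[i];          /* U = sum C_i / T_i */

        double bound = n * (pow(2.0, 1.0 / n) - 1.0);  /* n(2^(1/n) - 1)    */
        printf("U = %.3f, bound = %.3f\n", u, bound);

        if (u > 1.0)    return 0;
        if (u <= bound) return 1;
        return -1;
    }

For jobs C(period 2, exec 1) and D(period 3, exec 1) above, U ≈ 0.833 against the n = 2 bound of 0.828: inconclusive, yet the schedule happens to work. For A(10, 5) and B(15, 6), U = 0.9 is also inconclusive, and building the schedule reveals the miss.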
RM utilization bound
• Liu and Layland (1973): "Scheduling Algorithms for Multiprogramming in a Hard Real-Time Environment"
  – They first looked at the case of 2 jobs. What is the maximum CPU utilization at which RM will always work? Express U as a function of one of the jobs' execution times, assuming the other job will fully utilize the CPU during its period.
• We have 2 jobs
  – Job j1 has period T1 = 8; job j2 has period T2 = 12
  – Let's see what execution times C1 and C2 we can have, and what effect this has on the CPU utilization.
  – During one of j2's periods, how many times will j1 start? In general: ceil(T2/T1). In our case, ceil(12/8) = 2.
  – They derive formulas to determine C2 and U, once we decide on a value of C1.

continued
• We have job j1 (T1 = 8) and job j2 (T2 = 12)
• Suppose C1 = 2.
  – C2 = T2 – C1 * (number of times j1 starts)
       = T2 – C1 * ceil(T2 / T1)
       = 12 – 2 * ceil(12 / 8) = 8
  – We can compute U = 2/8 + 8/12 = 11/12.
• Suppose C1 = 4.
  – C2 = 4
  – U = 4/8 + 4/12 = 5/6
  – The CPU utilization is actually lower as we increase the execution time of j1.
• … If the last execution of j1 spills over into the next period of j2, the opposite trend occurs.

Formula
• Eventually, Liu and Layland derive this general formula for maximum utilization for 2 jobs:
    U = 1 – x(1 – x)/(W + x),
  where W = floor(T2/T1) and x = T2/T1 – floor(T2/T1)
• We want to minimize U: to find at what level we can guarantee schedulability. In this case W = 1, so
    U = 1 – x(1 – x)/(1 + x)
• Setting the derivative equal to 0, we get x = √2 – 1, and U(√2 – 1) = 2(√2 – 1) ≈ 0.83
• The result can be generalized to n jobs: U = n(2^(1/n) – 1)

CS 346 – Chapter 7

• Deadlock
  – Properties
  – Analysis: directed graph
• How to handle it
  – Prevent
  – Avoid
    • Safe states and the Banker's algorithm
  – Detect
  – Recover

Origins of deadlock
• System contains resources
• Processes compete for resources:
  – request, acquire / use, release
• Deadlock occurs on a set of processes when each one is waiting for some event (e.g. the release of a resource) that can only be triggered by another deadlocked process.
  – e.g. P1 possesses the keyboard, and P2 has the printer. P1 requests the printer and goes to sleep waiting. P2 requests the keyboard and goes to sleep waiting.
  – Sometimes hard to detect, because it may depend on the order in which resources are requested/allocated

Necessary conditions
• 4 conditions to detect for deadlock:
  – Mutual exclusion – when a resource is held, the process has exclusive access to it
  – Hold and wait – processes each hold 1+ resources while seeking more
  – No preemption – a process will not release a resource unless it's finished using it
  – Circular wait
• The first 3 conditions are routine, so it's the circular wait that is usually the big problem.
  – Model using a directed graph, and look for a cycle

Directed graph
• A resource allocation graph is a formal way to show we have deadlock
• Vertices include processes and resources
• Directed edges
  – (P → R) means that a process requests a resource
  – (R → P) means that a resource is allocated to a process
• If a resource has multiple instances:
  – Multiple processes may request or be allocated the resource
  – Intuitive, but make sure you don't over-allocate
  – e.g. Figure 7.2: resource R2 has 2 instances, which are both allocated. But process P3 also wants some of R2. The "out" degree of R2 is 2 and the "in" degree is 1.

Examples
• R2 has 2 instances.
  – We can have these edges: P1 → R2, P2 → R2, P3 → R2.
  – What does this situation mean? What should happen next?
• Suppose R1 and R2 have 1 instance each.
  – Edges: R1 → P1, R2 → P2, P1 → R2
  – Describe this situation.
  – Now add this edge: P2 → R1
  – Deadlock?
• Fortunately, not all cycles imply a deadlock.
  – There may be sufficient instances to honor a request
  – Fig. 7.3 shows a cycle. P1 waits for R1 and P3 waits for R2. But either of these 2 resources can be released by processes that are not in the cycle… as long as they don't run forever.

How the OS handles it
• Ostrich method – pretend it will never happen. Ignore the issue. Let the programmer worry about it.
  – Good idea if deadlock is rare.
• Dynamically prevent deadlock from ever occurring
  – Allow up to 3 of the 4 necessary conditions to occur.
  – Prevent certain requests from being made.
• A priori avoidance
  – Require advance warning about requests, so that deadlock can be avoided.
  – Some requests are delayed
• Detection
  – Allow the conditions that create deadlock, and deal with it as it occurs.
  – Must be able to detect!

Prevention
• "An ounce of prevention is worth a pound of cure" – Benjamin Franklin
• Take a look at each of the 4 necessary conditions. Don't allow it to be the 4th nail in the coffin.
1. Mutual exclusion
  – Not much we can do here. Some resources must be exclusive.
  – Which resources are sharable?
2. Hold & wait
  – Could require a process to make all its requests at the beginning of its execution.
  – How does this help?
  – Costs: resource utilization; and starvation?

Prevention (2)
3. No resource preemption
  – Well, we do want to allow some preemption
  – If you make a resource request that can't be fulfilled at the moment, the OS can require you to release everything you have. (release = preempting the resource)
  – If you make a resource request, and it's held by a sleeping process, the OS can let you steal it for a while.
4. Circular wait
  – System ensures the request doesn't complete a cycle
  – Total ordering technique: assign a whole number to each resource. A process must request resources in numerical order, or at least not request a lower-numbered resource when it holds a higher one.
  – Fig. 7.2: P3 has resource #3 but also requests #2. The OS could reject this request.

Avoidance
• We need a priori information to avoid future deadlock.
• What information? We could require processes to declare up front the maximum # of resources of each type they will ever need.
• During execution, let's define a resource-allocation state, telling us:
  – # of resources available (static)
  – Maximum needs of each process (static)
  – # allocated to each process (dynamic)

Safe state
• To be in a safe state, there must exist a safe sequence.
• A safe sequence is a list of processes [ P1, P2, … Pn ]
  – For each Pi, we can satisfy Pi's requests given whatever resources are currently available or currently held by the processes numbered lower than i (i.e. Pj where j < i), by letting them finish.
  – For example, all of P2's possible requests can be met by either what is currently available or by what is held by P1.
  – If P3 needs a resource held by P2, it can wait until P2 is done, etc.
• Safe state = a safe sequence including all processes.
  – Deadlock occurs only in an unsafe state.
• The system needs to examine each request and ensure that the allocation will preserve the safe state.

Example
• Suppose we have 12 instances of some resource.
• 3 processes have these a priori known needs:

    Process #   Max needs   Current use
        1          10            5
        2           4            2
        3           9            2

• We need to find some safe sequence of all 3 processes
• At present, 12 – (5 + 2 + 2) = 3 instances are available.
• Is [ 1, 2, 3 ] a safe sequence?
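The greedy check below answers this question for the single-resource example above: it repeatedly "finishes" any process whose remaining need fits in what is available, reclaiming its allocation. This is a one-resource special case of the Banker's safety algorithm in the next section; the data is copied from the table.

    #include <stdio.h>

    int main(void)
    {
        int max[]   = {10, 4, 9};
        int alloc[] = {5, 2, 2};
        int n = 3, available = 12 - (5 + 2 + 2);   /* = 3 */
        int done[] = {0, 0, 0}, finished = 0;

        int progress = 1;
        while (progress) {
            progress = 0;
            for (int i = 0; i < n; i++) {
                if (!done[i] && max[i] - alloc[i] <= available) {
                    printf("P%d can finish (needs %d, have %d)\n",
                           i + 1, max[i] - alloc[i], available);
                    available += alloc[i];         /* reclaim its resources */
                    done[i] = 1;
                    finished++;
                    progress = 1;
                }
            }
        }
        printf(finished == n ? "safe state\n" : "unsafe state\n");
        return 0;
    }

It reports that P2, then P1, then P3 can finish, so [ 2, 1, 3 ] is a safe sequence. [ 1, 2, 3 ] is not, because P1's remaining need (5) exceeds the 3 instances initially available.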
Banker's algorithm
• General enough to handle multiple instances
• Principles
– No customer can borrow more money than the bank has
– All customers are given a maximum credit limit at the outset
– Can't go over your limit!
– The sum of all loans never exceeds the bank's capital.
• Good news: the customers' aggregate credit limit may be higher than the bank's assets
• Safe state: the bank has enough "money" to service the request of 1 customer.
• Algorithm: satisfy a request only if you stay safe
– Identify which job has the smallest remaining requests, and make sure we always have enough dough

Example
• Consider 10 devices of the same type
• Processes 1–3 need up to 4, 5, 8 of these devices, respectively
• Are these states safe?

    State 1:                          State 2:
    Job #   Allocated   Max needed    Job #   Allocated   Max needed
      1         0           4           1         2           4
      2         2           5           2         3           5
      3         4           8           3         4           8

Handling deadlock
• Continually employ a detection algorithm
– Search for a cycle
– Can do it occasionally
• When deadlock is detected, perform recovery
• Recover by killing
– Kill 1 process at a time until the deadlock cycle is gone
– Kill which process? Consider: priority, how many resources it has, how close it is to completion.
• Recover by resource preemption
– Need to restart that job in the near future.
– Possibility of starvation if the same process is selected over and over.

Detection algorithm
• Start with the allocation graph, and "reduce" it. Repeat until no more changes occur:
– Find a process that is using a resource and not waiting for one. Remove its edge: that process will eventually finish.
– Can now re-allocate this resource to another process, if needed.
– Can also perform other allocations for resources not fully allocated.
• If any edges are left, we have deadlock.

Example
• 3 processes & 3 resources
• Edges: (R1 → P1), (P1 → R2), (R2 → P2), (P2 → R3), (R3 → P3)
• Can this graph be reduced to the point that it has no edges?

CS 346 – Chapter 8
• Main memory
– Addressing
– Swapping
– Allocation and fragmentation
– Paging
– Segmentation
• Commitment
– Please finish chapter 8

Addresses
• The CPU/instructions can only access registers and main-memory locations
– Stuff on disk must be loaded into main memory first
• Each process is given a range of legal memory addresses
– Base and limit registers
– Accessible only to the OS
– Every address request is compared against these limits
• When is the address of an object determined?
– Compile time: hard-coded by the programmer
– Load time: the compiler generates a relative address
– Execution time: the address may vary during execution because the process moves (most flexible)

Addresses (2)
• Logical vs. physical address
– Logical (aka virtual): the address as known to the CPU and source code
– Physical: the real location in RAM
– How could the logical and physical address differ? In the case of execution-time binding, i.e. if the process location can move during execution.
• Relocation register
– Specifies what constant offset to add to a logical address to obtain the physical address
– The CPU / program never needs to worry about the "real" address, or that addresses of things may change. It can pretend its addresses start at 0. (A sketch of this check follows.)
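A minimal C sketch of the limit check plus relocation register described above. The struct and function names are hypothetical; a real MMU does this in hardware on every access:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical MMU state: names are illustrative, not from the book. */
    struct mmu { uint32_t relocation; uint32_t limit; };

    /* Translate a logical address under the relocation/limit scheme:
       check against the limit, then add the base. */
    uint32_t translate(const struct mmu *m, uint32_t logical) {
        if (logical >= m->limit) {
            fprintf(stderr, "trap: addressing error (%u >= %u)\n",
                    logical, m->limit);
            exit(EXIT_FAILURE);          /* the OS would signal the process */
        }
        return m->relocation + logical;  /* physical address */
    }

    int main(void) {
        struct mmu m = { .relocation = 0x40000, .limit = 0x10000 };
        printf("logical 0x1234 -> physical 0x%x\n", translate(&m, 0x1234));
        return 0;
    }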
Swapping
• A process may need to go back to disk before finishing.
– Why?
• A consequence of scheduling (context switch)
• Maintain a queue of processes waiting to be loaded from disk
• The actual transfer time is relatively huge
– When loading a program initially, we might not want to load the whole thing
• Another question: what to do if we're swapped out while waiting for I/O?
– Don't swap if waiting for input; or
– Put the input into a buffer. Empty the buffer next time the process is back in memory.

Allocation
• The simplest technique is to define fixed-size partitions
• Some partitions are dedicated to the OS; the rest are for user processes
• Variable-size partitions are also possible, but we must maintain the starting address of each
• Holes to fill
• How to dynamically fill a hole with a process:
– First fit: find the first hole big enough for the process
– Best fit: find the smallest hole big enough
– Worst fit: fit into the largest hole, in order to create the largest possible remaining hole
• Internal & external fragmentation

Paging
• Allows for a noncontiguous process memory space
• Physical memory consists of "frames"
• Logical memory consists of "pages"
– Page size = frame size
• Every address referenced by the CPU can be resolved into:
– Page number
– Offset
– How? It turns out the page/frame size is a power of 2, which determines the # of bits in each part of the address.
• Look up the page number in the page table to find the correct frame

Example
• Suppose RAM = 256 MB, the page/frame size is 4 KB, and our logical addresses are 32 bits. (A worked sketch follows below.)
– How many bits for the page offset?
– How many bits for the logical/virtual page number?
– How many bits for the physical page number?
– Note that the page offsets (logical & physical) will match.
• A program's data begins at 0x1001 0000, and text begins at 0x0040 0000. If they are each 1 page, what is the highest logical address of each page?
• What physical page do they map to?
• How large is the page table?

Page table
• HW representation
– Several registers
– Store it in RAM, with a pointer kept in a register
– TLB ("translation look-aside buffer"): functions as a "page table cache". Should store info about the most commonly occurring pages.
• How does a memory access work?
– First, inspect the address to see if the datum should be in the cache.
– If not, inspect the address to see if the TLB knows the physical address.
– If there is no TLB tag match, look up the logical/virtual page number in the page table (thus requiring another memory access).
– Finally, in the worst case, we have to go out to disk.

Protection, etc.
• HW must ensure that accesses to the TLB or page table are legitimate
– No one should be able to access a frame belonging to another process
• Valid bit: does the process have permission to access this frame?
– e.g. it might no longer belong to this process
• Protection bit: is this physical page frame read-only?
• Paging supports shared memory. Example?
• Paging can cause internal fragmentation. How?
• Sometimes we can make the page table more concise by storing just the bounds of the pages instead of each one.

Page table design
• How to deal with a huge number of pages
• Hierarchical or 2-level page table
– In other words, we "page" the page table.
– Split the address into 3 parts: "outer page", "inner page", and the offset.
– The outer page number tells you where to find the appropriate part of the (inner) page table. See Figure 8.15.
– Not practical for 64-bit addressing! Why not?
• Hashed page table
– Look up the virtual page number in a hash table.
– The contents of the cell might be a linked list: search for a match.
• Inverted page table
– A table that stores only the physical pages, and then tells you which logical page maps to each. Any disadvantage?
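Here is a worked sketch for the 256 MB / 4 KB / 32-bit example above; the macro names are ours:

    #include <stdint.h>
    #include <stdio.h>

    /* Parameters from the example: 4 KB pages, 32-bit logical
       addresses, 256 MB of physical RAM. */
    #define OFFSET_BITS 12                     /* 4 KB = 2^12 */
    #define VPN_BITS    (32 - OFFSET_BITS)     /* 20-bit virtual page number */
    #define PFN_BITS    (28 - OFFSET_BITS)     /* 256 MB = 2^28 -> 16 bits */

    int main(void) {
        uint32_t addrs[] = { 0x10010000u, 0x00400000u };   /* data, text */
        for (int i = 0; i < 2; i++) {
            uint32_t vpn    = addrs[i] >> OFFSET_BITS;
            uint32_t offset = addrs[i] & ((1u << OFFSET_BITS) - 1);
            printf("0x%08x -> vpn 0x%05x, offset 0x%03x, page top 0x%08x\n",
                   addrs[i], vpn, offset, addrs[i] | 0xFFFu);
        }
        printf("page table entries: %u\n", 1u << VPN_BITS);  /* 2^20 */
        return 0;
    }

It reports a 12-bit offset, a 20-bit virtual page number, and 2^20 page-table entries; with 2^28 bytes of RAM, the physical page number needs 16 bits.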
Segmentation
• An alternative to paging
• A more intuitive way to lay out main memory…
– Segments do not have to be contiguous in memory
– Each process has a segment table: for each segment, it stores the base address and size
• As before, a process has a "logical address space"
– But now it consists of segments, each having a name and size.
– How does a program(mer) specify an address in a segmented scheme?
– What kinds of segments might we want to create for a program?
• HW may support both paging and segmentation
– So the OS may exploit either or both addressing techniques.
– To ignore segmentation, just use 1 segment for the entire process.

Pentium example
• To convert a logical to a physical address
– Handle the segmentation first…
– The segmentation unit takes the logical address and converts it to a linear address (why?)
– The paging unit takes the linear address and converts it to a physical address (a somewhat familiar process)
• A segment may be up to 4 GB, so the offset is 32 bits
– A logical address has 2 parts: segment number plus offset
– Look up the segment number in the "descriptor table". Entries in this table give the upper bits of the 32-bit linear address.
• The Pentium uses 2-level paging
– The outer and inner page numbers are 10 bits each. What does this tell you?

CS 346 – Sections 9.1–9.4
• Virtual memory
– (continues similar themes from the main-memory chapter)
– What it is
– Demand paging
– Page faults
– Copy on write
– Page replacement strategies

Virtual memory
• Recall: main-memory management seeks to support multiprogramming
• VM principles
– Allow a process to run even if only some of it is in main memory
– Allow a process to have a logical address space larger than all of physical memory
– Allow the programmer to be oblivious of memory-management details, except in extreme cases.
• Motivation
– Some code is never executed. Some data is never used.
– The programmer may over-allocate an array.
– Even if we need to load the entire program, we don't need it all at once.
– We'll use less RAM, and swap fewer pages.

Using VM
• The programmer (or compiler) can refer to addresses throughout the entire (32-bit) address space.
– In practice this may be restricted, because you may want to reserve virtual addresses for outside stuff; but it's still a huge fraction
– All addresses will be virtual/logical, and will be translated to actual physical addresses by the OS and HW
– We can allocate a huge amount of VM for the stack and heap, which may grow during execution.
– The stack and heap will be unlikely to bump into each other.
• Supports sharing of code (libraries) and data
– Virtual addresses will point to the same physical address

Demand paging
• The typical way to implement VM
• Only bring a page in from disk as it is requested.
– What is the benefit? Why not load all pages at once?
– "Lazy pager" is a more accurate term than "lazy swapper"
• The pager initially guesses which pages to load
– "Pure demand paging" skips this step
• Valid bit: is this page resident in RAM?
• If not: page fault
– The page we want is not in physical memory (i.e. it's in the "swap space" on disk)
– How often does this happen?
– Temporal and spatial locality help us out

Page fault
Steps to handle:
• The OS verifies the problem is not more severe
• Find free space in RAM into which to load the proper page
• Disk operation to load the page
• Update the page table
• Continue execution of the process
• Cost of a page fault ≈ 40,000x a normal memory access
– The probability should be minuscule (see the sketch below)
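To see why the fault probability must be minuscule, here is a tiny C computation of the effective access time, using the slide's rough 40,000x cost ratio (the exact figure varies by machine):

    #include <stdio.h>

    /* Effective access time under demand paging, measured in units of
       one normal memory access, with fault probability p. */
    int main(void) {
        double fault_cost = 40000.0;
        for (double p = 1e-4; p > 1e-8; p /= 10) {
            double eat = (1 - p) * 1.0 + p * fault_cost;
            printf("p = %.0e -> effective access time = %.4f accesses\n",
                   p, eat);
        }
        /* For at most 10% slowdown: 1 + p*(40000 - 1) <= 1.1
           -> p <= 2.5e-6, i.e. under one fault per ~400,000 accesses. */
        return 0;
    }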
Copy-on-write
• A memory optimization
• Can be used when we fork, but not exec
• No real need to duplicate the address space
– Two processes running the same code, accessing the same data
• Until… one of the processes wants to write.
– In that case, we create a 2nd copy of the page containing the written-to area.
– So we only copy some pages. Compare Figures 9.7 and 9.8
• If you want to exec immediately after fork, you would not even need copy-on-write.
– vfork( ) system call: the child shares the same pages as the parent. The child should not alter anything here, because of the coming exec; if it did, the changes would be seen by the parent.

Page fault
• Demand paging to implement virtual memory √
• What is a page fault?
• How to handle it
– … Find a free frame and load the new page into it …
– But what if no frame is free? Aha!
• Extreme approaches
– Terminate the process if no free frame is available
– Swap out a process and free all the pages it was using
• Alternative: replace (i.e. evict) one of the resident pages
– Need to amend the procedure for handling a page fault:
– Copy the victim to disk if necessary; replace the frame's contents with the new page
– Let the process continue

Issues
• Frame allocation
– How many frames should we give to each process?
– If more than enough, we never need to evict a page. (Too good…)
– More about this later (section 9.5)
• Page replacement algorithm
– Need a way to pick a victim
– Many such algorithms exist
– Goal: reduce the total # of page faults (or the rate), since they're costly!
• To simplify analysis of page behavior, use a "reference string": a list of referenced pages, rather than complete addresses. (p. 412)
– Given the # of frames, the replacement algorithm and a reference string, we should be able to determine the # of page faults.

Clairvoyant
• The clairvoyant page replacement algorithm is optimal.
– In other words, the minimum possible number of page faults
• Replace the page that will not be used for the longest period of time in the future.
• It's not realistic to know such detailed info about the future, so it's not a real algorithm.
• Useful as a benchmark.
– If your algorithm does better, check your arithmetic.

FIFO
• "First in, first out" – queue philosophy
• Evict the page that has been resident the longest.
• Example with 3 frames:
– 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1
– 15 page faults, compared to 9 with clairvoyant
• Does this policy make sense?
– Being "old" has nothing to do with being useful or not in the future.
– Startup routines may no longer be needed. Ok.
– Does a grocery store get rid of bread to make way for green tea?
• Belady's anomaly
– Undesirable feature: it's possible to increase the # of frames and see an increase in the # of page faults. (p. 414)

LRU
• "Least recently used"
• Attempts to be more sensible than FIFO
– More akin to a stack than a queue
• Example with 3 frames
– 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 has 12 page faults
• Problem: how to represent the LRU information
– Stack of page numbers: referencing a page brings it to the top; evict the page at the bottom of the stack.
– Or associate a counter or timestamp with each page, and search for the minimum.
– HW might not support these expensive ops: they require significant overhead, e.g. an update on every reference.

Almost LRU
• We want to perform fewer HW steps
– It's reasonable to test/set a bit during a memory reference, but not much more than that.
– This gives rise to reference bit(s) associated with each page.
• Second-chance FIFO (see the sketch below)
– When a page is referenced, set its reference bit.
– When it's time to find a victim, scan the frames. If the ref bit = 1, clear it. If the ref bit is already 0, we have our victim. Next time we need to search for a victim, continue from here (circular/"clock" arrangement).
• Multiple reference bits
– Periodically shift the reference value left. Evict a page that has all 0's.
• Use a reference count (MFU or LFU)
– When a page is referenced, increment its reference value.
– The policy may be to evict either the least or the most frequently referenced page.
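A compact C sketch of the second-chance ("clock") victim search described above; the data layout is invented for illustration:

    #include <stdbool.h>
    #include <stdio.h>

    #define NFRAMES 3

    /* On a fault, sweep the circular frame list: clear set reference
       bits, and evict the first frame whose bit is already 0. */
    struct frame { int page; bool ref; };

    static struct frame frames[NFRAMES] = { {7, 0}, {0, 0}, {1, 0} };
    static int hand = 0;   /* clock hand: where the next sweep starts */

    int choose_victim(void) {
        for (;;) {
            if (frames[hand].ref) {            /* give it a second chance */
                frames[hand].ref = false;
                hand = (hand + 1) % NFRAMES;
            } else {
                int victim = hand;             /* ref bit 0: evict this one */
                hand = (hand + 1) % NFRAMES;
                return victim;
            }
        }
    }

    int main(void) {
        frames[1].ref = true;                  /* pretend page 0 was touched */
        int v = choose_victim();
        printf("evict frame %d (page %d)\n", v, frames[v].page);
        return 0;
    }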
CS 346 – Sections 9.5–9.7
• Paging issues
– How big should a page be?
– Frame allocation
– Thrashing
– Memory-mapped files & devices
• Commitment
– Please finish chapter 9

Page size
• HW may have a default (small) page size
– The OS can opt for somewhat larger sizes
– If we want 4K pages, but the default is 1K, then tell the HW to always group its pages in fours
• Small or large?
– On average, ½ of the final page will be blank (internal fragmentation)
– But small pages → a larger page table
• Let's measure the overhead
– s = average process size; p = page size; e = size of a page-table entry
– We'll need about s/p pages, occupying se/p bytes for the page table.
– Last-page waste = p/2
– Total overhead = se/p + p/2. See the trade-off?
– Optimum result: p = sqrt(2se) ≈ sqrt(2 * 1 MB * 8) = 4 KB

Frame allocation
• A process needs a certain minimum number of frames.
• Some instructions may require 2 memory references (unusual), plus the instruction fetch itself.
– All 3 memory locations may be in different pages
– To execute this single instruction, we would need 3 frames.
– Also, a memory reference could straddle a page boundary. Not a good HW design.
– The book mentions an example of an instruction requiring up to 8 frames.
• Equal allocation among processes
• Proportional allocation (to total process size)
• Priority bonus
• Allocation needs to be dynamic: the # of processes changes

Allocation (2)
• Global or local page replacement?
– Local = you can only evict your own pages. With this policy, the number of frames allocated to a process never changes.
– Global = you can evict someone else's page. You are at the mercy of other processes, and your # of page faults depends on the environment. But if you need extra space, you can take it from someone who isn't using theirs. More responsive to actual memory needs → better throughput.
• Non-uniform memory
– With multiple CPUs and memories, we prefer frames that are "closer" to the CPU we are running on.

Thrashing
• Spending more time paging than doing useful work.
• How does it occur?
– If CPU utilization is low, the OS may schedule more jobs.
– Each job requires more frames for its pages, and takes frames away from other jobs. More page faults ensue.
– When more and more jobs wait to page in/out, CPU utilization goes down. The OS tries to schedule even more jobs.
– Fig 9.18 – don't have too many jobs running at once!
• To avoid the need to steal too many frames from other jobs, a job should have enough to start with.
– Locality principle: at any point in the program, we need some, but not all, of our pages, and we'll use those pages for a while. Loops inside different functions.
• Or: swap out a job and be less generous in the future.

Working set model
• A way to measure locality, by Peter Denning, 1968.
• Begin by setting a window
– How far back in time are you interested?
– Let Δ = the number of memory references in the recent past
– What if Δ is too big or too small?
• Working set = the set of pages accessed during the window
– Another number, the working set size (WSS): how many pages were accessed during the window
– For example, we could have Δ = 10,000 and WSS = 5.
• The OS can compute the WSS for each job. (A sketch follows below.)
– If extra frames are still available, it can safely start a new job.
• Practical consideration: how often to recalculate the WSS?
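A direct (deliberately inefficient) C sketch of computing the working-set size: count the distinct pages in the last Δ references. The toy window of 10 stands in for a realistic Δ such as 10,000:

    #include <stdio.h>

    #define DELTA   10    /* window size; the slides use e.g. 10,000 */
    #define NPAGES  64    /* hypothetical number of pages */

    /* WSS = number of distinct pages referenced in the last DELTA refs. */
    int wss(const int refs[], int n) {
        char seen[NPAGES] = {0};
        int count = 0;
        int start = n > DELTA ? n - DELTA : 0;
        for (int i = start; i < n; i++)
            if (!seen[refs[i]]) { seen[refs[i]] = 1; count++; }
        return count;
    }

    int main(void) {
        int refs[] = {1,2,1,3,2,1,4,1,2,3,5,5,5,5,1};
        int n = sizeof refs / sizeof refs[0];
        printf("WSS over last %d refs = %d\n", DELTA, wss(refs, n));
        return 0;
    }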
– RAM acts as a buffer
– Periodic checks: if something was written, write it to disk
– Final writes happen when the job is done.
• For a read-only file, multiple jobs can share this memory
• Other I/O devices can also be mapped to pages (screen, printer, modem)

CS 346 – rest of Ch. 9
• Allocating memory for the kernel
• Making paging work better
– Prepaging
– TLB reach
– Memory-aware coding
– Locking pages

Kernel memory
• Some memory is reserved for the kernel
• To minimize overhead:
– (fragmentation) we don't allocate entire pages at a time
– (efficiency / direct memory access) the OS would like to allocate a contiguous block of memory of arbitrary size
• Simple approach: the "buddy system"
– The memory manager maintains lists of free blocks of size 1, 2, 4, 8, … bytes, up to some maximum, e.g. 1 MB.
– Initially, we have just 1 free block: the entire 1 MB.
– Over time this gets split up into smaller pieces (buddies).
• When the kernel needs some memory, we round the request up to the next power of 2. (See the sketch at the end of these chapter 9 notes.)
• If no block of that size is available, split up something larger.

Example (1024 KB total; one row per operation):

    Initially:        | 1024 |
    Request A = 70K:  | A(128) | 128 | 256 | 512 |
    Request B = 35K:  | A(128) | B(64) | 64 | 256 | 512 |
    Request C = 80K:  | A(128) | B(64) | 64 | C(128) | 128 | 512 |
    Return A:         | 128 | B(64) | 64 | C(128) | 128 | 512 |
    Request D = 60K:  | 128 | B(64) | D(64) | C(128) | 128 | 512 |
    Return B:         | 128 | 64 | D(64) | C(128) | 128 | 512 |
    Return D:         | 256 | C(128) | 128 | 512 |
    Return C:         | 1024 |

When a block of size 2^k is freed, the memory manager only has to search the other 2^k blocks to see if a merge is possible.

Slab
• A relatively new technique
• Kernel objects are grouped by type → in effect, grouped by size
– e.g. semaphores, file descriptors, etc.
• The OS allocates a "cache" to hold objects of the same type.
– Large enough to hold several such objects. Some are unused, i.e. "free".
• How many objects are in a cache?
– 1 page (4 K) is usually not enough, so we may want several contiguous pages – this is called a slab.
– Thus we achieve contiguous memory allocation, even though the objects might not be resident contiguously themselves. See Figure 9.27.

Prepaging
• To avoid the initial burst of page faults, the OS can bring in all needed pages at once.
• Can also do this when restarting a job that was swapped out. Need to "remember" the working set of that job.
• But: will the job need all of its pages?
• Is the cost of prepaging < the cost of servicing all the future individual page faults?

TLB reach
• For paging to work well, we want more TLB hits too
• TLB reach = how much memory is referred to by the TLB entries
– Memory-intensive process → more TLB misses
• Approaches to improve the TLB hit rate
– A larger TLB. But sometimes, to achieve an acceptable hit rate, we'd need an unreasonably large table!
– Allow for a larger page size. For simplicity, can offer 2 sizes (regular and super). The OS must manage the TLB, so it can change the page size as needed. Any disadvantage?

Memory-aware code
• Page faults do happen. Keep the working set small if you can.
• Let's initialize array elements. Does it matter if we proceed row- or column-major?
• Data structures: stack, queue, hash table
• BFS vs. DFS – which is better with respect to memory?
• array versus ArrayList

Locking pages
• Sometimes we want to make sure some pages don't get replaced (evicted)
• Each frame has a lock bit
• I/O
– The actual transfer of data is performed by a specialized processor, not the CPU
– When you request I/O, you go to sleep while the transfer takes place.
– You don't want the I/O buffer pages to be swapped out!
• Kernel pages should be locked
• Can lock a page until it has been used a little
– To avoid replacing a page we just brought in
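A tiny C sketch of the rounding step of the buddy system. This covers only the size computation; a real allocator also tracks the free lists and performs the splits and merges shown in the example table:

    #include <stdio.h>

    /* Round a request up to the next power of two, as a buddy
       allocator does before searching its free lists. */
    unsigned round_up_pow2(unsigned k) {
        unsigned size = 1;
        while (size < k) size <<= 1;
        return size;
    }

    int main(void) {
        unsigned requests[] = { 70, 35, 80, 60 };   /* KB, from the slides */
        for (int i = 0; i < 4; i++)
            printf("request %3u KB -> block of %3u KB\n",
                   requests[i], round_up_pow2(requests[i]));
        return 0;
    }

It maps the example requests 70, 35, 80, 60 KB to blocks of 128, 64, 128, 64 KB, matching the table above.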
CS 346 – Chapter 10
• Mass storage
– Advantages?
– Disk features
– Disk scheduling
– Disk formatting
– Managing swap space
– RAID

Disks
• Anatomy (Figure 10.1)
– Sector, track, cylinder, platter
– A read/write head is attached to an arm, attached to the arm assembly
– The head quickly reads binary data as the orientation of iron ions or the reflectiveness of the surface
• Example: CD
– About 25,000 tracks, 50 sectors per track; 1 bit occupies about 1 square μm
– An entire CD can be read in about 7 minutes on a 12x-speed drive
• But usually we don't read entire disks. 2 aspects dominate access time:
– Seek time: roughly proportional to the square root of the seek distance
– Rotational latency

Some specs

                          Floppy disk   Hard drive (2001)   Hard drive (2011)
    Cylinders             40            10,601              310,101
    Tracks/cylinder       2             12                  16
    Sectors/track         9             281 (average)       63
    Sectors/disk          720           35,742,000          312,500,000
    Bytes/sector          512           512                 512
    Capacity              360 KB        18 GB               160 GB
    Seek, adjacent track  6 ms          0.8 ms              –
    Seek (average)        77 ms         6.9 ms              9.5 ms
    Rotation              200 ms        8.3 ms              8.3 ms
    Transfer 1 sector     22 ms         17 μs               1.7 μs

Disk scheduling
• A common problem is a backlog of disk requests
• Disk queue
• When the disk is ready, in what order should it serve the requests? Similar to the problem of scheduling the CPU
• Pending jobs are classified by which track/cylinder they want to access
• Ex. 4, 7, 16, 2, 9, 1, 9, 5, 6
• Several disk-scheduling algorithms exist
– Simple approach: first-come, first-served
– Total head movement = ? (see the sketch below)
– We want to reduce total seek time or head movement: avoid "wild swings".
– It would also be nice not to finish at an extreme sector number.

Scheduling (2)
• Shortest seek first
– For 4, 7, 16, 2, 9, 1, 9, 5, 6: after serving track 4, where do we go next? Total head movement = ?
– Very good, but not optimal
• Elevator algorithm ("scan" method)
– Pick a direction and go all the way to the end, then come back and handle all the other requests.
– Better than shortest-seek-first in our example?
• Circular scan
– Same as the elevator algorithm, BUT when you reach the end you immediately go back to the other end without stopping for requests. In other words, you only do work while the head moves in 1 direction.
• Look scheduling: modify elevator & circular scan so you only go as far as the highest/lowest request
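To answer the "total head movement = ?" questions, here is a C sketch computing the FCFS and shortest-seek-first totals for the request queue above. The starting track is not given in the slides, so it is an assumption here:

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N 9

    int fcfs(int start, const int req[N]) {
        int pos = start, total = 0;
        for (int i = 0; i < N; i++) { total += abs(req[i] - pos); pos = req[i]; }
        return total;
    }

    int sstf(int start, const int req[N]) {
        bool done[N] = {false};
        int pos = start, total = 0;
        for (int k = 0; k < N; k++) {
            int best = -1;
            for (int i = 0; i < N; i++)   /* find the closest pending request */
                if (!done[i] &&
                    (best < 0 || abs(req[i] - pos) < abs(req[best] - pos)))
                    best = i;
            total += abs(req[best] - pos);
            pos = req[best];
            done[best] = true;
        }
        return total;
    }

    int main(void) {
        int req[N] = {4, 7, 16, 2, 9, 1, 9, 5, 6};
        int start = 4;   /* assumed: the head begins at track 4 */
        printf("FCFS: %d, SSTF: %d\n", fcfs(start, req), sstf(start, req));
        return 0;
    }

With the head starting at track 4, this prints FCFS: 54 and SSTF: 27.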
Disk mgmt
• Low-level (physical) formatting
– Dividing the disk medium into sectors
– Besides data, each sector contains an error-correcting code
– Later, the disk controller will manipulate individual sectors
• High-level (logical) formatting
– Record a data structure for the file system on disk
– Partition groups of cylinders if desired
– Associate adjacent blocks into logical clusters to support file I/O
• "Sector sparing": compensate for bad blocks!
– Maintain a list of bad blocks; replace each with a spare one
• Booting from disk: boot blocks in predefined locations contain system code to load the "boot partition" of the drive

Swap space
• Recall: used in virtual memory to store pages evicted from RAM
– Faster to return these to RAM than to load them from the file from scratch
– In effect, disk space is now being used as an extension of main memory, the very essence of VM
• Logically a separate partition of the disk from the file system
• When a process is started, it's given some swap space
• Swap map: a kernel data structure to track usage
– Associate a counter value with each page in the swap area
– 0 means that page is available to swap into
– A positive number is the number of processes using that swapped-out data (> 1 means it's shared data)

RAID
• It is increasingly practical to have several disks on a system
– But more disks increase the probability of some failure (i.e. they reduce the mean time to failure)
• RAID = "redundant array of independent disks"
– Redundancy: a fault-tolerance technique
• Six "levels" or strategies of RAID use various combinations of fault-tolerance techniques
• Typical RAID techniques in use
– Striping a group of disks: split the bits of each byte across the disks. Or block-level striping: split the blocks of a file…
– Mirroring another disk
– Storing parity (error-correcting) bits on another disk
– Leaving some disks empty until needed to replace a failed disk

RAID levels
Various combinations of techniques… For example:
• RAID 0 – block striping; no mirroring or parity bits
• RAID 1 – add mirrored disks
• RAID 2, 3, 4 – extra disks store parity bits
– If 1 disk fails, the remaining bits of each byte and the error-correction bits can be used to reconstruct the lost bit of each byte.
– RAID 3 – bit-interleaved parity
– RAID 4 – block-interleaved parity
• RAID 0+1 – a set of disks is striped, and then the stripe is mirrored to another set
• RAID 1+0 – disks are mirrored in pairs; the pairs are then striped

RAID extensions
• RAID is designed just to detect & handle disk failure
• It does not prevent/detect data corruption, etc.
– Could be pointing to the wrong file, or the wrong block
• Checksum for data and metadata on disk
– Ex. for each disk block, how many bits are set?
– Store it with the pointer to the object (see Figure 10.13)
– Detect whether it has changed. Grab the correct data from the mirror.
• RAID is also somewhat inflexible, because its techniques require a certain number of disks. What to do?

CS 346 – Chapter 11
• File system
– Files
– Access
– Directories
– Mounting
– Sharing
– Protection

Files
• What is a file?
• Attributes
– Name, internal ID, type, location on device, size, permissions, modification/creation time
• Operations
– Create, read, write, reposition the file pointer (seek), delete, truncate (i.e. to zero)
– Less essential: append, rename, copy
– The first time we refer to a file, we need to search for it: "open"
• Active file tables. What is stored in each?
– A table per process
– A system-wide table
• The "open count" for a file

Type and structure
• Policy question – should the OS be aware of file types?
• How is the file type determined?
– Filename extension
– Keep track of which application created the file
– Magic number
• The file type determines its structure
– At a minimum: bits and bytes
– e.g.
OS expects executable file to have certain format – Text file: recognize meaning of certain ASCII codes • Files stored in “blocks” on a device – Each I/O operation can grab one block (~ 1KB <= page size) – Can start a new file on a new block, or do some “packing” Accessing data • Sequential access – Read, write, rewind operations – We almost always utilize files this way • Direct access – More complex system calls: Allow arbitrary access to any byte in file on demand – What kind of application needs this functionality? – Read/write operations may specify a relative or absolute block number • Indexed access – Another file stores pointers to appropriate blocks in some large file Directories • File system resides on some “volume” – A volume may be a device, part of a device, multiple devices: – So, can have multiple file systems on the same device (partition) – A file system can use multiple devices, but this adds complexity • Can have specialized “file systems” to allow certain devices to be treated as files, with file I/O commands • Volume must keep around info about all files – Confusingly called a directory • Directory operations on files: – Search, create, delete, list, rename, traverse File organization • How are files logically organized in the directory? • Single-level directory: one flat list – File names must be unique – Excellent if everyone is sharing files • Two-level directory – Each user has a separate directory: Figure 11.9 – System maintains a master file directory: pointers to each user’s file directory – Allows user’s work to be isolated – Can specify file by absolute or relative path name – Special “system user” for system files. Why necessary? – Search path: sequence of directories to use when searching for a file. Look here, look in system folder, etc. File org (2) • Tree-based directory: Files can be arbitrarily deep • Allows user to impose local structure on files • Each process has a current working directory – To access file, need to specify path name or change the current directory • Policy on deleting an entire directory • Acyclic directory: support links to existing files – – – – – – In effect, the same file has multiple path names Same file exists in multiple directories But there is just 1 file, not a copy When traversing, need to ignore the links What happens when we delete file? Links now point to … Can count the # of references to file (like garbage collection) Mounting • Mount = make volume/device available to file system. • Assign a name to its root so that all files will have a specific path name. • Mount point = position in existing file system in which we insert the new volume. – Think of inserting a subtree at a new child of an existing node. – E.g. You plug in a USB drive, and immediately it acquires the name E: so you can access its files – In UNIX, a new “volume” may appear under / • Unused volumes may be temporarily unmounted if file system desires File sharing • In multi-user system, desirable to have some files accessible by multiple users! • File system must have more info – Owner of each file – Assign unique ID numbers for users and groups of users – When you access file, we check your IDs first • Remote file system access – Manually transfer files via FTP – Distributed file system: see a file system on another computer on the network – Anonymous browsing on the Web Remote file system • We’d like to mount a remote file system on our machine. – In other words, be able to give (path) names to remote files to manipulate them. 
• Client-server relationship: a file server accepts requests from remote machines to mount
– E.g. you are logged into ultrax2, but ultrax1 is the file server.
– NFS is a standard UNIX file-sharing protocol
– OS file-system calls are translated into remote calls
• One challenge: authenticating the client.
– Typically the client & server share the same set of user IDs. When you get a computer account, your user ID is good everywhere.
– Or, provide your password the first time you access the server.
• What is the role of a distributed naming service, e.g. DNS?

Consistency
• Policy decisions concerning how we handle multiple users accessing the same file
– Reminiscent of synchronization
• When do changes made by one user become observable to others?
– Immediately, or not until you reopen the file?
• Should we allow 2 users to read/write concurrently?
– As in a database access
• A system may define an immutable shared file
– Like a CD-R
– It cannot be modified, and its name cannot be reused.
– No constraints on reading

Protection
• The owner/creator of a file should set capabilities for
– What can be done
– By whom
• Types of access
– Read
– Write
– Execute
• Could also distinguish other access capabilities:
– Delete
– List

Specifying permissions
• Establish classes of users, each with a possibly distinct set of permissions
– Classes can be: owner, group, rest of world
• For each class of users:
– 'r' = Can I read the file?
– 'w' = Can I write to (or delete) the file?
– 'x' = Can I execute the file?
• Examples
– rw-rw-r--  (664)
– rwxr-xr--  (754)
– rw-r-----  (640)
• If there are no groups, can set the group permission = rest of world.
• Use the chmod command

CS 346 – Chapter 12
• File systems
– Structure
– Information to maintain
– How to access a file
– Directory implementation
– Disk allocation methods → efficient use, quick access
– Managing free space
– Efficiency, performance
– Recovery

Structure
• A file system is usually built on top of disks
– The medium can be rewritten in place
– It's relatively easy to move to another place on the disk
• Purpose of a file system
– Provide a user interface to access files
– Define a mapping between logical files and space on a secondary storage device
• An FS has several levels/layers of abstraction & functionality, e.g. 4:
– Logical file system
– File-organization module
– Basic file system
– I/O control

Layers
• Logical file system
– Maintains a file's metadata, inside a "file control block", aka "inode"
– Directory data structure
• File-organization module
– Translates between the logical and physical data blocks of a file. In other words, it knows everybody's real address.
– e.g. logical block numbers might always start at 0
• Basic file system
– Manipulates specific sectors on disk.
– Maintains buffers for file I/O
• I/O control
– Device drivers give machine-language commands to the device to accomplish the file I/O.
– (Different file systems can use the same device drivers.)

FS information
On disk:
• Boot (control) block = the first block on a volume. Gives instructions on how to load the OS.
• Volume control block = "superblock"
– Statistics about the volume: # of blocks, their size, how many are free and which ones
• Directory data structure: points to each file
• File control block (inode) for each file (contains what info?)
In memory:
• Which volumes are currently mounted
• Cache of recently accessed directories (faster access)
• Which files are currently open
– Per process
– System-wide
• Buffers holding file I/O currently in progress

Opening a file
• The open( ) system call passes the file name to the logical FS
• See if anyone else already has this file opened.
– How?
– What if it is?
• If not already open, search the directory
• If found,
– Copy the file control block (inode) to the system-wide open-file table
– Set a pointer in the process' open-file table (Why not the inode?)
– Also in the process' table: dynamic stuff, like the current location within the file, whether it's opened for read or write, etc. Should we copy this to the inode also?
• open( ) returns a file descriptor (i.e. a pointer to the per-process table entry). Use this for future I/O on this file.

Multiple file systems
• We generally don't have 1 file system in charge of the entire disk
• Disks usually have partitions…
• Raw partition
– Where you don't want/need to have files
– Ex. swap space; information related to backups
• Boot partition – should be treated as special / separate
– Contains a program to ask the user which OS to boot
– Multiple OSes can give rise to different FSes
• Use a "virtual file system" to manage multiple FSes
– Hide from the user the fact that there is more than 1 FS

Virtual FS
• See Figure 12.4
• Purpose: act as an interface between the logical FS the user interacts with and the actual local/remote file system
• Defines essential object types, for example:
– File metadata, e.g. the inode
– Info about an open file
– Superblock: info about an entire file system
– Directory entries
• For each type of object, a set of operations is defined, to be implemented by the individual FS
– Ex. for a file: open, read, write, …

Directory rep'n
• A question of which data structure to use
• Linear list?
– Essentially an array of pointers (we point to the data blocks)
– Advantage / disadvantage?
• Other data structures are possible: any good?
– Sorted list
– Binary search tree; B-tree
– Hash table

1. Contiguous allocation
• Advantage – few seeks needed on disk
– Ex. we would like a file to reside entirely in one track if possible
• If you know the disk address of the first block and the length of the file, you know where the entire file is.
• Problems
– Where to put a new file: dynamic storage allocation: best fit, worst fit, first fit
– External fragmentation
– Can't predict a file deletion that would give you a better fit
– We don't know the size of a brand-new file
– Preallocating extra space: internal fragmentation
• Can compact (defragment) files. A tedious operation.
• File "extents": a modification to the contiguous scheme

2. Linked allocation
• The file is a linked list of disk blocks (Figure 11.6)
• The file's directory entry points to the first & last blocks (in addition to maintaining other file attributes)
• Avoids the disadvantages of contiguous allocation
– No fragmentation, don't need to know the size in advance, …
• Criticism
– A linked list is inefficient for accessing data "directly" as opposed to sequentially. Ex. the editor requests to go to the 3 millionth line.
– What if 1 of the pointers becomes damaged?
– Minor overhead from the pointer in each block. Can define "clusters" of the file to be contiguous blocks, but this suffers some fragmentation.

File allocation table
• Located at the start of the disk
• The table has an entry for each disk block
– It has a fixed size and is indexed by disk block number
– The purpose of each table entry is to point to the next block of a file, like emulating a linked list with an array
• The file's directory entry contains the starting block number.
– See Figure 11.7
• Performance problem:
– Need to do 2 seek operations every time you go to a new block in the file. Why?
• Direct access with a FAT is faster than with pure linked allocation. Why?

3. Indexed allocation
• The file on disk begins with an index block
• The index block contains pointers to the various disk blocks containing the actual data of the file.
• When file first created, all pointers set to null. One by one, they get initialized as file grows. • File’s directory entry contains block number of the index block. See Figure 11.8 • If all blocks on disk are exactly 1KB in size, how big of a file can we support using this scheme? Bigger files • Linked indexed allocation: Can continue pointers in another index block. In general, can have a linked list of index blocks. – How big a file can we have with 2 linked index blocks? • Multilevel indexed allocation – Begin with a first-level index block. Entries contain addresses of second-level index blocks. – Each second-level index block has pointers to actual file data. – How big a file can we have? • Direct & indirect indexed allocation – File’s directory entry can hold several block numbers itself. – Followed by: single indirect block, double indirect block, triple indirect block. See figure 11.9 Free space • As long as volume is mounted, system maintains freespace “list” of unused blocks. Question is how to represent this info. • Bit vector: how much space? Keep it in memory? • Collect all free blocks into linked list. – We don’t typically traverse this list. Just grab/insert one. • Grouping technique – First “free” block used to store addresses of n – 1 actual free blocks. Last address stores location of another indirect block of free addresses. • Counting: store address along with number of contiguous free blocks Efficiency • Where should inodes be on the disk? All in one place or scattered about? • Using linked allocation (treating data blocks as a linked list on disk) – How to keep a lid on the number of nodes in a list? – How to reduce internal fragmentation? • File metadata may include the last time file accessed – How expensive is this operation? Response/alternative? • Size of pointers (locations holding address) • Should the system’s global tables (process, open files) be fixed or variable length? Performance Some techniques to optimize disk usage • Disk controller: store contents of a whole track – Why? What is necessary to accomplish this? • Buffer cache and page cache – “Cache” a file’s data blocks / physical pages of virtual memory • Caching the pages may be more efficient: – Pages can be individually larger than individual data blocks – Fewer computational steps to do virtual memory access than interfacing with the file system. • Not too efficient to employ both kinds of caches – “Double caching problem” with memory-mapped I/O: data first arrives into a page cache because the device is paged… and then copied to/from buffer cache Performance (2) • Do you want writes to be synchronous or asynchronous? – Pass a parameter to open( ) system call to specify which you want. – Which is better typically? – When one is preferred over the other? • Page replacement: default policy like LRU may be bad in some situations – Consider sequential access to a file: Remove a page as soon as the next one is in use. Request the next few pages in advance. Recovery • Need to protect from – Loss of data – Inconsistency (corruption) of data – resulting from what? • Consistency checking – – – – – Scan file metadata, see if it all makes sense See if data blocks match a file correctly: traverse all pointers Check free block list What if something is wrong? Is some information more critical than others? What extra protection to give? • Log transactions: tell which operations are pending, not complete • Full vs. 
incremental backups

Network file system
• Principles
– Each machine has its own file system
– Client-server relationships may appear anywhere
– Sharing only affects the client
• To access a remote directory, mount it
– A remote directory is inserted in place of an existing (empty) directory, whose contents now become hidden.
– It will then look & behave like part of your local file system
– Supports user mobility: access your files anywhere on the network
• Protocol
– The server has a list of valid file systems that can be made available, and access rights (for each possible client)

CS 346 – Chapter 13
• I/O systems
– Hardware components
– Polling & interrupts
– DMA: direct memory access
– I/O & the kernel
• Commitment
– Please read chapter 13.

I/O
• Challenge: so many different I/O devices
• Try to put a lid on the complexity
– Classify I/O devices by how they behave
– All devices should have a common set of essential features
• Each device has a controller (hardware / circuitry) that is compatible with the host machine.
– A process running on the CPU needs to read/write values in registers belonging to an I/O controller
• A corresponding device driver is installed as part of the OS
– It communicates with the controller
– I/O instructions ultimately "control" the devices
• Devices can have memory addresses allocated to them

Concepts
• Port – the physical connection point between a device and the computer
• Bus – a set of wires connecting 1+ devices
– The bus itself is connected to the port, and devices are connected to the bus
– Figure 13.1: notice the controllers connected to the bus
– The system enforces some protocol for communication among the devices along this bus
• Daisy chain – another way to group devices
– One device is connected directly to the computer
– Each other device is connected to another device along the chain. Think of it as a linked list of devices, with the first device directly connected.

Memory-mapped I/O
• Some of RAM is reserved to allow processes to communicate with I/O controllers
• We read/write data at a specific address
– This address is assigned to a specific port → it identifies the device
– Each device is given a specific range of addresses: Fig. 13.2
– The address also signifies the meaning of the value, e.g. the status of an I/O request, a command to issue to the controller, data in, data out
• An I/O instruction can immediately get/set a value in a controller's register

Polling & interrupts
• When working with an I/O device, we need to determine its state: is it ready/busy, did it encounter an error or complete successfully?
• Polling = a busy-wait cycle to wait for an answer from the device. Periodically check the status of the operation.
• Interrupt – let the I/O device inform me
– The device sends a signal along an interrupt-request line
– The CPU detects the signal and jumps to a predefined interrupt-handling routine. (Need to save state while away.) Figure 13.3
– The nature of the signal allows us to choose the appropriate handler
– Some interrupts are maskable: they can be ignored
– What I/O interrupts do we encounter?

Direct memory access
• Used for large transfers of data
– E.g. reading the contents of a file into memory
• The DMA controller does I/O between the device and memory, independent of and in parallel with CPU execution
• Figure 13.5 example
– A process on the CPU sends a command to the DMA controller identifying the source and destination locations.
– The CPU goes about its business. The DMA controller & device driver do the rest, communicating with the disk controller.
– DMA tells the disk to transfer a chunk of data to memory location X.
– Disk controller sends individual bytes to DMA controller – DMA controller keeps track of progress. When done, interrupt CPU to announce completion. Application I/O interface • In order to help the OS define appropriate system calls, we need to know what devices can do for us • Classify device personality. Such as: – Character-stream or block? – Sequential or random access desired? – Synchronous or asynchronous, i.e. predictable or unpredictable response times? • Example devices (Figure 13.7) – – – – Terminal is character-stream oriented Disk is block oriented, and can both read & write data Keyboard is asynchronous Graphics card is write-only • Question for next time: what use can we make of clock? • I/O systems, continued – – – – …Features of I/O system calls Kernel responsibilities Buffer, cache, spool Performance issues System call behavior • Clocks – Some I/O requests may be periodic, or set to occur at a specific time – Fine grain (cycle): look up the time – Coarse grain: HW clock generates timer interrupts approx. every 1/60 of a second. Why so seldom? • Blocking vs. nonblocking I/O – Blocking: put yourself to sleep while waiting for completion. More straightforward to code – Nonblocking: you want to keep going while waiting. Response time is important. Example? • If it’s short and quick: have another thread get the data • Usually: use asynchronous system call, and wait for I/O interrupt or “event” to take place Kernel’s job • I/O scheduling – – – – Critical task since inherently slow 2 goals: minimize average response time; fairness Rearrange order of I/O requests as they enter “queue” Ex. Using the elevator algorithm for disk access • Error handling – Transient failures occur: prepare to retry I/O calls – I/O system calls can return an errno • Protection – We don’t want users to directly access I/O instructions. – All I/O requests need to be checked by kernel – Memory-mapped memory areas should be off limits to direct user intervention. (unnecessary and invites bugs) Buffers • Memory area between device and application temporarily holding data. Motivation… • Different speeds of producer & consumer (Fig. 13.10) – Would be nice to do one disk operation; wait until a whole disk block can be written to, not just 1 line of text. – Why do we use “double buffering”? • Different size units – Not everything is the same size as a disk block, page frame, TCP packet, etc. • Spool = buffer where output cannot be interleaved from different sources: printing – Create temporary “file” for each print job print queue – Managed by dedicated daemon process Preparing HW ops • Many steps, common example is reading file from disk • Before we can communicate with disk controller, need to locate file – File system identifies the device containing the file (how?) – Determine which disk blocks comprise the file (how?) • Life cycle of I/O request begins! – – – – – – Note that: A device has a wait queue (why?) Use DMA if the amount of data is large Small data can be kept in a buffer Lots of signalling/interrupts going on End result: I/O system call returns some value to user process. Let’s go through the steps Performance issues • I/O incurs many interrupts • Each interrupt causes a context switch – we’d like to minimize these • Ex. 
When logging in to a remote machine, don’t create a network message for every keyboard interrupt • Don’t copy data too many times unnecessarily • Where to implement I/O: various levels: – User space, kernel space, device driver – Microcode on device controller or in the makeup of device • Trends to observe among the levels (Fig. 13.16) – Cost of mistake; efficiency; development cost; flexibility CS 346 – Chapter 14 • Protection (Ch. 14) – Users & processes want resources. Protection means controlling their access. – More than just RWX. • Security (Ch. 15) – Preserving integrity of system & its data Background • Protect from … – Malicious, unauthorized or incompetent users – Waste (e.g. accessing expensive equipment just because cheaper resource is busy) • Distinguish between: policy & mechanism • Principle of least privilege – Minimum damage in case of error – Easier to identify who did what – Create user accounts, and tailor privileges accordingly • Bipartite relationship – Processes vs. objects – Ex. What files does a process have access to? – More practical to organize privileges by user Access control matrix • Butler Lampson, 1969. • Express our policies: how subjects (users/processes) can use each object – For each subject & each object, state the access rights – Can be unwieldy in general! • Protection domain – Set of common access rights – Usually correspond to a user or class of users Ex. Students, faculty, guests, system administrators – Process runs inside a domain determined by its owner – Domains may coincidentally overlap (Figure 14.1) Domains • Representation as 2-D table – Rows are the domains – Columns are objects – Entries in table specify access rights (Fig. 14.3) • A user can only be in 1 protection domain at any given time. – Static: a user/process always operates in the same domain (simple but inflexible) – Dynamic: a user/process can switch to another domain (complex but flexible) Can represent this way: domains are objects that a user in some domain can “switch” to. See Fig. 14.4. • UNIX: some programs have setuid bit set to allow domain switching. Example Domain Resource 1 Admin Execute Students Execute Faculty Owner Execute Resource 2 Resource 3 Write Execute Read Copy Execute • In addition to read/write/execute, special powers • Copy: you can “copy” an access right for this object to another domain. • Owner: You can create/delete access rights for this object Implementation • In theory, access control matrix is a huge table – Logically it’s 3 dimensional (capability is 3rd dimension) – Sparse: few rows, thousands of columns – Waste of virtual memory, I/O to look up this separate table • Access list for objects – Each object (file or other resource) will have attribute identifying what can be done by members of each domain – Can define a default to save space • Capability list for domains – List what I have access to, and what I can do with it – We don’t want users to arbitrarily change their capabilities! Capability information must be protected. How? Some questions • What should we do about objects that have no access rights defined? • How would we implement a policy limiting the number of times a resource is accessed? • How would we implement a policy allowing access only during certain times of day? 
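One way to encode a piece of the access matrix, sketched in C. The struct is hypothetical; the uses_left field shows one possible answer to the counting question above (a right that expires after k uses):

    #include <stdbool.h>
    #include <stdio.h>

    /* One column of the access matrix: the access list for a single
       object, one entry per domain. */
    enum right { R_READ = 1, R_WRITE = 2, R_EXEC = 4 };

    struct acl_entry {
        int domain;
        int rights;      /* bitwise OR of enum right */
        int uses_left;   /* -1 means unlimited */
    };

    bool check_access(struct acl_entry *acl, int n, int domain, int want) {
        for (int i = 0; i < n; i++) {
            if (acl[i].domain != domain) continue;
            if ((acl[i].rights & want) != want) return false;
            if (acl[i].uses_left == 0) return false;   /* right used up */
            if (acl[i].uses_left > 0) acl[i].uses_left--;
            return true;
        }
        return false;    /* no entry for this domain: deny by default */
    }

    int main(void) {
        struct acl_entry acl[] = {
            { .domain = 1, .rights = R_READ | R_EXEC, .uses_left = -1 },
            { .domain = 2, .rights = R_READ,          .uses_left = 2  },
        };
        printf("%d\n", check_access(acl, 2, 2, R_READ));   /* 1: allowed */
        printf("%d\n", check_access(acl, 2, 2, R_WRITE));  /* 0: denied  */
        return 0;
    }

A time-of-day restriction could be handled the same way, by adding an allowed-hours field to the entry and checking the clock inside check_access.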
CS 346 – Chapter 15 • Security – – – – – Physical, human, program Authentication Dictionary attack Cryptography Defense policies Areas of security Attackers look for every opportunity to get in • Physical – Restricting access: guards, locked doors – Sounds simple, but don’t neglect! • Human factors – Naivete, laziness, dishonesty – Help users pick good passwords, other recommended practices – How to handle offenders or people with a history • Program – Correct algorithm, installation of software – Used in the way originally intended – Proper behavior vs. malicious code Coding errors • Not checking validation correctly – A program to support a client remotely accessing the server through commands – Input command is scrutinized for safety: limited to “safe” commands. – But if we parse the command incorrectly, we may actually perform unsafe operation unwittingly • Synchronization problem – mkdir could be executed in 2 steps: kernel creates new empty subdirectory and assigns it to root. Then, ownership is transferred to the user who executed mkdir. – In between the 2 steps: If the system is busy, evil user can execute a command to replace the new directory with a link to some other existing file on the system. Malicious code • Trojan horse – 2 purposes: one obvious & benign; the other hidden and evil – Designed to appear like ordinary, beneficial program. “eat me” • Root kit – Trojans that replace system utility files – Suppose you break into a system, and install programs that allow you secret access. System admin can find evidence of your intrusion, look at system logs of your files and work. What can you do to cover your tracks? • Trap door – Flaw in a program placed there by designer. Bypasses security checks under some circumstances. May originally have been debugging mode. – Ex. Special access code Malicious (2) • Virus – – – – Fragment of code that spreads copies of itself to other programs Requires a host program Ex. May append/prepend its instructions to existing program Every time program runs, virus code is executed, in order to spread itself & perhaps do other “work” • Virus scanning technique – Read program code for “signature” of known viruses. In other words, look for substring of code that is unique to the virus. – But… virus may be polymorphic – New viruses keep appearing Malicious (3) • Worm – Like a virus, but it’s a stand-alone program that replicates itself and spreads. – Also can contain code to do other “work” Example: Robert Morris, 1988 • Included a special module called the “grappling hook” – – – – Install itself on remote system Make network connection back to original system Transfer rest of worm to new victim Execute worm on victim • Worm designed to exploit weaknesses in existing UNIX utility programs Morris exploits • sendmail program – Debug option: allowed an e-mail message to specify a program as its recipient. This program would run, using e-mail message body as its input. – Worm created an e-mail message, containing grappling hook code…. Instructions to remove mail headers…. Resulting program passed to shell • finger daemon – Exploited buffer overflow by “fingering” a very long name. When procedure called, it overwrote correct return address with address of grappling hook code. • 2 other exploits involved remote shell applications – Attempted to crack passwords • What happened to Morris himself? 
Dictionary attack
• We can use a hash function to encode passwords
– There is no way to compute the decoded value, so we don't have to worry about the password table being compromised
• Attacker's strategy
– Get the password table. The administrator complacently left it unprotected.
– Compile a dictionary of thousands of common words; compute the hash value of each.
– Look for matches between the dictionary and the values in the password table.
• Prepare for the threat
– Ask people to pick strange passwords, or force them to use a predefined one… that's hard to remember.
– Salt the password table

Salt
• A random string that is appended to a password before it is hashed.
• When the user logs in, the password is concatenated with the salt value, hashed, and checked against the entry in the password table.
• An attacker must now expand the dictionary to contain every possible salt value with every possible password.

Cryptography
• It is generally not feasible to build a totally secure network.
• Goal: secure communication over an unsecure medium
– Key = secret information used to encode/decode a message
– The recipient verifies that the message it receives is from the correct sender
– The sender wants to ensure only the recipient will understand the msg
• Encryption algorithm: how to secure messages
– Encryption function: (plaintext, key) → ciphertext
– Decryption function: (ciphertext, key′) → plaintext
– Secrecy of decryption is more critical than of encryption.
• Types
– Symmetric: use the same key; decryption is analogous to encryption
– Asymmetric: different keys; breaking the cipher is much more tedious

Examples
• Caesar cipher; substitution ciphers
– There are 26! ways in which letters can be reassigned.
– What is the "key"? Is this method secure?
• One-time pad (e.g. JN-25)
– Dictionary table: convert each word to a 5-digit number
– Additive table: add the next random number to each word
– Preface the message by indicating where in the additive table you are starting the encoding
– The tables may be periodically changed.
– Example: encryption code book.xlsx
• Data encryption standard
– Manipulates 64-bit chunks at a time, using XOR and shift operators.

RSA
• Choose distinct 512-bit random primes p and q
• Let N = pq, and let M = (p – 1)(q – 1)
• Choose a public encryption key e: a value less than and relatively prime to M.
– The message is x. The sender transmits: y = x^e mod N
• Choose a private decryption key d where ed mod M = 1
– e and N are public; an outsider should have a tough time factoring N to obtain p and q to determine d
– The recipient converts: z = y^d mod N, which should equal x.
• Example: p = 31, q = 41 → N = 1271, M = 1200, e = 7, d = 343
– x = 12 → y = 12^7 mod 1271 = 1047; z = 1047^343 mod 1271 = 12
– Note: the exponentiation should not be done as iterated multiplications (see the sketch below)

Example
• Choose secret primes p, q
• N = pq; M = (p – 1)(q – 1)
• Choose e less than and relatively prime to M.
• The message is x. Compute and send y = x^e mod N
• Pick a private decryption key d where ed mod M = 1
• z = y^d mod N, which should equal x.

p = 31, q = 41 → N = 1271, M = 1200
e = 7, d = 343
x = 12
y = 12^7 mod 1271 = 1047
z = 1047^343 mod 1271 = 12
It works!
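The note above about exponentiation deserves a sketch: square-and-multiply computes x^e mod N in O(log e) steps instead of e multiplications. This is a generic implementation (it assumes the modulus fits in 32 bits so the 64-bit products don't overflow), checked against the numbers in the example:

    #include <stdint.h>
    #include <stdio.h>

    /* Fast modular exponentiation (square-and-multiply). */
    uint64_t modpow(uint64_t base, uint64_t exp, uint64_t mod) {
        uint64_t result = 1;
        base %= mod;
        while (exp > 0) {
            if (exp & 1)
                result = result * base % mod;   /* multiply step */
            base = base * base % mod;           /* square step */
            exp >>= 1;
        }
        return result;
    }

    int main(void) {
        /* Numbers from the slides: N=1271, e=7, d=343, x=12 */
        uint64_t y = modpow(12, 7, 1271);       /* encrypt: 1047 */
        uint64_t z = modpow(y, 343, 1271);      /* decrypt: 12   */
        printf("y = %llu, z = %llu\n", (unsigned long long)y,
                                       (unsigned long long)z);
        return 0;
    }

It prints y = 1047 and z = 12, matching the slides.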
Diffie-Hellman
• A method for 2 people to establish a shared private key
• Choose values p (prime) and q
• Sender
– Chooses a secret value a, and computes A = q^a mod p
– Sends A, p, q
– An eavesdropper cannot easily determine a
• Receiver
– Chooses a secret value b
– Computes B = q^b mod p and K = A^b mod p
– Sends B back to the sender, who can compute K = B^a mod p
• Both ways of computing the secret K are equivalent:
– A^b mod p = (q^a)^b mod p
– B^a mod p = (q^b)^a mod p
• (A toy numeric sketch appears at the end of these notes.)

Digital signature
• Used to authenticate the origin of a message
– Also useful if the sender later denies ever sending the message
• Sender
– Computes a hash value of the message → a 128/160-bit result
– Applies the D function (using the private key) → a "signature block"
– Appends the signature block to the message to send
• Receiver
– Applies the E function (using the sender's public key) → the hash
– Computes the hash value of the message, and sees if there is a match.
• Efficient, since the E & D functions are applied to a small amount of data. The message body itself might not be confidential.

Doing security
• Defense in depth: don't rely on just 1 catch-all method
• Some attackers know intimate details of your system and how you operate
– Attackers may make some assumptions; surprises slow them down
• Penetration test. Look for:
– Bad passwords
– Programs that look or behave abnormally
• Using setuid when not necessary
• In a system directory when not necessary
• Too many daemons
– Unusual file permissions, search paths, modification dates
– Old versions of software

Intrusion detection
• What data do you want to collect?
• When is a real-time response required?
• What to scan:
– System calls, shell commands, network packets
• Possible responses
– Kill the process
– Surreptitiously alert the admin
– Have honeypots ready for the attacker
• How to detect
– Signature-based: look for a specific string or behavior pattern
• Must know what to look for
– Anomalies from normal operating specifications
• But what is normal?

Anomaly detection
• Establish accurate benchmarks of normal operation
– Ex. how often do we get pinged from China?
• False positive = false alarm: we alert a human, but there was no intrusion
• False negative = we missed an intrusion
• Deciding whether to alert a human is critical; otherwise people will perceive that a lot of false alarms exist
• Example
– 20 out of 1,000,000 records show intrusion
– The system detects/alerts on 80% of these intrusion events
• 16 records revealed, 4 missed
– The system falsely identifies 0.01% of normal events as an intrusion
• 0.01% of 999,980 ≈ 100 false alarms
– From the human point of view, 100/116 = 86% of the alarms are false
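Finally, a toy end-to-end Diffie-Hellman exchange in C, reusing the modpow sketch from the RSA notes. The constants are tiny, invented values; a real p would be hundreds of bits long:

    #include <stdint.h>
    #include <stdio.h>

    /* Square-and-multiply, as in the RSA sketch. */
    uint64_t modpow(uint64_t base, uint64_t exp, uint64_t mod) {
        uint64_t r = 1;
        base %= mod;
        while (exp > 0) {
            if (exp & 1) r = r * base % mod;
            base = base * base % mod;
            exp >>= 1;
        }
        return r;
    }

    int main(void) {
        uint64_t p = 227, q = 12;      /* public: prime p and base q */
        uint64_t a = 31, b = 17;       /* secrets held by sender/receiver */
        uint64_t A = modpow(q, a, p);  /* sender transmits A, p, q */
        uint64_t B = modpow(q, b, p);  /* receiver sends back B */
        uint64_t K1 = modpow(B, a, p); /* sender's copy of the key */
        uint64_t K2 = modpow(A, b, p); /* receiver's copy of the key */
        printf("K1 = %llu, K2 = %llu (should match)\n",
               (unsigned long long)K1, (unsigned long long)K2);
        return 0;
    }

Both sides print the same K, since (q^a)^b mod p = (q^b)^a mod p.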