Review: threads and processes
Landon Cox
January 20, 2016

Intro to processes
• Remember, for any area of OS, ask
  • What interface does the hardware provide?
  • What interface does the OS provide?
• Physical reality?
  • Single computer (CPUs + memory)
  • Execute instructions from many programs
• What an application sees?
  • Each app thinks it has its own CPU + memory

Hardware, OS interfaces
[Figure: applications (Job 1, Job 2, Job 3), each with its own CPU and memory, sit on top of the OS, which sits on top of the hardware (memory, CPUs).]

What is a process?
• Informal
  • A program in execution
  • Running code + things it can read/write
  • Process ≠ program
• Formal
  • ≥ 1 threads in their own address space
  • (soon threads will share an address space)

Parts of a process
• Thread
  • Sequence of executing instructions
  • Active: does things
• Address space
  • Data the process uses as it runs
  • Passive: acted upon by threads

Play analogy
• Process is like a play performance
• Program is like the play’s script
• What are the threads? What is the address space?
[Figure labels: Threads, Address space]

What is in the address space?
• Program code
  • Instructions, also called “text”
• Data segment
  • Global variables, static variables
  • Heap (where “new” memory comes from)
• Stack
  • Where local variables are stored

Review of the stack
• Each stack frame contains a function’s
  • Local variables
  • Parameters
  • Return address
  • Saved values of calling function’s registers
• The stack enables recursion

Example stack
Code (the address after each call is that call’s return address):
    void C () { A (0); }                  // 0x8048347
    void B () { C (); }                   // 0x8048354
    void A (int tmp) { if (tmp) B (); }   // 0x8048361
    int main () { A (1); return 0; }      // 0x804838c
Stack as drawn in memory, from 0xfffffff (top) to 0x0 (bottom):
    A:    tmp=0, RA=0x8048347
    C:    const=0, RA=0x8048354
    B:    RA=0x8048361
    A:    tmp=1, RA=0x804838c
    main: const1=1, const2=0

The stack and recursion
• How can recursion go wrong?
  • Can overflow the stack
  • Keep adding frame after frame
Code (the address after each call is that call’s return address):
    void A (int bnd) { if (bnd) A (bnd-1); }   // 0x8048361
    int main () { A (3); return 0; }           // 0x804838c
Stack as drawn in memory, from 0xfffffff (top) to 0x0 (bottom):
    A:    bnd=0, RA=0x8048361
    A:    bnd=1, RA=0x8048361
    A:    bnd=2, RA=0x8048361
    A:    bnd=3, RA=0x804838c
    main: const1=3, const2=0

What is missing?
• What state isn’t in the address space?
  • Registers
    • Program counter (PC)
    • General purpose registers
  • Review architecture for more details

Multiple threads in an addr space
• Several actors on a single set
• Sometimes they interact (speak, dance)
• Sometimes they are apart (different scenes)

Private vs global thread state
• What state is private to each thread?
  • PC (where actor is in his/her script)
  • Stack, SP (actor’s mindset)
• What state is shared?
  • Global variables, heap (props on set)
  • Code (like lines of a play)

Concurrency
• Concurrency
  • Having multiple threads active at one time
  • Thread is the unit of concurrency
• Primary topics
  • How threads cooperate on a single task
  • How multiple threads can share the CPU

Address spaces
• Address space
  • Unit of “state partitioning”
• Primary topics
  • Many addr spaces sharing physical memory
  • Efficiency
  • Safety (protection)

Cooperating threads
• Assume each thread has its own CPU
  • We will relax this assumption later
[Figure: Threads A, B, and C each run on their own CPU and share one memory.]
• CPUs run at unpredictable speeds
  • Can be stalled for any number of reasons
  • Source of non-determinism

Non-determinism and ordering
[Figure: events from Threads A, B, and C merged along a time axis into one global ordering.]
• Why do we care about the global ordering?
  • Might have dependencies between events
  • Different orderings can produce different results
• Why is this ordering unpredictable?
  • Can’t predict how fast processors will run

Non-determinism example 1
• Thread A: cout << "ABC";
• Thread B: cout << "123";
• Possible outputs?
  • “A1BC23”, “ABC123”, …
• Impossible outputs? Why?
  • “321CBA”, “B12C3A”, …
• What is shared between threads?
  • Screen, maybe the output buffer

Non-determinism example 2
• y=10;
• Thread A: int x = y+1;
• Thread B: y = y*2;
• Possible results?
  • A goes first: x = 11 and y = 20
  • B goes first: y = 20 and x = 21
• What is shared between threads?
  • Variable y

Non-determinism example 3
• x=0;
• Thread A: x = 1;
• Thread B: x = 2;
• Possible results?
  • B goes first: x = 1
  • A goes first: x = 2
• Is x = 3 possible?

Example 3, continued
• What if “x = <int>;” is implemented as
  • x := x & 0
  • x := x | <int>
• Consider this schedule
  • Thread A: x := x & 0
  • Thread B: x := x & 0
  • Thread B: x := x | 2
  • Thread A: x := x | 1
• Both ORs land on the same zeroed x, so x ends up as 3

Atomic operations
• Must know what operations are atomic
  • before we can reason about cooperation
• Atomic
  • Indivisible
  • Happens without interruption
• Between start and end of atomic action
  • No events from other threads can occur

Review of examples
• Print example (ABC, 123)
  • What did we assume was atomic?
  • What if “print” is atomic?
  • What if printing a char was not atomic?
• Arithmetic example (x=y+1, y=y*2)
  • What did we assume was atomic?

Atomicity in practice
• On most machines
  • Memory assignment/reference is atomic
  • E.g.: a=1, a=b
• Many other instructions are not atomic
  • E.g.: double-precision floating point store
  • (often involves two memory operations)

Virtual/physical interfaces
[Figure: applications use SW atomic operations provided by the OS; the OS builds them from HW atomic operations provided by the hardware.]
• If you don’t have atomic operations, you can’t make one.

Constraining concurrency
• Synchronization
  • Controlling thread interleavings
• Some events are independent
  • No shared state
  • Relative order of these events doesn’t matter
• Other events are dependent
  • Output of one can be input to another
  • Their order can affect program results

Goals of synchronization
1. All interleavings must give correct result
  • Correct concurrent program
  • Works no matter how fast threads run
  • Important for your projects!
2. Constrain program as little as possible
  • Why?
  • Constraints slow program down
  • Constraints create complexity

Raising the level of abstraction
• Locks
  • Also called mutexes
  • Provide mutual exclusion
  • Prevent threads from entering a critical section while another thread is inside it
• Lock operations
  • Lock (aka Lock::acquire)
  • Unlock (aka Lock::release)

Lock operations
• Lock: wait until lock is free, then acquire it
    do {
      if (lock is free) {    // the test and the assignment must be atomic
        lock = 1             // with respect to other threads calling this code
        break
      }
    } while (1)
  • This is a busy-waiting implementation
  • We’ll fix this in a few lectures
• Unlock: atomic lock = 0

Elements of locking
1. The lock is initially free
2. Threads acquire lock before an action
3. Threads release lock when action completes
4. Lock() must wait if someone else has lock
• Key idea
  • All synchronization involves waiting
  • Threads are either running or blocked

Example: thread-safe queue
    enqueue () {
      lock (qLock);
      // ptr is private
      // head is shared
      new_element = new node();
      if (head == NULL) {
        head = new_element;
      } else {
        node *ptr;
        // find queue tail
        for (ptr=head; ptr->next!=NULL; ptr=ptr->next) {}
        ptr->next=new_element;
      }
      new_element->next=0;
      unlock (qLock);
    }

    dequeue () {
      lock (qLock);
      element=NULL;
      if (head != NULL) {         // if queue non-empty
        if (head->next!=0) {      // remove head
          element=head->next;
          head->next=head->next->next;
        } else {
          element = head;
          head = NULL;
        }
      }
      unlock (qLock);
      return element;
    }
• What can go wrong?

Thread-safe queue
• Can enqueue unlock anywhere?
  • No
• Must leave shared data
  • In a consistent/sane state
• Data invariant
  • “consistent/sane state”
  • “always” true
    enqueue () {
      lock (qLock);
      // ptr is private
      // head is shared
      new_element = new node();
      if (head == NULL) {
        head = new_element;
      } else {
        node *ptr;
        // find queue tail
        for (ptr=head; ptr->next!=NULL; ptr=ptr->next) {}
        ptr->next=new_element;
      }
      unlock (qLock);          // safe?
      new_element->next=0;
    }

Invariants
• What are the queue invariants?
  • Each node appears once (from head to null)
  • Enqueue results in prior list + new element
  • Dequeue removes exactly one element
• Can invariants ever be false?
  • Must be
  • Otherwise you could never change states

More on invariants
• So when is the invariant broken?
  • Can only be broken while lock is held
  • And only by thread holding the lock
• Really a “public” invariant
  • The data’s state when the lock is free
  • Like having your house tidy before guests arrive
• Hold lock whenever accessing shared data
(Photo: http://www.flickr.com/photos/jacobaaron/3489644869/)

More on invariants
• What about reading shared data?
  • Still must hold lock
  • Else another thread could break invariant
  • (Thread A prints Q as Thread B enqueues)

Intro to ordering constraints
• Say you want dequeue to wait while the queue is empty
• Can we just busy-wait?
  • No!
  • Still holding lock
    dequeue () {
      lock (qLock);
      element=NULL;
      while (head==NULL) {}
      // remove head
      element=head->next;
      head->next=NULL;
      unlock (qLock);
      return element;
    }

Release lock before spinning?
    dequeue () {
      lock (qLock);
      element=NULL;
      unlock (qLock);
      while (head==NULL) {}
      lock (qLock);
      // remove head
      element=head->next;
      head->next=NULL;
      unlock (qLock);
      return element;
    }
• What can go wrong?
  • Head might be NULL when we try to remove entry

One more try
    dequeue () {
      lock (qLock);
      element=NULL;
      while (head==NULL) {
        unlock (qLock);
        lock (qLock);
      }
      // remove head
      element=head->next;
      head->next=NULL;
      unlock (qLock);
      return element;
    }
• Does it work?
  • Seems ok
• Why?
  • Shared state is protected
• Downside?
  • Busy-waiting
  • Wasteful

Ideal solution
• Would like dequeueing thread to “sleep”
  • Add self to “waiting list”
  • Enqueuer can wake it up when Q is non-empty
• Problem: what to do with the lock?
• Why can’t dequeueing thread sleep with lock?
  • Enqueuer would never be able to add

Release the lock before sleep?
    enqueue () {
      acquire lock
      find tail of queue
      add new element
      if (dequeuer waiting) {
        remove from wait list
        wake up dequeuer
      }
      release lock
    }

    dequeue () {
      acquire lock
      …
      if (queue empty) {
        release lock
        add self to wait list
        sleep
        acquire lock
      }
      …
      release lock
    }
• Does this work?
  • No: thread can sleep forever
  • (1) Dequeuer sees an empty queue and releases the lock
  • (2) Enqueuer adds an element, finds no one on the wait list, and releases the lock
  • (3) Dequeuer adds itself to the wait list and sleeps

Release the lock before sleep? (wait list first)
    dequeue () {
      acquire lock
      …
      if (queue empty) {
        add self to wait list
        release lock
        sleep
        acquire lock
      }
      …
      release lock
    }
• enqueue is unchanged
• Problem: missed wake-up
  • (1) Dequeuer adds itself to the wait list and releases the lock
  • (2) Enqueuer adds an element, removes the dequeuer from the wait list, and wakes it up
  • (3) Dequeuer, which was not yet asleep, goes to sleep
• Note: this can be fixed, but it’s messy

Two types of synchronization
• As before we need to raise the level of abstraction
1. Mutual exclusion
  • One thread doing something at a time
  • Use locks
2. Ordering constraints
  • Describe “before-after” relationships
  • One thread waits for another
  • Use monitors: a lock + its condition variable

Locks and condition variables
• Condition variables
  • Let threads sleep inside a critical section
  • Internal atomic actions (for now, by definition)
      // begin atomic
      release lock
      put thread on wait queue
      go to sleep
      // end atomic
• CV State = queue of waiting threads + one lock

Condition variable operations
    wait (lock) {              // lock always held on entry and on return
      // atomic
      release lock
      put thread on wait queue
      go to sleep
      // after wake up
      acquire lock
    }

    signal () {                // atomic; lock usually held
      wakeup one waiter (if any)
    }

    broadcast () {             // atomic; lock usually held
      wakeup all waiters (if any)
    }

CVs and invariants
• Ok to leave invariants violated before wait?
  • No: wait releases the lock
• Larger rule about returning from wait
  • Lock may have changed hands
  • State can change between wait entry and return
  • Don’t make assumptions about shared state

Multi-threaded queue
    enqueue () {
      acquire lock
      find tail of queue
      add new element
      signal (lock, CV)
      release lock
    }

    dequeue () {
      acquire lock
      if (queue empty) {
        wait (lock, CV)
      }
      remove item from queue
      release lock
      return removed item
    }
• What if “queue empty” takes more than one instruction?
• Any problems with the “if” statement in dequeue?
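
A minimal C++ sketch of the monitor-style queue above, assuming the standard library's std::mutex and std::condition_variable stand in for the slides' lock and CV. The names BlockingQueue and run_demo are hypothetical and not from the lecture. Note that wait is called with a predicate, so the condition is re-checked every time the thread wakes up rather than tested once.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical monitor: one lock plus one condition variable guarding a queue.
class BlockingQueue {
public:
    void enqueue(int item) {
        std::lock_guard<std::mutex> guard(lock_);    // acquire lock
        items_.push(item);                           // add new element
        cv_.notify_one();                            // signal: wake one waiter, if any
    }                                                // lock released by guard's destructor

    int dequeue() {
        std::unique_lock<std::mutex> guard(lock_);   // acquire lock
        // wait() releases the lock while sleeping and re-acquires it before
        // returning; the predicate is evaluated again after every wake-up.
        cv_.wait(guard, [this] { return !items_.empty(); });
        assert(!items_.empty());                     // invariant holds with lock held
        int item = items_.front();
        items_.pop();
        return item;
    }

private:
    std::mutex lock_;
    std::condition_variable cv_;
    std::queue<int> items_;                          // shared state, protected by lock_
};

// Usage sketch: the consumer may block until the producer enqueues.
int run_demo() {
    BlockingQueue q;
    int sum = 0;  // written only by the consumer, read after join
    std::thread consumer([&] { for (int i = 0; i < 3; ++i) sum += q.dequeue(); });
    std::thread producer([&] { for (int i = 1; i <= 3; ++i) q.enqueue(i); });
    producer.join();
    consumer.join();
    return sum;
}
```

Either thread may run first: if the consumer gets there early it sleeps inside dequeue with the lock released, which is what lets the producer get in and add an element.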
Multi-threaded queue
• Same code with wait expanded
    enqueue () {
      acquire lock
      find tail of queue
      add new element
      signal (lock, CV)
      release lock
    }

    dequeue () {
      acquire lock
      if (queue empty) {
        // begin atomic wait
        release lock
        add wait list, sleep
        // end atomic wait
        re-acquire lock
      }
      remove item from queue
      release lock
      return removed item
    }
• Problem interleaving with two dequeuers
  • (1) First dequeuer acquires the lock, finds the queue empty, and waits
  • (2) Enqueuer acquires the lock, adds an element, signals, and releases the lock
  • (3) Second dequeuer acquires the lock, removes the item, and returns
  • (4) First dequeuer re-acquires the lock and removes from an empty queue
• How to solve?

Multi-threaded queue
• The “condition” in condition variable
    enqueue () {
      acquire lock
      find tail of queue
      add new element
      signal (lock, CV)
      release lock
    }

    dequeue () {
      acquire lock
      while (queue empty) {
        wait (lock, CV)
      }
      remove item from queue
      release lock
      return removed item
    }
• Solve with a while loop (“loop before you leap”)
• You can now do first programming project

Recap and looking ahead
[Figure: applications use threads and synchronization primitives provided by the OS; the hardware provides atomic load-store, interrupt enable-disable, and atomic test-set.]

Course administration
• Next lecture
  • Memory and address spaces
  • Paging, page tables, TLB, etc.
• Next week
  • Start reading papers
  • Look at early operating systems
  • First programming project out

Threads that aren’t running
• What is a non-running thread?
  • thread=“sequence of executing instructions”
  • non-running thread=“paused execution”
• Must save thread’s private state
• To re-run, re-load private state
  • Want thread to start where it left off

Private vs global thread state
• What state is private to each thread?
  • Code (like lines of a play)
  • PC (where actor is in his/her script)
  • Stack, SP (actor’s mindset)
• What state is shared?
  • Global variables, heap (props on set)

Thread control block (TCB)
• What needs to access threads’ private data?
  • The CPU
    • This info is stored in the PC, SP, other registers
  • The OS needs pointers to non-running threads’ data
• Thread control block (TCB)
  • Container for non-running threads’ private data
  • Values of PC, code, SP, stack, registers

Thread control block
[Figure: an address space holding code and one stack per thread. Thread 1 is running, so its PC, SP, and registers are in the CPU. TCB2 and TCB3 are on the ready queue, each holding a saved PC, SP, and registers.]

Thread states
• Running
  • Currently using the CPU
• Ready
  • Ready to run, but waiting for the CPU
• Blocked
  • Stuck in lock (), wait (), or down ()

Switching threads
• What needs to happen to switch threads?
1. Thread returns control to OS
  • For example, via the “yield” call
2. OS chooses next thread to run
3. OS saves state of current thread
  • To its thread control block
4. OS loads context of next thread
  • From its thread control block
5. Run the next thread
• On Linux: swapcontext

1. Thread returns control to OS
• How does the thread system get control?
• Voluntary internal events
  • Thread might block inside lock or wait
  • Thread might call into kernel for service (system call)
  • Thread might call yield
• Are internal events enough?

1. Thread returns control to OS
• Involuntary external events
  • (events not initiated by the thread)
• Hardware interrupts
  • Transfer control directly to OS interrupt handlers
  • From your architecture course
    • CPU checks for interrupts while executing
    • Jumps to OS code with interrupt mask set
  • Interrupts lead to pre-emption (a forced yield)
  • Common interrupt: timer interrupt

2. Choosing the next thread
• If no ready threads, just spin
  • Modern CPUs execute a “halt” instruction
  • Loop switches to thread if one is ready
• Many ways to prioritize ready threads
  • Huge literature on scheduling algorithms

3. Saving state of current thread
• What needs to be saved?
  • Registers, PC, SP
• What makes this tricky?
  • Self-referential sequence of actions
  • Need registers to save state
  • But you’re trying to save all the registers
  • Saving the PC is particularly tricky

Saving the PC
• Why won’t this work?
    Instruction address
    100   store PC in TCB
    101   switch to next thread
• Returning thread will execute instruction at 100
  • And just re-execute the switch
• Really want to save address 102

4. OS loads the next thread
• Where is the next thread’s state/context?
  • Thread control block (in memory)
• How to load the registers?
  • Use load instructions to grab from memory
• How to load the stack?
  • Stack is already in memory, load SP

5. OS runs the next thread
• How to resume thread’s execution?
  • Jump to the saved PC
• On whose stack are these steps running? Or, who jumps to the saved PC?
  • The thread that called yield
  • (or was interrupted or called lock/wait)
• How does this thread run again?
  • Some other thread must switch to it

Why use locks?
• If we have disable-enable, why do we need locks?
  • Program could bracket critical sections with disable-enable
  • Might not be able to give control back to thread library
      disable interrupts
      while (1){}
  • Can’t have multiple locks (over-constrains concurrency)

Why use locks?
• How do we know if disabling interrupts is safe?
  • Need hardware support
  • CPU has to know if running code is trusted (i.e., is the OS)
  • Example of why we need the kernel
• Other things that user programs shouldn’t do?
  • Manipulate page tables
  • Reboot machine
  • Communicate directly with hardware
  • Will cover in upcoming memory review

Lock implementation #1
• Kernel implementation
  • Disable interrupts + busy-waiting
    lock () {
      disable interrupts
      while (value != FREE) {
        enable interrupts
        disable interrupts
      }
      value = BUSY
      enable interrupts
    }

    unlock () {
      disable interrupts
      value = FREE
      enable interrupts
    }
• Why is it ok for lock code to disable interrupts?
  • It’s in the trusted kernel (we have to trust something).
• Do we need to disable interrupts in unlock?
  • Only if “value = FREE” is multiple instructions (safer)
• Why enable-disable in lock loop body?
  • Otherwise, no one else will run (including unlockers)

Using read-modify-write instructions
• Disabling interrupts
  • Ok for uni-processor, breaks on multi-processor
  • Why?
• Could use atomic load-store to make a lock
  • Inefficient, lots of busy-waiting
• Hardware people to the rescue!
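
To make "a lock from atomic load-store" concrete, here is a hedged C++ sketch of Peterson's algorithm for two threads. The slides do not say which load-store construction they have in mind, so this is an illustrative choice; PetersonLock and run_demo are hypothetical names. It relies on std::atomic's default sequentially consistent ordering, and the spin in lock() shows the busy-waiting the slide warns about.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Peterson's algorithm for exactly two threads (ids 0 and 1).
// Built only from atomic loads and stores; no read-modify-write instruction.
class PetersonLock {
public:
    PetersonLock() {
        interested_[0].store(false);
        interested_[1].store(false);
        turn_.store(0);
    }

    void lock(int self) {
        assert(self == 0 || self == 1);
        const int other = 1 - self;
        interested_[self].store(true);   // announce that we want the lock
        turn_.store(other);              // give priority to the other thread
        // Busy-wait while the other thread wants the lock and has priority.
        while (interested_[other].load() && turn_.load() == other) {
        }
    }

    void unlock(int self) {
        interested_[self].store(false);  // a single atomic store releases the lock
    }

private:
    std::atomic<bool> interested_[2];
    std::atomic<int> turn_;
};

// Usage sketch: two threads increment a plain int under the lock.
int run_demo() {
    PetersonLock lock;
    int counter = 0;                     // shared, protected by lock
    auto worker = [&](int id) {
        for (int i = 0; i < 50000; ++i) {
            lock.lock(id);
            ++counter;                   // critical section
            lock.unlock(id);
        }
    };
    std::thread t0(worker, 0);
    std::thread t1(worker, 1);
    t0.join();
    t1.join();
    return counter;
}
```

The construction only handles two threads and spins whenever the lock is contended, which is part of why a hardware read-modify-write instruction is the more practical building block.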
Using read-modify-write instructions
• Most modern processor architectures
  • Provide an atomic read-modify-write instruction
• Atomically
  • Read value from memory into register
  • Write new value to memory
• Implementation details
  • Lock memory location at the memory controller

Test&set on most architectures
    test&set (X) {
      tmp = X
      X = 1
      return (tmp)
    }
• Set: sets location to 1
• Test: returns old value
• Slightly different on x86 (Exchange)
  • Atomically swaps value between register and memory

Lock implementation #2
• Use test&set
• Initially, value = 0
    lock () {
      while (test&set(value) == 1) {}
    }

    unlock () {
      value = 0
    }
• What happens if value = 1?
• What happens if value = 0?

Locks and busy-waiting
• All implementations have used busy-waiting
  • Wastes CPU cycles
• To reduce busy-waiting, integrate
  • Lock implementation
  • Thread dispatcher data structures

Lock implementation #3
• Interrupt disable, no busy-waiting
    lock () {
      disable interrupts
      if (value == FREE) {
        value = BUSY                // lock acquire
      } else {
        add thread to queue of threads waiting for lock
        switch to next ready thread // don’t add to ready queue
      }
      enable interrupts
    }

    unlock () {
      disable interrupts
      value = FREE
      if anyone on queue of threads waiting for lock {
        take waiting thread off queue, put on ready queue
        value = BUSY
      }
      enable interrupts
    }
• This is called a “handoff” lock.
• Who gets the lock after someone calls unlock?
• Who might get the lock if it weren’t handed off directly? (i.e., if value weren’t set BUSY in unlock)
• What kind of ordering guarantees on lock acquisition does the hand-off lock provide? Fumble lock?
• In lock, “add thread to queue of threads waiting for lock”: what does this mean? Are we saving the PC?

Lock implementation #3
This is called a “handoff” lock.
    lock () {
      disable interrupts
      if (value == FREE) {
        value = BUSY                // lock acquire
      } else {
        lockqueue.push(&current_thread->ucontext);
        swapcontext(&current_thread->ucontext, &new_thread->ucontext);
      }
      enable interrupts
    }

    unlock () {
      disable interrupts
      value = FREE
      if anyone on queue of threads waiting for lock {
        take waiting thread off queue, put on ready queue
        value = BUSY
      }
      enable interrupts
    }
• No, just adding a pointer to the TCB/context.

Example trace (B holds lock)
    Thread B:  yield () { disable interrupts … switch (B->A)
    Thread A:  enable interrupts } // exit thread library
    Thread A:  <user code>
    Thread A:  lock () { disable interrupts … switch (A->B)
    Thread B:  back from switch (B->A) … enable interrupts } // exit yield
    Thread B:  <user code>
    Thread B:  unlock () // moves A to ready queue
    Thread B:  yield () { disable interrupts … switch (B->A)
    Thread A:  back from switch (A->B) … enable interrupts } // exit lock
    Thread A:  <user code>

Lock implementation #4
• Test&set, minimal busy-waiting
    lock () {
      while (test&set (guard)) {}   // like interrupt disable
      if (value == FREE) {
        value = BUSY
      } else {
        put on queue of threads waiting for lock
        switch to another thread    // don’t add to ready queue
      }
      guard = 0                     // like interrupt enable
    }

    unlock () {
      while (test&set (guard)) {}   // like interrupt disable
      value = FREE
      if anyone on queue of threads waiting for lock {
        take waiting thread off queue, put on ready queue
        value = BUSY
      }
      guard = 0                     // like interrupt enable
    }
• Why is this better than the test&set-only lock implementation?
  • Only busy-wait while another thread is in lock or unlock
  • Before, we busy-waited while lock was held
• What is the switch invariant?
  • Threads promise to call switch with guard set to 1.

Summary of implementing locks
• Synchronization code needs atomicity
• Three options
  • Atomic load-store
    • Lots of busy-waiting
  • Interrupt disable-enable
    • No busy-waiting
    • Breaks on a multi-processor machine
  • Atomic test-set
    • Minimal busy-waiting
    • Works on multi-processor machines

Semaphores
• First defined by Dijkstra in mid 60s
• Two operations: up and down
    // aka “V” (“verhogen”)
    up () {
      // begin atomic
      value++
      // end atomic
    }
• What is going on here? Can value ever be < 0?
    // aka “P” (“proberen”)
    down () {
      do {
        // begin atomic
        if (value > 0) {
          value--
          break
        }
        // end atomic
      } while (1)
    }

More semaphores
• Key state of a semaphore is its value
  • Initial value determines semaphore’s behavior
  • Value cannot be accessed outside semaphore
  • (i.e., there is no semaphore.getValue() call)
• Semaphores can be both synchronization types
  • Mutual exclusion (like locks)
  • Ordering constraints (like monitors)

Semaphore mutual exclusion
• Ensure that at most 1 (or at most N) threads are in the critical section
    s.down ();
    // critical section
    s.up ();
• How do we make a semaphore act like a lock?
  • Set initial value to 1 (or N)
  • Like lock/unlock, but more general
  • (could allow 2 threads in critical section if initial value = 2)
  • Lock is equivalent to a binary semaphore

Semaphore ordering constraints
• Thread A waits for B before proceeding
    // Thread A
    s.down ();
    // continue

    // Thread B
    // do task
    s.up ();
• How to make a semaphore behave like a monitor?
  • Set initial value of semaphore to 0
  • A is guaranteed to wait for B to finish
  • Doesn’t matter if A or B runs first
  • Like a CV in which condition is “sem.value==0”
  • Can think of as a “prefab” condition variable

Upcoming
• Friday lecture
  • Review of memory and address spaces
• Next week
  • We’ll start reading papers
  • Start programming Project 1
• Other questions?
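
As a recap of the semaphore slides, here is a hedged C++ sketch of up/down and the ordering-constraint pattern (initial value 0). It assumes a std::mutex and std::condition_variable in place of the "begin atomic / end atomic" markers, and it sleeps instead of busy-waiting as the slide's down() does. Semaphore and run_ordering_demo are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Counting semaphore built from a lock and a condition variable (an assumption;
// the slides define up/down abstractly with atomic regions).
class Semaphore {
public:
    explicit Semaphore(int initial) : value_(initial) { assert(initial >= 0); }

    void up() {                                       // aka "V"
        std::lock_guard<std::mutex> guard(lock_);
        ++value_;
        cv_.notify_one();                             // wake one thread blocked in down()
    }

    void down() {                                     // aka "P"
        std::unique_lock<std::mutex> guard(lock_);
        cv_.wait(guard, [this] { return value_ > 0; });  // block while value == 0
        --value_;                                     // value never drops below 0
    }

private:
    std::mutex lock_;
    std::condition_variable cv_;
    int value_;                                       // not readable from outside
};

// Ordering constraint: Thread A must not continue until Thread B's task is done.
int run_ordering_demo() {
    Semaphore s(0);                                   // initial value 0
    int task_result = 0;                              // written by B before up()
    int seen_by_a = -1;                               // read by the caller after join
    std::thread a([&] { s.down(); seen_by_a = task_result; });
    std::thread b([&] { task_result = 42; s.up(); });
    a.join();
    b.join();
    return seen_by_a;
}
```

Constructing the same class with an initial value of 1 gives the lock-like behavior from the mutual exclusion slide: down() before the critical section, up() after it.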