W4118 Operating Systems
Instructor: Junfeng Yang

Logistics
• Homework 2
  - Who has problems getting the VM?
  - This assignment requires changes to some assembly code; we'll update the webpage to include the assembly changes
• Any questions?

Last lecture: processes (cont.)
• Process termination: zombie, orphan
• Interprocess communication
  - Message passing: signal, pipe
  - Shared memory: shmget
• Processes in Linux
  - Data structures: process control block (task_struct), kernel stack, thread_info, and the connections between them
  - Process operations: fork(), exit()

Today
• Processes in Linux: context switch on x86
• Threads
  - Why and what?
  - Multithreading models
• Intro to synchronization

Context switch call chain
• schedule() (kernel/sched.c): picks the next process to run (we'll talk about scheduling later)
• context_switch()
  - switch_mm() (include/asm-i386/mmu_context.h): switch address space
  - switch_to (include/asm-i386/system.h): switch stack, registers, and %eip
  - __switch_to() (arch/i386/kernel/process.c): architecture-dependent setup

Context switch by stack switch: the idea
• The kernel stack captures a process's state: the saved registers, and the task_struct (reached from %esp via thread_info)
• Therefore, changing the stack pointer changes the current process:
  %esp -> task's kernel stack -> task's thread_info -> task's process descriptor

Context switch by stack switch: the implementation
(simplified, from include/asm-i386/system.h)

  switch_to(p0, p1):
      save registers on p0's stack
      p0->esp = %esp
      p0->eip = ret_addr
      %esp = p1->esp        ; from here on, we run on p1's stack
      push p1->eip
      ret                   ; jump to p1's saved %eip
  ret_addr:
      pop registers from stack

Today
• Processes in Linux: context switch on x86
• Threads
  - Why and what?
  - Multithreading models
• Intro to synchronization

Why threads?
• Express concurrency: web server (multiple requests), browser (GUI + network I/O), …

  for (;;) {
      int fd = accept_client();
      create_thread(process_request, fd);
  }

• Efficient communication
• Using a separate process for each task can be heavyweight
  - Costly: process creation, context switch, communication
  - In general, having a separate kernel anything may be heavyweight: must trap into the kernel for service

Threads
• Threads: separate streams of execution that share an address space
• Allow one process to have multiple points of execution, and can potentially use multiple CPUs
• Thread control block (TCB): PC, registers, stack, …
• Very similar to processes, but different

Single and multithreaded processes
• Threads in one process share code, data, open files, …

Threads vs. processes
• A thread:
  - Has no data segment or heap
  - Cannot live on its own; it must live within a process
  - There can be more than one thread in a process; the first thread calls main() and has the process's stack
  - Inexpensive creation, inexpensive context switching, efficient communication
  - If a thread dies, its stack is reclaimed
• A process:
  - Has code/data/heap and other segments
  - Has at least one thread
  - Threads within a process share code/data/heap and I/O, but each has its own stack and registers
  - Expensive creation, expensive context switching
  - Interprocess communication can be expensive
  - If a process dies, its resources are reclaimed and all its threads die

How to use threads?
• Use a thread library, e.g.
pthread or Win32 threads
• Common operations
  - create/terminate
  - suspend/resume
  - priorities and scheduling
  - synchronization

Example pthread functions

  int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                     void *(*start_routine)(void*), void *arg);
• Creates a new thread to run start_routine on arg; *thread holds the new thread's ID

  int pthread_join(pthread_t thread, void **value_ptr);
• Waits for the thread to terminate, and retrieves its return value in *value_ptr

  void pthread_exit(void *value_ptr);
• Terminates the calling thread, and returns value_ptr to threads waiting in pthread_join

pthread creation example

  void* thread_fn(void *arg)
  {
      int id = (int)arg;
      printf("thread %d runs\n", id);
      return NULL;
  }

  int main()
  {
      pthread_t t1, t2;
      pthread_create(&t1, NULL, thread_fn, (void*)1);
      pthread_create(&t2, NULL, thread_fn, (void*)2);
      pthread_join(t1, NULL);
      pthread_join(t2, NULL);
      return 0;
  }

  $ gcc -o threads threads.c -Wall -lpthread
  $ ./threads
  thread 1 runs
  thread 2 runs

• One way to view threads: like function calls, except the caller doesn't wait for the callee; instead, both run concurrently

Today
• Processes in Linux: context switch on x86
• Threads
  - Why and what?
  - Multithreading models
• Intro to synchronization

Multithreading models
• Where to support threads?
  - User threads: thread management done by a user-level threads library, typically without knowledge of the kernel
  - Kernel threads: threads directly supported by the kernel
• Virtually all modern OSes support kernel threads

User vs. kernel threads
(Figure from Tanenbaum, Modern Operating Systems 3e, (c) 2008 Prentice-Hall, Inc. All rights reserved.)

User vs. kernel threads (cont.)
• User threads
  - Pros: fast; no system call for creation or context switch
  - Cons: the kernel is unaware of them, so it can't schedule them; if one thread blocks, all block
• Kernel threads
  - Pros: the kernel knows about them, giving complete flexibility; if one thread blocks, it can schedule another
  - Cons: slow; the kernel does creation, scheduling, etc.
• No free lunch!
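On Linux, the kernel-thread side of this tradeoff is directly observable: modern Linux pthreads (NPTL) use a 1:1 model, so each user thread is backed by its own kernel thread with a distinct kernel thread ID. A minimal Linux-specific sketch (the function names below are ours, not from the slides):

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>

/* Each pthread on Linux is a kernel thread with its own TID,
   visible via the gettid system call. */
static pid_t worker_tid;

static void *record_tid(void *arg)
{
    (void)arg;
    worker_tid = (pid_t)syscall(SYS_gettid);  /* kernel thread ID */
    return NULL;
}

int tids_differ(void)
{
    pthread_t t;
    pthread_create(&t, NULL, record_tid, NULL);
    pthread_join(t, NULL);
    /* distinct TIDs demonstrate the 1:1 user-to-kernel mapping */
    return worker_tid != (pid_t)syscall(SYS_gettid);
}
```

Under a many-to-one library, by contrast, both threads would run on the same kernel thread and the kernel would see only one schedulable entity.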
Multiplexing user-level threads
• A thread library must map user threads to kernel threads
• Big picture
  - Kernel threads: physical concurrency (how many cores?)
  - User threads: application concurrency (how many tasks?)
• Different mappings exist, representing different tradeoffs
  - Many-to-one: many user threads map to one kernel thread, i.e. the kernel sees a single process
  - One-to-one: one user thread maps to one kernel thread
  - Many-to-many: many user threads map to many kernel threads

Many-to-one
• Many user-level threads mapped to a single kernel thread
• Why good?
  - Fast: no system calls required
  - Portable: few system dependencies
• Examples: Solaris Green Threads, GNU Portable Threads
• Problem: no parallel execution of threads
  - Can't exploit multiple CPUs
  - All threads block when one uses synchronous I/O

One-to-one
• Each user-level thread maps to a kernel thread
• Why good? More concurrency
  - Better multiprocessor performance
  - When one thread blocks, others can run
• Examples: Windows NT/XP/2000, Linux, Solaris 9 and later
• Problem: expensive
  - Each user thread requires creation of a kernel thread
  - Each thread requires kernel resources, which limits the total number of threads
  - Most thread operations are system calls

Many-to-many model
• Allows many user-level threads to be mapped to many kernel threads (U >= K)
• Why good?
  - Flexible: the OS creates kernel threads for physical concurrency, and programmers create user threads for application concurrency
• Examples: Solaris prior to version 9, Windows NT/2000 with the ThreadFiber package
• Problem: complex
  - Most systems use a 1:1 mapping anyway

Two-level model
• Similar to M:M, except that a user thread may be bound to a kernel thread
• Examples: IRIX, HP-UX, Tru64 UNIX, Solaris 8 and earlier

Thread design issues
• Semantics of the fork() and exec() system calls
  - Does fork() duplicate only the calling thread or all threads?
• Signal handling
  - Which thread should a signal be delivered to?
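The fork() question above has a concrete POSIX answer: fork() duplicates only the calling thread, so the child of a multithreaded process starts with a single thread. A hedged Linux/POSIX sketch (function names are ours): a ticker thread runs in the parent, and in the child the counter stays frozen because the ticker was not duplicated.

```c
#include <pthread.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/types.h>

/* Sketch: demonstrates that fork() copies only the calling thread. */
static volatile unsigned ticks;

static void *ticker(void *arg)
{
    (void)arg;
    for (;;)
        ++ticks;            /* keeps running in the parent only */
    return NULL;
}

int child_is_single_threaded(void)
{
    pthread_t t;
    pthread_create(&t, NULL, ticker, NULL);
    while (ticks == 0)
        ;                   /* wait until the ticker thread is running */

    pid_t pid = fork();
    if (pid == 0) {
        /* child: the ticker thread was not duplicated by fork() */
        unsigned before = ticks;
        usleep(100000);     /* give a ticker time to run, if one existed */
        _exit(ticks == before ? 0 : 1);
    }
    int status;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

This is also why POSIX restricts what a child of a multithreaded process may safely do before exec(): locks held by other (now nonexistent) threads stay locked in the child.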
Thread pools
• Recall the web server example: create a thread for each user request
• Problem: thread creation is also costly
  - And the created thread exits after serving one request
  - More user requests mean more threads
• Solution: thread pool
  - Pre-create a number of threads waiting for work
  - Wake up a thread to serve a user request: faster than thread creation
  - When the request is done, the thread doesn't exit; it goes back to the pool
  - Limits the maximum number of threads

Today
• Processes in Linux: context switch on x86
• Threads
  - Why and what?
  - Multithreading models
• Intro to synchronization

Banking example

  int balance = 1000;

  void* deposit(void *arg)
  {
      int i;
      for (i = 0; i < 1e7; ++i)
          ++ balance;
  }

  void* withdraw(void *arg)
  {
      int i;
      for (i = 0; i < 1e7; ++i)
          -- balance;
  }

  int main()
  {
      pthread_t t1, t2;
      pthread_create(&t1, NULL, deposit, (void*)1);
      pthread_create(&t2, NULL, withdraw, (void*)2);
      pthread_join(t1, NULL);
      pthread_join(t2, NULL);
      printf("all done: balance = %d\n", balance);
      return 0;
  }

Results of the banking example

  $ gcc -Wall -lpthread -o bank bank.c
  $ ./bank
  all done: balance = 1000
  $ ./bank
  all done: balance = 140020
  $ ./bank
  all done: balance = -94304
  $ ./bank
  all done: balance = -191009

Why?

A closer look at the banking example

  $ objdump -d bank
  …
  08048464 <deposit>:
    …
    8048473:  a1 80 97 04 08   mov    0x8049780,%eax   ; ++balance
    8048478:  83 c0 01         add    $0x1,%eax
    804847b:  a3 80 97 04 08   mov    %eax,0x8049780
    …
  0804849b <withdraw>:
    …
    80484aa:  a1 80 97 04 08   mov    0x8049780,%eax   ; --balance
    80484af:  83 e8 01         sub    $0x1,%eax
    80484b2:  a3 80 97 04 08   mov    %eax,0x8049780
    …

One possible schedule (time flows downward)

  CPU 0                    CPU 1                    state
  mov 0x8049780,%eax                                balance: 1000, eax0: 1000
  add $0x1,%eax                                     eax0: 1001
  mov %eax,0x8049780                                balance: 1001
                           mov 0x8049780,%eax       eax1: 1001
                           sub $0x1,%eax            eax1: 1000
                           mov %eax,0x8049780       balance: 1000

One deposit and one withdraw; the balance is unchanged. Correct.
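Returning to the thread-pool slide above: the pre-created-workers idea can be sketched with a mutex, a condition variable, and a fixed-size work queue. All names and sizes below are our own, not from the slides; the queue is unbounded-submit for brevity (a real pool would block or reject when full).

```c
#include <pthread.h>

#define POOL_THREADS 4
#define QUEUE_CAP    64

/* Minimal thread-pool sketch: workers sleep on a condition
   variable until work arrives, then return to the pool. */
static void (*queue_fn[QUEUE_CAP])(int);
static int queue_arg[QUEUE_CAP];
static int q_head, q_tail, q_len, shutting_down;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_cond = PTHREAD_COND_INITIALIZER;
static pthread_t workers[POOL_THREADS];

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&q_lock);
        while (q_len == 0 && !shutting_down)
            pthread_cond_wait(&q_cond, &q_lock);  /* sleep until work */
        if (q_len == 0 && shutting_down) {
            pthread_mutex_unlock(&q_lock);
            return NULL;
        }
        void (*fn)(int) = queue_fn[q_head];
        int a = queue_arg[q_head];
        q_head = (q_head + 1) % QUEUE_CAP;
        q_len--;
        pthread_mutex_unlock(&q_lock);
        fn(a);               /* serve the request, then back to the pool */
    }
}

void pool_start(void)
{
    for (int i = 0; i < POOL_THREADS; i++)
        pthread_create(&workers[i], NULL, worker, NULL);
}

void pool_submit(void (*fn)(int), int arg)
{
    pthread_mutex_lock(&q_lock);
    queue_fn[q_tail] = fn;
    queue_arg[q_tail] = arg;
    q_tail = (q_tail + 1) % QUEUE_CAP;
    q_len++;
    pthread_cond_signal(&q_cond);    /* wake one waiting worker */
    pthread_mutex_unlock(&q_lock);
}

void pool_shutdown(void)     /* drain the queue, then join all workers */
{
    pthread_mutex_lock(&q_lock);
    shutting_down = 1;
    pthread_cond_broadcast(&q_cond);
    pthread_mutex_unlock(&q_lock);
    for (int i = 0; i < POOL_THREADS; i++)
        pthread_join(workers[i], NULL);
}

/* sample task: count completed requests under a lock */
static int done_count;
static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static void count_task(int arg)
{
    (void)arg;
    pthread_mutex_lock(&done_lock);
    done_count++;
    pthread_mutex_unlock(&done_lock);
}
```

Waking a sleeping worker is just a pthread_cond_signal, which is far cheaper than creating a fresh kernel thread per request.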
Another possible schedule (time flows downward)

  CPU 0                    CPU 1                    state
  mov 0x8049780,%eax                                balance: 1000, eax0: 1000
  add $0x1,%eax                                     eax0: 1001
                           mov 0x8049780,%eax       eax1: 1000
  mov %eax,0x8049780                                balance: 1001
                           sub $0x1,%eax            eax1: 999
                           mov %eax,0x8049780       balance: 999

One deposit and one withdraw, but the balance decreases. Wrong!

Race condition
• Definition: a timing-dependent error involving shared state
• Can be very bad
  - "Non-deterministic": we don't know what the output will be, and it is likely to differ across runs
  - Hard to detect: too many possible schedules; typically the bug doesn't show up in testing, but does show up in the field
  - Hard to debug: a "heisenbug" (debugging changes the timing, which hides the bug), as opposed to a "bohr bug"
• What to do???

Avoiding race conditions
• Critical section: a segment of code that accesses a shared variable (or resource) and must not be executed concurrently by more than one thread
• To implement critical sections, we need atomic operations
  - Atomic operation: no other instructions can be interleaved; it executes "as a unit", "all or none"
• Example atomic operations on x86: loads and stores of words, test_and_set, compare_and_swap

Critical-section problem
1. Entry section: code that requests permission to enter the critical section
2. Exit section: code that runs after exiting the critical section

Critical section goals
• Threads do some work, but eventually might try to access shared data; perhaps they loop (perhaps not!)

  T1:                 T2:
  CSEnter();          CSEnter();
  critical section    critical section
  CSExit();           CSExit();

• We would like safety (aka mutual exclusion)
  - No more than one thread can be in a critical section at any time
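Using atomic operations, the banking race above can be eliminated: each read-modify-write of balance becomes a single indivisible update, so the bad interleaving cannot occur. A sketch using the GCC/Clang __atomic builtins (a compiler assumption on our part; primitives like compare_and_swap would serve the same purpose), with a smaller iteration count than the slides':

```c
#include <pthread.h>

/* Race-free version of the banking example: atomic increments/decrements. */
static int balance;
enum { N = 100000 };   /* fewer iterations than the slides' 1e7, for speed */

static void *deposit(void *arg)
{
    (void)arg;
    for (int i = 0; i < N; i++)
        __atomic_fetch_add(&balance, 1, __ATOMIC_SEQ_CST);  /* atomic ++ */
    return NULL;
}

static void *withdraw(void *arg)
{
    (void)arg;
    for (int i = 0; i < N; i++)
        __atomic_fetch_sub(&balance, 1, __ATOMIC_SEQ_CST);  /* atomic -- */
    return NULL;
}

int run_bank(void)
{
    balance = 1000;
    pthread_t t1, t2;
    pthread_create(&t1, NULL, deposit, NULL);
    pthread_create(&t2, NULL, withdraw, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return balance;       /* always 1000, regardless of scheduling */
}
```

On x86 the compiler emits a single locked instruction (e.g. lock add) instead of the three-instruction mov/add/mov sequence shown in the disassembly above.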
• Liveness (aka progress)
  - A thread seeking to enter the critical section will eventually succeed
• Bounded waiting
  - A bound must exist on the number of times other threads are allowed to enter their critical sections after a thread has requested entry to its critical section and before that request is granted
  - Assume each process executes at a nonzero speed
  - No assumption is made about the relative speeds of the N processes
• Ideally, we would like fairness as well
  - If two threads are both trying to enter a critical section, they have equal chances of success
  - … in practice, fairness is rarely guaranteed
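One standard way to realize the CSEnter()/CSExit() pattern from the slides, satisfying the safety goal, is a pthread mutex: locking is the entry section and unlocking is the exit section. A minimal sketch (the helper and counter names are ours):

```c
#include <pthread.h>

/* CSEnter/CSExit realized with a pthread mutex. */
static pthread_mutex_t cs_lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_counter;

static void CSEnter(void) { pthread_mutex_lock(&cs_lock); }
static void CSExit(void)  { pthread_mutex_unlock(&cs_lock); }

static void *bump(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        CSEnter();
        shared_counter++;    /* critical section: at most one thread here */
        CSExit();
    }
    return NULL;
}

int run_two_bumpers(void)
{
    shared_counter = 0;
    pthread_t t1, t2;
    pthread_create(&t1, NULL, bump, NULL);
    pthread_create(&t2, NULL, bump, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return shared_counter;   /* mutual exclusion makes this deterministic */
}
```

The mutex guarantees mutual exclusion and (with a reasonable implementation) progress; note that plain pthread mutexes do not promise bounded waiting or fairness, which is exactly the caveat the slides end on.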