Threads

What do we have so far?
The basic unit of CPU utilization is a process. To run a program (a sequence of code), we create a process. Processes are well protected from one another, but switching between processes is fairly expensive. Communication between processes goes through inter-process communication (IPC) mechanisms. Running a process requires the memory management system, and process I/O is handled by the I/O subsystem.

Is the process the answer to all concurrency needs?
Consider developing a PC game. There are different code sequences for different characters (soldiers, cities, airplanes, cannons, user-controlled heroes), and each character is more or less independent, so we could create a process for each character. Any drawbacks? The action of a character usually depends on the game state (for example, the locations of other characters), so a process-based implementation of characters implies a lot of context switching.

What do we really need for a PC game?
A way to run different sequences of code (threads of control) for different characters: processes do this.
A way for the different threads of control to share data effectively: processes are NOT designed to do this.
Protection is not very important here, since the game is one application anyway; a process is overkill.
Switching between threads of control must be as efficient as possible, yet process context switching is known to be expensive.
Threads were created to do all of the above.

Process context
Process ID, process group ID, user ID, and group ID; environment; working directory; program instructions; registers (including the PC); stack; heap; file descriptors; signal actions; shared libraries; inter-process communication tools.

What is absolutely needed to run a sequence of code (a thread of control)? In other words, given the process context, what must be duplicated to support an additional stream of instructions? Essentially only the registers (including the PC) and a stack; everything else in the list can be shared.

Process and thread
Threads are executed within a process. Sharing the process context means easy inter-thread communication (see the short code sketch below). There is no protection among threads, but threads are intended to be cooperative. The thread context (PC, registers, a stack, and miscellaneous info) is much smaller than the process context, which gives faster context switching and faster thread creation.

Threads are also called light-weight processes; traditional processes are considered heavy-weight. A process can be single-threaded or multithreaded. Threads have become so ubiquitous that almost all modern computing systems use the thread as the basic unit of CPU utilization.

More about threads
OS view: a thread is an independent stream of instructions that can be scheduled to run by the OS.
Software developer view: a thread can be considered a "procedure" that runs independently from the main program.
A sequential program has a single stream of instructions; a multi-threaded program has multiple streams. Multiple threads are needed to use multiple cores/CPUs.

Threads exist within processes, die if the process dies, use process resources, and duplicate only the essential resources that let the OS schedule them independently. Each thread maintains a stack, registers, scheduling properties (e.g., priority), a set of pending and blocked signals (so that different threads can react differently to signals), and thread-specific data.
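As noted above, threads share the process context (globals, heap, open files) while each keeps its own stack and registers. The sketch below illustrates just that point; it already uses the pthread_create and pthread_join calls introduced in the next section, and the file and variable names are made up for illustration.

/* shared_vs_private.c (hypothetical name)
   All threads see the same global variable (shared process context);
   each thread's local variables live on its own private stack. */
#include <pthread.h>
#include <stdio.h>

int shared = 42;                      /* one copy, visible to every thread */

void *worker(void *arg)
{
    int id = (int)(long)arg;          /* id and local live on this thread's own stack */
    int local = id * 10;
    printf("thread %d: &shared = %p, &local = %p\n",
           id, (void *)&shared, (void *)&local);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)1L);
    pthread_create(&t2, NULL, worker, (void *)2L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* Both threads print the same address for shared but different
       addresses for local: shared data, private stacks. */
    return 0;
}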
The Pthreads (POSIX threads) API
The API provides three types of routines:
Thread management: creating, terminating, joining, and detaching threads.
Mutexes: mutual exclusion; creating, destroying, locking, and unlocking mutexes.
Condition variables: event-driven synchronization.
Mutexes and condition variables are concerned with synchronization. Why is there nothing for inter-thread communication? Because threads share the process's address space, communication is simply reading and writing shared data; what the API must supply is synchronization. The concept of opaque objects pervades the design of the API.

The Pthreads API naming convention
pthread_            general thread routines
pthread_attr_       thread attributes
pthread_mutex_      mutexes
pthread_mutexattr_  mutex attributes
pthread_cond_       condition variables
pthread_condattr_   condition variable attributes
pthread_key_        thread-specific data keys

Compiling Pthreads programs
Include the Pthreads header file <pthread.h> and link with the Pthreads library:
gcc aaa.c -lpthread

Thread management routines
Creation: pthread_create.
Termination: return from the thread function, or call pthread_exit. Can we still use exit? No: exit() terminates the entire process (see process termination below).
Waiting (parent/child synchronization): pthread_join.

Creation
pthread_create is the thread equivalent of fork():

int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                   void *(*start_routine)(void *), void *arg);

It returns 0 if OK and a non-zero error number on failure. Arguments for the start routine are passed through the single void *arg. What if we want to pass a structure? Pass a pointer to the structure, cast to void *.

Termination
Thread termination: return from the initial function, or call
void pthread_exit(void *status);
Process termination: exit() is called by any thread, or main() returns.

Waiting for a child thread
int pthread_join(pthread_t tid, void **status);
This is the equivalent of waitpid() for processes.

Detaching a thread
A detached thread can act as a daemon thread; the parent thread does not need to wait for it.
int pthread_detach(pthread_t tid);
Detaching self: pthread_detach(pthread_self());

Some multi-thread program examples
A multi-thread program example: example1.c.
Making multiple producers: example2.c. What is going on in this program? How do we fix it?

Matrix multiply and threaded matrix multiply
Matrix multiply: C = A × B where, for N × N matrices,
C[i][j] = Σ_{k=0}^{N-1} A[i][k] * B[k][j]

Sequential code:
for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
        for (k = 0; k < N; k++)
            C[i][j] = C[i][j] + A[i][k] * B[k][j];

Threaded program: the calculation of C[i][j] does not depend on any other element of C, so different elements (for example, different rows of C) can be computed by independent threads. See mm_pthread.c; a sketch of one possible structure follows.
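The course file mm_pthread.c is not reproduced here; the following is a sketch of one possible structure, assuming one thread per row of C (the matrix size N, the initialization, and the function names are illustrative assumptions, not the actual course code).

/* Threaded matrix multiply sketch: one thread computes one row of C. */
#include <pthread.h>
#include <stdio.h>

#define N 4

double A[N][N], B[N][N], C[N][N];

void *row_worker(void *arg)
{
    long i = (long)arg;                     /* row index assigned to this thread */
    for (int j = 0; j < N; j++) {
        C[i][j] = 0.0;
        for (int k = 0; k < N; k++)
            C[i][j] += A[i][k] * B[k][j];   /* reads A and B, writes only row i of C */
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[N];

    for (int i = 0; i < N; i++)             /* illustrative data: B = identity, so C = A */
        for (int j = 0; j < N; j++) {
            A[i][j] = i + j;
            B[i][j] = (i == j) ? 1.0 : 0.0;
        }

    /* Each C[i][j] depends only on A and B, never on other elements of C,
       so the rows can be computed by independent threads with no locking. */
    for (long i = 0; i < N; i++)
        pthread_create(&tid[i], NULL, row_worker, (void *)i);
    for (int i = 0; i < N; i++)
        pthread_join(tid[i], NULL);

    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++)
            printf("%6.1f ", C[i][j]);
        printf("\n");
    }
    return 0;
}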
PI calculation
π = lim_{n→∞} Σ_{i=1}^{n} (1/n) * 4.0 / (1 + ((i - 0.5)/n)^2)
(the midpoint-rule approximation of the integral of 4/(1 + x^2) from 0 to 1).
Sequential code: pi.c. Threaded version: homework.

Types of threads
Independent threads and cooperative threads.

Independent threads
No state is shared with other threads. The computation is deterministic: the output depends only on the input, and it is reproducible. Because the output does not depend on the order and timing of other threads, the scheduling order does not matter.

Cooperating threads
Shared state; nondeterministic; nonreproducible.
Example: two threads sharing the same display.
Thread A: cout << "ABC";
Thread B: cout << "123";
You may get "A12BC3".

So why allow cooperating threads?
Shared resources, e.g., a single processor.
Speedup, which occurs when the threads use different resources at different times (mm_pthread.c).
Modularity: an application can be decomposed into threads.

Some concurrent programs
If threads work on separate data, scheduling does not matter:
Thread A: x = 1;
Thread B: y = 2;
If threads share data, the final values are not as obvious:
Thread A: x = 1; x = y + 1;
Thread B: y = 2; y = y * 2;
What are the indivisible operations?

Atomic operations
An atomic operation always runs to completion; it is all or nothing. Memory loads and stores are atomic on most machines, but many operations are not atomic, e.g., a double-precision floating-point store on a 32-bit machine.

Suppose each C statement is atomic, and revisit the example above.

All possible execution orders
Thread A: x = 1; x = y + 1;
Thread B: y = 2; y = y * 2;
Drawing a decision tree of all interleavings (each thread's own statements stay in program order) shows the possible final states: (x = 3, y = 4), (x = 5, y = 4), or, if Thread A reads y before Thread B has set it, an undefined x with y = 4. The result depends entirely on how the two threads happen to be scheduled.

Another example
Assume each C statement is atomic.
Thread A: j = 0; while (j < 10) { ++j; } cout << "A wins";
Thread B: j = 0; while (j > -10) { --j; } cout << "B wins";
So who wins? Can the computation go on forever?

Race conditions occur when threads share data and their results depend on the timing of their executions. The take-home point: sharing data among threads can cause problems, and we need some mechanism to deal with such situations; a small sketch of a race, and of the fix, follows.
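The following is a minimal sketch (illustrative file and variable names, not a course example) of a race condition on a shared counter, together with the pthread_mutex_ routines mentioned earlier as one mechanism that removes it.

/* race_sketch.c (hypothetical name): two threads increment a shared counter.
   Without the mutex, counter++ (a load, an add, and a store) can interleave
   across threads and updates are lost; with the mutex the count is exact. */
#include <pthread.h>
#include <stdio.h>

#define LOOPS 1000000

long counter = 0;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *add_unlocked(void *arg)
{
    for (int i = 0; i < LOOPS; i++)
        counter++;                       /* racy: not an atomic operation */
    return NULL;
}

void *add_locked(void *arg)
{
    for (int i = 0; i < LOOPS; i++) {
        pthread_mutex_lock(&lock);       /* only one thread at a time in here */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

static void run(void *(*worker)(void *), const char *label)
{
    pthread_t t1, t2;
    counter = 0;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("%s: counter = %ld (expected %d)\n", label, counter, 2 * LOOPS);
}

int main(void)
{
    run(add_unlocked, "without mutex");  /* often prints less than expected */
    run(add_locked,   "with mutex");     /* always prints the expected value */
    return 0;
}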