POSIX Threads
CS 360, WSU Vancouver

1. Background
2. Threads vs. Processes
3. Thread Synchronization
4. Mutex Variables
5. Condition Variables
6. Threads and UNIX


1. Background

Threads are light-weight processes
  – only local variables in a function are specific to a thread (each thread has its own stack)
  – most other data is shared between threads (e.g. global variables and the heap)

pthreads is the POSIX threads standard.
  – The associated library interface is obtained by:
        #include <pthread.h>


1.1. Reasons for Using Threads

Efficiency
Ease of data sharing
Common uses of parallelism:
  – overlapping I/O: do long I/O and CPU tasks at the same time
  – asynchronous events: wait for a response and do something else in the meantime
  – real-time scheduling: quickly respond to important tasks
  – utilizing multiple processors


2. Threads vs. Processes

A thread uses fewer system resources than a similar process (for the same task).
A thread may require more user-space resources than a similar process
  – depends on the implementation
Threads (within the same process) share everything except their stack and processor state.

The threads mechanism can be used instead of many different process-based mechanisms:
  – non-blocking I/O, shared memory (IPC), setjmp()/longjmp(), signals, ...
Threads use an inherently simpler shared memory mechanism than processes:
  – inter-thread communication is far easier
  – but it is also much easier to code parallelism errors (e.g. race conditions)

Problems with Threads

Unpredictable functioning (non-determinism)
Difficult to debug
Impossible to prove correct
Almost impossible to test thoroughly
Specific behavior not easily reproducible
Simple conceptually, but with added complexity for dealing with the many pitfalls


2.1. Processes Example

[Diagram: child 1 runs do_one_thing() and child 2 runs do_another_thing(). Each child has its own copy of the global variables shmid, shm_ptr, r1p and r2p; r1p and r2p point into a shared memory segment.]

2.1. ints_processes.c

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>
    #include <errno.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    void do_one_thing(int *pnum);
    void do_another_thing(int *pnum);

    int shmid, *shm_ptr;   /* global, but not shared: each process has its own copy */
    int *r1p, *r2p;

    int main()
    {
        pid_t child1, child2;

        /* initialize shared memory */
        shmid = shmget(IPC_PRIVATE, 2*sizeof(int), (IPC_CREAT | 0666));
        if (shmid < 0) {
            perror("shmget failed");
            exit(1);
        }
        /* shmat() returns (void *) -1 on failure, not a negative number */
        if ((shm_ptr = (int *)shmat(shmid, NULL, 0)) == (int *) -1) {
            perror("shmat failed");
            exit(1);
        }

        /* initialise shared memory ints */
        r1p = shm_ptr;
        r2p = (shm_ptr + 1);
        *r1p = 0;
        *r2p = 0;

        /* start forking children */
        if ((child1 = fork()) == 0) {   /* first child */
            do_one_thing(r1p);
            exit(0);
        }
        if ((child2 = fork()) == 0) {   /* second child */
            do_another_thing(r2p);
            exit(0);
        }

        /* parent waits for children */
        wait(NULL);
        wait(NULL);
        printf("Final Values: %d, %d\n", *r1p, *r2p);

        /* remove the segment so that it does not outlive the program */
        shmctl(shmid, IPC_RMID, NULL);
        return 0;
    }

    void do_one_thing(int *pnum)
    /* Waste time, and increment pnum */
    {
        int i, j, x=0;

        for (i = 0; i < 4; i++) {
            printf("doing one thing\n");
            for (j = 0; j < 10000; j++)
                x = x + i;
            (*pnum)++;
        }
    }

    void do_another_thing(int *pnum)
    /* Waste time, and increment pnum.
       The code is almost the same as do_one_thing() */
    {
        int i, j, x=0;

        for (i = 0; i < 4; i++) {
            printf("doing another \n");
            for (j = 0; j < 10000; j++)
                x = x + i;
            (*pnum)++;
        }
    }

Usage

    $ ints_processes
    doing one thing
    doing one thing
    doing another
    doing another
    doing another
    doing another
    doing one thing
    doing one thing
    Final Values: 4, 4
    $

The order of the "doing ..." lines may vary each time ints_processes is executed.


2.2. Threads Version

[Diagram: thread 1 runs do_one_thing() and thread 2 runs do_another_thing(). The global variables, including r1 and r2, are shared by default.]

2.2. ints_threads.c

    #include <stdio.h>
    #include <pthread.h>

    /* Same work as in ints_processes.c, but with the signature that
       pthread_create() requires: each function casts arg to int *
       and returns NULL when it is done. */
    void *do_one_thing(void *arg);
    void *do_another_thing(void *arg);

    /* Global (and shared) integers */
    int r1 = 0, r2 = 0;

    int main()
    {
        pthread_t thread1, thread2;

        pthread_create(&thread1, NULL, do_one_thing, &r1);
        pthread_create(&thread2, NULL, do_another_thing, &r2);

        pthread_join(thread1, NULL);
        pthread_join(thread2, NULL);

        printf("Final Values: %d, %d\n", r1, r2);
        return 0;
    }

Notes

No need for shared memory library functions
  – the threads approach has better performance
  – and simpler code
Difficult to program
  – synchronization problems (with processes, too)
Global variables are accessible to all threads
  – easy to make coding mistakes


2.3. Thread Functions

    int pthread_create(pthread_t *thread,
                       const pthread_attr_t *attr,
                       void *(*func)(void *),
                       void *arg);

Create a new thread with the attributes specified in attr (usually NULL).
The new thread starts executing func() and is passed arg.
Consider carefully what to pass as the argument: it must still be valid when the new thread reads it.
Returns 0 if ok, or a non-zero error number on failure.
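As a hedged illustration of the pthread_create() contract above, the sketch below uses a start routine whose type is exactly void *(*)(void *), gives every thread its own argument record, and checks each return value. The names struct task, square_id(), run_tasks() and MAX_TASKS are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MAX_TASKS 8   /* hypothetical upper bound for this sketch */

/* Hypothetical argument record: one per thread, so nothing is shared by accident. */
struct task {
    int id;       /* input: set by the creator before the thread starts */
    int result;   /* output: written by the thread, read after pthread_join() */
};

/* Start routine with the type pthread_create() expects. */
static void *square_id(void *arg)
{
    struct task *t = arg;

    assert(t != NULL);
    t->result = t->id * t->id;
    return NULL;
}

/* Run n tasks in parallel. Returns 0 on success, -1 for a bad n,
   or the error number reported by pthread_create(). */
int run_tasks(struct task *tasks, int n)
{
    pthread_t tid[MAX_TASKS];
    int i, started = 0, err = 0;

    if (n < 0 || n > MAX_TASKS)
        return -1;

    for (i = 0; i < n; i++) {
        err = pthread_create(&tid[i], NULL, square_id, &tasks[i]);
        if (err != 0) {
            fprintf(stderr, "pthread_create: %s\n", strerror(err));
            break;
        }
        started++;
    }

    /* Wait for every thread that was actually started. */
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    return err;
}
```

Calling run_tasks(t, 4) on an array whose id fields are 0, 1, 2 and 3 should leave 0, 1, 4 and 9 in the result fields. Passing &tasks[i] rather than the address of the loop counter matters: the counter may already have changed by the time the new thread reads it.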

    int pthread_join(pthread_t thread, void **value_ptr);

Make the calling thread wait for the specified thread to terminate.
If value_ptr is not NULL, *value_ptr is assigned the thread's return value (or PTHREAD_CANCELED if the thread was cancelled).


2.4. Matrix Multiplication Example

Parallelize matrix multiplication:

    | 9  8  7  6 |   |  1  2  3  4 |   | 150 180 210 240 |
    | 6  5  4  3 | * |  4  5  6  7 | = |  84 102 120 138 |
    | 3  2  1  0 |   |  7  8  9 10 |   |  18  24  30  36 |
    | 0 -1 -2 -3 |   | 10 11 12 13 |   | -48 -54 -60 -66 |

Coding Approach

Create MATSIZE (e.g. 4) threads: one for each column of the result[] array
  – each column will be calculated in parallel
The parallelism could be increased by creating MATSIZE*MATSIZE threads: one for each element of the result[] array.

matmult.c

    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>

    #define MATSIZE 4

    void *matMult(void *);   /* note the template */
    void printMult(void);

    /* global and shared data */
    int mat1[MATSIZE][MATSIZE] = {{9,8,7,6},{6,5,4,3},{3,2,1,0},{0,-1,-2,-3}};
    int mat2[MATSIZE][MATSIZE] = {{1,2,3,4},{4,5,6,7},{7,8,9,10},{10,11,12,13}};
    int result[MATSIZE][MATSIZE];

    int main()
    {
        pthread_t thr[MATSIZE];
        int i, *iPtr;

        for (i = 0; i < MATSIZE; i++) {
            /* give each thread its own copy of the column number */
            iPtr = (int *) malloc(sizeof(int));
            *iPtr = i;
            pthread_create(&thr[i], NULL, matMult, (void *)iPtr);
        }
        for (i = 0; i < MATSIZE; i++)
            pthread_join(thr[i], NULL);
        printMult();
        return 0;
    }

    void *matMult(void *colvPtr)
    {
        int i, j;
        int *colPtr = (int *) colvPtr;
        int col = *colPtr;   /* the column of result[] this thread computes */

        free(colPtr);
        for (i = 0; i < MATSIZE; i++) {
            result[i][col] = 0;
            for (j = 0; j < MATSIZE; j++)
                result[i][col] += mat1[i][j] * mat2[j][col];
        }
        return NULL;
    }

    void printMult(void)
    {
        int i, j;

        for (i = 0; i < MATSIZE; i++) {
            printf("|");
            for (j = 0; j < MATSIZE; j++)
                printf("%3d", mat1[i][j]);
            printf("|%c|", (i==MATSIZE/2 ? '*' : ' '));
            for (j = 0; j < MATSIZE; j++)
                printf("%3d", mat2[i][j]);
            printf("|%c|", (i==MATSIZE/2 ? '=' : ' '));
            for (j = 0; j < MATSIZE; j++)
                printf("%4d", result[i][j]);
            printf("|\n");
        }
    }


3. Thread Synchronization

pthread_join()
  – like wait() for processes
mutex variables
  – like binary semaphores for processes
condition variables
  – wait for an 'event' (e.g. a variable is assigned a certain value), which is 'signaled' by another thread


4. Mutex Variables

A mutex variable is a mutual exclusion lock, allowing threads to control access to shared data.
Only one thread can hold a mutex at a time.
The threads must agree to use the mutex to protect the shared data.
Mutex variables are very efficient: on typical implementations, locking or unlocking an uncontended mutex needs no system call. The OS usually becomes involved only when a thread has to wait for the lock.

Example Diagram

[Diagram: thread 1 (do_one_thing()) and thread 2 (do_another_thing()) share the global variables r1, r2, r3 and r3_mutex by default. Each thread locks r3_mutex before it touches r3.]

ints_mutex.c

    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>

    /* thread start routines: each casts arg to int * and returns NULL */
    void *lock_one_thing(void *arg);
    void *lock_another_thing(void *arg);

    /* global (and shared) variables */
    int r1 = 0, r2 = 0, r3 = 0;
    pthread_mutex_t r3_mutex;

    int main(int argc, char *argv[])
    {
        pthread_t thread1, thread2;

        pthread_mutex_init(&r3_mutex, NULL);
        if (argc < 2) {
            printf("usage %s number\n", argv[0]);
            exit(1);
        }
        r3 = atoi(argv[1]);

        pthread_create(&thread1, NULL, lock_one_thing, &r1);
        pthread_create(&thread2, NULL, lock_another_thing, &r2);

        pthread_join(thread1, NULL);
        pthread_join(thread2, NULL);

        printf("Final Values: %d, %d\n", r1, r2);
        return 0;
    }

    void *lock_one_thing(void *arg)
    {
        int *pnum = (int *) arg;
        int i, j, x=0;

        for (i = 0; i < 4; i++) {
            /* r3 is shared by both threads, so update it under the lock */
            pthread_mutex_lock(&r3_mutex);
            r3 = r3 + (*pnum);
            printf("one altered r3: %d\n", r3);
            pthread_mutex_unlock(&r3_mutex);
            for (j = 0; j < 10000; j++)
                x = x + i;
            (*pnum)++;
        }
        return NULL;
    }

    void *lock_another_thing(void *arg)
    {
        int *pnum = (int *) arg;
        int i, j, x=0;

        for (i = 0; i < 4; i++) {
            pthread_mutex_lock(&r3_mutex);
            r3 = r3 + (*pnum);
            printf("another altered r3: %d\n", r3);
            pthread_mutex_unlock(&r3_mutex);
            for (j = 0; j < 10000; j++)
                x = x + i;
            (*pnum)++;
        }
        return NULL;
    }

Usage

    $ ints_mutex 4
    one altered r3: 4
    one altered r3: 5
    one altered r3: 7
    one altered r3: 10
    another altered r3: 10
    another altered r3: 11
    another altered r3: 13
    another altered r3: 16
    Final Values: 4, 4
    $


4.2. Mutex Functions

    int pthread_mutex_lock(pthread_mutex_t *mutex);

Lock an unlocked mutex; if it is already locked, the calling thread waits until the mutex becomes unlocked.

    int pthread_mutex_trylock(pthread_mutex_t *mutex);

Lock an unlocked mutex; if it is already locked, do not block but return EBUSY immediately.

    int pthread_mutex_unlock(pthread_mutex_t *mutex);

Unlock a mutex; if any threads are waiting to lock this mutex, one of them is woken up.


5. Condition Variables

Synchronize threads by using events
  – e.g. a variable is assigned a certain value
A thread (or threads) waits for an event which is 'signaled' by another thread.
  – These events are not UNIX signals
The 'signal' causes the thread (or threads) to wake up.
We won't cover the details of coding this!
  – This could be used effectively on the dining philosophers problem.


6. Threads and UNIX

UNIX was originally designed to handle processes, before shared memory multiprocessors were available.
  – how to add in threads to take advantage of multiple CPUs?
A process is a 'container' for one or more threads
  – all the threads share the process' memory address space

How do threads deal with...
  – signals?
  – library functions?
  – process management? e.g. fork(), exec()


6.1. Signals

Each thread has its own signal mask; signal actions (handlers) are shared by all the threads in the process.
Signals can be sent to a specific thread or to the process that 'holds' the thread(s).
  – Synchronous signals (such as SIGSEGV or SIGFPE) get delivered to the thread that caused them


6.2. Library Functions

How do several threads share the same library function at the same time?
  – Any function that uses global variables (or statically declared variables) is suspect!
Answer: thread-safe libraries
  – Libraries can be made thread-safe by adding mutexes around the library function's global variables
  – library functions may not be thread-safe!
  – Some libraries have thread-safe alternatives (ctime() vs. ctime_r())

What if a thread is terminated inside a library function?
  – e.g. in the middle of changing global data
Answer: the pthreads library includes cancellation-safe functions
  – they clean up upon cancellation

What if a thread blocks inside a library function?
Answer: the pthreads library includes many functions which only block the thread, not the entire process
  – the programmer can also turn off blocking in other functions


6.3. Process Management

What does a fork() call from a thread do to the other threads in the containing process?
Answer: the new child process contains a single copy of the thread that called fork()
  – What happens to mutexes? A mutex held by another thread at the time of the fork stays locked in the child, where no thread exists to unlock it.
  – Big headaches are possible!
Guidelines:
  – Fork from a process with only one thread
  – Fork before creating additional threads
  – Fork only from the main (parent) thread
  – Hold no locks during the fork

What does an exec() call from a thread do to the threads in the containing process?
Answer: all the threads terminate, and the new program started by exec() begins with a single thread.
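
The fork() guidelines above can be sketched as follows, under stated assumptions: the caller holds no locks, and the child does nothing between fork() and exec() except call exec (or _exit() if that fails), so it never touches a mutex that another thread might have held at the time of the fork. spawn_and_wait() is a hypothetical helper name, and returning 127 for a failed exec is a shell convention assumed here, not something the pthreads standard prescribes.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program argv[0] with arguments argv in a
   child process and return its exit status (127 if exec fails in the
   child, -1 on any other error). Call it with no mutexes held. */
int spawn_and_wait(char *const argv[])
{
    int status;
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: only the calling thread was copied. Go straight to
           exec without taking any locks. */
        execvp(argv[0], argv);
        _exit(127);   /* reached only if execvp() failed */
    }

    /* Parent: wait for the child, retrying if a signal interrupts the wait. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing the argument vector {"sh", "-c", "exit 3", NULL} should return 3. Because exec() replaces the whole process image, any copied mutex state in the child disappears with it.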