Concurrent Programming

The Cunning Plan
• We'll look into:
  – what concurrent programming is
  – why you care
  – how it's done
• We're going to skim over *all* the interesting details

One-Slide Summary
• There are many different ways to do concurrent programming
• A program can have more than one thread of control at a time
• We need synchronization primitives (e.g. semaphores) to deal with shared resources
• Message passing is complicated

What?
• Concurrent programming:
  – using multiple threads on a single machine
    • the OS simulates concurrency, or
    • multiple cores/processors run threads in parallel
  – using message passing
    • memory is not shared between threads
    • more general in terms of hardware requirements

What (Shorter version)
• There are a million different ways to do concurrent programming
• We'll focus on three:
  – co-begin blocks
  – threads
  – message passing

Concurrent Programming: Why?
1. Because it is intuitive for some problems (say you're writing httpd)
2. Because we need better-than-sequential performance
3. Because the problem is inherently distributed (e.g. BitTorrent)

Coding Challenges
• How do you divide the problem across threads?
  – easy: matrix multiplication using threads
  – hard: heated plate using message passing
  – harder: n-body simulation for large n

One Slide on Co-Begin
• We want to execute commands simultaneously, m'kay. Solution:

    int x;
    int y;
    // ...
    run-in-parallel {
      functionA(&x) | functionB(&x, &y)
    }

  (Main forks into A and B, then joins when both finish.)

Threads
• Most common in everyday applications
• Instead of a run-in-parallel block, we want explicit ways to create and destroy threads
• Threads can all see a program's global variables (i.e. they share memory)

Some Syntax (Java):

    Thread mythread = new Thread(
      new Runnable() {
        public void run() {
          // your code here
        }
      }
    );
    mythread.start();
    mythread.join();

Some Syntax (pthreads):

    void * foo(void * arg) {
      // your code here
      return NULL;
    }

    // ...
    int bar = 5;
    pthread_t my_id;
    pthread_create(&my_id, NULL, foo, (void *)&bar);
    // ...
    pthread_join(my_id, NULL);

Example: Matrix Multiplication
• Given matrices A and B, compute C = A * B
• e.g. one entry of C is a dot product:
  (9, 7, 4) · (2, 5, −3) = 18 + 35 − 12 = 41

Matrix Multiplication 'Analysis'
• We have:
  – size(A) = (p, q), with p = 4
  – size(B) = (q, r), with q = 3
  – size(C) = (p, r), with r = 4
• Complexity:
  – p × r elements in C
  – O(q) operations per element
• Note: calculating each element of C is independent from the other elements

Matrix Multiplication using Threads
(a complete, compilable version appears after the postmortem below)

    pthread_t threads[P][R];
    struct location locs[P][R];
    for (i = 0; i < P; ++i) {
      for (j = 0; j < R; ++j) {
        locs[i][j].row = i;
        locs[i][j].col = j;
        pthread_create(&threads[i][j], NULL,
                       calc_cell, (void *)&locs[i][j]);
      }
    }
    for (i = 0; i < P; ++i) {
      for (j = 0; j < R; ++j) {
        pthread_join(threads[i][j], NULL);
      }
    }

Matrix Multiplication using Threads

    for each element in C:
      create a thread:
        call the function 'calc_cell'
    for each created thread:
      wait until the thread finishes
    // Profit

Postmortem
• Relatively easy to parallelize:
  – matrices A and B are 'read only'
  – each thread writes to a unique entry in C
  – entries in C do not depend on each other
• What are some problems with this?
  – overhead of creating threads
  – use of shared memory
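For concreteness, here is a minimal compilable sketch of the thread-per-cell approach above. The names calc_cell and struct location come from the slides; the fixed dimensions (p = 4, q = 3, r = 4), the placeholder matrix contents, and the main harness are assumptions added for illustration.

    #include <pthread.h>
    #include <stdio.h>

    #define P 4
    #define Q 3
    #define R 4

    /* Shared matrices: A and B are read-only once filled,
     * and each thread writes a unique cell of C. */
    static int A[P][Q], B[Q][R], C[P][R];

    struct location { int row; int col; };

    /* Compute one cell of C as the dot product of a row of A
     * and a column of B. */
    static void * calc_cell(void * arg) {
      struct location * loc = arg;
      int sum = 0;
      for (int k = 0; k < Q; ++k)
        sum += A[loc->row][k] * B[k][loc->col];
      C[loc->row][loc->col] = sum;
      return NULL;
    }

    int main(void) {
      pthread_t threads[P][R];
      struct location locs[P][R];

      /* Placeholder contents, just so there is something to multiply. */
      for (int i = 0; i < P; ++i)
        for (int k = 0; k < Q; ++k)
          A[i][k] = i + k;
      for (int k = 0; k < Q; ++k)
        for (int j = 0; j < R; ++j)
          B[k][j] = k - j;

      /* One thread per cell of C. */
      for (int i = 0; i < P; ++i)
        for (int j = 0; j < R; ++j) {
          locs[i][j].row = i;
          locs[i][j].col = j;
          pthread_create(&threads[i][j], NULL, calc_cell, &locs[i][j]);
        }

      /* Wait for every cell to be computed. */
      for (int i = 0; i < P; ++i)
        for (int j = 0; j < R; ++j)
          pthread_join(threads[i][j], NULL);

      for (int i = 0; i < P; ++i) {
        for (int j = 0; j < R; ++j)
          printf("%4d ", C[i][j]);
        printf("\n");
      }
      return 0;
    }

Compile with gcc -pthread. Note the postmortem's caveat in action: this spawns p × r = 16 threads, which only pays off for matrices far larger than the per-thread work here.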
Synchronization
• So far, we have only covered how to create & destroy threads
• What else do we need? (See title)

Synchronization
• We want to do things like:
  – event A must happen before event B, and
  – events A and B cannot occur simultaneously
• Is there a problem here?

    Thread 1:  counter = counter + 1
    Thread 2:  counter = counter + 1

  (Yes: the increments are not atomic, so both threads can read the
  same old value and one update is lost.)

Semaphores
• A number n (initialized to some value)
• Can only be incremented, sem.V(), and decremented, sem.P()
• n > 0 : P() doesn't block
  n ≤ 0 : P() blocks; V() unblocks some waiting process

More Semaphore Goodness
• Semaphores are straightforward to implement on most types of systems
• Easy to use for resource management (set n equal to the number of resources)
• Some additional features are common (e.g. bounded semaphores)

Semaphore Example
• Let's try this again (a compilable C version of this pattern appears after the synchronization summary below):

    Main:      Semaphore wes = new Semaphore(0)
               // start threads 1 and 2 simultaneously
    Thread 1:  counter = counter + 1
               wes.V()
    Thread 2:  wes.P()
               counter = counter + 1

Semaphore Example 2
• Suppose we want two threads to "meet up" at specific points in their code:

    Semaphore aArrived = new Semaphore(0)
    Semaphore bArrived = new Semaphore(0)
    // start threads A and B simultaneously

    Thread A:  foo_a1
               aArrived.V()
               bArrived.P()
               foo_a2
    Thread B:  foo_b1
               bArrived.V()
               aArrived.P()
               foo_b2

• If both threads instead do P() before V(), each blocks waiting for a signal the other can never send...

Deadlock
• 'Deadlock' refers to a situation in which one or more threads is waiting for something that will never happen
• Theorem: You will, at some point in your life, write code that deadlocks

Readers/Writers Problem
• Let's do a slightly bigger example
• Problem:
  – some finite buffer b
  – multiple writer threads (only one can write at a time)
  – multiple reader threads (many can read at a time)
  – can only read if no writing is happening

Readers/Writers Solution #1
(a compilable C version appears after the synchronization summary below)

    int readers = 0
    Semaphore mutex = new Semaphore(1)
    Semaphore roomEmpty = new Semaphore(1)

    Writers:
      roomEmpty.P()
      // write here
      roomEmpty.V()

Readers/Writers Solution #1

    Readers:
      mutex.P()
      readers++
      if (readers == 1)
        roomEmpty.P()
      mutex.V()
      // read here
      mutex.P()
      readers--
      if (readers == 0)
        roomEmpty.V()
      mutex.V()

Starvation
• Starvation occurs when a thread is continuously denied resources
• Not the same as deadlock: the thread might eventually get to run, but it needs to wait longer than we want
• In the previous example, a writer might 'starve' if there is a continuous onslaught of readers

Guiding Question
• Earlier, I said sem.V() unblocks some waiting thread
• If we don't unblock in FIFO order, that means we could cause starvation
• Do we care?

Synchronization Summary
• We can use semaphores to enforce synchronization:
  – ordering
  – mutual exclusion
  – queuing
• There are other constructs as well
• See your local OS Prof
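As promised above, here is the ordering example in compilable C. POSIX semaphores stand in for the pseudo-Semaphore class (sem_wait is P(), sem_post is V()); the thread function names and the main harness are assumptions added for illustration, while wes and counter come from the slide.

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>

    static int counter = 0;
    static sem_t wes;   /* initialized to 0: thread 2 must wait for thread 1 */

    static void * thread1(void * arg) {
      (void)arg;
      counter = counter + 1;   /* happens first */
      sem_post(&wes);          /* V(): let thread 2 proceed */
      return NULL;
    }

    static void * thread2(void * arg) {
      (void)arg;
      sem_wait(&wes);          /* P(): blocks until thread 1 has signalled */
      counter = counter + 1;   /* happens second */
      return NULL;
    }

    int main(void) {
      pthread_t t1, t2;
      sem_init(&wes, 0, 0);
      pthread_create(&t1, NULL, thread1, NULL);
      pthread_create(&t2, NULL, thread2, NULL);
      pthread_join(t1, NULL);
      pthread_join(t2, NULL);
      printf("counter = %d\n", counter);  /* always 2: the increments never race */
      sem_destroy(&wes);
      return 0;
    }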
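And the readers/writers solution, in the same style. The semaphore names follow the pseudocode; the reader/writer function names and the demo harness (three readers, one writer) are assumptions added so the sketch runs.

    #include <pthread.h>
    #include <semaphore.h>

    static int readers = 0;
    static sem_t mutex;      /* protects the readers count */
    static sem_t roomEmpty;  /* held by the writer, or by the first reader in */

    static void * writer(void * arg) {
      (void)arg;
      sem_wait(&roomEmpty);      /* roomEmpty.P() */
      /* write here: exclusive access */
      sem_post(&roomEmpty);      /* roomEmpty.V() */
      return NULL;
    }

    static void * reader(void * arg) {
      (void)arg;
      sem_wait(&mutex);
      if (++readers == 1)
        sem_wait(&roomEmpty);    /* first reader locks writers out */
      sem_post(&mutex);

      /* read here: any number of readers at once */

      sem_wait(&mutex);
      if (--readers == 0)
        sem_post(&roomEmpty);    /* last reader lets writers back in */
      sem_post(&mutex);
      return NULL;
    }

    int main(void) {
      pthread_t r[3], w;
      sem_init(&mutex, 0, 1);
      sem_init(&roomEmpty, 0, 1);
      for (int i = 0; i < 3; ++i) pthread_create(&r[i], NULL, reader, NULL);
      pthread_create(&w, NULL, writer, NULL);
      for (int i = 0; i < 3; ++i) pthread_join(r[i], NULL);
      pthread_join(w, NULL);
      return 0;
    }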
Message Passing
• Threads and co. rely on shared memory
• Semaphores make very little sense if they cannot be shared between n > 1 threads
• What about systems in which we can't share memory?

Message Passing
• Processes (not threads) are created for us (they just exist)
• We can do the following:

    blocking_send(int destination, char * buffer, int size)
    blocking_receive(int source, char * buffer, int size)

Message Passing: Motivation
• We don't care if processes run on different machines:
  – same machine: use virtual memory tricks to make messages very quick
  – different machines: copy and send over the network

Heated Plate Simulation
• Suppose you have a metal plate:
  – three sides are chilled to 273 K
  – one side is heated to 373 K

Heated Plate Simulation
• Problem: calculate the heat distribution after some time t
  (snapshots at t = 10, t = 30, t = 50)

Heated Plate Simulation
• We model the problem by dividing the plate into small squares
• For each time step, take the average of a square's four neighbors

Heated Plate Simulation
• Problem: we need to communicate on every time step
  (the plate is split into strips owned by processes P1, P2, P3)
• Sending messages is expensive...
• Solution: send fewer, larger messages, and limit the longest message path

How to cause deadlock in MPI

    Process 1:
      char * buff  = "Goodbye";
      char * buff2 = new char[15];
      send(2, buff, 8);
      recv(2, buff2, 15);

    Process 2:
      char * buff  = ", cruel world\n";
      char * buff2 = new char[8];
      send(1, buff, 15);
      recv(1, buff2, 8);

• Both sends block until the other side receives, so neither process ever reaches its recv
  (a deadlock-free version in real MPI is sketched at the end of these notes)

Postmortem
• Our heated plate solution does not rely on shared memory
• Sending messages becomes complicated in a hurry (it is easy to do the wrong thing)
• We need to reinvent the wheel constantly for different interaction patterns

Example Summary
• Matrix Multiplication example:
  – used threads and implicitly shared memory
  – this is common for everyday applications (especially useful for servers, GUI apps, etc.)
• Heated Plate example:
  – used message passing
  – this is more common for big science and big business (also e.g. peer-to-peer)
  – it is not used to code your average firefox

Guiding Question
• If you're writing a GUI app (let's call it "firefox"), would you prefer to use threads or message passing?

Summary
• We looked at three ways to do concurrent programming:
  – co-begin
  – threads, implicitly shared memory, semaphores
  – message passing
• Concerns: scheduling, deadlock, starvation
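As promised in the MPI deadlock slide, here is that exchange written against the real MPI API, with the deadlock removed. The slide's send/recv pseudo-calls become MPI_Send/MPI_Recv; the rank numbering (0 and 1 instead of the slide's 1 and 2) and the ordering fix (rank 1 receives before sending, so exactly one side sends first) are assumptions of this sketch. MPI_Sendrecv would be the other standard fix.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char ** argv) {
      int rank;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {
        const char * msg = "Goodbye";         /* 8 bytes with the NUL */
        char reply[15];
        MPI_Send(msg, 8, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(reply, 15, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("%s%s", msg, reply);           /* "Goodbye, cruel world" */
      } else if (rank == 1) {
        const char * msg = ", cruel world\n"; /* 15 bytes with the NUL */
        char greeting[8];
        /* Receive first: this breaks the send/send cycle from the slide. */
        MPI_Recv(greeting, 8, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(msg, 15, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
      }

      MPI_Finalize();
      return 0;
    }

Run with mpirun -np 2. The original slide deadlocks precisely because both processes execute their blocking send before either posts a receive.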