Feb 2 2009 1:10                synchronization.txt

A summary of the synchronization lecture (and additions)
--------------------------------------------------------

main
    int i = 0;
    start thread A
    start thread B
    wait for threads to finish
    printf("value of i: %d\n", i);

thread A            thread B
    ++i                 ++i

The instructions executed to increase i would be:

1. Copy value of i to register eax
2. Increase value of register by 1
3. Copy value of register eax to i

We first assume one CPU and that thread A completes before thread B
starts. Indicating the instruction order by thread name and instruction
number we have:

          A1  A2  A3  B1  B2  B3
  i    0   0   0   1   1   1   2

Executing the instructions you find that the final value of i is 2.
Since i is increased twice this is correct and expected.

Next we assume a timer interrupt causing a thread switch between
instructions A2 and A3. Assuming thread B is picked from the ready
queue, the instruction sequence becomes:

          A1  A2  B1  B2  B3  A3
  i    0   0   0   0   0   0   1

The final value of i becomes 1. Since we still have the same program,
which increases i twice, this is unexpected and wrong. What happened?
Both threads read the initial value of i (A1 and B1), and then A3
overwrote the result of B3.

Finally we assume two CPUs and that the threads run truly
simultaneously. We assume thread B starts a fraction of time after A:

  A1  A2  A3
    B1  B2  B3

We have a similar result as before. Both threads read the initial
value of i, and B3 will overwrite the result of A3 (losing it).

Using threads as above, the result is not deterministic. Clearly not
acceptable: a program should always return a deterministic, correct
result. How do we solve the situation?

First, we try to let each thread put up a sign: "I'm modifying i,
you'll have to wait". The new program:

main
    int i = 0;
    bool busy = false;
    start thread A
    start thread B
    wait for threads to finish
    printf("value of i: %d\n", i);

thread A and B
    1  while (busy) ;
    2  busy = true;
    3  copy i to eax
    4  increment eax
    5  copy eax to i
    6  busy = false;

Let us again assume an interrupt and thread switch at the same place in
the computation as before, now between A4 and A5:

           A1  A2  A3  A4  B1  B1  B1  B1  B1  B1 ...
  busy  f   f   t   t   t   t   t   t   t   t   t
  i     0   0   0   0   0   0   0   0   0   0   0

Since busy is now true, B will get stuck in the loop until a new
thread switch occurs. We assume A continues:

           A5  A6  B2  B3  B4  B5  B6
  busy  t   t   f   t   t   t   t   f
  i     0   1   1   1   1   1   2   2

When A6 has been executed B can continue (once B is scheduled again),
and the result is correct again. One problem we can see here is that
thread B wastes a lot of CPU time: B occupies the CPU just to determine
when to stop waiting. This is called "busy-wait" and is a waste of CPU
time. Really bad. But that's not all.

Let us now consider the code again, but this time we use our brains to
insert interrupts at the most "unlucky" places: between A1 and A2 as
well as between B3 and B4:

           A1  B1  B2  B3  A2  A3  A4  A5  A6  B4  B5  B6
  busy  f   f   f   t   t   t   t   t   t   f   f   f   f
  i     0   0   0   0   0   0   0   0   1   1   1   1   1

The final value of i becomes 1: wrong again. With two unlucky thread
switches the result is still wrong, because we have only pushed the
problem down to the busy flag. Now the initial value of that flag is
read by both threads.

We note that the reason for the problem, given only one CPU, is that we
get unlucky thread switches, and these are caused by interrupts. We
test a new idea: disable interrupts.

thread A and B
    1  disable interrupts
    2  copy i to eax
    3  increment eax
    4  copy eax to i
    5  enable interrupts

Wow, that works great. No unlucky switches: either a switch happens
before we read the value of i, or after we have already updated it.
The section of code from where we start using the thread-common
variable i to the point where we are done with it is called a
"critical section" (the time during which switches must be prevented).
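User programs cannot disable interrupts themselves, but the protection the notes build up to is available as a lock. As a preview, here is a minimal sketch of the same ++i critical section in Python, using `threading.Lock` to guard the read-modify-write (the variable names mirror the pseudocode above):

```python
# Two threads incrementing a shared i, with the read-modify-write
# protected by a lock so no update is lost.
import threading

i = 0
lock = threading.Lock()

def increment(times):
    global i
    for _ in range(times):
        lock.acquire()      # "put up the sign" safely
        tmp = i             # copy i to a local ("eax")
        tmp += 1            # increment
        i = tmp             # copy back to i
        lock.release()      # "take the sign down"

a = threading.Thread(target=increment, args=(100000,))
b = threading.Thread(target=increment, args=(100000,))
a.start(); b.start()
a.join(); b.join()          # wait for threads to finish
print("value of i:", i)     # 200000: both sets of increments survive
```

Without the acquire/release pair, interleaved read-modify-write sequences can lose updates exactly as in the A3/B3 trace above.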
Unfortunately we still have a host of problems.

First, consider that we need to do something more complicated, and do
it often:

insert_unique(list, x)
{
    disable interrupts
    if (!find(list, x))
        append(list, x)
    enable interrupts
}

Observe that we must prevent switches during the entire operation:
during find we traverse the pointers of the linked list, and if some
other thread is half-way through inserting something, one pointer is
bound to be wrong. Also, if some thread could insert x just after we
determined it was not in the list, we would insert a duplicate. But
disabling interrupts for the time needed to traverse the list will
also block keystrokes, network packets, hard-drive access and so on,
everything that depends on interrupts to signal completion. The
computer will "stop responding". Very bad.

Second, disabling interrupts should be something only the OS can do,
since if user programs could do it, one program could "hang" the
computer. But we want to be able to use threads safely also in user
programs.

Third, consider this solution on a multiprocessor. It will not prevent
the other processors from accessing i during the critical section,
unless all other processors are stopped somehow. And stopping all
other processors would be a heavy penalty.

Let us now combine the ideas to see if some problem is solved:

thread A and B
    0  disable interrupts   \
    1  while (busy) ;        \
    2  busy = true;           > lock
    3  enable interrupts     /
    4  copy i to eax         \
    5  increment eax          > "main" critical section
    6  copy eax to i         /
    7  disable interrupts    \
    8  busy = false;          > unlock
    9  enable interrupts     /

Now we prevent switches only during the time we access the busy flag.
Since this is a very brief operation, interrupts are not disabled for
long, and they stay enabled during the (possibly heavy) computation in
the critical section. The problem that only the OS should be able to
manipulate interrupts can be solved by letting the OS provide the lock
and unlock code as OS functions, taking the address of the busy
variable (the sign). The OS will always enable interrupts again before
the user program resumes. And other threads still cannot modify the
variable i, as long as they execute the lock code before and the
unlock code after every modification attempt.

Unfortunately we still have problems. At line 1 interrupts are
disabled, so no switch will occur. Thus, reaching this code while the
busy flag is true will enter an infinite loop. Very bad, even worse
than a busy-loop. And the problem with multiprocessors remains.

Fortunately the infinite loop can be solved:

lock:
    disable interrupts
    while (busy) {
        put thread on wait queue    \__ also critical section
        switch to other thread      /
    }
    busy = true;
    enable interrupts

unlock:
    disable interrupts
    move one thread from wait       \__ also critical section
        queue to ready queue        /
    busy = false;
    enable interrupts

Now we have a lock solution that works well on a single CPU. Since it
puts the threads to sleep while waiting, we call such locks
"sleeplocks". Using the lock around code sections guarantees mutual
exclusion (which is why locks are sometimes called mutexes): only one
of the code sections can be executed "simultaneously".

Using a while loop in lock is essential, since someone else may enter
the critical section while an "unlocked" thread is still on the ready
queue, so it must check the flag and possibly wait again.

Note that we have two-level locking here. The disabled interrupts
guarantee atomic modification of "busy", which in turn guarantees
atomic modification in our user code. But the user code itself runs
with interrupts enabled.
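The insert_unique example can be sketched the same way with a lock instead of interrupt disabling. This is a minimal Python sketch (the list and function names simply mirror the notes' pseudocode); the whole find-then-append must sit inside one lock region, or two threads could both pass the test and insert duplicates:

```python
# insert_unique protected by a lock: the test and the insert form one
# critical section, so no duplicate can slip in between them.
import threading

items = []                      # the shared "list"
items_lock = threading.Lock()

def insert_unique(lst, x):
    with items_lock:            # lock() ... unlock() around the whole operation
        if x not in lst:        # find(list, x)
            lst.append(x)       # append(list, x)

def worker():
    for x in range(100):
        insert_unique(items, x)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()

print(sorted(items))            # each x present exactly once
```

Four threads all try to insert the same 100 values, yet the list ends up with exactly one copy of each.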
Consider a comparison: a bathroom without a lock (or with a broken
lock), so you cannot lock the door while doing your business. Instead
you put up a sign on the door reading "BUSY" when you enter, and take
it down when you leave. For this to work without embarrassment, it
puts heavy requirements on the involved parties:

- If someone enters the room without checking the sign, embarrassment
  may occur.
- If some joker takes the sign down, embarrassment may occur.
- If someone forgets to put the sign up, embarrassment may occur.
- If someone forgets to take the sign down, embarrassment may occur
  because of too long waiting.

Also consider the situation where the bathroom (critical section) has
many doors. The sign must be checked no matter which door is used, and
there can be only one sign for this bathroom; if several signs appear
the algorithm breaks (someone may be checking the wrong sign). Thus
the programmer must use the lock and unlock code correctly at every
access.

But what about multiprocessors (of today, and even more of the
future)? Well, clever hardware engineers have invented special
"atomic" instructions: short instructions that complete without any
interrupt or other CPU being able to intervene (they are "locked").
Two variants exist, test-and-set and atomic-swap. They are equivalent
to the following code:

int test_and_set(int* adr)
{
    int ret = *adr;
    *adr = 1; /* true */
    return ret;
}

void atomic_swap(int* a, int* b)
{
    int save = *a;
    *a = *b;
    *b = save;
}

We can use the second to implement the first:

int test_and_set(int* adr)
{
    int set = 1;
    atomic_swap(&set, adr);
    return set;
}
Now we can use these instructions to protect a critical section:

    while (test_and_set(&busy)) ;   <-- spinlock acquire

    /* critical section */

    busy = false;                   <-- spinlock release

Since this kind of lock uses busy-wait, spinning in the loop, we call
it a "spinlock". Let us use it to replace the interrupt manipulation
protecting our critical section:

thread A and B
    spinlock_acquire(busy)
    copy i to eax       \
    increment eax        > critical section
    copy eax to i       /
    spinlock_release(busy)

The bad thing is that we use a busy-loop in the spinlock.
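A spinlock can be sketched in Python as follows. Since CPython exposes no test_and_set instruction, this sketch uses a non-blocking `Lock.acquire(blocking=False)`, which atomically either takes the flag and returns True or returns False, to play the test-and-set role:

```python
# A spinlock: acquire busy-waits until the flag is grabbed atomically.
import threading

class Spinlock:
    def __init__(self):
        self._flag = threading.Lock()

    def acquire(self):
        # spin (busy-wait) until the atomic "test and set" succeeds
        while not self._flag.acquire(blocking=False):
            pass

    def release(self):
        self._flag.release()

i = 0
spin = Spinlock()

def increment(times):
    global i
    for _ in range(times):
        spin.acquire()
        i += 1              # the critical section
        spin.release()

threads = [threading.Thread(target=increment, args=(50000,)) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print("value of i:", i)     # 100000
```

Note that the waiting thread burns CPU in the while loop, exactly the busy-wait cost the notes point out.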
The good thing on multiprocessors is that, since we have truly
simultaneous execution, we will not have to wait very long as long as
the critical section is short. The overhead of spinning will
(hopefully) be smaller than the overhead of disabling interrupts and
switching threads. This is most beneficial when the thread holding the
lock is currently executing on another CPU. Modern operating systems
therefore use a combined solution: if the holding thread is executing
on another CPU they spin, otherwise they use a sleeplock.

Fortunately, from now on we only need to use the "lock" and "unlock"
functions to protect our code, and not worry so much about the
internal lock implementation.

In some critical sections we can allow several threads, but only a
limited number of them. Consider a situation where N resources are
available, for example a bounded buffer. In this case we can use a
semaphore. Consider a bridge that can hold no more than 5 persons; if
more enter, the bridge will fall apart. Assume the bridge has an
entrance and an exit, each with some means to modify a central
counter. To safely use the bridge the following algorithm must be
followed:

- Initiate the counter to 5 (the number of persons the bridge can
  hold, i.e. the number of free resources).
- Before entering the bridge, check the counter. If it is zero, wait;
  if it is more than zero, decrement it and enter the bridge.
- After exiting the bridge, increment the counter and notify one of
  the persons waiting.

All three operations above must of course be atomic (mutually
exclusive), since each is a critical section (the counter is shared).
This describes exactly a semaphore.

Another situation that often occurs when using threads is waiting for
some data to become available. Consider a mailbox some hundred yards
from your house (but visible from your window). You do not want to
walk all the way just to see if mail has arrived, so you put a small
flag on your mailbox that is visible only after mail has been
delivered. To use this scheme you do the following:

- Mount the flag on the empty mailbox.
- When mail is delivered, the flag is raised.
- Before fetching the mail, check the flag. If it is not visible,
  wait; if it is visible, fetch the mail and reset the flag.

This is actually also a semaphore, but this time initiated to 0 (no
resource available). If you also consider a semaphore initiated to 1,
you will come to the conclusion that it is equivalent to a lock, since
a lock protects a resource (often a memory location) that only one
thread at a time may access. Many threading systems provide only
semaphores instead of locks, since semaphores solve all three
synchronization problems above.
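The bridge algorithm above can be sketched with `threading.Semaphore(5)`. In this sketch a small lock-protected counter additionally records how many "persons" are on the bridge at once, only so we can check that the limit is never exceeded:

```python
# The bridge example: a semaphore initiated to 5 admits at most five
# concurrent crossers; the stats counter verifies the invariant.
import threading
import time

bridge = threading.Semaphore(5)     # counter initiated to 5
stats_lock = threading.Lock()
on_bridge = 0
max_seen = 0

def cross():
    global on_bridge, max_seen
    bridge.acquire()                # wait if counter is zero, else decrement
    with stats_lock:
        on_bridge += 1
        max_seen = max(max_seen, on_bridge)
    time.sleep(0.01)                # walking across the bridge
    with stats_lock:
        on_bridge -= 1
    bridge.release()                # increment counter, notify one waiter

threads = [threading.Thread(target=cross) for _ in range(20)]
for t in threads: t.start()
for t in threads: t.join()
print("most persons on the bridge at once:", max_seen)
```

Twenty persons try to cross, but the semaphore guarantees that `max_seen` never exceeds 5, and the bridge is empty once everyone has crossed.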
A semaphore needs to wait for a counter to become non-zero. Now
consider the more general situation where we need to wait for a custom
condition to become true. For example, we may need to wait for N
threads to each finish the summation of one N:th part of an array.

In the waiting thread:
    while (done < N) ;

In each summing thread:
    sum_Nth_array_part(array, N)
    ++done;

Since done is a shared variable, reading and writing it becomes a
critical code section that must be protected by a lock:

In the waiting thread:
    lock(done_lock)
    while (done < N) ;
    unlock(done_lock)

In each summing thread:
    sum_Nth_array_part(array, N)
    lock(done_lock)
    ++done;
    unlock(done_lock)

But now we have a problem. Before we enter the while loop we take the
lock, and then we wait for done to reach N. But while we wait we hold
the lock, so it is busy. Thus, when a summing thread tries to take the
lock to increment the done variable, it must wait. Then the done
variable will never be incremented and never reach N, so the waiting
thread will wait forever, never releasing the lock that would let done
be incremented. We have a kind of deadlock: the waiting thread waits
for the summing threads, which simultaneously wait for the waiting
thread. Ouch. We try to solve it:

In the waiting thread:
    lock(done_lock)
    while (done < N)
        put on wait queue       \
        unlock(done_lock)        |__ wait for condition to appear
        switch thread            |
        lock(done_lock)         /
    unlock(done_lock)

In each summing thread:
    sum_Nth_array_part(array, N)
    lock(done_lock)
    ++done;
    if done == N
        move one thread from wait   \__ signal condition appeared
            queue to ready queue    /
    unlock(done_lock)

This solution to a custom wait occurring inside a critical section is
named "conditions". A condition is a special mechanism that safely
releases the lock, waits, and then reacquires the lock. But it must be
used correctly in order to work. The condition code is typically
implemented as two functions named wait and signal, which must be
executed inside the critical section lock (the condition code is
itself critical, and it also needs the lock in order to release it).
Using these functions the code looks like this:

In the waiting thread:
    lock(done_lock)
    while (done < N)
        wait(done_lock, done_condition)
    unlock(done_lock)

In each summing thread:
    sum_Nth_array_part(array, N)
    lock(done_lock)
    ++done;
    if done == N
        signal(done_lock, done_condition)
    unlock(done_lock)
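The wait-for-N-summing-threads pattern can be sketched with `threading.Condition`, which bundles the lock and the condition variable into one object (so `with done_cond:` plays the role of lock(done_lock)):

```python
# N worker threads each sum one part of an array; the main thread
# waits on a condition until all of them have reported done.
import threading

N = 4
array = list(range(100))
partial_sums = [0] * N
done = 0
done_cond = threading.Condition()   # done_lock and done_condition in one

def sum_nth_part(n):
    global done
    lo, hi = n * len(array) // N, (n + 1) * len(array) // N
    partial_sums[n] = sum(array[lo:hi])
    with done_cond:                 # lock(done_lock)
        done += 1                   # ++done
        if done == N:
            done_cond.notify()      # signal(done_lock, done_condition)
        # unlock(done_lock) happens when the with-block exits

for n in range(N):
    threading.Thread(target=sum_nth_part, args=(n,)).start()

with done_cond:                     # lock(done_lock)
    while done < N:                 # the while test is essential
        done_cond.wait()            # wait(done_lock, done_condition)
print("total:", sum(partial_sums))  # prints "total: 4950"
```

`Condition.wait` releases the lock while sleeping and reacquires it before returning, which is exactly the mechanism the notes describe, so the summing threads can take the lock and increment done while the main thread waits.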
Using a while loop around wait is in most cases essential, since the
condition may become false again while a "signalled" thread is still
on the ready queue. In this particular example the condition will stay
true once it becomes true, but we write while anyway as (in the case
of conditions) a good habit.

Now we can consider how to implement a semaphore using a condition:

decrease semaphore:
    lock(count_lock)
    while counter == 0
        wait(count_lock, count_condition)
    --counter
    unlock(count_lock)

increase semaphore:
    lock(count_lock)
    ++counter
    signal(count_lock, count_condition)
    unlock(count_lock)

It is worth noting that conditions always need the while test to
function correctly. We cannot use an "empty" condition to "just wait".
Consider, for example, trying to solve the mailbox example using only
an "empty" condition:

Wait for mail:
    lock(mail_lock)                     WRONG
    wait(mail_lock, mail_condition)     WRONG
    unlock(mail_lock)                   WRONG

Deliver mail:
    lock(mail_lock)                     WRONG
    signal(mail_lock, mail_condition)   WRONG
    unlock(mail_lock)                   WRONG

Why is it wrong? Consider the case where mail arrives BEFORE you start
waiting for it. In this case, since wait will ALWAYS wait, you will
wait forever, because signal only signals if you are already waiting.
The correct solution uses a semaphore initiated to 0:

Wait for mail:
    decrease_semaphore(mail_sema)       CORRECT

Deliver mail:
    increase_semaphore(mail_sema)       CORRECT
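The condition-based semaphore above can be sketched directly in Python. The standard library already provides `threading.Semaphore`; this hypothetical `CondSemaphore` class exists only to mirror the notes' decrease/increase pseudocode:

```python
# A semaphore built from a lock and a condition, following the notes'
# decrease/increase pseudocode. threading.Condition bundles count_lock
# and count_condition into one object.
import threading

class CondSemaphore:
    def __init__(self, counter):
        self.counter = counter
        self.cond = threading.Condition()

    def decrease(self):
        with self.cond:                 # lock(count_lock)
            while self.counter == 0:
                self.cond.wait()        # wait(count_lock, count_condition)
            self.counter -= 1           # --counter

    def increase(self):
        with self.cond:                 # lock(count_lock)
            self.counter += 1           # ++counter
            self.cond.notify()          # signal(count_lock, count_condition)

# Used as the mailbox semaphore, initiated to 0. Unlike the "empty"
# condition, a delivery BEFORE the wait is not lost, because the
# counter remembers it.
mail_sema = CondSemaphore(0)
mail_sema.increase()        # mail delivered before anyone waits
mail_sema.decrease()        # returns immediately: the mail is not lost
print("mail fetched, counter:", mail_sema.counter)
```

This also shows why the while test matters: a waiter woken by notify must recheck the counter, since another thread may have decreased it first.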