synchronization.txt Feb 2 2009 1:10

A summary of the synchronization lecture (and additions)
--------------------------------------------------------
main
    int i = 0;
    start thread A
    start thread B
    wait for threads to finish
    printf("value of i: %d\n", i);
thread A        thread B
    ++i             ++i

The instructions executed to increase i would be:

1. Copy value of i to register eax
2. Increase value of register eax by 1
3. Copy value of register eax to i

We first assume one CPU and that thread A completes before thread B
starts. Indicating the instruction order by thread name and
instruction number we have:

      A1 A2 A3 B1 B2 B3
i  0  0  0  1  1  1  2

Executing the instructions you find that the final value of i is 2.
Since it is increased two times this is correct and expected.

Next we assume a timer interrupt causing a thread switch between
instructions A2 and A3. Assuming thread B is picked from the ready
queue, the instruction sequence becomes:

      A1 A2 B1 B2 B3 A3
i  0  0  0  0  0  1  1

The final value of i becomes 1. Since we still have the same program
that increases i twice, this is unexpected and wrong. What happened?
Both threads read the initial value of i (A1 and B1), and then A3
overwrote the result of B3.

Finally we assume two CPUs and that the threads run truly
simultaneously. We assume thread B starts a fraction of time after A:

A1 A2 A3
  B1 B2 B3

We have a similar result as previously. Both threads read the initial
value of i, and B3 will overwrite the result of A3 (losing it).

Using threads as above, the result is not deterministic. Clearly not
acceptable. A program should always return a deterministic, correct
result. How do we solve the situation?

First, we try to let each thread put up a sign "I'm modifying i,
you'll have to wait". The new program:

main
    int i = 0;
    bool busy = false;
    start thread A
    start thread B
    wait for threads to finish
    printf("value of i: %d\n", i);

thread A and B
    1 while (busy)
          ;
    2 busy = true;
    3 copy i to eax
    4 increment eax
    5 copy eax to i
    6 busy = false;

Let us still assume an interrupt and thread switch at the same place
as before, now between A4 and A5:

         A1 A2 A3 A4 B1 B1 B1 B1 B1 B1 ...
busy  f  f  t  t  t  t  t  t  t  t  t
i     0  0  0  0  0  0  0  0  0  0  0

Since busy is now true, B will get stuck in the loop until a new
thread switch; we assume A continues:

         A5 A6 B2 B3 B4 B5 B6
busy  t  t  f  t  t  t  t  f
i     0  1  1  1  1  1  2  2

When A6 has been executed B can continue (once B is scheduled again).
The result is now correct again. One problem we can see here is that
thread B is wasting a lot of CPU time. B occupies the CPU to determine
when to stop waiting; this is called "busy-wait" and is a waste of CPU
time. Really bad. But that's not all.

Let us now consider the code again, but now we use our brains to
insert interrupts at the most "unlucky" places: between A1 and A2 as
well as between B3 and B4:

         A1 B1 B2 B3 A2 A3 A4 A5 A6 B4 B5 B6
busy  f  f  f  t  t  t  t  t  t  f  f  f  f
i     0  0  0  0  0  0  0  0  1  1  1  1  1

With two unlucky thread switches the result is still wrong, because
we have now pushed the problem to the busy flag: its initial value is
read by both threads.
We note that, given only one CPU, the reason for the problem is
unlucky thread switches, and those are caused by interrupts. We test
a new idea: disable interrupts.
thread A and B
    1 disable interrupts
    2 copy i to eax
    3 increment eax
    4 copy eax to i
    5 enable interrupts
Wow, that works great. No unlucky switches. Either a switch happens
before we read the value of i, or after we have already updated it.
The section of code from where we start using the thread-shared
variable i to the point where we are done is called a "critical
section" (the time during which switches must be prevented).
Unfortunately we still have a host of problems:
Consider that we need to do something more complicated, and do it often:
insert_unique(list, x)
{
    disable interrupts
    if (!find(list, x))
        append(list, x)
    enable interrupts
}
Observe that we must prevent switches during the entire operation:
during find we traverse the pointers of the linked list, and if some
thread is half-way through inserting something, one pointer is bound
to be wrong. Also, if some thread could insert x just after we
determined it was not in the list, we would insert a duplicate.
Disabling interrupts for the amount of time needed to traverse the
list will also block keystrokes, network packets, hard-drive access
and everything else that depends on interrupts to signal completion.
The computer will "stop responding". Very bad.
Also, disabling interrupts should be something only the OS can do,
since if user programs could do it, one program could "hang" the
computer. But we want to be able to use threads safely in user
programs too.
Third, consider this solution on a multiprocessor. It will not
prevent the other processors from accessing i during the critical
section unless all other processors are stopped somehow. And stopping
all other processors would be a heavy penalty.
Let us now combine the ideas to see if some problems are solved:

thread A and B
    0 disable interrupts  \
    1 while (busy)         |
          ;                 > lock
    2 busy = true;         |
    3 enable interrupts   /
    4 copy i to eax       \
    5 increment eax        > critical section
    6 copy eax to i       /
    7 disable interrupts  \
    8 busy = false;        > unlock
    9 enable interrupts   /

Now we prevent switches only during the time we access the busy flag.
Since this is a very brief operation, interrupts will not be disabled
for long. Interrupts are now enabled during the (possibly heavy)
computation in the critical section.

The problem that only the OS should be able to manipulate interrupts
can be solved by letting the OS provide the lock and unlock code as
OS functions, taking the address of the busy variable (the sign). The
OS will always enable interrupts again before the user program
resumes.

And other threads still can not modify the variable i, as long as
they execute the lock code before and the unlock code after the
modification attempt.

Unfortunately we still have problems. At line 1 interrupts are
disabled, so no switch will occur. Thus, reaching this code when the
busy flag is true will enter an infinite loop. Very bad. Even worse
than a busy-loop. And the problem with multiprocessors remains.

Fortunately the infinite loop can be solved:

thread A and B
    disable interrupts            \
    while (busy)                   |
        put thread on wait queue   |__ lock (also a critical section)
        switch to other thread     |
    busy = true;                   |
    enable interrupts             /
    copy i to eax                 \
    increment eax                  > "main" critical section
    copy eax to i                 /
    disable interrupts            \
    move one thread from wait      |
      queue to ready queue         |__ unlock (also a critical section)
    busy = false;                  |
    enable interrupts             /

Now we have a lock solution that works well on a single CPU. Since
this puts the threads to sleep while waiting, we call them
"sleeplocks". Using the lock around code sections guarantees mutual
exclusion (sometimes locks are called mutexes): only one of the code
sections can be executed "simultaneously".

Using a while loop is essential, since someone else may enter the
critical section while an "unlocked" thread is on the ready queue, so
it must wait again.

Note that we have two-level locking. The disabled interrupts
guarantee atomic modification of "busy", which in turn guarantees
atomic modification in our user code. But the user code runs with
interrupts enabled.

---

Consider a comparison: a bathroom without a lock (or with a broken
lock), so you can not lock the door while doing your business.
So you put up a sign on the door when you enter that reads "BUSY",
and take it down when you leave.

For this to work without embarrassment it puts heavy requirements on
the involved parties:

- If someone enters the room without checking the sign, embarrassment
  may occur...
- If some joker takes the sign down, embarrassment may occur...
- If someone forgets to put the sign up, embarrassment may occur...
- If someone forgets to take the sign down, embarrassment may occur
  because of too long waiting...

Also consider the situation when the bathroom (critical section) has
many doors. The sign must be checked no matter which door is used.
And there can be only one sign for this bathroom; if several signs
occur the algorithm breaks (someone may be checking the wrong sign).

Thus the programmer must use the lock and unlock code correctly at
every access.

---
But what about multiprocessors (today, and even more so in the
future)? Well, clever hardware engineers have invented two special
"atomic" instructions (short instructions that complete without any
interrupt or other CPU intervening; they are "locked"). Two variants
exist, test-and-set and atomic-swap. They are equivalent to the
following code:

int test_and_set(int* adr)
{
    int ret = *adr;
    *adr = 1; /* true */
    return ret;
}

void atomic_swap(int* a, int* b)
{
    int save = *a;
    *a = *b;
    *b = save;
}

We can use the second to implement the first:

int test_and_set(int* adr)
{
    int set = 1;
    atomic_swap(&set, adr);
    return set;
}

Now we can use these instructions to protect a critical section:

while (test_and_set(&busy))  \__ spinlock acquire
    ;                        /

/* critical section */

busy = false;                ___ spinlock release

Since this kind of lock uses busy-wait (it spins in the loop), we
call it a spinlock.

Let us use it to replace the interrupt manipulation protecting the
critical section:

thread A and B
    spinlock_acquire(busy)
    copy i to eax    \
    increment eax     > critical section
    copy eax to i    /
    spinlock_release(busy)

The bad thing is that we use a busy-loop in the spinlock. The good
thing on multiprocessors is that, since we have true simultaneous
execution, we will not have to wait very long as long as the critical
section is short. The overhead of waiting would (hopefully) be
smaller than the overhead of disabling interrupts and switching
threads. This is most beneficial when the thread holding the lock is
currently executing on another CPU. Modern operating systems use a
combined solution: if the holding thread is executing on another CPU
they use a spinlock, otherwise they use a sleeplock.

Fortunately, from now on we only need to use the "lock" and "unlock"
functions to protect our code, and not worry so much about the
internal lock implementation.

In some critical sections we can allow multiple threads, but only a
limited number. Consider a situation where N resources are available,
for example a bounded buffer. In this case we can use a semaphore.

Consider a bridge that can hold no more than 5 persons. If more
persons enter the bridge it will fall apart. Assume the bridge at
each entrance and exit has some means to modify a central large
counter. To safely use the bridge the following algorithm must be
followed:

- Initiate the counter to 5 (the number of persons it can hold, or
  free resources).
- Before entering the bridge, check the counter. If it is zero, wait;
  if it is more than zero, decrement it and enter the bridge.
- After exiting the bridge, increment the counter and notify one of
  the persons waiting.

All three operations above must of course be atomic (mutually
exclusive), since each is a critical section (the counter is shared).
This describes exactly a semaphore.

Another situation often occurring when using threads is the problem
of waiting for some data to become available. Consider a mailbox some
hundred yards from your house (but visible from your window). You do
not want to go all the way to see if mail has arrived. You put a
small flag on your mailbox that is visible only after mail has been
delivered. To use this scheme you do the following:

- Mount the flag on the empty mailbox.
- Before fetching the mail, check the flag. If it is not visible,
  wait; if it is visible, fetch the mail and reset the flag.
- When mail is delivered the flag is raised.

This is actually also a semaphore, but this time initiated to 0 (no
resource available).

If you also consider a semaphore initiated to 1, you will come to the
conclusion that it is equivalent to a lock, since a lock protects a
resource (often a memory location) that only one thread at a time may
access. Many threading systems provide only semaphores instead of
locks, since they solve all three synchronization problems above.

A semaphore needs to wait for a counter to become non-zero. Now
consider the more general situation where we need to wait for a
custom condition to become true. For example, we may need to wait for
N threads to finish the summation of one N:th part each of an array.

In the waiting thread:

    while (done < N)
        ;

In each summing thread:

    sum_Nth_array_part(array, N)
    ++done;

Since done is a shared variable, reading and writing it becomes a
critical code section that must be protected by a lock:

In the waiting thread:
    lock(done_lock)
    while (done < N)
        ;
    unlock(done_lock)
In each summing thread:

    sum_Nth_array_part(array, N)
    lock(done_lock)
    ++done;
    unlock(done_lock)
But now we have a problem. Before we enter the while loop we take the
lock. Then we wait for done to reach N. But while we wait we hold the
lock, so it is busy. Thus, when a summing thread tries to take the
lock to increment the done variable, it must wait. Then the done
variable will not be incremented and will not reach N, and the
waiting thread will wait forever, never releasing the lock so that
done could be incremented. We have a kind of deadlock: the waiting
thread waits for the summing threads, which simultaneously wait for
the waiting thread. Ouch. We try to solve it:
In the waiting thread:

    lock(done_lock)
    while (done < N)        \
        put on wait queue    |
        unlock(done_lock)    |__ wait for condition to appear
        switch thread        |
        lock(done_lock)     /
    unlock(done_lock)
In each summing thread:

    sum_Nth_array_part(array, N)
    lock(done_lock)
    ++done;
    if done == N
        move one thread from wait  \__ signal condition appeared
          queue to ready queue     /
    unlock(done_lock)
This solution to a custom wait situation occurring inside a critical
section is named a "condition". A condition is a special mechanism
that safely releases the lock, waits, and then reacquires the lock.
But it must be used correctly in order to work. The condition code is
typically implemented as two functions named wait and signal that
must be executed inside the critical section lock (the condition code
itself is critical, and it also needs the lock in order to unlock
it). Using these functions the code looks like this:
In the waiting thread:

    lock(done_lock)
    while (done < N)
        wait(done_lock, done_condition)
    unlock(done_lock)

In each summing thread:

    sum_Nth_array_part(array, N)
    lock(done_lock)
    ++done;
    if done == N
        signal(done_lock, done_condition)
    unlock(done_lock)

Using a while loop is in most cases essential, since the condition
may become false again while a "signalled" thread is on the ready
queue. In this special example the condition will stay true once it
becomes true, but we write while as a (in the case of conditions)
good habit.

Now we can consider how to implement a semaphore using a condition:

decrease semaphore:

    lock(count_lock)
    while counter == 0
        wait(count_lock, count_condition)
    --counter
    unlock(count_lock)

increase semaphore:

    lock(count_lock)
    ++counter
    signal(count_lock, count_condition)
    unlock(count_lock)

It is worth noting that conditions always need the while condition to
function correctly. We can not use an "empty" condition to "just
wait". Consider for example trying to solve the mailbox example using
only an "empty" condition:

Wait for mail:                              WRONG

    lock(mail_lock)
    wait(mail_lock, mail_condition)
    unlock(mail_lock)

Deliver mail:                               WRONG

    lock(mail_lock)
    signal(mail_lock, mail_condition)
    unlock(mail_lock)

Why is it wrong?

Consider the case where mail arrives BEFORE you start waiting for it.
In this case, since wait will ALWAYS wait, you will wait forever,
because signal will only signal if you are already waiting. The
correct solution uses a semaphore initiated to 0:

Wait for mail:                              CORRECT

    decrease_semaphore(mail_sema)

Deliver mail:                               CORRECT

    increase_semaphore(mail_sema)