Concurrent Programming: Introducing the principles of reentrancy, mutual exclusion and thread-synchronization

Advantages of multithreading
• For multiprocessor systems (two or more
CPUs), there are potential efficiencies in
the parallel execution of separate threads
(a computing job may be finished sooner)
• For uniprocessor systems (just one CPU),
there are likely software design benefits in
dividing a complex job into simpler pieces
(easier to debug and maintain -- or reuse)
Some Obstacles
• Separate tasks need to coordinate actions,
share data, and avoid competing for same
system resources
• Management ‘overhead’ could seriously
degrade the system’s overall efficiency
• Examples:
– Frequent task-switching is costly in CPU time
– Busy-Waiting is wasteful of system resources
Some ‘work-arounds’
• Instead of using ‘pipes’ for the exchange of
data among separate processes, Linux
lets ‘threads’ use the same address-space
(reduces ‘overhead’ in context-switching)
• Instead of requiring one thread to waste
time busy-waiting while another finishes
some particular action, Linux lets a thread
voluntarily give up its control of the CPU
Additional pitfalls
• Every thread needs some private memory
that cannot be ‘trashed’ by another thread
(for example, it needs a private stack for
handling interrupts, passing arguments to
functions, creating local variables, saving
CPU register-values temporarily)
• Each thread needs a way to prevent being
interrupted in a ‘critical’ multi-stage action
Example of a ‘critical section’
• Recall Disk-Drive device-programming
(status-register and control-register)
• Algorithm:
– (1) Loop rereads status-register until ‘ready’
– (2) Write drive-command to control-register
• If an interrupt occurs between these steps,
another thread can send its own command
‘mutual exclusion’
• To prevent one thread from ‘sabotaging’ the
actions of another, some mechanism is needed
that allows a thread to temporarily ‘block’ other
threads from gaining control of the CPU -- until
the first thread has completed its ‘critical’ action
• Some ways to accomplish this:
– Disable interrupts (stops CPU time-sharing)
– Use a ‘mutex’ (a mutual exclusion variable)
– Put other tasks to sleep (remove from run-queue)
What about ‘cli’?
• Disabling interrupts will stop ‘time-sharing’
among tasks on a uniprocessor system
• But it would be ‘unfair’ to allow this in a
multi-user system (monopolize the CPU)
• So ‘cli’ is a privileged instruction: it cannot
normally be executed by user-mode tasks
• It won’t work on a multiprocessor system
(‘cli’ disables interrupts only on the CPU
that executes it)
What about a ‘mutex’?
• A shared global variable acts as a ‘lock’
• Initially it’s ‘unlocked’: e.g., int mutex = 1;
• Before entering a ‘critical section’ of code,
a task ‘locks’ the mutex: i.e., mutex = 0;
• As soon as it leaves its ‘critical section’, it
‘unlocks’ the mutex: i.e., mutex = 1;
• While the mutex is ‘locked’, no other task
can enter the ‘critical section’ of code
Advantages and cautions
• A mutex can be used in both uniprocessor
and multiprocessor systems – provided it
is possible for a CPU to ‘lock’ the mutex
with a single ‘atomic’ instruction (requires
special support by processors’ hardware)
• Use of a mutex can introduce busy-waiting
by tasks trying to enter the ‘critical section’
(thereby severely degrading performance)
Software mechanism
• The operating system can assist threads
needing mutual exclusion, simply by not
scheduling other threads that might want
to enter the same ‘critical section’ of code
• Linux accomplishes this by implementing
‘wait-queues’ for those threads that are all
contending for access to the same system
resource – including ‘critical sections’
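The idea of putting contenders to sleep rather than letting them spin can be sketched in user space. This is only an analogy built from the C++ standard library (`std::mutex` and `std::condition_variable`), not the kernel's wait-queue interface; the names `SleepingLock` and `run_with_sleeping_lock` are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical user-space analogue of a wait-queue: threads that find the
// critical section occupied are blocked (not scheduled) until it is free.
class SleepingLock {
public:
    void enter()
    {
        std::unique_lock<std::mutex> guard(state_mutex_);
        // A contender sleeps here instead of busy-waiting.
        waiters_.wait(guard, [this] { return !occupied_; });
        occupied_ = true;
    }

    void leave()
    {
        {
            std::lock_guard<std::mutex> guard(state_mutex_);
            assert(occupied_);      // leave() must follow a matching enter()
            occupied_ = false;
        }
        waiters_.notify_one();      // wake one sleeping contender
    }

private:
    std::mutex state_mutex_;            // protects occupied_
    std::condition_variable waiters_;   // where contenders sleep
    bool occupied_ = false;
};

static SleepingLock section_lock;
static int counter = 0;                 // only touched between enter() and leave()

static void my_thread(int maximum)
{
    for (int i = 0; i < maximum; i++) {
        section_lock.enter();
        counter += 1;
        section_lock.leave();
    }
}

// Hypothetical helper: runs the experiment and returns the final total.
int run_with_sleeping_lock(int num_threads, int maximum)
{
    counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; t++)
        threads.emplace_back(my_thread, maximum);
    for (auto& th : threads)
        th.join();
    return counter;
}
```

Calling `run_with_sleeping_lock(4, 20000)` from `main()` should return exactly 80000, because a thread that cannot enter simply waits until the current occupant leaves.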
Demo programs
• To show why ‘synchronization’ is needed
in multithreaded programs, we wrote the
‘concur1.cpp’ demo-program
• Here several separate threads will all try to
increment a shared ‘counter’ – but without
any mechanism for doing synchronization
• The result is unpredictable: a different
total may be obtained each time the program runs!
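The slides do not reproduce ‘concur1.cpp’ itself, so the following is only a minimal sketch of the same experiment. It assumes C++11 `std::thread` rather than whatever thread-creation call the original demo uses, and the helper name `run_unsynchronized` is hypothetical. Relaxed atomic loads and stores are used so that each single access is well-defined, while the read-modify-write sequence as a whole remains unprotected.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Shared counter. Each individual load or store is atomic, but the
// load / add / store sequence in my_thread() is NOT one atomic step.
static std::atomic<int> counter{0};

static void my_thread(int maximum)
{
    for (int i = 0; i < maximum; i++) {
        int temp = counter.load(std::memory_order_relaxed);
        temp += 1;
        // Another thread may have stored a newer value in the meantime;
        // this store then overwrites it and an increment is lost.
        counter.store(temp, std::memory_order_relaxed);
    }
}

// Hypothetical helper: runs the experiment and returns the final total.
int run_unsynchronized(int num_threads, int maximum)
{
    assert(num_threads > 0 && maximum >= 0);
    counter.store(0);
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; t++)
        threads.emplace_back(my_thread, maximum);
    for (auto& th : threads)
        th.join();
    return counter.load();
}
```

A call such as `run_unsynchronized(4, 100000)` can never return more than 400000, but on a multiprocessor it will often return less, and the shortfall varies from run to run.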
How to employ a ‘mutex’
• Declare a global variable: int mutex = 1;
• Define a pair of shared subroutines
– void enter_critical_section( void );
– void leave_critical_section( void );
• Insert calls to these subroutines before
and after accessing the global ‘counter’
Special x86 instructions
• We need to use x86 assembly-language
(to implement ‘atomic’ mutex-operations)
• Several instruction-choices are possible,
but ‘btr’ and ‘bts’ are simplest to use:
– ‘btr’ means ‘bit-test-and-reset’
– ‘bts’ means ‘bit-test-and-set’
• Syntax and semantics:
– asm(" btr $0, mutex "); // acquire the mutex
– asm(" bts $0, mutex "); // release the mutex
The two mutex-functions
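void enter_critical_section( void )
{
    // 'btr' copies bit 0 of mutex into the carry-flag, then clears the bit
    asm("spin: btr $0, mutex ");
    // carry-flag clear means the mutex was already locked, so try again
    asm("      jnc spin ");
}
void leave_critical_section( void )
{
    // 'bts' sets bit 0 of mutex again, which unlocks it
    asm(" bts $0, mutex ");
}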
Where to use the functions
void my_thread( int * data )
{
    int i, temp;
    for (i = 0; i < maximum; i++)
    {
        enter_critical_section();
        temp = counter;
        temp += 1;
        counter = temp;
        leave_critical_section();
    }
}
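For readers without an x86 assembler at hand, the same locking scheme can be sketched in portable C++. This is an assumption-laden equivalent, not the course's code: `fetch_and` and `fetch_or` on a `std::atomic<int>` stand in for the bit-test-and-reset and bit-test-and-set instructions, `std::thread` replaces the original thread-creation call, and `run_with_mutex` is a hypothetical helper.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

static std::atomic<int> mutex{1};   // bit 0 set: unlocked, bit 0 clear: locked
static int counter = 0;             // protected by mutex

void enter_critical_section()
{
    // Atomically clear bit 0 and inspect its previous value.
    // If the bit was already clear, another thread holds the lock: retry.
    while ((mutex.fetch_and(~1, std::memory_order_acquire) & 1) == 0) {
        // busy-wait
    }
}

void leave_critical_section()
{
    // Atomically set bit 0 again, which unlocks the mutex.
    int old = mutex.fetch_or(1, std::memory_order_release);
    assert((old & 1) == 0);         // the lock must have been held
    (void)old;
}

static void my_thread(int maximum)
{
    for (int i = 0; i < maximum; i++) {
        enter_critical_section();
        int temp = counter;
        temp += 1;
        counter = temp;
        leave_critical_section();
    }
}

// Hypothetical helper: runs the experiment and returns the final total.
int run_with_mutex(int num_threads, int maximum)
{
    counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; t++)
        threads.emplace_back(my_thread, maximum);
    for (auto& th : threads)
        th.join();
    return counter;
}
```

With the critical section protected, `run_with_mutex(4, 20000)` should return exactly 80000 on every run. The atomic operations are performed indivisibly even when several CPUs contend for the same memory word.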
‘reentrancy’
• As an aside, our ‘my_thread()’ function
(on the previous slide) is an example of
‘reentrant’ code
• More than one process (or processor) can
be safely executing it concurrently
• It needs to obey two cardinal rules:
– It contains no ‘self-modifying’ instructions
– Access to shared variables is ‘exclusive’
In-class exercise #1
• Rewrite the ‘concur1.cpp’ demo-program,
as ‘concur2.cpp’, inserting these functions
that will implement ‘mutual exclusion’ for
our thread’s ‘critical section’
• Then try running your ‘concur2.cpp’ on a
uniprocessor system (your workstation)
• Also try running your ‘concur2.cpp’ on a
multiprocessor system (e.g., dept server)
The x86 ‘lock’ prefix
• In order for the ‘btr’ instruction to perform
an ‘atomic’ update (when multiple CPUs
are using the same bus to access memory
simultaneously), it is necessary to insert
an x86 ‘lock’ prefix, like this:
    asm("spin: lock btr $0, mutex ");
• The ‘lock’ prefix locks the shared system-bus
while this instruction executes -- so
another CPU cannot intervene
In-class exercise #2
• Add the ‘lock’ prefix to your ‘concur2.cpp’
demo, and then try executing it again on
the multiprocessor system
• Use the Linux ‘time’ command to measure
how long it takes for your demo to finish
• Observe the ‘degraded’ performance due
to adding the ‘mutex’ functions – penalty
for achieving a ‘correct’ parallel program
The ‘nanosleep()’ system-call
• Your multithreaded demo-program shows
poor performance because your threads
are doing lots of ‘busy-waiting’
• When a thread can’t acquire the mutex, it
should voluntarily give up control of the
CPU (so another thread can do real work)
• The Linux ‘nanosleep()’ system-call allows
a thread to ‘yield’ its time-slice
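A minimal sketch of a yielding lock follows. It assumes the POSIX declaration of `nanosleep()` in `<time.h>` and reuses a portable `std::atomic<int>` in place of the inline assembly; the course's own ‘yielding.cpp’ is not shown in these slides and may differ in headers and call-syntax. The helper `run_with_yielding` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>
#include <time.h>                   // nanosleep(), struct timespec (POSIX)

static std::atomic<int> mutex{1};   // bit 0 set: unlocked, bit 0 clear: locked
static int counter = 0;             // protected by mutex

void enter_critical_section()
{
    // Ask for the shortest possible sleep; the purpose is only to let the
    // scheduler give the CPU to another thread.
    struct timespec request;
    request.tv_sec = 0;
    request.tv_nsec = 1;

    // Try to clear bit 0; if it was already clear, yield instead of spinning.
    while ((mutex.fetch_and(~1, std::memory_order_acquire) & 1) == 0)
        nanosleep(&request, nullptr);
}

void leave_critical_section()
{
    int old = mutex.fetch_or(1, std::memory_order_release);
    assert((old & 1) == 0);         // the lock must have been held
    (void)old;
}

static void my_thread(int maximum)
{
    for (int i = 0; i < maximum; i++) {
        enter_critical_section();
        counter += 1;
        leave_critical_section();
    }
}

// Hypothetical helper: runs the experiment and returns the final total.
int run_with_yielding(int num_threads, int maximum)
{
    counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; t++)
        threads.emplace_back(my_thread, maximum);
    for (auto& th : threads)
        th.join();
    return counter;
}
```

The total is still exact, for example 40000 for `run_with_yielding(4, 10000)`. Whether this version runs faster than the busy-waiting one depends on the platform, which is what the timing comparison is meant to reveal.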
In-class exercise #3
• Revise your ‘concur3.cpp’ program so that
a thread will ‘yield’ if it cannot immediately
acquire the mutex (see our ‘yielding.cpp’
demo for header-files and call-syntax)
• Use the Linux ‘time’ command to compare
the performance of ‘concur3’ and ‘concur2’
– On a uniprocessor platform
– On a multiprocessor platform