Tasks

Concurrency
Motivations
 To capture the logical structure of a problem
 Servers, graphical applications
 To exploit extra processors, for speed
 Ubiquitous multi-core processors
 To cope with separate physical devices
 Internet applications
HTC vs HPC
 High throughput computing
 Environments that can deliver large amounts of processing
capacity over long periods of time
 High performance computing
 Uses supercomputers and computer clusters to solve advanced
computation problems
 DACTAL
 Condor
 Concurrent application
Concurrency
 Any system in which two or more tasks may be underway at
the same time (at an unpredictable point in their execution)
 Parallel: more than one task physically active
 Requires multiple processors
 Distributed: processors are associated with people or devices
that are physically separated from one another in the real
world
Levels of Concurrency
 Instruction level
 Two or more machine instructions
 Statement level
 Two or more source language statements
 Unit level
 Two or more subprogram units
 Program level
 Two or more programs
Fundamental Concepts
 A task or process is a program unit that can be in concurrent
execution with other program units
 Tasks differ from ordinary subprograms in that:
 A task may be implicitly started
 When a program unit starts the execution of a task, it is not
necessarily suspended
 When a task’s execution is completed, control may not return
to the caller
 Tasks usually work together
Task Categories
 Heavyweight tasks
 Execute in their own address space and have their own run-time
stacks
 Lightweight tasks
 All run in the same address space and use the same run-time
stack
 A task is disjoint if it does not communicate with or affect the
execution of any other task in the program in any way
Synchronization
 A mechanism that controls the order in which tasks execute
 Cooperation: Task A must wait for task B to complete some
specific activity before task A can continue its execution
 e.g. the producer-consumer problem
 Competition: Two or more tasks must use some resource that
cannot be simultaneously used
 e.g. a shared counter, dining philosophers
 Competition synchronization is usually provided by mutually exclusive access to the shared resource
The Producer-Consumer Problem
There are a number of “classic” synchronization problems, one of which is the producer-consumer problem...
There are M producers that put items into a fixed-size buffer. The buffer is shared with N consumers that remove items from the buffer.
[Diagram: producer1, producer2, … producerM and consumer1, consumer2, … consumerN all sharing one buffer]
 The problem is to devise a solution that synchronizes the producers’ and consumers’ accesses to the buffer.
Accesses to the buffer must be synchronized, because if multiple producers and/or consumers access it simultaneously, values may get lost, retrieved twice, etc.
In the bounded-buffer version, the buffer has some fixed capacity.
Dining Philosophers
Five philosophers sit at a table, alternating between eating
noodles and thinking. In order to eat, a philosopher must have
two chopsticks. However, there is a single chopstick between
each pair of plates, so if one is eating, neither neighbor can eat.
A philosopher puts down both chopsticks when thinking.
Devise a solution that ensures:
 no philosopher starves; and
 a hungry philosopher is only prevented
from eating by his neighbor(s).
philosopher := [
    [true] whileTrue: [
        self get: left.
        self get: right.
        self eat.
        self release: left.
        self release: right.
        self think.
    ]
]
Deadlock!
How about this instead?
philosopher := [
    [true] whileTrue: [
        [self have: left and: right] whileFalse: [
            self get: left.
            [right notInUse]
                ifTrue: [self get: right]
                ifFalse: [self release: left]
        ].
        self eat.
        self release: left.
        self release: right.
        self think.
    ]
]
Livelock!
Liveness and Deadlock
 Liveness is a characteristic that a program unit may or may not
have
 In sequential code, it means the unit will eventually complete
its execution
 In a concurrent environment, a task can easily lose its liveness
 If all tasks in a concurrent environment lose their liveness while blocked waiting on one another, it is called deadlock; if they keep executing but make no progress, it is livelock
Race conditions
 When two different threads of a program write to the same variable, the resulting value depends on which thread writes to it first
 Transient errors, hard to debug
 E.g., c = c + 1 compiles to three machine steps:
    1. load c
    2. add 1
    3. store c
and two threads may interleave these steps
• Solution: acquire exclusive access to the shared resource before execution can continue (see the sketch below)
• Issues: lockout, starvation
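A minimal Java sketch of this race (the class and counts are illustrative, not from the slides): two threads each increment a shared counter 100,000 times; the unguarded counter usually loses updates because the load/add/store steps interleave, while the lock-guarded counter never does.

// Illustrative sketch of a lost-update race and its lock-based fix.
public class RaceDemo {
    static int unsafeCount = 0;                    // incremented with no synchronization
    static int safeCount = 0;
    static final Object lock = new Object();       // guards safeCount

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                unsafeCount++;                     // load c, add 1, store c: steps can interleave
                synchronized (lock) {              // acquire access before continuing
                    safeCount++;
                }
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join();  t2.join();
        // unsafeCount is frequently less than 200000; safeCount is always 200000
        System.out.println(unsafeCount + " vs " + safeCount);
    }
}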
Task Execution States
 Assuming some mechanism for synchronization (e.g. a
scheduler), tasks can be in a variety of states:
 New - created but not yet started
 Runnable or ready - ready to run but not currently running (no
available processor)
 Running
 Blocked - has been running, but cannot now continue (usually
waiting for some event to occur)
 Dead - no longer active in any sense
Design Issues
 Competition and cooperation synchronization
 Controlling task scheduling
 How and when tasks start and end execution
 Alternatives:
 Semaphores
 Monitors
 Message Passing
Semaphores
 Simple mechanism that can be used to provide
synchronization of tasks
 Devised by Edsger Dijkstra in 1965 for competition
synchronization, but can also be used for cooperation
synchronization
 A data structure consisting of an integer and a queue that
stores task descriptors
 A task descriptor is a data structure that stores all the relevant
information about the execution state of a task
Semaphore operations
 Two atomic operations, P and V
 Consider a semaphore s:
 P (from the Dutch “passeren”, to pass)
 P(s) – if s > 0 then assign s = s – 1; otherwise block (enqueue) the thread that calls P
 Often referred to as “wait”
 V (from the Dutch “vrijgeven”, to release, or “verhogen”, to increment)
 V(s) – if a thread T is blocked on s, then wake up T; otherwise assign s = s + 1
 Often referred to as “signal”
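For a concrete illustration, java.util.concurrent.Semaphore offers the same two operations under the names acquire() and release(); the shared counter below is just a placeholder critical section, not anything from the slides.

import java.util.concurrent.Semaphore;

public class SemaphoreDemo {
    static final Semaphore s = new Semaphore(1);   // binary semaphore: competition synchronization
    static int shared = 0;

    static void increment() throws InterruptedException {
        s.acquire();            // P(s): decrement s, or block (enqueue) the calling thread
        try {
            shared++;           // critical section
        } finally {
            s.release();        // V(s): wake a blocked thread, or increment s
        }
    }
}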
Dining Philosophers
wantBothSticks := Semaphore forMutualExclusion.    "starts with one excess signal, so the first wait succeeds"
philosopher := [
    [true] whileTrue: [
        [self haveBothSticks] whileFalse: [
            wantBothSticks wait.
            (left available and: [right available])
                ifTrue: [
                    self get: left.
                    self get: right.
                ].
            wantBothSticks signal.
        ].
        self eat.
        self release: left.
        self release: right.
        self think.
    ]
]
The trouble with semaphores
 No way to statically check for the correctness of their use
 Leaving out a single wait or signal event can create many
different issues
 Getting them just right can be tricky
 Per Brinch Hansen (1973):
 “The semaphore is an elegant synchronization tool for an ideal
programmer who never makes mistakes.”
Locks and Condition Variables
 A semaphore may be used for either of two purposes:
 Mutual exclusion: guarding access to a critical section
 Synchronization: making processes suspend/resume
 This dual use can lead to confusion: it may be unclear which role a
semaphore is playing in a given computation…
 For this reason, newer languages may provide distinct constructs
for each role:
 Locks: guarding access to a critical section
 Condition Variables: making processes suspend/resume
 Locks provide for mutually-exclusive access to shared memory;
condition variables provide for thread/process synchronization.
Locks
 Like a Semaphore, a lock has two associated operations:
 acquire()
 try to lock the lock; if it is already locked, suspend execution
 release()
 unlock the lock; awaken a waiting thread (if any)
 These can be used to ‘guard’ a critical section:
Lock sharedLock;                     // shared declarations
Object sharedObj;

// Thread 1                          // Thread 2
sharedLock.acquire();                sharedLock.acquire();
// access sharedObj                  // access sharedObj
sharedLock.release();                sharedLock.release();
 A Java class has a hidden lock accessible via the synchronized keyword
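A short sketch of the same guarded-critical-section pattern using Java's java.util.concurrent.locks.ReentrantLock, whose lock() and unlock() play the roles of acquire() and release(); the shared list is an illustrative stand-in for sharedObj.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class LockDemo {
    private final ReentrantLock sharedLock = new ReentrantLock();
    private final List<String> sharedObj = new ArrayList<>();   // the guarded resource

    public void add(String s) {
        sharedLock.lock();          // acquire: suspend if another thread holds the lock
        try {
            sharedObj.add(s);       // critical section: access sharedObj
        } finally {
            sharedLock.unlock();    // release: awaken a waiting thread, if any
        }
    }
}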
Condition Variables
 A Condition is a predefined type available in some languages that can
be used to declare variables for synchronization.
 When a thread needs to suspend execution inside a critical section
until some condition is met, a Condition can be used.
 There are three operations for a Condition:
 wait()
 suspend immediately; enter a queue of waiting threads
 signal(), aka notify() in Java
 awaken a waiting thread (usually the first in the queue), if any
 broadcast(), aka notifyAll() in Java
 awaken all waiting threads, if any
 Java has no separate Condition type in the core language, but every Java object has an anonymous condition variable that can be manipulated via wait, notify & notifyAll
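A minimal sketch of cooperation synchronization using that anonymous condition variable (the "ready" flag and method names are illustrative): one thread waits inside a synchronized method until another thread sets the condition and broadcasts.

public class ConditionDemo {
    private boolean ready = false;              // the condition being waited on

    public synchronized void awaitReady() throws InterruptedException {
        while (!ready) {                        // re-test the condition after every wakeup
            wait();                             // suspend; join this object's wait queue
        }
    }

    public synchronized void makeReady() {
        ready = true;
        notifyAll();                            // broadcast: awaken all waiting threads
    }
}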
Monitor motivation
 A Java class has a hidden lock accessible via the synchronized
keyword
 Deadlocks/livelocks/non-mutual-exclusion are easy to produce
 Just as control structures were “higher level” than the goto,
language designers began looking for higher level ways to
synchronize processes
 In 1973, Brinch-Hansen and Hoare proposed the monitor, a
class whose methods are automatically accessed in a
mutually-exclusive manner.
 A monitor prevents simultaneous access by multiple threads
Monitors
 The idea: encapsulate the shared data and its operations to
restrict access
 A monitor is an abstract data type for shared data
 Shared data is resident in the monitor (rather than in the
client units)
 All access is through the monitor’s operations
 The monitor implementation guarantees synchronized access by allowing only one access at a time
 Calls to monitor procedures are implicitly queued if the
monitor is busy at the time of the call
Monitor Visualization
[Diagram: a Buffer monitor. Its public interface is put(obj) and get(obj); hidden/private inside are the lock, the condition variables notEmpty and notFull (each with its own queue), and the buffer data myValues, myHead, myTail, mySize, and capacity N.]
The compiler ‘wraps’ calls to put() and get() as follows:
    buf.lock.acquire();
    … call to put or get …
    buf.lock.release();
If the lock is already locked, the calling thread enters the monitor’s entry queue.
Each condition variable has its own internal queue, in which waiting threads wait to be signaled…
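A Java class with synchronized methods behaves like a (simplified) monitor: the hidden lock gives mutual exclusion, and wait/notifyAll stand in for the notFull/notEmpty condition queues. A sketch of the buffer pictured above, with illustrative field names:

public class BoundedBuffer<T> {
    private final Object[] myValues;                       // the hidden, private data
    private int myHead = 0, myTail = 0, mySize = 0;

    public BoundedBuffer(int capacity) { myValues = new Object[capacity]; }

    public synchronized void put(T item) throws InterruptedException {
        while (mySize == myValues.length) wait();          // wait on "notFull"
        myValues[myTail] = item;
        myTail = (myTail + 1) % myValues.length;
        mySize++;
        notifyAll();                                       // signal "notEmpty"
    }

    @SuppressWarnings("unchecked")
    public synchronized T get() throws InterruptedException {
        while (mySize == 0) wait();                        // wait on "notEmpty"
        T item = (T) myValues[myHead];
        myValues[myHead] = null;
        myHead = (myHead + 1) % myValues.length;
        mySize--;
        notifyAll();                                       // signal "notFull"
        return item;
    }
}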
Evaluation of Monitors
 A better way to provide competition synchronization than semaphores
 Equally powerful as semaphores:
 Semaphores can be used to implement monitors
 Monitors can be used to implement semaphores
 Support for cooperation synchronization is very similar to that of semaphores, so it has the same reliability issues
Distributed Synchronization
Semaphores, locks, condition variables, and monitors are shared-memory constructs, and so are only useful on a tightly-coupled multiprocessor.
 They are of no use on a distributed multiprocessor
On a distributed multiprocessor, processes can communicate via message-passing -- using send() and
receive() primitives.
 If the message-passing system has no storage, then the send/receive operations must be
synchronized:
[Diagram: 1. Sender (ready) and 2. Receiver (ready) must both be ready before 3. the message is transmitted.]
 If the message-passing system has storage to buffer the message, then the send() can proceed
asynchronously:
[Diagram: 1. Sender (ready) sends; 2. the message is buffered; 3. the Receiver (not ready yet) retrieves the message when it is ready...]
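Within a single JVM the two flavors can be sketched with Java's blocking queues (this illustrates only the semantics, not real network transport): a LinkedBlockingQueue buffers messages so a send can proceed asynchronously, while a zero-capacity SynchronousQueue would force sender and receiver to synchronize.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MessagePassingDemo {
    public static void main(String[] args) throws InterruptedException {
        // A bounded mailbox: put() (send) returns as soon as the message is buffered.
        BlockingQueue<String> mailbox = new LinkedBlockingQueue<>(16);

        Thread sender = new Thread(() -> {
            try { mailbox.put("hello"); }                            // send: does not wait for the receiver
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread receiver = new Thread(() -> {
            try { System.out.println("got: " + mailbox.take()); }    // receive: blocks until a message arrives
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        sender.start(); receiver.start();
        sender.join();  receiver.join();
    }
}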
Tasks
 In 1980, Ada introduced the task, with 3 characteristics:
 its own thread of control;
 its own execution state; and
 mutually exclusive subprograms (entry procedures)
Entry procedures are self-synchronizing subprograms that another task can invoke for task-to-task
communication.
If task t has an entry procedure p, then another task t2 can execute
t.p( argument-list );
In order for p to execute, t must execute:
accept p ( parameter-list );
- If t executes accept p and t2 has not called p, t will automatically wait;
- If t2 calls p and t has not accepted p, t2 will automatically wait.
Rendezvous
When t and t2 are both ready, p executes:
 t2’s argument-list is evaluated and passed to t.p’s parameters
 t2 suspends
 t executes the body of p, using its parameter values
 return-values (or out or in out parameters) are passed back to t2
 t continues execution; t2 resumes execution
[Timeline diagram: task t reaches “accept p(params) … end p;” while task t2 calls t.p(args), suspends, and later resumes.]
This interaction is called a rendezvous between t and t2.
It does not depend on shared memory, so t and t2 can be on a uniprocessor, a tightly-coupled multiprocessor, or a distributed multiprocessor.
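Java has no Ada-style rendezvous, but the blocking handoff can be approximated with SynchronousQueue, which holds no storage, so a transfer completes only when both sides have arrived. The call/reply channel names below are purely illustrative.

import java.util.concurrent.SynchronousQueue;

public class RendezvousDemo {
    // Zero-capacity channels: roughly, "call" models t2 invoking t.p(args),
    // and "reply" models the results passed back when the accept body ends.
    static final SynchronousQueue<Integer> call  = new SynchronousQueue<>();
    static final SynchronousQueue<Integer> reply = new SynchronousQueue<>();

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> {                // plays the accepting task t
            try {
                int arg = call.take();               // like "accept p": wait for a caller
                reply.put(arg * 2);                  // body of p, then pass the result back
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        t.start();

        call.put(21);                                     // t2's entry call: waits until t accepts
        System.out.println("result = " + reply.take());   // t2 resumes with the result (42)
        t.join();
    }
}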
Example Problem
 How can we rewrite what’s below to complete more quickly?
procedure sumArray is
    N: constant integer := 1000000;
    type RealArray is array(1..N) of float;
    anArray: RealArray;

    function sum(a: RealArray; first, last: integer) return float is
        result: float := 0.0;
    begin
        for i in first..last loop
            result := result + a(i);
        end loop;
        return result;
    end sum;
begin
    -- code to fill anArray with values omitted
    put( sum(anArray, 1, N) );
end sumArray;
Divide-And-Conquer via Tasks
procedure parallelSumArray is
    -- declarations of N, RealArray, anArray, Sum() as before …

    task type ArraySliceAdder is
        entry SumSlice(Start: in Integer; Stop: in Integer);
        entry GetSum(Result: out Float);
    end ArraySliceAdder;

    task body ArraySliceAdder is
        i, j: Integer;
        Answer: Float;
    begin
        accept SumSlice(Start: in Integer; Stop: in Integer) do
            i := Start; j := Stop;                  -- get ready
        end SumSlice;
        Answer := Sum(anArray, i, j);               -- do the work
        accept GetSum(Result: out Float) do
            Result := Answer;                       -- report outcome
        end GetSum;
    end ArraySliceAdder;
    -- continued on next slide…
Divide-And-Conquer via Tasks (ii)
    -- continued from previous slide …
    firstHalfSum, secondHalfSum: Float;
    T1, T2 : ArraySliceAdder;                   -- T1, T2 start & wait on accept

begin
    -- code to fill anArray with values omitted
    T1.SumSlice(1, N/2);                        -- start T1 on 1st half
    T2.SumSlice(N/2 + 1, N);                    -- start T2 on 2nd half
    T1.GetSum( firstHalfSum );                  -- get 1st half sum from T1
    T2.GetSum( secondHalfSum );                 -- get 2nd half sum from T2
    put( firstHalfSum + secondHalfSum );        -- we're done!
end parallelSumArray;
Using two tasks T1 and T2, this parallelSumArray version requires roughly 1/2 the time required by
sumArray (on a multiprocessor).
Using three tasks, the time will be roughly 1/3 the time of sumArray.
…
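The same divide-and-conquer idea, sketched in Java with two plain threads (the array contents and size are illustrative); join() guarantees the main thread sees both partial sums before combining them.

public class ParallelSumArray {
    static double sum(double[] a, int first, int last) {    // sums a[first .. last-1]
        double result = 0.0;
        for (int i = first; i < last; i++) result += a[i];
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        final int n = 1_000_000;
        final double[] anArray = new double[n];
        java.util.Arrays.fill(anArray, 1.0);                 // stand-in for "fill anArray with values"

        final double[] partial = new double[2];              // one result slot per worker
        Thread t1 = new Thread(() -> partial[0] = sum(anArray, 0, n / 2));
        Thread t2 = new Thread(() -> partial[1] = sum(anArray, n / 2, n));
        t1.start(); t2.start();                              // sum both halves concurrently
        t1.join();  t2.join();                               // wait for both partial sums
        System.out.println(partial[0] + partial[1]);
    }
}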
Producer-Consumer in Ada
To give the producer and consumer separate threads, we can define the behavior of one in the ‘main’ procedure and the behavior of the other in a separate task. We can then build a Buffer task with put() and get() as (auto-synchronizing) entry procedures...
procedure ProducerConsumer is
    buf: Buffer;
    it: Item;

    task consumer;
    task body consumer is
        it: Item;
    begin
        loop
            buf.get(it);
            -- consume Item it
        end loop;
    end consumer;

begin  -- producer task
    loop
        -- produce an Item in it
        buf.put(it);
    end loop;
end ProducerConsumer;
Capacity-1 Buffer
 A single-value buffer is
easy to build using an Ada
task-type:
As a task-type, variables of this type (e.g.,
buf) will automatically have their own
thread of execution.
The body of the task is a loop that
accepts calls to put() and get() in strict
alternation.
task type Buffer is
    entry get(it: out Item);
    entry put(it: in Item);
end Buffer;

task body Buffer is
    B: Item;
begin
    loop
        accept put(it: in Item) do
            B := it;
        end put;
        accept get(it: out Item) do
            it := B;
        end get;
    end loop;
end Buffer;
This causes the buffer (buf) to alternate between being empty and nonempty.
Capacity-N Buffer
 An N-value buffer is a
bit more work:
We can accept any call to get() so
long as we are not empty, and any
call to put() so long as we are not
full.
Ada provides the select-when statement
to guard an accept, and perform it if
and only if a given condition is true
-- task declaration is as before …
task body Buffer is
    N: constant integer := 1024;
    package B is new Queue(N, Items);
begin
    loop
        select
            when not B.isFull =>
                accept put(it: in Item) do
                    B.append(it);
                end put;
        or
            when not B.isEmpty =>
                accept get(it: out Item) do
                    it := B.first;
                    B.delete;
                end get;
        end select;
    end loop;
end Buffer;
The Importance of Clusters
 Scientific computation is increasingly performed on clusters
 Cost-effective: Created from commodity parts
 Scientists want more computational power
 Cluster computational power is easy to increase by adding
processors
 Cluster size keeps increasing!
Clusters Are Not Perfect
 Failure rates are increasing
 The number of moving parts is growing (processors, network
connections, disks, etc.)
 Mean Time Between Failures (MTBF) is shrinking
 Anecdotal: every 20 minutes for Google’s cluster
 How can we deal with these failures?
Options for Fault-Tolerance
 Redundancy in space
 Each participating process has a backup process
 Expensive!
 Redundancy in time
 Processes save state and then rollback for recovery
 Lighter-weight fault tolerance
Today’s Answer: Redundancy in Time
 Programmers place checkpoints
 Small checkpoint size
 Synchronous
 Every process checkpoints in the same place in the code
 Global synchronization before and after checkpoints
What’s the Problem?
 Future systems will be larger
 Checkpointing will hurt program performance
 Many processes checkpointing synchronously will result in
network and file system contention
 Checkpointing to local disk not viable
 Application programmers are only willing to pay 1%
overhead for fault-tolerance
 The solution:
 Avoid synchronous checkpoints
Understanding Staggered Checkpointing
[Animated diagram: processes 0, 1, 2, … (up to 64K) along a time axis, with checkpoints, messages, and candidate recovery lines; after Randell 75.]
 A recovery line (one checkpoint per process) is VALID when the saved state is consistent, i.e., a state that could actually have existed.
 If a receive is saved by the checkpoints but the matching send is not, the state is inconsistent; a send that is saved while its receive is not is still consistent.
 Today: every process checkpoints at the same place, so consistency is easy; but with more processes and more data, synchronous checkpointing causes contention and is not so fast.
 Tomorrow: we’ll stagger the checkpoints to avoid the contention.
Identify All Possible Valid Recovery Lines
There are so many!
[Diagram: three processes (0, 1, 2) over time, with candidate recovery lines labeled by vector timestamps: [1,0,0], [1,1,0], [1,2,0], [2,0,0], [2,0,1], [2,0,2], [2,3,2], [2,4,2], [2,4,3], [2,5,2], [3,2,0], [4,5,2].]
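One way to read those labels (an assumption on my part, not stated on the slide) is as vector clocks, one per checkpoint in a candidate recovery line; under that reading, the line is consistent exactly when no process's checkpoint has seen more of process i than process i's own checkpoint records, which rules out a saved receive whose send was not saved. A small check:

public class RecoveryLineCheck {
    // Assumed encoding: clocks[i] is the vector clock of process i's checkpoint in the
    // candidate recovery line; clocks[i][k] counts the events of process k known to it.
    static boolean isConsistent(int[][] clocks) {
        int p = clocks.length;
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < p; j++) {
                // If checkpoint j knows about more of process i than i itself saved,
                // a message crossed the line backwards: its receive is saved, its send is not.
                if (clocks[j][i] > clocks[i][i]) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] line = { {2, 0, 0}, {2, 3, 2}, {2, 0, 2} };  // illustrative values from the diagram
        System.out.println(isConsistent(line));              // prints true
    }
}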
Coroutine
 A coroutine is two or more procedures that share a single
thread of execution, each exercising mutual control over the
other:
procedure A;
begin
    -- do something
    resume B;
    -- do something
    resume B;
    -- do something
    -- …
end A;

procedure B;
begin
    -- do something
    resume A;
    -- do something
    resume A;
    -- …
end B;
Summary
 Concurrent computations consist of multiple entities.
 Processes in Smalltalk
 Tasks in Ada
 Threads in Java
 OS-dependent in C++
On a shared-memory multiprocessor:
 The Semaphore was the first synchronization primitive
 Locks and condition variables separated a semaphore’s mutual-exclusion usage from its
synchronization usage
 Monitors are higher-level, self-synchronizing objects
 Java classes have an associated (simplified) monitor
On a distributed system:
 Ada tasks provide self-synchronizing entry procedures