
 To capture the logical structure of a problem
 Servers, graphical applications
 To exploit extra processors, for speed
 Ubiquitous multi-core processors
 To cope with separate physical devices
 Internet applications
 High throughput computing
 Environments that can deliver large amounts of processing
capacity over long periods of time
 High performance computing
 Uses supercomputers and computer clusters to solve advanced
computation problems
 Condor
 Concurrent application
 Any system in which two or more tasks may be underway at
the same time (at an unpredictable point in their execution)
 Parallel: more than one task physically active
 Requires multiple processors
 Distributed: processors are associated with people or devices
that are physically separated from one another in the real world
Levels of Concurrency
 Instruction level
 Two or more machine instructions
 Statement level
 Two or more source language statements
 Unit level
 Two or more subprogram units
 Program level
 Two or more programs
Fundamental Concepts
 A task or process is a program unit that can be in concurrent
execution with other program units
 Tasks differ from ordinary subprograms in that:
 A task may be implicitly started
 When a program unit starts the execution of a task, it is not
necessarily suspended
 When a task’s execution is completed, control may not return
to the caller
 Tasks usually work together
Task Categories
 Heavyweight tasks
 Execute in their own address space and have their own run-time
 Lightweight tasks
 All run in the same address space and use the same run-time
 A task is disjoint if it does not communicate with or affect the
execution of any other task in the program in any way
 Synchronization: a mechanism that controls the order in which tasks execute
 Cooperation: Task A must wait for task B to complete some
specific activity before task A can continue its execution
 e.g. the producer-consumer problem
 Competition: Two or more tasks must use some resource that
cannot be simultaneously used
 e.g. a shared counter, dining philosophers
 Competition synchronization is usually provided by mutually exclusive access
The Producer-Consumer Problem
There are a number of “classic”
synchronization problems, one of which
is the producer-consumer problem...
There are M producers that put items into a fixed-sized
buffer. The buffer is shared with N consumers that
remove items from the buffer.
 The problem is to devise a solution that synchronizes
the producer & consumer accesses to the buffer.
Accesses to the buffer must be synchronized, because if multiple producers and/or consumers access it
simultaneously, values may get lost, retrieved twice, etc.
In the bounded-buffer version, the buffer has some fixed-capacity N.
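The bounded-buffer version described above can be sketched in Python. This is only an illustration under stated assumptions: the names BoundedBuffer and run_demo are hypothetical, and Python's threading.Condition stands in for whatever synchronization mechanism a real solution would use.

```python
import threading
from collections import deque


class BoundedBuffer:
    """Fixed-capacity FIFO buffer shared by producer and consumer threads."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._cond = threading.Condition()  # one lock plus a wait queue

    def put(self, item):
        with self._cond:
            while len(self._items) >= self._capacity:  # buffer full: wait
                self._cond.wait()
            self._items.append(item)
            self._cond.notify_all()  # wake any waiting consumers

    def get(self):
        with self._cond:
            while not self._items:  # buffer empty: wait
                self._cond.wait()
            item = self._items.popleft()
            self._cond.notify_all()  # wake any waiting producers
            return item


def run_demo(num_producers=3, num_consumers=2, items_each=100, capacity=4):
    """Run producers and consumers to completion; return the consumed items."""
    buf = BoundedBuffer(capacity)
    consumed, consumed_lock = [], threading.Lock()
    total = num_producers * items_each

    def producer(pid):
        for i in range(items_each):
            buf.put((pid, i))

    def consumer(count):
        for _ in range(count):
            item = buf.get()
            with consumed_lock:
                consumed.append(item)

    # Split the total evenly; the first consumer takes any remainder.
    shares = [total // num_consumers] * num_consumers
    shares[0] += total % num_consumers
    threads = [threading.Thread(target=producer, args=(p,))
               for p in range(num_producers)]
    threads += [threading.Thread(target=consumer, args=(n,)) for n in shares]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed
```

Because every access to the deque happens while holding the condition's lock, no item should be lost or retrieved twice, whatever the interleaving of producers and consumers.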
Dining Philosophers
Five philosophers sit at a table, alternating between eating
noodles and thinking. In order to eat, a philosopher must have
two chopsticks. However, there is a single chopstick between
each pair of plates, so if one is eating, neither neighbor can eat.
A philosopher puts down both chopsticks when thinking.
Devise a solution that ensures:
 no philosopher starves; and
 a hungry philosopher is only prevented
from eating by his neighbor(s).
philosopher := [
    [true] whileTrue: [
        self get: left.
        self get: right.
        self eat.
        self release: left.
        self release: right.
        self think]].
How about this instead?
philosopher := [
    [true] whileTrue: [
        [self have: left and: right]
            whileFalse: [
                self get: left.
                (right notInUse)
                    ifTrue: [self get: right]
                    ifFalse: [self release: left]].
        self eat.
        self release: left.
        self release: right.
        self think]].
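The second attempt above (take the left chopstick, then take the right one only if it is free, otherwise put the left one back) might look as follows in Python. This is a hedged sketch: the function name dine, the meal limit, and the short random pause after backing off are assumptions added so the demonstration terminates; they are not part of the original pseudocode. The approach avoids deadlock because no philosopher ever blocks while holding a chopstick, but it does not by itself guarantee that nobody starves.

```python
import random
import threading
import time


def dine(num_philosophers=5, meals_each=20):
    """Each philosopher eats meals_each times; returns the meal counts."""
    sticks = [threading.Lock() for _ in range(num_philosophers)]
    meals = [0] * num_philosophers

    def philosopher(i):
        left = sticks[i]
        right = sticks[(i + 1) % num_philosophers]
        while meals[i] < meals_each:
            left.acquire()                      # get: left
            if right.acquire(blocking=False):   # right not in use: get it
                meals[i] += 1                   # eat
                left.release()
                right.release()
                # think (nothing to do in this sketch)
            else:
                left.release()                  # back off and retry
                # Assumed addition: a random pause makes repeated
                # collisions (livelock) unlikely.
                time.sleep(random.uniform(0, 0.001))

    threads = [threading.Thread(target=philosopher, args=(i,))
               for i in range(num_philosophers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return meals
```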
Liveness and Deadlock
 Liveness is a characteristic that a program unit may or may not have
 In sequential code, it means the unit will eventually complete
its execution
 In a concurrent environment, a task can easily lose its liveness
 If all tasks in a concurrent environment lose their liveness, it
is called deadlock (or livelock)
Race conditions
 A race condition occurs when two different threads of a program
write to the same variable and its resulting value depends
on which thread writes to it first
 Transient errors, hard to debug
 E.g. incrementing a shared variable c takes three steps that two threads can interleave:
1. load c
2. add 1
3. store c
• Solution: acquire access to the shared resource before
execution can continue
Issues: lockout, starvation
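The load/add/store example and its fix can be illustrated with a shared counter in Python. This is a minimal sketch; count_up and its parameters are hypothetical names. Without the lock, updates may be lost (how often depends on the interpreter and its scheduling); with the lock, each increment is a critical section.

```python
import threading


def count_up(num_threads=4, increments=50_000, use_lock=True):
    """Have several threads increment one shared counter; return its final value."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            if use_lock:
                with lock:        # acquire access before touching the counter
                    counter += 1
            else:
                counter += 1      # unprotected load, add, store

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

With use_lock=True the result is always num_threads * increments; with use_lock=False it can only be that value or smaller, which is what makes such errors transient and hard to reproduce.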
Task Execution States
 Assuming some mechanism for synchronization (e.g. a
scheduler), tasks can be in a variety of states:
 New - created but not yet started
 Runnable or ready - ready to run but not currently running (no
available processor)
 Running
 Blocked - has been running, but cannot now continue (usually
waiting for some event to occur)
 Dead - no longer active in any sense
Design Issues
 Competition and cooperation synchronization
 Controlling task scheduling
 How and when tasks start and end execution
 Alternatives:
 Semaphores
 Monitors
 Message Passing
Semaphores
 A simple mechanism that can be used to provide
synchronization of tasks
 Devised by Edsger Dijkstra in 1965 for competition
synchronization, but can also be used for cooperation
 A data structure consisting of an integer and a queue that
stores task descriptors
 A task descriptor is a data structure that stores all the relevant
information about the execution state of a task
Semaphore operations
 Two atomic operations, P and V
 Consider a semaphore s:
 P (from the Dutch “passeren”)
 P(s) – if s > 0 then assign s = s – 1; otherwise block (enqueue) the thread
that calls P
 Often referred to as “wait”
 V (from the Dutch “vrijgeven”)
 V(s) – if a thread T is blocked on s, then wake up T; otherwise assign s = s + 1
 Often referred to as “signal”
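The P and V definitions above (an integer plus a queue of blocked tasks) can be sketched in Python as follows. The class name SketchSemaphore and the demo function are hypothetical, and a threading.Event per blocked thread stands in for a task descriptor; in practice Python's own threading.Semaphore would be used instead.

```python
import threading
from collections import deque


class SketchSemaphore:
    """An integer plus a FIFO queue of blocked threads."""

    def __init__(self, value=1):
        self._value = value
        self._waiting = deque()          # wake-up events of blocked threads
        self._mutex = threading.Lock()   # makes P and V atomic

    def P(self):                         # "wait"
        with self._mutex:
            if self._value > 0:
                self._value -= 1
                return
            wakeup = threading.Event()
            self._waiting.append(wakeup)  # enqueue the caller
        wakeup.wait()                     # block until some V() wakes us

    def V(self):                         # "signal"
        with self._mutex:
            if self._waiting:
                self._waiting.popleft().set()  # wake the longest-waiting thread
            else:
                self._value += 1


def demo(num_threads=4, increments=2000):
    """Use the semaphore for competition synchronization on a counter."""
    s = SketchSemaphore(1)
    total = 0

    def worker():
        nonlocal total
        for _ in range(increments):
            s.P()
            total += 1                   # critical section
            s.V()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

Note that V does not increment the integer when it wakes a blocked thread: the permit is handed directly to that thread, matching the definition above.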
Dining Philosophers
"forMutualExclusion (Squeak/Pharo) starts with one signal, so the first wait succeeds"
wantBothSticks := Semaphore forMutualExclusion.
philosopher := [
    [true] whileTrue: [
        [self haveBothSticks] whileFalse: [
            wantBothSticks wait.
            (left available and: [right available])
                ifTrue: [
                    self get: left.
                    self get: right].
            wantBothSticks signal].
        self eat.
        self release: left.
        self release: right.
        self think]].
The trouble with semaphores
 No way to statically check for the correctness of their use
 Leaving out a single wait or signal event can create many
different issues
 Getting them just right can be tricky
 Per Brinch Hansen (1973):
 “The semaphore is an elegant synchronization tool for an ideal
programmer who never makes mistakes.”
Locks and Condition Variables
 A semaphore may be used for either of two purposes:
 Mutual exclusion: guarding access to a critical section
 Synchronization: making processes suspend/resume
 This dual use can lead to confusion: it may be unclear which role a
semaphore is playing in a given computation…
 For this reason, newer languages may provide distinct constructs
for each role:
 Locks: guarding access to a critical section
 Condition Variables: making processes suspend/resume
 Locks provide for mutually-exclusive access to shared memory;
condition variables provide for thread/process synchronization.
 Like a Semaphore, a lock has two associated operations:
 acquire()
 try to lock the lock; if it is already locked, suspend execution
 release()
 unlock the lock; awaken a waiting thread (if any)
 These can be used to ‘guard’ a critical section:
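Lock sharedLock;
Object sharedObj;
// thread 1                      // thread 2
sharedLock.acquire();            sharedLock.acquire();
// access sharedObj              // access sharedObj
sharedLock.release();            sharedLock.release();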
 Every Java object has a hidden (intrinsic) lock, accessible via the synchronized keyword
Condition Variables
 A Condition is a predefined type available in some languages that can
be used to declare variables for synchronization.
 When a thread needs to suspend execution inside a critical section
until some condition is met, a Condition can be used.
 There are three operations for a Condition:
 wait()
 suspend immediately; enter a queue of waiting threads
 signal(), aka notify() in Java
 awaken a waiting thread (usually the first in the queue), if any
 broadcast(), aka notifyAll() in Java
 awaken all waiting threads, if any
 Every Java object has an anonymous condition variable that can be
manipulated via wait, notify & notifyAll; since Java 5, java.util.concurrent.locks
also provides an explicit Condition interface
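The three operations above map onto Python's threading.Condition (wait, notify, notify_all). The sketch below is illustrative only: Gate and demo are hypothetical names. It shows a thread suspending inside a critical section until a condition holds, and a broadcast that releases all waiters.

```python
import threading


class Gate:
    """Threads wait at the gate until some other thread opens it."""

    def __init__(self):
        self._cond = threading.Condition()  # a lock with an attached wait queue
        self._open = False

    def wait_until_open(self):
        with self._cond:                    # enter the critical section
            while not self._open:           # re-check after every wake-up
                self._cond.wait()           # releases the lock while suspended

    def open(self):
        with self._cond:
            self._open = True
            self._cond.notify_all()         # broadcast: wake every waiter


def demo(num_waiters=3):
    gate, passed = Gate(), []
    passed_lock = threading.Lock()

    def waiter(i):
        gate.wait_until_open()
        with passed_lock:
            passed.append(i)

    threads = [threading.Thread(target=waiter, args=(i,))
               for i in range(num_waiters)]
    for t in threads:
        t.start()
    before = len(passed)   # gate still closed, so nobody has passed yet
    gate.open()
    for t in threads:
        t.join()
    return before, sorted(passed)
```

The wait sits in a while loop rather than an if so that a woken thread re-tests the condition before proceeding.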
Monitor motivation
 With semaphores and locks, deadlocks, livelocks and failures of mutual exclusion are easy to produce
 Just as control structures were “higher level” than the goto,
language designers began looking for higher level ways to
synchronize processes
 In 1973, Brinch Hansen and Hoare proposed the monitor, a
class whose methods are automatically accessed in a
mutually-exclusive manner.
 A monitor prevents simultaneous access by multiple threads
 The idea: encapsulate the shared data and its operations to
restrict access
 A monitor is an abstract data type for shared data
 Shared data is resident in the monitor (rather than in the
client units)
 All access to the shared data goes through procedures resident in the monitor
 The monitor implementation guarantees synchronized access by
allowing only one access at a time
 Calls to monitor procedures are implicitly queued if the
monitor is busy at the time of the call
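Python has no built-in monitor construct, but the idea above (shared data encapsulated with its operations, one access at a time) can be approximated by a class whose public methods all run under one lock. This is a sketch under that assumption; synchronized, AccountMonitor and demo are hypothetical names.

```python
import functools
import threading


def synchronized(method):
    """Run the method while holding the object's monitor lock."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        with self._monitor_lock:   # callers queue here if the monitor is busy
            return method(self, *args, **kwargs)
    return wrapper


class AccountMonitor:
    """Shared data (the balance) lives inside the monitor, not in its clients."""

    def __init__(self, balance=0):
        self._monitor_lock = threading.RLock()
        self._balance = balance

    @synchronized
    def deposit(self, amount):
        self._balance += amount

    @synchronized
    def withdraw(self, amount):
        if amount > self._balance:
            return False
        self._balance -= amount
        return True

    @synchronized
    def balance(self):
        return self._balance


def demo(num_threads=4, deposits=1000):
    account = AccountMonitor()

    def worker():
        for _ in range(deposits):
            account.deposit(1)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance()
```

Unlike a true monitor, nothing here stops a client from reaching into _balance directly; the mutual exclusion relies on convention rather than on the compiler.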
Monitor Visualization
[Figure: a monitor with a public (interface) section for put() and get(), an entry queue, and one internal queue per condition variable.]
 The compiler ‘wraps’ calls to put() and get() with the monitor's lock.
 If the lock is locked, the thread enters the entry queue.
 Each condition variable has its own internal queue, in which waiting
threads wait to be signaled.
Evaluation of Monitors
 A better way to provide competition synchronization than
semaphores
 Equally powerful as semaphores:
 Semaphores can be used to implement monitors
 Monitors can be used to implement semaphores
 Support for cooperation synchronization is very similar to that
of semaphores, so it has the same reliability issues
Distributed Synchronization
Semaphores, locks, condition variables, and monitors are shared-memory constructs, and so are only useful
on a tightly-coupled multiprocessor.
 They are of no use on a distributed multiprocessor
On a distributed multiprocessor, processes can communicate via message-passing -- using send() and
receive() primitives.
 If the message-passing system has no storage, then the send/receive operations must be
synchronized: (1) the sender is ready to send, (2) the receiver is ready to receive, and only then
(3) the message is transmitted.
 If the message-passing system has storage to buffer the message, then the send() can proceed
asynchronously: (1) the sender sends, (2) the message is buffered, and (3) the receiver, which was
not ready, retrieves the message when it is ready.
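The two cases above can be imitated inside a single Python process with queue.Queue standing in for the message-passing system. This is only a sketch: the function names are hypothetical, and the synchronous variant uses a one-slot queue plus an acknowledgement rather than a true storage-free channel.

```python
import queue
import threading


def buffered_exchange(messages):
    """Sender never waits: the mailbox stores messages until the receiver is ready."""
    mailbox = queue.Queue()               # unbounded buffer
    received = []

    def receiver():
        while True:
            msg = mailbox.get()           # receive(): blocks until a message arrives
            if msg is None:               # sentinel marks the end of the stream
                break
            received.append(msg)

    for m in messages:
        mailbox.put(m)                    # send(): returns immediately
    mailbox.put(None)
    t = threading.Thread(target=receiver)  # the receiver becomes ready only now
    t.start()
    t.join()
    return received


def synchronous_exchange(messages):
    """Sender waits after each send until the receiver has taken the message."""
    channel = queue.Queue(maxsize=1)
    received = []

    def receiver():
        while True:
            msg = channel.get()
            if msg is not None:
                received.append(msg)
            channel.task_done()           # acknowledge receipt
            if msg is None:
                break

    t = threading.Thread(target=receiver)
    t.start()
    for m in list(messages) + [None]:
        channel.put(m)
        channel.join()                    # block until the receiver acknowledges
    t.join()
    return received
```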
 In 1980, Ada introduced the task, with 3 characteristics:
 its own thread of control;
 its own execution state; and
 mutually exclusive subprograms (entry procedures)
Entry procedures are self-synchronizing subprograms that another task can invoke for task-to-task
communication.
If task t has an entry procedure p, then another task t2 can execute
t.p( argument-list );
In order for p to execute, t must execute:
accept p ( parameter-list );
- If t executes accept p and t2 has not called p, t will automatically wait;
- If t2 calls p and t has not accepted p, t2 will automatically wait.
When t and t2 are both ready, p executes:
1. t2’s argument-list in t.p (args) is evaluated and passed to the parameters of t’s accept p(params)
2. t2 suspends
3. t executes the body of p (up to end p;), using its parameter values
4. return-values (or out or in out parameters) are passed back to t2
5. t continues execution; t2 resumes execution
This interaction is called a rendezvous between t and t2.
It does not depend on shared memory, so t and t2 can be on a uniprocessor, a tightly-coupled or a
distributed multiprocessor.
Example Problem
 How can we rewrite what’s below to complete more quickly?
with Ada.Float_Text_IO; use Ada.Float_Text_IO;   -- provides put for float
procedure sumArray is
   N: constant integer := 1000000;
   type RealArray is array(1..N) of float;
   anArray: RealArray;
   function sum(a: RealArray; first, last: integer)
            return float is
      result: float := 0.0;
   begin
      for i in first..last loop
         result := result + a(i);
      end loop;
      return result;
   end sum;
begin
   -- code to fill anArray with values omitted
   put( sum(anArray, 1, N) );
end sumArray;
Divide-And-Conquer via Tasks
procedure parallelSumArray is
   -- declarations of N, RealArray, anArray, Sum() as before …
   task type ArraySliceAdder is
      entry SumSlice(Start: in Integer; Stop: in Integer);
      entry GetSum(Result: out float);
   end ArraySliceAdder;
   task body ArraySliceAdder is
      i, j: Integer; Answer: Float;
   begin
      accept SumSlice(Start: in Integer; Stop: in Integer) do
         i:= Start; j:= Stop;            -- get ready
      end SumSlice;
      Answer := Sum(anArray, i, j);      -- do the work
      accept GetSum(Result: out float) do
         Result := Answer;               -- report outcome
      end GetSum;
   end ArraySliceAdder;
   -- continued on next slide…
Divide-And-Conquer via Tasks (ii)
   -- continued from previous slide …
   firstHalfSum, secondHalfSum: Float;
   T1, T2 : ArraySliceAdder;          -- T1, T2 start & wait on accept
begin
   -- code to fill anArray with values omitted
   T1.SumSlice(1, N/2);               -- start T1 on 1st half
   T2.SumSlice(N/2 + 1, N);           -- start T2 on 2nd half
   T1.GetSum( firstHalfSum );         -- get 1st half sum from T1
   T2.GetSum( secondHalfSum );        -- get 2nd half sum from T2
   put( firstHalfSum + secondHalfSum );
end parallelSumArray;                 -- we’re done!
Using two tasks T1 and T2, this parallelSumArray version requires roughly 1/2 the time required by
sumArray (on a multiprocessor).
Using three tasks, the time will be roughly 1/3 the time of sumArray.
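The same divide-and-conquer idea, generalized to k slices, might be sketched in Python as below. The names slice_bounds and parallel_sum are hypothetical. One caveat is an assumption worth stating: in the standard CPython interpreter, threads do not execute pure-Python arithmetic in parallel, so this sketch shows the structure of the decomposition rather than the 1/k speedup itself.

```python
from concurrent.futures import ThreadPoolExecutor


def slice_bounds(n, parts):
    """Split range(n) into `parts` contiguous half-open slices."""
    step, extra = divmod(n, parts)
    bounds, start = [], 0
    for k in range(parts):
        stop = start + step + (1 if k < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def parallel_sum(values, num_tasks=2):
    """Sum each slice in its own task, then combine the partial sums."""
    def partial(bounds):
        first, last = bounds
        return sum(values[first:last])

    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        return sum(pool.map(partial, slice_bounds(len(values), num_tasks)))
```

For example, parallel_sum(list(range(1, 1001)), 2) adds the two halves separately and combines them, giving the same total as a sequential sum.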
Producer-Consumer in Ada
To give the producer and consumer separate threads, we can define the
behavior of one in the ‘main’ procedure and the behavior of the other in a
separate task.
We can then build a Buffer task with put() and get() as
(autosynchronizing) entry procedures...
procedure ProducerConsumer is
   buf: Buffer;
   it: Item;
   task consumer;
   task body consumer is
      it: Item;
   begin
      loop
         buf.get(it);
         -- consume Item it
      end loop;
   end consumer;
begin -- producer task
   loop
      -- produce an Item in it
      buf.put(it);
   end loop;
end ProducerConsumer;
Capacity-1 Buffer
 A single-value buffer is easy to build using an Ada task-type.
As a task-type, variables of this type (e.g., buf) will automatically have their own
thread of execution.
The body of the task is a loop that accepts calls to put() and get() in strict
alternation:
task type Buffer is
   entry get(it: out Item);
   entry put(it: in Item);
end Buffer;
task body Buffer is
   B: Item;
begin
   loop
      accept put(it: in Item) do
         B:= it;
      end put;
      accept get(it: out Item) do
         it := B;
      end get;
   end loop;
end Buffer;
This causes buf to alternate between being empty and nonempty.
Capacity-N Buffer
 An N-value buffer is a bit more work:
We can accept any call to get() so long as we are not empty, and any
call to put() so long as we are not full.
Ada provides the select-when statement to guard an accept, and perform it if
and only if a given condition is true.
-- task declaration is as before …
task body Buffer is
   N: constant integer := 1024;
   package B is new Queue(N, Item);
begin
   loop
      select
         when not B.isFull =>
            accept put(it: in Item) do
               B.append(it);       -- assumed Queue operation: add at the rear
            end put;
      or
         when not B.isEmpty =>
            accept get(it: out Item) do
               it := B.first;
               B.removeFirst;      -- assumed Queue operation: drop the front item
            end get;
      end select;
   end loop;
end Buffer;
The Importance of Clusters
 Scientific computation is increasingly performed on clusters
 Cost-effective: Created from commodity parts
 Scientists want more computational power
 Cluster computational power is easy to increase by adding
 Cluster size keeps increasing!
Clusters Are Not Perfect
 Failure rates are increasing
 The number of moving parts is growing (processors, network
connections, disks, etc.)
 Mean Time Between Failures (MTBF) is shrinking
 Anecdotal: every 20 minutes for Google’s cluster
 How can we deal with these failures?
Options for Fault-Tolerance
 Redundancy in space
 Each participating process has a backup process
 Expensive!
 Redundancy in time
 Processes save state and then rollback for recovery
 Lighter-weight fault tolerance
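Redundancy in time (save state, then roll back on failure) can be illustrated with a toy single-process computation in Python. Everything here is an assumption made for illustration: the function name, the in-memory checkpoint, and the simulated one-off failure. A real cluster would write checkpoints to stable storage and coordinate them across processes.

```python
import copy


def run_with_checkpoints(steps, checkpoint_every, fail_at=None):
    """Sum 0..steps-1, checkpointing periodically.

    A simulated failure at step `fail_at` (it happens once) rolls the
    computation back to the most recent checkpoint.
    Returns (total, number_of_rollbacks).
    """
    state = {"step": 0, "total": 0}
    checkpoint = copy.deepcopy(state)
    rollbacks = 0
    failed = False
    while state["step"] < steps:
        if state["step"] == fail_at and not failed:
            failed = True
            state = copy.deepcopy(checkpoint)   # rollback: restore saved state
            rollbacks += 1
            continue
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            checkpoint = copy.deepcopy(state)   # save state
    return state["total"], rollbacks
```

The work done between the last checkpoint and the failure is repeated after the rollback, which is the time cost this form of redundancy pays in exchange for not running a backup process.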
Today’s Answer: Redundancy in Time
 Programmers place checkpoints
 Small checkpoint size
 Synchronous
 Every process checkpoints in the same place in the code
 Global synchronization before and after checkpoints
What’s the Problem?
 Future systems will be larger
 Checkpointing will hurt program performance
 Many processes checkpointing synchronously will result in
network and file system contention
 Checkpointing to local disk not viable
 Application programmers are only willing to pay 1%
overhead for fault-tolerance
 The solution:
 Avoid synchronous checkpoints
Understanding Staggered Checkpointing
[Figure: process timelines with checkpoints and candidate recovery lines [Randall 75].]
 State is consistent: it could have existed
 Send not saved, receive is saved
 Receive not saved, send is saved
Identify All Possible Valid Recovery Lines
There are so many!
[2,3,2] [2,4,2]
 A coroutine is two or more procedures that share a single
thread of execution, each exercising mutual control over the other
procedure A;
-- do something
resume B;
-- do something
resume B;
-- do something
-- …
end A;
procedure B;
-- do something
resume A;
-- do something
resume A;
-- …
end B;
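The A/B pseudocode above can be approximated with Python generators. This is a hedged sketch: Python generators hand control back to a driver rather than resuming a named partner directly, so the hypothetical run_alternating function plays the role of the resume statements.

```python
def coroutine_a(log):
    log.append("A1")   # do something
    yield              # give up control (stands in for "resume B")
    log.append("A2")
    yield
    log.append("A3")


def coroutine_b(log):
    log.append("B1")   # do something
    yield              # give up control (stands in for "resume A")
    log.append("B2")
    yield


def run_alternating(*coroutines):
    """Resume each coroutine in turn until all have finished."""
    active = list(coroutines)
    while active:
        for co in list(active):
            try:
                next(co)               # continue from where it last yielded
            except StopIteration:
                active.remove(co)


def demo():
    log = []
    run_alternating(coroutine_a(log), coroutine_b(log))
    return log
```

Both procedures share one thread of execution, and each keeps its local state between resumptions, so the log interleaves their steps: A1, B1, A2, B2, A3.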
 Concurrent computations consist of multiple entities.
 Processes in Smalltalk
 Tasks in Ada
 Threads in Java
 OS-dependent in C++ (until C++11 added std::thread)
On a shared-memory multiprocessor:
 The Semaphore was the first synchronization primitive
 Locks and condition variables separated a semaphore’s mutual-exclusion usage from its
synchronization usage
 Monitors are higher-level, self-synchronizing objects
Java classes have an associated (simplified) monitor
On a distributed system:
 Ada tasks provide self-synchronizing entry procedures