Concurrent Systems

Concurrent Systems
Parallelism
Final Exam Schedule
• CS1311 Sections L/M/N Tuesday/Thursday 10:00 A.M.
• Exam Scheduled for 8:00 Friday May 5, 2000
• Physics L1
Final Exam Schedule
• CS1311 Sections E/F Tuesday/Thursday 2:00 P.M.
• Exam Scheduled for 2:50 Wednesday May 3, 2000
• Physics L1
Concurrent Systems
Sequential Processing
• All of the algorithms we’ve seen so far are
sequential:
– They have one “thread” of execution
– One step follows another in sequence
– One processor is all that is needed to
run the algorithm
A Non-sequential Example
• Consider a house with a burglar alarm system.
• The system continually monitors:
– The front door
– The back door
– The sliding glass door
– The door to the deck
– The kitchen windows
– The living room windows
– The bedroom windows
• The burglar alarm is watching all of
these “at once” (at the same time).
Another Non-sequential Example
• Your car has an onboard digital dashboard
that simultaneously:
– Calculates how fast you’re going and
displays it on the speedometer
– Checks your oil level
– Checks your fuel level and calculates
consumption
– Monitors the heat of the engine and
turns on a light if it is too hot
– Monitors your alternator to make sure it
is charging your battery
Concurrent Systems
• A system in which:
– Multiple tasks can be executed at
the same time
– The tasks may be duplicates of
each other, or distinct tasks
– The overall time to perform the
series of tasks is reduced
Advantages of Concurrency
• Concurrent processes can reduce
duplication in code.
• The overall runtime of the algorithm can
be significantly reduced.
• More real-world problems can be solved
than with sequential algorithms alone.
• Redundancy can make systems more
reliable.
Disadvantages of Concurrency
• Runtime is not always reduced, so
careful planning is required
• Concurrent algorithms can be more
complex than sequential algorithms
• Shared data can be corrupted
• Communication between tasks is needed
Achieving Concurrency
• Many computers today have more than
one processor (multiprocessor machines)
[Diagram: CPU 1 and CPU 2 connected by a bus to a shared Memory]
Achieving Concurrency
• Concurrency can also be achieved on a computer
with only one processor:
– The computer “juggles” jobs, swapping its
attention to each in turn
– “Time slicing” allows many users to get CPU
resources
– Tasks may be suspended while they wait for
something, such as device I/O
[Diagram: the CPU working on task 1 while task 2 and task 3 sleep ("ZZZZ"), waiting their turn]
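For instance, here is a minimal Python sketch (not from the original slides) of this kind of interleaving: two threads share one interpreter, and the runtime swaps its attention between them much as a single CPU time-slices between tasks. The task names and step counts are illustrative.

import threading
import time

def task(name, steps):
    # Each print is one "step" of the task; sleeping yields the CPU,
    # as if the task were suspended while waiting for device I/O.
    for i in range(steps):
        print(name, "step", i)
        time.sleep(0.01)

t1 = threading.Thread(target=task, args=("task 1", 3))
t2 = threading.Thread(target=task, args=("task 2", 3))
t1.start(); t2.start()
t1.join(); t2.join()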
Concurrency vs. Parallelism
• Concurrency is the execution of
multiple tasks at the same time,
regardless of the number of
processors.
• Parallelism is the use of multiple
processors to execute a single
task.
Types of Concurrent Systems
• Multiprogramming
• Multiprocessing
• Multitasking
• Distributed Systems
Multiprogramming
• Share a single CPU among many
users or tasks.
• May have a time-shared algorithm or
a priority algorithm for determining
which task to run next
• Give the illusion of simultaneous
processing through rapid swapping
of tasks (interleaving).
Multiprogramming
[Diagram: User 1 and User 2 (tasks/users) sharing a single CPU and a single Memory]
Multiprogramming
[Graph: tasks/users (1–4) versus CPUs (1–4) under multiprogramming]
Multiprocessing
• Executes multiple tasks at the
same time
• Uses multiple processors to
accomplish the tasks
• Each processor may also
timeshare among several tasks
• Has a shared memory that is used
by all the tasks
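A minimal Python sketch (illustrative, not from the slides) of several processes updating one result through shared memory; the lock coordinates access so the shared data is not corrupted.

from multiprocessing import Process, Value, Lock

def add_range(total, lock, lo, hi):
    s = sum(range(lo, hi))        # each task does its own piece of work
    with lock:                    # coordinate access to the shared memory
        total.value += s

if __name__ == "__main__":
    total = Value('i', 0)         # an integer living in shared memory
    lock = Lock()
    procs = [Process(target=add_range, args=(total, lock, i * 250, (i + 1) * 250))
             for i in range(4)]
    for p in procs: p.start()
    for p in procs: p.join()
    print(total.value)            # 0 + 1 + ... + 999 = 499500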
Multiprocessing
[Diagram: three CPUs and a shared Memory; User 1 runs Task1 and Task2, and User 2 runs Task1]
Multiprocessing
[Graph: tasks (1–4) versus CPUs (1–4) under multiprocessing, all using a shared memory]
Multitasking
• A single user can have multiple tasks
running at the same time.
• Can be done with one or more
processors.
• Once rare and limited to expensive
multiprocessing systems, but now
supported by most modern operating
systems.
Multitasking
[Diagram: a single CPU and Memory; User 1 runs Task1, Task2, and Task3]
Multitasking
[Graph: a single user's tasks (1–4) versus CPUs (1–4) under multitasking]
Distributed Systems
Multiple computers working together with no
central program “in charge.”
[Diagram: ATMs at Buford, the Student Ctr, Perimeter, and North Ave communicating with the Central Bank]
Distributed Systems
• Advantages:
– No bottlenecks from sharing processors
– No central point of failure
– Processing can be localized for efficiency
• Disadvantages:
– Complexity
– Communication overhead
– Distributed control
Questions?
Parallelism
Parallelism
• Using multiple processors to solve a
single task.
• Involves:
– Breaking the task into meaningful
pieces
– Doing the work on many
processors
– Coordinating and putting the
pieces back together.
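A minimal Python sketch of these three steps (summing a list is just an illustrative task): the data is broken into pieces, each piece is handed to a worker process, and the partial results are combined at the end.

from multiprocessing import Pool

def piece_sum(chunk):
    return sum(chunk)             # the work done on one piece

if __name__ == "__main__":
    data = list(range(1_000_000))
    pieces = [data[i::4] for i in range(4)]      # break the task into pieces
    with Pool(processes=4) as pool:
        partials = pool.map(piece_sum, pieces)   # do the work on many processors
    print(sum(partials))                         # put the pieces back together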
Parallelism
[Diagram: processing nodes, each with a CPU, Memory, and Network Interface, connected by a network]
Parallelism
[Graph: tasks (1–4) versus CPUs (1–4) under parallelism]
Pipeline Processing
Some work consists of repeating a sequence of operations on a stream of inputs.
Allocating each operation to a separate processor and chaining the processors
together produces a pipeline, which completes the stream of tasks faster.
[Diagram: input flows through chained stages A → B → C → D to produce output]
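A minimal two-stage Python sketch of a pipeline (the stage names and the work done in each stage are illustrative): each stage runs in its own thread and passes items to the next stage through a queue, so different items are in different stages at the same time.

import queue, threading

def stage_a(inbox, outbox):
    while True:
        item = inbox.get()
        if item is None:              # sentinel: no more work
            outbox.put(None)
            return
        outbox.put(item * 2)          # stage A's piece of the task

def stage_b(inbox):
    while True:
        item = inbox.get()
        if item is None:
            return
        print("finished:", item + 1)  # stage B's piece of the task

q_in, q_mid = queue.Queue(), queue.Queue()
threading.Thread(target=stage_a, args=(q_in, q_mid)).start()
threading.Thread(target=stage_b, args=(q_mid,)).start()
for x in range(5):                    # feed the pipeline
    q_in.put(x)
q_in.put(None)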
Example
• Suppose you have a choice between a separate
washer and dryer, each with a 30-minute cycle, or
• a combination washer/dryer with a one-hour cycle.
• The correct answer depends on how much work
you have to do.
One Load
• Separate machines: a 30-minute wash, the transfer overhead, then a 30-minute dry.
• Combo machine: one 60-minute cycle.
• With a single load, the two choices take about the same time, and the combo avoids the transfer overhead.
Three Loads
• The separate washer and dryer form a pipeline: load 2 washes while load 1 dries, so three loads finish in four 30-minute stages, about 2 hours.
• The combo machine must run three full one-hour cycles back to back, about 3 hours.
• With enough loads to keep the pipeline full, the separate machines win.
Examples of Pipelined Tasks
• Automobile manufacturing
• Instruction processing within a computer
[Diagram: five instructions moving through pipeline stages A, B, C, and D over time steps 0–7; once the pipeline fills, one instruction completes at every time step]
Task Queues
• A supervisor processor maintains a queue of
tasks to be performed in shared memory.
• Each processor queries the queue, dequeues
the next task and performs it.
• Task execution may involve adding more tasks
to the task queue.
[Diagram: a supervisor (Super) and processors P1, P2, P3, …, Pn all connected to the Task Queue]
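A minimal Python sketch of a task queue (the tasks themselves are illustrative): the supervisor enqueues work into a shared queue, and each worker process repeatedly dequeues the next task and performs it.

from multiprocessing import Process, JoinableQueue

def worker(tasks):
    while True:
        n = tasks.get()
        if n is None:                 # sentinel tells this worker to stop
            tasks.task_done()
            return
        print(n, "squared is", n * n) # "perform" the task
        tasks.task_done()

if __name__ == "__main__":
    tasks = JoinableQueue()
    workers = [Process(target=worker, args=(tasks,)) for _ in range(3)]
    for w in workers: w.start()
    for n in range(10):               # the supervisor enqueues the tasks
        tasks.put(n)
    for _ in workers:                 # one sentinel per worker
        tasks.put(None)
    tasks.join()                      # wait until every task has been performed
    for w in workers: w.join()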
Parallelizing Algorithms
How much gain can we get from
parallelizing an algorithm?
Parallel Bubblesort
We can use N/2 processors to do all the
pair-wise comparisons at once, alternating
("flip-flopping") which adjacent pairs are
compared on successive steps.
Initial: 93 87 74 65 57 45 33 27
Step 1:  87 93 65 74 45 57 27 33
Step 2:  87 65 93 45 74 27 57 33
Runtime of Parallel Bubblesort
Step 3:  65 87 45 93 27 74 33 57
Step 4:  65 45 87 27 93 33 74 57
Step 5:  45 65 27 87 33 93 57 74
Step 6:  45 27 65 33 87 57 93 74
Step 7:  27 45 33 65 57 87 74 93
Step 8:  27 33 45 57 65 74 87 93
The N = 8 values are fully sorted after N steps.
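The comparison pattern above is odd-even transposition sort. A short Python sketch (sequentially simulating what the N/2 processors would do in parallel) reproduces the steps shown, starting from the same data:

def parallel_bubblesort(a):
    n = len(a)
    for step in range(1, n + 1):
        start = 0 if step % 2 == 1 else 1   # alternate ("flip-flop") the pairings
        # Each comparison below touches a disjoint pair, so on a parallel
        # machine all of them could be done at once by N/2 processors.
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
        print("Step", step, ":", a)
    return a

parallel_bubblesort([93, 87, 74, 65, 57, 45, 33, 27])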
Completion Time of Bubblesort
• Sequential bubblesort finishes in O(N²) time.
• Parallel bubblesort finishes in O(N) time.
[Graph: running time versus N, with parallel bubblesort growing as O(N) and sequential bubblesort as O(N²)]
Product Complexity
• Got done in O(N) time, better than O(N²)
• Each time "chunk" does O(N) work
• There are N time chunks.
• Thus, the amount of work is still O(N²)
• Product complexity is the amount of work
per "time chunk" multiplied by the number
of "time chunks" – the total work done.
Ceiling of Improvement
• Parallelization can reduce time, but it cannot
reduce work; the product complexity cannot
change or improve.
• How much improvement can parallelization
provide?
– Given an O(N log N) algorithm and log N
processors, the algorithm will take at least
O(N) time.
– Given an O(N³) algorithm and N processors,
the algorithm will take at least O(N²) time.
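These bounds follow from dividing the total work among the processors; a short derivation, assuming perfect load balance and no coordination overhead:

\[
  T_P \;\ge\; \frac{W}{P}
\]
\[
  \frac{O(N \log N)}{\log N} = O(N), \qquad \frac{O(N^3)}{N} = O(N^2)
\]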
Number of Processors
• Processors are limited by hardware.
• Typically, the number of processors is a
power of 2
• Usually: the number of processors is a
fixed constant, 2^K
• Conceivably: networked computers joined
as needed (à la Borg?).
Adding Processors
• A program on one processor
– Runs in X time
• Adding another processor
– Ideally cuts the running time to X/2
– Realistically, it runs in X/2 + ε time
because of overhead
• At some point, adding processors will not
help and could degrade performance.
Overhead of Parallelization
• Parallelization is not free.
• Processors must be controlled and coordinated.
• We need a way to govern which processor does
what work; this involves extra work.
• Often the program must be written in a special
programming language for parallel systems.
• Often, a parallelized program for one machine
(with, say, 2^K processors) doesn't work on other
machines (with, say, 2^L processors).
What We Know about Tasks
• Relatively isolated units of computation
• Should be roughly equal in duration
• Duration of the unit of work must be much
greater than overhead time
• Policy decisions and coordination
required for shared data
• Simpler algorithms are the easiest to
parallelize
Questions?
More?
Matrix Multiplication
c  a b
n
cij   aik bkj
k 1
. .   .
.
.
.  .
. .   .
.
.
.  

. c34  a31 a32 a33 a34  
 

. .   .
.
.
. 
c34  a31b14  a32b24  a33b34  a34b44
.
.

.

.
.
.
.
.
.
.
b14 
b24 
b34 

b44 
Inner Product Procedure
Procedure inner_prod(a, b, c isoftype in/out Matrix, i, j isoftype in Num)
// Compute the inner product of a[i][*] and b[*][j] and store it in c[i][j]
    sum isoftype Num
    k isoftype Num
    sum <- 0
    k <- 1
    loop
        exitif(k > N)
        sum <- sum + a[i][k] * b[k][j]
        k <- k + 1
    endloop
    c[i][j] <- sum
endprocedure // inner_prod
Matrix definesa Array[1..N][1..N] of Num
N is      // Declare a constant defining the size of the arrays
Algorithm P_Demo
    a, b, c isoftype Matrix Shared
    server isoftype Num
    i, j isoftype Num

    Initialize(NUM_SERVERS)

    // Input a and b here (code not shown)

    i <- 1
    loop
        exitif(i > N)
        server <- (i * NUM_SERVERS) DIV N
        j <- 1
        loop
            exitif(j > N)
            RThread(server, inner_prod(a, b, c, i, j))
            j <- j + 1
        endloop
        i <- i + 1
    endloop

    Parallel_Wait(NUM_SERVERS)

    // Output c here
endalgorithm // P_Demo
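For comparison, a minimal Python sketch (not the course pseudocode) of the same idea: every entry c[i][j] is an independent inner product, so the entries can be farmed out to a pool of worker processes. N, NUM_SERVERS, and the sample inputs are illustrative.

from multiprocessing import Pool

N = 4
NUM_SERVERS = 2

def inner_prod(a, b, i, j):
    # Inner product of row i of a and column j of b
    return sum(a[i][k] * b[k][j] for k in range(N))

if __name__ == "__main__":
    a = [[i + j for j in range(N)] for i in range(N)]               # sample input
    b = [[1 if i == j else 0 for j in range(N)] for i in range(N)]  # identity matrix
    with Pool(processes=NUM_SERVERS) as pool:
        flat = pool.starmap(inner_prod,
                            [(a, b, i, j) for i in range(N) for j in range(N)])
    c = [flat[i * N:(i + 1) * N] for i in range(N)]
    print(c)    # multiplying by the identity gives a back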
Questions?