BSS 797: Principles of Parallel Computing

Lecture 5
Parallel Performance Measurement
Speedup
Let T(1, N) be the time required for the best serial
algorithm to solve a problem of size N on 1 processor,
and let T(P, N) be the time for a given parallel
algorithm to solve the same problem of the same
size N on P processors. Speedup is then defined as
S(P, N) = T(1, N)/T(P, N)
 Remarks:
1. Normally, S(P,N) <= P;
Ideally, S(P,N) = P;
Rarely, S(P,N) > P --- superlinear speedup.
2. Linear speedup: S(P,N) = c P where c is
independent of N and P.
3. Algorithms with S(P,N) = c P are scalable.
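As a concrete illustration, here is a minimal Python
sketch of the speedup formula; the timing values are
hypothetical, not measurements from the lecture.

def speedup(t_serial, t_parallel):
    # S(P, N) = T(1, N) / T(P, N)
    return t_serial / t_parallel

# Hypothetical timings: 100 s on 1 processor, 14 s on P = 8.
print(speedup(100.0, 14.0))  # ~7.14, below the ideal S = P = 8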
Parallel efficiency
Let T(1, N) be the time required for the best serial
algorithm to solve a problem of size N on 1 processor,
and let T(P, N) be the time for a given parallel
algorithm to solve the same problem of the same size N
on P processors. Parallel efficiency is then defined as
E(P,N) = T(1, N)/[T(P, N) P] = S(P,N)/P
 Remarks:
1. Normally, E(P,N) <= 1;
Ideally, E(P,N) = 1;
Rarely, E(P,N) > 1;
Commonly, E(P,N) ~ 0.6, though this is
problem-dependent.
2. Linear speedup: E(P,N) = c where c is
independent of N and P.
3. Algorithms with E(P,N) = c are scalable.
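A matching Python sketch for efficiency, again with
hypothetical timings:

def efficiency(t_serial, t_parallel, p):
    # E(P, N) = S(P, N) / P
    return (t_serial / t_parallel) / p

print(efficiency(100.0, 14.0, 8))  # ~0.89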
Load Imbalance Ratio I(P,N)
Suppose processor i spends time t_i doing useful work,
and define
t_max = max_i {t_i}
t_avg = \sum_{i=0}^{P-1} t_i / P (the average time).
The total time for computation and communication is
\sum_{i=0}^{P-1} t_i,
while the time that the system is occupied (with
computation, communication, or idling) is P t_max.
Load imbalance ratio:
I(P,N) = (P t_max - \sum_{i=0}^{P-1} t_i) / \sum_{i=0}^{P-1} t_i
       = t_max/t_avg - 1
 Remarks:
1. t_avg * I(P,N) = t_max - t_avg = per-processor wasted
time.
2. If t_max = t_avg, then t_i = t_avg for all i, so
I(P,N) = 0 and the load is perfectly balanced.
3. One slow processor (the one setting t_max) can hold
up the entire team; for this reason the master-slave
scheme is usually avoided.
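The ratio is easy to compute from measured per-processor
times; here is a minimal Python sketch with made-up timings:

def load_imbalance(times):
    # I(P, N) = t_max / t_avg - 1
    t_max = max(times)
    t_avg = sum(times) / len(times)
    return t_max / t_avg - 1.0

print(load_imbalance([10.0, 10.0, 10.0, 10.0]))  # 0.0: perfectly balanced
print(load_imbalance([10.0, 10.0, 10.0, 20.0]))  # ~0.6: one straggler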
Overhead
h(P,N) can be defined through E(P,N) = 1/(1 + h(P,N)).
Solving for the overhead gives
h(P,N) = 1/E(P,N) - 1 = P/S(P,N) - 1
 Remarks:
h(P,N) --> \infty if E(P,N) --> 0;
h(P,N) --> 0 if E(P,N) --> 1.
h(P,N) results from communication and load
imbalance.
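In the same hypothetical setting as before (100 s on 1
processor, 14 s on P = 8), a short Python sketch:

def overhead(t_serial, t_parallel, p):
    # h(P, N) = P / S(P, N) - 1 = 1 / E(P, N) - 1
    return p * t_parallel / t_serial - 1.0

print(overhead(100.0, 14.0, 8))  # ~0.12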
Amdahl's "law"
Suppose a fraction f of an algorithm for a problem
of size N is inherently serial and the remainder is
perfectly parallel, and let T(1,N) = \tau. Then
T(P,N) = f\tau + (1-f)\tau/P,
and therefore
S(P,N) = 1/(f + (1-f)/P).
As P --> \infty, the speedup S(P,N) is bounded by
1/f: the maximum possible speedup is finite no
matter how many processors are used.
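A short Python sketch makes the bound concrete; the
serial fraction f = 0.05 is an arbitrary example value:

def amdahl_speedup(f, p):
    # S(P, N) = 1 / (f + (1 - f) / P)
    return 1.0 / (f + (1.0 - f) / p)

# With f = 0.05, speedup saturates near 1/f = 20 as P grows.
for p in (4, 16, 64, 256, 1024):
    print(p, round(amdahl_speedup(0.05, p), 2))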
Granularity
The size of the piece of the problem allocated to an
individual processor is called the granularity of the
decomposition.
 Remarks:
1. Granularity is usually determined by the problem
size N and the computer size P.
2. Decreasing granularity usually increases
communication and decreases load imbalance.
3. Increasing granularity usually decreases
communication and increases load imbalance.
Scalability
A scalable algorithm:
one whose E(P, N) remains bounded from below, i.e.,
E(P, N) >= E_0 > 0, as the number of processors
P --> \infty at fixed problem size.
A quasi-scalable algorithm:
one whose E(P, N) remains bounded from below, i.e.,
E(P, N) >= E_0 > 0, for processor counts in the range
P_min < P < P_max at fixed problem size. The interval
P_min < P < P_max is called the scaling zone.
Remarks:
1. Truly scalable algorithms are rare; quasi-scalable
ones are common.
2. A quasi-scalable algorithm is usually regarded as
scalable.
3. For a fixed problem-size scaling N = N(P),
E(P,N(P)) decreases monotonically as P increases.
4. Efforts aim to maximize both the scaling zone
P_min < P < P_max and the bound E_0.
Principle: minimize overhead.
In practice, this means:
1. Minimize the communication-to-computation ratio.
2. Minimize load imbalance.
3. Maximize the scaling zone (see the sketch below).
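To make the scaling-zone idea concrete, here is a Python
sketch that finds the range of P with E(P,N) >= E_0; the
measured efficiencies below are invented for illustration:

def scaling_zone(efficiencies, e0):
    # Return (P_min, P_max) over which E(P, N) >= E0, or None.
    ps = sorted(p for p, e in efficiencies.items() if e >= e0)
    return (ps[0], ps[-1]) if ps else None

measured = {2: 0.95, 4: 0.90, 8: 0.82, 16: 0.66, 32: 0.45}
print(scaling_zone(measured, 0.6))  # (2, 16): quasi-scalable in this zone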