Analysis of an Approximation Algorithm for Scheduling Independent Parallel Tasks

Discrete Mathematics and Theoretical Computer Science 3, 1999, 155–166

Keqin Li
Department of Mathematics and Computer Science, State University of New York, New Paltz, NY 12561-2499, USA
(email: li@mcs.newpaltz.edu, phone: (914) 257-3534, fax: (914) 257-3571)

received April 14, 1998, revised May 28, 1999, accepted May 30, 1999.

1365–8050 © 1999 Maison de l'Informatique et des Mathématiques Discrètes (MIMD), Paris, France
In this paper, we consider the problem of scheduling independent parallel tasks in parallel systems with identical processors. The problem is NP-hard, since it includes the bin packing problem as a special case when all tasks have unit execution time. We propose and analyze a simple approximation algorithm called $H_v$, where $v$ is a positive integer. Algorithm $H_v$ has a moderate asymptotic worst-case performance ratio in the range $[1.666\ldots, 1.722\ldots]$ for all $v \ge 6$; but the algorithm has a small asymptotic worst-case performance ratio in the range $[1 + 1/(k+1), 1 + 1/k]$ when task sizes do not exceed $1/k$ of the total available processors, where $k > 1$ is an integer. Furthermore, we show that if the task sizes are independent, identically distributed (i.i.d.) uniform random variables, and task execution times are i.i.d. random variables with finite mean and variance, then the average-case performance ratio of algorithm $H_v$ is no larger than 1.2898680..., and for an exponential distribution of task sizes, it does not exceed 1.2898305.... As demonstrated by our analytical as well as numerical results, the average-case performance ratio improves significantly when tasks request smaller numbers of processors.
Keywords: approximation algorithm, average-case performance ratio, parallel task scheduling, probabilistic analysis, worst-case performance ratio
1 Introduction
In this paper, we consider the problem of scheduling independent parallel tasks in parallel systems with identical processors. Assume that we are given a list of $n$ tasks $L = (T_1, T_2, \ldots, T_n)$. Each task $T_i$ is specified by its execution time $t_i$ and its size $s_i$, i.e., $T_i$ requires $s_i$ processors to execute. There are $m$ identical processors, and any $s_i$ of them can be allocated to $T_i$. Once $T_i$ starts to execute, it runs without interruption until it completes. Tasks in $L$ are mutually independent; that is, there is no precedence constraint or data communication among the $n$ tasks. The problem addressed here is to find a nonpreemptive schedule of $L$ such that its makespan (i.e., the time by which all $n$ tasks are completed) is minimized.
The problem is NP-hard, since it includes the bin packing problem as a special case when all tasks have unit execution time. Therefore, one practical way to solve this problem is to design and analyze approximation algorithms that produce near-optimal solutions. Let $A(L)$ be the makespan of the schedule generated by an algorithm $A$ for a list $L$, and $OPT(L)$ be the makespan of an optimal schedule of $L$. The quantity

$$R_A^\infty = \limsup_{n \to \infty}\ \max_{L:\,OPT(L) \ge n} \left\{ \frac{A(L)}{OPT(L)} \right\}$$

is called the asymptotic worst-case performance ratio of algorithm $A$. If there exist two constants $c$ and $d$ such that for all $L$, $A(L) \le c \cdot OPT(L) + d\,t^*$, where $t^* = \max_{1 \le i \le n}(t_i)$ is the longest execution time of the $n$ tasks, then $R_A^\infty \le c$, and $c$ is called an asymptotic worst-case performance bound of algorithm $A$. Moreover, if for any small $\epsilon > 0$ and all large $B > 0$, there exists a list $L$, such that $OPT(L) \ge B$, and $A(L) \ge (c - \epsilon)\,OPT(L)$, then the bound $c$ is called tight, i.e., $R_A^\infty = c$. When task sizes and execution times are random variables, both $A(L)$ and $OPT(L)$ become random variables, and

$$\bar{R}_A^\infty = \lim_{n \to \infty} \frac{E(A(L))}{E(OPT(L))}$$

is called the asymptotic average-case performance ratio of algorithm $A$, where $E(\cdot)$ stands for the expectation of a random variable. Of course, $\bar{R}_A^\infty$ depends on the probability distributions of task sizes and execution times. If there exists a constant $r$ such that $E(A(L)) \le r \cdot E(OPT(L))$ as $n \to \infty$, then $\bar{R}_A^\infty \le r$, and $r$ is called an asymptotic average-case performance bound of algorithm $A$.
We notice that our scheduling problem defined above looks similar to, but is quite different from, the two-dimensional rectangle packing problem [1, 2, 7, 9], where each task $T_i$ is treated as a rectangle with width $s_i$ and height $t_i$. The rectangle packing model implies that processors should be allocated in contiguous groups. That is, the $m$ processors have indices 1, 2, 3, ..., $m$, and task $T_i$ must be allocated $s_i$ processors with indices $g, g+1, \ldots, g+s_i-1$ for some $g$. Such a scheduling problem arises in parallel systems like linear arrays. The rectangle packing problem has been extensively studied, and a complicated algorithm with asymptotic worst-case performance ratio as low as 1.25 has been found [1]. However, contiguous processor allocation is not required in our model, where any $s_i$ processors can be allocated to $T_i$. Our problem can be regarded as a resource-constrained scheduling problem [3, 6], where the resource is a set of processors. It has applications in parallel computing systems such as symmetric shared-memory multiprocessors and distributed computing systems such as bus-connected networks of workstations. In these systems, the processor allocation mechanism is independent of the topology of the interconnection network. Another related problem, which has also been investigated in the literature, is scheduling malleable tasks [8, 10]. In that problem, each task requests several possible numbers of processors, i.e., a task has an adjustable size, and for each size an execution time is also specified. The problem has several variations depending on the way in which the execution time of a task changes with the number of processors allocated to it, and on the performance measure to be optimized (e.g., makespan, average flow time).
The problem we consider here is to schedule nonmalleable tasks with noncontiguous processor allocation. Even though the complicated algorithm in [1] for rectangle packing can also be applied to solve our problem, we propose and analyze a simple approximation algorithm called $H_v$, where $v$ is a positive integer. Algorithm $H_v$ has a moderate asymptotic worst-case performance ratio ($1.666\ldots \le R_{H_v}^\infty \le 1.722\ldots$ for all $v \ge 6$); but the algorithm has a small asymptotic worst-case performance ratio $1 + 1/(k+1) \le R_{H_v}^\infty \le 1 + 1/k$ when task sizes do not exceed $m/k$, where $k > 1$ is an integer. We notice that the capability to deal with small tasks is important in real applications, since many task sizes are relatively small as compared with the system size, so that a large-scale parallel system can be shared by many users simultaneously. However, it is not clear whether the algorithm in [1] has such capability. Furthermore, the simplicity of our algorithm allows us to conduct average-case performance analysis. In particular, we show that if the numbers of processors requested by the tasks are independent, identically distributed (i.i.d.) random variables uniformly distributed in the range $[1..m]$, and task execution times are i.i.d. random variables with finite mean and variance, then $\bar{R}_{H_v}^\infty \le 1.2898680\ldots$, i.e., $E(H_v(L))/E(OPT(L))$ is asymptotically bounded from above by 1.2898680... as $n \to \infty$. For an exponential distribution of task sizes, we have $\bar{R}_{H_v}^\infty \le 1.2898305\ldots$. As demonstrated by our analytical as well as numerical results, the average-case performance ratio improves significantly when tasks request smaller numbers of processors. We notice that there is a lack of such results on probabilistic algorithm analysis, especially in multi-dimensional cases [4]. The average-case performance of the algorithm in [1] is unknown.
The rest of the paper is organized as follows. We present algorithm $H_v$ in Section 2. The worst-case performance of the algorithm is analyzed in Section 3, and its average-case performance is studied in Section 4. Finally, we give a summary in Section 5.
2 The Approximation Algorithm $H_v$

Our algorithm $H_v$ for scheduling independent parallel tasks divides $L$ into $v$ sublists $L_1, L_2, \ldots, L_v$ according to task sizes (i.e., the numbers of processors requested by tasks), where $v \ge 1$ is a positive integer. For $1 \le j \le v-1$, define $L_j = \{T_i \in L \mid m/(j+1) < s_i \le m/j\}$, i.e., $L_j$ contains all tasks in $L$ that have sizes in the interval $I_j = (m/(j+1), m/j]$. Define $L_v = \{T_i \in L \mid 0 < s_i \le m/v\}$, i.e., $L_v$ contains all tasks whose sizes are in the range $I_v = (0, m/v]$.
Algorithm $H_v$ produces schedules of the $L_j$'s sequentially and separately. To process tasks in $L_j$, where $1 \le j \le v-1$, the $m$ processors are partitioned into $j$ groups, $G_1, G_2, \ldots, G_j$, each containing $m/j$ processors. Each group $G_g$ of processors is treated as a unit and is assigned to one task in $L_j$ at a time. Such an allocation can be implemented using, for example, the list scheduling algorithm [4]. Suppose $L_j = (T_1^{(j)}, T_2^{(j)}, \ldots, T_{n_j}^{(j)})$, where $n_j$ is the number of tasks in $L_j$. Initially, group $G_g$ is assigned to $T_g^{(j)}$, where $1 \le g \le j$, and $T_1^{(j)}, T_2^{(j)}, \ldots, T_j^{(j)}$ are removed from $L_j$. Upon the completion of a task on group $G_g$, the first unscheduled task in $L_j$ is removed from $L_j$ and scheduled to execute on $G_g$. This process repeats until all tasks in $L_j$ are finished. Then algorithm $H_v$ begins the scheduling of the next sublist $L_{j+1}$.
For $L_v$, there is no need to divide the $m$ processors. The list scheduling algorithm is again employed here. Let $L_v = (T_1^{(v)}, T_2^{(v)}, \ldots, T_{n_v}^{(v)})$. Initially, as many tasks in $L_v$ are scheduled as possible, i.e., tasks $T_1^{(v)}, T_2^{(v)}, \ldots, T_u^{(v)}$ start their execution, where $u$ is defined in such a way that the total size of $T_1^{(v)}, T_2^{(v)}, \ldots, T_u^{(v)}$ is no larger than $m$, but the total size of $T_1^{(v)}, T_2^{(v)}, \ldots, T_{u+1}^{(v)}$ exceeds $m$. When a task finishes, the next task in $L_v$ begins its execution, provided that there are enough idle processors. Notice that it is the scheduling of tasks in $L_v$ that takes advantage of noncontiguous processor allocation.
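To make the description concrete, the following is a minimal Python sketch (ours, not from the paper; the function name h_v_makespan is hypothetical) that simulates $H_v$ and returns the makespan of the schedule it produces. Sizes are given as fractions of the $m$ processors, so a group of $m/j$ processors becomes capacity $1/j$; floating-point boundary cases are ignored in this sketch.

```python
import heapq

def h_v_makespan(tasks, v):
    """Simulate algorithm H_v on a list of (size, time) pairs.

    Sizes are fractions of the m processors, i.e., values in (0, 1].
    Sublists are scheduled one after another, so the makespan is the
    sum of the sublist makespans.
    """
    # Split L into sublists L_1, ..., L_v by the size intervals I_j.
    sublists = [[] for _ in range(v + 1)]
    for s, t in tasks:
        j = min(int(1.0 / s), v)      # largest j with s <= 1/j, capped at v
        sublists[j].append((s, t))

    makespan = 0.0
    for j in range(1, v):             # L_1, ..., L_{v-1}: j groups of capacity 1/j
        if sublists[j]:
            finish = [0.0] * j        # current finish time of each group
            for _, t in sublists[j]:  # list scheduling over the j groups
                heapq.heappush(finish, heapq.heappop(finish) + t)
            makespan += max(finish)

    # L_v: list scheduling on the whole machine (noncontiguous allocation).
    running, idle, now, last = [], 1.0, 0.0, 0.0
    for s, t in sublists[v]:
        while s > idle + 1e-12:       # wait until enough processors are idle
            now, freed = heapq.heappop(running)
            idle += freed
        heapq.heappush(running, (now + t, s))
        idle -= s
        last = max(last, now + t)
    return makespan + last

# Example: with v = 4, the first task goes to L_1, the second to L_2,
# and the third to L_4; the sublists run one after another.
print(h_v_makespan([(0.6, 1.0), (0.4, 2.0), (0.2, 1.5)], 4))  # 1.0 + 2.0 + 1.5
```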
3 Combinatorial Analysis

In this section, we analyze the worst-case performance of algorithm $H_v$. Let $H_v(L)$ be the makespan of the schedule produced by algorithm $H_v$ for $L$, and $OPT(L)$ be the makespan of an optimal schedule of $L$. First, we show the following result.

Theorem 1. For any list $L$ of $n$ tasks and $v \ge 3$, we have

$$H_v(L) \le 1\tfrac{13}{18}\,OPT(L) + (v-1)\,t^*,$$

where $t^*$ is the longest execution time of the $n$ tasks. Furthermore, for $v \ge 6$ and any large $B > 0$, there exists $L$, such that $OPT(L) \ge B$, and $H_v(L)/OPT(L) = 1\tfrac{2}{3}$. Therefore, for all $v \ge 6$, we have $1.666\ldots \le R_{H_v}^\infty \le 1.722\ldots$.
Proof. Assume that all tasks in $L$ are executed in the time interval $[0, H_v(L)]$, and that tasks in $L_j$ are scheduled in $[\sigma_j, \sigma_{j+1}]$, where $1 \le j \le v$, i.e., the first task in $L_j$ starts at time $\sigma_j$, and the last completion time of the tasks in $L_j$ is $\sigma_{j+1}$. Let $u_j$ be the starting time of the last task $T_{n_j}^{(j)}$ in $L_j$. Define $\tau_1 = \sigma_2 - \sigma_1$, and for all $2 \le j \le v$, define $\tau_j = u_j - \sigma_j$, and let $\kappa_j = \sigma_{j+1} - u_j$ be the remaining execution time of $L_j$ once $T_{n_j}^{(j)}$ starts. Clearly, $\kappa_j \le t^*$, and

$$H_v(L) = \tau_1 + \tau_2 + \tau_3 + \cdots + \tau_v + \kappa_2 + \kappa_3 + \cdots + \kappa_v \le \tau_1 + \tau_2 + \tau_3 + \cdots + \tau_v + (v-1)\,t^*. \qquad (1)$$
We give several lower bounds for $OPT(L)$. First, tasks in $L_1$ have large sizes, so that no two of them can execute in parallel. Therefore, we have

$$OPT(L) \ge \tau_1. \qquad (2)$$

Second, there can be at most two tasks from $L_2$ that execute at the same time, and there can be at most one task from $L_2$ that executes simultaneously with a task in $L_1$. Since both groups are busy during $[\sigma_2, u_2]$, the total execution time of the tasks in $L_2$ is at least $2\tau_2$. Thus,

$$OPT(L) \ge \tau_1 + \frac{2\tau_2 - \tau_1}{2} = \frac{\tau_1}{2} + \tau_2. \qquad (3)$$

Third, define the area of a task $T_i$ to be $s_i t_i$, the total area of $L_j$ to be $A_j = \sum_{T_i \in L_j} s_i t_i$ for $1 \le j \le v$, and $A = A_1 + A_2 + \cdots + A_v$. For convenience, we assume that processor requirements are normalized such that $0 < s_i \le 1$ for all $1 \le i \le n$. Clearly, $OPT(L) \ge A$. We give a lower bound for $A$ as follows. For $1 \le j \le v-1$, we notice that all the $j$ groups $G_1, G_2, \ldots, G_j$ are busy until $T_{n_j}^{(j)}$ starts its execution. That is, during the time interval $[\sigma_j, u_j]$, at least $j/(j+1)$ of the processors are busy, i.e., we have $A_j \ge (j/(j+1))\,\tau_j$. For $L_v$, we note that during the time interval $[\sigma_v, u_v]$, the fraction of busy processors is at least $(v-1)/v$, i.e., $A_v \ge ((v-1)/v)\,\tau_v$; otherwise, some tasks could start earlier. Hence,

$$OPT(L) \ge \frac{1}{2}\tau_1 + \frac{2}{3}\tau_2 + \frac{3}{4}\tau_3 + \cdots + \frac{v-1}{v}\tau_{v-1} + \frac{v-1}{v}\tau_v. \qquad (4)$$
Now let us consider the ratio

$$R = \frac{\tau_1 + \tau_2 + \cdots + \tau_v}{\max\left\{\tau_1,\ \dfrac{1}{2}\tau_1 + \tau_2,\ \dfrac{1}{2}\tau_1 + \dfrac{2}{3}\tau_2 + \dfrac{3}{4}\tau_3 + \cdots + \dfrac{v-1}{v}\tau_{v-1} + \dfrac{v-1}{v}\tau_v\right\}}. \qquad (5)$$

Let $\zeta = \tau_3 + \cdots + \tau_v$. Since $j/(j+1) \ge 3/4$ for all $j \ge 3$, we have

$$R \le \frac{\tau_1 + \tau_2 + \zeta}{\max\left\{\tau_1,\ \dfrac{1}{2}\tau_1 + \tau_2,\ \dfrac{1}{2}\tau_1 + \dfrac{2}{3}\tau_2 + \dfrac{3}{4}\zeta\right\}}. \qquad (6)$$

We give the proof of the following result in the appendix:

$$\frac{\tau_1 + \tau_2 + \zeta}{\max\left\{\tau_1,\ \dfrac{1}{2}\tau_1 + \tau_2,\ \dfrac{1}{2}\tau_1 + \dfrac{2}{3}\tau_2 + \dfrac{3}{4}\zeta\right\}} \le 1\tfrac{13}{18}, \qquad \text{where } \tau_1, \tau_2, \zeta \ge 0. \qquad (7)$$
Scheduling Independent Parallel Tasks
M
.
159
Ð
Combining Equations (1)–(7), weN get ›sœŽ*C'D;k}™ .9è GIHDJI*C'D;/p‰*-Ë†l™;Ì>r .
.
To show the lower bound for ¡ ¢ , let us consider a list ' of tasks which contains & tasks of size 3 pÕ~ ,
Ð
.
.
& tasks of size pY~ , and & tasks of size é †l¥~ , where & is a multiple of 6, and ~ހ is a very small
quantity. All tasks have unit execution time. Clearly, GIHDJI*C'D;L(& . Algorithm H œ divides ' into ' . ,
' 3 , and ' é , and ›#œ*';#(Þ&Ápl&ꧥIÐ3 pl&ê§1ŸÂ(®™ 3Ð & . Thus, we can choose & a sufficiently large number
while keeping ›sœŽ*C'D;2§žGIHKJL*';ë( ™ . This completes the proof of the theorem.
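The lower-bound instance is easy to check by direct calculation; below is a small, self-contained script (ours, not the paper's) that evaluates both makespans for multiples of 6.

```python
from fractions import Fraction

def theorem1_ratio(n):
    """H_v(L)/OPT(L) on the Theorem 1 instance (n a multiple of 6, v >= 6).

    With unit execution times, L_1 holds the n tasks of size 1/2 + eps
    (one at a time), L_2 the n tasks of size 1/3 + eps (two groups), and
    L_6 the n tasks of size 1/6 - 2*eps (six groups).
    """
    opt = n                          # one task of each size runs per time unit
    h_v = n + n // 2 + n // 6        # the three sublist schedules run in sequence
    return Fraction(h_v, opt)

for n in (6, 60, 600):
    print(n, theorem1_ratio(n))      # 5/3 for every multiple of 6
```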
For tasks with small sizes, algorithm $H_v$ exhibits much better performance due to increased processor utilization, as claimed by the following theorem.

Theorem 2. For any list $L$ of $n$ tasks, such that $s_i \le m/k$ for all $i$, where $k > 1$ is an integer, we have

$$H_v(L) \le \left(1 + \frac{1}{k}\right) OPT(L) + (v - k + 1)\,t^*,$$

where $t^*$ is the longest execution time of the $n$ tasks. Furthermore, for $v \ge k+1$ and any large $B > 0$, there exists $L$, such that $OPT(L) \ge B$, and $H_v(L)/OPT(L) = 1 + 1/(k+1)$. Therefore, we have $1 + 1/(k+1) \le R_{H_v}^\infty \le 1 + 1/k$.
Proof. The proof is similar to that of Theorem 1. Since $k \ge 2$, the sublists $L_1, L_2, \ldots, L_{k-1}$ are empty, and Equation (1) becomes

$$H_v(L) \le \tau_k + \tau_{k+1} + \cdots + \tau_v + (v - k + 1)\,t^*.$$

By using the area lower bound for $OPT(L)$, we have

$$OPT(L) \ge \left(\frac{k}{k+1}\right)\tau_k + \left(\frac{k+1}{k+2}\right)\tau_{k+1} + \cdots + \left(\frac{v-1}{v}\right)\tau_{v-1} + \left(\frac{v-1}{v}\right)\tau_v \ge \frac{k}{k+1}\left(\tau_k + \tau_{k+1} + \cdots + \tau_v\right).$$

The above two inequalities give the asymptotic worst-case performance bound $1 + 1/k$ in the theorem. To show the lower bound $1 + 1/(k+1)$ for $R_{H_v}^\infty$, let us consider a list $L$ which contains $nk$ tasks of size $1/(k+1) + \epsilon$, and $n$ tasks of size $1/(k+1) - k\epsilon$, where $n$ is a multiple of $k+1$, and $\epsilon$ is a sufficiently small value. All tasks have unit execution time. Clearly, $OPT(L) = n$. Algorithm $H_v$ divides $L$ into $L_k$ and $L_{k+1}$, and $H_v(L) = n + n/(k+1)$. Thus, we can choose $n$ sufficiently large while keeping $H_v(L)/OPT(L) = 1 + 1/(k+1)$.

When $k \ge 5$, algorithm $H_v$ has a better asymptotic worst-case performance ratio than the algorithm in [1], since $1 + 1/5 = 1.2 < 1.25$.
4 Probabilistic Analysis
Now let us consider the average-case performance of algorithm $H_v$. For convenience, we assume that the task sizes are normalized such that $0 < s_i \le 1$, and that the $s_i$'s are independent, identically distributed (i.i.d.) random variables with a common probability density function $f(x)$ on the range $(0, 1]$. Our assumption on the task execution times is quite general, i.e., the $t_i$'s are i.i.d. random variables with mean $\mu$ and variance $\sigma^2$, where $\mu$ and $\sigma$ are any finite numbers independent of $n$. The probability distributions of task sizes and execution times are independent of each other.
Theorem 3. We have the following asymptotic average-case performance bound for algorithm $H_v$:

$$\bar{R}_{H_v}^\infty = \lim_{n \to \infty} \frac{E(H_v(L))}{E(OPT(L))} \le \frac{\displaystyle \sum_{j=1}^{v-1} \frac{1}{j} \int_{I_j} f(x)\,dx + \frac{v}{v-1} \int_0^{1/v} x f(x)\,dx}{\displaystyle \max\left( \int_0^1 x f(x)\,dx,\ \int_{1/2}^1 f(x)\,dx \right)}.$$

(Note that the bound only depends on $f(x)$.)
Proof. It is clear that the mean task size is

$$\bar{s} = \int_0^1 x f(x)\,dx.$$

Since a task size falls into $I_j$ with probability $\int_{I_j} f(x)\,dx$, where $I_j = (1/(j+1), 1/j]$ for all $1 \le j \le v-1$, and $I_v = (0, 1/v]$, the expected number of tasks in $L_j$ is

$$E(n_j) = \left( \int_{I_j} f(x)\,dx \right) n,$$

for all $1 \le j \le v$. Also, the expected size of tasks in $L_j$ is

$$\bar{s}_j = \frac{\int_{I_j} x f(x)\,dx}{\int_{I_j} f(x)\,dx}.$$

Since the area of a task $T_i$ has expectation $\bar{s}\mu$, and tasks in $L_1$ have to be executed sequentially, we have

$$E(OPT(L)) \ge \max\left(n \bar{s} \mu,\ E(n_1)\,\mu\right) = \alpha n \mu, \qquad (8)$$

where

$$\alpha = \max\left(\bar{s},\ \frac{E(n_1)}{n}\right) = \max\left( \int_0^1 x f(x)\,dx,\ \int_{1/2}^1 f(x)\,dx \right).$$

Let $M_j$ be the makespan of the schedule for $L_j$. Then, $E(H_v(L)) = E(M_1) + E(M_2) + \cdots + E(M_v)$. Clearly,

$$E(M_1) = E(n_1)\,\mu = \left( \int_{1/2}^1 f(x)\,dx \right) n\mu. \qquad (9)$$

For $2 \le j \le v-1$, we have $E(M_j) = E(\tau_j) + E(\kappa_j)$, and, since all $j$ groups are busy before the last task of $L_j$ starts,

$$E(\tau_j) \le \frac{E(n_j)}{j}\,\mu.$$
Furthermore, $\kappa_j$ is no more than the maximum of $j$ random execution times. It is well known from order statistics [5] that the mean of the maximum of $N$ i.i.d. random variables $X_1, X_2, \ldots, X_N$ with mean $\mu$ and variance $\sigma^2$ satisfies

$$E\left(\max(X_1, X_2, \ldots, X_N)\right) \le \mu + \frac{N-1}{\sqrt{2N-1}}\,\sigma.$$
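For completeness, we recall how this classical bound follows from the Cauchy–Schwarz inequality (the derivation is standard; see [5]). Write $F$ for the common distribution function and $X_{(N)} = \max(X_1, X_2, \ldots, X_N)$, assuming $F$ continuous (the general case is similar). Since $\int (N F^{N-1} - 1)\,dF = 0$,

$$E(X_{(N)}) - \mu = \int (x - \mu)\left(N F(x)^{N-1} - 1\right) dF(x) \le \sigma \left( \int_0^1 \left(N u^{N-1} - 1\right)^2 du \right)^{1/2} = \sigma\,\sqrt{\frac{N^2}{2N-1} - 1} = \frac{N-1}{\sqrt{2N-1}}\,\sigma.$$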
Therefore,

$$E(M_j) \le \frac{E(n_j)}{j}\,\mu + \mu + \frac{j-1}{\sqrt{2j-1}}\,\sigma = \left( \frac{1}{j} \int_{I_j} f(x)\,dx \right) n\mu + \mu + \frac{j-1}{\sqrt{2j-1}}\,\sigma. \qquad (10)$$

Finally, we consider $M_v$. Since

$$E(A_v) = E(n_v)\,\bar{s}_v\,\mu = \left( \int_0^{1/v} x f(x)\,dx \right) n\mu,$$

and the processor utilization is at least $1 - 1/v$ in the time interval $[\sigma_v, u_v]$, we get

$$E(M_v) \le \frac{E(A_v)}{1 - 1/v} + E(\kappa_v) = \frac{v}{v-1} \left( \int_0^{1/v} x f(x)\,dx \right) n\mu + E(\kappa_v).$$

For $E(\kappa_v)$, the main difficulty is that when task $T_{n_v}^{(v)}$ starts execution, the number of active tasks still in execution is unknown, and could be as large as $n_v$, the total number of tasks in $L_v$. Since $E(n_v)$ could be $\Theta(n)$, we use the following quite loose upper bound for $E(\kappa_v)$: $\kappa_v$ is no more than the maximum of $n$ random execution times, so

$$E(\kappa_v) \le \mu + \frac{n-1}{\sqrt{2n-1}}\,\sigma < \mu + \sqrt{\frac{n}{2}}\,\sigma.$$

Therefore, we get

$$E(M_v) \le \left( \frac{v}{v-1} \left( \int_0^{1/v} x f(x)\,dx \right) n + 1 \right)\mu + \sqrt{\frac{n}{2}}\,\sigma. \qquad (11)$$
Combining Equations (9)–(11), we obtain

$$E(H_v(L)) \le \left( \sum_{j=1}^{v-1} \frac{1}{j} \int_{I_j} f(x)\,dx + \frac{v}{v-1} \int_0^{1/v} x f(x)\,dx \right) n\mu + (v-1)\,\mu + \left( \sum_{j=2}^{v-1} \frac{j-1}{\sqrt{2j-1}} \right)\sigma + \sqrt{\frac{n}{2}}\,\sigma.$$

Notice that

$$\frac{j-1}{\sqrt{2j-1}} < \sqrt{\frac{j}{2}}$$

for all $j \ge 1$. Consequently,

$$\sum_{j=2}^{v-1} \frac{j-1}{\sqrt{2j-1}} < \sum_{j=2}^{v-1} \sqrt{\frac{j}{2}} < \frac{1}{\sqrt{2}} \int_2^v \sqrt{x}\,dx < \frac{\sqrt{2}}{3}\,v^{3/2}.$$

The above calculations give rise to

$$E(H_v(L)) \le \left( \sum_{j=1}^{v-1} \frac{1}{j} \int_{I_j} f(x)\,dx + \frac{v}{v-1} \int_0^{1/v} x f(x)\,dx + \frac{v-1}{n} \right) n\mu + \frac{\sqrt{2}}{3}\,v^{3/2}\mu + \sqrt{\frac{n}{2}}\,\sigma. \qquad (12)$$
Using Equations (8) and (12), we obtain

$$\frac{E(H_v(L))}{E(OPT(L))} \le \frac{1}{\alpha} \left( \sum_{j=1}^{v-1} \frac{1}{j} \int_{I_j} f(x)\,dx + \frac{v}{v-1} \int_0^{1/v} x f(x)\,dx + \frac{v-1}{n} \right) + \frac{1}{\alpha} \left( \frac{\sqrt{2}}{3} \cdot \frac{v^{3/2}}{n} + \frac{1}{\sqrt{2n}} \cdot \frac{\sigma}{\mu} \right)$$

$$\le \frac{1}{\alpha} \left( \sum_{j=1}^{v-1} \frac{1}{j} \int_{I_j} f(x)\,dx + \frac{v}{v-1} \int_0^{1/v} x f(x)\,dx \right) + O\left( \frac{v^{3/2}}{n} + \frac{1}{\sqrt{n}} \right).$$

It is clear that as $n \to \infty$, we get the asymptotic average-case performance bound in the theorem.
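The bound in Theorem 3 is easy to evaluate numerically for any given density. Below is a small sketch of ours (it assumes scipy is available) that computes the right-hand side for a density $f$ on $(0, 1]$:

```python
from scipy.integrate import quad

def theorem3_bound(f, v):
    """Evaluate the Theorem 3 bound numerically for a density f on (0, 1]."""
    num = sum(quad(f, 1.0 / (j + 1), 1.0 / j)[0] / j for j in range(1, v))
    num += v / (v - 1.0) * quad(lambda x: x * f(x), 0.0, 1.0 / v)[0]
    den = max(quad(lambda x: x * f(x), 0.0, 1.0)[0],  # mean task size
              quad(f, 0.5, 1.0)[0])                   # expected fraction in L_1
    return num / den

# Uniform task sizes on (0, 1]: the bound approaches pi^2/3 - 2 = 1.2898...
# (Theorem 4 with k = 1) as v grows.
print(theorem3_bound(lambda x: 1.0, 1024))
```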
As an example, let us consider uniform distributions, that is, the $s_i$'s are i.i.d. random variables uniformly distributed in the range $(0, 1/k]$, where $k \ge 1$ is a positive integer. That is, $f(x) = k$ for $0 < x \le 1/k$. (Notice that when $m$ is sufficiently large, a discrete uniform distribution on $\{1, 2, \ldots, m/k\}$ can be treated as a continuous uniform distribution on $(0, 1/k]$.)
Theorem 4. If the $s_i$'s are i.i.d. random variables uniformly distributed in the range $(0, 1/k]$, we have $\bar{R}_{H_v}^\infty \le \beta_k$, that is,

$$E(H_v(L)) \le \beta_k\,E(OPT(L)),$$

as $n \to \infty$, where

$$\beta_k = 2k^2 \left( \frac{\pi^2}{6} - \frac{1}{k} - \left( 1 + \frac{1}{2^2} + \cdots + \frac{1}{(k-1)^2} \right) \right),$$

as $v \to \infty$.
Proof. We examine the numerator and denominator of the bound in Theorem 3. The denominator is simply

$$\alpha = \max\left( \int_0^{1/k} x f(x)\,dx,\ \int_{1/2}^1 f(x)\,dx \right) = \frac{1}{2k},$$

and the numerator is

$$\beta = k \left( \sum_{j=k}^{v-1} \frac{1}{j^2 (j+1)} + \frac{1}{2v(v-1)} \right),$$

since $\int_{I_j} f(x)\,dx = k/(j(j+1))$ for $k \le j \le v-1$, $\int_{I_j} f(x)\,dx = 0$ for $j < k$, and $\int_0^{1/v} x f(x)\,dx = k/(2v^2)$.
Since

$$\frac{1}{j^2(j+1)} = \frac{1}{j^2} - \frac{1}{j} + \frac{1}{j+1},$$

we have

$$\sum_{j=k}^{v-1} \frac{1}{j^2(j+1)} = \left( \sum_{j=k}^{v-1} \frac{1}{j^2} \right) - \frac{1}{k} + \frac{1}{v}.$$

Note that

$$\sum_{j=k}^{v-1} \frac{1}{j^2} = \sum_{j=1}^{\infty} \frac{1}{j^2} - \sum_{j=1}^{k-1} \frac{1}{j^2} - \sum_{j=v}^{\infty} \frac{1}{j^2} \le \frac{\pi^2}{6} - \left( 1 + \frac{1}{2^2} + \cdots + \frac{1}{(k-1)^2} \right) - \frac{1}{v}.$$

Thus, we have

$$\beta \le k \left( \frac{\pi^2}{6} - \frac{1}{k} - \left( 1 + \frac{1}{2^2} + \cdots + \frac{1}{(k-1)^2} \right) + \frac{1}{2v(v-1)} \right).$$

By choosing $v$ sufficiently large, the average-case performance bound $\beta/\alpha$ can be made arbitrarily close to $\beta_k$.
When $k = 1$, the asymptotic average-case performance bound given in Theorem 4 is $\beta_1 = \pi^2/3 - 2 = 1.2898680\ldots$. To show the quality of the average-case performance bound $\beta_k$ in Theorem 4, we give the following numerical data:

$\beta_1 = 1.2898680\ldots$
$\beta_2 = 1.1594720\ldots$
$\beta_3 = 1.1088121\ldots$
$\beta_4 = 1.0823327\ldots$
$\beta_5 = 1.0661449\ldots$
$\beta_6 = 1.0552487\ldots$
$\beta_7 = 1.0474219\ldots$
$\beta_8 = 1.0415306\ldots$
$\beta_9 = 1.0369372\ldots$
$\beta_{10} = 1.0332559\ldots$

It is clear that $\beta_k < 1 + 1/k$ for all $k \ge 1$, i.e., $\beta_k$ is less than the asymptotic worst-case performance bound in Theorem 2.
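The closed form makes these values easy to reproduce; a short script of ours follows. (Direct evaluation of the $\pi^2$ closed form and series truncation can move the last printed digit.)

```python
import math

def beta(k):
    """Closed-form Theorem 4 bound for task sizes uniform on (0, 1/k]."""
    partial = sum(1.0 / (j * j) for j in range(1, k))  # 1 + 1/2^2 + ... + 1/(k-1)^2
    return 2.0 * k * k * (math.pi ** 2 / 6.0 - 1.0 / k - partial)

for k in range(1, 11):
    print(k, beta(k))   # beta(1) = pi^2/3 - 2 = 1.2898..., decreasing toward 1
```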
Though closed-form solutions are not available, the average-case performance bounds of algorithm $H_v$ can be calculated numerically using Theorem 3 for an arbitrary probability distribution of task sizes. For instance, let us consider a truncated exponential distribution, i.e.,

$$f(x) = \frac{\lambda e^{-\lambda x}}{1 - e^{-\lambda}}, \qquad 0 < x \le 1.$$
Theorem 5. If the $s_i$'s are i.i.d. random variables exponentially distributed in the range $(0, 1]$, we have $\bar{R}_{H_v}^\infty \le \beta_\lambda$, that is,

$$E(H_v(L)) \le \beta_\lambda\,E(OPT(L)),$$

as $n \to \infty$, where

$$\beta_\lambda = \frac{\displaystyle \sum_{j=1}^{\infty} \frac{1}{j} \cdot \frac{e^{-\lambda/(j+1)} - e^{-\lambda/j}}{1 - e^{-\lambda}}}{\displaystyle \frac{1}{\lambda} - \frac{e^{-\lambda}}{1 - e^{-\lambda}}},$$

as $v \to \infty$.
Proof. It can be verified by straightforward calculation that the numerator and denominator of the bound in Theorem 3 are

$$\sum_{j=1}^{v-1} \frac{1}{j} \cdot \frac{e^{-\lambda/(j+1)} - e^{-\lambda/j}}{1 - e^{-\lambda}} + \frac{v}{v-1} \cdot \frac{\dfrac{1}{\lambda}\left(1 - e^{-\lambda/v}\right) - \dfrac{1}{v}\,e^{-\lambda/v}}{1 - e^{-\lambda}}$$

and

$$\alpha = \max\left( \frac{1}{\lambda} - \frac{e^{-\lambda}}{1 - e^{-\lambda}},\ \frac{e^{-\lambda/2} - e^{-\lambda}}{1 - e^{-\lambda}} \right) = \frac{1}{\lambda} - \frac{e^{-\lambda}}{1 - e^{-\lambda}},$$

respectively. By letting $v \to \infty$, the average-case performance bound can be made arbitrarily close to $\beta_\lambda$.
To illustrate the average-case performance bound $\beta_\lambda$ in Theorem 5, we let $v = 1024$, and choose $\lambda$ in such a way that the mean task size

$$\bar{s} = \frac{1}{\lambda} - \frac{e^{-\lambda}}{1 - e^{-\lambda}}$$

takes the values $1/(2k)$ for $k = 1, 2, \ldots, 10$, so that a comparison can be made between the performance bounds of $H_v$ under the uniform and the exponential distributions:

$\lambda = 0.0001813\ldots$, $\beta_\lambda = 1.2898305\ldots$
$\lambda = 3.5935119\ldots$, $\beta_\lambda = 1.2731064\ldots$
$\lambda = 5.9030000\ldots$, $\beta_\lambda = 1.215755\ldots$
$\lambda = 7.9781077\ldots$, $\beta_\lambda = 1.16\ldots$
$\lambda = 9.9954411\ldots$, $\beta_\lambda = 1.1273\ldots$
$\lambda = 11.9991145\ldots$, $\beta_\lambda = 1.1021181\ldots$
$\lambda = 13.9998370\ldots$, $\beta_\lambda = 1.084\ldots$
$\lambda = 15.9999711\ldots$, $\beta_\lambda = 1.0722076\ldots$
$\lambda = 17.9999950\ldots$, $\beta_\lambda = 1.0629307\ldots$
$\lambda = 19.9999991\ldots$, $\beta_\lambda = 1.0557653\ldots$

As shown in the above list, $\beta_\lambda$ is slightly smaller than $\pi^2/3 - 2$ when $\bar{s} = 0.5$, i.e., $k = 1$, due to distribution imbalance in $(0, 1]$. However, $\beta_\lambda$ is larger than $\beta_k$ for all $k > 1$, because $E(n_1)$ is never null.
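The quantities above can be recomputed from the two formulas in Theorem 5. Here is a sketch of ours (scipy's root finder is an assumption; for $k = 1$ the mean $1/2$ is reached only in the limit $\lambda \to 0$, so a tiny $\lambda$ is used directly). Because the series converges slowly, the cutoff matters in the last couple of digits.

```python
import math
from scipy.optimize import brentq

def mean_size(lam):
    """Mean of the truncated exponential density on (0, 1]."""
    return 1.0 / lam - math.exp(-lam) / (1.0 - math.exp(-lam))

def beta_exp(lam, cutoff=200000):
    """Theorem 5 bound, with the infinite series summed up to a cutoff."""
    e = math.exp
    s = sum((e(-lam / (j + 1)) - e(-lam / j)) / j for j in range(1, cutoff))
    return s / (1.0 - e(-lam)) / mean_size(lam)

print(1, beta_exp(0.0001813))          # mean ~ 1/2, bound ~ 1.2898
for k in range(2, 11):                 # solve mean_size(lam) = 1/(2k)
    lam = brentq(lambda x: mean_size(x) - 1.0 / (2 * k), 0.001, 50.0)
    print(k, round(lam, 7), beta_exp(lam))
```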
5 Conclusions

We have studied the problem of scheduling independent nonmalleable parallel tasks in parallel systems with identical processors. We proposed a simple approximation algorithm called $H_v$, and performed combinatorial analysis of its worst-case performance and probabilistic analysis of its average-case performance. In particular, we proved the following results. (1) The asymptotic worst-case performance ratio $R_{H_v}^\infty$ is in the range $[1\tfrac{2}{3}, 1\tfrac{13}{18}]$. (2) If the numbers of processors requested by the tasks are uniformly distributed i.i.d. random variables and task execution times are i.i.d. random variables with finite mean and variance, then the average-case performance ratio is $\bar{R}_{H_v}^\infty \le 1.2898680\ldots$; in other words, less than 22.5% of the allocated computing power is wasted. (3) Both the worst- and average-case performance ratios improve significantly when tasks request smaller numbers of processors. (4) Results similar to (2)–(3) also hold for the truncated exponential distribution of task sizes.
Acknowledgements

The author wishes to express his gratitude to the editor and two anonymous referees for their comments on improving the paper. This research was supported in part by the National Aeronautics and Space Administration and the Research Foundation of State University of New York through the NASA/University Joint Venture in Space Science Program under Grant NAG8-1313.
References
[1] B.S. Baker, D.J. Brown, and H.P. Katseff, “A 5/4 algorithm for two-dimensional packing,” Journal
of Algorithms 2, 348-368, 1981.
[2] B.S. Baker and J.S. Schwarz, “Shelf algorithms for two-dimensional packing problems,” SIAM Journal on Computing 12, 508-525, 1983.
[3] J. Blazewicz, J.K. Lenstra, and A.H.G. Rinnooy Kan, “Scheduling subject to resource constraints:
classification and complexity,” Discrete Applied Mathematics 5, 11-24, 1983.
[4] E.G. Coffman, Jr., D.S. Johnson, P.W. Shor, and G.S. Lueker, “Probabilistic analysis of packing and
related partitioning problems,” in Probability and Algorithms, 87-107, National Research Council,
1992.
[5] H.A. David, Order Statistics, John Wiley & Sons, 1970.
[6] M.R. Garey and R.L. Graham, “Bounds for multiprocessor scheduling with resource constraints,” SIAM Journal on Computing 4, 187-200, 1975.
[7] I. Golan, “Performance bounds for orthogonal oriented two-dimensional packing algorithms,” SIAM
Journal on Computing 10, 571-582, 1981.
[8] R. Krishnamurti and E. Ma, “An approximation algorithm for scheduling tasks on varying partition
sizes in partitionable multiprocessor systems,” Technical Report RC 15900, IBM Research Division,
1990.
[9] D.D. Sleator, “A 2.5 times optimal algorithm for packing in two dimensions,” Information Processing Letters 10, 37-40, 1980.
[10] J. Turek, J.L. Wolf, K.R. Pattipati, and P.S. Yu, “Scheduling parallelizable tasks: putting it all on
the shelf,” Proc. International Conference on Measurement and Modeling of Computer Systems,
225-236, 1992.
Appendix. Proof of Equation (7)

The following fact is used in the proof. If

$$\phi(x) = \frac{ax + b}{cx + d},$$

where $a, b, c, d, x > 0$, then $\phi(x)$ is an increasing (decreasing) function of $x$ if $ad - bc > 0$ ($ad - bc < 0$), since $\phi'(x) = (ad - bc)/(cx + d)^2$. Let $R$ denote the left-hand side of Equation (7). We consider three cases.

Case 1. $\tau_1$ is the maximum. Since $\tau_1 \ge \frac{1}{2}\tau_1 + \tau_2$, we have $\tau_2 \le \frac{1}{2}\tau_1$. Since $\tau_1 \ge \frac{1}{2}\tau_1 + \frac{2}{3}\tau_2 + \frac{3}{4}\zeta$, we have $\zeta \le \frac{4}{3}\left(\frac{1}{2}\tau_1 - \frac{2}{3}\tau_2\right)$. Hence,

$$R \le 1 + \frac{\tau_2}{\tau_1} + \frac{\zeta}{\tau_1} \le 1 + \frac{\tau_2}{\tau_1} + \frac{4}{3}\left( \frac{1}{2} - \frac{2}{3} \cdot \frac{\tau_2}{\tau_1} \right) = 1 + \frac{2}{3} + \frac{1}{9} \cdot \frac{\tau_2}{\tau_1} \le 1 + \frac{2}{3} + \frac{1}{9} \cdot \frac{1}{2} = 1\tfrac{13}{18}.$$

Case 2. $\frac{1}{2}\tau_1 + \tau_2$ is the maximum. Since $\tau_1 \le \frac{1}{2}\tau_1 + \tau_2$, we have $\tau_1 \le 2\tau_2$. Since $\frac{1}{2}\tau_1 + \tau_2 \ge \frac{1}{2}\tau_1 + \frac{2}{3}\tau_2 + \frac{3}{4}\zeta$, we get $\zeta \le \frac{4}{9}\tau_2$. Now,

$$R \le \frac{\tau_1 + \tau_2 + \zeta}{\frac{1}{2}\tau_1 + \tau_2} \le \frac{\tau_1 + \frac{13}{9}\tau_2}{\frac{1}{2}\tau_1 + \tau_2},$$

which is an increasing function of $\tau_1$. Hence $R$ takes its maximum value $1\tfrac{13}{18}$ when $\tau_1 = 2\tau_2$.

Case 3. $\frac{1}{2}\tau_1 + \frac{2}{3}\tau_2 + \frac{3}{4}\zeta$ is the maximum. Since $\frac{1}{2}\tau_1 + \tau_2 \le \frac{1}{2}\tau_1 + \frac{2}{3}\tau_2 + \frac{3}{4}\zeta$, we get $\zeta \ge \frac{4}{9}\tau_2$. Note that

$$R \le \frac{\zeta + \tau_1 + \tau_2}{\frac{3}{4}\zeta + \frac{1}{2}\tau_1 + \frac{2}{3}\tau_2}$$

is a decreasing function of $\zeta$. Thus, $R$ gets its maximum value when $\zeta = \frac{4}{9}\tau_2$, i.e.,

$$R \le \frac{\tau_1 + \frac{13}{9}\tau_2}{\frac{1}{2}\tau_1 + \tau_2},$$

which is an increasing function of $\tau_1$. Since $\tau_1 \le \frac{1}{2}\tau_1 + \frac{2}{3}\tau_2 + \frac{3}{4}\zeta$, and $\zeta = \frac{4}{9}\tau_2$, we have $\tau_1 \le 2\tau_2$. Hence $R$ reaches its maximum value $1\tfrac{13}{18}$ when $\tau_1 = 2\tau_2$.
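Equation (7) is also easy to sanity-check numerically; the following exhaustive search over rational grid points (our sketch, using exact arithmetic) finds the maximum $1\tfrac{13}{18} = 31/18$, attained for example at $\tau_1 = 18$, $\tau_2 = 9$, $\zeta = 4$:

```python
import itertools
from fractions import Fraction

def ratio7(t1, t2, z):
    """Left-hand side of Equation (7), computed exactly."""
    num = t1 + t2 + z
    den = max(t1, t1 / 2 + t2, t1 / 2 + 2 * t2 / 3 + 3 * z / 4)
    return num / den

best = max(ratio7(Fraction(a), Fraction(b), Fraction(c))
           for a, b, c in itertools.product(range(25), repeat=3)
           if (a, b, c) != (0, 0, 0))
print(best)   # 31/18 = 1.7222..., e.g. at (t1, t2, z) = (18, 9, 4)
```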