Future Generation Computer Systems (FGCS), journal homepage: www.elsevier.com/locate/fgcs
Saeid Abrishami, Mahmoud Naghibzadeh, Dick H.J. Epema
Tai, Yu-Chang
4/29/2013
* Clouds differ from utility Grids in three ways:
- on-demand resource provisioning
- homogeneous networks
- the pay-as-you-go pricing model
* Goal: consider the benefits of using Cloud computing for executing scientific workflows
- several commercial Clouds exist, such as Amazon
* Infrastructure as a Service (IaaS) Clouds have some potential benefits for executing scientific workflows:
1. users can dynamically obtain and release resources on demand and are charged on a pay-as-you-go basis
2. resource provisioning
3. the illusion of unlimited resources
* Important parameter: economic cost
- faster resources are more expensive than slower ones
- there is a time-cost tradeoff in selecting appropriate services
- the problem belongs to the class of multi-criteria optimization problems
* Objective: minimize the execution cost of the workflow while completing the workflow before the user-specified deadline
* Two algorithms:
- IaaS Cloud Partial Critical Paths (IC-PCP)
- IaaS Cloud Partial Critical Paths with Deadline Distribution (IC-PCPD2)
* An application is modeled by a directed acyclic graph G(T, E)
* T is a set of n tasks {t_1, t_2, ..., t_n}
* E is a set of dependencies e_{i,j} = (t_i, t_j)
* Two dummy tasks, t_entry and t_exit, are added to the beginning and the end of the workflow; they have zero execution time and are connected by zero-weight dependencies to the actual entry and exit tasks
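The dummy-task construction can be sketched as follows. This is only an illustration: the function name `add_dummy_tasks` and the dictionary-based edge representation are assumptions made for this example, not part of the paper.

```python
def add_dummy_tasks(tasks, edges):
    """Add zero-weight t_entry / t_exit nodes to a workflow DAG.

    tasks: iterable of task names (hypothetical representation).
    edges: dict mapping (parent, child) -> data transfer time.
    """
    tasks = list(tasks)
    has_parent = {child for (_, child) in edges}
    has_child = {parent for (parent, _) in edges}
    new_edges = dict(edges)
    for t in tasks:
        if t not in has_parent:   # an actual entry task
            new_edges[("t_entry", t)] = 0
        if t not in has_child:    # an actual exit task
            new_edges[(t, "t_exit")] = 0
    return ["t_entry", *tasks, "t_exit"], new_edges


tasks, edges = add_dummy_tasks(["a", "b", "c"], {("a", "b"): 2, ("a", "c"): 1})
print(tasks)
print(edges)
```

With a single entry and a single exit node, later steps can start from t_exit and stop at t_entry without special cases.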
* Services S = {s_1, s_2, ..., s_m} have different QoS parameters, such as CPU type and memory size, and different prices
* The pricing model is pay-as-you-go, similar to current commercial Clouds: users are charged for the number of time intervals in which they have used the resource, even if they have not completely used the last time interval
* Example service prices: c_1 = 5, c_2 = 2, c_3 = 1
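A minimal sketch of this interval-based charging follows. The function name `lease_cost` and the default interval length of 10 time units are assumptions for illustration (the figure timelines are marked 10, 20, 30, but the slides do not state the interval length).

```python
import math


def lease_cost(start, end, price_per_interval, interval=10):
    """Cost of one instance leased from `start` to `end`.

    A partially used last interval is charged in full.
    `interval` is an assumed length, not a value stated in the slides.
    """
    used = end - start
    if used <= 0:
        return 0
    return math.ceil(used / interval) * price_per_interval


# An instance of a service priced at 2 per interval, busy from 0 to 28,
# occupies three intervals of length 10.
print(lease_cost(0, 28, price_per_interval=2))
```

Rounding the lease up to whole intervals is what makes the unused remainder of the last interval "free" for further tasks.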
* ET(t_i, s_j): execution time of task t_i on computation service s_j
* The average bandwidth between the computation services is roughly equal
* TT(e_{i,j}): data transfer time of a dependency e_{i,j}
* MET(t_i): Minimum Execution Time of a task t_i
- the execution time of t_i on the service s_j ∈ S that has the minimum ET(t_i, s_j) among all available services
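The sketch below computes MET and, from it, earliest start, earliest finish and latest finish times. It assumes the usual forward pass (a task can start once the data of all parents has arrived) and backward pass (a task must finish early enough for every child to meet its own latest finish time). The names `met` and `est_eft_lft` and the small three-task workflow are hypothetical.

```python
def met(et, task):
    """Minimum execution time of `task` over all services."""
    return min(et[task].values())


def est_eft_lft(order, parents, children, et, tt, deadline):
    """order: tasks in topological order; tt: (parent, child) -> transfer time."""
    est, eft, lft = {}, {}, {}
    for t in order:  # forward pass
        est[t] = max((eft[p] + tt[(p, t)] for p in parents[t]), default=0)
        eft[t] = est[t] + met(et, t)
    for t in reversed(order):  # backward pass
        lft[t] = min(
            (lft[c] - met(et, c) - tt[(t, c)] for c in children[t]),
            default=deadline,
        )
    return est, eft, lft


# Hypothetical workflow: a -> b and a -> c, deadline 20.
et = {"a": {"s1": 2, "s2": 5}, "b": {"s1": 3, "s2": 4}, "c": {"s1": 4, "s2": 6}}
tt = {("a", "b"): 1, ("a", "c"): 2}
parents = {"a": [], "b": ["a"], "c": ["a"]}
children = {"a": ["b", "c"], "b": [], "c": []}
est, eft, lft = est_eft_lft(["a", "b", "c"], parents, children, et, tt, 20)
print(est, eft, lft)
```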
* SS(t_i) = s_{j,k}: Selected Service for each scheduled task t_i
* s_{j,k}: the kth instance of service s_j
* AST(t_i): Actual Start Time of t_i
* Assigned node: a node that has already been assigned to (scheduled on) a service
* Critical Parent: the Critical Parent of a node t_i is the unassigned parent of t_i that has the latest data arrival time at t_i, that is, the parent t_p of t_i for which EFT(t_p) + TT(e_{p,i}) is maximal
* PCP: the Partial Critical Path of a node t_i is:
- empty if t_i does not have any unassigned parents
- otherwise, the Critical Parent t_p of t_i together with the Partial Critical Path of t_p
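The two definitions above translate almost directly into code. The following is a sketch under assumed data structures (dictionaries for `parents`, `eft` and `tt`, a set for `assigned`); the function names and the toy workflow are hypothetical.

```python
def critical_parent(task, parents, eft, tt, assigned):
    """Unassigned parent with the latest data arrival time, or None."""
    candidates = [p for p in parents[task] if p not in assigned]
    if not candidates:
        return None
    return max(candidates, key=lambda p: eft[p] + tt[(p, task)])


def partial_critical_path(task, parents, eft, tt, assigned):
    """Follow critical parents upward until no unassigned parent remains."""
    path = []
    current = critical_parent(task, parents, eft, tt, assigned)
    while current is not None:
        path.insert(0, current)  # keep the path in execution order
        current = critical_parent(current, parents, eft, tt, assigned)
    return path


# Hypothetical workflow: a -> c -> exit and b -> d -> exit.
parents = {"exit": ["c", "d"], "c": ["a"], "d": ["b"], "a": [], "b": []}
eft = {"a": 2, "b": 5, "c": 8, "d": 9}
tt = {("a", "c"): 1, ("b", "d"): 1, ("c", "exit"): 0, ("d", "exit"): 0}

first = partial_critical_path("exit", parents, eft, tt, assigned=set())
second = partial_critical_path("exit", parents, eft, tt, assigned={"b", "d"})
print(first, second)
```

Once the nodes of the first path are marked as assigned, calling the function again on the same node yields the next partial critical path.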
IC-PCP
[Figure sequence: IC-PCP applied step by step to a sample workflow of nine tasks with deadline D = 30, drawn against service timelines marked 10, 20, 30. Each task is labeled "earliest start~earliest finish __ latest finish time", followed by its execution times on s_1, s_2, s_3.]
* Initial values (start~finish __ latest finish time; execution times on s_1, s_2, s_3):
- t_1: 0~2 __ 19; 2, 5, 8
- t_2: 0~5 __ 16; 5, 12, 16
- t_3: 0~3 __ 16; 3, 5, 9
- t_4: 3~7 __ 24; 4, 6, 10
- t_5: 7~10 __ 23; 3, 8, 11
- t_6: 7~11 __ 22; 4, 8, 11
- t_7: 8~13 __ 30; 5, 8, 11
- t_8: 14~17 __ 30; 3, 6, 8
- t_9: 14~19 __ 30; 5, 8, 14
* Scheduling steps shown in the figures:
1. Path{t2,t6,t9} on instance s_{2,1}: t_2 0~12, t_6 12~20, t_9 20~28
2. Path{t3} on instance s_{3,1}: t_3 0~9
3. Path{t5,t8} on instance s_{2,2}: t_5 14~22, t_8 22~28
4. Path{t1,t4} on instance s_{3,2}: t_1 0~8, t_4 8~18
5. Path{t7} on instance s_{3,3}: t_7 18~29
* COST = 2*5 + 1*4 = 14
* Applicable
* applicable instance for a path if it satisfies two conditions:
- The path can be scheduled on the instance such that each task of the path is finished before its latest finish time
- The new schedule uses (a part of) the extra time of the instance,which is the remaining time of the last time interval of thatinstance.
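The two conditions can be sketched as a simple check. This is a deliberately simplified illustration, not the paper's procedure: it only tries to append the path after the instance's current busy period, takes each task's earliest start time as given, and assumes an interval length of 10. The name `is_applicable` and all parameters are hypothetical.

```python
import math


def is_applicable(inst_start, inst_end, path, est, lft, et_on_service, interval=10):
    """Check both applicability conditions for appending `path` to an instance.

    inst_start, inst_end: current busy period of the instance.
    est, lft, et_on_service: dicts keyed by task name.
    """
    if not path:
        return False
    # End of the last interval that has already been paid for.
    paid_until = inst_start + math.ceil((inst_end - inst_start) / interval) * interval
    clock = inst_end
    first_start = None
    for t in path:
        start = max(clock, est[t])
        if first_start is None:
            first_start = start
        finish = start + et_on_service[t]
        if finish > lft[t]:  # condition 1: every task meets its latest finish time
            return False
        clock = finish
    # Condition 2: the new schedule uses part of the instance's extra time.
    return first_start < paid_until


# Instance busy from 0 to 14, so it is paid for until 20.
print(is_applicable(0, 14, ["x"], {"x": 15}, {"x": 25}, {"x": 6}))
```

Reusing an applicable instance is attractive because the extra time in its last interval has already been paid for.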
[Figure: algorithm listing; legible annotations: Cost = zero; Call PLANNING(G(T,E))]
IC-PCPD2
[Figure sequence: IC-PCPD2 applied step by step to the nine-task sample workflow with deadline D = 30, drawn against service timelines marked 10, 20, 30. Each task is labeled "start~finish __ subdeadline", followed by its execution times on s_1, s_2, s_3.]
* Assign a subdeadline to each PCP node (assigned node); t_entry has sb = 0
- subdeadlines shown: t_1 = 6, t_2 = 7, t_3 = 16, t_4 = 24, t_5 = 13, t_6 = 17, t_7 = 30, t_8 = 30, t_9 = 30
* Tasks are then scheduled one at a time, as shown in the figures:
- t_1 on s_{2,1}: 0~5
- t_2 on s_{1,1}: 0~5
- t_3 on s_{3,1}: 0~9
- t_4 on s_{3,2}: 6~16
- t_5 on s_{1,1}: 7~10
- t_6 on s_{1,2}: 11~15
- t_7 on s_{3,2}: 16~28
- t_8 on s_{3,3}: 17~25
- t_9 on s_{2,2}: 18~26
* COST = 5*2 + 2*2 + 1*4 = 18
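* Time complexity of IC-PCP: O(n^2)
[Figure: IC-PCP pseudocode annotated with per-step costs: O(n+e)~O(n^2), O(n), O(n-1), O(n^2), O(m*n) = O(n^2)]
* Time complexity of IC-PCPD2: O(n^2)
[Figure: IC-PCPD2 pseudocode annotated with per-step costs: Call PLANNING(G(T,E)); O(n^2); O(n^2); Assign subdeadline on PCP node: O(n)]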
* Algorithms compared: Algo1 = IC-PCP, Algo2 = IC-PCPD2, Algo3 = IC-LOSS
* Fastest schedule: schedule each workflow task on a distinct instance of the fastest computation service, with all data transmission times taken as zero
- M_F = makespan of the Fastest schedule
* Deadline factor α: the deadline is set to α · M_F
- since the problem has no solution for α = 1, α ranges from 1.5 to 5 in the experiments, with a step length of 0.5
* Cheapest schedule: schedule all workflow tasks on a single instance of the cheapest computation service
- used to normalize the total cost of each workflow execution
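The experimental parameters can be generated as in the sketch below. The function names are hypothetical, and dividing the total cost by the cost of the Cheapest schedule is an assumption about how the normalization is done; the slides only say that the Cheapest schedule is used to normalize.

```python
def deadlines(mf, lo=1.5, hi=5.0, step=0.5):
    """Deadlines alpha * M_F for alpha = lo, lo + step, ..., hi."""
    n = int(round((hi - lo) / step))
    return [(lo + i * step) * mf for i in range(n + 1)]


def normalized_cost(total_cost, cheapest_cost):
    """Total cost relative to the Cheapest schedule (assumed normalization)."""
    return total_cost / cheapest_cost


# Hypothetical fastest makespan of 10 time units.
print(deadlines(10))
print(normalized_cost(14, 7))
```

Indexing the deadline factors by an integer counter avoids accumulating floating-point error from repeated addition of the step.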
[Figure: result plots; algorithm ranking in each of the five panels (1 = IC-PCP, 2 = IC-PCPD2, 3 = IC-LOSS): 1 > 2 > 3, 1 > 2 > 3, 1 ≈ 2 > 3, 1 > 2 > 3, 2 > 1 > 3]
* The new algorithms consider the main features of current commercial Clouds, such as on-demand resource provisioning, homogeneous networks, and the pay-as-you-go pricing model
* The time complexity of both algorithms is O(n^2); this polynomial time complexity makes them suitable for large workflows
* IC-PCP outperforms both IC-PCPD2 and IC-LOSS in most cases
* Experiments show that the computation times of the algorithms are very low: less than 500 ms for the large workflows
* Future work: the authors intend to improve the algorithms for real Cloud environments