Performance and Availability Models for
IaaS Cloud and Their Applications
Rahul Ghosh
Duke High Availability Assurance Lab
Dept. of Electrical and Computer Engineering
Duke University, Durham, NC 27708
www.ee.duke.edu/~rg51
Collaborators: Vijay K. Naik, Murthy Devarakonda (IBM),
Kishor S. Trivedi, DongSeong Kim and Francesco Longo (Duke)
IBM Student Workshop for Frontiers of Cloud Computing
Hawthorne, NY, USA
September 10, 2010
Introduction
Key problems of interest:
- Characterize cloud services as a function of arrival rate, available capacity, service requirements, and failure properties
- Apply these characteristics to SLA analysis and management, admission control, cloud capacity planning, and cloud economics
Approach:
- Performability (performance + availability) analysis
- We use an approach based on interacting stochastic sub-models
  • Lower cost of solving the models while covering a large parameter space, compared to measurement-based analysis
Two key quality-of-service measures for an IaaS cloud:
(1) service availability and (2) provisioning response delay
Novelty of our approach
Single monolithic model vs. interacting sub-models approach
- Even in a simple case of 6 physical machines and 1 virtual machine per physical machine, a monolithic model has 126720 states.
- In contrast, our interacting sub-models approach has only 41 states.
Clearly, for a real cloud, a naïve monolithic modeling approach leads to a very large analytical model that is practically impossible to solve.
The interacting sub-models approach is scalable, tractable, and of high fidelity.
Also, adding a new feature in the interacting sub-models approach does not require reconstructing the entire model.
What are the different sub-models? How do they interact?
System model
Main assumptions:
- All requests are homogeneous: each request is for one virtual machine (VM) of fixed size (CPU cores, RAM, disk capacity).
- We use the term “job” to denote a user request for provisioning a VM.
- Submitted requests are served on a first-come, first-served (FCFS) basis by the resource provisioning decision engine (RPDE).
- If a request can be accepted, it goes to a specific physical machine (PM) for VM provisioning. After getting the VM, the job runs in the cloud and releases the VM when it finishes.
- To reduce the cost of operations, PMs can be grouped into multiple pools. We assume three pools: hot (running, with VMs instantiated), warm (turned on, but VMs not instantiated) and cold (turned off).
- All PMs in a particular type of pool are identical.
Life-cycle of a job inside an IaaS cloud
[Figure: life-cycle of a job. Arrival -> queuing and admission control (job rejection due to buffer full) -> provisioning decision by the Resource Provisioning Decision Engine (job rejection due to insufficient capacity) -> instantiation: VM deployment and instance creation -> actual service: run-time execution -> out. A span labeled "Provisioning response delay" is marked on the diagram.]
Provisioning and servicing steps:
(i) resource provisioning decision,
(ii) VM provisioning and
(iii) run-time execution
We translate these steps
into analytical sub-models
Resource provisioning decision
A request is provisioned on a hot PM if a pre-instantiated but unassigned VM exists. If none exists, a PM from the warm pool is used. If all warm machines are busy, a PM from the cold pool is used.
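The hot, then warm, then cold search order can be sketched in a few lines. This is only an illustration of the rule as stated on the slide. The function name `choose_pool` and its three counters are hypothetical, and returning `None` stands for a job dropped due to insufficient capacity.

```python
from typing import Optional


def choose_pool(free_hot_vms: int, free_warm_pms: int, free_cold_pms: int) -> Optional[str]:
    """Return the pool that serves the next job, or None if the job is dropped.

    All three arguments are hypothetical counters, not names from the slides.
    """
    if free_hot_vms > 0:
        # A pre-instantiated but unassigned VM exists on a hot PM.
        return "hot"
    if free_warm_pms > 0:
        # No hot capacity is left, so fall back to the warm pool.
        return "warm"
    if free_cold_pms > 0:
        # All warm machines are busy, so fall back to the cold pool.
        return "cold"
    # No pool can take the job: rejection due to insufficient capacity.
    return None


print(choose_pool(0, 2, 5))  # prints "warm"
```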
Resource provisioning decision model
Continuous Time Markov Chain (CTMC)
Provisioning decision of a single job
i = number of jobs in the queue, s = pool (hot, warm or cold)
Output measures
- Job rejection probability due to buffer full (Pblock)
- Job rejection probability due to insufficient capacity (Pdrop)
- Total job rejection probability (Preject = Pblock + Pdrop)
These are computed with a reward rate based approach (attach a reward rate to each state of the Markov chain).
- Mean queuing delay (E[Tq_dec]), via Little’s law (connecting the mean number in the queue with the mean waiting time)
- Mean decision delay (E[Tdecision]), via a 3-stage Coxian distribution
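To make the reward-rate approach and Little's law concrete, here is a minimal sketch on a much simpler chain than the one in the slides: an M/M/1/K queue, which is an assumption chosen only because its steady state has a closed form. The rates `lam` and `mu` and the capacity `k` are hypothetical values.

```python
def steady_state_mm1k(lam, mu, k):
    """Steady-state probabilities of an M/M/1/K queue (state n = jobs in the system)."""
    rho = lam / mu
    weights = [rho ** n for n in range(k + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_reward(pi, reward):
    """Reward-rate approach: sum of (state probability x reward rate of that state)."""
    return sum(p * reward(n) for n, p in enumerate(pi))


# Hypothetical parameters: 4 arrivals and 5 services per time unit, room for 3 jobs.
lam, mu, k = 4.0, 5.0, 3
pi = steady_state_mm1k(lam, mu, k)

# Reward rate 1 in the "buffer full" state, 0 elsewhere: blocking probability.
p_block = expected_reward(pi, lambda n: 1.0 if n == k else 0.0)

# Reward rate = number of waiting jobs (one job is in service): mean queue length.
mean_waiting_jobs = expected_reward(pi, lambda n: max(n - 1, 0))

# Little's law with the rate of jobs that actually enter the queue.
mean_queuing_delay = mean_waiting_jobs / (lam * (1.0 - p_block))

print(p_block, mean_queuing_delay)
```

The same two steps (attach a reward to each state, then divide a mean count by the effective arrival rate) carry over to larger chains once their steady-state probabilities are available.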
VM provisioning
VM provisioning model
[Figure: VM provisioning model. Labels: Resource Provisioning Decision Engine, accepted jobs, hot PM, hot pool, warm pool, cold pool, idle resources in hot/warm/cold machine, running VMs, service out.]
VM provisioning model for each hot PM
[Figure: CTMC state-transition diagram for a hot PM, with states (i, j, k) ranging from (0,0,0) to (Lh,0,m).]
Lh is the buffer size and m is the maximum number of VMs that can run simultaneously on a PM.
i = number of jobs in the queue, j = number of VMs being provisioned, k = number of VMs running
VM provisioning model for each warm PM
[Figure: CTMC state-transition diagram for a warm PM, with states (i, j, k) ranging from (0,0,0) to (Lw,0,m). Besides j = 0 and j = 1, the diagram has states with j marked 1* and 1**, for example (0,1*,0), (0,1**,0), (Lw,1*,0) and (Lw,1**,0).]
Output measures from VM provisioning models
Probability that a job can be accepted in the hot/warm/cold pool (Ph/Pw/Pc)
Weighted mean queuing delay for VM provisioning (E[Tq_vm])
Weighted mean provisioning delay (E[Tprov])
Run-time execution
Run-time model
Model outputs: Mean job service time / resource holding time
Output measures from pure performance models
All of these models are pure performance models, since they do not consider failures.
Output of the resource provisioning decision model:
- Job rejection probability due to buffer full (Pblock)
- Job rejection probability due to insufficient capacity (Pdrop)
- Mean queuing delay (E[Tq_dec])
- Mean decision delay (E[Tdecision])
Output of the VM provisioning models:
- Probability that at least one machine in the hot/warm/cold pool can accept a job for provisioning, denoted by Ph, Pw and Pc respectively
- Weighted mean queuing delay for VM provisioning (E[Tq_vm])
- Weighted mean provisioning delay (E[Tprov])
Output of the run-time model:
- Mean job service time
Overall output of the pure performance models:
- Total job rejection probability (Preject = Pblock + Pdrop)
- Net mean response delay (E[Tresp] = E[Tq_dec] + E[Tdecision] + E[Tq_vm] + E[Tprov])
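The two overall measures are plain sums of the sub-model outputs, which the following sketch spells out. The function names are hypothetical, and so are the input numbers: they are chosen only so that the four delays add up to the 303.79 s reported later for 250 jobs/hr with PM distribution (90, 0, 0), and the split between the first three terms is not taken from the slides.

```python
def total_rejection_probability(p_block, p_drop):
    """Preject = Pblock + Pdrop."""
    return p_block + p_drop


def net_mean_response_delay(t_q_dec, t_decision, t_q_vm, t_prov):
    """E[Tresp] = E[Tq_dec] + E[Tdecision] + E[Tq_vm] + E[Tprov], all in seconds."""
    return t_q_dec + t_decision + t_q_vm + t_prov


# Hypothetical sub-model outputs (seconds); only t_prov = 300.00 appears in the slides.
t_resp = net_mean_response_delay(t_q_dec=0.5, t_decision=3.0, t_q_vm=0.29, t_prov=300.0)
p_reject = total_rejection_probability(p_block=0.01, p_drop=0.02)

print(round(t_resp, 2), round(p_reject, 2))
```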
Availability model
Model outputs: Probability that the cloud service is available, downtime
in minutes per year
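The two outputs are related by a simple conversion: yearly downtime is the unavailable fraction of a year expressed in minutes. A minimal sketch, assuming a 365-day year; the function name and the example availability of 0.999 are illustrative and not results from the slides.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525600, assuming a 365-day year


def downtime_minutes_per_year(availability):
    """Convert steady-state availability (a probability) into downtime per year."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a probability between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


# Illustrative input: "three nines" availability.
print(round(downtime_minutes_per_year(0.999), 1))  # prints 525.6
```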
Model interactions: Performability
Numerical Results
Effect of increasing job arrival rate
Effect of increasing job service time
Effect of increasing # VMs
Effect of increasing MTTF of a PM
Applications of the models
Admission control
Distribution of PMs across different pools (all delays are in seconds)

Arrival rate  (15, 15, 15)         (30, 30, 30)         (45, 45, 0)          (90, 0, 0)
(jobs/hr)     E[Tresp]   E[Tprov]  E[Tresp]   E[Tprov]  E[Tresp]   E[Tprov]  E[Tresp]   E[Tprov]
250           484.37     477.83    314.26     310.27    304.03     300.24    303.79     300.00
500           697.98     656.92    354.87     347.83    312.00     306.62    305.14     300.00
550           5146.12    666.07    363.95     355.66    315.00     309.06    305.54     300.00
600           13825.85   670.52    373.99     364.03    318.42     311.80    306.00     300.00

Increasing the arrival rate increases the response delay. Adding more PMs reduces this delay.
What is the maximum job arrival rate that can be supported by the cloud service?
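One way to read the table as an admission control aid is to look up the largest tabulated arrival rate whose mean response delay stays within a limit. The sketch below does only that: the E[Tresp] values are copied from the table, while the function name and the 700 s limit are hypothetical, and rates between the tabulated points are not interpolated.

```python
# E[Tresp] in seconds, keyed by PM distribution and then by arrival rate (jobs/hr).
MEAN_RESPONSE_DELAY = {
    (15, 15, 15): {250: 484.37, 500: 697.98, 550: 5146.12, 600: 13825.85},
    (30, 30, 30): {250: 314.26, 500: 354.87, 550: 363.95, 600: 373.99},
    (45, 45, 0): {250: 304.03, 500: 312.00, 550: 315.00, 600: 318.42},
    (90, 0, 0): {250: 303.79, 500: 305.14, 550: 305.54, 600: 306.00},
}


def max_admissible_rate(pools, max_delay_s):
    """Largest tabulated arrival rate with E[Tresp] <= max_delay_s, or None."""
    best = None
    for rate in sorted(MEAN_RESPONSE_DELAY[pools]):
        if MEAN_RESPONSE_DELAY[pools][rate] > max_delay_s:
            break  # delay grows with the arrival rate, so stop at the first violation
        best = rate
    return best


# Hypothetical limit of 700 seconds on the mean response delay.
print(max_admissible_rate((15, 15, 15), 700.0))  # prints 500
print(max_admissible_rate((30, 30, 30), 700.0))  # prints 600
```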
Response time – energy trade-off
[Table: identical to the table under "Admission control": E[Tresp] and E[Tprov] in seconds for arrival rates of 250, 500, 550 and 600 jobs/hr and PM distributions (15, 15, 15), (30, 30, 30), (45, 45, 0) and (90, 0, 0).]
Increasing capacity reduces the gap between the actual provisioning delay and the response delay.
What is the optimal number of PMs in each pool that minimizes response time for a given energy budget?
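The shrinking gap can be checked directly from the tabulated values. The sketch below takes the row for 500 jobs/hr and subtracts E[Tprov] from E[Tresp] for each PM distribution; the variable names are illustrative, and the numbers are those in the table.

```python
# (E[Tresp], E[Tprov]) in seconds at 500 jobs/hr, keyed by PM distribution.
ROW_500 = {
    (15, 15, 15): (697.98, 656.92),
    (30, 30, 30): (354.87, 347.83),
    (45, 45, 0): (312.00, 306.62),
    (90, 0, 0): (305.14, 300.00),
}

# Gap between the mean response delay and the mean provisioning delay.
gaps = {pools: round(t_resp - t_prov, 2) for pools, (t_resp, t_prov) in ROW_500.items()}

for pools, gap in gaps.items():
    print(pools, gap)
```

The gap falls from 41.06 s for (15, 15, 15) to 5.14 s for (90, 0, 0).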
SLA driven capacity planning
What should the size of each pool be so that the total cost is minimized and the SLA (maximum rejection probability or response delay) is upheld?
Recent work on IaaS cloud resiliency
Resiliency Analysis
Definition of resiliency:
Resiliency is the persistence of service delivery that can justifiably be trusted when facing changes*
Changes of interest in the context of an IaaS cloud:
- Increase in workload or faultload
- Decrease in system capacity
- Security attacks
- Accidents or disasters
Our contributions:
- Quantifying the resiliency of an IaaS cloud
- A resiliency analysis approach using performance analysis models
*[1] J. Laprie, “From Dependability to Resiliency”, DSN 2008
[2] L. Simoncini, “Resilient Computing: An Engineering Discipline”, IPDPS 2009
Effect of changing demand
Effect of changing capacity
Conclusions
A stochastic model can be an inexpensive alternative to measurement-based evaluation of cloud QoS.
To reduce the complexity of modeling, we use an interacting sub-models approach.
- The overall solution of the model is obtained by iterating over the individual sub-model solutions.
The proposed approach is general and applicable to a variety of IaaS clouds.
Results show that IaaS cloud service quality is affected by variations in workload (job arrival rate, job service rate), faultload (machine failure rate) and available system capacity.
This approach can be extended to solve specific cloud problems such as capacity planning of public, private and hybrid clouds.
In the future, the models will be validated using real data collected from clouds.
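Iterating over sub-model solutions is commonly done as a fixed-point iteration: each sub-model is solved with the latest outputs of the other until the exchanged values stop changing. The sketch below is a toy illustration of that idea and not the authors' models. Both sub-models are stand-in M/M/1/K queues, and every name and parameter value is hypothetical.

```python
def mm1k_blocking(arrival_rate, service_rate, capacity):
    """Blocking probability of an M/M/1/K queue (toy stand-in for a sub-model)."""
    rho = arrival_rate / service_rate
    weights = [rho ** n for n in range(capacity + 1)]
    return weights[capacity] / sum(weights)


def decision_model(p_hot, arrival_rate=10.0, buffer_size=5):
    """Toy decision sub-model: decisions take longer when the hot pool is unavailable."""
    mean_decision_time = 0.05 + (1.0 - p_hot) * 0.10
    return mm1k_blocking(arrival_rate, 1.0 / mean_decision_time, buffer_size)


def pool_model(p_block, arrival_rate=10.0, service_rate=12.0, capacity=4):
    """Toy pool sub-model: sees only the jobs that were not blocked upstream."""
    effective_rate = arrival_rate * (1.0 - p_block)
    return 1.0 - mm1k_blocking(effective_rate, service_rate, capacity)


def solve_fixed_point(tol=1e-10, max_iter=200):
    """Alternate between the two sub-models until p_hot stops changing."""
    p_hot = 1.0  # initial guess: the hot pool can always accept a job
    for _ in range(max_iter):
        p_block = decision_model(p_hot)
        new_p_hot = pool_model(p_block)
        if abs(new_p_hot - p_hot) < tol:
            return p_block, new_p_hot
        p_hot = new_p_hot
    raise RuntimeError("fixed-point iteration did not converge")


p_block, p_hot = solve_fixed_point()
print(p_block, p_hot)
```

Each pass solves two small models instead of one large one, which is where the savings over a monolithic model come from.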
Thanks!