Some Issues in Modeling VMware ESX Server

William L. Shelden, Jr., Ph.D.
Bill.Shelden@PerfMan.com
ISM, Inc.
© COPYRIGHT 2007 THE INFORMATION SYSTEMS MANAGER, INC. (ISM)
Abstract
Two approaches to modeling VMware ESX Server are presented. The focus is on how the two model designs reflect the impact of changing the number of virtual CPUs in virtual machines running under VMware ESX Server, from the point of view of CPU throughputs and response times. Some benchmark results appearing recently in the literature are used to validate the preferred model design. Some problems with the interpretation and modeling of virtual machine CPU utilizations as measured by Windows guest operating systems are also discussed.
Topics
• The first model and its problems.
• The second model and why it is better.
• Validating the second model.
  - Bolker and Ding paper
  - Open v. Closed benchmarks/models
  - Windows CPU Utilizations are problematic
• Modeling changes to the number of virtual CPUs in Virtual Machines
• Additional VMware ESX Server Modeling Challenges
  - CPU affinity
  - Hyperthreading
First Model (CMG 2005)
[Figure: model diagram with a Delay node for each VM (VM1, VM2) and a shared CPU Queue on a server with 4 pCPUs.]
2 Job Classes:
• One for each VM
• Population = number of vCPUs
• Target utilization/throughput
Two Problems with the First Model
• Throughput cannot change
  - The model is solved iteratively. After each iteration, the basic equation for each VM is:
      • N = Tput × (STCPU + QTCPU + Z), which becomes
      • vCPUs = Tput × (STCPU + QTCPU + Z)
        (STCPU = CPU service time, QTCPU = CPU queue time, Z = delay time)
      • What happens if the user changes vCPUs?
  - Z is recomputed after each iteration to achieve the target utilization (i.e., throughput), so the throughput cannot change (unless the target utilization is changed).
  - And Z is a workload characteristic; it should not change.
• The model does not reflect queueing for virtual CPUs within the guest operating systems.
  - In this model, fewer vCPUs will always look better!
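The calibration loop described above can be sketched as follows. This is a minimal, simplified sketch, not the author's implementation: it assumes a single job class and one single-server CPU queue solved by exact Mean Value Analysis (instead of the two-class, 4-pCPU model), plus hypothetical values for service time and target throughput. The function names are illustrative only.

```python
def mva_single_class(n_jobs, service_time, think_time):
    """Exact MVA: one single-server CPU queue plus one delay (Z) node.
    Returns (throughput, cpu_residence_time) at population n_jobs."""
    queue_len = throughput = residence = 0.0
    for n in range(1, n_jobs + 1):
        residence = service_time * (1.0 + queue_len)   # STCPU + QTCPU
        throughput = n / (residence + think_time)      # N = Tput x (R + Z)
        queue_len = throughput * residence             # Little's law at the CPU
    return throughput, residence


def calibrate_think_time(n_jobs, service_time, target_tput, iterations=200):
    """First-model style: recompute Z after every solve so that
    N = Tput x (STCPU + QTCPU + Z) holds at the TARGET throughput."""
    think_time = 0.0
    for _ in range(iterations):
        _, residence = mva_single_class(n_jobs, service_time, think_time)
        think_time = max(0.0, n_jobs / target_tput - residence)
    tput, _ = mva_single_class(n_jobs, service_time, think_time)
    return tput, think_time


# Hypothetical workload: 0.1 s of CPU per job, target 4 jobs/s (40% busy)
for vcpus in (2, 1):            # population N = number of vCPUs
    tput, z = calibrate_think_time(vcpus, service_time=0.1, target_tput=4.0)
    print(f"vCPUs={vcpus}: Tput={tput:.4f}/s, Z={z:.4f}s")
```

Both runs end at about 4 jobs/s: changing the population (the vCPU count) only moves Z, which is the first problem listed above.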
Two Sources of CPU Delay
[Figure: stacked bars for the Base and More vCPUs cases, with components CPU Service, vCPU Delay, and pCPU Delay (y-axis 0 to 25). Annotation: "Without the vCPU Delay, second model looks worse."]
Second Model
[Figure: model diagram. Each VM (VM1, VM2) has a Delay node, an Allocate vCPU node ahead of the shared CPU Queue, and a Release vCPU node after it, on a server with 4 pCPUs.]
2 Job Classes:
• One for each VM
• Population = WinMPL
• Target utilization/throughput
Why is the Second Model better?
• Throughput can change
  - We still have, for each VM:
      • N = Tput × (STCPU + QTCPU + Z), but now it reads
      • WinMPL = Tput × (STCPU + QTCPU + Z)
  - If the user changes vCPUs for a particular VM:
      • It changes the number of 'tokens' (vCPUs) available at the Allocate node in the model for that VM
      • The model is executed once with Z fixed from the Base model, so throughput will change
• Delay for virtual CPUs and delay for physical CPUs can be tracked separately in the simulation
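To illustrate why throughput can now change, here is a minimal sketch under strong simplifying assumptions: a single VM in isolation, its vCPUs modeled as a multi-server CPU station solved by exact load-dependent Mean Value Analysis, no competition for pCPUs, and hypothetical values for WinMPL, service time, and target throughput. It is not the simulation model itself and does not separate vCPU delay from pCPU delay. Z is calibrated once on the Base configuration and then held fixed.

```python
def mva_vcpu_station(n_threads, vcpus, service_time, think_time):
    """Exact load-dependent MVA for ONE VM: a delay node (Z) plus a CPU
    station with `vcpus` identical servers. Returns (throughput, residence)."""
    p = [1.0] + [0.0] * n_threads      # p[j]: P(j threads at CPU), population n-1
    throughput = residence = 0.0
    for n in range(1, n_threads + 1):
        # R(n) = sum over j of j * S / min(j, vcpus) * p(j-1 | n-1)
        residence = sum(j * service_time / min(j, vcpus) * p[j - 1]
                        for j in range(1, n + 1))
        throughput = n / (residence + think_time)          # N = X (R + Z)
        new_p = [0.0] * (n_threads + 1)
        for j in range(1, n + 1):
            new_p[j] = throughput * service_time / min(j, vcpus) * p[j - 1]
        new_p[0] = 1.0 - sum(new_p[1:])
        p = new_p
    return throughput, residence


def calibrate_base_think_time(n_threads, vcpus, service_time, target_tput,
                              iterations=200):
    """Base-model step only: find the Z that reproduces the measured throughput."""
    z = 0.0
    for _ in range(iterations):
        _, residence = mva_vcpu_station(n_threads, vcpus, service_time, z)
        z = max(0.0, n_threads / target_tput - residence)
    return z


WIN_MPL, SERVICE, TARGET = 4, 0.1, 4.0          # hypothetical workload values
z_base = calibrate_base_think_time(WIN_MPL, 2, SERVICE, TARGET)
for vcpus in (2, 1):                            # what-if: executed once, Z fixed
    tput, resid = mva_vcpu_station(WIN_MPL, vcpus, SERVICE, z_base)
    print(f"vCPUs={vcpus}: Tput={tput:.3f}/s, CPU residence={resid:.3f}s")
```

With Z held fixed, dropping from 2 vCPUs to 1 lowers throughput and raises CPU residence time, because the WinMPL threads now queue for a single vCPU inside the guest.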
The Bolker & Ding Paper
"Virtual performance won't do: Capacity planning for virtual systems"
• Describes a series of benchmarks:
  - VMware ESX Server running on a dual-processor machine
  - Two guests running Windows 2000 (Bermuda and Largo)
  - A load generator drove each VM at a specified utilization by sending it a Poisson stream of computation-intensive jobs (finding logarithms)
  - Seven runs with Bermuda at 25% and Largo varying from 20% to 50% in steps of 5%
  - CPU affinity was used so that the two guests competed with each other for the same processor but did not compete with the manager
  - No hyperthreading
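A load generator of this kind can be sketched as below. This is an illustration under stated assumptions, not Bolker and Ding's tool: the arrival rate comes from the utilization law (U = X × S), and the mean job CPU time S = 0.05 s is a hypothetical value.

```python
import random


def poisson_arrivals(target_util, mean_service, duration, seed=1):
    """Arrival times of a Poisson job stream sized to drive one CPU at
    `target_util`: rate lambda = U / S (utilization law U = X * S)."""
    rng = random.Random(seed)
    rate = target_util / mean_service
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate)      # exponential inter-arrival times
        if t > duration:
            return arrivals
        arrivals.append(t)


# Hypothetical run: Bermuda at 25%, 0.05 s of CPU per job, one simulated hour
S = 0.05
jobs = poisson_arrivals(0.25, S, duration=3600.0)
print(f"{len(jobs)} jobs, offered utilization = {len(jobs) * S / 3600.0:.3f}")
```

Because arrivals do not wait for earlier jobs to finish, this is an open workload: nothing bounds how many jobs can be queued at once.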
Paper Results
Bermuda Results
[Figure: "Bermuda Results from Closed Model". X-axis: Largo % Utilization (20% to 50%); y-axis 0 to 70. Series: throughput, utilization (guest view), utilization (manager view), response time, total utilization (manager view).]
Largo Results
[Figure: "Largo Results from Closed Model". X-axis: Largo % Utilization (20% to 50%); y-axis 0 to 70. Series: throughput, utilization (guest view), utilization (manager view), response time, total utilization (manager view).]
Modeling the Bolker and Ding Benchmarks
• The benchmarks were driven by an open source (a Poisson arrival stream)
• Therefore, the best model of the benchmarks is an open model
• But queue lengths in an open model are not bounded, whereas in a closed model they cannot exceed the population
• Therefore, at the same utilization, an open model will in general give higher mean queue times than a closed model
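The open-versus-closed difference can be checked with textbook formulas. A minimal sketch, assuming a single exponential server with a hypothetical mean service time of 0.1 s: the open case uses the M/M/1 mean queue time U·S/(1 - U), and the closed case uses exact Mean Value Analysis with P threads, choosing the think time by bisection so the CPU runs at the same utilization. Function names are illustrative.

```python
def open_queue_time(util, service):
    """M/M/1 mean time spent waiting in queue (excluding service)."""
    return util * service / (1.0 - util)


def mva_single_server(population, service, think):
    """Exact MVA: one CPU (single server) plus a think-time delay node."""
    queue_len = throughput = resid = 0.0
    for n in range(1, population + 1):
        resid = service * (1.0 + queue_len)
        throughput = n / (resid + think)
        queue_len = throughput * resid
    return throughput, resid - service          # (throughput, mean queue time)


def closed_queue_time(population, service, util):
    """Closed model whose think time is chosen (by bisection) so that CPU
    utilization equals `util`; returns the mean CPU queue time."""
    target_x = util / service
    lo, hi = 0.0, population / target_x        # X(lo) > target_x > X(hi)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        x, _ = mva_single_server(population, service, mid)
        if x > target_x:
            lo = mid                           # too busy: needs more think time
        else:
            hi = mid
    return mva_single_server(population, service, 0.5 * (lo + hi))[1]


S = 0.1                                        # hypothetical mean service time (s)
for u in (0.50, 0.70, 0.85):
    closed = [closed_queue_time(p, S, u) for p in (100, 125, 150)]
    print(f"U={u:.2f} open={open_queue_time(u, S):.3f}s closed P=100/125/150="
          + "/".join(f"{q:.3f}" for q in closed))
```

At each utilization the closed results should fall below the open result and move toward it as the population grows, the same pattern as the 100, 125, and 150 thread models.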
Open vs. Closed Model
[Figure: four panels of CPU Response Times, for the Open Model and for Closed Models with 100, 125, and 150 threads. Each panel plots CPU Service Time and CPU Queue Time (seconds) and CPU Queue Length (mean number) against % Utilization, from about 45% to 85%.]
Open v. Closed Model Summarized
[Figure: "Open v. Closed Model Summary". Y-axis: Seconds (0 to 0.8); x-axis: Model Run (1 to 9). Series: Open, Closed P=100, Closed P=125, Closed P=150.]
Modeling changes to vCPUs
• VMware ESX Server system:
  - Server had 4 physical CPUs (pCPUs)
  - 13 virtual machines (VMs)
  - No VM used more than about 40% of a single pCPU
  - Each VM had 2 virtual CPUs (vCPUs)
• What would be the impact of reducing the number of vCPUs from 2 to 1 in each VM?
Modeling changes to vCPUs
VMware ESX Server Dispatching
• Dispatch when Ready
  - Dispatch a vCPU when it has work to do
  - This is not how VMware ESX Server dispatches
• Dispatch in Pairs
  - Dispatch all of the vCPUs in the VM together
  - This is how VMware ESX Server dispatches
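The difference between the two dispatching policies can be sketched for a single scheduling instant. This is a simplified illustration that assumes strict co-scheduling (a VM runs only when every one of its vCPUs gets a pCPU at the same time); the function, VM names, and snapshot are hypothetical.

```python
def dispatch(free_pcpus, ready_vms, policy):
    """One scheduling decision. ready_vms: list of (name, vcpus, runnable_threads).
    "ready": each runnable vCPU gets a pCPU on its own.
    "pairs": a VM runs only if ALL its vCPUs, idle ones included, fit at once."""
    dispatched = []
    for name, vcpus, runnable in ready_vms:
        if policy == "ready":
            take = min(runnable, vcpus, free_pcpus)
        elif policy == "pairs":
            take = vcpus if free_pcpus >= vcpus else 0
        else:
            raise ValueError(f"unknown policy: {policy}")
        if take:
            dispatched.append((name, take))
            free_pcpus -= take
    return dispatched, free_pcpus


# Hypothetical snapshot: 4 pCPUs, four 2-vCPU VMs, each with one runnable thread
vms = [("VM1", 2, 1), ("VM2", 2, 1), ("VM3", 2, 1), ("VM4", 2, 1)]
for policy in ("ready", "pairs"):
    running, idle = dispatch(4, vms, policy)
    print(policy, running, "free pCPUs:", idle)
```

Under "ready", all four VMs run; under "pairs", only VM1 and VM2 run, and two of the four pCPUs are held by vCPUs that have no work. This is one way extra vCPUs can add delay even for lightly loaded VMs.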
Using the Second Model
Additional Modeling Issues
• Model CPU Affinity
• Model Hyperthreading