Storage Management in Virtualized Cloud Environments

Sankaran Sivathanu, Ling Liu, Mei Yiduo and Xing Pu
Student Workshop on Frontiers of Cloud Computing, IBM
2010
Talk Outline
• Introduction
• Measurement results & Observations
– Data Placement & Provisioning
– Workload Interference
– Impacts of Virtualization
• Summary
Cloud & Virtualization
• Cloud Environment – Goals
– Flexibility in resource configuration
– Maximum resource utilization
– Pay-per-use Model
• Virtualization – Benefits
– Resource consolidation
– Re-structuring flexibility
– Separate protection domains
• Virtualization serves as one of the basic foundations of Cloud infrastructures
Fundamental Issues
• Cloud Service Providers (CSPs) vs. Customers
– Customers purchase computing resources
– CSPs provide virtual resources (VMs)
– Customers perceive their resources as physical
machines!
• Multiple VMs reside on a single physical host
– Resource Interference
– End-user performance depends on other users
• End-user unaware of where their data physically
exists
Goals of our Measurement
• For cloud service providers
– How to place data such that end-user performance is maximized?
– How to co-locate workloads for least interference?
• For End-Users
– How to purchase resources in tune with requirements?
– How to tune applications for maximum performance?
• General insights on storage I/O in virtualized
environments
Benchmarks Used
• Postmark
– Mail Server Workload
– Create/Delete, Read/Append files
– Parameters
• File Size
• # of files
• Read/Write ratio
• Synthetic Workload
– Sequential vs. random accesses
– Zipf Distribution
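A minimal sketch of such a synthetic access-pattern generator (hypothetical code, not the authors' tool; the block size and region size are assumptions):

import random
import numpy as np

BLOCK_SIZE = 4096      # bytes per request (assumption)
NUM_BLOCKS = 1 << 20   # blocks in the test region (assumption)

def offsets(pattern, n, zipf_a=1.2):
    """Yield n byte offsets following the given access pattern."""
    if pattern == "sequential":
        for i in range(n):
            yield (i % NUM_BLOCKS) * BLOCK_SIZE
    elif pattern == "random":
        for _ in range(n):
            yield random.randrange(NUM_BLOCKS) * BLOCK_SIZE
    elif pattern == "zipf":
        # Zipf: a small set of hot blocks receives most of the accesses
        for rank in np.random.zipf(zipf_a, n):
            yield (rank % NUM_BLOCKS) * BLOCK_SIZE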
Data Provisioning & Placement
Disk Provisioning
Consider a 100 GB disk; the workload's data footprint is ~150 MB
• Case I: 40 GB partition, Throughput: 2.1 MB/s
• Case II: 4 GB partition, Throughput: 1.4 MB/s
• Performance difference: 33%
Where to place a VM disk?
• Postmark benchmark
– Read operation
• Cases:
– Read from physical partitions in different zones
• Based on LBNs
• LBNs start from the inner zone and proceed towards outer zones
– Read from a disk file (.vmdk)
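A minimal sketch of the zone comparison (device path, disk size, and request sizes are assumptions; run as root and drop the page cache first, e.g. via /proc/sys/vm/drop_caches, so cached data does not skew the numbers):

import os, time

DEV = "/dev/sdb"        # hypothetical test disk
CHUNK = 1 << 20         # 1 MB per request
TOTAL = 256 * CHUNK     # read 256 MB per zone

def throughput_at(offset):
    """Sequentially read TOTAL bytes starting at 'offset'; return MB/s."""
    fd = os.open(DEV, os.O_RDONLY)
    try:
        done, start = 0, time.time()
        while done < TOTAL:
            buf = os.pread(fd, CHUNK, offset + done)
            if not buf:
                break   # reached end of device
            done += len(buf)
        return done / (time.time() - start) / 1e6
    finally:
        os.close(fd)

DISK_BYTES = 100 * 10**9   # assumed 100 GB disk, as in the example
print("low-LBN zone :", throughput_at(0))
print("high-LBN zone:", throughput_at(DISK_BYTES - TOTAL))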
Where to place multiple VM disks?
• Postmark benchmark
– 2 instances (1 for each VM)
• Random reads
• Compare physical partitions placed in different zones
– O -> Outer
– I -> Inner
– M -> Mid
Observations
• Customers should purchase storage based on workload
requirement, not price
• Thin provisioning may be practiced (see the sketch after this list)
• Throughput intensive VMs can be placed in outer disk
zones
• Multiple VMs that may be accessed simultaneously
should be co-located on disk
– CSPs can monitor access patterns and move virtual disks
accordingly
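A minimal file-level illustration of thin provisioning (hypothetical file name; a sparse file exports a large virtual disk while consuming space only for blocks actually written):

import os

PATH = "thin.img"   # hypothetical virtual-disk backing file

with open(PATH, "wb") as f:
    f.truncate(40 * 2**30)          # guest sees a 40 GB disk
    f.write(b"x" * (150 * 2**20))   # only ~150 MB actually written

st = os.stat(PATH)
print("apparent size:", st.st_size)           # 40 GB
print("allocated    :", st.st_blocks * 512)   # ~150 MB on Linux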
Workload Interference
CPU-Disk Interference
[Figure: two VMs on one physical host, each exercising CPU and disk]
• Throughput: 23.4 MB/s vs. 27.6 MB/s
• Performance difference: 15.3%
CPU-Disk Interference
• CPU allocation ratios have no effect on disk throughput across VMs
• A disk-intensive job performs better alongside a CPU-intensive job
CPU-Disk Interference
Reason?
Dynamic Frequency Scaling (DFS)
CPU-Disk Interference
• CPU DFS is enabled in Linux by default
• Three ‘governors’ control the DFS policy:
– On-demand (default)
– Performance
– Power-save
• When 1 core is idle, the entire CPU is down-scaled because overall CPU utilization falls
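A minimal sketch of inspecting and switching the governor through the standard Linux cpufreq sysfs interface (writing requires root):

GOV = "/sys/devices/system/cpu/cpu{}/cpufreq/scaling_governor"

def get_governor(cpu=0):
    with open(GOV.format(cpu)) as f:
        return f.read().strip()

def set_governor(name, cpu=0):
    # valid names include "ondemand", "performance", "powersave"
    with open(GOV.format(cpu), "w") as f:
        f.write(name)

print(get_governor())         # typically "ondemand"
set_governor("performance")   # pin the core at full frequency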
Disk-Disk Interference
[Figure: VM-1 and VM-2, each with its own CPU and virtual disk (V.Disk-1, V.Disk-2), sharing a single physical disk]
• 1 instance of Postmark in each VM
• 65.3% more time taken compared to running Postmark in a single VM
• Overhead mainly attributed to disk seeks: accesses are no longer sequential
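A minimal sketch of this interference effect (hypothetical file paths; both files sit on the same physical disk; drop the page cache between runs so reads are not served from memory):

import threading, time

CHUNK = 1 << 20   # 1 MB sequential reads

def read_all(path):
    with open(path, "rb") as f:
        while f.read(CHUNK):
            pass

def timed_run(paths):
    """Read all files concurrently; return elapsed seconds."""
    threads = [threading.Thread(target=read_all, args=(p,)) for p in paths]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

solo = timed_run(["/mnt/disk1/a.img"])                      # one stream alone
both = timed_run(["/mnt/disk1/a.img", "/mnt/disk1/b.img"])  # two streams, same disk
ideal = 2 * solo   # shared bandwidth, double the data, no seek penalty
print("seek overhead: %.0f%%" % (100 * (both - ideal) / ideal))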
Disk-Disk Interference
[Figure: VM-1 and VM-2, each with its own CPU and virtual disk, mapped to separate physical disks (Disk-1 and Disk-2)]
• VMs using separate physical disks
• 17.52% more time taken compared to running Postmark in a single VM
• Overhead attributed to contention in Dom-0’s queue structures
Disk-Disk Interference
• Postmark benchmark (reads)
• Cases:
– Running in a single VM
– 1 instance in each of two VMs:
• 2 VMs reading from virtual disks on the same physical disk
• 2 VMs reading from virtual disks on different physical disks
Disk-Disk Interference
• The I/O scheduling policy in Dom-0 has little effect
• The ‘Ideal’ case is the time taken when running Postmark in a single VM
• Other cases run 1 instance of Postmark in each of 2 VMs (separate physical disks)
Disk-Disk Interference
• Interference with respect to workload type
• Synthetic read workload
• VMs use separate physical disks
• Cases:
– Mix of sequential versus random reads
• Sequential requests from both VMs flood the Dom-0 queue, causing contention
Observations
• CPU-intensive and disk-intensive workloads can be co-located for optimal performance and power
• Virtual disks that may be accessed simultaneously must be placed on separate physical disks
• I/O scheduling in Dom-0 has little effect on disk workload interference
• Two sequential workloads, when co-located, suffer in performance due to queue contention
• With separate disks, workload contention is generally minimal, except in the case of two sequential workloads
Impacts of Virtualization
Sequentiality
• Postmark benchmark (reads)
• Not much overhead seen for random disk accesses
– VM overhead is amortized by the larger disk-seek overhead
• Overhead is felt more for sequential disk accesses
Block Size
• Postmark sequential reads
• Fixed overhead with every request
• As block size increases, the # of requests is reduced, hence overhead is reduced
• Efficient to read in larger blocks
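A minimal sketch of the block-size sweep (hypothetical file path; drop the page cache between runs for honest numbers):

import os, time

PATH = "/mnt/vm/testfile"   # hypothetical test file

def throughput(block_size):
    """Sequentially read the whole file; return MB/s."""
    fd = os.open(PATH, os.O_RDONLY)
    total, start = 0, time.time()
    while True:
        buf = os.read(fd, block_size)
        if not buf:
            break
        total += len(buf)
    os.close(fd)
    return total / (time.time() - start) / 1e6

for bs in (4096, 16384, 65536, 262144, 1048576):
    print("%8d B : %6.1f MB/s" % (bs, throughput(bs)))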
Block Size w.r.t. Locality
[Figure: benefit of larger block sizes as a function of access locality]
Observations
• VM overhead is not felt in random workloads – amortized by disk seeks
• Extra layers of indirection are the reason for VM overhead – when block size is large, the overhead is amortized
• Block size may be increased only if there is sufficient locality in access
Summary
• Storage purchased must depend on requirement, not price!
• It is better to place sequentially accessed streams in the outer disk zone
• Co-locate virtual disks that may be accessed simultaneously
• Co-locate CPU-intensive tasks with disk-intensive tasks for better power and performance
• Avoid co-locating two sequential workloads on a single physical machine – even when they go to separate physical disks!
• Read in large blocks only when there is locality in the workload
Questions
Contact: sankaran@gatech.edu