Live migration

Evaluation of Delta Compression
Techniques for Efficient Live
Migration of Large Virtual Machines
Petter Svärd, Benoit Hudzia, Johan Tordsson
and Erik Elmroth
Umeå University, Dept of Computing Science
VEE 2011, Newport Beach, CA, USA
Live migration
“Transfer a VM from one host to another without
disrupting services.”
• The VM's state (memory pages) is transferred in the
background while the VM is still running
• The file system is typically located on an NFS share and is
not moved
Live migration
- Typical algorithm
The downtime is defined as the time between the VM
being suspended and resumed
Our goal is to reduce the downtime
Live migration
- Problems with the typical algorithm
When migrating memory-intensive VMs, or migrating over
slow network links:
1. Memory pages can be dirtied faster than they
are transferred over the network
2. The VM has to be suspended for an extended
period of time -> long downtime
3. Network connections time out and drop, and
triggers fail
Leads to disruption of services
Live migration
- Problems with the typical algorithm (cont)
Problem
dirtying rate > migration throughput
Possible Solutions
Decrease dirtying rate or increase migration throughput
• Decreasing the dirtying rate might hurt server
performance and disrupt services
Increase migration throughput!
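Why the ratio of dirtying rate to throughput matters can be seen in a small simulation. This is a minimal sketch of a generic iterative pre-copy loop, not the qemu-kvm implementation; the function name, the stop threshold, the round limit and all rates are assumptions chosen for illustration.

```python
def precopy_rounds(ram_mb, dirty_mb_s, link_mb_s, stop_mb=1.0, max_rounds=30):
    """Simulate iterative pre-copy and return (rounds, downtime_s).

    Hypothetical model: each round sends everything that is currently
    dirty while the VM keeps running and dirtying more memory.
    """
    to_send = float(ram_mb)  # the first round sends all of RAM
    rounds = 0
    while to_send > stop_mb and rounds < max_rounds:
        round_time = to_send / link_mb_s
        # Memory dirtied during this round must be sent again
        # (it can never exceed the size of RAM).
        to_send = min(float(ram_mb), dirty_mb_s * round_time)
        rounds += 1
    # Stop-and-copy phase: the VM is suspended while the rest is sent.
    downtime_s = to_send / link_mb_s
    return rounds, downtime_s


# Dirtying rate below throughput: the dirty set shrinks every round.
print(precopy_rounds(ram_mb=1024, dirty_mb_s=4, link_mb_s=8))
# Dirtying rate above throughput: the dirty set never shrinks.
print(precopy_rounds(ram_mb=1024, dirty_mb_s=16, link_mb_s=8))
```

In this model the remaining data shrinks geometrically when the dirtying rate is below the throughput. When it is above, the loop hits the round limit and all of RAM is sent during the suspension.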
Delta compression
- Increasing migration throughput
Overall idea: transfer changes to pages instead
of the full page contents thus increasing migration
throughput
• Store sent pages in a cache
• When transferring, if the page is cached,
compute an XOR delta page
• Compress the delta page
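The first two steps above can be sketched as follows. This is an illustrative sketch only: the `Sender` class, the 4096-byte page size and the unbounded dictionary used as a cache are assumptions, and the compression step is left out.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def xor_delta(old: bytes, new: bytes) -> bytes:
    """Byte-wise XOR of two equally sized pages."""
    return bytes(a ^ b for a, b in zip(old, new))


def apply_delta(old: bytes, delta: bytes) -> bytes:
    """XOR is its own inverse, so the receiver rebuilds the page the same way."""
    return xor_delta(old, delta)


class Sender:
    """Hypothetical sender that remembers the last version sent of each page."""

    def __init__(self):
        self.cache = {}  # page number -> last sent contents

    def encode(self, page_no: int, page: bytes):
        old = self.cache.get(page_no)
        self.cache[page_no] = page
        if old is None:
            return ("full", page)  # cache miss: send the whole page
        return ("delta", xor_delta(old, page))  # cache hit: send the changes


sender = Sender()
v1 = bytes(PAGE_SIZE)
v2 = v1[:100] + b"\x01" * 8 + v1[108:]  # 8 bytes change between sends
kind1, _ = sender.encode(7, v1)
kind2, delta = sender.encode(7, v2)
print(kind1, kind2, sum(1 for b in delta if b != 0))
```

A delta page is zero wherever the page did not change, which is what makes it a good input for a cheap compressor.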
Delta compression
- continued
Vanilla (no compr.)
• Wasting time on
cache misses
• An efficient caching
scheme and
compression
algorithm are vital!
Delta compression
- caching
Desired properties:
• Lean
• Constant seek time regardless of size
L2 caching scheme
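One way to get a lookup cost that does not depend on the cache size is a fixed-size set-associative cache. The sketch below assumes a 2-way layout indexed by page number. The slide does not specify the scheme at this level of detail, so the class name, the indexing and the eviction policy are all hypothetical.

```python
class PageCache:
    """Fixed-size set-associative cache keyed by page number (illustrative)."""

    def __init__(self, num_sets: int, ways: int = 2):
        self.num_sets = num_sets
        self.ways = ways
        # Each set holds at most `ways` (page_no, data) pairs, oldest first.
        self.sets = [[] for _ in range(num_sets)]

    def lookup(self, page_no: int):
        # Only one small set is scanned, so the cost is bounded by `ways`.
        for key, data in self.sets[page_no % self.num_sets]:
            if key == page_no:
                return data
        return None

    def insert(self, page_no: int, data: bytes) -> None:
        bucket = self.sets[page_no % self.num_sets]
        for i, (key, _) in enumerate(bucket):
            if key == page_no:
                del bucket[i]  # replace the existing entry
                break
        else:
            if len(bucket) >= self.ways:
                bucket.pop(0)  # set is full: evict its oldest entry
        bucket.append((page_no, data))


cache = PageCache(num_sets=4)
cache.insert(0, b"a")
cache.insert(4, b"b")
cache.insert(8, b"c")  # pages 0, 4 and 8 share a set, so page 0 is evicted
print(cache.lookup(0), cache.lookup(4), cache.lookup(8))
```

Memory use is fixed at `num_sets * ways` pages, and both operations touch a single set.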
Delta compression
- compression
Desired properties:
• Lean (low cpu usage)
• Effective (high compression ratio)
• General purpose
The XOR delta page is suitable for RLE
compression
– (Repetitions)(Symbol) → AAAAABBBCCCCC = 5A3B5C
XOR Binary Run-Length Encoding -> XBRLE
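The run-length example above can be reproduced with a few lines of Python. This is a generic RLE sketch, not the XBRLE wire format, which is not specified here; the function names are made up for illustration and the text decoder assumes that symbols are not digits.

```python
import re
from itertools import groupby


def rle_encode(text: str) -> str:
    """Encode each run of identical symbols as <count><symbol>."""
    return "".join(f"{len(list(group))}{symbol}" for symbol, group in groupby(text))


def rle_decode(encoded: str) -> str:
    """Inverse of rle_encode, assuming symbols are never digits."""
    return "".join(symbol * int(count) for count, symbol in re.findall(r"(\d+)(\D)", encoded))


def rle_runs(data: bytes):
    """Same idea on raw bytes: a list of (run_length, byte_value) pairs."""
    return [(len(list(group)), value) for value, group in groupby(data)]


print(rle_encode("AAAAABBBCCCCC"))  # 5A3B5C
# A 4096-byte XOR delta with one 8-byte change collapses to three runs.
delta = bytes(100) + b"\x01" * 8 + bytes(3988)
print(rle_runs(delta))
```

Because an XOR delta is zero wherever the page is unchanged, long zero runs dominate and even this naive encoding shrinks it drastically.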
XBRLE compression
- Source side algorithm
XBRLE compression
- Destination side algorithm
XBRLE compression
- conceptual illustration
Implementation
Modified version of qemu-kvm userspace
code to support the XBRLE migration
algorithm.
Lean, ~500 LoC
Evaluation done on version 0.11.2
Demo
Migrating streaming video over a 10 Mbit/s link
Before migration:
Demo
Migrating streaming video (cont)
After migration:
Demo
Migrating streaming video (cont)
Evaluation
- Test cases
• Memory write benchmark (lm_bench)
– 1 GB RAM, 1 vcpu VM
– Near ideal case
• Transcoded HD Video
– 1 GB RAM, 1 vcpu VM
– Real-world, non-ideal case
• SAP ERP application
– 8 GB RAM, 4 vcpus VM
– Large business application
– Relies on transactions and is thus sensitive to
extended downtime
Evaluation
- Experimental setup
Benchmark and HD Video
2x 2.66 GHz Core 2 Quad, 16 GB RAM
NFS share on source machine
100 Mbit/s network
SAP ERP
2x 3.0 GHz Xeon dual-core, 32 GB RAM
16 TB RAID 5, 6 Gbit/s trunked NFS server
1000 Mbit/s network
Evaluation
- Benchmark
• Downtime reduced by a factor of 100
• Throughput increased by 63 %
Evaluation
- Streaming video
• UDP downtime reduced from 8 s to 1 s
• Migration is transparent using XBRLE
Evaluation
- SAP ERP
Vanilla
XBRLE
• The ERP application was non-responsive on resume using the vanilla
algorithm but survived using XBRLE
• “Rule of thumb” is that more than 0.5 s of downtime might hurt the
system. Measured downtime was 0.2 s for XBRLE and 2 s for vanilla.
Conclusion
Delta compression works well when migrating
• VMs running workloads with a highly compressible
working set
• VMs running heavy workloads with large working
sets
• and/or over slow networks (e.g., WANs).
Future work
Page priority algorithm
• Avoid re-sends of pages that are dirtied
frequently
• Promising early results