VirtualKnotter: Online Virtual Machine Shuffling for Congestion Resolving in Virtualized Datacenter

Xitao Wen, Kai Chen, Yan Chen, Yongqiang Liu, Yong Xia, Chengchen Hu
Datacenter as Infrastructure
Congestion in Datacenter

[Figure: three-tier datacenter topology (edge, aggregation, core) spanning Pods 0–3, oversubscribed 2:1~10:1 at the aggregation layer and 10:1~100:1 at the core; congestion causes packet loss, queuing delay, and degraded throughput.]
Outline
• Congestion in the Wild
• General Approaches
• Problem Formulation
• Main Design
• Evaluation
Spatial Pattern
• Hotspot: hot links account for <10% of core links [IMC10]
• Spatially unbalanced utilization

[Figure: sender–receiver traffic heatmap showing unbalanced utilization.]
Temporal Pattern
• Long congestion events
– Last for 10s of minutes
– Individual events have a clear spatial pattern

[Figure: congestion events over time, by core link index.]
Traffic Stability
• Bursty at fine granularity
– Not predictable at 10s or 100s of milliseconds [IMC10][SIGCOMM09]
• Predictable at a timescale of 10s of minutes
– 40% to 70% of pairwise traffic can be expected to stay stable
– 90%+ predictable traffic aggregated at core links
General Approaches
• Network Layer
– Increase network bandwidth (Fat-tree, BCube, OSA…)
• Expensive; requires upgrading the entire DC network
– Optimize flow routing (Hedera, MicroTE)
• Not scalable; requires hardware support; depends on rich path diversity
• Application Layer
– Optimize VM placement
• Scalable; lightweight deployment; suitable for existing oversubscribed networks
Background on Virtualized DC
• Virtualization layer: VMs multiplexed onto physical servers across the DC network
• VM live migration
– Keeps service continuous while migrating
– Transfers 1.1x – 1.4x of the VM's memory: the major cost!

[Figure: a VM live-migrating between servers over the DC network.]
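The 1.1x – 1.4x memory-transfer figure above gives a rough way to estimate the cost of a reshuffle. A minimal sketch, assuming migration traffic is simply the memory of every VM whose host changes scaled by a midpoint factor (the function name and the 1.25 factor are illustrative, not from the talk):

```python
# Rough sketch: estimating live-migration traffic for a reshuffle.
# Assumes each migration transfers ~1.1x-1.4x the VM's memory
# (per the slide); 1.25 is an illustrative midpoint.
def migration_traffic(old_placement, new_placement, vm_memory_gb,
                      transfer_factor=1.25):
    """Sum the memory transferred for every VM whose host changes."""
    moved = [vm for vm, host in new_placement.items()
             if old_placement.get(vm) != host]
    return sum(vm_memory_gb[vm] for vm in moved) * transfer_factor

old = {"vm1": "s1", "vm2": "s1", "vm3": "s2"}
new = {"vm1": "s2", "vm2": "s1", "vm3": "s2"}
mem = {"vm1": 4.0, "vm2": 8.0, "vm3": 2.0}
print(migration_traffic(old, new, mem))  # 5.0 (only vm1 moves: 4.0 * 1.25)
```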
Optimize VM Placement
• Offload traffic from congested links

[Figure: before/after placement swapping VMs across servers; legend: active VM, idle VM.]
Design Goal
• Objective: mitigate congestion
– Minimize maximum link utilization (MLU)
• Constraint: controllable migration traffic (i.e., the traffic from moving VMs)
– Must stay below the traffic reduction it buys
• Reasonable runtime overhead
– Far less than the target timescale (10s of minutes)
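The MLU objective can be made concrete with a small sketch. This is a minimal illustration with hypothetical names; routing is abstracted as a function mapping a server pair to the links on its path:

```python
# Minimal sketch of the objective: maximum link utilization (MLU)
# under a given VM placement. Names and the routing abstraction
# are illustrative, not the paper's exact formulation.
def mlu(traffic, placement, route, capacity):
    """traffic: {(vm_a, vm_b): rate}; placement: {vm: server};
    route(s, t) -> list of link ids; capacity: {link: cap}."""
    load = {link: 0.0 for link in capacity}
    for (a, b), rate in traffic.items():
        sa, sb = placement[a], placement[b]
        if sa == sb:
            continue  # intra-server traffic never touches the network
        for link in route(sa, sb):
            load[link] += rate
    return max(load[l] / capacity[l] for l in capacity)

# Toy two-server topology joined by a single link "uplink".
route = lambda s, t: ["uplink"]
traffic = {("vm1", "vm2"): 3.0, ("vm2", "vm3"): 4.0}
placement = {"vm1": "s1", "vm2": "s2", "vm3": "s2"}
print(mlu(traffic, placement, route, {"uplink": 10.0}))  # 0.3
```

Note how colocating vm2 and vm3 keeps their 4.0 units of traffic off the uplink entirely; that is exactly the lever VM shuffling pulls.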
Problem Statement
• Input
– Topology and routing of physical servers
– Traffic matrix among VMs
– Current placement
• Variable & output
– Optimized placement
• NP-hardness
– Proof: reduction from the Quadratic Bottleneck Assignment Problem
Related Work
• Optimize VM placement
– Server consolidation [SOSP'07]
– Fault tolerance [ICS'07]
– Network scalability [INFOCOM'10]
Inspiration
• Solve each tie gently, by carefully reeving the end out of the tie.
• Stretch the tie violently, making it loose and less tangled.
Two-step Algorithm

[Pipeline: the current VM placement, topology & routing, and the traffic matrix feed Multiway θ-Kernighan-Lin; its output seeds Simulated Annealing, which emits the optimized VM placement.]

• Multiway θ-Kernighan-Lin
– Fast and greedy
– Searches for a placement that localizes overall traffic
– May get stuck in a local minimum
• Simulated Annealing
– Fine-grained and randomized
– Searches to mitigate traffic on the most congested links
– Helps avoid local minima
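The pipeline's control flow can be sketched in a few lines. The function names below are hypothetical, and the two steps are trivial stand-ins; only the "greedy pass first, randomized escape second" structure is the point:

```python
# Control-flow sketch of the two-step pipeline. kl_step and sa_step
# are stand-ins, not the real algorithms (see the sketches that follow).
def kl_step(placement):
    return placement          # stand-in for the multiway theta-KL pass

def sa_step(placement):
    return placement          # stand-in for simulated-annealing refinement

def virtual_knotter(placement, mlu, target=0.8):
    """Run the fast greedy pass first; invoke the randomized search
    only if the result is still congested (MLU above target)."""
    candidate = kl_step(placement)       # fast, may hit a local minimum
    if mlu(candidate) > target:
        candidate = sa_step(candidate)   # randomized escape from local minima
    return candidate
```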
Multiway Θ-Kernighan-Lin (KL)
• Top-down graph-cut improvement
• Introduces Θ to limit the number of moves
• O(n² log n)
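A hedged sketch of the Θ-bounded idea: greedily apply the best VM swap as long as it improves the objective, but cap the number of moves at Θ to bound migration traffic. This is only the move-limited greedy core, not the full multiway KL partitioning algorithm, and the names are illustrative:

```python
# Sketch of theta-bounded greedy refinement: apply the best improving
# VM swap per round, capped at theta rounds. Illustrative, not the
# talk's full multiway Kernighan-Lin algorithm.
from itertools import combinations

def theta_kl(placement, objective, theta):
    placement = dict(placement)  # work on a copy
    for _ in range(theta):
        best, gain = None, 0.0
        current = objective(placement)
        for a, b in combinations(placement, 2):
            if placement[a] == placement[b]:
                continue  # same-server swap changes nothing
            placement[a], placement[b] = placement[b], placement[a]
            delta = current - objective(placement)
            if delta > gain:
                best, gain = (a, b), delta
            placement[a], placement[b] = placement[b], placement[a]
        if best is None:
            break  # local minimum: no swap improves the objective
        a, b = best
        placement[a], placement[b] = placement[b], placement[a]
    return placement

# Toy objective: total cross-server traffic under placement p.
traffic = {("vm1", "vm3"): 5.0, ("vm2", "vm4"): 5.0}
def cross_traffic(p):
    return sum(r for (a, b), r in traffic.items() if p[a] != p[b])

start = {"vm1": "s1", "vm2": "s1", "vm3": "s2", "vm4": "s2"}
print(cross_traffic(theta_kl(start, cross_traffic, theta=2)))  # 0
```

Each round scans all VM pairs, which is where the quadratic term in the stated complexity comes from; Θ caps the rounds and hence the migration traffic.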
Simulated Annealing Searching (SA)
• Randomized global search
• Terminates when a satisfactory solution is obtained, or a predefined maximum search depth is reached

[Figure: two candidate placements, MLU = .60 before vs. MLU = .53 after a random perturbation.]
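The SA step above can be sketched as follows. Parameters, the neighbor rule (move one random VM), and the cooling schedule are illustrative assumptions, not the paper's exact choices:

```python
# Hedged sketch of the simulated-annealing step: propose a random VM
# move, always accept improvements, accept worsening moves with a
# probability that shrinks as the temperature cools.
import math
import random

def anneal(placement, servers, objective, target=0.8,
           max_steps=10_000, t0=1.0, cooling=0.995):
    random.seed(0)  # deterministic for the sketch
    current = dict(placement)
    best = dict(current)
    t = t0
    for _ in range(max_steps):
        if objective(best) <= target:
            break  # satisfactory solution obtained
        neighbor = dict(current)
        vm = random.choice(list(neighbor))
        neighbor[vm] = random.choice(servers)  # move one VM at random
        delta = objective(neighbor) - objective(current)
        if delta < 0 or random.random() < math.exp(-delta / t):
            current = neighbor
            if objective(current) < objective(best):
                best = dict(current)
        t *= cooling
    return best

# Toy objective: cross-server traffic for one VM pair, scaled to [0, 1].
pairs = {("vm1", "vm2"): 5.0}
def cross_mlu(p):
    return sum(r for (a, b), r in pairs.items() if p[a] != p[b]) / 10.0

result = anneal({"vm1": "s1", "vm2": "s2"}, ["s1", "s2"],
                cross_mlu, target=0.0)
print(cross_mlu(result))  # 0.0
```

The occasional acceptance of a worsening move is what lets the search climb out of the local minima the greedy pass can get stuck in.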
Methodology
• Baseline algorithm
– Clustering-based algorithm
– Pro: best-known static optimality
– Con: high runtime and migration overhead
• Metrics
– MLU reduction without migration overhead
– Overhead: migration traffic, runtime
– Simulation results
MLU Reduction without Overhead
VirtualKnotter achieves static performance similar to that of Clustering.
Migration Traffic
VirtualKnotter incurs significantly less migration traffic than Clustering.
Runtime Overhead
VirtualKnotter demonstrates reasonable runtime overhead.
Simulation Results
Altogether, VirtualKnotter obtains a significant gain in congestion resolving: 53% less congestion.
Conclusions
• Collaborative VM migration can substantially resolve long-term congestion in the DC
• The trade-off between optimality and migration traffic is essential to harvest the benefit

DC networking projects of Northwestern LIST:
http://list.cs.northwestern.edu/dcn
Thank you!
Backup
General Approaches

Approach               | Cost | Hardware Support | Scalability | Other Dependency
Increase Bandwidth     | High | Yes              | Varies      | –
Optimize Routing       | Low  | Yes              | Low         | Rich path diversity
Optimize VM Placement  | Low  | No               | High        | VM deployment
Problem Statement
• Objective
– Minimize Maximum Link Utilization (MLU)
– "Cool down the hottest spot"
• Constraints
– Migration traffic
– Server hardware capacity
– Inseparable VMs
• NP-hardness
– Proof: reduction from the Quadratic Bottleneck Assignment Problem
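The three constraints above suggest a simple feasibility check for any candidate placement. A hypothetical sketch (names, capacities, and the memory-based migration budget are illustrative assumptions):

```python
# Hypothetical feasibility check for a candidate placement against
# the three constraints listed above; all names are illustrative.
def feasible(placement, old_placement, vm_mem, server_cap,
             inseparable_groups, migration_budget):
    # Server hardware capacity: memory packed per server must fit.
    used = {}
    for vm, s in placement.items():
        used[s] = used.get(s, 0.0) + vm_mem[vm]
    if any(used[s] > server_cap[s] for s in used):
        return False
    # Inseparable VMs: each group must land on a single server.
    for group in inseparable_groups:
        if len({placement[vm] for vm in group}) > 1:
            return False
    # Migration traffic: memory moved must stay within budget.
    moved = sum(vm_mem[vm] for vm in placement
                if placement[vm] != old_placement[vm])
    return moved <= migration_budget

old = {"a": "s1", "b": "s2"}
new = {"a": "s2", "b": "s2"}
mem = {"a": 2.0, "b": 2.0}
cap = {"s1": 4.0, "s2": 4.0}
print(feasible(new, old, mem, cap, [["a", "b"]], migration_budget=3.0))  # True
print(feasible(new, old, mem, cap, [["a", "b"]], migration_budget=1.0))  # False
```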
Observation Summary
• Unbalanced jam (spatial)
• Long-term congestion (temporal)
• Predictable at 10s of minutes scale (stability)
Two-step Algorithm
• Multiway Θ-Kernighan-Lin (KL): fast search for an approximation
• Simulated Annealing Searching (SA): fine search for a better solution