An Efficient Process Live Migration Mechanism for
Load Balanced Distributed Virtual Environments
Balazs Gerofi, Hajime Fujita, Yutaka Ishikawa
Yutaka Ishikawa Laboratory
The University Of Tokyo
IEEE Cluster2010
Outline
• Motivation
• Cluster Server Architecture
• DVE Software Components
• Process Live Migration
• Multiple Socket Migration Optimizations
• Dynamic Load Balancing
• Evaluation
• Conclusion
Motivation
• In Distributed Virtual Environments (DVE):
– Massively Multi-player Online Games (MMPOG)
– Networked Virtual Environments (NVE)
– Distributed Simulations such as the High-Level Architecture (HLA)
• 10,000 to 100,000 clients may be involved
• A cluster of servers is used to provide service at large scale
– Zoning (i.e., partitioning the virtual space among servers)
• Main limitations of application-level load balancing:
– Client migrations are heavy: server state needs to be transferred,
clients must reconnect, etc.
– Each physical machine is limited to neighboring zones
• Is operating-system-level load balancing feasible?
– Server processes are highly interactive
– They maintain a massive number of network connections (clients)
– They maintain connections with other in-cluster components
– How to migrate such processes?
Cluster Server Architecture
• Each DVE server is equipped with a public and a private interface;
the same IP address is assigned to all public interfaces
• The router broadcasts incoming packets to all DVE server nodes
– Migrating zone server processes does not require any work on the router!
• Zone server processes are distinguished by separate port
numbers (as opposed to separate IP addresses)
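The port-based demultiplexing described above can be sketched roughly as follows. This is only an illustration, not the cap_trans_mod implementation: the `Node` class, the `owned_ports` set, the `migrate` helper, and the port numbers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One DVE server node; every node shares the same public IP address."""
    name: str
    owned_ports: set = field(default_factory=set)  # ports of local zone servers

    def accepts(self, dst_port: int) -> bool:
        # The router broadcasts each packet to all nodes, so a node keeps
        # only packets addressed to a port owned by one of its zone servers.
        return dst_port in self.owned_ports


def migrate(port: int, src: Node, dst: Node) -> None:
    """Move ownership of a zone server's port; the router is not touched."""
    src.owned_ports.discard(port)
    dst.owned_ports.add(port)


node1 = Node("node1", {5001, 5002})
node2 = Node("node2", {5003})

before = [n.name for n in (node1, node2) if n.accepts(5002)]
migrate(5002, node1, node2)
after = [n.name for n in (node1, node2) if n.accepts(5002)]
print(before, after)
```

Because ownership is decided on the nodes themselves, moving a zone server only changes which node accepts its port.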
Server Node Software Components
[Diagram: migd, transd, cond, and zone_serv1 … zone_servn, together with the mig_mod and cap_trans_mod modules and the Linux kernel]
• mig_mod: migration module with live migration and socket support (extension of the Berkeley C/R
module)
• cap_trans_mod: packet capturing and address translation kernel module (details in the paper)
• transd: translation daemon
• migd: migration daemon
• cond: load monitor and load balancer
• zone_serv: zone server processes
Process Live Migration
[Diagram: the process image and its network sockets move from the source host to the destination host over the network; dirty memory pages are marked]
• Step 1: Transfer the whole process image in the background without
stopping the execution
• Step 2: Track dirty pages for a certain period; the process is
still being executed
• Step 3: Stop the process (freeze phase), transfer dirty memory, export network
connections and transfer data to the destination
• Step 4: Apply changes and resume execution
Note: main goal is short process freeze time!
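The pre-copy idea behind these steps can be illustrated with a small simulation. It is a sketch under stated assumptions, not the mig_mod code: the function name `live_migrate`, the parameters `dirty_rate`, `freeze_threshold`, and `max_rounds`, and the random page-dirtying model are all hypothetical.

```python
import random


def live_migrate(num_pages, dirty_rate, freeze_threshold, max_rounds=10, seed=0):
    """Simulate pre-copy migration.

    Returns (pages sent in each background round, pages sent while frozen).
    """
    rng = random.Random(seed)
    # Round 0: the whole image is copied while the process keeps running.
    sent_per_round = [num_pages]
    dirty = {p for p in range(num_pages) if rng.random() < dirty_rate}
    for _ in range(max_rounds):
        if len(dirty) <= freeze_threshold:
            break
        # Resend the pages dirtied during the previous round.
        sent_per_round.append(len(dirty))
        # Assumption: fewer pages are dirtied when the round was shorter.
        dirty = {p for p in range(num_pages)
                 if rng.random() < dirty_rate * len(dirty) / num_pages}
    # Freeze phase: the process is stopped, so this dirty set is final.
    return sent_per_round, len(dirty)


rounds, frozen = live_migrate(num_pages=1000, dirty_rate=0.2, freeze_threshold=10)
print("pages per background round:", rounds)
print("pages sent during freeze:", frozen)
```

The freeze time depends only on the last, small dirty set, which is why the background rounds matter.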
Iterative socket migration
(during process freeze phase)
[Diagram: process image and network sockets on the source host and the destination host, connected by the network; dirty memory pages are marked]
Incoming packet loss prevention!
The steps below are repeated for each socket in turn:
• Extract remote IP and port number, set up a filter at the destination
node to capture incoming packets and disable socket
• Migrate socket data to destination node
• Inject any packets that were captured on the destination node and
attach socket to the process
Note: requires several synchronization steps
with short writes following each other!
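The per-socket sequence can be sketched as a small simulation. Everything here is hypothetical and simplified: the `Destination` class, the `round_trips` counter, and the dictionary-based socket state are illustration only, and "injecting" captured packets is modelled as appending them to a receive queue rather than re-entering the network stack.

```python
from collections import defaultdict


class Destination:
    """Hypothetical destination-side state for the per-socket protocol."""

    def __init__(self):
        self.filters = set()               # (remote_ip, remote_port) being captured
        self.captured = defaultdict(list)  # packets held back per connection
        self.attached = {}                 # connection -> restored socket state
        self.round_trips = 0               # synchronization steps with the source

    def set_filter(self, conn):
        self.filters.add(conn)
        self.round_trips += 1

    def on_packet(self, conn, payload):
        # Packets arriving mid-migration are captured instead of being lost.
        if conn in self.filters:
            self.captured[conn].append(payload)

    def receive_socket(self, conn, state):
        # Restore the socket, add captured packets to its queue, attach it.
        queue = state["rx_queue"] + self.captured.pop(conn, [])
        self.filters.discard(conn)
        self.attached[conn] = dict(state, rx_queue=queue)
        self.round_trips += 1


def migrate_iteratively(sockets, dst, in_flight):
    """Migrate sockets one by one; in_flight maps conn -> packets arriving meanwhile."""
    for conn, state in sockets.items():
        dst.set_filter(conn)                  # filter first, then disable at source
        for payload in in_flight.get(conn, []):
            dst.on_packet(conn, payload)
        dst.receive_socket(conn, state)       # transfer, inject, attach


sockets = {("10.0.0.1", 4000): {"rx_queue": ["a"]},
           ("10.0.0.2", 4001): {"rx_queue": []}}
dst = Destination()
migrate_iteratively(sockets, dst, {("10.0.0.1", 4000): ["b"]})
print(dst.attached, dst.round_trips)
```

In this model the number of synchronization steps grows linearly with the number of sockets, which is the cost the note above points to.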
Collective socket migration
(during process freeze phase)
[Diagram: process image and network sockets on the source host and the destination host, connected by the network; dirty memory pages are marked]
• Extract remote IP and port number for all sockets, set up filters to
capture incoming packets and disable sockets
• Extract socket data into one unified buffer and transfer everything in
one go
• Attach sockets, inject packets.
Note: the amount of socket data transferred can
still be large!
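The "one unified buffer" step could look something like the following. The record layout (remote IPv4 address, remote port, two sequence numbers) and the names `RECORD`, `pack_sockets`, and `unpack_sockets` are assumptions made for this sketch; the real socket representation is much richer.

```python
import socket
import struct

# Hypothetical fixed-size record: remote IPv4 address, remote port,
# send sequence number, receive sequence number (network byte order).
RECORD = struct.Struct("!4sHII")


def pack_sockets(conns):
    """Serialize every connection into one buffer, prefixed by the record count."""
    buf = bytearray(struct.pack("!I", len(conns)))
    for ip, port, snd_seq, rcv_seq in conns:
        buf += RECORD.pack(socket.inet_aton(ip), port, snd_seq, rcv_seq)
    return bytes(buf)


def unpack_sockets(buf):
    """Inverse of pack_sockets, run on the destination after a single receive."""
    (count,) = struct.unpack_from("!I", buf, 0)
    out = []
    for i in range(count):
        raw_ip, port, snd_seq, rcv_seq = RECORD.unpack_from(buf, 4 + i * RECORD.size)
        out.append((socket.inet_ntoa(raw_ip), port, snd_seq, rcv_seq))
    return out


conns = [("192.168.0.10", 40000, 1, 2), ("192.168.0.11", 40001, 3, 4)]
buf = pack_sockets(conns)
restored = unpack_sockets(buf)
print(len(buf), restored)
```

A single write replaces the many short writes of the iterative scheme, but the buffer still grows with the number of connections.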
Incremental collective socket migration
(during dirty-log phase)
[Diagram: process image and network sockets on the source host and the destination host, connected by the network; dirty memory pages are marked]
• All socket data are transferred asynchronously and tracking structures are
allocated for each connection
• Some pages are dirtied and changes in some sockets’ state are detected
• Dirty pages are transferred and the state of modified sockets is updated;
the tracking loop timeout is decreased
• When the number of dirty pages or the tracking timeout goes below a
pre-defined limit, enter the process freeze phase
• Transfer dirty pages and set up packet capture filter
• Update sockets that have changed in the last iteration and disable
sockets on the source machine
• Inject packets and re-enable sockets on the destination machine
Note: socket data transferred in the freeze phase is
much less than the overall socket representation!
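The change-tracking idea can be sketched as below. The `SocketTracker` class, the snapshot dictionaries, and the halving of the timeout are assumptions for illustration; the slides only state that the tracking loop timeout is decreased, not by how much, and the dirty-page condition is left out here.

```python
class SocketTracker:
    """Hypothetical per-connection tracking: remembers the last state sent."""

    def __init__(self):
        self.last_sent = {}

    def delta(self, current):
        """Return only the sockets whose state differs from the destination's copy."""
        changed = {c: s for c, s in current.items() if self.last_sent.get(c) != s}
        self.last_sent.update(changed)
        return changed


def incremental_migration(snapshots, min_timeout=0.05, timeout=0.4):
    """snapshots: successive socket-state dicts observed while the process runs.

    Returns (update sizes per dirty-log iteration, update size in the freeze phase).
    """
    tracker = SocketTracker()
    sizes = []
    for snap in snapshots[:-1]:
        sizes.append(len(tracker.delta(snap)))  # sent while the process runs
        timeout /= 2                            # assumed rule for decreasing the timeout
        if timeout < min_timeout:
            break
    # Freeze phase: only sockets changed since the last iteration are sent.
    return sizes, len(tracker.delta(snapshots[-1]))


base = {i: 0 for i in range(100)}        # 100 connections, initial state
second = dict(base)
second[3] = 1
second[7] = 1                            # two sockets changed
final = dict(second)
final[3] = 2                             # one socket changed before the freeze
sizes, frozen = incremental_migration([base, second, final])
print(sizes, frozen)
```

In this example the freeze phase carries one socket update instead of all 100, which matches the note about the freeze-phase transfer being much smaller than the full representation.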
Dynamic Load Balancing
• Decentralized middleware
• Load balancing is sender-initiated, performing a handshake
with the receiver
• Transfer policy:
– Threshold driven (if load exceeds a certain value)
• Location policy:
– Based on knowledge of load on the rest of the nodes, preferring
a node that is on the opposite side of the cluster load average
• Selection policy:
– Prefers a process whose CPU consumption is close to
the difference between the given node’s load and the cluster
load average
• Information policy:
– Periodic policy, nodes broadcast their load
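One possible reading of the transfer, location, and selection policies is sketched below. The function names, the load figures, and the interpretation of "opposite side of the cluster load average" as a mirrored target load are assumptions, not the behavior of cond as implemented.

```python
def should_transfer(load, threshold):
    """Transfer policy: act only when the local load exceeds the threshold."""
    return load > threshold


def pick_destination(local, loads):
    """Location policy: prefer the node mirroring our load across the cluster average."""
    avg = sum(loads.values()) / len(loads)
    target = avg - (loads[local] - avg)   # same distance, opposite side of the average
    others = {n: l for n, l in loads.items() if n != local and l < avg}
    if not others:
        return None
    return min(others, key=lambda n: abs(others[n] - target))


def pick_process(local_load, avg, proc_cpu):
    """Selection policy: prefer the process whose CPU use is closest to (load - average)."""
    excess = local_load - avg
    return min(proc_cpu, key=lambda p: abs(proc_cpu[p] - excess))


# Loads as learned from the periodic broadcasts (hypothetical values, in percent).
loads = {"node1": 90, "node2": 60, "node3": 30, "node4": 40, "node5": 80}
procs = {"zone_serv1": 10, "zone_serv2": 28, "zone_serv3": 52}
avg = sum(loads.values()) / len(loads)

if should_transfer(loads["node1"], threshold=75):
    dest = pick_destination("node1", loads)
    proc = pick_process(loads["node1"], avg, procs)
    print(f"migrate {proc} from node1 to {dest}")
```

Picking a process close to the excess load aims to bring the sender near the average in one migration rather than several.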
Evaluation: experimental framework
• Dedicated single IP address cluster
• 5 DVE server nodes + a MySQL server
• 2.4GHz Dual-Core AMD Opteron
• 2 GB RAM
• Gigabit Ethernet for both in-cluster and public
network
Evaluation: OpenArena server
• OpenArena is an open-source multi-player online game
based on the Quake III engine [1]
• Uses UDP for client-server communication
• ~20 messages (updates) per second
• Live migrated while 24 clients were participating in a
session
• ~25 ms service downtime due to migration, based on tcpdump results
on the client machines
[1] http://openarena.ws/smfnews.php
Evaluation: DVE simulation
• DVE simulation with communication characteristics resembling real-world MMPOGs using TCP connections
• Client state update: 20 msgs / sec, 256~512 bytes message size [2]
• DVE server processes maintain a MySQL connection to the local DB server
• CPU consumption grows proportionally with the number of clients in a
given zone; 10,000 clients involved
• Virtual space consists of 10x10 zones; each DVE server node is
initially assigned 20 zones
• 15-minute simulation during which
clients are instructed to move to the
upper-left and bottom-right corners of
the virtual space
• Files are assumed to be available on
each node
[2] Traffic characteristics of a massively multi-player online role playing game, NetGames’05
Live migration process downtime
[Figure: process downtime (ms), y-axis from 0 to 200, versus number of TCP connections (16 to 1024) for the Iterative, Collective, and Incremental collective methods]
Socket data transferred during process
freeze phase
[Figure: socket data transferred (bytes), y-axis from 0 to 4000, versus number of TCP connections (16 to 1024) for Iterative / Collective and Incremental collective]
Load distribution during simulation
without load balancing
• node1, node2 and node5 become overloaded when
clients move to zones maintained by these nodes
Load distribution during simulation
with load balancing
• Load stays balanced throughout the simulation
Number of zone server processes on
each node during the simulation
• Lighter processes are migrated over to node3 and node4 in order to
balance the overall load of the system
Conclusion
• Process live migration
– Optimizations for migrating a massive number of
network connections
– No modifications to the TCP protocol or to the client
side network stack
• Dynamic load balancing engine exploiting process
live migration
• DVE simulation for demonstrating load balancer
and live migration
• Other possible scenarios:
– Fault tolerance (IEEE NCA2010)
– Power management
Thank you for your attention!
Questions?
Related Work
• Connection Migration:
– NEC’s distributed Web Server arch: each session has its own
virtual IP address
– SockMi, Tcpcp: TCP migration with IP layer forwarding, don’t
decouple the process from the source machine
– TCP Migrate option: extension to the TCP protocol
• Process migration and incremental checkpointing:
– V-System, Amoeba, Mach, Sprite, MOSIX – limited connection
migration support
– BLCR: no support for connection migration or incremental checkpointing
– Zap’s VNAT: support required on client side as well
• Load balancing DVEs:
– Several studies addressing application level solutions
– MOSIX: home-node approach leaves residual dependencies