Live Migration of Virtual Machines
Presented by: Edward Armstrong
University of Guelph

Overview
• Problem Description
• Solutions
• Strengths and Weaknesses
• Results

Migrating an OS - Complexity
Trivial
• Same machine.
• Hibernating a laptop.
Moderate
• Same hardware, different machine.
• Changing cluster nodes.
Complex
• Different hardware.

Migrating an OS – Timescale
Reboot
• Requires only long-term storage to be migrated.
• Loses all process state information.
• Driver reloading solves hardware problems.
Hibernation
• Typically performed on the same machine.
• Requires long-term storage.
• Loses external connection information, i.e. network status.
Suspend
• Hibernation in short-term storage (RAM).
• Typically used to maintain a low-power state.
Live
• Processes are not implicitly frozen.
• Differences in hardware create problems.
• Solved by using virtual machines.

Considerations
• Both machines must be active at the same time.
• Migration of active live services.
• Total migration time.
• Resource contention.

Migrating memory
• Push
• Pull
• Pause

Memory - Pure stop and copy
Pros
◦ Simplicity.
◦ Consistency.
Cons
◦ Downtime proportional to memory size.
◦ Unacceptable for live services.
[Diagram labels: Source Machine, Target Machine]

Memory - On demand migration
Pros
◦ Shorter downtime.
◦ Consistency.
Cons
◦ Longer migration time.
[Diagram labels: Page fault request, Send page, Source Machine, Target Machine]

Memory - Pre copy migration
Pros
◦ Copy low-fault pages quickly.
◦ Works well for live processes.
Cons
◦ Large number of faults for busy memory.
[Diagram labels: Iterative push, Live page fault, Source Machine, Target Machine]

Network and disk resources
• Unique to an OS instance.
• Ordering of resources is nondeterministic.
• Need to maintain open network connections.

Resolving network connections with an ARP* response.
[Diagram labels: Source Machine, No Address, LAN, Packets, Target Machine, 192.168.0.102]
*ARP: Address Resolution Protocol

Resolving network connections with an ARP* response.
[Diagram labels: Unsolicited ARP reply, Source Machine, No Address, LAN, Packets, Target Machine, 192.168.0.102]
*ARP: Address Resolution Protocol

Resolving network connections with an ARP* response.
[Diagram labels: Source Machine, 192.168.0.102, Packets, LAN, Target Machine, No Address]

Network resources
Pros
◦ Handled by external devices.
◦ A similar scheme can be used to migrate disk services (not covered in the paper).
Cons
◦ Small amount of packet loss.
◦ Requires a LAN with unsolicited ARP responses enabled.

Design Overview
Stages, in order:
• Pre-Migration: Select a new target machine.
• Reservation: Confirm available resources on the target. Failure means the VM continues to run on the source.
• Pre-Copy: Transfer memory. Retransmit memory modified during the transfer.
• Stop and Copy: Halt the source to redirect network traffic and transfer CPU state.
• Commit: Verify that the transfer is complete. Disable the source.
• Activate: Activate the target machine.
[Diagram labels: Source Machine, Target Machine]

Summary of Design
• At all times there is at least one consistent image available.
• Minimized downtime.
• Not a fast process overall.
• Requires tuning.
• Requires certain hardware.

Performance – Dirty Pages
• 8-second granularity, used to decide which pages are good candidates for pre-migration.

Performance
[Figures: 1-4 pre-copy iterations]
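
The trade-off between pure stop and copy and iterative pre-copy can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the implementation measured in the slides: the names simulate_pre_copy, MigrationTrace and example_workload are hypothetical, pages are modelled as integers, and the dirtying model (a fixed hot set plus a warm set that grows with the length of the round) is invented for illustration. Each round pushes the pages dirtied during the previous round while the VM keeps running; whatever is still dirty at the end is sent during the pause.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class MigrationTrace:
    """Hypothetical record of one simulated migration."""
    pages_pushed_per_round: List[int] = field(default_factory=list)
    stop_and_copy_pages: int = 0  # pages sent while the VM is paused


def simulate_pre_copy(
    total_pages: int,
    dirtied_during: Callable[[int], Set[int]],
    max_rounds: int = 4,
    threshold: int = 50,
) -> MigrationTrace:
    """Simulate iterative pre-copy followed by a final stop and copy.

    dirtied_during(n) returns the set of pages the guest writes while a
    round that pushes n pages is in flight (an assumed workload model).
    With max_rounds=0 this degenerates to pure stop and copy.
    """
    trace = MigrationTrace()
    to_send = set(range(total_pages))  # the first round pushes every page
    for _ in range(max_rounds):
        trace.pages_pushed_per_round.append(len(to_send))
        # The guest keeps running during the push, so some pages go stale.
        to_send = dirtied_during(len(to_send))
        if len(to_send) <= threshold:
            break  # remaining dirty set is small enough to pause for
    trace.stop_and_copy_pages = len(to_send)
    return trace


def example_workload(pages_in_flight: int) -> Set[int]:
    """Assumed workload: 40 hot pages are always rewritten, and longer
    rounds give the guest time to dirty more warm pages."""
    hot = set(range(40))
    warm = set(range(40, 40 + pages_in_flight // 10))
    return hot | warm


if __name__ == "__main__":
    pre_copy = simulate_pre_copy(1000, example_workload)
    stop_copy = simulate_pre_copy(1000, example_workload, max_rounds=0)
    print("pre-copy rounds:", pre_copy.pages_pushed_per_round)
    print("pages sent during pause (pre-copy):", pre_copy.stop_and_copy_pages)
    print("pages sent during pause (stop and copy):", stop_copy.stop_and_copy_pages)
```

Under this made-up workload the pre-copy run pushes 1000, 140 and 54 pages over three rounds and pauses to send the last 45, whereas pure stop and copy pauses for all 1000. The hot pages are retransmitted in every round, which is the "large number of faults for busy memory" cost, and the max_rounds and threshold parameters stand in for the tuning the design summary mentions.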