Transparent Process Migration: Design Alternatives and the Sprite Implementation
Fred Douglis and John Ousterhout

Purposes
• Utilize idle workstations
• Reclaim ownership of workstations when they are no longer idle

Main Goals
• Transparent
• Users own their workstations
• High performance
• Low complexity
• Easy to use

Alternatives
• rsh
• Nichols' Butler System
• Accent
• Charlotte
• V
• LOCUS
• 7up
• See "A Brief Survey of Systems Providing Process or Object Migration Facilities"

System Features
• Idle hosts are plentiful
• Users own their workstations
• Most programs are short-lived
• Sprite uses kernel calls (as opposed to message passing)
• Sprite provides networking mechanisms that allow for ease of implementation

Policies
• Which processes do we migrate?
• Do we migrate only at initiation, or also after a process has run for a while?
• How do we select idle machines?
• Who makes the decisions?

Possible Options
• Pool of processors
• No policy support (rsh)
• Automated selection of idle hosts, with other decisions only partially automated

Selection of Idle Hosts
• Each machine has a load-average daemon that monitors usage
• The load-average daemon notifies the central migration server when the machine is idle
• Requests for an idle host are sent to the server process
• What determines if a host is idle?
    • No keyboard or mouse input for a certain period
    • Fewer runnable processes than processors
• Which idle host do we use?
    • The host that has been idle the longest
    • The host with the most memory

Transparency Definition
• Behavior is unaffected by migration
• Appearance to the rest of the world is unaffected by migration

Achieving Transparency
• Changed the file system so every machine sees the same name space
• Transfer state information from source to target
• Forward kernel calls home
• Ad hoc techniques for specific kernel calls such as fork

State Information
• Virtual memory
• Open files
• Message channels
• Execution state
• Other (pid, user id, working directory, signal masks and handlers, resource usage stats, references to parent/child, etc.)
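
Returning to the Selection of Idle Hosts slide, the two idleness tests and the two tie-breaking options can be sketched in a few lines of Python. This is only an illustration, not Sprite's code: the `HostStatus` record, the function names, and the 30-second threshold are assumptions (the slides say only "a certain period"), and time since the last input is used as a stand-in for how long a host has been idle.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumed value; the slides do not give the actual quiet period.
IDLE_INPUT_SECONDS = 30.0


@dataclass
class HostStatus:
    """Hypothetical record a load-average daemon might report to the server."""
    name: str
    seconds_since_input: float  # time since last keyboard or mouse activity
    runnable_processes: int
    processors: int
    free_memory_kb: int


def is_idle(host: HostStatus, min_quiet: float = IDLE_INPUT_SECONDS) -> bool:
    """A host counts as idle only if both criteria from the slide hold."""
    no_recent_input = host.seconds_since_input >= min_quiet
    spare_cpu = host.runnable_processes < host.processors
    return no_recent_input and spare_cpu


def pick_idle_host(hosts: Iterable[HostStatus],
                   prefer: str = "longest_idle") -> Optional[HostStatus]:
    """Choose among idle hosts using one of the two options on the slide."""
    idle = [h for h in hosts if is_idle(h)]
    if not idle:
        return None  # no idle host available
    if prefer == "longest_idle":
        return max(idle, key=lambda h: h.seconds_since_input)
    if prefer == "most_memory":
        return max(idle, key=lambda h: h.free_memory_kb)
    raise ValueError(f"unknown preference: {prefer}")


hosts = [
    HostStatus("a", 10, 0, 1, 1000),    # user active recently: not idle
    HostStatus("b", 600, 0, 1, 2000),   # idle
    HostStatus("c", 3600, 1, 1, 8000),  # quiet but CPU busy: not idle
    HostStatus("d", 120, 0, 1, 4000),   # idle
]
longest = pick_idle_host(hosts, "longest_idle")
roomiest = pick_idle_host(hosts, "most_memory")
```

With this sample data, the longest-idle rule picks host "b" and the most-memory rule picks host "d"; hosts "a" and "c" are never candidates because each fails one of the two idleness tests.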

Possible Solutions for Handling State
• Transfer state
• Arrange for forwarding
• Ignore state and sacrifice transparency
• Disallow migration for certain processes

Virtual Memory Mechanism
• Transfer at migration (Charlotte & LOCUS)
• Precopying (V)
• Lazy copy (Accent)
• Lazy copy … sorta (Sprite)
    • Modified pages are flushed to the file server and paged in on the target on demand

Open File Mechanism
• Close file on source, reopen on target
• Notify that file is in use on target, then notify that it is no longer in use on source
• Special server code for migrating files
    • Source contacts target, target contacts central server, server contacts host

PCB and Other State Information
• Home and target both need a PCB for a process
• PCB on home contains only enough fields to locate the process
• PCB on target is complete
• Other state is either part of the PCB or can be transferred directly

Eviction
• Load-average daemon checks for foreign processes and evicts them
• Evicted process is migrated back to its home machine
• Can also occur if the centralized server receives a request for an idle host when none are available

Residual Dependencies
• Definition: ongoing need for the home machine to maintain data structures or provide functionality after migration

Problems with Residual Dependencies
• Decrease in reliability
• Decrease in performance
• Increase in complexity

Other Problems: Interdependence of Migration and Kernel
• Migration involves state manipulated by every major module in the kernel
• Changes in one affect the other

Solutions
• Migration version numbers: disallow migration between hosts with different version numbers
• Avoid centralizing migration code

Algorithm
• Process initiates communication with the central server (opens a special file; the OS passes the open operation on to the server)
• Process requests idle hosts
• Centralized server selects an idle host and sends the information to the home machine
• Process is signaled to make it trap into the kernel
• Source contacts target to confirm that it is running, is available for migration, and has the same version number

Algorithm Continued
• Kernel on target allocates a new PCB and returns a token identifier
• Pre-migration routine obtains the size of the encapsulated data and takes necessary actions
• Source kernel allocates a buffer to hold the encapsulated state
• Encapsulation routine is called to place encapsulated information into a portion of the buffer
• Source kernel passes the encapsulated state to the target kernel via RPC

Algorithm Continued
• Target begins de-encapsulation routine
• Some processes also require a postmigration routine to clean up remaining state
• Source kernel frees the buffer and informs the target to resume process execution
• After the process completes, the host is returned to the pool of idle hosts
• Communication channel to the central server is closed

Practical Applications
• pmake: parallelizes the make command by invoking as many commands in parallel as there are idle hosts
• mig: takes a shell command as its argument and executes it on an idle machine

Performance: Migration Overhead

Action                                                        Time/Rate
Select and release idle host                                  36 ms
Migrate null process                                          76 ms
Transfer info for open files                                  9.4 ms/file
Flush modified file blocks                                    480 Kbyte/s
Flush modified pages                                          660 Kbyte/s
Transfer exec arguments                                       480 Kbyte/s
Fork, exec null process w/migration, wait for child to exit   81 ms
Fork, exec null process locally, wait for child to exit       46 ms

Performance: Application Performance
[Figure: bar chart comparing local and remote execution for pmake, LaTeX, rcp, fork, and gettime; vertical axis from 0 to 180, units not recoverable]

Performance: Usage Patterns
• Remote processes were 31% of execution
• Evictions have been rare

Conclusions
• Overall improvement from using idle hosts can be substantial
• Remote execution accounts for a sizeable portion of execution in Sprite
• There are still a number of idle hosts
• Execution time for a process is higher if executed remotely, but the difference is not noticeable by human standards
• Cost of transferring the address space and flushing modified file blocks dominates when migrating long-running processes

Future Work
• More applications that make use of migration
• Remote execution on machines of different types
• Mechanism for migration of processes that share memory
• Detect when users, and not just workstations, are idle

Questions???
If you dare…
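
As a rough check on the conclusion that address-space transfer and file flushing dominate for long-running processes, the Migration Overhead figures can be combined into a back-of-the-envelope estimate. The sketch below assumes the costs simply add up, which the slides do not claim; the function name and the sample workload are hypothetical, and only the four constants come from the table.

```python
# Figures taken from the Migration Overhead slide.
NULL_MIGRATION_MS = 76.0      # migrate a null process
OPEN_FILE_MS = 9.4            # per open file
FILE_FLUSH_KB_PER_S = 480.0   # flush modified file blocks
PAGE_FLUSH_KB_PER_S = 660.0   # flush modified pages


def estimate_migration_ms(open_files: int,
                          dirty_file_kb: float,
                          dirty_page_kb: float) -> float:
    """Estimate migration time in ms, assuming the costs are purely additive.

    This additive model is an illustration only, not a result from the paper.
    """
    file_info_ms = OPEN_FILE_MS * open_files
    file_flush_ms = dirty_file_kb / FILE_FLUSH_KB_PER_S * 1000.0
    page_flush_ms = dirty_page_kb / PAGE_FLUSH_KB_PER_S * 1000.0
    return NULL_MIGRATION_MS + file_info_ms + file_flush_ms + page_flush_ms


# Hypothetical process: 5 open files, no dirty file blocks, 660 Kbytes of dirty pages.
estimate = estimate_migration_ms(open_files=5, dirty_file_kb=0.0, dirty_page_kb=660.0)
fixed_part = NULL_MIGRATION_MS + OPEN_FILE_MS * 5
```

For this hypothetical process the estimate comes to about 1.1 seconds, of which roughly 1 second is page flushing and only about 0.12 seconds is the fixed and per-file cost, which is consistent with the slide's conclusion under the additive assumption.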