Data Conversion for Process/Thread Migration and Checkpointing*

Hai Jiang, Vipin Chaudhary and John Paul Walters
Institute for Scientific Computing
Wayne State University
Detroit, MI 48202
{hai, vipin, jwalters}@wayne.edu

* This research was supported in part by NSF IGERT grant 9987598, NSF MRI grant 9977815, and NSF ITR grant 0081696.

Abstract

Process/thread migration and checkpointing schemes support load balancing, load sharing and fault tolerance to improve application performance and system resource usage on workstation clusters. To enable these schemes to work in heterogeneous environments, we have developed an application-level migration and checkpointing package, MigThread, which abstracts computation states at the language level for portability. To save and restore such states across different platforms, this paper proposes a novel "Receiver Makes Right" (RMR) data conversion method, called Coarse-Grain Tagged RMR (CGT-RMR), for efficient data marshalling and unmarshalling. Unlike common data representation standards, CGT-RMR does not require programmers to analyze data types, flatten aggregate types, and encode/decode scalar types explicitly within programs. With help from MigThread's type system, CGT-RMR assigns a tag to each data type and converts non-scalar types as a whole. This speeds up the data conversion process and eases the programming task dramatically, especially for the large data chunks common to migration and checkpointing. Armed with this "plug-and-play" style data conversion scheme, MigThread has been ported to work in heterogeneous environments. Microbenchmarks and performance measurements on programs from the SPLASH-2 suite are given to illustrate the efficiency of the data conversion process.

1. Introduction

Migration concerns saving the current computation state, transferring it to remote machines, and resuming execution at the statement following the migration point. Checkpointing concerns saving the computation state to file systems and resuming execution by restoring the computation state from saved files. Although the state-transfer medium differs, migration and checkpointing share the same strategy for state handling. To improve application performance and system resource utilization, they support load balancing, load sharing, data locality optimization, and fault tolerance.

The major obstacle preventing migration and checkpointing from achieving widespread use is the complexity of adding transparent migration and checkpointing to systems originally designed to run stand-alone [1]. Heterogeneity further complicates this situation. But migration and checkpointing are indispensable to the Grid [2] and other loosely coupled heterogeneous environments. Thus, effective solutions are in demand.

To hide the different levels of heterogeneity, we have developed an application-level process/thread migration and checkpointing package, MigThread, which abstracts the computation state up to the language level [3, 4]. For applications written in the C language, states are constructed in user space instead of being extracted from the original kernels or libraries, for better portability across different platforms. A preprocessor transforms source code at compile time while a run-time support module dynamically collects the state for migration and checkpointing. The computation state is represented in terms of data.
To support heterogeneity, MigThread is equipped with a novel "plug-and-play" style data conversion scheme called Coarse-Grain Tagged "Receiver Makes Right" (CGT-RMR). It is an asymmetric data conversion method which performs data conversion only on the receiver side. Since common data representation standards are separate from user applications, programmers have to analyze data types, flatten aggregate data types such as structures, and encode/decode scalar types explicitly in programs. With help from MigThread's type system, CGT-RMR can detect data types, generate application-level tags for each of them, and ease the burden of data conversion work previously left to the programmer. Aggregate type data are handled as a whole instead of being flattened recursively in programs. Therefore, compared to common standards, CGT-RMR is more convenient for handling large data chunks. In migration and checkpointing, computation states spread out in terms of memory blocks, and CGT-RMR outperforms common standards by a large margin. Also, no large routine groups or tables are constructed as in common RMR [8]. Architecture tags are generated on the fly so that new computer platforms can be adopted automatically. CGT-RMR takes an aggressive data conversion approach between incompatible platforms. Low-level data conversion failure events can be conveyed to the upper-level MigThread scheduling module to ensure the correctness of real-time migration and checkpointing. Empowered with CGT-RMR, MigThread can handle migration and checkpointing across heterogeneous platforms.

The remainder of this paper is organized into seven sections. Section 2 provides an overview of migration and checkpointing. Section 3 discusses data conversion issues and some existing schemes. In Section 4, we provide the details of designing and implementing CGT-RMR in MigThread. Section 5 presents microbenchmarks and experimental results from real benchmark programs. In Section 6, we discuss related work. Section 7 presents our conclusions and future work.

2. Migration and Checkpointing

Migration and checkpointing concern constructing, transferring, and retrieving computation states. Despite the complexity of adding transparent support, migration and checkpointing continue to attract attention due to the potential for computation mobility.

2.1. MigThread

MigThread is an application-level multi-grained migration and checkpointing package [3] which supports both coarse-grained processes and fine-grained threads. MigThread consists of two parts: a preprocessor and a run-time support module. The preprocessor is designed to transform a user's source code into a format from which the run-time support module can construct the computation state efficiently. This helps mitigate the transparency drawback of application-level schemes. The run-time support module constructs, transfers, and restores computation states dynamically [4].
Originally, the state data consists of the process data segment, stack, heap, and register contents. In MigThread, the computation state is moved out of its original location (libraries or kernels) and abstracted up to the language level. Therefore, the physical state is transformed into a logical form to achieve platform independence. All related information with regard to stack variables, function parameters, program counters, and dynamically allocated memory regions is collected into pre-defined data structures [3].

Figures 1 and 2 illustrate a simple example of such a process, with all functions and global variables transformed accordingly. A simple function foo() is defined with four local variables as in Figure 1. MigThread's preprocessor transforms the function and generates a corresponding MTh_foo(), shown in Figure 2. All non-pointer variables are collected in a structure MThV while pointers are moved to another structure MThP. Within MThV, field MThV.MThP is the only pointer, pointing to the second structure, MThP, which may or may not exist. Field MThV.stepno is a logical construction of the program counter to indicate the program's progress and where to restart.

    foo()
    {
        int     a;
        double  b;
        int    *c;
        double **d;
        ...
    }

    Figure 1. The original function.

    MTh_foo()
    {
        struct MThV_t {
            void   *MThP;
            int     stepno;
            int     a;
            double  b;
        } MThV;
        struct MThP_t {
            int    *c;
            double **d;
        } MThP;

        MThV.MThP = (void *)&MThP;
        ...
    }

    Figure 2. The transformed function.

In process/thread stacks, each function's activation frame contains MThV and MThP to record the current function's computation status. The overall stack status can be obtained by collecting all of these MThV and MThP data structures spread across activation frames. Since address spaces could differ on source and destination machines, values of pointers referencing stacks or heaps might become invalid after migration. It is the preprocessor's responsibility to identify and mark pointers at the language level so that they can easily be traced and updated later. MigThread also supports user-level memory management for heaps. Eventually, all state-related contents, including stacks and heaps, are moved out to user space and handled by MigThread directly.

2.2. Migration and Checkpointing Safety

Migration and checkpointing safety concerns ensuring the correctness of resumed computation [4, 5]. In other words, computation states should be constructed precisely and restored correctly on similar or different machines. The major identified unsafe factors come from unsafe type systems (such as the one in C) and third-party libraries [4]. For heterogeneous schemes, if data formats on different machines are incompatible, migration or resuming execution from checkpoint files might lead to errors. This requires that upper-level migration/checkpointing schemes be aware of the situation in lower-level data conversion routines. MigThread supports aggressive data conversion and aborts state restoration only when "precision loss" events occur. Thus, the third unsafe factor for heterogeneous schemes, incompatible data conversion, can be identified and handled properly.

3. Data Conversion

Computation states can be transformed into pure data. If different platforms use different data formats and computation states constructed on one platform need to be interpreted by another, the data conversion process becomes unavoidable.

3.1. Data Conversion Issues

In heterogeneous environments, common data conversion issues are identified as follows:

• Byte Ordering: either big endian or little endian.
• Character Sets: either ASCII or EBCDIC representation.

• Floating-Point Standards: IEEE 754, IEEE 854, CRAY, DEC, or IBM standard.

• Data Alignment and Padding: data is naturally aligned when its starting address is on a "natural boundary," i.e., when the starting memory address is a multiple of the data's size. Structure alignment can result in unused space, called padding. Padding between members of a structure is called internal padding; padding between the last member and the end of the space occupied by the structure is called tail padding. Although the natural boundary can be the default setting for alignment, data alignment is actually determined by processors, operating systems, and compilers. To avoid such indeterministic alignment and padding, many standards flatten native aggregate data types and re-represent them in their own default formats.

• Loss of Precision: when high-precision data are converted to their lower-precision counterparts, loss of precision may occur.

3.2. Data Conversion Schemes

Data representations can be either tagged or untagged. A tag is any additional information associated with data that helps a decoder unmarshal the data.

Canonical intermediate form is one of the major data conversion strategies; it provides an external representation for each data type. Many standards adopt this approach, such as XDR (External Data Representation) [6] from Sun Microsystems, ISO ASN.1 (Abstract Syntax Notation One) [7], CCSDS SFDU (Standard Formatted Data Units), ANDF (Architecture Neutral Data Format), IBM APPC GDS (General Data Stream), ISO RDA (Remote Data Access), and others [8]. Such common data formats are recognized and accepted by all platforms to achieve data sharing. Even if both the sender and receiver are on the same machine, they still need to perform this symmetric conversion on both ends. XDR adopts the untagged data representation approach: data types have to be determined by application protocols and associated with a pair of encoding/decoding routines.

Zhou and Geist [8] took another approach, called "receiver makes it right" (RMR), which performs data conversion only on the receiver side. If there are n machines, each of a different type, the number of conversion routine groups will be (n^2 - n)/2. In theory, the RMR scheme leads to bloated code as n increases. Another disadvantage is that RMR does not automatically support newly introduced platforms.

4. Coarse-Grain Tagged RMR in MigThread

The proposed data conversion scheme is a "Receiver Makes Right" (RMR) variant which performs the conversion only once. This tagged version can tackle data alignment and padding physically, convert data structures as a whole, and eventually generate a lighter workload compared to existing standards.

An architecture tag is inserted at the beginning. Since the byte ordering within networks is big-endian, simply comparing a platform's data representation against its network format can detect the endianness of the platform. Currently MigThread accepts only ASCII character sets and is not applicable on some IBM mainframes. Also, IEEE 754 is the adopted floating-point standard because of its dominance in the market, and MigThread can be extended to other floating-point formats.
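The endianness portion of the architecture tag can be determined without any platform table. Below is a minimal sketch of such a check, using the standard htonl() routine to compare a value against its network (big-endian) representation; it is an illustration of the idea, not MigThread's actual code.

    #include <arpa/inet.h>   /* htonl() */
    #include <stdint.h>
    #include <stdio.h>

    /* Detect the local byte ordering by comparing a probe value with its
       network (big-endian) representation, as described above. */
    static int is_big_endian(void)
    {
        uint32_t probe = 0x01020304u;
        return probe == htonl(probe);   /* equal only on big-endian hosts */
    }

    int main(void)
    {
        printf("architecture tag endianness: %s\n",
               is_big_endian() ? "big-endian" : "little-endian");
        return 0;
    }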
4.1. Tagging and Padding Detection

Among data conversion schemes, tagged approaches associate each data item with its type attribute so that receivers can decode each item precisely. With this, fewer conversion routines are required, but tags create extra workload and slow down the whole process. Untagged approaches, on the other hand, maintain large sets of conversion routines, and encoding/decoding orders have to be handled explicitly in application programs; any performance improvement comes at the cost of this extra coding burden.

In existing data format standards, both tagged and untagged approaches handle basic (scalar) type data on a one-by-one basis. Aggregate types need to be flattened down to a set of scalar types for data conversion. The main reason is to avoid the padding issue in aggregate types. Since the padding pattern is a consequence of the processor, operating system, and compiler, the padding situation only becomes deterministic at run-time. It is impossible to determine a padding pattern within programs and impractical for programmers to convey padding information from programs to conversion routines at run-time, because programs can only communicate with conversion routines in one direction and programming models have to be simple. Most existing standards choose to avoid padding issues by handling only scalar types directly.

MigThread is a combination of compile-time and run-time support, and its programmers do not need to worry about data formats. The preprocessor parses the source code, sets up the type system, and transforms the source code to communicate with the run-time support module through inserted primitives. With the type system, the preprocessor can analyze data types, flatten aggregate types recursively, detect padding patterns, and define tags. But the actual tag contents can be set only at run-time, and they may not be the same on different platforms. Since all of the tedious tag definition work has been performed by the preprocessor, the programming style becomes extremely simple. Also, with global control, low-level issues such as the data conversion status can be conveyed to upper-level scheduling modules. Therefore, both an easy coding style and performance gains come from the preprocessor.

In MigThread, tags are used to describe data types and their padding situations so that data conversion routines can handle aggregate types as well as common scalar types. As discussed in Section 2, global variables and function local variables are collected into their corresponding structure variables MThV and MThP, which are registered as the basic units. Tags are defined and generated for these structures as well as for dynamically allocated memory blocks in the heap. For the simple example in Section 2 (Figures 1 and 2), the tag definitions of MThV_heter and MThP_heter for MThV and MThP are shown in Figure 3.

    MTh_foo()
    {
        ...
        char MThV_heter[60];
        char MThP_heter[41];
        int MTh_so2 = sizeof(double);
        int MTh_so1 = sizeof(int);
        int MTh_so4 = sizeof(struct MThP_t);
        int MTh_so3 = sizeof(struct MThV_t);
        int MTh_so0 = sizeof(void *);

        sprintf(MThV_heter,
            "(%d,-1)(%d,0)(%d,1)(%d,0)(%d,1)(%d,0)(%d,1)(%d,0)",
            MTh_so0, (long)&MThV.stepno - (long)&MThV.MThP - MTh_so0,
            MTh_so1, (long)&MThV.a - (long)&MThV.stepno - MTh_so1,
            MTh_so1, (long)&MThV.b - (long)&MThV.a - MTh_so1,
            MTh_so2, (long)&MThV + MTh_so3 - (long)&MThV.b - MTh_so2);
        sprintf(MThP_heter,
            "(%d,-1)(%d,0)(%d,-1)(%d,0)",
            MTh_so0, (long)&MThP.d - (long)&MThP.c - MTh_so0,
            MTh_so0, (long)&MThP + MTh_so4 - (long)&MThP.d - MTh_so0);
        ...
    }

    Figure 3. Tag definition at compile time.
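To make the address arithmetic in Figure 3 concrete, the following sketch computes the same internal and tail padding for the example MThV_t structure with the standard offsetof() macro. It illustrates the padding rules only; it is not MigThread-generated code, and the printed values depend on the platform and compiler.

    #include <stddef.h>   /* offsetof() */
    #include <stdio.h>

    struct MThV_t { void *MThP; int stepno; int a; double b; };

    int main(void)
    {
        /* internal padding between member a and member b */
        size_t pad_a_b = offsetof(struct MThV_t, b)
                       - offsetof(struct MThV_t, a) - sizeof(int);
        /* tail padding between the last member and the end of the structure */
        size_t tail    = sizeof(struct MThV_t)
                       - offsetof(struct MThV_t, b) - sizeof(double);

        printf("padding after a: %zu bytes, tail padding: %zu bytes\n",
               pad_a_b, tail);
        return 0;
    }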
It is still too early to determine the content of the tags within programs. The preprocessor defines rules to calculate structure members' sizes and variant padding patterns, and inserts sprintf() statements to glue the partial results together. The actual tag generation takes place at run-time when the sprintf() statements are executed. On a Linux machine, the simple example's tags can be the two character strings shown in Figure 4.

    char MThV_heter[60] = "(4,-1)(0,0)(4,1)(0,0)(4,1)(0,0)(8,0)(0,0)";
    char MThP_heter[41] = "(4,-1)(0,0)(4,-1)(0,0)";

    Figure 4. Tag calculation at run-time.

A tag is a sequence of (m,n) tuples, and can be expressed in one of the following cases (where m and n are positive numbers):

• (m, n): scalar types. The item m is simply the size of the data type, and n indicates the number of such scalar items.

• ((m1, n1) ... (mi, ni), n): aggregate types. The m in a tuple (m, n) can be substituted with another tag (or tuple sequence) repeatedly. Thus, a tag can be expanded recursively for enclosed aggregate type fields until all fields are reduced to scalar types. The second item n still indicates the number of top-level aggregate items.

• (m, -n): pointers. The m is the size of the pointer type on the current platform. The "-" sign indicates the pointer type, and n still means the number of pointers.

• (m, 0): padding slots. The m specifies the number of bytes this padding slot occupies. The tuple (0, 0) is a common case and indicates no padding.

In programs, only one statement is issued for each data type, whether it is a scalar or aggregate type. The flattening procedure is accomplished by MigThread's preprocessor during tag definition instead of in the encoding/decoding process at run-time. Hence, programmers are freed from this responsibility. A sketch of how a receiver can interpret such a tag string is shown below.
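The following sketch walks a flat tag string and reports each slot it describes. It is an illustration only, not MigThread code: it handles just the flat scalar, pointer, and padding tuples defined above, and nested aggregate tags are omitted for brevity.

    #include <stdio.h>

    /* Walk a flat tag such as "(4,-1)(0,0)(4,1)(0,0)(8,1)(4,0)" and report
       each slot: n > 0 scalar items of size m, n < 0 pointers, n == 0 padding. */
    static void walk_tag(const char *tag)
    {
        int m, n, used;

        while (sscanf(tag, " (%d,%d)%n", &m, &n, &used) == 2) {
            if (n > 0)
                printf("%d scalar item(s) of %d byte(s)\n", n, m);
            else if (n < 0)
                printf("%d pointer(s) of %d byte(s)\n", -n, m);
            else if (m > 0)
                printf("padding slot of %d byte(s)\n", m);
            /* (0,0) means no padding and is simply skipped */
            tag += used;
        }
    }

    int main(void)
    {
        walk_tag("(4,-1)(0,0)(4,1)(0,0)(8,1)(4,0)");   /* an example flat tag */
        return 0;
    }

The actual conversion routines described in Section 4.2 follow the same traversal while additionally copying, and if necessary byte-swapping, the scalar slots into the new memory block.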
4.2. Data Restoration

Each function contains one or two structures and corresponding tags, depending on whether MThP exists. In MigThread, all memory segments for these structures are represented in a "tag-block" format. The process/thread stack becomes a sequence of MThV, MThP, and their tags. Memory blocks in heaps are also associated with such tags to express the actual layout in memory space. Therefore, the computation state physically consists of a group of memory segments, each associated with its own tag, in a "tag-segment" pair format.

To support heterogeneity, MigThread executes data conversion routines against these coarse-grained memory segments instead of individual data objects, which yields performance gains. The receivers, or the processes reading checkpoint files, convert the computation state, i.e., the data, as required. Since activation frames in stacks are re-run and heaps are recreated, a new set of segments in "tag-block" format is available on the new platform. MigThread first compares architecture tags with strcmp(). If they are identical and the blocks have the same sizes, the platform remains unchanged and the old segment contents are simply copied over with memcpy(). This enables prompt processing between homogeneous platforms, whereas symmetric conversion approaches still incur data conversion overhead on both ends.

If the platform has changed, conversion routines are applied to all memory segments. For each segment, a "walk-through" process is conducted against its corresponding old segment from the previous platform, as shown in Figure 5.

    Figure 5. Walking through "tag-block" format segments: the old tag (m1,n1)(m2,n2)... and old block are traversed in step with the new tag (m1',n1')(m2',n2')... and new block, converting scalar slots and skipping padding.

In these segments, according to their tags, memory blocks are viewed as consisting of scalar type data and padding slots alternately. The high-level conversion unit is the data slot rather than the byte, in order to achieve portability. The walk-through process maintains two index pointers pointing to a pair of matching scalar data slots in the two blocks. The contents of the old data slots are converted and copied to the new data slots if the byte ordering changes, and then the index pointers move on to the next slots. In the meantime, padding slots are skipped over, although most of them are defined as (0, 0) to indicate that they do not physically exist. In MigThread, data items are expressed in this "scalar type data - padding slots" pattern to support heterogeneity.

4.3. Data Resizing

Between incompatible platforms, if data items are converted from higher-precision formats to lower-precision formats, precision loss may occur. Normally higher-precision data are longer, so the high-end portion cannot be stored in the lower-precision format. But if the high-end portion contains all-zero content, it is safe to discard it, since the data value remains unchanged. MigThread takes this aggressive strategy and attempts to convert data unless precision loss actually occurs, so that more programs qualify for migration and checkpointing (a sketch of such a check is given at the end of this section). Detecting incompatible data formats and conveying this low-level information up to the scheduling module helps abort data restoration promptly for safety.

4.4. Plug-and-Play

We describe CGT-RMR as a "plug-and-play" style scheme: it does not maintain tables or routine groups for all possible platforms. Since almost all forthcoming platforms follow the IEEE floating-point standard, no special requirement is imposed for porting to a new platform. However, adopting older architectures such as IBM mainframes, CRAY, and DEC requires some special conversion routines for floating-point numbers.
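As a concrete illustration of the resizing check in Section 4.3, the sketch below narrows an 8-byte integer slot to a 4-byte slot and reports precision loss when the discarded high-order portion is non-zero. The function name and error convention are illustrative assumptions, not MigThread's API.

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    /* Copy an 8-byte integer slot (already in host byte order) into a
       4-byte slot. Returns 0 on success, -1 if the discarded high-order
       bytes are non-zero (precision loss), so that an upper-level
       scheduler could abort the restoration, as described in Section 4.3. */
    static int narrow_slot(const unsigned char *old8, unsigned char *new4)
    {
        uint64_t value;
        memcpy(&value, old8, sizeof value);
        if (value >> 32)                     /* high-end portion not all zero */
            return -1;
        uint32_t small = (uint32_t)value;
        memcpy(new4, &small, sizeof small);
        return 0;
    }

    int main(void)
    {
        uint64_t ok = 42, bad = 0x100000000ULL;
        unsigned char out[4];
        printf("42   -> %s\n", narrow_slot((unsigned char *)&ok,  out) == 0
                               ? "converted" : "precision loss");
        printf("2^32 -> %s\n", narrow_slot((unsigned char *)&bad, out) == 0
                               ? "converted" : "precision loss");
        return 0;
    }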
5. Microbenchmarks and Experiments

One of our experimental platforms is a Sun Enterprise E3500 with 330 MHz UltraSPARC processors and 1 GB of RAM, running Solaris 5.7. The other platform is a PC with a 550 MHz Intel Pentium III processor and 128 MB of RAM, running Linux. The CGT-RMR scheme is applied for data conversion in migration and checkpointing between these two machines.

PVM (Parallel Virtual Machine) uses the XDR standard for heterogeneous computing. Thus, some process migration schemes, such as SNOW [12], apply XDR indirectly by calling PVM primitives. Even the original RMR implementation was based on XDR's untagged strategy [8]. Since most existing systems adopt XDR or similar data conversion strategies [11, 5, 12], we compare CGT-RMR with the XDR implementation in PVM to predict the performance of MigThread and other similar systems.

The costs of converting scalar data types are shown in Figure 6. Data are encoded on one platform and decoded on another. For scalar types such as char, short, int, long, and double, PVM's XDR implementation (PVM-XDR) is slower than CGT-RMR, which is faster still in homogeneous environments since no conversion actually occurs. XDR also forces the programmer to encode and decode data even on the same platform. Figure 6 indicates that CGT-RMR handles basic data units more efficiently than PVM-XDR.

    Figure 6. Conversion costs for scalar types: conversion cost (microseconds) of PVM-XDR and CGT-RMR for char, short, int, long, and double on the platform pairs Solaris-Solaris, Linux-Solaris, Solaris-Linux, and Linux-Linux.

To test scalability, we apply the two schemes to integer and structure arrays. Figure 7 shows the behavior for an integer array. In homogeneous environments, i.e., both encoding and decoding are performed on either the Solaris or the Linux machine, CGT-RMR demonstrates virtually no cost and excellent scalability. In heterogeneous environments, i.e., data is encoded on Solaris or Linux and decoded on a different machine, CGT-RMR incurs a little more overhead, shown by the top two curves. The four curves in the middle are from PVM-XDR, which does not vary much with different platform combinations, nor can it take advantage of homogeneous environments. This indicates that PVM-XDR has slightly better scalability on scalar type arrays in heterogeneous environments.

    Figure 7. Conversion costs of integer arrays: conversion time (microseconds) versus the logarithm of the array length for PVM-XDR and CGT-RMR on the four platform pairs (S-S, L-S, S-L, L-L).

The conversion overheads of structure arrays are shown in Figure 9. Figure 8 lists the sample structure array, which contains five common scalar type fields with different data alignment requirements.

    struct {
        char   a;
        short  b;
        int    c;
        long   d;
        double e;
    } s[n];

    Figure 8. The structure array.

    Figure 9. Conversion costs of structure arrays: conversion time (microseconds) versus the logarithm of the array length for PVM-XDR and CGT-RMR on the four platform pairs (S-S, L-S, S-L, L-L).

Again, in a homogeneous environment, CGT-RMR causes virtually no overhead. In its heterogeneous cases, the top two curves begin to merge with the four PVM-XDR curves. Because of the padding issue in structures, programmers have to encode/decode each field explicitly, which diminishes XDR's advantage on scalar arrays and incurs tremendous programming complexity. In the simple case shown in Figure 8, if n is the length of the array, there will be 10n encoding and decoding statements hand-coded by programmers. In CGT-RMR, only one primitive is required on each side, and the preprocessor handles all other tag generation details automatically. Therefore, CGT-RMR eases the coding complexity dramatically in complex cases such as migration and checkpointing schemes, where large computation states are common.
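To make the 10n figure concrete, the fragment below shows the per-field Sun RPC XDR calls needed to marshal the structure array of Figure 8; the decoding side must mirror every call. This is a sketch of the conventional XDR coding style, not code from PVM or MigThread.

    #include <rpc/xdr.h>   /* Sun RPC XDR: xdrmem_create, xdr_char, ... */

    struct elem { char a; short b; int c; long d; double e; };

    /* XDR-style marshalling: five explicit calls per element, repeated on
       the decoding side, giving the 10n statements discussed above. */
    int encode_with_xdr(XDR *xdrs, struct elem *s, unsigned n)
    {
        for (unsigned i = 0; i < n; i++) {
            if (!xdr_char(xdrs, &s[i].a))   return 0;
            if (!xdr_short(xdrs, &s[i].b))  return 0;
            if (!xdr_int(xdrs, &s[i].c))    return 0;
            if (!xdr_long(xdrs, &s[i].d))   return 0;
            if (!xdr_double(xdrs, &s[i].e)) return 0;
        }
        return 1;
    }

    int main(void)
    {
        struct elem data[2] = { {'x', 1, 2, 3L, 4.0}, {'y', 5, 6, 7L, 8.0} };
        char buf[256];
        XDR xdrs;
        xdrmem_create(&xdrs, buf, sizeof buf, XDR_ENCODE);
        return encode_with_xdr(&xdrs, data, 2) ? 0 : 1;
    }

Under CGT-RMR, by contrast, a single MigThread primitive on each side registers the whole block, and the preprocessor-generated tag carries the layout information, so no per-field statements are written by hand.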
To evaluate the CGT-RMR strategy in real applications, we apply it to FFT, the contiguous and non-contiguous versions of LU from the SPLASH-2 suite, and a matrix multiplication application. We predefine the adaptation points for migration or checkpointing. The detailed overheads are listed in Table 1.

Table 1. Migration and checkpointing overheads in real applications (times in microseconds; state size in bytes).

Program (Func. Act.)    | Platform Pair   | State Size (B) | Save Files | Read Files | Send Socket | Convert Stack | Convert Heap | Update Pointers
FFT (2215), 1024        | Solaris-Solaris | 78016          | 96760      | 24412      | 26622       | 598           | 1033         | 364
                        | Linux-Solaris   | 78024          | 48260      | 24492      | 29047       | 1581          | 57218        | 459
                        | Solaris-Linux   | 78016          | 96760      | 13026      | 16948       | 923           | 28938        | 443
                        | Linux-Linux     | 78024          | 48260      | 13063      | 17527       | 387           | 700          | 399
LU-c (2868309), 512x512 | Solaris-Solaris | 2113139        | 2507354    | 4954588    | 4939845     | 589           | 27534        | 5670612
                        | Linux-Solaris   | 2113170        | 1345015    | 4954421    | 5230449     | 1492          | 3158140      | 6039699
                        | Solaris-Linux   | 2113139        | 2507354    | 7011277    | 7045671     | 863           | 2247536      | 8619415
                        | Linux-Linux     | 2113170        | 1345015    | 7058531    | 7131833     | 385           | 19158        | 8103707
LU-n (8867), 128x128    | Solaris-Solaris | 135284         | 165840     | 51729      | 53212       | 528           | 2359         | 306
                        | Linux-Solaris   | 135313         | 85053      | 51501      | 62003       | 1376          | 103735       | 322
                        | Solaris-Linux   | 135284         | 165840     | 40264      | 44901       | 837           | 52505        | 359
                        | Linux-Linux     | 135313         | 85053      | 40108      | 56695       | 357           | 1489         | 377
MatMult (6), 128x128    | Solaris-Solaris | 397259         | 501073     | 166539     | 164324      | 136           | 2561         | 484149
                        | Linux-Solaris   | 397283         | 252926     | 120229     | 220627      | 385           | 306324       | 639281
                        | Solaris-Linux   | 397259         | 501073     | 166101     | 129457      | 862           | 604161       | 482380
                        | Linux-Linux     | 397283         | 252926     | 120671     | 130107      | 100           | 3462         | 640072

With certain input sizes, applications are paused on one platform to construct computation states, whose sizes vary from 78 KB to about 2 MB in these sample programs. The computation states are then transferred to another platform for migration, or saved into file systems and read out by another process on another platform for checkpointing. CGT-RMR plays a role in data conversion in stacks, data conversion in heaps, and pointer updating in both areas.

In FFT and LU-n, large numbers of memory blocks are dynamically allocated in heaps. In homogeneous environments, stacks and heaps are recreated without data conversion, but in heterogeneous environments, converting data in large heaps dominates the CPU time. On the other hand, LU-c and MatMult are pointer-intensive applications. When computation states are restored, pointer updating is an unavoidable task. In homogeneous environments, with no data conversion issue, the CPU devotes itself to pointer updating. Even in heterogeneous cases, pointer updating is still a major issue, although the large heaps also incur noticeable overheads. The time spent on stacks is negligible. Under XDR or similar standards the overhead distribution would be similar for homogeneous and heterogeneous environments, whereas CGT-RMR runs much faster in homogeneous environments and performs similarly to XDR in heterogeneous environments.

From the microbenchmarks, we can see that CGT-RMR takes less time converting scalar type data and provides distinct advantages in programming complexity. XDR shows only a limited advantage in scalar array processing, at the cost of tremendous coding effort from programmers.
The experiments on real applications detail the overhead distribution and indicate that CGT-RMR helps provide a practical migration and checkpointing solution with minimal user involvement and satisfactory performance.

6. Related Work

There have been a number of notable attempts at designing process migration and checkpointing schemes; however, few implementations of fine-grained thread migration and checkpointing have been reported in the literature.

An extension of the V migration mechanism is proposed in [9]. It requires both compiler and kernel support for migration. Data has to be stored at the same address in all migrated instances of the process to avoid pointer updating and variant padding patterns in aggregate types. Obviously, this constraint is inefficient or even impossible to satisfy across different platforms.

Another approach is proposed by Theimer and Hayes in [10]. Their idea was to construct an intermediate source code representation of a running process at the migration point, migrate the new source code, and recompile it on the destination platform. The extra compilation can incur additional delay; efficiency and portability are the drawbacks.

The Tui system [5] is an application-level process migration package which utilizes compiler support and a debugger interface to examine and restore process states. It applies an intermediate data format to achieve portability across various platforms. Just as with XDR, even if migration occurs on the same platform, data conversion routines are still performed twice.

Process Introspection (PI) [11] uses program annotation techniques, as does MigThread. PI is a general approach for checkpointing and applies the "Receiver Makes Right" (RMR) strategy. Data types are maintained in tables, and conversion routines are deployed for all supported platforms. Aggregate data types are still flattened down to scalar types (e.g., int, char, long) to avoid dealing with data alignment and padding; MigThread does this automatically.

SNOW [12] is another heterogeneous process migration system, which tries to migrate live data instead of the stack and heap data. SNOW adopts XDR to encode and decode data, whereas XDR is slower than the RMR used in PI [8]. A PVM installation is required.

Virtual machines are an intuitive way to provide abstract platforms in heterogeneous environments. Some mobile agent systems, such as the Java-based IBM Aglet system [13], use this approach to migrate computation. However, it suffers from slow execution due to interpretation overheads.

7. Conclusions and Future Work

We have presented a data conversion scheme, CGT-RMR, which spares MigThread from flattening aggregate types into scalar (primitive) data types within data conversion routines. Instead, type flattening takes place in the tag definition process conducted by MigThread's preprocessor. The contents of tags are determined at run-time to eliminate alignment-affecting factors from CPUs, operating systems, and compilers. Since tags help resolve data alignment and padding, CGT-RMR provides significant coding convenience to programmers and constitutes a feasible data conversion solution for heterogeneous migration and checkpointing.
Performing data conversion only on the receiver side, CGT-RMR exhibits tremendous efficiency in homogeneous environments and performance similar to XDR in heterogeneous environments. Without tables or special data conversion routines, CGT-RMR adopts a "plug-and-play" design and can be applied to new computer platforms directly.

Our future work is to build a new data conversion API so that programmers can use CGT-RMR directly rather than through MigThread. To accomplish a universal data conversion scheme, CGT-RMR requires MigThread's preprocessor as its back-end, because the preprocessor's type system remains indispensable. Working as a stand-alone standard such as XDR, CGT-RMR will benefit other migration and checkpointing schemes, as well as any applications running in heterogeneous environments.

References

[1] D. Milojicic, F. Douglis, Y. Paindaveine, R. Wheeler, and S. Zhou, "Process Migration Survey", ACM Computing Surveys, 32(8), pp. 241-299, 2000.
[2] I. Foster, C. Kesselman, J. Nick, and S. Tuecke, "Grid Services for Distributed System Integration", Computer, 35(6), pp. 37-46, 2002.
[3] H. Jiang and V. Chaudhary, "Compile/Run-time Support for Thread Migration", Proc. of the 16th International Parallel and Distributed Processing Symposium, pp. 58-66, 2002.
[4] H. Jiang and V. Chaudhary, "On Improving Thread Migration: Safety and Performance", Proc. of the International Conference on High Performance Computing (HiPC), pp. 474-484, 2002.
[5] P. Smith and N. Hutchinson, "Heterogeneous Process Migration: The Tui System", Technical Report 96-04, University of British Columbia, Feb. 1996.
[6] R. Srinivasan, "XDR: External Data Representation Standard", RFC 1832, http://www.faqs.org/rfcs/rfc1832.html, Aug. 1995.
[7] C. Meryers and G. Chastek, "The Use of ASN.1 and XDR for Data Representation in Real-Time Distributed Systems", Technical Report CMU/SEI-93-TR-10, Carnegie Mellon University, 1993.
[8] H. Zhou and A. Geist, ""Receiver Makes Right" Data Conversion in PVM", Proc. of the 14th International Conference on Computers and Communications, pp. 458-464, 1995.
[9] C. Shub, "Native Code Process-Originated Migration in a Heterogeneous Environment", Proc. of the Computer Science Conference, pp. 266-270, 1990.
[10] M. Theimer and B. Hayes, "Heterogeneous Process Migration by Recompilation", Proc. of the 11th International Conference on Distributed Computing Systems, Arlington, TX, pp. 18-25, 1991.
[11] A. Ferrari, S. Chapin, and A. Grimshaw, "Process Introspection: A Checkpoint Mechanism for High Performance Heterogeneous Distributed Systems", Technical Report CS-96-15, University of Virginia, Department of Computer Science, Oct. 1996.
[12] K. Chanchio and X. Sun, "Data Collection and Restoration for Heterogeneous Process Migration", Software - Practice and Experience, 32(9), pp. 845-871, 2002.
[13] D. Lange and M. Oshima, "Programming Mobile Agents in Java - with the Java Aglet API", Addison-Wesley Longman, New York, 1998.
[14] "PVM: Parallel Virtual Machine", http://www.csm.ornl.gov/pvm/pvm_home.html.