CS3250 Distributed Systems
Lecture 14

Dealing with Heterogeneity

When client and server programs run on machines with different architectures, there may be problems in transmitting binary data, because integers may be represented differently on the two machines, and similarly for floating-point data. The most common incompatibility is the byte ordering of multi-byte values: big-endian machines store the most significant byte first, whereas little-endian machines store the bytes in the opposite order.

Conversion between Big-Endian and Little-Endian Formats

Problems arising from big-endian and little-endian representations can be overcome by converting such data to network byte order (big-endian) before transmission and converting it from network to host byte order on receipt. In C this is done with the functions:

    htons    host to network order for 16-bit values
    htonl    host to network order for 32-bit values
    ntohs    network to host order for 16-bit values
    ntohl    network to host order for 32-bit values

It is easy to write similar subprograms in Ada, for example:

    WITH System;       USE System;
    WITH Interfaces.C; USE Interfaces.C;   -- provides Unsigned_Short
    .........
    FUNCTION Host_To_Network(Num : Unsigned_Short) RETURN Unsigned_Short IS
    BEGIN
       IF Default_Bit_Order = High_Order_First THEN
          RETURN Num;                                -- big-endian host: no change
       ELSE
          RETURN (Num MOD 256) * 256 + (Num / 256);  -- swap the two bytes
       END IF;
    END Host_To_Network;

where the following are defined in the package System:

    TYPE Bit_Order IS (High_Order_First, Low_Order_First);
    Default_Bit_Order : CONSTANT Bit_Order := <implementation-dependent>;

and where Unsigned_Short is imported from the standard Ada package Interfaces.C and represents a 16-bit unsigned number (C's unsigned short on typical platforms). Strictly, Default_Bit_Order describes how bits are numbered, so this test assumes that the implementation numbers bits consistently with the machine's byte order, as implementations normally do. Note that it is necessary to use unsigned values here to avoid the overflow that might occur if signed values were used.
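For comparison, here is a minimal sketch of the same 16-bit conversion written with the fixed-size modular type Unsigned_16 and the Rotate_Left function of the standard package Interfaces, instead of MOD and division. The package name Byte_Order_16 and the helper Swap_16 are hypothetical, and, like the version above, the sketch assumes that System.Default_Bit_Order reflects the host's byte order.

```ada
WITH Interfaces; USE Interfaces;

PACKAGE Byte_Order_16 IS
   -- Hypothetical helper: exchange the two bytes of a 16-bit value,
   -- so 16#1234# becomes 16#3412#.
   FUNCTION Swap_16(Num : Unsigned_16) RETURN Unsigned_16;

   -- Analogues of C's htons/ntohs (names follow the lecture notes).
   FUNCTION Host_To_Network(Num : Unsigned_16) RETURN Unsigned_16;
   FUNCTION Network_To_Host(Num : Unsigned_16) RETURN Unsigned_16
      RENAMES Host_To_Network;
END Byte_Order_16;

WITH System;

PACKAGE BODY Byte_Order_16 IS
   USE TYPE System.Bit_Order;   -- makes "=" on Bit_Order visible

   FUNCTION Swap_16(Num : Unsigned_16) RETURN Unsigned_16 IS
   BEGIN
      -- Rotating a 16-bit value by 8 bits exchanges its two bytes.
      RETURN Rotate_Left(Num, 8);
   END Swap_16;

   FUNCTION Host_To_Network(Num : Unsigned_16) RETURN Unsigned_16 IS
   BEGIN
      -- Assumes Default_Bit_Order matches the host's byte order.
      IF System.Default_Bit_Order = System.High_Order_First THEN
         RETURN Num;            -- big-endian host: already network order
      ELSE
         RETURN Swap_16(Num);   -- little-endian host: swap the bytes
      END IF;
   END Host_To_Network;
END Byte_Order_16;
```

Because a byte swap is its own inverse, Network_To_Host can again simply rename Host_To_Network.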
The function to convert from network to host order can simply be defined as

    FUNCTION Network_To_Host(Num : Unsigned_Short) RETURN Unsigned_Short
       RENAMES Host_To_Network;

since swapping the two bytes twice restores the original value.

Similar conversion functions can be defined for 32-bit values (the type Unsigned, also from Interfaces.C):

    FUNCTION Host_To_Network(Num : Unsigned) RETURN Unsigned IS
       P1     : CONSTANT Unsigned := 2**24;
       P2     : CONSTANT Unsigned := 2**16;
       Middle : Unsigned;
    BEGIN
       IF Default_Bit_Order = High_Order_First THEN
          RETURN Num;
       ELSE
          -- Num holds the bytes B3 B2 B1 B0 (B3 most significant)
          Middle := (Num / 256) MOD P2;          -- B2 B1
          RETURN (Num MOD 256) * P1              -- B0 moves to the top byte
               + (Middle MOD 256) * P2           -- B1
               + (Middle / 256) * 256            -- B2
               + (Num / P1);                     -- B3 moves to the bottom byte
       END IF;
    END Host_To_Network;

© A Barnes, 2005    CS3250/L14

Again, the function to convert from network to host order can simply be defined as

    FUNCTION Network_To_Host(Num : Unsigned) RETURN Unsigned
       RENAMES Host_To_Network;

Assuming both machines use a two's-complement representation, signed integer values can be handled by casting them to the type Unsigned (using instantiations of the generic function Unchecked_Conversion) and then using the above function Host_To_Network to convert them to network byte order. On reception the process is reversed. For floating-point values, provided both machines use the IEEE floating-point representation, we can cast floating-point values to unsigned integers and then use Host_To_Network in the same way; again the process is reversed on reception. (These casts assume that Integer, Float and Unsigned all occupy 32 bits.) Suitable instantiations are:

    WITH Ada.Unchecked_Conversion;
    .........
    FUNCTION I2U IS NEW Ada.Unchecked_Conversion(Integer, Unsigned);
    FUNCTION U2I IS NEW Ada.Unchecked_Conversion(Unsigned, Integer);
    FUNCTION F2U IS NEW Ada.Unchecked_Conversion(Float, Unsigned);
    FUNCTION U2F IS NEW Ada.Unchecked_Conversion(Unsigned, Float);

Note that these casts incur little or no processing overhead at run time; they merely bypass Ada's strong type checking.
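The byte arithmetic of the 32-bit Host_To_Network can be cross-checked against an equivalent sketch that uses Interfaces.Unsigned_32 with the standard Shift_Left/Shift_Right functions and bit masks. The package name Byte_Order_32 and the function Swap_32 are hypothetical; the Float casts assume Float'Size = 32 with IEEE single precision (true for GNAT on common platforms).

```ada
WITH Ada.Unchecked_Conversion;
WITH Interfaces; USE Interfaces;

PACKAGE Byte_Order_32 IS
   -- Hypothetical helper: reverse the bytes of a 32-bit value,
   -- B3 B2 B1 B0 becomes B0 B1 B2 B3.
   FUNCTION Swap_32(Num : Unsigned_32) RETURN Unsigned_32;

   -- Reinterpret the bits of a Float as Unsigned_32 and back; the
   -- numeric value is not converted. Assumes Float'Size = 32 (IEEE).
   FUNCTION F2U IS NEW Ada.Unchecked_Conversion(Float, Unsigned_32);
   FUNCTION U2F IS NEW Ada.Unchecked_Conversion(Unsigned_32, Float);
END Byte_Order_32;

PACKAGE BODY Byte_Order_32 IS
   FUNCTION Swap_32(Num : Unsigned_32) RETURN Unsigned_32 IS
   BEGIN
      RETURN Shift_Left(Num AND 16#FF#, 24)          -- B0 to the top byte
          OR Shift_Left(Num AND 16#FF00#, 8)         -- B1
          OR (Shift_Right(Num, 8) AND 16#FF00#)      -- B2
          OR Shift_Right(Num, 24);                   -- B3 to the bottom byte
   END Swap_32;
END Byte_Order_32;
```

For example, Swap_32(16#12345678#) yields 16#78563412#, and F2U(1.0) is the IEEE bit pattern 16#3F80_0000#, which would then be swapped on a little-endian host before transmission.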
For convenience we might define:

    WITH Ada.Streams; USE Ada.Streams;   -- provides Root_Stream_Type
    .........
    PROCEDURE Write(S : ACCESS Root_Stream_Type'Class; Num : IN Integer) IS
    BEGIN
       Unsigned'Write(S, Host_To_Network(I2U(Num)));
    END Write;

    PROCEDURE Read(S : ACCESS Root_Stream_Type'Class; Num : OUT Integer) IS
    BEGIN
       Num := U2I(Network_To_Host(Unsigned'Input(S)));
    END Read;

    FUNCTION Input(S : ACCESS Root_Stream_Type'Class) RETURN Integer IS
    BEGIN
       RETURN U2I(Network_To_Host(Unsigned'Input(S)));
    END Input;

    PROCEDURE Write(S : ACCESS Root_Stream_Type'Class; Num : IN Float) IS
    BEGIN
       Unsigned'Write(S, Host_To_Network(F2U(Num)));
    END Write;

    PROCEDURE Read(S : ACCESS Root_Stream_Type'Class; Num : OUT Float) IS
    BEGIN
       Num := U2F(Network_To_Host(Unsigned'Input(S)));
    END Read;

    FUNCTION Input(S : ACCESS Root_Stream_Type'Class) RETURN Float IS
    BEGIN
       RETURN U2F(Network_To_Host(Unsigned'Input(S)));
    END Input;

Transmission of strings of a known length does not cause a problem, assuming both machines encode characters in 1 byte and use the same character encoding (usually ASCII); we can simply use the stream I/O attributes String'Read and String'Write. If characters are encoded using 2 bytes (as in Java, or with the Ada type Wide_Character), then one would need to cast characters to Unsigned_Short and convert them to network byte order before transmission, and reverse this process on reception.

Exercise

Write suitable stream Read and Write procedures for the type Wide_Character, similar to those above for integers and floating-point values.

When transmitting strings of varying length we cannot simply use the stream I/O attributes String'Input and String'Output, as the bounds of the string are integers and so need to be converted to network byte order before transmission and back to host byte order on reception.
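The Write and Input subprograms above rely on knowing the host byte order. A different technique, not the one used in these notes, avoids that test altogether: build the four bytes explicitly, most significant first, and hand them to the stream as a Stream_Element_Array. The sketch below assumes a 32-bit two's-complement Integer and 8-bit stream elements; the package name Net_Integers is hypothetical.

```ada
WITH Ada.Streams; USE Ada.Streams;

PACKAGE Net_Integers IS
   -- Hypothetical package: send/receive a 32-bit Integer in network
   -- (big-endian) order without testing the host's byte order.
   -- Assumes Integer'Size = 32 (two's complement), 8-bit stream elements.
   PROCEDURE Write(S : ACCESS Root_Stream_Type'Class; Num : IN Integer);
   FUNCTION Input(S : ACCESS Root_Stream_Type'Class) RETURN Integer;
END Net_Integers;

WITH Ada.IO_Exceptions;
WITH Ada.Unchecked_Conversion;
WITH Interfaces; USE Interfaces;

PACKAGE BODY Net_Integers IS
   FUNCTION I2U IS NEW Ada.Unchecked_Conversion(Integer, Unsigned_32);
   FUNCTION U2I IS NEW Ada.Unchecked_Conversion(Unsigned_32, Integer);

   PROCEDURE Write(S : ACCESS Root_Stream_Type'Class; Num : IN Integer) IS
      U     : CONSTANT Unsigned_32 := I2U(Num);
      Bytes : Stream_Element_Array(1 .. 4);
   BEGIN
      FOR K IN Bytes'Range LOOP
         -- K = 1 takes bits 31..24 (most significant), K = 4 bits 7..0.
         Bytes(K) := Stream_Element
            (Shift_Right(U, 8 * Natural(4 - K)) AND 16#FF#);
      END LOOP;
      Ada.Streams.Write(S.ALL, Bytes);   -- dispatching call on the stream
   END Write;

   FUNCTION Input(S : ACCESS Root_Stream_Type'Class) RETURN Integer IS
      Bytes : Stream_Element_Array(1 .. 4);
      Last  : Stream_Element_Offset;
      U     : Unsigned_32 := 0;
   BEGIN
      Ada.Streams.Read(S.ALL, Bytes, Last);
      IF Last /= Bytes'Last THEN
         RAISE Ada.IO_Exceptions.End_Error;   -- fewer than 4 bytes left
      END IF;
      FOR K IN Bytes'Range LOOP
         -- Accumulate the bytes, most significant first.
         U := Shift_Left(U, 8) OR Unsigned_32(Bytes(K));
      END LOOP;
      RETURN U2I(U);
   END Input;
END Net_Integers;
```

Writing 16#01020304# with this Write puts the bytes 1, 2, 3, 4 on the stream whatever the host's byte order, at the cost of a small loop instead of a single Unsigned'Write.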
Thus it would be convenient to define:

    PROCEDURE Output(S : ACCESS Root_Stream_Type'Class; Str : IN String) IS
    BEGIN
       Write(S, Str'First);     -- send lower index bound
       Write(S, Str'Last);      -- send upper index bound
       String'Write(S, Str);    -- send the characters
    END Output;

    FUNCTION Input(S : ACCESS Root_Stream_Type'Class) RETURN String IS
       Low  : Integer := Input(S);   -- input lower bound
       High : Integer := Input(S);   -- input upper bound
       Str  : String(Low .. High);   -- declare String of correct size
    BEGIN
       String'Read(S, Str);          -- input the characters of the string
       RETURN Str;
    END Input;

Marshalling and Unmarshalling

However, the above subprograms are not always sufficient. For example, if one machine uses a two's-complement representation for integers and the other a ones'-complement representation, or if the two machines use different non-IEEE floating-point representations, this simple-minded approach would not work. Special conversion functions would be needed to convert integer and floating-point values from the sending host's format to a common 'network' format before transmission, and on reception from network format to the receiving host's format.

The process of transforming data from host binary format to a machine-independent format before transmission is called marshalling, and converting it back to host binary format on reception is called unmarshalling. We will marshal data by transforming it to text form, and unmarshal it by transforming the text representation back to host binary format. Most languages (C, Ada, etc.) have a number of library subprograms for converting data to and from a textual representation. This method allows a high degree of portability, as most machines use an ASCII representation of text. Such text-based marshalling methods are also particularly useful for debugging, since the transmitted data can be read directly.
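To see exactly what the Output/Input pair for strings sends, the following self-contained sketch encodes each bound as four bytes, most significant first (the explicit-byte technique, used here in place of Host_To_Network), followed by the characters. It assumes 8-bit characters and a 32-bit two's-complement Integer; the package Net_Strings and its helpers Write_Int and Read_Int are hypothetical.

```ada
WITH Ada.Streams; USE Ada.Streams;

PACKAGE Net_Strings IS
   -- Hypothetical package: send a String as its two bounds (each 4 bytes,
   -- network order) followed by its characters (1 byte each).
   PROCEDURE Output(S : ACCESS Root_Stream_Type'Class; Str : IN String);
   FUNCTION Input(S : ACCESS Root_Stream_Type'Class) RETURN String;
END Net_Strings;

WITH Ada.IO_Exceptions;
WITH Ada.Unchecked_Conversion;
WITH Interfaces; USE Interfaces;

PACKAGE BODY Net_Strings IS
   FUNCTION I2U IS NEW Ada.Unchecked_Conversion(Integer, Unsigned_32);
   FUNCTION U2I IS NEW Ada.Unchecked_Conversion(Unsigned_32, Integer);

   PROCEDURE Write_Int(S : ACCESS Root_Stream_Type'Class; Num : IN Integer) IS
      U     : CONSTANT Unsigned_32 := I2U(Num);
      Bytes : Stream_Element_Array(1 .. 4);
   BEGIN
      FOR K IN Bytes'Range LOOP   -- most significant byte first
         Bytes(K) := Stream_Element
            (Shift_Right(U, 8 * Natural(4 - K)) AND 16#FF#);
      END LOOP;
      Ada.Streams.Write(S.ALL, Bytes);
   END Write_Int;

   FUNCTION Read_Int(S : ACCESS Root_Stream_Type'Class) RETURN Integer IS
      Bytes : Stream_Element_Array(1 .. 4);
      Last  : Stream_Element_Offset;
      U     : Unsigned_32 := 0;
   BEGIN
      Ada.Streams.Read(S.ALL, Bytes, Last);
      IF Last /= Bytes'Last THEN
         RAISE Ada.IO_Exceptions.End_Error;
      END IF;
      FOR K IN Bytes'Range LOOP
         U := Shift_Left(U, 8) OR Unsigned_32(Bytes(K));
      END LOOP;
      RETURN U2I(U);
   END Read_Int;

   PROCEDURE Output(S : ACCESS Root_Stream_Type'Class; Str : IN String) IS
   BEGIN
      Write_Int(S, Str'First);   -- lower bound
      Write_Int(S, Str'Last);    -- upper bound
      String'Write(S, Str);      -- the characters
   END Output;

   FUNCTION Input(S : ACCESS Root_Stream_Type'Class) RETURN String IS
      Low  : CONSTANT Integer := Read_Int(S);
      High : CONSTANT Integer := Read_Int(S);
      Str  : String(Low .. High);   -- same bounds as the sender's string
   BEGIN
      String'Read(S, Str);
      RETURN Str;
   END Input;
END Net_Strings;
```

With this sketch, Output(S, "Hi") writes the ten bytes 0 0 0 1 0 0 0 2 72 105, and Input reconstructs a string with the original bounds, which need not start at 1.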
However, there can still be problems if the machines use different character encodings, in which case a further conversion to some common character encoding would be necessary during marshalling and unmarshalling.

In this lecture the process of marshalling and unmarshalling will be illustrated briefly to give some insight into what is involved. Suppose, for definiteness, that we wish to transfer floating-point values between two machines that use different floating-point representations, but for simplicity assume that both use a two's-complement representation for integers (possibly with different byte-ordering conventions).

Converting Data to/from a Text Form

In Ada all scalar types Typ have the attributes:

    Typ'Image    string representation of a value
    Typ'Value    constructs a value from its string representation

which may be used in the marshalling and unmarshalling process. Their use is illustrated by the following fragment:

    F : Float;
    ..........
    Ada.Text_IO.Put(Float'Image(F));    -- output string representation of F
    ..........
    F := Float'Value("123.4");          -- construct F from its string representation

An alternative (potentially giving more control over the output format) would be to use the string-based versions of Get and Put from the package Ada.Float_Text_IO or the package CS_Flt_IO (which is essentially a wrapping of the standard package with some defaults changed).

    Str        : String := "1.234 5.67";   -- say
    Str1, Str2 : String(1 .. 20);          -- say
    F1, F2     : Float;
    Temp       : Positive;
    .........
    Ada.Float_Text_IO.Get(From => Str, Item => F1, Last => Temp);
    Ada.Float_Text_IO.Get(From => Str(Temp+1 .. Str'Last), Item => F2, Last => Temp);
    .........
    Ada.Float_Text_IO.Put(To => Str1, Item => F1);
    Ada.Float_Text_IO.Put(To => Str2, Item => F2, Aft => 2);

The two Get procedure calls 'input' the values 1.234 and 5.67 into the variables F1 and F2 from the character string Str.
The parameter Last passes back the index of the last character consumed by the 'input' operation, and so can be used as shown to input two (or more) values from a string by starting each 'input' operation just after the point where the previous one finished. The two Put procedure calls 'output' string representations of the values of F1 and F2 into the character strings Str1 and Str2; the strings are padded on the left with blanks, and F2 is output to two decimal places, whereas F1 is output to the default number of decimal places. Once floating-point values have been converted to strings, they can be transmitted using the subprograms Input and Output discussed above, which take care of 'endian' problems for the string bounds.

Marshalling More Complex Data

Once subprograms have been developed for marshalling and unmarshalling basic data types, more complex data structures are easy to handle. Arrays of a fixed size are marshalled/unmarshalled sequentially by applying the appropriate method to their individual elements. Similarly, Ada records or C structures can be marshalled/unmarshalled by applying in turn the appropriate method to their individual fields. Arrays of variable size may be handled by marshalling/unmarshalling the two index bounds and then applying the method for arrays of a known size. Similarly, discriminated (variant) records can be handled by marshalling/unmarshalling the discriminant(s) and then applying the method for a normal record once the structure of its fields is known.

This process is automated in most middleware systems. For example, the UNIX RPC (remote procedure call) library provides a suite of routines for converting data of all types to and from XDR (External Data Representation) format. More recently SOAP (Simple Object Access Protocol) has been developed; it represents complex data structures (objects) in textual form using XML.
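As a minimal sketch of the rule for arrays of variable size, the hypothetical package below marshals an unconstrained Integer array as text: first the two index bounds, then each element, all separated by blanks. Unmarshalling uses the string-based Get of Ada.Integer_Text_IO, with its Last parameter used as described above to step through the string. The names Array_Marshal, Int_Array, Field and Next are illustrative assumptions.

```ada
PACKAGE Array_Marshal IS
   -- Hypothetical example: text marshalling of a variable-size array.
   TYPE Int_Array IS ARRAY(Integer RANGE <>) OF Integer;

   FUNCTION Marshal(A : Int_Array) RETURN String;
   FUNCTION Unmarshal(Str : String) RETURN Int_Array;
END Array_Marshal;

WITH Ada.Integer_Text_IO;
WITH Ada.Strings.Fixed;
WITH Ada.Strings.Unbounded; USE Ada.Strings.Unbounded;

PACKAGE BODY Array_Marshal IS
   -- One blank-prefixed decimal field, e.g. 10 gives " 10", -5 gives " -5".
   FUNCTION Field(N : Integer) RETURN String IS
   BEGIN
      RETURN ' ' & Ada.Strings.Fixed.Trim(Integer'Image(N), Ada.Strings.Left);
   END Field;

   FUNCTION Marshal(A : Int_Array) RETURN String IS
      -- The bounds come first, so the receiver can size its array.
      Result : Unbounded_String :=
         To_Unbounded_String(Field(A'First) & Field(A'Last));
   BEGIN
      FOR K IN A'Range LOOP
         Append(Result, Field(A(K)));
      END LOOP;
      RETURN To_String(Result);
   END Marshal;

   FUNCTION Unmarshal(Str : String) RETURN Int_Array IS
      Pos : Natural := Str'First - 1;   -- index of last character consumed

      -- Read the next integer after position Pos and advance Pos.
      FUNCTION Next RETURN Integer IS
         Value : Integer;
      BEGIN
         Ada.Integer_Text_IO.Get
            (From => Str(Pos + 1 .. Str'Last), Item => Value, Last => Pos);
         RETURN Value;
      END Next;

      Low    : CONSTANT Integer := Next;
      High   : CONSTANT Integer := Next;
      Result : Int_Array(Low .. High);
   BEGIN
      FOR K IN Result'Range LOOP
         Result(K) := Next;
      END LOOP;
      RETURN Result;
   END Unmarshal;
END Array_Marshal;
```

For instance, the array (10, -5, 7) with bounds 1 .. 3 is marshalled as " 1 3 10 -5 7". Records would be handled the same way, one field after another, and a variant record would send its discriminant first.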
Transmitting LIMITED Data Structures

For LIMITED types, stream I/O attribute subprograms are not generated automatically (since the designation LIMITED indicates that values of the type cannot be copied or compared by the standard assignment and equality 'operators'). If stream I/O subprograms are required for such a type, they must be defined explicitly by the user.

Exercise

Write suitable stream marshalling subprograms for a circular bounded buffer structure defined by the following scheme (cf. the bounded buffer protected object in the notes for lecture 12):

    TYPE Buffer IS LIMITED PRIVATE;
    PROCEDURE Deposit(B : IN OUT Buffer; X : IN Data_Item);
    PROCEDURE Extract(B : IN OUT Buffer; X : OUT Data_Item);
    .........
    PRIVATE
       N : CONSTANT Positive := ....;   -- buffer size
       TYPE Index IS MOD N;
       TYPE Item_Array IS ARRAY(Index) OF Data_Item;
       TYPE Buffer IS RECORD
          Elems : Item_Array;
          I     : Index := 0;   -- items deposited at this position
          J     : Index := 0;   -- items extracted from this position
          Count : Natural RANGE 0 .. N := 0;
       END RECORD;

It is also possible to define stream I/O attributes for dynamic data structures (which are usually designated as LIMITED). Thus it is possible to save a representation of a dynamic data structure to a file and to reconstruct an identical copy of the structure on input (using, for example, the facilities in Ada.Streams.Stream_IO). Alternatively, the data structure could be transmitted over a pipe or socket by associating a stream with the pipe or socket.
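The mechanism for giving a LIMITED type its own stream attributes can be sketched on a simpler hypothetical type than the exercise's buffer: a limited stack of integers. Write_Stack sends only the count and the live elements, and the attribute definition clauses are placed in the visible part so that clients can use Stack'Write and Stack'Read. For brevity the sketch uses the predefined Integer'Write, i.e. host byte order; between heterogeneous machines the network-order routines above would be used instead. All names here are illustrative assumptions.

```ada
WITH Ada.Streams; USE Ada.Streams;

PACKAGE Int_Stacks IS
   -- Hypothetical LIMITED stack of integers. Because the type is limited,
   -- its stream attributes are supplied explicitly below.
   Capacity : CONSTANT := 10;

   TYPE Stack IS LIMITED PRIVATE;

   PROCEDURE Push(St : IN OUT Stack; X : IN Integer);
   FUNCTION Size(St : Stack) RETURN Natural;
   FUNCTION Top(St : Stack) RETURN Integer;

   PROCEDURE Write_Stack
      (S : NOT NULL ACCESS Root_Stream_Type'Class; St : IN Stack);
   PROCEDURE Read_Stack
      (S : NOT NULL ACCESS Root_Stream_Type'Class; St : OUT Stack);
   FOR Stack'Write USE Write_Stack;
   FOR Stack'Read USE Read_Stack;
PRIVATE
   TYPE Int_Array IS ARRAY(1 .. Capacity) OF Integer;
   TYPE Stack IS LIMITED RECORD
      Items : Int_Array;
      Count : Natural RANGE 0 .. Capacity := 0;
   END RECORD;
END Int_Stacks;

PACKAGE BODY Int_Stacks IS
   PROCEDURE Push(St : IN OUT Stack; X : IN Integer) IS
   BEGIN
      St.Count := St.Count + 1;   -- raises Constraint_Error when full
      St.Items(St.Count) := X;
   END Push;

   FUNCTION Size(St : Stack) RETURN Natural IS
   BEGIN
      RETURN St.Count;
   END Size;

   FUNCTION Top(St : Stack) RETURN Integer IS
   BEGIN
      RETURN St.Items(St.Count);
   END Top;

   -- Send the count, then only the live elements (not the whole array).
   PROCEDURE Write_Stack
      (S : NOT NULL ACCESS Root_Stream_Type'Class; St : IN Stack) IS
   BEGIN
      Natural'Write(S, St.Count);
      FOR K IN 1 .. St.Count LOOP
         Integer'Write(S, St.Items(K));
      END LOOP;
   END Write_Stack;

   PROCEDURE Read_Stack
      (S : NOT NULL ACCESS Root_Stream_Type'Class; St : OUT Stack) IS
      N : Natural;
   BEGIN
      Natural'Read(S, N);
      St.Count := N;              -- raises Constraint_Error if N > Capacity
      FOR K IN 1 .. N LOOP
         Integer'Read(S, St.Items(K));
      END LOOP;
   END Read_Stack;
END Int_Stacks;
```

A client can then write a stack to a stream and later rebuild an equivalent stack with Stack'Read, even though Stack values cannot be copied by assignment.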
For example, consider the following package of binary trees of strings:

    WITH Ada.Streams; USE Ada.Streams;
    PACKAGE StringTrees IS
       TYPE String_Access IS ACCESS String;
       TYPE Node;
       TYPE Tree IS ACCESS Node;
       TYPE Node IS RECORD
          Str         : String_Access;
          Left, Right : Tree;
       END RECORD;

       PROCEDURE Read(S : ACCESS Root_Stream_Type'Class; T : OUT Tree);
       FUNCTION Input(S : ACCESS Root_Stream_Type'Class) RETURN Tree;
       PROCEDURE Write(S : ACCESS Root_Stream_Type'Class; T : IN Tree);

       FOR Tree'Read USE Read;
       FOR Tree'Input USE Input;
       FOR Tree'Write USE Write;
       FOR Tree'Output USE Write;

       PROCEDURE Insert(T : Tree; S : IN String);
       FUNCTION Remove(T : Tree) RETURN String;
       -- other Tree manipulation facilities
       ...............
    END StringTrees;

The corresponding package body does not, of course, transmit actual access values (pointers) over the stream, as these are (essentially) virtual addresses and are thus highly process-dependent: if such values were saved to a file and retrieved by another process, they would have no meaning. Instead the Boolean value False is transmitted for a null pointer, whereas True is transmitted for a non-null pointer, followed by a representation of the sub-tree it points to (using a recursive call). In effect we perform a pre-order traversal of the tree. On input an identical copy of the tree is constructed dynamically: when a True value is input, a new node is allocated, then its string is input, and then the left and right sub-trees are input by recursive calls.
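To visualise the pre-order scheme just described, the hypothetical function Encode below produces a readable token string instead of writing to a stream: "F" stands for a null pointer (where the stream carries False) and "T(s)" for a node holding the string s (True followed by the string). The package name Tree_Encoding is an assumption; the types mirror those of StringTrees.

```ada
PACKAGE Tree_Encoding IS
   -- Hypothetical illustration of the pre-order encoding of a tree.
   TYPE String_Access IS ACCESS String;
   TYPE Node;
   TYPE Tree IS ACCESS Node;
   TYPE Node IS RECORD
      Str         : String_Access;
      Left, Right : Tree;
   END RECORD;

   FUNCTION Encode(T : Tree) RETURN String;
END Tree_Encoding;

PACKAGE BODY Tree_Encoding IS
   FUNCTION Encode(T : Tree) RETURN String IS
   BEGIN
      IF T = NULL THEN
         RETURN "F";                        -- null pointer: False
      ELSE
         RETURN "T(" & T.Str.ALL & ")"      -- node: True, then its string
              & Encode(T.Left)              -- then the left sub-tree
              & Encode(T.Right);            -- then the right sub-tree
      END IF;
   END Encode;
END Tree_Encoding;
```

A root holding "b" whose only child is a left leaf holding "a" encodes as "T(b)T(a)FFF": each leaf contributes two F tokens for its null sub-trees, and the final F is the root's empty right sub-tree.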
    PACKAGE BODY StringTrees IS

       PROCEDURE Write(S : ACCESS Root_Stream_Type'Class; T : IN Tree) IS
       BEGIN
          IF T = NULL THEN
             Boolean'Output(S, False);
          ELSE
             Boolean'Output(S, True);
             Output(S, T.Str.ALL);     -- String Output defined earlier
             Write(S, T.Left);
             Write(S, T.Right);
          END IF;
       END Write;

       PROCEDURE Read(S : ACCESS Root_Stream_Type'Class; T : OUT Tree) IS
       BEGIN
          IF Boolean'Input(S) THEN
             T := NEW Node;
             T.Str := NEW String'(Input(S));   -- String Input defined earlier
             Read(S, T.Left);
             Read(S, T.Right);
          ELSE
             T := NULL;
          END IF;
       END Read;

       FUNCTION Input(S : ACCESS Root_Stream_Type'Class) RETURN Tree IS
          T : Tree;
       BEGIN
          Read(S, T);
          RETURN T;
       END Input;

       -- implementation of other Tree manipulation routines
       ...............
    END StringTrees;

Note the use of the subprograms Input and Output discussed above to handle big-/little-endian problems when transmitting the string index bounds (they must be made visible in this package body, for example via a WITH clause). It is assumed that there are no 'endian' problems with Boolean values, which are likely to be represented by a single byte.

Of course, similar techniques could be used for other user-defined types. Note, however, that the stream I/O attributes and other primitive operations for a type can only be defined/redefined in the package that defines the type.