Dealing with Heterogeneity

CS3250 Distributed Systems
Lecture 14 Dealing with Heterogeneity
When client and server programs are running on machines with different architectures, there
may be problems in transmitting binary data as integers may be represented in different ways
on the two machines and similarly for floating point data. The most common form of
incompatibility is the byte ordering of multi-byte values; big-endian machines store the most
significant byte first whereas little-endian machines store the bytes in the opposite order.
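The difference can be observed directly. The following is a minimal sketch (the names Show_Byte_Order, Byte_Array and To_Bytes are illustrative and are not part of these notes) that views a 32-bit value as four bytes at increasing addresses. It assumes the fixed-size types from the standard package Interfaces.

```ada
WITH Ada.Text_IO;
WITH Ada.Unchecked_Conversion;
WITH Interfaces; USE Interfaces;

PROCEDURE Show_Byte_Order IS
   -- Four bytes laid out at increasing addresses (illustrative type)
   TYPE Byte_Array IS ARRAY(1 .. 4) OF Unsigned_8;
   PRAGMA Pack(Byte_Array);

   FUNCTION To_Bytes IS NEW Ada.Unchecked_Conversion(Unsigned_32, Byte_Array);

   -- The most significant byte of this value is 16#0A#
   Bytes : CONSTANT Byte_Array := To_Bytes(16#0A0B0C0D#);
BEGIN
   IF Bytes(1) = 16#0A# THEN
      Ada.Text_IO.Put_Line("big-endian host: bytes stored as 0A 0B 0C 0D");
   ELSIF Bytes(1) = 16#0D# THEN
      Ada.Text_IO.Put_Line("little-endian host: bytes stored as 0D 0C 0B 0A");
   ELSE
      Ada.Text_IO.Put_Line("some other byte order");
   END IF;
END Show_Byte_Order;
```

Which line is printed depends on the machine on which the program is run.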
Conversion between Big-Endian and Little-Endian Formats
Problems regarding big-endian and little-endian representations can be overcome by
converting such data to network byte order (big-endian) before transmission and converting
the data from network to host byte order on receipt. In C this is done with the functions:
htons    host to network order for 16-bit values
htonl    host to network order for 32-bit values
ntohs    network to host order for 16-bit values
ntohl    network to host order for 32-bit values
It is easy to write similar sub-programs in Ada, for example:
WITH System; USE System;
WITH Interfaces.C; USE Interfaces.C;
.........
FUNCTION Host_To_Network(Num : Unsigned_Short) RETURN Unsigned_Short IS
BEGIN
   IF Default_Bit_Order = High_Order_First THEN
      RETURN Num;                                  -- host is already big-endian
   ELSE
      RETURN (Num MOD 256) * 256 + (Num / 256);    -- swap the two bytes
   END IF;
END Host_To_Network;
where the following are defined in the package System:
TYPE Bit_Order IS (High_Order_First, Low_Order_First);
Default_Bit_Order : CONSTANT Bit_Order := <implementation-dependent>;
and where Unsigned_Short is imported from the standard Ada package Interfaces.C and
represents a 16-bit unsigned number. Note that it is necessary to use unsigned values here to
avoid the overflow that might occur if signed values were used. The
function to convert from network to host order can simply be defined as
FUNCTION Network_To_Host(Num : Unsigned_Short) RETURN Unsigned_Short
RENAMES Host_To_Network;
Similar conversion functions can be defined for 32-bit values (the type Unsigned):
FUNCTION Host_To_Network(Num : Unsigned) RETURN Unsigned IS
   P1 : CONSTANT Unsigned := 2**24;
   P2 : CONSTANT Unsigned := 2**16;
   Middle : Unsigned;    -- the two middle bytes of Num
BEGIN
   IF Default_Bit_Order = High_Order_First THEN
      RETURN Num;
   ELSE
      Middle := (Num / 256) MOD P2;
      -- reassemble the four bytes in the opposite order
      RETURN (Num MOD 256) * P1 + (Middle MOD 256) * P2 +
             (Middle / 256) * 256 + (Num / P1);
   END IF;
END Host_To_Network;
© A Barnes, 2005
1
CS3250/L14
The function to convert from network to host order can simply be defined as
FUNCTION Network_To_Host(Num : Unsigned) RETURN Unsigned
RENAMES Host_To_Network;
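As a worked check of the byte-swapping arithmetic, the sketch below performs the same swaps using the shift operations that the standard package Interfaces provides for its fixed-size modular types. The names Swap_Demo, Swap16 and Swap32 are illustrative. These functions always swap, so a real Host_To_Network would call them only on a little-endian host.

```ada
WITH Ada.Text_IO;
WITH Interfaces; USE Interfaces;

PROCEDURE Swap_Demo IS
   -- Exchange the two bytes of a 16-bit value
   FUNCTION Swap16(Num : Unsigned_16) RETURN Unsigned_16 IS
   BEGIN
      RETURN Shift_Left(Num, 8) OR Shift_Right(Num, 8);
   END Swap16;

   -- Reverse the four bytes of a 32-bit value
   FUNCTION Swap32(Num : Unsigned_32) RETURN Unsigned_32 IS
   BEGIN
      RETURN Shift_Left(Num, 24)
          OR Shift_Left(Num AND 16#0000_FF00#, 8)
          OR Shift_Right(Num AND 16#00FF_0000#, 8)
          OR Shift_Right(Num, 24);
   END Swap32;
BEGIN
   -- 16#1234# becomes 16#3412#; swapping twice restores the original value
   IF Swap16(16#1234#) = 16#3412#
     AND THEN Swap32(16#1234_5678#) = 16#7856_3412#
     AND THEN Swap32(Swap32(16#1234_5678#)) = 16#1234_5678#
   THEN
      Ada.Text_IO.Put_Line("byte swaps behave as expected");
   ELSE
      Ada.Text_IO.Put_Line("unexpected result");
   END IF;
END Swap_Demo;
```

Because swapping twice gives back the original value, the same function serves in both directions, which is why Network_To_Host can simply rename Host_To_Network.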
Assuming both machines use a twos-complement representation, signed integer values can be
handled by casting them to the type Unsigned (using instantiations of the generic function
Unchecked_Conversion) and then using the above function Host_To_Network to convert to
network byte order. On reception the process is reversed. Floating point values can be handled
in the same way, provided both machines use the IEEE floating point representation and a
Float occupies 32 bits: we cast the floating point value to an unsigned integer and then use
the function Host_To_Network to convert to network byte order. On reception the process is
again reversed. Suitable instantiations are:
WITH Ada.Unchecked_Conversion;
.........
FUNCTION I2U IS NEW Ada.Unchecked_Conversion(Integer, Unsigned);
FUNCTION U2I IS NEW Ada.Unchecked_Conversion(Unsigned, Integer);
FUNCTION F2U IS NEW Ada.Unchecked_Conversion(Float, Unsigned);
FUNCTION U2F IS NEW Ada.Unchecked_Conversion(Unsigned, Float);
Note that these casts incur little or no processing overhead at run-time; they are merely used
to subvert Ada’s strong type checking.
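The effect of such casts can be seen in the following sketch. It is only an illustration under stated assumptions: it uses the 32-bit types Integer_32 and Unsigned_32 from the package Interfaces rather than Integer and Unsigned, and it assumes a twos-complement machine on which Float is a 32-bit IEEE value. The name Cast_Demo and the constants are illustrative.

```ada
WITH Ada.Text_IO;
WITH Ada.Unchecked_Conversion;
WITH Interfaces; USE Interfaces;

PROCEDURE Cast_Demo IS
   FUNCTION I2U IS NEW Ada.Unchecked_Conversion(Integer_32, Unsigned_32);
   FUNCTION U2I IS NEW Ada.Unchecked_Conversion(Unsigned_32, Integer_32);
   -- Assumption: Float'Size = 32 on this machine
   FUNCTION F2U IS NEW Ada.Unchecked_Conversion(Float, Unsigned_32);
   FUNCTION U2F IS NEW Ada.Unchecked_Conversion(Unsigned_32, Float);

   Minus_One : CONSTANT Unsigned_32 := I2U(-1);   -- bit pattern of -1
   One       : CONSTANT Unsigned_32 := F2U(1.0);  -- bit pattern of 1.0
BEGIN
   -- In twos complement, -1 is represented by all bits set
   Ada.Text_IO.Put_Line("-1 is all ones: " &
                        Boolean'Image(Minus_One = 16#FFFF_FFFF#));
   -- In IEEE single precision, 1.0 has the bit pattern 16#3F80_0000#
   Ada.Text_IO.Put_Line("1.0 is 16#3F80_0000#: " &
                        Boolean'Image(One = 16#3F80_0000#));
   -- Casting back recovers the original values
   Ada.Text_IO.Put_Line("round trips: " &
                        Boolean'Image(U2I(Minus_One) = -1 AND U2F(One) = 1.0));
END Cast_Demo;
```

The casts copy the bit pattern unchanged; no arithmetic conversion takes place.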
For convenience we might define:
PROCEDURE Write(S : ACCESS Root_Stream_Type'Class; Num : IN Integer) IS
BEGIN
   Unsigned'Write(S, Host_To_Network(I2U(Num)));
END Write;
PROCEDURE Read(S : ACCESS Root_Stream_Type'Class; Num : OUT Integer) IS
BEGIN
   Num := U2I(Network_To_Host(Unsigned'Input(S)));
END Read;
FUNCTION Input(S : ACCESS Root_Stream_Type'Class) RETURN Integer IS
BEGIN
   RETURN U2I(Network_To_Host(Unsigned'Input(S)));
END Input;
PROCEDURE Write(S : ACCESS Root_Stream_Type'Class; Num : IN Float) IS
BEGIN
   Unsigned'Write(S, Host_To_Network(F2U(Num)));
END Write;
PROCEDURE Read(S : ACCESS Root_Stream_Type'Class; Num : OUT Float) IS
BEGIN
   Num := U2F(Network_To_Host(Unsigned'Input(S)));
END Read;
FUNCTION Input(S : ACCESS Root_Stream_Type'Class) RETURN Float IS
BEGIN
   RETURN U2F(Network_To_Host(Unsigned'Input(S)));
END Input;
Transmission of strings of a known length does not cause a problem assuming both machines
encode characters in 1 byte and use the same character encoding (usually ASCII); we can
simply use the stream-io attributes String’Read and String’Write. If characters are
encoded using 2 bytes (as in Java or as with the Ada type Wide_Character) then one would
need to cast characters to Unsigned_Short and then convert to network byte order before
transmission and reverse this process on reception.
Exercise
Write suitable stream Read and Write procedures for the type Wide_Character, similar to
those above for integers and floating point values.
When transmitting strings of varying length we cannot simply use the stream-io attributes
String’Input and String’Output, as the bounds of the string are integers and so need to
be converted to network byte order before transmission and back to host byte order on
reception. Thus it would be convenient to define:
PROCEDURE Output(S : ACCESS Root_Stream_Type'Class; Str : IN String) IS
BEGIN
   Write(S, Str'First);     -- send lower index bound
   Write(S, Str'Last);      -- send upper index bound
   String'Write(S, Str);    -- send the characters
END Output;
FUNCTION Input(S : ACCESS Root_Stream_Type'Class) RETURN String IS
   Low  : Integer := Input(S);    -- input lower bound
   High : Integer := Input(S);    -- input upper bound
   Str  : String(Low .. High);    -- declare String of correct size
BEGIN
   String'Read(S, Str);           -- input the characters of the string
   RETURN Str;
END Input;
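A self-contained variation on this idea is sketched below. It is not the version above: it sends a single 32-bit length instead of the two bounds (so the receiver always rebuilds the string with lower bound 1), and it writes that length one byte at a time, most significant byte first, so that it does not need to know the host byte order. The names Put_U32, Get_U32, Put_String and Get_String and the file name string_demo.bin are illustrative; a temporary file stands in for the socket.

```ada
WITH Ada.Streams; USE Ada.Streams;
WITH Ada.Streams.Stream_IO;
WITH Ada.Text_IO;
WITH Interfaces; USE Interfaces;

PROCEDURE String_Stream_Demo IS
   PACKAGE SIO RENAMES Ada.Streams.Stream_IO;

   -- Write a 32-bit value most significant byte first (network order)
   PROCEDURE Put_U32(S : ACCESS Root_Stream_Type'Class; V : Unsigned_32) IS
   BEGIN
      FOR K IN REVERSE 0 .. 3 LOOP
         Unsigned_8'Write(S, Unsigned_8(Shift_Right(V, 8 * K) AND 16#FF#));
      END LOOP;
   END Put_U32;

   -- Read four bytes, most significant first, and reassemble the value
   FUNCTION Get_U32(S : ACCESS Root_Stream_Type'Class) RETURN Unsigned_32 IS
      B : Unsigned_8;
      V : Unsigned_32 := 0;
   BEGIN
      FOR K IN 1 .. 4 LOOP
         Unsigned_8'Read(S, B);
         V := Shift_Left(V, 8) OR Unsigned_32(B);
      END LOOP;
      RETURN V;
   END Get_U32;

   PROCEDURE Put_String(S : ACCESS Root_Stream_Type'Class; Str : IN String) IS
   BEGIN
      Put_U32(S, Unsigned_32(Str'Length));   -- send the length
      String'Write(S, Str);                  -- send the characters
   END Put_String;

   FUNCTION Get_String(S : ACCESS Root_Stream_Type'Class) RETURN String IS
      Str : String(1 .. Natural(Get_U32(S)));   -- String of the correct size
   BEGIN
      String'Read(S, Str);
      RETURN Str;
   END Get_String;

   F             : SIO.File_Type;
   Round_Trip_OK : Boolean := False;
BEGIN
   SIO.Create(F, SIO.Out_File, "string_demo.bin");
   Put_String(SIO.Stream(F), "hello");
   SIO.Close(F);

   SIO.Open(F, SIO.In_File, "string_demo.bin");
   DECLARE
      Received : CONSTANT String := Get_String(SIO.Stream(F));
   BEGIN
      Round_Trip_OK := Received = "hello";
      Ada.Text_IO.Put_Line(Received);
   END;
   SIO.Delete(F);   -- remove the temporary file
END String_Stream_Demo;
```

Sending the bounds, as in the notes, preserves the original index range. Sending only the length is slightly more compact when the range does not matter to the receiver.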
Marshalling and Unmarshalling
However, the above procedures may not always be sufficient. For example, if one machine
type uses a twos complement representation for integers and another uses a ones complement
representation, or if different non-IEEE representations for floating point values were used on
the two machines, this simple-minded approach would not work. Special conversion
functions would be needed to convert integer and floating point values from the sending
host’s format to a common ‘network’ format before transmission and on reception to convert
from network format to the receiving host’s format.
The process of transforming data from host binary format to a machine-independent format
before transmission is called marshalling and converting it to host binary format on
reception is called unmarshalling.
We will marshall data by transforming it to text form and unmarshall it by transforming the
text representation back to host binary format. Most languages (C, Ada etc.) have a number of
library sub-programs for converting data to and from a textual representation. This method
allows a high degree of portability as most machines use an ASCII representation of text.
Such text-based marshalling methods are also particularly useful for debugging. However
there can still be problems if the machines use different character encodings, in which case a
further conversion to some common character encoding would be necessary during
marshalling and unmarshalling.
In this lecture the process of marshalling and unmarshalling will be illustrated briefly to gain
some insight as to what is involved. Suppose for definiteness we wish to transfer floating
point values between two machines which use different representations for floating point
values, but for simplicity we will assume both use a twos complement representation for
integers (possibly with different byte ordering conventions).
Converting Data to/from a Text Form
In Ada all basic types Typ have primitive attribute operations:
Typ'Image    String representation of a value.
Typ'Value    Constructs a value from its string representation.
which may be used in the marshalling and unmarshalling process. The use of these is
illustrated by the following fragment:
F : Float;
..........
Ada.Text_IO.Put(Float'Image(F));    -- output string representation of F
..........
F := Float'Value("123.4");          -- construct a value from its string form
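A runnable version of this fragment is sketched below (the names Image_Value_Demo, Text and Back are illustrative). It marshals a Float to text with 'Image and unmarshals it with 'Value. The value 1.5 is chosen because it survives the round trip exactly. In general 'Image produces only Float'Digits significant digits, so an arbitrary value may come back slightly changed.

```ada
WITH Ada.Text_IO;

PROCEDURE Image_Value_Demo IS
   F    : CONSTANT Float  := 1.5;
   Text : CONSTANT String := Float'Image(F);     -- marshal to text
   Back : CONSTANT Float  := Float'Value(Text);  -- unmarshal from text
BEGIN
   Ada.Text_IO.Put_Line("text form:" & Text);
   Ada.Text_IO.Put_Line("significant digits used:" &
                        Integer'Image(Float'Digits));
   Ada.Text_IO.Put_Line("round trip exact: " & Boolean'Image(Back = F));
END Image_Value_Demo;
```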
An alternative (potentially giving more control over the output format) would be to use the
string-based versions of Get and Put from the package Ada.Float_Text_IO or the package
CS_Flt_IO (which is essentially a wrapping of the standard package with some defaults
changed).
Str : String := "1.234 5.67";    -- say
Str1, Str2 : String(1 .. 20);    -- say
F1, F2 : Float;
Temp : Integer;
.........
Ada.Float_Text_IO.Get(From => Str, Item => F1, Last => Temp);
Ada.Float_Text_IO.Get(From => Str(Temp+1 .. Str'Last),
                      Item => F2, Last => Temp);
.........
Ada.Float_Text_IO.Put(To => Str1, Item => F1);
Ada.Float_Text_IO.Put(To => Str2, Item => F2, Aft => 2);
The two Get procedure calls 'input' the values 1.234 and 5.67 into variables F1 and F2 from
the character string Str. The parameter Last passes back the index in the string of
the last character read by the 'input' operation and so can be used as shown to input two
(or more) values from a string by starting the next 'input' operation just after the point where
the previous one finished.
The two Put procedure calls 'output' a string representation of the values of variables F1 and
F2 into the character strings Str1 and Str2; the strings are padded with blanks and the value
of F2 is output to two decimal place accuracy whereas the first is output to the default
number of decimal places.
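The fragment above can be turned into a small complete program, sketched here with illustrative names (Text_Float_Demo, Buffer). It differs from the fragment in one respect: Exp => 0 is passed to Put so that the value is written without an exponent, and the padding blanks are then trimmed away.

```ada
WITH Ada.Float_Text_IO;
WITH Ada.Strings.Fixed;
WITH Ada.Text_IO;

PROCEDURE Text_Float_Demo IS
   Str    : CONSTANT String := "1.234 5.67";
   Buffer : String(1 .. 20);
   F1, F2 : Float;
   Last   : Natural;
BEGIN
   -- 'input' the first value; Last is the index of the last character read
   Ada.Float_Text_IO.Get(From => Str, Item => F1, Last => Last);
   -- continue from the character after the first value
   Ada.Float_Text_IO.Get(From => Str(Last + 1 .. Str'Last),
                         Item => F2, Last => Last);

   -- 'output' F2 to two decimal places with no exponent part
   Ada.Float_Text_IO.Put(To => Buffer, Item => F2, Aft => 2, Exp => 0);
   Ada.Text_IO.Put_Line(Ada.Strings.Fixed.Trim(Buffer, Ada.Strings.Both));
END Text_Float_Demo;
```

The program prints 5.67.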
Once floating point values have been converted to strings they can be transmitted using the
subprograms Input and Output discussed above, which take care of ‘endian’ problems.
Marshalling More Complex Data
Once sub-programs have been developed for marshalling and unmarshalling basic data types,
more complex data structures are easy to handle. Arrays of a fixed size are
marshalled/unmarshalled sequentially by applying the appropriate method to their individual
elements. Similarly Ada records or C structures can be marshalled/unmarshalled by applying
in turn the appropriate method to their individual fields. Arrays of variable size may be
handled by marshalling/unmarshalling the two index bounds and then applying the method
for arrays of a known size. Similarly discriminated (variant) records can be handled by
marshalling/unmarshalling the discriminant(s) and then applying the method for a normal
record once the structure of its fields is known.
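The field-by-field approach for records might be sketched as follows, using text-based marshalling. The record type Point and the subprograms Marshal and Unmarshal are illustrative and are not part of these notes.

```ada
WITH Ada.Integer_Text_IO;
WITH Ada.Text_IO;

PROCEDURE Record_Marshal_Demo IS
   TYPE Point IS RECORD
      X, Y : Integer;
   END RECORD;

   -- Marshal each field in turn, separated by a blank
   FUNCTION Marshal(P : Point) RETURN String IS
   BEGIN
      RETURN Integer'Image(P.X) & " " & Integer'Image(P.Y);
   END Marshal;

   -- Unmarshal the fields in the same order in which they were marshalled
   FUNCTION Unmarshal(Text : String) RETURN Point IS
      P    : Point;
      Last : Natural;
   BEGIN
      Ada.Integer_Text_IO.Get(From => Text, Item => P.X, Last => Last);
      Ada.Integer_Text_IO.Get(From => Text(Last + 1 .. Text'Last),
                              Item => P.Y, Last => Last);
      RETURN P;
   END Unmarshal;

   Wire : CONSTANT String := Marshal((X => 3, Y => -4));
   Back : CONSTANT Point  := Unmarshal(Wire);
BEGIN
   Ada.Text_IO.Put_Line("marshalled form:" & Wire);
   Ada.Text_IO.Put_Line("round trip ok: " &
                        Boolean'Image(Back = Point'(X => 3, Y => -4)));
END Record_Marshal_Demo;
```

An array would be handled in the same way, with a loop over the elements in place of the sequence of field operations.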
This process is automated in most middleware systems. For example in the UNIX RPC
(remote procedure call) library there is a suite of routines for converting data of all types
to and from XDR (external data representation) format. More recently SOAP (Simple
Object Access Protocol) has been developed; this represents complex data structures (objects)
in textual form using XML.
Transmitting LIMITED Data Structures
For LIMITED types stream I/O attribute subprograms are not generated automatically (since
the designation LIMITED indicates that values of the type cannot be copied or compared by
the standard assignment and equality 'operators'). If stream I/O subprograms are required for
the type they must be explicitly defined by the user.
Exercise
Write suitable stream marshalling functions for a circular bounded buffer structure defined by
the following scheme (cf. the bounded buffer protected object in the notes for lecture 12)
TYPE Buffer IS LIMITED PRIVATE;
PROCEDURE Deposit(B : IN OUT Buffer; X : IN Data_Item);
PROCEDURE Extract(B : IN OUT Buffer; X : OUT Data_Item);
.........
PRIVATE
   N : CONSTANT Positive := ....;    -- buffer size
   TYPE Index IS MOD N;
   TYPE Item_Array IS ARRAY(Index) OF Data_Item;
   TYPE Buffer IS RECORD
      Elems : Item_Array;
      I     : Index := 0;    -- items deposited at this position
      J     : Index := 0;    -- items extracted from this position
      Count : Natural RANGE 0 .. N := 0;
   END RECORD;
It is also possible to define stream I/O attributes for dynamic data structures (which are
usually designated as LIMITED). Thus it is possible to save a representation of the dynamic
data structure to a file and to reconstruct an identical copy of the structure on input (using for
example the facilities in Ada.Streams.Stream_IO). Alternatively the data structure could be
transmitted over a pipe or socket by associating a stream with the pipe or socket.
WITH Ada.Streams; USE Ada.Streams;
PACKAGE StringTrees IS
   TYPE String_Access IS ACCESS String;
   TYPE Node;
   TYPE Tree IS ACCESS Node;
   TYPE Node IS RECORD
      Str : String_Access;
      Left, Right : Tree;
   END RECORD;
   PROCEDURE Read(S : ACCESS Root_Stream_Type'Class;
                  T : OUT Tree);
   FUNCTION Input(S : ACCESS Root_Stream_Type'Class) RETURN Tree;
   PROCEDURE Write(S : ACCESS Root_Stream_Type'Class;
                   T : IN Tree);
   FOR Tree'Read USE Read;
   FOR Tree'Input USE Input;
   FOR Tree'Write USE Write;
   FOR Tree'Output USE Write;
   PROCEDURE Insert(T : Tree; S : IN String);
   FUNCTION Remove(T : Tree) RETURN String;
   -- other Tree manipulation facilities
   ...............
END StringTrees;
The corresponding package body does not, of course, transmit actual access values (pointers)
over the stream as these are (essentially) virtual addresses and are thus highly process
dependent. If such values were saved to a file and retrieved by another process they would
have no invariant meaning. Instead the Boolean value False is transmitted for null pointers
whereas True is transmitted for non-null pointers, followed by a representation of the sub-tree
pointed to by the pointer (using a recursive call). In effect we do a pre-order traversal of the
tree. On input an identical copy of the tree is constructed dynamically: when a True value is
input a new node is allocated, then the string is input and then, by recursion, the left
and right sub-trees are input.
PACKAGE BODY StringTrees IS
   PROCEDURE Write(S : ACCESS Root_Stream_Type'Class; T : IN Tree) IS
   BEGIN
      IF T = NULL THEN
         Boolean'Output(S, False);
      ELSE
         Boolean'Output(S, True);
         Output(S, T.Str.ALL);    -- the node's string, then the two sub-trees
         Write(S, T.Left);
         Write(S, T.Right);
      END IF;
   END Write;
   PROCEDURE Read(S : ACCESS Root_Stream_Type'Class; T : OUT Tree) IS
   BEGIN
      IF Boolean'Input(S) THEN
         T := NEW Node;
         T.Str := NEW String'(Input(S));
         Read(S, T.Left);
         Read(S, T.Right);
      ELSE
         T := NULL;
      END IF;
   END Read;
   FUNCTION Input(S : ACCESS Root_Stream_Type'Class) RETURN Tree IS
      T : Tree;
   BEGIN
      Read(S, T);
      RETURN T;
   END Input;
   -- implementation of other Tree manipulation routines
   ...............
   ...............
END StringTrees;
Note the use of the subprograms Input and Output discussed above to handle big/little-endian
problems when transmitting the string index bounds. It is assumed that there are no
‘endian’ problems with Boolean values, which are likely to be represented by a single byte.
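The pre-order scheme can be tried out with the following self-contained sketch. It is a simplification made for illustration only: it uses the predefined String'Output and String'Input (so it assumes the writer and reader run on the same kind of machine), a temporary file stands in for the pipe or socket, and the names Tree_Stream_Demo, Preorder and tree_demo.bin are illustrative.

```ada
WITH Ada.Streams; USE Ada.Streams;
WITH Ada.Streams.Stream_IO;
WITH Ada.Text_IO;

PROCEDURE Tree_Stream_Demo IS
   PACKAGE SIO RENAMES Ada.Streams.Stream_IO;

   TYPE String_Access IS ACCESS String;
   TYPE Node;
   TYPE Tree IS ACCESS Node;
   TYPE Node IS RECORD
      Str         : String_Access;
      Left, Right : Tree;
   END RECORD;

   -- Pre-order: a flag, then the string, then the two sub-trees
   PROCEDURE Write(S : ACCESS Root_Stream_Type'Class; T : IN Tree) IS
   BEGIN
      Boolean'Write(S, T /= NULL);
      IF T /= NULL THEN
         String'Output(S, T.Str.ALL);
         Write(S, T.Left);
         Write(S, T.Right);
      END IF;
   END Write;

   -- Rebuild the tree in the same order in which it was written
   FUNCTION Input(S : ACCESS Root_Stream_Type'Class) RETURN Tree IS
   BEGIN
      IF NOT Boolean'Input(S) THEN
         RETURN NULL;
      END IF;
      DECLARE
         T : CONSTANT Tree := NEW Node;
      BEGIN
         T.Str   := NEW String'(String'Input(S));
         T.Left  := Input(S);
         T.Right := Input(S);
         RETURN T;
      END;
   END Input;

   -- Textual pre-order rendering, used only to compare two trees
   FUNCTION Preorder(T : Tree) RETURN String IS
   BEGIN
      IF T = NULL THEN
         RETURN ".";
      END IF;
      RETURN "(" & T.Str.ALL & " " & Preorder(T.Left) & " " &
             Preorder(T.Right) & ")";
   END Preorder;

   Original : CONSTANT Tree :=
     NEW Node'(Str   => NEW String'("m"),
               Left  => NEW Node'(NEW String'("c"), NULL, NULL),
               Right => NULL);
   Copy : Tree;
   Same : Boolean := False;
   F    : SIO.File_Type;
BEGIN
   SIO.Create(F, SIO.Out_File, "tree_demo.bin");
   Write(SIO.Stream(F), Original);
   SIO.Close(F);

   SIO.Open(F, SIO.In_File, "tree_demo.bin");
   Copy := Input(SIO.Stream(F));
   SIO.Delete(F);   -- remove the temporary file

   Same := Preorder(Copy) = Preorder(Original);
   Ada.Text_IO.Put_Line(Preorder(Copy));
   Ada.Text_IO.Put_Line("identical copy: " & Boolean'Image(Same));
END Tree_Stream_Demo;
```

The copy occupies newly allocated nodes, so Copy and Original are different access values even though the two trees have the same shape and contents.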
Of course similar techniques could be used for other user-defined types. Note however that
the stream-IO attributes and other primitive operations for a type can only be
defined/redefined in the package which defines the type.