Eric Helser 12 April 2005 CSCI360 Honors Project Floating-Point Data When a programmer wants to work with numbers in a program, he must decide which format of numbers to use. The IBM S/370 Assembler has three ways to represent numerical data: binary, such as in a register or storage, decimal, formatted by pack and unpack instructions, and floating-point. Similar to binary and decimal data, float has its own set of commands. The float format is a very useful implementation, without which assembler would have little use. The assembler’s other two formats, binary and decimal, have their own uses. Floating-point is the only format of the three that has an explicit decimal point. Binary, stored in general-purpose registers and fullword storage, is strictly integer data. Decimal format can simulate non-integer values through use of the SRP command, but there is no actual data to keep track of the decimal point. These two formats are useful for keeping track of integers, or non-integer values with a pre-determined number of decimal places. Good examples of these are the product identification codes, stock quantities, and ticket prices on recent assignments. Floating point is better suited to handle numbers with long strings of digits behind the decimal, or large numbers beyond the reach of normal fourbyte fields. Examples of this include irrational numbers, such as roots or mathematical constants, and scientific measurements, such as Avogadro’s Number or the distance between galaxies. While floating-point, decimal, and binary all have different applications, they all share a similar set of commands for working with data. Floating-point has a similar but unique set of commands, compared to other forms of data representation. While similar in usage to general-purpose register commands, what sets floating-point instructions apart is that they work specifically with a set of registers dedicated for use with floating-point instructions. These four registers are each 64 bits long, as opposed to the regular 32-bit length that general-purpose registers have. The commands Load, Load and Test, Store, Compare, Add, Subtract, Multiply, and Divide are similar to their general-purpose register instruction counterparts. Their mnemonics are exactly the same, except an “E” or “D” is added at the end to designate short or long data, respectively, and an “R” after either one designates that the instruction works with strictly registers. Short operations require four bytes of data from either storage or a floating-point register for each operand, while long operations require eight bytes for each operand. Floating-point register commands also include Load Positive, Negative, and Complement. The effect of these instructions is exactly the same as their general-purpose complements: Load Positive will load the absolute value of the second register specified into the first, Load Negative will load the opposite of the absolute value of the second register into the first, and Load Complement will effectively switch the sign of the second register before storing into the first register specified. The only difference in mnemonics between these instructions and the general-purpose versions is that there is an “E” or “D” before the “R” to denote whether the instruction works with 32 bits or 64 bits of data from the registers. The one instruction that is unique to floating-point arithmetic is Halve. There are two versions of halve: one for working with short data, and one for working with long data. Their mnemonics are HER and HDR, respectively. Halve, an instruction that requires two registers to be specified, works by dividing the second register’s contents by two and then storing it in the first register specified. It is a quick and effective way to divide by two using a command that is only available for floating-point numbers. The floating-point format has a clever way of representing fractions and large numbers. This format is very similar to scientific notation, which consists of a fractional number, or mantissa, and a characteristic, or base, raised to some exponent. Just as integers can be represented in hexadecimal by treating each digit as a coefficient of a positive power of 16, fractional numbers can be represented in hexadecimal by treating each digit right of the decimal place as a coefficient of a negative power of 16. Thus, a number such as 0.A becomes 10 * 16^-1, or 10/16. This is equal to the decimal representation 0.625. The first byte of both the four- and eight-byte versions of floating-point contains the sign bit and the characteristic, while the remaining bytes each contain two digits that contribute to the mantissa, with the most significant digit on the left, and padding on the right with zeroes if necessary. The first byte’s first bit is called the sign bit. If the number being represented is positive, this bit is 0. If it is negative, the sign bit is 1. The remaining seven bits denote the power to which 16 is raised, assuming that all digits in the mantissa are right of the decimal point. These seven bits can hold any value from zero to 127. In order to support negative characteristics, the assembler increases the exponent by 64 before converting it to binary. Therefore, the seven bits available can hold any exponent from -64 to +63. Referring back to the previous example, 0.A would be encoded in the following way: the mantissa .A00000, is padded with zeroes and ready to be stored in the right three bytes of the floating-point field. The first byte of the field is determined by the sign of the mantissa and the exponent. Since the number is positive, the sign bit will be zero. To calculate the representation of the exponent, first the value 64 is added to it. In this case, it is zero. This results in the first byte being encoded as 0100 0000 in binary. Thus, in hexadecimal, the value 0.A is represented as 40A0 0000. If this number were stored into a long format, four extra bytes of zeroes would be concatenated onto the right side of this value. This format for expressing numbers has several advantages and disadvantages. The floating-point format has its specific advantages. Floating-point can represent numbers far beyond the reach of a signed fullword with just as many bytes. The short floating-point type (E) requires four bytes and can represent numbers in the range of ±5.4 x 10^-79 to ±7.2 x 10^75, while a binary fullword can only represent numbers in the range ±2.147 x 10^9. Not only does the floating-point type have a greater range of numbers it can represent, it can represent a large quantity of values between two integers. Binary data can represent zero and one, but nothing in between, while floating-point is able to express values such as one-half, one-eighth, or three-quarters. However, there are some disadvantages to using floating-point format. Even though floating-point has a greater range of numbers it can represent, it has a limited amount of precision. E-type data, in four-byte fields, has only six hexadecimal digits of precision. D-type data, an eight-byte format, has fourteen digits of precision. Binary and decimal data may not be able to represent fractions or large numbers, but their values are always exact, while floating-point data may be rounded. Another drawback of using floating-point arithmetic as opposed to binary arithmetic is when doing division. Binary division requires two registers open: one for the quotient, and the other for the remainder. In floating-point arithmetic, the remainder is never calculated. Therefore any routines that rely on calculation of the remainder must use a different method to solve for them. In conclusion, the floating-point numerical format is very useful for working with data that is not in an integer format, or is very large. Even though it has a few shortcomings, float’s advantages outweigh its disadvantages, making it an indispensable part of the Assembler language.