Floating Point in ASM

Eric Helser
12 April 2005
CSCI360 Honors Project
Floating-Point Data
When a programmer wants to work with numbers in a program, he must decide
which format of numbers to use. The IBM S/370 Assembler has three ways to represent
numerical data: binary, held in a general-purpose register or in storage; decimal, formatted
by the pack and unpack instructions; and floating-point. Like binary and decimal data,
floating-point has its own set of instructions. The float format is a very useful
implementation, without which assembler would have little use.
The assembler’s other two formats, binary and decimal, have their own uses.
Floating-point is the only format of the three that records the position of the decimal
point as part of the data. Binary, stored in general-purpose registers and fullword storage,
is strictly integer data. Decimal format can simulate non-integer values through use of the
SRP (Shift and Round Decimal) instruction, but there is no actual data to keep track of the
decimal point. These two formats are useful for keeping
track of integers, or non-integer values with a pre-determined number of decimal places.
Good examples of these are the product identification codes, stock quantities, and ticket
prices on recent assignments. Floating point is better suited to handle numbers with long
strings of digits behind the decimal, or large numbers beyond the reach of normal four-byte
fields. Examples of this include irrational numbers, such as roots or mathematical
constants, and scientific measurements, such as Avogadro’s Number or the distance
between galaxies. While floating-point, decimal, and binary all have different
applications, they all share a similar set of commands for working with data.
Floating-point has a similar but unique set of commands, compared to other forms
of data representation. While similar in usage to general-purpose register commands,
what sets floating-point instructions apart is that they work specifically with a set of
registers dedicated to floating-point instructions. These four registers, numbered 0, 2, 4,
and 6, are each 64 bits long, as opposed to the regular 32-bit length that general-purpose
registers have.
The commands Load, Load and Test, Store, Compare, Add, Subtract, Multiply, and
Divide are similar to their general-purpose register instruction counterparts. Their
mnemonics are exactly the same, except an “E” or “D” is added at the end to designate
short or long data, respectively, and an “R” after either one designates that the instruction
works strictly with registers. Short operations require four bytes of data from either
storage or a floating-point register for each operand, while long operations require eight
bytes for each operand.
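The naming pattern described above can be sketched in a few lines of Python. This is only an illustration of the rule, not assembler code: the table `BASE` and the function `float_mnemonics` are hypothetical names introduced here. The sketch also assumes two exceptions on the S/370: Load and Test exists only in register form, and Store exists only in storage form.

```python
# Base mnemonics of the general-purpose instructions named in the text.
BASE = {
    "Load": "L",
    "Load and Test": "LT",
    "Store": "ST",
    "Compare": "C",
    "Add": "A",
    "Subtract": "S",
    "Multiply": "M",
    "Divide": "D",
}

# Assumed exceptions: no storage form of Load and Test, no register form of Store.
REGISTER_ONLY = {"Load and Test"}
STORAGE_ONLY = {"Store"}


def float_mnemonics(operation):
    """Return the floating-point mnemonics built from a general-purpose base."""
    base = BASE[operation]
    names = []
    for length in ("E", "D"):  # E = short (4 bytes), D = long (8 bytes)
        if operation not in REGISTER_ONLY:
            names.append(base + length)        # register-and-storage form
        if operation not in STORAGE_ONLY:
            names.append(base + length + "R")  # register-to-register form
    return names


for operation in BASE:
    print(operation, float_mnemonics(operation))
```

For Add, the sketch produces AE, AER, AD, and ADR, which mirror the general-purpose A and AR.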
Floating-point register commands also include Load Positive, Negative, and
Complement. The effect of these instructions is exactly the same as that of their general-purpose
counterparts: Load Positive will load the absolute value of the second register specified
into the first, Load Negative will load the opposite of the absolute value of the second
register into the first, and Load Complement will effectively switch the sign of the second
register before storing into the first register specified. The only difference in mnemonics
between these instructions and the general-purpose versions is that there is an “E” or “D”
before the “R” to denote whether the instruction works with 32 bits or 64 bits of data
from the registers.
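Because a floating-point value keeps its sign in a single leading bit, the effect of these three instructions on short data can be modeled as simple bit operations. The following Python sketch is an illustration only: the function names are hypothetical, a short value is treated as a 32-bit integer, and the condition code that the real instructions set is ignored.

```python
SIGN_BIT = 0x80000000   # leftmost bit of a 32-bit short floating-point value
WORD_MASK = 0xFFFFFFFF  # keep results within 32 bits


def load_positive(word):
    """Model of LPER: force the sign bit to 0 (absolute value)."""
    return word & ~SIGN_BIT & WORD_MASK


def load_negative(word):
    """Model of LNER: force the sign bit to 1 (negative absolute value)."""
    return (word | SIGN_BIT) & WORD_MASK


def load_complement(word):
    """Model of LCER: invert the sign bit."""
    return (word ^ SIGN_BIT) & WORD_MASK


plus = 0x40A00000  # +0.625 in short format
print(f"{load_negative(plus):08X}")
print(f"{load_complement(plus):08X}")
print(f"{load_positive(load_negative(plus)):08X}")
```

Only the first bit changes in each case; the exponent and mantissa bits pass through untouched.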
One instruction with no general-purpose counterpart is Halve. There are
two versions of halve: one for working with short data, and one for working with long
data. Their mnemonics are HER and HDR, respectively. Halve, an instruction that
requires two registers to be specified, works by dividing the second register’s contents by
two and then storing it in the first register specified. It is a quick and effective way to
divide by two using a command that is only available for floating-point numbers.
The floating-point format has a clever way of representing fractions and large
numbers. This format is very similar to scientific notation, which consists of a fractional
number, or mantissa, multiplied by a base raised to some exponent. Here the base is 16, and
the exponent is stored in an adjusted form called the characteristic. Just as
integers can be represented in hexadecimal by treating each digit as a coefficient of a
non-negative power of 16, fractional numbers can be represented in hexadecimal by treating
each digit right of the hexadecimal point as a coefficient of a negative power of 16. Thus, a
number such as 0.A becomes 10 * 16^-1, or 10/16. This is equal to the decimal
representation 0.625.
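The digit-by-digit rule above can be checked with a short Python sketch. The function name `hex_fraction_value` is hypothetical, and exact fractions are used so that no rounding enters the picture.

```python
from fractions import Fraction


def hex_fraction_value(digits):
    """Value of the hexadecimal fraction 0.<digits>, as an exact fraction."""
    total = Fraction(0)
    for position, digit in enumerate(digits, start=1):
        # The digit in position n contributes digit * 16^-n.
        total += Fraction(int(digit, 16), 16 ** position)
    return total


print(float(hex_fraction_value("A")))   # 0.A -> 10/16
print(float(hex_fraction_value("8")))   # 0.8 -> 8/16
print(float(hex_fraction_value("C")))   # 0.C -> 12/16
print(float(hex_fraction_value("18")))  # 0.18 -> 1/16 + 8/256
```

The first call reproduces the example in the text: 0.A is 10/16, or 0.625.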
The first byte of both the four- and eight-byte versions of floating-point contains
the sign bit and the characteristic, while the remaining bytes each contain two digits that
contribute to the mantissa, with the most significant digit on the left, and padding on the
right with zeroes if necessary. The first byte’s first bit is called the sign bit. If the number
being represented is positive, this bit is 0. If it is negative, the sign bit is 1. The remaining
seven bits denote the power to which 16 is raised, assuming that all digits in the mantissa
are right of the hexadecimal point. These seven bits can hold any value from zero to 127. In
order to support negative exponents, the assembler increases the exponent by 64
before converting it to binary; the result is the characteristic. Therefore, the seven bits
available can represent any exponent from -64 to +63. Referring back to the previous example,
0.A would be encoded in the following way: the mantissa, .A00000, is padded with zeroes and
ready to be stored in the right three bytes of the floating-point field. The first byte of the
field is determined by the sign of the number and the exponent. Since the number is positive,
the sign bit will be zero. To calculate the characteristic, the value 64 is added to the
exponent, which in this case is zero, giving 64, or 100 0000 in the seven bits. This results in
the first byte being encoded as 0100 0000 in binary.
Thus, in hexadecimal, the value 0.A is represented as 40A0 0000. If this number were
stored into a long format, four extra bytes of zeroes would be concatenated onto the right
side of this value. This format for expressing numbers has several advantages and
disadvantages.
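The encoding steps just described can be sketched in Python as follows. This is a model under stated assumptions rather than the assembler's actual conversion routine: the function names are hypothetical, the mantissa is normalized so that its first hexadecimal digit is nonzero, and digits beyond the sixth are simply truncated.

```python
from fractions import Fraction


def encode_short(value):
    """Encode a number as a 32-bit short floating-point word (truncating)."""
    value = Fraction(value)
    if value == 0:
        return 0
    sign = 1 if value < 0 else 0
    magnitude = abs(value)
    exponent = 0
    # Normalize so that 1/16 <= magnitude < 1 (first hex digit nonzero).
    while magnitude >= 1:
        magnitude /= 16
        exponent += 1
    while magnitude < Fraction(1, 16):
        magnitude *= 16
        exponent -= 1
    if not -64 <= exponent <= 63:
        raise OverflowError("exponent does not fit in seven bits")
    characteristic = exponent + 64        # bias of 64
    mantissa = int(magnitude * 16 ** 6)   # six hex digits, extra digits dropped
    return (sign << 31) | (characteristic << 24) | mantissa


def decode_short(word):
    """Recover the exact value held in a 32-bit short floating-point word."""
    sign = -1 if word >> 31 else 1
    exponent = ((word >> 24) & 0x7F) - 64
    mantissa = Fraction(word & 0xFFFFFF, 16 ** 6)
    return sign * mantissa * Fraction(16) ** exponent


def short_to_long(word):
    """Extend a short word to long format by appending four bytes of zeroes."""
    return word << 32


word = encode_short(0.625)
print(f"{word:08X}")
print(f"{short_to_long(word):016X}")
print(decode_short(word))
```

Encoding 0.625 reproduces the value 40A0 0000 worked out in the text, and decoding that word returns 5/8.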
The floating-point format has its specific advantages. Floating-point can represent
numbers far beyond the reach of a signed fullword with just as many bytes. The short
floating-point type (E) requires four bytes and can represent numbers in the range of ±5.4
x 10^-79 to ±7.2 x 10^75, while a binary fullword can only represent numbers in the
range ±2.147 x 10^9. Not only does the floating-point type have a greater range of
numbers it can represent, it can represent a large quantity of values between two integers.
Binary data can represent zero and one, but nothing in between, while floating-point is
able to express values such as one-half, one-eighth, or three-quarters.
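The range figures quoted above follow directly from the format, as this short Python sketch shows. It assumes a normalized mantissa, so the smallest mantissa is hexadecimal 0.1 and the largest is hexadecimal 0.FFFFFF; the variable names are chosen here for illustration.

```python
# Smallest normalized magnitude: mantissa 0.1 (hex) with exponent -64.
smallest = (1 / 16) * 16.0 ** -64

# Largest magnitude: mantissa 0.FFFFFF (hex) with exponent +63.
largest = (1 - 16.0 ** -6) * 16.0 ** 63

# Largest value of a signed binary fullword, for comparison.
fullword_max = 2 ** 31 - 1

print(f"smallest short float: {smallest:.1e}")
print(f"largest short float:  {largest:.1e}")
print(f"largest fullword:     {fullword_max:.3e}")
```

Both formats occupy four bytes, but the short float trades digits of precision for an exponent, which is where the extra range comes from.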
However, there are some disadvantages to using floating-point format. Even
though floating-point has a greater range of numbers it can represent, it has a limited
amount of precision. E-type data, in four-byte fields, has only six hexadecimal digits of
precision. D-type data, an eight-byte format, has fourteen digits of precision. Binary and
decimal data may not be able to represent fractions or large numbers, but their values are
always exact, while floating-point data may be rounded. Another drawback of
floating-point arithmetic, as opposed to binary arithmetic, appears in division. Binary
division uses a pair of registers: one for the quotient, and the other for the remainder.
In floating-point arithmetic, the remainder is never calculated. Therefore any routines that
rely on the remainder must use a different method to obtain it.
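Both drawbacks can be illustrated with a small Python sketch. It is a simplified model with hypothetical function names: extra digits are truncated rather than rounded, and the remainder routine shows one possible workaround (subtracting the divisor times the whole-number part of the quotient), not a method prescribed by the assembler.

```python
from fractions import Fraction
import math


def truncate_to_hex_digits(value, digits):
    """Keep only the first `digits` hexadecimal digits of a fraction in [1/16, 1)."""
    scale = 16 ** digits
    return Fraction(int(Fraction(value) * scale), scale)


# One-tenth has no finite hexadecimal expansion (0.1999... in hex),
# so neither the short nor the long format can hold it exactly.
tenth = Fraction(1, 10)
short_tenth = truncate_to_hex_digits(tenth, 6)   # E-type: six hex digits
long_tenth = truncate_to_hex_digits(tenth, 14)   # D-type: fourteen hex digits

print(float(tenth - short_tenth))  # error left by the short format
print(float(tenth - long_tenth))   # much smaller error in the long format


def remainder_from_quotient(dividend, divisor):
    """Recover a remainder when division yields only a quotient."""
    quotient = dividend / divisor
    return dividend - divisor * math.trunc(quotient)


print(remainder_from_quotient(17.0, 5.0))
```

The long format reduces the error but does not remove it, and the remainder has to be rebuilt from the quotient in a separate step.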
In conclusion, the floating-point numerical format is very useful for working with
data that is not in an integer format, or is very large. Even though it has a few
shortcomings, float’s advantages outweigh its disadvantages, making it an indispensable
part of the Assembler language.