Floating-Point Numbers

CENG 536 - Spring 2012-2013, Dr. Yuriy ALYEKSYEYENKOV
Computer Engineering Department, Çankaya University

The problem with fixed-point representation can be illustrated with two example numbers, a small number x and a large number y. The relative representation error due to truncation is quite significant for x, while it is much less severe for y. On the other hand, both $x^2$ and $y^2$ are unrepresentable, because their computations lead to underflow (number too small) and overflow (number too large), respectively.

These numbers can be rewritten as a significand scaled by a power of the radix, with exponent -5 for x and +7 for y. The exponent -5 or +7 indicates the direction and amount by which the radix point must be moved to produce the corresponding fixed-point representation. Hence the designation "floating-point numbers".

A floating-point number has four components: the sign, the significand (mantissa) s, the exponent base b, and the exponent e. Its value is $x = \pm s \times b^{e}$. The exponent base is usually a power of two, except in decimal arithmetic, where it is 10.

Figure: A typical floating-point format.

A key point to observe is that two signs are involved in a floating-point number: the sign of the number itself and the sign of the exponent.

The use of a biased exponent format has virtually no effect on the speed or cost of exponent arithmetic (addition/subtraction), given the small number of bits involved. It does, however, facilitate zero detection (zero can be represented with the smallest biased exponent, 0, and an all-zero significand) and magnitude comparison (normalized floating-point numbers can be compared as if they were integers).

The range of values in a floating-point number representation is composed of the intervals $[-\max, -\min]$ and $[\min, \max]$.

Figure: Number distribution pattern and subranges in floating-point representations.

There are three special, or singular, values: $-\infty$, 0, and $+\infty$.
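The claim that normalized floating-point numbers can be compared as if they were integers can be checked with a short Python sketch. It assumes IEEE 754 single precision as the concrete format (sign bit, then biased exponent, then significand, from the most significant bit down) and uses only the standard `struct` module to obtain bit patterns; the helper name `float_bits` is hypothetical. Only non-negative values are compared, because with a signed-magnitude sign bit the integer order of negative patterns is reversed.

```python
import struct

def float_bits(x: float) -> int:
    """Hypothetical helper: 32-bit IEEE 754 single-precision pattern of x as an unsigned int."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

# Non-negative values in increasing numeric order; all are exact in single precision.
values = [0.0, 0.15625, 0.75, 1.0, 2345.125, 2.0 ** 100]
patterns = [float_bits(v) for v in values]

# Zero has biased exponent 0 and an all-zero significand, so its pattern is 0.
assert patterns[0] == 0

# The biased exponent occupies the bits above the significand, so comparing the
# patterns as unsigned integers gives the same order as comparing the values.
assert patterns == sorted(patterns)

for v, p in zip(values, patterns):
    print(f"{v:>14g}  0x{p:08X}")
```

If the exponent were stored in 2's complement instead, a negative exponent would set the top exponent bit and break this ordering, which is one practical reason for the biased format.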
Zero is special because it cannot be represented with a normalized significand (mantissa).

Overflow occurs when a result is less than $-\max$ or greater than $+\max$. Underflow, on the other hand, occurs for results in the range $(-\min, 0)$ or $(0, \min)$.

The value equation of a floating-point number, $x = \pm s \times b^{e}$, suggests that the range $[-\max, \max]$ increases if we choose a larger exponent base b. A larger b also simplifies arithmetic operations on the exponents, since for a given range, smaller exponents must be dealt with. However, if the significand is to be kept in normalized form, effective precision decreases for larger b (with b = 16, for example, a normalized significand may begin with up to three 0 bits, which contribute no precision). In the past, machines with b = 2, 8, 16, or 256 were built.

The exponent sign is almost always encoded in a biased format. As for the sign of a floating-point number, alternatives to the currently dominant signed-magnitude format include the use of 1's- or 2's-complement representation. Several variations have been tried in the past, including complementing only the significand part, and complementing the entire number (including the exponent part) when the number to be represented is negative.

Figure: The two representation formats in the IEEE standard for binary floating-point numbers (ANSI/IEEE Std 754-1985):
Single (32 bits): 1 sign bit, 8 exponent bits (bias 127), 23 fraction bits.
Double (64 bits): 1 sign bit, 11 exponent bits (bias 1023), 52 fraction bits.

The standard also defines extended formats that allow an implementation to carry higher precision internally, reducing the effect of accumulated errors. Two extended formats are defined: single extended and double extended.
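The field layout of the two basic formats can be exercised with a small decoder. This is a sketch under stated assumptions: it uses the standard field widths (single: 8 exponent and 23 fraction bits, bias 127; double: 11 and 52, bias 1023) and decodes only normalized numbers, whose value is $(-1)^S \times 2^{E-\text{bias}} \times (1.M)$; zero, subnormals, infinities, and NaNs are rejected. The names `FORMATS` and `decode` are hypothetical.

```python
import struct

# (exponent bits, fraction bits) of the two basic IEEE 754-1985 formats.
FORMATS = {"single": (8, 23), "double": (11, 52)}

def decode(bits: int, fmt: str = "single") -> float:
    """Decode a normalized IEEE 754 bit pattern field by field (sketch)."""
    e_bits, f_bits = FORMATS[fmt]
    bias = (1 << (e_bits - 1)) - 1                 # 127 for single, 1023 for double
    sign = (bits >> (e_bits + f_bits)) & 1
    exponent = (bits >> f_bits) & ((1 << e_bits) - 1)
    fraction = bits & ((1 << f_bits) - 1)
    if exponent == 0 or exponent == (1 << e_bits) - 1:
        # Biased exponent 0 (zero, subnormals) and all ones (infinities, NaNs)
        # are reserved; this sketch only decodes normalized numbers.
        raise ValueError("not a normalized number")
    significand = 1 + fraction / (1 << f_bits)     # restore the hidden 1: 1.M
    return (-1) ** sign * significand * 2.0 ** (exponent - bias)

# Cross-check against the platform's own encoder via the struct module.
single_bits = struct.unpack(">I", struct.pack(">f", -2345.125))[0]
double_bits = struct.unpack(">Q", struct.pack(">d", 0.1))[0]
print(hex(single_bits), decode(single_bits))            # 0xc5129200 -2345.125
print(hex(double_bits), decode(double_bits, "double"))  # 0x3fb999999999999a 0.1
```

Only the two width parameters differ between the formats; the bias is derived from the exponent width, which is why one function covers both.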
For a normalized single-precision number, the value is

$N = (-1)^{S} \times 2^{E-127} \times (1.M)$

Example 1. The decimal number $0.75_{10}$ is to be represented in the IEEE 754 single-precision format:
$0.75_{10} = 0.11_2$ (converted to binary) $= 1.1_2 \times 2^{-1}$ (normalized binary; the leading 1 is hidden).
The number is positive, so the sign is S = 0.
The biased exponent is $E = e + 127 = -1 + 127 = 126_{10} = 0111\,1110_2$.
The fractional part of the significand is M = .1000 0000 0000 0000 0000 000 (23 bits).

The IEEE 754 single-precision representation is (hexadecimal 3F40 0000):

Bits:    31      30 ... 23     22 ... 0
Field:   Sign    Exponent      Mantissa
Width:   1 bit   8 bits        23 bits
Value:   0       0111 1110     1000 0000 0000 0000 0000 000

Example 2. The decimal number $-2345.125_{10}$ is to be represented in the IEEE 754 single-precision format:
$-2345.125_{10} = -1001\,0010\,1001.001_2$ (converted to binary) $= -1.0010\,0101\,0010\,01_2 \times 2^{11}$ (normalized binary; the leading 1 is hidden).
The number is negative, so the sign is S = 1.
The biased exponent is $E = e + 127 = 11 + 127 = 138_{10} = 1000\,1010_2$.
The fractional part of the significand is M = .0010 0101 0010 0100 0000 000 (23 bits).

The IEEE 754 single-precision representation is (hexadecimal C512 9200):

Bits:    31      30 ... 23     22 ... 0
Field:   Sign    Exponent      Mantissa
Width:   1 bit   8 bits        23 bits
Value:   1       1000 1010     0010 0101 0010 0100 0000 000

Basic arithmetic on floating-point numbers is conceptually simple. However, care must be taken in the hardware implementation to ensure correctness and to avoid undue loss of precision; in addition, it must be possible to handle any exceptions. Addition and subtraction are the most difficult of the elementary operations for floating-point operands. Here, we deal only with addition, since subtraction can be converted to addition by flipping the sign of the subtrahend.
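Both worked examples follow one recipe: convert to binary, normalize to $1.xxx \times 2^{e}$, take S from the sign, compute E = e + 127, and keep 23 fraction bits. A minimal Python sketch of that recipe follows. It uses exact rational arithmetic from the standard `fractions` module and, as a simplifying assumption, accepts only values that are exactly representable as normalized single-precision numbers (it performs no rounding). The function name `encode_single` is hypothetical.

```python
from fractions import Fraction

def encode_single(decimal: str) -> tuple[int, int, int]:
    """Return (S, E, M) for a decimal string exactly representable as a normalized single."""
    x = Fraction(decimal)
    if x == 0:
        raise ValueError("zero has no normalized representation")
    s = 1 if x < 0 else 0          # signed magnitude: the sign is stored separately
    x = abs(x)
    e = 0
    while x >= 2:                  # normalize so that 1 <= x < 2
        x /= 2
        e += 1
    while x < 1:
        x *= 2
        e -= 1
    biased = e + 127               # E = e + 127
    if not 1 <= biased <= 254:
        raise ValueError("exponent outside the normalized single-precision range")
    m = (x - 1) * 2 ** 23          # drop the hidden 1, keep 23 fraction bits
    if m.denominator != 1:
        raise ValueError("needs more than 23 fraction bits; this sketch does not round")
    return s, biased, int(m)

for text in ["0.75", "-2345.125"]:
    s, e, m = encode_single(text)
    pattern = (s << 31) | (e << 23) | m
    print(f"{text:>10}: S={s} E={e:08b} M={m:023b}  0x{pattern:08X}")
```

For the two example inputs the printed patterns are 0x3F400000 and 0xC5129200, matching the bit layouts of the worked examples.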
Consider the addition

$(\pm s_1 \times b^{e_1}) + (\pm s_2 \times b^{e_2}) = \pm s \times b^{e}$

Assuming $e_1 \ge e_2$, we begin by aligning the two operands through right-shifting of the significand $s_2$ of the number with the smaller exponent.

If the exponent base b and the number representation radix r are the same, we simply shift $s_2$ to the right by $e_1 - e_2$ digits. When $b = r^{a}$, the shift amount, which is computed through direct subtraction of the biased exponents, is multiplied by a. In either case, this step is referred to as alignment shift, or preshift (in contrast to normalization shift, or postshift, which is needed when the resulting significand s is unnormalized).

We then perform the addition as follows:

$(\pm s_1 \times b^{e_1}) + (\pm s_2 \times b^{e_2}) = (\pm s_1 \times b^{e_1}) + (\pm s_2 / b^{e_1 - e_2}) \times b^{e_1} = (\pm s_1 \pm s_2 / b^{e_1 - e_2}) \times b^{e_1} = \pm s \times b^{e}$

Floating-point multiplication is simpler than floating-point addition; it is performed by multiplying the significands and adding the exponents:

$(\pm s_1 \times b^{e_1}) \times (\pm s_2 \times b^{e_2}) = \pm (s_1 \times s_2) \times b^{e_1 + e_2}$

Postshifting may be needed, since the product $s_1 \times s_2$ of the two significands can be unnormalized. For example, with IEEE significands in [1, 2), we have $1 \le s_1 \times s_2 < 4$, leading to the possible need for a single-bit right shift. Also, the computed exponent needs adjustment if the exponents are biased (adding two biased exponents counts the bias twice, so one bias must be subtracted) or if a normalization shift is performed. Overflow/underflow is possible during multiplication if $e_1$ and $e_2$ have like signs. Overflow is also possible due to normalization.

Similarly, floating-point division is performed by dividing the significands and subtracting the exponents:

$(\pm s_1 \times b^{e_1}) / (\pm s_2 \times b^{e_2}) = \pm (s_1 / s_2) \times b^{e_1 - e_2}$

Here, the problems to be dealt with are similar to those of multiplication. The ratio of the significands may have to be normalized. For example, with IEEE significands we have $1/2 < s_1 / s_2 < 2$, and a single-bit left shift is always adequate. The computed exponent needs adjustment if the exponents are biased (subtracting two biased exponents cancels the bias, so it must be added back) or if a normalization shift is performed. Overflow/underflow is possible during division if $e_1$ and $e_2$ have unlike signs. Underflow due to normalization is also possible.
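The preshift and postshift of addition, and the single-bit normalization of multiplication and division, can be traced in a small Python model. The model is an assumption made for illustration only: a toy binary format (b = r = 2) with a P-bit significand including the hidden 1 (P = 8 here), stored as a signed integer `sig` whose value is $sig \times 2^{e-(P-1)}$. Bits shifted out are simply chopped (no guard digits or rounding), and exponent range limits, hence overflow and underflow, are ignored. All function names are hypothetical.

```python
import math

P = 8  # toy significand width, hidden 1 included (assumption)

def shr(x: int, d: int) -> int:
    """Right-shift the magnitude of a signed-magnitude significand (chopping)."""
    return x >> d if x >= 0 else -((-x) >> d)

def from_float(x: float) -> tuple[int, int]:
    m, ex = math.frexp(x)             # x = m * 2**ex with 0.5 <= |m| < 1
    return int(m * 2 ** P), ex - 1    # significand in [1, 2), scaled by 2**(P-1)

def to_float(sig: int, e: int) -> float:
    return sig * 2.0 ** (e - (P - 1))

def normalize(sig: int, e: int) -> tuple[int, int]:
    """Postshift until 2**(P-1) <= |sig| < 2**P, adjusting the exponent."""
    if sig == 0:
        return 0, 0
    while abs(sig) >= 1 << P:         # significand >= 2: shift right
        sig, e = shr(sig, 1), e + 1
    while abs(sig) < 1 << (P - 1):    # significand < 1 (e.g. cancellation): shift left
        sig, e = sig * 2, e - 1
    return sig, e

def fp_add(a, b):
    (s1, e1), (s2, e2) = a, b
    if e1 < e2:                       # ensure e1 >= e2
        (s1, e1), (s2, e2) = (s2, e2), (s1, e1)
    s2 = shr(s2, e1 - e2)             # alignment shift (preshift) of the smaller operand
    return normalize(s1 + s2, e1)     # add, then normalization shift (postshift)

def fp_mul(a, b):
    (s1, e1), (s2, e2) = a, b
    sig, e = shr(s1 * s2, P - 1), e1 + e2   # multiply significands, add exponents
    if abs(sig) >= 1 << P:            # 2 <= |s1 * s2| < 4: single-bit right shift
        sig, e = shr(sig, 1), e + 1
    return sig, e

def fp_div(a, b):
    (s1, e1), (s2, e2) = a, b
    q = (abs(s1) << P) // abs(s2)     # |s1 / s2| scaled by 2**P: one extra bit kept
    e = e1 - e2                       # divide significands, subtract exponents
    if q >= 1 << P:                   # 1 <= ratio < 2: already normalized
        q >>= 1
    else:                             # 1/2 < ratio < 1: single-bit left shift,
        e -= 1                        # the extra quotient bit fills the vacated position
    return (-q if (s1 < 0) != (s2 < 0) else q), e

x, y = from_float(1.5), from_float(0.75)
print(to_float(*fp_add(x, y)))                               # 2.25 (right postshift)
print(to_float(*fp_add(from_float(1.0), from_float(-0.75)))) # 0.25 (left postshift)
print(to_float(*fp_mul(x, x)))                               # 2.25
print(to_float(*fp_div(x, y)))                               # 2.0
```

Adding $2^{-10}$ to 1.0 in this model returns 1.0, because the preshift of 10 positions moves every bit of the smaller significand out of the 8-bit window; this is the loss of precision that guard digits in real hardware are meant to limit.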
To extract the square root of a positive floating-point number, we first make its exponent even. This may require subtracting 1 from the exponent and multiplying the significand by b. We then use the following:

$\sqrt{s \times b^{e}} = \sqrt{s} \times b^{e/2}$

In the case of IEEE floating-point numbers, the adjusted significand will be in the range $1 \le s < 4$, which leads directly to a normalized significand for the result. Square-rooting never produces overflow or underflow, since halving the exponent moves the magnitude of the result toward 1.
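The exponent-evening step can be sketched in Python on ordinary IEEE double-precision values. The sketch rests on stated assumptions: `math.frexp` supplies the significand and exponent (rescaled here to the IEEE convention $1 \le s < 2$), the square root of the adjusted significand is delegated to `math.sqrt` rather than computed by a hardware-style algorithm, and only positive finite inputs are accepted. The function name `fp_sqrt` is hypothetical.

```python
import math

def fp_sqrt(x: float) -> tuple[float, int]:
    """Return (r, k) with sqrt(x) == r * 2**k and 1 <= r < 2, for positive finite x."""
    if not (x > 0 and math.isfinite(x)):
        raise ValueError("positive finite input required")
    m, ex = math.frexp(x)          # x = m * 2**ex, 0.5 <= m < 1
    s, e = 2 * m, ex - 1           # IEEE-style significand: 1 <= s < 2
    if e % 2 != 0:                 # make the exponent even:
        s, e = 2 * s, e - 1        #   multiply s by b = 2, subtract 1 from e
    # Now 1 <= s < 4, so sqrt(s) lies in [1, 2): the result is already normalized.
    return math.sqrt(s), e // 2

for x in [2.0, 0.75, 1e300, 5e-300]:
    r, k = fp_sqrt(x)
    assert 1 <= r < 2
    assert math.ldexp(r, k) == math.sqrt(x)
    print(x, r, k)
```

The returned exponent k is half of an even exponent, so its magnitude is about half that of the input exponent; this is the reason the result stays inside the representable range.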
