Advanced Tutorials
NESUG '96 Proceedings

THE EVIL TWINS OF REAL NUMBERS THAT MAY CAUSE UNEXPECTED RESULTS IN SAS APPLICATIONS

Aileen L. Yam, Corning Besselaar, Inc., Princeton, NJ

ABSTRACT

The real number system is infinite. Because of computer hardware limitations, real numbers are converted internally and represented with an equivalent finite set of numbers. Some real numbers cannot be represented exactly, and their representation can vary very slightly according to the platform on which the SAS System runs. Thus, numeric representation has the potential for imprecision. This paper is a non-technical guide to understanding numeric representation, with examples that illustrate when numeric representation can cause unexpected results and how to minimize representation problems.

INTRODUCTION

The issue of numeric representation is seldom noticed because the representation process takes place internally. In most situations, the representation problem is subtle and negligible. But sometimes inaccurate results caused by arithmetic operations on the inexact representation need to be redressed. This paper defines numeric representation and explains the reasons for it, provides examples that show when numeric representation presents a problem, and discusses some common practices for minimizing the problem in SAS applications.

DEFINITION AND REASONS FOR NUMERIC REPRESENTATION

Numeric representation refers to a finite set of numbers that represents the infinite real number system. The SAS System uses floating-point binary (base 2) or hexadecimal (base 16) notation, depending on the computer platform, to represent the decimal (base 10) system.

Historically, computer hardware used two-way light bulbs to represent and compute numbers. One bit of information was signaled by a light bulb being on or off, which was equal to one or zero, yes or no, true or false.
The on-off state of electrical switches was easily adapted to encoding the true-false operations of symbolic logic, making the computer more than a computing machine. The binary system was, and still remains, the basic building block of every computer because of its simplicity and efficiency. Today there are several variations of the binary system, but all of their bases are powers of 2. For example, the octal (base 8) system is 2 raised to the third power ($8 = 2 \times 2 \times 2 = 2^3$), and one octal digit is the equivalent of three binary digits. Similarly, the hexadecimal (base 16) system is 2 raised to the fourth power ($16 = 2 \times 2 \times 2 \times 2 = 2^4$), and one hexadecimal digit represents four binary digits.

The real numbers that we use in everyday life are based on the decimal system. The real number system contains an infinite amount of numbers, which is beyond the storage capacity of any computer. Because it is not possible to store an infinite amount of numbers, and because it is more efficient for computers to perform arithmetic and logical operations on binary numbers, real numbers are converted to a finite numeric system based on the binary concept.

WHEN WILL REPRESENTATION BE A PROBLEM?

Numeric values are truncated and lose precision if LENGTH or ATTRIB statements are used to reduce the number of bytes when storage space is limited. Given enough memory, integers do not have a representation problem, but fractional values always have the potential for one. The reason is that converting a fractional value from decimal to binary or hexadecimal sometimes results in an infinite series of trailing digits. For instance, converting .1 from decimal to hexadecimal gives .199999..., with an infinite series of trailing 9's. An infinite series of digits cannot be represented exactly in a finite numeric system.

For example, it is a known mathematical truth that 1 + 2 is equal to 3, and .1 + .2 is equal to .3. If these two statements are tested in a SAS DATA step on a UNIX system (where SAS stores numbers in IEEE binary floating point and can display them in hexadecimal), 1 + 2 is equal to 3, but .1 + .2 is not exactly equal to .3; it is slightly greater than .3. To investigate the difference, both .3 and the result of .1 + .2 are carried out to 30 decimals and printed with the numeric format 32.30. In both cases, the value is .300000000000000000000000000000 because, except for the HEX16. format, all other SAS formats round numeric values before they are printed. To pinpoint the difference, the numbers have to be printed in the HEX16. format. The value displayed for .3 is 3FD3333333333333, while the value displayed for .1 + .2 is 3FD3333333333334.

In general, when displaying fractional values, the representation problem is usually compensated for by the SAS printing formats, and the results appear to be correct. When determining exact fractional values, however, inexact representation generates inaccurate results.

A numeric representation problem can occur in an apparently simple application like the example given above. It can occur in any arithmetic operation: addition, subtraction, multiplication, division, or exponentiation. Another mathematical axiom is $(\sqrt{x})^2 = x$, which holds because the square root function and the square function are inverses of each other. Written as a SAS statement, the equation can be either sqrt(x)*sqrt(x)=x or sqrt(x)**2=x. The two equations are written in different forms, but mathematically they have the same meaning. If x is substituted with numbers for testing, the equations do not hold true all the time, because some numbers are not exact representations of their true decimal values.

The representation problem can be further intensified by additional arithmetic operations and iterations. To illustrate the representation problem propagated through several iterations of a DO loop, one additional known mathematical truth can be tested. It is known mathematically that $1 - \sum_{i=1}^{j} \frac{1}{j}$ is always equal to zero. The SAS statement for this equation is:

    x = 1 - (%do i=1 %to &j; +1/&j %end;);

Theoretically, the result is always zero, regardless of the value supplied to the macro variable reference &j. However, due to the representation problem, the result is not always zero. For values that constitute a representation problem, the effect of the problem increases every time the value is divided by &j and goes through the DO loop. The effect of inexact representation can be compounded by each component of an equation or propagated through several iterations of the same equation.

HOW TO MINIMIZE REPRESENTATION PROBLEM?

The first rule of thumb is to avoid decreasing the storage space of numeric variables, especially those that may contain fractional values, from the default length of 8 bytes. If there are more digits than there is room for, the numbers are stored with less than full precision and cannot be fully represented. Even with the default length of 8 bytes, the representation problem still exists for some fractional values, as previously indicated.

There are several alternatives to account for the representation problem and to minimize unexpected results. One alternative is to use integers whenever possible. If .1 + .2 is not equal to .3, simply store the values as integers instead, add 1 to 2, and then adjust the decimal place with formatting statements. Using integers is a good solution for financial institutions, where the majority of the numeric values are in dollars and cents. The numeric values can be stored as integers, and the results can be formatted to shift the decimal places.

Other options are to round or to fuzz the numbers before comparing the results of arithmetic operations with the expected values. These options can be used in pharmaceutical research when comparing laboratory values to normal ranges. The ROUND function and the FUZZ function are similar in that both functions apply a fuzz factor of 1E-12.
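The integer and rounding alternatives can be tried outside SAS as well. The following is a minimal sketch in Python, chosen only because its float type uses the same 8-byte IEEE representation as SAS on UNIX platforms; results on other platforms are an assumption. The helper `fuzz` and all variable names are hypothetical, and `fuzz` only imitates the idea of snapping a value that lies within 1E-12 of an integer; it is not the SAS FUZZ function.

```python
def fuzz(value, tolerance=1e-12):
    """Hypothetical helper (not the SAS FUZZ function itself): snap a value
    to the nearest integer when it lies within `tolerance` of that integer."""
    nearest = round(value)
    if abs(value - nearest) < tolerance:
        return float(nearest)
    return value


# Comparing fractional values directly exposes the inexact representation.
direct_equal = (0.1 + 0.2 == 0.3)

# Alternative 1: keep the amounts as integers (here, cents) and shift the
# decimal place only when formatting the result.
cents_total = 10 + 20
formatted_total = f"{cents_total / 100:.2f}"

# Alternative 2: round both sides to the same number of decimals
# before comparing them.
rounded_equal = (round(0.1 + 0.2, 10) == round(0.3, 10))

# Alternative 3: fuzz a value that is expected to be a whole number.
fuzzed_value = fuzz((0.1 + 0.2) * 10)

print(direct_equal, formatted_total, rounded_equal, fuzzed_value)
```

On an IEEE platform this prints `False 0.30 True 3.0`: the direct comparison fails, while the integer, rounded, and fuzzed versions behave as the decimal arithmetic would suggest.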
The fuzz factor, .000000000001, is added to positive values to be rounded or fuzzed, and subtracted from negative values, to account for representation inequality. The difference between the two functions is that with the ROUND function the number of digits in the returned value can be adjusted, but with the FUZZ function the value returned is predetermined.

SUMMARY

Converting the infinite real number system into a finite set of numbers is necessary because of computer hardware configurations. The SAS System stores numeric values and performs numeric computations in either base 2 or base 16 on most platforms. Binary or hexadecimal representations are not always the identical twins of their decimal counterparts. It helps to be aware of, and to anticipate, when the evil twins misbehave. Because different applications have different needs, there is no single rule of thumb for managing the disparity between numbering systems. The most common solutions are to avoid reducing the number of bytes used to store numeric variables with the LENGTH or ATTRIB statements, to use integers whenever possible, and to round off or fuzz numbers where appropriate.

SAS is a registered trademark or trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

For additional information, contact:
Aileen L. Yam
Corning Besselaar, Inc.
210 Carnegie Center
Princeton, NJ 08540-6233
Tel.: (609) 452-4200
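As a supplementary illustration, the three experiments described under WHEN WILL REPRESENTATION BE A PROBLEM? can be reproduced outside SAS. The sketch below uses Python on the assumption that its floats are the same 8-byte IEEE doubles that SAS uses on UNIX. The helpers `hex16` and `residual` are hypothetical names: `hex16` only imitates the kind of output the HEX16. format shows, and `residual` stands in for the macro DO loop.

```python
import math
import struct


def hex16(value):
    """Hypothetical helper: big-endian hexadecimal dump of an 8-byte double,
    imitating the output the paper shows for the HEX16. format."""
    return struct.pack(">d", value).hex().upper()


def residual(j):
    """Compute 1 minus the sum of j copies of 1/j, added one at a time."""
    total = 0.0
    for _ in range(j):
        total += 1 / j
    return 1 - total


# Experiment 1: .3 and .1 + .2 differ in the last hexadecimal digit.
hex_of_point_three = hex16(0.3)
hex_of_sum = hex16(0.1 + 0.2)

# Experiment 2: values of x from 1 to 10 for which sqrt(x)*sqrt(x) is not x.
sqrt_mismatches = [x for x in range(1, 11)
                   if math.sqrt(x) * math.sqrt(x) != x]

# Experiment 3: values of j from 1 to 10 for which the residual is not zero.
nonzero_residuals = [j for j in range(1, 11) if residual(j) != 0]

print(hex_of_point_three, hex_of_sum)
print(sqrt_mismatches)
print(nonzero_residuals)
```

The first printed line should match the two HEX16. values quoted in the paper. The two lists show that only some inputs are affected, which is why the problem is easy to overlook in testing.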