Advanced Tutorials
THE EVIL TWINS OF REAL NUMBERS THAT MAY CAUSE
UNEXPECTED RESULTS IN SAS APPLICATIONS
Aileen L. Yam, Corning Besselaar, Inc., Princeton, NJ
ABSTRACT
The real number system is infinite. Because of computer hardware limitations, real numbers
are converted internally and are represented with an equivalent finite set of numbers.
Some real numbers cannot be represented exactly and can vary very slightly according to
the platform on which the SAS System runs. Thus, numeric representation has the
potential for imprecision. This paper is a non-technical guide to understanding numeric
representation, with examples to illustrate when numeric representation can cause
unexpected results and how to minimize representation problems.
INTRODUCTION
The issue of numeric representation is seldom noticed, because the representation process takes place internally. In most situations, the representation problem is subtle and negligible. But sometimes inaccurate results due to arithmetic operations on inexact representations need to be redressed.
This paper defines numeric representation and explains the reasons for it, provides examples to show when numeric representation presents a problem, and discusses some common practices for minimizing the problem in SAS applications.
DEFINITION AND REASONS FOR NUMERIC REPRESENTATION
Numeric representation refers to a
finite set of numbers that represents the
infinite real number system. The SAS
System uses floating-point binary (base 2)
or hexadecimal (base 16) notations,
depending on computer platforms, to
represent the decimal (base 10) system.
NESUG '96 Proceedings
Historically, computer hardware used two-way light bulbs to represent and compute numbers. One bit of information was signaled by a light bulb being on or off, which was equal to one or zero, yes or no, true or false. The on-off state of electrical switches was easily adapted to encoding the true-false operations of symbolic logic, making computers more than computing machines. The binary system was and still remains the basic building block of every computer, due to its simplicity and efficiency. Today, there are several variations of the binary system, but their bases are all powers of 2. For example, the octal (base 8) system is 2 raised to the third power ($8 = 2 \times 2 \times 2 = 2^3$), and one octal digit is the equivalent of three binary digits. Similarly, the hexadecimal (base 16) system is 2 raised to the fourth power ($16 = 2 \times 2 \times 2 \times 2 = 2^4$), and one hexadecimal digit represents four binary digits.
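The correspondence between binary, octal, and hexadecimal digits can be seen by printing one integer in all three notations. The following is only a sketch: the variable name n and the value 255 are chosen for illustration and do not come from the paper. It relies on the BINARYw., OCTALw., and HEXw. formats, where a HEX width below 16 writes the value as an integer rather than as its floating-point representation.

```sas
data _null_;
  n = 255;            /* illustrative integer, not from the paper */
  put n= binary8.;    /* eight binary digits */
  put n= octal3.;     /* each octal digit stands for three binary digits */
  put n= hex2.;       /* each hexadecimal digit stands for four binary digits */
run;
```

The value 255 is 11111111 in binary. Grouping the bits in threes from the right (011 111 111) gives 377 in octal, and grouping them in fours (1111 1111) gives FF in hexadecimal.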
The real numbers that we use in our everyday life are based on a decimal system. The real number system contains infinitely many numbers, which is beyond the storage capacity of any computer. Because it is not possible to store an infinite amount of numbers and because it is more efficient for computers to perform arithmetic and logical operations on binary numbers, real numbers are converted to a finite numeric system based on the binary concept.
WHEN WILL REPRESENTATION BE A PROBLEM?
Numeric values are truncated and lose precision if LENGTH or ATTRIB statements are used to reduce the number of bytes when storage space is limited.
Given enough memory, integers do not have a representation problem, but fractional values always have the potential for one. The reason is that the conversion of fractional values from decimal to binary or hexadecimal sometimes results in an infinite series of trailing digits. For example, converting .1 from decimal to hexadecimal gives .199999..., an infinite series of trailing 9's. An infinite series of digits cannot be represented exactly in a finite numeric system.
For example, it is a known mathematical truth that 1 + 2 is equal to 3, and .1 + .2 is equal to .3. If these two statements are tested in a SAS DATA step on a UNIX system, 1 + 2 is equal to 3, but .1 + .2 is not exactly equal to .3; it is slightly greater than .3. To investigate the difference, both .3 and the result of .1 + .2 are carried out to 30 decimals and printed with the numeric format 32.30. In both cases, the values are .300000000000000000000000000000 because, except for the HEX16. format, all other SAS formats round numeric values before they are printed. To pinpoint the difference, the numbers have to be printed in HEX16. format. The value displayed for .3 is 3FD3333333333333, while the value displayed for .1 + .2 is 3FD3333333333334.
The numeric representation problem can occur in an apparently simple application like the example given above. It can occur in any arithmetic operation: addition, subtraction, multiplication, division, or exponentiation.
Another mathematical axiom concerns $(\sqrt{x})^2$: the square root function and the square function are inverses of each other, and thus $(\sqrt{x})^2 = x$. Written as a SAS expression, the equation can be either sqrt(x)*sqrt(x)=x or sqrt(x)**2=x. The two equations are written in two different forms, but mathematically they have the same meaning. If numbers are substituted for x for testing, the equations do not hold true all the time, because some numbers are not exact representations of their true decimal values.
The representation problem can be further intensified by additional arithmetic operations and iterations. To illustrate the problem of representation propagated through several iterations of a DO loop, one additional known mathematical truth can be tested. It is known mathematically that $1 - \sum_{i=1}^{j} \frac{1}{j}$ is always equal to zero. The SAS statement for this equation is:
    x = 1 - (%do i = 1 %to &j;
               + 1/&j
             %end;);
Theoretically, the result is always zero, regardless of the value supplied to the macro variable reference &j. However, due to the representation problem, the result is not always zero. For values that are subject to the representation problem, the effect of the problem increases every time 1 is divided by &j and the quotient is added in another pass through the DO loop.
In general, when fractional values are displayed, the representation problem is usually compensated for by the SAS printing formats, and the results appear to be correct. When exact fractional values are being determined, however, inexact representation generates inaccurate results. The effect of inexact representation can be compounded by each component of an equation or propagated through several iterations of the same equation.
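The three tests described in this section can be sketched as one small program. This is an illustration under stated assumptions, not the author's original code: the macro name sumtest, the variable names, the test value x = 2, and the E12. display format are all chosen here for the example.

```sas
/* sumtest is a hypothetical macro name; j is the number of terms */
%macro sumtest(j);
  data _null_;
    /* Test 1: .1 + .2 compared with .3 */
    a = .1 + .2;
    b = .3;
    if a = b then put 'a equals b';
    else put 'a does not equal b';
    put a= 32.30 b= 32.30;     /* rounded by the format */
    put a= hex16. b= hex16.;   /* internal representation */

    /* Test 2: squaring a square root (x = 2 is an arbitrary test value) */
    x = 2;
    r1 = sqrt(x) * sqrt(x);
    r2 = sqrt(x) ** 2;
    put x= hex16. r1= hex16. r2= hex16.;

    /* Test 3: 1 minus j copies of 1/j, generated by the macro loop */
    d = 1 - (%do i = 1 %to &j; + 1/&j %end;);
    put "j=&j " d= e12.;
  run;
%mend sumtest;

%sumtest(3)
%sumtest(10)
```

The HEX16. lines are the ones to compare, since they show the stored value rather than a rounded display. For the third test, whether the difference d is exactly zero depends on the value of j and on the platform's floating-point arithmetic, so it is worth running the macro with several values.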
HOW TO MINIMIZE THE REPRESENTATION PROBLEM?
The first rule of thumb is to avoid
decreasing the storage space of numeric
variables, especially those that may contain
fractional values, from the default length
of 8 bytes. If there are more digits than
there is room allowed for, the numbers are
stored with less than full precision and
cannot be fully represented.
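The effect of a shortened length can be checked with a sketch like the following. The data set name short and the length of 4 bytes are assumptions made for illustration. The value is truncated when it is written to the data set, so the comparison is made after reading it back.

```sas
data short;
  length x 4;       /* shortened from the default of 8 bytes */
  x = .1;
run;

data _null_;
  set short;
  if x = .1 then put 'x equals .1';
  else put 'x no longer equals .1';
  put x= hex16.;    /* shows what survived the shorter length */
run;
```

Running the same two steps with the LENGTH statement removed gives a baseline for comparison.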
Even with the default length of 8 bytes of storage space, the representation problem still exists for some fractional values, as previously indicated. There are several alternatives to account for the representation problem and to minimize unexpected results.
One alternative is to use integers whenever possible. If .1 + .2 is not equal to .3, simply store the values as integers instead, add 1 to 2, and then adjust the decimal place with formatting statements. Using integers is a good solution for financial institutions, where the majority of the numeric values are in dollars and cents. The numeric values can be stored as integers, and the results can be formatted to shift the decimal places.
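A minimal sketch of the integer approach for dollars and cents follows. The variable names and the DOLLAR8.2 display format are assumptions made for illustration. The amounts are held as whole cents, the arithmetic and comparison are done on those integers, and the conversion to dollars is made only for display.

```sas
data _null_;
  /* amounts held as whole cents, so the arithmetic is exact */
  cents1 = 10;
  cents2 = 20;
  total_cents = cents1 + cents2;

  if total_cents = 30 then put 'total is exactly 30 cents';
  else put 'total is not 30 cents';

  /* convert to dollars only when the value is displayed */
  dollars = total_cents / 100;
  put dollars= dollar8.2;
run;
```

The comparison is made on total_cents, not on dollars, because the division by 100 reintroduces a fractional value.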
Other options are to round or to fuzz the numbers before comparing the results of arithmetic operations with the expected values. These options can be used for pharmaceutical research when comparing laboratory values to normal ranges. The ROUND function and the FUZZ function are similar in that both functions add a fuzz factor of 1E-12, that is, .000000000001, to positive values to be rounded or fuzzed, or subtract the same fuzz factor from negative values, to account for representation inequality. The difference between the two functions is that with the ROUND function the number of digits in the returned value can be adjusted through the rounding unit, but with the FUZZ function the returned value is predetermined: the nearest integer, when the argument is within 1E-12 of that integer.
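The two functions can be tried with a sketch like the one below. The rounding unit of .01 and the test values are assumptions chosen for illustration. The first part compares .1 + .2 with .3 before and after rounding, and the second part passes FUZZ a value that lies within 1E-12 of an integer.

```sas
data _null_;
  /* ROUND: compare after rounding to a chosen unit */
  a = .1 + .2;
  if a = .3 then put 'raw comparison: equal';
  else put 'raw comparison: not equal';
  if round(a, .01) = .3 then put 'rounded comparison: equal';
  else put 'rounded comparison: not equal';

  /* FUZZ: b is within 1E-12 of the integer 6 */
  b = 5.9999999999999;
  c = fuzz(b);
  if c = 6 then put 'fuzz(b) returned 6';
  else put 'fuzz(b) returned b unchanged';
run;
```

The rounding unit should be no finer than the precision the application actually needs, for example .01 for values reported to two decimal places.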
SUMMARY
Converting the infinite real number system into a finite set of numbers is necessary because of computer hardware configurations. The SAS System stores numeric values and performs numeric computations in either base 2 or base 16 on most platforms.
Binary or hexadecimal representations are not always the identical twins of their decimal counterparts. It helps to be aware of, and to anticipate, when the evil twins misbehave. Due to different needs in different applications, there is no single rule of thumb to manage the disparity between numbering systems. The most common solutions are to avoid reducing the number of bytes used to store numeric variables with the LENGTH or the ATTRIB statements, to use integers whenever possible, and to round off or to fuzz numbers where appropriate.
SAS is a registered trademark or trademark of
SAS Institute Inc. in the USA and other
countries. ® indicates USA registration.
Other brand and product names are registered
trademarks or trademarks of their respective
companies.
For additional information, contact:
Aileen L. Yam
Corning Besselaar, Inc.
210 Carnegie Center
Princeton, NJ 08540-6233
Tel.: (609) 452-4200