WEEK 4: REAL ARITHMETIC AND COMPUTERS
1. What is a real number?
As you will learn in Analysis, the answer is long-winded. In essence, the set R
includes all rational numbers, and in addition, satisfies the famous Completeness
Axiom:
Every non-empty set of real numbers that is bounded above has a
least upper bound.
Example 1.1. Consider the set
S = { x ∈ R : x^2 ≤ 2 } .
This set is not empty (since 1 ∈ S) and it is bounded above; indeed
∀ x ∈ S, x ≤ 2
(for if x > 2 then x^2 > 4 > 2, so x ∉ S). So 2 is an upper bound for S. By the
Completeness Axiom, S must have a least upper bound. We denote this number by
√2. It can be proved that (√2)^2 = 2.
It is a well-known fact that √2 cannot be expressed as a ratio of two integers,
i.e. √2 is irrational. At present, there is no satisfactory way of storing, or
performing exact calculations with, irrational numbers. In practice, we must be
content with using (rational) approximations of real numbers.
Example 1.2. Define a sequence of rational numbers {x_n}_{n∈N} via
x_0 = 1 ,    x_{n+1} = (1/2) (x_n + 2/x_n) .
It can be shown that this sequence converges very rapidly to √2.
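The rapid convergence is easy to observe numerically. The recurrence is in fact Newton's method applied to the equation x^2 − 2 = 0; the following Python sketch runs a few iterations and prints the error at each step:

```python
# Iterate x_{n+1} = (x_n + 2/x_n)/2 from x_0 = 1 and watch how
# quickly the error |x_n - sqrt(2)| shrinks at each step.
import math

x = 1.0
for n in range(6):
    x = (x + 2 / x) / 2
    print(n + 1, x, abs(x - math.sqrt(2)))
```

Roughly speaking, the number of correct digits doubles at every step, which is characteristic of Newton's method.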
We saw last week how the binary system uses lists of symbols (the symbols
0 or 1) to represent whole numbers. This representation can be used to store a
whole number on a computer. The basic unit of storage on modern computers
is called a bit. Each bit has a location in the computer “memory” and holds
either 0 or 1. So a whole number can be stored by using enough bits, and a rational
number can be stored as a pair of whole numbers (its numerator and denominator).
Therefore we can in principle perform calculations in exact arithmetic. Unfortunately,
in most applications, the lengths of the numerators and denominators would grow very
quickly as the calculation proceeds, taking much time and computer memory.
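This growth can be seen with Python's exact rational type, `fractions.Fraction`. As an illustration, the sketch below reruns the iteration of Example 1.2 in exact arithmetic and reports the size of the denominator at each step:

```python
# Run the recurrence of Example 1.2 in exact rational arithmetic
# and watch the number of digits in the denominator roughly
# double at every step.
from fractions import Fraction

x = Fraction(1)
for n in range(6):
    x = (x + 2 / x) / 2
    print(n + 1, len(str(x.denominator)), "digits in the denominator")
```

After only six steps the denominator already runs to over twenty decimal digits, even though the answer itself is just an approximation of √2.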
Floating-point arithmetic is a practical and efficient alternative to exact rational
arithmetic. The details of this mode of calculation vary from one computer system
to another. Our purpose here is to convey the main ideas— at the risk of some
oversimplification. Essentially, floating-point arithmetic consists of some operations
performed on a certain set of rational numbers.
2. The set of floating-point numbers
First, let us describe the set of numbers. A floating-point number x takes the
form
(2.1)    x = s × m × 2^(e−σ) .
In this expression, s denotes the sign, i.e. ±, m denotes the so-called mantissa, e is
the shifted exponent, and σ is the shift— which is the same for every number. The
main point in this respect is that
every floating-point number occupies the same amount of computer
storage (number of bits), regardless of its actual value.
In the case of “double-precision” floating-point arithmetic, x occupies precisely 64
bits of memory as follows:
  s       m         e
1 bit   52 bits   11 bits
The mantissa is a fraction which takes the binary form
m = 0.m_0 m_1 · · · m_51 (base 2) ∈ [0, 1)
where each mi is either 0 or 1.
The shifted exponent is a whole number that takes the binary form
0 ≤ e = e_10 e_9 · · · e_0 (base 2) ≤ 2^11 − 1 .
In order to allow for a nearly equal range of positive and negative exponents, the
shift is
σ = 2^10 = 1024 .
Hence
−1024 ≤ e − σ ≤ 1023
and so every non-zero floating-point number satisfies the inequalities
2^(−1024) ≤ |x| ≤ 2^1023 .
Attempts to create larger (respectively smaller) non-zero floating-point numbers
will therefore result in a processing error called overflow (respectively underflow).
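These limits can be probed directly in Python, whose `float` is a double-precision floating-point number. The exact thresholds of real hardware differ slightly from the simplified model above (IEEE 754 reserves some exponent values and allows gradual underflow), so treat the numbers below as indicative. Note the asymmetry: overflow raises an error, while underflow silently produces zero.

```python
# Powers of two near the exponent limits of double precision.
big = 2.0 ** 1023        # close to the overflow threshold: fine
print(big)               # about 8.99e307

try:
    2.0 ** 1024          # exceeds the largest representable exponent
except OverflowError as exc:
    print("overflow:", exc)

tiny = 2.0 ** -1074      # smallest positive double (a subnormal)
print(tiny)
print(tiny / 2)          # underflow: the result is rounded to 0.0
```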
The digits that make up the mantissa are called the significant digits. Hence, in
double-precision arithmetic, one uses 52 significant binary digits; this is equivalent
to about 16 decimal digits (since 2^10 ≈ 10^3).
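The three bit fields of a stored number can be inspected with Python's standard `struct` module. Be aware that real IEEE 754 doubles differ slightly from the simplified model above (the shift is 1023 and a leading 1 bit of the mantissa is left implicit), so the sketch below is only an illustration of the general layout:

```python
# Extract the sign, shifted-exponent and mantissa bit fields
# from the 64-bit representation of a Python float.
import struct

def fields(x):
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))  # 64 raw bits
    s = bits >> 63                 # 1 sign bit
    e = (bits >> 52) & 0x7FF       # 11 shifted-exponent bits
    m = bits & ((1 << 52) - 1)     # 52 mantissa bits
    return s, e, m

print(fields(1.0))    # (0, 1023, 0): 1.0 = + 1.0 x 2^(1023 - 1023)
print(fields(-2.0))   # (1, 1024, 0): -2.0 = - 1.0 x 2^(1024 - 1023)
```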
3. The floating-point operations
It should be clear by now that the set of floating-point numbers, i.e. of numbers
that can be written in the form (2.1), is quite small.
We shall denote the set of floating point numbers by F.
Example 3.1. The number 1/10 has the expansion
1/10 = 0.0001100110011001100 . . . (base 2) .
This expansion does not terminate. It follows that 1/10 ∉ F.
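The binary digits of 1/10 can be generated with integer arithmetic alone, by long division in base 2: doubling the remainder at each step emits one binary digit, which makes the repeating pattern visible. A short Python sketch:

```python
# Long division of 1 by 10 in base 2: each step doubles the remainder
# and emits the next binary digit of the expansion of 1/10.
digits = []
r = 1
for _ in range(20):
    r *= 2
    digits.append(r // 10)
    r %= 10
print("0." + "".join(map(str, digits)), "(base 2)")
```

The remainder 2 recurs after four steps, so the block 1100 repeats forever and the expansion never terminates.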
For the purpose of approximating such numbers, the floating-point arithmetic
system includes a rounding function ̺ : Q → F with the following properties.
• ∀ x ∈ F, ̺(x) = x.
• Let x ∈ Q. Let [a, b] be the smallest interval containing x with endpoints
in F. Then ̺(x) equals a or b— whichever is nearest to x. If a and b are
equidistant from x, then ̺(x) is determined by a process that varies with
the particular computer system.
Example 3.2.
̺(1/10) = 0.10000000000000001 (base 10) .
See Van Rossum Appendix B.
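This value can be seen from Python itself, where the literal `0.1` already denotes ̺(1/10): printing it with 17 significant digits recovers exactly the decimal above, and `Fraction` reveals the exact rational number actually stored.

```python
# The literal 0.1 in Python source is stored as rho(1/10),
# the floating-point number nearest to 1/10.
from fractions import Fraction

x = 0.1
print(format(x, ".17g"))  # 17 significant digits: 0.10000000000000001
print(Fraction(x))        # the exact rational value actually stored
```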
The basic operations defined on F are then
• Floating-point addition:
x ⊕ y = ̺(x + y) .
• Floating-point subtraction:
x ⊖ y = ̺(x − y) .
• Floating-point multiplication:
x ⊗ y = ̺(x × y) .
• Floating-point division:
x ⊘ y = ̺(x/y) .
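Because every operation rounds its exact result, familiar algebraic identities can fail in floating-point arithmetic. A classic illustration in Python:

```python
# x (+) y = rho(x + y): every operation rounds its exact result.
s = 0.1 + 0.2
print(s)             # 0.30000000000000004
print(s == 0.3)      # False: rho(0.1 + 0.2) differs from rho(3/10)

# Because of the rounding, addition is no longer associative:
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))   # False
```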