Number Format Conversion

A computer number format is the internal representation of numeric values in
digital computer and calculator hardware and software.[1] Normally, numeric values are stored as
groupings of bits, named for the number of bits that compose them. The encoding between
numerical values and bit patterns is chosen for convenience of the operation of the computer; the bit
format used by the computer's instruction set generally requires conversion for external use such as
printing and display. Different types of processors may have different internal representations of
numerical values. Different conventions are used for integer and real numbers. Most calculations are
carried out with number formats that fit into a processor register, but some software systems allow
representation of arbitrarily large numbers using multiple words of memory.
Binary number representation
Computers represent data in sets of binary digits. The representation is composed of bits, which in
turn are grouped into larger sets such as bytes.
Table 1: Binary to Octal

Binary String   Octal value
000             0
001             1
010             2
011             3
100             4
101             5
110             6
111             7
Table 2: Number of values for a bit string.

Length of Bit String (b)   Number of Possible Values (N)
 1                            2
 2                            4
 3                            8
 4                           16
 5                           32
 6                           64
 7                          128
 8                          256
 9                          512
10                         1024
...
A bit is a binary digit that represents one of two states. The concept of a bit can be understood as a
value of either 1 or 0, on or off, yes or no, true or false, or encoded by a switch or toggle of some
kind.
While a single bit, on its own, is able to represent only two values, a string of bits may be used to
represent larger values. For example, a string of three bits can represent up to eight distinct values
as illustrated in Table 1.
As the number of bits composing a string increases, the number of possible 0 and 1 combinations
increases exponentially. A single bit allows only two values, two bits combined can make four
separate values, and so on: the number of possible combinations doubles with each binary digit
added, as illustrated in Table 2.
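As a small illustration (a sketch, not part of the original article), the count of representable values for each bit-string length can be computed directly:

# Each added bit doubles the number of representable values: N = 2**b.
for b in range(1, 11):
    print(f"{b:2d} bits -> {2 ** b:4d} possible values")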
Groupings with a specific number of bits are used to represent varying things and have specific
names.
A byte is a bit string containing the number of bits needed to represent a character. On most modern
computers, this is an eight-bit string. Because the definition of a byte is related to the number of bits
composing a character, some older computers have used a different bit length for their byte.[2] In
many computer architectures, the byte is used to address specific areas of memory. For example,
even though 64-bit processors may address memory sixty-four bits at a time, they may still split that
memory into eight-bit pieces. This is called byte-addressable memory. Historically, many CPUs read
data in some multiple of eight bits.[3] Because the byte size of eight bits is so common, but the
definition is not standardized, the term octet is sometimes used to explicitly describe an eight-bit
sequence.
A nibble (sometimes nybble) is a number composed of four bits.[4] Being a half-byte, the nibble was
named as a play on words. A person may need several nibbles for one bite from something;
similarly, a nybble is a part of a byte. Because four bits allow for sixteen values, a nibble is
sometimes known as a hexadecimal digit.[5]
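For illustration, a minimal Python sketch (not from the original text) splitting a byte into its two nibbles with a shift and a mask:

value = 0xA7                 # one byte: 1010 0111 in binary
high = (value >> 4) & 0xF    # high nibble: 0xA (1010)
low = value & 0xF            # low nibble: 0x7 (0111)
print(f"{value:08b} -> high nibble {high:X}, low nibble {low:X}")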
Octal and hex number display
See also: Base64
Octal and hex are convenient ways to represent binary numbers, as used by computers. Computer
engineers often need to write out binary quantities, but in practice writing out a binary number such
as 1001001101010001 is tedious and prone to errors. Therefore, binary quantities are written in a
base-8 ("octal") or, much more commonly, a base-16 ("hexadecimal" or "hex") number format. In
the decimal system, there are 10 digits, 0 through 9, which combine to form numbers. In an octal
system, there are only 8 digits, 0 through 7; that is, the value of an octal "10" is the same as a
decimal "8", an octal "20" is a decimal "16", and so on. In a hexadecimal system, there are 16 digits,
0 through 9 followed, by convention, by A through F; that is, a hex "10" is the same as a decimal
"16" and a hex "20" is the same as a decimal "32". An example and comparison of numbers in
different bases is shown in Table 3 below.
When typing numbers, formatting characters are used to describe the number system, for example
0000_0000B or 0b0000_0000 for binary and 0F8H or 0xf8 for hexadecimal numbers.
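Python is one concrete example of such conventions (other languages differ): the 0b and 0x prefixes mark binary and hexadecimal literals, and underscores may separate digit groups:

a = 0b1111_1000   # binary literal, grouped in nibbles
b = 0xF8          # hexadecimal literal
print(a == b == 248)   # True: binary, hex, and decimal name the same value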
Converting between bases
Table 3: Comparison of Values in Different Bases

Decimal Value   Binary Value   Octal Value   Hexadecimal Value
 0              000000         00            00
 1              000001         01            01
 2              000010         02            02
 3              000011         03            03
 4              000100         04            04
 5              000101         05            05
 6              000110         06            06
 7              000111         07            07
 8              001000         10            08
 9              001001         11            09
10              001010         12            0A
11              001011         13            0B
12              001100         14            0C
13              001101         15            0D
14              001110         16            0E
15              001111         17            0F
Main article: Positional notation (Base conversion)
Each of these number systems is a positional system, but while decimal weights are powers of 10,
the octal weights are powers of 8 and the hex weights are powers of 16. To convert from hex or octal
to decimal, for each digit one multiplies the value of the digit by the value of its position and then
adds the results. For example:

octal 756
= (7 × 8^2) + (5 × 8^1) + (6 × 8^0)
= (7 × 64) + (5 × 8) + (6 × 1)
= 448 + 40 + 6 = 494 decimal
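The same conversion can be checked in Python (a sketch using the built-in int and format functions; it is not part of the original article):

n = int("756", 8)                      # parse "756" as octal -> 494
print(n)                               # 494
print(format(n, "o"), format(n, "x"))  # 756 1ee: back to octal and to hex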
Representing fractions in binary
Fixed-point numbers
Fixed-point formatting can be useful to represent fractions in binary.
The number of bits needed for the precision and range desired must be chosen to store the
fractional and integer parts of a number. For instance, using a 32-bit format, 16 bits may be used for
the integer and 16 for the fraction.
The eight's bit is followed by the four's bit, then the two's bit, then the one's bit. The fractional bits
continue the pattern set by the integer bits. The next bit is the half's bit, then the quarter's bit, then
the ⅛'s bit, and so on. For example:
                integer bits      fractional bits
0.500 = 1⁄2   = 00000000 00000000.10000000 00000000
1.250 = 1 1⁄4 = 00000000 00000001.01000000 00000000
7.375 = 7 3⁄8 = 00000000 00000111.01100000 00000000
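A minimal Python sketch of this encoding, assuming the 16.16 split described above (16 integer bits, 16 fractional bits):

FRACTION_BITS = 16                    # 16.16 fixed point

def to_fixed(x: float) -> int:
    # Scale by 2**16 and round to the nearest representable value.
    return round(x * (1 << FRACTION_BITS))

def from_fixed(n: int) -> float:
    return n / (1 << FRACTION_BITS)

print(bin(to_fixed(7.375)))           # 0b1110110000000000000: 111.0110... shifted left 16
print(from_fixed(to_fixed(7.375)))    # 7.375: exactly representable
print(from_fixed(to_fixed(0.2)))      # 0.1999969482421875: only approximate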
This form of encoding cannot represent some values exactly in binary. For example, for the fraction
1/5 (0.2 in decimal), the closest approximations would be as follows:
13107 / 65536 = 00000000 00000000.00110011 00110011 = 0.1999969... in decimal
13108 / 65536 = 00000000 00000000.00110011 00110100 = 0.2000122... in decimal
Even if more digits are used, an exact representation is impossible. The number 1/3, written in
decimal as 0.333333333..., continues indefinitely. If prematurely terminated, the value would not
represent 1/3 precisely.
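The truncation is visible numerically in the same 16.16 sketch used above:

approx = round((1 / 3) * 65536) / 65536   # 1/3 held to 16 fractional bits
print(approx)                             # 0.3333282470703125, not exactly 1/3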
Floating-point numbers
While both unsigned and signed integers are used in digital systems, even a 32-bit integer is not
enough to cover the full range of numbers a calculator can handle, and that is without even
considering fractions. To approximate the greater range and precision of real numbers, we have to
abandon signed integers and fixed-point numbers and go to a "floating-point" format.
In the decimal system, we are familiar with floating-point numbers of the form (scientific notation):
1.1030402 × 10^5 = 1.1030402 × 100000 = 110304.02
or, more compactly:
1.1030402E5
which means "1.1030402 times 1 followed by 5 zeroes". We have a certain numeric value
(1.1030402) known as a "significand", multiplied by a power of 10 (E5, meaning 10^5 or
100,000), known as an "exponent". If we have a negative exponent, that means the number
is multiplied by a 1 that many places to the right of the decimal point. For example:
2.3434E-6 = 2.3434 × 10^-6 = 2.3434 × 0.000001 = 0.0000023434
The advantage of this scheme is that by using the exponent we can get a much wider
range of numbers, even if the number of digits in the significand, or the "numeric
precision", is much smaller than the range. Similar binary floating-point formats can be
defined for computers. There are a number of such schemes; the most popular has
been defined by the Institute of Electrical and Electronics Engineers (IEEE). The IEEE
754-2008 standard specification defines a 64-bit floating-point format with:

- an 11-bit binary exponent, using "excess-1023" format. Excess-1023 means the
  exponent appears as an unsigned binary integer from 0 to 2047; subtracting 1023
  gives the actual signed value.
- a 52-bit significand, also an unsigned binary number, defining a fractional value with
  a leading implied "1".
- a sign bit, giving the sign of the number.
Let's see what this format looks like by showing how such a number would be stored in
8 bytes of memory:
byte 0: S   x10 x9  x8  x7  x6  x5  x4
byte 1: x3  x2  x1  x0  m51 m50 m49 m48
byte 2: m47 m46 m45 m44 m43 m42 m41 m40
byte 3: m39 m38 m37 m36 m35 m34 m33 m32
byte 4: m31 m30 m29 m28 m27 m26 m25 m24
byte 5: m23 m22 m21 m20 m19 m18 m17 m16
byte 6: m15 m14 m13 m12 m11 m10 m9  m8
byte 7: m7  m6  m5  m4  m3  m2  m1  m0
where "S" denotes the sign bit, "x" denotes an exponent bit, and "m" denotes a
significand bit. Once the bits here have been extracted, they are converted with the
computation:
<sign> × (1 + <fractional significand>) × 2^(<exponent> − 1023)
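A Python sketch (an illustration, not from the original) that pulls these fields out of a 64-bit double with the standard struct module and rebuilds the value:

import struct

def decode_double(x: float):
    # Reinterpret the double's eight bytes as one unsigned 64-bit integer.
    bits, = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                     # 1 bit
    exponent = (bits >> 52) & 0x7FF       # 11 bits, excess-1023
    mantissa = bits & ((1 << 52) - 1)     # 52 bits, fractional significand
    # Rebuild the value (valid for normal numbers only):
    value = (-1) ** sign * (1 + mantissa / 2 ** 52) * 2 ** (exponent - 1023)
    return sign, exponent, value

print(decode_double(7.375))   # (0, 1025, 7.375): exponent 1025 - 1023 = 2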
This scheme provides numbers valid out to about 15 decimal digits, with the
following range of numbers:
           maximum                   minimum
positive   1.797693134862231E+308    4.940656458412465E-324
negative   -4.940656458412465E-324   -1.797693134862231E+308
The specification also defines several special values that are not defined numbers,
and are known as NaNs, for "Not A Number". These are used by programs to
designate invalid operations and the like.
Some programs also use 32-bit floating-point numbers. The most common scheme
uses a 23-bit significand with a sign bit, plus an 8-bit exponent in "excess-127"
format, giving seven valid decimal digits.
byte 0: S   x7  x6  x5  x4  x3  x2  x1
byte 1: x0  m22 m21 m20 m19 m18 m17 m16
byte 2: m15 m14 m13 m12 m11 m10 m9  m8
byte 3: m7  m6  m5  m4  m3  m2  m1  m0
The bits are converted to a numeric value with the computation:
<sign> × (1 + <fractional significand>) × 2^(<exponent> − 127)
leading to the following range of numbers:
           maximum          minimum
positive   3.402823E+38     2.802597E-45
negative   -2.802597E-45    -3.402823E+38
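The same struct technique demonstrates the reduced precision of singles; a sketch that round-trips a value through 32 bits:

import struct

x = 0.1
single, = struct.unpack(">f", struct.pack(">f", x))  # force x through 32 bits
print(single)   # 0.10000000149011612: only about seven digits survive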
Such floating-point numbers are known as "reals" or "floats" in general, but with
a number of variations:
A 32-bit float value is sometimes called a "real32" or a "single", meaning
"single-precision floating-point value".
A 64-bit float is sometimes called a "real64" or a "double", meaning
"double-precision floating-point value".
The relation between numbers and bit patterns is chosen for convenience in
computer manipulation; eight bytes stored in computer memory may represent
a 64-bit real, two 32-bit reals, four 16-bit signed or unsigned integers, or some
other kind of data that fits into eight bytes. The only difference is how the
computer interprets them. If the computer stored four unsigned integers and
then read them back from memory as a 64-bit real, it almost always would be
a perfectly valid real number, though it would be junk data.
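A sketch of that reinterpretation using Python's struct module, reading the same eight bytes two different ways:

import struct

raw = struct.pack(">4H", 1, 2, 3, 4)   # four 16-bit unsigned integers -> 8 bytes
as_real, = struct.unpack(">d", raw)    # the same bytes read as a 64-bit real
print(as_real)                         # a valid double, but meaningless junk data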
Only a finite range of real numbers can be represented with a given number of
bits. Arithmetic operations can overflow or underflow, producing a value too
large or too small to be represented.
The representation has a limited precision. For example, only about 15 decimal
digits can be represented with a 64-bit real. If a very small floating-point number is
added to a large one, the result is just the large one. The small number was too
small to even show up in 15 or 16 digits of resolution, and the computer
effectively discards it. Analyzing the effect of limited precision is a well-studied
problem. Estimates of the magnitude of round-off errors and methods to limit
their effect on large calculations are part of any large computation project. The
precision limit is different from the range limit, as it affects the significand, not
the exponent.
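This absorption of small values is easy to demonstrate (a minimal sketch):

big = 1.0e16
print(big + 1.0 == big)   # True: 1.0 is below the ~15-digit resolution of 1.0e16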
The significand is a binary fraction that doesn't necessarily perfectly match a
decimal fraction. In many cases a sum of reciprocal powers of 2 does not
match a specific decimal fraction, and the results of computations will be
slightly off. For example, the decimal fraction "0.1" is equivalent to an infinitely
repeating binary fraction: 0.000110011001100110011..., where the group "0011"
repeats indefinitely.
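The familiar consequence, in Python or any system using IEEE 754 arithmetic:

print(0.1 + 0.2)          # 0.30000000000000004
print(0.1 + 0.2 == 0.3)   # False: none of these values is exactly representable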