Representation of Data Within the Computer

Brian Bramer, DeMontfort University, UK (bb@dmu.ac.uk)
Web based version: http://www.cse.dmu.ac.uk/~bb/Teaching/ComputerSystems/RepresentationOfData/RepresentationOfData.htm

Contents
1 Decimal and binary integer numbers
2 Binary Addition
3 Signed Binary Numbers
4 Overflow
5 Hexadecimal Numbers
6 Conversion between the binary and hexadecimal number systems
7 Conversion of Decimal Numbers to Binary
8 Conversion of binary numbers to decimal
9 Conversion of decimal numbers to hexadecimal
10 Conversion of hexadecimal numbers to decimal
11 Fixed Point Real Numbers
12 Floating Point Real Numbers
13 ASCII Character Code

When humans use numeric data they usually represent the numbers using the decimal system, i.e. base 10. Working in decimal numbers requires the ability to differentiate between ten different states, the digits 0 through 9. For the human brain this is straightforward, and may even be extended to take account of alphabetic information (i.e. the characters a to z and A to Z).

Computer systems are built from large numbers of similar electronic circuits. Although it is possible to build electronic circuits which can store and manipulate ten states, it is easier and cheaper to build electronic switches that may be in one of two states, either ON or OFF. Such circuits can therefore be used to represent binary data (base 2) with, for example, binary 1 and 0 being represented by the ON and OFF states respectively.

Computer systems internally represent all information, data and instructions, in binary form, with conversion between binary and human readable forms for input and output. When working in machine code or assembly language it is sometimes necessary to use binary or some similar number system. Binary numbers tend to be very long and hence it is easy to make mistakes when dealing with such data.
In such situations the hexadecimal (base 16) number system is commonly used (it is very easy to convert between binary and hexadecimal).

1 Decimal and binary integer numbers

                 number base   possible states
Decimal digit    10            0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Binary bit       2             0 or 1

The above table shows that a decimal digit can represent one of ten states, 0 through 9, and a binary bit (a single binary digit is called a bit) can represent two states, 0 or 1. It is possible, however, to represent more states by joining a sequence of digits or bits together, and in such a case it is assumed that the least significant digit or bit is the rightmost and the most significant is the leftmost. The bits or digits are generally numbered starting with the least significant from 0.

1.1 An eight digit decimal number

digit          7     6     5     4     3     2     1     0
digit value    10^7  10^6  10^5  10^4  10^3  10^2  10^1  10^0

In the decimal system the least significant (rightmost) digit represents units (10^0), the next tens (10^1), the next hundreds (10^2), etc., therefore the above eight digit number can represent values in the range 0 (all digits 0) to 99999999 (all digits 9).

1.2 An eight-bit binary number

bit            7    6    5    4    3    2    1    0
bit value      2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0

In the binary system the least significant (rightmost) bit represents units (2^0), the next twos (2^1), the next fours (2^2), etc., therefore the above 8-bit binary number can represent values in the range 0 (all bits 0) to 11111111 binary (all bits 1). It is possible to convert between number bases, and 11111111 binary is equivalent to 255 decimal. Larger values can be represented by more bits, for example a 16-bit binary number can represent 0 to 65535 decimal, and a 32-bit number 0 to 4294967295.

Within a computer system a memory word is built up from a number of bits. Typical word sizes are eight bits (usually called a byte), 16 bits, 32 bits or 64 bits.
In practice the majority of modern computer systems use a memory based on bytes of storage, with sequences of bytes being used to store 16-, 32- or 64-bit numeric data.

2 Binary Addition

The following truth tables show all the possible combinations of the addition of:
(a) two bits A and B
(b) two bits A and B plus a carry in from a previous addition.
In both cases the addition results in a SUM and a carry out.

(a) A + B                  (b) A + B + carry in

A  B    carry  SUM         A  B  carry in    carry out  SUM
0  0    0      0           0  0  0           0          0
0  1    0      1           0  0  1           0          1
1  0    0      1           0  1  0           0          1
1  1    1      0           0  1  1           1          0
                           1  0  0           0          1
                           1  0  1           1          0
                           1  1  0           1          0
                           1  1  1           1          1

The following are examples of decimal and binary addition:

decimal   binary       decimal   binary       decimal   binary
    5       101           10      1010           27      11011
  + 2     +  10         +  9    + 1001         + 15    +  1111
  ---     -----         ----    ------         ----    -------
    7       111           19     10011           42     101010

The rightmost bits are added using the left hand table above. This results in a SUM and a carry bit which is carried out to be added into the addition of the next two bits (using the right hand table above). This addition then results in a sum and a carry out, etc.

The majority of modern computer systems store numeric values in sequences of bytes, i.e. 8-bit words of storage. A single byte is limited to representing a number in the range 0 to 255 decimal. If the addition of two bytes results in a carry out of bit 7, the result is greater than 255, and an error has occurred. When carrying out integer arithmetic on a computer system care must be taken to ensure that the results will fit the word size being used (generally 16 or 32 bits are used for integer number calculations).

3 Signed Binary Numbers

Mathematical and scientific calculations require the storage of negative as well as positive integer numbers. To represent a positive or negative number using the binary system one bit, usually the leftmost bit, is reserved for the sign. A negative number can then be represented in a number of forms, e.g.
to represent -10 decimal as an eight bit signed binary number:

(a) sign-true magnitude   10001010
(b) ones complement       11110101
(c) twos complement       11110110

Sign-True Magnitude Form. The leftmost bit holds the sign of the number, 0 for positive and 1 for negative, and the other seven bits represent the magnitude. In example (a) 0001010 is the magnitude, equivalent to 10 decimal, and the leftmost bit is 1 to indicate that the value is negative. This system is not commonly used in computer systems because it requires separate addition and subtraction circuits.

Ones Complement Form. To obtain the negative of a number each bit of the positive binary value is complemented, i.e. 0s are replaced with 1s and 1s with 0s. In example (b) +10 decimal, 00001010 binary, is complemented to form -10 decimal, i.e. 11110101 binary. This form is used in some computer systems, e.g. CDC 7600 series, but it has the problem that 0 can take two forms, +0 (00000000) or -0 (11111111).

Twos Complement Form. To obtain the negative value of a number the ones complement is obtained, and then 1 is added, i.e. in (c) above the value of +10 decimal, 00001010, is ones complemented to obtain 11110101, and then 1 is added to obtain 11110110 (-10 decimal).

The advantage of complemented numbers is that separate addition and subtraction circuits are not required. To subtract a number, its complement is formed (a very easy operation), and the result added (using the normal adder circuits) to the other number. The majority of modern computer systems use twos complement form to represent signed binary numbers.

In practice signed numbers are used for normal arithmetic calculations, and unsigned numbers for addresses, e.g. in assembly language programs. The range that can be represented by signed and unsigned 8-bit, 16-bit and 32-bit binary numbers is shown in Chapter 1 Table 1.1.

4 Overflow

Overflow occurs if the number of bits is too small to store the result of an arithmetic operation.
For example, when using 8-bit signed numbers the binary addition 01101110 + 00101101 (decimal: 110 + 45) would result in the value 10011011 binary. The true sum, 155, is too large for a signed 8-bit number, and because the leftmost (sign) bit of the result is 1 the addition of the two positive numbers has produced the incorrect negative value -101 decimal.

After the computer hardware has carried out an arithmetic operation it sets condition code bits that indicate if:
    the result was 0;
    the result was negative;
    a carry resulted from the operation; or
    an overflow occurred during the operation.

The condition code bits can be used in program control structures and for checking for error conditions. Many high-level language run-time systems automatically check for overflow errors, and special instructions can be used by assembly language programs to test the condition code bits.

5 Hexadecimal Numbers

When working in assembly languages it is often necessary to specify memory addresses and bit patterns. To do this using binary numbers would be cumbersome and error prone, e.g. to represent a 16-bit binary number sixteen 0s and 1s would have to be entered.
In practice, the hexadecimal (base 16) number system is commonly used:
    it is a very concise way to represent numbers (each hexadecimal digit represents four binary bits); and
    it is easy to convert between binary and hexadecimal.

decimal  hexadecimal  binary       decimal  hexadecimal  binary
0        0            0000         16       10           00010000
1        1            0001         17       11           00010001
2        2            0010         18       12           00010010
3        3            0011         19       13           00010011
4        4            0100         20       14           00010100
5        5            0101         21       15           00010101
6        6            0110         22       16           00010110
7        7            0111         23       17           00010111
8        8            1000         24       18           00011000
9        9            1001         25       19           00011001
10       A            1010         26       1A           00011010
11       B            1011         27       1B           00011011
12       C            1100         28       1C           00011100
13       D            1101         29       1D           00011101
14       E            1110         30       1E           00011110
15       F            1111         31       1F           00011111

Table 1: Decimal, hexadecimal and binary numbers

6 Conversion between the binary and hexadecimal number systems

To convert a binary number to hexadecimal:
1. working from the least significant (rightmost) bit, split the binary number up into groups of four bits;
2. using Table 1, convert each group of four bits into the equivalent hexadecimal digit.

For example:
    0001001110011110 splits into 0001 0011 1001 1110 = 139E hexadecimal

To convert from hexadecimal to binary, replace each hexadecimal digit with the equivalent four bit binary value.

7 Conversion of Decimal Numbers to Binary

To convert a positive decimal integer the following algorithm starts by generating the least significant binary bit, then the next, etc.:

    LOOP
        next binary bit = remainder of DECIMAL_VALUE/2
        DECIMAL_VALUE = DECIMAL_VALUE/2 (ignoring remainder)
    UNTIL DECIMAL_VALUE=0

e.g. convert decimal 38 to binary:

                                                result
    38/2 = 19  remainder 0  gives binary 0           0
    19/2 =  9  remainder 1  gives binary 1          10
     9/2 =  4  remainder 1  gives binary 1         110
     4/2 =  2  remainder 0  gives binary 0        0110
     2/2 =  1  remainder 0  gives binary 0       00110
     1/2 =  0  remainder 1  gives binary 1      100110

To obtain the binary equivalent of a negative decimal number, convert the absolute value to binary then take the twos complement.
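The repeated-division algorithm of section 7, together with the twos complement rule for negative values, might be sketched in Python as follows. This is only an illustration: the function names `to_binary` and `twos_complement` and the default width of 8 bits are assumptions made for the example, not part of the original notes.

```python
def to_binary(value: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if value < 0:
        raise ValueError("value must be non-negative")
    bits = ""
    while True:
        value, remainder = divmod(value, 2)
        bits = str(remainder) + bits  # each new bit is more significant than the last
        if value == 0:
            return bits


def twos_complement(value: int, width: int = 8) -> str:
    """Return the width-bit twos complement pattern of a signed integer."""
    if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
        raise ValueError("value does not fit in the given width")
    magnitude = to_binary(abs(value)).zfill(width)
    if value >= 0:
        return magnitude
    # Ones complement: swap every 0 and 1 ...
    inverted = "".join("1" if bit == "0" else "0" for bit in magnitude)
    # ... then add 1 to obtain the twos complement.
    return to_binary(int(inverted, 2) + 1).zfill(width)


print(to_binary(38))         # 100110
print(twos_complement(-10))  # 11110110
```

The two printed values match the worked example for decimal 38 and the 8-bit pattern given for -10 in section 3.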
8 Conversion of binary numbers to decimal

The following algorithm converts a binary number into decimal:

    DECIMAL_VALUE=0
    LOOP starting with the most significant binary bit
        BIT_VALUE = value of current binary bit
        DECIMAL_VALUE = DECIMAL_VALUE*2 + BIT_VALUE
    UNTIL current bit is the least significant

For example, convert 100110 binary (remember the least significant bit is bit 0):

    bits processed   5 4 3 2 1 0 (in that order)
    DECIMAL_VALUE    ((((1*2 + 0)*2 + 0)*2 + 1)*2 + 1)*2 + 0 = 38

9 Conversion of decimal numbers to hexadecimal

The following algorithm generates the least significant (rightmost) hexadecimal digit, then the next digit, etc.:

    LOOP
        REMAINDER = remainder of DECIMAL_VALUE/16
        next hexadecimal digit = hexadecimal equivalent of REMAINDER
        DECIMAL_VALUE = DECIMAL_VALUE/16 (ignoring remainder)
    UNTIL DECIMAL_VALUE=0

e.g. convert 1567 to hexadecimal:

                                                      result
    1567/16 = 97  remainder 15  gives hexadecimal F        F
      97/16 =  6  remainder  1  gives hexadecimal 1       1F
       6/16 =  0  remainder  6  gives hexadecimal 6      61F

10 Conversion of hexadecimal numbers to decimal

    DECIMAL_VALUE=0
    LOOP starting with the most significant hexadecimal digit
        DIGIT_VALUE = decimal value of current hexadecimal digit
        DECIMAL_VALUE = DECIMAL_VALUE*16 + DIGIT_VALUE
    UNTIL current hexadecimal digit is the least significant

For example, convert 61F hexadecimal to decimal:

    hex digits processed   2 1 0 (in that order)
    DECIMAL_VALUE          ((6*16) + 1)*16 + 15 = 1567

11 Fixed Point Real Numbers

So far only integer numbers have been considered. Such numbers are useful when calculations on whole number values are required, e.g. for loop control in programs. In practice, however, it is necessary to be able to represent fractional components of numbers as well. These are called real numbers and one means by which these may be represented is in fixed point form.
The following shows a 16-bit binary value in which the whole number part (with sign) is stored in eight bits (bits 8 to 15) and the fractional component in eight bits (bits 0 to 7):

bit     15    14   13   12   11   10   9    8    7     6     5     4     3     2     1     0
value   Sign  2^6  2^5  2^4  2^3  2^2  2^1  2^0  2^-1  2^-2  2^-3  2^-4  2^-5  2^-6  2^-7  2^-8

For example decimal 10.75 would be 1010.11 binary (8 + 2 + 0.5 + 0.25). The major limitations of this system of real number representation are that:
1. The maximum absolute size of numbers is limited by the number of bits assigned to hold the whole number part (as with a normal binary integer number).
2. When dealing with small fractional components accuracy is lost, and very small values cannot be represented at all, i.e. the smallest non-zero value that can be represented by the above fixed point number is 0.00390625 (2^-8).

In practice these restrictions mean that it is not worthwhile providing the extra software or hardware within the computer system to process fixed point numbers.

12 Floating Point Real Numbers

In many scientific and engineering applications very small or very large numbers have to be represented, e.g. from the sizes of atomic particles to intergalactic distances. In the floating point number system the real value is represented by a signed fractional component called the mantissa and a signed exponent. For example, decimal floating point numbers (using base 10) can be represented:

    mantissa * 10^exponent    where 0.1 <= |mantissa| < 1.0

To maintain accuracy the absolute value of the mantissa is maintained within the range shown (this process is called normalisation), e.g. 6520000.0 would be 0.652*10^7 and -0.00000000652 would be -0.652*10^-8. In practice many printers cannot print superscripts so the above examples would be printed as follows:

    6520000.0 as 0.652E7 and -0.00000000652 as -0.652E-8

where the E indicates an exponent of 10.
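The decimal normalisation described above can be tried out with a short Python sketch. It uses the standard `decimal` module so that the arithmetic stays in base 10. The function name `normalise` and the choice to return (0, 0) for zero are assumptions made for this illustration, not part of the original notes.

```python
from decimal import Decimal


def normalise(text: str) -> tuple[Decimal, int]:
    """Split a decimal literal into (mantissa, exponent) with 0.1 <= |mantissa| < 1."""
    value = Decimal(text)
    if value == 0:
        # Zero cannot be normalised; (0, 0) is simply a convention chosen here.
        return Decimal(0), 0
    # adjusted() gives the power of ten of the leading digit, so adding 1
    # moves the decimal point to just in front of that digit.
    exponent = value.adjusted() + 1
    mantissa = value.scaleb(-exponent).normalize()  # normalize() drops trailing zeros
    return mantissa, exponent


for literal in ("6520000.0", "-0.00000000652"):
    mantissa, exponent = normalise(literal)
    print(f"{literal} as {mantissa}E{exponent}")
# 6520000.0 as 0.652E7
# -0.00000000652 as -0.652E-8
```

Real floating point hardware works in binary rather than decimal, so this sketch illustrates only the normalisation idea, not how a machine stores the value.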
Within computer systems the fractional component is held as a binary fraction and the exponent is a power of 2 (or possibly 16). A typical system may store each floating point number in 32 bits with 24 bits to hold the signed mantissa and 8 bits for the signed exponent. In this case the accuracy of the mantissa is 23 binary bits (which is equivalent to 6 or 7 decimal figures of accuracy), and the range of the exponent would be -128 to +127. Greater accuracy can be obtained by using 64-bit storage in which 53 bits may be used to store the signed mantissa (giving 15 to 17 decimal figures of accuracy) and 11 bits for the exponent.

Floating point calculations can be carried out using floating point co-processor chips, or emulated in software that uses the integer arithmetic operations of a computer. The advantage of floating point hardware is that it can be several orders of magnitude faster than software emulation, but it requires more complex and expensive hardware.

13 ASCII Character Code

Table 2 lists the ASCII character codes (American Standard Code for Information Interchange), with the columns being the decimal value, the hexadecimal value, then the corresponding character. ASCII is the most widely used character code for data transmission between computers, terminals and printers.

As with all information within the computer system, characters are represented by binary patterns. In the ASCII code each character is represented by a seven bit code that is stored one character per byte (with bit 7 set to 0 or used as a parity check). The characters below 32 decimal (20 hexadecimal) are non-printing control characters. These are used to control the action of printers, display screens, communications systems, etc.
Important control characters are:

    NUL   null: no action (used as a fill or delay character)
    BEL   bell: rings the keyboard bell or buzzer
    BS    backspace: move back one character width
    HT    horizontal tabulate: move horizontally to next tabulate position
    LF    line feed: move page vertically one character height
    FF    form feed: new page on printer, clear display screen
    CR    carriage return: move to start of current line
    ESC   escape: used in many systems as a program control character
    SP    space: move horizontally by one character width

For example, to move a printer or a display screen to a new line position the characters CR (carriage return) then LF (line feed) will be output. In addition some of the printable characters will depend upon the printer font being used.

It is worth noting that the ASCII codes for the numeric characters 0 to 9, and alphabetic characters A to Z and a to z, are arranged in ascending numerical order. This property can be used for:
1. Testing if a character is within a range, e.g. in the range A to Z.
2. The conversion of numeric decimal data, entered at a keyboard, into internal binary form.

Do not confuse the code for a numeric character with the equivalent numeric binary value, e.g. the code for the character 1 is 31 hexadecimal (49 decimal), not 1. When a number composed of several digits is read from a keyboard the character codes are read, turned into the equivalent binary numeric value and then added to any previous total. The following algorithm reads a decimal number from a keyboard (until a non-digit is entered):

    NUMBER=0
    READ(character)
    LOOP WHILE character is in the range '0' to '9'
        DIGIT_VALUE = character - '0'
        NUMBER = NUMBER*10 + DIGIT_VALUE
        READ(character)
    END LOOP

In the majority of programming languages a character code value is specified by enclosing it in quote marks. In the above algorithm characters are read from the keyboard until a non-digit character is hit.
If the character is a digit, say 7 was hit, the ASCII code for 0 is subtracted from it to get the equivalent numeric value DIGIT_VALUE, i.e. in this case 30 hexadecimal (the code for '0') will be subtracted from 37 hexadecimal (the code for '7') to give DIGIT_VALUE=7. The NUMBER entered so far is then multiplied by ten and the current DIGIT_VALUE added.

dec hex char    dec hex char    dec hex char    dec hex char
  0  00  NUL     32  20  SP      64  40  @       96  60  `
  1  01  SOH     33  21  !       65  41  A       97  61  a
  2  02  STX     34  22  "       66  42  B       98  62  b
  3  03  ETX     35  23  #       67  43  C       99  63  c
  4  04  EOT     36  24  $       68  44  D      100  64  d
  5  05  ENQ     37  25  %       69  45  E      101  65  e
  6  06  ACK     38  26  &       70  46  F      102  66  f
  7  07  BEL     39  27  '       71  47  G      103  67  g
  8  08  BS      40  28  (       72  48  H      104  68  h
  9  09  HT      41  29  )       73  49  I      105  69  i
 10  0A  LF      42  2A  *       74  4A  J      106  6A  j
 11  0B  VT      43  2B  +       75  4B  K      107  6B  k
 12  0C  FF      44  2C  ,       76  4C  L      108  6C  l
 13  0D  CR      45  2D  -       77  4D  M      109  6D  m
 14  0E  SO      46  2E  .       78  4E  N      110  6E  n
 15  0F  SI      47  2F  /       79  4F  O      111  6F  o
 16  10  DLE     48  30  0       80  50  P      112  70  p
 17  11  DC1     49  31  1       81  51  Q      113  71  q
 18  12  DC2     50  32  2       82  52  R      114  72  r
 19  13  DC3     51  33  3       83  53  S      115  73  s
 20  14  DC4     52  34  4       84  54  T      116  74  t
 21  15  NAK     53  35  5       85  55  U      117  75  u
 22  16  SYN     54  36  6       86  56  V      118  76  v
 23  17  ETB     55  37  7       87  57  W      119  77  w
 24  18  CAN     56  38  8       88  58  X      120  78  x
 25  19  EM      57  39  9       89  59  Y      121  79  y
 26  1A  SUB     58  3A  :       90  5A  Z      122  7A  z
 27  1B  ESC     59  3B  ;       91  5B  [      123  7B  {
 28  1C  FS      60  3C  <       92  5C  \      124  7C  |
 29  1D  GS      61  3D  =       93  5D  ]      125  7D  }
 30  1E  RS      62  3E  >       94  5E  ^      126  7E  ~
 31  1F  US      63  3F  ?       95  5F  _      127  7F  DEL

Table 2: The ASCII Character Codes: columns are decimal and hexadecimal numeric character code value followed by the character

When character information is transmitted over a noisy communications channel a parity bit can replace bit 7 (which is not used in the ASCII code) or be added to make a total character length of 9 bits (for more details of parity checking see the Problem for Chapter 12).
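Two ideas from this section, the conversion of digit characters into a number and the use of bit 7 as a parity bit, might be sketched in Python as follows. The function names `read_decimal` and `add_even_parity` are hypothetical, the input is taken from a string rather than a keyboard, and even parity is assumed purely for illustration (the notes do not say which parity convention is used).

```python
def read_decimal(text: str) -> int:
    """Accumulate the leading digit characters of text into an integer."""
    number = 0
    for character in text:
        if not "0" <= character <= "9":
            break                                # stop at the first non-digit
        digit_value = ord(character) - ord("0")  # e.g. 0x37 - 0x30 = 7
        number = number * 10 + digit_value
    return number


def add_even_parity(code: int) -> int:
    """Place an even-parity bit in bit 7 of a 7-bit ASCII code (assumed convention)."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit code")
    ones = bin(code).count("1")
    # Set bit 7 only when the count of 1 bits is odd, making the total even.
    return code | ((ones % 2) << 7)


print(read_decimal("1567\n"))             # 1567
print(hex(add_even_parity(ord("A"))))     # 0x41 (two 1 bits, parity bit stays 0)
print(hex(add_even_parity(ord("C"))))     # 0xc3 (three 1 bits, parity bit set)
```

A receiver using the same convention could detect a single-bit transmission error by checking that the number of 1 bits in each received byte is even.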