CH/S5CIT/Nov., 2006.

DATA REPRESENTATION

The representation of data within the computer

Number systems (p. 25 to 28)

Different number systems represent the same quantity in different ways. Each system has its own number of digit symbols, and each digit in a number has a place value.

Decimal (Denary)

This is the most familiar system. It originated from counting on the fingers. It has 10 digit symbols (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), and each digit has a place value that is a power of 10.

e.g. $428_{10}$:
digit ‘8’ has a place value of $10^0$
digit ‘2’ has a place value of $10^1$
digit ‘4’ has a place value of $10^2$

Binary

An electronic pulse is either on or off, so the number of meaningful electronic states is only two. The most natural coding for a computer is therefore binary. The binary number system has only 2 digit symbols, 0 and 1, with place values that are powers of 2.

Conversions

From binary to decimal:

$101110_2 = 1 \times 2^5 + 0 \times 2^4 + 1 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 46_{10}$

From decimal to binary, divide by 2 repeatedly and record each remainder:

    2 | 46
    2 | 23 ......... 0
    2 | 11 ......... 1
    2 |  5 ......... 1
    2 |  2 ......... 1
         1 ......... 0

Read the final quotient first and then the remainders from bottom to top. Therefore, the result is $101110_2$.

Octal

Programming in binary is, however, very tedious and prone to error. The octal system is used because it makes programming simpler (its base is closer to 10) and is not very difficult for the computer (conversion between binary and octal is easy).

This system has 8 digit symbols (0 to 7), and each digit has a place value that is a power of 8. Knowing the place values, we can convert octal to decimal and vice versa.

$243_8 = 2 \times 8^2 + 4 \times 8^1 + 3 \times 8^0 = 163_{10}$

    8 | 163
    8 |  20 ....... 3
          2 ....... 4

i.e. $163_{10} = 243_8$

Conversions between octal and binary numbers are much simpler: group the binary digits in threes, starting from the right, and change each group into one octal digit.

e.g. $010\ 101\ 001_2 = 251_8$

Hexadecimal

The octal number system has a disadvantage in coding computer instructions. Computers work in twos, or powers of 2, so it is not efficient for them to deal with numbers in 3-bit groups.
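The place-value and repeated-division conversions described so far can be sketched in Python. This is only an illustration and not part of the original notes: the function names `from_base`, `to_base` and `binary_to_octal` are hypothetical, and the sketch assumes non-negative integers and bases of 10 or less.

```python
def from_base(text, base):
    """Convert a digit string to an integer by summing digit * base**position."""
    total = 0
    for position, digit in enumerate(reversed(text)):
        total += int(digit) * base ** position
    return total


def to_base(n, base):
    """Convert a non-negative integer to a digit string by repeated division."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, remainder = divmod(n, base)
        remainders.append(str(remainder))
    # The remainders are produced lowest digit first, so read them in reverse.
    return "".join(reversed(remainders))


def binary_to_octal(bits):
    """Group binary digits in threes from the right; each group is one octal digit."""
    width = (len(bits) + 2) // 3 * 3      # round the length up to a multiple of 3
    padded = bits.zfill(width)            # pad with leading zeros
    groups = [padded[i:i + 3] for i in range(0, width, 3)]
    return "".join(str(from_base(group, 2)) for group in groups)


print(from_base("101110", 2))        # 46
print(to_base(46, 2))                # 101110
print(from_base("243", 8))           # 163
print(to_base(163, 8))               # 243
print(binary_to_octal("010101001"))  # 251
```

The printed values match the worked examples in the notes above.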
A new number system, the hexadecimal system, was developed with 16 digit symbols (0-9, A-F). Each hexadecimal digit is equivalent to a 4-digit binary number.

Table: representation of numbers in different number systems

    Decimal  Binary  Hexadecimal      Decimal  Binary  Hexadecimal
    0        0000    0                8        1000    8
    1        0001    1                9        1001    9
    2        0010    2                10       1010    A
    3        0011    3                11       1011    B
    4        0100    4                12       1100    C
    5        0101    5                13       1101    D
    6        0110    6                14       1110    E
    7        0111    7                15       1111    F

Conversions between hexadecimal and binary, and between hexadecimal and decimal, are similar to those for octal. Conversions between octal and hexadecimal are done by first converting the number to binary.

Simple Arithmetic on Binary Numbers

Addition (worked column by column, from right to left)

       11        11        11        11
    + 101     + 101     + 101     + 101
    -----     -----     -----     -----
        0        00       000      1000

Subtraction (worked column by column, from right to left)

     1010      1010      1010
    - 101     - 101     - 101
    -----     -----     -----
        1        01       101

Multiplication

      110
    * 101
    -----
    11000
      110
    -----
    11110

Division

           110
    101 ) 11110
          101
          ----
           1010
           1010
           ----
              0

Character Set

Characters are also stored as binary codes in the computer. Usually, one character is stored in 1 byte, so there can be 256 different characters. The most common character set used in computers is the American Standard Code for Information Interchange (ASCII).

    Dec  Binary    Hex  Char      Dec  Binary    Hex  Char      Dec  Binary    Hex  Char
     32  00100000  20   (space)    64  01000000  40   @          96  01100000  60   `
     33  00100001  21   !          65  01000001  41   A          97  01100001  61   a
     34  00100010  22   "          66  01000010  42   B          98  01100010  62   b
     35  00100011  23   #          67  01000011  43   C          99  01100011  63   c
     36  00100100  24   $          68  01000100  44   D         100  01100100  64   d
     37  00100101  25   %          69  01000101  45   E         101  01100101  65   e
     38  00100110  26   &          70  01000110  46   F         102  01100110  66   f
     39  00100111  27   '          71  01000111  47   G         103  01100111  67   g
     40  00101000  28   (          72  01001000  48   H         104  01101000  68   h
     41  00101001  29   )          73  01001001  49   I         105  01101001  69   i
     42  00101010  2A   *          74  01001010  4A   J         106  01101010  6A   j
     43  00101011  2B   +          75  01001011  4B   K         107  01101011  6B   k
     44  00101100  2C   ,          76  01001100  4C   L         108  01101100  6C   l
     45  00101101  2D   -          77  01001101  4D   M         109  01101101  6D   m
     46  00101110  2E   .          78  01001110  4E   N         110  01101110  6E   n
     47  00101111  2F   /          79  01001111  4F   O         111  01101111  6F   o
     48  00110000  30   0          80  01010000  50   P         112  01110000  70   p
     49  00110001  31   1          81  01010001  51   Q         113  01110001  71   q
     50  00110010  32   2          82  01010010  52   R         114  01110010  72   r
     51  00110011  33   3          83  01010011  53   S         115  01110011  73   s
     52  00110100  34   4          84  01010100  54   T         116  01110100  74   t
     53  00110101  35   5          85  01010101  55   U         117  01110101  75   u
     54  00110110  36   6          86  01010110  56   V         118  01110110  76   v
     55  00110111  37   7          87  01010111  57   W         119  01110111  77   w
     56  00111000  38   8          88  01011000  58   X         120  01111000  78   x
     57  00111001  39   9          89  01011001  59   Y         121  01111001  79   y
     58  00111010  3A   :          90  01011010  5A   Z         122  01111010  7A   z
     59  00111011  3B   ;          91  01011011  5B   [         123  01111011  7B   {
     60  00111100  3C   <          92  01011100  5C   \         124  01111100  7C   |
     61  00111101  3D   =          93  01011101  5D   ]         125  01111101  7D   }
     62  00111110  3E   >          94  01011110  5E   ^         126  01111110  7E   ~
     63  00111111  3F   ?          95  01011111  5F   _         127  01111111  7F   (del)

Unicode.
i. It contains the characters of the world’s alphabets, ideographs and symbols.
ii. Its first 128 codes are the same as those of ASCII.
iii. The latest version at the time of writing, Unicode 5.0, contains 99,024 graphic and format characters. Depending on the encoding used, one character is stored in 1 to 4 bytes.

Chinese Text Processing

Under a Chinese text system, we can operate the computer with Chinese characters. That means we can use the computer to
i. store characters, each represented by a certain code (internal code).
ii. input Chinese characters with an English keyboard (input method).
iii. display Chinese characters in different shapes (graphical composition of Chinese characters in different fonts).

All these features are provided by a Chinese operating system (e.g. Windows XP).

Character set

In an English processing system, the internal code of a character may be an ASCII code. E.g. the internal code of ‘A’ is ‘01000001’.
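The link between a character and its internal code can be checked with Python's built-in `ord`, `chr` and `format` functions. The following is a minimal sketch rather than part of the original notes; the helper name `internal_code` is hypothetical.

```python
def internal_code(ch):
    """Return the decimal, 8-bit binary and hexadecimal code of one character."""
    code = ord(ch)                         # numeric code of the character
    return code, format(code, "08b"), format(code, "02X")


print(internal_code("A"))        # (65, '01000001', '41')
print(internal_code("z"))        # (122, '01111010', '7A')
print(chr(int("01000001", 2)))   # A
```

The values agree with the corresponding rows of the ASCII table.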
In a Chinese processing system, the internal code may be the Chinese National Standard Codes for Information Interchange, Big-5 or another code. (The internal code used in Traditional Chinese Windows is Big-5.)

In an English processing system, the character set includes the letters (‘a’ to ‘z’ and ‘A’ to ‘Z’), the numerals (‘0’ to ‘9’) and symbols such as ‘$’, ‘&’, ‘?’, ‘#’ and ‘*’, giving over 200 characters in total.

In a Chinese processing system, the character set includes all the characters of an English processing system and all the commonly used Chinese characters, over 10000 in total. Because the character set is much larger, two bytes are used to store a single Chinese character (Double Byte Character Set, DBCS).

Big-5.
i. For Traditional Chinese characters.
ii. Regions: Taiwan, Hong Kong and Malaysia.
iii. The Hong Kong government has developed an extension to the set (the Hong Kong Supplementary Character Set, HKSCS).

GuoBiao (GB).
i. For Simplified Chinese characters.
ii. Regions: China, Singapore and Malaysia.

Unicode includes 20902 CJK (Chinese, Japanese and Korean) characters. Thus Simplified Chinese characters, Traditional Chinese characters and Japanese characters can all be used in the same document.
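The difference between single-byte and double-byte storage can be observed with the encoders in Python's standard library. This is a hedged sketch, not part of the original notes: the function name `encoded_sizes` is hypothetical, and the codec names `big5`, `gb2312`, `utf-8` and `utf-16-le` are the Python spellings of the encodings, chosen here for illustration.

```python
def encoded_sizes(ch):
    """Return the number of bytes needed to store one character in each encoding."""
    sizes = {}
    for name in ("big5", "gb2312", "utf-8", "utf-16-le"):
        try:
            sizes[name] = len(ch.encode(name))
        except UnicodeEncodeError:
            sizes[name] = None   # the character is not in this character set
    return sizes


# An ASCII letter takes 1 byte in Big-5 and GB.
print(encoded_sizes("A"))
# "\u4e2d" is the Chinese character for "middle"; it takes 2 bytes in Big-5 and GB.
print(encoded_sizes("\u4e2d"))
```

Under these assumptions, the letter needs one byte and the Chinese character two bytes in both Big-5 and GB, while the Unicode encodings use their own byte counts.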