Data Representation

Bits and Bytes

A computer stores data in units called bits and bytes. The circuits inside computer chips (integrated circuits) are in one of two states, off or on. A number system that uses only two digits, 0 and 1, is therefore used: 0 represents off and 1 represents on. You can think of each circuit as a sort of light switch. Each switch is called a bit. Bits are grouped together in sets of eight, and each set of eight bits is called a byte. Different combinations of those eight on and off settings stand for letters, numbers, spaces, and symbols. For practical purposes, think of a byte as one character.

Memory and storage sizes are given in the following units of measurement:

8 bits = 1 byte
1024 bytes = 1 Kilobyte (KB)
1024 Kilobytes = 1 Megabyte (MB)
1024 Megabytes = 1 Gigabyte (GB)
1024 Gigabytes = 1 Terabyte (TB)

ASCII Code

Text characters, as well as numbers, must be binary coded to get them into computer memory. A code that copes with alphabetic characters a . . . z, A . . . Z as well as digits 0 . . . 9 is called alphanumeric. Until recently, the most frequently used code was ASCII (American Standard Code for Information Interchange). ASCII uses 7 bits (codes 0 to 127) and includes both printable and control characters; see the following ASCII table.

Name                               Hex          Control
Bel                                07           ctrl-G
line-feed                          0a           ctrl-J
carriage-return                    0d           ctrl-M
blank-space                        20
Digits 0 . . 9                     30 . . 39
Alphabetic A . . Z (upper-case)    41 . . 5a
Alphabetic a . . z (lower-case)    61 . . 7a

ASCII table

UNICODE

ASCII has proved adequate for English text. But what about languages with accented letters, or those written in non-Latin scripts? UNICODE is a new international standard, governed by the Unicode Consortium, aimed at allowing a richer character set. Java uses UNICODE, a 16-bit code, so that the Java char type is 16 bits, compared to 8 bits for char in C and C++.
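The hexadecimal codes listed in the ASCII table can be checked with a short Python sketch. The helper names ascii_hex and byte_pattern are illustrative only and do not come from these notes; the sketch relies on Python's built-in ord() and format() functions.

```python
def ascii_hex(ch: str) -> str:
    """Return the two-digit hexadecimal ASCII code of one character."""
    code = ord(ch)  # ord() gives the character's numeric code
    if code > 127:  # ASCII is a 7-bit code: valid values are 0..127
        raise ValueError(f"{ch!r} is not an ASCII character")
    return format(code, "02x")


def byte_pattern(ch: str) -> str:
    """Return an ASCII character's code as eight bits (one byte)."""
    return format(ord(ch), "08b")


# Bel, line-feed, carriage-return, blank-space, then the ends of each range.
samples = ["\a", "\n", "\r", " ", "0", "9", "A", "Z", "a", "z"]
codes = [ascii_hex(ch) for ch in samples]
print(codes)
# ['07', '0a', '0d', '20', '30', '39', '41', '5a', '61', '7a']

print(byte_pattern("A"))  # 01000001
```

The last line shows the "one byte per character" idea: the letter A is stored as the eight switches 01000001.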
According to (Tanenbaum 1999), the total of known characters in the world's languages numbers some 200,000; thus, there is already pressure on UNICODE.

Number System

Digital computers use binary (base 2) as their native representation; however, in addition to binary and decimal, it is also useful to discuss hexadecimal (base 16) and octal (base 8) representations.

Our everyday representation of numbers is decimal; it uses the 10 symbols {0,1,2,3,4,5,6,7,8,9}, called digits. 10 is its base or radix. The decimal number system is a positional system: the meaning of any digit is determined by its position in the string of digits that represents the number. Thus:

123 = 1 x 10^2 + 2 x 10^1 + 3 x 10^0
    = 1 x 100 + 2 x 10 + 3 x 1
    = 100 + 20 + 3

Binary Representation

Base 2 requires just two symbols, 0 and 1, but the pattern remains the same as for decimal. The digits are termed binary digits, or bits.

The binary number (1011)_2 = 1 x 2^3 + 0 x 2^2 + 1 x 2^1 + 1 x 2^0 = (11)_10

Octal Representation

Octal is base 8. Octal numbers are made of the octal digits {0,1,2,3,4,5,6,7}.

The octal number (4536)_8 = 4 x 8^3 + 5 x 8^2 + 3 x 8^1 + 6 x 8^0 = (2398)_10

Hexadecimal Representation

Hexadecimal is base 16: the sixteen symbols used are {0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}.

The hexadecimal number (3A9F)_16 = 3 x 16^3 + 10 x 16^2 + 9 x 16^1 + 15 x 16^0 = (15007)_10

A table of the values for all bases is shown below.

Dec    Bin     Oct    Hex
0      0       0      0
1      1       1      1
2      10      2      2
3      11      3      3
4      100     4      4
5      101     5      5
6      110     6      6
7      111     7      7
8      1000    10     8
9      1001    11     9
10     1010    12     A
11     1011    13     B
12     1100    14     C
13     1101    15     D
14     1110    16     E
15     1111    17     F

Note: Conversion between the different bases will be discussed in detail, along with many examples, in the next face-to-face meeting.
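The positional expansions in this section can be checked with a short Python sketch. The function name positional_value and the DIGITS constant are illustrative assumptions, not part of these notes. The function adds up digit x base^position for each digit, exactly as in the worked examples, and each result is compared against Python's built-in int(text, base).

```python
DIGITS = "0123456789ABCDEF"  # symbols for bases up to 16


def positional_value(digits: str, base: int) -> int:
    """Sum digit * base**position, counting positions from the right."""
    if not 2 <= base <= 16:
        raise ValueError("base must be between 2 and 16")
    total = 0
    for position, symbol in enumerate(reversed(digits.upper())):
        digit = DIGITS.index(symbol)  # raises ValueError for unknown symbols
        if digit >= base:
            raise ValueError(f"{symbol!r} is not a valid base-{base} digit")
        total += digit * base ** position
    return total


examples = [("123", 10), ("1011", 2), ("4536", 8), ("3A9F", 16)]
for text, base in examples:
    value = positional_value(text, base)
    assert value == int(text, base)  # cross-check with the built-in parser
    print(f"({text})_{base} = ({value})_10")
# (123)_10 = (123)_10
# (1011)_2 = (11)_10
# (4536)_8 = (2398)_10
# (3A9F)_16 = (15007)_10

# Rows of the Dec / Bin / Oct / Hex table, using built-in format codes.
table = [(n, format(n, "b"), format(n, "o"), format(n, "X")) for n in range(16)]
for dec, binary, octal, hexa in table:
    print(f"{dec:<6} {binary:<7} {octal:<6} {hexa}")
```

The same loop works for any base from 2 to 16, which is why one function covers the decimal, binary, octal, and hexadecimal examples.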