DAT2343 Basic Character Encoding Including ASCII
© Alan T. Pinck / Algonquin College; 2003

Historical Character Usage
Early general purpose computers (dominated by IBM) supported limited usage of non-numeric characters:
• Identification/headings on printed reports
• Program source code text
Outside of the general computer area, some character encoding was used for text transmission (telegrams); initially this was Morse code, but it was replaced with fixed-length pattern codes when automated equipment began to be used.

Historical Requirements
Text was not intended for the general public.
• Alphabetic characters were only in upper case
• Relatively few “special characters” (periods, parentheses, dollar signs, arithmetic operators) were supplied.
• 10 digits, 26 letters, and fewer than 20 “special” character symbols: fewer than 56 code patterns were required.

6-Bit Codes
A 6-bit encoding system permits 64 symbols to be encoded. This was enough, provided only upper case alphabetic symbols and relatively few “special” symbols were required.
IBM and Western Union (the telegraph company) stayed with 6-bit encoding systems after most of the other computer and data transmission companies had moved to a system which permitted both upper and lower case alphabetics, and more special symbols.

Formation of ASCII
• General user demand for more character symbols (including lower case alphabetics).
• IBM did not believe that the market demand was sufficient to move from a 6-bit code.
• No other single company controlled a large enough market share to be able to create a viable system on its own.
• A group of computer, peripheral, and data transmission companies joined to establish a standard.

ASCII Basics
• American Standard Code for Information Interchange
• 7-bit code provided unique codes for up to 128 different characters
• Some terminal equipment: when idle, the power was off (which would look like 0000000); other terminal equipment: when idle, the power was on (which would look like 1111111).
Therefore neither the 0000000 nor the 1111111 pattern was assigned to a printable character (“null” patterns); in ASCII they are the non-printing codes NUL (00h) and DEL (7Fh).

Extended ASCII
Byte = Collection of bits used to encode a character
ASCII is almost always implemented using an 8-bit byte (character).
Only the 7-bit patterns were standardized under ASCII.
Standard 8-bit ASCII codes start with a zero-valued bit (followed by the 7-bit ASCII code).
“Extended ASCII” codes start with a one-valued bit; these codes are not standard and vary in meaning among different manufacturers and equipment.

Major ASCII Coding Patterns
• First 32 patterns (when written in hexadecimal, any patterns starting with 0 or 1): control codes; the most common of these are 0Ah (Line Feed) and 0Dh (Carriage Return)
• 20(hex): blank; the remainder of the codes starting with 2(hex) are “special” characters.
• 30(hex): “0”; 31(hex): “1”; etc.
• 41(hex): “A”; 42(hex): “B”; etc.
• 61(hex): “a”; 62(hex): “b”; etc.

Sample ASCII Decoding - 1
Suppose we have the bit stream:
010101000110100001100101001000000011…
Our first task would normally be to split this into 8-bit groups and rewrite each group as a pair of hexadecimal digits:
01010100 01101000 01100101 00100000 0011…
   54       68       65       20     …
(In actual practice it would be more common for the “bit stream” to be presented already in pairs of hexadecimal digits.)

Sample ASCII Decoding - 2
Write down the alphabet and beside each letter write its ASCII code:
A : 41h   (for lower case, add 20h)
B : 42h
C : 43h
….
I : 49h
J : 4Ah
K : 4Bh
…
Z : 5Ah
Remember: digits are 3?h; blank is 20h; LF is 0Ah; CR is 0Dh

Sample ASCII Decoding - 3
Given the ASCII hexadecimal pattern (as an example):
54 68 65 20 33 0A 0D 47 6F 61 74 73
Matching these codes to the table we created, we should have no trouble converting this into the text:
The 3 Goats
(the 0A 0D pair is a line break between “The 3” and “Goats”)

Note on End-Of-Line Codes
Different operating systems use different standards for indicating an end of line.
Microsoft uses a two-character sequence: 0Dh 0Ah (carriage return followed by line feed)
Unix uses only 0Ah (the line feed)
Classic Macintosh (Mac OS 9 and earlier) uses only 0Dh (the carriage return)
This can cause problems when moving text files from one system to another.

End of Lecture
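Supplementary sketch: the decoding steps from the "Sample ASCII Decoding" slides and the end-of-line note can be tried out in a few lines of Python. This is only an illustration under stated assumptions; the function names (bits_to_hex_pairs, decode_ascii_hex, to_unix_line_endings) are made up for this example and are not part of the lecture material.

```python
def bits_to_hex_pairs(bits: str) -> list:
    """Split a bit string into 8-bit groups; write each as two hex digits.

    Any incomplete group at the end (fewer than 8 bits) is ignored.
    """
    usable = len(bits) - len(bits) % 8
    return [format(int(bits[i:i + 8], 2), "02X") for i in range(0, usable, 8)]


def decode_ascii_hex(hex_text: str) -> str:
    """Decode space-separated pairs of hex digits as 7-bit ASCII.

    Raises UnicodeDecodeError if a code has its leading bit set
    (an "extended" code, which ASCII does not standardize).
    """
    return bytes.fromhex(hex_text).decode("ascii")


def to_unix_line_endings(text: str) -> str:
    """Rewrite CR LF and lone CR line endings as a single LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


if __name__ == "__main__":
    # First four complete bytes of the bit stream on the slide.
    print(bits_to_hex_pairs("010101000110100001100101001000000011"))
    # The hexadecimal pattern from "Sample ASCII Decoding - 3";
    # repr() makes the LF and CR control codes visible.
    print(repr(decode_ascii_hex("54 68 65 20 33 0A 0D 47 6F 61 74 73")))
    print(repr(to_unix_line_endings("one\r\ntwo\rthree\n")))
```

Decoding with the "ascii" codec is a deliberate choice here: it rejects any byte whose leading bit is one, which matches the point that only the 7-bit patterns are standardized.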