What is a Code?

A code is a method by which a message in one alphabet is transformed into a different alphabet. The original text is often called the plain text. The transformed text is called the encoded text. The transformation to the encoded text is called encoding, and the conversion back to the original text is called decoding. Encoding may be desirable for many reasons, but three of the most common are transmission, compression (before storage), and encryption.

Morse Code transforms the symbols on a typewriter into dots and dashes for transmission. The table below shows the symbols that are used in English.

A .-     B -...   C -.-.   D -..    E .      F ..-.
G --.    H ....   I ..     J .---   K -.-    L .-..
M --     N -.     O ---    P .--.   Q --.-   R .-.
S ...    T -      U ..-    V ...-   W .--    X -..-
Y -.--   Z --..

1 .----  2 ..---  3 ...--  4 ....-  5 .....
6 -....  7 --...  8 ---..  9 ----.  0 -----

,   --..--   comma
.   .-.-.-   period
?   ..--..   question mark
;   -.-.-.   semicolon
:   ---...   colon
/   -..-.    slash
-   -....-   dash
'   .----.   apostrophe
()  -.--.-   parenthesis
_   ..--.-   underline

Note that this is an unequal length binary code. The more common symbols have shorter codes in the hope that this will increase transmission throughput.

Semaphore

The Semaphore flag signaling system is an alphabet signaling system based on holding a pair of hand-held flags in a particular pattern. The flags are usually square, red and yellow, divided diagonally with the red portion in the upper hoist. The flags are held, arms extended, in various positions representing each of the letters of the alphabet. The pattern resembles a clock face divided into eight positions: up, down, out, high, low, for each of the left and right hands (LH and RH). Six of the letters require the hand to be brought across the body so that both flags are on the same side. Semaphore was most commonly used ship-to-ship, but variations of this system have been used on land and sea for over 150 years.
The French used large flags (fires at night) on towers separated by miles to transmit messages quickly across the country. Note that this is an equal length code.

Computer Codes

All computer codes convert symbols into binary, because digital hardware stores and transmits only bits. There are four primary computer codes: BCD, EBCDIC, ASCII, and UNICODE. BCD, EBCDIC, and ASCII are equal length codes suitable for normal storage or transmission; Unicode has both equal length and variable length encoding forms.

BCD Binary Coded Decimal
This is a 6-bit code that saw extensive use up through the early 1970s, but is virtually unheard of today. The problem is that 6 bits allows a maximum of 2 ** 6 = 64 symbols. As a result BCD has no lower case letters.

EBCDIC Extended Binary Coded Decimal Interchange Code
This is an 8-bit code introduced by IBM in the 1960s. It has 2 ** 8 = 256 possible symbols.

ASCII American Standard Code for Information Interchange
This is a 7/8 bit code. That is, only the lower 7 bits are used for encoding, so there are 2 ** 7 = 128 symbols in ASCII, but symbols are usually stored as 8 bits. First used extensively in PCs, today this code is also used in most workstations and mainframes.

UNICODE
An 8, 16, and 32 bit code commonly used in HTML and Java. (Some of the verbiage below was lifted from the Unicode website.)

The Unicode Standard is very closely aligned with the international standard ISO/IEC 10646 (also known as the Universal Character Set, or UCS, for short). Close cooperation and formal liaison between the committees has ensured that all additions to either standard are coordinated and kept in synch, so that the two standards maintain exactly the same character repertoire and encoding.

The Unicode Standard defines three encoding forms that allow the same data to be transmitted in a byte, word or double word oriented format (i.e. in 8, 16 or 32 bits per code unit). All three encoding forms encode the same common character repertoire and can be efficiently transformed into one another without loss of data.
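The claim that the three encoding forms carry the same characters and convert into one another without loss can be checked with a short Python sketch. The sample string and the helper names `encode_all` and `transcode` are illustrative assumptions, not part of the Unicode Standard; the codec names are the ones Python's standard library provides.

```python
# Illustrative sketch: one string, three Unicode encoding forms.
# The "-be" codecs are big-endian with no byte order mark, so byte counts
# reflect the code units only.
CODECS = ("utf-8", "utf-16-be", "utf-32-be")

def encode_all(text):
    """Return the text encoded in each of the three encoding forms."""
    return {codec: text.encode(codec) for codec in CODECS}

def transcode(data, src, dst):
    """Convert bytes from one encoding form to another via the abstract characters."""
    return data.decode(src).encode(dst)

# A (ASCII), e-acute, euro sign, and an emoji that lies outside the 16-bit range.
sample = "A \u00e9 \u20ac \U0001F600"
encoded = encode_all(sample)

for codec, data in encoded.items():
    print(f"{codec:10} {len(data):2d} bytes  {data.hex(' ')}")

# Converting between forms and decoding again gives back the original text.
round_trip = transcode(encoded["utf-8"], "utf-8", "utf-32-be").decode("utf-32-be")
print(round_trip == sample)
```

The byte counts differ between the forms because each uses a different code unit size, while the decoded text is identical in every case.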
The Unicode Consortium fully endorses the use of any of these encoding forms as a conformant way of implementing the Unicode Standard.

UTF-8 (Unicode Transformation Format - 8) is popular for HTML and similar protocols. UTF-8 is a way of transforming all Unicode characters into a variable length encoding of bytes. It has the advantages that the Unicode characters corresponding to the familiar ASCII set have the same byte values as ASCII, and that Unicode characters transformed into UTF-8 can be used with much existing software without extensive software rewrites. (A single UTF-8 byte has 256 possible values. The 128 values with the high bit clear are the same as ASCII, and every other character is encoded as a sequence of two to four bytes.)

UTF-16 is popular in many environments that need to balance efficient access to characters with economical use of storage. It is reasonably compact and all the heavily used characters fit into a single 16-bit code unit, while all other characters are accessible via pairs of 16-bit code units. (A single 16-bit code unit has room for a maximum of 64K values.)

UTF-32 is popular where memory space is no concern, but fixed width, single code unit access to characters is desired. Each Unicode character is encoded in a single 32-bit code unit when using UTF-32. (A 32-bit code unit has room for about 4G values, far more than the 1,114,112 code points that Unicode defines.)

There are other binary computer codes designed for special tasks. For example, Huffman Code is an unequal length prefix code for compressing text. It will be discussed separately.

Variations on the Theme

Encryption/decryption is a special case of encoding/decoding in which the encoding process is hidden from others and the resulting encoded text is difficult to decode back to its original form. The original text is usually called the plaintext, and the encoded text is usually called the ciphertext. The encoding is called enciphering or encryption, and the decoding is often called deciphering or decryption.

Another special case of this process is when the original message is not in a conventional alphabet.
For example, a conventional telephone converts sound variations to DC voltage or current fluctuations that are transmitted across the phone line and then converted back to sound. Some cell phones digitally sample the sound and transmit a digital signal. Photographic and video storage and compression methods such as JPG and MPEG transform and compress pixel color value and intensity information into a storage file.
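The idea of digitally sampling a sound can be sketched in a few lines of Python. This is a minimal illustration, not a description of how any particular phone works: the 8000 samples per second rate, the 8-bit linear quantization, the 440 Hz test tone, and the names `encode`, `decode`, and `tone` are all assumptions chosen for the example.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for illustration)
LEVELS = 256        # 8-bit linear quantization (assumed for illustration)

def encode(signal, duration, rate=SAMPLE_RATE):
    """Sample signal(t), with values in [-1, 1], into one byte per sample."""
    count = round(duration * rate)
    codes = []
    for i in range(count):
        value = signal(i / rate)
        # Map [-1, 1] onto the integer levels 0..255.
        codes.append(round((value + 1.0) / 2.0 * (LEVELS - 1)))
    return bytes(codes)

def decode(codes):
    """Map each byte back to an approximate signal value in [-1, 1]."""
    return [code / (LEVELS - 1) * 2.0 - 1.0 for code in codes]

def tone(t):
    """A 440 Hz sine wave standing in for the original sound."""
    return math.sin(2 * math.pi * 440 * t)

digital = encode(tone, duration=0.01)
restored = decode(digital)

# Compare the decoded values with the original signal at the sample instants.
worst = max(abs(restored[i] - tone(i / SAMPLE_RATE)) for i in range(len(digital)))
print(len(digital), "samples, worst-case error", worst)
```

Unlike the text codes above, decoding here does not recover the original exactly: each decoded value is within half a quantization step (1/255 in this sketch) of the sampled value, and everything between the sample instants is discarded.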