Computer Codes

What is a Code?
A code is a method by which a message in one alphabet is transformed into
a different alphabet. The original text is often called the plain text. The
transformed text is called the encoded text. The transformation to the
encoded text is called encoding, and the conversion back to the original
text is called decoding. Encoding may be desirable for many reasons, but
three of the most common are transmission, compression (before storage),
and encryption.
Morse Code transforms the symbols on a typewriter into dots and dashes
for transmission. The table below shows the symbols that are used in
English.
A .-
B -...
C -.-.
D -..
E .
F ..-.
G --.
H ....
I ..
J .---
K -.-
L .-..
M --
N -.
O ---
P .--.
Q --.-
R .-.
S ...
T -
U ..-
V ...-
W .--
X -..-
Y -.--
Z --..
1 .----
2 ..---
3 ...--
4 ....-
5 .....
6 -....
7 --...
8 ---..
9 ----.
0 -----
, --..-- comma
. .-.-.- period
? ..--.. question mark
; -.-.-. semicolon
: ---... colon
/ -..-. slash
- -....- dash
' .----. apostrophe
() -.--.- parenthesis
_ ..--.- underline
Note that this is an unequal length code built from two signal elements,
the dot and the dash. The more common symbols (E and T, for example) have
shorter codes, which increases transmission throughput.
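As a rough illustration of encoding and decoding with an unequal length code, here is a minimal Python sketch. It assumes the standard International Morse codes for the letters A through Z only, and it separates the codes with spaces so the decoder can tell where each symbol ends. The names MORSE, REVERSE, encode, and decode are made up for this example.

```python
# Letter codes only; digits and punctuation are left out to keep the sketch short.
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
}

# Decoding uses the same table read in the other direction.
REVERSE = {code: letter for letter, code in MORSE.items()}


def encode(plain):
    """Encode a single word; the codes are separated by one space."""
    return " ".join(MORSE[ch] for ch in plain.upper())


def decode(encoded):
    """Decode a space-separated string of Morse codes back to letters."""
    return "".join(REVERSE[code] for code in encoded.split())


print(encode("SOS"))          # ... --- ...
print(decode("... --- ..."))  # SOS

# Unequal length: the common letter E takes 1 element, the rare Q takes 4.
print(len(MORSE["E"]), len(MORSE["Q"]))  # 1 4
```

The space between codes matters: without it, ".." could be read as the letter I or as two E's.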
Semaphore
The Semaphore flag signaling system is an alphabet signaling
system based on the holding of a pair of hand-held flags in a particular
pattern. The flags are usually square, red and yellow, divided diagonally
with the red portion in the upper hoist.
The flags are held, arms extended, in various positions representing each
of the letters of the alphabet. The pattern resembles a clock face divided
into eight positions: up, down, out, high, low, for each of the left and right
hands (LH and RH). Six of the letters require the hand to be brought across
the body so that both flags are on the same side. Semaphore was most
commonly used ship-to-ship, but variations of this system have been used
on land and sea for over 150 years. The French used large flags (fires at
night) on towers separated by miles to transmit messages quickly across
the country. Note that this is an equal length code.
Computer Codes
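All computer codes convert symbols into binary, because digital hardware
stores and transmits data as bits. There are four primary computer codes:
BCD, EBCDIC, ASCII, and UNICODE. BCD, EBCDIC, and ASCII are equal length
codes suitable for normal storage or transmission. Unicode has both equal
length and variable length encoding forms.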
BCD
Binary Coded Decimal
This is a 6-bit code that saw extensive use up through the early 1970s, but
is virtually unheard of today. The problem is that 6 bits allows a maximum of
2 ** 6 = 64 symbols. As a result BCD has no lower case letters.
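The arithmetic behind that limit can be checked with a short Python sketch. The function name symbol_count is invented for this example, and the letter and digit counts assume the English alphabet.

```python
def symbol_count(bits):
    """Number of distinct symbols an equal length code of this width can hold."""
    return 2 ** bits


for name, bits in [("BCD", 6), ("ASCII", 7), ("EBCDIC", 8)]:
    print(name, bits, symbol_count(bits))

# Upper case, lower case, and digits already need 62 codes,
# which would leave only 2 of the 64 for spaces and punctuation.
needed = 26 + 26 + 10
print(needed, symbol_count(6) - needed)  # 62 2
```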
EBCDIC
Extended Binary Coded Decimal Interchange Code
This is an 8-bit code introduced by IBM in the 1960s. It has 256 possible
symbols.
ASCII
American Standard Code for Information Interchange
This is a 7/8 bit code. That is, only the lower 7 bits are used for encoding,
so there are 128 symbols in ASCII, but symbols are usually stored as 8 bits.
First used extensively in PCs, today this code is also used in most
workstations and mainframes.
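To make the 7/8 bit point concrete, the sketch below uses Python's built-in "ascii" codec to encode a short string and prints each symbol as a stored 8-bit byte. The helper name ascii_bits is invented for this example.

```python
def ascii_bits(text):
    """Return (symbol, code value, 8-bit pattern) for each ASCII symbol."""
    data = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII text
    return [(ch, byte, format(byte, "08b")) for ch, byte in zip(text, data)]


for ch, value, bits in ascii_bits("Code"):
    print(ch, value, bits)
# C 67 01000011
# o 111 01101111
# d 100 01100100
# e 101 01100101
```

Every pattern starts with 0: the code value fits in the lower 7 bits and the eighth bit is unused.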
UNICODE
8, 16, and 32 bit code commonly used in HTML and Java
(Some of the verbiage below was lifted from the Unicode website.)
The Unicode Standard is very closely aligned with the international
standard ISO/IEC 10646 (also known as the Universal Character Set, or
UCS, for short). Close cooperation and formal liaison between the
committees has ensured that all additions to either standard are
coordinated and kept in synch, so that the two standards maintain exactly
the same character repertoire and encoding.
The Unicode Standard defines three encoding forms that allow the same
data to be transmitted in a byte, word or double word oriented format (i.e. in
8, 16 or 32-bits per code unit). All three encoding forms encode the same
common character repertoire and can be efficiently transformed into one
another without loss of data. The Unicode Consortium fully endorses the
use of any of these encoding forms as a conformant way of implementing
the Unicode Standard.
UTF-8 (Unicode Transformation Format - 8) is popular for HTML and
similar protocols. UTF-8 is a way of transforming all Unicode characters into
a variable length encoding of bytes. It has the advantages that the Unicode
characters corresponding to the familiar ASCII set have the same byte
values as ASCII, and that Unicode characters transformed into UTF-8 can
be used with much existing software without extensive software rewrites.
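(A single UTF-8 byte has 256 possible values. The 128 values from 0 to 127
encode the same characters as ASCII; every other Unicode character is
encoded as a sequence of two to four bytes.)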
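The variable length behaviour of UTF-8 is easy to observe with Python's built-in "utf-8" codec. This is only a sketch: the four sample characters are arbitrary choices for illustration, and utf8_bytes is a name made up for this example.

```python
def utf8_bytes(ch):
    """Return the UTF-8 byte sequence for one character."""
    return ch.encode("utf-8")


# Sample characters: A, e with acute accent, the euro sign, and an emoji.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    encoded = utf8_bytes(ch)
    print(f"U+{ord(ch):04X}", len(encoded), encoded.hex())
# U+0041 1 41
# U+00E9 2 c3a9
# U+20AC 3 e282ac
# U+1F600 4 f09f9880

# ASCII text produces identical bytes under both codecs.
print("Code".encode("utf-8") == "Code".encode("ascii"))  # True
```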
UTF-16 is popular in many environments that need to balance efficient
access to characters with economical use of storage. It is reasonably
compact and all the heavily used characters fit into a single 16-bit code unit,
while all other characters are accessible via pairs of 16-bit code units.
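(A single 16-bit code unit has room for 64K values. Characters outside that
range are encoded as a pair of 16-bit code units, so UTF-16 can still
represent every Unicode character.)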
UTF-32 is popular where memory space is no concern, but fixed width,
single code unit access to characters is desired. Each Unicode character is
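encoded in a single 32-bit code unit when using UTF-32. (A 32-bit code unit
could hold about 4G distinct values, but Unicode limits code points to the
range 0 through 10FFFF hexadecimal, which is 1,114,112 values.)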
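The three encoding forms can be compared side by side with a short Python sketch. It uses the built-in "utf-8", "utf-16-le", and "utf-32-le" codecs; the little-endian variants are chosen here only so that no byte order mark is added to the output. The function name code_units and the sample characters are assumptions made for this example.

```python
def code_units(ch):
    """Count the code units needed for one character in each encoding form."""
    return {
        "UTF-8": len(ch.encode("utf-8")),            # 8-bit units
        "UTF-16": len(ch.encode("utf-16-le")) // 2,  # 16-bit units
        "UTF-32": len(ch.encode("utf-32-le")) // 4,  # 32-bit units
    }


for ch in ["A", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", code_units(ch))
# U+0041 {'UTF-8': 1, 'UTF-16': 1, 'UTF-32': 1}
# U+20AC {'UTF-8': 3, 'UTF-16': 1, 'UTF-32': 1}
# U+1F600 {'UTF-8': 4, 'UTF-16': 2, 'UTF-32': 1}

# Converting between the forms loses nothing.
text = "A\u20ac\U0001F600"
as_utf16 = text.encode("utf-8").decode("utf-8").encode("utf-16-le")
print(as_utf16.decode("utf-16-le") == text)  # True
```

Only UTF-32 uses one code unit for every character; UTF-8 and UTF-16 trade that fixed width for smaller storage of common text.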
There are other binary computer codes designed for special tasks. For
example, Huffman Code is an unequal length prefix code for compressing
text. It will be discussed separately.
Variations on the Theme
Encryption/decryption is a special case of encoding/decoding in which the
encoding process (or a key that it depends on) is hidden from others, so
that the resulting encoded text is difficult to decode back to its original
form. The original text is usually called the plaintext, and the encoded
text is usually called the ciphertext. The encoding is called enciphering or
encryption, and decoding is often called deciphering or decryption.
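As a toy illustration only, the sketch below uses a Caesar shift, a classical cipher that is trivially breakable and is chosen here purely because it is short. The shift amount plays the role of the hidden part of the process. The names ALPHABET, shift, and the key value 3 are assumptions made for this example.

```python
import string

ALPHABET = string.ascii_uppercase


def shift(text, key):
    """Shift each letter by key positions; other symbols pass through unchanged."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            out.append(ch)
    return "".join(out)


key = 3                           # the secret shared by sender and receiver
ciphertext = shift("HELLO", key)  # enciphering
print(ciphertext)                 # KHOOR
print(shift(ciphertext, -key))    # deciphering gives HELLO
```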
Another special case of this process is when the original message is not in
a conventional alphabet. For example, a conventional telephone converts
sound variations to DC voltage or current fluctuations that are transmitted
across the phone line then converted back to sound. Some cell phones
digitally sample the sound, and transmit a digital signal.
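Digital sampling can be sketched in a few lines of Python. The example measures a pure tone at regular intervals and rounds each measurement to an 8-bit value. The tone frequency, the sampling rate of 8000 samples per second, the 8-bit range, and the name sample_tone are all assumptions chosen for illustration; real phones use more elaborate coding.

```python
import math


def sample_tone(freq_hz, rate_hz, count):
    """Sample a sine tone and quantize each sample to an integer from 0 to 255."""
    samples = []
    for n in range(count):
        amplitude = math.sin(2 * math.pi * freq_hz * n / rate_hz)  # -1.0 to 1.0
        samples.append(round((amplitude + 1) / 2 * 255))           # 0 to 255
    return samples


# One full cycle of a 1000 Hz tone sampled 8000 times per second.
print(sample_tone(1000, 8000, 8))
```

Each number in the result fits in one byte, so the sound has been transformed into the same kind of binary data as the character codes above.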
Photographic and video storage and compression methods such as JPEG
and MPEG transform and compress pixel color value and intensity
information into a storage file.