Basic Character Encoding including ASCII

DAT2343
© Alan T. Pinck / Algonquin College; 2003
Historical Character Usage
Early general purpose computers (dominated by IBM)
supported limited usage of non-numeric characters:
•Identification/headings on printed reports
•Program source code text
Outside of the general computer area, some character
encoding was used for text transmission (telegrams);
initially Morse code, but this was replaced with fixed
length pattern codes when automated equipment began to
be used.
Historical Requirements
Text was not intended for the general public.
• Alphabetic characters were only in upper case
• Relatively few “special characters” (periods,
parentheses, dollar signs, arithmetic operators)
were supplied.
• 10 digits, 26 letters, and fewer than 20 “special”
character symbols meant that fewer than 56 code patterns
were required.
6-Bit Codes
A 6-bit encoding system permits 64 symbols to be
encoded. This was enough provided only upper case
alphabetic symbols and relatively few “special” symbols
were required.
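The arithmetic behind this can be checked with a minimal Python sketch. The function name pattern_count is illustrative only, and the figure of 20 special symbols is the upper bound quoted in the requirements, not an exact count.

```python
def pattern_count(bits: int) -> int:
    """Number of distinct patterns available in a code of the given width."""
    return 2 ** bits

# Upper bound on the symbols needed: 10 digits, 26 upper-case letters,
# and (fewer than) 20 special characters.
required = 10 + 26 + 20

print(pattern_count(6), required, pattern_count(6) >= required)  # 64 56 True
print(pattern_count(5) >= required)                              # False
```

A 5-bit code offers only 32 patterns, which is why 6 bits was the smallest width that met these requirements.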
IBM and Western Union (the telegraph company) stayed
with 6-bit encoding systems after most of the rest of the
computer and data transmission companies moved to a
system which permitted both upper and lower
alphabetics, and more special symbols.
Formation of ASCII
•General user demand for more character symbols
(including lower case alphabetics).
•IBM did not believe that the market demand was
sufficient to move from a 6-bit code
•No other single company controlled a large enough
market share to be able to create a viable system on its
own.
•A group of computer, peripheral, and data transmission
companies joined to establish a standard.
ASCII Basics
•American Standard Code for Information Interchange
•7-bit code provided unique codes for up to 128
different characters
•Some terminal equipment: when idle, the power was
off (which would look like 0000000); other terminal
equipment: when idle, the power was on (which would
look like 1111111). Therefore neither the 0000000 nor
the 1111111 pattern was used for a printable character
(“null” patterns; ASCII assigns them to NUL and DEL).
Extended ASCII
Byte = Collection of bits used to encode a character
ASCII is almost always implemented using an 8-bit
byte (character).
Only the 7-bit patterns were standardized under
ASCII.
Standard 8-bit ASCII codes start with a zero-valued
bit (followed by the 7-bit ASCII code).
“Extended ASCII” codes start with a one-valued bit;
these codes are not standard and vary in meaning
among different manufacturers and equipment.
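The leading-bit rule can be tested with a single bit mask. The following is a small sketch under that assumption; the helper name is_standard_ascii is hypothetical and the sample values are arbitrary.

```python
def is_standard_ascii(byte: int) -> bool:
    """True when the leading (most significant) bit of an 8-bit byte is 0."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return (byte & 0x80) == 0

# Show the bit pattern, the hexadecimal form, and the category of each value.
for value in (0x41, 0x7F, 0x80, 0xE9):
    kind = "standard" if is_standard_ascii(value) else "extended"
    print(f"{value:08b}  {value:02X}h  {kind}")
```

Any byte from 80h to FFh falls in the "extended" group, and its meaning depends on the equipment or code page in use.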
Major ASCII Coding Patterns
•First 32 patterns (when written in hexadecimal, any
patterns starting with 0 or 1): control codes; the most
common of these are 0Ah (Line Feed) and 0Dh
(Carriage Return)
•20(hex): blank; remainder of codes starting with 2(hex)
are “special” characters.
•30(hex): “0”; 31(hex): “1”; etc.
•41(hex): “A”; 42(hex): “B”; etc.
•61(hex): “a”; 62(hex): “b”; etc.
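The ranges listed above can be turned into a simple classifier. This is only a sketch: the function name classify and the category labels are made up for illustration, and 7Fh (DEL) is treated as a control code, which goes slightly beyond the list above.

```python
def classify(code: int) -> str:
    """Place a 7-bit ASCII code into one of the major groups."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit ASCII code")
    if code < 0x20 or code == 0x7F:      # 00h-1Fh, plus DEL
        return "control"
    if code == 0x20:
        return "blank"
    if 0x30 <= code <= 0x39:             # "0".."9"
        return "digit"
    if 0x41 <= code <= 0x5A:             # "A".."Z"
        return "upper-case letter"
    if 0x61 <= code <= 0x7A:             # "a".."z"
        return "lower-case letter"
    return "special"

for code in (0x0A, 0x0D, 0x20, 0x24, 0x31, 0x41, 0x62):
    print(f"{code:02X}h: {classify(code)}")
```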
Sample ASCII Decoding - 1
Suppose we have the bit stream:
010101000110100001100101001000000011…
Our first task would normally be to rewrite this as a series of
pairs of hexadecimal digits:
01010100 01101000 01100101 00100000 0011…
   54       68       65       20    ….
(in actual practice it would be more common for the “bit
stream” to be presented already in pairs of hexadecimal
digits)
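The regrouping step can be sketched in Python as follows. The function name bits_to_hex_pairs is hypothetical, and the sketch assumes the stream starts on a byte boundary; any incomplete group at the end is dropped.

```python
def bits_to_hex_pairs(bits: str) -> list:
    """Split a string of '0'/'1' characters into 8-bit groups and
    render each complete group as a pair of hexadecimal digits."""
    usable = len(bits) - len(bits) % 8   # ignore a trailing partial group
    return [f"{int(bits[i:i + 8], 2):02X}" for i in range(0, usable, 8)]

# The first 32 bits of the sample stream.
stream = "01010100011010000110010100100000"
print(" ".join(bits_to_hex_pairs(stream)))  # 54 68 65 20
```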
Sample ASCII Decoding - 2
Write down the alphabet and beside each letter write
its ASCII code:
A : 41h (lower case add 20h)
B : 42h
C : 43h
….
I : 49h
J : 4Ah
K : 4Bh
…
Z : 5Ah
Remember: digits are 3?h
blank is 20h
LF is 0Ah
CR is 0Dh
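The same table can be generated rather than written by hand, since the letters are in consecutive order starting at 41h. A minimal sketch, with the dictionary names upper and lower chosen only for illustration:

```python
# Upper-case letters occupy 26 consecutive codes starting at 41h.
upper = {chr(0x41 + i): 0x41 + i for i in range(26)}

# Each lower-case letter is its upper-case code plus 20h.
lower = {letter.lower(): code + 0x20 for letter, code in upper.items()}

for letter in "ABCIJKZ":
    small = letter.lower()
    print(f"{letter} : {upper[letter]:02X}h    {small} : {lower[small]:02X}h")
```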
Sample ASCII Decoding - 3
Given the ASCII hexadecimal pattern (as an
example):
54 68 65 20 33 0A 0D 47 6F 61 74 73
Matching these codes to the table we created, we
should have no trouble converting this into the
text:
The 3
Goats
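The manual lookup can be cross-checked with a few lines of Python. This is a sketch only; decode_hex_ascii is a hypothetical helper name, and it relies on Python's built-in "ascii" codec, which rejects any code above 7Fh.

```python
def decode_hex_ascii(pattern: str) -> str:
    """Decode space-separated pairs of hexadecimal digits as 7-bit ASCII."""
    codes = [int(pair, 16) for pair in pattern.split()]
    return bytes(codes).decode("ascii")

text = decode_hex_ascii("54 68 65 20 33 0A 0D 47 6F 61 74 73")
print(repr(text))  # 'The 3\n\rGoats'
```

Printing with repr shows the line feed (0Ah) and carriage return (0Dh) as \n and \r instead of acting on them.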
Note on End-Of-Line Codes
Different operating systems use different
standards for indicating an end of line.
Microsoft uses a two-character sequence:
0Dh 0Ah (carriage return and line feed)
Unix uses only 0Ah (the line feed)
Classic Macintosh (before Mac OS X) uses only 0Dh
(the carriage return)
This can cause some problems when moving
text files from one system to another.
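One common way to deal with this is to convert every end-of-line form to a single target sequence when a file is moved. The sketch below assumes the text is held as raw bytes; the function name normalize_eol and its target parameter are illustrative, not part of any standard tool.

```python
def normalize_eol(data: bytes, target: bytes = b"\x0A") -> bytes:
    """Rewrite 0Dh 0Ah, lone 0Dh, and lone 0Ah line ends as `target`."""
    # Handle the two-byte sequence first so it is not counted as two line ends.
    data = data.replace(b"\x0D\x0A", b"\x0A")
    data = data.replace(b"\x0D", b"\x0A")
    # Every line end is now a single 0Ah; swap in the requested sequence.
    return data.replace(b"\x0A", target)

mixed = b"one\x0D\x0Atwo\x0Dthree\x0Afour"
print(normalize_eol(mixed))                  # b'one\ntwo\nthree\nfour'
print(normalize_eol(mixed, b"\x0D\x0A"))     # b'one\r\ntwo\r\nthree\r\nfour'
```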
End of Lecture