Encoding on the Internet - courses.psu.edu

Encoding on the Internet
Elizabeth J. Pyatt
CETS
© 2001, Penn State University
Computers Do Numbers
• All data on computers are ultimately stored as
numbers
• Letters are assigned numbers via an
encoding system
• Numbers in the encoding system determine
the alphabetical order of the letters
• Keyboards input a number which corresponds
to that letter
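The points above can be illustrated with a minimal sketch. Python is an assumption here (the slides name no language); its built-in ord() and chr() convert between a character and its number, and default sorting compares those numbers.

```python
# Each character is stored as a number; ord() and chr() convert both ways.
letters = ["b", "a", "C", "B"]
codes = [ord(ch) for ch in letters]

# Default string sorting compares the code numbers, so capitals (65-90)
# come before lower-case letters (97-122).
by_code = sorted(letters)

print(codes)      # the number behind each letter
print(by_code)    # the letters ordered by their numbers
print(chr(97))    # the letter stored as number 97
```

Note that sorting by raw number puts "B" before "a", which is why the encoding order and dictionary order only agree within one letter case.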
ASCII Encoding
• ASCII - American Standard Code for
Information Interchange
• Invented in the 1960s
• Limited to 128 (2^7) characters (English only)
• ASCII encoding is supported on all modern computers
• ASCII encodes letters, digits, punctuation and
the blank space character
• Distinguishes capital letters from lower case
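As a small illustration of these limits, the following Python sketch (an assumed language, using only the built-in "ascii" codec) checks the size of the ASCII range, the gap between capital and lower-case codes, and what happens with a non-English letter. The sample strings are made up for the example.

```python
# ASCII uses 7 bits, so it has 2**7 = 128 code positions (0-127).
ascii_size = 2 ** 7

# Capital and lower-case letters have different codes, 32 apart.
gap = ord("a") - ord("A")

# English letters, digits, punctuation and the space all encode fine.
encoded = "Penn State 2001!".encode("ascii")

# An accented letter is outside the 128 ASCII characters.
try:
    "café".encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

print(ascii_size, gap, encoded, fits_in_ascii)
```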
ASCII Chart (Excel)
First Steps Beyond ASCII
• Vendors added an additional 128 characters for
256 total characters (2^8, or “8-bit”)
• Characters #0-127 = ASCII
• Characters #128-255 = non-English letters
and punctuation
• Each accented letter (e.g. á,â or Á) is a
separate character.
• Multiple vendors = multiple standards
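A hedged sketch of the same idea in Python: the three codecs below (cp437 for the IBM PC, cp1252 for Windows, mac_roman for Macintosh) are chosen here as examples of vendor encodings and are not named on the slide.

```python
# 8 bits give 2**8 = 256 code positions: 0-127 stay ASCII, 128-255 are new.
eight_bit_size = 2 ** 8

# Within one vendor encoding (Windows-1252 here), each accented letter
# is its own single-byte character with its own number.
accented = {ch: ch.encode("cp1252")[0] for ch in "áâÁ"}

# Different vendors put the same letter at different numbers.
vendor_codes = {
    name: "é".encode(name)[0] for name in ("cp437", "cp1252", "mac_roman")
}

print(eight_bit_size)
print(accented)
print(vendor_codes)
```

All of the resulting numbers fall in the 128-255 range, but no two vendors agree on where "é" lives.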
ISO-8859-1 / Latin 1
• Internet standard for English and Western
European Languages is ISO-8859-1
• ISO = International Organization for Standardization
• 8859 = encoding standard
• 1 = 1st one registered at ISO
• Latin / Roman = English alphabet
• Almost identical to Windows-1252 encoding
• Differs from “MacRoman” on Macintosh
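To make the comparison concrete, here is a minimal Python sketch that decodes the same bytes under the three encodings. It relies on Python's built-in codecs "latin-1", "cp1252" and "mac_roman"; the choice of byte 0xE9 as the sample is arbitrary.

```python
# The same byte means the same letter in Latin-1 and Windows-1252,
# but a different letter in MacRoman.
sample = bytes([0xE9])
as_latin1 = sample.decode("latin-1")
as_cp1252 = sample.decode("cp1252")
as_macroman = sample.decode("mac_roman")

# Find every byte value where Windows-1252 and Latin-1 disagree.
# A few Windows-1252 positions are unassigned, so errors="replace"
# is used to avoid an exception on those bytes.
differing = [
    b for b in range(256)
    if bytes([b]).decode("cp1252", errors="replace")
    != bytes([b]).decode("latin-1")
]

print(as_latin1, as_cp1252, as_macroman)
print(hex(differing[0]), hex(differing[-1]), len(differing))
```

The two encodings differ only in a block of 32 positions, where Windows-1252 adds characters such as curly quotes and the Euro sign.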
Latin-1 vs. Mac Roman (GIF)
Encoding Non-Roman Scripts
• Alternate encodings developed for other
scripts like Russian, Arabic, Greek, Hebrew
• Template is:
– Character #0-127=ASCII
– Character #128-255=Non-Roman script
• Some scripts also developed multiple
encodings, typically an ISO version and a
Windows version (e.g. Hebrew = ISO-8859-8
or Windows-1255)
Encoding Schemes
            Hebrew      Greek       Cent Eur                  Arabic
Encoding    ISO-8859-8  ISO-8859-7  ISO-8859-2                ISO-8859-6
#0-127      ASCII       ASCII       ASCII                     ASCII
#128-255    Hebrew      Greek       Special Accented Letters  Arabic
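The template in this table can be checked with a short Python sketch, assuming Python's built-in codecs for the four ISO encodings. One byte below 128 and one byte above 127 are decoded under each encoding; the two sample bytes are arbitrary choices.

```python
# The four ISO encodings from the table, keyed by script.
encodings = {
    "Hebrew": "iso-8859-8",
    "Greek": "iso-8859-7",
    "Cent Eur": "iso-8859-2",
    "Arabic": "iso-8859-6",
}

ascii_byte = bytes([0x41])   # a position in the shared #0-127 half
high_byte = bytes([0xE1])    # a position in the script-specific #128-255 half

# The lower half decodes to the same ASCII letter everywhere.
low_half = {script: ascii_byte.decode(enc) for script, enc in encodings.items()}

# The upper half decodes to a different character in each script.
high_half = {script: high_byte.decode(enc) for script, enc in encodings.items()}

print(low_half)
print(high_half)
```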
16 Bit and Beyond
• Chinese, Japanese and Korean have more
than 256 characters
• 16-Bit encodings with 2^16 or 65,536
characters developed
• Unicode, which attempts to combine all
modern scripts into one super encoding
block, is currently being developed
• Increasing Unicode support on Windows and
Macintosh, but still limited in application
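A brief Python sketch of the arithmetic, under the assumption that UTF-16 (big-endian, via Python's "utf-16-be" codec) stands in for a 16-bit encoding; the sample characters are chosen only for illustration.

```python
# 16 bits give 2**16 = 65,536 code positions.
sixteen_bit_size = 2 ** 16

# A Chinese character has a number far above 255, so it cannot fit
# in any 8-bit encoding, but it fits in two bytes (16 bits).
ch = "中"
code_point = ord(ch)
utf16 = ch.encode("utf-16-be")

# Unicode gives letters from different scripts numbers in one shared space.
mixed = {c: ord(c) for c in "Aαא中"}

print(sixteen_bit_size, code_point, utf16.hex())
print(mixed)
```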
How browsers read a site
• Web site specifies encoding to browser
• Browser matches encoding with the
right font on your machine
• Browser displays the appropriate
characters (English and non-English)
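The first step above (the site telling the browser its encoding) can be sketched in Python. This is only an illustration under stated assumptions: decode_page is a hypothetical helper, the encoding is assumed to arrive in a Content-Type header value, and font matching is not modeled at all. The header parsing uses the standard library's email.message.Message.

```python
from email.message import Message


def decode_page(content_type: str, body: bytes, default: str = "iso-8859-1") -> str:
    """Decode page bytes using the charset named in a Content-Type value.

    Hypothetical helper: falls back to `default` when no charset is given.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or default
    return body.decode(charset)


# A Greek page that declares its encoding is decoded correctly.
greek_bytes = "Ελληνικά".encode("iso-8859-7")
text = decode_page("text/html; charset=ISO-8859-7", greek_bytes)

# A page with no declared encoding is read with the Latin-1 default.
fallback = decode_page("text/html", b"caf\xe9")

print(text, fallback)
```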
How to mess up the browser
• Web site doesn’t specify encoding, so
browser stays on default (usually Latin-1)
• Web site specifies font not on user’s machine
• Font doesn’t match encoding
• Font doesn’t have all the right characters
(e.g. € (Euro currency symbol))
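Two of these failures can be reproduced in a minimal Python sketch, assuming the built-in "iso-8859-8" and "iso-8859-1" codecs. The Hebrew sample word is chosen only for illustration, and fonts are not modeled.

```python
# A Hebrew page stored in ISO-8859-8 ...
hebrew_bytes = "שלום".encode("iso-8859-8")

# ... read with the Latin-1 default shows accented Western letters instead.
wrong = hebrew_bytes.decode("iso-8859-1")
right = hebrew_bytes.decode("iso-8859-8")

# The Euro sign has no position in Latin-1 at all.
try:
    "€".encode("iso-8859-1")
    euro_in_latin1 = True
except UnicodeEncodeError:
    euro_in_latin1 = False

print(wrong, right, euro_in_latin1)
```

The bytes are identical in both cases; only the encoding used to interpret them changes what the reader sees.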
Keyboards & Fonts
• Normal fonts (e.g. Times, Arial) match the
character to its ASCII/Latin-1 number based
on the keyboard
• “Dingbat Fonts” (e.g. Symbol, Wingdings) do
not match the character to the ASCII Code
• Most keyboards still access only 128
characters at a time (Mac can do 256)
• Therefore, older non-English script fonts (e.g.
Symbol) do not always match script encoding
Where to get good fonts
• Microsoft provides free, properly
encoded fonts with Windows NT and
Windows 2000
• Apple provides free, properly encoded
fonts via its Language Kits (free with
System 9)
• Third party fonts are available (but can
be glitchy)