2. Data Formats
Chapter 3
ITEC 1011 Introduction to Information Technologies

Introduction
• Examples

  Real World Data    Input device      Computer Data
  Dear Mom:          Keyboard          10110010…
  (image)            Digital camera    10110010…

(pp. 59-61)

Format Must Be Appropriate
• The internal representation must be appropriate for the type of processing to take place (e.g., text, images, sound)

Rules/Conventions
• Proprietary formats
  – Unique to a product or company
  – E.g., Microsoft Word, Corel WordPerfect, IBM Lotus Notes
• Standards
  – Evolve in two ways:
    • Proprietary formats become de facto standards (e.g., Adobe PostScript, Apple QuickTime)
    • A committee is struck to solve a problem (e.g., the Moving Picture Experts Group, MPEG)

(pp. 61-62)

Standards Organizations
• ISO – International Organization for Standardization
• CSA – Canadian Standards Association
• ANSI – American National Standards Institute
• IEEE – Institute of Electrical and Electronics Engineers
• Etc.

Examples of Standards

  Type of Data              Standards
  Alphanumeric              ASCII, EBCDIC, Unicode
  Image                     JPEG, GIF, PCX, TIFF
  Motion picture            MPEG-2, QuickTime
  Sound                     Sound Blaster, WAV, AU
  Outline graphics/fonts    PostScript, TrueType, PDF

Why Standards?
• Standards are “arbitrary”
• They exist because they are
  – Convenient
  – Efficient
  – Flexible
  – Appropriate
  – Etc.

Alphanumeric Data
• Problem: Distinguishing between the number 123 (one hundred and twenty-three) and the characters “123” (one, two, three)
• Four standards for representing letters (alpha) and numbers
  – BCD – Binary-coded decimal
  – ASCII – American Standard Code for Information Interchange
  – EBCDIC – Extended Binary Coded Decimal Interchange Code
  – Unicode

(pp. 63-69)
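
As a minimal sketch of the problem stated on the Alphanumeric Data slide (Python is used only for illustration, and the variable names are hypothetical), the number 123 is held as a single binary value, while the characters “123” are held as three separate character codes, shown here in ASCII:

```python
# The number 123: one integer value.
number = 123
number_bits = format(number, "b")          # binary digits of the value

# The characters "123": three separate symbols, one code per character.
text = "123"
text_bits = [format(byte, "08b") for byte in text.encode("ascii")]

print(number_bits)   # 1111011
print(text_bits)     # ['00110001', '00110010', '00110011']

# Moving between the two representations is an explicit conversion step.
assert int(text) == number
assert str(number) == text
```

The two results share no bit pattern, which is why a program has to know which representation it is looking at before it can process the data.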

Binary-Coded Decimal (BCD)
• Four bits per digit

  Digit    Bit pattern
  0        0000
  1        0001
  2        0010
  3        0011
  4        0100
  5        0101
  6        0110
  7        0111
  8        1000
  9        1001

• Note: the following bit patterns are not used: 1010, 1011, 1100, 1101, 1110, 1111

Example
• 7093₁₀ = ? (in BCD)

  7       0       9       3
  0111    0000    1001    0011

The Problem
• Representing text strings, such as “Hello, world”, in a computer

Codes and Characters
• Each character is coded as a byte
• The most common coding system is ASCII (pronounced ass-key)
• ASCII = American Standard Code for Information Interchange
• Defined in ANSI document X3.4-1977

ASCII Features
• 7-bit code
• 8th bit is unused (or used for a parity bit)
• 2⁷ = 128 codes
• Two general types of codes:
  – 95 are “graphic” codes (displayable on a console)
  – 33 are “control” codes (control features of the console or communications channel)

ASCII Chart

        000   001   010   011   100   101   110   111
  0000  NUL   DLE   SP    0     @     P     `     p
  0001  SOH   DC1   !     1     A     Q     a     q
  0010  STX   DC2   "     2     B     R     b     r
  0011  ETX   DC3   #     3     C     S     c     s
  0100  EOT   DC4   $     4     D     T     d     t
  0101  ENQ   NAK   %     5     E     U     e     u
  0110  ACK   SYN   &     6     F     V     f     v
  0111  BEL   ETB   '     7     G     W     g     w
  1000  BS    CAN   (     8     H     X     h     x
  1001  HT    EM    )     9     I     Y     i     y
  1010  LF    SUB   *     :     J     Z     j     z
  1011  VT    ESC   +     ;     K     [     k     {
  1100  FF    FS    ,     <     L     \     l     |
  1101  CR    GS    -     =     M     ]     m     }
  1110  SO    RS    .     >     N     ^     n     ~
  1111  SI    US    /     ?     O     _     o     DEL

ASCII Chart: Bit Positions
• The 3-bit column heading supplies the most significant bits of the code
• The 4-bit row heading supplies the least significant bits of the code
• E.g., ‘a’ = 1100001 (column 110, row 0001)

ASCII Chart: Groups of Codes
• 95 graphic codes: columns 010 through 111, except DEL
• 33 control codes: columns 000 and 001, plus DEL
• Alphabetic codes: A to Z in columns 100 and 101; a to z in columns 110 and 111
• Numeric codes: 0 to 9 in column 011
• Punctuation, etc.: the remaining graphic codes, including the space
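
The grouping of the ASCII chart into control, alphabetic, numeric, and punctuation codes can be checked with a short sketch. This is an illustration only, not part of the original slides; the function name `classify` and the group labels are hypothetical. It also splits the code for ‘a’ into its 3-bit column heading and 4-bit row heading.

```python
def classify(code: int) -> str:
    """Return the chart group for a 7-bit ASCII code (0 through 127)."""
    if not 0 <= code <= 127:
        raise ValueError("ASCII codes are 7 bits: 0 through 127")
    if code < 32 or code == 127:      # columns 000 and 001, plus DEL
        return "control"
    ch = chr(code)
    if ch.isalpha():
        return "alphabetic"
    if ch.isdigit():
        return "numeric"
    return "punctuation"              # remaining graphic codes, including space

# Count how many of the 128 codes fall into each group.
counts = {}
for code in range(128):
    group = classify(code)
    counts[group] = counts.get(group, 0) + 1

graphic = 128 - counts["control"]
print("control codes:", counts["control"])   # control codes: 33
print("graphic codes:", graphic)             # graphic codes: 95

# Split a code into the chart's column (high 3 bits) and row (low 4 bits).
bits = format(ord("a"), "07b")
column, row = bits[:3], bits[3:]
print(bits, column, row)                     # 1100001 110 0001
```

The counts agree with the figures on the ASCII Features slide: 33 control codes and 95 graphic codes.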

“Hello, world” Example

  Character    Binary      Hexadecimal    Decimal
  H            01001000    48             72
  e            01100101    65             101
  l            01101100    6C             108
  l            01101100    6C             108
  o            01101111    6F             111
  ,            00101100    2C             44
  (space)      00100000    20             32
  w            01110111    77             119
  o            01101111    6F             111
  r            01110010    72             114
  l            01101100    6C             108
  d            01100100    64             100

Common Control Codes

  Code    Hexadecimal    Meaning
  CR      0D             carriage return
  LF      0A             line feed
  HT      09             horizontal tab
  DEL     7F             delete
  NUL     00             null

Terminology
• Learn the names of the special symbols
  – [ ]   brackets
  – { }   braces
  – ( )   parentheses
  – @     commercial ‘at’ sign
  – &     ampersand
  – ~     tilde
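
The “Hello, world” table in this section can be reproduced with a few lines of code. The following is a sketch only; the helper name `ascii_row` is hypothetical and is not taken from the slides. It looks up each character's code and prints it in binary, hexadecimal, and decimal.

```python
def ascii_row(ch: str):
    """Return (binary, hexadecimal, decimal) for one ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return format(code, "08b"), format(code, "02X"), code

message = "Hello, world"
rows = [(ch, *ascii_row(ch)) for ch in message]

for ch, binary, hexadecimal, decimal in rows:
    print(f"{ch!r:5} = {binary} = {hexadecimal} = {decimal}")

# The same helper works for control codes such as carriage return.
print(ascii_row("\r"))   # ('00001101', '0D', 13)
```

Both occurrences of ‘o’ produce the same row (01101111, 6F, 111), since a character's code does not depend on where it appears in the string.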

Escape Sequences
• Extend the capability of the ASCII code set
• For controlling terminals and formatting output
• Defined by ANSI in documents X3.41-1974 and X3.64-1977
• The escape code is ESC = 1B₁₆
• An escape sequence begins with two codes:

  ESC     [
  1B₁₆    5B₁₆

Examples
• Erase display:  ESC [ 2 J
• Erase line:     ESC [ K

EBCDIC
• Extended BCD Interchange Code (pronounced ebb’-se-dick)
• 8-bit code
• Developed by IBM
• Rarely used today
• IBM mainframes only

Unicode
• 16-bit standard
• Developed by a consortium
• Intended to supersede older 7- and 8-bit codes

Unicode Version 2.1
• 1998
• Improves on version 2.0
• Includes the Euro sign (20AC₁₆ = €)
• From the standard:
  “…contains 38,887 distinct coded characters derived from the supported scripts. These characters cover the principal written languages of the Americas, Europe, the Middle East, Africa, India, Asia, and Pacifica.”
  http://www.unicode.org

Keyboard Input
• Key (“scan”) codes are converted to ASCII
• ASCII code sent to host computer
• Received by the host as a “stream” of data
• Stored in buffer
• Processed
• Etc.

(p. 69)
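
Returning to the escape sequences described earlier in this section, the sketch below builds the two example sequences as character strings and lists the hexadecimal code of each character. It is an illustration under stated assumptions: the constant names (`ESC`, `CSI`, `ERASE_DISPLAY`, `ERASE_LINE`) and the helper `hex_codes` are chosen here and do not come from the slides.

```python
ESC = "\x1b"                 # the escape code, 1B hexadecimal
CSI = ESC + "["              # the two-code prefix: 1B 5B
ERASE_DISPLAY = CSI + "2J"   # ESC [ 2 J
ERASE_LINE = CSI + "K"       # ESC [ K

def hex_codes(sequence: str) -> str:
    """Show each character of a sequence as a two-digit hexadecimal code."""
    return " ".join(format(byte, "02X") for byte in sequence.encode("ascii"))

print(hex_codes(ERASE_DISPLAY))   # 1B 5B 32 4A
print(hex_codes(ERASE_LINE))      # 1B 5B 4B
```

The sketch only prints the codes. Writing `ERASE_DISPLAY` itself to a terminal that honours these sequences would be expected to clear the screen, which is why it is not done here.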

Shift Key
• Inhibits bit 5 in the ASCII code

  Key(s)       ASCII code (bits 6 5 4 3 2 1 0)    Character
  a            1 1 0 0 0 0 1                      a
  Shift + a    1 0 0 0 0 0 1                      A

Control Key
• Inhibits bits 5 and 6 in the ASCII code

  Key(s)      ASCII code (bits 6 5 4 3 2 1 0)    Character
  c           1 1 0 0 0 1 1                      c
  Ctrl + c    0 0 0 0 0 1 1                      ETX (control code)

Other Input
• OCR – optical character recognition
• Bar code readers
• Voice/audio input
• Punched cards
• Images / objects
• Pointing devices

(pp. 69-86)

OCR
• Page of text (“Hello, world”) → optical scan → computer file (10110110…)

Bar Codes
• An automatic identification (Auto ID) technology that streamlines identification and data collection
• See http://www.digital.net/barcoder/barcode.html

Voice/audio Input
• Input device: microphone
• Audio input is “digitized” and stored
• Processed in two ways
  – As is (no recognition)
  – Recognized and converted to alphanumeric data (ASCII)
• Audio input → digitize → 10110010…
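
The Shift Key and Control Key slides in this section describe the keys as clearing particular bits of a letter's ASCII code. A minimal sketch of that bit masking follows; the names `BIT_5`, `BIT_6`, `with_shift`, and `with_ctrl` are hypothetical, and the sketch models lowercase letters only, as in the slides' examples.

```python
BIT_5 = 0b0100000   # bit 5 of the 7-bit code
BIT_6 = 0b1000000   # bit 6 of the 7-bit code

def with_shift(ch: str) -> str:
    """Clear bit 5 of a lowercase letter's code, as on the Shift Key slide."""
    return chr(ord(ch) & ~BIT_5)

def with_ctrl(ch: str) -> int:
    """Clear bits 5 and 6, giving the code of the matching control character."""
    return ord(ch) & ~(BIT_5 | BIT_6)

print(format(ord("a"), "07b"), "->", format(ord(with_shift("a")), "07b"))
# 1100001 -> 1000001

print(with_shift("a"))                 # A
print(format(with_ctrl("c"), "07b"))   # 0000011
```

Code 0000011 is ETX in the ASCII chart, matching the Control Key slide. Keys other than letters (digits and punctuation, for example) do not follow this simple rule, so the sketch should not be read as a general keyboard model.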

Punched Cards
• Invented by Herman Hollerith, whose Tabulating Machine Company was a forerunner of IBM
• Each card holds 80 characters

Images
• Typically images are pictures that are optically scanned and saved as a “bit map” or in some other format
• Many formats – GIF, JPEG, …

Typical “Save As” Dialog
[Figure: typical “Save As” dialog]

Objects
• Images made of geometrically definable shapes
• Offer efficiency, flexibility, small size, etc.

Pointing Devices
• Originally used for specifying coordinates (x, y) for graphical input
• Today used as a general-purpose device for “graphical user interfaces” (GUIs)

Thank you