SCSC 311 Information Systems: Hardware and Software

Chapter 3

Objectives
• Numbering systems
• Various data representation methods
• The representation of nonnumeric data
• Data structures

Data Representation and Processing
Capabilities required of any (mechanical, electrical, or optical) information processor:
• Recognizing external data and converting it to an appropriate internal format
• Storing and retrieving data internally
• Transporting data among internal storage and processing components
• Manipulating data to produce desired results or decisions

Goals of Computer Data Representation (1)
• Compactness: the number of bits used to represent a numeric value
  • The more compact a data representation format, the less expensive it is to implement in computer hardware.
  • Q: Which format is more compact, binary or decimal?
• Range
  • In a given data representation, the more bits used, the larger the range.
  • But a large numeric range has a cost …
• Accuracy
  • Precision of representation increases with the number of data bits used.
  • In some cases, quantities must be manipulated and stored as approximations. Each approximation carries a degree of error, and errors can compound across computations.
• Optimum coding method for each type of data or each type of operation
  • As the number of bits is limited (fixed) in any hardware, we need to find the optimal tradeoff between range and accuracy. How?

Goals of Computer Data Representation (2)
• Ease of manipulation
  • Refers to machine efficiency when executing processor instructions.
  • The data representation format determines the complexity of the circuitry.
  • Discussion: the complexity of the circuitry when using decimal versus binary data representation.
• Standardization
  • Ensures correct and efficient data transmission among computer systems.
  • Various organizations have created standard data encoding methods, which provide flexibility to combine hardware from different vendors with minimal data communication problems, e.g. ASCII, Unicode.
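As a rough illustration of the compactness and range goals listed above, the following Python sketch shows how the range of an unsigned value grows with the number of bits, and how many digits the same value needs in different bases. The function names `unsigned_range` and `digits_needed` are illustrative assumptions, not part of the course material.

```python
def unsigned_range(n_bits):
    """Return the (min, max) values an n-bit unsigned binary integer can hold."""
    return 0, 2 ** n_bits - 1


def digits_needed(value, base):
    """Count the digits needed to write a nonnegative integer in the given base."""
    digits = 1
    while value >= base:
        value //= base
        digits += 1
    return digits


# Each extra bit doubles the number of representable values.
for n in (8, 16, 32):
    print(n, unsigned_range(n))

# The same quantity written in binary and in decimal.
print(digits_needed(255, 2))   # 8 binary digits
print(digits_needed(255, 10))  # 3 decimal digits
```

Note that a digit count alone does not settle the compactness question: a binary digit needs a two-state device, while a decimal digit needs a device (or group of devices) that can distinguish ten states.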
Q: Why do current electronic computers represent data in binary format?
Ans:
• Binary numbers represented as electrical signals can be reliably transported among computer system components.
• Binary numbers represented as electrical signals can be processed by two-state electrical devices that are relatively easy to design and fabricate.
• Binary numbers correspond directly with values in Boolean logic.

Automated Data Processing
• Computers represent data electrically and process it with electrical switches.
• Two-state (ON/OFF) electrical switches are well suited for binary format (0/1).
• Electrical switches → processing circuits → CPU
• Automated data processing combines electronics and mathematics, e.g. A + B = C.

Positional Numbering System
• Key terms: base / radix, radix point
• Each digit has a symbol and a position value.
• The value of the entire string is the sum of the values of all digits within the string.
• Base / radix: the multiplier that describes the difference between one position and the next.
• Radix point: the point that separates the fractional part of a numeric value from the whole part.

Two common positional numbering systems:
• Decimal notation
  • Uses 10 as its base
  • 10 possible values (0, 1, 2, … 9) per digit
• Binary notation
  • Uses 2 as its base
  • Two possible values (0 or 1) per digit

Binary, Decimal Notations
Conversion: binary & decimal
• E.g. 1: (101101.101)2 = ( )10
• E.g. 2: (45)10 = ( )2
Ans: binary to decimal

Other data formats: Hexadecimal & Octal (self-study)
Hexadecimal
• Uses 16 as its base
• Often used to designate memory addresses
• The primary advantage of hexadecimal notation, as compared to binary notation, is its compactness.
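The positional rule above (the value of a string is the sum of each digit times its position value) can be checked with a short Python sketch. The helper names `to_decimal` and `from_decimal` are assumptions made for illustration. The sketch does not validate that every digit is legal for the chosen base.

```python
SYMBOLS = "0123456789ABCDEF"


def to_decimal(digits, base):
    """Convert a digit string, optionally with a radix point, to its decimal value."""
    whole, _, frac = digits.upper().partition(".")
    value = 0
    for ch in whole:
        # Moving one position left multiplies the position value by the base.
        value = value * base + SYMBOLS.index(ch)
    weight = 1 / base
    for ch in frac:
        # Positions right of the radix point have values base^-1, base^-2, ...
        value += SYMBOLS.index(ch) * weight
        weight /= base
    return value


def from_decimal(n, base):
    """Convert a nonnegative integer to a digit string by repeated division."""
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)
        out.append(SYMBOLS[remainder])
    return "".join(reversed(out))


print(to_decimal("101101.101", 2))  # 45.625
print(from_decimal(45, 2))          # 101101
print(from_decimal(45, 16))         # 2D
```

The last line shows the compactness of hexadecimal: the six-digit binary string 101101 shrinks to two hexadecimal digits.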
Octal
• Uses 8 as its base
• Expresses large numeric values in:
  • One-third the length of corresponding binary notation (each octal digit stands for three bits)
  • About four-thirds the length of corresponding hexadecimal notation (each hexadecimal digit stands for four bits)

Hexadecimal Notation (self-study)

Octal to Decimal (self-study)
(321)8, with position values 8^2, 8^1, 8^0:
  3 x 8^2 = 3 x 64 = 192
  2 x 8^1 = 2 x 8  = 16
  1 x 8^0 = 1 x 1  = 1
  Sum: (209)10

Hexadecimal to Decimal (self-study)
(ABC)16, with position values 16^2, 16^1, 16^0:
  A x 16^2 = 10 x 256 = 2560
  B x 16^1 = 11 x 16  = 176
  C x 16^0 = 12 x 1   = 12
  Sum: (2748)10

CPU Data Types
• Primitive data types
  • Integer
  • Real number
  • Character
  • Boolean
  • Memory address
• The representation format for each type balances compactness, accuracy, ease of manipulation, and standardization.

Integers
• An integer is a whole number, a value that does not have a fractional part.
• Most CPUs provide an unsigned integer data type, which stores nonnegative integers as ordinary binary numbers.
• Other binary notations:
  • Excess notation
  • Two’s complement notation

Excess Notation
• Can be used to represent signed integers.
• Divides a range of binary numbers in half: the lower half for negative values, the upper half for nonnegative values (as shown in figure).
• The leftmost bit represents the sign (1 for nonnegative and 0 for negative values).
• To represent a specific integer value in excess notation, you must know how many bits are to be used.
• Range: from -2^(n-1) to 2^(n-1) - 1
• Exercise: in 8-bit excess notation,
  • e.g. 1: (25)10 = ( )2
  • e.g. 2: (-25)10 = ( )2

Two’s Complement Notation
• Nonnegative integer = ordinary binary value
• Negative integer = complement of the positive binary value + 1
• Range: from -2^(n-1) to 2^(n-1) - 1
• Exercise: in 8-bit two’s complement notation,
  • e.g. 1: (25)10 = ( )2
  • e.g. 2: (-25)10 = ( )2
  • e.g. 3: (0000 1111)2 = ( )10
  • e.g. 4: (1111 1011)2 = ( )10

Two’s Complement Notation
Q: Why is two’s complement common in CPU design?
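The excess and two’s complement exercises above can be checked with a small Python sketch. It assumes the excess bias is 2^(n-1), which matches the range and sign-bit rule stated on the slides. The function names are illustrative, not taken from the course material.

```python
def to_excess(value, n_bits=8):
    """Encode a signed integer in excess notation with a bias of 2^(n-1)."""
    bias = 1 << (n_bits - 1)
    if not -bias <= value <= bias - 1:
        raise OverflowError("value does not fit in the given number of bits")
    return format(value + bias, f"0{n_bits}b")


def to_twos_complement(value, n_bits=8):
    """Encode a signed integer: negatives are complement of the positive value + 1."""
    limit = 1 << (n_bits - 1)
    if not -limit <= value <= limit - 1:
        raise OverflowError("value does not fit in the given number of bits")
    mask = (1 << n_bits) - 1
    if value >= 0:
        raw = value
    else:
        raw = ((-value ^ mask) + 1) & mask  # flip every bit, then add 1
    return format(raw, f"0{n_bits}b")


def from_twos_complement(bits):
    """Decode a two's complement bit string into a signed integer."""
    raw = int(bits, 2)
    # A leading 1 marks a negative value: subtract 2^n to recover it.
    return raw - (1 << len(bits)) if bits[0] == "1" else raw


print(to_excess(25), to_excess(-25))                    # 10011001 01100111
print(to_twos_complement(25), to_twos_complement(-25))  # 00011001 11100111
print(from_twos_complement("00001111"))                 # 15
print(from_twos_complement("11111011"))                 # -5
```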
Two’s Complement Notation
Ans: Two’s complement is awkward for people, but it is highly compatible with digital electronic circuitry:
• Only two logic circuits are required to perform addition on single-bit values.
• Adding two’s complement numbers requires no special processing if the operands have opposite signs: the sign of the result is determined automatically.
• Subtraction can be performed as addition of a negative value.

Two’s Complement Notation
• e.g. 1: 0000 1111 + 1111 1011
• e.g. 2: 0110 0100 - 0001 0110
• e.g. 3: 0110 0100 + 0111 0011

Range and Overflow
• Overflow
  • Occurs when the absolute value of a computational result contains too many bits to fit into a fixed-width data format
  • Range: -2^(n-1) to 2^(n-1) - 1
  • Treated as an error by the CPU
• Avoiding overflow
  • Double precision data formats: combine two adjacent fixed-length data items to hold a single value, e.g. long integer
  • Careful programming

Real Numbers
• Real numbers contain both whole and fractional components.
• The components must be separated to be represented within computer circuitry:
  • Fixed radix point notation (simple but inflexible)
  • Floating point notation (complex but flexible)

Floating Point Notation
• Scientific notation: a x 10^b, where the exponent b is an integer and the mantissa a is any real number in the range of 1 to 10, excluding 10.
• Floating point notation: similar to scientific notation, except that 2 is the base: value = mantissa x 2^exponent
• Trades numeric range for accuracy: a value can have many digits of precision for large or small magnitudes, but not both simultaneously.
• Floating point numbers are less accurate and more difficult to process than two’s complement format.

IEEE 32-bit Floating Point Format
• one leading sign bit;
• 23-bit mantissa coded as an ordinary binary number.
• 8-bit exponent coded in excess notation

Range, Overflow, and Underflow
• Range: limited by the number of bits in a floating point string and the formats of the mantissa and exponent fields
  • The number of bits in the mantissa → the number of significant digits
  • The number of bits in the exponent → the number of possible bit positions to the right / left of the radix point
• Overflow: occurs within the exponent
  • Large positive exponent → floating point number with large absolute value
• Underflow: occurs when the absolute value of a negative exponent is too large to fit within the allocated bits
• Examples …

Precision and Truncation
• Precision
  • Accuracy is reduced as the number of digits available to store the mantissa is reduced.
  • More bits in the exponent part → a larger range
• Truncation
  • Stores the numeric value in the mantissa until the available bits are consumed; discards the remaining bits
  • More values have non-terminating representations in binary than in decimal, e.g. (0.1)10 = (?)2
  • Causes an error or approximation
  • Problem: when truncated values are used as input to a computation, the approximations can be magnified.

Processing Complexity
• Floating point formats: although floating point notation is optimized for processing efficiency, it requires complex processing circuitry (which translates to a difference in speed).
• Programmers should never use real numbers when an integer will suffice (speed and accuracy), e.g.
In monetary systems,

Character Data
• Character data are represented indirectly by defining a table that assigns numeric values to individual characters (letters, numerals, punctuation marks, special-purpose symbols).

Common Coding Methods
• EBCDIC (Extended Binary Coded Decimal Interchange Code)
  • Developed by IBM in the 1960s
• ASCII (American Standard Code for Information Interchange)
  • Subset of Unicode
  • Defines a number of device control codes (CR, LF …)
  • Some limitations
• Unicode

A partial list of ASCII and EBCDIC (self-study)

ASCII Limitations (self-study)
• Insufficient range
  • Uses a 7-bit code, providing 128 table entries (33 for device control)
  • 95 printable characters can be represented
• An English-based coding method

Unicode
• Assigns nonnegative integers to represent individual printable characters (like ASCII)
• Larger coding table than ASCII
  • Originally a 16-bit code providing 65,536 table entries; later versions extend the code space to U+10FFFF
  • Can represent written text from all modern languages
• Widely supported in modern software

Boolean and Memory Address
• Boolean
  • Has only two data values: true and false
  • The most concise coding format; only a single bit is required
• Memory address
  • Primary memory is a series of contiguous bytes of storage.
  • A memory address is the unique identifying number of a byte in memory.

Memory Address
Two memory address models:
• Flat memory addresses
  • Memory bytes are identified by a series of nonnegative integers.
  • Minimizes the complexity of processor circuitry
• Segmented memory addresses
  • Use multiple integers as a memory address, e.g. the segmented memory model in the IBM PC
  • Pages are identified by sequential nonnegative integers.
  • Each byte in a page is identified by a sequential nonnegative integer.
  • Therefore, each byte of memory has a two-part address (page number and byte number within the page), and a new data type is required.
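The two-part address described above can be sketched in Python as a small record type plus conversions to and from a flat address. The page size of 4096 bytes and the names `PagedAddress`, `to_paged`, and `to_flat` are assumptions chosen for illustration. This is a simplified model and does not reproduce the actual IBM PC addressing scheme.

```python
from typing import NamedTuple

PAGE_SIZE = 4096  # hypothetical page size in bytes; the slides do not give one


class PagedAddress(NamedTuple):
    """A two-part address: page number and byte number within the page."""
    page: int
    offset: int


def to_paged(flat, page_size=PAGE_SIZE):
    """Split a flat byte address into (page, offset)."""
    page, offset = divmod(flat, page_size)
    return PagedAddress(page, offset)


def to_flat(addr, page_size=PAGE_SIZE):
    """Combine a two-part address back into a single nonnegative integer."""
    return addr.page * page_size + addr.offset


addr = to_paged(10000)
print(addr)           # PagedAddress(page=2, offset=1808)
print(to_flat(addr))  # 10000
```

The `PagedAddress` type reflects the point made on the slide: once an address has two parts, it no longer fits an ordinary integer type and needs its own data type.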
Data Structures
• Data structures are related groups of primitive data elements organized for a type of common processing.
• They are defined and manipulated within software.
• Commonly used data structures: array, linked list, record, table, file, index, and object
• Some data structures are supported by system software: string, record, file
• Other data structures are usually supported by programming languages: array, indexed file, database structures

Pointers and Address
• Pointer: a data element that contains the address of another data element
  • Many data structures use pointers to link primitive data components.
• Address: the location of a data element within a storage device

Array and List
• List: a set of related data values
• Array: an ordered list in which each element can be referenced by an index to its position, e.g. a character array

Linked Lists (self-study)
• Data structures that use pointers so list elements can be scattered among nonsequential storage locations
  • Singly linked lists
  • Doubly linked lists
  • Circular linked lists
  • Etc.
• Easier to expand or shrink than an array

Figure: A linked list in RAM, panels (a) and (b) (self-study)

e.g. Insert a new element in a singly linked list (self-study)

Q: How do you insert a new element in an array (in contiguous memory)?
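One way to compare the two insertions is the Python sketch below. Python has no raw pointers, so object references stand in for them, and a fixed-size Python list stands in for an array in contiguous memory. The names `Node`, `insert_after`, `to_list`, and `array_insert` are illustrative assumptions.

```python
class Node:
    """One element of a singly linked list: a value plus a pointer to the next node."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def insert_after(node, value):
    """Insert a new node after `node`; only pointers change, no element moves."""
    node.next = Node(value, node.next)
    return node.next


def to_list(head):
    """Walk the chain of pointers and collect the values in order."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


def array_insert(arr, length, index, value):
    """Insert into a fixed-capacity array by shifting later elements one slot right."""
    if length >= len(arr):
        raise OverflowError("array is full")
    for i in range(length, index, -1):
        arr[i] = arr[i - 1]
    arr[index] = value
    return length + 1


head = Node("A", Node("B", Node("D")))
insert_after(head.next, "C")
print(to_list(head))  # ['A', 'B', 'C', 'D']

arr = ["A", "B", "D", None]  # capacity 4, 3 slots in use
used = array_insert(arr, 3, 2, "C")
print(arr, used)      # ['A', 'B', 'C', 'D'] 4
```

The linked list insertion touches two pointers regardless of list size, while the array insertion must move every element after the insertion point and fails when no spare capacity remains.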
Records
• Record: a data structure composed of other data structures or primitive data elements
• Used as a unit of input and output to files

File and Methods of Organizing Files
• File: a sequence of records on secondary storage
• Two methods of organizing files:
  • Sequential file: stores records in contiguous storage locations
  • Indexed file: an array of pointers to records; efficient record insertion, deletion, and retrieval

Figure: An indexed file

Classes and Objects
• Classes (self-study): data structures that contain
  • traditional data elements (static part)
  • methods that manipulate data elements (dynamic part)
  That is, related data items and the methods that manipulate those data items
• Objects: one instance, or variable, of the class

Summary
Understanding data representation is key to understanding hardware and software technology:
• How data is represented and stored in computer hardware (e.g. integer, floating point, …)
• How data types are used as building blocks to create more complex data structures (e.g., arrays, records, files, …)

Mock Quiz
True / false
• A cluster is a group of similar or identical computers, connected by a high speed network, that cooperate to provide services or execute a common application.
• A doubly linked list stores one pointer with each list element.
Multiple choice
• The Babbage difference engine is an example of what type of computing device?
  a. Mechanical  b. Electrical  c. Optical  d. Quantum
Fill in blank
5. The contents of ____________________ can be accessed by the CPU more quickly than the contents of primary storage.
Short answer
• What is the difference between an application program and a systems program?
• Convert the decimal number (-34)10 to octal and to two’s complement binary (in 8-bit).