Systems Architecture, Fourth Edition Chapter 3

Chapter 3
Data Representation

Objectives
Chapter Outline
Instructor Notes

Overview

Data Representation and Processing
People represent data in several forms: Arabic numerals, Roman numerals, lines or tick marks, pictorial characters, alphabetic characters, and so on. To be manipulated and processed by the brain, the data must be converted into an appropriate internal format. Computers can also use a variety of data representations, and they too must convert “external” data into an appropriate internal format for processing.

Any data and information processor must be able to:
1. Recognize external data and convert it to an appropriate internal format.
2. Store and retrieve data internally.
3. Transport data among internal storage and processing components.

Automated Data Processing
The relationship between mathematics and physics underlies all automated computation devices, from mechanical clocks to electrical microprocessors. Because computer processing is based on mathematics and physics, its operations must be based on mathematical functions such as addition and equality comparison, must use numerical data inputs, and must generate numerical output. Most computer systems can process only numerical data.

Binary Data Representation
Characteristics of all numbering systems:
1. The number of symbols is equal to the base.
2. The value of the smallest symbol is 0.
3. The value of the largest symbol is the base minus 1.
4. The value of each symbol position is a power of the base.

Decimal Numbering System (Base 10)
1. Number of symbols: 10 (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
2. Smallest symbol: 0
3. Largest symbol: 9
4. Value of each position: $10^0, 10^1, 10^2, 10^3, 10^4, 10^5$, etc.

Binary numbers also are well suited to computer processing because they correspond directly with values in Boolean logic.
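The positional rules listed above hold in any base: a value is the sum of each digit multiplied by a power of the base. The following is a minimal Python sketch of that rule, assuming digit strings that use only the symbols 0 to 9 and A to F (so bases 2 through 16 are covered). The helper names `to_value` and `to_digits` are hypothetical, chosen for illustration. `to_value` evaluates a digit string; `to_digits` goes the other way by repeated division, collecting remainders.

```python
# Positional notation sketch (hypothetical helpers, bases 2 through 16).
# A digit string's value is the sum of digit * base**position, where the
# rightmost position is 0.

SYMBOLS = "0123456789ABCDEF"


def to_value(digits: str, base: int) -> int:
    """Evaluate a digit string written in the given base."""
    if not 2 <= base <= 16:
        raise ValueError("base must be between 2 and 16")
    value = 0
    for position, symbol in enumerate(reversed(digits.upper())):
        digit = SYMBOLS.find(symbol)
        # The largest legal symbol in base b has the value b - 1.
        if digit < 0 or digit >= base:
            raise ValueError(f"{symbol!r} is not a valid base-{base} symbol")
        value += digit * base ** position
    return value


def to_digits(value: int, base: int) -> str:
    """Write a nonnegative integer in the given base by repeated division."""
    if not 2 <= base <= 16:
        raise ValueError("base must be between 2 and 16")
    if value < 0:
        raise ValueError("value must be nonnegative")
    if value == 0:
        return "0"
    symbols = []
    while value > 0:
        value, remainder = divmod(value, base)
        symbols.append(SYMBOLS[remainder])  # remainders emerge low-order first
    return "".join(reversed(symbols))


if __name__ == "__main__":
    print(to_value("1011", 2))   # 1*8 + 0*4 + 1*2 + 1*1 = 11
    print(to_value("FF", 16))    # 15*16 + 15*1 = 255
    print(to_value("17", 8))     # 1*8 + 7*1 = 15
    print(to_digits(255, 2))     # 11111111
    print(to_digits(255, 16))    # FF
```

For example, 255 needs eight binary digits (11111111) but only two hexadecimal digits (FF), because each hexadecimal digit stands for exactly four bits. Python's built-in `int(text, base)` performs the same string-to-value conversion; the explicit loop is shown only to make the place values visible.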
Using binary numbers does not mean that the only numbers a computer can represent and process are zero and one. Both computers and humans can combine multiple digits to form a single data value, which lets them represent and manipulate large numbers.

Binary Numbering System (Base 2)
1. Number of symbols: 2 (0, 1)
2. Smallest symbol: 0
3. Largest symbol: 1
4. Value of each position: $2^0, 2^1, 2^2, 2^3, 2^4, 2^5$, etc.

The primary advantage of hexadecimal notation, compared with binary notation, is its compactness. Each hexadecimal digit corresponds to exactly four binary digits, so except for very small values, a number written in binary notation needs about four times as many digits as the same number written in hexadecimal notation.

Hexadecimal Numbering System (Base 16)
1. Number of symbols: 16 (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F)
2. Smallest symbol: 0
3. Largest symbol: F (equivalent to 15 = 16 - 1)
4. Value of each position: $16^0, 16^1, 16^2, 16^3, 16^4, 16^5$, etc.

Some operating systems and machine programming languages, such as those used with some IBM mainframe computers, use octal notation.

Octal Numbering System (Base 8)
1. Number of symbols: 8 (0, 1, 2, 3, 4, 5, 6, 7)
2. Smallest symbol: 0
3. Largest symbol: 7
4. Value of each position: $8^0, 8^1, 8^2, 8^3, 8^4, 8^5$, etc.

Goals of Computer Data Representation
Any representation format for numeric data represents a balance among several factors, including:
Compactness – the more digits a value contains, the longer the bit string needed to represent it.
Accuracy – the accuracy, or precision, of a representation increases with the number of data bits used.
Range – the more bits used to represent a value, the greater the range of values that can be represented.
Ease of manipulation – the efficiency of a processor depends on the complexity of the data it processes. Efficient processor circuits perform their functions quickly because they contain relatively few components and electricity needs to travel only short distances within them.
Standardization – to ensure correct and efficient data transmission, data formats must be suitable for a wide variety of devices and computer systems.

CPU Data Types (Integer, Real Numbers, Character Data, Boolean Data, Memory Addresses)
The CPU can represent and process five basic data types:
1. Integer
2. Real number
3. Character
4. Boolean
5. Memory address

Integer – an integer is a whole number that does not have a fractional part. An integer can be signed or unsigned. A signed integer uses one bit to represent whether the value is positive or negative. This sign bit is normally the high-order bit, which is the leftmost bit in a binary numeric format. In most formats, the sign bit is 1 for a negative number and 0 for a positive number. Integers are most commonly represented in two’s complement notation.

Real Number – a real number can contain both a whole component and a fractional component. The fractional portion is indicated by digits to the right of the radix point. Real numbers are most commonly represented in IEEE floating point format.

Character – each alphabetic letter, numeral, punctuation mark, and special-purpose symbol is a character. A sequence of characters that forms a meaningful word, phrase, or other useful grouping is a string. Common coding methods for character data include BCD, EBCDIC, ASCII, and Unicode.

Boolean – the Boolean data type has only two values, true and false, which makes it the most concise coding format. Binary 1 can represent true, and 0 can represent false.

Memory Addresses – the memory model, flat or segmented, determines the format of memory addresses. The flat memory model represents memory locations as a series of nonnegative integers. The segmented memory model divides memory into equal-sized segments called pages. Segmented memory addresses are covered primarily in the Tech Focus on Intel Memory Address Formats.
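Two of the primitive types described above, integers and real numbers, can be examined bit by bit. The following is a minimal Python sketch, assuming an 8-bit integer width for illustration (real CPUs typically use wider registers) and IEEE 754 single precision for reals, which the standard-library `struct` module packs with the `"f"` format code. The function names are hypothetical. In both formats the high-order bit indicates the sign, as the Integer description notes.

```python
# Sketch: inspecting two primitive CPU data types at the bit level.
# Assumptions: integers are 8 bits wide; reals use IEEE 754 single
# precision (32 bits). Function names are illustrative only.
import struct


def to_twos_complement(value: int, bits: int = 8) -> str:
    """Return the two's complement bit string for a signed integer."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        # The value needs more bits than are available: an overflow.
        raise OverflowError(f"{value} is outside the {bits}-bit range {low}..{high}")
    # Masking to the low `bits` bits produces the same pattern as
    # inverting the bits of |value| and adding one when value is negative.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(bit_string: str) -> int:
    """Interpret a bit string as a signed two's complement integer."""
    raw = int(bit_string, 2)
    # A 1 in the high-order (leftmost) bit marks a negative value.
    if bit_string[0] == "1":
        return raw - (1 << len(bit_string))
    return raw


def float32_fields(x: float):
    """Split x into IEEE 754 single-precision sign, exponent, and fraction fields."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31                # 1 bit
    exponent = (raw >> 23) & 0xFF   # 8 bits, stored in excess-127 notation
    fraction = raw & 0x7FFFFF       # 23 bits; the leading 1 is implied
    return sign, exponent, fraction


if __name__ == "__main__":
    print(to_twos_complement(5))             # 00000101
    print(to_twos_complement(-5))            # 11111011
    print(from_twos_complement("11111011"))  # -5
    # -6.25 = -1.5625 * 2**2, so the stored exponent is 2 + 127 = 129
    print(float32_fields(-6.25))             # (1, 129, 4718592)
```

Python integers are unbounded, so the range check stands in for the fixed register width a CPU would enforce; in hardware, a result outside that range is an overflow condition rather than a Python exception.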
Data Structures
A data structure is a related group of primitive data elements that is organized for some type of common processing.

Pointers and Addresses – a pointer is a data element that contains the address of another data element. An address is the location of a data element within a storage device.

Arrays and Lists – an array is an ordered list in which each element can be referenced by an index to its position. A simple example of an array is a string, which is an array of characters stored in contiguous memory locations. Arrays can also store other kinds of elements, such as numbers or strings. A linked list is a data structure that uses pointers so that list elements can be scattered among nonsequential storage locations. A linked list can be singly linked or doubly linked. Because a doubly linked list can be traversed in either direction, it is easier to search for a data item in a doubly linked list.

Records and Files – a record is a data structure composed of other data structures or primitive data elements. Records commonly are used as a unit of input and output to files or databases. A collection of related records forms a file.

Classes and Objects – a class is a data structure that contains both traditional, or static, data elements and programs that manipulate that data. The programs in a class are called methods. A class combines related data items in much the same way as a record, but it extends the record to include methods that manipulate the data items.

Further Readings or Resources
See http://averia.mgt.unm.edu for an up-to-date list of reference materials.

Key Terms
address, 96: Physical location of a data item (or the first element of a set of data items) within a storage or output device.
American Standard Code for Information Interchange (ASCII), 85: Standard 7-bit coding scheme for character data and a limited set of I/O device control functions.
array, 97: Set of data items stored in consecutive storage areas and identified by a common name.
base, 67: Multiplier that describes the difference between one position and the next in a numbering system.
Binary Coded Decimal (BCD), 84: Antiquated 6-bit character coding method used by early IBM computer systems.
bit, 70: (1) Value represented in one position of a binary number. (2) Number that can have a value of zero or one. (3) Abbreviation of binary digit.
bit string, 70: Sequence of bits that describes a single data value.
Boolean data type, 91: Data item class that can store only the value true or false.
Boolean logic, 66: Formal logic system in which data inputs and outputs can have only the values true or false.
byte, 70: String of eight bits.
character, 84: (1) Primitive and indivisible component of a written language. (2) One byte of ASCII-encoded data or two bytes of Unicode-encoded data.
class, 103: Data structure that contains both traditional data elements and the software that manipulates the data elements.
collating sequence, 89: Order of symbols produced by sorting them according to a numeric interpretation of their coded values.
data structure, 95: A data item, such as an array or linked list, that contains multiple primitive data elements or other data structures.
decimal point, 67: Period or comma in the decimal numbering system that separates the whole and fractional parts of a numeric value.
double precision, 79: Representing a numeric value with twice the usual number of bit positions for greater accuracy or numeric range.
doubly linked list, 101: Set of stored data items in which each element contains pointers to both the previous and next list elements.
excess notation, 76: Coding method that represents integer values as bit strings such that the value zero (the midpoint of the numeric range) is represented by all zero bits except for the most significant bit (which contains a one).
Extended Binary Coded Decimal Interchange Code (EBCDIC), 85: Standard coding system for representing character data in an 8-bit format, most commonly used with older IBM mainframe computers.
flat memory model, 92: Memory organization and access method in which memory locations are described by single unsigned integers that correspond to linear position.
floating point notation, 81: Method of encoding real numbers in a bit string consisting of two parts, a mantissa and an exponent.
hexadecimal notation, 72: Numbering system with a base value of 16, which uses digit values ranging from 0 to 9 and from A to F (corresponding to decimal values 0 to 15).
high-order bit, 70: Synonym of most significant digit.
index, 102: Stored set of paired data items. Each pair contains a key value and a pointer to the location of the data item possessing (or corresponding to) that key value.
integer, 76: Whole number, or a value that does not have a fractional part.
International Alphabet Number 5 (IA5), 85: International equivalent of ASCII.
International Organization for Standardization (ISO), 85: International body with functions similar to those of the American National Standards Institute.
Latin-1, 90: Standard character coding table containing the ASCII character set in the first 128 table entries and most of the additional characters used by Western European languages in the second 128 table entries.
least significant digit, 70: Bit position within a bit string that represents the least, or smallest, magnitude.
linked list, 98: Data structure in which each data item contains a pointer to the previous or next data item.
long integer, 79: Double precision representation of an integer.
low-order bit, 70: Synonym of least significant digit.
machine data type, 95: Synonym of primitive data type.
method, 103: Program within a class that manipulates data.
most significant digit, 70: Bit position within a bit string that represents the most (largest) magnitude.
multinational character, 90: Character, such as ñ or é, similar to an English-language character but used by Western European languages other than English.
numeric range, 78: Set of all data values that can be represented by a specific data-encoding method.
object, 103: One instance, or variable, of a class.
octal notation: Base 8 numbering system that uses digit values ranging from 0 to 7.
overflow, 78: Error condition that occurs when the output bit string of a processing operation is too large to fit in the designated register.
pointer, 96: (1) Data element that contains the address (location in a storage device) of another data element. (2) Device used to input positional data or control the location of a cursor.
primitive data type, 95: Integer, real number, character, Boolean, memory address, and double precision data types supported by a central processing unit.
radix, 67: Base of the numbering system, such as 2 for the binary numbering system and 10 for the decimal numbering system.
radix point, 67: Period or comma that separates the whole and fractional parts of a numeric value.
real number: Number that can contain both whole and fractional components.
record, 102: (1) Data structure composed of data items relating to a single entity such as a person or transaction. (2) Unit of data transfer. (3) Primary component data structure of a file.
segmented memory model, 92: Memory allocation and partitioning based on equal-sized segments and segmented memory addresses.
singly linked list, 98: Data structure in which each data item contains a pointer to the next data item.
string, 84: Ordered set of related data elements, usually stored as a list or an array.
truncation, 82: Act of deleting bits that will not fit within a storage location.
two’s complement, 77: Notation system that represents positive integers as an ordinary bit string and negative integers by inverting every bit of the bit string that represents the absolute value and then adding one.
underflow, 82: (1) Condition that occurs when a value is too small to represent in floating point notation. (2) Overflow of a negative exponent in floating point notation.
Unicode, 90: Standard 16-bit character coding method.