Computers Data Representation
Chapter 3, SA

Data Representation and Processing
Data and information processors must be able to:
• Recognize external data and convert it to an appropriate internal format
• Store and retrieve data internally
• Transport data among internal storage and processing components

Binary Representation of Data
• Computers represent data using binary numbers.
• Binary numbers correspond directly with values in Boolean logic.
• Computers combine multiple digits to form a single data value to represent large numbers.

Basic data types
• Integers – whole numbers
• Real numbers – with fractional components
• Exponential representation
• Character
• ASCII vs EBCDIC
• Boolean – true/false
• BLOB (Binary Large Object)

Data structures
• Defined in software
• Arrays
• Lists
• Records
• Tables
• Files
• Indices
• Objects

Data Structures
A data structure is a related group of primitive data elements that is organized for some type of processing. Data structures are defined and manipulated within software.
Virtually all data structures make extensive use of pointers and addresses.
• Pointer – a data element that contains the address of another data element.
• Address – the location of some data element within a storage device.

Arrays and Linked Lists
Linked list: a data structure that uses pointers so list elements can be scattered among nonsequential storage locations.

Records and Files
• A record is a data structure composed of other data structures or primitive data elements.
• Records are used as a unit of input and output to files or databases.

File Organization
Physical arrangement of the records of a file on secondary storage devices:
• Sequential
• Linked List
• Indexed
• Hashed

Sequential File
Sequential file sorted in alphabetical order. Sequential files are usually sorted in ID sequence order to facilitate batch processing.
addr  name         dept
00    Ayers        ACCT
01    Buckley      MGT
02    Daley        ACCT
03    Dejoie       MGT
04    Kenderdine   MKT
05    Linn         FIN
06    Lusch        MKT
07    Price        MGT
08    Razook       MKT
09    Schwarzkopf  MGT

Sequential File Processing
[Diagram: Old Master, Transaction, Process, New Master]
Sequential files must be recopied from the point of any insertion or deletion to the end of the file. They are commonly used in batch processing, where a new master file is generated each time the file is updated.

Linked List
Linked list that orders the data alphabetically within department. An external reference must point to the start record (05).

addr  name         dept  pointer
00    Price        MGT   01
01    Schwarzkopf  MGT   02
02    Kenderdine   MKT   03
03    Lusch        MKT   08
04    Buckley      MGT   09
05    Ayers        ACCT  06
06    Daley        ACCT  07
07    Linn         FIN   04
08    Razook       MKT   ##
09    Dejoie       MGT   00

Linked List File Processing
The next record in a linked list is found at the address stored in the current record. Records can be added at any location on the DASD (direct-access storage device), with pointers adjusted to include them. Deleted records are not erased; pointers are changed to omit them.

Indexed File (sequential index)
Index to access data by department abbreviation.

Data file
addr  name         dept  name     dept
00    Price        MGT   Ayers    ACCT
01    Schwarzkopf  MGT   Daley    ACCT
02    Kenderdine   MKT   Linn     FIN
03    Lusch        MKT   Razook   MKT
04    Buckley      MGT   Dejoie   MGT

Index
key   pointer
ACCT  00
ACCT  01
FIN   02
MGT   00
MGT   01
MGT   04
MKT   03

Indexed File Processing
[Diagram: Index, Index, Data File]
When a record is inserted or deleted, the data can be added at any location in the data file. Each index must also be updated to reflect the change. For a simple sequential index, this may mean rewriting the index for each insertion.
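The linked-list file processing described earlier in this section can be sketched in a few lines of Python. This is only an illustrative in-memory model, not how a file system stores records: the dictionary `records`, the `END` marker (standing in for `##`), and the helpers `traverse` and `delete_after` are all hypothetical names. The addresses, names, departments, and pointers are taken from the linked-list table.

```python
# Hypothetical in-memory model of the linked-list file:
# address -> (name, dept, pointer to the next record).
END = None  # stands in for the ## end-of-list marker in the table

records = {
    0: ("Price", "MGT", 1),
    1: ("Schwarzkopf", "MGT", 2),
    2: ("Kenderdine", "MKT", 3),
    3: ("Lusch", "MKT", 8),
    4: ("Buckley", "MGT", 9),
    5: ("Ayers", "ACCT", 6),
    6: ("Daley", "ACCT", 7),
    7: ("Linn", "FIN", 4),
    8: ("Razook", "MKT", END),
    9: ("Dejoie", "MGT", 0),
}
START = 5  # external reference to the start record


def traverse(records, start):
    """Follow the pointers from the start record; return (name, dept) in list order."""
    result = []
    addr = start
    while addr is not END:
        name, dept, next_addr = records[addr]
        result.append((name, dept))
        addr = next_addr
    return result


def delete_after(records, prev_addr):
    """Logically delete the record that follows prev_addr.

    The deleted record is not erased; the predecessor's pointer
    is changed so the list skips over it.
    """
    name, dept, victim = records[prev_addr]
    if victim is END:
        return
    records[prev_addr] = (name, dept, records[victim][2])


ordered = traverse(records, START)
```

Calling `delete_after(records, 7)` bypasses the record at address 04 (Buckley): it remains in storage, but a later `traverse` no longer reaches it.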
Segmented Index

Index
addr  level  pointer  key          pointer  key      pointer
100   Root   101      Kenderdine   102      Razook   103
101   Node   200      Buckley      201      Dejoie   204
102   Node   203      Lusch        202      -        -
103   Node   205      Schwarzkopf  206      -        -
200   Leaf   00       Ayers        04       Buckley  201
201   Leaf   01       Daley        04       Dejoie   204
202   Leaf   00       Price        03       Razook   205
203   Leaf   02       Linn         03       Lusch    202
204   Leaf   02       Kenderdine   -        -        203
205   Leaf   01       Schwarzkopf  -        -        206
206   Leaf   05       Van Horn     -        -        -

Data
addr  name         dept  name     dept
00    Price        MGT   Ayers    ACCT
01    Schwarzkopf  MGT   Daley    ACCT
02    Kenderdine   MKT   Linn     FIN
03    Lusch        MKT   Razook   MKT
04    Buckley      MGT   Dejoie   MGT
05    Van Horn     MGT

Indexed File Processing (segmented index)
[Diagram: Index, Data File]
Data can be inserted or deleted at any location in the data file. The index(es) must be updated for each change, but only the affected segments need to be rewritten.

Hashing (Prime Number Remainder Algorithm)
• Pick a prime number to define the file space.
• Divide the key by the prime number.
• Store the record at the location given by the remainder.
Example: Key = 41, prime = 13. 41 ÷ 13 = 3 with remainder 2 (13 × 3 = 39; 41 − 39 = 2), so Location = 2.

Hashed File Processing
addr  Key  Calculation  Contents

Records and Files
• A sequence of records on secondary storage is called a file.
• A sequence of records stored within main memory is called a table.
• Sequential files suffer the same problems as contiguous arrays when inserting and deleting records.
• To eliminate this problem, linked lists and indexed files are used.

Classes and Objects