Topic 7: File Organization

Logical vs. Physical Organization of Data
  logical organization
    - the abstract way a computer program accesses the data
    - uses logical structures (e.g. linked lists)
  physical organization
    - the actual physical structure of the data in memory
    - i.e. what the sequence of bits looks like in memory

Definitions
  database: collection of related files
  file: collection of related records
  record: collection of related fields (e.g. Name, Age)
  key field: uniquely identifies a record (e.g. UserID)

Basics (General Idea)
  - Records are stored at different places (different indices or locations)
  - To find a record, we need to know its location
  - We can search for the record, OR
    jump to its location directly (if the location is known), OR
    use a combination of jumping and searching

Sequential File Organization
  - Records in a file are stored sequentially (in order) by some key field
      2480  Bob
      2569  Alice
      3020  Paul
  - Originally designed to operate on magnetic tapes
  - How do we find a record?
  - What happens when we try to add a new record? (It’s going to be bad…)

Partially-Indexed Sequential Files
  - File index (address) ~ index in a book
  - Only some of the records are indexed (one index entry per section)
  - Key field has a direct index to the section where the record of interest is located
  Steps:
    1. Sequential search of the index for the key field
    2. Directly link to the section of records
    3. Sequential search within the section for the record of interest

  Index:
    Key   Record Address
    A     1
    B     6
    C     11
    D     16

  Records (by address): 1 2 3 4 5 6 7 8 9 10 11 12

Fully Indexed Files
  - Every record has an index (address)
  - Sequentially search through the key field for a specific record address
  - Records may be accessed directly OR in sequential order by address

  Index:
    Key   Record Address
    a     4
    b     7
    c     5
    d     3
    e     12
    m     9
    n     10
    p     2
    s     11
    t     6
    z     1

Direct Access File Organization
  - Record address is derived/calculated with math
  - No need to search through an index
  - Example:
      Record Address = UserID MOD 8 + SSN MOD 3
      Record Address = UserID%8 + SSN%3
  - This math operation is called “key hashing” or “hashing”

Fixed-length vs. Variable-length Records
  Fixed-length
    - each record is a set size
    - can be used with direct access file organization
    - access is based on math calculations, so every record must be the same length
  Variable-length
    - each record is a variable size
    - can be used with sequential file organization
    - records are found by searching or through an index, not by calculation, so size does not matter
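
The direct-access formula and the fixed-length requirement above can be tried out together in a small sketch. It is an illustration under stated assumptions, not a definitive implementation: the record layout (two 4-byte integers plus a 16-byte name), the SSN values, the names RECORD, NUM_SLOTS, record_address, write_record and read_record, and the use of an in-memory buffer in place of a disk file are all hypothetical. Only the formula UserID % 8 + SSN % 3 and the sample user IDs and names come from the notes.

```python
import io
import struct

# Hypothetical fixed-length layout: 4-byte user ID, 4-byte SSN, 16-byte name.
RECORD = struct.Struct("<II16s")
NUM_SLOTS = 10  # UserID % 8 is 0..7 and SSN % 3 is 0..2, so addresses are 0..9


def record_address(user_id: int, ssn: int) -> int:
    """Hash the key fields into a record address, as in the example formula."""
    return user_id % 8 + ssn % 3


def write_record(f, user_id: int, ssn: int, name: str) -> int:
    """Store a record at its computed address and return that address."""
    addr = record_address(user_id, ssn)
    f.seek(addr * RECORD.size)  # fixed size makes the byte offset computable
    f.write(RECORD.pack(user_id, ssn, name.encode("ascii")))
    return addr


def read_record(f, user_id: int, ssn: int):
    """Jump straight to the computed address; no index search is needed."""
    f.seek(record_address(user_id, ssn) * RECORD.size)
    uid, s, raw = RECORD.unpack(f.read(RECORD.size))
    return uid, s, raw.rstrip(b"\x00").decode("ascii")


# An in-memory stand-in for a file on disk, pre-sized with empty slots.
data_file = io.BytesIO(bytes(NUM_SLOTS * RECORD.size))
write_record(data_file, 2480, 123456789, "Bob")    # SSN values are made up
write_record(data_file, 2569, 987654321, "Alice")
```

Calling read_record(data_file, 2569, 987654321) seeks directly to the slot computed from the key fields and unpacks one record. The sketch ignores the case where two different records hash to the same address; handling that is outside what these notes cover.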