CS 405G: Introduction to Database Systems 20. Concurrent control, Storage Presentation Schedule Apr 20 (1 day extension for final report) Wilcox, Chung, Zakaria Claytor, Allison Akridge, Jenkins, Dunbar Hill, Weisbrod, Caudill Vinogradov, Brothers M. Turner, Ruffner, Dunbar (?) Dale, Bentley 7/1/2016 Chen Qian @ University of Kentucky 2 Presentation Schedule Apr 22 (1 day extension for final report) R. Tuner Resch Williams Roberts Wise, Zachary, Almon Lewis Joiner 7/1/2016 Chen Qian @ University of Kentucky 3 Presentation Schedule Apr 25 Amir Johnson Zheng Morganett, Teague Fullington, O’Kane Ferry Gant 7/1/2016 Chen Qian @ University of Kentucky 4 Presentation Schedule Apr 27 Becker Moustafa Houghton James Campbell Underwood Bishop 7/1/2016 Chen Qian @ University of Kentucky 5 Storage Most on disks! That’s why we always draw databases as And why the single most important metric in database processing is the number of disk I/Os performed 7/1/2016 Chen Qian @ University of Kentucky 6 The Storage Hierarchy Main memory (RAM) for currently used data Disk for the main database (secondary storage). Tapes for archiving older versions of the data (tertiary storage). Smaller, Faster Bigger, Slower Source: Operating Systems Concepts 5th Edition 7/1/2016 Chen Qian @ University of Kentucky 7 A typical disk Tracks Disk arm Platter Arm movement 7/1/2016 Spindle Disk head Spindle rotation Chen Qian @ University of Kentucky Cylinders “Moving parts” are slow 8 Disk access time Sum of: Seek time: time for disk heads to move to the correct track Rotational delay: time for the desired block to rotate under the disk head Transfer time: time to read/write data in the block (= time for disk to rotate over the block) 7/1/2016 Chen Qian @ University of Kentucky 9 Random disk access Seek time + rotational delay + transfer time Average seek time Average rotational delay Time to skip one half of the tracks? Not quite; should be time to skip a third of them “Typical” value: 5 ms Time for a half rotation (a function of RPM) “Typical” value: 4.2 ms (7200 RPM) Typical transfer time .08msec per 8K block 7/1/2016 Chen Qian @ University of Kentucky 10 Sequential Disk Access Improves Performance Seek time + rotational delay + transfer time Seek time Rotational delay 0 (assuming data is on the same track) 0 (assuming data is in the next block on the track) Easily an order of magnitude faster than random disk access! 7/1/2016 Chen Qian @ University of Kentucky 11 Performance tricks Disk layout strategy Keep related things (what are they?) close together: same sector/block ! same track ! same cylinder ! adjacent cylinder Double buffering While processing the current block in memory, prefetch the next block from disk (overlap I/O with processing) Disk scheduling algorithm Track buffer Read/write one entire track at a time Parallel I/O More disk heads working at the same time 7/1/2016 Chen Qian @ University of Kentucky 12 Records and Files A row in a table, when located on disks, is called A record Two types of record: Fixed-length Variable-length 7/1/2016 Chen Qian @ University of Kentucky 13 In an abstract sense, a file is In reality, a file is A set of disk pages Each record lives on A set of “records” on a disk A page Physical Record ID (RID) A tuple of <page#, slot#> 7/1/2016 Chen Qian @ University of Kentucky 14 Files Higher levels of DBMS operate on records, and files of records. FILE: A collection of pages, each containing a collection of records. Record = row in a table Must support: insert/delete/modify record fetch a particular record (specified using record id) scan all records (possibly with some conditions on the records to be retrieved) 7/1/2016 Chen Qian @ University of Kentucky 15 Unordered (Heap) Files Simplest file structure contains records in no particular order. As file grows and shrinks, disk pages are allocated and de-allocated. To support record level operations, we must: keep track of the pages in a file keep track of free space on pages keep track of the records on a page There are many alternatives for keeping track of this. We’ll consider 2 7/1/2016 Chen Qian @ University of Kentucky 16 Heap File Implemented as a List Data Page Data Page Data Page Full Pages Header Page Data Page Data Page Data Page Pages with Free Space The header page id and Heap file name must be stored someplace. Database “catalog” Each page contains 2 `pointers’ plus data. 7/1/2016 Chen Qian @ University of Kentucky 17 Heap File Using a Page Directory Data Page 1 Header Page Data Page 2 DIRECTORY 7/1/2016 Data Page N DBMS must remember where the first directory page is located. The entry for a page can include the number of free bytes on the page. The directory itself is a collection of pages and shown as a linked list. Much smaller than linked list of all HF pages! Chen Qian @ University of Kentucky 18 Trade-off Compared to linked list, the directory based approaches have: Pros: Fast access to a particular record Cons: Extra space 7/1/2016 Chen Qian @ University of Kentucky 19 Summary Disks provide cheap, non-volatile storage. Random access, but cost depends on the location of page on disk; important to arrange data sequentially to minimize seek and rotation delays. 7/1/2016 Chen Qian @ University of Kentucky 20 Record layout Record = row in a table Variable-format records Rare in DBMS—table schema dictates the format Relevant for semi-structured data such as XML Focus on fixed-format records With fixed-length fields only, or With possible variable-length fields 7/1/2016 Chen Qian @ University of Kentucky 21 Record Formats: Fixed Length F1 F2 F3 F4 L1 L2 L3 L4 Base address (B) All field lengths and offsets are constant Address = B+L1+L2 Computed from schema, stored in the system catalog Finding i’th field done via arithmetic. 7/1/2016 Chen Qian @ University of Kentucky 22 Fixed-length fields Example: CREATE TABLE Student(SID INT, name CHAR(20), age INT, GPA FLOAT); 0 Watch out for alignment 4 24 28 36 142 Bart (padded with space) 10 2.3 May need to pad; reorder columns if that helps What about NULL? Add a bitmap at the beginning of the record 7/1/2016 Chen Qian @ University of Kentucky 23 Record Formats: Variable Length Two alternative formats (# fields is fixed): F1 F2 F3 $ F4 $ $ $ Fields Delimited by Special Symbols F1 F2 F3 F4 Array of Field Offsets Second offers direct access to i’th field, efficient storage of nulls; extra but small directory overhead. 7/1/2016 Chen Qian @ University of Kentucky 24 Trade-off Compared to the 1st formats, the offset based format has: Pros: Fast access to a particular field; efficient storage of nulls Cons: Extra space 7/1/2016 Chen Qian @ University of Kentucky 25 Large Object (LOB) fields Example: CREATE Student records get “de-clustered” TABLE Student(SID INT, name CHAR(20), age INT, GPA FLOAT, picture BLOB(32000)); Bad because most queries do not involve picture Decomposition (automatically done by DBMS and transparent to the user) Student(SID, name, age, GPA) StudentPicture(SID, picture) 7/1/2016 Chen Qian @ University of Kentucky 26 Page layout How do you organize records in a page? Fixed length records Variable length records NSM (N-ary Storage Model) is used in most commercial DBMS 7/1/2016 Chen Qian @ University of Kentucky 27 Page Formats: Fixed Length Records Slot 1 Slot 2 Slot 1 Slot 2 Free Space ... ... Slot N Slot N Slot M 1 . . . 0 1 1M N PACKED number of records M ... 3 2 1 UNPACKED, BITMAP number of slots Record id = <page id, slot #>. In first alternative, moving records for free space management changes rid; may not be acceptable. 7/1/2016 Chen Qian @ University of Kentucky 28 Trade-off Compared to the packed formats, the unpacked format has: Pros: No need to move records into vacated slots after delete. No need to change the slot number. Cons: To insert a record, one needs to find an empty slot, by scanning the bitmap. 7/1/2016 Chen Qian @ University of Kentucky 29 Variable-Length Records Store records from the beginning of each page Use a directory at the end of each page To locate records and manage free space Necessary for variable-length records 142 Bart 857 Lisa 10 2.3 123 Milhouse 10 3.1 8 4.3 456 Ralph 8 2.3 Why store data and directory at two different ends? Both can grow easily 7/1/2016 Chen Qian @ University of Kentucky 30 Options Reorganize after every update/delete to avoid fragmentation (gaps between records) Need to rewrite half of the block on average What if records are fixed-length? Reorganize after delete Only need to move one record Need a pointer to the beginning of free space Do not reorganize after update 7/1/2016 Need a bitmap indicating which slots are in use Chen Qian @ University of Kentucky 31 System Catalogs For each relation: For each index: structure (e.g., B+ tree) and search key fields For each view: name, file location, file structure (e.g., Heap file) attribute name and type, for each attribute index name, for each index integrity constraints view name and definition Plus statistics, authorization, buffer pool size, etc. Catalogs are themselves stored as relations! 7/1/2016 Chen Qian @ University of Kentucky 32 Indexes A Heap file allows us to retrieve records: Sometimes, we want to retrieve records by specifying the values in one or more fields, e.g., by specifying the rid, or by scanning all records sequentially Find all students in the “CS” department Find all students with a gpa > 3 Indexes are file structures that enable us to answer such value-based queries efficiently. 7/1/2016 Chen Qian @ University of Kentucky 33 Basics Given a value, locate the record(s) with this value SELECT * FROM R WHERE A = value; SELECT * FROM R, S WHERE R.A = S.B; Other search criteria, e.g. Range search SELECT * FROM R WHERE A > value; Keyword search database indexing 7/1/2016 Chen Qian @ University of Kentucky Search 34 Dense and sparse indexes Dense: one index entry for each search key value Sparse: one index entry for each block Records must be clustered according to the search key 7/1/2016 Chen Qian @ University of Kentucky 35 Dense versus sparse indexes Index size Requirement on records Records must be clustered for sparse index Lookup Sparse index is smaller Sparse index is smaller and may fit in memory Dense index can directly tell if a record exists Update Easier for sparse index 7/1/2016 Chen Qian @ University of Kentucky 36 Primary and secondary indexes Primary index Secondary index Created for the primary key of a table Records are usually clustered according to the primary key Can be sparse Usually dense SQL PRIMARY KEY declaration automatically creates a primary index, UNIQUE key automatically creates a secondary index Additional secondary index can be created on non-key attribute(s) CREATE INDEX StudentGPAIndex ON Student(GPA); 7/1/2016 Chen Qian @ University of Kentucky 37 Tree-Structured Indexes: Introduction Tree-structured indexing techniques support both range selections and equality selections. ISAM =Indexed Sequential Access Method static structure; early index technology. B+ tree: dynamic, adjusts gracefully under inserts and deletes. 7/1/2016 Chen Qian @ University of Kentucky 38 Motivation for Index ``Find all students with gpa > 3.0’’ If data file is sorted, do binary search Cost of binary search in a database can be quite high, Why? Simple idea: Create an `index’ file. index entry P0 Page 1 K 1 P1 Index File kN k1 k2 K 2 Page 2 P 2 Page 3 K m Pm Page N Data File Can do binary search on (smaller) index file! 7/1/2016 Chen Qian @ University of Kentucky 39 ISAM What if an index is still too big? Put a another (sparse) index on top of that! ISAM (Index Sequential Access Method), more or less Example: look up 197 100, 200, …, 901 Index blocks 100, 123, …, 192 200, … … 901, …, 996 100, 108,123, 129, 192, 197,200, 202, 901, 907, 996, 997, … … …… 119, 121 … … … … Data blocks 7/1/2016 Chen Qian @ University of Kentucky 40 How many steps? For a balanced tree of N pages, the num of search steps is O(logN) Less than a constant times logN 7/1/2016 Chen Qian @ University of Kentucky 41 Updates with ISAM Example: insert 107 Example: delete 129 100, 200, …, 901 Index blocks 100, 123, …, 192 200, … … 901, …, 996 100, 108,123, 129, 192, 197,200, 202, 901, 907, 996, 997, … … …… 119, 121 … … … … Data blocks 107 Overflow block Overflow chains and empty data blocks degrade performance Worst case: most records go into one long chain 7/1/2016 Chen Qian @ University of Kentucky 42 How many steps? For a chain of N pages, the worst-case num of search steps is N Much larger than O(logN)! 7/1/2016 Chen Qian @ University of Kentucky 43 A Note of Caution ISAM is an old-fashioned idea B+-trees are usually better, as we’ll see But, ISAM is a good place to start to understand the idea of indexing Upshot 7/1/2016 Don’t brag about being an ISAM expert on your resume Do understand how they work, and tradeoffs with B+trees Chen Qian @ University of Kentucky 44 Summary (Contd.) DBMS vs. OS File Support 7/1/2016 DBMS needs features not found in many OSes, e.g., forcing a page to disk, controlling the order of page writes to disk, files spanning disks, ability to control pre-fetching and page replacement policy based on predictable access patterns, etc. Variable length record format with field offset directory offers support for direct access to ith field and null values. Chen Qian @ University of Kentucky 45