Topics for today CS 245: Database System Principles • How to lay out data on disk • How to move it to memory Notes 03: Disk Organization Hector Garcia-Molina CS 245 Notes 3 1 What are the data items we want to store? • • • • a a a a salary name date picture CS 245 Notes 3 2 What are the data items we want to store? • • • • a a a a salary name date picture What we have available: Bytes 8 bits CS 245 Notes 3 3 To represent: CS 245 4 To represent: • Integer (short): 2 bytes • Characters various coding schemes suggested, e.g., 35 is 00000000 00100011 most popular is ascii Example: A: 1000001 a: 1100001 5: 0110101 LF: 0001010 • Real, floating point n bits for mantissa, m for exponent…. CS 245 Notes 3 Notes 3 5 CS 245 Notes 3 6 1 To represent: • Boolean e.g., TRUE FALSE To represent: • Boolean e.g., TRUE FALSE 1111 1111 0000 0000 • Application specific e.g., RED 1 GREEN 3 BLUE 2 YELLOW 4 … 1111 1111 0000 0000 • Application specific e.g., RED 1 GREEN 3 BLUE 2 YELLOW 4 … Can we use less than 1 byte/code? Yes, but only if desperate... CS 245 Notes 3 7 To represent: Notes 3 9 To represent: 8 • String of characters – Null terminated e.g., c a t – Length given e.g., 3 c a t - Fixed length CS 245 Notes 3 10 Key Point • Bag of bits Length Notes 3 To represent: • Dates e.g.: - Integer, # days since Jan 1, 1900 - 8 characters, YYYYMMDD - 7 characters, YYYYDDD (not YYMMDD! Why?) • Time e.g. - Integer, seconds since midnight - characters, HHMMSSFF CS 245 CS 245 • Fixed length items Bits • Variable length items - usually length given at beginning CS 245 Notes 3 11 CS 245 Notes 3 12 2 Overview Also Data Items Records • Type of an item: Tells us how to interpret (plus size if fixed) Blocks Files Memory CS 245 Notes 3 13 Record - Collection of related data items (called FIELDS) Notes 3 14 • Main choices: – FIXED vs VARIABLE FORMAT – FIXED vs VARIABLE LENGTH 15 Fixed format CS 245 Notes 3 16 Example: fixed format and length A SCHEMA (not record) contains following information - # fields - type of each field - order in record - meaning of each field CS 245 Notes 3 Types of records: E.g.: Employee record: name field, salary field, date-of-hire field, ... CS 245 CS 245 Notes 3 Employee record (1) E#, 2 byte integer (2) E.name, 10 char. (3) Dept, 2 byte code 17 CS 245 55 s m i t h 02 83 j o n e s 01 Notes 3 Schema Records 18 3 Variable format Example: variable format and length # Fields Code identifying field as E# Integer type 46 4 S 4 F O RD Code for Ename String type Length of str. 2 5 I • Record itself contains format “Self Describing” Field name codes could also be strings, i.e. TAGS CS 245 Notes 3 19 Variable format useful for: CS 245 Notes 3 20 • EXAMPLE: var format record with repeating fields Employee one or more children • “sparse” records • repeating fields • evolving formats 3 E_name: Fred Child: Sally Child: Tom But may waste space... CS 245 Notes 3 21 Note: Repeating fields does not imply - variable format, nor - variable size John Sailing Chess CS 245 Notes 3 22 Note: Repeating fields does not imply - variable format, nor - variable size -- John Sailing Chess -- • Key is to allocate maximum number of repeating fields (if not used null) CS 245 Notes 3 23 CS 245 Notes 3 24 4 Many variants between fixed - variable format: Record header - data at beginning that describes record Example: Include record type in record 5 27 May contain: - record type - record length - time stamp - other stuff ... .... record type record length tells me what to expect (i.e. points to schema) CS 245 Notes 3 25 Exercise: How to store XML data? <table> <description> people on the fourth floor <\description> <people> <person> <name> Alan <\name> <age> 42 <\age> <email> agb@abc.com <\email> <\person> <person> <name> Sally <\name> <age> 30 <\age> <email> sally@abc.com <\email> <\person> <\people> <\table> CS 245 Notes 3 CS 245 trusted processor E(r) • Compression from: Data on the Web, Abiteboul et al – within record - e.g. code selection – collection of records - e.g. find common patterns • Encryption 27 CS 245 Notes 3 Notes 3 28 Encrypting Records search F(r)=x dbms trusted processor ?? E(r1) E(r2) E(r3) E(r4) ... CS 245 26 Other interesting issues: Encrypting Records new record r Notes 3 dbms E(r1) E(r2) E(r3) E(r4) ... 29 CS 245 Notes 3 30 5 Search Key in the Clear search k=2 trusted processor Q: k=2 dbms [1, [2, [3, [4, Encrypt Key search k=2 A: [2, E(b2)] • each record is [k,b] • store [E(k), E(b)] • can search for records with k=E(x) 31 CS 245 Issues Notes 3 32 How Do We Search Now? ??? • Hard to do range queries • Encryption not good • Better to use encryption that does not always generate same cyphertext search k=2 trusted processor Q: k’=E(2) dbms A: [E(2,dhe), E(b2)] [E(2, lkz), E(b4)] [E(1, abc), E(b1)] [E(2, dhe), E(b2)] [E(3, nft), E(b3)] [E(2, lkz), E(b4)] ... k E(k, random) E A: [E(2), E(b2)] dbms [E(1), E(b1)] [E(2), E(b2)] [E(3), E(b3)] [E(4), E(b4)] ... Notes 3 k Q: k’=E(2) E(b1)] E(b2)] E(b3)] E(b4)] ... • each record is [k,b] • store [k, E(b)] • can search for records with k=x CS 245 trusted processor • each record is [k,b] • store [E(k, rand), E(b)] • can search for records with k=E(x,???)? D simplification CS 245 Notes 3 33 CS 245 Notes 3 Solution? 34 Solution? • Develop new decryption function: D(f(k1), E(k2, rand)) is true if k1=k2 • Develop new decryption function: D(f(k1), E(k2, rand)) is true if k1=k2 search k=2 Q: check if D(f(2),*) true trusted processor dbms A: [E(2,dhe), E(b2)] [E(2, lkz), E(b4)] [E(1, abc), E(b1)] [E(2, dhe), E(b2)] [E(3, nft), E(b3)] [E(2, lkz), E(b4)] ... CS 245 Notes 3 35 CS 245 Notes 3 36 6 What are choices/issues with data compression? Issues? • Cannot do non-equality predicates • Hard to build indexes • Leaving search keys uncompressed not as bad • Larger compression units: – better for compression efficiency – worse for decompression overhead • Similar data compresses better – compress columns? CS 245 Notes 3 37 Next: placing records into blocks CS 245 Notes 3 38 Next: placing records into blocks assume fixed length blocks blocks ... blocks ... a file CS 245 Notes 3 a file 39 Options for storing records in blocks: (1) (2) (3) (4) CS 245 assume a single file (for now) Notes 3 40 (1) Separating records Block separating records spanned vs. unspanned sequencing indirection Notes 3 CS 245 R1 R2 R3 (a) no need to separate - fixed size recs. (b) special marker (c) give record lengths (or offsets) - within each record - in block header 41 CS 245 Notes 3 42 7 (2) Spanned vs. Unspanned With spanned records: • Unspanned: records must be within one block block 1 R1 block 2 R2 R3 R1 ... R4 R5 • Spanned block 1 R1 R2 R3 (a) R2 R3 R4 (b) R5 R6 R7 (a) need indication need indication of partial record “pointer” to rest of continuation (+ from where?) block 2 R3 (a) CS 245 R3 R4 (b) R5 R6 R7 (a) ... Notes 3 43 • Unspanned is much simpler, but may waste space… • Spanned essential if record size > block size Notes 3 Notes 3 44 (3) Sequencing Spanned vs. unspanned: CS 245 CS 245 • Ordering records in file (and block) by some key value Sequential file ( sequenced) 45 CS 245 Notes 3 46 Why sequencing? Sequencing Options Typically to make it possible to efficiently read records in order (a) Next record physically contiguous (e.g., to do a merge-join — discussed later) R1 Next (R1) ... (b) Linked R1 CS 245 Notes 3 47 CS 245 Next (R1) Notes 3 48 8 Sequencing Options (c) Sequencing Options Overflow area Records in sequence (c) header R1 Records in sequence R1 R2 R3 CS 245 Overflow area R4 R4 R5 R5 Notes 3 49 (4) Indirection R2.1 R2 R3 CS 245 R1.3 R4.7 Notes 3 50 (4) Indirection • How does one refer to records? • How does one refer to records? Rx Rx Many options: Physical CS 245 Notes 3 51 CS 245 Purely Physical E.g., Record Address or ID CS 245 = Notes 3 52 Fully Indirect E.g., Record ID is arbitrary bit string Device ID Cylinder # Block ID Track # Block # Offset in block Notes 3 Indirect map rec ID r 53 CS 245 Rec ID Physical addr. Notes 3 address a 54 9 Tradeoff Flexibility Cost to move records Physical of indirection Many options in between … (for deletions, insertions) CS 245 Notes 3 55 Header Free space R3 R4 R2 Notes 3 57 Options for storing records in blocks: (1) (2) (3) (4) Notes 3 56 May contain: A block: CS 245 CS 245 Block header - data at beginning that describes block Example: Indirection in block R1 Indirect - File ID (or RELATION or DB ID) - This block ID - Record directory - Pointer to free space - Type of block (e.g. contains recs type 4; is overflow, …) - Pointer to other blocks “like it” - Timestamp ... CS 245 Notes 3 58 Case Study: salesforce.com • salesforce.com provides CRM services • salesforce customers are tenants • Tenants run apps and DBMS as service separating records spanned vs. unspanned sequencing indirection tenant A tenant B salesforce.com CRM App data tenant C CS 245 Notes 3 59 CS 245 Notes 3 60 10 Tenants have similar data Options for Hosting • Separate DBMS per tenant • One DBMS, separate tables per tenant • One DBMS, shared tables tenant 1: tenant 2: CS 245 Notes 3 61 salesforce.com solution customer tenant 1 1 2 2 A a1 a2 a3 a1 cust-other tenant 1 1 2 3 CS 245 B b1 b2 b3 b1 A a1 a2 a1 a4 C c1 c2 c2 c1 f1 D E G D Notes 3 customer A a3 a1 a4 CS 245 B b3 b1 - C c2 c1 - D G - - g1 d1 Notes 3 62 Other Topics fixed schema for all tenants v1 f2 v2 ... d1 E e1 e2 F f2 g1 d1 customer A B C D E F a1 b1 c1 d1 e1 a2 b2 c2 - e2 f2 (1) Insertion/Deletion (2) Buffer Management (3) Comparison of Schemes var schema for all tenants 63 CS 245 Notes 3 64 Options: Deletion (a) Immediately reclaim space (b) Mark deleted Block Rx CS 245 Notes 3 65 CS 245 Notes 3 66 11 Options: As usual, many tradeoffs... (a) Immediately reclaim space (b) Mark deleted • How expensive is to move valid record to free space for immediate reclaim? • How much space is wasted? – May need chain of deleted records (for re-use) – e.g., deleted records, delete fields, free space chains,... – Need a way to mark: • special characters • delete field • in map CS 245 Notes 3 67 Concern with deletions CS 245 Notes 3 68 Solution #1: Do not worry Dangling pointers R1 CS 245 ? Notes 3 69 CS 245 Notes 3 70 Solution #2: Tombstones Solution #2: Tombstones E.g., Leave “MARK” in map or old location E.g., Leave “MARK” in map or old location • Physical IDs A block This space never re-used CS 245 Notes 3 71 CS 245 This space can be re-used Notes 3 72 12 Solution #2: Tombstones Insert E.g., Leave “MARK” in map or old location Easy case: records not in sequence Insert new record at end of file or in deleted slot If records are variable size, not as easy... • Logical IDs map ID LOC Never reuse ID 7788 nor space in map... 7788 CS 245 Notes 3 73 Insert CS 245 Notes 3 74 Interesting problems: Hard case: records in sequence If free space “close by”, not too bad... Or use overflow idea... CS 245 Notes 3 75 • How much free space to leave in each block, track, cylinder? • How often do I reorganize file + overflow? CS 245 Notes 3 76 Buffer Management Free space CS 245 • • • • • • Notes 3 77 DB features needed Why LRU may be bad Pinned blocks Forced output Double buffering Swizzling CS 245 Notes 3 Read Textbook! in Notes02 78 13 Swizzling Swizzling Memory Disk block 1 Rec A CS 245 Memory block 1 block 1 block 2 block 2 Notes 3 79 Disk block 1 Rec A CS 245 Rec A Notes 3 80 Row vs Column Store Row Store • So far we assumed that fields of a record are stored contiguously (row store)... • Another option is to store like fields together (column store) • Example: Order consists of CS 245 – id, cust, prod, store, price, date, qty Notes 3 81 id1 cust1 prod1 store1 price1 date1 qty1 id2 cust2 prod2 store2 price2 date2 qty2 id3 cust3 prod3 store3 price3 date3 qty3 CS 245 Notes 3 Column Store Row vs Column Store • Example: Order consists of • Advantages of Column Store – id, cust, prod, store, price, date, qty id1 id2 id3 id4 ... cust1 cust2 cust3 cust4 ... id1 id2 id3 id4 ... prod1 prod2 prod3 prod4 ... id1 id2 id3 id4 ... price1 price2 price3 price4 ... block 2 qty1 qty2 qty3 qty4 ... 82 – more compact storage (fields need not start at byte boundaries) – efficient reads on data mining operations • Advantages of Row Store – writes (multiple fields of one record)more efficient – efficient reads for record access (OLTP) ids may or may not be stored explicitly CS 245 Notes 3 83 CS 245 Notes 3 84 14 Interesting paper to read: Comparison • Mike Stonebreaker, Elizabeth (Betty) O'Neil, Pat O’Neil, Xuedong Chen, et al. " C-Store: A Column-oriented DBMS," Presented at the 31st VLDB Conference, September 2005. • http://www.cs.umb.edu/%7Eponeil/ vldb05_cstore.pdf • There are 10,000,000 ways to organize my data on disk… CS 245 Notes 3 85 Space Utilization Complexity Performance CS 245 Notes 3 CS 245 - 87 Example 86 fetch record given key fetch record with next key insert record append record delete record update record read all file reorganize file CS 245 Notes 3 88 Summary How would you design Megatron 3000 storage system? (for a relational DB, low end) • How to lay out data on disk Data Items – Variable length records? – Spanned? – What data types? – Fixed format? – Record IDs ? – Sequencing? – How to handle deletions? CS 245 Notes 3 To evaluate a given strategy, compute following parameters: -> space used for expected data -> expected time to Issues: Flexibility Which is right for me? Notes 3 Records Blocks Files Memory DBMS 89 CS 245 Notes 3 90 15 Next How to find a record quickly, given a key CS 245 Notes 3 91 16