File Organizations
Fall 2015, R McFadyen, ACS-3902

•Tree terms
•root, internal, leaf, subtree
•parent, child, sibling
•balanced, unbalanced
•b+-tree
 - split on overflow; merge on underflow
 - in practice it is usually 3 or 4 levels deep
•search, insert, delete algorithms

MySQL (simplified syntax)
CREATE [UNIQUE] INDEX index_name [index_type] ON tbl_name (index_col_name,...)
index_type: USING {BTREE | HASH}

Indexes are automatically created for:
PRIMARY KEY: a constraint that enforces entity integrity for a given column or columns through a unique index. Only one PRIMARY KEY constraint can be created per table.
UNIQUE: a constraint that provides entity integrity for a given column or columns through a unique index. A table can have multiple UNIQUE constraints.

CLUSTERED: creates an object where the physical order of rows is the same as the indexed order of the rows. Index entries that are logically close correspond to data that is physically close together. A table or view is allowed one clustered index. The primary key index is normally clustered.

Motivation (finding one record given its key)
•Scanning a file is time consuming
•B+-tree provides a short access path
Figure: a B+-tree pointing into a file of records (page1, page2, page3).

Motivation
•A B+-tree is a tree in which each node is a bucket.
•A B+-tree for a file is stored in a separate file.
•A file could have many B+-trees
Figure: a B+-tree pointing into a file of records (bucket 1, bucket 2, bucket 3).

b+-tree
•based on b-tree (Bayer, balanced, Boeing)
•dynamic
Figure: tree levels from top to bottom: root, internal nodes, leaf nodes.

Node structure for b+-tree of order p
non-leaf node (internal node or a root)
•< P1, K1, P2, K2, …, Pq-1, Kq-1, Pq > (q ≤ p)
•keys are in sequence K1 < K2 < ... < Kq-1 (i.e. it's an ordered set)
•for any key value, X, in the subtree pointed to by Pi
 - Ki-1 < X ≤ Ki for 1 < i < q
 - X ≤ K1 for i = 1
 - Kq-1 < X for i = q
•each internal node has at most p pointers
•each node except the root must have at least ⌈p/2⌉ pointers
•the root, if it has some children, must have at least 2 pointers

Node structure for b+-tree of order p
leaf node (terminal node)
•< (K1, Pr1), (K2, Pr2), …, (Kq-1, Prq-1), Pnext >
•K1 < K2 < ... < Kq-1
•Pri points to a record with key value Ki, or Pri points to a block containing a record with key value Ki
•each leaf has at least ⌈p/2⌉ keys
•maximum of p keys
•all leaves are at the same level (balanced)
•Pnext points to the next leaf for key sequencing

Example
•insert records with key values Diane, Cory, Ramon, Amy, Miranda, Ahmed, Marshall, Zena, Rhonda, Vincent, Hok into a b+-tree with p=3.
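Before working through the example, the subtree rule for non-leaf nodes and the occupancy limits can be sketched in Python. This is a minimal sketch, not part of the original slides: the function names child_index and occupancy_bounds are hypothetical, a node's keys are assumed to be held in a sorted list, and the lower bound is assumed to be p/2 rounded up.

```python
from bisect import bisect_left
from math import ceil


def child_index(keys, x):
    """Return the 0-based index of the pointer to follow for search value x.

    keys is the sorted list [K1, ..., Kq-1] of a non-leaf node.
    Index 0 means P1 (x <= K1); index len(keys) means Pq (x > Kq-1).
    """
    return bisect_left(keys, x)


def occupancy_bounds(p):
    """Return (minimum, maximum) pointers for a non-root internal node of order p.

    Assumption: the minimum is p/2 rounded up. In these slides the same pair
    also bounds the number of keys in a leaf.
    """
    return ceil(p / 2), p


if __name__ == "__main__":
    # "Diane" sorts after "Cory" and before "Marshall": follow the middle pointer.
    print(child_index(["Cory", "Marshall"], "Diane"))
    print(occupancy_bounds(3))
```

bisect_left is used because the rule puts a value equal to Ki in the subtree to the left of Ki.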
An internal node will have a minimum of 2 pointers and a maximum of 3 pointers; inserting a fourth will cause a split.
A leaf will have a minimum of 2 key/pointer pairs and a maximum of 3 key/pointer pairs; inserting a fourth will cause a split.

Example
Only leaf nodes at this point; a split is needed before there are internal nodes.
insert Diane
 leaf: [Diane]
 Each leaf entry has a pointer to data (wherever the record for Diane is actually stored), and the leaf has a pointer to the next leaf in ascending key sequence.
insert Cory
 leaf: [Cory, Diane]

Example
insert Ramon (only leaf nodes at this point)
 leaf: [Cory, Diane, Ramon]
Inserting Amy will cause the node to overflow:
 leaf: [Amy, Cory, Diane, Ramon] (this must split)

Example
This is logically correct, but it exceeds the space available, so it must split into two leaves:
 leaf: [Amy, Cory, Diane, Ramon]

Split the node and promote a key value upwards (this must be Cory because it's the highest key value in the left subtree).
 before: leaf [Amy, Cory, Diane, Ramon]
 after: root [Cory]; leaves [Amy, Cory] [Diane, Ramon]
The tree has grown one level, from the bottom up.

Splitting Nodes
Any value being promoted upwards will come from the node that is splitting.
•When a leaf splits, a 'copy' of a key value is promoted.
•When an internal node splits, the middle key value 'moves' from a child to a parent node.
There are three situations to be concerned with:
•a leaf splits,
•an internal node splits,
•a new root is generated.
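The difference between 'copy' and 'move' can be sketched in Python. This is an illustrative sketch under stated assumptions, not the course's implementation: split_leaf and split_internal are hypothetical names, a leaf is a sorted list of (key, record_pointer) pairs, and the split points (the left leaf takes the extra entry; the middle key is at index (n - 1) // 2) are one common choice.

```python
def split_leaf(entries):
    """Split an overflowing leaf given as a sorted list of (key, record_pointer) pairs.

    Returns (left, promoted_key, right). The promoted key is a copy:
    it remains in the left leaf as its largest key.
    """
    mid = (len(entries) + 1) // 2      # assumed split point: left gets the extra entry
    left, right = entries[:mid], entries[mid:]
    return left, left[-1][0], right


def split_internal(keys, children):
    """Split an overflowing internal node, where len(children) == len(keys) + 1.

    Returns ((left_keys, left_children), middle_key, (right_keys, right_children)).
    The middle key moves up: it appears in neither sibling.
    """
    mid = (len(keys) - 1) // 2         # assumed choice of the middle key
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, keys[mid], right


if __name__ == "__main__":
    # The overflowing leaf from the example; the pointer values are placeholders.
    leaf = [(name, "ptr_" + name) for name in ["Amy", "Cory", "Diane", "Ramon"]]
    left, promoted, right = split_leaf(leaf)
    print([k for k, _ in left], promoted, [k for k, _ in right])

    # An internal node with three keys and four child pointers (placeholder labels).
    print(split_internal(["Cory", "Marshall", "Ramon"], ["L1", "L2", "L3", "L4"]))
```

When the node that splits is the root, the caller would place the promoted key and the two sibling pointers into a newly allocated root.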
Leaf splitting
When a leaf splits, a new leaf is allocated
•the original leaf is the left sibling, the new one is the right sibling
•key and pointer pairs of the overflowing node are redistributed: the left sibling will have smaller keys than the right sibling
•a 'copy' of the key value which is the largest of the keys in the left sibling is promoted to the parent
Example (insert 31):
 before: parent [33]; leaves [12, 22, 33] [44, 48, 55]
 after: parent [22, 33]; leaves [12, 22] [31, 33] [44, 48, 55]
Two situations arise: the parent exists or not. If the parent exists, then a copy of the key value and the pointer to the right sibling are promoted upwards. Otherwise, the b+-tree is just beginning to grow ...

Internal node splitting
If an internal node splits and it is not the root,
•insert the key and pointer and then determine the middle key
•a new 'right' sibling is allocated
•everything to the left of the middle key stays in the left sibling
•everything to the right of the middle key goes into the right sibling
•the middle key value along with the pointer to the new right sibling is promoted to the parent (the middle key value 'moves' to the parent to become the discriminator between the left and right siblings)
Example (insert 26):
 before: parent [55]; internal node [22, 33]
 after: parent [26, 55]; internal nodes [22] [33]

Internal node splitting
When a new root is formed, a key value and two pointers must be placed into it.
Example (insert 56):
 before: root [26, 55]
 after: new root [55]; internal nodes [26] [56]

A sample trace
Insert Diane, Cory, Ramon, Amy, Miranda, Marshall, Zena, Rhonda, Vincent, Simon, Mary into a b+-tree with p=3.
root [Cory]; leaves [Amy, Cory] [Diane, Ramon]
Next insert: Miranda

root [Cory]; leaves [Amy, Cory] [Diane, Miranda, Ramon]
Next insert: Marshall

root [Cory, Marshall]; leaves [Amy, Cory] [Diane, Marshall] [Miranda, Ramon]
Next insert: Zena

root [Cory, Marshall]; leaves [Amy, Cory] [Diane, Marshall] [Miranda, Ramon, Zena]
Next insert: Rhonda

root [Cory, Marshall, Ramon] (overflows and must split); leaves [Amy, Cory] [Diane, Marshall] [Miranda, Ramon] [Rhonda, Zena]

root [Marshall]; internal nodes [Cory] [Ramon]; leaves [Amy, Cory] [Diane, Marshall] [Miranda, Ramon] [Rhonda, Zena]
Next insert: Vincent

root [Marshall]; internal nodes [Cory] [Ramon]; leaves [Amy, Cory] [Diane, Marshall] [Miranda, Ramon] [Rhonda, Vincent, Zena]
Next insert: Simon

root [Marshall]; right internal node [Ramon, Simon]; leaves [Miranda, Ramon] [Rhonda, Simon] [Vincent, Zena] (the left subtree under Cory is not shown)
Next insert: Mary

A sample b+-tree
p = 3, pleaf = 2.
root [5]; internal nodes [3] [7, 8]; leaves [1, 3] [5] [6, 7] [8] [9, 12]; leaf entries point to the records.

Searching a b+-tree
- search for the record with key = 8:
root [5]; internal nodes [3] [7, 8]; leaves [1, 3] [5] [6, 7] [8] [9, 12]
The cost of retrieving this record is at most 4 disk accesses: 3 to the index component and 1 to the data component. In an actual system it would usually be less because the system would keep as much of the index as it can in main memory.

Entry deletion
- deletion sequence: 8, 12, 9, 7
Deleting 8 results in underflow and key redistribution:
root [5]; internal nodes [3] [7, 9]; leaves [1, 3] [5] [6, 7] [9] [12]

12 is removed:
root [5]; internal nodes [3] [7]; leaves [1, 3] [5] [6, 7] [9]

9 is removed:
root [5]; internal nodes [3] [6]; leaves [1, 3] [5] [6] [7]

7 is removed:
root [5]; internal nodes [3] [6]; leaves [1, 3] [5] [6]
Deleting 7 makes the second pointer of node [6] useless. Therefore, a merge at the level above the leaf level occurs.
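The leaf-level steps of this deletion sequence can be sketched in Python. This is a minimal sketch under stated assumptions, not the algorithm as taught: delete_key is a hypothetical name, only one internal node and its leaves are modelled, record pointers are omitted, the minimum is 1 key per leaf (pleaf = 2), and the right sibling is tried before the left one because that reproduces the trees shown in the walkthrough.

```python
from bisect import bisect_left


def delete_key(parent_keys, leaves, key, min_keys=1):
    """Delete key from the leaves under one internal node, modifying both lists in place.

    parent_keys: sorted separator keys of the internal node.
    leaves: list of sorted key lists, with len(leaves) == len(parent_keys) + 1.
    """
    i = bisect_left(parent_keys, key)      # index of the leaf that would hold the key
    leaf = leaves[i]
    if key not in leaf:
        return
    leaf.remove(key)
    if len(leaf) >= min_keys:
        return                             # no underflow
    right = leaves[i + 1] if i + 1 < len(leaves) else None
    left = leaves[i - 1] if i > 0 else None
    if right is not None and len(right) > min_keys:
        leaf.append(right.pop(0))          # redistribute: borrow smallest key from the right
        parent_keys[i] = leaf[-1]          # separator becomes the largest key on its left
    elif left is not None and len(left) > min_keys:
        leaf.insert(0, left.pop())         # redistribute: borrow largest key from the left
        parent_keys[i - 1] = left[-1]
    elif left is not None:
        left.extend(leaf)                  # merge into the left sibling
        del leaves[i], parent_keys[i - 1]
    elif right is not None:
        leaf.extend(right)                 # merge the right sibling into this leaf
        del leaves[i + 1], parent_keys[i]


if __name__ == "__main__":
    # Right-hand internal node of the sample tree and its three leaves.
    keys, leaves = [7, 8], [[6, 7], [8], [9, 12]]
    for k in (8, 12, 9, 7):
        delete_key(keys, leaves, k)
        print("after deleting", k, "->", keys, leaves)
```

After the last call the internal node has no key and a single pointer left. That is the underflow one level up; handling it is outside this sketch.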
Entry deletion
- deletion sequence: 8, 12, 9, 7
Figure: root [5]; left internal node A = [3]; leaves [1, 3] [5] [6]; B and C label the nodes on either side of key 5.
The root's pointer to the underfull right-hand node becomes useless. The corresponding node should also be removed.
For this merge, 5 will be taken as a key value in A, since any key value in B is less than or equal to 5 but any key value in C is larger than 5.

Entry deletion
- deletion sequence: 8, 12, 9, 7
Result: root [3, 5]; leaves [1, 3] [5] [6]

b+-tree operations
•search - always the same search length - tree height+1
•retrieval - sequential access is facilitated - how?
•insert - may cause overflow - tree may grow
•delete - may cause underflow - tree may shrink
Aside: What do you expect for storage utilization?
http://en.wikipedia.org/wiki/B-tree
https://en.wikipedia.org/wiki/B%2B_tree
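Search and sequential retrieval can be sketched in Python on the sample tree with p = 3, pleaf = 2 used in the searching example. This is an illustrative sketch, not the course's code: the class and function names are hypothetical, nodes live in memory rather than in disk blocks, and leaves hold keys only (no record pointers).

```python
from bisect import bisect_left


class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None               # Pnext: next leaf in key sequence


class Internal:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children


def build_sample_tree():
    """Build the tree: root [5]; internal [3] [7, 8]; leaves [1,3] [5] [6,7] [8] [9,12]."""
    leaves = [Leaf(k) for k in ([1, 3], [5], [6, 7], [8], [9, 12])]
    for a, b in zip(leaves, leaves[1:]):
        a.next = b
    return Internal([5], [Internal([3], leaves[:2]), Internal([7, 8], leaves[2:])])


def find_leaf(root, key):
    """Descend from the root; return (leaf, number of index nodes visited)."""
    node, visited = root, 1
    while isinstance(node, Internal):
        node = node.children[bisect_left(node.keys, key)]
        visited += 1
    return node, visited


def search(root, key):
    """Return (found, index nodes visited); every search visits the same number."""
    leaf, visited = find_leaf(root, key)
    return key in leaf.keys, visited


def range_scan(root, low, high):
    """Return all keys in [low, high] by walking the leaf chain through next."""
    leaf, _ = find_leaf(root, low)
    result = []
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return result
            if k >= low:
                result.append(k)
        leaf = leaf.next
    return result


if __name__ == "__main__":
    root = build_sample_tree()
    print(search(root, 8))
    print(range_scan(root, 4, 9))
```

The range scan descends the tree once and then follows Pnext from leaf to leaf, which is one way to read the "sequential access is facilitated" point.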