§6 B+ Trees

【Definition】 A B+ tree of order M is a tree with the following structural properties:
(1) The root is either a leaf or has between 2 and M children.
(2) All nonleaf nodes (except the root) have between ⌈M/2⌉ and M children.
(3) All leaves are at the same depth.
Assume each nonroot leaf also has between ⌈M/2⌉ and M children.

Each interior node contains M pointers to the children, and the M − 1 smallest key values in the subtrees except the 1st one.
All the actual data are stored at the leaves.

A B+ tree of order 4 (2-3-4 tree)

    [21 : 48 : 72]
        [12 : 15 : -]    leaves: 1,4,8,11 | 12,13 | 15,18,19
        [25 : 31 : 41]   leaves: 21,24 | 25,26 | 31,38 | 41,43,46
        [59 : - : -]     leaves: 48,49,50 | 59,68
        [84 : 91 : -]    leaves: 72,78 | 84,88 | 91,92,99

A B+ tree of order 3 (2-3 tree)

Initial tree:

    [22 : -]
        [16 : -]     leaves: 8,11,12 | 16,17
        [41 : 58]    leaves: 22,23,31 | 41,52 | 58,59,61

Operations shown in sequence: Find: 52, Insert: 18, Insert: 1, Insert: 19, Insert: 28, Insert: 70.

Tree after Insert: 28 (the tree into which 70 is inserted):

    [22 : -]
        [16 : -]
            [11 : -]    leaves: 1,8 | 11,12
            [18 : -]    leaves: 16,17 | 18,19
        [41 : -]
            [28 : -]    leaves: 22,23 | 28,31
            [58 : -]    leaves: 41,52 | 58,59,61

Insert: 70. First find a sibling with 2 keys and adjust. Keep more nodes full.

Deletion is similar to insertion except that the root is removed when it loses two children.

For a general B+ tree of order M

    Btree Insert ( ElementType X, Btree T )
    {
        Search from root to leaf for X and find the proper leaf node;
        Insert X;
        while ( this node has M+1 keys ) {
            split it into 2 nodes with ⌈(M+1)/2⌉ and ⌊(M+1)/2⌋ keys, respectively;
            if ( this node is the root )
                create a new root with two children;
            check its parent;
        }
    }

Home work: p.138 4.36 Access a 2-3 tree

Discussion 7: Depth(M, N) = ? T_insert = ? T_find = ?

Research Project 3
Family of B Trees (23)

In computer science, there is a family of B trees: B-trees, B+ trees, B* trees, B# trees, and Bx-trees. They are tree data structures that keep data sorted and allow searches, insertions, and deletions in logarithmic (amortized) time.
In this project, you are supposed to introduce B-trees and compare them with B+ trees.
Detailed requirements can be downloaded from http://acm.zju.edu.cn/dsaa/

Research Project 4
Tries (23)

A trie is an index structure that is particularly useful when the keys vary in length. It is also called a prefix tree, and is used to store an associative array where the keys are usually strings.
In this project, you are supposed to introduce tries and compare them with ordinary binary search trees.
Detailed requirements can be downloaded from http://acm.zju.edu.cn/dsaa/

Inverted File Index

How can I find which of the retrieved web pages include "Computer Science"?

Solution 1: Scan each page for the string "Computer Science".

How did Google do it?

Solution 2: Inverted File Index

【Definition】 An index is a mechanism for locating a given term in a text.
【Definition】 An inverted file contains, for each term, a list of pointers (e.g. the number of a page) to all occurrences of that term in the text.

〖Example〗 Document sets

    Doc   Text
    1     Gold silver truck
    2     Shipment of gold damaged in a fire
    3     Delivery of silver arrived in a silver truck
    4     Shipment of gold arrived in a truck

Inverted File Index

    No.   Term       <Times; Documents>
    1     a          <3; 2,3,4>
    2     arrived    <2; 3,4>
    3     damaged    <1; 2>
    4     delivery   <1; 3>
    5     fire       <1; 2>
    6     gold       <3; 1,2,4>
    7     of         <3; 2,3,4>
    8     in         <3; 2,3,4>
    9     shipment   <2; 2,4>
    10    silver     <2; 1,3>
    11    truck      <3; 1,3,4>

Discussion 8: How to easily print the sentences which contain the words and highlight the words?
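The example index can be rebuilt mechanically from the four documents. The following is only a minimal sketch in C, not the method prescribed by the course: the names `Posting`, `InvertedIndex`, `find_term`, `add_occurrence`, `index_document`, `build_example` and `print_index`, the fixed array sizes, and the linear search are all assumptions made for illustration. As in the table, "Times" is taken to mean the number of documents containing the term, so "silver" counts 2 even though it occurs twice in document 3.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_TERMS 64   /* assumed capacity, enough for the example */
#define MAX_DOCS  8
#define MAX_WORD  32

/* One entry of the inverted file: a term and the documents containing it. */
typedef struct {
    char term[MAX_WORD];
    int  ndocs;             /* number of documents containing the term */
    int  docs[MAX_DOCS];    /* document numbers, in the order they were added */
} Posting;

typedef struct {
    int     nterms;
    Posting terms[MAX_TERMS];
} InvertedIndex;

/* Return the entry for term, or NULL if the term is not indexed.
   Linear search keeps the sketch short; a search tree or hash table
   would replace it in practice. */
Posting *find_term(InvertedIndex *idx, const char *term)
{
    for (int i = 0; i < idx->nterms; i++)
        if (strcmp(idx->terms[i].term, term) == 0)
            return &idx->terms[i];
    return NULL;
}

/* Record that term occurs in doc_no. Assumes documents are indexed in
   increasing order, so a repeat within one document is the last entry. */
void add_occurrence(InvertedIndex *idx, const char *term, int doc_no)
{
    Posting *p = find_term(idx, term);
    if (p == NULL) {
        if (idx->nterms == MAX_TERMS) return;   /* index full: drop the term */
        p = &idx->terms[idx->nterms++];
        strncpy(p->term, term, MAX_WORD - 1);
        p->term[MAX_WORD - 1] = '\0';
        p->ndocs = 0;
    }
    if (p->ndocs > 0 && p->docs[p->ndocs - 1] == doc_no) return;
    if (p->ndocs < MAX_DOCS) p->docs[p->ndocs++] = doc_no;
}

/* Split text into lowercase alphabetic words and index each one. */
void index_document(InvertedIndex *idx, const char *text, int doc_no)
{
    char word[MAX_WORD];
    int len = 0;
    for (const char *c = text; ; c++) {
        if (isalpha((unsigned char)*c)) {
            if (len < MAX_WORD - 1)
                word[len++] = (char)tolower((unsigned char)*c);
        } else {
            if (len > 0) {              /* end of a word: flush it */
                word[len] = '\0';
                add_occurrence(idx, word, doc_no);
                len = 0;
            }
            if (*c == '\0') break;
        }
    }
}

/* Build the index for the four example documents (numbered 1 to 4). */
void build_example(InvertedIndex *idx)
{
    static const char *docs[] = {
        "Gold silver truck",
        "Shipment of gold damaged in a fire",
        "Delivery of silver arrived in a silver truck",
        "Shipment of gold arrived in a truck",
    };
    idx->nterms = 0;
    for (int d = 0; d < 4; d++)
        index_document(idx, docs[d], d + 1);
}

/* Print every entry as: term <Times; Documents>. */
void print_index(const InvertedIndex *idx)
{
    for (int i = 0; i < idx->nterms; i++) {
        const Posting *p = &idx->terms[i];
        printf("%-10s <%d; ", p->term, p->ndocs);
        for (int j = 0; j < p->ndocs; j++)
            printf("%s%d", j ? "," : "", p->docs[j]);
        printf(">\n");
    }
}
```

Calling `build_example(&idx)` and then `print_index(&idx)` lists the terms in first-seen order rather than alphabetically; sorting the entries, or storing them in one of the search structures discussed under Access Methods, is left out of the sketch.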
Word Stemming

Process a word so that only its stem or root form is left.

〖Example〗
    processing, processes, processed → process
    says, said, saying → say

Stop Words

Some words are so common that almost every document contains them, such as “a”, “the”, “it”. It is useless to index them. They are called stop words. We can eliminate them from the original documents.

Access Methods

Solution 1: Search trees ( B-trees, B+ trees, Tries, ... )
Solution 2: Hashing

Discussion 9: What are the pros and cons of using hashing?
Discussion 10: How to improve the quality of search results?

Research Project 5
Roll Your Own Mini Search Engine

In this project, you are supposed to create your own mini search engine which can handle 1 million inquiries over 100 files in 1 second. You may download the functions for handling stop words and stemming from the Internet.
Detailed requirements can be downloaded from http://acm.zju.edu.cn/dsaa/
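As a starting point for the stop-word handling mentioned above, the filter can be sketched in a few lines of C. This is an illustrative sketch only: the function names `is_stop_word` and `remove_stop_words` are hypothetical, the list holds just the three stop words named in the slides, and words are assumed to be lowercase already. A real engine would use a much longer list and a faster lookup.

```c
#include <string.h>

/* Stop-word list: only the three examples given in the slides. */
static const char *STOP_WORDS[] = { "a", "the", "it" };

/* Return 1 if word (assumed lowercase) is a stop word, 0 otherwise. */
int is_stop_word(const char *word)
{
    size_t n = sizeof STOP_WORDS / sizeof STOP_WORDS[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(word, STOP_WORDS[i]) == 0)
            return 1;
    return 0;
}

/* Copy the non-stop words of words[0..n-1] into out, preserving order.
   out must have room for n pointers. Returns the number of words kept. */
int remove_stop_words(const char *words[], int n, const char *out[])
{
    int kept = 0;
    for (int i = 0; i < n; i++)
        if (!is_stop_word(words[i]))
            out[kept++] = words[i];
    return kept;
}
```

Filtering before indexing keeps entries such as "a" out of the inverted file entirely, which shrinks the index at the cost of making those words unsearchable.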