Data Structures for Databases
He Tan, hetan@ida.liu.se, IISLAB, IDA
September 2007
TDDB38/TDDI60 - HT 2004 (handout header: Sept 9, 2004, AHz-04.27-1.0)

Overview
[Figure: Real world, Model, Databases; DBMS: Queries, Answers, Processing of queries and updates, Access to stored data, Physical database]

What is this about?
• How to make more efficient kinds of indexes

What do you need to learn?
• Multilevel indexing
• Index on multiple keys
• Hashing

Multilevel Index Example
• Assume an ordered data file with 1,000,000 records of size 1,000 bytes and a block size of 4,096 bytes. Assume an index record size of 32 bytes.
  On average, how many block accesses need to be performed to find a single record when searching for the key field
  a) using no index?
  b) using a primary index?

Record access
• The number of blocks for the data file is 250,000 (4 records of 1,000 bytes fit in one 4,096-byte block).
• Blocks: a binary search would need to access about $\log_2 b = \log_2 250{,}000 \approx 18$ blocks.
• A sequential algorithm may have to access all 250,000 blocks (transferring every block into main memory).

Multilevel Indexes
• "Index on the index": reduce the search space of the index by fitting indexes of the index in fewer blocks, until the top-level index fits in one block.
• The reduction is determined by the blocking factor.
• The blocking factor of the index is called the fan-out (fo).

Multilevel Index Example
• Assume an ordered data file with 1,000,000 records of size 1,000 bytes and a block size of 4,096 bytes. Assume an index record size of 32 bytes.
• All levels are based on physically ordered files.
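The arithmetic behind this example setup can be checked mechanically. The sketch below is only an illustration and not part of the original slides: the function names are hypothetical, the primary index is assumed to be sparse with one 32-byte entry per data block, and each indexed lookup is assumed to cost one extra access to fetch the data block itself.

```python
import math

def blocks_needed(num_items, item_size, block_size):
    """Blocks required when whole items are packed into fixed-size blocks."""
    per_block = block_size // item_size        # blocking factor (items per block)
    return math.ceil(num_items / per_block)

def binary_search_accesses(num_blocks):
    """Block accesses for a binary search over num_blocks ordered blocks."""
    return math.ceil(math.log2(num_blocks))

def multilevel_levels(num_entries, fan_out):
    """Index levels needed until the top level fits in a single block."""
    levels, blocks = 0, num_entries
    while True:
        blocks = math.ceil(blocks / fan_out)   # blocks needed at this level
        levels += 1
        if blocks == 1:
            return levels

# Figures from the example: block size, record size, index record size, records.
BLOCK, RECORD, INDEX_ENTRY, RECORDS = 4096, 1000, 32, 1_000_000

data_blocks = blocks_needed(RECORDS, RECORD, BLOCK)              # 250,000
fan_out = BLOCK // INDEX_ENTRY                                    # 128
index_blocks = blocks_needed(data_blocks, INDEX_ENTRY, BLOCK)     # 1,954

no_index = binary_search_accesses(data_blocks)                    # 18
primary_index = binary_search_accesses(index_blocks) + 1          # 11 + 1 = 12
multilevel = multilevel_levels(data_blocks, fan_out) + 1          # 3 + 1 = 4

print(no_index, primary_index, multilevel)  # 18 12 4
```

Under these assumptions the fan-out of 128 shrinks the index from 1,954 blocks to 16 blocks to 1 block, which is why three index levels are enough.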
• On average, how many block accesses need to be performed to find a single record when searching for the key field
  a) using a multilevel index?

Problems with Multilevel Indexes
Problems when inserting and deleting data. Possible remedies:
• Use an overflow file and re-create the index during file reorganisation.
• Use a dynamic multilevel index structure.

Search Tree
• A search tree is a tree that is used to guide the search for a record.
• An ordinary search tree of order p consists of nodes that have at most p−1 values and p pointers.

Search Trees
A node has the form $\langle P_1, K_1, P_2, K_2, \ldots, P_{q-1}, K_{q-1}, P_q \rangle$, where $q \le p$ and $P_i$ is a pointer to a child node (or a null pointer).
1. Within each node, $K_1 < K_2 < \ldots < K_{q-1}$.
2. For all values X in the subtree pointed at by $P_i$:
   if $1 < i < q$: $K_{i-1} < X < K_i$
   if $i = 1$: $X < K_1$
   if $i = q$: $K_{q-1} < X$

Search Tree: Example, order p=3
[Figure: search tree of order 3]

B-Tree
B-tree = balanced tree.
• All leaves are on the same level.
• All nodes except the root and the leaves have at most $p$ pointers and at least $\lceil p/2 \rceil$ pointers.

B-Tree: Example, order p=3
[Figure: B-tree of order 3]

B-tree: Order
One node must fit in one block:
$p \cdot P_{block} + (p - 1) \cdot (P_{record} + K) \le B \;\Rightarrow\; p \le \frac{B + P_{record} + K}{P_{block} + P_{record} + K}$
  $B$         block size
  $p$         order, number of block pointer entries in a node
  $P_{block}$   size of a block pointer
  $P_{record}$  size of a record pointer
  $K$         size of a search key field

B-tree: Number of entries
• Given: B = 4096 bytes, Precord = 16 bytes, Pblock = 8 bytes, K = 64 bytes, fill percentage = 69% → p <= 47

            Nodes     Pointers              Entries
  Root      1         0.69 · 47 ≈ 33        33 − 1 = 32
  Level 1   33        33 · 33 = 1,089       33 · 32 = 1,056
  Level 2   1,089     33^3 = 35,937         33^2 · 32 = 34,848
  Level 3   35,937    33^4 = 1,185,921      33^3 · 32 = 1,149,984

  Note: 0.69 · 47 = 32.43; the table works with 33 pointers per node.
  The number of entries held in the 3-level B-tree: 32 + 1,056 + 34,848 + 1,149,984 = 1,185,920

B+-tree
• A variation of the B-tree
• Data pointers are stored only in leaf nodes.
• The leaf nodes are usually linked to provide ordered access.
• Most common dynamic multilevel index implementation

B+-Tree: Example, order p=3, pleaf=2
Order of insertion: 8, 5, 1, 7, 3, 12, 9, 6
[Figure: the resulting B+-tree; the leaf entries point to the data records Andersson, Hagberg, French, Silver, Daniels, Young, Zhing, Baker]

B+-trees: Internal nodes
1. Each internal node is of the form $\langle P_1, K_1, P_2, K_2, \ldots, P_{q-1}, K_{q-1}, P_q \rangle$.
2. Within each internal node, $K_1 < K_2 < \ldots < K_{q-1}$.
3. For all search field values X in the subtree pointed at by $P_i$, we have
   $K_{i-1} < X \le K_i$ for $1 < i < q$
   $X \le K_1$ for $i = 1$
   $K_{q-1} < X$ for $i = q$
4. Each internal node has at most $p$ tree pointers.
5. Each internal node, except the root, has at least $\lceil p/2 \rceil$ tree pointers. The root node has at least two tree pointers if it is an internal node.
6. An internal node with $q$ pointers ($q \le p$) has $q-1$ search field values.

B+-trees: Leaf nodes
1. Each leaf node is of the form $\langle \langle K_1, Pr_1 \rangle, \langle K_2, Pr_2 \rangle, \ldots, \langle K_{q-1}, Pr_{q-1} \rangle, P_{next} \rangle$.
2. Within each leaf node, $K_1 < K_2 < \ldots < K_{q-1}$.
3. Each entry contains a pointer $Pr_i$ to the record whose search field value corresponds to the entry.
4. Each leaf node has at least $\lceil p_{leaf}/2 \rceil$ values (the example with pleaf = 2 contains leaves with a single value).
5. All leaf nodes are at the same level.
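The internal-node rule $K_{i-1} < X \le K_i$ and the $P_{next}$ link between leaves can be made concrete with a small lookup sketch. It is an illustration only: the `Leaf` and `Internal` classes and both functions are hypothetical names, record pointers are left out, and the tree is built by hand as the one that results from the insertion order 8, 5, 1, 7, 3, 12, 9, 6 with p = 3 and pleaf = 2.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional, Union

@dataclass
class Leaf:
    keys: List[int]
    next_leaf: Optional["Leaf"] = None       # plays the role of Pnext

@dataclass
class Internal:
    keys: List[int]
    children: List[Union["Internal", Leaf]]  # one more child than keys

def find_leaf(node, x):
    """Descend to the leaf whose key range covers x."""
    while isinstance(node, Internal):
        # bisect_left returns i such that keys[i-1] < x <= keys[i],
        # which is exactly the subtree rule for internal nodes.
        i = bisect.bisect_left(node.keys, x)
        node = node.children[i]
    return node

def range_scan(root, lo, hi):
    """Collect all keys in [lo, hi] by following the leaf chain."""
    leaf, found = find_leaf(root, lo), []
    while leaf is not None:
        for key in leaf.keys:
            if key > hi:
                return found
            if key >= lo:
                found.append(key)
        leaf = leaf.next_leaf
    return found

# Hand-built example tree: root (5), internal nodes (3) and (7, 8).
leaves = [Leaf([1, 3]), Leaf([5]), Leaf([6, 7]), Leaf([8]), Leaf([9, 12])]
for left, right in zip(leaves, leaves[1:]):
    left.next_leaf = right
root = Internal([5], [Internal([3], leaves[:2]), Internal([7, 8], leaves[2:])])

print(find_leaf(root, 8).keys)   # [8]
print(range_scan(root, 4, 9))    # [5, 6, 7, 8, 9]
```

The range scan touches the internal nodes only once, for the lower bound; after that it moves from leaf to leaf, which is the ordered access the linked leaves are meant to provide.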
B+-tree Order
One internal node must fit in one block:
$p \cdot P_{block} + (p - 1) \cdot K \le B \;\Rightarrow\; p \le \frac{B + K}{P_{block} + K}$
One leaf node must fit in one block:
$p_{leaf} \cdot (P_{record} + K) + P_{block} \le B \;\Rightarrow\; p_{leaf} \le \frac{B - P_{block}}{P_{record} + K}$
  $B$         block size
  $p$         order, number of pointer entries in an internal node
  $p_{leaf}$    number of record pointer entries in a leaf node
  $P_{block}$   size of a block pointer
  $K$         size of a search key field
  $P_{record}$  size of a record pointer

B+-trees
• Given: B = 4096 bytes, Precord = 16 bytes, Pblock = 8 bytes, K = 64 bytes, fill percentage = 70% → p <= 57; pleaf <= 51

               Nodes     Pointers             Entries
  Root         1         0.7 · 57 ≈ 40        40 − 1 = 39
  Level 1      40        40 · 40 = 1,600      40 · 39 = 1,560
  Level 2      1,600     40^3 = 64,000        40^2 · 39 = 62,400
  Leaf level   64,000    Record pointers: 64,000 · 0.7 · 51 = 2,284,800

  For comparison, the number of entries held in the 3-level B-tree: 1,185,920

B+-trees Search
• Very fast searching in the index structure: $\log_{p \cdot f} N$
  $N$  number of search values
  $p$  order, number of block pointers per node
  $f$  fill factor, $0 \le f \le 1$
• Insertion and deletion can be expensive.

B+-Tree Search
Search: 8
[Figure: search path for the value 8 in the example B+-tree]

B+-trees Insertion and Deletion

B+-tree: Insertion
• Inserting into a leaf node that is already full causes an overflow.
• The first $\lceil p/2 \rceil$ entries in the node are kept there; the remaining entries are moved to a new leaf.
• A search value separating the two leaves moves up to the parent: the largest value kept in the original leaf, since values X in the subtree left of $K_i$ satisfy $X \le K_i$.
• If the parent is full, it will overflow as well.
• The resulting split can propagate all the way up to the root.
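The insertion rule can be tried out on the running example (p = 3, pleaf = 2, insertion order 8, 5, 1, 7, 3, 12, 9, 6). The following is a minimal sketch rather than a complete B+-tree: the class and method names are hypothetical, record pointers and deletion are omitted, duplicate keys are not handled, and two details are assumptions taken from the usual textbook formulation: an overflowing leaf keeps the first half of its entries rounded up, and an overflowing internal node sends its middle value up.

```python
import bisect

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys            # sorted search values
        self.children = children    # child nodes; None marks a leaf
        self.next_leaf = None       # Pnext link, used by leaves only

class BPlusTree:
    def __init__(self, p=3, p_leaf=2):
        self.p = p                  # max tree pointers in an internal node
        self.p_leaf = p_leaf        # max search values in a leaf node
        self.root = Node([])

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:       # root overflowed: create a new level
            separator, right = split
            self.root = Node([separator], [self.root, right])

    def _insert(self, node, key):
        if node.children is None:                 # leaf node
            bisect.insort(node.keys, key)
            if len(node.keys) <= self.p_leaf:
                return None
            keep = (len(node.keys) + 1) // 2      # first half (rounded up) stays
            right = Node(node.keys[keep:])
            node.keys = node.keys[:keep]
            right.next_leaf, node.next_leaf = node.next_leaf, right
            return node.keys[-1], right           # largest kept value goes up
        i = bisect.bisect_left(node.keys, key)    # K[i-1] < key <= K[i]
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        separator, right = split
        node.keys.insert(i, separator)
        node.children.insert(i + 1, right)
        if len(node.children) <= self.p:
            return None
        mid = len(node.keys) // 2                 # middle value moves up
        up = node.keys[mid]
        new_right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
        return up, new_right

def levels(tree):
    """Return the keys of every node, level by level from the root."""
    out, level = [], [tree.root]
    while level:
        out.append([n.keys for n in level])
        level = [c for n in level if n.children for c in n.children]
    return out

tree = BPlusTree(p=3, p_leaf=2)
for key in [8, 5, 1, 7, 3, 12, 9, 6]:
    tree.insert(key)
print(levels(tree))
# [[[5]], [[3], [7, 8]], [[1, 3], [5], [6, 7], [8], [9, 12]]]
```

With these assumptions the sketch ends in the same tree as the slides: root (5), internal nodes (3) and (7, 8), and leaves (1 3), (5), (6 7), (8), (9 12).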
B+-Tree
Insertion example (order p = 3, pleaf = 2), one slide per step:

  Insert   Event                                         Tree after the insertion
  8                                                      leaf (8)
  5                                                      leaf (5 8)
  1        Overflow: create a new level                  root (5); leaves (1 5), (8)
  7                                                      root (5); leaves (1 5), (7 8)
  3        Overflow: split                               root (3 5); leaves (1 3), (5), (7 8)
  12       Overflow: split, propagates to a new level    root (5); internal (3), (8); leaves (1 3), (5), (7 8), (12)
  9                                                      root (5); internal (3), (8); leaves (1 3), (5), (7 8), (9 12)
  6        Overflow: split                               root (5); internal (3), (7 8); leaves (1 3), (5), (6 7), (8), (9 12)

Resulting B+-tree: root (5); internal nodes (3) and (7 8); leaves (1 3), (5), (6 7), (8), (9 12)

B+-tree: Deletion
• When a leaf node becomes less than half full, it causes an underflow.
• Redistribute entries, or merge with a sibling.
• The resulting combining can also propagate to internal nodes.

B+-Tree
[Figures: deletion example, one slide per step. Delete: 5. Delete: 12 (underflow, redistribute). Delete: 9 (underflow, merge with the left sibling, propagate, reduce the tree levels).]

B+-trees
• Many variations
  B-trees
  B+-trees
  B*-trees (B+-tree with a fill factor of at least 2/3)
• Common modifications
  Change the fill factor from 0.5 to 1.0
  Allow a node to become empty before merging

Indexes on Multiple Keys
e.g.
  select * from employee where dept = 'CS' and age = '40'
• Possible strategies for processing this query using indices on single attributes:
  • use the index on dept to find employees with dept = 'CS', then test them individually to see if age = '40'
  • use the index on age to find employees with age = '40', then test them individually to see if dept = 'CS'
  • use the dept index to find pointers to all records of the CS department, use the age index similarly, then take the intersection of both sets of pointers

Indexes on Multiple Keys
• Ordered index on multiple attributes: treat the composite as a single value.
• If the set of records that matches each condition is large, but the combination is not, an index on the composite may be useful.

Hashing
• Fast search with equality condition on hash field.
• Basic idea: static hashing
• Dynamic hashing techniques
  Extendible hashing
  Linear hashing

Static Hashing
• Buckets contain index entries.
• Hash function h(field) yields block address: h(key) = key mod M
• Collision: handled with overflow buckets
• Address space = bfr * M
[Figure: a key is mapped by h to one of the buckets 0, 1, ..., M−1; overflow buckets]

Extendible Hashing
• Additional access structure: directory
[Figure: Insert 20 into a file with global depth d = 2 (directory entries 00, 01, 10, 11). Before: Bucket A (d' = 1) holds 4*, 12*, 13*, 16*. After: Bucket A (d' = 2) holds 4*, 12*, 16*, 20* and the new Bucket A' (d' = 2) holds 13*. Bucket B (d' = 2: 10*, 14*) and Bucket C (d' = 2: 15*, 7*, 19*) are unchanged.]

Extendible Hashing
• Extend → double the directory:
  before the insert, local depth of the bucket = global depth;
  the insert causes the local depth to become > global depth.
• Shrink → halve the directory:
  if the removal of a data entry makes a bucket empty;
  if each directory element points to the same bucket as its split image.
• Gain: no performance degradation due to collisions
• At the cost of: 2 block accesses per record (directory + data), space for the directory, and bucket reorganization.

Linear Hashing
• $h_i(K) = K \bmod M$
• Extend when a collision occurs:
  split bucket n in two;
  distribute the entries in bucket n based on $h_{i+1}(K) = K \bmod 2M$;
  n = n + 1
• Retrieve: if (K mod M) < n then return $h_{i+1}(K)$ else return $h_i(K)$
• Shrink: the buckets are also combined linearly

Indexes in reality – MySQL InnoDB storage engine
• Creates a clustered index for each table
• Rows are physically ordered by the primary key
• B-trees

Indexes in reality – Oracle: Cluster Index
• Keep together (on disk) what belongs together → faster retrieval of data
• A cluster is made up of a group of tables that share common columns and are often used together.
Unclustered tables (related data stored apart)

  EMP
  EmpNo   EmpName   EmpDeptNo
  100     Smith     10
  101     Wilson    10
  102     Jones     20
  103     Baker     20

  DEPT
  DeptNo   DeptName
  10       Sales
  20       Admin

Clustered tables (related data stored together)

  EMP_DEPT, cluster key DeptNo
  DeptNo 10   DeptName Sales
              EmpNo   EmpName
              100     Smith
              101     Wilson
  DeptNo 20   DeptName Admin
              EmpNo   EmpName
              102     Jones
              103     Baker

Indexes in reality – Oracle: Cluster Index

  CREATE CLUSTER emp_dept (deptno NUMBER(3));

  CREATE TABLE dept (
    deptno NUMBER(3) PRIMARY KEY,
    deptname VARCHAR2(10) NOT NULL )
  CLUSTER emp_dept (deptno);

  CREATE TABLE emp (
    empno NUMBER(5) PRIMARY KEY,
    empname VARCHAR2(15) NOT NULL,
    empdeptno NUMBER(3) REFERENCES dept)
  CLUSTER emp_dept (empdeptno);

  CREATE INDEX emp_dept_index ON CLUSTER emp_dept;

Indexes in reality – Oracle: Bitmap Index
• On columns having a low or medium number of distinct values
• Can even index NULL values
• Each bit in the bitmap corresponds to a possible record pointer

  Record Pointer   EmpName   EmpSalary   Currency
  0x011            Smith     2000        $
  0x012            Baker     1900        $
  0x022            Jones     1950        $
  0x023            Müller    2020        €
  0x034            Meier     2010        €

  Currency   Bitmap
  '$'        1 1 1 0 0
  '€'        0 0 0 1 1

Summary
• Index files (primary, clustering, secondary)
• Search trees, B+-trees
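As a closing illustration of the bitmap index slide, the two bitmaps for the Currency column can be rebuilt in a few lines. This is a sketch of the idea only and says nothing about how Oracle stores or compresses bitmaps; `build_bitmap_index` is a hypothetical helper, and the data is the five-row employee table shown on that slide.

```python
def build_bitmap_index(values):
    """Map each distinct value to a list of 0/1 bits, one bit per row position."""
    index = {}
    for pos, value in enumerate(values):
        bits = index.setdefault(value, [0] * len(values))
        bits[pos] = 1
    return index

# Rows in storage order, taken from the employee table on the bitmap slide.
record_pointers = [0x011, 0x012, 0x022, 0x023, 0x034]
currency = ["$", "$", "$", "€", "€"]

index = build_bitmap_index(currency)
print(index["$"])  # [1, 1, 1, 0, 0]
print(index["€"])  # [0, 0, 0, 1, 1]

# An equality predicate is answered by reading off the set bits.
euro_rows = [ptr for ptr, bit in zip(record_pointers, index["€"]) if bit]
print([hex(ptr) for ptr in euro_rows])  # ['0x23', '0x34']
```

Because every distinct value needs one bit per row, the structure stays small only when the column has few distinct values, which matches the first bullet on the slide.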