Database Cracking
Holger Pirk, Eleni Petraki, Stratos Idreos, Stefan Manegold

Outline
1. What is Database Cracking
2. Why do Database Cracking
3. Targeted Environment
4. CPU optimization during cracking

The expectations of a database system
❖ Simple and flexible
➢ Should be able to handle huge data sets and self-organize according to the environment, e.g. the workload, available resources, etc.
❖ Good performance
➢ Should utilize distributed environments to speed up computation
➢ Should utilize multi-core CPUs efficiently

What is Database Cracking
An approach to database architecture for column-oriented databases (e.g. MonetDB)
Core idea:
1. Index maintenance should be a byproduct of query processing, not of updates
   Only database portions of past interest can be easily localized; the remainder stays non-indexed until a query becomes interested in it
2. Each query is interpreted as advice to crack the physical database store into smaller pieces, which are recorded in the cracker index

What is Database Cracking
Design:
1. The first time a range query is posed on an attribute A, the cracking DBMS makes a copy of column A, called the cracker column of A
2. The cracker column is continuously physically reorganized based on queries
3. Build a cracker index (AVL tree) and keep updating it

Why do Database Cracking?
● Significant gains in query performance
● Provides a basis for high-speed distributed and multi-core query processing
● Easy to implement

Cracking algorithms
Physical reorganization happens per column
● Split a piece of a column into two new pieces
● Split a piece of a column into three new pieces
[Figure: crack in two, pieces labelled A<10 and A>=10; crack in three, pieces labelled A<5, 5<A<10 and A>=10]

Cracking algorithms
select A>5 and A<10
In the cracker index, each node of the AVL tree stores a position p
[Animation: the original column next to cracker column c; pointers marked <=5, >5 and <10, and >=10 move over c and swap misplaced values]
Original column:            17 3 8 6 2 15 13 4 12
Cracker column c, start:    17 3 8 6 2 15 13 4 12
After swapping 17 and 4:    4 3 8 6 2 15 13 17 12
After swapping 8 and 2:     4 3 2 6 8 15 13 17 12
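
The three-way split in the select A>5 and A<10 example can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function name crack_in_three and the single-pass three-pointer scheme are assumptions made for illustration, and the order of values inside each piece may differ from the slides.

```python
def crack_in_three(col, lo, hi):
    """Reorder col in place into [v <= lo | lo < v < hi | v >= hi].

    Hypothetical helper for illustration; assumes lo < hi.
    Returns (p1, p2) such that col[:p1] holds values <= lo,
    col[p1:p2] holds values strictly between lo and hi,
    and col[p2:] holds values >= hi.
    """
    left, mid, right = 0, 0, len(col) - 1
    while mid <= right:
        v = col[mid]
        if v <= lo:
            # Value belongs to the low piece: move it to the left boundary.
            col[left], col[mid] = col[mid], col[left]
            left += 1
            mid += 1
        elif v >= hi:
            # Value belongs to the high piece: move it to the right boundary.
            # The swapped-in value is unexamined, so mid does not advance.
            col[mid], col[right] = col[right], col[mid]
            right -= 1
        else:
            # Value qualifies for the query and stays in the middle piece.
            mid += 1
    return left, right + 1


# Cracker column from the example, query: select A>5 and A<10
cracker = [17, 3, 8, 6, 2, 15, 13, 4, 12]
p1, p2 = crack_in_three(cracker, 5, 10)
print(p1, p2)            # 3 5
print(cracker[p1:p2])    # the qualifying slice, here the values 6 and 8
```

The positions p1 and p2 are what a cracker index would store, so a later query with the same bounds can return the slice cracker[p1:p2] without touching the rest of the column.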

Cracking algorithms
Improve data access for future queries
select A>5 and A<10
Cracker column after the crack: 4 3 2 6 8 15 13 17 12
Piece boundaries: <= 5, > 5, >= 10

How does cracking fit in the query plan of a DBMS
In MonetDB, the above query is translated into the following (partial) plan:
[Figure: partial MonetDB query plan]
In MonetDB, each column is stored in a separate table. Each tuple has the form (OID, value).

How does cracking fit in the query plan of a DBMS
The simple select operator:
1. Scan the column
2. Return a new column that contains the qualifying values
The cracker select operator:
1. Search the cracker index
2. Physically reorganize the pieces found
3. Update the cracker index
4. Return a slice of the cracker column as the result; its OID values are no longer in order

How does cracking fit in the query plan of a modern DBMS
[Figure: the original column, ordered by OID, next to the cracker column, which is physically reorganized]
Crackers.rel_select(Ra1,Rb,9,20)
1. a1.OID = b.OID
2. 9≤ b.value ≤20

How does cracking compare to sorting
Sorting is the better method when
● It is known upfront which data is interesting for the users/queries
● There is the luxury of time and resources to create the physical order before any query arrives
Cracking is the better method when
● There is no knowledge about which part of the data is interesting
● There is not enough time to restore or maintain the physical order after an update

Costs in Reality
• Implement microbenchmarks
- 1 billion uniform random integer values
- Pivot in the middle of the range
- Workstation machine (16 GB RAM, 4 Sandy Bridge cores)

Costs in Reality
[Figure: wall-clock time in s for Parallel Scanning, Cracking, and Parallel Sorting]

A little cost breakdown
[Figure: cost breakdown into Data Stalls, Bad Speculation, Retiring, Pipeline Frontend, and Pipeline Backend for Scanning, Cracking, and Sorting; annotation: "Lots of Potential"]

Vectorized Cracking
• Turns in-place cracking into
out-of-place cracking
• Copies vector-sized chunks and cracks them into the array
• Challenge: ensure that values aren't "accidentally" overwritten

Database Statistics…
Selectivity factor of an operation (SF):
The proportion of tuples of an operand relation that participate in the result of that operation, in [0, 100%]
● Selectivity factor for joins
● Selectivity factor of selection
Example: SF = 30% means 30% of the values are less than p and 70% of the values are greater than p

Parallelization
1. Simple Partition & Merge
Divide an uncracked piece into T consecutive partitions, which are cracked concurrently by T threads. Finally, a single thread swaps wrongly placed blocks.

Simple Crack & Merge
Example with 4 threads
Red: values that are less than the pivot
Blue: values that are greater than the pivot
[Figure: layout after the Partition step: x1 y1 x2 y2 x3 y3 x4 y4, followed by the Merge step]

Parallelization
2. Refined Partition & Merge
Divide an uncracked piece into T partitions. The center partition is consecutive with size S = #elements / #threads, while the remaining T-1 partitions each consist of two disjoint pieces that are arranged concentrically around the center partition.

Refined Crack & Merge
Size of left piece = S * selectivity
Size of right piece = S * (1 - selectivity)
Example with 4 threads
[Figure: layout after the Partition step: x1 x2 x3 x4 y4 y3 y2 y1, which leaves a smaller Merge step]

Evaluation