Database Cracking Holger Pirk, Eleni Petraki, Stratos Idreos, Stefan Manegold

Outline
1. What is Database Cracking
2. Why do Database Cracking
3. Targeted Environment
4. CPU optimization during cracking
Expectations for a database system
❖Simple and flexible
➢Should handle huge data sets and self-organize according to the
environment, e.g. the workload and the available resources
❖Good performance
➢Should utilize distributed environments to speed up computation
➢Should utilize multi-core CPUs efficiently
What is Database Cracking
An approach to database architecture for column-oriented databases
(e.g. MonetDB)
Core idea:
1. Index maintenance should be a byproduct of query processing, not of
updates
Only the portions of the database that past queries touched can be located
quickly; the remainder stays non-indexed until a query becomes interested in it
2. Each query is interpreted as advice to crack the physical database
store into smaller pieces, which are tracked in a cracker index
What is Database Cracking
Design:
1. The first time a range query is posed on an attribute A, the cracking DBMS
makes a copy of column A, called the cracker column of A
2. The cracker column is continuously physically re-organized based on
queries
3. Build a cracker index (AVL-tree) and keep updating it
Why do Database Cracking?
Significant gains in query performance
Provides basis for high-speed distributed
and multi-core query processing
Easy to implement
Cracking algorithms
Physical reorganization happens per column
● Split a piece of a column into two new pieces
● Split a piece of a column into three new pieces
[Figure: pieces produced by the two splits, labelled A<5, A<10, 5<A<10 and A>=10]
Cracking algorithms
select A>5 and A<10
In the cracker index, each node of the AVL tree stores a position p
[Figure: animation of the query cracking the column in place into three pieces]
Column A: 17, 3, 8, 6, 2, 15, 13, 4, 12
Cracker column c after the query: 4, 3, 2 (<=5) | 6, 8 (>5 and <10) | 15, 13, 17, 12 (>=10)
Improves data access for future queries
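The three-way split for select A>5 and A<10 can be sketched in Python as a single in-place pass. This is a minimal sketch, not the MonetDB implementation: the function name crack_in_three is hypothetical, and the swap order is an illustrative choice, so the order of values inside each piece need not match the slides.

```python
def crack_in_three(values, low, high):
    """Reorder values in place into [<= low | low < v < high | >= high].

    Returns (start, end) so that values[start:end] is the qualifying middle piece.
    """
    left, i, right = 0, 0, len(values) - 1
    while i <= right:
        v = values[i]
        if v <= low:                      # belongs to the first piece
            values[left], values[i] = values[i], values[left]
            left += 1
            i += 1
        elif v >= high:                   # belongs to the last piece
            values[i], values[right] = values[right], values[i]
            right -= 1                    # the value swapped in is examined next
        else:                             # qualifies: low < v < high
            i += 1
    return left, right + 1


column = [17, 3, 8, 6, 2, 15, 13, 4, 12]
start, end = crack_in_three(column, 5, 10)   # start == 3, end == 5
```

After the call, column[start:end] holds the two qualifying values (6 and 8), and the two boundaries are what a cracker index would record for later queries.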
How does cracking fit in the query plan of a DBMS
In MonetDB, each column is stored in a separate table. Each tuple has the
form (OID, value).
In MonetDB, the query is translated into a (partial) plan:
[Figure: example query and its partial MonetDB plan]
How does cracking fit in the query plan of a DBMS
The simple select operator:
1. Scan the column
2. Return a new column that contains the qualifying values
The cracker select operator:
1. Search the cracker index
2. Physically reorganize the pieces found
3. Update the cracker index
4. Return a slice of the cracker column as the result; its OID values are
no longer in order
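The four steps of the cracker select operator can be sketched as follows. This is a minimal sketch under stated assumptions: the class and method names (CrackerColumn, _crack, select) are hypothetical, a sorted Python list searched with bisect stands in for the AVL tree, and each range bound is handled by a separate two-way crack. The select method uses a half-open range low <= value < high, so the integer query A>5 and A<10 becomes select(6, 10).

```python
import bisect


class CrackerColumn:
    def __init__(self, base_values):
        # MonetDB-style column: (OID, value) pairs, initially in OID order.
        self.pairs = list(enumerate(base_values))
        self.pivots = []       # stand-in for the AVL tree: sorted pivot values
        self.positions = []    # positions[i]: first index whose value is >= pivots[i]

    def _crack(self, pivot):
        # Step 1: search the cracker index for the piece that contains the pivot.
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]              # already cracked on this pivot
        lo = self.positions[i - 1] if i > 0 else 0
        hi = (self.positions[i] if i < len(self.pivots) else len(self.pairs)) - 1
        # Step 2: physically reorganize only that piece (two-way crack).
        while lo <= hi:
            if self.pairs[lo][1] < pivot:
                lo += 1
            else:
                self.pairs[lo], self.pairs[hi] = self.pairs[hi], self.pairs[lo]
                hi -= 1
        # Step 3: update the cracker index with the new piece boundary.
        self.pivots.insert(i, pivot)
        self.positions.insert(i, lo)
        return lo

    def select(self, low, high):
        """Return the (OID, value) pairs with low <= value < high."""
        start = self._crack(low)
        end = self._crack(high)
        # Step 4: the result is a contiguous slice; its OIDs are not in order.
        return self.pairs[start:end]


cracker = CrackerColumn([17, 3, 8, 6, 2, 15, 13, 4, 12])
result = cracker.select(6, 10)
```

A second query only touches the piece that contains its bounds, which is why each query makes later queries cheaper.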
How does cracking fit in the query plan of a modern DBMS
Crackers.rel_select(Ra1,Rb,9,20)
1. a1.OID = b.OID
2. 9 ≤ b.value ≤ 20
[Figure: a cracker column, which is physically reorganized, next to the original column, ordered by OID]
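One plausible reading of the two conditions is that rel_select matches each OID in Ra1 against column b and keeps the tuples whose b value lies in the range. The sketch below illustrates that reading only; it is not MonetDB's operator. The function signature is an assumption, and Rb is modelled as a plain list in OID order so that rb[oid] is a positional lookup.

```python
def rel_select(ra1, rb, low, high):
    """Keep the tuples of ra1 whose matching rb value lies in [low, high].

    ra1: (OID, value) pairs from a cracker select, OIDs in arbitrary order.
    rb:  values of column b in OID order (assumption), so rb[oid] is positional.
    """
    return [(oid, rb[oid]) for oid, _ in ra1 if low <= rb[oid] <= high]


ra1 = [(3, 6), (2, 8)]            # disorganized OIDs, as a cracker select returns them
rb = [1, 30, 15, 9, 40]           # column b, ordered by OID
matches = rel_select(ra1, rb, 9, 20)
```

Because the original column is ordered by OID, the lookup does not need the OIDs in Ra1 to be sorted.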
How does cracking compare to sorting
Sorting is the better choice when:
● It is known upfront which data is interesting for the users/queries
● There is the luxury of time and resources to create this physical
order before any query arrives
Cracking is the better choice when:
● There is no knowledge about which part of the data is interesting
● There is not enough time to restore or maintain the physical
order after an update
Costs in Reality
• Implement microbenchmarks
- 1 billion uniform random integer values
- Pivot in the middle of the range
- Workstation machine (16 GB RAM, 4 Sandy Bridge cores)
Costs in Reality
[Figure: wall-clock time in s for Parallel Scanning, Cracking and Parallel Sorting]
A little cost breakdown
[Figure: cost breakdown into Data Stalls, Bad Speculation, Retiring, Pipeline Frontend and Pipeline Backend for Scanning, Cracking and Sorting, annotated "Lots of Potential"]
Vectorized Cracking
• Turns in-place cracking into out-of-place cracking
• Copies vector-sized chunks and cracks them into the array
• Challenge: ensure that values aren't "accidentally" overwritten
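The chunk-wise, out-of-place idea can be sketched as follows. This is a simplified sketch: it writes into a separate destination list, so the overwrite challenge named above does not arise here. A variant that cracks the chunks back into the source array would additionally have to keep its write cursors from passing input that has not been read yet. The function name and the vector_size default are hypothetical.

```python
def vectorized_crack(values, pivot, vector_size=4):
    """Out-of-place two-way crack, processing one vector-sized chunk at a time.

    Returns (out, split) with out[:split] < pivot <= out[split:].
    """
    n = len(values)
    out = [None] * n
    lo, hi = 0, n - 1                       # next free slot at each end of out
    for begin in range(0, n, vector_size):
        chunk = values[begin:begin + vector_size]   # copy one vector
        for v in chunk:
            if v < pivot:                   # small values fill out from the front
                out[lo] = v
                lo += 1
            else:                           # large values fill out from the back
                out[hi] = v
                hi -= 1
    return out, lo


cracked, split = vectorized_crack([17, 3, 8, 6, 2, 15, 13, 4, 12], 10)
```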
Database Statistics…
Selectivity factor of an operation (SF): the proportion of tuples of an
operand relation that participate in the result of that operation, in the
range [0, 100%]
● Selectivity factor for joins
● Selectivity factor of selection
Example
An SF of 30% means that 30% of the values are less than p and 70% of the
values are greater than p
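For a selection of the form value < p, the selectivity factor can be computed directly from the data. A small sketch (the function name is hypothetical, and a real system would typically estimate the value from statistics rather than scan the column):

```python
def selectivity_factor(values, pivot):
    """Fraction of values that satisfy value < pivot, between 0.0 and 1.0."""
    if not values:
        return 0.0
    return sum(1 for v in values if v < pivot) / len(values)


sf = selectivity_factor(list(range(10)), 3)   # 3 of 10 values qualify
```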
Parallelization
1. Simple Partition & Merge
Divide an uncracked piece into T consecutive partitions.
The partitions are cracked concurrently by T threads.
Finally, a single thread swaps wrongly placed blocks.
Simple Crack & Merge
Example of 4 threads
Red: values that are less than the pivot
Blue: values that are greater than the pivot
Layout after Partition: x1 y1 x2 y2 x3 y3 x4 y4
[Figure: the Merge step rearranges these blocks]
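The simple scheme can be sketched with a thread pool. This is a minimal sketch under stated assumptions: the function names are hypothetical, the merge swaps individual values rather than whole blocks, and CPython threads illustrate the structure only, since the interpreter lock prevents a real speedup for this kind of loop.

```python
from concurrent.futures import ThreadPoolExecutor


def crack_range(values, pivot, begin, end):
    """Crack values[begin:end] in place; return the first index holding a value >= pivot."""
    lo, hi = begin, end - 1
    while lo <= hi:
        if values[lo] < pivot:
            lo += 1
        else:
            values[lo], values[hi] = values[hi], values[lo]
            hi -= 1
    return lo


def simple_partition_and_merge(values, pivot, threads=4):
    n = len(values)
    bounds = [n * t // threads for t in range(threads + 1)]
    # Partition phase: T threads crack T consecutive, disjoint partitions.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        splits = list(pool.map(
            lambda t: crack_range(values, pivot, bounds[t], bounds[t + 1]),
            range(threads)))
    # Each partition now holds its small values followed by its large values.
    split = sum(s - bounds[t] for t, s in enumerate(splits))
    # Merge phase: one thread swaps misplaced values across the final boundary.
    left, right = 0, n - 1
    while True:
        while left < split and values[left] < pivot:
            left += 1
        while right >= split and values[right] >= pivot:
            right -= 1
        if left >= split or right < split:
            break
        values[left], values[right] = values[right], values[left]
    return split


data = [17, 3, 8, 6, 2, 15, 13, 4, 12, 1, 20, 7]
boundary = simple_partition_and_merge(data, 10, threads=4)
```

The merge phase is sequential, which is the cost the refined scheme tries to shrink.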
Parallelization
2. Refined Partition & Merge
Divide an uncracked piece into T partitions.
The center partition is consecutive, with
size S = #elements / #threads,
while the remaining T-1 partitions each consist of two disjoint
pieces that are arranged concentrically around the center
partition.
Refined Crack & Merge
Size of left piece = S * selectivity
Size of right piece = S * (1 - selectivity)
Example of 4 threads
Layout after Partition: x1 x2 x3 x4 y4 y3 y2 y1
[Figure: the Merge step that follows is smaller]
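The concentric layout can be made concrete by computing the index ranges that each thread owns. A sketch under stated assumptions: the function name is hypothetical, piece boundaries are truncated to integers, and the selectivity is taken as given (in practice it would be an estimate).

```python
def refined_layout(n, threads, selectivity):
    """Return (outer, center) for an array of n elements.

    outer:  one (left_piece, right_piece) pair of index ranges per outer partition,
            ordered from the outermost partition inwards.
    center: the single consecutive partition in the middle.
    """
    size = n / threads                                   # S = #elements / #threads
    outer = []
    for t in range(threads - 1):
        left = range(int(t * size * selectivity),
                     int((t + 1) * size * selectivity))
        right = range(n - int((t + 1) * size * (1 - selectivity)),
                      n - int(t * size * (1 - selectivity)))
        outer.append((left, right))
    center = range(int((threads - 1) * size * selectivity),
                   n - int((threads - 1) * size * (1 - selectivity)))
    return outer, center


outer, center = refined_layout(16, 4, 0.25)
```

With 16 elements, 4 threads and a selectivity of 25%, the outermost thread owns index 0 on the left and indices 13 to 15 on the right, and the center partition covers indices 3 to 6. If each thread places its small values in its left piece and its large values in its right piece, an accurate selectivity leaves most values on the correct side of the final boundary, which would explain the smaller merge.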
Evaluation