Bitmap Index Design and Evaluation

By:
Chee-Yong Chan
Yannis E. Ioannidis
Ariel Noy
Data representation and retrieval seminar
Introduction
Query performance issues
• Online Transaction Processing (OLTP).
Read-write databases.
• Decision Support Systems (DSS).
Read-mostly environments, with a high selectivity factor.
Bitmap In Simple Form
Value List Index
• Every attribute value has its own column of bits (its bitmap), with one bit per record.
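A minimal sketch of a value-list index, assuming the column fits in memory and representing each bitmap as a Python integer whose bit r stands for record r. The names `build_value_list_index` and `rows` are hypothetical.

```python
from typing import Dict, Hashable, List


def build_value_list_index(column: List[Hashable]) -> Dict[Hashable, int]:
    """Map each distinct value to a bitmap (an int); bit r is 1 iff column[r] == value."""
    index: Dict[Hashable, int] = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def rows(bitmap: int) -> List[int]:
    """Return the record ids whose bit is set in the bitmap."""
    return [r for r in range(bitmap.bit_length()) if bitmap >> r & 1]


if __name__ == "__main__":
    color = ["red", "blue", "red", "green", "blue", "red"]
    idx = build_value_list_index(color)
    print(rows(idx["red"]))  # [0, 2, 5]
```

Each distinct value costs one bitmap, so this simple form needs as many bitmaps as the attribute has distinct values.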
Advantages
• Compact size.
• Efficient hardware support for bitmap
operations (AND, OR, XOR, NOT).
• Fast search.
• Different bitmap index designs can serve different kinds of queries.
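To illustrate the bitwise operations listed above, here is a small sketch that combines two bitmaps with AND, OR, XOR and NOT. Python ints stand in for machine words. A real implementation would loop over arrays of 64-bit words. The two bitmaps are hypothetical example data.

```python
# Bitmaps for 8 records, stored as Python ints (bit r <-> record r).
NUM_ROWS = 8
ALL_ONES = (1 << NUM_ROWS) - 1

# Hypothetical bitmaps taken from two different indexes.
region_is_west = 0b10110010  # records 1, 4, 5, 7
product_is_tv = 0b01110100   # records 2, 4, 5, 6

both = region_is_west & product_is_tv      # AND: records 4, 5
either = region_is_west | product_is_tv    # OR: records 1, 2, 4, 5, 6, 7
only_one = region_is_west ^ product_is_tv  # XOR: records 1, 2, 6, 7
not_west = ~region_is_west & ALL_ONES      # NOT, masked to the record count
```

The mask in the NOT step matters. Python's `~` yields a negative int, so the result has to be cut back to the number of records.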
Selection queries.
• Queries of the form "A op v"
A refers to the indexed attribute and v to a constant value.
op ∈ {<, ≤, =, ≠, >, ≥}
Range predicates: op ∈ {<, ≤, >, ≥}
Equality predicates: op ∈ {=, ≠}
Space-Time Tradeoff of Bitmap Indexes for Selection Queries
• Space-optimal bitmap index.
• Time-optimal bitmap index under a given space constraint.
• Bitmap index with the optimal space-time tradeoff.
• Time-optimal bitmap index.
Attribute Value Decomposition
If $b_n = b_{n-1} = \dots = b_2 = b_1 = b$, the base $\langle b_n, b_{n-1}, \dots, b_1 \rangle$ is uniform (a base-$b$ index).
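Bitmap Encoding Schemes
• Equality Encoding:
$b_i$ bitmaps per component, one for each possible digit value; for digit $v_i$ all bits are 0 except bitmap $v_i$, which is 1.
• Range Encoding:
for digit $v_i$ the $v_i$ rightmost bits are 0 and the rest are 1 (bitmap $j$ is 1 iff $v_i \le j$); bitmap $b_i - 1$ is always all ones and need not be stored, leaving $b_i - 1$ bitmaps.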
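A sketch of both encodings for a single component digit. It prints bitmap 0 as the rightmost bit so the output lines up with the "rightmost bits" description. The function names are hypothetical. Each function returns one record's bits across the component's bitmaps, not whole bitmaps.

```python
from typing import List


def equality_encode(digit: int, b: int) -> List[int]:
    """Equality encoding: b bitmaps; bit j is 1 iff digit == j."""
    return [1 if j == digit else 0 for j in range(b)]


def range_encode(digit: int, b: int) -> List[int]:
    """Range encoding: bit j is 1 iff digit <= j, for j = 0..b-2.

    Bitmap j = b-1 would be all ones, so it is not stored.
    """
    return [1 if digit <= j else 0 for j in range(b - 1)]


def as_string(bits: List[int]) -> str:
    """Render with bitmap 0 as the rightmost character."""
    return "".join(str(bit) for bit in reversed(bits))


if __name__ == "__main__":
    print(as_string(equality_encode(4, 10)))  # 0000010000
    print(as_string(range_encode(4, 10)))     # 111110000
```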
Evaluation Algorithm for Range-Encoded Bitmap Indexes
• RangeEval: O'Neil and Quass.
• RangeEval-Opt:
– 50% fewer bitmap operations
– fewer bitmap scans for range predicate evaluation
– computes only the required bitmap
– avoids the intermediate equality predicate evaluation by evaluating each range query in terms of ≤ only, based on:
• A < v ≡ A ≤ v−1
• A > v ≡ ¬(A ≤ v)
• A ≥ v ≡ ¬(A ≤ v−1)
– works with only one bitmap B, versus at least two
[B_EQ and (B_LT or B_GT)] in RangeEval
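A sketch of the ≤-only evaluation idea on a range-encoded index. It covers the range predicates only. It uses the per-component recurrence "prefix ≤" = (a_i ≤ v_i AND previous) OR (a_i ≤ v_i − 1). AND is skipped when v_i = b_i − 1 (that bitmap is all ones). OR is skipped when v_i = 0 (there is no bitmap for −1). This is an illustration under those assumptions, not the paper's exact pseudocode. The function names and the int-as-bitmap representation are hypothetical.

```python
from math import prod
from typing import List


def decompose(v: int, base: List[int]) -> List[int]:
    """Digits [v_1, ..., v_n] of v; base is given least significant first."""
    digits = []
    for b in base:
        digits.append(v % b)
        v //= b
    return digits


def build_range_index(values: List[int], base: List[int]) -> List[List[int]]:
    """bitmaps[i][j] has bit r set iff digit i of values[r] <= j (j = 0..b_i-2)."""
    bitmaps = [[0] * (b - 1) for b in base]
    for row, value in enumerate(values):
        for i, d in enumerate(decompose(value, base)):
            for j in range(d, base[i] - 1):
                bitmaps[i][j] |= 1 << row
    return bitmaps


def range_eval_opt(bitmaps: List[List[int]], base: List[int],
                   num_rows: int, op: str, v: int) -> int:
    """Evaluate 'A op v' for op in {<, <=, >, >=} using only '<=' bitmaps."""
    all_ones = (1 << num_rows) - 1
    if op in ("<", ">="):
        v -= 1  # A < v == A <= v-1;  A >= v == NOT(A <= v-1)
    if v < 0:
        result = 0  # A <= -1 matches nothing
    elif v >= prod(base) - 1:
        result = all_ones  # A <= C-1 matches everything
    else:
        d = decompose(v, base)
        # Component 1: B_1^{v_1}, or all ones when v_1 = b_1 - 1.
        result = bitmaps[0][d[0]] if d[0] < base[0] - 1 else all_ones
        for i in range(1, len(base)):
            if d[i] != base[i] - 1:  # B_i^{b_i-1} is all ones: skip the AND
                result &= bitmaps[i][d[i]]
            if d[i] != 0:  # B_i^{-1} would be all zeros: skip the OR
                result |= bitmaps[i][d[i] - 1]
    if op in (">", ">="):
        result = ~result & all_ones  # A > v == NOT(A <= v)
    return result
```

Only one running bitmap (`result`) is maintained throughout. That is the property the slide contrasts with RangeEval's B_EQ plus B_LT/B_GT pair.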
Example:
• A ≤ 864 using a 3-component base-10 index.
• RangeEval-Opt: 4 operations, 5 scans.
• RangeEval: 10 operations, 6 scans.
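The RangeEval-Opt figures can be reproduced by counting instead of evaluating. For A ≤ 864 the digits are 4, 6, 8. The count reads B_1^4, then performs an AND and an OR for each of the other two components. That gives 4 operations and 5 scans. The sketch below follows the same skip rules as the ≤-only recurrence. The final complement for > and ≥ is counted as one extra operation; that accounting choice is an assumption. The function name is hypothetical.

```python
from typing import List, Tuple


def range_eval_opt_cost(op: str, v: int, base: List[int]) -> Tuple[int, int]:
    """Return (bitmap operations, bitmap scans) for a '<='-only evaluation of 'A op v'.

    base is given least significant component first. Assumes the adjusted v
    (after the '<' / '>=' rewrite) satisfies 0 <= v < prod(base) - 1.
    """
    if op in ("<", ">="):
        v -= 1
    digits = []
    for b in base:
        digits.append(v % b)
        v //= b
    ops = 0
    scans = 1 if digits[0] != base[0] - 1 else 0  # read B_1^{v_1} unless all ones
    for d, b in zip(digits[1:], base[1:]):
        if d != b - 1:  # AND with B_i^{v_i}
            ops += 1
            scans += 1
        if d != 0:  # OR with B_i^{v_i - 1}
            ops += 1
            scans += 1
    if op in (">", ">="):
        ops += 1  # final complement (assumed to count as one operation)
    return ops, scans


if __name__ == "__main__":
    print(range_eval_opt_cost("<=", 864, [10, 10, 10]))  # (4, 5)
```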
Analytical Comparison
Cost Model for Space-Time Tradeoff Analysis
• Space(I)
The space metric is the number of bitmaps stored.
• Time(I)
The time metric is the expected number of bitmap scans needed to evaluate a selection query.
Comparison of Bitmap Encoding Schemes
• Equality encoded: S(I) ~ C, T(I) ~ n·b/2
• Range encoded: S(I) ~ C − n, T(I) ~ 2n
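A sketch that counts Space(I) for a given base under each encoding. For the same base, the range-encoded index stores exactly n fewer bitmaps than the equality-encoded one. With a single component (n = 1, b_1 = C), that is C versus C − 1. The worst-case scan count of a ≤-only evaluation is also included. It assumes one bitmap is read for the first component and at most two for each further one. That gives 2n − 1, in line with T(I) ~ 2n. The function names are hypothetical.

```python
from typing import List


def space(base: List[int], encoding: str) -> int:
    """Space(I): number of bitmaps stored for an index with the given base.

    equality: b_i bitmaps per component (a component with b_i = 2 could keep a
              single bitmap, since the other is its complement; ignored here)
    range:    b_i - 1 bitmaps per component (the all-ones bitmap is not stored)
    """
    if encoding == "equality":
        return sum(base)
    if encoding == "range":
        return sum(b - 1 for b in base)
    raise ValueError(f"unknown encoding: {encoding}")


def max_range_scans(base: List[int]) -> int:
    """Worst-case scans of a '<='-only evaluation on a range-encoded index."""
    return 2 * len(base) - 1
```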
• Space Optimal:
– number of bitmaps in the n-component space-optimal index = n(b−2), where $b \approx \sqrt[n]{C}$
– space efficiency is a non-decreasing function of the number of components.
– the overall optimum is reached when $n = \lceil \log_2 C \rceil$.
• Time Optimal:
– the time-optimal n-component index has base $\langle 2, 2, 2, \dots, \lceil C/2^{n-1} \rceil \rangle$
– time efficiency is a non-increasing function of the number of components.
– the overall optimum is reached when n = 1.
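A sketch for exploring both extremes on small C. `space_optimal` replaces any closed-form rule with a memoized exhaustive search. It minimizes the range-encoded space Σ(b_i − 1) over n bases with b_i ≥ 2 and ∏ b_i ≥ C. `time_optimal_base` builds the base with n − 1 components of base 2 and one component large enough for the product to reach C, i.e. ⌈C/2^{n−1}⌉. Both names are hypothetical. The order of components in the returned tuples does not affect the space count.

```python
from functools import lru_cache
from typing import Tuple


@lru_cache(maxsize=None)
def space_optimal(n: int, need: int) -> Tuple[int, Tuple[int, ...]]:
    """Min sum(b_i - 1) over n bases b_i >= 2 with prod(b_i) >= need.

    Returns (space, base). Exhaustive search with memoization; fine for small C.
    """
    if n == 1:
        b = max(need, 2)
        return b - 1, (b,)
    best = None
    for b in range(2, max(need, 2) + 1):
        sub_space, sub_base = space_optimal(n - 1, -(-need // b))  # ceil(need / b)
        candidate = (b - 1 + sub_space, (b,) + sub_base)
        if best is None or candidate[0] < best[0]:
            best = candidate
    return best


def time_optimal_base(n: int, c: int) -> Tuple[int, ...]:
    """Base with n - 1 components of base 2 and one of base ceil(C / 2^(n-1))."""
    return (2,) * (n - 1) + (max(2, -(-c // 2 ** (n - 1))),)


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, space_optimal(n, 1000)[0], time_optimal_base(n, 1000))
```

For C = 1000, one component needs 999 bitmaps, three components need 27, and ten base-2 components need only 10. This matches the optimum at $n = \lceil \log_2 C \rceil$.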
Optimal Space-Time Tradeoff (Knee)
Found by experimentation, educated guessing and gut feeling.
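2-Component Index
The base of the most time-efficient 2-component space-optimal index is given by:
$\langle b_2, b_1 \rangle = \left\langle \left\lceil C / b_1 \right\rceil, b_1 \right\rangle$, where $b_1 = \left\lceil \frac{(b_1' + b_2') - \sqrt{(b_1' + b_2')^2 - 4C}}{2} \right\rceil$
and $\langle b_2', b_1' \rangle$ is the base of a 2-component space-optimal index.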
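A sketch of one reading of the 2-component formula, which is an assumption. Let s = b_1' + b_2' for a space-optimal 2-component base. Pick the smallest b_1 with b_1(s − b_1) ≥ C, which is the integer form of ⌈(s − √(s² − 4C))/2⌉. Then set b_2 = ⌈C/b_1⌉. This keeps the space optimal while making the base as skewed as possible. The function names are hypothetical. Integer search is used instead of floating-point square roots to avoid rounding errors.

```python
from typing import Tuple


def space_optimal_sum(c: int) -> int:
    """s = b1 + b2 for a 2-component space-optimal base (b1, b2 >= 2, b1 * b2 >= C)."""
    return min(b1 + max(2, -(-c // b1)) for b1 in range(2, c + 1))


def time_efficient_space_optimal_2(c: int) -> Tuple[int, int]:
    """Return (b2, b1): smallest b1 with b1 * (s - b1) >= C, and b2 = ceil(C / b1)."""
    s = space_optimal_sum(c)
    b1 = 2
    while b1 * (s - b1) < c:
        b1 += 1
    return max(2, -(-c // b1)), b1


if __name__ == "__main__":
    print(time_efficient_space_optimal_2(1000))  # (36, 28)
```

For C = 1000 the space-optimal sum is s = 64 (for example ⟨32, 32⟩). The smallest b_1 satisfying b_1(64 − b_1) ≥ 1000 is 28, giving ⟨36, 28⟩. That base still stores 62 bitmaps.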
Time Optimal Bitmap Index Under Space Constraint
Bitmap Index Storage Schemes
• Bitmap-Level Storage (BS)
each bitmap in its own file
• Component-Level Storage (CS)
each index component in its own file
• Index-Level Storage (IS)
the whole index in one file
Each file is compressed.
• CS has the best Space(I) after compression.
• BS has the best Time(I) after compression.
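A sketch of the three storage schemes. It groups the bitmaps of a range-encoded index into "files" per bitmap (BS), per component (CS) or for the whole index (IS), and compresses each file independently. zlib stands in for whatever compression method the study used. The function names and data are hypothetical, and the printed sizes depend entirely on the data, so no particular ranking is implied. Under BS, a query only has to decompress the bitmaps it actually reads.

```python
import random
import zlib
from typing import Dict, List


def build_range_index(values: List[int], base: List[int]) -> List[List[int]]:
    """Range-encoded bitmaps (ints); base is given least significant first."""
    bitmaps = [[0] * (b - 1) for b in base]
    for row, value in enumerate(values):
        for i, b in enumerate(base):
            d = value % b
            value //= b
            for j in range(d, b - 1):
                bitmaps[i][j] |= 1 << row
    return bitmaps


def to_bytes(bitmap: int, num_rows: int) -> bytes:
    """Serialize one bitmap to a fixed-length byte string."""
    return bitmap.to_bytes((num_rows + 7) // 8, "little")


def compressed_sizes(index: List[List[int]], num_rows: int) -> Dict[str, int]:
    """Total compressed bytes for BS, CS and IS, compressing each 'file' separately."""
    bs = sum(len(zlib.compress(to_bytes(bm, num_rows)))
             for comp in index for bm in comp)
    cs = sum(len(zlib.compress(b"".join(to_bytes(bm, num_rows) for bm in comp)))
             for comp in index)
    is_ = len(zlib.compress(b"".join(to_bytes(bm, num_rows)
                                     for comp in index for bm in comp)))
    return {"BS": bs, "CS": cs, "IS": is_}


if __name__ == "__main__":
    random.seed(0)
    values = [random.randrange(1000) for _ in range(4096)]
    index = build_range_index(values, [10, 10, 10])
    print(compressed_sizes(index, len(values)))
```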