Bitmap Index Design and Evaluation
By: Chee-Yong Chan, Yannis E. Ioannidis
Ariel Noy
Data representation and retrieval seminar

Introduction
Query performance issues
• Online Transaction Processing (OLTP): read-write databases.
• Decision Support Systems (DSS): read-mostly environments, with high selectivity factor.

Bitmap in Simple Form
Value-List Index
• Every value has its own column, i.e. its own bitmap.

Advantages
• Compact size.
• Efficient hardware support for bitmap operations (AND, OR, XOR, NOT).
• Fast search.
• Multiple different bitmap indexes for different kinds of queries.

Selection Queries
• Queries of the form "A op v", where A refers to the indexed attribute.
• op ∈ {<, <=, >, >=, =, !=}
• Range predicates: op ∈ {<, <=, >, >=}
• Equality predicates: op ∈ {=, !=}

Space-Time Tradeoff of Bitmap Indexes for Selection Queries
• Space-optimal bitmap index.
• Time-optimal bitmap index under a given space constraint.
• Bitmap index with the optimal space-time tradeoff.
• Time-optimal bitmap index.

Attribute Value Decomposition
• If $b_n = b_{n-1} = \dots = b_2 = b_1 = b$, the index has a uniform base.

Bitmap Encoding Schemes
• Equality encoding: $b_i$ bits, one for each possible value; all bits are 0 except the bit for $v_i$, which is 1.
• Range encoding: the $v_i$ rightmost bits are 0, the rest are 1.

Evaluation Algorithms for Range-Encoded Bitmap Indexes
• RangeEval (O'Neil and Quass)
• RangeEval-Opt:
  – number of bitmap operations reduced by 50%
  – fewer bitmap scans for range predicate evaluation
  – calculates only the requested bitmap
  – avoids the intermediate equality predicate evaluation by evaluating each range query in terms of <= only, based on:
    • A < v == A <= v-1
    • A > v == NOT (A <= v)
    • A >= v == NOT (A <= v-1)
  – works with only one bitmap B, vs. at least two [Beq and (Blt or Bge)]

Example:
• A <= 864 using a 3-component base-10 index.
• RangeEval-Opt: 4 operations, 5 scans
• RangeEval: 10 operations, 6 scans

Analytical Comparison
Cost Model for Space-Time Tradeoff Analysis
• Space(I): the space metric is the number of bitmaps stored.
• Time(I): the time metric is the expected number of bitmap scans for a selection query evaluation.
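To make the decomposition, the range encoding and the "<= only" evaluation idea concrete, here is a minimal Python sketch. It is not the authors' code: the function names (`decompose`, `build_range_index`, `eval_le`, `eval_range`) and the use of Python integers as bitmaps are assumptions made for illustration, and the loop in `eval_le` is one straightforward way to evaluate A <= v with a single running bitmap, in the spirit of RangeEval-Opt. It assumes 0 <= v < C, with the base listed most significant component first.

```python
from typing import List, Tuple

def decompose(v: int, base: List[int]) -> List[int]:
    """Split v into digits <v_n, ..., v_1> for base <b_n, ..., b_1>."""
    digits = []
    for b in reversed(base):          # peel off the least significant digit first
        digits.append(v % b)
        v //= b
    return digits[::-1]               # most significant digit first

def build_range_index(values: List[int], base: List[int]) -> List[List[int]]:
    """index[i][j] is an int used as a bitmap: bit r is set iff digit i of row r is <= j.

    Only j = 0 .. b_i - 2 is stored; the bitmap for j = b_i - 1 would be all ones.
    """
    index = [[0] * (b - 1) for b in base]
    for row, v in enumerate(values):
        for i, (d, b) in enumerate(zip(decompose(v, base), base)):
            for j in range(d, b - 1):
                index[i][j] |= 1 << row
    return index

def eval_le(index: List[List[int]], base: List[int], v: int,
            n_rows: int) -> Tuple[int, int, int]:
    """Evaluate A <= v; return (result bitmap, bitmap operations, bitmap scans)."""
    digits = decompose(v, base)
    ops = scans = 0
    d, b = digits[-1], base[-1]       # start with the least significant component
    if d < b - 1:
        result = index[-1][d]
        scans += 1
    else:
        result = (1 << n_rows) - 1    # A_1 <= b_1 - 1 holds for every row
    for i in range(len(base) - 2, -1, -1):
        d, b = digits[i], base[i]
        if d != b - 1:                # keep rows with A_i <= v_i
            result &= index[i][d]
            ops += 1
            scans += 1
        if d != 0:                    # add rows with A_i <= v_i - 1
            result |= index[i][d - 1]
            ops += 1
            scans += 1
    return result, ops, scans

def eval_range(index: List[List[int]], base: List[int], op: str, v: int,
               n_rows: int) -> int:
    """Evaluate A op v for op in {<, <=, >, >=} using only <= and NOT."""
    all_ones = (1 << n_rows) - 1
    if op in ("<", ">="):
        v -= 1                        # A < v == A <= v-1 ; A >= v == NOT (A <= v-1)
    le = eval_le(index, base, v, n_rows)[0] if v >= 0 else 0
    return le if op in ("<", "<=") else all_ones & ~le

values = [864, 12, 999, 865, 0, 500]
base = [10, 10, 10]
idx = build_range_index(values, base)
bitmap, ops, scans = eval_le(idx, base, 864, len(values))
print([r for r in range(len(values)) if bitmap >> r & 1])  # [0, 1, 4, 5]
print(ops, scans)                                          # 4 5
```

For A <= 864 on a 3-component base-10 index, the sketch performs 4 bitmap operations over 5 bitmap scans, which matches the RangeEval-Opt figures quoted in the example above.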
Comparison of Bitmap Encoding Schemes
• Equality encoded: S(I) ~ C, T(I) ~ n*b/2
• Range encoded: S(I) ~ C-n, T(I) ~ 2n
• Space optimal:
  – the number of bitmaps in an n-component space-optimal index is n(b-2), with $b \approx \sqrt[n]{C}$
  – space efficiency is a non-decreasing function of the number of components.
  – the overall optimum is reached when n = log(C).
• Time optimal:
  – the optimal base for an n-component index is <2, 2, 2, …, C/2^(n-1)>
  – time efficiency is a non-increasing function of the number of components.
  – the overall optimum is reached when n = 1.

Optimal Space-Time Tradeoff (knee)
• Based on experiments, guesswork and gut feeling.

2-Component Index
The base of the most time-efficient 2-component space-optimal index is given by $\langle b_2 - \delta,\ b_1 + \delta \rangle$, where

$$b_1 = \left\lceil \sqrt{C} \right\rceil, \qquad b_2 = \left\lceil \frac{C}{b_1} \right\rceil, \qquad \delta = \max\left\{0,\ \left\lfloor \frac{b_2 - b_1 + \sqrt{(b_2 + b_1)^2 - 4C}}{2} \right\rfloor \right\}$$

Time Optimal Bitmap Index Under Space Constraint

Bitmap Index Storage Schemes
• Bitmap-Level Storage (BS): each bitmap has its own file.
• Component-Level Storage (CS): each index component has its own file.
• Index-Level Storage (IS): everything together in one file.

Compression of each file
• CS has the best Space(I) tradeoff after compression.
• BS has the best Time(I) tradeoff after compression.
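Returning to the 2-component index: the sketch below computes a base for a given attribute cardinality C. It assumes the formula on that slide reads <b2 - δ, b1 + δ> with b1 = ceil(sqrt(C)), b2 = ceil(C / b1) and δ = max(0, floor((b2 - b1 + sqrt((b2 + b1)^2 - 4C)) / 2)). The function names and the Space helper (one bitmap fewer than the base per range-encoded component) are illustrative assumptions, not part of the original slides.

```python
import math
from typing import List, Tuple

def two_component_base(C: int) -> Tuple[int, int]:
    """Return <b2 - delta, b1 + delta> for attribute cardinality C (assumes C >= 4)."""
    b1 = math.isqrt(C - 1) + 1                 # ceil(sqrt(C)) in exact integer arithmetic
    b2 = -(-C // b1)                           # ceil(C / b1)
    disc = (b2 + b1) ** 2 - 4 * C              # non-negative because b1 * b2 >= C
    # floor((k + sqrt(x)) / 2) equals (k + isqrt(x)) // 2 for integer k
    delta = max(0, (b2 - b1 + math.isqrt(disc)) // 2)
    return b2 - delta, b1 + delta

def range_encoded_space(base: List[int]) -> int:
    """Space(I) for a range-encoded index: b_i - 1 stored bitmaps per component."""
    return sum(b - 1 for b in base)

print(two_component_base(1000))                # (28, 36)
print(range_encoded_space([28, 36]))           # 62
```

With C = 1000 the starting point is <32, 32>; shifting δ = 4 from one component to the other gives <28, 36>, which still covers all values (28 * 36 = 1008 >= 1000) with the same number of bitmaps, whereas <27, 37> would not (999 < 1000).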