Computing the cube Abhinandan Das CS 632 Mar 8 2001

1
On the Computation of
Multidimensional Aggregates

Sameet Agarwal, Rakesh Agrawal,
Prasad Deshpande, Ashish Gupta,
Jeffrey Naughton, Raghu Ramakrishnan
& Sunita Sarawagi
-- VLDB 1996
2
Motivation


OLAP / Multidimensional data analysis
Eg: Transactions(Prod,Date,StoreId,Cust,Sales)



Sum of sales by: (P,SId) ; (P) ; (P,D,SId)
Computing multidimensional aggregates
is a performance bottleneck
Efficient computation of several related
group-bys
3
What is a CUBE?



n-dimensional generalization of the
group by operator
Group-bys corresponding to all possible
subsets of a given set of attributes
Eg: SELECT P, D, C, Sum(Sales)
FROM Transactions
CUBE-BY P, D, C

ALL, P, D, C, PD, PC, DC, PDC
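To make the definition concrete, here is a minimal Python sketch of the naive approach: every one of the 2^n group-bys is computed independently from the base rows. The function name `naive_cube` and the sample rows are illustrative assumptions, not from the paper.

```python
from itertools import combinations
from collections import defaultdict

def naive_cube(rows, dims, measure):
    """Compute every group-by of `dims` independently from the base rows."""
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in group)
                totals[key] += row[measure]
            cube[group] = dict(totals)
    return cube

# Hypothetical sample data for Transactions(P, D, C, Sales)
rows = [
    {"P": "pen", "D": "mon", "C": "ann", "Sales": 3},
    {"P": "pen", "D": "tue", "C": "bob", "Sales": 5},
    {"P": "ink", "D": "mon", "C": "ann", "Sales": 2},
]
cube = naive_cube(rows, ("P", "D", "C"), "Sales")
```

The empty tuple `()` plays the role of ALL. Each group-by rescans the raw data, which is exactly the cost the later algorithms try to avoid.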
4
Approaches
Basic group-by methods:




Sort based
Hash based
Naïve approach
5
Possible optimizations
1. Smallest parent
2. Cache results
3. Amortize scans
4. Share sorts
5. Share partitions
Often contradictory
Assumption: Distributive aggregate function
sum, count, min, max ; average
-- Non distributive: median
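The distributive assumption is what lets a group-by be computed from a parent group-by instead of from the raw data. A small sketch, assuming group-bys are stored as dictionaries from key tuples to sums (the helper name `rollup_sum` and the numbers are made up for illustration):

```python
from collections import defaultdict

def rollup_sum(parent, parent_dims, child_dims):
    """Aggregate a parent group-by (dict: key tuple -> sum) down to a child."""
    keep = [parent_dims.index(d) for d in child_dims]
    child = defaultdict(int)
    for key, total in parent.items():
        child[tuple(key[i] for i in keep)] += total
    return dict(child)

# (P, D) sums computed once from the raw data ...
pd_sums = {("pen", "mon"): 3, ("pen", "tue"): 5, ("ink", "mon"): 2}
# ... are enough to derive the (P) sums without rescanning the raw data.
p_sums = rollup_sum(pd_sums, ("P", "D"), ("P",))

# Average is handled by carrying (sum, count), both of which are distributive.
pd_counts = {("pen", "mon"): 1, ("pen", "tue"): 2, ("ink", "mon"): 1}
p_counts = rollup_sum(pd_counts, ("P", "D"), ("P",))
p_avg = {k: p_sums[k] / p_counts[k] for k in p_sums}
```

No comparable trick works for median, since the median of a group cannot be derived from the medians of its sub-groups.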

6
Sort based methods


Algorithm PipeSort
Share-sorts Vs Smallest parent


Optimize to get minimum total cost
Cache-results & amortize-scans

Pipelining: ABCD → ABC → AB → A
7
PipeSort


Assumption: Have an estimate of the
number of distinct values for each
group-by
Input: Search lattice


Graph where each vertex represents a
group-by of the cube
Edge i → j if |i| = |j| + 1 and i ⊃ j
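A minimal sketch of how such a search lattice could be generated, assuming group-bys are represented as frozensets of attribute names (the function name `search_lattice` is illustrative):

```python
from itertools import combinations

def search_lattice(attrs):
    """Return (levels, edges): levels[k] lists the k-attribute group-bys,
    edges holds (parent, child) pairs where child drops one attribute."""
    levels = {k: [frozenset(c) for c in combinations(attrs, k)]
              for k in range(len(attrs) + 1)}
    edges = [(parent, parent - {a})
             for k in range(1, len(attrs) + 1)
             for parent in levels[k]
             for a in parent]
    return levels, edges

levels, edges = search_lattice("ABCD")
```

For CUBE-BY ABCD this gives 16 vertices and 32 edges; level 0 is the empty group-by (ALL).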
8
Search lattice: CUBE-BY ABCD
9
Search lattice (contd...)

Each edge e_ij associated with two costs:
S(e_ij): Cost of computing j from i when i is
pre-sorted
U(e_ij): Cost of computing j from i when i is
not already sorted
Idea: If attribute order of a group-by j is a
prefix of parent i, compute j without sorting
(Cost S(e_ij)) else first sort (Cost U(e_ij))
10
PipeSort (contd...)


Proceed level by level, k=0 to k=N-1
Find best way of computing level k from
level k+1

Weighted bipartite matching:



Make k additional replicas of each node at level
k+1
Cost S(e_ij) on original node, U(e_ij) on replicas
Find minimum cost maximal matching
11
Min cost matching
12
Algorithm PipeSort

For level k = 0 to N-1



Generate_plan(k+1 → k)
Fix sort order of level k+1 nodes
Generate_plan(k+1 → k):


Make k additional replicas of level k+1
nodes, assign appropriate edge costs
Find min-cost matching
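The per-level decision can be illustrated with a small sketch. It is not the weighted bipartite matching algorithm itself: it searches exhaustively over the same choices (each child picks a parent, and each parent can serve at most one child at the pre-sorted cost S), which is only practical for tiny lattices. The cost model (S = size of parent, U = twice the size) and all names are assumptions for illustration. The k replica slots are not modelled, because with S < U an optimal plan never needs more than k unsorted uses of one parent.

```python
from functools import lru_cache

def plan_level(children, parents, size):
    """Pick a parent for every child group-by at one level.

    children, parents: lists of frozensets (level k and level k+1).
    size: dict mapping each parent to its estimated number of tuples.
    Returns (total_cost, {child: (parent, "S" or "U")}).
    """
    def s_cost(p):          # hypothetical cost model: one scan of p
        return size[p]

    def u_cost(p):          # hypothetical cost model: sort p, then scan
        return 2 * size[p]

    @lru_cache(maxsize=None)
    def best(i, used):
        # `used` = parents whose single pre-sorted (S) slot is taken.
        if i == len(children):
            return 0, ()
        child = children[i]
        options = []
        for p in parents:
            if not child < p:       # p must be a proper superset of child
                continue
            rest_cost, rest_plan = best(i + 1, used)
            options.append((u_cost(p) + rest_cost,
                            ((child, p, "U"),) + rest_plan))
            if p not in used:
                rest_cost, rest_plan = best(i + 1, used | {p})
                options.append((s_cost(p) + rest_cost,
                                ((child, p, "S"),) + rest_plan))
        return min(options, key=lambda o: o[0])

    cost, plan = best(0, frozenset())
    return cost, {c: (p, mode) for c, p, mode in plan}

AB, AC, BC = frozenset("AB"), frozenset("AC"), frozenset("BC")
cost, plan = plan_level([frozenset("A"), frozenset("B"), frozenset("C")],
                        [AB, AC, BC], {AB: 10, AC: 20, BC: 5})
```

An "S" entry means the child's attributes become a prefix of the parent's sort order, which is how the plan fixes the sort order of the level k+1 nodes.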
13
Example plan
14
Tweaks


Aggregate and remove duplicates whilst
sorting
Use partial sorting order to reduce sort
costs

Eg: ABC → AC
15
Hash based methods


Algorithm PipeHash
Can include all stated optimizations:
(If memory available)

For k=N downto 0

For each group-by g at level k+1


Compute in 1 scan of g all children for which g is
smallest parent
Save g and deallocate hash table of g
16
PipeHash


Limited memory → Use partitioning
Optimization share-partitions:


When data is partitioned on attribute A, all
group-bys containing A can be computed
independently on each partition
No recombination required
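A sketch of why no recombination is needed, assuming rows are dictionaries with a `Sales` measure and that partitioning is done by hashing the partitioning attribute (all names and data are illustrative). Because every key of a group-by containing the partitioning attribute falls in exactly one partition, the per-partition outputs have disjoint keys and are simply unioned.

```python
from collections import defaultdict

def group_sum(rows, dims):
    out = defaultdict(int)
    for row in rows:
        out[tuple(row[d] for d in dims)] += row["Sales"]
    return dict(out)

def cube_by_partition(rows, part_attr, groupbys, num_parts):
    """Partition on `part_attr`, then compute each group-by that contains
    it one partition at a time; outputs are unioned, never merged."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[part_attr]) % num_parts].append(row)
    result = {g: {} for g in groupbys}
    for part in parts.values():
        for g in groupbys:
            assert part_attr in g
            partial = group_sum(part, g)
            assert not (partial.keys() & result[g].keys())  # disjoint keys
            result[g].update(partial)
    return result

rows = [
    {"A": "x", "B": 1, "Sales": 4},
    {"A": "y", "B": 1, "Sales": 6},
    {"A": "x", "B": 2, "Sales": 1},
    {"A": "x", "B": 1, "Sales": 5},
]
groupbys = [("A",), ("A", "B")]
result = cube_by_partition(rows, "A", groupbys, num_parts=2)
```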
17
PipeHash: Overview


First choose smallest parent for each
group-by (gives MST)
Optimize for memory constraints:



Decide what group-bys to compute
together
Choice of attribute for data partitioning
Minimizing overall disk scan cost: NP-Hard!
18
PipeHash
19
Heuristics



Optimizations cache-results and
amortize-scans favoured by choosing
large subtrees of MST: Can compute
multiple group-bys together
However, partitioning attribute limits
subtree
Hence choose the partitioning attribute
that allows choice of largest subtree
20
Algorithm:


Worklist w=MST
While w not empty



Choose any tree T from w
T’ = select_subtree(T)
Compute_subtree(T’)
21
Select_subtree(T)





If mem reqd by T < M, return T
Else: For each a ∈ A (attributes of root(T)) get subtree T_a
Let P_a = max # of partitions of root(T)
possible if a used for partitioning
Choose a s.t. (mem reqd T_a)/P_a < M and
T_a is largest subtree over all a
Add forest T - T_a to w, return T_a
22
Compute_subtree(T’)



numParts = (mem reqd T’)* fudge_factor/M
Partition root(T’) into numParts
For each partition of root(T’)

For each node n in T’ (breadth first)


Compute all children of n in 1 scan
Release memory occupied by hash table of n
23
Notes on PipeHash



PipeHash biased towards smallest-parent optimization
Eg: Compute BC from BCD (fig)
In practice, saving on sequential disk
scans less important than reducing CPU
cost of aggregation by choosing
smallest parent!
24
Overlap method




Sort based
Minimize disk accesses by overlapping
computation of “cuboids”
Focus: Exploit partially matching sort
orders to reduce sorting costs
Uses smallest parent optimization
25
Sorted runs


B = (A_1, A_2, ..., A_j) ; S = (A_1, ..., A_(l-1), A_(l+1), ..., A_j)
A sorted run of S in B is the projection onto S of a
maximal run of tuples in B that agree on the first l
columns; each such run is already in the sort order of S
Eg: B=[(a,1,2),(a,1,3),(a,2,2),(b,1,3),
(b,3,2),(c,3,1)]
S=[(a,2),(a,3),(b,3),(b,2),(c,1)] (1st & 3rd columns of B)
Sorted runs for S: [(a,2),(a,3)], [(a,2)], [(b,3)], [(b,2)]
and [(c,1)]

26
Partitions





B, S have common prefix A_1, ..., A_(l-1)
A partition of a cuboid S in B is the union of
sorted runs s.t. the first (l-1) columns (ie
common prefix) have the same value
Previous eg: Partitions for S in B are:
[(a,2),(a,3)], [(b,3),(b,2)] & [(c,1)]
Tuples from different partitions need not be
merged for aggregation
Partition is independent unit of computation
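The sorted-run and partition examples can be reproduced with a short sketch. It assumes B is a sorted list of tuples and that l is the 1-based position of the attribute dropped to obtain S; the function names are illustrative.

```python
from itertools import groupby

def sorted_runs(B, l):
    """Project out column l (1-based) of sorted cuboid B; each maximal run of
    B tuples that agree on the first l columns yields one sorted run of S."""
    runs = []
    for _, grp in groupby(B, key=lambda t: t[:l]):
        runs.append([t[:l - 1] + t[l:] for t in grp])
    return runs

def partitions(B, l):
    """Union the sorted runs that share the first l-1 columns."""
    parts = []
    for _, grp in groupby(B, key=lambda t: t[:l - 1]):
        seen = []
        for t in grp:
            s = t[:l - 1] + t[l:]
            if s not in seen:      # duplicates would be aggregated together
                seen.append(s)
        parts.append(seen)
    return parts

B = [("a", 1, 2), ("a", 1, 3), ("a", 2, 2),
     ("b", 1, 3), ("b", 3, 2), ("c", 3, 1)]
runs = sorted_runs(B, 2)
parts = partitions(B, 2)
```

Only the measure-free keys are tracked here; a real implementation would aggregate the measure wherever this sketch drops a duplicate.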
27
Overview





Begin by sorting base cuboid
All other cuboids computed w/o re-sorting
Sort order of base cuboid determines sort
order of all other cuboids
To maximize overlap across cuboid
computations, reduce memory requirements
of individual cuboids
Since partition is unit of computation, while
computing one sorted cuboid from another,
just need mem sufficient to hold a partition
28
Overview (contd...)



When partition is computed, tuples can
be pipelined to descendants; same
memory used by next partition
Significant saving: PartSize << CuboidSize
Eg: Computing ABC and ABD from ABCD
PartSize(ABC) = 1 ; PartSize(ABD) = |D|
29
Choosing parent cuboids



Goal: Choose tree that minimizes size of
partitions
Eg: Better to compute AC from ACD
than ABC
Heuristic: Maximize size of common
prefix
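The partition-size estimates and the parent choice can be sketched as follows. The sketch assumes cuboids are written as attribute strings in sort order and uses an upper bound: once the common prefix is fixed, every attribute after the dropped one may take any of its distinct values. The cardinalities in `card` are made-up numbers.

```python
from math import prod

def part_size(parent, child, card):
    """Upper bound on the partition size of `child` inside sorted `parent`.
    parent, child: attribute strings in sort order; child drops one attribute.
    card: estimated number of distinct values per attribute (an assumption)."""
    dropped = next(i for i, a in enumerate(parent) if a not in child)
    return prod(card[a] for a in parent[dropped + 1:])

def best_parent(child, parents, card):
    """Pick the parent giving the smallest partition size for `child`."""
    return min(parents, key=lambda p: part_size(p, child, card))

card = {"A": 10, "B": 20, "C": 30, "D": 40}
choice = best_parent("AC", ["ABC", "ACD"], card)
```

Dropping the last attribute leaves the whole child as common prefix, so the bound is 1; this is the case the "maximize common prefix" heuristic aims for.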
30
Example cuboid tree
31
Choosing overlapping cuboids




To compute a cuboid in memory, need
memory = PartSize
If required memory is available, cuboid
is in Partition state
Else allocate 1 memory page for the
cuboid, and mark as SortRun state
Only tuples of a Partition cuboid can be
pipelined to descendants
32
Heuristics



Which cuboids to compute and in what
state: Opt allocation NP-hard!
Heuristic: Traverse tree in BFS order
Intuition:


Cuboids to the left have smaller partition
sizes
So require less memory
33
Cuboid computation

For each tuple t of B (parent)



If (state==partition) process_partition(t)
Else process_sorted_run(t)
Process_partition(t):

3 cases:



Tuple starts new partition
Tuple matches existing tuple in partition
New tuple: Insert at appropriate place
34
Cuboid computation (contd...)

Process_sorted_run(t):

3 cases





Tuple starts new sorted run
Tuple matches last tuple in current run
New tuple: Append tuple to end of current run
Cuboid in Partition state fully computed in 1
pass
Cuboid in SortRun state: Combine merge step
with computation of descendant cuboids
35
Example CUBE computation
(M=25 ; 9 sorted runs of BCD, CD to merge)
36
An array based algorithm for
simultaneous multidimensional aggregates

Yihong Zhao, Prasad Deshpande,
Jeffrey Naughton
-- SIGMOD ‘97
37
ROLAP Vs MOLAP


CUBE central to OLAP operations
ROLAP: Relational OLAP systems


PipeSort, PipeHash, Overlap
MOLAP: Multidimensional OLAP systems

Store data in sparse arrays instead of
relational tables
38
MOLAP systems


Relational tuple:
(jacket, K-mart, 1996, $40)
MOLAP representation:



Stores only ‘$40’ in a sparse array
Position in array encodes
(jacket,K-mart,1996)
Arrays are “chunked” and compressed
for efficient storage
39
Problem


No concept of “reordering” to bring
together related tuples
Order cell visits to simultaneously
compute several aggregates whilst
minimizing memory requirements and #
of cell visits
40
Efficient array storage: Issues



Array too large to fit in memory: Split
into “chunks” that fit in memory
Even with chunking, array may be
sparse: Compression needed
Standard PL technique of storing arrays
in row/column major form inefficient

Creates asymmetry amongst dimensions,
favoring one over the other
41
Chunking



Divide n-D array into small n-D chunks
and store each chunk as independent
object on disk
Keep size of chunk = disk block size
We shall use chunks having same size
along each dimension
42
Compressing sparse arrays




Store “dense” chunks as is (>40% occ.)
Already a significant compression over a
relational table
Sparse arrays: Use chunk-offset compression
– (offsetInChunk,data)
Better than LZW etc. because:


Uses domain knowledge
LZW data requires decompression before use
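A minimal sketch of chunk-offset compression, assuming a chunk is a flat list in which `None` marks an empty cell. The 40% threshold is the one quoted above; the tagged-tuple representation and function names are assumptions of this sketch.

```python
def compress_chunk(cells, dense_threshold=0.4):
    """cells: flat list for one chunk, None marking an empty cell."""
    filled = [(off, v) for off, v in enumerate(cells) if v is not None]
    if len(filled) > dense_threshold * len(cells):
        return ("dense", list(cells))      # store as is
    return ("sparse", filled)              # (offsetInChunk, data) pairs

def lookup(chunk, offset):
    """Read one cell without decompressing the whole chunk first."""
    kind, payload = chunk
    if kind == "dense":
        return payload[offset]
    return dict(payload).get(offset)

sparse = compress_chunk([None] * 7 + [40])
dense = compress_chunk([1, 2, None, 4])
```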
43
Loading arrays from tables




Input: Table, dim sizes, chunksize
If M < array size, partition sets of chunks into
groups which fit in memory eg: Suppose 16
chunks and 2 partitions, group chunks 0-7 &
8-15
Scan table. For each tuple, calculate & store
(chunk#,offset,data) into buffer page for
corresponding partition
2nd pass: For each partition, read tuples and
assign to chunks in memory. Compress.
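The (chunk#, offset) calculation in the first pass might look like the sketch below. The numbering scheme (first dimension varying fastest, both for chunks and for cells inside a chunk) is an assumption of this sketch, as is the function name.

```python
def locate(coords, dim_sizes, chunk_sizes):
    """Map array coordinates to (chunk number, offset inside the chunk).
    Numbering with the first dimension varying fastest is an assumption."""
    chunk_no = offset = 0
    chunk_stride = cell_stride = 1
    for x, d, c in zip(coords, dim_sizes, chunk_sizes):
        chunks_along = -(-d // c)               # ceiling division
        chunk_no += (x // c) * chunk_stride
        offset += (x % c) * cell_stride
        chunk_stride *= chunks_along
        cell_stride *= c
    return chunk_no, offset

# 16x16x16 array split into 4x4x4 chunks (64 chunks of 64 cells each)
first = locate((0, 0, 0), (16, 16, 16), (4, 4, 4))
last = locate((15, 15, 15), (16, 16, 16), (4, 4, 4))
```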
44
Basic algo (No overlapping)




Eg: 3-D array ABC; To compute AB
If array fits in memory, sweep plane of size
|A|*|B| along dim C, aggregating as you go
If array chunked: Sweep plane of size
|Ac|*|Bc| through upper left chunks. Store
result, move to chunks on right
Each chunk read in only once
Mem: 1 chunk + |Ac|*|Bc| plane
45
Generalization




Sweep k-1 dimensional subarrays through a
k-dimensional array
Multiple group-bys: Use smallest parent
optimization in cube lattice
Advantage over ROLAP: Since dimension &
chunk sizes known, exact node sizes can be
computed
Min Size Spanning Tree (MSST): Parent of
node n is parent node n’ in lattice of min size
46
Basic array cubing algorithm:


First construct MSST of the group-bys
Compute a group-by D_i1 D_i2 ... D_ik from
parent D_i1 ... D_i(k+1) of min size:

Read in each chunk of D_i1 ... D_i(k+1) along
dimension D_i(k+1) and aggregate each chunk
to a chunk of D_i1 ... D_ik. Once a chunk of
D_i1 ... D_ik is complete, flush and reuse mem
47
Example




ABC – 16x16x16 array
Chunk size: 4x4x4
Dimension order: ABC
Eg: Computing BC: Read in order 1..64
After every 4, flush chunk to disk and
reuse memory
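The example can be simulated in a few lines. The sketch below walks the 64 chunks in dimension order ABC (A chunk index varying fastest), keeps a single BC chunk in memory, and flushes it after every 4 chunks. The `cell` function is stand-in data and the dictionary-based storage is an assumption made to keep the sketch short.

```python
from itertools import product

N, C = 16, 4              # dimension size and chunk size from the example
K = N // C                # chunks per dimension

def cell(a, b, c):        # stand-in data; any measure would do
    return a + 2 * b + 3 * c

def compute_bc():
    bc = {}               # finished BC cells ("on disk")
    buffer = {}           # one BC chunk held in memory
    reads = flushes = 0
    # Dimension order ABC: the A chunk index varies fastest.
    for ck, bk, ak in product(range(K), repeat=3):
        reads += 1
        for a, b, c in product(range(ak * C, (ak + 1) * C),
                               range(bk * C, (bk + 1) * C),
                               range(ck * C, (ck + 1) * C)):
            buffer[(b, c)] = buffer.get((b, c), 0) + cell(a, b, c)
        if ak == K - 1:   # every A chunk for this (b, c) block has been seen
            assert len(buffer) == C * C
            bc.update(buffer)
            buffer = {}
            flushes += 1
    return bc, reads, flushes

bc, reads, flushes = compute_bc()
```

Each chunk is read once, and the in-memory buffer never holds more than one 4x4 BC chunk.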
48
3-D array (Dim order ABC)
49
Multi-Way algorithm



Single pass version: Assume enough memory
to compute all group-bys in 1 scan
Reduce memory requirements using a special
order to scan input array, called dimension
order
A dimension order of the array chunks is a
row major order of the chunks with n
dimensions D1...Dn in some order
O = (Di1,Di2,...Din)
50
Memory requirements






For a given dimension order, can determine
which chunks of each group-by need to stay
in memory to avoid rescanning input array
Eg: Suppose D.O. is ABC ie 1..64
BC: Sweep 1 chunk, deallocate & reuse
AC: Sweep 4 chunks for entire AC
AB: Sweep 16 chunks for entire AB
Note: Each BC chunk is generated in DO.
Before writing a BC chunk to disk, use it to
compute B,C chunks as if read in D.O.
51
Memory requirements




Generalizing: (Xc=chunk size, Xd=dim size)
Computing BC requires |Bc|*|Cc| memory
Computing AC, AB requires |Ad|*|Cc| and
|Ad|*|Bd| memory
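RULE1: For a gp-by (D_j1, ..., D_j(n-1)) of array (D_1, ..., D_n)
with DO = (D_1, ..., D_n), if (D_j1, ..., D_j(n-1)) contains a prefix
of (D_1, ..., D_n) of length p, then mem requirement for
computing (D_j1, ..., D_j(n-1)) is:
\prod_{i=1}^{p} |D_i| \times \prod_{i=p+1}^{n-1} |C_i|
where |D_i| is the size of dimension i and |C_i| is the chunk size along it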
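RULE1 can be checked against the ABC example with a short sketch. It assumes group-bys are written as attribute strings in dimension order, and it applies the chunk size to the group-by's own non-prefix dimensions, which agrees with |Bc|*|Cc| for BC and |Ad|*|Cc| for AC. The function name is illustrative.

```python
from math import prod

def rule1_memory(groupby, order, dim_size, chunk_size):
    """Memory (in array cells) for an (n-1)-dimensional group-by when the
    n-dimensional array is scanned in dimension order `order`."""
    p = 0
    while p < len(groupby) and groupby[p] == order[p]:
        p += 1                                   # length of the shared prefix
    return (prod(dim_size[d] for d in groupby[:p]) *
            prod(chunk_size[d] for d in groupby[p:]))

dim_size = {"A": 16, "B": 16, "C": 16}
chunk_size = {"A": 4, "B": 4, "C": 4}
mem = {g: rule1_memory(g, "ABC", dim_size, chunk_size)
       for g in ("BC", "AC", "AB")}
```

With 16 cells per 4x4 chunk, the three results correspond to 1, 4, and 16 chunks respectively.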
52
Minimum Memory Spanning Tree



MMST: Got from lattice by choosing
parents for each node N as per RULE1
Choose the parent that minimizes
memory requirements: The prefix of the
parent node contained in node N must
have minimum length
Break ties by choosing node with
minimum size as parent
53
MMST
54
Optimal dimension order


Different dimension orders may generate
different MMSTs which have vastly different
memory requirements
Eg: 4D array ABCD




Dimension sizes: 10,100,1000,10000 resp.
Chunk size =10x10x10x10
Figure shows MMSTs for dim orders ABCD and
DBCA
Optimal dimension order is (D_1, ..., D_n) where
|D_1| ≤ |D_2| ≤ ... ≤ |D_n|
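The effect of the dimension order can be illustrated on the 4-D example. The sketch below applies RULE1 only to the four group-bys directly below the root, not to the full MMST, so the totals are a partial illustration rather than the figure's numbers; the function names are assumptions, and DCBA is used here simply as the reverse of the sorted order.

```python
from math import prod

def rule1_memory(groupby, order, dim_size, chunk):
    p = 0
    while p < len(groupby) and groupby[p] == order[p]:
        p += 1                                   # shared prefix length
    return prod(dim_size[d] for d in groupby[:p]) * chunk ** (len(groupby) - p)

def top_level_memory(order, dim_size, chunk):
    """Total RULE1 memory for the n group-bys that drop one dimension."""
    children = [order[:i] + order[i + 1:] for i in range(len(order))]
    return sum(rule1_memory(g, order, dim_size, chunk) for g in children)

dim_size = {"A": 10, "B": 100, "C": 1000, "D": 10000}
best_order = "".join(sorted(dim_size, key=dim_size.get))   # increasing size
mem_best = top_level_memory(best_order, dim_size, 10)
mem_worst = top_level_memory("DCBA", dim_size, 10)
```

Putting the largest dimension last keeps it out of every long prefix, which is why the increasing-size order needs roughly a thousand times less memory here.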
55
56
Multi-pass Multi-way Array Algorithm



Single pass algo assumed we had
memory MT required by MMST T of
optimal dimension order
If M < MT, we cannot allocate memory
for some of the subtrees of MMST
(incomplete subtrees)
Optimal memory allocation to different
subtrees likely to be NP-Hard
57
Heuristic


Allocate memory to subtrees of root
from the right to left order
Intuition: Rightmost node will be the
largest array and we want to avoid
computing it in multiple passes
58
Algorithm



Create MMST T for opt D.O. O
Add T to ToBeComputed list
For each tree T’ in ToBeComputed list:
Create working subtree W and incomplete
subtrees Is (allocate mem=node chunksize)
Scan chunks of root of T’ in order O
Compute group-bys in W and write to disk
Write intermediate result of aggregation of D_j1 ... D_j(n-1)
group-by for each chunk of D_1, ..., D_n
For each R=root(I)
Generate chunks of D_j1 ... D_j(n-1) from the partitions of R & write
to disk (Merge several intermediate results into 1 chunk)
Add I to ToBeComputed
59
ROLAP vs MOLAP

Proposed algorithm more efficient than
existing ROLAP techniques (even indirectly):




Scan relational table and load into array
Compute the cube on resulting array
Dump resulting CUBEd array into tables
Why better?


ROLAP table sizes much bigger than compressed
array sizes
Multiple sorts reqd. MOLAP does not require sorts
since the multidimensional array captures
relationships amongst all the different dimensions
60