Optimal Distributed Declustering using Replication Keith Frikken Purdue University

Optimal Distributed
Declustering using Replication
Keith Frikken
Purdue University
Jan 5, 2005
ICDT 2005
Declustering Data
• Declustering data over multiple disks to
improve performance for range queries
has been well studied
• Applications include:
– Spatio-temporal databases
– Image and video data
– Scientific simulation datasets
Goal
• Divide data uniformly along dimensions to create tiles
• Put records contained in each tile on different disks so
that I/O can be parallelized
• Assumptions
– Data can be tiled in such a way
– Disks have constant retrieval times
• Assigning tiles to disks is similar to a coloring problem
(disks are colors)
• A range query can be answered optimally if the # of I/O
retrievals for any specific disk is: # of tiles/# of disks
• Two approaches:
– Coloring schemes
– Replication
Notations
• k is the number of disks
• m is the number of tiles in a query
• r is the level of replication (i.e., r is 2)
• Q is the set of all range queries
• ret(q) is the actual retrieval time of q
• Optimal retrieval time for a query q is o_q = m/k
• Additive error ε = max_{q ∈ Q} {ret(q) − o_q}
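The additive error can be checked by brute force on a small grid. The sketch below is illustrative only and rests on several assumptions that are not on the slide: the grid is two-dimensional and n × n, there is no replication (so ret(q) is the largest number of query tiles on any one disk), and the optimum is rounded up to ⌈m/k⌉ because retrievals are whole tiles. The function names are hypothetical; `disk_modulo` uses the usual (x + y) mod k form of the Disk Modulo scheme.

```python
from itertools import product
from math import ceil


def additive_error(color, n, k):
    """Worst-case (max tiles on one disk) - ceil(m/k) over all range queries
    on an n x n grid, for a tile-to-disk function color(x, y) -> 0..k-1."""
    worst = 0
    for x0, y0 in product(range(n), repeat=2):
        for x1, y1 in product(range(x0, n), range(y0, n)):
            counts = [0] * k
            for x in range(x0, x1 + 1):
                for y in range(y0, y1 + 1):
                    counts[color(x, y)] += 1
            m = (x1 - x0 + 1) * (y1 - y0 + 1)  # tiles in this query
            worst = max(worst, max(counts) - ceil(m / k))
    return worst


def disk_modulo(k):
    """Disk Modulo coloring in two dimensions: tile (x, y) -> (x + y) mod k."""
    return lambda x, y: (x + y) % k


if __name__ == "__main__":
    print(additive_error(disk_modulo(4), n=6, k=4))
```

Enumerating every query is only practical for small n, but it gives a direct way to compare coloring schemes on the same grid.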
Coloring schemes
• Disk Modulo (DM) [Du and Sobolewski,
1982]
• Fieldwise XOR (FX) [Kim and Pramanik,
1988]
• Cyclic Schemes (RPHM, GFIB, EXH) –
[Prabhakar et al, 1998]
• Golden Ratio Sequences (GRS) – [Bhatia
et al, 2000]
Other schemes
• [Atallah and Prabhakar, 2000] developed a
scheme in two dimensional grids for k=2^n disks
that has additive error of O(log k)
• [Sinha et al, 2001] proved lower bounds on the
additive error of Ω(log k) and Ω(log^{(d-1)/2} k) for 2
dimensions and d (>2) dimensions respectively
• [Chen and Cheng, 2002] showed that an
additive error of O(log^{d-1} k) is achievable for any
# of dimensions (>2)
Replication
• Placing records on multiple disks can further
improve performance of declustering schemes
• Two Problems:
– How to schedule a query (i.e., what tiles are retrieved
from each disk)
– How to use replication to balance load
• Approaches:
– Chained Declustering [Hsiao and DeWitt, 1990]
– Random Duplicate Allocation (RDA) [Sanders et al, 2000],
[Sanders, 2001], and [Czumaj and Scheideler, 2003]
Replication Results
• Chained Declustering
– Fast Scheduling Algorithm O(m+k) time to test if a
specific retrieval time is possible [Aerts et al, 2000]
• RDA
– If m≥ck(log k) then optimal with high prob [Czumaj
and Scheideler, 2003]
– “Fast” scheduling algorithm: O(Δ k^{O(1)}) time [Czumaj
and Scheideler, 2003]
• Hybrid techniques [Chen and Cheng, 2002]
– Use GRS with second random disk
Our Results
• We define a new class of schemes called the
shift schemes
• Deterministic
• Any query with at least k(k-1)ε tiles can be
answered in an optimal fashion
• Queries can be scheduled in O(m+k(log ε)) time
• If a single disk fails, then any query with at least
k(k-1)ε tiles can be answered optimally
• Experimental performance similar to RDA (better
for many cases)
Shift Scheme Definition
• Use any strong coloring scheme
• Use a modified chain declustering
– Defined by shift value s (where gcd(s,k)=1)
• Base scheme is defined by function f(x,y)
– Second color is (f(x,y) + s) mod k
Example tile grid (each cell: first color, second color):
0,3  2,0  4,2  1,4  3,1
1,4  3,1  0,3  2,0  4,2
2,0  4,2  1,4  3,1  0,3
3,1  0,3  2,0  4,2  1,4
4,2  1,4  3,1  0,3  2,0
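The grid values above are consistent with k = 5 disks and shift s = 3, since every second color equals the first color plus 3 modulo 5. A minimal Python sketch of the definition follows. The base function f(x, y) = (x + 2y) mod 5 is an assumption chosen only because it reproduces these values when each group of five cells is read as one row; the slide does not name the base scheme or the orientation, and `shift_scheme` is a hypothetical helper name.

```python
from math import gcd


def shift_scheme(f, k, s):
    """Return a function mapping tile (x, y) to (first color, second color)."""
    if gcd(s, k) != 1:
        raise ValueError("shift value s must satisfy gcd(s, k) = 1")

    def colors(x, y):
        first = f(x, y) % k
        return first, (first + s) % k  # replica goes s disks further along

    return colors


K, S = 5, 3


def base(x, y):
    """Assumed base coloring; not stated on the slide."""
    return (x + 2 * y) % K


colors = shift_scheme(base, K, S)
grid = [[colors(x, y) for y in range(5)] for x in range(5)]

if __name__ == "__main__":
    for row in grid:
        print("  ".join(f"{a},{b}" for a, b in row))
```

Because gcd(s, k) = 1, following the shift from disk to disk visits every disk once before returning to the start, which is what lets chain-style scheduling apply.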
Scheduling
• Can use modification of chain declustering
scheduling algorithm to schedule queries
in O(m+k(log ε)) time
• Essentially, use previous algorithm to test
if a specific load is possible and do a
binary search on the possible loads
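The test-then-binary-search idea can be sketched as follows. This is not the algorithm of [Aerts et al, 2000]; it is a simple illustrative feasibility test under an assumed model in which the disks are listed in chain order, disk i holds t[i] query tiles as primary, and it may pass d[i] ≤ t[i] of them to the next disk (cyclically). The names `feasible` and `best_schedule` are hypothetical.

```python
def feasible(t, load):
    """Return shifts d (d[i] tiles move from disk i to disk i+1, cyclically)
    such that every disk ends with at most `load` tiles, or None."""
    k = len(t)
    if sum(t) > k * load:
        return None  # not enough total capacity
    d = [0] * k
    carry = 0
    # Two passes around the cycle give the smallest shifts that satisfy
    # d[i] >= max(0, d[i-1] + t[i] - load) for every disk.
    for _ in range(2):
        for i in range(k):
            carry = max(0, carry + t[i] - load)
            d[i] = carry
    # A disk can only pass on tiles for which it is the primary disk.
    if any(d[i] > t[i] for i in range(k)):
        return None
    # Defensive re-check of the resulting loads.
    if any(d[i - 1] + t[i] - d[i] > load for i in range(k)):
        return None
    return d


def best_schedule(t):
    """Binary search for the smallest feasible maximum load."""
    k = len(t)
    lo = -(-sum(t) // k)  # ceil(m / k): no schedule can beat this
    hi = max(t)           # always feasible: shift nothing
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(t, mid) is not None:
            hi = mid
        else:
            lo = mid + 1
    return lo, feasible(t, lo)


if __name__ == "__main__":
    print(best_schedule([6, 1, 1, 1, 1]))
```

Each feasibility test in this sketch takes O(k) time once the counts t[i] are known, and the search range runs from ⌈m/k⌉ to the largest t[i], so the number of tests grows with the logarithm of the additive error of the base coloring.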
Bound(1)
• There are k disks (D_0, …, D_{k-1})
• Disk D_i has t_i tiles initially (as the primary
disk)
• The number of tiles is m = t_0 + … + t_{k-1}
• D_i shifts d_i tiles to D_{i+1}
• d_i ≤ t_i
• The goal is to minimize the most tiles at a
disk, i.e., max_{0≤i≤k-1} {d_{i-1} + t_i − d_i}
Bound(2)
• Recall,
– o = m/k
– max_{0≤i≤k-1} {t_i} ≤ o + ε
• Suppose m ≥ k(k-1)ε
• Then,
– o ≥ (k-1)ε
– Surplus (Σ_{i=0}^{k-1} max{0, t_i − o}) is bounded by (k-1)ε
– max_{0≤i≤k-1} {d_i} ≤ (k-1)ε ≤ o
• Two cases:
– If disk has a surplus
– If disk has a shortage
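The first two facts under "Then" follow from short counting arguments. A sketch, assuming only what the "Recall" bullets state (o = m/k and every t_i ≤ o + ε):

```latex
\begin{align*}
m \ge k(k-1)\varepsilon
  &\;\Longrightarrow\; o = \frac{m}{k} \ge (k-1)\varepsilon, \\
\sum_{i=0}^{k-1} t_i = m = k\,o
  &\;\Longrightarrow\; \text{at most } k-1 \text{ disks have } t_i > o, \\
\sum_{i=0}^{k-1} \max\{0,\; t_i - o\}
  &\;\le\; (k-1)\,\max_{i}\,(t_i - o) \;\le\; (k-1)\,\varepsilon .
\end{align*}
```

The second line holds because the t_i average to o, so they cannot all exceed it.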
[Figure slides: 32 disks; 64 disks; 128 disks; 32 disks, 3 dimensions]
Generalizations
• Permutations
• Higher levels of replication
• Survivability
– If the level of replication is r, can handle any r−1 failures
– When r=2, and a single disk fails then:
• Fast scheduling still possible
• Large queries still optimal
Summary
• Shift schemes are a new class of schemes
– Optimal for “large enough” queries
– Efficient scheduling algorithm
– Resilient to disk failures
• Future Work
– Better analysis of scheme
– Choosing shift values