CS 347: Parallel and Distributed Data Management
Notes 02: Distributed DB Design
Hector Garcia-Molina
CS 347
Notes 02
1
Distributed DB Design
(Chapter 5, Ozsu & Valduriez)
• Top-down approach: have DB…
  - how to split it and allocate it to the sites
• Multi-DBs (or bottom-up): no design issues!
Two issues in DDB design:
• Fragmentation
• Allocation
Note: the issues are not independent, but we will cover them separately.
Example
Employee relation E (#,name,loc,sal,…)
40% of queries:
  Qa: select *
      from E
      where loc=Sa
      and…
40% of queries:
  Qb: select *
      from E
      where loc=Sb
      and ...
Motivation: two sites, Sa and Sb
  Qa → Sa
  Qb → Sb
• It does not take a rocket scientist to figure out fragmentation...
E:
#   NM     Loc  Sal
5   Joe    Sa   10
7   Sally  Sb   25
8   Tom    Sa   15
..
F, fragment at Sa:
#   NM     Loc  Sal
5   Joe    Sa   10
8   Tom    Sa   15
..
F, fragment at Sb:
#   NM     Loc  Sal
7   Sally  Sb   25
..
F = { F1, F2 }
$F_1 = \sigma_{loc=Sa}(E)$
$F_2 = \sigma_{loc=Sb}(E)$
⇒ called primary horizontal fragmentation
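A minimal Python sketch of primary horizontal fragmentation, assuming E is held in memory as a list of dicts keyed by the column names of the example (this representation and the helper name `select` are illustrative only). Each fragment is the selection σ of E on a predicate over a local attribute, here Loc.

```python
# Hypothetical in-memory copy of relation E (tuples from the example above).
E = [
    {"#": 5, "NM": "Joe", "Loc": "Sa", "Sal": 10},
    {"#": 7, "NM": "Sally", "Loc": "Sb", "Sal": 25},
    {"#": 8, "NM": "Tom", "Loc": "Sa", "Sal": 15},
]


def select(relation, predicate):
    """Relational selection: sigma_predicate(relation)."""
    return [t for t in relation if predicate(t)]


# Primary horizontal fragmentation: each fragment is defined by a
# predicate on a local attribute of E (here, Loc).
F1 = select(E, lambda t: t["Loc"] == "Sa")  # to be stored at Sa
F2 = select(E, lambda t: t["Loc"] == "Sb")  # to be stored at Sb

print([t["#"] for t in F1])  # [5, 8]
print([t["#"] for t in F2])  # [7]
```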
Fragmentation
• Horizontal: R is split into subsets of its tuples
  - Primary: depends on local attributes
  - Derived: depends on a foreign relation
• Vertical: R is split into subsets of its attributes
Fragmentation is also called sharding.
Three common horizontal partitioning techniques
• Round robin
• Hash partitioning
• Range partitioning
• Round robin
  R = t1, t2, t3, t4, t5, ... is dealt out to disks D0, D1, D2 in turn:
    D0: t1, t4, ...
    D1: t2, t5, ...
    D2: t3, ...
  • Evenly distributes data
  • Good for scanning the full relation
  • Not good for point or range queries
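A minimal sketch of round-robin placement, assuming tuples arrive in order and disks are numbered from 0 (the function name `round_robin` is hypothetical). The i-th tuple goes to disk i mod n regardless of its contents, which is why data is spread evenly but a point or range query has to probe every disk.

```python
def round_robin(tuples, n_disks):
    """Send the i-th tuple (0-based) to disk D(i mod n_disks)."""
    disks = [[] for _ in range(n_disks)]
    for i, t in enumerate(tuples):
        disks[i % n_disks].append(t)
    return disks


disks = round_robin(["t1", "t2", "t3", "t4", "t5"], 3)
print(disks)  # [['t1', 't4'], ['t2', 't5'], ['t3']]
```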
• Hash partitioning
  R: t1: h(k1)=2, t2: h(k2)=0, t3: h(k3)=0, t4: h(k4)=1, ...
    D0: t2, t3
    D1: t4
    D2: t1
  • Good for point queries on the key; also for joins
  • Not good for range queries, or for point queries not on the key
  • If the hash function is good, data is evenly distributed
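A sketch of hash partitioning with a point query, under stated assumptions: `h` is a hypothetical hash function built from CRC-32 (`zlib.crc32`), and the relation is a list of dicts with key attribute `k`. A lookup on the partitioning key only needs to visit disk h(value); a range query, or a lookup on any other attribute, would still have to visit every disk.

```python
import zlib


def h(key, n_disks):
    """Hypothetical hash function: CRC-32 of the key's string form, mod n.
    (Python's built-in hash() of strings is randomized per process,
    so a stable hash is used instead.)"""
    return zlib.crc32(str(key).encode("utf-8")) % n_disks


def hash_partition(tuples, key, n_disks):
    """Place each tuple on disk h(t[key])."""
    disks = [[] for _ in range(n_disks)]
    for t in tuples:
        disks[h(t[key], n_disks)].append(t)
    return disks


def point_query(disks, key, value):
    """Equality lookup on the partitioning key: probes only one disk."""
    return [t for t in disks[h(value, len(disks))] if t[key] == value]


R = [{"k": k, "payload": f"row{k}"} for k in range(100)]
disks = hash_partition(R, "k", 3)
print([len(d) for d in disks])      # how even this is depends on h
print(point_query(disks, "k", 42))  # [{'k': 42, 'payload': 'row42'}]
```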
• Range partitioning
  R: t1: A=5, t2: A=8, t3: A=2, t4: A=3, ...
  Partitioning vector: V0 = 4, V1 = 7
    D0: t3, t4
    D1: t1
    D2: t2
  • Good for some range queries on A
  • Need to select a good vector; otherwise imbalance:
    → data skew
    → execution skew
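A sketch of range partitioning with the partitioning vector [4, 7] from the example, assuming a value equal to a boundary goes to the lower partition (the notes do not say which side boundary values fall on). `bisect_left` from the standard library finds the partition; a range query only probes the partitions whose ranges overlap the requested interval. The helper names are illustrative.

```python
from bisect import bisect_left


def range_partition(tuples, attr, vector):
    """A vector [v0, ..., v_{n-2}] defines n ranges. Assumption: a value
    equal to a boundary goes to the lower partition, i.e.
    A <= v0 -> D0, v0 < A <= v1 -> D1, ..., A > v_{n-2} -> D_{n-1}."""
    disks = [[] for _ in range(len(vector) + 1)]
    for t in tuples:
        disks[bisect_left(vector, t[attr])].append(t)
    return disks


def range_query(disks, attr, vector, lo, hi):
    """Tuples with lo <= A <= hi, probing only the overlapping partitions."""
    first = bisect_left(vector, lo)
    last = bisect_left(vector, hi)
    return [t for d in disks[first:last + 1] for t in d if lo <= t[attr] <= hi]


R = [{"id": "t1", "A": 5}, {"id": "t2", "A": 8},
     {"id": "t3", "A": 2}, {"id": "t4", "A": 3}]
disks = range_partition(R, "A", [4, 7])
print([[t["id"] for t in d] for d in disks])  # [['t3', 't4'], ['t1'], ['t2']]
print([t["id"] for t in range_query(disks, "A", [4, 7], 2, 3)])  # ['t3', 't4']
```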
Which are good fragmentations?
Example:
F = { F1, F2 }
$F_1 = \sigma_{sal<10}(E)$
$F_2 = \sigma_{sal>20}(E)$
Problem: some tuples are lost! (Tuples with 10 ≤ sal ≤ 20 are in neither fragment.)
Which are good fragmentations?
Second example:
F = { F3, F4 }
$F_3 = \sigma_{sal<10}(E)$
$F_4 = \sigma_{sal>5}(E)$
Tuples with 5 < sal < 10 are duplicated...
⇒ Prefer to deal with replication explicitly
Example: F = { F5, F6, F7 }
$F_5 = \sigma_{sal \le 5}(E)$
$F_6 = \sigma_{5 < sal < 10}(E)$
$F_7 = \sigma_{sal \ge 10}(E)$
⇒ Then replicate F6 if convenient (part of the allocation problem)
Desired properties for horizontal fragmentation
R ⇒ F = { F1, F2, … }
(1) Completeness
  $\forall t \in R,\ \exists F_i \in F$ such that $t \in F_i$
(2) Disjointness
  $\forall t \in F_i,\ \nexists F_j$ such that $t \in F_j$, $i \ne j$, $F_i, F_j \in F$
(3) Reconstruction (ignore)
How do we get completeness and disjointness?
(1) Check it “manually”!
  e.g., $F_1 = \sigma_{sal<10}(E)$; $F_2 = \sigma_{sal \ge 10}(E)$
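A sketch of the “manual” check, assuming fragments are given as Python predicates over tuples (the function name `check_fragmentation` is illustrative). Testing one instance can only expose a violation; showing completeness and disjointness for every possible instance requires reasoning about the predicates themselves, as the notes do.

```python
def check_fragmentation(relation, predicates):
    """Return (complete, disjoint) for the horizontal fragmentation defined
    by `predicates`, evaluated on one instance of the relation."""
    complete, disjoint = True, True
    for t in relation:
        hits = sum(1 for p in predicates if p(t))
        if hits == 0:
            complete = False  # t would be lost
        elif hits > 1:
            disjoint = False  # t would be stored in several fragments
    return complete, disjoint


salaries = [{"sal": s} for s in range(31)]
lost = [lambda t: t["sal"] < 10, lambda t: t["sal"] > 20]
overlap = [lambda t: t["sal"] < 10, lambda t: t["sal"] > 5]
good = [lambda t: t["sal"] < 10, lambda t: t["sal"] >= 10]

print(check_fragmentation(salaries, lost))     # (False, True)
print(check_fragmentation(salaries, overlap))  # (True, False)
print(check_fragmentation(salaries, good))     # (True, True)
```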
How do we get completeness and disjointness?
(2) “Automatically” generate fragments with these properties
  Desired simple predicates ⇒ Fragments
Example of generation
• Say queries use predicates:
  A<10, A>5, Loc = SA, Loc = SB
• Next:
  - generate “minterm” predicates
  - eliminate useless ones
Minterm predicates (part I)
(1) A<10 ∧ A>5 ∧ Loc=SA ∧ Loc=SB
(2) A<10 ∧ A>5 ∧ Loc=SA ∧ ¬(Loc=SB)
(3) A<10 ∧ A>5 ∧ ¬(Loc=SA) ∧ Loc=SB
(4) A<10 ∧ A>5 ∧ ¬(Loc=SA) ∧ ¬(Loc=SB)
(5) A<10 ∧ ¬(A>5) ∧ Loc=SA ∧ Loc=SB
(6) A<10 ∧ ¬(A>5) ∧ Loc=SA ∧ ¬(Loc=SB)
(7) A<10 ∧ ¬(A>5) ∧ ¬(Loc=SA) ∧ Loc=SB
(8) A<10 ∧ ¬(A>5) ∧ ¬(Loc=SA) ∧ ¬(Loc=SB)
(1) to (4): 5 < A < 10
(5) to (8): A ≤ 5
Minterm predicates (part II)
(9) ¬(A<10) ∧ A>5 ∧ Loc=SA ∧ Loc=SB
(10) ¬(A<10) ∧ A>5 ∧ Loc=SA ∧ ¬(Loc=SB)
(11) ¬(A<10) ∧ A>5 ∧ ¬(Loc=SA) ∧ Loc=SB
(12) ¬(A<10) ∧ A>5 ∧ ¬(Loc=SA) ∧ ¬(Loc=SB)
(13) ¬(A<10) ∧ ¬(A>5) ∧ Loc=SA ∧ Loc=SB
(14) ¬(A<10) ∧ ¬(A>5) ∧ Loc=SA ∧ ¬(Loc=SB)
(15) ¬(A<10) ∧ ¬(A>5) ∧ ¬(Loc=SA) ∧ Loc=SB
(16) ¬(A<10) ∧ ¬(A>5) ∧ ¬(Loc=SA) ∧ ¬(Loc=SB)
(9) to (12): A ≥ 10
(13) to (16): A ≥ 10 and A ≤ 5, which is impossible
Final fragments:
F2: 5 < A < 10 ∧ Loc=SA
F3: 5 < A < 10 ∧ Loc=SB
F6: A ≤ 5 ∧ Loc=SA
F7: A ≤ 5 ∧ Loc=SB
F10: A ≥ 10 ∧ Loc=SA
F11: A ≥ 10 ∧ Loc=SB
Note: elimination of useless fragments depends on application semantics.
e.g., if LOC could be a value other than SA and SB (Loc ≠ SA ∧ Loc ≠ SB), we need to add fragments
F4: 5 < A < 10 ∧ Loc ≠ SA ∧ Loc ≠ SB
F8: A ≤ 5 ∧ Loc ≠ SA ∧ Loc ≠ SB
F12: A ≥ 10 ∧ Loc ≠ SA ∧ Loc ≠ SB
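A sketch of minterm generation and elimination for the predicates A<10, A>5, Loc=SA, Loc=SB. The enumeration order of `itertools.product` matches the numbering (1) to (16) above. As a stand-in for application semantics, the sketch assumes a hypothetical sample of possible (A, Loc) values and drops minterms that no sample tuple satisfies; a real design would reason about the predicates and constraints directly rather than about a sample.

```python
from itertools import product

# Simple predicates used by the queries (p1..p4 in the notes).
SIMPLE = [
    ("A<10", lambda t: t["A"] < 10),
    ("A>5", lambda t: t["A"] > 5),
    ("Loc=SA", lambda t: t["Loc"] == "SA"),
    ("Loc=SB", lambda t: t["Loc"] == "SB"),
]


def minterms(simple):
    """Yield (number, label, predicate) for all 2^m minterms, numbered
    (1), (2), ... in the same order as the tables above."""
    for num, signs in enumerate(product([True, False], repeat=len(simple)), start=1):
        label = " ∧ ".join(n if s else f"¬({n})" for (n, _), s in zip(simple, signs))

        # Each simple predicate must evaluate to its chosen sign (p or ¬p).
        def pred(t, signs=signs):
            return all(p(t) == s for (_, p), s in zip(simple, signs))

        yield num, label, pred


def useful_minterms(simple, domain):
    """Numbers of minterms satisfied by at least one tuple in a
    (hypothetical) domain sample that encodes the application semantics."""
    return [num for num, _, pred in minterms(simple) if any(pred(t) for t in domain)]


# Semantics (1): Loc is always SA or SB.
domain = [{"A": a, "Loc": loc} for a in range(20) for loc in ("SA", "SB")]
print(useful_minterms(SIMPLE, domain))   # [2, 3, 6, 7, 10, 11]

# Semantics (2): Loc may take some other value (hypothetically "SC").
domain2 = [{"A": a, "Loc": loc} for a in range(20) for loc in ("SA", "SB", "SC")]
print(useful_minterms(SIMPLE, domain2))  # [2, 3, 4, 6, 7, 8, 10, 11, 12]
```

The surviving numbers reproduce the final fragments F2, F3, F6, F7, F10, F11, and under the second semantics also F4, F8, F12.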
Why does this work?
Predicates: p1 ∧ p2 ∧ p3 ∧ p4
            p1 ∧ p2 ∧ p3 ∧ ¬p4
            ...
            ¬p1 ∧ ¬p2 ∧ ¬p3 ∧ ¬p4
(1) Completeness: take t ∈ R
Each pi(t) must be T or F!
Say p1(t) = T, p2(t) = T, p3(t) = F, p4(t) = F
Then t is in the fragment with predicate p1 ∧ p2 ∧ ¬p3 ∧ ¬p4
(2) Disjointness
Say t ∈ fragment p1 ∧ p2 ∧ ¬p3 ∧ ¬p4
Then: p1(t) = T, p2(t) = T, p3(t) = F, p4(t) = F
Every other minterm differs in at least one literal, and t falsifies that literal
⇒ t cannot be in any other fragment!
Summary
• Given simple predicates Pr = { p1, p2, ..., pm }, the minterm predicates are
  $M = \{\, m \mid m = \bigwedge_{p_k \in Pr} p_k^*,\ 1 \le k \le m \,\}$
  where $p_k^*$ is $p_k$ or $\neg p_k$
• Fragments $\sigma_m(R)$ for all $m \in M$ are complete and disjoint
Another Desired Fragmentation Property: Match Access Patterns
Data frequently accessed together (e.g., data A, data B, data C) ⇒ try to place them in the same fragment
Return to example:
E(#, NM, LOC, SAL,…)
Common queries:
  Qa: select *
      from E
      where LOC=Sa
      and …
  Qb: select *
      from E
      where LOC=Sb
      and ...
Three choices:
(1) Pr = { }: F1 = { E }
(2) Pr = { LOC=Sa, LOC=Sb }:
    $F_2 = \{\, \sigma_{loc=Sa}(E),\ \sigma_{loc=Sb}(E) \,\}$
(3) Pr = { LOC=Sa, LOC=Sb, Sal<10 }:
    $F_3 = \{\, \sigma_{loc=Sa \wedge sal<10}(E),\ \sigma_{loc=Sa \wedge sal \ge 10}(E),\ \sigma_{loc=Sb \wedge sal<10}(E),\ \sigma_{loc=Sb \wedge sal \ge 10}(E) \,\}$
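In other words:
Qa: select … loc = Sa ...
Qb: select … loc = Sb ...
• F1: one fragment, E; Qa and Qb each read all of E, including tuples they do not need
• F2: Qa reads exactly the Loc=Sa fragment; Qb reads exactly the Loc=Sb fragment
• F3: fragments Loc=Sa ∧ sal < 10, Loc=Sa ∧ sal ≥ 10, Loc=Sb ∧ sal < 10, Loc=Sb ∧ sal ≥ 10; Qa and Qb each must access two fragments
⇒ F2 is good… (not F1, F3)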
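A small illustration of why F2 matches the access pattern, assuming a hypothetical four-tuple instance of E and taking Qa as just `LOC = Sa` (the elided “and …” conditions are ignored). `access_profile` is an illustrative helper that counts the fragments a query must read and the tuples it reads but does not need; it decides which fragments are touched from the data, a simplification of what an optimizer would do with the predicates.

```python
# Hypothetical instance of E(#, LOC, SAL) (NM omitted).
E = [
    {"#": 5, "LOC": "Sa", "SAL": 10},
    {"#": 7, "LOC": "Sb", "SAL": 25},
    {"#": 8, "LOC": "Sa", "SAL": 5},
    {"#": 9, "LOC": "Sb", "SAL": 3},
]

# Each fragmentation is a list of fragment-defining predicates.
FRAGMENTATIONS = {
    "F1": [lambda t: True],
    "F2": [lambda t: t["LOC"] == "Sa", lambda t: t["LOC"] == "Sb"],
    "F3": [
        lambda t: t["LOC"] == "Sa" and t["SAL"] < 10,
        lambda t: t["LOC"] == "Sa" and t["SAL"] >= 10,
        lambda t: t["LOC"] == "Sb" and t["SAL"] < 10,
        lambda t: t["LOC"] == "Sb" and t["SAL"] >= 10,
    ],
}


def access_profile(relation, fragments, query):
    """(fragments touched, irrelevant tuples read), assuming a fragment is
    read in full whenever it holds at least one qualifying tuple."""
    touched = [f for f in fragments if any(f(t) and query(t) for t in relation)]
    read = [t for f in touched for t in relation if f(t)]
    return len(touched), sum(1 for t in read if not query(t))


Qa = lambda t: t["LOC"] == "Sa"
Qb = lambda t: t["LOC"] == "Sb"
for name, frags in FRAGMENTATIONS.items():
    print(name, access_profile(E, frags, Qa))
# F1 (1, 2)
# F2 (1, 0)
# F3 (2, 0)
```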
Derived horizontal fragmentation
Example:
E(#, NM, SAL, LOC)
F = { E1, E2 } by LOC
J(#, DES, …)
Common query for project:
  [Given employee name, list projects (s)he works in]
E1 (at Sa):
#   NM     Loc  Sal
5   Joe    Sa   10
8   Tom    Sa   15
…
E2 (at Sb):
#   NM     Loc  Sal
7   Sally  Sb   25
12  Fred   Sb   15
…
J:
#   Description
5   work on 347 hw
7   go to moon
5   build table
12  rest
…
$J_1 = J \ltimes E_1$ (at Sa):
#   Des
5   work on 347 hw
5   build table
…
$J_2 = J \ltimes E_2$ (at Sb):
#   Des
7   go to moon
12  rest
…
Derived horizontal fragmentation
R, with fragmentation F = { F1, F2, ..., Fn } (F could be primary or derived)
⇒ S, with fragmentation D = { D1, D2, …, Dn }, where $D_i = S \ltimes F_i$
Convention: R is the owner; S is the member
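A sketch of derived horizontal fragmentation as a semijoin, $D_i = S \ltimes F_i$, using the E1, E2 and J tuples from the example (only the columns needed are kept). The helper names `semijoin` and `derive` are hypothetical.

```python
# Owner fragments E1 (at Sa), E2 (at Sb) and member relation J.
E1 = [{"#": 5, "NM": "Joe"}, {"#": 8, "NM": "Tom"}]
E2 = [{"#": 7, "NM": "Sally"}, {"#": 12, "NM": "Fred"}]
J = [
    {"#": 5, "Des": "work on 347 hw"},
    {"#": 7, "Des": "go to moon"},
    {"#": 5, "Des": "build table"},
    {"#": 12, "Des": "rest"},
]


def semijoin(member, owner_fragment, attr):
    """member ⋉ owner_fragment: member tuples whose join-attribute value
    appears in the owner fragment."""
    keys = {t[attr] for t in owner_fragment}
    return [s for s in member if s[attr] in keys]


def derive(member, owner_fragments, attr):
    """Derived horizontal fragmentation: D_i = S ⋉ F_i for each owner fragment."""
    return [semijoin(member, f, attr) for f in owner_fragments]


J1, J2 = derive(J, [E1, E2], "#")
print([t["Des"] for t in J1])  # ['work on 347 hw', 'build table']
print([t["Des"] for t in J2])  # ['go to moon', 'rest']
```

With J1 stored at Sa next to E1, the common query (given an employee name, list the projects) can be answered at a single site by joining E1 and J1 there.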
• Checking completeness and disjointness of derived fragmentation
Example: say J is:
#   Des
…
33  build chair
…
⇒ But there is no # = 33 in E1 or in E2!
This J tuple will be in neither J1 nor J2
⇒ Fragmentation is not complete
To get completeness:
Need to enforce the referential integrity constraint
  join attr (#) of member relation ⊆ join attr (#) of owner relation
Example:
E1:
#  NM    Loc  Sal
5  Joe   Sa   10
…
E2:
#  NM    Loc  Sal
5  Fred  Sb   20
…
J:
#  Description
5  day off
…
J1:
#  Description
5  day off
…
J2:
#  Description
5  day off
…
⇒ Fragmentation is not disjoint!
To get disjointness:
The join attribute (#) should be a key of the owner relation
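A sketch of both checks on an instance, assuming the same list-of-dicts representation (the name `check_derived` is illustrative). The first case reproduces the dangling # = 33 tuple (incomplete); the second reproduces # = 5 appearing in both owner fragments because # is not a key of the owner (not disjoint).

```python
def check_derived(member, owner_fragments, attr):
    """(complete, disjoint) for D_i = S ⋉ F_i on one instance.
    Incomplete: some member tuple matches no owner fragment
    (referential integrity violated). Not disjoint: some member tuple
    matches more than one owner fragment (join attribute not a key)."""
    complete, disjoint = True, True
    for s in member:
        matches = sum(1 for f in owner_fragments if any(r[attr] == s[attr] for r in f))
        if matches == 0:
            complete = False
        elif matches > 1:
            disjoint = False
    return complete, disjoint


E1 = [{"#": 5, "NM": "Joe"}, {"#": 8, "NM": "Tom"}]
E2 = [{"#": 7, "NM": "Sally"}, {"#": 12, "NM": "Fred"}]

# Case 1: J refers to # = 33, which is in neither E1 nor E2.
J_dangling = [{"#": 5, "Des": "build table"}, {"#": 33, "Des": "build chair"}]
print(check_derived(J_dangling, [E1, E2], "#"))  # (False, True)

# Case 2: # = 5 appears in both owner fragments.
E1_dup = [{"#": 5, "NM": "Joe"}]
E2_dup = [{"#": 5, "NM": "Fred"}]
J_shared = [{"#": 5, "Des": "day off"}]
print(check_derived(J_shared, [E1_dup, E2_dup], "#"))  # (True, False)
```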
Summary: horizontal fragmentation
• Type: primary, derived
• Properties: completeness, disjointness
Vertical fragmentation
Example:
E:
#  NM     Loc  Sal
5  Joe    Sa   10
7  Sally  Sb   25
8  Fred   Sa   15
…
E1:
#  NM     Loc
5  Joe    Sa
7  Sally  Sb
8  Fred   Sa
…
E2:
#  Sal
5  10
7  25
8  15
…
R[T] ⇒ R1[T1], ..., Rn[Tn], where Ti ⊆ T
⇒ Just like normalization of relations
Properties: R[T] ⇒ Ri[Ti]
(1) Completeness
  $\bigcup_{\text{all } i} T_i = T$
(2) Disjointness
  $T_i \cap T_j = \emptyset$ for all $i, j$ with $i \ne j$
  e.g., E(#,LOC,SAL) ⇒ E1(#,LOC), E2(SAL)
Not a desirable property!! (We could not reconstruct R!)
(3) Lossless join
  $\bowtie_{\text{all } i} R_i = R$
⇒ One way to achieve lossless join: repeat the key in all fragments, i.e.,
  $Key \subseteq T_i$ for all $i$
⇒ How do we decide which attributes are grouped with which?
Example: E(#,NM,LOC,SAL)
  E1(#,NM,LOC), E2(#,SAL)?
  or E1(#,NM), E2(#,LOC), E3(#,SAL)?
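A sketch of vertical fragmentation with projection and natural join, assuming set semantics and the E tuples from the vertical-fragmentation example (the helper names `project` and `natural_join` are hypothetical). Repeating the key # in both fragments makes the join lossless; with disjoint attribute sets the join degenerates into a cross product.

```python
E = [
    {"#": 5, "NM": "Joe", "LOC": "Sa", "SAL": 10},
    {"#": 7, "NM": "Sally", "LOC": "Sb", "SAL": 25},
    {"#": 8, "NM": "Fred", "LOC": "Sa", "SAL": 15},
]


def project(relation, attrs):
    """Projection R[T] with duplicates removed (set semantics), order kept."""
    seen, out = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out


def natural_join(r, s):
    """Natural join on the attributes r and s have in common."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [{**a, **b} for a in r for b in s if all(a[c] == b[c] for c in common)]


# Key # repeated in every fragment: the join reconstructs E.
E1 = project(E, ["#", "NM", "LOC"])
E2 = project(E, ["#", "SAL"])
print(natural_join(E1, E2) == E)  # True

# Disjoint attribute sets, no shared key: the join is a cross product.
E1_bad = project(E, ["#", "LOC"])
E2_bad = project(E, ["SAL"])
print(len(natural_join(E1_bad, E2_bad)))  # 9 tuples instead of 3
```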
Attribute affinity matrix
      A1   A2   A3   A4   A5
A1    -    -    -    -    -
A2    50   -    -    -    -
A3    45   48   -    -    -
A4    1    2    0    -    -
A5    0    0    4    75   -
⇒ R1[K, A1, A2, A3]
   R2[K, A4, A5]
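The notes leave the clustering algorithm to the textbook. As a much simpler stand-in (not the textbook's method), the sketch below groups attributes whose pairwise affinity is at least a chosen threshold, using union-find over the matrix values above; the threshold of 10 is an arbitrary assumption, and K denotes the key repeated in every fragment.

```python
# Affinity values from the matrix above (symmetric; each pair listed once).
AFFINITY = {
    ("A1", "A2"): 50, ("A1", "A3"): 45, ("A2", "A3"): 48,
    ("A1", "A4"): 1, ("A2", "A4"): 2, ("A3", "A4"): 0,
    ("A1", "A5"): 0, ("A2", "A5"): 0, ("A3", "A5"): 4,
    ("A4", "A5"): 75,
}


def cluster(attrs, affinity, threshold):
    """Group attributes linked by affinity >= threshold (union-find).
    A simple stand-in, NOT the textbook's clustering algorithm."""
    parent = {a: a for a in attrs}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for (a, b), value in affinity.items():
        if value >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for a in attrs:
        groups.setdefault(find(a), []).append(a)
    return list(groups.values())


attrs = ["A1", "A2", "A3", "A4", "A5"]
for group in cluster(attrs, AFFINITY, threshold=10):
    print(f"R[K, {', '.join(group)}]")
# R[K, A1, A2, A3]
# R[K, A4, A5]
```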
• The textbook (Ozsu & Valduriez) discusses
  – how to build the affinity matrix
  – how to identify attribute clusters
  – how to partition the relation
• You are not responsible for
  – clustering and partitioning algorithms (i.e., skip pages 135-145)
Allocation
Example: E(#,NM,LOC,SAL) ⇒
  $F_1 = \sigma_{loc=Sa}(E)$; $F_2 = \sigma_{loc=Sb}(E)$
  Qa: select … where loc=Sa...
  Qb: select … where loc=Sb…
Where do F1, F2 go? Site a? Site b?
Issues
• Where do queries originate?
• What is the communication cost? And the size of answers, relations, …?
• What are the storage capacity and cost at the sites? And the size of fragments?
• What is the processing power at the sites?
More Issues
• What is the query processing strategy?
  – How are joins done?
  – Where are answers collected?
Do we replicate fragments?
• Cost of updating copies?
• Writes and concurrency control?
• ...
Optimization problem:
• What is the best placement of fragments and/or the best number of copies to:
  – minimize query response time
  – maximize throughput
  – minimize “some cost”
  – ...
• Subject to constraints?
  – Available storage
  – Available bandwidth, power, …
  – Keep 90% of response time below X
  – ...
⇒ This is an incredibly hard problem
Example: single fragment F
Read cost:
  $\sum_{i=1}^{m} \left[ t_i \cdot \min_{j} C_{ij} \right]$
  i: originating site of request
  $t_i$: read traffic at $S_i$
  $C_{ij}$: retrieval cost of accessing fragment F at $S_j$ from $S_i$ ($C_{ij} = \infty$ if F is not stored at $S_j$)
Scenario: read cost
[Figure: site i receives a stream of read requests for F at t_i req/sec; the copies of F at sites 1, 2, 3 are reached at costs c_{i,1}, c_{i,2}, c_{i,3}; sites that do not store F have C = inf]
Write cost:
  $\sum_{i=1}^{m} \sum_{j=1}^{m} X_j \, u_i \, C'_{ij}$
  i: originating site of request
  j: site being updated
  $X_j$: 0 if F not stored at $S_j$; 1 if F stored at $S_j$
  $u_i$: write traffic at $S_i$
  $C'_{ij}$: write cost of updating F at $S_j$ from $S_i$
Scenario: write cost
[Figure: site i issues u_i updates/sec to F; each update goes to every site that stores a copy of F]
Storage cost:
  $\sum_{i=1}^{m} X_i d_i$
  $X_i$: 0 if F not stored at $S_i$; 1 if F stored at $S_i$
  $d_i$: storage cost at $S_i$
Target function:
  $\min \left\{ \sum_{i=1}^{m} \left[ t_i \min_{j} C_{ij} + \sum_{j=1}^{m} X_j \, u_i \, C'_{ij} \right] + \sum_{i=1}^{m} X_i d_i \right\}$
  (minimized over the placement $X_1, \dots, X_m$)
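A brute-force sketch of the single-fragment target function, assuming the cost inputs are given as Python lists (all numbers in the example are made up for illustration). C[i][j] is treated as infinite for sites that do not hold F, matching the C = inf convention in the read-cost scenario. Exhaustive search over all 2^m - 1 non-empty placements is only practical for a few sites.

```python
from itertools import product

INF = float("inf")


def total_cost(X, t, u, C, Cw, d):
    """Target function for one fragment F under placement X
    (X[j] = 1 if F is stored at site S_j, else 0).
    Read:    sum_i t_i * min_j C[i][j], with C = inf where F is absent
    Write:   sum_i sum_j X_j * u_i * Cw[i][j]
    Storage: sum_i X_i * d_i"""
    m = len(X)
    read = sum(t[i] * min((C[i][j] if X[j] else INF) for j in range(m)) for i in range(m))
    write = sum(X[j] * u[i] * Cw[i][j] for i in range(m) for j in range(m))
    storage = sum(X[i] * d[i] for i in range(m))
    return read + write + storage


def best_placement(t, u, C, Cw, d):
    """Exhaustive search over every non-empty placement."""
    candidates = [X for X in product([0, 1], repeat=len(t)) if any(X)]
    return min(candidates, key=lambda X: total_cost(X, t, u, C, Cw, d))


# Two sites, made-up costs.
C = [[0, 5], [5, 0]]   # read cost from S_i to a copy at S_j
Cw = [[1, 5], [5, 1]]  # write cost from S_i to a copy at S_j
d = [2, 2]             # storage cost per site
t_read = [10, 10]      # reads/sec per site
u_light, u_heavy = [1, 1], [100, 100]  # writes/sec per site

print(best_placement(t_read, u_light, C, Cw, d))          # (1, 1)
print(total_cost((1, 1), t_read, u_heavy, C, Cw, d))      # 1204
```

With these numbers, light write traffic favors a copy at both sites, while heavy write traffic makes a single copy (cost 652) cheaper than two copies.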
Can add more complications. Examples:
- Multiple fragments
- Fragment sizes
- Concurrency control cost
Case Study: PNUTS
• “Where in the World is My Data?” Sudarshan Kadambi, Jianjun Chen, Brian F. Cooper, David Lomax, Raghu Ramakrishnan, Adam Silberstein, Erwin Tam, Hector Garcia-Molina; VLDB 2011
• Distributed object/tuple store for Yahoo!
Case Study: PNUTS
• Issue: where to locate data
• Issue: what and where to replicate
PNUTS Discussion
• Dynamic vs. static fragment placement
• Caching vs. replication
Policy Constraints
• MIN_COPIES: the minimum number of full replicas of the record that must exist.
• INCL_LIST: an inclusion list, i.e., the locations where a full replica of the record must exist.
• EXCL_LIST: an exclusion list, i.e., the locations where a full replica of the record cannot exist.
Example Rule
• Rule 1:
  IF TABLE_NAME = "Users"
  THEN
    SET 'MIN_COPIES' = 2
    CONSTRAINT_PRI = 0
Another Example Rule
• Rule 2:
  IF TABLE_NAME = "Users" AND
     FIELD STR('home location') = 'France'
  THEN
    SET 'MIN_COPIES' = 3 AND
    SET 'EXCL_LIST' = 'USWest, USEast'
    CONSTRAINT_PRI = 1
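A hypothetical encoding of Rules 1 and 2 and the policy constraints, to show how they could restrict where replicas go. It is not PNUTS code: the rule representation, the default MIN_COPIES of 1, the site names other than USWest and USEast, and the assumption that a rule with a higher CONSTRAINT_PRI overrides a lower one are all illustrative choices.

```python
# Hypothetical encoding of Rules 1 and 2 (not the PNUTS rule engine).
RULES = [
    {"pri": 0,
     "when": lambda table, rec: table == "Users",
     "set": {"MIN_COPIES": 2}},
    {"pri": 1,
     "when": lambda table, rec: table == "Users" and rec.get("home location") == "France",
     "set": {"MIN_COPIES": 3, "EXCL_LIST": {"USWest", "USEast"}}},
]


def effective_constraints(table, record, rules):
    """Merge the settings of all matching rules. ASSUMPTION: rules are
    applied in increasing priority, so higher CONSTRAINT_PRI wins."""
    merged = {"MIN_COPIES": 1, "INCL_LIST": set(), "EXCL_LIST": set()}  # assumed defaults
    for rule in sorted(rules, key=lambda r: r["pri"]):
        if rule["when"](table, record):
            merged.update(rule["set"])
    return merged


def placement_ok(replica_sites, c):
    """Check a set of full-replica locations against the constraints."""
    sites = set(replica_sites)
    return (len(sites) >= c["MIN_COPIES"]
            and c["INCL_LIST"] <= sites
            and not (c["EXCL_LIST"] & sites))


c = effective_constraints("Users", {"home location": "France"}, RULES)
print(c["MIN_COPIES"])                                  # 3
print(placement_ok({"EU", "Asia", "USWest"}, c))        # False (USWest excluded)
print(placement_ok({"EU", "Asia", "SouthAmerica"}, c))  # True
```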
Summary
• Description of fragmentation
• Good fragmentations
• Design of fragmentation
• Allocation