Chapter 3 Distributed Database Design

Table of Contents
z Alternative Design Strategies
z Distribution Design Issues
z Fragmentation
z Allocation
Chapter 3 - 1
1. Alternative Design Strategies
z Two major strategies
✔ Top-down approaches
✔ Bottom-up approaches
1.1 Top-Down Design Process
Figure: Top-down design process
   Requirement Analysis → System Requirements (Objectives)
   → Conceptual Design and View Design (with user input), combined through view integration
   → Global Conceptual Schema, Access Information, External Schema Definitions
   → Distribution Design (with user input) → Local Conceptual Schemas
   → Physical Design → Physical Schema
   → Observation and Monitoring, with feedback to the earlier steps
Details of Design Process
z Requirement Analysis
✔Defines the environment of the system
✔Elicits both the data and processing needs of all potential
DB users
z System Requirements
✔Where is the final system expected to stand?
✔Performance, Reliability, Availability, Economics,
Flexibility
Details of Design Process (Cont’d)
z Conceptual Design
✔Determines entity types and relationships among these
entities
✔Entity analysis:
– determines the entities, attributes, and relationships
✔Functional analysis:
– determines the fundamental functions with which the
modeled enterprise is involved
✔The process is identical to that of centralized database design
Details of Design Process (Cont’d)
z View Design
✔Defines the interfaces for end users
✔The conceptual schema can be interpreted as an integration of the user views.
z Distribution Design
✔Designs the local conceptual schema by distributing the
entities over the sites of the distributed system
✔Consists of two steps: fragmentation and allocation
1.2 Bottom-Up Design Process
z Top-Down Approach:
   suitable when a system is being designed from scratch
z Bottom-Up Approach:
   suitable when many DBs already exist and the design task involves integrating them into one DB
→ The bottom-up design process integrates local schemas into the global conceptual schema.
→ Schema translation and schema integration
→ Typically arises in the context of heterogeneous databases!
2. Distribution Design Issues
z Why fragment at all?
z How should we fragment?
z How much should we fragment?
z Is there any way to test the correctness of the decomposition?
z How should we allocate?
z What information is necessary for fragmentation and allocation?
2.1 Reasons for Fragmentation
z A relation is not an appropriate unit of distribution.
✔Application views are usually subsets of relations.
✔Distributing whole relations causes an unnecessarily high volume of remote data access or unnecessary replication.
✔Whole relations do not support intra-query concurrency.
→ Decompose a relation into fragments.
z Disadvantages of fragmentation
✔Applications defined on more than one fragment: performance degradation caused by union or join
✔Semantic data control: integrity checking is very difficult
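Sample database: relations E(ENO, ENAME, TITLE), G(ENO, JNO, RESP, DUR), J(JNO, JNAME, BUDGET, LOC), S(TITLE, SAL)
E:   ENO  ENAME      TITLE
     E1   J. Doe     Elect. Eng.
     E2   M. Smith   Syst. Anal.
     E3   A. Lee     Mech. Eng.
     E4   J. Miller  Programmer
     E5   B. Casey   Syst. Anal.
     E6   L. Chu     Elect. Eng.
     E7   R. Davis   Mech. Eng.
     E8   J. Jones   Syst. Anal.
G:   ten assignment tuples (ENO, JNO, RESP, DUR); their RESP values are, in order:
     Manager, Analyst, Analyst, Consultant, Engineer, Programmer, Manager, Manager, Engineer, Manager
J:   JNO  JNAME              LOC        (the relation also has BUDGET)
     J1   Instrumentation    Montreal
     J2   Database Develop.  New York
     J3   CAD/CAM            New York
     J4   Maintenance        Paris
S:   TITLE values: Elect. Eng., Syst. Anal., Mech. Eng., Programmer (each with its SAL)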
2.2 Fragmentation Alternatives
z Horizontal Fragmentation or Vertical Fragmentation
J1:  JNO  JNAME              LOC
     J1   Instrumentation    Montreal
     J2   Database Develop.  New York
J2:  JNO  JNAME              LOC
     J3   CAD/CAM            New York
     J4   Maintenance        Paris
(Both fragments keep all four attributes JNO, JNAME, BUDGET, LOC.)
Example of Horizontal Partitioning
J1:  JNO, BUDGET (one tuple for each of the projects J1 to J4)
J2:  JNO  JNAME              LOC
     J1   Instrumentation    Montreal
     J2   Database Develop.  New York
     J3   CAD/CAM            New York
     J4   Maintenance        Paris
Example of Vertical Partitioning
2.3 Degree of Fragmentation
z No fragmentation at all: the whole relation is the unit of distribution
z Fragmentation down to the level of individual tuples (horizontal) or individual attributes (vertical)
z What is a suitable level of fragmentation?
✔Such a level can only be defined with respect to the applications that run on the database.
→ Individual fragments are identified according to the values of application-specific parameters.
2.4 Correctness Rules of Fragmentation
z Completeness
✔If a relation instance R is decomposed into fragments R1, R2, . . ., Rn, each data item that can be found in R can also be found in one or more of the Ri's.
z Reconstruction
✔If a relation instance R is decomposed into fragments R1, R2, . . ., Rn, it should be possible to define a relational operator ∇ such that
   R = ∇Ri, ∀Ri ∈ FR
z Disjointness
✔If a relation instance R is decomposed into fragments R1, R2, . . ., Rn, and data item di is in Rj, then di is not in any other fragment Rk (k ≠ j).
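The three rules can be checked mechanically on a small instance. The following Python sketch is illustrative only: it assumes a relation is a list of tuples, that horizontal fragments are produced by selection predicates, and that ∇ is set union; the helper names (fragment, is_complete, reconstructs, is_disjoint) are hypothetical, and the J tuples reuse only the JNO, JNAME and LOC values of the sample relation.

```python
# Minimal sketch: checking completeness, reconstruction (union) and
# disjointness of a horizontal fragmentation on a toy relation.
J = [
    ("J1", "Instrumentation", "Montreal"),
    ("J2", "Database Develop.", "New York"),
    ("J3", "CAD/CAM", "New York"),
    ("J4", "Maintenance", "Paris"),
]

def fragment(rel, predicates):
    """Horizontal fragmentation: one selection per predicate."""
    return [[t for t in rel if p(t)] for p in predicates]

def is_complete(rel, frags):
    """Every tuple of R appears in at least one fragment."""
    return all(any(t in f for f in frags) for t in rel)

def reconstructs(rel, frags):
    """For horizontal fragments the reconstruction operator is union."""
    union = set()
    for f in frags:
        union |= set(f)
    return union == set(rel)

def is_disjoint(frags):
    """No tuple appears in more than one fragment."""
    seen = set()
    for f in frags:
        for t in f:
            if t in seen:
                return False
            seen.add(t)
    return True

by_loc = fragment(J, [lambda t, c=c: t[2] == c
                      for c in ("Montreal", "New York", "Paris")])
print(is_complete(J, by_loc), reconstructs(J, by_loc), is_disjoint(by_loc))
# True True True

overlapping = fragment(J, [lambda t: t[2] != "Paris",
                           lambda t: t[2] == "New York"])
print(is_complete(J, overlapping), is_disjoint(overlapping))
# False False
```

The second fragmentation loses J4 (incomplete) and places J2 and J3 in both fragments (not disjoint).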
2.5 Allocation Alternatives
z Comparison of Replication Alternatives
                        Full replication       Partial replication   Partitioning
Query processing        Easy                   Difficult             Difficult
Directory management    Easy or nonexistent    Difficult             Difficult
Concurrency control     Moderate               Difficult             Easy
Reliability             Very high              High                  Low
Reality                 Possible application   Realistic             Possible application
2.6 Information Requirements
z Database Information
z Application Information
z Communication Network Information
z Computer System Information
3. Fragmentation
z Design of Horizontal Fragmentation
z Design of Vertical Fragmentation
z Design of Hybrid Fragmentation
3.1 Horizontal Fragmentation
z Primary horizontal fragmentation
z Derived horizontal fragmentation
z Information requirements of horizontal
fragmentation
✔Database Information
✔Application Information
Database Information
z Concerns the global conceptual schema
✔How are the DB relations connected to one another, especially through joins?
✔Relationships among relations are expressed using links
z Example
→ Each link goes from an owner relation to a member relation.
Join relationships (links):
   S(TITLE, SAL) --L1--> E(ENO, ENAME, TITLE)
   E(ENO, ENAME, TITLE) --L2--> G(ENO, JNO, RESP, DUR)
   J(JNO, JNAME, BUDGET, LOC) --L3--> G(ENO, JNO, RESP, DUR)
Application Information
z Predicates used in user queries
✔The most active 20% of user queries account for 80% of the total data access, so it is enough to analyze those queries.
z Simple Predicate
✔pj : Aj θ Value, θ ∈ {=, <, ≠, ≤, >, ≥}
✔Pri : set of all simple predicates defined on relation Ri
z Minterm Predicates
✔mij : a conjunction of simple predicates
✔Mi : the set of minterm predicates for relation Ri
   Mi = {mij | mij = ∧ pik*}, 1 ≤ k ≤ m, 1 ≤ j ≤ z
   where pik ∈ Pri and pik* = pik or pik* = ¬pik
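Because each simple predicate appears in a minterm either as pik or as ¬pik, m simple predicates give 2^m candidate minterms. A minimal Python sketch of that enumeration, assuming predicates are plain strings (the function name minterms is hypothetical):

```python
from itertools import product

def minterms(simple_predicates):
    """Return every minterm over the given simple predicates.

    Each predicate appears either in natural form (p) or negated (NOT p),
    so n predicates yield 2**n conjunctions.
    """
    result = []
    for signs in product([True, False], repeat=len(simple_predicates)):
        literals = [p if keep else f"NOT ({p})"
                    for p, keep in zip(simple_predicates, signs)]
        result.append(" AND ".join(literals))
    return result

pr_s = ["TITLE = 'Elect. Eng.'", "SAL <= 30000"]
for m in minterms(pr_s):
    print(m)
# TITLE = 'Elect. Eng.' AND SAL <= 30000
# TITLE = 'Elect. Eng.' AND NOT (SAL <= 30000)
# NOT (TITLE = 'Elect. Eng.') AND SAL <= 30000
# NOT (TITLE = 'Elect. Eng.') AND NOT (SAL <= 30000)
```

Writing NOT (SAL <= 30000) as SAL > 30000 gives the form used for the minterms of relation S.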
Example: Consider Relation ‘S’
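   p1: TITLE = "Elect. Eng."     m1: (TITLE = "Elect. Eng.") ∧ (SAL ≤ 30000)
   p2: TITLE = "Syst. Anal."     m2: (TITLE = "Elect. Eng.") ∧ (SAL > 30000)
   p3: TITLE = "Mech. Eng."      m3: ¬(TITLE = "Elect. Eng.") ∧ (SAL ≤ 30000)
   p4: TITLE = "Programmer"      m4: ¬(TITLE = "Elect. Eng.") ∧ (SAL > 30000)
   p5: SAL ≤ 30000               m5: (TITLE = "Programmer") ∧ (SAL ≤ 30000)
   p6: SAL > 30000               m6: (TITLE = "Programmer") ∧ (SAL > 30000)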
Application Information (Cont’d)
z Quantitative Information
✔Minterm selectivity: sel(mi)
– Number of tuples of the relation that would be accessed by a user query specified according to a given minterm predicate mi
✔Access frequency: acc(qi)
– Frequency with which user applications access data
Primary Horizontal Fragmentation
z Definition
✔A fragmentation generated by a selection operation on
the owner relation of a database schema
✔Given relation Ri, its horizontal fragments are
Rij = σFj(Ri), 1 ≤ j ≤ w, where Fj is the selection formula, a minterm predicate mij
z Example: sample relation J
   J1 = σLOC = "Montreal"(J)
   J2 = σLOC = "New York"(J)
   J3 = σLOC = "Paris"(J)
Example: Minterm Fragments
J1:  JNO  JNAME              LOC
     J1   Instrumentation    Montreal
J2:  JNO  JNAME              LOC
     J2   Database Develop.  New York
     J3   CAD/CAM            New York
J3:  JNO  JNAME              LOC
     J4   Maintenance        Paris
(Each fragment also keeps the BUDGET attribute.)
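Properties of Simple Predicates
z Completeness
✔If every tuple of each fragment produced by the simple predicates in Pr is accessed with the same probability by every application, Pr is complete!
✔Example
– Pr = {LOC = "Montreal", LOC = "New York", LOC = "Paris"}
– Pr' = Pr ∪ {BUDGET ≤ 200000, BUDGET > 200000}
z Minimality
✔If a predicate of Pr splits a fragment F into F1 and F2, there must be at least one application that accesses F1 and F2 differently; that is, every predicate in Pr must be relevant.
✔Example
– Pr'' = Pr' ∪ {JNAME = "Instrumentation"} is not minimal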
Algorithm COM_MIN
   input: R: relation; Pr: set of simple predicates
   output: Pr': set of simple predicates
   declare F: set of minterm fragments
   begin
      find a pi ∈ Pr such that pi partitions R according to P_Rule
      Pr' ← pi
      Pr ← Pr – pi
      F ← fi   {fi is the minterm fragment according to pi}
      do
      begin
         find a pj ∈ Pr such that pj partitions some fk of Pr' according to P_Rule
         Pr' ← Pr' ∪ pj
         Pr ← Pr – pj
         F ← F ∪ fj
      end-begin
      until Pr' is complete
   end {COM_MIN}
P_Rule : fundamental rule of completeness and minimality, which states
that a fragment is partitioned “into at least two parts which are accessed
differently by at least one application.”
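COM_MIN can be prototyped on a toy instance. The Python sketch below is a simplification under stated assumptions, not the slides' algorithm verbatim: each application is modelled as a predicate telling which tuples it accesses, "accessed differently" is read as "the application touches one part but not the other", and Pr' counts as complete when every application accesses either all or none of each minterm fragment. All function names and the three per-site applications are hypothetical.

```python
# Minimal COM_MIN sketch under the assumptions stated above.

def minterm_fragments(rows, preds):
    """Group row indices by their truth vector over the chosen predicates."""
    groups = {}
    for i, row in enumerate(rows):
        key = tuple(fn(row) for _, fn in preds)
        groups.setdefault(key, set()).add(i)
    return list(groups.values())

def touches(app, rows, part):
    return any(app(rows[i]) for i in part)

def satisfies_p_rule(rows, chosen, pred, apps):
    """pred splits some current fragment into two non-empty parts that at
    least one application accesses differently."""
    _, fn = pred
    for frag in minterm_fragments(rows, chosen):
        f1 = {i for i in frag if fn(rows[i])}
        f2 = frag - f1
        if f1 and f2 and any(touches(a, rows, f1) != touches(a, rows, f2)
                             for a in apps):
            return True
    return False

def is_complete(rows, chosen, apps):
    """Every application accesses either all or none of each fragment."""
    for frag in minterm_fragments(rows, chosen):
        for app in apps:
            hit = {i for i in frag if app(rows[i])}
            if hit and hit != frag:
                return False
    return True

def com_min(rows, pr, apps):
    pr, chosen = list(pr), []
    while not is_complete(rows, chosen, apps):
        for pred in pr:
            if satisfies_p_rule(rows, chosen, pred, apps):
                chosen.append(pred)   # Pr' <- Pr' U pj
                pr.remove(pred)       # Pr  <- Pr - pj
                break
        else:
            break                     # no remaining predicate satisfies P_Rule
    return chosen

J = [{"JNO": "J1", "JNAME": "Instrumentation", "LOC": "Montreal"},
     {"JNO": "J2", "JNAME": "Database Develop.", "LOC": "New York"},
     {"JNO": "J3", "JNAME": "CAD/CAM", "LOC": "New York"},
     {"JNO": "J4", "JNAME": "Maintenance", "LOC": "Paris"}]

def loc_is(city):
    return (f"LOC = {city}", lambda r: r["LOC"] == city)

pr = [loc_is("Montreal"), loc_is("New York"), loc_is("Paris"),
      ("JNAME = Instrumentation", lambda r: r["JNAME"] == "Instrumentation")]
# Hypothetical applications: one per site, each reading only its local projects.
apps = [lambda r, c=c: r["LOC"] == c for c in ("Montreal", "New York", "Paris")]
print([name for name, _ in com_min(J, pr, apps)])
# ['LOC = Montreal', 'LOC = New York']
```

On this four-tuple instance the Paris predicate is never needed, because ¬(LOC = "Montreal") ∧ ¬(LOC = "New York") already isolates J4, and the JNAME predicate is never selected since no application distinguishes Instrumentation from the other Montreal tuples.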
Algorithm PHORIZONTAL
   input: Ri: relation; Pri: set of simple predicates
   output: Mi: set of minterm fragments
   begin
      Pri' ← COM_MIN(Ri, Pri)
      determine the set Mi of minterm predicates
      determine the set Ii of implications among pi ∈ Pri'
      for each mi ∈ Mi do
         if mi is contradictory according to Ii then
            Mi ← Mi – mi
         end-if
      end-for
   end {PHORIZONTAL}
Example:
   p1: att = value_1
   p2: att = value_2
   I:
      i1: (att = value_1) ⇒ ¬(att = value_2)
      i2: (att = value_2) ⇒ ¬(att = value_1)
   M:
      m1: (att = value_1) ∧ (att = value_2)      ← contradictory by I (violates both i1 and i2)
      m2: (att = value_1) ∧ ¬(att = value_2)
      m3: ¬(att = value_1) ∧ (att = value_2)
      m4: ¬(att = value_1) ∧ ¬(att = value_2)
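The elimination step of PHORIZONTAL can be sketched in Python as follows. This is a simplified sketch, assuming Pr' has already been produced by COM_MIN and that the implications I are supplied by hand as pairs of literals; the names phorizontal and show are hypothetical. With the two complementary salary predicates of relation S it keeps exactly two minterms, matching m2 and m3 of the S example.

```python
from itertools import product

def phorizontal(pred_names, implications):
    """Enumerate minterms and drop those contradicted by an implication.

    A minterm is a tuple of booleans: True keeps p_i, False negates it.
    An implication ((i, vi), (j, vj)) reads: if p_i has truth value vi,
    then p_j must have truth value vj.
    """
    kept = []
    for m in product([True, False], repeat=len(pred_names)):
        contradictory = any(m[i] == vi and m[j] != vj
                            for (i, vi), (j, vj) in implications)
        if not contradictory:
            kept.append(m)
    return kept

def show(pred_names, m):
    return " AND ".join(p if v else f"NOT ({p})" for p, v in zip(pred_names, m))

names = ["SAL <= 30000", "SAL > 30000"]
# p1 => NOT p2, and NOT p1 => p2 (the two predicates are complementary)
I = [((0, True), (1, False)), ((0, False), (1, True))]
for m in phorizontal(names, I):
    print(show(names, m))
# SAL <= 30000 AND NOT (SAL > 30000)
# NOT (SAL <= 30000) AND SAL > 30000
```

For the att example above, the implications i1 and i2 remove only m1, so three minterms survive.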
Example
z S, J: subjects of primary horizontal fragmentation
z Assumptions for S
✔There is only one application that accesses S.
✔That application checks the salary information.
✔Queries for S are issued at two sites.
z Simple Predicates
   p1: SAL ≤ 30000
   p2: SAL > 30000
   ⇒ Pr = {p1, p2}: complete and minimal by COM_MIN
Example (Cont’d)
z Minterm Predicates
   m1: (SAL ≤ 30000) ∧ (SAL > 30000)
   m2: (SAL ≤ 30000) ∧ ¬(SAL > 30000)
   m3: ¬(SAL ≤ 30000) ∧ (SAL > 30000)
   m4: ¬(SAL ≤ 30000) ∧ ¬(SAL > 30000)
   ⇒ m1 and m4 are contradictory: M = {m2, m3}
Therefore, we define two fragments, FS = {S1, S2}, according to M.
S1:  TITLE        (SAL ≤ 30000)
     Mech. Eng.
     Programmer
S2:  TITLE        (SAL > 30000)
     Elect. Eng.
     Syst. Anal.
Example (Cont’d)
z Assumptions for J
✔There are two applications that access J.
✔The first is issued at three sites and finds the names and budgets of projects given their number.
✔The second is issued at two sites and has to do with the management of the projects.
z Simple Predicates
   p1: LOC = "Montreal"
   p2: LOC = "New York"
   p3: LOC = "Paris"
   p4: BUDGET ≤ 200000
   p5: BUDGET > 200000
   ⇒ Pr' = {p1, p2, p3, p4, p5} is complete and minimal (COM_MIN)
Example (Cont’d)
z Minterm Predicates
   m1: (LOC = "Montreal") ∧ (BUDGET ≤ 200000)
   m2: (LOC = "Montreal") ∧ (BUDGET > 200000)
   m3: (LOC = "New York") ∧ (BUDGET ≤ 200000)
   m4: (LOC = "New York") ∧ (BUDGET > 200000)
   m5: (LOC = "Paris") ∧ (BUDGET ≤ 200000)
   m6: (LOC = "Paris") ∧ (BUDGET > 200000)
Therefore, we define six fragments, FJ = {J1, J2, J3, J4, J5, J6}, according to M.
Derived Horizontal Fragmentation
z Definition
✔Defined on a member relation of a link according to a selection operation specified on its owner.
✔Given a link L with owner(L) = S and member(L) = R:
   – Ri = R semi_join Si, 1 ≤ i ≤ w
   – w: number of fragments that will be defined on R
   – Si: a primary horizontal fragment of S
z Example
   L1: owner(L1) = S and member(L1) = E
   E1 = E semi_join S1, where S1 = σSAL ≤ 30000(S)
   E2 = E semi_join S2, where S2 = σSAL > 30000(S)
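A semijoin keeps the member tuples whose join attribute matches some owner tuple, so the example can be sketched in a few lines of Python. This is a minimal sketch, assuming the owner fragments S1 and S2 are represented only by their TITLE values as listed in the S example (salary values are not reproduced), and that E is the sample relation E; semi_join and title are hypothetical helper names.

```python
# Member relation E(ENO, ENAME, TITLE) from the sample database.
E = [
    ("E1", "J. Doe", "Elect. Eng."),
    ("E2", "M. Smith", "Syst. Anal."),
    ("E3", "A. Lee", "Mech. Eng."),
    ("E4", "J. Miller", "Programmer"),
    ("E5", "B. Casey", "Syst. Anal."),
    ("E6", "L. Chu", "Elect. Eng."),
    ("E7", "R. Davis", "Mech. Eng."),
    ("E8", "J. Jones", "Syst. Anal."),
]

# Owner fragments of S, represented only by their join attribute (TITLE).
S1_TITLES = {"Mech. Eng.", "Programmer"}     # S1 = sigma SAL <= 30000 (S)
S2_TITLES = {"Elect. Eng.", "Syst. Anal."}   # S2 = sigma SAL > 30000 (S)

def semi_join(member, owner_keys, key):
    """R semi_join Si: keep member tuples whose join value occurs in Si."""
    return [t for t in member if key(t) in owner_keys]

def title(t):
    return t[2]

E1 = semi_join(E, S1_TITLES, title)
E2 = semi_join(E, S2_TITLES, title)
print([t[0] for t in E1])   # ['E3', 'E4', 'E7']
print([t[0] for t in E2])   # ['E1', 'E2', 'E5', 'E6', 'E8']
```

Because every employee's title occurs in exactly one of S1 and S2, the derived fragments E1 and E2 are complete and disjoint here.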
Potential Complication
z Problem
✔ When there are two or more links into a relation, there is more than one possible derived horizontal fragmentation of the relation.
z Two criteria for the choice
✔ The fragmentation used in more applications
✔ The fragmentation with better join characteristics
z Recall the advantages of fragmentation
✔ Performing a query on smaller relations
✔ Performing joins in a distributed fashion
z Simple Graph
✔ A join graph in which only one link comes into or goes out of each fragment
✔ Affects both storage and join performance!
Example: Fragmentation of G
z Assumptions
✔ There are two applications that access G.
✔ One finds the names of engineers who work at certain places.
✔ The other accesses the projects that employees work on and how long they will work on those projects.
z The first fragmentation, according to J1, J2, J3:
✔ G1 = G semi_join J1, where J1 = σLOC = "Montreal"(J)
✔ G2 = G semi_join J2, where J2 = σLOC = "New York"(J)
✔ G3 = G semi_join J3, where J3 = σLOC = "Paris"(J)
z The second fragmentation, according to E1, E2:
✔ G1 = G semi_join E1
✔ G2 = G semi_join E2
The final choice of the fragmentation scheme may be a decision problem addressed during allocation.
Checking for Correctness
z Completeness
✔PHF: guaranteed by a set of complete and minimal predicates, Pr'
✔DHF: ensured by referential integrity
z Reconstruction
✔For a relation R with fragments FR = {R1, R2, …, Rw}:
   R = ∪ Ri, ∀Ri ∈ FR
z Disjointness
✔PHF: the minterm predicates determining the fragmentation are mutually exclusive
✔DHF: disjointness can be guaranteed if the join graph is simple; otherwise, the actual tuple values must be investigated
3.2 Vertical Fragmentation
z Definition
Partitions R into fragments R1, R2, …, Rr, each of which contains a subset of R’s attributes as well as the primary key of R
z Inherently more complicated than horizontal partitioning
✔ The total number of alternatives is very large: with m non-key attributes, the number of possible fragmentations is the Bell number B(m) (already B(10) = 115,975)
✔ Obtaining the optimal solution is very difficult
✔ Resort to heuristics
Two Types of Heuristics
z Grouping
✔Starts by assigning each attribute to one fragment
✔Joins some of fragments until some criteria is satisfied.
z Splitting
✔Starts with a relation and decides on beneficial partitioning
✔Top-down design methodologyÆ ÷_
z Note
– Replication of the key in the fragments
– Therefore, splitting is considered only for those attributes that do
not participate in the primary key.
Information Requirements of
Vertical Fragmentation
z What needs to be determined about applications?
✔ Affinity of attributes: how closely related are the attributes?
✔ Attribute usage value: use(qi, Aj) = 1 if query qi references attribute Aj, and 0 otherwise
z Example
   q1: SELECT BUDGET FROM J WHERE JNO = Value
   q2: SELECT JNAME, BUDGET FROM J
   q3: SELECT JNAME FROM J WHERE LOC = Value
   q4: SELECT SUM(BUDGET) FROM J WHERE LOC = Value
   Attribute usage matrix (A1 = JNO, A2 = JNAME, A3 = BUDGET, A4 = LOC)
         A1  A2  A3  A4
   q1     1   0   1   0
   q2     0   1   1   0
   q3     0   1   0   1
   q4     0   0   1   1
Attribute Affinity
z Attribute Affinity
$$aff(A_i, A_j) = \sum_{k \,\mid\, use(q_k, A_i) = 1 \,\wedge\, use(q_k, A_j) = 1} \; \sum_{\forall S_l} ref_l(q_k)\, acc_l(q_k)$$
refl(qk): number of accesses to attributes (Ai, Aj) for each execution of application qk at site Sl
accl(qk): the application access frequency measure of qk at site Sl
Example
– Assume that refl(qk) = 1 for all qk and Sl
– Application frequencies
   acc1(q1) = 15    acc2(q1) = 20    acc3(q1) = 10
   acc1(q2) = 5     acc2(q2) = 0     acc3(q2) = 0
   acc1(q3) = 25    acc2(q3) = 25    acc3(q3) = 25
   acc1(q4) = 3     acc2(q4) = 0     acc3(q4) = 0
– Only q1 uses both A1 and A3, so
$$aff(A_1, A_3) = \sum_{k=1}^{1} \sum_{l=1}^{3} acc_l(q_k) = acc_1(q_1) + acc_2(q_1) + acc_3(q_1) = 45$$
Attribute affinity matrix (AA)
        A1   A2   A3   A4
   A1   45    0   45    0
   A2    0   80    5   75
   A3   45    5   53    3
   A4    0   75    3   78
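The whole AA matrix follows mechanically from the usage values and the access frequencies. A minimal Python sketch, assuming refl(qk) = 1 as in the example; the usage values are read off the four queries (q1 reads JNO and BUDGET, q2 JNAME and BUDGET, q3 JNAME and LOC, q4 BUDGET and LOC), and the name affinity_matrix is hypothetical:

```python
# A1 = JNO, A2 = JNAME, A3 = BUDGET, A4 = LOC
USE = {
    "q1": [1, 0, 1, 0],
    "q2": [0, 1, 1, 0],
    "q3": [0, 1, 0, 1],
    "q4": [0, 0, 1, 1],
}
# acc_l(q_k) for sites S1, S2, S3
ACC = {
    "q1": [15, 20, 10],
    "q2": [5, 0, 0],
    "q3": [25, 25, 25],
    "q4": [3, 0, 0],
}

def affinity_matrix(use, acc, ref=1):
    """aff(Ai, Aj): sum of ref * acc_l(q_k) over all sites and over the
    queries that use both Ai and Aj (ref assumed constant here)."""
    n = len(next(iter(use.values())))
    aa = [[0] * n for _ in range(n)]
    for q, row in use.items():
        weight = sum(ref * a for a in acc[q])
        for i in range(n):
            for j in range(n):
                if row[i] and row[j]:
                    aa[i][j] += weight
    return aa

for row in affinity_matrix(USE, ACC):
    print(row)
# [45, 0, 45, 0]
# [0, 80, 5, 75]
# [45, 5, 53, 3]
# [0, 75, 3, 78]
```

The diagonal entries aff(Ai, Ai) simply total the accesses of every query that uses Ai.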
Clustering Algorithm
z Goal
✔ Find a means of grouping the attributes of a relation based on the attribute affinity values in AA.
✔ Net contribution to the global affinity measure of placing attribute Ak between Ai and Aj:
$$cont(A_i, A_k, A_j) = bond(A_i, A_k) + bond(A_k, A_j) - bond(A_i, A_j)$$
   where
$$bond(A_x, A_y) = \sum_{z=1}^{n} aff(A_z, A_x)\, aff(A_z, A_y)$$
z Example
   cont(A1, A4, A2) = bond(A1, A4) + bond(A4, A2) – bond(A1, A2)
   bond(A1, A4) = 45 × 0 + 0 × 75 + 45 × 3 + 0 × 78 = 135
   bond(A4, A2) = 11865
   bond(A1, A2) = 225
   Therefore, cont(A1, A4, A2) = 135 + 11865 – 225 = 11775
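The bond and cont definitions above translate directly into Python. A minimal sketch, assuming 0-based attribute indices (A1 becomes 0) and using None for an empty boundary position, whose bond is 0; the function names mirror the formulas but are otherwise hypothetical. Some presentations multiply each bond term by 2, which scales every cont value equally and so does not change which placement wins.

```python
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]

def bond(aa, x, y):
    """bond(Ax, Ay) = sum_z aff(Az, Ax) * aff(Az, Ay); None stands for an
    empty position (A0 or A_{n+1}), whose bond is 0."""
    if x is None or y is None:
        return 0
    return sum(aa[z][x] * aa[z][y] for z in range(len(aa)))

def cont(aa, i, k, j):
    """Net contribution of placing Ak between Ai and Aj."""
    return bond(aa, i, k) + bond(aa, k, j) - bond(aa, i, j)

# 0-based indices: A1 -> 0, A2 -> 1, A3 -> 2, A4 -> 3
print(bond(AA, 0, 3), bond(AA, 3, 1), bond(AA, 0, 1))   # 135 11865 225
print(cont(AA, 0, 3, 1))                                # 11775
print(cont(AA, 0, 2, 1))                                # 5075
```

The last line evaluates cont(A1, A3, A2) = 4410 + 890 – 225 = 5075.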
Algorithm CLUSTERING
   input: AA: attribute affinity matrix
   output: CA: clustered affinity matrix
   begin
      {initialize; remember that AA is an n × n matrix}
      CA(•, 1) ← AA(•, 1)
      CA(•, 2) ← AA(•, 2)
      index ← 3
      while index ≤ n do   {choose the "best" location for attribute A_index}
      begin
         for i from 1 to index – 1 by 1 do
            calculate cont(A_{i–1}, A_index, A_i)
         end-for
         calculate cont(A_{index–1}, A_index, A_{index+1})   {boundary condition}
         loc ← placement given by maximum cont value
         for j from index to loc by –1 do
            CA(•, j) ← CA(•, j – 1)
         end-for
         CA(•, loc) ← AA(•, index)
         index ← index + 1
      end-while
      order the rows according to the relative ordering of columns
   end {CLUSTERING}
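A runnable version of CLUSTERING (the bond energy ordering) is sketched below in Python. It is a minimal sketch, assuming 0-based indices and representing CA by the list of column positions instead of shifting matrix columns in place; cluster is a hypothetical name, and ties keep the leftmost position. On the AA matrix of the example it reproduces the ordering A1, A3, A2, A4.

```python
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]

def bond(aa, x, y):
    if x is None or y is None:          # empty boundary position
        return 0
    return sum(aa[z][x] * aa[z][y] for z in range(len(aa)))

def cont(aa, i, k, j):
    return bond(aa, i, k) + bond(aa, k, j) - bond(aa, i, j)

def cluster(aa):
    """Bond energy ordering: returns the column order and the matrix CA."""
    n = len(aa)
    order = list(range(min(2, n)))      # CA(., 1) <- AA(., 1); CA(., 2) <- AA(., 2)
    for index in range(2, n):
        best_pos, best_val = 0, None
        # try every gap, including the two boundary positions
        for pos in range(len(order) + 1):
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            val = cont(aa, left, index, right)
            if best_val is None or val > best_val:
                best_pos, best_val = pos, val
        order.insert(best_pos, index)
    # order the rows according to the relative ordering of the columns
    ca = [[aa[r][c] for c in order] for r in order]
    return order, ca

order, ca = cluster(AA)
print(["A%d" % (c + 1) for c in order])   # ['A1', 'A3', 'A2', 'A4']
for row in ca:
    print(row)
# [45, 45, 0, 0]
# [45, 53, 5, 3]
# [0, 5, 80, 75]
# [0, 3, 75, 78]
```

When A4 is placed, the right boundary wins with cont = bond(A2, A4) = 11865, slightly ahead of placing it between A3 and A2 (11743).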
Example
Ordering (0-3-1):
   cont(A0, A3, A1) = bond(A0, A3) + bond(A3, A1) – bond(A0, A1)
   bond(A0, A1) = bond(A0, A3) = 0   (A0 is an empty position)
   bond(A3, A1) = 45 × 45 + 5 × 0 + 53 × 45 + 3 × 0 = 4410
   cont(A0, A3, A1) = 4410
Ordering (1-3-2):
   cont(A1, A3, A2) = bond(A1, A3) + bond(A3, A2) – bond(A1, A2)
   bond(A1, A3) = bond(A3, A1) = 4410
   bond(A3, A2) = 890, bond(A1, A2) = 225
   cont(A1, A3, A2) = 4410 + 890 – 225 = 5075
Ordering (2-3-4):
   cont(A2, A3, A4) = bond(A2, A3) + bond(A3, A4) – bond(A2, A4)
   bond(A2, A3) = 890
   bond(A3, A4) = bond(A2, A4) = 0   (A4 is not placed yet, so position 4 is empty)
   cont(A2, A3, A4) = 890
Since ordering (1-3-2) gives the largest contribution, A3 is placed between A1 and A2.
And so forth …: the resulting clustered affinity matrix (CA)
        A1   A3   A2   A4
   A1   45   45    0    0
   A3   45   53    5    3
   A2    0    5   80   75
   A4    0    3   75   78
Partitioning Algorithm
z The upper left-hand corner of CA : TA
z The lower right-hand corner of CA : BA
   AQ(qi) = { Aj | use(qi, Aj) = 1 }
   TQ = { qi | AQ(qi) ⊆ TA }
   BQ = { qi | AQ(qi) ⊆ BA }
   OQ = Q – { TQ ∪ BQ }
$$CQ = \sum_{q_i \in Q} \sum_{\forall S_j} ref_j(q_i)\, acc_j(q_i) \qquad CTQ = \sum_{q_i \in TQ} \sum_{\forall S_j} ref_j(q_i)\, acc_j(q_i)$$
$$CBQ = \sum_{q_i \in BQ} \sum_{\forall S_j} ref_j(q_i)\, acc_j(q_i) \qquad COQ = \sum_{q_i \in OQ} \sum_{\forall S_j} ref_j(q_i)\, acc_j(q_i)$$
To find the split point x such that z is maximized:
$$z = CTQ \times CBQ - COQ^2$$
Algorithm PARTITION
   input: CA: clustered affinity matrix; R: relation
   output: F: set of fragments
   begin
      {determine the z value for the first column}
      {the subscripts in the cost equations indicate the split point}
      calculate CTQ_{n–1}, CBQ_{n–1}, COQ_{n–1}
      best ← CTQ_{n–1} × CBQ_{n–1} – (COQ_{n–1})²
      do   {determine the best partitioning}
      begin
         for i from n – 2 to 1 by –1 do
            calculate CTQ_i, CBQ_i, COQ_i
            z ← CTQ_i × CBQ_i – (COQ_i)²
            if z > best then
               best ← z
               record the shift position
            end-if
         end-for
         call SHIFT(CA)
      end-begin
      until no more SHIFT is possible
      reconstruct the matrix according to the shift position
      R1 ← Π_{K ∪ TA}(R)
      R2 ← Π_{K ∪ BA}(R)
      F ← {R1, R2}
      {K is the set of primary key attributes of R}
   end {PARTITION}
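The core of PARTITION, evaluating z = CTQ × CBQ – COQ² at every split point of the clustered order, can be sketched in Python. This minimal sketch assumes ref = 1, 0-based attribute indices, and omits the SHIFT step; best_split is a hypothetical name, and USE/ACC are the usage values and access frequencies of the J example.

```python
# A1 = JNO, A2 = JNAME, A3 = BUDGET, A4 = LOC (0-based indices 0..3)
USE = {"q1": [1, 0, 1, 0], "q2": [0, 1, 1, 0],
       "q3": [0, 1, 0, 1], "q4": [0, 0, 1, 1]}
ACC = {"q1": [15, 20, 10], "q2": [5, 0, 0],
       "q3": [25, 25, 25], "q4": [3, 0, 0]}

def best_split(order, use, acc, ref=1):
    """Try every split point x of the clustered order and return
    (z, x, TA, BA) maximizing z = CTQ * CBQ - COQ**2 (no SHIFT)."""
    weight = {q: sum(ref * a for a in acc[q]) for q in use}
    best = None
    for x in range(1, len(order)):
        ta, ba = set(order[:x]), set(order[x:])
        ctq = cbq = coq = 0
        for q, row in use.items():
            aq = {i for i, u in enumerate(row) if u}    # AQ(q)
            if aq <= ta:
                ctq += weight[q]                        # q in TQ
            elif aq <= ba:
                cbq += weight[q]                        # q in BQ
            else:
                coq += weight[q]                        # q in OQ
        z = ctq * cbq - coq ** 2
        if best is None or z > best[0]:
            best = (z, x, sorted(ta), sorted(ba))
    return best

z, x, ta, ba = best_split([0, 2, 1, 3], USE, ACC)   # CA order A1, A3, A2, A4
print(z, ta, ba)   # 3311 [0, 2] [1, 3]
```

With the CA order A1, A3, A2, A4, the best split gives TA = {JNO, BUDGET} and BA = {JNAME, LOC} (z = 45 × 75 – 8² = 3311), so R1 = Π{JNO, BUDGET}(J) and R2 = Π{JNO, JNAME, LOC}(J), the same split as the vertical partitioning example of J.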
Checking for Correctness
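z Completeness
   A = TA ∪ BA (every attribute of R appears in some fragment)
z Reconstruction
   R = JOIN_K Ri, ∀Ri ∈ FR (join on the primary key K)
z Disjointness
   Not as important as for horizontal fragmentation, because the primary key is replicated in every fragment; only the non-key attributes need to be disjoint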
3.3 Hybrid Fragmentation
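Partitioning tree:
   R is fragmented horizontally (H) into R1 and R2
   R1 is fragmented vertically (V) into R11 and R12
   R2 is fragmented vertically (V) into R21, R22 and R23
z The levels of nesting in most practical applications do not exceed 2.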
Correctness of Hybrid Fragmentation
z Reconstruction
✔Starts at the leaves of the partitioning tree and moves
upward by performing joins and unions
z Completeness
✔Fragmentation is complete if the intermediate and leaf
fragments are complete.
z Disjointness
✔Fragmentation is disjoint if the intermediate and leaf
fragments are disjoint.
4. Allocation
z Definition
The allocation problem involves finding the optimal
distribution of relations (fragments) to sites.
z Measures of optimality
✔ Minimal cost :
– cost of storing, querying, updating, and
data communication
✔ Performance :
– to minimize the response time and
– to maximize the system throughput at each site
4.1 Some Examples of Data Placement
and Allocation
(Example 1) Single-relation case
[Table 1 and Table 2: two page layouts, four pages each, of a relation whose tuples combine a number with one of the letters p, q, r, s]
Query: { (1, *)?, (2, *)?, …, (*, p)?, (*, q)?, … }
z Distributed placement
– site 1: pages(1, 4), site 2: pages(2, 3)
– site 1: pages(1, 2), site 2: pages(3, 4)
– site 1: pages(1, 3), site 2: pages(2, 4)
(Example 2) Multiple-relation case
[Figure: page layouts (four pages each) of relations R1 and R2, whose values are drawn from the letters a, b, c, d and p, q, r, s]
Query: σcol1 = ‘a’(R1) JOIN_{col2 = col1} σcol2 = 1(R2)
z Distributed placement
– site 1: pages(1, 2), site 2: pages(3, 4)
– site 1: pages(1, 3), site 2: pages(2, 4)
4.2 A practical combinatorial optimization
approach to the file allocation problem
z Assumptions
✔ Most files are not fragmented.
✔ It is unlikely that we will try to exploit parallelism in our file allocation.
✔ Each computing facility has tight limits on its local mass storage capacity.
✔ Storage is considered to be a constraint on the optimization rather than a cost.
✔ The transaction traffic is known in advance.
✔ Reads and updates have equal costs.
✔ Remote accesses all have the same unit cost.
✔ No redundancy is permitted and fragmentation of files is forbidden.
Notation and Constraints
z N nodes, indexed by j, with capacity cj
z M files, indexed by i, with size si
z T transactions, indexed by k; frequency of transaction k from node j = fkj
z nki: number of accesses from transaction k to file i
z xij: decision variable; xij = 1 if file i is allocated to node j, 0 otherwise
$$\sum_{j} x_{ij} = 1, \quad \forall i \mid 1 \le i \le M$$
$$\sum_{i} x_{ij}\, s_i \le c_j, \quad \forall j \mid 1 \le j \le N$$
The goal of FAP: maximize Σi,j xij Vij
   where Vij = Σk fkj nki × cost of local retrievals
Algorithm FAP
1. For each file i, calculate J(i) = { j' | Vij' = max Vij over 1 ≤ j ≤ N }.
2. An optimal set of xij is given by xij = 1 for some j ∈ J(i) and xij = 0 otherwise.
3. If this solution is feasible (i.e., meets the constraints), it is our answer; go to step 7.
4. Otherwise, identify all nodes that cause the constraints to be broken.
5. For every such over-subscribed node, solve the corresponding knapsack problem, thereby eliminating the node and the files allocated to it from further consideration.
6. Recompute J(i) over the nodes that remain. If there are such nodes, go to step 2.
7. Otherwise, we have finished.
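The FAP steps can be sketched in Python as a greedy loop with a 0/1 knapsack for each over-subscribed node. This is a minimal sketch under stated assumptions: sizes are integers (e.g., whole Mbytes), the knapsack for an overloaded node may choose among all files not yet placed (reading "storing any files on site 1" in the example literally), and ties go to the lowest-numbered node. The names knapsack and fap and the small instance are hypothetical.

```python
def knapsack(items, capacity):
    """0/1 knapsack over (file, size, value) with integer sizes; returns chosen files."""
    best = {0: (0, [])}                     # used capacity -> (total value, files)
    for fid, size, value in items:
        for used, (val, files) in list(best.items()):
            new_used = used + size
            if new_used <= capacity:
                cand = (val + value, files + [fid])
                if new_used not in best or cand[0] > best[new_used][0]:
                    best[new_used] = cand
    return max(best.values())[1]

def fap(V, sizes, capacity):
    """V[i][j]: value of storing file i at node j. Returns {file: node}."""
    files = set(range(len(V)))
    nodes = set(range(len(capacity)))
    alloc = {}
    while files:
        if not nodes:
            raise ValueError("files left over but no node can take them")
        # steps 1-2: every remaining file goes to a best remaining node, J(i)
        choice = {i: max(sorted(nodes), key=lambda j: V[i][j]) for i in files}
        load = {j: sum(sizes[i] for i in files if choice[i] == j) for j in nodes}
        over = sorted(j for j in nodes if load[j] > capacity[j])
        if not over:                        # step 3: feasible, so we are done
            alloc.update(choice)
            return alloc
        for j in over:                      # steps 4-5: knapsack per overloaded node
            items = [(i, sizes[i], V[i][j]) for i in sorted(files)]
            for i in knapsack(items, capacity[j]):
                alloc[i] = j
                files.discard(i)
            nodes.discard(j)                # step 6: drop the node, then repeat
    return alloc

V = [[10, 1], [8, 2], [6, 5]]               # hypothetical: 3 files x 2 nodes
print(fap(V, sizes=[5, 5, 5], capacity=[10, 10]))   # {0: 0, 1: 0, 2: 1}
```

In this instance all three files first prefer node 0 (15 MB on a 10 MB node); the knapsack keeps files 0 and 1 there, and file 2 then goes to node 1.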
Example
Allocate 8 files among five sites, each with a 20 MB disk.
Access rates of transactions to files (nki)
[Table: nki by transaction and file, with the file sizes in Mbytes]
The frequency of transactions at sites (fkj)
[Table: fkj by transaction and site]
Vij table
[Table: Vij by site j and file, with the file sizes in Mbytes]
1. J(i) are the highlighted elements for each i.
2. If we assign xij = 1 for these and 0 for the other entries, we have our first solution.
3. Site 1 has been allocated 55 Mbytes of files. This is not a feasible solution.
4. Site 1 has been allocated too much.
5. The maximum value (Vij) we can get from storing any files on site 1 is obtained by storing files 1, 2, and 8 there.
6. Our new Vij table is obtained by eliminating row 1 above and columns 1, 2, and 8.
The new J(i) are the underlined entries (all allocated to site 3).
2’. Assign xij = 1 to these and xij = 0 to the remaining entries.
3’. Site 3 has been allocated 47 Mbytes.
4’. Site 3 has been overloaded.
5’. The maximum value we can get from storing files on site 3 is obtained by storing files 4 and 5 there.
6’. Our new Vij table is obtained by eliminating row 3 and columns 4 and 5 from the reduced table.
New Vij
[Table: reduced Vij by site and file, with the file sizes in Mbytes]
The new J(i) are underlined (all allocated to site 4).
2’’. Assign 1 to xij for these entries and 0 for the rest.
3’’. Site 4 has been allocated 29 Mbytes.
4’’. Site 4 has been overloaded.
5’’. Store file 3 at site 4.
6’’. Our new Vij table is obtained by eliminating row 2 and column 1 from the table above.
Without spelling out the details, it is clear that the remaining two files, 6 and 7, are allocated to site 5.
So our solution is:
[Table: final allocation, listing for each site the files stored there and the total space used (Mbytes)]
4.3 Database Allocation Problem
z DAP is different from FAP
✔The relationship between fragments should be taken into
account.
✔The relationship between the allocation and query
processing should be properly modeled.
✔FAP does not take into consideration the cost of integrity enforcement.
✔The cost of enforcing concurrency control mechanisms
should be considered.
z There are no general heuristic models that take as
input a set of fragments and produce a
near-optimal allocation subject to the types of
constraints discussed here.
z We present a relatively general model and then
discuss a number of possible heuristics that
might be employed to solve it.
Information Requirements
z Database information
– the selectivity of a fragment Fj with respect to query qi : seli(Fj)
– the size of a fragment Fj : size(Fj) = card(Fj) × length(Fj)
z Application information
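– number of read (update) accesses from qi to Fj: RRij (URij)
– update matrix UM with entries uij (1 if qi updates Fj, 0 otherwise), retrieval matrix RM with entries rij (1 if qi retrieves from Fj, 0 otherwise), and O, where o(i) is the site at which qi originates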
– for each query, a maximum allowable response time is defined
z Site information
– for each site, its storage and processing capacity is defined
– unit cost of storing data at site Sk : USCk
– the cost of processing one unit of work at site Sk : LPCk
z Network information
– the communication cost per frame between Si and Sj : gij
– the size (in bytes) of one frame : fsize
Allocation Model
z Objective
Minimize (total cost) subject to response-time, storage, and processing constraints
z xij = 1 if Fi is stored at Sj, and xij = 0 otherwise
z Total Cost
$$TOC = \sum_{\forall q_i \in Q} QPC_i + \sum_{\forall S_k \in S} \sum_{\forall F_j \in F} STC_{jk}$$
STCjk: the cost of storing fragment Fj at site Sk
   STCjk = USCk × size(Fj) × xjk
QPCi: query processing cost of application qi
   QPCi = processing cost (PCi) + transmission cost (TCi)
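PCi = access cost (ACi) + integrity enforcement cost (IEi) + concurrency control cost (CCi)
$$AC_i = \sum_{\forall S_k \in S} \sum_{\forall F_j \in F} (u_{ij} \times UR_{ij} + r_{ij} \times RR_{ij}) \times x_{jk} \times LPC_k$$
$$TC_i = TCU_i + TCR_i$$
$$TCU_i = \sum_{\forall S_k \in S} \sum_{\forall F_j \in F} u_{ij} \times x_{jk} \times g_{o(i),k} + \sum_{\forall S_k \in S} \sum_{\forall F_j \in F} u_{ij} \times x_{jk} \times g_{k,o(i)}$$
$$TCR_i = \sum_{\forall F_j \in F} \min_{S_k \in S} \left( r_{ij} \times x_{jk} \times \frac{sel_i(F_j)}{fsize} \times g_{k,o(i)} \right)$$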
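Two of the cost terms, the storage cost and the access cost ACi, are plain weighted sums and can be evaluated directly. The Python sketch below is illustrative only: it assumes the matrices are nested lists indexed as in the formulas (x[j][k] for fragment Fj at site Sk, u[i][j] and r[i][j] for query qi), and the instance (two fragments, two sites, one query) and the function names are hypothetical.

```python
def storage_cost(usc, size, x):
    """Sum over fragments j and sites k of STC_jk = USC_k * size(F_j) * x_jk."""
    return sum(usc[k] * size[j] * x[j][k]
               for j in range(len(size)) for k in range(len(usc)))

def access_cost(i, u, r, UR, RR, x, LPC):
    """AC_i = sum_k sum_j (u_ij*UR_ij + r_ij*RR_ij) * x_jk * LPC_k."""
    return sum((u[i][j] * UR[i][j] + r[i][j] * RR[i][j]) * x[j][k] * LPC[k]
               for j in range(len(x)) for k in range(len(LPC)))

# Hypothetical instance: 2 fragments, 2 sites, 1 query (q0).
x = [[1, 0],        # F1 stored at S1 only
     [1, 1]]        # F2 replicated at S1 and S2
print(storage_cost(usc=[2, 3], size=[100, 50], x=x))          # 450
print(access_cost(0, u=[[0, 1]], r=[[1, 1]], UR=[[0, 4]],
                  RR=[[10, 2]], x=x, LPC=[1, 2]))             # 28
```

Because the sums run over every site holding a copy, replicating F2 increases both its storage cost and the access cost charged to q0; this is the trade-off the allocation model has to balance.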
Solution Methods
z The formulation of DAP is NP-complete.
z Thus, one has to look for heuristic methods that yield
suboptimal solutions.
z Heuristic methods
– knapsack problem solution
– branch-and-bound
– network flow algorithm
⇒ There is not enough data to determine how close the
results are to the optimal.