7-Query_Localization..

advertisement
Outline
•
•
•
•
•
•
Introduction
Background
Distributed Database Design
Database Integration
Semantic Data Control
Distributed Query Processing
➡ Overview
➡ Query decomposition and localization
•
•
•
•
•
•
•
•
➡ Distributed query optimization
Multidatabase query processing
Distributed Transaction Management
Data Replication
Parallel Database Systems
Distributed Object DBMS
Peer-to-Peer Data Management
Web Data Management
Current Issues
Distributed DBMS
© M. T. Özsu & P. Valduriez
Ch.7/1
Query Processing in a DDBMS
high level user query
query
processor
Low-level data manipulation
commands for D-DBMS
Distributed DBMS
© M. T. Özsu & P. Valduriez
Ch.7/2
Distributed Query Processing
Methodology
Calculus Query on Distributed Relations
Query
Decomposition
GLOBAL
SCHEMA
Algebraic Query on Distributed
Relations
CONTROL
SITE
Data
Localization
FRAGMENT
SCHEMA
Fragment Query
Global
Optimization
STATS ON
FRAGMENTS
Optimized Fragment Query
with Communication Operations
LOCAL
SITES
Local
Optimization
LOCAL
SCHEMAS
Optimized Local Queries
Distributed DBMS
© M. T. Özsu & P. Valduriez
Ch.7/3
Step 1 – Query Decomposition
Input : Calculus query on global relations
•
•
Normalization
➡ manipulate query quantifiers and qualification
Analysis
➡ detect and reject “incorrect” queries
•
•
➡ possible for only a subset of relational calculus
Simplification
➡ eliminate redundant predicates
Restructuring
➡ calculus query  algebraic query
➡ more than one translation is possible
➡ use transformation rules
Distributed DBMS
© M. T. Özsu & P. Valduriez
Ch.7/4
Normalization
•
Lexical and syntactic analysis
➡ check validity (similar to compilers)
➡ check for attributes and relations
➡ type checking on the qualification
•
Put into normal form
➡ Conjunctive normal form
(p11 p12  …  p1n)  …  (pm1  pm2  …  pmn)
➡ Disjunctive normal form
(p11  p12  …  p1n)  …  (pm1  pm2  …  pmn)
➡ OR's mapped into union
➡ AND's mapped into join or selection
Distributed DBMS
© M. T. Özsu & P. Valduriez
Ch.7/5
Analysis
•
•
Refute incorrect queries
Type incorrect
➡ If any of its attribute or relation names are not defined in the global schema
•
➡ If operations are applied to attributes of the wrong type
Semantically incorrect
➡ Components do not contribute in any way to the generation of the result
➡ Only a subset of relational calculus queries can be tested for correctness
✦
Those that do not contain disjunction and negation
➡ To detect
connection graph (query graph)
✦ join graph
✦
Distributed DBMS
© M. T. Özsu & P. Valduriez
Ch.7/6
Analysis – Example
SELECT
FROM
WHERE
AND
AND
AND
AND
ENAME,RESP
EMP, ASG, PROJ
EMP.ENO = ASG.ENO
ASG.PNO = PROJ.PNO
PNAME = "CAD/CAM"
DUR ≥ 36
TITLE = "Programmer"
Query graph
Join graph
DUR≥36
ASG
ASG.PNO=PROJ.PNO
EMP.ENO=ASG.ENO
TITLE =
“Programmer”
EMP
ENAME
Distributed DBMS
RESP
RESULT
EMP.ENO=ASG.ENO
PROJ
EMP
ASG
ASG.PNO=PROJ.PNO
PROJ
PNAME=“CAD/CAM”
© M. T. Özsu & P. Valduriez
Ch.7/7
Analysis
If the query graph is not connected, the query may be wrong or
use Cartesian product
SELECT
FROM
WHERE
AND
AND
AND
ENAME,RESP
EMP, ASG, PROJ
EMP.ENO = ASG.ENO
PNAME = "CAD/CAM"
DUR > 36
TITLE = "Programmer"
ASG
EMP
ENAME
Distributed DBMS
RESP
RESULT
© M. T. Özsu & P. Valduriez
PROJ
PNAME=“CAD/CAM”
Ch.7/8
Simplification
•
•
Why simplify?
➡ Remember the example
How? Use transformation rules
➡ Elimination of redundancy
✦
idempotency rules
p1  ¬( p1)  false
p1  (p1 p2)  p1
p1  false  p1
…
➡ Application of transitivity
➡ Use of integrity rules
Distributed DBMS
© M. T. Özsu & P. Valduriez
Ch.7/9
Simplification – Example
SELECT
FROM
WHERE
OR
AND
OR
AND
TITLE
EMP
EMP.ENAME = "J. Doe"
(NOT(EMP.TITLE = "Programmer")
(EMP.TITLE = "Programmer"
EMP.TITLE = "Elect. Eng.")
NOT(EMP.TITLE = "Elect. Eng."))

SELECT
FROM
WHERE
Distributed DBMS
TITLE
EMP
EMP.ENAME = "J. Doe"
© M. T. Özsu & P. Valduriez
Ch.7/10
Restructuring
•
•
•
Convert relational calculus to relational
algebra
Make use of query trees
Example
ENAME
Project
σDUR=12 OR DUR=24
Find the names of employees other than
σ
Select
PNAME=“CAD/CAM”
J. Doe who worked on the CAD/CAM
project for either 1 or 2 years.
SELECT ENAME
σENAME≠“J. DOE”
FROM
EMP, ASG, PROJ
WHERE
EMP.ENO = ASG.ENO
⋈PNO
AND
ASG.PNO = PROJ.PNO
AND
ENAME≠ "J. Doe"
⋈ENO
Join
AND
PNAME = "CAD/CAM"
AND
(DUR = 12 OR DUR = 24) PROJ
ASG
EMP
Distributed DBMS
© M. T. Özsu & P. Valduriez
Ch.7/11
Restructuring –Transformation
Rules
•
Commutativity of binary operations
➡R×SS×R
➡ R ⋈S  S ⋈R
➡RSSR
•
Associativity of binary operations
➡ ( R × S) × T  R × (S × T)
➡ (R ⋈S) ⋈T  R ⋈ (S ⋈T)
•
Idempotence of unary operations
➡ A’( A’(R))   A’(R)
➡ p
(
(R))
(A
)
p
(A
)
1 1
2 2
 p1(A1)p2(A2)(R)
where R[A] and A'  A, A"  A and A'  A"
•
Commuting selection with projection
Distributed DBMS
© M. T. Özsu & P. Valduriez
Ch.7/12
Restructuring – Transformation
Rules
•
Commuting selection with binary operations
➡ p(A)(R × S)  (p(A) (R)) × S
➡ p(A )(R ⋈(A ,B )S)  (p(A ) (R)) ⋈(A ,B )S
i
j k
i
j k
➡ p(A )(R  T)  p(A ) (R)  p(A ) (T)
i
i
i
where Ai belongs to R and T
•
Commuting projection with binary operations
➡ C(R × S)  A’(R) × B’(S)
➡ C(R ⋈(A ,B )S)  A’(R) ⋈(A ,B ) B’(S)
j k
j k
➡ C(R  S)  C(R)  C(S)
where R[A] and S[B]; C = A'  B' where A'  A, B'  B
Distributed DBMS
© M. T. Özsu & P. Valduriez
Ch.7/13
Example
ENAME
Recall the previous example:
Find the names of employees other
than J. Doe who worked on the
CAD/CAM project for either one or
two years.
SELECT ENAME
FROM
DUR=12  DUR=24
PNAME=“CAD/CAM”
Select
ENAME≠“J. DOE”
PROJ, ASG, EMP
WHERE ASG.ENO=EMP.ENO
⋈PNO
AND
ASG.PNO=PROJ.PNO
AND
ENAME ≠ "J. Doe"
AND
PROJ.PNAME="CAD/CAM"
AND
(DUR=12 OR DUR=24)
Distributed DBMS
Project
⋈ENO
PROJ
© M. T. Özsu & P. Valduriez
ASG
Join
EMP
Ch.7/14
Equivalent Query
ENAME
PNAME=“CAD/CAM”  (DUR=12  DUR=24) ENAME≠“J. Doe”
⋈PNO,ENO
×
EMP
Distributed DBMS
PROJ
© M. T. Özsu & P. Valduriez
ASG
Ch.7/15
Restructuring
ENAME
⋈PNO
PNO,ENAME
⋈ENO
PNO
PNAME = "CAD/CAM"
PROJ
Distributed DBMS
PNO,ENO
DUR =12DUR=24
ASG
© M. T. Özsu & P. Valduriez
PNO,ENAME
ENAME ≠ "J. Doe"
EMP
Ch.7/16
Step 2 – Data Localization
Input: Algebraic query on distributed relations
•
•
Determine which fragments are involved
Localization program
➡ substitute for each global query its localized query
✦
A localized query is a relational algebra query whose operands are the fragments
of relations instead of the relations themselves
✦
We call these operands that are fragments of relations “localization programs”
✓
✦
Union for horizontal fragmentation; Join for vertical fragmentation
Replication is not taken into account in this chapter
➡ Optimize
✦
For each type of fragmentation, use reduction techniques to generate simpler
queries
✦
To do so, use appropriate heuristics
Distributed DBMS
© M. T. Özsu & P. Valduriez
Ch.7/17
Example
ENAME
Assume
➡ EMP is fragmented into EMP1,
EMP3 as follows:
EMP2,
DUR=12 DUR=24
PNAME=“CAD/CAM”
✦
EMP1= ENO≤“E3”(EMP)
✦
EMP2= “E3”<ENO≤“E6”(EMP)
✦
EMP3= ENO≥“E6”(EMP)
ENAME≠“J. DOE”
⋈PNO
➡ ASG fragmented into ASG1 and ASG2
as follows:
✦
ASG1= ENO≤“E3”(ASG)
✦
ASG2= ENO>“E3”(ASG)
⋈ENO
PROJ
Replace EMP by (EMP1  EMP2  EMP3)
and ASG by (ASG1  ASG2) in any query
Distributed DBMS
© M. T. Özsu & P. Valduriez


EMP1EMP2 EMP3 ASG1 ASG2
Ch.7/18
Reduction for PHF
•
Reduction with selection
➡ Relation R and FR={R1, R2, …, Rw} where Rj=p (R)
j
pi(Rj)= if x in R: ¬(pi(x) pj(x))
➡ Example
SELECT
FROM
WHERE
*
EMP
ENO="E5"
ENO=“E5”
ENO=“E5”

EMP1
Distributed DBMS
EMP2
EMP3
© M. T. Özsu & P. Valduriez
EMP2
Ch.7/19
Reduction for PHF
•
Reduction with join
➡ Possible if fragmentation is done on join attribute, i.e., the selection
attribute used for the fragmentation is the same as the join attribute
➡ Algorithm
✦
Distribute joins over unions
(R1 R2)⋈S  (R1⋈S)  (R2⋈S)
✦
Eliminate useless joins as follows: Given Ri =pi(R) and Rj = pj(R)
Ri ⋈Rj = if x in Ri, y in Rj: ¬(pi(x) pj(y))
That is, qualifications of the joined fragments are in contradiction
Distributed DBMS
© M. T. Özsu & P. Valduriez
Ch.7/20
Reduction for PHF
•
⋈ENO
Assume EMP is fragmented as
before and
➡ ASG1: ENO ≤ "E3"(ASG)


➡ ASG2: ENO > "E3"(ASG)
•
Consider the query
EMP1 EMP2 EMP3
SELECT
*
FROM
EMP,ASG

WHERE
EMP.ENO=ASG.ENO
•
•
Distribute join over unions
Apply the reduction rule
⋈ENO
EMP1
Distributed DBMS
© M. T. Özsu & P. Valduriez
ASG1
⋈ENO
ASG1 EMP2
ASG2
⋈ENO
ASG2 EMP3
ASG2
Ch.7/21
Provides Parallellism

⋈ENO
EMP1
Distributed DBMS
ASG1 EMP2
⋈ENO
⋈ENO
ASG2 EMP3
© M. T. Özsu & P. Valduriez
ASG1 EMP3
⋈ENO
ASG2
Ch.7/22
Eliminates Unnecessary Work

⋈ENO
EMP1
Distributed DBMS
⋈ENO
ASG1 EMP2
ASG2 EMP3
© M. T. Özsu & P. Valduriez
⋈ENO
ASG2
Ch.7/23
Reduction for VF
•
Find useless (not empty) intermediate relations
Relation R defined over attributes A = {A1, ..., An} vertically fragmented
as Ri =A'(R) where A' A:
D,K(Ri) is useless if the set of projection attributes D is not in A'
Example: EMP1=ENO,ENAME (EMP); EMP2=ENO,TITLE (EMP)
SELECT ENAME
FROM
EMP
ENAME
ENAME
⋈ENO
EMP1
Distributed DBMS
EMP2
© M. T. Özsu & P. Valduriez
EMP1
Ch.7/24
Reduction for DHF
•
Rule :
➡ Distribute joins over unions
➡ Apply the join reduction for horizontal fragmentation (using the
•
qualification of the primary fragments!)
Example
ASG1: ASG ⋉ENO EMP1
ASG2: ASG ⋉ENO EMP2
EMP1: TITLE=“Programmer” (EMP)
•
EMP2: TITLE≠“Programmer” (EMP)
Query
SELECT
*
FROM
EMP, ASG
WHEREASG.ENO = EMP.ENO
AND
EMP.TITLE = "Mech. Eng."
Distributed DBMS
© M. T. Özsu & P. Valduriez
Ch.7/25
Reduction for DHF
⋈ENO
Generic query
TITLE=“Mech. Eng.”

ASG1

ASG2
Selections first
EMP1
⋈ENO
TITLE=“Mech. Eng.”

ASG1
Distributed DBMS
EMP2
ASG2
© M. T. Özsu & P. Valduriez
EMP2
Ch.7/26
Reduction for DHF

Joins over unions
⋈ENO
⋈ENO
TITLE=“Mech. Eng.”
ASG1
EMP2
TITLE=“Mech. Eng.”
ASG2
EMP2
Elimination of the empty intermediate relations
⋈ENO
(left sub-tree)
TITLE=“Mech. Eng.”
ASG2
Distributed DBMS
EMP2
© M. T. Özsu & P. Valduriez
Ch.7/27
Reduction for Hybrid
Fragmentation
•
Combine the rules already specified:
➡ Remove empty relations generated by contradicting selections on horizontal
fragments;
➡ Remove useless relations generated by projections on vertical fragments;
➡ Distribute joins over unions in order to isolate and remove useless joins.
Distributed DBMS
© M. T. Özsu & P. Valduriez
Ch.7/28
Reduction for HF
Example
ENAME
Consider the following hybrid
fragmentation:
ENAME
EMP1= ENO≤"E4" (ENO,ENAME (EMP))
ENO=“E5”
EMP2= ENO>"E4" (ENO,ENAME (EMP))
⋈ENO
EMP3= ENO,TITLE (EMP)
and the query
SELECT ENAME
FROM EMP
WHERE ENO="E5"
Distributed DBMS


ENO=“E5”
EMP2
EMP1 EMP2 EMP3
© M. T. Özsu & P. Valduriez
Ch.7/29
Download