Set 2

CSE 636
Data Integration
SchemaSQL Implementation
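Architecture
[Figure: SchemaSQL architecture. Components: Federation User, SchemaSQL Server, Resident SQL Engine, DBMS1 … DBMSn]
1. The Federation User submits a SchemaSQL Query to the SchemaSQL Server.
2. The SchemaSQL Server sends an optimized local query Q1 … Qn to each of DBMS1 … DBMSn.
3. The answers answer(Q1) … answer(Qn) to queries Q1 … Qn are collected.
4. The SchemaSQL Server submits a Final Series of SQL Queries to the Resident SQL Engine.
5. The Final Answer is returned through the SchemaSQL Server to the Federation User.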
SchemaSQL Server
• Maintains a Federation System Table (FST)
  – FST(db-name, rel-name, attr-name)
  – Names of the databases, relations and attributes in the federation
• Compiles the instantiations of the variables in the query
• Enforces conditions, groupings, aggregations and mergings
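The slides do not say how the server gathers catalog information, so the following is only a minimal Python sketch of how FST rows could be collected. It assumes, purely for illustration, that every member database is a SQLite database reachable through the standard `sqlite3` module. The names `build_fst` and `demo_fst`, and the `technician` column used in the demo, are hypothetical.

```python
import sqlite3


def build_fst(databases):
    """Return FST rows (db_name, rel_name, attr_name) for the federation.

    `databases` maps a federation database name to an open sqlite3
    connection (an assumption of this sketch, not part of SchemaSQL).
    """
    rows = []
    for db_name, conn in databases.items():
        # sqlite_master lists the tables stored in a SQLite database.
        tables = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        for (rel_name,) in tables:
            quoted = '"' + rel_name.replace('"', '""') + '"'
            # PRAGMA table_info yields one row per column; index 1 is its name.
            for column in conn.execute(f"PRAGMA table_info({quoted})"):
                rows.append((db_name, rel_name, column[1]))
    return rows


def demo_fst():
    """Build an FST for toy versions of univ-C and univ-D."""
    univ_c = sqlite3.connect(":memory:")
    univ_c.execute("CREATE TABLE cs (category TEXT, salFloor INTEGER)")
    univ_c.execute("CREATE TABLE math (category TEXT, salFloor INTEGER)")
    univ_d = sqlite3.connect(":memory:")
    univ_d.execute(
        'CREATE TABLE salInfo (dept TEXT, "Prof" INTEGER, technician INTEGER)'
    )
    return build_fst({"univ-C": univ_c, "univ-D": univ_d})
```

Calling `demo_fst()` returns one tuple per attribute, for example `("univ-C", "cs", "salFloor")`. A real server would presumably refresh these rows whenever a member schema changes.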
Query Processing
Fixed Output Schema
Phase 1
• Corresponding to a set of variable declarations in the FROM clause, create VITs using one or more SQL queries against some local databases and/or the FST
  – VIT: Variable Instantiation Table, whose schema consists of all the variables in one or more variable declarations in the FROM clause
Phase 2
• Rewrite the original SchemaSQL query against the federation into an “equivalent” query against the set of VIT relations and compute it using the resident SQL server
Example
SELECT RelC, C.salFloor
FROM   univ-C-> RelC,
       univ-C::RelC C,
       univ-D::salInfo D
WHERE  RelC = D.dept AND
       C.salFloor > D.technician AND
       C.category = 'technician'
univ-C: one relation per department

cs
category     salFloor
Prof         74K
Assoc Prof   62K
…            …

math
category     salFloor
Prof         67K
Assoc Prof   56K
…            …

univ-D: a single relation with one column per category

salInfo
dept   Prof   Assoc Prof   Asst Prof   …
cs     72K    65K          78K         …
math   65K    54K          69K         …
…      …      …            …           …
Example – Phase 1
• VITRelC(RelC):
SELECT rel-name AS RelC
FROM FST
WHERE db-name = 'univ-C'
Example – Phase 1
• VITC(RelC, CsalFloor):
1. SELECT RelC
   FROM VITRelC
2. If {r1, …, rn} is the answer to step 1, then VITC is computed by the following SQL query against univ-C:
   SELECT 'r1' AS RelC, salFloor AS CsalFloor
   FROM r1
   WHERE category = 'technician'
   UNION
   …
   UNION
   SELECT 'rn' AS RelC, salFloor AS CsalFloor
   FROM rn
   WHERE category = 'technician'
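Step 2 amounts to generating one SELECT branch per relation name and joining the branches with UNION. A minimal Python sketch of that generation step follows, run against SQLite for illustration. The helper names (`quote_ident`, `quote_literal`, `build_vitc_query`, `demo_vitc`) are hypothetical, salaries are stored as integers in thousands, and the toy rows are assumptions of the demo.

```python
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier such as a relation name."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Quote an SQL string literal."""
    return "'" + value.replace("'", "''") + "'"


def build_vitc_query(rel_names, category="technician"):
    """Return the UNION query computing VITC(RelC, CsalFloor).

    `rel_names` plays the role of {r1, ..., rn}, the answer to step 1.
    """
    if not rel_names:
        raise ValueError("step 1 returned no relation names")
    branches = [
        f"SELECT {quote_literal(r)} AS RelC, salFloor AS CsalFloor\n"
        f"FROM {quote_ident(r)}\n"
        f"WHERE category = {quote_literal(category)}"
        for r in rel_names
    ]
    return "\nUNION\n".join(branches)


def demo_vitc():
    """Run the generated query against a toy copy of univ-C."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cs (category TEXT, salFloor INTEGER);
        CREATE TABLE math (category TEXT, salFloor INTEGER);
        INSERT INTO cs VALUES ('Prof', 74), ('technician', 42);
        INSERT INTO math VALUES ('Prof', 67), ('technician', 46);
        """
    )
    return sorted(conn.execute(build_vitc_query(["cs", "math"])).fetchall())
```

Note that the condition on `category` is evaluated inside each branch, so only technician rows leave univ-C. The relation name travels with each row as the constant `RelC` column, which is what lets Phase 2 treat it as ordinary data.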
Example – Phase 1
• VITD(Ddept, Dtechnician):
SELECT dept AS Ddept, technician AS Dtechnician
FROM salInfo
Example – Phase 1
VITRelC
RelC
cs
math
…

VITC
RelC   CsalFloor
cs     42K
math   46K
…      …

VITD
Ddept   Dtechnician
cs      72K
math    65K
…       …
Example – Phase 2
The Joined Variable Instantiation Table (JVIT) is the (natural) join of the VITs generated during Phase 1
1. CREATE VIEW JVIT(RelC, CsalFloor, Ddept, Dtechnician) AS
   SELECT VITRelC.RelC, VITC.CsalFloor,
          VITD.Ddept, VITD.Dtechnician
   FROM VITRelC, VITC, VITD
   WHERE VITRelC.RelC = VITD.Ddept AND
         VITC.CsalFloor > VITD.Dtechnician AND
         VITRelC.RelC = VITC.RelC
2. SELECT RelC, CsalFloor
   FROM JVIT
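The two Phase 2 steps can be tried on any SQL engine once the VITs exist. The Python sketch below uses an in-memory SQLite database as a stand-in for the resident SQL engine, which is an assumption of this example. The salary comparison reads `CsalFloor` from VITC, the table that holds that column. The VITD values (40 and 50, in thousands) are made up so that exactly one row qualifies, and `run_phase2` is a hypothetical name.

```python
import sqlite3


def run_phase2():
    """Load toy VITs, define JVIT as a view, and run the final query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE VITRelC (RelC TEXT);
        CREATE TABLE VITC (RelC TEXT, CsalFloor INTEGER);
        CREATE TABLE VITD (Ddept TEXT, Dtechnician INTEGER);

        INSERT INTO VITRelC VALUES ('cs'), ('math');
        INSERT INTO VITC VALUES ('cs', 42), ('math', 46);
        -- Hypothetical technician salaries, chosen for the demo only.
        INSERT INTO VITD VALUES ('cs', 40), ('math', 50);

        -- Step 1: the join of the VITs under the rewritten conditions.
        -- Column aliases are used instead of a view column list.
        CREATE VIEW JVIT AS
            SELECT VITRelC.RelC AS RelC, VITC.CsalFloor AS CsalFloor,
                   VITD.Ddept AS Ddept, VITD.Dtechnician AS Dtechnician
            FROM VITRelC, VITC, VITD
            WHERE VITRelC.RelC = VITD.Ddept AND
                  VITC.CsalFloor > VITD.Dtechnician AND
                  VITRelC.RelC = VITC.RelC;
        """
    )
    # Step 2: project the SELECT list of the original query from JVIT.
    return conn.execute(
        "SELECT RelC, CsalFloor FROM JVIT ORDER BY RelC"
    ).fetchall()
```

With these toy values only the cs row survives, because 42 exceeds 40 while 46 does not exceed 50.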
Example – Phase 2 (Aggregation)
Q: Find the average salary floor across all departments for each employee category in database univ-B
SELECT   T.category, avg(T.D)
FROM     univ-B::salInfo-> D,
         univ-B::salInfo T
WHERE    D <> 'category'
GROUP BY T.category
univ-B: a single relation with one column per department

salInfo
category     cs     math   ece    …
Prof         72K    65K    78K    …
Assoc Prof   65K    54K    69K    …
…            …      …      …      …
Aggregation After Phase 2
SELECT   Tcategory, avg(TD)
FROM     JVIT
GROUP BY Tcategory
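To make the aggregation concrete, the Python sketch below builds a JVIT with columns (D, Tcategory, TD) by turning each department column of salInfo into rows, then runs the aggregation query on it. It is an illustration under stated assumptions: SQLite stands in for both univ-B and the resident engine, `PRAGMA table_info` stands in for an FST lookup, salaries are integers in thousands, and `avg_floor_by_category` is a hypothetical name. UNION ALL is used here so that equal salaries in different rows are not collapsed before averaging; the slides do not specify how duplicates are handled.

```python
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier such as a column name."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Quote an SQL string literal."""
    return "'" + value.replace("'", "''") + "'"


def avg_floor_by_category():
    """Return (category, average salary floor) pairs for toy univ-B data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE salInfo (category TEXT, cs INTEGER, math INTEGER,
                              ece INTEGER);
        INSERT INTO salInfo VALUES ('Prof', 72, 65, 78),
                                   ('Assoc Prof', 65, 54, 69);
        """
    )
    # Instantiate the attribute variable D; D <> 'category' is enforced
    # by skipping that column.
    attrs = [row[1] for row in conn.execute("PRAGMA table_info(salInfo)")]
    branches = [
        f"SELECT {quote_literal(a)} AS D, category AS Tcategory, "
        f"{quote_ident(a)} AS TD FROM salInfo"
        for a in attrs
        if a != "category"
    ]
    conn.execute("CREATE VIEW JVIT AS " + " UNION ALL ".join(branches))
    # The aggregation itself is plain SQL over JVIT.
    return conn.execute(
        "SELECT Tcategory, avg(TD) FROM JVIT "
        "GROUP BY Tcategory ORDER BY Tcategory"
    ).fetchall()
```

For the two rows shown, the Prof average is (72 + 65 + 78) / 3 and the Assoc Prof average is (65 + 54 + 69) / 3, both in thousands.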