CSE 636 Data Integration
SchemaSQL Implementation

Architecture

[Figure: architecture diagram. The federation user submits a SchemaSQL query to the SchemaSQL server. The server sends optimized local queries Q1 … Qn to DBMS1 … DBMSn and collects answer(Q1) … answer(Qn). It then passes a final series of SQL queries to the resident SQL engine, which computes the final answer returned to the user.]

SchemaSQL Server

• Maintains a Federation System Table (FST)
  – FST(db-name, rel-name, attr-name)
  – Names of the databases, relations and attributes in the federation
• Compiles the instantiations of the variables in the query
• Enforces conditions, groupings, aggregations and mergings

Query Processing: Fixed Output Schema

Phase 1
• For each set of variable declarations in the FROM clause, create VITs using one or more SQL queries against the local databases and/or the FST
  – VIT: Variable Instantiation Table, whose schema consists of all the variables in one or more variable declarations in the FROM clause

Phase 2
• Rewrite the original SchemaSQL query against the federation into an "equivalent" query against the set of VIT relations, and compute it using the resident SQL server

Example

    SELECT RelC, C.salFloor
    FROM   univ-C-> RelC, univ-C::RelC C, univ-D::salInfo D
    WHERE  RelC = D.dept
      AND  C.salFloor > D.technician
      AND  C.category = 'technician'

univ-C: cs
    category    salFloor
    Prof        74K
    Assoc Prof  62K
    …           …

univ-C: math
    category    salFloor
    Prof        67K
    Assoc Prof  56K
    …           …

univ-D: salInfo
    dept   Prof   Assoc Prof   Asst Prof   …
    cs     72K    65K          78K         …
    math   65K    54K          69K         …
    …      …      …            …           …

Example – Phase 1

• VITRelC(RelC):

    SELECT rel-name AS RelC
    FROM   FST
    WHERE  db-name = 'univ-C'

• VITC(RelC, CsalFloor):
  1. Retrieve the relation names from the resident SQL engine:

    SELECT RelC FROM VITRelC

  2. If {r1, …, rn} is the answer in step 1, then VITC is computed by the following SQL query to univ-C:

    SELECT 'r1' AS RelC, salFloor AS CsalFloor
    FROM   r1
    WHERE  category = 'technician'
    UNION
    …
    UNION
    SELECT 'rn' AS RelC, salFloor AS CsalFloor
    FROM   rn
    WHERE  category = 'technician'

• VITD(Ddept, Dtechnician):

    SELECT dept AS Ddept, technician AS Dtechnician
    FROM   salInfo

Resulting VITs:

VITRelC
    RelC
    cs
    math
    …

VITC
    RelC   CsalFloor
    cs     42K
    math   46K
    …      …

VITD
    Ddept  Dtechnician
    cs     72K
    math   65K
    …      …

Example – Phase 2

The Joined Variable Instantiation Table (JVIT) is the (natural) join of the VITs generated during Phase 1, restricted by the conditions of the WHERE clause.

  1. Define the JVIT:

    CREATE VIEW JVIT(RelC, CsalFloor, Ddept, Dtechnician) AS
    SELECT VITRelC.RelC, VITC.CsalFloor, VITD.Ddept, VITD.Dtechnician
    FROM   VITRelC, VITC, VITD
    WHERE  VITRelC.RelC = VITD.Ddept
      AND  VITC.CsalFloor > VITD.Dtechnician
      AND  VITRelC.RelC = VITC.RelC

  2. Compute the answer:

    SELECT RelC, CsalFloor
    FROM   JVIT

Example – Phase 2 (Aggregation)

Q: Find the average salary floor across all departments for each employee category in database univ-B

    SELECT T.category, avg(T.D)
    FROM   univ-B::salInfo-> D, univ-B::salInfo T
    WHERE  D <> 'category'
    GROUP BY T.category

univ-B: salInfo
    category    cs     math   ece    …
    Prof        72K    65K    78K    …
    Assoc Prof  65K    54K    69K    …
    …           …      …      …      …

Aggregation after Phase 2:

    SELECT Tcategory, avg(TD)
    FROM   JVIT
    GROUP BY Tcategory

References

1. L. V. S. Lakshmanan, F. Sadri, I. N. Subramanian: SchemaSQL – A Language for Interoperability in Relational Multi-database Systems. VLDB, 1996
2. L. V. S. Lakshmanan, F. Sadri, S. N. Subramanian: SchemaSQL – An Extension to SQL for Multidatabase Interoperability. TODS, 2001
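
Sketch: two-phase evaluation of the univ-C / univ-D example

The two phases of the fixed-output-schema example can be tried out with a small Python sketch. It is an illustration under stated assumptions, not the SchemaSQL server's actual implementation: in-memory SQLite databases stand in for univ-C, univ-D and the resident SQL engine; identifiers use underscores (db_name, rel_name) because hyphens are not valid in plain SQL names; salary floors are integers in thousands; and the technician figures in univ-D (40, 50) are made up so that the join returns a row. The helper names (make_local_dbs, build_fst, phase1, phase2, run_example) are hypothetical. DISTINCT is added when computing VITRelC because this FST holds one row per attribute.

```python
import sqlite3


def make_local_dbs():
    """Hypothetical stand-ins for the local databases univ-C and univ-D."""
    univ_c = sqlite3.connect(":memory:")
    univ_c.executescript("""
        CREATE TABLE cs   (category TEXT, salFloor INTEGER);
        CREATE TABLE math (category TEXT, salFloor INTEGER);
        INSERT INTO cs   VALUES ('Prof', 74), ('technician', 42);
        INSERT INTO math VALUES ('Prof', 67), ('technician', 46);
    """)
    univ_d = sqlite3.connect(":memory:")
    univ_d.executescript("""
        CREATE TABLE salInfo (dept TEXT, Prof INTEGER, technician INTEGER);
        -- technician values are invented for this sketch
        INSERT INTO salInfo VALUES ('cs', 72, 40), ('math', 65, 50);
    """)
    return {"univ-C": univ_c, "univ-D": univ_d}


def build_fst(resident, local_dbs):
    """Fill FST(db_name, rel_name, attr_name) by inspecting each local catalog."""
    resident.execute(
        "CREATE TABLE FST (db_name TEXT, rel_name TEXT, attr_name TEXT)")
    for db_name, conn in local_dbs.items():
        rels = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        for rel in rels:
            attrs = conn.execute(
                "SELECT name FROM pragma_table_info(?)", (rel,)).fetchall()
            resident.executemany(
                "INSERT INTO FST VALUES (?, ?, ?)",
                [(db_name, rel, attr[0]) for attr in attrs])


def phase1(resident, local_dbs):
    """Create VITRelC, VITC and VITD in the resident engine."""
    # VITRelC comes from the FST alone.
    resident.execute("""
        CREATE TABLE VITRelC AS
        SELECT DISTINCT rel_name AS RelC FROM FST WHERE db_name = 'univ-C'
    """)
    rel_names = [row[0] for row in resident.execute(
        "SELECT RelC FROM VITRelC ORDER BY RelC")]

    # VITC: one SELECT per relation name, combined with UNION, sent to univ-C.
    union_sql = " UNION ".join(
        f'SELECT ? AS RelC, salFloor AS CsalFloor FROM "{rel}" '
        "WHERE category = 'technician'"
        for rel in rel_names)
    vitc_rows = local_dbs["univ-C"].execute(union_sql, rel_names).fetchall()
    resident.execute("CREATE TABLE VITC (RelC TEXT, CsalFloor INTEGER)")
    resident.executemany("INSERT INTO VITC VALUES (?, ?)", vitc_rows)

    # VITD: a single query against univ-D.
    vitd_rows = local_dbs["univ-D"].execute(
        "SELECT dept AS Ddept, technician AS Dtechnician FROM salInfo"
    ).fetchall()
    resident.execute("CREATE TABLE VITD (Ddept TEXT, Dtechnician INTEGER)")
    resident.executemany("INSERT INTO VITD VALUES (?, ?)", vitd_rows)


def phase2(resident):
    """Join the VITs into JVIT and answer the rewritten query."""
    resident.execute("""
        CREATE VIEW JVIT AS
        SELECT VITRelC.RelC, VITC.CsalFloor, VITD.Ddept, VITD.Dtechnician
        FROM   VITRelC, VITC, VITD
        WHERE  VITRelC.RelC = VITD.Ddept
          AND  VITC.CsalFloor > VITD.Dtechnician
          AND  VITRelC.RelC = VITC.RelC
    """)
    return resident.execute(
        "SELECT RelC, CsalFloor FROM JVIT ORDER BY RelC").fetchall()


def run_example():
    local_dbs = make_local_dbs()
    resident = sqlite3.connect(":memory:")
    build_fst(resident, local_dbs)
    phase1(resident, local_dbs)
    return phase2(resident)


if __name__ == "__main__":
    print(run_example())
```

With the invented data, only cs qualifies: its technician salary floor in univ-C (42) exceeds the cs technician figure in univ-D (40), while math (46 versus 50) does not. The relation names spliced into the UNION query come from the FST, not from user input; a real server would still need to validate or quote them carefully.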