Contents: Heterogeneous DBSs • Motivation – Applications for Heterogenous Database Systems (HDBS) • What is a HDBMS? • Architectures for HDBS • Main Problems: - Defining a Global Data Model - Query Processing & Optimization - Transaction Management • Summary and Conclusion Heterogeneous / Federated / Multi-Database Systems Dr. Denise Ecklund 9 October 2002 Pensum: Garcia-Molina/Ullman/Widom: Section 20.1, 20.2, 20.3 and this set of presentation slides. HDBMS-1 ©2002 Vera Goebel & Denise Ecklund Applications – Similar data Heterogeneous Database Systems (HDBS) Multi-product Customer Support • Multitude of extensive, isolated data agglomerations managed by different DBMSs or file systems Global application Cust. Support Billing Accting Billing Deliveries. CIS Electricity CIS Nat. Gas CIS Oil – Dissimiliar data Global application Cust. Support Cust. Support Billing Suppliers • Ex: 3 Customer Info Systems HDBMS-2 ©2002 Vera Goebel & Denise Ecklund HDBS integration layer HDBS Metadata • Ex: Extended CAD Application • Extension of data and management software because of new and/or extended applications DBMS 1 Design Simulation Creation. • Heterogeneous application domains (e.g., CIM, CAD, Biz-mgmt, …) ©2002 Vera Goebel & Denise Ecklund Local application Extended CAD App CAD Parts Library Line Payment Analysis Equip Accting Inven. Supplier Parts DB Manufac. DB HDBMS-3 Inventory ©2002 Vera Goebel & Denise Ecklund DBMS 2 Accounts DBMS 3 Shipping HDBMS-4 Requirements for HDBS Definition - Heterogeneous DBS (HDBS) • Properties known from homogeneous DBS: - global data model, transactions, recovery, dist transparency, ... A HDBS comprises a software layer (integration layer) and multiple DBSs and/or file sytems to be integrated. • Integration of Heterogeneous Data Stores -> queries across HDBs (combine heterogeneous data) -> heterogeneous information structures -> avoid redundancy (data and data model) -> access (query) language transparency Users can transparently access the integrated DBSs and/or file systems via the interface provided by the integration layer. Defines a global data model Supports a Data Definition Language (DDL) Supports a Data Manipulation Language (DML) Distributed Transaction Management Transparent integration of the underlying, disparate DBSs • “Open” system support for integration of existing data models and DBSs, as well as their schemas and DBs The integrated, local DBSs are autonomous and can also be used as stand-alone systems. Local applications are unchanged and unknown to the HDBS. • Constraints -> retain autonomy of DBS to be integrated -> avoid modifications of existing local applications -> define a viable global data model for global applications HDBMS-5 Layers of Transparency Abstraction Levels [Christmann et al. 87] Supported By Objects access & data model lang global conceptual schema Access Language Transparency fragmentation transparency fragmentation schema fragments of rels/objs Data Modelling Language Transparency replication transparency replication schema multiple copies of fragments of rels/objs network transparency remote communication services logical data independence local conceptual schema Data Replication Transparency Network/Distribution Transp. Data Independence Data physical data independence file system Single site DBMS Homogeneous Distributed DBMS storage and I/O system Heterogeneous DBMS HDBMS-7 ©2002 Vera Goebel & Denise Ecklund relations or objects remotely located multiple copies of fragments local relations/objects physical schema records, access paths file definitions and buffer management physical records, pages disk storage definitions tracks, physical blocks Global Abstractions Abstraction Level Data Fragmentation Transparency ©2002 Vera Goebel & Denise Ecklund HDBMS-6 ©2002 Vera Goebel & Denise Ecklund Local Abstractions ©2002 Vera Goebel & Denise Ecklund HDBMS-8 DBMS Implementation Alternatives Distribution Global application Distributed Homog. Federated DBMS Cl ien Centralized Homogeneous DBMS Centralized Heterogeneous DBMS Centralized Homog. Federated DBMS Centralized Multi-DBMS integration layer HDBS Metadata Autonomy Local application DBMS 1 Heterogeneity Inventory HDBMS-9 ©2002 Vera Goebel & Denise Ecklund Components of a Multi-DBMS Accounts Shipping HDBMS-10 Components of a Distributed Multi-DBMS System Responses USER DBMS Query Processor DBMS Scheduler Scheduler Recovery Manager Recovery Manager Runtime Support Processor Runtime Support Processor • • Multi-DBMS Layer Query Processor Transaction Manager Transaction Manager ••• DBMS Transaction Manager ••• User Requests System Responses Multi-DBMS Layer Query Processor Transaction Manager USER User Requests System Responses User Requests Multi-DBMS Layer Scheduler DBMS 3 ©2002 Vera Goebel & Denise Ecklund USER ©2002 Vera Goebel & Denise Ecklund DBMS 2 Centralized Heterog. Multi-DBMS Centralized Heterog. Federated DBMS Query Processor A Multi-Database or a Federated Database System HDBS Distributed Heterog. Multi-DBMS Distributed Heterog. Federated DBMS Distributed Heterogeneous DBMS Global application Distributed Multi-DBMS t-S er ve r Distributed Homogeneous DBMS Heterogeneous Database Systems (HDBS) Scheduler DBMS Query Processor … DBMS Query Processor Transaction Manager Scheduler DBMS Transaction Manager ••• Scheduler Recovery Manager Recovery Manager Recovery Manager Recovery Manager Runtime Support Processor Runtime Support Processor Runtime Support Processor Runtime Support Processor • • • • Multi-DB Integration layers act as peers in a homogeneous distributed database system - Use the global data model and global access language - Users submit queries to any Multi-DB site - Distributed control over transaction execution HDBMS-11 ©2002 Vera Goebel & Denise Ecklund HDBMS-12 HDBS Architecture Global application Abstract Component Architecture of HDBS Global application HDBS (federation) global integration layer HDBS Metadata Export Schema1 DB-model-specific coupling software Local application DBMS 1 DBMS 2 ... DBMS n Export Schema2 Export Schema3 DB 1 local system 1 DB 2 DB n local system 2 local system n DBMS 1 DBMS 2 DB 1 DB 2 ... DBMS n HDBMS-13 Toolkits for HDBMS – an implementation approach DB n HDBMS-14 ©2002 Vera Goebel & Denise Ecklund Heterogeneous Database Systems (Fully-autonomous HDBS) Global application Global application HDBS Multi-DB Layer integration layer HDBS Metadata DBS T1 DBS T2 DBMS 1 DBS T3 DBS T4 DBS T5 DBMS 1 DBMS 4 Export Schema1 DBMS 5 DB 1 DB 1 ©2002 Vera Goebel & Denise Ecklund local DBSs Coupling software can be partitioned into processes (or agents) that execute on HDBMS hosts and on local DB hosts. ©2002 Vera Goebel & Denise Ecklund Integration Toolkit global integration layer DBMS software of HDBS HDBMS Metadata DB 4 DBMS 2 Export Schema2 DB 2 DBMS 3 Local application Export Schema3 DB 3 HDBS Server or HDBS Proxy - Runs on the local DB site - Typically includes some code that is specific to the local DB type DB 5 HDBMS-15 ©2002 Vera Goebel & Denise Ecklund HDBMS-16 Information Integration Architecture “Multiple, legacy data sources” CORBA Objects for HDBS – an implementation approach Wrapper #1 Create & Exec Call Sequence Information Mediator .. . Query Global Data Dictionary client c client b LAI 1 Like the Integration Layer Legacy Data Source #1 Like the HDBMS Proxy Wrapper #2 Query Web Browser client a Local Data Dictionary Convert & Return Results as Tuples Decompose Query Manage Query Exec Use distributed object managers (DOMs) to realize HDBSs -> CORBA Parse SubQuery Parse SubQuery DOM 3 LAI - local application interface DOM – distributed object manager DOM 4 Compute Final Results Create & Exec Call Sequence Local Data Dictionary Data Source X Convert & Return Results as Tuples DOM 1 DOM 2 LAI 2 LAI 3 Data Source Y Legacy Data Source #2 ©2002 Vera Goebel & Denise Ecklund HDBMS-17 ©2002 Vera Goebel & Denise Ecklund Concepts in the Integration Layer Data Model • Global data model • Global schema and meta data management • Local data models: any kind of data model possible, e.g., object-oriented, relational, entity-relationship, hierarchical, network-oriented, flat files, ... • Distributed query processing and optimization HDBMS-18 • Global data model: must comprise modeling concepts and mechanisms to express the features of the local data models • Distributed transaction management – When integrating N local data models, use the “richest” model of the N models you are integrating – Object-oriented data models • Extensible software construction (to allow the “easy” integration of additional system components) • Provide user-defined data types and methods • Are often used as the global (integration) data model Goals - To define a data model that: 1) Is a complete, minimal, and understandable data model for the union of the data stored in the set of local data bases (application development time) 2) Support application queries that can be satisfied by retrieving data from the set of local data bases(application runtime) ©2002 Vera Goebel & Denise Ecklund HDBMS-19 ©2002 Vera Goebel & Denise Ecklund HDBMS-20 Schema Architecture of HDBS global data model global/federated schema schema integration export schema 1 ... 5-layer schema architecture Schema Architecture of HDBS - 2 external schema ... ... federated schema federated schema ... ... global data model export schema ... ... local data models local schema n HDBMS-21 ©2002 Vera Goebel & Denise Ecklund Global View Defn • Schema Translation • Ex: a Relational schema to an Object-oriented schema – For N translated, local schemas component schema local data models local schema ... component schema Translation ... local schema HDBMS-22 ©2002 Vera Goebel & Denise Ecklund Comp Pkg – Different names for equivalent entities, attributes, relationships, etc. – Same name for different entities, attributes, … – Map each local schema to the language of the global data model • Schema Integration global data model Schema Conflicts • Name Schema Homogenization • Structure Adequate design tools are not available – Missing attributes – Missing but implicit attributes • Pairwise integration, X-at-a-time integration, One-step integration – One-to-many, many-to-many • Entity versus Attribute (inclusion) – One attribute or several attributes – Make the ”same things” be ”one thing” in the integrated schema • Behavior – Resolve conflicts – Different integrity constraints • Ex: automatic update, delete a project when the last engineer is moved to another project • structural and semantic HDBMS-23 ©2002 Vera Goebel & Denise Ecklund salary earns name name title • Relationship – Determine ”common semantics” of the schemas Integration export schema Multiple Views local schema 1 ©2002 Vera Goebel & Denise Ecklund auxiliary schema Multi-Use export schema homogenization App View Defn Multi-lingual auxiliary schema export schema n external schema external schema Emp N works-on M Proj C1 rank Engr N works-in 1 Cost Center C2 Name (as an attribute) Same Info Fname Lname Nickname Init Name (as an entity) HDBMS-24 Data Representation Conflicts Suitability of Object-Oriented Data Models as Global Data Models • Rich set of type constructors -> easy representation of other data models • Different representation for equivalent data – Different units • Celsius ↔ Farenheit; Kilograms ↔ Pounds; Liters ↔ Gallons; • Extensibility (user-defined types + type specific operators) & Encapsulation -> representation of “foreign” types/systems -> hiding heterogeneity (concrete storage) in a natural way – Different levels of precision • 4 decimal digits versus 2 decimal digits • Floating point versus integer – Different expression denoting same information • Enumerated value sets that are not one-to-one • Inheritance (generalization) & computational completeness -> schema integration - factor out common properties of similar types - thereby “arbitrary” computations possible – {good, ok, bad} versus {one, two, three, four, five} How to Resolve Schema Conflicts? Can Object-Oriented Models Help? HDBMS-25 ©2002 Vera Goebel & Denise Ecklund Use of Generalization & Computational Completeness (Example) class Person ( name: string, address: Address) is_a method net-income(): float; is_a global data model HDBMS-26 ©2002 Vera Goebel & Denise Ecklund Conflict Resolution • Renaming entities and attributes – Pick one name for the same things – Use unique prefixes for different things • Homogenizing representations – Use conversions and mappings class Faculty ( salary: float, course-given: set (Courses), tax-rate: float) method net-income (): float return (self->salary * (1-self->tax-rate)); DBS1 class Faculty name: string, address: Address, salary: float, course-given: set (Courses); ©2002 Vera Goebel & Denise Ecklund • stored programs in relational systems • methods in OO systems • auxiliary schemas to store conversion rules/code class Student ( grant: float, course-enroll: set (Courses)) method net-income (): float return (self->grant); • Homogenizing attributes – Use type coercion (e.g., integer to float) – Attribute concatenation (e.g., first name || last name) – For missing attributes, assign default values • Homogenizing an attribute and an entity DBS2 class Student name: string, address: Address, grant: float, course-enroll: set (Courses); D-Name D-Name Emp N Member-of 1 Engr N Member-of 1 Dept Dept – Extract an attribute from the entity local data models HDBMS-27 • Ex: Project department name from the Dept entity to create a virtual attribute (e.g., Emp->Dept.D-Name) Bldg – Create an entity from the attribute • Ex: Define default values and behavior for all other attributes of the Dept entity ©2002 Vera Goebel & Denise Ecklund … D-Name Bldg D-Name … HDBMS-28 Conflict Resolution • Horizontal joins Conflict Resolution involving a Database Key 1 2 3 – Union compatible • For missing attributes, assign default values or compute implicit values – Extended union compatible • Use generalization 1 2 3 4 5 A B C • Entity-Attribute Conflicts where the Attribute is a DB key in one local schema A B C Union 4 5 Rel • Subclasses of the generalization – class Person generalizes class Student and class Employee • Vertical joins – Many and many to one • Mixed Joins 1 2 3 4 5 1 2 3 A B C 1 2 3 Join A D E F 1 2 A B CDEF Union A C D Join Join 1 2 – Vertical and horizontal joins in combination 1 2 A B C E F HDBMS-29 ©2002 Vera Goebel & Denise Ecklund Conflict Resolution involving a Partial Database Key • Entity-Attribute Conflicts where the Attribute is a partial DB key in one local schema Attr1 … AttrN • If Attr1 is a complete DB key in LDB2, then in the global schema – Define entities E and D and relationship Rel – Define a new DB key attribute that will be used to uniquely identify instances of LDB2-E when they are accessed through GDB-E and GDB-D ©2002 Vera Goebel & Denise Ecklund GDB-E N Rel 1 GDB-D N-key Attr1 … AttrN HDBMS-30 Key2 • HDBS manages the global schema = ∑ (all local exported schema) N LDB2-E Rel • Global schema definition facilities provide mechanisms for handling the full spectrum of schematic differences that may exist among the heterogeneous local schemata. 1 • Example: – The global schema defines Attr1 as an entity – Attr1 is a partial DB key for instances of LDB2-E LDB1-D D Global Schema Management Attr1 LDB1-E 1 • Example: dfv – The global schema defines Attr1 as an entity – Attr1 is a DB key for instances of LDB2-E • See earlier example LDB2-E N A B – Define a virtual class containing common attributes – Provide specialized values and compute attribute values for generalized attributes Attr1 LDB1-E LDB1-D D Attr1 … AttrN – Can use an Auxiliary Schema to store mappers, translators, and converters GDB-E N • If Attr1 is a partial DB key in LDB2 – Define the entities E and D, and relationship Rel – Define a new attribute as a partial DB key – Add the other partial key attributes from LDB2 as partial keys • Data is stored in the local component systems. Rel 1 GDB-D N-key Key2 Attr1 … AttrN • Global dictionary information is used to query and manipulate the data. The global language statements are translated into equivalent statements of the local languages supported by the local systems – Add partial DB key LDB2-Attr1 as an attribute only ©2002 Vera Goebel & Denise Ecklund HDBMS-31 ©2002 Vera Goebel & Denise Ecklund HDBMS-32 Query Planning and Optimization in a Distributed Multi-DBMS Query Processing and Optimization • The HDBMS has – A global Data Definition Language (DDL) – A global Data Manipulation Language (DML) – A set of local DMLs HDBMS-33 Global Query on Multiple Databases at Multiple Sites SQ 3 query translator 1 query translator 2 query translator 3 TQ 1 TQ 2 TQ 3 ... DB 2 DB 3 ... PQ k query ... translator n TQ n Sorting and unioning result data Joining intermediate results DB n ©2002 Vera Goebel & Denise Ecklund HDBMS-34 – Heterogeneous Data Definition Languages Data Directory & Export Schema • Weaker modeling languages do not support the same manipulation “features” • Must use multiple techniques in order to define a consistent global data model • Query fragmentation must produce a set of subqueries that reverse the operations used to create/define the global schema { Post-processing Queries } { Subqueries, each on a single local DBMS } Export & Aux Schema • Processing Steps: { Queries, that can be processed by local DBMS } Decomposition & Local Optimization ... • Little information about “how” the subquery will be executed by the Local DBS { Subqueries, each on a single Multi-DB } Local DBMS PQ 1 – Autonomy { Post-processing Queries } Translation SQ n • Similar to query fragmentation problem for homogeneous distributed DBSs • But …Complicating factors: Data Allocation Fragmentation & Global Opt ... SQ 2 Query Fragmentation Information Supporting Query Planning & Optimization Multi-DB Manager Localized multi-DB query m Another Multi-DBMS SQ 1 DB 1 Localization ... query fragmentation and global optimization – Given a query stated in the global query language (DML), execute that query, in an optimal manner, using the local database management systems Control Site query localization Localized multi-DB query 1 • The HDBMS Query Processing Goal: ©2002 Vera Goebel & Denise Ecklund global query (1) Replace names from the global schema with “fullnames” from the export schemas (2) If a subquery involves multiple export schemas, then break the query into queries that operate on one export schema and insert data communication operators to exchange intermediate results between local database systems Local Schema & Access Paths Optimized Local Execution Plan ©2002 Vera Goebel & Denise Ecklund HDBMS-35 ©2002 Vera Goebel & Denise Ecklund HDBMS-36 Global Query Optimization Post-Processing Strategies • Similar to global query optimization for homogeneous distributed DBSs (many algorithms can be used directly) • But only possible under the following assumptions: – No data inconsistency (the global schema correctly represents the semantics of disjoint, overlapping, and conflicting data) – Know the characteristics of local DBSs • e.g., statistical info on data cardinalities and selectivities are available – Can transfer partial data results between different local DBSs • Major impact on post-processing plans 2) Control site performs I&PP-ops for multi-DB results; Multi-DB managers, and HDBMS agents on the local database sites perform I&PP-ops for DBSs within one multi-DB environment • Possible if local DBMS can read intermediate results from external sources, and sort, join, etc. can be directly invoked HDBMS-37 ©2002 Vera Goebel & Denise Ecklund Parallel Execution Strategies HDBMS-38 ©2002 Vera Goebel & Denise Ecklund Global Cost Estimation • Join operations are slow → speedup with parallel execution? • Traditional query plans use left linear join trees – One of the operands is always a base relation • Have good info on cardinality and selectivity for the base R5 R4 R3 R1 • Differs from cost estimation in homogeneous distributed DBSs – Little (or no) info on QP algorithms in local DBSs and data statistics • Cost Estimation Function – Cost to execute each subquery on the local DBSs – Cost to execute all I&PP-ops • via pushdown or by any HDBMS agent or proxy • Use a simplified cost function R2 Cost = Initialization cost + cost to retrieve a set of objects + cost to process a set of objects • Bushy join trees provide parallel execution in heterogenous multi-DB environments ©2002 Vera Goebel & Denise Ecklund • Heavy work load; minimal parallelism 3) Use strategy #2 and use “pushdown” to get the local database systems to perform I&PP-ops – Post-processing Strategy – Parallel Execution Possibilities – Global Cost Function/Estimation – Convert a left linear join tree into a (balanced?) bushy join tree 1) Control site performs all intermediate and post-processing operations (I&PP-ops) • Better work load balance; more parallelism • Primary Considerations: – Used even in homogeneous distributed DBSs because cooperative nodes can pipeline the sequence of joins • Three Strategies: • Run test queries on the local DBSs to get time estimates for ops R5 R1 R2 R3 R4 HDBMS-39 – Selection, with and without an index – Join (testing for different algorithms: sort, hash, or indexed based algs) ©2002 Vera Goebel & Denise Ecklund HDBMS-40 Query Translation Local Data Sources with Heterogeneous Capabilities When a query language of a local DBS is different from the global query language, each export schema subquery for the local DB needs to be translated from the global language to the target language. Object-oriented (local) Relational (local) Object-oriented (global) Hierarchical (local) Network-oriented (local) Relational (global) ... Weaker target languages do not support the same operations, so emulate required operations in post-processing Ex: retrieve more data than requested by the query OQL and then post-process that data to compute the correct response to the query QUEL Reduce the number of language mappings using the Entity-Relationship Query Language as an intermediary language • ”GenCompact” defines a simple language to describe query capabilities of the local DBSs • Generates the ”best” query plan for each local DBS • The ”best” query plan is – Feasible (i.e., can be executed by the local DBS) – Provides nearly optimal performance SQL ERQL CODASYL Access Funcs ©2002 Vera Goebel & Denise Ecklund DB/2 Func I/F HDBMS-41 Restrictions on Query Processing Capabilities • The approach: – Generate a large set of possible query plans – Efficiently select best plan, quickly pruning bad plans Simple Source Description Lanuage - SSDL • Example database of cars for sale • Condition-Attribute restrictions – Attributes: {make, model, year, color, price} – Can not select on attribute A1 – Must specify a selection value for attribute A2 • SSDL definition of query processing capabilities for one local DBS • Condition-Expression-Size restriction _s → _s1 | _s2 _s1 → make = $m ٨ price < $p – Can not specify more than k selection conditions _s2 → make = $m ٨ color = $c • Condition-Expression-Structure restrictions } Defines 2 forms of supported queries attributes :: _s1 : {make, model, year, color} – Allow only atomic conditionals (i.e., no ٨ or ٧) – Allow only ”and-ed” conditions (i.e., no ”or” operators) – Combination of restrictions on using ٨ and ٧ ©2002 Vera Goebel & Denise Ecklund HDBMS-42 ©2002 Vera Goebel & Denise Ecklund HDBMS-43 attributes :: _s2 : {make, model, year} ”_” $p indicates a variable/non-terminal indicates an integer constant ©2002 Vera Goebel & Denise Ecklund } Defines valid output attributes for each supported query form $m indicates a string constant $c indicates a string constant from an enumerated set HDBMS-44 The Global Query as a Condition Tree Generating Alternate Plans DB1 cannot evaluate condition C2 An alternate query plan is: Global Query Post Processing Select A Where LQ1 and LQ2 Local Query LQ1 Select A Where C1 and C2 C1(A1) C2(A2) • Obvious Query Plan: – DB1 evaluates LQ1 – DB2 evaluates LQ2 – PP joins the two results Global Query Post Processing Select A Where IP1 and LQ2 Local Query LQ2 Select A Where C3 or C4 C4(A4) Intermediate Processing IP1 Select A Where C2(LQ1.A2) DB1 cannot evaluate condition C2? DB2 cannot evaluate condition (C3 or C4)? Local Query LQ1 Select A, A2 Where C1 C3(A3) What if ? C1(A1) HDBMS-45 ©2002 Vera Goebel & Denise Ecklund Generating Alternate Plans Rewrite Rewrite Rules Intermediate Processing IP2 Select A Where C3(LQ2.A3) or C4(LQ2.A4) ©2002 Vera Goebel & Denise Ecklund C2(A2) C4(A4) HDBMS-46 ©2002 Vera Goebel & Denise Ecklund Query as a condition tree (CT) Global Query Post Processing Select A Where LQ1 and IP2 C1(A1) C3(A3) GenCompact Architecture DB2 cannot evaluate the condition (C3 or C4) An alternate query plan is: Local Query LQ1 Select A Where C1 and C2 Local Query LQ2 Select A Where C3 or C4 equiv CTs Mark marked CTs Generate feasible query plans SSDL descr of LDBs Cost Best query plan Cost Model • Rewrite - Commutativity, associativity, distribution, and copy rules • Mark - Tests executability on the applicable LDBs • Generate - Creates alternate query plans by inserting intermediate processing steps Local Query LQ2 Select A, A3, A4 Where { } • Cost - Applies a cost function to quickly select the ”best plan” HDBMS-47 ©2002 Vera Goebel & Denise Ecklund HDBMS-48 Schema SQL – A Natural Extension to SQL • Manipulate data and schema using one language What if . . . ABCD all your local systems are relational, have similar processing capabilities, but use different schemas? – – – – HDBMS-49 Set of attribute names in relation rel in the database db RatePerKwH Name Address Name ServiceAddr BillingAddr RateCategory Name ServiceAddr BillingAddr RateCategory RateCategory RateCategory RatePerCuM Fees Oil Database DelN DelS DelE DelW • A valid SchemaSQL variable is of the form <range> <var> ©2002 Vera Goebel & Denise Ecklund HDBMS-50 Home Biz Industry Rates Set of relation names in the database db Set of values in the columns named attr in the relation rel in the database db One query accessing multiple databases Queries over the schema definitions themselves Runtime restructuring of views and output schemas Aggregation operations over rows, columns and selected blocks Natural Gas Database Set of database names in the federation db::rel.attr DB3 Meta-data ©2002 Vera Goebel & Denise Ecklund Meaning Set of tuples in the relation rel in the database db DB3 DB1 Meta-data SchemaSQL – Examples: 3 CIS Databases Electricity Database CustInfo Name Address1 Address2 CustType • Extends the variables and ranges defined by SQL • Valid SchemaSQL ranges are: db::rel DB2 Meta-data DB1 • SchemaSQL supports SchemaSQL Syntax Definitions Symbol → db → db::rel → Select A,B,C,D From Q, R, S, T, U, V ... Maybe you can use SchemaSQL to write and execute your global queries! ©2002 Vera Goebel & Denise Ecklund DB2 SchemaSQL query HDBMS-51 Name DeliveryAddr MailingAddr TankCapacity DeliveryFreq PriceRate Name DeliveryAddr MailingAddr TankCapacity DeliveryFreq PriceRate Name DeliveryAddr MailingAddr TankCapacity DeliveryFreq PriceRate Name DeliveryAddr MailingAddr TankCapacity DeliveryFreq PriceRate ©2002 Vera Goebel & Denise Ecklund HDBMS-52 Global Schema for the Federated CIS Database SchemaSQL – Example #1 Electricity Database Electricity Nat Gas Elec Meta-data Oil Nat Gas Meta-data CustInfo Name Address1 Address2 CustType RatePerKwH Oil Meta-data Federated CIS Database CustInfo Name BillAddr ServAddr CustType FuelType ... Federated CIS Database CustInfo Rates OilDel Acct# Name RateCategory Acct# BillAddr FuelRate DelAddr ServAddr CustType TaxRate GovFees TankCapacity FuelType RateCategory from DelSchedule HDBMS-53 ©2002 Vera Goebel & Denise Ecklund SchemaSQL – Example #2 Address RateCategory Name ServiceAddr BillingAddr RateCategory Name ServiceAddr BillingAddr RateCategory RatePerCuM create view LowRatePayers::CInfo(Name, FuelType, Rate) select NameRel.Name, DBname, RateValue from → DBname, DBname→ NameRel, DBname→ RateRel, DBname::RateRel→ RateAttr, DBname::RateRel.RateAttr RateValue where (Dbname = ”Natural Gas” and NameRel <> ”Rates” and RateRel = ”Rates” and NameRel.RateCategory = RateRel.RateCategory and RateRel.RateValue < 0.035) or (NameRel = RateRel and ((RateAttr = ”RatePerKwH”) or (RateAttr = ”PriceRate”) or (RateAttr = ”RatePerCuM”)) and RateValue < 0.035) Fees Federated CIS Database CustInfo Name CustType FuelType RateCategory ... create view NGtoFed::CustInfo(Name, CustType, FuelType, RateCategory) select from where Rel.Name, Rel, DBname, Rel.RateCategory → DBname, DBname→ Rel, Rel <> ”Rates” ©2002 Vera Goebel & Denise Ecklund HDBMS-54 ©2002 Vera Goebel & Denise Ecklund Problem: Create a view containing the ”Low Rate Payers” from all the federated CIS databases (less than 0.035 for each unit of fuel). Name RateCategory → DBname, DBname::CustInfo Rel, SchemaSQL – Example #3 Natural Gas Database Home Biz Industry Rates create view ElectoFed::CustInfo(Name, BillAddr, ServAddr, CustType, FuelType) select Rel.Name, Rel.Address2, Rel.Address1, Rel.CustType, DBname HDBMS-55 ©2002 Vera Goebel & Denise Ecklund Variable Dbname NameRel & RateRel RateAttr Values Electricity Natural Gas Oil CustInfo Home … DelN … Name Address1 … Name Address … Name DeliveryAddr … HDBMS-56 SchemaSQL Service Architecture Summary – Some Query Processing Tools for HDBMSs Intermediate and Post Processing Resident SQL Engine Answers Q1, Q2, ... Qn SchemaSQL Query FST PostProc Queries SchemaSQL Server RDBMS1 No local DB support Custom Serv in Global Mgr Final Answer ... Answer Q1 Answer Qn Varied local models Local DB’s QP Capability GenCompact Heterogeneity of Data Models Federation System Table (FST) – stores database names, relation names, attribute names, and statistical information on the local RDBMSs HDBMS-57 To Be Continued ... Next Week ... Transaction Management in HDBMSs ©2002 Vera Goebel & Denise Ecklund Func Intf to Simple QP DBMS-specific QP (< a DBMS) Same local models RDBMSn ©2002 Vera Goebel & Denise Ecklund GenCompact SchemaSQL No local DB support DBMS Serv in Global Mgr HL Lang Intf to DBMS-specific QP Optimized local SQL query Q2 Optimized local SQL query Q1 Federation User Varied local DB support Final Answer HDBMS-59 ©2002 Vera Goebel & Denise Ecklund HDBMS-58