Heterogeneous / Federated / Multi-Database Systems

advertisement
Contents: Heterogeneous DBSs
• Motivation
– Applications for Heterogenous Database Systems (HDBS)
• What is a HDBMS?
• Architectures for HDBS
• Main Problems:
- Defining a Global Data Model
- Query Processing & Optimization
- Transaction Management
• Summary and Conclusion
Heterogeneous / Federated /
Multi-Database Systems
Dr. Denise Ecklund
9 October 2002
Pensum: Garcia-Molina/Ullman/Widom: Section 20.1, 20.2, 20.3
and this set of presentation slides.
HDBMS-1
©2002 Vera Goebel & Denise Ecklund
Applications
– Similar data
Heterogeneous Database Systems (HDBS)
Multi-product
Customer Support
• Multitude of extensive,
isolated data agglomerations
managed by different DBMSs
or file systems
Global
application
Cust.
Support
Billing Accting
Billing Deliveries.
CIS
Electricity
CIS
Nat. Gas
CIS
Oil
– Dissimiliar data
Global
application
Cust.
Support
Cust.
Support
Billing Suppliers
• Ex: 3 Customer Info Systems
HDBMS-2
©2002 Vera Goebel & Denise Ecklund
HDBS
integration layer
HDBS
Metadata
• Ex: Extended CAD Application
• Extension of data and
management software
because of new and/or
extended applications
DBMS 1
Design
Simulation Creation.
• Heterogeneous application
domains (e.g., CIM, CAD, Biz-mgmt, …)
©2002 Vera Goebel & Denise Ecklund
Local
application
Extended CAD App
CAD Parts
Library
Line
Payment
Analysis Equip
Accting
Inven.
Supplier
Parts DB
Manufac.
DB
HDBMS-3
Inventory
©2002 Vera Goebel & Denise Ecklund
DBMS 2
Accounts
DBMS 3
Shipping
HDBMS-4
Requirements for HDBS
Definition - Heterogeneous DBS (HDBS)
• Properties known from homogeneous DBS:
- global data model, transactions, recovery, dist transparency, ...
A HDBS comprises a software layer (integration layer)
and multiple DBSs and/or file sytems to be integrated.
• Integration of Heterogeneous Data Stores
-> queries across HDBs (combine heterogeneous data)
-> heterogeneous information structures
-> avoid redundancy (data and data model)
-> access (query) language transparency
Users can transparently access the integrated DBSs and/or file
systems via the interface provided by the integration layer.
Defines a global data model
Supports a Data Definition Language (DDL)
Supports a Data Manipulation Language (DML)
Distributed Transaction Management
Transparent integration of the underlying, disparate DBSs
• “Open” system
support for integration of existing data models and DBSs,
as well as their schemas and DBs
The integrated, local DBSs are autonomous and can also be used
as stand-alone systems.
Local applications are unchanged and unknown to the HDBS.
• Constraints
-> retain autonomy of DBS to be integrated
-> avoid modifications of existing local applications
-> define a viable global data model for global applications
HDBMS-5
Layers of Transparency
Abstraction Levels [Christmann et al. 87]
Supported By
Objects
access & data model lang
global conceptual schema
Access Language Transparency
fragmentation transparency
fragmentation schema
fragments of rels/objs
Data Modelling Language Transparency
replication transparency
replication schema
multiple copies of
fragments of rels/objs
network transparency
remote communication
services
logical data independence
local conceptual schema
Data Replication Transparency
Network/Distribution Transp.
Data Independence
Data
physical data independence
file system
Single site DBMS
Homogeneous Distributed DBMS
storage and I/O system
Heterogeneous DBMS
HDBMS-7
©2002 Vera Goebel & Denise Ecklund
relations or objects
remotely located multiple
copies of fragments
local relations/objects
physical schema
records, access paths
file definitions and
buffer management
physical records, pages
disk storage definitions
tracks, physical blocks
Global Abstractions
Abstraction Level
Data Fragmentation Transparency
©2002 Vera Goebel & Denise Ecklund
HDBMS-6
©2002 Vera Goebel & Denise Ecklund
Local Abstractions
©2002 Vera Goebel & Denise Ecklund
HDBMS-8
DBMS Implementation Alternatives
Distribution
Global
application
Distributed Homog.
Federated DBMS
Cl
ien
Centralized
Homogeneous DBMS
Centralized
Heterogeneous DBMS
Centralized Homog.
Federated DBMS
Centralized
Multi-DBMS
integration layer
HDBS
Metadata
Autonomy
Local
application
DBMS 1
Heterogeneity
Inventory
HDBMS-9
©2002 Vera Goebel & Denise Ecklund
Components of a Multi-DBMS
Accounts
Shipping
HDBMS-10
Components of a Distributed Multi-DBMS
System Responses
USER
DBMS
Query
Processor
DBMS
Scheduler
Scheduler
Recovery
Manager
Recovery
Manager
Runtime Support
Processor
Runtime Support
Processor
•
•
Multi-DBMS Layer
Query
Processor
Transaction
Manager
Transaction
Manager
•••
DBMS
Transaction
Manager
•••
User Requests
System Responses
Multi-DBMS Layer
Query
Processor
Transaction
Manager
USER
User Requests
System Responses
User Requests
Multi-DBMS Layer
Scheduler
DBMS 3
©2002 Vera Goebel & Denise Ecklund
USER
©2002 Vera Goebel & Denise Ecklund
DBMS 2
Centralized Heterog.
Multi-DBMS
Centralized Heterog.
Federated DBMS
Query
Processor
A Multi-Database or
a Federated Database System
HDBS
Distributed Heterog.
Multi-DBMS
Distributed Heterog.
Federated DBMS
Distributed
Heterogeneous DBMS
Global
application
Distributed
Multi-DBMS
t-S
er
ve
r
Distributed
Homogeneous DBMS
Heterogeneous Database Systems (HDBS)
Scheduler
DBMS
Query
Processor
…
DBMS
Query
Processor
Transaction
Manager
Scheduler
DBMS
Transaction
Manager
•••
Scheduler
Recovery
Manager
Recovery
Manager
Recovery
Manager
Recovery
Manager
Runtime Support
Processor
Runtime Support
Processor
Runtime Support
Processor
Runtime Support
Processor
•
•
•
•
Multi-DB Integration layers act as peers in a homogeneous distributed database system
- Use the global data model and global access language
- Users submit queries to any Multi-DB site
- Distributed control over transaction execution
HDBMS-11
©2002 Vera Goebel & Denise Ecklund
HDBMS-12
HDBS Architecture
Global
application
Abstract Component Architecture of HDBS
Global
application
HDBS (federation)
global integration layer
HDBS
Metadata
Export
Schema1
DB-model-specific
coupling software
Local
application
DBMS 1
DBMS 2
...
DBMS n
Export
Schema2
Export
Schema3
DB 1
local
system 1
DB 2
DB n
local
system 2
local
system n
DBMS 1
DBMS 2
DB 1
DB 2
...
DBMS n
HDBMS-13
Toolkits for HDBMS – an implementation approach
DB n
HDBMS-14
©2002 Vera Goebel & Denise Ecklund
Heterogeneous Database Systems (Fully-autonomous HDBS)
Global application
Global application
HDBS
Multi-DB Layer
integration layer
HDBS
Metadata
DBS T1
DBS T2
DBMS 1
DBS T3
DBS T4
DBS T5
DBMS 1
DBMS 4
Export
Schema1
DBMS 5
DB 1
DB 1
©2002 Vera Goebel & Denise Ecklund
local
DBSs
Coupling software can be partitioned into processes (or agents)
that execute on HDBMS hosts and on local DB hosts.
©2002 Vera Goebel & Denise Ecklund
Integration
Toolkit
global
integration
layer
DBMS software of HDBS
HDBMS
Metadata
DB 4
DBMS 2
Export
Schema2
DB 2
DBMS 3
Local
application
Export
Schema3
DB 3
HDBS Server or HDBS Proxy
- Runs on the local DB site
- Typically includes some code that is specific to the local DB type
DB 5
HDBMS-15
©2002 Vera Goebel & Denise Ecklund
HDBMS-16
Information Integration Architecture
“Multiple, legacy data sources”
CORBA Objects for HDBS – an implementation approach
Wrapper #1
Create & Exec
Call Sequence
Information Mediator
..
.
Query
Global Data
Dictionary
client c
client b
LAI 1
Like the
Integration Layer
Legacy Data Source #1
Like the
HDBMS Proxy
Wrapper #2
Query
Web
Browser
client a
Local Data
Dictionary
Convert & Return
Results as Tuples
Decompose Query
Manage Query Exec
Use distributed object managers (DOMs) to realize HDBSs -> CORBA
Parse SubQuery
Parse SubQuery
DOM 3
LAI - local application interface
DOM – distributed object manager
DOM 4
Compute Final Results
Create & Exec
Call Sequence
Local Data
Dictionary
Data
Source X
Convert & Return
Results as Tuples
DOM 1
DOM 2
LAI 2
LAI 3
Data
Source Y
Legacy Data Source #2
©2002 Vera Goebel & Denise Ecklund
HDBMS-17
©2002 Vera Goebel & Denise Ecklund
Concepts in the Integration Layer
Data Model
• Global data model
• Global schema and meta data management
• Local data models: any kind of data model possible,
e.g., object-oriented, relational, entity-relationship, hierarchical,
network-oriented, flat files, ...
• Distributed query processing and optimization
HDBMS-18
• Global data model: must comprise modeling concepts and
mechanisms to express the features of the local data models
• Distributed transaction management
– When integrating N local data models,
use the “richest” model of the N models you are integrating
– Object-oriented data models
• Extensible software construction
(to allow the “easy” integration of additional system components)
• Provide user-defined data types and methods
• Are often used as the global (integration) data model
Goals - To define a data model that:
1) Is a complete, minimal, and understandable data model for the union of
the data stored in the set of local data bases (application development time)
2) Support application queries that can be satisfied by retrieving data from
the set of local data bases(application runtime)
©2002 Vera Goebel & Denise Ecklund
HDBMS-19
©2002 Vera Goebel & Denise Ecklund
HDBMS-20
Schema Architecture of HDBS
global
data model
global/federated
schema
schema
integration
export
schema 1
...
5-layer schema architecture
Schema Architecture of HDBS - 2
external schema
...
...
federated schema
federated schema
...
...
global
data model
export schema
...
...
local
data models
local
schema n
HDBMS-21
©2002 Vera Goebel & Denise Ecklund
Global View Defn
• Schema Translation
• Ex: a Relational schema to an Object-oriented schema
– For N translated, local schemas
component schema
local
data models
local schema
...
component schema
Translation
...
local schema
HDBMS-22
©2002 Vera Goebel & Denise Ecklund
Comp Pkg
– Different names for equivalent entities,
attributes, relationships, etc.
– Same name for different entities, attributes, …
– Map each local schema to the language of the global data model
• Schema Integration
global
data model
Schema Conflicts
• Name
Schema Homogenization
• Structure
Adequate design tools
are not available
– Missing attributes
– Missing but implicit attributes
• Pairwise integration, X-at-a-time integration, One-step integration
– One-to-many, many-to-many
• Entity versus Attribute (inclusion)
– One attribute or several attributes
– Make the ”same things” be ”one thing” in the integrated schema
• Behavior
– Resolve conflicts
– Different integrity constraints
• Ex: automatic update, delete a project when
the last engineer is moved to another project
• structural and semantic
HDBMS-23
©2002 Vera Goebel & Denise Ecklund
salary
earns
name
name
title
• Relationship
– Determine ”common semantics” of the schemas
Integration
export schema
Multiple Views
local
schema 1
©2002 Vera Goebel & Denise Ecklund
auxiliary schema
Multi-Use
export schema
homogenization
App View Defn
Multi-lingual
auxiliary schema
export
schema n
external schema
external schema
Emp
N
works-on
M
Proj
C1
rank
Engr
N
works-in
1
Cost Center
C2
Name (as an attribute)
Same Info
Fname Lname Nickname Init
Name (as an entity)
HDBMS-24
Data Representation Conflicts
Suitability of Object-Oriented Data Models as Global Data Models
• Rich set of type constructors
-> easy representation of other data models
• Different representation for equivalent data
– Different units
• Celsius ↔ Farenheit; Kilograms ↔ Pounds; Liters ↔ Gallons;
• Extensibility (user-defined types + type specific operators) &
Encapsulation
-> representation of “foreign” types/systems
-> hiding heterogeneity (concrete storage) in a natural way
– Different levels of precision
• 4 decimal digits versus 2 decimal digits
• Floating point versus integer
– Different expression denoting same information
• Enumerated value sets that are not one-to-one
• Inheritance (generalization) & computational completeness
-> schema integration
- factor out common properties of similar types
- thereby “arbitrary” computations possible
– {good, ok, bad} versus {one, two, three, four, five}
How to Resolve Schema Conflicts?
Can Object-Oriented Models Help?
HDBMS-25
©2002 Vera Goebel & Denise Ecklund
Use of Generalization & Computational Completeness (Example)
class Person (
name:
string,
address: Address)
is_a
method net-income(): float;
is_a
global
data
model
HDBMS-26
©2002 Vera Goebel & Denise Ecklund
Conflict Resolution
• Renaming entities and attributes
– Pick one name for the same things
– Use unique prefixes for different things
• Homogenizing representations
– Use conversions and mappings
class Faculty (
salary:
float,
course-given: set (Courses),
tax-rate:
float)
method net-income (): float
return (self->salary *
(1-self->tax-rate));
DBS1 class Faculty
name:
string,
address: Address,
salary: float,
course-given: set (Courses);
©2002 Vera Goebel & Denise Ecklund
• stored programs in relational systems
• methods in OO systems
• auxiliary schemas to store conversion rules/code
class Student (
grant:
float,
course-enroll: set (Courses))
method net-income (): float
return (self->grant);
• Homogenizing attributes
– Use type coercion (e.g., integer to float)
– Attribute concatenation (e.g., first name || last name)
– For missing attributes, assign default values
• Homogenizing an attribute and an entity
DBS2 class Student
name:
string,
address: Address,
grant: float,
course-enroll: set (Courses);
D-Name
D-Name
Emp
N
Member-of
1
Engr
N
Member-of
1
Dept
Dept
– Extract an attribute from the entity
local
data
models
HDBMS-27
• Ex: Project department name from the Dept entity
to create a virtual attribute (e.g., Emp->Dept.D-Name) Bldg
– Create an entity from the attribute
• Ex: Define default values and behavior for all other
attributes of the Dept entity
©2002 Vera Goebel & Denise Ecklund
… D-Name
Bldg
D-Name
…
HDBMS-28
Conflict Resolution
• Horizontal joins
Conflict Resolution involving a Database Key
1
2
3
– Union compatible
• For missing attributes, assign default values
or compute implicit values
– Extended union compatible
• Use generalization
1
2
3
4
5
A B C
• Entity-Attribute Conflicts where the
Attribute is a DB key in one local schema
A B C
Union
4
5
Rel
• Subclasses of the generalization
– class Person generalizes
class Student and class Employee
• Vertical joins
– Many and many to one
• Mixed Joins
1
2
3
4
5
1
2
3
A B C
1
2
3
Join
A D E F
1
2
A B CDEF
Union
A C D
Join
Join
1
2
– Vertical and horizontal joins in combination
1
2
A B
C E F
HDBMS-29
©2002 Vera Goebel & Denise Ecklund
Conflict Resolution involving a Partial Database Key
• Entity-Attribute Conflicts where the Attribute
is a partial DB key in one local schema
Attr1 … AttrN
• If Attr1 is a complete DB key in LDB2,
then in the global schema
– Define entities E and D and relationship Rel
– Define a new DB key attribute that will
be used to uniquely identify instances
of LDB2-E when they are accessed through
GDB-E and GDB-D
©2002 Vera Goebel & Denise Ecklund
GDB-E
N
Rel
1
GDB-D
N-key
Attr1 … AttrN
HDBMS-30
Key2
• HDBS manages the global schema = ∑ (all local exported schema)
N
LDB2-E
Rel
• Global schema definition facilities provide mechanisms for handling
the full spectrum of schematic differences that may exist among the
heterogeneous local schemata.
1
• Example:
– The global schema defines Attr1 as an entity
– Attr1 is a partial DB key for instances
of LDB2-E
LDB1-D
D
Global Schema Management
Attr1
LDB1-E
1
• Example:
dfv
– The global schema defines Attr1 as an entity
– Attr1 is a DB key for instances of LDB2-E
• See earlier example
LDB2-E
N
A B
– Define a virtual class containing common
attributes
– Provide specialized values and compute attribute
values for generalized attributes
Attr1
LDB1-E
LDB1-D
D
Attr1 … AttrN
– Can use an Auxiliary Schema to store mappers, translators, and converters
GDB-E
N
• If Attr1 is a partial DB key in LDB2
– Define the entities E and D, and relationship Rel
– Define a new attribute as a partial DB key
– Add the other partial key attributes from LDB2 as
partial keys
• Data is stored in the local component systems.
Rel
1
GDB-D
N-key Key2
Attr1 … AttrN
• Global dictionary information is used to query and manipulate the
data. The global language statements are translated into equivalent
statements of the local languages supported by the local systems
– Add partial DB key LDB2-Attr1 as an attribute only
©2002 Vera Goebel & Denise Ecklund
HDBMS-31
©2002 Vera Goebel & Denise Ecklund
HDBMS-32
Query Planning
and Optimization in a
Distributed Multi-DBMS
Query Processing and Optimization
• The HDBMS has
– A global Data Definition Language (DDL)
– A global Data Manipulation Language (DML)
– A set of local DMLs
HDBMS-33
Global Query on Multiple
Databases at Multiple Sites
SQ 3
query
translator 1
query
translator 2
query
translator 3
TQ 1
TQ 2
TQ 3
...
DB 2
DB 3
...
PQ k
query
... translator
n
TQ n
Sorting and unioning result data
Joining intermediate results
DB n
©2002 Vera Goebel & Denise Ecklund
HDBMS-34
– Heterogeneous Data Definition Languages
Data Directory &
Export Schema
• Weaker modeling languages do not support the same manipulation “features”
• Must use multiple techniques in order to define a consistent global data model
• Query fragmentation must produce a set of subqueries that reverse the
operations used to create/define the global schema
{ Post-processing Queries }
{ Subqueries, each on a single local DBMS }
Export & Aux
Schema
• Processing Steps:
{ Queries, that can be processed by local DBMS }
Decomposition &
Local Optimization
...
• Little information about “how” the subquery will be executed by the Local DBS
{ Subqueries, each on a single Multi-DB }
Local DBMS
PQ 1
– Autonomy
{ Post-processing Queries }
Translation
SQ n
• Similar to query fragmentation problem for homogeneous
distributed DBSs
• But …Complicating factors:
Data Allocation
Fragmentation & Global Opt
...
SQ 2
Query Fragmentation
Information Supporting Query Planning & Optimization
Multi-DB Manager
Localized multi-DB query m
Another
Multi-DBMS
SQ 1
DB 1
Localization
...
query fragmentation
and global optimization
– Given a query stated in the global query language (DML),
execute that query, in an optimal manner,
using the local database management systems
Control Site
query localization
Localized multi-DB query 1
• The HDBMS Query Processing Goal:
©2002 Vera Goebel & Denise Ecklund
global query
(1) Replace names from the global schema with “fullnames” from the export schemas
(2) If a subquery involves multiple export schemas, then break the query into queries
that operate on one export schema and insert data communication operators to
exchange intermediate results between local database systems
Local Schema
& Access Paths
Optimized Local Execution Plan
©2002 Vera Goebel & Denise Ecklund
HDBMS-35
©2002 Vera Goebel & Denise Ecklund
HDBMS-36
Global Query Optimization
Post-Processing Strategies
• Similar to global query optimization for homogeneous
distributed DBSs (many algorithms can be used directly)
• But only possible under the following assumptions:
– No data inconsistency (the global schema correctly represents
the semantics of disjoint, overlapping, and conflicting data)
– Know the characteristics of local DBSs
• e.g., statistical info on data cardinalities and selectivities are available
– Can transfer partial data results between different local DBSs
• Major impact on post-processing plans
2) Control site performs I&PP-ops for multi-DB results;
Multi-DB managers, and HDBMS agents on the local
database sites perform I&PP-ops for DBSs within one
multi-DB environment
• Possible if local DBMS can read intermediate results from
external sources, and sort, join, etc. can be directly invoked
HDBMS-37
©2002 Vera Goebel & Denise Ecklund
Parallel Execution Strategies
HDBMS-38
©2002 Vera Goebel & Denise Ecklund
Global Cost Estimation
• Join operations are slow → speedup with parallel execution?
• Traditional query plans use left linear join trees
– One of the operands is always a base relation
• Have good info on cardinality and selectivity for the base
R5
R4
R3
R1
• Differs from cost estimation in homogeneous distributed DBSs
– Little (or no) info on QP algorithms in local DBSs and data statistics
• Cost Estimation Function
– Cost to execute each subquery on the local DBSs
– Cost to execute all I&PP-ops
• via pushdown or by any HDBMS agent or proxy
• Use a simplified cost function
R2
Cost = Initialization cost
+ cost to retrieve a set of objects
+ cost to process a set of objects
• Bushy join trees provide parallel execution
in heterogenous multi-DB environments
©2002 Vera Goebel & Denise Ecklund
• Heavy work load; minimal parallelism
3) Use strategy #2 and use “pushdown” to get the local
database systems to perform I&PP-ops
– Post-processing Strategy
– Parallel Execution Possibilities
– Global Cost Function/Estimation
– Convert a left linear join tree into
a (balanced?) bushy join tree
1) Control site performs all intermediate and
post-processing operations (I&PP-ops)
• Better work load balance; more parallelism
• Primary Considerations:
– Used even in homogeneous distributed DBSs
because cooperative nodes can pipeline the
sequence of joins
• Three Strategies:
• Run test queries on the local DBSs to get time estimates for ops
R5
R1
R2
R3
R4
HDBMS-39
– Selection, with and without an index
– Join (testing for different algorithms: sort, hash, or indexed based algs)
©2002 Vera Goebel & Denise Ecklund
HDBMS-40
Query Translation
Local Data Sources with Heterogeneous Capabilities
When a query language of a local DBS is different from the global
query language, each export schema subquery for the local DB needs
to be translated from the global language to the target language.
Object-oriented (local)
Relational (local)
Object-oriented (global)
Hierarchical (local)
Network-oriented (local)
Relational (global)
...
Weaker target languages do not support the same operations,
so emulate required operations in post-processing
Ex: retrieve more data than requested by the query
OQL
and then post-process that data to compute
the correct response to the query
QUEL
Reduce the number of language mappings
using the Entity-Relationship Query Language
as an intermediary language
• ”GenCompact” defines a simple language to
describe query capabilities of the local DBSs
• Generates the ”best” query plan for each local DBS
• The ”best” query plan is
– Feasible (i.e., can be executed by the local DBS)
– Provides nearly optimal performance
SQL
ERQL
CODASYL
Access Funcs
©2002 Vera Goebel & Denise Ecklund
DB/2
Func I/F
HDBMS-41
Restrictions on Query Processing Capabilities
• The approach:
– Generate a large set of possible query plans
– Efficiently select best plan, quickly pruning bad plans
Simple Source Description Lanuage - SSDL
• Example database of cars for sale
• Condition-Attribute restrictions
– Attributes: {make, model, year, color, price}
– Can not select on attribute A1
– Must specify a selection value for attribute A2
• SSDL definition of query processing capabilities
for one local DBS
• Condition-Expression-Size restriction
_s → _s1 | _s2
_s1 → make = $m ٨ price < $p
– Can not specify more than k selection conditions
_s2 → make = $m ٨ color = $c
• Condition-Expression-Structure restrictions
}
Defines 2 forms of
supported queries
attributes :: _s1 : {make, model, year, color}
– Allow only atomic conditionals (i.e., no ٨ or ٧)
– Allow only ”and-ed” conditions (i.e., no ”or” operators)
– Combination of restrictions on using ٨ and ٧
©2002 Vera Goebel & Denise Ecklund
HDBMS-42
©2002 Vera Goebel & Denise Ecklund
HDBMS-43
attributes :: _s2 : {make, model, year}
”_”
$p
indicates a variable/non-terminal
indicates an integer constant
©2002 Vera Goebel & Denise Ecklund
}
Defines valid output
attributes for each
supported query form
$m indicates a string constant
$c indicates a string constant from
an enumerated set
HDBMS-44
The Global Query as a Condition Tree
Generating Alternate Plans
DB1 cannot evaluate condition C2
An alternate query plan is:
Global Query Post Processing
Select A
Where LQ1 and LQ2
Local Query LQ1
Select A
Where C1 and C2
C1(A1)
C2(A2)
• Obvious Query Plan:
– DB1 evaluates LQ1
– DB2 evaluates LQ2
– PP joins the two results
Global Query Post Processing
Select A
Where IP1 and LQ2
Local Query LQ2
Select A
Where C3 or C4
C4(A4)
Intermediate Processing IP1
Select A
Where C2(LQ1.A2)
DB1 cannot evaluate condition C2?
DB2 cannot evaluate condition (C3 or C4)?
Local Query LQ1
Select A, A2
Where C1
C3(A3)
What if ?
C1(A1)
HDBMS-45
©2002 Vera Goebel & Denise Ecklund
Generating Alternate Plans
Rewrite
Rewrite Rules
Intermediate Processing IP2
Select A
Where C3(LQ2.A3) or C4(LQ2.A4)
©2002 Vera Goebel & Denise Ecklund
C2(A2)
C4(A4)
HDBMS-46
©2002 Vera Goebel & Denise Ecklund
Query
as a
condition
tree (CT)
Global Query Post Processing
Select A
Where LQ1 and IP2
C1(A1)
C3(A3)
GenCompact Architecture
DB2 cannot evaluate the condition (C3 or C4)
An alternate query plan is:
Local Query LQ1
Select A
Where C1 and C2
Local Query LQ2
Select A
Where C3 or C4
equiv
CTs
Mark
marked
CTs
Generate
feasible
query
plans
SSDL descr
of LDBs
Cost
Best query
plan
Cost Model
• Rewrite - Commutativity, associativity, distribution, and copy rules
• Mark - Tests executability on the applicable LDBs
• Generate - Creates alternate query plans by inserting intermediate
processing steps
Local Query LQ2
Select A, A3, A4
Where { }
• Cost - Applies a cost function to quickly select the ”best plan”
HDBMS-47
©2002 Vera Goebel & Denise Ecklund
HDBMS-48
Schema SQL – A Natural Extension to SQL
• Manipulate data and schema using one language
What if . . .
ABCD
all your local systems are relational,
have similar processing capabilities,
but use different schemas?
–
–
–
–
HDBMS-49
Set of attribute names in relation rel in the database db
RatePerKwH
Name
Address
Name
ServiceAddr
BillingAddr
RateCategory
Name
ServiceAddr
BillingAddr
RateCategory
RateCategory
RateCategory
RatePerCuM
Fees
Oil Database
DelN
DelS
DelE
DelW
• A valid SchemaSQL variable is of the form
<range> <var>
©2002 Vera Goebel & Denise Ecklund
HDBMS-50
Home
Biz
Industry
Rates
Set of relation names in the database db
Set of values in the columns named attr in the
relation rel in the database db
One query accessing multiple databases
Queries over the schema definitions themselves
Runtime restructuring of views and output schemas
Aggregation operations over rows, columns and selected blocks
Natural Gas Database
Set of database names in the federation
db::rel.attr
DB3 Meta-data
©2002 Vera Goebel & Denise Ecklund
Meaning
Set of tuples in the relation rel in the database db
DB3
DB1 Meta-data
SchemaSQL – Examples: 3 CIS Databases
Electricity Database
CustInfo Name Address1 Address2 CustType
• Extends the variables and ranges defined by SQL
• Valid SchemaSQL ranges are:
db::rel
DB2 Meta-data
DB1
• SchemaSQL supports
SchemaSQL Syntax Definitions
Symbol
→
db →
db::rel →
Select A,B,C,D
From Q, R,
S, T,
U, V
...
Maybe you can use SchemaSQL to write
and execute your global queries!
©2002 Vera Goebel & Denise Ecklund
DB2
SchemaSQL query
HDBMS-51
Name
DeliveryAddr
MailingAddr
TankCapacity
DeliveryFreq
PriceRate
Name
DeliveryAddr
MailingAddr
TankCapacity
DeliveryFreq
PriceRate
Name
DeliveryAddr
MailingAddr
TankCapacity
DeliveryFreq
PriceRate
Name
DeliveryAddr
MailingAddr
TankCapacity
DeliveryFreq
PriceRate
©2002 Vera Goebel & Denise Ecklund
HDBMS-52
Global Schema for the Federated CIS Database
SchemaSQL – Example #1
Electricity Database
Electricity
Nat Gas
Elec Meta-data
Oil
Nat Gas Meta-data
CustInfo
Name
Address1
Address2
CustType
RatePerKwH
Oil Meta-data
Federated CIS Database
CustInfo
Name
BillAddr
ServAddr
CustType
FuelType ...
Federated CIS Database
CustInfo
Rates
OilDel
Acct#
Name
RateCategory
Acct#
BillAddr
FuelRate
DelAddr
ServAddr
CustType
TaxRate
GovFees
TankCapacity
FuelType
RateCategory
from
DelSchedule
HDBMS-53
©2002 Vera Goebel & Denise Ecklund
SchemaSQL – Example #2
Address
RateCategory
Name
ServiceAddr
BillingAddr
RateCategory
Name
ServiceAddr
BillingAddr
RateCategory
RatePerCuM
create view LowRatePayers::CInfo(Name, FuelType, Rate)
select NameRel.Name, DBname, RateValue
from
→ DBname,
DBname→ NameRel,
DBname→ RateRel,
DBname::RateRel→ RateAttr,
DBname::RateRel.RateAttr RateValue
where (Dbname = ”Natural Gas”
and NameRel <> ”Rates”
and RateRel = ”Rates”
and NameRel.RateCategory = RateRel.RateCategory
and RateRel.RateValue < 0.035)
or (NameRel = RateRel
and ((RateAttr = ”RatePerKwH”)
or (RateAttr = ”PriceRate”)
or (RateAttr = ”RatePerCuM”))
and RateValue < 0.035)
Fees
Federated CIS Database
CustInfo
Name
CustType
FuelType
RateCategory
...
create view NGtoFed::CustInfo(Name, CustType, FuelType, RateCategory)
select
from
where
Rel.Name, Rel, DBname, Rel.RateCategory
→ DBname,
DBname→ Rel,
Rel <> ”Rates”
©2002 Vera Goebel & Denise Ecklund
HDBMS-54
©2002 Vera Goebel & Denise Ecklund
Problem: Create a view containing the ”Low Rate Payers” from all the federated
CIS databases (less than 0.035 for each unit of fuel).
Name
RateCategory
→ DBname,
DBname::CustInfo Rel,
SchemaSQL – Example #3
Natural Gas Database
Home
Biz
Industry
Rates
create view ElectoFed::CustInfo(Name, BillAddr, ServAddr, CustType, FuelType)
select Rel.Name, Rel.Address2, Rel.Address1, Rel.CustType, DBname
HDBMS-55
©2002 Vera Goebel & Denise Ecklund
Variable
Dbname
NameRel
& RateRel
RateAttr
Values
Electricity
Natural Gas
Oil
CustInfo
Home
…
DelN
…
Name
Address1
…
Name
Address
…
Name
DeliveryAddr
…
HDBMS-56
SchemaSQL Service Architecture
Summary – Some Query Processing Tools for HDBMSs
Intermediate and Post Processing
Resident
SQL Engine
Answers
Q1, Q2, ... Qn
SchemaSQL Query
FST
PostProc
Queries
SchemaSQL
Server
RDBMS1
No local DB support
Custom Serv in Global Mgr
Final Answer
...
Answer Q1
Answer Qn
Varied local models
Local DB’s
QP Capability
GenCompact
Heterogeneity
of Data Models
Federation System Table (FST) – stores database names, relation names,
attribute names, and statistical information on the local RDBMSs
HDBMS-57
To Be Continued ... Next Week ...
Transaction Management in HDBMSs
©2002 Vera Goebel & Denise Ecklund
Func Intf to
Simple QP
DBMS-specific QP (< a DBMS)
Same local models
RDBMSn
©2002 Vera Goebel & Denise Ecklund
GenCompact
SchemaSQL
No local DB support
DBMS Serv in Global Mgr
HL Lang Intf to
DBMS-specific QP
Optimized local
SQL query Q2
Optimized local
SQL query Q1
Federation
User
Varied local DB support
Final
Answer
HDBMS-59
©2002 Vera Goebel & Denise Ecklund
HDBMS-58
Download