
Object Oriented Multi-Database Systems
An Overview of Chapters 4 and 5
What Are OOMDMSs?
What are some of their key differences?
A multidatabase system is a distributed system that provides a global interface
to heterogeneous, pre-existing local DBMSs
• Users can access multiple remote databases with a single query
• Automatically performs the data model and access language transformations between the global
query and the local databases
Distributed databases
• Maintain a global name space and some form of global schema
• All local databases use the same data model and access language
• A collection of cooperating, homogeneous local DBMSs that provides a uniform global interface
Interoperable systems
• No concept of a global schema/namespace
• Provide formats and protocols for shipping data between local systems
• Do not provide much global functionality
• Loosely coupled
Multidatabases
• Support full/partial global schemas
• Integrate heterogeneous, pre-existing local DBMSs
• Local databases can use different data models and access languages
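The idea of one global query over heterogeneous local databases can be sketched as follows. This is only an illustration under stated assumptions: the two local sources (an in-memory SQLite table and a Python dictionary standing in for a key-value store), the wrapper classes, and the function `global_query` are all hypothetical and not taken from the chapters.

```python
import sqlite3

def make_relational_db():
    """Hypothetical local database 1: relational, accessed through SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (full_name TEXT, dept TEXT)")
    conn.executemany("INSERT INTO staff VALUES (?, ?)",
                     [("Ada", "R&D"), ("Bob", "Sales")])
    return conn

# Hypothetical local database 2: a key-value store accessed by plain lookups.
KEY_VALUE_DB = {
    "emp:1": {"name": "Cy", "department": "R&D"},
    "emp:2": {"name": "Di", "department": "HR"},
}

class RelationalWrapper:
    """Translates the global query into SQL for the relational source."""
    def __init__(self, conn):
        self.conn = conn

    def employees_in(self, department):
        rows = self.conn.execute(
            "SELECT full_name FROM staff WHERE dept = ?", (department,))
        return [name for (name,) in rows]

class KeyValueWrapper:
    """Translates the same global query into a scan of the key-value source."""
    def __init__(self, store):
        self.store = store

    def employees_in(self, department):
        return [rec["name"] for rec in self.store.values()
                if rec["department"] == department]

def global_query(wrappers, department):
    """Single global query, fanned out to every local database."""
    result = []
    for wrapper in wrappers:
        result.extend(wrapper.employees_in(department))
    return sorted(result)

wrappers = [RelationalWrapper(make_relational_db()), KeyValueWrapper(KEY_VALUE_DB)]
names = global_query(wrappers, "R&D")
```

The user of `global_query` never sees that one source names the attribute `dept` and the other `department`; each wrapper hides its local data model and access language.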
General Issues of Dealing with the Schema Integration Problem
• Tool requirements for successful integration of real-world schemas:
• Assist users during integration
• Take into account users' requirements and usability as the overriding considerations for
the tool
• No changes to existing data and local schemas
• Users only have to deal with global semantic model
• Incremental schema integration capability
• Permit imprecise reasoning
• Automatic generation of mappings between global and local schemas
• Advantages of an Object-Oriented Data Model
• Class structures are specifically designed to support generalisation of lower-level data
classes
• Methods and polymorphism enable a rich set of functions to be applied to data objects
• Provides a very natural mechanism for translating to and from other data models
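A minimal sketch of how generalisation and polymorphism help during integration, assuming two local employee classes that store pay differently. All class names, attributes, and figures here are hypothetical illustrations, not taken from the chapters.

```python
from abc import ABC, abstractmethod

class GlobalEmployee(ABC):
    """Hypothetical global class generalising two local employee classes."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def annual_salary(self):
        """Each local class answers the same global question in its own way."""

class MonthlyPaidEmployee(GlobalEmployee):
    """Local class from a database that stores a monthly salary."""

    def __init__(self, name, monthly_salary):
        super().__init__(name)
        self.monthly_salary = monthly_salary

    def annual_salary(self):
        return self.monthly_salary * 12

class HourlyPaidEmployee(GlobalEmployee):
    """Local class from a database that stores an hourly rate."""

    def __init__(self, name, hourly_rate, hours_per_year):
        super().__init__(name)
        self.hourly_rate = hourly_rate
        self.hours_per_year = hours_per_year

    def annual_salary(self):
        return self.hourly_rate * self.hours_per_year

staff = [MonthlyPaidEmployee("Ada", 4000.0), HourlyPaidEmployee("Bob", 25.0, 1800)]
total_payroll = sum(employee.annual_salary() for employee in staff)
```

The global code works only with `GlobalEmployee`; polymorphism selects the right local computation for each object.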
Nature of Problems in Schema Integration
• Identifying correspondences is non-trivial because of:
• Syntactic differences
• E.g. differences in names, domains, scales, data types
• Semantic differences
• E.g. synonyms, hyponyms, antonyms
• Correspondence types
• Equivalence
• Containment
• Overlap
• Disjoint
• Others?
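The four named correspondence types can be illustrated by comparing class extensions as sets. This is a minimal sketch, assuming the instances in both databases have already been matched to shared identifiers, which in practice is itself a hard problem. The function name `correspondence` and the sample extensions are hypothetical.

```python
def correspondence(extension_a, extension_b):
    """Classify how two class extensions (sets of instance ids) relate."""
    a, b = set(extension_a), set(extension_b)
    if a == b:
        return "equivalence"
    if a < b or b < a:          # one is a proper subset of the other
        return "containment"
    if a & b:                   # some, but not all, instances are shared
        return "overlap"
    return "disjoint"

# Hypothetical extensions, keyed by a shared person identifier.
employees = {1, 2, 3, 4}
staff = {1, 2, 3, 4}
managers = {2, 3}
students = {4, 5}
suppliers = {8, 9}

results = {
    "employees/staff": correspondence(employees, staff),
    "employees/managers": correspondence(employees, managers),
    "employees/students": correspondence(employees, students),
    "employees/suppliers": correspondence(employees, suppliers),
}
```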
Integration Process - Activities
• Application of reasoning techniques for the comparison of the schemas to
generate correspondence assertions
• Validation of system-generated assertions by the user or specification of
new assertions by the user
• Automatic generation of new assertions or deletion of existing assertions
based on user validation of assertions
• Checking and ensuring the consistency of user validations and assertions
• Merging the objects according to the specified assertions and options
• Generation of mappings between the global schema and the component
schemas
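The activities above can be sketched as a small pipeline. This is a simplified illustration, not the authors' tool: name similarity from Python's `difflib` stands in for the reasoning techniques, a callback stands in for the user, the consistency rule is an assumed one-to-one constraint, and the step that derives or deletes assertions after validation is omitted. All function names and schemas are hypothetical.

```python
from difflib import SequenceMatcher

def propose_assertions(schema_a, schema_b, threshold=0.8):
    """Compare the schemas and propose candidate equivalence assertions."""
    candidates = []
    for a in schema_a:
        for b in schema_b:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                candidates.append((a, b))
    return candidates

def validate(candidates, user_accepts):
    """Keep only the assertions the user confirms; these become facts."""
    return [pair for pair in candidates if user_accepts(pair)]

def is_consistent(facts):
    """Assumed rule: a class is equivalent to at most one class on the other side."""
    left = [a for a, _ in facts]
    right = [b for _, b in facts]
    return len(left) == len(set(left)) and len(right) == len(set(right))

def merge(schema_a, schema_b, facts):
    """Merge classes and record the global-to-local mappings."""
    partner = dict(facts)                      # class in A -> equivalent class in B
    matched_b = set(partner.values())
    mappings = {a: {"A": a, "B": partner.get(a)} for a in schema_a}
    for b in schema_b:
        if b not in matched_b:
            name = b if b not in mappings else b + "_B"   # avoid name clashes
            mappings[name] = {"A": None, "B": b}
    return mappings

schema_a = ["EMPLOYEE", "DEPT"]
schema_b = ["EMPLOYEES", "PROJECT"]
candidates = propose_assertions(schema_a, schema_b)
facts = validate(candidates, user_accepts=lambda pair: True)
if not is_consistent(facts):
    raise ValueError("validated assertions contradict each other")
global_mappings = merge(schema_a, schema_b, facts)
```

Each global class in `global_mappings` records which local class, if any, it corresponds to in each component schema.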
Core Structures Central To Schema Integration
• The authors' proposed integration tool consists of:
• A set of invariant structures, i.e. assumptions
• A set of validated assertions called facts
• A set of merging rules
• Advantages
• Unlike in other tools, the set of assumptions does not change
even when the integration technique changes
• Tool is extensible due to its modular architecture
• Imprecise reasoning module
• Consistency checking module
• User interface
• Mapping generator
Semantic Heterogeneity in Multidatabase Systems
• People perceive real-world objects in different ways which
leads to potentially different representations of the same
object
• Semantics is relative i.e. different conceptualisations
• Example: the concept of marriage is represented in DB#1 by
objects of the class COUPLES, with attributes
HUSBAND and WIFE, whereas DB#2 uses a class
PERSONS with a SPOUSE attribute
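The marriage example can be made concrete with a small sketch that maps both local representations onto one global view. The NAME attribute on PERSONS, the sample records, and the function names are assumptions added for illustration, and the sketch assumes a name identifies the same person in both databases.

```python
# DB#1: one COUPLES object per marriage.
couples_db1 = [
    {"HUSBAND": "John", "WIFE": "Mary"},
]

# DB#2: one PERSONS object per person, linked through SPOUSE.
# NAME is a hypothetical attribute; the source only mentions SPOUSE.
persons_db2 = [
    {"NAME": "John", "SPOUSE": "Mary"},
    {"NAME": "Mary", "SPOUSE": "John"},
    {"NAME": "Raj", "SPOUSE": "Lea"},
    {"NAME": "Lea", "SPOUSE": "Raj"},
    {"NAME": "Ann", "SPOUSE": None},
]

def marriages_from_couples(couples):
    """Map each COUPLES object to an unordered pair of partners."""
    return {frozenset((c["HUSBAND"], c["WIFE"])) for c in couples}

def marriages_from_persons(persons):
    """Map each PERSONS object with a spouse to an unordered pair of partners."""
    return {frozenset((p["NAME"], p["SPOUSE"]))
            for p in persons if p["SPOUSE"] is not None}

# Global view: the union removes marriages recorded in both databases.
global_marriages = marriages_from_couples(couples_db1) | marriages_from_persons(persons_db2)
```

Using an unordered pair as the global representation means the two PERSONS objects of one marriage, and the matching COUPLES object, all collapse into a single global fact.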
Classification of Semantic Heterogeneity
Three main classification groups:
• Heterogeneities between object classes
• Extensions, i.e. membership
• Names, i.e. synonymy, polysemy
• Class methods/attributes, and many more…
• Heterogeneities between class structures
• Different generalisation hierarchy
• Representing part-whole relationships
• Heterogeneities between object instances
• Attributes allowing null/nonnull
• Value discrepancies
Detecting Semantic Heterogeneity
• The aim is to identify semantically related objects through a
comparison process that establishes their similarities and
dissimilarities
• (Early schema integration) tools
• SIS: A Schema Integration System
• Honeywell Testbed
• MUVIS
• A number of strategies exist for similarity detection
• A Theory of Attribute Equivalence
• Common Concepts Approach
• Semantic Unification Approach
• Maximum Spanning Tree Approach
Resolution of Semantic Heterogeneity
• After identifying semantically related objects, conflicts need
to be resolved in order to gain integrated access to the
multidatabase
• Several tools and systems exist (even more post-1996)
• Multibase
• Honeywell Testbed
• Carnot
• More recently COMA++
• …many more
Conclusion
• Semantic Heterogeneity is an obstacle for interoperability
• Typically, database schemas do not provide enough
semantics
• Most approaches detect semantic similarity
semi-automatically
• Detection of semantic similarity is more difficult than
semantic resolution
• Advantage of adopting an object-oriented data model is
its high expressiveness resulting in richer semantic
models