Object Oriented Multi-Database Systems
An Overview of Chapters 4 and 5

What Are OOMDMS’s? What are some of their key differences?
A multidatabase system is a distributed system that provides a global interface to heterogeneous pre-existing local DBMS’s
• Users can access multiple remote databases with a single query
• Automatically performs the data model and access language transformations between the global query and the local databases

Distributed databases
• Maintain a global name space and some form of global schema
• All local databases use the same data model and access language
• A collection of cooperating, homogeneous local DBMS’s that provides a uniform global interface

Interoperable systems
• No concept of a global schema/namespace
• Provide formats and protocols for shipping data between local systems
• Do not provide much global functionality
• Loosely coupled

Multidatabases
• Support full/partial global schemas
• Integrate heterogeneous, pre-existing local DBMS’s
• Local databases can use different data models and access languages

General Issues of Dealing with the Schema Integration Problem
• Tool requirements for successful integration of real-world schemas:
  • Assist users during integration
  • Take users' requirements and usability as the overriding considerations for the tool
  • No changes to existing data and local schemas
  • Users only have to deal with the global semantic model
  • Incremental schema integration capability
  • Permit imprecise reasoning
  • Automatic generation of mappings between global and local schemas
• Advantages of an Object-Oriented Data Model
  • Class structures are specifically designed to support generalisation of lower level data classes
  • Methods and polymorphism enable a rich set of functions to be applied to data objects
  • Provides a very natural mechanism for translating to and from other data models

Nature of Problems in Schema Integration
• Identification of correspondences is non-trivial. The difficulty arises from:
  • Syntactic differences
    • E.g. differences in names, domain, scale, data types
  • Semantic differences
    • E.g. synonyms, hyponyms, antonyms
• Correspondence types
  • Equivalence
  • Containment
  • Overlap
  • Disjoint
  • Others?

Integration Process - Activities
• Application of reasoning techniques for the comparison of the schemas to generate correspondence assertions
• Validation of system-generated assertions by the user or specification of new assertions by the user
• Automatic generation of new assertions or deletion of existing assertions based on user validation of assertions
• Checking and ensuring the consistency of user validations and assertions
• Merging the objects according to the specified assertions and options
• Generation of mappings between the global schema and the component schemas

Core Structures Central To Schema Integration
• The authors' proposed integration tool consists of:
  • A set of invariant structures, i.e. assumptions
  • A set of validated assertions called facts
  • A set of merging rules
• Advantages
  • Compared to other tools, the set of assumptions does not change even when the integration technique changes
  • Tool is extensible due to its modular architecture
    • Imprecise reasoning module
    • Consistency checking module
    • User interface
    • Mapping generator

Semantic Heterogeneity in Multidatabase Systems
• People perceive real-world objects in different ways, which leads to potentially different representations of the same object
• Semantics is relative, i.e. it depends on different conceptualisations
• Example: the concept of marriage is represented in DB#1 by objects of the class COUPLES, with attributes HUSBAND and WIFE, whereas DB#2 uses a class PERSONS with a SPOUSE attribute

Classification of Semantic Heterogeneity
Three main classification groups:
• Heterogeneities between object classes
  • Extensions, i.e. membership
  • Names, i.e. synonymy, polysemy
  • Class methods/attributes, and many more…
• Heterogeneities between class structures
  • Different generalisation hierarchy
  • Representing part-whole relationships
• Heterogeneities between object instances
  • Attributes allowing null/nonnull
  • Value discrepancies

Detecting Semantic Heterogeneity
• Aim is to identify semantically related objects through a comparison process that establishes their similarities and dissimilarities
• (Early schema integration) tools
  • SIS: A Schema Integration System
  • Honeywell Testbed
  • MUVIS
• A number of strategies exist for similarity detection
  • A Theory of Attribute Equivalence
  • Common Concepts Approach
  • Semantic Unification Approach
  • Maximum Spanning Tree Approach

Resolution of Semantic Heterogeneity
• After identifying semantically related objects, conflicts need to be resolved in order to gain integrated access to the multidatabase
• Several tools and systems exist (even more post 1996)
  • Multibase
  • Honeywell Testbed
  • Carnot
  • More recently COMA++
  • …many more

Conclusion
• Semantic heterogeneity is an obstacle to interoperability
• Typically, database schemas do not provide enough semantics
• Most approaches detect semantic similarity semi-automatically
• Detection of semantic similarity is more difficult than semantic resolution
• Advantage of adopting an object-oriented data model is its high expressiveness, resulting in richer semantic models
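
Illustration: the four correspondence types named under Nature of Problems in Schema Integration (equivalence, containment, overlap, disjoint) can be read as set relationships between class extensions. The sketch below is only an illustration under a strong assumption: that the members of two classes from different local databases can be compared through shared real-world identifiers. The names `Correspondence` and `classify` are hypothetical and do not come from the chapters or from any of the tools listed above.

```python
from enum import Enum


class Correspondence(Enum):
    EQUIVALENCE = "equivalence"
    CONTAINMENT = "containment"
    OVERLAP = "overlap"
    DISJOINT = "disjoint"


def classify(ext_a: set, ext_b: set) -> Correspondence:
    """Classify the relationship between two class extensions.

    Assumption: both extensions are sets of comparable real-world
    identifiers, which is rarely available across autonomous databases.
    """
    if ext_a == ext_b:
        return Correspondence.EQUIVALENCE
    if ext_a <= ext_b or ext_b <= ext_a:
        # One extension is a proper subset of the other.
        return Correspondence.CONTAINMENT
    if ext_a & ext_b:
        # Some, but not all, members are shared.
        return Correspondence.OVERLAP
    return Correspondence.DISJOINT


# Hypothetical extensions of classes from two local databases.
employees = {"ann", "bob", "cy"}
managers = {"ann"}
students = {"cy", "dee"}
suppliers = {"acme"}

print(classify(employees, managers))   # containment: every manager is an employee
print(classify(employees, students))   # overlap: only "cy" is shared
print(classify(employees, suppliers))  # disjoint: nothing in common
```

In practice such assertions are usually proposed at the schema level and validated by the user, as the Integration Process activities describe, because instance-level comparison across independent databases is seldom this direct.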