Chapter 6: General Schema Manipulation Operators
PRINCIPLES OF DATA INTEGRATION
ANHAI DOAN, ALON HALEVY, ZACHARY IVES
Outline
 Introduction to model management and motivation
 The merge operator
 The ModelGen operator
 The Invert operator
Model Management Operators
 We saw operators for creating mappings between
pairs of schemas.
 But you can imagine other operators on schemas
and mappings:
 Merge schemas, compose and invert mappings, translate
schemas from one data model to another
 In fact, imagine an entire algebra of operators that
apply to schemas and to mappings:
 Many common workflows can be formulated as a
sequence of such operators [Bernstein, 2000]
 Note: “model” = “schema”. More terminology coming
soon.
Example of Model Management (1)
 In a data integration scenario, you may proceed as
follows, beginning with sources S1 and S2:
 Use a match operator to create a mapping between S1 and
S2
 Use merge to create a merged (mediated) schema of S1
and S2 with mappings. Merge will create the minimal
schema that includes both S1 and S2.
Example of Model Management (2)
 Suppose we have another
source S3, which is very similar
to S1.
 We could first use match to
create a mapping from S1 to S3
 Then use compose to create a
mapping from S3 to the
mediated schema G.
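A minimal sketch of the compose step, under strong simplifying assumptions: each mapping is reduced to a dictionary of attribute correspondences, and the match result is read in the S3-to-S1 direction. Real schema mappings are logical formulas, so this only illustrates the chaining idea. All attribute names are hypothetical.

```python
def compose(m_ab, m_bc):
    """Compose attribute-level mappings A->B and B->C into A->C.

    Correspondences whose intermediate attribute has no image under
    m_bc are dropped.
    """
    return {a: m_bc[b] for a, b in m_ab.items() if b in m_bc}


# Hypothetical correspondences, for illustration only.
s3_to_s1 = {"S3.title": "S1.name", "S3.yr": "S1.year"}   # from match
s1_to_g = {"S1.name": "G.title", "S1.year": "G.year"}    # from merge

s3_to_g = compose(s3_to_s1, s1_to_g)
```

The point of the workflow is that S3 reaches the mediated schema G by reusing the existing S1-to-G mapping, rather than by matching S3 against G from scratch.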
Operators
 Match: see previous chapters
 Merge: create a merged schema of S1 and S2 w.r.t. a
mapping M12
 ModelGen: create an equivalent model but in a
different data model (e.g., relational → XML)
 Invert: given M12, create M21
 Diff: find the difference between two models (see
bibliography)
Some Terminology
 Model: a specific description of a set of data in a
given data model.
 Meta model: a data model, such as relational
schema, XML DTD, java class definitions, …
 Meta-meta-model: a generic language that is
independent of a particular meta-model
 Usually some graph-based formalism.
The Merge Operator
 Given
 Two models, M1 and M2
 A mapping from M1 to M2
 Create:
 A merged model M12 that contains only the information in
M1 and M2, but does not repeat information that is in both
 Mappings from M1 and M2 to M12
 Challenge to many model management operators:
 Can you develop algorithms that are generic, i.e., not
specific to particular data models?
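The input/output contract of merge can be sketched as follows. This is an illustrative simplification, not the algorithm from the book: a model is assumed to be a flat dictionary from attribute name to type, and the input mapping is assumed to be a set of one-to-one attribute correspondences. The function and attribute names are hypothetical.

```python
def merge(m1, m2, mapping):
    """Merge two flat models given correspondences m1_attr -> m2_attr.

    Returns (merged, map1, map2), where map1 and map2 send each
    attribute of m1 and m2 to its name in the merged model.
    """
    merged = dict(m1)                       # keep every element of M1
    map1 = {attr: attr for attr in m1}
    matched = {b: a for a, b in mapping.items()}
    map2 = {}
    for attr, typ in m2.items():
        if attr in matched:
            # Already represented by an M1 element: do not repeat it.
            map2[attr] = matched[attr]
        else:
            # New information; rename on an accidental name clash.
            name = attr if attr not in merged else f"M2_{attr}"
            merged[name] = typ
            map2[attr] = name
    return merged, map1, map2


# Hypothetical models, for illustration only.
m1 = {"name": "string", "zip": "int"}
m2 = {"full_name": "string", "phone": "string"}
merged, map1, map2 = merge(m1, m2, {"name": "full_name"})
```

Here `full_name` is folded into `name`, so the merged model holds three attributes rather than four.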
Merge Challenges: Example
 Challenge 1: different attribute representations.
Resolution should be part of the input mappings.
Merge Challenges: Example
 Challenge 2: merging models of different data
models. (What if one data model supports subattributes and another doesn’t?)
 See ModelGen.
Merge Challenges: Example
 Challenge 3: “fundamental conflicts”. Zipcode is an
integer in one model and string in another. Merged
model cannot have both:
 Solutions depend on particular conflict and data models
involved.
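Detecting a fundamental conflict of the kind in Challenge 3 can be sketched as a type comparison over matched attributes. This assumes the same flat attribute-to-type representation as an illustration; resolving the conflict is left open, since the slides note that the solution depends on the conflict and data models involved. Names are hypothetical.

```python
def type_conflicts(m1, m2, mapping):
    """List matched attribute pairs whose declared types disagree."""
    return [
        (a, m1[a], b, m2[b])
        for a, b in mapping.items()
        if m1[a] != m2[b]
    ]


# Hypothetical models: zipcode is an integer in one, a string in the other.
m1 = {"zipcode": "int", "city": "string"}
m2 = {"zip": "string", "town": "string"}
conflicts = type_conflicts(m1, m2, {"zipcode": "zip", "city": "town"})
```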
The ModelGen Operator
 Transform a schema from one meta-model (e.g., Java
object model, relational, XML) to another meta-model.
 Main challenge: features that exist in the source
meta-model may not exist in the target (e.g., subclasses and inheritance).
 ModelGen is needed very often in practice, and several
of the other operators rely on it.
ModelGen Example
Java classes → relational tables
(Figure callout: there are no classes or inheritance in the relational model.)
ModelGen Strategy
 Possible to design specific transformations from one
meta-model to another, but we want a generic
approach.
 Design a super meta-model that has (almost) all
features that exist in the meta-models.
 The super meta-model knows which features are
present in each meta-model.
 The algorithm will translate a given model into the
super meta-model and from there to the target
meta-model.
ModelGen Algorithm
 Input: model M1 in meta-model MM1
 Output: a model M2 in meta-model MM2 that is
equivalent to M1.
 Transform M1 to the super-model, yielding M’.
 While M’ includes features that are not present in
MM2, apply transformations to remove these
features (e.g., remove class hierarchy by translating
it to multiple vertically partitioned tables)
 Transform M’ into M2
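One feature-removal transformation, the class-hierarchy example mentioned above, might look like the following sketch. It assumes a toy representation in which each class lists only its own fields and an optional parent, and it produces one table per class (vertical partitioning) with a shared key. The representation, the `id` key, and all names are assumptions made for illustration.

```python
def flatten_hierarchy(classes, key="id"):
    """Translate a class hierarchy into vertically partitioned tables.

    Each class becomes a table holding the shared key plus the fields
    declared on that class; a subclass table references its parent's
    table through the key.
    """
    tables = {}
    for name, cls in classes.items():
        table = {"columns": [key] + list(cls["fields"]), "foreign_keys": []}
        if cls.get("parent") is not None:
            # (local column, referenced table, referenced column)
            table["foreign_keys"].append((key, cls["parent"], key))
        tables[name] = table
    return tables


# Hypothetical input model, for illustration only.
classes = {
    "Person": {"parent": None, "fields": ["name"]},
    "Student": {"parent": "Person", "fields": ["major"]},
}
tables = flatten_hierarchy(classes)
```

A full `Student` object is then recovered by joining the `Student` and `Person` tables on `id`, which is how the relational target stands in for inheritance.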
The Invert Operator
 Schema mappings are often directional:
 They map data in source schema into a target schema.
 Natural question:
 Can we find an inverse mapping?
 But what is the right definition of inverse?
 We’ll see a couple of failed attempts before we see a good
one.
 Note: algorithms here are not generic. Highly
dependent on the meta-model.
Invert Definition: Attempt 1
 Given a mapping M between a source S and target T.
 M defines a relation between pairs of instances (I,J)
that are consistent with each other:
 I is an instance of S, J is an instance of T.
 Hence, a natural definition is: M⁻¹ should define the
relation {(J, I) : (I, J) ∈ M}.
 However, inverses defined this way will not be
expressible with tuple-generating
dependencies/GLAV mappings.
 Why? See next slide.
Attempt #1 Problem Explained
 Any relation defined by TGDs is closed up on the
right and closed down on the left.
 Formally, assume
 (I,J) is in M
 I’ is a subset of I, J is a subset of J’, then
 (I’, J’) is also in M.
 However, by definition, the inverse M’ would have to be closed
up on the left and closed down on the right.
 Hence, it cannot be defined with TGDs or GLAV mappings.
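The closure property can be checked on a small case. The sketch below assumes the single tgd P(x, y) → Q(x), which also serves as an example elsewhere in this chapter, and represents a source instance as a set of P-tuples and a target instance as a set of Q-values. The helper name is hypothetical.

```python
def satisfies(I, J):
    """True if (I, J) satisfies P(x, y) -> Q(x): every x in P is in Q."""
    return all(x in J for (x, _y) in I)


I = {(1, 2), (3, 4)}
J = {1, 3}

# Closed down on the left, closed up on the right:
shrink_source = satisfies({(1, 2)}, J)        # I' is a subset of I
grow_target = satisfies(I, {1, 3, 5})         # J is a subset of J'

# The opposite directions, which an inverse would need, can fail:
grow_source = satisfies(I | {(7, 8)}, J)      # nothing in Q for x = 7
shrink_target = satisfies(I, {1})             # nothing in Q for x = 3
```

The first two checks succeed and the last two fail, which is the asymmetry that rules out expressing the flipped relation with tgds.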
Invert Definition: Attempt 2
 Definition by composition:
 M composed with M’ should be the identity mapping!
 However, it can be shown that under that condition,
a mapping has an inverse only if the following holds:
 If I1 and I2 are two distinct instances of S, then their targets
under M should be distinct instances of T.
 The above result considerably limits the mappings
that have inverses. m1 and m2 won’t have inverses:
m1 : P(x, y) → Q(x)
m2 : P(x, y, z) → Q(x, y) ∧ R(y, z)
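To see why m2 fails the distinctness condition, the sketch below computes the minimal target instance that m2 forces for a given source and exhibits two different sources with the same result. Since every valid target for this mapping is a superset of that minimal instance, the two sources have exactly the same targets. The function name and the data are hypothetical.

```python
def chase_m2(P):
    """Minimal target for P(x, y, z) -> Q(x, y) and R(y, z)."""
    Q = {(x, y) for (x, y, _z) in P}
    R = {(y, z) for (_x, y, z) in P}
    return Q, R


# Two distinct source instances that share the middle value y = 2.
I1 = {(1, 2, 3), (4, 2, 5)}
I2 = {(1, 2, 5), (4, 2, 3)}

same_target = chase_m2(I1) == chase_m2(I2)
```

The split into Q and R loses the association between x and z, so no mapping can recover which of I1 or I2 was the original source.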
Third Time’s a Charm: Quasi-Inverses
 Define equivalence between two instances w.r.t. M
as:
I1 ≅ I2 if, for every J, (I1, J) ∈ M iff (I2, J) ∈ M
 Define M’ to be a quasi-inverse of M if the
composition of M and M’ always maps I to an
instance I’ such that I ≅ I’
 Example: m : P(x, y) → Q(x)
m’ : Q(x) → ∃y P(x, y)
{P(1, 2)} → {Q(1)} → {P(1, A)}   (first arrow: m; second arrow: m’)
{P(1, 2)} ≅ {P(1, A)}, so m’ is a quasi-inverse of m
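The round trip in this example can be replayed in a few lines. This is a sketch under assumptions: existential variables are filled with fresh placeholder values named N1, N2, … (playing the role of A above), and equivalence is tested by comparing minimal targets, which is sufficient for this particular mapping because every valid target is a superset of the minimal one. Function names are hypothetical.

```python
import itertools


def apply_m(P):
    """m: P(x, y) -> Q(x). Minimal target for a source instance."""
    return {x for (x, _y) in P}


def apply_m_prime(Q):
    """m': Q(x) -> exists y. P(x, y). Uses a fresh placeholder per fact."""
    fresh = (f"N{i}" for i in itertools.count(1))
    return {(x, next(fresh)) for x in sorted(Q)}


def equivalent(I1, I2):
    """I1 and I2 are equivalent w.r.t. m if they force the same targets."""
    return apply_m(I1) == apply_m(I2)


I = {(1, 2)}
I_round_trip = apply_m_prime(apply_m(I))
```

The round trip does not return the original instance, since the value 2 is lost, but it returns an instance equivalent to it, which is all the quasi-inverse definition asks for.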
Summary of Chapter 6
 Generic model management operators save a lot of
repetitive code and can result in several forms of
efficiency gains
 Employing such operators also encourages application developers
to think carefully about the meaning of what they are doing.
 Two main open challenges:
 Can the implementation of these operators be described
in a meta-model independent fashion?
 Is model management a system in itself that should be
built or should operator implementations be individual
services?