Global-as-View and Local-as-View
for Information Integration
CS652 Spring 2004
Presenter: Yihong Ding
1
Common Integration Architecture
• Information Integration Systems
• Global-as-view (Gav) vs. Local-as-view (Lav)
• Query Reformulation
• Specification of Source Description
• Adding new sources
2
Query Reformulation
• Problem: rewrite a user query expressed in the mediated schema into a query expressed in the source schema
Given a query Q in terms of the mediator schema relations, and descriptions of information sources
Find a query Q’ that uses only the source relations, such that
– Q’ ⊆ Q, and
– Q’ provides all possible answers to Q given the sources
3
Solving Queries by Views
Mediator Relations
Source Relations
4
Query Rewriting Using Views
• Query Containment: q’ ⊆ q ⇔ ∀D, q’(D) ⊆ q(D)
• Query Equivalence: q’ = q ⇔ q’ ⊆ q ∧ q ⊆ q’
Given query q and view definitions V={v1, …, vn}
• q’ is an Equivalent Rewriting of q using V if
– q’ refers only to views in V, and
– q’ = q
• q’ is a Maximally-Contained Rewriting of q using V if
– q’ refers only to views in V, and
– q’ ⊆ q, and
– There is no rewriting q1 such that q’ ⊆ q1 and q1 ≠ q’
5
Computation Complexity
[Figure: classes of the polynomial hierarchy at levels k and k+1]
6
Complexity of Query Containment
• Conjunctive Queries (CQ) (NP-Complete)
– Q1: p(X,Z) :- a(X,Y) & a(Y,Z)
– Q2: p(X,Z) :- a(X,Y) & a(V,Z)
• CQ’s With Negation (Π₂ᵖ-Complete)
– Q1: p(X,Z) :- a(X,Y) & a(Y,Z) & NOT a(X,Z)
• CQ’s With Arithmetic Comparison (Π₂ᵖ-Complete)
– Q1: p(X,Z) :- a(X,Y) & a(Y,Z) & X<Y
• Datalog Programs
– p(A,C) :- a(A,B) & b(B,C)
7
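For the two plain conjunctive queries Q1 and Q2 on this slide, containment can be decided by searching for a containment mapping, that is, a mapping from the variables of the containing query to the variables of the contained query that preserves the head and sends every body atom to a body atom. The sketch below is a brute-force illustration only; the tuple encoding of queries and the name `contained_in` are assumptions made for this example.

```python
from itertools import product

# A conjunctive query is (head_vars, body); body is a list of (predicate, args).
# All arguments are variables, written as strings (encoding assumed for this sketch).
Q1 = (("X", "Z"), [("a", ("X", "Y")), ("a", ("Y", "Z"))])
Q2 = (("X", "Z"), [("a", ("X", "Y")), ("a", ("V", "Z"))])


def contained_in(q_sub, q_sup):
    """Return True if q_sub is contained in q_sub's superset candidate q_sup.

    Searches exhaustively for a containment mapping from the variables of
    q_sup to the variables of q_sub.
    """
    head_sub, body_sub = q_sub
    head_sup, body_sup = q_sup
    if len(head_sub) != len(head_sup):
        return False
    sup_vars = sorted({v for _, args in body_sup for v in args} | set(head_sup))
    sub_vars = sorted({v for _, args in body_sub for v in args} | set(head_sub))
    sub_atoms = set(body_sub)
    for image in product(sub_vars, repeat=len(sup_vars)):
        h = dict(zip(sup_vars, image))
        # The mapping must send the head of q_sup onto the head of q_sub.
        if tuple(h[v] for v in head_sup) != tuple(head_sub):
            continue
        # Every body atom of q_sup must land on some body atom of q_sub.
        if all((p, tuple(h[v] for v in args)) in sub_atoms for p, args in body_sup):
            return True
    return False


print(contained_in(Q1, Q2))  # Q1 is contained in Q2 (map V to Y)
print(contained_in(Q2, Q1))  # no mapping exists in the other direction
```

The exhaustive search tries every assignment of variables, so its cost grows exponentially with the number of variables, which is in line with the NP-completeness stated above.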
Specification of Source Description
• Views: resources that the integrator uses to help answer queries
• Gav: each mediator relation is defined as a view over source relations
• Lav: each source relation is defined as a view over mediator relations
8
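To make the Gav direction concrete, the sketch below stores one mediator relation as a rule over source relations and answers a query by unfolding that rule. Under Lav the definitions point the other way, so this kind of unfolding does not apply directly. The relation names `car_review`, `s1`, `s2` and the function `unfold` are hypothetical and serve only as an illustration.

```python
# Gav: a mediator relation is defined by a rule over source relations.
# Hypothetical rule:  car_review(M, R) :- s1(M, Y), s2(M, Y, R)
GAV_RULES = {
    "car_review": (("M", "R"), [("s1", ("M", "Y")), ("s2", ("M", "Y", "R"))]),
}


def unfold(query_body):
    """Replace each mediator atom by the body of its Gav rule.

    Head variables of the rule are bound to the atom's arguments; variables
    that occur only in the rule body are renamed with the atom's position so
    that different unfoldings do not share them by accident.
    """
    result = []
    for i, (pred, args) in enumerate(query_body):
        if pred not in GAV_RULES:
            result.append((pred, args))  # already a source relation
            continue
        head_vars, body = GAV_RULES[pred]
        binding = dict(zip(head_vars, args))
        for src_pred, src_args in body:
            renamed = tuple(binding.get(v, f"{v}_{i}") for v in src_args)
            result.append((src_pred, renamed))
    return result


print(unfold([("car_review", ("m", "r"))]))
```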
Information Integration Systems
• Tsimmis
– Stanford and IBM
– Global-as-View (Gav)
– Mediator relations defined as views of source relations
• Information Manifold (IM)
– AT&T
– Local-as-View (Lav)
– Description logic
– Source relations defined as views of mediator relations (a collection of global predicates)
9
TSIMMIS – Gav Solution
• The Stanford-IBM Manager of Multiple
Information Sources (TSIMMIS)
• Offers:
– A flexible data model
– A common query language
– Other supporting tools
10
TSIMMIS – Components
• OEM (Object-Exchange Model)
• LOREL (Lightweight Object REpository
Language)
• MSL (Mediator Specification Language)
• Wrappers
11
TSIMMIS – OEM
• Object Exchange Model
• The data model for TSIMMIS
• “self-describing” (labels carry all of the
information that there is about an object)
• Flexible
• First order logic
12
TSIMMIS – OEM
Each OEM object has four fields:
OID: object identifier
label: human understandable
type: “set” or “string”
value: a set or a string
13
TSIMMIS – OEM
library  set
    book  set
        author  string  Aho
        title   string  Compilers…
14
TSIMMIS – OEM
First order predicate logic
123 author string Aho
author( T, “Aho” )
This would return the object IDs of all
objects with a label “author” and value “Aho”.
15
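The OEM fields and the `author( T, “Aho” )` lookup can be mimicked in a few lines of Python. This is only a sketch of the idea: the class `OEMObject`, the function `lookup`, and every OID other than 123 are assumptions introduced for the example, not part of TSIMMIS.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class OEMObject:
    oid: int                                  # object identifier
    label: str                                # human-understandable label
    type: str                                 # "set" or "string"
    value: Union[str, List["OEMObject"]]      # a string, or a set of sub-objects


# The library example from the slides; OIDs 120 to 122 are made up.
author = OEMObject(123, "author", "string", "Aho")
title = OEMObject(122, "title", "string", "Compilers…")
book = OEMObject(121, "book", "set", [author, title])
library = OEMObject(120, "library", "set", [book])


def lookup(root, label, value):
    """Return the OIDs of all objects under root with the given label and value."""
    found = []
    stack = [root]
    while stack:
        obj = stack.pop()
        if obj.label == label and obj.value == value:
            found.append(obj.oid)
        if obj.type == "set":
            stack.extend(obj.value)
    return found


print(lookup(library, "author", "Aho"))
```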
TSIMMIS – LOREL
• Lightweight Object REpository Language
• An OQL for OEM
• The end-user language for TSIMMIS
16
TSIMMIS – LOREL
• Example
select library.book.title
from library
where library.book.author = “Aho”
17
TSIMMIS – LOREL
• Partial Match Semantics
select R.A
from R, S, T
where R.A = S.A or R.A = T.A
• This would fail to return anything in SQL if either S
or T were empty.
• Because of partial match semantics this does not
fail in LOREL
18
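The SQL half of this claim is easy to reproduce with Python's built-in `sqlite3` module: with T empty, the cross product R × S × T is empty, so the query returns no rows even though R.A = S.A holds for one tuple. The second part of the sketch only emulates the intended partial-match result with plain Python sets; it is not how LOREL evaluates queries, and the table contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE R (A INTEGER);
    CREATE TABLE S (A INTEGER);
    CREATE TABLE T (A INTEGER);
    INSERT INTO R VALUES (1), (2);
    INSERT INTO S VALUES (1);
    """
)  # T is deliberately left empty

# SQL semantics: FROM R, S, T builds a cross product, which is empty here.
sql_rows = conn.execute(
    "SELECT R.A FROM R, S, T WHERE R.A = S.A OR R.A = T.A"
).fetchall()

# Emulation of the partial-match intent: keep R.A if it matches S or T.
r_values = [row[0] for row in conn.execute("SELECT A FROM R")]
s_values = {row[0] for row in conn.execute("SELECT A FROM S")}
t_values = {row[0] for row in conn.execute("SELECT A FROM T")}
partial_rows = [a for a in r_values if a in s_values or a in t_values]

print(sql_rows)
print(partial_rows)
```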
TSIMMIS – MSL
• Mediator Specification Language
• Allows declarative specification of mediators
• Object oriented, logical query language
• Targeted to OEM
19
TSIMMIS – MSL
[Figure: Query → Mediators → Wrappers → Sources, shown next to the OEM tree library (set) → book (set) → author (string, Aho), title (string, Compilers…)]
<booktitle X> :- <library { <book { <title X> <author “Aho”> } > } > @s1
20
TSIMMIS – Wrappers
• Wrappers are similar to database drivers
• Wrappers are written with MSL
[Figure: Query → Mediators → Wrappers → Sources]
21
TSIMMIS – Wrappers
• Wrappers have the form:
MSL template
// action //
• Example:
<books X> :- <library { X:<book {<title X> <author $AU>}> }>@s1
// sprintf(lookup-query, “find author %s”, $AU) //
22
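The action part of the wrapper rule builds a native query string from the variable bound by the template. A rough Python analogue of the `sprintf` call is shown below; the function name and the dictionary of bindings are assumptions for illustration, and the template matching itself is not modeled.

```python
def books_by_author_action(bindings):
    """Mimic the action: sprintf(lookup-query, "find author %s", $AU)."""
    return "find author %s" % bindings["$AU"]


# Suppose matching the MSL template against an incoming query bound $AU to "Aho".
lookup_query = books_by_author_action({"$AU": "Aho"})
print(lookup_query)
```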
TSIMMIS – Summary
• End users need to specify their sources w.r.t. a mediator model – OEM in TSIMMIS
• Query specification is standard – LOREL
• Query rewriting is straightforward – MSL and wrappers
• Adding a new source is not easy – it must be specified in the mediator model
23
Information Manifold
• Challenges for Information Integration
– Interrelated data over multiple information sources
– Large number of sources
– Limited size of data in many of the sources
– Widely varying details of interacting with each source
24
IM Architecture
[Figure: IM architecture, with the bucket algorithm marked in steps 1, 2, 3]
25
World View
Classes:
Product
    Automobile
        Car
            NewCar
            UsedCar
            CarForSale
        Motorcycle
Virtual Relations:
Product(Model)
Automobile(Model, Year, Category)
Motorcycle(Model, Year)
Car(Model, Year, Category)
NewCar(Model, Year, Category)
UsedCar(Model, Year, Category)
CarForSale(Model, Year, Category, Price, SellerContact)
26
Source Descriptions
For each source:
• Content Record
• Capability Record
Web Sources for
Automobile Application
27
Content Records of Auto Sources
28
Capability Records of Auto Sources
desired input set
possible output set
capable selection set
29
Query Reformulation
• Contained rather than equivalent rewritings
– Incomplete sources
– Useful subset
• Utilizes Plan Generator to:
– Prune irrelevant sources
– Split query into subgoals
– Generate conjunctive query plans
– Find executable ordering of subgoals
30
The Bucket Algorithm
Given: user query q, source descriptions {Vi}
1. Find relevant sources (fill buckets)
For each relation g in query q
• Find Vj that contains relation g
• Check that constraints in Vj are compatible with q
2. Combine source relations {Vj} from each bucket into a conjunctive query q’ and check for containment (q’ ⊆ q)
31
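The two steps above can be sketched in Python under heavy simplification. Source descriptions are reduced to the set of mediator relations a view covers plus an optional upper bound on Year; variable mappings are ignored, and the final containment test q’ ⊆ q is left out. The view contents, the extra view `Vx`, and all function names are assumptions for illustration and are not taken from the Information Manifold content records.

```python
from itertools import product

CAR_RELATIONS = {"CarForSale", "Category", "Year", "Model", "Price"}

# Assumed stand-ins for source descriptions.
VIEWS = {
    "V1": {"covers": CAR_RELATIONS, "max_year": None},
    "V2": {"covers": CAR_RELATIONS, "max_year": None},
    "V3": {"covers": CAR_RELATIONS, "max_year": None},
    "V5": {"covers": {"ProductReview"}, "max_year": None},
    "Vx": {"covers": CAR_RELATIONS, "max_year": 1990},  # hypothetical old-cars source
}

QUERY = {
    "subgoals": ["CarForSale", "Category", "Year", "Model", "Price", "ProductReview"],
    "min_year": 1992,  # the constraint y >= 1992
}


def compatible(view, query):
    """A view is pruned if its Year bound cannot meet the query's constraint."""
    return view["max_year"] is None or view["max_year"] >= query["min_year"]


def fill_buckets(query, views):
    """Step 1: for each subgoal, collect the compatible views that cover it."""
    return {
        g: [name for name, v in views.items() if g in v["covers"] and compatible(v, query)]
        for g in query["subgoals"]
    }


def candidate_plans(buckets):
    """Step 2 (first half): pick one view per bucket.

    Each candidate would still have to pass the containment check.
    """
    return [dict(zip(buckets, choice)) for choice in product(*buckets.values())]


buckets = fill_buckets(QUERY, VIEWS)
plans = candidate_plans(buckets)
print(buckets["Year"], buckets["ProductReview"], len(plans))
```

Enumerating one view per bucket makes the cost of the second step visible: the number of candidates is the product of the bucket sizes, which is why pruning incompatible sources in step 1 matters.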
The Bucket Algorithm: Example
q(m,p,r) ← CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)
32
1. Filling the Buckets
q(m,p,r) ← CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)

Subgoal                        Bucket
CarForSale(c)                  V1(c1), V2(c2), V3(c3)
Category(c,t), t=sportscar     V1(c1,t1), V2(c2,t2), V3(c3,t3)
Year(c,y), y ≥ 1992            V1(c1,y1), V2(c2,y2), V3(c3,y3)
Model(c,m)                     V1(c1,m1), V2(c2,m2), V3(c3,m3)
Price(c,p)                     V1(c1,p1), V2(c2,p2), V3(c3,p3)
ProductReview(m,y,r)           V5(m5,y5,r5)
33
2. Checking Containment
User Query
q(m,p,r) ← CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)
Result Query
q’(m,p,r) ← V1(c)({Category(c):sportscar}, {Price(c), Model(c), Year(c)}, {Year(c) ≥ 1992, Category(c)=sportscar}), V5(m,y,r)({m:Model(c), y:Year(c)}, {r}, {}).
Expanded Query
q’(m,p,r) ← CarForSale(c), UsedCar(c), Category(c,t), t=sportscar, Model(c,m), Year(c,y), Price(c,p), ProductReview(m,y,r), y ≥ 1992
Check: is the expanded query contained in the user query (q’ ⊆ q)?
34
Finding an Executable Ordering
Subgoal                        Selected source
CarForSale(c)                  V1(c)
Category(c,t), t=sportscar     V1(c,t)
Year(c,y), y ≥ 1992            V1(c,y)
Model(c,m)                     V1(c,m)
Price(c,p)                     V1(c,p)
ProductReview(m,y,r)           V5(m,y,r)
BindAvail1 = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s)}
BindAvail1 = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s), ProductReview(m,y,r)}
BindAvail1 = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s), ProductReview(m,y,r), y ≥ 1992}
35
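One simple way to look for an executable ordering is a greedy pass: repeatedly pick a source whose required inputs are already bound, then add its outputs to the set of available bindings. The sketch below assumes that V1 needs no variable inputs (its Category input is the constant sportscar from the query) and that V5 needs m and y, as in its capability record above; the triple encoding and the function name are assumptions, not the Information Manifold procedure itself.

```python
def executable_order(subgoals, initially_bound=frozenset()):
    """Greedily order (name, required_inputs, outputs) triples.

    Returns a list of names in an executable order, or None if some
    source can never obtain its required input bindings.
    """
    bound = set(initially_bound)
    remaining = list(subgoals)
    order = []
    while remaining:
        for subgoal in remaining:
            name, needs, gives = subgoal
            if set(needs) <= bound:
                order.append(name)
                bound |= set(gives)
                remaining.remove(subgoal)
                break
        else:
            return None  # no remaining source is executable
    return order


plan = [
    ("V5", {"m", "y"}, {"r"}),            # reviews need model and year as inputs
    ("V1", set(), {"c", "m", "y", "p"}),  # car listings bind c, m, y, p
]
print(executable_order(plan))
```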
Advantages and Disadvantages
• Gav: Tsimmis
– Advantage
• Query reformulation: rule unfolding
– Disadvantage
• Mediation description
• Adding, removing, and modifying source description
– Better for static, centralized systems
• Lav: Information Manifold
– Advantage: adding new sources
• Mediator (global predicates, source descriptions)
• Query processing
– Disadvantages
• query reformulation (Bucket algorithm)
– Better for dynamic, distributed systems
36