CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm

advertisement
CSE 636
Data Integration
Answering Queries Using Views
Bucket Algorithm
The Bucket Algorithm
• Each subgoal g of Q must be “covered” by some
view
• Make a list of candidates (buckets) per query
subgoal
• Consider combinations of candidates from
different buckets
• Not all combos are “compatible”
• Keep the compatible ones and minimize them
• Discard the ones contained in another
• Take their union
2
The Bucket Algorithm
q(X,Y,R) :- ForSale(X,Y,C,”auto”), Review(X,R,”auto”),
Y > 1985
Step 1: For each subgoal, put the relevant sources into a
bucket:
V1(name, year) :- ForSale(name, year, “France”, “auto”),
year > 1990 would be relevant
V3(name, year) :- ForSale(name, year, “France”, “cheese”)
would be irrelevant
Step 2: Take the Cartesian product of the buckets
Algorithm produces maximally contained rewriting
Ignores interactions between subgoals in Step 1
3
The Bucket Algorithm: Example
V1(Std,Crs,Qtr,Title) :- reg(Std,Crs,Qtr), course(Crs,Title),
Crs ≥ 500, Qtr ≥ Aut98
V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr)
V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94
V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr), course(Crs,Title),
teaches(Prof,Crs,Qtr), Qtr ≤ Aut97
q(S,C,P) :- teaches(P,C,Q), reg(S,C,Q), course(C,T),
C ≥ 300, Q ≥ Aut95
Step 1: For each query subgoal, put the relevant sources into
a bucket
4
The Bucket Algorithm: Example
V1(Std,Crs,Qtr,Title) :- reg(Std,Crs,Qtr), course(Crs,Title),
Crs ≥ 500, Qtr ≥ Aut98
V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr)
V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94
V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr), course(Crs,Title),
teaches(Prof,Crs,Qtr), Qtr ≤ Aut97
q(S,C,P) :- teaches(P,C,Q), reg(S,C,Q), course(C,T),
C ≥ 300, Q ≥ Aut95
Buckets
teaches
PProf, CCrs, QQtr
reg
course
V2
V4
Note: Arithmetic predicates don’t pose a problem
5
The Bucket Algorithm: Example
V1(Std,Crs,Qtr,Title) :- reg(Std,Crs,Qtr), course(Crs,Title),
Crs ≥ 500, Qtr ≥ Aut98
V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr)
V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94
V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr), course(Crs,Title),
teaches(Prof,Crs,Qtr), Qtr ≤ Aut97
q(S,C,P) :- teaches(P,C,Q), reg(S,C,Q), course(C,T),
C ≥ 300, Q ≥ Aut95
Buckets
SStd, CCrs, QQtr
teaches
reg
V2
V4
V1
V2
course
Note: V3 doesn’t work: arithmetic predicates not consistent
V4 doesn’t work: S not in the output of V4
6
The Bucket Algorithm: Example
V1(Std,Crs,Qtr,Title) :- reg(Std,Crs,Qtr), course(Crs,Title),
Crs ≥ 500, Qtr ≥ Aut98
V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr)
V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94
V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr), course(Crs,Title),
teaches(Prof,Crs,Qtr), Qtr ≤ Aut97
q(S,C,P) :- teaches(P,C,Q), reg(S,C,Q), course(C,T),
C ≥ 300, Q ≥ Aut95
Buckets
CCrs, TTitle
teaches
reg
course
V2
V4
V1
V2
V1
V4
7
The Bucket Algorithm: Example
Step 2:
• Try all combos of views, one each from a bucket
• Test satisfaction of arithmetic predicates in each case
– e.g., two views may not overlap, i.e., they may be
inconsistent
• Desired rewriting = union of surviving ones
Query rewriting 1:
teaches
reg
course
V2
V4
V1
V2
V1
V4
q1(S,C,P) :- V2(S’,P,C,Q), V1(S,C,Q,T’), V1(S”,C,Q’,T)
– no problem from arithmetic predicates (none in V2)
– May or may not be minimal (why?)
8
The Bucket Algorithm: Example
Unfolding of rewriting 1:
q1’(S,C,P) :- r(S’,C,Q), t(P,C,Q), r(S,C,Q), c(C,T’), r(S”,C,Q’),
c(C,T), C ≥ 500, Q ≥ Aut98, C ≥ 500, Q’ ≥ Aut98
• Black r’s can be mapped to green r:
S’S, S”S, Q’Q
• Black c can be mapped to green c:
just extend above mapping to TT’
Minimized unfolding of rewriting 1:
q1m’(S,C,P) :- t(P,C,Q), r(S,C,Q), c(C,T’), C ≥ 500, Q ≥ Aut98
Minimized rewriting 1:
q1m(S,C,P) :- V2(S’,P,C,Q), V1(S,C,Q,T’)
9
The Bucket Algorithm: Example
Query Rewriting 2:
teaches
reg
course
V2
V4
V1
V2
V1
V4
q2(S,C,P) :- V2(S’,P,C,Q), V1(S,C,Q,T’), V4(P’,C,T,Q’)
q2’(S,C,P) :- r(S’,C,Q), t(P,C,Q), r(S,C,Q),
r(S,C,Q), c(C,T’), C ≥ 500, Q ≥ Aut98,
r(S”,C,Q’), c(C,T), t(P’,C,Q’), Q’ ≤ Aut97
• This combo is infeasible: consider the conjunction of
arithmetic predicates in V1 and V4
Query rewriting 3:
teaches
reg
course
V2
V4
V1
V2
V1
V4
q3(S,C,P) :- V2(S’,P,C,Q), V2(S,P’,C,Q), V4(P”,C,T,Q’)
10
The Bucket Algorithm: Example
Unfolding of rewriting 3:
q3’(S,C,P) :- r(S’,C,Q), t(P,C,Q), r(S,C,Q), t(P’,C,Q), r(S”,C,Q’),
c(C,T), t(P”,C,Q’), Q’ ≤ Aut97
• The green subgoals can cover the black ones under the
mapping: S’S, S”S, P’P, P”P, Q’Q
Minimized rewriting 3:
q3m(S,C,P) :- V2(S,P,C,Q), V4(P,C,T,Q)
Verify that there are only two rewritings that are not covered
by others
Maximally Contained Rewriting:
q’ = q1m  q3m
11
The Bucket Algorithm: Example 2
Query:
q(X) :- cites(X,Y), cites(Y,X), sameTopic(X,Y)
Views:
V4(A) :- cites(A,B), cites(B,A)
V5(C,D) :- sameTopic(C,D)
V6(F,H) :- cites(F,G), cites(G,H), sameTopic(F,G)
Buckets
cites
cites
sameTopic
V4
V4
V5
V6
V6
V6
Note: Should we list V4(X) twice in the buckets?
12
The Bucket Algorithm: Example 2
• Consider all combos & check for containment of
the unfolded rewriting in Q
• V4(X) cannot be combined with anything (why?)
Try q1(X) :- V4(X), V4(X), V5(X,Y)
Try q2(X) :- V4(X), V6(X,Y), V5(X,Y)
• Does any of these work?
• When can we discard a view from consideration?
13
The Bucket Algorithm: Example 2
Here is a successful rewriting:
q3(X) :- V6(X,Y), V6(X,Y), V6(X,Y)
• By itself is not contained in Q
• But, with subgoal X=Y added, it is!
By minimizing the rewriting, we get:
q3m(X,Y) :- V6(X,X)
14
The Bucket Algorithm: Example 2
Remarks:
• V4 didn’t contribute to any rewrite, but the
bucket algorithm doesn’t recognize it ahead
• Consider:
q2(X,Y) :- cites(X,Y), cites(Y,X)
• Then both cites predicates can be folded into V4
– Not recognized by the bucket algorithm
15
The State of Affairs
• Bucket algorithm:
– deals well with predicates, Cartesian product can be
large (containment check required for every candidate
rewriting)
• Inverse rules:
– modular (extensible to binding patterns, FD’s)
– no treatment of predicates
– resulting rewritings need significant further
optimization
Neither scales up
• The MINICON algorithm:
– change perspective: look at query variables
16
References
• Querying Heterogeneous Information Sources
Using Source Descriptors
– By Alon Y. Levy, Anand Rajaraman and Joann J. Ordille
– VLDB, 1996
• Laks VS Lakshmanan
– Lecture Slides
• Alon Halevy
– Answering Queries Using Views: A Survey
– VLDB Journal, 2000
– http://citeseer.ist.psu.edu/halevy00answering.html
17
Download