Generating Efficient Plans for Queries Using Views
Chen Li
Stanford University
with Foto Afrati (National Technical University of Athens)
and Jeff Ullman (Stanford University)
SIGMOD, Santa Barbara, CA, May 23, 2001
Answering queries using views
• How to answer a query using only the results of views? [LMSS95]
• Many applications:
– Data warehouses
– Data integration
– Query optimization
– …
[Figure: query Q is posed over views V1, V2, …, Vn, which are defined over base relations R1, …, Rm.]
2
An example
[Tables: car(Make, Dealer) with makes BMW, Honda, Ford, … and dealers Anderson, Alison, Varsity, …; loc(Dealer, City) with dealers Anderson, Varsity, Alison, … and cities Palo Alto, Redwood City, Mountain View, …]
View: V1(M, D, C) :- car(M, D), loc(D, C)
Query Q: Q(M, C) :- car(M, anderson), loc(anderson, C)
Rewriting P1: Q(M, C) :- V1(M, anderson, C)
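The view, query, and rewriting above can be checked on a small instance. The following is a minimal sketch, not part of the talk: the `evaluate` helper is a hypothetical naive evaluator for conjunctive queries, and the sample tuples (including which dealer sells which make) are assumptions made up for illustration.

```python
def evaluate(head, body, db):
    """Evaluate a conjunctive query over db (dict: predicate -> set of tuples).

    Terms starting with an uppercase letter are variables; others are constants.
    """
    bindings = [{}]
    for pred, args in body:
        extended = []
        for binding in bindings:
            for row in db.get(pred, ()):
                candidate = dict(binding)
                if all(
                    candidate.setdefault(term, value) == value
                    if term[0].isupper() else term == value
                    for term, value in zip(args, row)
                ):
                    extended.append(candidate)
        bindings = extended
    return {tuple(b[t] if t[0].isupper() else t for t in head) for b in bindings}


# Hypothetical base data; the pairings are assumptions, not taken from the slide.
db = {
    "car": {("bmw", "anderson"), ("honda", "anderson"), ("ford", "varsity")},
    "loc": {("anderson", "palo_alto"), ("varsity", "redwood_city")},
}

# V1(M, D, C) :- car(M, D), loc(D, C), materialized completely.
v1 = evaluate(("M", "D", "C"), [("car", ("M", "D")), ("loc", ("D", "C"))], db)

# Q(M, C) :- car(M, anderson), loc(anderson, C), evaluated on the base relations.
q = evaluate(("M", "C"), [("car", ("M", "anderson")), ("loc", ("anderson", "C"))], db)

# P1: Q(M, C) :- V1(M, anderson, C), evaluated on the view only.
p1 = evaluate(("M", "C"), [("V1", ("M", "anderson", "C"))], {"V1": v1})

print(sorted(q))
print(p1 == q)
```

On this instance the rewriting returns the same answers as the query while touching only the view.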
3
Existing algorithms
• Bucket algorithm [LRO96], Inverse-rule algorithm [DG97], MiniCon algorithm [PL00], …
• However, instead of generating
P1: Q(M, C) :- V1(M, anderson, C)
they generate the rewriting
P2: Q(M, C) :- V1(M, anderson, C1), V1(M1, anderson, C)
• Why P2, not P1?
– These algorithms take the Open-World Assumption (OWA): “P2 ⊇ P1.”
– However, under the Closed-World Assumption (CWA): “P1 = P2.”
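The difference between P1 and P2 shows up only when the view instance is incomplete. The sketch below is illustrative: `evaluate` is a hypothetical naive evaluator, and both view instances are made-up data, assuming the base relations hold car(bmw, anderson), car(honda, anderson), loc(anderson, palo_alto), and loc(anderson, san_jose).

```python
def evaluate(head, body, db):
    """Evaluate a conjunctive query over db (dict: predicate -> set of tuples).

    Terms starting with an uppercase letter are variables; others are constants.
    """
    bindings = [{}]
    for pred, args in body:
        extended = []
        for binding in bindings:
            for row in db.get(pred, ()):
                candidate = dict(binding)
                if all(
                    candidate.setdefault(term, value) == value
                    if term[0].isupper() else term == value
                    for term, value in zip(args, row)
                ):
                    extended.append(candidate)
        bindings = extended
    return {tuple(b[t] if t[0].isupper() else t for t in head) for b in bindings}


P1 = (("M", "C"), [("V1", ("M", "anderson", "C"))])
P2 = (("M", "C"), [("V1", ("M", "anderson", "C1")), ("V1", ("M1", "anderson", "C"))])

# CWA: the view instance holds every tuple its definition produces.
complete_v1 = {
    (make, "anderson", city)
    for make in ("bmw", "honda")
    for city in ("palo_alto", "san_jose")
}
# OWA: the view instance may hold only some of those tuples (assumed subset).
partial_v1 = {("bmw", "anderson", "palo_alto"), ("honda", "anderson", "san_jose")}

results = {}
for label, v1 in (("CWA", complete_v1), ("OWA", partial_v1)):
    results[label] = (evaluate(*P1, {"V1": v1}), evaluate(*P2, {"V1": v1}))
    print(label, len(results[label][0]), len(results[label][1]))
```

With the complete instance both rewritings return the same four answers; with the partial instance P1 returns two answers and P2 recovers all four, which is why algorithms designed for the OWA prefer P2.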
4
Differences between OWA and CWA
W1(Make, Dealer) :- car(Make, Dealer)
W2(Make, Dealer) :- car(Make, Dealer)
CWA: W1 = W2 = all car tuples
– W1 and W2 have all car tuples.
– E.g.: W1 and W2 are computed from the same car table in a database.
OWA: W1 and W2 may differ
– W1 and W2 have some car tuples.
– E.g.: W1 and W2 are from two different web sites.
5
Our problem: generating efficient plans using views under CWA
[Figure: query Q is posed over materialized views V1, V2, …, Vn, which are defined over base relations R1, R2, …, Rm; the open question is which plans are efficient.]
• Existing algorithms work under both assumptions.
• Our study
– assumes the CWA.
– considers the efficiency of rewritings.
6
Challenge: in what space should we generate rewritings?
Base relations: car(Make, Dealer), loc(Dealer, City), part(Store, Make, City)
Q(S, C) :- car(M, a), loc(a, C), part(S, M, C)
V1(M, D, C) :- car(M, D), loc(D, C)
V2(S, M, C) :- part(S, M, C)
V3(S) :- car(M, a), loc(a, C), part(S, M, C)
a = ‘anderson’
Rewritings:
P1: Q(S, C) :- V1(M, a, C), V2(S, M, C)
P2: Q(S, C) :- V3(S), V1(M, a, C), V2(S, M, C)
P2 could be more efficient than P1!
7
Focus
[Figure: from views V1, V2, …, Vn and query Q, Step 1 generates a rewriting P (logical plan) and Step 2 generates an efficient physical plan from P, under cost model CM.]
• We focus on the logical level (step 1).
– Prune the rewriting space to generate “good” rewritings.
– Different from the one-step approach [CKPS95, ZCLPU00].
• Both steps are cost-based.
• Consider select-project-join queries, i.e., conjunctive queries.
8
Rest of the talk
• Three cost models:
– CM1: number of subgoals in a physical plan
– CM2: sizes of views and intermediate relations
– CM3: CM2 + dropping attributes in intermediate relations
• Experimental results
• Conclusion and future directions
9
Cost model CM1
• CM1: number of subgoals in a physical plan
– Goal: generate rewritings with the minimum number of subgoals
• Motivations:
– Reduce the number of joins
– Reduce the number of view accesses
• Example:
– P1: Q(S, C) :- V1(M, a, C), V2(S, M, C) (more efficient)
– P2: Q(S, C) :- V1(M1, a, C), V1(M, a, C1), V2(S, M, C)
A view can appear more than once in different “forms.”
10
Results under CM1
• Analyze the rewriting space:
– Find an interesting structure of the space;
– Show a procedure to reduce the number of subgoals in a rewriting.
• Develop an algorithm CoreCover:
– Input: a query Q, views V1, …, Vn
– Output: rewritings with the minimum number of subgoals
• Optimality: if there is a rewriting, then CoreCover is guaranteed to find a rewriting with the minimum number of subgoals.
11
CoreCover: example
Intuition: translate the problem to a set-covering problem.
Query: Q(S, C) :- car(M, a), loc(a, C), part(S, M, C)
Construct database D = { car(m0, a), loc(a, c0), part(s0, m0, c0) }
Evaluate views on D:
V1(M, D, C) :- car(M, D), loc(D, C) → V1(m0, a, c0)
V2(S, M, C) :- part(S, M, C) → V2(s0, m0, c0)
V3(S) :- car(M, a), loc(a, C), part(S, M, C) → V3(s0)
View tuples: V1(M, a, C), V2(S, M, C), V3(S)
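The first two steps of the example (freezing the query into database D and evaluating the views on it) can be sketched as follows. This is an illustrative sketch only: `evaluate` and `freeze` are hypothetical helpers, and the naming scheme for frozen constants (M becomes m0) simply mirrors the slide.

```python
def evaluate(head, body, db):
    """Evaluate a conjunctive query over db (dict: predicate -> set of tuples).

    Terms starting with an uppercase letter are variables; others are constants.
    """
    bindings = [{}]
    for pred, args in body:
        extended = []
        for binding in bindings:
            for row in db.get(pred, ()):
                candidate = dict(binding)
                if all(
                    candidate.setdefault(term, value) == value
                    if term[0].isupper() else term == value
                    for term, value in zip(args, row)
                ):
                    extended.append(candidate)
        bindings = extended
    return {tuple(b[t] if t[0].isupper() else t for t in head) for b in bindings}


query_body = [("car", ("M", "a")), ("loc", ("a", "C")), ("part", ("S", "M", "C"))]
views = {
    "V1": (("M", "D", "C"), [("car", ("M", "D")), ("loc", ("D", "C"))]),
    "V2": (("S", "M", "C"), [("part", ("S", "M", "C"))]),
    "V3": (("S",), query_body),
}


def freeze(term):
    """Turn a query variable into a distinct constant (M -> m0); keep constants."""
    return term.lower() + "0" if term[0].isupper() else term


# Database D: one fact per query subgoal, with variables frozen.
canonical_db = {}
for pred, args in query_body:
    canonical_db.setdefault(pred, set()).add(tuple(freeze(t) for t in args))

# Evaluate every view on D, then map the frozen constants back to variables.
frozen = {name: evaluate(head, body, canonical_db) for name, (head, body) in views.items()}
unfreeze = {freeze(t): t for _, args in query_body for t in args}
view_tuples = sorted(
    f"{name}({', '.join(unfreeze[v] for v in row)})"
    for name, rows in frozen.items()
    for row in rows
)
print(view_tuples)  # ['V1(M, a, C)', 'V2(S, M, C)', 'V3(S)']
```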
12
CoreCover: example (cont.)
Query: Q(S, C) :- car(M, a), loc(a, C), part(S, M, C)
View tuples: V1(M, a, C), V2(S, M, C), V3(S)
View definitions:
V1(M, D, C) :- car(M, D), loc(D, C)
V2(S, M, C) :- part(S, M, C)
V3(S) :- car(M, a), loc(a, C), part(S, M, C)
Find query subgoals “covered” by each view tuple:
[Figure: view tuples V1(M, a, C), V2(S, M, C), V3(S) linked to the query subgoals car(M, a), loc(a, C), part(S, M, C) that they cover.]
Find minimal covers of query subgoals using view tuples:
Q(S, C) :- V1(M, a, C), V2(S, M, C)
13
Algorithm: CoreCover
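– Construct database D from Q.
– Evaluate views on D to obtain the “view tuples” T1, T2, …, Tk.
– Find the query subgoals “answered” by each view tuple.
– Find minimal covers of the query subgoals G1, G2, …, Gm using view tuples; these covers give the rewritings.
[Figure: view tuples T1, T2, …, Tk linked to the query subgoals G1, G2, …, Gm that they answer.]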
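The last step is a set-covering search. The sketch below uses brute-force enumeration, which is an assumption chosen for brevity and not necessarily how CoreCover searches. The coverage sets are entered by hand: those for V1 and V2 follow from the view definitions, while the empty set for V3(S) is an assumption consistent with the cover shown in the example.

```python
from itertools import combinations


def minimum_covers(subgoals, covered_by):
    """Return all smallest sets of view tuples that together cover every query subgoal."""
    names = sorted(covered_by)
    for size in range(1, len(names) + 1):
        found = [
            combo
            for combo in combinations(names, size)
            if set().union(*(covered_by[name] for name in combo)) >= subgoals
        ]
        if found:
            return found  # stop at the first size that admits a cover
    return []


subgoals = {"car(M, a)", "loc(a, C)", "part(S, M, C)"}
covered_by = {
    "V1(M, a, C)": {"car(M, a)", "loc(a, C)"},
    "V2(S, M, C)": {"part(S, M, C)"},
    "V3(S)": set(),  # assumption: V3(S) covers no query subgoal on its own
}

covers = minimum_covers(subgoals, covered_by)
print(covers)  # [('V1(M, a, C)', 'V2(S, M, C)')]
```

The single minimum cover corresponds to the rewriting Q(S, C) :- V1(M, a, C), V2(S, M, C). Enumeration is exponential in the number of view tuples, so this form is only suitable for small examples.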
14
Cost model CM2: considering sizes of views and intermediate relations
Motivation: the cost of V1 ⋈ V2 is related to size(V1) and size(V2).
Physical plan: Q( ) :- V1, V2, V3, …, Vn
[Figure: the plan annotated with intermediate relations IR1, IR2, …, IRn.]
“IR”: intermediate relation
Cost = size(V1) + size(V2) + … + size(Vn) + size(IR1) + size(IR2) + … + size(IRn)
15
Results under CM2
• Observation: adding more views may make a rewriting more efficient.
P1: Q(S, C) :- V1(M, a, C), V2(S, M, C)
P2: Q(S, C) :- V3(S), V1(M, a, C), V2(S, M, C)
If V3(S) is very selective, P2 can be more efficient than P1.
• Larger search space: rewritings using view tuples produce an optimal physical plan under CM2.
– Modify CoreCover to find these rewritings.
– We discuss how to condense rewritings.
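The observation can be made concrete with a toy calculation of the CM2 cost. This is a sketch under stated assumptions: size is measured in tuples, IRi is taken to be the join of the first i views in the plan order, V1 stands for V1(M, a, C) with the constant already applied, and all view contents are made-up data. The helpers `natural_join` and `cm2_cost` are hypothetical names.

```python
def natural_join(left, right):
    """Join two relations, each given as (attribute tuple, set of rows), on shared attributes."""
    l_attrs, l_rows = left
    r_attrs, r_rows = right
    shared = [a for a in l_attrs if a in r_attrs]
    extra = [a for a in r_attrs if a not in l_attrs]
    rows = set()
    for l_row in l_rows:
        l_map = dict(zip(l_attrs, l_row))
        for r_row in r_rows:
            r_map = dict(zip(r_attrs, r_row))
            if all(l_map[a] == r_map[a] for a in shared):
                rows.add(l_row + tuple(r_map[a] for a in extra))
    return l_attrs + tuple(extra), rows


def cm2_cost(plan):
    """Sum the sizes of the views and of the intermediate relations of a left-deep plan."""
    cost = 0
    ir = None
    for view in plan:
        ir = view if ir is None else natural_join(ir, view)  # IRi keeps all attributes
        cost += len(view[1]) + len(ir[1])
    return cost


# Made-up view contents: only store s1 sells a make that dealer a carries.
v1 = (("M", "C"), {(f"m{i}", "c0") for i in range(6)})
v2 = (("S", "M", "C"), {("s1", "m0", "c0")} | {(f"t{i}", f"x{i}", "c0") for i in range(7)})
v3 = (("S",), {("s1",)})

cost_p1 = cm2_cost([v1, v2])      # P1: V1, V2
cost_p2 = cm2_cost([v3, v2, v1])  # P2: V3, V2, V1
print(cost_p1, cost_p2)  # 21 18
```

Here the selective V3(S) keeps the intermediate relations small, so P2 costs less than P1 even though it accesses one more view.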
16
Cost model CM3: dropping irrelevant attributes
Q( ) :- …, Vi, Vi+1, …
[Figure: intermediate relation IRi, produced after joining Vi, with an attribute Y.]
• CM2: assumes all attributes are kept in IRs.
• CM3: assumes attributes can be dropped from IRs to reduce their sizes.
• Bad news: we did not find a search space that is guaranteed to produce an optimal physical plan.
• Good news: we found a heuristic that lets the optimizer drop more attributes.
17
Drop what attributes?
Q( ) :- …, Vi, Vi+1, …
[Figure: intermediate relation IRi, produced after joining Vi, with an attribute Y.]
• Drop Y if: (1) Y is not used in later joins, and (2) Y is not in the answers.
Called the “supplementary-relation approach” [BR87].
18
Search space under CM3?
Rewritings using view tuples may not produce optimal physical plans!
Base relations: r(A, B), s(C, D), t(E, F)
Q(A) :- r(A, A), t(A, B), s(B, B)
V1(A, B) :- r(A, A), s(B, B)
V2(A, B) :- t(A, B), s(B, B)
Rewriting using view tuples: P1: Q(A) :- V1(A, B), V2(A, B)
A more efficient rewriting: P2: Q(A) :- V1(A, C), V2(A, B)
Note: P1 and P2 both compute the answers to Q.
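The note can be spot-checked on a small instance. Agreement on one database does not prove equivalence; the sketch only illustrates the claim. `evaluate` is a hypothetical naive evaluator, and the contents of r, s, and t are made-up data.

```python
def evaluate(head, body, db):
    """Evaluate a conjunctive query over db (dict: predicate -> set of tuples).

    Terms starting with an uppercase letter are variables; others are constants.
    """
    bindings = [{}]
    for pred, args in body:
        extended = []
        for binding in bindings:
            for row in db.get(pred, ()):
                candidate = dict(binding)
                if all(
                    candidate.setdefault(term, value) == value
                    if term[0].isupper() else term == value
                    for term, value in zip(args, row)
                ):
                    extended.append(candidate)
        bindings = extended
    return {tuple(b[t] if t[0].isupper() else t for t in head) for b in bindings}


# Made-up base relations.
db = {
    "r": {("x", "x"), ("y", "y"), ("x", "y")},
    "t": {("x", "u"), ("y", "w")},
    "s": {("u", "u"), ("v", "v"), ("w", "v")},
}

q = evaluate(("A",), [("r", ("A", "A")), ("t", ("A", "B")), ("s", ("B", "B"))], db)

# Materialize both views completely (CWA).
view_db = {
    "V1": evaluate(("A", "B"), [("r", ("A", "A")), ("s", ("B", "B"))], db),
    "V2": evaluate(("A", "B"), [("t", ("A", "B")), ("s", ("B", "B"))], db),
}

p1 = evaluate(("A",), [("V1", ("A", "B")), ("V2", ("A", "B"))], view_db)
p2 = evaluate(("A",), [("V1", ("A", "C")), ("V2", ("A", "B"))], view_db)
print(q, p1, p2)
```

All three evaluations return the single answer x on this instance.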
19
Targeting rewritings to facilitate dropping of attributes
Q( ) :- …, Vi, Vi+1, …
[Figure: attribute Y in intermediate relation IRi is renamed to Y’ (Y → Y’).]
• Goal: after the transformation, we may drop more attributes.
• Main idea: given a sequence of subgoals, rename variables.
• If, after renaming Y → Y’, the new rewriting is still equivalent to Q, then drop Y’ in IRi even if Y appears in later joins.
P1: Q(A) :- V1(A, B), V2(A, B)
P2: Q(A) :- V1(A, C), V2(A, B)
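The effect of the renaming on P1 and P2 can be sketched with the supplementary-relation rule: keep a variable in an intermediate relation only if it appears in a later subgoal or in the head. The helper name `kept_attributes` is hypothetical, and the sketch assumes every argument is a variable.

```python
def kept_attributes(head_vars, subgoals):
    """For each prefix of the plan, list the variables kept in its intermediate relation."""
    kept = []
    seen = set()
    for i, (_, args) in enumerate(subgoals):
        seen.update(args)
        # Variables that still take part in joins after subgoal i.
        later = {v for _, later_args in subgoals[i + 1:] for v in later_args}
        kept.append(sorted(v for v in seen if v in later or v in head_vars))
    return kept


p1 = [("V1", ("A", "B")), ("V2", ("A", "B"))]
p2 = [("V1", ("A", "C")), ("V2", ("A", "B"))]

kept_p1 = kept_attributes({"A"}, p1)
kept_p2 = kept_attributes({"A"}, p2)
print(kept_p1)  # [['A', 'B'], ['A']]
print(kept_p2)  # [['A'], ['A']]
```

In P1 the first intermediate relation must carry B because V2 joins on it; after renaming B to C in the first subgoal, P2 can drop that attribute immediately.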
20
Experimental study
• Purpose:
– Test how fast CoreCover generates rewritings (cost model CM1).
– Analyze its efficiency and scalability.
• Experiment setup:
– A query generator (in Java). Input parameters:
    • Number of base relations
    • Number of attributes in a relation
    • Number of views (1-1000), queries (5)
    • Number of subgoals in a view and a query
    • Shape of queries and views (star, chain, …)
– Implemented in Java on a dual-processor Sun Ultra 2 workstation running SunOS 5.6 with 256MB memory
21
Star queries and views
• Each query has 8 subgoals, and each view has 1, 2, or 3 subgoals.
• No attribute projection in the head of the queries/views.
22
Chain queries and views
• Each query has 8 subgoals, and each view has 1, 2, or 3 subgoals.
• 1 variable is projected in the head of the queries/views.
23
Conclusion
Generating efficient plans using views under CWA:
– Cost model CM1: number of subgoals in a plan
• Analysis of the rewriting space
• A search space for rewritings
• CoreCover: finding rewritings with minimum number of subgoals
– Cost model CM2: sizes of views and IRs
• A search space for rewritings
• Condense rewritings
– Cost model CM3: dropping irrelevant attributes in IRs
• A heuristic to help optimizer drop attributes
24
Future work
• More complicated queries and views:
– Arithmetic comparisons (<=, >=, …)
– Aggregations
• Different assumptions:
– Open-world assumption
– Maximally-contained rewritings
• Constraints:
– Functional dependencies
– Foreign-key constraints
25
Thank you!
Questions?
26
Differences between CoreCover and MiniCon
• CoreCover assumes the CWA, and MiniCon assumes the OWA.
• MiniCon tries to minimize the number of query subgoals, but it offers no guarantee.
• Technical differences:
– CoreCover is more “aggressive” than MiniCon about finding query subgoals answered by a view tuple.
– Finding set covers of query subgoals: CoreCover allows overlapping, and MiniCon does not.
27
Difference from earlier studies
[Figure: from views V1, V2, …, Vn and query Q, Step 1 generates a rewriting P (logical plan) and Step 2 generates an efficient physical plan from P, under cost model CM.]
• One-step approach: [CKPS95, ZCLPU00].
• We focus on the logical level (step 1).
– Prune the rewriting space to generate “good” rewritings.
– Cost-based.
28
Rewriting space
[Figure: rewriting space with regions labeled all rewritings, minimal rewritings, containment minimal rewritings, locally minimal rewritings, and globally minimal rewritings; rewritings P and P’ are marked.]
Rewriting P → P’: remove its redundant subgoals [Chandra & Merlin 77].
29
Rewriting space (cont.)
[Figure: rewriting space with regions labeled all rewritings, minimal rewritings, containment minimal rewritings, locally minimal rewritings, and globally minimal rewritings; rewritings P, P’, and P’’ are marked.]
P’ → P’’: remove its subgoals while retaining its equivalence to Q:
P3: Q(S, C) :- V3(S), V1(M, a, C), V2(S, M, C)
V3(S) can still be removed.
30
Rewriting space (cont.)
[Figure: rewriting space with regions labeled all rewritings, minimal rewritings, locally minimal rewritings, globally minimal rewritings, and containment minimal rewritings; rewritings P, P’, P’’, and P* are marked.]
P’’ → P*: transform P’’ using the mapping from the expansion of P’’ to the query:
P1: Q(S, C) :- V1(M1, a, C), V1(M, a, C1), V2(S, M, C)
→ P2: Q(S, C) :- V1(M, a, C), V2(S, M, C)
31
Concise representation of rewritings
• Problem: as the number of views increases, the number of rewritings could be large!
• Solution:
– Group views into equivalence classes
– Group view tuples into equivalence classes based on their covered query subgoals.
32
Advantages
Views V1, V2, …, Vn are grouped into equivalence classes: {V1, V3}, {V4, V10, V15}, …, {V2, V9}
View tuples T1, T2, …, Tn are grouped into equivalence classes: {T2, T5}, {T1, T6, T9}, …, {T3}
• Advantages:
– The number of equivalence classes is bounded by the number of query subgoals.
– The optimizer finds efficient physical plans by considering the “representative rewritings,” then decides how to make them more efficient by adding more view tuples.
– The optimizer can replace a view tuple in a rewriting with another view tuple in the same equivalence class to obtain another rewriting.
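Grouping view tuples by the set of query subgoals they cover can be sketched in a few lines. The coverage sets below are hypothetical, chosen only so that the resulting classes match the ones listed above; `group_by_coverage` is an illustrative helper name.

```python
from collections import defaultdict


def group_by_coverage(covered_by):
    """Group view tuples that cover exactly the same set of query subgoals."""
    classes = defaultdict(list)
    for name in sorted(covered_by):
        classes[frozenset(covered_by[name])].append(name)
    return sorted(classes.values())


# Hypothetical coverage sets over query subgoals G1, G2, ...
covered_by = {
    "T1": {"G1", "G2"},
    "T2": {"G3"},
    "T3": {"G4"},
    "T5": {"G3"},
    "T6": {"G1", "G2"},
    "T9": {"G1", "G2"},
}

classes = group_by_coverage(covered_by)
print(classes)  # [['T1', 'T6', 'T9'], ['T2', 'T5'], ['T3']]
```

A cover search then needs to consider only one representative per class, and any member of a class can be swapped in for another to obtain a further rewriting.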
33
Main results of experiments
• CoreCover has good efficiency and scalability.
• By grouping views and view tuples into equivalence classes, we can reduce the number of views and view tuples used by CoreCover.
34
Star queries and views: number of equivalence classes
35
Star queries and views: number of view tuples
36