1,1 - Pages

advertisement
ANSWERING CONJUNCTIVE QUERIES
WITH INEQUALITIES
Paris Koutris1
Tova Milo2
Sudeepa Roy1
Dan Suciu1
University of Washington
2 Tel Aviv University
1
ICDT 2015
PROBLEM
What is the combined complexity of computing
conjunctive queries with inequalities (CQ≠)?
query (q,I):
q = R(x,y),S(y,z),T(z,w)
I = {x ≠ z, y ≠ w}
2
EXAMPLE: PATH QUERY
Path query (of length k)
Pk = R1(x1,x2),R2(x2,x3),…,Rk(xk,xk+1)
• acyclic query
• polynomial combined complexity
R1
R2
R3
Rk
...
x1
x2
x3
xk
xk+1
3
EXAMPLE: PATH QUERY
Path query + inequalities
Pk = R1(x1,x2),R2(x2,x3),…,Rk(xk,xk+1)
I = {xi ≠ xj, for all i<j}
• equivalent to Hamiltonian path
• NP-hard
R1
R2
R3
inequality graph
Rk
...
x1
x2
x3
xk
xk+1
4
EXAMPLE: PATH QUERY
Path query + inequalities
Pk = R1(x1,x2),R2(x2,x3),…,Rk(xk,xk+1)
I = {xi ≠ xi+2, for all i}
• polynomial combined complexity
R1
R2
R3
Rk
...
x1
x2
x3
xk
xk+1
5
CONTRIBUTION
How does the combined complexity of computing CQs
changes when we add inequalities?
• Given any blackbox algorithm that computes q, we can
compute (q,I) with a g(q,I) log(|D|) blowup
• Given any Selection-Projection-Join plan that computes q,
we can compute (q,I) with a f(q,I) blowup
6
OUTLINE
Color Coding
The Main Technique
Query Plans for Inequalities
7
BACKGROUND
[Papadimitriou, Yannakakis ‘97]
Let q be a boolean acyclic CQ≠ and D be a database instance.
Then, q can be evaluated in time
k = #variables in the inequality graph
fixed-parameter tractability
8
COLOR CODING: IDEA
[Alon, Yuster, Zwick ‘97]
• Pick a random coloring h: Dom  {1, …, k}
– maps values to k colors
• If a tuple t belongs in the answer of the full query, then the
colors satisfy the inequalities with probability ≥ e-k
q = R(x,y),S(y,z),T(z,w)
I = {x ≠ z, y ≠ w}
tuple
a
b
c
d
col #1
1
2
1
4
col #2
1
2
3
3
valid
9
COLOR CODING: THEOREM
/Theorem/ Let q be a CQ that can be computed in time
T(|q|, |D|). Then, (q, I) can be computed in time
• Color-coding demands the construction of k-perfect hash
family for every instance
• There is a log(|D|) additional factor
• The algorithm is oblivious to the combined structure of the
query + inequalities
10
OUTLINE
Color Coding
The Main Technique
Query Plans for Inequalities
11
MAIN TECHNIQUE
q = R(x1,…,xm),S(y1,…,yl) + inequalities
How do we compute (q,I) ?
• Cartesian product, then apply the inequalities
– time O(ml|R||S|)
• IDEA: compress R to a representation R’ of size
independent of |R|, then compute the product R’,S
12
RUNNING EXAMPLE
inequality graph (bipartite) H
R(x1, x2)
(1,1)
(1,2)
(1,4)
x1
(1,8)
(2,3)
(2,1)
(3,2)
y1
y2
x2
y3
(5,2)
(2,2)
(2,4)
13
H-ACCEPTED TUPLES
A tuple t over the schema of S is H-accepted by R if for some t’
in R, t and t’ satisfy the inequalities in H
R(x1, x2)
(1,1)
x1
y2
(1,2)
(1,4)
y1
x2
(1,8)
y3
(2,3)
(2,1)
(3,2)
(5,2)
(2,2)
t = (2,1,3) is H-accepted
t = (2,1,2) is not!
(2,4)
14
H-EQUIVALENCE
Relations R1, R2 are H-equivalent if for any tuple t, t is Haccepted by R1 if and only if t is H-accepted by R2
/Lemma/ There exists a sub-instance R’ of R s.t.
• R’,R are H-equivalent
• |R’| ≤ f(H), independent of R
• R’ can be computed in time O(f(H) |R|)
15
H-FORBIDDEN TUPLES
A tuple t over Dom + {-} is H-forbidden for R if for every tuple
t’ in R, the inequalities between t, t’ are violated
R(x1, x2)
(1,1)
(1,2)
(1,4)
(1,8)
t = (1,2,3) is H-forbidden
t = (1,2,-) is also H-forbidden
(2,3)
(2,1)
(3,2)
(5,2)
(2,2)
The H-forbidden tuples are infinitely many
but the minimally H-forbidden are finite
(2,4)
16
THE ALGORITHM
R(x1, x2)
(1,1)
(1,2)
(1,4)
(-,-,-)
(1,1)
(1,-,-)
(-,1,-)
(1,8)
(2,3)
(-,-,1)
(2,1)
(3,2)
(5,2)
(2,2)
(2,4)
17
THE ALGORITHM
R(x1, x2)
(1,1)
(1,2)
(1,4)
(-,-,-)
(1,1)
(1,-,-)
(-,1,-)
(1,8)
(2,3)
(2,1)
(-,-,1)
(3,2)
(1,2)
(5,2)
(2,2)
(1,-,1)
(-,2,1)
(2,4)
(-,1,1)
• (1,-,-) remains H-forbidden
• (-,1,-) remains H-forbidden
• (-,-,1) is not
18
THE ALGORITHM
R(x1, x2)
(1,1)
(1,2)
(1,4)
(-,-,-)
(1,1)
(1,-,-)
(-,1,-)
(1,8)
(2,3)
(2,1)
(-,-,1)
(3,2)
(1,2)
(5,2)
(2,2)
(1,-,1)
(-,2,1)
(2,4)
(-,1,1)
(1,4)
only the rightmost node needs expansion
(1,2,1)
19
THE ALGORITHM
R(x1, x2)
(1,1)
(1,2)
(1,4)
(-,-,-)
(1,1)
(1,-,-)
(-,1,-)
(1,8)
(2,3)
(2,1)
(-,-,1)
(3,2)
(1,2)
(5,2)
(2,2)
(1,-,1)
(-,2,1)
(2,4)
(-,1,1)
(1,4)
the tuple (1,8) expands no node
(1,2,1)
20
THE ALGORITHM
R(x1, x2)
(1,1)
(1,2)
(1,4)
(-,-,-)
(1,1)
(1,-,-)
(-,1,-)
(2,3)
(1,8)
(2,3)
(-,-,1)
(1,-,3)
(2,1,-)
(3,2)
(1,2)
(2,3)
(1,2,-)
(2,1)
(5,2)
(2,2)
(-,1,3)
(1,-,1)
(1,3,-)
(2,3)
(1,2,1)
(-,2,1)
(-,1,1)
(2,3)
(1,3,1)
(2,4)
(2,1,1)
(1,4)
(1,2,1)
21
THE ALGORITHM
R(x1, x2)
(1,1)
(1,2)
(1,4)
(-,-,-)
(1,1)
(1,-,-)
(-,1,-)
(2,3)
(1,8)
(2,3)
(-,-,1)
(1,-,3)
(1,3,-)
(2,1)
(1,3,1)
(1,1,3)
(2,1,-)
(2,1)
(1,2,3)
(3,2)
(1,2)
(2,3)
(1,2,-)
(2,1)
(5,2)
(2,2)
(-,1,3)
(1,-,1)
(2,3)
(1,2,1)
(-,2,1)
(-,1,1)
(2,3)
(1,3,1)
(2,4)
(2,1,1)
(1,4)
(1,2,1)
22
THE ALGORITHM
R(x1, x2)
(1,1)
(1,2)
(1,4)
(-,-,-)
(1,1)
(1,8)
(2,3)
(1,-,-)
(-,-,1)
(-,1,-)
(2,3)
(1,-,3)
(1,3,-)
(2,1)
(1,3,1)
(1,1,3)
(2,1,-)
(2,1)
(1,2,3)
(5,2)
(2,2)
(-,1,3)
(3,2)
(3,2)
(1,2)
(2,3)
(1,2,-)
(2,1)
(1,-,1)
(3,2)
(2,3)
(-,2,1)
(2,4)
(-,1,1)
(2,3)
(1,4)
(3,1,3)
(2,1,2)
(1,2,1)
(3,2)
(1,3,1)
(2,1,1)
(1,2,1)
(3,2)
the node should be expanded, but has no “space”
23
THE ALGORITHM
R(x1, x2)
(1,1)
(1,2)
(1,4)
(-,-,-)
(1,1)
(1,8)
(2,3)
(1,-,-)
(-,-,1)
(-,1,-)
(2,3)
(2,1)
(3,2)
(1,2)
(2,3)
(1,2,-)
(1,-,3)
(1,3,-)
(2,1)
(1,3,1)
(1,1,3)
(5,2)
(2,1,-)
(2,1)
(1,2,3)
(3,2)
(2,2)
(-,1,3)
(3,2)
(5,2)
(1,-,1)
(3,2)
(2,3)
(-,2,1)
(2,4)
(-,1,1)
(2,3)
(1,4)
(3,1,3)
(2,1,2)
(1,2,1)
(5,2)
(1,3,1)
(2,1,1)
(5,2)
(3,2)
(1,2,1)
24
THE ALGORITHM
R(x1, x2)
(1,1)
(1,2)
(1,4)
(-,-,-)
(1,1)
(1,8)
(2,3)
(1,-,-)
(-,-,1)
(-,1,-)
(2,3)
(2,1)
(3,2)
(1,2)
(2,3)
(1,2,-)
(1,-,3)
(1,3,-)
(2,1)
(1,3,1)
(1,1,3)
(5,2)
(2,1,-)
(2,1)
(1,2,3)
(3,2)
(2,2)
(-,1,3)
(3,2)
(5,2)
(1,-,1)
(3,2)
(2,3)
(-,2,1)
(2,4)
(-,1,1)
(2,3)
(1,4)
(3,1,3)
(2,1,2)
(1,2,1)
(5,2)
(1,3,1)
(2,1,1)
(5,2)
(3,2)
(1,2,1)
25
THE ALGORITHM
R(x1, x2)
(1,1)
(1,2)
(1,4)
(-,-,-)
(1,1)
(1,8)
(2,3)
(1,-,-)
(-,-,1)
(-,1,-)
(2,3)
(2,1)
(3,2)
(2,3)
(1,2,-)
(1,-,3)
(1,3,-)
(2,1)
(1,3,1)
(1,1,3)
(5,2)
(1,2,-)
(2,1,-)
(2,1)
(1,2,3)
(2,2)
(-,1,3)
(3,2)
(5,2)
(1,2)
(1,-,1)
(3,2)
(2,3)
(-,2,1)
(2,4)
(-,1,1)
(2,3)
(1,4)
(3,1,3)
(2,1,2)
(3,2)
(1,2,3) (2,1,2)
(1,2,1)
(5,2)
(1,2,1)
(1,3,1)
(2,1,1)
(5,2)
(3,2)
(1,2,1)
(1,2,1)
26
ANALYSIS
R(x1, x2)
(1,1)
(1,2)
(1,4)
• relations with the same tree are H-equivalent
• tuples that do not expand a node can be removed
• the tree has only f(H) nodes
(1,8)
(2,3)
(2,1)
(3,2)
(5,2)
(2,2)
(2,4)
EH(R) = constant-size relation that is H-equivalent to R
27
OUTLINE
Color Coding
The Main Technique
Query Plans for Inequalities
28
THE H-PROJECTION
Let R(A1, …, Am)
• X subset of A = {A1,…,Am}
• H a bipartite graph with sets A \ X and some set B
• the size of the H-projection is at most f(H) times the
projection
29
SPJ PLANS
ΠD
C=C’
q(w)=R(x,y,’a’),S(y,z),T(z,w)
I={x≠z, y≠w, x≠w}
σE=‘a’
T(C’,D)
ΠC,E
inequalities cannot be trivially added to
the plan
B=B’
R(A,B,E)
S(B’,C)
30
SPJ PLANS: STEP ONE
push projections
to the top of the plan
ΠD
ΠD
C=C’
σE=‘a’
C=C’
T(C’,D)
σE=‘a’
T(C’,D)
ΠC,E
B=B’
R(A,B,E)
B=B’
S(B’,C)
R(A,B,E)
S(B’,C)
31
SPJ PLANS: STEP TWO
ΠDH0
• add the inequalities after the projection
• introduce H-projection with empty
graph H0
σA≠C,B≠D,A≠D
C=C’
σE=‘a’
T(C’,D)
B=B’
R(A,B,E)
S(B’,C)
32
SPJ PLANS: STEP THREE
Push projections
to initial place
ΠDH0
ΠDH0
σB≠D,A≠D
σA≠C,B≠D,A≠D
C=C’
C=C’
σE=‘a’
T(C’,D)
ΠC,EH2
T(C’,D)
σA≠C
A
H2
D
σE=‘a’
B
B=B’
B=B’
R(A,B,E)
S(B’,C)
R(A,B,E)
S(B’,C)
33
SPJ PLANS: STEP THREE
Push projections
to initial place
ΠDH0
ΠDH0
σB≠D,A≠D
σB≠D,A≠D
C=C’
ΠC,EH2
C=C’
T(C’,D)
σA≠C
σE=‘a’
T(C’,D)
ΠC,EH2
A
σA≠C
σE=‘a’
H2
D
B=B’
R(A,B,E)
B
B=B’
S(B’,C)
R(A,B,E)
S(B’,C)
34
MAIN RESULT
/Theorem/ Let q be a CQ that can be evaluated in time
T(|q|,|D|) using a Select-Project-Join plan. Then, we can
compute (q, I) in time
The function g depends on the joint structure of the
query plan and the inequalities
R1
R2
R3
Rk
...
x1
x2
x3
xk
xk+1
35
CONCLUSION
What is the complexity of computing CQ≠ ?
• color-coding for any CQ≠
• SPJ query plans with inequalities
• In the paper : analysis of other structural properties
Open questions
• can we apply the technique to arbitrary join algorithms?
• other classes of queries: UCQs, Datalog
36
Thank you!
37
COLOR CODING: ALGORITHM
For any (valid) k-coloring c of the inequality graph, and any
hash function h
• For each relation R, compute the sub-relation Rc,h that
satisfies the colors of c
• Apply the black-box join algorithm on the sub-instance
with relations Rc,h
Output the union for all possible colorings and hash functions
38
Download