ANSWERING CONJUNCTIVE QUERIES WITH INEQUALITIES

Paris Koutris (1), Tova Milo (2), Sudeepa Roy (1), Dan Suciu (1)
(1) University of Washington, (2) Tel Aviv University
ICDT 2015

PROBLEM
What is the combined complexity of computing conjunctive queries with inequalities (CQ≠)?
Example query (q, I):
  q = R(x,y), S(y,z), T(z,w)
  I = {x ≠ z, y ≠ w}

EXAMPLE: PATH QUERY
Path query (of length k):
  P_k = R_1(x_1,x_2), R_2(x_2,x_3), …, R_k(x_k,x_{k+1})
• acyclic query
• polynomial combined complexity
[Figure: path x_1, x_2, x_3, …, x_k, x_{k+1} with edges labeled R_1, R_2, R_3, …, R_k]

EXAMPLE: PATH QUERY
Path query + inequalities:
  P_k = R_1(x_1,x_2), R_2(x_2,x_3), …, R_k(x_k,x_{k+1})
  I = {x_i ≠ x_j for all i < j}
• equivalent to the Hamiltonian path problem
• NP-hard
[Figure: the same path, overlaid with the inequality graph]

EXAMPLE: PATH QUERY
Path query + inequalities:
  P_k = R_1(x_1,x_2), R_2(x_2,x_3), …, R_k(x_k,x_{k+1})
  I = {x_i ≠ x_{i+2} for all i}
• polynomial combined complexity
[Figure: the same path, with inequality edges between x_i and x_{i+2}]

CONTRIBUTION
How does the combined complexity of computing CQs change when we add inequalities?
• Given any black-box algorithm that computes q, we can compute (q, I) with a g(q,I) · log(|D|) blowup
• Given any Select-Project-Join (SPJ) plan that computes q, we can compute (q, I) with an f(q,I) blowup

OUTLINE
• Color Coding
• The Main Technique
• Query Plans for Inequalities

BACKGROUND [Papadimitriou, Yannakakis ‘97]
Let q be a Boolean acyclic CQ≠ and D a database instance. Then q can be evaluated in time [formula not preserved in the source], where k = number of variables in the inequality graph.
• fixed-parameter tractability

COLOR CODING: IDEA [Alon, Yuster, Zwick ‘97]
• Pick a random coloring h: Dom → {1, …, k}
  – maps values to k colors
• If a tuple t belongs to the answer of the full query, then its colors satisfy the inequalities with probability ≥ e^{-k}
Example:
  q = R(x,y), S(y,z), T(z,w)
  I = {x ≠ z, y ≠ w}

  tuple (x,y,z,w)   a  b  c  d
  coloring #1       1  2  1  4   not valid (a and c get the same color, but x ≠ z is required)
  coloring #2       1  2  3  3   valid

COLOR CODING: THEOREM
/Theorem/ Let q be a CQ that can be computed in time T(|q|, |D|).
Then (q, I) can be computed in time [formula not preserved in the source].
• Color coding demands the construction of a k-perfect hash family for every instance
• There is an additional log(|D|) factor
• The algorithm is oblivious to the combined structure of the query and the inequalities

MAIN TECHNIQUE
q = R(x_1,…,x_m), S(y_1,…,y_l) + inequalities
How do we compute (q, I)?
• Cartesian product, then apply the inequalities
  – time O(ml|R||S|)
• IDEA: compress R to a representation R’ of size independent of |R|, then compute the product of R’ and S

RUNNING EXAMPLE
R(x_1, x_2) = {(1,1), (1,2), (1,4), (1,8), (2,3), (2,1), (3,2), (5,2), (2,2), (2,4)}
[Figure: bipartite inequality graph H between {x_1, x_2} and {y_1, y_2, y_3}]

H-ACCEPTED TUPLES
A tuple t over the schema of S is H-accepted by R if, for some t’ in R, t and t’ satisfy the inequalities in H.
For the relation R and graph H of the running example:
• t = (2,1,3) is H-accepted
• t = (2,1,2) is not

H-EQUIVALENCE
Relations R_1, R_2 are H-equivalent if, for any tuple t, t is H-accepted by R_1 if and only if t is H-accepted by R_2.
/Lemma/ There exists a sub-instance R’ of R s.t.
• R’ and R are H-equivalent
• |R’| ≤ f(H), independent of R
• R’ can be computed in time O(f(H) |R|)

H-FORBIDDEN TUPLES
A tuple t over Dom ∪ {-} is H-forbidden for R if, for every tuple t’ in R, the inequalities between t and t’ are violated.
For the relation R of the running example:
• t = (1,2,3) is H-forbidden
• t = (1,2,-) is also H-forbidden
The H-forbidden tuples are infinitely many, but the minimally H-forbidden ones are finitely many.

THE ALGORITHM
[Figure, built up over several slides: a tree of tuples over Dom ∪ {-}, rooted at (-,-,-), grown by processing the tuples of R(x_1, x_2) one at a time. The tuple (1,1) expands the root into the children (1,-,-), (-,1,-), (-,-,1); the tuple (1,2) expands (-,-,1) into (1,-,1), (-,2,1), (-,1,1).]
Annotations from the slides:
• When (1,2) is processed: (1,-,-) remains H-forbidden, (-,1,-) remains H-forbidden, (-,-,1) is not
• When (1,4) is processed: only the rightmost node needs expansion
• The tuple (1,8) expands no node
• In a later step: the node should be expanded, but has no “space”

ANALYSIS
• relations with the same tree are H-equivalent
• tuples that do not expand a node can be removed
• the tree has only f(H) nodes
E_H(R) = constant-size relation that is H-equivalent to R

THE H-PROJECTION
Let R(A_1, …, A_m)
• X is a subset of A = {A_1,…,A_m}
• H is a bipartite graph between A \ X and some set B
• the size of the H-projection is at most f(H) times the size of the projection

SPJ PLANS
q(w) = R(x,y,’a’), S(y,z), T(z,w)
I = {x≠z, y≠w, x≠w}
[Figure: SPJ plan Π_D( σ_{E=‘a’}( Π_{C,E}( R(A,B,E) ⋈_{B=B’} S(B’,C) ) ) ⋈_{C=C’} T(C’,D) )]
Inequalities cannot be trivially added to the plan.

SPJ PLANS: STEP ONE
Push projections to the top of the plan.
[Figure: the plan before and after; afterwards the only projection left is Π_D at the root]

SPJ PLANS: STEP TWO
• add the inequalities after the projection
• introduce an H-projection with empty graph H_0
[Figure: plan Π_D^{H_0}( σ_{A≠C, B≠D, A≠D}( σ_{E=‘a’}( R(A,B,E) ⋈_{B=B’} S(B’,C) ) ⋈_{C=C’} T(C’,D) ) )]

SPJ PLANS: STEP THREE
Push projections back to their initial place.
[Figure: the selection σ_{A≠C} moves below the join with T, and the projection on C,E is restored as an H-projection Π_{C,E}^{H_2}, where H_2 is a graph over A, B and D; the selection σ_{B≠D, A≠D} stays directly below Π_D^{H_0}]

MAIN RESULT
/Theorem/ Let q be a CQ that can be evaluated in time T(|q|,|D|) using a Select-Project-Join plan. Then we can compute (q, I) in time [formula not preserved in the source].
The function g depends on the joint structure of the query plan and the inequalities.
[Figure: path x_1, x_2, x_3, …, x_k, x_{k+1} with edges labeled R_1, R_2, R_3, …, R_k]

CONCLUSION
What is the complexity of computing CQ≠?
• color coding for any CQ≠
• SPJ query plans with inequalities
• In the paper: analysis of other structural properties
Open questions
• can we apply the technique to arbitrary join algorithms?
• other classes of queries: UCQs, Datalog

Thank you!

COLOR CODING: ALGORITHM
For any (valid) k-coloring c of the inequality graph, and any hash function h:
• For each relation R, compute the sub-relation R_{c,h} that satisfies the colors of c
• Apply the black-box join algorithm on the sub-instance with relations R_{c,h}
Output the union over all possible colorings and hash functions.
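
The steps of the color-coding algorithm above can be sketched in Python on the example query q = R(x,y), S(y,z), T(z,w) with I = {x ≠ z, y ≠ w}. This is a toy illustration under stated assumptions, not the construction from the paper: the names `join`, `color_coding` and `all_hash_functions` are hypothetical, a naive nested-loop join stands in for the black-box algorithm, and the k-perfect hash family is replaced by enumerating every function from the domain to k colors, which is only feasible for a tiny domain. The relation contents are made up for the example.

```python
import itertools

def join(atoms):
    """Naive stand-in for the black-box algorithm: evaluates a full CQ.
    atoms is a list of (variables, tuples); returns a list of var -> value dicts."""
    bindings = [{}]
    for variables, tuples in atoms:
        extended = []
        for partial in bindings:
            for tup in tuples:
                candidate = dict(partial)
                # Keep the tuple only if it agrees with the values bound so far.
                if all(candidate.setdefault(v, val) == val
                       for v, val in zip(variables, tup)):
                    extended.append(candidate)
        bindings = extended
    return bindings

def all_hash_functions(domain, k):
    """Assumption: every function Dom -> {0..k-1}, in place of a k-perfect family."""
    domain = sorted(domain)
    for colors in itertools.product(range(k), repeat=len(domain)):
        yield dict(zip(domain, colors))

def color_coding(query, relations, inequalities, hash_functions):
    ineq_vars = sorted({v for pair in inequalities for v in pair})
    k = len(ineq_vars)  # k = number of variables in the inequality graph
    # Valid colorings c: variables joined by an inequality get different colors.
    colorings = []
    for colors in itertools.product(range(k), repeat=k):
        c = dict(zip(ineq_vars, colors))
        if all(c[u] != c[v] for u, v in inequalities):
            colorings.append(c)
    answers = set()
    for h in hash_functions:
        for c in colorings:
            # Sub-relation R_{c,h}: tuples whose hashed values match the colors of c.
            sub = []
            for name, variables in query:
                kept = [t for t in relations[name]
                        if all(v not in c or h[val] == c[v]
                               for v, val in zip(variables, t))]
                sub.append((variables, kept))
            # Union of the black-box answers over all (c, h).
            for binding in join(sub):
                answers.add(tuple(sorted(binding.items())))
    return answers

query = [("R", ("x", "y")), ("S", ("y", "z")), ("T", ("z", "w"))]
inequalities = [("x", "z"), ("y", "w")]
relations = {
    "R": [(1, 2), (1, 1)],
    "S": [(2, 1), (2, 3), (1, 1)],
    "T": [(1, 4), (3, 4), (3, 2), (1, 1)],
}
result = color_coding(query, relations, inequalities,
                      all_hash_functions({1, 2, 3, 4}, 4))
```

Every binding in `result` satisfies the inequalities by construction: if c(x) ≠ c(z) and the values of x and z hash to c(x) and c(z), the two values must differ. On this instance the only surviving binding is x=1, y=2, z=3, w=4; the other join results violate x ≠ z or y ≠ w.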