Version June 20, 2011

Default-all is dangerous!
Wolfgang Gatterbauer, Alexandra Meliou, Dan Suciu
Database group, University of Washington
3rd USENIX Workshop on the Theory and Practice of Provenance (TaPP'11)

Overview
Provenance definitions:

Why?
• Naive "SQL interpretation": Why-provenance = witness basis (αw); definition: Buneman et al. [ICDT'01]
• QRI (Query-Rewrite-Insensitive): Minimal witness basis (αwm); definition: Buneman et al. [ICDT'01]

Where?
• Naive "SQL interpretation": Where-provenance = propagation (αp); definition: Buneman et al. [PODS'02]
• QRI: Default-all propagation (αpd); definition: Bhagwat et al. [VLDB'04]. Has problems if one interprets annotations as annotations on attribute values.
• QRI: Minimal propagation (αpm). Proposed in this paper!

We do not discuss here whether QRI is desirable (see also Glavic, Miller [TaPP'11], independent work presented at this workshop), but merely point out that, if aiming for QRI, care has to be taken about the ramifications of the proposed semantics.
Note that minimal propagation is "stable", in contrast to default-all.

Example 1: Query-Rewrite-Insensitivity (QRI)
(Example adapted from Cheney et al. [Foundations and Trends in DBs'09]. Annotations are shown in brackets: 2[b,f] is the value 2 carrying annotations b and f.)

Why-provenance = witness basis (αw)

Input R:
  id  A  B
  t1  1  2
  t2  1  3
  t3  2  2

Query 1: Q1(x,y) :- R(x,y)
  A  B  αw
  1  2  {{t1}}
  1  3  {{t2}}
  2  2  {{t3}}

Query 2 ≡ Query 1: Q2(x,y) :- R(x,y), R(_,y)
  A  B  Why-provenance (αw)  Minimal witness basis (αwm)  Lineage (αl)
  1  2  {{t1},{t1,t3}}       {{t1}}                       {t1,t3}
  1  3  {{t2}}               {{t2}}                       {t2}
  2  2  {{t3},{t1,t3}}       {{t3}}                       {t1,t3}

Where-provenance = propagation (αp)

Input Ra:
  id  A     B
  t1  1[a]  2[b]
  t2  1[c]  3[d]
  t3  2[e]  2[f]

Query 1: Q1(x,y) :- Ra(x,y)
  A     B
  1[a]  2[b]
  1[c]  3[d]
  2[e]  2[f]

Query 2 ≡ Query 1: Q2(x,y) :- Ra(x,y), Ra(_,y)
  Where-provenance (αp)  Default-all propagation (αpd)  Minimal propagation (αpm)
  A     B                A       B                      A     B
  1[a]  2[b,f]           1[a,c]  2[b,f]                 1[a]  2[b]
  1[c]  3[d]             1[a,c]  3[d]                   1[c]  3[d]
  2[e]  2[b,f]           2[e]    2[b,f]                 2[e]  2[f]

Real example: Why default-all is dangerous

Hanako queries a community DB for the contents of LF-milk*:

Community database Ra:
  Food      Content
  LF Milk   Cesium-137[b]
  LF Milk   Calcium[d]
  SC Water  Cesium-137[f]

  b: Bob, March 18, 2011: "Don't drink, lots of Cesium!"
  f: Fuyumi, March 19, 2011: "No Cesium, safe to drink!"

Hanako's query: Q(y) :- Ra('LF Milk', y)
  Content
  Cesium-137[???]
  Calcium[d]

Default-all propagation (αpd) makes her drink the milk:
  Content
  Cesium-137[b,f]
  Calcium[d]
"Semantically irrelevant information": annotation f leaks over from the SC Water tuple to LF Milk, so Fuyumi's later all-clear, written for the water, now tells her the milk is safe.

* Note the one-to-one correspondence of this example with Example 1.

Minimal propagation (αpm):
  Content
  Cesium-137[b]
  Calcium[d]
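To make the propagation rule concrete, here is a small, hypothetical Python sketch (not the authors' implementation; the encoding, the "?x" variable syntax and all function names are assumptions). It computes αp by letting each output cell collect the annotations of every body cell bound to the same head variable, which reproduces the αp tables of Example 1. Default-all is only approximated: the sketch evaluates αp on an equivalent query extended with the redundant atoms Ra(x,_) and Ra(_,y). On these examples that reproduces the αpd values and the leak of f in the milk example, but it is not a general implementation of default-all.

```python
from itertools import product

# Hypothetical encoding: a relation maps tuple ids to cells (value, annotation).
# In query atoms, "?x" is a variable, "_" is anonymous, anything else is a constant.
RA = {
    "t1": [(1, "a"), (2, "b")],
    "t2": [(1, "c"), (3, "d")],
    "t3": [(2, "e"), (2, "f")],
}
MILK = {
    "m1": [("LF Milk", None), ("Cesium-137", "b")],
    "m2": [("LF Milk", None), ("Calcium", "d")],
    "m3": [("SC Water", None), ("Cesium-137", "f")],
}


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def valuations(body, rel):
    """Yield (env, choice): env binds variables, choice[i] is the tuple id for atom i."""
    for choice in product(rel, repeat=len(body)):
        env, ok = {}, True
        for atom, tid in zip(body, choice):
            for term, (val, _ann) in zip(atom, rel[tid]):
                if term == "_":
                    continue
                if is_var(term):
                    ok = env.setdefault(term, val) == val
                else:
                    ok = term == val  # constant must match the cell value
                if not ok:
                    break
            if not ok:
                break
        if ok:
            yield env, choice


def where_provenance(head, body, rel):
    """αp under the propagation rule: an output cell collects the annotations of
    every body cell bound to the same head variable, over all valuations."""
    out = {}
    for env, choice in valuations(body, rel):
        row = tuple(env[v] for v in head)
        cells = out.setdefault(row, [set() for _ in head])
        for i, v in enumerate(head):
            for atom, tid in zip(body, choice):
                for term, (_val, ann) in zip(atom, rel[tid]):
                    if term == v and ann is not None:
                        cells[i].add(ann)
    return out


HEAD = ("?x", "?y")
Q1 = [("?x", "?y")]
Q2 = [("?x", "?y"), ("_", "?y")]
# Stand-in for default-all (assumption): Q1 plus the redundant atoms R(x,_), R(_,y).
Q_ALL = [("?x", "?y"), ("?x", "_"), ("_", "?y")]

MILK_Q = [("LF Milk", "?y")]                  # Q(y) :- Ra('LF Milk', y)
MILK_Q_ALL = [("LF Milk", "?y"), ("_", "?y")]  # same stand-in for default-all

if __name__ == "__main__":
    for name, body in [("Q1", Q1), ("Q2", Q2), ("default-all", Q_ALL)]:
        result = where_provenance(HEAD, body, RA)
        print(name, {row: [sorted(c) for c in cells] for row, cells in sorted(result.items())})
    for name, body in [("Q", MILK_Q), ("default-all", MILK_Q_ALL)]:
        print(name, where_provenance(("?y",), body, MILK))
```

Running it shows that Q1 and Q2 disagree on the B value of output (1,2), {b} versus {b,f}, although the two queries are equivalent.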
Minimal propagation (αpm): "all relevant and only relevant"

Definition: Minimal propagation (αpm)
Intuition: return the intersection between
• the query-specific where-provenance (αp), and
• the QRI minimal witness basis (αwm), transformed from "sets of sets" into "sets" (hence something like a QRI lineage).

Example 1
Input Ra:
  id  A     B
  t1  1[a]  2[b]
  t2  1[c]  3[d]
  t3  2[e]  2[f]

Where-provenance (αp) of Query 2: Q2(x,y) :- Ra(x,y), Ra(_,y), with the minimal witness basis (αwm):
  A     B       αwm     αwm as a set
  1[a]  2[b,f]  {{t1}}  {t1}
  1[c]  3[d]    {{t2}}  {t2}
  2[e]  2[b,f]  {{t3}}  {t3}

Intersecting the two ("all relevant ... and only relevant") gives minimal propagation (αpm):
  id  A     B
  t4  1[a]  2[b]
  t5  1[c]  3[d]
  t6  2[e]  2[f]

Example 1: Illustration of "minimal" versus "all"
[Figure panels: Why-provenance, comparing why-provenance (αw) with the minimal witness basis (αwm); Where-provenance, comparing where-provenance (αp), default-all propagation (αpd) and minimal propagation (αpm).]

Interpretation of Annotations 1: Attribute Value*
Annotations on values of an attribute (here "population") for a particular entity (here "Athens").
Argument: Interpreting cell annotations as relevant to the tuple (entity) adds something that is not trivially modeled with normalized tables.
* Interpretation of annotations on entity attribute values, favored by us and underlying our model.

Interpretation of Annotations 2: Domain Value
Domain value annotations*:
Input Ra:
  A     B
  1[a]  2[b]
  1[c]  3[d]
  2[e]  2[f]
  b: Bob, March 18, 2011: "This number is a prime number."
  f: Fuyumi, March 19, 2011: "Two is not a prime number because it is even."

Input Sa:
  ...  Date       ...
  ...  Dec 25[b]  ...
  ...  Dec 25[f]  ...
  b: "This is a holiday."
  f: "This is a holiday too !!!"

Argument for default-all: If annotations are on domain values, then all annotations on a value are relevant wherever that value appears, so retrieving all of them is justified.

* Alternative representation:
Annotation table Sa:
  B  annotation
  2  b: Bob, March 18, 2011: "This number is a prime number."
  2  f: Fuyumi, March 19, 2011: "Two is not a prime number because it is even."
Annotation table Sa:
  Date    annotation
  Dec 25  This is a holiday.
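Under the domain-value reading, the alternative representation above is an ordinary join between the data and an annotation table. The hypothetical sketch below (the names R, B_ANNOTATIONS and annotate_by_value are illustrative, not from the slides) attaches both notes b and f to every row whose B value is 2, which is what default-all propagation returns for column B of those rows in Example 1.

```python
from collections import defaultdict

# Hypothetical normalized encoding of the domain-value reading: R holds plain
# values, and a separate annotation table maps B values to their notes.
R = [(1, 2), (1, 3), (2, 2)]
B_ANNOTATIONS = [
    (2, "b: Bob, March 18, 2011: This number is a prime number."),
    (2, "f: Fuyumi, March 19, 2011: Two is not a prime number because it is even."),
]


def annotate_by_value(rows, ann_table):
    """Left-join each row with every annotation recorded for its B value."""
    index = defaultdict(list)
    for value, note in ann_table:
        index[value].append(note)
    # .get avoids inserting empty entries; list() gives each row its own copy
    return [(a, b, list(index.get(b, []))) for a, b in rows]


if __name__ == "__main__":
    for row in annotate_by_value(R, B_ANNOTATIONS):
        print(row)
```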
Counter-argument: But then these annotations can be modeled in separate, normalized tables.
Alternative interpretation suggested by Wang-Chiew Tan (example created after a conversation at SIGMOD 2011).

Backup: Detailed Example 2

Input Ra:
  id  A     B
  t1  1[a]  2[b]
  t2  1[c]  3[d]
  t3  2[e]  2[f]
  t4  2[g]  4[h]

Query: Q5(x,y) :- Ra(x,y), Ra(y,_), Ra(x,_)

Where-provenance (αp) and why-provenance (αw):
  id  A       B         αw
  t5  1[a,c]  2[b,e,g]  {{t1,t3},{t1,t2,t3},{t1,t4},{t1,t2,t4}}
  t6  2[e,g]  2[e,f,g]  {{t3},{t3,t4}}

Minimal witness basis (αwm):
  id  αwm                αwm as a set (~QRI lineage)
  t5  {{t1,t3},{t1,t4}}  {t1,t3,t4}
  t6  {{t3}}             {t3}

Default-all propagation (αpd):
  id  A       B
  t5  1[a,c]  2[b,e,f,g]
  t6  2[e,g]  2[b,e,f,g]

Minimal propagation (αpm):
  id  A     B
  t5  1[a]  2[b,e,g]
  t6  2[e]  2[e,f]

αpd(t5,B,Q5) = αp(t5,B,Q6) with Q6(x,y) :- Ra(x,y), Ra(y,_), Ra(x,_), Ra(_,y)

Note that minimal propagation is not equivalent to just evaluating the where-provenance of the minimized query Q7(x,y) :- Ra(x,y), Ra(y,_). E.g., αp(t6,B,Q7) = {e,f,g}, whereas αpm(t6,B,Q5) = {e,f}.
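The backup example can be checked mechanically. The sketch below is hypothetical (the tuple encoding and function names are assumptions, not the authors' code): it enumerates valuations of Q5 to obtain αw, discards non-minimal witnesses to obtain αwm, and takes the definition's intuition literally for αpm by keeping only those αp annotations whose source tuple occurs in some minimal witness. It reproduces the αpm table for Q5, but it is a sketch of the intuition rather than the paper's formal definition.

```python
from itertools import product

# Example 2 input: tuple id -> [(value, annotation)] for attributes A and B.
RA = {
    "t1": [(1, "a"), (2, "b")],
    "t2": [(1, "c"), (3, "d")],
    "t3": [(2, "e"), (2, "f")],
    "t4": [(2, "g"), (4, "h")],
}
HEAD = ("x", "y")
Q5 = [("x", "y"), ("y", "_"), ("x", "_")]  # Q5(x,y) :- Ra(x,y), Ra(y,_), Ra(x,_)
Q7 = [("x", "y"), ("y", "_")]              # Q7(x,y) :- Ra(x,y), Ra(y,_)


def valuations(body, rel):
    """Yield (env, choice) for each consistent assignment of body atoms to tuple ids."""
    for choice in product(rel, repeat=len(body)):
        env, ok = {}, True
        for atom, tid in zip(body, choice):
            for term, (val, _ann) in zip(atom, rel[tid]):
                if term != "_" and env.setdefault(term, val) != val:
                    ok = False
                    break
            if not ok:
                break
        if ok:
            yield env, choice


def provenance(head, body, rel):
    """Map each output row to (witnesses, where): the witness sets (αw) and, per
    head column, the (tuple id, annotation) pairs propagated to it (αp)."""
    result = {}
    for env, choice in valuations(body, rel):
        row = tuple(env[v] for v in head)
        witnesses, where = result.setdefault(row, (set(), [set() for _ in head]))
        witnesses.add(frozenset(choice))
        for i, v in enumerate(head):
            for atom, tid in zip(body, choice):
                for pos, term in enumerate(atom):
                    if term == v:
                        where[i].add((tid, rel[tid][pos][1]))
    return result


def minimal_witness_basis(witnesses):
    """αwm: witnesses that contain no other witness as a proper subset."""
    return {w for w in witnesses if not any(other < w for other in witnesses)}


def minimal_propagation(witnesses, where):
    """Sketch of αpm: keep only αp annotations of tuples occurring in αwm."""
    keep = set().union(*minimal_witness_basis(witnesses))
    return [{ann for tid, ann in cell if tid in keep} for cell in where]


if __name__ == "__main__":
    for row, (w, where) in sorted(provenance(HEAD, Q5, RA).items()):
        print(row,
              "alpha_wm:", sorted(sorted(m) for m in minimal_witness_basis(w)),
              "alpha_p:", [sorted(a for _, a in cell) for cell in where],
              "alpha_pm:", [sorted(cell) for cell in minimal_propagation(w, where)])
```

Swapping Q5 for Q7 yields αp = {e,f,g} on the B column of output (2,2), while the sketch's αpm for Q5 there is {e,f}, in line with the closing note of the backup example.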