Query-Rewrite-Insensitive Where-Provenance (slides)

Version June 20, 2011
Default-all is dangerous!
Wolfgang Gatterbauer
Alexandra Meliou
Dan Suciu
3rd USENIX Workshop on the Theory and Practice of Provenance (Tapp'11)
Database group
University of Washington
http://db.cs.washington.edu/causality/
Overview Provenance Definitions

Naive provenance definition ("SQL interpretation"):
• Why? (witness): Why-provenance = witness basis (αw), Buneman et al. [ICDT’01]
• Where?: Where-provenance = propagation (αp), Buneman et al. [PODS’02]

QRI definition (Query-Rewrite-Insensitive):
• Why?: Minimal witness basis (αwm), Buneman et al. [ICDT’01]
• Where?: Default-all propagation (αpd), Bhagwat et al. [VLDB’04]. Has problems if one interprets annotations on attribute values.
• Where?: Minimal propagation (αpm). Proposed in this paper!

Independent work presented at this WS.

We do not discuss here whether QRI is desirable (see also Glavic, Miller [Tapp'11]), but merely point out that, if aiming for QRI, care has to be taken about the ramifications of the proposed semantics.
Overview Provenance Definitions
Same definitions as on the previous slide, now also citing Glavic, Miller [Tapp'11].
Note that Minimal propagation is "stable", in contrast to Default-all.
Example 1: Query-Rewrite-Insensitivity (QRI)

Why

Input R:
      A  B
  t1  1  2
  t2  1  3
  t3  2  2

Query 1: Q1(x,y):-R(x,y)
  A  B   Why-provenance = witness basis (αw)
  1  2   {{t1}}
  1  3   {{t2}}
  2  2   {{t3}}

Query 2 ≡ Query 1: Q2(x,y):-R(x,y),R(_,y)
  A  B   Why-provenance (αw)   Minimal witness basis (αwm)   Lineage (αl)
  1  2   {{t1},{t1,t3}}        {{t1}}                        {t1,t3}
  1  3   {{t2}}                {{t2}}                        {t2}
  2  2   {{t3},{t1,t3}}        {{t3}}                        {t1,t3}

Where

Input Ra:
  A    B
  1^a  2^b
  1^c  3^d
  2^e  2^f

Query 1: Q1(x,y):-Ra(x,y)
Where-provenance = propagation (αp):
  A    B
  1^a  2^b
  1^c  3^d
  2^e  2^f

Query 2 ≡ Query 1: Q2(x,y):-Ra(x,y),Ra(_,y)
Where-provenance = propagation (αp):
  A    B
  1^a  2^{b,f}
  1^c  3^d
  2^e  2^{b,f}
Default-all propagation (αpd):
  A        B
  1^{a,c}  2^{b,f}
  1^{a,c}  3^d
  2^e      2^{b,f}
Minimal propagation (αpm):
  A    B
  1^a  2^b
  1^c  3^d
  2^e  2^f

Example adapted from Cheney et al. [Foundations and Trends in DBs’09]
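The why-provenance half of Example 1 can be reproduced with a short sketch. The encoding of queries as a head tuple plus a list of atoms, and the helper names `matches`, `why_provenance`, `minimal_witness_basis`, and `lineage`, are illustrative assumptions, not part of the original slides.

```python
from itertools import product

# Input relation R from Example 1: tuple id -> (A, B)
R = {"t1": (1, 2), "t2": (1, 3), "t3": (2, 2)}

# A query is (head variables, body atoms); "_" is an anonymous variable.
Q1 = (("x", "y"), [("x", "y")])
Q2 = (("x", "y"), [("x", "y"), ("_", "y")])

def matches(body, rel):
    """Yield (binding, set of tuple ids) for every way to satisfy the body."""
    for tids in product(rel, repeat=len(body)):
        binding, ok = {}, True
        for atom, tid in zip(body, tids):
            for term, value in zip(atom, rel[tid]):
                if term == "_":
                    continue
                # Bind the variable, or reject if it is already bound differently.
                if binding.setdefault(term, value) != value:
                    ok = False
        if ok:
            yield binding, frozenset(tids)

def why_provenance(head, body, rel):
    """Witness basis: output tuple -> set of witnesses (sets of tuple ids)."""
    out = {}
    for binding, witness in matches(body, rel):
        out.setdefault(tuple(binding[v] for v in head), set()).add(witness)
    return out

def minimal_witness_basis(witnesses):
    """Keep only witnesses that have no proper subset among the witnesses."""
    return {w for w in witnesses if not any(o < w for o in witnesses)}

def lineage(witnesses):
    """Flatten a witness basis into a single set of tuple ids."""
    return frozenset().union(*witnesses)

for out_tuple, witnesses in why_provenance(*Q2, R).items():
    print(out_tuple, witnesses, minimal_witness_basis(witnesses), lineage(witnesses))
```

Under these assumptions, Q1 and Q2 yield different witness bases for the output tuple (1, 2) but the same minimal witness basis, which is the sense in which the minimal witness basis is query-rewrite-insensitive in this example.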
Real example: Why Default-all is dangerous
Hanako queries a community DB for contents of LF-milk*:

Community Database Ra:
  Food      Content
  LF Milk   Cesium-137^b
  LF Milk   Calcium^d
  SC Water  Cesium-137^f

Annotations:
  b  Bob, March 18, 2011: Don't drink, lots of Cesium!
  f  Fuyumi, March 19, 2011: No Cesium, safe to drink!

Hanako's query: Q(y):-Ra('LF Milk',y)
  Content
  Cesium-137^???
  Calcium^d

Default-all propagation makes her drink the milk:
Default-all propagation (αpd)
  Content
  Cesium-137^{b,f}
  Calcium^d
  b  Bob, March 18, 2011: Don't drink, lots of Cesium!
  f  Fuyumi, March 19, 2011: No Cesium, safe to drink!
"semantically irrelevant information": annotations leak over from SC Water tuple to LF Milk

Minimal propagation (αpm)
  Content
  Cesium-137^b
  Calcium^d
  b  Bob, March 18, 2011: Don't drink, lots of Cesium!
"all relevant and only relevant"

* Note the one-to-one correspondence of this example with example 1
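The leak in this example can be reproduced with a small sketch of annotation propagation. The `Const` wrapper, the `propagate` function, and the list encoding of the database are illustrative assumptions. The second query is only one equivalent rewriting of Hanako's query, whereas default-all considers all equivalent queries, so this shows how the extra annotation arises and is not a general default-all implementation.

```python
from collections import namedtuple
from itertools import product

Const = namedtuple("Const", "value")  # marks a constant in a query atom

# Community database Ra: each cell is (value, annotation id or None).
RA = [
    (("LF Milk", None), ("Cesium-137", "b")),
    (("LF Milk", None), ("Calcium", "d")),
    (("SC Water", None), ("Cesium-137", "f")),
]

def propagate(head, body, rel):
    """Where-provenance: copy annotations from every body position that
    holds a head variable. Returns {output tuple: [set per head position]}."""
    out = {}
    for rows in product(rel, repeat=len(body)):
        binding, notes, ok = {}, {}, True
        for atom, row in zip(body, rows):
            for term, (value, note) in zip(atom, row):
                if term == "_":
                    continue
                if isinstance(term, Const):
                    ok = ok and term.value == value
                    continue
                if binding.setdefault(term, value) != value:
                    ok = False
                if note is not None:
                    notes.setdefault(term, set()).add(note)
        if ok:
            key = tuple(binding[v] for v in head)
            cells = out.setdefault(key, [set() for _ in head])
            for cell, v in zip(cells, head):
                cell |= notes.get(v, set())
    return out

# Hanako's query Q(y):-Ra('LF Milk',y)
Q = (("y",), [(Const("LF Milk"), "y")])
# An equivalent rewriting with the redundant atom Ra(_,y)
Q_REWRITE = (("y",), [(Const("LF Milk"), "y"), ("_", "y")])

print(propagate(*Q, RA))
print(propagate(*Q_REWRITE, RA))
```

In this sketch the original query attaches only Bob's annotation b to Cesium-137, while the rewritten query also picks up Fuyumi's annotation f from the SC Water tuple.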
Definition Minimal propagation (αpm)
Intuition: Return the intersection between:
• the query-specific where-provenance (αp)
• and the QRI minimal witness basis (αwm)
  (transforms 'sets of sets' into 'sets', hence something like QRI lineage)

Example 1

Input Ra:
      A    B
  t1  1^a  2^b
  t2  1^c  3^d
  t3  2^e  2^f

Query 2: Q2(x,y):-Ra(x,y),Ra(_,y)

Where provenance (αp) and minimal witness basis (αwm):
  A    B         Minimal witness basis (αwm)   αwm (flattened)
  1^a  2^{b,f}   {{t1}}                        {t1}
  1^c  3^d       {{t2}}                        {t2}
  2^e  2^{b,f}   {{t3}}                        {t3}

Minimal propagation (αpm):
      A    B
  t4  1^a  2^b
  t5  1^c  3^d
  t6  2^e  2^f
"all relevant ... and only relevant"
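One way to read the intuition above is sketched below: compute the where-provenance of the query as written, remember which input tuple each annotation was copied from, and keep only annotations whose source tuple appears in the flattened minimal witness basis. The function names and data encoding are assumptions for illustration, and this is one interpretation of the slide's intuition, not the paper's formal definition.

```python
from itertools import product

# Annotated input Ra from Example 1: tuple id -> ((A, note), (B, note))
RA = {
    "t1": ((1, "a"), (2, "b")),
    "t2": ((1, "c"), (3, "d")),
    "t3": ((2, "e"), (2, "f")),
}
Q2 = (("x", "y"), [("x", "y"), ("_", "y")])

def evaluate(head, body, rel):
    """Return {output tuple: (witnesses, cells)}; cells[i] holds
    (annotation, source tuple id) pairs for the i-th head attribute."""
    out = {}
    for tids in product(rel, repeat=len(body)):
        binding, copied, ok = {}, {}, True
        for atom, tid in zip(body, tids):
            for term, (value, note) in zip(atom, rel[tid]):
                if term == "_":
                    continue
                if binding.setdefault(term, value) != value:
                    ok = False
                copied.setdefault(term, set()).add((note, tid))
        if not ok:
            continue
        key = tuple(binding[v] for v in head)
        witnesses, cells = out.setdefault(key, (set(), [set() for _ in head]))
        witnesses.add(frozenset(tids))
        for cell, v in zip(cells, head):
            cell |= copied[v]
    return out

def where_provenance(head, body, rel):
    """Query-specific propagation (alpha_p): all copied annotations."""
    return {k: [{n for n, _ in c} for c in cells]
            for k, (_, cells) in evaluate(head, body, rel).items()}

def minimal_propagation(head, body, rel):
    """alpha_pm sketch: keep annotations from tuples in the flattened alpha_wm."""
    result = {}
    for key, (witnesses, cells) in evaluate(head, body, rel).items():
        minimal = [w for w in witnesses if not any(o < w for o in witnesses)]
        relevant = frozenset().union(*minimal)
        result[key] = [{n for n, tid in c if tid in relevant} for c in cells]
    return result

print(where_provenance(*Q2, RA))
print(minimal_propagation(*Q2, RA))
```

On Example 1 this sketch drops annotation f from the output tuple (1, 2) and annotation b from (2, 2), matching the minimal propagation table.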
Example 1: Illustration of "minimal" versus "all"
Why-provenance
Why-provenance (αw)
Minimal witness basis (αwm)
Where-provenance
Where-provenance (αp)
Default-all propagation (αpd)
Minimal propagation (αpm)
Interpretation of Annotations 1: Attribute Value*
Annotations on values of an attribute (here "population") for a particular entity (here "Athens")
Argument: Interpreting cell annotations as relevant to the tuple (entity) adds something that is not trivially modeled with normalized tables.
* Interpretation of annotations on entity attribute values favored by us and underlying our model
Interpretation of Annotations 2: Domain Value*
Domain value annotations*

Input Ra:
  A    B
  1^a  2^b
  1^c  3^d
  2^e  2^f
  b  Bob, March 18, 2011: This number is a prime number.
  f  Fuyumi, March 19, 2011: Two is not a prime number because it is even.

Input Sa:
  ...  Date
  ...  Dec 25
  ...  ...
  ...  Dec 25
  b  This is a holiday.
  f  This is a holiday too !!!

Argument for default-all: If annotations are on domain values, then all annotations are relevant and should be retrieved.

Alternative representation

Annotation table Sa:
  B  annotation
  2  b: Bob, March 18, 2011: This number is a prime number.
  2  f: Fuyumi, March 19, 2011: Two is not a prime number because it is even

Annotation table Sa:
  Date    annotation
  Dec 25  This is a holiday.

Counter-Argument: But then these annotations can be modeled in a separate, normalized table.

* Alternative interpretation suggested by Wang-Chiew Tan (example created after conversation at Sigmod 2011)
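The counter-argument can be illustrated with a minimal sketch in which domain-value annotations live in their own normalized table and are joined to query results on demand. The names `B_NOTES` and `with_domain_notes` and the list-of-tuples encoding are hypothetical and chosen only for this illustration.

```python
# Relation R without inline annotations (values taken from the slide).
R = [(1, 2), (1, 3), (2, 2)]

# Hypothetical normalized annotation table keyed by the domain value of B.
B_NOTES = [
    (2, "b: Bob, March 18, 2011: This number is a prime number."),
    (2, "f: Fuyumi, March 19, 2011: Two is not a prime number because it is even."),
]

def with_domain_notes(rows, notes, column):
    """Join each row with every annotation whose key equals row[column]."""
    by_value = {}
    for value, text in notes:
        by_value.setdefault(value, []).append(text)
    return [(row, by_value.get(row[column], [])) for row in rows]

for row, texts in with_domain_notes(R, B_NOTES, column=1):
    # Show only the annotation ids (the part before the first colon).
    print(row, [t.split(":")[0] for t in texts])
```

With this representation every tuple whose B value is 2 receives both annotations through an ordinary join, so no special propagation scheme is needed for domain-value annotations.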
Backup: Detailed Example 2

Input Ra:
      A    B
  t1  1^a  2^b
  t2  1^c  3^d
  t3  2^e  2^f
  t4  2^g  4^h

Q5(x,y):-Ra(x,y),Ra(y,_),Ra(x,_)

Where-provenance (αp) and why-provenance (αw):
      A        B           Why-provenance (αw)
  t5  1^{a,c}  2^{b,e,g}   {{t1,t3},{t1,t2,t3},{t1,t4},{t1,t2,t4}}
  t6  2^{e,g}  2^{e,f,g}   {{t3},{t3,t4}}

Minimal witness basis (αwm):
      Minimal witness basis (αwm)   αwm (~QRI lineage)
  t5  {{t1,t3},{t1,t4}}             {t1,t3,t4}
  t6  {{t3}}                        {t3}

Default-all propagation (αpd):
      A        B
  t5  1^{a,c}  2^{b,e,f,g}
  t6  2^{e,g}  2^{b,e,f}

Minimal propagation (αpm):
      A    B
  t5  1^a  2^{b,e,g}
  t6  2^e  2^{e,f}

αpd(t5,B,Q5) = αp(t5,B,Q6) with
Q6(x,y):-Ra(x,y),Ra(y,_),Ra(x,_),Ra(_,y)

Note that minimal propagation is not equivalent to just evaluating the where-provenance for the query
Q7(x,y):-Ra(x,y),Ra(y,_). E.g. αp(t6,B,Q7) = {e,f,g}
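The numbers in this backup example can be checked with a compact sketch that computes the witness basis, the minimal witness basis, the where-provenance, and the minimal propagation in one pass. The `provenance` function and its result keys are illustrative assumptions. Output tuples are addressed by their values (1, 2) and (2, 2), and Q6 is taken to be Q5 with the redundant atom Ra(_,y) added.

```python
from itertools import product

# Annotated input Ra from Example 2: tuple id -> ((A, note), (B, note))
RA = {
    "t1": ((1, "a"), (2, "b")),
    "t2": ((1, "c"), (3, "d")),
    "t3": ((2, "e"), (2, "f")),
    "t4": ((2, "g"), (4, "h")),
}
HEAD = ("x", "y")
Q5 = [("x", "y"), ("y", "_"), ("x", "_")]
Q6 = Q5 + [("_", "y")]          # Q5 plus the redundant atom Ra(_,y)
Q7 = [("x", "y"), ("y", "_")]   # Q5 without the redundant atom Ra(x,_)

def provenance(body, rel=RA, head=HEAD):
    """Return {output tuple: {"aw", "awm", "ap", "apm"}} for a conjunctive query."""
    raw = {}
    for tids in product(rel, repeat=len(body)):
        binding, copied, ok = {}, {}, True
        for atom, tid in zip(body, tids):
            for term, (value, note) in zip(atom, rel[tid]):
                if term == "_":
                    continue
                if binding.setdefault(term, value) != value:
                    ok = False
                copied.setdefault(term, set()).add((note, tid))
        if ok:
            key = tuple(binding[v] for v in head)
            entry = raw.setdefault(key, (set(), [set() for _ in head]))
            entry[0].add(frozenset(tids))
            for cell, v in zip(entry[1], head):
                cell |= copied[v]
    result = {}
    for key, (aw, cells) in raw.items():
        awm = {w for w in aw if not any(o < w for o in aw)}
        keep = frozenset().union(*awm)  # flattened minimal witness basis
        result[key] = {
            "aw": aw,
            "awm": awm,
            "ap": [{n for n, _ in c} for c in cells],
            "apm": [{n for n, t in c if t in keep} for c in cells],
        }
    return result

print(provenance(Q5)[(2, 2)]["apm"])    # minimal propagation for Q5
print(provenance(Q7)[(2, 2)]["ap"][1])  # where-provenance of B under Q7
```

For the output tuple (2, 2), the sketch keeps annotation g on B under Q7 but not under minimal propagation for Q5, which is the difference the closing note points out.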