On the Expressiveness of Implicit Provenance in Query and Update Languages


Peter Buneman (1), James Cheney (1), Stijn Vansummeren (2)

(1) University of Edinburgh, Scotland

(2) Hasselt University and Transnational University of Limburg, Belgium

P. Buneman, J. Cheney, S. Vansummeren

The Expressiveness of Implicit Provenance

Introduction

Provenance is:

the history of ownership of a valued object or work of art/literature

a record of origin, modification, influences

used to validate authenticity, integrity, validity of an object

Valuable because it’s hard to collect, verify


Chris curates a DB

create table DB(Beer, Type, Origin)
insert into DB values( . . . )
insert into DB select * from S where origin='USA'
insert into DB select * from T where type='stout'

Beer      Type   Origin
Duvel     blond  Belgium
Heineken  blond  USA
Bud       blond  USA
Guinness  stout  Ireland


Provenance is vital for assessing trustworthiness

Manual provenance recording is tedious and error-prone

Automatic provenance recording support?


Problem statement

Given a query or update, define for each item in the output where (if anywhere) it comes from in the input.

Input R:

A  B
1  2
8  1

Query:

(select * from R where A <> 1) union (select A, 5 as B from R where A = 1)

Output:

A  B
1  5
8  1

Different definitions possible:

Wang and Madnick (VLDB 1990)

Buneman et al. (ICDT 2001)

Bhagwat et al. (VLDB 2005)

Which are suitable/to be preferred and why?


Other work on provenance

Why-provenance/lineage: describes for each output item what input items it was influenced by.

Woodruff and Stonebraker (ICDE 1997), Cui et al. (TODS 2000), Buneman et al. (ICDT 2001), Green et al. (PODS 2007)

Workflow-provenance: for data processed through arbitrary workflows instead of queries/updates.

Bose and Frew (ACM Computing Surveys, 2005), Davidson et al. (Data Engineering Bulletin 2007)


Propagation-approach to provenance

Implicit where-provenance

Wang and Madnick (VLDB 1990) and Bhagwat et al. (VLDB 2004) for atomic data values in queries.

This talk: provenance for tables, tuples, and atomic data values in queries and updates.

Input R:

A  B
1  2
8  1

P[(select * from R where A <> 1) union (select A, 5 as B from R where A = 1)]

Output:

A  B
1  5
8  1

Outline

1. Provenance for Queries

2. Provenance for Updates

Queries

Objects: atoms – records – sets

Complex object queries:

e ::= x | a | (A : e, ..., B : e′) | e.A | {e, ..., e′} | ⋃e | {e′ | x ∈ e} | if e_1 = e_2 then e_t else e_f

Example:

select * from R where A <> 1

⋃{if x.A = 1 then ∅ else {x} | x ∈ R}


Provenance for queries

Queries create new objects

Objects copied from the input retain their color.

Objects resulting from constant, record, or set constructor are colored blank.

Input R:

A  B
1  2
8  1

P[R]

A  B
1  2
8  1

(The table, its tuples, and their atoms all keep their input colors.)

P[{x | x ∈ R}]

A  B
1  2
8  1

(The set is newly constructed and therefore blank; the tuples and atoms keep their colors.)

P[{(A : x.A, B : x.B) | x ∈ R}]

A  B
1  2
8  1

(The set and the tuples are newly constructed and therefore blank; only the atoms keep their colors.)

P[⋃{if x.A = 1 then ∅ else {x} | x ∈ R}]

A  B
8  1

(The set is blank; the surviving tuple and its atoms keep their colors.)
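The two coloring rules above can be tried out on the running example. The following is only a minimal sketch, not the authors' formalization: the `CV` type, the helper names, and the color names c0 to c6 are all assumptions made for illustration, and the blank color is modelled as `None`.

```python
from typing import NamedTuple, Optional

class CV(NamedTuple):
    """A colored value; color None stands for the blank color."""
    color: Optional[str]
    value: object  # atom, tuple of (label, CV) pairs (record), or frozenset of CV (set)

# Constructors: every object they build is colored blank.
def const(a):
    return CV(None, a)

def record(**fields):
    return CV(None, tuple(sorted(fields.items())))

def mk_set(items):
    return CV(None, frozenset(items))

def big_union(s):
    return CV(None, frozenset().union(*(member.value for member in s.value)))

def comprehension(body, s):
    return CV(None, frozenset(body(x) for x in s.value))

# Projection copies the field, so the field keeps its color.
def proj(rec, label):
    return dict(rec.value)[label]

# Hypothetical colors c0..c6 on the input table R from the slides.
R = CV("c0", frozenset({
    CV("c1", (("A", CV("c2", 1)), ("B", CV("c3", 2)))),
    CV("c4", (("A", CV("c5", 8)), ("B", CV("c6", 1)))),
}))

# P[ U{ if x.A = 1 then {} else {x} | x in R } ]; the test compares values only.
selected = big_union(comprehension(
    lambda x: mk_set([]) if proj(x, "A").value == const(1).value else mk_set([x]),
    R))

# P[ {(A: x.A, B: x.B) | x in R} ]
rebuilt = comprehension(lambda x: record(A=proj(x, "A"), B=proj(x, "B")), R)
```

In this sketch `selected` is a blank set whose single tuple still carries c4, while `rebuilt` consists of blank tuples whose atoms still carry their input colors.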

Implicit vs explicit provenance

Theorem: For every query e there exists a query f that explicitly implements the implicit provenance semantics P[e].

[Diagram: commuting square. P[e] maps the colored table with rows (1, 2), (8, 1) to the colored table with rows (1, 5), (8, 1). "to exp" translates each colored table into an explicit table with clr and val fields, and f maps the explicit input to the explicit output.]
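One way to picture the "to exp" translation is to turn every colored object into an ordinary record with a clr field and a val field. The sketch below assumes exactly that encoding; the slides only show the field names clr and val, so the precise shape, the `CV` type, and the function names `to_exp` and `to_impl` are assumptions.

```python
from typing import NamedTuple, Optional

class CV(NamedTuple):
    """A colored value; color None stands for the blank color."""
    color: Optional[str]
    value: object  # atom, tuple of (label, CV) pairs, or frozenset of CV

def to_exp(cv):
    """Encode a colored value as a plain value with explicit clr/val fields."""
    if isinstance(cv.value, frozenset):
        val = frozenset(to_exp(c) for c in cv.value)
    elif isinstance(cv.value, tuple):
        val = tuple((label, to_exp(c)) for label, c in cv.value)
    else:
        val = cv.value
    return (("clr", cv.color), ("val", val))

def to_impl(ex):
    """Decode the explicit clr/val encoding back into a colored value."""
    fields = dict(ex)
    color, val = fields["clr"], fields["val"]
    if isinstance(val, frozenset):
        return CV(color, frozenset(to_impl(c) for c in val))
    if isinstance(val, tuple):
        return CV(color, tuple((label, to_impl(c)) for label, c in val))
    return CV(color, val)

# Hypothetical colors on the input table R from the slides.
R = CV("c0", frozenset({
    CV("c1", (("A", CV("c2", 1)), ("B", CV("c3", 2)))),
    CV("c4", (("A", CV("c5", 8)), ("B", CV("c6", 1)))),
}))

explicit_R = to_exp(R)
round_trip = to_impl(explicit_R)
```

Under this encoding an explicit query f is an ordinary query over `explicit_R`, and the theorem says that for every e some such f computes `to_exp` of the result of P[e].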

Expressiveness

Question: Can every explicit f be expressed implicitly?

Answer: No, the provenance semantics P[e] of a query e is:

1. Copying

2. Recording

3. Bounded Inventing

Theorem: Every explicit, copying, recording, and bounded inventing f can be expressed implicitly as the provenance semantics P[e] of a query e.


Expressiveness

The implicit provenance semantics P[e] of a query e is copying:

Every non-blank object in the output is identical to the unique object in the input with the same color.

Corollary:

Non-copying explicit f cannot be expressed implicitly.
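The copying condition can be phrased as a check on one input/output pair. The sketch below is an illustration under assumptions: the `CV` type, the color names, and the function `is_copying_on` are hypothetical, blank is modelled as `None`, and input colors are assumed to be distinct.

```python
from typing import NamedTuple, Optional

class CV(NamedTuple):
    """A colored value; color None stands for the blank color."""
    color: Optional[str]
    value: object  # atom, tuple of (label, CV) pairs, or frozenset of CV

def objects(cv):
    """Yield cv and every colored object nested inside it."""
    yield cv
    if isinstance(cv.value, tuple):
        for _, child in cv.value:
            yield from objects(child)
    elif isinstance(cv.value, frozenset):
        for child in cv.value:
            yield from objects(child)

def is_copying_on(inp, out):
    """True if every non-blank output object equals the input object of that color."""
    by_color = {o.color: o for o in objects(inp) if o.color is not None}
    return all(o.color is None or by_color.get(o.color) == o
               for o in objects(out))

ROW1 = CV("c1", (("A", CV("c2", 1)), ("B", CV("c3", 2))))
ROW2 = CV("c4", (("A", CV("c5", 8)), ("B", CV("c6", 1))))
R = CV("c0", frozenset({ROW1, ROW2}))

# Output in the style of P[e]: the new tuple and the constant 5 are blank.
good = CV(None, frozenset({
    CV(None, (("A", CV("c2", 1)), ("B", CV(None, 5)))),
    ROW2,
}))

# A non-copying output: the value 5 claims color c3, which belongs to the atom 2.
bad = CV(None, frozenset({
    CV(None, (("A", CV("c2", 1)), ("B", CV("c3", 5)))),
    ROW2,
}))

good_is_copying = is_copying_on(R, good)
bad_is_copying = is_copying_on(R, bad)
```

Here `good` passes the check and `bad` fails it, which is the kind of explicit output the corollary rules out.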


Expressiveness

The implicit provenance semantics P[e] of a query e is propagating:

It never compares colors and hence commutes with recolorings.

Corollary:

Non-propagating explicit f cannot be expressed implicitly.

But such f query provenance rather than define it.

Related work:

[Geerts et al., ICDE 2006] and [Geerts and Van den Bussche, DBPL 2007] consider the queries f that explicitly query provenance, and show that the color algebra, an extension of the relational algebra that implicitly queries provenance, is expressively complete for them.


Restriction to recording f:

(if e_1 = e_2 then e_3 else e_4) can occur in f only if e_1 and e_2 do not output colors.

Such f always define propagating functions.
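Commuting with recolorings means that applying a recoloring ρ before or after the query gives the same colored result. The sketch below checks this on the running example; the `CV` type, the color names, the recoloring `rho`, and the function names are assumptions for illustration, and `color_peeking` is a made-up example of a function that inspects colors.

```python
from typing import NamedTuple, Optional

class CV(NamedTuple):
    """A colored value; color None stands for the blank color."""
    color: Optional[str]
    value: object  # atom, tuple of (label, CV) pairs, or frozenset of CV

def field(rec, label):
    return dict(rec.value)[label]

def recolor(rho, cv):
    """Apply the color map rho everywhere; blank stays blank."""
    color = None if cv.color is None else rho(cv.color)
    if isinstance(cv.value, tuple):
        value = tuple((label, recolor(rho, c)) for label, c in cv.value)
    elif isinstance(cv.value, frozenset):
        value = frozenset(recolor(rho, c) for c in cv.value)
    else:
        value = cv.value
    return CV(color, value)

def p_query(r):
    """Colored semantics of the union query from the slides (values compared, never colors)."""
    out = set()
    for x in r.value:
        if field(x, "A").value == 1:
            out.add(CV(None, (("A", field(x, "A")), ("B", CV(None, 5)))))
        else:
            out.add(x)
    return CV(None, frozenset(out))

def color_peeking(r):
    """A non-propagating function: it selects tuples by their color."""
    return CV(None, frozenset(x for x in r.value if x.color == "c4"))

R = CV("c0", frozenset({
    CV("c1", (("A", CV("c2", 1)), ("B", CV("c3", 2)))),
    CV("c4", (("A", CV("c5", 8)), ("B", CV("c6", 1)))),
}))

rho = lambda c: "new-" + c  # hypothetical recoloring

query_commutes = p_query(recolor(rho, R)) == recolor(rho, p_query(R))
peeking_commutes = color_peeking(recolor(rho, R)) == recolor(rho, color_peeking(R))
```

In this sketch `p_query` commutes with `rho` while `color_peeking` does not, matching the remark that non-propagating f query provenance rather than define it.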


Expressiveness

The implicit provenance semantics P[e] of a query e is bounded inventing:

Only constants appearing in e will be colored blank.

Corollary:

Unbounded inventing explicit f cannot be expressed implicitly.

But such f are not “domain preserving” w.r.t. provenance.


Main Result

Theorem: Every explicit, copying, recording, and bounded inventing f can be expressed implicitly as the provenance semantics P[e] of a query e.

Implicit Provenance Semantics is a reasonable candidate to record provenance automatically since we do not lose flexibility

[Diagram: the same square read in the other direction. "to impl" translates the explicit clr/val tables with rows (1, 2), (8, 1) and (1, 5), (8, 1) into colored tables, and P[e] on the colored tables agrees with f on the explicit ones.]

Proof Sketch

Lemma: Every explicit, recording f is polymorphic w.r.t. colors.

[Diagrams: the explicit clr/val tables with rows (1, 2), (8, 1) and (1, 5), (8, 1), related by f, by a color map α, and by the implicit queries P[enc], P[f′], and P[dec] together with "to impl".]


Questions and Answers

Question: Is the ability to construct deeply nested objects vital here?

Answer: No, it is possible to construct P[e] equivalent to f such that e never constructs deeper intermediate objects than f.

Question: What happens when f is propagating instead of recording?

Conjecture: Every propagating f can be written as a recording one.

Outline

1. Provenance for Queries

2. Provenance for Updates


What about updates?

Updates are usually no more expressive than queries.

Input R:

A  B
1  2
8  1

P[update R set B = 5 where A = 1]

Output:

A  B
1  5
8  1

Input R:

A  B
1  2
8  1

P[(select * from R where A <> 1) union (select A, 5 as B from R where A = 1)]

Output:

A  B
1  5
8  1
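That the update and the query agree on plain values can be checked with a short script. This is a minimal sketch using Python's sqlite3 module; the integer column types are an assumption, and the parentheses around the two selects are dropped because SQLite does not accept them in a compound select.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table R(A integer, B integer)")
conn.executemany("insert into R values (?, ?)", [(1, 2), (8, 1)])

# The query from the slide, evaluated on the original table.
query_rows = conn.execute(
    "select * from R where A <> 1 "
    "union "
    "select A, 5 as B from R where A = 1"
).fetchall()

# The update from the slide, applied in place.
conn.execute("update R set B = 5 where A = 1")
update_rows = conn.execute("select * from R").fetchall()

same_values = sorted(query_rows) == sorted(update_rows)
conn.close()
```

Plain SQL cannot show where the two differ: the difference lies in the provenance, since queries create new objects while updates modify existing ones.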


Updates

Objects: atoms – records – sets

Complex object updates (Liefke and Davidson, 1999):

u ::= skip | u_1 ; u_2 | repl e | [x] u | insert e | remove e | iter u | A ⇒ u | add e | drop A

Theorem: Updates and queries are equally expressive under the “normal” semantics


Provenance for updates

Updates modify objects

Objects retain their color, unless they are replaced by another object.

Input R:

A  B
1  2
8  1

P[iter [x] B ⇒ repl(if x.A = 1 then 5 else x.B)]

Output:

A  B
1  5
8  1
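The update on this slide can be traced with a few combinators. The sketch below is an assumption-laden illustration rather than the authors' definition: the `CV` type, the color names, and the Python names `iterate`, `bind`, `at_field`, and `repl` (standing in for iter, [x], A ⇒, and repl) are all hypothetical, and blank is modelled as `None`.

```python
from typing import NamedTuple, Optional

class CV(NamedTuple):
    """A colored value; color None stands for the blank color."""
    color: Optional[str]
    value: object  # atom, tuple of (label, CV) pairs, or frozenset of CV

def field(rec, label):
    return dict(rec.value)[label]

# Each update is a function (env, obj) -> obj.
def repl(e):
    """repl e: replace the current object by the value of e."""
    return lambda env, obj: e(env)

def bind(name, u):
    """[x] u: bind x to the current object, then run u."""
    return lambda env, obj: u({**env, name: obj}, obj)

def at_field(label, u):
    """A => u: run u on field A; the record keeps its color."""
    def run(env, obj):
        fields = tuple((l, u(env, c) if l == label else c) for l, c in obj.value)
        return CV(obj.color, fields)
    return run

def iterate(u):
    """iter u: run u on every element; the set keeps its color."""
    return lambda env, obj: CV(obj.color, frozenset(u(env, c) for c in obj.value))

def new_b(env):
    # if x.A = 1 then 5 else x.B; the constant 5 is blank, x.B keeps its color
    x = env["x"]
    return CV(None, 5) if field(x, "A").value == 1 else field(x, "B")

ROW1 = CV("c1", (("A", CV("c2", 1)), ("B", CV("c3", 2))))
ROW2 = CV("c4", (("A", CV("c5", 8)), ("B", CV("c6", 1))))
R = CV("c0", frozenset({ROW1, ROW2}))

update = iterate(bind("x", at_field("B", repl(new_b))))
updated = update({}, R)
changed_row = next(t for t in updated.value if field(t, "A").value == 1)
```

In this sketch the table and both tuples keep their colors, and only the replaced B atom of the first tuple becomes blank, in contrast to the query version, where the tuple (1, 5) is a newly constructed object.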


Implicit vs explicit provenance

Theorem: For every update u there exists a query f that explicitly implements the implicit provenance semantics P[u].

[Diagram: commuting square. P[u] maps the colored table with rows (1, 2), (8, 1) to the colored table with rows (1, 5), (8, 1). "to exp" translates each colored table into an explicit table with clr and val fields, and f maps the explicit input to the explicit output.]

Expressiveness

The implicit provenance semantics P[u] of an update u is:

kind preserving

recording

bounded inventing

[Figure: a table with columns A, B and rows (1, 2), (8, 1), and a table with columns A, B, C and rows (1, 5, 2), (8, 1, 8), connected by P[u].]

Theorem: Every explicit, kind preserving, recording, and bounded inventing f can be expressed implicitly as the provenance semantics P[u] of an update u.

Summary

Queries and updates implicitly define provenance

Queries create new objects, updates modify them

Readily implemented

Formal justification for the proposed semantics

It is expressively complete w.r.t. natural classes of queries that explicitly manipulate provenance.
