preprototypeAHM05 - The EU Provenance Project


A Proof of Concept: Provenance in a Service Oriented Architecture

Liming Chen, Victor Tan, Fenglian Xu, Alexis Biller, Paul Groth, Simon Miles, John Ibbotson, Michael Luck and Luc Moreau

Purpose

• Asking questions about the provenance of something, i.e. the process by which it came to be as it is, is essential in many domains

• We are working with bioinformaticians, medics, aerospace engineers, physicists and have found a wide range of questions they wish to ask

• A simple example application can:

– Clarify the requirements on software to aid answering those questions

– Be used to explain the issues involved to non-domain experts

– Be extended in controlled ways to explore issues that arise in ‘real’ applications

EU Provenance and PASOA

• Recent work of the EU Provenance project:

– Developed a logical architecture for software to aid answering provenance-related questions, along with other research on security, scalability and user tool support.

– Now being applied to two project applications: organ transport management (UPC, Spain) and aerospace engineering (DLR, Germany)

– The logical architecture document should be released next week: keep an eye on www.gridprovenance.org

• Recent work of the PASOA project:

– Has focused on e-Science applications and has gathered requirements, developed protocols and software

– EU Provenance used PASOA software for the work described in this talk

– PASOA will be discussed in the following two presentations

Outline

• The example application

• Asking provenance-related questions

• The example as a service-oriented process

• Recording documentation of a process

• What does the example show us?

• What are the limits of the example?

• Conclusions

The example application

Baking a Victoria Sponge

• INGREDIENTS

– 110g (4oz) Butter

110g (4oz) Caster Sugar

110g (4oz) Self-raising Flour

2 Eggs

Vanilla Essence or 1 tsp Grated Lemon Rind

• RECIPE

– Preheat oven to 190°C: 375°F: Gas 5.

Whisk together the butter and sugar until light and creamy.

Add the beaten eggs gradually with a little of the flour.

Fold in the remaining sieved flour and add the flavouring.

Divide equally between two 15cm (6 inch) sandwich tins.

Bake for 20 - 25 minutes.

Turn out on to a wire rack to cool.

• This is not such a contrived example!

www.thefoody.com

[Figure: the baking process as a sequence of steps]

– Whisk 20g butter and 20g sugar together to get mixture 1

– Beat 2 eggs for 2 minutes, then mix the beaten eggs with mixture 1 to obtain mixture 2

– Fold 100g flour together with mixture 2 to get mixture 3

– Set the baking temperature to 180˚C and the baking time to 30 min

– Put mixture 3 into the oven to obtain a cake

After Baking

• Some questions can be asked after baking a cake

• Answers to the questions can be found if we record details of the baking process during its execution

• The details of the baking process are what we call the provenance of a cake

“What went wrong?” Questions

• Did we follow the recipe accurately?

– Did we use the correct ingredients at the right time?

– Did we provide the correct quantities? Correct units?

– Did we perform actions for the right duration?

→ We need to keep a record of all actions performed with all their parameters (such as the number of eggs used)

• Organ transplant example: Did the medics follow the correct procedure?

• Bioinformatics example: Did I analyse an amino acid sequence using tools that actually only apply to nucleotide sequences?
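As a rough illustration of why recorded actions and parameters matter, the sketch below compares a recipe against a record of what was done. The step names, parameter names and the "3 eggs" deviation are all hypothetical, chosen only to show the kind of check such a record makes possible; this is not the project's software.

```python
# Hypothetical recipe and record of execution: (step, parameters).
# All step names, parameter names and values are illustrative assumptions.
RECIPE = [
    ("whisk", {"butter_g": 110, "sugar_g": 110}),
    ("beat_and_mix", {"eggs": 2}),
    ("fold", {"flour_g": 110}),
    ("bake", {"temperature_c": 190, "minutes": 25}),
]

RECORDED = [
    ("whisk", {"butter_g": 110, "sugar_g": 110}),
    ("beat_and_mix", {"eggs": 3}),  # one egg too many
    ("fold", {"flour_g": 110}),
    ("bake", {"temperature_c": 190, "minutes": 25}),
]


def deviations(recipe, recorded):
    """Return (step, parameter, expected, actual) for every mismatch."""
    problems = []
    for (step, expected), (done, actual) in zip(recipe, recorded):
        if step != done:
            # A different action was performed at this point in the process.
            problems.append((step, "action", step, done))
            continue
        for key, value in expected.items():
            if actual.get(key) != value:
                problems.append((step, key, value, actual.get(key)))
    if len(recipe) != len(recorded):
        # Steps were skipped or added.
        problems.append(("process", "step_count", len(recipe), len(recorded)))
    return problems


print(deviations(RECIPE, RECORDED))
```

Without the recorded parameters, the question "did we provide the correct quantities?" could not be answered after the fact.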

“What went wrong?” Questions

• Other factors can affect the baking process:

– Amount of flour required varies with altitude

– The oven is broken and bakes at a different temperature from the one requested

→ We need to know the “internal state” of the different entities participating in the baking process (such as actual oven temperature or oven altitude)

• Organ transplant example: By what criteria did a team decide to accept or reject an organ?

• Bioinformatics example: What script was used by the services to perform each stage of the experiment?

“Process Analysis” Questions

• Did we use the same amounts of ingredients for baking cake 1 and cake 2, or at least the same proportions?

• What was the longest step in the execution of a recipe?

• Why did we not finish the process? Where did we stop?

→ The process that led to a given cake should be delimited and analysable

• Organ transplant example: Which patient’s death led to the organ now being transplanted?

• Bioinformatics example: What samples led to the final analysis result?
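The process-analysis questions above can be sketched over a delimited record of one process. The timings and quantities below are hypothetical, as are the function names; the point is only that both questions reduce to simple computations once the process is recorded.

```python
from fractions import Fraction

# Hypothetical timed record of one run: (step, start_minute, end_minute).
STEPS = [
    ("whisk", 0, 5),
    ("beat_and_mix", 5, 9),
    ("fold", 9, 12),
    ("bake", 12, 42),
]

# Hypothetical ingredient quantities recorded for two cakes.
CAKE_1 = {"butter_g": 110, "sugar_g": 110, "flour_g": 110}
CAKE_2 = {"butter_g": 220, "sugar_g": 220, "flour_g": 220}


def longest_step(steps):
    """Name of the step with the greatest duration."""
    return max(steps, key=lambda s: s[2] - s[1])[0]


def proportions(ingredients):
    """Each ingredient as an exact fraction of the total weight."""
    total = sum(ingredients.values())
    return {name: Fraction(amount, total) for name, amount in ingredients.items()}


print(longest_step(STEPS))
print(CAKE_1 == CAKE_2)                            # same amounts?
print(proportions(CAKE_1) == proportions(CAKE_2))  # same proportions?
```

Exact fractions are used so that the proportion comparison is not affected by floating-point rounding.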

“What Did Parties Do?” Questions

• Did the baker follow the user’s instructions (regardless of any claim from the baker)?

• Did each step of the baking process follow the user’s instructions?

– Did they receive the correct instructions?

– Did they follow the received instructions?

→ All entities should document their view of a process because it may vary

• Organ transplant example: Were there differing opinions on the suitability of an organ for transplant?

• Bioinformatics example: I claim I used a database in my experiments whose license allows me to patent my results: does the database owner confirm this?
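A minimal sketch of why each party should record its own view: if the user records what it sent and the baker records what it received, a disagreement becomes visible by comparing the two. The field names and values are hypothetical.

```python
def discrepancies(view_a, view_b):
    """Return {field: (value_in_a, value_in_b)} where two views disagree."""
    fields = sorted(set(view_a) | set(view_b))
    return {
        field: (view_a.get(field), view_b.get(field))
        for field in fields
        if view_a.get(field) != view_b.get(field)
    }


# Hypothetical views of the same interaction.
user_view = {"temperature_c": 190, "baking_minutes": 25}   # what the user says it sent
baker_view = {"temperature_c": 180, "baking_minutes": 25}  # what the baker says it received

print(discrepancies(user_view, baker_view))
```

The comparison only shows that the views differ; deciding which party is right needs further evidence.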

Implementation

• We implemented the application as a set of Web Services, and then implemented clients that answered the provenance-related questions by querying the provenance store

• This involved mapping the scenario onto a service-oriented architecture

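Service-Oriented Process

[Figure: the User and the Baker, Whisk, Beat & Mix, Fold and Oven Bake services, with the messages exchanged]

– Baker: receives Sugar + Flour + Beating Time + Temperature, returns Cake

– Whisk: receives Butter + Sugar, returns Mixture 1

– Beat & Mix: receives Mixture 1 + Eggs + Beating Time, returns Mixture 2

– Fold: receives Flour + Mixture 2, returns Mixture 3

– Oven Bake: receives Mixture 3 + Temperature + Baking Time, returns Cake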

Recording

[Figure: the User, Baker, Whisk, Beat & Mix, Fold and Oven Bake submit documentation to the Provenance Store]

After baking, the provenance store contains a trace of the different activities that were involved in the production of a cake.

The provenance of a cake is the documentation of the process that led to that cake

Baker (Sugar, Flour, Beating Time, Temperature)

Whisk (Butter, Sugar)

WhiskReturn (Mixture 1)

Beat&Mix (Mixture 1, Eggs, Beating Time)

Beat&MixReturn (Mixture 2)

Fold (Flour, Mixture 2)

FoldReturn (Mixture 3)

OvenBake (Mixture 3, Temperature, Baking Time)

OvenBakeReturn (Cake)

BakerReturn (Cake)
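The trace above can be held as plain data and queried. The sketch below is an in-memory stand-in, not the PASOA provenance store or its interface: it pairs each invocation with its "Return" message to show what each service consumed and produced. The function name is an assumption.

```python
# Messages in the order listed above: (message name, contents).
TRACE = [
    ("Baker", ["Sugar", "Flour", "Beating Time", "Temperature"]),
    ("Whisk", ["Butter", "Sugar"]),
    ("WhiskReturn", ["Mixture 1"]),
    ("Beat&Mix", ["Mixture 1", "Eggs", "Beating Time"]),
    ("Beat&MixReturn", ["Mixture 2"]),
    ("Fold", ["Flour", "Mixture 2"]),
    ("FoldReturn", ["Mixture 3"]),
    ("OvenBake", ["Mixture 3", "Temperature", "Baking Time"]),
    ("OvenBakeReturn", ["Cake"]),
    ("BakerReturn", ["Cake"]),
]


def pair_invocations(trace):
    """Match each request with its '...Return' message.

    Returns {service: (inputs, outputs)}, assuming one call per service.
    """
    requests = {name: contents for name, contents in trace
                if not name.endswith("Return")}
    pairs = {}
    for name, contents in trace:
        if name.endswith("Return"):
            service = name[: -len("Return")]
            pairs[service] = (requests[service], contents)
    return pairs


for service, (inputs, outputs) in pair_invocations(TRACE).items():
    print(f"{service}: {inputs} -> {outputs}")
```

A real store would also need identifiers to tell repeated calls to the same service apart; this sketch assumes each service is called once.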

What we have learnt

Process Documentation and Provenance

• We distinguish

– process documentation (the documentation recorded into a provenance store about a process)

– provenance (the information retrieved from a provenance store about a process)

• This is because we have found there to be different requirements on each

[Figure: Process documentation → Processing → Provenance]

Process documentation

• Should allow questions about the provenance of entities to be answered

• Should follow a consistent, application-independent structure so that independent parties can record documentation that is easily combined

– e.g. oven may be owned by someone other than the user, but their documentation is combined to answer whether the requested temperature was used

• Should state exactly what those recording it know to have happened, not confuse it with what they guessed or inferred had happened

– e.g. baker states that it put the cake in the oven, not that the cake was successfully baked, because the oven may have been broken
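One way to picture these requirements is a common record shape that any party can use, so that independently recorded statements can be combined. The sketch below is an assumption for illustration only: the `Assertion` fields, the interaction identifier and the temperatures are hypothetical and do not reproduce the project's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Assertion:
    """A statement by one party about what it knows happened (hypothetical shape)."""
    asserter: str     # who recorded the statement
    interaction: str  # identifier shared by the parties to one exchange
    kind: str         # "sent" for a message, "state" for internal state
    content: dict


# The baker records only what it did; the oven records its own state.
store = [
    Assertion("baker", "bake-1", "sent", {"temperature_c": 190}),
    Assertion("oven", "bake-1", "state", {"actual_temperature_c": 160}),
]


def requested_temperature_used(assertions, interaction):
    """Combine both parties' records to check the requested temperature was used."""
    requested = next(a.content["temperature_c"] for a in assertions
                     if a.interaction == interaction and a.kind == "sent")
    actual = next(a.content["actual_temperature_c"] for a in assertions
                  if a.interaction == interaction and a.kind == "state")
    return requested == actual


print(requested_temperature_used(store, "bake-1"))
```

Neither record alone answers the question: the baker does not claim the cake was baked at 190°C, only that it asked for that temperature.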

Provenance

• Should give the client asking for the provenance of something control over the scope of the answer

– e.g. whether the process that produced the flour is included in the provenance of the cake

• Should provide the information relevant to answering a client’s or user’s questions, without swamping them with detail

– e.g. report how much flour was used rather than the XML structure sent between application components

• May (in order to achieve the above) include inferred information

– e.g. infer from baker putting mixture in oven and getting cake out that the cake was successfully baked from the mixture
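Scope control and inference can be sketched as a traversal over an inferred "was produced from" relation. The relation below is an assumption derived from the example's messages, and the "Wheat" entry is a hypothetical upstream process added only to show how a client might exclude it.

```python
# Hypothetical "was produced from" relation, inferred from recorded messages.
PRODUCED_FROM = {
    "Cake": ["Mixture 3"],
    "Mixture 3": ["Flour", "Mixture 2"],
    "Mixture 2": ["Mixture 1", "Eggs"],
    "Mixture 1": ["Butter", "Sugar"],
    "Flour": ["Wheat"],  # hypothetical upstream process, outside the example
}


def provenance(item, stop_at=frozenset()):
    """Everything `item` was produced from, without expanding items in `stop_at`."""
    found = []
    pending = [item]
    while pending:
        current = pending.pop()
        if current in stop_at:
            continue  # the client asked not to look behind this item
        for source in PRODUCED_FROM.get(current, []):
            if source not in found:
                found.append(source)
                pending.append(source)
    return sorted(found)


print(provenance("Cake"))                     # includes how the flour was made
print(provenance("Cake", stop_at={"Flour"}))  # stops at the flour itself
```

Here the client, not the store, decides where the answer ends, which is the kind of control over scope described above.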

Provenance architectures

• Should allow different parties to record independent documentation if they want to

– e.g. user and baker can record independently, allowing discrepancies to be noticed

• Should have no dependence on any one workflow engine/language, and no requirement for (explicit) workflows to be used at all

– e.g. our example application was written in Java, and baking in reality follows a plan in someone’s head

• Should be independent of any one product of a process: it should not be necessary to store process documentation with any one result of a process

– e.g. the provenance of the cake, the provenance of the ingredients and the provenance of the intermediate mixtures overlap, so we cannot claim the documentation ‘belongs’ to any one of them

Limitations and Strengths

• The current example has limitations:

– The physical world is treated as if it mapped directly to the electronic world. How does a baker record documentation in a provenance store Web Service? Through a GUI? If the GUI goes wrong, or they use it wrongly, do we still have sound process documentation?

– None of the objects in the process have constituent parts that we may want to independently find the provenance of

– Assumes a single provenance store that every service happily submits documentation to

• …but the strength of the example is that it can be simply extended to remove these limitations

Conclusions

• The simple example allows us to determine the requirements on software to record process documentation and make it available to users

• We have used it as a testbed, extending it to explore other aspects of provenance (along with other applications)

• It is rich enough to continue extending to mirror, in a controlled way, issues discovered in the future

EU Provenance Partners

• IBM United Kingdom Limited

• University of Southampton

• University of Wales, Cardiff

• Deutsches Zentrum fur Luft- und Raumfahrt e.V.

• Universitat Politecnica de Catalunya

• Magyar Tudomanyos Akademia Szamitastechnikai es Automatizalasi Kutato Intezet
