Provenance: Problem, Architectural issues, Towards Trust Luc Moreau

L.Moreau@ecs.soton.ac.uk
University of Southampton
Contents
 A definition of provenance
 Example 1: Aerospace engineering
 Example 2: Organ transplant management
 Example 3: Bioinformatics grid
 Provenance architecture
 Towards Trust
 Conclusion
The Grid and Virtual
Organisations
 The Grid problem is defined as coordinated resource
sharing and problem solving in dynamic, multi-institutional virtual organisations [FKT01].
 Effort is required to allow users to place their trust in
the data produced by such virtual organisations.
 Understanding how a given service is likely to modify
data flowing into it, and how that data was
generated, is crucial.
Provenance and Virtual
Organisations
 Given a set of services in an open grid environment
that decide to form a virtual organisation with the aim
of producing a given result, how can we determine the
process that generated the result, especially after the
virtual organisation has been disbanded?
 Without information about the origin of results, users
have little basis for trusting such open environments.
Provenance and Workflows
 Workflow enactment has become popular in the
Grid and Web Services communities
 Workflow enactment can be seen as a scripted
form of virtual organisation.
 The problem is similar: how can we determine
the origin of enactment results?
Provenance: Definition
 Provenance is an annotation able to explain how a
particular result has been derived.
 In a service-oriented architecture, provenance
identifies what data is passed between services, what
services are available, and what results are generated
for particular sets of input values, etc.
 Using provenance, a user can trace the “process” that
led to the aggregation of services producing a
particular output.
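As a rough illustration of the definition above, provenance can be pictured as a set of invocation records that link each output back to the service and inputs that produced it. The sketch below is only one possible reading: the record fields, the `trace` helper and the example service and data names are all hypothetical and do not come from the talk.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InvocationRecord:
    """One service invocation: which service ran, on which inputs, giving which output."""
    service: str             # hypothetical service name
    inputs: Dict[str, str]   # parameter name -> identifier of the data passed in
    output: str              # identifier of the data item produced


def trace(output_id: str, records: List[InvocationRecord]) -> List[InvocationRecord]:
    """Return the invocations that led to output_id, earliest first."""
    by_output = {r.output: r for r in records}
    ordered: List[InvocationRecord] = []
    seen = set()

    def visit(data_id: str) -> None:
        record = by_output.get(data_id)
        if record is None or data_id in seen:
            return  # raw input data, or a data item already handled
        seen.add(data_id)
        for input_id in record.inputs.values():
            visit(input_id)      # first trace everything this invocation consumed
        ordered.append(record)   # then record the invocation itself

    visit(output_id)
    return ordered


# Illustrative data only; these services and identifiers are made up.
EXAMPLE_RECORDS = [
    InvocationRecord("align", {"seq": "raw-1"}, "aligned-1"),
    InvocationRecord("annotate", {"alignment": "aligned-1", "db": "db-7"}, "report-1"),
    InvocationRecord("unrelated", {"x": "raw-2"}, "other-1"),
]
```

Calling `trace("report-1", EXAMPLE_RECORDS)` walks back from the final report and returns only the two invocations that contributed to it, which is the "process" the user would want to inspect.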
Provenance in Aerospace
Engineering


 Provenance requirement: to maintain a historical record
of outputs from each subsystem involved in simulations.
 Aircraft provenance data need to be kept for up to 99
years when sold to some countries.
 Currently, little direct support is available for this.
Provenance in Organ Transplant
Management
 Decision support systems for organ and tissue
transplant rely on a wide range of data sources,
patient data, and doctors’ and surgeons’
knowledge.
 Heavily regulated domain: European, national,
regional and site-specific rules govern how
decisions are made.
 The application of these rules must be ensured and
auditable, and the rules may change over time.
 Provenance allows previous decisions to be tracked,
which is crucial to maximising the efficiency of
matching and the recovery rate of patients.
Provenance in a Bioinformatics
Grid (myGrid)
myGrid builds a personalised problem-solving environment
that helps bioinformaticians find, adapt, construct and execute
in silico experiments.
 Keep the scientist informed as to the provenance of data
relevant to their experiment space.
 Provenance in the drug discovery process:
the FDA requires drug companies to keep a
record of the provenance of drug discovery for as long
as the drug is in use (sometimes up to 50 years).
What is the problem?
 Provenance recording should be part of the
infrastructure, so that users can elect to enable
it when they execute their complex tasks over
the Grid or in Web Services environments.
 Currently, the Web Services protocol stack and
the Open Grid Services Architecture do not
provide any support for recording provenance.
Architectural Vision
 Provenance gathering is a collaborative process that
involves multiple entities, including the workflow
enactment engine, the enactment engine's client, the
service directory, and the invoked services.
 Provenance data will be submitted to one or more
“provenance repositories” acting as storage for
provenance data.
 On user request, analysis, navigation and
reasoning over provenance data can be undertaken.
Architectural Vision
 Storage could be provided by a provenance
service.
 The provenance service would also support
analysis, navigation and reasoning over
provenance.
 Client-side support is needed for submitting provenance
data to the provenance service.
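The division of labour described above (several entities submitting records, one service storing them and answering queries) can be sketched with an in-memory stand-in. This is an assumption-laden toy, not the architecture's actual interface: the class name, the `submit`, `records_for` and `actors_in` methods, and the idea of grouping records by a session identifier are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class ProvenanceRepository:
    """In-memory stand-in for a provenance service (hypothetical interface)."""

    def __init__(self) -> None:
        # session identifier -> records in the order they were submitted
        self._by_session: Dict[str, List[dict]] = defaultdict(list)

    def submit(self, session_id: str, actor: str, record: dict) -> None:
        """Store a record submitted by one of the collaborating entities."""
        self._by_session[session_id].append({**record, "actor": actor})

    def records_for(self, session_id: str) -> List[dict]:
        """Return a copy of all records submitted for one session."""
        return list(self._by_session.get(session_id, []))

    def actors_in(self, session_id: str) -> List[str]:
        """Navigation helper: which entities contributed, in first-seen order."""
        seen: List[str] = []
        for record in self._by_session.get(session_id, []):
            if record["actor"] not in seen:
                seen.append(record["actor"])
        return seen
```

In use, the enactment engine and each invoked service would call `submit` with the same session identifier, and a browsing or reasoning tool would later call `records_for` or `actors_in`. A real service would sit behind a network interface and persistent storage rather than a dictionary.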
A First Prototype (Szomszor, Moreau 03)
 A service-oriented architecture for provenance support
in Grid and Web Services environments, based on the
idea of a provenance service;
 A client-side API for recording provenance data for
Web Service invocation;
 A data model for storing provenance data;
 A server-side interface for querying provenance data;
 Two components making use of provenance:
provenance browsing and provenance validation.
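To give a feel for what client-side recording and validation might look like, the sketch below wraps a local function so that every call is logged, then checks a logged call by re-running it. It does not reproduce the prototype's API: the `recorded` decorator, the `provenance_log` list (standing in for the remote provenance service), the `validate` helper and the example service are all hypothetical, and re-execution is only one plausible way to validate a record.

```python
import functools
from typing import Any, Callable, List

provenance_log: List[dict] = []  # stands in for the remote provenance service


def recorded(service_name: str) -> Callable:
    """Decorator that logs each call's arguments and result (hypothetical client API)."""
    def wrap(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def inner(*args: Any, **kwargs: Any) -> Any:
            result = fn(*args, **kwargs)
            provenance_log.append(
                {"service": service_name, "args": args, "kwargs": kwargs, "result": result}
            )
            return result
        return inner
    return wrap


@recorded("reverse-complement")
def reverse_complement(seq: str) -> str:
    """Toy 'service': reverse a DNA string and swap each base for its complement."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq))


def validate(record: dict, fn: Callable) -> bool:
    """Re-run the recorded call and check that it reproduces the recorded result."""
    return fn(*record["args"], **record["kwargs"]) == record["result"]
```

After a call such as `reverse_complement("AACG")`, the last entry of `provenance_log` can be checked with `validate(provenance_log[-1], reverse_complement.__wrapped__)`. Passing the unwrapped function avoids logging the validation run itself.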
[Figure: Prototype Overview]
[Figure: Prototype Sequence Diagram]
[Figure: Prototype Provenance Data Model]
[Figure: Prototype Provenance Browser]
Discussion
 For provenance data to be useful, we expect the protocol
used to submit it to support some “classical” properties of
distributed algorithms.
 Using mutual authentication, an invoked service can ensure
that it submits data to a specific provenance server and, vice versa,
a provenance server can ensure that it receives data from
a given service.
 With non-repudiation, we can retain evidence of the fact that a
service has committed to executing a particular invocation and
has produced a given result.
 We anticipate that cryptographic techniques will be useful to
ensure such properties.
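As one small example of how cryptographic techniques can protect recorded provenance, the sketch below keeps a log in which each entry carries an HMAC tag computed over its payload and the previous entry's tag. This is an illustration chosen for brevity, not a technique named in the talk, and it gives only tamper evidence and origin authentication between parties that share the key. Non-repudiation in the strict sense would need asymmetric signatures, which are outside the Python standard library. The function names and the entry layout are hypothetical.

```python
import hashlib
import hmac
import json
from typing import List


def _tag(prev: str, payload: dict, key: bytes) -> str:
    """HMAC-SHA256 over the previous tag and a canonical JSON form of the payload."""
    body = json.dumps({"prev": prev, "payload": payload}, sort_keys=True).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def append_entry(log: List[dict], payload: dict, key: bytes) -> dict:
    """Append payload to the log, chaining it to the previous entry's tag."""
    prev = log[-1]["tag"] if log else ""
    entry = {"prev": prev, "payload": payload, "tag": _tag(prev, payload, key)}
    log.append(entry)
    return entry


def verify_log(log: List[dict], key: bytes) -> bool:
    """Check every tag and every link; any edit, removal or reordering fails."""
    prev = ""
    for entry in log:
        expected = _tag(prev, entry["payload"], key)
        if entry["prev"] != prev or not hmac.compare_digest(expected, entry["tag"]):
            return False
        prev = entry["tag"]
    return True
```

Because each tag depends on the one before it, changing an early record invalidates that record and breaks the chain for everything recorded after it.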
Towards Trust
Using the provenance of data, trust metrics for
the data can be derived from:
 Trust the user places in invoked services
 Trust the user places in the input data
 Trust the user places in the enacted workflow
 Trust the user places in the provenance service
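The talk does not say how these four sources of trust should be combined. Purely as an illustration, the sketch below assumes each is expressed as a score between 0 and 1 and takes the minimum, a "weakest link" rule, as the trust in the resulting data. The score range, the combination rule and all names are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class TrustInputs:
    """User-assigned trust scores in [0, 1] (hypothetical representation)."""
    services: float            # trust in the invoked services
    input_data: float          # trust in the input data
    workflow: float            # trust in the enacted workflow
    provenance_service: float  # trust in the provenance service itself


def data_trust(t: TrustInputs) -> float:
    """Weakest-link rule: the result is no more trusted than its least trusted source."""
    scores = [t.services, t.input_data, t.workflow, t.provenance_service]
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"trust score out of range: {score}")
    return min(scores)
```

For example, `data_trust(TrustInputs(0.9, 0.8, 1.0, 0.95))` yields `0.8`, because the input data is the least trusted element. Other rules, such as a product or a weighted average, would be equally easy to substitute.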
 The purpose of the PASOA project is to investigate
provenance in Grid architectures.
 Funded by EPSRC under the “fundamental computer
science for e-Science” call.
 In collaboration with Cardiff.
 www.pasoa.org
Conclusion
 Provenance is a largely unexplored domain.
 Strategic for bringing trust to open environments.
 Necessity to design a configurable architecture
capable of supporting multiple requirements from very
different application domains.
 Need to further investigate the algorithmic
foundations of provenance, which will lead to scalable
and secure industrial solutions.
Publications
 [SM03] Martin Szomszor and Luc Moreau. Recording and
reasoning over data provenance in web and grid services. In
International Conference on Ontologies, Databases and
Applications of SEmantics (ODBASE'03), volume 2888 of
Lecture Notes in Computer Science, pages 603-620, Catania,
Sicily, Italy, November 2003.
 [MCS+03] Luc Moreau, Syd Chapman, Andreas Schreiber,
Rolf Hempel, Omer Rana, Laszlo Varga, Ulises Cortes, and
Steven Willmott. Provenance-based trust for grid computing: position paper. 2003.
Acknowledgements
 Martin Szomszor, Southampton
 Syd Chapman, IBM
 Omer Rana, Cardiff
 Andreas Schreiber and Rolf Hempel, DLR
 Laszlo Varga, SZTAKI
 Ulises Cortes and Steven Willmott, UPC
 Mark Greenwood, Carole Goble, Manchester