Provenance Aware Service Oriented Architecture (1 year on) www.pasoa.org

Professor Luc Moreau
University of Southampton
L.Moreau@ecs.soton.ac.uk
The PASOA Team
• PASOA Southampton
  • Simon Miles, Paul Groth, Miguel Branco, Luc Moreau
• PASOA Cardiff
  • Ian Wootten, Shrija Rajbhandari, Omer Rana, David Walker
Provenance Definition
• Merriam-Webster Online dictionary:
  • the origin, source;
  • the history of ownership of a valued object or work of art or literature
• The provenance of a piece of data is the process that led to the data
• Our aim is to conceive a computer-based representation of provenance that allows us to perform useful analysis and reasoning to support our use cases
Provenance Use Cases (1)
Bioinformatics: verification of “experiment validity”.
High Energy Physics: tracking, analysing, verifying data sets in the ATLAS Experiment of the Large Hadron Collider (CERN)
Provenance Use Cases (2)
Aerospace engineering: maintain a historical record of design processes, up to 99 years.
Organ transplant management: tracking of previous decisions, crucial to maximise the efficiency in matching and recovery rate of patients
The Provenance Problem
Given a set of services in an open grid environment that are composed in order to produce a given result;
How can we determine the process that generated the result?
(especially after their composition, i.e., virtual organisation, has been disbanded)
Provenance “Lifecycle”
[Figure: core interfaces to the Provenance Store. Labels: Application; Application Results; Record Documentation of Execution; Query Provenance of Data; Manage Store and its contents.]
• Logical Architecture [Miles et al. 05]
• Adopted by EU Provenance as strawman
Recording & Querying
PReP [Groth et al. 04]
• Protocol adopted by application components
• Allow for multiple provenance stores (scalability)
[Figure: a client sends an invocation to a service and receives a result; the invocation and result are recorded in a Provenance Store on each side.]
Query Interface [Miles et al. 05]
• Purpose
  • Obtain the provenance of some specific data
  • Allow for “navigation” of the data structure representing provenance
• Abstract interface
  • Allows us to view the provenance store as if containing XML data structures
  • Based on XPath and XQuery
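The recording and querying ideas above can be sketched in a few lines. This is a minimal illustration only, assuming a toy in-memory XML tree rather than the actual PReP messages or the PASOA query interface; the names `record`, `provenanceStore`, `interaction` and `assertion` are hypothetical. Both the client and the service record their view of an invocation and its result, and the store is then navigated with XPath-style expressions.

```python
import xml.etree.ElementTree as ET

# Hypothetical store: one XML tree holding an <interaction> per message exchange.
store = ET.Element("provenanceStore")

def record(interaction_id, asserter, kind, content):
    """Record one actor's view (invocation or result) of an interaction."""
    node = store.find(f"interaction[@id='{interaction_id}']")
    if node is None:
        node = ET.SubElement(store, "interaction", id=interaction_id)
    assertion = ET.SubElement(node, "assertion", asserter=asserter, kind=kind)
    assertion.text = content
    return assertion

# Client and service each document the same exchange (made-up payloads).
record("i1", "client", "invocation", "compress(seq42)")
record("i1", "service", "invocation", "compress(seq42)")
record("i1", "service", "result", "ratio=0.37")
record("i1", "client", "result", "ratio=0.37")

# XPath-style navigation; ElementTree supports only a limited XPath subset.
results = store.findall("./interaction[@id='i1']/assertion[@kind='result']")
asserters = sorted(a.get("asserter") for a in results)
print(asserters)
```

Keeping both actors' assertions under the same interaction is a design choice of this sketch: it lets a query compare what the client and the service each claim about one exchange.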
Assertions about Performance and Availability
• A taxonomy of gathered information about performance [Wootten, Rana 05]
  • Recorded (invocation start/end time and counts)
  • Derived from Recorded Information (averages)
  • Queried against other actor owned metrics
• Compilation of assertions in a measure of trust (both from service and client perspective)
Trust is a subjective probability that an actor will perform a particular action [Gambetta]
[Rajbhandari, Rana 05]
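To make the distinction between recorded and derived information concrete, here is a small sketch. It assumes recorded assertions are simple tuples of service name, start time and end time; that data layout and the name `derive_metrics` are illustrative assumptions, not the taxonomy's actual schema. It does not attempt the trust measure, which the slide does not specify.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical recorded assertions: (service, start_time_s, end_time_s).
recorded = [
    ("compress", 0.0, 2.0),
    ("compress", 5.0, 9.0),
    ("collate", 1.0, 1.5),
]

def derive_metrics(records):
    """Derive per-service invocation counts and average durations."""
    durations = defaultdict(list)
    for service, start, end in records:
        durations[service].append(end - start)
    return {
        service: {"count": len(ds), "average_duration": mean(ds)}
        for service, ds in durations.items()
    }

metrics = derive_metrics(recorded)
print(metrics["compress"])
```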
PReServ [Groth et al. 05]
• Implementation of PReP protocol and Query Interface
• Provenance store implemented as a Web Service
• Client side libraries for using Provenance Store
• Axis Handler for automatically recording communication between Axis-based Web Services
[Figure: PReServ architecture. Labels: WS Client; Web Service; Axis Handler; PS Client Side Library; Query Actor WS; Provenance Service; Backend Store Interface; Backend Stores (File System Store, InMemory Store, …). Legend: WS Calls; Java Calls.]
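The figure's Backend Store Interface suggests a pluggable storage layer. The sketch below illustrates that idea in Python under stated assumptions: PReServ itself is Java-based, and the class and method names here (`BackendStore`, `put`, `get`, `round_trip`) are hypothetical, not the PReServ API.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod

class BackendStore(ABC):
    """Hypothetical backend interface: callers only see put/get."""

    @abstractmethod
    def put(self, key, record): ...

    @abstractmethod
    def get(self, key): ...

class InMemoryStore(BackendStore):
    """Keeps records in a dictionary; contents are lost on exit."""

    def __init__(self):
        self._records = {}

    def put(self, key, record):
        self._records[key] = record

    def get(self, key):
        return self._records[key]

class FileSystemStore(BackendStore):
    """Writes each record to its own JSON file in a directory."""

    def __init__(self, directory):
        self._directory = directory

    def _path(self, key):
        return os.path.join(self._directory, f"{key}.json")

    def put(self, key, record):
        with open(self._path(key), "w", encoding="utf-8") as f:
            json.dump(record, f)

    def get(self, key):
        with open(self._path(key), encoding="utf-8") as f:
            return json.load(f)

def round_trip(store):
    """Store and retrieve one record through the common interface."""
    store.put("i1", {"kind": "invocation", "content": "compress(seq42)"})
    return store.get("i1")

memory_result = round_trip(InMemoryStore())
with tempfile.TemporaryDirectory() as tmp:
    file_result = round_trip(FileSystemStore(tmp))
print(memory_result == file_result)
```

Because `round_trip` depends only on the abstract interface, another backend could be added without touching the calling code.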
Bioinformatics Application
• Bioinformatics workflow studying compressibility of biological sequences
• Implemented as a VDT workflow, scheduled by Condor
• Each service, script, command records provenance
[HPDC’05]
Bioinformatics Application (2)
• Use Cases
  • Algorithm verification
    • A bioinformatician, A, downloads a protein sequence from the RefSeq database and runs the compressibility experiment.
    • A later performs the same experiment on the same sequence data, again downloaded from RefSeq.
    • A compares the two experiment results and notices a difference.
    • A determines whether the difference was caused by the algorithms used to process the sequence data having been changed.
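The final step of this use case can be sketched as a comparison of the process documentation of the two runs. The record layout (a step name, an algorithm label and an input identifier) and all values below are made-up assumptions for illustration; the actual experiment's records are not shown in these slides.

```python
# Hypothetical process documentation for two runs of the same experiment.
run_1 = [
    {"step": "fetch", "algorithm": "download-v1", "input": "seq42"},
    {"step": "compress", "algorithm": "compress-v1", "input": "a1b2"},
]
run_2 = [
    {"step": "fetch", "algorithm": "download-v1", "input": "seq42"},
    {"step": "compress", "algorithm": "compress-v2", "input": "a1b2"},
]

def changed_algorithms(first, second):
    """Return steps whose inputs match but whose algorithm differs."""
    second_by_step = {entry["step"]: entry for entry in second}
    changed = []
    for entry in first:
        other = second_by_step.get(entry["step"])
        if (
            other is not None
            and other["input"] == entry["input"]
            and other["algorithm"] != entry["algorithm"]
        ):
            changed.append((entry["step"], entry["algorithm"], other["algorithm"]))
    return changed

differences = changed_algorithms(run_1, run_2)
print(differences)
```

A non-empty result indicates that at least one step processed the same input with a different algorithm, which is the question the bioinformatician needs answered.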
Bioinformatics Application (3)
• Recording Scalability
• Querying Scalability
Other Applications
• EU Provenance project
  • Pre-prototype about baking cakes
• e-Demand
  • Detect sharing of services in workflow execution to offer more resilient execution
[Townend et al 05]
[Xu et al 05]
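One possible reading of the e-Demand bullet is that provenance records reveal when supposedly independent parts of a workflow rely on the same underlying service. The sketch below assumes records of the form (replica, service invoked); the layout, the names and the function `shared_services` are hypothetical and not taken from the cited work.

```python
from collections import defaultdict

# Hypothetical provenance records: (replica, service it invoked).
invocations = [
    ("replicaA", "svc-1"),
    ("replicaA", "storage-x"),
    ("replicaB", "svc-2"),
    ("replicaB", "storage-x"),
    ("replicaC", "svc-3"),
]

def shared_services(records):
    """Map each service used by more than one replica to those replicas."""
    users = defaultdict(set)
    for replica, service in records:
        users[service].add(replica)
    return {s: sorted(r) for s, r in users.items() if len(r) > 1}

shared = shared_services(invocations)
print(shared)
```

Replicas that share a service are not fully independent, so a fault in that service could affect all of them at once.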
Conclusions
• Mostly unexplored area that is crucial to developing trusted systems
• Current work:
  • System and protocol design, architecture specification, generic support for use cases
  • Pursue deployment in concrete applications and performance evaluation
• Download our software from www.pasoa.org
• Tell us about your use cases: we are keen to find new collaborations in this space!
• Talk to Paul and Simon
Publications
1. Paul Groth, Simon Miles, Weijian Fang, Sylvia C. Wong, Klaus-Peter Zauner, and Luc
Moreau. Recording and Using Provenance in a Protein Compressibility Experiment.
In Proceedings of the 14th IEEE International Symposium on High Performance
Distributed Computing (HPDC'05), July 2005.
2. Paul T. Groth. Recording Provenance in Service-Oriented Architectures. 9 Month
Report, University of Southampton; Faculty of Engineering, Science and
Mathematics; School of Electronics and Computer Science, 2004.
3. Paul Groth, Michael Luck, and Luc Moreau. A protocol for recording provenance in
service-oriented Grids. In Proceedings of the 8th International Conference on
Principles of Distributed Systems (OPODIS'04), Grenoble, France, December 2004.
4. Paul Groth, Michael Luck, and Luc Moreau. Formalising a protocol for recording
provenance in Grids. In Proceedings of the UK OST e-Science second All Hands
Meeting 2004 (AHM'04), Nottingham, UK, September 2004.
5. Simon Miles, Paul Groth, Miguel Branco, and Luc Moreau. The requirements of
recording and using provenance in e-Science experiments. Technical report,
University of Southampton, 2005.
6. Luc Moreau, Syd Chapman, Andreas Schreiber, Rolf Hempel, Omer Rana, Laszlo
Varga, Ulises Cortes, and Steven Willmott. Provenance-based Trust for Grid
Computing --- Position Paper. In , 2003.
7. Paul Townend, Paul Groth, and Jie Xu. A Provenance-Aware Weighted Fault
Tolerance Scheme for Service-Based Applications. In Proc. of the 8th IEEE
International Symposium on Object-oriented Real-time distributed Computing (ISORC
2005), May 2005.