Provenance Aware Service Oriented Architecture (1 year on)
www.pasoa.org
Professor Luc Moreau
University of Southampton
L.Moreau@ecs.soton.ac.uk

The PASOA Team
  PASOA Southampton: Simon Miles, Paul Groth, Miguel Branco, Luc Moreau
  PASOA Cardiff: Ian Wootten, Shrija Rajbhandari, Omer Rana, David Walker

Provenance Definition
  Merriam-Webster Online dictionary: the origin, source; the history of ownership of a valued object or work of art or literature
  The provenance of a piece of data is the process that led to the data
  Our aim is to conceive a computer-based representation of provenance that allows us to perform useful analysis and reasoning to support our use cases

Provenance Use Cases (1)
  Bioinformatics: verification of “experiment validity”.
  High Energy Physics: tracking, analysing, verifying data sets in the ATLAS Experiment of the Large Hadron Collider (CERN)

Provenance Use Cases (2)
  Aerospace engineering: maintain a historical record of design processes, up to 99 years.
  Organ transplant management: tracking of previous decisions, crucial to maximise the efficiency in matching and recovery rate of patients

The Provenance Problem
  Given a set of services in an open grid environment that are composed in order to produce a given result;
  How can we determine the process that generated the result? (especially after their composition, i.e., virtual organisation, has been disbanded)

Core Interfaces to Provenance “Lifecycle”
  Record Documentation of Execution
  Manage Store and its contents
  Query Provenance of Data
  [Diagram labels: Application, Results, Provenance Store]
  [Miles et al. 05]

Logical Architecture
  Adopted by EU Provenance as strawman

Recording & Querying
  PReP [Groth et al. 04]
    Protocol adopted by application components
    Allow for multiple provenance stores (scalability)
  Query Interface [Miles et al. 05]
    Purpose
      Obtain the provenance of some specific data
      Allow for “navigation” of the data structure representing provenance
    Abstract interface
      Allows us to view the provenance store as if containing XML data structures
      Based on XPath and XQuery
  [Diagram labels: client, service, invocation, result, invocation and result recording, Provenance Store]

Assertions about Performance and Availability
  A taxonomy of gathered information about performance [Wootten, Rana 05]
    Recorded (invocation start/end time and counts)
    Derived from Recorded Information (averages)
    Queried against other actor owned metrics
  Compilation of assertions in a measure of trust (both from service and client perspective)
  Trust is a subjective probability that an actor will perform a particular action [Gambetta] [Rajbhandari, Rana 05]

PReServ [Groth et al. 05]
  Implementation of PReP protocol and Query
  Provenance store implemented as a Web Service
  Client side libraries for using Provenance Store
  Axis Handler for automatically recording communication between Axis-based Web Services
  [Diagram labels: WS Client, Interface, PS Client Side Library, Axis Handler, Web Service, Provenance Service, WS Calls, Java Calls, Backend Store Interface, Query Actor WS, File System Store, InMemory Store, Backend Stores, …]

Bioinformatics Application
  Bioinformatics workflow studying compressibility of biological sequences
  Implemented as a VDT workflow, scheduled by Condor
  Each service, script, command records provenance
  [HPDC’05]

Bioinformatics Application (2)
  Use Cases
  Algorithm verification
    A bioinformatician, A, downloads a protein sequence from the RefSeq database and runs the compressibility experiment. A later performs the same experiment on the same sequence data, again downloaded from RefSeq.
    A compares the two experiment results and notices a difference. A determines whether the difference was caused by the algorithms used to process the sequence data having been changed.

Bioinformatics Application (3)
  [Figures: Recording Scalability; Querying Scalability]

Other Applications
  EU Provenance project
    Pre-prototype about baking cakes
  e-Demand
    Detect sharing of services in workflow execution to offer more resilient execution [Townend et al. 05] [Xu et al. 05]

Conclusions
  Mostly unexplored area that is crucial to developing trusted systems
  Current work: system and protocol design, architecture specification, generic support for use cases
  Pursue deployment in concrete applications and performance evaluation
  Download our software from www.pasoa.org
  Tell us about your use cases: we are keen to find new collaborations in this space!
  Talk to Paul and Simon

Publications
1. Paul Groth, Simon Miles, Weijian Fang, Sylvia C. Wong, Klaus-Peter Zauner, and Luc Moreau. Recording and Using Provenance in a Protein Compressibility Experiment. In Proceedings of the 14th IEEE International Symposium on High Performance Distributed Computing (HPDC'05), July 2005.
2. Paul T. Groth. Recording Provenance in Service-Oriented Architectures. 9 Month Report, University of Southampton; Faculty of Engineering, Science and Mathematics; School of Electronics and Computer Science, 2004.
3. Paul Groth, Michael Luck, and Luc Moreau. A protocol for recording provenance in service-oriented Grids. In Proceedings of the 8th International Conference on Principles of Distributed Systems (OPODIS'04), Grenoble, France, December 2004.
4. Paul Groth, Michael Luck, and Luc Moreau. Formalising a protocol for recording provenance in Grids. In Proceedings of the UK OST e-Science second All Hands Meeting 2004 (AHM'04), Nottingham, UK, September 2004.
5. Simon Miles, Paul Groth, Miguel Branco, and Luc Moreau. The requirements of recording and using provenance in e-Science experiments. Technical report, University of Southampton, 2005.
6. Luc Moreau, Syd Chapman, Andreas Schreiber, Rolf Hempel, Omer Rana, Laszlo Varga, Ulises Cortes, and Steven Willmott. Provenance-based Trust for Grid Computing --- Position Paper. In , 2003.
7. Paul Townend, Paul Groth, and Jie Xu. A Provenance-Aware Weighted Fault Tolerance Scheme for Service-Based Applications. In Proceedings of the 8th IEEE International Symposium on Object-Oriented Real-Time Distributed Computing (ISORC 2005), May 2005.
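
Supplementary sketch: algorithm verification
The algorithm verification use case from Bioinformatics Application (2) can be illustrated with a small, hedged example. The sketch below assumes a toy in-memory store that keeps one XML element per recorded invocation or result and is navigated with an XPath-style predicate, in the spirit of the Query Interface slide. It is not the PReP protocol or the PReServ API: the class InMemoryStore, the element and attribute names (interaction, run, actor, kind, service, version), and the service names compress and collate are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


class InMemoryStore:
    """Toy provenance store holding records as XML elements.

    The element and attribute names are assumptions made for this sketch,
    not the PReP or PReServ schema.
    """

    def __init__(self):
        self.root = ET.Element("store")

    def record(self, run_id, actor, kind, service, **attrs):
        """Record one assertion (an invocation or a result) made by an actor."""
        return ET.SubElement(
            self.root, "interaction",
            run=run_id, actor=actor, kind=kind, service=service, **attrs
        )

    def provenance(self, run_id):
        """Return every record of one run, selected with an XPath predicate."""
        return self.root.findall(f"./interaction[@run='{run_id}']")


def algorithm_versions(store, run_id):
    """Map each service in a run to the algorithm version its result reported."""
    return {
        rec.get("service"): rec.get("version")
        for rec in store.provenance(run_id)
        if rec.get("kind") == "result"
    }


def changed_algorithms(store, run_a, run_b):
    """List services whose reported algorithm version differs between two runs."""
    a = algorithm_versions(store, run_a)
    b = algorithm_versions(store, run_b)
    return sorted(s for s in a.keys() & b.keys() if a[s] != b[s])


# Two runs of the same hypothetical workflow on the same input sequence.
store = InMemoryStore()
for run, compress_version in (("run1", "1.2"), ("run2", "1.3")):
    store.record(run, "client", "invocation", "compress")
    store.record(run, "service", "result", "compress", version=compress_version)
    store.record(run, "client", "invocation", "collate")
    store.record(run, "service", "result", "collate", version="1.0")

print(changed_algorithms(store, "run1", "run2"))  # ['compress']
```

Under these assumptions, bioinformatician A would conclude that the two results differ because the compress step ran a different algorithm version in the second run. A real deployment would record such information through the provenance store's Web Service interface rather than a local object.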