DIP
Data, Information and Process Integration with Semantic Web Services
FP6 - 507483

Deliverable D5.1
Report on the State-of-the-Art and Requirements Analysis
(WP 5 – Service Mediation)

Emilia Cimpian, Christian Drumm, Michael Stollberg, Ion Constantinescu, Liliana Cabral, John Domingue, Farshad Hakimpour, Atanas Kiryakov

07 March 2016

EXECUTIVE SUMMARY

This deliverable covers the current state-of-the-art in data, information, and process mediation, and provides an analysis of the mediation requirements for the DIP Mediation Component. The document treats the mediation of data and information separately from process mediation, since process mediation requires the interpretation of goals and workflows as well as flexible Web Service invocation, none of which are required for data and information mediation.

The document consists of two main parts. The first part gives an overview of the current state-of-the-art in mediation, describing some of the existing approaches and projects; for data and information mediation, industrial and research approaches are treated separately. The second part provides an analysis of mediation requirements. Three types of requirements are considered: requirements on the general architecture of the DIP Mediation Component (covering both the run-time environment and the design-time tool), requirements for data and information mediation, and requirements for process mediation.
Document Information
IST Project Number: FP6 – 507483
Acronym: DIP
Full title: Data, Information and Process Integration with Semantic Web Services
Project URL: http://dip.semanticweb.org
Document URL: https://bscw.dip.deri.ie/bscw/bscw.cgi/0/521
EU Project officer: Daniele Rizzi
Deliverable: Number 5.1, Title: Report on the State-of-the-Art and Requirements Analysis
Work package: Number 5, Title: Service Mediation
Date of delivery: Contractual M6, Actual 30-June-04
Version: 0.2, Status: final
Nature: Report
Dissemination Level: Public
Authors (Partner): Emilia Cimpian (NUIG), Christian Drumm (SAP), Michael Stollberg (UIBK), Ion Constantinescu (EPFL), Liliana Cabral (OU), John Domingue (OU), Farshad Hakimpour (OU), Atanas Kiryakov (SIRMA)
Responsible Author: Emilia Cimpian, Email: emilia.cimpian@deri.ie, Partner: NUIG, Phone: +353-91-512640

Abstract (for dissemination): In the twelve years since Gio Wiederhold [Wiederhold, 1992] first came up with the idea of mediation and mediation systems, intensive research has been done in this field. In this report, we provide an overview of the current state-of-the-art in data, information, and process mediation and an analysis of the requirements for constructing a mediation system.
Keywords: Data and information mediation; process mediation; schema matching; ontology mapping, merging and alignment; choreography; orchestration; collaboration

Version log
Date | Change | Author
27-Feb-04 | First draft of the skeleton of the deliverable | Emilia Cimpian
19-Feb-04 | Changes to the skeleton following the discussions during the Wiesbaden meeting | Emilia Cimpian
22-March-04 | Paragraphs added regarding the state-of-the-art in data mediation | Christian Drumm
29-March-04 | Outline of the state-of-the-art analysis for process mediation | Michael Stollberg
05-April-04 | IRS II description | John Domingue, Liliana Cabral
21-April-04 | Bullet points on requirements analysis | Christian Drumm
29-April-04 | D5.1a: Survey of Industrial Data Integration Systems (separate from this document, but part of the same deliverable) | Vladimir Alexiev, Atanas Kiryakov
29-April-04 | Compilation of work on ontology-based data mediation | Farshad Hakimpour
30-April-04 | Restructuring of the document | Emilia Cimpian, Christian Drumm, Michael Stollberg
07-May-04 | EPFL contribution and process mediation included | Ion Constantinescu
14-May-04 | Paragraphs added to the introduction | Michael Stollberg
17-May-04 | SAP XI and XMapper descriptions added; also significant changes to the state-of-the-art in data and information mediation | Christian Drumm
20-May-04 | Paragraphs added to "Overview on Data and Information Mediation Approach" and on the requirements; references reordered alphabetically | Emilia Cimpian
21-May-04 | More requirements | Christian Drumm
25-May-04 | More requirements | Christian Drumm
28-May-04 | Update on process mediation | Michael Stollberg
1-June-04 | Reference added to D5.1a | Atanas Kiryakov, Emilia Cimpian
02-June-04 | MS BizTalk description added; restructuring of the research state-of-the-art in data and information mediation | Emilia Cimpian
03-June-04 | More requirements | Christian Drumm
04-June-04 | Small changes concerning the form of the document | Emilia Cimpian
22-June-04 | Changes to the entire document following the reviewers' comments | Emilia Cimpian, Christian Drumm, Michael Stollberg
24-June-04 | Small changes after proofreading | Emilia Cimpian

Project Consortium Information (Partner, Acronym, Contact)

National University of Ireland Galway (NUIG)
Contact: Prof. Dr. Christoph Bussler, Digital Enterprise Research Institute (DERI), National University of Ireland, Galway, Galway, Ireland
Email: chris.bussler@deri.ie, Tel: +353 91 512460

Fundacion De La Innovacion.Bankinter (Bankinter)
Contact: Monica Martinez Montes, Fundacion de la Innovation. BankInter, Paseo Castellana, 29, 28046 Madrid, Spain
Email: mmtnez@bankinter.es, Tel: 916234238

Berlecon Research GmbH (Berlecon)
Contact: Dr. Thorsten Wichmann, Berlecon Research GmbH, Oranienburger Str. 32, 10117 Berlin, Germany
Email: tw@berlecon.de, Tel: +49 30 2852960

British Telecommunications Plc. (BT)
Contact: Dr John Davies, BT Exact (Orion Floor 5 pp12), Adastral Park Martlesham, Ipswich IP5 3RE, United Kingdom
Email: john.nj.davies@bt.com, Tel: +44 1473 609583

Swiss Federal Institute of Technology, Lausanne (EPFL)
Contact: Prof. Karl Aberer, Distributed Information Systems Laboratory, École Polytechnique Féderale de Lausanne, Bât. PSE-A, 1015 Lausanne, Switzerland
Email: Karl.Aberer@epfl.ch, Tel: +41 21 693 4679

Essex County Council (Essex)
Contact: Mary Rowlatt, Essex County Council, PO Box 11, County Hall, Duke Street, Chelmsford, Essex, CM1 1LX, United Kingdom
Email: maryr@essexcc.gov.uk, Tel: +44 (0)1245 436524

Forschungszentrum Informatik (FZI)
Contact: Andreas Abecker, Forschungszentrum Informatik, Haid-und-Neu Strasse 10-14, 76131 Karlsruhe, Germany
Email: abecker@fzi.de, Tel: +49 721 9654 0

Institut für Informatik, Leopold-Franzens Universität Innsbruck (UIBK)
Contact: Prof. Dieter Fensel, Institute of computer science, University of Innsbruck, Technikerstr. 25, A-6020 Innsbruck, Austria
Email: dieter.fensel@deri.org, Tel: +43 512 5076485

ILOG SA (ILOG)
Contact: Christian de Sainte Marie, 9 Rue de Verdun, 94253 Gentilly, France
Email: csma@ilog.fr, Tel: +33 1 49082981

inubit AG (Inubit)
Contact: Torsten Schmale, inubit AG, Lützowstraße 105-106, D-10785 Berlin, Germany
Email: ts@inubit.com, Tel: +49 30726112 0

Intelligent Software Components, S.A. (iSOCO)
Contact: Dr. V. Richard Benjamins, Director R&D, Intelligent Software Components, S.A., Pedro de Valdivia 10, 28006 Madrid, Spain
Email: rbenjamins@isoco.com, Tel: +34 913 349 797

Net Dynamics Internet Technologies GmbH u. Co KG (Net Dynamics)
Contact: Peter Smolle, Net Dynamics Internet Technologies GmbH & Co KG, Prinz-Eugen-Strasse 68-70, A-1040 Wien, Austria
Email: peter.smolle@netdynamics-tech.com, Tel: +43 1 503982615

The Open University (OU)
Contact: Dr. John Domingue, Knowledge Media Institute, The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom
Email: j.b.domingue@open.ac.uk, Tel: +44 1908 655014

SAP AG (SAP)
Contact: Dr. Elmar Dorner, SAP Research, CEC Karlsruhe, SAP AG, Vincenz-Priessnitz-Str. 1, 76131 Karlsruhe, Germany
Email: elmar.dorner@sap.com, Tel: +49 721 6902 31

Sirma AI Ltd. (Sirma)
Contact: Atanas Kiryakov, Ontotext Lab, Sirma AI EAD, Office Express IT Centre, 3rd Floor, 135 Tzarigradsko Chausse, Sofia 1784, Bulgaria
Email: atanas.kiryakov@sirma.bg, Tel: +359 2 9768 303

Tiscali Österreich GmbH (Tiscali)
Contact: Dieter Haacker, Tiscali Österreich GmbH., Diefenbachgasse 35, A-1150 Vienna, Austria
Email: Dieter.Haacker@at.tiscali.com, Tel: +43 1 899 33 160

Unicorn Solution Ltd. (Unicorn)
Contact: Jeff Eisenberg, Unicorn Solutions Ltd, Malcha Technology Park 1, Jerusalem 96951, Israel
Email: Jeff.Eisenberg@unicorn.com, Tel: +972 2 6491111

Vrije Universiteit Brussel (VUB)
Contact: Carlo Wouters, Starlab-VUB, Vrije Universiteit Brussel, Pleinlaan 2, G-10, 1050 Brussel, Belgium
Email: carlo.wouters@vub.ac.be, Tel: +32 (0) 2 629 3719

TABLE OF CONTENTS
EXECUTIVE SUMMARY
TABLE OF CONTENTS
1 INTRODUCTION
2 STATE-OF-THE-ART ANALYSIS
2.1 MEDIATION OF DATA AND INFORMATION
2.1.1 Importance of Data and Information Mediation in Semantic Web Services
2.1.2 Industrial State-Of-The-Art
2.1.2.1 Approaches and Projects
2.1.2.2 Conclusions
2.1.3 Research State-of-the-Art
2.1.3.1 Classification Based on the Scope of Mediation
2.1.3.2 Classification Based on the Classes of Application
2.1.3.3 Approaches in Constructing the Mediator
2.1.3.4 Conclusions
2.2 MEDIATION OF PROCESSES
2.2.1 Processes and Process Technologies
2.2.1.1 Usage of Process Technologies within Semantic Web Services
2.2.1.2 Need for Process Mediation
2.2.1.2.1 Choreography
2.2.1.2.2 Orchestration
2.2.2 Technologies for Process Mediation
2.2.2.1 Existing Process Representation Technologies
2.2.2.2 Formalization of Process Representation
2.2.2.2.1 Logics for Representing Interaction Protocols
2.2.2.2.2 Formalizing Choreography Description
2.2.2.2.3 Formalizing Web Service Orchestrations
2.2.2.3 Process Integration by Process Composition
2.2.2.3.1 Situation Calculus for Service Composition
2.2.2.3.2 Hierarchical Task Planning for Service Composition
2.2.2.3.3 Type Based Service Composition
2.2.3 Conclusion
3 REQUIREMENT ANALYSIS
3.1 ARCHITECTURAL REQUIREMENTS FOR DIP MEDIATION COMPONENT
3.1.1 Requirements for the Run-Time Environment
3.1.2 Requirements on the Design-Time Tool
3.2 REQUIREMENTS FOR DATA LEVEL MEDIATION
3.3 REQUIREMENTS FOR PROCESS LEVEL MEDIATION
4 CONCLUSIONS
5 REFERENCES

1 INTRODUCTION

Due to the ever-increasing number of resources available online, end users are presented with large amounts of data and information and can find it nearly impossible to extract the relevant information items. Possibly the best solution to this overload problem is to mediate between heterogeneous sources and to provide the user with a single, relevant information source obtained by combining and relating a wide variety of different sources. The systems used for integrating heterogeneous sources are called mediators [Wiederhold, 1992]. The basic principle of mediators and of the mediation architecture introduced by Wiederhold is that a mediator resolves mismatches at the run-time of a system; it does not wrap resources before they are used in a system. In addition to performing the mediation itself, a mediator also provides an authoring environment for defining the mediation. From the business data viewpoint, it is a mapping tool that maps concepts from different data sources without losing or altering their semantics.

Mediation/integration, or more generally dealing with heterogeneity, becomes very important in distributed systems, as well as in Internet applications and the Semantic Web, and especially when dealing with Semantic Web Services [Bussler, 2003]. Mediation rests on a high-level technique for describing the structure of resources. Firstly, a mechanism checks the resources to be integrated (that is, to be made interoperable) against their structure; it then provides mapping functionality to make the resources interoperable.
The basis of such a mechanism is an exhaustive, declarative description technique that allows all features of the resources to be described; in practice this means an expressive ontology language for ontologies and a suitable process description language for business processes (for ontologies we adopt the definition provided by Gruber [Gruber, 1993]: an ontology is a specification of a conceptualization; by processes we mean a set of activities and transitions with conditions for transitions). Secondly, an algebra is needed on top of this that defines the computable relations between the modeling primitives and the operations on them [Papakonstantinou et al., 1996]. Thirdly, a classification of the mismatches that can occur between resources is required; this classification should also group the mismatches according to their resolvability. The fourth component of a mediator is a mechanism that works on the representation language and, based on the algebra, resolves a subset of the mismatches.

In general, mediation is an open-ended problem field, and only partial solutions can be realized by automated mechanisms, because it is considered impossible to define an algebra and mismatch-resolving mechanisms for every kind of heterogeneity that can appear [Yerneni et al., 1999]. Most mediation technologies are only semi-automatic when resolving conceptual mismatches, where both resources model some aspect differently but correctly; that is, they require human intervention to define the mappings between concepts.

The Web Service Modeling Framework (WSMF) [Fensel and Bussler, 2002], which represents the basis of DIP and serves as the conceptual foundation of the Web Service Modeling Ontology (WSMO) [Roman et al., 2004], differentiates three levels of mediation needed for Semantic Web Services.
These levels are:
- Data Level: establishes interoperability between heterogeneous data sources. As DIP uses ontologies for data representation, special attention should be given to ontology mediation.
- Process Level: establishes interoperability between heterogeneous processes. Mismatches may occur in the externally visible behaviour of Web Services that are meant to interact (for example, mismatches in the sequence of activities); these have to be resolved to make the processes interoperable.
- Protocol Level: establishes interoperability between resources that request and/or use heterogeneous messaging patterns or messaging sequences (see footnote 1).

It is important to note that, for Semantic Web Services to be handled automatically, they have to be mediated on all three levels.

The objective of DIP Work Package (WP) 5 is to specify and develop the DIP Mediation Component, which handles all mediation aspects within the DIP architecture (see [Altenhofen et al., 2004]). The challenge for WP 5 is to specify the architecture and functionality of this component so that it can be used to resolve mismatches between different DIP components and so that adequate mediation facilities can be invoked within the DIP architecture. In fact, the DIP Mediation Component poses two major challenges: firstly, implementing a mediation-oriented architecture in accordance with the concept of mediators and their usage in modern system architectures as outlined in [Wiederhold, 1992]; and secondly, developing suitable mediation facilities, that is, mechanisms that actually resolve mismatches between possibly heterogeneous resources at the three distinct levels of mediation identified above.
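The mediator structure outlined in this chapter (a description of the resources, a classification of mismatches by resolvability, and a mechanism that automatically resolves only a subset of them) can be illustrated with a small data-level sketch. The following Python code is a hypothetical illustration, not the DIP design: a resource description is reduced to attribute names annotated with a concept and a unit, and the three mismatch kinds and the unit-conversion table are assumptions. Everything the mechanism cannot resolve is handed back for human intervention, reflecting the semi-automatic nature of mediation noted above.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Mismatch(Enum):
    """Hypothetical mismatch classification (not taken from DIP or WSMF)."""
    NAMING = "naming"          # same concept, different attribute name
    SCALE = "scale"            # same quantity, different unit
    CONCEPTUAL = "conceptual"  # no direct counterpart; needs a human-defined mapping


# Hypothetical conversion table known to the resolving mechanism.
UNIT_FACTORS: Dict[Tuple[str, str], float] = {("g", "kg"): 0.001, ("kg", "g"): 1000.0}

# Description of a resource: attribute name -> (concept, unit or None).
Schema = Dict[str, Tuple[str, Optional[str]]]


def classify(s_name: str, s_unit: Optional[str],
             t_name: str, t_unit: Optional[str]) -> List[Mismatch]:
    """Classify the mismatches between two attributes denoting the same concept."""
    kinds: List[Mismatch] = []
    if s_name != t_name:
        kinds.append(Mismatch.NAMING)
    if s_unit != t_unit:
        kinds.append(Mismatch.SCALE)
    return kinds


def resolvable(kind: Mismatch, s_unit: Optional[str], t_unit: Optional[str]) -> bool:
    """Resolvability: can the mechanism fix this kind of mismatch on its own?"""
    if kind is Mismatch.NAMING:
        return True
    if kind is Mismatch.SCALE:
        return (s_unit, t_unit) in UNIT_FACTORS
    return False


def resolve(record: Dict[str, object], source: Schema,
            target: Schema) -> Tuple[Dict[str, object], List[Tuple[str, Mismatch]]]:
    """Resolve the automatically resolvable subset; return the rest for a human."""
    by_concept = {concept: (name, unit) for name, (concept, unit) in target.items()}
    translated: Dict[str, object] = {}
    needs_human: List[Tuple[str, Mismatch]] = []
    for s_name, value in record.items():
        concept, s_unit = source[s_name]
        if concept not in by_concept:
            needs_human.append((s_name, Mismatch.CONCEPTUAL))
            continue
        t_name, t_unit = by_concept[concept]
        blocking = [k for k in classify(s_name, s_unit, t_name, t_unit)
                    if not resolvable(k, s_unit, t_unit)]
        if blocking:
            needs_human.extend((s_name, k) for k in blocking)
            continue
        if s_unit != t_unit:
            value = value * UNIT_FACTORS[(s_unit, t_unit)]  # resolves SCALE
        translated[t_name] = value  # storing under the target name resolves NAMING
    return translated, needs_human


source = {"weight_g": ("weight", "g"), "height_in": ("height", "in"),
          "colour": ("colour", None), "shape": ("shape", None)}
target = {"mass": ("weight", "kg"), "height": ("height", "cm"), "colour": ("colour", None)}
translated, needs_human = resolve(
    {"weight_g": 2500.0, "height_in": 10.0, "colour": "red", "shape": "cube"}, source, target)
print(translated)   # {'mass': 2.5, 'colour': 'red'}
print(needs_human)  # [('height_in', <Mismatch.SCALE: 'scale'>), ('shape', <Mismatch.CONCEPTUAL: 'conceptual'>)]
```

Here the naming and gram-to-kilogram mismatches are resolved automatically, while the inch-to-centimetre conversion (missing from the assumed table) and the attribute without a counterpart are reported for a human to map.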
Footnote 1: The naming of the different levels of mediation has not been consistent. While WSMF understands protocol level mediation to be concerned with messaging sequences, another interpretation is that protocol level mediation is concerned with different communication protocols, for example HTTP, SOAP, FIPA, and so on. The latter is the understanding that underlies the DIP WP 5 structure: here, process level mediation deals with heterogeneities between business processes (the workflow from an application domain point of view) and with the externally visible behaviour of services, along with possible mismatches in messaging sequences. By protocol level mediation, DIP WP 5 refers to aspects of the communication protocol technology used. The WP 5 consortium agreed that establishing interoperability between the externally visible behaviours and the messaging patterns of Web Services is a fundamental challenge to be addressed in the DIP project. Thus, the messaging-sequence aspects that WSMF assigns to protocol level mediation have been included in the process level aspects of WP 5 (at least at this point in time).

The mechanisms to be developed within WP 5 have to support the representation languages and ontological structures of the different DIP components, and they should apply and extend existing mediation techniques in order to achieve a high degree of resolvability for possible mismatches. The aim of this deliverable is to provide an introduction to the field of mediation and to analyse existing technologies with respect to their applicability to the DIP Mediation Component, resulting in a requirements analysis for WP 5. The document will serve as the basis of WP 5.
To achieve this, the document covers the following aspects. For the requirements analysis of the DIP Mediation Component architecture, existing architectures should be taken into consideration on the one hand (see the survey in DIP D5.1a), and interoperability and usability within the overall DIP architecture have to be ensured on the other. Regarding the mediation facilities to be developed in WP 5, the objective is to create high-quality mediation facilities that extend existing approaches and techniques for the different levels of mediation. Therefore, we have to investigate existing mediation technologies and systems thoroughly, so that a sound requirements analysis and specification of the mediation facilities can be derived from them.

For mediation within Semantic Web Services, the primary interest is in data level and process level mediation. Protocol level mediation is covered by the fact that SOAP-based communication protocols are used for messaging within Semantic Web Services, so no mismatches are expected at this level. (At least, protocol level mediation does not have the same priority as data level and process level mediation, given the distinction explained in footnote 1.) Thus, in the following we concentrate the state-of-the-art and requirements analysis on these two areas of mediation; the requirements analysis covers the architecture of the DIP Mediation Component as well as the development of mediation facilities for the data and process levels.

This document is structured as follows: Chapter 2 reports on the state-of-the-art in mediation facilities and technologies for the data and process levels, illustrating current techniques and existing mediation systems; Chapter 3 provides the requirements analysis for the DIP Mediation Component; Chapter 4 concludes the document.
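As an illustration of process level mediation, consider a requester and a provider whose externally visible behaviours exchange the same messages, but in a different order. One simple way to bridge such a sequence mismatch is a mediator that buffers incoming messages and releases them in the order the provider expects. The Python sketch below is hypothetical: it assumes that messages are (type, payload) pairs and that each message type occurs at most once, and it reports expected messages that never arrived as well as messages the provider does not expect, since neither can be fixed by reordering alone.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, object]  # hypothetical (message type, payload) pair


def mediate_sequence(sent: List[Message], expected: List[str]
                     ) -> Tuple[List[Message], List[str], List[str]]:
    """Forward the requester's messages in the order the provider expects.

    Assumes each message type occurs at most once. Returns the forwarded
    messages, the expected types that never arrived, and received types the
    provider does not expect at all (both need more than reordering)."""
    buffer: Dict[str, object] = {}
    forwarded: List[Message] = []
    next_index = 0
    for msg_type, payload in sent:
        buffer[msg_type] = payload
        # Release every expected message that is now available, strictly in order.
        while next_index < len(expected) and expected[next_index] in buffer:
            wanted = expected[next_index]
            forwarded.append((wanted, buffer.pop(wanted)))
            next_index += 1
    missing = [t for t in expected[next_index:] if t not in buffer]
    unexpected = [t for t in buffer if t not in expected]
    return forwarded, missing, unexpected


# Requester sends the order before the address; the provider expects the address first.
sent = [("order", {"item": "book"}), ("address", "Galway"), ("payment", 12.5)]
forwarded, missing, unexpected = mediate_sequence(sent, ["address", "order", "payment"])
print([t for t, _ in forwarded], missing, unexpected)  # ['address', 'order', 'payment'] [] []
```

Buffering keeps the mediator independent of both parties' implementations; only their externally visible message sequences need to be known.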
2 STATE-OF-THE-ART ANALYSIS

As mentioned in Chapter 1, mediation in Semantic Web Services can take place at three different levels: data, process, and protocol. In this report, however, we concentrate only on data (information) and process mediation.

The relationship between data and information has to be stated clearly before continuing. Any facts handled on the Web are considered to be data, but they remain meaningless when interpreted out of context; data only becomes meaningful information when it makes sense in some context and has meaning for humans. As a consequence, pure data can only be mediated at a syntactic level. To obtain meaningful mediated data, the semantic aspects must be considered, and the mediation must be based on the information inferred from the data. The mediation of data therefore implies the mediation of information and requires semantic mapping capabilities, as well as specialized mapping and integration techniques for a specific application context.

While data and information are static structures, mediating processes requires taking dynamic aspects into consideration: the order of the activities, the transactions that may occur, and the conditions for transitions. In other words, we have to consider the interpretation of goals, as well as workflow and flexible Web Service invocation. Because of this difference between static and dynamic aspects, we present the state-of-the-art in data and information mediation separately from the state-of-the-art in process mediation.

2.1 Mediation of Data and Information

In the last twelve years, intense research activity in this field has resulted in the development of many mediation techniques. Two different directions, the industrial and the research one, have strongly shaped the resulting solutions.
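The distinction between syntactic data mediation and semantic information mediation drawn at the start of this chapter can be illustrated with two price records that share a field name but not a currency. In the following Python sketch, the offers and the fixed exchange rates are invented for illustration: a purely syntactic mapping matches fields by name and compares the raw numbers, whereas a mapping that takes the currency context into account compares the information those numbers represent.

```python
from typing import Dict, List

Offer = Dict[str, object]

# Two hypothetical offers that share the field name "price" but not its context.
offers: List[Offer] = [
    {"vendor": "A", "price": 90.0, "currency": "EUR"},
    {"vendor": "B", "price": 100.0, "currency": "USD"},
]

# Hypothetical fixed exchange rates, used only for this illustration.
RATE_TO_EUR: Dict[str, float] = {"EUR": 1.0, "USD": 0.8}


def cheapest_syntactic(items: List[Offer]) -> object:
    """Syntactic mediation: fields are matched by name, so raw numbers are compared."""
    return min(items, key=lambda o: o["price"])["vendor"]


def cheapest_semantic(items: List[Offer]) -> object:
    """Semantic mediation: the currency context makes the numbers comparable."""
    return min(items, key=lambda o: o["price"] * RATE_TO_EUR[o["currency"]])["vendor"]


print(cheapest_syntactic(offers))  # A (90 < 100, although the units differ)
print(cheapest_semantic(offers))   # B (100 USD is 80 EUR at the assumed rate)
```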
In the industrial area, the main focus has been the rapid development of mediation systems for particular needs. Industrial systems are application oriented, offering simple, low-risk solutions based on technologies backed by years of experience. Research activity, in contrast, concentrates on finding new and innovative solutions that improve the quality of results and reduce the effort required from the human user, with the aim of semi-automating mediation systems. One of the most daring approaches treats semantics as an indispensable factor in mediation solutions. Together with syntactic techniques that are already refined as far as possible, this approach is expected to play a crucial role in the emergence of Semantic Web Services. Unfortunately, both the research and the industrial approaches still rely on input from a human user.

In the following sections, we first provide a short rationale for the use of data and information mediation in Semantic Web Services and then present the current state-of-the-art in both the research and the industrial fields.

2.1.1 Importance of Data and Information Mediation in Semantic Web Services

The main reason for using Web Services is to provide a standard means of strongly decoupled interoperation between different software applications running on a variety of platforms [Booth et al., 2004]. The purpose of adding semantics to the Web and to Web Services is to define meanings that enable computers to operate more appropriately on the information they manage. The process of engaging a Web Service (or a Semantic Web Service) consists of the following steps [Booth et al., 2004]:
1. the requester and provider entities become known to each other;
2. the requester and provider entities agree on the service description and semantics that will govern the interaction between the requester and provider agents;
3. the service description and semantics are realized by the requester and provider agents;
4. the requester and provider agents interact by exchanging messages.

These steps are illustrated in Figure 1.

Figure 1: Engaging a Web Service (source: [Booth et al., 2004])

A misunderstanding between the service requester and the service provider can arise during step 1 or step 2, because the two entities involved in the process may use different data sources. This calls for mediation at the data and information level to facilitate communication between the requester and the provider of a service.

2.1.2 Industrial State-Of-The-Art

The current state-of-the-art in industrial applications is a central integration server that intercepts messages between different systems and translates each message from a given source format into the required target format. The transformations that perform these translations are static scripts executed by the integration server. The decision about which script to execute is taken either at design time by the system designer or at run-time based on predefined rules. Examples of such systems are MS BizTalk [MS BizTalk, 2004], Seeburger Business Integration Server [SBIS, 2004], and SAP Exchange Infrastructure [SAP XI, 2004].

Most current integration servers also allow messages to be routed dynamically at run-time based on their content. This enables some dynamic behaviour, as the target of a given message can be determined at run-time and does not need to be defined at design time; an example is SAP Master Data Management. The creation of a single view over multiple databases is also a problem addressed by several products.
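The following Python sketch illustrates, under simplifying assumptions, the behaviour of such a central integration server: messages are plain dictionaries, a static transformation is chosen by the first predefined rule that matches, and the target system is then chosen by content-based routing rules. The class `IntegrationServer`, the message formats, and the target names are hypothetical and do not reflect the API of MS BizTalk, Seeburger BIS, or SAP XI.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, object]
Condition = Callable[[Message], bool]
Transform = Callable[[Message], Message]


class IntegrationServer:
    """Hypothetical central integration server: every message passes through it,
    is translated by a static transformation chosen by predefined rules, and is
    then routed according to its content."""

    def __init__(self) -> None:
        self.transforms: List[Tuple[Condition, Transform]] = []
        self.routes: List[Tuple[Condition, str]] = []

    def add_transform(self, condition: Condition, transform: Transform) -> None:
        self.transforms.append((condition, transform))

    def add_route(self, condition: Condition, target: str) -> None:
        self.routes.append((condition, target))

    def handle(self, message: Message) -> Tuple[str, Message]:
        # 1. Apply the first static transformation whose predefined rule matches.
        for condition, transform in self.transforms:
            if condition(message):
                message = transform(message)
                break
        else:
            raise LookupError("no transformation rule matches the message")
        # 2. Content-based routing: the target is decided per message at run-time.
        for condition, target in self.routes:
            if condition(message):
                return target, message
        raise LookupError("no routing rule matches the message")


def legacy_po_to_target(msg: Message) -> Message:
    # Static script translating a hypothetical legacy purchase-order format.
    return {"format": "targetPO", "orderId": msg["ord_no"], "amount": msg["amt"]}


server = IntegrationServer()
server.add_transform(lambda m: m.get("format") == "legacyPO", legacy_po_to_target)
server.add_route(lambda m: m["amount"] > 1000, "approval-system")
server.add_route(lambda m: True, "erp-system")

print(server.handle({"format": "legacyPO", "ord_no": 7, "amt": 1200}))
# ('approval-system', {'format': 'targetPO', 'orderId': 7, 'amount': 1200})
```

The transformation scripts stay static; only the choice of script and of target depends on the message, which is exactly the limited form of dynamic behaviour described above.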
The points of interest in the industrial state-of-the-art are how the necessary transformations are created and how they are executed; the rest of the mediation process depends strongly on the implementation and is not relevant for this state-of-the-art.

1. Creation of Transformations
In all existing solutions, the creation of transformations relies strongly on input from a domain expert; that is, it is semi-automatic or entirely manual. Two very different approaches exist: transformations are created either with a graphical tool or by programming them directly in some kind of scripting language. Each approach has its own advantages and disadvantages: for example, graphical tools become very confusing for large message schemas, whereas direct programming gives no feedback on which elements have already been handled or which have probably been forgotten. Some companies, for example Seeburger, combine the two approaches in an IDE for creating transformations, which lets the user choose the approach most suitable for the given problem.

2. Execution of Transformations
There are two basic approaches for executing a transformation. One is to interpret, at run-time, the scripting language used to define the transformation. For example, one could program a transformation in XSLT, a language for transforming XML documents into other XML documents [W3C, XSLT, 1999], and execute it with a standard XSLT processor. The second approach is to compile the transformation into an executable program, for example a Java class, and simply call this program at run-time.

2.1.2.1 Approaches and Projects

A broad survey is provided in a separate sub-deliverable, D5.1a "Survey of Industrial Data Integration Systems", because of its size and complex internal structure.
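The two ways of executing a transformation listed in Section 2.1.2 under "Execution of Transformations" can be contrasted in a short Python sketch. The mapping definition is a hypothetical stand-in for what a graphical tool or a script might produce: each target field is computed by a named function from one or more source fields. `interpret` evaluates that definition anew for every message, while `compile_mapping` translates it once into Python source code (analogous to generating a Java class) and returns the resulting function. All names here are assumptions made for this illustration, and the generated code is only safe for trusted mapping definitions.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, object]
# Hypothetical declarative mapping: target field -> (function name, source fields).
Spec = Dict[str, Tuple[str, List[str]]]

# Hypothetical library of predefined mapping functions.
FUNCTIONS: Dict[str, Callable[..., object]] = {
    "copy": lambda x: x,
    "concat": lambda *parts: " ".join(str(p) for p in parts),
    "mul": lambda a, b: a * b,
}


def interpret(spec: Spec, msg: Message) -> Message:
    """Approach 1: walk the mapping definition anew for every message."""
    return {target: FUNCTIONS[func](*(msg[s] for s in sources))
            for target, (func, sources) in spec.items()}


def compile_mapping(spec: Spec) -> Callable[[Message], Message]:
    """Approach 2: translate the definition once into executable code."""
    lines = ["def _mapping(msg, F):", "    return {"]
    for target, (func, sources) in spec.items():
        args = ", ".join(f"msg[{s!r}]" for s in sources)
        lines.append(f"        {target!r}: F[{func!r}]({args}),")
    lines.append("    }")
    namespace: Dict[str, object] = {}
    exec("\n".join(lines), namespace)  # generated once, reused for every message
    generated = namespace["_mapping"]
    return lambda msg: generated(msg, FUNCTIONS)


spec: Spec = {"fullName": ("concat", ["first", "last"]),
              "total": ("mul", ["qty", "price"]),
              "orderId": ("copy", ["ord_no"])}
msg = {"first": "Ada", "last": "Lovelace", "qty": 3, "price": 4, "ord_no": "A-1"}
print(interpret(spec, msg))  # {'fullName': 'Ada Lovelace', 'total': 12, 'orderId': 'A-1'}
print(compile_mapping(spec)(msg) == interpret(spec, msg))  # True
```

Both approaches produce the same result; the compiled variant pays the translation cost once and avoids re-reading the definition for each message, at the price of a separate generation step.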
The survey provides an introduction to industrial data integration systems and an overview of the systems provided by some of the leaders in database management and enterprise application integration, namely ORACLE, IBM, Microsoft, WebLogic, and Cape Clear. Sub-deliverable D5.1a also reflects the current "paradigms" and structuring of the data integration area without a bias towards Web Services, ontologies, or the Semantic Web, because most of the experience, industrial approaches, and technologies used for data mediation and integration are non-semantic.

To point out some of the interesting features of existing systems, we now describe SAP Exchange Infrastructure, Seeburger Business Integration Server, and Microsoft BizTalk Server.

The SAP Exchange Infrastructure [SAP XI, 2004] is a system that enables the integration of different enterprise applications on different platforms. It offers a run-time infrastructure for message exchange, configuration options for managing message flows and business processes, and support for creating the necessary message transformations. An overview of the architecture of SAP XI is shown in Figure 2.

Figure 2: SAP Exchange Infrastructure Overview (source: [SAP XI, 2004])

The SAP Exchange Infrastructure consists of three main parts: the Integration Repository, the Integration Directory, and the Integration Server. The Integration Repository captures all the information about an integration project that is available at design time, namely interface descriptions, components, mappings, and business processes. The Integration Directory additionally contains configuration information, such as information about the system landscape or business partners. The central component of SAP XI is the Integration Server, the communication engine for messages sent between different systems.
between different systems. The Integration Server is responsible for routing and mapping messages. At run-time it performs these tasks dynamically, based on the content of each message and the information stored in the Integration Directory.

SAP XI is built on the Java 2 Platform, Enterprise Edition. To ensure wide interoperability, it supports a large number of open standards. Examples include WSDL for describing interfaces, as well as XML and the SOAP with Attachments specification, on which the communication is built. The Integration Server can currently execute two types of transformations: XSLT scripts and Java classes. XSLT is supported only to allow the integration of pre-existing mappings, so SAP XI offers no support for creating XSLT mappings. At run-time, user-defined rules dynamically select the appropriate mappings from the Integration Directory, based on the message header or content.

For creating Java mapping programs, SAP XI offers a graphical tool that is very similar to many other products in this area. The user creates a mapping by graphically connecting elements of the source message schema with elements of the target message schema. Arbitrary functions can be inserted into the data flow, so the tool supports more than simple one-to-one element mappings. The tool provides a large number of predefined functions, and new user-defined functions can be created if needed. Figure 3 shows a screenshot of the SAP XI Mapping Tool.

Figure 3: The SAP XI Mapping Tool (Source: [SAP XI, 2004])

Another industrial product is the Seeburger Business Integration Server (BIS) [SBIS, 2004], an integration engine for B2B integration. An overview of its architecture is shown in Figure 4. The core of BIS is the so-called Workflow Engine.
This engine controls all integration processes handled by BIS. It is connected to various Event Sources that can trigger the execution of a workflow. Events can come from different sources, for example files, databases, or messages. Another possible source of events is the Web Services interface. BIS can offer Web Services to other systems and use requests to such a Web Service to trigger the execution of a workflow.

BIS is connected to legacy systems through Components. The available Components include Connectors to standard ERP or eBusiness solutions and Communication Components for communicating with external partners. They also include Converters for easy conversion between different communication standards, and further components, for example security components for secure communication.

The overall architecture of BIS is very similar to that of most industrial integration systems, but BIS has two special features. Firstly, it contains a large number of converters and connectors. There are transformations between all major communication standards and connectors to the other available business systems. As a result, many integration problems can be solved easily: development can focus on integrating workflows and does not need to be concerned with creating transformations.

Figure 4: Overview of the Seeburger BIS (Source: [SBIS, 2004])

The second feature we want to highlight is the tool for generating transformations, the so-called Mapping Designer. The Mapping Designer is a very advanced tool and can be seen as an integrated development environment for transformations. In most other tools, transformations are created using either a graphical interface or a scripting language. The Mapping Designer, by contrast, integrates both approaches.
The user can write a transformation script in a special scripting language and immediately see a graphical representation of the resulting transformation. Alternatively, the user can create the transformation graphically and let the tool generate the corresponding script. This supports a development process similar to that of well-known UML tools. The Mapping Designer also integrates debugging support, so that it covers the complete development process.

The last product described here is Microsoft BizTalk Server [MS BizTalk, 2004]. It connects diverse applications and provides a graphical user interface for creating and modifying business processes that use services from those applications. To do this, the BizTalk Server engine must provide a way to specify the business processes and a mechanism for communication between the applications that the business processes use. The main components of BizTalk Server 2004 are send and receive adapters, send and receive pipelines, orchestrations, the BizTalk Server MessageBox, and the Business Rules Engine (Figure 5).

Figure 5: BizTalk Server Architecture (Source: [MS BizTalk, 2004])

Figure 5 also illustrates how messages are processed:

1. A message is received through a receive adapter and then processed by a receive pipeline. This processing can include converting the message from its native format into an XML document and validating its digital signature.
2. The message is delivered to the so-called MessageBox, a database hosted on Microsoft SQL Server.
3. The message is dispatched to its target orchestration, which takes whatever action the business process requires. The result is usually another message, which is also stored in the MessageBox.
4.
The new message is processed by a send pipeline and then passed to the send adapter. The send pipeline processing can include converting the message from the internal XML format used by BizTalk Server 2004 into the format required by its destination, and adding a digital signature.

BizTalk Server 2004 is built entirely on the Microsoft .NET Framework and Microsoft Visual Studio .NET. It also has native support for communicating through Web Services, and it can import and export business processes described in the Business Process Execution Language.

2.1.2.2 Conclusions

Like most (if not all) industrial mediation systems, the systems described in this section are application-oriented and tailored to particular needs. A major challenge in industry is not to build a general solution that is open to new and innovative techniques, but to offer a simple solution with a low risk factor. At first sight, these approaches may seem to raise no research challenges and to be unsuitable for mediation in the context of Semantic Web Services. However, they should not be ignored. Existing industrial mediators are robust and reliable, so when a new mediation system is developed, improving one of them should be considered as an option.

2.1.3 Research State-of-the-Art

Research on data and information mediation focuses mainly on developing approaches and systems that require minimal human intervention. Ideally no human input would be needed at all, but this remains an unsolved problem. Because there are so many projects on the mediation of data and information, we cannot enumerate or describe all of them.
Instead, we give a short list of possible classifications for these projects and illustrate each with projects based on the corresponding approach. Our classification criteria are:

A) Scope of mediation
B) Classes of application
C) Approaches to constructing the mediator

These criteria are not disjoint: a project can be classified using any (or all) of them.

2.1.3.1 Classification Based on the Scope of Mediation

Based on the scope of mediation, three strategies for ontology mediation are distinguished: ontology mapping, ontology merging, and ontology alignment [Noy and Musen, 2000].

Ontology Mapping

In ontology mapping, rules are defined to enable interoperability between two or more ontologies. After integration, the rules and the source ontologies are kept separate. The advantage of this approach is that mapping rules, once defined, can be reused as often as needed. A mapping rule must be rewritten only when one of the ontologies changes.

One project that implements this approach is COIN (COntext INterchange) [Goh et al., 1999], which provides an architecture for semantic interoperability. The COIN framework consists of three main components: the domain model, elevation axioms, and context axioms. The domain model defines the application domain in terms of primitive types and semantic types. The elevation axioms identify the correspondences between the attributes of the source ontology and the types defined in the domain model. The context axioms correspond to the named contexts associated with different sources. They provide the semantics of data in terms of the values assigned to semantic objects. Attached to a source ontology, the context axioms therefore articulate the semantics of its data. The COIN architecture gives considerable weight to articulating data based on a domain model (ontology) and to relating the data to that model. However, a domain model is closer to a conceptual schema than to an ontology.
In an example in [Goh et al., 1999], "money amount" is treated as a subtype of "semantic number", while number is only a primitive type for representing the value. Likewise, "currency type" is a subtype of "semantic string". However, according to [Gruber, 1993], an ontology is based on the conceptualizations of the people in a community. From that perspective, "money amount" is an amount or a quantity. Treating "semantic number" as its supertype reflects the influence of application development. "Money amount" and "currency type" are related to values of type number and string, respectively, only for representation purposes.

Ontology Merging

In ontology merging, the two source ontologies are united into a single ontology that contains all the information of the source ontologies [Noy and Musen, 2000]. The merging algorithm should be able to eliminate any duplicates or inconsistencies that may occur. The original ontologies may cover similar or overlapping domains, so some concepts may be defined in both of them, not always in a similar or even consistent way.

The KRAFT project [Visser et al., 1999] implements the ontology merging approach. It integrates heterogeneous information and uses ontologies to resolve semantic problems. Its approach is to extract the vocabulary of the community and the definitions of terms from documents that exist in an application domain. KRAFT uses a shared ontology [Jones, 1998] as the basis both for mapping between ontology definitions and for communication between agents. In [Visser et al., 1999], the architecture is "chosen to make shared ontology as expressive as the 'union' of the ontologies".
However, the authors do not define the union of ontologies or state how it is similar to or different from a shared ontology. KRAFT detects a set of ontology mismatches (as described in [Visser et al., 1999]) and establishes mappings between the shared ontology and local ontologies.

Ontology Alignment

Ontologies are aligned by establishing links between them. As a result of the alignment, each ontology can reuse information from the other. The ontology alignment approach is applied in Observer [Mena et al., 2000], which uses ontologies to allow queries against heterogeneous sources. Observer replaces terms in user queries with suitable terms in target ontologies by means of inter-ontology relations. It uses description logic both as the ontology definition language and as the query language. Query processing has three steps: query construction, access to the underlying data, and controlled query expansion to new ontologies. The first step, query construction, requires human intervention to select the user ontology (which contains information about the semantics of the query) and to edit the query. The query is executed in the second step (access to the underlying data), when the user ontology is accessed. If the user is not satisfied with the results, other ontologies containing related terms are visited. This is the third step, controlled query expansion to new ontologies.

Figure 6 gives a graphical representation of these three approaches.

Figure 6: Ontology Integration Strategies (panels: Ontology Mapping, Ontology Merging, Ontology Alignment; figure labels: "Mapping Rules", "Ontology A is made compatible to ontology B")

Note that ontology merging and ontology alignment cannot be considered completely independently of ontology mapping, since mappings are still needed to make merging or alignment possible.
The Observer project is a good example: it maps the query results obtained from remote ontologies onto the results obtained from the user ontology.

The choice of mediation strategy depends on the application field. If the only requirement is to express instances of one ontology in terms of the other, ontology mapping is the most appropriate technique. If a set of rules and links is needed so that two ontologies can be used together and interoperate, ontology alignment is better suited. Finally, if the goal is a single ontology containing information from two different sources (ontologies), merging them is the right solution.

2.1.3.2 Classification Based on the Classes of Application

Based on the classes of application, we can distinguish the following approaches [Madhavan et al., 2002]: information integration and Semantic Web Services, ontology merging, and data migration.

Information Integration and Semantic Web Services

The information integration and Semantic Web Services approach is appropriate when many heterogeneous sources must be used without referring to each of them explicitly. The user simply queries a mediated logical schema that contains the information relevant to the application. [Wache and Fensel, 2000] proposed so-called intelligent integration, which should meet three requirements. First, it should allow a large variety of data sources to be integrated. Second, it should be based on semantics by means of ontologies. Third, it should provide an advanced query processor with facilities for content extraction, data abstraction, and a semantics-based query interface. One system that uses this approach is IRS-II (Internet Reasoning Service) [Motta et al., 2003].
Because this system addresses mediation in the context of Semantic Web Services, we describe it in more detail than the systems presented so far. The mediation component of IRS is called a bridge (a type of adapter) and is part of a framework for describing knowledge components. IRS bridges are not modeled explicitly (as they are, for example, in the Web Service Modeling Ontology [Roman et al., 2004]), but they play specific roles, as discussed below.

IRS-II [Motta et al., 2003] is a Semantic Web Services framework that allows applications to semantically describe and execute Web Services. It is based on the Unified Problem Solving Method Development Language (UPML) framework [Omelayenko et al., 2003]. UPML distinguishes the following categories of components, each specified by means of an appropriate ontology:

- Domain models: these describe the domain of an application, for example vehicles or a medical disease.
- Task models: these give a generic description of the task to be solved, specifying the input and output types, the goal to be achieved, and the applicable preconditions.
- Problem Solving Methods (PSMs): these give abstract, implementation-independent descriptions of reasoning processes that can be applied to solve tasks in a specific domain.
- Bridges: these specify mappings between the different model components within an application.

The IRS implementation of the UPML framework covers semantic mappings among knowledge components and integration techniques for task-centred invocation of Web Services. The IRS-II publishing platforms facilitate the invocation of Semantic Web Services by mediating between the server holding the semantic descriptions and the actual Web Service. In IRS, task, problem solving method (PSM), and bridge correspond to goal, Web Service description, and mediator in WSMF [Fensel and Bussler, 2002], because both approaches derive from the UPML framework.
Semantically describing services in IRS involves several mediation activities: mapping generic tasks and PSMs to a domain model, mapping PSMs to tasks, or, more generally, adapting existing resources. More specifically, in the UPML framework the knowledge components of a library can be described and connected together in different running systems. This is done by creating explicit mediating elements called adapters. In particular, bridge adapters connect two kinds of components through mapping relations between the ontologies of both components, for example:

- Task-Domain bridge
- PSM-Domain bridge
- Task-PSM bridge

IRS supports the direct acquisition of the value of an input role, according to the task ontology. If the domain knowledge does not conform to the task ontology, IRS helps users build a mapping relation between the task role and the corresponding domain knowledge. A domain-task mapping relation defines how a piece of domain knowledge, or a set of attributes, is transformed into an instantiated input role of the task. Mappings may also be needed to make task outputs conform to the domain ontology.

The UPML description of the library may also include a set of PSM-task bridges. If the library does not already provide them, IRS supports the creation of such bridges to map the inputs and outputs of the described task to those of the selected PSM. IRS users specify the domain entities that fill the input and output roles of the PSM. Some PSM roles are inherited from the configured task through a corresponding PSM-task bridge. The selected PSM may also define supplementary roles. For example, a PSM can define the notion of an abstractor, a function that computes abstract types from raw data. IRS supports the acquisition of domain-method mapping knowledge in much the same way as it supports domain-task mapping during task description.
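The chaining of bridges just described can be pictured as a small sketch. Domain knowledge is first mapped into task input roles through a Task-Domain bridge, and task roles are then mapped onto PSM roles through a Task-PSM bridge. The Python below is a minimal illustration under one assumption: a bridge is modelled as a dictionary with one mapping function per target role. The class `Bridge`, the role names, and the reference year are hypothetical and do not reproduce the IRS or UPML data structures.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Roles = Dict[str, Any]                      # role name -> value
MappingRelation = Callable[[Roles], Any]    # computes one target role from source roles

@dataclass
class Bridge:
    """Hypothetical bridge: one mapping relation per role of the target component."""
    source: str
    target: str
    relations: Dict[str, MappingRelation]

    def apply(self, source_roles: Roles) -> Roles:
        return {role: relation(source_roles) for role, relation in self.relations.items()}

REFERENCE_YEAR = 2004  # assumption used by the numerical mapping below

# Domain knowledge that does not directly conform to the task ontology.
domain_case: Roles = {"make": "volvo", "year_built": 1998}

# Task-Domain bridge: a lexical and a numerical transformation.
task_domain = Bridge("Domain", "Task", {
    "manufacturer": lambda d: d["make"].upper(),
    "vehicle_age": lambda d: REFERENCE_YEAR - d["year_built"],
})

# Task-PSM bridge: simple renaming mappings from task roles to PSM roles.
task_psm = Bridge("Task", "PSM", {
    "brand": lambda t: t["manufacturer"],
    "age": lambda t: t["vehicle_age"],
})

task_inputs = task_domain.apply(domain_case)
psm_inputs = task_psm.apply(task_inputs)
print(task_inputs)  # {'manufacturer': 'VOLVO', 'vehicle_age': 6}
print(psm_inputs)   # {'brand': 'VOLVO', 'age': 6}
```

Keeping each bridge as separate data, rather than hard-coding the conversions into the PSM, follows the UPML idea that the same PSM can be reused with different domains simply by supplying different bridges.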
Invocation consists of running the Web Service associated with the PSM to realize the specified task, using domain case data entered by the user. IRS first acquires the case data from the user. It then instantiates the case inputs of the PSM by interpreting the Task-Domain, Task-PSM, and Domain-PSM mapping relations, and checks the preconditions of the PSM and of the task on the mapped case data. Next, IRS runs the Web Service with the mapped inputs, using the publishing platform on which the service for the PSM was registered. From this platform it retrieves knowledge about the location and type of the PSM code. Finally, IRS fills in the domain outputs with the results of the PSM execution. These results may be transformed by Domain-PSM mapping relations defined when the PSM was described.

The IRS-Protégé implementation supports a structured methodology for mapping the input and output roles of reasoning resources to relevant domain entities. The methodology provides a typology of mapping-relation templates, that is, a mapping ontology. It covers a wide range of mapping relations, from simple renaming mappings to complex numerical or lexical transformations of entities.

Data Migration

The data migration approach imports external data and then maps, merges, or aligns it with internal application data (see Section 2.1.3.1 for more details about these three approaches). Again, the application field determines which of the three techniques is most appropriate. The main criterion is that the mismatches between internal and external data should be covered as fully as possible.

The Clio project [Popa et al., 2002] implements a data migration approach and uses queries for ontology mapping. Clio is a high-level schema-mapping tool that guides the user through the mapping specification by means of so-called value correspondences.
These value correspondences specify how the values of source attributes are mapped to values of target attributes. The whole process has two main steps: semantic translation and data translation. In the first step, the given value correspondences are interpreted: the semantic mappings must be understood and converted into logical mappings. In the second step, the logical mappings are transformed into low-level mappings, in this case queries.

Ontology Merging

The ontology merging approach was described in the previous section (Classification Based on the Scope of Mediation).

The three approaches illustrated in this section are by no means disjoint. They may be separable in theory, but in practice information integration or data migration cannot be implemented without combining it with ontology merging, or with another of the techniques described in the previous section.

2.1.3.3 Approaches in Constructing the Mediator

There are three main approaches to constructing a mediator: machine learning [Doan et al., 2002], structure-based analysis (schema matching), and linguistic/lexical analysis [Rahm and Bernstein, 2001].

Machine Learning

In the machine learning approach, mapping rules are "learned" from existing examples of mappings. These examples are usually built manually or semi-automatically; in the semi-automatic case, the system suggests mappings but a domain expert still has to provide input. The larger the training set, the more accurate the results of this approach.

As an example of a system that uses machine learning to assist ontology mapping, we describe the GLUE system [Doan et al., 2002]. GLUE applies probabilistic definitions of similarity measures to find the most similar concepts in two heterogeneous data sources. The architecture of the system is shown in Figure 7.
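The probabilistic similarity idea can be illustrated with a short sketch. A joint probability distribution is estimated for each pair of concepts, and a similarity function is applied to it. A common choice is the Jaccard coefficient, P(A and B) / P(A or B). The sketch below is deliberately simplified. GLUE estimates concept membership with trained learners; here we simply assume that the membership of each instance in both taxonomies is already known. The naive "pick the best match" step at the end is also a stand-in for GLUE's relaxation labeler. All taxonomy, concept and instance names are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Set

# Hypothetical instance data: the instances that belong to each concept.
# (GLUE estimates these memberships with learners; here they are assumed known.)
TAXONOMY_1 = {"Faculty": {"ann", "bob", "cy"}, "Staff": {"dee", "eve"}}
TAXONOMY_2 = {"AcademicStaff": {"ann", "bob"}, "AdminStaff": {"cy", "dee", "eve"}}

def joint_distribution(a: Set[str], b: Set[str], universe: Set[str]) -> Dict[str, Fraction]:
    """Estimate P(A,B), P(A,notB), P(notA,B), P(notA,notB) by counting instances."""
    n = len(universe)
    both, only_a, only_b = len(a & b), len(a - b), len(b - a)
    return {
        "A,B": Fraction(both, n),
        "A,notB": Fraction(only_a, n),
        "notA,B": Fraction(only_b, n),
        "notA,notB": Fraction(n - both - only_a - only_b, n),
    }

def jaccard(dist: Dict[str, Fraction]) -> Fraction:
    """Jaccard similarity P(A and B) / P(A or B), computed from the joint distribution."""
    union = dist["A,B"] + dist["A,notB"] + dist["notA,B"]
    return dist["A,B"] / union if union else Fraction(0)

universe = set().union(*TAXONOMY_1.values(), *TAXONOMY_2.values())

# Similarity matrix over all pairs of concepts from the two taxonomies.
matrix = {
    (c1, c2): jaccard(joint_distribution(i1, i2, universe))
    for c1, i1 in TAXONOMY_1.items()
    for c2, i2 in TAXONOMY_2.items()
}

# Naive mapping: the most similar target concept for each source concept.
best = {c1: max(TAXONOMY_2, key=lambda c2: matrix[(c1, c2)]) for c1 in TAXONOMY_1}
print(best)
```

Exact fractions are used only to keep the numbers easy to check by hand. A real system would work with estimated probabilities and refine the matrix using domain constraints.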
Figure 7: GLUE Architecture (Source: [Doan et al., 2002])

The main elements of the architecture are the distribution estimator, the similarity estimator, and the relaxation labeler. The distribution estimator uses machine learning techniques to compute the joint probability distribution of two concepts from two different taxonomies, that is, the probability that the two concepts have the same semantics. The similarity estimator applies a user-supplied function to this distribution and obtains a similarity factor for each pair of concepts. Across the whole taxonomies, these factors form a similarity matrix. The relaxation labeler processes this matrix, together with domain-specific constraints and heuristic knowledge, to obtain a mapping configuration.

Schema Matching and Linguistic/Lexical Analysis

In this approach, the internal structure of the concepts is analyzed. Heuristic functions based on linguistic similarity may be used at the same time, for example consulting a dictionary or thesaurus to find lexical relations such as synonymy or hyponymy between concept names.

The XMapper system (http://citeseer.nj.nec.com/kurgan02semantic.html) illustrates the structure-based approach well. It was developed specifically to create transformations between different XML message formats, and it uses only instance information to create them. Figure 8 shows the functionality of the XMapper system.

To create a transformation, XMapper first extracts a number of XML message instances from each data source. From these instances it extracts the structure of the message, all XML elements of each message, and a set of possible values for each of these elements. Next, a feature vector with 22 features (such as type, allowed values, and length) is created for each XML element.
Sixteen elements of each feature vector come from the constraint analysis and six from the learning component, which uses an algorithm called DataSqueezer [Larson et al., 1989]. Once every XML element has a feature vector, a distance table is built by computing the distance between every pair of elements from the different sources. The transformation is then obtained by mapping each element of one source to the element of the other source at the shortest distance.

Figure 8: Functionality of the XMapper system (Source: http://citeseer.nj.nec.com/kurgan02semantic.html)

As in the previous section (Classification Based on the Classes of Application), the techniques illustrated here are often combined. XMapper in fact uses both approaches: it "learns" while it builds the 22-feature vectors.

2.1.3.4 Conclusions

In this section (the research state of the art on data and information mediation, Section 2.1.3), we have described some of the current research approaches to data and information mediation. We selected the projects according to the classification criteria and approaches identified, and for each approach we presented a project that illustrates its particular features. The sheer number of research approaches and projects in this area leads to one conclusion: the research is far from over, and better solutions may still be found. As stated earlier in this section, implementing one particular approach while disregarding all the others is neither optimal nor feasible. The best solution is probably to extract the most important features of all these approaches, from a functional point of view, and to combine them to achieve the desired functionality.

2.2 Mediation of Processes

To analyse the state of the art in process mediation, we first need a clear understanding of the related concepts.
We therefore examine how processes are used in systems, paying special attention to the use of process technologies within Semantic Web Services. We then identify where process mediation is needed and the specific requirements of different process mediation scenarios. Starting with a general overview of process technologies, we describe the application scenarios for process-level mediation within Semantic Web Services. We then investigate existing technologies and approaches that can serve as a starting point for developing the Process Level Mediation Module of the DIP Mediation Component.

2.2.1 Processes and Process Technologies

This section gives a brief overview of process technologies, their use within Semantic Web Services, and the resulting requirements for process-level mediation scenarios.

2.2.1.1 Usage of Process Technologies within Semantic Web Services

A process is a set of activities and transitions, with conditions attached to the transitions. Depending on the process, its tasks may combine services that stand for queries, transactions, applications, and administrative activities. These services can be distributed within or across enterprises and are coordinated by constraining the control flow and data flow among them. The services can themselves be composite, that is, implemented as processes. This leads to nested processes and recursive process definitions.

Before explaining the technological requirements and challenges for process technologies, we describe the general building blocks of processes and their definitions [Bussler, 2003]:

- Activity/Action/State: a step in a process that can be realized in any way: by a simple or a more complex program, by another process ('sub-process' or 'nested process'), or by a manual activity.
  Activities are the basic building blocks of what is done or achieved in a process. At the level of abstraction represented by the process, activities are not split into smaller building blocks.
- Transition: the passage from one activity to another. Transitions are governed by conditions.
- Data Flow: process technology makes it possible to specify, execute, and control complex, multi-step information processing. The information to be processed therefore has to be passed through the building blocks of the process. Data flow is concerned with real application data, whereas control flow deals with process technology information. Data flow technology for processes must ensure that each activity in a process receives the information it needs for execution.
- Control Flow: control information is needed to define the nature of a process and to control its execution. Control flow primitives fall into two groups:
  a. Process Logic Primitives: control flow elements for specifying control flow structures that can be combined into more complex algorithms. The most common process logic primitives are the following (named as in BPEL4WS, see [Curbera et al., 2002]):
     - sequence, for serial execution
     - while, to implement a loop
     - switch, for multi-way branching
     - flow, for parallel execution
     - pick, for choosing among alternative paths based on an external event
  b. Execution Control Primitives: primitives that define how a process or its activities are executed. This group (optionally) contains primitives for:
     - Timing: handling of timeouts and similar conditions during process execution
     - Event Handling: support for event-driven execution of processes
     - Interaction: interoperation between parties
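The process logic primitives listed above can be illustrated with a very small interpreter. The sketch below is a minimal illustration. It assumes that a process is encoded as nested tuples, with one tuple per primitive, and that a dictionary of process variables plays the role of the data flow. The encoding, the function `run` and the example order process are hypothetical. They only borrow the BPEL4WS primitive names and are not BPEL4WS syntax. `pick` and the execution control primitives (timing, event handling, interaction) are left out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict

State = Dict[str, Any]  # process data passed between activities (the data flow)

def run(node: tuple, state: State) -> State:
    """Execute one node of a process tree and return the updated process data."""
    kind = node[0]
    if kind == "activity":             # ("activity", fn): a basic step
        return node[1](state)
    if kind == "sequence":             # ("sequence", [nodes]): serial execution
        for child in node[1]:
            state = run(child, state)
        return state
    if kind == "while":                # ("while", cond, body): loop
        _, cond, body = node
        while cond(state):
            state = run(body, state)
        return state
    if kind == "switch":               # ("switch", [(cond, node)], default): first match wins
        _, cases, default = node
        for cond, branch in cases:
            if cond(state):
                return run(branch, state)
        return run(default, state)
    if kind == "flow":                 # ("flow", [nodes]): parallel execution
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda child: run(child, dict(state)), node[1]))
        merged = dict(state)
        for result in results:         # naive merge of the parallel branches' data
            merged.update(result)
        return merged
    raise ValueError(f"unknown primitive: {kind}")

# Hypothetical order-handling process.
order_process = ("sequence", [
    ("activity", lambda s: {**s, "items": 3, "packed": 0}),
    ("while", lambda s: s["packed"] < s["items"],
     ("activity", lambda s: {**s, "packed": s["packed"] + 1})),
    ("switch", [(lambda s: s["items"] > 2, ("activity", lambda s: {**s, "shipping": "freight"}))],
     ("activity", lambda s: {**s, "shipping": "parcel"})),
    ("flow", [("activity", lambda s: {**s, "invoiced": True}),
              ("activity", lambda s: {**s, "customer_notified": True})]),
])

print(run(order_process, {}))
# {'items': 3, 'packed': 3, 'shipping': 'freight', 'invoiced': True, 'customer_notified': True}
```

Because each node is just data, a nested process (a sub-process used as an activity) is simply another tuple placed inside the tree. This mirrors the recursive process definitions mentioned above.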
Adequate process technologies face a number of technical challenges. Firstly, they must support process modeling and ensure that execution is correct with respect to the model and to the constraints of the underlying services and their resources. Normal execution is straightforward to model when the process model specifies a partial order of the activities in the process. Exception conditions can be harder to model and handle. More importantly, interesting business processes are often long-running, so the interactions among them are non-atomic. The information they take as input may therefore be revised, which invalidates their own results. Exceptions and revisions are the main sources of complication in process modeling.

Secondly, a suitable process technology must support interfacing the process with the underlying functionality, that is, with the resources that actually carry out the individual activities of a process. In database systems, this requires a link to the concurrency control and recovery mechanisms of a DBMS. In Semantic Web Services, a link to the execution control of Web Services is required.

A major use of process technologies is the automation of business processes within organizations. This area is commonly referred to as "workflow technologies", which are a special type of process technology. In this document we follow [Bussler, 2003]: 'process technology' is the general notion for technologies that handle transactional and dynamic behaviours, while 'workflow technology' is a special kind of process technology concerned with real-world business processes. Numerous academic, industrial, and joint efforts are working on specification requirements, overall frameworks, and software tools for workflow technologies. Estimates range from 100 to 250 such efforts worldwide.

Within the area of Semantic Web Services, we identify the following purposes for the use of process technologies:

Choreography.
This takes the perspective of a process as being a set of message exchanges between participants, that is, when a user (machine or human) interacts by exchanging data and information via messages. The message exchanges are constrained to occur in various sequences and may be grouped into various transactions. Thus, within this application field, process technologies are needed to formally describe the external visible behaviour of Web Services, which are the steps of the business process that a Web Service shows to its user in order to allow the interaction and information exchange needed to fulfil its service. Orchestration. This takes the perspective of a process as a program or partial order of operations that need to be executed. This view is logically centralized in that it views the process from the perspective of one “orchestrating” engine. It is as if the process specification is being executed under the control of or on behalf of a specific party. In other words, the Orchestration of a Web Service A describes how other Web Services (W1, .., Wn) are composed into the functionality of Web Service A. Collaboration. This takes the perspective of a process as a collaboration involving some business partners. The business partners not only send messages to one another but also enter into business relationships such as contracts and obligations. They generate flexible message exchanges depending on the evolving circumstances and their local policies, for example for handling business exceptions. Collaboration is emerging as a serious approach for carrying out large-scale business processes. Automated collaboration describes the aim of Semantic Web Services technologies in correspondence to the vision of the Semantic Web: several autonomous Web Services shall be combinable and usable in a collaborative manner as components for more complex, specific applications that support various kinds of functionality by re-use and dynamic composition. 
2.2.1.2 Need for Process Mediation With respect to the usage scenarios of process technologies within Semantic Web Services, the need for process mediation technologies that support handling and resolving heterogeneities consequently emerge as a further technological challenge. For example, if a production scheduling software system employs a different modeling formalism than a purchase order processing software in a supply chain, then the given enterprises’ cooperation may be adversely affected. Especially in open and handling transactional and dynamic behaviours, while ‘workflow technology’ is a special kind of process technology that is concerned with real-world business processes. 26 FP6 – 504083 Deliverable 5.1 decentralized environments like the Internet, heterogeneity handling is a major issue in system design. Interoperability among processes, which clearly is an important need in practical settings, requires some kind of translator among process models – this is what we understand to be technologies for process level mediation within the context of WP 5 in DIP. For investigating existing technologies, as well as for depicting the requirements for the DIP Process Level Mediation Module, we have the following understanding of process level mediation: The overall aim of process technologies within Semantic Web Services is to allow automated Collaboration (see above) of Web Services with complex (that is, multi-step, process-like) external behaviours. Process technologies for Choreography allow the use of complex Web Services by a user or Service Requester (a more general term which can be a human, another Web Service, or any other agent). Process technologies for Orchestration allow the composition of existing Web Services into a more complex Web Service. The realization of automated Collaboration can be achieved by the combination of suitable technologies for Choreography and Orchestration into a coherent framework. 
On the basis of these general requirements for process mediation technologies, we can examine the needs arising for process mediation within Semantic Web Services, focusing on the notions of Choreography and Orchestration. Both notions are explained in more detail below, together with the specific needs for process level mediation¹³.

2.2.1.2.1. Choreography

The Choreography of a single Semantic Web Service describes the externally visible behaviour of the Web Service as needed to use it, along with the messaging sequence expected for its use, that is, the pattern of user-service interaction. In other words, the Choreography of a Web Service is described by the data flow, control flow, and message exchange pattern that the Web Service makes visible. Consequently, a Web Service makes externally visible only those aspects of its functionality where it needs interaction with the user (for example, input or notification). On the basis of such behavioural descriptions of single Web Services, global interaction models for several Web Services can be determined. The interaction is realized by defining a message exchange between possibly several Web Services in accordance with their individual Choreographies. The challenge for process level mediation within Choreography is to establish a global interaction model for Web Services whose Choreographies are not compatible a priori.

¹³ The definitions of the notions of Choreography and Orchestration are based on those provided in [Singh and Huhns, 2004]. They also correspond to the terminology definitions within the Web Service Modeling Ontology WSMO [Roman et al., 2004].

Figure 9 shows the general structure of a Choreography of a single Web Service; the need for process level mediation is explained further below.

Figure 9: Web Service Choreography - General Structure

As an example, we can imagine a Web Service for purchasing goods.
It is a multi-step service with the following sequential activities:
a. Select the goods to purchase.
b. Make an agreement between the seller (the owner/provider of the Web Service) and the buyer (the Service Requester) on the purchase contract for the selected goods and on payment.
c. Deliver the goods.

In order to use this Web Service, the Web Service and the Service Requester have to interact within the distinct activities for (1) choosing the goods, (2) offering and accepting the contract, and (3) choosing the delivery method. The external behaviour of the buyer in this interaction has to be compatible with the business process of the Web Service. This means that the buyer has to have facilities to select the goods, to accept a contract, and so on, and (most relevant for process mediation) that the order of these activities on the buyer side has to be compatible with the order expected by the seller. For example, a buyer whose process pays only after product delivery cannot interact with a seller that requires payment before delivery, because each process would then be blocked waiting for the other.

If we think of the buyer in this example as a Web Service rather than a human user, it becomes clearer which process technologies Choreography will have to support, and what is needed for process level mediation within Choreography. The process technology has to be able to describe a business process, that is, a workflow. This description contains only those activities of the internal functionality of the Web Service in which user interaction is needed. For example, the Choreography for the payment functionality contains the user's selection of the payment method and a notification when the payment is completed, but not a description of the steps involved in how the payment is actually carried out. Each activity in the Choreography description comprises the messaging pattern needed for user-service interaction within that activity.
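The ordering conflict in the purchasing example can be made concrete with a small simulation. The sketch below is illustrative only: it assumes that each Choreography is reduced to a linear list of send/receive steps and that messages travel over asynchronous FIFO channels (sends never block, receives do). The function name `run_to_completion` and the message names (order, contract, accept, payment, goods) are hypothetical and are not taken from any DIP or WSMO specification.

```python
from collections import deque

# A step is ("send", message) or ("recv", message); each Choreography is
# simplified here to a linear list of such steps (an assumption).
Step = tuple[str, str]


def run_to_completion(a: list[Step], b: list[Step]) -> bool:
    """Simulate two interacting Choreographies.

    Returns True if both parties finish with no unconsumed messages,
    False if they block (deadlock) or a message arrives out of order.
    Sends never block; a receive blocks until the expected message is
    at the head of the party's inbound FIFO channel.
    """
    parties = (a, b)
    pos = [0, 0]
    inbox = (deque(), deque())  # inbox[i] holds messages sent to party i
    while True:
        progressed = False
        for i in (0, 1):
            if pos[i] == len(parties[i]):
                continue  # party i has finished its Choreography
            kind, msg = parties[i][pos[i]]
            if kind == "send":
                inbox[1 - i].append(msg)
            elif inbox[i] and inbox[i][0] == msg:
                inbox[i].popleft()
            else:
                continue  # blocked: expected message not (yet) available
            pos[i] += 1
            progressed = True
        if pos == [len(a), len(b)]:
            return not inbox[0] and not inbox[1]
        if not progressed:
            return False


# Hypothetical seller Choreography: payment is required before delivery.
seller = [("recv", "order"), ("send", "contract"), ("recv", "accept"),
          ("recv", "payment"), ("send", "goods")]
# Buyer that pays before delivery: compatible with the seller.
buyer_prepay = [("send", "order"), ("recv", "contract"), ("send", "accept"),
                ("send", "payment"), ("recv", "goods")]
# Buyer that only pays after delivery: both sides end up waiting.
buyer_postpay = [("send", "order"), ("recv", "contract"), ("send", "accept"),
                 ("recv", "goods"), ("send", "payment")]

print(run_to_completion(seller, buyer_prepay))   # True
print(run_to_completion(seller, buyer_postpay))  # False
```

The choice of asynchronous channels is itself an assumption; in general, whether two Choreographies are judged compatible can depend on the messaging semantics chosen.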
In Figure 9, the blue arrows denote the process control flow, while the black arrows denote the messages, that is, the data flow between the interacting entities. The figure also clarifies the need for process level mediation within Choreographies. The process level mediation technology has to ensure that the process structures of the Choreographies of two interacting Web Services are compatible, and that the distinct activities of the Choreographies are compatible, in the sense of compatibility exemplified above. We call this the “determination of Choreography compatibility”, in other words, establishing a suitable global interaction model for Web Services whose Choreographies were previously not compatible. Compatibility thus refers both to the workflows of the distinct Web Services, that is, the business processes from the application perspective (called the Process Level within WSMF), and to the congruency of the messaging patterns defined in the Choreographies of the individual services (called the Protocol Level in WSMF). The challenge for the Process Level Mediation Module in the DIP Mediation Component is to provide the means, including (semi-)automated support, for resolving any mismatches that occur within these aspects.

2.2.1.2.2. Orchestration

The Orchestration of a Web Service A describes how other Web Services (W1, ..., Wn) are composed into the functionality of Web Service A. The task for process level mediation within Orchestration is to determine the correctness and suitability of the composition of Web Services (W1, ..., Wn), and to resolve any disparities that might occur in the composition. Figure 10 shows the general structure of a Web Service Orchestration; the need for process level mediation is explained further below.
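One simple kind of mismatch resolution at the protocol level is message reordering: a mediator buffers the messages sent by one party and forwards them in the order that the other party's Choreography expects. The sketch below is a hypothetical illustration, not the design of the DIP Process Level Mediation Module. It assumes one-way message streams (the sender does not wait for replies in between) and message names that have already been aligned by data mediation; the function name `reorder` is invented for this example.

```python
def reorder(sent: list[str], expected: list[str]) -> list[str]:
    """Forward messages in the receiver's expected order.

    Every incoming message is buffered; the mediator releases the next
    expected message as soon as it is in the buffer. Raises ValueError
    if an expected message never arrives, i.e. the mismatch cannot be
    resolved by reordering alone. Unexpected extra messages are dropped.
    """
    buffer: list[str] = []
    delivered: list[str] = []
    pending = iter(expected)
    nxt = next(pending, None)
    for msg in sent:
        buffer.append(msg)
        # Release as many expected messages as the buffer now allows.
        while nxt is not None and nxt in buffer:
            buffer.remove(nxt)
            delivered.append(nxt)
            nxt = next(pending, None)
    if nxt is not None:
        raise ValueError(f"receiver waits for {nxt!r}, which is never sent")
    return delivered


# The buyer sends its payment before accepting the contract, while the
# seller expects the acceptance first.
print(reorder(["order", "payment", "accept"], ["order", "accept", "payment"]))
# ['order', 'accept', 'payment']
```

Reordering can only resolve mismatches in which every expected message is eventually sent; missing or semantically different messages would require further mediation steps.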
Figure 10: Web Service Orchestration - general structure

In Figure 10, the squares represent the activities of the business process of Web Service A with respect to the functional decomposition of A. Each of these activities requires an implementation that realizes its functionality. The different symbols inside the process activities show how the activities are realized: Activity 1 is realized by invoking a single Web Service; for Activity 2, two Web Services need to be composed; for Activity 3, a single Web Service is used which has a complex (multi-step) Choreography, so the realization has to have a compatible Choreography. While these realizations are concerned with service usage, or service interaction respectively, and thus denote Choreography, the Orchestration of Web Service A describes the decomposed functionality of Web Service A. Although it serves a very different functional purpose, this is a process description similar to the behavioural description in Choreography.

Orchestration describes the functional decomposition of a Web Service. On the one hand, this decomposition is defined in the process description, that is, by the single activities and their process structure; on the other hand, the composition of the used Web Services (W1, ..., Wn) has to be executable. This requires the resolution of disparities between the particular Web Services as well as between the distinct activities of the business process of the Orchestration. So, process level mediation for Orchestration has to provide the means to combine other Web Services according to the functional needs defined in the decomposition of a Web Service, and to resolve disparities within the composition.
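Part of determining whether a composition of W1, ..., Wn is executable is checking its data flow: every input of a composed service must be provided either by the initial request to Web Service A or by the output of an earlier step. The following minimal sketch assumes a purely sequential Orchestration and describes services only by sets of named inputs and outputs; the `Service` class, the function `unresolved_inputs`, and all service and parameter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """A composed Web Service, described only by named inputs/outputs."""
    name: str
    inputs: frozenset[str]
    outputs: frozenset[str]


def unresolved_inputs(initial: set[str], plan: list[Service]) -> dict[str, set[str]]:
    """Walk a sequential Orchestration and report, per service, the inputs
    that neither the initial request nor any earlier step provides."""
    available = set(initial)
    missing: dict[str, set[str]] = {}
    for svc in plan:
        gap = set(svc.inputs) - available
        if gap:
            missing[svc.name] = gap
        available |= svc.outputs  # outputs become usable by later steps
    return missing


# Hypothetical decomposition of Web Service A into two composed services.
plan = [
    Service("W1", frozenset({"itemId"}), frozenset({"price"})),
    Service("W2", frozenset({"price", "creditCard"}), frozenset({"receipt"})),
]
print(unresolved_inputs({"itemId"}, plan))                # {'W2': {'creditCard'}}
print(unresolved_inputs({"itemId", "creditCard"}, plan))  # {}
```

A reported gap marks a disparity that mediation would have to resolve, for instance by asking the requester for the missing data or by inserting an additional service.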
In conclusion, process mediation within Orchestration has to support resolving mismatches in the composition of external Web Services (regarding data and control flow), and integrating several composed Web Services into the Orchestration of another Web Service.

2.2.2 Technologies for Process Mediation

Having outlined the usage of process technologies within Semantic Web Services, as well as the specific needs for process level mediation, we now investigate existing approaches and technologies for the mediation of processes. In order to provide suitable support for the mediation of processes in DIP, the Process Level Mediation Module has to support the process representation chosen within the DIP framework. To this end, we briefly examine existing process technologies that are currently discussed within the field of Semantic Web Services, as well as formalisms that support inference-based reasoning, since reasoning is the technology to be used for process mediation.

2.2.2.1 Existing Process Representation Technologies

Building on this general understanding of process technologies, we now analyse the state of the art in process technologies. This analysis is restricted to existing process technologies for Semantic Web Services as the major field of interest in DIP. We briefly summarize the most frequently mentioned approaches and point to exhaustive surveys in the literature, for example, [Solanki and Abela, 2003] and [Peltz, 2003].

BPEL4WS

The Business Process Execution Language for Web Services (BPEL4WS) [Curbera et al., 2002] is an approach for describing the behaviour of Web Services in a business interaction. It specifies an XML-based grammar for the control logic that is used to coordinate Web Services, and is thus to be considered a process technology for Orchestration as defined above.
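To illustrate what an XML-based grammar for control logic looks like, the sketch below parses a heavily simplified fragment in the style of a BPEL4WS process and lists the partner operations invoked inside its `<sequence>`, in order. It is an assumption-laden illustration: the fragment omits the namespaces and the partner link and variable declarations that a real BPEL4WS document contains, and the partner links, port types, and operations are hypothetical. Only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Heavily simplified, hypothetical BPEL4WS-style fragment: namespaces,
# partnerLinks and variables declarations are omitted on purpose.
PROCESS = """
<process name="purchase">
  <sequence>
    <receive partnerLink="buyer" operation="submitOrder"/>
    <invoke partnerLink="stock" portType="StockPT" operation="reserveItems"/>
    <invoke partnerLink="billing" portType="BillingPT" operation="charge"/>
    <reply partnerLink="buyer" operation="submitOrder"/>
  </sequence>
</process>
"""


def invoked_operations(xml_text: str) -> list[tuple[str, str]]:
    """Return (partnerLink, operation) for every <invoke>, in document order."""
    root = ET.fromstring(xml_text.strip())
    return [(inv.get("partnerLink"), inv.get("operation"))
            for inv in root.iter("invoke")]


print(invoked_operations(PROCESS))  # [('stock', 'reserveItems'), ('billing', 'charge')]
```

Because a `<sequence>` fixes the order of its children, document order equals execution order here; inside a `<flow>`, the same listing would say nothing about execution order.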
BPEL4WS is based on two industrial initiatives for process description languages: XLANG¹⁴, developed by Microsoft, and the Web Services Flow Language (WSFL)¹⁵, developed at IBM. Thus, BPEL4WS combines the features of block-structured process languages (XLANG) with those of graph-based approaches (WSFL). BPEL4WS provides a language for the formal specification of business processes (that is, the control level) and of business interaction protocols (that is, the Web Service interaction level). It distinguishes two kinds of processes:

Executable business processes model the actual behaviour of a participant in a business interaction.

Business protocols use process descriptions that specify the mutually visible message exchange behaviour of each of the parties involved in the protocol, without revealing their internal behaviour. That is, the descriptions specify interfaces. The process descriptions for business protocols are called abstract processes and cannot be executed.

¹⁴ XLANG focused on the creation of business processes and the interactions between Web Service providers. The specification provided support for sequential, parallel, and conditional process control flow. It also included a robust exception handling facility, with support for long-running transactions through compensation. XLANG used WSDL as a means to describe the service interface of a process. See the Microsoft specification at: http://www.gotdotnet.com/team/xml_wsspecs/xlang-c/default.htm

¹⁵ WSFL was proposed to describe both public and private process flows [Leymann, 2001]. WSFL defines a specific order of activities and data exchanges for a particular process. It defines both the execution sequence and the mapping of each step in the flow to specific operations, referred to as flow models and global models, respectively. The flow model represents the series of activities in the process, while the global model binds each activity to a specific Web Service instance. A WSFL definition can also be exposed with a WSDL interface, allowing for recursive decomposition. WSFL supports the handling of exceptions but has no direct support for transactions.

In BPEL4WS, a business process is layered on top of WSDL-defined Web Services. The interaction model of WSDL is essentially a stateless client-server model of synchronous or uncorrelated asynchronous interactions. BPEL4WS, however, defines business processes consisting of stateful, long-running interactions in which each interaction has a beginning, a defined behaviour, and an end, modeled by a flow. This flow is composed of a sequence of activities. The behaviour context for each activity is provided by a scope. A scope can provide fault handlers, event handlers, compensation handlers, and a set of data variables and correlation sets. Table 1 summarizes the functionalities of these process modeling concepts:

Table 1: BPEL4WS Process Modeling Concepts

Activities: An action/step in a process.
BPEL4WS offers the following basic and structured activities:
Receive: waits for the arrival of a message
Reply: answers a received message
Invoke: invokes a one-way or request-response operation on a portType offered by a partner
Assign: updates the values of variables
Throw: generates a fault inside the business process
Wait: waits for a given time period or until a deadline
Empty: inserts an empty operation into a process
Sequence: defines a collection of activities to be performed sequentially
Switch: selects one branch of activities from a set of choices
While: repeats an activity as long as a given condition holds
Pick: blocks the process and waits for one of several suitable messages to arrive (or for a time-out)
Flow: specifies one or more activities to be performed concurrently
Scope: defines a nested activity with its own associated variables, fault handlers, and compensation handlers

Variables: Variables allow specifying stateful interactions in a business process. They provide the means for holding the messages that constitute the state of a business process. These messages can be either those that have been received from business partners or those that are to be sent to business partners. Variables can also hold data that are needed for holding state related to the process and are never exchanged with partners. They are associated with a messageType, which corresponds to a WSDL message type definition.

Correlations: Correlation sets deal with conversational and negotiation properties. Business processes exchange information using messages in XML syntax, and this exchange can be enhanced by means of correlation. During its lifetime, a business process typically holds one or more conversations with the partners involved in its work. Conversations may be based on a sophisticated communication infrastructure that correlates the messages involved in a conversation using some form of conversation identity.
Event Handling: Each scope can be associated with a set of event handlers that are invoked when a certain event occurs. The actions performed within an event handler can range from simple to sequenced activities. In BPEL4WS there are two types of events: alarms that go off after user-set times, and incoming messages corresponding to a request/response or one-way WSDL operation.

Fault Handling: Each scope can be associated with a set of custom fault-handling activities, each intended for a specific kind of fault. These faults can result from a WSDL operation fault or from a programmatic throw activity.

Regarding the needs for Web Service Orchestration stated in the introduction, BPEL4WS offers a modeling technique that covers the control level and the interaction level in a suitable manner. A study that examines the expressivity of BPEL4WS in terms of workflow and communication support is presented in [Wohed et al., 2003]. A major drawback of this language is that it does not explicitly rely on or incorporate a formalized process representation, which is needed in order to compose Web Services dynamically by intelligent mechanisms working on descriptive information. XLANG is based on the π-Calculus, a further development of the Process Algebra CCS (see Section 2.2.2.2). WSFL, the other basis of BPEL4WS, relies on statecharts, a technique for describing complex transitions in finite state machines [Harel, 1987]. The approach and the formal model underlying statecharts are comparable to those of the Process Algebra CSP (see Section 2.2.2.2). These formal models are only implicitly reflected within BPEL4WS, and there is no direct mapping from BPEL4WS to the π-Calculus or any related formalism.

BPML

The Business Process Modeling Language [BPML, 2003] is a meta-language for the modeling of business processes. In conjunction with WSCI (see below), BPML provides functionality similar to that of BPEL4WS.
Here, BPML provides a modeling technique for the process control level, while WSCI is designed for specifying interactions between Web Services.

BPML provides an abstracted execution model for collaborative and transactional business processes based on the concept of a transactional finite-state machine. It consists of three parts: a Public Interface and two Private Implementations (one for each partner). The Public Interface, which is common to the partners, is supported by protocols such as ebXML¹⁶, RosettaNet¹⁷, and BizTalk¹⁸; the Private Implementations are specific to each partner and can be described in any executable language. BPML provides a BPML XML Schema as the general ontological structure for process descriptions. Business processes are represented as the interleaving of control flow, data flow, and event flow, while adding orthogonal design capabilities for business rules, security roles, and transaction contexts. BPML offers explicit support for synchronous and asynchronous distributed transactions, and can therefore be used as an execution model for embedding existing applications within e-Business processes as process components. Loosely speaking, BPML provides process flow constructs and activities similar to those of BPEL4WS. Basic activities for sending, receiving, and invoking services are available, along with structured activities that handle conditional choices, sequential and parallel activities, joins, and looping. BPML also supports the scheduling of tasks at specific times. Other features supported in BPML include persistence, roles, instance correlation, and recursive decomposition, i.e. the ability to compose sub-processes into a larger business process. The language has been designed to manage long-lived processes, with persistence supported in a transparent manner.
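The structured activities that BPML shares with BPEL4WS (sequential, conditional, looping, and parallel composition of basic activities) can be illustrated with a toy interpreter. The sketch below is a hypothetical Python encoding, not the BPML or BPEL4WS execution semantics: basic activities are plain functions over a dictionary of integer variables, the class names merely echo the BPEL4WS activity names, and `Flow` runs its branches one after another, which is only one of the interleavings a real engine could choose.

```python
from dataclasses import dataclass
from typing import Callable, Union

Vars = dict[str, int]  # process variables (simplified to integers)


@dataclass
class Basic:
    action: Callable[[Vars], None]  # e.g. an assign or a stubbed invoke


@dataclass
class Sequence:
    steps: list["Activity"]


@dataclass
class Switch:
    cases: list[tuple[Callable[[Vars], bool], "Activity"]]  # first true case wins


@dataclass
class While:
    condition: Callable[[Vars], bool]
    body: "Activity"


@dataclass
class Flow:
    branches: list["Activity"]


Activity = Union[Basic, Sequence, Switch, While, Flow]


def run(act: Activity, v: Vars) -> None:
    """Execute an activity tree against the variables in v."""
    if isinstance(act, Basic):
        act.action(v)
    elif isinstance(act, Sequence):
        for child in act.steps:
            run(child, v)
    elif isinstance(act, Switch):
        for condition, child in act.cases:
            if condition(v):
                run(child, v)
                break
    elif isinstance(act, While):
        while act.condition(v):
            run(act.body, v)
    elif isinstance(act, Flow):
        # A real engine runs branches concurrently; executing them one
        # after another is just one admissible interleaving.
        for child in act.branches:
            run(child, v)


def ship_one(v: Vars) -> None:
    v["pending"] -= 1
    v["shipped"] += 1


process = Sequence([
    While(lambda v: v["pending"] > 0, Basic(ship_one)),
    Switch([
        (lambda v: v["shipped"] >= 2, Basic(lambda v: v.update(discount=10))),
        (lambda v: True, Basic(lambda v: v.update(discount=0))),  # otherwise
    ]),
    Flow([Basic(lambda v: v.update(invoiced=1)),
          Basic(lambda v: v.update(logged=1))]),
])

state: Vars = {"pending": 3, "shipped": 0}
run(process, state)
print(state)  # {'pending': 0, 'shipped': 3, 'discount': 10, 'invoiced': 1, 'logged': 1}
```

Messaging activities such as Receive, Reply, and Pick are deliberately left out; adding them would require a model of partners and message queues.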
As in BPEL4WS, XML messages are exchanged between the various participants, with roles and partner components similar to the BPEL4WS constructs. Both short- and long-running transactions are supported, with compensation techniques used for more complex transactions. BPML uses a scoping technique similar to that of BPEL4WS to manage the compensation rules. It also provides the ability to nest processes and transactions, a feature that BPEL4WS currently does not provide. In addition, a robust exception handling mechanism is available within BPML, following many of the constructs in XLANG. Timeout constraints can also be specified for specific activities defined within the process. The formal foundation of BPML is similar to that of BPEL4WS. There is no formal process representation that explicitly supports BPML, but the constructs inherited from XLANG (which has been applied as a foundation for BPML as well) still exist within BPML. Nevertheless, no concrete formalization of BPML process descriptions exists.

¹⁶ See: http://www.ebxml.org/
¹⁷ See: http://www.rosettanet.org/
¹⁸ See: http://www.microsoft.com/biztalk/

WSCI / WS-CDL

The Web Service Choreography Interface (WSCI) [Arkin et al., 2002] is an XML-based interface description language that describes the flow of messages exchanged by a Web Service participating in choreographed interactions with other services, that is, Web Service collaboration. It describes the dynamic interface of the Web Service participating in a given message exchange by reusing the operations defined for a static interface. WSCI works in conjunction with the Web Services Description Language [W3C, WSDL, 2004], but it can also work with another service definition language that exhibits the same characteristics as WSDL. A WSCI specification supports message correlation, sequencing rules, exception handling, transactions, and dynamic collaboration.
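Message correlation, which appears in BPEL4WS (correlation sets), BPML (instance correlation), and WSCI, can be pictured as routing each incoming message to the process instance whose correlation value it carries. The following sketch assumes that a single hypothetical message field, `orderId`, acts as the correlation property and that a designated start message creates a new instance; real correlation sets are declared over WSDL message properties and may combine several fields. The `Correlator` and `Instance` classes are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One running conversation (process instance)."""
    order_id: str
    log: list[str] = field(default_factory=list)


class Correlator:
    """Route messages to process instances by a correlation value.

    Assumption for illustration: the correlation property is the single
    message field 'orderId'; a start message creates a new instance.
    """

    def __init__(self) -> None:
        self.instances: dict[str, Instance] = {}

    def dispatch(self, message: dict[str, str], starts_instance: bool = False) -> Instance:
        key = message["orderId"]
        if starts_instance:
            if key in self.instances:
                raise ValueError(f"instance for {key!r} already exists")
            self.instances[key] = Instance(key)
        elif key not in self.instances:
            raise LookupError(f"no conversation correlates with {key!r}")
        inst = self.instances[key]
        inst.log.append(message["type"])
        return inst


c = Correlator()
c.dispatch({"orderId": "A1", "type": "order"}, starts_instance=True)
c.dispatch({"orderId": "B7", "type": "order"}, starts_instance=True)
c.dispatch({"orderId": "A1", "type": "payment"})
print(c.instances["A1"].log)  # ['order', 'payment']
print(c.instances["B7"].log)  # ['order']
```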
Specific transactional contexts can be set up within WSCI, similar to the scope activity in BPEL4WS. When a set of activities is defined within a context, any failure will result in the entire group being rolled back. WSCI describes the observable behaviour of a Web Service. This is expressed in terms of temporal and logical dependencies among the exchanged messages, featuring sequencing rules, correlation, exception handling, and transactions. WSCI also describes the collective message exchange among interacting Web Services, thus providing a global, message-oriented view of the interactions.

WSCI supports both basic and structured activities. The <action> element is used to define a basic request or response message. Each action specifies the WSDL operation involved and the role being played by this participant. External services can then be invoked through the <call> element. A wide variety of structured activities are supported, including sequential and parallel processing, and conditional looping. WSCI also introduces an <all> activity, used to indicate that the specified actions all have to be performed, but in no particular order.

WSCI does not address the definition and the implementation of the internal processes that actually drive the message exchange. Rather, the goal of WSCI is to describe the observable behaviour of a Web Service by means of a message-flow oriented interface. This description enables developers, architects, and tools to describe and compose a global view of the dynamics of the message exchange by understanding the interactions with the Web Service. WSCI does not address the definition of executable business processes as defined by BPEL4WS. A WSCI choreography includes a set of WSCI documents, one for each partner in the interaction. In WSCI, there is no single controlling process managing the interaction. Each action in WSCI represents a unit of work, which typically maps to a specific WSDL operation.
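The rollback behaviour of a transactional context, and the related compensation handlers of BPEL4WS and BPML scopes, can be sketched as follows: each completed unit of work registers an undo action, and when a later unit fails, the registered compensations run in reverse order of completion. This is a hypothetical, in-memory illustration (the function `run_context` and the unit names are invented); it does not model distributed transactions, nested contexts, or failures during compensation.

```python
from typing import Callable

# A unit of work: (name, do, undo). undo is the compensation action.
Unit = tuple[str, Callable[[], None], Callable[[], None]]


def run_context(units: list[Unit], log: list[str]) -> bool:
    """Run units of work inside one transactional context.

    Returns True if all units succeed. If a unit raises an exception, the
    units completed so far are compensated in reverse order of completion
    and False is returned.
    """
    completed: list[Unit] = []
    for unit in units:
        name, do, _undo = unit
        try:
            do()
        except Exception:
            log.append(f"failed: {name}")
            for done_name, _do, undo in reversed(completed):
                undo()
                log.append(f"compensated: {done_name}")
            return False
        log.append(f"done: {name}")
        completed.append(unit)
    return True


def noop() -> None:
    """Stand-in for a real service call or compensation (hypothetical)."""


def reject_payment() -> None:
    raise RuntimeError("payment rejected")  # simulated partner fault


units = [
    ("reserveItems", noop, noop),
    ("bookShipment", noop, noop),
    ("charge", reject_payment, noop),
]
log: list[str] = []
print(run_context(units, log))  # False
print(log)
# ['done: reserveItems', 'done: bookShipment', 'failed: charge',
#  'compensated: bookShipment', 'compensated: reserveItems']
```

Compensating in reverse order mirrors the usual convention for long-running transactions, where later work may depend on earlier work and must therefore be undone first.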
WSCI can be thought of as the glue around WSDL, describing how the operations can be choreographed. In other words, WSDL would be used to describe the entry points for each available service, and WSCI would describe the interactions among these WSDL operations, very similar to how BPEL4WS leverages WSDL. Work on WSCI is not being continued, as the W3C Web Services Choreography Working Group¹⁹ is concentrating its work on the Web Services Choreography Description Language (WS-CDL) [Kavantzas et al, 2004].

¹⁹ Homepage: http://www.w3.org/2002/ws/chor/

The aim of WS-CDL is to develop a language for describing global interaction models for several Web Services, thus following a different understanding of Choreography than WSCI. Within WS-CDL, Choreography is concerned with global, multi-party, peer-to-peer collaborations, and WS-CDL aims at providing the description language for these, describing a common observable behaviour of two or more participants. The description takes a global, participant-agnostic viewpoint, wherein information exchange takes place when jointly agreed among the participants; to this end, a set of information-driven reactive rules is specified. In contrast to BPEL4WS, WS-CDL follows a top-down approach for describing the interaction of aggregated Web Services. BPEL4WS starts by specifying the behavioural requirements of single participants and then tries to aggregate them. WS-CDL, by contrast, starts with the specification of globally visible information, that is, the information needed for the interaction, as well as the global message exchange between participants, along with information-driven rules that allow dynamic compatibility checking at run-time.
Then, the requirements and descriptions for the participants’ behaviours are recursively determined top-down from the global settings, aiming at the automated generation of the behavioural interfaces of the participants [Kavantzas et al., 2004]. The description elements defined in WS-CDL are organized in three groups:
1. Information Typing: description of the information to be exchanged between participants. These are placeholders and container structures specified at design time and filled with real data at execution time.
2. Participant Description: defines the entities that participate in a Choreography, describing their identity, their roles, and the relationships between them.
3. Information-Driven Collaboration Roles: defines notions of channels (concrete information exchange paths between participants), process description notions, work units, and the management of Choreography descriptions (import/reuse support).

Regarding the formal foundation of the W3C efforts around WSCI and WS-CDL, WSCI does not rely on or incorporate a formalized representation. In order to support inference-based handling of WSCI definitions, mappings to suitable formalisms have to be defined retroactively (see Section 2.2.2.2.2 for a description of this approach). Such techniques run the risk that the mappings are not isomorphic (information-preserving), or that certain aspects cannot be modelled in the formalism. In contrast, WS-CDL claims to be based on a formal foundation: the Explicit Solos Calculus, a variant of the π-Calculus that allows modeling a system from a global point of view (although this formal foundation has been announced as an important feature of WS-CDL, the specification had not been officially released at the time of writing).

OWL-S

In OWL-S, Web Services are understood as processes, where the term process is used in the sense of an activity, as in its predecessor DAML-S [DAML-S, 2004].
The objective of the OWL-S process model is to define an ontology that covers all information needed for semantically enhanced Web Service composition. This information is modelled in the OWL-S Process Ontology [OWL-S, 2004]. Additionally, a so-called process control ontology is defined that describes the monitoring of a process execution. Figure 11 shows the structure of the OWL-S Process Model.

Figure 11: OWL-S Process Ontology²⁰

OWL-S defines three types of Web Service process: atomic, simple, and composite processes. A process is described via data inputs, data outputs, pre-conditions, and effects (which concern state-of-the-world conditions, described using condition concepts). The OWL-S Process Ontology defines basic control constructs (Sequence, Split, Fork + Join, Unordered, Condition, If-Then-Else, Iterate, Repeat-While, and Repeat-Until). The expressiveness of the OWL-S process model is not as rich as that of the workflow definition technologies presented above: this process ontology only provides very basic modeling concepts for processes, concentrating on control constructs. It is unclear whether the process descriptions (inputs, outputs, pre-conditions, and effects) are meant to be used for the dynamic discovery and composition of Web Services or for describing the interaction behaviour of a Web Service. Because of this, current research proposes the replacement or enhancement of the OWL-S process model with ontological descriptions based on the process models underlying BPEL4WS or BPML/WSCI [Lara et al, 2003]. Similar to WSCI, the OWL-S process model is not explicitly based on a formal process representation. As a retroactive formalization that allows analysis and simulation of the OWL-S process model, [Narayanan and McIlraith, 2003] provide a formalization on the basis of the situation calculus.
Although the mapping allows a formal representation of OWL-S process models, it is a proprietary approach and not explicitly supported within the OWL-S framework.

²⁰ Source: [OWL-S, 2004].

2.2.2.2 Formalization of Process Representation

In order to allow the mediation of processes on the basis of reasoning mechanisms, the modeling constructs of the language used for defining the processes need to be formalized. With such a formalization, specific mechanisms for (semi-)automatic mediation can be defined. An example of such an approach is the Process Specification Language (PSL²¹), which originated as a NIST research project for developing a standardized process ontology for the manufacturing domain, including techniques for “Semantic Translations” between different manufacturing systems. It uses the Knowledge Interchange Format (KIF) for representing processes and defines rules for translations of processes on the semantic level [Schlenoff et al, 2000]. However, PSL represents a proprietary approach to formal process specification and transformation for a specific domain, whereas we are interested in a general model for process level mediation based on a sound theoretical foundation. Such a formalization has to support all the modeling constructs for processes provided by the process representation language, especially notions for states (that is, activities) and state transitions (that is, transitions from one activity to the next in a process). Moreover, control structures need to be defined that determine the correctness of the information exchanged between activities (data flow) and the validity of transitions (control flow). The following survey is restricted to existing formalisms that might serve as a starting point for the formalization of process representations as the basis of the Process Level Mediation Module in the DIP Mediation Component.
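The requirements above (states, state transitions, and checks on data flow and control flow) can be pictured with a labelled transition system. The sketch below is one possible encoding under simplifying assumptions, not a formalism proposed by DIP: states stand for activities, each transition is labelled with the message it carries, and a run is valid only if every step follows an existing transition (control flow) and every message carries the fields the transition requires (data flow). The `Transition` class, the `validate` function, and all state and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    source: str
    label: str                # message exchanged on this transition
    target: str
    required: frozenset[str]  # fields the message must carry (data flow)


def validate(start: str, finals: set[str], transitions: list[Transition],
             run: list[tuple[str, dict[str, str]]]) -> str:
    """Check a run of (label, message) pairs against the transition system.

    Returns 'ok' if the run is valid and ends in a final state, otherwise
    a short description of the first violation found.
    """
    state = start
    for label, message in run:
        step = next((t for t in transitions
                     if t.source == state and t.label == label), None)
        if step is None:
            return f"control flow: no '{label}' transition from '{state}'"
        missing = step.required - set(message)
        if missing:
            return f"data flow: '{label}' lacks {sorted(missing)}"
        state = step.target
    return "ok" if state in finals else f"incomplete: stopped in '{state}'"


T = [
    Transition("Start", "order", "Ordered", frozenset({"item"})),
    Transition("Ordered", "payment", "Paid", frozenset({"amount"})),
    Transition("Paid", "delivery", "Done", frozenset()),
]
print(validate("Start", {"Done"}, T, [("order", {"item": "book"}),
                                      ("payment", {"amount": "12"}),
                                      ("delivery", {})]))  # ok
print(validate("Start", {"Done"}, T, [("order", {"item": "book"}),
                                      ("delivery", {})]))
# control flow: no 'delivery' transition from 'Ordered'
print(validate("Start", {"Done"}, T, [("order", {})]))
# data flow: 'order' lacks ['item']
```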
The requirements outlined above are supported by certain types of logical formalisms, mostly referred to as logics for specifying dynamics [Eck et al, 2001]. Analyses of such logical languages are provided in [Constantinescu and Faltings, 2002] and [Keller and de Bruijn, 2004]. Following these works, we briefly summarize the existing logical approaches to representing processes that might serve as a basis for developing the Process Level Mediation Module in the DIP Mediation Component. In general, two groups of logical formalisms for representing the dynamics of processes are distinguished. On the one hand, notions for representing states and state changes are needed; on the other hand, techniques for formalizing communication and interaction between different parties are needed. The first group is concerned with the formal representation of the general notions of processes (states/activities, transitions, data flow and control flow, as discussed in Section 2.2.1.1). The second group is concerned with the deployment of processes, that is, with a process actually being used to execute a complex, multi-step interaction between two or more parties, including transaction management. With regard to process mediation, we are only interested in the second group: in order to make processes interoperable, we have to inspect the process specifications of the interacting parties at a structural level (that is, without regard to the actual content of the interaction) and resolve any heterogeneities between them. The following therefore briefly summarizes the most promising existing formalisms of the second group.
21 See homepage: http://www.mel.nist.gov/psl/
2.2.2.2.1. Logics for Representing Interaction Protocols
As outlined above, we are mostly interested in formalisms for representing the process of interactions between parties.
More precisely, we need a formalism that allows the description of the structure of the processes that parties follow when they participate in an interaction. These formalisms will serve as the formal basis for describing processes and defining mappings for the mediation of possibly heterogeneous processes. In the following, we discuss existing approaches for the formalization of processes within Choreography and Orchestration of Semantic Web Services with regard to these needs. The approaches investigated are those that are indirectly applied within the process representation technologies examined above.
Process Algebras
Process Algebras (PA) provide such a formalism. A PA is a formal description technique for complex computer systems that pays special attention to concurrently executing components that interact in parallel and distributed systems. The objective of PA is to allow the observation of the behaviour of a system or its components. The approach is to define a formal language for the constituent elements of processes and to perform algebraic calculations on the basis of these process descriptions [Bergstra et al., 2001]. Given the general idea of mediation facilities outlined in the introduction, PA appears to be the appropriate choice for process level mediation within Semantic Web Services.
The formal description language adds formal semantics to process descriptions, and the algebraic calculations existing for process algebras can serve as a basis for developing inference-based mediation facilities for processes. Moreover, existing approaches for formalizing processes within Choreography and Orchestration apply PAs, as investigated in more detail below. Research within PA started in the 1970s, touching many areas of computer science and discrete mathematics, including system design notations, logic, concurrency theory, specification and verification, operational semantics, algorithms, complexity theory, and, of course, algebra. Early work on PA built on automata theory, which models a process by states and state changes and is concerned with formally describing the execution of a process, a so-called run. Soon, the notion of interaction was added, and the attention of PA research turned to observing behaviour within the interactions of process-driven systems of components [Baeten, 2003]. The main approaches developed within PA are the Calculus of Communicating Systems (CCS) [Milner, 1980], Communicating Sequential Processes (CSP) [Hoare, 1978], and the Algebra of Communicating Processes (ACP) [Bergstra and Klop, 1984]. We briefly describe these approaches, omitting formal analysis in order to concentrate on the support offered by existing process specification technologies and to determine their usability for developing the DIP Process Level Mediation Module. The formalizations will be investigated more closely in DIP Deliverable D5.3, the specification of the Process Level Mediation Component.
CCS is mainly the work of Robin Milner and was developed over time. The main focus of CCS is to formalize behaviours and determine equivalence between them, which is essentially the same objective as that of process level mediation.
CCS relies on a synchronization tree as the underlying model for representing processes: a node represents a process activity and an arc is a transition; arcs are equipped with so-called laws that specify the conditions or validity constraints for a transition. Based on this, CCS provides an algebraic model for process equivalence. A newer approach based on CCS is the π-calculus, which adds notions for handling process interactions dynamically [Milner, 1991]. In contrast to CCS, which describes processes individually together with a notion of process equivalence, CSP builds on the message-passing paradigm of communication. Besides the underlying models for process representation, CCS and CSP have developed techniques to handle process identification, failures and deadlocks, and inference-based determination of process equivalence. The development of CCS and CSP is closely interwoven, and most modern approaches combine concepts of both. An exhaustive comparison of CCS and CSP is provided in [Glabbeek, 1997].
Situation Calculus
The term situation calculus, first used in [McCarthy and Hayes, 1969], refers to a variety of formalisms that treat situations as objects, consider fluents that take values in situations, and consider events (including actions) that generate new situations from old ones. The situation calculus language most commonly used, as defined in [Reiter, 2001], is a first-order logical language for reasoning about actions, based on the predicate calculus. Its aim is to represent dynamically changing worlds in which all changes are the direct result of named actions performed by some agent. Situations are sequences of actions, evolving from a distinguished initial situation designated by the constant S0. If a(y) is a parameterized action and s is a situation, the result of performing a in s is the situation represented by the function term do(a,s).
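To make the notions of situation and do(a,s) concrete, the following Python sketch represents a situation as the tuple of actions performed since S0 and defines one fluent in the style of Reiter's successor state axioms [Reiter, 2001]. The actions buy(...) and sell(...) and the fluent own are hypothetical illustrations, not part of any cited formalization, and the sketch assumes that nothing is owned in S0.

```python
from typing import Tuple

# A situation is modelled as the sequence of actions performed since S0.
Situation = Tuple[str, ...]
S0: Situation = ()


def do(action: str, s: Situation) -> Situation:
    """Return the situation that results from performing `action` in `s`."""
    return s + (action,)


def own(item: str, s: Situation) -> bool:
    """Hypothetical fluent Own(item, s), written as a successor state axiom:
    Own(x, do(a, s)) holds iff a = buy(x), or Own(x, s) holds and a != sell(x).
    """
    if s == S0:
        return False  # assumption: nothing is owned in the initial situation
    previous, last = s[:-1], s[-1]
    return last == f"buy({item})" or (own(item, previous) and last != f"sell({item})")


s1 = do("buy(book)", S0)
s2 = do("sell(book)", s1)
print(own("book", s1), own("book", s2))  # True False
```

Because the axiom lists exactly which actions change the fluent, facts untouched by an action simply carry over; this is the usual motivation for successor state axioms as a response to the frame problem discussed below.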
Functions and relations whose values vary from situation to situation are called fluents and are denoted by a predicate symbol taking a situation term as the last argument (for example, Own(bookName,s)). Finally, Poss(a,s) is a distinguished fluent expressing that action a can be performed in situation s. The general difficulty with the situation calculus is that several further aspects have to be expressed explicitly in order to achieve the correct semantics for the part of the world one wants to formalize:
- Quantification Problem (usually called the qualification problem in the literature): this problem concerns the executability of an action in specific situations. Poss(a,s) means that action a can be executed in situation s, but it is nearly impossible to determine all situations in which an action can be executed. In the situation calculus, these have to be specified explicitly, which narrows the applicability of the formalism to small and closed worlds.
- Frame Problem: following the general law of inertia, it has to be defined which information (that is, which fluents in the situation calculus) remains unaffected by the execution of an action. This is called the frame problem, where a frame is understood as the set of information items that represent a state. As with the Quantification Problem, all fluents that are not affected by executing an action have to be specified explicitly.
Abstract State Machines
The approach of Abstract State Machines (ASM), originally proposed by [Gurevich, 1994], is an attempt to provide operational semantics for programs and programming languages in order to bridge the gap between formal models of computation and practical specification methods. The ASM thesis is that any algorithm can be modeled at its natural abstraction level by an appropriate ASM.
Many research efforts have centred on ASMs in recent years, resulting in a simple, generic methodology for describing abstract machines that correspond to algorithms.²² An ASM consists of an algebra A over a signature (a finite collection of function names), together with interpretations of the signature, and a program that holds transition rules. Basically, a transition rule is an expression of the form f(t) := t0, where f is a function symbol, t is a tuple of terms over the signature, and t0 is another term. When this rule is fired in a given algebra S0, the value of f at the arguments t is updated to the value of t0, resulting in a new algebra S1 as the new state of the universe of discourse. Thus, an ASM only defines transitions; a run of an ASM produces a sequence of algebras and terminates when no further transition rules can be executed. Comparing the formalisms, process algebras appear to be the most appropriate choice for process level mediation within Semantic Web Services, for the reasons given above: they add formal semantics to process descriptions, their algebraic calculations can serve as a basis for inference-based mediation facilities, and existing formalizations of Choreography and Orchestration apply them. The situation calculus provides a similar expressivity for describing interaction processes. In contrast, the ASM approach supports only very abstract definitions of processes. However, the major advantage of ASMs is that the problems arising within process algebras and the situation calculus (namely the quantification problem and the frame problem, as described above) are avoided, as only the transition rules are specified within an ASM.
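The run semantics described above can be illustrated with a small Python sketch under simplifying assumptions: locations are nullary functions (so t is empty), all rules are fired in parallel on the current state, conflicting updates are rejected, and the run stops when no rule yields an update. The GCD program and all names are illustrative; this is not the API of any ASM tool.

```python
from typing import Callable, Dict, List

# State: values of nullary functions (locations); a richer ASM would key locations by (f, args).
State = Dict[str, int]
# A guarded transition rule inspects the current state and returns its update set {location: value}.
Rule = Callable[[State], Dict[str, int]]


def step(state: State, rules: List[Rule]) -> Dict[str, int]:
    """Collect the updates of all rules fired in parallel on the same state."""
    updates: Dict[str, int] = {}
    for rule in rules:
        for location, value in rule(state).items():
            if updates.get(location, value) != value:
                raise ValueError(f"inconsistent updates to {location}")
            updates[location] = value
    return updates


def run(state: State, rules: List[Rule], max_steps: int = 10_000) -> List[State]:
    """Return the sequence of states S0, S1, ... until no rule produces an update."""
    states = [dict(state)]
    for _ in range(max_steps):
        updates = step(states[-1], rules)
        if not updates:
            return states
        states.append({**states[-1], **updates})
    raise RuntimeError("no final state reached within max_steps")


# Example program (subtraction-based GCD) with two guarded rules:
#   if a > b then a := a - b      if b > a then b := b - a
rules: List[Rule] = [
    lambda s: {"a": s["a"] - s["b"]} if s["a"] > s["b"] else {},
    lambda s: {"b": s["b"] - s["a"]} if s["b"] > s["a"] else {},
]
trace = run({"a": 12, "b": 18}, rules)
print(trace)  # [{'a': 12, 'b': 18}, {'a': 12, 'b': 6}, {'a': 6, 'b': 6}]
```

Only the transition rules are written down; nothing has to be said about locations a rule does not update, which is the property contrasted above with the frame problem of the situation calculus.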
Further investigation of the appropriate formalism for process representation within DIP will be provided in Deliverable D5.3 of this Work Package.
22 Exhaustive information on ASM and the ASM research community can be found on several websites, for instance or www.eecs.umich.edu/gasm.
2.2.2.2.2. Formalizing Choreography Description
A more recent approach for formalizing Choreography descriptions in order to support the definition of inference mechanisms on processes is provided in [Brogi et al., 2004]. This paper presents a formalization of WSCI (see above) on the basis of CCS. It aims at a technique for determining the compatibility of Web Services by checking the interoperability of the Choreographies of distinct Web Services that are supposed to interact automatically, as well as at the automated specification of mediators that make a priori incompatible Web Services interoperable. We briefly summarize the essence of this approach. Its starting point is WSCI, which was chosen because, in contrast to other representation techniques, it offers two views on Choreographies of Web Services: the <interface> construct, which describes the externally visible behaviour of a single Web Service, and the <model> construct, which allows the combination of several interfaces (that is, distinct Web Services) into a global model of interaction. CCS was selected for an isomorphic formalization.
For the formalization of the WSCI constructs (individual process logic primitives: sequence, parallel, choice, switch, loop, activities, and so on; exceptions: on fault, on timeout, on message; calling of processes in the global model), each WSDL message is represented by a CCS channel, the individual process logic primitives are mapped to the corresponding CCS primitives, and the calls of processes in the global model are transformed into parallel CCS models (see the paper for a detailed discussion of the transformations). A set of (WSCI) interfaces is defined as compatible if it terminates according to the order and the content of messages, that is, if the conversation of the Web Services does not run into an infinite loop. Conversely, interfaces (Web Services) are not compatible if the system fails by running into an infinite loop. This is checked by matching the input and output actions of the interfaces according to their structure and content. Another important aspect is replaceability, that is, finding other interfaces (Services) that can provide the same functionality, commonly referred to as compensation within the area of Semantic Web Services. This is determined by a set of heuristics, whereby S1 can be replaced by S2 if the interface of S2 is a subset of the interface of S1, or if the behaviour of S1 and S2 is consistent, that is, if:
1. S2 preserves the semantics of S1 (concerning global choices),
2. S2 does not extend S1 (that is, all actions in S2 are also in S1), and
3. S2 terminates whenever S1 does,
which is called “behavioral subtyping”. Additionally, the paper outlines how to develop mediators, called “adaptors”, with reference to earlier work [Bracciali et al., 2002]. There, mapping rules are specified for the process entities of those interfaces in a global model that participate in an interaction, in order to resolve structural and content mismatches.
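A much simplified Python sketch of a compatibility check follows. It is not the CCS encoding of [Brogi et al., 2004]: we assume each interface is given as a finite state machine whose transitions send (!m) or receive (?m) a message, and we treat two interfaces as compatible if, from every reachable joint state, the conversation can still reach a state in which both parties have terminated. A conversation that gets stuck, or cycles with no way out, is therefore reported as incompatible. All interface and message names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterator, List, Set, Tuple

Action = str                                      # "!m" = send message m, "?m" = receive message m
Interface = Dict[str, List[Tuple[Action, str]]]   # state -> [(action, next state)]
Joint = Tuple[str, str]


def complement(action: Action) -> Action:
    return ("?" if action.startswith("!") else "!") + action[1:]


def joint_moves(p: str, q: str, i1: Interface, i2: Interface) -> Iterator[Joint]:
    """Both interfaces move together when one sends the message the other receives."""
    for a, p_next in i1.get(p, []):
        for b, q_next in i2.get(q, []):
            if b == complement(a):
                yield (p_next, q_next)


def compatible(i1: Interface, start1: str, final1: Set[str],
               i2: Interface, start2: str, final2: Set[str]) -> bool:
    """True if, from every reachable joint state, both parties can still terminate together."""
    start = (start1, start2)
    succ: Dict[Joint, List[Joint]] = {}
    queue, seen = deque([start]), {start}
    while queue:                          # forward exploration of the joint conversation
        state = queue.popleft()
        succ[state] = list(joint_moves(*state, i1, i2))
        for nxt in succ[state]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    good = {s for s in seen if s[0] in final1 and s[1] in final2}
    changed = True
    while changed:                        # backward fixpoint: states that can reach joint termination
        changed = False
        for s in seen - good:
            if any(n in good for n in succ[s]):
                good.add(s)
                changed = True
    return good == seen


buyer = {"b0": [("!order", "b1")], "b1": [("?invoice", "b2")]}
seller = {"s0": [("?order", "s1")], "s1": [("!invoice", "s2")]}
picky_seller = {"s0": [("?payment", "s1")], "s1": [("!invoice", "s2")]}
print(compatible(buyer, "b0", {"b2"}, seller, "s0", {"s2"}))        # True
print(compatible(buyer, "b0", {"b2"}, picky_seller, "s0", {"s2"}))  # False
```

In the second case the buyer wants to send an order while the seller waits for a payment, so the joint system is stuck in its initial state.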
These mappings are only defined for specific use cases, but the approach can be generalized into general process level mappings. The approach covers the most important aspects needed for determining the compatibility of Choreographies automatically and thus can serve as a basis for developing the Process Level Mediation Module. It also supports the choice of process algebras as the theoretical basis for formalizing process representations in order to enable inference-based mediation mechanisms. On the other hand, it applies WSCI as the Choreography description language and does not provide a generic framework for formalizing Choreographies within Semantic Web Services. Nevertheless, this work serves as a starting point for further development.
2.2.2.2.3. Formalizing Web Service Orchestrations
Some approaches for formalizing Web Service Orchestration in the sense defined above have recently been presented. We briefly summarize two approaches that may serve as a starting point for the Process Level Mediation Module with regard to the requirements of process mediation within Web Service Orchestration. The first approach takes BPEL4WS as a starting point and creates a framework for verifying interactions of BPEL4WS-described Web Services by transforming BPEL4WS descriptions into formal languages and determining the validity of Web Service Orchestrations [Fu et al., 2004]. The work relies on a model for Web Service conversations in which a global model defines the overall task to be achieved by combining several Web Services (this is similar to the approach of BPEL4WS, where a process described in BPEL4WS defines the overall task and external Web Services are called in order to fulfil the task defined in a specific activity of this process), and each Service is described using pre- and postconditions.
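As a rough illustration of validating an execution order against pre- and postconditions, the Python sketch below walks through the calls of a global model and reports the first call whose precondition does not hold. It is not the synchronizability analysis of [Fu et al., 2004]; we assume facts are plain strings, postconditions only add facts (nothing is ever retracted), and the service names are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Optional, Set


class Call(NamedTuple):
    name: str
    pre: FrozenSet[str]    # facts that must hold before the call
    post: FrozenSet[str]   # facts that hold after the call


def first_violation(initial: Set[str], order: List[Call]) -> Optional[str]:
    """Return the first call whose precondition is not met, or None if the order is valid.
    Simplification: postconditions only add facts, they never retract any."""
    facts = set(initial)
    for call in order:
        if not call.pre <= facts:
            return call.name
        facts |= call.post
    return None


check = Call("CheckStock", frozenset({"order_received"}), frozenset({"stock_confirmed"}))
invoice = Call("SendInvoice", frozenset({"stock_confirmed"}), frozenset({"invoice_sent"}))
pay = Call("Pay", frozenset({"invoice_sent"}), frozenset({"paid"}))

print(first_violation({"order_received"}, [check, invoice, pay]))  # None
print(first_violation({"order_received"}, [invoice, check, pay]))  # SendInvoice
```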
The verification mechanism determines the suitability of the execution order specified in the global model by checking the pre- and postconditions of successive Web Service calls. For its realization, BPEL4WS descriptions are transformed into proprietary formal languages supported by a reasoning mechanism that performs “Synchronizability Analysis”, in which heuristics describing the structure and relations of “valid interactions” are used to determine the suitability of a Web Service Orchestration. Although this approach uses proprietary formalisms and does not provide means for specifying mapping rules to resolve possible mismatches in an Orchestration, it can be considered an example of formalization and verification in Web Service Orchestration. The second approach, presented at the same conference, is the Concurrent Transaction Logic CTR-S [Davulcu et al, 2004]. CTR-S is a sound logic with a standard model theory that aims at combining process modeling with, on this basis, the automated contracting of Web Services within multi-party processes, which amounts to Web Service Orchestration in the sense outlined above. The aim of this approach is to specify a logical language with formal semantics for representing processes, and to provide a means for automated contracting using workflows that restrict the requirements on possibly usable Web Services for a certain process activity. The paper concentrates on explaining the CTR-S syntax, the language model, and the proof theory. It also provides a set of inference rules for proving CTR-S statements and a predefined set of constraints on valid contracts (that is, Service compositions), which are used to determine the validity of Web Service Orchestrations. The advantage of this approach is that a single representation language is used for modeling both processes and the constraints on possible Web Service Orchestrations.
As this is a logical language, it allows the specification of axioms and inference rules along with a sound proof theory to determine the suitability and validity of Service compositions. A shortcoming is that the approach is not well aligned with frameworks for Semantic Web Service descriptions or with existing process representation techniques, and would thus require substantial adaptation for compliance.
2.2.2.3 Process Integration by Process Composition
Process mediation involves various techniques that aim to make different processes interoperate. Process composition is one important approach to achieving interoperability among processes, making them collaborate in order to achieve given user goals. This approach is also used in Orchestration, as explained above. Some existing approaches to process composition are reviewed below.
2.2.2.3.1. Situation Calculus for Service Composition
An initial approach to process composition in [McIlraith et al., 2001] and [McIlraith and Son, 2002] was to use a planning formalism based on the situation calculus, a first-order logical language for reasoning about action and change. In the situation calculus, the state of the world is expressed in terms of functions and relations relativized to a particular situation. The advantage of this approach is that complex control constructs like loops can be modelled in this framework; its drawback is its high computational complexity. This work builds on and extends Golog, a high-level logic programming language developed at the University of Toronto. Golog supports the specification and execution of complex actions in dynamic domains.
2.2.2.3.2. Hierarchical Task Planning for Service Composition
In [Wu et al., 2003] the authors describe SHOP2, a hierarchical planning formalism for encoding composition domains. This approach is more efficient, but it does not support complex constructs like loops.
SHOP2 is a domain-independent HTN planning system. HTN planning is an AI planning methodology that creates plans by task decomposition: the planning system decomposes tasks into smaller and smaller subtasks until primitive tasks are found that can be performed directly. The concept of task decomposition in HTN is very similar to the concept of process decomposition in DAML-S (see [DAML-S, 2004]). One difference between SHOP2 and most other HTN planning systems is that SHOP2 plans for tasks in the same order in which they will later be executed. Planning for tasks in execution order makes it possible to know the current state of the world at each step of the planning process, which allows SHOP2's precondition-evaluation mechanism to incorporate significant inferencing and reasoning power, including the ability to call external programs. This allows SHOP2 to integrate planning with external information sources, as in the Web environment. In order to plan in a given planning domain, SHOP2 needs to be given knowledge about that domain. SHOP2's knowledge base contains operators and methods. Each operator is a description of what needs to be done to accomplish some primitive task, and each method tells how to decompose some compound task into partially ordered subtasks.
2.2.2.3.3. Type Based Service Composition
The approaches presented so far compose processes based on the semantic mark-up of the parameters in service descriptions. Another possible approach [Constantinescu et al., 2004] extends this by additionally using type information for composition.
Formalism and assumptions
In this approach, services and queries are represented in the standard way [W3C, WSDL, 2004] as two sets of parameters (inputs and outputs).
A parameter is defined through its name and a type that can be primitive [W3C, XML, 2003] (for example, a decimal in the range [10,12] or [14,16]) or a class/ontological type [OWL, 2004]. Both primitive and class types are represented as sets of numeric intervals. For instance, the generic type Colour may be encoded as the interval [1,3], whereas the specific colours (subtypes) Red, Green, and Blue may be represented as the single-point subintervals [1,1], [2,2], and [3,3]. For more details on the encoding of classes/ontologies as numeric intervals, see Representing types below. Input and output parameters of service descriptions have the following semantics: in order for the service to be invokable, a value must be known for each of the service input parameters, and it has to be consistent with the respective parameter type. For primitive data types, the invocation value must be in the range of allowed values; for classes, the invocation value must be subsumed by the parameter type. Upon successful invocation, the service will provide a value for each of the output parameters, and each of these values will be consistent with the respective parameter type. Service composition queries are represented in a similar manner but have different semantics: the query inputs are the parameters available to the integration (for example, provided by the user). Each of these input parameters can be either a concrete value of a given type or just the type information. In the second case, the integration solution has to be able to handle all possible values of the given input parameter type. The query outputs are the parameters that a successful integration must provide, and the parameter types define what ranges of values can be handled. The integration solution must be able to provide a value for each of the parameters in the problem output, and the value must be in the range defined by the respective problem output parameter type.
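The interval representation lends itself to simple set operations. The Python sketch below, assuming a continuous numeric domain and closed intervals, checks whether a concrete value is admitted by a type and whether one type is included in (subsumed by) another, using the decimal and Colour examples above; the function names are our own and purely illustrative.

```python
from typing import List, Tuple

Interval = Tuple[float, float]   # closed interval [lo, hi]
Type = List[Interval]            # a type is a union of closed intervals


def normalize(t: Type) -> Type:
    """Sort the intervals and merge those that overlap or touch."""
    merged: Type = []
    for lo, hi in sorted(t):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def admits(t: Type, value: float) -> bool:
    """A concrete invocation value is consistent with the type if it lies in one of its intervals."""
    return any(lo <= value <= hi for lo, hi in t)


def is_subtype(a: Type, b: Type) -> bool:
    """Range inclusion / subsumption: every interval of a lies inside some interval of b."""
    nb = normalize(b)
    return all(any(blo <= lo and hi <= bhi for blo, bhi in nb) for lo, hi in a)


decimal_param: Type = [(10, 12), (14, 16)]
COLOUR: Type = [(1, 3)]
RED, GREEN, BLUE = [(1, 1)], [(2, 2)], [(3, 3)]

print(admits(decimal_param, 11), admits(decimal_param, 13))   # True False
print(is_subtype(RED, COLOUR), is_subtype(COLOUR, RED))       # True False
```

Over an integer-coded domain, adjacent single points such as [1,1] and [2,2] are not merged by normalize, so the inclusion test stays conservative there.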
For manipulating service or query descriptions, we make use of the following helper functions:
in(X), out(X): return the set of input or output parameter names of a service or query description X.
type(P,X): returns the type of a parameter named P in the frame of a service or query description X as the set of intervals of all possible values for P. The operator $\subseteq$ in conjunction with this function represents range inclusion if P has a primitive data type, or subsumption if P is defined through a class or concept description [OWL, 2004]. The operator $\cap$ in conjunction with this function represents range intersection if P has a primitive data type; in the case of a class/concept description, it represents the sub-class common to both arguments of the operator (possibly the bottom class Nothing).
We assume that both service and query descriptions (X) are well formed in that they cannot have the same parameter both as input and output: $in(X) \cap out(X) = \emptyset$. The rationale behind this assumption is that an overlap between input and output parameters would only lead to two equally undesirable cases: either the two parameters have the same type, in which case the output parameter is redundant, or they have different types, in which case the service description is inconsistent. Parameter names (properties in the case of OWL-S [OWL-S, 2004] or strings in the case of WSDL [W3C, WSDL, 2004]) also attach some semantic information to the parameters²³. Thus, our composition algorithm considers not only type compatibility between parameters but also semantic compatibility.
Composing services
Informally, the idea of composing services using forward chaining is to iteratively apply a possible service S to a set of input parameters provided by a query Q (that is, all inputs required by S have to be available).
If applying S does not solve the problem (that is, not all outputs required by the query Q are available yet), a new query Q' can be computed from Q and S, and the whole process is iterated. This part of our framework corresponds to the planning techniques currently used for service composition [Thakkar et al., 2002]. We now consider the conditions needed for a service S to be applied to the inputs available from a query Q using forward chaining: for each input required by the service S, there has to be a compatible parameter among the inputs provided by the query Q. Compatibility has to be achieved both for names (which have to be semantically equivalent) and for types, where the range provided by the query Q has to be more specific ($\subseteq$) than the one accepted by the service S:
$(\forall P \in in(S))\,(P \in in(Q) \wedge type(P,Q) \subseteq type(P,S))$
This kind of matching between the inputs of query Q and of service S corresponds to the plugIn match identified by Paolucci [Paolucci et al., 2002].
23 For WSDL this is not explicitly specified by the standard, but we assume that two parameters with the same name are semantically equivalent.
Forward complete matching of types is too restrictive and might not always work, because the types accepted by the available services may only partially overlap the type specified in the query. For example, a query for restaurant recommendation services across all of Switzerland could specify that the integer parameter zip code lies in the range [1000,9999], while an existing service providing recommendations for the French-speaking part of Switzerland could accept only integers in the range [1000,2999] for the zip code parameter. The above condition for forward chaining can be modified such that services with partial type matches are supported.
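The forward chaining loop with the plugIn condition can be sketched in Python as follows. For brevity, each parameter type is a single integer interval and semantic equivalence of names is reduced to string equality (as assumed for WSDL); the services ZipToCanton and CantonRecommender and the canton parameter are hypothetical additions to the Switzerland example. The French-speaking recommender alone does not plugIn-match the query, because [1000,9999] is not included in [1000,2999].

```python
from typing import Dict, List, NamedTuple, Optional, Set, Tuple

Interval = Tuple[int, int]     # closed integer range [lo, hi]
Params = Dict[str, Interval]   # parameter name -> type (one interval per parameter, for brevity)


class Service(NamedTuple):
    name: str
    inputs: Params
    outputs: Params


def included(a: Interval, b: Interval) -> bool:
    """a ⊆ b: the range a is at least as specific as b."""
    return b[0] <= a[0] and a[1] <= b[1]


def plugin_match(available: Params, s: Service) -> bool:
    """Forward complete (plugIn) match: every input of s is available with an included type."""
    return all(p in available and included(available[p], t) for p, t in s.inputs.items())


def solved(available: Params, required: Params) -> bool:
    return all(p in available and included(available[p], t) for p, t in required.items())


def forward_chain(query_in: Params, query_out: Params,
                  services: List[Service]) -> Optional[List[str]]:
    """Apply plugIn-matching services until the query outputs are known; None if stuck."""
    available = dict(query_in)
    plan: List[str] = []
    used: Set[str] = set()
    while not solved(available, query_out):
        candidates = [s for s in services if s.name not in used and plugin_match(available, s)]
        if not candidates:
            return None
        s = candidates[0]                 # applying a service only adds parameters, so order is irrelevant
        used.add(s.name)
        plan.append(s.name)
        for p, t in s.outputs.items():    # the new query Q' gains the outputs of s as inputs
            available.setdefault(p, t)
    return plan


query_in = {"zip": (1000, 9999)}
query_out = {"recommendation": (1, 1)}
french = Service("FrenchSwissRecommender", {"zip": (1000, 2999)}, {"recommendation": (1, 1)})
zip_to_canton = Service("ZipToCanton", {"zip": (1000, 9999)}, {"canton": (1, 26)})
by_canton = Service("CantonRecommender", {"canton": (1, 26)}, {"recommendation": (1, 1)})

print(forward_chain(query_in, query_out, [french]))                            # None
print(forward_chain(query_in, query_out, [french, zip_to_canton, by_canton]))  # ['ZipToCanton', 'CantonRecommender']
```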
To support partial type matches, we relax the type inclusion to a simple overlap:
$(\forall P \in in(S))\,(P \in in(Q) \wedge type(P,Q) \cap type(P,S) \neq \emptyset)$
This kind of matching between the inputs of query Q and of service S corresponds to the overlap or intersection match identified by Li [Li and Horrocks, 2003] and Constantinescu [Constantinescu and Faltings, 2003]. We also consider the condition needed for a backward chaining approach: the service S has to provide at least one output that is required by the query Q. This corresponds to the plugIn match for query and service outputs. Using the formal notation above, this can be specified as:
$(\exists P \in out(S))\,(P \in out(Q) \wedge type(P,S) \subseteq type(P,Q))$
The above condition can also be relaxed such that services with partial type matches can be backward chained:
$(\exists P \in out(S))\,(P \in out(Q) \wedge type(P,Q) \cap type(P,S) \neq \emptyset)$
Type-compatible service composition versus planning
As the majority of service composition approaches today rely on planning, we analyse the correspondence between our formalism for service descriptions with types and a hypothetical planning formalism using symbol-free first-order logic formulas for preconditions and effects. As an example, consider the service description S that has two input parameters A and B and two output parameters C and D. Their types are represented as sets of accepted and provided values: a1, a2 for A; b1, b2 for B; c1, c2 for C; and d1, d2 for D. This corresponds to an operator S that has disjunctive preconditions and disjunctive effects. Negation is not required.
Table 2: Service with types and corresponding planning operator
Service with types      | Corresponding planning operator
in(S) = [A, B]          | :action S
type(A,S) = [a1, a2]    | :precondition
type(B,S) = [b1, b2]    |   (and (or a1 a2) (or b1 b2))
out(S) = [C, D]         | :effect
type(C,S) = [c1, c2]    |   (and (or c1 c2) (or d1 d2))
type(D,S) = [d1, d2]    |
Written in this way, our formalism has some correspondence with existing planning languages like ADL [Pednault, 1989] or, more recently, PDDL [McDermott, 1998] (concerning the disjunctive preconditions) and with planning with non-deterministic actions [Kushmerick et al., 1995] (regarding the disjunctive effects), but the combination as a whole (positive-only disjunctive preconditions and effects) constitutes a novel formalism.
The structure of type-compatible service composition problems
As described previously, we specify a service integration query in terms of a set of available input parameters and a set of required output parameters. An integration solution consists of an ordering of services that can be invoked so that finally all parameters required by the query are known. With respect to the type of match between services and queries (see Figure 12), we consider three cases: forward complete matches, backward complete matches, and forward partial matches. By using forward-completely matching services, the initial set of available parameters can be incrementally extended. Since a service can be applied as soon as all its required inputs are available, forward chaining does not introduce any choice points. Applying backward-completely matching services creates a directed graph of sets of required parameters, as the order in which different services are applied affects the set of parameters that still need to be provided.
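For reference, the matching conditions given above can be written down directly. In the Python sketch below, each type is again simplified to a single integer interval, and the function name match_kinds is hypothetical. Applied to the Switzerland example, the French-speaking recommender is only a forward-partial match for the query inputs, but a backward-complete match for its output.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]     # closed integer range [lo, hi]
Params = Dict[str, Interval]   # parameter name -> type


def included(a: Interval, b: Interval) -> bool:
    return b[0] <= a[0] and a[1] <= b[1]


def overlaps(a: Interval, b: Interval) -> bool:
    return max(a[0], b[0]) <= min(a[1], b[1])


def match_kinds(q_in: Params, q_out: Params, s_in: Params, s_out: Params) -> List[str]:
    """List which matching conditions hold between a query Q and a service S."""
    kinds = []
    # forall P in in(S): P in in(Q) and type(P,Q) ⊆ type(P,S)
    if all(p in q_in and included(q_in[p], t) for p, t in s_in.items()):
        kinds.append("forward complete")
    # forall P in in(S): P in in(Q) and type(P,Q) ∩ type(P,S) ≠ ∅
    if all(p in q_in and overlaps(q_in[p], t) for p, t in s_in.items()):
        kinds.append("forward partial")
    # exists P in out(S): P in out(Q) and type(P,S) ⊆ type(P,Q)
    if any(p in q_out and included(t, q_out[p]) for p, t in s_out.items()):
        kinds.append("backward complete")
    # exists P in out(S): P in out(Q) and type(P,Q) ∩ type(P,S) ≠ ∅
    if any(p in q_out and overlaps(q_out[p], t) for p, t in s_out.items()):
        kinds.append("backward partial")
    return kinds


print(match_kinds({"zip": (1000, 9999)}, {"recommendation": (1, 1)},
                  {"zip": (1000, 2999)}, {"recommendation": (1, 1)}))
# ['forward partial', 'backward complete', 'backward partial']
```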
Several forward-partially matching services can be aggregated into a composite service acting as a software switch that maps each possible combination of parameter values from the space of available inputs to one or more partially matching services. In order to fulfil the same functionality as a completely matching service, there must be, for each possible combination of input parameter ranges, one or more services that can accept those values.
Figure 12: The structure of type-compatible service composition problems (panels: forward complete matches, forward partial matches, backward complete matches)
Our software switch corresponds to a non-deterministic planning operator in that the choice point it introduces allows a number of possible service invocation paths to be followed without committing to a particular one at integration time. The choice is made only at run time, based on the values of the switch input parameters. Each of the branches in a switch provides a (possibly different) set of available parameters. Note that, for the switch to be part of a service integration solution, every distinct set of available outputs of the switch has to be part of an integration solution. Still, to determine which branches can lead to a solution, we might have to construct, for each pair of branch available outputs and backward complete required inputs, a sub-problem that we then solve recursively.
Representing types
Service descriptions are a key element for service discovery and service composition and should enable automated interactions between applications.
Currently, different overlapping formalisms have been proposed (for example, [UDDI, 2004] [FIPA, 2003] [OWLS, 2004] [Ankolekar et al., 2002]), and any single choice could be quite controversial due to the trade-off between expressiveness and tractability specific to each of them. Here we partially build on existing developments, such as [UDDI, 2004] [Ankolekar et al., 2002], by considering a simple table-based formalism in which each service is described through a set of tuples mapping service parameters (unique names of inputs or outputs) to parameter types (the spaces of possible values for a given parameter). Parameter types can be expressed either as sets of intervals of basic data types (for example, date/time, integers, floating-point numbers) or as classes of individuals. Class parameter types can be defined through a descriptive language such as XML Schema [W3C, XML, 2003] or the Web Ontology Language [OWL, 2004]. From these descriptions we can then derive, either directly or by using a description logic classifier, a directed graph (DG) of simple is-a relations. For efficiency reasons, we represent the DG numerically. We assume that each class is represented as a set of intervals. We then encode each parent-child relation by sub-dividing each of the intervals of the parent; in the case of multiple parents, the child class is represented by the union of the sub-intervals resulting from the encoding of each of the parent-child relations. Since for a given domain we can have several parameters represented by intervals, the space of all possible parameter values can be represented as a rectangular hyperspace, with one dimension for each parameter. Details concerning the numerical encoding of services can be found in [Constantinescu and Faltings, 2003].
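A one-dimensional sketch of the numerical encoding, together with a coverage test for a software switch, is given below. It illustrates the idea under our own assumptions rather than reproducing the encoding of [Constantinescu and Faltings, 2003]: intervals are half-open and use exact fractions, each parent reserves one extra slot for its direct instances (so that an only child does not receive the parent's whole interval), and the class names are hypothetical. Because a class with several parents is the union of its pieces, plain interval containment reflects is-a only along single-inheritance paths: Pickup below lies inside Vehicle but not inside Car alone.

```python
from fractions import Fraction
from graphlib import TopologicalSorter


def encode(parents):
    """Encode an is-a DG (class -> list of parents) as lists of intervals.

    Roots split [0, 1); every interval of a parent is subdivided among its
    children; a class with several parents gets the union of its pieces.
    Assumption: one extra slot per parent is kept for its direct instances.
    """
    children = {c: [] for c in parents}
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)
    roots = [c for c, ps in parents.items() if not ps]
    codes = {}
    for cls in TopologicalSorter(parents).static_order():  # parents first
        if not parents[cls]:
            i, n = roots.index(cls), len(roots)
            codes[cls] = [(Fraction(i, n), Fraction(i + 1, n))]
            continue
        codes[cls] = []
        for p in parents[cls]:
            slots, i = len(children[p]) + 1, children[p].index(cls)
            for lo, hi in codes[p]:
                step = (hi - lo) / slots
                codes[cls].append((lo + i * step, lo + (i + 1) * step))
    return codes


def covered(target, pieces):
    """True if every half-open target interval lies in the union of pieces."""
    for lo, hi in target:
        point = lo  # [lo, point) is covered so far
        for a, b in sorted(pieces):
            if a <= point < b:
                point = b
            if point >= hi:
                break
        if point < hi:
            return False
    return True


def switch_covers(input_type, branches):
    """A switch of partially matching services can stand in for a completely
    matching one only if its branches jointly accept every input value."""
    return covered(input_type, [iv for branch in branches for iv in branch])


# Hypothetical hierarchy in which Pickup has two parents.
codes = encode({"Vehicle": [], "Car": ["Vehicle"], "Truck": ["Vehicle"],
                "Pickup": ["Car", "Truck"]})
print(covered(codes["Car"], codes["Vehicle"]))  # is-a as interval containment
print(switch_covers(codes["Car"] + codes["Truck"], [codes["Car"], codes["Truck"]]))
print(switch_covers(codes["Vehicle"], [codes["Car"], codes["Truck"]]))
```

The last check fails because instances that are plain Vehicles (neither Car nor Truck) would reach no branch. With several parameters, the same tests would be applied to boxes in the rectangular hyperspace instead of single intervals.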
2.2.3 Conclusion

In Section 2.2 we have studied the general needs for process level mediation within Semantic Web Services, as well as existing approaches that might serve as a starting point for the development of the Process Level Mediation Module of the DIP Mediation Component. We first outlined the use of process technologies within Semantic Web Services, differentiating between Choreography and Orchestration. The former is concerned with the usage of and interaction with Web Services that have complex, multi-step externally visible behaviour, while the latter is concerned with the composition of several Web Services into higher-level functionality. The mediation requirements of the two are very different. Within a Choreography, one must determine the compatibility of the externally visible behaviours of Web Services. Therefore, techniques are needed that allow describing the behaviour of individual Web Services as well as observing the interaction process between the Web Services that participate in it. For Orchestration, the challenge of process mediation is to determine the correctness and validity of a Web Service composition. Both aspects of process level mediation require the formalization of process descriptions as the basis for automated mediation facilities. The analysis of existing approaches has shown that process algebras seem to be a proper basis for the formalization of process descriptions, and that there are some initial approaches that follow this direction. In conclusion, we have defined the scope of process mediation to be tackled within the DIP Mediation Component, and outlined possible starting points for development. These have to be generalized and combined into a coherent framework for process level mediation.

3 REQUIREMENTS ANALYSIS

Requirements for the Mediation Component originate from two sources in the DIP project.
One is the overall DIP architecture as developed in Work Package 6 [Altenhofen et al., 2004], and the other is the set of case studies developed in Work Packages 8, 9 and 10 ([Hadek et al., 2004], [Davies and Rowlatt, 2004], [Montez et al., 2004]).

3.1 Architectural Requirements for DIP Mediation Component

In this section we outline the architectural requirements as defined in [Altenhofen et al., 2004]. The goal we want to achieve with Semantic Web Services is the seamless integration of different services. To enable this seamless integration, mediation must be performed transparently for both the requestor and the provider of a service. Therefore the main requirement for the DIP Mediation Component is:

[R1] Transparency: The Mediation Component needs to be transparent for both the requestor and the provider of a service. For example, the requestor of a service does not need, and probably does not want, to know the intermediary processes needed for obtaining a certain service.

In addition, [Altenhofen et al., 2004] defines two main requirements for the DIP Mediation Component. It should be:
• Independent of particular execution environments, and
• Decoupled from any other components of the DIP architecture.

The solution suggested for achieving this is to develop the DIP Mediation Component as a (set of) Web Service(s). The following requirements result from the architectural decisions taken for the overall architecture of DIP.

[R2] The Mediation Component must be available as a Web Service. This will enable the desired decoupling from the other components and make the component independent of the execution environment. Furthermore, this enables the usage of the Mediation Component inside existing Web Service environments, thereby offering a transition path for gradual adoption of the technology developed in DIP.
As shown in the state-of-the-art section of this document (see Section 2), no technology currently offers automated mediation support at either the data level or the process level. Therefore, [Altenhofen et al., 2004] suggests separating the functionality of the DIP Mediation Component into two sub-components: a run-time environment and a design-time tool.

[R3] Run-Time environment: There must be a run-time environment that, given an existing transformation and either a source and a target data format or a source and a target process, is capable of mediating between them. Note that the functionality to mediate between different data formats and the functionality to mediate between processes need to be independent of each other.

[R4] Design-time tool: There must be a design-time tool that assists the user during the creation of transformations. This tool must enable the user to create the necessary transformations much more quickly and easily than is possible with today's state-of-the-art tools. It must also be possible to create these transformations with a minimal amount of additional background knowledge. This can be achieved by developing new approaches and algorithms based, for example, on semantics and reuse. (For additional requirements on the design-time tool see Section 3.1.2.)

3.1.1 Requirements for the Run-Time Environment

In addition to the requirements the DIP architecture places on the Mediation Component, there are also a number of requirements that the case studies have elucidated for the component. These requirements can easily be derived from the input the case studies gave to the architectural team, and result mainly in additional requirements for the Run-Time environment. Since at least two of the case studies have to deal with sensitive customer data, one of the most important requirements derived from the studies for the run-time environment is security.
[R5] Support transport-level security: As messages to and from the Mediation Component might be sent over the public Internet, it is very important to offer transport-level security. This can be achieved by implementing or using one of the specifications developed in the Web Service area. Still, there is one important aspect to take into consideration when dealing with security: although many algorithms for ensuring security already exist (such as secret-key and other encryption algorithms), they are well known to be time consuming, thus reducing the efficiency of the mediator. A compromise therefore needs to be made between security and efficiency.

[R6] Trust: To ensure that sensitive data is only sent to and handled by trusted partners, a mechanism to establish trust is needed. As we do not want to limit the run-time environment to a particular trust policy, a mechanism supporting different policies is necessary. For a more detailed discussion on trust see [Altenhofen et al., 2004].

[R7] Data integrity: This requirement actually combines the previous two. By assuring data integrity, the system assures not only quality in terms of accuracy, but also security. Once delivered to the mediator system, data must not be subject to incorrect mediation, and must not be corrupted by any external factor.

[R8] Auditing: Audit trails are necessary in the run-time environment for two reasons. Firstly, the ability to trace messages and message transformations in the system is a prerequisite when dealing with sensitive (for example, the e-Government case study) or mission-critical (for example, the VISP case study) information. The availability of such audit trails might even be a legal obligation, for example to enable the establishment of contracts. Secondly, such audit trails will be needed to support debugging and error resolution in a highly distributed system like DIP.
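One possible way to combine requirements [R7] and [R8] is sketched below: every executed transformation is logged together with SHA-256 digests of its input and output, and each log entry also stores the hash of its predecessor, so that later edits to the trail can be detected. The hash-chaining scheme, the entry fields, the mediate helper and the rename-firstname transformation are illustrative assumptions, not part of the DIP architecture; a production trail might additionally need protected storage and trusted timestamps.

```python
import hashlib
import json
from datetime import datetime, timezone


def digest(obj):
    """SHA-256 over a canonical JSON encoding (sorted keys)."""
    data = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


class AuditTrail:
    """Append-only trail; each entry includes the hash of its predecessor."""

    def __init__(self):
        self.entries = []

    def record(self, transformation, source, target):
        entry = {
            "time": datetime.now(timezone.utc).isoformat(),
            "transformation": transformation,
            "source_digest": digest(source),  # lets later corruption be detected
            "target_digest": digest(target),
            "prev": self.entries[-1]["hash"] if self.entries else "",
        }
        entry["hash"] = digest(entry)  # hash over all fields above
        self.entries.append(entry)

    def verify(self):
        """Recompute the chain; False if a recorded entry was altered or an
        earlier entry removed (dropping only the newest entry goes unnoticed)."""
        prev = ""
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if e["prev"] != prev or digest(body) != e["hash"]:
                return False
            prev = e["hash"]
        return True


def mediate(transform, name, message, trail):
    """Run a transformation and log it before returning the result."""
    result = transform(message)
    trail.record(name, message, result)
    return result


trail = AuditTrail()
out = mediate(lambda m: {"givenName": m["firstname"]}, "rename-firstname",
              {"firstname": "Ada"}, trail)
print(out, trail.verify())
```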
In addition to the requirements that originated from the need for security and trust, there will also be strong requirements on the quality of the mediation service. In none of the scenarios described in the case studies could incorrect transformations be tolerated at run-time. This results in the following requirement:

[R9] Transformation quality: At run-time an executed transformation needs to be correct. In the area of data transformation this translates into the requirement of achieving precision = 1 and recall = 1 [Do and Rahm, 2002]. However, the requirements on transformation quality are different at design time (see Section 3.1.2).

The proposed DIP architecture will result in a highly distributed system. Therefore, the remaining requirements that result from the DIP architecture are ones common to distributed systems, such as scalability and flexibility.

[R10] Scalability: As it is not possible to estimate the number of different data sources and clients the run-time environment will have to deal with in a given scenario, it must be designed to be scalable in both respects. Scalability in the number of different data sources can only be achieved if the effort for adding new sources is very low.

[R11] Flexibility: Flexibility is needed on two levels in the run-time environment. Internally, the DIP Mediation Component must allow the easy exchange and arbitrary combination of the matching algorithms used. This will enable the integration of newly developed, improved algorithms, as well as the adjustment of the Mediation Component to specific scenarios by selecting certain algorithms. In addition, the Mediation Component must support different deployment scenarios (for example, a mediation component as part of the “own” infrastructure vs. an external mediation component).

[R12] Stackable: We define a stackable run-time environment as one capable of using other external mediation services.
However, this requirement raises another question: which external mediation services are trusted? Another possible problem is that a failure in the execution of one of the intermediary mediation services might lead to a failure of the entire process. Therefore, external mediation services need to be ranked by quality, and recovery methods need to be defined.

3.1.2 Requirements on the Design-Time Tool

As mentioned above, the development of the DIP Mediation Component will be divided into run-time and design-time parts. At the beginning of the project, the run-time environment will mainly execute static transformations generated using the design-time tool. As the project evolves, however, we hope to be able to generate transformations dynamically at run-time, possibly from pre-existing building blocks. The main goal of the design-time tool is to enable the user to create transformations between different data formats very quickly, with minimal manual effort and with as much automatic assistance as possible. This results in the following requirements for the design-time tool.

[R13] Transformation-IDE: As the creation of transformations between either processes or data formats is a development process, the design-time tool needs to support the developer throughout this whole process. This requires integrating the creation, debugging and final deployment of transformations.

In order to achieve this IDE-like behaviour, several requirements have to be met. The most important requirement is the quality of the algorithms used to automatically generate transformations.
[Do and Rahm, 2002] identified the following parameters as useful in determining the accuracy of a mediator system:
• False negatives (A): matches needed but not automatically discovered
• True positives (B): possible matches correctly identified
• False positives (C): matches falsely proposed by the mediator

Based on these parameters, [Do and Rahm, 2002] measured precision and recall using the following functions:

$$\text{precision} = \frac{|B|}{|B| + |C|}, \qquad \text{recall} = \frac{|B|}{|A| + |B|}$$

In the ideal case, precision and recall both have a value of one, but that hardly ever happens in a semi-automatic tool. Note that the parameters for a given mediation system may be computed relative to a set of perfect (and most probably manually determined) mappings. One problem with using precision and recall to measure the performance of a mapping system is that either of the two can easily be maximized at the expense of the other. Recall, for example, can easily be maximized by returning all possible matches (that is, the cross product), resulting in very poor precision. Precision can be maximized by returning a single correct match. Therefore [Do and Rahm, 2002] suggest using overall as an additional indicator. Overall is defined as a combination of precision and recall:

$$\text{overall} = \text{recall} \cdot \left(2 - \frac{1}{\text{precision}}\right) = \frac{|B| - |C|}{|A| + |B|}$$

Unlike precision and recall, overall becomes negative when the mediator proposes more false matches than correct ones (|C| > |B|). These three indicators, or similar ones, should be used to measure the accuracy of a mediation system on test cases. In addition to the quality of the mapping algorithms, their execution time also needs to be taken into account: a competitive system should provide good results in a reasonable time.

[R14] Quality of the “online” algorithms: Although the algorithms used to generate transformations will not be able to generate perfect transformations, they must be precise enough (with respect to the defined measures) to enable the user to generate correct transformations in a reasonable time.
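The three measures can be computed directly from a proposed set of matches and a reference set of correct ones, as in the minimal Python sketch below. The element pairs are hypothetical, and scoring an empty proposal or reference as 0 is our own convention; the cross-product case reproduces the observation above that returning every possible match maximizes recall while precision and overall collapse.

```python
def match_quality(proposed, correct):
    """Precision, recall and overall for a set of proposed matches.

    A = false negatives, B = true positives, C = false positives,
    following the definitions of [Do and Rahm, 2002].
    """
    b = len(proposed & correct)
    c = len(proposed - correct)
    a = len(correct - proposed)
    # Assumed convention: an empty proposal or empty reference scores 0.
    precision = b / (b + c) if b + c else 0.0
    recall = b / (a + b) if a + b else 0.0
    # (B - C) / (A + B) equals recall * (2 - 1/precision), but remains
    # defined when precision is 0.
    overall = (b - c) / (a + b) if a + b else 0.0
    return precision, recall, overall


correct = {("firstname", "givenName"), ("lastname", "familyName"),
           ("zip", "postCode"), ("city", "town")}
proposed = {("firstname", "givenName"), ("lastname", "familyName"),
            ("zip", "postCode"), ("city", "country")}
print(match_quality(proposed, correct))  # (0.75, 0.75, 0.5)

# Returning the whole cross product maximizes recall but ruins the rest.
cross = {(s, t) for s, _ in correct for _, t in correct}
print(match_quality(cross, correct))  # (0.25, 1.0, -2.0)
```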
What exactly “precise enough”, “sufficient recall”, and “reasonable time” mean will have to be investigated during the project. The quality requirements will most likely differ between process mediation and data mediation.

[R15] Error and consistency checking: Dynamic error and consistency checking is very important for assuring good transformation quality. As specified in requirement [R9], transformations must be correct at run-time. Therefore, this functionality is needed in the design-time tool to support the user during the creation of transformations. This will result in high-quality transformations and reduce the need for debugging.

[R16] Graphical user interface: To enable the easy creation of the necessary transformations, a graphical user interface is needed. This GUI must support the user as much as possible during the generation of transformations. Visual indications are needed to show which parts of a message or a process are already transformed, which parts of the transformation have been generated automatically or manually, how high the confidence in the automatically generated results is, and so on. As it might not be possible to provide a fully integrated tool right from the beginning, two tools, one for creating data transformations and one for creating process transformations, would also be acceptable.

A very specific requirement results from the e-Government case study. All solutions used in future by any UK government institution must conform to the e-Government Interoperability Framework (eGIF). The eGIF states that: “XSL has to be used for data transformation”. As we do not want to be restricted to the use of XSL as a transformation language, this results in the following requirement:

[R17] XSLT Export: The tool used to create data transformations must be capable of exporting the created transformations as an XSL script.
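As a minimal sketch of [R17], the function below exports a purely declarative mapping (a dictionary of element renamings such as firstname to givenName, which is a hypothetical mapping format) as an XSLT 1.0 stylesheet, using an identity template plus one more specific template per rule. It only covers simple naming conflicts, not complex mismatches, and it is not the export mechanism of the DIP tool. Python's standard library cannot execute XSLT, so applying the generated stylesheet requires an external XSLT processor.

```python
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL)


def export_xslt(renames):
    """Export a declarative element-renaming mapping as an XSLT 1.0 stylesheet."""
    sheet = ET.Element(f"{{{XSL}}}stylesheet", version="1.0")
    # Identity template: copy every node that no mapping rule mentions.
    identity = ET.SubElement(sheet, f"{{{XSL}}}template", match="@*|node()")
    copy = ET.SubElement(identity, f"{{{XSL}}}copy")
    ET.SubElement(copy, f"{{{XSL}}}apply-templates", select="@*|node()")
    # One more specific template per rule, e.g. <firstname> -> <givenName>;
    # a name test has a higher default priority than node(), so it wins.
    for source, target in renames.items():
        rule = ET.SubElement(sheet, f"{{{XSL}}}template", match=source)
        renamed = ET.SubElement(rule, f"{{{XSL}}}element", name=target)
        ET.SubElement(renamed, f"{{{XSL}}}apply-templates", select="@*|node()")
    return ET.tostring(sheet, encoding="unicode")


print(export_xslt({"firstname": "givenName"}))
```

Keeping the mapping itself declarative and generating XSLT only as an export format is one way to satisfy eGIF without making XSLT the internal transformation language.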
The ability to export XSLT scripts will also enable the easy integration of our mediation technology into existing Enterprise Application Integration (EAI) systems, that is, systems enabling the interoperability of enterprise applications that would otherwise not be able to communicate (for example, SAP Exchange Infrastructure), as most of these systems are capable of executing XSLT scripts. This in turn might help the adoption of Semantic Web technology in general.

3.2 Requirements for Data Level Mediation

The difficulty of solving a given data level mediation problem heavily depends on the types of mismatches between the two data instances at hand. These mismatches can range from simple naming conflicts of XML elements (for example, <firstname> versus <givenName>) to complex mismatches (for example, objects of type “Human” with the attribute gender versus objects of type “Man” and “Woman”). As different algorithms will be needed to resolve different types of mismatches, a classification of possible mismatches is necessary.

[R18] Classification of Mismatches between Data Instances: The semantics of data instances is described using ontologies. In order to abstract from syntactical mismatches introduced by a specific data format, we will need to classify mismatches between the ontologies that describe data instances. This classification will then be used by different algorithms to resolve these mismatches.

[R19] Algorithms for resolving Mismatches between Data Instances: There must be a library of algorithms that are capable of resolving a subset of the classified mismatches. As described in requirement [R18], we want to solve the data level mediation problem not on the syntactical but on the semantic level, as we strongly believe that this approach will achieve better results. An abstract view on how to mediate between business data on different semantic levels is given in Figure 13.
Figure 13: Business data mediation on the ontology level

In order to enable this approach, we need functionality to lift data from the syntactical to the semantic level, as well as functionality to drop data back down to the syntax level after the mediation has been performed.

[R20] Lifting mechanism: A mechanism for lifting data from the syntax level (for example, XML) to the semantic level is needed.

[R21] Dropping mechanism: A mechanism for dropping data from the semantic level down to the syntax level is needed.

In addition to these mechanisms, a formalism to specify transformations between ontologies is necessary. This language will be used to describe how an instance of one ontology can be transformed into an instance of another ontology. Such a language is needed for two reasons. Firstly, it enables the storage of transformations independently of the implementation of the run-time environment, and, secondly, such a machine-understandable language is the key to allowing reuse of mappings.

[R22] Formalism to specify transformations: It is necessary to develop a machine-understandable language which can be used to express transformations between data at the ontology level. It is important to note that standard transformation languages such as XSLT are not suitable in this setting. Although they provide an implementation-neutral format to store transformations, they are basically programming languages, and the semantics of a program written in such a language is not easily understandable, even for a human.

[R23] Language to describe instances: In addition to a formalism to specify transformations, we also need a language that describes the instances of either processes or data that need to be transformed.

Although we want to tackle the data transformation problem on a semantic level, it is important to recognize that there exists a certain class of problems that cannot be solved at this semantic level.
An example of such a problem is the Identity Problem. Consider, for example, that in both ontologies there exists a concept CITY. In each of them there also exists an instance of the concept, but in the first ontology the name of this instance is “Insbruck” while in the other it is “Innsbruck”. This kind of problem cannot be solved on a semantic level alone.

[R24] Syntax level mediation: To solve mediation problems similar to the one described in the previous paragraph, the Mediation Component must enable mediation on the syntax level when needed.

3.3 Requirements for Process Level Mediation

The overall objective of the DIP technology for process level mediation is to define a suitable technology for mediating process definitions within Semantic Web Services. As outlined above, this component should provide mediation facilities for the process technologies applied within Choreography and Orchestration, the two overall notions in the behavioural description of Web Services. The following summarizes the requirements for the Process Level Mediation Component.

[R25] Integration with DIP Mediation Component: Regarding its architecture, the Process Level Mediation Module has to be integrated into the design of the overall DIP Mediation Component. It should also provide design-time tool support as well as a run-time environment. Therefore, the same requirements as defined above hold.

[R26] Conformability with DIP technology: In order to provide mediation support for Web Services, the mediation technology for processes has to conform to the technologies and languages used within DIP for representing process definitions.

Regarding the technological realization of the Process Mediation Facility, the following requirements arise.
[R27] General Requirements on Process Level Mediation technology: identification and specification of the building blocks for the process mediation facility, which are:
• Process Representation Language
• Formalization and Algebra
• Classification of mismatches between Processes
• Mechanism(s) for resolving (a subset of) the mismatches

[R28] Mediation Support for Choreography and Orchestration: In the section on the State-of-the-Art Analysis for Mediation of Processes (see Section 2.2), we identified Choreography and Orchestration as the two fields where process technologies are applied within Semantic Web Services. These notions have very different requirements for mediation, as outlined below.

[R29] Choreography Mediation Requirements: Choreography is concerned with the usage of and interaction with Web Services that have complex, multi-step externally visible behaviours for communication with a service requester. The requirements for Choreography description are to provide a technique for describing local behaviours as well as global interaction models, based on a sound formal foundation. The requirements for Choreography mediation are:
• Ability to observe interaction processes
• Classification of possibly occurring mismatches in interactions
• Specification of a language for “mapping rules” to resolve mismatches

[R30] Orchestration Mediation Requirements: Orchestration is concerned with the composition of several Web Services into a higher-level functionality, which is then the functionality of another Web Service. Therefore, a suitable Orchestration description language is needed, which allows defining the decomposition of functionalities into sub-functionalities.
The requirements for a suitable Orchestration mediation technology are:
• Ability to determine correctness and validity of Web Service compositions in an Orchestration
• Classification of possibly occurring mismatches in Web Service compositions
• Specification of a language for “mapping rules” to resolve mismatches

Regarding the application of existing technologies and approaches that can serve as a basis for the process level mediation technology, we have investigated the most relevant ones in the State-of-the-Art analysis on mediation of processes in Section 2.2.

[R31] Formal Representation of Processes: As examined throughout the document, the prerequisite for the required mediation facilities for Choreography and Orchestration is a formalization of process representations. Therefore, a suitable approach on the basis of existing formal languages for processes (see Section 2.2.2.2) has to be developed.

[R32] Dependency on DIP Process Representation Language: The foundation of the Process Level Mediation Component is the process representation language to be used or developed within DIP. In order to provide suitable mediation support, this language has to be supported by the Process Level Mediation Module. Thus, there is a strong interrelation with DIP Deliverable D3.4, wherein the “business process and protocol ontology” for DIP is defined.

Further requirements and design decisions for the Process Level Mediation Module will be investigated in detail in DIP Deliverable D5.3 “Process Level Mediation Module Specification”.

4 CONCLUSIONS

This deliverable has studied existing mediation systems and technologies (Section 2, State-of-the-Art Analysis) and derived requirements for the development of the DIP Mediation Component (Section 3, Requirements Analysis). For a better illustration of the mediation problem, the mediation of processes was studied separately from the mediation of data and information.
The purpose of this separation was to underline the complexity of this problem, and also to emphasize the differences between these two apparently similar problems. The requirements are analyzed with respect to three aspects: the DIP Mediation Component (analysing the requirements for the design-time tool and the run-time environment), data level mediation and process level mediation. To obtain a more complete set of requirements, we also based our analysis on the inputs provided by Work Packages 8, 9 and 10, which deal with the three case studies (Virtual Internet Service Providers, eGovernment and eBanking). The results achieved within this deliverable will be addressed and further elaborated in the following deliverables:
- D5.2: Business data level mediation module specification – for this deliverable, the parts concerning data and information mediation will be of further use.
- D5.3: Business Process Level Mediation Module specification – the overview of the current state of the art and the requirements analysis for process mediation provided by this document are guidelines for the elaboration of the Process Level Mediation Module specification.
- D5.4: Business data and process mediation module prototype – this deliverable will be based on this document and the other two deliverables listed here. Special attention should be paid here to the requirements for the DIP Mediation Component and for the design-time tool.

As a general conclusion, in this deliverable we have defined the scope of data, information and process mediation within the DIP Mediation Component, and outlined possible starting points for development, which should be further combined with the results obtained in other work packages.

5 REFERENCES

[Altenhofen et al., 2004] M. Altenhofen, M. Hauswirth, V. Kirov, A. Kiryakov, C. Mack, J. Quantz, and R. Schmidt: Report on requirements analysis and state-of-the-art, Deliverable 6.1, DIP Project, 2004.
[Ankolekar et al., 2002] A. Ankolekar, M. Burstein, J.R. Hobbs, O. Lassila, D. Martin, D. McDermott, S.A. McIlraith, S. Narayanan, M. Paolucci, T. Payne, and K. Sycara: DAML-S: Web service description for the Semantic Web. Lecture Notes in Computer Science, vol. 2342, 2002.
[Arkin et al., 2002] A. Arkin, S. Askary, S. Fordin, W. Jekeli, K. Kawaguchi, D. Orchard, S. Pogliani, K. Riemer, S. Struble, P. Takacsi-Nagy, I. Trickovic, and S. Zimek: Web Service Choreography Interface (WSCI) 1.0. W3C Note 8 August 2002, available at: http://www.w3.org/TR/wsci/, 2002.
[Baeten, 2003] J.C.M. Baeten: Brief History of Process Algebra. Technische Universiteit Eindhoven, 2003.
[Bergstra and Klop, 1984] J.A. Bergstra and J.W. Klop: Process algebra for synchronous communication. Information and Control, 60(1/3):109-137, 1984.
[Bergstra et al., 2001] J.A. Bergstra, A. Ponse, and S.A. Smolka (eds.): Handbook of Process Algebra. Amsterdam: Elsevier, 2001.
[Booth et al., 2004] D. Booth, H. Haas, F. McCabe, E. Newcomer, M. Champion, C. Ferris, and D. Orchard (eds.): Web Services Architecture. W3C Working Group Note 11 February 2004, available at: http://www.w3.org/TR/2004/NOTE-ws-arch-20040211/, 2004.
[BPML, 2003] Business Process Modeling Language, http://xml.coverpages.org/bpml.html, 2003.
[Bracciali et al., 2002] A. Bracciali, A. Brogi, and C. Canal: A formal approach to component adaptation. In: Component Deployment, LNCS 2370, pp. 185-199. Springer, 2002.
[Brogi et al., 2004] A. Brogi, C. Canal, E. Pimentel, and A. Vallecillo: Formalizing Web Service Choreographies. In: Proceedings of the First International Workshop on Web Services and Formal Methods, Pisa, Italy, February 2004. To appear in ENTCS, 2004.
[Bussler, 2003] C. Bussler: B2B Integration. Berlin, Heidelberg: Springer, 2003.
[Constantinescu and Faltings, 2002] I. Constantinescu and B. Faltings: Behavioural Description Formalisms for Service Integration - Survey and Comparation.
Technical Report No. 200224, Swiss Federal Institute of Technology (EPFL), Lausanne (Switzerland), 2002.
[Constantinescu and Faltings, 2003] I. Constantinescu and B. Faltings: Efficient matchmaking and directory services. In: The 2003 IEEE/WIC International Conference on Web Intelligence, 2003.
[Constantinescu et al., 2004] I. Constantinescu, B. Faltings, and W. Binder: Large scale, type-compatible service composition. In: IEEE International Conference on Web Services (ICWS-2004), San Diego, CA, USA, July 2004.
[Crubezy et al., 2002] M. Crubezy, E. Motta, W. Lu, and M. Musen: Configuring Online Problem-Solving Resources with the Internet Reasoning Service. IEEE Intelligent Systems, 2002.
[Curbera et al., 2002] F. Curbera, Y. Goland, J. Klein, F. Leymann, D. Roller, S. Thatte, and S. Weerawarana: Business Process Execution Language for Web Services. BEA Systems & IBM Corporation & Microsoft Corporation, 2002.
[DAML-S, 2004] DAML Services, http://www.daml.org/services, 2004.
[Davies and Rowlatt, 2004] R. Davies and M. Rowlatt: Analysis Report: e-Government Business Needs, Deliverable 9.1, DIP Project, 2004.
[Davulcu et al., 2004] H. Davulcu, M. Kifer, and I.V. Ramakrishnan: CTR-S: A Logic for Specifying Contracts in Semantic Web Services. In: Proceedings of the Alternate Tracks of the 13th World Wide Web Conference 2004, New York, pp. 144-153, 2004.
[Do and Rahm, 2002] H.-H. Do and E. Rahm: COMA - a system for flexible combination of schema matching approaches. In: Proceedings of the 28th VLDB Conference, 2002.
[Doan et al., 2002] A. Doan, J. Madhavan, P. Domingos, and A. Halevy: Learning to map between Ontologies on the Semantic Web. In: WWW2002, 2002.
[Eck et al., 2001] P. van Eck, J. Engelfriet, D. Fensel, F. van Harmelen, Y. Venema, and M. Willems: A Survey of Languages for Specifying Dynamics: A Knowledge Engineering Perspective. IEEE Transactions on Knowledge and Data Engineering, 13(3):462-496, May/June 2001.
[Fensel and Bussler, 2002] D. Fensel and C. Bussler: The Web Service Modeling Framework WSMF. Electronic Commerce Research and Applications, 1(2), 2002.
[Fensel and Motta, 2001] D. Fensel and E. Motta: Structured Development of Problem Solving Methods. IEEE Transactions on Knowledge and Data Engineering, vol. 13, pp. 913-932, 2001.
[FIPA, 2003] Foundation for Intelligent Physical Agents Web Site, http://www.fipa.org/, 2003.
[Fu et al., 2004] X. Fu, T. Bultan, and J. Su: Analysis of Interacting BPEL Web Services. In: Proceedings of the 13th World Wide Web Conference 2004, New York, pp. 621-630, 2004.
[Glabbeeck, 1997] R.J. van Glabbeek: Notes on the methodology of CCS and CSP. Theoretical Computer Science, 177:329-349, 1997.
[Gruber, 1993] T.R. Gruber: A translation approach to portable ontologies. Knowledge Acquisition, 5(2):199-220, 1993.
[Goh et al., 1999] C.H. Goh, S. Bressan, S. Madnick, and M. Siegel: Context interchange: New features and formalisms for the intelligent integration of information. ACM Transactions on Information Systems, 17(3):270-290, 1999.
[Gurevich, 1994] Y. Gurevich: Evolving Algebras 1993: Lipari Guide. In: E. Börger (ed.): Specification and Validation Methods. Oxford (GB): Oxford University Press, 1994.
[Hadek et al., 2004] T. Hadek, M. Isop, C. Mack, A. Duke, K. Niederacher, and A. Wahler: Analysis Report: VISP Business Needs, Deliverable 8.1, DIP Project, 2004.
[Harel, 1987] D. Harel: Statecharts: A visual formalism for complex systems. Science of Computer Programming, 8(3):231-274, 1987.
[Hoare, 1978] C.A.R. Hoare: Communicating sequential processes. Communications of the ACM, 21(8):666-677, 1978.
[Jones, 1998] D. Jones: Developing shared ontologies in multi-agent systems. In: ECAI’98 Workshop on Intelligent Information Integration, Brighton, U.K., 1998.
[Kavantzas et al, 2004] N. Kavantzas, D. Burdett, and G. Ritzinger: Web Services Choreography Description Language Version 1.0. W3C Working Draft, 27 April 2004.
[Keller and de Brujin] U. Keller and J. de Bruijn: Language Evaluation and Comparison. WSMO Deliverable D8, available at: http://www.wsmo.org, 2004.
[Kushmerick et al., 1995] N. Kushmerick, S. Hanks, and D.S. Weld: An algorithm for probabilistic planning. Artificial Intelligence, vol. 76, no. 1–2, pp. 239–286, 1995.
[Lara et al., 2003] R. Lara, H. Lausen, S. Arroyo, J. de Bruijn, and D. Fensel: Semantic Web Services: Description Requirements and Current Technologies. In Proceedings of the Semantic Web Services for Enterprise Application Integration and E-Commerce Workshop, at the Fifth International Conference on Electronic Commerce (ICEC 2003), Pittsburgh, 1–3 October 2003.
[Larson et al., 1989] J.A. Larson, S.B. Navathe, and R. Elmasri: A theory of attribute equivalence in databases with application to schema integration. IEEE Transactions on Software Engineering, 15(4):449–463, 1989.
[Lenat, 1995] D.B. Lenat: CYC: A large-scale investment in knowledge infrastructure. Communications of the ACM, 38(11):33–38, 1995.
[Leymann, 2001] F. Leymann: Web Service Flow Language 1.0. IBM Software Group, 2001. Available at: www-4.ibm.com/software/solutions/webservices/pdf/WSFL.pdf, 2001.
[Li and Horrocks, 2003] L. Li and I. Horrocks: A software framework for matchmaking based on semantic web technology. In Proceedings of the 12th International Conference on the World Wide Web, 2003.
[Madhavan et al., 2002] J. Madhavan, P.A. Bernstein, P. Domingos, and A. Halevy: Representing and reasoning about mappings between domain models. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pp. 80–86, Edmonton, Alberta, Canada, 28 July–1 August 2002.
[McCarthy and Hyes, 1969] J. McCarthy and P.J. Hayes: Some Philosophical Problems from the Standpoint of Artificial Intelligence. In B. Meltzer and D. Michie (eds.): Machine Intelligence 4, pp. 463–502. Edinburgh University Press, 1969.
[McDermott, 1998] D. McDermott: The planning domain definition language manual. Yale Computer Science, Tech. Rep. 1165, 1998.
[McIlraith et al., 2001] S. McIlraith, T. Son, and H. Zeng: Mobilizing the semantic web with DAML-enabled web services. In Proceedings of the Second International Workshop on the Semantic Web (SemWeb-2001), Hong Kong, China, May 2001.
[McIlraith and Son, 2002] S.A. McIlraith and T.C. Son: Adapting Golog for composition of semantic web services. In D. Fensel, F. Giunchiglia, D. McGuinness, and M.-A. Williams (eds.): Proceedings of the 8th International Conference on Principles of Knowledge Representation and Reasoning (KR-02). San Francisco, CA: Morgan Kaufmann Publishers, pp. 482–496, 2002.
[Mena et al., 2000] E. Mena, A. Illarramendi, V. Kashyap, and A. Sheth: OBSERVER: An Approach for Query Processing in Global Information Systems Based on Interoperation across Pre-existing Ontologies. Distributed and Parallel Databases, 8(2):223–271, 2000.
[Milner, 1980] R. Milner: A Calculus of Communicating Systems. Number 92 in Lecture Notes in Computer Science. Springer Verlag, 1980.
[Milner, 1991] R. Milner: The Polyadic π-Calculus: a Tutorial. Edinburgh, 1991.
[Montez et al., 2004] M.M. Montes, J.L. Bas, S. Bellido, J.M. López, S. Losada, and R. Benjamins: Analysis Report on eBanking Business Needs. Deliverable 9.1, DIP Project, 2004.
[Motta et al., 2003] E. Motta, J. Domingue, L. Cabral, and M. Gaspari: IRS-II: A Framework and Infrastructure for Semantic Web Services. In 2nd International Semantic Web Conference (ISWC2003), Sundial Resort, Sanibel Island, Florida, USA, 20–23 October 2003.
[MS BizTalk, 2004] Microsoft BizTalk Server, http://www.biztalk.org, 2004.
[Narayanan and McIlraith, 2003] S. Narayanan and S. McIlraith: Analysis and simulation of Web Services. Computer Networks, 42(5):675–693, 2003.
[Noy and Munsen, 2000] N.F. Noy and M.A. Musen: PROMPT: Algorithm and Tool for Automated Ontology Merging and Alignment. In Proceedings of the 17th National Conference on Artificial Intelligence (AAAI-2000). Menlo Park (California): AAAI/MIT Press, 2000.
[Omelayenko et al., 2003] B. Omelayenko, M. Crubezy, D. Fensel, R. Benjamins, B. Wielinga, E. Motta, M. Musen, and Y. Ding: UPML: The Language and Tool Support for Making the Semantic Web Alive. In D. Fensel et al. (eds.): Spinning the Semantic Web: Bringing the WWW to its Full Potential. MIT Press, pp. 141–170, 2003.
[OWL, 2004] OWL Web Ontology Language 1.0 Reference, http://www.w3.org/tr/owl-ref/, 2004.
[OWL-S, 2004] The OWL Services Coalition: OWL-S: Semantic Markup for Web Services, version 1.0, available at http://www.daml.org/services/owl-s/1.0/owl-s.pdf, 2004.
[Paolucci et al., 2002] M. Paolucci, T. Kawamura, T.R. Payne, and K. Sycara: Semantic matching of web services capabilities. In Proceedings of the 1st International Semantic Web Conference (ISWC), 2002.
[Papakonstantinou et al., 1996] Y. Papakonstantinou, H. Garcia-Molina, and J. Ullman: MedMaker: A Mediation System Based on Declarative Specifications. In Proceedings of the International Conference on Data Engineering (ICDE 96), pp. 132–141, 1996.
[Pednault, 1989] E.P.D. Pednault: ADL: Exploring the middle ground between STRIPS and the situation calculus. In Proceedings of the First International Conference on Principles of Knowledge Representation and Reasoning (KR'89), Morgan Kaufmann Publishers, pp. 324–332, 1989.
[Peltz, 2003] C. Peltz: Web Service Orchestration: A Review of Emerging Technologies, Tools, and Standards. Hewlett Packard, CO., January 2003.
[Popa et al., 2002] L. Popa, M.A. Hernandez, Y. Velegrakis, R.J. Miller, F. Naumann, and H. Ho: Mapping XML and Relational Schemas with Clio. Demo at the International Conference on Data Engineering, 2002.
[Rahm and Bernstein, 2001] E. Rahm and P.A. Bernstein: A survey of approaches to automatic schema matching. The VLDB Journal, 10(4):334–350, 2001.
[Reiter, 2001] R. Reiter: Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems. Cambridge (MA): MIT Press, 2001.
[Roman et al., 2004] D. Roman, H. Lausen, and U. Keller (eds.): Web Service Modeling Ontology Standard (WSMO Standard), version 0.1, available at http://www.wsmo.org/2004/d2/v0.3/20040329/, 2004.
[SAP XI, 2004] SAP Exchange Infrastructure, http://www.sap.com/xi, 2004.
[S BIS, 2004] Seeburger Business Integration Server, http://www.seeburger.de, 2004.
[Schlenoff et al, 2000] C. Schlenoff, M. Gruninger, M. Ciocoiu, and J. Lee: The essence of the Process Specification Language. In: Transactions of the Society for Computer Simulation International, 16(4):204–216, 2000.
[Singh and Huhns, 2004] M. Singh and M.N. Huhns: Service-Oriented Computing: Semantics, Transactions, Agents. Wiley, 2004 [to appear].
[Solanki and Abela, 2003] M. Solanki and C. Abela: The Landscape of Markup Languages for Web Service Composition, May 2003.
[Thakkar et al., 2002] S. Thakkar, C.A. Knoblock, J.L. Ambite, and C. Shahabi: Dynamically composing web services from on-line sources. In Proceedings of the AAAI-2002 Workshop on Intelligent Service Integration, Edmonton, Alberta, Canada, pp. 1–7, July 2002.
[UDDI, 2004] UDDI, Universal Description, Discovery and Integration Web Site, http://www.uddi.org/, 2004.
[Visser et al., 1999] P.R. Visser, D.M. Jones, M. Beer, T. Bench-Capon, B. Diaz, and M. Shave: Resolving ontological heterogeneity in the KRAFT project. In 10th International Conference and Workshop on Database and Expert Systems Applications DEXA'99. University of Florence, Italy, 1999.
[Wache and Fensel, 2000] H. Wache and D. Fensel: Special issue of the International Journal of Cooperative Information Systems on Intelligent Information Integration, 9(4), 2000.
[Wiederhold, 1992] G. Wiederhold: Mediators in the architecture of future information systems. Computer, 25(3):38–49, 1992.
[Wu et al., 2003] D. Wu, B. Parsia, E. Sirin, J. Hendler, and D. Nau: Automating DAML-S web services composition using SHOP2. In Proceedings of the 2nd International Semantic Web Conference (ISWC2003), Sanibel Island, Florida, October 2003.
[Wohed et al, 2003] P. Wohed, W.M.P. van der Aalst, M. Dumas, and A.H.M. ter Hofstede: Analysis of Web Services Composition Languages: The Case of BPEL4WS. In ER 2003, pp. 200–215, 2003.
[W3C, WSDL, 2004] W3C, Web Services Description Language (WSDL) version 1.2, http://www.w3.org/tr/wsdl12, 2004.
[W3C, XML, 2003] W3C, XML Schema, http://www.w3.org/xml/schema, 2003.
[XML, 2001] XML Schema Part 2: Datatypes, http://www.w3.org/tr/xmlschema-2/, 2001.
[W3C, XSLT, 1999] W3C, XSL Transformations, http://www.w3.org/TR/xslt/, 1999.
[Yerneni et al., 1999] R. Yerneni, C. Li, H. Garcia-Molina, and J. Ullman: Computing Capabilities of Mediators. In Proceedings of ACM SIGMOD, Philadelphia, 1999.