The GOLD Project: Architecture, Development and Deployment

Hugo Hiden1, Adrian Conlin1, Panayiotis Periorellis1, Nick Cook1, Rob Smith1, Allen Wright2

1 Department of Computing Science
2 Department of Chemical Engineering and Advanced Materials
University of Newcastle upon Tyne

Abstract

This paper describes the architecture, development and deployment of the Middleware developed as part of the GOLD project. The Middleware has been derived from the requirement to accelerate the chemical process development lifecycle by enabling highly dynamic Virtual Organisations. Its generic design will allow it to be applied to a wide variety of additional domains.

1. Introduction

GOLD is an EPSRC funded e-Science pilot project which aims to accelerate the Chemical Process Development (CPD) lifecycle through the enablement of Virtual Organisations (VOs) (Demchenko, 2004) and active Information Management. This complex application domain has two dominant characteristics, which have not been explored by previous Grid research.

Extremely dynamic virtual organisations: The chemical R&D lifecycle is highly dynamic and unanticipated changes of direction may occur at any point. Agility and flexibility are essential to respond to these changes and to ensure that time to market is minimised. The entire workflow for developing a given product will not be known at the outset. A need for different and/or additional outsourcing of specialist services may become apparent as project knowledge increases. For example, additional resource may be required to prevent the undesirable environmental impact of by-products identified during the project. This type of variability within and across projects demands a highly flexible outsourcing model for the VOs. Binding to specific organisations or services in a given project must occur at the latest possible moment.
Full lifecycle focus: The project seeks to integrate the full lifecycle of CPD, from basic research through design and process engineering to manufacture. In many product development cycles these phases are operated separately in distinct divisions or by separate companies. There is potentially a wide range of classes of interaction between VO participants during a chemical development project. These range from the exchange of basic design data to specifying and ordering physical plant equipment. Some of these interactions, particularly those involving orders for equipment and manufacturing time, need to be non-repudiable (Cook, et al, 2002) in case of future disputes.

CPD requires the exchange of a wide range of information classes between VO participants. This information ranges from physical property data, laboratory notes, experimental data and safety studies during the initial development stages through to industrial plant design information if and when projects reach the commercial exploitation phase. Much of the information exchanged between participants is confidential, and must be secured from unauthorised access. The security requirements are demanding because the access control must change over the course of the development process, allowing different levels of access to certain individuals as projects progress.

2. Chemical Development Scenario

The functional requirements of the GOLD middleware have been identified from a number of sources, including industrial consultation and the extensive CPD experience of some of the GOLD team members. In addition, an actual CPD project has been undertaken in collaboration with a number of companies as part of the development of the GOLD demonstrator. The aim of this ongoing CPD project is to convert an existing high tonnage manufacturing process from batch to continuous operation. In order to accomplish this goal a range of specialist skills is required, some of which are outlined in Table 1.

Participant | Skill
Reaction Engineering Consultant | Analyses chemical reactions and suitable operating conditions.
Pilot Plant Designers | Can design, build and operate small scale chemical plants.
Equipment Vendor | Supplies off the shelf process equipment according to supplied specifications.
Equipment Manufacturer | Supplies custom built equipment not available from
Table 1: Participant Roles

A VO approach has been adopted because these skills were not available in either the initiating company or a single contractor. In addition, the VO approach has the potential to offer substantial cost savings.

The project consists of four basic phases: preliminary reaction engineering investigations, pilot plant design, pilot plant operations and commercial scale plant development. Although superficially this is a straightforward project, each of these phases can be subject to a number of disturbances leading to significant changes in the subsequent tasks to be performed. For example, the new operating conditions may unexpectedly affect the downstream recovery of the catalyst, so that a new separation method must be found. This could involve a modification to the VO structure through the introduction of a new specialist skill and/or the removal of an existing participant. This example is one of many unanticipated factors that could require radical changes to the project plan after the project has been started. The software, therefore, must be able to co-ordinate activities between VO participants in the face of such disturbances.

During the course of the CPD project, significant quantities of information are exchanged between participants. This information usually takes the form of "dossiers", each of which can contain a set of individual documents.
Dossiers can cover various aspects of the development lifecycle, but the scenario considered during the development of GOLD is focused on four main areas: Commercial, Technical, Manufacturing and Safety, Health and Environment (SHE). Access by VO participants to the information contained within these dossiers is controlled according to the roles held by these individuals.

This project has provided a detailed case study outlining the tasks performed by, and the information exchanged between, VO participants. Whilst not exhaustive, it does provide a reasonably thorough test for the middleware developed by the project.

3. Software Architectural Elements

An examination of the scenario presented above, performed during the software design process (Conlin, et al, 2005), has identified a number of high level architectural elements that the middleware developed by the GOLD project must provide in order to support that scenario. A further decomposition of these elements was then performed to identify a number of atomic services, which were implemented in order to demonstrate the application. These elements are broadly classified as:

Storage: Support for storing and retrieving any of the various information types generated and exchanged during the lifecycle of the chemical development process. Also included within this aspect of the architecture is a comprehensive Information Model which describes the various data types and VO structural information stored during the operation of the VO.

Security: Services and facilities required in order to control access to information held within the VO. These are important because the chemical and pharmaceutical industries attach considerable importance to the security of their information.

Co-ordination: Functionality to enable the activities of individual VO participants to be co-ordinated and performed in accordance with the overall plan for the CPD process currently in progress.
Regulation: Monitors interactions between participants to ensure agreed behaviour and to enable actions performed to be audited at a later date.

Detailed descriptions of these architectural elements are available in Conlin, et al, 2005, which also describes the various services required in order to support this architecture.

4. Services to Support VOs

The GOLD Middleware has been implemented in the form of Web Services (Skonnard, 2002). The provision of the core software components as services allows VOs to be constructed using a subset of the full provided functionality if required. For example, certain VOs may not require extensive auditing or regulatory functions.

5. Storage Services

The storage services provided to the VO enable all of the information generated during the operation of the VO to be archived and retrieved. In addition to the CPD specific documents, this information includes details of the project plan, the membership of the VO, security attributes etc. In order to support this storage, a unified Information Model has been developed, which is summarised below.

The Information Model, which has been based in part upon the myGrid Information Model (Sharman, et al, 2004), is provided within the software implementation as a Java class hierarchy. Within this hierarchy, there are three base classes:

Class | Description
GoldObject | The base class for the majority of documents stored within the information repository.
LogMessage | Base class for all auditing and non-repudiation log messages.
GoldDocument | Represents the actual data from a document stored within the VO information repository. Document indexing and description data is held in the separate DocumentRecord class.
Table 2: Information Model Base Classes

Within the Information Model, the GoldObject class hierarchy contains details of most of the VO structural data. People, Roles, Participant Companies, VO Projects etc are all subclasses of GoldObject.
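As an illustration only, the relationship between GoldObject and its subclasses might be sketched in Java as follows. The class names GoldObject, Person and Role follow the text; the fields, constructors and the hasRole helper are assumptions made for this sketch and are not taken from the GOLD source code.

```java
import java.util.ArrayList;
import java.util.List;

final class InformationModelSketch {

    /** Base class for VO structural data. The name comes from Table 2; the field is assumed. */
    static abstract class GoldObject {
        private final String name;

        GoldObject(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    /** A role that a VO participant can hold, for example "Equipment Vendor". */
    static final class Role extends GoldObject {
        Role(String name) {
            super(name);
        }
    }

    /** A VO member together with the roles currently held (hypothetical structure). */
    static final class Person extends GoldObject {
        private final List<Role> roles = new ArrayList<>();

        Person(String name) {
            super(name);
        }

        void addRole(Role role) {
            roles.add(role);
        }

        boolean hasRole(String roleName) {
            for (Role role : roles) {
                if (role.getName().equals(roleName)) {
                    return true;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        Person designer = new Person("A. Designer");
        designer.addRole(new Role("Pilot Plant Designers"));

        // Subclasses can be handled uniformly through the common base class.
        List<GoldObject> stored = new ArrayList<>();
        stored.add(designer);
        stored.add(new Role("Equipment Vendor"));
        for (GoldObject object : stored) {
            System.out.println(object.getClass().getSimpleName() + ": " + object.getName());
        }
        System.out.println("Holds designer role: " + designer.hasRole("Pilot Plant Designers"));
    }
}
```

Deriving every structural type from one base class means that a repository can store and list people, roles and companies through a single interface, which is the property the Information Model relies upon.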
A subset of this hierarchy is shown below in Figure 1.

Figure 1: Gold Objects

The security within the GOLD Middleware is based around the AbstractResource class. Policies are defined which restrict access to resources based upon the Roles held by the Person attempting to access the resource. These policies are based upon sets of rules that can be configured using a policy GUI and are stored as eXtensible Access Control Markup Language (XACML, OASIS, 2003) documents within the information repository.

Documents generated during the CPD process are stored within the GoldDocument class hierarchy illustrated below in Figure 2.

Figure 2: Document Class Hierarchy

The basic Information Model contains two document types: XmlDocument, which contains a single XML file, and BinaryDocument, which contains arbitrary binary data (such as, for example, a Microsoft Word document). Documents specific to the CPD process are derived from one of these two base classes. In some cases, such as the representation of chemical structures using the Chemical Markup Language (CML, Murray-Rust, et al, 1995), XML schema exist for chemical specific information, which can therefore be easily stored within the information repository as XmlDocument objects. In other cases, such as plant design data, information is available in a structured form but there is no standardised XML schema to represent it. The GOLD project is actively investigating existing schema to represent this type of data; however, most information of this type is currently stored as binary data and persisted as BinaryDocuments within the information repository.

Documents within the Information Repository are referenced by means of DocumentRecord objects, which derive from AbstractResource and contain details regarding the type of document data stored, access control restrictions, meta-data for searching etc. DocumentRecord objects are used to provide richer functionality than simple document identifiers, and also to minimise the load on the application server when listing documents and performing other manipulations that do not require the stored document data to be modified.

The storage services, or Information Repository, provided by the GOLD Middleware comprise the following web service:

Operation | Description
createDocument | Creates a new empty DocumentRecord in the information repository.
retrieveDocument | Retrieves the document data associated with a specified DocumentRecord.
updateDocument | Updates the stored document data associated with a specified DocumentRecord.
Table 3: Storage Web Service

6. Security Services

The security services implemented within the GOLD Middleware depend largely on the specific roles held by VO participants. When applied to the CPD domain, these roles can be assumed to be analogous to the skills described in Table 1. Within the GOLD Middleware, security constraints are expressed and stored using XACML (OASIS, 2003). The implementation allows access control restrictions on VO resources to be specified in terms of the roles that users hold.

The security services provided by the GOLD Middleware are designed to allow VO resource providers to authenticate users, to identify the roles that users hold and to determine whether users should be permitted access to specified VO resources. This functionality has been implemented as two separate web services.

User authentication is provided by the Authentication Web Service, which contains a single method for verifying a username and password pair.
This functionality allows, for example, a custom JAAS (Java Authentication and Authorisation Service) module to be created and used within the GridSphere portal server, which provides the end-user GUI, to authenticate portal users against the GOLD Information Model.

Operation | Description
authenticateUser | Authenticates a VO user with a username and password.
Table 4: Authentication Web Service

The facility to determine whether VO users can access certain resources is provided by the Authorisation Web Service. This wraps the XACML Policy Decision Point (PDP) in a web service whose methods can be used by service providers within the VO to make access control decisions.

Operation | Description
readResource | Determines whether a VO user is permitted to read a specified resource.
writeResource | Determines whether a VO user is permitted to modify a specified resource.
performAction | Determines whether a VO user is permitted to perform an arbitrary action. This is possible as the XACML standard enables arbitrary actions to be specified as text strings.
Table 5: Authorisation Web Service

7. Co-ordination Services

Co-ordination services within the GOLD Middleware are provided to ensure that actions performed by VO participants occur in the correct sequence and at the correct time to enable the work performed by the VO to proceed. Because of the highly dynamic nature of the CPD process, a flexible approach to co-ordination has been implemented in the current incarnation of the GOLD Middleware.

The approach adopted has been to model VO projects as sets of discrete Tasks, each of which can contain a number of DocumentRecords corresponding to the dossiers or individual documents required in order to consider the Task instance complete. Each Task has a number of attributes such as start and end dates, description text, comments, role membership etc.
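The Task model described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the GOLD implementation: the method names requireDocument, saveDocument and isComplete are hypothetical, the DocumentRecord class is reduced to a stand-in with a title and a flag, and the completion rule (a Task is complete once every required document has been saved) is one reading of the text.

```java
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.Map;

final class TaskSketch {

    /** Stand-in for DocumentRecord; only the fields this sketch needs (assumed). */
    static final class DocumentRecord {
        private final String title;
        private boolean dataStored; // true once document data has been saved

        DocumentRecord(String title) {
            this.title = title;
        }

        String getTitle() {
            return title;
        }

        boolean hasData() {
            return dataStored;
        }

        void markStored() {
            dataStored = true;
        }
    }

    /** A project task holding the documents required for its completion. */
    static final class Task {
        private final String description;
        private final LocalDate start;
        private final LocalDate end;
        private final Map<String, DocumentRecord> documents = new LinkedHashMap<>();

        Task(String description, LocalDate start, LocalDate end) {
            if (end.isBefore(start)) {
                throw new IllegalArgumentException("end date precedes start date");
            }
            this.description = description;
            this.start = start;
            this.end = end;
        }

        String getDescription() {
            return description;
        }

        LocalDate getStart() {
            return start;
        }

        LocalDate getEnd() {
            return end;
        }

        /** Registers an empty record for a document the task must produce. */
        void requireDocument(String title) {
            documents.put(title, new DocumentRecord(title));
        }

        /** Marks a required document as saved; unknown titles are rejected. */
        void saveDocument(String title) {
            DocumentRecord record = documents.get(title);
            if (record == null) {
                throw new IllegalArgumentException("unknown document: " + title);
            }
            record.markStored();
        }

        /** Assumed rule: complete when every required document has data. */
        boolean isComplete() {
            for (DocumentRecord record : documents.values()) {
                if (!record.hasData()) {
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        // Dates and document titles are illustrative values only.
        Task task = new Task("Pilot plant design",
                LocalDate.of(2006, 1, 9), LocalDate.of(2006, 3, 31));
        task.requireDocument("Technical dossier");
        task.requireDocument("SHE dossier");

        task.saveDocument("Technical dossier");
        System.out.println("Complete after one document: " + task.isComplete());

        task.saveDocument("SHE dossier");
        System.out.println("Complete after both documents: " + task.isComplete());
    }
}
```

Because the start and end dates are held on each Task, a list of such objects carries all the information needed to draw a project timeline.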
Because VO projects and Tasks derive from the AbstractResource class, access control is possible through the standard XACML policy mechanisms provided by the Security Services.

By capturing projects as lists of tasks with start and end times, management and monitoring can be carried out using familiar Gantt charts. Co-ordination functionality is provided to VO members via the Project Web Service, which allows access to projects, tasks and their constituent documents.

Operation | Description
getTaskDocument | Get a specified DocumentRecord associated with a specific project task.
saveTaskDocument | Save a DocumentRecord to a specific project task.
getProject | Get a specified Project object.
getTask | Get a task associated with a specified project.
Table 6: The Project Web Service

In addition to the task based approach for co-ordinating projects at a high level, there is a need to initiate and control interactions between individual VO participants. For example, when a project requires the production of a specific document, a mechanism is needed to communicate that requirement to the relevant parties. In order to accommodate this requirement, the GOLD Middleware uses a "Worklist" approach whereby each VO User has a list of tasks to be performed. Events such as the beginning or end of tasks, the arrival of new VO members or the need for the production of project documents can be brought to the attention of selected VO users by placing an appropriate message into their Worklists. This has been implemented as a Messaging web service that can be used to send messages to individual VO users or to all users that are members of a specific VO role.

Operation | Description
sendUserMessage | Sends a message to the Worklist of a specific VO user.
sendRoleMessage | Sends a message to the Worklists of all users with a specified role.
getMessages | Retrieves all of the messages for a specified VO user.
Table 7: Messaging Web Service

8. Regulation Services

The Regulation Services provided by the GOLD Middleware are responsible for monitoring the interactions between individual users and companies so that actions performed within the VO comply with agreed standards of operation. There are numerous ways to define these standards. For example, it could be a requirement that responses to requests for documents and information are returned within a pre-defined time interval. It may also be desirable that certain requests and interactions are non-repudiable, such that no party can deny these interactions at a later date.

The GOLD Middleware provides two mechanisms for performing regulation. The first is a logging service that accepts log messages from all of the other components of the Middleware, thereby allowing a thorough auditing (or possible replay) of the events and messages that were exchanged over the course of a VO project. The logging web service stores the log messages contained in the class hierarchy, a partial view of which is shown in Figure 3, in the Information Repository database.

Figure 3: Logging Classes

The logging web service provides the following functionality to store and search for log messages (Table 8):

Operation | Description
logMessage | Sends a message to the logging service, which is stored in the Information Repository database.
searchLog | Searches the Information Repository database for log messages of a certain type, or that were logged within a certain time period, or that pertain to a specific resource or user.
Table 8: Logging Web Service

The second regulation mechanism provided by the GOLD Middleware uses the non-repudiable exchange tools developed by Cook, et al, 2002 to ensure that communications between individuals for certain classes of interaction are impossible to deny at a later date. For the sake of integration, the non-repudiation tools use the logging service to save information regarding the state of any non-repudiable information exchange between participants. Whilst the non-repudiation protocol as implemented by Cook, et al, 2002 is complex, on a simple level the non-repudiation system acts as a set of Web Service handlers that intercept messages flowing between VO participants that need to be non-repudiable. The effective structure of this system is shown in Figure 4, which shows a message, msg, being transferred from participant A to participant B via a trusted Delivery Agent, DA.

Figure 4: Non-Repudiable Message Delivery Structure

9. Implementation Details

The current incarnation of the GOLD Middleware has been implemented within the Sun Microsystems Application Server v9. This has been selected partly because of its open source status and partly because it provides, in conjunction with NetBeans 5.0, an easy to use environment for developing and deploying Web Services. The majority of GOLD Middleware functionality has been implemented as Enterprise Java Beans (EJBs), with Web Service wrappers provided to deliver the services described above.

In order to reduce development effort, data is exchanged between web services in the form of XML documents, which are currently mapped automatically to Java classes using the XML Serialization provided by the Java runtime. Future work, however, will define formal XML schema for the representation of the class attributes within the Information Model and use the Java Architecture for XML Binding (JAXB) to exchange information in a more interoperable format.

Storage of the data within the Information Repository has been implemented using the Hibernate object-to-relational database mapping library (http://www.hibernate.org), which automatically creates database schema and handles all serialisation / deserialisation and synchronisation issues.
The database used for the demonstration system is the open source MySQL package, although use of the Hibernate library provides a high degree of database independence for a production quality system.

Configuration of the VO users, roles and policies is performed using a Java Server Pages (JSP) application, whilst the GUI presented to End Users is based upon the GridSphere portal server, with additions to support authentication of users via the GOLD authentication web service. In order to demonstrate the potential for multiple views into an operating VO, the security policies are configured using a Swing GUI which interacts directly with the security EJBs using CORBA over SSL.

10. Conclusions

The Middleware developed as part of the GOLD project has been guided by a number of factors, ranging from the experience of the individual team members in operating actual CPD projects to a series of interviews performed with a significant number of CPD companies in conjunction with the Management School at Lancaster University.

The current implementation, whilst able to support the CPD process, has been designed with flexibility in mind. By modifying the security roles (Section 6) and the document types supported (Section 5), collaborative development projects across a wide range of application domains can be supported. This generic approach is an important aspect of the GOLD project, as one of the key requirements is to support multiple classes of VO.

11. References

Demchenko, Y., 2004. Virtual organisations in computer grids and identity management. Information Security Technical Report, 9, 1, 59-76.

Cook, N., Shrivastava, S. and Wheater, S., 2002. Distributed Object Middleware to Support Dependable Information Sharing between Organisations. In Proceedings of IEEE Int. Conf. on Dependable Systems and Networks (DSN), Washington DC, USA.

Skonnard, A., 2002. The XML Files: The birth of Web Services. MSDN Magazine, Volume 17, Number 10, October 2002.

Sharman, N., Alpdemir, N., Ferris, J., Greenwood, M., Li, P. and Wroe, C., 2004. The myGrid Information Model. In Proceedings of the UK e-Science All Hands Meeting 2004, 1 September.

OASIS, 2003. eXtensible Access Control Markup Language (XACML) Version 1.0. OASIS Standard, http://www.oasis-open.org/committees/xacml.

Conlin, A., Cook, N., Hiden, H., Periorellis, P. and Smith, R., 2005. CS-TR: 923 GOLD Architecture Document. School of Computing Science, University of Newcastle, Jul 2005.

Murray-Rust, P., Rzepa, H.S. and Leach, C., 1995. CML - Chemical Markup Language. Poster presentation, 210th ACS Meeting, Chicago, 21st August, 1995.