Grid Service Information Discovery Meeting
Copenhagen, Denmark, December 12-14, 2006
URL: http://www.nordugrid.org/events/sid06.html

Attendees:
Laurence Field      EGEE / CERN
Steve Fisher        EGEE / RAL
Antony Wilson       EGEE / RAL
Arumugam Paventhan  EGEE / RAL
Markus Schulz       EGEE / CERN
Steven Timm         OSG / Fermilab
Shaowen Wang        OSG / U. of Iowa
Sergio Andreozzi    OMII / INFN
Weijian Fang        OMII / U. of Southampton
Oxana Smirnova      NorduGrid / U. Lund
Balazs Konya        ARC / Lund / NorduGrid
Laura Pearlman      Globus / ISI
Adriana Iamnitchi   U. of South Florida
Lydia Priefo        U. of South Florida

Overview

The main purpose of this meeting was to discuss a common way to do Service Discovery across multiple grid infrastructures. It is of critical importance that the solution scale to the future requirements of the expanding grid infrastructures.

The meeting was split into three main sections: the past, the present and the future. First, the existing systems used to discover services in the different grid infrastructures were described, in order to understand the similarities and differences between them and the pros and cons of each. Next, a common method of service discovery was discussed, and finally a road map for achieving the end result was devised. An agenda was provided as a guide for discussion.

The expected workshop deliverables were a summary of the service requirements to achieve interoperability between OSG and EGEE at different levels: information systems, information schema, job submission, job description, brokering, security, file transfer and resource management. It is important to clarify that the purpose of the meeting was not interoperability as such, but the design of a new information system that is highly scalable. We tried to identify the interoperability problems in current systems and devised a new solution that avoids or diminishes these issues.
To give an idea of the challenge, EGEE currently has 200 production sites, each running at least a compute element, a storage element, a BDII, and monitoring services; sites may run other services as well. There are currently 60 Virtual Organizations (VOs). The anticipated scale of EGEE in 3 years is over 1000 sites, 20 different services, 200 VOs with 10 roles and groups apiece, and 40x10^6 pieces of metadata.

Fermilab's interests

Part of Steve Timm's motivation for attending the workshop was to understand the implications for FermiGrid and also to alert the developers to the existence of job-forwarding services such as the FermiGrid one. FermiGrid has a stake in the outcome of any service discovery protocol, since we would likely have to write an emulation for it, given the implementation of our site Globus job gateway.

SUMMARY OF THE MEETING

Overview

The meeting was split into three sections: the past, the present and the future. First, the existing systems used to discover services in the different grid infrastructures were described, in order to understand the similarities and differences between them and the pros and cons of each. Second, the discussion focused on the present: how the systems are actually being used. A common definition of service discovery, its differences from service selection (query), and a common method of service discovery were described. Third, a road map for solving current problems to achieve the end result in the future was devised, pointing out implementation issues and how to migrate from the existing systems to new ones.

Existing systems and use cases

The following existing systems were discussed and the pros and cons of each were addressed:
- MDS2
- NorduGrid use of MDS2
- BDII
- MDS4
- R-GMA
- Grimoires
- Service Discovery in gLite
- Service Discovery in OSG
- NAREGI Cell Domain

One of the main outcomes of this discussion was the recognition of the similarities between the systems.
Most used an index which contained information about site-level interfaces; the system would then pull the information from the site level. From a conceptual point of view, a site can be represented by a database and an interface. One of the main differences between the systems is how VO- or grid-level caching is handled. Some common use cases of the systems were walked through to demonstrate the kind of information that needs to be found, and with what frequency. As far as security is concerned, most of the above systems have the capacity to use GSI authentication to retrieve the information, but do not do so due to the extra overhead that this places on the information process. All seem to think it is desirable and will eventually have to be re-implemented.

MDS2

The pros of this system include an inherently distributed GRIS; it uses standard LDAP and a tree-like hierarchy. The cons are various: in general, the implementation is not stable; it uses only one GIIS and it is necessary to reconfigure each GRIS whenever a change happens; caching is done on demand and in several layers; the system requires too many queries, having to traverse the whole system to generate the results; finally, scalability is a problem: the size of the query result grows linearly with the number of sites.

Globus Monitoring and Discovery Service (MDS2) has been used in LCG, OSG, and NorduGrid. It is now deprecated by the Globus Alliance. The initial implementation used the GRIS (Grid Resource Information Service) and the GIIS (Grid Index Information Service). This was a distributed service with one GRIS per resource and one GIIS per site. In all its various implementations it suffered from scalability and stability problems. The GIIS was used to cache the information, but the total information is huge: currently 19 MB for EGEE and 35 MB for NorduGrid. NorduGrid had an intelligent client for MDS2 which had a timeout for misbehaving sites.
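NorduGrid's intelligent-client idea, a per-site timeout so that one hung information server cannot stall the whole query, can be sketched as follows. The site names and the query function are invented stand-ins; a real client would issue LDAP queries to each site's GRIS.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

TIMEOUT = 0.5  # seconds a site may take before being treated as misbehaving

def query_site(site):
    # Stand-in for a per-site LDAP query; one site simulates a hung server.
    if site == "slow.example.org":
        time.sleep(2)
    return {"site": site, "status": "Production"}

def collect(sites):
    """Query all sites in parallel, skipping any that exceed TIMEOUT."""
    pool = ThreadPoolExecutor(max_workers=len(sites))
    futures = {pool.submit(query_site, s): s for s in sites}
    results = []
    for future in futures:
        try:
            results.append(future.result(timeout=TIMEOUT))
        except TimeoutError:
            pass  # misbehaving site: skip it rather than stall the whole query
    pool.shutdown(wait=False)
    return results

sites = ["ce1.example.org", "slow.example.org", "ce2.example.org"]
print([r["site"] for r in collect(sites)])
```

The slow site is simply dropped from the answer, which is the behavior the NorduGrid client adopted for misbehaving sites.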
The NorduGrid intelligent client is part of the ARC (Advanced Resource Connector) middleware.

BDII

BDII uses a central cache: instead of having the data at each site, a central database is used. The pro of this system is that there is no intelligence in the clients. The system faces problems from having a single entry point: it overloads when there are too many queries and has trouble handling large caches.

BDII (Berkeley Database Information Index) is now used as the front end to the information system, replacing the GIIS and eventually the GRIS as well. It begins with a registry of sites and caches information every two minutes. In the LCG implementation there is a BDII at every site. A site BDII grows in size with the number of other BDIIs that are querying it. OSG will have a single BDII for the whole of OSG and will advertise to it with CEMon (Computing Element Monitor). Again, the caches are large.

MDS4

MDS4 switches from LDAP to a web services implementation. The main difference from MDS2 is that caching is not done on demand. The pros are multiple: it is an XML database; queries are done using XPath; one index centralizes all other indexes; and an index server contains pieces of data from each provider. The main con of MDS4 is the size of the XML database in memory.

Globus MDS4 (web service) does not run information providers on demand, only at a fixed interval. It has central index servers and it is possible to make custom indexes, with periodic caching. The data is stored in an XML database and queries are done using XPath. The one problem that was mentioned is that the XML database takes up a lot of space in memory. The Globus toolkit includes Java and C interfaces, a web-based client, and a command-line client. The largest known deployment is the TeraGrid. Globus developers have written some information providers, and Modular Information Providers for the Open Science Grid have been written in beta version at U. of Iowa as well.
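The MDS4 pattern, an aggregated XML store queried with XPath, can be illustrated with a toy index document. The element and attribute names below are invented for illustration and are not the real GLUE rendering; Python's ElementTree supports the limited XPath subset used here.

```python
import xml.etree.ElementTree as ET

# A toy index document in the spirit of MDS4's aggregated XML store.
# Element names are illustrative only, not the actual GLUE XML schema.
index = ET.fromstring("""
<Index>
  <Site name="CERN-PROD">
    <Service type="ComputeElement" endpoint="https://ce.cern.ch:8443"/>
    <Service type="StorageElement" endpoint="https://se.cern.ch:8443"/>
  </Site>
  <Site name="FNAL-FERMIGRID">
    <Service type="ComputeElement" endpoint="https://ce.fnal.gov:8443"/>
  </Site>
</Index>
""")

# XPath-style query: every ComputeElement endpoint in the whole index.
ces = index.findall(".//Service[@type='ComputeElement']")
endpoints = [s.get("endpoint") for s in ces]
print(endpoints)
```

A central index answering such queries avoids traversing every site at query time, at the cost of holding the whole XML store in memory, which is exactly the trade-off noted above.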
R-GMA

R-GMA is a relational database system, currently used by EGEE as the infrastructure for monitoring. The architecture is similar to MDS4: a registry contains a list of site URLs. Conceptually the system is extremely good. The cons are various: the implementation is not so good; it is complex and connections time out; the registry comes under too much load; and it has problems when a large number of jobs are in the system simultaneously. Some improvements have been proposed: allow authorization based on parameterized views; handle multiple virtualized database management; and allow multiple socket connections among components.

R-GMA (Relational Grid Monitoring Architecture) is part of gLite/EGEE and uses a relational database to store the various information from grid schemas, from which it can be retrieved via SQL queries. It is possible to authenticate with grid certificates and to configure views that define what parts of a table any given user can see. However, there are reports of connection timeouts and too much load on the registry, and it has difficulty handling large numbers of jobs in the system simultaneously. Improvements have been proposed to deal with these and other problems.

Grimoires

Grimoires is a web service container deployable in OMII, GT4 and Tomcat/Axis. It is middleware for UK e-science projects and appears to be at the development stage. The pros are that it is UDDI compatible and that it is a stand-alone service using metadata annotation and discovery.

Grimoires (Grid RegIstry with Metadata Oriented Interface: Robustness, Efficiency, Security) is an Open Middleware Infrastructure Institute-UK (OMII-UK) funded project from the University of Southampton. Grimoires can be deployed as a web services container within the OMII, GT4, and Tomcat/Axis frameworks. The service registry is based on the UDDI (Universal Description Discovery and Integration) framework but extends it.

gLite

SAGA is a middleware that contains an interface for grid applications.
It has no service discovery included in its functionality, so it uses gLite's service discovery as an independent component. gLite includes Service Discovery plug-ins for the R-GMA service mentioned above as well as for the BDII. There is a command-line interface and also an API. The Simple API for Grid Applications (SAGA) currently uses this API for service discovery, since it has no native one of its own.

NAREGI

NAREGI is a Japanese project based on the cell domain concept, started from the needs of industrial partners such as Fujitsu. The functionality is similar to MDS4, except that the interface is OGSA-DAI. The pros of this system are that no caching is involved; it uses a standard web services interface and a Common Information Model Object Manager (CIMOM) to handle the queries. When a query comes in, it is possible to decide which information provider to use. CIMOM requires a schema to work, but it could be any schema.

NAREGI (National Research Grid Initiative) uses CIMOM, the CIM Object Manager, which distributes information about compute elements based on the Common Information Model, an emerging industry standard. This is then aggregated into a relational database and exposed as a grid service by use of the Globus service OGSA-DAI (Open Grid Services Architecture Data Access and Integration). Each site is referred to as a Cell Domain.

Use cases for the information systems

Some common use cases of the systems were walked through to demonstrate the kind of information that actually exists, what needs to be found, and with what frequency. The discussion covered the functionality currently provided by the existing systems and also the desired functionality for the new information system analyzed during the meeting.
Job Submission User Interface Use Case

- Find all Resource Brokers (RB) that I can use
- Find a suitable workload management system
- Find my Virtual Organization Membership Service (VOMS) server: it will sign my proxy with my roles
- Find the central catalogue again, to write output data to the central/global resources

Workflow Management System Use Case

- Find which catalogs are available for my Virtual Organization (VO) and query them to learn where the data is (a catalog contains the locations of all copies of your files)
- Find which Compute Elements (CE) I can use and what state they are in
- Find all CEs close to my Storage Elements (SE)
- Rank the resources within the information system
- Get back a list of resources, sorted by goodness
- Find all the CEs where I can run a job that meets my criteria (static and dynamic information)
- Find job status, including sub-job status (the actor here could be a user or a service acting on his behalf)
- Publish dynamic job information (e.g. job states, resources consumed by the job, etc.)
- Update information periodically: every 30 seconds
- Do job tracking / monitoring, including data transfer monitoring

Worker Node (WN) Use Case

- Find which catalogs are available
- Find storage capacity
- Discover storage elements (SE)

Additional Use Cases

- Find user services
- Perform an analytical query (browsing)
- Provide a restricted view for a user (depending on rights)
- Perform analysis on data: query the services that are running instead of the CEs
- Perform service monitoring (for troubleshooting) to signal wrong status, e.g. if a service is down for longer than a threshold; note that history is not stored, although this would be good for monitoring
- Data mining approaches: calculate correlations, find trends, detect job-failure patterns, send notifications, etc.

File Transfer Service Use Case

The functionality is similar to the job submission use case, but for moving data around: instead of looking for CEs, it looks for SEs.
The user defines a source (logical filename) and a destination, and the system runs the queries to discover the endpoints. The priority of data transfers is important. This can be seen in the existing LCG implementation, which manages the data flow between the place where the data is created (CERN) and the major research centers around the world. It is used heavily.

Service Discovery

Before any details could be discussed, the concepts around Service Discovery, such as service discovery, service advertising and service selection, needed to be agreed. Service Discovery is essentially the question asked and the answer which is returned. It was agreed that the questions which could be asked are anything that is generic for all services, and the answer is a handle. In fact there are two handles: the Service Access Point and the Information End Point.

Closely related to this is the question of what a service is and what a resource is. All resources are made visible to the grid via services, but in general there can be a many-to-many relationship between services and resources. For instance, a single Compute Element resource might have both an OSG gatekeeper service and an LCG gatekeeper service; likewise, a site gateway service could be associated with more than one Compute Element resource. It follows that resources will have to have unique identifiers as well, and resource discovery can be viewed as a reverse lookup operation relative to service discovery. A resource is thus seen as a property of a service.

The need for a common Service Discovery interface was discussed and it was agreed that this would be needed.
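The agreed shape of a discovery answer, a handle consisting of a Service Access Point and an Information End Point, together with the many-to-many service/resource relationship, might be modelled like this minimal sketch. All identifiers and URLs below are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceHandle:
    """The two handles returned for each discovered service."""
    service_id: str
    access_point: str    # Service Access Point: where the service is used
    info_endpoint: str   # Information End Point: where its details are queried

# Hypothetical discovery answer for one gatekeeper service.
handle = ServiceHandle("osg-gatekeeper",
                       "gram://gk.example.org:2119",
                       "ldap://gk.example.org:2135")

# Many-to-many service/resource mapping, as in the CE examples above.
service_resources = {
    "osg-gatekeeper": ["ce-resource-1"],
    "lcg-gatekeeper": ["ce-resource-1"],
    "site-gateway":   ["ce-resource-1", "ce-resource-2"],
}

def services_for(resource_id):
    """Resource discovery as a reverse lookup over the service mapping."""
    return sorted(s for s, rs in service_resources.items() if resource_id in rs)

print(services_for("ce-resource-1"))
```

Because several services can front the same resource, the reverse lookup returns a list, which is why resources need unique identifiers of their own.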
There is ongoing work to define APIs for Service Discovery clients in the SAGA activity within OGF. A plug-in specification is also defined to enable the APIs to be used within multiple systems. Similar plug-ins were developed as part of the OGF gin-info activity.

It was agreed that an information provider interface would be useful, so that the developers of the underlying systems (e.g. batch systems and storage systems) can maintain their own providers. The proposed idea is that vendors of the different systems should write their own adapters, based on a common specification, generating the same output. The interface should be simple: a command that produces XML on standard out might suffice. This interface can be used both at the provider and at the plug-in level. There is no need to make this an official standard, but it would help if the recommendation were adopted so that information providers could be shared and possibly developed by the developers of the underlying system being queried. It was suggested that we may want to set up a community repository for providers (plug-ins).

The need for a common schema for each service type was also proposed. An experimental approach could be tried using a subset of GLUE. The semantics of the schema need to be specified for each service type. What information needs to be produced is defined by the schema, and we must not forget about caching to protect against overloading the underlying resource.

The use case of finding information which is specific to a service type was discussed; the name given to this use case was "service query". The main difference between service description and service query is the level of aggregation of the information used. There are two layers: first the list of handles, and second the details. In the first step we can only find out what services are available; in the second step we can ask whether or not the service is working properly, and query further attributes.
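A minimal sketch of an information provider in the style proposed above, assuming only the "command that prints XML on standard out" convention. The element names are invented, since the actual schema would come from the GLUE work.

```python
import sys
import xml.etree.ElementTree as ET

# Hypothetical provider for a batch system: gathers local state and writes
# one XML document to standard out. Element names are illustrative only.
def gather_state():
    # A real provider would query the batch system here.
    return {"RunningJobs": "42", "WaitingJobs": "7", "FreeSlots": "120"}

def main(out=sys.stdout):
    root = ET.Element("ComputeElement", id="ce.example.org")
    for name, value in gather_state().items():
        ET.SubElement(root, name).text = value
    out.write(ET.tostring(root, encoding="unicode"))

if __name__ == "__main__":
    main()
```

An aggregator or plug-in would simply execute the command and parse its output, so providers can be maintained in any language by the developers of the underlying system.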
This second-layer information can only be provided by the selected service itself; this last step is what was called "service query". For this to be achieved, a schema needs to be defined for each service type. The schema must be as generic as possible and the information kept as small as possible.

Static and dynamic attributes were discussed. The conclusion was that all attributes can change; the main difference is the frequency with which we expect them to change. If something is static we don't expect it to change more often than every 6 hours; dynamic attributes change more frequently. Schema design needs to take dynamic values into consideration so that systems can be made more efficient by only moving the dynamic values around at high frequency.

Query Interface

A discussion followed on the need for a common query interface. A service query may involve polling the services themselves or a site-level database such as a BDII. Some services have their own API (e.g. VOMS, the Virtual Organization Membership Service); others don't (e.g. FTS, the File Transfer Service). The issues involved in service-specific queries are (a) defining a schema for each service type, (b) identifying which attributes are considered static and which dynamic, and (c) determining whether the information returned by the query can be reduced to key-value pairs. For some services the GLUE schema is sufficient; for others, new schemas will have to be defined. "Static" attributes are considered to be those that stay the same for 6 hours or more. Schema design needs to plan for moving the dynamic values around with more frequency. It was thought that some of the service-specific queries cannot be reduced to key-value pairs.

Three options were presented: extend the Service Discovery API, have a common generic query interface, or define service-specific APIs.
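The static/dynamic split above suggests caches that refresh the two attribute classes on different schedules. A toy sketch, with the 6-hour and 30-second figures taken from the discussion and all attribute names invented:

```python
import time

# Illustrative refresh intervals: "static" attributes are re-read at most
# every 6 hours, dynamic ones every 30 seconds.
INTERVALS = {"static": 6 * 3600, "dynamic": 30}

class AttributeCache:
    """Caches service attributes, refreshing each class on its own schedule."""
    def __init__(self, fetch):
        self.fetch = fetch  # fetch(kind) -> dict of attribute values
        self.values = {}
        self.last = {kind: float("-inf") for kind in INTERVALS}

    def get(self, now=None):
        now = time.time() if now is None else now
        for kind, interval in INTERVALS.items():
            if now - self.last[kind] >= interval:
                self.values.update(self.fetch(kind))
                self.last[kind] = now
        return dict(self.values)

# Stand-in fetcher; a real cache would contact the service's info endpoint.
def fetch(kind):
    if kind == "static":
        return {"OS": "ScientificLinux", "CPUModel": "Xeon"}
    return {"FreeSlots": 120}

cache = AttributeCache(fetch)
print(cache.get(now=0))
```

Only the dynamic fetch is repeated at high frequency, which is the efficiency the schema design is meant to enable.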
It was agreed that extending the Service Discovery API would not be an option, as key-value pairs cannot express the complex structure of some service types. A service-specific API would be very sensitive to schema changes. The conclusion was that a common query interface would be desirable; however, it was not clear what this should be or how we could agree on one.

Bootstrapping

Investigating the existing systems showed that each grid has a top-level aggregator. The endpoints of these aggregators need to be passed to the configuration of the Service Discovery API so that it can find them. One possibility is that the VO should know these endpoints, as it has already negotiated with the infrastructure to gain access. Grid infrastructures should look at the VOs they support to understand which grids they need to interoperate with, and they can get the endpoints from the VO. An agreed format may need to be worked out.

At the end, some additional ideas were discussed, resulting in a list of research questions:

- How do query languages such as XPath, XQuery, SQL, LDAP and OGSA-DAI cope with large-scale queries (e.g., given the scale of grids in the future)?
- Is there a performance analysis for MDS4 and for BDII? Would it be worth having one?
- How can the information in a grid information system be trusted? Example: latitude and longitude for sites (e.g., Ro EGEE sites in the mountains).

Conclusion

The meeting was very fruitful, with many areas being covered. The main outcomes are summarized as follows.

1. We agree that there is a need for a common way to do Service Discovery.
2. A Service Discovery API is needed; this work will be done in the SAGA working group within OGF.
3. Service Discovery will need a generic description of services, which will be defined in the GLUE working group within OGF.
4. The gin-info group could help to provide the required plug-ins for connecting to other grids.
5. There is a need for service-specific schemas.
This will also be defined in the GLUE working group within OGF.
6. A common information provider interface would be helpful. It was decided that this would be just a command which returns XML. There is no need to make this an official standard, but it would help if the recommendation were adopted so that information providers could be shared and possibly developed by the developers of the underlying system being queried.
7. It would be a good idea to set up a community repository which can be used to share plug-ins etc.
8. In principle, it was agreed that a common query API would be required; however, it wasn't clear how this would be decided.
9. Scalability has to be taken into consideration in the design; it has to be solved for each system, based on that system's own design and problems.

OUTCOMES FROM THE MEETING

The meeting was very fruitful. The participation of real users from real operating grids was a key point for its success. The objectives set for the project were all met. As a result of the meeting, USF wants to collaborate in the analysis of current information system utilization, using existing workloads for job submission and data transfers from the different existing grid infrastructures. The meeting provided us the contacts for getting real usage traces.
We have promises from:

1. Open Science Grid (USA) - contact person is Shaowen Wang
2. R-GMA (UK) - contact person is Steve Fisher
3. CERN Lab (Switzerland) - contact person is Laurence Field

Our objectives for the study of real traces are:

1. Detect usage patterns that characterize user behavior when running processes over the grid
2. Provide metrics for resource usage and propose statistics on the amount of usage of computing and storage resources for different grid services
3. Study correlations between variables such as location, resources, time, user, experiment and application, to understand dependencies
4. Find missing data and propose improvements to the information logged by the systems
5. Predict future usage based on the current system characterization
6. Summarize interoperability issues in service discovery for grids from a practical point of view, and understand the causes and possible solutions
7. Understand scalability issues from real user traces and make informed predictions of future usage, in order to better define scalability requirements for information services in grids

Next Steps

At Open Grid Forum 19, currently taking place in Chapel Hill, NC, there is a session devoted to the Service Discovery API within the SAGA framework. Various funding proposals have been submitted, both to European and to US funding agencies. One European group is seeking funding to work on the above-mentioned Service Discovery API within SAGA; a US group is seeking funding to work on Modular Information Providers for use in Globus MDS4. The Europeans expected an answer in January on whether their proposal would be approved. As mentioned in the trip report filed by the University of South Florida researchers, they have laid out a plan for getting real usage traces of real user behavior on the grid. These traces would not be limited to information systems but would include all the various resources used in a typical grid use case.
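As a flavor of the planned trace analysis, the first objectives above amount to aggregations over trace records. A toy sketch with an invented record format; real traces would carry far richer fields (VO, timestamps, storage use, job outcome, and so on):

```python
from collections import defaultdict

# Hypothetical trace records: (user, site, cpu_hours).
trace = [
    ("alice", "CERN-PROD", 4.0),
    ("bob",   "FNAL-FERMIGRID", 2.5),
    ("alice", "FNAL-FERMIGRID", 1.5),
]

def usage_by(trace, key_index):
    """Aggregate CPU hours by the chosen trace field (user=0, site=1)."""
    totals = defaultdict(float)
    for record in trace:
        totals[record[key_index]] += record[2]
    return dict(totals)

print(usage_by(trace, 0))  # per-user usage
print(usage_by(trace, 1))  # per-site usage
```

Correlating such aggregates across users, sites and time is the kind of characterization the USF group plans to perform on the real traces.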
Informal discussions with NorduGrid staff present at the meeting indicated a willingness to continue exploring ways to collaborate with OSG on information services. The next step in this collaboration is unlikely to be known until some of the work on the Service Discovery API is actually done within the OGF. It is my understanding that NorduGrid is planning to base whatever its next software release will be on the common API when it comes out.

Although the consensus statement above doesn't explicitly reflect it, most of the participants agreed that not much could be done on the harder problem of service and resource selection until the service discovery API is first defined in the OGF. That is why #8 above refers to a common query API being required while its implementation remains unclear: that common query API refers to the selection process, not to service discovery.

Useful Web Links

MDS2:      http://www.globus.org/toolkit/mds
BDII:      https://twiki.cern.ch/twiki/bin/view/EGEE/BDII
MDS4:      http://www.globus.org/toolkit/mds4
R-GMA:     http://www.r-gma.org
Grimoires: http://www.grimoires.org
gLite:     http://glite.web.cern.ch/glite
NAREGI:    http://www.naregi.org/index_e.html
CIM:       http://www.dmtf.org/standards/cim
OGSA-DAI:  http://www.ogsadai.org.uk/
SAGA:      https://forge.gridforum.org/projects/saga-rg
GLUE:      http://glueschema.forge.cnaf.infn.it/
EGEE Service Discovery presentation: https://twiki.cern.ch/twiki/bin/view/EGEE/ServiceDiscovery/2006-09-05-EGEE-SD.pdf
EGEE:      http://www.eu-egee.org
NorduGrid: http://www.nordugrid.org
OSG:       http://www.opensciencegrid.org
TeraGrid MDS: http://mds.teragrid.org:8080/webmds/webmds?info=indexinfo&xsl=tg_gluesummaryxsl