An Online Credential Retrieval System for the Grid Security Infrastructure

Frohner Ákos <Akos.Frohner@cern.ch>
Lőrentey Károly <lorentey@elte.hu>

Abstract

Authentication methods based on public key infrastructure rely on secure access to the users’ public and private keys. An online credential retrieval system (OCRS) addresses key-management concerns in the Grid by storing these credentials in a centralized, secure repository. In this paper, we describe an OCRS implementation particularly well suited to the requirements of the Grid Security Infrastructure. We focus primarily on the further development of this system rather than on the need for such an infrastructure.

1 Introduction

In this article we present a security subsystem that enables roaming users and long-running jobs to use credential-based authentication. In Sections 2 and 3, we describe the goals of the two projects collaborating on the development of the OCRS: Hungary’s DemoGRID project and CERN’s Large Hadron Collider Computing Grid project. Section 4 discusses the advantages of maintaining a central credential repository. Section 5 shows typical usage scenarios of the OCR system. Open questions regarding the implementation are discussed in Section 6. Finally, in Section 7 we list our collaborators and summarize our future plans.

2 The DemoGRID Project

It has recently become a worldwide trend among scientific and research institutes to connect their clusters and supercomputers over high-speed networks, unifying their resources in the Grid. The DemoGRID project intends to strengthen Grid development and usage in Hungary. It will collaborate closely with other projects involved in advanced Grid technology and scalable storage solutions, including DataGRID, LHC Computing Grid and Sloan Digital Sky Survey. The project's testbed is planned to connect the heterogeneous computing resources of eight universities and research labs into a single computing facility.
This meta-computer, or virtual supercomputer, would have 300 hosts and 5 terabytes of storage connected by the National Information Infrastructure Development Program network. To demonstrate the applicability of the Grid, pilot projects and applications will be developed to solve real-world scientific problems in several research domains, including physics, neural biology and cosmology.

2.1 Working Areas

During the DemoGRID project, several aspects of the Grid infrastructure will be evaluated and enhanced to meet the requirements of our pilot applications (see Figure 1).

Subsystem                              Institutes
General Grid architecture              1, 2
Storage subsystem
    Relational databases               3
    Object-oriented databases          1, 3
    Geometric databases                3, 4
    Distributed filesystems            1, 3
Monitoring subsystem                   2
Security subsystem                     1
Applications
    Data intensive algorithms          1, 3
    Domain decomposition algorithms    3, 4
    Tightly coupled algorithms         1
    Loosely coupled algorithms         3
Hardware
    CPU (300)                          1, 2, 3, 4
    Network, switch, NIIF              1, 2, 3, 4, 5
    Storage (5 TB)                     1, 3

Figure 1 DemoGRID-related subsystems of the Grid architecture

2.1.1 General Grid Architecture

There are mature tools that successfully address the communication and data storage problems of supercomputers and clusters on local or site-wide networks. Extending these solutions to geographically widespread, heterogeneous networks, however, requires different approaches in some respects. A meta-computing environment is to be built on top of the custom computing environments. The Globus Toolkit or a similar system will be the basis of this environment.

2.1.2 Storage Subsystem

Within a couple of years, applications will require storage capacities on the order of petabytes. Storage solutions must scale to this order without a change in the basic technology.

2.1.3 Monitoring Subsystem

Monitoring is essential for developing effective applications, for debugging, and for the efficient use of Grid resources.
2.1.4 Security Subsystem

Existing security solutions are not applicable in widespread, heterogeneous systems, either because of their technological limitations or because of their high costs. We believe that an adequate infrastructure based on open source software can be implemented.

2.1.5 Application Development

It is an important goal of the DemoGRID project to develop and run pilot applications in real environments and to aid the early adoption of this technology. Typical types of applications will be considered, to ensure the system's applicability to a wide variety of problems.

2.1.6 Hardware Installation

We plan to expand the available storage systems to 5 terabytes, and the CPU farms to 300 processors. We will also improve the local networks and use the NIIF (National Information Infrastructure Development Program) network infrastructure for inter-site communication.

2.2 DemoGRID Participants

(The numbers in Figure 1 above indicate which institutes participate in the given sub-project.)

1. Eötvös Loránd University of Sciences
2. MTA Computer and Automation Research Institute
3. MTA KFKI, Research Institute for Particle and Nuclear Physics
4. Széchenyi István University of Applied Sciences
5. MTA, Research Institute for Technical Physics and Materials Sciences

2.3 International Relations

CERN LHC Computing: We join the LHC Computing Grid project at CERN, as the two projects have many goals in common.

DataGRID: We join the European DataGRID project through several sub-projects.

Sloan Digital Sky Survey: We plan to partially mirror the American database of SDSS. In the long term we plan full mirroring and to become their European Tier 0 partner.

AliEn: The ALIce production Environment for CERN's heavy-ion experiment.

3 CERN

CERN, the European Organization for Nuclear Research, funded by 20 European nations, is constructing a new particle accelerator on the Swiss-French border on the outskirts of Geneva.
When it begins operation in 2006, this machine, the Large Hadron Collider (LHC), will be the most powerful machine of its type in the world, providing research facilities for several thousand High Energy Physics (HEP) researchers from all over the world. The computational requirements of the experiments that will use the LHC are enormous: 5-8 PetaBytes of data will be generated each year, the analysis of which will require some 10 PetaBytes of disk storage and the equivalent of 200,000 of today's fastest PC processors. Even allowing for the continuing increase in storage densities and processor performance, this will be a very large and complex computing system, and about two thirds of the computing capacity will be installed in "regional computing centers" spread across Europe, America and Asia.

The computing facility for LHC will thus be implemented as a global computational grid, with the goal of integrating large, geographically distributed computing fabrics into a virtual computing environment. There are challenging problems to be tackled in many areas, including: distributed scientific applications; computational grid middleware; automated computer system management; high performance networking; object database management; security; global grid operations. LHC poses exceptional security challenges, given the conflicting needs of providing data and computational resources worldwide to a large and open scientific community while assuring the confidentiality of the data and operating its computing facilities securely.

The development and prototyping work is being organized as a project that includes many scientific institutes and industrial partners, coordinated by CERN.
The project will be integrated with several European national computational grid activities (such as GridPP in the United Kingdom and the INFN Grid in Italy), and it will collaborate closely with other projects involved in advanced grid technology and high performance wide area networking, such as GEANT, DataGrid and DataTAG (partially funded by the European Union), and GriPhyN, Globus, iVDGL and PPDG (funded in the US by the National Science Foundation and Department of Energy).

CERN is involved in the EU-funded DataGrid project, which aims to develop technologies for data-intensive applications, like the ones in the LHC Computing Grid.

4 Online Credential Retrieval Systems

The Grid Security Infrastructure uses X.509 certificates for authentication. The system supports single sign-on and credential delegation through proxy certificates, which are newly generated, limited-lifetime X.509 certificates signed by the user’s private key. These proxy credentials limit the server’s and user’s vulnerability in the case of a key compromise.

In order to create proxy certificates, the user must have access to her private key. Often, this problem is solved simply by storing the password-encrypted key in a file in the user’s home directory. This simple approach puts the burden of key management on the user, who may not be able or willing to protect her key effectively from compromise or loss. Some users need to access the Grid from many independent devices; securely distributing the private key to all these devices would be difficult. Sometimes a user needs multiple credentials to access different services, which only increases her burden.

An Online Credential Retrieval System stores the users’ credentials (certificates and private keys) in a central repository, and automatically issues proxy certificates at the users’ request.
This centralized credential database considerably simplifies the tasks of both the security administrator and the user base, while at the same time improving the security of the whole system.

The OCRS may also help when a job takes an unexpectedly long time to finish. The proxy certificate of a long-running job may expire before the job has finished executing. By contacting the OCR server and authenticating with the nearly expired credential, the job may request a proxy with an extended lifetime without user intervention.

5 Usage Scenarios

In this section, we provide some typical usage scenarios demonstrating the roles and operation of the OCR server.

5.1 Basic Interaction

[Diagram labels: OCR server, Client, dn, userid, certid, auth, cert, secret, pwd(PC, PS)]
Figure 2 Simple Client-Server Interaction

A user is identified by her distinguished name (dn). The client is a workstation or Grid Portal server, which requests a token on the user's behalf to be used later in the Grid to prove the user's identity.

The user's original certificate (cert) is stored on the Online Credential Retrieval (OCR) server together with its secret key (secret). More than one certificate may be associated with one user (or dn). In this case the user may select the appropriate certificate by its unique identifier or an associated selector string (certid).

The OCR server never returns the original certificate, only a short-lived proxy certificate (PC). The proxy certificate can be used as a substitute wherever the original certificate would be used. Its only limitation is its short lifetime: one day or one week by default, or shorter at the client's request. For a normal query the OCR server returns the proxy certificate (PC) with its secret key (PS).

The OCR server may issue more restricted certificates at the client's request. In this case the client must supply the restriction clause, which is added to the generated proxy certificate (e.g. allow reading a specific file). The OCR server returns the restricted certificate (RC) with its secret key (RS), which can be safely passed to an external service (e.g. a job executing host).

The client must supply enough information to identify the user (find a unique dn) and to authenticate its request. If multiple certificates are associated with the user, the client should also supply a selector for a unique certificate.

5.2 GRID Usage

[Diagram labels: Client, scheduler, exec. host (two), 1st file server, 2nd file server; PC or RC passed from client to scheduler and on to the executing hosts; RC passed from the executing hosts to the file servers]
Figure 3 Using GRID Services with Certificates

The client requests the proxy certificate and its secret key for normal daily work in the Grid environment. To submit batch jobs and allow the job executing machines to access remote files, the user may pass on her proxy certificate. Alternatively, a restricted certificate may be issued to constrain how the credential can be used. This extra certificate might be issued by the OCR server, the client host or a trusted job scheduler. Choosing a trusted host for this step is the user's decision, so this functionality should also be included in the OCR server.

5.3 Certificate Revocation Lists

[Diagram labels: OCR server, Client, CA, CRL, GRID, File server; "Check the CRL" appears at the OCR server (before issuing PC) and at the GRID service (on receiving PC or RC)]
Figure 4 CRL Checking

A service that authenticates or authorizes a client using certificates should check the certificate revocation list (CRL) of the certificate's original signing authority (CA) to see whether the certificate has been revoked. The service may use the Online Certificate Status Protocol (OCSP) for this verification. The OCR server should also perform this verification before issuing a proxy certificate for a client.
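The revocation check that precedes issuing a proxy can be sketched as follows. This is a minimal illustration under stated assumptions, not the OCR server's implementation: the names `Certificate`, `is_revoked` and `issue_proxy` are hypothetical, a CRL is modelled as a plain set of revoked serial numbers per CA, and no real X.509 parsing or signing takes place.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Certificate:
    """Hypothetical stand-in for an X.509 certificate (no real crypto)."""
    subject_dn: str
    issuer_ca: str
    serial: int


class RevokedError(Exception):
    """Raised when a proxy is requested for a revoked certificate."""


def is_revoked(cert: Certificate, crls: dict[str, set[int]]) -> bool:
    """Return True if the certificate's serial number is on its CA's CRL.

    `crls` maps a CA name to its set of revoked serial numbers. A real
    service would fetch and verify the signed CRL, or query OCSP instead.
    A CA with no known CRL raises KeyError rather than silently passing.
    """
    return cert.serial in crls[cert.issuer_ca]


def issue_proxy(cert, crls, lifetime=timedelta(days=1), now=None):
    """Refuse revoked certificates; otherwise describe the proxy to issue."""
    if is_revoked(cert, crls):
        raise RevokedError(f"{cert.subject_dn} was revoked by {cert.issuer_ca}")
    now = now or datetime.now(timezone.utc)
    # A real server would generate a key pair and sign a proxy certificate here.
    return {"issued_for": cert.subject_dn, "not_after": now + lifetime}
```

The one-day default mirrors the default proxy lifetime mentioned in Section 5.1; a client asking for a shorter lifetime would pass a smaller `lifetime` value.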
5.4 Roaming Client

[Diagram labels: OCR server (high level), OCR server (local), OCR server (replica), OCR server (remote), Client, Roaming client, firewall 1, firewall 2, local query, remote query 1, remote query 2, replication protocol]
Figure 5 Roaming Client and Replication

A user may store her certificates on an institutional OCR server. If this user logs into a client that belongs to the same institute (domain or realm), the client machine will query the default local OCR server for a proxy certificate.

If the user logs into a client that belongs to another institute, the default OCR server there will not be able to issue a proxy certificate on the user's behalf, since it has no information about the original certificate. The remote or roaming client has to locate and contact the OCR server that holds the certificate, using a direct or indirect access path.

Locating the original OCR server (discovering its IP address) is addressed in a separate document, in the general context of the DataGrid ServiceIndex problem.

Whether a direct or an indirect path is used depends on the firewall configurations of the participating networks. A cautious remote network administrator may restrict access to the external network to well-known ports and hosts, which implies using an indirect path for the query. (See remote query 2 in Figure 5.)

6 Open Issues

6.1 Cleartext Database

In our opinion a password must not travel through the network, not even in encrypted form. The basic argument for this principle is that the protocol might otherwise be vulnerable to server impersonation or a man-in-the-middle attack, because the user sends the secret password from the trusted client host to an unknown entity. Although the client may check the server's identity in a local query, it may have to trust unknown entities during a remote or roaming interaction.
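One common way to authenticate without the password ever leaving the client is a challenge-response exchange. The following is a minimal sketch under stated assumptions, not the OCRS protocol: the function names, the PBKDF2 parameters and the idea of the server holding a password-derived key are illustrative choices that the text does not specify.

```python
import hashlib
import hmac
import secrets


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a shared secret from the password (PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def make_challenge() -> bytes:
    """Server side: a fresh random nonce for each authentication attempt."""
    return secrets.token_bytes(32)


def respond(password: str, salt: bytes, challenge: bytes) -> bytes:
    """Client side: prove knowledge of the password without sending it."""
    key = derive_key(password, salt)
    return hmac.new(key, challenge, hashlib.sha256).digest()


def verify(stored_key: bytes, challenge: bytes, response: bytes) -> bool:
    """Server side: recompute the expected response, compare in constant time."""
    expected = hmac.new(stored_key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)
```

Only the nonce and the keyed digest cross the network, and a fresh challenge per attempt prevents replay of an old response. The sketch does not authenticate the server: a rogue server could still collect a response and attempt offline password guessing, so such an exchange would have to be combined with verification of the server's identity.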
The first, and most important, implication of this principle is that the server cannot store the certificate and the secret key in an encrypted format, since it will not have the password to decrypt them. The clear-text database increases the security risk on the server side, so this service should be separated and run on a dedicated machine.

Although this increases the security risk on the server side, it considerably reduces the risk of client-server protocol errors and greatly simplifies adding alternative authentication methods, such as one-time passwords (OTP) or Kerberos. The authentication can be separated from the service by encrypting the proxy certificate and private key with a session key that is independent of the authentication method.

6.2 Database Backend, Replication

The OCR server may use a database backend to speed up certificate lookups. The database backend must have a clean interface, so that it can be tailored to local needs and platform restrictions. For light loads, a simple text-file-based solution might be adequate. If the load is high but the database is small, a memory-based hash table might give the best performance. If the load is high and the database is large, one should choose an appropriate database for the server's platform.

Since the database implementation is not determined by the OCR server, the server cannot rely on a replication solution supplied by the database vendor for OCR replication. Replication also has to be solved independently of the client and administration protocols.

To simplify the requirements, replica servers must not accept modifications from clients, only from the single master server (read-only replicas). If the master server fails, the administrator can reconfigure and restart the system manually, choosing a replica server to function as the new master server.
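The backend interface and the read-only replica rule described above might look like the following. This is a minimal sketch, assuming hypothetical names (`CredentialBackend`, `MemoryBackend`, `ReadOnlyReplica`) and assuming that the sender of an update has already been authenticated by the transport layer; the text does not prescribe an API.

```python
from abc import ABC, abstractmethod
from typing import Optional


class CredentialBackend(ABC):
    """Hypothetical storage interface; the OCR server depends only on this."""

    @abstractmethod
    def get(self, dn: str, certid: str) -> Optional[bytes]: ...

    @abstractmethod
    def put(self, dn: str, certid: str, credential: bytes) -> None: ...


class MemoryBackend(CredentialBackend):
    """In-memory hash table: suited to a high load on a small database."""

    def __init__(self) -> None:
        self._table: dict[tuple[str, str], bytes] = {}

    def get(self, dn, certid):
        return self._table.get((dn, certid))

    def put(self, dn, certid, credential):
        self._table[(dn, certid)] = credential


class ReadOnlyReplica:
    """Replica front end: serves lookups, accepts changes only from the master."""

    def __init__(self, backend: CredentialBackend, master_id: str) -> None:
        self._backend = backend
        self._master_id = master_id

    def lookup(self, dn, certid):
        return self._backend.get(dn, certid)

    def apply_update(self, sender_id, dn, certid, credential):
        # sender_id is assumed to be authenticated before this call.
        if sender_id != self._master_id:
            raise PermissionError("replicas accept modifications only from the master")
        self._backend.put(dn, certid, credential)
```

Because the replica talks to storage only through `CredentialBackend`, a text-file or vendor database implementation could replace `MemoryBackend` without touching the replication rule.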
One possible solution is to add a special database backend that not only applies the modifications to the local database, but also forwards them to the replica servers.

6.3 Client Interface

The client interface must support the following features:

- Request for a proxy certificate
- Request for a restricted certificate

6.4 Administrative Interface

The following functions must be implemented in the administrative interface of the OCRS:

- Creating a new user (dn)
- Changing a user's attributes (e.g. disable)
- Deleting a user (with all the associated information)
- Associating an authentication method with the user (e.g. userid and password, Kerberos principal, one-time password)
- Password authentication: changing the user's password
- OTP authentication: generation and download of a set of passwords
- Upload of a new certificate
- Generation of a new certificate and request for a signature from the local CA
- Changing certificate attributes (e.g. disable)
- Deleting a certificate

6.5 Protocols

The above protocols transfer certain data structures among networked machines. This requires platform-independent encoding of the data on the sender side and decoding on the receiver side. The two most promising schemes for this purpose are ASN.1 and XML. ASN.1 has already proven its value in various Internet protocols as an efficient binary format for high-load services. XML, on the other hand, is supported by a wide variety of programming languages, and its human-readable format can significantly shorten the debugging phase of the implementation.

Both formats have their merits in certain situations, so it is better to offer them as alternative ways to access the server. Supporting multiple formats will hopefully not require implementing multiple servers, since the mapping between them is defined by [ASN-XML].

Following this line of thought, one may also consider using other generic network protocols, such as RPC, RPC2, CORBA or DCOM.
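To make the XML option concrete, a proxy-certificate request could be encoded and decoded as shown here. This is only a sketch: the element names (`proxy_request`, `userid`, `certid`, `lifetime_hours`, `restriction`) and the choice of fields are hypothetical, since the text does not define a wire format, and the ASN.1 alternative is not shown.

```python
import xml.etree.ElementTree as ET


def encode_request(userid: str, certid: str, lifetime_hours: int,
                   restriction: str = "") -> bytes:
    """Serialize a proxy-certificate request; element names are illustrative."""
    root = ET.Element("proxy_request")
    ET.SubElement(root, "userid").text = userid
    ET.SubElement(root, "certid").text = certid
    ET.SubElement(root, "lifetime_hours").text = str(lifetime_hours)
    if restriction:
        # Present only when a restricted certificate is requested.
        ET.SubElement(root, "restriction").text = restriction
    return ET.tostring(root, encoding="utf-8")


def decode_request(data: bytes) -> dict:
    """Parse the request on the receiving side into a plain dictionary."""
    root = ET.fromstring(data)
    if root.tag != "proxy_request":
        raise ValueError(f"unexpected root element: {root.tag}")
    request = {
        "userid": root.findtext("userid"),
        "certid": root.findtext("certid"),
        "lifetime_hours": int(root.findtext("lifetime_hours")),
    }
    restriction = root.findtext("restriction")
    if restriction is not None:
        request["restriction"] = restriction
    return request
```

The encoded bytes are human-readable, which is the debugging advantage noted above; the same dictionary could equally be produced by an ASN.1 decoder, keeping the server logic independent of the encoding.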
6.6 CRL Caching

CRL information should be checked before issuing a proxy certificate, but looking up the CA's CRL on each request might generate a high network load. To decrease this load, a caching mechanism for CRL entries may be implemented. The entries in the cache may be either positive (the certificate is on the list) or negative (the certificate is not on the list), but both types of entries must have an expiration time. A similar mechanism was developed for DNS, so we might consider adopting that approach.

This lookup will be necessary not only in OCR servers, but also in services using certificates for authentication; therefore, its implementation should be independent of the OCR server, and it should be accessible as a separate library.

This caching mechanism can also be implemented as a proxy service for the Online Certificate Status Protocol. This would provide transparent caching for existing applications using OCSP.

7 Future Plans

- Finalize the CRL handling scheme.
- Finalize the client-server protocol using the MyProxy implementation.
- Finalize the administrative interface/protocol.
- Design the roaming discovery/query protocol; it might affect the local query as well.
- Design the new OCR server with the following features: OTP/Kerberos authentication, CRL handling, replication, roaming service, database backend.
- Implement the new OCR server, based on the experience gained with MyProxy.
- Implement the client libraries and programs (C, Java and Perl libraries; Apache and Tomcat authentication plug-ins) in parallel with the server implementation.

We intend to collaborate with the other developers working in this area:

- Jim Basney - MyProxy/Globus
- John White - GridPortal/DataGrid
- Milos - Scheduler/DataGrid

We keep in contact with the Globus development team to remain compliant with the Globus Security Infrastructure. We also work together with the DataGrid development team to integrate these solutions into services and applications.
8 References

[PC] Internet X.509 Public Key Infrastructure Proxy Certificate Profile
    http://www.ietf.org/internet-drafts/draft-ietf-pkix-proxy-01.txt

[OCR] GSI Online Credential Retrieval - Requirements
    http://www.gridforum.org/security/ggf3_2001-10/drafts/draft-ggf-gsi-ocr-requirements-00.pdf

[MyProxy] OCR implementation for the Grid Portal Collaboration
    http://dast.nlanr.net/Projects/MyProxy/

[RFC 2510] Internet X.509 Public Key Infrastructure Certificate Management Protocols
    ftp://ftp.isi.edu/in-notes/rfc2510.txt

[RFC 3157] Securely Available Credentials - Requirements
    ftp://ftp.isi.edu/in-notes/rfc3157.txt

[ASN-XML] What ASN.1 can offer to XML
    http://asn1.elibel.tm.fr/en/xml/

[RFC 2560] Online Certificate Status Protocol
    ftp://ftp.isi.edu/in-notes/