An Online Credential Retrieval System
for the Grid Security Infrastructure
Frohner Ákos <Akos.Frohner@cern.ch>
Lőrentey Károly <lorentey@elte.hu>
Abstract
Authentication methods based on public key infrastructure rely on secure access to the users’
public and private keys. An online credential retrieval system (OCRS) addresses key-management concerns in the Grid by storing these credentials in a centralized, secure
repository.
In this paper, we describe an OCRS implementation particularly well suited for the
requirements of the Grid Security Infrastructure. We primarily focus on discussing the
further development of this system rather than the need for such an infrastructure.
1 Introduction
In this article we present a security subsystem that enables roaming users and long-running jobs to use
credential-based authentication.
In Sections 2 and 3, we describe the goals of the two projects collaborating on the development of OCRS,
Hungary’s DemoGRID project and CERN’s Large Hadron Collider Computing Grid project. Section 4
discusses the advantages of maintaining a central credential repository. Section 5 shows typical usage
scenarios of the OCR system. Open questions regarding the implementation are discussed in Section 6.
Finally, in Section 7 we list our collaborators and give a summary of our future plans.
2 The DemoGRID Project
Recently it has become a worldwide trend among scientific and research institutes to connect their
clusters and supercomputers using high-speed network connections, unifying their resources in the Grid.
The DemoGRID project intends to strengthen Grid development and usage in Hungary. It will collaborate
closely with other projects involved in advanced Grid technology and scalable storage solutions, including
DataGRID, LHC Computing Grid and Sloan Digital Sky Survey.
The testbed of this project is planned to connect the heterogeneous computing resources of eight universities
and research labs into a single computing facility. This meta-computer or virtual supercomputer would
have 300 hosts and 5 terabytes of storage connected by the National Information Infrastructure Development
Program network. To demonstrate the applicability of the Grid, pilot projects and applications will be
developed solving real-world scientific problems in several research domains, including physics, neural
biology and cosmology.
2.1 Working Areas
During the DemoGRID project, several aspects of the Grid infrastructure will be evaluated and enhanced to
meet the requirements of our pilot applications (see Figure 1).
Subsystem                            Participating institutes
General Grid architecture            1, 2
Storage subsystem
  Relational databases               3
  Object-oriented databases          1, 3
  Geometric databases                3, 4
  Distributed filesystems            1, 3
Monitoring subsystem                 2
Security subsystem                   1
Applications
  Data intensive algorithms          1, 3
  Domain decomposition algorithms    3, 4
  Tightly coupled algorithms         1
  Loosely coupled algorithms         3
Hardware
  CPU (300)                          1, 2, 3, 4
  Network, switch, NIIF              1, 2, 3, 4, 5
  Storage (5 TB)                     1, 3
Figure 1 DemoGRID-related subsystems of the Grid architecture
2.1.1 General Grid Architecture
There are mature tools that successfully address the communication and data storage problems of
supercomputers and clusters on local or site-wide networks, but in some aspects, extending these solutions
to connect geographically widespread, heterogeneous networks requires different approaches.
A meta-computing environment is to be built on top of the custom computing environments. The Globus
Toolkit or a similar system will be the basis of this environment.
2.1.2 Storage Subsystem
Within a few years, applications will require storage capacities on the order of petabytes. Storage solutions
must scale to this order without a change in the basic technology.
2.1.3 Monitoring Subsystem
Monitoring is essential for developing effective applications, debugging and the efficient use of Grid
resources.
2.1.4 Security Subsystem
Existing security solutions are not applicable in widespread, heterogeneous systems, either because of their
technological limitations or because of high costs. We believe that an adequate infrastructure based on open
source software can be implemented.
2.1.5 Application Development
It is an important goal of the DemoGRID project to develop and run pilot applications in real environments
and to aid the early adoption of this technology. Typical types of applications will be considered, to ensure
the system's applicability to a large variety of problems.
2.1.6 Hardware Installation
We plan to expand the available storage systems to 5 terabytes, and the CPU farms to 300 processors. We
will also improve the local networks and use the NIIF (National Information Infrastructure Development
Program) network infrastructure for inter-site communication.
2.2 DemoGRID Participants
(Numbers in Figure 1 above indicate which institutes participate in the given sub-project.)
1. Eötvös Loránd University of Sciences
2. MTA Computer and Automation Research Institute
3. MTA KFKI, Research Institute for Particle and Nuclear Physics
4. Széchenyi István University of Applied Sciences
5. MTA, Research Institute for Technical Physics and Materials Sciences
2.3 International Relations
CERN LHC Computing
We join the LHC Computing Grid project at CERN, as the two projects have many goals in
common.
DataGRID
We join the European DataGRID project through several sub-projects.
Sloan Digital Sky Survey
We plan to partially mirror the American SDSS database. In the long term, we plan full mirroring
and to become their European Tier 0 partner.
AliEn
The ALIce production Environment for the CERN heavy-ion experiment.
3 CERN
CERN, the European Organization for Nuclear Research, funded by 20 European nations, is constructing a
new particle accelerator on the Swiss-French border on the outskirts of Geneva. When it begins operation
in 2006, this machine, the Large Hadron Collider (LHC) will be the most powerful machine of its type in
the world, providing research facilities for several thousand High Energy Physics (HEP) researchers from
all over the world.
The computational requirements of the experiments that will use the LHC are enormous: 5-8 PetaBytes of
data will be generated each year, the analysis of which will require some 10 PetaBytes of disk storage and
the equivalent of 200,000 of today's fastest PC processors. Even allowing for the continuing increase in
storage densities and processor performance this will be a very large and complex computing system, and
about two thirds of the computing capacity will be installed in "regional computing centers" spread across
Europe, America and Asia.
The computing facility for LHC will thus be implemented as a global computational grid, with the goal of
integrating large geographically distributed computing fabrics into a virtual computing environment. There
are challenging problems to be tackled in many areas, including distributed scientific applications;
computational grid middleware; automated computer system management; high performance networking;
object database management; security; and global grid operations.
The LHC poses exceptional security challenges, given the conflicting needs of providing data and computational
resources worldwide to a large and open scientific community while assuring the confidentiality of the
data and operating its computing facilities in a secure way.
The development and prototyping work is being organized as a project that includes many scientific
institutes and industrial partners, coordinated by CERN. The project will be integrated with several
European national computational grid activities (such as GridPP in the United Kingdom and the INFN Grid
in Italy), and it will collaborate closely with other projects involved in advanced grid technology and high
performance wide area networking, such as: GEANT, DataGrid and DataTAG (partially funded by the
European Union), GriPhyN, Globus, iVDGL and PPDG (funded in the US by the National Science
Foundation and Department of Energy).
CERN is involved in the EU-funded DataGrid project, which aims to develop technologies for data-intensive applications, like the ones in the LHC Computing Grid.
4 Online Credential Retrieval Systems
The Grid Security Infrastructure uses X.509 certificates for authentication. The system supports single sign-on and credential delegation through proxy certificates, which are newly generated, limited-lifetime X.509
certificates signed by the user’s private key. These proxy credentials limit the server’s and user’s
vulnerability in the case of a key compromise.
In order to create proxy certificates, the user must have access to her private key. Often, this problem is
solved simply by storing the password-encrypted key in a file in the user’s home directory. This simple
approach puts the burden of key-management on the user, who may not be able or willing to effectively
protect her key from compromise or loss. Some users need to access the Grid from many independent
devices; the secure distribution of the private key to all these devices would be difficult. Sometimes a user
needs multiple credentials to access different services, which only increases her burden.
An Online Credential Retrieval System stores the users’ credentials (certificates and private keys) in a
central repository, and automatically issues proxy certificates on the users’ request. This centralized
credential database considerably simplifies the tasks of both the security administrator and the user base,
while at the same time improving the security of the whole system.
The OCRS may also help when a job takes an unexpectedly long time to finish. The proxy certificate of a long-running job may expire before the job finishes execution. By contacting the OCR server and
authenticating with the nearly expired credential, the job may request a proxy with an extended lifetime
without user intervention.
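The renewal decision for a long-running job can be sketched as follows. This is only an illustration: the function name `renewal_action` and the one-hour `RENEWAL_MARGIN` are assumptions made here, not part of the OCRS design. The point it shows is that renewal has to be requested while the old proxy is still valid, because the job authenticates with it.

```python
from datetime import datetime, timedelta

# Hypothetical safety margin: ask for a new proxy once less than this remains.
RENEWAL_MARGIN = timedelta(hours=1)


def renewal_action(not_after: datetime, now: datetime,
                   margin: timedelta = RENEWAL_MARGIN) -> str:
    """Classify a proxy certificate by its remaining lifetime.

    Returns "ok" while plenty of lifetime remains, "renew" when the proxy
    is nearly expired but can still authenticate a renewal request, and
    "expired" once it can no longer be used at all.
    """
    remaining = not_after - now
    if remaining <= timedelta(0):
        return "expired"
    if remaining <= margin:
        return "renew"
    return "ok"
```

A job would call such a check periodically and contact the OCR server as soon as it returns "renew"; once the result is "expired", user intervention is needed again.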
5 Usage Scenarios
In this section, we provide some typical usage scenarios demonstrating the roles and operation of the OCR
server.
5.1 Basic Interaction
[Diagram labels: OCR server (cert, secret, dn), Client (userid, auth); messages: userid, certid, auth; pwd(PC, PS).]
Figure 2 Simple Client-Server Interaction
A user can be identified by her distinguished name (dn).
The client is a workstation or Grid Portal server, which requests a token on the user's behalf to be used later
in the Grid to prove the user's identity.
The user's original certificate (cert) is stored on the Online Credential Retrieval (OCR) server together with
its secret key (secret). More than one certificate may be associated with one user (or dn). In this case
the user may select the appropriate certificate by its unique identifier or an associated selector string
(certid).
The OCR server never returns the original certificate, only a short-lived proxy certificate (PC). The proxy
certificate can be used as a substitute in every situation where the original certificate would have been
used. Its only limitation is the short lifetime: one day or one week by default, and it can be even shorter on the
client's request. For a normal query the OCR server returns the proxy certificate (PC) with its secret key (PS).
The OCR may issue more restricted certificates on the client's request. In this case the client must supply
the restriction clause, which is added to the generated proxy certificate (e.g. allow reading a specific file).
The OCR will return the restricted certificate (RC) with its secret key (RS), which can be safely passed to
an external service (e.g. job executing host).
The client must supply enough information to identify the user (find a unique dn) and to authenticate its request.
If multiple certificates are associated with the user, the client should also supply a selector for a unique
certificate.
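The selection rule described above (one dn, possibly several certificates, an optional certid selector) can be sketched as follows. The names `StoredCertificate`, `SelectionError` and `select_certificate` are hypothetical and chosen only for this illustration; authentication of the request and generation of the proxy certificate are deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StoredCertificate:
    certid: str  # unique identifier or selector string
    dn: str      # distinguished name of the owner


class SelectionError(Exception):
    """Raised when the request does not identify exactly one certificate."""


def select_certificate(store: Dict[str, List[StoredCertificate]],
                       dn: str,
                       certid: Optional[str] = None) -> StoredCertificate:
    """Pick the certificate a proxy should be derived from."""
    candidates = store.get(dn, [])
    if certid is not None:
        # The client supplied a selector: keep only matching certificates.
        candidates = [c for c in candidates if c.certid == certid]
    if len(candidates) != 1:
        raise SelectionError(
            f"expected exactly one certificate, found {len(candidates)}")
    return candidates[0]
```

Rejecting an ambiguous request, rather than guessing a default certificate, keeps the choice of credential with the user.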
5.2 GRID Usage
[Diagram labels: Client, scheduler, two exec. hosts, 1st file server, 2nd file server; certificates passed along the links: PC or RC, RC.]
Figure 3 Using GRID Services with Certificates
The client requests the proxy certificate and its secret key for normal daily work in the Grid
environment. To submit batch jobs and to allow access to remote files from the job-executing machines,
the user may pass her proxy certificate along with the job.
A restricted certificate may be issued to constrain how the credential can be used. This extra certificate might be
issued by the OCR server, the client host or a trusted job scheduler.
It is the user's decision which trusted host to choose for this action, so this functionality should also be included
in the OCR server.
5.3 Certificate Revocation Lists
[Diagram labels: OCR server, Client, CA, CRL, GRID, File server; "Check the CRL" appears twice; certificates passed along the links: PC, PC or RC, RC.]
Figure 4 CRL Checking
A service that authenticates or authorizes a client using certificates should check the certificate
revocation list (CRL) of the certificate's original signing authority (CA) to see whether the certificate has been
revoked. The service may use the Online Certificate Status Protocol (OCSP) for this verification.
The OCR server should also perform this verification before issuing a proxy certificate for a client.
5.4 Roaming Client
[Diagram labels: OCR servers (including high level, replica and remote), Client, Roaming client, firewall 1, firewall 2; links: local query, remote query 1, remote query 2, replication protocol.]
Figure 5 Roaming Client and Replication
A user may store her certificates on an institutional OCR server. If this user logs into a client that
belongs to the same institute (domain or realm), the client machine will query the default local OCR server
for a proxy certificate.
If the user logs into a client that belongs to another institute, the default OCR server will not be able to
issue a proxy certificate on the user's behalf, since it has no information about the original certificate.
The remote or roaming client has to locate and contact the OCR server that holds the certificate, using a direct or
indirect access path.
Locating (discovering) the IP address of the original OCR server is addressed in a separate document, in
the general context of the DataGrid ServiceIndex problem.
Whether a direct or an indirect path is used depends on the firewall configurations of the participating networks.
A cautious remote network administrator may restrict access to the external network to well-known
ports and hosts, which implies the use of an indirect path for the query. (See remote query 2 in Figure 5.)
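The choice between the local, direct and indirect query can be summarized in a few lines. The sketch below is an assumption-laden illustration: the names `QueryPath` and `choose_path` and the boolean `direct_access_allowed` are invented here, and discovery of the home server is out of scope.

```python
from enum import Enum


class QueryPath(Enum):
    LOCAL = "local"        # default OCR server of the client's own realm
    DIRECT = "direct"      # roaming client contacts the home server itself
    INDIRECT = "indirect"  # query is relayed because the firewall blocks it


def choose_path(user_realm: str, client_realm: str,
                direct_access_allowed: bool) -> QueryPath:
    """Decide how a client should reach the OCR server holding the certificate."""
    if user_realm == client_realm:
        return QueryPath.LOCAL
    if direct_access_allowed:
        return QueryPath.DIRECT
    return QueryPath.INDIRECT
```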
6 Open Issues
6.1 Cleartext Database
In our opinion a password must not travel through the network, not even in encrypted form.
The basic argument for this principle is that otherwise the protocol might be vulnerable to server
impersonation or a man-in-the-middle attack, because the user sends the secret password from the trusted
client host to an unknown entity. Although the client may check the server's identity in a local query, it may
have to trust unknown entities during a remote/roaming interaction.
The first—and most important—implication of this principle is that the server cannot store the certificate
and the secret key in an encrypted format, since it will not have the password to decrypt them.
The clear-text database increases the security risk on the server side, so this service should be separated and
run on a dedicated machine.
Although this increases the security risk on the server side, it considerably reduces the risk of client-server
protocol errors and greatly simplifies the addition of alternative authentication methods, for example
one-time passwords (OTP) or Kerberos. The authentication can be separated from the service by
encrypting the proxy certificate and private key with a session key that is independent of the authentication
method.
6.2 Database Backend, Replication
The OCR server may use a database backend to speed up certificate lookups.
The database backend must have a clean interface, so that it can be tailored to local needs and platform
restrictions. For light loads, a simple text file-based solution might be adequate. If the load is high but the
database is small, a memory-based hash table might give the best performance. If the load is high and the
database is large, one should choose an appropriate database for the server's platform.
Since the database implementation is not determined by the OCR server, the server cannot rely on the replication
solution supplied by the database vendor for OCR replication. Also, replication has to be solved
independently of the client and administration protocols.
To simplify the requirements, a replica server must not accept any modifications from clients, only
from the one and only master server (read-only replicas). If the master server fails, the administrator could
reconfigure and restart the system manually, and choose a replica server to function as the new master
server.
One possible solution is to add a special database backend which not only sends the modifications to the
local database, but also forwards them to the replica servers.
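A minimal sketch of such a forwarding backend is given below. The `Backend` interface with `put`, `get` and `delete` is an assumption made for this example, not an interface defined by the OCR server, and the replicas are represented by in-process objects where a real deployment would use a network protocol.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class Backend(ABC):
    """Hypothetical minimal storage interface for the OCR server."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class MemoryBackend(Backend):
    """In-memory hash table, suited to a small database under high load."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


class ReplicatingBackend(Backend):
    """Applies each modification locally, then forwards it to the replicas."""

    def __init__(self, local: Backend, replicas: List[Backend]) -> None:
        self._local = local
        self._replicas = replicas

    def put(self, key: str, value: bytes) -> None:
        self._local.put(key, value)
        for replica in self._replicas:
            replica.put(key, value)

    def get(self, key: str) -> Optional[bytes]:
        # Reads are served from the local database only.
        return self._local.get(key)

    def delete(self, key: str) -> None:
        self._local.delete(key)
        for replica in self._replicas:
            replica.delete(key)
```

Because the wrapper implements the same interface as any other backend, the master server needs no replication-specific code, and replication stays independent of the client and administration protocols. Error handling for unreachable replicas is omitted here.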
6.3 Client Interface
The client interface must support the following features:
• Request for a proxy certificate
• Request for a restricted certificate
6.4 Administrative Interface
The following functions must be implemented in the administrative interface of the OCRS:
• Creating a new user (dn)
• Changing a user's attributes (e.g. disable)
• Deleting a user (with all the associated information)
• Associating an authentication method with the user (e.g. userid and password, Kerberos principal,
  one-time password)
• Password authentication: changing the user's password
• OTP authentication: generation and download of a set of passwords
• Upload of a new certificate
• Generation of a new certificate and request for a signature from the local CA
• Changing certificate attributes (e.g. disable)
• Deleting a certificate
6.5 Protocols
The above protocols transfer certain data structures among networked machines. This requires
platform-independent encoding of the data and decoding on the receiver side. The two most promising
schemes for this purpose are ASN.1 and XML.
ASN.1 has already proven its value in various Internet protocols as an efficient binary format for high-load
services. On the other hand, XML is supported by a wide variety of programming languages and can
significantly shorten the debugging phase during implementation thanks to its human-readable format.
Both formats have their merits in certain situations, so it is better to offer them as alternatives for accessing the
server. Supporting multiple formats will hopefully not require the implementation of multiple servers, since their
mapping is defined by [ASN-XML].
Following this line of thought, one may consider using other generic network protocols, such as RPC, RPC2, CORBA or DCOM.
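To illustrate the XML alternative, the sketch below encodes and decodes a proxy request with Python's standard `xml.etree.ElementTree` module. The element names (`proxy-request`, `userid`, `certid`, `restriction`) are invented for this example; no wire format has been fixed yet.

```python
import xml.etree.ElementTree as ET


def encode_request(userid: str, certid: str, restriction: str = "") -> bytes:
    """Serialize a proxy request into a platform-independent XML document."""
    root = ET.Element("proxy-request")
    ET.SubElement(root, "userid").text = userid
    ET.SubElement(root, "certid").text = certid
    if restriction:
        # Only restricted-certificate requests carry a restriction clause.
        ET.SubElement(root, "restriction").text = restriction
    return ET.tostring(root, encoding="utf-8")


def decode_request(data: bytes) -> dict:
    """Parse the XML document back into a field-name to value mapping."""
    root = ET.fromstring(data)
    return {child.tag: child.text for child in root}
```

The encoded request is human-readable, which is the debugging advantage mentioned above. A production server would additionally have to validate the document and guard against malicious input.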
6.6 CRL Caching
CRL information should be checked before issuing a proxy certificate, but looking up the CA's CRL on
each request might generate a high network load. To decrease this load, a caching mechanism for CRL
entries may be implemented. The entries in the cache may be either positive (the certificate is on the list) or
negative (the certificate is not on the list), but both types of entries must have an expiration time.
A similar mechanism was developed for DNS, so we might consider adopting that approach.
This lookup will not only be necessary in OCR servers, but also in services using certificates for
authentication; therefore, its implementation should be independent of the OCR, and it should be accessible
as a separate library.
This caching mechanism can also be implemented as a proxy service for the Online Certificate Status
Protocol. This would provide transparent caching for existing applications using OCSP.
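A cache with positive and negative entries that both expire could look like the following sketch. The class name `CRLCache`, the `lookup` callback (standing in for the real CRL or OCSP query) and the single `ttl` value shared by both entry types are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class CRLCache:
    """Caches revocation answers, positive and negative, for a limited time."""

    def __init__(self, lookup: Callable[[str], bool], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup  # asks the CA whether a serial number is revoked
        self._ttl = ttl        # lifetime of a cache entry in seconds
        self._clock = clock    # injectable so that expiry can be tested
        self._entries: Dict[str, Tuple[bool, float]] = {}

    def is_revoked(self, serial: str) -> bool:
        now = self._clock()
        entry = self._entries.get(serial)
        if entry is not None and now < entry[1]:
            return entry[0]    # still-valid cached answer
        revoked = self._lookup(serial)
        self._entries[serial] = (revoked, now + self._ttl)
        return revoked
```

The expiration time bounds how long a newly revoked certificate can still be accepted on the strength of a stale negative entry, so the `ttl` is a trade-off between network load and revocation latency.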
7 Future Plans
• Finalize the CRL handling scheme.
• Finalize the client-server protocol using the MyProxy implementation.
• Finalize the administrative interface/protocol.
• Design the roaming discovery/query protocol (it might affect the local query as well).
• Design the new OCR server with the following features: OTP/Kerberos authentication, CRL handling,
  replication, roaming service, database backend.
• Implement the new OCR server, based on experience with MyProxy.
• Implement the client libraries and programs (C, Java and Perl libraries; Apache and Tomcat
  authentication plug-ins) in parallel with the server implementation.
We intend to collaborate with the other developers working in this area:
• Jim Basney - MyProxy/Globus
• John White - GridPortal/DataGrid
• Milos - Scheduler/DataGrid
We stay in contact with the Globus development team to remain compliant with the Grid Security
Infrastructure. We also work together with the DataGrid development team to integrate these solutions into
services and applications.
8 References
[PC] Internet X.509 Public Key Infrastructure Proxy Certificate Profile
http://www.ietf.org/internet-drafts/draft-ietf-pkix-proxy-01.txt
[OCR] GSI Online Credential Retrieval - Requirements
http://www.gridforum.org/security/ggf3_2001-10/drafts/draft-ggf-gsi-ocr-requirements-00.pdf
[MyProxy] OCR implementation for the Grid Portal Collaboration
http://dast.nlanr.net/Projects/MyProxy/
[RFC 2510] Internet X.509 Public Key Infrastructure Certificate Management Protocols
ftp://ftp.isi.edu/in-notes/rfc2510.txt
[RFC 3157] Securely Available Credentials - Requirements
ftp://ftp.isi.edu/in-notes/rfc3157.txt
[ASN-XML] What ASN.1 can offer to XML
http://asn1.elibel.tm.fr/en/xml/
[RFC 2560] Online Certificate Status Protocol
ftp://ftp.isi.edu/in-notes/