The GridSite Security Framework

Andrew McNab and Shiv Kaushal
Department of Physics and Astronomy, University of Manchester, Manchester, UK

Abstract

We describe the architecture of the GridSite system, which adds support for several Grid security protocols to the Apache web server platform. These include the Globus GSI authentication system, GACL and XACML access policy files, and DN Lists and VOMS group memberships. The system was originally developed for controlling access to Web sites using Grid credentials, but has now been extended to support Web Services written in any of the languages which can be hosted by Apache. We use the example of a proxy delegation service developed in conjunction with the EGEE project to explain how such Web Services can be built using GridSite/Apache and a SOAP toolkit such as gSOAP. To support high speed access to large data files, GridSite also supports an HTTP Downgrade protocol, which we present. Finally, we describe GridSite's method of using Unix pool accounts to provide partial “sandboxing” of services, which allows remote users to deploy services in the form of scripts and native executables into a third-party hosting service built with GridSite, a model we refer to as GRACE.

1. Introduction

GridSite [1] was originally developed as the management system for the GridPP [2] project's website. The original architecture was a result of the special requirements of GridPP and its need for security to span both Web and Grid environments. Largely due to the influence of the Globus Project [3], large production grids such as the US Open Science Grid [4] and the world-wide LHC Computing Grid [5] have used authentication infrastructures based on X.509 [6] certificates and the SSL/TLS [7] protocol.
The basis of this infrastructure was adapted from the Web, and the only differences from conventional e-commerce HTTPS websites (such as Amazon or eBay) have been the greater use of client-side certificates rather than usernames and passwords, and the Globus proxy extensions to X.509 certificates, described in IETF RFC3820. Grid authorization tools and protocols built on top of this authentication infrastructure have provided ways of managing and publishing group membership information (VO-LDAP, VOMS), of proving group membership via attribute certificates (VOMS), and of describing access policies (GACL) [8].

The first phase of GridSite development described in this paper involved adding these grid extensions, built on what was originally a Web infrastructure, back into the industry-standard Apache [9] web server. This has allowed credentials from the users' Grid activities to be used when accessing or modifying otherwise conventional websites.

The current phase, which this paper concentrates on, reflects the move within production grids from binary protocols to those based on SOAP Web Services, and is largely concerned with programmatic access to data and services. In this, we are providing support for the reuse of mainstream web tools in grid environments.

Finally, we outline the GRACE model for deploying scripts and binary executables on third party hosting service sites using GridSite.

2. Website access control

2.1 Authentication

Users are identified by X.509 certificates loaded into unmodified web browsers. With commonly used browsers such as Firefox or Internet Explorer, several ways of storing the private key may be used, such as in an encrypted key file or in an external hardware token.

2.2 Virtual Organisations

Virtual Organisation and group memberships are defined by lists of members' certificate names (“DN Lists”) or by the Fully Qualified Attribute Names of VOMS attribute certificates.
GridSite stores DN Lists in plain text files, and refers to them by LDAP or HTTPS URLs. DN Lists can be retrieved asynchronously from remote authorization servers, or managed locally using the tools described below. The certificate DNs of authenticated users are simply matched with the file containing the relevant DN List to determine their VO and group memberships.

For handling VOMS attribute certificates, a simple parser of X.509 attribute certificates in ASN.1 [10] format was written. This relies on the invariant layout of the ASN.1 tree of objects in attribute certificates. Rather than define the full set of callback functions for all the ASN.1 objects which these certificates can contain, a simpler approach was used: the ASN.1 tree of data nodes is unrolled, each node is assigned a coordinate, and the invariant coordinate of each node is used to find its value when required.

2.4 Access Policies

An XML Grid Access Control Language (GACL) was developed for use with GridSite and other components of the European DataGrid project. For use with websites, the GACL file governing a particular directory or hierarchy is simply stored in that directory as a file .gacl. This policy file allows read, write, list and admin access (giving the ability to modify the policy itself) to be granted or denied on the basis of X.509 identity, GSI proxies, DN List membership, or the possession of a VOMS attribute certificate.

As an alternative, GridSite now also supports simple XACML [11] policies, which are restricted to have the same content as GACL policies. Nevertheless, they are syntactically correct and can also be evaluated by Sun's reference XACML implementation in Java, for example. When GridSite reads a policy, stored in the .gacl file or otherwise, it determines whether GACL or XACML is present and transparently uses the correct parser. When writing policies, the choice of GACL or XACML can be set by the webserver administrator on a per-directory basis.

2.5 Management tools

Providing straightforward website management tools has been central to the success of GridSite. A CGI executable is provided which allows authorized users to edit HTML pages in place through their web browser, upload files, and manage directories. The history of each file and previous versions are recorded for auditing and giving credit. The management tools also include an editor for access policies, which allows GACL or XACML policies to be constructed using HTML forms and menus, rather than by hand-editing XML files. An editor is also provided for locally-managed DN Lists.

2.6 Programmatic file copying

HTTP servers are frequently used as file servers, to distribute binary files rather than just human-readable web pages. Tools such as wget and curl exist to enable these kinds of downloads at the Unix command line, and are suitable for using with scripts. When GridSite is being used and users can be properly authenticated and authorized according to access policies, it is practical to also support the HTTP PUT method, which allows direct uploading of files. GridSite adds support for HTTP PUT to Apache, subject to the access controls described above. To complement this, GridSite also includes htcp, htls, htll and htrm, a suite of command line tools similar to scp which allow files to be copied, listed or deleted via HTTP(S).

2.7 Dynamic Content

GridSite provides a small amount of formatting for HTML pages, by inserting standard headers and footers and adding links to available page management functions depending on the user's level of authorization. However, GridSite can also co-exist with richer dynamic content systems such as CGI scripts and PHP server-side processing. In this case, GridSite provides access control as before, but the dynamic content operates as normal. If finer grained authorization is required, then the results of GridSite's credential parsing and evaluation are available to scripts as CGI environment variables.
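As an illustration of the finer grained authorization described in Section 2.7, the following Python sketch shows how a CGI script might read values exported by the server. The variable names EXAMPLE_USER_DN and EXAMPLE_PERMISSIONS, and the space-separated permission format, are placeholders assumed for this example only; the names GridSite actually exports are not given in this paper and should be taken from the GridSite documentation.

```python
import os

# Hypothetical variable names: placeholders for illustration, not the
# names GridSite really exports.
DN_VAR = "EXAMPLE_USER_DN"
PERM_VAR = "EXAMPLE_PERMISSIONS"


def read_grid_context(environ=os.environ):
    """Return the user's DN and the set of granted permissions."""
    dn = environ.get(DN_VAR, "")
    # Assumed format: permissions separated by whitespace, e.g. "read list".
    perms = set(environ.get(PERM_VAR, "").split())
    return dn, perms


def may_write(environ=os.environ):
    """Extra check a script could apply on top of the server's decision."""
    dn, perms = read_grid_context(environ)
    return bool(dn) and "write" in perms


def cgi_response(environ=os.environ):
    """Build a minimal CGI response: headers, a blank line, then the body."""
    dn, perms = read_grid_context(environ)
    body = f"user={dn or 'anonymous'} perms={','.join(sorted(perms)) or 'none'}\n"
    status = "200 OK" if may_write(environ) else "403 Forbidden"
    return f"Status: {status}\r\nContent-Type: text/plain\r\n\r\n{body}"


if __name__ == "__main__":
    # When run by the web server, the response goes to stdout.
    print(cgi_response(), end="")
```

Because the script only reads environment variables, the same pattern applies in any language that can be used for CGI programs.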
Later in this paper, we describe how these environment variables have been used to provide support for grid security protocols to Web Services running on Apache web servers.

3. HTTP Downgrade

One of the requirements for High Energy Physics use of the grid is for the distribution of terabyte scale datasets in the form of many files which are several gigabytes in size. When transferring such large datasets between sites, special attention must be given to achieving balanced performance from each component in the stack of disk, operating system, server software, network protocol and network infrastructure to avoid bottlenecks. The HTTP Downgrade mechanism, which GridSite supports, provides a way of using optimised disk-to-network transfers with the mainstream Apache web server.

3.1 Data channels

It has been common practice in existing High Energy Physics grids to use the GridFTP protocol for bulk data transfers. GridFTP provides X.509/GSI based authentication for the control channel, and an unencrypted data channel. This avoids the CPU overhead of encrypting the data channel, but further performance benefits could be gained by using operating system features such as the sendfile() system call, which transfers data directly from disk to network without copying into user-owned memory. The Apache webserver implements exactly this architecture for the HTTP protocol. Preliminary experiments on wide area networks have indicated that HTTP/Apache can have better performance for bulk data transfers than GridFTP [12].

However, HTTP provides only limited authentication methods (essentially usernames and passwords), which are not part of the authentication and authorization infrastructure developed for the Grid. For this reason, it has been necessary to develop a way of establishing a secure channel to provide authentication and authorization.
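The disk-to-network path mentioned in Section 3.1 can be illustrated with a short Python sketch. It is not taken from Apache or GridSite: it uses the standard library's socket.sendfile(), which calls os.sendfile() where the platform provides it and otherwise falls back to ordinary send() calls. The function names send_file_zero_copy and demo are invented for this example.

```python
import os
import socket
import tempfile


def send_file_zero_copy(sock, path):
    """Send a whole file over a connected, blocking stream socket.

    Where os.sendfile() is available, the kernel moves the data from the
    file to the socket without passing it through a user-space buffer.
    Returns the number of bytes sent.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)


def demo():
    """Push a small temporary file through a local socket pair."""
    payload = b"x" * 4096
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    sender, receiver = socket.socketpair()
    try:
        sent = send_file_zero_copy(sender, path)
        sender.close()  # signals end of stream to the receiver
        received = b""
        while chunk := receiver.recv(65536):
            received += chunk
        return sent, received == payload
    finally:
        sender.close()
        receiver.close()
        os.unlink(path)
```

The payload here is deliberately small so that it fits in the socket buffer; a real transfer of multi-gigabyte files would have a separate process reading from the other end.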
3.2 The HTTP Downgrade protocol

In the HTTP Downgrade protocol we have developed, an HTTPS request is made to the server for the file in question, with a new HTTP header, HTTP-Downgrade-Size, present. This header notifies the server that the client understands the downgrade protocol and specifies the minimum size of file to transfer via HTTP rather than HTTPS. (Small files should be returned in the response to the HTTPS request, since establishing a new connection has an overhead of its own.)

If the server chooses to return the file via HTTP, then it issues a standard HTTP redirect response to the client, giving the URL of an HTTP copy of the file. In the general case, this need not be on the same machine as the HTTPS server contacted for the initial request: it could even be one of a farm of data servers attached to a master HTTPS server.

Along with the redirect response, the server sends the client an HTTP cookie containing a one-time passcode which must be used when requesting the file via HTTP. In the GridSite implementation, the passcode is a random number stored by the HTTPS server and deleted when it is first used in an HTTP request. Since the file name in question is also stored, the passcode cannot be used to obtain other files. To retrieve the file, the client simply presents the passcode in an HTTP Cookie header.

Since the HTTP protocol aspects of this mechanism are standard HTTP (with the exception of the initial HTTP-Downgrade-Size header), unmodified HTTP clients can be used to perform these transfers. For example, the curl [13] Unix command-line client can be used without modification by simply using its option to insert custom headers in requests.

This architecture has many further possibilities, such as clients which use multiple data streams to fetch multiple blocks of a file in parallel using the HTTP Range header, but which maintain a single HTTPS control channel to obtain the necessary redirection URLs and passcodes.
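The server-side bookkeeping described in Section 3.2 can be sketched in Python as follows. This is a minimal in-memory model under stated assumptions, not the GridSite implementation: the class and function names, the cookie name, the use of a 302 status for the redirect, and the choice to discard a passcode even when it is presented for the wrong file are all assumptions made for illustration.

```python
import secrets

# Hypothetical cookie name; the paper does not specify one.
COOKIE_NAME = "example-onetime-passcode"


class DowngradeTickets:
    """One-time passcodes, each bound to the single file it was issued for."""

    def __init__(self):
        self._tickets = {}  # passcode -> file path

    def issue(self, path):
        """HTTPS side: called after the client has been authorized."""
        passcode = secrets.token_hex(16)
        self._tickets[passcode] = path
        return passcode

    def redeem(self, passcode, path):
        """HTTP side: succeeds once, and only for the file it was issued for."""
        # pop() removes the passcode on first use, whatever the outcome.
        issued_for = self._tickets.pop(passcode, None)
        return issued_for is not None and issued_for == path


def https_response(tickets, path, file_size, downgrade_size, http_base):
    """Choose between serving over HTTPS and redirecting to plain HTTP.

    downgrade_size is the value of the client's HTTP-Downgrade-Size header,
    or None if the header was absent.
    """
    if downgrade_size is None or file_size < downgrade_size:
        return {"status": 200, "via": "https"}
    passcode = tickets.issue(path)
    return {
        "status": 302,  # assumed redirect status
        "headers": {
            "Location": http_base + path,
            "Set-Cookie": f"{COOKIE_NAME}={passcode}",
        },
    }
```

A client would follow the Location URL over HTTP and return the passcode in a Cookie header; the data server then calls redeem() with the requested path before sending the file.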
Because HTTP and HTTPS support persistent connections, with multiple requests transmitted in series, keeping a single HTTPS control channel open alongside the HTTP data streams avoids the overheads associated with repeatedly opening and closing TCP/IP connections and establishing SSL/TLS encrypted sessions.

4. Web services

Protocols based on Web Services provide important benefits for Grids, particularly in avoiding the tendency of proprietary binary protocols to become closely tied to particular implementations or languages. The language neutrality of Web Services has received less emphasis so far, since most development of Web Services for Grids has been done in the Java language. However, due to Apache's support for dynamic content created by a wide variety of languages, GridSite/Apache is able to support secure web services written in C, C++, Perl, Python and other scripting languages.

4.1 Modular architecture

Apache supports two main classes of content: static web pages and files, and dynamic content obtained by running a CGI program and returning its output to the client. Each request is processed by a chain of modules which modify the input or output stream of data in some way.

The GridSite extensions have been implemented as an Apache module. Other architectures were considered, such as filters (in which an external program is called to transform the output stream) or providing library functions which CGI programs can call to parse grid security credentials. However, these either suffer from performance penalties or complicate the CGI interface, or both.

The GridSite module makes preliminary access control decisions as described above in the description of web sites, but also exports the parsed grid security credentials and the permissions granted according to the governing GACL or XACML policy.
This information is exported as environment variables, using the same mechanism as the CGI API itself: the request and response are communicated via the stdin and stdout of the CGI process, and out-of-band information, such as authentication or remote network addresses, is communicated via environment variables. This means that all the languages suitable for writing CGI executables and scripts are immediately able to access GridSite's evaluation of X.509, GSI, VOMS, GACL and XACML credentials and applicable policies.

4.2 Delegation portType

The Delegation portType implementation included in GridSite illustrates how web services can be built using C and the gSOAP toolkit. This protocol was agreed within the EGEE project [14], and both C (GridSite) and Java implementations are available as part of EGEE's gLite framework. The protocol itself is very similar to the “G-HTTPS” HTTPS delegation extensions developed as part of earlier versions of GridSite, but recast as Web Services.

To perform a delegation, the client sends a Get Proxy Request message to the server, which causes the server to generate a public and private key and return an X.509 certificate signing request containing the public key. The client then signs this request using its own private key and certificate, and sends a Put Proxy message back to the server, containing the signed certificate. Together, the private key which the server generated (and which has not crossed the network) and the new certificate form an RFC3820 / GSI proxy.

gSOAP provides tools for generating the WSDL description of such a Web Service from the C header files of the functions which implement it, or vice versa. With a consistent WSDL description and populated callback functions which implement the actual X.509 key and signing functions, an executable can be built which will operate as a CGI program.
This means that SOAP requests received by Apache will be fed into the stdin of the CGI program and the SOAP response will be taken from the stdout. Since all the required authentication information is made available as environment variables by GridSite, the CGI Web Service program can obtain these directly without needing to be linked to the GridSite library.

However, in the case of the delegation service, there are two levels of credential processing taking place: authentication and authorization of the client attempting the delegation, and then generation of the new credentials themselves. For this reason, in the special case of the GridSite delegation service, the executable is linked to the GridSite library, to obtain access to its private key and certificate handling functions.

The delegation service operates as a service with a single portType. However, since the bulk of the code necessary for delegation is part of the GridSite library, it is straightforward to add a delegation portType to other services which require delegation to function. If the standalone delegation service is used, then a mechanism is needed to share the credentials with other services. To facilitate this, the delegated credentials are stored in the local filesystem, identified by a Delegation Session ID specified during the delegation process.

5. Native execution for services

5.1 Jobs and pool accounts

In large production grids such as the LHC Computing Grid, there has been a focus on providing support for jobs written as scripts and native binary executables. This has partially reflected the heritage of the applications of these grids, such as High Energy Physics, with its large investment in Fortran/C/C++ analysis and simulation codes. For this reason, effort has been put into providing native execution environments at remote sites on the grid.
One of the issues this approach must deal with is the danger that careless or malicious jobs from one user will interfere with other users' files or programs, and the pool account system developed by one of us (A.M.) for EDG and adopted by LCG and EGEE has provided one solution.

5.2 CGI scripts and suexec

A similar problem has been faced in the mainstream web world, where web server administrators have needed to host CGI executables provided by multiple users (perhaps in a commercial, third-party hosting service, where no trust relation exists beyond monthly credit card payments of hosting charges). The Apache software provides a solution for this by allowing CGI scripts or executables to be run as different Unix users at the level of each virtual host (each apparent website). This mechanism, named suexec after the wrapper program which it relies on, is widely used but is tied to fixed configuration decisions made when the Apache web server is started.

5.3 Combining pool accounts and suexec

One of our goals has been to provide support for third party hosting of Web Services for grids, even when the service is written as a script or native executable. This requires some form of sandboxing of users, to prevent them interfering with each other's files or programs, in the same way as must be prevented for remote batch jobs. To do this, we have combined the pool account system with the Apache suexec mechanism (renamed to gsexec). As well as providing legacy support compatible with Apache's default, this allows two new modes of operation.

5.4 Modes of operation

First, a CGI web service can be executed as a Unix pool user associated with the authenticated identity of the client, that is, with their X.509 certificate or GSI proxy.
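A mapping from an authenticated client identity to a pool account can be sketched in Python as below. This is a simplified model written for this illustration, not the pool account code used by EDG, LCG or GridSite: the one-lock-file-per-account layout, the function name lease_pool_account and the account names used in the usage note are assumptions, and lease expiry is not modelled.

```python
import os
from urllib.parse import quote


def lease_pool_account(lock_dir, pool_accounts, client_dn):
    """Return the pool account leased to client_dn, allocating one if needed.

    Assumed layout: one lock file per leased account, named after the
    account and containing the URL-encoded DN of the client holding it.
    """
    encoded = quote(client_dn, safe="")

    # Reuse an existing lease so the same client keeps the same account.
    for account in pool_accounts:
        try:
            with open(os.path.join(lock_dir, account)) as f:
                if f.read() == encoded:
                    return account
        except FileNotFoundError:
            continue  # account is currently free

    # Otherwise claim the first free account. O_EXCL makes the creation
    # atomic, so two different clients cannot claim the same account.
    for account in pool_accounts:
        lock = os.path.join(lock_dir, account)
        try:
            fd = os.open(lock, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except FileExistsError:
            continue  # taken by another client in the meantime
        with os.fdopen(fd, "w") as f:
            f.write(encoded)
        return account

    raise RuntimeError("no free pool accounts")
```

With a pool of ["pool001", "pool002"], the first client DN seen is mapped to pool001 on every request, the next distinct DN to pool002, and a third raises an error until a lock file is removed. A production version would also need to handle concurrent first requests from the same client and the recycling of expired leases.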
The lock files associated with the pool mechanism mean that the same client certificate will be associated with the same pool account on subsequent requests (until the account lease expires, and the file space associated with the account is recycled). This allows services to maintain internal session information in the form of temporary files owned by pool users, and protected from interference by the Unix file permissions system. (It can also be used for other user-like permission systems, such as MySQL databases.)

In the second mode, a pool account is associated with the CGI web services stored in a particular directory. This means that for every remote client, the same Unix account will be used (and the CGI services are therefore responsible for maintaining separation between the sessions of different authenticated users). This mode is intended to support third-party services, where user A is given write access to a directory capable of hosting CGI services. Service scripts or executables can be deployed by simply uploading them using GridSite's manual or programmatic interfaces, and then the service can accept requests from other users, B1, B2, ... . Because A's service runs as a dedicated pool account, if another user C also has the ability to deploy services to their own directory, then C still cannot interfere with A's files from their distinct pool account.

Without these mechanisms, either all the services must run as the same “apache” or “nobody” Unix account, which permits conflicts between users' actions, or each user must be configured individually by the site administrator, which requires that the server is shut down, all sessions are stopped and the server is started with the new configuration.

6. GRACE

This combination of the ability to manage grid-facing access permissions through GridSite, and local file access permissions via pool accounts, allows us to define a new execution model for Web Services in grid environments, which we refer to as GRACE (“GRidSite, Apache, CGI scripts and Executables”).

GRACE offers an alternative to the reliance on Java for Web Services, and is especially attractive to applications which have a large investment in executable code, or have performance requirements which are not suited to current implementations of Java. Furthermore, the ability to use standard scripting languages such as Perl, Python and even PHP to provide Web Services offers possibilities of rapid prototyping of simple services, in languages which site administrators and scientists typically use for day to day automation tasks.

Acknowledgements

This work was funded by the Particle Physics and Astronomy Research Council through their GridPP and e-Science Studentship programmes. We would also like to thank other members of the various EDG and EGEE security working groups for providing much of the wider environment into which this work fits.

References

1. A previous version of GridSite is described in “The GridSite Web/Grid security system”, A. McNab, Softw. Pract. Exper. 2005; 35:827-834. GridSite software is available from http://www.gridsite.org/
2. For more about the GridPP project, see http://www.gridpp.ac.uk/
3. The Globus Project: http://www.globus.org/
4. The Open Science Grid: http://www.opensciencegrid.org/
5. The LHC Computing Grid: http://lcg.web.cern.ch/LCG/
6. X.509v3 is described in IETF RFC2459, “Internet X.509 Public Key Infrastructure Certificate and CRL Profile.”
7. TLS, the most recent version of SSL, is described in RFC2246, “The TLS Protocol Version 1.0.”
8. VO-LDAP, VOMS, DN Lists and GACL are all described in the EDG Security Coordination Group's paper, “Authentication and Authorization Mechanisms for Multi-Domain Grid Environments”, L. A. Cornwall et al., Journal of Grid Computing (2004) 2: 301-311.
9. The Apache Web Server: http://httpd.apache.org/
10. RFC 3281, “An Internet Attribute Certificate Profile for Authorization.”
11. Information about XACML specifications and implementations can be found at: http://www.oasis-open.org/committees/xacml/
12. Richard Hughes-Jones, private communication and talk at GNEW 2004.
13. Curl and libcurl: http://curl.haxx.se
14. The EGEE (Enabling Grids for E-SciencE) Project: http://public.eu-egee.org/